Compute the i2c timeout in jiffies from a value in milliseconds. The
original values of 2 jiffies equals 2 milliseconds if HZ has been
configured to a value of 1000. This corresponds to 2.2 milliseconds
used by most other DRM drivers. Update ast accordingly.
Signed-off-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Fixes: 312fec1405dd ("drm: Initial KMS driver for AST (ASpeed Technologies) 2000 series (v2)")
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Jocelyn Falempe <jfalempe(a)redhat.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v3.5+
---
drivers/gpu/drm/ast/ast_ddc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ast/ast_ddc.c b/drivers/gpu/drm/ast/ast_ddc.c
index b7718084422f3..3e156a6b6831d 100644
--- a/drivers/gpu/drm/ast/ast_ddc.c
+++ b/drivers/gpu/drm/ast/ast_ddc.c
@@ -153,7 +153,7 @@ struct ast_ddc *ast_ddc_create(struct ast_device *ast)
bit = &ddc->bit;
bit->udelay = 20;
- bit->timeout = 2;
+ bit->timeout = usecs_to_jiffies(2200);
bit->data = ddc;
bit->setsda = ast_ddc_algo_bit_data_setsda;
bit->setscl = ast_ddc_algo_bit_data_setscl;
--
2.44.0
This bug was found with syzkaller on Linux kernel v5.10.
This patch series fixes the bug.
Signed-off-by: Alexander Ofitserov <oficerovas(a)altlinux.org>
Cc: stable(a)vger.kernel.org
Xin Long (1):
rxrpc: use udp tunnel APIs instead of open code in rxrpc_open_socket
David Howells (2):
rxrpc: Fix missing dependency on NET_UDP_TUNNEL
rxrpc: Enable IPv6 checksums on transport socket
Vadim Fedorenko (1):
rxrpc: Fix dependency on IPv6 in udp tunnel config
net/rxrpc/Kconfig | 1 +
net/rxrpc/local_object.c | 77 +++++++++++++++-------------------------
2 files changed, 30 insertions(+), 48 deletions(-)
--
2.42.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040850-wildly-gyration-12ff@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in COW mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 3 Apr 2024 23:21:30 +0200
Subject: [PATCH] x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..36b603d0cdde 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
diff --git a/mm/memory.c b/mm/memory.c
index 904f70b99498..d2155ced45f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040843-utmost-staff-773b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in COW mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 3 Apr 2024 23:21:30 +0200
Subject: [PATCH] x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..36b603d0cdde 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
diff --git a/mm/memory.c b/mm/memory.c
index 904f70b99498..d2155ced45f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 04c35ab3bdae7fefbd7c7a7355f29fa03a035221
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040847-departure-lining-fed7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
04c35ab3bdae ("x86/mm/pat: fix VM_PAT handling in COW mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 04c35ab3bdae7fefbd7c7a7355f29fa03a035221 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Wed, 3 Apr 2024 23:21:30 +0200
Subject: [PATCH] x86/mm/pat: fix VM_PAT handling in COW mappings
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 0d72183b5dd0..36b603d0cdde 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, unsigned long size)
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, unsigned long size)
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
diff --git a/mm/memory.c b/mm/memory.c
index 904f70b99498..d2155ced45f8 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *vma,
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From 310227f42882c52356b523e2f4e11690eebcd2ab Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 13 Feb 2024 14:54:25 +0100
Subject: [PATCH] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Message-Id: <20240213135425.795001-1-david(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
drivers/virtio/virtio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b3513..f513ee21b1c18 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
if (drv && drv->freeze) {
ret = drv->freeze(dev);
- if (ret)
+ if (ret) {
+ virtio_config_enable(dev);
return ret;
+ }
}
if (dev->config->destroy_avq)
--
2.43.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From 310227f42882c52356b523e2f4e11690eebcd2ab Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 13 Feb 2024 14:54:25 +0100
Subject: [PATCH] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Message-Id: <20240213135425.795001-1-david(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
drivers/virtio/virtio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b3513..f513ee21b1c18 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
if (drv && drv->freeze) {
ret = drv->freeze(dev);
- if (ret)
+ if (ret) {
+ virtio_config_enable(dev);
return ret;
+ }
}
if (dev->config->destroy_avq)
--
2.43.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From 310227f42882c52356b523e2f4e11690eebcd2ab Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 13 Feb 2024 14:54:25 +0100
Subject: [PATCH] virtio: reenable config if freezing device failed
Currently, we don't reenable the config if freezing the device failed.
For example, virtio-mem currently doesn't support suspend+resume, and
trying to freeze the device will always fail. Afterwards, the device
will no longer respond to resize requests, because it won't get notified
about config changes.
Let's fix this by re-enabling the config if freezing fails.
Fixes: 22b7050a024d ("virtio: defer config changed notifications")
Cc: <stable(a)kernel.org>
Cc: "Michael S. Tsirkin" <mst(a)redhat.com>
Cc: Jason Wang <jasowang(a)redhat.com>
Cc: Xuan Zhuo <xuanzhuo(a)linux.alibaba.com>
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Message-Id: <20240213135425.795001-1-david(a)redhat.com>
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
---
drivers/virtio/virtio.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c
index f4080692b3513..f513ee21b1c18 100644
--- a/drivers/virtio/virtio.c
+++ b/drivers/virtio/virtio.c
@@ -510,8 +510,10 @@ int virtio_device_freeze(struct virtio_device *dev)
if (drv && drv->freeze) {
ret = drv->freeze(dev);
- if (ret)
+ if (ret) {
+ virtio_config_enable(dev);
return ret;
+ }
}
if (dev->config->destroy_avq)
--
2.43.0
Tony reported that the Machine check recovery was broken in v6.9-rc1,
as he was hitting a VM_BUG_ON when injecting uncorrectable memory errors
to DRAM.
After some more digging and debugging on his side, he realized that this
went back to v6.1, with the introduction of 'commit 0d206b5d2e0d ("mm/swap: add
swp_offset_pfn() to fetch PFN from swap entry")'.
That commit, among other things, introduced swp_offset_pfn(), replacing
hwpoison_entry_to_pfn() in its favour.
The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(),
but is_pfn_swap_entry() never got updated to cover hwpoison entries, which
means that we would hit the VM_BUG_ON whenever we would call
swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM set.
Fix this by updating the check to cover hwpoison entries as well, and update
the comment while we are it.
Reported-by: Tony Luck <tony.luck(a)intel.com>
Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
Tested-by: Tony Luck <tony.luck(a)intel.com>
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Cc: <stable(a)vger.kernel.org> # 6.1.x
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
---
include/linux/swapops.h | 65 +++++++++++++++++++++--------------------
1 file changed, 33 insertions(+), 32 deletions(-)
diff --git a/include/linux/swapops.h b/include/linux/swapops.h
index 48b700ba1d18..a5c560a2f8c2 100644
--- a/include/linux/swapops.h
+++ b/include/linux/swapops.h
@@ -390,6 +390,35 @@ static inline bool is_migration_entry_dirty(swp_entry_t entry)
}
#endif /* CONFIG_MIGRATION */
+#ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Support for hardware poisoned pages
+ */
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ return swp_entry(SWP_HWPOISON, page_to_pfn(page));
+}
+
+static inline int is_hwpoison_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_HWPOISON;
+}
+
+#else
+
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ return swp_entry(0, 0);
+}
+
+static inline int is_hwpoison_entry(swp_entry_t swp)
+{
+ return 0;
+}
+#endif
+
typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP BIT(0)
@@ -483,8 +512,9 @@ static inline struct folio *pfn_swap_entry_folio(swp_entry_t entry)
/*
* A pfn swap entry is a special type of swap entry that always has a pfn stored
- * in the swap offset. They are used to represent unaddressable device memory
- * and to restrict access to a page undergoing migration.
+ * in the swap offset. They can either be used to represent unaddressable device
+ * memory, to restrict access to a page undergoing migration or to represent a
+ * pfn which has been hwpoisoned and unmapped.
*/
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
@@ -492,7 +522,7 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry)
BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);
return is_migration_entry(entry) || is_device_private_entry(entry) ||
- is_device_exclusive_entry(entry);
+ is_device_exclusive_entry(entry) || is_hwpoison_entry(entry);
}
struct page_vma_mapped_walk;
@@ -561,35 +591,6 @@ static inline int is_pmd_migration_entry(pmd_t pmd)
}
#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
-#ifdef CONFIG_MEMORY_FAILURE
-
-/*
- * Support for hardware poisoned pages
- */
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- BUG_ON(!PageLocked(page));
- return swp_entry(SWP_HWPOISON, page_to_pfn(page));
-}
-
-static inline int is_hwpoison_entry(swp_entry_t entry)
-{
- return swp_type(entry) == SWP_HWPOISON;
-}
-
-#else
-
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- return swp_entry(0, 0);
-}
-
-static inline int is_hwpoison_entry(swp_entry_t swp)
-{
- return 0;
-}
-#endif
-
static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
--
2.44.0
While adding the GIC ITS MSI support, it was found that the msi-map entries
needed to be swapped to receive MSIs from the endpoint.
But later it was identified that the swapping was needed due to a bug in
the Qualcomm PCIe controller driver. And since the bug is now fixed with
commit bf79e33cdd89 ("PCI: qcom: Enable BDF to SID translation properly"),
let's fix the msi-map entries also to reflect the actual mapping in the
hardware.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
Manivannan Sadhasivam (3):
arm64: dts: qcom: sm8450: Fix the msi-map entries
arm64: dts: qcom: sm8550: Fix the msi-map entries
arm64: dts: qcom: sm8650: Fix the msi-map entries
arch/arm64/boot/dts/qcom/sm8450.dtsi | 16 ++++------------
arch/arm64/boot/dts/qcom/sm8550.dtsi | 10 ++++------
arch/arm64/boot/dts/qcom/sm8650.dtsi | 10 ++++------
3 files changed, 12 insertions(+), 24 deletions(-)
---
base-commit: f6cef5f8c37f58a3bc95b3754c3ae98e086631ca
change-id: 20240318-pci-bdf-sid-fix-2e7db6fe4238
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Hi, developers,
This is Chenglong Tang from the Google Container Optimized OS team. We
recently received a kernel panic bug from the customers regarding cifs.
This happened since the backport of following changes in cifs(in our kernel
COS-5.10.208 and COS-5.15.146):
cifs: Fix non-availability of dedup breaking generic/304:
https://lore.kernel.org/r/3876191.1701555260@warthog.procyon.org.uk/
smb: client: fix potential NULL deref in parse_dfs_referrals(): Upstream
commit 92414333eb375ed64f4ae92d34d579e826936480
ksmbd: fix wrong name of SMB2_CREATE_ALLOCATION_SIZE: Upstream
commit 13736654481198e519059d4a2e2e3b20fa9fdb3e
smb: client: fix NULL deref in asn1_ber_decoder(): Upstream commit
90d025c2e953c11974e76637977c473200593a46
smb: a few more smb changes...
The line that crashed is line 197 in fs/cifs/dfs_cache.c
```
if (unlikely(strcmp(cp->charset, cache_cp->charset))) {
```
I attached the dmesg and backtrace for debugging purposes. Let me know if
you need more information.
Best,
Chenglong
On x86 each cpu_hw_events maintains a table for counter assignment but
it missed to update one for the deleted event in x86_pmu_del(). This
can make perf_clear_dirty_counters() reset used counter if it's called
before event scheduling or enabling. Then it would return out of range
data which doesn't make sense.
The following code can reproduce the problem.
$ cat repro.c
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/perf_event.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
struct perf_event_attr attr = {
.type = PERF_TYPE_HARDWARE,
.config = PERF_COUNT_HW_CPU_CYCLES,
.disabled = 1,
};
void *worker(void *arg)
{
int cpu = (long)arg;
int fd1 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
int fd2 = syscall(SYS_perf_event_open, &attr, -1, cpu, -1, 0);
void *p;
do {
ioctl(fd1, PERF_EVENT_IOC_ENABLE, 0);
p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd1, 0);
ioctl(fd2, PERF_EVENT_IOC_ENABLE, 0);
ioctl(fd2, PERF_EVENT_IOC_DISABLE, 0);
munmap(p, 4096);
ioctl(fd1, PERF_EVENT_IOC_DISABLE, 0);
} while (1);
return NULL;
}
int main(void)
{
int i;
int n = sysconf(_SC_NPROCESSORS_ONLN);
pthread_t *th = calloc(n, sizeof(*th));
for (i = 0; i < n; i++)
pthread_create(&th[i], NULL, worker, (void *)(long)i);
for (i = 0; i < n; i++)
pthread_join(th[i], NULL);
free(th);
return 0;
}
And you can see the out of range data using perf stat like this.
Probably it'd be easier to see on a large machine.
$ gcc -o repro repro.c -pthread
$ ./repro &
$ sudo perf stat -A -I 1000 2>&1 | awk '{ if (length($3) > 15) print }'
1.001028462 CPU6 196,719,295,683,763 cycles # 194290.996 GHz (71.54%)
1.001028462 CPU3 396,077,485,787,730 branch-misses # 15804359784.80% of all branches (71.07%)
1.001028462 CPU17 197,608,350,727,877 branch-misses # 14594186554.56% of all branches (71.22%)
2.020064073 CPU4 198,372,472,612,140 cycles # 194681.113 GHz (70.95%)
2.020064073 CPU6 199,419,277,896,696 cycles # 195720.007 GHz (70.57%)
2.020064073 CPU20 198,147,174,025,639 cycles # 194474.654 GHz (71.03%)
2.020064073 CPU20 198,421,240,580,145 stalled-cycles-frontend # 100.14% frontend cycles idle (70.93%)
3.037443155 CPU4 197,382,689,923,416 cycles # 194043.065 GHz (71.30%)
3.037443155 CPU20 196,324,797,879,414 cycles # 193003.773 GHz (71.69%)
3.037443155 CPU5 197,679,956,608,205 stalled-cycles-backend # 1315606428.66% backend cycles idle (71.19%)
3.037443155 CPU5 198,571,860,474,851 instructions # 13215422.58 insn per cycle
It should move the contents in the cpuc->assign as well.
Fixes: 5471eea5d3bf ("perf/x86: Reset the dirty counter to prevent the leak for an RDPMC task")
Reviewed-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
* add Kan's reviewed-by tag
arch/x86/events/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 09050641ce5d..5b0dd07b1ef1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1644,6 +1644,7 @@ static void x86_pmu_del(struct perf_event *event, int flags)
while (++i < cpuc->n_events) {
cpuc->event_list[i-1] = cpuc->event_list[i];
cpuc->event_constraint[i-1] = cpuc->event_constraint[i];
+ cpuc->assign[i-1] = cpuc->assign[i];
}
cpuc->event_constraint[i-1] = NULL;
--cpuc->n_events;
--
2.44.0.278.ge034bb2e1d-goog
Hello,
I trust this message finds you well. My name is Marisol Alvarez, and I am currently in search of a new tax preparer for Extension. I came across your services and would like to inquire about your availability for new clients for the extension period.
Specifically, I need assistance with the preparation of my individual tax returns, which includes Schedule D and Schedule C. Could you please provide information on your fee structure for handling returns with these schedules?
Additionally, I am interested in becoming a client and would like to know the steps involved in signing up for your tax preparation services.
I appreciate your prompt response and look forward to the possibility of working with you.
Kind regards,
Marisol Alvarez
Member Services Specialist
This is the start of the stable review cycle for the 5.15.154 release.
There are 690 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 10 Apr 2024 12:52:23 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.154-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.154-rc1
min15.li <min15.li(a)samsung.com>
nvme: fix miss command type check
Antoine Tenart <atenart(a)kernel.org>
gro: fix ownership transfer
David Hildenbrand <david(a)redhat.com>
mm/secretmem: fix GUP-fast succeeding on secretmem folios
Davide Caratti <dcaratti(a)redhat.com>
mptcp: don't account accept() of non-MPC client as fallback to TCP
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/bugs: Fix the SRSO mitigation on Zen3/4
Stefan O'Rear <sorear(a)fastmail.com>
riscv: process: Fix kernel gp leakage
Samuel Holland <samuel.holland(a)sifive.com>
riscv: Fix spurious errors from __get/put_kernel_nofault
Sumanth Korikkar <sumanthk(a)linux.ibm.com>
s390/entry: align system call table on 8 bytes
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
Herve Codina <herve.codina(a)bootlin.com>
of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
Herve Codina <herve.codina(a)bootlin.com>
driver core: Introduce device_link_wait_removal()
I Gede Agastya Darma Laksana <gedeagas22(a)gmail.com>
ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
Jann Horn <jannh(a)google.com>
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
Jann Horn <jannh(a)google.com>
openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
Jann Horn <jannh(a)google.com>
HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->running
Jeff Layton <jlayton(a)kernel.org>
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
Arnd Bergmann <arnd(a)arndb.de>
ata: sata_mv: Fix PCI device ID table declaration compilation warning
Arnd Bergmann <arnd(a)arndb.de>
scsi: mylex: Fix sysfs buffer lengths
Arnd Bergmann <arnd(a)arndb.de>
ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
Stephen Lee <slee08177(a)gmail.com>
ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt711-sdw: fix locking sequence
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt711-sdca: fix locking sequence
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt5682-sdw: fix locking sequence
Paul Barker <paul.barker.ct(a)bp.renesas.com>
net: ravb: Always process TX descriptor ring
Wei Fang <wei.fang(a)nxp.com>
net: fec: Set mac_managed_pm during probe
Denis Kirjanov <dkirjanov(a)suse.de>
drivers: net: convert to boolean for the mac_managed_pm flag
Oleksij Rempel <linux(a)rempel-privat.de>
net: usb: asix: suspend embedded PHY if external is used
Ivan Vecera <ivecera(a)redhat.com>
i40e: Enforce software interrupt during busy-poll exit
Ivan Vecera <ivecera(a)redhat.com>
i40e: Remove _t suffix from enum type names
Joe Damato <jdamato(a)fastly.com>
i40e: Store the irq number in i40e_q_vector
Christian A. Ehrhardt <lk(a)c--e.de>
usb: typec: ucsi: Check for notifications after init
Alexander Stein <alexander.stein(a)ew.tq-group.com>
Revert "usb: phy: generic: Get the vbus supply"
Bikash Hazarika <bhazarika(a)marvell.com>
scsi: qla2xxx: Update manufacturer detail
Bikash Hazarika <bhazarika(a)marvell.com>
scsi: qla2xxx: Update manufacturer details
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: fix vf may be used uninitialized in this function warning
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: fix i40e_count_filters() to count only active/new filters
Su Hui <suhui(a)nfschina.com>
octeontx2-pf: check negative error code in otx2_open()
Hariprasad Kelam <hkelam(a)marvell.com>
octeontx2-af: Fix issue with loading coalesced KPU profiles
Antoine Tenart <atenart(a)kernel.org>
udp: prevent local UDP tunnel packets from being GROed
Antoine Tenart <atenart(a)kernel.org>
udp: do not transition UDP GRO fraglist partial checksums to unnecessary
Antoine Tenart <atenart(a)kernel.org>
udp: do not accept non-tunnel GSO skbs landing in a tunnel
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: stop interface during shutdown
Kuniyuki Iwashima <kuniyu(a)amazon.com>
ipv6: Fix infinite recursion in fib6_dump_done().
Jakub Kicinski <kuba(a)kernel.org>
selftests: reuseaddr_conflict: add missing new line at the end of the output
Eric Dumazet <edumazet(a)google.com>
erspan: make sure erspan_base_hdr is present in skb->head
Antoine Tenart <atenart(a)kernel.org>
selftests: net: gro fwd: update vxlan GRO test expectations
Piotr Wejman <piotrwejman90(a)gmail.com>
net: stmmac: fix rx queue priority assignment
Eric Dumazet <edumazet(a)google.com>
net/sched: act_skbmod: prevent kernel-infoleak
Jakub Sitnicki <jakub(a)cloudflare.com>
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
vboxsf: Avoid an spurious warning if load_nls_xxx() fails
Eric Dumazet <edumazet(a)google.com>
netfilter: validate user input for expected length
Ziyang Xuan <william.xuanziyang(a)huawei.com>
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: flush pending destroy work before exit_net release
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: reject new basechain after table flag update
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Bail to userspace if emulation of atomic user access faults
Ye Zhang <ye.zhang(a)rock-chips.com>
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Vlastimil Babka <vbabka(a)suse.cz>
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
Ingo Molnar <mingo(a)kernel.org>
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Jens Axboe <axboe(a)kernel.dk>
io_uring: ensure '0' is returned on file registration success
Gokul krishna Krishnakumar <quic_gokukris(a)quicinc.com>
locking/rwsem: Disable preemption while trying for rwsem lock
Mahmoud Adam <mngyadam(a)amazon.com>
net/rds: fix possible cp null dereference
Jesper Dangaard Brouer <hawk(a)kernel.org>
xen-netfront: Add missing skb_mark_for_recycle
Bastien Nocera <hadess(a)hadess.net>
Bluetooth: Fix TOCTOU in HCI debugfs implementation
Hui Wang <hui.wang(a)canonical.com>
Bluetooth: hci_event: set the conn encrypted before conn establishes
Johan Hovold <johan+linaro(a)kernel.org>
arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
Sean Christopherson <seanjc(a)google.com>
x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
Sandipan Das <sandipan.das(a)amd.com>
x86/cpufeatures: Add new word for scattered features
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
Arnd Bergmann <arnd(a)arndb.de>
dm integrity: fix out-of-range warning
Hariprasad Kelam <hkelam(a)marvell.com>
Octeontx2-af: fix pause frame configuration in GMP mode
Andrei Matei <andreimatei1(a)gmail.com>
bpf: Protect against int overflow for stack access size
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: call request_irq() after NAPI initialized
Nikita Kiryushin <kiryushin(a)ancud.ru>
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
Eric Dumazet <edumazet(a)google.com>
tcp: properly terminate timers for kernel sockets
Alexandra Winter <wintera(a)linux.ibm.com>
s390/qeth: handle deferred cc1
Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
Johannes Berg <johannes.berg(a)intel.com>
wifi: iwlwifi: mvm: rfi: fix potential response leaks
Bixuan Cui <cuibixuan(a)linux.alibaba.com>
iwlwifi: mvm: rfi: use kmemdup() to replace kzalloc + memcpy
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: stop PHY during open() error paths
Ryosuke Yasuoka <ryasuoka(a)redhat.com>
nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
USB: UAS: return ENODEV when submit urbs fail with device not attached
Bart Van Assche <bvanassche(a)acm.org>
scsi: usb: Stop using the SCSI pointer
Bart Van Assche <bvanassche(a)acm.org>
scsi: usb: Call scsi_done() directly
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix deadlock in usb_deauthorize_interface()
Muhammad Usama Anjum <usama.anjum(a)collabora.com>
scsi: lpfc: Correct size for wqe for memset()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
Kim Phillips <kim.phillips(a)amd.com>
x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Delay I/O Abort on PCI error
Saurav Kashyap <skashyap(a)marvell.com>
scsi: qla2xxx: Change debug message during driver unload
Saurav Kashyap <skashyap(a)marvell.com>
scsi: qla2xxx: Fix double free of fcport
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Fix command flush on cable pull
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: NVME|FCP prefer flag not being honored
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Split FCE|EFT trace control
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Fix N2N stuck connection
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Prevent command send on chip reset
Christian A. Ehrhardt <lk(a)c--e.de>
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
Christian A. Ehrhardt <lk(a)c--e.de>
usb: typec: ucsi: Ack unsupported commands
yuan linyu <yuanlinyu(a)hihonor.com>
usb: udc: remove warning when queue disabled ep
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: gadget: LPM flow fix
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: gadget: Fix exiting from clock gating
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix ISOC flow in DDMA mode
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix hibernation flow
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix remote wakeup from hibernation
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Add hub_get() and hub_put() routines
Dan Carpenter <dan.carpenter(a)linaro.org>
staging: vc04_services: fix information leak in create_component()
Arnd Bergmann <arnd(a)arndb.de>
staging: vc04_services: changen strncpy() to strscpy_pad()
Guilherme G. Piccoli <gpiccoli(a)igalia.com>
scsi: core: Fix unremoved procfs host directory regression
Duoming Zhou <duoming(a)zju.edu.cn>
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Preserve original aspect ratio in create stream
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/amdgpu: Use drm_mode_copy()
Oliver Neukum <oneukum(a)suse.com>
usb: cdc-wdm: close race between read and workqueue
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/gt: Reset queue_priority_hint on parking
Claus Hansen Ries <chr(a)terma.com>
net: ll_temac: platform_get_resource replaced by wrong function
Mikko Rapeli <mikko.rapeli(a)linaro.org>
mmc: core: Avoid negative index with array access
Mikko Rapeli <mikko.rapeli(a)linaro.org>
mmc: core: Initialize mmc_blk_ioc_data
Nathan Chancellor <nathan(a)kernel.org>
hexagon: vmlinux.lds.S: handle attributes section
Max Filippov <jcmvbkbc(a)gmail.com>
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: zoned: use zone aware sb location for scrub
John Sperbeck <jsperbeck(a)google.com>
init: open /initrd.image with O_LARGEFILE
Zi Yan <ziy(a)nvidia.com>
mm/migrate: set swap entry values of THP tail pages properly.
Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO
Alex Williamson <alex.williamson(a)redhat.com>
vfio/fsl-mc: Block calling interrupt handler without trigger
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: Create persistent IRQ handlers
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Create persistent INTx handler
Alex Williamson <alex.williamson(a)redhat.com>
vfio: Introduce interface to flush virqfd inject workqueue
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Lock external INTx masking ops
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Disable auto-enable of exclusive INTx IRQ
Geliang Tang <tanggeliang(a)kylinos.cn>
selftests: mptcp: diag: return KSFT_FAIL not test_cnt
Nathan Chancellor <nathan(a)kernel.org>
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
Tim Schumacher <timschumi(a)gmx.de>
efivarfs: Request at most 512 bytes for variable names
Yang Jihong <yangjihong1(a)huawei.com>
perf/core: Fix reentry problem in perf_output_read_group()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Fix a regression in nfsd_setattr()
NeilBrown <neilb(a)suse.de>
nfsd: don't call locks_release_private() twice concurrently
NeilBrown <neilb(a)suse.de>
nfsd: don't take fi_lock in nfsd_break_deleg_cb()
NeilBrown <neilb(a)suse.de>
nfsd: fix RELEASE_LOCKOWNER
Jeff Layton <jlayton(a)kernel.org>
nfsd: drop the nfsd_put helper
NeilBrown <neilb(a)suse.de>
nfsd: call nfsd_last_thread() before final nfsd_put()
Alexander Aring <aahringo(a)redhat.com>
lockd: introduce safe async lock op
NeilBrown <neilb(a)suse.de>
NFSD: fix possible oops when nfsd/pool_stats is closed.
Chuck Lever <chuck.lever(a)oracle.com>
Documentation: Add missing documentation for EXPORT_OP flags
NeilBrown <neilb(a)suse.de>
nfsd: separate nfsd_last_thread() from nfsd_put()
NeilBrown <neilb(a)suse.de>
nfsd: Simplify code around svc_exit_thread() call in nfsd()
NeilBrown <neilb(a)suse.de>
nfsd: don't allow nfsd threads to be signalled.
Tavian Barnes <tavianator(a)tavianator.com>
nfsd: Fix creation time serialization order
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd4_encode_nfstime4() helper
NeilBrown <neilb(a)suse.de>
lockd: drop inappropriate svc_get() from locked_get()
Dan Carpenter <dan.carpenter(a)linaro.org>
nfsd: fix double fget() bug in __write_ports_addfd()
Jeff Layton <jlayton(a)kernel.org>
nfsd: make a copy of struct iattr before calling notify_change
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop
Jeff Layton <jlayton(a)kernel.org>
nfsd: simplify the delayed disposal list code
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Convert filecache to rhltable
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow reaping files still under writeback
Jeff Layton <jlayton(a)kernel.org>
nfsd: update comment over __nfsd_file_cache_purge
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't take/put an extra reference when putting a file
Jeff Layton <jlayton(a)kernel.org>
nfsd: add some comments to nfsd_file_do_acquire
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't kill nfsd_files because of lease break error
Jeff Layton <jlayton(a)kernel.org>
nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator
Jeff Layton <jlayton(a)kernel.org>
nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't open-code clear_and_wake_up_bit
Jeff Layton <jlayton(a)kernel.org>
nfsd: call op_release, even when op_func returns an error
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't replace page in rq_pages if it's a continuation of last page
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Protect against filesystem freezing
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: copy the whole verifier in nfsd_copy_write_verifier
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't fsync nfsd_files on last close
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: fix problems with cleanup on errors in nfsd4_copy
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't hand out delegation on setuid files being opened for write
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: fix leaked reference count of nfsd4_ssc_umount_item
Jeff Layton <jlayton(a)kernel.org>
nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow nfsd_file_get to sanely handle a NULL pointer
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: enhance inter-server copy cleanup
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't destroy global nfs4_file table in per-net shutdown
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use set_bit(RQ_DROPME)
Chuck Lever <chuck.lever(a)oracle.com>
Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix handling of cached open files in nfsd4_open codepath
Jeff Layton <jlayton(a)kernel.org>
nfsd: rework refcounting in filecache
Kees Cook <keescook(a)chromium.org>
NFSD: Avoid clashing function prototypes
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use only RQ_DROPME to signal the need to drop a reply
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add CB_RECALL_ANY tracepoints
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add delegation reaper to react to low memory condition
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for sending CB_RECALL_ANY
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
Chuck Lever <chuck.lever(a)oracle.com>
trace: Relocate event helper files
Jeff Layton <jlayton(a)kernel.org>
lockd: fix file selection in nlmsvc_cancel_blocked
Jeff Layton <jlayton(a)kernel.org>
lockd: ensure we use the correct file descriptor when unlocking
Jeff Layton <jlayton(a)kernel.org>
lockd: set missing fl_flags field when retrieving args
Xiu Jianfeng <xiujianfeng(a)huawei.com>
NFSD: Use struct_size() helper in alloc_session()
Jeff Layton <jlayton(a)kernel.org>
nfsd: return error if nfs4_setacl fails
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd_file_fsync tracepoint
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix up the filecache laundrette scheduling
Jeff Layton <jlayton(a)kernel.org>
filelock: add a new locks_inode_context accessor function
Jeff Layton <jlayton(a)kernel.org>
nfsd: reorganize filecache.c
Jeff Layton <jlayton(a)kernel.org>
nfsd: remove the pages_flushed statistic from filecache
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix licensing header in filecache.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use rhashtable for managing nfs4_file objects
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor find_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up find_or_add_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add a nfsd4_file_hash_remove() helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd4_init_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Update file_hashtbl() helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use const pointers as parameters to fh_ helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace delegation revocations
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace stateids returned via DELEGRETURN
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfs4_preprocess_stateid_op() call sites
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Flesh out a documenting comment for filecache.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Pass the target nfsd_file to nfsd_commit()
David Disseldorp <ddiss(a)suse.de>
exportfs: use pr_debug for unreachable debug statements
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow disabling NFSv2 at compile time
Jeff Layton <jlayton(a)kernel.org>
nfsd: move nfserrno() to vfs.c
Jeff Layton <jlayton(a)kernel.org>
nfsd: ignore requests to disable unsupported versions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Finish converting the NFSv3 GETACL result encoder
Colin Ian King <colin.i.king(a)gmail.com>
NFSD: Remove redundant assignment to variable host_err
Anna Schumaker <Anna.Schumaker(a)Netapp.com>
NFSD: Simplify READ_PLUS
Jeff Layton <jlayton(a)kernel.org>
nfsd: use locks_inode_context helper
Jeff Layton <jlayton(a)kernel.org>
lockd: use locks_inode_context helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix reads with a non-zero offset that don't end on a page boundary
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix trace_nfsd_fh_verify_err() crasher
Jeff Layton <jlayton(a)kernel.org>
nfsd: put the export reference in nfsd4_verify_deleg_dentry
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix net-namespace logic in __nfsd_file_cache_purge
Jeff Layton <jlayton(a)kernel.org>
nfsd: ensure we always call fh_verify_error tracepoint
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
NFSD: unregister shrinker when nfsd_init_net() fails
Jeff Layton <jlayton(a)kernel.org>
nfsd: rework hashtable handling in nfsd_do_file_acquire
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix nfsd_file_unhash_and_dispose
Gaosheng Cui <cuigaosheng1(a)huawei.com>
fanotify: Remove obsoleted fanotify_event_has_path()
Gaosheng Cui <cuigaosheng1(a)huawei.com>
fsnotify: remove unused declaration
Al Viro <viro(a)zeniv.linux.org.uk>
fs/notify: constify path
Jeff Layton <jlayton(a)kernel.org>
nfsd: extra checks when freeing delegation stateids
Jeff Layton <jlayton(a)kernel.org>
nfsd: make nfsd4_run_cb a bool return function
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix comments about spinlock handling with delegations
Jeff Layton <jlayton(a)kernel.org>
nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Cap rsize_bop result based on send buffer size
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename the fields in copy_stateid_t
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Pack struct nfsd4_compoundres
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove unused nfsd4_compoundargs::cachetype field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove "inline" directives on op_rsize_bop helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfs4svc_encode_compoundres()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up WRITE arg decoders
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor common code out of dirlist helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Parametrize how much of argsize should be zeroed
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add shrinker to reap courtesy clients on low memory condition
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: keep track of the number of courtesy clients in the system
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_setattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add a mechanism to wait for a DELEGRETURN
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add tracepoints to report NFSv4 callback completions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace NFSv4 COMPOUND tags
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace dprintk() call site in fh_verify()
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nfsd: remove nfsd4_prepare_cb_recall() declaration
Jeff Layton <jlayton(a)kernel.org>
nfsd: clean up mounted_on_fileid handling
NeilBrown <neilb(a)suse.de>
NFSD: drop fname and flen args from nfsd_create_locked()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
nfsd: Propagate some error code returned by memdup_user()
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
nfsd: Avoid some useless tests
Jinpeng Cui <cui.jinpeng2(a)zte.com.cn>
NFSD: remove redundant variable status
Olga Kornievskaia <kolga(a)netapp.com>
NFSD enforce filehandle check for source file in COPY
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
lockd: move from strlcpy with unused retval to strscpy
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
NFSD: move from strlcpy with unused retval to strscpy
Al Viro <viro(a)zeniv.linux.org.uk>
nfsd_splice_actor(): handle compound pages
NeilBrown <neilb(a)suse.de>
NFSD: fix regression with setting ACLs.
NeilBrown <neilb(a)suse.de>
NFSD: discard fh_locked flag and fh_lock/fh_unlock
NeilBrown <neilb(a)suse.de>
NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
NeilBrown <neilb(a)suse.de>
NFSD: use explicit lock/unlock for directory ops
NeilBrown <neilb(a)suse.de>
NFSD: reduce locking in nfsd_lookup()
NeilBrown <neilb(a)suse.de>
NFSD: only call fh_unlock() once in nfsd_link()
NeilBrown <neilb(a)suse.de>
NFSD: always drop directory lock in nfsd_unlink()
NeilBrown <neilb(a)suse.de>
NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
NeilBrown <neilb(a)suse.de>
NFSD: add posix ACLs to struct nfsd_attrs
NeilBrown <neilb(a)suse.de>
NFSD: add security label to struct nfsd_attrs
NeilBrown <neilb(a)suse.de>
NFSD: set attributes when creating symlinks
NeilBrown <neilb(a)suse.de>
NFSD: introduce struct nfsd_attrs
Jeff Layton <jlayton(a)kernel.org>
NFSD: verify the opened dentry after setting a delegation
Jeff Layton <jlayton(a)kernel.org>
NFSD: drop fh argument from alloc_init_deleg
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move copy offload callback arguments into a separate structure
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add nfsd4_send_cb_offload()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove kmalloc from nfsd4_do_async_copy()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_do_copy()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace boolean fields in struct nfsd4_copy
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfs4_put_copy() static
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Reorder the fields in struct nfsd4_op
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Shrink size of struct nfsd4_copy
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Shrink size of struct nfsd4_copy_notify
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix strncpy() fortify warning
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd4_encode_readlink()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use xdr_pad_size()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Simplify starting_len
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_readv()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd4_read::rd_eof field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_fattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_operation()
Jeff Layton <jlayton(a)kernel.org>
nfsd: silence extraneous printk on nfsd.ko insertion
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: keep track of the number of v4 clients in the system
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: refactoring v4 specific code to a helper in nfs4state.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Ensure nf_inode is never dereferenced
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: NFSv4 CLOSE should release an nfsd_file immediately
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move nfsd_file_trace_alloc() tracepoint
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Separate tracepoints for acquire and create
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up unused code after rhashtable conversion
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Convert the filecache to use rhashtable
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Set up an rhashtable for the filecache
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace the "init once" mechanism
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove nfsd_file::nf_hashval
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_hash_remove can compute hashval
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor __nfsd_file_close_inode()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove lockdep assertion from unhash_and_release_locked()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: No longer record nf_hashval in the trace log
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Never call nfsd_file_gc() in foreground paths
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix the filecache LRU shrinker
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Leave open files out of the filecache LRU
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace filecache LRU activity
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: WARN when freeing an item still linked via nf_lru
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Hook up the filecache stat file
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Zero counters when the filecache is re-initialized
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Record number of flush calls
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report the number of items evicted by the LRU walk
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_file_lru_scan()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_file_gc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add nfsd_file_lru_dispose_list() helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report average age of filecache items
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report count of freed filecache items
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report count of calls to nfsd_file_acquire()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report filecache LRU size
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Demote a WARN to a pr_warn()
Colin Ian King <colin.i.king(a)gmail.com>
nfsd: remove redundant assignment to variable len
Zhang Jiaming <jiaming(a)nfschina.com>
NFSD: Fix space and spelling mistake
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Instrument fh_verify()
Benjamin Coddington <bcodding(a)redhat.com>
NLM: Defend against file_lock changes after vfs_test_lock()
Xin Gao <gaoxin(a)cdjrlc.com>
fsnotify: Fix comment typo
Amir Goldstein <amir73il(a)gmail.com>
fanotify: introduce FAN_MARK_IGNORE
Amir Goldstein <amir73il(a)gmail.com>
fanotify: cleanups for fanotify_mark() input validations
Amir Goldstein <amir73il(a)gmail.com>
fanotify: prepare for setting event flags in ignore mask
Oliver Ford <ojford(a)gmail.com>
fs: inotify: Fix typo in inotify comment
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Decode NFSv4 birth time attribute
Amir Goldstein <amir73il(a)gmail.com>
fanotify: refine the validation checks on non-dir inode mask
NeilBrown <neilb(a)suse.de>
NFS: restore module put when manager exits.
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix potential use-after-free in nfsd_file_put()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_put() can sleep
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add documenting comment for nfsd4_release_lockowner()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Modernize nfsd4_release_lockowner()
Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
nfsd: Fix null-ptr-deref in nfsd_fill_super()
Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
nfsd: Unregister the cld notifier when laundry_wq create failed
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Use RMW bitops in single-threaded hot paths
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace filecache opens
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move documenting comment for nfsd4_process_open2()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix whitespace
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove dprintk call sites from tail of nfsd4_open()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Instantiate a struct file when creating a regular NFSv4 file
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_open_verified()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove do_nfsd_create()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor NFSv4 OPEN(CREATE)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor NFSv3 CREATE
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_create_setattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd3_proc_create()
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: Show state of courtesy client in client info
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for lock conflict to courteous server
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: add helper locks_owner_has_blockers to check for blockers
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for share reservation conflict to courteous server
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add courteous server support for thread with only delegation
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_splice_actor()
Vasily Averin <vvs(a)openvz.org>
fanotify: fix incorrect fmode_t casts
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: consistent behavior for parent not watching children
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: introduce mark type iterator
Amir Goldstein <amir73il(a)gmail.com>
fanotify: enable "evictable" inode marks
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
fanotify: implement "evictable" inode marks
Amir Goldstein <amir73il(a)gmail.com>
fanotify: factor out helper fanotify_mark_update_flags()
Amir Goldstein <amir73il(a)gmail.com>
fanotify: create helper fanotify_mark_user_flags()
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: allow adding an inode mark without pinning inode
Amir Goldstein <amir73il(a)gmail.com>
dnotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
nfsd: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
inotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: create helpers for group mark_mutex lock
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: make allow_dups a property of the group
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass flags argument to fsnotify_alloc_group()
Amir Goldstein <amir73il(a)gmail.com>
inotify: move control flags from mask to mark flags
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
Amir Goldstein <amir73il(a)gmail.com>
fanotify: do not allow setting dirent events in mask of non-dir
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Clean up nfsd_file_put()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Fix a write performance regression
Bang Li <libang.linuxer(a)gmail.com>
fsnotify: remove redundant parameter judgment
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: optimize FS_MODIFY events with no ignored masks
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: fix merge with parent's ignored mask
Jakob Koschel <jakobkoschel(a)gmail.com>
nfsd: fix using the correct variable for sizeof()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up _lm_ operation names
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove CONFIG_NFSD_V3
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove svc_serv_ops::svo_module
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove svc_shutdown_net()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Rename svc_close_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Rename svc_create_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove svo_shutdown method
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove the .svo_enqueue_xprt method
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove NFSD_PROC_ARGS_* macros
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Streamline the rare "found" case
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Skip extra computation for RC_NOCACHE case
Chuck Lever <chuck.lever(a)oracle.com>
orDate: Thu Sep 30 19:19:57 2021 -0400
Ondrej Valousek <ondrej.valousek.xm(a)renesas.com>
nfsd: Add support for the birth time attribute
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Deprecate NFS_OFFSET_MAX
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: invalidate dcache before IN_DELETE event
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move fill_pre_wcc() and fill_post_wcc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace boot verifier resets
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename boot verifier functions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up the nfsd_net::nfssvc_boot field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Write verifier might go backwards
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_vfs_write()
Jeff Layton <jeff.layton(a)primarydata.com>
nfsd: Retry once in nfsd_open on an -EOPENSTALE return
Jeff Layton <jeff.layton(a)primarydata.com>
nfsd: Add errno mapping for EREMOTEIO
Peng Tao <tao.peng(a)primarydata.com>
nfsd: map EBADF
Vasily Averin <vvs(a)virtuozzo.com>
nfsd4: add refcount for nfsd4_blocked_lock
J. Bruce Fields <bfields(a)redhat.com>
nfs: block notification on fs with its own ->lock
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: De-duplicate nfsd4_decode_bitmap4()
J. Bruce Fields <bfields(a)redhat.com>
nfsd: improve stateid access bitmask documentation
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Combine XDR error tracepoints
NeilBrown <neilb(a)suse.de>
NFSD: simplify per-net file cache management
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
NFSD: Fix inconsistent indenting
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove be32_to_cpu() from DRC hash function
NeilBrown <neilb(a)suse.de>
NFS: switch the callback service back to non-pooled.
NeilBrown <neilb(a)suse.de>
lockd: use svc_set_num_threads() for thread start and stop
NeilBrown <neilb(a)suse.de>
SUNRPC: always treat sv_nrpools==1 as "not pooled"
NeilBrown <neilb(a)suse.de>
SUNRPC: move the pool_map definitions (back) into svc.c
NeilBrown <neilb(a)suse.de>
lockd: rename lockd_create_svc() to lockd_get()
NeilBrown <neilb(a)suse.de>
lockd: introduce lockd_put()
NeilBrown <neilb(a)suse.de>
lockd: move svc_exit_thread() into the thread
NeilBrown <neilb(a)suse.de>
lockd: move lockd_start_svc() call into lockd_create_svc()
NeilBrown <neilb(a)suse.de>
lockd: simplify management of network status notifiers
NeilBrown <neilb(a)suse.de>
lockd: introduce nlmsvc_serv
NeilBrown <neilb(a)suse.de>
NFSD: simplify locking for network notifier.
NeilBrown <neilb(a)suse.de>
SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
NeilBrown <neilb(a)suse.de>
NFSD: Make it possible to use svc_set_num_threads_sync
NeilBrown <neilb(a)suse.de>
NFSD: narrow nfsd_mutex protection in nfsd thread
NeilBrown <neilb(a)suse.de>
SUNRPC: use sv_lock to protect updates to sv_nrthreads.
NeilBrown <neilb(a)suse.de>
nfsd: make nfsd_stats.th_cnt atomic_t
NeilBrown <neilb(a)suse.de>
SUNRPC: stop using ->sv_nrthreads as a refcount
NeilBrown <neilb(a)suse.de>
SUNRPC/NFSD: clean up get/put functions.
NeilBrown <neilb(a)suse.de>
SUNRPC: change svc_get() to return the svc.
NeilBrown <neilb(a)suse.de>
NFSD: handle errors better in write_ports_addfd()
Eric W. Biederman <ebiederm(a)xmission.com>
exit: Rename module_put_and_exit to module_put_and_kthread_exit
Eric W. Biederman <ebiederm(a)xmission.com>
exit: Implement kthread_exit
Amir Goldstein <amir73il(a)gmail.com>
fanotify: wire up FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: report old and/or new parent+name in FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: record either old name new name or both for FAN_RENAME
Amir Goldstein <amir73il(a)gmail.com>
fanotify: record old and new parent and name in FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: support secondary dir fh and name in fanotify_info
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use helpers to parcel fanotify_info buffer
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use macros to get the offset to fanotify_info buffer
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: generate FS_RENAME event with rich information
Amir Goldstein <amir73il(a)gmail.com>
fanotify: introduce group flag FAN_REPORT_TARGET_FID
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: separate mark iterator type from object type enum
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: clarify object type argument
Gabriel Krisman Bertazi <krisman(a)collabora.com>
ext4: fix error code saved on super block during file system abort
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: remove obselete comment
Changcheng Deng <deng.changcheng(a)zte.com.cn>
NFSD:fix boolreturn.cocci warning
J. Bruce Fields <bfields(a)redhat.com>
nfsd: update create verifier comment
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Change return value type of .pc_encode
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Save location of NFSv4 COMPOUND status
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Change return value type of .pc_decode
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
Colin Ian King <colin.king(a)canonical.com>
NFSD: Initialize pointer ni with NULL and not plain integer 0
NeilBrown <neilb(a)suse.de>
NFSD: simplify struct nfsfh
NeilBrown <neilb(a)suse.de>
NFSD: drop support for ancient filehandles
NeilBrown <neilb(a)suse.de>
NFSD: move filehandle format declarations out of "uapi".
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize DRC bucket pruning
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Move NFS protocol display macros to global header
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Move generic FS show macros to global header
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Remove unnecessary TRACE_DEFINE_ENUM()s
Gabriel Krisman Bertazi <krisman(a)collabora.com>
docs: Document the FAN_FS_ERROR event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
ext4: Send notifications on error
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Allow users to request FAN_FS_ERROR events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Emit generic error info for error event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Report fid info for file related file system errors
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: WARN_ON against too large file handles
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Add helpers to decide whether to report FID/DFID
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Wrap object_fh inline space in a creator macro
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support merging of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support enqueueing of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Pre-allocate pool of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Reserve UAPI bits for FAN_FS_ERROR
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Support FS_ERROR event type
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Require fid_mode for any non-fd event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Encode empty file handle when no inode is provided
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Allow file handle encoding for unhashed events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support null inode event in fanotify_dfid_inode
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Pass group argument to free_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Protect fsnotify_handle_inode_event from no-inode events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Retrieve super block from the data field
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Add wrapper around fsnotify_add_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Add helper to detect overflow_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
inotify: Don't force FS_IN_IGNORED
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Split fsid check from other fid mode checks
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Fold event size calculation to its own function
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Don't insert unmergeable events in hashtable
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: clarify contract for create event hooks
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass dentry instead of inode data
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass data_type to fsnotify_name()
Peter Zijlstra <peterz(a)infradead.org>
x86/static_call: Add support for Jcc tail-calls
Peter Zijlstra <peterz(a)infradead.org>
x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
Peter Zijlstra <peterz(a)infradead.org>
x86/alternatives: Introduce int3_emulate_jcc()
Thomas Gleixner <tglx(a)linutronix.de>
x86/asm: Differentiate between code and function alignment
Peter Zijlstra <peterz(a)infradead.org>
arch: Introduce CONFIG_FUNCTION_ALIGNMENT
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/rfds: Mitigate Register File Data Sampling (RFDS)
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Documentation/hw-vuln: Add documentation for RFDS
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson <seanjc(a)google.com>
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/entry_32: Add VERW just before userspace transition
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/entry_64: Add VERW just before userspace transition
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Add asm helpers for executing VERW
H. Peter Anvin (Intel) <hpa(a)zytor.com>
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Oliver Upton <oliver.upton(a)linux.dev>
KVM: arm64: Limit stage2_apply_range() batch size to largest block
Oliver Upton <oliver.upton(a)linux.dev>
KVM: arm64: Work out supported block level at compile time
Rickard x Andersson <rickaran(a)axis.com>
tty: serial: imx: Fix broken RS485
John Ogness <john.ogness(a)linutronix.de>
printk: Update @console_may_schedule in console_trylock_spinning()
Nicolin Chen <nicolinc(a)nvidia.com>
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
John Garry <john.garry(a)huawei.com>
dma-iommu: add iommu_dma_opt_mapping_size()
John Garry <john.garry(a)huawei.com>
dma-mapping: add dma_opt_mapping_size()
Will Deacon <will(a)kernel.org>
swiotlb: Fix alignment checks when both allocation and DMA masks are present
David Laight <David.Laight(a)ACULAB.COM>
minmax: add umin(a, b) and umax(a, b)
André Rösti <an.roesti(a)gmail.com>
entry: Respect changes to system call number by trace_sys_enter()
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
clocksource/drivers/arm_global_timer: Fix maximum prescaler value
Jarred White <jarredwhite(a)linux.microsoft.com>
ACPI: CPPC: Use access_width over bit_width for system memory accesses
Maximilian Heyne <mheyne(a)amazon.de>
xen/events: close evtchn after mapping cleanup
Heiner Kallweit <hkallweit1(a)gmail.com>
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
Sumit Garg <sumit.garg(a)linaro.org>
tee: optee: Fix kernel panic caused by incorrect error handling
Bart Van Assche <bvanassche(a)acm.org>
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
Nicolas Pitre <nico(a)fluxnic.net>
vt: fix unicode buffer corruption when deleting characters
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: me: add arrow lake point H DID
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: me: add arrow lake point S DID
Sherry Sun <sherry.sun(a)nxp.com>
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: port: Don't try to peer unused USB ports based on location
Krishna Kurapati <quic_kriskura(a)quicinc.com>
usb: gadget: ncm: Fix handling of zero block length packets
Alan Stern <stern(a)rowland.harvard.edu>
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
Nirmoy Das <nirmoy.das(a)intel.com>
drm/i915: Check before removing mm notifier
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracing: Use .flush() call to wake up readers
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
Nathan Chancellor <nathan(a)kernel.org>
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
Michael Kelley <mhklinux(a)outlook.com>
Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: reject constant set with timeout
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: disallow anonymous set with timeout flag
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value"
Geert Uytterhoeven <geert+renesas(a)glider.be>
net: ravb: Add R-Car Gen4 support
Anton Altaparmakov <anton(a)tuxera.com>
x86/pm: Work around false positive kmemleak report in msr_build_context()
Mikulas Patocka <mpatocka(a)redhat.com>
dm snapshot: fix lockup in dm_exception_table_exit
Leo Ma <hanghong.ma(a)amd.com>
drm/amd/display: Fix noise issue on HDMI AV mute
Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
drm/amd/display: Return the correct HDCP error code
Philip Yang <Philip.Yang(a)amd.com>
drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
Conrad Kostecki <conikost(a)gentoo.org>
ahci: asm1064: asm1166: don't limit reported ports
Andrey Jr. Melnikov <temnota.am(a)gmail.com>
ahci: asm1064: correct count of reported ports
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: netlink: access device through ctx instead of peer
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: netlink: check for dangling peer via is_dead instead of empty list
Steven Rostedt (Google) <rostedt(a)goodmis.org>
net: hns3: tracing: fix hclgevf trace event strings
Steven Rostedt (Google) <rostedt(a)goodmis.org>
NFSD: Fix nfsd_clid_class use of __string_len() macro
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/CPU/AMD: Update the Zenbleed microcode revisions
Marek Szyprowski <m.szyprowski(a)samsung.com>
cpufreq: dt: always allocate zeroed cpumask
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: prevent kernel bug at submit_bh_wbc()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
Qiang Zhang <qiang4.zhang(a)intel.com>
memtest: use {READ,WRITE}_ONCE in memory scanning
Jani Nikula <jani.nikula(a)intel.com>
drm/vc4: hdmi: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/imx/ipuv3: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/exynos: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/panel: do not return negative error codes from drm_panel_get_modes()
Harald Freudenberger <freude(a)linux.ibm.com>
s390/zcrypt: fix reference counting on zcrypt card objects
Sean Anderson <sean.anderson(a)linux.dev>
soc: fsl: qbman: Use raw spinlock for cgr_lock
Sean Anderson <sean.anderson(a)seco.com>
soc: fsl: qbman: Add CGR update function
Sean Anderson <sean.anderson(a)seco.com>
soc: fsl: qbman: Add helper for sanity checking cgr ops
Sean Anderson <sean.anderson(a)linux.dev>
soc: fsl: qbman: Always disable interrupts when taking cgr_lock
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix full_waiters_pending in poll
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix resetting of shortest_full
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Do not set shortest_full when full target is hit
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix waking up ring buffer readers
Marios Makassikis <mmakassikis(a)freebox.fr>
ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: Disable virqfds on cleanup
Niklas Cassel <cassel(a)kernel.org>
PCI: dwc: endpoint: Fix advertised resizable BAR size
Nathan Chancellor <nathan(a)kernel.org>
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
Josef Bacik <josef(a)toxicpanda.com>
nfs: fix UAF in direct writes
Stanislaw Gruszka <stanislaw.gruszka(a)linux.intel.com>
PCI/AER: Block runtime suspend when handling errors
Samuel Thibault <samuel.thibault(a)ens-lyon.org>
speakup: Fix 8bit characters from direct synth
Wayne Chang <waynec(a)nvidia.com>
usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Add API to retrieve the port number of phy
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
slimbus: core: Remove usage of the deprecated ida_simple_xx() API
Jerome Brunet <jbrunet(a)baylibre.com>
nvmem: meson-efuse: fix function pointer type mismatch
Maximilian Heyne <mheyne(a)amazon.de>
ext4: fix corruption during on-line resize
Josua Mayer <josua(a)solid-run.com>
hwmon: (amc6821) add of_match table
Mickaël Salaün <mic(a)digikod.net>
landlock: Warn once if a Landlock action is requested while disabled
Christian Gmeiner <cgmeiner(a)igalia.com>
drm/etnaviv: Restore some id values
Dominique Martinet <dominique.martinet(a)atmark-techno.com>
mmc: core: Fix switch on gp3 partition
Ryan Roberts <ryan.roberts(a)arm.com>
mm: swap: fix race between free_swap_and_cache() and swapoff()
Huang Ying <ying.huang(a)intel.com>
swap: comments get_swap_device() with usage rule
Fedor Pchelkin <pchelkin(a)ispras.ru>
mac802154: fix llsec key resources release in mac802154_llsec_key_del
Yu Kuai <yukuai3(a)huawei.com>
dm-raid: fix lockdep waring in "pers->hot_add_disk"
Paul Menzel <pmenzel(a)molgen.mpg.de>
PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/DPC: Quirk PIO log size for certain Intel Root Ports
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited
Bjorn Helgaas <bhelgaas(a)google.com>
PCI: Work around Intel I210 ROM BAR overlap defect
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI/PM: Drain runtime-idle callbacks before driver removal
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
PCI: Drop pci_device_remove() test of pci_dev->driver
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
Peter Collingbourne <pcc(a)google.com>
serial: Lock console when calling into driver before registration
Petr Mladek <pmladek(a)suse.com>
printk/console: Split out code that enables default console
Jameson Thies <jthies(a)google.com>
usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: don't unhash root
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: fix root lookup with nonzero generation
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
mmc: tmio: avoid concurrent runs of mmc_request_done()
Qingliang Li <qingliang.li(a)mediatek.com>
PM: sleep: wakeirq: fix wake irq warning in system suspend
Toru Katagiri <Toru.Katagiri(a)tdk.com>
USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
Aurélien Jacobs <aurel(a)gnuage.org>
USB: serial: option: add MeiG Smart SLM320 product
Christian Häggström <christian.haggstrom(a)orexplore.com>
USB: serial: cp210x: add ID for MGP Instruments PDS100
Cameron Williams <cang1(a)live.co.uk>
USB: serial: add device ID for VeriFone adapter
Daniel Vogelbacher <daniel(a)chaospixel.com>
USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/fsl: Fix mfpmr build errors with newer binutils
Prashanth K <quic_prashk(a)quicinc.com>
usb: xhci: Add error handling in xhci_map_urb_for_dma
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
Maulik Shah <quic_mkshah(a)quicinc.com>
PM: suspend: Set mem_sleep_current during kernel command line setup
Guenter Roeck <linux(a)roeck-us.net>
parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix csum_ipv6_magic on 64-bit systems
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix csum_ipv6_magic on 32-bit systems
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix ip_fast_csum
John David Anglin <dave.anglin(a)bell.net>
parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros
Arseniy Krasnov <avkrasnov(a)salutedevices.com>
mtd: rawnand: meson: fix scrambling mode value in command macro
Zhang Yi <yi.zhang(a)huawei.com>
ubi: correct the calculation of fastmap size
Richard Weinberger <richard(a)nod.at>
ubi: Check for too small LEB size in VTBL code
Matthew Wilcox (Oracle) <willy(a)infradead.org>
ubifs: Set page uptodate in the correct place
Jan Kara <jack(a)suse.cz>
fat: fix uninitialized field in nostale filehandles
Matthew Wilcox (Oracle) <willy(a)infradead.org>
bounds: support non-power-of-two CONFIG_NR_CPUS
Arnd Bergmann <arnd(a)arndb.de>
kasan/test: avoid gcc warning for intentional overflow
Peter Collingbourne <pcc(a)google.com>
kasan: test: add memcpy test that avoids out-of-bounds write
Damien Le Moal <dlemoal(a)kernel.org>
block: Clear zone limits for a non-zoned stacked queue
Baokun Li <libaokun1(a)huawei.com>
ext4: correct best extent lstart adjustment logic
SeongJae Park <sj(a)kernel.org>
selftests/mqueue: Set timeout to 180 seconds
Damian Muszynski <damian.muszynski(a)intel.com>
crypto: qat - resolve race condition during AER recovery
Svyatoslav Pankratov <svyatoslav.pankratov(a)intel.com>
crypto: qat - fix double free during reset
Randy Dunlap <rdunlap(a)infradead.org>
sparc: vDSO: fix return value of __setup handler
Randy Dunlap <rdunlap(a)infradead.org>
sparc64: NMI watchdog: fix return value of __setup handler
Sean Christopherson <seanjc(a)google.com>
KVM: Always flush async #PF workqueue when vCPU is being destroyed
Gui-Dong Han <2045gemini(a)gmail.com>
media: xc4000: Fix atomicity violation in xc4000_get_frequency
Philipp Stanner <pstanner(a)redhat.com>
pci_iounmap(): Fix MMIO mapping leak
Zack Rusin <zack.rusin(a)broadcom.com>
drm/vmwgfx: Fix possible null pointer derefence with invalid contexts
Duje Mihanović <duje.mihanovic(a)skole.hr>
arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
Roberto Sassu <roberto.sassu(a)huawei.com>
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
Roberto Sassu <roberto.sassu(a)huawei.com>
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
Amit Pundir <amit.pundir(a)linaro.org>
clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd
Hidenori Kobayashi <hidenorik(a)chromium.org>
media: staging: ipu3-imgu: Set fields before media_entity_pads_init()
Zheng Wang <zyytlz.wz(a)163.com>
wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
Thomas Gleixner <tglx(a)linutronix.de>
timers: Rename del_timer_sync() to timer_delete_sync()
Thomas Gleixner <tglx(a)linutronix.de>
timers: Use del_timer_sync() even on UP
Thomas Gleixner <tglx(a)linutronix.de>
timers: Update kernel-doc for various functions
Jim Mattson <jmattson(a)google.com>
KVM: x86: Use a switch statement and macros in __feature_translate()
Jim Mattson <jmattson(a)google.com>
KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs
Borislav Petkov <bp(a)suse.de>
x86/bugs: Use sysfs_emit()
Kim Phillips <kim.phillips(a)amd.com>
x86/cpu: Support AMD Automatic IBRS
Lin Yujun <linyujun809(a)huawei.com>
Documentation/hw-vuln: Update spectre doc
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
.../admin-guide/filesystem-monitoring.rst | 74 ++
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 ++
Documentation/admin-guide/hw-vuln/spectre.rst | 18 +-
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/kernel-parameters.txt | 27 +-
Documentation/core-api/dma-api.rst | 14 +
Documentation/filesystems/locking.rst | 10 +-
Documentation/filesystems/nfs/exporting.rst | 33 +
Documentation/x86/mds.rst | 34 +-
MAINTAINERS | 7 +
Makefile | 8 +-
arch/Kconfig | 24 +
arch/arm/boot/dts/mmp2-brownstone.dts | 2 +-
arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi | 2 +
arch/arm64/include/asm/kvm_pgtable.h | 18 +-
arch/arm64/include/asm/stage2_pgtable.h | 20 -
arch/arm64/kvm/mmu.c | 9 +-
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/Kconfig | 1 +
arch/ia64/Makefile | 2 +-
arch/openrisc/kernel/dma.c | 16 +-
arch/parisc/include/asm/assembly.h | 18 +-
arch/parisc/include/asm/checksum.h | 10 +-
arch/powerpc/include/asm/reg_fsl_emb.h | 11 +-
arch/powerpc/lib/Makefile | 2 +-
arch/riscv/include/asm/uaccess.h | 4 +-
arch/riscv/kernel/process.c | 3 -
arch/s390/kernel/entry.S | 1 +
arch/sparc/kernel/nmi.c | 2 +-
arch/sparc/vdso/vma.c | 7 +-
arch/x86/Kconfig | 13 +
arch/x86/boot/compressed/head_64.S | 8 +
arch/x86/entry/entry.S | 23 +
arch/x86/entry/entry_32.S | 3 +
arch/x86/entry/entry_64.S | 11 +
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/asm-prototypes.h | 1 +
arch/x86/include/asm/asm.h | 5 +
arch/x86/include/asm/cpufeature.h | 8 +-
arch/x86/include/asm/cpufeatures.h | 6 +-
arch/x86/include/asm/disabled-features.h | 3 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/linkage.h | 12 +-
arch/x86/include/asm/msr-index.h | 10 +
arch/x86/include/asm/nospec-branch.h | 47 +-
arch/x86/include/asm/required-features.h | 3 +-
arch/x86/include/asm/suspend_32.h | 10 +-
arch/x86/include/asm/text-patching.h | 31 +
arch/x86/kernel/alternative.c | 56 +-
arch/x86/kernel/cpu/amd.c | 10 +-
arch/x86/kernel/cpu/bugs.c | 245 ++--
arch/x86/kernel/cpu/common.c | 57 +-
arch/x86/kernel/cpu/mce/core.c | 4 +-
arch/x86/kernel/kprobes/core.c | 38 +-
arch/x86/kernel/nmi.c | 3 -
arch/x86/kernel/static_call.c | 50 +-
arch/x86/kvm/cpuid.c | 29 +-
arch/x86/kvm/reverse_cpuid.h | 44 +-
arch/x86/kvm/svm/sev.c | 18 +-
arch/x86/kvm/vmx/run_flags.h | 7 +-
arch/x86/kvm/vmx/vmenter.S | 9 +-
arch/x86/kvm/vmx/vmx.c | 12 +-
arch/x86/kvm/x86.c | 17 +-
arch/x86/lib/retpoline.S | 5 +-
arch/x86/mm/ident_map.c | 23 +-
block/blk-settings.c | 4 +
crypto/algboss.c | 4 +-
drivers/accessibility/speakup/synth.c | 4 +-
drivers/acpi/acpica/dbnames.c | 8 +-
drivers/acpi/cppc_acpi.c | 27 +-
drivers/ata/ahci.c | 5 -
drivers/ata/sata_mv.c | 63 +-
drivers/ata/sata_sx4.c | 6 +-
drivers/base/core.c | 26 +-
drivers/base/cpu.c | 8 +
drivers/base/power/wakeirq.c | 4 +-
drivers/clk/qcom/gcc-ipq6018.c | 2 +
drivers/clk/qcom/gcc-ipq8074.c | 2 +
drivers/clk/qcom/gcc-sdm845.c | 1 +
drivers/clk/qcom/mmcc-apq8084.c | 2 +
drivers/clk/qcom/mmcc-msm8974.c | 2 +
drivers/clocksource/arm_global_timer.c | 2 +-
drivers/cpufreq/brcmstb-avs-cpufreq.c | 5 +-
drivers/cpufreq/cpufreq-dt.c | 2 +-
drivers/crypto/qat/qat_common/adf_aer.c | 23 +-
drivers/firmware/efi/vars.c | 17 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +-
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 12 +-
.../gpu/drm/amd/display/modules/hdcp/hdcp_psp.c | 3 +
drivers/gpu/drm/drm_panel.c | 17 +-
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_hwdb.c | 9 +
drivers/gpu/drm/exynos/exynos_drm_vidi.c | 4 +-
drivers/gpu/drm/exynos/exynos_hdmi.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +
drivers/gpu/drm/i915/gt/intel_engine_pm.c | 3 -
.../gpu/drm/i915/gt/intel_execlists_submission.c | 3 +
drivers/gpu/drm/imx/parallel-display.c | 4 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 14 +-
drivers/hid/uhid.c | 20 +-
drivers/hwmon/amc6821.c | 11 +
drivers/i2c/busses/i2c-i801.c | 4 +-
drivers/infiniband/core/cm_trace.h | 2 +-
drivers/infiniband/core/cma_trace.h | 2 +-
drivers/iommu/dma-iommu.c | 15 +
drivers/iommu/iova.c | 5 +
drivers/md/dm-integrity.c | 2 +-
drivers/md/dm-raid.c | 2 +
drivers/md/dm-snap.c | 4 +-
drivers/media/tuners/xc4000.c | 4 +-
drivers/misc/mei/hw-me-regs.h | 2 +
drivers/misc/mei/pci-me.c | 2 +
drivers/mmc/core/block.c | 14 +-
drivers/mmc/host/tmio_mmc_core.c | 2 +
drivers/mtd/nand/raw/meson_nand.c | 2 +-
drivers/mtd/ubi/fastmap.c | 7 +-
drivers/mtd/ubi/vtbl.c | 6 +
drivers/net/ethernet/freescale/fec_main.c | 11 +-
.../ethernet/hisilicon/hns3/hns3pf/hclge_trace.h | 8 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h | 8 +-
drivers/net/ethernet/intel/i40e/i40e.h | 6 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 14 +-
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 6 +-
drivers/net/ethernet/intel/i40e/i40e_register.h | 3 +
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 82 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 5 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 +-
drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 5 +
.../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 2 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
.../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 31 +-
drivers/net/ethernet/realtek/r8169_main.c | 11 +-
drivers/net/ethernet/renesas/ravb_main.c | 8 +-
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 40 +-
.../net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 38 +-
drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +-
drivers/net/usb/asix.h | 3 +
drivers/net/usb/asix_devices.c | 20 +-
drivers/net/wireguard/netlink.c | 10 +-
.../broadcom/brcm80211/brcmfmac/cfg80211.c | 4 +-
drivers/net/wireless/intel/iwlwifi/mvm/rfi.c | 12 +-
drivers/net/xen-netfront.c | 1 +
drivers/nvme/host/core.c | 6 +-
drivers/nvmem/meson-efuse.c | 25 +-
drivers/of/dynamic.c | 12 +
drivers/pci/controller/dwc/pcie-designware-ep.c | 7 +-
drivers/pci/pci-driver.c | 23 +-
drivers/pci/pcie/dpc.c | 15 +-
drivers/pci/pcie/err.c | 20 +
drivers/pci/quirks.c | 100 ++
drivers/pci/setup-res.c | 8 +-
drivers/phy/tegra/xusb.c | 13 +
drivers/s390/crypto/zcrypt_api.c | 2 +
drivers/s390/net/qeth_core_main.c | 38 +-
drivers/scsi/hosts.c | 7 +-
drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
drivers/scsi/myrb.c | 20 +-
drivers/scsi/myrs.c | 24 +-
drivers/scsi/qla2xxx/qla_attr.c | 14 +-
drivers/scsi/qla2xxx/qla_def.h | 2 +-
drivers/scsi/qla2xxx/qla_gbl.h | 2 +-
drivers/scsi/qla2xxx/qla_gs.c | 2 +-
drivers/scsi/qla2xxx/qla_init.c | 128 +--
drivers/scsi/qla2xxx/qla_iocb.c | 68 +-
drivers/scsi/qla2xxx/qla_mbx.c | 2 +-
drivers/scsi/qla2xxx/qla_os.c | 2 +-
drivers/scsi/qla2xxx/qla_target.c | 10 +
drivers/slimbus/core.c | 4 +-
drivers/soc/fsl/qbman/qman.c | 98 +-
drivers/staging/media/ipu3/ipu3-v4l2.c | 16 +-
.../staging/vc04_services/vchiq-mmal/mmal-vchiq.c | 5 +-
drivers/tee/optee/device.c | 3 +-
drivers/thermal/devfreq_cooling.c | 2 +-
drivers/tty/serial/8250/8250_port.c | 6 -
drivers/tty/serial/fsl_lpuart.c | 7 +-
drivers/tty/serial/imx.c | 22 +-
drivers/tty/serial/sc16is7xx.c | 15 +-
drivers/tty/serial/serial_core.c | 12 +
drivers/tty/vt/vt.c | 2 +-
drivers/usb/class/cdc-wdm.c | 6 +-
drivers/usb/core/hub.c | 23 +-
drivers/usb/core/hub.h | 2 +
drivers/usb/core/port.c | 5 +-
drivers/usb/core/sysfs.c | 16 +-
drivers/usb/dwc2/core.h | 14 +
drivers/usb/dwc2/core_intr.c | 72 +-
drivers/usb/dwc2/gadget.c | 10 +
drivers/usb/dwc2/hcd.c | 49 +-
drivers/usb/dwc2/hcd_ddma.c | 17 +-
drivers/usb/dwc2/hw.h | 2 +-
drivers/usb/dwc2/platform.c | 2 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/gadget/udc/core.c | 4 +-
drivers/usb/gadget/udc/tegra-xudc.c | 39 +-
drivers/usb/host/xhci.c | 2 +
drivers/usb/phy/phy-generic.c | 7 -
drivers/usb/serial/cp210x.c | 4 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 6 +
drivers/usb/serial/option.c | 6 +
drivers/usb/storage/isd200.c | 23 +-
drivers/usb/storage/scsiglue.c | 1 -
drivers/usb/storage/uas.c | 81 +-
drivers/usb/storage/usb.c | 4 +-
drivers/usb/typec/ucsi/ucsi.c | 52 +-
drivers/usb/typec/ucsi/ucsi.h | 4 +-
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 +-
drivers/vfio/pci/vfio_pci_intrs.c | 188 +--
drivers/vfio/platform/vfio_platform_irq.c | 106 +-
drivers/vfio/virqfd.c | 21 +
drivers/xen/events/events_base.c | 5 +-
fs/Kconfig | 2 +-
fs/aio.c | 8 +-
fs/btrfs/scrub.c | 12 +-
fs/btrfs/volumes.c | 2 +-
fs/cifs/connect.c | 2 +-
fs/exec.c | 1 +
fs/exportfs/expfs.c | 8 +-
fs/ext4/mballoc.c | 17 +-
fs/ext4/resize.c | 3 +-
fs/ext4/super.c | 10 +-
fs/fat/nfs.c | 6 +
fs/fuse/dir.c | 4 +
fs/fuse/fuse_i.h | 1 -
fs/fuse/inode.c | 7 +-
fs/ksmbd/smb2pdu.c | 10 +-
fs/lockd/host.c | 2 +-
fs/lockd/svc.c | 223 ++--
fs/lockd/svc4proc.c | 29 +-
fs/lockd/svclock.c | 31 +-
fs/lockd/svcproc.c | 30 +-
fs/lockd/svcsubs.c | 4 +-
fs/lockd/xdr.c | 152 ++-
fs/lockd/xdr4.c | 153 ++-
fs/locks.c | 85 +-
fs/nfs/callback.c | 105 +-
fs/nfs/callback_xdr.c | 5 +-
fs/nfs/direct.c | 11 +-
fs/nfs/export.c | 9 +-
fs/nfs/nfs4state.c | 2 +-
fs/nfs/nfs4trace.h | 477 +-------
fs/nfs/nfstrace.h | 269 +----
fs/nfs/pnfs.h | 4 -
fs/nfs/write.c | 2 +-
fs/nfsd/Kconfig | 27 +-
fs/nfsd/Makefile | 8 +-
fs/nfsd/acl.h | 6 +-
fs/nfsd/blocklayout.c | 1 +
fs/nfsd/blocklayoutxdr.c | 1 +
fs/nfsd/cache.h | 2 +-
fs/nfsd/export.h | 1 -
fs/nfsd/filecache.c | 1192 +++++++++++---------
fs/nfsd/filecache.h | 19 +-
fs/nfsd/flexfilelayout.c | 3 +-
fs/nfsd/lockd.c | 2 +-
fs/nfsd/netns.h | 34 +-
fs/nfsd/nfs2acl.c | 55 +-
fs/nfsd/nfs3acl.c | 83 +-
fs/nfsd/nfs3proc.c | 212 +++-
fs/nfsd/nfs3xdr.c | 444 +++-----
fs/nfsd/nfs4acl.c | 46 +-
fs/nfsd/nfs4callback.c | 125 +-
fs/nfsd/nfs4idmap.c | 9 +-
fs/nfsd/nfs4layouts.c | 4 +-
fs/nfsd/nfs4proc.c | 991 +++++++++-------
fs/nfsd/nfs4recover.c | 12 +-
fs/nfsd/nfs4state.c | 1049 +++++++++++++----
fs/nfsd/nfs4xdr.c | 1115 +++++++++---------
fs/nfsd/nfscache.c | 63 +-
fs/nfsd/nfsctl.c | 146 ++-
fs/nfsd/nfsd.h | 35 +-
fs/nfsd/nfsfh.c | 264 ++---
fs/nfsd/nfsfh.h | 145 ++-
fs/nfsd/nfsproc.c | 121 +-
fs/nfsd/nfssvc.c | 275 ++---
fs/nfsd/nfsxdr.c | 178 ++-
fs/nfsd/state.h | 59 +-
fs/nfsd/stats.c | 16 +-
fs/nfsd/stats.h | 4 +-
fs/nfsd/trace.h | 692 ++++++++++--
fs/nfsd/vfs.c | 822 +++++++-------
fs/nfsd/vfs.h | 56 +-
fs/nfsd/xdr.h | 35 +-
fs/nfsd/xdr3.h | 61 +-
fs/nfsd/xdr4.h | 81 +-
fs/nfsd/xdr4cb.h | 6 +
fs/nilfs2/btree.c | 9 +-
fs/nilfs2/direct.c | 9 +-
fs/nilfs2/inode.c | 2 +-
fs/notify/dnotify/dnotify.c | 15 +-
fs/notify/fanotify/fanotify.c | 363 ++++--
fs/notify/fanotify/fanotify.h | 212 +++-
fs/notify/fanotify/fanotify_user.c | 441 ++++++--
fs/notify/fdinfo.c | 16 +-
fs/notify/fsnotify.c | 177 +--
fs/notify/fsnotify.h | 4 -
fs/notify/group.c | 36 +-
fs/notify/inotify/inotify.h | 11 +-
fs/notify/inotify/inotify_fsnotify.c | 7 +-
fs/notify/inotify/inotify_user.c | 53 +-
fs/notify/mark.c | 137 ++-
fs/notify/notification.c | 14 +-
fs/open.c | 42 +
fs/pipe.c | 17 +-
fs/ubifs/file.c | 13 +-
fs/vboxsf/super.c | 3 +-
include/asm-generic/vmlinux.lds.h | 4 +-
include/linux/cpu.h | 2 +
include/linux/device.h | 1 +
include/linux/dma-map-ops.h | 1 +
include/linux/dma-mapping.h | 5 +
include/linux/dnotify.h | 2 +-
include/linux/exportfs.h | 17 +-
include/linux/fanotify.h | 31 +-
include/linux/fs.h | 26 +
include/linux/fsnotify.h | 70 +-
include/linux/fsnotify_backend.h | 356 +++++-
include/linux/gfp.h | 9 +
include/linux/hyperv.h | 22 +-
include/linux/iova.h | 2 +
include/linux/kthread.h | 1 +
include/linux/linkage.h | 4 +-
include/linux/lockd/lockd.h | 10 +-
include/linux/lockd/xdr.h | 27 +-
include/linux/lockd/xdr4.h | 29 +-
include/linux/minmax.h | 17 +
include/linux/module.h | 6 +-
include/linux/nfs.h | 8 -
include/linux/nfs4.h | 17 +
include/linux/nfs_fs.h | 1 +
include/linux/nfs_ssc.h | 4 +-
include/linux/pci.h | 1 +
include/linux/phy/tegra/xusb.h | 1 +
include/linux/ring_buffer.h | 1 +
include/linux/secretmem.h | 4 +-
include/linux/sunrpc/svc.h | 93 +-
include/linux/sunrpc/svc_xprt.h | 11 +-
include/linux/sunrpc/svcsock.h | 7 +-
include/linux/sunrpc/xdr.h | 2 +
include/linux/timer.h | 18 +-
include/linux/udp.h | 28 +
include/linux/vfio.h | 2 +
include/net/cfg802154.h | 1 +
include/net/inet_connection_sock.h | 1 +
include/net/sock.h | 7 +
include/soc/fsl/qman.h | 9 +
include/trace/events/rpcgss.h | 18 +-
include/trace/events/rpcrdma.h | 44 +-
include/trace/events/sunrpc.h | 74 +-
include/trace/misc/fs.h | 122 ++
include/trace/misc/nfs.h | 387 +++++++
include/trace/{events => misc}/rdma.h | 0
include/trace/misc/sunrpc.h | 18 +
include/uapi/linux/fanotify.h | 29 +
include/uapi/linux/nfsd/nfsfh.h | 115 --
init/initramfs.c | 2 +-
io_uring/io_uring.c | 2 +-
kernel/audit_fsnotify.c | 8 +-
kernel/audit_tree.c | 2 +-
kernel/audit_watch.c | 5 +-
kernel/bounds.c | 2 +-
kernel/bpf/verifier.c | 5 +
kernel/dma/mapping.c | 12 +
kernel/dma/swiotlb.c | 11 +-
kernel/entry/common.c | 8 +-
kernel/events/core.c | 9 +
kernel/kthread.c | 23 +-
kernel/locking/rwsem.c | 14 +-
kernel/module.c | 8 +-
kernel/power/suspend.c | 1 +
kernel/printk/printk.c | 63 +-
kernel/time/timer.c | 160 +--
kernel/trace/ring_buffer.c | 233 ++--
kernel/trace/trace.c | 21 +-
lib/Kconfig.debug | 1 +
lib/pci_iomap.c | 2 +-
lib/test_kasan.c | 21 +-
mm/compaction.c | 7 +-
mm/memtest.c | 4 +-
mm/migrate.c | 6 +-
mm/page_alloc.c | 10 +-
mm/swapfile.c | 25 +-
mm/vmscan.c | 5 +-
net/bluetooth/bnep/core.c | 2 +-
net/bluetooth/cmtp/core.c | 2 +-
net/bluetooth/hci_debugfs.c | 64 +-
net/bluetooth/hci_event.c | 25 +
net/bluetooth/hidp/core.c | 2 +-
net/bridge/netfilter/ebtables.c | 6 +
net/core/skbuff.c | 6 +-
net/core/sock_map.c | 6 +
net/ipv4/inet_connection_sock.c | 14 +
net/ipv4/ip_gre.c | 5 +
net/ipv4/netfilter/arp_tables.c | 4 +
net/ipv4/netfilter/ip_tables.c | 4 +
net/ipv4/tcp.c | 2 +
net/ipv4/udp.c | 7 +
net/ipv4/udp_offload.c | 20 +-
net/ipv6/ip6_fib.c | 14 +-
net/ipv6/ip6_gre.c | 3 +
net/ipv6/netfilter/ip6_tables.c | 4 +
net/ipv6/udp.c | 2 +-
net/ipv6/udp_offload.c | 8 +-
net/mac80211/cfg.c | 5 +-
net/mac802154/llsec.c | 18 +-
net/mptcp/protocol.c | 3 -
net/mptcp/subflow.c | 3 +
net/netfilter/nf_tables_api.c | 20 +-
net/nfc/nci/core.c | 5 +
net/rds/rdma.c | 2 +-
net/sched/act_skbmod.c | 10 +-
net/sunrpc/svc.c | 227 ++--
net/sunrpc/svc_xprt.c | 84 +-
net/sunrpc/svcsock.c | 24 +-
net/sunrpc/xdr.c | 22 +
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +-
net/xfrm/xfrm_user.c | 3 +
scripts/Makefile.extrawarn | 2 +
security/landlock/syscalls.c | 18 +-
security/smack/smack_lsm.c | 12 +-
sound/pci/hda/patch_realtek.c | 9 +-
sound/sh/aica.c | 17 +-
sound/soc/codecs/rt5682-sdw.c | 4 +-
sound/soc/codecs/rt711-sdca-sdw.c | 4 +-
sound/soc/codecs/rt711-sdw.c | 4 +-
sound/soc/soc-ops.c | 2 +-
tools/objtool/check.c | 3 +-
tools/testing/selftests/mqueue/setting | 1 +
tools/testing/selftests/net/mptcp/diag.sh | 6 +-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 7 +
tools/testing/selftests/net/reuseaddr_conflict.c | 2 +-
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
virt/kvm/async_pf.c | 31 +-
439 files changed, 11612 insertions(+), 6882 deletions(-)
The patch titled
Subject: mm/shmem: Inline shmem_is_huge() for disabled transparent hugepages
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Subject: mm/shmem: Inline shmem_is_huge() for disabled transparent hugepages
Date: Tue, 9 Apr 2024 17:54:07 +0200
In order to minimize code size (CONFIG_CC_OPTIMIZE_FOR_SIZE=y),
compiler might choose to make a regular function call (out-of-line) for
shmem_is_huge() instead of inlining it. When transparent hugepages are
disabled (CONFIG_TRANSPARENT_HUGEPAGE=n), it can cause compilation
error.
mm/shmem.c: In function `shmem_getattr':
./include/linux/huge_mm.h:383:27: note: in expansion of macro `BUILD_BUG'
383 | #define HPAGE_PMD_SIZE ({ BUILD_BUG(); 0; })
| ^~~~~~~~~
mm/shmem.c:1148:33: note: in expansion of macro `HPAGE_PMD_SIZE'
1148 | stat->blksize = HPAGE_PMD_SIZE;
To prevent the possible error, always inline shmem_is_huge() when
transparent hugepages are disabled.
Link: https://lkml.kernel.org/r/20240409155407.2322714-1-sumanthk@linux.ibm.com
Signed-off-by: Sumanth Korikkar <sumanthk(a)linux.ibm.com>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Ilya Leoshkevich <iii(a)linux.ibm.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/shmem_fs.h | 9 +++++++++
mm/shmem.c | 6 ------
2 files changed, 9 insertions(+), 6 deletions(-)
--- a/include/linux/shmem_fs.h~mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages
+++ a/include/linux/shmem_fs.h
@@ -110,8 +110,17 @@ extern struct page *shmem_read_mapping_p
extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
int shmem_unuse(unsigned int type);
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
struct mm_struct *mm, unsigned long vm_flags);
+#else
+static __always_inline bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
+ struct mm_struct *mm, unsigned long vm_flags)
+{
+ return false;
+}
+#endif
+
#ifdef CONFIG_SHMEM
extern unsigned long shmem_swap_usage(struct vm_area_struct *vma);
#else
--- a/mm/shmem.c~mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages
+++ a/mm/shmem.c
@@ -748,12 +748,6 @@ static long shmem_unused_huge_count(stru
#define shmem_huge SHMEM_HUGE_DENY
-bool shmem_is_huge(struct inode *inode, pgoff_t index, bool shmem_huge_force,
- struct mm_struct *mm, unsigned long vm_flags)
-{
- return false;
-}
-
static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
struct shrink_control *sc, unsigned long nr_to_split)
{
_
Patches currently in -mm which might be from sumanthk(a)linux.ibm.com are
mm-shmem-inline-shmem_is_huge-for-disabled-transparent-hugepages.patch
The patch titled
Subject: kexec: fix the unexpected kexec_dprintk() macro
has been added to the -mm mm-nonmm-unstable branch. Its filename is
kexec-fix-the-unexpected-kexec_dprintk-macro.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: kexec: fix the unexpected kexec_dprintk() macro
Date: Tue, 9 Apr 2024 12:22:38 +0800
Jiri reported that the current kexec_dprintk() always prints out debugging
message whenever kexec/kdmmp loading is triggered. That is not wanted.
The debugging message is supposed to be printed out when 'kexec -s -d' is
specified for kexec/kdump loading.
After investigating, the reason is the current kexec_dprintk() takes
printk(KERN_INFO) or printk(KERN_DEBUG) depending on whether '-d' is
specified. However, distros usually have defaulg log level like below:
[~]# cat /proc/sys/kernel/printk
7 4 1 7
So, even though '-d' is not specified, printk(KERN_DEBUG) also always
prints out. I thought printk(KERN_DEBUG) is equal to pr_debug(), it's
not.
Fix it by changing to use pr_info() instead which are expected to work.
Link: https://lkml.kernel.org/r/20240409042238.1240462-1-bhe@redhat.com
Fixes: cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control debug printing")
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/4c775fca-5def-4a2d-8437-7130b02722a2@kernel.org
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/kexec.h | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)
--- a/include/linux/kexec.h~kexec-fix-the-unexpected-kexec_dprintk-macro
+++ a/include/linux/kexec.h
@@ -461,10 +461,8 @@ static inline void arch_kexec_pre_free_p
extern bool kexec_file_dbg_print;
-#define kexec_dprintk(fmt, ...) \
- printk("%s" fmt, \
- kexec_file_dbg_print ? KERN_INFO : KERN_DEBUG, \
- ##__VA_ARGS__)
+#define kexec_dprintk(fmt, arg...) \
+ do { if (kexec_file_dbg_print) pr_info(fmt, ##arg); } while (0)
#else /* !CONFIG_KEXEC_CORE */
struct pt_regs;
_
Patches currently in -mm which might be from bhe(a)redhat.com are
mm-vmallocc-optimize-to-reduce-arguments-of-alloc_vmap_area.patch
x86-remove-unneeded-memblock_find_dma_reserve.patch
mm-mm_initc-remove-the-useless-dma_reserve.patch
mm-mm_initc-add-new-function-calc_nr_all_pages.patch
mm-mm_initc-remove-meaningless-calculation-of-zone-managed_pages-in-free_area_init_core.patch
mm-mm_initc-remove-meaningless-calculation-of-zone-managed_pages-in-free_area_init_core-v3.patch
mm-mm_initc-remove-unneeded-calc_memmap_size.patch
mm-mm_initc-remove-arch_reserved_kernel_pages.patch
mm-move-array-mem_section-init-code-out-of-memory_present.patch
mm-init-remove-the-unnecessary-special-treatment-for-memory-less-node.patch
mm-make-__absent_pages_in_range-as-static.patch
mm-page_allocc-remove-unneeded-codes-in-numa-version-of-build_zonelists.patch
mm-page_allocc-remove-unneeded-codes-in-numa-version-of-build_zonelists-v2.patch
mm-mm_initc-remove-the-outdated-code-comment-above-deferred_grow_zone.patch
mm-page_allocc-dont-show-protection-in-zones-lowmem_reserve-for-empty-zone.patch
mm-page_allocc-change-the-array-length-to-migrate_pcptypes.patch
arch-loongarch-clean-up-the-left-code-and-kconfig-item-related-to-crash_core.patch
documentation-kdump-clean-up-the-outdated-description.patch
kexec-fix-the-unexpected-kexec_dprintk-macro.patch
The patch titled
Subject: Squashfs: check the inode number is not the invalid value of zero
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Phillip Lougher <phillip(a)squashfs.org.uk>
Subject: Squashfs: check the inode number is not the invalid value of zero
Date: Mon, 8 Apr 2024 23:02:06 +0100
Syskiller has produced an out of bounds access in fill_meta_index().
That out of bounds access is ultimately caused because the inode
has an inode number with the invalid value of zero, which was not checked.
The reason this causes the out of bounds access is due to following
sequence of events:
1. Fill_meta_index() is called to allocate (via empty_meta_index())
and fill a metadata index. It however suffers a data read error
and aborts, invalidating the newly returned empty metadata index.
It does this by setting the inode number of the index to zero,
which means unused (zero is not a valid inode number).
2. When fill_meta_index() is subsequently called again on another
read operation, locate_meta_index() returns the previous index
because it matches the inode number of 0. Because this index
has been returned it is expected to have been filled, and because
it hasn't been, an out of bounds access is performed.
This patch adds a sanity check which checks that the inode number
is not zero when the inode is created and returns -EINVAL if it is.
Link: https://lkml.kernel.org/r/20240408220206.435788-1-phillip@squashfs.org.uk
Signed-off-by: Phillip Lougher <phillip(a)squashfs.org.uk>
Reported-by: "Ubisectech Sirius" <bugreport(a)ubisectech.com>
Closes: https://lore.kernel.org/lkml/87f5c007-b8a5-41ae-8b57-431e924c5915.bugreport…
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/squashfs/inode.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/fs/squashfs/inode.c~squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero
+++ a/fs/squashfs/inode.c
@@ -48,6 +48,10 @@ static int squashfs_new_inode(struct sup
gid_t i_gid;
int err;
+ inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
+ if(inode->i_ino == 0)
+ return -EINVAL;
+
err = squashfs_get_id(sb, le16_to_cpu(sqsh_ino->uid), &i_uid);
if (err)
return err;
@@ -58,7 +62,6 @@ static int squashfs_new_inode(struct sup
i_uid_write(inode, i_uid);
i_gid_write(inode, i_gid);
- inode->i_ino = le32_to_cpu(sqsh_ino->inode_number);
inode_set_mtime(inode, le32_to_cpu(sqsh_ino->mtime), 0);
inode_set_atime(inode, inode_get_mtime_sec(inode), 0);
inode_set_ctime(inode, inode_get_mtime_sec(inode), 0);
_
Patches currently in -mm which might be from phillip(a)squashfs.org.uk are
squashfs-check-the-inode-number-is-not-the-invalid-value-of-zero.patch
squashfs-remove-deprecated-strncpy-by-not-copying-the-string.patch
From: Elizaveta Gorina <s02220065(a)gse.cs.msu.ru>
If the device is unavailable, of_reserved_mem_device_init_by_name
returns 0, and emc->nominal is not initialized and the null pointer
is dereferenced.
Make a return from the tegra210_emc_probe function when emc->nominal
is equal to the null pointer.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 0553d7b204ef ("memory: tegra: Support derated timings on Tegra210")
Cc: stable(a)vger.kernel.org
Signed-off-by: Elizaveta Gorina <s02220065(a)gse.cs.msu.ru>
---
drivers/memory/tegra/tegra210-emc-core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/memory/tegra/tegra210-emc-core.c b/drivers/memory/tegra/tegra210-emc-core.c
index 78ca1d6c0977..a49a5e36ba34 100644
--- a/drivers/memory/tegra/tegra210-emc-core.c
+++ b/drivers/memory/tegra/tegra210-emc-core.c
@@ -1865,6 +1865,9 @@ static int tegra210_emc_probe(struct platform_device *pdev)
emc->num_timings);
if (err < 0)
goto release;
+ } else {
+ err = -ENODEV;
+ goto release;
}
if (emc->derated) {
--
2.25.1
This is the start of the stable review cycle for the 5.15.154 release.
There are 696 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Thu, 11 Apr 2024 17:27:40 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.15.154-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.15.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.15.154-rc2
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
KVM: x86: Add BHI_NO
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bhi: Mitigate KVM by default
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bhi: Add BHI mitigation knob
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bhi: Enumerate Branch History Injection (BHI) bug
Daniel Sneddon <daniel.sneddon(a)linux.intel.com>
x86/bhi: Define SPEC_CTRL_BHI_DIS_S
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bhi: Add support for clearing branch history at syscall entry
Linus Torvalds <torvalds(a)linux-foundation.org>
x86/syscall: Don't force use of indirect calls for system calls
Josh Poimboeuf <jpoimboe(a)kernel.org>
x86/bugs: Change commas to semicolons in 'spectre_v2' sysfs file
min15.li <min15.li(a)samsung.com>
nvme: fix miss command type check
Antoine Tenart <atenart(a)kernel.org>
gro: fix ownership transfer
David Hildenbrand <david(a)redhat.com>
mm/secretmem: fix GUP-fast succeeding on secretmem folios
Davide Caratti <dcaratti(a)redhat.com>
mptcp: don't account accept() of non-MPC client as fallback to TCP
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/retpoline: Do the necessary fixup to the Zen3/4 srso return thunk for !SRSO
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/bugs: Fix the SRSO mitigation on Zen3/4
Stefan O'Rear <sorear(a)fastmail.com>
riscv: process: Fix kernel gp leakage
Samuel Holland <samuel.holland(a)sifive.com>
riscv: Fix spurious errors from __get/put_kernel_nofault
Sumanth Korikkar <sumanthk(a)linux.ibm.com>
s390/entry: align system call table on 8 bytes
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/mce: Make sure to grab mce_sysfs_mutex in set_bank()
Herve Codina <herve.codina(a)bootlin.com>
of: dynamic: Synchronize of_changeset_destroy() with the devlink removals
Herve Codina <herve.codina(a)bootlin.com>
driver core: Introduce device_link_wait_removal()
I Gede Agastya Darma Laksana <gedeagas22(a)gmail.com>
ALSA: hda/realtek: Update Panasonic CF-SZ6 quirk to support headset with microphone
Jann Horn <jannh(a)google.com>
fs/pipe: Fix lockdep false-positive in watchqueue pipe_write()
Jann Horn <jannh(a)google.com>
openrisc: Fix pagewalk usage in arch_dma_{clear, set}_uncached
Jann Horn <jannh(a)google.com>
HID: uhid: Use READ_ONCE()/WRITE_ONCE() for ->running
Jeff Layton <jlayton(a)kernel.org>
nfsd: hold a lighter-weight client reference over CB_RECALL_ANY
Arnd Bergmann <arnd(a)arndb.de>
ata: sata_mv: Fix PCI device ID table declaration compilation warning
Arnd Bergmann <arnd(a)arndb.de>
scsi: mylex: Fix sysfs buffer lengths
Arnd Bergmann <arnd(a)arndb.de>
ata: sata_sx4: fix pdc20621_get_from_dimm() on 64-bit
Stephen Lee <slee08177(a)gmail.com>
ASoC: ops: Fix wraparound for mask in snd_soc_get_volsw
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt711-sdw: fix locking sequence
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt711-sdca: fix locking sequence
Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
ASoC: rt5682-sdw: fix locking sequence
Paul Barker <paul.barker.ct(a)bp.renesas.com>
net: ravb: Always process TX descriptor ring
Wei Fang <wei.fang(a)nxp.com>
net: fec: Set mac_managed_pm during probe
Denis Kirjanov <dkirjanov(a)suse.de>
drivers: net: convert to boolean for the mac_managed_pm flag
Oleksij Rempel <linux(a)rempel-privat.de>
net: usb: asix: suspend embedded PHY if external is used
Ivan Vecera <ivecera(a)redhat.com>
i40e: Enforce software interrupt during busy-poll exit
Ivan Vecera <ivecera(a)redhat.com>
i40e: Remove _t suffix from enum type names
Joe Damato <jdamato(a)fastly.com>
i40e: Store the irq number in i40e_q_vector
Alexander Stein <alexander.stein(a)ew.tq-group.com>
Revert "usb: phy: generic: Get the vbus supply"
Bikash Hazarika <bhazarika(a)marvell.com>
scsi: qla2xxx: Update manufacturer detail
Bikash Hazarika <bhazarika(a)marvell.com>
scsi: qla2xxx: Update manufacturer details
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: fix vf may be used uninitialized in this function warning
Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
i40e: fix i40e_count_filters() to count only active/new filters
Su Hui <suhui(a)nfschina.com>
octeontx2-pf: check negative error code in otx2_open()
Hariprasad Kelam <hkelam(a)marvell.com>
octeontx2-af: Fix issue with loading coalesced KPU profiles
Antoine Tenart <atenart(a)kernel.org>
udp: prevent local UDP tunnel packets from being GROed
Antoine Tenart <atenart(a)kernel.org>
udp: do not transition UDP GRO fraglist partial checksums to unnecessary
Antoine Tenart <atenart(a)kernel.org>
udp: do not accept non-tunnel GSO skbs landing in a tunnel
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: stop interface during shutdown
Kuniyuki Iwashima <kuniyu(a)amazon.com>
ipv6: Fix infinite recursion in fib6_dump_done().
Jakub Kicinski <kuba(a)kernel.org>
selftests: reuseaddr_conflict: add missing new line at the end of the output
Eric Dumazet <edumazet(a)google.com>
erspan: make sure erspan_base_hdr is present in skb->head
Antoine Tenart <atenart(a)kernel.org>
selftests: net: gro fwd: update vxlan GRO test expectations
Piotr Wejman <piotrwejman90(a)gmail.com>
net: stmmac: fix rx queue priority assignment
Eric Dumazet <edumazet(a)google.com>
net/sched: act_skbmod: prevent kernel-infoleak
Jakub Sitnicki <jakub(a)cloudflare.com>
bpf, sockmap: Prevent lock inversion deadlock in map delete elem
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
vboxsf: Avoid an spurious warning if load_nls_xxx() fails
Eric Dumazet <edumazet(a)google.com>
netfilter: validate user input for expected length
Ziyang Xuan <william.xuanziyang(a)huawei.com>
netfilter: nf_tables: Fix potential data-race in __nft_flowtable_type_get()
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: flush pending destroy work before exit_net release
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: reject new basechain after table flag update
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Bail to userspace if emulation of atomic user access faults
Ye Zhang <ye.zhang(a)rock-chips.com>
thermal: devfreq_cooling: Fix perf state when calculate dfc res_util
Vlastimil Babka <vbabka(a)suse.cz>
mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations
Ingo Molnar <mingo(a)kernel.org>
Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."
Jens Axboe <axboe(a)kernel.dk>
io_uring: ensure '0' is returned on file registration success
Gokul krishna Krishnakumar <quic_gokukris(a)quicinc.com>
locking/rwsem: Disable preemption while trying for rwsem lock
Mahmoud Adam <mngyadam(a)amazon.com>
net/rds: fix possible cp null dereference
Jesper Dangaard Brouer <hawk(a)kernel.org>
xen-netfront: Add missing skb_mark_for_recycle
Bastien Nocera <hadess(a)hadess.net>
Bluetooth: Fix TOCTOU in HCI debugfs implementation
Hui Wang <hui.wang(a)canonical.com>
Bluetooth: hci_event: set the conn encrypted before conn establishes
Johan Hovold <johan+linaro(a)kernel.org>
arm64: dts: qcom: sc7180-trogdor: mark bluetooth address as broken
Sean Christopherson <seanjc(a)google.com>
x86/cpufeatures: Add CPUID_LNX_5 to track recently added Linux-defined word
Sandipan Das <sandipan.das(a)amd.com>
x86/cpufeatures: Add new word for scattered features
Heiner Kallweit <hkallweit1(a)gmail.com>
r8169: fix issue caused by buggy BIOS on certain boards with RTL8168d
Arnd Bergmann <arnd(a)arndb.de>
dm integrity: fix out-of-range warning
Hariprasad Kelam <hkelam(a)marvell.com>
Octeontx2-af: fix pause frame configuration in GMP mode
Andrei Matei <andreimatei1(a)gmail.com>
bpf: Protect against int overflow for stack access size
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: call request_irq() after NAPI initialized
Nikita Kiryushin <kiryushin(a)ancud.ru>
ACPICA: debugger: check status of acpi_evaluate_object() in acpi_db_walk_for_fields()
Eric Dumazet <edumazet(a)google.com>
tcp: properly terminate timers for kernel sockets
Alexandra Winter <wintera(a)linux.ibm.com>
s390/qeth: handle deferred cc1
Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
ixgbe: avoid sleeping allocation in ixgbe_ipsec_vf_add_sa()
Johannes Berg <johannes.berg(a)intel.com>
wifi: iwlwifi: mvm: rfi: fix potential response leaks
Bixuan Cui <cuibixuan(a)linux.alibaba.com>
iwlwifi: mvm: rfi: use kmemdup() to replace kzalloc + memcpy
David Thompson <davthompson(a)nvidia.com>
mlxbf_gige: stop PHY during open() error paths
Ryosuke Yasuoka <ryasuoka(a)redhat.com>
nfc: nci: Fix uninit-value in nci_dev_up and nci_ntf_packet
Weitao Wang <WeitaoWang-oc(a)zhaoxin.com>
USB: UAS: return ENODEV when submit urbs fail with device not attached
Bart Van Assche <bvanassche(a)acm.org>
scsi: usb: Stop using the SCSI pointer
Bart Van Assche <bvanassche(a)acm.org>
scsi: usb: Call scsi_done() directly
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Fix deadlock in usb_deauthorize_interface()
Muhammad Usama Anjum <usama.anjum(a)collabora.com>
scsi: lpfc: Correct size for wqe for memset()
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/DPC: Quirk PIO log size for Intel Ice Lake Root Ports
Kim Phillips <kim.phillips(a)amd.com>
x86/cpu: Enable STIBP on AMD if Automatic IBRS is enabled
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Delay I/O Abort on PCI error
Saurav Kashyap <skashyap(a)marvell.com>
scsi: qla2xxx: Change debug message during driver unload
Saurav Kashyap <skashyap(a)marvell.com>
scsi: qla2xxx: Fix double free of fcport
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Fix command flush on cable pull
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: NVME|FCP prefer flag not being honored
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Split FCE|EFT trace control
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Fix N2N stuck connection
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Prevent command send on chip reset
Christian A. Ehrhardt <lk(a)c--e.de>
usb: typec: ucsi: Clear UCSI_CCI_RESET_COMPLETE before reset
Christian A. Ehrhardt <lk(a)c--e.de>
usb: typec: ucsi: Ack unsupported commands
yuan linyu <yuanlinyu(a)hihonor.com>
usb: udc: remove warning when queue disabled ep
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: gadget: LPM flow fix
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: gadget: Fix exiting from clock gating
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix ISOC flow in DDMA mode
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix hibernation flow
Minas Harutyunyan <Minas.Harutyunyan(a)synopsys.com>
usb: dwc2: host: Fix remote wakeup from hibernation
Alan Stern <stern(a)rowland.harvard.edu>
USB: core: Add hub_get() and hub_put() routines
Dan Carpenter <dan.carpenter(a)linaro.org>
staging: vc04_services: fix information leak in create_component()
Arnd Bergmann <arnd(a)arndb.de>
staging: vc04_services: changen strncpy() to strscpy_pad()
Guilherme G. Piccoli <gpiccoli(a)igalia.com>
scsi: core: Fix unremoved procfs host directory regression
Duoming Zhou <duoming(a)zju.edu.cn>
ALSA: sh: aica: reorder cleanup operations to avoid UAF bugs
Tom Chung <chiahsuan.chung(a)amd.com>
drm/amd/display: Preserve original aspect ratio in create stream
Ville Syrjälä <ville.syrjala(a)linux.intel.com>
drm/amdgpu: Use drm_mode_copy()
Oliver Neukum <oneukum(a)suse.com>
usb: cdc-wdm: close race between read and workqueue
Chris Wilson <chris(a)chris-wilson.co.uk>
drm/i915/gt: Reset queue_priority_hint on parking
Claus Hansen Ries <chr(a)terma.com>
net: ll_temac: platform_get_resource replaced by wrong function
Mikko Rapeli <mikko.rapeli(a)linaro.org>
mmc: core: Avoid negative index with array access
Mikko Rapeli <mikko.rapeli(a)linaro.org>
mmc: core: Initialize mmc_blk_ioc_data
Nathan Chancellor <nathan(a)kernel.org>
hexagon: vmlinux.lds.S: handle attributes section
Max Filippov <jcmvbkbc(a)gmail.com>
exec: Fix NOMMU linux_binprm::exec in transfer_args_to_stack()
Felix Fietkau <nbd(a)nbd.name>
wifi: mac80211: check/clear fast rx for non-4addr sta VLAN changes
Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
btrfs: zoned: use zone aware sb location for scrub
John Sperbeck <jsperbeck(a)google.com>
init: open /initrd.image with O_LARGEFILE
Zi Yan <ziy(a)nvidia.com>
mm/migrate: set swap entry values of THP tail pages properly.
Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
serial: sc16is7xx: convert from _raw_ to _noinc_ regmap functions for FIFO
Alex Williamson <alex.williamson(a)redhat.com>
vfio/fsl-mc: Block calling interrupt handler without trigger
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: Create persistent IRQ handlers
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Create persistent INTx handler
Alex Williamson <alex.williamson(a)redhat.com>
vfio: Introduce interface to flush virqfd inject workqueue
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Lock external INTx masking ops
Alex Williamson <alex.williamson(a)redhat.com>
vfio/pci: Disable auto-enable of exclusive INTx IRQ
Geliang Tang <tanggeliang(a)kylinos.cn>
selftests: mptcp: diag: return KSFT_FAIL not test_cnt
Nathan Chancellor <nathan(a)kernel.org>
powerpc: xor_vmx: Add '-mhard-float' to CFLAGS
Tim Schumacher <timschumi(a)gmx.de>
efivarfs: Request at most 512 bytes for variable names
Yang Jihong <yangjihong1(a)huawei.com>
perf/core: Fix reentry problem in perf_output_read_group()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Fix a regression in nfsd_setattr()
NeilBrown <neilb(a)suse.de>
nfsd: don't call locks_release_private() twice concurrently
NeilBrown <neilb(a)suse.de>
nfsd: don't take fi_lock in nfsd_break_deleg_cb()
NeilBrown <neilb(a)suse.de>
nfsd: fix RELEASE_LOCKOWNER
Jeff Layton <jlayton(a)kernel.org>
nfsd: drop the nfsd_put helper
NeilBrown <neilb(a)suse.de>
nfsd: call nfsd_last_thread() before final nfsd_put()
Alexander Aring <aahringo(a)redhat.com>
lockd: introduce safe async lock op
NeilBrown <neilb(a)suse.de>
NFSD: fix possible oops when nfsd/pool_stats is closed.
Chuck Lever <chuck.lever(a)oracle.com>
Documentation: Add missing documentation for EXPORT_OP flags
NeilBrown <neilb(a)suse.de>
nfsd: separate nfsd_last_thread() from nfsd_put()
NeilBrown <neilb(a)suse.de>
nfsd: Simplify code around svc_exit_thread() call in nfsd()
Tavian Barnes <tavianator(a)tavianator.com>
nfsd: Fix creation time serialization order
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd4_encode_nfstime4() helper
NeilBrown <neilb(a)suse.de>
lockd: drop inappropriate svc_get() from locked_get()
Dan Carpenter <dan.carpenter(a)linaro.org>
nfsd: fix double fget() bug in __write_ports_addfd()
Jeff Layton <jlayton(a)kernel.org>
nfsd: make a copy of struct iattr before calling notify_change
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: Fix problem of COMMIT and NFS4ERR_DELAY in infinite loop
Jeff Layton <jlayton(a)kernel.org>
nfsd: simplify the delayed disposal list code
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Convert filecache to rhltable
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow reaping files still under writeback
Jeff Layton <jlayton(a)kernel.org>
nfsd: update comment over __nfsd_file_cache_purge
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't take/put an extra reference when putting a file
Jeff Layton <jlayton(a)kernel.org>
nfsd: add some comments to nfsd_file_do_acquire
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't kill nfsd_files because of lease break error
Jeff Layton <jlayton(a)kernel.org>
nfsd: simplify test_bit return in NFSD_FILE_KEY_FULL comparator
Jeff Layton <jlayton(a)kernel.org>
nfsd: NFSD_FILE_KEY_INODE only needs to find GC'ed entries
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't open-code clear_and_wake_up_bit
Jeff Layton <jlayton(a)kernel.org>
nfsd: call op_release, even when op_func returns an error
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't replace page in rq_pages if it's a continuation of last page
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Protect against filesystem freezing
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: copy the whole verifier in nfsd_copy_write_verifier
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't fsync nfsd_files on last close
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix courtesy client with deny mode handling in nfs4_upgrade_open
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: fix problems with cleanup on errors in nfsd4_copy
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't hand out delegation on setuid files being opened for write
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: fix leaked reference count of nfsd4_ssc_umount_item
Jeff Layton <jlayton(a)kernel.org>
nfsd: clean up potential nfsd_file refcount leaks in COPY codepath
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow nfsd_file_get to sanely handle a NULL pointer
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: enhance inter-server copy cleanup
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't destroy global nfs4_file table in per-net shutdown
Jeff Layton <jlayton(a)kernel.org>
nfsd: don't free files unconditionally in __nfsd_file_cache_purge
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: replace delayed_work with work_struct for nfsd_client_shrinker
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: register/unregister of nfsd-client shrinker at nfsd startup/shutdown time
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use set_bit(RQ_DROPME)
Chuck Lever <chuck.lever(a)oracle.com>
Revert "SUNRPC: Use RMW bitops in single-threaded hot paths"
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix handling of cached open files in nfsd4_open codepath
Jeff Layton <jlayton(a)kernel.org>
nfsd: rework refcounting in filecache
Kees Cook <keescook(a)chromium.org>
NFSD: Avoid clashing function prototypes
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use only RQ_DROPME to signal the need to drop a reply
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add CB_RECALL_ANY tracepoints
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add delegation reaper to react to low memory condition
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for sending CB_RECALL_ANY
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: refactoring courtesy_client_reaper to a generic low memory shrinker
Chuck Lever <chuck.lever(a)oracle.com>
trace: Relocate event helper files
Jeff Layton <jlayton(a)kernel.org>
lockd: fix file selection in nlmsvc_cancel_blocked
Jeff Layton <jlayton(a)kernel.org>
lockd: ensure we use the correct file descriptor when unlocking
Jeff Layton <jlayton(a)kernel.org>
lockd: set missing fl_flags field when retrieving args
Xiu Jianfeng <xiujianfeng(a)huawei.com>
NFSD: Use struct_size() helper in alloc_session()
Jeff Layton <jlayton(a)kernel.org>
nfsd: return error if nfs4_setacl fails
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd_file_fsync tracepoint
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix up the filecache laundrette scheduling
Jeff Layton <jlayton(a)kernel.org>
filelock: add a new locks_inode_context accessor function
Jeff Layton <jlayton(a)kernel.org>
nfsd: reorganize filecache.c
Jeff Layton <jlayton(a)kernel.org>
nfsd: remove the pages_flushed statistic from filecache
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix licensing header in filecache.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use rhashtable for managing nfs4_file objects
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor find_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up find_or_add_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add a nfsd4_file_hash_remove() helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd4_init_file()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Update file_hashtbl() helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use const pointers as parameters to fh_ helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace delegation revocations
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace stateids returned via DELEGRETURN
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfs4_preprocess_stateid_op() call sites
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Flesh out a documenting comment for filecache.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an NFSD_FILE_GC flag to enable nfsd_file garbage collection
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Revert "NFSD: NFSv4 CLOSE should release an nfsd_file immediately"
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Pass the target nfsd_file to nfsd_commit()
David Disseldorp <ddiss(a)suse.de>
exportfs: use pr_debug for unreachable debug statements
Jeff Layton <jlayton(a)kernel.org>
nfsd: allow disabling NFSv2 at compile time
Jeff Layton <jlayton(a)kernel.org>
nfsd: move nfserrno() to vfs.c
Jeff Layton <jlayton(a)kernel.org>
nfsd: ignore requests to disable unsupported versions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Finish converting the NFSv3 GETACL result encoder
Colin Ian King <colin.i.king(a)gmail.com>
NFSD: Remove redundant assignment to variable host_err
Anna Schumaker <Anna.Schumaker(a)Netapp.com>
NFSD: Simplify READ_PLUS
Jeff Layton <jlayton(a)kernel.org>
nfsd: use locks_inode_context helper
Jeff Layton <jlayton(a)kernel.org>
lockd: use locks_inode_context helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix reads with a non-zero offset that don't end on a page boundary
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix trace_nfsd_fh_verify_err() crasher
Jeff Layton <jlayton(a)kernel.org>
nfsd: put the export reference in nfsd4_verify_deleg_dentry
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix use-after-free in nfsd_file_do_acquire tracepoint
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix net-namespace logic in __nfsd_file_cache_purge
Jeff Layton <jlayton(a)kernel.org>
nfsd: ensure we always call fh_verify_error tracepoint
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
NFSD: unregister shrinker when nfsd_init_net() fails
Jeff Layton <jlayton(a)kernel.org>
nfsd: rework hashtable handling in nfsd_do_file_acquire
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix nfsd_file_unhash_and_dispose
Gaosheng Cui <cuigaosheng1(a)huawei.com>
fanotify: Remove obsoleted fanotify_event_has_path()
Gaosheng Cui <cuigaosheng1(a)huawei.com>
fsnotify: remove unused declaration
Al Viro <viro(a)zeniv.linux.org.uk>
fs/notify: constify path
Jeff Layton <jlayton(a)kernel.org>
nfsd: extra checks when freeing delegation stateids
Jeff Layton <jlayton(a)kernel.org>
nfsd: make nfsd4_run_cb a bool return function
Jeff Layton <jlayton(a)kernel.org>
nfsd: fix comments about spinlock handling with delegations
Jeff Layton <jlayton(a)kernel.org>
nfsd: only fill out return pointer on success in nfsd4_lookup_stateid
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Cap rsize_bop result based on send buffer size
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename the fields in copy_stateid_t
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_file_cache_stats_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define nfsd_reply_cache_stats_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define client_info_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_SHOW_ATTRIBUTE to define export_features_fops and supported_enctypes_fops
ChenXiaoSong <chenxiaosong2(a)huawei.com>
nfsd: use DEFINE_PROC_SHOW_ATTRIBUTE to define nfsd_proc_ops
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Pack struct nfsd4_compoundres
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove unused nfsd4_compoundargs::cachetype field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove "inline" directives on op_rsize_bop helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfs4svc_encode_compoundres()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up WRITE arg decoders
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use xdr_inline_decode() to decode NFSv3 symlinks
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor common code out of dirlist helpers
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Reduce amount of struct nfsd4_compoundargs that needs clearing
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Parametrize how much of argsize should be zeroed
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add shrinker to reap courtesy clients on low memory condition
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: keep track of the number of courtesy clients in the system
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_remove() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_rename() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfsd4_setattr() wait before returning NFS4ERR_DELAY
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_setattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add a mechanism to wait for a DELEGRETURN
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add tracepoints to report NFSv4 callback completions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace NFSv4 COMPOUND tags
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace dprintk() call site in fh_verify()
Gaosheng Cui <cuigaosheng1(a)huawei.com>
nfsd: remove nfsd4_prepare_cb_recall() declaration
Jeff Layton <jlayton(a)kernel.org>
nfsd: clean up mounted_on_fileid handling
NeilBrown <neilb(a)suse.de>
NFSD: drop fname and flen args from nfsd_create_locked()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Increase NFSD_MAX_OPS_PER_COMPOUND
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
nfsd: Propagate some error code returned by memdup_user()
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
nfsd: Avoid some useless tests
Jinpeng Cui <cui.jinpeng2(a)zte.com.cn>
NFSD: remove redundant variable status
Olga Kornievskaia <kolga(a)netapp.com>
NFSD enforce filehandle check for source file in COPY
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
lockd: move from strlcpy with unused retval to strscpy
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
NFSD: move from strlcpy with unused retval to strscpy
Al Viro <viro(a)zeniv.linux.org.uk>
nfsd_splice_actor(): handle compound pages
NeilBrown <neilb(a)suse.de>
NFSD: fix regression with setting ACLs.
NeilBrown <neilb(a)suse.de>
NFSD: discard fh_locked flag and fh_lock/fh_unlock
NeilBrown <neilb(a)suse.de>
NFSD: use (un)lock_inode instead of fh_(un)lock for file operations
NeilBrown <neilb(a)suse.de>
NFSD: use explicit lock/unlock for directory ops
NeilBrown <neilb(a)suse.de>
NFSD: reduce locking in nfsd_lookup()
NeilBrown <neilb(a)suse.de>
NFSD: only call fh_unlock() once in nfsd_link()
NeilBrown <neilb(a)suse.de>
NFSD: always drop directory lock in nfsd_unlink()
NeilBrown <neilb(a)suse.de>
NFSD: change nfsd_create()/nfsd_symlink() to unlock directory before returning.
NeilBrown <neilb(a)suse.de>
NFSD: add posix ACLs to struct nfsd_attrs
NeilBrown <neilb(a)suse.de>
NFSD: add security label to struct nfsd_attrs
NeilBrown <neilb(a)suse.de>
NFSD: set attributes when creating symlinks
NeilBrown <neilb(a)suse.de>
NFSD: introduce struct nfsd_attrs
Jeff Layton <jlayton(a)kernel.org>
NFSD: verify the opened dentry after setting a delegation
Jeff Layton <jlayton(a)kernel.org>
NFSD: drop fh argument from alloc_init_deleg
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move copy offload callback arguments into a separate structure
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add nfsd4_send_cb_offload()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove kmalloc from nfsd4_do_async_copy()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_do_copy()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_cleanup_inter_ssc() (2/2)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd4_cleanup_inter_ssc() (1/2)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace boolean fields in struct nfsd4_copy
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Make nfs4_put_copy() static
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Reorder the fields in struct nfsd4_op
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Shrink size of struct nfsd4_copy
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Shrink size of struct nfsd4_copy_notify
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfserrno(-ENOMEM) is nfserr_jukebox
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix strncpy() fortify warning
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd4_encode_readlink()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Use xdr_pad_size()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Simplify starting_len
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_readv()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add an nfsd4_read::rd_eof field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up SPLICE_OK in nfsd4_encode_read()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_fattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize nfsd4_encode_operation()
Jeff Layton <jlayton(a)kernel.org>
nfsd: silence extraneous printk on nfsd.ko insertion
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: limit the number of v4 clients to 1024 per 1GB of system memory
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: keep track of the number of v4 clients in the system
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: refactoring v4 specific code to a helper in nfs4state.c
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Ensure nf_inode is never dereferenced
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: NFSv4 CLOSE should release an nfsd_file immediately
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move nfsd_file_trace_alloc() tracepoint
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Separate tracepoints for acquire and create
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up unused code after rhashtable conversion
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Convert the filecache to use rhashtable
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Set up an rhashtable for the filecache
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Replace the "init once" mechanism
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove nfsd_file::nf_hashval
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_hash_remove can compute hashval
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor __nfsd_file_close_inode()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_unhash can compute hashval from nf->nf_inode
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove lockdep assertion from unhash_and_release_locked()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: No longer record nf_hashval in the trace log
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Never call nfsd_file_gc() in foreground paths
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix the filecache LRU shrinker
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Leave open files out of the filecache LRU
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace filecache LRU activity
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: WARN when freeing an item still linked via nf_lru
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Hook up the filecache stat file
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Zero counters when the filecache is re-initialized
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Record number of flush calls
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report the number of items evicted by the LRU walk
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_file_lru_scan()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_file_gc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add nfsd_file_lru_dispose_list() helper
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report average age of filecache items
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report count of freed filecache items
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report count of calls to nfsd_file_acquire()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Report filecache LRU size
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Demote a WARN to a pr_warn()
Colin Ian King <colin.i.king(a)gmail.com>
nfsd: remove redundant assignment to variable len
Zhang Jiaming <jiaming(a)nfschina.com>
NFSD: Fix space and spelling mistake
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Instrument fh_verify()
Benjamin Coddington <bcodding(a)redhat.com>
NLM: Defend against file_lock changes after vfs_test_lock()
Xin Gao <gaoxin(a)cdjrlc.com>
fsnotify: Fix comment typo
Amir Goldstein <amir73il(a)gmail.com>
fanotify: introduce FAN_MARK_IGNORE
Amir Goldstein <amir73il(a)gmail.com>
fanotify: cleanups for fanotify_mark() input validations
Amir Goldstein <amir73il(a)gmail.com>
fanotify: prepare for setting event flags in ignore mask
Oliver Ford <ojford(a)gmail.com>
fs: inotify: Fix typo in inotify comment
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Decode NFSv4 birth time attribute
Amir Goldstein <amir73il(a)gmail.com>
fanotify: refine the validation checks on non-dir inode mask
NeilBrown <neilb(a)suse.de>
NFS: restore module put when manager exits.
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix potential use-after-free in nfsd_file_put()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: nfsd_file_put() can sleep
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Add documenting comment for nfsd4_release_lockowner()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Modernize nfsd4_release_lockowner()
Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
nfsd: Fix null-ptr-deref in nfsd_fill_super()
Zhang Xiaoxu <zhangxiaoxu5(a)huawei.com>
nfsd: Unregister the cld notifier when laundry_wq create failed
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Use RMW bitops in single-threaded hot paths
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace filecache opens
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move documenting comment for nfsd4_process_open2()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix whitespace
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove dprintk call sites from tail of nfsd4_open()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Instantiate a struct file when creating a regular NFSv4 file
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_open_verified()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove do_nfsd_create()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor NFSv4 OPEN(CREATE)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor NFSv3 CREATE
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Refactor nfsd_create_setattr()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Avoid calling fh_drop_write() twice in do_nfsd_create()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd3_proc_create()
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: Show state of courtesy client in client info
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for lock conflict to courteous server
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: add 2 callbacks to lock_manager_operations to resolve conflict
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: add helper locks_owner_has_blockers to check for blockers
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: move create/destroy of laundry_wq to init_nfsd and exit_nfsd
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add support for share reservation conflict to courteous server
Dai Ngo <dai.ngo(a)oracle.com>
NFSD: add courteous server support for thread with only delegation
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_splice_actor()
Vasily Averin <vvs(a)openvz.org>
fanotify: fix incorrect fmode_t casts
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: consistent behavior for parent not watching children
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: introduce mark type iterator
Amir Goldstein <amir73il(a)gmail.com>
fanotify: enable "evictable" inode marks
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
fanotify: implement "evictable" inode marks
Amir Goldstein <amir73il(a)gmail.com>
fanotify: factor out helper fanotify_mark_update_flags()
Amir Goldstein <amir73il(a)gmail.com>
fanotify: create helper fanotify_mark_user_flags()
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: allow adding an inode mark without pinning inode
Amir Goldstein <amir73il(a)gmail.com>
dnotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
nfsd: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
inotify: use fsnotify group lock helpers
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: create helpers for group mark_mutex lock
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: make allow_dups a property of the group
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass flags argument to fsnotify_alloc_group()
Amir Goldstein <amir73il(a)gmail.com>
inotify: move control flags from mask to mark flags
Dai Ngo <dai.ngo(a)oracle.com>
fs/lock: documentation cleanup. Replace inode->i_lock with flc_lock.
Amir Goldstein <amir73il(a)gmail.com>
fanotify: do not allow setting dirent events in mask of non-dir
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Clean up nfsd_file_put()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Fix a write performance regression
Bang Li <libang.linuxer(a)gmail.com>
fsnotify: remove redundant parameter judgment
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: optimize FS_MODIFY events with no ignored masks
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: fix merge with parent's ignored mask
Jakob Koschel <jakobkoschel(a)gmail.com>
nfsd: fix using the correct variable for sizeof()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up _lm_ operation names
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove CONFIG_NFSD_V3
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move svc_serv_ops::svo_function into struct svc_serv
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove svc_serv_ops::svo_module
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove svc_shutdown_net()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Rename svc_close_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Rename svc_create_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove svo_shutdown method
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Merge svc_do_enqueue_xprt() into svc_enqueue_xprt()
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Remove the .svo_enqueue_xprt method
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove NFSD_PROC_ARGS_* macros
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Streamline the rare "found" case
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Skip extra computation for RC_NOCACHE case
Chuck Lever <chuck.lever(a)oracle.com>
orDate: Thu Sep 30 19:19:57 2021 -0400
Ondrej Valousek <ondrej.valousek.xm(a)renesas.com>
nfsd: Add support for the birth time attribute
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Deprecate NFS_OFFSET_MAX
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: invalidate dcache before IN_DELETE event
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Move fill_pre_wcc() and fill_post_wcc()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Trace boot verifier resets
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Rename boot verifier functions
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up the nfsd_net::nfssvc_boot field
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Write verifier might go backwards
Trond Myklebust <trond.myklebust(a)hammerspace.com>
nfsd: Add a tracepoint for errors in nfsd4_clone_file_range()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: De-duplicate net_generic(SVC_NET(rqstp), nfsd_net_id)
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Clean up nfsd_vfs_write()
Jeff Layton <jeff.layton(a)primarydata.com>
nfsd: Retry once in nfsd_open on an -EOPENSTALE return
Jeff Layton <jeff.layton(a)primarydata.com>
nfsd: Add errno mapping for EREMOTEIO
Peng Tao <tao.peng(a)primarydata.com>
nfsd: map EBADF
Vasily Averin <vvs(a)virtuozzo.com>
nfsd4: add refcount for nfsd4_blocked_lock
J. Bruce Fields <bfields(a)redhat.com>
nfs: block notification on fs with its own ->lock
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: De-duplicate nfsd4_decode_bitmap4()
J. Bruce Fields <bfields(a)redhat.com>
nfsd: improve stateid access bitmask documentation
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Combine XDR error tracepoints
NeilBrown <neilb(a)suse.de>
NFSD: simplify per-net file cache management
Jiapeng Chong <jiapeng.chong(a)linux.alibaba.com>
NFSD: Fix inconsistent indenting
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Remove be32_to_cpu() from DRC hash function
NeilBrown <neilb(a)suse.de>
NFS: switch the callback service back to non-pooled.
NeilBrown <neilb(a)suse.de>
lockd: use svc_set_num_threads() for thread start and stop
NeilBrown <neilb(a)suse.de>
SUNRPC: always treat sv_nrpools==1 as "not pooled"
NeilBrown <neilb(a)suse.de>
SUNRPC: move the pool_map definitions (back) into svc.c
NeilBrown <neilb(a)suse.de>
lockd: rename lockd_create_svc() to lockd_get()
NeilBrown <neilb(a)suse.de>
lockd: introduce lockd_put()
NeilBrown <neilb(a)suse.de>
lockd: move svc_exit_thread() into the thread
NeilBrown <neilb(a)suse.de>
lockd: move lockd_start_svc() call into lockd_create_svc()
NeilBrown <neilb(a)suse.de>
lockd: simplify management of network status notifiers
NeilBrown <neilb(a)suse.de>
lockd: introduce nlmsvc_serv
NeilBrown <neilb(a)suse.de>
NFSD: simplify locking for network notifier.
NeilBrown <neilb(a)suse.de>
SUNRPC: discard svo_setup and rename svc_set_num_threads_sync()
NeilBrown <neilb(a)suse.de>
NFSD: Make it possible to use svc_set_num_threads_sync
NeilBrown <neilb(a)suse.de>
NFSD: narrow nfsd_mutex protection in nfsd thread
NeilBrown <neilb(a)suse.de>
SUNRPC: use sv_lock to protect updates to sv_nrthreads.
NeilBrown <neilb(a)suse.de>
nfsd: make nfsd_stats.th_cnt atomic_t
NeilBrown <neilb(a)suse.de>
SUNRPC: stop using ->sv_nrthreads as a refcount
NeilBrown <neilb(a)suse.de>
SUNRPC/NFSD: clean up get/put functions.
NeilBrown <neilb(a)suse.de>
SUNRPC: change svc_get() to return the svc.
NeilBrown <neilb(a)suse.de>
NFSD: handle errors better in write_ports_addfd()
Eric W. Biederman <ebiederm(a)xmission.com>
exit: Rename module_put_and_exit to module_put_and_kthread_exit
Eric W. Biederman <ebiederm(a)xmission.com>
exit: Implement kthread_exit
Amir Goldstein <amir73il(a)gmail.com>
fanotify: wire up FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: report old and/or new parent+name in FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: record either old name new name or both for FAN_RENAME
Amir Goldstein <amir73il(a)gmail.com>
fanotify: record old and new parent and name in FAN_RENAME event
Amir Goldstein <amir73il(a)gmail.com>
fanotify: support secondary dir fh and name in fanotify_info
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use helpers to parcel fanotify_info buffer
Amir Goldstein <amir73il(a)gmail.com>
fanotify: use macros to get the offset to fanotify_info buffer
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: generate FS_RENAME event with rich information
Amir Goldstein <amir73il(a)gmail.com>
fanotify: introduce group flag FAN_REPORT_TARGET_FID
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: separate mark iterator type from object type enum
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: clarify object type argument
Gabriel Krisman Bertazi <krisman(a)collabora.com>
ext4: fix error code saved on super block during file system abort
J. Bruce Fields <bfields(a)redhat.com>
nfsd4: remove obselete comment
Changcheng Deng <deng.changcheng(a)zte.com.cn>
NFSD:fix boolreturn.cocci warning
J. Bruce Fields <bfields(a)redhat.com>
nfsd: update create verifier comment
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Change return value type of .pc_encode
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Replace the "__be32 *p" parameter to .pc_encode
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Save location of NFSv4 COMPOUND status
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Change return value type of .pc_decode
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Replace the "__be32 *p" parameter to .pc_decode
Colin Ian King <colin.king(a)canonical.com>
NFSD: Initialize pointer ni with NULL and not plain integer 0
NeilBrown <neilb(a)suse.de>
NFSD: simplify struct nfsfh
NeilBrown <neilb(a)suse.de>
NFSD: drop support for ancient filehandles
NeilBrown <neilb(a)suse.de>
NFSD: move filehandle format declarations out of "uapi".
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Optimize DRC bucket pruning
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Move NFS protocol display macros to global header
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Move generic FS show macros to global header
Chuck Lever <chuck.lever(a)oracle.com>
SUNRPC: Tracepoints should display tk_pid and cl_clid as a fixed-size field
Chuck Lever <chuck.lever(a)oracle.com>
NFS: Remove unnecessary TRACE_DEFINE_ENUM()s
Gabriel Krisman Bertazi <krisman(a)collabora.com>
docs: Document the FAN_FS_ERROR event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
ext4: Send notifications on error
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Allow users to request FAN_FS_ERROR events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Emit generic error info for error event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Report fid info for file related file system errors
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: WARN_ON against too large file handles
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Add helpers to decide whether to report FID/DFID
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Wrap object_fh inline space in a creator macro
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support merging of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support enqueueing of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Pre-allocate pool of error events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Reserve UAPI bits for FAN_FS_ERROR
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Support FS_ERROR event type
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Require fid_mode for any non-fd event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Encode empty file handle when no inode is provided
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Allow file handle encoding for unhashed events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Support null inode event in fanotify_dfid_inode
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Pass group argument to free_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Protect fsnotify_handle_inode_event from no-inode events
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Retrieve super block from the data field
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Add wrapper around fsnotify_add_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Add helper to detect overflow_event
Gabriel Krisman Bertazi <krisman(a)collabora.com>
inotify: Don't force FS_IN_IGNORED
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Split fsid check from other fid mode checks
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fanotify: Fold event size calculation to its own function
Gabriel Krisman Bertazi <krisman(a)collabora.com>
fsnotify: Don't insert unmergeable events in hashtable
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: clarify contract for create event hooks
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass dentry instead of inode data
Amir Goldstein <amir73il(a)gmail.com>
fsnotify: pass data_type to fsnotify_name()
Peter Zijlstra <peterz(a)infradead.org>
x86/static_call: Add support for Jcc tail-calls
Peter Zijlstra <peterz(a)infradead.org>
x86/alternatives: Teach text_poke_bp() to patch Jcc.d32 instructions
Peter Zijlstra <peterz(a)infradead.org>
x86/alternatives: Introduce int3_emulate_jcc()
Thomas Gleixner <tglx(a)linutronix.de>
x86/asm: Differentiate between code and function alignment
Peter Zijlstra <peterz(a)infradead.org>
arch: Introduce CONFIG_FUNCTION_ALIGNMENT
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/rfds: Mitigate Register File Data Sampling (RFDS)
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
Documentation/hw-vuln: Add documentation for RFDS
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson <seanjc(a)google.com>
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/entry_32: Add VERW just before userspace transition
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/entry_64: Add VERW just before userspace transition
Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
x86/bugs: Add asm helpers for executing VERW
H. Peter Anvin (Intel) <hpa(a)zytor.com>
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Oliver Upton <oliver.upton(a)linux.dev>
KVM: arm64: Limit stage2_apply_range() batch size to largest block
Oliver Upton <oliver.upton(a)linux.dev>
KVM: arm64: Work out supported block level at compile time
Rickard x Andersson <rickaran(a)axis.com>
tty: serial: imx: Fix broken RS485
John Ogness <john.ogness(a)linutronix.de>
printk: Update @console_may_schedule in console_trylock_spinning()
Nicolin Chen <nicolinc(a)nvidia.com>
iommu/dma: Force swiotlb_max_mapping_size on an untrusted device
John Garry <john.garry(a)huawei.com>
dma-iommu: add iommu_dma_opt_mapping_size()
John Garry <john.garry(a)huawei.com>
dma-mapping: add dma_opt_mapping_size()
Will Deacon <will(a)kernel.org>
swiotlb: Fix alignment checks when both allocation and DMA masks are present
David Laight <David.Laight(a)ACULAB.COM>
minmax: add umin(a, b) and umax(a, b)
André Rösti <an.roesti(a)gmail.com>
entry: Respect changes to system call number by trace_sys_enter()
Martin Blumenstingl <martin.blumenstingl(a)googlemail.com>
clocksource/drivers/arm_global_timer: Fix maximum prescaler value
Jarred White <jarredwhite(a)linux.microsoft.com>
ACPI: CPPC: Use access_width over bit_width for system memory accesses
Maximilian Heyne <mheyne(a)amazon.de>
xen/events: close evtchn after mapping cleanup
Heiner Kallweit <hkallweit1(a)gmail.com>
i2c: i801: Avoid potential double call to gpiod_remove_lookup_table
Sumit Garg <sumit.garg(a)linaro.org>
tee: optee: Fix kernel panic caused by incorrect error handling
Bart Van Assche <bvanassche(a)acm.org>
fs/aio: Check IOCB_AIO_RW before the struct aio_kiocb conversion
Nicolas Pitre <nico(a)fluxnic.net>
vt: fix unicode buffer corruption when deleting characters
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: me: add arrow lake point H DID
Alexander Usyskin <alexander.usyskin(a)intel.com>
mei: me: add arrow lake point S DID
Sherry Sun <sherry.sun(a)nxp.com>
tty: serial: fsl_lpuart: avoid idle preamble pending if CTS is enabled
Mathias Nyman <mathias.nyman(a)linux.intel.com>
usb: port: Don't try to peer unused USB ports based on location
Krishna Kurapati <quic_kriskura(a)quicinc.com>
usb: gadget: ncm: Fix handling of zero block length packets
Alan Stern <stern(a)rowland.harvard.edu>
USB: usb-storage: Prevent divide-by-0 error in isd200_ata_command
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Fix headset Mic no show at resume back for Lenovo ALC897 platform
Nirmoy Das <nirmoy.das(a)intel.com>
drm/i915: Check before removing mm notifier
Steven Rostedt (Google) <rostedt(a)goodmis.org>
tracing: Use .flush() call to wake up readers
Sean Christopherson <seanjc(a)google.com>
KVM: SVM: Flush pages under kvm->lock to fix UAF in svm_register_enc_region()
Nathan Chancellor <nathan(a)kernel.org>
xfrm: Avoid clang fortify warning in copy_to_user_tmpl()
Michael Kelley <mhklinux(a)outlook.com>
Drivers: hv: vmbus: Calculate ring buffer size for more efficient use of memory
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: reject constant set with timeout
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: disallow anonymous set with timeout flag
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: mark set as dead when unbinding anonymous set with timeout
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
cpufreq: brcmstb-avs-cpufreq: fix up "add check for cpufreq_cpu_get's return value"
Geert Uytterhoeven <geert+renesas(a)glider.be>
net: ravb: Add R-Car Gen4 support
Anton Altaparmakov <anton(a)tuxera.com>
x86/pm: Work around false positive kmemleak report in msr_build_context()
Mikulas Patocka <mpatocka(a)redhat.com>
dm snapshot: fix lockup in dm_exception_table_exit
Leo Ma <hanghong.ma(a)amd.com>
drm/amd/display: Fix noise issue on HDMI AV mute
Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
drm/amd/display: Return the correct HDCP error code
Philip Yang <Philip.Yang(a)amd.com>
drm/amdgpu: amdgpu_ttm_gart_bind set gtt bound flag
Conrad Kostecki <conikost(a)gentoo.org>
ahci: asm1064: asm1166: don't limit reported ports
Andrey Jr. Melnikov <temnota.am(a)gmail.com>
ahci: asm1064: correct count of reported ports
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: netlink: access device through ctx instead of peer
Jason A. Donenfeld <Jason(a)zx2c4.com>
wireguard: netlink: check for dangling peer via is_dead instead of empty list
Steven Rostedt (Google) <rostedt(a)goodmis.org>
net: hns3: tracing: fix hclgevf trace event strings
Steven Rostedt (Google) <rostedt(a)goodmis.org>
NFSD: Fix nfsd_clid_class use of __string_len() macro
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/CPU/AMD: Update the Zenbleed microcode revisions
Marek Szyprowski <m.szyprowski(a)samsung.com>
cpufreq: dt: always allocate zeroed cpumask
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: prevent kernel bug at submit_bh_wbc()
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
Qiang Zhang <qiang4.zhang(a)intel.com>
memtest: use {READ,WRITE}_ONCE in memory scanning
Jani Nikula <jani.nikula(a)intel.com>
drm/vc4: hdmi: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/imx/ipuv3: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/exynos: do not return negative values from .get_modes()
Jani Nikula <jani.nikula(a)intel.com>
drm/panel: do not return negative error codes from drm_panel_get_modes()
Harald Freudenberger <freude(a)linux.ibm.com>
s390/zcrypt: fix reference counting on zcrypt card objects
Sean Anderson <sean.anderson(a)linux.dev>
soc: fsl: qbman: Use raw spinlock for cgr_lock
Sean Anderson <sean.anderson(a)seco.com>
soc: fsl: qbman: Add CGR update function
Sean Anderson <sean.anderson(a)seco.com>
soc: fsl: qbman: Add helper for sanity checking cgr ops
Sean Anderson <sean.anderson(a)linux.dev>
soc: fsl: qbman: Always disable interrupts when taking cgr_lock
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Use wait_event_interruptible() in ring_buffer_wait()
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix full_waiters_pending in poll
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix resetting of shortest_full
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Do not set shortest_full when full target is hit
Steven Rostedt (Google) <rostedt(a)goodmis.org>
ring-buffer: Fix waking up ring buffer readers
Marios Makassikis <mmakassikis(a)freebox.fr>
ksmbd: retrieve number of blocks using vfs_getattr in set_file_allocation_info
Alex Williamson <alex.williamson(a)redhat.com>
vfio/platform: Disable virqfds on cleanup
Niklas Cassel <cassel(a)kernel.org>
PCI: dwc: endpoint: Fix advertised resizable BAR size
Nathan Chancellor <nathan(a)kernel.org>
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
Josef Bacik <josef(a)toxicpanda.com>
nfs: fix UAF in direct writes
Stanislaw Gruszka <stanislaw.gruszka(a)linux.intel.com>
PCI/AER: Block runtime suspend when handling errors
Samuel Thibault <samuel.thibault(a)ens-lyon.org>
speakup: Fix 8bit characters from direct synth
Wayne Chang <waynec(a)nvidia.com>
usb: gadget: tegra-xudc: Fix USB3 PHY retrieval logic
Wayne Chang <waynec(a)nvidia.com>
phy: tegra: xusb: Add API to retrieve the port number of phy
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
slimbus: core: Remove usage of the deprecated ida_simple_xx() API
Jerome Brunet <jbrunet(a)baylibre.com>
nvmem: meson-efuse: fix function pointer type mismatch
Maximilian Heyne <mheyne(a)amazon.de>
ext4: fix corruption during on-line resize
Josua Mayer <josua(a)solid-run.com>
hwmon: (amc6821) add of_match table
Mickaël Salaün <mic(a)digikod.net>
landlock: Warn once if a Landlock action is requested while disabled
Christian Gmeiner <cgmeiner(a)igalia.com>
drm/etnaviv: Restore some id values
Dominique Martinet <dominique.martinet(a)atmark-techno.com>
mmc: core: Fix switch on gp3 partition
Ryan Roberts <ryan.roberts(a)arm.com>
mm: swap: fix race between free_swap_and_cache() and swapoff()
Huang Ying <ying.huang(a)intel.com>
swap: comments get_swap_device() with usage rule
Fedor Pchelkin <pchelkin(a)ispras.ru>
mac802154: fix llsec key resources release in mac802154_llsec_key_del
Yu Kuai <yukuai3(a)huawei.com>
dm-raid: fix lockdep waring in "pers->hot_add_disk"
Paul Menzel <pmenzel(a)molgen.mpg.de>
PCI/DPC: Quirk PIO log size for Intel Raptor Lake Root Ports
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/DPC: Quirk PIO log size for certain Intel Root Ports
Mika Westerberg <mika.westerberg(a)linux.intel.com>
PCI/ASPM: Make Intel DG2 L1 acceptable latency unlimited
Bjorn Helgaas <bhelgaas(a)google.com>
PCI: Work around Intel I210 ROM BAR overlap defect
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI/PM: Drain runtime-idle callbacks before driver removal
Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
PCI: Drop pci_device_remove() test of pci_dev->driver
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix off-by-one chunk length calculation at contains_pending_extent()
Peter Collingbourne <pcc(a)google.com>
serial: Lock console when calling into driver before registration
Petr Mladek <pmladek(a)suse.com>
printk/console: Split out code that enables default console
Jameson Thies <jthies(a)google.com>
usb: typec: ucsi: Clean up UCSI_CABLE_PROP macros
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: don't unhash root
Miklos Szeredi <mszeredi(a)redhat.com>
fuse: fix root lookup with nonzero generation
Wolfram Sang <wsa+renesas(a)sang-engineering.com>
mmc: tmio: avoid concurrent runs of mmc_request_done()
Qingliang Li <qingliang.li(a)mediatek.com>
PM: sleep: wakeirq: fix wake irq warning in system suspend
Toru Katagiri <Toru.Katagiri(a)tdk.com>
USB: serial: cp210x: add pid/vid for TDK NC0110013M and MM0110113M
Aurélien Jacobs <aurel(a)gnuage.org>
USB: serial: option: add MeiG Smart SLM320 product
Christian Häggström <christian.haggstrom(a)orexplore.com>
USB: serial: cp210x: add ID for MGP Instruments PDS100
Cameron Williams <cang1(a)live.co.uk>
USB: serial: add device ID for VeriFone adapter
Daniel Vogelbacher <daniel(a)chaospixel.com>
USB: serial: ftdi_sio: add support for GMC Z216C Adapter IR-USB
Michael Ellerman <mpe(a)ellerman.id.au>
powerpc/fsl: Fix mfpmr build errors with newer binutils
Prashanth K <quic_prashk(a)quicinc.com>
usb: xhci: Add error handling in xhci_map_urb_for_dma
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: mmcc-msm8974: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: mmcc-apq8084: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: gcc-ipq8074: fix terminating of frequency table arrays
Gabor Juhos <j4g8y7(a)gmail.com>
clk: qcom: gcc-ipq6018: fix terminating of frequency table arrays
Maulik Shah <quic_mkshah(a)quicinc.com>
PM: suspend: Set mem_sleep_current during kernel command line setup
Guenter Roeck <linux(a)roeck-us.net>
parisc: Strip upper 32 bit of sum in csum_ipv6_magic for 64-bit builds
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix csum_ipv6_magic on 64-bit systems
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix csum_ipv6_magic on 32-bit systems
Guenter Roeck <linux(a)roeck-us.net>
parisc: Fix ip_fast_csum
John David Anglin <dave.anglin(a)bell.net>
parisc: Avoid clobbering the C/B bits in the PSW with tophys and tovirt macros
Arseniy Krasnov <avkrasnov(a)salutedevices.com>
mtd: rawnand: meson: fix scrambling mode value in command macro
Zhang Yi <yi.zhang(a)huawei.com>
ubi: correct the calculation of fastmap size
Richard Weinberger <richard(a)nod.at>
ubi: Check for too small LEB size in VTBL code
Matthew Wilcox (Oracle) <willy(a)infradead.org>
ubifs: Set page uptodate in the correct place
Jan Kara <jack(a)suse.cz>
fat: fix uninitialized field in nostale filehandles
Matthew Wilcox (Oracle) <willy(a)infradead.org>
bounds: support non-power-of-two CONFIG_NR_CPUS
Arnd Bergmann <arnd(a)arndb.de>
kasan/test: avoid gcc warning for intentional overflow
Peter Collingbourne <pcc(a)google.com>
kasan: test: add memcpy test that avoids out-of-bounds write
Damien Le Moal <dlemoal(a)kernel.org>
block: Clear zone limits for a non-zoned stacked queue
Baokun Li <libaokun1(a)huawei.com>
ext4: correct best extent lstart adjustment logic
SeongJae Park <sj(a)kernel.org>
selftests/mqueue: Set timeout to 180 seconds
Damian Muszynski <damian.muszynski(a)intel.com>
crypto: qat - resolve race condition during AER recovery
Svyatoslav Pankratov <svyatoslav.pankratov(a)intel.com>
crypto: qat - fix double free during reset
Randy Dunlap <rdunlap(a)infradead.org>
sparc: vDSO: fix return value of __setup handler
Randy Dunlap <rdunlap(a)infradead.org>
sparc64: NMI watchdog: fix return value of __setup handler
Sean Christopherson <seanjc(a)google.com>
KVM: Always flush async #PF workqueue when vCPU is being destroyed
Gui-Dong Han <2045gemini(a)gmail.com>
media: xc4000: Fix atomicity violation in xc4000_get_frequency
Philipp Stanner <pstanner(a)redhat.com>
pci_iounmap(): Fix MMIO mapping leak
Zack Rusin <zack.rusin(a)broadcom.com>
drm/vmwgfx: Fix possible null pointer derefence with invalid contexts
Duje Mihanović <duje.mihanovic(a)skole.hr>
arm: dts: marvell: Fix maxium->maxim typo in brownstone dts
Roberto Sassu <roberto.sassu(a)huawei.com>
smack: Handle SMACK64TRANSMUTE in smack_inode_setsecurity()
Roberto Sassu <roberto.sassu(a)huawei.com>
smack: Set SMACK64TRANSMUTE only for dirs in smack_inode_setxattr()
Amit Pundir <amit.pundir(a)linaro.org>
clk: qcom: gcc-sdm845: Add soft dependency on rpmhpd
Hidenori Kobayashi <hidenorik(a)chromium.org>
media: staging: ipu3-imgu: Set fields before media_entity_pads_init()
Zheng Wang <zyytlz.wz(a)163.com>
wifi: brcmfmac: Fix use-after-free bug in brcmf_cfg80211_detach
Thomas Gleixner <tglx(a)linutronix.de>
timers: Rename del_timer_sync() to timer_delete_sync()
Thomas Gleixner <tglx(a)linutronix.de>
timers: Use del_timer_sync() even on UP
Thomas Gleixner <tglx(a)linutronix.de>
timers: Update kernel-doc for various functions
Jim Mattson <jmattson(a)google.com>
KVM: x86: Use a switch statement and macros in __feature_translate()
Jim Mattson <jmattson(a)google.com>
KVM: x86: Advertise CPUID.(EAX=7,ECX=2):EDX[5:0] to userspace
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Update KVM-only leaf handling to allow for 100% KVM-only leafs
Borislav Petkov <bp(a)suse.de>
x86/bugs: Use sysfs_emit()
Kim Phillips <kim.phillips(a)amd.com>
x86/cpu: Support AMD Automatic IBRS
Lin Yujun <linyujun809(a)huawei.com>
Documentation/hw-vuln: Update spectre doc
-------------
Diffstat:
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
.../admin-guide/filesystem-monitoring.rst | 74 ++
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 ++
Documentation/admin-guide/hw-vuln/spectre.rst | 66 +-
Documentation/admin-guide/index.rst | 1 +
Documentation/admin-guide/kernel-parameters.txt | 39 +-
Documentation/core-api/dma-api.rst | 14 +
Documentation/filesystems/locking.rst | 10 +-
Documentation/filesystems/nfs/exporting.rst | 33 +
Documentation/x86/mds.rst | 34 +-
MAINTAINERS | 7 +
Makefile | 8 +-
arch/Kconfig | 24 +
arch/arm/boot/dts/mmp2-brownstone.dts | 2 +-
arch/arm64/boot/dts/qcom/sc7180-trogdor.dtsi | 2 +
arch/arm64/include/asm/kvm_pgtable.h | 18 +-
arch/arm64/include/asm/stage2_pgtable.h | 20 -
arch/arm64/kvm/mmu.c | 9 +-
arch/hexagon/kernel/vmlinux.lds.S | 1 +
arch/ia64/Kconfig | 1 +
arch/ia64/Makefile | 2 +-
arch/openrisc/kernel/dma.c | 16 +-
arch/parisc/include/asm/assembly.h | 18 +-
arch/parisc/include/asm/checksum.h | 10 +-
arch/powerpc/include/asm/reg_fsl_emb.h | 11 +-
arch/powerpc/lib/Makefile | 2 +-
arch/riscv/include/asm/uaccess.h | 4 +-
arch/riscv/kernel/process.c | 3 -
arch/s390/kernel/entry.S | 1 +
arch/sparc/kernel/nmi.c | 2 +-
arch/sparc/vdso/vma.c | 7 +-
arch/x86/Kconfig | 38 +
arch/x86/boot/compressed/head_64.S | 8 +
arch/x86/entry/common.c | 6 +-
arch/x86/entry/entry.S | 23 +
arch/x86/entry/entry_32.S | 3 +
arch/x86/entry/entry_64.S | 72 ++
arch/x86/entry/entry_64_compat.S | 4 +
arch/x86/entry/syscall_32.c | 21 +-
arch/x86/entry/syscall_64.c | 19 +-
arch/x86/entry/syscall_x32.c | 10 +-
arch/x86/include/asm/asm-prototypes.h | 1 +
arch/x86/include/asm/asm.h | 5 +
arch/x86/include/asm/cpufeature.h | 8 +-
arch/x86/include/asm/cpufeatures.h | 18 +-
arch/x86/include/asm/disabled-features.h | 3 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/linkage.h | 12 +-
arch/x86/include/asm/msr-index.h | 19 +-
arch/x86/include/asm/nospec-branch.h | 64 +-
arch/x86/include/asm/required-features.h | 3 +-
arch/x86/include/asm/suspend_32.h | 10 +-
arch/x86/include/asm/syscall.h | 10 +-
arch/x86/include/asm/text-patching.h | 31 +
arch/x86/kernel/alternative.c | 56 +-
arch/x86/kernel/cpu/amd.c | 10 +-
arch/x86/kernel/cpu/bugs.c | 360 ++++--
arch/x86/kernel/cpu/common.c | 77 +-
arch/x86/kernel/cpu/mce/core.c | 4 +-
arch/x86/kernel/cpu/scattered.c | 1 +
arch/x86/kernel/kprobes/core.c | 38 +-
arch/x86/kernel/nmi.c | 3 -
arch/x86/kernel/static_call.c | 50 +-
arch/x86/kvm/cpuid.c | 29 +-
arch/x86/kvm/reverse_cpuid.h | 45 +-
arch/x86/kvm/svm/sev.c | 18 +-
arch/x86/kvm/vmx/run_flags.h | 7 +-
arch/x86/kvm/vmx/vmenter.S | 11 +-
arch/x86/kvm/vmx/vmx.c | 12 +-
arch/x86/kvm/x86.c | 17 +-
arch/x86/lib/retpoline.S | 5 +-
arch/x86/mm/ident_map.c | 23 +-
block/blk-settings.c | 4 +
crypto/algboss.c | 4 +-
drivers/accessibility/speakup/synth.c | 4 +-
drivers/acpi/acpica/dbnames.c | 8 +-
drivers/acpi/cppc_acpi.c | 27 +-
drivers/ata/ahci.c | 5 -
drivers/ata/sata_mv.c | 63 +-
drivers/ata/sata_sx4.c | 6 +-
drivers/base/core.c | 26 +-
drivers/base/cpu.c | 8 +
drivers/base/power/wakeirq.c | 4 +-
drivers/clk/qcom/gcc-ipq6018.c | 2 +
drivers/clk/qcom/gcc-ipq8074.c | 2 +
drivers/clk/qcom/gcc-sdm845.c | 1 +
drivers/clk/qcom/mmcc-apq8084.c | 2 +
drivers/clk/qcom/mmcc-msm8974.c | 2 +
drivers/clocksource/arm_global_timer.c | 2 +-
drivers/cpufreq/brcmstb-avs-cpufreq.c | 5 +-
drivers/cpufreq/cpufreq-dt.c | 2 +-
drivers/crypto/qat/qat_common/adf_aer.c | 23 +-
drivers/firmware/efi/vars.c | 17 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_connectors.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 1 +
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 8 +-
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 12 +-
.../gpu/drm/amd/display/modules/hdcp/hdcp_psp.c | 3 +
drivers/gpu/drm/drm_panel.c | 17 +-
drivers/gpu/drm/etnaviv/etnaviv_drv.c | 2 +-
drivers/gpu/drm/etnaviv/etnaviv_hwdb.c | 9 +
drivers/gpu/drm/exynos/exynos_drm_vidi.c | 4 +-
drivers/gpu/drm/exynos/exynos_hdmi.c | 4 +-
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +
drivers/gpu/drm/i915/gt/intel_engine_pm.c | 3 -
.../gpu/drm/i915/gt/intel_execlists_submission.c | 3 +
drivers/gpu/drm/imx/parallel-display.c | 4 +-
drivers/gpu/drm/vc4/vc4_hdmi.c | 2 +-
drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 14 +-
drivers/hid/uhid.c | 20 +-
drivers/hwmon/amc6821.c | 11 +
drivers/i2c/busses/i2c-i801.c | 4 +-
drivers/infiniband/core/cm_trace.h | 2 +-
drivers/infiniband/core/cma_trace.h | 2 +-
drivers/iommu/dma-iommu.c | 15 +
drivers/iommu/iova.c | 5 +
drivers/md/dm-integrity.c | 2 +-
drivers/md/dm-raid.c | 2 +
drivers/md/dm-snap.c | 4 +-
drivers/media/tuners/xc4000.c | 4 +-
drivers/misc/mei/hw-me-regs.h | 2 +
drivers/misc/mei/pci-me.c | 2 +
drivers/mmc/core/block.c | 14 +-
drivers/mmc/host/tmio_mmc_core.c | 2 +
drivers/mtd/nand/raw/meson_nand.c | 2 +-
drivers/mtd/ubi/fastmap.c | 7 +-
drivers/mtd/ubi/vtbl.c | 6 +
drivers/net/ethernet/freescale/fec_main.c | 11 +-
.../ethernet/hisilicon/hns3/hns3pf/hclge_trace.h | 8 +-
.../ethernet/hisilicon/hns3/hns3vf/hclgevf_trace.h | 8 +-
drivers/net/ethernet/intel/i40e/i40e.h | 6 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 14 +-
drivers/net/ethernet/intel/i40e/i40e_ptp.c | 6 +-
drivers/net/ethernet/intel/i40e/i40e_register.h | 3 +
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 82 +-
drivers/net/ethernet/intel/i40e/i40e_txrx.h | 5 +-
drivers/net/ethernet/intel/i40e/i40e_virtchnl_pf.c | 34 +-
drivers/net/ethernet/intel/ixgbe/ixgbe_ipsec.c | 16 +-
drivers/net/ethernet/marvell/octeontx2/af/cgx.c | 5 +
.../net/ethernet/marvell/octeontx2/af/rvu_npc.c | 2 +-
.../net/ethernet/marvell/octeontx2/nic/otx2_pf.c | 2 +-
.../ethernet/mellanox/mlxbf_gige/mlxbf_gige_main.c | 31 +-
drivers/net/ethernet/realtek/r8169_main.c | 11 +-
drivers/net/ethernet/renesas/ravb_main.c | 8 +-
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 40 +-
.../net/ethernet/stmicro/stmmac/dwxgmac2_core.c | 38 +-
drivers/net/ethernet/xilinx/ll_temac_main.c | 2 +-
drivers/net/usb/asix.h | 3 +
drivers/net/usb/asix_devices.c | 20 +-
drivers/net/wireguard/netlink.c | 10 +-
.../broadcom/brcm80211/brcmfmac/cfg80211.c | 4 +-
drivers/net/wireless/intel/iwlwifi/mvm/rfi.c | 12 +-
drivers/net/xen-netfront.c | 1 +
drivers/nvme/host/core.c | 6 +-
drivers/nvmem/meson-efuse.c | 25 +-
drivers/of/dynamic.c | 12 +
drivers/pci/controller/dwc/pcie-designware-ep.c | 7 +-
drivers/pci/pci-driver.c | 23 +-
drivers/pci/pcie/dpc.c | 15 +-
drivers/pci/pcie/err.c | 20 +
drivers/pci/quirks.c | 100 ++
drivers/pci/setup-res.c | 8 +-
drivers/phy/tegra/xusb.c | 13 +
drivers/s390/crypto/zcrypt_api.c | 2 +
drivers/s390/net/qeth_core_main.c | 38 +-
drivers/scsi/hosts.c | 7 +-
drivers/scsi/lpfc/lpfc_nvmet.c | 2 +-
drivers/scsi/myrb.c | 20 +-
drivers/scsi/myrs.c | 24 +-
drivers/scsi/qla2xxx/qla_attr.c | 14 +-
drivers/scsi/qla2xxx/qla_def.h | 2 +-
drivers/scsi/qla2xxx/qla_gbl.h | 2 +-
drivers/scsi/qla2xxx/qla_gs.c | 2 +-
drivers/scsi/qla2xxx/qla_init.c | 128 +--
drivers/scsi/qla2xxx/qla_iocb.c | 68 +-
drivers/scsi/qla2xxx/qla_mbx.c | 2 +-
drivers/scsi/qla2xxx/qla_os.c | 2 +-
drivers/scsi/qla2xxx/qla_target.c | 10 +
drivers/slimbus/core.c | 4 +-
drivers/soc/fsl/qbman/qman.c | 98 +-
drivers/staging/media/ipu3/ipu3-v4l2.c | 16 +-
.../staging/vc04_services/vchiq-mmal/mmal-vchiq.c | 5 +-
drivers/tee/optee/device.c | 3 +-
drivers/thermal/devfreq_cooling.c | 2 +-
drivers/tty/serial/8250/8250_port.c | 6 -
drivers/tty/serial/fsl_lpuart.c | 7 +-
drivers/tty/serial/imx.c | 22 +-
drivers/tty/serial/sc16is7xx.c | 15 +-
drivers/tty/serial/serial_core.c | 12 +
drivers/tty/vt/vt.c | 2 +-
drivers/usb/class/cdc-wdm.c | 6 +-
drivers/usb/core/hub.c | 23 +-
drivers/usb/core/hub.h | 2 +
drivers/usb/core/port.c | 5 +-
drivers/usb/core/sysfs.c | 16 +-
drivers/usb/dwc2/core.h | 14 +
drivers/usb/dwc2/core_intr.c | 72 +-
drivers/usb/dwc2/gadget.c | 10 +
drivers/usb/dwc2/hcd.c | 49 +-
drivers/usb/dwc2/hcd_ddma.c | 17 +-
drivers/usb/dwc2/hw.h | 2 +-
drivers/usb/dwc2/platform.c | 2 +-
drivers/usb/gadget/function/f_ncm.c | 2 +-
drivers/usb/gadget/udc/core.c | 4 +-
drivers/usb/gadget/udc/tegra-xudc.c | 39 +-
drivers/usb/host/xhci.c | 2 +
drivers/usb/phy/phy-generic.c | 7 -
drivers/usb/serial/cp210x.c | 4 +
drivers/usb/serial/ftdi_sio.c | 2 +
drivers/usb/serial/ftdi_sio_ids.h | 6 +
drivers/usb/serial/option.c | 6 +
drivers/usb/storage/isd200.c | 23 +-
drivers/usb/storage/scsiglue.c | 1 -
drivers/usb/storage/uas.c | 81 +-
drivers/usb/storage/usb.c | 4 +-
drivers/usb/typec/ucsi/ucsi.c | 42 +-
drivers/usb/typec/ucsi/ucsi.h | 4 +-
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 +-
drivers/vfio/pci/vfio_pci_intrs.c | 188 +--
drivers/vfio/platform/vfio_platform_irq.c | 106 +-
drivers/vfio/virqfd.c | 21 +
drivers/xen/events/events_base.c | 5 +-
fs/Kconfig | 2 +-
fs/aio.c | 8 +-
fs/btrfs/scrub.c | 12 +-
fs/btrfs/volumes.c | 2 +-
fs/cifs/connect.c | 2 +-
fs/exec.c | 1 +
fs/exportfs/expfs.c | 8 +-
fs/ext4/mballoc.c | 17 +-
fs/ext4/resize.c | 3 +-
fs/ext4/super.c | 10 +-
fs/fat/nfs.c | 6 +
fs/fuse/dir.c | 4 +
fs/fuse/fuse_i.h | 1 -
fs/fuse/inode.c | 7 +-
fs/ksmbd/smb2pdu.c | 10 +-
fs/lockd/host.c | 2 +-
fs/lockd/svc.c | 223 ++--
fs/lockd/svc4proc.c | 29 +-
fs/lockd/svclock.c | 31 +-
fs/lockd/svcproc.c | 30 +-
fs/lockd/svcsubs.c | 4 +-
fs/lockd/xdr.c | 152 ++-
fs/lockd/xdr4.c | 153 ++-
fs/locks.c | 85 +-
fs/nfs/callback.c | 96 +-
fs/nfs/callback_xdr.c | 5 +-
fs/nfs/direct.c | 11 +-
fs/nfs/export.c | 9 +-
fs/nfs/nfs4state.c | 2 +-
fs/nfs/nfs4trace.h | 477 +-------
fs/nfs/nfstrace.h | 269 +----
fs/nfs/pnfs.h | 4 -
fs/nfs/write.c | 2 +-
fs/nfsd/Kconfig | 27 +-
fs/nfsd/Makefile | 8 +-
fs/nfsd/acl.h | 6 +-
fs/nfsd/blocklayout.c | 1 +
fs/nfsd/blocklayoutxdr.c | 1 +
fs/nfsd/cache.h | 2 +-
fs/nfsd/export.h | 1 -
fs/nfsd/filecache.c | 1192 +++++++++++---------
fs/nfsd/filecache.h | 19 +-
fs/nfsd/flexfilelayout.c | 3 +-
fs/nfsd/lockd.c | 2 +-
fs/nfsd/netns.h | 34 +-
fs/nfsd/nfs2acl.c | 55 +-
fs/nfsd/nfs3acl.c | 83 +-
fs/nfsd/nfs3proc.c | 212 +++-
fs/nfsd/nfs3xdr.c | 444 +++-----
fs/nfsd/nfs4acl.c | 46 +-
fs/nfsd/nfs4callback.c | 125 +-
fs/nfsd/nfs4idmap.c | 9 +-
fs/nfsd/nfs4layouts.c | 4 +-
fs/nfsd/nfs4proc.c | 986 +++++++++-------
fs/nfsd/nfs4recover.c | 12 +-
fs/nfsd/nfs4state.c | 1049 +++++++++++++----
fs/nfsd/nfs4xdr.c | 1115 +++++++++---------
fs/nfsd/nfscache.c | 63 +-
fs/nfsd/nfsctl.c | 146 ++-
fs/nfsd/nfsd.h | 35 +-
fs/nfsd/nfsfh.c | 264 ++---
fs/nfsd/nfsfh.h | 145 ++-
fs/nfsd/nfsproc.c | 121 +-
fs/nfsd/nfssvc.c | 263 ++---
fs/nfsd/nfsxdr.c | 178 ++-
fs/nfsd/state.h | 59 +-
fs/nfsd/stats.c | 16 +-
fs/nfsd/stats.h | 4 +-
fs/nfsd/trace.h | 692 ++++++++++--
fs/nfsd/vfs.c | 822 +++++++-------
fs/nfsd/vfs.h | 56 +-
fs/nfsd/xdr.h | 35 +-
fs/nfsd/xdr3.h | 61 +-
fs/nfsd/xdr4.h | 81 +-
fs/nfsd/xdr4cb.h | 6 +
fs/nilfs2/btree.c | 9 +-
fs/nilfs2/direct.c | 9 +-
fs/nilfs2/inode.c | 2 +-
fs/notify/dnotify/dnotify.c | 15 +-
fs/notify/fanotify/fanotify.c | 363 ++++--
fs/notify/fanotify/fanotify.h | 212 +++-
fs/notify/fanotify/fanotify_user.c | 441 ++++++--
fs/notify/fdinfo.c | 16 +-
fs/notify/fsnotify.c | 177 +--
fs/notify/fsnotify.h | 4 -
fs/notify/group.c | 36 +-
fs/notify/inotify/inotify.h | 11 +-
fs/notify/inotify/inotify_fsnotify.c | 7 +-
fs/notify/inotify/inotify_user.c | 53 +-
fs/notify/mark.c | 137 ++-
fs/notify/notification.c | 14 +-
fs/open.c | 42 +
fs/pipe.c | 17 +-
fs/ubifs/file.c | 13 +-
fs/vboxsf/super.c | 3 +-
include/asm-generic/vmlinux.lds.h | 4 +-
include/linux/cpu.h | 2 +
include/linux/device.h | 1 +
include/linux/dma-map-ops.h | 1 +
include/linux/dma-mapping.h | 5 +
include/linux/dnotify.h | 2 +-
include/linux/exportfs.h | 17 +-
include/linux/fanotify.h | 31 +-
include/linux/fs.h | 26 +
include/linux/fsnotify.h | 70 +-
include/linux/fsnotify_backend.h | 356 +++++-
include/linux/gfp.h | 9 +
include/linux/hyperv.h | 22 +-
include/linux/iova.h | 2 +
include/linux/kthread.h | 1 +
include/linux/linkage.h | 4 +-
include/linux/lockd/lockd.h | 10 +-
include/linux/lockd/xdr.h | 27 +-
include/linux/lockd/xdr4.h | 29 +-
include/linux/minmax.h | 17 +
include/linux/module.h | 6 +-
include/linux/nfs.h | 8 -
include/linux/nfs4.h | 17 +
include/linux/nfs_fs.h | 1 +
include/linux/nfs_ssc.h | 4 +-
include/linux/pci.h | 1 +
include/linux/phy/tegra/xusb.h | 1 +
include/linux/ring_buffer.h | 1 +
include/linux/secretmem.h | 4 +-
include/linux/sunrpc/svc.h | 93 +-
include/linux/sunrpc/svc_xprt.h | 11 +-
include/linux/sunrpc/svcsock.h | 7 +-
include/linux/sunrpc/xdr.h | 2 +
include/linux/timer.h | 18 +-
include/linux/udp.h | 28 +
include/linux/vfio.h | 2 +
include/net/cfg802154.h | 1 +
include/net/inet_connection_sock.h | 1 +
include/net/sock.h | 7 +
include/soc/fsl/qman.h | 9 +
include/trace/events/rpcgss.h | 18 +-
include/trace/events/rpcrdma.h | 44 +-
include/trace/events/sunrpc.h | 74 +-
include/trace/misc/fs.h | 122 ++
include/trace/misc/nfs.h | 387 +++++++
include/trace/{events => misc}/rdma.h | 0
include/trace/misc/sunrpc.h | 18 +
include/uapi/linux/fanotify.h | 29 +
include/uapi/linux/nfsd/nfsfh.h | 115 --
init/initramfs.c | 2 +-
io_uring/io_uring.c | 2 +-
kernel/audit_fsnotify.c | 8 +-
kernel/audit_tree.c | 2 +-
kernel/audit_watch.c | 5 +-
kernel/bounds.c | 2 +-
kernel/bpf/verifier.c | 5 +
kernel/dma/mapping.c | 12 +
kernel/dma/swiotlb.c | 11 +-
kernel/entry/common.c | 8 +-
kernel/events/core.c | 9 +
kernel/kthread.c | 23 +-
kernel/locking/rwsem.c | 14 +-
kernel/module.c | 8 +-
kernel/power/suspend.c | 1 +
kernel/printk/printk.c | 63 +-
kernel/time/timer.c | 160 +--
kernel/trace/ring_buffer.c | 233 ++--
kernel/trace/trace.c | 21 +-
lib/Kconfig.debug | 1 +
lib/pci_iomap.c | 2 +-
lib/test_kasan.c | 21 +-
mm/compaction.c | 7 +-
mm/memtest.c | 4 +-
mm/migrate.c | 6 +-
mm/page_alloc.c | 10 +-
mm/swapfile.c | 25 +-
mm/vmscan.c | 5 +-
net/bluetooth/bnep/core.c | 2 +-
net/bluetooth/cmtp/core.c | 2 +-
net/bluetooth/hci_debugfs.c | 64 +-
net/bluetooth/hci_event.c | 25 +
net/bluetooth/hidp/core.c | 2 +-
net/bridge/netfilter/ebtables.c | 6 +
net/core/skbuff.c | 6 +-
net/core/sock_map.c | 6 +
net/ipv4/inet_connection_sock.c | 14 +
net/ipv4/ip_gre.c | 5 +
net/ipv4/netfilter/arp_tables.c | 4 +
net/ipv4/netfilter/ip_tables.c | 4 +
net/ipv4/tcp.c | 2 +
net/ipv4/udp.c | 7 +
net/ipv4/udp_offload.c | 20 +-
net/ipv6/ip6_fib.c | 14 +-
net/ipv6/ip6_gre.c | 3 +
net/ipv6/netfilter/ip6_tables.c | 4 +
net/ipv6/udp.c | 2 +-
net/ipv6/udp_offload.c | 8 +-
net/mac80211/cfg.c | 5 +-
net/mac802154/llsec.c | 18 +-
net/mptcp/protocol.c | 3 -
net/mptcp/subflow.c | 3 +
net/netfilter/nf_tables_api.c | 20 +-
net/nfc/nci/core.c | 5 +
net/rds/rdma.c | 2 +-
net/sched/act_skbmod.c | 10 +-
net/sunrpc/svc.c | 227 ++--
net/sunrpc/svc_xprt.c | 68 +-
net/sunrpc/svcsock.c | 24 +-
net/sunrpc/xdr.c | 22 +
net/sunrpc/xprtrdma/svc_rdma_backchannel.c | 2 +-
net/xfrm/xfrm_user.c | 3 +
scripts/Makefile.extrawarn | 2 +
security/landlock/syscalls.c | 18 +-
security/smack/smack_lsm.c | 12 +-
sound/pci/hda/patch_realtek.c | 9 +-
sound/sh/aica.c | 17 +-
sound/soc/codecs/rt5682-sdw.c | 4 +-
sound/soc/codecs/rt711-sdca-sdw.c | 4 +-
sound/soc/codecs/rt711-sdw.c | 4 +-
sound/soc/soc-ops.c | 2 +-
tools/objtool/check.c | 3 +-
tools/testing/selftests/mqueue/setting | 1 +
tools/testing/selftests/net/mptcp/diag.sh | 6 +-
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 7 +
tools/testing/selftests/net/reuseaddr_conflict.c | 2 +-
tools/testing/selftests/net/udpgro_fwd.sh | 10 +-
virt/kvm/async_pf.c | 31 +-
445 files changed, 11948 insertions(+), 6886 deletions(-)
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 6d029c25b71f2de2838a6f093ce0fa0e69336154
Gitweb: https://git.kernel.org/tip/6d029c25b71f2de2838a6f093ce0fa0e69336154
Author: Oleg Nesterov <oleg(a)redhat.com>
AuthorDate: Tue, 09 Apr 2024 15:38:03 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 09 Apr 2024 17:48:19 +02:00
selftests/timers/posix_timers: Reimplement check_timer_distribution()
check_timer_distribution() runs ten threads in a busy loop and tries to
test that the kernel distributes a process posix CPU timer signal to every
thread over time.
There is not guarantee that this is true even after commit bcb7ee79029d
("posix-timers: Prefer delivery of signals to the current thread") because
that commit only avoids waking up the sleeping process leader thread, but
that has nothing to do with the actual signal delivery.
As the signal is process wide the first thread which observes sigpending
and wins the race to lock sighand will deliver the signal. Testing shows
that this hangs on a regular base because some threads never win the race.
The comment "This primarily tests that the kernel does not favour any one."
is wrong. The kernel does favour a thread which hits the timer interrupt
when CLOCK_PROCESS_CPUTIME_ID expires.
Rewrite the test so it only checks that the group leader sleeping in join()
never receives SIGALRM and the thread which burns CPU cycles receives all
signals.
In older kernels which do not have commit bcb7ee79029d ("posix-timers:
Prefer delivery of signals to the current thread") the test-case fails
immediately, the very 1st tick wakes the leader up. Otherwise it quickly
succeeds after 100 ticks.
CI testing wants to use newer selftest versions on stable kernels. In this
case the test is guaranteed to fail.
So check in the failure case whether the kernel version is less than v6.3
and skip the test result in that case.
[ tglx: Massaged change log, renamed the version check helper ]
Fixes: e797203fb3ba ("selftests/timers/posix_timers: Test delivery of signals across threads")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240409133802.GD29396@redhat.com
---
tools/testing/selftests/kselftest.h | 13 ++-
tools/testing/selftests/timers/posix_timers.c | 103 +++++++----------
2 files changed, 60 insertions(+), 56 deletions(-)
diff --git a/tools/testing/selftests/kselftest.h b/tools/testing/selftests/kselftest.h
index 541bf19..973b18e 100644
--- a/tools/testing/selftests/kselftest.h
+++ b/tools/testing/selftests/kselftest.h
@@ -51,6 +51,7 @@
#include <stdarg.h>
#include <string.h>
#include <stdio.h>
+#include <sys/utsname.h>
#endif
#ifndef ARRAY_SIZE
@@ -388,4 +389,16 @@ static inline __printf(1, 2) int ksft_exit_skip(const char *msg, ...)
exit(KSFT_SKIP);
}
+static inline int ksft_min_kernel_version(unsigned int min_major,
+ unsigned int min_minor)
+{
+ unsigned int major, minor;
+ struct utsname info;
+
+ if (uname(&info) || sscanf(info.release, "%u.%u.", &major, &minor) != 2)
+ ksft_exit_fail_msg("Can't parse kernel version\n");
+
+ return major > min_major || (major == min_major && minor >= min_minor);
+}
+
#endif /* __KSELFTEST_H */
diff --git a/tools/testing/selftests/timers/posix_timers.c b/tools/testing/selftests/timers/posix_timers.c
index d49dd3f..d86a0e0 100644
--- a/tools/testing/selftests/timers/posix_timers.c
+++ b/tools/testing/selftests/timers/posix_timers.c
@@ -184,80 +184,71 @@ static int check_timer_create(int which)
return 0;
}
-int remain;
-__thread int got_signal;
+static pthread_t ctd_thread;
+static volatile int ctd_count, ctd_failed;
-static void *distribution_thread(void *arg)
+static void ctd_sighandler(int sig)
{
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
- return NULL;
+ if (pthread_self() != ctd_thread)
+ ctd_failed = 1;
+ ctd_count--;
}
-static void distribution_handler(int nr)
+static void *ctd_thread_func(void *arg)
{
- if (!__atomic_exchange_n(&got_signal, 1, __ATOMIC_RELAXED))
- __atomic_fetch_sub(&remain, 1, __ATOMIC_RELAXED);
-}
-
-/*
- * Test that all running threads _eventually_ receive CLOCK_PROCESS_CPUTIME_ID
- * timer signals. This primarily tests that the kernel does not favour any one.
- */
-static int check_timer_distribution(void)
-{
- int err, i;
- timer_t id;
- const int nthreads = 10;
- pthread_t threads[nthreads];
struct itimerspec val = {
.it_value.tv_sec = 0,
.it_value.tv_nsec = 1000 * 1000,
.it_interval.tv_sec = 0,
.it_interval.tv_nsec = 1000 * 1000,
};
+ timer_t id;
- remain = nthreads + 1; /* worker threads + this thread */
- signal(SIGALRM, distribution_handler);
- err = timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id);
- if (err < 0) {
- ksft_perror("Can't create timer");
- return -1;
- }
- err = timer_settime(id, 0, &val, NULL);
- if (err < 0) {
- ksft_perror("Can't set timer");
- return -1;
- }
+ /* 1/10 seconds to ensure the leader sleeps */
+ usleep(10000);
- for (i = 0; i < nthreads; i++) {
- err = pthread_create(&threads[i], NULL, distribution_thread,
- NULL);
- if (err) {
- ksft_print_msg("Can't create thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ ctd_count = 100;
+ if (timer_create(CLOCK_PROCESS_CPUTIME_ID, NULL, &id))
+ return "Can't create timer\n";
+ if (timer_settime(id, 0, &val, NULL))
+ return "Can't set timer\n";
- /* Wait for all threads to receive the signal. */
- while (__atomic_load_n(&remain, __ATOMIC_RELAXED));
+ while (ctd_count > 0 && !ctd_failed)
+ ;
- for (i = 0; i < nthreads; i++) {
- err = pthread_join(threads[i], NULL);
- if (err) {
- ksft_print_msg("Can't join thread: %s (%d)\n",
- strerror(errno), errno);
- return -1;
- }
- }
+ if (timer_delete(id))
+ return "Can't delete timer\n";
- if (timer_delete(id)) {
- ksft_perror("Can't delete timer");
- return -1;
- }
+ return NULL;
+}
+
+/*
+ * Test that only the running thread receives the timer signal.
+ */
+static int check_timer_distribution(void)
+{
+ const char *errmsg;
- ksft_test_result_pass("check_timer_distribution\n");
+ signal(SIGALRM, ctd_sighandler);
+
+ errmsg = "Can't create thread\n";
+ if (pthread_create(&ctd_thread, NULL, ctd_thread_func, NULL))
+ goto err;
+
+ errmsg = "Can't join thread\n";
+ if (pthread_join(ctd_thread, (void **)&errmsg) || errmsg)
+ goto err;
+
+ if (!ctd_failed)
+ ksft_test_result_pass("check signal distribution\n");
+ else if (ksft_min_kernel_version(6, 3))
+ ksft_test_result_fail("check signal distribution\n");
+ else
+ ksft_test_result_skip("check signal distribution (old kernel)\n");
return 0;
+err:
+ ksft_print_msg(errmsg);
+ return -1;
}
int main(int argc, char **argv)
The conditional was supposed to prevent enabling of a crtc state
without a set primary plane. Accidently it also prevented disabling
crtc state with a set primary plane. Neither is correct.
Fix the conditional and just driver-warn when a crtc state has been
enabled without a primary plane which will help debug broken userspace.
Fixes IGT's kms_atomic_interruptible and kms_atomic_transition tests.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 06ec41909e31 ("drm/vmwgfx: Add and connect CRTC helper functions")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v4.12+
---
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index e33e5993d8fc..13b2820cae51 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -931,6 +931,7 @@ int vmw_du_cursor_plane_atomic_check(struct drm_plane *plane,
int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
struct drm_atomic_state *state)
{
+ struct vmw_private *vmw = vmw_priv(crtc->dev);
struct drm_crtc_state *new_state = drm_atomic_get_new_crtc_state(state,
crtc);
struct vmw_display_unit *du = vmw_crtc_to_du(new_state->crtc);
@@ -938,9 +939,13 @@ int vmw_du_crtc_atomic_check(struct drm_crtc *crtc,
bool has_primary = new_state->plane_mask &
drm_plane_mask(crtc->primary);
- /* We always want to have an active plane with an active CRTC */
- if (has_primary != new_state->enable)
- return -EINVAL;
+ /*
+ * This is fine in general, but broken userspace might expect
+ * some actual rendering so give a clue as why it's blank.
+ */
+ if (new_state->enable && !has_primary)
+ drm_dbg_driver(&vmw->drm,
+ "CRTC without a primary plane will be blank.\n");
if (new_state->connector_mask != connector_mask &&
--
2.40.1
[BUG]
During my extent_map cleanup/refactor, with more than too strict sanity
checks, extent-map-tests::test_case_7() would crash my extent_map sanity
checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted
extent_map has a @block_start way too large.
Meanwhile my btrfs_file_extent_item based members are returning a
correct @disk_bytenr along with correct @offset.
The extent map layout looks like this:
0 16K 32K 48K
| PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be
[36K, 48K).
However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE]
Inside btrfs_drop_extent_map_range() function, if we hit an extent_map
that covers the target range but is still beyond it, we need to split
that extent map into half:
|<-- drop range -->|
|<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward
extent_map::block_start by the difference between the end of drop range
and the extent map start.
However in that particular case, the difference is calculated using
(start + len - em->start).
The problem is @start can be modified if the drop range covers any
pinned extent.
This leads to wrong calculation, and would be caught by my later
extent_map sanity checks, which checks the em::block_start against
btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
And unfortunately this is going to cause data corruption, as the
splitted em is pointing an incorrect location, can cause either
unexpected read error or wild writes.
[FIX]
Fix it by avoiding using @start completely, and use @end - em->start
instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such
problem from happening.
CC: stable(a)vger.kernel.org # 6.7+
Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_map.c | 2 +-
fs/btrfs/tests/extent-map-tests.c | 6 +++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 471654cb65b0..955ce300e5a1 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -799,7 +799,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
split->block_len = em->block_len;
split->orig_start = em->orig_start;
} else {
- const u64 diff = start + len - em->start;
+ const u64 diff = end - em->start;
split->block_len = split->len;
split->block_start += diff;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 253cce7ffecf..80e71c5cb7ab 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -818,7 +818,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
test_err("em->len is %llu, expected 16K", em->len);
goto out;
}
-
free_extent_map(em);
read_lock(&em_tree->lock);
@@ -847,6 +846,11 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
goto out;
}
+ if (em->block_start != SZ_32K + SZ_4K) {
+ test_err("em->block_start is %llu, expected 36K", em->block_start);
+ goto out;
+ }
+
free_extent_map(em);
read_lock(&em_tree->lock);
--
2.44.0
Enable DMA mappings in vmwgfx after TTM has been fixed in commit
3bf3710e3718 ("drm/ttm: Add a generic TTM memcpy move for page-based iomem")
This enables full guest-backed memory support and in particular allows
usage of screen targets as the presentation mechanism.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Reported-by: Ye Li <ye.li(a)broadcom.com>
Tested-by: Ye Li <ye.li(a)broadcom.com>
Fixes: 3b0d6458c705 ("drm/vmwgfx: Refuse DMA operation when SEV encryption is active")
Cc: Broadcom internal kernel review list <bcm-kernel-feedback-list(a)broadcom.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: <stable(a)vger.kernel.org> # v6.6+
---
drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
index 41ad13e45554..bdad93864b98 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c
@@ -667,11 +667,12 @@ static int vmw_dma_select_mode(struct vmw_private *dev_priv)
[vmw_dma_map_populate] = "Caching DMA mappings.",
[vmw_dma_map_bind] = "Giving up DMA mappings early."};
- /* TTM currently doesn't fully support SEV encryption. */
- if (cc_platform_has(CC_ATTR_MEM_ENCRYPT))
- return -EINVAL;
-
- if (vmw_force_coherent)
+ /*
+ * When running with SEV we always want dma mappings, because
+ * otherwise ttm tt pool pages will bounce through swiotlb running
+ * out of available space.
+ */
+ if (vmw_force_coherent || cc_platform_has(CC_ATTR_MEM_ENCRYPT))
dev_priv->map_mode = vmw_dma_alloc_coherent;
else if (vmw_restrict_iommu)
dev_priv->map_mode = vmw_dma_map_bind;
--
2.40.1
[BUG]
During my extent_map cleanup/refactor, with extra sanity checks,
extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted
extent_map has a @block_start way too large.
Meanwhile my btrfs_file_extent_item based members are returning a
correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K
| PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be
[36K, 48K).
However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE]
Inside btrfs_drop_extent_map_range() function, if we hit an extent_map
that covers the target range but is still beyond it, we need to split
that extent map into half:
|<-- drop range -->|
|<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward
extent_map::block_start by the difference between the end of drop range
and the extent map start.
However in that particular case, the difference is calculated using
(start + len - em->start).
The problem is @start can be modified if the drop range covers any
pinned extent.
This leads to wrong calculation, and would be caught by my later
extent_map sanity checks, which checks the em::block_start against
btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix
incorrect splitting in btrfs_drop_extent_map_range"), which removed the
@len update for pinned extents.
[FIX]
Fix it by avoiding using @start completely, and use @end - em->start
instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such
problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path
does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable(a)vger.kernel.org # 6.5+
Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
Changelog:
v2:
- Remove the mention of possible corruption
Thankfully this bug does not affect IO path thus it's fine.
- Explain why c962098ca4af is the cause
v3:
- Fix an accidental removal of a newline
---
fs/btrfs/extent_map.c | 2 +-
fs/btrfs/tests/extent-map-tests.c | 5 +++++
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 471654cb65b0..955ce300e5a1 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -799,7 +799,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
split->block_len = em->block_len;
split->orig_start = em->orig_start;
} else {
- const u64 diff = start + len - em->start;
+ const u64 diff = end - em->start;
split->block_len = split->len;
split->block_start += diff;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 253cce7ffecf..47b5d301038e 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -847,6 +847,11 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
goto out;
}
+ if (em->block_start != SZ_32K + SZ_4K) {
+ test_err("em->block_start is %llu, expected 36K", em->block_start);
+ goto out;
+ }
+
free_extent_map(em);
read_lock(&em_tree->lock);
--
2.44.0
[BUG]
During my extent_map cleanup/refactor, with extra sanity checks,
extent-map-tests::test_case_7() would not pass the checks.
The problem is, after btrfs_drop_extent_map_range(), the resulted
extent_map has a @block_start way too large.
Meanwhile my btrfs_file_extent_item based members are returning a
correct @disk_bytenr/@offset combination.
The extent map layout looks like this:
0 16K 32K 48K
| PINNED | | Regular |
The regular em at [32K, 48K) also has 32K @block_start.
Then drop range [0, 36K), which should shrink the regular one to be
[36K, 48K).
However the @block_start is incorrect, we expect 32K + 4K, but got 52K.
[CAUSE]
Inside btrfs_drop_extent_map_range() function, if we hit an extent_map
that covers the target range but is still beyond it, we need to split
that extent map into half:
|<-- drop range -->|
|<----- existing extent_map --->|
And if the extent map is not compressed, we need to forward
extent_map::block_start by the difference between the end of drop range
and the extent map start.
However in that particular case, the difference is calculated using
(start + len - em->start).
The problem is @start can be modified if the drop range covers any
pinned extent.
This leads to wrong calculation, and would be caught by my later
extent_map sanity checks, which checks the em::block_start against
btrfs_file_extent_item::disk_bytenr + btrfs_file_extent_item::offset.
This is a regression caused by commit c962098ca4af ("btrfs: fix
incorrect splitting in btrfs_drop_extent_map_range"), which removed the
@len update for pinned extents.
[FIX]
Fix it by avoiding using @start completely, and use @end - em->start
instead, which @end is exclusive bytenr number.
And update the test case to verify the @block_start to prevent such
problem from happening.
Thankfully this is not going to lead to any data corruption, as IO path
does not utilize btrfs_drop_extent_map_range() with @skip_pinned set.
So this fix is only here for the sake of consistency/correctness.
CC: stable(a)vger.kernel.org # 6.5+
Fixes: c962098ca4af ("btrfs: fix incorrect splitting in btrfs_drop_extent_map_range")
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
Changelog:
v2:
- Remove the mention of possible corruption
Thankfully this bug does not affect IO path thus it's fine.
- Explain why c962098ca4af is the cause
---
fs/btrfs/extent_map.c | 2 +-
fs/btrfs/tests/extent-map-tests.c | 6 +++++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 471654cb65b0..955ce300e5a1 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -799,7 +799,7 @@ void btrfs_drop_extent_map_range(struct btrfs_inode *inode, u64 start, u64 end,
split->block_len = em->block_len;
split->orig_start = em->orig_start;
} else {
- const u64 diff = start + len - em->start;
+ const u64 diff = end - em->start;
split->block_len = split->len;
split->block_start += diff;
diff --git a/fs/btrfs/tests/extent-map-tests.c b/fs/btrfs/tests/extent-map-tests.c
index 253cce7ffecf..80e71c5cb7ab 100644
--- a/fs/btrfs/tests/extent-map-tests.c
+++ b/fs/btrfs/tests/extent-map-tests.c
@@ -818,7 +818,6 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
test_err("em->len is %llu, expected 16K", em->len);
goto out;
}
-
free_extent_map(em);
read_lock(&em_tree->lock);
@@ -847,6 +846,11 @@ static int test_case_7(struct btrfs_fs_info *fs_info)
goto out;
}
+ if (em->block_start != SZ_32K + SZ_4K) {
+ test_err("em->block_start is %llu, expected 36K", em->block_start);
+ goto out;
+ }
+
free_extent_map(em);
read_lock(&em_tree->lock);
--
2.44.0
I split before patch to two patch. one for bugfix, anorther one for cleanup
Yi Yang (2):
net: usb: asix: Add check for usbnet_get_endpoints
net: usb: asix: Replace the direct return with goto statement
drivers/net/usb/asix_devices.c | 20 +++++++++++++-------
1 file changed, 13 insertions(+), 7 deletions(-)
--
2.25.1
From: Arnd Bergmann <arnd(a)arndb.de>
I ran into a randconfig build failure with UBSAN using gcc-13.2:
arm-linux-gnueabi-ld: error: unplaced orphan section `.bss..Lubsan_data31' from `drivers/mtd/nand/raw/diskonchip.o'
I'm not entirely sure what is going on here, but I suspect this has something
to do with the check for the end of the doc_locations[] array that contains
an (unsigned long)0xffffffff element, which is compared against the signed
(int)0xffffffff. If this is the case, we should get a runtime check for
undefined behavior, but we instead get an unexpected build-time error.
I would have expected this to work fine on 32-bit architectures despite the
signed integer overflow, though on 64-bit architectures this likely won't
ever work.
Changing the contition to instead check for the size of the array makes the
code safe everywhere and avoids the ubsan check that leads to the link
error. The loop code goes back to before 2.6.12.
Cc: stable(a)vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
drivers/mtd/nand/raw/diskonchip.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/diskonchip.c b/drivers/mtd/nand/raw/diskonchip.c
index 5243fab9face..8db7fc424571 100644
--- a/drivers/mtd/nand/raw/diskonchip.c
+++ b/drivers/mtd/nand/raw/diskonchip.c
@@ -53,7 +53,7 @@ static unsigned long doc_locations[] __initdata = {
0xe8000, 0xea000, 0xec000, 0xee000,
#endif
#endif
- 0xffffffff };
+};
static struct mtd_info *doclist = NULL;
@@ -1554,7 +1554,7 @@ static int __init init_nanddoc(void)
if (ret < 0)
return ret;
} else {
- for (i = 0; (doc_locations[i] != 0xffffffff); i++) {
+ for (i = 0; i < ARRAY_SIZE(doc_locations); i++) {
doc_probe(doc_locations[i]);
}
}
--
2.39.2
In the __cgroup_bpf_query() function, it is possible to dereference
the null pointer in the line id = prog->aux->id; since there is no
check for a non-zero value of the variable prog.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: af6eea57437a ("bpf: Implement bpf_link-based cgroup BPF program attachment")
Cc: stable(a)vger.kernel.org
Signed-off-by: Mikhail Lobanov <m.lobanov(a)rosalinux.ru>
---
kernel/bpf/cgroup.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/kernel/bpf/cgroup.c b/kernel/bpf/cgroup.c
index 491d20038cbe..7f2db96f0c6a 100644
--- a/kernel/bpf/cgroup.c
+++ b/kernel/bpf/cgroup.c
@@ -1092,6 +1092,8 @@ static int __cgroup_bpf_query(struct cgroup *cgrp, const union bpf_attr *attr,
i = 0;
hlist_for_each_entry(pl, progs, node) {
prog = prog_list_prog(pl);
+ if (!prog_list_prog(pl))
+ continue;
id = prog->aux->id;
if (copy_to_user(prog_ids + i, &id, sizeof(id)))
return -EFAULT;
--
2.43.0
Since commit 1a50d9403fb9 ("treewide: Fix probing of devices in DT
overlays"), when using device-tree overlays, the FWNODE_FLAG_NOT_DEVICE
is set on each overlay nodes. This flag is cleared when a struct device
is actually created for the DT node.
Also, when a device is created, the device DT node is parsed for known
phandle and devlinks consumer/supplier links are created between the
device (consumer) and the devices referenced by phandles (suppliers).
As these supplier device can have a struct device not already created,
the FWNODE_FLAG_NOT_DEVICE can be set for suppliers and leads the
devlink supplier point to the device's parent instead of the device
itself.
Avoid this situation clearing the supplier FWNODE_FLAG_NOT_DEVICE just
before the devlink creation if a device is supposed to be created and
handled later in the process.
Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/of/property.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/of/property.c b/drivers/of/property.c
index 641a40cf5cf3..ff5cac477dbe 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1097,6 +1097,7 @@ static void of_link_to_phandle(struct device_node *con_np,
struct device_node *sup_np)
{
struct device_node *tmp_np = of_node_get(sup_np);
+ struct fwnode_handle *sup_fwnode;
/* Check that sup_np and its ancestors are available. */
while (tmp_np) {
@@ -1113,7 +1114,20 @@ static void of_link_to_phandle(struct device_node *con_np,
tmp_np = of_get_next_parent(tmp_np);
}
- fwnode_link_add(of_fwnode_handle(con_np), of_fwnode_handle(sup_np));
+ /*
+ * In case of overlays, the fwnode are added with FWNODE_FLAG_NOT_DEVICE
+ * flag set. A node can have a phandle that references an other node
+ * added by the overlay.
+ * Clear the supplier's FWNODE_FLAG_NOT_DEVICE so that fw_devlink links
+ * to this supplier instead of linking to its parent.
+ */
+ sup_fwnode = of_fwnode_handle(sup_np);
+ if (sup_fwnode->flags & FWNODE_FLAG_NOT_DEVICE) {
+ if (of_property_present(sup_np, "compatible") &&
+ of_device_is_available(sup_np))
+ sup_fwnode->flags &= ~FWNODE_FLAG_NOT_DEVICE;
+ }
+ fwnode_link_add(of_fwnode_handle(con_np), sup_fwnode);
}
/**
--
2.43.0
Hi Greg, Sasha,
This batch contains a backport for recent fixes already upstream for 5.10.x,
to add them on top of your enqueued patches:
994209ddf4f4 ("netfilter: nf_tables: reject new basechain after table flag update")
24cea9677025 ("netfilter: nf_tables: flush pending destroy work before exit_net release")
a45e6889575c ("netfilter: nf_tables: release batch on table validation from abort path")
0d459e2ffb54 ("netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path")
1bc83a019bbe ("netfilter: nf_tables: discard table flag update with pending basechain deletion")
Please, apply, thanks.
Pablo Neira Ayuso (5):
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release batch on table validation from abort path
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: discard table flag update with pending basechain deletion
net/netfilter/nf_tables_api.c | 51 ++++++++++++++++++++++++++++-------
1 file changed, 41 insertions(+), 10 deletions(-)
--
2.30.2
Hi Greg, Sasha,
This batch contains a backport for recent fixes already upstream for 5.10.x,
to add them on top of your enqueued patches:
994209ddf4f4 ("netfilter: nf_tables: reject new basechain after table flag update")
24cea9677025 ("netfilter: nf_tables: flush pending destroy work before exit_net release")
a45e6889575c ("netfilter: nf_tables: release batch on table validation from abort path")
0d459e2ffb54 ("netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path")
1bc83a019bbe ("netfilter: nf_tables: discard table flag update with pending basechain deletion")
Please, apply, thanks.
Pablo Neira Ayuso (5):
netfilter: nf_tables: reject new basechain after table flag update
netfilter: nf_tables: flush pending destroy work before exit_net release
netfilter: nf_tables: release batch on table validation from abort path
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: discard table flag update with pending basechain deletion
net/netfilter/nf_tables_api.c | 51 ++++++++++++++++++++++++++++-------
1 file changed, 41 insertions(+), 10 deletions(-)
--
2.30.2
Hi Greg, Sasha,
This batch contains a backport for recent fixes already upstream for 6.1.x,
to add them on top of your enqueued patches:
a45e6889575c ("netfilter: nf_tables: release batch on table validation from abort path")
0d459e2ffb54 ("netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path")
1bc83a019bbe ("netfilter: nf_tables: discard table flag update with pending basechain deletion")
Please, apply, thanks.
Pablo Neira Ayuso (3):
netfilter: nf_tables: release batch on table validation from abort path
netfilter: nf_tables: release mutex after nft_gc_seq_end from abort path
netfilter: nf_tables: discard table flag update with pending basechain deletion
net/netfilter/nf_tables_api.c | 47 +++++++++++++++++++++++++++--------
1 file changed, 36 insertions(+), 11 deletions(-)
--
2.30.2
From: Peter Xu <peterx(a)redhat.com>
After UFFDIO_POISON, there can be two kinds of hugetlb pte markers, either
the POISON one or UFFD_WP one.
Allow change protection to run on a poisoned marker just like !hugetlb
cases, ignoring the marker irrelevant of the permission.
Here the two bits are mutual exclusive. For example, when install a
poisoned entry it must not be UFFD_WP already (by checking pte_none()
before such install). And it also means if UFFD_WP is set there must have
no POISON bit set. It makes sense because UFFD_WP is a bit to reflect
permission, and permissions do not apply if the pte is poisoned and
destined to sigbus.
So here we simply check uffd_wp bit set first, do nothing otherwise.
Attach the Fixes to UFFDIO_POISON work, as before that it should not be
possible to have poison entry for hugetlb (e.g., hugetlb doesn't do swap,
so no chance of swapin errors).
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: linux-stable <stable(a)vger.kernel.org> # 6.6+
Link: https://lore.kernel.org/r/000000000000920d5e0615602dd1@google.com
Reported-by: syzbot+b07c8ac8eee3d4d8440f(a)syzkaller.appspotmail.com
Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
---
mm/hugetlb.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8267e221ca5d..ba7162441adf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6960,9 +6960,13 @@ long hugetlb_change_protection(struct vm_area_struct *vma,
if (!pte_same(pte, newpte))
set_huge_pte_at(mm, address, ptep, newpte, psize);
} else if (unlikely(is_pte_marker(pte))) {
- /* No other markers apply for now. */
- WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
- if (uffd_wp_resolve)
+ /*
+ * Do nothing on a poison marker; page is
+ * corrupted, permissons do not apply. Here
+ * pte_marker_uffd_wp()==true implies !poison
+ * because they're mutual exclusive.
+ */
+ if (pte_marker_uffd_wp(pte) && uffd_wp_resolve)
/* Safe to modify directly (non-present->none). */
huge_pte_clear(mm, address, ptep, psize);
} else if (!huge_pte_none(pte)) {
--
2.44.0
The patch titled
Subject: ocfs2: use coarse time for new created files
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ocfs2-use-coarse-time-for-new-created-files.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Su Yue <glass.su(a)suse.com>
Subject: ocfs2: use coarse time for new created files
Date: Mon, 8 Apr 2024 16:20:41 +0800
The default atime related mount option is '-o realtime' which means file
atime should be updated if atime <= ctime or atime <= mtime. atime should
be updated in the following scenario, but it is not:
==========================================================
$ rm /mnt/testfile;
$ echo test > /mnt/testfile
$ stat -c "%X %Y %Z" /mnt/testfile
1711881646 1711881646 1711881646
$ sleep 5
$ cat /mnt/testfile > /dev/null
$ stat -c "%X %Y %Z" /mnt/testfile
1711881646 1711881646 1711881646
==========================================================
And the reason the atime in the test is not updated is that ocfs2 calls
ktime_get_real_ts64() in __ocfs2_mknod_locked during file creation. Then
inode_set_ctime_current() is called in inode_set_ctime_current() calls
ktime_get_coarse_real_ts64() to get current time.
ktime_get_real_ts64() is more accurate than ktime_get_coarse_real_ts64().
In my test box, I saw ctime set by ktime_get_coarse_real_ts64() is less
than ktime_get_real_ts64() even ctime is set later. The ctime of the new
inode is smaller than atime.
The call trace is like:
ocfs2_create
ocfs2_mknod
__ocfs2_mknod_locked
....
ktime_get_real_ts64 <------- set atime,ctime,mtime, more accurate
ocfs2_populate_inode
...
ocfs2_init_acl
ocfs2_acl_set_mode
inode_set_ctime_current
current_time
ktime_get_coarse_real_ts64 <-------less accurate
ocfs2_file_read_iter
ocfs2_inode_lock_atime
ocfs2_should_update_atime
atime <= ctime ? <-------- false, ctime < atime due to accuracy
So here call ktime_get_coarse_real_ts64 to set inode time coarser while
creating new files. It may lower the accuracy of file times. But it's
not a big deal since we already use coarse time in other places like
ocfs2_update_inode_atime and inode_set_ctime_current.
Link: https://lkml.kernel.org/r/20240408082041.20925-5-glass.su@suse.com
Fixes: c62c38f6b91b ("ocfs2: replace CURRENT_TIME macro")
Signed-off-by: Su Yue <glass.su(a)suse.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/ocfs2/namei.c~ocfs2-use-coarse-time-for-new-created-files
+++ a/fs/ocfs2/namei.c
@@ -566,7 +566,7 @@ static int __ocfs2_mknod_locked(struct i
fe->i_last_eb_blk = 0;
strcpy(fe->i_signature, OCFS2_INODE_SIGNATURE);
fe->i_flags |= cpu_to_le32(OCFS2_VALID_FL);
- ktime_get_real_ts64(&ts);
+ ktime_get_coarse_real_ts64(&ts);
fe->i_atime = fe->i_ctime = fe->i_mtime =
cpu_to_le64(ts.tv_sec);
fe->i_mtime_nsec = fe->i_ctime_nsec = fe->i_atime_nsec =
_
Patches currently in -mm which might be from glass.su(a)suse.com are
ocfs2-update-inode-ctime-in-ocfs2_fileattr_set.patch
ocfs2-return-real-error-code-in-ocfs2_dio_wr_get_block.patch
ocfs2-fix-races-between-hole-punching-and-aiodio.patch
ocfs2-update-inode-fsync-transaction-id-in-ocfs2_unlink-and-ocfs2_link.patch
ocfs2-use-coarse-time-for-new-created-files.patch
The patch titled
Subject: ocfs2: update inode fsync transaction id in ocfs2_unlink and ocfs2_link
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ocfs2-update-inode-fsync-transaction-id-in-ocfs2_unlink-and-ocfs2_link.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Su Yue <glass.su(a)suse.com>
Subject: ocfs2: update inode fsync transaction id in ocfs2_unlink and ocfs2_link
Date: Mon, 8 Apr 2024 16:20:40 +0800
transaction id should be updated in ocfs2_unlink and ocfs2_link.
Otherwise, inode link will be wrong after journal replay even fsync was
called before power failure:
=======================================================================
$ touch testdir/bar
$ ln testdir/bar testdir/bar_link
$ fsync testdir/bar
$ stat -c %h $SCRATCH_MNT/testdir/bar
1
$ stat -c %h $SCRATCH_MNT/testdir/bar
1
=======================================================================
Link: https://lkml.kernel.org/r/20240408082041.20925-4-glass.su@suse.com
Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem")
Signed-off-by: Su Yue <glass.su(a)suse.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/namei.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/ocfs2/namei.c~ocfs2-update-inode-fsync-transaction-id-in-ocfs2_unlink-and-ocfs2_link
+++ a/fs/ocfs2/namei.c
@@ -797,6 +797,7 @@ static int ocfs2_link(struct dentry *old
ocfs2_set_links_count(fe, inode->i_nlink);
fe->i_ctime = cpu_to_le64(inode_get_ctime_sec(inode));
fe->i_ctime_nsec = cpu_to_le32(inode_get_ctime_nsec(inode));
+ ocfs2_update_inode_fsync_trans(handle, inode, 0);
ocfs2_journal_dirty(handle, fe_bh);
err = ocfs2_add_entry(handle, dentry, inode,
@@ -993,6 +994,7 @@ static int ocfs2_unlink(struct inode *di
drop_nlink(inode);
drop_nlink(inode);
ocfs2_set_links_count(fe, inode->i_nlink);
+ ocfs2_update_inode_fsync_trans(handle, inode, 0);
ocfs2_journal_dirty(handle, fe_bh);
inode_set_mtime_to_ts(dir, inode_set_ctime_current(dir));
_
Patches currently in -mm which might be from glass.su(a)suse.com are
ocfs2-update-inode-ctime-in-ocfs2_fileattr_set.patch
ocfs2-return-real-error-code-in-ocfs2_dio_wr_get_block.patch
ocfs2-fix-races-between-hole-punching-and-aiodio.patch
ocfs2-update-inode-fsync-transaction-id-in-ocfs2_unlink-and-ocfs2_link.patch
ocfs2-use-coarse-time-for-new-created-files.patch
The patch titled
Subject: ocfs2: fix races between hole punching and AIO+DIO
has been added to the -mm mm-nonmm-unstable branch. Its filename is
ocfs2-fix-races-between-hole-punching-and-aiodio.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Su Yue <glass.su(a)suse.com>
Subject: ocfs2: fix races between hole punching and AIO+DIO
Date: Mon, 8 Apr 2024 16:20:39 +0800
After commit "ocfs2: return real error code in ocfs2_dio_wr_get_block",
fstests/generic/300 become from always failed to sometimes failed:
========================================================================
[ 473.293420 ] run fstests generic/300
[ 475.296983 ] JBD2: Ignoring recovery information on journal
[ 475.302473 ] ocfs2: Mounting device (253,1) on (node local, slot 0)
with ordered data mode.
[ 494.290998 ] OCFS2: ERROR (device dm-1): ocfs2_change_extent_flag:
Owner 5668 has an extent at cpos 78723 which can no longer be found
[ 494.291609 ] On-disk corruption discovered. Please run fsck.ocfs2
once the filesystem is unmounted.
[ 494.292018 ] OCFS2: File system is now read-only.
[ 494.292224 ] (kworker/19:11,2628,19):ocfs2_mark_extent_written:5272
ERROR: status = -30
[ 494.292602 ] (kworker/19:11,2628,19):ocfs2_dio_end_io_write:2374
ERROR: status = -3
fio: io_u error on file /mnt/scratch/racer: Read-only file system: write
offset=460849152, buflen=131072
=========================================================================
In __blockdev_direct_IO, ocfs2_dio_wr_get_block is called to add unwritten
extents to a list. extents are also inserted into extent tree in
ocfs2_write_begin_nolock. Then another thread call fallocate to puch a
hole at one of the unwritten extent. The extent at cpos was removed by
ocfs2_remove_extent(). At end io worker thread, ocfs2_search_extent_list
found there is no such extent at the cpos.
T1 T2 T3
inode lock
...
insert extents
...
inode unlock
ocfs2_fallocate
__ocfs2_change_file_space
inode lock
lock ip_alloc_sem
ocfs2_remove_inode_range inode
ocfs2_remove_btree_range
ocfs2_remove_extent
^---remove the extent at cpos 78723
...
unlock ip_alloc_sem
inode unlock
ocfs2_dio_end_io
ocfs2_dio_end_io_write
lock ip_alloc_sem
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_search_extent_list
^---failed to find extent
...
unlock ip_alloc_sem
In most filesystems, fallocate is not compatible with racing with AIO+DIO,
so fix it by adding to wait for all dio before fallocate/punch_hole like
ext4.
Link: https://lkml.kernel.org/r/20240408082041.20925-3-glass.su@suse.com
Fixes: b25801038da5 ("ocfs2: Support xfs style space reservation ioctls")
Signed-off-by: Su Yue <glass.su(a)suse.com>
Reviewed-by: Joseph Qi <joseph.qi(a)linux.alibaba.com>
Cc: Changwei Ge <gechangwei(a)live.cn>
Cc: Gang He <ghe(a)suse.com>
Cc: Joel Becker <jlbec(a)evilplan.org>
Cc: Jun Piao <piaojun(a)huawei.com>
Cc: Junxiao Bi <junxiao.bi(a)oracle.com>
Cc: Mark Fasheh <mark(a)fasheh.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ocfs2/file.c | 2 ++
1 file changed, 2 insertions(+)
--- a/fs/ocfs2/file.c~ocfs2-fix-races-between-hole-punching-and-aiodio
+++ a/fs/ocfs2/file.c
@@ -1936,6 +1936,8 @@ static int __ocfs2_change_file_space(str
inode_lock(inode);
+ /* Wait all existing dio workers, newcomers will block on i_rwsem */
+ inode_dio_wait(inode);
/*
* This prevents concurrent writes on other nodes
*/
_
Patches currently in -mm which might be from glass.su(a)suse.com are
ocfs2-update-inode-ctime-in-ocfs2_fileattr_set.patch
ocfs2-return-real-error-code-in-ocfs2_dio_wr_get_block.patch
ocfs2-fix-races-between-hole-punching-and-aiodio.patch
ocfs2-update-inode-fsync-transaction-id-in-ocfs2_unlink-and-ocfs2_link.patch
ocfs2-use-coarse-time-for-new-created-files.patch
The patch titled
Subject: mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm,swapops: update check in is_pfn_swap_entry for hwpoison entries
Date: Sun, 7 Apr 2024 15:05:37 +0200
Tony reported that the Machine check recovery was broken in v6.9-rc1, as
he was hitting a VM_BUG_ON when injecting uncorrectable memory errors to
DRAM.
After some more digging and debugging on his side, he realized that this
went back to v6.1, with the introduction of 'commit 0d206b5d2e0d
("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")'. That
commit, among other things, introduced swp_offset_pfn(), replacing
hwpoison_entry_to_pfn() in its favour.
The patch also introduced a VM_BUG_ON() check for is_pfn_swap_entry(), but
is_pfn_swap_entry() never got updated to cover hwpoison entries, which
means that we would hit the VM_BUG_ON whenever we would call
swp_offset_pfn() for such entries on environments with CONFIG_DEBUG_VM
set. Fix this by updating the check to cover hwpoison entries as well,
and update the comment while we are it.
Link: https://lkml.kernel.org/r/20240407130537.16977-1-osalvador@suse.de
Fixes: 0d206b5d2e0d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Reported-by: Tony Luck <tony.luck(a)intel.com>
Closes: https://lore.kernel.org/all/Zg8kLSl2yAlA3o5D@agluck-desk3/
Tested-by: Tony Luck <tony.luck(a)intel.com>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: <stable(a)vger.kernel.org> [6.1.x]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/swapops.h | 65 +++++++++++++++++++-------------------
1 file changed, 33 insertions(+), 32 deletions(-)
--- a/include/linux/swapops.h~mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries
+++ a/include/linux/swapops.h
@@ -390,6 +390,35 @@ static inline bool is_migration_entry_di
}
#endif /* CONFIG_MIGRATION */
+#ifdef CONFIG_MEMORY_FAILURE
+
+/*
+ * Support for hardware poisoned pages
+ */
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ return swp_entry(SWP_HWPOISON, page_to_pfn(page));
+}
+
+static inline int is_hwpoison_entry(swp_entry_t entry)
+{
+ return swp_type(entry) == SWP_HWPOISON;
+}
+
+#else
+
+static inline swp_entry_t make_hwpoison_entry(struct page *page)
+{
+ return swp_entry(0, 0);
+}
+
+static inline int is_hwpoison_entry(swp_entry_t swp)
+{
+ return 0;
+}
+#endif
+
typedef unsigned long pte_marker;
#define PTE_MARKER_UFFD_WP BIT(0)
@@ -483,8 +512,9 @@ static inline struct folio *pfn_swap_ent
/*
* A pfn swap entry is a special type of swap entry that always has a pfn stored
- * in the swap offset. They are used to represent unaddressable device memory
- * and to restrict access to a page undergoing migration.
+ * in the swap offset. They can either be used to represent unaddressable device
+ * memory, to restrict access to a page undergoing migration or to represent a
+ * pfn which has been hwpoisoned and unmapped.
*/
static inline bool is_pfn_swap_entry(swp_entry_t entry)
{
@@ -492,7 +522,7 @@ static inline bool is_pfn_swap_entry(swp
BUILD_BUG_ON(SWP_TYPE_SHIFT < SWP_PFN_BITS);
return is_migration_entry(entry) || is_device_private_entry(entry) ||
- is_device_exclusive_entry(entry);
+ is_device_exclusive_entry(entry) || is_hwpoison_entry(entry);
}
struct page_vma_mapped_walk;
@@ -561,35 +591,6 @@ static inline int is_pmd_migration_entry
}
#endif /* CONFIG_ARCH_ENABLE_THP_MIGRATION */
-#ifdef CONFIG_MEMORY_FAILURE
-
-/*
- * Support for hardware poisoned pages
- */
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- BUG_ON(!PageLocked(page));
- return swp_entry(SWP_HWPOISON, page_to_pfn(page));
-}
-
-static inline int is_hwpoison_entry(swp_entry_t entry)
-{
- return swp_type(entry) == SWP_HWPOISON;
-}
-
-#else
-
-static inline swp_entry_t make_hwpoison_entry(struct page *page)
-{
- return swp_entry(0, 0);
-}
-
-static inline int is_hwpoison_entry(swp_entry_t swp)
-{
- return 0;
-}
-#endif
-
static inline int non_swap_entry(swp_entry_t entry)
{
return swp_type(entry) >= MAX_SWAPFILES;
_
Patches currently in -mm which might be from osalvador(a)suse.de are
mmpage_owner-update-metadata-for-tail-pages.patch
mmpage_owner-fix-refcount-imbalance.patch
mmpage_owner-fix-accounting-of-pages-when-migrating.patch
mmpage_owner-fix-printing-of-stack-records.patch
mmswapops-update-check-in-is_pfn_swap_entry-for-hwpoison-entries.patch
The patch titled
Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/memory-failure: fix deadlock when hugetlb_optimize_vmemmap is enabled
Date: Sun, 7 Apr 2024 16:54:56 +0800
When I did hard offline test with hugetlb pages, below deadlock occurs:
======================================================
WARNING: possible circular locking dependency detected
6.8.0-11409-gf6cef5f8c37f #1 Not tainted
------------------------------------------------------
bash/46904 is trying to acquire lock:
ffffffffabe68910 (cpu_hotplug_lock){++++}-{0:0}, at: static_key_slow_dec+0x16/0x60
but task is already holding lock:
ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #1 (pcp_batch_high_lock){+.+.}-{3:3}:
__mutex_lock+0x6c/0x770
page_alloc_cpu_online+0x3c/0x70
cpuhp_invoke_callback+0x397/0x5f0
__cpuhp_invoke_callback_range+0x71/0xe0
_cpu_up+0xeb/0x210
cpu_up+0x91/0xe0
cpuhp_bringup_mask+0x49/0xb0
bringup_nonboot_cpus+0xb7/0xe0
smp_init+0x25/0xa0
kernel_init_freeable+0x15f/0x3e0
kernel_init+0x15/0x1b0
ret_from_fork+0x2f/0x50
ret_from_fork_asm+0x1a/0x30
-> #0 (cpu_hotplug_lock){++++}-{0:0}:
__lock_acquire+0x1298/0x1cd0
lock_acquire+0xc0/0x2b0
cpus_read_lock+0x2a/0xc0
static_key_slow_dec+0x16/0x60
__hugetlb_vmemmap_restore_folio+0x1b9/0x200
dissolve_free_huge_page+0x211/0x260
__page_handle_poison+0x45/0xc0
memory_failure+0x65e/0xc70
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xca/0x1e0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(pcp_batch_high_lock);
lock(cpu_hotplug_lock);
lock(pcp_batch_high_lock);
rlock(cpu_hotplug_lock);
*** DEADLOCK ***
5 locks held by bash/46904:
#0: ffff98f6c3bb23f0 (sb_writers#5){.+.+}-{0:0}, at: ksys_write+0x64/0xe0
#1: ffff98f6c328e488 (&of->mutex){+.+.}-{3:3}, at: kernfs_fop_write_iter+0xf8/0x1d0
#2: ffff98ef83b31890 (kn->active#113){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x100/0x1d0
#3: ffffffffabf9db48 (mf_mutex){+.+.}-{3:3}, at: memory_failure+0x44/0xc70
#4: ffffffffabf92ea8 (pcp_batch_high_lock){+.+.}-{3:3}, at: zone_pcp_disable+0x16/0x40
stack backtrace:
CPU: 10 PID: 46904 Comm: bash Kdump: loaded Not tainted 6.8.0-11409-gf6cef5f8c37f #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x68/0xa0
check_noncircular+0x129/0x140
__lock_acquire+0x1298/0x1cd0
lock_acquire+0xc0/0x2b0
cpus_read_lock+0x2a/0xc0
static_key_slow_dec+0x16/0x60
__hugetlb_vmemmap_restore_folio+0x1b9/0x200
dissolve_free_huge_page+0x211/0x260
__page_handle_poison+0x45/0xc0
memory_failure+0x65e/0xc70
hard_offline_page_store+0x55/0xa0
kernfs_fop_write_iter+0x12c/0x1d0
vfs_write+0x387/0x550
ksys_write+0x64/0xe0
do_syscall_64+0xca/0x1e0
entry_SYSCALL_64_after_hwframe+0x6d/0x75
RIP: 0033:0x7fc862314887
Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24
RSP: 002b:00007fff19311268 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 000000000000000c RCX: 00007fc862314887
RDX: 000000000000000c RSI: 000056405645fe10 RDI: 0000000000000001
RBP: 000056405645fe10 R08: 00007fc8623d1460 R09: 000000007fffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000000000000000c
R13: 00007fc86241b780 R14: 00007fc862417600 R15: 00007fc862416a00
In short, below scene breaks the lock dependency chain:
memory_failure
__page_handle_poison
zone_pcp_disable -- lock(pcp_batch_high_lock)
dissolve_free_huge_page
__hugetlb_vmemmap_restore_folio
static_key_slow_dec
cpus_read_lock -- rlock(cpu_hotplug_lock)
Fix this by calling drain_all_pages() instead.
Link: https://lkml.kernel.org/r/20240407085456.2798193-1-linmiaohe@huawei.com
Fixes: 510d25c92ec4a ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled
+++ a/mm/memory-failure.c
@@ -154,11 +154,17 @@ static int __page_handle_poison(struct p
{
int ret;
- zone_pcp_disable(page_zone(page));
+ /*
+ * zone_pcp_disable() can't be used here. It will hold pcp_batch_high_lock and
+ * dissolve_free_huge_page() might hold cpu_hotplug_lock via static_key_slow_dec()
+ * when hugetlb vmemmap optimization is enabled. This will break current lock
+ * dependency chain and leads to deadlock.
+ */
ret = dissolve_free_huge_page(page);
- if (!ret)
+ if (!ret) {
+ drain_all_pages(page_zone(page));
ret = take_page_off_buddy(page);
- zone_pcp_enable(page_zone(page));
+ }
return ret;
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-memory-failure-fix-deadlock-when-hugetlb_optimize_vmemmap-is-enabled.patch
The patch titled
Subject: mm/userfaultfd: Allow hugetlb change protection upon poison entry
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-userfaultfd-allow-hugetlb-change-protection-upon-poison-entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm/userfaultfd: Allow hugetlb change protection upon poison entry
Date: Fri, 5 Apr 2024 19:19:20 -0400
After UFFDIO_POISON, there can be two kinds of hugetlb pte markers, either
the POISON one or UFFD_WP one.
Allow change protection to run on a poisoned marker just like !hugetlb
cases, ignoring the marker irrelevant of the permission.
Here the two bits are mutual exclusive. For example, when install a
poisoned entry it must not be UFFD_WP already (by checking pte_none()
before such install). And it also means if UFFD_WP is set there must have
no POISON bit set. It makes sense because UFFD_WP is a bit to reflect
permission, and permissions do not apply if the pte is poisoned and
destined to sigbus.
So here we simply check uffd_wp bit set first, do nothing otherwise.
Attach the Fixes to UFFDIO_POISON work, as before that it should not be
possible to have poison entry for hugetlb (e.g., hugetlb doesn't do swap,
so no chance of swapin errors).
Link: https://lkml.kernel.org/r/20240405231920.1772199-1-peterx@redhat.com
Link: https://lore.kernel.org/r/000000000000920d5e0615602dd1@google.com
Reported-by: syzbot+b07c8ac8eee3d4d8440f(a)syzkaller.appspotmail.com
Fixes: fc71884a5f59 ("mm: userfaultfd: add new UFFDIO_POISON ioctl")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
--- a/mm/hugetlb.c~mm-userfaultfd-allow-hugetlb-change-protection-upon-poison-entry
+++ a/mm/hugetlb.c
@@ -7044,9 +7044,13 @@ long hugetlb_change_protection(struct vm
if (!pte_same(pte, newpte))
set_huge_pte_at(mm, address, ptep, newpte, psize);
} else if (unlikely(is_pte_marker(pte))) {
- /* No other markers apply for now. */
- WARN_ON_ONCE(!pte_marker_uffd_wp(pte));
- if (uffd_wp_resolve)
+ /*
+ * Do nothing on a poison marker; page is
+ * corrupted, permissons do not apply. Here
+ * pte_marker_uffd_wp()==true implies !poison
+ * because they're mutual exclusive.
+ */
+ if (pte_marker_uffd_wp(pte) && uffd_wp_resolve)
/* Safe to modify directly (non-present->none). */
huge_pte_clear(mm, address, ptep, psize);
} else if (!huge_pte_none(pte)) {
_
Patches currently in -mm which might be from peterx(a)redhat.com are
mm-userfaultfd-allow-hugetlb-change-protection-upon-poison-entry.patch
mm-hmm-process-pud-swap-entry-without-pud_huge.patch
mm-gup-cache-p4d-in-follow_p4d_mask.patch
mm-gup-check-p4d-presence-before-going-on.patch
mm-x86-change-pxd_huge-behavior-to-exclude-swap-entries.patch
mm-sparc-change-pxd_huge-behavior-to-exclude-swap-entries.patch
mm-arm-use-macros-to-define-pmd-pud-helpers.patch
mm-arm-redefine-pmd_huge-with-pmd_leaf.patch
mm-arm64-merge-pxd_huge-and-pxd_leaf-definitions.patch
mm-powerpc-redefine-pxd_huge-with-pxd_leaf.patch
mm-gup-merge-pxd-huge-mapping-checks.patch
mm-treewide-replace-pxd_huge-with-pxd_leaf.patch
mm-treewide-remove-pxd_huge.patch
mm-arm-remove-pmd_thp_or_huge.patch
mm-document-pxd_leaf-api.patch
selftests-mm-run_vmtestssh-fix-hugetlb-mem-size-calculation.patch
selftests-mm-run_vmtestssh-fix-hugetlb-mem-size-calculation-fix.patch
mm-kconfig-config_pgtable_has_huge_leaves.patch
mm-hugetlb-declare-hugetlbfs_pagecache_present-non-static.patch
mm-make-hpage_pxd_-macros-even-if-thp.patch
mm-introduce-vma_pgtable_walk_beginend.patch
mm-arch-provide-pud_pfn-fallback.patch
mm-arch-provide-pud_pfn-fallback-fix.patch
mm-gup-drop-folio_fast_pin_allowed-in-hugepd-processing.patch
mm-gup-refactor-record_subpages-to-find-1st-small-page.patch
mm-gup-handle-hugetlb-for-no_page_table.patch
mm-gup-cache-pudp-in-follow_pud_mask.patch
mm-gup-handle-huge-pud-for-follow_pud_mask.patch
mm-gup-handle-huge-pmd-for-follow_pmd_mask.patch
mm-gup-handle-huge-pmd-for-follow_pmd_mask-fix.patch
mm-gup-handle-hugepd-for-follow_page.patch
mm-gup-handle-hugetlb-in-the-generic-follow_page_mask-code.patch
mm-allow-anon-exclusive-check-over-hugetlb-tail-pages.patch
Stop printing the TT memory decryption status info each time tt is created
and instead print it just once.
Reduces the spam in the system logs when running guests with SEV enabled.
Signed-off-by: Zack Rusin <zack.rusin(a)broadcom.com>
Fixes: 71ce046327cf ("drm/ttm: Make sure the mapped tt pages are decrypted when needed")
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: Christian König <christian.koenig(a)amd.com>
Cc: dri-devel(a)lists.freedesktop.org
Cc: linux-kernel(a)vger.kernel.org
Cc: <stable(a)vger.kernel.org> # v5.14+
---
drivers/gpu/drm/ttm/ttm_tt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
index 578a7c37f00b..d776e3f87064 100644
--- a/drivers/gpu/drm/ttm/ttm_tt.c
+++ b/drivers/gpu/drm/ttm/ttm_tt.c
@@ -92,7 +92,7 @@ int ttm_tt_create(struct ttm_buffer_object *bo, bool zero_alloc)
*/
if (bdev->pool.use_dma_alloc && cc_platform_has(CC_ATTR_GUEST_MEM_ENCRYPT)) {
page_flags |= TTM_TT_FLAG_DECRYPTED;
- drm_info(ddev, "TT memory decryption enabled.");
+ drm_info_once(ddev, "TT memory decryption enabled.");
}
bo->ttm = bdev->funcs->ttm_tt_create(bo, page_flags);
--
2.40.1
Hi,
Starting from kernel version 6.7.4 I hit the following problem: when I
connect two or more USB-to-ethernet adapters to the computer they get
assigned the same MAC address. Furthermore, the address is not the one
specified in any of the device labels but is selected seemingly at
random on boot.
This becomes a blocking issue when trying to use SYSTEMD.LINK(5) to
match the interfaces.
6.7.3 is OK, 6.7.4 introduces this behavior in the following upstream commit:
d2689b6a86b9 net: usb: ax88179_178a: avoid two consecutive device resets
Reverting this commit in 6.7.4 fixes the issue.
The behavior is also present in LTS 6.6.23. The commit has been
backported in 6.6.16.
Example system log when connecting two adapters. Both interfaces are
assigned address 02:a5:ab:80:e6:94.
kernel: usb 2-5.4: new SuperSpeed USB device number 3 using xhci_hcd
kernel: usb 2-5.4: New USB device found, idVendor=2001, idProduct=4a00, bcdDevice= 1.00
kernel: usb 2-5.4: New USB device strings: Mfr=1, Product=2, SerialNumber=3
kernel: usb 2-5.4: Product: D-Link DUB-1312
kernel: usb 2-5.4: Manufacturer: D-Link Elec. Corp.
kernel: usb 2-5.4: SerialNumber: 00000000001D4D
mtp-probe[2431]: checking bus 2, device 3: "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-5/2-5.4"
mtp-probe[2431]: bus: 2, device: 3 was not an MTP device
kernel: ax88179_178a 2-5.4:1.0 eth0: register 'ax88179_178a' at usb-0000:00:14.0-5.4, D-Link DUB-1312 USB 3.0 to Gigabit Ethernet Adapter, 02:a5:ab:80:e6:94
kernel: usbcore: registered new interface driver ax88179_178a
mtp-probe[2436]: checking bus 2, device 3: "/sys/devices/pci0000:00/0000:00:14.0/usb2/2-5/2-5.4"
mtp-probe[2436]: bus: 2, device: 3 was not an MTP device
kernel: ax88179_178a 2-5.4:1.0 enp0s20f0u5u4: renamed from eth0
systemd-networkd[469]: eth0: Interface name change detected, renamed to enp0s20f0u5u4.
kernel: usb 1-5.3: new high-speed USB device number 8 using xhci_hcd
kernel: usb 1-5.3: New USB device found, idVendor=0b95, idProduct=1790, bcdDevice= 1.00
kernel: usb 1-5.3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
kernel: usb 1-5.3: Product: AX88179
kernel: usb 1-5.3: Manufacturer: ASIX Elec. Corp.
kernel: usb 1-5.3: SerialNumber: 0000249B2BAEC8
kernel: ax88179_178a 1-5.3:1.0 eth0: register 'ax88179_178a' at usb-0000:00:14.0-5.3, ASIX AX88179 USB 3.0 Gigabit Ethernet, 02:a5:ab:80:e6:94
mtp-probe[2440]: checking bus 1, device 8: "/sys/devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.3"
mtp-probe[2440]: bus: 1, device: 8 was not an MTP device
kernel: ax88179_178a 1-5.3:1.0 enp0s20f0u5u3: renamed from eth0
systemd-networkd[469]: eth0: Interface name change detected, renamed to enp0s20f0u5u3.
mtp-probe[2444]: checking bus 1, device 8: "/sys/devices/pci0000:00/0000:00:14.0/usb1/1-5/1-5.3"
mtp-probe[2444]: bus: 1, device: 8 was not an MTP device
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: c1d11fc2c8320871b40730991071dd0a0b405bc8
Gitweb: https://git.kernel.org/tip/c1d11fc2c8320871b40730991071dd0a0b405bc8
Author: Arnd Bergmann <arnd(a)arndb.de>
AuthorDate: Mon, 08 Apr 2024 09:46:01 +02:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 08 Apr 2024 16:34:18 +02:00
irqflags: Explicitly ignore lockdep_hrtimer_exit() argument
When building with 'make W=1' but CONFIG_TRACE_IRQFLAGS=n, the
unused argument to lockdep_hrtimer_exit() causes a warning:
kernel/time/hrtimer.c:1655:14: error: variable 'expires_in_hardirq' set but not used [-Werror=unused-but-set-variable]
This is intentional behavior, so add a cast to void to shut up the warning.
Fixes: 73d20564e0dc ("hrtimer: Don't dereference the hrtimer pointer after the callback")
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240408074609.3170807-1-arnd@kernel.org
Closes: https://lore.kernel.org/oe-kbuild-all/202311191229.55QXHVc6-lkp@intel.com/
---
include/linux/irqflags.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h
index 147feeb..3f003d5 100644
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -114,7 +114,7 @@ do { \
# define lockdep_softirq_enter() do { } while (0)
# define lockdep_softirq_exit() do { } while (0)
# define lockdep_hrtimer_enter(__hrtimer) false
-# define lockdep_hrtimer_exit(__context) do { } while (0)
+# define lockdep_hrtimer_exit(__context) do { (void)(__context); } while (0)
# define lockdep_posixtimer_enter() do { } while (0)
# define lockdep_posixtimer_exit() do { } while (0)
# define lockdep_irq_work_enter(__work) do { } while (0)
From: Benjamin Berg <benjamin.berg(a)intel.com>
[ Upstream commit b8b80770b26c4591f20f1cde3328e5f1489c4488 ]
struct sta_info may be removed without holding sta_mtx if it has not
yet been inserted. To support this, only assert that the lock is held
for links other than the deflink.
This fixes lockdep issues that may be triggered in error cases.
Signed-off-by: Benjamin Berg <benjamin.berg(a)intel.com>
Signed-off-by: Gregory Greenman <gregory.greenman(a)intel.com>
Link: https://lore.kernel.org/r/20230619161906.cdd81377dea0.If5a6734b4b85608a2275…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Alexander Ofitserov <oficerovas(a)altlinux.org>
Cc: stable(a)vger.kernel.org
---
net/mac80211/sta_info.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index bd56015b29258..edec857edbd25 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -357,8 +357,9 @@ static void sta_remove_link(struct sta_info *sta, unsigned int link_id,
struct sta_link_alloc *alloc = NULL;
struct link_sta_info *link_sta;
- link_sta = rcu_dereference_protected(sta->link[link_id],
- lockdep_is_held(&sta->local->sta_mtx));
+ link_sta = rcu_access_pointer(sta->link[link_id]);
+ if (link_sta != &sta->deflink)
+ lockdep_assert_held(&sta->local->sta_mtx);
if (WARN_ON(!link_sta))
return;
--
2.42.1
When done from a virtual machine, instructions that touch APIC memory
must be emulated. By convention, MMIO access are typically performed via
io.h helpers such as 'readl()' or 'writeq()' to simplify instruction
emulation/decoding (ex: in KVM hosts and SEV guests) [0].
Currently, native_apic_mem_read() does not follow this convention,
allowing the compiler to emit instructions other than the MOV
instruction generated by readl(). In particular, when compiled with
clang and run as a SEV-ES or SEV-SNP guest, the compiler would emit a
TESTL instruction which is not supported by the SEV-ES emulator, causing
a boot failure in that environment. It is likely the same problem would
happen in a TDX guest as that uses the same instruction emulator as
SEV-ES.
To make sure all emulators can emulate APIC memory reads via MOV, use
the readl() function in native_apic_mem_read(). It is expected that any
emulator would support MOV in any addressing mode it is the most generic
and is what is ususally emitted currently.
The TESTL instruction is emitted when native_apic_mem_read() is inlined
into apic_mem_wait_icr_idle(). The emulator comes from insn_decode_mmio
in arch/x86/lib/insn-eval.c. It's not worth it to extend
insn_decode_mmio to support more instructions since, in theory, the
compiler could choose to output nearly any instruction for such reads
which would bloat the emulator beyond reason.
[0] https://lore.kernel.org/all/20220405232939.73860-12-kirill.shutemov@linux.i…
Signed-off-by: Adam Dunlap <acdunlap(a)google.com>
Tested-by: Kevin Loughlin <kevinloughlin(a)google.com>
Reviewed-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
An alterative to this approach would be to use inline assembly instead
of the readl() helper, as that is what native_apic_mem_write() does. I
consider using readl() to be cleaner since it is documented to be a simple
wrapper and inline assembly is less readable. native_apic_mem_write()
cannot be trivially updated to use writel since it appears to use custom
asm to workaround for a processor-specific bug.
Patch changelog:
V1 -> V2: Replaced asm with readl function which does the same thing
V2 -> V3: Updated commit message to show more motivation and
justification
V3 -> V4: Fixed nits in commit message
Link to v2 discussion: https://lore.kernel.org/all/20220908170456.3177635-1-acdunlap@google.com/
arch/x86/include/asm/apic.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 9d159b771dc8..dddd3fc195ef 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -13,6 +13,7 @@
#include <asm/mpspec.h>
#include <asm/msr.h>
#include <asm/hardirq.h>
+#include <asm/io.h>
#define ARCH_APICTIMER_STOPS_ON_C3 1
@@ -96,7 +97,7 @@ static inline void native_apic_mem_write(u32 reg, u32 v)
static inline u32 native_apic_mem_read(u32 reg)
{
- return *((volatile u32 *)(APIC_BASE + reg));
+ return readl((void __iomem *)(APIC_BASE + reg));
}
static inline void native_apic_mem_eoi(void)
--
2.43.0.594.gd9cf4e227d-goog
From: Ma Wupeng <mawupeng1(a)huawei.com>
[ Upstream commit d155df53f31068c3340733d586eb9b3ddfd70fc5 ]
Syzbot reports a warning in untrack_pfn(). Digging into the root we found
that this is due to memory allocation failure in pmd_alloc_one. And this
failure is produced due to failslab.
In copy_page_range(), memory alloaction for pmd failed. During the error
handling process in copy_page_range(), mmput() is called to remove all
vmas. While untrack_pfn this empty pfn, warning happens.
Here's a simplified flow:
dup_mm
dup_mmap
copy_page_range
copy_p4d_range
copy_pud_range
copy_pmd_range
pmd_alloc
__pmd_alloc
pmd_alloc_one
page = alloc_pages(gfp, 0);
if (!page)
return NULL;
mmput
exit_mmap
unmap_vmas
unmap_single_vma
untrack_pfn
follow_phys
WARN_ON_ONCE(1);
Since this vma is not generate successfully, we can clear flag VM_PAT. In
this case, untrack_pfn() will not be called while cleaning this vma.
Function untrack_pfn_moved() has also been renamed to fit the new logic.
Link: https://lkml.kernel.org/r/20230217025615.1595558-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Reported-by: <syzbot+5f488e922d047d8f00cc(a)syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Alexander Ofitserov <oficerovas(a)altlinux.org>
Cc: stable(a)vger.kernel.org
---
arch/x86/mm/pat/memtype.c | 12 ++++++++----
include/linux/pgtable.h | 7 ++++---
mm/memory.c | 1 +
mm/mremap.c | 2 +-
4 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d5ef64ddd35e9..fd819f112a7a7 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1108,11 +1108,15 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
}
/*
- * untrack_pfn_moved is called, while mremapping a pfnmap for a new region,
- * with the old vma after its pfnmap page table has been removed. The new
- * vma has a new pfnmap to the same pfn & cache type with VM_PAT set.
+ * untrack_pfn_clear is called if the following situation fits:
+ *
+ * 1) while mremapping a pfnmap for a new region, with the old vma after
+ * its pfnmap page table has been removed. The new vma has a new pfnmap
+ * to the same pfn & cache type with VM_PAT set.
+ * 2) while duplicating vm area, the new vma fails to copy the pgtable from
+ * old vma.
*/
-void untrack_pfn_moved(struct vm_area_struct *vma)
+void untrack_pfn_clear(struct vm_area_struct *vma)
{
vma->vm_flags &= ~VM_PAT;
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index d468efcf48f45..734d5e707fe6d 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1121,9 +1121,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
}
/*
- * untrack_pfn_moved is called while mremapping a pfnmap for a new region.
+ * untrack_pfn_clear is called while mremapping a pfnmap for a new region
+ * or fails to copy pgtable during duplicate vm area.
*/
-static inline void untrack_pfn_moved(struct vm_area_struct *vma)
+static inline void untrack_pfn_clear(struct vm_area_struct *vma)
{
}
#else
@@ -1135,7 +1136,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
extern int track_pfn_copy(struct vm_area_struct *vma);
extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size);
-extern void untrack_pfn_moved(struct vm_area_struct *vma);
+extern void untrack_pfn_clear(struct vm_area_struct *vma);
#endif
#ifdef CONFIG_MMU
diff --git a/mm/memory.c b/mm/memory.c
index 8d71a82462dd5..95db1df5fd03a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1289,6 +1289,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
continue;
if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
addr, next))) {
+ untrack_pfn_clear(dst_vma);
ret = -ENOMEM;
break;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 3a3cf4cc2c632..9457a1e06b5ae 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -668,7 +668,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Tell pfnmap has moved from this vma */
if (unlikely(vma->vm_flags & VM_PFNMAP))
- untrack_pfn_moved(vma);
+ untrack_pfn_clear(vma);
if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
/* We always clear VM_LOCKED[ONFAULT] on the old vma */
--
2.42.1
From: Ard Biesheuvel <ardb(a)kernel.org>
Please merge the attached series into stable branches v6.6 and v6.8.
They backport changes that are part of the work to harden the EFI stub
and make it compatible with MS requirements on EFI memory protections on
secure boot enabled systems.
Note that the first patch by Hou Wenlong is already in v6.8. The
remaining ones should apply equally to v6.6 and v6.8. Only patch #5 was
tweaked for context changes due to backports that overtook this one.
Thanks.
Ard Biesheuvel (5):
efi/libstub: Add generic support for parsing mem_encrypt=
x86/boot: Move mem_encrypt= parsing to the decompressor
x86/sme: Move early SME kernel encryption handling into .head.text
x86/sev: Move early startup code into .head.text section
x86/efistub: Remap kernel text read-only before dropping NX attribute
Hou Wenlong (1):
x86/head/64: Move the __head definition to <asm/init.h>
arch/x86/boot/compressed/Makefile | 2 +-
arch/x86/boot/compressed/misc.c | 16 +++++
arch/x86/boot/compressed/sev.c | 3 +
arch/x86/include/asm/boot.h | 1 +
arch/x86/include/asm/init.h | 2 +
arch/x86/include/asm/mem_encrypt.h | 8 +--
arch/x86/include/asm/sev.h | 10 +--
arch/x86/include/uapi/asm/bootparam.h | 1 +
arch/x86/kernel/head64.c | 3 +-
arch/x86/kernel/sev-shared.c | 23 +++---
arch/x86/kernel/sev.c | 14 ++--
arch/x86/lib/Makefile | 13 ----
arch/x86/mm/mem_encrypt_identity.c | 74 ++++++--------------
drivers/firmware/efi/libstub/efi-stub-helper.c | 8 +++
drivers/firmware/efi/libstub/efistub.h | 2 +-
drivers/firmware/efi/libstub/x86-stub.c | 14 +++-
16 files changed, 94 insertions(+), 100 deletions(-)
--
2.44.0.478.gd926399ef9-goog
From: Ma Wupeng <mawupeng1(a)huawei.com>
[ Upstream commit d155df53f31068c3340733d586eb9b3ddfd70fc5 ]
Syzbot reports a warning in untrack_pfn(). Digging into the root we found
that this is due to memory allocation failure in pmd_alloc_one. And this
failure is produced due to failslab.
In copy_page_range(), memory alloaction for pmd failed. During the error
handling process in copy_page_range(), mmput() is called to remove all
vmas. While untrack_pfn this empty pfn, warning happens.
Here's a simplified flow:
dup_mm
dup_mmap
copy_page_range
copy_p4d_range
copy_pud_range
copy_pmd_range
pmd_alloc
__pmd_alloc
pmd_alloc_one
page = alloc_pages(gfp, 0);
if (!page)
return NULL;
mmput
exit_mmap
unmap_vmas
unmap_single_vma
untrack_pfn
follow_phys
WARN_ON_ONCE(1);
Since this vma is not generate successfully, we can clear flag VM_PAT. In
this case, untrack_pfn() will not be called while cleaning this vma.
Function untrack_pfn_moved() has also been renamed to fit the new logic.
Link: https://lkml.kernel.org/r/20230217025615.1595558-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Reported-by: <syzbot+5f488e922d047d8f00cc(a)syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Alexander Ofitserov <oficerovas(a)altlinux.org>
Cc: stable(a)vger.kernel.org
---
arch/x86/mm/pat/memtype.c | 12 ++++++++----
include/linux/pgtable.h | 7 ++++---
mm/memory.c | 1 +
mm/mremap.c | 2 +-
4 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index f9c53a7107407..7c57001f79b83 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1106,11 +1106,15 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
}
/*
- * untrack_pfn_moved is called, while mremapping a pfnmap for a new region,
- * with the old vma after its pfnmap page table has been removed. The new
- * vma has a new pfnmap to the same pfn & cache type with VM_PAT set.
+ * untrack_pfn_clear is called if the following situation fits:
+ *
+ * 1) while mremapping a pfnmap for a new region, with the old vma after
+ * its pfnmap page table has been removed. The new vma has a new pfnmap
+ * to the same pfn & cache type with VM_PAT set.
+ * 2) while duplicating vm area, the new vma fails to copy the pgtable from
+ * old vma.
*/
-void untrack_pfn_moved(struct vm_area_struct *vma)
+void untrack_pfn_clear(struct vm_area_struct *vma)
{
vma->vm_flags &= ~VM_PAT;
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index f924468d84ec4..b04a675fa320e 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1048,9 +1048,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
}
/*
- * untrack_pfn_moved is called while mremapping a pfnmap for a new region.
+ * untrack_pfn_clear is called while mremapping a pfnmap for a new region
+ * or fails to copy pgtable during duplicate vm area.
*/
-static inline void untrack_pfn_moved(struct vm_area_struct *vma)
+static inline void untrack_pfn_clear(struct vm_area_struct *vma)
{
}
#else
@@ -1062,7 +1063,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
extern int track_pfn_copy(struct vm_area_struct *vma);
extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size);
-extern void untrack_pfn_moved(struct vm_area_struct *vma);
+extern void untrack_pfn_clear(struct vm_area_struct *vma);
#endif
#ifdef __HAVE_COLOR_ZERO_PAGE
diff --git a/mm/memory.c b/mm/memory.c
index fddd2e9aff245..cbd62138dfff0 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1204,6 +1204,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
continue;
if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
addr, next))) {
+ untrack_pfn_clear(dst_vma);
ret = -ENOMEM;
break;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 3334c40222101..af4398387b49e 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -421,7 +421,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Tell pfnmap has moved from this vma */
if (unlikely(vma->vm_flags & VM_PFNMAP))
- untrack_pfn_moved(vma);
+ untrack_pfn_clear(vma);
if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
if (vm_flags & VM_ACCOUNT) {
--
2.42.1
From: Ma Wupeng <mawupeng1(a)huawei.com>
[ Upstream commit d155df53f31068c3340733d586eb9b3ddfd70fc5 ]
Syzbot reports a warning in untrack_pfn(). Digging into the root we found
that this is due to memory allocation failure in pmd_alloc_one. And this
failure is produced due to failslab.
In copy_page_range(), memory alloaction for pmd failed. During the error
handling process in copy_page_range(), mmput() is called to remove all
vmas. While untrack_pfn this empty pfn, warning happens.
Here's a simplified flow:
dup_mm
dup_mmap
copy_page_range
copy_p4d_range
copy_pud_range
copy_pmd_range
pmd_alloc
__pmd_alloc
pmd_alloc_one
page = alloc_pages(gfp, 0);
if (!page)
return NULL;
mmput
exit_mmap
unmap_vmas
unmap_single_vma
untrack_pfn
follow_phys
WARN_ON_ONCE(1);
Since this vma is not generate successfully, we can clear flag VM_PAT. In
this case, untrack_pfn() will not be called while cleaning this vma.
Function untrack_pfn_moved() has also been renamed to fit the new logic.
Link: https://lkml.kernel.org/r/20230217025615.1595558-1-mawupeng1@huawei.com
Signed-off-by: Ma Wupeng <mawupeng1(a)huawei.com>
Reported-by: <syzbot+5f488e922d047d8f00cc(a)syzkaller.appspotmail.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Alexander Ofitserov <oficerovas(a)altlinux.org>
Cc: stable(a)vger.kernel.org
---
arch/x86/mm/pat/memtype.c | 12 ++++++++----
include/linux/pgtable.h | 7 ++++---
mm/memory.c | 1 +
mm/mremap.c | 2 +-
4 files changed, 14 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index 66a209f7eb86d..ed07807845ab0 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -1116,11 +1116,15 @@ void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
}
/*
- * untrack_pfn_moved is called, while mremapping a pfnmap for a new region,
- * with the old vma after its pfnmap page table has been removed. The new
- * vma has a new pfnmap to the same pfn & cache type with VM_PAT set.
+ * untrack_pfn_clear is called if the following situation fits:
+ *
+ * 1) while mremapping a pfnmap for a new region, with the old vma after
+ * its pfnmap page table has been removed. The new vma has a new pfnmap
+ * to the same pfn & cache type with VM_PAT set.
+ * 2) while duplicating vm area, the new vma fails to copy the pgtable from
+ * old vma.
*/
-void untrack_pfn_moved(struct vm_area_struct *vma)
+void untrack_pfn_clear(struct vm_area_struct *vma)
{
vma->vm_flags &= ~VM_PAT;
}
diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 5f0d7d0b9471b..cce5f8ab461c6 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -1214,9 +1214,10 @@ static inline void untrack_pfn(struct vm_area_struct *vma,
}
/*
- * untrack_pfn_moved is called while mremapping a pfnmap for a new region.
+ * untrack_pfn_clear is called while mremapping a pfnmap for a new region
+ * or fails to copy pgtable during duplicate vm area.
*/
-static inline void untrack_pfn_moved(struct vm_area_struct *vma)
+static inline void untrack_pfn_clear(struct vm_area_struct *vma)
{
}
#else
@@ -1228,7 +1229,7 @@ extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
extern int track_pfn_copy(struct vm_area_struct *vma);
extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
unsigned long size);
-extern void untrack_pfn_moved(struct vm_area_struct *vma);
+extern void untrack_pfn_clear(struct vm_area_struct *vma);
#endif
#ifdef CONFIG_MMU
diff --git a/mm/memory.c b/mm/memory.c
index fb83cf56377ab..91e2d4520e4d4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1335,6 +1335,7 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
continue;
if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
addr, next))) {
+ untrack_pfn_clear(dst_vma);
ret = -ENOMEM;
break;
}
diff --git a/mm/mremap.c b/mm/mremap.c
index 930f65c315c02..6ed28eeae5a84 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -682,7 +682,7 @@ static unsigned long move_vma(struct vm_area_struct *vma,
/* Tell pfnmap has moved from this vma */
if (unlikely(vma->vm_flags & VM_PFNMAP))
- untrack_pfn_moved(vma);
+ untrack_pfn_clear(vma);
if (unlikely(!err && (flags & MREMAP_DONTUNMAP))) {
/* We always clear VM_LOCKED[ONFAULT] on the old vma */
--
2.42.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040548-lid-mahogany-fd86@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko <andrii(a)kernel.org>
Date: Wed, 27 Mar 2024 22:24:26 -0700
Subject: [PATCH] bpf: support deferring bpf_link dealloc to after RCU grace
period
BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.
Because of this runtime dependency, when bpf_link refcnt drops to zero
there could still be active BPF programs running accessing link data.
This patch adds generic support to defer bpf_link dealloc callback to
after RCU GP, if requested. This is done by exposing two different
deallocation callbacks, one synchronous and one deferred. If deferred
one is provided, bpf_link_free() will schedule dealloc_deferred()
callback to happen after RCU GP.
BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
tasks trace one. The latter is used when sleepable BPF programs are
used. bpf_link_free() accommodates that by checking underlying BPF
program's sleepable flag, and goes either through normal RCU GP only for
non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
(taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
program is sleepable.
We use this for multi-kprobe and multi-uprobe links, which dereference
link during program run. We also preventively switch raw_tp link to use
deferred dealloc callback, as upcoming changes in bpf-next tree expose
raw_tp link data (specifically, cookie value) to BPF program at runtime
as well.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb(a)syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc(a)syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516(a)syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4f20f62f9d63..890e152d553e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1574,12 +1574,26 @@ struct bpf_link {
enum bpf_link_type type;
const struct bpf_link_ops *ops;
struct bpf_prog *prog;
- struct work_struct work;
+ /* rcu is used before freeing, work can be used to schedule that
+ * RCU-based freeing before that, so they never overlap
+ */
+ union {
+ struct rcu_head rcu;
+ struct work_struct work;
+ };
};
struct bpf_link_ops {
void (*release)(struct bpf_link *link);
+ /* deallocate link resources callback, called without RCU grace period
+ * waiting
+ */
void (*dealloc)(struct bpf_link *link);
+ /* deallocate link resources callback, called after RCU grace period;
+ * if underlying BPF program is sleepable we go through tasks trace
+ * RCU GP and then "classic" RCU GP
+ */
+ void (*dealloc_deferred)(struct bpf_link *link);
int (*detach)(struct bpf_link *link);
int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..c287925471f6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3024,17 +3024,46 @@ void bpf_link_inc(struct bpf_link *link)
atomic64_inc(&link->refcnt);
}
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+ struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+ /* free bpf_link and its containing memory */
+ link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+ if (rcu_trace_implies_rcu_gp())
+ bpf_link_defer_dealloc_rcu_gp(rcu);
+ else
+ call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
/* bpf_link_free is guaranteed to be called from process context */
static void bpf_link_free(struct bpf_link *link)
{
+ bool sleepable = false;
+
bpf_link_free_id(link->id);
if (link->prog) {
+ sleepable = link->prog->sleepable;
/* detach BPF program, clean up used resources */
link->ops->release(link);
bpf_prog_put(link->prog);
}
- /* free bpf_link and its containing memory */
- link->ops->dealloc(link);
+ if (link->ops->dealloc_deferred) {
+ /* schedule BPF link deallocation; if underlying BPF program
+ * is sleepable, we need to first wait for RCU tasks trace
+ * sync, then go through "classic" RCU grace period
+ */
+ if (sleepable)
+ call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+ else
+ call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+ }
+ if (link->ops->dealloc)
+ link->ops->dealloc(link);
}
static void bpf_link_put_deferred(struct work_struct *work)
@@ -3544,7 +3573,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_raw_tp_link_lops = {
.release = bpf_raw_tp_link_release,
- .dealloc = bpf_raw_tp_link_dealloc,
+ .dealloc_deferred = bpf_raw_tp_link_dealloc,
.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
.fill_link_info = bpf_raw_tp_link_fill_link_info,
};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0b73fe5f7206..9dc605f08a23 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2728,7 +2728,7 @@ static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
.release = bpf_kprobe_multi_link_release,
- .dealloc = bpf_kprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_kprobe_multi_link_dealloc,
.fill_link_info = bpf_kprobe_multi_link_fill_link_info,
};
@@ -3242,7 +3242,7 @@ static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
.release = bpf_uprobe_multi_link_release,
- .dealloc = bpf_uprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_uprobe_multi_link_dealloc,
.fill_link_info = bpf_uprobe_multi_link_fill_link_info,
};
From: "min15.li" <min15.li(a)samsung.com>
commit 31a5978243d24d77be4bacca56c78a0fbc43b00d upstream.
In the function nvme_passthru_end(), only the value of the command
opcode is checked, without checking the command type (IO command or
Admin command). When we send a Dataset Management command (The opcode
of the Dataset Management command is the same as the Set Feature
command), kernel thinks it is a set feature command, then sets the
controller's keep alive interval, and calls nvme_keep_alive_work().
Signed-off-by: min15.li <min15.li(a)samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k(a)samsung.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Fixes: b58da2d270db ("nvme: update keep alive interval when kato is modified")
Signed-off-by: Tokunori Ikegami <ikegami.t(a)gmail.com>
---
drivers/nvme/host/core.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8f06e5c1706b..960a31e3307a 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1185,7 +1185,7 @@ static u32 nvme_passthru_start(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
return effects;
}
-static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
+static void nvme_passthru_end(struct nvme_ctrl *ctrl, struct nvme_ns *ns, u32 effects,
struct nvme_command *cmd, int status)
{
if (effects & NVME_CMD_EFFECTS_CSE_MASK) {
@@ -1201,6 +1201,8 @@ static void nvme_passthru_end(struct nvme_ctrl *ctrl, u32 effects,
nvme_queue_scan(ctrl);
flush_work(&ctrl->scan_work);
}
+ if (ns)
+ return;
switch (cmd->common.opcode) {
case nvme_admin_set_features:
@@ -1235,7 +1237,7 @@ int nvme_execute_passthru_rq(struct request *rq)
effects = nvme_passthru_start(ctrl, ns, cmd->common.opcode);
ret = nvme_execute_rq(disk, rq, false);
if (effects) /* nothing to be done for zero cmd effects */
- nvme_passthru_end(ctrl, effects, cmd, ret);
+ nvme_passthru_end(ctrl, ns, effects, cmd, ret);
return ret;
}
--
2.40.1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ed4cccef64c1d0d5b91e69f7a8a6697c3a865486
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040543-backdrop-sequester-2458@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ed4cccef64c1d0d5b91e69f7a8a6697c3a865486 Mon Sep 17 00:00:00 2001
From: Antoine Tenart <atenart(a)kernel.org>
Date: Tue, 26 Mar 2024 12:33:59 +0100
Subject: [PATCH] gro: fix ownership transfer
If packets are GROed with fraglist they might be segmented later on and
continue their journey in the stack. In skb_segment_list those skbs can
be reused as-is. This is an issue as their destructor was removed in
skb_gro_receive_list but not the reference to their socket, and then
they can't be orphaned. Fix this by also removing the reference to the
socket.
For example this could be observed,
kernel BUG at include/linux/skbuff.h:3131! (skb_orphan)
RIP: 0010:ip6_rcv_core+0x11bc/0x19a0
Call Trace:
ipv6_list_rcv+0x250/0x3f0
__netif_receive_skb_list_core+0x49d/0x8f0
netif_receive_skb_list_internal+0x634/0xd40
napi_complete_done+0x1d2/0x7d0
gro_cell_poll+0x118/0x1f0
A similar construction is found in skb_gro_receive, apply the same
change there.
Fixes: 5e10da5385d2 ("skbuff: allow 'slow_gro' for skb carring sock reference")
Signed-off-by: Antoine Tenart <atenart(a)kernel.org>
Reviewed-by: Willem de Bruijn <willemb(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/core/gro.c b/net/core/gro.c
index ee30d4f0c038..83f35d99a682 100644
--- a/net/core/gro.c
+++ b/net/core/gro.c
@@ -192,8 +192,9 @@ int skb_gro_receive(struct sk_buff *p, struct sk_buff *skb)
}
merge:
- /* sk owenrship - if any - completely transferred to the aggregated packet */
+ /* sk ownership - if any - completely transferred to the aggregated packet */
skb->destructor = NULL;
+ skb->sk = NULL;
delta_truesize = skb->truesize;
if (offset > headlen) {
unsigned int eat = offset - headlen;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index e9719afe91cf..3bb69464930b 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -449,8 +449,9 @@ static int skb_gro_receive_list(struct sk_buff *p, struct sk_buff *skb)
NAPI_GRO_CB(p)->count++;
p->data_len += skb->len;
- /* sk owenrship - if any - completely transferred to the aggregated packet */
+ /* sk ownership - if any - completely transferred to the aggregated packet */
skb->destructor = NULL;
+ skb->sk = NULL;
p->truesize += skb->truesize;
p->len += skb->len;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040819-elf-bamboo-00f6@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
65291dcfcf89 ("mm/secretmem: fix GUP-fast succeeding on secretmem folios")
8f9ff2deb8b9 ("secretmem: convert page_is_secretmem() to folio_is_secretmem()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 26 Mar 2024 15:32:08 +0100
Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
folio_is_secretmem() currently relies on secretmem folios being LRU
folios, to save some cycles.
However, folios might reside in a folio batch without the LRU flag set, or
temporarily have their LRU flag cleared. Consequently, the LRU flag is
unreliable for this purpose.
In particular, this is the case when secretmem_fault() allocates a fresh
page and calls filemap_add_folio()->folio_add_lru(). The folio might be
added to the per-cpu folio batch and won't get the LRU flag set until the
batch was drained using e.g., lru_add_drain().
Consequently, folio_is_secretmem() might not detect secretmem folios and
GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
when we would later try reading/writing to the folio, because the folio
has been unmapped from the directmap.
Fix it by removing that unreliable check.
Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2r…
Debugged-by: Miklos Szeredi <miklos(a)szeredi.hu>
Tested-by: Miklos Szeredi <mszeredi(a)redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1..acf7e1a3f3de 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(struct folio *folio)
/*
* Using folio_mapping() is quite slow because of the actual call
* instruction.
- * We know that secretmem pages are not compound and LRU so we can
+ * We know that secretmem pages are not compound, so we can
* save a couple of cycles here.
*/
- if (folio_test_large(folio) || !folio_test_lru(folio))
+ if (folio_test_large(folio))
return false;
mapping = (struct address_space *)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033031-efficient-gallows-6872@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
Hello,
after upgrading to v6.6.24 from v6.6.23 some old boxes (i686; Intel
Celeron M) stop to boot:
They hang after:
Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel (entry_offset: 0x00000000).
After some minutes they reboot.
I bisected this down to
commit bebb5af001dc6cb4f505bb21c4d5e2efbdc112e2
Author: Thomas Gleixner <tglx(a)linutronix.de>
Date: Fri Mar 22 19:56:39 2024 +0100
x86/mpparse: Register APIC address only once
[ Upstream commit f2208aa12c27bfada3c15c550c03ca81d42dcac2 ]
The APIC address is registered twice. First during the early
detection and
afterwards when actually scanning the table for APIC IDs. The APIC
and
topology core warn about the second attempt.
Restrict it to the early detection call.
Fixes: 81287ad65da5 ("x86/apic: Sanitize APIC address setup")
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Tested-by: Guenter Roeck <linux(a)roeck-us.net>
Link:
https://lore.kernel.org/r/20240322185305.297774848@linutronix.de
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Reverting this commit in v6.6.24 solves the problem.
Regards,
--
Wolfgang Walter
Studierendenwerk München Oberbayern
Anstalt des öffentlichen Rechts
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040823-spilt-marsupial-8d2f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
65291dcfcf89 ("mm/secretmem: fix GUP-fast succeeding on secretmem folios")
8f9ff2deb8b9 ("secretmem: convert page_is_secretmem() to folio_is_secretmem()")
b0496fe4effd ("mm/gup: Convert gup_pte_range() to use a folio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 65291dcfcf8936e1b23cfd7718fdfde7cfaf7706 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Tue, 26 Mar 2024 15:32:08 +0100
Subject: [PATCH] mm/secretmem: fix GUP-fast succeeding on secretmem folios
folio_is_secretmem() currently relies on secretmem folios being LRU
folios, to save some cycles.
However, folios might reside in a folio batch without the LRU flag set, or
temporarily have their LRU flag cleared. Consequently, the LRU flag is
unreliable for this purpose.
In particular, this is the case when secretmem_fault() allocates a fresh
page and calls filemap_add_folio()->folio_add_lru(). The folio might be
added to the per-cpu folio batch and won't get the LRU flag set until the
batch was drained using e.g., lru_add_drain().
Consequently, folio_is_secretmem() might not detect secretmem folios and
GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
when we would later try reading/writing to the folio, because the folio
has been unmapped from the directmap.
Fix it by removing that unreliable check.
Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2r…
Debugged-by: Miklos Szeredi <miklos(a)szeredi.hu>
Tested-by: Miklos Szeredi <mszeredi(a)redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/secretmem.h b/include/linux/secretmem.h
index 35f3a4a8ceb1..acf7e1a3f3de 100644
--- a/include/linux/secretmem.h
+++ b/include/linux/secretmem.h
@@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(struct folio *folio)
/*
* Using folio_mapping() is quite slow because of the actual call
* instruction.
- * We know that secretmem pages are not compound and LRU so we can
+ * We know that secretmem pages are not compound, so we can
* save a couple of cycles here.
*/
- if (folio_test_large(folio) || !folio_test_lru(folio))
+ if (folio_test_large(folio))
return false;
mapping = (struct address_space *)
Added support internal keyboard for the following models:
Asus ExpertBook (B1502CGA, B1502CVA, B2502FBA),
Asus Vivobook (E1504GA, E1504GAB),
Maibenben X565.
Successfully tested on the available Asus ExpertBook B1502CVA model.
[PATCH 6.6.y 1/7] ACPI: resource: Consolidate IRQ trigger-type override DMI
[PATCH 6.6.y 2/7] ACPI: resource: Drop .ident values from dmi_system_id
[PATCH 6.6.y 3/7] ACPI: resource: Add DMI quirks for ASUS Vivobook E1504GA
[PATCH 6.6.y 4/7] ACPI: resource: Skip IRQ override on ASUS ExpertBook
[PATCH 6.6.y 5/7] ACPI: resource: Skip IRQ override on ASUS ExpertBook
[PATCH 6.6.y 6/7] ACPI: resource: Add IRQ override quirk for ASUS
[PATCH 6.6.y 7/7] ACPI: resource: Use IRQ override on Maibenben X565
Added support internal keyboard for the following models:
Asus ExpertBook (B1502CVA, B2502FBA),
Maibenben X565.
Successfully tested on the available Asus ExpertBook B1502CVA model.
[PATCH 6.8.y 1/3] ACPI: resource: Skip IRQ override on ASUS ExpertBook
[PATCH 6.8.y 2/3] ACPI: resource: Add IRQ override quirk for ASUS
[PATCH 6.8.y 3/3] ACPI: resource: Use IRQ override on Maibenben X565
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 99f855082f228cdcecd6ab768d3b8b505e0eb028
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040803-tackling-bogged-527d@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
99f855082f22 ("drm/i915/mst: Reject FEC+MST on ICL")
126f94e87e79 ("drm/i915: Fix FEC pipe A vs. DDI A mixup")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 99f855082f228cdcecd6ab768d3b8b505e0eb028 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Tue, 2 Apr 2024 16:51:47 +0300
Subject: [PATCH] drm/i915/mst: Reject FEC+MST on ICL
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
ICL supposedly doesn't support FEC on MST. Reject it.
Cc: stable(a)vger.kernel.org
Fixes: d51f25eb479a ("drm/i915: Add DSC support to MST path")
Reviewed-by: Uma Shankar <uma.shankar(a)intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240402135148.23011-7-ville.…
(cherry picked from commit b648ce2a28ba83c4fa67c61fcc5983e15e9d4afb)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_dp.c b/drivers/gpu/drm/i915/display/intel_dp.c
index 36afbb68d87d..abd62bebc46d 100644
--- a/drivers/gpu/drm/i915/display/intel_dp.c
+++ b/drivers/gpu/drm/i915/display/intel_dp.c
@@ -1422,7 +1422,8 @@ static bool intel_dp_source_supports_fec(struct intel_dp *intel_dp,
if (DISPLAY_VER(dev_priv) >= 12)
return true;
- if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A)
+ if (DISPLAY_VER(dev_priv) == 11 && encoder->port != PORT_A &&
+ !intel_crtc_has_type(pipe_config, INTEL_OUTPUT_DP_MST))
return true;
return false;
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 51bc63392e96ca45d7be98bc43c180b174ffca09
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040851-nail-pectin-26a4@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
51bc63392e96 ("drm/i915/mst: Limit MST+DSC to TGL+")
d19daffc89fe ("drm/i915/dp_mst: Use connector DSC DPCD in intel_dp_mst_mode_valid_ctx()")
8d5284765a43 ("drm/i915/dp: Use consistent name for link bpp and compressed bpp")
3a4b4809c8cc ("drm/i915/dp: Move compressed bpp check with 420 format inside the helper")
a1476c2a9715 ("drm/i915/dp: Consider output_format while computing dsc bpp")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 51bc63392e96ca45d7be98bc43c180b174ffca09 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Ville=20Syrj=C3=A4l=C3=A4?= <ville.syrjala(a)linux.intel.com>
Date: Tue, 2 Apr 2024 16:51:46 +0300
Subject: [PATCH] drm/i915/mst: Limit MST+DSC to TGL+
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The MST code currently assumes that glk+ already supports MST+DSC,
which is incorrect. We need to check for TGL+ actually. ICL does
support SST+DSC, but supposedly it can't do MST+FEC which will
also rule out MST+DSC.
Note that a straight TGL+ check doesn't work here because DSC
support can get fused out, so we do need to also check 'has_dsc'.
Cc: stable(a)vger.kernel.org
Fixes: d51f25eb479a ("drm/i915: Add DSC support to MST path")
Reviewed-by: Uma Shankar <uma.shankar(a)intel.com>
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20240402135148.23011-6-ville.…
(cherry picked from commit c9c92f286dbdf872390ef3e74dbe5f0641e46f55)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
diff --git a/drivers/gpu/drm/i915/display/intel_display_device.h b/drivers/gpu/drm/i915/display/intel_display_device.h
index fe4268813786..9b1bce2624b9 100644
--- a/drivers/gpu/drm/i915/display/intel_display_device.h
+++ b/drivers/gpu/drm/i915/display/intel_display_device.h
@@ -47,6 +47,7 @@ struct drm_printer;
#define HAS_DPT(i915) (DISPLAY_VER(i915) >= 13)
#define HAS_DSB(i915) (DISPLAY_INFO(i915)->has_dsb)
#define HAS_DSC(__i915) (DISPLAY_RUNTIME_INFO(__i915)->has_dsc)
+#define HAS_DSC_MST(__i915) (DISPLAY_VER(__i915) >= 12 && HAS_DSC(__i915))
#define HAS_FBC(i915) (DISPLAY_RUNTIME_INFO(i915)->fbc_mask != 0)
#define HAS_FPGA_DBG_UNCLAIMED(i915) (DISPLAY_INFO(i915)->has_fpga_dbg)
#define HAS_FW_BLC(i915) (DISPLAY_VER(i915) >= 3)
diff --git a/drivers/gpu/drm/i915/display/intel_dp_mst.c b/drivers/gpu/drm/i915/display/intel_dp_mst.c
index 53aec023ce92..b651c990af85 100644
--- a/drivers/gpu/drm/i915/display/intel_dp_mst.c
+++ b/drivers/gpu/drm/i915/display/intel_dp_mst.c
@@ -1355,7 +1355,7 @@ intel_dp_mst_mode_valid_ctx(struct drm_connector *connector,
return 0;
}
- if (DISPLAY_VER(dev_priv) >= 10 &&
+ if (HAS_DSC_MST(dev_priv) &&
drm_dp_sink_supports_dsc(intel_connector->dp.dsc_dpcd)) {
/*
* TBD pass the connector BPC,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 58acd1f497162e7d282077f816faa519487be045
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040829-upcoming-gnat-69ec@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
58acd1f49716 ("smb: client: fix potential UAF in cifs_dump_full_key()")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
8e3554150d6c ("cifs: fix sharing of DFS connections")
2f4e429c8469 ("cifs: lock chan_lock outside match_session")
396935de1455 ("cifs: fix use-after-free bug in refresh_cache_worker()")
b56bce502f55 ("cifs: set DFS root session in cifs_get_smb_ses()")
b9ee2e307c6b ("cifs: improve checking of DFS links over STATUS_OBJECT_NAME_INVALID")
7ad54b98fc1f ("cifs: use origin fullpath for automounts")
466611e4af82 ("cifs: fix source pathname comparison of dfs supers")
6916881f443f ("cifs: fix refresh of cached referrals")
cb3f6d876452 ("cifs: don't refresh cached referrals from unactive mounts")
a1c0d00572fc ("cifs: share dfs connections and supers")
a73a26d97eca ("cifs: split out ses and tcon retrieval from mount_get_conns()")
2301bc103ac4 ("cifs: remove unused smb3_fs_context::mount_options")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
c877ce47e137 ("cifs: reduce roundtrips on create/qinfo requests")
83fb8abec293 ("cifs: Add "extbuf" and "extbuflen" args to smb2_compound_op()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 58acd1f497162e7d282077f816faa519487be045 Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Tue, 2 Apr 2024 16:33:54 -0300
Subject: [PATCH] smb: client: fix potential UAF in cifs_dump_full_key()
Skip sessions that are being teared down (status == SES_EXITING) to
avoid UAF.
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/ioctl.c b/fs/smb/client/ioctl.c
index c012dfdba80d..855ac5a62edf 100644
--- a/fs/smb/client/ioctl.c
+++ b/fs/smb/client/ioctl.c
@@ -247,7 +247,9 @@ static int cifs_dump_full_key(struct cifs_tcon *tcon, struct smb3_full_key_debug
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry(server_it, &cifs_tcp_ses_list, tcp_ses_list) {
list_for_each_entry(ses_it, &server_it->smb_ses_list, smb_ses_list) {
- if (ses_it->Suid == out.session_id) {
+ spin_lock(&ses_it->ses_lock);
+ if (ses_it->ses_status != SES_EXITING &&
+ ses_it->Suid == out.session_id) {
ses = ses_it;
/*
* since we are using the session outside the crit
@@ -255,9 +257,11 @@ static int cifs_dump_full_key(struct cifs_tcon *tcon, struct smb3_full_key_debug
* so increment its refcount
*/
cifs_smb_ses_inc_refcount(ses);
+ spin_unlock(&ses_it->ses_lock);
found = true;
goto search_end;
}
+ spin_unlock(&ses_it->ses_lock);
}
}
search_end:
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 24a9799aa8efecd0eb55a75e35f9d8e6400063aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040843-empathic-duller-2cb9@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
24a9799aa8ef ("smb: client: fix UAF in smb2_reconnect_server()")
7257bcf3bdc7 ("cifs: cifs_chan_is_iface_active should be called with chan_lock held")
27e1fd343f80 ("cifs: after disabling multichannel, mark tcon for reconnect")
fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
c37ed2d7d098 ("smb: client: remove extra @chan_count check in __cifs_put_smb_ses()")
ff7d80a9f271 ("cifs: fix session state transition to avoid use-after-free issue")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
943fb67b0902 ("cifs: missing lock when updating session status")
bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects")
1bcd548d935a ("cifs: prevent data race in cifs_reconnect_tcon()")
e77978de4765 ("cifs: update ip_addr for ses only for primary chan setup")
3c0070f54b31 ("cifs: prevent data race in smb2_reconnect()")
05844bd661d9 ("cifs: print last update time for interface list")
25cf01b7c920 ("cifs: set correct status of tcon ipc when reconnecting")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
f391d6ee002e ("cifs: Use after free in debug code")
68e14569d7e5 ("smb3: add dynamic trace points for tree disconnect")
13609a8b3ac6 ("cifs: move from strlcpy with unused retval to strscpy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24a9799aa8efecd0eb55a75e35f9d8e6400063aa Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Mon, 1 Apr 2024 14:13:10 -0300
Subject: [PATCH] smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9b85b5341822..ee29bc57300c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -232,7 +232,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /* check if iface is still active */
+ spin_lock(&ses->ses_lock);
+ if (ses->ses_status == SES_EXITING) {
+ spin_unlock(&ses->ses_lock);
+ continue;
+ }
+ spin_unlock(&ses->ses_lock);
+
spin_lock(&ses->chan_lock);
if (cifs_ses_get_chan_index(ses, server) ==
CIFS_INVAL_CHAN_INDEX) {
@@ -1963,31 +1969,6 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
return rc;
}
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
- struct cifs_tcon *tcon = ses->tcon_ipc;
-
- if (tcon == NULL)
- return 0;
-
- tconInfoFree(tcon);
- ses->tcon_ipc = NULL;
- return 0;
-}
-
static struct cifs_ses *
cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
{
@@ -2019,48 +2000,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
void __cifs_put_smb_ses(struct cifs_ses *ses)
{
struct TCP_Server_Info *server = ses->server;
+ struct cifs_tcon *tcon;
unsigned int xid;
size_t i;
+ bool do_logoff;
int rc;
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING) {
- spin_unlock(&ses->ses_lock);
- return;
- }
- spin_unlock(&ses->ses_lock);
-
- cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
- cifs_dbg(FYI,
- "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
-
spin_lock(&cifs_tcp_ses_lock);
- if (--ses->ses_count > 0) {
+ spin_lock(&ses->ses_lock);
+ cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+ __func__, ses->Suid, ses->ses_count, ses->ses_status,
+ ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+ if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
+ spin_unlock(&ses->ses_lock);
spin_unlock(&cifs_tcp_ses_lock);
return;
}
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_GOOD)
- ses->ses_status = SES_EXITING;
- spin_unlock(&ses->ses_lock);
- spin_unlock(&cifs_tcp_ses_lock);
-
/* ses_count can never go negative */
WARN_ON(ses->ses_count < 0);
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING && server->ops->logoff) {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+
+ do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+ ses->ses_status = SES_EXITING;
+ tcon = ses->tcon_ipc;
+ ses->tcon_ipc = NULL;
+ spin_unlock(&ses->ses_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ /*
+ * On session close, the IPC is closed and the server must release all
+ * tcons of the session. No need to send a tree disconnect here.
+ *
+ * Besides, it will make the server to not close durable and resilient
+ * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+ * SMB2 LOGOFF Request.
+ */
+ tconInfoFree(tcon);
+ if (do_logoff) {
xid = get_xid();
rc = server->ops->logoff(xid, ses);
if (rc)
cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
__func__, rc);
_free_xid(xid);
- } else {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
}
spin_lock(&cifs_tcp_ses_lock);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 24a9799aa8efecd0eb55a75e35f9d8e6400063aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040841-scarcity-subarctic-4f20@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
24a9799aa8ef ("smb: client: fix UAF in smb2_reconnect_server()")
7257bcf3bdc7 ("cifs: cifs_chan_is_iface_active should be called with chan_lock held")
27e1fd343f80 ("cifs: after disabling multichannel, mark tcon for reconnect")
fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
c37ed2d7d098 ("smb: client: remove extra @chan_count check in __cifs_put_smb_ses()")
ff7d80a9f271 ("cifs: fix session state transition to avoid use-after-free issue")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
943fb67b0902 ("cifs: missing lock when updating session status")
bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects")
1bcd548d935a ("cifs: prevent data race in cifs_reconnect_tcon()")
e77978de4765 ("cifs: update ip_addr for ses only for primary chan setup")
3c0070f54b31 ("cifs: prevent data race in smb2_reconnect()")
05844bd661d9 ("cifs: print last update time for interface list")
25cf01b7c920 ("cifs: set correct status of tcon ipc when reconnecting")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
f391d6ee002e ("cifs: Use after free in debug code")
68e14569d7e5 ("smb3: add dynamic trace points for tree disconnect")
13609a8b3ac6 ("cifs: move from strlcpy with unused retval to strscpy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24a9799aa8efecd0eb55a75e35f9d8e6400063aa Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Mon, 1 Apr 2024 14:13:10 -0300
Subject: [PATCH] smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9b85b5341822..ee29bc57300c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -232,7 +232,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /* check if iface is still active */
+ spin_lock(&ses->ses_lock);
+ if (ses->ses_status == SES_EXITING) {
+ spin_unlock(&ses->ses_lock);
+ continue;
+ }
+ spin_unlock(&ses->ses_lock);
+
spin_lock(&ses->chan_lock);
if (cifs_ses_get_chan_index(ses, server) ==
CIFS_INVAL_CHAN_INDEX) {
@@ -1963,31 +1969,6 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
return rc;
}
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
- struct cifs_tcon *tcon = ses->tcon_ipc;
-
- if (tcon == NULL)
- return 0;
-
- tconInfoFree(tcon);
- ses->tcon_ipc = NULL;
- return 0;
-}
-
static struct cifs_ses *
cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
{
@@ -2019,48 +2000,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
void __cifs_put_smb_ses(struct cifs_ses *ses)
{
struct TCP_Server_Info *server = ses->server;
+ struct cifs_tcon *tcon;
unsigned int xid;
size_t i;
+ bool do_logoff;
int rc;
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING) {
- spin_unlock(&ses->ses_lock);
- return;
- }
- spin_unlock(&ses->ses_lock);
-
- cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
- cifs_dbg(FYI,
- "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
-
spin_lock(&cifs_tcp_ses_lock);
- if (--ses->ses_count > 0) {
+ spin_lock(&ses->ses_lock);
+ cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+ __func__, ses->Suid, ses->ses_count, ses->ses_status,
+ ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+ if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
+ spin_unlock(&ses->ses_lock);
spin_unlock(&cifs_tcp_ses_lock);
return;
}
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_GOOD)
- ses->ses_status = SES_EXITING;
- spin_unlock(&ses->ses_lock);
- spin_unlock(&cifs_tcp_ses_lock);
-
/* ses_count can never go negative */
WARN_ON(ses->ses_count < 0);
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING && server->ops->logoff) {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+
+ do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+ ses->ses_status = SES_EXITING;
+ tcon = ses->tcon_ipc;
+ ses->tcon_ipc = NULL;
+ spin_unlock(&ses->ses_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ /*
+ * On session close, the IPC is closed and the server must release all
+ * tcons of the session. No need to send a tree disconnect here.
+ *
+ * Besides, it will make the server to not close durable and resilient
+ * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+ * SMB2 LOGOFF Request.
+ */
+ tconInfoFree(tcon);
+ if (do_logoff) {
xid = get_xid();
rc = server->ops->logoff(xid, ses);
if (rc)
cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
__func__, rc);
_free_xid(xid);
- } else {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
}
spin_lock(&cifs_tcp_ses_lock);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 24a9799aa8efecd0eb55a75e35f9d8e6400063aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040839-drainage-uninvited-614e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
24a9799aa8ef ("smb: client: fix UAF in smb2_reconnect_server()")
7257bcf3bdc7 ("cifs: cifs_chan_is_iface_active should be called with chan_lock held")
27e1fd343f80 ("cifs: after disabling multichannel, mark tcon for reconnect")
fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
c37ed2d7d098 ("smb: client: remove extra @chan_count check in __cifs_put_smb_ses()")
ff7d80a9f271 ("cifs: fix session state transition to avoid use-after-free issue")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
943fb67b0902 ("cifs: missing lock when updating session status")
bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects")
1bcd548d935a ("cifs: prevent data race in cifs_reconnect_tcon()")
e77978de4765 ("cifs: update ip_addr for ses only for primary chan setup")
3c0070f54b31 ("cifs: prevent data race in smb2_reconnect()")
05844bd661d9 ("cifs: print last update time for interface list")
25cf01b7c920 ("cifs: set correct status of tcon ipc when reconnecting")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
f391d6ee002e ("cifs: Use after free in debug code")
68e14569d7e5 ("smb3: add dynamic trace points for tree disconnect")
13609a8b3ac6 ("cifs: move from strlcpy with unused retval to strscpy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24a9799aa8efecd0eb55a75e35f9d8e6400063aa Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Mon, 1 Apr 2024 14:13:10 -0300
Subject: [PATCH] smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9b85b5341822..ee29bc57300c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -232,7 +232,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /* check if iface is still active */
+ spin_lock(&ses->ses_lock);
+ if (ses->ses_status == SES_EXITING) {
+ spin_unlock(&ses->ses_lock);
+ continue;
+ }
+ spin_unlock(&ses->ses_lock);
+
spin_lock(&ses->chan_lock);
if (cifs_ses_get_chan_index(ses, server) ==
CIFS_INVAL_CHAN_INDEX) {
@@ -1963,31 +1969,6 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
return rc;
}
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
- struct cifs_tcon *tcon = ses->tcon_ipc;
-
- if (tcon == NULL)
- return 0;
-
- tconInfoFree(tcon);
- ses->tcon_ipc = NULL;
- return 0;
-}
-
static struct cifs_ses *
cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
{
@@ -2019,48 +2000,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
void __cifs_put_smb_ses(struct cifs_ses *ses)
{
struct TCP_Server_Info *server = ses->server;
+ struct cifs_tcon *tcon;
unsigned int xid;
size_t i;
+ bool do_logoff;
int rc;
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING) {
- spin_unlock(&ses->ses_lock);
- return;
- }
- spin_unlock(&ses->ses_lock);
-
- cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
- cifs_dbg(FYI,
- "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
-
spin_lock(&cifs_tcp_ses_lock);
- if (--ses->ses_count > 0) {
+ spin_lock(&ses->ses_lock);
+ cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+ __func__, ses->Suid, ses->ses_count, ses->ses_status,
+ ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+ if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
+ spin_unlock(&ses->ses_lock);
spin_unlock(&cifs_tcp_ses_lock);
return;
}
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_GOOD)
- ses->ses_status = SES_EXITING;
- spin_unlock(&ses->ses_lock);
- spin_unlock(&cifs_tcp_ses_lock);
-
/* ses_count can never go negative */
WARN_ON(ses->ses_count < 0);
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING && server->ops->logoff) {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+
+ do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+ ses->ses_status = SES_EXITING;
+ tcon = ses->tcon_ipc;
+ ses->tcon_ipc = NULL;
+ spin_unlock(&ses->ses_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ /*
+ * On session close, the IPC is closed and the server must release all
+ * tcons of the session. No need to send a tree disconnect here.
+ *
+ * Besides, it will make the server to not close durable and resilient
+ * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+ * SMB2 LOGOFF Request.
+ */
+ tconInfoFree(tcon);
+ if (do_logoff) {
xid = get_xid();
rc = server->ops->logoff(xid, ses);
if (rc)
cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
__func__, rc);
_free_xid(xid);
- } else {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
}
spin_lock(&cifs_tcp_ses_lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 24a9799aa8efecd0eb55a75e35f9d8e6400063aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040837-alkaline-motor-f911@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
24a9799aa8ef ("smb: client: fix UAF in smb2_reconnect_server()")
7257bcf3bdc7 ("cifs: cifs_chan_is_iface_active should be called with chan_lock held")
27e1fd343f80 ("cifs: after disabling multichannel, mark tcon for reconnect")
fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
c37ed2d7d098 ("smb: client: remove extra @chan_count check in __cifs_put_smb_ses()")
ff7d80a9f271 ("cifs: fix session state transition to avoid use-after-free issue")
38c8a9a52082 ("smb: move client and server files to common directory fs/smb")
943fb67b0902 ("cifs: missing lock when updating session status")
bc962159e8e3 ("cifs: avoid race conditions with parallel reconnects")
1bcd548d935a ("cifs: prevent data race in cifs_reconnect_tcon()")
e77978de4765 ("cifs: update ip_addr for ses only for primary chan setup")
3c0070f54b31 ("cifs: prevent data race in smb2_reconnect()")
05844bd661d9 ("cifs: print last update time for interface list")
25cf01b7c920 ("cifs: set correct status of tcon ipc when reconnecting")
abdb1742a312 ("cifs: get rid of mount options string parsing")
9fd29a5bae6e ("cifs: use fs_context for automounts")
f391d6ee002e ("cifs: Use after free in debug code")
68e14569d7e5 ("smb3: add dynamic trace points for tree disconnect")
13609a8b3ac6 ("cifs: move from strlcpy with unused retval to strscpy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24a9799aa8efecd0eb55a75e35f9d8e6400063aa Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Mon, 1 Apr 2024 14:13:10 -0300
Subject: [PATCH] smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9b85b5341822..ee29bc57300c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -232,7 +232,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /* check if iface is still active */
+ spin_lock(&ses->ses_lock);
+ if (ses->ses_status == SES_EXITING) {
+ spin_unlock(&ses->ses_lock);
+ continue;
+ }
+ spin_unlock(&ses->ses_lock);
+
spin_lock(&ses->chan_lock);
if (cifs_ses_get_chan_index(ses, server) ==
CIFS_INVAL_CHAN_INDEX) {
@@ -1963,31 +1969,6 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
return rc;
}
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
- struct cifs_tcon *tcon = ses->tcon_ipc;
-
- if (tcon == NULL)
- return 0;
-
- tconInfoFree(tcon);
- ses->tcon_ipc = NULL;
- return 0;
-}
-
static struct cifs_ses *
cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
{
@@ -2019,48 +2000,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
void __cifs_put_smb_ses(struct cifs_ses *ses)
{
struct TCP_Server_Info *server = ses->server;
+ struct cifs_tcon *tcon;
unsigned int xid;
size_t i;
+ bool do_logoff;
int rc;
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING) {
- spin_unlock(&ses->ses_lock);
- return;
- }
- spin_unlock(&ses->ses_lock);
-
- cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
- cifs_dbg(FYI,
- "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
-
spin_lock(&cifs_tcp_ses_lock);
- if (--ses->ses_count > 0) {
+ spin_lock(&ses->ses_lock);
+ cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+ __func__, ses->Suid, ses->ses_count, ses->ses_status,
+ ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+ if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
+ spin_unlock(&ses->ses_lock);
spin_unlock(&cifs_tcp_ses_lock);
return;
}
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_GOOD)
- ses->ses_status = SES_EXITING;
- spin_unlock(&ses->ses_lock);
- spin_unlock(&cifs_tcp_ses_lock);
-
/* ses_count can never go negative */
WARN_ON(ses->ses_count < 0);
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING && server->ops->logoff) {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+
+ do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+ ses->ses_status = SES_EXITING;
+ tcon = ses->tcon_ipc;
+ ses->tcon_ipc = NULL;
+ spin_unlock(&ses->ses_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ /*
+ * On session close, the IPC is closed and the server must release all
+ * tcons of the session. No need to send a tree disconnect here.
+ *
+ * Besides, it will make the server to not close durable and resilient
+ * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+ * SMB2 LOGOFF Request.
+ */
+ tconInfoFree(tcon);
+ if (do_logoff) {
xid = get_xid();
rc = server->ops->logoff(xid, ses);
if (rc)
cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
__func__, rc);
_free_xid(xid);
- } else {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
}
spin_lock(&cifs_tcp_ses_lock);
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 24a9799aa8efecd0eb55a75e35f9d8e6400063aa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040832-episode-phrasing-9e1a@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
24a9799aa8ef ("smb: client: fix UAF in smb2_reconnect_server()")
7257bcf3bdc7 ("cifs: cifs_chan_is_iface_active should be called with chan_lock held")
27e1fd343f80 ("cifs: after disabling multichannel, mark tcon for reconnect")
fa1d0508bdd4 ("cifs: account for primary channel in the interface list")
a6d8fb54a515 ("cifs: distribute channels across interfaces based on speed")
c37ed2d7d098 ("smb: client: remove extra @chan_count check in __cifs_put_smb_ses()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 24a9799aa8efecd0eb55a75e35f9d8e6400063aa Mon Sep 17 00:00:00 2001
From: Paulo Alcantara <pc(a)manguebit.com>
Date: Mon, 1 Apr 2024 14:13:10 -0300
Subject: [PATCH] smb: client: fix UAF in smb2_reconnect_server()
The UAF bug is due to smb2_reconnect_server() accessing a session that
is already being teared down by another thread that is executing
__cifs_put_smb_ses(). This can happen when (a) the client has
connection to the server but no session or (b) another thread ends up
setting @ses->ses_status again to something different than
SES_EXITING.
To fix this, we need to make sure to unconditionally set
@ses->ses_status to SES_EXITING and prevent any other threads from
setting a new status while we're still tearing it down.
The following can be reproduced by adding some delay to right after
the ipc is freed in __cifs_put_smb_ses() - which will give
smb2_reconnect_server() worker a chance to run and then accessing
@ses->ipc:
kinit ...
mount.cifs //srv/share /mnt/1 -o sec=krb5,nohandlecache,echo_interval=10
[disconnect srv]
ls /mnt/1 &>/dev/null
sleep 30
kdestroy
[reconnect srv]
sleep 10
umount /mnt/1
...
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
CIFS: VFS: Verify user has a krb5 ticket and keyutils is installed
CIFS: VFS: \\srv Send error in SessSetup = -126
general protection fault, probably for non-canonical address
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 50 Comm: kworker/3:1 Not tainted 6.9.0-rc2 #1
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-1.fc39
04/01/2014
Workqueue: cifsiod smb2_reconnect_server [cifs]
RIP: 0010:__list_del_entry_valid_or_report+0x33/0xf0
Code: 4f 08 48 85 d2 74 42 48 85 c9 74 59 48 b8 00 01 00 00 00 00 ad
de 48 39 c2 74 61 48 b8 22 01 00 00 00 00 74 69 <48> 8b 01 48 39 f8 75
7b 48 8b 72 08 48 39 c6 0f 85 88 00 00 00 b8
RSP: 0018:ffffc900001bfd70 EFLAGS: 00010a83
RAX: dead000000000122 RBX: ffff88810da53838 RCX: 6b6b6b6b6b6b6b6b
RDX: 6b6b6b6b6b6b6b6b RSI: ffffffffc02f6878 RDI: ffff88810da53800
RBP: ffff88810da53800 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: ffff88810c064000
R13: 0000000000000001 R14: ffff88810c064000 R15: ffff8881039cc000
FS: 0000000000000000(0000) GS:ffff888157c00000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fe3728b1000 CR3: 000000010caa4000 CR4: 0000000000750ef0
PKRU: 55555554
Call Trace:
<TASK>
? die_addr+0x36/0x90
? exc_general_protection+0x1c1/0x3f0
? asm_exc_general_protection+0x26/0x30
? __list_del_entry_valid_or_report+0x33/0xf0
__cifs_put_smb_ses+0x1ae/0x500 [cifs]
smb2_reconnect_server+0x4ed/0x710 [cifs]
process_one_work+0x205/0x6b0
worker_thread+0x191/0x360
? __pfx_worker_thread+0x10/0x10
kthread+0xe2/0x110
? __pfx_kthread+0x10/0x10
ret_from_fork+0x34/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c
index 9b85b5341822..ee29bc57300c 100644
--- a/fs/smb/client/connect.c
+++ b/fs/smb/client/connect.c
@@ -232,7 +232,13 @@ cifs_mark_tcp_ses_conns_for_reconnect(struct TCP_Server_Info *server,
spin_lock(&cifs_tcp_ses_lock);
list_for_each_entry_safe(ses, nses, &pserver->smb_ses_list, smb_ses_list) {
- /* check if iface is still active */
+ spin_lock(&ses->ses_lock);
+ if (ses->ses_status == SES_EXITING) {
+ spin_unlock(&ses->ses_lock);
+ continue;
+ }
+ spin_unlock(&ses->ses_lock);
+
spin_lock(&ses->chan_lock);
if (cifs_ses_get_chan_index(ses, server) ==
CIFS_INVAL_CHAN_INDEX) {
@@ -1963,31 +1969,6 @@ cifs_setup_ipc(struct cifs_ses *ses, struct smb3_fs_context *ctx)
return rc;
}
-/**
- * cifs_free_ipc - helper to release the session IPC tcon
- * @ses: smb session to unmount the IPC from
- *
- * Needs to be called everytime a session is destroyed.
- *
- * On session close, the IPC is closed and the server must release all tcons of the session.
- * No need to send a tree disconnect here.
- *
- * Besides, it will make the server to not close durable and resilient files on session close, as
- * specified in MS-SMB2 3.3.5.6 Receiving an SMB2 LOGOFF Request.
- */
-static int
-cifs_free_ipc(struct cifs_ses *ses)
-{
- struct cifs_tcon *tcon = ses->tcon_ipc;
-
- if (tcon == NULL)
- return 0;
-
- tconInfoFree(tcon);
- ses->tcon_ipc = NULL;
- return 0;
-}
-
static struct cifs_ses *
cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
{
@@ -2019,48 +2000,52 @@ cifs_find_smb_ses(struct TCP_Server_Info *server, struct smb3_fs_context *ctx)
void __cifs_put_smb_ses(struct cifs_ses *ses)
{
struct TCP_Server_Info *server = ses->server;
+ struct cifs_tcon *tcon;
unsigned int xid;
size_t i;
+ bool do_logoff;
int rc;
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING) {
- spin_unlock(&ses->ses_lock);
- return;
- }
- spin_unlock(&ses->ses_lock);
-
- cifs_dbg(FYI, "%s: ses_count=%d\n", __func__, ses->ses_count);
- cifs_dbg(FYI,
- "%s: ses ipc: %s\n", __func__, ses->tcon_ipc ? ses->tcon_ipc->tree_name : "NONE");
-
spin_lock(&cifs_tcp_ses_lock);
- if (--ses->ses_count > 0) {
+ spin_lock(&ses->ses_lock);
+ cifs_dbg(FYI, "%s: id=0x%llx ses_count=%d ses_status=%u ipc=%s\n",
+ __func__, ses->Suid, ses->ses_count, ses->ses_status,
+ ses->tcon_ipc ? ses->tcon_ipc->tree_name : "none");
+ if (ses->ses_status == SES_EXITING || --ses->ses_count > 0) {
+ spin_unlock(&ses->ses_lock);
spin_unlock(&cifs_tcp_ses_lock);
return;
}
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_GOOD)
- ses->ses_status = SES_EXITING;
- spin_unlock(&ses->ses_lock);
- spin_unlock(&cifs_tcp_ses_lock);
-
/* ses_count can never go negative */
WARN_ON(ses->ses_count < 0);
- spin_lock(&ses->ses_lock);
- if (ses->ses_status == SES_EXITING && server->ops->logoff) {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
+ spin_lock(&ses->chan_lock);
+ cifs_chan_clear_need_reconnect(ses, server);
+ spin_unlock(&ses->chan_lock);
+
+ do_logoff = ses->ses_status == SES_GOOD && server->ops->logoff;
+ ses->ses_status = SES_EXITING;
+ tcon = ses->tcon_ipc;
+ ses->tcon_ipc = NULL;
+ spin_unlock(&ses->ses_lock);
+ spin_unlock(&cifs_tcp_ses_lock);
+
+ /*
+ * On session close, the IPC is closed and the server must release all
+ * tcons of the session. No need to send a tree disconnect here.
+ *
+ * Besides, it will make the server to not close durable and resilient
+ * files on session close, as specified in MS-SMB2 3.3.5.6 Receiving an
+ * SMB2 LOGOFF Request.
+ */
+ tconInfoFree(tcon);
+ if (do_logoff) {
xid = get_xid();
rc = server->ops->logoff(xid, ses);
if (rc)
cifs_server_dbg(VFS, "%s: Session Logoff failure rc=%d\n",
__func__, rc);
_free_xid(xid);
- } else {
- spin_unlock(&ses->ses_lock);
- cifs_free_ipc(ses);
}
spin_lock(&cifs_tcp_ses_lock);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x d14fa1fcf69db9d070e75f1c4425211fa619dfc8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040817-unplowed-heavily-b59d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
d14fa1fcf69d ("riscv: process: Fix kernel gp leakage")
fea2fed201ee ("riscv: Enable per-task stack canaries")
d7071743db31 ("RISC-V: Add EFI stub support.")
f2c9699f6555 ("riscv: Add STACKPROTECTOR supported")
a5d8e55b2c7d ("Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d14fa1fcf69db9d070e75f1c4425211fa619dfc8 Mon Sep 17 00:00:00 2001
From: Stefan O'Rear <sorear(a)fastmail.com>
Date: Wed, 27 Mar 2024 02:12:58 -0400
Subject: [PATCH] riscv: process: Fix kernel gp leakage
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear(a)fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 6abeecbfc51d..e4bc61c4e58a 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -27,8 +27,6 @@
#include <asm/vector.h>
#include <asm/cpufeature.h>
-register unsigned long gp_in_global __asm__("gp");
-
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
#include <linux/stackprotector.h>
unsigned long __stack_chk_guard __read_mostly;
@@ -207,7 +205,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
if (unlikely(args->fn)) {
/* Kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- childregs->gp = gp_in_global;
/* Supervisor/Machine, irqs on: */
childregs->status = SR_PP | SR_PIE;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d14fa1fcf69db9d070e75f1c4425211fa619dfc8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040815-geometric-quaintly-f4a7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d14fa1fcf69d ("riscv: process: Fix kernel gp leakage")
fea2fed201ee ("riscv: Enable per-task stack canaries")
d7071743db31 ("RISC-V: Add EFI stub support.")
f2c9699f6555 ("riscv: Add STACKPROTECTOR supported")
a5d8e55b2c7d ("Merge tag 'v5.7-rc7' into efi/core, to refresh the branch and pick up fixes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d14fa1fcf69db9d070e75f1c4425211fa619dfc8 Mon Sep 17 00:00:00 2001
From: Stefan O'Rear <sorear(a)fastmail.com>
Date: Wed, 27 Mar 2024 02:12:58 -0400
Subject: [PATCH] riscv: process: Fix kernel gp leakage
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear(a)fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 6abeecbfc51d..e4bc61c4e58a 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -27,8 +27,6 @@
#include <asm/vector.h>
#include <asm/cpufeature.h>
-register unsigned long gp_in_global __asm__("gp");
-
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
#include <linux/stackprotector.h>
unsigned long __stack_chk_guard __read_mostly;
@@ -207,7 +205,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
if (unlikely(args->fn)) {
/* Kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- childregs->gp = gp_in_global;
/* Supervisor/Machine, irqs on: */
childregs->status = SR_PP | SR_PIE;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d14fa1fcf69db9d070e75f1c4425211fa619dfc8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040813-decline-blinker-ebc3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d14fa1fcf69d ("riscv: process: Fix kernel gp leakage")
fea2fed201ee ("riscv: Enable per-task stack canaries")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d14fa1fcf69db9d070e75f1c4425211fa619dfc8 Mon Sep 17 00:00:00 2001
From: Stefan O'Rear <sorear(a)fastmail.com>
Date: Wed, 27 Mar 2024 02:12:58 -0400
Subject: [PATCH] riscv: process: Fix kernel gp leakage
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0ac ("RISC-V: Task implementation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear(a)fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti(a)rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer(a)rivosinc.com>
diff --git a/arch/riscv/kernel/process.c b/arch/riscv/kernel/process.c
index 6abeecbfc51d..e4bc61c4e58a 100644
--- a/arch/riscv/kernel/process.c
+++ b/arch/riscv/kernel/process.c
@@ -27,8 +27,6 @@
#include <asm/vector.h>
#include <asm/cpufeature.h>
-register unsigned long gp_in_global __asm__("gp");
-
#if defined(CONFIG_STACKPROTECTOR) && !defined(CONFIG_STACKPROTECTOR_PER_TASK)
#include <linux/stackprotector.h>
unsigned long __stack_chk_guard __read_mostly;
@@ -207,7 +205,6 @@ int copy_thread(struct task_struct *p, const struct kernel_clone_args *args)
if (unlikely(args->fn)) {
/* Kernel thread */
memset(childregs, 0, sizeof(struct pt_regs));
- childregs->gp = gp_in_global;
/* Supervisor/Machine, irqs on: */
childregs->status = SR_PP | SR_PIE;
Hi,
On Sun, 2024-04-07 at 16:13 -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> ASoC: tas2781: mark dvc_tlv with __maybe_unused
>
> to the 6.8-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> asoc-tas2781-mark-dvc_tlv-with-__maybe_unused.patch
> and it can be found in the queue-6.8 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
Is this necessary for stable? It only fixes a W=1 build warning.
Regards,
Gergo
The patchset fixes some warnings reported by the kernel during boot.
The cache size info is from Processor_Datasheet_v2XX.pdf [1], Section
2.2.1 Master Processor.
The cache line size and the set-associative info are from Cortex-A53
Documentation [2].
From the doc, it can be concluded that L1 i-cache is 4-way assoc, L1
d-cache is 2-way assoc and L2 cache is 16-way assoc. Calculate the dts
props accordingly.
Also, to use KVM's VGIC code, GICH, GICV registers spaces and maintenance
IRQ are added to the dts with verification.
[1]: https://github.com/96boards/documentation/blob/master/enterprise/poplar/har…
[2]: https://developer.arm.com/documentation/ddi0500/j/Level-1-Memory-System
Signed-off-by: Yang Xiwen <forbidden405(a)outlook.com>
---
Changes in v3:
- send patches to stable (Andrew Lunn)
- rewrite the commit logs more formally (Andrew Lunn)
- rename l2-cache0 to l2-cache (Krzysztof Kozlowski)
- Link to v2: https://lore.kernel.org/r/20240218-cache-v2-0-1fd919e2bd3e@outlook.com
Changes in v2:
- arm64: dts: hi3798cv200: add GICH, GICV register spces and
maintainance IRQ.
- Link to v1: https://lore.kernel.org/r/20240218-cache-v1-0-2c0a8a4472e7@outlook.com
---
Yang Xiwen (3):
arm64: dts: hi3798cv200: fix the size of GICR
arm64: dts: hi3798cv200: add GICH, GICV register space and irq
arm64: dts: hi3798cv200: add cache info
arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi | 43 +++++++++++++++++++++++++-
1 file changed, 42 insertions(+), 1 deletion(-)
---
base-commit: 8d3dea210042f54b952b481838c1e7dfc4ec751d
change-id: 20240218-cache-11c8bf7566c2
Best regards,
--
Yang Xiwen <forbidden405(a)outlook.com>
From: Vasant Karasulli <vkarasulli(a)suse.de>
Hi,
here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.
Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.
This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.
The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and for
jumping to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.
The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.
The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.
Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).
The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.
The previous version of this patch-set can be found here:
https://lore.kernel.org/kvm/20240311161727.14916-1-vsntk18@gmail.com/
Please review.
Thanks,
Vasant
Changes v4->v5:
- Rebased to v6.9-rc2 kernel
- Applied review comments by Tom Lendacky
- Exclude the AP jump table related code for SEV-SNP guests
Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
into a single function which makes caching jump table address
unnecessary
- annotated struct sev_ap_jump_table_header with __packed attribute
- added code to set up real mode data segment at boot time instead of
hardcoding the value.
Changes v2->v3:
- Rebased to v5.17-rc1
- Applied most review comments by Boris
- Use the name 'AP jump table' consistently
- Make kexec-disabling for unsupported guests x86-specific
- Cleanup and consolidate patches to detect GHCB v2 protocol
support
Joerg Roedel (9):
x86/kexec/64: Disable kexec when SEV-ES is active
x86/sev: Save and print negotiated GHCB protocol version
x86/sev: Set GHCB data structure version
x86/sev: Setup code to park APs in the AP Jump Table
x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
x86/sev: Use AP Jump Table blob to stop CPU
x86/sev: Add MMIO handling support to boot/compressed/ code
x86/sev: Handle CLFLUSH MMIO events
x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
Vasant Karasulli (1):
x86/sev: Exclude AP jump table related code for SEV-SNP guests
arch/x86/boot/compressed/sev.c | 45 +-
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/include/asm/realmode.h | 5 +
arch/x86/include/asm/sev-ap-jumptable.h | 30 +
arch/x86/include/asm/sev.h | 7 +
arch/x86/kernel/machine_kexec_64.c | 12 +
arch/x86/kernel/process.c | 8 +
arch/x86/kernel/sev-shared.c | 234 +++++-
arch/x86/kernel/sev.c | 376 +++++-----
arch/x86/lib/insn-eval-shared.c | 921 ++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 911 +----------------------
arch/x86/realmode/Makefile | 9 +-
arch/x86/realmode/init.c | 5 +-
arch/x86/realmode/rm/Makefile | 11 +-
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/sev.S | 85 +++
arch/x86/realmode/rmpiggy.S | 6 +
arch/x86/realmode/sev/Makefile | 33 +
arch/x86/realmode/sev/ap_jump_table.S | 131 ++++
arch/x86/realmode/sev/ap_jump_table.lds | 24 +
20 files changed, 1711 insertions(+), 1146 deletions(-)
create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
create mode 100644 arch/x86/lib/insn-eval-shared.c
create mode 100644 arch/x86/realmode/rm/sev.S
create mode 100644 arch/x86/realmode/sev/Makefile
create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
base-commit: 39cd87c4eb2b893354f3b850f916353f2658ae6f
--
2.34.1
'nr' member of struct spmi_controller, which serves as an identifier
for the controller/bus. This value is a dynamic ID assigned in
spmi_controller_alloc, and overriding it from the driver results in an
ida_free error "ida_free called for id=xx which is not allocated".
Signed-off-by: Vamshi Gajjela <vamshigajjela(a)google.com>
Fixes: 70f59c90c819 ("staging: spmi: add Hikey 970 SPMI controller driver")
Cc: stable(a)vger.kernel.org
---
drivers/spmi/hisi-spmi-controller.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/spmi/hisi-spmi-controller.c b/drivers/spmi/hisi-spmi-controller.c
index 674a350cc676..fa068b34b040 100644
--- a/drivers/spmi/hisi-spmi-controller.c
+++ b/drivers/spmi/hisi-spmi-controller.c
@@ -300,7 +300,6 @@ static int spmi_controller_probe(struct platform_device *pdev)
spin_lock_init(&spmi_controller->lock);
- ctrl->nr = spmi_controller->channel;
ctrl->dev.parent = pdev->dev.parent;
ctrl->dev.of_node = of_node_get(pdev->dev.of_node);
--
2.44.0.rc1.240.g4c46232300-goog
From: Petre Rodan <petre.rodan(a)subdimension.ro>
[ Upstream commit 4e6500bfa053dc133021f9c144261b77b0ba7dc8 ]
Replace seekdir() with rewinddir() in order to fix a localized glibc bug.
One of the glibc patches that stable Gentoo is using causes an improper
directory stream positioning bug on 32bit arm. That in turn ends up as a
floating point exception in iio_generic_buffer.
The attached patch provides a fix by using an equivalent function which
should not cause trouble for other distros and is easier to reason about
in general as it obviously always goes back to to the start.
https://sourceware.org/bugzilla/show_bug.cgi?id=31212
Signed-off-by: Petre Rodan <petre.rodan(a)subdimension.ro>
Link: https://lore.kernel.org/r/20240108103224.3986-1-petre.rodan@subdimension.ro
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/iio/iio_utils.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/iio/iio_utils.c b/tools/iio/iio_utils.c
index 6a00a6eecaef0..c5c5082cb24e5 100644
--- a/tools/iio/iio_utils.c
+++ b/tools/iio/iio_utils.c
@@ -376,7 +376,7 @@ int build_channel_array(const char *device_dir, int buffer_idx,
goto error_close_dir;
}
- seekdir(dp, 0);
+ rewinddir(dp);
while (ent = readdir(dp), ent) {
if (strcmp(ent->d_name + strlen(ent->d_name) - strlen("_en"),
"_en") == 0) {
--
2.43.0
From: Marcel Ziswiler <marcel.ziswiler(a)toradex.com>
On the i.MX 8M Mini, the AUX_PLL_REFCLK_SEL has to be left at its reset
default of AUX_IN (PLL clock).
Background Information:
In our automated testing setup, we use Delock Mini-PCIe SATA cards [1].
While this setup has proven very stable overall we noticed upstream on
the i.MX 8M Mini fails quite regularly (about 50/50) to bring up the
PCIe link while with NXP's downstream BSP 5.15.71_2.2.2 it always works.
As that old downstream stuff was quite different, I first also tried
NXP's latest downstream BSP 6.1.55_2.2.0 which from a PCIe point of view
is fairly vanilla, however, also there the PCIe link-up was not stable.
Comparing and debugging I noticed that upstream explicitly configures
the AUX_PLL_REFCLK_SEL to I_PLL_REFCLK_FROM_SYSPLL while working
downstream [2] leaving it at reset defaults of AUX_IN (PLL clock).
Unfortunately, the TRM does not mention any further details about this
register (both for the i.MX 8M Mini as well as the Plus).
NXP confirmed their validation codes for the i.MX8MM PCIe doesn't
configure cmn_reg063 (offset: 0x18C).
BTW: On the i.MX 8M Plus we have not seen any issues with PCIe with the
exact same setup which is why I left it unchanged.
[1] https://www.delock.com/produkt/95233/merkmale.html
[2] https://github.com/nxp-imx/linux-imx/blob/lf-5.15.71-2.2.0/drivers/pci/cont…
Fixes: 1aa97b002258 ("phy: freescale: pcie: Initialize the imx8 pcie standalone phy driver")
Cc: stable(a)vger.kernel.org # 6.1.x: ca679c49: phy: freescale: imx8m-pcie: Refine i.MX8MM PCIe PHY driver
Cc: stable(a)vger.kernel.org # 6.1.x
Signed-off-by: Marcel Ziswiler <marcel.ziswiler(a)toradex.com>
Reviewed-by: Richard Zhu <hongxing.zhu(a)nxp.com>
Link: https://lore.kernel.org/all/AS8PR04MB867661386FEA07649771FBE18C362@AS8PR04M…
---
Changes in v2:
- Reword the commmit message.
- Meld the background information from the cover letter into the commit
message as suggested by Fabio. Thanks!
- Document NXP's confirmation from their validation codes and add
Richard Zhu's reviewed-by. Thanks!
drivers/phy/freescale/phy-fsl-imx8m-pcie.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
index b700f52b7b67..11fcb1867118 100644
--- a/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
+++ b/drivers/phy/freescale/phy-fsl-imx8m-pcie.c
@@ -110,8 +110,10 @@ static int imx8_pcie_phy_power_on(struct phy *phy)
/* Source clock from SoC internal PLL */
writel(ANA_PLL_CLK_OUT_TO_EXT_IO_SEL,
imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG062);
- writel(AUX_PLL_REFCLK_SEL_SYS_PLL,
- imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG063);
+ if (imx8_phy->drvdata->variant != IMX8MM) {
+ writel(AUX_PLL_REFCLK_SEL_SYS_PLL,
+ imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG063);
+ }
val = ANA_AUX_RX_TX_SEL_TX | ANA_AUX_TX_TERM;
writel(val | ANA_AUX_RX_TERM_GND_EN,
imx8_phy->base + IMX8MM_PCIE_PHY_CMN_REG064);
--
2.44.0
From: Conor Dooley <conor.dooley(a)microchip.com>
On RISC-V and arm64, and presumably x86, if CFI_CLANG is enabled,
loading a rust module will trigger a kernel panic. Support for
sanitisers, including kcfi (CFI_CLANG), is in the works, but for now
they're nightly-only options in rustc. Make RUST depend on !CFI_CLANG
to prevent configuring a kernel without symmetrical support for kfi.
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
cc: stable(a)vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley(a)microchip.com>
---
Sending this one on its own, there's no explicit dep on this for the
riscv enabling patch, v3 to continue the numbering from there. Nothing
has changed since v2.
CC: Miguel Ojeda <ojeda(a)kernel.org>
CC: Alex Gaynor <alex.gaynor(a)gmail.com>
CC: Wedson Almeida Filho <wedsonaf(a)gmail.com>
CC: linux-kernel(a)vger.kernel.org (open list)
CC: rust-for-linux(a)vger.kernel.org
CC: Sami Tolvanen <samitolvanen(a)google.com>
CC: Kees Cook <keescook(a)chromium.org>
CC: Nathan Chancellor <nathan(a)kernel.org>
CC: llvm(a)lists.linux.dev
---
init/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/init/Kconfig b/init/Kconfig
index aa02aec6aa7d..ad9a2da27dc9 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1899,6 +1899,7 @@ config RUST
bool "Rust support"
depends on HAVE_RUST
depends on RUST_IS_AVAILABLE
+ depends on !CFI_CLANG
depends on !MODVERSIONS
depends on !GCC_PLUGINS
depends on !RANDSTRUCT
--
2.43.0
In Rust, producing an invalid value of any type is immediate undefined
behavior (UB); this includes via zeroing memory. Therefore, since an
uninhabited type has no valid values, producing any values at all for it is
UB.
The Rust standard library type `core::convert::Infallible` is uninhabited,
by virtue of having been declared as an enum with no cases, which always
produces uninhabited types in Rust.
The current kernel code allows this UB to be triggered, for example by code
like `Box::<core::convert::Infallible>::init(kernel::init::zeroed())`.
Thus, remove the implementation of `Zeroable` for `Infallible`, thereby
avoiding the unsoundness (potential for future UB).
Cc: stable(a)vger.kernel.org
Fixes: 38cde0bd7b67 ("rust: init: add `Zeroable` trait and `init::zeroed` function")
Closes: https://github.com/Rust-for-Linux/pinned-init/pull/13
Signed-off-by: Laine Taffin Altman <alexanderaltman(a)me.com>
Reviewed-by: Alice Ryhl <aliceryhl(a)google.com>
Reviewed-by: Boqun Feng <boqun.feng(a)gmail.com>
---
V3 -> V4: Address review nits; run checkpatch properly.
V2 -> V3: Email formatting correction.
V1 -> V2: Added more documentation to the comment, with links; also added more details to the commit message.
rust/kernel/init.rs | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/rust/kernel/init.rs b/rust/kernel/init.rs
index 424257284d16..3859c7ff81b7 100644
--- a/rust/kernel/init.rs
+++ b/rust/kernel/init.rs
@@ -1292,8 +1292,15 @@ macro_rules! impl_zeroable {
i8, i16, i32, i64, i128, isize,
f32, f64,
- // SAFETY: These are ZSTs, there is nothing to zero.
- {<T: ?Sized>} PhantomData<T>, core::marker::PhantomPinned, Infallible, (),
+ // Note: do not add uninhabited types (such as `!` or `core::convert::Infallible`) to this list;
+ // creating an instance of an uninhabited type is immediate undefined behavior. For more on
+ // uninhabited/empty types, consult The Rustonomicon:
+ // https://doc.rust-lang.org/stable/nomicon/exotic-sizes.html#empty-types The Rust Reference
+ // also has information on undefined behavior:
+ // https://doc.rust-lang.org/stable/reference/behavior-considered-undefined.ht…
+ //
+ // SAFETY: These are inhabited ZSTs; there is nothing to zero and a valid value exists.
+ {<T: ?Sized>} PhantomData<T>, core::marker::PhantomPinned, (),
// SAFETY: Type is allowed to take any value, including all zeros.
{<T>} MaybeUninit<T>,
base-commit: c85af715cac0a951eea97393378e84bb49384734
--
2.44.0
Hi Sasha,
On Sun, Apr 07, 2024 at 08:53:40AM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> usb: typec: ucsi: Check for notifications after init
>
> to the 6.8-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
This patch contains an out of bounds memory access and should not
be included in the stable backports until a fix is available.
A fix is already queued in Greg's usb-linus branch.
Please drop the above patch from all stable trees for now.
Sorry for the inconvenience.
> The filename of the patch is:
> usb-typec-ucsi-check-for-notifications-after-init.patch
> and it can be found in the queue-6.8 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 903bfed719f3e87b607956bbe4d855c71831a43a
> Author: Christian A. Ehrhardt <lk(a)c--e.de>
> Date: Wed Mar 20 08:39:23 2024 +0100
>
> usb: typec: ucsi: Check for notifications after init
>
> [ Upstream commit 808a8b9e0b87bbc72bcc1f7ddfe5d04746e7ce56 ]
>
> The completion notification for the final SET_NOTIFICATION_ENABLE
> command during initialization can include a connector change
> notification. However, at the time this completion notification is
> processed, the ucsi struct is not ready to handle this notification.
> As a result the notification is ignored and the controller
> never sends an interrupt again.
>
> Re-check CCI for a pending connector state change after
> initialization is complete. Adjust the corresponding debug
> message accordingly.
>
> Fixes: 71a1fa0df2a3 ("usb: typec: ucsi: Store the notification mask")
> Cc: stable(a)vger.kernel.org
> Signed-off-by: Christian A. Ehrhardt <lk(a)c--e.de>
> Reviewed-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
> Tested-by: Neil Armstrong <neil.armstrong(a)linaro.org> # on SM8550-QRD
> Link: https://lore.kernel.org/r/20240320073927.1641788-3-lk@c--e.de
> Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/drivers/usb/typec/ucsi/ucsi.c b/drivers/usb/typec/ucsi/ucsi.c
> index 0bfe5e906e543..96da828f556a9 100644
> --- a/drivers/usb/typec/ucsi/ucsi.c
> +++ b/drivers/usb/typec/ucsi/ucsi.c
> @@ -962,7 +962,7 @@ void ucsi_connector_change(struct ucsi *ucsi, u8 num)
> struct ucsi_connector *con = &ucsi->connector[num - 1];
>
> if (!(ucsi->ntfy & UCSI_ENABLE_NTFY_CONNECTOR_CHANGE)) {
> - dev_dbg(ucsi->dev, "Bogus connector change event\n");
> + dev_dbg(ucsi->dev, "Early connector change event\n");
> return;
> }
>
> @@ -1393,6 +1393,7 @@ static int ucsi_init(struct ucsi *ucsi)
> {
> struct ucsi_connector *con, *connector;
> u64 command, ntfy;
> + u32 cci;
> int ret;
> int i;
>
> @@ -1445,6 +1446,13 @@ static int ucsi_init(struct ucsi *ucsi)
>
> ucsi->connector = connector;
> ucsi->ntfy = ntfy;
> +
> + ret = ucsi->ops->read(ucsi, UCSI_CCI, &cci, sizeof(cci));
> + if (ret)
> + return ret;
> + if (UCSI_CCI_CONNECTOR(READ_ONCE(cci)))
> + ucsi_connector_change(ucsi, cci);
> +
> return 0;
>
> err_unregister:
>
Best regards
Christian
The current driver have some issues, this series fixes them.
PATCH 1 is fixing a wrong offset computation in the dma descriptor address
PATCH 2 is fixing the xdma_synchronize callback, which was not waiting
properly for the last transfer.
PATCH 3 is clarifing the documentation for xdma_fill_descs
---
Louis Chauvet (1):
dmaengine: xilinx: xdma: Fix synchronization issue
Miquel Raynal (2):
dmaengine: xilinx: xdma: Fix wrong offsets in the buffers addresses in dma descriptor
dmaengine: xilinx: xdma: Clarify kdoc in XDMA driver
drivers/dma/xilinx/xdma-regs.h | 3 +++
drivers/dma/xilinx/xdma.c | 42 +++++++++++++++++++++++++++---------------
2 files changed, 30 insertions(+), 15 deletions(-)
---
base-commit: 8e938e39866920ddc266898e6ae1fffc5c8f51aa
change-id: 20240322-digigram-xdma-fixes-aa13451b7474
Best regards,
--
Louis Chauvet <louis.chauvet(a)bootlin.com>
There is a bug when setting the RSS options in virtio_net that can break
the whole machine, getting the kernel into an infinite loop.
Running the following command in any QEMU virtual machine with virtionet
will reproduce this problem:
# ethtool -X eth0 hfunc toeplitz
This is how the problem happens:
1) ethtool_set_rxfh() calls virtnet_set_rxfh()
2) virtnet_set_rxfh() calls virtnet_commit_rss_command()
3) virtnet_commit_rss_command() populates 4 entries for the rss
scatter-gather
4) Since the command above does not have a key, then the last
scatter-gatter entry will be zeroed, since rss_key_size == 0.
sg_buf_size = vi->rss_key_size;
5) This buffer is passed to qemu, but qemu is not happy with a buffer
with zero length, and do the following in virtqueue_map_desc() (QEMU
function):
if (!sz) {
virtio_error(vdev, "virtio: zero sized buffers are not allowed");
6) virtio_error() (also QEMU function) set the device as broken
vdev->broken = true;
7) Qemu bails out, and do not repond this crazy kernel.
8) The kernel is waiting for the response to come back (function
virtnet_send_command())
9) The kernel is waiting doing the following :
while (!virtqueue_get_buf(vi->cvq, &tmp) &&
!virtqueue_is_broken(vi->cvq))
cpu_relax();
10) None of the following functions above is true, thus, the kernel
loops here forever. Keeping in mind that virtqueue_is_broken() does
not look at the qemu `vdev->broken`, so, it never realizes that the
vitio is broken at QEMU side.
Fix it by not sending RSS commands if the feature is not available in
the device.
Fixes: c7114b1249fa ("drivers/net/virtio_net: Added basic RSS support.")
Cc: stable(a)vger.kernel.org
Cc: qemu-devel(a)nongnu.org
Signed-off-by: Breno Leitao <leitao(a)debian.org>
Reviewed-by: Heng Qi <hengqi(a)linux.alibaba.com>
---
Changelog:
V2:
* Moved from creating a valid packet, by rejecting the request
completely.
V3:
* Got some good feedback from and Xuan Zhuo and Heng Qi, and reworked
the rejection path.
V4:
* Added a comment in an "if" clause, as suggested by Michael S. Tsirkin.
---
drivers/net/virtio_net.c | 26 ++++++++++++++++++++++----
1 file changed, 22 insertions(+), 4 deletions(-)
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c22d1118a133..115c3c5414f2 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -3807,6 +3807,7 @@ static int virtnet_set_rxfh(struct net_device *dev,
struct netlink_ext_ack *extack)
{
struct virtnet_info *vi = netdev_priv(dev);
+ bool update = false;
int i;
if (rxfh->hfunc != ETH_RSS_HASH_NO_CHANGE &&
@@ -3814,13 +3815,28 @@ static int virtnet_set_rxfh(struct net_device *dev,
return -EOPNOTSUPP;
if (rxfh->indir) {
+ if (!vi->has_rss)
+ return -EOPNOTSUPP;
+
for (i = 0; i < vi->rss_indir_table_size; ++i)
vi->ctrl->rss.indirection_table[i] = rxfh->indir[i];
+ update = true;
}
- if (rxfh->key)
+
+ if (rxfh->key) {
+ /* If either _F_HASH_REPORT or _F_RSS are negotiated, the
+ * device provides hash calculation capabilities, that is,
+ * hash_key is configured.
+ */
+ if (!vi->has_rss && !vi->has_rss_hash_report)
+ return -EOPNOTSUPP;
+
memcpy(vi->ctrl->rss.key, rxfh->key, vi->rss_key_size);
+ update = true;
+ }
- virtnet_commit_rss_command(vi);
+ if (update)
+ virtnet_commit_rss_command(vi);
return 0;
}
@@ -4729,13 +4745,15 @@ static int virtnet_probe(struct virtio_device *vdev)
if (virtio_has_feature(vdev, VIRTIO_NET_F_HASH_REPORT))
vi->has_rss_hash_report = true;
- if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS))
+ if (virtio_has_feature(vdev, VIRTIO_NET_F_RSS)) {
vi->has_rss = true;
- if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_indir_table_size =
virtio_cread16(vdev, offsetof(struct virtio_net_config,
rss_max_indirection_table_length));
+ }
+
+ if (vi->has_rss || vi->has_rss_hash_report) {
vi->rss_key_size =
virtio_cread8(vdev, offsetof(struct virtio_net_config, rss_max_key_size));
--
2.43.0
From: Justin Tee <justin.tee(a)broadcom.com>
[ Upstream commit bb011631435c705cdeddca68d5c85fd40a4320f9 ]
Typically when an out of resource CQE status is detected, the
lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by
reducing an sdev's queue_depth.
However, the current lpfc_rampdown_queue_depth() logic does not help reduce
queue_depth. num_cmd_success is never updated and is always zero, which
means new_queue_depth will always be set to sdev->queue_depth. So,
new_queue_depth = sdev->queue_depth - new_queue_depth always sets
new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is
essentially a no-op.
Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth
equal to sdev->queue_depth subtracted from number of times num_rsrc_err was
incremented. If num_rsrc_err is >= sdev->queue_depth, then set
new_queue_depth equal to 1. Eventually, the frequency of Good_Status
frames will signal SCSI upper layer to auto increase the queue_depth back
to the driver default of 64 via scsi_handle_queue_ramp_up().
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc.h | 1 -
drivers/scsi/lpfc/lpfc_scsi.c | 13 ++++---------
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 53b661793268f..5698928d8029c 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -989,7 +989,6 @@ struct lpfc_hba {
unsigned long bit_flags;
#define FABRIC_COMANDS_BLOCKED 0
atomic_t num_rsrc_err;
- atomic_t num_cmd_success;
unsigned long last_rsrc_error_time;
unsigned long last_ramp_down_time;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 425b83618a2e5..02d067e1fc45c 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -303,11 +303,10 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
struct Scsi_Host *shost;
struct scsi_device *sdev;
unsigned long new_queue_depth;
- unsigned long num_rsrc_err, num_cmd_success;
+ unsigned long num_rsrc_err;
int i;
num_rsrc_err = atomic_read(&phba->num_rsrc_err);
- num_cmd_success = atomic_read(&phba->num_cmd_success);
/*
* The error and success command counters are global per
@@ -322,20 +321,16 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
for (i = 0; i <= phba->max_vports && vports[i] != NULL; i++) {
shost = lpfc_shost_from_vport(vports[i]);
shost_for_each_device(sdev, shost) {
- new_queue_depth =
- sdev->queue_depth * num_rsrc_err /
- (num_rsrc_err + num_cmd_success);
- if (!new_queue_depth)
- new_queue_depth = sdev->queue_depth - 1;
+ if (num_rsrc_err >= sdev->queue_depth)
+ new_queue_depth = 1;
else
new_queue_depth = sdev->queue_depth -
- new_queue_depth;
+ num_rsrc_err;
scsi_change_queue_depth(sdev, new_queue_depth);
}
}
lpfc_destroy_vport_work_array(phba, vports);
atomic_set(&phba->num_rsrc_err, 0);
- atomic_set(&phba->num_cmd_success, 0);
}
/**
--
2.43.0
From: Justin Tee <justin.tee(a)broadcom.com>
[ Upstream commit bb011631435c705cdeddca68d5c85fd40a4320f9 ]
Typically when an out of resource CQE status is detected, the
lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by
reducing an sdev's queue_depth.
However, the current lpfc_rampdown_queue_depth() logic does not help reduce
queue_depth. num_cmd_success is never updated and is always zero, which
means new_queue_depth will always be set to sdev->queue_depth. So,
new_queue_depth = sdev->queue_depth - new_queue_depth always sets
new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is
essentially a no-op.
Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth
equal to sdev->queue_depth subtracted from number of times num_rsrc_err was
incremented. If num_rsrc_err is >= sdev->queue_depth, then set
new_queue_depth equal to 1. Eventually, the frequency of Good_Status
frames will signal SCSI upper layer to auto increase the queue_depth back
to the driver default of 64 via scsi_handle_queue_ramp_up().
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc.h | 1 -
drivers/scsi/lpfc/lpfc_scsi.c | 13 ++++---------
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index 7ce0d94cdc018..98ab07c3774ed 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1039,7 +1039,6 @@ struct lpfc_hba {
unsigned long bit_flags;
#define FABRIC_COMANDS_BLOCKED 0
atomic_t num_rsrc_err;
- atomic_t num_cmd_success;
unsigned long last_rsrc_error_time;
unsigned long last_ramp_down_time;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index 816235ccd2992..f238e0f41f07c 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -246,11 +246,10 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
struct Scsi_Host *shost;
struct scsi_device *sdev;
unsigned long new_queue_depth;
- unsigned long num_rsrc_err, num_cmd_success;
+ unsigned long num_rsrc_err;
int i;
num_rsrc_err = atomic_read(&phba->num_rsrc_err);
- num_cmd_success = atomic_read(&phba->num_cmd_success);
/*
* The error and success command counters are global per
@@ -265,20 +264,16 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
for (i = 0; i <= phba->max_vports && vports[i] != NULL; i++) {
shost = lpfc_shost_from_vport(vports[i]);
shost_for_each_device(sdev, shost) {
- new_queue_depth =
- sdev->queue_depth * num_rsrc_err /
- (num_rsrc_err + num_cmd_success);
- if (!new_queue_depth)
- new_queue_depth = sdev->queue_depth - 1;
+ if (num_rsrc_err >= sdev->queue_depth)
+ new_queue_depth = 1;
else
new_queue_depth = sdev->queue_depth -
- new_queue_depth;
+ num_rsrc_err;
scsi_change_queue_depth(sdev, new_queue_depth);
}
}
lpfc_destroy_vport_work_array(phba, vports);
atomic_set(&phba->num_rsrc_err, 0);
- atomic_set(&phba->num_cmd_success, 0);
}
/**
--
2.43.0
From: Justin Tee <justin.tee(a)broadcom.com>
[ Upstream commit bb011631435c705cdeddca68d5c85fd40a4320f9 ]
Typically when an out of resource CQE status is detected, the
lpfc_ramp_down_queue_handler() logic is called to help reduce I/O load by
reducing an sdev's queue_depth.
However, the current lpfc_rampdown_queue_depth() logic does not help reduce
queue_depth. num_cmd_success is never updated and is always zero, which
means new_queue_depth will always be set to sdev->queue_depth. So,
new_queue_depth = sdev->queue_depth - new_queue_depth always sets
new_queue_depth to zero. And, scsi_change_queue_depth(sdev, 0) is
essentially a no-op.
Change the lpfc_ramp_down_queue_handler() logic to set new_queue_depth
equal to sdev->queue_depth subtracted from number of times num_rsrc_err was
incremented. If num_rsrc_err is >= sdev->queue_depth, then set
new_queue_depth equal to 1. Eventually, the frequency of Good_Status
frames will signal SCSI upper layer to auto increase the queue_depth back
to the driver default of 64 via scsi_handle_queue_ramp_up().
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc.h | 1 -
drivers/scsi/lpfc/lpfc_scsi.c | 13 ++++---------
2 files changed, 4 insertions(+), 10 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc.h b/drivers/scsi/lpfc/lpfc.h
index cf69f831a7253..8f1b5b0ee8cd8 100644
--- a/drivers/scsi/lpfc/lpfc.h
+++ b/drivers/scsi/lpfc/lpfc.h
@@ -1065,7 +1065,6 @@ struct lpfc_hba {
unsigned long bit_flags;
#define FABRIC_COMANDS_BLOCKED 0
atomic_t num_rsrc_err;
- atomic_t num_cmd_success;
unsigned long last_rsrc_error_time;
unsigned long last_ramp_down_time;
#ifdef CONFIG_SCSI_LPFC_DEBUG_FS
diff --git a/drivers/scsi/lpfc/lpfc_scsi.c b/drivers/scsi/lpfc/lpfc_scsi.c
index b4b87e5d8b291..2121534838747 100644
--- a/drivers/scsi/lpfc/lpfc_scsi.c
+++ b/drivers/scsi/lpfc/lpfc_scsi.c
@@ -246,11 +246,10 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
struct Scsi_Host *shost;
struct scsi_device *sdev;
unsigned long new_queue_depth;
- unsigned long num_rsrc_err, num_cmd_success;
+ unsigned long num_rsrc_err;
int i;
num_rsrc_err = atomic_read(&phba->num_rsrc_err);
- num_cmd_success = atomic_read(&phba->num_cmd_success);
/*
* The error and success command counters are global per
@@ -265,20 +264,16 @@ lpfc_ramp_down_queue_handler(struct lpfc_hba *phba)
for (i = 0; i <= phba->max_vports && vports[i] != NULL; i++) {
shost = lpfc_shost_from_vport(vports[i]);
shost_for_each_device(sdev, shost) {
- new_queue_depth =
- sdev->queue_depth * num_rsrc_err /
- (num_rsrc_err + num_cmd_success);
- if (!new_queue_depth)
- new_queue_depth = sdev->queue_depth - 1;
+ if (num_rsrc_err >= sdev->queue_depth)
+ new_queue_depth = 1;
else
new_queue_depth = sdev->queue_depth -
- new_queue_depth;
+ num_rsrc_err;
scsi_change_queue_depth(sdev, new_queue_depth);
}
}
lpfc_destroy_vport_work_array(phba, vports);
atomic_set(&phba->num_rsrc_err, 0);
- atomic_set(&phba->num_cmd_success, 0);
}
/**
--
2.43.0
From: Justin Tee <justin.tee(a)broadcom.com>
[ Upstream commit 4ddf01f2f1504fa08b766e8cfeec558e9f8eef6c ]
There are cases after NPIV deletion where the fabric switch still believes
the NPIV is logged into the fabric. This occurs when a vport is
unregistered before the Remove All DA_ID CT and LOGO ELS are sent to the
fabric.
Currently fc_remove_host(), which calls dev_loss_tmo for all D_IDs including
the fabric D_ID, removes the last ndlp reference and frees the ndlp rport
object. This sometimes causes the race condition where the final DA_ID and
LOGO are skipped from being sent to the fabric switch.
Fix by moving the fc_remove_host() and scsi_remove_host() calls after DA_ID
and LOGO are sent.
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_vport.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c
index da9a1f72d9383..b1071226e27fb 100644
--- a/drivers/scsi/lpfc/lpfc_vport.c
+++ b/drivers/scsi/lpfc/lpfc_vport.c
@@ -651,10 +651,6 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
lpfc_free_sysfs_attr(vport);
lpfc_debugfs_terminate(vport);
- /* Remove FC host to break driver binding. */
- fc_remove_host(shost);
- scsi_remove_host(shost);
-
/* Send the DA_ID and Fabric LOGO to cleanup Nameserver entries. */
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp)
@@ -700,6 +696,10 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
skip_logo:
+ /* Remove FC host to break driver binding. */
+ fc_remove_host(shost);
+ scsi_remove_host(shost);
+
lpfc_cleanup(vport);
/* Remove scsi host now. The nodes are cleaned up. */
--
2.43.0
From: Justin Tee <justin.tee(a)broadcom.com>
[ Upstream commit 4ddf01f2f1504fa08b766e8cfeec558e9f8eef6c ]
There are cases after NPIV deletion where the fabric switch still believes
the NPIV is logged into the fabric. This occurs when a vport is
unregistered before the Remove All DA_ID CT and LOGO ELS are sent to the
fabric.
Currently fc_remove_host(), which calls dev_loss_tmo for all D_IDs including
the fabric D_ID, removes the last ndlp reference and frees the ndlp rport
object. This sometimes causes the race condition where the final DA_ID and
LOGO are skipped from being sent to the fabric switch.
Fix by moving the fc_remove_host() and scsi_remove_host() calls after DA_ID
and LOGO are sent.
Signed-off-by: Justin Tee <justin.tee(a)broadcom.com>
Link: https://lore.kernel.org/r/20240305200503.57317-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/lpfc/lpfc_vport.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/lpfc/lpfc_vport.c b/drivers/scsi/lpfc/lpfc_vport.c
index 4d171f5c213f7..6b4259894584f 100644
--- a/drivers/scsi/lpfc/lpfc_vport.c
+++ b/drivers/scsi/lpfc/lpfc_vport.c
@@ -693,10 +693,6 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
lpfc_free_sysfs_attr(vport);
lpfc_debugfs_terminate(vport);
- /* Remove FC host to break driver binding. */
- fc_remove_host(shost);
- scsi_remove_host(shost);
-
/* Send the DA_ID and Fabric LOGO to cleanup Nameserver entries. */
ndlp = lpfc_findnode_did(vport, Fabric_DID);
if (!ndlp)
@@ -740,6 +736,10 @@ lpfc_vport_delete(struct fc_vport *fc_vport)
skip_logo:
+ /* Remove FC host to break driver binding. */
+ fc_remove_host(shost);
+ scsi_remove_host(shost);
+
lpfc_cleanup(vport);
/* Remove scsi host now. The nodes are cleaned up. */
--
2.43.0
From: Rohit Ner <rohitner(a)google.com>
[ Upstream commit 767712f91de76abd22a45184e6e3440120b8bfce ]
As per JEDEC Standard No. 223E Section 5.9.2, the max # active commands
value programmed by the host sw in MCQConfig.MAC should be one less than
the actual value.
Signed-off-by: Rohit Ner <rohitner(a)google.com>
Link: https://lore.kernel.org/r/20240220095637.2900067-1-rohitner@google.com
Reviewed-by: Peter Wang <peter.wang(a)mediatek.com>
Reviewed-by: Can Guo <quic_cang(a)quicinc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/ufs/core/ufs-mcq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 0787456c2b892..c873fd8239427 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -94,7 +94,7 @@ void ufshcd_mcq_config_mac(struct ufs_hba *hba, u32 max_active_cmds)
val = ufshcd_readl(hba, REG_UFS_MCQ_CFG);
val &= ~MCQ_CFG_MAC_MASK;
- val |= FIELD_PREP(MCQ_CFG_MAC_MASK, max_active_cmds);
+ val |= FIELD_PREP(MCQ_CFG_MAC_MASK, max_active_cmds - 1);
ufshcd_writel(hba, val, REG_UFS_MCQ_CFG);
}
EXPORT_SYMBOL_GPL(ufshcd_mcq_config_mac);
--
2.43.0
From: Rohit Ner <rohitner(a)google.com>
[ Upstream commit 767712f91de76abd22a45184e6e3440120b8bfce ]
As per JEDEC Standard No. 223E Section 5.9.2, the max # active commands
value programmed by the host sw in MCQConfig.MAC should be one less than
the actual value.
Signed-off-by: Rohit Ner <rohitner(a)google.com>
Link: https://lore.kernel.org/r/20240220095637.2900067-1-rohitner@google.com
Reviewed-by: Peter Wang <peter.wang(a)mediatek.com>
Reviewed-by: Can Guo <quic_cang(a)quicinc.com>
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/ufs/core/ufs-mcq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufs-mcq.c b/drivers/ufs/core/ufs-mcq.c
index 0787456c2b892..c873fd8239427 100644
--- a/drivers/ufs/core/ufs-mcq.c
+++ b/drivers/ufs/core/ufs-mcq.c
@@ -94,7 +94,7 @@ void ufshcd_mcq_config_mac(struct ufs_hba *hba, u32 max_active_cmds)
val = ufshcd_readl(hba, REG_UFS_MCQ_CFG);
val &= ~MCQ_CFG_MAC_MASK;
- val |= FIELD_PREP(MCQ_CFG_MAC_MASK, max_active_cmds);
+ val |= FIELD_PREP(MCQ_CFG_MAC_MASK, max_active_cmds - 1);
ufshcd_writel(hba, val, REG_UFS_MCQ_CFG);
}
EXPORT_SYMBOL_GPL(ufshcd_mcq_config_mac);
--
2.43.0
I'm announcing the release of the 6.8.4 kernel.
All users of the 6.8 kernel series must upgrade.
The updated 6.8.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.8.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
include/linux/workqueue.h | 35 --
kernel/workqueue.c | 757 +++++++---------------------------------------
3 files changed, 131 insertions(+), 663 deletions(-)
Greg Kroah-Hartman (12):
Revert "workqueue: Shorten events_freezable_power_efficient name"
Revert "workqueue: Don't call cpumask_test_cpu() with -1 CPU in wq_update_node_max_active()"
Revert "workqueue: Implement system-wide nr_active enforcement for unbound workqueues"
Revert "workqueue: Introduce struct wq_node_nr_active"
Revert "workqueue: RCU protect wq->dfl_pwq and implement accessors for it"
Revert "workqueue: Make wq_adjust_max_active() round-robin pwqs while activating"
Revert "workqueue: Move nr_active handling into helpers"
Revert "workqueue: Replace pwq_activate_inactive_work() with [__]pwq_activate_work()"
Revert "workqueue: Factor out pwq_is_empty()"
Revert "workqueue: Move pwq->max_active to wq->max_active"
Revert "workqueue.c: Increase workqueue name length"
Linux 6.8.4
[cc stable to see if they have any ideas about fixing this]
On Sat, 2024-04-06 at 12:16 -0400, John David Anglin wrote:
> On 2024-04-06 11:06 a.m., James Bottomley wrote:
> > On Sat, 2024-04-06 at 10:30 -0400, John David Anglin wrote:
> > > On 2024-04-05 3:36 p.m., Bart Van Assche wrote:
> > > > On 4/4/24 13:07, John David Anglin wrote:
> > > > > On 2024-04-04 12:32 p.m., Bart Van Assche wrote:
> > > > > > Can you please help with verifying whether this kernel warn
> > > > > > ing is only triggered by the 6.1 stable kernel series or
> > > > > > whether it is also
> > > > > > triggered by a vanilla kernel, e.g. kernel v6.8? That will
> > > > > > tell us whether we
> > > > > > need to review the upstream changes or the backp
> > > > > > orts on the v6.1 branch.
> > > > > Stable kernel v6.8.3 is okay.
> > > > Would it be possible to bisect this issue on the linux-6.1.y
> > > > branch? That probably will be faster than reviewing all
> > > > backports
> > > > of SCSI patches on that branch.
> > > The warning triggers with v6.1.81. It doesn't trigger with
> > > v6.1.80.
> > It's this patch:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
> >
> > The specific problem being that the update to scsi_execute doesn't
> > set the sense_len that the WARN_ON is checking.
> >
> > This isn't a problem in mainline because we've converted all uses
> > of scsi_execute. Stable needs to either complete the conversion or
> > back out the inital patch. This change depends on the above change:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
>
> Thus, more than just the initial patch needs to be backed out.
OK, so the reason the bad patch got pulled in is because it's a
precursor of this fixes tagged backport:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=…
Which is presumably the other patch you had to back out to fix the
issue.
The problem is that Mike's series updating and then removing
scsi_execute() went into the tree as one series, so no-one notice the
first patch had this bug because the buggy routine got removed at the
end of the series. This also means there's nothing to fix and backport
in upstream.
The bug is also more widely spread than simply domain validation,
because every use of scsi_execute in the current stable tree will trip
this.
I'm not sure what the best fix is. I can certainly come up with a one
line fix for stable adding the missing length in the #define, but it
can't come from upstream as stated above. We could back the two
patches out then do a stable specific fix for the UAS problem (I don't
think we can leave the UAS patch backed out because the problem was
pretty serious).
What does stable want to do?
James
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
The modes[] array contains pointers to modes on the connectors'
mode lists, which are protected by dev->mode_config.mutex.
Thus we need to extend modes[] the same protection or by the
time we use it the elements may already be pointing to
freed/reused memory.
Cc: stable(a)vger.kernel.org
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/10583
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/drm_client_modeset.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_client_modeset.c b/drivers/gpu/drm/drm_client_modeset.c
index 871e4e2129d6..0683a129b362 100644
--- a/drivers/gpu/drm/drm_client_modeset.c
+++ b/drivers/gpu/drm/drm_client_modeset.c
@@ -777,6 +777,7 @@ int drm_client_modeset_probe(struct drm_client_dev *client, unsigned int width,
unsigned int total_modes_count = 0;
struct drm_client_offset *offsets;
unsigned int connector_count = 0;
+ /* points to modes protected by mode_config.mutex */
struct drm_display_mode **modes;
struct drm_crtc **crtcs;
int i, ret = 0;
@@ -845,7 +846,6 @@ int drm_client_modeset_probe(struct drm_client_dev *client, unsigned int width,
drm_client_pick_crtcs(client, connectors, connector_count,
crtcs, modes, 0, width, height);
}
- mutex_unlock(&dev->mode_config.mutex);
drm_client_modeset_release(client);
@@ -875,6 +875,7 @@ int drm_client_modeset_probe(struct drm_client_dev *client, unsigned int width,
modeset->y = offset->y;
}
}
+ mutex_unlock(&dev->mode_config.mutex);
mutex_unlock(&client->modeset_mutex);
out:
--
2.43.2
On 04.04.2024 19:02, Ritesh Harjani (IBM) wrote:
> It will be good to know what was the test which identified this though?
>
> -ritesh
Unfortunately syzkaller was not able to generate a reproducer for this
issue.
- Ukhin Mikhail.
There is code in the SCSI core that sets the SCMD_FAIL_IF_RECOVERING
flag but there is no code that clears this flag. Instead of only clearing
SCMD_INITIALIZED in scsi_end_request(), clear all flags. It is never
necessary to preserve any command flags inside scsi_end_request().
Cc: stable(a)vger.kernel.org
Fixes: 310bcaef6d7e ("scsi: core: Support failing requests while recovering")
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
drivers/scsi/scsi_lib.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index ca48ba9a229a..2fc2b97777ca 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -633,10 +633,9 @@ static bool scsi_end_request(struct request *req, blk_status_t error,
if (blk_queue_add_random(q))
add_disk_randomness(req->q->disk);
- if (!blk_rq_is_passthrough(req)) {
- WARN_ON_ONCE(!(cmd->flags & SCMD_INITIALIZED));
- cmd->flags &= ~SCMD_INITIALIZED;
- }
+ WARN_ON_ONCE(!blk_rq_is_passthrough(req) &&
+ !(cmd->flags & SCMD_INITIALIZED));
+ cmd->flags = 0;
/*
* Calling rcu_barrier() is not necessary here because the
Fuzzing reports a possible deadlock in jbd2_log_wait_commit.
The problem occurs in ext4_ind_migrate due to an incorrect order of
unlocking of the journal and write semaphores - the order of unlocking
must be the reverse of the order of locking.
Found by Linux Verification Center (linuxtesting.org) with syzkaller.
Signed-off-by: Artem Sadovnikov <ancowi69(a)gmail.com>
---
fs/ext4/migrate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/migrate.c b/fs/ext4/migrate.c
index d98ac2af8199..a5e1492bbaaa 100644
--- a/fs/ext4/migrate.c
+++ b/fs/ext4/migrate.c
@@ -663,8 +663,8 @@ int ext4_ind_migrate(struct inode *inode)
if (unlikely(ret2 && !ret))
ret = ret2;
errout:
- ext4_journal_stop(handle);
up_write(&EXT4_I(inode)->i_data_sem);
+ ext4_journal_stop(handle);
out_unlock:
ext4_writepages_up_write(inode->i_sb, alloc_ctx);
return ret;
--
2.25.1
The quilt patch titled
Subject: x86/mm/pat: fix VM_PAT handling in COW mappings
has been removed from the -mm tree. Its filename was
x86-mm-pat-fix-vm_pat-handling-in-cow-mappings.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: x86/mm/pat: fix VM_PAT handling in COW mappings
Date: Wed, 3 Apr 2024 23:21:30 +0200
PAT handling won't do the right thing in COW mappings: the first PTE (or,
in fact, all PTEs) can be replaced during write faults to point at anon
folios. Reliably recovering the correct PFN and cachemode using
follow_phys() from PTEs will not work in COW mappings.
Using follow_phys(), we might just get the address+protection of the anon
folio (which is very wrong), or fail on swap/nonswap entries, failing
follow_phys() and triggering a WARN_ON_ONCE() in untrack_pfn() and
track_pfn_copy(), not properly calling free_pfn_range().
In free_pfn_range(), we either wouldn't call memtype_free() or would call
it with the wrong range, possibly leaking memory.
To fix that, let's update follow_phys() to refuse returning anon folios,
and fallback to using the stored PFN inside vma->vm_pgoff for COW mappings
if we run into that.
We will now properly handle untrack_pfn() with COW mappings, where we
don't need the cachemode. We'll have to fail fork()->track_pfn_copy() if
the first page was replaced by an anon folio, though: we'd have to store
the cachemode in the VMA to make this work, likely growing the VMA size.
For now, lets keep it simple and let track_pfn_copy() just fail in that
case: it would have failed in the past with swap/nonswap entries already,
and it would have done the wrong thing with anon folios.
Simple reproducer to trigger the WARN_ON_ONCE() in untrack_pfn():
<--- C reproducer --->
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>
#include <liburing.h>
int main(void)
{
struct io_uring_params p = {};
int ring_fd;
size_t size;
char *map;
ring_fd = io_uring_setup(1, &p);
if (ring_fd < 0) {
perror("io_uring_setup");
return 1;
}
size = p.sq_off.array + p.sq_entries * sizeof(unsigned);
/* Map the submission queue ring MAP_PRIVATE */
map = mmap(0, size, PROT_READ | PROT_WRITE, MAP_PRIVATE,
ring_fd, IORING_OFF_SQ_RING);
if (map == MAP_FAILED) {
perror("mmap");
return 1;
}
/* We have at least one page. Let's COW it. */
*map = 0;
pause();
return 0;
}
<--- C reproducer --->
On a system with 16 GiB RAM and swap configured:
# ./iouring &
# memhog 16G
# killall iouring
[ 301.552930] ------------[ cut here ]------------
[ 301.553285] WARNING: CPU: 7 PID: 1402 at arch/x86/mm/pat/memtype.c:1060 untrack_pfn+0xf4/0x100
[ 301.553989] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_g
[ 301.558232] CPU: 7 PID: 1402 Comm: iouring Not tainted 6.7.5-100.fc38.x86_64 #1
[ 301.558772] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebu4
[ 301.559569] RIP: 0010:untrack_pfn+0xf4/0x100
[ 301.559893] Code: 75 c4 eb cf 48 8b 43 10 8b a8 e8 00 00 00 3b 6b 28 74 b8 48 8b 7b 30 e8 ea 1a f7 000
[ 301.561189] RSP: 0018:ffffba2c0377fab8 EFLAGS: 00010282
[ 301.561590] RAX: 00000000ffffffea RBX: ffff9208c8ce9cc0 RCX: 000000010455e047
[ 301.562105] RDX: 07fffffff0eb1e0a RSI: 0000000000000000 RDI: ffff9208c391d200
[ 301.562628] RBP: 0000000000000000 R08: ffffba2c0377fab8 R09: 0000000000000000
[ 301.563145] R10: ffff9208d2292d50 R11: 0000000000000002 R12: 00007fea890e0000
[ 301.563669] R13: 0000000000000000 R14: ffffba2c0377fc08 R15: 0000000000000000
[ 301.564186] FS: 0000000000000000(0000) GS:ffff920c2fbc0000(0000) knlGS:0000000000000000
[ 301.564773] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 301.565197] CR2: 00007fea88ee8a20 CR3: 00000001033a8000 CR4: 0000000000750ef0
[ 301.565725] PKRU: 55555554
[ 301.565944] Call Trace:
[ 301.566148] <TASK>
[ 301.566325] ? untrack_pfn+0xf4/0x100
[ 301.566618] ? __warn+0x81/0x130
[ 301.566876] ? untrack_pfn+0xf4/0x100
[ 301.567163] ? report_bug+0x171/0x1a0
[ 301.567466] ? handle_bug+0x3c/0x80
[ 301.567743] ? exc_invalid_op+0x17/0x70
[ 301.568038] ? asm_exc_invalid_op+0x1a/0x20
[ 301.568363] ? untrack_pfn+0xf4/0x100
[ 301.568660] ? untrack_pfn+0x65/0x100
[ 301.568947] unmap_single_vma+0xa6/0xe0
[ 301.569247] unmap_vmas+0xb5/0x190
[ 301.569532] exit_mmap+0xec/0x340
[ 301.569801] __mmput+0x3e/0x130
[ 301.570051] do_exit+0x305/0xaf0
...
Link: https://lkml.kernel.org/r/20240403212131.929421-3-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Wupeng Ma <mawupeng1(a)huawei.com>
Closes: https://lkml.kernel.org/r/20240227122814.3781907-1-mawupeng1@huawei.com
Fixes: b1a86e15dc03 ("x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines")
Fixes: 5899329b1910 ("x86: PAT: implement track/untrack of pfnmap regions for x86 - v3")
Acked-by: Ingo Molnar <mingo(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/pat/memtype.c | 49 +++++++++++++++++++++++++-----------
mm/memory.c | 4 ++
2 files changed, 39 insertions(+), 14 deletions(-)
--- a/arch/x86/mm/pat/memtype.c~x86-mm-pat-fix-vm_pat-handling-in-cow-mappings
+++ a/arch/x86/mm/pat/memtype.c
@@ -947,6 +947,38 @@ static void free_pfn_range(u64 paddr, un
memtype_free(paddr, paddr + size);
}
+static int get_pat_info(struct vm_area_struct *vma, resource_size_t *paddr,
+ pgprot_t *pgprot)
+{
+ unsigned long prot;
+
+ VM_WARN_ON_ONCE(!(vma->vm_flags & VM_PAT));
+
+ /*
+ * We need the starting PFN and cachemode used for track_pfn_remap()
+ * that covered the whole VMA. For most mappings, we can obtain that
+ * information from the page tables. For COW mappings, we might now
+ * suddenly have anon folios mapped and follow_phys() will fail.
+ *
+ * Fallback to using vma->vm_pgoff, see remap_pfn_range_notrack(), to
+ * detect the PFN. If we need the cachemode as well, we're out of luck
+ * for now and have to fail fork().
+ */
+ if (!follow_phys(vma, vma->vm_start, 0, &prot, paddr)) {
+ if (pgprot)
+ *pgprot = __pgprot(prot);
+ return 0;
+ }
+ if (is_cow_mapping(vma->vm_flags)) {
+ if (pgprot)
+ return -EINVAL;
+ *paddr = (resource_size_t)vma->vm_pgoff << PAGE_SHIFT;
+ return 0;
+ }
+ WARN_ON_ONCE(1);
+ return -EINVAL;
+}
+
/*
* track_pfn_copy is called when vma that is covering the pfnmap gets
* copied through copy_page_range().
@@ -957,20 +989,13 @@ static void free_pfn_range(u64 paddr, un
int track_pfn_copy(struct vm_area_struct *vma)
{
resource_size_t paddr;
- unsigned long prot;
unsigned long vma_size = vma->vm_end - vma->vm_start;
pgprot_t pgprot;
if (vma->vm_flags & VM_PAT) {
- /*
- * reserve the whole chunk covered by vma. We need the
- * starting address and protection from pte.
- */
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, &pgprot))
return -EINVAL;
- }
- pgprot = __pgprot(prot);
+ /* reserve the whole chunk covered by vma. */
return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
}
@@ -1045,7 +1070,6 @@ void untrack_pfn(struct vm_area_struct *
unsigned long size, bool mm_wr_locked)
{
resource_size_t paddr;
- unsigned long prot;
if (vma && !(vma->vm_flags & VM_PAT))
return;
@@ -1053,11 +1077,8 @@ void untrack_pfn(struct vm_area_struct *
/* free the chunk starting from pfn or the whole chunk */
paddr = (resource_size_t)pfn << PAGE_SHIFT;
if (!paddr && !size) {
- if (follow_phys(vma, vma->vm_start, 0, &prot, &paddr)) {
- WARN_ON_ONCE(1);
+ if (get_pat_info(vma, &paddr, NULL))
return;
- }
-
size = vma->vm_end - vma->vm_start;
}
free_pfn_range(paddr, size);
--- a/mm/memory.c~x86-mm-pat-fix-vm_pat-handling-in-cow-mappings
+++ a/mm/memory.c
@@ -5973,6 +5973,10 @@ int follow_phys(struct vm_area_struct *v
goto out;
pte = ptep_get(ptep);
+ /* Never return PFNs of anon folios in COW mappings. */
+ if (vm_normal_folio(vma, address, pte))
+ goto unlock;
+
if ((flags & FOLL_WRITE) && !pte_write(pte))
goto unlock;
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly.patch
mm-madvise-dont-perform-madvise-vma-walk-for-madv_populate_readwrite.patch
mm-userfaultfd-dont-place-zeropages-when-zeropages-are-disallowed.patch
s390-mm-re-enable-the-shared-zeropage-for-pv-and-skeys-kvm-guests.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared-fix.patch
selftests-memfd_secret-add-vmsplice-test.patch
mm-merge-folio_is_secretmem-and-folio_fast_pin_allowed-into-gup_fast_folio_allowed.patch
mm-optimize-config_per_vma_lock-member-placement-in-vm_area_struct.patch
mm-remove-prot-parameter-from-move_pte.patch
mm-gup-consistently-name-gup-fast-functions.patch
mm-treewide-rename-config_have_fast_gup-to-config_have_gup_fast.patch
mm-use-gup-fast-instead-fast-gup-in-remaining-comments.patch
The quilt patch titled
Subject: selftests/mm: include strings.h for ffsl
has been removed from the -mm tree. Its filename was
selftests-mm-include-stringsh-for-ffsl.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Edward Liaw <edliaw(a)google.com>
Subject: selftests/mm: include strings.h for ffsl
Date: Fri, 29 Mar 2024 18:58:10 +0000
Got a compilation error on Android for ffsl after 91b80cc5b39f
("selftests: mm: fix map_hugetlb failure on 64K page size systems")
included vm_util.h.
Link: https://lkml.kernel.org/r/20240329185814.16304-1-edliaw@google.com
Fixes: af605d26a8f2 ("selftests/mm: merge util.h into vm_util.h")
Signed-off-by: Edward Liaw <edliaw(a)google.com>
Reviewed-by: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: "Mike Rapoport (IBM)" <rppt(a)kernel.org>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/vm_util.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/vm_util.h~selftests-mm-include-stringsh-for-ffsl
+++ a/tools/testing/selftests/mm/vm_util.h
@@ -3,7 +3,7 @@
#include <stdbool.h>
#include <sys/mman.h>
#include <err.h>
-#include <string.h> /* ffsl() */
+#include <strings.h> /* ffsl() */
#include <unistd.h> /* _SC_PAGESIZE */
#define BIT_ULL(nr) (1ULL << (nr))
_
Patches currently in -mm which might be from edliaw(a)google.com are
The quilt patch titled
Subject: mm/secretmem: fix GUP-fast succeeding on secretmem folios
has been removed from the -mm tree. Its filename was
mm-secretmem-fix-gup-fast-succeeding-on-secretmem-folios.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/secretmem: fix GUP-fast succeeding on secretmem folios
Date: Tue, 26 Mar 2024 15:32:08 +0100
folio_is_secretmem() currently relies on secretmem folios being LRU
folios, to save some cycles.
However, folios might reside in a folio batch without the LRU flag set, or
temporarily have their LRU flag cleared. Consequently, the LRU flag is
unreliable for this purpose.
In particular, this is the case when secretmem_fault() allocates a fresh
page and calls filemap_add_folio()->folio_add_lru(). The folio might be
added to the per-cpu folio batch and won't get the LRU flag set until the
batch was drained using e.g., lru_add_drain().
Consequently, folio_is_secretmem() might not detect secretmem folios and
GUP-fast can succeed in grabbing a secretmem folio, crashing the kernel
when we would later try reading/writing to the folio, because the folio
has been unmapped from the directmap.
Fix it by removing that unreliable check.
Link: https://lkml.kernel.org/r/20240326143210.291116-2-david@redhat.com
Fixes: 1507f51255c9 ("mm: introduce memfd_secret system call to create "secret" memory areas")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: xingwei lee <xrivendell7(a)gmail.com>
Reported-by: yue sun <samsun1006219(a)gmail.com>
Closes: https://lore.kernel.org/lkml/CABOYnLyevJeravW=QrH0JUPYEcDN160aZFb7kwndm-J2r…
Debugged-by: Miklos Szeredi <miklos(a)szeredi.hu>
Tested-by: Miklos Szeredi <mszeredi(a)redhat.com>
Reviewed-by: Mike Rapoport (IBM) <rppt(a)kernel.org>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/secretmem.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/secretmem.h~mm-secretmem-fix-gup-fast-succeeding-on-secretmem-folios
+++ a/include/linux/secretmem.h
@@ -13,10 +13,10 @@ static inline bool folio_is_secretmem(st
/*
* Using folio_mapping() is quite slow because of the actual call
* instruction.
- * We know that secretmem pages are not compound and LRU so we can
+ * We know that secretmem pages are not compound, so we can
* save a couple of cycles here.
*/
- if (folio_test_large(folio) || !folio_test_lru(folio))
+ if (folio_test_large(folio))
return false;
mapping = (struct address_space *)
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-madvise-make-madv_populate_readwrite-handle-vm_fault_retry-properly.patch
mm-madvise-dont-perform-madvise-vma-walk-for-madv_populate_readwrite.patch
mm-userfaultfd-dont-place-zeropages-when-zeropages-are-disallowed.patch
s390-mm-re-enable-the-shared-zeropage-for-pv-and-skeys-kvm-guests.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared.patch
mm-convert-folio_estimated_sharers-to-folio_likely_mapped_shared-fix.patch
selftests-memfd_secret-add-vmsplice-test.patch
mm-merge-folio_is_secretmem-and-folio_fast_pin_allowed-into-gup_fast_folio_allowed.patch
mm-optimize-config_per_vma_lock-member-placement-in-vm_area_struct.patch
mm-remove-prot-parameter-from-move_pte.patch
mm-gup-consistently-name-gup-fast-functions.patch
mm-treewide-rename-config_have_fast_gup-to-config_have_gup_fast.patch
mm-use-gup-fast-instead-fast-gup-in-remaining-comments.patch
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040524-stitch-resolute-ead5@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 Mon Sep 17 00:00:00 2001
From: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri, 29 Mar 2024 13:08:52 +0100
Subject: [PATCH] mptcp: don't account accept() of non-MPC client as fallback
to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3a1967bc7bad..7e74b812e366 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3937,8 +3937,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_set_state(newsk, TCP_CLOSE);
}
} else {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
tcpfallback:
newsk->sk_kern_sock = kern;
lock_sock(newsk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1626dd20c68f..6042a47da61b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -905,6 +905,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
return child;
fallback:
+ if (fallback)
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
mptcp_subflow_drop_ctx(child);
return child;
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 4c4248554826..4131f3263a48 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -383,12 +383,14 @@ do_transfer()
local stat_cookierx_last
local stat_csum_err_s
local stat_csum_err_c
+ local stat_tcpfb_last_l
stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr")
stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
+ stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
@@ -457,11 +459,13 @@ do_transfer()
local stat_cookietx_now
local stat_cookierx_now
local stat_ooo_now
+ local stat_tcpfb_now_l
stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
+ stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l))
expect_ackrx=$((stat_ackrx_last_l))
@@ -508,6 +512,11 @@ do_transfer()
fi
fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then
+ mptcp_lib_pr_fail "unexpected fallback to TCP"
+ rets=1
+ fi
+
if [ $cookies -eq 2 ];then
if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then
extra+=" WARN: CookieSent: did not advance"
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040523-handling-conceded-2895@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 Mon Sep 17 00:00:00 2001
From: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri, 29 Mar 2024 13:08:52 +0100
Subject: [PATCH] mptcp: don't account accept() of non-MPC client as fallback
to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3a1967bc7bad..7e74b812e366 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3937,8 +3937,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_set_state(newsk, TCP_CLOSE);
}
} else {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
tcpfallback:
newsk->sk_kern_sock = kern;
lock_sock(newsk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1626dd20c68f..6042a47da61b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -905,6 +905,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
return child;
fallback:
+ if (fallback)
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
mptcp_subflow_drop_ctx(child);
return child;
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 4c4248554826..4131f3263a48 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -383,12 +383,14 @@ do_transfer()
local stat_cookierx_last
local stat_csum_err_s
local stat_csum_err_c
+ local stat_tcpfb_last_l
stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr")
stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
+ stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
@@ -457,11 +459,13 @@ do_transfer()
local stat_cookietx_now
local stat_cookierx_now
local stat_ooo_now
+ local stat_tcpfb_now_l
stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
+ stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l))
expect_ackrx=$((stat_ackrx_last_l))
@@ -508,6 +512,11 @@ do_transfer()
fi
fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then
+ mptcp_lib_pr_fail "unexpected fallback to TCP"
+ rets=1
+ fi
+
if [ $cookies -eq 2 ];then
if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then
extra+=" WARN: CookieSent: did not advance"
Case values introduced in commit
5f78e1fb7a3e ("ASoC: qcom: Add driver support for audioreach solution")
cause out of bounds access in arrays of sc7280 driver data (e.g. in case
of RX_CODEC_DMA_RX_0 in sc7280_snd_hw_params()).
Redefine LPASS_MAX_PORTS to consider the maximum possible port id for
q6dsp as sc7280 driver utilizes some of those values.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 77d0ffef793d ("ASoC: qcom: Add macro for lpass DAI id's max limit")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mikhail Kobuk <m.kobuk(a)ispras.ru>
---
sound/soc/qcom/lpass.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/sound/soc/qcom/lpass.h b/sound/soc/qcom/lpass.h
index 27a2bf9a6613..10a507c95312 100644
--- a/sound/soc/qcom/lpass.h
+++ b/sound/soc/qcom/lpass.h
@@ -13,10 +13,11 @@
#include <linux/platform_device.h>
#include <linux/regmap.h>
#include <dt-bindings/sound/qcom,lpass.h>
+#include <dt-bindings/sound/qcom,q6dsp-lpass-ports.h>
#include "lpass-hdmi.h"
#define LPASS_AHBIX_CLOCK_FREQUENCY 131072000
-#define LPASS_MAX_PORTS (LPASS_CDC_DMA_VA_TX8 + 1)
+#define LPASS_MAX_PORTS (QUINARY_MI2S_TX + 1)
#define LPASS_MAX_MI2S_PORTS (8)
#define LPASS_MAX_DMA_CHANNELS (8)
#define LPASS_MAX_HDMI_DMA_CHANNELS (4)
--
2.44.0
This patch introduces a new API, tegra_xusb_padctl_get_port_number,
to the Tegra XUSB Pad Controller driver. This API is used to identify
the USB port that is associated with a given PHY.
The function takes a PHY pointer for either a USB2 PHY or USB3 PHY as input
and returns the corresponding port number. If the PHY pointer is invalid,
it returns -ENODEV.
Cc: stable(a)vger.kernel.org
Signed-off-by: Wayne Chang <waynec(a)nvidia.com>
---
V1 -> V2:cc stable
drivers/phy/tegra/xusb.c | 13 +++++++++++++
include/linux/phy/tegra/xusb.h | 1 +
2 files changed, 14 insertions(+)
diff --git a/drivers/phy/tegra/xusb.c b/drivers/phy/tegra/xusb.c
index 142ebe0247cc..983a6e6173bd 100644
--- a/drivers/phy/tegra/xusb.c
+++ b/drivers/phy/tegra/xusb.c
@@ -1531,6 +1531,19 @@ int tegra_xusb_padctl_get_usb3_companion(struct tegra_xusb_padctl *padctl,
}
EXPORT_SYMBOL_GPL(tegra_xusb_padctl_get_usb3_companion);
+int tegra_xusb_padctl_get_port_number(struct phy *phy)
+{
+ struct tegra_xusb_lane *lane;
+
+ if (!phy)
+ return -ENODEV;
+
+ lane = phy_get_drvdata(phy);
+
+ return lane->index;
+}
+EXPORT_SYMBOL_GPL(tegra_xusb_padctl_get_port_number);
+
MODULE_AUTHOR("Thierry Reding <treding(a)nvidia.com>");
MODULE_DESCRIPTION("Tegra XUSB Pad Controller driver");
MODULE_LICENSE("GPL v2");
diff --git a/include/linux/phy/tegra/xusb.h b/include/linux/phy/tegra/xusb.h
index 70998e6dd6fd..6ca51e0080ec 100644
--- a/include/linux/phy/tegra/xusb.h
+++ b/include/linux/phy/tegra/xusb.h
@@ -26,6 +26,7 @@ void tegra_phy_xusb_utmi_pad_power_down(struct phy *phy);
int tegra_phy_xusb_utmi_port_reset(struct phy *phy);
int tegra_xusb_padctl_get_usb3_companion(struct tegra_xusb_padctl *padctl,
unsigned int port);
+int tegra_xusb_padctl_get_port_number(struct phy *phy);
int tegra_xusb_padctl_enable_phy_sleepwalk(struct tegra_xusb_padctl *padctl, struct phy *phy,
enum usb_device_speed speed);
int tegra_xusb_padctl_disable_phy_sleepwalk(struct tegra_xusb_padctl *padctl, struct phy *phy);
--
2.25.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040522-tremor-freehand-618e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 Mon Sep 17 00:00:00 2001
From: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri, 29 Mar 2024 13:08:52 +0100
Subject: [PATCH] mptcp: don't account accept() of non-MPC client as fallback
to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3a1967bc7bad..7e74b812e366 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3937,8 +3937,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_set_state(newsk, TCP_CLOSE);
}
} else {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
tcpfallback:
newsk->sk_kern_sock = kern;
lock_sock(newsk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1626dd20c68f..6042a47da61b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -905,6 +905,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
return child;
fallback:
+ if (fallback)
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
mptcp_subflow_drop_ctx(child);
return child;
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 4c4248554826..4131f3263a48 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -383,12 +383,14 @@ do_transfer()
local stat_cookierx_last
local stat_csum_err_s
local stat_csum_err_c
+ local stat_tcpfb_last_l
stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr")
stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
+ stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
@@ -457,11 +459,13 @@ do_transfer()
local stat_cookietx_now
local stat_cookierx_now
local stat_ooo_now
+ local stat_tcpfb_now_l
stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
+ stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l))
expect_ackrx=$((stat_ackrx_last_l))
@@ -508,6 +512,11 @@ do_transfer()
fi
fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then
+ mptcp_lib_pr_fail "unexpected fallback to TCP"
+ rets=1
+ fi
+
if [ $cookies -eq 2 ];then
if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then
extra+=" WARN: CookieSent: did not advance"
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040527-propeller-immovably-a6d8@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1a80dbcb2dbaf6e4c216e62e30fa7d3daa8001ce Mon Sep 17 00:00:00 2001
From: Andrii Nakryiko <andrii(a)kernel.org>
Date: Wed, 27 Mar 2024 22:24:26 -0700
Subject: [PATCH] bpf: support deferring bpf_link dealloc to after RCU grace
period
BPF link for some program types is passed as a "context" which can be
used by those BPF programs to look up additional information. E.g., for
multi-kprobes and multi-uprobes, link is used to fetch BPF cookie values.
Because of this runtime dependency, when bpf_link refcnt drops to zero
there could still be active BPF programs running accessing link data.
This patch adds generic support to defer bpf_link dealloc callback to
after RCU GP, if requested. This is done by exposing two different
deallocation callbacks, one synchronous and one deferred. If deferred
one is provided, bpf_link_free() will schedule dealloc_deferred()
callback to happen after RCU GP.
BPF is using two flavors of RCU: "classic" non-sleepable one and RCU
tasks trace one. The latter is used when sleepable BPF programs are
used. bpf_link_free() accommodates that by checking underlying BPF
program's sleepable flag, and goes either through normal RCU GP only for
non-sleepable, or through RCU tasks trace GP *and* then normal RCU GP
(taking into account rcu_trace_implies_rcu_gp() optimization), if BPF
program is sleepable.
We use this for multi-kprobe and multi-uprobe links, which dereference
link during program run. We also preventively switch raw_tp link to use
deferred dealloc callback, as upcoming changes in bpf-next tree expose
raw_tp link data (specifically, cookie value) to BPF program at runtime
as well.
Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Fixes: 89ae89f53d20 ("bpf: Add multi uprobe link")
Reported-by: syzbot+981935d9485a560bfbcb(a)syzkaller.appspotmail.com
Reported-by: syzbot+2cb5a6c573e98db598cc(a)syzkaller.appspotmail.com
Reported-by: syzbot+62d8b26793e8a2bd0516(a)syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Acked-by: Jiri Olsa <jolsa(a)kernel.org>
Link: https://lore.kernel.org/r/20240328052426.3042617-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 4f20f62f9d63..890e152d553e 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -1574,12 +1574,26 @@ struct bpf_link {
enum bpf_link_type type;
const struct bpf_link_ops *ops;
struct bpf_prog *prog;
- struct work_struct work;
+ /* rcu is used before freeing, work can be used to schedule that
+ * RCU-based freeing before that, so they never overlap
+ */
+ union {
+ struct rcu_head rcu;
+ struct work_struct work;
+ };
};
struct bpf_link_ops {
void (*release)(struct bpf_link *link);
+ /* deallocate link resources callback, called without RCU grace period
+ * waiting
+ */
void (*dealloc)(struct bpf_link *link);
+ /* deallocate link resources callback, called after RCU grace period;
+ * if underlying BPF program is sleepable we go through tasks trace
+ * RCU GP and then "classic" RCU GP
+ */
+ void (*dealloc_deferred)(struct bpf_link *link);
int (*detach)(struct bpf_link *link);
int (*update_prog)(struct bpf_link *link, struct bpf_prog *new_prog,
struct bpf_prog *old_prog);
diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index ae2ff73bde7e..c287925471f6 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -3024,17 +3024,46 @@ void bpf_link_inc(struct bpf_link *link)
atomic64_inc(&link->refcnt);
}
+static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
+{
+ struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
+
+ /* free bpf_link and its containing memory */
+ link->ops->dealloc_deferred(link);
+}
+
+static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
+{
+ if (rcu_trace_implies_rcu_gp())
+ bpf_link_defer_dealloc_rcu_gp(rcu);
+ else
+ call_rcu(rcu, bpf_link_defer_dealloc_rcu_gp);
+}
+
/* bpf_link_free is guaranteed to be called from process context */
static void bpf_link_free(struct bpf_link *link)
{
+ bool sleepable = false;
+
bpf_link_free_id(link->id);
if (link->prog) {
+ sleepable = link->prog->sleepable;
/* detach BPF program, clean up used resources */
link->ops->release(link);
bpf_prog_put(link->prog);
}
- /* free bpf_link and its containing memory */
- link->ops->dealloc(link);
+ if (link->ops->dealloc_deferred) {
+ /* schedule BPF link deallocation; if underlying BPF program
+ * is sleepable, we need to first wait for RCU tasks trace
+ * sync, then go through "classic" RCU grace period
+ */
+ if (sleepable)
+ call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
+ else
+ call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
+ }
+ if (link->ops->dealloc)
+ link->ops->dealloc(link);
}
static void bpf_link_put_deferred(struct work_struct *work)
@@ -3544,7 +3573,7 @@ static int bpf_raw_tp_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_raw_tp_link_lops = {
.release = bpf_raw_tp_link_release,
- .dealloc = bpf_raw_tp_link_dealloc,
+ .dealloc_deferred = bpf_raw_tp_link_dealloc,
.show_fdinfo = bpf_raw_tp_link_show_fdinfo,
.fill_link_info = bpf_raw_tp_link_fill_link_info,
};
diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
index 0b73fe5f7206..9dc605f08a23 100644
--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -2728,7 +2728,7 @@ static int bpf_kprobe_multi_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_kprobe_multi_link_lops = {
.release = bpf_kprobe_multi_link_release,
- .dealloc = bpf_kprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_kprobe_multi_link_dealloc,
.fill_link_info = bpf_kprobe_multi_link_fill_link_info,
};
@@ -3242,7 +3242,7 @@ static int bpf_uprobe_multi_link_fill_link_info(const struct bpf_link *link,
static const struct bpf_link_ops bpf_uprobe_multi_link_lops = {
.release = bpf_uprobe_multi_link_release,
- .dealloc = bpf_uprobe_multi_link_dealloc,
+ .dealloc_deferred = bpf_uprobe_multi_link_dealloc,
.fill_link_info = bpf_uprobe_multi_link_fill_link_info,
};
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 40061817d95bce6dd5634a61a65cd5922e6ccc92
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040506-skinny-unsuited-f487@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40061817d95bce6dd5634a61a65cd5922e6ccc92 Mon Sep 17 00:00:00 2001
From: Geliang Tang <tanggeliang(a)kylinos.cn>
Date: Fri, 29 Mar 2024 13:08:53 +0100
Subject: [PATCH] selftests: mptcp: join: fix dev in check_endpoint
There's a bug in pm_nl_check_endpoint(), 'dev' didn't be parsed correctly.
If calling it in the 2nd test of endpoint_tests() too, it fails with an
error like this:
creation [FAIL] expected '10.0.2.2 id 2 subflow dev dev' \
found '10.0.2.2 id 2 subflow dev ns2eth2'
The reason is '$2' should be set to 'dev', not '$1'. This patch fixes it.
Fixes: 69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
Cc: stable(a)vger.kernel.org
Signed-off-by: Geliang Tang <tanggeliang(a)kylinos.cn>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-2-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh
index 5e9211e89825..e4403236f655 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_join.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh
@@ -729,7 +729,7 @@ pm_nl_check_endpoint()
[ -n "$_flags" ]; flags="flags $_flags"
shift
elif [ $1 = "dev" ]; then
- [ -n "$2" ]; dev="dev $1"
+ [ -n "$2" ]; dev="dev $2"
shift
elif [ $1 = "id" ]; then
_id=$2
@@ -3610,6 +3610,8 @@ endpoint_tests()
local tests_pid=$!
wait_mpj $ns2
+ pm_nl_check_endpoint "creation" \
+ $ns2 10.0.2.2 id 2 flags subflow dev ns2eth2
chk_subflow_nr "before delete" 2
chk_mptcp_info subflows 1 subflows 1
ivpu_device->context_xa is locked both in kernel thread and IRQ context.
It requires XA_FLAGS_LOCK_IRQ flag to be passed during initialization
otherwise the lock could be acquired from a thread and interrupted by
an IRQ that locks it for the second time causing the deadlock.
This deadlock was reported by lockdep and observed in internal tests.
Fixes: 35b137630f08 ("accel/ivpu: Introduce a new DRM driver for Intel VPU")
Cc: <stable(a)vger.kernel.org> # v6.3+
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
drivers/accel/ivpu/ivpu_drv.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/accel/ivpu/ivpu_drv.c b/drivers/accel/ivpu/ivpu_drv.c
index 77283daaedd1..51d3f1a55d02 100644
--- a/drivers/accel/ivpu/ivpu_drv.c
+++ b/drivers/accel/ivpu/ivpu_drv.c
@@ -517,7 +517,7 @@ static int ivpu_dev_init(struct ivpu_device *vdev)
vdev->context_xa_limit.min = IVPU_USER_CONTEXT_MIN_SSID;
vdev->context_xa_limit.max = IVPU_USER_CONTEXT_MAX_SSID;
atomic64_set(&vdev->unique_id_counter, 0);
- xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC);
+ xa_init_flags(&vdev->context_xa, XA_FLAGS_ALLOC | XA_FLAGS_LOCK_IRQ);
xa_init_flags(&vdev->submitted_jobs_xa, XA_FLAGS_ALLOC1);
xa_init_flags(&vdev->db_xa, XA_FLAGS_ALLOC1);
lockdep_set_class(&vdev->submitted_jobs_xa.xa_lock, &submitted_jobs_xa_lock_class_key);
--
2.43.2
Put NPU in D3hot after ivpu_resume() fails to power up the device.
This will assure that D3->D0 power cycle will be performed before
the next resume and also will minimize power usage in this corner case.
Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery")
Cc: <stable(a)vger.kernel.org> # v6.8+
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
drivers/accel/ivpu/ivpu_pm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index 325b82f8d971..ba51781b5896 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -97,6 +97,7 @@ static int ivpu_resume(struct ivpu_device *vdev)
ivpu_mmu_disable(vdev);
err_power_down:
ivpu_hw_power_down(vdev);
+ pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D3hot);
if (!ivpu_fw_is_cold_boot(vdev)) {
ivpu_pm_prepare_cold_boot(vdev);
--
2.43.2
From: "Wachowski, Karol" <karol.wachowski(a)intel.com>
In case of failed power up we end up left in PCI D3hot
state making it impossible to access NPU registers on retry.
Enter D0 state on retry before proceeding with power up sequence.
Fixes: 28083ff18d3f ("accel/ivpu: Fix DevTLB errors on suspend/resume and recovery")
Cc: <stable(a)vger.kernel.org> # v6.8+
Signed-off-by: Wachowski, Karol <karol.wachowski(a)intel.com>
Signed-off-by: Jacek Lawrynowicz <jacek.lawrynowicz(a)linux.intel.com>
---
drivers/accel/ivpu/ivpu_pm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/accel/ivpu/ivpu_pm.c b/drivers/accel/ivpu/ivpu_pm.c
index 9cbd7af6576b..325b82f8d971 100644
--- a/drivers/accel/ivpu/ivpu_pm.c
+++ b/drivers/accel/ivpu/ivpu_pm.c
@@ -71,10 +71,10 @@ static int ivpu_resume(struct ivpu_device *vdev)
{
int ret;
- pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0);
+retry:
pci_restore_state(to_pci_dev(vdev->drm.dev));
+ pci_set_power_state(to_pci_dev(vdev->drm.dev), PCI_D0);
-retry:
ret = ivpu_hw_power_up(vdev);
if (ret) {
ivpu_err(vdev, "Failed to power up HW: %d\n", ret);
--
2.43.2
These pointers are frequently the same and memcmp does not compare the
pointers before comparing their contents so this was wasting cycles
comparing 16 KiB of memory which will always be equal.
Fixes: bb6780aa5a1d ("drm/vmwgfx: Diff cursors when using cmds")
Signed-off-by: Ian Forbes <ian.forbes(a)broadcom.com>
Cc: <stable(a)vger.kernel.org> # v6.2+
---
v2: Fix code and commit message formatting.
--
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index cd4925346ed4..ef0af10c4968 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -216,7 +216,7 @@ static bool vmw_du_cursor_plane_has_changed(struct vmw_plane_state *old_vps,
new_image = vmw_du_cursor_plane_acquire_image(new_vps);
changed = false;
- if (old_image && new_image)
+ if (old_image && new_image && old_image != new_image)
changed = memcmp(old_image, new_image, size) != 0;
return changed;
--
2.34.1
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040519-palpable-barrel-9103@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7a1b3490f47e88ec4cbde65f1a77a0f4bc972282 Mon Sep 17 00:00:00 2001
From: Davide Caratti <dcaratti(a)redhat.com>
Date: Fri, 29 Mar 2024 13:08:52 +0100
Subject: [PATCH] mptcp: don't account accept() of non-MPC client as fallback
to TCP
Current MPTCP servers increment MPTcpExtMPCapableFallbackACK when they
accept non-MPC connections. As reported by Christoph, this is "surprising"
because the counter might become greater than MPTcpExtMPCapableSYNRX.
MPTcpExtMPCapableFallbackACK counter's name suggests it should only be
incremented when a connection was seen using MPTCP options, then a
fallback to TCP has been done. Let's do that by incrementing it when
the subflow context of an inbound MPC connection attempt is dropped.
Also, update mptcp_connect.sh kselftest, to ensure that the
above MIB does not increment in case a pure TCP client connects to a
MPTCP server.
Fixes: fc518953bc9c ("mptcp: add and use MIB counter infrastructure")
Cc: stable(a)vger.kernel.org
Reported-by: Christoph Paasch <cpaasch(a)apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/449
Signed-off-by: Davide Caratti <dcaratti(a)redhat.com>
Reviewed-by: Mat Martineau <martineau(a)kernel.org>
Reviewed-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240329-upstream-net-20240329-fallback-mib-v1-1-…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 3a1967bc7bad..7e74b812e366 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -3937,8 +3937,6 @@ static int mptcp_stream_accept(struct socket *sock, struct socket *newsock,
mptcp_set_state(newsk, TCP_CLOSE);
}
} else {
- MPTCP_INC_STATS(sock_net(ssk),
- MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
tcpfallback:
newsk->sk_kern_sock = kern;
lock_sock(newsk);
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index 1626dd20c68f..6042a47da61b 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -905,6 +905,8 @@ static struct sock *subflow_syn_recv_sock(const struct sock *sk,
return child;
fallback:
+ if (fallback)
+ SUBFLOW_REQ_INC_STATS(req, MPTCP_MIB_MPCAPABLEPASSIVEFALLBACK);
mptcp_subflow_drop_ctx(child);
return child;
}
diff --git a/tools/testing/selftests/net/mptcp/mptcp_connect.sh b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
index 4c4248554826..4131f3263a48 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_connect.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_connect.sh
@@ -383,12 +383,14 @@ do_transfer()
local stat_cookierx_last
local stat_csum_err_s
local stat_csum_err_c
+ local stat_tcpfb_last_l
stat_synrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_last=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_csum_err_s=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtDataCsumErr")
stat_csum_err_c=$(mptcp_lib_get_counter "${connector_ns}" "MPTcpExtDataCsumErr")
+ stat_tcpfb_last_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
timeout ${timeout_test} \
ip netns exec ${listener_ns} \
@@ -457,11 +459,13 @@ do_transfer()
local stat_cookietx_now
local stat_cookierx_now
local stat_ooo_now
+ local stat_tcpfb_now_l
stat_synrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableSYNRX")
stat_ackrx_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableACKRX")
stat_cookietx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesSent")
stat_cookierx_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtSyncookiesRecv")
stat_ooo_now=$(mptcp_lib_get_counter "${listener_ns}" "TcpExtTCPOFOQueue")
+ stat_tcpfb_now_l=$(mptcp_lib_get_counter "${listener_ns}" "MPTcpExtMPCapableFallbackACK")
expect_synrx=$((stat_synrx_last_l))
expect_ackrx=$((stat_ackrx_last_l))
@@ -508,6 +512,11 @@ do_transfer()
fi
fi
+ if [ ${stat_ooo_now} -eq 0 ] && [ ${stat_tcpfb_last_l} -ne ${stat_tcpfb_now_l} ]; then
+ mptcp_lib_pr_fail "unexpected fallback to TCP"
+ rets=1
+ fi
+
if [ $cookies -eq 2 ];then
if [ $stat_cookietx_last -ge $stat_cookietx_now ] ;then
extra+=" WARN: CookieSent: did not advance"
From: Kan Liang <kan.liang(a)linux.intel.com>
A non-0 retire latency can be observed on a Raptorlake which doesn't
support the retire latency feature.
By design, the retire latency shares the PERF_SAMPLE_WEIGHT_STRUCT
sample type with other types of latency. That could avoid adding too
many different sample types to support all kinds of latency. For the
machine which doesn't support some kind of latency, 0 should be
returned.
Perf doesn’t clear/init all the fields of a sample data for the sake
of performance. It expects the later perf_{prepare,output}_sample() to
update the uninitialized field. However, the current implementation
doesn't touch the field of the retire latency if the feature is not
supported. The memory garbage is dumped into the perf data.
Clear the retire latency if the feature is not supported.
Fixes: c87a31093c70 ("perf/x86: Support Retire Latency")
Reported-by: "Bayduraev, Alexey V" <alexey.v.bayduraev(a)intel.com>
Tested-by: "Bayduraev, Alexey V" <alexey.v.bayduraev(a)intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/ds.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index f95cca6b632a..838f3e23bce9 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -1989,8 +1989,12 @@ static void setup_pebs_adaptive_sample_data(struct perf_event *event,
set_linear_ip(regs, basic->ip);
regs->flags = PERF_EFLAGS_EXACT;
- if ((sample_type & PERF_SAMPLE_WEIGHT_STRUCT) && (x86_pmu.flags & PMU_FL_RETIRE_LATENCY))
- data->weight.var3_w = format_size >> PEBS_RETIRE_LATENCY_OFFSET & PEBS_LATENCY_MASK;
+ if (sample_type & PERF_SAMPLE_WEIGHT_STRUCT) {
+ if (x86_pmu.flags & PMU_FL_RETIRE_LATENCY)
+ data->weight.var3_w = format_size >> PEBS_RETIRE_LATENCY_OFFSET & PEBS_LATENCY_MASK;
+ else
+ data->weight.var3_w = 0;
+ }
/*
* The record for MEMINFO is in front of GP
--
2.35.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033032-confess-monument-a6db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
Some systems have ACPI tables that don't include everything that needs
to be mapped for a successful kexec. These systems rely on identity
maps that include the full gigabyte surrounding any smaller region
requested for kexec success. Without this, they fail to kexec and end
up doing a full firmware reboot.
So, reduce the use of GB pages only on systems where this is known to
be necessary (specifically, UV systems).
Signed-off-by: Steve Wahl <steve.wahl(a)hpe.com>
Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
Reported-by: Pavin Joseph <me(a)pavinjoseph.com>
Closes: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjosep…
Tested-by: Pavin Joseph <me(a)pavinjoseph.com>
Tested-by: Eric Hagberg <ehagberg(a)gmail.com>
Tested-by: Sarah Brofeldt <srhb(a)dbc.dk>
---
arch/x86/include/asm/init.h | 1 +
arch/x86/kernel/machine_kexec_64.c | 3 +++
arch/x86/mm/ident_map.c | 13 +++++++------
3 files changed, 11 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/init.h b/arch/x86/include/asm/init.h
index cc9ccf61b6bd..4ae843e8fefb 100644
--- a/arch/x86/include/asm/init.h
+++ b/arch/x86/include/asm/init.h
@@ -10,6 +10,7 @@ struct x86_mapping_info {
unsigned long page_flag; /* page flag for PMD or PUD entry */
unsigned long offset; /* ident mapping offset */
bool direct_gbpages; /* PUD level 1GB page support */
+ bool direct_gbpages_always; /* use 1GB pages exclusively */
unsigned long kernpg_flag; /* kernel pagetable flag override */
};
diff --git a/arch/x86/kernel/machine_kexec_64.c b/arch/x86/kernel/machine_kexec_64.c
index b180d8e497c3..1e1c6633bbec 100644
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -28,6 +28,7 @@
#include <asm/setup.h>
#include <asm/set_memory.h>
#include <asm/cpu.h>
+#include <asm/uv/uv.h>
#ifdef CONFIG_ACPI
/*
@@ -212,6 +213,8 @@ static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
if (direct_gbpages)
info.direct_gbpages = true;
+ if (!is_uv_system())
+ info.direct_gbpages_always = true;
for (i = 0; i < nr_pfn_mapped; i++) {
mstart = pfn_mapped[i].start << PAGE_SHIFT;
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index a204a332c71f..8039498b9713 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -39,12 +39,13 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
/* Is using a gbpage allowed? */
use_gbpage = info->direct_gbpages;
- /* Don't use gbpage if it maps more than the requested region. */
- /* at the begining: */
- use_gbpage &= ((addr & ~PUD_MASK) == 0);
- /* ... or at the end: */
- use_gbpage &= ((next & ~PUD_MASK) == 0);
-
+ if (!info->direct_gbpages_always) {
+ /* Don't use gbpage if it maps more than the requested region. */
+ /* at the beginning: */
+ use_gbpage &= ((addr & ~PUD_MASK) == 0);
+ /* ... or at the end: */
+ use_gbpage &= ((next & ~PUD_MASK) == 0);
+ }
/* Never overwrite existing mappings */
use_gbpage &= !pud_present(*pud);
--
2.26.2
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x fd819ad3ecf6f3c232a06b27423ce9ed8c20da89
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040512-graves-stipend-babf@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 Mon Sep 17 00:00:00 2001
From: Duoming Zhou <duoming(a)zju.edu.cn>
Date: Fri, 29 Mar 2024 09:50:23 +0800
Subject: [PATCH] ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming(a)zju.edu.cn>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c5462486dbca..282ec581c072 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_device *dev)
spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE
- ax25_ds_del_timer(ax25_dev);
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
#endif
/*
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x fd819ad3ecf6f3c232a06b27423ce9ed8c20da89
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040511-chamber-yelp-2ec9@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 Mon Sep 17 00:00:00 2001
From: Duoming Zhou <duoming(a)zju.edu.cn>
Date: Fri, 29 Mar 2024 09:50:23 +0800
Subject: [PATCH] ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming(a)zju.edu.cn>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c5462486dbca..282ec581c072 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_device *dev)
spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE
- ax25_ds_del_timer(ax25_dev);
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
#endif
/*
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x fd819ad3ecf6f3c232a06b27423ce9ed8c20da89
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040509-monetize-joyride-107c@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 Mon Sep 17 00:00:00 2001
From: Duoming Zhou <duoming(a)zju.edu.cn>
Date: Fri, 29 Mar 2024 09:50:23 +0800
Subject: [PATCH] ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming(a)zju.edu.cn>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c5462486dbca..282ec581c072 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_device *dev)
spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE
- ax25_ds_del_timer(ax25_dev);
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
#endif
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x fd819ad3ecf6f3c232a06b27423ce9ed8c20da89
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040506-earmark-skydiver-ab1c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 Mon Sep 17 00:00:00 2001
From: Duoming Zhou <duoming(a)zju.edu.cn>
Date: Fri, 29 Mar 2024 09:50:23 +0800
Subject: [PATCH] ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming(a)zju.edu.cn>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c5462486dbca..282ec581c072 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_device *dev)
spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE
- ax25_ds_del_timer(ax25_dev);
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
#endif
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x fd819ad3ecf6f3c232a06b27423ce9ed8c20da89
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040503-tropical-fascism-219c@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fd819ad3ecf6f3c232a06b27423ce9ed8c20da89 Mon Sep 17 00:00:00 2001
From: Duoming Zhou <duoming(a)zju.edu.cn>
Date: Fri, 29 Mar 2024 09:50:23 +0800
Subject: [PATCH] ax25: fix use-after-free bugs caused by ax25_ds_del_timer
When the ax25 device is detaching, the ax25_dev_device_down()
calls ax25_ds_del_timer() to cleanup the slave_timer. When
the timer handler is running, the ax25_ds_del_timer() that
calls del_timer() in it will return directly. As a result,
the use-after-free bugs could happen, one of the scenarios
is shown below:
(Thread 1) | (Thread 2)
| ax25_ds_timeout()
ax25_dev_device_down() |
ax25_ds_del_timer() |
del_timer() |
ax25_dev_put() //FREE |
| ax25_dev-> //USE
In order to mitigate bugs, when the device is detaching, use
timer_shutdown_sync() to stop the timer.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: Duoming Zhou <duoming(a)zju.edu.cn>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20240329015023.9223-1-duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/ax25/ax25_dev.c b/net/ax25/ax25_dev.c
index c5462486dbca..282ec581c072 100644
--- a/net/ax25/ax25_dev.c
+++ b/net/ax25/ax25_dev.c
@@ -105,7 +105,7 @@ void ax25_dev_device_down(struct net_device *dev)
spin_lock_bh(&ax25_dev_lock);
#ifdef CONFIG_AX25_DAMA_SLAVE
- ax25_ds_del_timer(ax25_dev);
+ timer_shutdown_sync(&ax25_dev->dama.slave_timer);
#endif
/*
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ea558de7238bb12c3435c47f0631e9d17bf4a09f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040510-impose-yonder-1ce4@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea558de7238bb12c3435c47f0631e9d17bf4a09f Mon Sep 17 00:00:00 2001
From: Ivan Vecera <ivecera(a)redhat.com>
Date: Sat, 16 Mar 2024 12:38:29 +0100
Subject: [PATCH] i40e: Enforce software interrupt during busy-poll exit
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") followed by commit 23be7075b318 ("ice: fix software
generating extra interrupts") I'm seeing the similar issue also with
i40e driver.
In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.
The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.
The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
interrupt into built value
2) In i40e_update_enable_itr() enforces a software interrupt trigger
if we are exiting busy poll to catch any pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.
Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 = 45.633
sockperf: ---> percentile 99.990 = 37.013
sockperf: ---> percentile 99.900 = 35.910
sockperf: ---> percentile 99.000 = 33.390
sockperf: ---> percentile 90.000 = 28.626
sockperf: ---> percentile 75.000 = 27.741
sockperf: ---> percentile 50.000 = 26.743
sockperf: ---> percentile 25.000 = 25.614
sockperf: ---> <MIN> observation = 12.220
After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation = 195.841
sockperf: ---> percentile 99.999 = 45.026
sockperf: ---> percentile 99.990 = 39.009
sockperf: ---> percentile 99.900 = 35.922
sockperf: ---> percentile 99.000 = 33.482
sockperf: ---> percentile 90.000 = 28.902
sockperf: ---> percentile 75.000 = 27.821
sockperf: ---> percentile 50.000 = 26.860
sockperf: ---> percentile 25.000 = 25.685
sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <hferreir(a)redhat.com>
Reviewed-by: Michal Schmidt <mschmidt(a)redhat.com>
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ba24f3fa92c3..2fbabcdb5bb5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -955,6 +955,7 @@ struct i40e_q_vector {
struct rcu_head rcu; /* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
+ bool in_busy_poll;
int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f86578857e8a..6576a0081093 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3911,6 +3911,12 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
q_vector->tx.target_itr >> 1);
q_vector->tx.current_itr = q_vector->tx.target_itr;
+ /* Set ITR for software interrupts triggered after exiting
+ * busy-loop polling.
+ */
+ wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1),
+ I40E_ITR_20K);
+
wr32(hw, I40E_PFINT_RATEN(vector - 1),
i40e_intrl_usec_to_reg(vsi->int_rate_limit));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 14ab642cafdb..432afbb64201 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -333,8 +333,11 @@
#define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3
#define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT)
#define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5
+#define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT)
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT)
#define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */
#define I40E_PFINT_ICR0_INTEVENT_SHIFT 0
#define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0d7177083708..1a12b732818e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2630,7 +2630,22 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
return failure ? budget : (int)total_rx_packets;
}
-static inline u32 i40e_buildreg_itr(const int type, u16 itr)
+/**
+ * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register
+ * @itr_idx: interrupt throttling index
+ * @interval: interrupt throttling interval value in usecs
+ * @force_swint: force software interrupt
+ *
+ * The function builds a value for I40E_PFINT_DYN_CTLN register that
+ * is used to update interrupt throttling interval for specified ITR index
+ * and optionally enforces a software interrupt. If the @itr_idx is equal
+ * to I40E_ITR_NONE then no interval change is applied and only @force_swint
+ * parameter is taken into account. If the interval change and enforced
+ * software interrupt are not requested then the built value just enables
+ * appropriate vector interrupt.
+ **/
+static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval,
+ bool force_swint)
{
u32 val;
@@ -2644,23 +2659,33 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
* an event in the PBA anyway so we need to rely on the automask
* to hold pending events for us until the interrupt is re-enabled
*
- * The itr value is reported in microseconds, and the register
- * value is recorded in 2 microsecond units. For this reason we
- * only need to shift by the interval shift - 1 instead of the
- * full value.
+ * We have to shift the given value as it is reported in microseconds
+ * and the register value is recorded in 2 microsecond units.
*/
- itr &= I40E_ITR_MASK;
+ interval >>= 1;
+ /* 1. Enable vector interrupt
+ * 2. Update the interval for the specified ITR index
+ * (I40E_ITR_NONE in the register is used to indicate that
+ * no interval update is requested)
+ */
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) |
- (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1));
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval);
+
+ /* 3. Enforce software interrupt trigger if requested
+ * (These software interrupts rate is limited by ITR2 that is
+ * set to 20K interrupts per second)
+ */
+ if (force_swint)
+ val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK,
+ I40E_SW_ITR);
return val;
}
-/* a small macro to shorten up some long lines */
-#define INTREG I40E_PFINT_DYN_CTLN
-
/* The act of updating the ITR will cause it to immediately trigger. In order
* to prevent this from throwing off adaptive update statistics we defer the
* update so that it can only happen so often. So after either Tx or Rx are
@@ -2679,8 +2704,10 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
struct i40e_q_vector *q_vector)
{
+ enum i40e_dyn_idx itr_idx = I40E_ITR_NONE;
struct i40e_hw *hw = &vsi->back->hw;
- u32 intval;
+ u16 interval = 0;
+ u32 itr_val;
/* If we don't have MSIX, then we only need to re-enable icr0 */
if (!test_bit(I40E_FLAG_MSIX_ENA, vsi->back->flags)) {
@@ -2702,8 +2729,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
*/
if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
@@ -2712,25 +2739,36 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- intval = i40e_buildreg_itr(I40E_TX_ITR,
- q_vector->tx.target_itr);
+ itr_idx = I40E_TX_ITR;
+ interval = q_vector->tx.target_itr;
q_vector->tx.current_itr = q_vector->tx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
/* Rx ITR needs to be increased, third priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* No ITR update, lowest priority */
- intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
if (q_vector->itr_countdown)
q_vector->itr_countdown--;
}
- if (!test_bit(__I40E_VSI_DOWN, vsi->state))
- wr32(hw, INTREG(q_vector->reg_idx), intval);
+ /* Do not update interrupt control register if VSI is down */
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ /* Update ITR interval if necessary and enforce software interrupt
+ * if we are exiting busy poll.
+ */
+ if (q_vector->in_busy_poll) {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, true);
+ q_vector->in_busy_poll = false;
+ } else {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, false);
+ }
+ wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val);
}
/**
@@ -2845,6 +2883,8 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
*/
if (likely(napi_complete_done(napi, work_done)))
i40e_update_enable_itr(vsi, q_vector);
+ else
+ q_vector->in_busy_poll = true;
return min(work_done, budget - 1);
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index abf15067eb5d..2cdc7de6301c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -68,6 +68,7 @@ enum i40e_dyn_idx {
/* these are indexes into ITRN registers */
#define I40E_RX_ITR I40E_IDX_ITR0
#define I40E_TX_ITR I40E_IDX_ITR1
+#define I40E_SW_ITR I40E_IDX_ITR2
/* Supported RSS offloads */
#define I40E_DEFAULT_RSS_HENA ( \
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ea558de7238bb12c3435c47f0631e9d17bf4a09f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040507-email-pried-bb87@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea558de7238bb12c3435c47f0631e9d17bf4a09f Mon Sep 17 00:00:00 2001
From: Ivan Vecera <ivecera(a)redhat.com>
Date: Sat, 16 Mar 2024 12:38:29 +0100
Subject: [PATCH] i40e: Enforce software interrupt during busy-poll exit
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") followed by commit 23be7075b318 ("ice: fix software
generating extra interrupts") I'm seeing the similar issue also with
i40e driver.
In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.
The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.
The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
interrupt into built value
2) In i40e_update_enable_itr() enforces a software interrupt trigger
if we are exiting busy poll to catch any pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.
Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 = 45.633
sockperf: ---> percentile 99.990 = 37.013
sockperf: ---> percentile 99.900 = 35.910
sockperf: ---> percentile 99.000 = 33.390
sockperf: ---> percentile 90.000 = 28.626
sockperf: ---> percentile 75.000 = 27.741
sockperf: ---> percentile 50.000 = 26.743
sockperf: ---> percentile 25.000 = 25.614
sockperf: ---> <MIN> observation = 12.220
After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation = 195.841
sockperf: ---> percentile 99.999 = 45.026
sockperf: ---> percentile 99.990 = 39.009
sockperf: ---> percentile 99.900 = 35.922
sockperf: ---> percentile 99.000 = 33.482
sockperf: ---> percentile 90.000 = 28.902
sockperf: ---> percentile 75.000 = 27.821
sockperf: ---> percentile 50.000 = 26.860
sockperf: ---> percentile 25.000 = 25.685
sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <hferreir(a)redhat.com>
Reviewed-by: Michal Schmidt <mschmidt(a)redhat.com>
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ba24f3fa92c3..2fbabcdb5bb5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -955,6 +955,7 @@ struct i40e_q_vector {
struct rcu_head rcu; /* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
+ bool in_busy_poll;
int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f86578857e8a..6576a0081093 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3911,6 +3911,12 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
q_vector->tx.target_itr >> 1);
q_vector->tx.current_itr = q_vector->tx.target_itr;
+ /* Set ITR for software interrupts triggered after exiting
+ * busy-loop polling.
+ */
+ wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1),
+ I40E_ITR_20K);
+
wr32(hw, I40E_PFINT_RATEN(vector - 1),
i40e_intrl_usec_to_reg(vsi->int_rate_limit));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 14ab642cafdb..432afbb64201 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -333,8 +333,11 @@
#define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3
#define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT)
#define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5
+#define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT)
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT)
#define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */
#define I40E_PFINT_ICR0_INTEVENT_SHIFT 0
#define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0d7177083708..1a12b732818e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2630,7 +2630,22 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
return failure ? budget : (int)total_rx_packets;
}
-static inline u32 i40e_buildreg_itr(const int type, u16 itr)
+/**
+ * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register
+ * @itr_idx: interrupt throttling index
+ * @interval: interrupt throttling interval value in usecs
+ * @force_swint: force software interrupt
+ *
+ * The function builds a value for I40E_PFINT_DYN_CTLN register that
+ * is used to update interrupt throttling interval for specified ITR index
+ * and optionally enforces a software interrupt. If the @itr_idx is equal
+ * to I40E_ITR_NONE then no interval change is applied and only @force_swint
+ * parameter is taken into account. If the interval change and enforced
+ * software interrupt are not requested then the built value just enables
+ * appropriate vector interrupt.
+ **/
+static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval,
+ bool force_swint)
{
u32 val;
@@ -2644,23 +2659,33 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
* an event in the PBA anyway so we need to rely on the automask
* to hold pending events for us until the interrupt is re-enabled
*
- * The itr value is reported in microseconds, and the register
- * value is recorded in 2 microsecond units. For this reason we
- * only need to shift by the interval shift - 1 instead of the
- * full value.
+ * We have to shift the given value as it is reported in microseconds
+ * and the register value is recorded in 2 microsecond units.
*/
- itr &= I40E_ITR_MASK;
+ interval >>= 1;
+ /* 1. Enable vector interrupt
+ * 2. Update the interval for the specified ITR index
+ * (I40E_ITR_NONE in the register is used to indicate that
+ * no interval update is requested)
+ */
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) |
- (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1));
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval);
+
+ /* 3. Enforce software interrupt trigger if requested
+ * (These software interrupts rate is limited by ITR2 that is
+ * set to 20K interrupts per second)
+ */
+ if (force_swint)
+ val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK,
+ I40E_SW_ITR);
return val;
}
-/* a small macro to shorten up some long lines */
-#define INTREG I40E_PFINT_DYN_CTLN
-
/* The act of updating the ITR will cause it to immediately trigger. In order
* to prevent this from throwing off adaptive update statistics we defer the
* update so that it can only happen so often. So after either Tx or Rx are
@@ -2679,8 +2704,10 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
struct i40e_q_vector *q_vector)
{
+ enum i40e_dyn_idx itr_idx = I40E_ITR_NONE;
struct i40e_hw *hw = &vsi->back->hw;
- u32 intval;
+ u16 interval = 0;
+ u32 itr_val;
/* If we don't have MSIX, then we only need to re-enable icr0 */
if (!test_bit(I40E_FLAG_MSIX_ENA, vsi->back->flags)) {
@@ -2702,8 +2729,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
*/
if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
@@ -2712,25 +2739,36 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- intval = i40e_buildreg_itr(I40E_TX_ITR,
- q_vector->tx.target_itr);
+ itr_idx = I40E_TX_ITR;
+ interval = q_vector->tx.target_itr;
q_vector->tx.current_itr = q_vector->tx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
/* Rx ITR needs to be increased, third priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* No ITR update, lowest priority */
- intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
if (q_vector->itr_countdown)
q_vector->itr_countdown--;
}
- if (!test_bit(__I40E_VSI_DOWN, vsi->state))
- wr32(hw, INTREG(q_vector->reg_idx), intval);
+ /* Do not update interrupt control register if VSI is down */
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ /* Update ITR interval if necessary and enforce software interrupt
+ * if we are exiting busy poll.
+ */
+ if (q_vector->in_busy_poll) {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, true);
+ q_vector->in_busy_poll = false;
+ } else {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, false);
+ }
+ wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val);
}
/**
@@ -2845,6 +2883,8 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
*/
if (likely(napi_complete_done(napi, work_done)))
i40e_update_enable_itr(vsi, q_vector);
+ else
+ q_vector->in_busy_poll = true;
return min(work_done, budget - 1);
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index abf15067eb5d..2cdc7de6301c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -68,6 +68,7 @@ enum i40e_dyn_idx {
/* these are indexes into ITRN registers */
#define I40E_RX_ITR I40E_IDX_ITR0
#define I40E_TX_ITR I40E_IDX_ITR1
+#define I40E_SW_ITR I40E_IDX_ITR2
/* Supported RSS offloads */
#define I40E_DEFAULT_RSS_HENA ( \
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ea558de7238bb12c3435c47f0631e9d17bf4a09f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040504-iron-respect-9ff0@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea558de7238bb12c3435c47f0631e9d17bf4a09f Mon Sep 17 00:00:00 2001
From: Ivan Vecera <ivecera(a)redhat.com>
Date: Sat, 16 Mar 2024 12:38:29 +0100
Subject: [PATCH] i40e: Enforce software interrupt during busy-poll exit
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") followed by commit 23be7075b318 ("ice: fix software
generating extra interrupts") I'm seeing the similar issue also with
i40e driver.
In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.
The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.
The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
interrupt into built value
2) In i40e_update_enable_itr() enforces a software interrupt trigger
if we are exiting busy poll to catch any pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.
Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 = 45.633
sockperf: ---> percentile 99.990 = 37.013
sockperf: ---> percentile 99.900 = 35.910
sockperf: ---> percentile 99.000 = 33.390
sockperf: ---> percentile 90.000 = 28.626
sockperf: ---> percentile 75.000 = 27.741
sockperf: ---> percentile 50.000 = 26.743
sockperf: ---> percentile 25.000 = 25.614
sockperf: ---> <MIN> observation = 12.220
After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation = 195.841
sockperf: ---> percentile 99.999 = 45.026
sockperf: ---> percentile 99.990 = 39.009
sockperf: ---> percentile 99.900 = 35.922
sockperf: ---> percentile 99.000 = 33.482
sockperf: ---> percentile 90.000 = 28.902
sockperf: ---> percentile 75.000 = 27.821
sockperf: ---> percentile 50.000 = 26.860
sockperf: ---> percentile 25.000 = 25.685
sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <hferreir(a)redhat.com>
Reviewed-by: Michal Schmidt <mschmidt(a)redhat.com>
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ba24f3fa92c3..2fbabcdb5bb5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -955,6 +955,7 @@ struct i40e_q_vector {
struct rcu_head rcu; /* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
+ bool in_busy_poll;
int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f86578857e8a..6576a0081093 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3911,6 +3911,12 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
q_vector->tx.target_itr >> 1);
q_vector->tx.current_itr = q_vector->tx.target_itr;
+ /* Set ITR for software interrupts triggered after exiting
+ * busy-loop polling.
+ */
+ wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1),
+ I40E_ITR_20K);
+
wr32(hw, I40E_PFINT_RATEN(vector - 1),
i40e_intrl_usec_to_reg(vsi->int_rate_limit));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 14ab642cafdb..432afbb64201 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -333,8 +333,11 @@
#define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3
#define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT)
#define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5
+#define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT)
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT)
#define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */
#define I40E_PFINT_ICR0_INTEVENT_SHIFT 0
#define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0d7177083708..1a12b732818e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2630,7 +2630,22 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
return failure ? budget : (int)total_rx_packets;
}
-static inline u32 i40e_buildreg_itr(const int type, u16 itr)
+/**
+ * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register
+ * @itr_idx: interrupt throttling index
+ * @interval: interrupt throttling interval value in usecs
+ * @force_swint: force software interrupt
+ *
+ * The function builds a value for I40E_PFINT_DYN_CTLN register that
+ * is used to update interrupt throttling interval for specified ITR index
+ * and optionally enforces a software interrupt. If the @itr_idx is equal
+ * to I40E_ITR_NONE then no interval change is applied and only @force_swint
+ * parameter is taken into account. If the interval change and enforced
+ * software interrupt are not requested then the built value just enables
+ * appropriate vector interrupt.
+ **/
+static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval,
+ bool force_swint)
{
u32 val;
@@ -2644,23 +2659,33 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
* an event in the PBA anyway so we need to rely on the automask
* to hold pending events for us until the interrupt is re-enabled
*
- * The itr value is reported in microseconds, and the register
- * value is recorded in 2 microsecond units. For this reason we
- * only need to shift by the interval shift - 1 instead of the
- * full value.
+ * We have to shift the given value as it is reported in microseconds
+ * and the register value is recorded in 2 microsecond units.
*/
- itr &= I40E_ITR_MASK;
+ interval >>= 1;
+ /* 1. Enable vector interrupt
+ * 2. Update the interval for the specified ITR index
+ * (I40E_ITR_NONE in the register is used to indicate that
+ * no interval update is requested)
+ */
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) |
- (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1));
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval);
+
+ /* 3. Enforce software interrupt trigger if requested
+ * (These software interrupts rate is limited by ITR2 that is
+ * set to 20K interrupts per second)
+ */
+ if (force_swint)
+ val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK,
+ I40E_SW_ITR);
return val;
}
-/* a small macro to shorten up some long lines */
-#define INTREG I40E_PFINT_DYN_CTLN
-
/* The act of updating the ITR will cause it to immediately trigger. In order
* to prevent this from throwing off adaptive update statistics we defer the
* update so that it can only happen so often. So after either Tx or Rx are
@@ -2679,8 +2704,10 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
struct i40e_q_vector *q_vector)
{
+ enum i40e_dyn_idx itr_idx = I40E_ITR_NONE;
struct i40e_hw *hw = &vsi->back->hw;
- u32 intval;
+ u16 interval = 0;
+ u32 itr_val;
/* If we don't have MSIX, then we only need to re-enable icr0 */
if (!test_bit(I40E_FLAG_MSIX_ENA, vsi->back->flags)) {
@@ -2702,8 +2729,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
*/
if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
@@ -2712,25 +2739,36 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- intval = i40e_buildreg_itr(I40E_TX_ITR,
- q_vector->tx.target_itr);
+ itr_idx = I40E_TX_ITR;
+ interval = q_vector->tx.target_itr;
q_vector->tx.current_itr = q_vector->tx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
/* Rx ITR needs to be increased, third priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* No ITR update, lowest priority */
- intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
if (q_vector->itr_countdown)
q_vector->itr_countdown--;
}
- if (!test_bit(__I40E_VSI_DOWN, vsi->state))
- wr32(hw, INTREG(q_vector->reg_idx), intval);
+ /* Do not update interrupt control register if VSI is down */
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ /* Update ITR interval if necessary and enforce software interrupt
+ * if we are exiting busy poll.
+ */
+ if (q_vector->in_busy_poll) {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, true);
+ q_vector->in_busy_poll = false;
+ } else {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, false);
+ }
+ wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val);
}
/**
@@ -2845,6 +2883,8 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
*/
if (likely(napi_complete_done(napi, work_done)))
i40e_update_enable_itr(vsi, q_vector);
+ else
+ q_vector->in_busy_poll = true;
return min(work_done, budget - 1);
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index abf15067eb5d..2cdc7de6301c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -68,6 +68,7 @@ enum i40e_dyn_idx {
/* these are indexes into ITRN registers */
#define I40E_RX_ITR I40E_IDX_ITR0
#define I40E_TX_ITR I40E_IDX_ITR1
+#define I40E_SW_ITR I40E_IDX_ITR2
/* Supported RSS offloads */
#define I40E_DEFAULT_RSS_HENA ( \
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x ea558de7238bb12c3435c47f0631e9d17bf4a09f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040500-scalping-relatable-1383@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ea558de7238bb12c3435c47f0631e9d17bf4a09f Mon Sep 17 00:00:00 2001
From: Ivan Vecera <ivecera(a)redhat.com>
Date: Sat, 16 Mar 2024 12:38:29 +0100
Subject: [PATCH] i40e: Enforce software interrupt during busy-poll exit
As for ice bug fixed by commit b7306b42beaf ("ice: manage interrupts
during poll exit") followed by commit 23be7075b318 ("ice: fix software
generating extra interrupts") I'm seeing the similar issue also with
i40e driver.
In certain situation when busy-loop is enabled together with adaptive
coalescing, the driver occasionally misses that there are outstanding
descriptors to clean when exiting busy poll.
Try to catch the remaining work by triggering a software interrupt
when exiting busy poll. No extra interrupts will be generated when
busy polling is not used.
The issue was found when running sockperf ping-pong tcp test with
adaptive coalescing and busy poll enabled (50 as value busy_pool
and busy_read sysctl knobs) and results in huge latency spikes
with more than 100000us.
The fix is inspired from the ice driver and do the following:
1) During napi poll exit in case of busy-poll (napo_complete_done()
returns false) this is recorded to q_vector that we were in busy
loop.
2) Extends i40e_buildreg_itr() to be able to add an enforced software
interrupt into built value
2) In i40e_update_enable_itr() enforces a software interrupt trigger
if we are exiting busy poll to catch any pending clean-ups
3) Reuses unused 3rd ITR (interrupt throttle) index and set it to
20K interrupts per second to limit the number of these sw interrupts.
Test results
============
Prior:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2438563; ReceivedMessages=2438562
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2429473; ReceivedMessages=2429473
sockperf: ====> avg-latency=24.571 (std-dev=93.297, mean-ad=4.904, median-ad=1.510, siqr=1.063, cv=3.797, std-error=0.060, 99.0% ci=[24.417, 24.725])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.571 usec
sockperf: Total 2429473 observations; each percentile contains 24294.73 observations
sockperf: ---> <MAX> observation = 103294.331
sockperf: ---> percentile 99.999 = 45.633
sockperf: ---> percentile 99.990 = 37.013
sockperf: ---> percentile 99.900 = 35.910
sockperf: ---> percentile 99.000 = 33.390
sockperf: ---> percentile 90.000 = 28.626
sockperf: ---> percentile 75.000 = 27.741
sockperf: ---> percentile 50.000 = 26.743
sockperf: ---> percentile 25.000 = 25.614
sockperf: ---> <MIN> observation = 12.220
After:
[root@dell-per640-07 net]# sockperf ping-pong -i 10.9.9.1 --tcp -m 1000 --mps=max -t 120
sockperf: == version #3.10-no.git ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)
[ 0] IP = 10.9.9.1 PORT = 11111 # TCP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=119.999 sec; Warm up time=400 msec; SentMessages=2400055; ReceivedMessages=2400054
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=119.549 sec; SentMessages=2391186; ReceivedMessages=2391186
sockperf: ====> avg-latency=24.965 (std-dev=5.934, mean-ad=4.642, median-ad=1.485, siqr=1.067, cv=0.238, std-error=0.004, 99.0% ci=[24.955, 24.975])
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 24.965 usec
sockperf: Total 2391186 observations; each percentile contains 23911.86 observations
sockperf: ---> <MAX> observation = 195.841
sockperf: ---> percentile 99.999 = 45.026
sockperf: ---> percentile 99.990 = 39.009
sockperf: ---> percentile 99.900 = 35.922
sockperf: ---> percentile 99.000 = 33.482
sockperf: ---> percentile 90.000 = 28.902
sockperf: ---> percentile 75.000 = 27.821
sockperf: ---> percentile 50.000 = 26.860
sockperf: ---> percentile 25.000 = 25.685
sockperf: ---> <MIN> observation = 12.277
Fixes: 0bcd952feec7 ("ethernet/intel: consolidate NAPI and NAPI exit")
Reported-by: Hugo Ferreira <hferreir(a)redhat.com>
Reviewed-by: Michal Schmidt <mschmidt(a)redhat.com>
Signed-off-by: Ivan Vecera <ivecera(a)redhat.com>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg(a)intel.com>
Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha(a)intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/i40e/i40e.h b/drivers/net/ethernet/intel/i40e/i40e.h
index ba24f3fa92c3..2fbabcdb5bb5 100644
--- a/drivers/net/ethernet/intel/i40e/i40e.h
+++ b/drivers/net/ethernet/intel/i40e/i40e.h
@@ -955,6 +955,7 @@ struct i40e_q_vector {
struct rcu_head rcu; /* to avoid race with update stats on free */
char name[I40E_INT_NAME_STR_LEN];
bool arm_wb_state;
+ bool in_busy_poll;
int irq_num; /* IRQ assigned to this q_vector */
} ____cacheline_internodealigned_in_smp;
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index f86578857e8a..6576a0081093 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -3911,6 +3911,12 @@ static void i40e_vsi_configure_msix(struct i40e_vsi *vsi)
q_vector->tx.target_itr >> 1);
q_vector->tx.current_itr = q_vector->tx.target_itr;
+ /* Set ITR for software interrupts triggered after exiting
+ * busy-loop polling.
+ */
+ wr32(hw, I40E_PFINT_ITRN(I40E_SW_ITR, vector - 1),
+ I40E_ITR_20K);
+
wr32(hw, I40E_PFINT_RATEN(vector - 1),
i40e_intrl_usec_to_reg(vsi->int_rate_limit));
diff --git a/drivers/net/ethernet/intel/i40e/i40e_register.h b/drivers/net/ethernet/intel/i40e/i40e_register.h
index 14ab642cafdb..432afbb64201 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_register.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_register.h
@@ -333,8 +333,11 @@
#define I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT 3
#define I40E_PFINT_DYN_CTLN_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT)
#define I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT 5
+#define I40E_PFINT_DYN_CTLN_INTERVAL_MASK I40E_MASK(0xFFF, I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT)
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT 24
#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK I40E_MASK(0x1, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_SHIFT)
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT 25
+#define I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK I40E_MASK(0x3, I40E_PFINT_DYN_CTLN_SW_ITR_INDX_SHIFT)
#define I40E_PFINT_ICR0 0x00038780 /* Reset: CORER */
#define I40E_PFINT_ICR0_INTEVENT_SHIFT 0
#define I40E_PFINT_ICR0_INTEVENT_MASK I40E_MASK(0x1, I40E_PFINT_ICR0_INTEVENT_SHIFT)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 0d7177083708..1a12b732818e 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -2630,7 +2630,22 @@ static int i40e_clean_rx_irq(struct i40e_ring *rx_ring, int budget,
return failure ? budget : (int)total_rx_packets;
}
-static inline u32 i40e_buildreg_itr(const int type, u16 itr)
+/**
+ * i40e_buildreg_itr - build a value for writing to I40E_PFINT_DYN_CTLN register
+ * @itr_idx: interrupt throttling index
+ * @interval: interrupt throttling interval value in usecs
+ * @force_swint: force software interrupt
+ *
+ * The function builds a value for I40E_PFINT_DYN_CTLN register that
+ * is used to update interrupt throttling interval for specified ITR index
+ * and optionally enforces a software interrupt. If the @itr_idx is equal
+ * to I40E_ITR_NONE then no interval change is applied and only @force_swint
+ * parameter is taken into account. If the interval change and enforced
+ * software interrupt are not requested then the built value just enables
+ * appropriate vector interrupt.
+ **/
+static u32 i40e_buildreg_itr(enum i40e_dyn_idx itr_idx, u16 interval,
+ bool force_swint)
{
u32 val;
@@ -2644,23 +2659,33 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
* an event in the PBA anyway so we need to rely on the automask
* to hold pending events for us until the interrupt is re-enabled
*
- * The itr value is reported in microseconds, and the register
- * value is recorded in 2 microsecond units. For this reason we
- * only need to shift by the interval shift - 1 instead of the
- * full value.
+ * We have to shift the given value as it is reported in microseconds
+ * and the register value is recorded in 2 microsecond units.
*/
- itr &= I40E_ITR_MASK;
+ interval >>= 1;
+ /* 1. Enable vector interrupt
+ * 2. Update the interval for the specified ITR index
+ * (I40E_ITR_NONE in the register is used to indicate that
+ * no interval update is requested)
+ */
val = I40E_PFINT_DYN_CTLN_INTENA_MASK |
- (type << I40E_PFINT_DYN_CTLN_ITR_INDX_SHIFT) |
- (itr << (I40E_PFINT_DYN_CTLN_INTERVAL_SHIFT - 1));
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_ITR_INDX_MASK, itr_idx) |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_INTERVAL_MASK, interval);
+
+ /* 3. Enforce software interrupt trigger if requested
+ * (These software interrupts rate is limited by ITR2 that is
+ * set to 20K interrupts per second)
+ */
+ if (force_swint)
+ val |= I40E_PFINT_DYN_CTLN_SWINT_TRIG_MASK |
+ I40E_PFINT_DYN_CTLN_SW_ITR_INDX_ENA_MASK |
+ FIELD_PREP(I40E_PFINT_DYN_CTLN_SW_ITR_INDX_MASK,
+ I40E_SW_ITR);
return val;
}
-/* a small macro to shorten up some long lines */
-#define INTREG I40E_PFINT_DYN_CTLN
-
/* The act of updating the ITR will cause it to immediately trigger. In order
* to prevent this from throwing off adaptive update statistics we defer the
* update so that it can only happen so often. So after either Tx or Rx are
@@ -2679,8 +2704,10 @@ static inline u32 i40e_buildreg_itr(const int type, u16 itr)
static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
struct i40e_q_vector *q_vector)
{
+ enum i40e_dyn_idx itr_idx = I40E_ITR_NONE;
struct i40e_hw *hw = &vsi->back->hw;
- u32 intval;
+ u16 interval = 0;
+ u32 itr_val;
/* If we don't have MSIX, then we only need to re-enable icr0 */
if (!test_bit(I40E_FLAG_MSIX_ENA, vsi->back->flags)) {
@@ -2702,8 +2729,8 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
*/
if (q_vector->rx.target_itr < q_vector->rx.current_itr) {
/* Rx ITR needs to be reduced, this is highest priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if ((q_vector->tx.target_itr < q_vector->tx.current_itr) ||
@@ -2712,25 +2739,36 @@ static inline void i40e_update_enable_itr(struct i40e_vsi *vsi,
/* Tx ITR needs to be reduced, this is second priority
* Tx ITR needs to be increased more than Rx, fourth priority
*/
- intval = i40e_buildreg_itr(I40E_TX_ITR,
- q_vector->tx.target_itr);
+ itr_idx = I40E_TX_ITR;
+ interval = q_vector->tx.target_itr;
q_vector->tx.current_itr = q_vector->tx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else if (q_vector->rx.current_itr != q_vector->rx.target_itr) {
/* Rx ITR needs to be increased, third priority */
- intval = i40e_buildreg_itr(I40E_RX_ITR,
- q_vector->rx.target_itr);
+ itr_idx = I40E_RX_ITR;
+ interval = q_vector->rx.target_itr;
q_vector->rx.current_itr = q_vector->rx.target_itr;
q_vector->itr_countdown = ITR_COUNTDOWN_START;
} else {
/* No ITR update, lowest priority */
- intval = i40e_buildreg_itr(I40E_ITR_NONE, 0);
if (q_vector->itr_countdown)
q_vector->itr_countdown--;
}
- if (!test_bit(__I40E_VSI_DOWN, vsi->state))
- wr32(hw, INTREG(q_vector->reg_idx), intval);
+ /* Do not update interrupt control register if VSI is down */
+ if (test_bit(__I40E_VSI_DOWN, vsi->state))
+ return;
+
+ /* Update ITR interval if necessary and enforce software interrupt
+ * if we are exiting busy poll.
+ */
+ if (q_vector->in_busy_poll) {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, true);
+ q_vector->in_busy_poll = false;
+ } else {
+ itr_val = i40e_buildreg_itr(itr_idx, interval, false);
+ }
+ wr32(hw, I40E_PFINT_DYN_CTLN(q_vector->reg_idx), itr_val);
}
/**
@@ -2845,6 +2883,8 @@ int i40e_napi_poll(struct napi_struct *napi, int budget)
*/
if (likely(napi_complete_done(napi, work_done)))
i40e_update_enable_itr(vsi, q_vector);
+ else
+ q_vector->in_busy_poll = true;
return min(work_done, budget - 1);
}
diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.h b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
index abf15067eb5d..2cdc7de6301c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.h
@@ -68,6 +68,7 @@ enum i40e_dyn_idx {
/* these are indexes into ITRN registers */
#define I40E_RX_ITR I40E_IDX_ITR0
#define I40E_TX_ITR I40E_IDX_ITR1
+#define I40E_SW_ITR I40E_IDX_ITR2
/* Supported RSS offloads */
#define I40E_DEFAULT_RSS_HENA ( \
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x ef15ddeeb6bee87c044bf7754fac524545bf71e8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040558-trustee-garage-e392@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ef15ddeeb6bee87c044bf7754fac524545bf71e8 Mon Sep 17 00:00:00 2001
From: Aleksandr Mishin <amishin(a)t-argos.ru>
Date: Thu, 28 Mar 2024 19:55:05 +0300
Subject: [PATCH] octeontx2-af: Add array index check
In rvu_map_cgx_lmac_pf() the 'iter', which is used as an array index, can reach
value (up to 14) that exceed the size (MAX_LMAC_COUNT = 8) of the array.
Fix this bug by adding 'iter' value check.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 91c6945ea1f9 ("octeontx2-af: cn10k: Add RPM MAC support")
Signed-off-by: Aleksandr Mishin <amishin(a)t-argos.ru>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
index 72e060cf6b61..e9bf9231b018 100644
--- a/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
+++ b/drivers/net/ethernet/marvell/octeontx2/af/rvu_cgx.c
@@ -160,6 +160,8 @@ static int rvu_map_cgx_lmac_pf(struct rvu *rvu)
continue;
lmac_bmap = cgx_get_lmac_bmap(rvu_cgx_pdata(cgx, rvu));
for_each_set_bit(iter, &lmac_bmap, rvu->hw->lmac_per_cgx) {
+ if (iter >= MAX_LMAC_COUNT)
+ continue;
lmac = cgx_get_lmacid(rvu_cgx_pdata(cgx, rvu),
iter);
rvu->pf2cgxlmac_map[pf] = cgxlmac_id_to_bmap(cgx, lmac);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040539-fiber-dormitory-6321@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040536-paying-life-0a32@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040531-scuba-germinate-e982@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040528-comply-variety-c87f@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040524-olive-outbreak-88c8@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 861e8086029e003305750b4126ecd6617465f5c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040521-deduce-blast-00e2@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 861e8086029e003305750b4126ecd6617465f5c7 Mon Sep 17 00:00:00 2001
From: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Date: Sun, 3 Mar 2024 12:51:32 +0200
Subject: [PATCH] e1000e: move force SMBUS from enable ulp function to avoid
PHY loss issue
Forcing SMBUS inside the ULP enabling flow leads to sporadic PHY loss on
some systems. It is suspected to be caused by initiating PHY transactions
before the interface settles.
Separating this configuration from the ULP enabling flow and moving it to
the shutdown function allows enough time for the interface to settle and
avoids adding a delay.
Fixes: 6607c99e7034 ("e1000e: i219 - fix to enable both ULP and EEE in Sx state")
Co-developed-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Dima Ruinskiy <dima.ruinskiy(a)intel.com>
Signed-off-by: Vitaly Lifshits <vitaly.lifshits(a)intel.com>
Tested-by: Naama Meir <naamax.meir(a)linux.intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen(a)intel.com>
diff --git a/drivers/net/ethernet/intel/e1000e/ich8lan.c b/drivers/net/ethernet/intel/e1000e/ich8lan.c
index d8e97669f31b..f9e94be36e97 100644
--- a/drivers/net/ethernet/intel/e1000e/ich8lan.c
+++ b/drivers/net/ethernet/intel/e1000e/ich8lan.c
@@ -1165,25 +1165,6 @@ s32 e1000_enable_ulp_lpt_lp(struct e1000_hw *hw, bool to_sx)
if (ret_val)
goto out;
- /* Switching PHY interface always returns MDI error
- * so disable retry mechanism to avoid wasting time
- */
- e1000e_disable_phy_retry(hw);
-
- /* Force SMBus mode in PHY */
- ret_val = e1000_read_phy_reg_hv_locked(hw, CV_SMB_CTRL, &phy_reg);
- if (ret_val)
- goto release;
- phy_reg |= CV_SMB_CTRL_FORCE_SMBUS;
- e1000_write_phy_reg_hv_locked(hw, CV_SMB_CTRL, phy_reg);
-
- e1000e_enable_phy_retry(hw);
-
- /* Force SMBus mode in MAC */
- mac_reg = er32(CTRL_EXT);
- mac_reg |= E1000_CTRL_EXT_FORCE_SMBUS;
- ew32(CTRL_EXT, mac_reg);
-
/* Si workaround for ULP entry flow on i127/rev6 h/w. Enable
* LPLU and disable Gig speed when entering ULP
*/
diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index cc8c531ec3df..3692fce20195 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6623,6 +6623,7 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
struct e1000_hw *hw = &adapter->hw;
u32 ctrl, ctrl_ext, rctl, status, wufc;
int retval = 0;
+ u16 smb_ctrl;
/* Runtime suspend should only enable wakeup for link changes */
if (runtime)
@@ -6696,6 +6697,23 @@ static int __e1000_shutdown(struct pci_dev *pdev, bool runtime)
if (retval)
return retval;
}
+
+ /* Force SMBUS to allow WOL */
+ /* Switching PHY interface always returns MDI error
+ * so disable retry mechanism to avoid wasting time
+ */
+ e1000e_disable_phy_retry(hw);
+
+ e1e_rphy(hw, CV_SMB_CTRL, &smb_ctrl);
+ smb_ctrl |= CV_SMB_CTRL_FORCE_SMBUS;
+ e1e_wphy(hw, CV_SMB_CTRL, smb_ctrl);
+
+ e1000e_enable_phy_retry(hw);
+
+ /* Force SMBus mode in MAC */
+ ctrl_ext = er32(CTRL_EXT);
+ ctrl_ext |= E1000_CTRL_EXT_FORCE_SMBUS;
+ ew32(CTRL_EXT, ctrl_ext);
}
/* Ensure that the appropriate bits are set in LPI_CTRL
On Fri, Apr 5, 2024 at 12:34 AM Ramon de C Valle <rcvalle(a)google.com> wrote:
>
> Sorry about that, I should've done this already. In addition to the link you added above, here's a tracking issue for KCFI support for Rust:
> https://github.com/rust-lang/rust/issues/123479
No worries! Thanks for taking the time creating the new tracking issue
-- linked now from our issue #2.
I also linked your update here and noted the benchmarking needs /
regressions report request too.
Cheers,
Miguel
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033030-steam-implosion-5c12@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x cbc17e7802f5de37c7c262204baadfad3f7f99e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040527-action-late-9749@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cbc17e7802f5de37c7c262204baadfad3f7f99e5 Mon Sep 17 00:00:00 2001
From: Wei Fang <wei.fang(a)nxp.com>
Date: Thu, 28 Mar 2024 15:59:29 +0000
Subject: [PATCH] net: fec: Set mac_managed_pm during probe
Setting mac_managed_pm during interface up is too late.
In situations where the link is not brought up yet and the system suspends
the regular PHY power management will run. Since the FEC ETHEREN control
bit is cleared (automatically) on suspend the controller is off in resume.
When the regular PHY power management resume path runs in this context it
will write to the MII_DATA register but nothing will be transmitted on the
MDIO bus.
This can be observed by the following log:
fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
The data written will however remain in the MII_DATA register.
When the link later is set to administrative up it will trigger a call to
fec_restart() which will restore the MII_SPEED register. This triggers the
quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
driven MDIO with polled IO") causing an extra MII_EVENT.
This extra event desynchronizes all the MDIO register reads, causing them
to complete too early. Leading all reads to read as 0 because
fec_enet_mdio_wait() returns too early.
When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
the PHY to be initialized incorrectly and the PHY will not transmit any
ethernet signal in this state. It cannot be brought out of this state
without a power cycle of the PHY.
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.s…
Signed-off-by: Wei Fang <wei.fang(a)nxp.com>
[jernberg: commit message]
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index d7693fdf640d..8bd213da8fb6 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2454,8 +2454,6 @@ static int fec_enet_mii_probe(struct net_device *ndev)
fep->link = 0;
fep->full_duplex = 0;
- phy_dev->mac_managed_pm = true;
-
phy_attached_info(phy_dev);
return 0;
@@ -2467,10 +2465,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
bool suppress_preamble = false;
+ struct phy_device *phydev;
struct device_node *node;
int err = -ENXIO;
u32 mii_speed, holdtime;
u32 bus_freq;
+ int addr;
/*
* The i.MX28 dual fec interfaces are not equal.
@@ -2584,6 +2584,13 @@ static int fec_enet_mii_init(struct platform_device *pdev)
goto err_out_free_mdiobus;
of_node_put(node);
+ /* find all the PHY devices on the bus and set mac_managed_pm to true */
+ for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
+ phydev = mdiobus_get_phy(fep->mii_bus, addr);
+ if (phydev)
+ phydev->mac_managed_pm = true;
+ }
+
mii_cnt++;
/* save fec0 mii_bus */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x cbc17e7802f5de37c7c262204baadfad3f7f99e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040525-buccaneer-rehab-aa0f@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cbc17e7802f5de37c7c262204baadfad3f7f99e5 Mon Sep 17 00:00:00 2001
From: Wei Fang <wei.fang(a)nxp.com>
Date: Thu, 28 Mar 2024 15:59:29 +0000
Subject: [PATCH] net: fec: Set mac_managed_pm during probe
Setting mac_managed_pm during interface up is too late.
In situations where the link is not brought up yet and the system suspends
the regular PHY power management will run. Since the FEC ETHEREN control
bit is cleared (automatically) on suspend the controller is off in resume.
When the regular PHY power management resume path runs in this context it
will write to the MII_DATA register but nothing will be transmitted on the
MDIO bus.
This can be observed by the following log:
fec 5b040000.ethernet eth0: MDIO read timeout
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: dpm_run_callback(): mdio_bus_phy_resume+0x0/0xc8 returns -110
Microchip LAN87xx T1 5b040000.ethernet-1:04: PM: failed to resume: error -110
The data written will however remain in the MII_DATA register.
When the link later is set to administrative up it will trigger a call to
fec_restart() which will restore the MII_SPEED register. This triggers the
quirk explained in f166f890c8f0 ("net: ethernet: fec: Replace interrupt
driven MDIO with polled IO") causing an extra MII_EVENT.
This extra event desynchronizes all the MDIO register reads, causing them
to complete too early. Leading all reads to read as 0 because
fec_enet_mdio_wait() returns too early.
When a Microchip LAN8700R PHY is connected to the FEC, the 0 reads causes
the PHY to be initialized incorrectly and the PHY will not transmit any
ethernet signal in this state. It cannot be brought out of this state
without a power cycle of the PHY.
Fixes: 557d5dc83f68 ("net: fec: use mac-managed PHY PM")
Closes: https://lore.kernel.org/netdev/1f45bdbe-eab1-4e59-8f24-add177590d27@actia.s…
Signed-off-by: Wei Fang <wei.fang(a)nxp.com>
[jernberg: commit message]
Signed-off-by: John Ernberg <john.ernberg(a)actia.se>
Link: https://lore.kernel.org/r/20240328155909.59613-2-john.ernberg@actia.se
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/freescale/fec_main.c b/drivers/net/ethernet/freescale/fec_main.c
index d7693fdf640d..8bd213da8fb6 100644
--- a/drivers/net/ethernet/freescale/fec_main.c
+++ b/drivers/net/ethernet/freescale/fec_main.c
@@ -2454,8 +2454,6 @@ static int fec_enet_mii_probe(struct net_device *ndev)
fep->link = 0;
fep->full_duplex = 0;
- phy_dev->mac_managed_pm = true;
-
phy_attached_info(phy_dev);
return 0;
@@ -2467,10 +2465,12 @@ static int fec_enet_mii_init(struct platform_device *pdev)
struct net_device *ndev = platform_get_drvdata(pdev);
struct fec_enet_private *fep = netdev_priv(ndev);
bool suppress_preamble = false;
+ struct phy_device *phydev;
struct device_node *node;
int err = -ENXIO;
u32 mii_speed, holdtime;
u32 bus_freq;
+ int addr;
/*
* The i.MX28 dual fec interfaces are not equal.
@@ -2584,6 +2584,13 @@ static int fec_enet_mii_init(struct platform_device *pdev)
goto err_out_free_mdiobus;
of_node_put(node);
+ /* find all the PHY devices on the bus and set mac_managed_pm to true */
+ for (addr = 0; addr < PHY_MAX_ADDR; addr++) {
+ phydev = mdiobus_get_phy(fep->mii_bus, addr);
+ if (phydev)
+ phydev->mac_managed_pm = true;
+ }
+
mii_cnt++;
/* save fec0 mii_bus */
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033027-tapered-curly-3516@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
The patch below does not apply to the 6.7-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.7.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033028-lumpiness-pouch-475f@gregkh' --subject-prefix 'PATCH 6.7.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 1bc83a019bbe268be3526406245ec28c2458a518
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040515-cornbread-bottling-0651@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bc83a019bbe268be3526406245ec28c2458a518 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Wed, 3 Apr 2024 19:35:30 +0200
Subject: [PATCH] netfilter: nf_tables: discard table flag update with pending
basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e02d0ae4f436..d89d77946719 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1209,10 +1209,11 @@ static bool nft_table_pending_update(const struct nft_ctx *ctx)
return true;
list_for_each_entry(trans, &nft_net->commit_list, list) {
- if ((trans->msg_type == NFT_MSG_NEWCHAIN ||
- trans->msg_type == NFT_MSG_DELCHAIN) &&
- trans->ctx.table == ctx->table &&
- nft_trans_chain_update(trans))
+ if (trans->ctx.table == ctx->table &&
+ ((trans->msg_type == NFT_MSG_NEWCHAIN &&
+ nft_trans_chain_update(trans)) ||
+ (trans->msg_type == NFT_MSG_DELCHAIN &&
+ nft_is_base_chain(trans->ctx.chain))))
return true;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 1bc83a019bbe268be3526406245ec28c2458a518
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040515-chatroom-cathedral-e974@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bc83a019bbe268be3526406245ec28c2458a518 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Wed, 3 Apr 2024 19:35:30 +0200
Subject: [PATCH] netfilter: nf_tables: discard table flag update with pending
basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e02d0ae4f436..d89d77946719 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1209,10 +1209,11 @@ static bool nft_table_pending_update(const struct nft_ctx *ctx)
return true;
list_for_each_entry(trans, &nft_net->commit_list, list) {
- if ((trans->msg_type == NFT_MSG_NEWCHAIN ||
- trans->msg_type == NFT_MSG_DELCHAIN) &&
- trans->ctx.table == ctx->table &&
- nft_trans_chain_update(trans))
+ if (trans->ctx.table == ctx->table &&
+ ((trans->msg_type == NFT_MSG_NEWCHAIN &&
+ nft_trans_chain_update(trans)) ||
+ (trans->msg_type == NFT_MSG_DELCHAIN &&
+ nft_is_base_chain(trans->ctx.chain))))
return true;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 1bc83a019bbe268be3526406245ec28c2458a518
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040514-browse-cabana-cb93@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bc83a019bbe268be3526406245ec28c2458a518 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Wed, 3 Apr 2024 19:35:30 +0200
Subject: [PATCH] netfilter: nf_tables: discard table flag update with pending
basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e02d0ae4f436..d89d77946719 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1209,10 +1209,11 @@ static bool nft_table_pending_update(const struct nft_ctx *ctx)
return true;
list_for_each_entry(trans, &nft_net->commit_list, list) {
- if ((trans->msg_type == NFT_MSG_NEWCHAIN ||
- trans->msg_type == NFT_MSG_DELCHAIN) &&
- trans->ctx.table == ctx->table &&
- nft_trans_chain_update(trans))
+ if (trans->ctx.table == ctx->table &&
+ ((trans->msg_type == NFT_MSG_NEWCHAIN &&
+ nft_trans_chain_update(trans)) ||
+ (trans->msg_type == NFT_MSG_DELCHAIN &&
+ nft_is_base_chain(trans->ctx.chain))))
return true;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 1bc83a019bbe268be3526406245ec28c2458a518
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040513-slimness-statistic-1bd4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1bc83a019bbe268be3526406245ec28c2458a518 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Wed, 3 Apr 2024 19:35:30 +0200
Subject: [PATCH] netfilter: nf_tables: discard table flag update with pending
basechain deletion
Hook unregistration is deferred to the commit phase, same occurs with
hook updates triggered by the table dormant flag. When both commands are
combined, this results in deleting a basechain while leaving its hook
still registered in the core.
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index e02d0ae4f436..d89d77946719 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -1209,10 +1209,11 @@ static bool nft_table_pending_update(const struct nft_ctx *ctx)
return true;
list_for_each_entry(trans, &nft_net->commit_list, list) {
- if ((trans->msg_type == NFT_MSG_NEWCHAIN ||
- trans->msg_type == NFT_MSG_DELCHAIN) &&
- trans->ctx.table == ctx->table &&
- nft_trans_chain_update(trans))
+ if (trans->ctx.table == ctx->table &&
+ ((trans->msg_type == NFT_MSG_NEWCHAIN &&
+ nft_trans_chain_update(trans)) ||
+ (trans->msg_type == NFT_MSG_DELCHAIN &&
+ nft_is_base_chain(trans->ctx.chain))))
return true;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 994209ddf4f430946f6247616b2e33d179243769
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040556-dexterity-handwrite-98fd@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 994209ddf4f430946f6247616b2e33d179243769 Mon Sep 17 00:00:00 2001
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
Date: Mon, 1 Apr 2024 00:33:02 +0200
Subject: [PATCH] netfilter: nf_tables: reject new basechain after table flag
update
When dormant flag is toggled, hooks are disabled in the commit phase by
iterating over current chains in table (existing and new).
The following configuration allows for an inconsistent state:
add table x
add chain x y { type filter hook input priority 0; }
add table x { flags dormant; }
add chain x w { type filter hook input priority 1; }
which triggers the following warning when trying to unregister chain w
which is already unregistered.
[ 127.322252] WARNING: CPU: 7 PID: 1211 at net/netfilter/core.c:50 1 __nf_unregister_net_hook+0x21a/0x260
[...]
[ 127.322519] Call Trace:
[ 127.322521] <TASK>
[ 127.322524] ? __warn+0x9f/0x1a0
[ 127.322531] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322537] ? report_bug+0x1b1/0x1e0
[ 127.322545] ? handle_bug+0x3c/0x70
[ 127.322552] ? exc_invalid_op+0x17/0x40
[ 127.322556] ? asm_exc_invalid_op+0x1a/0x20
[ 127.322563] ? kasan_save_free_info+0x3b/0x60
[ 127.322570] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322577] ? __nf_unregister_net_hook+0x21a/0x260
[ 127.322583] ? __nf_unregister_net_hook+0x6a/0x260
[ 127.322590] ? __nf_tables_unregister_hook+0x8a/0xe0 [nf_tables]
[ 127.322655] nft_table_disable+0x75/0xf0 [nf_tables]
[ 127.322717] nf_tables_commit+0x2571/0x2620 [nf_tables]
Fixes: 179d9ba5559a ("netfilter: nf_tables: fix table flag updates")
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 17bf53cd0e84..1eb51bf24fc2 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2449,6 +2449,9 @@ static int nf_tables_addchain(struct nft_ctx *ctx, u8 family, u8 genmask,
struct nft_stats __percpu *stats = NULL;
struct nft_chain_hook hook = {};
+ if (table->flags & __NFT_TABLE_F_UPDATE)
+ return -EINVAL;
+
if (flags & NFT_CHAIN_BINDING)
return -EOPNOTSUPP;
Two KVM x86 backports for 5.15. Patch 2 is the primary motivation (fix
for potential guest data corruption after live migration).
Patch 1 is a (very) soft dependency to resolve a conflict. It's not strictly
necessary (manually resolving the conflict wouldn't be difficult), but it
is a fix that has been in upstream for a long time. The only reason I didn't
tag it for stable from the get-go is that the bug it fixes is very
theoretical. At this point, the odds of the patch causing problems are
lower than the odds of me botching a manual backport.
Sean Christopherson (2):
KVM: x86: Bail to userspace if emulation of atomic user access faults
KVM: x86: Mark target gfn of emulated atomic instruction as dirty
arch/x86/kvm/x86.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
base-commit: 9465fef4ae351749f7068da8c78af4ca27e61928
--
2.44.0.478.gd926399ef9-goog
From: Ye Zhang <ye.zhang(a)rock-chips.com>
The issue occurs when the devfreq cooling device uses the EM power model
and the get_real_power() callback is provided by the driver.
The EM power table is sorted ascending,can't index the table by cooling
device state,so convert cooling state to performance state by
dfc->max_state - dfc->capped_state.
Fixes: 615510fe13bd ("thermal: devfreq_cooling: remove old power model and use EM")
Cc: 5.11+ <stable(a)vger.kernel.org> # 5.11+
Signed-off-by: Ye Zhang <ye.zhang(a)rock-chips.com>
Reviewed-by: Dhruva Gole <d-gole(a)ti.com>
Reviewed-by: Lukasz Luba <lukasz.luba(a)arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Signed-off-by: Lukasz Luba <lukasz.luba(a)arm.com>
---
Hi Greg,
I have solved small backporting conflict to that v5.15.
The patch is based on tag v5.15.99 and it's for this
failing backport:
https://lore.kernel.org/stable/2024033050-imitation-unmixed-ef53@gregkh/
Regards,
Lukasz Luba
drivers/thermal/devfreq_cooling.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/devfreq_cooling.c b/drivers/thermal/devfreq_cooling.c
index d38a80adec733..5be79b5d788e5 100644
--- a/drivers/thermal/devfreq_cooling.c
+++ b/drivers/thermal/devfreq_cooling.c
@@ -199,7 +199,7 @@ static int devfreq_cooling_get_requested_power(struct thermal_cooling_device *cd
res = dfc->power_ops->get_real_power(df, power, freq, voltage);
if (!res) {
- state = dfc->capped_state;
+ state = dfc->max_state - dfc->capped_state;
dfc->res_util = dfc->em_pd->table[state].power;
dfc->res_util *= SCALE_ERROR_MITIGATION;
--
2.25.1
From: Min Li <min15.li(a)samsung.com>
[ Upstream commit 6f64f866aa1ae6975c95d805ed51d7e9433a0016]
Before calling add partition or resize partition, there is no check
on whether the length is aligned with the logical block size.
If the logical block size of the disk is larger than 512 bytes,
then the partition size maybe not the multiple of the logical block size,
and when the last sector is read, bio_truncate() will adjust the bio size,
resulting in an IO error if the size of the read command is smaller than
the logical block size.If integrity data is supported, this will also
result in a null pointer dereference when calling bio_integrity_free.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Min Li <min15.li(a)samsung.com>
Reviewed-by: Damien Le Moal <dlemoal(a)kernel.org>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20230629142517.121241-1-min15.li@samsung.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
Signed-off-by: Ashwin Dayanand Kamat <ashwin.kamat(a)broadcom.com>
---
block/ioctl.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)
diff --git a/block/ioctl.c b/block/ioctl.c
index e7eed7dad..c490d67fe 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -17,7 +17,7 @@ static int blkpg_do_ioctl(struct block_device *bdev,
struct blkpg_partition __user *upart, int op)
{
struct blkpg_partition p;
- long long start, length;
+ sector_t start, length;
if (!capable(CAP_SYS_ADMIN))
return -EACCES;
@@ -32,6 +32,12 @@ static int blkpg_do_ioctl(struct block_device *bdev,
if (op == BLKPG_DEL_PARTITION)
return bdev_del_partition(bdev, p.pno);
+ if (p.start < 0 || p.length <= 0 || p.start + p.length < 0)
+ return -EINVAL;
+ /* Check that the partition is aligned to the block size */
+ if (!IS_ALIGNED(p.start | p.length, bdev_logical_block_size(bdev)))
+ return -EINVAL;
+
start = p.start >> SECTOR_SHIFT;
length = p.length >> SECTOR_SHIFT;
@@ -46,9 +52,6 @@ static int blkpg_do_ioctl(struct block_device *bdev,
switch (op) {
case BLKPG_ADD_PARTITION:
- /* check if partition is aligned to blocksize */
- if (p.start & (bdev_logical_block_size(bdev) - 1))
- return -EINVAL;
return bdev_add_partition(bdev, p.pno, start, length);
case BLKPG_RESIZE_PARTITION:
return bdev_resize_partition(bdev, p.pno, start, length);
--
2.35.6
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 4535e1a4174c4111d92c5a9a21e542d232e0fcaa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024033029-roast-pajamas-3c55@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4535e1a4174c4111d92c5a9a21e542d232e0fcaa Mon Sep 17 00:00:00 2001
From: "Borislav Petkov (AMD)" <bp(a)alien8.de>
Date: Thu, 28 Mar 2024 13:59:05 +0100
Subject: [PATCH] x86/bugs: Fix the SRSO mitigation on Zen3/4
The original version of the mitigation would patch in the calls to the
untraining routines directly. That is, the alternative() in UNTRAIN_RET
will patch in the CALL to srso_alias_untrain_ret() directly.
However, even if commit e7c25c441e9e ("x86/cpu: Cleanup the untrain
mess") meant well in trying to clean up the situation, due to micro-
architectural reasons, the untraining routine srso_alias_untrain_ret()
must be the target of a CALL instruction and not of a JMP instruction as
it is done now.
Reshuffle the alternative macros to accomplish that.
Fixes: e7c25c441e9e ("x86/cpu: Cleanup the untrain mess")
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Ingo Molnar <mingo(a)kernel.org>
Cc: stable(a)kernel.org
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 076bf8dee702..25466c4d2134 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -14,6 +14,7 @@
#include <asm/asm.h>
#include <asm/fred.h>
#include <asm/gsseg.h>
+#include <asm/nospec-branch.h>
#ifndef CONFIG_X86_CMPXCHG64
extern void cmpxchg8b_emu(void);
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index fc3a8a3c7ffe..170c89ed22fc 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -262,11 +262,20 @@
.Lskip_rsb_\@:
.endm
+/*
+ * The CALL to srso_alias_untrain_ret() must be patched in directly at
+ * the spot where untraining must be done, ie., srso_alias_untrain_ret()
+ * must be the target of a CALL instruction instead of indirectly
+ * jumping to a wrapper which then calls it. Therefore, this macro is
+ * called outside of __UNTRAIN_RET below, for the time being, before the
+ * kernel can support nested alternatives with arbitrary nesting.
+ */
+.macro CALL_UNTRAIN_RET
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
-#define CALL_UNTRAIN_RET "call entry_untrain_ret"
-#else
-#define CALL_UNTRAIN_RET ""
+ ALTERNATIVE_2 "", "call entry_untrain_ret", X86_FEATURE_UNRET, \
+ "call srso_alias_untrain_ret", X86_FEATURE_SRSO_ALIAS
#endif
+.endm
/*
* Mitigate RETBleed for AMD/Hygon Zen uarch. Requires KERNEL CR3 because the
@@ -282,8 +291,8 @@
.macro __UNTRAIN_RET ibpb_feature, call_depth_insns
#if defined(CONFIG_MITIGATION_RETHUNK) || defined(CONFIG_MITIGATION_IBPB_ENTRY)
VALIDATE_UNRET_END
- ALTERNATIVE_3 "", \
- CALL_UNTRAIN_RET, X86_FEATURE_UNRET, \
+ CALL_UNTRAIN_RET
+ ALTERNATIVE_2 "", \
"call entry_ibpb", \ibpb_feature, \
__stringify(\call_depth_insns), X86_FEATURE_CALL_DEPTH
#endif
@@ -342,6 +351,8 @@ extern void retbleed_return_thunk(void);
static inline void retbleed_return_thunk(void) {}
#endif
+extern void srso_alias_untrain_ret(void);
+
#ifdef CONFIG_MITIGATION_SRSO
extern void srso_return_thunk(void);
extern void srso_alias_return_thunk(void);
diff --git a/arch/x86/lib/retpoline.S b/arch/x86/lib/retpoline.S
index 721b528da9ac..02cde194a99e 100644
--- a/arch/x86/lib/retpoline.S
+++ b/arch/x86/lib/retpoline.S
@@ -163,6 +163,7 @@ SYM_CODE_START_NOALIGN(srso_alias_untrain_ret)
lfence
jmp srso_alias_return_thunk
SYM_FUNC_END(srso_alias_untrain_ret)
+__EXPORT_THUNK(srso_alias_untrain_ret)
.popsection
.pushsection .text..__x86.rethunk_safe
@@ -224,10 +225,12 @@ SYM_CODE_START(srso_return_thunk)
SYM_CODE_END(srso_return_thunk)
#define JMP_SRSO_UNTRAIN_RET "jmp srso_untrain_ret"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "jmp srso_alias_untrain_ret"
#else /* !CONFIG_MITIGATION_SRSO */
+/* Dummy for the alternative in CALL_UNTRAIN_RET. */
+SYM_CODE_START(srso_alias_untrain_ret)
+ RET
+SYM_FUNC_END(srso_alias_untrain_ret)
#define JMP_SRSO_UNTRAIN_RET "ud2"
-#define JMP_SRSO_ALIAS_UNTRAIN_RET "ud2"
#endif /* CONFIG_MITIGATION_SRSO */
#ifdef CONFIG_MITIGATION_UNRET_ENTRY
@@ -319,9 +322,7 @@ SYM_FUNC_END(retbleed_untrain_ret)
#if defined(CONFIG_MITIGATION_UNRET_ENTRY) || defined(CONFIG_MITIGATION_SRSO)
SYM_FUNC_START(entry_untrain_ret)
- ALTERNATIVE_2 JMP_RETBLEED_UNTRAIN_RET, \
- JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO, \
- JMP_SRSO_ALIAS_UNTRAIN_RET, X86_FEATURE_SRSO_ALIAS
+ ALTERNATIVE JMP_RETBLEED_UNTRAIN_RET, JMP_SRSO_UNTRAIN_RET, X86_FEATURE_SRSO
SYM_FUNC_END(entry_untrain_ret)
__EXPORT_THUNK(entry_untrain_ret)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 803de9000f334b771afacb6ff3e78622916668b0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024032732-prowess-craving-9106@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
803de9000f33 ("mm, vmscan: prevent infinite loop for costly GFP_NOIO | __GFP_RETRY_MAYFAIL allocations")
f98a497e1f16 ("mm: compaction: remove unnecessary is_via_compact_memory() checks")
e8606320e9af ("mm: compaction: refactor __compaction_suitable()")
fe573327ffb1 ("tracing: incorrect gfp_t conversion")
cff387d6a294 ("mm: compaction: make compaction_zonelist_suitable return false when COMPACT_SUCCESS")
9353ffa6e9e9 ("kasan, page_alloc: allow skipping memory init for HW_TAGS")
53ae233c30a6 ("kasan, page_alloc: allow skipping unpoisoning for HW_TAGS")
f49d9c5bb15c ("kasan, mm: only define ___GFP_SKIP_KASAN_POISON with HW_TAGS")
e9d0ca922816 ("kasan, page_alloc: rework kasan_unpoison_pages call site")
7e3cbba65de2 ("kasan, page_alloc: move kernel_init_free_pages in post_alloc_hook")
89b271163328 ("kasan, page_alloc: move SetPageSkipKASanPoison in post_alloc_hook")
9294b1281d0a ("kasan, page_alloc: combine tag_clear_highpage calls in post_alloc_hook")
b42090ae6f3a ("kasan, page_alloc: merge kasan_alloc_pages into post_alloc_hook")
b8491b9052fe ("kasan, page_alloc: refactor init checks in post_alloc_hook")
1c0e5b24f117 ("kasan: only apply __GFP_ZEROTAGS when memory is zeroed")
c82ce3195fd1 ("mm: clarify __GFP_ZEROTAGS comment")
7c13c163e036 ("kasan, page_alloc: merge kasan_free_pages into free_pages_prepare")
5b2c07138cbd ("kasan, page_alloc: move tag_clear_highpage out of kernel_init_free_pages")
94ae8b83fefc ("kasan, page_alloc: deduplicate should_skip_kasan_poison")
3bf03b9a0839 ("Merge branch 'akpm' (patches from Andrew)")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 803de9000f334b771afacb6ff3e78622916668b0 Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka(a)suse.cz>
Date: Wed, 21 Feb 2024 12:43:58 +0100
Subject: [PATCH] mm, vmscan: prevent infinite loop for costly GFP_NOIO |
__GFP_RETRY_MAYFAIL allocations
Sven reports an infinite loop in __alloc_pages_slowpath() for costly order
__GFP_RETRY_MAYFAIL allocations that are also GFP_NOIO. Such combination
can happen in a suspend/resume context where a GFP_KERNEL allocation can
have __GFP_IO masked out via gfp_allowed_mask.
Quoting Sven:
1. try to do a "costly" allocation (order > PAGE_ALLOC_COSTLY_ORDER)
with __GFP_RETRY_MAYFAIL set.
2. page alloc's __alloc_pages_slowpath tries to get a page from the
freelist. This fails because there is nothing free of that costly
order.
3. page alloc tries to reclaim by calling __alloc_pages_direct_reclaim,
which bails out because a zone is ready to be compacted; it pretends
to have made a single page of progress.
4. page alloc tries to compact, but this always bails out early because
__GFP_IO is not set (it's not passed by the snd allocator, and even
if it were, we are suspending so the __GFP_IO flag would be cleared
anyway).
5. page alloc believes reclaim progress was made (because of the
pretense in item 3) and so it checks whether it should retry
compaction. The compaction retry logic thinks it should try again,
because:
a) reclaim is needed because of the early bail-out in item 4
b) a zonelist is suitable for compaction
6. goto 2. indefinite stall.
(end quote)
The immediate root cause is confusing the COMPACT_SKIPPED returned from
__alloc_pages_direct_compact() (step 4) due to lack of __GFP_IO to be
indicating a lack of order-0 pages, and in step 5 evaluating that in
should_compact_retry() as a reason to retry, before incrementing and
limiting the number of retries. There are however other places that
wrongly assume that compaction can happen while we lack __GFP_IO.
To fix this, introduce gfp_compaction_allowed() to abstract the __GFP_IO
evaluation and switch the open-coded test in try_to_compact_pages() to use
it.
Also use the new helper in:
- compaction_ready(), which will make reclaim not bail out in step 3, so
there's at least one attempt to actually reclaim, even if chances are
small for a costly order
- in_reclaim_compaction() which will make should_continue_reclaim()
return false and we don't over-reclaim unnecessarily
- in __alloc_pages_slowpath() to set a local variable can_compact,
which is then used to avoid retrying reclaim/compaction for costly
allocations (step 5) if we can't compact and also to skip the early
compaction attempt that we do in some cases
Link: https://lkml.kernel.org/r/20240221114357.13655-2-vbabka@suse.cz
Fixes: 3250845d0526 ("Revert "mm, oom: prevent premature OOM killer invocation for high order request"")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Sven van Ashbrook <svenva(a)chromium.org>
Closes: https://lore.kernel.org/all/CAG-rBihs_xMKb3wrMO1%2B-%2Bp4fowP9oy1pa_OTkfxBz…
Tested-by: Karthikeyan Ramasubramanian <kramasub(a)chromium.org>
Cc: Brian Geffon <bgeffon(a)google.com>
Cc: Curtis Malainey <cujomalainey(a)chromium.org>
Cc: Jaroslav Kysela <perex(a)perex.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Takashi Iwai <tiwai(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index de292a007138..e2a916cf29c4 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -353,6 +353,15 @@ static inline bool gfp_has_io_fs(gfp_t gfp)
return (gfp & (__GFP_IO | __GFP_FS)) == (__GFP_IO | __GFP_FS);
}
+/*
+ * Check if the gfp flags allow compaction - GFP_NOIO is a really
+ * tricky context because the migration might require IO.
+ */
+static inline bool gfp_compaction_allowed(gfp_t gfp_mask)
+{
+ return IS_ENABLED(CONFIG_COMPACTION) && (gfp_mask & __GFP_IO);
+}
+
extern gfp_t vma_thp_gfp_mask(struct vm_area_struct *vma);
#ifdef CONFIG_CONTIG_ALLOC
diff --git a/mm/compaction.c b/mm/compaction.c
index 4add68d40e8d..b961db601df4 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2723,16 +2723,11 @@ enum compact_result try_to_compact_pages(gfp_t gfp_mask, unsigned int order,
unsigned int alloc_flags, const struct alloc_context *ac,
enum compact_priority prio, struct page **capture)
{
- int may_perform_io = (__force int)(gfp_mask & __GFP_IO);
struct zoneref *z;
struct zone *zone;
enum compact_result rc = COMPACT_SKIPPED;
- /*
- * Check if the GFP flags allow compaction - GFP_NOIO is really
- * tricky context because the migration might require IO
- */
- if (!may_perform_io)
+ if (!gfp_compaction_allowed(gfp_mask))
return COMPACT_SKIPPED;
trace_mm_compaction_try_to_compact_pages(order, gfp_mask, prio);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 150d4f23b010..a663202045dc 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -4041,6 +4041,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
struct alloc_context *ac)
{
bool can_direct_reclaim = gfp_mask & __GFP_DIRECT_RECLAIM;
+ bool can_compact = gfp_compaction_allowed(gfp_mask);
const bool costly_order = order > PAGE_ALLOC_COSTLY_ORDER;
struct page *page = NULL;
unsigned int alloc_flags;
@@ -4111,7 +4112,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* Don't try this for allocations that are allowed to ignore
* watermarks, as the ALLOC_NO_WATERMARKS attempt didn't yet happen.
*/
- if (can_direct_reclaim &&
+ if (can_direct_reclaim && can_compact &&
(costly_order ||
(order > 0 && ac->migratetype != MIGRATE_MOVABLE))
&& !gfp_pfmemalloc_allowed(gfp_mask)) {
@@ -4209,9 +4210,10 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
/*
* Do not retry costly high order allocations unless they are
- * __GFP_RETRY_MAYFAIL
+ * __GFP_RETRY_MAYFAIL and we can compact
*/
- if (costly_order && !(gfp_mask & __GFP_RETRY_MAYFAIL))
+ if (costly_order && (!can_compact ||
+ !(gfp_mask & __GFP_RETRY_MAYFAIL)))
goto nopage;
if (should_reclaim_retry(gfp_mask, order, ac, alloc_flags,
@@ -4224,7 +4226,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
* implementation of the compaction depends on the sufficient amount
* of free memory (see __compaction_suitable)
*/
- if (did_some_progress > 0 &&
+ if (did_some_progress > 0 && can_compact &&
should_compact_retry(ac, order, alloc_flags,
compact_result, &compact_priority,
&compaction_retries))
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4f9c854ce6cc..4255619a1a31 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -5753,7 +5753,7 @@ static void shrink_lruvec(struct lruvec *lruvec, struct scan_control *sc)
/* Use reclaim/compaction for costly allocs or under memory pressure */
static bool in_reclaim_compaction(struct scan_control *sc)
{
- if (IS_ENABLED(CONFIG_COMPACTION) && sc->order &&
+ if (gfp_compaction_allowed(sc->gfp_mask) && sc->order &&
(sc->order > PAGE_ALLOC_COSTLY_ORDER ||
sc->priority < DEF_PRIORITY - 2))
return true;
@@ -5998,6 +5998,9 @@ static inline bool compaction_ready(struct zone *zone, struct scan_control *sc)
{
unsigned long watermark;
+ if (!gfp_compaction_allowed(sc->gfp_mask))
+ return false;
+
/* Allocation can already succeed, nothing to do */
if (zone_watermark_ok(zone, sc->order, min_wmark_pages(zone),
sc->reclaim_idx, 0))
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x c567f2948f57bdc03ed03403ae0234085f376b7d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040118-disgrace-tanning-bf41@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
c567f2948f57 ("Revert "x86/mm/ident_map: Use gbpages only where full GB page should be mapped."")
0a845e0f6348 ("mm/treewide: replace pud_large() with pud_leaf()")
d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c567f2948f57bdc03ed03403ae0234085f376b7d Mon Sep 17 00:00:00 2001
From: Ingo Molnar <mingo(a)kernel.org>
Date: Mon, 25 Mar 2024 11:47:51 +0100
Subject: [PATCH] Revert "x86/mm/ident_map: Use gbpages only where full GB page
should be mapped."
This reverts commit d794734c9bbfe22f86686dc2909c25f5ffe1a572.
While the original change tries to fix a bug, it also unintentionally broke
existing systems, see the regressions reported at:
https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjosep…
Since d794734c9bbf was also marked for -stable, let's back it out before
causing more damage.
Note that due to another upstream change the revert was not 100% automatic:
0a845e0f6348 mm/treewide: replace pud_large() with pud_leaf()
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Cc: Russ Anderson <rja(a)hpe.com>
Cc: Steve Wahl <steve.wahl(a)hpe.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Link: https://lore.kernel.org/all/3a1b9909-45ac-4f97-ad68-d16ef1ce99db@pavinjosep…
Fixes: d794734c9bbf ("x86/mm/ident_map: Use gbpages only where full GB page should be mapped.")
diff --git a/arch/x86/mm/ident_map.c b/arch/x86/mm/ident_map.c
index a204a332c71f..968d7005f4a7 100644
--- a/arch/x86/mm/ident_map.c
+++ b/arch/x86/mm/ident_map.c
@@ -26,31 +26,18 @@ static int ident_pud_init(struct x86_mapping_info *info, pud_t *pud_page,
for (; addr < end; addr = next) {
pud_t *pud = pud_page + pud_index(addr);
pmd_t *pmd;
- bool use_gbpage;
next = (addr & PUD_MASK) + PUD_SIZE;
if (next > end)
next = end;
- /* if this is already a gbpage, this portion is already mapped */
- if (pud_leaf(*pud))
- continue;
-
- /* Is using a gbpage allowed? */
- use_gbpage = info->direct_gbpages;
-
- /* Don't use gbpage if it maps more than the requested region. */
- /* at the begining: */
- use_gbpage &= ((addr & ~PUD_MASK) == 0);
- /* ... or at the end: */
- use_gbpage &= ((next & ~PUD_MASK) == 0);
-
- /* Never overwrite existing mappings */
- use_gbpage &= !pud_present(*pud);
-
- if (use_gbpage) {
+ if (info->direct_gbpages) {
pud_t pudval;
+ if (pud_present(*pud))
+ continue;
+
+ addr &= PUD_MASK;
pudval = __pud((addr - info->offset) | info->page_flag);
set_pud(pud, pudval);
continue;
Hi,
A previous stable backport neglected to handle that we now don't clear
'ret' since the SCM registration went away. I checked the other stable
kernels and it's only affecting 5.10/5.15.
Can you apply this fixup patch for 5.10-stable and 5.15-stable? I
provided one for each even though they are identical, but the fixup
sha is obviously different for them.
Thanks!
--
Jens Axboe
This is nearly identical to the v5.15 backport, the only difference is
the split to struct vfio_pci_core_device hasn't occurred yet, so we need
to use the original vfio_pci_device object instead.
NB. The fsl-mc driver doesn't exist on v5.4.y, therefore the last patch
in the series should be dropped against v5.4.
v4.19.y does not include IRQF_NO_AUTOEN, so this is as far as I intend
to go with these. Thanks,
Alex
Alex Williamson (6):
vfio/pci: Disable auto-enable of exclusive INTx IRQ
vfio/pci: Lock external INTx masking ops
vfio: Introduce interface to flush virqfd inject workqueue
vfio/pci: Create persistent INTx handler
vfio/platform: Create persistent IRQ handlers
vfio/fsl-mc: Block calling interrupt handler without trigger
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 +-
drivers/vfio/pci/vfio_pci_intrs.c | 176 +++++++++++++---------
drivers/vfio/platform/vfio_platform_irq.c | 101 +++++++++----
drivers/vfio/virqfd.c | 21 +++
include/linux/vfio.h | 2 +
5 files changed, 201 insertions(+), 106 deletions(-)
--
2.44.0
From: Ezra Buehler <ezra.buehler(a)husqvarnagroup.com>
[ Upstream commit 34a956739d295de6010cdaafeed698ccbba87ea4 ]
E.g. ESMT chips will return an identification code with a length of 5
bytes. In order to prevent ambiguity, flash chips would actually need to
return IDs that are up to 17 or more bytes long due to JEDEC's
continuation scheme. I understand that if a manufacturer ID is located
in bank N of JEDEC's database (there are currently 16 banks), N - 1
continuation codes (7Fh) need to be added to the identification code
(comprising of manufacturer ID and device ID). However, most flash chip
manufacturers don't seem to implement this (correctly).
Cc: <stable(a)vger.kernel.org> # 6.6.23
Cc: <stable(a)vger.kernel.org> # 6.7.11
Cc: <stable(a)vger.kernel.org> # 6.8.2
Signed-off-by: Ezra Buehler <ezra.buehler(a)husqvarnagroup.com>
Reviewed-by: Martin Kurbanov <mmkurbanov(a)salutedevices.com>
Tested-by: Martin Kurbanov <mmkurbanov(a)salutedevices.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20240125200108.24374-2-ezra@easyb.ch
Signed-off-by: Dmitry Rokosov <ddrokosov(a)salutedevices.com>
---
In the patch series [1] only one patch was marked with Fixes tag, that's
why the secon patch was not applied to 6.6.y, 6.7.y and 6.8y. It breaks
ESMT detection flow with logs:
[ 0.770730] spi-nand spi0.0: unknown raw ID c8017f7f
[ 0.772688] spi-nand: probe of spi0.0 failed with error -524
Please cherry-pick the second patch from the series to 6.6.y, 6.7.y and
6.8.y.
Links:
[1] https://lore.kernel.org/linux-mtd/20240125200108.24374-1-ezra@easyb.ch/
---
include/linux/mtd/spinand.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mtd/spinand.h b/include/linux/mtd/spinand.h
index badb4c1ac079..5c19ead60499 100644
--- a/include/linux/mtd/spinand.h
+++ b/include/linux/mtd/spinand.h
@@ -169,7 +169,7 @@
struct spinand_op;
struct spinand_device;
-#define SPINAND_MAX_ID_LEN 4
+#define SPINAND_MAX_ID_LEN 5
/*
* For erase, write and read operation, we got the following timings :
* tBERS (erase) 1ms to 4ms
--
2.43.0
Since the introduction of the `of: property: fw_devlink: Fix stupid bug in remote-endpoint parsing` patch, an issue with the device-tree of the Rock 5 Model B has been detected. All the stable kernels (6.7.y and 6.8.y) work on the Orange Pi 5, which has the Rockchip RK3588S SoC (same as the RK3588, but less I/O basically). So, being an owner of only two SBCs which use the RK3588* SoC, it appears that the Rock 5 Model B's DT is incorrect.
I looked at the patch and tried several things, neither resulted in anything that would point me to the core issue. Then I tried this:
```
$ grep -C 3 remote-endpoint arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts
port {
es8316_p0_0: endpoint {
remote-endpoint = <&i2s0_8ch_p0_0>;
};
};
};
--
i2s0_8ch_p0_0: endpoint {
dai-format = "i2s";
mclk-fs = <256>;
remote-endpoint = <&es8316_p0_0>;
};
};
};
```
So, from a cursory look, the issue seems to be related to either the DT node for the audio codec or related to the es8316's binding itself. Though I doubt that the later is the issue because if that were the issue, _someone_ with a Pine64 Pinebook Pro would've raised alarms. So far, this seems to be related to the `rk3588-rock-5b.dts` and possibly with the `rk3588s-rock-5a.dts` too.
I would **love** to help but I'm afraid I device-trees are not something that I am at-all familiar with. That said, I am open to methods of debugging this issue to provide a fix myself.
I would have replied to the patch's link but unfortunately, I haven't yet setup neomutt and my email provider's web UI doesn't have a [straightforward] way to reply using the 'In-Reply-To' header, hence a new thread. Apologies for the inconvenience caused.
-- Pratham Patel
From: Sven Eckelmann <sven(a)narfation.org>
If the MTU of one of an attached interface becomes too small to transmit
the local translation table then it must be resized to fit inside all
fragments (when enabled) or a single packet.
But if the MTU becomes too low to transmit even the header + the VLAN
specific part then the resizing of the local TT will never succeed. This
can for example happen when the usable space is 110 bytes and 11 VLANs are
on top of batman-adv. In this case, at least 116 byte would be needed.
There will just be an endless spam of
batman_adv: batadv0: Forced to purge local tt entries to fit new maximum fragment MTU (110)
in the log but the function will never finish. Problem here is that the
timeout will be halved all the time and will then stagnate at 0 and
therefore never be able to reduce the table even more.
There are other scenarios possible with a similar result. The number of
BATADV_TT_CLIENT_NOPURGE entries in the local TT can for example be too
high to fit inside a packet. Such a scenario can therefore happen also with
only a single VLAN + 7 non-purgable addresses - requiring at least 120
bytes.
While this should be handled proactively when:
* interface with too low MTU is added
* VLAN is added
* non-purgeable local mac is added
* MTU of an attached interface is reduced
* fragmentation setting gets disabled (which most likely requires dropping
attached interfaces)
not all of these scenarios can be prevented because batman-adv is only
consuming events without the the possibility to prevent these actions
(non-purgable MAC address added, MTU of an attached interface is reduced).
It is therefore necessary to also make sure that the code is able to handle
also the situations when there were already incompatible system
configuration are present.
Cc: stable(a)vger.kernel.org
Fixes: a19d3d85e1b8 ("batman-adv: limit local translation table max size")
Reported-by: syzbot+a6a4b5bb3da165594cff(a)syzkaller.appspotmail.com
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
---
net/batman-adv/translation-table.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/batman-adv/translation-table.c b/net/batman-adv/translation-table.c
index b95c36765d04..2243cec18ecc 100644
--- a/net/batman-adv/translation-table.c
+++ b/net/batman-adv/translation-table.c
@@ -3948,7 +3948,7 @@ void batadv_tt_local_resize_to_mtu(struct net_device *soft_iface)
spin_lock_bh(&bat_priv->tt.commit_lock);
- while (true) {
+ while (timeout) {
table_size = batadv_tt_local_table_transmit_size(bat_priv);
if (packet_size_max >= table_size)
break;
--
2.39.2
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 037965402a010898d34f4e35327d22c0a95cd51f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040525-mousy-reclining-38e6@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 037965402a010898d34f4e35327d22c0a95cd51f Mon Sep 17 00:00:00 2001
From: Jesper Dangaard Brouer <hawk(a)kernel.org>
Date: Wed, 27 Mar 2024 13:14:56 +0100
Subject: [PATCH] xen-netfront: Add missing skb_mark_for_recycle
Notice that skb_mark_for_recycle() is introduced later than fixes tag in
commit 6a5bcd84e886 ("page_pool: Allow drivers to hint on SKB recycling").
It is believed that fixes tag were missing a call to page_pool_release_page()
between v5.9 to v5.14, after which is should have used skb_mark_for_recycle().
Since v6.6 the call page_pool_release_page() were removed (in
commit 535b9c61bdef ("net: page_pool: hide page_pool_release_page()")
and remaining callers converted (in commit 6bfef2ec0172 ("Merge branch
'net-page_pool-remove-page_pool_release_page'")).
This leak became visible in v6.8 via commit dba1b8a7ab68 ("mm/page_pool: catch
page_pool memory leaks").
Cc: stable(a)vger.kernel.org
Fixes: 6c5aa6fc4def ("xen networking: add basic XDP support for xen-netfront")
Reported-by: Leonidas Spyropoulos <artafinde(a)archlinux.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218654
Reported-by: Arthur Borsboom <arthurborsboom(a)gmail.com>
Signed-off-by: Jesper Dangaard Brouer <hawk(a)kernel.org>
Link: https://lore.kernel.org/r/171154167446.2671062.9127105384591237363.stgit@fi…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/xen-netfront.c b/drivers/net/xen-netfront.c
index ad29f370034e..8d2aee88526c 100644
--- a/drivers/net/xen-netfront.c
+++ b/drivers/net/xen-netfront.c
@@ -285,6 +285,7 @@ static struct sk_buff *xennet_alloc_one_rx_buffer(struct netfront_queue *queue)
return NULL;
}
skb_add_rx_frag(skb, 0, page, 0, 0, PAGE_SIZE);
+ skb_mark_for_recycle(skb);
/* Align ip header to a 16 bytes boundary */
skb_reserve(skb, NET_IP_ALIGN);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 37801a36b4d68892ce807264f784d818f8d0d39b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040504-outpost-unbiased-0f9b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
37801a36b4d6 ("selinux: avoid dereference of garbage after mount failure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37801a36b4d68892ce807264f784d818f8d0d39b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20G=C3=B6ttsche?= <cgzones(a)googlemail.com>
Date: Thu, 28 Mar 2024 20:16:58 +0100
Subject: [PATCH] selinux: avoid dereference of garbage after mount failure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In case kern_mount() fails and returns an error pointer return in the
error branch instead of continuing and dereferencing the error pointer.
While on it drop the never read static variable selinuxfs_mount.
Cc: stable(a)vger.kernel.org
Fixes: 0619f0f5e36f ("selinux: wrap selinuxfs state")
Signed-off-by: Christian Göttsche <cgzones(a)googlemail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 0619a1cbbfbe..074d6c2714eb 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -2123,7 +2123,6 @@ static struct file_system_type sel_fs_type = {
.kill_sb = sel_kill_sb,
};
-static struct vfsmount *selinuxfs_mount __ro_after_init;
struct path selinux_null __ro_after_init;
static int __init init_sel_fs(void)
@@ -2145,18 +2144,21 @@ static int __init init_sel_fs(void)
return err;
}
- selinux_null.mnt = selinuxfs_mount = kern_mount(&sel_fs_type);
- if (IS_ERR(selinuxfs_mount)) {
+ selinux_null.mnt = kern_mount(&sel_fs_type);
+ if (IS_ERR(selinux_null.mnt)) {
pr_err("selinuxfs: could not mount!\n");
- err = PTR_ERR(selinuxfs_mount);
- selinuxfs_mount = NULL;
+ err = PTR_ERR(selinux_null.mnt);
+ selinux_null.mnt = NULL;
+ return err;
}
+
selinux_null.dentry = d_hash_and_lookup(selinux_null.mnt->mnt_root,
&null_name);
if (IS_ERR(selinux_null.dentry)) {
pr_err("selinuxfs: could not lookup null!\n");
err = PTR_ERR(selinux_null.dentry);
selinux_null.dentry = NULL;
+ return err;
}
return err;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 37801a36b4d68892ce807264f784d818f8d0d39b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040503-snugly-wikipedia-e4c2@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
37801a36b4d6 ("selinux: avoid dereference of garbage after mount failure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 37801a36b4d68892ce807264f784d818f8d0d39b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20G=C3=B6ttsche?= <cgzones(a)googlemail.com>
Date: Thu, 28 Mar 2024 20:16:58 +0100
Subject: [PATCH] selinux: avoid dereference of garbage after mount failure
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In case kern_mount() fails and returns an error pointer return in the
error branch instead of continuing and dereferencing the error pointer.
While on it drop the never read static variable selinuxfs_mount.
Cc: stable(a)vger.kernel.org
Fixes: 0619f0f5e36f ("selinux: wrap selinuxfs state")
Signed-off-by: Christian Göttsche <cgzones(a)googlemail.com>
Signed-off-by: Paul Moore <paul(a)paul-moore.com>
diff --git a/security/selinux/selinuxfs.c b/security/selinux/selinuxfs.c
index 0619a1cbbfbe..074d6c2714eb 100644
--- a/security/selinux/selinuxfs.c
+++ b/security/selinux/selinuxfs.c
@@ -2123,7 +2123,6 @@ static struct file_system_type sel_fs_type = {
.kill_sb = sel_kill_sb,
};
-static struct vfsmount *selinuxfs_mount __ro_after_init;
struct path selinux_null __ro_after_init;
static int __init init_sel_fs(void)
@@ -2145,18 +2144,21 @@ static int __init init_sel_fs(void)
return err;
}
- selinux_null.mnt = selinuxfs_mount = kern_mount(&sel_fs_type);
- if (IS_ERR(selinuxfs_mount)) {
+ selinux_null.mnt = kern_mount(&sel_fs_type);
+ if (IS_ERR(selinux_null.mnt)) {
pr_err("selinuxfs: could not mount!\n");
- err = PTR_ERR(selinuxfs_mount);
- selinuxfs_mount = NULL;
+ err = PTR_ERR(selinux_null.mnt);
+ selinux_null.mnt = NULL;
+ return err;
}
+
selinux_null.dentry = d_hash_and_lookup(selinux_null.mnt->mnt_root,
&null_name);
if (IS_ERR(selinux_null.dentry)) {
pr_err("selinuxfs: could not lookup null!\n");
err = PTR_ERR(selinux_null.dentry);
selinux_null.dentry = NULL;
+ return err;
}
return err;
The patch below does not apply to the 6.8-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.8.y
git checkout FETCH_HEAD
git cherry-pick -x 6946b9c99bde45f3ba74e00a7af9a3458cc24bea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040517-sharpener-displace-6491@gregkh' --subject-prefix 'PATCH 6.8.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6946b9c99bde45f3ba74e00a7af9a3458cc24bea Mon Sep 17 00:00:00 2001
From: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Date: Tue, 26 Mar 2024 12:43:17 -0400
Subject: [PATCH] Bluetooth: hci_sync: Fix not checking error on
hci_cmd_sync_cancel_sync
hci_cmd_sync_cancel_sync shall check the error passed to it since it
will be propagated using req_result which is __u32 it needs to be
properly set to a positive value if it was passed as negative othertise
IS_ERR will not trigger as -(errno) would be converted to a positive
value.
Fixes: 63298d6e752f ("Bluetooth: hci_core: Cancel request on command timeout")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz(a)intel.com>
Reported-and-tested-by: Thorsten Leemhuis <linux(a)leemhuis.info>
Closes: https://lore.kernel.org/all/08275279-7462-4f4a-a0ee-8aa015f829bc@leemhuis.i…
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 1690ae57a09d..a7028d38c1f5 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2874,7 +2874,7 @@ static void hci_cancel_cmd_sync(struct hci_dev *hdev, int err)
cancel_delayed_work_sync(&hdev->ncmd_timer);
atomic_set(&hdev->cmd_cnt, 1);
- hci_cmd_sync_cancel_sync(hdev, -err);
+ hci_cmd_sync_cancel_sync(hdev, err);
}
/* Suspend HCI device */
@@ -2894,7 +2894,7 @@ int hci_suspend_dev(struct hci_dev *hdev)
return 0;
/* Cancel potentially blocking sync operation before suspend */
- hci_cancel_cmd_sync(hdev, -EHOSTDOWN);
+ hci_cancel_cmd_sync(hdev, EHOSTDOWN);
hci_req_sync_lock(hdev);
ret = hci_suspend_sync(hdev);
@@ -4210,7 +4210,7 @@ static void hci_send_cmd_sync(struct hci_dev *hdev, struct sk_buff *skb)
err = hci_send_frame(hdev, skb);
if (err < 0) {
- hci_cmd_sync_cancel_sync(hdev, err);
+ hci_cmd_sync_cancel_sync(hdev, -err);
return;
}
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 639090b9f4b8..8fe02921adf1 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -617,7 +617,10 @@ void hci_cmd_sync_cancel_sync(struct hci_dev *hdev, int err)
bt_dev_dbg(hdev, "err 0x%2.2x", err);
if (hdev->req_status == HCI_REQ_PEND) {
- hdev->req_result = err;
+ /* req_result is __u32 so error must be positive to be properly
+ * propagated.
+ */
+ hdev->req_result = err < 0 ? -err : err;
hdev->req_status = HCI_REQ_CANCELED;
wake_up_interruptible(&hdev->req_wait_q);
Commit e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at
correct rate") breaks the dw UARTs on Intel Bay Trail (BYT) and
Cherry Trail (CHT) SoCs.
Before this change the RTL8732BS Bluetooth HCI which is found
connected over the dw UART on both BYT and CHT boards works properly:
Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
Bluetooth: hci0: RTL: fw version 0x365d462e
where as after this change probing it fails:
Bluetooth: hci0: RTL: examining hci_ver=06 hci_rev=000b lmp_ver=06 lmp_subver=8723
Bluetooth: hci0: RTL: rom_version status=0 version=1
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_fw.bin
Bluetooth: hci0: RTL: loading rtl_bt/rtl8723bs_config-OBDA8723.bin
Bluetooth: hci0: RTL: cfg_sz 64, total sz 24508
Bluetooth: hci0: command 0xfc20 tx timeout
Bluetooth: hci0: RTL: download fw command failed (-110)
Revert the changes to fix this regression.
Fixes: e5d6bd25f93d ("serial: 8250_dw: Do not reclock if already at correct rate")
Cc: stable(a)vger.kernel.org
Cc: Peter Collingbourne <pcc(a)google.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Note it is not entirely clear to me why this commit is causing
this issue. Maybe probe() needs to explicitly set the clk rate
which it just got (that feels like a clk driver issue) or maybe
the issue is that unless setup before hand by firmware /
the bootloader serial8250_update_uartclk() needs to be called
at least once to setup things ? Note that probe() does not call
serial8250_update_uartclk(), this is only called from the
dw8250_clk_notifier_cb()
This requires more debugging which is why I'm proposing
a straight revert to fix the regression ASAP and then this
can be investigated further.
---
drivers/tty/serial/8250/8250_dw.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/8250/8250_dw.c b/drivers/tty/serial/8250/8250_dw.c
index c1d43f040c43..2d1f350a4bea 100644
--- a/drivers/tty/serial/8250/8250_dw.c
+++ b/drivers/tty/serial/8250/8250_dw.c
@@ -357,9 +357,9 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
long rate;
int ret;
+ clk_disable_unprepare(d->clk);
rate = clk_round_rate(d->clk, newrate);
- if (rate > 0 && p->uartclk != rate) {
- clk_disable_unprepare(d->clk);
+ if (rate > 0) {
/*
* Note that any clock-notifer worker will block in
* serial8250_update_uartclk() until we are done.
@@ -367,8 +367,8 @@ static void dw8250_set_termios(struct uart_port *p, struct ktermios *termios,
ret = clk_set_rate(d->clk, newrate);
if (!ret)
p->uartclk = rate;
- clk_prepare_enable(d->clk);
}
+ clk_prepare_enable(d->clk);
dw8250_do_set_termios(p, termios, old);
}
--
2.44.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
Thanks,
Sasha
------------------ original commit in Linus's tree ------------------
From f2d5dcb48f7ba9e3ff249d58fc1fa963d374e66a Mon Sep 17 00:00:00 2001
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Date: Tue, 10 Oct 2023 15:55:49 +0100
Subject: [PATCH] bounds: support non-power-of-two CONFIG_NR_CPUS
ilog2() rounds down, so for example when PowerPC 85xx sets CONFIG_NR_CPUS
to 24, we will only allocate 4 bits to store the number of CPUs instead of
5. Use bits_per() instead, which rounds up. Found by code inspection.
The effect of this would probably be a misaccounting when doing NUMA
balancing, so to a user, it would only be a performance penalty. The
effects may be more wide-spread; it's hard to tell.
Link: https://lkml.kernel.org/r/20231010145549.1244748-1-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Fixes: 90572890d202 ("mm: numa: Change page last {nid,pid} into {cpu,pid}")
Reviewed-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/bounds.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/bounds.c b/kernel/bounds.c
index b529182e8b04f..c5a9fcd2d6228 100644
--- a/kernel/bounds.c
+++ b/kernel/bounds.c
@@ -19,7 +19,7 @@ int main(void)
DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS);
DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES);
#ifdef CONFIG_SMP
- DEFINE(NR_CPUS_BITS, ilog2(CONFIG_NR_CPUS));
+ DEFINE(NR_CPUS_BITS, bits_per(CONFIG_NR_CPUS));
#endif
DEFINE(SPINLOCK_SIZE, sizeof(spinlock_t));
#ifdef CONFIG_LRU_GEN
--
2.43.0
On 3/5/24 06:04, Aaro Koskinen wrote:
> Hi,
>
> It seems this patch (commit 48dfb5d2560d) is missing from
> at least 5.15 stable tree.
>
> Based on quick test, it seems to fix an issue where system locks up
> easily when RT throttling is disabled, and also it applies cleanly, so
> I think it should be good to have it 5.15?
You need to cc stable as the locking maintainers are not responsible for
merging patches to stable trees.
Cheers,
Longman
>
> A.
>
> On Thu, Sep 08, 2022 at 11:54:27PM +0530, Mukesh Ojha wrote:
>> From: Gokul krishna Krishnakumar <quic_gokukris(a)quicinc.com>
>>
>> Make the region inside the rwsem_write_trylock non preemptible.
>>
>> We observe RT task is hogging CPU when trying to acquire rwsem lock
>> which was acquired by a kworker task but before the rwsem owner was set.
>>
>> Here is the scenario:
>> 1. CFS task (affined to a particular CPU) takes rwsem lock.
>>
>> 2. CFS task gets preempted by a RT task before setting owner.
>>
>> 3. RT task (FIFO) is trying to acquire the lock, but spinning until
>> RT throttling happens for the lock as the lock was taken by CFS task.
>>
>> This patch attempts to fix the above issue by disabling preemption
>> until owner is set for the lock. While at it also fix the issues
>> at the places where rwsem_{set,clear}_owner() are called.
>>
>> This also adds lockdep annotation of preemption disable in
>> rwsem_{set,clear}_owner() on Peter Z. suggestion.
>>
>> Signed-off-by: Gokul krishna Krishnakumar <quic_gokukris(a)quicinc.com>
>> Signed-off-by: Mukesh Ojha <quic_mojha(a)quicinc.com>
>> ---
>> Changes in v2:
>> - Remove preempt disable code in rwsem_try_write_lock_unqueued()
>> - Addressed suggestion from Peter Z.
>> - Modified commit text
>> kernel/locking/rwsem.c | 14 ++++++++++++--
>> 1 file changed, 12 insertions(+), 2 deletions(-)
>>
>> diff --git a/kernel/locking/rwsem.c b/kernel/locking/rwsem.c
>> index 65f0262..4487359 100644
>> --- a/kernel/locking/rwsem.c
>> +++ b/kernel/locking/rwsem.c
>> @@ -133,14 +133,19 @@
>> * the owner value concurrently without lock. Read from owner, however,
>> * may not need READ_ONCE() as long as the pointer value is only used
>> * for comparison and isn't being dereferenced.
>> + *
>> + * Both rwsem_{set,clear}_owner() functions should be in the same
>> + * preempt disable section as the atomic op that changes sem->count.
>> */
>> static inline void rwsem_set_owner(struct rw_semaphore *sem)
>> {
>> + lockdep_assert_preemption_disabled();
>> atomic_long_set(&sem->owner, (long)current);
>> }
>>
>> static inline void rwsem_clear_owner(struct rw_semaphore *sem)
>> {
>> + lockdep_assert_preemption_disabled();
>> atomic_long_set(&sem->owner, 0);
>> }
>>
>> @@ -251,13 +256,16 @@ static inline bool rwsem_read_trylock(struct rw_semaphore *sem, long *cntp)
>> static inline bool rwsem_write_trylock(struct rw_semaphore *sem)
>> {
>> long tmp = RWSEM_UNLOCKED_VALUE;
>> + bool ret = false;
>>
>> + preempt_disable();
>> if (atomic_long_try_cmpxchg_acquire(&sem->count, &tmp, RWSEM_WRITER_LOCKED)) {
>> rwsem_set_owner(sem);
>> - return true;
>> + ret = true;
>> }
>>
>> - return false;
>> + preempt_enable();
>> + return ret;
>> }
>>
>> /*
>> @@ -1352,8 +1360,10 @@ static inline void __up_write(struct rw_semaphore *sem)
>> DEBUG_RWSEMS_WARN_ON((rwsem_owner(sem) != current) &&
>> !rwsem_test_oflags(sem, RWSEM_NONSPINNABLE), sem);
>>
>> + preempt_disable();
>> rwsem_clear_owner(sem);
>> tmp = atomic_long_fetch_add_release(-RWSEM_WRITER_LOCKED, &sem->count);
>> + preempt_enable();
>> if (unlikely(tmp & RWSEM_FLAG_WAITERS))
>> rwsem_wake(sem);
>> }
>> --
>> 2.7.4
>>
>>
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x b32a09ea7c38849ff925489a6bf5bd8914bc45df
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040558-jet-obstinate-7484@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b32a09ea7c38849ff925489a6bf5bd8914bc45df Mon Sep 17 00:00:00 2001
From: Marco Pinna <marco.pinn95(a)gmail.com>
Date: Fri, 29 Mar 2024 17:12:59 +0100
Subject: [PATCH] vsock/virtio: fix packet delivery to tap device
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: stable(a)vge.kernel.org
Signed-off-by: Marco Pinna <marco.pinn95(a)gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare(a)redhat.com>
Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 1748268e0694..ee5d306a96d0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -120,7 +120,6 @@ virtio_transport_send_pkt_work(struct work_struct *work)
if (!skb)
break;
- virtio_transport_deliver_tap_pkt(skb);
reply = virtio_vsock_skb_reply(skb);
sgs = vsock->out_sgs;
sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb),
@@ -170,6 +169,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
break;
}
+ virtio_transport_deliver_tap_pkt(skb);
+
if (reply) {
struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
int val;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x b32a09ea7c38849ff925489a6bf5bd8914bc45df
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024040557-jeeringly-random-fd56@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b32a09ea7c38849ff925489a6bf5bd8914bc45df Mon Sep 17 00:00:00 2001
From: Marco Pinna <marco.pinn95(a)gmail.com>
Date: Fri, 29 Mar 2024 17:12:59 +0100
Subject: [PATCH] vsock/virtio: fix packet delivery to tap device
Commit 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks") added
virtio_transport_deliver_tap_pkt() for handing packets to the
vsockmon device. However, in virtio_transport_send_pkt_work(),
the function is called before actually sending the packet (i.e.
before placing it in the virtqueue with virtqueue_add_sgs() and checking
whether it returned successfully).
Queuing the packet in the virtqueue can fail even multiple times.
However, in virtio_transport_deliver_tap_pkt() we deliver the packet
to the monitoring tap interface only the first time we call it.
This certainly avoids seeing the same packet replicated multiple times
in the monitoring interface, but it can show the packet sent with the
wrong timestamp or even before we succeed to queue it in the virtqueue.
Move virtio_transport_deliver_tap_pkt() after calling virtqueue_add_sgs()
and making sure it returned successfully.
Fixes: 82dfb540aeb2 ("VSOCK: Add virtio vsock vsockmon hooks")
Cc: stable(a)vge.kernel.org
Signed-off-by: Marco Pinna <marco.pinn95(a)gmail.com>
Reviewed-by: Stefano Garzarella <sgarzare(a)redhat.com>
Link: https://lore.kernel.org/r/20240329161259.411751-1-marco.pinn95@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/vmw_vsock/virtio_transport.c b/net/vmw_vsock/virtio_transport.c
index 1748268e0694..ee5d306a96d0 100644
--- a/net/vmw_vsock/virtio_transport.c
+++ b/net/vmw_vsock/virtio_transport.c
@@ -120,7 +120,6 @@ virtio_transport_send_pkt_work(struct work_struct *work)
if (!skb)
break;
- virtio_transport_deliver_tap_pkt(skb);
reply = virtio_vsock_skb_reply(skb);
sgs = vsock->out_sgs;
sg_init_one(sgs[out_sg], virtio_vsock_hdr(skb),
@@ -170,6 +169,8 @@ virtio_transport_send_pkt_work(struct work_struct *work)
break;
}
+ virtio_transport_deliver_tap_pkt(skb);
+
if (reply) {
struct virtqueue *rx_vq = vsock->vqs[VSOCK_VQ_RX];
int val;