From: Steven Rostedt <rostedt(a)goodmis.org>
Memory mapping the tracing ring buffer will disable resizing the buffer.
But if there's an error in the memory mapping like an invalid parameter,
the function exits out without re-enabling the resizing of the ring
buffer, preventing the ring buffer from being resized after that.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Vincent Donnefort <vdonnefort(a)google.com>
Link: https://lore.kernel.org/20250213131957.530ec3c5@gandalf.local.home
Fixes: 117c39200d9d7 ("ring-buffer: Introducing ring-buffer mapping functions")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index b8e0ae15ca5b..07b421115692 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -7126,6 +7126,7 @@ int ring_buffer_map(struct trace_buffer *buffer, int cpu,
kfree(cpu_buffer->subbuf_ids);
cpu_buffer->subbuf_ids = NULL;
rb_free_meta_page(cpu_buffer);
+ atomic_dec(&cpu_buffer->resize_disabled);
}
unlock:
--
2.47.2
commit e4b9852a0f4afe40604afb442e3af4452722050a upstream.
Below two paths could overlap each other if we power off a drive quickly
after powering it on. There are multiple races in nvme_setup_io_queues()
because of shutdown_lock missing and improper use of NVMEQ_ENABLED bit.
nvme_reset_work() nvme_remove()
nvme_setup_io_queues() nvme_dev_disable()
... ...
A1 clear NVMEQ_ENABLED bit for admin queue lock
retry: B1 nvme_suspend_io_queues()
A2 pci_free_irq() admin queue B2 nvme_suspend_queue() admin queue
A3 pci_free_irq_vectors() nvme_pci_disable()
A4 nvme_setup_irqs(); B3 pci_free_irq_vectors()
... unlock
A5 queue_request_irq() for admin queue
set NVMEQ_ENABLED bit
...
nvme_create_io_queues()
A6 result = queue_request_irq();
set NVMEQ_ENABLED bit
...
fail to allocate enough IO queues:
A7 nvme_suspend_io_queues()
goto retry
If B3 runs in between A1 and A2, it will crash if irqaction haven't
been freed by A2. B2 is supposed to free admin queue IRQ but it simply
can't fulfill the job as A1 has cleared NVMEQ_ENABLED bit.
Fix: combine A1 A2 so IRQ get freed as soon as the NVMEQ_ENABLED bit
gets cleared.
After solved #1, A2 could race with B3 if A2 is freeing IRQ while B3
is checking irqaction. A3 also could race with B2 if B2 is freeing
IRQ while A3 is checking irqaction.
Fix: A2 and A3 take lock for mutual exclusion.
A3 could race with B3 since they could run free_msi_irqs() in parallel.
Fix: A3 takes lock for mutual exclusion.
A4 could fail to allocate all needed IRQ vectors if A3 and A4 are
interrupted by B3.
Fix: A4 takes lock for mutual exclusion.
If A5/A6 happened after B2/B1, B3 will crash since irqaction is not NULL.
They are just allocated by A5/A6.
Fix: Lock queue_request_irq() and setting of NVMEQ_ENABLED bit.
A7 could get chance to pci_free_irq() for certain IO queue while B3 is
checking irqaction.
Fix: A7 takes lock.
nvme_dev->online_queues need to be protected by shutdown_lock. Since it
is not atomic, both paths could modify it using its own copy.
Co-developed-by: Yuanyuan Zhong <yzhong(a)purestorage.com>
Signed-off-by: Casey Chen <cachen(a)purestorage.com>
Reviewed-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
[noahm(a)debian.org: backported to 5.10]
Link: https://lore.kernel.org/linux-nvme/20210707211432.29536-1-cachen@purestorag…
Signed-off-by: Noah Meyerhans <noahm(a)debian.org>
---
drivers/nvme/host/pci.c | 66 ++++++++++++++++++++++++++++++++++++-----
1 file changed, 58 insertions(+), 8 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 875ebef6adc7..ae04bdce560a 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -1563,6 +1563,28 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid)
wmb(); /* ensure the first interrupt sees the initialization */
}
+/*
+ * Try getting shutdown_lock while setting up IO queues.
+ */
+static int nvme_setup_io_queues_trylock(struct nvme_dev *dev)
+{
+ /*
+ * Give up if the lock is being held by nvme_dev_disable.
+ */
+ if (!mutex_trylock(&dev->shutdown_lock))
+ return -ENODEV;
+
+ /*
+ * Controller is in wrong state, fail early.
+ */
+ if (dev->ctrl.state != NVME_CTRL_CONNECTING) {
+ mutex_unlock(&dev->shutdown_lock);
+ return -ENODEV;
+ }
+
+ return 0;
+}
+
static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
{
struct nvme_dev *dev = nvmeq->dev;
@@ -1591,8 +1613,11 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
goto release_cq;
nvmeq->cq_vector = vector;
- nvme_init_queue(nvmeq, qid);
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
+ nvme_init_queue(nvmeq, qid);
if (!polled) {
result = queue_request_irq(nvmeq);
if (result < 0)
@@ -1600,10 +1625,12 @@ static int nvme_create_queue(struct nvme_queue *nvmeq, int qid, bool polled)
}
set_bit(NVMEQ_ENABLED, &nvmeq->flags);
+ mutex_unlock(&dev->shutdown_lock);
return result;
release_sq:
dev->online_queues--;
+ mutex_unlock(&dev->shutdown_lock);
adapter_delete_sq(dev, qid);
release_cq:
adapter_delete_cq(dev, qid);
@@ -2182,7 +2209,18 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (nr_io_queues == 0)
return 0;
- clear_bit(NVMEQ_ENABLED, &adminq->flags);
+ /*
+ * Free IRQ resources as soon as NVMEQ_ENABLED bit transitions
+ * from set to unset. If there is a window to it is truely freed,
+ * pci_free_irq_vectors() jumping into this window will crash.
+ * And take lock to avoid racing with pci_free_irq_vectors() in
+ * nvme_dev_disable() path.
+ */
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
+ if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+ pci_free_irq(pdev, 0, adminq);
if (dev->cmb_use_sqes) {
result = nvme_cmb_qdepth(dev, nr_io_queues,
@@ -2198,14 +2236,17 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
result = nvme_remap_bar(dev, size);
if (!result)
break;
- if (!--nr_io_queues)
- return -ENOMEM;
+ if (!--nr_io_queues) {
+ result = -ENOMEM;
+ goto out_unlock;
+ }
} while (1);
adminq->q_db = dev->dbs;
retry:
/* Deregister the admin queue's interrupt */
- pci_free_irq(pdev, 0, adminq);
+ if (test_and_clear_bit(NVMEQ_ENABLED, &adminq->flags))
+ pci_free_irq(pdev, 0, adminq);
/*
* If we enable msix early due to not intx, disable it again before
@@ -2214,8 +2255,10 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
pci_free_irq_vectors(pdev);
result = nvme_setup_irqs(dev, nr_io_queues);
- if (result <= 0)
- return -EIO;
+ if (result <= 0) {
+ result = -EIO;
+ goto out_unlock;
+ }
dev->num_vecs = result;
result = max(result - 1, 1);
@@ -2229,8 +2272,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
*/
result = queue_request_irq(adminq);
if (result)
- return result;
+ goto out_unlock;
set_bit(NVMEQ_ENABLED, &adminq->flags);
+ mutex_unlock(&dev->shutdown_lock);
result = nvme_create_io_queues(dev);
if (result || dev->online_queues < 2)
@@ -2239,6 +2283,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
if (dev->online_queues - 1 < dev->max_qid) {
nr_io_queues = dev->online_queues - 1;
nvme_disable_io_queues(dev);
+ result = nvme_setup_io_queues_trylock(dev);
+ if (result)
+ return result;
nvme_suspend_io_queues(dev);
goto retry;
}
@@ -2247,6 +2294,9 @@ static int nvme_setup_io_queues(struct nvme_dev *dev)
dev->io_queues[HCTX_TYPE_READ],
dev->io_queues[HCTX_TYPE_POLL]);
return 0;
+out_unlock:
+ mutex_unlock(&dev->shutdown_lock);
+ return result;
}
static void nvme_del_queue_end(struct request *req, blk_status_t error)
--
2.39.5
On error restore anything still on the pin_list back to the invalidation
list on error. For the actual pin, so long as the vma is tracked on
either list it should get picked up on the next pin, however it looks
possible for the vma to get nuked but still be present on this per vm
pin_list leading to corruption. An alternative might be then to instead
just remove the link when destroying the vma.
Fixes: ed2bdf3b264d ("drm/xe/vm: Subclass userptr vmas")
Suggested-by: Matthew Brost <matthew.brost(a)intel.com>
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Thomas Hellström <thomas.hellstrom(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_vm.c | 26 +++++++++++++++++++-------
1 file changed, 19 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_vm.c b/drivers/gpu/drm/xe/xe_vm.c
index d664f2e418b2..668b0bde7822 100644
--- a/drivers/gpu/drm/xe/xe_vm.c
+++ b/drivers/gpu/drm/xe/xe_vm.c
@@ -670,12 +670,12 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
list_for_each_entry_safe(uvma, next, &vm->userptr.invalidated,
userptr.invalidate_link) {
list_del_init(&uvma->userptr.invalidate_link);
- list_move_tail(&uvma->userptr.repin_link,
- &vm->userptr.repin_list);
+ list_add_tail(&uvma->userptr.repin_link,
+ &vm->userptr.repin_list);
}
spin_unlock(&vm->userptr.invalidated_lock);
- /* Pin and move to temporary list */
+ /* Pin and move to bind list */
list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
userptr.repin_link) {
err = xe_vma_userptr_pin_pages(uvma);
@@ -691,10 +691,10 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
err = xe_vm_invalidate_vma(&uvma->vma);
xe_vm_unlock(vm);
if (err)
- return err;
+ break;
} else {
- if (err < 0)
- return err;
+ if (err)
+ break;
list_del_init(&uvma->userptr.repin_link);
list_move_tail(&uvma->vma.combined_links.rebind,
@@ -702,7 +702,19 @@ int xe_vm_userptr_pin(struct xe_vm *vm)
}
}
- return 0;
+ if (err) {
+ down_write(&vm->userptr.notifier_lock);
+ spin_lock(&vm->userptr.invalidated_lock);
+ list_for_each_entry_safe(uvma, next, &vm->userptr.repin_list,
+ userptr.repin_link) {
+ list_del_init(&uvma->userptr.repin_link);
+ list_move_tail(&uvma->userptr.invalidate_link,
+ &vm->userptr.invalidated);
+ }
+ spin_unlock(&vm->userptr.invalidated_lock);
+ up_write(&vm->userptr.notifier_lock);
+ }
+ return err;
}
/**
--
2.48.1
The patch titled
Subject: Revert "selftests/mm: remove local __NR_* definitions"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-selftests-mm-remove-local-__nr_-definitions.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: Revert "selftests/mm: remove local __NR_* definitions"
Date: Thu, 13 Feb 2025 19:38:50 -0800
This reverts commit a5c6bc590094a1a73cf6fa3f505e1945d2bf2461.
The general approach described in commit e076eaca5906 ("selftests: break
the dependency upon local header files") was taken one step too far here:
it should not have been extended to include the syscall numbers. This is
because doing so would require per-arch support in tools/include/uapi, and
no such support exists.
This revert fixes two separate reports of test failures, from Dave
Hansen[1], and Li Wang[2]. An excerpt of Dave's report:
Before this commit (a5c6bc590094a1a73cf6fa3f505e1945d2bf2461) things are
fine. But after, I get:
running PKEY tests for unsupported CPU/OS
An excerpt of Li's report:
I just found that mlock2_() return a wrong value in mlock2-test
[1] https://lore.kernel.org/dc585017-6740-4cab-a536-b12b37a7582d@intel.com
[2] https://lore.kernel.org/CAEemH2eW=UMu9+turT2jRie7+6ewUazXmA6kL+VBo3cGDGU6RA…
Link: https://lkml.kernel.org/r/20250214033850.235171-1-jhubbard@nvidia.com
Fixes: a5c6bc590094 ("selftests/mm: remove local __NR_* definitions")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: Li Wang <liwang(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Jeff Xu <jeffxu(a)chromium.org>
Cc: Andrei Vagin <avagin(a)google.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: Kees Cook <kees(a)kernel.org>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Rich Felker <dalias(a)libc.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugepage-mremap.c | 2 -
tools/testing/selftests/mm/ksm_functional_tests.c | 8 +++++-
tools/testing/selftests/mm/memfd_secret.c | 14 ++++++++++-
tools/testing/selftests/mm/mkdirty.c | 8 +++++-
tools/testing/selftests/mm/mlock2.h | 1
tools/testing/selftests/mm/protection_keys.c | 2 -
tools/testing/selftests/mm/uffd-common.c | 4 +++
tools/testing/selftests/mm/uffd-stress.c | 15 +++++++++++-
tools/testing/selftests/mm/uffd-unit-tests.c | 14 ++++++++++-
9 files changed, 60 insertions(+), 8 deletions(-)
--- a/tools/testing/selftests/mm/hugepage-mremap.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/hugepage-mremap.c
@@ -15,7 +15,7 @@
#define _GNU_SOURCE
#include <stdlib.h>
#include <stdio.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <sys/mman.h>
#include <errno.h>
#include <fcntl.h> /* Definition of O_* constants */
--- a/tools/testing/selftests/mm/ksm_functional_tests.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/ksm_functional_tests.c
@@ -11,7 +11,7 @@
#include <string.h>
#include <stdbool.h>
#include <stdint.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
@@ -369,6 +369,7 @@ unmap:
munmap(map, size);
}
+#ifdef __NR_userfaultfd
static void test_unmerge_uffd_wp(void)
{
struct uffdio_writeprotect uffd_writeprotect;
@@ -429,6 +430,7 @@ close_uffd:
unmap:
munmap(map, size);
}
+#endif
/* Verify that KSM can be enabled / queried with prctl. */
static void test_prctl(void)
@@ -684,7 +686,9 @@ int main(int argc, char **argv)
exit(test_child_ksm());
}
+#ifdef __NR_userfaultfd
tests++;
+#endif
ksft_print_header();
ksft_set_plan(tests);
@@ -696,7 +700,9 @@ int main(int argc, char **argv)
test_unmerge();
test_unmerge_zero_pages();
test_unmerge_discarded();
+#ifdef __NR_userfaultfd
test_unmerge_uffd_wp();
+#endif
test_prot_none();
--- a/tools/testing/selftests/mm/memfd_secret.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/memfd_secret.c
@@ -17,7 +17,7 @@
#include <stdlib.h>
#include <string.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <errno.h>
#include <stdio.h>
#include <fcntl.h>
@@ -28,6 +28,8 @@
#define pass(fmt, ...) ksft_test_result_pass(fmt, ##__VA_ARGS__)
#define skip(fmt, ...) ksft_test_result_skip(fmt, ##__VA_ARGS__)
+#ifdef __NR_memfd_secret
+
#define PATTERN 0x55
static const int prot = PROT_READ | PROT_WRITE;
@@ -332,3 +334,13 @@ int main(int argc, char *argv[])
ksft_finished();
}
+
+#else /* __NR_memfd_secret */
+
+int main(int argc, char *argv[])
+{
+ printf("skip: skipping memfd_secret test (missing __NR_memfd_secret)\n");
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_memfd_secret */
--- a/tools/testing/selftests/mm/mkdirty.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/mkdirty.c
@@ -9,7 +9,7 @@
*/
#include <fcntl.h>
#include <signal.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <stdlib.h>
@@ -265,6 +265,7 @@ munmap:
munmap(mmap_mem, mmap_size);
}
+#ifdef __NR_userfaultfd
static void test_uffdio_copy(void)
{
struct uffdio_register uffdio_register;
@@ -322,6 +323,7 @@ munmap:
munmap(dst, pagesize);
free(src);
}
+#endif /* __NR_userfaultfd */
int main(void)
{
@@ -334,7 +336,9 @@ int main(void)
thpsize / 1024);
tests += 3;
}
+#ifdef __NR_userfaultfd
tests += 1;
+#endif /* __NR_userfaultfd */
ksft_print_header();
ksft_set_plan(tests);
@@ -364,7 +368,9 @@ int main(void)
if (thpsize)
test_pte_mapped_thp();
/* Placing a fresh page via userfaultfd may set the PTE dirty. */
+#ifdef __NR_userfaultfd
test_uffdio_copy();
+#endif /* __NR_userfaultfd */
err = ksft_get_fail_cnt();
if (err)
--- a/tools/testing/selftests/mm/mlock2.h~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/mlock2.h
@@ -3,7 +3,6 @@
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
-#include <asm-generic/unistd.h>
static int mlock2_(void *start, size_t len, int flags)
{
--- a/tools/testing/selftests/mm/protection_keys.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/protection_keys.c
@@ -42,7 +42,7 @@
#include <sys/wait.h>
#include <sys/stat.h>
#include <fcntl.h>
-#include <asm-generic/unistd.h>
+#include <unistd.h>
#include <sys/ptrace.h>
#include <setjmp.h>
--- a/tools/testing/selftests/mm/uffd-common.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-common.c
@@ -673,7 +673,11 @@ int uffd_open_dev(unsigned int flags)
int uffd_open_sys(unsigned int flags)
{
+#ifdef __NR_userfaultfd
return syscall(__NR_userfaultfd, flags);
+#else
+ return -1;
+#endif
}
int uffd_open(unsigned int flags)
--- a/tools/testing/selftests/mm/uffd-stress.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-stress.c
@@ -33,10 +33,11 @@
* pthread_mutex_lock will also verify the atomicity of the memory
* transfer (UFFDIO_COPY).
*/
-#include <asm-generic/unistd.h>
+
#include "uffd-common.h"
uint64_t features;
+#ifdef __NR_userfaultfd
#define BOUNCE_RANDOM (1<<0)
#define BOUNCE_RACINGFAULTS (1<<1)
@@ -471,3 +472,15 @@ int main(int argc, char **argv)
nr_pages, nr_pages_per_cpu);
return userfaultfd_stress();
}
+
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+ printf("skip: Skipping userfaultfd test (missing __NR_userfaultfd)\n");
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
--- a/tools/testing/selftests/mm/uffd-unit-tests.c~revert-selftests-mm-remove-local-__nr_-definitions
+++ a/tools/testing/selftests/mm/uffd-unit-tests.c
@@ -5,11 +5,12 @@
* Copyright (C) 2015-2023 Red Hat, Inc.
*/
-#include <asm-generic/unistd.h>
#include "uffd-common.h"
#include "../../../../mm/gup_test.h"
+#ifdef __NR_userfaultfd
+
/* The unit test doesn't need a large or random size, make it 32MB for now */
#define UFFD_TEST_MEM_SIZE (32UL << 20)
@@ -1558,3 +1559,14 @@ int main(int argc, char *argv[])
return ksft_get_fail_cnt() ? KSFT_FAIL : KSFT_PASS;
}
+#else /* __NR_userfaultfd */
+
+#warning "missing __NR_userfaultfd definition"
+
+int main(void)
+{
+ printf("Skipping %s (missing __NR_userfaultfd)\n", __file__);
+ return KSFT_SKIP;
+}
+
+#endif /* __NR_userfaultfd */
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
revert-selftests-mm-remove-local-__nr_-definitions.patch
From: Ashish Kalra <ashish.kalra(a)amd.com>
This patch-set fixes the current SNP host enabling code and effectively SNP
which is broken with respect to the KVM module being built-in.
Essentially SNP host enabling code should be invoked before KVM
initialization, which is currently not the case when KVM is built-in.
SNP host support is currently enabled in snp_rmptable_init() which is
invoked as a device_initcall(). Here device_initcall() is used as
snp_rmptable_init() expects AMD IOMMU SNP support to be enabled prior
to it and the AMD IOMMU driver enables SNP support after PCI bus enumeration.
This patch-set adds support to call snp_rmptable_init() early and
directly from iommu_snp_enable() (after checking and enabling IOMMU
SNP support) which enables SNP host support before KVM initialization
with kvm_amd module built-in.
Additionally the patch-set adds support to initialize PSP SEV driver
during KVM module probe time.
This patch-set has been tested with the following cases/scenarios:
1). kvm_amd module and PSP driver built-in.
2). kvm_amd module built-in with intremap=off kernel command line.
3). kvm_amd module built-in with iommu=off kernel command line.
4). kvm_amd and PSP driver built as modules.
5). kvm_amd built as module with iommu=off kernel command line.
6). kvm_amd module as built-in and PSP driver as module.
7). kvm_amd build as a module and PSP driver as built-in.
v4:
- Add warning if SNP support has been checked on IOMMUs and host
SNP support has been enabled but late IOMMU initialization fails
subsequently.
- Add reviewed-by's.
v3:
- Ensure that dropping the device_initcall() happens in the same
patch that wires up the IOMMU code to invoke snp_rmptable_init()
which then makes sure that snp_rmptable_init() is still getting
called and also merge patches 3 & 4.
- Fix commit logs.
v2:
- Drop calling iommu_snp_enable() early before enabling IOMMUs as
IOMMU subsystem gets initialized via subsys_initcall() and hence
snp_rmptable_init() cannot be invoked via subsys_initcall().
- Instead add support to call snp_rmptable_init() early and
directly via iommu_snp_enable().
- Fix commit logs.
Fixes: c3b86e61b756 ("x86/cpufeatures: Enable/unmask SEV-SNP CPU feature")
Ashish Kalra (1):
x86/sev: Fix broken SNP support with KVM module built-in
Sean Christopherson (2):
crypto: ccp: Add external API interface for PSP module initialization
KVM: SVM: Ensure PSP module is initialized if KVM module is built-in
arch/x86/include/asm/sev.h | 2 ++
arch/x86/kvm/svm/sev.c | 10 ++++++++++
arch/x86/virt/svm/sev.c | 23 +++++++----------------
drivers/crypto/ccp/sp-dev.c | 14 ++++++++++++++
drivers/iommu/amd/init.c | 34 ++++++++++++++++++++++++++++++----
include/linux/psp-sev.h | 9 +++++++++
6 files changed, 72 insertions(+), 20 deletions(-)
--
2.34.1
From: Joshua Washington <joshwash(a)google.com>
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.
Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable(a)vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
Reviewed-by: Jeroen de Borst <jeroendb(a)google.com>
Signed-off-by: Joshua Washington <joshwash(a)google.com>
---
drivers/net/ethernet/google/gve/gve.h | 10 ++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 6 +++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 8167cc5fb0df..78d2a19593d1 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1116,6 +1116,16 @@ static inline u32 gve_xdp_tx_start_queue_id(struct gve_priv *priv)
return gve_xdp_tx_queue_id(priv, 0);
}
+static inline bool gve_supports_xdp_xmit(struct gve_priv *priv)
+{
+ switch (priv->queue_format) {
+ case GVE_GQI_QPL_FORMAT:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* gqi napi handler defined in gve_main.c */
int gve_napi_poll(struct napi_struct *napi, int budget);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 533e659b15b3..92237fb0b60c 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1903,6 +1903,8 @@ static void gve_turndown(struct gve_priv *priv)
/* Stop tx queues */
netif_tx_disable(priv->dev);
+ xdp_features_clear_redirect_target(priv->dev);
+
gve_clear_napi_enabled(priv);
gve_clear_report_stats(priv);
@@ -1972,6 +1974,9 @@ static void gve_turnup(struct gve_priv *priv)
napi_schedule(&block->napi);
}
+ if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv))
+ xdp_features_set_redirect_target(priv->dev, false);
+
gve_set_napi_enabled(priv);
}
@@ -2246,7 +2251,6 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
xdp_features = NETDEV_XDP_ACT_BASIC;
xdp_features |= NETDEV_XDP_ACT_REDIRECT;
- xdp_features |= NETDEV_XDP_ACT_NDO_XMIT;
xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
} else {
xdp_features = 0;
--
2.48.1.601.g30ceb7b040-goog
From: Joshua Washington <joshwash(a)google.com>
Before this patch the NETDEV_XDP_ACT_NDO_XMIT XDP feature flag is set by
default as part of driver initialization, and is never cleared. However,
this flag differs from others in that it is used as an indicator for
whether the driver is ready to perform the ndo_xdp_xmit operation as
part of an XDP_REDIRECT. Kernel helpers
xdp_features_(set|clear)_redirect_target exist to convey this meaning.
This patch ensures that the netdev is only reported as a redirect target
when XDP queues exist to forward traffic.
Fixes: 39a7f4aa3e4a ("gve: Add XDP REDIRECT support for GQI-QPL format")
Cc: stable(a)vger.kernel.org
Reviewed-by: Praveen Kaligineedi <pkaligineedi(a)google.com>
Reviewed-by: Jeroen de Borst <jeroendb(a)google.com>
Signed-off-by: Joshua Washington <joshwash(a)google.com>
---
drivers/net/ethernet/google/gve/gve.h | 10 ++++++++++
drivers/net/ethernet/google/gve/gve_main.c | 6 +++++-
2 files changed, 15 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/google/gve/gve.h b/drivers/net/ethernet/google/gve/gve.h
index 8167cc5fb0df..78d2a19593d1 100644
--- a/drivers/net/ethernet/google/gve/gve.h
+++ b/drivers/net/ethernet/google/gve/gve.h
@@ -1116,6 +1116,16 @@ static inline u32 gve_xdp_tx_start_queue_id(struct gve_priv *priv)
return gve_xdp_tx_queue_id(priv, 0);
}
+static inline bool gve_supports_xdp_xmit(struct gve_priv *priv)
+{
+ switch (priv->queue_format) {
+ case GVE_GQI_QPL_FORMAT:
+ return true;
+ default:
+ return false;
+ }
+}
+
/* gqi napi handler defined in gve_main.c */
int gve_napi_poll(struct napi_struct *napi, int budget);
diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
index 533e659b15b3..92237fb0b60c 100644
--- a/drivers/net/ethernet/google/gve/gve_main.c
+++ b/drivers/net/ethernet/google/gve/gve_main.c
@@ -1903,6 +1903,8 @@ static void gve_turndown(struct gve_priv *priv)
/* Stop tx queues */
netif_tx_disable(priv->dev);
+ xdp_features_clear_redirect_target(priv->dev);
+
gve_clear_napi_enabled(priv);
gve_clear_report_stats(priv);
@@ -1972,6 +1974,9 @@ static void gve_turnup(struct gve_priv *priv)
napi_schedule(&block->napi);
}
+ if (priv->num_xdp_queues && gve_supports_xdp_xmit(priv))
+ xdp_features_set_redirect_target(priv->dev, false);
+
gve_set_napi_enabled(priv);
}
@@ -2246,7 +2251,6 @@ static void gve_set_netdev_xdp_features(struct gve_priv *priv)
if (priv->queue_format == GVE_GQI_QPL_FORMAT) {
xdp_features = NETDEV_XDP_ACT_BASIC;
xdp_features |= NETDEV_XDP_ACT_REDIRECT;
- xdp_features |= NETDEV_XDP_ACT_NDO_XMIT;
xdp_features |= NETDEV_XDP_ACT_XSK_ZEROCOPY;
} else {
xdp_features = 0;
--
2.48.1.601.g30ceb7b040-goog
Fix several issues in partition probing:
- The bailout for a bad partoffset must use put_dev_sector(), since the
preceding read_part_sector() succeeded.
- If the partition table claims a silly sector size like 0xfff bytes
(which results in partition table entries straddling sector boundaries),
bail out instead of accessing out-of-bounds memory.
- We must not assume that the partition table contains proper NUL
termination - use strnlen() and strncmp() instead of strlen() and
strcmp().
Cc: stable(a)vger.kernel.org
Signed-off-by: Jann Horn <jannh(a)google.com>
---
block/partitions/mac.c | 18 +++++++++++++++---
1 file changed, 15 insertions(+), 3 deletions(-)
diff --git a/block/partitions/mac.c b/block/partitions/mac.c
index c80183156d68020e0e14974308ac751b3df84421..b02530d986297058de0db929fbf638a76fc44508 100644
--- a/block/partitions/mac.c
+++ b/block/partitions/mac.c
@@ -53,13 +53,25 @@ int mac_partition(struct parsed_partitions *state)
}
secsize = be16_to_cpu(md->block_size);
put_dev_sector(sect);
+
+ /*
+ * If the "block size" is not a power of 2, things get weird - we might
+ * end up with a partition straddling a sector boundary, so we wouldn't
+ * be able to read a partition entry with read_part_sector().
+ * Real block sizes are probably (?) powers of two, so just require
+ * that.
+ */
+ if (!is_power_of_2(secsize))
+ return -1;
datasize = round_down(secsize, 512);
data = read_part_sector(state, datasize / 512, §);
if (!data)
return -1;
partoffset = secsize % 512;
- if (partoffset + sizeof(*part) > datasize)
+ if (partoffset + sizeof(*part) > datasize) {
+ put_dev_sector(sect);
return -1;
+ }
part = (struct mac_partition *) (data + partoffset);
if (be16_to_cpu(part->signature) != MAC_PARTITION_MAGIC) {
put_dev_sector(sect);
@@ -112,8 +124,8 @@ int mac_partition(struct parsed_partitions *state)
int i, l;
goodness++;
- l = strlen(part->name);
- if (strcmp(part->name, "/") == 0)
+ l = strnlen(part->name, sizeof(part->name));
+ if (strncmp(part->name, "/", sizeof(part->name)) == 0)
goodness++;
for (i = 0; i <= l - 4; ++i) {
if (strncasecmp(part->name + i, "root",
---
base-commit: ab68d7eb7b1a64f3f4710da46cc5f93c6c154942
change-id: 20250214-partition-mac-2e7114c62223
--
Jann Horn <jannh(a)google.com>