From: Shyjumon N <shyjumon.n(a)intel.com>
commit 1fae37accfc5872af3905d4ba71dc6ab15829be7 upstream
The Samsung SSD SM981/PM981 and Toshiba SSD KBG40ZNT256G on the Lenovo
C640 platform experience runtime resume issues when the SSDs are kept in
sleep/suspend mode for long time.
This patch applies the 'Simple Suspend' quirk to these configurations.
With this patch, the issue had not been observed in a 1+ day test.
Reviewed-by: Jon Derrick <jonathan.derrick(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Shyjumon N <shyjumon.n(a)intel.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
Signed-off-by: Erpeng Xu <xuerpeng(a)uniontech.com>
---
drivers/nvme/host/pci.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 9c80f9f08149..b0434b687b17 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -2747,6 +2747,18 @@ static unsigned long check_vendor_combination_bug(struct pci_dev *pdev)
(dmi_match(DMI_BOARD_NAME, "PRIME B350M-A") ||
dmi_match(DMI_BOARD_NAME, "PRIME Z370-A")))
return NVME_QUIRK_NO_APST;
+ } else if ((pdev->vendor == 0x144d && (pdev->device == 0xa801 ||
+ pdev->device == 0xa808 || pdev->device == 0xa809)) ||
+ (pdev->vendor == 0x1e0f && pdev->device == 0x0001)) {
+ /*
+ * Forcing to use host managed nvme power settings for
+ * lowest idle power with quick resume latency on
+ * Samsung and Toshiba SSDs based on suspend behavior
+ * on Coffee Lake board for LENOVO C640
+ */
+ if ((dmi_match(DMI_BOARD_VENDOR, "LENOVO")) &&
+ dmi_match(DMI_BOARD_NAME, "LNVNB161216"))
+ return NVME_QUIRK_SIMPLE_SUSPEND;
}
return 0;
--
2.45.2
commit f442fa6141379a20b48ae3efabee827a3d260787 upstream
A kernel warning was reported when pinning folio in CMA memory when
launching SEV virtual machine. The splat looks like:
[ 464.325306] WARNING: CPU: 13 PID: 6734 at mm/gup.c:1313 __get_user_pages+0x423/0x520
[ 464.325464] CPU: 13 PID: 6734 Comm: qemu-kvm Kdump: loaded Not tainted 6.6.33+ #6
[ 464.325477] RIP: 0010:__get_user_pages+0x423/0x520
[ 464.325515] Call Trace:
[ 464.325520] <TASK>
[ 464.325523] ? __get_user_pages+0x423/0x520
[ 464.325528] ? __warn+0x81/0x130
[ 464.325536] ? __get_user_pages+0x423/0x520
[ 464.325541] ? report_bug+0x171/0x1a0
[ 464.325549] ? handle_bug+0x3c/0x70
[ 464.325554] ? exc_invalid_op+0x17/0x70
[ 464.325558] ? asm_exc_invalid_op+0x1a/0x20
[ 464.325567] ? __get_user_pages+0x423/0x520
[ 464.325575] __gup_longterm_locked+0x212/0x7a0
[ 464.325583] internal_get_user_pages_fast+0xfb/0x190
[ 464.325590] pin_user_pages_fast+0x47/0x60
[ 464.325598] sev_pin_memory+0xca/0x170 [kvm_amd]
[ 464.325616] sev_mem_enc_register_region+0x81/0x130 [kvm_amd]
Per the analysis done by yangge, when starting the SEV virtual machine, it
will call pin_user_pages_fast(..., FOLL_LONGTERM, ...) to pin the memory.
But the page is in CMA area, so fast GUP will fail then fallback to the
slow path due to the longterm pinnalbe check in try_grab_folio().
The slow path will try to pin the pages then migrate them out of CMA area.
But the slow path also uses try_grab_folio() to pin the page, it will
also fail due to the same check then the above warning is triggered.
In addition, the try_grab_folio() is supposed to be used in fast path and
it elevates folio refcount by using add ref unless zero. We are guaranteed
to have at least one stable reference in slow path, so the simple atomic add
could be used. The performance difference should be trivial, but the
misuse may be confusing and misleading.
Redefined try_grab_folio() to try_grab_folio_fast(), and try_grab_page()
to try_grab_folio(), and use them in the proper paths. This solves both
the abuse and the kernel warning.
The proper naming makes their usecase more clear and should prevent from
abusing in the future.
peterx said:
: The user will see the pin fails, for gpu-slow it further triggers the WARN
: right below that failure (as in the original report):
:
: folio = try_grab_folio(page, page_increm - 1,
: foll_flags);
: if (WARN_ON_ONCE(!folio)) { <------------------------ here
: /*
: * Release the 1st page ref if the
: * folio is problematic, fail hard.
: */
: gup_put_folio(page_folio(page), 1,
: foll_flags);
: ret = -EFAULT;
: goto out;
: }
[1] https://lore.kernel.org/linux-mm/1719478388-31917-1-git-send-email-yangge11…
[shy828301(a)gmail.com: fix implicit declaration of function try_grab_folio_fast]
Link: https://lkml.kernel.org/r/CAHbLzkowMSso-4Nufc9hcMehQsK9PNz3OSu-+eniU-2Mm-xj…
Link: https://lkml.kernel.org/r/20240628191458.2605553-1-yang@os.amperecomputing.…
Fixes: 57edfcfd3419 ("mm/gup: accelerate thp gup even for "pages != NULL"")
Signed-off-by: Yang Shi <yang(a)os.amperecomputing.com>
Reported-by: yangge <yangge1116(a)126.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [6.6+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 251 ++++++++++++++++++++++++-----------------------
mm/huge_memory.c | 6 +-
mm/hugetlb.c | 2 +-
mm/internal.h | 4 +-
4 files changed, 135 insertions(+), 128 deletions(-)
v2: Fixed a build failure
diff --git a/mm/gup.c b/mm/gup.c
index f50fe2219a13..fdd75384160d 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -97,95 +97,6 @@ static inline struct folio *try_get_folio(struct page *page, int refs)
return folio;
}
-/**
- * try_grab_folio() - Attempt to get or pin a folio.
- * @page: pointer to page to be grabbed
- * @refs: the value to (effectively) add to the folio's refcount
- * @flags: gup flags: these are the FOLL_* flag values.
- *
- * "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
- *
- * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
- * same time. (That's true throughout the get_user_pages*() and
- * pin_user_pages*() APIs.) Cases:
- *
- * FOLL_GET: folio's refcount will be incremented by @refs.
- *
- * FOLL_PIN on large folios: folio's refcount will be incremented by
- * @refs, and its pincount will be incremented by @refs.
- *
- * FOLL_PIN on single-page folios: folio's refcount will be incremented by
- * @refs * GUP_PIN_COUNTING_BIAS.
- *
- * Return: The folio containing @page (with refcount appropriately
- * incremented) for success, or NULL upon failure. If neither FOLL_GET
- * nor FOLL_PIN was set, that's considered failure, and furthermore,
- * a likely bug in the caller, so a warning is also emitted.
- */
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
-{
- struct folio *folio;
-
- if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
- return NULL;
-
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
- return NULL;
-
- if (flags & FOLL_GET)
- return try_get_folio(page, refs);
-
- /* FOLL_PIN is set */
-
- /*
- * Don't take a pin on the zero page - it's not going anywhere
- * and it is used in a *lot* of places.
- */
- if (is_zero_page(page))
- return page_folio(page);
-
- folio = try_get_folio(page, refs);
- if (!folio)
- return NULL;
-
- /*
- * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
- * right zone, so fail and let the caller fall back to the slow
- * path.
- */
- if (unlikely((flags & FOLL_LONGTERM) &&
- !folio_is_longterm_pinnable(folio))) {
- if (!put_devmap_managed_page_refs(&folio->page, refs))
- folio_put_refs(folio, refs);
- return NULL;
- }
-
- /*
- * When pinning a large folio, use an exact count to track it.
- *
- * However, be sure to *also* increment the normal folio
- * refcount field at least once, so that the folio really
- * is pinned. That's why the refcount from the earlier
- * try_get_folio() is left intact.
- */
- if (folio_test_large(folio))
- atomic_add(refs, &folio->_pincount);
- else
- folio_ref_add(folio,
- refs * (GUP_PIN_COUNTING_BIAS - 1));
- /*
- * Adjust the pincount before re-checking the PTE for changes.
- * This is essentially a smp_mb() and is paired with a memory
- * barrier in page_try_share_anon_rmap().
- */
- smp_mb__after_atomic();
-
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
-
- return folio;
-}
-
static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
{
if (flags & FOLL_PIN) {
@@ -203,58 +114,59 @@ static void gup_put_folio(struct folio *folio, int refs, unsigned int flags)
}
/**
- * try_grab_page() - elevate a page's refcount by a flag-dependent amount
- * @page: pointer to page to be grabbed
- * @flags: gup flags: these are the FOLL_* flag values.
+ * try_grab_folio() - add a folio's refcount by a flag-dependent amount
+ * @folio: pointer to folio to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values
*
* This might not do anything at all, depending on the flags argument.
*
* "grab" names in this file mean, "look at flags to decide whether to use
- * FOLL_PIN or FOLL_GET behavior, when incrementing the page's refcount.
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
*
* Either FOLL_PIN or FOLL_GET (or neither) may be set, but not both at the same
- * time. Cases: please see the try_grab_folio() documentation, with
- * "refs=1".
+ * time.
*
* Return: 0 for success, or if no action was required (if neither FOLL_PIN
* nor FOLL_GET was set, nothing is done). A negative error code for failure:
*
- * -ENOMEM FOLL_GET or FOLL_PIN was set, but the page could not
+ * -ENOMEM FOLL_GET or FOLL_PIN was set, but the folio could not
* be grabbed.
+ *
+ * It is called when we have a stable reference for the folio, typically in
+ * GUP slow path.
*/
-int __must_check try_grab_page(struct page *page, unsigned int flags)
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags)
{
- struct folio *folio = page_folio(page);
-
if (WARN_ON_ONCE(folio_ref_count(folio) <= 0))
return -ENOMEM;
- if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(&folio->page)))
return -EREMOTEIO;
if (flags & FOLL_GET)
- folio_ref_inc(folio);
+ folio_ref_add(folio, refs);
else if (flags & FOLL_PIN) {
/*
* Don't take a pin on the zero page - it's not going anywhere
* and it is used in a *lot* of places.
*/
- if (is_zero_page(page))
+ if (is_zero_folio(folio))
return 0;
/*
- * Similar to try_grab_folio(): be sure to *also*
- * increment the normal page refcount field at least once,
+ * Increment the normal page refcount field at least once,
* so that the page really is pinned.
*/
if (folio_test_large(folio)) {
- folio_ref_add(folio, 1);
- atomic_add(1, &folio->_pincount);
+ folio_ref_add(folio, refs);
+ atomic_add(refs, &folio->_pincount);
} else {
- folio_ref_add(folio, GUP_PIN_COUNTING_BIAS);
+ folio_ref_add(folio, refs * GUP_PIN_COUNTING_BIAS);
}
- node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, 1);
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
}
return 0;
@@ -647,8 +559,8 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- /* try_grab_page() does nothing unless FOLL_GET or FOLL_PIN is set. */
- ret = try_grab_page(page, flags);
+ /* try_grab_folio() does nothing unless FOLL_GET or FOLL_PIN is set. */
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (unlikely(ret)) {
page = ERR_PTR(ret);
goto out;
@@ -899,7 +811,7 @@ static int get_gate_page(struct mm_struct *mm, unsigned long address,
goto unmap;
*page = pte_page(entry);
}
- ret = try_grab_page(*page, gup_flags);
+ ret = try_grab_folio(page_folio(*page), 1, gup_flags);
if (unlikely(ret))
goto unmap;
out:
@@ -1302,20 +1214,19 @@ static long __get_user_pages(struct mm_struct *mm,
* pages.
*/
if (page_increm > 1) {
- struct folio *folio;
+ struct folio *folio = page_folio(page);
/*
* Since we already hold refcount on the
* large folio, this should never fail.
*/
- folio = try_grab_folio(page, page_increm - 1,
- foll_flags);
- if (WARN_ON_ONCE(!folio)) {
+ if (try_grab_folio(folio, page_increm - 1,
+ foll_flags)) {
/*
* Release the 1st page ref if the
* folio is problematic, fail hard.
*/
- gup_put_folio(page_folio(page), 1,
+ gup_put_folio(folio, 1,
foll_flags);
ret = -EFAULT;
goto out;
@@ -2541,6 +2452,102 @@ static void __maybe_unused undo_dev_pagemap(int *nr, int nr_start,
}
}
+/**
+ * try_grab_folio_fast() - Attempt to get or pin a folio in fast path.
+ * @page: pointer to page to be grabbed
+ * @refs: the value to (effectively) add to the folio's refcount
+ * @flags: gup flags: these are the FOLL_* flag values.
+ *
+ * "grab" names in this file mean, "look at flags to decide whether to use
+ * FOLL_PIN or FOLL_GET behavior, when incrementing the folio's refcount.
+ *
+ * Either FOLL_PIN or FOLL_GET (or neither) must be set, but not both at the
+ * same time. (That's true throughout the get_user_pages*() and
+ * pin_user_pages*() APIs.) Cases:
+ *
+ * FOLL_GET: folio's refcount will be incremented by @refs.
+ *
+ * FOLL_PIN on large folios: folio's refcount will be incremented by
+ * @refs, and its pincount will be incremented by @refs.
+ *
+ * FOLL_PIN on single-page folios: folio's refcount will be incremented by
+ * @refs * GUP_PIN_COUNTING_BIAS.
+ *
+ * Return: The folio containing @page (with refcount appropriately
+ * incremented) for success, or NULL upon failure. If neither FOLL_GET
+ * nor FOLL_PIN was set, that's considered failure, and furthermore,
+ * a likely bug in the caller, so a warning is also emitted.
+ *
+ * It uses add ref unless zero to elevate the folio refcount and must be called
+ * in fast path only.
+ */
+static struct folio *try_grab_folio_fast(struct page *page, int refs,
+ unsigned int flags)
+{
+ struct folio *folio;
+
+ /* Raise warn if it is not called in fast GUP */
+ VM_WARN_ON_ONCE(!irqs_disabled());
+
+ if (WARN_ON_ONCE((flags & (FOLL_GET | FOLL_PIN)) == 0))
+ return NULL;
+
+ if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
+ return NULL;
+
+ if (flags & FOLL_GET)
+ return try_get_folio(page, refs);
+
+ /* FOLL_PIN is set */
+
+ /*
+ * Don't take a pin on the zero page - it's not going anywhere
+ * and it is used in a *lot* of places.
+ */
+ if (is_zero_page(page))
+ return page_folio(page);
+
+ folio = try_get_folio(page, refs);
+ if (!folio)
+ return NULL;
+
+ /*
+ * Can't do FOLL_LONGTERM + FOLL_PIN gup fast path if not in a
+ * right zone, so fail and let the caller fall back to the slow
+ * path.
+ */
+ if (unlikely((flags & FOLL_LONGTERM) &&
+ !folio_is_longterm_pinnable(folio))) {
+ if (!put_devmap_managed_page_refs(&folio->page, refs))
+ folio_put_refs(folio, refs);
+ return NULL;
+ }
+
+ /*
+ * When pinning a large folio, use an exact count to track it.
+ *
+ * However, be sure to *also* increment the normal folio
+ * refcount field at least once, so that the folio really
+ * is pinned. That's why the refcount from the earlier
+ * try_get_folio() is left intact.
+ */
+ if (folio_test_large(folio))
+ atomic_add(refs, &folio->_pincount);
+ else
+ folio_ref_add(folio,
+ refs * (GUP_PIN_COUNTING_BIAS - 1));
+ /*
+ * Adjust the pincount before re-checking the PTE for changes.
+ * This is essentially a smp_mb() and is paired with a memory
+ * barrier in folio_try_share_anon_rmap_*().
+ */
+ smp_mb__after_atomic();
+
+ node_stat_mod_folio(folio, NR_FOLL_PIN_ACQUIRED, refs);
+
+ return folio;
+}
+
#ifdef CONFIG_ARCH_HAS_PTE_SPECIAL
/*
* Fast-gup relies on pte change detection to avoid concurrent pgtable
@@ -2605,7 +2612,7 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- folio = try_grab_folio(page, 1, flags);
+ folio = try_grab_folio_fast(page, 1, flags);
if (!folio)
goto pte_unmap;
@@ -2699,7 +2706,7 @@ static int __gup_device_huge(unsigned long pfn, unsigned long addr,
SetPageReferenced(page);
pages[*nr] = page;
- if (unlikely(try_grab_page(page, flags))) {
+ if (unlikely(try_grab_folio(page_folio(page), 1, flags))) {
undo_dev_pagemap(nr, nr_start, flags, pages);
break;
}
@@ -2808,7 +2815,7 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
page = nth_page(pte_page(pte), (addr & (sz - 1)) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2879,7 +2886,7 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
page = nth_page(pmd_page(orig), (addr & ~PMD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2923,7 +2930,7 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
page = nth_page(pud_page(orig), (addr & ~PUD_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
@@ -2963,7 +2970,7 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
page = nth_page(pgd_page(orig), (addr & ~PGDIR_MASK) >> PAGE_SHIFT);
refs = record_subpages(page, addr, end, pages + *nr);
- folio = try_grab_folio(page, refs, flags);
+ folio = try_grab_folio_fast(page, refs, flags);
if (!folio)
return 0;
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 79fbd6ddec49..2e64897168bc 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1052,7 +1052,7 @@ struct page *follow_devmap_pmd(struct vm_area_struct *vma, unsigned long addr,
if (!*pgmap)
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
@@ -1210,7 +1210,7 @@ struct page *follow_devmap_pud(struct vm_area_struct *vma, unsigned long addr,
return ERR_PTR(-EFAULT);
page = pfn_to_page(pfn);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
page = ERR_PTR(ret);
@@ -1471,7 +1471,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
!PageAnonExclusive(page), page);
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (ret)
return ERR_PTR(ret);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a480affd475b..ab040f8d1987 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6532,7 +6532,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
* try_grab_page() should always be able to get the page here,
* because we hold the ptl lock and have verified pte_present().
*/
- ret = try_grab_page(page, flags);
+ ret = try_grab_folio(page_folio(page), 1, flags);
if (WARN_ON_ONCE(ret)) {
page = ERR_PTR(ret);
diff --git a/mm/internal.h b/mm/internal.h
index abed947f784b..ef8d787a510c 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -938,8 +938,8 @@ int migrate_device_coherent_page(struct page *page);
/*
* mm/gup.c
*/
-struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags);
-int __must_check try_grab_page(struct page *page, unsigned int flags);
+int __must_check try_grab_folio(struct folio *folio, int refs,
+ unsigned int flags);
/*
* mm/huge_memory.c
--
2.41.0
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-6.1.y that closes
the information leak described in the above thread.
I started with v6.1.y because that is the most recent LTS kernel
and thus the closest to upstream. I plan to look at 5.15 and 5.10
LTS too if this series is applied to v6.1.y.
Review comments welcome.
Chuck Lever (6):
NFSD: Refactor nfsd_reply_cache_free_locked()
NFSD: Rename nfsd_reply_cache_alloc()
NFSD: Replace nfsd_prune_bucket()
NFSD: Refactor the duplicate reply cache shrinker
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Jeff Layton (2):
nfsd: move reply cache initialization into nfsd startup
nfsd: move init of percpu reply_cache_stats counters back to
nfsd_init_net
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
fs/lockd/svc.c | 3 -
fs/nfs/callback.c | 3 -
fs/nfsd/export.c | 32 ++++--
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 ++++-
fs/nfsd/nfs4proc.c | 6 +-
fs/nfsd/nfscache.c | 201 ++++++++++++++++++++++---------------
fs/nfsd/nfsctl.c | 24 ++---
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 24 +++--
fs/nfsd/stats.c | 52 ++++------
fs/nfsd/stats.h | 83 ++++++---------
fs/nfsd/trace.h | 22 ++++
fs/nfsd/vfs.c | 6 +-
include/linux/sunrpc/svc.h | 5 +-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 36 ++++---
18 files changed, 301 insertions(+), 231 deletions(-)
--
2.45.1
From: Chuck Lever <chuck.lever(a)oracle.com>
Following up on:
https://lore.kernel.org/linux-nfs/d4b235df-4ee5-4824-9d48-e3b3c1f1f4d1@orac…
Here is a backport series targeting origin/linux-6.6.y that closes
the information leak described in the above thread. It passes basic
NFSD regression testing.
Review comments welcome.
Chuck Lever (2):
NFSD: Rewrite synopsis of nfsd_percpu_counters_init()
NFSD: Fix frame size warning in svc_export_parse()
Josef Bacik (10):
sunrpc: don't change ->sv_stats if it doesn't exist
nfsd: stop setting ->pg_stats for unused stats
sunrpc: pass in the sv_stats struct through svc_create_pooled
sunrpc: remove ->pg_stats from svc_program
sunrpc: use the struct net as the svc proc private
nfsd: rename NFSD_NET_* to NFSD_STATS_*
nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
nfsd: make all of the nfsd stats per-network namespace
nfsd: remove nfsd_stats, make th_cnt a global counter
nfsd: make svc_stat per-network namespace instead of global
fs/lockd/svc.c | 3 --
fs/nfs/callback.c | 3 --
fs/nfsd/cache.h | 2 -
fs/nfsd/export.c | 32 ++++++++++----
fs/nfsd/export.h | 4 +-
fs/nfsd/netns.h | 25 +++++++++--
fs/nfsd/nfs4proc.c | 6 +--
fs/nfsd/nfs4state.c | 3 +-
fs/nfsd/nfscache.c | 40 ++++-------------
fs/nfsd/nfsctl.c | 16 +++----
fs/nfsd/nfsd.h | 1 +
fs/nfsd/nfsfh.c | 3 +-
fs/nfsd/nfssvc.c | 14 +++---
fs/nfsd/stats.c | 54 ++++++++++-------------
fs/nfsd/stats.h | 88 ++++++++++++++------------------------
fs/nfsd/vfs.c | 6 ++-
include/linux/sunrpc/svc.h | 5 ++-
net/sunrpc/stats.c | 2 +-
net/sunrpc/svc.c | 39 +++++++++++------
19 files changed, 163 insertions(+), 183 deletions(-)
--
2.45.1
The patch below does not apply to the 6.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.10.y
git checkout FETCH_HEAD
git cherry-pick -x 7697a0fe0154468f5df35c23ebd7aa48994c2cdc
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024072921-props-yam-bb2b@gregkh' --subject-prefix 'PATCH 6.10.y' HEAD^..
Possible dependencies:
7697a0fe0154 ("LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h")
26a3b85bac08 ("loongarch: convert to generic syscall table")
505d66d1abfb ("clone3: drop __ARCH_WANT_SYS_CLONE3 macro")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 7697a0fe0154468f5df35c23ebd7aa48994c2cdc Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 20 Jul 2024 22:40:58 +0800
Subject: [PATCH] LoongArch: Define __ARCH_WANT_NEW_STAT in unistd.h
Chromium sandbox apparently wants to deny statx [1] so it could properly
inspect arguments after the sandboxed process later falls back to fstat.
Because there's currently not a "fd-only" version of statx, so that the
sandbox has no way to ensure the path argument is empty without being
able to peek into the sandboxed process's memory. For architectures able
to do newfstatat though, glibc falls back to newfstatat after getting
-ENOSYS for statx, then the respective SIGSYS handler [2] takes care of
inspecting the path argument, transforming allowed newfstatat's into
fstat instead which is allowed and has the same type of return value.
But, as LoongArch is the first architecture to not have fstat nor
newfstatat, the LoongArch glibc does not attempt falling back at all
when it gets -ENOSYS for statx -- and you see the problem there!
Actually, back when the LoongArch port was under review, people were
aware of the same problem with sandboxing clone3 [3], so clone was
eventually kept. Unfortunately it seemed at that time no one had noticed
statx, so besides restoring fstat/newfstatat to LoongArch uapi (and
postponing the problem further), it seems inevitable that we would need
to tackle seccomp deep argument inspection.
However, this is obviously a decision that shouldn't be taken lightly,
so we just restore fstat/newfstatat by defining __ARCH_WANT_NEW_STAT
in unistd.h. This is the simplest solution for now, and so we hope the
community will tackle the long-standing problem of seccomp deep argument
inspection in the future [4][5].
Also add "newstat" to syscall_abis_64 in Makefile.syscalls due to
upstream asm-generic changes.
More infomation please reading this thread [6].
[1] https://chromium-review.googlesource.com/c/chromium/src/+/2823150
[2] https://chromium.googlesource.com/chromium/src/sandbox/+/c085b51940bd/linux…
[3] https://lore.kernel.org/linux-arch/20220511211231.GG7074@brightrain.aerifal…
[4] https://lwn.net/Articles/799557/
[5] https://lpc.events/event/4/contributions/560/attachments/397/640/deep-arg-i…
[6] https://lore.kernel.org/loongarch/20240226-granit-seilschaft-eccc2433014d@b…
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/unistd.h b/arch/loongarch/include/asm/unistd.h
index fc0a481a7416..e2c0f3d86c7b 100644
--- a/arch/loongarch/include/asm/unistd.h
+++ b/arch/loongarch/include/asm/unistd.h
@@ -8,6 +8,7 @@
#include <uapi/asm/unistd.h>
+#define __ARCH_WANT_NEW_STAT
#define __ARCH_WANT_SYS_CLONE
#define NR_syscalls (__NR_syscalls)
diff --git a/arch/loongarch/kernel/Makefile.syscalls b/arch/loongarch/kernel/Makefile.syscalls
index ab7d9baa2915..523bb411a3bc 100644
--- a/arch/loongarch/kernel/Makefile.syscalls
+++ b/arch/loongarch/kernel/Makefile.syscalls
@@ -1,4 +1,3 @@
# SPDX-License-Identifier: GPL-2.0
-# No special ABIs on loongarch so far
-syscall_abis_64 +=
+syscall_abis_64 += newstat
To clarify…
On 02/07/2024 5:54 pm, Calum Mackay wrote:
> hi Petr,
>
> I noticed your LTP patch [1][2] which adjusts the nfsstat01 test on v6.9
> kernels, to account for Josef's changes [3], which restrict the NFS/RPC
> stats per-namespace.
>
> I see that Josef's changes were backported, as far back as longterm
> v5.4,
Sorry, that's not quite accurate.
Josef's NFS client changes were all backported from v6.9, as far as
longterm v5.4.y:
2057a48d0dd0 sunrpc: add a struct rpc_stats arg to rpc_create_args
d47151b79e32 nfs: expose /proc/net/sunrpc/nfs in net namespaces
1548036ef120 nfs: make the rpc_stat per net namespace
Of Josef's NFS server changes, four were backported from v6.9 to v6.8:
418b9687dece sunrpc: use the struct net as the svc proc private
d98416cc2154 nfsd: rename NFSD_NET_* to NFSD_STATS_*
93483ac5fec6 nfsd: expose /proc/net/sunrpc/nfsd in net namespaces
4b14885411f7 nfsd: make all of the nfsd stats per-network namespace
and the others remained only in v6.9:
ab42f4d9a26f sunrpc: don't change ->sv_stats if it doesn't exist
a2214ed588fb nfsd: stop setting ->pg_stats for unused stats
f09432386766 sunrpc: pass in the sv_stats struct through svc_create_pooled
3f6ef182f144 sunrpc: remove ->pg_stats from svc_program
e41ee44cc6a4 nfsd: remove nfsd_stats, make th_cnt a global counter
16fb9808ab2c nfsd: make svc_stat per-network namespace instead of global
I'm wondering if this difference between NFS client, and NFS server,
stat behaviour, across kernel versions, may perhaps cause some user
confusion?
cheers,
calum.
> so your check for kernel version "6.9" in the test may need to be
> adjusted, if LTP is intended to be run on stable kernels?
>
> best wishes,
> calum.
>
>
> [1] https://lore.kernel.org/ltp/20240620111129.594449-1-pvorel@suse.cz/
> [2] https://patchwork.ozlabs.org/project/ltp/
> patch/20240620111129.594449-1-pvorel(a)suse.cz/
> [3] https://lore.kernel.org/linux-nfs/
> cover.1708026931.git.josef(a)toxicpanda.com/
The DWC3_EP_RESOURCE_ALLOCATED flag ensures that the resource of an
endpoint is only assigned once. Unless the endpoint is reset, don't
clear this flag. Otherwise we may set endpoint resource again, which
prevents the driver from initiate transfer after handling a STALL or
endpoint halt to the control endpoint.
Commit f2e0eee47038 ("usb: dwc3: ep0: Don't reset resource alloc flag")
was fixing the initial issue, but did this only for physical ep1. Since
the function dwc3_ep0_stall_and_restart is resetting the flags for both
physical endpoints, this also has to be done for ep0.
Cc: stable(a)vger.kernel.org
Fixes: b311048c174d ("usb: dwc3: gadget: Rewrite endpoint allocation flow")
Acked-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Michael Grzeschik <m.grzeschik(a)pengutronix.de>
---
v2: Added missing double quotes in the referenced patch name
- Link to v1: https://lore.kernel.org/r/20240814-dwc3hwep0reset-v1-1-087b0d26f3d0@pengutr…
---
drivers/usb/dwc3/ep0.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/dwc3/ep0.c b/drivers/usb/dwc3/ep0.c
index d96ffbe520397..c9533a99e47c8 100644
--- a/drivers/usb/dwc3/ep0.c
+++ b/drivers/usb/dwc3/ep0.c
@@ -232,7 +232,8 @@ void dwc3_ep0_stall_and_restart(struct dwc3 *dwc)
/* stall is always issued on EP0 */
dep = dwc->eps[0];
__dwc3_gadget_ep_set_halt(dep, 1, false);
- dep->flags = DWC3_EP_ENABLED;
+ dep->flags &= DWC3_EP_RESOURCE_ALLOCATED;
+ dep->flags |= DWC3_EP_ENABLED;
dwc->delayed_status = false;
if (!list_empty(&dep->pending_list)) {
---
base-commit: 38343be0bf9a7d7ef0d160da5f2db887a0e29b62
change-id: 20240814-dwc3hwep0reset-b4d371873494
Best regards,
--
Michael Grzeschik <m.grzeschik(a)pengutronix.de>
pinmux_generic_get_function() can return NULL and the pointer 'function'
was dereferenced without checking against NULL. Add checking of pointer
'function' in pcs_get_function().
Found by code review.
Cc: stable(a)vger.kernel.org
Fixes: 571aec4df5b7 ("pinctrl: single: Use generic pinmux helpers for managing functions")
Signed-off-by: Ma Ke <make24(a)iscas.ac.cn>
---
drivers/pinctrl/pinctrl-single.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/pinctrl-single.c b/drivers/pinctrl/pinctrl-single.c
index 4c6bfabb6bd7..4da3c3f422b6 100644
--- a/drivers/pinctrl/pinctrl-single.c
+++ b/drivers/pinctrl/pinctrl-single.c
@@ -345,6 +345,8 @@ static int pcs_get_function(struct pinctrl_dev *pctldev, unsigned pin,
return -ENOTSUPP;
fselector = setting->func;
function = pinmux_generic_get_function(pctldev, fselector);
+ if (!function)
+ return -EINVAL;
*func = function->data;
if (!(*func)) {
dev_err(pcs->dev, "%s could not find function%i\n",
--
2.25.1