The patch titled
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
Date: Fri, 22 Sep 2023 12:58:04 +0100
When called with a swap entry that does not embed a PFN (e.g.
PTE_MARKER_POISONED or PTE_MARKER_UFFD_WP), the previous implementation of
set_huge_pte_at() would either cause a BUG() to fire (if CONFIG_DEBUG_VM
is enabled) or cause a dereference of an invalid address and subsequent
panic.
arm64's huge pte implementation supports multiple huge page sizes, some of
which are implemented in the page table with multiple contiguous entries.
So set_huge_pte_at() needs to work out how big the logical pte is, so that
it can also work out how many physical ptes (or pmds) need to be written.
It previously did this by grabbing the folio out of the pte and querying
its size.
However, there are cases when the pte being set is actually a swap entry.
But this also used to work fine, because for huge ptes, we only ever saw
migration entries and hwpoison entries. And both of these types of swap
entries have a PFN embedded, so the code would grab that and everything
still worked out.
But over time, more calls to set_huge_pte_at() have been added that set
swap entry types that do not embed a PFN. And this causes the code to go
bang. The triggering case is for the uffd poison test, commit
99aa77215ad0 ("selftests/mm: add uffd unit test for UFFDIO_POISON"), which
causes a PTE_MARKER_POISONED swap entry to be set, coutesey of commit
8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs") -
added in v6.5-rc7. Although review shows that there are other call sites
that set PTE_MARKER_UFFD_WP (which also has no PFN), these don't trigger
on arm64 because arm64 doesn't support UFFD WP.
Arguably, the root cause is really due to commit 18f3962953e4 ("mm:
hugetlb: kill set_huge_swap_pte_at()"), which aimed to simplify the
interface to the core code by removing set_huge_swap_pte_at() (which took
a page size parameter) and replacing it with calls to set_huge_pte_at()
where the size was inferred from the folio, as descibed above. While that
commit didn't break anything at the time, it did break the interface
because it couldn't handle swap entries without PFNs. And since then new
callers have come along which rely on this working. But given the
brokeness is only observable after commit 8a13897fb0da ("mm: userfaultfd:
support UFFDIO_POISON for hugetlbfs"), that one gets the Fixes tag.
Now that we have modified the set_huge_pte_at() interface to pass the huge
page size in the previous patch, we can trivially fix this issue.
Link: https://lkml.kernel.org/r/20230922115804.2043771-3-ryan.roberts@arm.com
Fixes: 8a13897fb0da ("mm: userfaultfd: support UFFDIO_POISON for hugetlbfs")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: Albert Ou <aou(a)eecs.berkeley.edu>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Alexandre Ghiti <alex(a)ghiti.fr>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Gerald Schaefer <gerald.schaefer(a)linux.ibm.com>
Cc: Heiko Carstens <hca(a)linux.ibm.com>
Cc: Helge Deller <deller(a)gmx.de>
Cc: "James E.J. Bottomley" <James.Bottomley(a)HansenPartnership.com>
Cc: Lorenzo Stoakes <lstoakes(a)gmail.com>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Palmer Dabbelt <palmer(a)dabbelt.com>
Cc: Paul Walmsley <paul.walmsley(a)sifive.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: SeongJae Park <sj(a)kernel.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Vasily Gorbik <gor(a)linux.ibm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org> [6.5+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/arm64/mm/hugetlbpage.c | 17 +++--------------
1 file changed, 3 insertions(+), 14 deletions(-)
--- a/arch/arm64/mm/hugetlbpage.c~arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries
+++ a/arch/arm64/mm/hugetlbpage.c
@@ -241,13 +241,6 @@ static void clear_flush(struct mm_struct
flush_tlb_range(&vma, saddr, addr);
}
-static inline struct folio *hugetlb_swap_entry_to_folio(swp_entry_t entry)
-{
- VM_BUG_ON(!is_migration_entry(entry) && !is_hwpoison_entry(entry));
-
- return page_folio(pfn_to_page(swp_offset_pfn(entry)));
-}
-
void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte, unsigned long sz)
{
@@ -257,13 +250,10 @@ void set_huge_pte_at(struct mm_struct *m
unsigned long pfn, dpfn;
pgprot_t hugeprot;
- if (!pte_present(pte)) {
- struct folio *folio;
-
- folio = hugetlb_swap_entry_to_folio(pte_to_swp_entry(pte));
- ncontig = num_contig_ptes(folio_size(folio), &pgsize);
+ ncontig = num_contig_ptes(sz, &pgsize);
- for (i = 0; i < ncontig; i++, ptep++)
+ if (!pte_present(pte)) {
+ for (i = 0; i < ncontig; i++, ptep++, addr += pgsize)
set_pte_at(mm, addr, ptep, pte);
return;
}
@@ -273,7 +263,6 @@ void set_huge_pte_at(struct mm_struct *m
return;
}
- ncontig = find_num_contig(mm, addr, ptep, &pgsize);
pfn = pte_pfn(pte);
dpfn = pgsize >> PAGE_SHIFT;
hugeprot = pte_pgprot(pte);
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-hugetlb-add-huge-page-size-param-to-set_huge_pte_at.patch
arm64-hugetlb-fix-set_huge_pte_at-to-work-with-all-swap-entries.patch
Hi Greg, Sasha,
The following list shows the backported patches, this batch is targeting
at garbage collection (GC) / set timeout fixes that address possible UaF
and memleaks. I am using original commit IDs for reference:
1) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
2) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
3) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
4) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
5) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
6) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
7) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
8) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
9) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
10) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
11) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
12) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
13) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
14) 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 126 +++++-----
net/netfilter/nf_tables_api.c | 369 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 87 ++++---
net/netfilter/nft_set_pipapo.c | 67 +++---
net/netfilter/nft_set_rbtree.c | 161 +++++++------
5 files changed, 547 insertions(+), 263 deletions(-)
--
2.30.2
Hi Greg, Sasha,
The following list shows the backported patches, this batch is targeting
at garbage collection (GC) / set timeout fixes that results in UaF and
memleaks. I am using original commit IDs for reference:
1) 24138933b97b ("netfilter: nf_tables: don't skip expired elements during walk")
2) 5f68718b34a5 ("netfilter: nf_tables: GC transaction API to avoid race with control plane")
3) f6c383b8c31a ("netfilter: nf_tables: adapt set backend to use GC transaction API")
4) c92db3030492 ("netfilter: nft_set_hash: mark set element as dead when deleting from packet path")
5) a2dd0233cbc4 ("netfilter: nf_tables: remove busy mark and gc batch API")
6) 7845914f45f0 ("netfilter: nf_tables: don't fail inserts if duplicate has expired")
7) 6a33d8b73dfa ("netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path")
8) 02c6c24402bf ("netfilter: nf_tables: GC transaction race with netns dismantle")
9) 720344340fb9 ("netfilter: nf_tables: GC transaction race with abort path")
10) 8357bc946a2a ("netfilter: nf_tables: use correct lock to protect gc_list")
11) 8e51830e29e1 ("netfilter: nf_tables: defer gc run if previous batch is still pending")
12) 2ee52ae94baa ("netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction")
13) 96b33300fba8 ("netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention")
14) 4a9e12ea7e70 ("netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC")
15) 6d365eabce3c ("netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails")
16) b079155faae9 ("netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration")
17) cf5000a7787c ("netfilter: nf_tables: fix memleak when more than 255 elements expired")
Please, apply.
Thanks.
Florian Westphal (4):
netfilter: nf_tables: don't skip expired elements during walk
netfilter: nf_tables: don't fail inserts if duplicate has expired
netfilter: nf_tables: defer gc run if previous batch is still pending
netfilter: nf_tables: fix memleak when more than 255 elements expired
Pablo Neira Ayuso (13):
netfilter: nf_tables: GC transaction API to avoid race with control plane
netfilter: nf_tables: adapt set backend to use GC transaction API
netfilter: nft_set_hash: mark set element as dead when deleting from packet path
netfilter: nf_tables: remove busy mark and gc batch API
netfilter: nf_tables: fix GC transaction races with netns and netlink event exit path
netfilter: nf_tables: GC transaction race with netns dismantle
netfilter: nf_tables: GC transaction race with abort path
netfilter: nf_tables: use correct lock to protect gc_list
netfilter: nft_set_rbtree: skip sync GC for new elements in this transaction
netfilter: nft_set_rbtree: use read spinlock to avoid datapath contention
netfilter: nft_set_pipapo: call nft_trans_gc_queue_sync() in catchall GC
netfilter: nft_set_pipapo: stop GC iteration if GC transaction allocation fails
netfilter: nft_set_hash: try later when GC hits EAGAIN on iteration
include/net/netfilter/nf_tables.h | 126 +++++-----
net/netfilter/nf_tables_api.c | 369 ++++++++++++++++++++++++------
net/netfilter/nft_set_hash.c | 87 ++++---
net/netfilter/nft_set_pipapo.c | 67 +++---
net/netfilter/nft_set_rbtree.c | 161 +++++++------
5 files changed, 547 insertions(+), 263 deletions(-)
--
2.30.2
Instead of constantly checking each possibility of the maple state,
create a fast path that will skip over checking unlikely states.
Cc: stable(a)vger.kernel.org
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
include/linux/maple_tree.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
index e41c70ac7744..f66f5f78f8cf 100644
--- a/include/linux/maple_tree.h
+++ b/include/linux/maple_tree.h
@@ -511,6 +511,15 @@ static inline bool mas_is_paused(const struct ma_state *mas)
return mas->node == MAS_PAUSE;
}
+/* Check if the mas is pointing to a node or not */
+static inline bool mas_is_active(struct ma_state *mas)
+{
+ if ((unsigned long)mas->node >= MAPLE_RESERVED_RANGE)
+ return true;
+
+ return false;
+}
+
/**
* mas_reset() - Reset a Maple Tree operation state.
* @mas: Maple Tree operation state.
--
2.40.1
The NAND core complies with the ONFI specification, which itself
mentions that after any program or erase operation, a status check
should be performed to see whether the operation was finished *and*
successful.
The NAND core offers helpers to finish a page write (sending the
"PAGE PROG" command, waiting for the NAND chip to be ready again, and
checking the operation status). But in some cases, advanced controller
drivers might want to optimize this and craft their own page write
helper to leverage additional hardware capabilities, thus not always
using the core facilities.
Some drivers, like this one, do not use the core helper to finish a page
write because the final cycles are automatically managed by the
hardware. In this case, the additional care must be taken to manually
perform the final status check.
Let's read the NAND chip status at the end of the page write helper and
return -EIO upon error.
Cc: stable(a)vger.kernel.org
Fixes: 02f26ecf8c77 ("mtd: nand: add reworked Marvell NAND controller driver")
Reported-by: Aviram Dali <aviramd(a)marvell.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
Hello Aviram,
I have not tested this, but based on your report I believe the status
check is indeed missing here and could sometimes lead to unnoticed
partial writes.
Please test on your side and reply with your Tested-by if you validate
the change.
Any backport on kernels predating v4.17 will likely fail because of a
folder rename, so you will have to do the backport manually if needed.
Thanks,
Miquèl
---
drivers/mtd/nand/raw/marvell_nand.c | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/marvell_nand.c b/drivers/mtd/nand/raw/marvell_nand.c
index 30c15e4e1cc0..576441095012 100644
--- a/drivers/mtd/nand/raw/marvell_nand.c
+++ b/drivers/mtd/nand/raw/marvell_nand.c
@@ -1162,6 +1162,7 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
.ndcb[2] = NDCB2_ADDR5_PAGE(page),
};
unsigned int oob_bytes = lt->spare_bytes + (raw ? lt->ecc_bytes : 0);
+ u8 status;
int ret;
/* NFCv2 needs more information about the operation being executed */
@@ -1195,7 +1196,18 @@ static int marvell_nfc_hw_ecc_hmg_do_write_page(struct nand_chip *chip,
ret = marvell_nfc_wait_op(chip,
PSEC_TO_MSEC(sdr->tPROG_max));
- return ret;
+ if (ret)
+ return ret;
+
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
+ return 0;
}
static int marvell_nfc_hw_ecc_hmg_write_page_raw(struct nand_chip *chip,
@@ -1624,6 +1636,7 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
int data_len = lt->data_bytes;
int spare_len = lt->spare_bytes;
int chunk, ret;
+ u8 status;
marvell_nfc_select_target(chip, chip->cur_cs);
@@ -1660,6 +1673,14 @@ static int marvell_nfc_hw_ecc_bch_write_page(struct nand_chip *chip,
if (ret)
return ret;
+ /* Check write status on the chip side */
+ ret = nand_status_op(chip, &status);
+ if (ret)
+ return ret;
+
+ if (status & NAND_STATUS_FAIL)
+ return -EIO;
+
return 0;
}
--
2.34.1
We currently provide the physical address of the DMA region
rather than the output of dma_map_resource() which is obviously wrong.
Fixes: 7330fc505af4 ("mtd: rawnand: qcom: stop using phys_to_dma()")
Cc: stable(a)vger.kernel.org
Reviewed-by: Manivannan Sadhasivam <mani(a)kernel.org>
Signed-off-by: Bibek Kumar Patro <quic_bibekkum(a)quicinc.com>
---
v5: Incorporated suggestions from Miquel/Mani
- Added tag to automatically include this patch in stable tree.
v4: Incorporated suggestion from Miquel
- Modified title and commit description.
https://lore.kernel.org/all/20230912115903.1007-1-quic_bibekkum@quicinc.com/
v3: Incorporated comments from Miquel
- Modified the commit message and title as per suggestions.
https://lore.kernel.org/all/20230912101814.7748-1-quic_bibekkum@quicinc.com/
v2: Incorporated comments from Pavan/Mani.
https://lore.kernel.org/all/20230911133026.29868-1-quic_bibekkum@quicinc.co…
v1: https://lore.kernel.org/all/20230907092854.11408-1-quic_bibekkum@quicinc.co…
drivers/mtd/nand/raw/qcom_nandc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 64499c1b3603..b079605c84d3 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -3444,7 +3444,7 @@ static int qcom_nandc_probe(struct platform_device *pdev)
err_aon_clk:
clk_disable_unprepare(nandc->core_clk);
err_core_clk:
- dma_unmap_resource(dev, res->start, resource_size(res),
+ dma_unmap_resource(dev, nandc->base_dma, resource_size(res),
DMA_BIDIRECTIONAL, 0);
return ret;
}
--
2.17.1
Defining a prctl flag as an int is a footgun because on a 64 bit machine
and with a variadic implementation of prctl (like in musl and glibc),
when used directly as a prctl argument, it can get casted to long with
garbage upper bits which would result in unexpected behaviors.
This patch changes the constant to an unsigned long to eliminate that
possibilities. This does not break UAPI.
Fixes: b507808ebce2 ("mm: implement memory-deny-write-execute as a prctl")
Cc: stable(a)vger.kernel.org
Signed-off-by: Florent Revest <revest(a)chromium.org>
Suggested-by: Alexey Izbyshev <izbyshev(a)ispras.ru>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Kees Cook <keescook(a)chromium.org>
Acked-by: Catalin Marinas <catalin.marinas(a)arm.com>
---
include/uapi/linux/prctl.h | 2 +-
tools/include/uapi/linux/prctl.h | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/prctl.h b/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/include/uapi/linux/prctl.h
+++ b/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
diff --git a/tools/include/uapi/linux/prctl.h b/tools/include/uapi/linux/prctl.h
index 3c36aeade991..9a85c69782bd 100644
--- a/tools/include/uapi/linux/prctl.h
+++ b/tools/include/uapi/linux/prctl.h
@@ -283,7 +283,7 @@ struct prctl_mm_map {
/* Memory deny write / execute */
#define PR_SET_MDWE 65
-# define PR_MDWE_REFUSE_EXEC_GAIN 1
+# define PR_MDWE_REFUSE_EXEC_GAIN (1UL << 0)
#define PR_GET_MDWE 66
--
2.42.0.rc2.253.gd59a3bf2b4-goog