The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 3a5a8d343e1cf96eb9971b17cbd4b832ab19b8e7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable@vger.kernel.org>' --in-reply-to '2024061317-promoter-record-bc91@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
3a5a8d343e1c ("mm: fix race between __split_huge_pmd_locked() and GUP-fast")
4f83145721f3 ("mm: avoid unnecessary flush on change_huge_pmd()")
c9fe66560bf2 ("mm/mprotect: do not flush when not required architecturally")
4a18419f71cd ("mm/mprotect: use mmu_gather")
e346e6688c4a ("mm: thp: skip make PMD PROT_NONE if THP migration is not supported")
f0953a1bbaca ("mm: fix typos in comments")
e2db1a9aa381 ("kasan, mm: optimize kmalloc poisoning")
928501344fc6 ("kasan, mm: don't save alloc stacks twice")
2b8305260fb3 ("kfence, kasan: make KFENCE compatible with KASAN")
0ce20dd84089 ("mm: add Kernel Electric-Fence infrastructure")
41139aa4c3a3 ("mm/filemap: add mapping_seek_hole_data")
a1ba9da8f0f9 ("mm/hugetlb.c: fix unnecessary address expansion of pmd sharing")
611806b4bf8d ("kasan: fix bug detection via ksize for HW_TAGS mode")
027b37b552f3 ("kasan: move _RET_IP_ to inline wrappers")
573a48092313 ("kasan: add match-all tag tests")
f00748bfa024 ("kasan: prefix global functions with kasan_")
dbf53f7597be ("mm/mprotect.c: optimize error detection in do_mprotect_pkey()")
96667f8a4382 ("mm: Close race in generic_access_phys")
97593cad003c ("kasan: sanitize objects when metadata doesn't fit")
1ef3133bd3b8 ("kasan: simplify assign_tag and set_tag calls")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a5a8d343e1cf96eb9971b17cbd4b832ab19b8e7 Mon Sep 17 00:00:00 2001
From: Ryan Roberts <ryan.roberts(a)arm.com>
Date: Wed, 1 May 2024 15:33:10 +0100
Subject: [PATCH] mm: fix race between __split_huge_pmd_locked() and GUP-fast
__split_huge_pmd_locked() can be called for a present THP, devmap or
(non-present) migration entry. It calls pmdp_invalidate() unconditionally
on the pmdp and only determines if it is present or not based on the
returned old pmd. This is a problem for the migration entry case because
pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a
present pmd.
On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
call to pmd_present() will return true. And therefore any lockless
pgtable walker could see the migration entry pmd in this state and start
interpreting the fields as if it were present, leading to BadThings (TM).
GUP-fast appears to be one such lockless pgtable walker.
x86 does not suffer the above problem, but instead pmd_mkinvalid() will
corrupt the offset field of the swap entry within the swap pte. See link
below for discussion of that problem.
Fix all of this by only calling pmdp_invalidate() for a present pmd. And
for good measure let's add a warning to all implementations of
pmdp_invalidate[_ad](). I've manually reviewed all other
pmdp_invalidate[_ad]() call sites and believe all others to be conformant.
This is a theoretical bug found during code review. I don't have any test
case to trigger it in practice.
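The reordering described above can be sketched in plain userspace C. This is a minimal model under stated assumptions, not kernel code: `model_pmd`, `model_invalidate()`, `model_split()`, and `model_warnings` are hypothetical names invented for illustration, and the booleans only stand in for what `pmd_present()` and `is_pmd_migration_entry()` would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a pmd entry; not the kernel's pmd_t. */
struct model_pmd {
    bool present;          /* what pmd_present() would report */
    bool migration_entry;  /* what is_pmd_migration_entry() would report */
    bool invalidated;      /* set once the model "invalidates" the entry */
};

/* Counts contract violations, standing in for VM_WARN_ON_ONCE(). */
static int model_warnings;

/* Models pmdp_invalidate(): only legal for a present entry. */
static struct model_pmd model_invalidate(struct model_pmd *pmdp)
{
    struct model_pmd old = *pmdp;

    if (!pmdp->present)
        model_warnings++;
    pmdp->invalidated = true;
    return old;
}

/* Models the fixed ordering: test for a migration entry first. */
static struct model_pmd model_split(struct model_pmd *pmdp)
{
    if (pmdp->migration_entry)
        return *pmdp;               /* read the entry, never invalidate it */
    return model_invalidate(pmdp);  /* present entry: invalidate, return old value */
}
```

In this model a migration entry passes through `model_split()` untouched, while calling `model_invalidate()` on it directly bumps the warning counter, which is the contract the added `VM_WARN_ON_ONCE()` lines are meant to flag.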
Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
Fixes: 84c3fc4e9c56 ("mm: thp: check pmd migration entry in common path")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Zi Yan <ziy(a)nvidia.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual(a)arm.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andreas Larsson <andreas(a)gaisler.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Aneesh Kumar K.V <aneesh.kumar(a)kernel.org>
Cc: Borislav Petkov (AMD) <bp(a)alien8.de>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Christian Borntraeger <borntraeger(a)linux.ibm.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Naveen N. Rao <naveen.n.rao(a)linux.ibm.com>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Will Deacon <will(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/Documentation/mm/arch_pgtable_helpers.rst b/Documentation/mm/arch_pgtable_helpers.rst
index 2466d3363af7..ad50ca6f495e 100644
--- a/Documentation/mm/arch_pgtable_helpers.rst
+++ b/Documentation/mm/arch_pgtable_helpers.rst
@@ -140,7 +140,8 @@ PMD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pmd_swp_clear_soft_dirty | Clears a soft dirty swapped PMD |
+---------------------------+--------------------------------------------------+
-| pmd_mkinvalid | Invalidates a mapped PMD [1] |
+| pmd_mkinvalid | Invalidates a present PMD; do not call for |
+| | non-present PMD [1] |
+---------------------------+--------------------------------------------------+
| pmd_set_huge | Creates a PMD huge mapping |
+---------------------------+--------------------------------------------------+
@@ -196,7 +197,8 @@ PUD Page Table Helpers
+---------------------------+--------------------------------------------------+
| pud_mkdevmap | Creates a ZONE_DEVICE mapped PUD |
+---------------------------+--------------------------------------------------+
-| pud_mkinvalid | Invalidates a mapped PUD [1] |
+| pud_mkinvalid | Invalidates a present PUD; do not call for |
+| | non-present PUD [1] |
+---------------------------+--------------------------------------------------+
| pud_set_huge | Creates a PUD huge mapping |
+---------------------------+--------------------------------------------------+
diff --git a/arch/powerpc/mm/book3s64/pgtable.c b/arch/powerpc/mm/book3s64/pgtable.c
index 83823db3488b..2975ea0841ba 100644
--- a/arch/powerpc/mm/book3s64/pgtable.c
+++ b/arch/powerpc/mm/book3s64/pgtable.c
@@ -170,6 +170,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
{
unsigned long old_pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
old_pmd = pmd_hugepage_update(vma->vm_mm, address, pmdp, _PAGE_PRESENT, _PAGE_INVALID);
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return __pmd(old_pmd);
diff --git a/arch/s390/include/asm/pgtable.h b/arch/s390/include/asm/pgtable.h
index 2cb2a2e7b34b..558902edbfec 100644
--- a/arch/s390/include/asm/pgtable.h
+++ b/arch/s390/include/asm/pgtable.h
@@ -1769,8 +1769,10 @@ static inline pmd_t pmdp_huge_clear_flush(struct vm_area_struct *vma,
static inline pmd_t pmdp_invalidate(struct vm_area_struct *vma,
unsigned long addr, pmd_t *pmdp)
{
- pmd_t pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
+ pmd_t pmd;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+ pmd = __pmd(pmd_val(*pmdp) | _SEGMENT_ENTRY_INVALID);
return pmdp_xchg_direct(vma->vm_mm, addr, pmdp, pmd);
}
diff --git a/arch/sparc/mm/tlb.c b/arch/sparc/mm/tlb.c
index 19642f7ffb52..8648a50afe88 100644
--- a/arch/sparc/mm/tlb.c
+++ b/arch/sparc/mm/tlb.c
@@ -249,6 +249,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
{
pmd_t old, entry;
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
entry = __pmd(pmd_val(*pmdp) & ~_PAGE_VALID);
old = pmdp_establish(vma, address, pmdp, entry);
flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 94767c82fc0d..93e54ba91fbf 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -631,6 +631,8 @@ int pmdp_clear_flush_young(struct vm_area_struct *vma,
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
+
/*
* No flush is necessary. Once an invalid PTE is established, the PTE's
* access and dirty bits cannot be updated.
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 08e4f3343bcd..ccdcff73284a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2430,32 +2430,11 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
return __split_huge_zero_page_pmd(vma, haddr, pmd);
}
- /*
- * Up to this point the pmd is present and huge and userland has the
- * whole access to the hugepage during the split (which happens in
- * place). If we overwrite the pmd with the not-huge version pointing
- * to the pte here (which of course we could if all CPUs were bug
- * free), userland could trigger a small page size TLB miss on the
- * small sized TLB while the hugepage TLB entry is still established in
- * the huge TLB. Some CPU doesn't like that.
- * See http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
- * 383 on page 105. Intel should be safe but is also warns that it's
- * only safe if the permission and cache attributes of the two entries
- * loaded in the two TLB is identical (which should be the case here).
- * But it is generally safer to never allow small and huge TLB entries
- * for the same virtual address to be loaded simultaneously. So instead
- * of doing "pmd_populate(); flush_pmd_tlb_range();" we first mark the
- * current pmd notpresent (atomically because here the pmd_trans_huge
- * must remain set at all times on the pmd until the split is complete
- * for this pmd), then we flush the SMP TLB and finally we write the
- * non-huge version of the pmd entry with pmd_populate.
- */
- old_pmd = pmdp_invalidate(vma, haddr, pmd);
-
- pmd_migration = is_pmd_migration_entry(old_pmd);
+ pmd_migration = is_pmd_migration_entry(*pmd);
if (unlikely(pmd_migration)) {
swp_entry_t entry;
+ old_pmd = *pmd;
entry = pmd_to_swp_entry(old_pmd);
page = pfn_swap_entry_to_page(entry);
write = is_writable_migration_entry(entry);
@@ -2466,6 +2445,30 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd,
soft_dirty = pmd_swp_soft_dirty(old_pmd);
uffd_wp = pmd_swp_uffd_wp(old_pmd);
} else {
+ /*
+ * Up to this point the pmd is present and huge and userland has
+ * the whole access to the hugepage during the split (which
+ * happens in place). If we overwrite the pmd with the not-huge
+ * version pointing to the pte here (which of course we could if
+ * all CPUs were bug free), userland could trigger a small page
+ * size TLB miss on the small sized TLB while the hugepage TLB
+ * entry is still established in the huge TLB. Some CPU doesn't
+ * like that. See
+ * http://support.amd.com/TechDocs/41322_10h_Rev_Gd.pdf, Erratum
+ * 383 on page 105. Intel should be safe but is also warns that
+ * it's only safe if the permission and cache attributes of the
+ * two entries loaded in the two TLB is identical (which should
+ * be the case here). But it is generally safer to never allow
+ * small and huge TLB entries for the same virtual address to be
+ * loaded simultaneously. So instead of doing "pmd_populate();
+ * flush_pmd_tlb_range();" we first mark the current pmd
+ * notpresent (atomically because here the pmd_trans_huge must
+ * remain set at all times on the pmd until the split is
+ * complete for this pmd), then we flush the SMP TLB and finally
+ * we write the non-huge version of the pmd entry with
+ * pmd_populate.
+ */
+ old_pmd = pmdp_invalidate(vma, haddr, pmd);
page = pmd_page(old_pmd);
folio = page_folio(page);
if (pmd_dirty(old_pmd)) {
diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c
index 4fcd959dcc4d..a78a4adf711a 100644
--- a/mm/pgtable-generic.c
+++ b/mm/pgtable-generic.c
@@ -198,6 +198,7 @@ pgtable_t pgtable_trans_huge_withdraw(struct mm_struct *mm, pmd_t *pmdp)
pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
return old;
@@ -208,6 +209,7 @@ pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
pmd_t *pmdp)
{
+ VM_WARN_ON_ONCE(!pmd_present(*pmdp));
return pmdp_invalidate(vma, address, pmdp);
}
#endif
This is the start of the stable review cycle for the 6.1.94 release.
There are 85 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 15 Jun 2024 11:31:50 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.94-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.94-rc1
Enzo Matsumiya <ematsumiya(a)suse.de>
smb: client: fix deadlock in smb2_find_smb_tcon()
Puranjay Mohan <puranjay(a)kernel.org>
powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH
Omar Sandoval <osandov(a)fb.com>
btrfs: fix crash on racing fsync and size-extending write into prealloc
Anna Schumaker <Anna.Schumaker(a)Netapp.com>
NFS: Fix READ_PLUS when server doesn't support OP_READ_PLUS
Sergey Shtylyov <s.shtylyov(a)omp.ru>
nfs: fix undefined behavior in nfs_block_bits()
Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
EDAC/igen6: Convert PCIBIOS_* return codes to errnos
Frank Li <Frank.Li(a)nxp.com>
i3c: master: svc: fix invalidate IBI type and miss call client IBI handler
Harald Freudenberger <freude(a)linux.ibm.com>
s390/cpacf: Make use of invalid opcode produce a link error
Harald Freudenberger <freude(a)linux.ibm.com>
s390/cpacf: Split and rework cpacf query functions
Harald Freudenberger <freude(a)linux.ibm.com>
s390/ap: Fix crash in AP internal function modify_bitmap()
Helge Deller <deller(a)kernel.org>
parisc: Define sigset_t in parisc uapi header
Helge Deller <deller(a)gmx.de>
parisc: Define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
Baokun Li <libaokun1(a)huawei.com>
ext4: fix mb_cache_entry's e_refcnt leak in ext4_xattr_block_cache_find()
Baokun Li <libaokun1(a)huawei.com>
ext4: set type of ac_groups_linear_remaining to __u32 to avoid overflow
Mike Gilbert <floppym(a)gentoo.org>
sparc: move struct termio to asm/termios.h
Eric Dumazet <edumazet(a)google.com>
net: fix __dst_negative_advice() race
Daniel Thompson <daniel.thompson(a)linaro.org>
kdb: Use format-specifiers rather than memset() for padding in kdb_read()
Daniel Thompson <daniel.thompson(a)linaro.org>
kdb: Merge identical case statements in kdb_read()
Daniel Thompson <daniel.thompson(a)linaro.org>
kdb: Fix console handling when editing and tab-completing commands
Daniel Thompson <daniel.thompson(a)linaro.org>
kdb: Use format-strings rather than '\0' injection in kdb_read()
Daniel Thompson <daniel.thompson(a)linaro.org>
kdb: Fix buffer overflow during tab-complete
Judith Mendez <jm(a)ti.com>
watchdog: rti_wdt: Set min_hw_heartbeat_ms to accommodate a safety margin
Frank van der Linden <fvdl(a)google.com>
mm/hugetlb: pass correct order_per_bit to cma_declare_contiguous_nid
Frank van der Linden <fvdl(a)google.com>
mm/cma: drop incorrect alignment check in cma_init_reserved_mem
Sam Ravnborg <sam(a)ravnborg.org>
sparc64: Fix number of online CPUs
Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
intel_th: pci: Add Meteor Lake-S CPU support
Dhananjay Ugwekar <Dhananjay.Ugwekar(a)amd.com>
cpufreq: amd-pstate: Fix the inconsistency in max frequency units
Alexander Potapenko <glider(a)google.com>
kmsan: do not wipe out origin when doing partial unpoisoning
Nikita Zhandarovich <n.zhandarovich(a)fintech.ru>
net/9p: fix uninit-value in p9_client_rpc()
xu xin <xu.xin16(a)zte.com.cn>
net/ipv6: Fix route deleting failure when metric equals 0
Martin K. Petersen <martin.petersen(a)oracle.com>
scsi: core: Handle devices which return an unusually large VPD page count
Ryan Roberts <ryan.roberts(a)arm.com>
mm: fix race between __split_huge_pmd_locked() and GUP-fast
Herbert Xu <herbert(a)gondor.apana.org.au>
crypto: qat - Fix ADF_DEV_RESET_SYNC memory leak
Vitaly Chikunov <vt(a)altlinux.org>
crypto: ecrdsa - Fix module auto-load on add_key
Stefan Berger <stefanb(a)linux.ibm.com>
crypto: ecdsa - Fix module auto-load on add-key
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: AArch32: Fix spurious trapping of conditional instructions
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: Allow AArch32 PSTATE.M to be restored as System mode
Marc Zyngier <maz(a)kernel.org>
KVM: arm64: Fix AArch32 register narrowing on userspace write
Mario Limonciello <mario.limonciello(a)amd.com>
drm/amd: Fix shutdown (again) on some SMU v13.0.4/11 platforms
Dominique Martinet <asmadeus(a)codewreck.org>
9p: add missing locking around taking dentry fid list
Li Ma <li.ma(a)amd.com>
drm/amdgpu/atomfirmware: add intergrated info v2.3 table
Cai Xinchen <caixinchen1(a)huawei.com>
fbdev: savage: Handle err return when savagefb_check_var failed
Hans de Goede <hdegoede(a)redhat.com>
mmc: sdhci-acpi: Add quirk to enable pull-up on the card-detect GPIO on Asus T100TA
Hans de Goede <hdegoede(a)redhat.com>
mmc: sdhci-acpi: Disable write protect detection on Toshiba WT10-A
Hans de Goede <hdegoede(a)redhat.com>
mmc: sdhci-acpi: Fix Lenovo Yoga Tablet 2 Pro 1380 sdcard slot not working
Hans de Goede <hdegoede(a)redhat.com>
mmc: sdhci-acpi: Sort DMI quirks alphabetically
Adrian Hunter <adrian.hunter(a)intel.com>
mmc: sdhci: Add support for "Tuning Error" interrupts
Hans de Goede <hdegoede(a)redhat.com>
mmc: core: Add mmc_gpiod_set_cd_config() function
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: v4l2-core: hold videodev_lock until dev reg, finishes
Nathan Chancellor <nathan(a)kernel.org>
media: mxl5xx: Move xpt structures off stack
Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
media: mc: mark the media devnode as registered from the, start
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
media: mc: Fix graph walk in media_pipeline_start
Yang Xiwen <forbidden405(a)outlook.com>
arm64: dts: hi3798cv200: fix the size of GICR
Bitterblue Smith <rtl8821cerfe2(a)gmail.com>
wifi: rtlwifi: rtl8192de: Fix endianness issue in RX path
Bitterblue Smith <rtl8821cerfe2(a)gmail.com>
wifi: rtlwifi: rtl8192de: Fix low speed with WPA3-SAE
Bitterblue Smith <rtl8821cerfe2(a)gmail.com>
wifi: rtlwifi: rtl8192de: Fix 5 GHz TX power
Bitterblue Smith <rtl8821cerfe2(a)gmail.com>
wifi: rtl8xxxu: Fix the TX power of RTL8192CU, RTL8723AU
Ping-Ke Shih <pkshih(a)realtek.com>
wifi: rtw89: pci: correct TX resource checking for PCI DMA channel of firmware command
Yu Kuai <yukuai3(a)huawei.com>
md/raid5: fix deadlock that raid5d() wait for itself to clear MD_SB_CHANGE_PENDING
Johan Hovold <johan+linaro(a)kernel.org>
arm64: dts: qcom: qcs404: fix bluetooth device address
Krzysztof Kozlowski <krzk(a)kernel.org>
arm64: tegra: Correct Tegra132 I2C alias
Christoffer Sandberg <cs(a)tuxedo.de>
ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx
Maulik Shah <quic_mkshah(a)quicinc.com>
soc: qcom: rpmh-rsc: Enhance check for VRM in-flight request
Konrad Dybcio <konrad.dybcio(a)linaro.org>
thermal/drivers/qcom/lmh: Check for SCM availability at probe
Sergey Shtylyov <s.shtylyov(a)omp.ru>
ata: pata_legacy: make legacy_exit() work again
Ping-Ke Shih <pkshih(a)realtek.com>
wifi: rtw89: correct aSIFSTime for 6GHz band
Matthew Mirvish <matthew(a)mm12.xyz>
bcache: fix variable length array abuse in btree_iter
Bob Zhou <bob.zhou(a)amd.com>
drm/amdgpu: add error handle to avoid out-of-bounds
Zheyu Ma <zheyuma97(a)gmail.com>
media: lgdt3306a: Add a check against null-pointer-def
Chao Yu <chao(a)kernel.org>
f2fs: fix to do sanity check on i_xattr_nid in sanity_check_inode()
Florian Fainelli <florian.fainelli(a)broadcom.com>
scripts/gdb: fix SB_* constants parsing
Daniel Borkmann <daniel(a)iogearbox.net>
vxlan: Fix regression when dropping packets due to invalid src addresses
Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
mptcp: fix full TCP keep-alive support
Paolo Abeni <pabeni(a)redhat.com>
mptcp: cleanup SOL_TCP handling
Paolo Abeni <pabeni(a)redhat.com>
mptcp: avoid some duplicate code in socket option handling
Chaitanya Kumar Borah <chaitanya.kumar.borah(a)intel.com>
drm/i915/audio: Fix audio time stamp programming for DP
Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
nilfs2: fix use-after-free of timer for log writer thread
Haorong Lu <ancientmodern4(a)gmail.com>
riscv: signal: handle syscall restart before get_signal
Marc Dionne <marc.dionne(a)auristor.com>
afs: Don't cross .backup mountpoint from backup volume
Jorge Ramirez-Ortiz <jorge(a)foundries.io>
mmc: core: Do not force a retune before RPMB switch
Liam R. Howlett <Liam.Howlett(a)oracle.com>
maple_tree: fix mas_empty_area_rev() null pointer dereference
Peng Zhang <zhangpeng.00(a)bytedance.com>
maple_tree: fix allocation in mas_sparse_area()
Dan Gora <dan.gora(a)gmail.com>
Bluetooth: btrtl: Add missing MODULE_FIRMWARE declarations
Shradha Gupta <shradhagupta(a)linux.microsoft.com>
drm: Check polling initialized before enabling in drm_helper_probe_single_connector_modes
Shradha Gupta <shradhagupta(a)linux.microsoft.com>
drm: Check output polling initialized before disabling
-------------
Diffstat:
Documentation/mm/arch_pgtable_helpers.rst | 6 +-
Makefile | 4 +-
arch/arm64/boot/dts/hisilicon/hi3798cv200.dtsi | 2 +-
arch/arm64/boot/dts/nvidia/tegra132-norrin.dts | 4 +-
arch/arm64/boot/dts/nvidia/tegra132.dtsi | 2 +-
arch/arm64/boot/dts/qcom/qcs404-evb.dtsi | 2 +-
arch/arm64/kvm/guest.c | 3 +-
arch/arm64/kvm/hyp/aarch32.c | 18 ++-
arch/parisc/include/asm/page.h | 1 +
arch/parisc/include/asm/signal.h | 12 --
arch/parisc/include/uapi/asm/signal.h | 10 ++
arch/powerpc/mm/book3s64/pgtable.c | 1 +
arch/powerpc/net/bpf_jit_comp32.c | 12 ++
arch/powerpc/net/bpf_jit_comp64.c | 12 ++
arch/riscv/kernel/signal.c | 95 +++++++-------
arch/s390/include/asm/cpacf.h | 109 +++++++++++++---
arch/s390/include/asm/pgtable.h | 4 +-
arch/sparc/include/asm/smp_64.h | 2 -
arch/sparc/include/uapi/asm/termbits.h | 10 --
arch/sparc/include/uapi/asm/termios.h | 9 ++
arch/sparc/kernel/prom_64.c | 4 +-
arch/sparc/kernel/setup_64.c | 1 -
arch/sparc/kernel/smp_64.c | 14 --
arch/sparc/mm/tlb.c | 1 +
arch/x86/mm/pgtable.c | 2 +
crypto/ecdsa.c | 3 +
crypto/ecrdsa.c | 1 +
drivers/acpi/resource.c | 12 ++
drivers/ata/pata_legacy.c | 8 +-
drivers/bluetooth/btrtl.c | 18 ++-
drivers/cpufreq/amd-pstate.c | 2 +-
drivers/crypto/qat/qat_common/adf_aer.c | 19 +--
drivers/edac/igen6_edac.c | 4 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 15 +++
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 3 +
drivers/gpu/drm/amd/include/atomfirmware.h | 43 ++++++
.../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_4_ppt.c | 20 +--
drivers/gpu/drm/drm_modeset_helper.c | 19 ++-
drivers/gpu/drm/drm_probe_helper.c | 15 ++-
drivers/gpu/drm/i915/display/intel_audio.c | 116 ++---------------
drivers/hwtracing/intel_th/pci.c | 5 +
drivers/i3c/master/svc-i3c-master.c | 16 ++-
drivers/md/bcache/bset.c | 44 +++----
drivers/md/bcache/bset.h | 28 ++--
drivers/md/bcache/btree.c | 40 +++---
drivers/md/bcache/super.c | 5 +-
drivers/md/bcache/sysfs.c | 2 +-
drivers/md/bcache/writeback.c | 10 +-
drivers/md/raid5.c | 15 +--
drivers/media/dvb-frontends/lgdt3306a.c | 5 +
drivers/media/dvb-frontends/mxl5xx.c | 22 ++--
drivers/media/mc/mc-devnode.c | 5 +-
drivers/media/mc/mc-entity.c | 6 +
drivers/media/v4l2-core/v4l2-dev.c | 3 +
drivers/mmc/core/host.c | 3 +-
drivers/mmc/core/slot-gpio.c | 20 +++
drivers/mmc/host/sdhci-acpi.c | 61 ++++++++-
drivers/mmc/host/sdhci.c | 10 +-
drivers/mmc/host/sdhci.h | 3 +-
drivers/net/vxlan/vxlan_core.c | 4 -
.../net/wireless/realtek/rtl8xxxu/rtl8xxxu_core.c | 25 ++--
.../net/wireless/realtek/rtlwifi/rtl8192de/phy.c | 4 +-
.../net/wireless/realtek/rtlwifi/rtl8192de/trx.c | 21 ++-
.../net/wireless/realtek/rtlwifi/rtl8192de/trx.h | 79 +++--------
drivers/net/wireless/realtek/rtw89/mac80211.c | 2 +-
drivers/net/wireless/realtek/rtw89/pci.c | 3 +-
drivers/s390/crypto/ap_bus.c | 2 +-
drivers/scsi/scsi.c | 7 +
drivers/soc/qcom/cmd-db.c | 32 ++++-
drivers/soc/qcom/rpmh-rsc.c | 3 +-
drivers/thermal/qcom/lmh.c | 3 +
drivers/video/fbdev/savage/savagefb_driver.c | 5 +-
drivers/watchdog/rti_wdt.c | 34 +++--
fs/9p/vfs_dentry.c | 9 +-
fs/afs/mntpt.c | 5 +
fs/btrfs/tree-log.c | 17 ++-
fs/ext4/mballoc.h | 2 +-
fs/ext4/xattr.c | 4 +-
fs/f2fs/inode.c | 6 +
fs/nfs/internal.h | 4 +-
fs/nfs/nfs4proc.c | 2 +-
fs/nilfs2/segment.c | 25 +++-
fs/smb/client/smb2transport.c | 2 +-
include/linux/mmc/slot-gpio.h | 1 +
include/net/dst_ops.h | 2 +-
include/net/sock.h | 13 +-
include/soc/qcom/cmd-db.h | 10 +-
kernel/debug/kdb/kdb_io.c | 99 ++++++++------
lib/maple_tree.c | 55 ++++----
mm/cma.c | 4 -
mm/huge_memory.c | 49 +++----
mm/hugetlb.c | 6 +-
mm/kmsan/core.c | 15 ++-
mm/pgtable-generic.c | 2 +
net/9p/client.c | 2 +
net/ipv4/route.c | 22 ++--
net/ipv6/route.c | 34 ++---
net/mptcp/protocol.h | 3 +
net/mptcp/sockopt.c | 144 +++++++++++++++------
net/xfrm/xfrm_policy.c | 11 +-
scripts/gdb/linux/constants.py.in | 12 +-
101 files changed, 1030 insertions(+), 695 deletions(-)
Commit 90c2d2eb7ab5 ("MIPS: pci: lantiq: switch to using gpiod API") not
only switched to the gpiod API, but also inverted / changed the polarity
of the GPIO.
According to the PCI specification, the RST# pin is an active-low
signal. However, most of the device trees that have been widely used for
a long time (mainly in the OpenWrt project) define this GPIO as
active-high and the old driver code inverted the signal internally.
Apparently there are actually boards where the reset gpio must be
operated inverted. For this reason, we cannot use the GPIOD_OUT_LOW/HIGH
flag for initialization. Instead, we must explicitly set the gpio to
value 1 in order to take into account any "GPIO_ACTIVE_LOW" flag that
may have been set.
In order to remain compatible with all these existing device trees, we
should therefore keep the logic as it was before the commit.
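The polarity handling can be sketched with a small userspace model. This is only an illustration, assuming that the gpiod consumer API takes logical values and that a line flagged GPIO_ACTIVE_LOW is inverted on the pin; `model_gpio`, `model_set_value()`, and `model_reset_sequence()` are hypothetical names, not gpiolib functions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a GPIO descriptor; not the kernel's struct gpio_desc. */
struct model_gpio {
    bool active_low;  /* true if the device tree sets GPIO_ACTIVE_LOW */
    int  physical;    /* level actually driven on the pin */
};

/* Models a logical write: the level is inverted for active-low lines. */
static void model_set_value(struct model_gpio *gpio, int logical)
{
    gpio->physical = gpio->active_low ? !logical : !!logical;
}

/* Replays the patched sequence (logical 1, 0, 1) and records the pin levels. */
static void model_reset_sequence(struct model_gpio *gpio, int levels[3])
{
    static const int logical[3] = { 1, 0, 1 };

    for (int i = 0; i < 3; i++) {
        model_set_value(gpio, logical[i]);
        levels[i] = gpio->physical;
    }
}
```

With an active-high device tree entry the pin sees 1, 0, 1, matching the old driver's behaviour; with GPIO_ACTIVE_LOW set, the same logical sequence is driven inverted as 0, 1, 0.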
Fixes: 90c2d2eb7ab5 ("MIPS: pci: lantiq: switch to using gpiod API")
Cc: stable(a)vger.kernel.org
Signed-off-by: Martin Schiller <ms(a)dev.tdt.de>
---
arch/mips/pci/pci-lantiq.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/mips/pci/pci-lantiq.c b/arch/mips/pci/pci-lantiq.c
index 68a8cefed420..0844db34022e 100644
--- a/arch/mips/pci/pci-lantiq.c
+++ b/arch/mips/pci/pci-lantiq.c
@@ -124,14 +124,14 @@ static int ltq_pci_startup(struct platform_device *pdev)
clk_disable(clk_external);
/* setup reset gpio used by pci */
- reset_gpio = devm_gpiod_get_optional(&pdev->dev, "reset",
- GPIOD_OUT_LOW);
+ reset_gpio = devm_gpiod_get_optional(&pdev->dev, "reset", GPIOD_ASIS);
error = PTR_ERR_OR_ZERO(reset_gpio);
if (error) {
dev_err(&pdev->dev, "failed to request gpio: %d\n", error);
return error;
}
gpiod_set_consumer_name(reset_gpio, "pci_reset");
+ gpiod_direction_output(reset_gpio, 1);
/* enable auto-switching between PCI and EBU */
ltq_pci_w32(0xa, PCI_CR_CLK_CTRL);
@@ -194,10 +194,10 @@ static int ltq_pci_startup(struct platform_device *pdev)
/* toggle reset pin */
if (reset_gpio) {
- gpiod_set_value_cansleep(reset_gpio, 1);
+ gpiod_set_value_cansleep(reset_gpio, 0);
wmb();
mdelay(1);
- gpiod_set_value_cansleep(reset_gpio, 0);
+ gpiod_set_value_cansleep(reset_gpio, 1);
}
return 0;
}
--
2.39.2
In a TDX VM without paravisor, currently the default timer is the Hyper-V
timer, which depends on the slow VM Reference Counter MSR: the Hyper-V TSC
page is not enabled in such a VM because the VM uses Invariant TSC as a
better clocksource and it's challenging to mark the Hyper-V TSC page shared
in very early boot.
Lower the rating of the Hyper-V timer so the local APIC timer becomes the
default timer in such a VM, and print a warning in case Invariant TSC
is unavailable in such a VM. This change should cause no perceivable
performance difference.
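The effect of the lower rating can be sketched as follows. This is a simplified userspace model, not the kernel's clockevents code: `model_clockevent`, `model_hv_rating()`, and `model_pick()` are hypothetical names, the real selection logic considers more than the rating, and the local APIC timer rating of 100 used in the usage note is an assumption for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a clock event device; not struct clock_event_device. */
struct model_clockevent {
    const char *name;
    int rating;
};

/* Mirrors the rating choice made in the patch: 90 for TDX without paravisor. */
static int model_hv_rating(bool paravisor_present, bool is_tdx)
{
    return (!paravisor_present && is_tdx) ? 90 : 1000;
}

/* Picks the device with the highest rating; ties keep the earlier entry. */
static const struct model_clockevent *
model_pick(const struct model_clockevent *devs, size_t n)
{
    const struct model_clockevent *best = NULL;

    for (size_t i = 0; i < n; i++)
        if (!best || devs[i].rating > best->rating)
            best = &devs[i];
    return best;
}
```

If the local APIC timer is modelled with a rating of 100, the Hyper-V timer at 90 loses the comparison in a TDX VM without paravisor, while in every other configuration its rating of 1000 still wins.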
Cc: stable(a)vger.kernel.org # 6.6+
Reviewed-by: Roman Kisel <romank(a)linux.microsoft.com>
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
Changes in v2:
Improved the comments in ms_hyperv_init_platform() [Michael Kelley]
Added "print a warning in case Invariant TSC unavailable" in the changelog.
Added Roman's Reviewed-by.
arch/x86/kernel/cpu/mshyperv.c | 16 +++++++++++++++-
drivers/clocksource/hyperv_timer.c | 16 +++++++++++++++-
2 files changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kernel/cpu/mshyperv.c b/arch/x86/kernel/cpu/mshyperv.c
index e0fd57a8ba840..954b7cbfa2f02 100644
--- a/arch/x86/kernel/cpu/mshyperv.c
+++ b/arch/x86/kernel/cpu/mshyperv.c
@@ -449,9 +449,23 @@ static void __init ms_hyperv_init_platform(void)
ms_hyperv.hints &= ~HV_X64_APIC_ACCESS_RECOMMENDED;
if (!ms_hyperv.paravisor_present) {
- /* To be supported: more work is required. */
+ /*
+ * Mark the Hyper-V TSC page feature as disabled
+ * in a TDX VM without paravisor so that the
+ * Invariant TSC, which is a better clocksource
+ * anyway, is used instead.
+ */
ms_hyperv.features &= ~HV_MSR_REFERENCE_TSC_AVAILABLE;
+ /*
+ * The Invariant TSC is expected to be available
+ * in a TDX VM without paravisor, but if not,
+ * print a warning message. The slower Hyper-V MSR-based
+ * Ref Counter should end up being the clocksource.
+ */
+ if (!(ms_hyperv.features & HV_ACCESS_TSC_INVARIANT))
+ pr_warn("Hyper-V: Invariant TSC is unavailable\n");
+
/* HV_MSR_CRASH_CTL is unsupported. */
ms_hyperv.misc_features &= ~HV_FEATURE_GUEST_CRASH_MSR_AVAILABLE;
diff --git a/drivers/clocksource/hyperv_timer.c b/drivers/clocksource/hyperv_timer.c
index b2a080647e413..99177835cadec 100644
--- a/drivers/clocksource/hyperv_timer.c
+++ b/drivers/clocksource/hyperv_timer.c
@@ -137,7 +137,21 @@ static int hv_stimer_init(unsigned int cpu)
ce->name = "Hyper-V clockevent";
ce->features = CLOCK_EVT_FEAT_ONESHOT;
ce->cpumask = cpumask_of(cpu);
- ce->rating = 1000;
+
+ /*
+ * Lower the rating of the Hyper-V timer in a TDX VM without paravisor,
+ * so the local APIC timer (lapic_clockevent) is the default timer in
+ * such a VM. The Hyper-V timer is not preferred in such a VM because
+ * it depends on the slow VM Reference Counter MSR (the Hyper-V TSC
+ * page is not enabled in such a VM because the VM uses Invariant TSC
+ * as a better clocksource and it's challenging to mark the Hyper-V
+ * TSC page shared in very early boot).
+ */
+ if (!ms_hyperv.paravisor_present && hv_isolation_type_tdx())
+ ce->rating = 90;
+ else
+ ce->rating = 1000;
+
ce->set_state_shutdown = hv_ce_shutdown;
ce->set_state_oneshot = hv_ce_set_oneshot;
ce->set_next_event = hv_ce_set_next_event;
--
2.25.1
Hi
Side note: this fix requires
4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type")
in the first place, as a dependency.
Thanks
On Sat, Jun 22, 2024 at 07:41:24PM -0400, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> netfilter: ipset: Fix suspicious rcu_dereference_protected()
>
> to the 6.9-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> netfilter-ipset-fix-suspicious-rcu_dereference_prote.patch
> and it can be found in the queue-6.9 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
>
>
>
> commit 0226dfa53edc90463c1b0d50167da948c88025ef
> Author: Jozsef Kadlecsik <kadlec(a)netfilter.org>
> Date: Mon Jun 17 11:18:15 2024 +0200
>
> netfilter: ipset: Fix suspicious rcu_dereference_protected()
>
> [ Upstream commit 8ecd06277a7664f4ef018abae3abd3451d64e7a6 ]
>
> When destroying all sets, we are either in pernet exit phase or
> are executing a "destroy all sets command" from userspace. The latter
> was taken into account in ip_set_dereference() (nfnetlink mutex is held),
> but the former was not. The patch adds the required check to
> rcu_dereference_protected() in ip_set_dereference().
>
> Fixes: 4e7aaa6b82d6 ("netfilter: ipset: Fix race between namespace cleanup and gc in the list:set type")
> Reported-by: syzbot+b62c37cdd58103293a5a(a)syzkaller.appspotmail.com
> Reported-by: syzbot+cfbe1da5fdfc39efc293(a)syzkaller.appspotmail.com
> Reported-by: kernel test robot <oliver.sang(a)intel.com>
> Closes: https://lore.kernel.org/oe-lkp/202406141556.e0b6f17e-lkp@intel.com
> Signed-off-by: Jozsef Kadlecsik <kadlec(a)netfilter.org>
> Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
> Signed-off-by: Sasha Levin <sashal(a)kernel.org>
>
> diff --git a/net/netfilter/ipset/ip_set_core.c b/net/netfilter/ipset/ip_set_core.c
> index c7ae4d9bf3d24..61431690cbd5f 100644
> --- a/net/netfilter/ipset/ip_set_core.c
> +++ b/net/netfilter/ipset/ip_set_core.c
> @@ -53,12 +53,13 @@ MODULE_DESCRIPTION("core IP set support");
> MODULE_ALIAS_NFNL_SUBSYS(NFNL_SUBSYS_IPSET);
>
> /* When the nfnl mutex or ip_set_ref_lock is held: */
> -#define ip_set_dereference(p) \
> - rcu_dereference_protected(p, \
> +#define ip_set_dereference(inst) \
> + rcu_dereference_protected((inst)->ip_set_list, \
> lockdep_nfnl_is_held(NFNL_SUBSYS_IPSET) || \
> - lockdep_is_held(&ip_set_ref_lock))
> + lockdep_is_held(&ip_set_ref_lock) || \
> + (inst)->is_deleted)
> #define ip_set(inst, id) \
> - ip_set_dereference((inst)->ip_set_list)[id]
> + ip_set_dereference(inst)[id]
> #define ip_set_ref_netlink(inst,id) \
> rcu_dereference_raw((inst)->ip_set_list)[id]
> #define ip_set_dereference_nfnl(p) \
> @@ -1133,7 +1134,7 @@ static int ip_set_create(struct sk_buff *skb, const struct nfnl_info *info,
> if (!list)
> goto cleanup;
> /* nfnl mutex is held, both lists are valid */
> - tmp = ip_set_dereference(inst->ip_set_list);
> + tmp = ip_set_dereference(inst);
> memcpy(list, tmp, sizeof(struct ip_set *) * inst->ip_set_max);
> rcu_assign_pointer(inst->ip_set_list, list);
> /* Make sure all current packets have passed through */
The following commit has been merged into the smp/urgent branch of tip:
Commit-ID: 932d8476399f622aa0767a4a0a9e78e5341dc0e1
Gitweb: https://git.kernel.org/tip/932d8476399f622aa0767a4a0a9e78e5341dc0e1
Author: Yuntao Wang <ytcoode(a)gmail.com>
AuthorDate: Wed, 15 May 2024 21:45:54 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Mon, 17 Jun 2024 15:08:04 +02:00
cpu/hotplug: Fix dynstate assignment in __cpuhp_setup_state_cpuslocked()
Commit 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare
stage") added a dynamic range for the prepare states, but did not handle
the assignment of the dynstate variable in __cpuhp_setup_state_cpuslocked().
This causes the corresponding startup callback not to be invoked when
calling __cpuhp_setup_state_cpuslocked() with the CPUHP_BP_PREPARE_DYN
parameter, even though it should be.
Currently, the users of __cpuhp_setup_state_cpuslocked(), for one reason or
another, have not triggered this bug.
Fixes: 4205e4786d0b ("cpu/hotplug: Provide dynamic range for prepare stage")
Signed-off-by: Yuntao Wang <ytcoode(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20240515134554.427071-1-ytcoode@gmail.com
---
kernel/cpu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 563877d..74cfdb6 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2446,7 +2446,7 @@ EXPORT_SYMBOL_GPL(__cpuhp_state_add_instance);
* The caller needs to hold cpus read locked while calling this function.
* Return:
* On success:
- * Positive state number if @state is CPUHP_AP_ONLINE_DYN;
+ * Positive state number if @state is CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN;
* 0 for all other states
* On failure: proper (negative) error code
*/
@@ -2469,7 +2469,7 @@ int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state,
ret = cpuhp_store_callbacks(state, name, startup, teardown,
multi_instance);
- dynstate = state == CPUHP_AP_ONLINE_DYN;
+ dynstate = state == CPUHP_AP_ONLINE_DYN || state == CPUHP_BP_PREPARE_DYN;
if (ret > 0 && dynstate) {
state = ret;
ret = 0;
@@ -2500,8 +2500,8 @@ int __cpuhp_setup_state_cpuslocked(enum cpuhp_state state,
out:
mutex_unlock(&cpuhp_state_mutex);
/*
- * If the requested state is CPUHP_AP_ONLINE_DYN, return the
- * dynamically allocated state in case of success.
+ * If the requested state is CPUHP_AP_ONLINE_DYN or CPUHP_BP_PREPARE_DYN,
+ * return the dynamically allocated state in case of success.
*/
if (!ret && dynstate)
return state;
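The effect of the dynstate assignment can be modelled in user space. This is a sketch under stated assumptions, not the code in kernel/cpu.c: the enum values, struct setup_result, and setup_state_model() are hypothetical, and store_ret stands in for the result of cpuhp_store_callbacks() (assumed positive when a dynamic slot was allocated, 0 for a fixed state, negative on error).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state numbers; the real ones live in enum cpuhp_state. */
enum model_state {
	MODEL_FIXED_STATE = 5,
	MODEL_BP_PREPARE_DYN = 10,
	MODEL_AP_ONLINE_DYN = 50,
};

struct setup_result {
	int ret;              /* value handed back to the caller */
	bool startup_invoked; /* whether startup callbacks would run */
};

/* with_fix selects between the old and the patched dynstate assignment. */
static struct setup_result setup_state_model(int state, int store_ret,
					     bool with_fix)
{
	struct setup_result res = { .ret = store_ret, .startup_invoked = false };
	bool dynstate = state == MODEL_AP_ONLINE_DYN ||
			(with_fix && state == MODEL_BP_PREPARE_DYN);

	if (res.ret > 0 && dynstate) {
		state = res.ret;
		res.ret = 0;
	}

	/* A nonzero ret skips the startup callbacks in this model. */
	if (res.ret == 0)
		res.startup_invoked = true;

	/* Dynamic requests return the allocated state on success. */
	if (res.ret == 0 && dynstate)
		res.ret = state;

	return res;
}

/* Small self-check: without the fix, a prepare-dyn request skips startup. */
void dynstate_self_check(void)
{
	assert(!setup_state_model(MODEL_BP_PREPARE_DYN, 12, false).startup_invoked);
	assert(setup_state_model(MODEL_BP_PREPARE_DYN, 12, true).startup_invoked);
}
```

In this model the caller gets the same positive number back with or without the fix; the difference is only whether the startup callbacks run, which matches the symptom described in the commit message.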