From: MoYuanhao <moyuanhao3676(a)163.com>
Widen protocol name column from %-9s to %-11s to properly display
UNIX-STREAM and keep table alignment.
before modification:
console:/ # cat /proc/net/protocols
protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PPPOL2TP 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
HIDP 808 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
BNEP 808 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
RFCOMM 840 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
KEY 864 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PACKET 1536 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PINGv6 1184 0 -1 NI 0 yes kernel y y y n n y n n y y y y n y y y y y n
RAWv6 1184 0 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDPLITEv6 1344 0 0 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
UDPv6 1344 0 0 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
TCPv6 2352 0 0 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
PPTP 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PPPOE 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
UNIX-STREAM 1024 29 -1 NI 0 yes kernel y n n n n n n n n n n n n n n n y n n
UNIX 1024 193 -1 NI 0 yes kernel y n n n n n n n n n n n n n n n n n n
UDP-Lite 1152 0 0 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
PING 976 0 -1 NI 0 yes kernel y y y n n y n n y y y y n y y y y y n
RAW 984 0 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDP 1152 0 0 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
TCP 2192 0 0 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
SCO 848 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
L2CAP 824 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
HCI 888 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
NETLINK 1104 18 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
after modification:
console:/ # cat /proc/net/protocols
protocol size sockets memory press maxhdr slab module cl co di ac io in de sh ss gs se re sp bi br ha uh gp em
PPPOL2TP 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
HIDP 808 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
BNEP 808 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
RFCOMM 840 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
KEY 864 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PACKET 1536 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PINGv6 1184 0 -1 NI 0 yes kernel y y y n n y n n y y y y n y y y y y n
RAWv6 1184 0 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDPLITEv6 1344 0 0 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
UDPv6 1344 0 0 NI 0 yes kernel y y y n y y y n y y y y n n n y y y n
TCPv6 2352 0 0 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
PPTP 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
PPPOE 920 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
UNIX-STREAM 1024 29 -1 NI 0 yes kernel y n n n n n n n n n n n n n n n y n n
UNIX 1024 193 -1 NI 0 yes kernel y n n n n n n n n n n n n n n n n n n
UDP-Lite 1152 0 0 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
PING 976 0 -1 NI 0 yes kernel y y y n n y n n y y y y n y y y y y n
RAW 984 0 -1 NI 0 yes kernel y y y n y y y n y y y y n y y y y n n
UDP 1152 0 0 NI 0 yes kernel y y y n y y y n y y y y y n n y y y n
TCP 2192 0 0 no 320 yes kernel y y y y y y y y y y y y y n y y y y y
SCO 848 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
L2CAP 824 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
HCI 888 0 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
NETLINK 1104 18 -1 NI 0 no kernel n n n n n n n n n n n n n n n n n n n
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Signed-off-by: MoYuanhao <moyuanhao3676(a)163.com>
---
net/core/sock.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/core/sock.c b/net/core/sock.c
index 3b409bc8ef6d..d2de5459e94f 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -4284,7 +4284,7 @@ static const char *sock_prot_memory_pressure(struct proto *proto)
static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
{
- seq_printf(seq, "%-9s %4u %6d %6ld %-3s %6u %-3s %-10s "
+ seq_printf(seq, "%-11s %4u %6d %6ld %-3s %6u %-3s %-10s "
"%2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c %2c\n",
proto->name,
proto->obj_size,
@@ -4317,7 +4317,7 @@ static void proto_seq_printf(struct seq_file *seq, struct proto *proto)
static int proto_seq_show(struct seq_file *seq, void *v)
{
if (v == &proto_list)
- seq_printf(seq, "%-9s %-4s %-8s %-6s %-5s %-7s %-4s %-10s %s",
+ seq_printf(seq, "%-11s %-4s %-8s %-6s %-5s %-7s %-4s %-10s %s",
"protocol",
"size",
"sockets",
--
2.34.1
From: Lukas Wunner <lukas(a)wunner.de>
[ Upstream commit 3be5fa236649da6404f1bca1491bf02d4b0d5cce ]
Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") changed IRQ handling on PCI driver probing.
It inadvertently broke resume from system sleep on AMD platforms:
https://lore.kernel.org/r/20150926164651.GA3640@pd.tnic/
This was fixed by two independent commits:
* 8affb487d4a4 ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled")
* cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")
The breaking change and one of these two fixes were subsequently reverted:
* fe25d078874f ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"")
* 6c777e8799a9 ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"")
This rendered the second fix unnecessary, so revert it as well. It used
the match_driver flag in struct pci_dev, which is internal to the PCI core
and not supposed to be touched by arbitrary drivers.
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Acked-by: Joerg Roedel <jroedel(a)suse.de>
Link: https://patch.msgid.link/9a3ddff5cc49512044f963ba0904347bd404094d.174557234…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES**
This commit should be backported to stable kernel trees.
**Extensive Analysis:**
**1. Context and Purpose:**
This commit is part of a coordinated series of reverts addressing
regressions introduced by commit 991de2e59090 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") which broke system
suspend/resume on AMD platforms in v4.3. The original issue was fixed by
two independent commits, one of which (cbbc00be2ce3) added the line
`iommu->dev->match_driver = false;` to prevent other PCI drivers from
binding to IOMMU devices.
**2. Code Change Analysis:**
The commit removes exactly one line from
`drivers/iommu/amd/init.c:2034`:
```c
/* Prevent binding other PCI device drivers to IOMMU devices */
iommu->dev->match_driver = false;
```
This line was setting the `match_driver` field of `struct pci_dev`,
which the commit message correctly identifies as "internal to the PCI
core and not supposed to be touched by arbitrary drivers."
**3. Why This Should Be Backported:**
**a) Part of Coordinated Fix Series:** This revert is the logical
completion of a series of reverts that fixed major regressions. Looking
at the similar commits provided as examples:
- Similar Commit #1 (YES): Revert of x86/PCI pcibios-irq allocation
- Similar Commit #2 (YES): Revert of PCI IRQ management helpers
- Similar Commit #5 (YES): Revert of the original problematic
pcibios_alloc_irq implementation
All these related reverts were marked for backporting, making this
commit part of the same logical fix series.
**b) Removes Inappropriate Code:** The commit eliminates code that
violates kernel design principles by accessing internal PCI core
structures from a driver. The `match_driver` field is not meant to be
manipulated by individual drivers.
**c) Cleanup After Main Fix:** Once the root cause (commit 991de2e59090)
was reverted by commit 6c777e8799a9, the workaround became unnecessary.
Keeping unnecessary workaround code, especially code that
inappropriately accesses internal structures, is problematic.
**d) Minimal Risk:** The change is extremely small and low-risk - it
simply removes one line that was setting an internal field
inappropriately.
**e) Consistency and Completeness:** Since the other reverts in this
series addressing the 991de2e59090 regressions were backported, this
should be included for consistency and to ensure the cleanup is
complete.
**4. Stable Tree Criteria Met:**
- ✅ Fixes inappropriate driver behavior (accessing internal PCI
structures)
- ✅ Small and contained change
- ✅ Part of a series addressing known regressions
- ✅ Minimal risk of introducing new issues
- ✅ Consistent with backporting decisions for related commits
The commit represents necessary cleanup after a coordinated regression
fix and should be backported to maintain consistency with the related
reverts and to remove code that inappropriately accesses internal kernel
structures.
drivers/iommu/amd_iommu_init.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index de29512c75ccc..cd17deeedf349 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -1751,9 +1751,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
if (!iommu->dev)
return -ENODEV;
- /* Prevent binding other PCI device drivers to IOMMU devices */
- iommu->dev->match_driver = false;
-
/* ACPI _PRT won't have an IRQ for IOMMU */
iommu->dev->irq_managed = 1;
--
2.39.5
From: Lukas Wunner <lukas(a)wunner.de>
[ Upstream commit 3be5fa236649da6404f1bca1491bf02d4b0d5cce ]
Commit 991de2e59090 ("PCI, x86: Implement pcibios_alloc_irq() and
pcibios_free_irq()") changed IRQ handling on PCI driver probing.
It inadvertently broke resume from system sleep on AMD platforms:
https://lore.kernel.org/r/20150926164651.GA3640@pd.tnic/
This was fixed by two independent commits:
* 8affb487d4a4 ("x86/PCI: Don't alloc pcibios-irq when MSI is enabled")
* cbbc00be2ce3 ("iommu/amd: Prevent binding other PCI drivers to IOMMU PCI devices")
The breaking change and one of these two fixes were subsequently reverted:
* fe25d078874f ("Revert "x86/PCI: Don't alloc pcibios-irq when MSI is enabled"")
* 6c777e8799a9 ("Revert "PCI, x86: Implement pcibios_alloc_irq() and pcibios_free_irq()"")
This rendered the second fix unnecessary, so revert it as well. It used
the match_driver flag in struct pci_dev, which is internal to the PCI core
and not supposed to be touched by arbitrary drivers.
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Signed-off-by: Krzysztof Wilczyński <kwilczynski(a)kernel.org>
Acked-by: Joerg Roedel <jroedel(a)suse.de>
Link: https://patch.msgid.link/9a3ddff5cc49512044f963ba0904347bd404094d.174557234…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
**YES**
This commit should be backported to stable kernel trees.
**Extensive Analysis:**
**1. Context and Purpose:**
This commit is part of a coordinated series of reverts addressing
regressions introduced by commit 991de2e59090 ("PCI, x86: Implement
pcibios_alloc_irq() and pcibios_free_irq()") which broke system
suspend/resume on AMD platforms in v4.3. The original issue was fixed by
two independent commits, one of which (cbbc00be2ce3) added the line
`iommu->dev->match_driver = false;` to prevent other PCI drivers from
binding to IOMMU devices.
**2. Code Change Analysis:**
The commit removes exactly one line from
`drivers/iommu/amd/init.c:2034`:
```c
/* Prevent binding other PCI device drivers to IOMMU devices */
iommu->dev->match_driver = false;
```
This line was setting the `match_driver` field of `struct pci_dev`,
which the commit message correctly identifies as "internal to the PCI
core and not supposed to be touched by arbitrary drivers."
**3. Why This Should Be Backported:**
**a) Part of Coordinated Fix Series:** This revert is the logical
completion of a series of reverts that fixed major regressions. Looking
at the similar commits provided as examples:
- Similar Commit #1 (YES): Revert of x86/PCI pcibios-irq allocation
- Similar Commit #2 (YES): Revert of PCI IRQ management helpers
- Similar Commit #5 (YES): Revert of the original problematic
pcibios_alloc_irq implementation
All these related reverts were marked for backporting, making this
commit part of the same logical fix series.
**b) Removes Inappropriate Code:** The commit eliminates code that
violates kernel design principles by accessing internal PCI core
structures from a driver. The `match_driver` field is not meant to be
manipulated by individual drivers.
**c) Cleanup After Main Fix:** Once the root cause (commit 991de2e59090)
was reverted by commit 6c777e8799a9, the workaround became unnecessary.
Keeping unnecessary workaround code, especially code that
inappropriately accesses internal structures, is problematic.
**d) Minimal Risk:** The change is extremely small and low-risk - it
simply removes one line that was setting an internal field
inappropriately.
**e) Consistency and Completeness:** Since the other reverts in this
series addressing the 991de2e59090 regressions were backported, this
should be included for consistency and to ensure the cleanup is
complete.
**4. Stable Tree Criteria Met:**
- ✅ Fixes inappropriate driver behavior (accessing internal PCI
structures)
- ✅ Small and contained change
- ✅ Part of a series addressing known regressions
- ✅ Minimal risk of introducing new issues
- ✅ Consistent with backporting decisions for related commits
The commit represents necessary cleanup after a coordinated regression
fix and should be backported to maintain consistency with the related
reverts and to remove code that inappropriately accesses internal kernel
structures.
drivers/iommu/amd/init.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/iommu/amd/init.c b/drivers/iommu/amd/init.c
index e09391ab3deb0..752edbf529f5f 100644
--- a/drivers/iommu/amd/init.c
+++ b/drivers/iommu/amd/init.c
@@ -1876,9 +1876,6 @@ static int __init iommu_init_pci(struct amd_iommu *iommu)
if (!iommu->dev)
return -ENODEV;
- /* Prevent binding other PCI device drivers to IOMMU devices */
- iommu->dev->match_driver = false;
-
/* ACPI _PRT won't have an IRQ for IOMMU */
iommu->dev->irq_managed = 1;
--
2.39.5
Hello,
as described in the commit log of the two commits the rtc-mt6397 driver
relies on these fixes as soon as it should store dates later than
2027-12-31. On one of the patches has a Fixes line, so this submission
is done to ensure that both patches are backported.
The patches sent in reply to this mail are (trivial) backports to
v6.15.1, they should get backported to the older stable kernels, too, to
(somewhat) ensure that in 2028 no surprises happen. `git am` is able to
apply the patches as is to 6.14.y, 6.12.y, 6.6.y, 6.1.y and 5.15.y.
5.10 and 5.4 need an adaption, I didn't look into that yet but can
follow up with backports for these.
The two fixes were accompanied by 3 test updates:
46351921cbe1 ("rtc: test: Emit the seconds-since-1970 value instead of days-since-1970")
da62b49830f8 ("rtc: test: Also test time and wday outcome of rtc_time64_to_tm()")
ccb2dba3c19f ("rtc: test: Test date conversion for dates starting in 1900")
that cover one of the patches. Would you consider it sensible to
backport these, too?
Best regards
Uwe
Alexandre Mergnat (2):
rtc: Make rtc_time64_to_tm() support dates before 1970
rtc: Fix offset calculation for .start_secs < 0
drivers/rtc/class.c | 2 +-
drivers/rtc/lib.c | 24 +++++++++++++++++++-----
2 files changed, 20 insertions(+), 6 deletions(-)
base-commit: 3ef49626da6dd67013fc2cf0a4e4c9e158bb59f7
--
2.47.2
The patch titled
Subject: mm: close theoretical race where stale TLB entries could linger
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-close-theoretical-race-where-stale-tlb-entries-could-linger.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryan Roberts <ryan.roberts(a)arm.com>
Subject: mm: close theoretical race where stale TLB entries could linger
Date: Fri, 6 Jun 2025 10:28:07 +0100
Commit 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a
parallel reclaim leaving stale TLB entries") described a theoretical race
as such:
"""
Nadav Amit identified a theoretical race between page reclaim and mprotect
due to TLB flushes being batched outside of the PTL being held.
He described the race as follows:
CPU0 CPU1
---- ----
user accesses memory using RW PTE
[PTE now cached in TLB]
try_to_unmap_one()
==> ptep_get_and_clear()
==> set_tlb_ubc_flush_pending()
mprotect(addr, PROT_READ)
==> change_pte_range()
==> [ PTE non-present - no flush ]
user writes using cached RW PTE
...
try_to_unmap_flush()
The same type of race exists for reads when protecting for PROT_NONE and
also exists for operations that can leave an old TLB entry behind such as
munmap, mremap and madvise.
"""
The solution was to introduce flush_tlb_batched_pending() and call it
under the PTL from mprotect/madvise/munmap/mremap to complete any pending
tlb flushes.
However, while madvise_free_pte_range() and
madvise_cold_or_pageout_pte_range() were both retro-fitted to call
flush_tlb_batched_pending() immediately after initially acquiring the PTL,
they both temporarily release the PTL to split a large folio if they
stumble upon one. In this case, where re-acquiring the PTL
flush_tlb_batched_pending() must be called again, but it previously was
not. Let's fix that.
There are 2 Fixes: tags here: the first is the commit that fixed
madvise_free_pte_range(). The second is the commit that added
madvise_cold_or_pageout_pte_range(), which looks like it copy/pasted the
faulty pattern from madvise_free_pte_range().
This is a theoretical bug discovered during code review.
Link: https://lkml.kernel.org/r/20250606092809.4194056-1-ryan.roberts@arm.com
Fixes: 3ea277194daa ("mm, mprotect: flush TLB if potentially racing with a parallel reclaim leaving stale TLB entries")
Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Reviewed-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Mel Gorman <mgorman <mgorman(a)suse.de>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/madvise.c~mm-close-theoretical-race-where-stale-tlb-entries-could-linger
+++ a/mm/madvise.c
@@ -508,6 +508,7 @@ restart:
pte_offset_map_lock(mm, pmd, addr, &ptl);
if (!start_pte)
break;
+ flush_tlb_batched_pending(mm);
arch_enter_lazy_mmu_mode();
if (!err)
nr = 0;
@@ -741,6 +742,7 @@ static int madvise_free_pte_range(pmd_t
start_pte = pte;
if (!start_pte)
break;
+ flush_tlb_batched_pending(mm);
arch_enter_lazy_mmu_mode();
if (!err)
nr = 0;
_
Patches currently in -mm which might be from ryan.roberts(a)arm.com are
mm-close-theoretical-race-where-stale-tlb-entries-could-linger.patch
The patch titled
Subject: mm/vma: reset VMA iterator on commit_merge() OOM failure
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vma-reset-vma-iterator-on-commit_merge-oom-failure.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Subject: mm/vma: reset VMA iterator on commit_merge() OOM failure
Date: Fri, 6 Jun 2025 13:50:32 +0100
While an OOM failure in commit_merge() isn't really feasible due to the
allocation which might fail (a maple tree pre-allocation) being 'too small
to fail', we do need to handle this case correctly regardless.
In vma_merge_existing_range(), we can theoretically encounter failures
which result in an OOM error in two ways - firstly dup_anon_vma() might
fail with an OOM error, and secondly commit_merge() failing, ultimately,
to pre-allocate a maple tree node.
The abort logic for dup_anon_vma() resets the VMA iterator to the initial
range, ensuring that any logic looping on this iterator will correctly
proceed to the next VMA.
However the commit_merge() abort logic does not do the same thing. This
resulted in a syzbot report occurring because mlockall() iterates through
VMAs, is tolerant of errors, but ended up with an incorrect previous VMA
being specified due to incorrect iterator state.
While making this change, it became apparent we are duplicating logic -
the logic introduced in commit 41e6ddcaa0f1 ("mm/vma: add give_up_on_oom
option on modify/merge, use in uffd release") duplicates the
vmg->give_up_on_oom check in both abort branches.
Additionally, we observe that we can perform the anon_dup check safely on
dup_anon_vma() failure, as this will not be modified should this call
fail.
Finally, we need to reset the iterator in both cases, so now we can simply
use the exact same code to abort for both.
We remove the VM_WARN_ON(err != -ENOMEM) as it would be silly for this to
be otherwise and it allows us to implement the abort check more neatly.
Link: https://lkml.kernel.org/r/20250606125032.164249-1-lorenzo.stoakes@oracle.com
Fixes: 47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Reported-by: syzbot+d16409ea9ecc16ed261a(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-mm/6842cc67.a00a0220.29ac89.003b.GAE@google.c…
Reviewed-by: Pedro Falcato <pfalcato(a)suse.de>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Jann Horn <jannh(a)google.com>
Cc: Liam Howlett <liam.howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vma.c | 22 ++++------------------
1 file changed, 4 insertions(+), 18 deletions(-)
--- a/mm/vma.c~mm-vma-reset-vma-iterator-on-commit_merge-oom-failure
+++ a/mm/vma.c
@@ -961,26 +961,9 @@ static __must_check struct vm_area_struc
err = dup_anon_vma(next, middle, &anon_dup);
}
- if (err)
+ if (err || commit_merge(vmg))
goto abort;
- err = commit_merge(vmg);
- if (err) {
- VM_WARN_ON(err != -ENOMEM);
-
- if (anon_dup)
- unlink_anon_vmas(anon_dup);
-
- /*
- * We've cleaned up any cloned anon_vma's, no VMAs have been
- * modified, no harm no foul if the user requests that we not
- * report this and just give up, leaving the VMAs unmerged.
- */
- if (!vmg->give_up_on_oom)
- vmg->state = VMA_MERGE_ERROR_NOMEM;
- return NULL;
- }
-
khugepaged_enter_vma(vmg->target, vmg->flags);
vmg->state = VMA_MERGE_SUCCESS;
return vmg->target;
@@ -989,6 +972,9 @@ abort:
vma_iter_set(vmg->vmi, start);
vma_iter_load(vmg->vmi);
+ if (anon_dup)
+ unlink_anon_vmas(anon_dup);
+
/*
* This means we have failed to clone anon_vma's correctly, but no
* actual changes to VMAs have occurred, so no harm no foul - if the
_
Patches currently in -mm which might be from lorenzo.stoakes(a)oracle.com are
mm-vma-reset-vma-iterator-on-commit_merge-oom-failure.patch
docs-mm-expand-vma-doc-to-highlight-pte-freeing-non-vma-traversal.patch
mm-ksm-have-ksm-vma-checks-not-require-a-vma-pointer.patch
mm-ksm-refer-to-special-vmas-via-vm_special-in-ksm_compatible.patch
mm-prevent-ksm-from-breaking-vma-merging-for-new-vmas.patch
tools-testing-selftests-add-vma-merge-tests-for-ksm-merge.patch
mm-pagewalk-split-walk_page_range_novma-into-kernel-user-parts.patch
fxls8962af_fifo_flush() uses indio_dev->active_scan_mask (with
iio_for_each_active_channel()) without making sure the indio_dev
stays in buffer mode.
There is a race if indio_dev exits buffer mode in the middle of the
interrupt that flushes the fifo. Fix this by calling
synchronize_irq() to ensure that no interrupt is currently running when
disabling buffer mode.
Unable to handle kernel NULL pointer dereference at virtual address 00000000 when read
[...]
_find_first_bit_le from fxls8962af_fifo_flush+0x17c/0x290
fxls8962af_fifo_flush from fxls8962af_interrupt+0x80/0x178
fxls8962af_interrupt from irq_thread_fn+0x1c/0x7c
irq_thread_fn from irq_thread+0x110/0x1f4
irq_thread from kthread+0xe0/0xfc
kthread from ret_from_fork+0x14/0x2c
Fixes: 79e3a5bdd9ef ("iio: accel: fxls8962af: add hw buffered sampling")
Cc: stable(a)vger.kernel.org
Suggested-by: David Lechner <dlechner(a)baylibre.com>
Signed-off-by: Sean Nyekjaer <sean(a)geanix.com>
---
Changes in v2:
- As per David's suggestion; switched to use synchronize_irq() instead.
- Link to v1: https://lore.kernel.org/r/20250524-fxlsrace-v1-1-dec506dc87ae@geanix.com
---
drivers/iio/accel/fxls8962af-core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/iio/accel/fxls8962af-core.c b/drivers/iio/accel/fxls8962af-core.c
index 6d23da3e7aa22c61f2d9348bb91d70cc5719a732..f2558fba491dffa78b26d47d2cd9f1f4d9811f54 100644
--- a/drivers/iio/accel/fxls8962af-core.c
+++ b/drivers/iio/accel/fxls8962af-core.c
@@ -866,6 +866,8 @@ static int fxls8962af_buffer_predisable(struct iio_dev *indio_dev)
if (ret)
return ret;
+ synchronize_irq(data->irq);
+
ret = __fxls8962af_fifo_set_mode(data, false);
if (data->enable_event)
---
base-commit: 5c3fcb36c92443a9a037683626a2e43d8825f783
change-id: 20250524-fxlsrace-f4d20e29fb29
Best regards,
--
Sean Nyekjaer <sean(a)geanix.com>
From: Ajay Agarwal <ajayagarwal(a)google.com>
[ Upstream commit 7447990137bf06b2aeecad9c6081e01a9f47f2aa ]
PCIe r6.2, sec 5.5.4, requires that:
If setting either or both of the enable bits for ASPM L1 PM Substates,
both ports must be configured as described in this section while ASPM L1
is disabled.
Previously, pcie_config_aspm_l1ss() assumed that "setting enable bits"
meant "setting them to 1", and it configured L1SS as follows:
- Clear L1SS enable bits
- Disable L1
- Configure L1SS enable bits as required
- Enable L1 if required
With this sequence, when disabling L1SS on an ARM A-core with a Synopsys
DesignWare PCIe core, the CPU occasionally hangs when reading
PCI_L1SS_CTL1, leading to a reboot when the CPU watchdog expires.
Move the L1 disable to the caller (pcie_config_aspm_link(), where L1 was
already enabled) so L1 is always disabled while updating the L1SS bits:
- Disable L1
- Clear L1SS enable bits
- Configure L1SS enable bits as required
- Enable L1 if required
Change pcie_aspm_cap_init() similarly.
Link: https://lore.kernel.org/r/20241007032917.872262-1-ajayagarwal@google.com
Signed-off-by: Ajay Agarwal <ajayagarwal(a)google.com>
[bhelgaas: comments, commit log, compute L1SS setting before config access]
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Tested-by: Johnny-CC Chang <Johnny-CC.Chang(a)mediatek.com>
Signed-off-by: Macpaul Lin <macpaul.lin(a)mediatek.com>
---
drivers/pci/pcie/aspm.c | 92 ++++++++++++++++++++++-------------------
1 file changed, 50 insertions(+), 42 deletions(-)
diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index cee2365e54b8..e943691bc931 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -805,6 +805,15 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
pcie_capability_read_word(parent, PCI_EXP_LNKCTL, &parent_lnkctl);
pcie_capability_read_word(child, PCI_EXP_LNKCTL, &child_lnkctl);
+ /* Disable L0s/L1 before updating L1SS config */
+ if (FIELD_GET(PCI_EXP_LNKCTL_ASPMC, child_lnkctl) ||
+ FIELD_GET(PCI_EXP_LNKCTL_ASPMC, parent_lnkctl)) {
+ pcie_capability_write_word(child, PCI_EXP_LNKCTL,
+ child_lnkctl & ~PCI_EXP_LNKCTL_ASPMC);
+ pcie_capability_write_word(parent, PCI_EXP_LNKCTL,
+ parent_lnkctl & ~PCI_EXP_LNKCTL_ASPMC);
+ }
+
/*
* Setup L0s state
*
@@ -829,6 +838,13 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
aspm_l1ss_init(link);
+ /* Restore L0s/L1 if they were enabled */
+ if (FIELD_GET(PCI_EXP_LNKCTL_ASPMC, child_lnkctl) ||
+ FIELD_GET(PCI_EXP_LNKCTL_ASPMC, parent_lnkctl)) {
+ pcie_capability_write_word(parent, PCI_EXP_LNKCTL, parent_lnkctl);
+ pcie_capability_write_word(child, PCI_EXP_LNKCTL, child_lnkctl);
+ }
+
/* Save default state */
link->aspm_default = link->aspm_enabled;
@@ -845,25 +861,28 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist)
}
}
-/* Configure the ASPM L1 substates */
+/* Configure the ASPM L1 substates. Caller must disable L1 first. */
static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
{
- u32 val, enable_req;
+ u32 val;
struct pci_dev *child = link->downstream, *parent = link->pdev;
- enable_req = (link->aspm_enabled ^ state) & state;
+ val = 0;
+ if (state & PCIE_LINK_STATE_L1_1)
+ val |= PCI_L1SS_CTL1_ASPM_L1_1;
+ if (state & PCIE_LINK_STATE_L1_2)
+ val |= PCI_L1SS_CTL1_ASPM_L1_2;
+ if (state & PCIE_LINK_STATE_L1_1_PCIPM)
+ val |= PCI_L1SS_CTL1_PCIPM_L1_1;
+ if (state & PCIE_LINK_STATE_L1_2_PCIPM)
+ val |= PCI_L1SS_CTL1_PCIPM_L1_2;
/*
- * Here are the rules specified in the PCIe spec for enabling L1SS:
- * - When enabling L1.x, enable bit at parent first, then at child
- * - When disabling L1.x, disable bit at child first, then at parent
- * - When enabling ASPM L1.x, need to disable L1
- * (at child followed by parent).
- * - The ASPM/PCIPM L1.2 must be disabled while programming timing
+ * PCIe r6.2, sec 5.5.4, rules for enabling L1 PM Substates:
+ * - Clear L1.x enable bits at child first, then at parent
+ * - Set L1.x enable bits at parent first, then at child
+ * - ASPM/PCIPM L1.2 must be disabled while programming timing
* parameters
- *
- * To keep it simple, disable all L1SS bits first, and later enable
- * what is needed.
*/
/* Disable all L1 substates */
@@ -871,26 +890,6 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
PCI_L1SS_CTL1_L1SS_MASK, 0);
pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
PCI_L1SS_CTL1_L1SS_MASK, 0);
- /*
- * If needed, disable L1, and it gets enabled later
- * in pcie_config_aspm_link().
- */
- if (enable_req & (PCIE_LINK_STATE_L1_1 | PCIE_LINK_STATE_L1_2)) {
- pcie_capability_clear_word(child, PCI_EXP_LNKCTL,
- PCI_EXP_LNKCTL_ASPM_L1);
- pcie_capability_clear_word(parent, PCI_EXP_LNKCTL,
- PCI_EXP_LNKCTL_ASPM_L1);
- }
-
- val = 0;
- if (state & PCIE_LINK_STATE_L1_1)
- val |= PCI_L1SS_CTL1_ASPM_L1_1;
- if (state & PCIE_LINK_STATE_L1_2)
- val |= PCI_L1SS_CTL1_ASPM_L1_2;
- if (state & PCIE_LINK_STATE_L1_1_PCIPM)
- val |= PCI_L1SS_CTL1_PCIPM_L1_1;
- if (state & PCIE_LINK_STATE_L1_2_PCIPM)
- val |= PCI_L1SS_CTL1_PCIPM_L1_2;
/* Enable what we need to enable */
pci_clear_and_set_config_dword(parent, parent->l1ss + PCI_L1SS_CTL1,
@@ -937,21 +936,30 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state)
dwstream |= PCI_EXP_LNKCTL_ASPM_L1;
}
+ /*
+ * Per PCIe r6.2, sec 5.5.4, setting either or both of the enable
+ * bits for ASPM L1 PM Substates must be done while ASPM L1 is
+ * disabled. Disable L1 here and apply new configuration after L1SS
+ * configuration has been completed.
+ *
+ * Per sec 7.5.3.7, when disabling ASPM L1, software must disable
+ * it in the Downstream component prior to disabling it in the
+ * Upstream component, and ASPM L1 must be enabled in the Upstream
+ * component prior to enabling it in the Downstream component.
+ *
+ * Sec 7.5.3.7 also recommends programming the same ASPM Control
+ * value for all functions of a multi-function device.
+ */
+ list_for_each_entry(child, &linkbus->devices, bus_list)
+ pcie_config_aspm_dev(child, 0);
+ pcie_config_aspm_dev(parent, 0);
+
if (link->aspm_capable & PCIE_LINK_STATE_L1SS)
pcie_config_aspm_l1ss(link, state);
- /*
- * Spec 2.0 suggests all functions should be configured the
- * same setting for ASPM. Enabling ASPM L1 should be done in
- * upstream component first and then downstream, and vice
- * versa for disabling ASPM L1. Spec doesn't mention L0S.
- */
- if (state & PCIE_LINK_STATE_L1)
- pcie_config_aspm_dev(parent, upstream);
+ pcie_config_aspm_dev(parent, upstream);
list_for_each_entry(child, &linkbus->devices, bus_list)
pcie_config_aspm_dev(child, dwstream);
- if (!(state & PCIE_LINK_STATE_L1))
- pcie_config_aspm_dev(parent, upstream);
link->aspm_enabled = state;
--
2.45.2
Hi,
I wanted to check if you’d be interested in acquiring the attendees list of UITP Summit - Hamburg 2025?
Event Overview:
Dates: 15 - 18 Jun 2025
Location: Hamburg, Germany
Attendees: 10,126
Exhibitors: 380
Each contact contains: Contact Name, First Name, Last Name, Job Title, Company, Website Address, City, State, Zip, Country Code, Revenue, Employee Size, Email, Phone Number, and Fax Number.
This dataset is an excellent asset for companies looking to expand their reach, build partnerships, and strengthen market presence.
If you're interested in the list, just reply "Send Counts and Cost"?
Best regards,
Tammie Wells
Senior Marketing Manager
To unsubscribe, simply respond with “Not interested.”
Hi,
Please cherry-pick following 9 patches to 6.12:
525a3858aad73 accel/ivpu: Set 500 ns delay between power island TRICKLE and ENABLE
08eb99ce911d3 accel/ivpu: Do not fail on cmdq if failed to allocate preemption buffers
755fb86789165 accel/ivpu: Use whole user and shave ranges for preemption buffers
98110eb5924bd accel/ivpu: Increase MS info buffer size
c140244f0cfb9 accel/ivpu: Add initial Panther Lake support
88bdd1644ca28 accel/ivpu: Update power island delays
ce68f86c44513 accel/ivpu: Do not fail when more than 1 tile is fused
83b6fa5844b53 accel/ivpu: Increase DMA address range
e91191efe75a9 accel/ivpu: Move secondary preemption buffer allocation to DMA range
These add support for new Panther Lake HW.
They should apply without conflicts.
Thanks,
Jacek
IDT event delivery has a debug hole in which it does not generate #DB
upon returning to userspace before the first userspace instruction is
executed if the Trap Flag (TF) is set.
FRED closes this hole by introducing a software event flag, i.e., bit
17 of the augmented SS: if the bit is set and ERETU would result in
RFLAGS.TF = 1, a single-step trap will be pending upon completion of
ERETU.
However I overlooked properly setting and clearing the bit in different
situations. Thus when FRED is enabled, if the Trap Flag (TF) is set
without an external debugger attached, it can lead to an infinite loop
in the SIGTRAP handler. To avoid this, the software event flag in the
augmented SS must be cleared, ensuring that no single-step trap remains
pending when ERETU completes.
This patch set combines the fix [1] and its corresponding selftest [2]
(requested by Dave Hansen) into one patch set.
[1] https://lore.kernel.org/lkml/20250523050153.3308237-1-xin@zytor.com/
[2] https://lore.kernel.org/lkml/20250530230707.2528916-1-xin@zytor.com/
This patch set is based on tip/x86/urgent branch as of today.
Link to v4 of this patch set:
https://lore.kernel.org/lkml/20250605181020.590459-1-xin@zytor.com/
Changes in v5:
*) Accurately rephrase the shortlog (hpa).
*) Do "sub $-128, %rsp" rather than "add $128, %rsp", which is more
efficient in code size (hpa).
*) Add TB from Sohil.
*) Add Cc: stable(a)vger.kernel.org to all patches.
Xin Li (Intel) (2):
x86/fred/signal: Prevent immediate repeat of single step trap on
return from SIGTRAP handler
selftests/x86: Add a test to detect infinite sigtrap handler loop
arch/x86/include/asm/sighandling.h | 22 +++++
arch/x86/kernel/signal_32.c | 4 +
arch/x86/kernel/signal_64.c | 4 +
tools/testing/selftests/x86/Makefile | 2 +-
tools/testing/selftests/x86/sigtrap_loop.c | 98 ++++++++++++++++++++++
5 files changed, 129 insertions(+), 1 deletion(-)
create mode 100644 tools/testing/selftests/x86/sigtrap_loop.c
base-commit: dd2922dcfaa3296846265e113309e5f7f138839f
--
2.49.0
When sysfs_create_files() fails in wacom_initialize_remotes() the error
is returned and the cleanup action will not have been registered yet.
As a result the kobject���s refcount is never dropped, so the
kobject can never be freed leading to a reference leak.
Fix this by calling kobject_put() before returning.
Fixes: 83e6b40e2de6 ("HID: wacom: EKR: have the wacom resources dynamically allocated")
Acked-by: Ping Cheng <ping.cheng(a)wacom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/hid/wacom_sys.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index 58cbd43a37e9..1257131b1e34 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -2059,6 +2059,7 @@ static int wacom_initialize_remotes(struct wacom *wacom)
hid_err(wacom->hdev,
"cannot create sysfs group err: %d\n", error);
kfifo_free(&remote->remote_fifo);
+ kobject_put(remote->remote_dir);
return error;
}
--
2.39.5
During wacom_initialize_remotes() a fifo buffer is allocated
with kfifo_alloc() and later a cleanup action is registered
during devm_add_action_or_reset() to clean it up.
However if the code fails to create a kobject and register it
with sysfs the code simply returns -ENOMEM before the cleanup
action is registered leading to a memory leak.
Fix this by ensuring the fifo is freed when the kobject creation
and registration process fails.
Fixes: 83e6b40e2de6 ("HID: wacom: EKR: have the wacom resources dynamically allocated")
Reviewed-by: Ping Cheng <ping.cheng(a)wacom.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Qasim Ijaz <qasdev00(a)gmail.com>
---
drivers/hid/wacom_sys.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/wacom_sys.c b/drivers/hid/wacom_sys.c
index eaf099b2efdb..ec5282bc69d6 100644
--- a/drivers/hid/wacom_sys.c
+++ b/drivers/hid/wacom_sys.c
@@ -2048,8 +2048,10 @@ static int wacom_initialize_remotes(struct wacom *wacom)
remote->remote_dir = kobject_create_and_add("wacom_remote",
&wacom->hdev->dev.kobj);
- if (!remote->remote_dir)
+ if (!remote->remote_dir) {
+ kfifo_free(&remote->remote_fifo);
return -ENOMEM;
+ }
error = sysfs_create_files(remote->remote_dir, remote_unpair_attrs);
--
2.39.5
From: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
commit 1c9977b263475373b31bbf86af94a5c9ae2be42c upstream.
Commit 3ef9f710efcb ("pinctrl: mediatek: Add EINT support for multiple
addresses") introduced an access to the 'soc' field of struct
mtk_pinctrl in mtk_eint_do_init() and for that an include of
pinctrl-mtk-common-v2.h.
However, pinctrl drivers relying on the v1 common driver include
pinctrl-mtk-common.h instead, which provides another definition of
struct mtk_pinctrl that does not contain an 'soc' field.
Since mtk_eint_do_init() can be called both by v1 and v2 drivers, it
will now try to dereference an invalid pointer when called on v1
platforms. This has been observed on Genio 350 EVK (MT8365), which
crashes very early in boot (the kernel trace can only be seen with
earlycon).
In order to fix this, since 'struct mtk_pinctrl' was only needed to get
a 'struct mtk_eint_pin', make 'struct mtk_eint_pin' a parameter
of mtk_eint_do_init() so that callers need to supply it, removing
mtk_eint_do_init()'s dependency on any particular 'struct mtk_pinctrl'.
Fixes: 3ef9f710efcb ("pinctrl: mediatek: Add EINT support for multiple addresses")
Suggested-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
Link: https://lore.kernel.org/20250520-genio-350-eint-null-ptr-deref-fix-v2-1-6a3…
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
[ukleinek: backport to 6.15.y]
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)baylibre.com>
---
Hello,
would be great to have this in 6.15. Further backporting isn't needed as
3ef9f710efcb == v6.15-rc1~106^2 isn't in 6.14.
This patch fixes booting on mt8365-evk (and probably a few more machines
based on mediatek SoCs.
There was an easy conflict with
86dee87f4b2e6ac119b03810e58723d0b27787a4 in
drivers/pinctrl/mediatek/mtk-eint.c.
Thanks
Uwe
drivers/pinctrl/mediatek/mtk-eint.c | 26 ++++++++-----------
drivers/pinctrl/mediatek/mtk-eint.h | 5 ++--
.../pinctrl/mediatek/pinctrl-mtk-common-v2.c | 2 +-
drivers/pinctrl/mediatek/pinctrl-mtk-common.c | 2 +-
4 files changed, 16 insertions(+), 19 deletions(-)
diff --git a/drivers/pinctrl/mediatek/mtk-eint.c b/drivers/pinctrl/mediatek/mtk-eint.c
index b4eb2beab691..c516c34aaaf6 100644
--- a/drivers/pinctrl/mediatek/mtk-eint.c
+++ b/drivers/pinctrl/mediatek/mtk-eint.c
@@ -22,7 +22,6 @@
#include <linux/platform_device.h>
#include "mtk-eint.h"
-#include "pinctrl-mtk-common-v2.h"
#define MTK_EINT_EDGE_SENSITIVE 0
#define MTK_EINT_LEVEL_SENSITIVE 1
@@ -505,10 +504,9 @@ int mtk_eint_find_irq(struct mtk_eint *eint, unsigned long eint_n)
}
EXPORT_SYMBOL_GPL(mtk_eint_find_irq);
-int mtk_eint_do_init(struct mtk_eint *eint)
+int mtk_eint_do_init(struct mtk_eint *eint, struct mtk_eint_pin *eint_pin)
{
unsigned int size, i, port, inst = 0;
- struct mtk_pinctrl *hw = (struct mtk_pinctrl *)eint->pctl;
/* If clients don't assign a specific regs, let's use generic one */
if (!eint->regs)
@@ -519,7 +517,15 @@ int mtk_eint_do_init(struct mtk_eint *eint)
if (!eint->base_pin_num)
return -ENOMEM;
- if (eint->nbase == 1) {
+ if (eint_pin) {
+ eint->pins = eint_pin;
+ for (i = 0; i < eint->hw->ap_num; i++) {
+ inst = eint->pins[i].instance;
+ if (inst >= eint->nbase)
+ continue;
+ eint->base_pin_num[inst]++;
+ }
+ } else {
size = eint->hw->ap_num * sizeof(struct mtk_eint_pin);
eint->pins = devm_kmalloc(eint->dev, size, GFP_KERNEL);
if (!eint->pins)
@@ -533,16 +539,6 @@ int mtk_eint_do_init(struct mtk_eint *eint)
}
}
- if (hw && hw->soc && hw->soc->eint_pin) {
- eint->pins = hw->soc->eint_pin;
- for (i = 0; i < eint->hw->ap_num; i++) {
- inst = eint->pins[i].instance;
- if (inst >= eint->nbase)
- continue;
- eint->base_pin_num[inst]++;
- }
- }
-
eint->pin_list = devm_kmalloc(eint->dev, eint->nbase * sizeof(u16 *), GFP_KERNEL);
if (!eint->pin_list)
goto err_pin_list;
@@ -610,7 +606,7 @@ int mtk_eint_do_init(struct mtk_eint *eint)
err_wake_mask:
devm_kfree(eint->dev, eint->pin_list);
err_pin_list:
- if (eint->nbase == 1)
+ if (!eint_pin)
devm_kfree(eint->dev, eint->pins);
err_pins:
devm_kfree(eint->dev, eint->base_pin_num);
diff --git a/drivers/pinctrl/mediatek/mtk-eint.h b/drivers/pinctrl/mediatek/mtk-eint.h
index f7f58cca0d5e..23801d4b636f 100644
--- a/drivers/pinctrl/mediatek/mtk-eint.h
+++ b/drivers/pinctrl/mediatek/mtk-eint.h
@@ -88,7 +88,7 @@ struct mtk_eint {
};
#if IS_ENABLED(CONFIG_EINT_MTK)
-int mtk_eint_do_init(struct mtk_eint *eint);
+int mtk_eint_do_init(struct mtk_eint *eint, struct mtk_eint_pin *eint_pin);
int mtk_eint_do_suspend(struct mtk_eint *eint);
int mtk_eint_do_resume(struct mtk_eint *eint);
int mtk_eint_set_debounce(struct mtk_eint *eint, unsigned long eint_n,
@@ -96,7 +96,8 @@ int mtk_eint_set_debounce(struct mtk_eint *eint, unsigned long eint_n,
int mtk_eint_find_irq(struct mtk_eint *eint, unsigned long eint_n);
#else
-static inline int mtk_eint_do_init(struct mtk_eint *eint)
+static inline int mtk_eint_do_init(struct mtk_eint *eint,
+ struct mtk_eint_pin *eint_pin)
{
return -EOPNOTSUPP;
}
diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
index d1556b75d9ef..ba13558bfcd7 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
@@ -416,7 +416,7 @@ int mtk_build_eint(struct mtk_pinctrl *hw, struct platform_device *pdev)
hw->eint->pctl = hw;
hw->eint->gpio_xlate = &mtk_eint_xt;
- ret = mtk_eint_do_init(hw->eint);
+ ret = mtk_eint_do_init(hw->eint, hw->soc->eint_pin);
if (ret)
goto err_free_eint;
diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
index 8596f3541265..7289648eaa02 100644
--- a/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
+++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common.c
@@ -1039,7 +1039,7 @@ static int mtk_eint_init(struct mtk_pinctrl *pctl, struct platform_device *pdev)
pctl->eint->pctl = pctl;
pctl->eint->gpio_xlate = &mtk_eint_xt;
- return mtk_eint_do_init(pctl->eint);
+ return mtk_eint_do_init(pctl->eint, NULL);
}
/* This is used as a common probe function */
--
2.47.2
From: Steven Rostedt <rostedt(a)goodmis.org>
When faultable trace events were added, a trace event may no longer use
normal RCU to synchronize but instead used synchronize_rcu_tasks_trace().
This synchronization takes a much longer time to synchronize.
The filter logic would free the filters by calling
tracepoint_synchronize_unregister() after it unhooked the filter strings
and before freeing them. With this function now calling
synchronize_rcu_tasks_trace() this increased the time to free a filter
tremendously. On a PREEMPT_RT system, it was even more noticeable.
# time trace-cmd record -p function sleep 1
[..]
real 2m29.052s
user 0m0.244s
sys 0m20.136s
As trace-cmd would clear out all the filters before recording, it could
take up to 2 minutes to do a recording of "sleep 1".
To find out where the issues was:
~# trace-cmd sqlhist -e -n sched_stack select start.prev_state as state, end.next_comm as comm, TIMESTAMP_DELTA_USECS as delta, start.STACKTRACE as stack from sched_switch as start join sched_switch as end on start.prev_pid = end.next_pid
Which will produce the following commands (and -e will also execute them):
echo 's:sched_stack s64 state; char comm[16]; u64 delta; unsigned long stack[];' >> /sys/kernel/tracing/dynamic_events
echo 'hist:keys=prev_pid:__arg_18057_2=prev_state,__arg_18057_4=common_timestamp.usecs,__arg_18057_7=common_stacktrace' >> /sys/kernel/tracing/events/sched/sched_switch/trigger
echo 'hist:keys=next_pid:__state_18057_1=$__arg_18057_2,__comm_18057_3=next_comm,__delta_18057_5=common_timestamp.usecs-$__arg_18057_4,__stack_18057_6=$__arg_18057_7:onmatch(sched.sched_switch).trace(sched_stack,$__state_18057_1,$__comm_18057_3,$__delta_18057_5,$__stack_18057_6)' >> /sys/kernel/tracing/events/sched/sched_switch/trigger
The above creates a synthetic event that creates a stack trace when a task
schedules out and records it with the time it scheduled back in. Basically
the time a task is off the CPU. It also records the state of the task when
it left the CPU (running, blocked, sleeping, etc). It also saves the comm
of the task as "comm" (needed for the next command).
~# echo 'hist:keys=state,stack.stacktrace:vals=delta:sort=state,delta if comm == "trace-cmd" && state & 3' > /sys/kernel/tracing/events/synthetic/sched_stack/trigger
The above creates a histogram with buckets per state, per stack, and the
value of the total time it was off the CPU for that stack trace. It filters
on tasks with "comm == trace-cmd" and only the sleeping and blocked states
(1 - sleeping, 2 - blocked).
~# trace-cmd record -p function sleep 1
~# cat /sys/kernel/tracing/events/synthetic/sched_stack/hist | tail -18
{ state: 2, stack.stacktrace __schedule+0x1545/0x3700
schedule+0xe2/0x390
schedule_timeout+0x175/0x200
wait_for_completion_state+0x294/0x440
__wait_rcu_gp+0x247/0x4f0
synchronize_rcu_tasks_generic+0x151/0x230
apply_subsystem_event_filter+0xa2b/0x1300
subsystem_filter_write+0x67/0xc0
vfs_write+0x1e2/0xeb0
ksys_write+0xff/0x1d0
do_syscall_64+0x7b/0x420
entry_SYSCALL_64_after_hwframe+0x76/0x7e
} hitcount: 237 delta: 99756288 <<--------------- Delta is 99 seconds!
Totals:
Hits: 525
Entries: 21
Dropped: 0
This shows that this particular trace waited for 99 seconds on
synchronize_rcu_tasks() in apply_subsystem_event_filter().
In fact, there's a lot of places in the filter code that spends a lot of
time waiting for synchronize_rcu_tasks_trace() in order to free the
filters.
Add helper functions that will use call_rcu*() variants to asynchronously
free the filters. This brings the timings back to normal:
# time trace-cmd record -p function sleep 1
[..]
real 0m14.681s
user 0m0.335s
sys 0m28.616s
And the histogram also shows this:
~# cat /sys/kernel/tracing/events/synthetic/sched_stack/hist | tail -21
{ state: 2, stack.stacktrace __schedule+0x1545/0x3700
schedule+0xe2/0x390
schedule_timeout+0x175/0x200
wait_for_completion_state+0x294/0x440
__wait_rcu_gp+0x247/0x4f0
synchronize_rcu_normal+0x3db/0x5c0
tracing_reset_online_cpus+0x8f/0x1e0
tracing_open+0x335/0x440
do_dentry_open+0x4c6/0x17a0
vfs_open+0x82/0x360
path_openat+0x1a36/0x2990
do_filp_open+0x1c5/0x420
do_sys_openat2+0xed/0x180
__x64_sys_openat+0x108/0x1d0
do_syscall_64+0x7b/0x420
} hitcount: 2 delta: 77044
Totals:
Hits: 55
Entries: 28
Dropped: 0
Where the total waiting time of synchronize_rcu_tasks_trace() is 77
milliseconds.
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: "Paul E. McKenney" <paulmck(a)kernel.org>
Cc: "Kiszka, Jan" <jan.kiszka(a)siemens.com>
Cc: "Ziegler, Andreas" <ziegler.andreas(a)siemens.com>
Cc: "MOESSBAUER, Felix" <felix.moessbauer(a)siemens.com>
Link: https://lore.kernel.org/20250605161701.35f7989a@gandalf.local.home
Reported-by: "Flot, Julien" <julien.flot(a)siemens.com>
Tested-by: Julien Flot <julien.flot(a)siemens.com>
Fixes: a363d27cdbc2 ("tracing: Allow system call tracepoints to handle page faults")
Closes: https://lore.kernel.org/all/240017f656631c7dd4017aa93d91f41f653788ea.camel@…
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_filter.c | 164 ++++++++++++++++++++++-------
1 file changed, 127 insertions(+), 37 deletions(-)
diff --git a/kernel/trace/trace_events_filter.c b/kernel/trace/trace_events_filter.c
index 2048560264bb..3ff782d6b522 100644
--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1335,6 +1335,74 @@ static void filter_free_subsystem_preds(struct trace_subsystem_dir *dir,
}
}
+struct filter_list {
+ struct list_head list;
+ struct event_filter *filter;
+};
+
+struct filter_head {
+ struct list_head list;
+ struct rcu_head rcu;
+};
+
+
+static void free_filter_list(struct rcu_head *rhp)
+{
+ struct filter_head *filter_list = container_of(rhp, struct filter_head, rcu);
+ struct filter_list *filter_item, *tmp;
+
+ list_for_each_entry_safe(filter_item, tmp, &filter_list->list, list) {
+ __free_filter(filter_item->filter);
+ list_del(&filter_item->list);
+ kfree(filter_item);
+ }
+ kfree(filter_list);
+}
+
+static void free_filter_list_tasks(struct rcu_head *rhp)
+{
+ call_rcu(rhp, free_filter_list);
+}
+
+/*
+ * The tracepoint_synchronize_unregister() is a double rcu call.
+ * It calls synchronize_rcu_tasks_trace() followed by synchronize_rcu().
+ * Instead of waiting for it, simply call these via the call_rcu*()
+ * variants.
+ */
+static void delay_free_filter(struct filter_head *head)
+{
+ call_rcu_tasks_trace(&head->rcu, free_filter_list_tasks);
+}
+
+static void try_delay_free_filter(struct event_filter *filter)
+{
+ struct filter_head *head;
+ struct filter_list *item;
+
+ head = kmalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ goto free_now;
+
+ INIT_LIST_HEAD(&head->list);
+
+ item = kmalloc(sizeof(*item), GFP_KERNEL);
+ if (!item) {
+ kfree(head);
+ goto free_now;
+ }
+
+ item->filter = filter;
+ list_add_tail(&item->list, &head->list);
+ delay_free_filter(head);
+ return;
+
+ free_now:
+ /* Make sure the filter is not being used */
+ tracepoint_synchronize_unregister();
+ __free_filter(filter);
+}
+
static inline void __free_subsystem_filter(struct trace_event_file *file)
{
__free_filter(file->filter);
@@ -1342,15 +1410,53 @@ static inline void __free_subsystem_filter(struct trace_event_file *file)
}
static void filter_free_subsystem_filters(struct trace_subsystem_dir *dir,
- struct trace_array *tr)
+ struct trace_array *tr,
+ struct event_filter *filter)
{
struct trace_event_file *file;
+ struct filter_head *head;
+ struct filter_list *item;
+
+ head = kmalloc(sizeof(*head), GFP_KERNEL);
+ if (!head)
+ goto free_now;
+
+ INIT_LIST_HEAD(&head->list);
+
+ item = kmalloc(sizeof(*item), GFP_KERNEL);
+ if (!item) {
+ kfree(head);
+ goto free_now;
+ }
+
+ item->filter = filter;
+ list_add_tail(&item->list, &head->list);
list_for_each_entry(file, &tr->events, list) {
if (file->system != dir)
continue;
- __free_subsystem_filter(file);
+ item = kmalloc(sizeof(*item), GFP_KERNEL);
+ if (!item)
+ goto free_now;
+ item->filter = file->filter;
+ list_add_tail(&item->list, &head->list);
+ file->filter = NULL;
+ }
+
+ delay_free_filter(head);
+ return;
+ free_now:
+ tracepoint_synchronize_unregister();
+
+ if (head)
+ free_filter_list(&head->rcu);
+
+ list_for_each_entry(file, &tr->events, list) {
+ if (file->system != dir || !file->filter)
+ continue;
+ __free_filter(file->filter);
}
+ __free_filter(filter);
}
int filter_assign_type(const char *type)
@@ -2131,11 +2237,6 @@ static inline void event_clear_filter(struct trace_event_file *file)
RCU_INIT_POINTER(file->filter, NULL);
}
-struct filter_list {
- struct list_head list;
- struct event_filter *filter;
-};
-
static int process_system_preds(struct trace_subsystem_dir *dir,
struct trace_array *tr,
struct filter_parse_error *pe,
@@ -2144,11 +2245,16 @@ static int process_system_preds(struct trace_subsystem_dir *dir,
struct trace_event_file *file;
struct filter_list *filter_item;
struct event_filter *filter = NULL;
- struct filter_list *tmp;
- LIST_HEAD(filter_list);
+ struct filter_head *filter_list;
bool fail = true;
int err;
+ filter_list = kmalloc(sizeof(*filter_list), GFP_KERNEL);
+ if (!filter_list)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&filter_list->list);
+
list_for_each_entry(file, &tr->events, list) {
if (file->system != dir)
@@ -2175,7 +2281,7 @@ static int process_system_preds(struct trace_subsystem_dir *dir,
if (!filter_item)
goto fail_mem;
- list_add_tail(&filter_item->list, &filter_list);
+ list_add_tail(&filter_item->list, &filter_list->list);
/*
* Regardless of if this returned an error, we still
* replace the filter for the call.
@@ -2195,31 +2301,22 @@ static int process_system_preds(struct trace_subsystem_dir *dir,
* Do a synchronize_rcu() and to ensure all calls are
* done with them before we free them.
*/
- tracepoint_synchronize_unregister();
- list_for_each_entry_safe(filter_item, tmp, &filter_list, list) {
- __free_filter(filter_item->filter);
- list_del(&filter_item->list);
- kfree(filter_item);
- }
+ delay_free_filter(filter_list);
return 0;
fail:
/* No call succeeded */
- list_for_each_entry_safe(filter_item, tmp, &filter_list, list) {
- list_del(&filter_item->list);
- kfree(filter_item);
- }
+ free_filter_list(&filter_list->rcu);
parse_error(pe, FILT_ERR_BAD_SUBSYS_FILTER, 0);
return -EINVAL;
fail_mem:
__free_filter(filter);
+
/* If any call succeeded, we still need to sync */
if (!fail)
- tracepoint_synchronize_unregister();
- list_for_each_entry_safe(filter_item, tmp, &filter_list, list) {
- __free_filter(filter_item->filter);
- list_del(&filter_item->list);
- kfree(filter_item);
- }
+ delay_free_filter(filter_list);
+ else
+ free_filter_list(&filter_list->rcu);
+
return -ENOMEM;
}
@@ -2361,9 +2458,7 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string)
event_clear_filter(file);
- /* Make sure the filter is not being used */
- tracepoint_synchronize_unregister();
- __free_filter(filter);
+ try_delay_free_filter(filter);
return 0;
}
@@ -2387,11 +2482,8 @@ int apply_event_filter(struct trace_event_file *file, char *filter_string)
event_set_filter(file, filter);
- if (tmp) {
- /* Make sure the call is done with the filter */
- tracepoint_synchronize_unregister();
- __free_filter(tmp);
- }
+ if (tmp)
+ try_delay_free_filter(tmp);
}
return err;
@@ -2417,9 +2509,7 @@ int apply_subsystem_event_filter(struct trace_subsystem_dir *dir,
filter = system->filter;
system->filter = NULL;
/* Ensure all filters are no longer used */
- tracepoint_synchronize_unregister();
- filter_free_subsystem_filters(dir, tr);
- __free_filter(filter);
+ filter_free_subsystem_filters(dir, tr, filter);
return 0;
}
--
2.47.2
The first patch is a fix with an explanation of the issue, you should
read that first.
The second patch adds a comment to document the rules because figuring
this out from scratch causes brain pain.
Accidentally hitting this issue and getting negative consequences from
it would require several stars to line up just right; but if someone out
there is using a malloc() implementation that uses lockless data
structures across threads or such, this could actually be a problem.
In case someone wants a testcase, here's a very artificial one:
```
#include <pthread.h>
#include <err.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <linux/io_uring.h>
#define SYSCHK(x) ({ \
typeof(x) __res = (x); \
if (__res == (typeof(x))-1) \
err(1, "SYSCHK(" #x ")"); \
__res; \
})
#define NUM_SQ_PAGES 4
static int uring_init(struct io_uring_sqe **sqesp, void **cqesp) {
struct io_uring_sqe *sqes = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0));
void *cqes = SYSCHK(mmap(NULL, NUM_SQ_PAGES*0x1000, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANONYMOUS, -1, 0));
*(volatile unsigned int *)(cqes+4) = 64 * NUM_SQ_PAGES;
struct io_uring_params params = {
.flags = IORING_SETUP_NO_MMAP|IORING_SETUP_NO_SQARRAY,
.sq_off = { .user_addr = (unsigned long)sqes },
.cq_off = { .user_addr = (unsigned long)cqes }
};
int uring_fd = SYSCHK(syscall(__NR_io_uring_setup, /*entries=*/10, ¶ms));
if (sqesp)
*sqesp = sqes;
if (cqesp)
*cqesp = cqes;
return uring_fd;
}
static char *bufmem[0x3000] __attribute__((aligned(0x1000)));
static void *thread_fn(void *dummy) {
unsigned long i = 0;
while (1) {
*(volatile unsigned long *)(bufmem + 0x0000) = i;
*(volatile unsigned long *)(bufmem + 0x0f00) = i;
*(volatile unsigned long *)(bufmem + 0x1000) = i;
*(volatile unsigned long *)(bufmem + 0x1f00) = i;
*(volatile unsigned long *)(bufmem + 0x2000) = i;
*(volatile unsigned long *)(bufmem + 0x2f00) = i;
i++;
}
}
int main(void) {
#if 1
int uring_fd = uring_init(NULL, NULL);
struct iovec reg_iov = { .iov_base = bufmem, .iov_len = 0x2000 };
SYSCHK(syscall(__NR_io_uring_register, uring_fd, IORING_REGISTER_BUFFERS, ®_iov, 1));
#endif
pthread_t thread;
if (pthread_create(&thread, NULL, thread_fn, NULL))
errx(1, "pthread_create");
sleep(1);
int child = SYSCHK(fork());
if (child == 0) {
printf("bufmem values:\n");
printf(" 0x0000: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x0000));
printf(" 0x0f00: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x0f00));
printf(" 0x1000: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x1000));
printf(" 0x1f00: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x1f00));
printf(" 0x2000: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x2000));
printf(" 0x2f00: 0x%lx\n", *(volatile unsigned long *)(bufmem + 0x2f00));
return 0;
}
int wstatus;
SYSCHK(wait(&wstatus));
return 0;
}
```
Without this series, the child will usually print results that are
apart by more than 1, which is not a state that ever occurred in
the parent; in my opinion, that counts as a bug.
If you change the "#if 1" to "#if 0", the bug won't manifest.
Signed-off-by: Jann Horn <jannh(a)google.com>
---
Jann Horn (2):
mm/memory: ensure fork child sees coherent memory snapshot
mm/memory: Document how we make a coherent memory snapshot
kernel/fork.c | 34 ++++++++++++++++++++++++++++++++++
mm/memory.c | 18 ++++++++++++++++++
2 files changed, 52 insertions(+)
---
base-commit: 8477ab143069c6b05d6da4a8184ded8b969240f5
change-id: 20250530-fork-tearing-71da211a50cf
--
Jann Horn <jannh(a)google.com>