During fuzz testing, the following issue was discovered:
BUG: KMSAN: kernel-infoleak in _copy_to_iter+0x598/0x2a30
_copy_to_iter+0x598/0x2a30
__skb_datagram_iter+0x168/0x1060
skb_copy_datagram_iter+0x5b/0x220
netlink_recvmsg+0x362/0x1700
sock_recvmsg+0x2dc/0x390
__sys_recvfrom+0x381/0x6d0
__x64_sys_recvfrom+0x130/0x200
x64_sys_call+0x32c8/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was stored to memory at:
copy_to_user_state_extra+0xcc1/0x1e00
dump_one_state+0x28c/0x5f0
xfrm_state_walk+0x548/0x11e0
xfrm_dump_sa+0x1e0/0x840
netlink_dump+0x943/0x1c40
__netlink_dump_start+0x746/0xdb0
xfrm_user_rcv_msg+0x429/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Uninit was created at:
__kmalloc+0x571/0xd30
attach_auth+0x106/0x3e0
xfrm_add_sa+0x2aa0/0x4230
xfrm_user_rcv_msg+0x832/0xc00
netlink_rcv_skb+0x613/0x780
xfrm_netlink_rcv+0x77/0xc0
netlink_unicast+0xe90/0x1280
netlink_sendmsg+0x126d/0x1490
__sock_sendmsg+0x332/0x3d0
____sys_sendmsg+0x863/0xc30
___sys_sendmsg+0x285/0x3e0
__x64_sys_sendmsg+0x2d6/0x560
x64_sys_call+0x1316/0x3cc0
do_syscall_64+0xd8/0x1c0
entry_SYSCALL_64_after_hwframe+0x79/0x81
Bytes 328-379 of 732 are uninitialized
Memory access of size 732 starts at ffff88800e18e000
Data copied to user address 00007ff30f48aff0
CPU: 2 PID: 18167 Comm: syz-executor.0 Not tainted 6.8.11 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
Fixes copying of xfrm algorithms where some random
data of the structure fields can end up in userspace.
Padding in structures may be filled with random (possibly sensitve)
data and should never be given directly to user-space.
A similar issue was resolved in the commit
8222d5910dae ("xfrm: Zero padding when dumping algos and encap")
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: c7a5899eb26e ("xfrm: redact SA secret with lockdown confidentiality")
Cc: stable(a)vger.kernel.org
Co-developed-by: Boris Tonofa <b.tonofa(a)ideco.ru>
Signed-off-by: Boris Tonofa <b.tonofa(a)ideco.ru>
Signed-off-by: Petr Vaganov <p.vaganov(a)ideco.ru>
---
v3: Corrected commit description "This patch fixes copying..." to
"Fixes copying..." according to accepted rules of Linux kernel commits,
as suggested by Markus Elfring <elfring(a)users.sourceforge.net>.
v2: Fixed typo in sizeof(sizeof(ap->alg_name) expression.
The third argument for the strscpy_pad macro was chosen by
analogy with those in other functions of this file - as did
commit 8222d5910dae ("xfrm: Zero padding when dumping algos and encap").
I still think it would be better to leave the strscpy_pad() macro with
three arguments in this patch, mainly so that it would be consistent
with the existing similar code in this module.
Regarding strncpy() above, we don't think it is
a real issue since the uncopied parts should be already padded with zeroes
by nla_reserve().
net/xfrm/xfrm_user.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 55f039ec3d59..0083faabe8be 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1098,7 +1098,9 @@ static int copy_to_user_auth(struct xfrm_algo_auth *auth, struct sk_buff *skb)
if (!nla)
return -EMSGSIZE;
ap = nla_data(nla);
- memcpy(ap, auth, sizeof(struct xfrm_algo_auth));
+ strscpy_pad(ap->alg_name, auth->alg_name, sizeof(ap->alg_name));
+ ap->alg_key_len = auth->alg_key_len;
+ ap->alg_trunc_len = auth->alg_trunc_len;
if (redact_secret && auth->alg_key_len)
memset(ap->alg_key, 0, (auth->alg_key_len + 7) / 8);
else
--
2.46.2
New processors have become pickier about the local APIC timer state
before entering low power modes. These low power modes are used (for
example) when you close your laptop lid and suspend. If you put your
laptop in a bag in this unnecessarily-high-power state, it is likely
to get quite toasty while it quickly sucks the battery dry.
The problem boils down to some CPUs' inability to power down until the
kernel fully disables the local APIC timer. The current kernel code
works in one-shot and periodic modes but does not work for deadline
mode. Deadline mode has been the supported and preferred mode on
Intel CPUs for over a decade and uses an MSR to drive the timer
instead of an APIC register.
Disable the TSC Deadline timer in lapic_timer_shutdown() by writing to
MSR_IA32_TSC_DEADLINE when in TSC-deadline mode. Also avoid writing
to the initial-count register (APIC_TMICT) which is ignored in
TSC-deadline mode.
Note: The APIC_LVTT|=APIC_LVT_MASKED operation should theoretically be
enough to tell the hardware that the timer will not fire in any of the
timer modes. But mitigating AMD erratum 411[1] also requires clearing
out APIC_TMICT. Solely setting APIC_LVT_MASKED is also ineffective in
practice on Intel Lunar Lake systems, which is the motivation for this
change.
1. 411 Processor May Exit Message-Triggered C1E State Without an Interrupt if Local APIC Timer Reaches Zero - https://www.amd.com/content/dam/amd/en/documents/archived-tech-docs/revisio…
Cc: stable(a)vger.kernel.org
Fixes: 279f1461432c ("x86: apic: Use tsc deadline for oneshot when available")
Suggested-by: Dave Hansen <dave.hansen(a)intel.com>
Signed-off-by: Zhang Rui <rui.zhang(a)intel.com>
Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
Tested-by: Todd Brandt <todd.e.brandt(a)intel.com>
---
V2
- Improve changelog
V3
- Subject and changelog rewrite
- Check LAPIC Timer mode using APIC_LVTT value instead of extra CPU feature flag check
- Avoid APIC_TMICT write which is ignored in TSC-deadline mode
V4
- Add back Fixes tag and stable tag which was missing in V3
- Update patch recipients
---
arch/x86/kernel/apic/apic.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 6513c53c9459..5436a4083065 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -440,7 +440,19 @@ static int lapic_timer_shutdown(struct clock_event_device *evt)
v = apic_read(APIC_LVTT);
v |= (APIC_LVT_MASKED | LOCAL_TIMER_VECTOR);
apic_write(APIC_LVTT, v);
- apic_write(APIC_TMICT, 0);
+
+ /*
+ * Setting APIC_LVT_MASKED should be enough to tell the
+ * hardware that this timer will never fire. But AMD
+ * erratum 411 and some Intel CPU behavior circa 2024
+ * say otherwise. Time for belt and suspenders programming,
+ * mask the timer and zero the counter registers:
+ */
+ if (v & APIC_LVT_TIMER_TSCDEADLINE)
+ wrmsrl(MSR_IA32_TSC_DEADLINE, 0);
+ else
+ apic_write(APIC_TMICT, 0);
+
return 0;
}
--
2.34.1
The patch titled
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Liu Shixin <liushixin2(a)huawei.com>
Subject: mm/swapfile: skip HugeTLB pages for unuse_vma
Date: Tue, 15 Oct 2024 09:45:21 +0800
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff. The
problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad in
unuse_pud_range() by ftrace. And therefore the HugeTLB pages will never
be freed because we lost it from page table. We can skip HugeTLB pages
for unuse_vma to fix it.
Link: https://lkml.kernel.org/r/20241015014521.570237-1-liushixin2@huawei.com
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
Acked-by: Muchun Song <muchun.song(a)linux.dev>
Cc: Naoya Horiguchi <nao.horiguchi(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/swapfile.c~mm-swapfile-skip-hugetlb-pages-for-unuse_vma
+++ a/mm/swapfile.c
@@ -2313,7 +2313,7 @@ static int unuse_mm(struct mm_struct *mm
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
_
Patches currently in -mm which might be from liushixin2(a)huawei.com are
mm-swapfile-skip-hugetlb-pages-for-unuse_vma.patch
I got a bad pud error and lost a 1GB HugeTLB when calling swapoff.
The problem can be reproduced by the following steps:
1. Allocate an anonymous 1GB HugeTLB and some other anonymous memory.
2. Swapout the above anonymous memory.
3. run swapoff and we will get a bad pud error in kernel message:
mm/pgtable-generic.c:42: bad pud 00000000743d215d(84000001400000e7)
We can tell that pud_clear_bad is called by pud_none_or_clear_bad
in unuse_pud_range() by ftrace. And therefore the HugeTLB pages will
never be freed because we lost it from page table. We can skip
HugeTLB pages for unuse_vma to fix it.
Fixes: 0fe6e20b9c4c ("hugetlb, rmap: add reverse mapping for hugepage")
Signed-off-by: Liu Shixin <liushixin2(a)huawei.com>
---
mm/swapfile.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 0cded32414a1..f4ef91513fc9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -2312,7 +2312,7 @@ static int unuse_mm(struct mm_struct *mm, unsigned int type)
mmap_read_lock(mm);
for_each_vma(vmi, vma) {
- if (vma->anon_vma) {
+ if (vma->anon_vma && !is_vm_hugetlb_page(vma)) {
ret = unuse_vma(vma, type);
if (ret)
break;
--
2.34.1
Previously, the domain_context_clear() function incorrectly called
pci_for_each_dma_alias() to set up context entries for non-PCI devices.
This could lead to kernel hangs or other unexpected behavior.
Add a check to only call pci_for_each_dma_alias() for PCI devices. For
non-PCI devices, domain_context_clear_one() is called directly.
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219363
Fixes: 9a16ab9d6402 ("iommu/vt-d: Make context clearing consistent with context mapping")
Cc: stable(a)vger.kernel.org
Signed-off-by: Lu Baolu <baolu.lu(a)linux.intel.com>
---
drivers/iommu/intel/iommu.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 9f6b0780f2ef..e860bc9439a2 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -3340,8 +3340,10 @@ static int domain_context_clear_one_cb(struct pci_dev *pdev, u16 alias, void *op
*/
static void domain_context_clear(struct device_domain_info *info)
{
- if (!dev_is_pci(info->dev))
+ if (!dev_is_pci(info->dev)) {
domain_context_clear_one(info, info->bus, info->devfn);
+ return;
+ }
pci_for_each_dma_alias(to_pci_dev(info->dev),
&domain_context_clear_one_cb, info);
--
2.43.0
The added test case from commit
09bcf9254838 ("selftests/ftrace: Add new test case which checks non unique symbol")
failed, it is fixed by
b022f0c7e404 ("tracing/kprobes: Return EADDRNOTAVAIL when func matches several symbols").
Backport it and its fix commit to 5.4.y together. Resolved minor context change conflicts.
Resend the patch series after backpporting to 5.10.y first.
Andrii Nakryiko (1):
tracing/kprobes: Fix symbol counting logic by looking at modules as
well
Francis Laniel (1):
tracing/kprobes: Return EADDRNOTAVAIL when func matches several
symbols
kernel/trace/trace_kprobe.c | 76 +++++++++++++++++++++++++++++++++++++
kernel/trace/trace_probe.h | 1 +
2 files changed, 77 insertions(+)
--
2.46.0