The patch titled
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
has been added to the -mm tree. Its filename is
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hugetlb-fix-a-addressing-except…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hugetlb-fix-a-addressing-except…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Longpeng <longpeng2(a)huawei.com>
Subject: mm/hugetlb: fix a addressing exception caused by huge_pte_offset
Our machine encountered a panic(addressing exception) after run
for a long time and the calltrace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the following
code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
1. CPU0: sz = PUD_SIZE and *pud = 0 , continue
1. CPU0: "pud_huge(*pud)" is false
2. CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
3. CPU0: "!pud_present(*pud)" is false, continue
4. CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
Also, according to the section 'COMPILER BARRIER' of memory-barriers.txt:
'''
(*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its
rights to reorder loads to the same variable. This means that
the following code:
a[0] = x;
a[1] = x;
Might result in an older value of x stored in a[1] than in a[0].
'''
there're several other data races in huge_pte_offset, for example:
'''
p4d = p4d_offset(pgd, addr)
if (!p4d_present(*p4d))
return NULL;
pud = pud_offset(p4d, addr) <-- will be unwinded as:
pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
'''
which is free for the compiler/CPU to execute as:
'''
p4d = p4d_offset(pgd, addr)
p4d_for_vaddr = *p4d;
if (!p4d_present(*p4d))
return NULL;
pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(address);
'''
so in the case where *p4d goes from '!present' to 'present':
p4d_present(*p4d) == true and p4d_for_vaddr == none, meaning the
p4d_page_vaddr() will crash.
For these reasons, we must make sure there is exactly one dereference of
p4d, pud and pmd.
Link: http://lkml.kernel.org/r/20200327235748.2048-1-longpeng2@huawei.com
Signed-off-by: Longpeng <longpeng2(a)huawei.com>
Suggested-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sean Christopherson <sean.j.christopherson(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 24 ++++++++++++++----------
1 file changed, 14 insertions(+), 10 deletions(-)
--- a/mm/hugetlb.c~mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset
+++ a/mm/hugetlb.c
@@ -4909,29 +4909,33 @@ pte_t *huge_pte_offset(struct mm_struct
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ p4d_t *p4d, p4d_entry;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
return NULL;
+
p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+ p4d_entry = READ_ONCE(*p4d);
+ if (!p4d_present(p4d_entry))
return NULL;
- pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud = pud_offset(&p4d_entry, addr);
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
- pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd = pmd_offset(&pud_entry, addr);
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
_
Patches currently in -mm which might be from longpeng2(a)huawei.com are
mm-hugetlb-fix-a-addressing-exception-caused-by-huge_pte_offset.patch
When we allocate space in the GGTT we may have to allocate a larger
region than will be populated by the object to accommodate fencing. Make
sure that this space beyond the end of the buffer points safely into
scratch space, in case the HW tries to access it anyway (e.g. fenced
access to the last tile row).
v2: Preemptively / conservatively guard gen6 ggtt as well.
Reported-by: Imre Deak <imre.deak(a)intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1554
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Matthew Auld <matthew.auld(a)intel.com>
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 37 ++++++++++++++++++++--------
1 file changed, 27 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index d8944dabed55..ae07bcd7c226 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -191,10 +191,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
enum i915_cache_level level,
u32 flags)
{
- struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
- struct sgt_iter sgt_iter;
- gen8_pte_t __iomem *gtt_entries;
const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0);
+ struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
+ gen8_pte_t __iomem *gte;
+ gen8_pte_t __iomem *end;
+ struct sgt_iter iter;
dma_addr_t addr;
/*
@@ -202,10 +203,17 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
* not to allow the user to override access to a read only page.
*/
- gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm;
- gtt_entries += vma->node.start / I915_GTT_PAGE_SIZE;
- for_each_sgt_daddr(addr, sgt_iter, vma->pages)
- gen8_set_pte(gtt_entries++, pte_encode | addr);
+ gte = (gen8_pte_t __iomem *)ggtt->gsm;
+ gte += vma->node.start / I915_GTT_PAGE_SIZE;
+ end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
+
+ for_each_sgt_daddr(addr, iter, vma->pages)
+ gen8_set_pte(gte++, pte_encode | addr);
+ GEM_BUG_ON(gte > end);
+
+ /* Fill the allocated but "unused" space beyond the end of the buffer */
+ while (gte < end)
+ gen8_set_pte(gte++, vm->scratch[0].encode);
/*
* We want to flush the TLBs only after we're certain all the PTE
@@ -241,13 +249,22 @@ static void gen6_ggtt_insert_entries(struct i915_address_space *vm,
u32 flags)
{
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
- gen6_pte_t __iomem *entries = (gen6_pte_t __iomem *)ggtt->gsm;
- unsigned int i = vma->node.start / I915_GTT_PAGE_SIZE;
+ gen6_pte_t __iomem *gte;
+ gen6_pte_t __iomem *end;
struct sgt_iter iter;
dma_addr_t addr;
+ gte = (gen6_pte_t __iomem *)ggtt->gsm;
+ gte += vma->node.start / I915_GTT_PAGE_SIZE;
+ end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
+
for_each_sgt_daddr(addr, iter, vma->pages)
- iowrite32(vm->pte_encode(addr, level, flags), &entries[i++]);
+ iowrite32(vm->pte_encode(addr, level, flags), gte++);
+ GEM_BUG_ON(gte > end);
+
+ /* Fill the allocated but "unused" space beyond the end of the buffer */
+ while (gte < end)
+ iowrite32(vm->scratch[0].encode, gte++);
/*
* We want to flush the TLBs only after we're certain all the PTE
--
2.20.1
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 44ef5a869396 - Linux 5.5.14-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 3:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ❌ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 4:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ Networking bridge: sanity
✅ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
✅ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
🚧 ❌ Storage blktests
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
🚧 ⚡⚡⚡ Storage blktests
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
🚧 ⚡⚡⚡ Storage blktests
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
✅ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 3:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: cdc41fd03565 - media: v4l2-core: fix a use-after-free bug of sd->devnode
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
✅ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
Host 3:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ Usex - version 1.9-29
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
⚡⚡⚡ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
s390x:
Host 1:
✅ Boot test
✅ selinux-policy: serge-testsuite
🚧 ✅ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
⚡⚡⚡ Usex - version 1.9-29
✅ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ Usex - version 1.9-29
✅ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ Usex - version 1.9-29
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ✅ Storage blktests
Host 4:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 5:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ pciutils: sanity smoke test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ✅ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
Declaring setjmp()/longjmp() as taking longs makes the signature
non-standard, and makes clang complain. In the past, this has been
worked around by adding -ffreestanding to the compile flags.
The implementation looks like it only ever propagates the value
(in longjmp) or sets it to 1 (in setjmp), and we only call longjmp
with integer parameters.
This allows removing -ffreestanding from the compilation flags.
Context:
https://lore.kernel.org/patchwork/patch/1214060https://lore.kernel.org/patchwork/patch/1216174
Signed-off-by: Clement Courbet <courbet(a)google.com>
Reviewed-by: Nathan Chancellor <natechancellor(a)gmail.com>
Tested-by: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: stable(a)vger.kernel.org # v4.14+
Fixes: c9029ef9c957 ("powerpc: Avoid clang warnings around setjmp and longjmp")
---
v2:
Use and array type as suggested by Segher Boessenkool
Add fix tags.
v3:
Properly place tags.
---
arch/powerpc/include/asm/setjmp.h | 6 ++++--
arch/powerpc/kexec/Makefile | 3 ---
arch/powerpc/xmon/Makefile | 3 ---
3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
index e9f81bb3f83b..f798e80e4106 100644
--- a/arch/powerpc/include/asm/setjmp.h
+++ b/arch/powerpc/include/asm/setjmp.h
@@ -7,7 +7,9 @@
#define JMP_BUF_LEN 23
-extern long setjmp(long *) __attribute__((returns_twice));
-extern void longjmp(long *, long) __attribute__((noreturn));
+typedef long jmp_buf[JMP_BUF_LEN];
+
+extern int setjmp(jmp_buf env) __attribute__((returns_twice));
+extern void longjmp(jmp_buf env, int val) __attribute__((noreturn));
#endif /* _ASM_POWERPC_SETJMP_H */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 378f6108a414..86380c69f5ce 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -3,9 +3,6 @@
# Makefile for the linux kernel.
#
-# Avoid clang warnings around longjmp/setjmp declarations
-CFLAGS_crash.o += -ffreestanding
-
obj-y += core.o crash.o core_$(BITS).o
obj-$(CONFIG_PPC32) += relocate_32.o
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index c3842dbeb1b7..6f9cccea54f3 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -1,9 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for xmon
-# Avoid clang warnings around longjmp/setjmp declarations
-subdir-ccflags-y := -ffreestanding
-
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
--
2.26.0.rc2.310.g2932bb562d-goog
When we allocate space in the GGTT we may have to allocate a larger
region than will be populated by the object to accommodate fencing. Make
sure that this space beyond the end of the buffer points safely into
scratch space, in case the HW tries to access it anyway (e.g. fenced
access to the last tile row).
Reported-by: Imre Deak <imre.deak(a)intel.com>
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1554
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Matthew Auld <matthew.auld(a)intel.com>
Cc: Imre Deak <imre.deak(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/gt/intel_ggtt.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt.c b/drivers/gpu/drm/i915/gt/intel_ggtt.c
index d8944dabed55..ad56059651b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_ggtt.c
+++ b/drivers/gpu/drm/i915/gt/intel_ggtt.c
@@ -191,10 +191,11 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
enum i915_cache_level level,
u32 flags)
{
+ const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0);
struct i915_ggtt *ggtt = i915_vm_to_ggtt(vm);
struct sgt_iter sgt_iter;
- gen8_pte_t __iomem *gtt_entries;
- const gen8_pte_t pte_encode = gen8_ggtt_pte_encode(0, level, 0);
+ gen8_pte_t __iomem *gte;
+ gen8_pte_t __iomem *end;
dma_addr_t addr;
/*
@@ -202,10 +203,16 @@ static void gen8_ggtt_insert_entries(struct i915_address_space *vm,
* not to allow the user to override access to a read only page.
*/
- gtt_entries = (gen8_pte_t __iomem *)ggtt->gsm;
- gtt_entries += vma->node.start / I915_GTT_PAGE_SIZE;
+ gte = (gen8_pte_t __iomem *)ggtt->gsm;
+ gte += vma->node.start / I915_GTT_PAGE_SIZE;
+ end = gte + vma->node.size / I915_GTT_PAGE_SIZE;
for_each_sgt_daddr(addr, sgt_iter, vma->pages)
- gen8_set_pte(gtt_entries++, pte_encode | addr);
+ gen8_set_pte(gte++, pte_encode | addr);
+ GEM_BUG_ON(gte > end);
+
+ /* Fill the allocated but "unused" space beyond the end of the buffer */
+ while (gte < end)
+ gen8_set_pte(gte++, vm->scratch[0].encode);
/*
* We want to flush the TLBs only after we're certain all the PTE
--
2.20.1
This codec (among others) has a hidden set of audio routes, apparently
designed to allow PC Beep output without a mixer widget on the output
path, which are controlled by an undocumented Realtek vendor register.
The default configuration of these routes means that certain inputs
aren't accessible, necessitating driver control of the register.
However, Realtek has provided no documentation of the register, instead
opting to fix issues by providing magic numbers, most of which have been
at least somewhat erroneous. These magic numbers then get copied by
others into model-specific fixups, leading to a fragmented and buggy set
of configurations.
To get out of this situation, I've reverse engineered the register by
flipping bits and observing how the codec's behavior changes. This
commit documents my findings. It does not change any code.
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Hebb <tommyhebb(a)gmail.com>
---
Documentation/sound/hd-audio/index.rst | 1 +
.../sound/hd-audio/realtek-pc-beep.rst | 129 ++++++++++++++++++
2 files changed, 130 insertions(+)
create mode 100644 Documentation/sound/hd-audio/realtek-pc-beep.rst
diff --git a/Documentation/sound/hd-audio/index.rst b/Documentation/sound/hd-audio/index.rst
index f8a72ffffe66..6e12de9fc34e 100644
--- a/Documentation/sound/hd-audio/index.rst
+++ b/Documentation/sound/hd-audio/index.rst
@@ -8,3 +8,4 @@ HD-Audio
models
controls
dp-mst
+ realtek-pc-beep
diff --git a/Documentation/sound/hd-audio/realtek-pc-beep.rst b/Documentation/sound/hd-audio/realtek-pc-beep.rst
new file mode 100644
index 000000000000..be47c6f76a6e
--- /dev/null
+++ b/Documentation/sound/hd-audio/realtek-pc-beep.rst
@@ -0,0 +1,129 @@
+===============================
+Realtek PC Beep Hidden Register
+===============================
+
+This file documents the "PC Beep Hidden Register", which is present in certain
+Realtek HDA codecs and controls a muxer and pair of passthrough mixers that can
+route audio between pins but aren't themselves exposed as HDA widgets. As far
+as I can tell, these hidden routes are designed to allow flexible PC Beep output
+for codecs that don't have mixer widgets in their output paths. Why it's easier
+to hide a mixer behind an undocumented vendor register than to just expose it
+as a widget, I have no idea.
+
+Register Description
+====================
+
+The register is accessed via processing coefficient 0x36 on NID 20h. Bits not
+identified below have no discernible effect on my machine, a Dell XPS 13 9350::
+
+ MSB LSB
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |h|S|L| | B |R| | Known bits
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ |0|0|1|1| 0x7 |0|0x0|1| 0x7 | Reset value
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+1Ah input select (B): 2 bits
+ When zero, expose the PC Beep line (from the internal beep generator, when
+ enabled with the Set Beep Generation verb on NID 01h, or else from the
+ external PCBEEP pin) on the 1Ah pin node. When nonzero, expose the headphone
+ jack (or possibly Line In on some machines) input instead. If PC Beep is
+ selected, the 1Ah boost control has no effect.
+
+Amplify 1Ah loopback, left (L): 1 bit
+ Amplify the left channel of 1Ah before mixing it into outputs as specified
+ by h and S bits. Does not affect the level of 1Ah exposed to other widgets.
+
+Amplify 1Ah loopback, right (R): 1 bit
+ Amplify the right channel of 1Ah before mixing it into outputs as specified
+ by h and S bits. Does not affect the level of 1Ah exposed to other widgets.
+
+Loopback 1Ah to 21h [active low] (h): 1 bit
+ When zero, mix 1Ah (possibly with amplification, depending on L and R bits)
+ into 21h (headphone jack on my machine). Mixed signal respects the mute
+ setting on 21h.
+
+Loopback 1Ah to 14h (S): 1 bit
+ When one, mix 1Ah (possibly with amplification, depending on L and R bits)
+ into 14h (internal speaker on my machine). Mixed signal **ignores** the mute
+ setting on 14h and is present whenever 14h is configured as an output.
+
+Path diagrams
+=============
+
+1Ah input selection (DIV is the PC Beep divider set on NID 01h)::
+
+ <Beep generator> <PCBEEP pin> <Headphone jack>
+ | | |
+ +--DIV--+--!DIV--+ {1Ah boost control}
+ | |
+ +--(b == 0)--+--(b != 0)--+
+ |
+ >1Ah (Beep/Headphone Mic/Line In)<
+
+Loopback of 1Ah to 21h/14h::
+
+ <1Ah (Beep/Headphone Mic/Line In)>
+ |
+ {amplify if L/R}
+ |
+ +-----!h-----+-----S-----+
+ | |
+ {21h mute control} |
+ | |
+ >21h (Headphone)< >14h (Internal Speaker)<
+
+Background
+==========
+
+All Realtek HDA codecs have a vendor-defined widget with node ID 20h which
+provides access to a bank of registers that control various codec functions.
+Registers are read and written via the standard HDA processing coefficient
+verbs (Set/Get Coefficient Index, Set/Get Processing Coefficient). The node is
+named "Realtek Vendor Registers" in public datasheets' verb listings and,
+apart from that, is entirely undocumented.
+
+This particular register, exposed at coefficient 0x36 and named in commits from
+Realtek, is of note: unlike most registers, which seem to control detailed
+amplifier parameters not in scope of the HDA specification, it controls audio
+routing which could just as easily have been defined using standard HDA mixer
+and selector widgets.
+
+Specifically, it selects between two sources for the input pin widget with Node
+ID (NID) 1Ah: the widget's signal can come either from an audio jack (on my
+laptop, a Dell XPS 13 9350, it's the headphone jack, but comments in Realtek
+commits indicate that it might be a Line In on some machines) or from the PC
+Beep line (which is itself multiplexed between the codec's internal beep
+generator and external PCBEEP pin, depending on if the beep generator is
+enabled via verbs on NID 01h). Additionally, it can mix (with optional
+amplification) that signal onto the 21h and/or 14h output pins.
+
+The register's reset value is 0x3717, corresponding to PC Beep on 1Ah that is
+then amplified and mixed into both the headphones and the speakers. Not only
+does this violate the HDA specification, which says that "[a vendor defined
+beep input pin] connection may be maintained *only* while the Link reset
+(**RST#**) is asserted", it means that we cannot ignore the register if we care
+about the input that 1Ah would otherwise expose or if the PCBEEP trace is
+poorly shielded and picks up chassis noise (both of which are the case on my
+machine).
+
+Unfortunately, there are lots of ways to get this register configuration wrong.
+Linux, it seems, has gone through most of them. For one, the register resets
+after S3 suspend: judging by existing code, this isn't the case for all vendor
+registers, and it's led to some fixes that improve behavior on cold boot but
+don't last after suspend. Other fixes have successfully switched the 1Ah input
+away from PC Beep but have failed to disable both loopback paths. On my
+machine, this means that the headphone input is amplified and looped back to
+the headphone output, which uses the exact same pins! As you might expect, this
+causes terrible headphone noise, the character of which is controlled by the
+1Ah boost control. (If you've seen instructions online to fix XPS 13 headphone
+noise by changing "Headphone Mic Boost" in ALSA, now you know why.)
+
+The information here has been obtained through black-box reverse engineering of
+the ALC256 codec's behavior and is not guaranteed to be correct. It likely
+also applies for the ALC255, ALC257, ALC235, and ALC236, since those codecs
+seem to be close relatives of the ALC256. (They all share one initialization
+function.) Additionally, other codecs like the ALC225 and ALC285 also have this
+register, judging by existing fixups in ``patch_realtek.c``, but specific
+data (e.g. node IDs, bit positions, pin mappings) for those codecs may differ
+from what I've described here.
--
2.25.2
This codec (among others) has a hidden set of audio routes, apparently
designed to allow PC Beep output without a mixer widget on the output
path, which are controlled by an undocumented Realtek vendor register.
The default configuration of these routes means that certain inputs
aren't accessible, necessitating driver control of the register.
However, Realtek has provided no documentation of the register, instead
opting to fix issues by providing magic numbers, most of which have been
at least somewhat erroneous. These magic numbers then get copied by
others into model-specific fixups, leading to a fragmented and buggy set
of configurations.
To get out of this situation, I've reverse engineered the register by
flipping bits and observing how the codec's behavior changes. This
commit documents my findings. It does not change any code.
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Hebb <tommyhebb(a)gmail.com>
---
Changes in v2: None
Documentation/sound/hd-audio/index.rst | 1 +
.../sound/hd-audio/realtek-pc-beep.rst | 129 ++++++++++++++++++
2 files changed, 130 insertions(+)
create mode 100644 Documentation/sound/hd-audio/realtek-pc-beep.rst
diff --git a/Documentation/sound/hd-audio/index.rst b/Documentation/sound/hd-audio/index.rst
index f8a72ffffe66..6e12de9fc34e 100644
--- a/Documentation/sound/hd-audio/index.rst
+++ b/Documentation/sound/hd-audio/index.rst
@@ -8,3 +8,4 @@ HD-Audio
models
controls
dp-mst
+ realtek-pc-beep
diff --git a/Documentation/sound/hd-audio/realtek-pc-beep.rst b/Documentation/sound/hd-audio/realtek-pc-beep.rst
new file mode 100644
index 000000000000..be47c6f76a6e
--- /dev/null
+++ b/Documentation/sound/hd-audio/realtek-pc-beep.rst
@@ -0,0 +1,129 @@
+===============================
+Realtek PC Beep Hidden Register
+===============================
+
+This file documents the "PC Beep Hidden Register", which is present in certain
+Realtek HDA codecs and controls a muxer and pair of passthrough mixers that can
+route audio between pins but aren't themselves exposed as HDA widgets. As far
+as I can tell, these hidden routes are designed to allow flexible PC Beep output
+for codecs that don't have mixer widgets in their output paths. Why it's easier
+to hide a mixer behind an undocumented vendor register than to just expose it
+as a widget, I have no idea.
+
+Register Description
+====================
+
+The register is accessed via processing coefficient 0x36 on NID 20h. Bits not
+identified below have no discernible effect on my machine, a Dell XPS 13 9350::
+
+ MSB LSB
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ | |h|S|L| | B |R| | Known bits
+ +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
+ |0|0|1|1| 0x7 |0|0x0|1| 0x7 | Reset value
+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+
+1Ah input select (B): 2 bits
+ When zero, expose the PC Beep line (from the internal beep generator, when
+ enabled with the Set Beep Generation verb on NID 01h, or else from the
+ external PCBEEP pin) on the 1Ah pin node. When nonzero, expose the headphone
+ jack (or possibly Line In on some machines) input instead. If PC Beep is
+ selected, the 1Ah boost control has no effect.
+
+Amplify 1Ah loopback, left (L): 1 bit
+ Amplify the left channel of 1Ah before mixing it into outputs as specified
+ by h and S bits. Does not affect the level of 1Ah exposed to other widgets.
+
+Amplify 1Ah loopback, right (R): 1 bit
+ Amplify the right channel of 1Ah before mixing it into outputs as specified
+ by h and S bits. Does not affect the level of 1Ah exposed to other widgets.
+
+Loopback 1Ah to 21h [active low] (h): 1 bit
+ When zero, mix 1Ah (possibly with amplification, depending on L and R bits)
+ into 21h (headphone jack on my machine). Mixed signal respects the mute
+ setting on 21h.
+
+Loopback 1Ah to 14h (S): 1 bit
+ When one, mix 1Ah (possibly with amplification, depending on L and R bits)
+ into 14h (internal speaker on my machine). Mixed signal **ignores** the mute
+ setting on 14h and is present whenever 14h is configured as an output.
+
+Path diagrams
+=============
+
+1Ah input selection (DIV is the PC Beep divider set on NID 01h)::
+
+ <Beep generator> <PCBEEP pin> <Headphone jack>
+ | | |
+ +--DIV--+--!DIV--+ {1Ah boost control}
+ | |
+ +--(b == 0)--+--(b != 0)--+
+ |
+ >1Ah (Beep/Headphone Mic/Line In)<
+
+Loopback of 1Ah to 21h/14h::
+
+ <1Ah (Beep/Headphone Mic/Line In)>
+ |
+ {amplify if L/R}
+ |
+ +-----!h-----+-----S-----+
+ | |
+ {21h mute control} |
+ | |
+ >21h (Headphone)< >14h (Internal Speaker)<
+
+Background
+==========
+
+All Realtek HDA codecs have a vendor-defined widget with node ID 20h which
+provides access to a bank of registers that control various codec functions.
+Registers are read and written via the standard HDA processing coefficient
+verbs (Set/Get Coefficient Index, Set/Get Processing Coefficient). The node is
+named "Realtek Vendor Registers" in public datasheets' verb listings and,
+apart from that, is entirely undocumented.
+
+This particular register, exposed at coefficient 0x36 and named in commits from
+Realtek, is of note: unlike most registers, which seem to control detailed
+amplifier parameters not in scope of the HDA specification, it controls audio
+routing which could just as easily have been defined using standard HDA mixer
+and selector widgets.
+
+Specifically, it selects between two sources for the input pin widget with Node
+ID (NID) 1Ah: the widget's signal can come either from an audio jack (on my
+laptop, a Dell XPS 13 9350, it's the headphone jack, but comments in Realtek
+commits indicate that it might be a Line In on some machines) or from the PC
+Beep line (which is itself multiplexed between the codec's internal beep
+generator and external PCBEEP pin, depending on if the beep generator is
+enabled via verbs on NID 01h). Additionally, it can mix (with optional
+amplification) that signal onto the 21h and/or 14h output pins.
+
+The register's reset value is 0x3717, corresponding to PC Beep on 1Ah that is
+then amplified and mixed into both the headphones and the speakers. Not only
+does this violate the HDA specification, which says that "[a vendor defined
+beep input pin] connection may be maintained *only* while the Link reset
+(**RST#**) is asserted", it means that we cannot ignore the register if we care
+about the input that 1Ah would otherwise expose or if the PCBEEP trace is
+poorly shielded and picks up chassis noise (both of which are the case on my
+machine).
+
+Unfortunately, there are lots of ways to get this register configuration wrong.
+Linux, it seems, has gone through most of them. For one, the register resets
+after S3 suspend: judging by existing code, this isn't the case for all vendor
+registers, and it's led to some fixes that improve behavior on cold boot but
+don't last after suspend. Other fixes have successfully switched the 1Ah input
+away from PC Beep but have failed to disable both loopback paths. On my
+machine, this means that the headphone input is amplified and looped back to
+the headphone output, which uses the exact same pins! As you might expect, this
+causes terrible headphone noise, the character of which is controlled by the
+1Ah boost control. (If you've seen instructions online to fix XPS 13 headphone
+noise by changing "Headphone Mic Boost" in ALSA, now you know why.)
+
+The information here has been obtained through black-box reverse engineering of
+the ALC256 codec's behavior and is not guaranteed to be correct. It likely
+also applies for the ALC255, ALC257, ALC235, and ALC236, since those codecs
+seem to be close relatives of the ALC256. (They all share one initialization
+function.) Additionally, other codecs like the ALC225 and ALC285 also have this
+register, judging by existing fixups in ``patch_realtek.c``, but specific
+data (e.g. node IDs, bit positions, pin mappings) for those codecs may differ
+from what I've described here.
--
2.25.2
For SCIF and HSCIF interfaces the SCxSR register holds the status of
data that is to be read next from SCxRDR register, But where as for
SCIFA and SCIFB interfaces SCxSR register holds status of data that is
previously read from SCxRDR register.
This patch makes sure the status register is read depending on the port
types so that errors are caught accordingly.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Kazuhiro Fujita <kazuhiro.fujita.jg(a)renesas.com>
Signed-off-by: Hao Bui <hao.bui.yg(a)renesas.com>
Signed-off-by: KAZUMI HARADA <kazumi.harada.rh(a)renesas.com>
Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj(a)bp.renesas.com>
---
drivers/tty/serial/sh-sci.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 0641a72..d646bc4 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -870,9 +870,16 @@ static void sci_receive_chars(struct uart_port *port)
tty_insert_flip_char(tport, c, TTY_NORMAL);
} else {
for (i = 0; i < count; i++) {
- char c = serial_port_in(port, SCxRDR);
-
- status = serial_port_in(port, SCxSR);
+ char c;
+
+ if (port->type == PORT_SCIF ||
+ port->type == PORT_HSCIF) {
+ status = serial_port_in(port, SCxSR);
+ c = serial_port_in(port, SCxRDR);
+ } else {
+ c = serial_port_in(port, SCxRDR);
+ status = serial_port_in(port, SCxSR);
+ }
if (uart_handle_sysrq_char(port, c)) {
count--; i--;
continue;
--
2.7.4
The DDI IO power well must not be enabled for a TypeC port in TBT mode,
ensure this during driver loading/system resume.
This gets rid of error messages like
[drm] *ERROR* power well DDI E TC2 IO state mismatch (refcount 1/enabled 0)
and avoids leaking the power ref when disabling the output.
Cc: <stable(a)vger.kernel.org> # v5.4+
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_ddi.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_ddi.c b/drivers/gpu/drm/i915/display/intel_ddi.c
index 916a802af788..654151d9a6db 100644
--- a/drivers/gpu/drm/i915/display/intel_ddi.c
+++ b/drivers/gpu/drm/i915/display/intel_ddi.c
@@ -1899,7 +1899,11 @@ static void intel_ddi_get_power_domains(struct intel_encoder *encoder,
return;
dig_port = enc_to_dig_port(encoder);
- intel_display_power_get(dev_priv, dig_port->ddi_io_power_domain);
+
+ if (!intel_phy_is_tc(dev_priv, phy) ||
+ dig_port->tc_mode != TC_PORT_TBT_ALT)
+ intel_display_power_get(dev_priv,
+ dig_port->ddi_io_power_domain);
/*
* AUX power is only needed for (e)DP mode, and for HDMI mode on TC
--
2.23.1
From: Longpeng <longpeng2(a)huawei.com>
Our machine encountered a panic(addressing exception) after run
for a long time and the calltrace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the following
code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
1. CPU0: sz = PUD_SIZE and *pud = 0 , continue
1. CPU0: "pud_huge(*pud)" is false
2. CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
3. CPU0: "!pud_present(*pud)" is false, continue
4. CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
Also, according to the section 'COMPILER BARRIER' of memory-barriers.txt:
'''
(*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its
rights to reorder loads to the same variable. This means that
the following code:
a[0] = x;
a[1] = x;
Might result in an older value of x stored in a[1] than in a[0].
'''
there're several other data races in huge_pte_offset, for example:
'''
p4d = p4d_offset(pgd, addr)
if (!p4d_present(*p4d))
return NULL;
pud = pud_offset(p4d, addr) <-- will be unwinded as:
pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
'''
which is free for the compiler/CPU to execute as:
'''
p4d = p4d_offset(pgd, addr)
p4d_for_vaddr = *p4d;
if (!p4d_present(*p4d))
return NULL;
pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(address);
'''
so in the case where *p4d goes from '!present' to 'present':
p4d_present(*p4d) == true and p4d_for_vaddr == none, meaning the
p4d_page_vaddr() will crash.
For these reasons, we must make sure there is exactly one dereference of
p4d, pud and pmd.
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sean Christopherson <sean.j.christopherson(a)intel.com>
Cc: stable(a)vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Longpeng <longpeng2(a)huawei.com>
---
v4 -> v3:
fix a typo s/p4g/p4d. [Jason]
v2 -> v3:
make sure p4d/pud/pmd be dereferenced once. [Jason]
---
mm/hugetlb.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd8737a..d4fab68 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4909,29 +4909,33 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ p4d_t *p4d, p4d_entry;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
return NULL;
- p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+
+ p4d = p4d_offset(pgd, addr);
+ p4d_entry = READ_ONCE(*p4d);
+ if (!p4d_present(p4d_entry))
return NULL;
- pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud = pud_offset(&p4d_entry, addr);
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
- pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd = pmd_offset(&pud_entry, addr);
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
--
1.8.3.1
From: Longpeng <longpeng2(a)huawei.com>
Our machine encountered a panic(addressing exception) after run
for a long time and the calltrace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the following
code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
1. CPU0: sz = PUD_SIZE and *pud = 0 , continue
1. CPU0: "pud_huge(*pud)" is false
2. CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
3. CPU0: "!pud_present(*pud)" is false, continue
4. CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
Also, according to the section 'COMPILER BARRIER' of memory-barriers.txt:
'''
(*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its
rights to reorder loads to the same variable. This means that
the following code:
a[0] = x;
a[1] = x;
Might result in an older value of x stored in a[1] than in a[0].
'''
there're several other data races in huge_pte_offset, for example:
'''
p4d = p4d_offset(pgd, addr)
if (!p4d_present(*p4d))
return NULL;
pud = pud_offset(p4d, addr) <-- will be unwinded as:
pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
'''
which is free for the compiler/CPU to execute as:
'''
p4d = p4d_offset(pgd, addr)
p4d_for_vaddr = *p4d;
if (!p4d_present(*p4d))
return NULL;
pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(address);
'''
so in the case where *p4d goes from '!present' to 'present':
p4d_present(*p4d) == true and p4d_for_vaddr == none, meaning the
p4d_page_vaddr() will crash.
For these reasons, we must make sure there is exactly one dereference of
p4d, pud and pmd.
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sean Christopherson <sean.j.christopherson(a)intel.com>
Cc: stable(a)vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Longpeng <longpeng2(a)huawei.com>
---
v3 -> v4:
fix a typo s/p4g/p4d. [Jason]
v2 -> v3:
make sure p4d/pud/pmd be dereferenced once. [Jason]
---
mm/hugetlb.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd8737a..d4fab68 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4909,29 +4909,33 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ p4d_t *p4d, p4d_entry;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
return NULL;
- p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+
+ p4d = p4d_offset(pgd, addr);
+ p4d_entry = READ_ONCE(*p4d);
+ if (!p4d_present(p4d_entry))
return NULL;
- pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud = pud_offset(&p4d_entry, addr);
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
- pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd = pmd_offset(&pud_entry, addr);
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
--
1.8.3.1
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.5.13, v5.4.28, v4.19.113, v4.14.174, v4.9.217, v4.4.217.
v5.5.13: Build OK!
v5.4.28: Build OK!
v4.19.113: Build OK!
v4.14.174: Build OK!
v4.9.217: Build OK!
v4.4.217: Failed to apply! Possible dependencies:
5d6c31910bc0 ("xattr: Add __vfs_{get,set,remove}xattr helpers")
6b2553918d8b ("replace ->follow_link() with new method that could stay in RCU mode")
aa80deab33a8 ("namei: page_getlink() and page_follow_link_light() are the same thing")
cd3417c8fc95 ("kill free_page_put_link()")
ce23e6401334 ("->getxattr(): pass dentry and inode as separate arguments")
fceef393a538 ("switch ->get_link() to delayed_call, kill ->put_link()")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
The patch titled
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
has been removed from the -mm tree. Its filename was
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
Fix the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The crash is due to NULL dereference at
test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL; in
pfn_section_valid()
With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem(). This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
release mem_section->usage if a section is fully deactivated, pfn_valid()
check after a subsection_deactivate cause a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed. For architectures like ppc64 where large pages are used for
vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
sections. Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memap enables
this.
[aneesh.kumar(a)linux.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com…: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check
+++ a/mm/sparse.c
@@ -781,6 +781,12 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
The patch titled
Subject: mm: fork: fix kernel_stack memcg stats for various stack implementations
has been removed from the -mm tree. Its filename was
mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: fork: fix kernel_stack memcg stats for various stack implementations
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node(). In the first and the
second cases page->mem_cgroup pointer is set, but in the third it's not:
memcg membership of a slab page should be determined using the
memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg . In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.
It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).
In order to fix it, let's introduce a mod_memcg_obj_state() helper, which
takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state(). It allows to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .
Note: This is a special version of the patch created for stable
backports. It contains code from the following two patches:
- mm: memcg/slab: introduce mem_cgroup_from_obj()
- mm: fork: fix kernel_stack memcg stats for various stack implementations
[guro(a)fb.com: introduce mem_cgroup_from_obj()]
Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Bharata B Rao <bharata(a)linux.ibm.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 12 +++++++++++
kernel/fork.c | 4 +--
mm/memcontrol.c | 38 +++++++++++++++++++++++++++++++++++
3 files changed, 52 insertions(+), 2 deletions(-)
--- a/include/linux/memcontrol.h~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/include/linux/memcontrol.h
@@ -695,6 +695,7 @@ static inline unsigned long lruvec_page_
void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
+void mod_memcg_obj_state(void *p, int idx, int val);
static inline void mod_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
@@ -1123,6 +1124,10 @@ static inline void __mod_lruvec_slab_sta
__mod_node_page_state(page_pgdat(page), idx, val);
}
+static inline void mod_memcg_obj_state(void *p, int idx, int val)
+{
+}
+
static inline
unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
@@ -1427,6 +1432,8 @@ static inline int memcg_cache_id(struct
return memcg ? memcg->kmemcg_id : -1;
}
+struct mem_cgroup *mem_cgroup_from_obj(void *p);
+
#else
static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
@@ -1468,6 +1475,11 @@ static inline void memcg_put_cache_ids(v
{
}
+static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+ return NULL;
+}
+
#endif /* CONFIG_MEMCG_KMEM */
#endif /* _LINUX_MEMCONTROL_H */
--- a/kernel/fork.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/kernel/fork.c
@@ -397,8 +397,8 @@ static void account_kernel_stack(struct
mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
THREAD_SIZE / 1024 * account);
- mod_memcg_page_state(first_page, MEMCG_KERNEL_STACK_KB,
- account * (THREAD_SIZE / 1024));
+ mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB,
+ account * (THREAD_SIZE / 1024));
}
}
--- a/mm/memcontrol.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/mm/memcontrol.c
@@ -777,6 +777,17 @@ void __mod_lruvec_slab_state(void *p, en
rcu_read_unlock();
}
+void mod_memcg_obj_state(void *p, int idx, int val)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+ memcg = mem_cgroup_from_obj(p);
+ if (memcg)
+ mod_memcg_state(memcg, idx, val);
+ rcu_read_unlock();
+}
+
/**
* __count_memcg_events - account VM events in a cgroup
* @memcg: the memory cgroup
@@ -2661,6 +2672,33 @@ static void commit_charge(struct page *p
}
#ifdef CONFIG_MEMCG_KMEM
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+ struct page *page;
+
+ if (mem_cgroup_disabled())
+ return NULL;
+
+ page = virt_to_head_page(p);
+
+ /*
+ * Slab pages don't have page->mem_cgroup set because corresponding
+ * kmem caches can be reparented during the lifetime. That's why
+ * memcg_from_slab_page() should be used instead.
+ */
+ if (PageSlab(page))
+ return memcg_from_slab_page(page);
+
+ /* All other pages use page->mem_cgroup */
+ return page->mem_cgroup;
+}
+
static int memcg_alloc_cache_id(void)
{
int id, size;
_
Patches currently in -mm which might be from guro(a)fb.com are
mm-memcg-slab-introduce-mem_cgroup_from_obj.patch
mm-kmem-cleanup-__memcg_kmem_charge_memcg-arguments.patch
mm-kmem-cleanup-memcg_kmem_uncharge_memcg-arguments.patch
mm-kmem-rename-memcg_kmem_uncharge-into-memcg_kmem_uncharge_page.patch
mm-kmem-switch-to-nr_pages-in-__memcg_kmem_charge_memcg.patch
mm-memcg-slab-cache-page-number-in-memcg_uncharge_slab.patch
mm-kmem-rename-__memcg_kmem_uncharge_memcg-to-__memcg_kmem_uncharge.patch
mm-memcg-make-memoryoomgroup-tolerable-to-task-migration.patch
mmpage_alloccma-conditionally-prefer-cma-pageblocks-for-movable-allocations.patch
mmpage_alloccma-conditionally-prefer-cma-pageblocks-for-movable-allocations-fix.patch
mm-hugetlb-optionally-allocate-gigantic-hugepages-using-cma.patch
mm-hugetlb-optionally-allocate-gigantic-hugepages-using-cma-fix.patch
mm-hugetlb-optionally-allocate-gigantic-hugepages-using-cma-fix-2.patch
mm-hugetlb-fix-hugetlb_cma_reserve-if-config_numa-isnt-set.patch
The patch titled
Subject: drivers/base/memory.c: indicate all memory blocks as removable
has been removed from the -mm tree. Its filename was
drivers-base-memoryc-indicate-all-memory-blocks-as-removable.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: drivers/base/memory.c: indicate all memory blocks as removable
We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).
1. It runs basically lockless. While this might be good for performance,
we see possible races with memory offlining that will require at least
some sort of locking to fix.
2. Nowadays, more false positives are possible. No arch-specific checks
are performed that validate if memory offlining will not be denied
right away (and such check will require locking). For example, arm64
won't allow to offline any memory block that was added during boot -
which will imply a very high error rate. Other archs have other
constraints.
3. The interface is inherently racy. E.g., if a memory block is
detected to be removable (and was not a false positive at that time),
there is still no guarantee that offlining will actually succeed. So
any caller already has to deal with false positives.
4. It is unclear which performance benefit this interface actually
provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add
sysfs removable attribute for hotplug memory remove") mentioned
"A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the
potentially expensive operation."
However, no actual performance comparison was included.
Known users:
- lsmem: Will group memory blocks based on the "removable" property. [1]
- chmem: Indirect user. It has a RANGE mode where one can specify
removable ranges identified via lsmem to be offlined. However, it
also has a "SIZE" mode, which allows a sysadmin to skip the manual
"identify removable blocks" step. [2]
- powerpc-utils: Uses the "removable" attribute to skip some memory
blocks right away when trying to find some to
offline+remove. However, with ballooning enabled, it
already skips this information completely (because it
once resulted in many false negatives). Therefore, the
implementation can deal with false positives properly
already. [3]
According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils). Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the
affected legacy userspace handling is only active on old kernels. Only ve=
ry
old versions of drmgr on a new kernel (unlikely) might execute slower -
totally acceptable.
With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool. We implement a very bad heuristic now. Withou=
t
CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.
Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").
Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.
[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLX…
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Michal Hocko <mhocko(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Nathan Fontenot <ndfont(a)gmail.com>
Reported-by: "Scargall, Steve" <steve.scargall(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Badari Pulavarty <pbadari(a)us.ibm.com>
Cc: Robert Jennings <rcj(a)linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Karel Zak <kzak(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/base/memory.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memoryc-indicate-all-memory-blocks-as-removable
+++ a/drivers/base/memory.c
@@ -97,30 +97,13 @@ static ssize_t phys_index_show(struct de
}
/*
- * Show whether the memory block is likely to be offlineable (or is already
- * offline). Once offline, the memory block could be removed. The return
- * value does, however, not indicate that there is a way to remove the
- * memory block.
+ * Legacy interface that we cannot remove. Always indicate "removable"
+ * with CONFIG_MEMORY_HOTREMOVE - bad heuristic.
*/
static ssize_t removable_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
- struct memory_block *mem = to_memory_block(dev);
- unsigned long pfn;
- int ret = 1, i;
-
- if (mem->state != MEM_ONLINE)
- goto out;
-
- for (i = 0; i < sections_per_block; i++) {
- if (!present_section_nr(mem->start_section_nr + i))
- continue;
- pfn = section_nr_to_pfn(mem->start_section_nr + i);
- ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
- }
-
-out:
- return sprintf(buf, "%d\n", ret);
+ return sprintf(buf, "%d\n", (int)IS_ENABLED(CONFIG_MEMORY_HOTREMOVE));
}
/*
_
Patches currently in -mm which might be from david(a)redhat.com are
drivers-base-memoryc-cache-memory-blocks-in-xarray-to-accelerate-lookup-fix.patch
virtio-balloon-switch-back-to-oom-handler-for-virtio_balloon_f_deflate_on_oom.patch
drivers-base-memoryc-drop-section_count.patch
drivers-base-memoryc-drop-pages_correctly_probed.patch
mm-page_extc-drop-pfn_present-check-when-onlining.patch
mm-memory_hotplug-simplify-calculation-of-number-of-pages-in-__remove_pages.patch
mm-memory_hotplug-cleanup-__add_pages.patch
drivers-base-memory-rename-mmop_online_keep-to-mmop_online.patch
drivers-base-memory-map-mmop_offline-to-0.patch
drivers-base-memory-store-mapping-between-mmop_-and-string-in-an-array.patch
powernv-memtrace-always-online-added-memory-blocks.patch
hv_balloon-dont-check-for-memhp_auto_online-manually.patch
mm-memory_hotplug-unexport-memhp_auto_online.patch
mm-memory_hotplug-convert-memhp_auto_online-to-store-an-online_type.patch
mm-memory_hotplug-allow-to-specify-a-default-online_type.patch
The patch titled
Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
has been removed from the -mm tree. Its filename was
mm-swap-move-inode_lock-out-of-claim_swapfile.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Naohiro Aota <naohiro.aota(a)wdc.com>
Subject: mm/swapfile.c: move inode_lock out of claim_swapfile
claim_swapfile() currently keeps the inode locked when it is successful,
or the file is already swapfile (with -EBUSY). And, on the other error
cases, it does not lock the inode.
This inconsistency of the lock state and return value is quite confusing
and actually causing a bad unlock balance as below in the "bad_swap"
section of __do_sys_swapon().
This commit fixes this issue by moving the inode_lock() and IS_SWAPFILE
check out of claim_swapfile(). The inode is unlocked in
"bad_swap_unlock_inode" section, so that the inode is ensured to be
unlocked at "bad_swap". Thus, error handling codes after the locking now
jumps to "bad_swap_unlock_inode" instead of "bad_swap".
=====================================
WARNING: bad unlock balance detected!
5.5.0-rc7+ #176 Not tainted
-------------------------------------
swapon/4294 is trying to release lock (&sb->s_type->i_mutex_key) at:
[<ffffffff8173a6eb>] __do_sys_swapon+0x94b/0x3550
but there are no more locks to release!
other info that might help us debug this:
no locks held by swapon/4294.
stack backtrace:
CPU: 5 PID: 4294 Comm: swapon Not tainted 5.5.0-rc7-BTRFS-ZNS+ #176
Hardware name: ASUS All Series/H87-PRO, BIOS 2102 07/29/2014
Call Trace:
dump_stack+0xa1/0xea
? __do_sys_swapon+0x94b/0x3550
print_unlock_imbalance_bug.cold+0x114/0x123
? __do_sys_swapon+0x94b/0x3550
lock_release+0x562/0xed0
? kvfree+0x31/0x40
? lock_downgrade+0x770/0x770
? kvfree+0x31/0x40
? rcu_read_lock_sched_held+0xa1/0xd0
? rcu_read_lock_bh_held+0xb0/0xb0
up_write+0x2d/0x490
? kfree+0x293/0x2f0
__do_sys_swapon+0x94b/0x3550
? putname+0xb0/0xf0
? kmem_cache_free+0x2e7/0x370
? do_sys_open+0x184/0x3e0
? generic_max_swapfile_size+0x40/0x40
? do_syscall_64+0x27/0x4b0
? entry_SYSCALL_64_after_hwframe+0x49/0xbe
? lockdep_hardirqs_on+0x38c/0x590
__x64_sys_swapon+0x54/0x80
do_syscall_64+0xa4/0x4b0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f15da0a0dc7
Link: http://lkml.kernel.org/r/20200206090132.154869-1-naohiro.aota@wdc.com
Fixes: 1638045c3677 ("mm: set S_SWAPFILE on blockdev swap devices")
Signed-off-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Reviewed-by: Darrick J. Wong <darrick.wong(a)oracle.com>
Tested-by: Qais Youef <qais.yousef(a)arm.com>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/swapfile.c | 41 ++++++++++++++++++++---------------------
1 file changed, 20 insertions(+), 21 deletions(-)
--- a/mm/swapfile.c~mm-swap-move-inode_lock-out-of-claim_swapfile
+++ a/mm/swapfile.c
@@ -2899,10 +2899,6 @@ static int claim_swapfile(struct swap_in
p->bdev = inode->i_sb->s_bdev;
}
- inode_lock(inode);
- if (IS_SWAPFILE(inode))
- return -EBUSY;
-
return 0;
}
@@ -3157,36 +3153,41 @@ SYSCALL_DEFINE2(swapon, const char __use
mapping = swap_file->f_mapping;
inode = mapping->host;
- /* will take i_rwsem; */
error = claim_swapfile(p, inode);
if (unlikely(error))
goto bad_swap;
+ inode_lock(inode);
+ if (IS_SWAPFILE(inode)) {
+ error = -EBUSY;
+ goto bad_swap_unlock_inode;
+ }
+
/*
* Read the swap header.
*/
if (!mapping->a_ops->readpage) {
error = -EINVAL;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
page = read_mapping_page(mapping, 0, swap_file);
if (IS_ERR(page)) {
error = PTR_ERR(page);
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
swap_header = kmap(page);
maxpages = read_swap_header(p, swap_header, inode);
if (unlikely(!maxpages)) {
error = -EINVAL;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
/* OK, set up the swap map and apply the bad block list */
swap_map = vzalloc(maxpages);
if (!swap_map) {
error = -ENOMEM;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
@@ -3211,7 +3212,7 @@ SYSCALL_DEFINE2(swapon, const char __use
GFP_KERNEL);
if (!cluster_info) {
error = -ENOMEM;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
for (ci = 0; ci < nr_cluster; ci++)
@@ -3220,7 +3221,7 @@ SYSCALL_DEFINE2(swapon, const char __use
p->percpu_cluster = alloc_percpu(struct percpu_cluster);
if (!p->percpu_cluster) {
error = -ENOMEM;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
for_each_possible_cpu(cpu) {
struct percpu_cluster *cluster;
@@ -3234,13 +3235,13 @@ SYSCALL_DEFINE2(swapon, const char __use
error = swap_cgroup_swapon(p->type, maxpages);
if (error)
- goto bad_swap;
+ goto bad_swap_unlock_inode;
nr_extents = setup_swap_map_and_extents(p, swap_header, swap_map,
cluster_info, maxpages, &span);
if (unlikely(nr_extents < 0)) {
error = nr_extents;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
/* frontswap enabled? set up bit-per-page map for frontswap */
if (IS_ENABLED(CONFIG_FRONTSWAP))
@@ -3280,7 +3281,7 @@ SYSCALL_DEFINE2(swapon, const char __use
error = init_swap_address_space(p->type, maxpages);
if (error)
- goto bad_swap;
+ goto bad_swap_unlock_inode;
/*
* Flush any pending IO and dirty mappings before we start using this
@@ -3290,7 +3291,7 @@ SYSCALL_DEFINE2(swapon, const char __use
error = inode_drain_writes(inode);
if (error) {
inode->i_flags &= ~S_SWAPFILE;
- goto bad_swap;
+ goto bad_swap_unlock_inode;
}
mutex_lock(&swapon_mutex);
@@ -3315,6 +3316,8 @@ SYSCALL_DEFINE2(swapon, const char __use
error = 0;
goto out;
+bad_swap_unlock_inode:
+ inode_unlock(inode);
bad_swap:
free_percpu(p->percpu_cluster);
p->percpu_cluster = NULL;
@@ -3322,6 +3325,7 @@ bad_swap:
set_blocksize(p->bdev, p->old_block_size);
blkdev_put(p->bdev, FMODE_READ | FMODE_WRITE | FMODE_EXCL);
}
+ inode = NULL;
destroy_swap_extents(p);
swap_cgroup_swapoff(p->type);
spin_lock(&swap_lock);
@@ -3333,13 +3337,8 @@ bad_swap:
kvfree(frontswap_map);
if (inced_nr_rotate_swap)
atomic_dec(&nr_rotate_swap);
- if (swap_file) {
- if (inode) {
- inode_unlock(inode);
- inode = NULL;
- }
+ if (swap_file)
filp_close(swap_file, NULL);
- }
out:
if (page && !IS_ERR(page)) {
kunmap(page);
_
Patches currently in -mm which might be from naohiro.aota(a)wdc.com are
This removes the last users of cred_guard_mutex
and replaces it with a new mutex exec_guard_mutex,
and a boolean unsafe_execve_in_progress.
This addresses the case when at least one of the
sibling threads is traced, and therefore the trace
process may dead-lock in ptrace_attach, but de_thread
will need to wait for the tracer to continue execution.
The solution is to detect this situation and make
ptrace_attach and similar functions return -EAGAIN,
but only in a situation where a dead-lock is imminent.
This means this is an API change, but only when the
process is traced while execve happens in a
multi-threaded application.
See tools/testing/selftests/ptrace/vmaccess.c
for a test case that gets fixed by this change.
Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
fs/exec.c | 44 +++++++++++++++++++++++++++++++++++---------
fs/proc/base.c | 13 ++++++++-----
include/linux/sched/signal.h | 14 +++++++++-----
init/init_task.c | 2 +-
kernel/cred.c | 2 +-
kernel/fork.c | 2 +-
kernel/ptrace.c | 20 +++++++++++++++++---
kernel/seccomp.c | 15 +++++++++------
8 files changed, 81 insertions(+), 31 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 0e46ec5..2056562 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1078,14 +1078,26 @@ static int de_thread(struct task_struct *tsk)
struct signal_struct *sig = tsk->signal;
struct sighand_struct *oldsighand = tsk->sighand;
spinlock_t *lock = &oldsighand->siglock;
+ struct task_struct *t = tsk;
if (thread_group_empty(tsk))
goto no_thread_group;
+ spin_lock_irq(lock);
+ while_each_thread(tsk, t) {
+ if (unlikely(t->ptrace))
+ sig->unsafe_execve_in_progress = true;
+ }
+
+ if (unlikely(sig->unsafe_execve_in_progress)) {
+ spin_unlock_irq(lock);
+ mutex_unlock(&sig->exec_guard_mutex);
+ spin_lock_irq(lock);
+ }
+
/*
* Kill all other threads in the thread group.
*/
- spin_lock_irq(lock);
if (signal_group_exit(sig)) {
/*
* Another group action in progress, just
@@ -1429,22 +1441,30 @@ void finalize_exec(struct linux_binprm *bprm)
EXPORT_SYMBOL(finalize_exec);
/*
- * Prepare credentials and lock ->cred_guard_mutex.
+ * Prepare credentials and lock ->exec_guard_mutex.
* install_exec_creds() commits the new creds and drops the lock.
* Or, if exec fails before, free_bprm() should release ->cred and
* and unlock.
*/
static int prepare_bprm_creds(struct linux_binprm *bprm)
{
- if (mutex_lock_interruptible(¤t->signal->cred_guard_mutex))
+ int ret;
+
+ if (mutex_lock_interruptible(¤t->signal->exec_guard_mutex))
return -ERESTARTNOINTR;
+ ret = -EAGAIN;
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ goto out;
+
bprm->cred = prepare_exec_creds();
if (likely(bprm->cred))
return 0;
- mutex_unlock(¤t->signal->cred_guard_mutex);
- return -ENOMEM;
+ ret = -ENOMEM;
+out:
+ mutex_unlock(¤t->signal->exec_guard_mutex);
+ return ret;
}
static void free_bprm(struct linux_binprm *bprm)
@@ -1453,7 +1473,10 @@ static void free_bprm(struct linux_binprm *bprm)
if (bprm->cred) {
if (bprm->called_exec_mmap)
mutex_unlock(¤t->signal->exec_update_mutex);
- mutex_unlock(¤t->signal->cred_guard_mutex);
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ mutex_lock(¤t->signal->exec_guard_mutex);
+ current->signal->unsafe_execve_in_progress = false;
+ mutex_unlock(¤t->signal->exec_guard_mutex);
abort_creds(bprm->cred);
}
if (bprm->file) {
@@ -1497,19 +1520,22 @@ void install_exec_creds(struct linux_binprm *bprm)
if (get_dumpable(current->mm) != SUID_DUMP_USER)
perf_event_exit_task(current);
/*
- * cred_guard_mutex must be held at least to this point to prevent
+ * exec_guard_mutex must be held at least to this point to prevent
* ptrace_attach() from altering our determination of the task's
* credentials; any time after this it may be unlocked.
*/
security_bprm_committed_creds(bprm);
mutex_unlock(¤t->signal->exec_update_mutex);
- mutex_unlock(¤t->signal->cred_guard_mutex);
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ mutex_lock(¤t->signal->exec_guard_mutex);
+ current->signal->unsafe_execve_in_progress = false;
+ mutex_unlock(¤t->signal->exec_guard_mutex);
}
EXPORT_SYMBOL(install_exec_creds);
/*
* determine how safe it is to execute the proposed program
- * - the caller must hold ->cred_guard_mutex to protect against
+ * - the caller must hold ->exec_guard_mutex to protect against
* PTRACE_ATTACH or seccomp thread-sync
*/
static void check_unsafe_exec(struct linux_binprm *bprm)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 6b13fc4..a428536 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2680,14 +2680,17 @@ static ssize_t proc_pid_attr_write(struct file * file, const char __user * buf,
}
/* Guard against adverse ptrace interaction */
- rv = mutex_lock_interruptible(¤t->signal->cred_guard_mutex);
+ rv = mutex_lock_interruptible(¤t->signal->exec_guard_mutex);
if (rv < 0)
goto out_free;
- rv = security_setprocattr(PROC_I(inode)->op.lsm,
- file->f_path.dentry->d_name.name, page,
- count);
- mutex_unlock(¤t->signal->cred_guard_mutex);
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ rv = -EAGAIN;
+ else
+ rv = security_setprocattr(PROC_I(inode)->op.lsm,
+ file->f_path.dentry->d_name.name,
+ page, count);
+ mutex_unlock(¤t->signal->exec_guard_mutex);
out_free:
kfree(page);
out:
diff --git a/include/linux/sched/signal.h b/include/linux/sched/signal.h
index a29df79..e83cef2 100644
--- a/include/linux/sched/signal.h
+++ b/include/linux/sched/signal.h
@@ -212,6 +212,13 @@ struct signal_struct {
#endif
/*
+ * Set while execve is executing but is *not* holding
+ * exec_guard_mutex to avoid possible dead-locks.
+ * Only valid when exec_guard_mutex is held.
+ */
+ bool unsafe_execve_in_progress;
+
+ /*
* Thread is the potential origin of an oom condition; kill first on
* oom
*/
@@ -222,11 +229,8 @@ struct signal_struct {
struct mm_struct *oom_mm; /* recorded mm when the thread group got
* killed by the oom killer */
- struct mutex cred_guard_mutex; /* guard against foreign influences on
- * credential calculations
- * (notably. ptrace)
- * Deprecated do not use in new code.
- * Use exec_update_mutex instead.
+ struct mutex exec_guard_mutex; /* Held while execve runs, except when
+ * a sibling thread is being traced.
*/
struct mutex exec_update_mutex; /* Held while task_struct is being
* updated during exec, and may have
diff --git a/init/init_task.c b/init/init_task.c
index bd403ed..6f96327 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -25,7 +25,7 @@
},
.multiprocess = HLIST_HEAD_INIT,
.rlim = INIT_RLIMITS,
- .cred_guard_mutex = __MUTEX_INITIALIZER(init_signals.cred_guard_mutex),
+ .exec_guard_mutex = __MUTEX_INITIALIZER(init_signals.exec_guard_mutex),
.exec_update_mutex = __MUTEX_INITIALIZER(init_signals.exec_update_mutex),
#ifdef CONFIG_POSIX_TIMERS
.posix_timers = LIST_HEAD_INIT(init_signals.posix_timers),
diff --git a/kernel/cred.c b/kernel/cred.c
index 71a7926..341ca59 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -295,7 +295,7 @@ struct cred *prepare_creds(void)
/*
* Prepare credentials for current to perform an execve()
- * - The caller must hold ->cred_guard_mutex
+ * - The caller must hold ->exec_guard_mutex
*/
struct cred *prepare_exec_creds(void)
{
diff --git a/kernel/fork.c b/kernel/fork.c
index e23ccac..98012f7 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1593,7 +1593,7 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->oom_score_adj = current->signal->oom_score_adj;
sig->oom_score_adj_min = current->signal->oom_score_adj_min;
- mutex_init(&sig->cred_guard_mutex);
+ mutex_init(&sig->exec_guard_mutex);
mutex_init(&sig->exec_update_mutex);
return 0;
diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 43d6179..221759e 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -392,9 +392,13 @@ static int ptrace_attach(struct task_struct *task, long request,
* under ptrace.
*/
retval = -ERESTARTNOINTR;
- if (mutex_lock_interruptible(&task->signal->cred_guard_mutex))
+ if (mutex_lock_interruptible(&task->signal->exec_guard_mutex))
goto out;
+ retval = -EAGAIN;
+ if (unlikely(task->signal->unsafe_execve_in_progress))
+ goto unlock_creds;
+
task_lock(task);
retval = __ptrace_may_access(task, PTRACE_MODE_ATTACH_REALCREDS);
task_unlock(task);
@@ -447,7 +451,7 @@ static int ptrace_attach(struct task_struct *task, long request,
unlock_tasklist:
write_unlock_irq(&tasklist_lock);
unlock_creds:
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_guard_mutex);
out:
if (!retval) {
/*
@@ -472,10 +476,18 @@ static int ptrace_attach(struct task_struct *task, long request,
*/
static int ptrace_traceme(void)
{
- int ret = -EPERM;
+ int ret;
+
+ if (mutex_lock_interruptible(¤t->signal->exec_guard_mutex))
+ return -ERESTARTNOINTR;
+
+ ret = -EAGAIN;
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ goto unlock_creds;
write_lock_irq(&tasklist_lock);
/* Are we already being traced? */
+ ret = -EPERM;
if (!current->ptrace) {
ret = security_ptrace_traceme(current->parent);
/*
@@ -490,6 +502,8 @@ static int ptrace_traceme(void)
}
write_unlock_irq(&tasklist_lock);
+unlock_creds:
+ mutex_unlock(¤t->signal->exec_guard_mutex);
return ret;
}
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index b6ea3dc..acd6960 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -329,7 +329,7 @@ static int is_ancestor(struct seccomp_filter *parent,
/**
* seccomp_can_sync_threads: checks if all threads can be synchronized
*
- * Expects sighand and cred_guard_mutex locks to be held.
+ * Expects sighand and exec_guard_mutex locks to be held.
*
* Returns 0 on success, -ve on error, or the pid of a thread which was
* either not in the correct seccomp mode or did not have an ancestral
@@ -339,9 +339,12 @@ static inline pid_t seccomp_can_sync_threads(void)
{
struct task_struct *thread, *caller;
- BUG_ON(!mutex_is_locked(¤t->signal->cred_guard_mutex));
+ BUG_ON(!mutex_is_locked(¤t->signal->exec_guard_mutex));
assert_spin_locked(¤t->sighand->siglock);
+ if (unlikely(current->signal->unsafe_execve_in_progress))
+ return -EAGAIN;
+
/* Validate all threads being eligible for synchronization. */
caller = current;
for_each_thread(caller, thread) {
@@ -371,7 +374,7 @@ static inline pid_t seccomp_can_sync_threads(void)
/**
* seccomp_sync_threads: sets all threads to use current's filter
*
- * Expects sighand and cred_guard_mutex locks to be held, and for
+ * Expects sighand and exec_guard_mutex locks to be held, and for
* seccomp_can_sync_threads() to have returned success already
* without dropping the locks.
*
@@ -380,7 +383,7 @@ static inline void seccomp_sync_threads(unsigned long flags)
{
struct task_struct *thread, *caller;
- BUG_ON(!mutex_is_locked(¤t->signal->cred_guard_mutex));
+ BUG_ON(!mutex_is_locked(¤t->signal->exec_guard_mutex));
assert_spin_locked(¤t->sighand->siglock);
/* Synchronize all threads. */
@@ -1319,7 +1322,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
* while another thread is in the middle of calling exec.
*/
if (flags & SECCOMP_FILTER_FLAG_TSYNC &&
- mutex_lock_killable(¤t->signal->cred_guard_mutex))
+ mutex_lock_killable(¤t->signal->exec_guard_mutex))
goto out_put_fd;
spin_lock_irq(¤t->sighand->siglock);
@@ -1337,7 +1340,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
out:
spin_unlock_irq(¤t->sighand->siglock);
if (flags & SECCOMP_FILTER_FLAG_TSYNC)
- mutex_unlock(¤t->signal->cred_guard_mutex);
+ mutex_unlock(¤t->signal->exec_guard_mutex);
out_put_fd:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret) {
--
1.9.1
From: Sultan Alsawaf <sultan(a)kerneltoast.com>
On all Dell laptops with panels and chipsets that support PSR (an
esoteric power-saving mechanism), both PSR1 and PSR2 cause flickering
and graphical glitches, sometimes to the point of making the laptop
unusable. Many laptops don't support PSR so it isn't known if PSR works
correctly on any consumer hardware right now. PSR was enabled by default
in 5.0 for capable hardware, so this patch just restores the previous
functionality of PSR being disabled by default.
More info is available on the freedesktop bug:
https://gitlab.freedesktop.org/drm/intel/issues/425
Cc: <stable(a)vger.kernel.org> # 5.4.x, 5.5.x
Signed-off-by: Sultan Alsawaf <sultan(a)kerneltoast.com>
---
drivers/gpu/drm/i915/i915_params.c | 3 +--
drivers/gpu/drm/i915/i915_params.h | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
index 1dd1f3652795..0c4661fcf011 100644
--- a/drivers/gpu/drm/i915/i915_params.c
+++ b/drivers/gpu/drm/i915/i915_params.c
@@ -85,8 +85,7 @@ i915_param_named_unsafe(enable_hangcheck, bool, 0600,
i915_param_named_unsafe(enable_psr, int, 0600,
"Enable PSR "
- "(0=disabled, 1=enabled) "
- "Default: -1 (use per-chip default)");
+ "(-1=use per-chip default, 0=disabled [default], 1=enabled) ");
i915_param_named_unsafe(force_probe, charp, 0400,
"Force probe the driver for specified devices. "
diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
index 31b88f297fbc..4a9a3df7d292 100644
--- a/drivers/gpu/drm/i915/i915_params.h
+++ b/drivers/gpu/drm/i915/i915_params.h
@@ -50,7 +50,7 @@ struct drm_printer;
param(int, vbt_sdvo_panel_type, -1) \
param(int, enable_dc, -1) \
param(int, enable_fbc, -1) \
- param(int, enable_psr, -1) \
+ param(int, enable_psr, 0) \
param(int, disable_power_well, -1) \
param(int, enable_ips, 1) \
param(int, invert_brightness, 0) \
--
2.26.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b95d2ccd2ccb834394d50347d0e40dc38a954e4a Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Thu, 26 Mar 2020 15:53:34 +0100
Subject: [PATCH] mac80211: set IEEE80211_TX_CTRL_PORT_CTRL_PROTO for nl80211
TX
When a frame is transmitted via the nl80211 TX rather than as a
normal frame, IEEE80211_TX_CTRL_PORT_CTRL_PROTO wasn't set and
this will lead to wrong decisions (rate control etc.) being made
about the frame; fix this.
Fixes: 911806491425 ("mac80211: Add support for tx_control_port")
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Link: https://lore.kernel.org/r/20200326155333.f183f52b02f0.I4054e2a8c11c2ddcb795…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 455eb8e6a459..d9cca6dbd870 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -5,7 +5,7 @@
* Copyright 2006-2007 Jiri Benc <jbenc(a)suse.cz>
* Copyright 2007 Johannes Berg <johannes(a)sipsolutions.net>
* Copyright 2013-2014 Intel Mobile Communications GmbH
- * Copyright (C) 2018 Intel Corporation
+ * Copyright (C) 2018, 2020 Intel Corporation
*
* Transmit and frame generation functions.
*/
@@ -5149,6 +5149,7 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
struct ieee80211_local *local = sdata->local;
struct sk_buff *skb;
struct ethhdr *ehdr;
+ u32 ctrl_flags = 0;
u32 flags;
/* Only accept CONTROL_PORT_PROTOCOL configured in CONNECT/ASSOCIATE
@@ -5158,6 +5159,9 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
proto != cpu_to_be16(ETH_P_PREAUTH))
return -EINVAL;
+ if (proto == sdata->control_port_protocol)
+ ctrl_flags |= IEEE80211_TX_CTRL_PORT_CTRL_PROTO;
+
if (unencrypted)
flags = IEEE80211_TX_INTFL_DONT_ENCRYPT;
else
@@ -5183,7 +5187,7 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
skb_reset_mac_header(skb);
local_bh_disable();
- __ieee80211_subif_start_xmit(skb, skb->dev, flags, 0);
+ __ieee80211_subif_start_xmit(skb, skb->dev, flags, ctrl_flags);
local_bh_enable();
return 0;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 4636cf184d6d9a92a56c2554681ea520dd4fe49a Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 13 Mar 2020 13:36:01 +0000
Subject: [PATCH] afs: Fix some tracing details
Fix a couple of tracelines to indicate the usage count after the atomic op,
not the usage count before it to be consistent with other afs and rxrpc
trace lines.
Change the wording of the afs_call_trace_work trace ID label from "WORK" to
"QUEUE" to reflect the fact that it's queueing work, not doing work.
Fixes: 341f741f04be ("afs: Refcount the afs_call struct")
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 4c28712bb7f6..907d5948564a 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -169,7 +169,7 @@ void afs_put_call(struct afs_call *call)
int n = atomic_dec_return(&call->usage);
int o = atomic_read(&net->nr_outstanding_calls);
- trace_afs_call(call, afs_call_trace_put, n + 1, o,
+ trace_afs_call(call, afs_call_trace_put, n, o,
__builtin_return_address(0));
ASSERTCMP(n, >=, 0);
@@ -736,7 +736,7 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
u = atomic_fetch_add_unless(&call->usage, 1, 0);
if (u != 0) {
- trace_afs_call(call, afs_call_trace_wake, u,
+ trace_afs_call(call, afs_call_trace_wake, u + 1,
atomic_read(&call->net->nr_outstanding_calls),
__builtin_return_address(0));
diff --git a/include/trace/events/afs.h b/include/trace/events/afs.h
index 564ba1b5cf57..c612cabbc378 100644
--- a/include/trace/events/afs.h
+++ b/include/trace/events/afs.h
@@ -233,7 +233,7 @@ enum afs_cb_break_reason {
EM(afs_call_trace_get, "GET ") \
EM(afs_call_trace_put, "PUT ") \
EM(afs_call_trace_wake, "WAKE ") \
- E_(afs_call_trace_work, "WORK ")
+ E_(afs_call_trace_work, "QUEUE")
#define afs_server_traces \
EM(afs_server_trace_alloc, "ALLOC ") \
The Realtek PC Beep Hidden Register[1] is currently set by
patch_realtek.c in two different places:
In alc_fill_eapd_coef(), it's set to the value 0x5757, corresponding to
non-beep input on 1Ah and no 1Ah loopback to either headphones or
speakers. (Although, curiously, the loopback amp is still enabled.) This
write was added fairly recently by commit e3743f431143 ("ALSA:
hda/realtek - Dell headphone has noise on unmute for ALC236") and is a
safe default. However, it happens in the wrong place:
alc_fill_eapd_coef() runs on module load and cold boot but not on S3
resume, meaning the register loses its value after suspend.
Conversely, in alc256_init(), the register is updated to unset bit 13
(disable speaker loopback) and set bit 5 (set non-beep input on 1Ah).
Although this write does run on S3 resume, it's not quite enough to fix
up the register's default value of 0x3717. What's missing is a set of
bit 14 to disable headphone loopback. Without that, we end up with a
feedback loop where the headphone jack is being driven by amplified
samples of itself[2].
This change eliminates the update in alc256_init() and replaces it with
the 0x5757 write from alc_fill_eapd_coef(). Kailang says that 0x5757 is
supposed to be the codec's default value, so using it will make
debugging easier for Realtek.
Affects the ALC255, ALC256, ALC257, ALC235, and ALC236 codecs.
[1] Newly documented in Documentation/sound/hd-audio/realtek-pc-beep.rst
[2] Setting the "Headphone Mic Boost" control from userspace changes
this feedback loop and has been a widely-shared workaround for headphone
noise on laptops like the Dell XPS 13 9350. This commit eliminates the
feedback loop and makes the workaround unnecessary.
Fixes: e3743f431143 ("ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Hebb <tommyhebb(a)gmail.com>
---
Changes in v2:
- Change fixed value from 0x4727 to 0x5757, which should behave
identically, on advice from Kailang.
sound/pci/hda/patch_realtek.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 63e1a56f705b..9efb0a858c64 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -367,7 +367,9 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
case 0x10ec0215:
case 0x10ec0233:
case 0x10ec0235:
+ case 0x10ec0236:
case 0x10ec0255:
+ case 0x10ec0256:
case 0x10ec0257:
case 0x10ec0282:
case 0x10ec0283:
@@ -379,11 +381,6 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
case 0x10ec0300:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
- case 0x10ec0236:
- case 0x10ec0256:
- alc_write_coef_idx(codec, 0x36, 0x5757);
- alc_update_coef_idx(codec, 0x10, 1<<9, 0);
- break;
case 0x10ec0275:
alc_update_coef_idx(codec, 0xe, 0, 1<<0);
break;
@@ -3269,7 +3266,13 @@ static void alc256_init(struct hda_codec *codec)
alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x4); /* Hight power */
alc_update_coefex_idx(codec, 0x53, 0x02, 0x8000, 1 << 15); /* Clear bit */
alc_update_coefex_idx(codec, 0x53, 0x02, 0x8000, 0 << 15);
- alc_update_coef_idx(codec, 0x36, 1 << 13, 1 << 5); /* Switch pcbeep path to Line in path*/
+ /*
+ * Expose headphone mic (or possibly Line In on some machines) instead
+ * of PC Beep on 1Ah, and disable 1Ah loopback for all outputs. See
+ * Documentation/sound/hd-audio/realtek-pc-beep.rst for details of
+ * this register.
+ */
+ alc_write_coef_idx(codec, 0x36, 0x5757);
}
static void alc256_shutup(struct hda_codec *codec)
--
2.25.2
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From be40920fbf1003c38ccdc02b571e01a75d890c82 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sat, 7 Mar 2020 03:32:58 +0900
Subject: [PATCH] tools: Let O= makes handle a relative path with -C option
When I tried to compile tools/perf from the top directory with the -C
option, the O= option didn't work correctly if I passed a relative path:
$ make O=BUILD -C tools/perf/
make: Entering directory '/home/mhiramat/ksrc/linux/tools/perf'
BUILD: Doing 'make -j8' parallel build
../scripts/Makefile.include:4: *** O=/home/mhiramat/ksrc/linux/tools/perf/BUILD does not exist. Stop.
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/home/mhiramat/ksrc/linux/tools/perf'
The O= directory existence check failed because the check script ran in
the build target directory instead of the directory where I ran the make
command.
To fix that, once change directory to $(PWD) and check O= directory,
since the PWD is set to where the make command runs.
Fixes: c883122acc0d ("perf tools: Let O= makes handle relative paths")
Reported-by: Randy Dunlap <rdunlap(a)infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Michal Marek <michal.lkml(a)markovi.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
Link: http://lore.kernel.org/lkml/158351957799.3363.15269768530697526765.stgit@de…
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7902a5681fc8..b8fc7d972be9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -35,7 +35,7 @@ endif
# Only pass canonical directory names as the output directory:
#
ifneq ($(O),)
- FULL_O := $(shell readlink -f $(O) || echo $(O))
+ FULL_O := $(shell cd $(PWD); readlink -f $(O) || echo $(O))
endif
#
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index ded7a950dc40..6d2f3a1b2249 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
ifneq ($(O),)
ifeq ($(origin O), command line)
- dummy := $(if $(shell test -d $(O) || echo $(O)),$(error O=$(O) does not exist),)
- ABSOLUTE_O := $(shell cd $(O) ; pwd)
+ dummy := $(if $(shell cd $(PWD); test -d $(O) || echo $(O)),$(error O=$(O) does not exist),)
+ ABSOLUTE_O := $(shell cd $(PWD); cd $(O) ; pwd)
OUTPUT := $(ABSOLUTE_O)/$(if $(subdir),$(subdir)/)
COMMAND_O := O=$(ABSOLUTE_O)
ifeq ($(objtree),)
The warning about "bdi-%s not registered" in __mark_inode_dirty()
is incorrect and can relatively easily be triggered during failsafe
testing of any file system where the test involves unplugging the
storage without unmounting in the middle of write operations.
Even if the filesystem checks "is device not unplugged" before calling
__mark_inode_dirty() the device unplug can happen after this check has
been performed but before __mark_inode_dirty() gets to the check inside
the WARN().
Thus this patch simply removes this warning as it is perfectly normal
thing to happen when volume is unplugged whilst being written to.
Signed-off-by: Anton Altaparmakov <anton(a)tuxera.com>
Cc: stable(a)vger.kernel.org
---
fs/fs-writeback.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 76ac9c7d32ec..443134d3dbf3 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -2318,10 +2318,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
wb = locked_inode_to_wb_and_lock_list(inode);
- WARN(bdi_cap_writeback_dirty(wb->bdi) &&
- !test_bit(WB_registered, &wb->state),
- "bdi-%s not registered\n", wb->bdi->name);
-
inode->dirtied_when = jiffies;
if (dirtytime)
inode->dirtied_time_when = jiffies;
--
2.24.1 (Apple Git-126)
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: b5b72a4e3292 - netfilter: nft_fwd_netdev: allow to redirect to ifb via ingress
The results of these automated tests are provided below.
Overall result: FAILED (see details below)
Merge: OK
Compile: FAILED
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
We attempted to compile the kernel for multiple architectures, but the compile
failed on one or more architectures:
aarch64: FAILED (see build-aarch64.log.xz attachment)
ppc64le: FAILED (see build-ppc64le.log.xz attachment)
s390x: FAILED (see build-s390x.log.xz attachment)
x86_64: FAILED (see build-x86_64.log.xz attachment)
We hope that these logs can help you find the problem quickly. For the full
detail on our testing procedures, please scroll to the bottom of this message.
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hello,
the following one line commit
commit 7df003c85218b5f5b10a7f6418208f31e813f38f
Author: Zhuang Yanying <ann.zhuangyanying(a)huawei.com>
Date: Sat Oct 12 11:37:31 2019 +0800
KVM: fix overflow of zero page refcount with ksm running
applies cleanly to v5.5.y, v5.4.y, 4.19.y, v4.14.y and v4.9.y.
I actually ran into that bug on 4.9.y
Thanks in advance,
Thomas
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7d7587db0d7fd1138f2afcffdc46a8e15630b944 Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Thu, 12 Mar 2020 21:40:06 +0000
Subject: [PATCH] afs: Fix client call Rx-phase signal handling
Fix the handling of signals in client rxrpc calls made by the afs
filesystem. Ignore signals completely, leaving call abandonment or
connection loss to be detected by timeouts inside AF_RXRPC.
Allowing a filesystem call to be interrupted after the entire request has
been transmitted and an abort sent means that the server may or may not
have done the action - and we don't know. It may even be worse than that
for older servers.
Fixes: bc5e3a546d55 ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 972e3aafa361..1ecc67da6c1a 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -604,11 +604,7 @@ static void afs_deliver_to_call(struct afs_call *call)
long afs_wait_for_call_to_complete(struct afs_call *call,
struct afs_addr_cursor *ac)
{
- signed long rtt2, timeout;
long ret;
- bool stalled = false;
- u64 rtt;
- u32 life, last_life;
bool rxrpc_complete = false;
DECLARE_WAITQUEUE(myself, current);
@@ -619,14 +615,6 @@ long afs_wait_for_call_to_complete(struct afs_call *call,
if (ret < 0)
goto out;
- rtt = rxrpc_kernel_get_rtt(call->net->socket, call->rxcall);
- rtt2 = nsecs_to_jiffies64(rtt) * 2;
- if (rtt2 < 2)
- rtt2 = 2;
-
- timeout = rtt2;
- rxrpc_kernel_check_life(call->net->socket, call->rxcall, &last_life);
-
add_wait_queue(&call->waitq, &myself);
for (;;) {
set_current_state(TASK_UNINTERRUPTIBLE);
@@ -637,37 +625,19 @@ long afs_wait_for_call_to_complete(struct afs_call *call,
call->need_attention = false;
__set_current_state(TASK_RUNNING);
afs_deliver_to_call(call);
- timeout = rtt2;
continue;
}
if (afs_check_call_state(call, AFS_CALL_COMPLETE))
break;
- if (!rxrpc_kernel_check_life(call->net->socket, call->rxcall, &life)) {
+ if (!rxrpc_kernel_check_life(call->net->socket, call->rxcall)) {
/* rxrpc terminated the call. */
rxrpc_complete = true;
break;
}
- if (call->intr && timeout == 0 &&
- life == last_life && signal_pending(current)) {
- if (stalled)
- break;
- __set_current_state(TASK_RUNNING);
- rxrpc_kernel_probe_life(call->net->socket, call->rxcall);
- timeout = rtt2;
- stalled = true;
- continue;
- }
-
- if (life != last_life) {
- timeout = rtt2;
- last_life = life;
- stalled = false;
- }
-
- timeout = schedule_timeout(timeout);
+ schedule();
}
remove_wait_queue(&call->waitq, &myself);
diff --git a/include/net/af_rxrpc.h b/include/net/af_rxrpc.h
index 8e547b4d88c8..04e97bab6f28 100644
--- a/include/net/af_rxrpc.h
+++ b/include/net/af_rxrpc.h
@@ -64,9 +64,7 @@ int rxrpc_kernel_charge_accept(struct socket *, rxrpc_notify_rx_t,
rxrpc_user_attach_call_t, unsigned long, gfp_t,
unsigned int);
void rxrpc_kernel_set_tx_length(struct socket *, struct rxrpc_call *, s64);
-bool rxrpc_kernel_check_life(const struct socket *, const struct rxrpc_call *,
- u32 *);
-void rxrpc_kernel_probe_life(struct socket *, struct rxrpc_call *);
+bool rxrpc_kernel_check_life(const struct socket *, const struct rxrpc_call *);
u32 rxrpc_kernel_get_epoch(struct socket *, struct rxrpc_call *);
bool rxrpc_kernel_get_reply_time(struct socket *, struct rxrpc_call *,
ktime_t *);
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index 7603cf811f75..15ee92d79581 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -371,44 +371,17 @@ EXPORT_SYMBOL(rxrpc_kernel_end_call);
* rxrpc_kernel_check_life - Check to see whether a call is still alive
* @sock: The socket the call is on
* @call: The call to check
- * @_life: Where to store the life value
*
- * Allow a kernel service to find out whether a call is still alive - ie. we're
- * getting ACKs from the server. Passes back in *_life a number representing
- * the life state which can be compared to that returned by a previous call and
- * return true if the call is still alive.
- *
- * If the life state stalls, rxrpc_kernel_probe_life() should be called and
- * then 2RTT waited.
+ * Allow a kernel service to find out whether a call is still alive -
+ * ie. whether it has completed.
*/
bool rxrpc_kernel_check_life(const struct socket *sock,
- const struct rxrpc_call *call,
- u32 *_life)
+ const struct rxrpc_call *call)
{
- *_life = call->acks_latest;
return call->state != RXRPC_CALL_COMPLETE;
}
EXPORT_SYMBOL(rxrpc_kernel_check_life);
-/**
- * rxrpc_kernel_probe_life - Poke the peer to see if it's still alive
- * @sock: The socket the call is on
- * @call: The call to check
- *
- * In conjunction with rxrpc_kernel_check_life(), allow a kernel service to
- * find out whether a call is still alive by pinging it. This should cause the
- * life state to be bumped in about 2*RTT.
- *
- * The must be called in TASK_RUNNING state on pain of might_sleep() objecting.
- */
-void rxrpc_kernel_probe_life(struct socket *sock, struct rxrpc_call *call)
-{
- rxrpc_propose_ACK(call, RXRPC_ACK_PING, 0, true, false,
- rxrpc_propose_ack_ping_for_check_life);
- rxrpc_send_ack_packet(call, true, NULL);
-}
-EXPORT_SYMBOL(rxrpc_kernel_probe_life);
-
/**
* rxrpc_kernel_get_epoch - Retrieve the epoch value from a call.
* @sock: The socket the call is on
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 1f72f43b082d..3eb1ab40ca5c 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -675,7 +675,6 @@ struct rxrpc_call {
/* transmission-phase ACK management */
ktime_t acks_latest_ts; /* Timestamp of latest ACK received */
- rxrpc_serial_t acks_latest; /* serial number of latest ACK received */
rxrpc_seq_t acks_lowest_nak; /* Lowest NACK in the buffer (or ==tx_hard_ack) */
rxrpc_seq_t acks_lost_top; /* tx_top at the time lost-ack ping sent */
rxrpc_serial_t acks_lost_ping; /* Serial number of probe ACK */
diff --git a/net/rxrpc/input.c b/net/rxrpc/input.c
index ef10fbf71b15..69e09d69c896 100644
--- a/net/rxrpc/input.c
+++ b/net/rxrpc/input.c
@@ -882,7 +882,6 @@ static void rxrpc_input_ack(struct rxrpc_call *call, struct sk_buff *skb)
before(prev_pkt, call->ackr_prev_seq))
goto out;
call->acks_latest_ts = skb->tstamp;
- call->acks_latest = sp->hdr.serial;
call->ackr_first_seq = first_soft_ack;
call->ackr_prev_seq = prev_pkt;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From dde9f095583b3f375ba23979045ee10dfcebec2f Mon Sep 17 00:00:00 2001
From: David Howells <dhowells(a)redhat.com>
Date: Fri, 13 Mar 2020 13:46:08 +0000
Subject: [PATCH] afs: Fix handling of an abort from a service handler
When an AFS service handler function aborts a call, AF_RXRPC marks the call
as complete - which means that it's not going to get any more packets from
the receiver. This is a problem because reception of the final ACK is what
triggers afs_deliver_to_call() to drop the final ref on the afs_call
object.
Instead, aborted AFS service calls may then just sit around waiting for
ever or until they're displaced by a new call on the same connection
channel or a connection-level abort.
Fix this by calling afs_set_call_complete() to finalise the afs_call struct
representing the call.
However, we then need to drop the ref that stops the call from being
deallocated. We can do this in afs_set_call_complete(), as the work queue
is holding a separate ref of its own, but then we shouldn't do it in
afs_process_async_call() and afs_delete_async_call().
call->drop_ref is set to indicate that a ref needs dropping for a call and
this is dealt with when we transition a call to AFS_CALL_COMPLETE.
But then we also need to get rid of the ref that pins an asynchronous
client call. We can do this by the same mechanism, setting call->drop_ref
for an async client call too.
We can also get rid of call->incoming since nothing ever sets it and only
one thing ever checks it (futilely).
A trace of the rxrpc_call and afs_call struct ref counting looks like:
<idle>-0 [001] ..s5 164.764892: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_new_incoming_call+0x473/0xb34 a=00000000442095b5
<idle>-0 [001] .Ns5 164.766001: rxrpc_call: c=00000002 QUE u=4 sp=rxrpc_propose_ACK+0xbe/0x551 a=00000000442095b5
<idle>-0 [001] .Ns4 164.766005: rxrpc_call: c=00000002 PUT u=3 sp=rxrpc_new_incoming_call+0xa3f/0xb34 a=00000000442095b5
<idle>-0 [001] .Ns7 164.766433: afs_call: c=00000002 WAKE u=2 o=11 sp=rxrpc_notify_socket+0x196/0x33c
kworker/1:2-1810 [001] ...1 164.768409: rxrpc_call: c=00000002 SEE u=3 sp=rxrpc_process_call+0x25/0x7ae a=00000000442095b5
kworker/1:2-1810 [001] ...1 164.769439: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000001 00000000 02 22 ACK CallAck
kworker/1:2-1810 [001] ...1 164.769459: rxrpc_call: c=00000002 PUT u=2 sp=rxrpc_process_call+0x74f/0x7ae a=00000000442095b5
kworker/1:2-1810 [001] ...1 164.770794: afs_call: c=00000002 QUEUE u=3 o=12 sp=afs_deliver_to_call+0x449/0x72c
kworker/1:2-1810 [001] ...1 164.770829: afs_call: c=00000002 PUT u=2 o=12 sp=afs_process_async_call+0xdb/0x11e
kworker/1:2-1810 [001] ...2 164.771084: rxrpc_abort: c=00000002 95786a88:00000008 s=0 a=1 e=1 K-1
kworker/1:2-1810 [001] ...1 164.771461: rxrpc_tx_packet: c=00000002 e9f1a7a8:95786a88:00000008:09c5 00000002 00000000 04 00 ABORT CallAbort
kworker/1:2-1810 [001] ...1 164.771466: afs_call: c=00000002 PUT u=1 o=12 sp=SRXAFSCB_ProbeUuid+0xc1/0x106
The abort generated in SRXAFSCB_ProbeUuid(), labelled "K-1", indicates that
the local filesystem/cache manager didn't recognise the UUID as its own.
Fixes: 2067b2b3f484 ("afs: Fix the CB.ProbeUuid service handler to reply correctly")
Signed-off-by: David Howells <dhowells(a)redhat.com>
diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c
index ff3994a6be23..6765949b3aab 100644
--- a/fs/afs/cmservice.c
+++ b/fs/afs/cmservice.c
@@ -243,6 +243,17 @@ static void afs_cm_destructor(struct afs_call *call)
call->buffer = NULL;
}
+/*
+ * Abort a service call from within an action function.
+ */
+static void afs_abort_service_call(struct afs_call *call, u32 abort_code, int error,
+ const char *why)
+{
+ rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
+ abort_code, error, why);
+ afs_set_call_complete(call, error, 0);
+}
+
/*
* The server supplied a list of callbacks that it wanted to break.
*/
@@ -510,8 +521,7 @@ static void SRXAFSCB_ProbeUuid(struct work_struct *work)
if (memcmp(r, &call->net->uuid, sizeof(call->net->uuid)) == 0)
afs_send_empty_reply(call);
else
- rxrpc_kernel_abort_call(call->net->socket, call->rxcall,
- 1, 1, "K-1");
+ afs_abort_service_call(call, 1, 1, "K-1");
afs_put_call(call);
_leave("");
diff --git a/fs/afs/internal.h b/fs/afs/internal.h
index 1d81fc4c3058..52de2112e1b1 100644
--- a/fs/afs/internal.h
+++ b/fs/afs/internal.h
@@ -154,7 +154,7 @@ struct afs_call {
};
unsigned char unmarshall; /* unmarshalling phase */
unsigned char addr_ix; /* Address in ->alist */
- bool incoming; /* T if incoming call */
+ bool drop_ref; /* T if need to drop ref for incoming call */
bool send_pages; /* T if data from mapping should be sent */
bool need_attention; /* T if RxRPC poked us */
bool async; /* T if asynchronous */
@@ -1209,8 +1209,16 @@ static inline void afs_set_call_complete(struct afs_call *call,
ok = true;
}
spin_unlock_bh(&call->state_lock);
- if (ok)
+ if (ok) {
trace_afs_call_done(call);
+
+ /* Asynchronous calls have two refs to release - one from the alloc and
+ * one queued with the work item - and we can't just deallocate the
+ * call because the work item may be queued again.
+ */
+ if (call->drop_ref)
+ afs_put_call(call);
+ }
}
/*
diff --git a/fs/afs/rxrpc.c b/fs/afs/rxrpc.c
index 907d5948564a..972e3aafa361 100644
--- a/fs/afs/rxrpc.c
+++ b/fs/afs/rxrpc.c
@@ -18,7 +18,6 @@ struct workqueue_struct *afs_async_calls;
static void afs_wake_up_call_waiter(struct sock *, struct rxrpc_call *, unsigned long);
static void afs_wake_up_async_call(struct sock *, struct rxrpc_call *, unsigned long);
-static void afs_delete_async_call(struct work_struct *);
static void afs_process_async_call(struct work_struct *);
static void afs_rx_new_call(struct sock *, struct rxrpc_call *, unsigned long);
static void afs_rx_discard_new_call(struct rxrpc_call *, unsigned long);
@@ -402,8 +401,10 @@ void afs_make_call(struct afs_addr_cursor *ac, struct afs_call *call, gfp_t gfp)
/* If the call is going to be asynchronous, we need an extra ref for
* the call to hold itself so the caller need not hang on to its ref.
*/
- if (call->async)
+ if (call->async) {
afs_get_call(call, afs_call_trace_get);
+ call->drop_ref = true;
+ }
/* create a call */
rxcall = rxrpc_kernel_begin_call(call->net->socket, srx, call->key,
@@ -585,8 +586,6 @@ static void afs_deliver_to_call(struct afs_call *call)
done:
if (call->type->done)
call->type->done(call);
- if (state == AFS_CALL_COMPLETE && call->incoming)
- afs_put_call(call);
out:
_leave("");
return;
@@ -745,21 +744,6 @@ static void afs_wake_up_async_call(struct sock *sk, struct rxrpc_call *rxcall,
}
}
-/*
- * Delete an asynchronous call. The work item carries a ref to the call struct
- * that we need to release.
- */
-static void afs_delete_async_call(struct work_struct *work)
-{
- struct afs_call *call = container_of(work, struct afs_call, async_work);
-
- _enter("");
-
- afs_put_call(call);
-
- _leave("");
-}
-
/*
* Perform I/O processing on an asynchronous call. The work item carries a ref
* to the call struct that we either need to release or to pass on.
@@ -775,16 +759,6 @@ static void afs_process_async_call(struct work_struct *work)
afs_deliver_to_call(call);
}
- if (call->state == AFS_CALL_COMPLETE) {
- /* We have two refs to release - one from the alloc and one
- * queued with the work item - and we can't just deallocate the
- * call because the work item may be queued again.
- */
- call->async_work.func = afs_delete_async_call;
- if (!queue_work(afs_async_calls, &call->async_work))
- afs_put_call(call);
- }
-
afs_put_call(call);
_leave("");
}
@@ -811,6 +785,7 @@ void afs_charge_preallocation(struct work_struct *work)
if (!call)
break;
+ call->drop_ref = true;
call->async = true;
call->state = AFS_CALL_SV_AWAIT_OP_ID;
init_waitqueue_head(&call->waitq);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From be40920fbf1003c38ccdc02b571e01a75d890c82 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Sat, 7 Mar 2020 03:32:58 +0900
Subject: [PATCH] tools: Let O= makes handle a relative path with -C option
When I tried to compile tools/perf from the top directory with the -C
option, the O= option didn't work correctly if I passed a relative path:
$ make O=BUILD -C tools/perf/
make: Entering directory '/home/mhiramat/ksrc/linux/tools/perf'
BUILD: Doing 'make -j8' parallel build
../scripts/Makefile.include:4: *** O=/home/mhiramat/ksrc/linux/tools/perf/BUILD does not exist. Stop.
make: *** [Makefile:70: all] Error 2
make: Leaving directory '/home/mhiramat/ksrc/linux/tools/perf'
The O= directory existence check failed because the check script ran in
the build target directory instead of the directory where I ran the make
command.
To fix that, once change directory to $(PWD) and check O= directory,
since the PWD is set to where the make command runs.
Fixes: c883122acc0d ("perf tools: Let O= makes handle relative paths")
Reported-by: Randy Dunlap <rdunlap(a)infradead.org>
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Jiri Olsa <jolsa(a)redhat.com>
Cc: Masahiro Yamada <masahiroy(a)kernel.org>
Cc: Michal Marek <michal.lkml(a)markovi.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Sasha Levin <sashal(a)kernel.org>
Cc: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
Cc: stable(a)vger.kernel.org
Link: http://lore.kernel.org/lkml/158351957799.3363.15269768530697526765.stgit@de…
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/Makefile b/tools/perf/Makefile
index 7902a5681fc8..b8fc7d972be9 100644
--- a/tools/perf/Makefile
+++ b/tools/perf/Makefile
@@ -35,7 +35,7 @@ endif
# Only pass canonical directory names as the output directory:
#
ifneq ($(O),)
- FULL_O := $(shell readlink -f $(O) || echo $(O))
+ FULL_O := $(shell cd $(PWD); readlink -f $(O) || echo $(O))
endif
#
diff --git a/tools/scripts/Makefile.include b/tools/scripts/Makefile.include
index ded7a950dc40..6d2f3a1b2249 100644
--- a/tools/scripts/Makefile.include
+++ b/tools/scripts/Makefile.include
@@ -1,8 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
ifneq ($(O),)
ifeq ($(origin O), command line)
- dummy := $(if $(shell test -d $(O) || echo $(O)),$(error O=$(O) does not exist),)
- ABSOLUTE_O := $(shell cd $(O) ; pwd)
+ dummy := $(if $(shell cd $(PWD); test -d $(O) || echo $(O)),$(error O=$(O) does not exist),)
+ ABSOLUTE_O := $(shell cd $(PWD); cd $(O) ; pwd)
OUTPUT := $(ABSOLUTE_O)/$(if $(subdir),$(subdir)/)
COMMAND_O := O=$(ABSOLUTE_O)
ifeq ($(objtree),)
This is a note to let you know that I've just added the patch titled
amba: Initialize dma_parms for amba devices
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 5caf6102e32ead7ed5d21b5309c1a4a7d70e6a9f Mon Sep 17 00:00:00 2001
From: Ulf Hansson <ulf.hansson(a)linaro.org>
Date: Wed, 25 Mar 2020 12:34:07 +0100
Subject: amba: Initialize dma_parms for amba devices
It's currently the amba driver's responsibility to initialize the pointer,
dma_parms, for its corresponding struct device. The benefit with this
approach allows us to avoid the initialization and to not waste memory for
the struct device_dma_parameters, as this can be decided on a case by case
basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common amba bus at
the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Cc: <stable(a)vger.kernel.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: Ludovic Barre <ludovic.barre(a)st.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Link: https://lore.kernel.org/r/20200325113407.26996-3-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/amba/bus.c | 2 ++
include/linux/amba/bus.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index fe1523664816..5e61783ce92d 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -374,6 +374,8 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
WARN_ON(dev->irq[0] == (unsigned int)-1);
WARN_ON(dev->irq[1] == (unsigned int)-1);
+ dev->dev.dma_parms = &dev->dma_parms;
+
ret = request_resource(parent, &dev->res);
if (ret)
goto err_out;
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 26f0ecf401ea..0bbfd647f5c6 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -65,6 +65,7 @@ struct amba_device {
struct device dev;
struct resource res;
struct clk *pclk;
+ struct device_dma_parameters dma_parms;
unsigned int periphid;
unsigned int cid;
struct amba_cs_uci_id uci;
--
2.26.0
This is a note to let you know that I've just added the patch titled
driver core: platform: Initialize dma_parms for platform devices
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 7c8978c0837d40c302f5e90d24c298d9ca9fc097 Mon Sep 17 00:00:00 2001
From: Ulf Hansson <ulf.hansson(a)linaro.org>
Date: Wed, 25 Mar 2020 12:34:06 +0100
Subject: driver core: platform: Initialize dma_parms for platform devices
It's currently the platform driver's responsibility to initialize the
pointer, dma_parms, for its corresponding struct device. The benefit with
this approach allows us to avoid the initialization and to not waste memory
for the struct device_dma_parameters, as this can be decided on a case by
case basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common platform bus
at the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Cc: <stable(a)vger.kernel.org>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: Ludovic Barre <ludovic.barre(a)st.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Link: https://lore.kernel.org/r/20200325113407.26996-2-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/base/platform.c | 1 +
include/linux/platform_device.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b5ce7b085795..46abbfb52655 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -512,6 +512,7 @@ int platform_device_add(struct platform_device *pdev)
pdev->dev.parent = &platform_bus;
pdev->dev.bus = &platform_bus_type;
+ pdev->dev.dma_parms = &pdev->dma_parms;
switch (pdev->id) {
default:
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 041bfa412aa0..81900b3cbe37 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -25,6 +25,7 @@ struct platform_device {
bool id_auto;
struct device dev;
u64 platform_dma_mask;
+ struct device_dma_parameters dma_parms;
u32 num_resources;
struct resource *resource;
--
2.26.0
This is a note to let you know that I've just added the patch titled
misc: rtsx: set correct pcr_ops for rts522A
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 10cea23b6aae15e8324f4101d785687f2c514fe5 Mon Sep 17 00:00:00 2001
From: YueHaibing <yuehaibing(a)huawei.com>
Date: Thu, 26 Mar 2020 11:26:18 +0800
Subject: misc: rtsx: set correct pcr_ops for rts522A
rts522a should use rts522a_pcr_ops, which is
diffrent with rts5227 in phy/hw init setting.
Fixes: ce6a5acc9387 ("mfd: rtsx: Add support for rts522A")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200326032618.20472-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/cardreader/rts5227.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/cardreader/rts5227.c b/drivers/misc/cardreader/rts5227.c
index 423fecc19fc4..3a9467aaa435 100644
--- a/drivers/misc/cardreader/rts5227.c
+++ b/drivers/misc/cardreader/rts5227.c
@@ -394,6 +394,7 @@ static const struct pcr_ops rts522a_pcr_ops = {
void rts522a_init_params(struct rtsx_pcr *pcr)
{
rts5227_init_params(pcr);
+ pcr->ops = &rts522a_pcr_ops;
pcr->tx_initial_phase = SET_CLOCK_PHASE(20, 20, 11);
pcr->reg_pm_ctrl3 = RTS522A_PM_CTRL3;
--
2.26.0
This is a note to let you know that I've just added the patch titled
mei: me: add cedar fork device ids
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 99397d33b763dc554d118aaa38cc5abc6ce985de Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Tue, 24 Mar 2020 23:07:30 +0200
Subject: mei: me: add cedar fork device ids
Add Cedar Fork (CDF) device ids, those belongs to the cannon point family.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20200324210730.17672-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/hw-me-regs.h | 2 ++
drivers/misc/mei/pci-me.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index d2359aed79ae..9392934e3a06 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -87,6 +87,8 @@
#define MEI_DEV_ID_CMP_H 0x06e0 /* Comet Lake H */
#define MEI_DEV_ID_CMP_H_3 0x06e4 /* Comet Lake H 3 (iTouch) */
+#define MEI_DEV_ID_CDF 0x18D3 /* Cedar Fork */
+
#define MEI_DEV_ID_ICP_LP 0x34E0 /* Ice Lake Point LP */
#define MEI_DEV_ID_JSP_N 0x4DE0 /* Jasper Lake Point N */
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index ebdc2d6f8ddb..3d21c38e2dbb 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -102,6 +102,8 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_MCC, MEI_ME_PCH15_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_MCC_4, MEI_ME_PCH8_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_CDF, MEI_ME_PCH8_CFG)},
+
/* required last entry */
{0, }
};
--
2.26.0
This is a note to let you know that I've just added the patch titled
coresight: do not use the BIT() macro in the UAPI header
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 9b6eaaf3db5e5888df7bca7fed7752a90f7fd871 Mon Sep 17 00:00:00 2001
From: Eugene Syromiatnikov <esyr(a)redhat.com>
Date: Tue, 24 Mar 2020 05:22:13 +0100
Subject: coresight: do not use the BIT() macro in the UAPI header
The BIT() macro definition is not available for the UAPI headers
(moreover, it can be defined differently in the user space); replace
its usage with the _BITUL() macro that is defined in <linux/const.h>.
Fixes: 237483aa5cf4 ("coresight: stm: adding driver for CoreSight STM component")
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/uapi/linux/coresight-stm.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/coresight-stm.h b/include/uapi/linux/coresight-stm.h
index aac550a52f80..8847dbf24151 100644
--- a/include/uapi/linux/coresight-stm.h
+++ b/include/uapi/linux/coresight-stm.h
@@ -2,8 +2,10 @@
#ifndef __UAPI_CORESIGHT_STM_H_
#define __UAPI_CORESIGHT_STM_H_
-#define STM_FLAG_TIMESTAMPED BIT(3)
-#define STM_FLAG_GUARANTEED BIT(7)
+#include <linux/const.h>
+
+#define STM_FLAG_TIMESTAMPED _BITUL(3)
+#define STM_FLAG_GUARANTEED _BITUL(7)
/*
* The CoreSight STM supports guaranteed and invariant timing
--
2.26.0
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: venus: firmware: Ignore secure call error on first resume
Author: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Date: Wed Mar 4 11:09:49 2020 +0100
With the latest cleanup in qcom scm driver the secure monitor
call for setting the remote processor state returns EINVAL when
it is called for the first time and after another scm call
auth_and_reset. The error returned from scm call could be ignored
because the state transition is already done in auth_and_reset.
Acked-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Signed-off-by: Stanimir Varbanov <stanimir.varbanov(a)linaro.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei(a)kernel.org>
drivers/media/platform/qcom/venus/firmware.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
---
diff --git a/drivers/media/platform/qcom/venus/firmware.c b/drivers/media/platform/qcom/venus/firmware.c
index de6812fb55f4..8801a6a7543d 100644
--- a/drivers/media/platform/qcom/venus/firmware.c
+++ b/drivers/media/platform/qcom/venus/firmware.c
@@ -44,8 +44,14 @@ static void venus_reset_cpu(struct venus_core *core)
int venus_set_hw_state(struct venus_core *core, bool resume)
{
- if (core->use_tz)
- return qcom_scm_set_remote_state(resume, 0);
+ int ret;
+
+ if (core->use_tz) {
+ ret = qcom_scm_set_remote_state(resume, 0);
+ if (resume && ret == -EINVAL)
+ ret = 0;
+ return ret;
+ }
if (resume)
venus_reset_cpu(core);
Hi,
Please consider applying the following patches to v4.14.y.
The following patches were found to be missing in v4.14.y by the ChromeOS
missing patch robot. The patches meet the following criteria.
- The patch includes a Fixes: tag
- The patch referenced in the Fixes: tag has been applied to v4.14.y
- The patch has not been applied to v4.14.y
All patches have been applied to v4.14.y and chromeos-4.14. Resulting images
have been build- and runtime-tested on real hardware with chromeos-4.14
and with virtual hardware on kerneltests.org.
Upstream commit 76fc52bd07d3 ("arm64: ptrace: map SPSR_ELx<->PSR for compat tasks")
Fixes: 7206dc93a58fb764 ("arm64: Expose Arm v8.4 features")
in v4.14.y: 053cdffad3dd
Upstream commit 25dc2c80cfa3 ("arm64: compat: map SPSR_ELx<->PSR for signals")
Fixes: 7206dc93a58fb764 ("arm64: Expose Arm v8.4 features")
in v4.14.y: 053cdffad3dd
Upstream commit 93a64ee71d10 ("MAINTAINERS: Remove deleted file from futex file pattern")
Fixes: 04e7712f4460 ("y2038: futex: Move compat implementation into futex.c")
in v4.14.y: 0c08f1da992d
Notes:
Also applies to v4.19.y.
This is an example for a patch which isn't really necessary
(it doesn't fix a bug, only an entry in the the MAINTAINERS file),
but automation won't be able to know that. Please let me know
what to do with similar patches in the future.
Upstream commit 074376ac0e1d ("ftrace/x86: Anotate text_mutex split between ftrace_arch_code_modify_post_process() and ftrace_arch_code_modify_prepare()")
Fixes: 39611265edc1a ("ftrace/x86: Add a comment to why we take text_mutex in ftrace_arch_code_modify_prepare()")
Fixes: d5b844a2cf507 ("ftrace/x86: Remove possible deadlock between register_kprobe() and ftrace_run_update_code()")
in v4.14.y: 0c0b54770189 (upstream d5b844a2cf507)
Notes:
Also applies to v4.19.y.
Thanks,
Guenter
From: Johannes Berg <johannes.berg(a)intel.com>
commit 7937fd3227055892e169f4b34d21157e57d919e2 upstream.
The code now compiles without ACPI, but there's a warning since
iwl_mvm_get_ppag_table() isn't used, and iwl_mvm_ppag_init() must
not unconditionally fail but return success instead.
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Luca Coelho <luciano.coelho(a)intel.com>
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
[Drop hunk removing iwl_mvm_get_ppag_table() since it doesn't exist in
5.4]
Signed-off-by: Jason Andryuk <jandryuk(a)gmail.com>
---
A 5.4 kernel can't "up" an iwlwifi interface when CONFIG_ACPI=n.
`wpa_supplicant` or `ip link set wlan0 up` return "No such file or
directory". The non-acpi stub iwl_mvm_ppag_init() always returns
-ENOENT which means iwl_mvm_up() always fails. Backporting the commit
lets iwl_mvm_up() succeed.
drivers/net/wireless/intel/iwlwifi/mvm/fw.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
index c59cbb8cbdd7..c54fe6650018 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/fw.c
@@ -1181,7 +1181,7 @@ int iwl_mvm_ppag_send_cmd(struct iwl_mvm *mvm)
static int iwl_mvm_ppag_init(struct iwl_mvm *mvm)
{
- return -ENOENT;
+ return 0;
}
#endif /* CONFIG_ACPI */
--
2.25.1
Declaring setjmp()/longjmp() as taking longs makes the signature
non-standard, and makes clang complain. In the past, this has been
worked around by adding -ffreestanding to the compile flags.
The implementation looks like it only ever propagates the value
(in longjmp) or sets it to 1 (in setjmp), and we only call longjmp
with integer parameters.
This allows removing -ffreestanding from the compilation flags.
Context:
https://lore.kernel.org/patchwork/patch/1214060https://lore.kernel.org/patchwork/patch/1216174
Signed-off-by: Clement Courbet <courbet(a)google.com>
Reviewed-by: Nathan Chancellor <natechancellor(a)gmail.com>
Tested-by: Nathan Chancellor <natechancellor(a)gmail.com>
---
v2:
Use and array type as suggested by Segher Boessenkool
Cc: stable(a)vger.kernel.org # v4.14+
Fixes: c9029ef9c957 ("powerpc: Avoid clang warnings around setjmp and longjmp")
---
arch/powerpc/include/asm/setjmp.h | 6 ++++--
arch/powerpc/kexec/Makefile | 3 ---
arch/powerpc/xmon/Makefile | 3 ---
3 files changed, 4 insertions(+), 8 deletions(-)
diff --git a/arch/powerpc/include/asm/setjmp.h b/arch/powerpc/include/asm/setjmp.h
index e9f81bb3f83b..f798e80e4106 100644
--- a/arch/powerpc/include/asm/setjmp.h
+++ b/arch/powerpc/include/asm/setjmp.h
@@ -7,7 +7,9 @@
#define JMP_BUF_LEN 23
-extern long setjmp(long *) __attribute__((returns_twice));
-extern void longjmp(long *, long) __attribute__((noreturn));
+typedef long jmp_buf[JMP_BUF_LEN];
+
+extern int setjmp(jmp_buf env) __attribute__((returns_twice));
+extern void longjmp(jmp_buf env, int val) __attribute__((noreturn));
#endif /* _ASM_POWERPC_SETJMP_H */
diff --git a/arch/powerpc/kexec/Makefile b/arch/powerpc/kexec/Makefile
index 378f6108a414..86380c69f5ce 100644
--- a/arch/powerpc/kexec/Makefile
+++ b/arch/powerpc/kexec/Makefile
@@ -3,9 +3,6 @@
# Makefile for the linux kernel.
#
-# Avoid clang warnings around longjmp/setjmp declarations
-CFLAGS_crash.o += -ffreestanding
-
obj-y += core.o crash.o core_$(BITS).o
obj-$(CONFIG_PPC32) += relocate_32.o
diff --git a/arch/powerpc/xmon/Makefile b/arch/powerpc/xmon/Makefile
index c3842dbeb1b7..6f9cccea54f3 100644
--- a/arch/powerpc/xmon/Makefile
+++ b/arch/powerpc/xmon/Makefile
@@ -1,9 +1,6 @@
# SPDX-License-Identifier: GPL-2.0
# Makefile for xmon
-# Avoid clang warnings around longjmp/setjmp declarations
-subdir-ccflags-y := -ffreestanding
-
GCOV_PROFILE := n
KCOV_INSTRUMENT := n
UBSAN_SANITIZE := n
--
2.26.0.rc2.310.g2932bb562d-goog
The Realtek PC Beep Hidden Register[1] is currently set by
patch_realtek.c in two different places:
In alc_fill_eapd_coef(), it's set to the value 0x5757, corresponding to
non-beep input on 1Ah and no 1Ah loopback to either headphones or
speakers. (Although, curiously, the loopback amp is still enabled.) This
write was added fairly recently by commit e3743f431143 ("ALSA:
hda/realtek - Dell headphone has noise on unmute for ALC236") and is a
safe default. However, it happens in the wrong place:
alc_fill_eapd_coef() runs on module load and cold boot but not on S3
resume, meaning the register loses its value after suspend.
Conversely, in alc256_init(), the register is updated to unset bit 13
(disable speaker loopback) and set bit 5 (set non-beep input on 1Ah).
Although this write does run on S3 resume, it's not quite enough to fix
up the register's default value of 0x3717. What's missing is a set of
bit 14 to disable headphone loopback. Without that, we end up with a
feedback loop where the headphone jack is being driven by amplified
samples of itself[2].
This change eliminates the write in alc_fill_eapd_coef() and replaces
the update in alc256_init() with a write of the fixed value 0x4727. This
value ought to have the same behavior as 0x5757--disabling all PC Beep
routing that isn't part of the HDA node graph--but it has two
differences:
1. To enable non-beep input on 1Ah, the `b` field is set to 1, like the
previous code in alc256_init() used, instead of 2, like the value
written by alc_fill_eapd_coef() used. My testing shows these two
values to behave identically, but, in case there is a difference,
a bitwise update of the register seems like a more reliable source
of truth than a magic number written without explanation.
2. Loopback amplification is disabled (unset L and R bits) because no
loopback path is in use.
Affects the ALC255, ALC256, ALC257, ALC235, and ALC236 codecs.
[1] Newly documented in Documentation/sound/hd-audio/realtek-pc-beep.rst
[2] Setting the "Headphone Mic Boost" control from userspace changes
this feedback loop and has been a widely-shared workaround for headphone
noise on laptops like the Dell XPS 13 9350. This commit eliminates the
feedback loop and makes the workaround unnecessary.
Fixes: e3743f431143 ("ALSA: hda/realtek - Dell headphone has noise on unmute for ALC236")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Hebb <tommyhebb(a)gmail.com>
---
sound/pci/hda/patch_realtek.c | 15 +++++++++------
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 63e1a56f705b..024dd61a788b 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -367,7 +367,9 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
case 0x10ec0215:
case 0x10ec0233:
case 0x10ec0235:
+ case 0x10ec0236:
case 0x10ec0255:
+ case 0x10ec0256:
case 0x10ec0257:
case 0x10ec0282:
case 0x10ec0283:
@@ -379,11 +381,6 @@ static void alc_fill_eapd_coef(struct hda_codec *codec)
case 0x10ec0300:
alc_update_coef_idx(codec, 0x10, 1<<9, 0);
break;
- case 0x10ec0236:
- case 0x10ec0256:
- alc_write_coef_idx(codec, 0x36, 0x5757);
- alc_update_coef_idx(codec, 0x10, 1<<9, 0);
- break;
case 0x10ec0275:
alc_update_coef_idx(codec, 0xe, 0, 1<<0);
break;
@@ -3269,7 +3266,13 @@ static void alc256_init(struct hda_codec *codec)
alc_update_coefex_idx(codec, 0x57, 0x04, 0x0007, 0x4); /* Hight power */
alc_update_coefex_idx(codec, 0x53, 0x02, 0x8000, 1 << 15); /* Clear bit */
alc_update_coefex_idx(codec, 0x53, 0x02, 0x8000, 0 << 15);
- alc_update_coef_idx(codec, 0x36, 1 << 13, 1 << 5); /* Switch pcbeep path to Line in path*/
+ /*
+ * Expose headphone mic (or possibly Line In on some machines) instead
+ * of PC Beep on 1Ah, and disable 1Ah loopback for all outputs. See
+ * Documentation/sound/hd-audio/realtek-pc-beep.rst for details of
+ * this register.
+ */
+ alc_write_coef_idx(codec, 0x36, 0x4727);
}
static void alc256_shutup(struct hda_codec *codec)
--
2.25.2
From: Johannes Berg <johannes.berg(a)intel.com>
The original patch didn't copy the ieee80211_is_data() condition
because on most drivers the management frames don't go through
this path. However, they do on iwlwifi/mvm, so we do need to keep
the condition here.
Cc: stable(a)vger.kernel.org
Fixes: ce2e1ca70307 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() case")
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
Dave, can you please apply this directly?
(sorry, I shall remember to use git commit --amend properly)
---
net/mac80211/tx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d9cca6dbd870..efe4c1fc68e5 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3610,7 +3610,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
* Drop unicast frames to unauthorised stations unless they are
* EAPOL frames from the local station.
*/
- if (unlikely(!ieee80211_vif_is_mesh(&tx.sdata->vif) &&
+ if (unlikely(ieee80211_is_data(hdr->frame_control) &&
+ !ieee80211_vif_is_mesh(&tx.sdata->vif) &&
tx.sdata->vif.type != NL80211_IFTYPE_OCB &&
!is_multicast_ether_addr(hdr->addr1) &&
!test_sta_flag(tx.sta, WLAN_STA_AUTHORIZED) &&
--
2.25.1
Hello friend,
This is Julian Smith and i am purchasing manager from Sinara Group Co.,LTD in Russia.
We are glad to know about your company from the web and we are interested in your products.
Could you kindly send us your Latest catalog and price list for our trial order.
Thanks and Best Regards,
Ms. Julian Smith
Purchasing Manager
Sinara Group Co.,LTD
The command ring and cursor ring use different notify port addresses
definition: QXL_IO_NOTIFY_CMD and QXL_IO_NOTIFY_CURSOR. However, in
qxl_device_init() we use QXL_IO_NOTIFY_CMD to create both command ring
and cursor ring. This doesn't cause any problems now, because QEMU's
behaviors on QXL_IO_NOTIFY_CMD and QXL_IO_NOTIFY_CURSOR are the same.
However, QEMU's behavior may be change in future, so let's fix it.
P.S.: In the X.org QXL driver, the notify port address of cursor ring
is correct.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Huacai Chen <chenhc(a)lemote.com>
---
drivers/gpu/drm/qxl/qxl_kms.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/qxl/qxl_kms.c b/drivers/gpu/drm/qxl/qxl_kms.c
index bfc1631..9bdbe0d 100644
--- a/drivers/gpu/drm/qxl/qxl_kms.c
+++ b/drivers/gpu/drm/qxl/qxl_kms.c
@@ -218,7 +218,7 @@ int qxl_device_init(struct qxl_device *qdev,
&(qdev->ram_header->cursor_ring_hdr),
sizeof(struct qxl_command),
QXL_CURSOR_RING_SIZE,
- qdev->io_base + QXL_IO_NOTIFY_CMD,
+ qdev->io_base + QXL_IO_NOTIFY_CURSOR,
false,
&qdev->cursor_event);
--
2.7.0
The Power Management Events (PMEs) the INT0002 driver listens for get
signalled by the Power Management Controller (PMC) using the same IRQ
as used for the ACPI SCI.
Since commit fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from
waking up the system") the SCI triggering without there being a wakeup
cause recognized by the ACPI sleep code will no longer wakeup the system.
This breaks PMEs / wakeups signalled to the INT0002 driver, the system
never leaves the s2idle_loop() now.
Use acpi_s2idle_register_wake_callback to register a function which
checks the GPE0a_STS register for a PME and trigger a wakeup when a
PME has been signalled.
With this new mechanism the pm_wakeup_hard_event() call is no longer
necessary, so remove it and also remove the matching device_init_wakeup()
calls.
Fixes: fdde0ff8590b ("ACPI: PM: s2idle: Prevent spurious SCIs from waking up the system")
Cc: 5.4+ <stable(a)vger.kernel.org> # 5.4+
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/platform/x86/intel_int0002_vgpio.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/drivers/platform/x86/intel_int0002_vgpio.c b/drivers/platform/x86/intel_int0002_vgpio.c
index f14e2c5f9da5..3c70694188fe 100644
--- a/drivers/platform/x86/intel_int0002_vgpio.c
+++ b/drivers/platform/x86/intel_int0002_vgpio.c
@@ -122,11 +122,17 @@ static irqreturn_t int0002_irq(int irq, void *data)
generic_handle_irq(irq_find_mapping(chip->irq.domain,
GPE0A_PME_B0_VIRT_GPIO_PIN));
- pm_wakeup_hard_event(chip->parent);
-
return IRQ_HANDLED;
}
+static bool int0002_check_wake(void *data)
+{
+ u32 gpe_sts_reg;
+
+ gpe_sts_reg = inl(GPE0A_STS_PORT);
+ return (gpe_sts_reg & GPE0A_PME_B0_STS_BIT);
+}
+
static struct irq_chip int0002_byt_irqchip = {
.name = DRV_NAME,
.irq_ack = int0002_irq_ack,
@@ -220,13 +226,13 @@ static int int0002_probe(struct platform_device *pdev)
return ret;
}
- device_init_wakeup(dev, true);
+ acpi_s2idle_register_wake_callback(irq, int0002_check_wake, NULL);
return 0;
}
static int int0002_remove(struct platform_device *pdev)
{
- device_init_wakeup(&pdev->dev, false);
+ acpi_s2idle_unregister_wake_callback(int0002_check_wake, NULL);
return 0;
}
--
2.26.0
The following commit has been merged into the irq/core branch of tip:
Commit-ID: 6a214a28132f19ace3d835a6d8f6422ec80ad200
Gitweb: https://git.kernel.org/tip/6a214a28132f19ace3d835a6d8f6422ec80ad200
Author: Sungbo Eo <mans0n(a)gorani.run>
AuthorDate: Sat, 21 Mar 2020 22:38:42 +09:00
Committer: Marc Zyngier <maz(a)kernel.org>
CommitterDate: Sun, 22 Mar 2020 11:52:16
irqchip/versatile-fpga: Apply clear-mask earlier
Clear its own IRQs before the parent IRQ get enabled, so that the
remaining IRQs do not accidentally interrupt the parent IRQ controller.
This patch also fixes a reboot bug on OX820 SoC, where the remaining
rps-timer IRQ raises a GIC interrupt that is left pending. After that,
the rps-timer IRQ is cleared during driver initialization, and there's
no IRQ left in rps-irq when local_irq_enable() is called, which evokes
an error message "unexpected IRQ trap".
Fixes: bdd272cbb97a ("irqchip: versatile FPGA: support cascaded interrupts from DT")
Signed-off-by: Sungbo Eo <mans0n(a)gorani.run>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20200321133842.2408823-1-mans0n@gorani.run
---
drivers/irqchip/irq-versatile-fpga.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/irqchip/irq-versatile-fpga.c b/drivers/irqchip/irq-versatile-fpga.c
index 70e2cff..f138673 100644
--- a/drivers/irqchip/irq-versatile-fpga.c
+++ b/drivers/irqchip/irq-versatile-fpga.c
@@ -212,6 +212,9 @@ int __init fpga_irq_of_init(struct device_node *node,
if (of_property_read_u32(node, "valid-mask", &valid_mask))
valid_mask = 0;
+ writel(clear_mask, base + IRQ_ENABLE_CLEAR);
+ writel(clear_mask, base + FIQ_ENABLE_CLEAR);
+
/* Some chips are cascaded from a parent IRQ */
parent_irq = irq_of_parse_and_map(node, 0);
if (!parent_irq) {
@@ -221,9 +224,6 @@ int __init fpga_irq_of_init(struct device_node *node,
fpga_irq_init(base, node->name, 0, parent_irq, valid_mask, node);
- writel(clear_mask, base + IRQ_ENABLE_CLEAR);
- writel(clear_mask, base + FIQ_ENABLE_CLEAR);
-
/*
* On Versatile AB/PB, some secondary interrupts have a direct
* pass-thru to the primary controller for IRQs 20 and 22-31 which need
From: Johannes Berg <johannes.berg(a)intel.com>
The original patch didn't copy the ieee80211_is_data() condition
because on most drivers the management frames don't go through
this path. However, they do on iwlwifi/mvm, so we do need to keep
the condition here.
Cc: stable(a)vger.kernel.org
Fixes: ce2e1ca70307 ("mac80211: Check port authorization in the ieee80211_tx_dequeue() case")
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
Dave, can you please apply this directly?
---
net/mac80211/tx.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index d9cca6dbd870..19a67f639471 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3610,7 +3610,8 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
* Drop unicast frames to unauthorised stations unless they are
* EAPOL frames from the local station.
*/
- if (unlikely(!ieee80211_vif_is_mesh(&tx.sdata->vif) &&
+ if (unlikely(ieee80211_is_data(hdr->fc) &&
+ !ieee80211_vif_is_mesh(&tx.sdata->vif) &&
tx.sdata->vif.type != NL80211_IFTYPE_OCB &&
!is_multicast_ether_addr(hdr->addr1) &&
!test_sta_flag(tx.sta, WLAN_STA_AUTHORIZED) &&
--
2.25.1
On Sun, Mar 29, 2020 at 11:04 AM David Hildenbrand <david(a)redhat.com> wrote:
>
>
> What I received via the mailing list (e.g., linux-mm(a)kvack.org)
>
> Message-Id: <20200128093542.6908-1-david(a)redhat.com>
> MIME-Version: 1.0
> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13
> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
> Sender: owner-linux-mm(a)kvack.org
> Precedence: bulk
> X-Loop: owner-majordomo(a)kvack.org
> List-ID: <linux-mm.kvack.org>
> [...]
> X-Mimecast-Spam-Score: 1
> Content-Type: text/plain; charset=US-ASCII
> Content-Transfer-Encoding: quoted-printable
> X-Scanned-By: MIMEDefang 2.78 on 10.11.54.4
> [...]
>
> And a lot of this MIME crap.
Well, that may still be a perfectly fine email.
Yes, it has the MIME crap, but it also has that
Content-Transfer-Encoding: quoted-printable
which should tell all users how to _handle_ that MIME crap.
It's sad that people in this day and age still don't just handle
Content-Transfer-Encoding: 8bit
and just send it on untouched, but SMTP certainly encourages that bad
behavior of "convert to 7-bit MIME crap", because in theory there
could be SMTP servers out there that can't handle anything 8-bit or
with longer lines.
Those SMTP servers should just be scrapped and people told not to use
them, but sadly that's not the approach email people have taken.
They've taken the approach that old garbage SMTP servers should be
allowed to exist and destroy email for the rest of us.
> I have no idea if such a conversion is expected to be done.
It is (sadly) expected to be done by a lot of mail software.
But the problem is that some part of your email handling code then
doesn't _undo_ the MIME conversion, and leaves the MIME turds alone,
while then that "Content-Transfer-Encoding: quoted-printable" got
lost.
Do you at any point end up using a raw mbox and cut-and-pasting stuff?
Reading email in a broken mail-reader that doesn't undo MIME? Because
that's the usual way that these kinds of turds get copied.. Using raw
emails without honoring or taking that "Content-Transfer-Encoding"
into account.
Linus
From: David Hildenbrand <david(a)redhat.com>
Subject: drivers/base/memory.c: indicate all memory blocks as removable
We see multiple issues with the implementation/interface to compute
whether a memory block can be offlined (exposed via
/sys/devices/system/memory/memoryX/removable) and would like to simplify
it (remove the implementation).
1. It runs basically lockless. While this might be good for performance,
we see possible races with memory offlining that will require at least
some sort of locking to fix.
2. Nowadays, more false positives are possible. No arch-specific checks
are performed that validate if memory offlining will not be denied
right away (and such check will require locking). For example, arm64
won't allow to offline any memory block that was added during boot -
which will imply a very high error rate. Other archs have other
constraints.
3. The interface is inherently racy. E.g., if a memory block is
detected to be removable (and was not a false positive at that time),
there is still no guarantee that offlining will actually succeed. So
any caller already has to deal with false positives.
4. It is unclear which performance benefit this interface actually
provides. The introducing commit 5c755e9fd813 ("memory-hotplug: add
sysfs removable attribute for hotplug memory remove") mentioned
"A user-level agent must be able to identify which sections of
memory are likely to be removable before attempting the
potentially expensive operation."
However, no actual performance comparison was included.
Known users:
- lsmem: Will group memory blocks based on the "removable" property. [1]
- chmem: Indirect user. It has a RANGE mode where one can specify
removable ranges identified via lsmem to be offlined. However, it
also has a "SIZE" mode, which allows a sysadmin to skip the manual
"identify removable blocks" step. [2]
- powerpc-utils: Uses the "removable" attribute to skip some memory
blocks right away when trying to find some to
offline+remove. However, with ballooning enabled, it
already skips this information completely (because it
once resulted in many false negatives). Therefore, the
implementation can deal with false positives properly
already. [3]
According to Nathan Fontenot, DLPAR on powerpc is nowadays no longer
driven from userspace via the drmgr command (powerpc-utils). Nowadays
it's managed in the kernel - including onlining/offlining of memory
blocks - triggered by drmgr writing to /sys/kernel/dlpar. So the
affected legacy userspace handling is only active on old kernels. Only ve=
ry
old versions of drmgr on a new kernel (unlikely) might execute slower -
totally acceptable.
With CONFIG_MEMORY_HOTREMOVE, always indicating "removable" should not
break any user space tool. We implement a very bad heuristic now. Withou=
t
CONFIG_MEMORY_HOTREMOVE we cannot offline anything, so report
"not removable" as before.
Original discussion can be found in [4] ("[PATCH RFC v1] mm:
is_mem_section_removable() overhaul").
Other users of is_mem_section_removable() will be removed next, so that
we can remove is_mem_section_removable() completely.
[1] http://man7.org/linux/man-pages/man1/lsmem.1.html
[2] http://man7.org/linux/man-pages/man8/chmem.8.html
[3] https://github.com/ibm-power-utilities/powerpc-utils
[4] https://lkml.kernel.org/r/20200117105759.27905-1-david@redhat.com
Also, this patch probably fixes a crash reported by Steve.
http://lkml.kernel.org/r/CAPcyv4jpdaNvJ67SkjyUJLBnBnXXQv686BiVW042g03FUmWLX…
Link: http://lkml.kernel.org/r/20200128093542.6908-1-david@redhat.com
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Suggested-by: Michal Hocko <mhocko(a)kernel.org>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: Nathan Fontenot <ndfont(a)gmail.com>
Reported-by: "Scargall, Steve" <steve.scargall(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Badari Pulavarty <pbadari(a)us.ibm.com>
Cc: Robert Jennings <rcj(a)linux.vnet.ibm.com>
Cc: Heiko Carstens <heiko.carstens(a)de.ibm.com>
Cc: Karel Zak <kzak(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/base/memory.c | 23 +++--------------------
1 file changed, 3 insertions(+), 20 deletions(-)
--- a/drivers/base/memory.c~drivers-base-memoryc-indicate-all-memory-blocks-as-removable
+++ a/drivers/base/memory.c
@@ -97,30 +97,13 @@ static ssize_t phys_index_show(struct de
}
/*
- * Show whether the memory block is likely to be offlineable (or is already
- * offline). Once offline, the memory block could be removed. The return
- * value does, however, not indicate that there is a way to remove the
- * memory block.
+ * Legacy interface that we cannot remove. Always indicate "removable"
+ * with CONFIG_MEMORY_HOTREMOVE - bad heuristic.
*/
static ssize_t removable_show(struct device *dev, struct device_attribute *attr,
char *buf)
{
- struct memory_block *mem = to_memory_block(dev);
- unsigned long pfn;
- int ret = 1, i;
-
- if (mem->state != MEM_ONLINE)
- goto out;
-
- for (i = 0; i < sections_per_block; i++) {
- if (!present_section_nr(mem->start_section_nr + i))
- continue;
- pfn = section_nr_to_pfn(mem->start_section_nr + i);
- ret &= is_mem_section_removable(pfn, PAGES_PER_SECTION);
- }
-
-out:
- return sprintf(buf, "%d\n", ret);
+ return sprintf(buf, "%d\n", (int)IS_ENABLED(CONFIG_MEMORY_HOTREMOVE));
}
/*
_
From: Roman Gushchin <guro(a)fb.com>
Subject: mm: fork: fix kernel_stack memcg stats for various stack implementations
Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node() and kmem_cache_alloc_node(). In the first and the
second cases page->mem_cgroup pointer is set, but in the third it's not:
memcg membership of a slab page should be determined using the
memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg . In this case, using
mod_memcg_page_state() (as in account_kernel_stack()) is incorrect:
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.
It can lead to kernel_stack per-memcg counters permanently showing 0 on
some architectures (depending on the configuration).
In order to fix it, let's introduce a mod_memcg_obj_state() helper, which
takes a pointer to a kernel object as a first argument, uses
mem_cgroup_from_obj() to get a RCU-protected memcg pointer and calls
mod_memcg_state(). It allows to handle all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c .
Note: This is a special version of the patch created for stable
backports. It contains code from the following two patches:
- mm: memcg/slab: introduce mem_cgroup_from_obj()
- mm: fork: fix kernel_stack memcg stats for various stack implementations
[guro(a)fb.com: introduce mem_cgroup_from_obj()]
Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro(a)fb.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Bharata B Rao <bharata(a)linux.ibm.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/memcontrol.h | 12 +++++++++++
kernel/fork.c | 4 +--
mm/memcontrol.c | 38 +++++++++++++++++++++++++++++++++++
3 files changed, 52 insertions(+), 2 deletions(-)
--- a/include/linux/memcontrol.h~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/include/linux/memcontrol.h
@@ -695,6 +695,7 @@ static inline unsigned long lruvec_page_
void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
int val);
void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
+void mod_memcg_obj_state(void *p, int idx, int val);
static inline void mod_lruvec_state(struct lruvec *lruvec,
enum node_stat_item idx, int val)
@@ -1123,6 +1124,10 @@ static inline void __mod_lruvec_slab_sta
__mod_node_page_state(page_pgdat(page), idx, val);
}
+static inline void mod_memcg_obj_state(void *p, int idx, int val)
+{
+}
+
static inline
unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
gfp_t gfp_mask,
@@ -1427,6 +1432,8 @@ static inline int memcg_cache_id(struct
return memcg ? memcg->kmemcg_id : -1;
}
+struct mem_cgroup *mem_cgroup_from_obj(void *p);
+
#else
static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
@@ -1468,6 +1475,11 @@ static inline void memcg_put_cache_ids(v
{
}
+static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+ return NULL;
+}
+
#endif /* CONFIG_MEMCG_KMEM */
#endif /* _LINUX_MEMCONTROL_H */
--- a/kernel/fork.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/kernel/fork.c
@@ -397,8 +397,8 @@ static void account_kernel_stack(struct
mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
THREAD_SIZE / 1024 * account);
- mod_memcg_page_state(first_page, MEMCG_KERNEL_STACK_KB,
- account * (THREAD_SIZE / 1024));
+ mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB,
+ account * (THREAD_SIZE / 1024));
}
}
--- a/mm/memcontrol.c~mm-fork-fix-kernel_stack-memcg-stats-for-various-stack-implementations
+++ a/mm/memcontrol.c
@@ -777,6 +777,17 @@ void __mod_lruvec_slab_state(void *p, en
rcu_read_unlock();
}
+void mod_memcg_obj_state(void *p, int idx, int val)
+{
+ struct mem_cgroup *memcg;
+
+ rcu_read_lock();
+ memcg = mem_cgroup_from_obj(p);
+ if (memcg)
+ mod_memcg_state(memcg, idx, val);
+ rcu_read_unlock();
+}
+
/**
* __count_memcg_events - account VM events in a cgroup
* @memcg: the memory cgroup
@@ -2661,6 +2672,33 @@ static void commit_charge(struct page *p
}
#ifdef CONFIG_MEMCG_KMEM
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+ struct page *page;
+
+ if (mem_cgroup_disabled())
+ return NULL;
+
+ page = virt_to_head_page(p);
+
+ /*
+ * Slab pages don't have page->mem_cgroup set because corresponding
+ * kmem caches can be reparented during the lifetime. That's why
+ * memcg_from_slab_page() should be used instead.
+ */
+ if (PageSlab(page))
+ return memcg_from_slab_page(page);
+
+ /* All other pages use page->mem_cgroup */
+ return page->mem_cgroup;
+}
+
static int memcg_alloc_cache_id(void)
{
int id, size;
_
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
Fix the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The crash is due to NULL dereference at
test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL; in
pfn_section_valid()
With commit d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem(). This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
release mem_section->usage if a section is fully deactivated, pfn_valid()
check after a subsection_deactivate cause a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is
freed. For architectures like ppc64 where large pages are used for
vmmemap mapping (16MB), a specific vmemmap mapping can cover multiple
sections. Hence before a vmemmap mapping page can be freed, the kernel
needs to make sure there are no valid sections within that mapping.
Clearing the section valid bit before depopulate_section_memap enables
this.
[aneesh.kumar(a)linux.ibm.com: add comment]
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com…: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Reviewed-by: Wei Yang <richard.weiyang(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check
+++ a/mm/sparse.c
@@ -781,6 +781,12 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
_
The patch titled
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
has been removed from the -mm tree. Its filename was
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2.patch
This patch was dropped because it was folded into mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
add comment
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
+++ a/mm/sparse.c
@@ -781,7 +781,11 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
- /* Mark the section invalid */
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
The patch titled
Subject: umh: fix refcount underflow in fork_usermode_blob().
has been added to the -mm tree. Its filename is
umh-fix-refcount-underflow-in-fork_usermode_blob.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/umh-fix-refcount-underflow-in-fork…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/umh-fix-refcount-underflow-in-fork…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Tetsuo Handa <penguin-kernel(a)i-love.sakura.ne.jp>
Subject: umh: fix refcount underflow in fork_usermode_blob().
Since free_bprm(bprm) always calls allow_write_access(bprm->file) and
fput(bprm->file) if bprm->file is set to non-NULL, __do_execve_file()
must call deny_write_access(file) and get_file(file) if called from
do_execve_file() path. Otherwise, use-after-free access can happen at
fput(file) in fork_usermode_blob().
general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] SMP DEBUG_PAGEALLOC
CPU: 3 PID: 4131 Comm: insmod Tainted: G O 5.6.0-rc5+ #978
Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/29/2019
RIP: 0010:fork_usermode_blob+0xaa/0x190
Link: http://lkml.kernel.org/r/9b846b1f-a231-4f09-8c37-6bfb0d1e7b05@i-love.sakura…
Signed-off-by: Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
Fixes: 449325b52b7a6208 ("umh: introduce fork_usermode_blob() helper")
Cc: <stable(a)vger.kernel.org> [4.18+]
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/exec.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
--- a/fs/exec.c~umh-fix-refcount-underflow-in-fork_usermode_blob
+++ a/fs/exec.c
@@ -1761,11 +1761,17 @@ static int __do_execve_file(int fd, stru
check_unsafe_exec(bprm);
current->in_execve = 1;
- if (!file)
+ if (!file) {
file = do_open_execat(fd, filename, flags);
- retval = PTR_ERR(file);
- if (IS_ERR(file))
- goto out_unmark;
+ retval = PTR_ERR(file);
+ if (IS_ERR(file))
+ goto out_unmark;
+ } else {
+ retval = deny_write_access(file);
+ if (retval)
+ goto out_unmark;
+ get_file(file);
+ }
sched_exec();
_
Patches currently in -mm which might be from penguin-kernel(a)i-love.sakura.ne.jp are
kernel-hung_taskc-monitor-killed-tasks.patch
umh-fix-refcount-underflow-in-fork_usermode_blob.patch
Hi,
Please consider applying the following patches to v4.4.y.
The following patches were found to be missing in v4.4.y by the ChromeOS
missing patches robot. The patches meet the following criteria.
- The patch includes a Fixes: tag
- The patch referenced in the Fixes: tag has been applied to v4.4.y
- The patch itself has not been applied to v4.4.y
All patches have been applied to v4.4.y and chromeos-4-4. Resulting images
have been build- and runtime-tested on real hardware running chromeos-4.4
and with virtual hardware on kerneltests.org.
Upstream commit 14fa91e0fef8 ("IB/ipoib: Do not warn if IPoIB debugfs doesn't exist")
Fixes: 771a52584096 ("IB/IPoIB: ibX: failed to create mcg debug file")
in v4.4.y: 771a52584096
Upstream commit efc45154828a ("uapi glibc compat: fix outer guard of net device flags enum")
Fixes: 4a91cb61bb99 ("uapi glibc compat: fix compile errors when glibc net/if.h included before linux/if.h")
in v4.4.y: 1575c095e444
Upstream commit c4409905cd6e ("KVM: VMX: Do not allow reexecute_instruction() when skipping MMIO instr")
Fixes: d391f1207067 ("x86/kvm/vmx: do not use vm-exit instruction length for fast MMIO when running nested")
in v4.4.y: 0c53038267a9
Notes:
This patch also applies to v4.9.y.
This patch has already been applied to v4.14.y.
This patch does not affect v4.19.y and later.
Upstream commit b76ba4af4ddd ("drivers/hwspinlock: use correct radix tree API")
Fixes: c6400ba7e13a ("drivers/hwspinlock: fix race between radix tree insertion and lookup")
in v4.4.y: 077b6173a8c8
Upstream commit 28d35bcdd392 ("net: ipv4: don't let PMTU updates increase route MTU")
Fixes: d52e5a7e7ca4 ("ipv4: lock mtu in fnhe when received PMTU < net.ipv4.route.min_pmtu")
in v4.4.y: 119bbaa6795a
Notes:
This patch also applies to v4.9.y.
This patch also applies to v4.14.y.
This patch does not affect v4.19.y and later.
Note: I generated this notification manually. I hope I'll have some
automation soon; until then I'll send similar notifications for other
kernel branches as I find the time.
Thanks,
Guenter
Commit 42d84c8490f9 ("vhost: Check docket sk_family instead of call getname")
fixes CVE-2020-10942. It has been applied to v4.14.y and later, but not to v4.4.y.
While it does not apply directly to v4.4.y, its backport to v4.14.y (commit
ff8e12b0cfe2 in v4.14.y) does. Please apply the backport to v4.4.y as well.
Thanks,
Guenter
The following commit has been merged into the smp/core branch of tip:
Commit-ID: e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b
Gitweb: https://git.kernel.org/tip/e98eac6ff1b45e4e73f2e6031b37c256ccb5d36b
Author: Thomas Gleixner <tglx(a)linutronix.de>
AuthorDate: Fri, 27 Mar 2020 12:06:44 +01:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Sat, 28 Mar 2020 11:42:55 +01:00
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set then an eventually pending wakeup event is
ignored.
Fixes: a66d955e910a ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng(a)gmail.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Pavankumar Kondeti <pkondeti(a)codeaurora.org>
Cc: stable(a)vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
---
include/linux/cpu.h | 12 +++++++++---
kernel/cpu.c | 4 ++--
2 files changed, 11 insertions(+), 5 deletions(-)
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 9ead281..beaed2d 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -144,12 +144,18 @@ static inline void get_online_cpus(void) { cpus_read_lock(); }
static inline void put_online_cpus(void) { cpus_read_unlock(); }
#ifdef CONFIG_PM_SLEEP_SMP
-extern int freeze_secondary_cpus(int primary);
+int __freeze_secondary_cpus(int primary, bool suspend);
+static inline int freeze_secondary_cpus(int primary)
+{
+ return __freeze_secondary_cpus(primary, true);
+}
+
static inline int disable_nonboot_cpus(void)
{
- return freeze_secondary_cpus(0);
+ return __freeze_secondary_cpus(0, false);
}
-extern void enable_nonboot_cpus(void);
+
+void enable_nonboot_cpus(void);
static inline int suspend_disable_secondary_cpus(void)
{
diff --git a/kernel/cpu.c b/kernel/cpu.c
index 3084849..12ae636 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1327,7 +1327,7 @@ void bringup_nonboot_cpus(unsigned int setup_max_cpus)
#ifdef CONFIG_PM_SLEEP_SMP
static cpumask_var_t frozen_cpus;
-int freeze_secondary_cpus(int primary)
+int __freeze_secondary_cpus(int primary, bool suspend)
{
int cpu, error = 0;
@@ -1352,7 +1352,7 @@ int freeze_secondary_cpus(int primary)
if (cpu == primary)
continue;
- if (pm_wakeup_pending()) {
+ if (suspend && pm_wakeup_pending()) {
pr_info("Wakeup pending. Abort CPU freeze\n");
error = -EBUSY;
break;
This is a note to let you know that I've just added the patch titled
vt: vt_ioctl: fix use-after-free in vt_in_use()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 7cf64b18b0b96e751178b8d0505d8466ff5a448f Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sat, 21 Mar 2020 20:43:05 -0700
Subject: vt: vt_ioctl: fix use-after-free in vt_in_use()
vt_in_use() dereferences console_driver->ttys[i] without proper locking.
This is broken because the tty can be closed and freed concurrently.
We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
and skipping the check of tty_struct::count. But, looking at
console_driver->ttys[i] isn't really appropriate anyway because even if
it is NULL the tty can still be in the process of being closed.
Instead, fix it by making vt_in_use() require console_lock() and check
whether the vt is allocated and has port refcount > 1. This works since
following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
virtual console" the port refcount is incremented while the vt is open.
Reproducer (very unreliable, but it worked for me after a few minutes):
#include <fcntl.h>
#include <linux/vt.h>
int main()
{
int fd, nproc;
struct vt_stat state;
char ttyname[16];
fd = open("/dev/tty10", O_RDONLY);
for (nproc = 1; nproc < 8; nproc *= 2)
fork();
for (;;) {
sprintf(ttyname, "/dev/tty%d", rand() % 8);
close(open(ttyname, O_RDONLY));
ioctl(fd, VT_GETSTATE, &state);
}
}
KASAN report:
BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
Read of size 4 at addr ffff888065722468 by task syz-vt2/132
CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
[...]
vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
[...]
Allocated by task 136:
[...]
kzalloc include/linux/slab.h:669 [inline]
alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
[...]
Freed by task 41:
[...]
kfree+0xbf/0x200 mm/slab.c:3757
free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
[...]
Fixes: 4001d7b7fc27 ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <stable(a)vger.kernel.org> # v3.4+
Acked-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt_ioctl.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index f62f498f63c0..daf61c28ba76 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -43,9 +43,15 @@ bool vt_dont_switch;
static inline bool vt_in_use(unsigned int i)
{
- extern struct tty_driver *console_driver;
+ const struct vc_data *vc = vc_cons[i].d;
- return console_driver->ttys[i] && console_driver->ttys[i]->count;
+ /*
+ * console_lock must be held to prevent the vc from being deallocated
+ * while we're checking whether it's in-use.
+ */
+ WARN_CONSOLE_UNLOCKED();
+
+ return vc && kref_read(&vc->port.kref) > 1;
}
static inline bool vt_busy(int i)
@@ -643,15 +649,16 @@ int vt_ioctl(struct tty_struct *tty,
struct vt_stat __user *vtstat = up;
unsigned short state, mask;
- /* Review: FIXME: Console lock ? */
if (put_user(fg_console + 1, &vtstat->v_active))
ret = -EFAULT;
else {
state = 1; /* /dev/tty0 is always open */
+ console_lock(); /* required by vt_in_use() */
for (i = 0, mask = 2; i < MAX_NR_CONSOLES && mask;
++i, mask <<= 1)
if (vt_in_use(i))
state |= mask;
+ console_unlock();
ret = put_user(state, &vtstat->v_state);
}
break;
@@ -661,10 +668,11 @@ int vt_ioctl(struct tty_struct *tty,
* Returns the first available (non-opened) console.
*/
case VT_OPENQRY:
- /* FIXME: locking ? - but then this is a stupid API */
+ console_lock(); /* required by vt_in_use() */
for (i = 0; i < MAX_NR_CONSOLES; ++i)
if (!vt_in_use(i))
break;
+ console_unlock();
uival = i < MAX_NR_CONSOLES ? (i+1) : -1;
goto setint;
--
2.26.0
This is a note to let you know that I've just added the patch titled
vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From ca4463bf8438b403596edd0ec961ca0d4fbe0220 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sat, 21 Mar 2020 20:43:04 -0700
Subject: vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
The VT_DISALLOCATE ioctl can free a virtual console while tty_release()
is still running, causing a use-after-free in con_shutdown(). This
occurs because VT_DISALLOCATE considers a virtual console's
'struct vc_data' to be unused as soon as the corresponding tty's
refcount hits 0. But actually it may be still being closed.
Fix this by making vc_data be reference-counted via the embedded
'struct tty_port'. A newly allocated virtual console has refcount 1.
Opening it for the first time increments the refcount to 2. Closing it
for the last time decrements the refcount (in tty_operations::cleanup()
so that it happens late enough), as does VT_DISALLOCATE.
Reproducer:
#include <fcntl.h>
#include <linux/vt.h>
#include <sys/ioctl.h>
#include <unistd.h>
int main()
{
if (fork()) {
for (;;)
close(open("/dev/tty5", O_RDWR));
} else {
int fd = open("/dev/tty10", O_RDWR);
for (;;)
ioctl(fd, VT_DISALLOCATE, 5);
}
}
KASAN report:
BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129
CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
[...]
con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514
tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629
tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789
[...]
Allocated by task 129:
[...]
kzalloc include/linux/slab.h:669 [inline]
vc_allocate drivers/tty/vt/vt.c:1085 [inline]
vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066
con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229
tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline]
tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341
tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
[...]
Freed by task 130:
[...]
kfree+0xbf/0x1e0 mm/slab.c:3757
vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline]
vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818
tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
[...]
Fixes: 4001d7b7fc27 ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <stable(a)vger.kernel.org> # v3.4+
Reported-by: syzbot+522643ab5729b0421998(a)syzkaller.appspotmail.com
Acked-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20200322034305.210082-2-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt.c | 23 ++++++++++++++++++++++-
drivers/tty/vt/vt_ioctl.c | 12 ++++--------
2 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index bbc26d73209a..309a39197be0 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -1075,6 +1075,17 @@ static void visual_deinit(struct vc_data *vc)
module_put(vc->vc_sw->owner);
}
+static void vc_port_destruct(struct tty_port *port)
+{
+ struct vc_data *vc = container_of(port, struct vc_data, port);
+
+ kfree(vc);
+}
+
+static const struct tty_port_operations vc_port_ops = {
+ .destruct = vc_port_destruct,
+};
+
int vc_allocate(unsigned int currcons) /* return 0 on success */
{
struct vt_notifier_param param;
@@ -1100,6 +1111,7 @@ int vc_allocate(unsigned int currcons) /* return 0 on success */
vc_cons[currcons].d = vc;
tty_port_init(&vc->port);
+ vc->port.ops = &vc_port_ops;
INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
visual_init(vc, currcons, 1);
@@ -3250,6 +3262,7 @@ static int con_install(struct tty_driver *driver, struct tty_struct *tty)
tty->driver_data = vc;
vc->port.tty = tty;
+ tty_port_get(&vc->port);
if (!tty->winsize.ws_row && !tty->winsize.ws_col) {
tty->winsize.ws_row = vc_cons[currcons].d->vc_rows;
@@ -3285,6 +3298,13 @@ static void con_shutdown(struct tty_struct *tty)
console_unlock();
}
+static void con_cleanup(struct tty_struct *tty)
+{
+ struct vc_data *vc = tty->driver_data;
+
+ tty_port_put(&vc->port);
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3410,7 +3430,8 @@ static const struct tty_operations con_ops = {
.throttle = con_throttle,
.unthrottle = con_unthrottle,
.resize = vt_resize,
- .shutdown = con_shutdown
+ .shutdown = con_shutdown,
+ .cleanup = con_cleanup,
};
static struct cdev vc0_cdev;
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index 7297997fcf04..f62f498f63c0 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -310,10 +310,8 @@ static int vt_disallocate(unsigned int vc_num)
vc = vc_deallocate(vc_num);
console_unlock();
- if (vc && vc_num >= MIN_NR_CONSOLES) {
- tty_port_destroy(&vc->port);
- kfree(vc);
- }
+ if (vc && vc_num >= MIN_NR_CONSOLES)
+ tty_port_put(&vc->port);
return ret;
}
@@ -333,10 +331,8 @@ static void vt_disallocate_all(void)
console_unlock();
for (i = 1; i < MAX_NR_CONSOLES; i++) {
- if (vc[i] && i >= MIN_NR_CONSOLES) {
- tty_port_destroy(&vc[i]->port);
- kfree(vc[i]);
- }
+ if (vc[i] && i >= MIN_NR_CONSOLES)
+ tty_port_put(&vc[i]->port);
}
}
--
2.26.0
This is a note to let you know that I've just added the patch titled
USB: cdc-acm: restore capability check order
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 62d65bdd9d05158aa2547f8ef72375535f3bc6e3 Mon Sep 17 00:00:00 2001
From: Matthias Reichl <hias(a)horus.com>
Date: Fri, 27 Mar 2020 16:03:50 +0100
Subject: USB: cdc-acm: restore capability check order
commit b401f8c4f492c ("USB: cdc-acm: fix rounding error in TIOCSSERIAL")
introduced a regression by changing the order of capability and close
settings change checks. When running with CAP_SYS_ADMIN setting the
close settings to the values already set resulted in -EOPNOTSUPP.
Fix this by changing the check order back to how it was before.
Fixes: b401f8c4f492c ("USB: cdc-acm: fix rounding error in TIOCSSERIAL")
Cc: Anthony Mallet <anthony.mallet(a)laas.fr>
Cc: stable <stable(a)vger.kernel.org>
Cc: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Matthias Reichl <hias(a)horus.com>
Link: https://lore.kernel.org/r/20200327150350.3657-1-hias@horus.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-acm.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 47f09a6ce7bd..84d6f7df09a4 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -923,16 +923,16 @@ static int set_serial_info(struct tty_struct *tty, struct serial_struct *ss)
mutex_lock(&acm->port.mutex);
- if ((ss->close_delay != old_close_delay) ||
- (ss->closing_wait != old_closing_wait)) {
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable(CAP_SYS_ADMIN)) {
+ if ((ss->close_delay != old_close_delay) ||
+ (ss->closing_wait != old_closing_wait))
retval = -EPERM;
- else {
- acm->port.close_delay = close_delay;
- acm->port.closing_wait = closing_wait;
- }
- } else
- retval = -EOPNOTSUPP;
+ else
+ retval = -EOPNOTSUPP;
+ } else {
+ acm->port.close_delay = close_delay;
+ acm->port.closing_wait = closing_wait;
+ }
mutex_unlock(&acm->port.mutex);
return retval;
--
2.26.0
From: Longpeng <longpeng2(a)huawei.com>
Our machine encountered a panic(addressing exception) after run
for a long time and the calltrace is:
RIP: 0010:[<ffffffff9dff0587>] [<ffffffff9dff0587>] hugetlb_fault+0x307/0xbe0
RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
FS: 00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
[<ffffffff9df9b71b>] ? unlock_page+0x2b/0x30
[<ffffffff9dff04a2>] ? hugetlb_fault+0x222/0xbe0
[<ffffffff9dff1405>] follow_hugetlb_page+0x175/0x540
[<ffffffff9e15b825>] ? cpumask_next_and+0x35/0x50
[<ffffffff9dfc7230>] __get_user_pages+0x2a0/0x7e0
[<ffffffff9dfc648d>] __get_user_pages_unlocked+0x15d/0x210
[<ffffffffc068cfc5>] __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
[<ffffffffc06b28be>] try_async_pf+0x6e/0x2a0 [kvm]
[<ffffffffc06b4b41>] tdp_page_fault+0x151/0x2d0 [kvm]
...
[<ffffffffc06a6f90>] kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
[<ffffffffc068d919>] kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
[<ffffffff9deaa8c2>] ? dequeue_signal+0x32/0x180
[<ffffffff9deae34d>] ? do_sigtimedwait+0xcd/0x230
[<ffffffff9e03aed0>] do_vfs_ioctl+0x3f0/0x540
[<ffffffff9e03b0c1>] SyS_ioctl+0xa1/0xc0
[<ffffffff9e53879b>] system_call_fastpath+0x22/0x27
For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it
may return a wrong 'pmdp' if there is a race. Please look at the following
code snippet:
...
pud = pud_offset(p4d, addr);
if (sz != PUD_SIZE && pud_none(*pud))
return NULL;
/* hugepage or swap? */
if (pud_huge(*pud) || !pud_present(*pud))
return (pte_t *)pud;
pmd = pmd_offset(pud, addr);
if (sz != PMD_SIZE && pmd_none(*pmd))
return NULL;
/* hugepage or swap? */
if (pmd_huge(*pmd) || !pmd_present(*pmd))
return (pte_t *)pmd;
...
The following sequence would trigger this bug:
1. CPU0: sz = PUD_SIZE and *pud = 0 , continue
1. CPU0: "pud_huge(*pud)" is false
2. CPU1: calling hugetlb_no_page and set *pud to xxxx8e7(PRESENT)
3. CPU0: "!pud_present(*pud)" is false, continue
4. CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp
However, we want CPU0 to return NULL or pudp in this case.
Also, according to the section 'COMPILER BARRIER' of memory-barriers.txt:
'''
(*) The compiler is within its rights to reorder loads and stores
to the same variable, and in some cases, the CPU is within its
rights to reorder loads to the same variable. This means that
the following code:
a[0] = x;
a[1] = x;
Might result in an older value of x stored in a[1] than in a[0].
'''
there're several other data races in huge_pte_offset, for example:
'''
p4d = p4d_offset(pgd, addr)
if (!p4d_present(*p4d))
return NULL;
pud = pud_offset(p4d, addr) <-- will be unwinded as:
pud = (pud_t *)p4d_page_vaddr(*p4d) + pud_index(address);
'''
which is free for the compiler/CPU to execute as:
'''
p4d = p4d_offset(pgd, addr)
p4d_for_vaddr = *p4d;
if (!p4d_present(*p4d))
return NULL;
pud = (pud_t *)p4d_page_vaddr(p4d_for_vaddr) + pud_index(address);
'''
so in the case where *p4g goes from '!present' to 'present':
p4d_present(*p4d) and p4d_for_vaddr == none, meaning the p4d_page_vaddr()
will crash.
For these reasons, we must make sure there is exactly one dereference of
p4g, pud and pmd.
Cc: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Sean Christopherson <sean.j.christopherson(a)intel.com>
Cc: stable(a)vger.kernel.org
Suggested-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Longpeng <longpeng2(a)huawei.com>
---
v2 -> v3:
make sure p4g/pud/pmd be dereferenced once. [Jason]
---
mm/hugetlb.c | 26 +++++++++++++++-----------
1 file changed, 15 insertions(+), 11 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dd8737a..d4fab68 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4909,29 +4909,33 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
unsigned long addr, unsigned long sz)
{
pgd_t *pgd;
- p4d_t *p4d;
- pud_t *pud;
- pmd_t *pmd;
+ p4d_t *p4g, p4d_entry;
+ pud_t *pud, pud_entry;
+ pmd_t *pmd, pmd_entry;
pgd = pgd_offset(mm, addr);
if (!pgd_present(*pgd))
return NULL;
- p4d = p4d_offset(pgd, addr);
- if (!p4d_present(*p4d))
+
+ p4g = p4d_offset(pgd, addr);
+ p4d_entry = READ_ONCE(*p4g);
+ if (!p4d_present(p4d_entry))
return NULL;
- pud = pud_offset(p4d, addr);
- if (sz != PUD_SIZE && pud_none(*pud))
+ pud = pud_offset(&p4d_entry, addr);
+ pud_entry = READ_ONCE(*pud);
+ if (sz != PUD_SIZE && pud_none(pud_entry))
return NULL;
/* hugepage or swap? */
- if (pud_huge(*pud) || !pud_present(*pud))
+ if (pud_huge(pud_entry) || !pud_present(pud_entry))
return (pte_t *)pud;
- pmd = pmd_offset(pud, addr);
- if (sz != PMD_SIZE && pmd_none(*pmd))
+ pmd = pmd_offset(&pud_entry, addr);
+ pmd_entry = READ_ONCE(*pmd);
+ if (sz != PMD_SIZE && pmd_none(pmd_entry))
return NULL;
/* hugepage or swap? */
- if (pmd_huge(*pmd) || !pmd_present(*pmd))
+ if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
return (pte_t *)pmd;
return NULL;
--
1.8.3.1
Tiger Lake's new unique ACPI device IDs for intel-hid driver is not
valid because of missing 'C' in the ID. Fix the ID by updating it.
After the update, the new ID should now look like
INT1051 --> INTC1051
Fixes: bdd11b654035 ("platform/x86: intel-hid: Add Tiger Lake ACPI device ID")
Cc: 5.6+ <stable(a)vger.kernel.org> # 5.6+
Cc: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Cc: Srinivas Pandruvada <srinivas.pandruvada(a)intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada(a)intel.com>
Signed-off-by: Gayatri Kammela <gayatri.kammela(a)intel.com>
---
drivers/platform/x86/intel-hid.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/platform/x86/intel-hid.c b/drivers/platform/x86/intel-hid.c
index 43d590250228..c0a4696803eb 100644
--- a/drivers/platform/x86/intel-hid.c
+++ b/drivers/platform/x86/intel-hid.c
@@ -19,7 +19,7 @@ MODULE_LICENSE("GPL");
MODULE_AUTHOR("Alex Hung");
static const struct acpi_device_id intel_hid_ids[] = {
- {"INT1051", 0},
+ {"INTC1051", 0},
{"INT33D5", 0},
{"", 0},
};
--
2.17.1
Commit 0161a94e2d1c7 ("tools: gpio: Correctly add make dependencies for
gpio_utils") added a make rule for gpio-utils-in.o but used $(output)
instead of the correct $(OUTPUT) for the output directory, breaking
out-of-tree build (O=xx) with the following error:
No rule to make target 'out/tools/gpio/gpio-utils-in.o', needed by 'out/tools/gpio/lsgpio-in.o'. Stop.
Fix that.
Fixes: 0161a94e2d1c ("tools: gpio: Correctly add make dependencies for gpio_utils")
Cc: <stable(a)vger.kernel.org>
Cc: Laura Abbott <labbott(a)redhat.com>
Signed-off-by: Anssi Hannula <anssi.hannula(a)bitwise.fi>
---
The 0161a94e2d1c was also applied to stable releases, which is where I
got hit by the issue.
tools/gpio/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/gpio/Makefile b/tools/gpio/Makefile
index 842287e42c83..440434027557 100644
--- a/tools/gpio/Makefile
+++ b/tools/gpio/Makefile
@@ -35,7 +35,7 @@ $(OUTPUT)include/linux/gpio.h: ../../include/uapi/linux/gpio.h
prepare: $(OUTPUT)include/linux/gpio.h
-GPIO_UTILS_IN := $(output)gpio-utils-in.o
+GPIO_UTILS_IN := $(OUTPUT)gpio-utils-in.o
$(GPIO_UTILS_IN): prepare FORCE
$(Q)$(MAKE) $(build)=gpio-utils
--
2.21.1
From: Corentin Labbe <clabbe(a)baylibre.com>
I have hit the following build error:
armv7a-hardfloat-linux-gnueabi-ld: drivers/rtc/rtc-max8907.o: in function `max8907_rtc_probe':
rtc-max8907.c:(.text+0x400): undefined reference to `regmap_irq_get_virq'
max8907 should select REGMAP_IRQ
Fixes: 94c01ab6d7544 ("rtc: add MAX8907 RTC driver")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Corentin Labbe <clabbe(a)baylibre.com>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
---
drivers/rtc/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 34c8b6c7e095..8e503881d9d6 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -327,6 +327,7 @@ config RTC_DRV_MAX6900
config RTC_DRV_MAX8907
tristate "Maxim MAX8907"
depends on MFD_MAX8907 || COMPILE_TEST
+ select REGMAP_IRQ
help
If you say yes here you will get support for the
RTC of Maxim MAX8907 PMIC.
--
2.25.1
This is a note to let you know that I've just added the patch titled
USB: cdc-acm: restore capability check order
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 62d65bdd9d05158aa2547f8ef72375535f3bc6e3 Mon Sep 17 00:00:00 2001
From: Matthias Reichl <hias(a)horus.com>
Date: Fri, 27 Mar 2020 16:03:50 +0100
Subject: USB: cdc-acm: restore capability check order
commit b401f8c4f492c ("USB: cdc-acm: fix rounding error in TIOCSSERIAL")
introduced a regression by changing the order of capability and close
settings change checks. When running with CAP_SYS_ADMIN setting the
close settings to the values already set resulted in -EOPNOTSUPP.
Fix this by changing the check order back to how it was before.
Fixes: b401f8c4f492c ("USB: cdc-acm: fix rounding error in TIOCSSERIAL")
Cc: Anthony Mallet <anthony.mallet(a)laas.fr>
Cc: stable <stable(a)vger.kernel.org>
Cc: Oliver Neukum <oneukum(a)suse.com>
Signed-off-by: Matthias Reichl <hias(a)horus.com>
Link: https://lore.kernel.org/r/20200327150350.3657-1-hias@horus.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-acm.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 47f09a6ce7bd..84d6f7df09a4 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -923,16 +923,16 @@ static int set_serial_info(struct tty_struct *tty, struct serial_struct *ss)
mutex_lock(&acm->port.mutex);
- if ((ss->close_delay != old_close_delay) ||
- (ss->closing_wait != old_closing_wait)) {
- if (!capable(CAP_SYS_ADMIN))
+ if (!capable(CAP_SYS_ADMIN)) {
+ if ((ss->close_delay != old_close_delay) ||
+ (ss->closing_wait != old_closing_wait))
retval = -EPERM;
- else {
- acm->port.close_delay = close_delay;
- acm->port.closing_wait = closing_wait;
- }
- } else
- retval = -EOPNOTSUPP;
+ else
+ retval = -EOPNOTSUPP;
+ } else {
+ acm->port.close_delay = close_delay;
+ acm->port.closing_wait = closing_wait;
+ }
mutex_unlock(&acm->port.mutex);
return retval;
--
2.26.0
From: Jouni Malinen <jouni(a)codeaurora.org>
mac80211 used to check port authorization in the Data frame enqueue case
when going through start_xmit(). However, that authorization status may
change while the frame is waiting in a queue. Add a similar check in the
dequeue case to avoid sending previously accepted frames after
authorization change. This provides additional protection against
potential leaking of frames after a station has been disconnected and
the keys for it are being removed.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jouni Malinen <jouni(a)codeaurora.org>
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
net/mac80211/tx.c | 19 ++++++++++++++++++-
1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 7dbfb9e3cd84..455eb8e6a459 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -3604,8 +3604,25 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw,
tx.skb = skb;
tx.sdata = vif_to_sdata(info->control.vif);
- if (txq->sta)
+ if (txq->sta) {
tx.sta = container_of(txq->sta, struct sta_info, sta);
+ /*
+ * Drop unicast frames to unauthorised stations unless they are
+ * EAPOL frames from the local station.
+ */
+ if (unlikely(!ieee80211_vif_is_mesh(&tx.sdata->vif) &&
+ tx.sdata->vif.type != NL80211_IFTYPE_OCB &&
+ !is_multicast_ether_addr(hdr->addr1) &&
+ !test_sta_flag(tx.sta, WLAN_STA_AUTHORIZED) &&
+ (!(info->control.flags &
+ IEEE80211_TX_CTRL_PORT_CTRL_PROTO) ||
+ !ether_addr_equal(tx.sdata->vif.addr,
+ hdr->addr2)))) {
+ I802_DEBUG_INC(local->tx_handlers_drop_unauth_port);
+ ieee80211_free_txskb(&local->hw, skb);
+ goto begin;
+ }
+ }
/*
* The key can be removed while the packet was queued, so need to call
--
2.25.1
After our server is upgraded to a newer kernel, we found that it
continuesly print a warning in the kernel message. The warning is,
[832984.946322] netlink: 'irmas.lc': attribute type 1 has an invalid length.
irmas.lc is one of our container monitor daemons, and it will use
CGROUPSTATS_CMD_GET to get the cgroupstats, that is similar with
tools/accounting/getdelays.c. We can also produce this warning with
getdelays. For example, after running bellow command
$ ./getdelays -C /sys/fs/cgroup/memory
then you can find a warning in dmesg,
[61607.229318] netlink: 'getdelays': attribute type 1 has an invalid length.
This warning is introduced in commit 6e237d099fac ("netlink: Relax attr
validation for fixed length types"), which is used to check whether
attributes using types NLA_U* and NLA_S* have an exact length.
Regarding this issue, the root cause is cgroupstats_cmd_get_policy defines
a wrong type as NLA_U32, while it should be NLA_NESTED an its minimal
length is NLA_HDRLEN. That is similar to taskstats_cmd_get_policy.
As this behavior change really breaks our application, we'd better
cc stable as well.
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
kernel/taskstats.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/taskstats.c b/kernel/taskstats.c
index e2ac0e3..b90a520 100644
--- a/kernel/taskstats.c
+++ b/kernel/taskstats.c
@@ -35,8 +35,8 @@
static struct genl_family family;
static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_U32 },
- [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_U32 },
+ [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_NESTED },
+ [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_NESTED },
[TASKSTATS_CMD_ATTR_REGISTER_CPUMASK] = { .type = NLA_STRING },
[TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK] = { .type = NLA_STRING },};
@@ -45,7 +45,7 @@
* Make sure they are always aligned.
*/
static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
+ [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_NESTED },
};
struct listener {
--
1.8.3.1
Hi
[This is an automated email]
This commit has been processed because it contains a -stable tag.
The stable tag indicates that it's relevant for the following trees: all
The bot has tested the following trees: v5.5.11, v5.4.27, v4.19.112, v4.14.174, v4.9.217, v4.4.217.
v5.5.11: Build OK!
v5.4.27: Build OK!
v4.19.112: Failed to apply! Possible dependencies:
03512ceb60ae ("ieee80211: remove redundant leading zeroes")
09b4a4faf9d0 ("mac80211: introduce capability flags for VHT EXT NSS support")
0eeb2b674f05 ("mac80211: add an option for station management TXQ")
1c9559734eca ("mac80211: remove unnecessary key condition")
57a3a454f303 ("iwlwifi: split HE capabilities between AP and STA")
62872a9b9a10 ("mac80211: Fix PTK rekey freezes and clear text leak")
77f7ffdc335d ("mac80211: minstrel_ht: add flag to indicate missing/inaccurate tx A-MPDU length")
77ff2c6b4984 ("mac80211: update HE IEs to D3.3")
80aaa9c16415 ("mac80211: Add he_capa debugfs entry")
96fc6efb9ad9 ("mac80211: IEEE 802.11 Extended Key ID support")
add7453ad62f ("wireless: align to draft 11ax D3.0")
adf8ed01e4fd ("mac80211: add an optional TXQ for other PS-buffered frames")
caf56338c22f ("mac80211: indicate support for multiple BSSID")
daa5b83513a7 ("mac80211: update HE operation fields to D3.0")
v4.14.174: Failed to apply! Possible dependencies:
110b32f065f3 ("iwlwifi: mvm: rs: add basic implementation of the new RS API handlers")
1c73acf58bd6 ("iwlwifi: acpi: move ACPI method definitions to acpi.h")
28e9c00fe1f0 ("iwlwifi: remove upper case letters in sku_capa_band_*_enable")
4ae80f6c8d86 ("iwlwifi: support api ver2 of NVM_GET_INFO resp")
4b82455ca51e ("iwlwifi: use flags to denote modifiers for the channel maps")
4c625c564ba2 ("iwlwifi: get rid of fw/nvm.c")
514c30696fbc ("iwlwifi: add support for IEEE802.11ax")
57a3a454f303 ("iwlwifi: split HE capabilities between AP and STA")
77ff2c6b4984 ("mac80211: update HE IEs to D3.3")
813df5cef3bb ("iwlwifi: acpi: add common code to read from ACPI")
8a6171a7b601 ("iwlwifi: fw: add FW APIs for HE")
9c4f7d512740 ("iwlwifi: move all NVM parsing code to the common files")
9f66a397c877 ("iwlwifi: mvm: rs: add ops for the new rate scaling in the FW")
e7a3b8d87910 ("iwlwifi: acpi: move ACPI-related definitions to acpi.h")
v4.9.217: Failed to apply! Possible dependencies:
01796ff2fa6e ("iwlwifi: mvm: always free inactive queue when moving ownership")
0aaece81114e ("iwlwifi: split firmware API from iwl-trans.h")
1ea423b0e047 ("iwlwifi: remove unnecessary dev_cmd_headroom parameter")
310181ec34e2 ("iwlwifi: move to TVQM mode")
5594d80e9bf4 ("iwlwifi: support two phys for a000 devices")
623e7766be90 ("iwlwifi: pcie: introduce split point to a000 devices")
65e254821cee ("iwlwifi: mvm: use firmware station PM notification for AP_LINK_PS")
6b35ff91572f ("iwlwifi: pcie: introduce a000 TX queues management")
727c02dfb848 ("iwlwifi: pcie: cleanup rfkill checks")
77ff2c6b4984 ("mac80211: update HE IEs to D3.3")
8236f7db2724 ("iwlwifi: mvm: assign cab queue to the correct station")
87d0e1af9db3 ("iwlwifi: mvm: separate queue mapping from queue enablement")
bb49701b41de ("iwlwifi: mvm: support a000 SCD queue configuration")
cf90da352a32 ("iwlwifi: mvm: use mvm_disable_queue instead of sharing logic")
d172a5eff629 ("iwlwifi: reorganize firmware API")
df88c08d5c7e ("iwlwifi: mvm: release static queues on bcast release")
eda50cde58de ("iwlwifi: pcie: add context information support")
v4.4.217: Failed to apply! Possible dependencies:
0aaece81114e ("iwlwifi: split firmware API from iwl-trans.h")
13555e8ba2f4 ("iwlwifi: mvm: add 9000-series RX API")
1a616dd2f171 ("iwlwifi: dump prph registers in a common place for all transports")
1e0bbebaae66 ("mac80211: enable starting BA session with custom timeout")
2f89a5d7d377 ("iwlwifi: mvm: move fw-dbg code to separate file")
39bdb17ebb5b ("iwlwifi: update host command messages to new format")
41837ca962ec ("iwlwifi: pcie: allow to pretend to have Tx CSUM for debug")
4707fde5cdef ("iwlwifi: mvm: use build-time assertion for fw trigger ID")
5b88792cd850 ("iwlwifi: move to wide ID for all commands")
6c4fbcbc1c95 ("iwlwifi: add support for 12K Receive Buffers")
77ff2c6b4984 ("mac80211: update HE IEs to D3.3")
92fe83430b89 ("iwlwifi: uninline iwl_trans_send_cmd")
d172a5eff629 ("iwlwifi: reorganize firmware API")
dcbb4746286a ("iwlwifi: trans: support a callback for ASYNC commands")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
In AIO case, the request is freed up if ep_queue fails.
However, io_data->req still has the reference to this freed
request. In the case of this failure if there is aio_cancel
call on this io_data it will lead to an invalid dequeue
operation and a potential use after free issue.
Fix this by setting the io_data->req to NULL when the request
is freed as part of queue failure.
Fixes: 2e4c7553cd6f ("usb: gadget: f_fs: add aio support")
Signed-off-by: Sriharsha Allenki <sallenki(a)codeaurora.org>
CC: stable <stable(a)vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen(a)nxp.com>
---
Changes in v2:
- As suggested by Greg, updated the tags
drivers/usb/gadget/function/f_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 571917677d35..767f30b86645 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1120,6 +1120,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
if (unlikely(ret)) {
+ io_data->req = NULL;
usb_ep_free_request(ep->ep, req);
goto error_lock;
}
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
It's currently the platform driver's responsibility to initialize the
pointer, dma_parms, for its corresponding struct device. The benefit with
this approach allows us to avoid the initialization and to not waste memory
for the struct device_dma_parameters, as this can be decided on a case by
case basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common platform bus
at the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
drivers/base/platform.c | 1 +
include/linux/platform_device.h | 1 +
2 files changed, 2 insertions(+)
diff --git a/drivers/base/platform.c b/drivers/base/platform.c
index b5ce7b085795..46abbfb52655 100644
--- a/drivers/base/platform.c
+++ b/drivers/base/platform.c
@@ -512,6 +512,7 @@ int platform_device_add(struct platform_device *pdev)
pdev->dev.parent = &platform_bus;
pdev->dev.bus = &platform_bus_type;
+ pdev->dev.dma_parms = &pdev->dma_parms;
switch (pdev->id) {
default:
diff --git a/include/linux/platform_device.h b/include/linux/platform_device.h
index 041bfa412aa0..81900b3cbe37 100644
--- a/include/linux/platform_device.h
+++ b/include/linux/platform_device.h
@@ -25,6 +25,7 @@ struct platform_device {
bool id_auto;
struct device dev;
u64 platform_dma_mask;
+ struct device_dma_parameters dma_parms;
u32 num_resources;
struct resource *resource;
--
2.20.1
This is a note to let you know that I've just added the patch titled
vt: vt_ioctl: fix use-after-free in vt_in_use()
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 7cf64b18b0b96e751178b8d0505d8466ff5a448f Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sat, 21 Mar 2020 20:43:05 -0700
Subject: vt: vt_ioctl: fix use-after-free in vt_in_use()
vt_in_use() dereferences console_driver->ttys[i] without proper locking.
This is broken because the tty can be closed and freed concurrently.
We could fix this by using 'READ_ONCE(console_driver->ttys[i]) != NULL'
and skipping the check of tty_struct::count. But, looking at
console_driver->ttys[i] isn't really appropriate anyway because even if
it is NULL the tty can still be in the process of being closed.
Instead, fix it by making vt_in_use() require console_lock() and check
whether the vt is allocated and has port refcount > 1. This works since
following the patch "vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use
virtual console" the port refcount is incremented while the vt is open.
Reproducer (very unreliable, but it worked for me after a few minutes):
#include <fcntl.h>
#include <linux/vt.h>
int main()
{
int fd, nproc;
struct vt_stat state;
char ttyname[16];
fd = open("/dev/tty10", O_RDONLY);
for (nproc = 1; nproc < 8; nproc *= 2)
fork();
for (;;) {
sprintf(ttyname, "/dev/tty%d", rand() % 8);
close(open(ttyname, O_RDONLY));
ioctl(fd, VT_GETSTATE, &state);
}
}
KASAN report:
BUG: KASAN: use-after-free in vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
BUG: KASAN: use-after-free in vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
Read of size 4 at addr ffff888065722468 by task syz-vt2/132
CPU: 0 PID: 132 Comm: syz-vt2 Not tainted 5.6.0-rc5-00130-g089b6d3654916 #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
[...]
vt_in_use drivers/tty/vt/vt_ioctl.c:48 [inline]
vt_ioctl+0x1ad3/0x1d70 drivers/tty/vt/vt_ioctl.c:657
tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
[...]
Allocated by task 136:
[...]
kzalloc include/linux/slab.h:669 [inline]
alloc_tty_struct+0x96/0x8a0 drivers/tty/tty_io.c:2982
tty_init_dev+0x23/0x350 drivers/tty/tty_io.c:1334
tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
[...]
Freed by task 41:
[...]
kfree+0xbf/0x200 mm/slab.c:3757
free_tty_struct+0x8d/0xb0 drivers/tty/tty_io.c:177
release_one_tty+0x22d/0x2f0 drivers/tty/tty_io.c:1468
process_one_work+0x7f1/0x14b0 kernel/workqueue.c:2264
worker_thread+0x8b/0xc80 kernel/workqueue.c:2410
[...]
Fixes: 4001d7b7fc27 ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <stable(a)vger.kernel.org> # v3.4+
Acked-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20200322034305.210082-3-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt_ioctl.c | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index f62f498f63c0..daf61c28ba76 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -43,9 +43,15 @@ bool vt_dont_switch;
static inline bool vt_in_use(unsigned int i)
{
- extern struct tty_driver *console_driver;
+ const struct vc_data *vc = vc_cons[i].d;
- return console_driver->ttys[i] && console_driver->ttys[i]->count;
+ /*
+ * console_lock must be held to prevent the vc from being deallocated
+ * while we're checking whether it's in-use.
+ */
+ WARN_CONSOLE_UNLOCKED();
+
+ return vc && kref_read(&vc->port.kref) > 1;
}
static inline bool vt_busy(int i)
@@ -643,15 +649,16 @@ int vt_ioctl(struct tty_struct *tty,
struct vt_stat __user *vtstat = up;
unsigned short state, mask;
- /* Review: FIXME: Console lock ? */
if (put_user(fg_console + 1, &vtstat->v_active))
ret = -EFAULT;
else {
state = 1; /* /dev/tty0 is always open */
+ console_lock(); /* required by vt_in_use() */
for (i = 0, mask = 2; i < MAX_NR_CONSOLES && mask;
++i, mask <<= 1)
if (vt_in_use(i))
state |= mask;
+ console_unlock();
ret = put_user(state, &vtstat->v_state);
}
break;
@@ -661,10 +668,11 @@ int vt_ioctl(struct tty_struct *tty,
* Returns the first available (non-opened) console.
*/
case VT_OPENQRY:
- /* FIXME: locking ? - but then this is a stupid API */
+ console_lock(); /* required by vt_in_use() */
for (i = 0; i < MAX_NR_CONSOLES; ++i)
if (!vt_in_use(i))
break;
+ console_unlock();
uival = i < MAX_NR_CONSOLES ? (i+1) : -1;
goto setint;
--
2.26.0
This is a note to let you know that I've just added the patch titled
vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
to my tty git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git
in the tty-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the tty-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From ca4463bf8438b403596edd0ec961ca0d4fbe0220 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)google.com>
Date: Sat, 21 Mar 2020 20:43:04 -0700
Subject: vt: vt_ioctl: fix VT_DISALLOCATE freeing in-use virtual console
The VT_DISALLOCATE ioctl can free a virtual console while tty_release()
is still running, causing a use-after-free in con_shutdown(). This
occurs because VT_DISALLOCATE considers a virtual console's
'struct vc_data' to be unused as soon as the corresponding tty's
refcount hits 0. But actually it may be still being closed.
Fix this by making vc_data be reference-counted via the embedded
'struct tty_port'. A newly allocated virtual console has refcount 1.
Opening it for the first time increments the refcount to 2. Closing it
for the last time decrements the refcount (in tty_operations::cleanup()
so that it happens late enough), as does VT_DISALLOCATE.
Reproducer:
#include <fcntl.h>
#include <linux/vt.h>
#include <sys/ioctl.h>
#include <unistd.h>
int main()
{
if (fork()) {
for (;;)
close(open("/dev/tty5", O_RDWR));
} else {
int fd = open("/dev/tty10", O_RDWR);
for (;;)
ioctl(fd, VT_DISALLOCATE, 5);
}
}
KASAN report:
BUG: KASAN: use-after-free in con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
Write of size 8 at addr ffff88806a4ec108 by task syz_vt/129
CPU: 0 PID: 129 Comm: syz_vt Not tainted 5.6.0-rc2 #11
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20191223_100556-anatol 04/01/2014
Call Trace:
[...]
con_shutdown+0x76/0x80 drivers/tty/vt/vt.c:3278
release_tty+0xa8/0x410 drivers/tty/tty_io.c:1514
tty_release_struct+0x34/0x50 drivers/tty/tty_io.c:1629
tty_release+0x984/0xed0 drivers/tty/tty_io.c:1789
[...]
Allocated by task 129:
[...]
kzalloc include/linux/slab.h:669 [inline]
vc_allocate drivers/tty/vt/vt.c:1085 [inline]
vc_allocate+0x1ac/0x680 drivers/tty/vt/vt.c:1066
con_install+0x4d/0x3f0 drivers/tty/vt/vt.c:3229
tty_driver_install_tty drivers/tty/tty_io.c:1228 [inline]
tty_init_dev+0x94/0x350 drivers/tty/tty_io.c:1341
tty_open_by_driver drivers/tty/tty_io.c:1987 [inline]
tty_open+0x3ca/0xb30 drivers/tty/tty_io.c:2035
[...]
Freed by task 130:
[...]
kfree+0xbf/0x1e0 mm/slab.c:3757
vt_disallocate drivers/tty/vt/vt_ioctl.c:300 [inline]
vt_ioctl+0x16dc/0x1e30 drivers/tty/vt/vt_ioctl.c:818
tty_ioctl+0x9db/0x11b0 drivers/tty/tty_io.c:2660
[...]
Fixes: 4001d7b7fc27 ("vt: push down the tty lock so we can see what is left to tackle")
Cc: <stable(a)vger.kernel.org> # v3.4+
Reported-by: syzbot+522643ab5729b0421998(a)syzkaller.appspotmail.com
Acked-by: Jiri Slaby <jslaby(a)suse.cz>
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Link: https://lore.kernel.org/r/20200322034305.210082-2-ebiggers@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/tty/vt/vt.c | 23 ++++++++++++++++++++++-
drivers/tty/vt/vt_ioctl.c | 12 ++++--------
2 files changed, 26 insertions(+), 9 deletions(-)
diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index bbc26d73209a..309a39197be0 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -1075,6 +1075,17 @@ static void visual_deinit(struct vc_data *vc)
module_put(vc->vc_sw->owner);
}
+static void vc_port_destruct(struct tty_port *port)
+{
+ struct vc_data *vc = container_of(port, struct vc_data, port);
+
+ kfree(vc);
+}
+
+static const struct tty_port_operations vc_port_ops = {
+ .destruct = vc_port_destruct,
+};
+
int vc_allocate(unsigned int currcons) /* return 0 on success */
{
struct vt_notifier_param param;
@@ -1100,6 +1111,7 @@ int vc_allocate(unsigned int currcons) /* return 0 on success */
vc_cons[currcons].d = vc;
tty_port_init(&vc->port);
+ vc->port.ops = &vc_port_ops;
INIT_WORK(&vc_cons[currcons].SAK_work, vc_SAK);
visual_init(vc, currcons, 1);
@@ -3250,6 +3262,7 @@ static int con_install(struct tty_driver *driver, struct tty_struct *tty)
tty->driver_data = vc;
vc->port.tty = tty;
+ tty_port_get(&vc->port);
if (!tty->winsize.ws_row && !tty->winsize.ws_col) {
tty->winsize.ws_row = vc_cons[currcons].d->vc_rows;
@@ -3285,6 +3298,13 @@ static void con_shutdown(struct tty_struct *tty)
console_unlock();
}
+static void con_cleanup(struct tty_struct *tty)
+{
+ struct vc_data *vc = tty->driver_data;
+
+ tty_port_put(&vc->port);
+}
+
static int default_color = 7; /* white */
static int default_italic_color = 2; // green (ASCII)
static int default_underline_color = 3; // cyan (ASCII)
@@ -3410,7 +3430,8 @@ static const struct tty_operations con_ops = {
.throttle = con_throttle,
.unthrottle = con_unthrottle,
.resize = vt_resize,
- .shutdown = con_shutdown
+ .shutdown = con_shutdown,
+ .cleanup = con_cleanup,
};
static struct cdev vc0_cdev;
diff --git a/drivers/tty/vt/vt_ioctl.c b/drivers/tty/vt/vt_ioctl.c
index 7297997fcf04..f62f498f63c0 100644
--- a/drivers/tty/vt/vt_ioctl.c
+++ b/drivers/tty/vt/vt_ioctl.c
@@ -310,10 +310,8 @@ static int vt_disallocate(unsigned int vc_num)
vc = vc_deallocate(vc_num);
console_unlock();
- if (vc && vc_num >= MIN_NR_CONSOLES) {
- tty_port_destroy(&vc->port);
- kfree(vc);
- }
+ if (vc && vc_num >= MIN_NR_CONSOLES)
+ tty_port_put(&vc->port);
return ret;
}
@@ -333,10 +331,8 @@ static void vt_disallocate_all(void)
console_unlock();
for (i = 1; i < MAX_NR_CONSOLES; i++) {
- if (vc[i] && i >= MIN_NR_CONSOLES) {
- tty_port_destroy(&vc[i]->port);
- kfree(vc[i]);
- }
+ if (vc[i] && i >= MIN_NR_CONSOLES)
+ tty_port_put(&vc[i]->port);
}
}
--
2.26.0
Hi Greg at al,
Please the following commit:
commit 024aa8732acb7d2503eae43c3fe3504d0a8646d0
Author: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
Date: Thu Nov 28 23:50:40 2019 +0100
ACPI: PM: s2idle: Rework ACPI events synchronization
Note that the EC GPE processing need not be synchronized in
acpi_s2idle_wake() after invoking acpi_ec_dispatch_gpe(), because
that function checks the GPE status and dispatches its handler if
need be and the SCI action handler is not going to run anyway at
that point.
Moreover, it is better to drain all of the pending ACPI events
before restoring the working-state configuration of GPEs in
acpi_s2idle_restore(), because those events are likely to be related
to system wakeup, in which case they will not be relevant going
forward.
Rework the code to take these observations into account.
Tested-by: Kenneth R. Crudup <kenny(a)panix.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
into 5.4.y as it is needed to fix an ACPI wakeup events
synchronization issue in suspend-to-idle.
Note that this commit was present in 5.5.0, so 5.4.y is the only
-stable series needing this fix.
Cheers,
Rafael
GPIO lines for the CM36651 sensor I2C bus use the normal not the inverted
polarity. This bug has been there since adding the CM36651 sensor by
commit 85cb4e0bd2294, but went unnoticed because the "i2c-gpio" driver
ignored the GPIO polarity specified in the device-tree.
The recent conversion of "i2c-gpio" driver to the new, descriptor based
GPIO API, automatically made it the DT-specified polarity aware, what
broke the CM36651 sensor operation.
Fixes: c769eaf7a85d ("ARM: dts: exynos: Split Trats2 DTS in preparation for Midas boards")
Fixes: c10d3290cbde ("ARM: dts: Use GPIO constants for flags cells in exynos4412 boards")
Fixes: 85cb4e0bd229 ("ARM: dts: add cm36651 light/proximity sensor node for exynos4412-trats2")
CC: stable(a)vger.kernel.org # 4.16+
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
arch/arm/boot/dts/exynos4412-galaxy-s3.dtsi | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/boot/dts/exynos4412-galaxy-s3.dtsi b/arch/arm/boot/dts/exynos4412-galaxy-s3.dtsi
index 44f97546dd0a..f910aa924bfb 100644
--- a/arch/arm/boot/dts/exynos4412-galaxy-s3.dtsi
+++ b/arch/arm/boot/dts/exynos4412-galaxy-s3.dtsi
@@ -68,7 +68,7 @@
i2c_cm36651: i2c-gpio-2 {
compatible = "i2c-gpio";
- gpios = <&gpf0 0 GPIO_ACTIVE_LOW>, <&gpf0 1 GPIO_ACTIVE_LOW>;
+ gpios = <&gpf0 0 GPIO_ACTIVE_HIGH>, <&gpf0 1 GPIO_ACTIVE_HIGH>;
i2c-gpio,delay-us = <2>;
#address-cells = <1>;
#size-cells = <0>;
--
2.17.1
Generic protection fault type kernel panic is observed when user
performs soft(ordered) HBA unplug operation while IOs are running
on drives connected to HBA.
When user performs ordered HBA removal operation then kernel calls
PCI device's .remove() call back function where driver is flushing out
all the outstanding SCSI IO commands with DID_NO_CONNECT host byte and
also un-maps sg buffers allocated for these IO commands.
But in the ordered HBA removal case (unlike of real HBA hot unplug)
HBA device is still alive and hence HBA hardware is performing the
DMA operations to those buffers on the system memory which are already
unmapped while flushing out the outstanding SCSI IO commands
and this leads to Kernel panic.
Fix:
Don't flush out the outstanding IOs from .remove() path in case of
ordered HBA removal since HBA will be still alive in this case and
it can complete the outstanding IOs. Flush out the outstanding IOs
only in case physical HBA hot unplug where their won't be any
communication with the HBA.
Cc: stable(a)vger.kernel.org
Signed-off-by: Sreekanth Reddy <sreekanth.reddy(a)broadcom.com>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 778d5e6..04a40af 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -9908,8 +9908,8 @@ static void scsih_remove(struct pci_dev *pdev)
ioc->remove_host = 1;
- mpt3sas_wait_for_commands_to_complete(ioc);
- _scsih_flush_running_cmds(ioc);
+ if (!pci_device_is_present(pdev))
+ _scsih_flush_running_cmds(ioc);
_scsih_fw_event_cleanup_queue(ioc);
@@ -9992,8 +9992,8 @@ static void scsih_remove(struct pci_dev *pdev)
ioc->remove_host = 1;
- mpt3sas_wait_for_commands_to_complete(ioc);
- _scsih_flush_running_cmds(ioc);
+ if (!pci_device_is_present(pdev))
+ _scsih_flush_running_cmds(ioc);
_scsih_fw_event_cleanup_queue(ioc);
--
1.8.3.1
Fixes the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
The crash is due to NULL dereference at
test_bit(idx, ms->usage->subsection_map); due to ms->usage = NULL; in pfn_section_valid()
With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
section_mem_map is set to NULL after depopulate_section_mem(). This
was done so that pfn_page() can work correctly with kernel config that disables
SPARSEMEM_VMEMMAP. With that config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is used to
check the pfn validity (pfn_valid()). Since section_deactivate release
mem_section->usage if a section is fully deactivated, pfn_valid() check after
a subsection_deactivate cause a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
For architectures like ppc64 where large pages are used for vmmemap mapping (16MB),
a specific vmemmap mapping can cover multiple sections. Hence before a vmemmap
mapping page can be freed, the kernel needs to make sure there are no valid sections
within that mapping. Clearing the section valid bit before
depopulate_section_memap enables this.
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
---
mm/sparse.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/mm/sparse.c b/mm/sparse.c
index aadb7298dcef..65599e8bd636 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -781,6 +781,12 @@ static void section_deactivate(unsigned long pfn, unsigned long nr_pages,
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
--
2.25.1
This is a note to let you know that I've just added the patch titled
staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 1165dd73e811a07d947aee218510571f516081f6 Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Thu, 26 Mar 2020 21:18:50 +0800
Subject: staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
We can't handle the case length > WLAN_DATA_MAXLEN.
Because the size of rxfrm->data is WLAN_DATA_MAXLEN(2312), and we can't
read more than that.
Thanks-to: Hillf Danton <hdanton(a)sina.com>
Reported-and-tested-by: syzbot+7d42d68643a35f71ac8a(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200326131850.17711-1-hqjagain@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wlan-ng/hfa384x_usb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/wlan-ng/hfa384x_usb.c b/drivers/staging/wlan-ng/hfa384x_usb.c
index e38acb58cd5e..fa1bf8b069fd 100644
--- a/drivers/staging/wlan-ng/hfa384x_usb.c
+++ b/drivers/staging/wlan-ng/hfa384x_usb.c
@@ -3376,6 +3376,8 @@ static void hfa384x_int_rxmonitor(struct wlandevice *wlandev,
WLAN_HDR_A4_LEN + WLAN_DATA_MAXLEN + WLAN_CRC_LEN)) {
pr_debug("overlen frm: len=%zd\n",
skblen - sizeof(struct p80211_caphdr));
+
+ return;
}
skb = dev_alloc_skb(skblen);
--
2.26.0
This is a note to let you know that I've just added the patch titled
usb: gadget: f_fs: Fix use after free issue as part of queue failure
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From f63ec55ff904b2f2e126884fcad93175f16ab4bb Mon Sep 17 00:00:00 2001
From: Sriharsha Allenki <sallenki(a)codeaurora.org>
Date: Thu, 26 Mar 2020 17:26:20 +0530
Subject: usb: gadget: f_fs: Fix use after free issue as part of queue failure
In AIO case, the request is freed up if ep_queue fails.
However, io_data->req still has the reference to this freed
request. In the case of this failure if there is aio_cancel
call on this io_data it will lead to an invalid dequeue
operation and a potential use after free issue.
Fix this by setting the io_data->req to NULL when the request
is freed as part of queue failure.
Fixes: 2e4c7553cd6f ("usb: gadget: f_fs: add aio support")
Signed-off-by: Sriharsha Allenki <sallenki(a)codeaurora.org>
CC: stable <stable(a)vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen(a)nxp.com>
Link: https://lore.kernel.org/r/20200326115620.12571-1-sallenki@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 571917677d35..767f30b86645 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1120,6 +1120,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
if (unlikely(ret)) {
+ io_data->req = NULL;
usb_ep_free_request(ep->ep, req);
goto error_lock;
}
--
2.26.0
This is a note to let you know that I've just added the patch titled
USB: serial: io_edgeport: fix slab-out-of-bounds read in
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 57aa9f294b09463492f604feaa5cc719beaace32 Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Wed, 25 Mar 2020 15:52:37 +0800
Subject: USB: serial: io_edgeport: fix slab-out-of-bounds read in
edge_interrupt_callback
Fix slab-out-of-bounds read in the interrupt-URB completion handler.
The boundary condition should be (length - 1) as we access
data[position + 1].
Reported-and-tested-by: syzbot+37ba33391ad5f3935bbd(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/io_edgeport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/serial/io_edgeport.c b/drivers/usb/serial/io_edgeport.c
index 5737add6a2a4..4cca0b836f43 100644
--- a/drivers/usb/serial/io_edgeport.c
+++ b/drivers/usb/serial/io_edgeport.c
@@ -710,7 +710,7 @@ static void edge_interrupt_callback(struct urb *urb)
/* grab the txcredits for the ports if available */
position = 2;
portNumber = 0;
- while ((position < length) &&
+ while ((position < length - 1) &&
(portNumber < edge_serial->serial->num_ports)) {
txCredits = data[position] | (data[position+1] << 8);
if (txCredits) {
--
2.26.0
On Thu, 2020-03-26 at 13:08 -0700, Andrew Morton wrote:
> > After our server is upgraded to a newer kernel, we found that it
> > continuesly print a warning in the kernel message. The warning is,
> > [832984.946322] netlink: 'irmas.lc': attribute type 1 has an invalid length.
> >
> > irmas.lc is one of our container monitor daemons, and it will use
> > CGROUPSTATS_CMD_GET to get the cgroupstats, that is similar with
> > tools/accounting/getdelays.c. We can also produce this warning with
> > getdelays. For example, after running bellow command
> > $ ./getdelays -C /sys/fs/cgroup/memory
> > then you can find a warning in dmesg,
> > [61607.229318] netlink: 'getdelays': attribute type 1 has an invalid length.
> >
> > This warning is introduced in commit 6e237d099fac ("netlink: Relax attr
> > validation for fixed length types"), which is used to check whether
> > attributes using types NLA_U* and NLA_S* have an exact length.
> >
> > Regarding this issue, the root cause is cgroupstats_cmd_get_policy defines
> > a wrong type as NLA_U32, while it should be NLA_NESTED an its minimal
> > length is NLA_HDRLEN. That is similar to taskstats_cmd_get_policy.
> >
> > As this behavior change really breaks our application, we'd better
> > cc stable as well.
Can you explain how it breaks the application? I mean, it's really only
printing a message to the kernel log in this case? At least that's what
you're describing.
I think you may be describing it wrong, because an NLA_NESTED is allowed
to be *empty* (but otherwise must have at least 4 bytes just like an
NLA_U32).
That said, I'm not even sure I agree that this fix is right? See below.
> Is it correct to say that although the code has always been incorrect,
> but only kernels after 6e237d099fac need this change? If so, I'll add
> Fixes:6e237d099fac to guide the -stable backporting.
That doesn't really seem right - 6e237d099fac *relaxed* the checks. If
anything then it ought to point to 28033ae4e0f5 which may have actually
returned an error; but again, need to understand better what really the
issue is.
> > diff --git a/kernel/taskstats.c b/kernel/taskstats.c
> > index e2ac0e3..b90a520 100644
> > --- a/kernel/taskstats.c
> > +++ b/kernel/taskstats.c
> > @@ -35,8 +35,8 @@
> > static struct genl_family family;
> >
> > static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
> > - [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_U32 },
> > - [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_U32 },
> > + [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_NESTED },
> > + [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_NESTED },
I'm not sure where this is coming from - the kernel evidently uses them
as nested attributes in *outgoing* data (see mk_reply()), but as NLA_U32
in *incoming* data, (see cmd_attr_pid() and cmd_attr_tgid()).
I would generally recommend not doing such a thing as it's messy, but we
do have quite a few such instances cases. In all those cases must the
policy list the incoming policy since that's what the kernel uses to
validate the attributes.
IOW, this part of the change seems _wrong_.
> > * Make sure they are always aligned.
> > */
> > static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
> > - [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
> > + [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_NESTED },
> > };
And same here, actually.
johannes
On Thu, 2020-03-26 at 13:08 -0700, Andrew Morton wrote:
> (cc's added)
>
> On Wed, 25 Mar 2020 22:50:42 -0400 Yafang Shao <laoar.shao(a)gmail.com> wrote:
>
> > After our server is upgraded to a newer kernel, we found that it
> > continuesly print a warning in the kernel message. The warning is,
> > [832984.946322] netlink: 'irmas.lc': attribute type 1 has an invalid length.
> >
> > irmas.lc is one of our container monitor daemons, and it will use
> > CGROUPSTATS_CMD_GET to get the cgroupstats, that is similar with
> > tools/accounting/getdelays.c. We can also produce this warning with
> > getdelays. For example, after running bellow command
> > $ ./getdelays -C /sys/fs/cgroup/memory
> > then you can find a warning in dmesg,
> > [61607.229318] netlink: 'getdelays': attribute type 1 has an invalid length.
And looking at this ... well, that code is completely wrong?
E.g.
rc = send_cmd(nl_sd, id, mypid, TASKSTATS_CMD_GET,
cmd_type, &tid, sizeof(__u32));
(cmd_type is one of TASKSTATS_CMD_ATTR_TGID, TASKSTATS_CMD_ATTR_PID)
or it might do
rc = send_cmd(nl_sd, id, mypid, CGROUPSTATS_CMD_GET,
CGROUPSTATS_CMD_ATTR_FD, &cfd, sizeof(__u32));
so clearly it wants to produce a u32 attribute.
But then
static int send_cmd(int sd, __u16 nlmsg_type, __u32 nlmsg_pid,
__u8 genl_cmd, __u16 nla_type,
void *nla_data, int nla_len)
{
...
na = (struct nlattr *) GENLMSG_DATA(&msg);
// this is still fine
na->nla_type = nla_type;
// this is also fine
na->nla_len = nla_len + 1 + NLA_HDRLEN;
// but this??? the nla_len of a netlink attribute should just be
// the len ... what's NLA_HDRLEN doing here? this isn't nested
// here we end up just reserving 1+NLA_HDRLEN too much space
memcpy(NLA_DATA(na), nla_data, nla_len);
// but then it anyway only fills the first nla_len bytes, which
// is just like a regular attribute.
msg.n.nlmsg_len += NLMSG_ALIGN(na->nla_len);
// note that this is also wrong - it should be
// += NLA_ALIGN(NLA_HDRLEN + nla_len)
So really I think what happened here is precisely what we wanted -
David's kernel patch caught the broken userspace tool.
johannes
From: Kaike Wan <kaike.wan(a)intel.com>
When the hfi1 driver is unloaded, kmemleak will report the following
issue:
unreferenced object 0xffff8888461a4c08 (size 8):
comm "kworker/0:0", pid 5, jiffies 4298601264 (age 2047.134s)
hex dump (first 8 bytes):
73 64 6d 61 30 00 ff ff sdma0...
backtrace:
[<00000000311a6ef5>] kvasprintf+0x62/0xd0
[<00000000ade94d9f>] kobject_set_name_vargs+0x1c/0x90
[<0000000060657dbb>] kobject_init_and_add+0x5d/0xb0
[<00000000346fe72b>] 0xffffffffa0c5ecba
[<000000006cfc5819>] 0xffffffffa0c866b9
[<0000000031c65580>] 0xffffffffa0c38e87
[<00000000e9739b3f>] local_pci_probe+0x41/0x80
[<000000006c69911d>] work_for_cpu_fn+0x16/0x20
[<00000000601267b5>] process_one_work+0x171/0x380
[<0000000049a0eefa>] worker_thread+0x1d1/0x3f0
[<00000000909cf2b9>] kthread+0xf8/0x130
[<0000000058f5f874>] ret_from_fork+0x35/0x40
This patch fixes the issue by:
- Releasing dd->per_sdma[i].kobject in hfi1_unregister_sysfs().
- This will fix the memory leak.
- Calling kobject_put() to unwind operations only for those entries in
dd->per_sdma[] whose operations have succeeded (including the current
one that has just failed) in hfi1_verbs_register_sysfs().
Fixes: 0cb2aa690c7e ("IB/hfi1: Add sysfs interface for affinity setup")
Cc: <stable(a)vger.kernel.org>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn(a)intel.com>
Signed-off-by: Kaike Wan <kaike.wan(a)intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro(a)intel.com>
---
drivers/infiniband/hw/hfi1/sysfs.c | 13 +++++++++++--
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/sysfs.c b/drivers/infiniband/hw/hfi1/sysfs.c
index 90f62c4..f1bcecf 100644
--- a/drivers/infiniband/hw/hfi1/sysfs.c
+++ b/drivers/infiniband/hw/hfi1/sysfs.c
@@ -853,8 +853,13 @@ int hfi1_verbs_register_sysfs(struct hfi1_devdata *dd)
return 0;
bail:
- for (i = 0; i < dd->num_sdma; i++)
- kobject_del(&dd->per_sdma[i].kobj);
+ /*
+ * The function kobject_put() will call kobject_del() if the kobject
+ * has been added successfully. The sysfs files created under the
+ * kobject directory will also be removed during the process.
+ */
+ for (; i >= 0; i--)
+ kobject_put(&dd->per_sdma[i].kobj);
return ret;
}
@@ -867,6 +872,10 @@ void hfi1_verbs_unregister_sysfs(struct hfi1_devdata *dd)
struct hfi1_pportdata *ppd;
int i;
+ /* Unwind operations in hfi1_verbs_register_sysfs() */
+ for (i = 0; i < dd->num_sdma; i++)
+ kobject_put(&dd->per_sdma[i].kobj);
+
for (i = 0; i < dd->num_pports; i++) {
ppd = &dd->pport[i];
From: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
[ Upstream commit 3b36b13d5e69d6f51ff1c55d1b404a74646c9757 ]
Commit 317d9313925c ("ALSA: hda/realtek - Set default power save node to
0") makes the ALC225 have pop noise on S3 resume and cold boot.
So partially revert this commit for ALC225 to fix the regression.
Fixes: 317d9313925c ("ALSA: hda/realtek - Set default power save node to 0")
BugLink: https://bugs.launchpad.net/bugs/1866357
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Link: https://lore.kernel.org/r/20200311061328.17614-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/pci/hda/patch_realtek.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index 55bae9e6de27d..76cf438aa339c 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6333,6 +6333,8 @@ static int patch_alc269(struct hda_codec *codec)
alc_update_coef_idx(codec, 0x36, 1 << 13, 1 << 5); /* Switch pcbeep path to Line in path*/
break;
case 0x10ec0225:
+ codec->power_save_node = 1;
+ /* fall through */
case 0x10ec0295:
case 0x10ec0299:
spec->codec_variant = ALC269_TYPE_ALC225;
--
2.20.1
From: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
[ Upstream commit 3b36b13d5e69d6f51ff1c55d1b404a74646c9757 ]
Commit 317d9313925c ("ALSA: hda/realtek - Set default power save node to
0") makes the ALC225 have pop noise on S3 resume and cold boot.
So partially revert this commit for ALC225 to fix the regression.
Fixes: 317d9313925c ("ALSA: hda/realtek - Set default power save node to 0")
BugLink: https://bugs.launchpad.net/bugs/1866357
Signed-off-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Link: https://lore.kernel.org/r/20200311061328.17614-1-kai.heng.feng@canonical.com
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/pci/hda/patch_realtek.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index a64612db1f155..d7b33c9c9f40b 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6410,6 +6410,8 @@ static int patch_alc269(struct hda_codec *codec)
spec->gen.mixer_nid = 0;
break;
case 0x10ec0225:
+ codec->power_save_node = 1;
+ /* fall through */
case 0x10ec0295:
spec->codec_variant = ALC269_TYPE_ALC225;
break;
--
2.20.1
The patch titled
Subject: kernel/taskstats: fix wrong nla type for {cgroup,task}stats policy
has been removed from the -mm tree. Its filename was
kernel-taskstats-fix-wrong-nla-type-for-cgrouptaskstats-policy.patch
This patch was dropped because it was nacked
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: kernel/taskstats: fix wrong nla type for {cgroup,task}stats policy
After our server is upgraded to a newer kernel, we found that it
continuesly print a warning in the kernel message. The warning is,
[832984.946322] netlink: 'irmas.lc': attribute type 1 has an invalid length.
irmas.lc is one of our container monitor daemons, and it will use
CGROUPSTATS_CMD_GET to get the cgroupstats, that is similar with
tools/accounting/getdelays.c. We can also produce this warning with
getdelays. For example, after running below command
$ ./getdelays -C /sys/fs/cgroup/memory
then you can find a warning in dmesg,
[61607.229318] netlink: 'getdelays': attribute type 1 has an invalid length.
This warning is introduced in commit 6e237d099fac ("netlink: Relax attr
validation for fixed length types"), which is used to check whether
attributes using types NLA_U* and NLA_S* have an exact length.
Regarding this issue, the root cause is cgroupstats_cmd_get_policy defines
a wrong type as NLA_U32, while it should be NLA_NESTED an its minimal
length is NLA_HDRLEN. That is similar to taskstats_cmd_get_policy.
As this behavior change really breaks our application, we'd better cc
stable as well.
Link: http://lkml.kernel.org/r/1585191042-9935-1-git-send-email-laoar.shao@gmail.…
Fixes: 6e237d099fac ("netlink: Relax attr validation for fixed length types")
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: David Ahern <dsahern(a)gmail.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/taskstats.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/taskstats.c~kernel-taskstats-fix-wrong-nla-type-for-cgrouptaskstats-policy
+++ a/kernel/taskstats.c
@@ -35,8 +35,8 @@ struct kmem_cache *taskstats_cache;
static struct genl_family family;
static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_U32 },
- [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_U32 },
+ [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_NESTED },
+ [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_NESTED },
[TASKSTATS_CMD_ATTR_REGISTER_CPUMASK] = { .type = NLA_STRING },
[TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK] = { .type = NLA_STRING },};
@@ -45,7 +45,7 @@ static const struct nla_policy taskstats
* Make sure they are always aligned.
*/
static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
+ [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_NESTED },
};
struct listener {
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
mm-memcg-fix-build-error-around-the-usage-of-kmem_caches.patch
The patch titled
Subject: kernel/taskstats: fix wrong nla type for {cgroup,task}stats policy
has been added to the -mm tree. Its filename is
kernel-taskstats-fix-wrong-nla-type-for-cgrouptaskstats-policy.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/kernel-taskstats-fix-wrong-nla-typ…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/kernel-taskstats-fix-wrong-nla-typ…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Yafang Shao <laoar.shao(a)gmail.com>
Subject: kernel/taskstats: fix wrong nla type for {cgroup,task}stats policy
After our server is upgraded to a newer kernel, we found that it
continuesly print a warning in the kernel message. The warning is,
[832984.946322] netlink: 'irmas.lc': attribute type 1 has an invalid length.
irmas.lc is one of our container monitor daemons, and it will use
CGROUPSTATS_CMD_GET to get the cgroupstats, that is similar with
tools/accounting/getdelays.c. We can also produce this warning with
getdelays. For example, after running bellow command
$ ./getdelays -C /sys/fs/cgroup/memory
then you can find a warning in dmesg,
[61607.229318] netlink: 'getdelays': attribute type 1 has an invalid length.
This warning is introduced in commit 6e237d099fac ("netlink: Relax attr
validation for fixed length types"), which is used to check whether
attributes using types NLA_U* and NLA_S* have an exact length.
Regarding this issue, the root cause is cgroupstats_cmd_get_policy defines
a wrong type as NLA_U32, while it should be NLA_NESTED an its minimal
length is NLA_HDRLEN. That is similar to taskstats_cmd_get_policy.
As this behavior change really breaks our application, we'd better cc
stable as well.
Link: http://lkml.kernel.org/r/1585191042-9935-1-git-send-email-laoar.shao@gmail.…
Signed-off-by: Yafang Shao <laoar.shao(a)gmail.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: David Ahern <dsahern(a)gmail.com>
Cc: David S. Miller <davem(a)davemloft.net>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/taskstats.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/kernel/taskstats.c~kernel-taskstats-fix-wrong-nla-type-for-cgrouptaskstats-policy
+++ a/kernel/taskstats.c
@@ -35,8 +35,8 @@ struct kmem_cache *taskstats_cache;
static struct genl_family family;
static const struct nla_policy taskstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_U32 },
- [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_U32 },
+ [TASKSTATS_CMD_ATTR_PID] = { .type = NLA_NESTED },
+ [TASKSTATS_CMD_ATTR_TGID] = { .type = NLA_NESTED },
[TASKSTATS_CMD_ATTR_REGISTER_CPUMASK] = { .type = NLA_STRING },
[TASKSTATS_CMD_ATTR_DEREGISTER_CPUMASK] = { .type = NLA_STRING },};
@@ -45,7 +45,7 @@ static const struct nla_policy taskstats
* Make sure they are always aligned.
*/
static const struct nla_policy cgroupstats_cmd_get_policy[TASKSTATS_CMD_ATTR_MAX+1] = {
- [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_U32 },
+ [CGROUPSTATS_CMD_ATTR_FD] = { .type = NLA_NESTED },
};
struct listener {
_
Patches currently in -mm which might be from laoar.shao(a)gmail.com are
kernel-taskstats-fix-wrong-nla-type-for-cgrouptaskstats-policy.patch
mm-memcg-fix-build-error-around-the-usage-of-kmem_caches.patch
The patch titled
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
has been added to the -mm tree. Its filename is
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-sparse-fix-kernel-crash-with-pf…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-sparse-fix-kernel-crash-with-pf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
add comment
Link: http://lkml.kernel.org/r/20200326133235.343616-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reviewed-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2
+++ a/mm/sparse.c
@@ -781,7 +781,11 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
- /* Mark the section invalid */
+ /*
+ * Mark the section invalid so that valid_section()
+ * return false. This prevents code from dereferencing
+ * ms->usage array.
+ */
ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check-v2.patch
The patch titled
Subject: libfs: fix infoleak in simple_attr_read()
has been removed from the -mm tree. Its filename was
libfs-fix-infoleak-in-simple_attr_read.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Eric Biggers <ebiggers(a)google.com>
Subject: libfs: fix infoleak in simple_attr_read()
Reading from a debugfs file at a nonzero position, without first reading
at position 0, leaks uninitialized memory to userspace.
It's a bit tricky to do this, since lseek() and pread() aren't allowed on
these files, and write() doesn't update the position on them. But writing
to them with splice() *does* update the position:
#define _GNU_SOURCE 1
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
int main()
{
int pipes[2], fd, n, i;
char buf[32];
pipe(pipes);
write(pipes[1], "0", 1);
fd = open("/sys/kernel/debug/fault_around_bytes", O_RDWR);
splice(pipes[0], NULL, fd, NULL, 1, 0);
n = read(fd, buf, sizeof(buf));
for (i = 0; i < n; i++)
printf("%02x", buf[i]);
printf("
");
}
Output:
5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a5a30
Fix the infoleak by making simple_attr_read() always fill
simple_attr::get_buf if it hasn't been filled yet.
Link: http://lkml.kernel.org/r/20200308023849.988264-1-ebiggers@kernel.org
Fixes: acaefc25d21f ("[PATCH] libfs: add simple attribute files")
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reported-by: syzbot+fcab69d1ada3e8d6f06b(a)syzkaller.appspotmail.com
Reported-by: Alexander Potapenko <glider(a)google.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael(a)kernel.org>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/libfs.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/fs/libfs.c~libfs-fix-infoleak-in-simple_attr_read
+++ a/fs/libfs.c
@@ -891,7 +891,7 @@ int simple_attr_open(struct inode *inode
{
struct simple_attr *attr;
- attr = kmalloc(sizeof(*attr), GFP_KERNEL);
+ attr = kzalloc(sizeof(*attr), GFP_KERNEL);
if (!attr)
return -ENOMEM;
@@ -931,9 +931,11 @@ ssize_t simple_attr_read(struct file *fi
if (ret)
return ret;
- if (*ppos) { /* continued read */
+ if (*ppos && attr->get_buf[0]) {
+ /* continued read */
size = strlen(attr->get_buf);
- } else { /* first read */
+ } else {
+ /* first read */
u64 val;
ret = attr->get(attr->data, &val);
if (ret)
_
Patches currently in -mm which might be from ebiggers(a)google.com are
kmod-make-request_module-return-an-error-when-autoloading-is-disabled.patch
fs-filesystemsc-downgrade-user-reachable-warn_once-to-pr_warn_once.patch
docs-admin-guide-document-the-kernelmodprobe-sysctl.patch
selftests-kmod-fix-handling-test-numbers-above-9.patch
selftests-kmod-test-disabling-module-autoloading.patch
The patch titled
Subject: x86/mm: split vmalloc_sync_all()
has been removed from the -mm tree. Its filename was
x86-mm-split-vmalloc_sync_all.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Joerg Roedel <jroedel(a)suse.de>
Subject: x86/mm: split vmalloc_sync_all()
Commit 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in
__purge_vmap_area_lazy()") introduced a call to vmalloc_sync_all() in the
vunmap() code-path. While this change was necessary to maintain
correctness on x86-32-pae kernels, it also adds additional cycles for
architectures that don't need it.
Specifically on x86-64 with CONFIG_VMAP_STACK=y some people reported
severe performance regressions in micro-benchmarks because it now also
calls the x86-64 implementation of vmalloc_sync_all() on vunmap(). But
the vmalloc_sync_all() implementation on x86-64 is only needed for newly
created mappings.
To avoid the unnecessary work on x86-64 and to gain the performance back,
split up vmalloc_sync_all() into two functions:
* vmalloc_sync_mappings(), and
* vmalloc_sync_unmappings()
Most call-sites to vmalloc_sync_all() only care about new mappings being
synchronized. The only exception is the new call-site added in the above
mentioned commit.
Shile Zhang directed us to a report of an 80% regression in reaim
throughput.
Link: http://lkml.kernel.org/r/20191009124418.8286-1-joro@8bytes.org
Link: https://lists.01.org/hyperkitty/list/lkp@lists.01.org/thread/4D3JPPHBNOSPFK…
Link: http://lkml.kernel.org/r/20191113095530.228959-1-shile.zhang@linux.alibaba.…
Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
Signed-off-by: Joerg Roedel <jroedel(a)suse.de>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Reported-by: Shile Zhang <shile.zhang(a)linux.alibaba.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> [GHES]
Tested-by: Borislav Petkov <bp(a)suse.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/fault.c | 26 ++++++++++++++++++++++++--
drivers/acpi/apei/ghes.c | 2 +-
include/linux/vmalloc.h | 5 +++--
kernel/notifier.c | 2 +-
mm/nommu.c | 10 +++++++---
mm/vmalloc.c | 11 +++++++----
6 files changed, 43 insertions(+), 13 deletions(-)
--- a/arch/x86/mm/fault.c~x86-mm-split-vmalloc_sync_all
+++ a/arch/x86/mm/fault.c
@@ -190,7 +190,7 @@ static inline pmd_t *vmalloc_sync_one(pg
return pmd_k;
}
-void vmalloc_sync_all(void)
+static void vmalloc_sync(void)
{
unsigned long address;
@@ -217,6 +217,16 @@ void vmalloc_sync_all(void)
}
}
+void vmalloc_sync_mappings(void)
+{
+ vmalloc_sync();
+}
+
+void vmalloc_sync_unmappings(void)
+{
+ vmalloc_sync();
+}
+
/*
* 32-bit:
*
@@ -319,11 +329,23 @@ out:
#else /* CONFIG_X86_64: */
-void vmalloc_sync_all(void)
+void vmalloc_sync_mappings(void)
{
+ /*
+ * 64-bit mappings might allocate new p4d/pud pages
+ * that need to be propagated to all tasks' PGDs.
+ */
sync_global_pgds(VMALLOC_START & PGDIR_MASK, VMALLOC_END);
}
+void vmalloc_sync_unmappings(void)
+{
+ /*
+ * Unmappings never allocate or free p4d/pud pages.
+ * No work is required here.
+ */
+}
+
/*
* 64-bit:
*
--- a/drivers/acpi/apei/ghes.c~x86-mm-split-vmalloc_sync_all
+++ a/drivers/acpi/apei/ghes.c
@@ -171,7 +171,7 @@ int ghes_estatus_pool_init(int num_ghes)
* New allocation must be visible in all pgd before it can be found by
* an NMI allocating from the pool.
*/
- vmalloc_sync_all();
+ vmalloc_sync_mappings();
rc = gen_pool_add(ghes_estatus_pool, addr, PAGE_ALIGN(len), -1);
if (rc)
--- a/include/linux/vmalloc.h~x86-mm-split-vmalloc_sync_all
+++ a/include/linux/vmalloc.h
@@ -141,8 +141,9 @@ extern int remap_vmalloc_range_partial(s
extern int remap_vmalloc_range(struct vm_area_struct *vma, void *addr,
unsigned long pgoff);
-void vmalloc_sync_all(void);
-
+void vmalloc_sync_mappings(void);
+void vmalloc_sync_unmappings(void);
+
/*
* Lowlevel-APIs (not for driver use!)
*/
--- a/kernel/notifier.c~x86-mm-split-vmalloc_sync_all
+++ a/kernel/notifier.c
@@ -519,7 +519,7 @@ NOKPROBE_SYMBOL(notify_die);
int register_die_notifier(struct notifier_block *nb)
{
- vmalloc_sync_all();
+ vmalloc_sync_mappings();
return atomic_notifier_chain_register(&die_chain, nb);
}
EXPORT_SYMBOL_GPL(register_die_notifier);
--- a/mm/nommu.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/nommu.c
@@ -370,10 +370,14 @@ void vm_unmap_aliases(void)
EXPORT_SYMBOL_GPL(vm_unmap_aliases);
/*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement a stub for vmalloc_sync_[un]mapping() if the architecture
+ * chose not to have one.
*/
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
+{
+}
+
+void __weak vmalloc_sync_unmappings(void)
{
}
--- a/mm/vmalloc.c~x86-mm-split-vmalloc_sync_all
+++ a/mm/vmalloc.c
@@ -1295,7 +1295,7 @@ static bool __purge_vmap_area_lazy(unsig
* First make sure the mappings are removed from all page-tables
* before they are freed.
*/
- vmalloc_sync_all();
+ vmalloc_sync_unmappings();
/*
* TODO: to calculate a flush range without looping.
@@ -3128,16 +3128,19 @@ int remap_vmalloc_range(struct vm_area_s
EXPORT_SYMBOL(remap_vmalloc_range);
/*
- * Implement a stub for vmalloc_sync_all() if the architecture chose not to
- * have one.
+ * Implement stubs for vmalloc_sync_[un]mappings () if the architecture chose
+ * not to have one.
*
* The purpose of this function is to make sure the vmalloc area
* mappings are identical in all page-tables in the system.
*/
-void __weak vmalloc_sync_all(void)
+void __weak vmalloc_sync_mappings(void)
{
}
+void __weak vmalloc_sync_unmappings(void)
+{
+}
static int f(pte_t *pte, unsigned long addr, void *data)
{
_
Patches currently in -mm which might be from jroedel(a)suse.de are
The patch titled
Subject: mm, slub: prevent kmalloc_node crashes and memory leaks
has been removed from the -mm tree. Its filename was
mm-slub-prevent-kmalloc_node-crashes-and-memory-leaks.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, slub: prevent kmalloc_node crashes and memory leaks
Sachin reports [1] a crash in SLUB __slab_alloc():
BUG: Kernel NULL pointer dereference on read at 0x000073b0
Faulting instruction address: 0xc0000000003d55f4
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 19 PID: 1 Comm: systemd Not tainted 5.6.0-rc2-next-20200218-autotest #1
NIP: c0000000003d55f4 LR: c0000000003d5b94 CTR: 0000000000000000
REGS: c0000008b37836d0 TRAP: 0300 Not tainted (5.6.0-rc2-next-20200218-autotest)
MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24004844 XER: 00000000
CFAR: c00000000000dec4 DAR: 00000000000073b0 DSISR: 40000000 IRQMASK: 1
GPR00: c0000000003d5b94 c0000008b3783960 c00000000155d400 c0000008b301f500
GPR04: 0000000000000dc0 0000000000000002 c0000000003443d8 c0000008bb398620
GPR08: 00000008ba2f0000 0000000000000001 0000000000000000 0000000000000000
GPR12: 0000000024004844 c00000001ec52a00 0000000000000000 0000000000000000
GPR16: c0000008a1b20048 c000000001595898 c000000001750c18 0000000000000002
GPR20: c000000001750c28 c000000001624470 0000000fffffffe0 5deadbeef0000122
GPR24: 0000000000000001 0000000000000dc0 0000000000000002 c0000000003443d8
GPR28: c0000008b301f500 c0000008bb398620 0000000000000000 c00c000002287180
NIP [c0000000003d55f4] ___slab_alloc+0x1f4/0x760
LR [c0000000003d5b94] __slab_alloc+0x34/0x60
Call Trace:
[c0000008b3783960] [c0000000003d5734] ___slab_alloc+0x334/0x760 (unreliable)
[c0000008b3783a40] [c0000000003d5b94] __slab_alloc+0x34/0x60
[c0000008b3783a70] [c0000000003d6fa0] __kmalloc_node+0x110/0x490
[c0000008b3783af0] [c0000000003443d8] kvmalloc_node+0x58/0x110
[c0000008b3783b30] [c0000000003fee38] mem_cgroup_css_online+0x108/0x270
[c0000008b3783b90] [c000000000235aa8] online_css+0x48/0xd0
[c0000008b3783bc0] [c00000000023eaec] cgroup_apply_control_enable+0x2ec/0x4d0
[c0000008b3783ca0] [c000000000242318] cgroup_mkdir+0x228/0x5f0
[c0000008b3783d10] [c00000000051e170] kernfs_iop_mkdir+0x90/0xf0
[c0000008b3783d50] [c00000000043dc00] vfs_mkdir+0x110/0x230
[c0000008b3783da0] [c000000000441c90] do_mkdirat+0xb0/0x1a0
[c0000008b3783e20] [c00000000000b278] system_call+0x5c/0x68
This is a PowerPC platform with following NUMA topology:
available: 2 nodes (0-1)
node 0 cpus:
node 0 size: 0 MB
node 0 free: 0 MB
node 1 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 35247 MB
node 1 free: 30907 MB
node distances:
node 0 1
0: 10 40
1: 40 10
possible numa nodes: 0-31
This only happens with a mmotm patch "mm/memcontrol.c: allocate shrinker_map on
appropriate NUMA node" [2] which effectively calls kmalloc_node for each
possible node. SLUB however only allocates kmem_cache_node on online
N_NORMAL_MEMORY nodes, and relies on node_to_mem_node to return such valid node
for other nodes since commit a561ce00b09e ("slub: fall back to
node_to_mem_node() node if allocating on memoryless node"). This is however not
true in this configuration where the _node_numa_mem_ array is not initialized
for nodes 0 and 2-31, thus it contains zeroes and get_partial() ends up
accessing non-allocated kmem_cache_node.
A related issue was reported by Bharata (originally by Ramachandran) [3] where
a similar PowerPC configuration, but with mainline kernel without patch [2]
ends up allocating large amounts of pages by kmalloc-1k kmalloc-512. This seems
to have the same underlying issue with node_to_mem_node() not behaving as
expected, and might probably also lead to an infinite loop with
CONFIG_SLUB_CPU_PARTIAL [4].
This patch should fix both issues by not relying on node_to_mem_node() anymore
and instead simply falling back to NUMA_NO_NODE, when kmalloc_node(node) is
attempted for a node that's not online, or has no usable memory. The "usable
memory" condition is also changed from node_present_pages() to N_NORMAL_MEMORY
node state, as that is exactly the condition that SLUB uses to allocate
kmem_cache_node structures. The check in get_partial() is removed completely,
as the checks in ___slab_alloc() are now sufficient to prevent get_partial()
being reached with an invalid node.
[1] https://lore.kernel.org/linux-next/3381CD91-AB3D-4773-BA04-E7A072A63968@lin…
[2] https://lore.kernel.org/linux-mm/fff0e636-4c36-ed10-281c-8cdb0687c839@virtu…
[3] https://lore.kernel.org/linux-mm/20200317092624.GB22538@in.ibm.com/
[4] https://lore.kernel.org/linux-mm/088b5996-faae-8a56-ef9c-5b567125ae54@suse.…
Link: http://lkml.kernel.org/r/20200320115533.9604-1-vbabka@suse.cz
Fixes: a561ce00b09e ("slub: fall back to node_to_mem_node() node if allocating on memoryless node")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Reported-by: PUVICHAKRAVARTHY RAMACHANDRAN <puvichakravarthy(a)in.ibm.com>
Tested-by: Bharata B Rao <bharata(a)linux.ibm.com>
Debugged-by: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
Reviewed-by: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Christopher Lameter <cl(a)linux.com>
Cc: linuxppc-dev(a)lists.ozlabs.org
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Kirill Tkhai <ktkhai(a)virtuozzo.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Nathan Lynch <nathanl(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/slub.c | 26 +++++++++++++++++---------
1 file changed, 17 insertions(+), 9 deletions(-)
--- a/mm/slub.c~mm-slub-prevent-kmalloc_node-crashes-and-memory-leaks
+++ a/mm/slub.c
@@ -1973,8 +1973,6 @@ static void *get_partial(struct kmem_cac
if (node == NUMA_NO_NODE)
searchnode = numa_mem_id();
- else if (!node_present_pages(node))
- searchnode = node_to_mem_node(node);
object = get_partial_node(s, get_node(s, searchnode), c, flags);
if (object || node != NUMA_NO_NODE)
@@ -2563,17 +2561,27 @@ static void *___slab_alloc(struct kmem_c
struct page *page;
page = c->page;
- if (!page)
+ if (!page) {
+ /*
+ * if the node is not online or has no normal memory, just
+ * ignore the node constraint
+ */
+ if (unlikely(node != NUMA_NO_NODE &&
+ !node_state(node, N_NORMAL_MEMORY)))
+ node = NUMA_NO_NODE;
goto new_slab;
+ }
redo:
if (unlikely(!node_match(page, node))) {
- int searchnode = node;
-
- if (node != NUMA_NO_NODE && !node_present_pages(node))
- searchnode = node_to_mem_node(node);
-
- if (unlikely(!node_match(page, searchnode))) {
+ /*
+ * same as above but node_match() being false already
+ * implies node != NUMA_NO_NODE
+ */
+ if (!node_state(node, N_NORMAL_MEMORY)) {
+ node = NUMA_NO_NODE;
+ goto redo;
+ } else {
stat(s, ALLOC_NODE_MISMATCH);
deactivate_slab(s, page, c->freelist, c);
goto new_slab;
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
revert-topology-add-support-for-node_to_mem_node-to-determine-the-fallback-node.patch
mm-compaction-fully-assume-capture-is-not-null-in-compact_zone_order.patch
mm-hugetlb-remove-unnecessary-memory-fetch-in-pageheadhuge.patch
The patch titled
Subject: epoll: fix possible lost wakeup on epoll_ctl() path
has been removed from the -mm tree. Its filename was
epoll-fix-possible-lost-wakeup-on-epoll_ctl-path.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Roman Penyaev <rpenyaev(a)suse.de>
Subject: epoll: fix possible lost wakeup on epoll_ctl() path
This fixes possible lost wakeup introduced by commit a218cc491420.
Originally modifications to ep->wq were serialized by ep->wq.lock, but in
the a218cc491420 new rw lock was introduced in order to relax fd event
path, i.e. callers of ep_poll_callback() function.
After the change ep_modify and ep_insert (both are called on epoll_ctl()
path) were switched to ep->lock, but ep_poll (epoll_wait) was using
ep->wq.lock on wqueue list modification.
The bug doesn't lead to any wqueue list corruptions, because wake up path
and list modifications were serialized by ep->wq.lock internally, but
actual waitqueue_active() check prior wake_up() call can be reordered with
modifications of ep ready list, thus wake up can be lost.
And yes, can be healed by explicit smp_mb():
list_add_tail(&epi->rdlink, &ep->rdllist);
smp_mb();
if (waitqueue_active(&ep->wq))
wake_up(&ep->wp);
But let's make it simple, thus current patch replaces ep->wq.lock with the
ep->lock for wqueue modifications, thus wake up path always observes
activeness of the wqueue correcty.
Link: http://lkml.kernel.org/r/20200214170211.561524-1-rpenyaev@suse.de
Fixes: a218cc491420 ("epoll: use rwlock in order to reduce ep_poll_callback() contention")
References: https://bugzilla.kernel.org/show_bug.cgi?id=205933
Signed-off-by: Roman Penyaev <rpenyaev(a)suse.de>
Reported-by: Max Neunhoeffer <max(a)arangodb.com>
Bisected-by: Max Neunhoeffer <max(a)arangodb.com>
Tested-by: Max Neunhoeffer <max(a)arangodb.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Christopher Kohlhoff <chris.kohlhoff(a)clearpool.io>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: Jes Sorensen <jes.sorensen(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/eventpoll.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--- a/fs/eventpoll.c~epoll-fix-possible-lost-wakeup-on-epoll_ctl-path
+++ a/fs/eventpoll.c
@@ -1854,9 +1854,9 @@ fetch_events:
waiter = true;
init_waitqueue_entry(&wait, current);
- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__add_wait_queue_exclusive(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}
for (;;) {
@@ -1904,9 +1904,9 @@ send_events:
goto fetch_events;
if (waiter) {
- spin_lock_irq(&ep->wq.lock);
+ write_lock_irq(&ep->lock);
__remove_wait_queue(&ep->wq, &wait);
- spin_unlock_irq(&ep->wq.lock);
+ write_unlock_irq(&ep->lock);
}
return res;
_
Patches currently in -mm which might be from rpenyaev(a)suse.de are
kselftest-introduce-new-epoll-test-case.patch
The patch titled
Subject: mm: do not allow MADV_PAGEOUT for CoW pages
has been removed from the -mm tree. Its filename was
mm-do-not-allow-madv_pageout-for-cow-pages.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm: do not allow MADV_PAGEOUT for CoW pages
Jann has brought up a very interesting point [1]. While shared pages are
excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed
that way. This can lead to all sorts of hard to debug problems. E.g.
performance problems outlined by Daniel [2].
There are runtime environments where there is a substantial memory shared
among security domains via CoW memory and a easy to reclaim way of that
memory, which MADV_{COLD,PAGEOUT} offers, can lead to either performance
degradation in for the parent process which might be more privileged or
even open side channel attacks.
The feasibility of the latter is not really clear to me TBH but there is
no real reason for exposure at this stage. It seems there is no real use
case to depend on reclaiming CoW memory via madvise at this stage so it is
much easier to simply disallow it and this is what this patch does. Put
it simply MADV_{PAGEOUT,COLD} can operate only on the exclusively owned
memory which is a straightforward semantic.
[1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKu…
[2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zR…
Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.cz
Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Jann Horn <jannh(a)google.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Daniel Colascione <dancol(a)google.com>
Cc: Dave Hansen <dave.hansen(a)intel.com>
Cc: "Joel Fernandes (Google)" <joel(a)joelfernandes.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/mm/madvise.c~mm-do-not-allow-madv_pageout-for-cow-pages
+++ a/mm/madvise.c
@@ -335,12 +335,14 @@ static int madvise_cold_or_pageout_pte_r
}
page = pmd_page(orig_pmd);
+
+ /* Do not interfere with other mappings of this page */
+ if (page_mapcount(page) != 1)
+ goto huge_unlock;
+
if (next - addr != HPAGE_PMD_SIZE) {
int err;
- if (page_mapcount(page) != 1)
- goto huge_unlock;
-
get_page(page);
spin_unlock(ptl);
lock_page(page);
@@ -426,6 +428,10 @@ regular_page:
continue;
}
+ /* Do not interfere with other mappings of this page */
+ if (page_mapcount(page) != 1)
+ continue;
+
VM_BUG_ON_PAGE(PageTransCompound(page), page);
if (pte_young(ptent)) {
_
Patches currently in -mm which might be from mhocko(a)suse.com are
selftests-vm-drop-dependencies-on-page-flags-from-mlock2-tests.patch
The patch titled
Subject: mm, memcg: throttle allocators based on ancestral memory.high
has been removed from the -mm tree. Its filename was
mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Chris Down <chris(a)chrisdown.name>
Subject: mm, memcg: throttle allocators based on ancestral memory.high
Prior to this commit, we only directly check the affected cgroup's
memory.high against its usage. However, it's possible that we are being
reclaimed as a result of hitting an ancestor memory.high and should be
penalised based on that, instead.
This patch changes memory.high overage throttling to use the largest
overage in its ancestors when considering how many penalty jiffies to
charge. This makes sure that we penalise poorly behaving cgroups in the
same way regardless of at what level of the hierarchy memory.high was
breached.
Link: http://lkml.kernel.org/r/8cd132f84bd7e16cdb8fde3378cdbf05ba00d387.158403614…
Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: <stable(a)vger.kernel.org> [5.4.x+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 93 ++++++++++++++++++++++++++++------------------
1 file changed, 58 insertions(+), 35 deletions(-)
--- a/mm/memcontrol.c~mm-memcg-throttle-allocators-based-on-ancestral-memoryhigh
+++ a/mm/memcontrol.c
@@ -2297,28 +2297,41 @@ static void high_work_func(struct work_s
#define MEMCG_DELAY_SCALING_SHIFT 14
/*
- * Scheduled by try_charge() to be executed from the userland return path
- * and reclaims memory over the high limit.
+ * Get the number of jiffies that we should penalise a mischievous cgroup which
+ * is exceeding its memory.high by checking both it and its ancestors.
*/
-void mem_cgroup_handle_over_high(void)
+static unsigned long calculate_high_delay(struct mem_cgroup *memcg,
+ unsigned int nr_pages)
{
- unsigned long usage, high, clamped_high;
- unsigned long pflags;
- unsigned long penalty_jiffies, overage;
- unsigned int nr_pages = current->memcg_nr_pages_over_high;
- struct mem_cgroup *memcg;
+ unsigned long penalty_jiffies;
+ u64 max_overage = 0;
- if (likely(!nr_pages))
- return;
+ do {
+ unsigned long usage, high;
+ u64 overage;
+
+ usage = page_counter_read(&memcg->memory);
+ high = READ_ONCE(memcg->high);
+
+ /*
+ * Prevent division by 0 in overage calculation by acting as if
+ * it was a threshold of 1 page
+ */
+ high = max(high, 1UL);
+
+ overage = usage - high;
+ overage <<= MEMCG_DELAY_PRECISION_SHIFT;
+ overage = div64_u64(overage, high);
+
+ if (overage > max_overage)
+ max_overage = overage;
+ } while ((memcg = parent_mem_cgroup(memcg)) &&
+ !mem_cgroup_is_root(memcg));
- memcg = get_mem_cgroup_from_mm(current->mm);
- reclaim_high(memcg, nr_pages, GFP_KERNEL);
- current->memcg_nr_pages_over_high = 0;
+ if (!max_overage)
+ return 0;
/*
- * memory.high is breached and reclaim is unable to keep up. Throttle
- * allocators proactively to slow down excessive growth.
- *
* We use overage compared to memory.high to calculate the number of
* jiffies to sleep (penalty_jiffies). Ideally this value should be
* fairly lenient on small overages, and increasingly harsh when the
@@ -2326,24 +2339,9 @@ void mem_cgroup_handle_over_high(void)
* its crazy behaviour, so we exponentially increase the delay based on
* overage amount.
*/
-
- usage = page_counter_read(&memcg->memory);
- high = READ_ONCE(memcg->high);
-
- if (usage <= high)
- goto out;
-
- /*
- * Prevent division by 0 in overage calculation by acting as if it was a
- * threshold of 1 page
- */
- clamped_high = max(high, 1UL);
-
- overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
- clamped_high);
-
- penalty_jiffies = ((u64)overage * overage * HZ)
- >> (MEMCG_DELAY_PRECISION_SHIFT + MEMCG_DELAY_SCALING_SHIFT);
+ penalty_jiffies = max_overage * max_overage * HZ;
+ penalty_jiffies >>= MEMCG_DELAY_PRECISION_SHIFT;
+ penalty_jiffies >>= MEMCG_DELAY_SCALING_SHIFT;
/*
* Factor in the task's own contribution to the overage, such that four
@@ -2360,7 +2358,32 @@ void mem_cgroup_handle_over_high(void)
* application moving forwards and also permit diagnostics, albeit
* extremely slowly.
*/
- penalty_jiffies = min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+ return min(penalty_jiffies, MEMCG_MAX_HIGH_DELAY_JIFFIES);
+}
+
+/*
+ * Scheduled by try_charge() to be executed from the userland return path
+ * and reclaims memory over the high limit.
+ */
+void mem_cgroup_handle_over_high(void)
+{
+ unsigned long penalty_jiffies;
+ unsigned long pflags;
+ unsigned int nr_pages = current->memcg_nr_pages_over_high;
+ struct mem_cgroup *memcg;
+
+ if (likely(!nr_pages))
+ return;
+
+ memcg = get_mem_cgroup_from_mm(current->mm);
+ reclaim_high(memcg, nr_pages, GFP_KERNEL);
+ current->memcg_nr_pages_over_high = 0;
+
+ /*
+ * memory.high is breached and reclaim is unable to keep up. Throttle
+ * allocators proactively to slow down excessive growth.
+ */
+ penalty_jiffies = calculate_high_delay(memcg, nr_pages);
/*
* Don't sleep if the amount of jiffies this memcg owes us is so low
_
Patches currently in -mm which might be from chris(a)chrisdown.name are
mm-memcg-prevent-memoryhigh-load-store-tearing.patch
mm-memcg-prevent-memorymax-load-tearing.patch
mm-memcg-prevent-memorylow-load-store-tearing.patch
mm-memcg-prevent-memorymin-load-store-tearing.patch
mm-memcg-prevent-memoryswapmax-load-tearing.patch
mm-memcg-prevent-mem_cgroup_protected-store-tearing.patch
mm-memcg-bypass-high-reclaim-iteration-for-cgroup-hierarchy-root.patch
The patch titled
Subject: mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
has been removed from the -mm tree. Its filename was
mm-memcg-fix-corruption-on-64-bit-divisor-in-memoryhigh-throttling.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Chris Down <chris(a)chrisdown.name>
Subject: mm, memcg: fix corruption on 64-bit divisor in memory.high throttling
0e4b01df8659 had a bunch of fixups to use the right division method.
However, it seems that after all that it still wasn't right -- div_u64
takes a 32-bit divisor.
The headroom is still large (2^32 pages), so on mundane systems you won't
hit this, but this should definitely be fixed.
Link: http://lkml.kernel.org/r/80780887060514967d414b3cd91f9a316a16ab98.158403614…
Fixes: 0e4b01df8659 ("mm, memcg: throttle allocators when failing reclaim over memory.high")
Signed-off-by: Chris Down <chris(a)chrisdown.name>
Reported-by: Johannes Weiner <hannes(a)cmpxchg.org>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Nathan Chancellor <natechancellor(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.4.x+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcg-fix-corruption-on-64-bit-divisor-in-memoryhigh-throttling
+++ a/mm/memcontrol.c
@@ -2339,7 +2339,7 @@ void mem_cgroup_handle_over_high(void)
*/
clamped_high = max(high, 1UL);
- overage = div_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
+ overage = div64_u64((u64)(usage - high) << MEMCG_DELAY_PRECISION_SHIFT,
clamped_high);
penalty_jiffies = ((u64)overage * overage * HZ)
_
Patches currently in -mm which might be from chris(a)chrisdown.name are
mm-memcg-prevent-memoryhigh-load-store-tearing.patch
mm-memcg-prevent-memorymax-load-tearing.patch
mm-memcg-prevent-memorylow-load-store-tearing.patch
mm-memcg-prevent-memorymin-load-store-tearing.patch
mm-memcg-prevent-memoryswapmax-load-tearing.patch
mm-memcg-prevent-mem_cgroup_protected-store-tearing.patch
mm-memcg-bypass-high-reclaim-iteration-for-cgroup-hierarchy-root.patch
The patch titled
Subject: memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
has been removed from the -mm tree. Its filename was
memcg-fix-null-pointer-dereference-in-__mem_cgroup_usage_unregister_event.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Chunguang Xu <brookxu(a)tencent.com>
Subject: memcg: fix NULL pointer dereference in __mem_cgroup_usage_unregister_event
An eventfd monitors multiple memory thresholds of the cgroup, closes them,
the kernel deletes all events related to this eventfd. Before all events
are deleted, another eventfd monitors the memory threshold of this cgroup,
leading to a crash:
[135.675108] BUG: kernel NULL pointer dereference, address: 0000000000000004
[135.675350] #PF: supervisor write access in kernel mode
[135.675579] #PF: error_code(0x0002) - not-present page
[135.675816] PGD 800000033058e067 P4D 800000033058e067 PUD 3355ce067 PMD 0
[135.676080] Oops: 0002 [#1] SMP PTI
[135.676332] CPU: 2 PID: 14012 Comm: kworker/2:6 Kdump: loaded Not tainted 5.6.0-rc4 #3
[135.676610] Hardware name: LENOVO 20AWS01K00/20AWS01K00, BIOS GLET70WW (2.24 ) 05/21/2014
[135.676909] Workqueue: events memcg_event_remove
[135.677192] RIP: 0010:__mem_cgroup_usage_unregister_event+0xb3/0x190
[135.677825] RSP: 0018:ffffb47e01c4fe18 EFLAGS: 00010202
[135.678186] RAX: 0000000000000001 RBX: ffff8bb223a8a000 RCX: 0000000000000001
[135.678548] RDX: 0000000000000001 RSI: ffff8bb22fb83540 RDI: 0000000000000001
[135.678912] RBP: ffffb47e01c4fe48 R08: 0000000000000000 R09: 0000000000000010
[135.679287] R10: 000000000000000c R11: 071c71c71c71c71c R12: ffff8bb226aba880
[135.679670] R13: ffff8bb223a8a480 R14: 0000000000000000 R15: 0000000000000000
[135.680066] FS: 0000000000000000(0000) GS:ffff8bb242680000(0000) knlGS:0000000000000000
[135.680475] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[135.680894] CR2: 0000000000000004 CR3: 000000032c29c003 CR4: 00000000001606e0
[135.681325] Call Trace:
[135.681763]� memcg_event_remove+0x32/0x90
[135.682209]� process_one_work+0x172/0x380
[135.682657]� worker_thread+0x49/0x3f0
[135.683111]� kthread+0xf8/0x130
[135.683570]� ? max_active_store+0x80/0x80
[135.684034]� ? kthread_bind+0x10/0x10
[135.684506]� ret_from_fork+0x35/0x40
[135.689733] CR2: 0000000000000004
We can reproduce this problem in the following ways:
1. We create a new cgroup subdirectory and a new eventfd, and then we
monitor multiple memory thresholds of the cgroup through this eventfd.
2. closing this eventfd, and __mem_cgroup_usage_unregister_event ()
will be called multiple times to delete all events related to this
eventfd.
The first time __mem_cgroup_usage_unregister_event() is called, the kernel
will clear all items related to this eventfd in thresholds-> primary.Since
there is currently only one eventfd, thresholds-> primary becomes empty,
so the kernel will set thresholds-> primary and hresholds-> spare to NULL.
If at this time, the user creates a new eventfd and monitor the memory
threshold of this cgroup, kernel will re-initialize thresholds-> primary.
Then when __mem_cgroup_usage_unregister_event () is called for the second
time, because thresholds-> primary is not empty, the system will access
thresholds-> spare, but thresholds-> spare is NULL, which will trigger a
crash.
In general, the longer it takes to delete all events related to this
eventfd, the easier it is to trigger this problem.
The solution is to check whether the thresholds associated with the
eventfd has been cleared when deleting the event. If so, we do nothing.
[akpm(a)linux-foundation.org: fix comment, per Kirill]
Link: http://lkml.kernel.org/r/077a6f67-aefa-4591-efec-f2f3af2b0b02@gmail.com
Fixes: 907860ed381a ("cgroups: make cftype.unregister_event() void-returning")
Signed-off-by: Chunguang Xu <brookxu(a)tencent.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/memcontrol.c~memcg-fix-null-pointer-dereference-in-__mem_cgroup_usage_unregister_event
+++ a/mm/memcontrol.c
@@ -4027,7 +4027,7 @@ static void __mem_cgroup_usage_unregiste
struct mem_cgroup_thresholds *thresholds;
struct mem_cgroup_threshold_ary *new;
unsigned long usage;
- int i, j, size;
+ int i, j, size, entries;
mutex_lock(&memcg->thresholds_lock);
@@ -4047,14 +4047,20 @@ static void __mem_cgroup_usage_unregiste
__mem_cgroup_threshold(memcg, type == _MEMSWAP);
/* Calculate new number of threshold */
- size = 0;
+ size = entries = 0;
for (i = 0; i < thresholds->primary->size; i++) {
if (thresholds->primary->entries[i].eventfd != eventfd)
size++;
+ else
+ entries++;
}
new = thresholds->spare;
+ /* If no items related to eventfd have been cleared, nothing to do */
+ if (!entries)
+ goto unlock;
+
/* Set thresholds array to NULL if we don't have thresholds */
if (!size) {
kfree(new);
_
Patches currently in -mm which might be from brookxu(a)tencent.com are
The vboxvideo driver is missing a call to remove conflicting framebuffers.
Surprisingly, when using legacy BIOS booting this does not really cause
any issues. But when using UEFI to boot the VM then plymouth will draw
on both the efifb /dev/fb0 and /dev/drm/card0 (which has registered
/dev/fb1 as fbdev emulation).
VirtualBox will actual display the output of both devices (I guess it is
showing whatever was drawn last), this causes weird artifacts because of
pitch issues in the efifb when the VM window is not sized at 1024x768
(the window will resize to its last size once the vboxvideo driver loads,
changing the pitch).
Adding the missing drm_fb_helper_remove_conflicting_pci_framebuffers()
call fixes this.
Cc: stable(a)vger.kernel.org
Fixes: 2695eae1f6d3 ("drm/vboxvideo: Switch to generic fbdev emulation")
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/gpu/drm/vboxvideo/vbox_drv.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/vboxvideo/vbox_drv.c b/drivers/gpu/drm/vboxvideo/vbox_drv.c
index 8512d970a09f..261255085918 100644
--- a/drivers/gpu/drm/vboxvideo/vbox_drv.c
+++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c
@@ -76,6 +76,10 @@ static int vbox_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
if (ret)
goto err_mode_fini;
+ ret = drm_fb_helper_remove_conflicting_pci_framebuffers(pdev, "vboxvideodrmfb");
+ if (ret)
+ goto err_irq_fini;
+
ret = drm_fbdev_generic_setup(&vbox->ddev, 32);
if (ret)
goto err_irq_fini;
--
2.26.0.rc2
This is a note to let you know that I've just added the patch titled
amba: Initialize dma_parms for amba devices
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 5caf6102e32ead7ed5d21b5309c1a4a7d70e6a9f Mon Sep 17 00:00:00 2001
From: Ulf Hansson <ulf.hansson(a)linaro.org>
Date: Wed, 25 Mar 2020 12:34:07 +0100
Subject: amba: Initialize dma_parms for amba devices
It's currently the amba driver's responsibility to initialize the pointer,
dma_parms, for its corresponding struct device. The benefit with this
approach allows us to avoid the initialization and to not waste memory for
the struct device_dma_parameters, as this can be decided on a case by case
basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common amba bus at
the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Cc: <stable(a)vger.kernel.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: Ludovic Barre <ludovic.barre(a)st.com>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
Link: https://lore.kernel.org/r/20200325113407.26996-3-ulf.hansson@linaro.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/amba/bus.c | 2 ++
include/linux/amba/bus.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index fe1523664816..5e61783ce92d 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -374,6 +374,8 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
WARN_ON(dev->irq[0] == (unsigned int)-1);
WARN_ON(dev->irq[1] == (unsigned int)-1);
+ dev->dev.dma_parms = &dev->dma_parms;
+
ret = request_resource(parent, &dev->res);
if (ret)
goto err_out;
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 26f0ecf401ea..0bbfd647f5c6 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -65,6 +65,7 @@ struct amba_device {
struct device dev;
struct resource res;
struct clk *pclk;
+ struct device_dma_parameters dma_parms;
unsigned int periphid;
unsigned int cid;
struct amba_cs_uci_id uci;
--
2.26.0
This is a note to let you know that I've just added the patch titled
misc: rtsx: set correct pcr_ops for rts522A
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 10cea23b6aae15e8324f4101d785687f2c514fe5 Mon Sep 17 00:00:00 2001
From: YueHaibing <yuehaibing(a)huawei.com>
Date: Thu, 26 Mar 2020 11:26:18 +0800
Subject: misc: rtsx: set correct pcr_ops for rts522A
rts522a should use rts522a_pcr_ops, which is
diffrent with rts5227 in phy/hw init setting.
Fixes: ce6a5acc9387 ("mfd: rtsx: Add support for rts522A")
Signed-off-by: YueHaibing <yuehaibing(a)huawei.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200326032618.20472-1-yuehaibing@huawei.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/cardreader/rts5227.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/misc/cardreader/rts5227.c b/drivers/misc/cardreader/rts5227.c
index 423fecc19fc4..3a9467aaa435 100644
--- a/drivers/misc/cardreader/rts5227.c
+++ b/drivers/misc/cardreader/rts5227.c
@@ -394,6 +394,7 @@ static const struct pcr_ops rts522a_pcr_ops = {
void rts522a_init_params(struct rtsx_pcr *pcr)
{
rts5227_init_params(pcr);
+ pcr->ops = &rts522a_pcr_ops;
pcr->tx_initial_phase = SET_CLOCK_PHASE(20, 20, 11);
pcr->reg_pm_ctrl3 = RTS522A_PM_CTRL3;
--
2.26.0
This is a note to let you know that I've just added the patch titled
staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 1165dd73e811a07d947aee218510571f516081f6 Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Thu, 26 Mar 2020 21:18:50 +0800
Subject: staging: wlan-ng: fix use-after-free Read in hfa384x_usbin_callback
We can't handle the case length > WLAN_DATA_MAXLEN.
Because the size of rxfrm->data is WLAN_DATA_MAXLEN(2312), and we can't
read more than that.
Thanks-to: Hillf Danton <hdanton(a)sina.com>
Reported-and-tested-by: syzbot+7d42d68643a35f71ac8a(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20200326131850.17711-1-hqjagain@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wlan-ng/hfa384x_usb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/staging/wlan-ng/hfa384x_usb.c b/drivers/staging/wlan-ng/hfa384x_usb.c
index e38acb58cd5e..fa1bf8b069fd 100644
--- a/drivers/staging/wlan-ng/hfa384x_usb.c
+++ b/drivers/staging/wlan-ng/hfa384x_usb.c
@@ -3376,6 +3376,8 @@ static void hfa384x_int_rxmonitor(struct wlandevice *wlandev,
WLAN_HDR_A4_LEN + WLAN_DATA_MAXLEN + WLAN_CRC_LEN)) {
pr_debug("overlen frm: len=%zd\n",
skblen - sizeof(struct p80211_caphdr));
+
+ return;
}
skb = dev_alloc_skb(skblen);
--
2.26.0
This is a note to let you know that I've just added the patch titled
mei: me: add cedar fork device ids
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 99397d33b763dc554d118aaa38cc5abc6ce985de Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Tue, 24 Mar 2020 23:07:30 +0200
Subject: mei: me: add cedar fork device ids
Add Cedar Fork (CDF) device ids, those belongs to the cannon point family.
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Link: https://lore.kernel.org/r/20200324210730.17672-1-tomas.winkler@intel.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/hw-me-regs.h | 2 ++
drivers/misc/mei/pci-me.c | 2 ++
2 files changed, 4 insertions(+)
diff --git a/drivers/misc/mei/hw-me-regs.h b/drivers/misc/mei/hw-me-regs.h
index d2359aed79ae..9392934e3a06 100644
--- a/drivers/misc/mei/hw-me-regs.h
+++ b/drivers/misc/mei/hw-me-regs.h
@@ -87,6 +87,8 @@
#define MEI_DEV_ID_CMP_H 0x06e0 /* Comet Lake H */
#define MEI_DEV_ID_CMP_H_3 0x06e4 /* Comet Lake H 3 (iTouch) */
+#define MEI_DEV_ID_CDF 0x18D3 /* Cedar Fork */
+
#define MEI_DEV_ID_ICP_LP 0x34E0 /* Ice Lake Point LP */
#define MEI_DEV_ID_JSP_N 0x4DE0 /* Jasper Lake Point N */
diff --git a/drivers/misc/mei/pci-me.c b/drivers/misc/mei/pci-me.c
index ebdc2d6f8ddb..3d21c38e2dbb 100644
--- a/drivers/misc/mei/pci-me.c
+++ b/drivers/misc/mei/pci-me.c
@@ -102,6 +102,8 @@ static const struct pci_device_id mei_me_pci_tbl[] = {
{MEI_PCI_DEVICE(MEI_DEV_ID_MCC, MEI_ME_PCH15_CFG)},
{MEI_PCI_DEVICE(MEI_DEV_ID_MCC_4, MEI_ME_PCH8_CFG)},
+ {MEI_PCI_DEVICE(MEI_DEV_ID_CDF, MEI_ME_PCH8_CFG)},
+
/* required last entry */
{0, }
};
--
2.26.0
This is a note to let you know that I've just added the patch titled
coresight: do not use the BIT() macro in the UAPI header
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the char-misc-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 9b6eaaf3db5e5888df7bca7fed7752a90f7fd871 Mon Sep 17 00:00:00 2001
From: Eugene Syromiatnikov <esyr(a)redhat.com>
Date: Tue, 24 Mar 2020 05:22:13 +0100
Subject: coresight: do not use the BIT() macro in the UAPI header
The BIT() macro definition is not available for the UAPI headers
(moreover, it can be defined differently in the user space); replace
its usage with the _BITUL() macro that is defined in <linux/const.h>.
Fixes: 237483aa5cf4 ("coresight: stm: adding driver for CoreSight STM component")
Signed-off-by: Eugene Syromiatnikov <esyr(a)redhat.com>
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Mathieu Poirier <mathieu.poirier(a)linaro.org>
Link: https://lore.kernel.org/r/20200324042213.GA10452@asgard.redhat.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/uapi/linux/coresight-stm.h | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/coresight-stm.h b/include/uapi/linux/coresight-stm.h
index aac550a52f80..8847dbf24151 100644
--- a/include/uapi/linux/coresight-stm.h
+++ b/include/uapi/linux/coresight-stm.h
@@ -2,8 +2,10 @@
#ifndef __UAPI_CORESIGHT_STM_H_
#define __UAPI_CORESIGHT_STM_H_
-#define STM_FLAG_TIMESTAMPED BIT(3)
-#define STM_FLAG_GUARANTEED BIT(7)
+#include <linux/const.h>
+
+#define STM_FLAG_TIMESTAMPED _BITUL(3)
+#define STM_FLAG_GUARANTEED _BITUL(7)
/*
* The CoreSight STM supports guaranteed and invariant timing
--
2.26.0
This is a note to let you know that I've just added the patch titled
usb: gadget: f_fs: Fix use after free issue as part of queue failure
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From f63ec55ff904b2f2e126884fcad93175f16ab4bb Mon Sep 17 00:00:00 2001
From: Sriharsha Allenki <sallenki(a)codeaurora.org>
Date: Thu, 26 Mar 2020 17:26:20 +0530
Subject: usb: gadget: f_fs: Fix use after free issue as part of queue failure
In AIO case, the request is freed up if ep_queue fails.
However, io_data->req still has the reference to this freed
request. In the case of this failure if there is aio_cancel
call on this io_data it will lead to an invalid dequeue
operation and a potential use after free issue.
Fix this by setting the io_data->req to NULL when the request
is freed as part of queue failure.
Fixes: 2e4c7553cd6f ("usb: gadget: f_fs: add aio support")
Signed-off-by: Sriharsha Allenki <sallenki(a)codeaurora.org>
CC: stable <stable(a)vger.kernel.org>
Reviewed-by: Peter Chen <peter.chen(a)nxp.com>
Link: https://lore.kernel.org/r/20200326115620.12571-1-sallenki@codeaurora.org
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/gadget/function/f_fs.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/usb/gadget/function/f_fs.c b/drivers/usb/gadget/function/f_fs.c
index 571917677d35..767f30b86645 100644
--- a/drivers/usb/gadget/function/f_fs.c
+++ b/drivers/usb/gadget/function/f_fs.c
@@ -1120,6 +1120,7 @@ static ssize_t ffs_epfile_io(struct file *file, struct ffs_io_data *io_data)
ret = usb_ep_queue(ep->ep, req, GFP_ATOMIC);
if (unlikely(ret)) {
+ io_data->req = NULL;
usb_ep_free_request(ep->ep, req);
goto error_lock;
}
--
2.26.0
It's currently the amba driver's responsibility to initialize the pointer,
dma_parms, for its corresponding struct device. The benefit with this
approach allows us to avoid the initialization and to not waste memory for
the struct device_dma_parameters, as this can be decided on a case by case
basis.
However, it has turned out that this approach is not very practical. Not
only does it lead to open coding, but also to real errors. In principle
callers of dma_set_max_seg_size() doesn't check the error code, but just
assumes it succeeds.
For these reasons, let's do the initialization from the common amba bus at
the device registration point. This also follows the way the PCI devices
are being managed, see pci_device_add().
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
---
drivers/amba/bus.c | 2 ++
include/linux/amba/bus.h | 1 +
2 files changed, 3 insertions(+)
diff --git a/drivers/amba/bus.c b/drivers/amba/bus.c
index fe1523664816..5e61783ce92d 100644
--- a/drivers/amba/bus.c
+++ b/drivers/amba/bus.c
@@ -374,6 +374,8 @@ static int amba_device_try_add(struct amba_device *dev, struct resource *parent)
WARN_ON(dev->irq[0] == (unsigned int)-1);
WARN_ON(dev->irq[1] == (unsigned int)-1);
+ dev->dev.dma_parms = &dev->dma_parms;
+
ret = request_resource(parent, &dev->res);
if (ret)
goto err_out;
diff --git a/include/linux/amba/bus.h b/include/linux/amba/bus.h
index 26f0ecf401ea..0bbfd647f5c6 100644
--- a/include/linux/amba/bus.h
+++ b/include/linux/amba/bus.h
@@ -65,6 +65,7 @@ struct amba_device {
struct device dev;
struct resource res;
struct clk *pclk;
+ struct device_dma_parameters dma_parms;
unsigned int periphid;
unsigned int cid;
struct amba_cs_uci_id uci;
--
2.20.1
This is a note to let you know that I've just added the patch titled
USB: serial: io_edgeport: fix slab-out-of-bounds read in
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 57aa9f294b09463492f604feaa5cc719beaace32 Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Wed, 25 Mar 2020 15:52:37 +0800
Subject: USB: serial: io_edgeport: fix slab-out-of-bounds read in
edge_interrupt_callback
Fix slab-out-of-bounds read in the interrupt-URB completion handler.
The boundary condition should be (length - 1) as we access
data[position + 1].
Reported-and-tested-by: syzbot+37ba33391ad5f3935bbd(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Johan Hovold <johan(a)kernel.org>
---
drivers/usb/serial/io_edgeport.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/serial/io_edgeport.c b/drivers/usb/serial/io_edgeport.c
index 5737add6a2a4..4cca0b836f43 100644
--- a/drivers/usb/serial/io_edgeport.c
+++ b/drivers/usb/serial/io_edgeport.c
@@ -710,7 +710,7 @@ static void edge_interrupt_callback(struct urb *urb)
/* grab the txcredits for the ports if available */
position = 2;
portNumber = 0;
- while ((position < length) &&
+ while ((position < length - 1) &&
(portNumber < edge_serial->serial->num_ports)) {
txCredits = data[position] | (data[position+1] << 8);
if (txCredits) {
--
2.26.0
This is a note to let you know that I've just added the patch titled
staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From a1f165a6b738f0c9d744bad4af7a53909278f5fc Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Wed, 25 Mar 2020 15:06:46 +0800
Subject: staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
We should cancel hw->usb_work before kfree(hw).
Reported-by: syzbot+6d2e7f6fa90e27be9d62(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1585120006-30042-1-git-send-email-hqjagain@gmail.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wlan-ng/prism2usb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/wlan-ng/prism2usb.c b/drivers/staging/wlan-ng/prism2usb.c
index 352556f6870a..4689b2170e4f 100644
--- a/drivers/staging/wlan-ng/prism2usb.c
+++ b/drivers/staging/wlan-ng/prism2usb.c
@@ -180,6 +180,7 @@ static void prism2sta_disconnect_usb(struct usb_interface *interface)
cancel_work_sync(&hw->link_bh);
cancel_work_sync(&hw->commsqual_bh);
+ cancel_work_sync(&hw->usb_work);
/* Now we complete any outstanding commands
* and tell everyone who is waiting for their
--
2.26.0
The patch titled
Subject: selftests: vm: drop dependencies on page flags from mlock2 tests
has been added to the -mm tree. Its filename is
selftests-vm-drop-dependencies-on-page-flags-from-mlock2-tests.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/selftests-vm-drop-dependencies-on-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/selftests-vm-drop-dependencies-on-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: selftests: vm: drop dependencies on page flags from mlock2 tests
It was noticed that mlock2 tests are failing after 9c4e6b1a7027f ("mm,
mlock, vmscan: no more skipping pagevecs") because the patch has changed
the timing on when the page is added to the unevictable LRU list and thus
gains the unevictable page flag.
The test was just too dependent on the implementation details which were
true at the time when it was introduced. Page flags and the timing when
they are set is something no userspace should ever depend on. The test
should be testing only for the user observable contract of the tested
syscalls. Those are defined pretty well for the mlock and there are other
means for testing them. In fact this is already done and testing for page
flags can be safely dropped to achieve the aimed purpose. Present bits
can be checked by /proc/<pid>/smaps RSS field and the locking state by
VmFlags although I would argue that Locked: field would be more
appropriate.
Drop all the page flag machinery and considerably simplify the test. This
should be more robust for future kernel changes while checking the
promised contract is still valid.
Link: http://lkml.kernel.org/r/20200324154218.GS19542@dhcp22.suse.cz
Fixes: 9c4e6b1a7027f ("mm, mlock, vmscan: no more skipping pagevecs")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Rafael Aquini <aquini(a)redhat.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Eric B Munson <emunson(a)akamai.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/vm/mlock2-tests.c | 233 +++-----------------
1 file changed, 37 insertions(+), 196 deletions(-)
--- a/tools/testing/selftests/vm/mlock2-tests.c~selftests-vm-drop-dependencies-on-page-flags-from-mlock2-tests
+++ a/tools/testing/selftests/vm/mlock2-tests.c
@@ -67,59 +67,6 @@ out:
return ret;
}
-static uint64_t get_pageflags(unsigned long addr)
-{
- FILE *file;
- uint64_t pfn;
- unsigned long offset;
-
- file = fopen("/proc/self/pagemap", "r");
- if (!file) {
- perror("fopen pagemap");
- _exit(1);
- }
-
- offset = addr / getpagesize() * sizeof(pfn);
-
- if (fseek(file, offset, SEEK_SET)) {
- perror("fseek pagemap");
- _exit(1);
- }
-
- if (fread(&pfn, sizeof(pfn), 1, file) != 1) {
- perror("fread pagemap");
- _exit(1);
- }
-
- fclose(file);
- return pfn;
-}
-
-static uint64_t get_kpageflags(unsigned long pfn)
-{
- uint64_t flags;
- FILE *file;
-
- file = fopen("/proc/kpageflags", "r");
- if (!file) {
- perror("fopen kpageflags");
- _exit(1);
- }
-
- if (fseek(file, pfn * sizeof(flags), SEEK_SET)) {
- perror("fseek kpageflags");
- _exit(1);
- }
-
- if (fread(&flags, sizeof(flags), 1, file) != 1) {
- perror("fread kpageflags");
- _exit(1);
- }
-
- fclose(file);
- return flags;
-}
-
#define VMFLAGS "VmFlags:"
static bool is_vmflag_set(unsigned long addr, const char *vmflag)
@@ -159,19 +106,13 @@ out:
#define RSS "Rss:"
#define LOCKED "lo"
-static bool is_vma_lock_on_fault(unsigned long addr)
+static unsigned long get_value_for_name(unsigned long addr, const char *name)
{
- bool ret = false;
- bool locked;
- FILE *smaps = NULL;
- unsigned long vma_size, vma_rss;
char *line = NULL;
- char *value;
size_t size = 0;
-
- locked = is_vmflag_set(addr, LOCKED);
- if (!locked)
- goto out;
+ char *value_ptr;
+ FILE *smaps = NULL;
+ unsigned long value = -1UL;
smaps = seek_to_smaps_entry(addr);
if (!smaps) {
@@ -180,112 +121,70 @@ static bool is_vma_lock_on_fault(unsigne
}
while (getline(&line, &size, smaps) > 0) {
- if (!strstr(line, SIZE)) {
+ if (!strstr(line, name)) {
free(line);
line = NULL;
size = 0;
continue;
}
- value = line + strlen(SIZE);
- if (sscanf(value, "%lu kB", &vma_size) < 1) {
+ value_ptr = line + strlen(name);
+ if (sscanf(value_ptr, "%lu kB", &value) < 1) {
printf("Unable to parse smaps entry for Size\n");
goto out;
}
break;
}
- while (getline(&line, &size, smaps) > 0) {
- if (!strstr(line, RSS)) {
- free(line);
- line = NULL;
- size = 0;
- continue;
- }
-
- value = line + strlen(RSS);
- if (sscanf(value, "%lu kB", &vma_rss) < 1) {
- printf("Unable to parse smaps entry for Rss\n");
- goto out;
- }
- break;
- }
-
- ret = locked && (vma_rss < vma_size);
out:
- free(line);
if (smaps)
fclose(smaps);
- return ret;
+ free(line);
+ return value;
}
-#define PRESENT_BIT 0x8000000000000000ULL
-#define PFN_MASK 0x007FFFFFFFFFFFFFULL
-#define UNEVICTABLE_BIT (1UL << 18)
-
-static int lock_check(char *map)
+static bool is_vma_lock_on_fault(unsigned long addr)
{
- unsigned long page_size = getpagesize();
- uint64_t page1_flags, page2_flags;
+ bool locked;
+ unsigned long vma_size, vma_rss;
+
+ locked = is_vmflag_set(addr, LOCKED);
+ if (!locked)
+ return false;
- page1_flags = get_pageflags((unsigned long)map);
- page2_flags = get_pageflags((unsigned long)map + page_size);
+ vma_size = get_value_for_name(addr, SIZE);
+ vma_rss = get_value_for_name(addr, RSS);
- /* Both pages should be present */
- if (((page1_flags & PRESENT_BIT) == 0) ||
- ((page2_flags & PRESENT_BIT) == 0)) {
- printf("Failed to make both pages present\n");
- return 1;
- }
+ /* only one page is faulted in */
+ return (vma_rss < vma_size);
+}
- page1_flags = get_kpageflags(page1_flags & PFN_MASK);
- page2_flags = get_kpageflags(page2_flags & PFN_MASK);
+#define PRESENT_BIT 0x8000000000000000ULL
+#define PFN_MASK 0x007FFFFFFFFFFFFFULL
+#define UNEVICTABLE_BIT (1UL << 18)
- /* Both pages should be unevictable */
- if (((page1_flags & UNEVICTABLE_BIT) == 0) ||
- ((page2_flags & UNEVICTABLE_BIT) == 0)) {
- printf("Failed to make both pages unevictable\n");
- return 1;
- }
+static int lock_check(unsigned long addr)
+{
+ bool locked;
+ unsigned long vma_size, vma_rss;
- if (!is_vmflag_set((unsigned long)map, LOCKED)) {
- printf("VMA flag %s is missing on page 1\n", LOCKED);
- return 1;
- }
+ locked = is_vmflag_set(addr, LOCKED);
+ if (!locked)
+ return false;
- if (!is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
- printf("VMA flag %s is missing on page 2\n", LOCKED);
- return 1;
- }
+ vma_size = get_value_for_name(addr, SIZE);
+ vma_rss = get_value_for_name(addr, RSS);
- return 0;
+ return (vma_rss == vma_size);
}
static int unlock_lock_check(char *map)
{
- unsigned long page_size = getpagesize();
- uint64_t page1_flags, page2_flags;
-
- page1_flags = get_pageflags((unsigned long)map);
- page2_flags = get_pageflags((unsigned long)map + page_size);
- page1_flags = get_kpageflags(page1_flags & PFN_MASK);
- page2_flags = get_kpageflags(page2_flags & PFN_MASK);
-
- if ((page1_flags & UNEVICTABLE_BIT) || (page2_flags & UNEVICTABLE_BIT)) {
- printf("A page is still marked unevictable after unlock\n");
- return 1;
- }
-
if (is_vmflag_set((unsigned long)map, LOCKED)) {
printf("VMA flag %s is present on page 1 after unlock\n", LOCKED);
return 1;
}
- if (is_vmflag_set((unsigned long)map + page_size, LOCKED)) {
- printf("VMA flag %s is present on page 2 after unlock\n", LOCKED);
- return 1;
- }
-
return 0;
}
@@ -311,7 +210,7 @@ static int test_mlock_lock()
goto unmap;
}
- if (lock_check(map))
+ if (!lock_check((unsigned long)map))
goto unmap;
/* Now unlock and recheck attributes */
@@ -330,64 +229,18 @@ out:
static int onfault_check(char *map)
{
- unsigned long page_size = getpagesize();
- uint64_t page1_flags, page2_flags;
-
- page1_flags = get_pageflags((unsigned long)map);
- page2_flags = get_pageflags((unsigned long)map + page_size);
-
- /* Neither page should be present */
- if ((page1_flags & PRESENT_BIT) || (page2_flags & PRESENT_BIT)) {
- printf("Pages were made present by MLOCK_ONFAULT\n");
- return 1;
- }
-
*map = 'a';
- page1_flags = get_pageflags((unsigned long)map);
- page2_flags = get_pageflags((unsigned long)map + page_size);
-
- /* Only page 1 should be present */
- if ((page1_flags & PRESENT_BIT) == 0) {
- printf("Page 1 is not present after fault\n");
- return 1;
- } else if (page2_flags & PRESENT_BIT) {
- printf("Page 2 was made present\n");
- return 1;
- }
-
- page1_flags = get_kpageflags(page1_flags & PFN_MASK);
-
- /* Page 1 should be unevictable */
- if ((page1_flags & UNEVICTABLE_BIT) == 0) {
- printf("Failed to make faulted page unevictable\n");
- return 1;
- }
-
if (!is_vma_lock_on_fault((unsigned long)map)) {
printf("VMA is not marked for lock on fault\n");
return 1;
}
- if (!is_vma_lock_on_fault((unsigned long)map + page_size)) {
- printf("VMA is not marked for lock on fault\n");
- return 1;
- }
-
return 0;
}
static int unlock_onfault_check(char *map)
{
unsigned long page_size = getpagesize();
- uint64_t page1_flags;
-
- page1_flags = get_pageflags((unsigned long)map);
- page1_flags = get_kpageflags(page1_flags & PFN_MASK);
-
- if (page1_flags & UNEVICTABLE_BIT) {
- printf("Page 1 is still marked unevictable after unlock\n");
- return 1;
- }
if (is_vma_lock_on_fault((unsigned long)map) ||
is_vma_lock_on_fault((unsigned long)map + page_size)) {
@@ -445,7 +298,6 @@ static int test_lock_onfault_of_present(
char *map;
int ret = 1;
unsigned long page_size = getpagesize();
- uint64_t page1_flags, page2_flags;
map = mmap(NULL, 2 * page_size, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
@@ -465,17 +317,6 @@ static int test_lock_onfault_of_present(
goto unmap;
}
- page1_flags = get_pageflags((unsigned long)map);
- page2_flags = get_pageflags((unsigned long)map + page_size);
- page1_flags = get_kpageflags(page1_flags & PFN_MASK);
- page2_flags = get_kpageflags(page2_flags & PFN_MASK);
-
- /* Page 1 should be unevictable */
- if ((page1_flags & UNEVICTABLE_BIT) == 0) {
- printf("Failed to make present page unevictable\n");
- goto unmap;
- }
-
if (!is_vma_lock_on_fault((unsigned long)map) ||
!is_vma_lock_on_fault((unsigned long)map + page_size)) {
printf("VMA with present pages is not marked lock on fault\n");
@@ -507,7 +348,7 @@ static int test_munlockall()
goto out;
}
- if (lock_check(map))
+ if (!lock_check((unsigned long)map))
goto unmap;
if (munlockall()) {
@@ -549,7 +390,7 @@ static int test_munlockall()
goto out;
}
- if (lock_check(map))
+ if (!lock_check((unsigned long)map))
goto unmap;
if (munlockall()) {
_
Patches currently in -mm which might be from mhocko(a)suse.com are
selftests-vm-drop-dependencies-on-page-flags-from-mlock2-tests.patch
The patch titled
Subject: ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
has been added to the -mm tree. Its filename is
ipc-mqueuec-change-__do_notify-to-bypass-check_kill_permission.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/ipc-mqueuec-change-__do_notify-to-…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/ipc-mqueuec-change-__do_notify-to-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Oleg Nesterov <oleg(a)redhat.com>
Subject: ipc/mqueue.c: change __do_notify() to bypass check_kill_permission()
Commit cc731525f26a ("signal: Remove kernel interal si_code magic")
changed the value of SI_FROMUSER(SI_MESGQ), this means that mq_notify() no
longer works if the sender doesn't have rights to send a signal.
Change __do_notify() to use do_send_sig_info() instead of kill_pid_info()
to avoid check_kill_permission().
This needs the additional notify.sigev_signo != 0 check, shouldn't we
change do_mq_notify() to deny sigev_signo == 0 ?
Test-case:
#include <signal.h>
#include <mqueue.h>
#include <unistd.h>
#include <sys/wait.h>
#include <assert.h>
static int notified;
static void sigh(int sig)
{
notified = 1;
}
int main(void)
{
signal(SIGIO, sigh);
int fd = mq_open("/mq", O_RDWR|O_CREAT, 0666, NULL);
assert(fd >= 0);
struct sigevent se = {
.sigev_notify = SIGEV_SIGNAL,
.sigev_signo = SIGIO,
};
assert(mq_notify(fd, &se) == 0);
if (!fork()) {
assert(setuid(1) == 0);
mq_send(fd, "",1,0);
return 0;
}
wait(NULL);
mq_unlink("/mq");
assert(notified);
return 0;
}
Link: http://lkml.kernel.org/r/20200324200932.GB24230@redhat.com
Fixes: cc731525f26a ("signal: Remove kernel interal si_code magic")
Reported-by: Yoji <yoji.fujihar.min(a)gmail.com>
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: Markus Elfring <elfring(a)users.sourceforge.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/mqueue.c | 28 ++++++++++++++++++++--------
1 file changed, 20 insertions(+), 8 deletions(-)
--- a/ipc/mqueue.c~ipc-mqueuec-change-__do_notify-to-bypass-check_kill_permission
+++ a/ipc/mqueue.c
@@ -774,28 +774,40 @@ static void __do_notify(struct mqueue_in
* synchronously. */
if (info->notify_owner &&
info->attr.mq_curmsgs == 1) {
- struct kernel_siginfo sig_i;
switch (info->notify.sigev_notify) {
case SIGEV_NONE:
break;
- case SIGEV_SIGNAL:
- /* sends signal */
+ case SIGEV_SIGNAL: {
+ struct kernel_siginfo sig_i;
+ struct task_struct *task;
+
+ /* do_mq_notify() accepts sigev_signo == 0, why?? */
+ if (!info->notify.sigev_signo)
+ break;
clear_siginfo(&sig_i);
sig_i.si_signo = info->notify.sigev_signo;
sig_i.si_errno = 0;
sig_i.si_code = SI_MESGQ;
sig_i.si_value = info->notify.sigev_value;
- /* map current pid/uid into info->owner's namespaces */
rcu_read_lock();
+ /* map current pid/uid into info->owner's namespaces */
sig_i.si_pid = task_tgid_nr_ns(current,
ns_of_pid(info->notify_owner));
- sig_i.si_uid = from_kuid_munged(info->notify_user_ns, current_uid());
+ sig_i.si_uid = from_kuid_munged(info->notify_user_ns,
+ current_uid());
+ /*
+ * We can't use kill_pid_info(), this signal should
+ * bypass check_kill_permission(). It is from kernel
+ * but si_fromuser() can't know this.
+ */
+ task = pid_task(info->notify_owner, PIDTYPE_PID);
+ if (task)
+ do_send_sig_info(info->notify.sigev_signo,
+ &sig_i, task, PIDTYPE_TGID);
rcu_read_unlock();
-
- kill_pid_info(info->notify.sigev_signo,
- &sig_i, info->notify_owner);
break;
+ }
case SIGEV_THREAD:
set_cookie(info->notify_cookie, NOTIFY_WOKENUP);
netlink_sendskb(info->notify_sock, info->notify_cookie);
_
Patches currently in -mm which might be from oleg(a)redhat.com are
ipc-mqueuec-change-__do_notify-to-bypass-check_kill_permission.patch
aio-simplify-read_events.patch
The patch titled
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
has been added to the -mm tree. Its filename is
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-sparse-fix-kernel-crash-with-pf…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-sparse-fix-kernel-crash-with-pf…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/sparse: fix kernel crash with pfn_section_valid check
Fixes the below crash
BUG: Kernel NULL pointer dereference on read at 0x00000000
Faulting instruction address: 0xc000000000c3447c
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
CPU: 11 PID: 7519 Comm: lt-ndctl Not tainted 5.6.0-rc7-autotest #1
...
NIP [c000000000c3447c] vmemmap_populated+0x98/0xc0
LR [c000000000088354] vmemmap_free+0x144/0x320
Call Trace:
section_deactivate+0x220/0x240
__remove_pages+0x118/0x170
arch_remove_memory+0x3c/0x150
memunmap_pages+0x1cc/0x2f0
devm_action_release+0x30/0x50
release_nodes+0x2f8/0x3e0
device_release_driver_internal+0x168/0x270
unbind_store+0x130/0x170
drv_attr_store+0x44/0x60
sysfs_kf_write+0x68/0x80
kernfs_fop_write+0x100/0x290
__vfs_write+0x3c/0x70
vfs_write+0xcc/0x240
ksys_write+0x7c/0x140
system_call+0x5c/0x68
With commit: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in
SPARSEMEM|!VMEMMAP case") section_mem_map is set to NULL after
depopulate_section_mem(). This was done so that pfn_page() can work
correctly with kernel config that disables SPARSEMEM_VMEMMAP. With that
config pfn_to_page does
__section_mem_map_addr(__sec) + __pfn;
where
static inline struct page *__section_mem_map_addr(struct mem_section *section)
{
unsigned long map = section->section_mem_map;
map &= SECTION_MAP_MASK;
return (struct page *)map;
}
Now with SPASEMEM_VMEMAP enabled, mem_section->usage->subsection_map is
used to check the pfn validity (pfn_valid()). Since section_deactivate
release mem_section->usage if a section is fully deactivated, pfn_valid()
check after a subsection_deactivate cause a kernel crash.
static inline int pfn_valid(unsigned long pfn)
{
...
return early_section(ms) || pfn_section_valid(ms, pfn);
}
where
static inline int pfn_section_valid(struct mem_section *ms, unsigned long pfn)
{
int idx = subsection_map_index(pfn);
return test_bit(idx, ms->usage->subsection_map);
}
Avoid this by clearing SECTION_HAS_MEM_MAP when mem_section->usage is freed.
Link: http://lkml.kernel.org/r/20200325031914.107660-1-aneesh.kumar@linux.ibm.com
Fixes: d41e2f3bd546 ("mm/hotplug: fix hot remove failure in SPARSEMEM|!VMEMMAP case")
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Reported-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Tested-by: Sachin Sant <sachinp(a)linux.vnet.ibm.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Pankaj Gupta <pankaj.gupta.linux(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Mike Rapoport <rppt(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/sparse.c | 2 ++
1 file changed, 2 insertions(+)
--- a/mm/sparse.c~mm-sparse-fix-kernel-crash-with-pfn_section_valid-check
+++ a/mm/sparse.c
@@ -781,6 +781,8 @@ static void section_deactivate(unsigned
ms->usage = NULL;
}
memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr);
+ /* Mark the section invalid */
+ ms->section_mem_map &= ~SECTION_HAS_MEM_MAP;
}
if (section_is_early && memmap)
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
mm-sparse-fix-kernel-crash-with-pfn_section_valid-check.patch
On Thu, Mar 12, 2020 at 6:42 AM Linus Walleij <linus.walleij(a)linaro.org> wrote:
>
> On Wed, Mar 11, 2020 at 4:43 PM Tim Harvey <tharvey(a)gateworks.com> wrote:
>
> > If there are no parent resources do not call irq_chip_request_resources_parent
> > at all as this will return an error.
> >
> > This resolves a regression where devices using a thunderx gpio as an interrupt
> > would fail probing.
> >
> > Fixes: 0d04d0c ("gpio: thunderx: Use the default parent apis for {request,release}_resources")
> > Signed-off-by: Tim Harvey <tharvey(a)gateworks.com>
>
> This patch does not apply to the mainline kernel or v5.6-rc1.
>
> Please verify:
> 1. If the problem is still in v5.6 (we refactored the driver to
> use GPIOLIB_IRQCHIP)
Linus,
Sorry, another issue was keeping me from being able to boot 5.6-rc but
that's now understood and I can confirm the issue is not present in
v5.6-rc5
>
> 2. If not, only propose it for linux-stable v5.5 etc.
>
Yes, needs to be applied to v5.2, v5.3, v5.4, v5.5. I cc'd stable. If
I need to re-submit please let me know.
Cc: stable(a)vger.kernel.org
Tim
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 0b0b40a42baf - Linux 5.5.13-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://cki-artifacts.s3.us-east-2.amazonaws.com/index.html?prefix=dataware…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ❌ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
✅ stress: stress-ng
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
ppc64le:
Host 1:
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ LTP
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking socket: fuzz
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ Usex - version 1.9-29
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ❌ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
Host 2:
✅ Boot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ lvm thinp sanity
✅ storage: software RAID testing
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
s390x:
Host 1:
✅ Boot test
✅ selinux-policy: serge-testsuite
🚧 ✅ Storage blktests
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Podman system integration test - as root
✅ Podman system integration test - as user
✅ Loopdev Sanity
✅ Memory function: memfd_create
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking MACsec: sanity
✅ Networking sctp-auth: sockopts test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ httpd: mod_ssl smoke sanity
✅ tuned: tune-processes-through-perf
✅ trace: ftrace/tracer
🚧 ✅ CIFS Connectathon
🚧 ✅ POSIX pjd-fstest suites
🚧 ✅ jvm - DaCapo Benchmark Suite
🚧 ✅ jvm - jcstress tests
🚧 ✅ Memory function: kaslr
🚧 ✅ LTP: openposix test suite
🚧 ✅ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ✅ iotop: sanity
🚧 ✅ storage: dm/common
x86_64:
Host 1:
✅ Boot test
✅ Storage SAN device stress - megaraid_sas
Host 2:
✅ Boot test
✅ Storage SAN device stress - mpt3sas driver
Host 3:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ Usex - version 1.9-29
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 4:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ lvm thinp sanity
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ stress: stress-ng
🚧 ⚡⚡⚡ IOMMU boot test
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
Host 5:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ Usex - version 1.9-29
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Host 6:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Podman system integration test - as root
⚡⚡⚡ Podman system integration test - as user
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Ethernet drivers sanity
⚡⚡⚡ Networking MACsec: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking sctp-auth: sockopts test
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ httpd: mod_ssl smoke sanity
⚡⚡⚡ tuned: tune-processes-through-perf
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ Usex - version 1.9-29
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - DaCapo Benchmark Suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ LTP: openposix test suite
🚧 ⚡⚡⚡ Networking vnic: ipvlan/basic
🚧 ⚡⚡⚡ audit: audit testsuite test
🚧 ⚡⚡⚡ iotop: sanity
🚧 ⚡⚡⚡ storage: dm/common
Test sources: https://github.com/CKI-project/tests-beaker
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
I'm announcing the release of the 5.5.13 kernel.
All users of the 5.5 kernel series must upgrade.
The updated 5.5.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.5.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/base/core.c | 4 +++-
include/linux/device.h | 11 +++++++++++
3 files changed, 15 insertions(+), 2 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.5.13
Saravana Kannan (2):
driver core: Add dev_has_sync_state()
driver core: Skip unnecessary work when device doesn't have sync_state()
This changes kcmp_epoll_target to use the new exec_update_mutex
instead of cred_guard_mutex.
This should be safe, as the credentials are only used for reading,
and furthermore ->mm and ->sighand are updated on execve,
but only under the new exec_update_mutex.
Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
kernel/kcmp.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/kcmp.c b/kernel/kcmp.c
index a0e3d7a..b3ff928 100644
--- a/kernel/kcmp.c
+++ b/kernel/kcmp.c
@@ -173,8 +173,8 @@ static int kcmp_epoll_target(struct task_struct *task1,
/*
* One should have enough rights to inspect task details.
*/
- ret = kcmp_lock(&task1->signal->cred_guard_mutex,
- &task2->signal->cred_guard_mutex);
+ ret = kcmp_lock(&task1->signal->exec_update_mutex,
+ &task2->signal->exec_update_mutex);
if (ret)
goto err;
if (!ptrace_may_access(task1, PTRACE_MODE_READ_REALCREDS) ||
@@ -229,8 +229,8 @@ static int kcmp_epoll_target(struct task_struct *task1,
}
err_unlock:
- kcmp_unlock(&task1->signal->cred_guard_mutex,
- &task2->signal->cred_guard_mutex);
+ kcmp_unlock(&task1->signal->exec_update_mutex,
+ &task2->signal->exec_update_mutex);
err:
put_task_struct(task1);
put_task_struct(task2);
--
1.9.1
This changes __pidfd_fget to use the new exec_update_mutex
instead of cred_guard_mutex.
This should be safe, as the credentials do not change
before exec_update_mutex is locked. Therefore whatever
file access is possible with holding the cred_guard_mutex
here is also possbile with the exec_update_mutex.
Signed-off-by: Bernd Edlinger <bernd.edlinger(a)hotmail.de>
---
kernel/pid.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/pid.c b/kernel/pid.c
index 0f4ecb5..04821f4 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -584,7 +584,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
struct file *file;
int ret;
- ret = mutex_lock_killable(&task->signal->cred_guard_mutex);
+ ret = mutex_lock_killable(&task->signal->exec_update_mutex);
if (ret)
return ERR_PTR(ret);
@@ -593,7 +593,7 @@ static struct file *__pidfd_fget(struct task_struct *task, int fd)
else
file = ERR_PTR(-EPERM);
- mutex_unlock(&task->signal->cred_guard_mutex);
+ mutex_unlock(&task->signal->exec_update_mutex);
return file ?: ERR_PTR(-EBADF);
}
--
1.9.1
Hi,
Tools like sar and iostat are reporting abnormally high %util with 4.19.y
running in VM (the disk is almost idle):
$ sar -dp
Linux 4.19.107-1.x86_64 03/25/20 _x86_64_ (6 CPU)
00:00:00 DEV tps ... %util
00:10:00 vda 0.55 ... 98.07
...
10:00:00 vda 0.44 ... 99.74
Average: vda 0.48 ... 98.98
The numbers look reasonable for the partition:
# iostat -x -p ALL 1 1
Linux 4.19.107-1.x86_64 03/25/20 _x86_64_ (6 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
10.51 0.00 8.58 0.05 0.11 80.75
Device r/s ... %util
vda 0.02 ... 98.25
vda1 0.01 ... 0.09
Lots of io_ticks in /proc/diskstats:
# cat /proc/uptime
45787.03 229321.29
# grep vda /proc/diskstats
253 0 vda 760 0 38498 731 28165 43212 1462928 157514 0 44690263
44812032 0 0 0 0
253 1 vda1 350 0 19074 293 26169 43212 1462912 154931 0 41560 150998
0 0 0 0
Other people are apparently seeing this too with 4.19:
https://kudzia.eu/b/2019/09/iostat-x-1-reporting-100-utilization-of-nearly-…
I also see this only in 4.19.y and bisected to this (based on the Fixes
tag, this should have been taken to 4.14 too...):
commit 6131837b1de66116459ef4413e26fdbc70d066dc
Author: Omar Sandoval <osandov(a)fb.com>
Date: Thu Apr 26 00:21:58 2018 -0700
blk-mq: count allocated but not started requests in iostats inflight
In the legacy block case, we increment the counter right after we
allocate the request, not when the driver handles it. In both the legacy
and blk-mq cases, part_inc_in_flight() is called from
blk_account_io_start() right after we've allocated the request. blk-mq
only considers requests started requests as inflight, but this is
inconsistent with the legacy definition and the intention in the code.
This removes the started condition and instead counts all allocated
requests.
Fixes: f299b7c7a9de ("blk-mq: provide internal in-flight variant")
Signed-off-by: Omar Sandoval <osandov(a)fb.com>
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-mq.c b/block/blk-mq.c
index c3621453ad87..5450cbc61f8d 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -95,18 +95,15 @@ static void blk_mq_check_inflight(struct blk_mq_hw_ctx
*hctx,
{
struct mq_inflight *mi = priv;
- if (blk_mq_rq_state(rq) == MQ_RQ_IN_FLIGHT) {
- /*
- * index[0] counts the specific partition that was asked
- * for. index[1] counts the ones that are active on the
- * whole device, so increment that if mi->part is indeed
- * a partition, and not a whole device.
- */
- if (rq->part == mi->part)
- mi->inflight[0]++;
- if (mi->part->partno)
- mi->inflight[1]++;
- }
+ /*
+ * index[0] counts the specific partition that was asked for.
index[1]
+ * counts the ones that are active on the whole device, so
increment
+ * that if mi->part is indeed a partition, and not a whole device.
+ */
+ if (rq->part == mi->part)
+ mi->inflight[0]++;
+ if (mi->part->partno)
+ mi->inflight[1]++;
}
void blk_mq_in_flight(struct request_queue *q, struct hd_struct *part,
If I get it right, when the disk is idle, and some request is allocated,
part_round_stats() with this commit will now add all ticks between
previous I/O and current time (now - part->stamp) to io_ticks.
Before the commit, part_round_stats() would only update part->stamp when
called after request allocation.
Any thoughts how to best fix this in 4.19?
I see the io_ticks accounting has been reworked in 5.0, do we need to
backport those to 4.19, or any ill effects if this commit is reverted in
4.19?
-Tommi
Hi
[This is an automated email]
The bot has tested the following trees: v5.5.11, v5.4.27, v4.19.112, v4.14.174, v4.9.217, v4.4.217.
v5.5.11: Build OK!
v5.4.27: Build OK!
v4.19.112: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
v4.14.174: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
v4.9.217: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
v4.4.217: Failed to apply! Possible dependencies:
1863d77f15da ("SUNRPC: Replace the cache_detail->hash_lock with a regular spinlock")
2b477c00f3bd ("svcrpc: free contexts immediately on PROC_DESTROY")
471a930ad7d1 ("SUNRPC: Drop all entries from cache_detail when cache_purge()")
d8d29138b17c ("sunrpc: remove 'inuse' flag from struct cache_detail.")
NOTE: The patch will not be queued to stable trees until it is upstream.
How should we proceed with this patch?
--
Thanks
Sasha
This is a note to let you know that I've just added the patch titled
staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
to my staging git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging.git
in the staging-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the staging-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From a1f165a6b738f0c9d744bad4af7a53909278f5fc Mon Sep 17 00:00:00 2001
From: Qiujun Huang <hqjagain(a)gmail.com>
Date: Wed, 25 Mar 2020 15:06:46 +0800
Subject: staging: wlan-ng: fix ODEBUG bug in prism2sta_disconnect_usb
We should cancel hw->usb_work before kfree(hw).
Reported-by: syzbot+6d2e7f6fa90e27be9d62(a)syzkaller.appspotmail.com
Signed-off-by: Qiujun Huang <hqjagain(a)gmail.com>
Cc: stable <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/1585120006-30042-1-git-send-email-hqjagain@gmail.…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/staging/wlan-ng/prism2usb.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/staging/wlan-ng/prism2usb.c b/drivers/staging/wlan-ng/prism2usb.c
index 352556f6870a..4689b2170e4f 100644
--- a/drivers/staging/wlan-ng/prism2usb.c
+++ b/drivers/staging/wlan-ng/prism2usb.c
@@ -180,6 +180,7 @@ static void prism2sta_disconnect_usb(struct usb_interface *interface)
cancel_work_sync(&hw->link_bh);
cancel_work_sync(&hw->commsqual_bh);
+ cancel_work_sync(&hw->usb_work);
/* Now we complete any outstanding commands
* and tell everyone who is waiting for their
--
2.25.2
v4l2_ctrl_handler_free() uses hdl->lock, which in ov5640 driver is set
to sensor's own sensor->lock. In ov5640_remove(), the driver destroys the
sensor->lock first, and then calls v4l2_ctrl_handler_free(), resulting
in the use of the destroyed mutex.
Fix this by calling v4l2_ctrl_handler_free() before mutex_destroy().
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ti.com>
Cc: stable(a)vger.kernel.org
---
drivers/media/i2c/ov5640.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/i2c/ov5640.c b/drivers/media/i2c/ov5640.c
index 854031f0b64a..64511de4eea8 100644
--- a/drivers/media/i2c/ov5640.c
+++ b/drivers/media/i2c/ov5640.c
@@ -3104,9 +3104,9 @@ static int ov5640_remove(struct i2c_client *client)
struct ov5640_dev *sensor = to_ov5640_dev(sd);
v4l2_async_unregister_subdev(&sensor->sd);
+ v4l2_ctrl_handler_free(&sensor->ctrls.handler);
mutex_destroy(&sensor->lock);
media_entity_cleanup(&sensor->sd.entity);
- v4l2_ctrl_handler_free(&sensor->ctrls.handler);
return 0;
}
--
Texas Instruments Finland Oy, Porkkalankatu 22, 00180 Helsinki.
Y-tunnus/Business ID: 0615521-4. Kotipaikka/Domicile: Helsinki
From: Johannes Berg <johannes.berg(a)intel.com>
The new opmode notification used this attribute with a u8, when
it's documented as a u32 and indeed used in userspace as such,
it just happens to work on little-endian systems since userspace
isn't doing any strict size validation, and the u8 goes into the
lower byte. Fix this.
Cc: stable(a)vger.kernel.org
Fixes: 466b9936bf93 ("cfg80211: Add support to notify station's opmode change to userspace")
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
net/wireless/nl80211.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index ec5d67794aab..f0af23c1634a 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -16416,7 +16416,7 @@ void cfg80211_sta_opmode_change_notify(struct net_device *dev, const u8 *mac,
goto nla_put_failure;
if ((sta_opmode->changed & STA_OPMODE_MAX_BW_CHANGED) &&
- nla_put_u8(msg, NL80211_ATTR_CHANNEL_WIDTH, sta_opmode->bw))
+ nla_put_u32(msg, NL80211_ATTR_CHANNEL_WIDTH, sta_opmode->bw))
goto nla_put_failure;
if ((sta_opmode->changed & STA_OPMODE_N_SS_CHANGED) &&
--
2.25.1