From: Kevin Hao <haokexin(a)gmail.com>
Subject: dump_stack: avoid the livelock of the dump_lock
In the current code, we use atomic_cmpxchg() to serialize the output
of dump_stack(), but this implementation suffers from the thundering
herd problem. We have observed this kind of livelock on a Marvell
cn96xx board (24 CPUs) when heavily using dump_stack() in a kprobe
handler. Instead, we can make the competitors wait for the lock to be
released before retrying atomic_cmpxchg(). This significantly
mitigates the thundering herd problem. Thanks to Linus for the
suggestion.
[akpm(a)linux-foundation.org: fix comment]
Link: http://lkml.kernel.org/r/20191030031637.6025-1-haokexin@gmail.com
Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()")
Signed-off-by: Kevin Hao <haokexin(a)gmail.com>
Suggested-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/dump_stack.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
--- a/lib/dump_stack.c~dump_stack-avoid-the-livelock-of-the-dump_lock
+++ a/lib/dump_stack.c
@@ -106,7 +106,12 @@ retry:
was_locked = 1;
} else {
local_irq_restore(flags);
- cpu_relax();
+ /*
+ * Wait for the lock to release before jumping to
+ * atomic_cmpxchg() in order to mitigate the thundering herd
+ * problem.
+ */
+ do { cpu_relax(); } while (atomic_read(&dump_lock) != -1);
goto retry;
}
_
From: Michal Hocko <mhocko(a)suse.com>
Subject: mm, vmstat: hide /proc/pagetypeinfo from normal users
/proc/pagetypeinfo is a debugging tool for examining internal page
allocator state with respect to fragmentation. It is not very useful
for anything else, so normal users really do not need to read this
file.

Waiman Long has noticed that reading this file can have negative side
effects, because zone->lock is necessary for gathering the data and
that a) interferes with the page allocator and its users and b) can
lead to hard lockups on large machines which have very long free lists.

Reduce both issues by simply not exporting the file to regular users.
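To see why the read is so expensive, here is a simplified sketch of the
data gathering (paraphrasing pagetypeinfo_showfree_print() in
mm/vmstat.c; in the real code the lock is taken by the zone-walk
helper, and details may differ slightly):

	/*
	 * Every read walks each free list end-to-end under zone->lock
	 * with IRQs disabled, so a huge free list means an unbounded
	 * lock hold time.
	 */
	static void sketch_pagetypeinfo_showfree(struct seq_file *m,
						 struct zone *zone)
	{
		unsigned long flags;
		int mtype, order;

		spin_lock_irqsave(&zone->lock, flags);
		for (mtype = 0; mtype < MIGRATE_TYPES; mtype++) {
			for (order = 0; order < MAX_ORDER; order++) {
				unsigned long freecount = 0;
				struct list_head *curr;

				/* O(free_list length) under the lock */
				list_for_each(curr,
				    &zone->free_area[order].free_list[mtype])
					freecount++;
				seq_printf(m, "%6lu ", freecount);
			}
		}
		spin_unlock_irqrestore(&zone->lock, flags);
	}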
Link: http://lkml.kernel.org/r/20191025072610.18526-2-mhocko@kernel.org
Fixes: 467c996c1e19 ("Print out statistics in relation to fragmentation avoidance to /proc/pagetypeinfo")
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Reported-by: Waiman Long <longman(a)redhat.com>
Acked-by: Mel Gorman <mgorman(a)suse.de>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: Waiman Long <longman(a)redhat.com>
Acked-by: Rafael Aquini <aquini(a)redhat.com>
Acked-by: David Rientjes <rientjes(a)google.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Jann Horn <jannh(a)google.com>
Cc: Song Liu <songliubraving(a)fb.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmstat.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/vmstat.c~mm-vmstat-hide-proc-pagetypeinfo-from-normal-users
+++ a/mm/vmstat.c
@@ -1972,7 +1972,7 @@ void __init init_mm_internals(void)
#endif
#ifdef CONFIG_PROC_FS
proc_create_seq("buddyinfo", 0444, NULL, &fragmentation_op);
- proc_create_seq("pagetypeinfo", 0444, NULL, &pagetypeinfo_op);
+ proc_create_seq("pagetypeinfo", 0400, NULL, &pagetypeinfo_op);
proc_create_seq("vmstat", 0444, NULL, &vmstat_op);
proc_create_seq("zoneinfo", 0444, NULL, &zoneinfo_op);
#endif
_
From: Yang Shi <yang.shi(a)linux.alibaba.com>
Subject: mm: thp: handle page cache THP correctly in PageTransCompoundMap
We have a use case that uses tmpfs as the QEMU memory backend, and we
would like to take advantage of THP as well. But our tests show the
EPT is not PMD mapped even though the underlying THPs are PMD mapped
on the host. The number shown by /sys/kernel/debug/kvm/largepages is
much smaller than the number of PMD mapped shmem pages, as below:
7f2778200000-7f2878200000 rw-s 00000000 00:14 262232 /dev/shm/qemu_back_mem.mem.Hz2hSf (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 579584 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
12
And some benchmarks do worse than with anonymous THPs.
By digging into the code we figured out that commit 127393fbe597 ("mm:
thp: kvm: fix memory corruption in KVM with THP enabled") checks
whether there is a single PTE mapping on the page for anonymous THP
when setting up the EPT map. But the _mapcount < 0 check does not fit
page cache THP, since every subpage of a page cache THP gets its
_mapcount incremented once it is PMD mapped, so PageTransCompoundMap()
always returns false for page cache THP. This prevents KVM from
setting up a PMD mapped EPT entry.

So we need to handle page cache THP correctly. However, when a page
cache THP's PMD gets split, the kernel just removes the mapping instead
of setting up PTE maps the way anonymous THP does. And before KVM calls
get_user_pages() the subpages may get PTE mapped even though the page
is still a THP, since the page cache THP may be mapped by other
processes in the meantime.

So compare the subpage's _mapcount to the THP's compound_mapcount to
tell whether the THP also has PTE mappings. Although this may report
some false negatives (PTE mapped by other processes), making the check
fully accurate does not look trivial.
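In code form, the check boils down to the following (a restatement of
the hunk in the diff below, with the mapcount bookkeeping spelled out
in comments):

	static inline int PageTransCompoundMap(struct page *page)
	{
		struct page *head;

		if (!PageTransCompound(page))
			return 0;

		/*
		 * Anonymous THP: a PMD map only bumps compound_mapcount,
		 * so a subpage's _mapcount stays at -1 unless it is also
		 * PTE mapped.
		 */
		if (PageAnon(page))
			return atomic_read(&page->_mapcount) < 0;

		/*
		 * File THP: a PMD map bumps compound_mapcount *and* every
		 * subpage's _mapcount, so equality means "PMD mapped
		 * only"; any additional PTE map pushes _mapcount above
		 * compound_mapcount.
		 */
		head = compound_head(page);
		return atomic_read(&page->_mapcount) ==
		       atomic_read(compound_mapcount_ptr(head));
	}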
With this fix, /sys/kernel/debug/kvm/largepages shows a reasonable
number of pages PMD mapped by EPT, as below:
7fbeaee00000-7fbfaee00000 rw-s 00000000 00:14 275464 /dev/shm/qemu_back_mem.mem.SKUvat (deleted)
Size: 4194304 kB
[snip]
AnonHugePages: 0 kB
ShmemPmdMapped: 557056 kB
[snip]
Locked: 0 kB
cat /sys/kernel/debug/kvm/largepages
271
And the benchmarks are the same as with anonymous THPs.
[yang.shi(a)linux.alibaba.com: v4]
Link: http://lkml.kernel.org/r/1571865575-42913-1-git-send-email-yang.shi@linux.a…
Link: http://lkml.kernel.org/r/1571769577-89735-1-git-send-email-yang.shi@linux.a…
Fixes: dd78fedde4b9 ("rmap: support file thp")
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
Reported-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Tested-by: Gang Deng <gavin.dg(a)linux.alibaba.com>
Suggested-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org> [4.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 5 -----
include/linux/mm_types.h | 5 +++++
include/linux/page-flags.h | 20 ++++++++++++++++++--
3 files changed, 23 insertions(+), 7 deletions(-)
--- a/include/linux/mm.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm.h
@@ -695,11 +695,6 @@ static inline void *kvcalloc(size_t n, s
extern void kvfree(const void *addr);
-static inline atomic_t *compound_mapcount_ptr(struct page *page)
-{
- return &page[1].compound_mapcount;
-}
-
static inline int compound_mapcount(struct page *page)
{
VM_BUG_ON_PAGE(!PageCompound(page), page);
--- a/include/linux/mm_types.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/mm_types.h
@@ -221,6 +221,11 @@ struct page {
#endif
} _struct_page_alignment;
+static inline atomic_t *compound_mapcount_ptr(struct page *page)
+{
+ return &page[1].compound_mapcount;
+}
+
/*
* Used for sizing the vmemmap region on some architectures
*/
--- a/include/linux/page-flags.h~mm-thp-handle-page-cache-thp-correctly-in-pagetranscompoundmap
+++ a/include/linux/page-flags.h
@@ -622,12 +622,28 @@ static inline int PageTransCompound(stru
*
* Unlike PageTransCompound, this is safe to be called only while
* split_huge_pmd() cannot run from under us, like if protected by the
- * MMU notifier, otherwise it may result in page->_mapcount < 0 false
+ * MMU notifier, otherwise it may result in page->_mapcount check false
* positives.
+ *
+ * We have to treat page cache THP differently since every subpage of it
+ * would get _mapcount inc'ed once it is PMD mapped. But, it may be PTE
+ * mapped in the current process so comparing subpage's _mapcount to
+ * compound_mapcount to filter out PTE mapped case.
*/
static inline int PageTransCompoundMap(struct page *page)
{
- return PageTransCompound(page) && atomic_read(&page->_mapcount) < 0;
+ struct page *head;
+
+ if (!PageTransCompound(page))
+ return 0;
+
+ if (PageAnon(page))
+ return atomic_read(&page->_mapcount) < 0;
+
+ head = compound_head(page);
+ /* File THP is PMD mapped and not PTE mapped */
+ return atomic_read(&page->_mapcount) ==
+ atomic_read(compound_mapcount_ptr(head));
}
/*
_
From: Mel Gorman <mgorman(a)techsingularity.net>
Subject: mm, meminit: recalculate pcpu batch and high limits after init completes
Deferred memory initialisation updates zone->managed_pages during the
initialisation phase, but before that finishes the per-cpu page
allocator (pcpu) calculates the number of pages allocated/freed in
batches, as well as the maximum number of pages allowed on a per-cpu
list. As zone->managed_pages is not yet up to date, the pcpu
initialisation calculates inappropriately low batch and high values.
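For reference, a simplified sketch of how the pcpu limits are derived
from the zone size, paraphrasing the v5.4-era zone_batchsize() and
pageset_set_batch() heuristics (exact caps and rounding are from
memory; check mm/page_alloc.c):

	/*
	 * batch scales with zone_managed_pages(); high is a small
	 * multiple of batch. With deferred init, managed_pages is tiny
	 * when the pcpu lists are first set up, so both values come
	 * out artificially small.
	 */
	static int sketch_zone_batchsize(struct zone *zone)
	{
		int batch;

		batch = zone_managed_pages(zone) / 1024; /* ~0.1% of zone */
		if (batch * PAGE_SIZE > 1024 * 1024)	 /* no more than a meg */
			batch = (1024 * 1024) / PAGE_SIZE;
		batch /= 4;
		if (batch < 1)
			batch = 1;

		/* Round down to just under a power of two: 7, 63, ... */
		return rounddown_pow_of_two(batch + batch / 2) - 1;
	}

	/* pageset_set_batch() then uses high = 6 * batch. */

The batch/high pairs in the breakdown further below (7/42 before,
63/378 after) are consistent with this 6x relationship.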
This increases zone lock contention quite severely in some cases with the
degree of severity depending on how many CPUs share a local zone and the
size of the zone. A private report indicated that kernel build times were
excessive with extremely high system CPU usage. A perf profile indicated
that a large chunk of time was lost on zone->lock contention.
This patch recalculates the pcpu batch and high values after deferred
initialisation completes for every populated zone in the system. It was
tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
workload -- allmodconfig and all available CPUs.
mmtests configuration: config-workload-kernbench-max. The configuration
was modified to build on a fresh XFS partition.
kernbench
                                5.4.0-rc3              5.4.0-rc3
                                  vanilla           resetpcpu-v2
Amean     user-256    13249.50 (   0.00%)   16401.31 * -23.79%*
Amean     syst-256    14760.30 (   0.00%)    4448.39 *  69.86%*
Amean     elsp-256      162.42 (   0.00%)     119.13 *  26.65%*
Stddev    user-256       42.97 (   0.00%)      19.15 (  55.43%)
Stddev    syst-256      336.87 (   0.00%)       6.71 (  98.01%)
Stddev    elsp-256        2.46 (   0.00%)       0.39 (  84.03%)

                    5.4.0-rc3    5.4.0-rc3
                      vanilla resetpcpu-v2
Duration User        39766.24     49221.79
Duration System      44298.10     13361.67
Duration Elapsed       519.11       388.87
The patch reduces system CPU usage by 69.86% and total build time by
26.65%. The variance of system CPU usage is also much reduced.
Before the patch, this was the breakdown of batch and high values over
all zones:
256 batch: 1
256 batch: 63
512 batch: 7
256 high: 0
256 high: 378
512 high: 42
512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
the patch:
256 batch: 1
768 batch: 63
256 high: 0
768 high: 378
[mgorman(a)techsingularity.net: fix merge/linkage snafu]
Link: http://lkml.kernel.org/r/20191023084705.GD3016@techsingularity.net
Link: http://lkml.kernel.org/r/20191021094808.28824-2-mgorman@techsingularity.net
Signed-off-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Matt Fleming <matt(a)codeblueprint.co.uk>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org> [4.1+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
--- a/mm/page_alloc.c~mm-meminit-recalculate-pcpu-batch-and-high-limits-after-init-completes
+++ a/mm/page_alloc.c
@@ -1948,6 +1948,14 @@ void __init page_alloc_init_late(void)
wait_for_completion(&pgdat_init_all_done_comp);
/*
+ * The number of managed pages has changed due to the initialisation
+ * so the pcpu batch and high limits needs to be updated or the limits
+ * will be artificially small.
+ */
+ for_each_populated_zone(zone)
+ zone_pcp_update(zone);
+
+ /*
* We initialized the rest of the deferred pages. Permanently disable
* on-demand struct page initialization.
*/
@@ -8514,7 +8522,6 @@ void free_contig_range(unsigned long pfn
WARN(count != 0, "%d pages are still in use!\n", count);
}
-#ifdef CONFIG_MEMORY_HOTPLUG
/*
* The zone indicated has a new number of managed_pages; batch sizes and percpu
* page high values need to be recalulated.
@@ -8528,7 +8535,6 @@ void __meminit zone_pcp_update(struct zo
per_cpu_ptr(zone->pageset, cpu));
mutex_unlock(&pcp_batch_high_lock);
}
-#endif
void zone_pcp_reset(struct zone *zone)
{
_
On Thu, Oct 31, 2019 at 12:40:36PM +0000, Sasha Levin wrote:
> Hi,
>
> [This is an automated email]
>
> This commit has been processed because it contains a "Fixes:" tag,
> fixing commit: 7f192e3cd316b fork: add clone3.
>
> The bot has tested the following trees: v5.3.8.
>
> v5.3.8: Failed to apply! Possible dependencies:
> 78f6face5af34 ("sched: add kernel-doc for struct clone_args")
>
>
> NOTE: The patch will not be queued to stable trees until it is upstream.
>
> How should we proceed with this patch?
Hey Sasha,
This has now landed in mainline (cf. [2]).
I would suggest backporting [1] together with [2].
The patch in [1] only documents struct clone_args and has no functional
changes.
If you prefer to backport only a v5.3-specific version of [2], you can
find it inline (cf. [3]), including the base-commit info for the 5.3
stable tree.
Christian
[1]: 78f6face5af3 ("sched: add kernel-doc for struct clone_args")
[2]: fa729c4df558 ("clone3: validate stack arguments")
[3]:
From 5bc5279d0dfa90cc6af385b6e3f65958f223ccab Mon Sep 17 00:00:00 2001
From: Christian Brauner <christian.brauner(a)ubuntu.com>
Date: Thu, 31 Oct 2019 12:36:08 +0100
Subject: [PATCH] clone3: validate stack arguments
Validate the stack arguments and set up the stack depending on whether
it is growing down or up.
Legacy clone() required userspace to know in which direction the stack
is growing and to pass the stack pointer down appropriately. To make
things more confusing, microblaze uses a variant of the clone() syscall
selected by CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size
argument. IA64 has a separate clone2() syscall which also takes an
additional stack_size argument. Finally, parisc has a stack that grows
upwards. Userspace therefore has a lot of nasty code like the
following:
#define __STACK_SIZE (8 * 1024 * 1024)

pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
	pid_t ret;
	void *stack;

	stack = malloc(__STACK_SIZE);
	if (!stack)
		return -ENOMEM;

#ifdef __ia64__
	ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#elif defined(__parisc__) /* stack grows up */
	ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
#else
	ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
	return ret;
}
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that
when stack_size is passed, the stack pointer is valid and the other way
around. We can also check that the memory area userspace gave us is fine to
use via access_ok(). Furthermore, we probably should not require
userspace to know in which direction the stack is growing. It is easy
for us to do this in the kernel and I couldn't find the original
reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */
clone3() was released with 5.3. Currently it is not documented, and it
is very unclear to userspace how the stack and stack_size arguments
have to be passed. After talking to glibc folks we concluded that
changing clone3() to set up the stack instead of requiring userspace to
do this is the right course of action.

Note that this is an explicit change in user-visible behavior that we
introduce with this patch. If it breaks someone's use-case we will
revert! (And then e.g. place the new behavior under an appropriate
flag.)
Breaking someone's use-case is very unlikely though. First, neither
glibc nor musl currently exposes a wrapper for clone3(). Second, there
is no real motivation for anyone to use clone3() directly, since it
does not yet provide features that legacy clone() doesn't. New features
for clone3() will first land in v5.5, which is why v5.4 is still a good
time to try to make this change now and backport it to v5.3. Searches
on [4] did not reveal any packages calling clone3().
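To illustrate the post-patch calling convention, here is a minimal
hypothetical raw clone3() user (struct clone_args and __NR_clone3 come
from the v5.3 UAPI headers; error handling is elided). The point is
that userspace now always passes the lowest address of the stack buffer
plus its size, and the kernel picks the growth direction:

	#define _GNU_SOURCE
	#include <linux/sched.h>	/* struct clone_args */
	#include <linux/types.h>
	#include <signal.h>
	#include <stdint.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <sys/types.h>
	#include <unistd.h>

	#define STACK_SIZE (8 * 1024 * 1024)

	static pid_t do_clone3(void)
	{
		void *stack = mmap(NULL, STACK_SIZE,
				   PROT_READ | PROT_WRITE,
				   MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
				   -1, 0);
		struct clone_args args = {
			.exit_signal = SIGCHLD,
			/*
			 * Always the start of the buffer; after this
			 * patch the kernel adjusts for the stack growth
			 * direction itself.
			 */
			.stack       = (__u64)(uintptr_t)stack,
			.stack_size  = STACK_SIZE,
		};

		/* Returns the child's pid in the parent, 0 in the child. */
		return syscall(__NR_clone3, &args, sizeof(args));
	}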
[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17…
[2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
[3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa…
[4]: https://codesearch.debian.net
Fixes: 7f192e3cd316 ("fork: add clone3")
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: linux-api(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: <stable(a)vger.kernel.org> # 5.3
Cc: GNU C Library <libc-alpha(a)sourceware.org>
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Aleksa Sarai <cyphar(a)cyphar.com>
Link: https://lore.kernel.org/r/20191031113608.20713-1-christian.brauner@ubuntu.c…
---
kernel/fork.c | 33 ++++++++++++++++++++++++++++++++-
1 file changed, 32 insertions(+), 1 deletion(-)
diff --git a/kernel/fork.c b/kernel/fork.c
index 3647097e6783..8bbd39585301 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2586,7 +2586,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
return 0;
}
-static bool clone3_args_valid(const struct kernel_clone_args *kargs)
+/**
+ * clone3_stack_valid - check and prepare stack
+ * @kargs: kernel clone args
+ *
+ * Verify that the stack arguments userspace gave us are sane.
+ * In addition, set the stack direction for userspace since it's easy for us to
+ * determine.
+ */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
+{
+ if (kargs->stack == 0) {
+ if (kargs->stack_size > 0)
+ return false;
+ } else {
+ if (kargs->stack_size == 0)
+ return false;
+
+ if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
+ return false;
+
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
+ kargs->stack += kargs->stack_size;
+#endif
+ }
+
+ return true;
+}
+
+static bool clone3_args_valid(struct kernel_clone_args *kargs)
{
/*
* All lower bits of the flag word are taken.
@@ -2606,6 +2634,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
kargs->exit_signal)
return false;
+ if (!clone3_stack_valid(kargs))
+ return false;
+
return true;
}
base-commit: db0655e705be645ad673b0a70160921e088517c0
--
2.23.0
On Tue, Nov 05, 2019 at 08:39:12AM +0100, Marta Rybczynska wrote:
> Looks good to me. However, please note that the new ioctl made it already to 5.3.8.
It wasn't in 5.3, but it seems like you are right and it somehow got
picked for the stable releases.
Sasha, can you please revert 76d609da9ed1cc0dc780e2b539d7b827ce28f182
in 5.3-stable ASAP and make sure crap like backporting new ABIs that
haven't seen a release yet is never ever going to happen again?
The three MSR number lists (msrs_to_save[], emulated_msrs[] and
msr_based_features[]) are global arrays in kvm.ko which are
initialized/adjusted (supported MSRs are copied forward to override the
unsupported ones) when kvm-{intel,amd}.ko is installed, but they are
not reset to their initial values when kvm-{intel,amd}.ko is
uninstalled. Thus, at the next installation, kvm-{intel,amd}.ko will
initialize from the already-modified arrays, with some MSRs lost and
some MSRs duplicated.

So allocate and initialize these three MSR number lists dynamically
when installing kvm-{intel,amd}.ko, and free them when uninstalling.
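The shape of the fix, reduced to a generic sketch (hypothetical names;
the real patch below applies this to all three lists): keep a const
master table that is never written, and rebuild the filtered runtime
list in freshly allocated memory on every module load.

	/* Immutable master table; never modified after compile time. */
	static const u32 msrs_master[] = { /* ... */ };

	static u32 *msrs_active;	/* rebuilt on every module load */
	static unsigned int num_msrs_active;

	static int msr_list_init(void)
	{
		unsigned int i, n = 0;

		msrs_active = kmalloc(sizeof(msrs_master),
				      GFP_KERNEL_ACCOUNT);
		if (!msrs_active)
			return -ENOMEM;

		for (i = 0; i < ARRAY_SIZE(msrs_master); i++) {
			if (!msr_supported(msrs_master[i])) /* hypothetical */
				continue;
			msrs_active[n++] = msrs_master[i];
		}
		num_msrs_active = n;
		return 0;
	}

	static void msr_list_exit(void)
	{
		kfree(msrs_active);
		msrs_active = NULL;
		num_msrs_active = 0;
	}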
Cc: stable(a)vger.kernel.org
Reviewed-by: Xiaoyao Li <xiaoyao.li(a)intel.com>
Signed-off-by: Chenyi Qiang <chenyi.qiang(a)intel.com>
---
arch/x86/kvm/x86.c | 86 ++++++++++++++++++++++++++++++----------------
1 file changed, 57 insertions(+), 29 deletions(-)
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index ff395f812719..08efcf6351cc 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1132,13 +1132,15 @@ EXPORT_SYMBOL_GPL(kvm_rdpmc);
* List of msr numbers which we expose to userspace through KVM_GET_MSRS
* and KVM_SET_MSRS, and KVM_GET_MSR_INDEX_LIST.
*
- * This list is modified at module load time to reflect the
+ * The three msr number lists(msrs_to_save, emulated_msrs, msr_based_features)
+ * are allocated and initialized at module load time and freed at unload time.
+ * msrs_to_save is selected from the msrs_to_save_all to reflect the
* capabilities of the host cpu. This capabilities test skips MSRs that are
- * kvm-specific. Those are put in emulated_msrs; filtering of emulated_msrs
+ * kvm-specific. Those are put in emulated_msrs_all; filtering of emulated_msrs
* may depend on host virtualization features rather than host cpu features.
*/
-static u32 msrs_to_save[] = {
+const u32 msrs_to_save_all[] = {
MSR_IA32_SYSENTER_CS, MSR_IA32_SYSENTER_ESP, MSR_IA32_SYSENTER_EIP,
MSR_STAR,
#ifdef CONFIG_X86_64
@@ -1179,9 +1181,10 @@ static u32 msrs_to_save[] = {
MSR_ARCH_PERFMON_EVENTSEL0 + 16, MSR_ARCH_PERFMON_EVENTSEL0 + 17,
};
+static u32 *msrs_to_save;
static unsigned num_msrs_to_save;
-static u32 emulated_msrs[] = {
+const u32 emulated_msrs_all[] = {
MSR_KVM_SYSTEM_TIME, MSR_KVM_WALL_CLOCK,
MSR_KVM_SYSTEM_TIME_NEW, MSR_KVM_WALL_CLOCK_NEW,
HV_X64_MSR_GUEST_OS_ID, HV_X64_MSR_HYPERCALL,
@@ -1220,7 +1223,7 @@ static u32 emulated_msrs[] = {
* by arch/x86/kvm/vmx/nested.c based on CPUID or other MSRs.
* We always support the "true" VMX control MSRs, even if the host
* processor does not, so I am putting these registers here rather
- * than in msrs_to_save.
+ * than in msrs_to_save_all.
*/
MSR_IA32_VMX_BASIC,
MSR_IA32_VMX_TRUE_PINBASED_CTLS,
@@ -1239,13 +1242,14 @@ static u32 emulated_msrs[] = {
MSR_KVM_POLL_CONTROL,
};
+static u32 *emulated_msrs;
static unsigned num_emulated_msrs;
/*
* List of msr numbers which are used to expose MSR-based features that
* can be used by a hypervisor to validate requested CPU features.
*/
-static u32 msr_based_features[] = {
+const u32 msr_based_features_all[] = {
MSR_IA32_VMX_BASIC,
MSR_IA32_VMX_TRUE_PINBASED_CTLS,
MSR_IA32_VMX_PINBASED_CTLS,
@@ -1270,6 +1274,7 @@ static u32 msr_based_features[] = {
MSR_IA32_ARCH_CAPABILITIES,
};
+static u32 *msr_based_features;
static unsigned int num_msr_based_features;
static u64 kvm_get_arch_capabilities(void)
@@ -3311,11 +3316,11 @@ long kvm_arch_dev_ioctl(struct file *filp,
if (n < msr_list.nmsrs)
goto out;
r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msrs_to_save,
+ if (copy_to_user(user_msr_list->indices, msrs_to_save,
num_msrs_to_save * sizeof(u32)))
goto out;
if (copy_to_user(user_msr_list->indices + num_msrs_to_save,
- &emulated_msrs,
+ emulated_msrs,
num_emulated_msrs * sizeof(u32)))
goto out;
r = 0;
@@ -3364,7 +3369,7 @@ long kvm_arch_dev_ioctl(struct file *filp,
if (n < msr_list.nmsrs)
goto out;
r = -EFAULT;
- if (copy_to_user(user_msr_list->indices, &msr_based_features,
+ if (copy_to_user(user_msr_list->indices, msr_based_features,
num_msr_based_features * sizeof(u32)))
goto out;
r = 0;
@@ -5086,26 +5091,41 @@ long kvm_arch_vm_ioctl(struct file *filp,
return r;
}
-static void kvm_init_msr_list(void)
+static int kvm_init_msr_list(void)
{
struct x86_pmu_capability x86_pmu;
u32 dummy[2];
unsigned i, j;
BUILD_BUG_ON_MSG(INTEL_PMC_MAX_FIXED != 4,
- "Please update the fixed PMCs in msrs_to_save[]");
+ "Please update the fixed PMCs in msrs_to_saved_all[]");
perf_get_x86_pmu_capability(&x86_pmu);
- for (i = j = 0; i < ARRAY_SIZE(msrs_to_save); i++) {
- if (rdmsr_safe(msrs_to_save[i], &dummy[0], &dummy[1]) < 0)
+ msrs_to_save = kmalloc(sizeof(msrs_to_save_all),
+ GFP_KERNEL_ACCOUNT);
+ if (!msrs_to_save)
+ return -ENOMEM;
+
+ emulated_msrs = kmalloc(sizeof(emulated_msrs_all),
+ GFP_KERNEL_ACCOUNT);
+ if (!emulated_msrs)
+ goto free_msrs_to_save;
+
+ msr_based_features = kmalloc(sizeof(msr_based_features_all),
+ GFP_KERNEL_ACCOUNT);
+ if (!msr_based_features)
+ goto free_emulated_msrs;
+
+ for (i = j = 0; i < ARRAY_SIZE(msrs_to_save_all); i++) {
+ if (rdmsr_safe(msrs_to_save_all[i], &dummy[0], &dummy[1]) < 0)
continue;
/*
* Even MSRs that are valid in the host may not be exposed
* to the guests in some cases.
*/
- switch (msrs_to_save[i]) {
+ switch (msrs_to_save_all[i]) {
case MSR_IA32_BNDCFGS:
if (!kvm_mpx_supported())
continue;
@@ -5133,17 +5153,17 @@ static void kvm_init_msr_list(void)
break;
case MSR_IA32_RTIT_ADDR0_A ... MSR_IA32_RTIT_ADDR3_B: {
if (!kvm_x86_ops->pt_supported() ||
- msrs_to_save[i] - MSR_IA32_RTIT_ADDR0_A >=
+ msrs_to_save_all[i] - MSR_IA32_RTIT_ADDR0_A >=
intel_pt_validate_hw_cap(PT_CAP_num_address_ranges) * 2)
continue;
break;
case MSR_ARCH_PERFMON_PERFCTR0 ... MSR_ARCH_PERFMON_PERFCTR0 + 17:
- if (msrs_to_save[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
+ if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_PERFCTR0 >=
min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
continue;
break;
case MSR_ARCH_PERFMON_EVENTSEL0 ... MSR_ARCH_PERFMON_EVENTSEL0 + 17:
- if (msrs_to_save[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
+ if (msrs_to_save_all[i] - MSR_ARCH_PERFMON_EVENTSEL0 >=
min(INTEL_PMC_MAX_GENERIC, x86_pmu.num_counters_gp))
continue;
}
@@ -5151,34 +5171,40 @@ static void kvm_init_msr_list(void)
break;
}
- if (j < i)
- msrs_to_save[j] = msrs_to_save[i];
+ if (j <= i)
+ msrs_to_save[j] = msrs_to_save_all[i];
j++;
}
num_msrs_to_save = j;
- for (i = j = 0; i < ARRAY_SIZE(emulated_msrs); i++) {
- if (!kvm_x86_ops->has_emulated_msr(emulated_msrs[i]))
+ for (i = j = 0; i < ARRAY_SIZE(emulated_msrs_all); i++) {
+ if (!kvm_x86_ops->has_emulated_msr(emulated_msrs_all[i]))
continue;
- if (j < i)
- emulated_msrs[j] = emulated_msrs[i];
+ if (j <= i)
+ emulated_msrs[j] = emulated_msrs_all[i];
j++;
}
num_emulated_msrs = j;
- for (i = j = 0; i < ARRAY_SIZE(msr_based_features); i++) {
+ for (i = j = 0; i < ARRAY_SIZE(msr_based_features_all); i++) {
struct kvm_msr_entry msr;
- msr.index = msr_based_features[i];
+ msr.index = msr_based_features_all[i];
if (kvm_get_msr_feature(&msr))
continue;
- if (j < i)
- msr_based_features[j] = msr_based_features[i];
+ if (j <= i)
+ msr_based_features[j] = msr_based_features_all[i];
j++;
}
num_msr_based_features = j;
+ return 0;
+free_emulated_msrs:
+ kfree(emulated_msrs);
+free_msrs_to_save:
+ kfree(msrs_to_save);
+ return -ENOMEM;
}
static int vcpu_mmio_write(struct kvm_vcpu *vcpu, gpa_t addr, int len,
@@ -9294,12 +9320,14 @@ int kvm_arch_hardware_setup(void)
kvm_default_tsc_scaling_ratio = 1ULL << kvm_tsc_scaling_ratio_frac_bits;
}
- kvm_init_msr_list();
- return 0;
+ return kvm_init_msr_list();
}
void kvm_arch_hardware_unsetup(void)
{
+ kfree(msr_based_features);
+ kfree(emulated_msrs);
+ kfree(msrs_to_save);
kvm_x86_ops->hardware_unsetup();
}
--
2.17.1