June 2019 - Linux-stable-mirror

[patch 14/15] mm, swap: fix THP swap out

by akpm＠linux-foundation.org

From: Huang Ying <ying.huang(a)intel.com>
Subject: mm, swap: fix THP swap out

The 0-Day test system reported some OOM regressions for several THP
(Transparent Huge Page) swap test cases.  These regressions were bisected
to 6861428921b5 ("block: always define BIO_MAX_PAGES as 256").  In that
commit, BIO_MAX_PAGES is set to 256 even when THP swap is enabled.  So the
bio_alloc(gfp_flags, 512) in get_swap_bio() may fail when swapping out
THP.  That causes the OOM.

As in the patch description of 6861428921b5 ("block: always define
BIO_MAX_PAGES as 256"), THP swap should use a multi-page bvec to write THP
to swap space.  So the issue is fixed by doing that in get_swap_bio().

BTW: I remember checking the THP swap code when 6861428921b5 ("block:
always define BIO_MAX_PAGES as 256") was merged, and thought the THP swap
code didn't need to be changed.  But apparently, I was wrong.  I should
have done this at that time.

Link: http://lkml.kernel.org/r/20190624075515.31040-1-ying.huang@intel.com
Fixes: 6861428921b5 ("block: always define BIO_MAX_PAGES as 256")
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com>
Reviewed-by: Ming Lei <ming.lei(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Rik van Riel <riel(a)redhat.com>
Cc: Daniel Jordan <daniel.m.jordan(a)oracle.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_io.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

--- a/mm/page_io.c~mm-swap-fix-thp-swap-out
+++ a/mm/page_io.c
@@ -29,10 +29,9 @@
 static struct bio *get_swap_bio(gfp_t gfp_flags,
 				struct page *page, bio_end_io_t end_io)
 {
-	int i, nr = hpage_nr_pages(page);
 	struct bio *bio;
 
-	bio = bio_alloc(gfp_flags, nr);
+	bio = bio_alloc(gfp_flags, 1);
 	if (bio) {
 		struct block_device *bdev;
 
@@ -41,9 +40,7 @@ static struct bio *get_swap_bio(gfp_t gf
 		bio->bi_iter.bi_sector <<= PAGE_SHIFT - 9;
 		bio->bi_end_io = end_io;
 
-		for (i = 0; i < nr; i++)
-			bio_add_page(bio, page + i, PAGE_SIZE, 0);
-		VM_BUG_ON(bio->bi_iter.bi_size != PAGE_SIZE * nr);
+		bio_add_page(bio, page, PAGE_SIZE * hpage_nr_pages(page), 0);
 	}
 	return bio;
 }
_
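To make the vector arithmetic concrete, here is a user-space sketch, not kernel code. It assumes 4 KiB pages and a 2 MiB THP (512 subpages, matching the bio_alloc(gfp_flags, 512) call described above) and models bio_alloc() refusing requests for more than BIO_MAX_PAGES = 256 vectors. All model_* names are hypothetical stand-ins, not block-layer APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed values: 4 KiB pages, 2 MiB THP, BIO_MAX_PAGES fixed at 256. */
#define MODEL_PAGE_SIZE      4096u
#define MODEL_BIO_MAX_PAGES  256u
#define MODEL_HPAGE_NR_PAGES 512u

/* Hypothetical stand-in for struct bio: only tracks bvec slots and bytes. */
struct model_bio {
	unsigned int max_vecs;  /* slots reserved by the allocation */
	unsigned int nr_vecs;   /* slots consumed so far */
	size_t size;            /* total bytes described by the bio */
};

/* Models bio_alloc() after 6861428921b5: requests above the limit fail. */
static bool model_bio_alloc(struct model_bio *bio, unsigned int nr_vecs)
{
	if (nr_vecs > MODEL_BIO_MAX_PAGES)
		return false;
	bio->max_vecs = nr_vecs;
	bio->nr_vecs = 0;
	bio->size = 0;
	return true;
}

/* Each call uses one bvec slot; a multi-page bvec may cover many bytes. */
static bool model_bio_add_page(struct model_bio *bio, size_t len)
{
	if (bio->nr_vecs == bio->max_vecs)
		return false;
	bio->nr_vecs++;
	bio->size += len;
	return true;
}

/* Pre-fix get_swap_bio(): one 4 KiB bvec per subpage. */
static bool model_get_swap_bio_old(struct model_bio *bio, unsigned int nr)
{
	if (!model_bio_alloc(bio, nr))
		return false;  /* stands in for the allocation failure */
	for (unsigned int i = 0; i < nr; i++)
		model_bio_add_page(bio, MODEL_PAGE_SIZE);
	return true;
}

/* Post-fix get_swap_bio(): one bvec spanning the whole huge page. */
static bool model_get_swap_bio_new(struct model_bio *bio, unsigned int nr)
{
	if (!model_bio_alloc(bio, 1))
		return false;
	return model_bio_add_page(bio, (size_t)MODEL_PAGE_SIZE * nr);
}
```

In this model the old path fails for a 512-page THP, while the new path succeeds with a single vector describing 2 MiB.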

[patch 10/15] mm/page_idle.c: fix oops because end_pfn is larger than max_pfn

by akpm＠linux-foundation.org

From: Colin Ian King <colin.king(a)canonical.com>
Subject: mm/page_idle.c: fix oops because end_pfn is larger than max_pfn

Currently the calculation of end_pfn can round up the pfn number to more
than the actual maximum number of pfns, causing an Oops.  Fix this by
ensuring end_pfn is never more than max_pfn.

This can easily be triggered on systems where end_pfn gets rounded up to
more than max_pfn, using the idle-page stress-ng stress test:

sudo stress-ng --idle-page 0

[ 3812.222790] BUG: unable to handle kernel paging request at 00000000000020d8
[ 3812.224341] #PF error: [normal kernel read fault]
[ 3812.225144] PGD 0 P4D 0
[ 3812.225626] Oops: 0000 [#1] SMP PTI
[ 3812.226264] CPU: 1 PID: 11039 Comm: stress-ng-idle- Not tainted 5.0.0-5-generic #6-Ubuntu
[ 3812.227643] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[ 3812.229286] RIP: 0010:page_idle_get_page+0xc8/0x1a0
[ 3812.230173] Code: 0f b1 0a 75 7d 48 8b 03 48 89 c2 48 c1 e8 33 83 e0 07 48 c1 ea 36 48 8d 0c 40 4c 8d 24 88 49 c1 e4 07 4c 03 24 d5 00 89 c3 be <49> 8b 44 24 58 48 8d b8 80 a1 02 00 e8 07 d5 77 00 48 8b 53 08 48
[ 3812.234641] RSP: 0018:ffffafd7c672fde8 EFLAGS: 00010202
[ 3812.235792] RAX: 0000000000000005 RBX: ffffe36341fff700 RCX: 000000000000000f
[ 3812.237739] RDX: 0000000000000284 RSI: 0000000000000275 RDI: 0000000001fff700
[ 3812.239225] RBP: ffffafd7c672fe00 R08: ffffa0bc34056410 R09: 0000000000000276
[ 3812.241027] R10: ffffa0bc754e9b40 R11: ffffa0bc330f6400 R12: 0000000000002080
[ 3812.242555] R13: ffffe36341fff700 R14: 0000000000080000 R15: ffffa0bc330f6400
[ 3812.244073] FS: 00007f0ec1ea5740(0000) GS:ffffa0bc7db00000(0000) knlGS:0000000000000000
[ 3812.245968] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3812.247162] CR2: 00000000000020d8 CR3: 0000000077d68000 CR4: 00000000000006e0
[ 3812.249045] Call Trace:
[ 3812.249625]  page_idle_bitmap_write+0x8c/0x140
[ 3812.250567]  sysfs_kf_bin_write+0x5c/0x70
[ 3812.251406]  kernfs_fop_write+0x12e/0x1b0
[ 3812.252282]  __vfs_write+0x1b/0x40
[ 3812.253002]  vfs_write+0xab/0x1b0
[ 3812.253941]  ksys_write+0x55/0xc0
[ 3812.254660]  __x64_sys_write+0x1a/0x20
[ 3812.255446]  do_syscall_64+0x5a/0x110
[ 3812.256254]  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Link: http://lkml.kernel.org/r/20190618124352.28307-1-colin.king@canonical.com
Fixes: 33c3fc71c8cf ("mm: introduce idle page tracking")
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Acked-by: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_idle.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/page_idle.c~mm-idle-page-fix-oops-because-end_pfn-is-larger-than-max_pfn
+++ a/mm/page_idle.c
@@ -136,7 +136,7 @@ static ssize_t page_idle_bitmap_read(str
 
 	end_pfn = pfn + count * BITS_PER_BYTE;
 	if (end_pfn > max_pfn)
-		end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS);
+		end_pfn = max_pfn;
 
 	for (; pfn < end_pfn; pfn++) {
 		bit = pfn % BITMAP_CHUNK_BITS;
@@ -181,7 +181,7 @@ static ssize_t page_idle_bitmap_write(st
 
 	end_pfn = pfn + count * BITS_PER_BYTE;
 	if (end_pfn > max_pfn)
-		end_pfn = ALIGN(max_pfn, BITMAP_CHUNK_BITS);
+		end_pfn = max_pfn;
 
 	for (; pfn < end_pfn; pfn++) {
 		bit = pfn % BITMAP_CHUNK_BITS;
_
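A user-space sketch of the end_pfn clamping, assuming BITMAP_CHUNK_BITS is 64 (one u64 per bitmap chunk) and using a hypothetical ALIGN_UP macro in place of the kernel's ALIGN(). With a max_pfn that is not a multiple of 64, the old rounding produces an end_pfn past max_pfn, so the loop would visit pfns that have no valid struct page behind them; clamping to max_pfn avoids that.

```c
#include <stdint.h>

/* Assumption for this sketch: one 64-bit word per bitmap chunk. */
#define MODEL_BITMAP_CHUNK_BITS 64u
#define MODEL_BITS_PER_BYTE     8u
/* Hypothetical round-up helper standing in for the kernel's ALIGN(). */
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Pre-fix: rounds up to a chunk boundary, possibly beyond max_pfn. */
static uint64_t end_pfn_old(uint64_t pfn, uint64_t count, uint64_t max_pfn)
{
	uint64_t end_pfn = pfn + count * MODEL_BITS_PER_BYTE;

	if (end_pfn > max_pfn)
		end_pfn = ALIGN_UP(max_pfn, MODEL_BITMAP_CHUNK_BITS);
	return end_pfn;
}

/* Post-fix: never walks past max_pfn. */
static uint64_t end_pfn_new(uint64_t pfn, uint64_t count, uint64_t max_pfn)
{
	uint64_t end_pfn = pfn + count * MODEL_BITS_PER_BYTE;

	if (end_pfn > max_pfn)
		end_pfn = max_pfn;
	return end_pfn;
}
```

For example, with max_pfn = 1000 and an 8-byte request starting at pfn 960, the old code ends at 1024 (24 pfns too far) and the fixed code ends at 1000.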

[patch 07/15] mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge

by akpm＠linux-foundation.org

From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: hugetlb: soft-offline: dissolve_free_huge_page() return zero on !PageHuge

madvise(MADV_SOFT_OFFLINE) often returns -EBUSY when calling soft offline
for hugepages with overcommitting enabled.  That was caused by suboptimal
code in the current soft-offline path.  See the following part:

	ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
				MIGRATE_SYNC, MR_MEMORY_FAILURE);
	if (ret) {
		...
	} else {
		/*
		 * We set PG_hwpoison only when the migration source hugepage
		 * was successfully dissolved, because otherwise hwpoisoned
		 * hugepage remains on free hugepage list, then userspace will
		 * find it as SIGBUS by allocation failure. That's not expected
		 * in soft-offlining.
		 */
		ret = dissolve_free_huge_page(page);
		if (!ret) {
			if (set_hwpoison_free_buddy_page(page))
				num_poisoned_pages_inc();
		}
	}
	return ret;

Here dissolve_free_huge_page() returns -EBUSY if the migration source page
was freed into buddy in migrate_pages(), but even in that case we actually
have a chance that set_hwpoison_free_buddy_page() succeeds.  So the current
code gives up offlining too early.

dissolve_free_huge_page() checks that a given hugepage is suitable for
dissolving, where we should return success for the !PageHuge() case
because the given hugepage is considered already dissolved.

This change also affects other callers of dissolve_free_huge_page(), which
are cleaned up together.

[n-horiguchi(a)ah.jp.nec.com: v3]
Link: http://lkml.kernel.org/r/1560761476-4651-3-git-send-email-n-horiguchi@ah.jp…: http://lkml.kernel.org/r/1560154686-18497-3-git-send-email-n-horiguchi@ah.j…
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Reported-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Tested-by: Chen, Jerry T <jerry.t.chen(a)intel.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/hugetlb.c        | 29 ++++++++++++++++++++---------
 mm/memory-failure.c |  5 +----
 2 files changed, 21 insertions(+), 13 deletions(-)

--- a/mm/hugetlb.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge
+++ a/mm/hugetlb.c
@@ -1510,16 +1510,29 @@ static int free_pool_huge_page(struct hs
 
 /*
  * Dissolve a given free hugepage into free buddy pages. This function does
- * nothing for in-use (including surplus) hugepages. Returns -EBUSY if the
- * dissolution fails because a give page is not a free hugepage, or because
- * free hugepages are fully reserved.
+ * nothing for in-use hugepages and non-hugepages.
+ * This function returns values like below:
+ *
+ *  -EBUSY: failed to dissolved free hugepages or the hugepage is in-use
+ *          (allocated or reserved.)
+ *       0: successfully dissolved free hugepages or the page is not a
+ *          hugepage (considered as already dissolved)
  */
 int dissolve_free_huge_page(struct page *page)
 {
 	int rc = -EBUSY;
 
+	/* Not to disrupt normal path by vainly holding hugetlb_lock */
+	if (!PageHuge(page))
+		return 0;
+
 	spin_lock(&hugetlb_lock);
-	if (PageHuge(page) && !page_count(page)) {
+	if (!PageHuge(page)) {
+		rc = 0;
+		goto out;
+	}
+
+	if (!page_count(page)) {
 		struct page *head = compound_head(page);
 		struct hstate *h = page_hstate(head);
 		int nid = page_to_nid(head);
@@ -1564,11 +1577,9 @@ int dissolve_free_huge_pages(unsigned lo
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += 1 << minimum_order) {
 		page = pfn_to_page(pfn);
-		if (PageHuge(page) && !page_count(page)) {
-			rc = dissolve_free_huge_page(page);
-			if (rc)
-				break;
-		}
+		rc = dissolve_free_huge_page(page);
+		if (rc)
+			break;
 	}
 
 	return rc;
--- a/mm/memory-failure.c~mm-hugetlb-soft-offline-dissolve_free_huge_page-return-zero-on-pagehuge
+++ a/mm/memory-failure.c
@@ -1856,11 +1856,8 @@ static int soft_offline_in_use_page(stru
 
 static int soft_offline_free_page(struct page *page)
 {
-	int rc = 0;
-	struct page *head = compound_head(page);
+	int rc = dissolve_free_huge_page(page);
 
-	if (PageHuge(head))
-		rc = dissolve_free_huge_page(page);
 	if (!rc) {
 		if (set_hwpoison_free_buddy_page(page))
 			num_poisoned_pages_inc();
_
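The new return convention of dissolve_free_huge_page(), and the simplified range loop in dissolve_free_huge_pages(), can be modelled in user space as below. struct model_page and the model_* functions are hypothetical; the sketch ignores hugetlb_lock and the reservation check that can also make the real function return -EBUSY.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified page state; not the kernel's struct page. */
struct model_page {
	bool huge;       /* stands in for PageHuge() */
	int  refcount;   /* stands in for page_count() */
	bool dissolved;  /* set once the model dissolves the page */
};

/* Post-fix convention: 0 for non-huge or dissolved pages, -EBUSY if in use. */
static int model_dissolve_free_huge_page(struct model_page *p)
{
	if (!p->huge)
		return 0;        /* treated as already dissolved */
	if (p->refcount)
		return -EBUSY;   /* in-use hugepage */
	p->huge = false;
	p->dissolved = true;
	return 0;
}

/* Range walk: callers no longer pre-check PageHuge(); stop at first error. */
static int model_dissolve_range(struct model_page *pages, int n)
{
	int rc = 0;

	for (int i = 0; i < n; i++) {
		rc = model_dissolve_free_huge_page(&pages[i]);
		if (rc)
			break;
	}
	return rc;
}
```

Because a non-huge page now yields 0, a page that migrate_pages() already freed into the buddy allocator no longer aborts the soft-offline path before set_hwpoison_free_buddy_page() gets a chance to run.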

[patch 06/15] mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails

by akpm＠linux-foundation.org

From: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Subject: mm: soft-offline: return -EBUSY if set_hwpoison_free_buddy_page() fails

The pass/fail of soft offline should be judged by checking whether the raw
error page was finally contained or not (i.e. the result of
set_hwpoison_free_buddy_page()), but the current code does not work like
that.  It might lead us to misjudge the test result when
set_hwpoison_free_buddy_page() fails.

Without this fix, there are cases where madvise(MADV_SOFT_OFFLINE) may not
offline the original page and will not return an error.

Link: http://lkml.kernel.org/r/1560154686-18497-2-git-send-email-n-horiguchi@ah.j…
Signed-off-by: Naoya Horiguchi <n-horiguchi(a)ah.jp.nec.com>
Fixes: 6bc9b56433b76 ("mm: fix race on soft-offlining")
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Xishi Qiu <xishi.qiuxishi(a)alibaba-inc.com>
Cc: "Chen, Jerry T" <jerry.t.chen(a)intel.com>
Cc: "Zhuo, Qiuxu" <qiuxu.zhuo(a)intel.com>
Cc: <stable(a)vger.kernel.org> [4.19+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/memory-failure.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/mm/memory-failure.c~mm-soft-offline-return-ebusy-if-set_hwpoison_free_buddy_page-fails
+++ a/mm/memory-failure.c
@@ -1730,6 +1730,8 @@ static int soft_offline_huge_page(struct
 		if (!ret) {
 			if (set_hwpoison_free_buddy_page(page))
 				num_poisoned_pages_inc();
+			else
+				ret = -EBUSY;
 		}
 	}
 	return ret;
_
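A minimal user-space model of the success/failure decision at the end of soft_offline_huge_page(), before and after the fix. The parameters stand in for the results of migrate_pages(), dissolve_free_huge_page() and set_hwpoison_free_buddy_page(). The migration-failure branch is not modelled (its value is simply propagated), and every name here is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the tail of soft_offline_huge_page().
 * migrate_ret:  result of migrate_pages()
 * dissolve_ret: result of dissolve_free_huge_page()
 * poisoned:     whether set_hwpoison_free_buddy_page() succeeded
 * fixed:        apply the "else ret = -EBUSY" branch added by the patch
 */
static int model_soft_offline_tail(int migrate_ret, int dissolve_ret,
				   bool poisoned, bool fixed,
				   int *num_poisoned)
{
	int ret = migrate_ret;

	if (ret)
		return ret;  /* migration failure path, not modelled */

	ret = dissolve_ret;
	if (!ret) {
		if (poisoned)
			(*num_poisoned)++;  /* raw error page contained */
		else if (fixed)
			ret = -EBUSY;       /* report that it was not */
	}
	return ret;
}
```

In the model, the pre-fix variant returns 0 even when the page was never poisoned, which matches the silent-success case described in the changelog.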

[patch 05/15] signal: remove the wrong signal_pending() check in restore_user_sigmask()

by akpm＠linux-foundation.org

From: Oleg Nesterov <oleg(a)redhat.com>
Subject: signal: remove the wrong signal_pending() check in restore_user_sigmask()

This is the minimal fix for stable; I'll send cleanups later.

854a6ed56839a40f6b5 ("signal: Add restore_user_sigmask()") introduced a
visible change which breaks user-space: a signal temporarily unblocked by
set_user_sigmask() can be delivered even if the caller returns success or
timeout.

Change restore_user_sigmask() to accept the additional "interrupted"
argument, which should be used instead of the signal_pending() check, and
update the callers.

Eric said:

: For clarity.  I don't think this is required by posix, or fundamentally to
: remove the races in select.  It is what linux has always done and we have
: applications who care so I agree this fix is needed.
:
: Further in any case where the semantic change that this patch rolls back
: (aka where allowing a signal to be delivered and the select like call to
: complete) would be advantage we can do as well if not better by using
: signalfd.
:
: Michael is there any chance we can get this guarantee of the linux
: implementation of pselect and friends clearly documented.  The guarantee
: that if the system call completes successfully we are guaranteed that no
: signal that is unblocked by using sigmask will be delivered?

Link: http://lkml.kernel.org/r/20190604134117.GA29963@redhat.com
Fixes: 854a6ed56839a40f6b5d02a2962f48841482eec4 ("signal: Add restore_user_sigmask()")
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Reported-by: Eric Wong <e(a)80x24.org>
Tested-by: Eric Wong <e(a)80x24.org>
Acked-by: "Eric W. Biederman" <ebiederm(a)xmission.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Acked-by: Deepa Dinamani <deepa.kernel(a)gmail.com>
Cc: Michael Kerrisk <mtk.manpages(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: Jason Baron <jbaron(a)akamai.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Al Viro <viro(a)ZenIV.linux.org.uk>
Cc: David Laight <David.Laight(a)ACULAB.COM>
Cc: <stable(a)vger.kernel.org> [5.0+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/aio.c               | 28 ++++++++++++++++++++--------
 fs/eventpoll.c         |  4 ++--
 fs/io_uring.c          |  7 ++++---
 fs/select.c            | 18 ++++++------------
 include/linux/signal.h |  2 +-
 kernel/signal.c        |  5 +++--
 6 files changed, 36 insertions(+), 28 deletions(-)

--- a/fs/aio.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/aio.c
@@ -2095,6 +2095,7 @@ SYSCALL_DEFINE6(io_pgetevents,
 	struct __aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts;
+	bool interrupted;
 	int ret;
 
 	if (timeout && unlikely(get_timespec64(&ts, timeout)))
@@ -2108,8 +2109,10 @@ SYSCALL_DEFINE6(io_pgetevents,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+
+	interrupted = signal_pending(current);
+	restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+	if (interrupted && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2128,6 +2131,7 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
 	struct __aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 ts;
+	bool interrupted;
 	int ret;
 
 	if (timeout && unlikely(get_old_timespec32(&ts, timeout)))
@@ -2142,8 +2146,10 @@ SYSCALL_DEFINE6(io_pgetevents_time32,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &ts : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+
+	interrupted = signal_pending(current);
+	restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+	if (interrupted && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2193,6 +2199,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 	struct __compat_aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 t;
+	bool interrupted;
 	int ret;
 
 	if (timeout && get_old_timespec32(&t, timeout))
@@ -2206,8 +2213,10 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents,
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+
+	interrupted = signal_pending(current);
+	restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+	if (interrupted && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
@@ -2226,6 +2235,7 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_tim
 	struct __compat_aio_sigset ksig = { NULL, };
 	sigset_t ksigmask, sigsaved;
 	struct timespec64 t;
+	bool interrupted;
 	int ret;
 
 	if (timeout && get_timespec64(&t, timeout))
@@ -2239,8 +2249,10 @@ COMPAT_SYSCALL_DEFINE6(io_pgetevents_tim
 		return ret;
 
 	ret = do_io_getevents(ctx_id, min_nr, nr, events, timeout ? &t : NULL);
-	restore_user_sigmask(ksig.sigmask, &sigsaved);
-	if (signal_pending(current) && !ret)
+
+	interrupted = signal_pending(current);
+	restore_user_sigmask(ksig.sigmask, &sigsaved, interrupted);
+	if (interrupted && !ret)
 		ret = -ERESTARTNOHAND;
 
 	return ret;
--- a/fs/eventpoll.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/eventpoll.c
@@ -2325,7 +2325,7 @@ SYSCALL_DEFINE6(epoll_pwait, int, epfd,
 
 	error = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	restore_user_sigmask(sigmask, &sigsaved, error == -EINTR);
 
 	return error;
 }
@@ -2350,7 +2350,7 @@ COMPAT_SYSCALL_DEFINE6(epoll_pwait, int,
 
 	err = do_epoll_wait(epfd, events, maxevents, timeout);
 
-	restore_user_sigmask(sigmask, &sigsaved);
+	restore_user_sigmask(sigmask, &sigsaved, err == -EINTR);
 
 	return err;
 }
--- a/fs/io_uring.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/io_uring.c
@@ -2201,11 +2201,12 @@ static int io_cqring_wait(struct io_ring
 	}
 
 	ret = wait_event_interruptible(ctx->wait, io_cqring_events(ring) >= min_events);
-	if (ret == -ERESTARTSYS)
-		ret = -EINTR;
 
 	if (sig)
-		restore_user_sigmask(sig, &sigsaved);
+		restore_user_sigmask(sig, &sigsaved, ret == -ERESTARTSYS);
+
+	if (ret == -ERESTARTSYS)
+		ret = -EINTR;
 
 	return READ_ONCE(ring->r.head) == READ_ONCE(ring->r.tail) ? ret : 0;
 }
--- a/fs/select.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/fs/select.c
@@ -758,10 +758,9 @@ static long do_pselect(int n, fd_set __u
 		return ret;
 
 	ret = core_sys_select(n, inp, outp, exp, to);
+	restore_user_sigmask(sigmask, &sigsaved, ret == -ERESTARTNOHAND);
 	ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
 	return ret;
 }
 
@@ -1106,8 +1105,7 @@ SYSCALL_DEFINE5(ppoll, struct pollfd __u
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
+	restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
 	/* We can restart this syscall, usually */
 	if (ret == -EINTR)
 		ret = -ERESTARTNOHAND;
@@ -1142,8 +1140,7 @@ SYSCALL_DEFINE5(ppoll_time32, struct pol
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
+	restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
 	/* We can restart this syscall, usually */
 	if (ret == -EINTR)
 		ret = -ERESTARTNOHAND;
@@ -1350,10 +1347,9 @@ static long do_compat_pselect(int n, com
 		return ret;
 
 	ret = compat_core_sys_select(n, inp, outp, exp, to);
+	restore_user_sigmask(sigmask, &sigsaved, ret == -ERESTARTNOHAND);
 	ret = poll_select_copy_remaining(&end_time, tsp, type, ret);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
 	return ret;
 }
 
@@ -1425,8 +1421,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time32, str
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
+	restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
 	/* We can restart this syscall, usually */
 	if (ret == -EINTR)
 		ret = -ERESTARTNOHAND;
@@ -1461,8 +1456,7 @@ COMPAT_SYSCALL_DEFINE5(ppoll_time64, str
 
 	ret = do_sys_poll(ufds, nfds, to);
 
-	restore_user_sigmask(sigmask, &sigsaved);
-
+	restore_user_sigmask(sigmask, &sigsaved, ret == -EINTR);
 	/* We can restart this syscall, usually */
 	if (ret == -EINTR)
 		ret = -ERESTARTNOHAND;
--- a/include/linux/signal.h~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/include/linux/signal.h
@@ -276,7 +276,7 @@ extern int sigprocmask(int, sigset_t *,
 extern int set_user_sigmask(const sigset_t __user *usigmask, sigset_t *set,
 	sigset_t *oldset, size_t sigsetsize);
 extern void restore_user_sigmask(const void __user *usigmask,
-				 sigset_t *sigsaved);
+				 sigset_t *sigsaved, bool interrupted);
 extern void set_current_blocked(sigset_t *);
 extern void __set_current_blocked(const sigset_t *);
 extern int show_unhandled_signals;
--- a/kernel/signal.c~signal-remove-the-wrong-signal_pending-check-in-restore_user_sigmask
+++ a/kernel/signal.c
@@ -2912,7 +2912,8 @@ EXPORT_SYMBOL(set_compat_user_sigmask);
  * This is useful for syscalls such as ppoll, pselect, io_pgetevents and
  * epoll_pwait where a new sigmask is passed in from userland for the syscalls.
  */
-void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved)
+void restore_user_sigmask(const void __user *usigmask, sigset_t *sigsaved,
+				bool interrupted)
 {
 
 	if (!usigmask)
@@ -2922,7 +2923,7 @@ void restore_user_sigmask(const void __u
 	 * Restoring sigmask here can lead to delivering signals that the above
 	 * syscalls are intended to block because of the sigmask passed in.
 	 */
-	if (signal_pending(current)) {
+	if (interrupted) {
 		current->saved_sigmask = *sigsaved;
 		set_restore_sigmask();
 		return;
_
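The behavioural difference can be reduced to a small decision function. This is a user-space sketch under the assumption that restore_user_sigmask() either restores the saved mask immediately (so a temporarily unblocked signal stays pending and blocked) or defers the restore until after the signal has been delivered on return to user space. The enum and function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical outcomes of restore_user_sigmask() in this model. */
enum restore_action {
	RESTORE_NOW,       /* old mask back at once; signal stays blocked */
	RESTORE_DEFERRED,  /* signal delivered first, mask restored after */
};

/* Pre-fix: decided by signal_pending() at the moment of the call. */
static enum restore_action restore_old(bool signal_pending)
{
	return signal_pending ? RESTORE_DEFERRED : RESTORE_NOW;
}

/* Post-fix: decided by whether the syscall itself reports interruption. */
static enum restore_action restore_new(bool interrupted)
{
	return interrupted ? RESTORE_DEFERRED : RESTORE_NOW;
}

/* epoll_pwait()-style caller: interrupted only if the wait returned -EINTR. */
static enum restore_action epoll_pwait_restore(int error)
{
	return restore_new(error == -EINTR);
}
```

With the old check, a signal that arrives after do_epoll_wait() has already collected events (error > 0) is still delivered, because signal_pending() is true at restore time. Keying the decision on the return value keeps such a signal blocked, which is the long-standing behaviour Eric describes.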

[patch 04/15] fs/binfmt_flat.c: make load_flat_shared_library() work

by akpm＠linux-foundation.org

From: Jann Horn <jannh(a)google.com>
Subject: fs/binfmt_flat.c: make load_flat_shared_library() work

load_flat_shared_library() is broken: It only calls load_flat_file() if
prepare_binprm() returns zero, but prepare_binprm() returns the number of
bytes read - so this only happens if the file is empty.

Instead, call into load_flat_file() if the number of bytes read is
non-negative.  (Even if the number of bytes is zero - in that case,
load_flat_file() will see nullbytes and return a nice -ENOEXEC.)

In addition, remove the code related to bprm creds and stop using
prepare_binprm() - this code is loading a library, not a main executable,
and it only actually uses the members "buf", "file" and "filename" of the
linux_binprm struct.  Instead, call kernel_read() directly.

Link: http://lkml.kernel.org/r/20190524201817.16509-1-jannh@google.com
Fixes: 287980e49ffc ("remove lots of IS_ERR_VALUE abuses")
Signed-off-by: Jann Horn <jannh(a)google.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Nicolas Pitre <nicolas.pitre(a)linaro.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Greg Ungerer <gerg(a)linux-m68k.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/binfmt_flat.c | 23 +++++++----------------
 1 file changed, 7 insertions(+), 16 deletions(-)

--- a/fs/binfmt_flat.c~binfmt_flat-make-load_flat_shared_library-work
+++ a/fs/binfmt_flat.c
@@ -856,9 +856,14 @@ err:
 
 static int load_flat_shared_library(int id, struct lib_info *libs)
 {
+	/*
+	 * This is a fake bprm struct; only the members "buf", "file" and
+	 * "filename" are actually used.
+	 */
 	struct linux_binprm bprm;
 	int res;
 	char buf[16];
+	loff_t pos = 0;
 
 	memset(&bprm, 0, sizeof(bprm));
 
@@ -872,25 +877,11 @@ static int load_flat_shared_library(int
 	if (IS_ERR(bprm.file))
 		return res;
 
-	bprm.cred = prepare_exec_creds();
-	res = -ENOMEM;
-	if (!bprm.cred)
-		goto out;
-
-	/* We don't really care about recalculating credentials at this point
-	 * as we're past the point of no return and are dealing with shared
-	 * libraries.
-	 */
-	bprm.called_set_creds = 1;
-
-	res = prepare_binprm(&bprm);
+	res = kernel_read(bprm.file, bprm.buf, BINPRM_BUF_SIZE, &pos);
 
-	if (!res)
+	if (res >= 0)
 		res = load_flat_file(&bprm, libs, id, NULL);
 
-	abort_creds(bprm.cred);
-
-out:
 	allow_write_access(bprm.file);
 	fput(bprm.file);
 
_
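The inverted check can be seen in isolation with a user-space model. model_load_flat_file() is a hypothetical stand-in that only checks for the flat-binary magic (assumed here to be "bFLT"), and res plays the role of the byte count returned by prepare_binprm() before the fix or by kernel_read() after it.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical loader: rejects a header that does not start with "bFLT". */
static int model_load_flat_file(const char *buf, ssize_t len)
{
	if (len < 4 || memcmp(buf, "bFLT", 4) != 0)
		return -ENOEXEC;  /* e.g. an empty file reads as null bytes */
	return 0;
}

/*
 * fixed == false models the old "if (!res)" test,
 * fixed == true models the new "if (res >= 0)" test.
 */
static int model_load_shared_library(const char *buf, ssize_t res, bool fixed)
{
	bool proceed = fixed ? (res >= 0) : (res == 0);

	if (!proceed)
		return (int)res;  /* an error code, or (pre-fix) a byte count */
	return model_load_flat_file(buf, res);
}
```

In the model, a normal library header read of 16 bytes never reaches the loader with the old test and is simply returned as 16; with the new test it is parsed, and a read error is still propagated unchanged.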

[patch 03/15] mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask

by akpm＠linux-foundation.org

From: zhong jiang <zhongjiang(a)huawei.com>
Subject: mm/mempolicy.c: fix an incorrect rebind node in mpol_rebind_nodemask

mpol_rebind_nodemask() is called for MPOL_BIND and MPOL_INTERLEAVE
mempolicies when the task's cpuset's mems_allowed changes.  For policies
created without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES, it works by
remapping the policy's allowed nodes (stored in v.nodes) using the
previous value of mems_allowed (stored in w.cpuset_mems_allowed) as the
domain of the map and the new mems_allowed (passed as nodes) as the range
of the map (see the comment of bitmap_remap() for details).

The result of remapping is stored back as the policy's nodemask in
v.nodes, and the new value of mems_allowed should be stored in
w.cpuset_mems_allowed to facilitate the next rebind, if it happens.

However, 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when
updating cpusets") introduced a bug where the result of remapping is
stored in w.cpuset_mems_allowed instead.  Thus, a mempolicy's allowed
nodes can evolve in an unexpected way after a series of rebindings due to
cpuset mems_allowed changes, possibly binding to a wrong node or a smaller
number of nodes which may e.g. overload them.  This patch fixes the bug so
rebinding again works as intended.

[vbabka(a)suse.cz: new changelog]
Link: http://lkml.kernel.org/r/ef6a69c6-c052-b067-8f2c-9d615c619bb9@suse.cz
Link: http://lkml.kernel.org/r/1558768043-23184-1-git-send-email-zhongjiang@huawe…
Fixes: 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
Signed-off-by: zhong jiang <zhongjiang(a)huawei.com>
Reviewed-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Anshuman Khandual <khandual(a)linux.vnet.ibm.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Ralph Campbell <rcampbell(a)nvidia.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/mempolicy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/mempolicy.c~mm-mempolicy-fix-an-incorrect-rebind-node-in-mpol_rebind_nodemask
+++ a/mm/mempolicy.c
@@ -306,7 +306,7 @@ static void mpol_rebind_nodemask(struct
 	else {
 		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
 								*nodes);
-		pol->w.cpuset_mems_allowed = tmp;
+		pol->w.cpuset_mems_allowed = *nodes;
 	}
 
 	if (nodes_empty(tmp))
_
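The rebind sequence can be replayed in user space with node masks held in a uint32_t. model_remap() is a simplified re-implementation of the rule documented for bitmap_remap() (the k-th set bit of the old mask maps to the (k mod w)-th set bit of the new mask, where w is the weight of the new mask; bits not set in the old mask, or every bit when the new mask is empty, map to themselves). model_rebind() is a hypothetical reduction of mpol_rebind_nodemask() for policies without MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES; its empty-result fallback is assumed to mirror the nodes_empty(tmp) check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of set bits in m. */
static int weight(uint32_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/* Position of the k-th (0-based) set bit of m; caller ensures k < weight(m). */
static int nth_bit(uint32_t m, int k)
{
	for (int pos = 0; pos < 32; pos++) {
		if (m & (UINT32_C(1) << pos)) {
			if (k == 0)
				return pos;
			k--;
		}
	}
	return -1;
}

/* Simplified bitmap_remap() on 32-bit masks. */
static uint32_t model_remap(uint32_t src, uint32_t old, uint32_t newm)
{
	uint32_t dst = 0;
	int w = weight(newm);

	for (int n = 0; n < 32; n++) {
		uint32_t bit = UINT32_C(1) << n;

		if (!(src & bit))
			continue;
		if (w == 0 || !(old & bit)) {
			dst |= bit;  /* identity mapping */
		} else {
			int k = weight(old & (bit - 1));  /* ordinal of n in old */
			dst |= UINT32_C(1) << nth_bit(newm, k % w);
		}
	}
	return dst;
}

struct model_policy {
	uint32_t nodes;                /* v.nodes */
	uint32_t cpuset_mems_allowed;  /* w.cpuset_mems_allowed */
};

static void model_rebind(struct model_policy *pol, uint32_t new_mems, bool fixed)
{
	uint32_t tmp = model_remap(pol->nodes, pol->cpuset_mems_allowed, new_mems);

	/* The one-line fix: remember the new mems_allowed, not the result. */
	pol->cpuset_mems_allowed = fixed ? new_mems : tmp;
	if (tmp == 0)
		tmp = new_mems;
	pol->nodes = tmp;
}
```

In this model, a policy on nodes {2,3} inside mems {0-3} that is moved to mems {4-7} and back ends up on {2,3} again with the fix, but on {0,1} when the remap result is stored as cpuset_mems_allowed.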

[patch 02/15] fs/proc/array.c: allow reporting eip/esp for all coredumping threads

by akpm＠linux-foundation.org

From: John Ogness <john.ogness(a)linutronix.de>
Subject: fs/proc/array.c: allow reporting eip/esp for all coredumping threads

0a1eb2d474ed ("fs/proc: Stop reporting eip and esp in /proc/PID/stat")
stopped reporting eip/esp and fd7d56270b52 ("fs/proc: Report eip/esp in
/prod/PID/stat for coredumping") reintroduced the feature to fix a
regression with userspace core dump handlers (such as minicoredumper).

Because PF_DUMPCORE is only set for the primary thread, this didn't fix
the original problem for secondary threads.  Allow reporting the eip/esp
for all threads by checking for PF_EXITING as well.  This is set for all
the other threads when they are killed.  coredump_wait() waits for all the
tasks to become inactive before proceeding to invoke a core dumper.

Link: http://lkml.kernel.org/r/87y32p7i7a.fsf@linutronix.de
Link: http://lkml.kernel.org/r/20190522161614.628-1-jlu@pengutronix.de
Fixes: fd7d56270b526ca3 ("fs/proc: Report eip/esp in /prod/PID/stat for coredumping")
Signed-off-by: John Ogness <john.ogness(a)linutronix.de>
Reported-by: Jan Luebbe <jlu(a)pengutronix.de>
Tested-by: Jan Luebbe <jlu(a)pengutronix.de>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 fs/proc/array.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/fs/proc/array.c~fs-proc-allow-reporting-eip-esp-for-all-coredumping-threads
+++ a/fs/proc/array.c
@@ -462,7 +462,7 @@ static int do_task_stat(struct seq_file
 		 * a program is not able to use ptrace(2) in that case. It is
 		 * safe because the task has stopped executing permanently.
 		 */
-		if (permitted && (task->flags & PF_DUMPCORE)) {
+		if (permitted && (task->flags & (PF_EXITING|PF_DUMPCORE))) {
 			if (try_get_task_stack(task)) {
 				eip = KSTK_EIP(task);
 				esp = KSTK_ESP(task);
_
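A tiny user-space model of the permission check in do_task_stat(), before and after the change. The flag values are illustrative and assumed for this sketch (see include/linux/sched.h for the real definitions); the point is only that a secondary thread killed during a core dump carries PF_EXITING but not PF_DUMPCORE.

```c
#include <stdbool.h>

/* Illustrative flag values, assumed for this sketch. */
#define MODEL_PF_EXITING  0x00000004u
#define MODEL_PF_DUMPCORE 0x00000200u

/* Pre-fix: only the thread that sets PF_DUMPCORE reports eip/esp. */
static bool report_eip_esp_old(bool permitted, unsigned int flags)
{
	return permitted && (flags & MODEL_PF_DUMPCORE);
}

/* Post-fix: exiting (killed) threads qualify as well. */
static bool report_eip_esp_new(bool permitted, unsigned int flags)
{
	return permitted && (flags & (MODEL_PF_EXITING | MODEL_PF_DUMPCORE));
}
```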

[4.9.y PATCH 0/2] Backported fixes for 4.9 stable tree

by Srivatsa S. Bhat

Hi Greg,

This patchset includes a few backported fixes for the 4.9 stable tree.
I would appreciate it if you could kindly consider including them in the
next release.

Thank you!

Regards,
Srivatsa

---

Gen Zhang (2):
      ip_sockglue: Fix missing-check bug in ip_ra_control()
      ipv6_sockglue: Fix a missing-check bug in ip6_ra_control()

 net/ipv4/ip_sockglue.c   | 2 ++
 net/ipv6/ipv6_sockglue.c | 2 ++
 2 files changed, 4 insertions(+)

[PATCH v3] mtd: spinand: read return badly if the last page has bitflips

by liaoweixiong

If the last page contains bitflips (ret > 0), spinand_mtd_read() will
return the number of bitflips for that last page only.  But to me it looks
like it should instead return max_bitflips, as it does when the last page
read returns 0.

Signed-off-by: Weixiong Liao <liaoweixiong(a)allwinnertech.com>
Reviewed-by: Boris Brezillon <boris.brezillon(a)collabora.com>
Reviewed-by: Frieder Schrempf <frieder.schrempf(a)kontron.de>
Cc: stable(a)vger.kernel.org
Fixes: 7529df465248 ("mtd: nand: Add core infrastructure to support SPI NANDs")
---
Changes since v2:
- Resend this patch with Cc and Fixes tags.

Changes since v1:
- More accurate description for this patch

---
 drivers/mtd/nand/spi/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/mtd/nand/spi/core.c b/drivers/mtd/nand/spi/core.c
index 556bfdb..6b9388d 100644
--- a/drivers/mtd/nand/spi/core.c
+++ b/drivers/mtd/nand/spi/core.c
@@ -511,12 +511,12 @@ static int spinand_mtd_read(struct mtd_info *mtd, loff_t from,
 		if (ret == -EBADMSG) {
 			ecc_failed = true;
 			mtd->ecc_stats.failed++;
-			ret = 0;
 		} else {
 			mtd->ecc_stats.corrected += ret;
 			max_bitflips = max_t(unsigned int, max_bitflips, ret);
 		}
 
+		ret = 0;
 		ops->retlen += iter.req.datalen;
 		ops->oobretlen += iter.req.ooblen;
 	}
-- 
1.9.1
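A user-space model of the page loop shows what the return value becomes. The loop body follows the hunk above; the tail (ret is turned into -EBADMSG when ecc_failed is set and ret is 0, then the function returns ret if non-zero, else max_bitflips) is an assumption about the rest of spinand_mtd_read(), which the diff does not show. page_ret[] is a hypothetical list of per-page read results.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the spinand_mtd_read() page loop.
 * page_ret[i] is what reading page i returned: >= 0 corrected bitflips,
 * -EBADMSG for an uncorrectable page, or another negative I/O error.
 */
static int model_spinand_read(const int *page_ret, int npages, bool fixed)
{
	unsigned int max_bitflips = 0;
	bool ecc_failed = false;
	int ret = 0;

	for (int i = 0; i < npages; i++) {
		ret = page_ret[i];
		if (ret < 0 && ret != -EBADMSG)
			break;  /* hard error ends the read */

		if (ret == -EBADMSG) {
			ecc_failed = true;
			ret = 0;  /* pre-fix: the only place ret was reset */
		} else if ((unsigned int)ret > max_bitflips) {
			max_bitflips = (unsigned int)ret;
		}

		if (fixed)
			ret = 0;  /* post-fix: reset after every page */
	}

	/* Assumed tail of the real function. */
	if (ecc_failed && !ret)
		ret = -EBADMSG;
	return ret ? ret : (int)max_bitflips;
}
```

In this model, per-page results {3, 1} give 1 with the old code and 3 with the fix; with {-EBADMSG, 4} the old code would even return 4 and hide the ECC failure, while the fixed code returns -EBADMSG.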
