December 2018 - Linux-stable-mirror

4.19.5 and later has tons of false messages "BUG: non-zero pgtables_bytes on freeing mm: -16384"

by Christian Borntraeger

Martin,

Right now you get a message "BUG: non-zero pgtables_bytes on freeing mm: -16384" for EVERY process that exits in 4.19.5 and later.

Bisect points to

commit 4136161d676a93fc8df6bdb80d720c15522d6c24
Author: Martin Schwidefsky <schwidefsky(a)de.ibm.com>
Date:   Mon Oct 15 11:09:16 2018 +0200

    s390/mm: fix mis-accounting of pgtable_bytes

    [ Upstream commit e12e4044aede97974f2222eb7f0ed726a5179a32 ]

It turns out that this patch requires several dependencies, which the autoselection of this patch missed.

Can we either revert this patch or add the dependencies?

Christian


[PATCH stable] ubifs: Handle re-linking of inodes correctly while recovery

by Rafał Miłecki

From: Richard Weinberger <richard(a)nod.at>

commit e58725d51fa8da9133f3f1c54170aa2e43056b91 upstream.

UBIFS's recovery code strictly assumes that a deleted inode will never
come back, therefore it removes all data which belongs to that inode
as soon it faces an inode with link count 0 in the replay list.
Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE
it can lead to data loss upon a power-cut.

Consider a journal with entries like:
0: inode X (nlink = 0) /* O_TMPFILE was created */
1: data for inode X /* Someone writes to the temp file */
2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */
3: inode X (nlink = 1) /* inode was re-linked via linkat() */

Upon replay of entry #2 UBIFS will drop all data that belongs to inode X,
this will lead to an empty file after mounting.

As solution for this problem, scan the replay list for a re-link entry
before dropping data.

Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE")
Cc: stable(a)vger.kernel.org # 4.9-4.18
Cc: Russell Senior <russell(a)personaltelco.net>
Cc: Rafał Miłecki <zajec5(a)gmail.com>
Reported-by: Russell Senior <russell(a)personaltelco.net>
Reported-by: Rafał Miłecki <zajec5(a)gmail.com>
Tested-by: Rafał Miłecki <rafal(a)milecki.pl>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
[rmilecki: update ubifs_assert() calls to compile with 4.18 and older]
Signed-off-by: Rafał Miłecki <rafal(a)milecki.pl>
(cherry picked from commit e58725d51fa8da9133f3f1c54170aa2e43056b91)
---
 fs/ubifs/replay.c | 37 +++++++++++++++++++++++++++++++++++++
 1 file changed, 37 insertions(+)

diff --git a/fs/ubifs/replay.c b/fs/ubifs/replay.c
index ae5c02f22f3e..d998fbf7de30 100644
--- a/fs/ubifs/replay.c
+++ b/fs/ubifs/replay.c
@@ -210,6 +210,38 @@ static int trun_remove_range(struct ubifs_info *c, struct replay_entry *r)
 }
 
 /**
+ * inode_still_linked - check whether inode in question will be re-linked.
+ * @c: UBIFS file-system description object
+ * @rino: replay entry to test
+ *
+ * O_TMPFILE files can be re-linked, this means link count goes from 0 to 1.
+ * This case needs special care, otherwise all references to the inode will
+ * be removed upon the first replay entry of an inode with link count 0
+ * is found.
+ */
+static bool inode_still_linked(struct ubifs_info *c, struct replay_entry *rino)
+{
+	struct replay_entry *r;
+
+	ubifs_assert(rino->deletion);
+	ubifs_assert(key_type(c, &rino->key) == UBIFS_INO_KEY);
+
+	/*
+	 * Find the most recent entry for the inode behind @rino and check
+	 * whether it is a deletion.
+	 */
+	list_for_each_entry_reverse(r, &c->replay_list, list) {
+		ubifs_assert(r->sqnum >= rino->sqnum);
+		if (key_inum(c, &r->key) == key_inum(c, &rino->key))
+			return r->deletion == 0;
+
+	}
+
+	ubifs_assert(0);
+	return false;
+}
+
+/**
  * apply_replay_entry - apply a replay entry to the TNC.
  * @c: UBIFS file-system description object
  * @r: replay entry to apply
@@ -239,6 +271,11 @@ static int apply_replay_entry(struct ubifs_info *c, struct replay_entry *r)
 		{
 			ino_t inum = key_inum(c, &r->key);
 
+			if (inode_still_linked(c, r)) {
+				err = 0;
+				break;
+			}
+
 			err = ubifs_tnc_remove_ino(c, inum);
 			break;
 		}
-- 
2.13.7
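
The reverse scan described in the patch above can be tried out in userspace. The following is only a minimal sketch under stated assumptions: struct toy_replay_entry, toy_journal and toy_inode_still_linked() are hypothetical stand-ins (a plain array replaces the kernel's replay list and key helpers), and the sample journal mirrors the four entries from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct replay_entry. */
struct toy_replay_entry {
	unsigned long inum;        /* inode number the entry refers to */
	unsigned long long sqnum;  /* journal sequence number */
	bool deletion;             /* true when the entry records nlink == 0 */
};

/* The journal from the commit message, oldest entry first (inode X = 42). */
static const struct toy_replay_entry toy_journal[] = {
	{ .inum = 42, .sqnum = 0, .deletion = true  }, /* O_TMPFILE was created */
	{ .inum = 42, .sqnum = 1, .deletion = false }, /* data was written */
	{ .inum = 42, .sqnum = 2, .deletion = true  }, /* inode changed, nlink still 0 */
	{ .inum = 42, .sqnum = 3, .deletion = false }, /* re-linked via linkat() */
};

/*
 * Return true if the most recent entry for the inode behind entries[idx]
 * is not a deletion, i.e. the inode was re-linked later in the journal.
 */
static bool toy_inode_still_linked(const struct toy_replay_entry *entries,
				   size_t count, size_t idx)
{
	assert(idx < count);
	assert(entries[idx].deletion);

	/* Walk from the newest entry backwards; the first match is the latest state. */
	for (size_t i = count; i-- > 0; ) {
		if (entries[i].inum == entries[idx].inum)
			return !entries[i].deletion;
	}

	/* Not reached: entries[idx] itself always matches. */
	return false;
}
```

With all four entries, the check for entry 2 reports that the inode is still linked, so its data would be kept. If the journal is cut off after entry 2, the same check reports a real deletion.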


suggest 00b80ac93553 ("spi: imx: mx51-ecspi: Move some initialisation to prepare_message hook.") for stable backports

by Uwe Kleine-König

Hello,

even though the subject sounds harmless, it fixes a real bug. (Yes, I'm aware that the commit isn't in Linus' tree yet, but I assume your tracking of patches targeting stable is better than mine, so I didn't wait :-)

For backporting you also need its parent commit (i.e. e697271c4e29 ("spi: imx: add a device specific prepare_message callback")).

I have a local backport to v4.14 here which isn't entirely trivial. So just tell me if you need help.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | http://www.pengutronix.de/  |


Re: [PATCH 2/2] USB: storage: add quirk for SMI SM3350

by Icenowy Zheng

On Thu, 2018-12-27 at 22:34 +0800, Icenowy Zheng wrote:
> The SMI SM3350 USB-UFS bridge controller cannot handle long sense
> request correctly and will make the chip refuse to do read/write when
> requested long sense.
>
> Add a bad sense quirk for it.
>
> Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
> ---

I forgot to:

Cc: stable(a)vger.kernel.org

>  drivers/usb/storage/unusual_devs.h | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h
> index f7f83b21dc74..ea0d27a94afe 100644
> --- a/drivers/usb/storage/unusual_devs.h
> +++ b/drivers/usb/storage/unusual_devs.h
> @@ -1265,6 +1265,18 @@ UNUSUAL_DEV( 0x090c, 0x1132, 0x0000, 0xffff,
>  		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
>  		US_FL_FIX_CAPACITY ),
>  
> +/*
> + * Reported by Icenowy Zheng <icenowy(a)aosc.io>
> + * The SMI SM3350 USB-UFS bridge controller will enter a wrong state
> + * that do not process read/write command if a long sense is requested,
> + * so force to use 18-byte sense.
> + */
> +UNUSUAL_DEV( 0x090c, 0x3350, 0x0000, 0xffff,
> +		"SMI",
> +		"SM3350 UFS-to-USB-Mass-Storage bridge",
> +		USB_SC_DEVICE, USB_PR_DEVICE, NULL,
> +		US_FL_BAD_SENSE ),
> +
>  /*
>   * Reported by Paul Hartman <paul.hartman+linux(a)gmail.com>
>   * This card reader returns "Illegal Request, Logical Block Address


Re: [PATCH 1/2] USB: storage: don't insert sane sense for SPC3+ when bad sense specified

by Icenowy Zheng

On Thu, 2018-12-27 at 22:34 +0800, Icenowy Zheng wrote:
> Currently the code will set US_FL_SANE_SENSE flag unconditionally if
> device claims SPC3+, however we should allow US_FL_BAD_SENSE flag to
> prevent this behavior, because SMI SM3350 UFS-USB bridge controller,
> which claims SPC4, will show strange behavior with 96-byte sense
> (put the chip into a wrong state that cannot read/write anything).
>
> Check the presence of US_FL_BAD_SENSE when assuming US_FL_SANE_SENSE
> on SPC4+ devices.
>
> Signed-off-by: Icenowy Zheng <icenowy(a)aosc.io>
> ---

I forgot to:

Cc: stable(a)vger.kernel.org

>  drivers/usb/storage/scsiglue.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
> index fde2e71a6ade..699fe9557127 100644
> --- a/drivers/usb/storage/scsiglue.c
> +++ b/drivers/usb/storage/scsiglue.c
> @@ -236,7 +236,8 @@ static int slave_configure(struct scsi_device *sdev)
>  			sdev->try_rc_10_first = 1;
>  
>  		/* assume SPC3 or latter devices support sense size > 18 */
> -		if (sdev->scsi_level > SCSI_SPC_2)
> +		if (sdev->scsi_level > SCSI_SPC_2 &&
> +		    !(us->fflags & US_FL_BAD_SENSE))
>  			us->fflags |= US_FL_SANE_SENSE;
>  
>  		/*
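
To make the changed condition easier to follow, here is a small userspace sketch of the flag logic only. The TOY_* constants and toy_configure_sense() are hypothetical names with made-up values; the kernel's real flag bits and SCSI level numbering are not reproduced here.

```c
#include <assert.h>

/* Hypothetical flag bits and level threshold, for illustration only. */
#define TOY_FL_SANE_SENSE 0x1u
#define TOY_FL_BAD_SENSE  0x2u
#define TOY_SCSI_SPC_2    4

/*
 * Mirror of the patched condition: assume long sense data works on devices
 * newer than SPC-2, unless a quirk has already marked the device as bad.
 */
static unsigned int toy_configure_sense(unsigned int fflags, int scsi_level)
{
	if (scsi_level > TOY_SCSI_SPC_2 && !(fflags & TOY_FL_BAD_SENSE))
		fflags |= TOY_FL_SANE_SENSE;
	return fflags;
}

/* Small self-check covering the three interesting cases. */
static void toy_sense_demo(void)
{
	/* Newer device without quirk: long sense is assumed to work. */
	assert(toy_configure_sense(0, TOY_SCSI_SPC_2 + 1) == TOY_FL_SANE_SENSE);
	/* Newer device with the quirk: the flag set is left untouched. */
	assert(toy_configure_sense(TOY_FL_BAD_SENSE, TOY_SCSI_SPC_2 + 2) == TOY_FL_BAD_SENSE);
	/* Older device: nothing is assumed. */
	assert(toy_configure_sense(0, TOY_SCSI_SPC_2) == 0);
}
```

The second case is the one the SM3350 quirk from patch 2/2 relies on: without the extra check, the quirk flag would be set but the sane-sense flag would be added on top of it anyway.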


[PATCH] f2fs: sanity check of xattr entry size

by Jaegeuk Kim

There is a security report where f2fs_getxattr() has a hole to expose wrong
memory region when the image is malformed like this.

f2fs_getxattr: entry->e_name_len: 4, size: 12288, buffer_size: 16384, len: 4

Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
---
 fs/f2fs/xattr.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/fs/f2fs/xattr.c b/fs/f2fs/xattr.c
index f44b0c38398b..18d5ffbc5e8c 100644
--- a/fs/f2fs/xattr.c
+++ b/fs/f2fs/xattr.c
@@ -288,7 +288,7 @@ static int read_xattr_block(struct inode *inode, void *txattr_addr)
 static int lookup_all_xattrs(struct inode *inode, struct page *ipage,
 				unsigned int index, unsigned int len,
 				const char *name, struct f2fs_xattr_entry **xe,
-				void **base_addr)
+				void **base_addr, int *base_size)
 {
 	void *cur_addr, *txattr_addr, *last_addr = NULL;
 	nid_t xnid = F2FS_I(inode)->i_xattr_nid;
@@ -299,8 +299,8 @@ static int lookup_all_xattrs(struct inode *inode, struct page *ipage,
 	if (!size && !inline_size)
 		return -ENODATA;
 
-	txattr_addr = f2fs_kzalloc(F2FS_I_SB(inode),
-			inline_size + size + XATTR_PADDING_SIZE, GFP_NOFS);
+	*base_size = inline_size + size + XATTR_PADDING_SIZE;
+	txattr_addr = f2fs_kzalloc(F2FS_I_SB(inode), *base_size, GFP_NOFS);
 	if (!txattr_addr)
 		return -ENOMEM;
 
@@ -312,8 +312,10 @@ static int lookup_all_xattrs(struct inode *inode, struct page *ipage,
 
 		*xe = __find_inline_xattr(inode, txattr_addr, &last_addr,
 						index, len, name);
-		if (*xe)
+		if (*xe) {
+			*base_size = inline_size;
 			goto check;
+		}
 	}
 
 	/* read from xattr node block */
@@ -474,6 +476,7 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
 	int error = 0;
 	unsigned int size, len;
 	void *base_addr = NULL;
+	int base_size;
 
 	if (name == NULL)
 		return -EINVAL;
@@ -484,7 +487,7 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
 
 	down_read(&F2FS_I(inode)->i_xattr_sem);
 	error = lookup_all_xattrs(inode, ipage, index, len, name,
-				&entry, &base_addr);
+				&entry, &base_addr, &base_size);
 	up_read(&F2FS_I(inode)->i_xattr_sem);
 	if (error)
 		return error;
@@ -498,6 +501,11 @@ int f2fs_getxattr(struct inode *inode, int index, const char *name,
 
 	if (buffer) {
 		char *pval = entry->e_name + entry->e_name_len;
+
+		if (base_size - (pval - (char *)base_addr) < size) {
+			error = -ERANGE;
+			goto out;
+		}
 		memcpy(buffer, pval, size);
 	}
 	error = size;
-- 
2.19.0.605.g01d371f741-goog
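
The added range check can be illustrated outside the kernel. This is a rough sketch, not the f2fs code: toy_copy_xattr_value() and toy_xattr_demo() are hypothetical helpers, the 32-byte region is an arbitrary size, and the comparison is done with explicit casts rather than the mixed signed/unsigned expression used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `size` bytes starting at `pval` into `buffer`, but only if the whole
 * range lies inside the region [base, base + base_size).
 * Returns the number of bytes copied, or -ERANGE if the range would overrun.
 */
static long toy_copy_xattr_value(const char *base, long base_size,
				 const char *pval, size_t size, char *buffer)
{
	long offset = (long)(pval - base);

	/* The value must start inside the region for the subtraction to be sane. */
	assert(offset >= 0 && offset <= base_size);

	if ((size_t)(base_size - offset) < size)
		return -ERANGE;

	memcpy(buffer, pval, size);
	return (long)size;
}

/* Try to read `size` bytes from offset 8 of a 32-byte region. */
static long toy_xattr_demo(size_t size)
{
	char region[32] = { 0 };
	char out[64];

	return toy_copy_xattr_value(region, (long)sizeof(region),
				    region + 8, size, out);
}
```

In this sketch 24 bytes fit exactly behind offset 8, so toy_xattr_demo(24) succeeds, while a malformed size of 25 is rejected before memcpy() runs.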


v4.20 build: 0 failures 4 warnings (v4.20)

by Build bot for Mark Brown

Tree/Branch: v4.20
Git describe: v4.20
Commit: 8fe28cb58b Linux 4.20

Build Time: 129 min 44 sec

Passed:   11 / 11   (100.00 %)
Failed:    0 / 11   (  0.00 %)

Errors: 0
Warnings: 4
Section Mismatches: 0

-------------------------------------------------------------------------------
defconfigs with issues (other than build errors):
      1 warnings    0 mismatches  : arm64-allmodconfig
      3 warnings    0 mismatches  : arm-allmodconfig

-------------------------------------------------------------------------------

Warnings Summary: 4
      1 ../drivers/staging/erofs/unzip_vle.c:188:29: warning: array subscript is above array bounds [-Warray-bounds]
      1 ../drivers/scsi/myrs.c:821:24: warning: 'sshdr.sense_key' may be used uninitialized in this function [-Wmaybe-uninitialized]
      1 ../drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:216:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=]
      1 ../drivers/isdn/hardware/eicon/message.c:5985:1: warning: the frame size of 2064 bytes is larger than 2048 bytes [-Wframe-larger-than=]

===============================================================================
Detailed per-defconfig build reports below:

-------------------------------------------------------------------------------
arm64-allmodconfig : PASS, 0 errors, 1 warnings, 0 section mismatches

Warnings:
    ../drivers/isdn/hardware/eicon/message.c:5985:1: warning: the frame size of 2064 bytes is larger than 2048 bytes [-Wframe-larger-than=]

-------------------------------------------------------------------------------
arm-allmodconfig : PASS, 0 errors, 3 warnings, 0 section mismatches

Warnings:
    ../drivers/net/ethernet/mellanox/mlx5/core/en_stats.c:216:1: warning: the frame size of 1064 bytes is larger than 1024 bytes [-Wframe-larger-than=]
    ../drivers/scsi/myrs.c:821:24: warning: 'sshdr.sense_key' may be used uninitialized in this function [-Wmaybe-uninitialized]
    ../drivers/staging/erofs/unzip_vle.c:188:29: warning: array subscript is above array bounds [-Warray-bounds]

-------------------------------------------------------------------------------

Passed with no errors, warnings or mismatches:

arm64-allnoconfig
arm-multi_v5_defconfig
arm-multi_v7_defconfig
x86_64-defconfig
arm-allnoconfig
x86_64-allnoconfig
arm-multi_v4t_defconfig
x86_64-allmodconfig
arm64-defconfig


[merged] mm-page_alloc-fix-has_unmovable_pages-for-hugepages.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm, page_alloc: fix has_unmovable_pages for HugePages
has been removed from the -mm tree.  Its filename was
     mm-page_alloc-fix-has_unmovable_pages-for-hugepages.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Oscar Salvador <osalvador(a)suse.de>
Subject: mm, page_alloc: fix has_unmovable_pages for HugePages

While playing with gigantic hugepages and memory_hotplug, I triggered the
following #PF when "cat memoryX/removable":

<---
kernel: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
kernel: #PF error: [normal kernel read fault]
kernel: PGD 0 P4D 0
kernel: Oops: 0000 [#1] SMP PTI
kernel: CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
kernel: RIP: 0010:has_unmovable_pages+0x154/0x210
kernel: Code: 1b ff ff ff eb 32 48 8b 45 00 bf 00 10 00 00 a9 00 00 01 00 74 07 0f b6 4d 51 48 d3 e7 e8 c4 81 05 00 48 85 c0 49 89 c1 75 7e <41> 8b 41 08 83 f8 09 74 41 83 f8 1b 74 3c 4d 2b 64 24 58 49 81 ec
kernel: RSP: 0018:ffffc90000a1fd30 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000009
kernel: RDX: ffffffff82aed4f0 RSI: 0000000000001000 RDI: 0000000000001000
kernel: RBP: ffffea0001800000 R08: 0000000000200000 R09: 0000000000000000
kernel: R10: 0000000000001000 R11: 0000000000000003 R12: ffff88813ffd45c0
kernel: R13: 0000000000060000 R14: 0000000000000001 R15: ffffea0000000000
kernel: FS: 00007fd71d9b3500(0000) GS:ffff88813bb00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000001371c2002 CR4: 00000000003606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
kernel: Call Trace:
kernel: is_mem_section_removable+0x7d/0x100
kernel: removable_show+0x90/0xb0
kernel: dev_attr_show+0x1c/0x50
kernel: sysfs_kf_seq_show+0xca/0x1b0
kernel: seq_read+0x133/0x380
kernel: __vfs_read+0x26/0x180
kernel: vfs_read+0x89/0x140
kernel: ksys_read+0x42/0x90
kernel: do_syscall_64+0x5b/0x180
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
kernel: RIP: 0033:0x7fd71d4c8b41
kernel: Code: fe ff ff 48 8d 3d 27 9e 09 00 48 83 ec 08 e8 96 02 02 00 66 0f 1f 44 00 00 8b 05 ea fc 2c 00 48 63 ff 85 c0 75 13 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 55 53 48 89 d5 48 89
kernel: RSP: 002b:00007ffeab5f6448 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
kernel: RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fd71d4c8b41
kernel: RDX: 0000000000020000 RSI: 00007fd71d809000 RDI: 0000000000000003
kernel: RBP: 0000000000020000 R08: ffffffffffffffff R09: 0000000000000000
kernel: R10: 000000000000038b R11: 0000000000000246 R12: 00007fd71d809000
kernel: R13: 0000000000000003 R14: 00007fd71d80900f R15: 0000000000020000
kernel: Modules linked in: af_packet(E) xt_tcpudp(E) ipt_REJECT(E) xt_conntrack(E) nf_conntrack(E) nf_defrag_ipv4(E) ip_set(E) nfnetlink(E) ebtable_nat(E) ebtable_broute(E) bridge(E) stp(E) llc(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ebtable_filter(E) ebtables(E) iptable_filter(E) ip_tables(E) x_tables(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) bochs_drm(E) ttm(E) drm_kms_helper(E) drm(E) aesni_intel(E) virtio_net(E) syscopyarea(E) net_failover(E) sysfillrect(E) failover(E) aes_x86_64(E) crypto_simd(E) sysimgblt(E) cryptd(E) pcspkr(E) glue_helper(E) parport_pc(E) fb_sys_fops(E) i2c_piix4(E) parport(E) button(E) btrfs(E) libcrc32c(E) xor(E) zstd_decompress(E) zstd_compress(E) raid6_pq(E) sd_mod(E) ata_generic(E) ata_piix(E) ahci(E) libahci(E) serio_raw(E) crc32c_intel(E) virtio_pci(E) virtio_ring(E) virtio(E) libata(E) sg(E) scsi_mod(E) autofs4(E)
kernel: CR2: 0000000000000008
kernel: ---[ end trace 49cade81474e40e7 ]---
kernel: RIP: 0010:has_unmovable_pages+0x154/0x210
kernel: Code: 1b ff ff ff eb 32 48 8b 45 00 bf 00 10 00 00 a9 00 00 01 00 74 07 0f b6 4d 51 48 d3 e7 e8 c4 81 05 00 48 85 c0 49 89 c1 75 7e <41> 8b 41 08 83 f8 09 74 41 83 f8 1b 74 3c 4d 2b 64 24 58 49 81 ec
kernel: RSP: 0018:ffffc90000a1fd30 EFLAGS: 00010246
kernel: RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000009
kernel: RDX: ffffffff82aed4f0 RSI: 0000000000001000 RDI: 0000000000001000
kernel: RBP: ffffea0001800000 R08: 0000000000200000 R09: 0000000000000000
kernel: R10: 0000000000001000 R11: 0000000000000003 R12: ffff88813ffd45c0
kernel: R13: 0000000000060000 R14: 0000000000000001 R15: ffffea0000000000
kernel: FS: 00007fd71d9b3500(0000) GS:ffff88813bb00000(0000) knlGS:0000000000000000
kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
kernel: CR2: 0000000000000008 CR3: 00000001371c2002 CR4: 00000000003606e0
kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
--->

The reason is we do not pass the Head to page_hstate(), and so, the call
to compound_order() in page_hstate() returns 0, so we end up checking all
hstates's size to match PAGE_SIZE.

Obviously, we do not find any hstate matching that size, and we return
NULL. Then, we dereference that NULL pointer in
hugepage_migration_supported() and we got the #PF from above.

Fix that by getting the head page before calling page_hstate().

Also, since gigantic pages span several pageblocks, re-adjust the logic
for skipping pages. While at it, we can also get rid of the
round_up().

[osalvador(a)suse.de: remove round_up(), adjust skip pages logic per Michal]
  Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de
Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Pavel Tatashin <pavel.tatashin(a)microsoft.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/page_alloc.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c~mm-page_alloc-fix-has_unmovable_pages-for-hugepages
+++ a/mm/page_alloc.c
@@ -7814,11 +7814,14 @@ bool has_unmovable_pages(struct zone *zo
 		 * handle each tail page individually in migration.
 		 */
 		if (PageHuge(page)) {
+			struct page *head = compound_head(page);
+			unsigned int skip_pages;
 
-			if (!hugepage_migration_supported(page_hstate(page)))
+			if (!hugepage_migration_supported(page_hstate(head)))
 				goto unmovable;
 
-			iter = round_up(iter + 1, 1<<compound_order(page)) - 1;
+			skip_pages = (1 << compound_order(head)) - (page - head);
+			iter += skip_pages - 1;
 			continue;
 		}
 
_

Patches currently in -mm which might be from osalvador(a)suse.de are

kernel-resource-check-for-ioresource_sysram-in-release_mem_region_adjustable.patch
mm-page_alloc-drop-uneeded-__meminit-and-__meminitdata.patch
mm-kmemleak-little-optimization-while-scanning.patch
mm-memory_hotplug-dont-bail-out-in-do_migrate_range-prematurely.patch
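
The new skip arithmetic can be checked with a few numbers. The sketch below is a userspace illustration only: toy_skip_pages() and toy_next_iter() are hypothetical helpers, an integer offset stands in for the pointer difference (page - head), and the orders 9 and 18 assume 2 MiB and 1 GiB huge pages built from 4 KiB base pages.

```c
#include <assert.h>

/*
 * Number of pages from the current page to the end of its compound page,
 * given the compound order and how far the current page is from the head.
 */
static unsigned long toy_skip_pages(unsigned int order, unsigned long offset_from_head)
{
	unsigned long nr_pages = 1UL << order;

	assert(offset_from_head < nr_pages);
	return nr_pages - offset_from_head;
}

/*
 * New value of the loop counter: it ends on the last page of the compound
 * page, so the loop's own increment then moves past the whole huge page.
 */
static unsigned long toy_next_iter(unsigned long iter, unsigned int order,
				   unsigned long offset_from_head)
{
	return iter + toy_skip_pages(order, offset_from_head) - 1;
}

/* Worked examples for the two assumed huge page sizes. */
static void toy_skip_demo(void)
{
	/* Scan starts on the head of a 512-page huge page. */
	assert(toy_next_iter(0, 9, 0) == 511);
	/* Scan starts 512 pages into a 262144-page gigantic page. */
	assert(toy_skip_pages(18, 512) == 261632);
	assert(toy_next_iter(0, 18, 512) == 261631);
}
```

The second example corresponds to the case the commit message mentions: a gigantic page spans several pageblocks, so the scan of a later pageblock begins on a tail page and only the remaining pages need to be skipped.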


[merged] forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail
has been removed from the -mm tree.  Its filename was
     forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Rik van Riel <riel(a)surriel.com>
Subject: fork,memcg: fix crash in free_thread_stack on memcg charge fail

Changeset 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting") will
result in fork failing if allocating a kernel stack for a task in
dup_task_struct exceeds the kernel memory allowance for that cgroup.

Unfortunately, it also results in a crash.

This is due to the code jumping to free_stack and calling
free_thread_stack when the memcg kernel stack charge fails, but without
tsk->stack pointing at the freshly allocated stack.

This in turn results in the vfree_atomic in free_thread_stack oopsing
with a backtrace like this:

 #5 [ffffc900244efc88] die at ffffffff8101f0ab
 #6 [ffffc900244efcb8] do_general_protection at ffffffff8101cb86
 #7 [ffffc900244efce0] general_protection at ffffffff818ff082
    [exception RIP: llist_add_batch+7]
    RIP: ffffffff8150d487  RSP: ffffc900244efd98  RFLAGS: 00010282
    RAX: 0000000000000000  RBX: ffff88085ef55980  RCX: 0000000000000000
    RDX: ffff88085ef55980  RSI: 343834343531203a  RDI: 343834343531203a
    RBP: ffffc900244efd98   R8: 0000000000000001   R9: ffff8808578c3600
    R10: 0000000000000000  R11: 0000000000000001  R12: ffff88029f6c21c0
    R13: 0000000000000286  R14: ffff880147759b00  R15: 0000000000000000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffc900244efda0] vfree_atomic at ffffffff811df2c7
 #9 [ffffc900244efdb8] copy_process at ffffffff81086e37
#10 [ffffc900244efe98] _do_fork at ffffffff810884e0
#11 [ffffc900244eff10] sys_vfork at ffffffff810887ff
#12 [ffffc900244eff20] do_syscall_64 at ffffffff81002a43
    RIP: 000000000049b948  RSP: 00007ffcdb307830  RFLAGS: 00000246
    RAX: ffffffffffffffda  RBX: 0000000000896030  RCX: 000000000049b948
    RDX: 0000000000000000  RSI: 00007ffcdb307790  RDI: 00000000005d7421
    RBP: 000000000067370f   R8: 00007ffcdb3077b0   R9: 000000000001ed00
    R10: 0000000000000008  R11: 0000000000000246  R12: 0000000000000040
    R13: 000000000000000f  R14: 0000000000000000  R15: 000000000088d018
    ORIG_RAX: 000000000000003a  CS: 0033  SS: 002b

The simplest fix is to assign tsk->stack right where it is allocated.

Link: http://lkml.kernel.org/r/20181214231726.7ee4843c@imladris.surriel.com
Fixes: 9b6f7e163cd0 ("mm: rework memcg kernel stack accounting")
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Acked-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 kernel/fork.c |    9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

--- a/kernel/fork.c~forkmemcg-fix-crash-in-free_thread_stack-on-memcg-charge-fail
+++ a/kernel/fork.c
@@ -240,8 +240,10 @@ static unsigned long *alloc_thread_stack
 	 * free_thread_stack() can be called in interrupt context,
 	 * so cache the vm_struct.
 	 */
-	if (stack)
+	if (stack) {
 		tsk->stack_vm_area = find_vm_area(stack);
+		tsk->stack = stack;
+	}
 	return stack;
 #else
 	struct page *page = alloc_pages_node(node, THREADINFO_GFP,
@@ -288,7 +290,10 @@ static struct kmem_cache *thread_stack_c
 static unsigned long *alloc_thread_stack_node(struct task_struct *tsk,
 						  int node)
 {
-	return kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+	unsigned long *stack;
+	stack = kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP, node);
+	tsk->stack = stack;
+	return stack;
 }
 
 static void free_thread_stack(struct task_struct *tsk)
_

Patches currently in -mm which might be from riel(a)surriel.com are
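
The shape of the fix can be shown with a small userspace analogue. This is a sketch under loud assumptions: struct toy_task and the toy_* functions are hypothetical, malloc()/free() stand in for the kernel allocators, and the charge_fails parameter simulates the memcg charge failing. The point is only that the allocator records the pointer in the task, so the error path frees the right thing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, heavily reduced stand-in for struct task_struct. */
struct toy_task {
	void *stack;
};

/* Allocate a stack and, as in the fix, record it in the task right away. */
static void *toy_alloc_stack(struct toy_task *tsk, size_t size)
{
	void *stack = malloc(size);

	tsk->stack = stack;
	return stack;
}

/* The free path only looks at tsk->stack, like free_thread_stack(). */
static void toy_free_stack(struct toy_task *tsk)
{
	free(tsk->stack);
	tsk->stack = NULL;
}

/* Set up a task; `charge_fails` simulates the accounting step failing. */
static int toy_dup_task(struct toy_task *tsk, bool charge_fails)
{
	void *stack = toy_alloc_stack(tsk, 4096);

	if (!stack)
		return -ENOMEM;
	if (charge_fails)
		goto free_stack;
	return 0;

free_stack:
	/* Safe only because tsk->stack was assigned at allocation time. */
	toy_free_stack(tsk);
	return -ENOMEM;
}

/* Run one setup attempt and clean up after a successful one. */
static int toy_fork_demo(bool charge_fails)
{
	struct toy_task tsk = { .stack = NULL };
	int err = toy_dup_task(&tsk, charge_fails);

	if (err) {
		assert(tsk.stack == NULL);
		return err;
	}
	assert(tsk.stack != NULL);
	toy_free_stack(&tsk);
	return 0;
}
```

If toy_alloc_stack() did not store the pointer, toy_free_stack() on the error path would act on whatever happened to be in tsk->stack, which is the userspace counterpart of the crash described above.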


[merged] mm-thp-fix-flags-for-pmd-migration-when-split.patch removed from -mm tree

by akpm＠linux-foundation.org

The patch titled
     Subject: mm: thp: fix flags for pmd migration when split
has been removed from the -mm tree.  Its filename was
     mm-thp-fix-flags-for-pmd-migration-when-split.patch

This patch was dropped because it was merged into mainline or a subsystem tree

------------------------------------------------------
From: Peter Xu <peterx(a)redhat.com>
Subject: mm: thp: fix flags for pmd migration when split

When splitting a huge migrating PMD, we'll transfer all the existing PMD
bits and apply them again onto the small PTEs. However we are fetching
the bits unconditionally via pmd_soft_dirty(), pmd_write() or
pmd_young() while actually they don't make sense at all when it's a
migration entry. Fix them up. Since at it, drop the ifdef together as
not needed.

Note that if my understanding is correct about the problem then if
without the patch there is chance to lose some of the dirty bits in the
migrating pmd pages (on x86_64 we're fetching bit 11 which is part of swap
offset instead of bit 2) and it could potentially corrupt the memory of an
userspace program which depends on the dirty bit.

Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Reviewed-by: William Kucharski <william.kucharski(a)oracle.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Dave Jiang <dave.jiang(a)intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.vnet.ibm.com>
Cc: Souptick Joarder <jrdr.linux(a)gmail.com>
Cc: Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
Cc: Zi Yan <zi.yan(a)cs.rutgers.edu>
Cc: <stable(a)vger.kernel.org>	[4.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---

 mm/huge_memory.c |   20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

--- a/mm/huge_memory.c~mm-thp-fix-flags-for-pmd-migration-when-split
+++ a/mm/huge_memory.c
@@ -2144,23 +2144,25 @@ static void __split_huge_pmd_locked(stru
 	 */
 	old_pmd = pmdp_invalidate(vma, haddr, pmd);
 
-#ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
 	pmd_migration = is_pmd_migration_entry(old_pmd);
-	if (pmd_migration) {
+	if (unlikely(pmd_migration)) {
 		swp_entry_t entry;
 
 		entry = pmd_to_swp_entry(old_pmd);
 		page = pfn_to_page(swp_offset(entry));
-	} else
-#endif
+		write = is_write_migration_entry(entry);
+		young = false;
+		soft_dirty = pmd_swp_soft_dirty(old_pmd);
+	} else {
 		page = pmd_page(old_pmd);
+		if (pmd_dirty(old_pmd))
+			SetPageDirty(page);
+		write = pmd_write(old_pmd);
+		young = pmd_young(old_pmd);
+		soft_dirty = pmd_soft_dirty(old_pmd);
+	}
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	page_ref_add(page, HPAGE_PMD_NR - 1);
-	if (pmd_dirty(old_pmd))
-		SetPageDirty(page);
-	write = pmd_write(old_pmd);
-	young = pmd_young(old_pmd);
-	soft_dirty = pmd_soft_dirty(old_pmd);
 
 	/*
 	 * Withdraw the table only after we mark the pmd entry invalid.
_

Patches currently in -mm which might be from peterx(a)redhat.com are

userfaultfd-clear-flag-if-remap-event-not-enabled.patch
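
Why the wrong accessor can both invent and lose a flag is easier to see with a toy encoding. Everything below is hypothetical: the TOY_* bit positions and the 32-bit entry format are invented for illustration and are NOT the real x86_64 PMD or swap entry layout. The sketch only shows the general pattern of picking the accessor that matches the entry kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy layout, NOT the real x86_64 encoding. */
#define TOY_PRESENT          (1u << 0)   /* set for ordinary mapped entries */
#define TOY_SOFT_DIRTY       (1u << 11)  /* meaningful only when TOY_PRESENT is set */
#define TOY_SWP_SOFT_DIRTY   (1u << 2)   /* meaningful only for migration entries */
#define TOY_SWP_OFFSET_SHIFT 8           /* migration entries keep an offset in bits 8+ */

static bool toy_is_migration(uint32_t entry)
{
	return !(entry & TOY_PRESENT);
}

/* Buggy variant: always reads the "present" bit position. */
static bool toy_soft_dirty_buggy(uint32_t entry)
{
	return (entry & TOY_SOFT_DIRTY) != 0;
}

/* Fixed variant: chooses the accessor that matches the entry kind. */
static bool toy_soft_dirty(uint32_t entry)
{
	if (toy_is_migration(entry))
		return (entry & TOY_SWP_SOFT_DIRTY) != 0;
	return (entry & TOY_SOFT_DIRTY) != 0;
}

static void toy_pmd_demo(void)
{
	/* Migration entry, offset 0x8, not soft-dirty: offset bits alias bit 11. */
	uint32_t clean = 0x8u << TOY_SWP_OFFSET_SHIFT;
	/* Migration entry, offset 0x1, soft-dirty. */
	uint32_t dirty = (0x1u << TOY_SWP_OFFSET_SHIFT) | TOY_SWP_SOFT_DIRTY;

	assert(!toy_soft_dirty(clean) && toy_soft_dirty_buggy(clean));
	assert(toy_soft_dirty(dirty) && !toy_soft_dirty_buggy(dirty));
	/* For present entries both variants agree. */
	assert(toy_soft_dirty(TOY_PRESENT | TOY_SOFT_DIRTY));
}
```

In this toy layout the buggy reader reports a clean migration entry as dirty and a dirty one as clean, which is the same class of mistake the commit message describes for migrating PMDs.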

