Hi Sasha,
This is a backport to v4.1 of the RFI flush series that went upstream recently.
There are also a few other commits I noticed had not made it to v4.1.
cheers
From: Eric Biggers <ebiggers(a)google.com>
Hi Sasha, can you please apply these backports of ext4 encryption fixes
to 4.1-stable? They all have equivalent fixes in 4.4-stable. Most
important is patch 1 which prevents unprivileged users from using (or
abusing) ext4 encryption when it hasn't been enabled on the filesystem
by a system administrator. Patch 2 adds a missing permission check
(CVE-2016-10318), and patch 3 is a backport that Ted sent out some
months ago that seems to have been missed, for a bug in 4.1 that is very
similar to the bug in 4.2+ that was assigned CVE-2017-7374.
Note that ext4 encryption in 4.1 is still pretty broken and should not
be used (even just 4.4-stable is much better); these are just the most
important fixes that really ought to be in 4.1-stable.
Eric Biggers (1):
fscrypto: add authorization check for setting encryption policy
Richard Weinberger (1):
ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
Theodore Ts'o (1):
ext4 crypto: don't regenerate the per-inode encryption key
unnecessarily
fs/ext4/crypto_fname.c | 5 +++--
fs/ext4/crypto_key.c | 15 ++++++++++++---
fs/ext4/crypto_policy.c | 3 +++
fs/ext4/ext4.h | 1 +
fs/ext4/ioctl.c | 3 +++
fs/ext4/super.c | 3 +++
6 files changed, 25 insertions(+), 5 deletions(-)
--
2.16.2.395.g2e18187dfd-goog
On Fri, Mar 02, 2018 at 02:58:54PM +0100, Wolfram Sang wrote:
>
> > It needs platform maintainers to be motivated to fix it, and one way to
> > provide that motivation is for subsystem maintainers to say no to patches
> > like this. If patches like this get accepted, then the "problem" gets
> > solved, and there is very little motivation to fix the platform itself.
>
> Yes, I can see this. I will drop / revert the patch.
>
TBH, I can't find the threads from November so I feel a bit lost and
there is no documentation for platform_get_irq().
regards,
dan carpenter
On the ALC289, pin 0x1b is the Headphone-Mic, so we should assign
ALC269_FIXUP_DELL4_MIC_NO_PRESENCE rather than
ALC225_FIXUP_DELL1_MIC_NO_PRESENCE to it. This change was suggested
by Kailang of Realtek and has been verified on the machine.
(This fixes commit 3f2f7c55.)
Cc: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_realtek.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b9c93fa..7a9a867 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6872,7 +6872,7 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60120},
{0x14, 0x90170110},
{0x21, 0x0321101f}),
- SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC225_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x12, 0xb7a60130},
{0x14, 0x90170110},
{0x21, 0x04211020}),
--
2.7.4
This is the start of the stable review cycle for the 4.4.115 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Feb 4 14:07:31 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.115-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.115-rc1
Stefan Agner <stefan(a)agner.ch>
spi: imx: do not access registers while clocks disabled
Fabio Estevam <fabio.estevam(a)nxp.com>
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
Mark Salyzyn <salyzyn(a)android.com>
selinux: general protection fault in sock_has_perm
Oliver Neukum <oneukum(a)suse.com>
usb: uas: unconditionally bring back host after reset
Hemant Kumar <hemantk(a)codeaurora.org>
usb: f_fs: Prevent gadget unbind if it is already unbound
Johan Hovold <johan(a)kernel.org>
USB: serial: simple: add Motorola Tetra driver
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: list: don't list devices attached to vhci_hcd
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: prevent bind loops on devices attached to vhci_hcd
Jia-Ju Bai <baijiaju1990(a)gmail.com>
USB: serial: io_edgeport: fix possible sleep-in-atomic
Oliver Neukum <oneukum(a)suse.com>
CDC-ACM: apply quirk for card reader
Hans de Goede <hdegoede(a)redhat.com>
USB: cdc-acm: Do not log urb submission errors on disconnect
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
USB: serial: pl2303: new device id for Chilitag
OKAMOTO Yoshiaki <yokamoto(a)allied-telesis.co.jp>
usb: option: Add support for FS040U modem
Larry Finger <Larry.Finger(a)lwfinger.net>
staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID
Colin Ian King <colin.king(a)canonical.com>
usb: gadget: don't dereference g until after it has been null checked
Icenowy Zheng <icenowy(a)aosc.io>
media: usbtv: add a new usbid
Gustavo A. R. Silva <garsilva(a)embeddedor.com>
scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: ubsan fixes
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
drm/omap: Fix error handling path in 'omap_dmm_probe()'
Yisheng Xie <xieyisheng1(a)huawei.com>
kmemleak: add scheduling point to kmemleak_scan()
Trond Myklebust <trond.myklebust(a)primarydata.com>
SUNRPC: Allow connect to return EHOSTUNREACH
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
quota: Check for register_shrinker() failure.
Geert Uytterhoeven <geert+renesas(a)glider.be>
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
Robert Lippert <roblip(a)gmail.com>
hwmon: (pmbus) Use 64bit math for DIRECT format values
Vasily Averin <vvs(a)virtuozzo.com>
lockd: fix "list_add double add" caused by legacy signal interface
Andrew Elble <aweits(a)rit.edu>
nfsd: check for use of the closed special stateid
Vasily Averin <vvs(a)virtuozzo.com>
grace: replace BUG_ON by WARN_ONCE in exit_net hook
Trond Myklebust <trond.myklebust(a)primarydata.com>
nfsd: Ensure we check stateid validity in the seqid operation checks
Trond Myklebust <trond.myklebust(a)primarydata.com>
nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: remove warning when unloading module
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: VMX: Fix rflags cache during vCPU reset
Josef Bacik <jbacik(a)fb.com>
btrfs: fix deadlock when writing out space cache
Chun-Yeow Yeoh <yeohchunyeow(a)gmail.com>
mac80211: fix the update of path metric for RANN frame
zhangliping <zhangliping02(a)baidu.com>
openvswitch: fix the incorrect flow action alloc size
Felix Kuehling <Felix.Kuehling(a)amd.com>
drm/amdkfd: Fix SDMA oversubsription handling
shaoyunl <Shaoyun.Liu(a)amd.com>
drm/amdkfd: Fix SDMA ring buffer size calculation
Felix Kuehling <Felix.Kuehling(a)amd.com>
drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
Michael Lyle <mlyle(a)lyle.org>
bcache: check return value of register_shrinker
James Hogan <jhogan(a)kernel.org>
cpufreq: Add Loongson machine dependencies
Hans de Goede <hdegoede(a)redhat.com>
ACPI / bus: Leave modalias empty for devices which are not present
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Preserve read-only values in the redirection table
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: X86: Fix operand/address-size during instruction decoding
Liran Alon <liran.alon(a)oracle.com>
KVM: x86: Don't re-execute instruction when not passing CR2 value
Liran Alon <liran.alon(a)oracle.com>
KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
Lyude Paul <lyude(a)redhat.com>
igb: Free IRQs when device is hotplugged
Jesse Chan <jc(a)linux.com>
mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Jesse Chan <jc(a)linux.com>
gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE
Jesse Chan <jc(a)linux.com>
gpio: iop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Jesse Chan <jc(a)linux.com>
power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Stephan Mueller <smueller(a)chronox.de>
crypto: af_alg - whitelist mask and type
Stephan Mueller <smueller(a)chronox.de>
crypto: aesni - handle zero length dst buffer
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: Make ioctls race-free
Hugh Dickins <hughd(a)google.com>
kaiser: fix intel_bts perf crashes
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/pti: Make unpoison of pgd for trusted boot work for real
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: reject stores into ctx via st and xadd
Alexei Starovoitov <ast(a)kernel.org>
bpf: fix 32-bit divide by zero
Eric Dumazet <edumazet(a)google.com>
bpf: fix divides by zero
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: avoid false sharing of map refcount with max_entries
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: arsh is not supported in 32 bit alu thus reject it
Alexei Starovoitov <ast(a)kernel.org>
bpf: introduce BPF_JIT_ALWAYS_ON config
Alexei Starovoitov <ast(a)fb.com>
bpf: fix bpf_tail_call() x64 JIT
Eric Dumazet <edumazet(a)google.com>
x86: bpf_jit: small optimization in emit_bpf_tail_call()
Alexei Starovoitov <ast(a)fb.com>
bpf: fix branch pruning logic
Linus Torvalds <torvalds(a)linux-foundation.org>
loop: fix concurrent lo_open/lo_release
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kernel/cpu/perf_event_intel_bts.c | 44 ++++++++++----
arch/x86/kernel/tboot.c | 10 ++++
arch/x86/kvm/emulate.c | 7 +++
arch/x86/kvm/ioapic.c | 20 ++++++-
arch/x86/kvm/vmx.c | 4 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/net/bpf_jit_comp.c | 13 ++--
crypto/af_alg.c | 10 ++--
drivers/acpi/device_sysfs.c | 4 ++
drivers/block/loop.c | 10 +++-
drivers/cpufreq/Kconfig | 2 +
drivers/gpio/gpio-ath79.c | 3 +
drivers/gpio/gpio-iop.c | 4 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 47 +++++++++++----
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4 +-
.../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 18 ++++++
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 3 +-
drivers/hwmon/pmbus/pmbus_core.c | 21 ++++---
drivers/md/bcache/btree.c | 5 +-
drivers/media/usb/usbtv/usbtv-core.c | 1 +
drivers/mtd/nand/denali_pci.c | 4 ++
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/xilinx/Kconfig | 1 +
drivers/net/xen-netfront.c | 18 ++++++
drivers/power/reset/zx-reboot.c | 4 ++
drivers/scsi/aacraid/commsup.c | 2 +-
drivers/scsi/ufs/ufshcd.c | 7 ++-
drivers/spi/spi-imx.c | 15 ++++-
drivers/staging/rtl8188eu/os_dep/ioctl_linux.c | 14 ++---
drivers/tty/serial/imx.c | 14 +++--
drivers/usb/class/cdc-acm.c | 5 +-
drivers/usb/gadget/composite.c | 7 ++-
drivers/usb/gadget/function/f_fs.c | 3 +-
drivers/usb/serial/Kconfig | 1 +
drivers/usb/serial/io_edgeport.c | 1 -
drivers/usb/serial/option.c | 5 ++
drivers/usb/serial/pl2303.c | 1 +
drivers/usb/serial/pl2303.h | 1 +
drivers/usb/serial/usb-serial-simple.c | 7 +++
drivers/usb/storage/uas.c | 7 +--
fs/btrfs/free-space-cache.c | 3 +-
fs/nfs_common/grace.c | 10 +++-
fs/nfsd/nfs4state.c | 34 ++++++-----
fs/quota/dquot.c | 3 +-
fs/xfs/xfs_aops.c | 6 +-
include/linux/bpf.h | 16 +++--
init/Kconfig | 7 +++
kernel/bpf/core.c | 30 ++++++++--
kernel/bpf/verifier.c | 70 ++++++++++++++++++++++
lib/test_bpf.c | 13 ++--
mm/kmemleak.c | 2 +
net/Kconfig | 3 +
net/core/filter.c | 8 ++-
net/core/sysctl_net_core.c | 6 ++
net/mac80211/mesh_hwmp.c | 15 +++--
net/openvswitch/flow_netlink.c | 16 ++---
net/socket.c | 9 +++
net/sunrpc/xprtsock.c | 1 +
security/selinux/hooks.c | 2 +
sound/core/seq/seq_clientmgr.c | 10 +++-
sound/core/seq/seq_clientmgr.h | 1 +
tools/usb/usbip/src/usbip_bind.c | 9 +++
tools/usb/usbip/src/usbip_list.c | 9 +++
69 files changed, 504 insertions(+), 142 deletions(-)
Rediffed for RHEL7.4.z.
Conflicts:
The return value of md_make_request differs between RHEL and upstream.
Also, the RHEL7 MD code uses bio->bi_rw rather than bio->bi_opf.
-Nigel Croxon
commit 393debc23c7820211d1c8253dd6a8408a7628fe7
Author: Shaohua Li <shli(a)fb.com>
Date: Thu Sep 21 10:23:35 2017 -0700
md: separate request handling
With commit cc27b0c78c79, pers->make_request could bail out without handling
the bio. If that happens, we should retry. The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.
Reported-by: Nate Dailey <nate.dailey(a)stratus.com>
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Cc: stable(a)vger.kernel.org
Reviewed-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Shaohua Li <shli(a)fb.com>
Signed-off-by: Denys Vlasenko <dvlasenk(a)redhat.com>
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 85fe7a99290..407e15f4bfe 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -248,21 +248,8 @@ static DEFINE_SPINLOCK(all_mddevs_lock);
* call has finished, the bio has been linked into some internal structure
* and so is visible to ->quiesce(), so we don't need the refcount any more.
*/
-static void md_make_request(struct request_queue *q, struct bio *bio)
+void md_handle_request(struct mddev *mddev, struct bio *bio)
{
- const int rw = bio_data_dir(bio);
- struct mddev *mddev = q->queuedata;
- int cpu;
- unsigned int sectors;
-
- if (mddev == NULL || mddev->pers == NULL) {
- bio_io_error(bio);
- return;
- }
- if (mddev->ro == 1 && unlikely(rw == WRITE)) {
- bio_endio(bio, bio_sectors(bio) == 0 ? 0 : -EROFS);
- return;
- }
check_suspended:
smp_rmb(); /* Ensure implications of 'active' are visible */
rcu_read_lock();
@@ -282,24 +269,45 @@ check_suspended:
atomic_inc(&mddev->active_io);
rcu_read_unlock();
- /*
- * save the sectors now since our bio can
- * go away inside make_request
- */
- sectors = bio_sectors(bio);
if (!mddev->pers->make_request(mddev, bio)) {
atomic_dec(&mddev->active_io);
wake_up(&mddev->sb_wait);
goto check_suspended;
}
+ if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
+ wake_up(&mddev->sb_wait);
+}
+EXPORT_SYMBOL(md_handle_request);
+
+static void md_make_request(struct request_queue *q, struct bio *bio)
+{
+ const int rw = bio_data_dir(bio);
+ struct mddev *mddev = q->queuedata;
+ int cpu;
+ unsigned int sectors;
+
+ if (mddev == NULL || mddev->pers == NULL) {
+ bio_io_error(bio);
+ return;
+ }
+ if (mddev->ro == 1 && unlikely(rw == WRITE)) {
+ bio_endio(bio, bio_sectors(bio) == 0 ? 0 : -EROFS);
+ return;
+ }
+
+ /*
+ * save the sectors now since our bio can
+ * go away inside make_request
+ */
+ sectors = bio_sectors(bio);
+
+ md_handle_request(mddev, bio);
+
cpu = part_stat_lock();
part_stat_inc(cpu, &mddev->gendisk->part0, ios[rw]);
part_stat_add(cpu, &mddev->gendisk->part0, sectors[rw], sectors);
part_stat_unlock();
-
- if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
- wake_up(&mddev->sb_wait);
}
/* mddev_suspend makes sure no new requests are submitted
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 0d13bf88f41..12e19d6d373 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -679,6 +679,7 @@ extern void md_stop_writes(struct mddev *mddev);
extern int md_rdev_init(struct md_rdev *rdev);
extern void md_rdev_clear(struct md_rdev *rdev);
+extern void md_handle_request(struct mddev *mddev, struct bio *bio);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
extern struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
--
2.16.2
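The retry behaviour described in the "md: separate request handling" changelog above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct toy_dev`, `toy_handle_request()` and `toy_flaky_make_request()` are hypothetical names, a plain `int` stands in for the bio, and the suspend wait and `wake_up()` calls of the real md_handle_request() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct mddev; not the kernel structure. */
struct toy_dev {
	int active_io;      /* requests currently inside make_request */
	int attempts;       /* how many times make_request was called */
	int refusals_left;  /* demo only: how often the personality bails out */
	bool (*make_request)(struct toy_dev *dev, int req);
};

/*
 * Model of the factored-out helper: keep calling the personality's
 * make_request until it reports that it handled the request.
 */
static void toy_handle_request(struct toy_dev *dev, int req)
{
	assert(dev->make_request);
	for (;;) {
		dev->active_io++;
		dev->attempts++;
		if (dev->make_request(dev, req))
			break;
		/* The personality bailed out: drop our count and retry. */
		dev->active_io--;
	}
	/* Request handled: drop the count taken for the successful call. */
	dev->active_io--;
}

/* Demo personality that refuses a few times before accepting. */
static bool toy_flaky_make_request(struct toy_dev *dev, int req)
{
	(void)req;
	if (dev->refusals_left > 0) {
		dev->refusals_left--;
		return false;
	}
	return true;
}
```

Because the loop lives in one helper, any caller that hands a request to the personality gets the same retry semantics, which is the point of exporting md_handle_request() to other call sites.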
Changes since v4 [1]:
* Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to
better clarify the need for new helpers (Jan)
* Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan)
* Use file_inode() in vma_is_dax() (Jan)
* Resend the full series to linux-xfs@ (Dave)
* Collect Jan's Reviewed-by
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014271.html
---
The vfio interface, like RDMA, wants to set up long term (indefinite)
pins of the pages backing an address range so that a guest or userspace
driver can perform DMA to the physical addresses. Given that this
pinning may lead to filesystem operations deadlocking in the
filesystem-dax case, the pinning request needs to be rejected.
The longer term fix for vfio, RDMA, and any other long term pin user, is
to provide a 'pin with lease' mechanism. Similar to the leases that are
held for pNFS RDMA layouts, this userspace lease gives the kernel a way
to notify userspace that the block layout of the file is changing and
the kernel is revoking access to pinned pages.
Related to this change is the discovery that vma_is_fsdax() was causing
device-dax inode detection to fail. That led to a series of fixes and
cleanups to make sure that S_DAX is defined correctly in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case.
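
As a rough illustration of the policy described in this cover letter (reject long term pins of filesystem-dax mappings while still allowing device-dax), here is a userspace sketch. Everything in it is hypothetical: `struct toy_vma`, `toy_vma_is_fsdax()` and `toy_pin_longterm()` are made-up names, and the choice of -EOPNOTSUPP as the error is an assumption, not something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical classification of what backs a mapping. */
enum toy_backing { TOY_ANON, TOY_FSDAX, TOY_DEVDAX };

/* Hypothetical stand-in for struct vm_area_struct. */
struct toy_vma {
	enum toy_backing backing;
};

/* True only for filesystem-dax; device-dax must not match. */
static bool toy_vma_is_fsdax(const struct toy_vma *vma)
{
	return vma->backing == TOY_FSDAX;
}

/*
 * Model of a long term pin request over nr mappings: refuse the whole
 * request if any mapping is filesystem-dax, otherwise report nr pinned.
 * The error code is an assumption made for this sketch.
 */
static int toy_pin_longterm(const struct toy_vma *vmas, int nr)
{
	int i;

	assert(nr >= 0);
	for (i = 0; i < nr; i++) {
		if (toy_vma_is_fsdax(&vmas[i]))
			return -EOPNOTSUPP;
	}
	return nr;
}
```

The sketch also shows why the helper has to distinguish the two dax flavours: if the fsdax test wrongly matched device-dax mappings, valid pins would be rejected as well.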
---
Dan Williams (12):
dax: fix vma_is_fsdax() helper
dax: introduce IS_DEVDAX() and IS_FSDAX()
ext2, dax: finish implementing dax_sem helpers
ext2, dax: define ext2_dax_*() infrastructure in all cases
ext4, dax: define ext4_dax_*() infrastructure in all cases
ext2, dax: replace IS_DAX() with IS_FSDAX()
ext4, dax: replace IS_DAX() with IS_FSDAX()
xfs, dax: replace IS_DAX() with IS_FSDAX()
mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX()
fs, dax: kill IS_DAX()
dax: fix S_DAX definition
vfio: disable filesystem-dax page pinning
drivers/vfio/vfio_iommu_type1.c | 18 ++++++++++++++--
fs/ext2/ext2.h | 6 +++++
fs/ext2/file.c | 19 +++++------------
fs/ext2/inode.c | 10 ++++-----
fs/ext4/file.c | 18 +++++-----------
fs/ext4/inode.c | 4 ++--
fs/ext4/ioctl.c | 2 +-
fs/ext4/super.c | 2 +-
fs/iomap.c | 2 +-
fs/xfs/xfs_file.c | 14 ++++++-------
fs/xfs/xfs_ioctl.c | 4 ++--
fs/xfs/xfs_iomap.c | 6 +++--
fs/xfs/xfs_reflink.c | 2 +-
include/linux/dax.h | 12 ++++++++---
include/linux/fs.h | 43 ++++++++++++++++++++++++++++-----------
mm/fadvise.c | 3 ++-
mm/filemap.c | 4 ++--
mm/huge_memory.c | 4 +++-
mm/madvise.c | 3 ++-
19 files changed, 102 insertions(+), 74 deletions(-)
The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been added to the -mm tree. Its filename is
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-memmap_init_zone…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-memmap_init_zone…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") introduced a bug where move_freepages() triggers a VM_BUG_ON()
on uninitialized page structure due to pageblock alignment. To fix this,
simply align the skipped pfns in memmap_init_zone() the same way as in
move_freepages_block().
Seen in one of the RHEL reports:
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[<ffffffff8118833e>] [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688 EFLAGS: 00010087
--
Call Trace:
[<ffffffff811883b3>] move_freepages_block+0x73/0x80
[<ffffffff81189e63>] __rmqueue+0x263/0x460
[<ffffffff8118c781>] get_page_from_freelist+0x7e1/0x9e0
[<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
--
RIP [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP <ffff88054d727688>
crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8> 1000 - 9bfff System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640> 100000000 - 67fffffff System RAM ( 22.00 GiB)
crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct page 0xffffea0001ede200> 1fffff00000000 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000> 0 0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA 1 4095
<struct page 0xffffea0001edffc0> 1fffff00000400 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
BUG, zones differ!
Note that this range follows two unpopulated sections, 68000000-77ffffff,
in this zone. 7b788000-7b7fffff is the first range after the gap. This makes
memmap_init_zone() skip all the pfns up to the beginning of this range.
But this range is not pageblock (2M) aligned. In fact no range has to be.
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000
Top part of page flags should contain nodeid and zonenr, which is not
the case for page ffffea0001ed8000 here (<<<<).
crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0
crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0
Because of commit b92df1de5d28, initialization of the beginning of the
section is skipped, up to the start of the range. Now any code calling
move_freepages_block() (like reusing the page from a freelist as in this
example) with a page from the beginning of the range will get the page
rounded down to start_page ffffea0001ed8000 and passed to move_freepages(),
which crashes on the assertion because start_page yields the wrong zonenr.
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Note, page_zone() derives the zone from page flags here.
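
The rounding described above can be checked with a few lines of C. This is a minimal sketch, assuming 4 KiB pages and 2 MiB pageblocks (512 pages per block); the `sketch_` names are hypothetical and only mirror the arithmetic of the fix, not the kernel helpers themselves.

```c
#include <assert.h>

/* Assumption: 2 MiB pageblocks of 4 KiB pages, i.e. 512 pages per block. */
#define SKETCH_PAGEBLOCK_NR_PAGES 512UL

/* Round pfn down to the first pfn of the pageblock that contains it. */
static unsigned long sketch_pageblock_start(unsigned long pfn)
{
	/* The mask trick only works for power-of-two block sizes. */
	assert((SKETCH_PAGEBLOCK_NR_PAGES & (SKETCH_PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(SKETCH_PAGEBLOCK_NR_PAGES - 1);
}

/*
 * Value assigned to the loop variable when skipping ahead to next_valid:
 * one less than the aligned pfn, because the loop increments it before
 * the next iteration.
 */
static unsigned long sketch_skip_to(unsigned long next_valid)
{
	return sketch_pageblock_start(next_valid) - 1;
}
```

With the range from this report, physical address 7b788000 is pfn 0x7b788 (505736), and rounding it down gives pfn 0x7b600 (505344), i.e. physical address 7b600000, which is exactly the start_page ffffea0001ed8000 that was left uninitialized.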
From a similar machine before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068 uptodate,lru,active,mappedtodisk
All the pages since the beginning of the section are initialized, so
move_freepages() is not going to blow up.
The same machine with this fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001e00000 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068 uptodate,lru,active,mappedtodisk
At least the bare minimum of pages is initialized preventing the crash as well.
Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.152001194…
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not. move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-memblock-hardcode-the-end_pfn-being-1.patch
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch