From: Eric Biggers <ebiggers(a)google.com>
Hi Sasha, can you please apply these backports of ext4 encryption fixes
to 4.1-stable? They all have equivalent fixes in 4.4-stable. Most
important is patch 1 which prevents unprivileged users from using (or
abusing) ext4 encryption when it hasn't been enabled on the filesystem
by a system administrator. Patch 2 adds a missing permission check
(CVE-2016-10318), and patch 3 is a backport that Ted sent out some
months ago that seems to have been missed, for a bug in 4.1 that is very
similar to the bug in 4.2+ that was assigned CVE-2017-7374.
Note that ext4 encryption in 4.1 is still pretty broken and should not
be used (even just 4.4-stable is much better); these are just the most
important fixes that really ought to be in 4.1-stable.
Eric Biggers (1):
fscrypto: add authorization check for setting encryption policy
Richard Weinberger (1):
ext4: require encryption feature for EXT4_IOC_SET_ENCRYPTION_POLICY
Theodore Ts'o (1):
ext4 crypto: don't regenerate the per-inode encryption key
unnecessarily
fs/ext4/crypto_fname.c | 5 +++--
fs/ext4/crypto_key.c | 15 ++++++++++++---
fs/ext4/crypto_policy.c | 3 +++
fs/ext4/ext4.h | 1 +
fs/ext4/ioctl.c | 3 +++
fs/ext4/super.c | 3 +++
6 files changed, 25 insertions(+), 5 deletions(-)
--
2.16.2.395.g2e18187dfd-goog
On Fri, Mar 02, 2018 at 02:58:54PM +0100, Wolfram Sang wrote:
>
> > It needs platform maintainers to be motivated to fix it, and one way to
> > provide that motivation is for subsystem maintainers to say no to patches
> > like this. If patches like this get accepted, then the "problem" gets
> > solved, and there is very little motivation to fix the platform itself.
>
> Yes, I can see this. I will drop / revert the patch.
>
TBH, I can't find the threads from November so I feel a bit lost and
there is no documentation for platform_get_irq().
regards,
dan carpenter
With the alc289, the Pin 0x1b is Headphone-Mic, so we should assign
ALC269_FIXUP_DELL4_MIC_NO_PRESENCE rather than
ALC225_FIXUP_DELL1_MIC_NO_PRESENCE to it. And this change is suggested
by Kailang of Realtek and is verified on the machine.
(This fixes the commit 3f2f7c55)
Cc: Kailang Yang <kailang(a)realtek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hui Wang <hui.wang(a)canonical.com>
---
sound/pci/hda/patch_realtek.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/pci/hda/patch_realtek.c b/sound/pci/hda/patch_realtek.c
index b9c93fa..7a9a867 100644
--- a/sound/pci/hda/patch_realtek.c
+++ b/sound/pci/hda/patch_realtek.c
@@ -6872,7 +6872,7 @@ static const struct snd_hda_pin_quirk alc269_pin_fixup_tbl[] = {
{0x12, 0x90a60120},
{0x14, 0x90170110},
{0x21, 0x0321101f}),
- SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC225_FIXUP_DELL1_MIC_NO_PRESENCE,
+ SND_HDA_PIN_QUIRK(0x10ec0289, 0x1028, "Dell", ALC269_FIXUP_DELL4_MIC_NO_PRESENCE,
{0x12, 0xb7a60130},
{0x14, 0x90170110},
{0x21, 0x04211020}),
--
2.7.4
This is the start of the stable review cycle for the 4.4.115 release.
There are 67 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Feb 4 14:07:31 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.115-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.115-rc1
Stefan Agner <stefan(a)agner.ch>
spi: imx: do not access registers while clocks disabled
Fabio Estevam <fabio.estevam(a)nxp.com>
serial: imx: Only wakeup via RTSDEN bit if the system has RTS/CTS
Mark Salyzyn <salyzyn(a)android.com>
selinux: general protection fault in sock_has_perm
Oliver Neukum <oneukum(a)suse.com>
usb: uas: unconditionally bring back host after reset
Hemant Kumar <hemantk(a)codeaurora.org>
usb: f_fs: Prevent gadget unbind if it is already unbound
Johan Hovold <johan(a)kernel.org>
USB: serial: simple: add Motorola Tetra driver
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: list: don't list devices attached to vhci_hcd
Shuah Khan <shuahkh(a)osg.samsung.com>
usbip: prevent bind loops on devices attached to vhci_hcd
Jia-Ju Bai <baijiaju1990(a)gmail.com>
USB: serial: io_edgeport: fix possible sleep-in-atomic
Oliver Neukum <oneukum(a)suse.com>
CDC-ACM: apply quirk for card reader
Hans de Goede <hdegoede(a)redhat.com>
USB: cdc-acm: Do not log urb submission errors on disconnect
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
USB: serial: pl2303: new device id for Chilitag
OKAMOTO Yoshiaki <yokamoto(a)allied-telesis.co.jp>
usb: option: Add support for FS040U modem
Larry Finger <Larry.Finger(a)lwfinger.net>
staging: rtl8188eu: Fix incorrect response to SIOCGIWESSID
Colin Ian King <colin.king(a)canonical.com>
usb: gadget: don't dereference g until after it has been null checked
Icenowy Zheng <icenowy(a)aosc.io>
media: usbtv: add a new usbid
Gustavo A. R. Silva <garsilva(a)embeddedor.com>
scsi: ufs: ufshcd: fix potential NULL pointer dereference in ufshcd_config_vreg
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
scsi: aacraid: Prevent crash in case of free interrupt during scsi EH path
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: ubsan fixes
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
drm/omap: Fix error handling path in 'omap_dmm_probe()'
Yisheng Xie <xieyisheng1(a)huawei.com>
kmemleak: add scheduling point to kmemleak_scan()
Trond Myklebust <trond.myklebust(a)primarydata.com>
SUNRPC: Allow connect to return EHOSTUNREACH
Tetsuo Handa <penguin-kernel(a)I-love.SAKURA.ne.jp>
quota: Check for register_shrinker() failure.
Geert Uytterhoeven <geert+renesas(a)glider.be>
net: ethernet: xilinx: Mark XILINX_LL_TEMAC broken on 64-bit
Robert Lippert <roblip(a)gmail.com>
hwmon: (pmbus) Use 64bit math for DIRECT format values
Vasily Averin <vvs(a)virtuozzo.com>
lockd: fix "list_add double add" caused by legacy signal interface
Andrew Elble <aweits(a)rit.edu>
nfsd: check for use of the closed special stateid
Vasily Averin <vvs(a)virtuozzo.com>
grace: replace BUG_ON by WARN_ONCE in exit_net hook
Trond Myklebust <trond.myklebust(a)primarydata.com>
nfsd: Ensure we check stateid validity in the seqid operation checks
Trond Myklebust <trond.myklebust(a)primarydata.com>
nfsd: CLOSE SHOULD return the invalid special stateid for NFSv4.x (x>0)
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: remove warning when unloading module
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: VMX: Fix rflags cache during vCPU reset
Josef Bacik <jbacik(a)fb.com>
btrfs: fix deadlock when writing out space cache
Chun-Yeow Yeoh <yeohchunyeow(a)gmail.com>
mac80211: fix the update of path metric for RANN frame
zhangliping <zhangliping02(a)baidu.com>
openvswitch: fix the incorrect flow action alloc size
Felix Kuehling <Felix.Kuehling(a)amd.com>
drm/amdkfd: Fix SDMA oversubsription handling
shaoyunl <Shaoyun.Liu(a)amd.com>
drm/amdkfd: Fix SDMA ring buffer size calculation
Felix Kuehling <Felix.Kuehling(a)amd.com>
drm/amdgpu: Fix SDMA load/unload sequence on HWS disabled mode
Michael Lyle <mlyle(a)lyle.org>
bcache: check return value of register_shrinker
James Hogan <jhogan(a)kernel.org>
cpufreq: Add Loongson machine dependencies
Hans de Goede <hdegoede(a)redhat.com>
ACPI / bus: Leave modalias empty for devices which are not present
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Preserve read-only values in the redirection table
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Clear Remote IRR when entry is switched to edge-triggered
Nikita Leshenko <nikita.leshchenko(a)oracle.com>
KVM: x86: ioapic: Fix level-triggered EOI and IOAPIC reconfigure race
Wanpeng Li <wanpeng.li(a)hotmail.com>
KVM: X86: Fix operand/address-size during instruction decoding
Liran Alon <liran.alon(a)oracle.com>
KVM: x86: Don't re-execute instruction when not passing CR2 value
Liran Alon <liran.alon(a)oracle.com>
KVM: x86: emulator: Return to user-mode on L1 CPL=0 emulation failure
Lyude Paul <lyude(a)redhat.com>
igb: Free IRQs when device is hotplugged
Jesse Chan <jc(a)linux.com>
mtd: nand: denali_pci: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Jesse Chan <jc(a)linux.com>
gpio: ath79: add missing MODULE_DESCRIPTION/LICENSE
Jesse Chan <jc(a)linux.com>
gpio: iop: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Jesse Chan <jc(a)linux.com>
power: reset: zx-reboot: add missing MODULE_DESCRIPTION/AUTHOR/LICENSE
Stephan Mueller <smueller(a)chronox.de>
crypto: af_alg - whitelist mask and type
Stephan Mueller <smueller(a)chronox.de>
crypto: aesni - handle zero length dst buffer
Takashi Iwai <tiwai(a)suse.de>
ALSA: seq: Make ioctls race-free
Hugh Dickins <hughd(a)google.com>
kaiser: fix intel_bts perf crashes
Dave Hansen <dave.hansen(a)linux.intel.com>
x86/pti: Make unpoison of pgd for trusted boot work for real
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: reject stores into ctx via st and xadd
Alexei Starovoitov <ast(a)kernel.org>
bpf: fix 32-bit divide by zero
Eric Dumazet <edumazet(a)google.com>
bpf: fix divides by zero
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: avoid false sharing of map refcount with max_entries
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: arsh is not supported in 32 bit alu thus reject it
Alexei Starovoitov <ast(a)kernel.org>
bpf: introduce BPF_JIT_ALWAYS_ON config
Alexei Starovoitov <ast(a)fb.com>
bpf: fix bpf_tail_call() x64 JIT
Eric Dumazet <edumazet(a)google.com>
x86: bpf_jit: small optimization in emit_bpf_tail_call()
Alexei Starovoitov <ast(a)fb.com>
bpf: fix branch pruning logic
Linus Torvalds <torvalds(a)linux-foundation.org>
loop: fix concurrent lo_open/lo_release
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/Kconfig | 1 +
arch/s390/Kconfig | 1 +
arch/x86/Kconfig | 1 +
arch/x86/crypto/aesni-intel_glue.c | 2 +-
arch/x86/include/asm/kvm_host.h | 3 +-
arch/x86/kernel/cpu/perf_event_intel_bts.c | 44 ++++++++++----
arch/x86/kernel/tboot.c | 10 ++++
arch/x86/kvm/emulate.c | 7 +++
arch/x86/kvm/ioapic.c | 20 ++++++-
arch/x86/kvm/vmx.c | 4 +-
arch/x86/kvm/x86.c | 2 +-
arch/x86/net/bpf_jit_comp.c | 13 ++--
crypto/af_alg.c | 10 ++--
drivers/acpi/device_sysfs.c | 4 ++
drivers/block/loop.c | 10 +++-
drivers/cpufreq/Kconfig | 2 +
drivers/gpio/gpio-ath79.c | 3 +
drivers/gpio/gpio-iop.c | 4 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v7.c | 47 +++++++++++----
drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_cik.c | 4 +-
.../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 18 ++++++
drivers/gpu/drm/omapdrm/omap_dmm_tiler.c | 3 +-
drivers/hwmon/pmbus/pmbus_core.c | 21 ++++---
drivers/md/bcache/btree.c | 5 +-
drivers/media/usb/usbtv/usbtv-core.c | 1 +
drivers/mtd/nand/denali_pci.c | 4 ++
drivers/net/ethernet/intel/igb/igb_main.c | 2 +-
drivers/net/ethernet/xilinx/Kconfig | 1 +
drivers/net/xen-netfront.c | 18 ++++++
drivers/power/reset/zx-reboot.c | 4 ++
drivers/scsi/aacraid/commsup.c | 2 +-
drivers/scsi/ufs/ufshcd.c | 7 ++-
drivers/spi/spi-imx.c | 15 ++++-
drivers/staging/rtl8188eu/os_dep/ioctl_linux.c | 14 ++---
drivers/tty/serial/imx.c | 14 +++--
drivers/usb/class/cdc-acm.c | 5 +-
drivers/usb/gadget/composite.c | 7 ++-
drivers/usb/gadget/function/f_fs.c | 3 +-
drivers/usb/serial/Kconfig | 1 +
drivers/usb/serial/io_edgeport.c | 1 -
drivers/usb/serial/option.c | 5 ++
drivers/usb/serial/pl2303.c | 1 +
drivers/usb/serial/pl2303.h | 1 +
drivers/usb/serial/usb-serial-simple.c | 7 +++
drivers/usb/storage/uas.c | 7 +--
fs/btrfs/free-space-cache.c | 3 +-
fs/nfs_common/grace.c | 10 +++-
fs/nfsd/nfs4state.c | 34 ++++++-----
fs/quota/dquot.c | 3 +-
fs/xfs/xfs_aops.c | 6 +-
include/linux/bpf.h | 16 +++--
init/Kconfig | 7 +++
kernel/bpf/core.c | 30 ++++++++--
kernel/bpf/verifier.c | 70 ++++++++++++++++++++++
lib/test_bpf.c | 13 ++--
mm/kmemleak.c | 2 +
net/Kconfig | 3 +
net/core/filter.c | 8 ++-
net/core/sysctl_net_core.c | 6 ++
net/mac80211/mesh_hwmp.c | 15 +++--
net/openvswitch/flow_netlink.c | 16 ++---
net/socket.c | 9 +++
net/sunrpc/xprtsock.c | 1 +
security/selinux/hooks.c | 2 +
sound/core/seq/seq_clientmgr.c | 10 +++-
sound/core/seq/seq_clientmgr.h | 1 +
tools/usb/usbip/src/usbip_bind.c | 9 +++
tools/usb/usbip/src/usbip_list.c | 9 +++
69 files changed, 504 insertions(+), 142 deletions(-)
Rediffed for RHEL7.4.z.
Conflicts:
The return value of md_make_request is different from RHEL to upstream.
And RHEL7 MD code does not use bio->bi_opf, but bio->bi_rw.
-Nigel Croxon
commit 393debc23c7820211d1c8253dd6a8408a7628fe7
Author: Shaohua Li <shli(a)fb.com>
Date: Thu Sep 21 10:23:35 2017 -0700
md: separate request handling
With commit cc27b0c78c79, pers->make_request could bail out without handling
the bio. If that happens, we should retry. The commit fixes md_make_request
but not other call sites. Separate the request handling part, so other call
sites can use it.
Reported-by: Nate Dailey <nate.dailey(a)stratus.com>
Fix: cc27b0c78c79(md: fix deadlock between mddev_suspend() and md_write_start())
Cc: stable(a)vger.kernel.org
Reviewed-by: NeilBrown <neilb(a)suse.com>
Signed-off-by: Shaohua Li <shli(a)fb.com>
Signed-off-by: Denys Vlasenko <dvlasenk(a)redhat.com>
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 85fe7a99290..407e15f4bfe 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -248,21 +248,8 @@ static DEFINE_SPINLOCK(all_mddevs_lock);
* call has finished, the bio has been linked into some internal structure
* and so is visible to ->quiesce(), so we don't need the refcount any more.
*/
-static void md_make_request(struct request_queue *q, struct bio *bio)
+void md_handle_request(struct mddev *mddev, struct bio *bio)
{
- const int rw = bio_data_dir(bio);
- struct mddev *mddev = q->queuedata;
- int cpu;
- unsigned int sectors;
-
- if (mddev == NULL || mddev->pers == NULL) {
- bio_io_error(bio);
- return;
- }
- if (mddev->ro == 1 && unlikely(rw == WRITE)) {
- bio_endio(bio, bio_sectors(bio) == 0 ? 0 : -EROFS);
- return;
- }
check_suspended:
smp_rmb(); /* Ensure implications of 'active' are visible */
rcu_read_lock();
@@ -282,24 +269,45 @@ check_suspended:
atomic_inc(&mddev->active_io);
rcu_read_unlock();
- /*
- * save the sectors now since our bio can
- * go away inside make_request
- */
- sectors = bio_sectors(bio);
if (!mddev->pers->make_request(mddev, bio)) {
atomic_dec(&mddev->active_io);
wake_up(&mddev->sb_wait);
goto check_suspended;
}
+ if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
+ wake_up(&mddev->sb_wait);
+}
+EXPORT_SYMBOL(md_handle_request);
+
+static void md_make_request(struct request_queue *q, struct bio *bio)
+{
+ const int rw = bio_data_dir(bio);
+ struct mddev *mddev = q->queuedata;
+ int cpu;
+ unsigned int sectors;
+
+ if (mddev == NULL || mddev->pers == NULL) {
+ bio_io_error(bio);
+ return;
+ }
+ if (mddev->ro == 1 && unlikely(rw == WRITE)) {
+ bio_endio(bio, bio_sectors(bio) == 0 ? 0 : -EROFS);
+ return;
+ }
+
+ /*
+ * save the sectors now since our bio can
+ * go away inside make_request
+ */
+ sectors = bio_sectors(bio);
+
+ md_handle_request(mddev, bio);
+
cpu = part_stat_lock();
part_stat_inc(cpu, &mddev->gendisk->part0, ios[rw]);
part_stat_add(cpu, &mddev->gendisk->part0, sectors[rw], sectors);
part_stat_unlock();
-
- if (atomic_dec_and_test(&mddev->active_io) && mddev->suspended)
- wake_up(&mddev->sb_wait);
}
/* mddev_suspend makes sure no new requests are submitted
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 0d13bf88f41..12e19d6d373 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -679,6 +679,7 @@ extern void md_stop_writes(struct mddev *mddev);
extern int md_rdev_init(struct md_rdev *rdev);
extern void md_rdev_clear(struct md_rdev *rdev);
+extern void md_handle_request(struct mddev *mddev, struct bio *bio);
extern void mddev_suspend(struct mddev *mddev);
extern void mddev_resume(struct mddev *mddev);
extern struct bio *bio_clone_mddev(struct bio *bio, gfp_t gfp_mask,
--
2.16.2
Changes since v4 [1]:
* Fix the changelog of "dax: introduce IS_DEVDAX() and IS_FSDAX()" to
better clarify the need for new helpers (Jan)
* Replace dax_sem_is_locked() with dax_sem_assert_held() (Jan)
* Use file_inode() in vma_is_dax() (Jan)
* Resend the full series to linux-xfs@ (Dave)
* Collect Jan's Reviewed-by
[1]: https://lists.01.org/pipermail/linux-nvdimm/2018-February/014271.html
---
The vfio interface, like RDMA, wants to setup long term (indefinite)
pins of the pages backing an address range so that a guest or userspace
driver can perform DMA to the with physical address. Given that this
pinning may lead to filesystem operations deadlocking in the
filesystem-dax case, the pinning request needs to be rejected.
The longer term fix for vfio, RDMA, and any other long term pin user, is
to provide a 'pin with lease' mechanism. Similar to the leases that are
hold for pNFS RDMA layouts, this userspace lease gives the kernel a way
to notify userspace that the block layout of the file is changing and
the kernel is revoking access to pinned pages.
Related to this change is the discovery that vma_is_fsdax() was causing
device-dax inode detection to fail. That lead to series of fixes and
cleanups to make sure that S_DAX is defined correctly in the
CONFIG_FS_DAX=n + CONFIG_DEV_DAX=y case.
---
Dan Williams (12):
dax: fix vma_is_fsdax() helper
dax: introduce IS_DEVDAX() and IS_FSDAX()
ext2, dax: finish implementing dax_sem helpers
ext2, dax: define ext2_dax_*() infrastructure in all cases
ext4, dax: define ext4_dax_*() infrastructure in all cases
ext2, dax: replace IS_DAX() with IS_FSDAX()
ext4, dax: replace IS_DAX() with IS_FSDAX()
xfs, dax: replace IS_DAX() with IS_FSDAX()
mm, dax: replace IS_DAX() with IS_DEVDAX() or IS_FSDAX()
fs, dax: kill IS_DAX()
dax: fix S_DAX definition
vfio: disable filesystem-dax page pinning
drivers/vfio/vfio_iommu_type1.c | 18 ++++++++++++++--
fs/ext2/ext2.h | 6 +++++
fs/ext2/file.c | 19 +++++------------
fs/ext2/inode.c | 10 ++++-----
fs/ext4/file.c | 18 +++++-----------
fs/ext4/inode.c | 4 ++--
fs/ext4/ioctl.c | 2 +-
fs/ext4/super.c | 2 +-
fs/iomap.c | 2 +-
fs/xfs/xfs_file.c | 14 ++++++-------
fs/xfs/xfs_ioctl.c | 4 ++--
fs/xfs/xfs_iomap.c | 6 +++--
fs/xfs/xfs_reflink.c | 2 +-
include/linux/dax.h | 12 ++++++++---
include/linux/fs.h | 43 ++++++++++++++++++++++++++++-----------
mm/fadvise.c | 3 ++-
mm/filemap.c | 4 ++--
mm/huge_memory.c | 4 +++-
mm/madvise.c | 3 ++-
19 files changed, 102 insertions(+), 74 deletions(-)
The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been added to the -mm tree. Its filename is
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-memmap_init_zone…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-memmap_init_zone…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where
possible") introduced a bug where move_freepages() triggers a VM_BUG_ON()
on uninitialized page structure due to pageblock alignment. To fix this,
simply align the skipped pfns in memmap_init_zone() the same way as in
move_freepages_block().
Seen in one of the RHEL reports:
crash> log | grep -e BUG -e RIP -e Call.Trace -e move_freepages_block -e rmqueue -e freelist -A1
kernel BUG at mm/page_alloc.c:1389!
invalid opcode: 0000 [#1] SMP
--
RIP: 0010:[<ffffffff8118833e>] [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP: 0018:ffff88054d727688 EFLAGS: 00010087
--
Call Trace:
[<ffffffff811883b3>] move_freepages_block+0x73/0x80
[<ffffffff81189e63>] __rmqueue+0x263/0x460
[<ffffffff8118c781>] get_page_from_freelist+0x7e1/0x9e0
[<ffffffff8118caf6>] __alloc_pages_nodemask+0x176/0x420
--
RIP [<ffffffff8118833e>] move_freepages+0x15e/0x160
RSP <ffff88054d727688>
crash> page_init_bug -v | grep RAM
<struct resource 0xffff88067fffd2f8> 1000 - 9bfff System RAM (620.00 KiB)
<struct resource 0xffff88067fffd3a0> 100000 - 430bffff System RAM ( 1.05 GiB = 1071.75 MiB = 1097472.00 KiB)
<struct resource 0xffff88067fffd410> 4b0c8000 - 4bf9cfff System RAM ( 14.83 MiB = 15188.00 KiB)
<struct resource 0xffff88067fffd480> 4bfac000 - 646b1fff System RAM (391.02 MiB = 400408.00 KiB)
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct resource 0xffff88067fffd640> 100000000 - 67fffffff System RAM ( 22.00 GiB)
crash> page_init_bug | head -6
<struct resource 0xffff88067fffd560> 7b788000 - 7b7fffff System RAM (480.00 KiB)
<struct page 0xffffea0001ede200> 1fffff00000000 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
<struct page 0xffffea0001ede200> 505736 505344 <struct page 0xffffea0001ed8000> 505855 <struct page 0xffffea0001edffc0>
<struct page 0xffffea0001ed8000> 0 0 <struct pglist_data 0xffff88047ffd9000> 0 <struct zone 0xffff88047ffd9000> DMA 1 4095
<struct page 0xffffea0001edffc0> 1fffff00000400 0 <struct pglist_data 0xffff88047ffd9000> 1 <struct zone 0xffff88047ffd9800> DMA32 4096 1048575
BUG, zones differ!
Note that this range follows two not populated sections 68000000-77ffffff
in this zone. 7b788000-7b7fffff is the first one after a gap. This makes
memmap_init_zone() skip all the pfns up to the beginning of this range.
But this range is not pageblock (2M) aligned. In fact no range has to be.
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b787000 7b788000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001ed7fc0 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 0 0 <<<<
ffffea0001ede1c0 7b787000 0 0 0 0
ffffea0001ede200 7b788000 0 0 1 1fffff00000000
Top part of page flags should contain nodeid and zonenr, which is not
the case for page ffffea0001ed8000 here (<<<<).
crash> log | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded20
fffea0001edffc0
crash> bt -r | grep -o fffea0001ed[^\ ]* | sort -u
fffea0001ed8000
fffea0001eded00
fffea0001eded20
fffea0001edffc0
Initialization of the whole beginning of the section is skipped up to the
start of the range due to the commit b92df1de5d28. Now any code calling
move_freepages_block() (like reusing the page from a freelist as in this
example) with a page from the beginning of the range will get the page
rounded down to start_page ffffea0001ed8000 and passed to move_freepages()
which crashes on assertion getting wrong zonenr.
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Note, page_zone() derives the zone from page flags here.
>From similar machine before commit b92df1de5d28:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
fffff73941e00000 78000000 0 0 1 1fffff00000000
fffff73941ed7fc0 7b5ff000 0 0 1 1fffff00000000
fffff73941ed8000 7b600000 0 0 1 1fffff00000000
fffff73941edff80 7b7fe000 0 0 1 1fffff00000000
fffff73941edffc0 7b7ff000 ffff8e67e04d3ae0 ad84 1 1fffff00020068 uptodate,lru,active,mappedtodisk
All the pages since the beginning of the section are initialized.
move_freepages()' not gonna blow up.
The same machine with this fix applied:
crash> kmem -p 77fff000 78000000 7b5ff000 7b600000 7b7fe000 7b7ff000
PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0001e00000 78000000 0 0 0 0
ffffea0001e00000 7b5ff000 0 0 0 0
ffffea0001ed8000 7b600000 0 0 1 1fffff00000000
ffffea0001edff80 7b7fe000 0 0 1 1fffff00000000
ffffea0001edffc0 7b7ff000 ffff88017fb13720 8 2 1fffff00020068 uptodate,lru,active,mappedtodisk
At least the bare minimum of pages is initialized preventing the crash as well.
Link: http://lkml.kernel.org/r/0485727b2e82da7efbce5f6ba42524b429d0391a.152001194…
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not. move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn, end_pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-memblock-hardcode-the-end_pfn-being-1.patch
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
Hi All,
Resent without non-upstream patches.
This backport patchset fixed the spectre issue, it's original branch:
https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git/log/?h=kpti
A few dependency or fixingpatches are also picked up, if they are necessary
and no functional changes.
No bug found from kernelci.org and lkft testing. It also could be gotten from:
git://git.linaro.org/kernel/linux-linaro-stable.git v4.9-spectre-upstream-only
Comments are appreciated!
Regards
Alex
[PATCH 01/45] mm: Introduce lm_alias
[PATCH 02/45] arm64: alternatives: apply boot time fixups via the
[PATCH 03/45] arm64: barrier: Add CSDB macros to control data-value
[PATCH 04/45] arm64: Implement array_index_mask_nospec()
[PATCH 05/45] arm64: move TASK_* definitions to <asm/processor.h>
[PATCH 06/45] arm64: Factor out PAN enabling/disabling into separate
[PATCH 07/45] arm64: Factor out TTBR0_EL1 post-update workaround into
[PATCH 08/45] arm64: uaccess: consistently check object sizes
[PATCH 09/45] arm64: Make USER_DS an inclusive limit
[PATCH 10/45] arm64: Use pointer masking to limit uaccess speculation
[PATCH 11/45] arm64: syscallno is secretly an int, make it official
[PATCH 12/45] arm64: entry: Ensure branch through syscall table is
[PATCH 13/45] arm64: uaccess: Prevent speculative use of the current
[PATCH 14/45] arm64: uaccess: Don't bother eliding access_ok checks
[PATCH 15/45] arm64: uaccess: Mask __user pointers for __arch_{clear,
[PATCH 16/45] arm64: futex: Mask __user pointers prior to dereference
[PATCH 17/45] drivers/firmware: Expose psci_get_version through
[PATCH 18/45] arm64: cpufeature: __this_cpu_has_cap() shouldn't stop
[PATCH 19/45] arm64: cpu_errata: Allow an erratum to be match for all
[PATCH 20/45] arm64: Run enable method for errata work arounds on
[PATCH 21/45] arm64: cpufeature: Pass capability structure to
[PATCH 22/45] arm64: Move post_ttbr_update_workaround to C code
[PATCH 23/45] arm64: Add skeleton to harden the branch predictor
[PATCH 24/45] arm64: Move BP hardening to check_and_switch_context
[PATCH 25/45] arm64: KVM: Use per-CPU vector when BP hardening is
[PATCH 26/45] arm64: entry: Apply BP hardening for high-priority
[PATCH 27/45] arm64: entry: Apply BP hardening for suspicious
[PATCH 28/45] arm64: cputype: Add missing MIDR values for Cortex-A72
[PATCH 29/45] arm64: Implement branch predictor hardening for
[PATCH 30/45] arm64: KVM: Increment PC after handling an SMC trap
[PATCH 31/45] arm/arm64: KVM: Consolidate the PSCI include files
[PATCH 32/45] arm/arm64: KVM: Add PSCI_VERSION helper
[PATCH 33/45] arm/arm64: KVM: Add smccc accessors to PSCI code
[PATCH 34/45] arm/arm64: KVM: Implement PSCI 1.0 support
[PATCH 35/45] arm/arm64: KVM: Advertise SMCCC v1.1
[PATCH 36/45] arm64: KVM: Make PSCI_VERSION a fast path
[PATCH 37/45] arm/arm64: KVM: Turn kvm_psci_version into a static
[PATCH 38/45] arm64: KVM: Report SMCCC_ARCH_WORKAROUND_1 BP hardening
[PATCH 39/45] arm64: KVM: Add SMCCC_ARCH_WORKAROUND_1 fast handling
[PATCH 40/45] firmware/psci: Expose PSCI conduit
[PATCH 41/45] firmware/psci: Expose SMCCC version through psci_ops
[PATCH 42/45] arm/arm64: smccc: Make function identifiers an unsigned
[PATCH 43/45] arm/arm64: smccc: Implement SMCCC v1.1 inline primitive
[PATCH 44/45] arm64: Add ARM_SMCCC_ARCH_WORKAROUND_1 BP hardening
[PATCH 45/45] arm64: Kill PSCI_GET_VERSION as a variant-2 workaround
The patch titled
Subject: mm-memblock-hardcode-the-end_pfn-being-1-fix
has been added to the -mm tree. Its filename is
mm-memblock-hardcode-the-end_pfn-being-1-fix.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memblock-hardcode-the-end_pfn-b…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memblock-hardcode-the-end_pfn-b…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrew Morton <akpm(a)linux-foundation.org>
Subject: mm-memblock-hardcode-the-end_pfn-being-1-fix
make it work against current -linus, not against -mm
Cc: Daniel Vacek <neelx(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/page_alloc.c~mm-memblock-hardcode-the-end_pfn-being-1-fix mm/page_alloc.c
--- a/mm/page_alloc.c~mm-memblock-hardcode-the-end_pfn-being-1-fix
+++ a/mm/page_alloc.c
@@ -5361,7 +5361,7 @@ void __meminit memmap_init_zone(unsigned
* end_pfn), such that we hit a valid pfn (or end_pfn)
* on our next iteration of the loop.
*/
- pfn = memblock_next_valid_pfn(pfn) - 1;
+ pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from akpm(a)linux-foundation.org are
i-need-old-gcc.patch
mm-memblock-hardcode-the-end_pfn-being-1-fix.patch
arm-arch-arm-include-asm-pageh-needs-personalityh.patch
mm.patch
mm-page_alloc-skip-over-regions-of-invalid-pfns-on-uma-fix.patch
direct-io-minor-cleanups-in-do_blockdev_direct_io-fix.patch
list_lru-prefetch-neighboring-list-entries-before-acquiring-lock-fix.patch
mm-oom-cgroup-aware-oom-killer-fix.patch
mm-oom-docs-describe-the-cgroup-aware-oom-killer-fix-2-fix.patch
proc-add-seq_put_decimal_ull_width-to-speed-up-proc-pid-smaps-fix.patch
sysctl-add-flags-to-support-min-max-range-clamping-fix.patch
linux-next-git-rejects.patch
kernel-forkc-export-kernel_thread-to-modules.patch
slab-leaks3-default-y.patch
The patch titled
Subject: mm/memblock.c: hardcode the end_pfn being -1
has been added to the -mm tree. Its filename is
mm-memblock-hardcode-the-end_pfn-being-1.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-memblock-hardcode-the-end_pfn-b…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-memblock-hardcode-the-end_pfn-b…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/memblock.c: hardcode the end_pfn being -1
This is just a cleanup. It aids handling the special end case in the next
commit.
Link: http://lkml.kernel.org/r/1ca478d4269125a99bcfb1ca04d7b88ac1aee924.152001194…
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 2 +-
2 files changed, 7 insertions(+), 8 deletions(-)
diff -puN mm/memblock.c~mm-memblock-hardcode-the-end_pfn-being-1 mm/memblock.c
--- a/mm/memblock.c~mm-memblock-hardcode-the-end_pfn-being-1
+++ a/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_rang
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_n
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff -puN mm/page_alloc.c~mm-memblock-hardcode-the-end_pfn-being-1 mm/page_alloc.c
--- a/mm/page_alloc.c~mm-memblock-hardcode-the-end_pfn-being-1
+++ a/mm/page_alloc.c
@@ -5361,7 +5361,7 @@ void __meminit memmap_init_zone(unsigned
* end_pfn), such that we hit a valid pfn (or end_pfn)
* on our next iteration of the loop.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = memblock_next_valid_pfn(pfn) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-memblock-hardcode-the-end_pfn-being-1.patch
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been removed from the -mm tree. Its filename was
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
BUG at mm/page_alloc.c:1913
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the same
way as in move_freepages_block().
Link: http://lkml.kernel.org/r/1519988497-28941-1-git-send-email-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
diff -puN mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/memblock.c
--- a/mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_rang
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_n
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-memblock-hardcode-the-end_pfn-being-1.patch
The patch titled
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
has been added to the -mm tree. Its filename is
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-fix-memmap_init_zone…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-fix-memmap_init_zone…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Daniel Vacek <neelx(a)redhat.com>
Subject: mm/page_alloc: fix memmap_init_zone pageblock alignment
BUG at mm/page_alloc.c:1913
> VM_BUG_ON(page_zone(start_page) != page_zone(end_page));
Commit b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns
where possible") introduced a bug where move_freepages() triggers a
VM_BUG_ON() on uninitialized page structure due to pageblock alignment.
To fix this, simply align the skipped pfns in memmap_init_zone() the same
way as in move_freepages_block().
Link: http://lkml.kernel.org/r/1519988497-28941-1-git-send-email-neelx@redhat.com
Fixes: b92df1de5d28 ("mm: page_alloc: skip over regions of invalid pfns where possible")
Signed-off-by: Daniel Vacek <neelx(a)redhat.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Pavel Tatashin <pasha.tatashin(a)oracle.com>
Cc: Paul Burton <paul.burton(a)imgtec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memblock.c | 13 ++++++-------
mm/page_alloc.c | 9 +++++++--
2 files changed, 13 insertions(+), 9 deletions(-)
diff -puN mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/memblock.c
--- a/mm/memblock.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/memblock.c
@@ -1101,13 +1101,12 @@ void __init_memblock __next_mem_pfn_rang
*out_nid = r->nid;
}
-unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn,
- unsigned long max_pfn)
+unsigned long __init_memblock memblock_next_valid_pfn(unsigned long pfn)
{
struct memblock_type *type = &memblock.memory;
unsigned int right = type->cnt;
unsigned int mid, left = 0;
- phys_addr_t addr = PFN_PHYS(pfn + 1);
+ phys_addr_t addr = PFN_PHYS(++pfn);
do {
mid = (right + left) / 2;
@@ -1118,15 +1117,15 @@ unsigned long __init_memblock memblock_n
type->regions[mid].size))
left = mid + 1;
else {
- /* addr is within the region, so pfn + 1 is valid */
- return min(pfn + 1, max_pfn);
+ /* addr is within the region, so pfn is valid */
+ return pfn;
}
} while (left < right);
if (right == type->cnt)
- return max_pfn;
+ return -1UL;
else
- return min(PHYS_PFN(type->regions[right].base), max_pfn);
+ return PHYS_PFN(type->regions[right].base);
}
/**
diff -puN mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-fix-memmap_init_zone-pageblock-alignment
+++ a/mm/page_alloc.c
@@ -5359,9 +5359,14 @@ void __meminit memmap_init_zone(unsigned
/*
* Skip to the pfn preceding the next valid one (or
* end_pfn), such that we hit a valid pfn (or end_pfn)
- * on our next iteration of the loop.
+ * on our next iteration of the loop. Note that it needs
+ * to be pageblock aligned even when the region itself
+ * is not as move_freepages_block() can shift ahead of
+ * the valid region but still depends on correct page
+ * metadata.
*/
- pfn = memblock_next_valid_pfn(pfn, end_pfn) - 1;
+ pfn = (memblock_next_valid_pfn(pfn) &
+ ~(pageblock_nr_pages-1)) - 1;
#endif
continue;
}
_
Patches currently in -mm which might be from neelx(a)redhat.com are
mm-page_alloc-fix-memmap_init_zone-pageblock-alignment.patch
This is the start of the stable review cycle for the 3.18.98 release.
There are 24 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Mar 4 08:42:22 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.98-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-3.18.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 3.18.98-rc1
Yangbo Lu <yangbo.lu(a)nxp.com>
net: gianfar_ptp: move set_fipers() to spinlock protecting area
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: make use of pre-calculated len
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix partial gntdev_mmap() cleanup
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix off-by-one error when unmapping with holes
Sergei Shtylyov <sergei.shtylyov(a)cogentembedded.com>
SolutionEngine771x: fix Ether platform data
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
mdio-sun4i: Fix a memory leak
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: enable device after manual module load
Xiongwei Song <sxwjean(a)gmail.com>
drm/ttm: check the return value of kzalloc
Tushar Dave <tushar.n.dave(a)oracle.com>
e1000: fix disabling already-disabled warning
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: check result of register_shrinker()
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: fix missed destroy of qi_tree_lock
Stefan Haberland <sth(a)linux.vnet.ibm.com>
s390/dasd: fix wrongly assigned configuration data
Matthieu CASTET <matthieu.castet(a)parrot.com>
led: core: Fix brightness setting when setting delay_off=0
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
bnx2x: Improve reliability in case of nested PCI errors
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Enable PHY reset in MTU change path for 5720
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Add workaround to restrict 5762 MRRS to 2048
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
Alexander Kochetkov <al.kochet(a)gmail.com>
net: arc_emac: fix arc_emac_rx() error paths
Radu Pirea <radu.pirea(a)microchip.com>
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
Al Viro <viro(a)zeniv.linux.org.uk>
sget(): handle failures of register_shrinker()
Brendan McGrath <redmcg(a)redmandi.dyndns.org>
ipv6: icmp6: Allow icmp messages to be looped back
Sascha Hauer <s.hauer(a)pengutronix.de>
mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
Anna-Maria Gleixner <anna-maria(a)linutronix.de>
hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
Jakub Sitnicki <jkbs(a)redhat.com>
ipv6: Skip XFRM lookup if dst_entry in socket cache is valid
-------------
Diffstat:
Makefile | 4 +-
arch/sh/boards/mach-se/770x/setup.c | 10 ++++-
drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +
drivers/leds/led-core.c | 2 +-
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 6 +--
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++++----------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 ++++++-
drivers/net/ethernet/broadcom/tg3.c | 13 +++++-
drivers/net/ethernet/broadcom/tg3.h | 4 ++
drivers/net/ethernet/freescale/gianfar_ptp.c | 3 +-
drivers/net/ethernet/intel/e1000/e1000.h | 3 +-
drivers/net/ethernet/intel/e1000/e1000_main.c | 27 +++++++++---
drivers/net/phy/mdio-sun4i.c | 6 ++-
drivers/net/xen-netfront.c | 1 +
drivers/s390/block/dasd_3990_erp.c | 10 +++++
drivers/scsi/storvsc_drv.c | 3 +-
drivers/spi/spi-atmel.c | 2 +-
drivers/xen/gntdev.c | 8 ++--
fs/super.c | 6 ++-
fs/xfs/xfs_qm.c | 46 +++++++++++++-------
kernel/time/hrtimer.c | 7 +++-
net/ipv6/ip6_output.c | 11 ++---
net/ipv6/route.c | 1 +
net/sctp/socket.c | 16 ++++---
25 files changed, 180 insertions(+), 82 deletions(-)
This is the start of the stable review cycle for the 4.9.86 release.
There are 56 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun Mar 4 08:44:26 UTC 2018.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.86-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.86-rc1
James Hogan <jhogan(a)kernel.org>
MIPS: Implement __multi3 for GCC7 MIPS64r6 builds
Punit Agrawal <punit.agrawal(a)arm.com>
KVM: arm/arm64: Fix check for hugepage size when allocating at Stage 2
Yangbo Lu <yangbo.lu(a)nxp.com>
net: gianfar_ptp: move set_fipers() to spinlock protecting area
Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
sctp: make use of pre-calculated len
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix partial gntdev_mmap() cleanup
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/gntdev: Fix off-by-one error when unmapping with holes
Sergei Shtylyov <sergei.shtylyov(a)cogentembedded.com>
SolutionEngine771x: fix Ether platform data
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
mdio-sun4i: Fix a memory leak
Eduardo Otubo <otubo(a)redhat.com>
xen-netfront: enable device after manual module load
Venkat Duvvuru <venkatkumar.duvvuru(a)broadcom.com>
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
Luu An Phu <phu.luuan(a)nxp.com>
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
Johannes Berg <johannes.berg(a)intel.com>
mac80211: mesh: drop frames appearing to be from us
Hao Chen <flank3rsky(a)gmail.com>
nl80211: Check for the required netlink attribute presence
Alexander Duyck <alexander.h.duyck(a)intel.com>
i40e/i40evf: Account for frags split over multiple descriptors in check linearize
Felix Janda <felix.janda(a)posteo.de>
uapi libc compat: add fallback for unsupported libcs
Xiongwei Song <sxwjean(a)gmail.com>
drm/ttm: check the return value of kzalloc
SZ Lin (林上智) <sz.lin(a)moxa.com>
NET: usb: qmi_wwan: add support for YUGA CLM920-NC5 PID 0x9625
Tushar Dave <tushar.n.dave(a)oracle.com>
e1000: fix disabling already-disabled warning
Gao Feng <gfree.wind(a)vip.163.com>
macvlan: Fix one possible double free
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: check result of register_shrinker()
Aliaksei Karaliou <akaraliou.dev(a)gmail.com>
xfs: quota: fix missed destroy of qi_tree_lock
Erez Shitrit <erezsh(a)mellanox.com>
IB/ipoib: Fix race condition in neigh creation
Leon Romanovsky <leonro(a)mellanox.com>
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Stefan Haberland <sth(a)linux.vnet.ibm.com>
s390/dasd: fix wrongly assigned configuration data
Guenter Roeck <linux(a)roeck-us.net>
genirq: Guard handle_bad_irq log messages
Nitzan Carmi <nitzanc(a)mellanox.com>
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
Matthieu CASTET <matthieu.castet(a)parrot.com>
led: core: Fix brightness setting when setting delay_off=0
Guilherme G. Piccoli <gpiccoli(a)linux.vnet.ibm.com>
bnx2x: Improve reliability in case of nested PCI errors
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Enable PHY reset in MTU change path for 5720
Siva Reddy Kallam <siva.kallam(a)broadcom.com>
tg3: Add workaround to restrict 5762 MRRS to 2048
Tommi Rantala <tommi.t.rantala(a)nokia.com>
tipc: fix tipc_mon_delete() oops in tipc_enable_bearer() error path
Tommi Rantala <tommi.t.rantala(a)nokia.com>
tipc: error path leak fixes in tipc_enable_bearer()
James Hogan <jhogan(a)kernel.org>
lib/mpi: Fix umul_ppmm() for MIPS64r6
Arnd Bergmann <arnd(a)arndb.de>
ARM: dts: ls1021a: fix incorrect clock references
Cathy Avery <cavery(a)redhat.com>
scsi: storvsc: Fix scsi_cmd error assignments in storvsc_handle_error
Fredrik Hallenberg <megahallon(a)gmail.com>
net: stmmac: Fix TX timestamp calculation
Xin Long <lucien.xin(a)gmail.com>
ip6_tunnel: get the min mtu properly in ip6_tnl_xmit
Alexander Kochetkov <al.kochet(a)gmail.com>
net: arc_emac: fix arc_emac_rx() error paths
Sean Wang <sean.wang(a)mediatek.com>
net: mediatek: setup proper state for disabled GMAC on the default
Abhijeet Kumar <abhijeet.kumar(a)intel.com>
ASoC: nau8825: fix issue that pop noise when start capture
Radu Pirea <radu.pirea(a)microchip.com>
spi: atmel: fixed spin_lock usage inside atmel_spi_remove
Jia-Ju Bai <baijiaju1990(a)163.com>
mac80211_hwsim: Fix a possible sleep-in-atomic bug in hwsim_get_radio_nl
Karol Herbst <kherbst(a)redhat.com>
drm/nouveau/pci: do a msi rearm on init
Alexey Khoroshilov <khoroshilov(a)ispras.ru>
net: phy: xgene: disable clk on error paths
Al Viro <viro(a)zeniv.linux.org.uk>
sget(): handle failures of register_shrinker()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
x86/asm: Allow again using asm.h when building for the 'bpf' clang target
Chunyan Zhang <zhang.lyra(a)gmail.com>
ARM: 8731/1: Fix csum_partial_copy_from_user() stack mismatch
Brendan McGrath <redmcg(a)redmandi.dyndns.org>
ipv6: icmp6: Allow icmp messages to be looped back
Albert Hsieh <wen.hsieh(a)broadcom.com>
mtd: nand: brcmnand: Zero bitflip is not an error
Sascha Hauer <s.hauer(a)pengutronix.de>
mtd: nand: gpmi: Fix failure when a erased page has a bitflip at BBM
Daniele Palmas <dnlplm(a)gmail.com>
net: usb: qmi_wwan: add Telit ME910 PID 0x1101 support
Keith Busch <keith.busch(a)intel.com>
nvme: check hw sectors before setting chunk sectors
Andreas Platschek <andreas.platschek(a)opentech.at>
dmaengine: fsl-edma: disable clks on all error paths
Yunlei He <heyunlei(a)huawei.com>
f2fs: fix a bug caused by NULL extent tree
Ben Gardner <gardner.ben(a)gmail.com>
i2c: designware: must wait for enable
Anna-Maria Gleixner <anna-maria(a)linutronix.de>
hrtimer: Ensure POSIX compliance (relative CLOCK_REALTIME hrtimers)
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/ls1021a-qds.dts | 2 +-
arch/arm/boot/dts/ls1021a-twr.dts | 2 +-
arch/arm/kvm/mmu.c | 2 +-
arch/arm/lib/csumpartialcopyuser.S | 4 ++
arch/mips/lib/Makefile | 3 +-
arch/mips/lib/libgcc.h | 17 +++++++
arch/mips/lib/multi3.c | 54 +++++++++++++++++++++
arch/sh/boards/mach-se/770x/setup.c | 10 +++-
arch/x86/include/asm/asm.h | 2 +
drivers/dma/fsl-edma.c | 28 +++++------
drivers/gpu/drm/nouveau/nvkm/subdev/pci/base.c | 7 +++
drivers/gpu/drm/ttm/ttm_page_alloc.c | 2 +
drivers/i2c/busses/i2c-designware-core.c | 2 +-
drivers/infiniband/hw/mlx4/mr.c | 2 +-
drivers/infiniband/hw/mlx5/mr.c | 1 +
drivers/infiniband/ulp/ipoib/ipoib_main.c | 25 +++++++---
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 5 +-
drivers/leds/led-core.c | 2 +-
drivers/mtd/nand/brcmnand/brcmnand.c | 2 +-
drivers/mtd/nand/gpmi-nand/gpmi-nand.c | 6 +--
drivers/net/can/flexcan.c | 2 +-
drivers/net/ethernet/arc/emac_main.c | 53 ++++++++++++---------
drivers/net/ethernet/broadcom/bnx2x/bnx2x_cmn.c | 4 +-
drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c | 14 +++++-
drivers/net/ethernet/broadcom/bnxt/bnxt_sriov.c | 2 +-
drivers/net/ethernet/broadcom/tg3.c | 13 ++++-
drivers/net/ethernet/broadcom/tg3.h | 4 ++
drivers/net/ethernet/freescale/gianfar_ptp.c | 3 +-
drivers/net/ethernet/intel/e1000/e1000.h | 3 +-
drivers/net/ethernet/intel/e1000/e1000_main.c | 27 +++++++++--
drivers/net/ethernet/intel/i40e/i40e_txrx.c | 26 ++++++++--
drivers/net/ethernet/intel/i40evf/i40e_txrx.c | 26 ++++++++--
drivers/net/ethernet/mediatek/mtk_eth_soc.c | 11 +++--
.../net/ethernet/stmicro/stmmac/stmmac_hwtstamp.c | 6 ++-
drivers/net/macvlan.c | 7 ++-
drivers/net/phy/mdio-sun4i.c | 6 ++-
drivers/net/phy/mdio-xgene.c | 21 ++++++---
drivers/net/usb/qmi_wwan.c | 2 +
drivers/net/wireless/mac80211_hwsim.c | 2 +-
drivers/net/xen-netfront.c | 1 +
drivers/nvme/host/core.c | 3 +-
drivers/s390/block/dasd_3990_erp.c | 10 ++++
drivers/scsi/storvsc_drv.c | 3 +-
drivers/spi/spi-atmel.c | 2 +-
drivers/xen/gntdev.c | 8 ++--
fs/f2fs/extent_cache.c | 12 ++++-
fs/super.c | 6 ++-
fs/xfs/xfs_qm.c | 46 +++++++++++-------
include/uapi/linux/libc-compat.h | 55 +++++++++++++++++++++-
kernel/irq/debug.h | 5 ++
kernel/time/hrtimer.c | 7 ++-
lib/mpi/longlong.h | 18 ++++++-
net/ipv6/ip6_tunnel.c | 9 +++-
net/ipv6/route.c | 1 +
net/mac80211/rx.c | 2 +
net/sctp/socket.c | 16 ++++---
net/tipc/bearer.c | 5 +-
net/tipc/monitor.c | 6 ++-
net/wireless/nl80211.c | 3 +-
sound/soc/codecs/nau8825.c | 1 +
61 files changed, 498 insertions(+), 135 deletions(-)
The patch titled
Subject: mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
has been added to the -mm tree. Its filename is
mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-gup-teach-get_user_pages_unlock…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-gup-teach-get_user_pages_unlock…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/SubmitChecklist when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Andrea Arcangeli <aarcange(a)redhat.com>
Subject: mm/gup.c: teach get_user_pages_unlocked to handle FOLL_NOWAIT
KVM is hanging during postcopy live migration with userfaultfd because
get_user_pages_unlocked is not capable to handle FOLL_NOWAIT.
Earlier FOLL_NOWAIT was only ever passed to get_user_pages.
Specifically faultin_page (the callee of get_user_pages_unlocked caller)
doesn't know that if FAULT_FLAG_RETRY_NOWAIT was set in the page fault
flags, when VM_FAULT_RETRY is returned, the mmap_sem wasn't actually
released (even if nonblocking is not NULL). So it sets *nonblocking to
zero and the caller won't release the mmap_sem thinking it was already
released, but it wasn't because of FOLL_NOWAIT.
Link: http://lkml.kernel.org/r/20180302174343.5421-2-aarcange@redhat.com
Fixes: ce53053ce378c ("kvm: switch get_user_page_nowait() to get_user_pages_unlocked()")
Signed-off-by: Andrea Arcangeli <aarcange(a)redhat.com>
Reported-by: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff -puN mm/gup.c~mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait mm/gup.c
--- a/mm/gup.c~mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait
+++ a/mm/gup.c
@@ -516,7 +516,7 @@ static int faultin_page(struct task_stru
}
if (ret & VM_FAULT_RETRY) {
- if (nonblocking)
+ if (nonblocking && !(fault_flags & FAULT_FLAG_RETRY_NOWAIT))
*nonblocking = 0;
return -EBUSY;
}
@@ -890,7 +890,10 @@ static __always_inline long __get_user_p
break;
}
if (*locked) {
- /* VM_FAULT_RETRY didn't trigger */
+ /*
+ * VM_FAULT_RETRY didn't trigger or it was a
+ * FOLL_NOWAIT.
+ */
if (!pages_done)
pages_done = ret;
break;
_
Patches currently in -mm which might be from aarcange(a)redhat.com are
mm-gup-teach-get_user_pages_unlocked-to-handle-foll_nowait.patch
On Fri, Mar 02, 2018 at 05:41:28PM +0000, Harsh Shandilya wrote:
> On Fri 2 Mar, 2018, 2:25 PM Greg Kroah-Hartman, <gregkh(a)linuxfoundation.org>
> wrote:
>
> > This is the start of the stable review cycle for the 3.18.98 release.
> > There are 24 patches in this series, all will be posted as a response
> > to this one. If anyone has any issues with these being applied, please
> > let me know.
> >
> > Responses should be made by Sun Mar 4 08:42:22 UTC 2018.
> > Anything received after that time might be too late.
> >
> > The whole patch series can be found in one patch at:
> >
> > https://www.kernel.org/pub/linux/kernel/v3.x/stable-review/patch-3.18.98-rc…
> > or in the git tree and branch at:
> > git://
> > git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
> > linux-3.18.y
> > and the diffstat can be found below.
> >
>
> Builds and boots on the OnePlus 3T, no merge issues with CAF's msm-3.18
> tree.
Great, thanks for testing and letting me know.
greg k-h
> >
When SPI transfers can be offloaded using DMA, the SPI core need to
build a scatterlist to make sure that the buffer to be transferred is
dma-able.
This patch fixes the scatterlist entry size computation in the case
where the maximum acceptable scatterlist entry supported by the DMA
controller is less than PAGE_SIZE, when the buffer is vmalloced.
For each entry, the actual size is given by the minimum between the
desc_len (which is the max buffer size supported by the DMA controller)
and the remaining buffer length until we cross a page boundary.
Fixes: 65598c13fd66 ("spi: Fix per-page mapping of unaligned vmalloc-ed buffer")
Signed-off-by: Maxime Chevallier <maxime.chevallier(a)bootlin.com>
---
drivers/spi/spi.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index b33a727a0158..4153f959f28c 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -779,8 +779,14 @@ static int spi_map_buf(struct spi_controller *ctlr, struct device *dev,
for (i = 0; i < sgs; i++) {
if (vmalloced_buf || kmap_buf) {
- min = min_t(size_t,
- len, desc_len - offset_in_page(buf));
+ /*
+ * Next scatterlist entry size is the minimum between
+ * the desc_len and the remaining buffer length that
+ * fits in a page.
+ */
+ min = min_t(size_t, desc_len,
+ min_t(size_t, len,
+ PAGE_SIZE - offset_in_page(buf)));
if (vmalloced_buf)
vm_page = vmalloc_to_page(buf);
else
--
2.11.0
On 02/03/18 15:58, Simon Gaiser wrote:
> Juergen Gross:
>> On 20/02/18 05:56, Simon Gaiser wrote:
>>> Juergen Gross:
>>>> On 07/02/18 23:22, Simon Gaiser wrote:
>>>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>>>
>>>>> Before on an error response to XS_TRANSACTION_END
>>>>> xenbus_dev_request_and_reply() would not decrement the active
>>>>> transaction counter. But xenbus_transaction_end() has always counted the
>>>>> transaction as finished regardless of the response.
>>>>
>>>> Which is correct now. Xenstore will free all transaction related
>>>> data regardless of the response. A once failed transaction can't
>>>> be repaired, it has to be repeated completely.
>>>
>>> So if xenstore frees the transaction why should we keep it in the list
>>> with pending transaction in xenbus_dev_frontend? That's exactly what
>>> this patch fixes by always removing it from the list, not only on a
>>> successful response (See below for the EINVAL case).
>>
>> Aah, sorry, I seem to have misread my own coding. :-(
>>
>> Yes, you are right. Sorry for not seeing it before.
>>
>>>
>>> [...]
>>>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>>>> transaction twice corrupts the reference count. So fix this by also
>>>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>>>> regardless of the return code.
>>>>
>>>> A transaction in the list of transactions should not considered to be
>>>> finished. Either it is not on the list or it is still pending.
>>>
>>> With "considering a transaction closed" I mean "take the code path which
>>> removes the transaction from the list with pending transactions".
>>>
>>> From the follow-up mail:
>>>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>>>> xenbus_transaction_end() will always count the transaction as finished
>>>>>> regardless the response code (handled in xs_request_exit()).
>>>>>
>>>>> ENOENT should not decrement the transaction counter, while all
>>>>> other responses to XS_TRANSACTION_END should still do so.
>>>>
>>>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>>>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>>>> for ENOENT, though.
>>>>
>>>> What should be handled is EINVAL: this would happen if a user specified
>>>> a string different from "T" and "F".
>>>
>>> Ok, I will handle those cases in xs_request_exit(). Although I don't
>>> like that this depends on the internals of xenstore (At least to me it's
>>> not obvious why it should only return ENOENT or EINVAL in this case).
>>>
>>> In the xenbus_write_transaction() case checking the string before
>>> sending the transaction (like the transaction itself is verified) would
>>> avoid this problem.
>>
>> Right. I'd prefer this solution.
>>
>> Remains the only problem you tried to tackle with your second patch: a
>> kernel driver going crazy and ending transactions it never started (or
>> ending them multiple times). The EINVAL case can't happen here, but
>> ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
>> need to keep track of the transactions like in the user interface and
>> refuse ending an unknown transaction. Or you trust the kernel users.
>> Trying to fix the usage counter seems to be the wrong approach IMO.
>
> The point of the second patch was to detect such bugs. This would have
> saved quite some time to find this bug. I added the "fix" of the counter
> I just because it was trivial after having the if there.
>
> Adding tracking seems to be a quite complex solution for a _potential_
> problem.
I agree.
> So I would go with checking ENOENT in xs_request_exit(). Should this be
> WARN_ON_ONCE()? Since this normally should not happen I would say yes.
Yes, having a WARN_ON_ONCE here will help.
> Should I keep the reference counter sanity check? And if yes, with the
> "fix" to the counter?
I'd drop it. This really should not happen and blowing up kernel size
with checks of impossible situations isn't the way to go.
In case you really want to do something here you can add something like
ASSERT(xs_state_users) before decrementing the counter.
Juergen
On 20/02/18 05:56, Simon Gaiser wrote:
> Juergen Gross:
>> On 07/02/18 23:22, Simon Gaiser wrote:
>>> Commit fd8aa9095a95 ("xen: optimize xenbus driver for multiple
>>> concurrent xenstore accesses") made a subtle change to the semantic of
>>> xenbus_dev_request_and_reply() and xenbus_transaction_end().
>>>
>>> Before on an error response to XS_TRANSACTION_END
>>> xenbus_dev_request_and_reply() would not decrement the active
>>> transaction counter. But xenbus_transaction_end() has always counted the
>>> transaction as finished regardless of the response.
>>
>> Which is correct now. Xenstore will free all transaction related
>> data regardless of the response. A once failed transaction can't
>> be repaired, it has to be repeated completely.
>
> So if xenstore frees the transaction why should we keep it in the list
> with pending transaction in xenbus_dev_frontend? That's exactly what
> this patch fixes by always removing it from the list, not only on a
> successful response (See below for the EINVAL case).
Aah, sorry, I seem to have misread my own coding. :-(
Yes, you are right. Sorry for not seeing it before.
>
> [...]
>>> But xenbus_dev_frontend tries to end a transaction on closing of the
>>> device if the XS_TRANSACTION_END failed before. Trying to close the
>>> transaction twice corrupts the reference count. So fix this by also
>>> considering a transaction closed if we have sent XS_TRANSACTION_END once
>>> regardless of the return code.
>>
>> A transaction in the list of transactions should not considered to be
>> finished. Either it is not on the list or it is still pending.
>
> With "considering a transaction closed" I mean "take the code path which
> removes the transaction from the list with pending transactions".
>
> From the follow-up mail:
>>>> The new behavior is that xenbus_dev_request_and_reply() and
>>>> xenbus_transaction_end() will always count the transaction as finished
>>>> regardless the response code (handled in xs_request_exit()).
>>>
>>> ENOENT should not decrement the transaction counter, while all
>>> other responses to XS_TRANSACTION_END should still do so.
>>
>> Sorry, I stand corrected: the ENOENT case should never happen, as this
>> case is tested in xenbus_write_transaction(). It doesn't hurt to test
>> for ENOENT, though.
>>
>> What should be handled is EINVAL: this would happen if a user specified
>> a string different from "T" and "F".
>
> Ok, I will handle those cases in xs_request_exit(). Although I don't
> like that this depends on the internals of xenstore (At least to me it's
> not obvious why it should only return ENOENT or EINVAL in this case).
>
> In the xenbus_write_transaction() case checking the string before
> sending the transaction (like the transaction itself is verified) would
> avoid this problem.
Right. I'd prefer this solution.
Remains the only problem you tried to tackle with your second patch: a
kernel driver going crazy and ending transactions it never started (or
ending them multiple times). The EINVAL case can't happen here, but
ENOENT can. Either ENOENT has to be handled in xs_request_exit() or you
need to keep track of the transactions like in the user interface and
refuse ending an unknown transaction. Or you trust the kernel users.
Trying to fix the usage counter seems to be the wrong approach IMO.
Juergen
On Fri, Mar 02, 2018 at 01:46:47PM +0100, Wolfram Sang wrote:
>
> So, maybe some words why I accepted this patch.
>
> On Fri, Mar 02, 2018 at 11:19:31AM +0000, Russell King - ARM Linux wrote:
> > Note that there have been patches proposed to make platform_get_irq()
> > return an error rather than returning a value of zero, so changing
> > the driver in this way is not a good idea.
>
> I'd much agree to such an approach, yet I didn't see it coming along so
> far for years(?) now.
It needs platform maintainers to be motivated to fix it, and one way to
provide that motivation is for subsystem maintainers to say no to patches
like this. If patches like this get accepted, then the "problem" gets
solved, and there is very little motivation to fix the platform itself.
If you look back at the history of this, the times when platforms have
been fixed is when they have a problem exactly like this, and they're
told to fix their platforms IRQ numbering instead of the patch to "fix"
the driver being accepted.
Why fix the interrupt numbering if patches to "fix" drivers get
accepted?
--
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up
According to speedtest.net: 8.21Mbps down 510kbps up
Interrupt number 0 (returned by platform_get_irq()) might be a valid IRQ
so do not treat it as an error. If interrupt 0 was configured, the driver
would exit the probe early, before finishing initialization, but with
0-exit status.
Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
Cc: stable(a)vger.kernel.org
Fixes: e0d1ec97853f ("i2c-s3c2410: Change IRQ to be plain integer.")
Signed-off-by: Krzysztof Kozlowski <krzk(a)kernel.org>
---
drivers/i2c/busses/i2c-s3c2410.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/i2c/busses/i2c-s3c2410.c b/drivers/i2c/busses/i2c-s3c2410.c
index 5d97510ee48b..783a93404f47 100644
--- a/drivers/i2c/busses/i2c-s3c2410.c
+++ b/drivers/i2c/busses/i2c-s3c2410.c
@@ -1178,7 +1178,7 @@ static int s3c24xx_i2c_probe(struct platform_device *pdev)
*/
if (!(i2c->quirks & QUIRK_POLL)) {
i2c->irq = ret = platform_get_irq(pdev, 0);
- if (ret <= 0) {
+ if (ret < 0) {
dev_err(&pdev->dev, "cannot find IRQ\n");
clk_unprepare(i2c->clk);
return ret;
--
2.7.4
On Fri, Mar 02, 2018 at 11:29:26AM +0100, Wolfram Sang wrote:
> On Thu, Mar 01, 2018 at 09:34:44PM +0100, Krzysztof Kozlowski wrote:
> > Interrupt number 0 (returned by platform_get_irq()) might be a valid IRQ
> > so do not treat it as an error. If interrupt 0 was configured, the driver
> > would exit the probe early, before finishing initialization, but with
> > 0-exit status.
> >
> > Reported-by: Dan Carpenter <dan.carpenter(a)oracle.com>
> > Cc: stable(a)vger.kernel.org
> > Fixes: e0d1ec97853f ("i2c-s3c2410: Change IRQ to be plain integer.")
>
> Please configure git to use 14 digits here.
Wait, when did we decide that 12 wasn't enough?
I just did a `git log | grep Fixes | tee baz | head -n 200` and only on
my git tree tehre were only 2 which used exactly 14 digits. The
standard is 12.
regards,
dan carpenter