Files can be created and mapped in an explicitly mounted hugetlbfs
filesystem. If pages in such files are migrated, the filesystem
usage will not be decremented for the associated pages. This can
result in mmap or page allocation failures as it appears there are
fewer pages in the filesystem than there should be.
For example, a test program which hole punches, faults and migrates
pages in such a file (1G in size) will eventually fail because it
cannot allocate a page. Reported counts and usage at time of failure:
node0
         537	free_hugepages
        1024	nr_hugepages
           0	surplus_hugepages
node1
        1000	free_hugepages
        1024	nr_hugepages
           0	surplus_hugepages

Filesystem      Size  Used Avail Use% Mounted on
nodev           4.0G  4.0G     0 100% /var/opt/hugepool
Note that the filesystem shows 4G of pages used, while actual usage is
511 pages (just under 1G). Failed trying to allocate page 512.
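
The 511 page figure can be cross-checked from the per-node counters. The
following is only a small user-space sketch of that arithmetic, not kernel
code. The 2 MiB huge page size is an assumption (it is not stated in the
report, but it is consistent with 2048 pages showing up as a 4.0G
filesystem), and the struct and function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Per-node counters, named after the values quoted in the report. */
struct node_counts {
	unsigned long nr_hugepages;
	unsigned long free_hugepages;
};

/* ASSUMPTION: 2 MiB huge pages; the report does not state the size. */
#define ASSUMED_HPAGE_MIB 2UL

/* Huge pages in use across all nodes: sum of (nr - free). */
static unsigned long huge_pages_in_use(const struct node_counts *nodes,
				       size_t count)
{
	unsigned long used = 0;
	size_t i;

	for (i = 0; i < count; i++) {
		assert(nodes[i].free_hugepages <= nodes[i].nr_hugepages);
		used += nodes[i].nr_hugepages - nodes[i].free_hugepages;
	}
	return used;
}

/* Convert a huge page count to MiB under the assumed page size. */
static unsigned long huge_pages_to_mib(unsigned long pages)
{
	return pages * ASSUMED_HPAGE_MIB;
}

/* The counters reported at the time of failure. */
static const struct node_counts reported[] = {
	{ .nr_hugepages = 1024, .free_hugepages = 537 },	/* node0 */
	{ .nr_hugepages = 1024, .free_hugepages = 1000 },	/* node1 */
};
```

With the reported counters, huge_pages_in_use(reported, 2) gives
487 + 24 = 511 pages, or 1022 MiB under the assumed page size, while the
filesystem reports all 4096 MiB as used.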
If a hugetlb page is associated with an explicitly mounted filesystem,
this information is contained in the page_private field. At migration
time, this information is not preserved. To fix, simply transfer
page_private from old to new page at migration time if necessary. Also,
migrate_page_states() unconditionally clears page_private and PagePrivate
of the old page. It is unlikely, but possible that these fields could
be non-NULL and are needed at hugetlb free page time. So, do not touch
these fields for hugetlb pages.
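
The accounting leak and the fix can be illustrated with a toy user-space
model. This is a sketch only: toy_subpool and toy_page are hypothetical
stand-ins for the kernel's subpool and struct page, and the model assumes
that a page freed without a subpool pointer simply skips the usage
decrement, which is the behaviour described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins, NOT the kernel's hugepage_subpool / struct page. */
struct toy_subpool {
	long used_pages;		/* filesystem usage counter */
};

struct toy_page {
	struct toy_subpool *private;	/* models page_private */
};

/* Fault in a page: charge the subpool and remember it in the page. */
static void toy_alloc(struct toy_page *page, struct toy_subpool *spool)
{
	spool->used_pages++;
	page->private = spool;
}

/* Free a page: usage is only decremented if the subpool is known. */
static void toy_free(struct toy_page *page)
{
	if (page->private)
		page->private->used_pages--;
	page->private = NULL;
}

/*
 * Migrate old -> new. With transfer != 0 the subpool pointer moves to
 * the new page (the fix). Otherwise it is just cleared on the old page,
 * so the new page no longer knows which subpool it was charged to.
 */
static void toy_migrate(struct toy_page *old, struct toy_page *new,
			int transfer)
{
	if (transfer && old->private && !new->private)
		new->private = old->private;
	old->private = NULL;
}

/* Repeat fault, migrate, hole punch; return the leftover usage. */
static long toy_usage_after_cycles(int cycles, int transfer)
{
	struct toy_subpool spool = { .used_pages = 0 };
	int i;

	for (i = 0; i < cycles; i++) {
		struct toy_page old = { NULL }, new = { NULL };

		toy_alloc(&old, &spool);
		toy_migrate(&old, &new, transfer);
		toy_free(&new);		/* hole punch frees the page */
	}
	return spool.used_pages;
}
```

In this model, toy_usage_after_cycles(n, 0) leaves n pages charged to the
subpool even though none are in use, while toy_usage_after_cycles(n, 1)
returns to zero.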
Cc: <stable(a)vger.kernel.org>
Fixes: 290408d4a250 ("hugetlb: hugepage migration core")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
 fs/hugetlbfs/inode.c | 10 ++++++++++
 mm/migrate.c         | 10 ++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 32920a10100e..fb6de1db8806 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -859,6 +859,16 @@ static int hugetlbfs_migrate_page(struct address_space *mapping,
 	rc = migrate_huge_page_move_mapping(mapping, newpage, page);
 	if (rc != MIGRATEPAGE_SUCCESS)
 		return rc;
+
+	/*
+	 * page_private is subpool pointer in hugetlb pages, transfer
+	 * if needed.
+	 */
+	if (page_private(page) && !page_private(newpage)) {
+		set_page_private(newpage, page_private(page));
+		set_page_private(page, 0);
+	}
+
 	if (mode != MIGRATE_SYNC_NO_COPY)
 		migrate_page_copy(newpage, page);
 	else
diff --git a/mm/migrate.c b/mm/migrate.c
index f7e4bfdc13b7..0d9708803553 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -703,8 +703,14 @@ void migrate_page_states(struct page *newpage, struct page *page)
 	 */
 	if (PageSwapCache(page))
 		ClearPageSwapCache(page);
-	ClearPagePrivate(page);
-	set_page_private(page, 0);
+	/*
+	 * Unlikely, but PagePrivate and page_private could potentially
+	 * contain information needed at hugetlb free page time.
+	 */
+	if (!PageHuge(page)) {
+		ClearPagePrivate(page);
+		set_page_private(page, 0);
+	}
 
 	/*
 	 * If any waiters have accumulated on the new page then
--
2.17.2
This is the start of the stable review cycle for the 4.14.104 release.
There are 71 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed Feb 27 19:50:01 UTC 2019.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.104-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
    Linux 4.14.104-rc1

Matthias Kaehlcke <mka(a)chromium.org>
    sched/sysctl: Fix attributes of some extern declarations

Colin Ian King <colin.king(a)canonical.com>
    phy: tegra: remove redundant self assignment of 'map'

Nathan Chancellor <natechancellor(a)gmail.com>
    pinctrl: max77620: Use define directive for max77620_pinconf_param values

Eli Cooper <elicooper(a)gmx.com>
    netfilter: ipv6: Don't preserve original oif for loopback address

Pablo Neira Ayuso <pablo(a)netfilter.org>
    netfilter: nft_compat: use-after-free when deleting targets

Pablo Neira Ayuso <pablo(a)netfilter.org>
    netfilter: nf_tables: fix flush after rule deletion in the same batch

Hangbin Liu <liuhangbin(a)gmail.com>
    Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"

Willem de Bruijn <willemb(a)google.com>
    net: avoid false positives in untrusted gso validation

Willem de Bruijn <willemb(a)google.com>
    net: validate untrusted gso packets without csum offload

Chris Wilson <chris(a)chris-wilson.co.uk>
    drm/i915/fbdev: Actually configure untiled displays

Alexey Brodkin <abrodkin(a)synopsys.com>
    ARC: define ARCH_SLAB_MINALIGN = 8

Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
    ARC: U-boot: check arguments paranoidly

Eugeniy Paltsev <Eugeniy.Paltsev(a)synopsys.com>
    ARCv2: Enable unaligned access in early ASM code

Dmitry V. Levin <ldv(a)altlinux.org>
    parisc: Fix ptrace syscall number modification

Eric Biggers <ebiggers(a)google.com>
    KEYS: always initialize keyring_index_key::desc_len

Eric Biggers <ebiggers(a)google.com>
    KEYS: user: Align the payload buffer

Bart Van Assche <bvanassche(a)acm.org>
    RDMA/srp: Rework SCSI device reset handling

Konstantin Khlebnikov <khlebnikov(a)yandex-team.ru>
    inet_diag: fix reporting cgroup classid and fallback to priority

Saeed Mahameed <saeedm(a)mellanox.com>
    net/mlx4_en: Force CHECKSUM_NONE for short ethernet frames

Hangbin Liu <liuhangbin(a)gmail.com>
    sit: check if IPv6 enabled before calling ip6_err_gen_icmpv6_unreach()

Cong Wang <xiyou.wangcong(a)gmail.com>
    team: avoid complex list operations in team_nl_cmd_options_set()

Xin Long <lucien.xin(a)gmail.com>
    sctp: call gso_reset_checksum when computing checksum in sctp_gso_segment

Russell King <rmk+kernel(a)armlinux.org.uk>
    net: sfp: do not probe SFP module before we're attached

Kal Conley <kal.conley(a)dectris.com>
    net/packet: fix 4gb buffer limit due to overflow check

Tonghao Zhang <xiangxia.m.yue(a)gmail.com>
    net/mlx5e: Don't overwrite pedit action when multiple pedit used

Li RongQing <lirongqing(a)baidu.com>
    ipv6: propagate genlmsg_reply return code

Eric Dumazet <edumazet(a)google.com>
    batman-adv: fix uninit-value in batadv_interface_tx()

Nathan Chancellor <natechancellor(a)gmail.com>
    isdn: avm: Fix string plus integer warning from Clang

Tariq Toukan <tariqt(a)mellanox.com>
    net/mlx5e: Fix wrong (zero) TX drop counter indication for representor

Ido Schimmel <idosch(a)mellanox.com>
    mlxsw: spectrum_switchdev: Do not treat static FDB entries as sticky

Peter Oskolkov <posk(a)google.com>
    bpf: bpf_setsockopt: reset sock dst on SO_MARK changes

Kangjie Lu <kjlu(a)umn.edu>
    leds: lp5523: fix a missing check of return value of lp55xx_read

Cheng-Min Ao <tony_ao(a)wiwynn.com>
    hwmon: (tmp421) Correct the misspelling of the tmp442 compatible attribute in OF device ID table

Colin Ian King <colin.king(a)canonical.com>
    atm: he: fix sign-extension overflow on large shift

Julia Lawall <Julia.Lawall(a)lip6.fr>
    drm/meson: add missing of_node_put

Talons Lee <xin.li(a)citrix.com>
    always clear the X2APIC_ENABLE bit for PV guest

Manish Rangankar <mrangankar(a)marvell.com>
    scsi: qedi: Add ep_state for login completion on un-reachable targets

Stanley Chu <stanley.chu(a)mediatek.com>
    scsi: ufs: Fix system suspend status

Jia-Ju Bai <baijiaju1990(a)gmail.com>
    isdn: i4l: isdn_tty: Fix some concurrency double-free bugs

Jose Abreu <jose.abreu(a)synopsys.com>
    net: stmmac: Fix PCI module removal leak

Yuchung Cheng <ycheng(a)google.com>
    bpf: correctly set initial window on active Fast Open sender

Thomas Bogendoerfer <tbogendoerfer(a)suse.de>
    MIPS: jazz: fix 64bit build

Logan Gunthorpe <logang(a)deltatee.com>
    scsi: isci: initialize shost fully before calling scsi_add_host()

YueHaibing <yuehaibing(a)huawei.com>
    scsi: qla4xxx: check return code of qla4xxx_copy_from_fwddb_param

Taehee Yoo <ap420073(a)gmail.com>
    netfilter: nf_tables: fix leaking object reference count

Alban Bedel <albeu(a)free.fr>
    MIPS: ath79: Enable OF serial ports in the default config

Yonglong Liu <liuyonglong(a)huawei.com>
    net: hns: Fix use after free identified by SLUB debug

Denis Bolotin <dbolotin(a)marvell.com>
    qed: Fix qed_ll2_post_rx_buffer_notify_fw() by adding a write memory barrier

Denis Bolotin <dbolotin(a)marvell.com>
    qed: Fix qed_chain_set_prod() for PBL chains with non power of 2 page count

YueHaibing <yuehaibing(a)huawei.com>
    xen/pvcalls: remove set but not used variable 'intf'

Kangjie Lu <kjlu(a)umn.edu>
    mfd: mc13xxx: Fix a missing check of a register-read failure

Keerthy <j-keerthy(a)ti.com>
    mfd: tps65218: Use devm_regmap_add_irq_chip and clean up error path in probe()

Charles Keepax <ckeepax(a)opensource.cirrus.com>
    mfd: wm5110: Add missing ASRC rate register

Jonathan Marek <jonathan(a)marek.ca>
    mfd: qcom_rpm: write fw_version to CTRL_REG

Dien Pham <dien.pham.ry(a)renesas.com>
    mfd: bd9571mwv: Add volatile register to make DVFS work

Dan Carpenter <dan.carpenter(a)oracle.com>
    mfd: ab8500-core: Return zero in get_register_interruptible()

Nicolas Boichat <drinkcat(a)chromium.org>
    mfd: mt6397: Do not call irq_domain_remove if PMIC unsupported

Nathan Chancellor <natechancellor(a)gmail.com>
    mfd: db8500-prcmu: Fix some section annotations

Nathan Chancellor <natechancellor(a)gmail.com>
    mfd: twl-core: Fix section annotations on {,un}protect_pm_master

Stefano Stabellini <sstabellini(a)kernel.org>
    pvcalls-back: set -ENOTCONN in pvcalls_conn_back_read

Vignesh R <vigneshr(a)ti.com>
    mfd: ti_am335x_tscadc: Use PLATFORM_DEVID_AUTO while registering mfd cells

Eric Biggers <ebiggers(a)google.com>
    KEYS: allow reaching the keys quotas exactly

Michal Hocko <mhocko(a)suse.com>
    proc, oom: do not report alien mms when setting oom_score_adj

Ralph Campbell <rcampbell(a)nvidia.com>
    numa: change get_mempolicy() to use nr_node_ids instead of MAX_NUMNODES

Yan, Zheng <zyan(a)redhat.com>
    ceph: avoid repeatedly adding inode to mdsc->snap_flush_list

Ilya Dryomov <idryomov(a)gmail.com>
    libceph: handle an empty authorize reply

Herbert Xu <herbert(a)gondor.apana.org.au>
    mac80211: Free mpath object when rhashtable insertion fails

Rakesh Pillai <pillair(a)codeaurora.org>
    mac80211: Restore vif beacon interval if start ap fails

Paul Burton <paul.burton(a)mips.com>
    MIPS: eBPF: Always return sign extended 32b values

Quentin Perret <quentin.perret(a)arm.com>
    tracing: Fix number of entries in trace header

Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
    ARM: 8834/1: Fix: kprobes: optimized kprobes illegal instruction

-------------
Diffstat:
Makefile | 4 +-
arch/arc/include/asm/cache.h | 11 +++
arch/arc/kernel/head.S | 14 +++-
arch/arc/kernel/setup.c | 87 +++++++++++++++-------
arch/arm/probes/kprobes/opt-arm.c | 2 +-
arch/mips/configs/ath79_defconfig | 1 +
arch/mips/jazz/jazzdma.c | 5 +-
arch/mips/net/ebpf_jit.c | 9 ++-
arch/parisc/kernel/ptrace.c | 29 ++++++--
arch/x86/xen/enlighten_pv.c | 5 +-
drivers/atm/he.c | 2 +-
drivers/gpu/drm/i915/intel_fbdev.c | 12 +--
drivers/gpu/drm/meson/meson_drv.c | 9 ++-
drivers/hwmon/tmp421.c | 2 +-
drivers/infiniband/ulp/srp/ib_srp.c | 10 ---
drivers/isdn/hardware/avm/b1.c | 2 +-
drivers/isdn/i4l/isdn_tty.c | 6 +-
drivers/leds/leds-lp5523.c | 4 +-
drivers/mfd/ab8500-core.c | 2 +-
drivers/mfd/bd9571mwv.c | 1 +
drivers/mfd/db8500-prcmu.c | 4 +-
drivers/mfd/mc13xxx-core.c | 4 +-
drivers/mfd/mt6397-core.c | 3 +-
drivers/mfd/qcom_rpm.c | 4 +
drivers/mfd/ti_am335x_tscadc.c | 5 +-
drivers/mfd/tps65218.c | 24 +-----
drivers/mfd/twl-core.c | 4 +-
drivers/mfd/wm5110-tables.c | 2 +
drivers/net/ethernet/hisilicon/hns/hns_ae_adapt.c | 6 +-
drivers/net/ethernet/mellanox/mlx4/en_rx.c | 23 +++++-
drivers/net/ethernet/mellanox/mlx5/core/en_rep.c | 1 +
drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 25 ++++---
.../ethernet/mellanox/mlxsw/spectrum_switchdev.c | 12 +--
drivers/net/ethernet/qlogic/qed/qed_ll2.c | 4 +
drivers/net/ethernet/stmicro/stmmac/stmmac_pci.c | 10 +++
drivers/net/phy/sfp-bus.c | 2 +
drivers/net/phy/sfp.c | 30 +++++---
drivers/net/phy/sfp.h | 2 +
drivers/net/team/team.c | 27 ++-----
drivers/phy/tegra/xusb.c | 2 +-
drivers/pinctrl/pinctrl-max77620.c | 14 ++--
drivers/scsi/isci/init.c | 14 ++--
drivers/scsi/qedi/qedi_iscsi.c | 3 +
drivers/scsi/qedi/qedi_iscsi.h | 1 +
drivers/scsi/qla4xxx/ql4_os.c | 2 +
drivers/scsi/ufs/ufshcd.c | 2 +
drivers/xen/pvcalls-back.c | 9 +--
fs/ceph/snap.c | 3 +-
fs/proc/base.c | 4 -
include/keys/user-type.h | 2 +-
include/linux/qed/qed_chain.h | 31 ++++++++
include/linux/sched/sysctl.h | 6 +-
include/linux/skbuff.h | 2 +-
include/linux/virtio_net.h | 19 +++++
include/uapi/linux/inet_diag.h | 16 ++--
kernel/trace/trace.c | 2 +
mm/mempolicy.c | 6 +-
net/batman-adv/soft-interface.c | 2 +
net/bridge/br_multicast.c | 9 +--
net/ceph/messenger.c | 15 ++--
net/core/filter.c | 7 +-
net/ipv4/inet_diag.c | 10 ++-
net/ipv6/netfilter.c | 4 +-
net/ipv6/seg6.c | 4 +-
net/ipv6/sit.c | 3 +-
net/mac80211/cfg.c | 6 +-
net/mac80211/mesh_pathtbl.c | 17 +++--
net/netfilter/nf_tables_api.c | 5 ++
net/netfilter/nft_compat.c | 3 +-
net/packet/af_packet.c | 2 +-
net/sctp/offload.c | 1 +
net/sctp/sctp_diag.c | 1 +
security/keys/key.c | 4 +-
security/keys/keyring.c | 4 +-
security/keys/proc.c | 3 +-
security/keys/request_key.c | 1 +
security/keys/request_key_auth.c | 2 +-
77 files changed, 417 insertions(+), 233 deletions(-)
This is a note to let you know that I've just added the patch titled
mei: bus: move hw module get/put to probe/release
to my char-misc git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc.git
in the char-misc-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.

From b5958faa34e2f99f3475ad89c52d98dfea079d33 Mon Sep 17 00:00:00 2001
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Date: Mon, 25 Feb 2019 11:09:28 +0200
Subject: mei: bus: move hw module get/put to probe/release

Fix unbalanced module reference counting during internal reset, which
prevents the drivers from unloading.

Tracking mei_me/txe modules on the mei client bus via
mei_cldev_enable/disable is error prone because of a possible internal
reset flow, in which clients are disconnected underneath the bus driver.

Moving the reference counting to probe and release of the mei bus client
driver solves this issue in the simplest way, as each client provides only
a single connection to a client bus driver.
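
The imbalance can be illustrated with a toy user-space model. This is a
sketch under assumptions, not the driver code: toy_client and all toy_*
functions are hypothetical, and the model assumes that the old disable path
returns early (skipping the put) when the client was already disconnected
by an internal reset.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy client state, NOT struct mei_cl_device. */
struct toy_client {
	int hw_module_refs;	/* references held on the hw module */
	bool connected;
};

/* An internal reset disconnects the client behind the driver's back. */
static void toy_internal_reset(struct toy_client *c)
{
	c->connected = false;
}

/* Old scheme: the module reference is taken on enable ... */
static void toy_old_enable(struct toy_client *c)
{
	if (c->connected)
		return;
	c->hw_module_refs++;
	c->connected = true;
}

/* ... and dropped on disable, but only if still connected (assumed). */
static void toy_old_disable(struct toy_client *c)
{
	if (!c->connected)
		return;		/* already disconnected: put is skipped */
	c->connected = false;
	c->hw_module_refs--;
}

/* References left over after enable, optional reset, disable. */
static int toy_old_leftover_refs(bool reset_between)
{
	struct toy_client c = { 0, false };

	toy_old_enable(&c);
	if (reset_between)
		toy_internal_reset(&c);
	toy_old_disable(&c);
	return c.hw_module_refs;
}

/* New scheme: get in probe, put in remove, independent of connection. */
static int toy_new_leftover_refs(bool reset_between)
{
	struct toy_client c = { 0, false };

	c.hw_module_refs++;		/* probe */
	c.connected = true;		/* enable only connects */
	if (reset_between)
		toy_internal_reset(&c);
	c.connected = false;		/* disable only disconnects */
	c.hw_module_refs--;		/* remove */
	return c.hw_module_refs;
}
```

In this model the old scheme leaves one reference behind whenever a reset
happens between enable and disable, while the probe/remove pairing is
balanced in both cases.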
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/misc/mei/bus.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index e5456faf00e6..65bec998eb6e 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -540,17 +540,9 @@ int mei_cldev_enable(struct mei_cl_device *cldev)
 		goto out;
 	}
 
-	if (!mei_cl_bus_module_get(cldev)) {
-		dev_err(&cldev->dev, "get hw module failed");
-		ret = -ENODEV;
-		goto out;
-	}
-
 	ret = mei_cl_connect(cl, cldev->me_cl, NULL);
-	if (ret < 0) {
+	if (ret < 0)
 		dev_err(&cldev->dev, "cannot connect\n");
-		mei_cl_bus_module_put(cldev);
-	}
 
 out:
 	mutex_unlock(&bus->device_lock);
@@ -613,7 +605,6 @@ int mei_cldev_disable(struct mei_cl_device *cldev)
 	if (err < 0)
 		dev_err(bus->dev, "Could not disconnect from the ME client\n");
 
-	mei_cl_bus_module_put(cldev);
 out:
 	/* Flush queues and remove any pending read */
 	mei_cl_flush_queues(cl, NULL);
@@ -724,9 +715,16 @@ static int mei_cl_device_probe(struct device *dev)
 	if (!id)
 		return -ENODEV;
 
+	if (!mei_cl_bus_module_get(cldev)) {
+		dev_err(&cldev->dev, "get hw module failed");
+		return -ENODEV;
+	}
+
 	ret = cldrv->probe(cldev, id);
-	if (ret)
+	if (ret) {
+		mei_cl_bus_module_put(cldev);
 		return ret;
+	}
 
 	__module_get(THIS_MODULE);
 	return 0;
@@ -754,6 +752,7 @@ static int mei_cl_device_remove(struct device *dev)
 
 	mei_cldev_unregister_callbacks(cldev);
 
+	mei_cl_bus_module_put(cldev);
 	module_put(THIS_MODULE);
 	dev->driver = NULL;
 	return ret;
--
2.21.0