From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix hugetlb cgroup refcounting during vma split
Guillaume Morin reported hitting the following WARNING followed by a GPF or
NULL pointer dereference, either in cgroups_destroy or in the kill_css path:
percpu ref (css_release) <= 0 (-1) after switching to atomic
WARNING: CPU: 23 PID: 130 at lib/percpu-refcount.c:196 percpu_ref_switch_to_atomic_rcu+0x127/0x130
CPU: 23 PID: 130 Comm: ksoftirqd/23 Kdump: loaded Tainted: G O 5.10.60 #1
RIP: 0010:percpu_ref_switch_to_atomic_rcu+0x127/0x130
Call Trace:
rcu_core+0x30f/0x530
rcu_core_si+0xe/0x10
__do_softirq+0x103/0x2a2
? sort_range+0x30/0x30
run_ksoftirqd+0x2b/0x40
smpboot_thread_fn+0x11a/0x170
kthread+0x10a/0x140
? kthread_create_worker_on_cpu+0x70/0x70
ret_from_fork+0x22/0x30
Upon further examination, it was discovered that the css structure was
associated with hugetlb reservations.
For private hugetlb mappings the vma points to a reserve map that contains
a pointer to the css. At mmap time, reservations are set up and a
reference to the css is taken. This reference is dropped in the vma close
operation, hugetlb_vm_op_close. However, if a vma is split, no additional
reference to the css is taken, yet hugetlb_vm_op_close will be called twice
for the split vma, resulting in an underflow.
Fix by taking another reference in hugetlb_vm_op_open. Note that the
reference is only taken for the owner of the reserve map. In the more
common fork case, the pointer to the reserve map is cleared for non-owning
vmas.
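The imbalance described above can be modelled in user space. What follows is a minimal sketch, not kernel code: the toy_* types and functions are hypothetical stand-ins for the css, the reserve map and the vma, and the counter starts at 0 only so that the underflow is visible as -1.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; names and layout are hypothetical. */
struct toy_css      { int refs; };
struct toy_resv_map { struct toy_css *css; };
struct toy_vma      { struct toy_resv_map *resv; bool owner; };

/* mmap time: the reservation is set up and one css reference is taken. */
static void toy_mmap(struct toy_vma *vma, struct toy_resv_map *resv)
{
	vma->resv = resv;
	vma->owner = true;
	if (resv->css)
		resv->css->refs++;
}

/* Open callback, run for the new vma created by a split. */
static void toy_open(struct toy_vma *vma, bool with_fix)
{
	/* With the fix, the reserve map owner takes one more css reference. */
	if (with_fix && vma->resv && vma->owner && vma->resv->css)
		vma->resv->css->refs++;
}

/* Close callback: every owning vma drops one css reference. */
static void toy_close(struct toy_vma *vma)
{
	if (vma->resv && vma->owner && vma->resv->css)
		vma->resv->css->refs--;
}

/* Map once, split once, close both halves; return the final count. */
static int toy_split_then_close(bool with_fix)
{
	struct toy_css css = { .refs = 0 };
	struct toy_resv_map resv = { .css = &css };
	struct toy_vma first, second;

	toy_mmap(&first, &resv);
	second = first;		/* the split copies the pointer and owner flag */
	toy_open(&second, with_fix);
	toy_close(&first);
	toy_close(&second);
	return css.refs;
}
```

Under these assumptions, toy_split_then_close(false) ends at -1 (one get, two puts), while toy_split_then_close(true) ends at 0 because the open callback balances the second close.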
Link: https://lkml.kernel.org/r/20210830215015.155224-1-mike.kravetz@oracle.com
Fixes: e9fe92ae0cd2 ("hugetlb_cgroup: add reservation accounting for private mappings")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: Guillaume Morin <guillaume(a)morinfr.org>
Suggested-by: Guillaume Morin <guillaume(a)morinfr.org>
Tested-by: Guillaume Morin <guillaume(a)morinfr.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/hugetlb_cgroup.h | 12 ++++++++++++
mm/hugetlb.c | 4 +++-
2 files changed, 15 insertions(+), 1 deletion(-)
--- a/include/linux/hugetlb_cgroup.h~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/include/linux/hugetlb_cgroup.h
@@ -121,6 +121,13 @@ static inline void hugetlb_cgroup_put_rs
 	css_put(&h_cg->css);
 }
 
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+						struct resv_map *resv_map)
+{
+	if (resv_map->css)
+		css_get(resv_map->css);
+}
+
 extern int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
 					struct hugetlb_cgroup **ptr);
 extern int hugetlb_cgroup_charge_cgroup_rsvd(int idx, unsigned long nr_pages,
@@ -199,6 +206,11 @@ static inline void hugetlb_cgroup_put_rs
 {
 }
 
+static inline void resv_map_dup_hugetlb_cgroup_uncharge_info(
+						struct resv_map *resv_map)
+{
+}
+
 static inline int hugetlb_cgroup_charge_cgroup(int idx, unsigned long nr_pages,
 					       struct hugetlb_cgroup **ptr)
 {
--- a/mm/hugetlb.c~hugetlb-fix-hugetlb-cgroup-refcounting-during-vma-split
+++ a/mm/hugetlb.c
@@ -4106,8 +4106,10 @@ static void hugetlb_vm_op_open(struct vm
 	 * after this open call completes. It is therefore safe to take a
 	 * new reference here without additional locking.
 	 */
-	if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
+	if (resv && is_vma_resv_set(vma, HPAGE_RESV_OWNER)) {
+		resv_map_dup_hugetlb_cgroup_uncharge_info(resv);
 		kref_get(&resv->refs);
+	}
 }
 
 static void hugetlb_vm_op_close(struct vm_area_struct *vma)
_
From: Michael Wang <yun.wang(a)linux.alibaba.com>
Subject: mm: fix panic caused by __page_handle_poison()
Commit 510d25c92ec4 ("mm/hwpoison: disable pcp for
page_handle_poison()") introduced __page_handle_poison(). If we
define:
RET_A = dissolve_free_huge_page();
RET_B = take_page_off_buddy();
then __page_handle_poison() was supposed to return TRUE only when RET_A == 0
&& RET_B == TRUE.
However, it failed to handle the case where RET_A is -EBUSY or -ENOMEM. It
simply returned ret as a bool, which turns any nonzero value into TRUE, and
this broke the original logic.
As a result, a huge page that was still on the free list was treated as
poisoned, which led to the following panic:
kernel BUG at mm/internal.h:95!
invalid opcode: 0000 [#1] SMP PTI
skip...
RIP: 0010:set_page_refcounted mm/internal.h:95 [inline]
RIP: 0010:remove_hugetlb_page+0x23c/0x240 mm/hugetlb.c:1371
skip...
Call Trace:
remove_pool_huge_page+0xe4/0x110 mm/hugetlb.c:1892
return_unused_surplus_pages+0x8d/0x150 mm/hugetlb.c:2272
hugetlb_acct_memory.part.91+0x524/0x690 mm/hugetlb.c:4017
This patch replaces 'bool' with 'int' to handle RET_A correctly.
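The effect of the type change can be reproduced outside the kernel. The sketch below is an illustration only: the two helper functions are hypothetical, take RET_A and RET_B as plain parameters instead of calling dissolve_free_huge_page() and take_page_off_buddy(), and assume that RET_B is only consulted when RET_A is 0, as the description above implies.

```c
#include <errno.h>
#include <stdbool.h>

/* Old behaviour: the result is kept in a bool, so -EBUSY becomes true. */
static bool handle_poison_old(int ret_a, bool ret_b)
{
	bool ret;

	ret = ret_a;		/* any nonzero value, negative or not, is true */
	if (!ret)
		ret = ret_b;
	return ret;
}

/* New behaviour: keep the int and report success only for a positive value. */
static bool handle_poison_new(int ret_a, bool ret_b)
{
	int ret;

	ret = ret_a;		/* 0 on success, negative errno on failure */
	if (!ret)
		ret = ret_b;	/* true converts to 1, false to 0 */
	return ret > 0;
}
```

With these helpers, handle_poison_old(-EBUSY, false) returns true even though nothing succeeded, whereas handle_poison_new() returns true only for the (0, true) combination.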
Link: https://lkml.kernel.org/r/61782ac6-1e8a-4f6f-35e6-e94fce3b37f5@linux.alibab…
Fixes: 510d25c92ec4 ("mm/hwpoison: disable pcp for page_handle_poison()")
Signed-off-by: Michael Wang <yun.wang(a)linux.alibaba.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Reported-by: Abaci <abaci(a)linux.alibaba.com>
Cc: <stable(a)vger.kernel.org> [5.14+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memory-failure.c~mm-fix-panic-caused-by-__page_handle_poison
+++ a/mm/memory-failure.c
@@ -68,7 +68,7 @@ atomic_long_t num_poisoned_pages __read_
 
 static bool __page_handle_poison(struct page *page)
 {
-	bool ret;
+	int ret;
 
 	zone_pcp_disable(page_zone(page));
 	ret = dissolve_free_huge_page(page);
@@ -76,7 +76,7 @@ static bool __page_handle_poison(struct
 		ret = take_page_off_buddy(page);
 	zone_pcp_enable(page_zone(page));
 
-	return ret;
+	return ret > 0;
 }
 
 static bool page_handle_poison(struct page *page, bool hugepage_or_freepage, bool release)
_
This is the start of the stable review cycle for the 4.19.206 release.
There are 33 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.206-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.206-rc1
Peter Collingbourne <pcc(a)google.com>
net: don't unconditionally copy_from_user a struct ifreq for socket ioctls
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Sean Christopherson <seanjc(a)google.com>
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Gerd Rausch <gerd.rausch(a)oracle.com>
net/rds: dma_map_sg is entitled to merge entries
Ben Skeggs <bskeggs(a)redhat.com>
drm/nouveau/disp: power down unused DP links during init
Mark Yacoub <markyacoub(a)google.com>
drm: Copy drm_wait_vblank to user before returning
Shai Malin <smalin(a)marvell.com>
qed: Fix null-pointer dereference in qed_rdma_create_qp()
Shai Malin <smalin(a)marvell.com>
qed: qed ll2 race condition fixes
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio_pci: Support surprise removal of virtio pci device
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
opp: remove WARN when no valid OPPs remain
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: u_audio: fix race condition on endpoint stop
Guangbin Huang <huangguangbin2(a)huawei.com>
net: hns3: fix get wrong pfc_en when query PFC configuration
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
Shreyansh Chouhan <chouhan.shreyansh630(a)gmail.com>
ip_gre: add validation for csum_start
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li <islituo(a)gmail.com>
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Wesley Cheng <wcheng(a)codeaurora.org>
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Kefeng Wang <wangkefeng.wang(a)huawei.com>
once: Fix panic when module unload
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: collect all entries in one cycle
Guenter Roeck <linux(a)roeck-us.net>
ARC: Fix CONFIG_STACKDEPOT
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix truncation handling for mod32 dst reg wrt zero
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix 32 bit src register truncation on div/mod
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Do not use ax register in interpreter on div/mod
Xiaolong Huang <butterflyhuangxx(a)gmail.com>
net: qrtr: fix another OOB Read in qrtr_endpoint_post
-------------
Diffstat:
Makefile | 4 +-
arch/arc/kernel/vmlinux.lds.S | 2 +
arch/x86/kvm/mmu.c | 11 +++-
drivers/block/floppy.c | 27 ++++----
drivers/gpu/drm/drm_ioc32.c | 4 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 +
drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 +++
drivers/infiniband/hw/hfi1/sdma.c | 9 ++-
drivers/net/can/usb/esd_usb2.c | 4 +-
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +-
.../net/ethernet/hisilicon/hns3/hns3pf/hclge_dcb.c | 13 +---
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 ++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/net/ethernet/qlogic/qed/qed_ll2.c | 20 ++++++
drivers/net/ethernet/qlogic/qed/qed_rdma.c | 3 +-
drivers/opp/of.c | 5 +-
drivers/tty/vt/vt_ioctl.c | 11 ++--
drivers/usb/dwc3/gadget.c | 23 ++++---
drivers/usb/gadget/function/u_audio.c | 5 +-
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 +
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++
drivers/virtio/virtio_pci_common.c | 7 +++
drivers/virtio/virtio_ring.c | 6 +-
include/linux/filter.h | 24 ++++++++
include/linux/netdevice.h | 4 ++
include/linux/once.h | 4 +-
kernel/bpf/core.c | 32 +++++-----
kernel/bpf/verifier.c | 27 ++++----
lib/once.c | 11 +++-
net/ipv4/ip_gre.c | 2 +
net/netfilter/nf_conntrack_core.c | 71 +++++++---------------
net/qrtr/qrtr.c | 2 +-
net/rds/ib_frmr.c | 4 +-
net/socket.c | 6 +-
38 files changed, 228 insertions(+), 157 deletions(-)
This is the start of the stable review cycle for the 4.14.246 release.
There are 23 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.246-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.14.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.14.246-rc1
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
Lai Jiangshan <laijs(a)linux.alibaba.com>
KVM: X86: MMU: Use the correct inherited permissions to get shadow page
Sean Christopherson <seanjc(a)google.com>
KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP shadow MMUs
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Gerd Rausch <gerd.rausch(a)oracle.com>
net/rds: dma_map_sg is entitled to merge entries
Ben Skeggs <bskeggs(a)redhat.com>
drm/nouveau/disp: power down unused DP links during init
Mark Yacoub <markyacoub(a)google.com>
drm: Copy drm_wait_vblank to user before returning
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Michał Mirosław <mirq-linux(a)rere.qmqm.pl>
opp: remove WARN when no valid OPPs remain
Jerome Brunet <jbrunet(a)baylibre.com>
usb: gadget: u_audio: fix race condition on endpoint stop
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
xgene-v2: Fix a resource leak in the error handling path of 'xge_probe()'
Shreyansh Chouhan <chouhan.shreyansh630(a)gmail.com>
ip_gre: add validation for csum_start
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Tuo Li <islituo(a)gmail.com>
IB/hfi1: Fix possible null-pointer dereference in _extend_sdma_tx_descs()
Wesley Cheng <wcheng(a)codeaurora.org>
usb: dwc3: gadget: Stop EP0 transfers during pullup disable
Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
usb: dwc3: gadget: Fix dwc3_calc_trbs_left()
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
Guenter Roeck <linux(a)roeck-us.net>
ARC: Fix CONFIG_STACKDEPOT
-------------
Diffstat:
Documentation/virtual/kvm/mmu.txt | 4 ++--
Makefile | 4 ++--
arch/arc/kernel/vmlinux.lds.S | 2 ++
arch/x86/kvm/mmu.c | 11 +++++++++-
arch/x86/kvm/paging_tmpl.h | 14 ++++++++-----
drivers/base/power/opp/of.c | 5 +++--
drivers/block/floppy.c | 27 ++++++++++++-------------
drivers/gpu/drm/drm_ioc32.c | 4 +---
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.c | 2 +-
drivers/gpu/drm/nouveau/nvkm/engine/disp/dp.h | 1 +
drivers/gpu/drm/nouveau/nvkm/engine/disp/outp.c | 9 +++++++++
drivers/infiniband/hw/hfi1/sdma.c | 9 ++++-----
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/apm/xgene-v2/main.c | 4 +++-
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 ++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 ++++++----
drivers/usb/dwc3/gadget.c | 23 ++++++++++-----------
drivers/usb/gadget/function/u_audio.c | 5 ++---
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
net/ipv4/ip_gre.c | 2 ++
net/rds/ib_frmr.c | 4 ++--
27 files changed, 114 insertions(+), 65 deletions(-)
This is the start of the stable review cycle for the 4.4.283 release.
There are 10 patches in this series, all of which will be posted as a
response to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 03 Sep 2021 12:22:41 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.283-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.283-rc1
Denis Efremov <efremov(a)linux.com>
Revert "floppy: reintroduce O_NDELAY fix"
George Kennedy <george.kennedy(a)oracle.com>
fbmem: add margin check to fb_check_caps()
Linus Torvalds <torvalds(a)linux-foundation.org>
vt_kdsetmode: extend console locking
Neeraj Upadhyay <neeraju(a)codeaurora.org>
vringh: Use wiov->used to check for read/write desc order
Parav Pandit <parav(a)nvidia.com>
virtio: Improve vq->broken access to avoid any compiler optimization
Maxim Kiselev <bigunclemax(a)gmail.com>
net: marvell: fix MVNETA_TX_IN_PRGRS bit number
Sasha Neftin <sasha.neftin(a)intel.com>
e1000e: Fix the max snoop/no-snoop latency for 10M
Zhengjun Zhang <zhangzhengjun(a)aicrobo.com>
USB: serial: option: add new VID/PID to support Fibocom FG150
Johan Hovold <johan(a)kernel.org>
Revert "USB: serial: ch341: fix character loss at high transfer rates"
Stefan Mätje <stefan.maetje(a)esd.eu>
can: usb: esd_usb2: esd_usb2_rx_event(): fix the interchange of the CAN RX and TX error counters
-------------
Diffstat:
Makefile | 4 ++--
drivers/block/floppy.c | 27 +++++++++++++--------------
drivers/net/can/usb/esd_usb2.c | 4 ++--
drivers/net/ethernet/intel/e1000e/ich8lan.c | 14 +++++++++++++-
drivers/net/ethernet/intel/e1000e/ich8lan.h | 3 +++
drivers/net/ethernet/marvell/mvneta.c | 2 +-
drivers/tty/vt/vt_ioctl.c | 11 +++++++----
drivers/usb/serial/ch341.c | 1 -
drivers/usb/serial/option.c | 2 ++
drivers/vhost/vringh.c | 2 +-
drivers/video/fbdev/core/fbmem.c | 4 ++++
drivers/virtio/virtio_ring.c | 6 ++++--
12 files changed, 52 insertions(+), 28 deletions(-)