Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reported-by: Donald Buczek <buczek(a)molgen.mpg.de>
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
block/blk-cgroup.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..ecb4eaff6817 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1975,10 +1975,9 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
if (src->bi_blkg) {
- if (dst->bi_blkg)
- blkg_put(dst->bi_blkg);
- blkg_get(src->bi_blkg);
- dst->bi_blkg = src->bi_blkg;
+ rcu_read_lock();
+ bio_associate_blkg_from_css(dst, bio_blkcg_css(src));
+ rcu_read_unlock();
}
}
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
--
2.35.3
The quilt patch titled
Subject: x86/kexec: fix memory leak of elf header buffer
has been removed from the -mm tree. Its filename was
x86-kexec-fix-memory-leak-of-elf-header-buffer.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Baoquan He <bhe(a)redhat.com>
Subject: x86/kexec: fix memory leak of elf header buffer
Date: Wed, 23 Feb 2022 19:32:24 +0800
This is reported by kmemleak detector:
unreferenced object 0xffffc900002a9000 (size 4096):
comm "kexec", pid 14950, jiffies 4295110793 (age 373.951s)
hex dump (first 32 bytes):
7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............
04 00 3e 00 01 00 00 00 00 00 00 00 00 00 00 00 ..>.............
backtrace:
[<0000000016a8ef9f>] __vmalloc_node_range+0x101/0x170
[<000000002b66b6c0>] __vmalloc_node+0xb4/0x160
[<00000000ad40107d>] crash_prepare_elf64_headers+0x8e/0xcd0
[<0000000019afff23>] crash_load_segments+0x260/0x470
[<0000000019ebe95c>] bzImage64_load+0x814/0xad0
[<0000000093e16b05>] arch_kexec_kernel_image_load+0x1be/0x2a0
[<000000009ef2fc88>] kimage_file_alloc_init+0x2ec/0x5a0
[<0000000038f5a97a>] __do_sys_kexec_file_load+0x28d/0x530
[<0000000087c19992>] do_syscall_64+0x3b/0x90
[<0000000066e063a4>] entry_SYSCALL_64_after_hwframe+0x44/0xae
In crash_prepare_elf64_headers(), a buffer is allocated via vmalloc() to
store elf headers. While it's not freed back to system correctly when
kdump kernel is reloaded or unloaded. Then memory leak is caused. Fix it
by introducing x86 specific function arch_kimage_file_post_load_cleanup(),
and freeing the buffer there.
And also remove the incorrect elf header buffer freeing code. Before
calling arch specific kexec_file loading function, the image instance has
been initialized. So 'image->elf_headers' must be NULL. It doesn't make
sense to free the elf header buffer in the place.
Three different people have reported three bugs about the memory leak on
x86_64 inside Redhat.
Link: https://lkml.kernel.org/r/20220223113225.63106-2-bhe@redhat.com
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Acked-by: Dave Young <dyoung(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/kernel/machine_kexec_64.c | 12 +++++++++---
1 file changed, 9 insertions(+), 3 deletions(-)
--- a/arch/x86/kernel/machine_kexec_64.c~x86-kexec-fix-memory-leak-of-elf-header-buffer
+++ a/arch/x86/kernel/machine_kexec_64.c
@@ -376,9 +376,6 @@ void machine_kexec(struct kimage *image)
#ifdef CONFIG_KEXEC_FILE
void *arch_kexec_kernel_image_load(struct kimage *image)
{
- vfree(image->elf_headers);
- image->elf_headers = NULL;
-
if (!image->fops || !image->fops->load)
return ERR_PTR(-ENOEXEC);
@@ -514,6 +511,15 @@ overflow:
(int)ELF64_R_TYPE(rel[i].r_info), value);
return -ENOEXEC;
}
+
+int arch_kimage_file_post_load_cleanup(struct kimage *image)
+{
+ vfree(image->elf_headers);
+ image->elf_headers = NULL;
+ image->elf_headers_sz = 0;
+
+ return kexec_image_post_load_cleanup_default(image);
+}
#endif /* CONFIG_KEXEC_FILE */
static int
_
Patches currently in -mm which might be from bhe(a)redhat.com are
The quilt patch titled
Subject: mm/memremap: fix missing call to untrack_pfn() in pagemap_range()
has been removed from the -mm tree. Its filename was
mm-memremap-fix-missing-call-to-untrack_pfn-in-pagemap_range.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/memremap: fix missing call to untrack_pfn() in pagemap_range()
Date: Tue, 31 May 2022 20:26:43 +0800
We forget to call untrack_pfn() to pair with track_pfn_remap() when range
is not allowed to hotplug. Fix it by jump err_kasan.
Link: https://lkml.kernel.org/r/20220531122643.25249-1-linmiaohe@huawei.com
Fixes: bca3feaa0764 ("mm/memory_hotplug: prevalidate the address range being added with platform")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Muchun Song <songmuchun(a)bytedance.com>
Cc: Anshuman Khandual <anshuman.khandual(a)arm.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memremap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/memremap.c~mm-memremap-fix-missing-call-to-untrack_pfn-in-pagemap_range
+++ a/mm/memremap.c
@@ -214,7 +214,7 @@ static int pagemap_range(struct dev_page
if (!mhp_range_allowed(range->start, range_len(range), !is_private)) {
error = -EINVAL;
- goto err_pfn_remap;
+ goto err_kasan;
}
mem_hotplug_begin();
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-shmemc-clean-up-comment-of-shmem_swapin_folio.patch
mm-reduce-the-rcu-lock-duration.patch
mm-migration-remove-unneeded-lock-page-and-pagemovable-check.patch
mm-migration-return-errno-when-isolate_huge_page-failed.patch
mm-migration-fix-potential-pte_unmap-on-an-not-mapped-pte.patch
From: Chris Ye <chris.ye(a)intel.com>
nvdimm_clear_badblocks_region() validates badblock clearing requests
against the span of the region, however it compares the inclusive
badblock request range to the exclusive region range. Fix up the
off-by-one error.
Fixes: 23f498448362 ("libnvdimm: rework region badblocks clearing")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chris Ye <chris.ye(a)intel.com>
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
---
drivers/nvdimm/bus.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvdimm/bus.c b/drivers/nvdimm/bus.c
index 7b0d1443217a..5db16857b80e 100644
--- a/drivers/nvdimm/bus.c
+++ b/drivers/nvdimm/bus.c
@@ -182,8 +182,8 @@ static int nvdimm_clear_badblocks_region(struct device *dev, void *data)
ndr_end = nd_region->ndr_start + nd_region->ndr_size - 1;
/* make sure we are in the region */
- if (ctx->phys < nd_region->ndr_start
- || (ctx->phys + ctx->cleared) > ndr_end)
+ if (ctx->phys < nd_region->ndr_start ||
+ (ctx->phys + ctx->cleared - 1) > ndr_end)
return 0;
sector = (ctx->phys - nd_region->ndr_start) / 512;
From: Niels Dossche <dossche.niels(a)gmail.com>
[ Upstream commit 22cbc6c2681a0a4fe76150270426e763d52353a4 ]
The documentation of the function rvt_error_qp says both r_lock and
s_lock need to be held when calling that function.
It also asserts using lockdep that both of those locks are held.
rvt_error_qp is called form rvt_send_cq, which is called from
rvt_qp_complete_swqe, which is called from rvt_send_complete, which is
called from rvt_ruc_loopback in two places. Both of these places do not
hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of
these places.
The r_lock acquiring cannot be added in rvt_qp_complete_swqe because
some of its other callers already have r_lock acquired.
Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels(a)gmail.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/infiniband/sw/rdmavt/qp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index 48e8612c1bc8..e97c13967174 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -2812,7 +2812,7 @@ void rvt_qp_iter(struct rvt_dev_info *rdi,
EXPORT_SYMBOL(rvt_qp_iter);
/*
- * This should be called with s_lock held.
+ * This should be called with s_lock and r_lock held.
*/
void rvt_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
enum ib_wc_status status)
@@ -3171,7 +3171,9 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
rvp->n_loop_pkts++;
flush_send:
sqp->s_rnr_retry = sqp->s_rnr_retry_cnt;
+ spin_lock(&sqp->r_lock);
rvt_send_complete(sqp, wqe, send_status);
+ spin_unlock(&sqp->r_lock);
if (local_ops) {
atomic_dec(&sqp->local_ops_pending);
local_ops = 0;
@@ -3225,7 +3227,9 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
spin_unlock_irqrestore(&qp->r_lock, flags);
serr_no_r_lock:
spin_lock_irqsave(&sqp->s_lock, flags);
+ spin_lock(&sqp->r_lock);
rvt_send_complete(sqp, wqe, send_status);
+ spin_unlock(&sqp->r_lock);
if (sqp->ibqp.qp_type == IB_QPT_RC) {
int lastwqe;
--
2.35.1
From: Niels Dossche <dossche.niels(a)gmail.com>
[ Upstream commit 22cbc6c2681a0a4fe76150270426e763d52353a4 ]
The documentation of the function rvt_error_qp says both r_lock and
s_lock need to be held when calling that function.
It also asserts using lockdep that both of those locks are held.
rvt_error_qp is called form rvt_send_cq, which is called from
rvt_qp_complete_swqe, which is called from rvt_send_complete, which is
called from rvt_ruc_loopback in two places. Both of these places do not
hold r_lock. Fix this by acquiring a spin_lock of r_lock in both of
these places.
The r_lock acquiring cannot be added in rvt_qp_complete_swqe because
some of its other callers already have r_lock acquired.
Link: https://lore.kernel.org/r/20220228195144.71946-1-dossche.niels@gmail.com
Signed-off-by: Niels Dossche <dossche.niels(a)gmail.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/infiniband/sw/rdmavt/qp.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/infiniband/sw/rdmavt/qp.c b/drivers/infiniband/sw/rdmavt/qp.c
index d8d52a00a1be..585a9c76e518 100644
--- a/drivers/infiniband/sw/rdmavt/qp.c
+++ b/drivers/infiniband/sw/rdmavt/qp.c
@@ -2826,7 +2826,7 @@ void rvt_qp_iter(struct rvt_dev_info *rdi,
EXPORT_SYMBOL(rvt_qp_iter);
/*
- * This should be called with s_lock held.
+ * This should be called with s_lock and r_lock held.
*/
void rvt_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
enum ib_wc_status status)
@@ -3185,7 +3185,9 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
rvp->n_loop_pkts++;
flush_send:
sqp->s_rnr_retry = sqp->s_rnr_retry_cnt;
+ spin_lock(&sqp->r_lock);
rvt_send_complete(sqp, wqe, send_status);
+ spin_unlock(&sqp->r_lock);
if (local_ops) {
atomic_dec(&sqp->local_ops_pending);
local_ops = 0;
@@ -3239,7 +3241,9 @@ void rvt_ruc_loopback(struct rvt_qp *sqp)
spin_unlock_irqrestore(&qp->r_lock, flags);
serr_no_r_lock:
spin_lock_irqsave(&sqp->s_lock, flags);
+ spin_lock(&sqp->r_lock);
rvt_send_complete(sqp, wqe, send_status);
+ spin_unlock(&sqp->r_lock);
if (sqp->ibqp.qp_type == IB_QPT_RC) {
int lastwqe;
--
2.35.1