The patch titled
Subject: frontswap: don't call ->init if no ops are registered
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
frontswap-dont-call-init-if-no-ops-are-registered.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Christoph Hellwig <hch@lst.de>
Subject: frontswap: don't call ->init if no ops are registered
Date: Fri, 9 Sep 2022 15:08:29 +0200
If no frontswap module (i.e. zswap) is registered, frontswap_ops will be
NULL. In that situation, swapon crashes with the following stack trace:
Unable to handle kernel access to user memory outside uaccess routines at virtual address 0000000000000000
Mem abort info:
ESR = 0x0000000096000004
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x04: level 0 translation fault
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp=00000020a4fab000
[0000000000000000] pgd=0000000000000000, p4d=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
Modules linked in: zram fsl_dpaa2_eth pcs_lynx phylink ahci_qoriq crct10dif_ce ghash_ce sbsa_gwdt fsl_mc_dpio nvme lm90 nvme_core at803x xhci_plat_hcd rtc_fsl_ftm_alarm xgmac_mdio ahci_platform i2c_imx ip6_tables ip_tables fuse
Unloaded tainted modules: cppc_cpufreq():1
CPU: 10 PID: 761 Comm: swapon Not tainted 6.0.0-rc2-00454-g22100432cf14 #1
Hardware name: SolidRun Ltd. SolidRun CEX7 Platform, BIOS EDK II Jun 21 2022
pstate: 00400005 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
pc : frontswap_init+0x38/0x60
lr : __do_sys_swapon+0x8a8/0x9f4
sp : ffff80000969bcf0
x29: ffff80000969bcf0 x28: ffff37bee0d8fc00 x27: ffff80000a7f5000
x26: fffffcdefb971e80 x25: ffffaba797453b90 x24: 0000000000000064
x23: ffff37c1f209d1a8 x22: ffff37bee880e000 x21: ffffaba797748560
x20: ffff37bee0d8fce4 x19: ffffaba797748488 x18: 0000000000000014
x17: 0000000030ec029a x16: ffffaba795a479b0 x15: 0000000000000000
x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000001
x11: ffff37c63c0aba18 x10: 0000000000000000 x9 : ffffaba7956b8c88
x8 : ffff80000969bcd0 x7 : 0000000000000000 x6 : 0000000000000000
x5 : 0000000000000001 x4 : 0000000000000000 x3 : ffffaba79730f000
x2 : ffff37bee0d8fc00 x1 : 0000000000000000 x0 : 0000000000000000
Call trace:
frontswap_init+0x38/0x60
__do_sys_swapon+0x8a8/0x9f4
__arm64_sys_swapon+0x28/0x3c
invoke_syscall+0x78/0x100
el0_svc_common.constprop.0+0xd4/0xf4
do_el0_svc+0x38/0x4c
el0_svc+0x34/0x10c
el0t_64_sync_handler+0x11c/0x150
el0t_64_sync+0x190/0x194
Code: d000e283 910003fd f9006c41 f946d461 (f9400021)
---[ end trace 0000000000000000 ]---
Link: https://lkml.kernel.org/r/20220909130829.3262926-1-hch@lst.de
Fixes: 1da0d94a3ec8 ("frontswap: remove support for multiple ops")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
mm/frontswap.c | 3 +++
1 file changed, 3 insertions(+)
--- a/mm/frontswap.c~frontswap-dont-call-init-if-no-ops-are-registered
+++ a/mm/frontswap.c
@@ -125,6 +125,9 @@ void frontswap_init(unsigned type, unsig
* p->frontswap set to something valid to work properly.
*/
frontswap_map_set(sis, map);
+
+ if (!frontswap_enabled())
+ return;
frontswap_ops->init(type);
}
_
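For reference, the guard follows the usual pattern for an optional
backend. Below is a minimal standalone sketch of that pattern
(illustrative only; the real frontswap_enabled() is a static-key test
in include/linux/frontswap.h, not a plain NULL check):

#include <stdbool.h>
#include <stddef.h>

struct frontswap_ops {
	void (*init)(unsigned type);
	/* ... remaining callbacks elided ... */
};

/* Stays NULL until a backend such as zswap registers. */
static const struct frontswap_ops *frontswap_ops;

static bool frontswap_enabled(void)
{
	return frontswap_ops != NULL;	/* sketch of the static-key test */
}

void frontswap_init(unsigned type)
{
	if (!frontswap_enabled())
		return;			/* no backend: nothing to initialize */
	frontswap_ops->init(type);	/* would oops here if ops were NULL */
}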
Patches currently in -mm which might be from hch@lst.de are
frontswap-dont-call-init-if-no-ops-are-registered.patch
mm-remove-the-end_write_func-argument-to-__swap_writepage.patch
On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
> I am not sure if ASPM settings can be generalized by PCIE core.
> Performance vs Power savings when ASPM is enabled will require some
> additional tuning and that will be device specific.
Can you elaborate on this? In the universe of drivers, very few do
their own ASPM configuration, and it's usually to work around hardware
defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
work on some iwlwifi devices, etc.
The core does know how to configure all the ASPM features defined in
the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
> In some of the other ASICs, this programming is done in VBIOS/SBIOS
> firmware. Having it in driver provides the advantage of additional
> tuning without forcing a VBIOS upgrade.
I think it's clearly the intent of the PCIe spec that ASPM
configuration be done by generic code. Here are some things that
require a system-level view, not just an individual device view:
- L0s, L1, and L1 Substates cannot be enabled unless both ends
support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
- Devices advertise the "Acceptable Latency" they can tolerate for
transitions from L0s or L1 back to L0, and the actual latency depends
on the "Exit Latencies" of all the devices in the path to the Root
Port (sec 5.4.1.3.2); see the sketch after this list.
- LTR (required by L1.2) cannot be enabled unless it is already
enabled in all upstream devices (sec 6.18). This patch uses
"ltr_path", which works today but depends on the PCI core never
reconfiguring the upstream path.
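A sketch of the kind of path computation this implies (simplified; the
~1 us per intermediate switch for L1 wakeup propagation follows the
spec's exit-latency rules, and the real logic lives in
drivers/pci/pcie/aspm.c):

#include <stdio.h>

/*
 * L1 may only be enabled for an endpoint if the worst-case exit
 * latency of its path to the Root Port, plus roughly 1 us of wakeup
 * propagation per intermediate switch, stays within the Acceptable
 * Latency the endpoint advertises.
 */
static int l1_path_acceptable(const unsigned link_exit_ns[], int nlinks,
			      unsigned acceptable_ns)
{
	unsigned max_exit_ns = 0, switch_penalty_ns = 0;
	int i;

	for (i = 0; i < nlinks; i++) {
		if (link_exit_ns[i] > max_exit_ns)
			max_exit_ns = link_exit_ns[i];
		if (i > 0)
			switch_penalty_ns += 1000;	/* one switch per extra link */
	}
	return max_exit_ns + switch_penalty_ns <= acceptable_ns;
}

int main(void)
{
	/* Endpoint behind one switch: link exit latencies of 8 us and
	 * 4 us, endpoint accepts up to 16 us. 8000 + 1000 <= 16000,
	 * so L1 can be enabled on the whole path.
	 */
	unsigned path_ns[] = { 8000, 4000 };

	printf("L1 %s\n",
	       l1_path_acceptable(path_ns, 2, 16000) ? "acceptable" : "too slow");
	return 0;
}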
There might be amdgpu-specific features the driver needs to set up,
but if drivers fiddle with architected features like LTR behind the
PCI core's back, things are likely to break.
> From: Alex Deucher <alexdeucher@gmail.com>
> On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> > Do you know why the driver configures ASPM itself? If the PCI core is
> > doing something wrong (and I'm sure it is, ASPM support is kind of a
> > mess), I'd much prefer to fix up the core where *all* drivers can
> > benefit from it.
>
> This is the programming sequence we get from our hardware team and it
> is used on both windows and Linux. As far as I understand it windows
> doesn't handle this in the core, it's up to the individual drivers to
> enable it. I'm not familiar with how this should be enabled
> generically, but at least for our hardware, it seems to have some
> variation compared to what is done in the PCI core due to stability,
> etc. It seems to me that this may need asic specific implementations
> for a lot of hardware depending on the required programming sequences.
> E.g., various asics may need hardware workaround for bugs or platform
> issues, etc. I can ask for more details from our hardware team.
If the PCI core has stability issues, I want to fix them. This
hardware may have its own stability issues, and I would ideally like
drivers to use interfaces like pci_disable_link_state() to avoid the
broken states. Maybe we need new interfaces for more subtle kinds of
breakage.
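For completeness, that existing interface looks like this in use; a
minimal sketch of a probe routine (the driver is hypothetical, but
pci_disable_link_state() and the PCIE_LINK_STATE_* flags are the real
kernel API):

#include <linux/pci.h>

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	/*
	 * Ask the core never to enable L0s or L1 on this device's link,
	 * rather than having the driver poke ASPM registers directly.
	 */
	if (pci_disable_link_state(pdev,
				   PCIE_LINK_STATE_L0S | PCIE_LINK_STATE_L1))
		dev_info(&pdev->dev, "could not disable ASPM states\n");

	/* ... normal device setup continues ... */
	return 0;
}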
Bjorn
From: Xiu Jianfeng <xiujianfeng@huawei.com>
[ Upstream commit 51dd64bb99e4478fc5280171acd8e1b529eadaf7 ]
This reverts commit ccf11dbaa07b328fa469415c362d33459c140a37.
Commit ccf11dbaa07b ("evm: Fix memleak in init_desc") claimed there was
a memory leak in init_desc(). That appears to be incorrect: tmp_tfm is
saved in one of the two global variables, hmac_tfm or
evm_tfm[hash_algo], so the next call to init_desc() does not need to
allocate the tfm again. Therefore, not freeing tmp_tfm in the error
paths of the desc allocation or crypto_shash_init(desc) is not a leak.
Moreover, that commit did not reset the global variable to NULL after
freeing tmp_tfm, which leaves *tfm a dangling pointer and may cause a
use-after-free.
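The dangling-pointer hazard is easy to see in a standalone sketch of
the same caching pattern (names are illustrative, not the kernel's):

#include <stdlib.h>

struct ctx { int dummy; };

static struct ctx *cached_ctx;		/* plays the role of hmac_tfm / evm_tfm[] */

static int later_step_fails(void)	/* stands in for crypto_shash_init() */
{
	return 0;
}

static struct ctx *get_ctx(void)
{
	if (!cached_ctx)
		cached_ctx = malloc(sizeof(*cached_ctx));
	if (!cached_ctx)
		return NULL;

	if (later_step_fails()) {
		/*
		 * The reverted commit freed the cached object here without
		 * resetting cached_ctx to NULL, so the next call would hand
		 * out freed memory (use-after-free). Leaving the allocation
		 * cached, as the revert does, is safe: it is simply reused.
		 */
		return NULL;
	}
	return cached_ctx;
}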
Reported-by: Guozihua (Scott) <guozihua@huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
security/integrity/evm/evm_crypto.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c
index 25dac691491b..ee6bd945f3d6 100644
--- a/security/integrity/evm/evm_crypto.c
+++ b/security/integrity/evm/evm_crypto.c
@@ -75,7 +75,7 @@ static struct shash_desc *init_desc(char type, uint8_t hash_algo)
{
long rc;
const char *algo;
- struct crypto_shash **tfm, *tmp_tfm = NULL;
+ struct crypto_shash **tfm, *tmp_tfm;
struct shash_desc *desc;
if (type == EVM_XATTR_HMAC) {
@@ -120,16 +120,13 @@ static struct shash_desc *init_desc(char type, uint8_t hash_algo)
alloc:
desc = kmalloc(sizeof(*desc) + crypto_shash_descsize(*tfm),
GFP_KERNEL);
- if (!desc) {
- crypto_free_shash(tmp_tfm);
+ if (!desc)
return ERR_PTR(-ENOMEM);
- }
desc->tfm = *tfm;
rc = crypto_shash_init(desc);
if (rc) {
- crypto_free_shash(tmp_tfm);
kfree(desc);
return ERR_PTR(rc);
}
--
2.35.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
873aefb376bb ("vfio/type1: Unpin zero pages")
4b6c33b32296 ("vfio/type1: Prepare for batched pinning with struct vfio_batch")
be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
aae7a75a821a ("vfio/type1: Add proper error unwind for vfio_iommu_replay()")
64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
ed03d924587e ("mm/gup: use a standard migration target allocation callback")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
d92bbc2719bd ("mm/hugetlb: unify migration callbacks")
b4b382238ed2 ("mm/migrate: move migration helper from .h to .c")
c7073bab5772 ("mm/page_isolation: prefer the node of the source page")
3e4e28c5a8f0 ("mmap locking API: convert mmap_sem API comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
ca5999fde0a1 ("mm: introduce include/linux/pgtable.h")
420c2091b65a ("mm/gup: introduce pin_user_pages_locked()")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 873aefb376bbc0ed1dd2381ea1d6ec88106fdbd4 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Mon, 29 Aug 2022 21:05:40 -0600
Subject: [PATCH] vfio/type1: Unpin zero pages
There's currently a reference count leak on the zero page. We increment
the reference via pin_user_pages_remote(), but the page is later handled
as an invalid/reserved page, therefore it's not accounted against the
user and not unpinned by our put_pfn().
Introducing special zero page handling in put_pfn() would resolve the
leak, but without accounting of the zero page, a single user could
still create enough mappings to generate a reference count overflow.
The zero page is always resident, so for our purposes there's no reason
to keep it pinned. Therefore, add a loop to walk pages returned from
pin_user_pages_remote() and unpin any zero pages.
Cc: stable@vger.kernel.org
Reported-by: Luboslav Pivarc <lpivarc@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Link: https://lore.kernel.org/r/166182871735.3518559.8884121293045337358.stgit@om…
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index db516c90a977..8706482665d1 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -558,6 +558,18 @@ static int vaddr_get_pfns(struct mm_struct *mm, unsigned long vaddr,
ret = pin_user_pages_remote(mm, vaddr, npages, flags | FOLL_LONGTERM,
pages, NULL, NULL);
if (ret > 0) {
+ int i;
+
+ /*
+ * The zero page is always resident, we don't need to pin it
+ * and it falls into our invalid/reserved test so we don't
+ * unpin in put_pfn(). Unpin all zero pages in the batch here.
+ */
+ for (i = 0 ; i < ret; i++) {
+ if (unlikely(is_zero_pfn(page_to_pfn(pages[i]))))
+ unpin_user_page(pages[i]);
+ }
+
*pfn = page_to_pfn(pages[0]);
goto done;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
873aefb376bb ("vfio/type1: Unpin zero pages")
4b6c33b32296 ("vfio/type1: Prepare for batched pinning with struct vfio_batch")
be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
aae7a75a821a ("vfio/type1: Add proper error unwind for vfio_iommu_replay()")
64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
ed03d924587e ("mm/gup: use a standard migration target allocation callback")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
d92bbc2719bd ("mm/hugetlb: unify migration callbacks")
b4b382238ed2 ("mm/migrate: move migration helper from .h to .c")
c7073bab5772 ("mm/page_isolation: prefer the node of the source page")
3e4e28c5a8f0 ("mmap locking API: convert mmap_sem API comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
ca5999fde0a1 ("mm: introduce include/linux/pgtable.h")
420c2091b65a ("mm/gup: introduce pin_user_pages_locked()")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
873aefb376bb ("vfio/type1: Unpin zero pages")
4b6c33b32296 ("vfio/type1: Prepare for batched pinning with struct vfio_batch")
be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
aae7a75a821a ("vfio/type1: Add proper error unwind for vfio_iommu_replay()")
64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
ed03d924587e ("mm/gup: use a standard migration target allocation callback")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
d92bbc2719bd ("mm/hugetlb: unify migration callbacks")
b4b382238ed2 ("mm/migrate: move migration helper from .h to .c")
c7073bab5772 ("mm/page_isolation: prefer the node of the source page")
3e4e28c5a8f0 ("mmap locking API: convert mmap_sem API comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
ca5999fde0a1 ("mm: introduce include/linux/pgtable.h")
420c2091b65a ("mm/gup: introduce pin_user_pages_locked()")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
873aefb376bb ("vfio/type1: Unpin zero pages")
4b6c33b32296 ("vfio/type1: Prepare for batched pinning with struct vfio_batch")
be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
aae7a75a821a ("vfio/type1: Add proper error unwind for vfio_iommu_replay()")
64019a2e467a ("mm/gup: remove task_struct pointer for all gup code")
bce617edecad ("mm: do page fault accounting in handle_mm_fault")
ed03d924587e ("mm/gup: use a standard migration target allocation callback")
bbe88753bd42 ("mm/hugetlb: make hugetlb migration callback CMA aware")
41b4dc14ee80 ("mm/gup: restrict CMA region by using allocation scope API")
19fc7bed252c ("mm/migrate: introduce a standard migration target allocation function")
d92bbc2719bd ("mm/hugetlb: unify migration callbacks")
b4b382238ed2 ("mm/migrate: move migration helper from .h to .c")
c7073bab5772 ("mm/page_isolation: prefer the node of the source page")
3e4e28c5a8f0 ("mmap locking API: convert mmap_sem API comments")
d8ed45c5dcd4 ("mmap locking API: use coccinelle to convert mmap_sem rwsem call sites")
ca5999fde0a1 ("mm: introduce include/linux/pgtable.h")
420c2091b65a ("mm/gup: introduce pin_user_pages_locked()")
5a36f0f3f518 ("Merge tag 'vfio-v5.8-rc1' of git://github.com/awilliam/linux-vfio")
thanks,
greg k-h
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
873aefb376bb ("vfio/type1: Unpin zero pages")
4b6c33b32296 ("vfio/type1: Prepare for batched pinning with struct vfio_batch")
be16c1fd99f4 ("vfio/type1: Change success value of vaddr_get_pfn()")
thanks,
greg k-h
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
54c3931957f6 ("tracing: hold caller_addr to hardirq_{enable,disable}_ip")
8b023accc8df ("lockdep: Fix -Wunused-parameter for _THIS_IP_")
ef9989afda73 ("kvm: add guest_state_{enter,exit}_irqoff()")
ed922739c919 ("KVM: Use interval tree to do fast hva lookup in memslots")
26b8345abc75 ("KVM: Resolve memslot ID via a hash table instead of via a static array")
1e8617d37fc3 ("KVM: Move WARN on invalid memslot index to update_memslots()")
4e4d30cb9b87 ("KVM: Resync only arch fields when slots_arch_lock gets reacquired")
c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray")
27592ae8dbe4 ("KVM: Move wiping of the kvm->vcpus array to common code")
bda44d844758 ("KVM: Ensure local memslot copies operate on up-to-date arch-specific data")
99cdc6c18c2d ("RISC-V: Add initial skeletal KVM support")
192ad3c27a48 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 54c3931957f6a6194d5972eccc36d052964b2abe Mon Sep 17 00:00:00 2001
From: Yipeng Zou <zouyipeng@huawei.com>
Date: Thu, 1 Sep 2022 18:45:14 +0800
Subject: [PATCH] tracing: hold caller_addr to hardirq_{enable,disable}_ip
Currently, the address passed to lockdep_hardirqs_{on,off}() is fixed
at CALLER_ADDR0.
The function trace_hardirqs_on_caller() was intended to use
caller_addr, the address the caller wants to have traced.
For example, the lockdep log on riscv always shows the last
{enabled,disabled} events at __trace_hardirqs_{on,off} when the
tracepoints are reached through those wrappers:
[ 57.853175] hardirqs last enabled at (2519): __trace_hardirqs_on+0xc/0x14
[ 57.853848] hardirqs last disabled at (2520): __trace_hardirqs_off+0xc/0x14
After passing caller_addr through to trace_hardirqs_{on,off}_caller(),
we get much more useful information:
[ 53.781428] hardirqs last enabled at (2595): restore_all+0xe/0x66
[ 53.782185] hardirqs last disabled at (2596): ret_from_exception+0xa/0x10
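The mistake is reproducible outside the kernel; a minimal userspace
sketch using the same GCC builtin that backs CALLER_ADDR0 (names are
illustrative):

#include <stdio.h>

static void record(const char *what, unsigned long addr)
{
	printf("%s: last event at %#lx\n", what, addr);
}

/* Buggy: records its own return address, i.e. always the trampoline. */
static __attribute__((noinline)) void trace_buggy(unsigned long caller_addr)
{
	(void)caller_addr;
	record("buggy", (unsigned long)__builtin_return_address(0));
}

/* Fixed: forwards the address the outermost caller wanted traced. */
static __attribute__((noinline)) void trace_fixed(unsigned long caller_addr)
{
	record("fixed", caller_addr);
}

static __attribute__((noinline)) void trampoline(void)
{
	unsigned long ip = (unsigned long)__builtin_return_address(0);

	trace_buggy(ip);	/* always reports an address inside trampoline() */
	trace_fixed(ip);	/* reports trampoline()'s real call site */
}

int main(void)
{
	trampoline();
	return 0;
}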
Link: https://lkml.kernel.org/r/20220901104515.135162-2-zouyipeng@huawei.com
Cc: stable@vger.kernel.org
Fixes: c3bc8fd637a96 ("tracing: Centralize preemptirq tracepoints and unify their usage")
Signed-off-by: Yipeng Zou <zouyipeng@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
diff --git a/kernel/trace/trace_preemptirq.c b/kernel/trace/trace_preemptirq.c
index 95b58bd757ce..1e130da1b742 100644
--- a/kernel/trace/trace_preemptirq.c
+++ b/kernel/trace/trace_preemptirq.c
@@ -95,14 +95,14 @@ __visible void trace_hardirqs_on_caller(unsigned long caller_addr)
}
lockdep_hardirqs_on_prepare();
- lockdep_hardirqs_on(CALLER_ADDR0);
+ lockdep_hardirqs_on(caller_addr);
}
EXPORT_SYMBOL(trace_hardirqs_on_caller);
NOKPROBE_SYMBOL(trace_hardirqs_on_caller);
__visible void trace_hardirqs_off_caller(unsigned long caller_addr)
{
- lockdep_hardirqs_off(CALLER_ADDR0);
+ lockdep_hardirqs_off(caller_addr);
if (!this_cpu_read(tracing_irq_cpu)) {
this_cpu_write(tracing_irq_cpu, 1);