From: Peter Xu <peterx(a)redhat.com>
commit 7196040e19ad634293acd3eff7083149d7669031 upstream.
Patch series "mm/gup: some cleanups", v5.
This patch (of 5):
Alex reported invalid page pointer returned with pin_user_pages_remote()
from vfio after upstream commit 4b6c33b32296 ("vfio/type1: Prepare for
batched pinning with struct vfio_batch").
It turns out that it's not the fault of the vfio commit; however after
vfio switches to a full page buffer to store the page pointers it starts
to …
[View More]expose the problem easier.
The problem is for VM_PFNMAP vmas we should normally fail with an
-EFAULT then vfio will carry on to handle the MMIO regions. However
when the bug triggered, follow_page_mask() returned -EEXIST for such a
page, which will jump over the current page, leaving that entry in
**pages untouched. However the caller is not aware of it, hence the
caller will reference the page as usual even if the pointer data can be
anything.
We had that -EEXIST logic since commit 1027e4436b6a ("mm: make GUP
handle pfn mapping unless FOLL_GET is requested") which seems very
reasonable. It could be that when we reworked GUP with FOLL_PIN we
could have overlooked that special path in commit 3faa52c03f44 ("mm/gup:
track FOLL_PIN pages"), even if that commit rightfully touched up
follow_devmap_pud() on checking FOLL_PIN when it needs to return an
-EEXIST.
Attaching the Fixes to the FOLL_PIN rework commit, as it happened later
than 1027e4436b6a.
[jhubbard(a)nvidia.com: added some tags, removed a reference to an out of tree module.]
Link: https://lkml.kernel.org/r/20220207062213.235127-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-1-jhubbard@nvidia.com
Link: https://lkml.kernel.org/r/20220204020010.68930-2-jhubbard@nvidia.com
Fixes: 3faa52c03f44 ("mm/gup: track FOLL_PIN pages")
Signed-off-by: Peter Xu <peterx(a)redhat.com>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Reviewed-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Reported-by: Alex Williamson <alex.williamson(a)redhat.com>
Debugged-by: Alex Williamson <alex.williamson(a)redhat.com>
Tested-by: Alex Williamson <alex.williamson(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Lukas Bulwahn <lukas.bulwahn(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
mm/gup.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index 7bc1ba9ce440..41da0bd61bec 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -465,7 +465,7 @@ static int follow_pfn_pte(struct vm_area_struct *vma, unsigned long address,
pte_t *pte, unsigned int flags)
{
/* No page to get reference */
- if (flags & FOLL_GET)
+ if (flags & (FOLL_GET | FOLL_PIN))
return -EFAULT;
if (flags & FOLL_TOUCH) {
--
2.36.0
[View Less]
Based on DesignWare Ethernet QoS datasheet, we are seeing the limitation
of Split Header (SPH) feature is not supported for Ipv4 fragmented packet.
This SPH limitation will cause ping failure when the packets size exceed
the MTU size. For example, the issue happens once the basic ping packet
size is larger than the configured MTU size and the data is lost inside
the fragmented packet, replaced by zeros/corrupted values, and leads to
ping fail.
So, disable the Split Header for Intel platforms.
…
[View More]v2: Add fixes tag in commit message.
Fixes: 67afd6d1cfdf("net: stmmac: Add Split Header support and enable it
in XGMAC cores")
Cc: <stable(a)vger.kernel.org> # 5.10.x
Suggested-by: Ong, Boon Leong <boon.leong.ong(a)intel.com>
Signed-off-by: Mohammad Athari Bin Ismail <mohammad.athari.ismail(a)intel.com>
Signed-off-by: Wong Vee Khee <vee.khee.wong(a)linux.intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan(a)linux.intel.com>
---
drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c | 1 +
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
include/linux/stmmac.h | 1 +
3 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
index 63754a9c4ba7..0b0be0898ac5 100644
--- a/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
+++ b/drivers/net/ethernet/stmicro/stmmac/dwmac-intel.c
@@ -454,6 +454,7 @@ static int intel_mgbe_common_data(struct pci_dev *pdev,
plat->has_gmac4 = 1;
plat->force_sf_dma_mode = 0;
plat->tso_en = 1;
+ plat->sph_disable = 1;
/* Multiplying factor to the clk_eee_i clock time
* period to make it closer to 100 ns. This value
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 4a4b3651ab3e..2525a80353b7 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -7021,7 +7021,7 @@ int stmmac_dvr_probe(struct device *device,
dev_info(priv->device, "TSO feature enabled\n");
}
- if (priv->dma_cap.sphen) {
+ if (priv->dma_cap.sphen && !priv->plat->sph_disable) {
ndev->hw_features |= NETIF_F_GRO;
priv->sph_cap = true;
priv->sph = priv->sph_cap;
diff --git a/include/linux/stmmac.h b/include/linux/stmmac.h
index 24eea1b05ca2..29917850f079 100644
--- a/include/linux/stmmac.h
+++ b/include/linux/stmmac.h
@@ -270,5 +270,6 @@ struct plat_stmmacenet_data {
int msi_rx_base_vec;
int msi_tx_base_vec;
bool use_phy_wol;
+ bool sph_disable;
};
#endif
--
2.25.1
[View Less]
From: Hugh Dickins <hughd(a)google.com>
Twice now, when exercising ext4 looped on shmem huge pages, I have crashed
on the PF_ONLY_HEAD check inside PageWaiters(): ext4_finish_bio() calling
end_page_writeback() calling wake_up_page() on tail of a shmem huge page,
no longer an ext4 page at all.
The problem is that PageWriteback is not accompanied by a page reference
(as the NOTE at the end of test_clear_page_writeback() acknowledges): as
soon as TestClearPageWriteback has been done, that …
[View More]page could be removed
from page cache, freed, and reused for something else by the time that
wake_up_page() is reached.
https://lore.kernel.org/linux-mm/20200827122019.GC14765@casper.infradead.or…
Matthew Wilcox suggested avoiding or weakening the PageWaiters() tail
check; but I'm paranoid about even looking at an unreferenced struct page,
lest its memory might itself have already been reused or hotremoved (and
wake_up_page_bit() may modify that memory with its ClearPageWaiters()).
Then on crashing a second time, realized there's a stronger reason against
that approach. If my testing just occasionally crashes on that check,
when the page is reused for part of a compound page, wouldn't it be much
more common for the page to get reused as an order-0 page before reaching
wake_up_page()? And on rare occasions, might that reused page already be
marked PageWriteback by its new user, and already be waited upon? What
would that look like?
It would look like BUG_ON(PageWriteback) after wait_on_page_writeback()
in write_cache_pages() (though I have never seen that crash myself).
Matthew Wilcox explaining this to himself:
"page is allocated, added to page cache, dirtied, writeback starts,
--- thread A ---
filesystem calls end_page_writeback()
test_clear_page_writeback()
--- context switch to thread B ---
truncate_inode_pages_range() finds the page, it doesn't have writeback set,
we delete it from the page cache. Page gets reallocated, dirtied, writeback
starts again. Then we call write_cache_pages(), see
PageWriteback() set, call wait_on_page_writeback()
--- context switch back to thread A ---
wake_up_page(page, PG_writeback);
... thread B is woken, but because the wakeup was for the old use of
the page, PageWriteback is still set.
Devious"
And prior to 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
this would have been much less likely: before that, wake_page_function()'s
non-exclusive case would stop walking and not wake if it found Writeback
already set again; whereas now the non-exclusive case proceeds to wake.
I have not thought of a fix that does not add a little overhead: the
simplest fix is for end_page_writeback() to get_page() before calling
test_clear_page_writeback(), then put_page() after wake_up_page().
Was there a chance of missed wakeups before, since a page freed before
reaching wake_up_page() would have PageWaiters cleared? I think not,
because each waiter does hold a reference on the page. This bug comes
when the old use of the page, the one we do TestClearPageWriteback on,
had *no* waiters, so no additional page reference beyond the page cache
(and whoever racily freed it). The reuse of the page has a waiter
holding a reference, and its own PageWriteback set; but the belated
wake_up_page() has woken the reuse to hit that BUG_ON(PageWriteback).
Orabug: 33634162
Reported-by: syzbot+3622cea378100f45d59f(a)syzkaller.appspotmail.com
Reported-by: Qian Cai <cai(a)lca.pw>
Fixes: 2a9127fcf229 ("mm: rewrite wait_on_page_bit_common() logic")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Cc: stable(a)vger.kernel.org # v5.8+
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
(cherry picked from commit 073861ed77b6b957c3c8d54a11dc503f7d986ceb)
Signed-off-by: Partha Sarathi Satapathy <partha.satapathy(a)oracle.com>
---
mm/filemap.c | 8 ++++++++
mm/page-writeback.c | 6 ------
2 files changed, 8 insertions(+), 6 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index dc20b49..336a065 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1366,11 +1366,19 @@ void end_page_writeback(struct page *page)
rotate_reclaimable_page(page);
}
+ /*
+ * Writeback does not hold a page reference of its own, relying
+ * on truncation to wait for the clearing of PG_writeback.
+ * But here we must make sure that the page is not freed and
+ * reused before the wake_up_page().
+ */
+ get_page(page);
if (!test_clear_page_writeback(page))
BUG();
smp_mb__after_atomic();
wake_up_page(page, PG_writeback);
+ put_page(page);
}
EXPORT_SYMBOL(end_page_writeback);
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 2d658b2..52d3006 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2746,12 +2746,6 @@ int test_clear_page_writeback(struct page *page)
} else {
ret = TestClearPageWriteback(page);
}
- /*
- * NOTE: Page might be free now! Writeback doesn't hold a page
- * reference on its own, it relies on truncation to wait for
- * the clearing of PG_writeback. The below can only access
- * page state that is static across allocation cycles.
- */
if (ret) {
dec_lruvec_state(lruvec, NR_WRITEBACK);
dec_zone_page_state(page, NR_ZONE_WRITE_PENDING);
--
1.8.3.1
[View Less]
Save some CPU from unnecessary polling and make SPI flash reading
a tiny bit faster.
Cc: <stable(a)vger.kernel.org> # v5.7+
Fixes: 23beb1adb5f6 ("arm64: dts: mt7622: add flash related device nodes")
Signed-off-by: Chuanhong Guo <gch981213(a)gmail.com>
---
The nor controller driver in kernem when this dt is added doesn't
support IRQ so there isn't one defined in dt back then. However,
device-tree is supposed to describe the hardware, so I think this
can count as a fix.
My main …
[View More]purpose for the fixes tag is just for the linux-stable
backport though. spi-mtk-nor supports interrupt since v5.7.
arch/arm64/boot/dts/mediatek/mt7622.dtsi | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/boot/dts/mediatek/mt7622.dtsi b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
index 8c2563a3919a..e263a81a011b 100644
--- a/arch/arm64/boot/dts/mediatek/mt7622.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt7622.dtsi
@@ -570,6 +570,7 @@ nor_flash: spi@11014000 {
compatible = "mediatek,mt7622-nor",
"mediatek,mt8173-nor";
reg = <0 0x11014000 0 0xe0>;
+ interrupts = <GIC_SPI 88 IRQ_TYPE_LEVEL_LOW>;
clocks = <&pericfg CLK_PERI_FLASH_PD>,
<&topckgen CLK_TOP_FLASH_SEL>;
clock-names = "spi", "sf";
--
2.35.1
[View Less]
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT …
[View More]configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 7505576d1c1a ("MIPS: add support for SGI Octane (IP30)")
Cc: stable(a)vger.kernel.org # v5.5+
---
Changes from v1:
- s/chosen/disabled/ in the change description.
---
arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h | 1 -
1 file changed, 1 deletion(-)
linux-mips-ip30-cpu-has-fpu.diff
Index: linux-macro/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
===================================================================
--- linux-macro.orig/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
+++ linux-macro/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
@@ -28,7 +28,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
[View Less]
As Yanming reported in bugzilla:
https://bugzilla.kernel.org/show_bug.cgi?id=215904
The kernel message is shown below:
kernel BUG at fs/f2fs/inode.c:825!
Call Trace:
evict+0x282/0x4e0
__dentry_kill+0x2b2/0x4d0
shrink_dentry_list+0x17c/0x4f0
shrink_dcache_parent+0x143/0x1e0
do_one_tree+0x9/0x30
shrink_dcache_for_umount+0x51/0x120
generic_shutdown_super+0x5c/0x3a0
kill_block_super+0x90/0xd0
kill_f2fs_super+0x225/0x310
deactivate_locked_super+0x78/0xc0
cleanup_mnt+0x2b7/0x480
…
[View More]task_work_run+0xc8/0x150
exit_to_user_mode_prepare+0x14a/0x150
syscall_exit_to_user_mode+0x1d/0x40
do_syscall_64+0x48/0x90
The root cause is: inode node and dnode node share the same nid,
so during f2fs_evict_inode(), dnode node truncation will invalidate
its NAT entry, so when truncating inode node, it fails due to
invalid NAT entry, result in inode is still marked as dirty, fix
this issue by clearing dirty for inode and setting SBI_NEED_FSCK
flag in filesystem.
output from dump.f2fs:
[print_node_info: 354] Node ID [0xf:15] is inode
i_nid[0] [0x f : 15]
Cc: stable(a)vger.kernel.org
Reported-by: Ming Yan <yanming(a)tju.edu.cn>
Signed-off-by: Chao Yu <chao.yu(a)oppo.com>
---
fs/f2fs/inode.c | 16 +++++++++++++++-
1 file changed, 15 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/inode.c b/fs/f2fs/inode.c
index 234b8ed02644..030474b842ce 100644
--- a/fs/f2fs/inode.c
+++ b/fs/f2fs/inode.c
@@ -795,8 +795,22 @@ void f2fs_evict_inode(struct inode *inode)
f2fs_lock_op(sbi);
err = f2fs_remove_inode_page(inode);
f2fs_unlock_op(sbi);
- if (err == -ENOENT)
+ if (err == -ENOENT) {
err = 0;
+
+ /*
+ * in fuzzed image, another node may has the same
+ * block address as inode's, if it was truncated
+ * previously, truncation of inode node will fail.
+ */
+ if (is_inode_flag_set(inode, FI_DIRTY_INODE)) {
+ f2fs_warn(F2FS_I_SB(inode),
+ "f2fs_evict_inode: inconsistent node id, ino:%lu",
+ inode->i_ino);
+ f2fs_inode_synced(inode);
+ set_sbi_flag(sbi, SBI_NEED_FSCK);
+ }
+ }
}
/* give more chances, if ENOMEM case */
--
2.25.1
[View Less]
Hello Dear
I am glad to know you, but God knows you better and he knows why he
has directed me to you at this point in time so do not be surprised at
all. My name is Mrs.Sophia Erick, a widow, i have been suffering from
ovarian cancer disease. At this moment i am about to end the race like
this because the illness has gotten to a very bad stage, without any
family members and no child. I hope that you will not expose or betray
this trust and confidence that I am about to entrust to you for the
…
[View More]mutual benefit of the orphans and the less privileged ones. I have
some funds I inherited from my late husband,the sum of ($11.000.000
Eleven million dollars.) deposited in the Bank. Having known my
present health status, I decided to entrust this fund to you believing
that you will utilize it the way i am going to instruct
herein.Therefore I need you to assist me and reclaim this money and
use it for Charity works, for orphanages and giving justice and help
to the poor, needy and to promote the words of God and the effort that
the house of God will be maintained says The Lord." Jeremiah
22:15-16.“
It will be my great pleasure to compensate you with 35 % percent of
the total money for your personal use, 5 % percent for any expenses
that may occur during the international transfer process while 60% of
the money will go to the charity project. All I require from you is
sincerity and the ability to complete God's task without any failure.
It will be my pleasure to see that the bank has finally released and
transferred the fund into your bank account therein your country even
before I die here in the hospital, because of my present health status
everything needs to be processed rapidly as soon as possible. Please
kindly respond quickly. Thanks and God bless you,
May God Bless you for your kind help.
Yours sincerely sister Mrs. Sophia Erick.
[View Less]