From: Marc Zyngier maz@kernel.org
[ Upstream commit 668a9fe5c6a1bcac6b65d5e9b91a9eca86f782a3 ]
When requesting an interrupt, we correctly call into the runtime PM framework to guarantee that the underlying interrupt controller is up and running.
However, we fail to do so for chained interrupt controllers, as the mux interrupt is not requested along the same path.
Augment __irq_do_set_handler() to call into the runtime PM code in this case, making sure the PM flow is the same for all interrupts.
Reported-by: Lucas Stach l.stach@pengutronix.de Tested-by: Liu Ying victor.liu@nxp.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/26973cddee5f527ea17184c0f3fccb70bc8969a0.camel@pen... Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/irq/chip.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/irq/chip.c b/kernel/irq/chip.c index a98bcfc4be7b..25ddd3d9f0bc 100644 --- a/kernel/irq/chip.c +++ b/kernel/irq/chip.c @@ -1011,8 +1011,10 @@ __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle, if (desc->irq_data.chip != &no_irq_chip) mask_ack_irq(desc); irq_state_set_disabled(desc); - if (is_chained) + if (is_chained) { desc->action = NULL; + WARN_ON(irq_chip_pm_put(irq_desc_get_irq_data(desc))); + } desc->depth = 1; } desc->handle_irq = handle; @@ -1038,6 +1040,7 @@ __irq_do_set_handler(struct irq_desc *desc, irq_flow_handler_t handle, irq_settings_set_norequest(desc); irq_settings_set_nothread(desc); desc->action = &chained_action; + WARN_ON(irq_chip_pm_get(irq_desc_get_irq_data(desc))); irq_activate_and_startup(desc, IRQ_RESEND); } }
From: Kunihiko Hayashi hayashi.kunihiko@socionext.com
[ Upstream commit e3f056a7aafabe4ac3ad4b7465ba821b44a7e639 ]
Add the compatible string to support UniPhier NX1 SoC, which has the same kinds of controls as the other UniPhier SoCs.
Signed-off-by: Kunihiko Hayashi hayashi.kunihiko@socionext.com Signed-off-by: Marc Zyngier maz@kernel.org Link: https://lore.kernel.org/r/1653023822-19229-3-git-send-email-hayashi.kunihiko... Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/irqchip/irq-uniphier-aidet.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/irqchip/irq-uniphier-aidet.c b/drivers/irqchip/irq-uniphier-aidet.c index 89121b39be26..716b1bb88bf2 100644 --- a/drivers/irqchip/irq-uniphier-aidet.c +++ b/drivers/irqchip/irq-uniphier-aidet.c @@ -237,6 +237,7 @@ static const struct of_device_id uniphier_aidet_match[] = { { .compatible = "socionext,uniphier-ld11-aidet" }, { .compatible = "socionext,uniphier-ld20-aidet" }, { .compatible = "socionext,uniphier-pxs3-aidet" }, + { .compatible = "socionext,uniphier-nx1-aidet" }, { /* sentinel */ } };
From: Kees Cook keescook@chromium.org
[ Upstream commit 67ea0a2adbf667cd6da4965fbcfd0da741035084 ]
The pwep allocation was always being allocated smaller than the true structure size. Avoid this by always allocating the full structure. Found with GCC 12 and -Warray-bounds:
../drivers/staging/rtl8723bs/os_dep/ioctl_linux.c: In function 'rtw_set_encryption': ../drivers/staging/rtl8723bs/os_dep/ioctl_linux.c:591:29: warning: array subscript 'struct ndis_802_11_wep[0]' is partly outside array bounds of 'void[25]' [-Warray-bounds] 591 | pwep->length = wep_total_len; | ^~
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Fabio Aiuto fabioaiuto83@gmail.com Cc: Hans de Goede hdegoede@redhat.com Cc: linux-staging@lists.linux.dev Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20220608215512.1070847-1-keescook@chromium.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/rtl8723bs/os_dep/ioctl_linux.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c b/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c index 295121c268bd..4ec3910d1fcf 100644 --- a/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c +++ b/drivers/staging/rtl8723bs/os_dep/ioctl_linux.c @@ -104,7 +104,8 @@ static int wpa_set_encryption(struct net_device *dev, struct ieee_param *param, if (wep_key_len > 0) { wep_key_len = wep_key_len <= 5 ? 5 : 13; wep_total_len = wep_key_len + FIELD_OFFSET(struct ndis_802_11_wep, key_material); - pwep = kzalloc(wep_total_len, GFP_KERNEL); + /* Allocate a full structure to avoid potentially running off the end. */ + pwep = kzalloc(sizeof(*pwep), GFP_KERNEL); if (!pwep) { ret = -ENOMEM; goto exit; @@ -596,7 +597,8 @@ static int rtw_set_encryption(struct net_device *dev, struct ieee_param *param, if (wep_key_len > 0) { wep_key_len = wep_key_len <= 5 ? 5 : 13; wep_total_len = wep_key_len + FIELD_OFFSET(struct ndis_802_11_wep, key_material); - pwep = kzalloc(wep_total_len, GFP_KERNEL); + /* Allocate a full structure to avoid potentially running off the end. */ + pwep = kzalloc(sizeof(*pwep), GFP_KERNEL); if (!pwep) goto exit;
From: Alexander Usyskin alexander.usyskin@intel.com
[ Upstream commit 9f4639373e6756e1ccf0029f861f1061db3c3616 ]
Link reset flow is always performed in the runtime resumed state. The internal PG state may be left as ON after the suspend and will not be updated upon the resume if the D0i3 is not supported.
Ensure that the internal PG state is set to the right value on the flow entrance in case the firmware does not support D0i3.
Signed-off-by: Alexander Usyskin alexander.usyskin@intel.com Signed-off-by: Tomas Winkler tomas.winkler@intel.com Link: https://lore.kernel.org/r/20220606144225.282375-1-tomas.winkler@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/misc/mei/hw-me.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/misc/mei/hw-me.c b/drivers/misc/mei/hw-me.c index fbc4c9581864..ffe0da1ad58f 100644 --- a/drivers/misc/mei/hw-me.c +++ b/drivers/misc/mei/hw-me.c @@ -1154,6 +1154,8 @@ static int mei_me_hw_reset(struct mei_device *dev, bool intr_enable) ret = mei_me_d0i3_exit_sync(dev); if (ret) return ret; + } else { + hw->pg_state = MEI_PG_OFF; } }
From: Keith Busch kbusch@kernel.org
[ Upstream commit 4641a8e6e145f595059e695f0f8dbbe608134086 ]
Many users have encountered IO timeouts with a CSTS value of 0xffffffff, which indicates a failure to read the register. While there are various potential causes for this observation, faulty NVMe APST has been the culprit quite frequently. Add the recommended troubleshooting steps in the error output when this condition occurs.
Signed-off-by: Keith Busch kbusch@kernel.org Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 3ddd24a42043..e0accbf40f49 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -1241,6 +1241,14 @@ static void nvme_warn_reset(struct nvme_dev *dev, u32 csts) dev_warn(dev->ctrl.device, "controller is down; will reset: CSTS=0x%x, PCI_STATUS read failed (%d)\n", csts, result); + + if (csts != ~0) + return; + + dev_warn(dev->ctrl.device, + "Does your device have a faulty power saving mode enabled?\n"); + dev_warn(dev->ctrl.device, + "Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug\n"); }
static enum blk_eh_timer_return nvme_timeout(struct request *req, bool reserved)
From: Stefan Reiter stefan@pimaker.at
[ Upstream commit 3765fad508964f433ac111c127d6bedd19bdfa04 ]
ADATA XPG GAMMIX S50 drives report bogus eui64 values that appear to be the same across drives in one system. Quirk them out so they are not marked as "non globally unique" duplicates.
Signed-off-by: Stefan Reiter stefan@pimaker.at Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index e0accbf40f49..700857af5b79 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3368,6 +3368,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1e4B, 0x1202), /* MAXIO MAP1202 */ .driver_data = NVME_QUIRK_BOGUS_NID, }, + { PCI_DEVICE(0x1cc1, 0x5350), /* ADATA XPG GAMMIX S50 */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061), .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065),
From: Keith Busch kbusch@kernel.org
[ Upstream commit 2cf7a77ed5f8903606f4f7833d02d67b08650442 ]
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 700857af5b79..93d1f12f31bb 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3344,6 +3344,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY | NVME_QUIRK_DISABLE_WRITE_ZEROES| NVME_QUIRK_IGNORE_DEV_SUBNQN, }, + { PCI_DEVICE(0x1987, 0x5012), /* Phison E12 */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1987, 0x5016), /* Phison E16 */ .driver_data = NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1b4b, 0x1092), /* Lexar 256 GB SSD */
From: Keith Busch kbusch@kernel.org
[ Upstream commit c98a879312caf775c9768faed25ce1c013b4df04 ]
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216096 Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 93d1f12f31bb..f06bc7596e2b 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3328,7 +3328,8 @@ static const struct pci_device_id nvme_id_table[] = { { PCI_VDEVICE(REDHAT, 0x0010), /* Qemu emulated controller */ .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x126f, 0x2263), /* Silicon Motion unidentified */ - .driver_data = NVME_QUIRK_NO_NS_DESC_LIST, }, + .driver_data = NVME_QUIRK_NO_NS_DESC_LIST | + NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1bb1, 0x0100), /* Seagate Nytro Flash Storage */ .driver_data = NVME_QUIRK_DELAY_BEFORE_CHK_RDY | NVME_QUIRK_NO_NS_DESC_LIST, },
From: Keith Busch kbusch@kernel.org
[ Upstream commit c4f01a776b28378f4f61b53f8cb0e358f4fa3721 ]
Add the quirk.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216049 Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index f06bc7596e2b..78e382651198 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3359,6 +3359,8 @@ static const struct pci_device_id nvme_id_table[] = { NVME_QUIRK_IGNORE_DEV_SUBNQN, }, { PCI_DEVICE(0x1c5c, 0x1504), /* SK Hynix PC400 */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x1c5c, 0x174a), /* SK Hynix P31 SSD */ + .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x15b7, 0x2001), /* Sandisk Skyhawk */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */
From: Ning Wang ningwang35@outlook.com
[ Upstream commit 6b961bce50e489186232cef51036ddb8d672bc3b ]
When ZHITAI TiPro7000 SSDs entered deepest power state(ps4) it has the same APST sleep problem as Kingston A2000. by chance the system crashes and displays the same dmesg info:
https://bugzilla.kernel.org/show_bug.cgi?id=195039#c65
As the Archlinux wiki suggest (enlat + exlat) < 25000 is fine and my testing shows no system crashes ever since. Therefore disabling the deepest power state will fix the APST sleep issue.
https://wiki.archlinux.org/title/Solid_state_drive/NVMe
This is the APST data from 'nvme id-ctrl /dev/nvme1'
NVME Identify Controller: vid : 0x1e49 ssvid : 0x1e49 sn : [...] mn : ZHITAI TiPro7000 1TB fr : ZTA32F3Y [...] ps 0 : mp:3.50W operational enlat:5 exlat:5 rrt:0 rrl:0 rwt:0 rwl:0 idle_power:- active_power:- ps 1 : mp:3.30W operational enlat:50 exlat:100 rrt:1 rrl:1 rwt:1 rwl:1 idle_power:- active_power:- ps 2 : mp:2.80W operational enlat:50 exlat:200 rrt:2 rrl:2 rwt:2 rwl:2 idle_power:- active_power:- ps 3 : mp:0.1500W non-operational enlat:500 exlat:5000 rrt:3 rrl:3 rwt:3 rwl:3 idle_power:- active_power:- ps 4 : mp:0.0200W non-operational enlat:2000 exlat:60000 rrt:4 rrl:4 rwt:4 rwl:4 idle_power:- active_power:-
Signed-off-by: Ning Wang ningwang35@outlook.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 78e382651198..9847a20f9623 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3375,6 +3375,8 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_BOGUS_NID, }, { PCI_DEVICE(0x1cc1, 0x5350), /* ADATA XPG GAMMIX S50 */ .driver_data = NVME_QUIRK_BOGUS_NID, }, + { PCI_DEVICE(0x1e49, 0x0041), /* ZHITAI TiPro7000 NVMe SSD */ + .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0061), .driver_data = NVME_QUIRK_DMA_ADDRESS_BITS_48, }, { PCI_DEVICE(PCI_VENDOR_ID_AMAZON, 0x0065),
From: "rasheed.hsueh" rasheed.hsueh@lcfc.corp-partner.google.com
[ Upstream commit 43047e082b90ead395c44b0e8497bc853bd13845 ]
Like commit 5611ec2b9814 ("nvme-pci: prevent SK hynix PC400 from using Write Zeroes command"), UMIS and Samsung has the same issue: [ 6305.633887] blk_update_request: operation not supported error, dev nvme0n1, sector 340812032 op 0x9:(WRITE_ZEROES) flags 0x0 phys_seg 0 prio class 0
So also disable Write Zeroes command on UMIS and Samsung.
Signed-off-by: rasheed.hsueh rasheed.hsueh@lcfc.corp-partner.google.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 9847a20f9623..4a07fb8bacfb 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -3365,6 +3365,14 @@ static const struct pci_device_id nvme_id_table[] = { .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x1d97, 0x2263), /* SPCC */ .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x144d, 0xa80b), /* Samsung PM9B1 256G and 512G */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x144d, 0xa809), /* Samsung MZALQ256HBJD 256G */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x1cc4, 0x6303), /* UMIS RPJTJ512MGE1QDY 512G */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, + { PCI_DEVICE(0x1cc4, 0x6302), /* UMIS RPJTJ256MGE1QDY 256G */ + .driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, }, { PCI_DEVICE(0x2646, 0x2262), /* KINGSTON SKC2000 NVMe SSD */ .driver_data = NVME_QUIRK_NO_DEEPEST_PS, }, { PCI_DEVICE(0x2646, 0x2263), /* KINGSTON A2000 NVMe SSD */
From: Ye Bin yebin10@huawei.com
[ Upstream commit 27cfa258951a465e3eae63ee1e715e902cd45578 ]
We got issue as follows: [home]# mount /dev/sdd test [home]# cd test [test]# ls dir1 lost+found [test]# rmdir dir1 ext2_empty_dir: inject fault [test]# ls lost+found [test]# cd .. [home]# umount test [home]# fsck.ext2 -fn /dev/sdd e2fsck 1.42.9 (28-Dec-2013) Pass 1: Checking inodes, blocks, and sizes Inode 4065, i_size is 0, should be 1024. Fix? no
Pass 2: Checking directory structure Pass 3: Checking directory connectivity Unconnected directory inode 4065 (/???) Connect to /lost+found? no
'..' in ... (4065) is / (2), should be <The NULL inode> (0). Fix? no
Pass 4: Checking reference counts Inode 2 ref count is 3, should be 4. Fix? no
Inode 4065 ref count is 2, should be 3. Fix? no
Pass 5: Checking group summary information
/dev/sdd: ********** WARNING: Filesystem still has errors **********
/dev/sdd: 14/128016 files (0.0% non-contiguous), 18477/512000 blocks
Reason is same with commit 7aab5c84a0f6. We can't assume directory is empty when read directory entry failed.
Link: https://lore.kernel.org/r/20220615090010.1544152-1-yebin10@huawei.com Signed-off-by: Ye Bin yebin10@huawei.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext2/dir.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c index 2c2f179b6977..43de293cef56 100644 --- a/fs/ext2/dir.c +++ b/fs/ext2/dir.c @@ -672,17 +672,14 @@ int ext2_empty_dir (struct inode * inode) void *page_addr = NULL; struct page *page = NULL; unsigned long i, npages = dir_pages(inode); - int dir_has_error = 0;
for (i = 0; i < npages; i++) { char *kaddr; ext2_dirent * de; - page = ext2_get_page(inode, i, dir_has_error, &page_addr); + page = ext2_get_page(inode, i, 0, &page_addr);
- if (IS_ERR(page)) { - dir_has_error = 1; - continue; - } + if (IS_ERR(page)) + goto not_empty;
kaddr = page_addr; de = (ext2_dirent *)kaddr;
From: Zhang Yi yi.zhang@huawei.com
[ Upstream commit 15baa7dcadf1c4f0b4f752dc054191855ff2d78e ]
We have already check the io_error and uptodate flag before submitting the superblock buffer, and re-set the uptodate flag if it has been failed to write out. But it was lockless and could be raced by another ext4_commit_super(), and finally trigger '!uptodate' WARNING when marking buffer dirty. Fix it by submit buffer directly.
Reported-by: Hulk Robot hulkci@huawei.com Signed-off-by: Zhang Yi yi.zhang@huawei.com Reviewed-by: Jan Kara jack@suse.cz Reviewed-by: Ritesh Harjani ritesh.list@gmail.com Link: https://lore.kernel.org/r/20220520023216.3065073-1-yi.zhang@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/super.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 91b83749ee11..c5082f0ffcb4 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -5479,7 +5479,6 @@ static void ext4_update_super(struct super_block *sb) static int ext4_commit_super(struct super_block *sb) { struct buffer_head *sbh = EXT4_SB(sb)->s_sbh; - int error = 0;
if (!sbh) return -EINVAL; @@ -5488,6 +5487,13 @@ static int ext4_commit_super(struct super_block *sb)
ext4_update_super(sb);
+ lock_buffer(sbh); + /* Buffer got discarded which means block device got invalidated */ + if (!buffer_mapped(sbh)) { + unlock_buffer(sbh); + return -EIO; + } + if (buffer_write_io_error(sbh) || !buffer_uptodate(sbh)) { /* * Oh, dear. A previous attempt to write the @@ -5502,17 +5508,21 @@ static int ext4_commit_super(struct super_block *sb) clear_buffer_write_io_error(sbh); set_buffer_uptodate(sbh); } - BUFFER_TRACE(sbh, "marking dirty"); - mark_buffer_dirty(sbh); - error = __sync_dirty_buffer(sbh, - REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0)); + get_bh(sbh); + /* Clear potential dirty bit if it was journalled update */ + clear_buffer_dirty(sbh); + sbh->b_end_io = end_buffer_write_sync; + submit_bh(REQ_OP_WRITE, + REQ_SYNC | (test_opt(sb, BARRIER) ? REQ_FUA : 0), sbh); + wait_on_buffer(sbh); if (buffer_write_io_error(sbh)) { ext4_msg(sb, KERN_ERR, "I/O error while writing " "superblock"); clear_buffer_write_io_error(sbh); set_buffer_uptodate(sbh); + return -EIO; } - return error; + return 0; }
/*
From: Jan Kara jack@suse.cz
[ Upstream commit 8d5459c11f548131ce48b2fbf45cccc5c382558f ]
When delayed allocation is disabled (either through mount option or because we are running low on free space), ext4_write_begin() allocates blocks with EXT4_GET_BLOCKS_IO_CREATE_EXT flag. With this flag extent merging is disabled and since ext4_write_begin() is called for each page separately, we end up with a *lot* of 1 block extents in the extent tree and following writeback is writing 1 block at a time which results in very poor write throughput (4 MB/s instead of 200 MB/s). These days when ext4_get_block_unwritten() is used only by ext4_write_begin(), ext4_page_mkwrite() and inline data conversion, we can safely allow extent merging to happen from these paths since following writeback will happen on different boundaries anyway. So use EXT4_GET_BLOCKS_CREATE_UNRIT_EXT instead which restores the performance.
Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20220520111402.4252-1-jack@suse.cz Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index f0350d60ba50..e83a2887c1a4 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -823,7 +823,7 @@ int ext4_get_block_unwritten(struct inode *inode, sector_t iblock, ext4_debug("ext4_get_block_unwritten: inode %lu, create flag %d\n", inode->i_ino, create); return _ext4_get_block(inode, iblock, bh_result, - EXT4_GET_BLOCKS_IO_CREATE_EXT); + EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT); }
/* Maximum number of blocks we map for direct IO at once. */
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 5fd7a84a09e640016fe106dd3e992f5210e23dc7 ]
elevator can be tore down by sysfs switch interface or disk release, so hold ->sysfs_lock before referring to q->elevator, then potential use-after-free can be avoided.
Reviewed-by: Christoph Hellwig hch@lst.de Signed-off-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20220616014401.817001-2-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index b70488e4db94..95da44063e65 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -3683,12 +3683,14 @@ static bool blk_mq_elv_switch_none(struct list_head *head, if (!qe) return false;
+ /* q->elevator needs protection from ->sysfs_lock */ + mutex_lock(&q->sysfs_lock); + INIT_LIST_HEAD(&qe->node); qe->q = q; qe->type = q->elevator->type; list_add(&qe->node, head);
- mutex_lock(&q->sysfs_lock); /* * After elevator_switch_mq, the previous elevator_queue will be * released by elevator_release. The reference of the io scheduler
From: Ming Lei ming.lei@redhat.com
[ Upstream commit 6cfeadbff3f8905f2854735ebb88e581402c16c4 ]
commit 364b61818f65 ("blk-mq: clearing flush request reference in tags->rqs[]") is added to clear the to-be-free flush request from tags->rqs[] for avoiding use-after-free on the flush rq.
Yu Kuai reported that blk_mq_clear_flush_rq_mapping() slows down boot time by ~8s because running scsi probe which may create and remove lots of unpresent LUNs on megaraid-sas which uses BLK_MQ_F_TAG_HCTX_SHARED and each request queue has lots of hw queues.
Improve the situation by not running blk_mq_clear_flush_rq_mapping if disk isn't added when there can't be any flush request issued.
Reviewed-by: Christoph Hellwig hch@lst.de Reported-by: Yu Kuai yukuai3@huawei.com Signed-off-by: Ming Lei ming.lei@redhat.com Link: https://lore.kernel.org/r/20220616014401.817001-4-ming.lei@redhat.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-mq.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/block/blk-mq.c b/block/blk-mq.c index 95da44063e65..cc313b364691 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2681,8 +2681,9 @@ static void blk_mq_exit_hctx(struct request_queue *q, if (blk_mq_hw_queue_mapped(hctx)) blk_mq_tag_idle(hctx);
- blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx], - set->queue_depth, flush_rq); + if (blk_queue_init_done(q)) + blk_mq_clear_flush_rq_mapping(set->tags[hctx_idx], + set->queue_depth, flush_rq); if (set->ops->exit_request) set->ops->exit_request(set, flush_rq, hctx_idx);
From: Baokun Li libaokun1@huawei.com
[ Upstream commit cf4ff938b47fc5c00b0ccce53a3b50eca9b32281 ]
ext4_mb_normalize_request() can move logical start of allocated blocks to reduce fragmentation and better utilize preallocation. However logical block requested as a start of allocation (ac->ac_o_ex.fe_logical) should always be covered by allocated blocks so we should check that by modifying and to or in the assertion.
Signed-off-by: Baokun Li libaokun1@huawei.com Reviewed-by: Ritesh Harjani ritesh.list@gmail.com Link: https://lore.kernel.org/r/20220528110017.354175-3-libaokun1@huawei.com Signed-off-by: Theodore Ts'o tytso@mit.edu Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ext4/mballoc.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c index 20ac3c74b51d..b01584e140ac 100644 --- a/fs/ext4/mballoc.c +++ b/fs/ext4/mballoc.c @@ -4171,7 +4171,22 @@ ext4_mb_normalize_request(struct ext4_allocation_context *ac, } rcu_read_unlock();
- if (start + size <= ac->ac_o_ex.fe_logical && + /* + * In this function "start" and "size" are normalized for better + * alignment and length such that we could preallocate more blocks. + * This normalization is done such that original request of + * ac->ac_o_ex.fe_logical & fe_len should always lie within "start" and + * "size" boundaries. + * (Note fe_len can be relaxed since FS block allocation API does not + * provide gurantee on number of contiguous blocks allocation since that + * depends upon free space left, etc). + * In case of inode pa, later we use the allocated blocks + * [pa_start + fe_logical - pa_lstart, fe_len/size] from the preallocated + * range of goal/best blocks [start, size] to put it at the + * ac_o_ex.fe_logical extent of this inode. + * (See ext4_mb_use_inode_pa() for more details) + */ + if (start + size <= ac->ac_o_ex.fe_logical || start > ac->ac_o_ex.fe_logical) { ext4_msg(ac->ac_sb, KERN_ERR, "start %lu, size %lu, fe_logical %lu",
linux-stable-mirror@lists.linaro.org