After commit ea2f0f77538c, a 416-CPU VM running on Hyper-V hangs during
boot because scsi_add_host_with_dma() sets shost->cmd_per_lun to a
negative number:
'max_outstanding_req_per_channel' is 352,
'max_sub_channels' is (416 - 1) / 4 = 103, so in storvsc_probe(),
scsi_driver.can_queue = 352 * (103 + 1) * (100 - 10) / 100 = 32947, which
is bigger than SHRT_MAX (i.e. 32767).
Fix the hang issue by capping scsi_driver.can_queue.
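A minimal userspace sketch (not storvsc code; the constants come from the
calculation above, and the wrapped value assumes the usual two's-complement
ABI) showing why a can_queue above SHRT_MAX turns the short-typed
cmd_per_lun negative:

  #include <limits.h>
  #include <stdio.h>

  int main(void)
  {
          int can_queue = 352 * (103 + 1) * (100 - 10) / 100; /* 32947 */
          short cmd_per_lun = can_queue;  /* value does not fit in a short */

          printf("can_queue=%d SHRT_MAX=%d cmd_per_lun=%hd\n",
                 can_queue, SHRT_MAX, cmd_per_lun);
          /* typically prints: can_queue=32947 SHRT_MAX=32767 cmd_per_lun=-32589 */
          return 0;
  }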
Add the Fixes tag below even though ea2f0f77538c itself is good.
Fixes: ea2f0f77538c ("scsi: core: Cap scsi_host cmd_per_lun at can_queue")
Cc: stable(a)vger.kernel.org
Signed-off-by: Dexuan Cui <decui(a)microsoft.com>
---
drivers/scsi/storvsc_drv.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index ebbbc1299c62..ba374908aec2 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -1976,6 +1976,16 @@ static int storvsc_probe(struct hv_device *device,
(max_sub_channels + 1) *
(100 - ring_avail_percent_lowater) / 100;
+ /*
+ * v5.14 (see commit ea2f0f77538c) implicitly requires that
+ * scsi_driver.can_queue should not exceed SHRT_MAX, otherwise
+ * scsi_add_host_with_dma() sets shost->cmd_per_lun to a negative
+ * number (note: the type of the "cmd_per_lun" field is "short"), and
+ * the system may hang during early boot.
+ */
+ if (scsi_driver.can_queue > SHRT_MAX)
+ scsi_driver.can_queue = SHRT_MAX;
+
host = scsi_host_alloc(&scsi_driver,
sizeof(struct hv_host_device));
if (!host)
--
2.17.1
On the 88W8897 card it's very important that the TX ring write pointer
is updated to its new value before setting the TX ready
interrupt, otherwise the firmware appears to crash (probably because
it's trying to DMA-read from the wrong place). The issue is present in
the latest firmware version 15.68.19.p21 of the pcie+usb card.
Since PCI uses "posted writes" when writing to a register, it's not
guaranteed that a write will happen immediately. That means the pointer
might be outdated when setting the TX ready interrupt, leading to
firmware crashes especially when ASPM L1 and L1 substates are enabled
(because of the higher link latency, the write will probably take
longer).
So fix those firmware crashes by always using a non-posted write for
this specific register write. We do that by simply reading back the
register after writing it, just as a few other PCI drivers do.
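A minimal sketch of that read-back idiom (the helper name and "regs"
pointer are placeholders for illustration, not mwifiex symbols; the patch
below adds the driver's own variant, mwifiex_write_reg_np()):

  #include <linux/io.h>
  #include <linux/types.h>

  /* An MMIO read from the same device is non-posted, so it cannot
   * complete until the preceding posted write has reached the device.
   */
  static void write_reg_flushed(void __iomem *regs, unsigned int offset,
                                u32 val)
  {
          iowrite32(val, regs + offset);
          (void)ioread32(regs + offset);  /* force the write to complete */
  }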
This fixes a bug where during rx/tx traffic and with ASPM L1 substates
enabled (the enabled substates are platform dependent), the firmware
crashes and eventually a command timeout appears in the logs.
Cc: stable(a)vger.kernel.org
Signed-off-by: Jonas Dreßler <verdre(a)v0yd.nl>
---
drivers/net/wireless/marvell/mwifiex/pcie.c | 26 ++++++++++++++++++---
1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/drivers/net/wireless/marvell/mwifiex/pcie.c b/drivers/net/wireless/marvell/mwifiex/pcie.c
index c6ccce426b49..0eff717ac5fa 100644
--- a/drivers/net/wireless/marvell/mwifiex/pcie.c
+++ b/drivers/net/wireless/marvell/mwifiex/pcie.c
@@ -240,6 +240,20 @@ static int mwifiex_write_reg(struct mwifiex_adapter *adapter, int reg, u32 data)
return 0;
}
+/*
+ * This function does a non-posted write into a PCIE card register, ensuring
+ * its completion before returning.
+ */
+static int mwifiex_write_reg_np(struct mwifiex_adapter *adapter, int reg, u32 data)
+{
+ struct pcie_service_card *card = adapter->card;
+
+ iowrite32(data, card->pci_mmap1 + reg);
+ ioread32(card->pci_mmap1 + reg);
+
+ return 0;
+}
+
/* This function reads data from PCIE card register.
*/
static int mwifiex_read_reg(struct mwifiex_adapter *adapter, int reg, u32 *data)
@@ -1482,9 +1496,15 @@ mwifiex_pcie_send_data(struct mwifiex_adapter *adapter, struct sk_buff *skb,
reg->tx_rollover_ind);
rx_val = card->rxbd_rdptr & reg->rx_wrap_mask;
- /* Write the TX ring write pointer in to reg->tx_wrptr */
- if (mwifiex_write_reg(adapter, reg->tx_wrptr,
- card->txbd_wrptr | rx_val)) {
+ /* Write the TX ring write pointer in to reg->tx_wrptr.
+ * The firmware (latest version 15.68.19.p21) of the 88W8897
+ * pcie+usb card seems to crash when getting the TX ready
+ * interrupt but the TX ring write pointer points to an outdated
+ * address, so it's important we do a non-posted write here to
+ * force the completion of the write.
+ */
+ if (mwifiex_write_reg_np(adapter, reg->tx_wrptr,
+ card->txbd_wrptr | rx_val)) {
mwifiex_dbg(adapter, ERROR,
"SEND DATA: failed to write reg->tx_wrptr\n");
ret = -1;
--
2.31.1
From: Zhang Yi <yi.zhang(a)huawei.com>
[ Upstream commit 4df031ff5876d94b48dd9ee486ba5522382a06b2 ]
After commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <=
isize"), i_disksize can always be updated to i_size in ext4_setattr(),
and we can be sure that i_disksize <= i_size while holding the inode
lock; if i_disksize < i_size, there are delalloc writes pending in the
range up to i_size. If the end of the current write is <= i_size,
there's no need to touch i_disksize since writeback will push
i_disksize up to i_size eventually. So switch to checking i_size
instead of i_disksize in ext4_da_write_end() when writing to the end of
the file. We can also remove the ext4_mark_inode_dirty() call because
we defer inode dirtying to generic_write_end() or ext4_da_write_inline_data_end().
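An illustrative userspace sketch (not ext4 code; the names are simplified
stand-ins for the real inode fields) of the resulting rule in
ext4_da_write_end(): i_disksize is only considered for update when the
write ends beyond i_size, otherwise writeback catches i_disksize up later:

  #include <stdbool.h>
  #include <stdio.h>

  static bool write_extends_i_size(long long pos, long long copied,
                                   long long i_size)
  {
          return copied && pos + copied > i_size;
  }

  int main(void)
  {
          /* write ends inside i_size: leave i_disksize to writeback */
          printf("%d\n", write_extends_i_size(4096, 512, 8192));  /* 0 */
          /* write extends the file: i_disksize may need updating */
          printf("%d\n", write_extends_i_size(8192, 512, 8192));  /* 1 */
          return 0;
  }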
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Link: https://lore.kernel.org/r/20210716122024.1105856-2-yi.zhang@huawei.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/ext4/inode.c | 34 ++++++++++++++++++----------------
1 file changed, 18 insertions(+), 16 deletions(-)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index d8de607849df..dca8e3810443 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3084,35 +3084,37 @@ static int ext4_da_write_end(struct file *file,
end = start + copied - 1;
/*
- * generic_write_end() will run mark_inode_dirty() if i_size
- * changes. So let's piggyback the i_disksize mark_inode_dirty
- * into that.
+ * Since we are holding inode lock, we are sure i_disksize <=
+ * i_size. We also know that if i_disksize < i_size, there are
+ * delalloc writes pending in the range upto i_size. If the end of
+ * the current write is <= i_size, there's no need to touch
+ * i_disksize since writeback will push i_disksize upto i_size
+ * eventually. If the end of the current write is > i_size and
+ * inside an allocated block (ext4_da_should_update_i_disksize()
+ * check), we need to update i_disksize here as neither
+ * ext4_writepage() nor certain ext4_writepages() paths not
+ * allocating blocks update i_disksize.
+ *
+ * Note that we defer inode dirtying to generic_write_end() /
+ * ext4_da_write_inline_data_end().
*/
new_i_size = pos + copied;
- if (copied && new_i_size > EXT4_I(inode)->i_disksize) {
+ if (copied && new_i_size > inode->i_size) {
if (ext4_has_inline_data(inode) ||
- ext4_da_should_update_i_disksize(page, end)) {
+ ext4_da_should_update_i_disksize(page, end))
ext4_update_i_disksize(inode, new_i_size);
- /* We need to mark inode dirty even if
- * new_i_size is less that inode->i_size
- * bu greater than i_disksize.(hint delalloc)
- */
- ret = ext4_mark_inode_dirty(handle, inode);
- }
}
if (write_mode != CONVERT_INLINE_DATA &&
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA) &&
ext4_has_inline_data(inode))
- ret2 = ext4_da_write_inline_data_end(inode, pos, len, copied,
+ ret = ext4_da_write_inline_data_end(inode, pos, len, copied,
page);
else
- ret2 = generic_write_end(file, mapping, pos, len, copied,
+ ret = generic_write_end(file, mapping, pos, len, copied,
page, fsdata);
- copied = ret2;
- if (ret2 < 0)
- ret = ret2;
+ copied = ret;
ret2 = ext4_journal_stop(handle);
if (unlikely(ret2 && !ret))
ret = ret2;
--
2.33.0
Currently, Linux probes for X86_BUG_NULL_SEG unconditionally, which
makes it unsafe to migrate in a virtualised environment as the
properties across the migration pool might differ.
Zen3 adds the NullSelectorClearsBase (NSCB) bit to indicate that loading
a NULL segment selector zeroes the base and limit fields in addition to
the attributes. Zen2 also has this behaviour but doesn't have the NSCB
bit.
When virtualised, NSCB might be cleared for migration safety, so we must
not probe for the behaviour. Always honour the NSCB bit in this case, as
the hypervisor is expected to synthesize it for Zen2 guests.
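For reference, the bit consumed here can also be read from userspace; a
small sketch (assumes GCC/Clang on x86 and is only for illustration, not
part of the patch) checking CPUID leaf 0x80000021, EAX bit 6, gated on
the maximum extended leaf just as the kernel does:

  #include <cpuid.h>
  #include <stdio.h>

  int main(void)
  {
          unsigned int eax, ebx, ecx, edx;

          /* CPUID.80000000h:EAX reports the highest extended leaf */
          __get_cpuid(0x80000000, &eax, &ebx, &ecx, &edx);
          if (eax < 0x80000021) {
                  puts("CPUID leaf 0x80000021 not available");
                  return 0;
          }
          __get_cpuid(0x80000021, &eax, &ebx, &ecx, &edx);
          printf("NSCB (NullSelectorClearsBase): %s\n",
                 (eax & (1u << 6)) ? "set" : "clear");
          return 0;
  }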
Signed-off-by: Jane Malalane <jane.malalane(a)citrix.com>
---
CC: <x86(a)kernel.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Ingo Molnar <mingo(a)redhat.com>
CC: Borislav Petkov <bp(a)alien8.de>
CC: "H. Peter Anvin" <hpa(a)zytor.com>
CC: Pu Wen <puwen(a)hygon.cn>
CC: Paolo Bonzini <pbonzini(a)redhat.com>
CC: Sean Christopherson <seanjc(a)google.com>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Andrew Cooper <andrew.cooper3(a)citrix.com>
CC: Yazen Ghannam <Yazen.Ghannam(a)amd.com>
CC: Brijesh Singh <brijesh.singh(a)amd.com>
CC: Huang Rui <ray.huang(a)amd.com>
CC: Andy Lutomirski <luto(a)kernel.org>
CC: Kim Phillips <kim.phillips(a)amd.com>
CC: <stable(a)vger.kernel.org>
---
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/kernel/cpu/amd.c | 23 +++++++++++++++++++++++
arch/x86/kernel/cpu/common.c | 6 ++----
arch/x86/kernel/cpu/cpu.h | 1 +
arch/x86/kernel/cpu/hygon.c | 23 +++++++++++++++++++++++
5 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index d0ce5cfd3ac1..f571e4f6fe83 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -96,7 +96,7 @@
#define X86_FEATURE_SYSCALL32 ( 3*32+14) /* "" syscall in IA32 userspace */
#define X86_FEATURE_SYSENTER32 ( 3*32+15) /* "" sysenter in IA32 userspace */
#define X86_FEATURE_REP_GOOD ( 3*32+16) /* REP microcode works well */
-/* FREE! ( 3*32+17) */
+#define X86_FEATURE_NSCB ( 3*32+17) /* Null Selector Clears Base */
#define X86_FEATURE_LFENCE_RDTSC ( 3*32+18) /* "" LFENCE synchronizes RDTSC */
#define X86_FEATURE_ACC_POWER ( 3*32+19) /* AMD Accumulated Power Mechanism */
#define X86_FEATURE_NOPL ( 3*32+20) /* The NOPL (0F 1F) instructions */
diff --git a/arch/x86/kernel/cpu/amd.c b/arch/x86/kernel/cpu/amd.c
index 2131af9f2fa2..73c4863fe0f4 100644
--- a/arch/x86/kernel/cpu/amd.c
+++ b/arch/x86/kernel/cpu/amd.c
@@ -650,6 +650,29 @@ static void early_init_amd(struct cpuinfo_x86 *c)
if (c->x86_power & BIT(14))
set_cpu_cap(c, X86_FEATURE_RAPL);
+ /*
+ * Zen1 and earlier CPUs don't clear segment base/limits when
+ * loading a NULL selector. This has been designated
+ * X86_BUG_NULL_SEG.
+ *
+ * Zen3 CPUs advertise Null Selector Clears Base in CPUID.
+ * Zen2 CPUs also have this behaviour, but no CPUID bit.
+ *
+ * A hypervisor may synthesize the bit, but may also hide it
+ * for migration safety, so we must not probe for model
+ * specific behaviour when virtualised.
+ */
+ if (c->extended_cpuid_level >= 0x80000021 &&
+ cpuid_eax(0x80000021) & BIT(6))
+ set_cpu_cap(c, X86_FEATURE_NSCB);
+
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_NSCB) &&
+ c->x86 == 0x17)
+ detect_null_seg_behavior(c);
+
+ if (!cpu_has(c, X86_FEATURE_NSCB))
+ set_cpu_bug(c, X86_BUG_NULL_SEG);
+
#ifdef CONFIG_X86_64
set_cpu_cap(c, X86_FEATURE_SYSCALL32);
#else
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 0f8885949e8c..690337796e61 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1395,7 +1395,7 @@ void __init early_cpu_init(void)
early_identify_cpu(&boot_cpu_data);
}
-static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
+void detect_null_seg_behavior(struct cpuinfo_x86 *c)
{
#ifdef CONFIG_X86_64
/*
@@ -1419,7 +1419,7 @@ static void detect_null_seg_behavior(struct cpuinfo_x86 *c)
loadsegment(fs, 0);
rdmsrl(MSR_FS_BASE, tmp);
if (tmp != 0)
- set_cpu_bug(c, X86_BUG_NULL_SEG);
+ set_cpu_cap(c, X86_FEATURE_NSCB);
wrmsrl(MSR_FS_BASE, old_base);
#endif
}
@@ -1457,8 +1457,6 @@ static void generic_identify(struct cpuinfo_x86 *c)
get_model_name(c); /* Default name */
- detect_null_seg_behavior(c);
-
/*
* ESPFIX is a strange bug. All real CPUs have it. Paravirt
* systems that run Linux at CPL > 0 may or may not have the
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 95521302630d..642f46e0dd67 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -75,6 +75,7 @@ extern int detect_extended_topology_early(struct cpuinfo_x86 *c);
extern int detect_extended_topology(struct cpuinfo_x86 *c);
extern int detect_ht_early(struct cpuinfo_x86 *c);
extern void detect_ht(struct cpuinfo_x86 *c);
+extern void detect_null_seg_behavior(struct cpuinfo_x86 *c);
unsigned int aperfmperf_get_khz(int cpu);
diff --git a/arch/x86/kernel/cpu/hygon.c b/arch/x86/kernel/cpu/hygon.c
index 6d50136f7ab9..765f1556d964 100644
--- a/arch/x86/kernel/cpu/hygon.c
+++ b/arch/x86/kernel/cpu/hygon.c
@@ -264,6 +264,29 @@ static void early_init_hygon(struct cpuinfo_x86 *c)
if (c->x86_power & BIT(14))
set_cpu_cap(c, X86_FEATURE_RAPL);
+ /*
+ * Zen1 and earlier CPUs don't clear segment base/limits when
+ * loading a NULL selector. This has been designated
+ * X86_BUG_NULL_SEG.
+ *
+ * Zen3 CPUs advertise Null Selector Clears Base in CPUID.
+ * Zen2 CPUs also have this behaviour, but no CPUID bit.
+ *
+ * A hypervisor may synthesize the bit, but may also hide it
+ * for migration safety, so we must not probe for model
+ * specific behaviour when virtualised.
+ */
+ if (c->extended_cpuid_level >= 0x80000021 &&
+ cpuid_eax(0x80000021) & BIT(6))
+ set_cpu_cap(c, X86_FEATURE_NSCB);
+
+ if (!cpu_has(c, X86_FEATURE_HYPERVISOR) && !cpu_has(c, X86_FEATURE_NSCB) &&
+ c->x86 == 0x18)
+ detect_null_seg_behavior(c);
+
+ if (!cpu_has(c, X86_FEATURE_NSCB))
+ set_cpu_bug(c, X86_BUG_NULL_SEG);
+
#ifdef CONFIG_X86_64
set_cpu_cap(c, X86_FEATURE_SYSCALL32);
#endif
--
2.11.0