Fix an issue with the Tyan Tomcat IV S1564D system, the BIOS of which
does not assign PCI buses beyond #2, where our resource reallocation
code preserves the reset default of an I/O BAR assignment outside its
upstream PCI-to-PCI bridge's I/O forwarding range for device 06:08.0 in
this log:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.0: BAR 4: assigned [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 3 try to assign unassigned res
pci 0000:00:11.0: resource 7 [io 0xe000-0xefff] released
[...]
pci 0000:06:08.1: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [io 0x2000-0x2fff]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0x1000-0x2fff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: resource 4 [io 0x0000-0xffff]
pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:01: resource 1 [mem 0xd8000000-0xdfffffff]
pci_bus 0000:01: resource 2 [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:02: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:02: resource 1 [mem 0xd8000000-0xd8bfffff]
pci_bus 0000:04: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:04: resource 1 [mem 0xd8600000-0xd8afffff]
pci_bus 0000:05: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:05: resource 1 [mem 0xd8000000-0xd85fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:06: resource 1 [mem 0xd8000000-0xd85fffff]
-- note that the assignment of 0xfce0-0xfcff is outside the range of
0x2000-0x2fff assigned to bus #6:
05:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: d8000000-d85fffff
Capabilities: [50] Power Management version 2
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
Capabilities: [80] #0d [0000]
Capabilities: [90] Express PCI/PCI-X Bridge IRQ 0
06:08.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at fce0 [size=32]
Capabilities: [80] Power Management version 2
06:08.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at 2000 [size=32]
Capabilities: [80] Power Management version 2
Since both 06:08.0 and 06:08.1 have the same reset defaults the latter
device escapes its fate and gets a good assignment owing to an address
conflict with the former device.
Consequently when the device driver tries to access 06:08.0 according to
its designated address range it pokes at an unassigned I/O location,
likely subtractively decoded by the southbridge and forwarded to ISA,
causing the driver to become confused and bail out:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host controller halted, very bad!
uhci_hcd 0000:06:08.0: HCRESET not completed yet!
uhci_hcd 0000:06:08.0: HC died; cleaning up
if good luck happens or if bad luck does, an infinite flood of messages:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
[...]
making the system virtually unusuable.
This is because we have code to deal with a situation from PR #16263,
where broken ACPI firmware reports the wrong address range for the host
bridge's decoding window and trying to adjust to the window causes more
breakage than leaving the BIOS assignments intact.
This may work for a device directly on the root bus decoded by the host
bridge only, but for a device behind one or more PCI-to-PCI (or CardBus)
bridges those bridges' forwarding windows have been standardised and
need to be respected, or leaving whatever has been there in a downstream
device's BAR will have no effect as cycles for the addresses recorded
there will have no chance to appear on the bus the device has been
immediately attached to.
Make sure then for a device behind a PCI-to-PCI bridge that any firmware
assignment is within the bridge's relevant forwarding window or do not
restore the assignment, fixing the system concerned as follows:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 3 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:06:08.1: BAR 4: assigned [io 0x2020-0x203f]
and making device 06:08.0 work correctly.
Cf. <https://bugzilla.kernel.org/show_bug.cgi?id=16263>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 58c84eda0756 ("PCI: fall back to original BIOS BAR addresses")
Cc: stable(a)vger.kernel.org # v2.6.35+
---
Hi,
Resending this patch as it has gone into void. Patch re-verified against
5.17-rc2.
For the record the system's bus topology is as follows:
-[0000:00]-+-00.0
+-07.0
+-07.1
+-07.2
+-11.0-[0000:01-06]----00.0-[0000:02-06]--+-00.0-[0000:03]--
| +-01.0-[0000:04]--+-00.0
| | \-00.3
| \-02.0-[0000:05-06]----00.0-[0000:06]--+-05.0
| +-08.0
| +-08.1
| \-08.2
+-12.0
+-13.0
\-14.0
Maciej
Changes from v1:
- Do restore firmware BAR assignments behind a PCI-PCI bridge, but only if
within the bridge's forwarding window.
- Update the change description and heading accordingly (was: PCI: Do not
restore firmware BAR assignments behind a PCI-PCI bridge).
---
drivers/pci/setup-res.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
linux-pci-setup-res-fw-address-nobridge.diff
Index: linux-macro/drivers/pci/setup-res.c
===================================================================
--- linux-macro.orig/drivers/pci/setup-res.c
+++ linux-macro/drivers/pci/setup-res.c
@@ -212,9 +212,19 @@ static int pci_revert_fw_address(struct
res->end = res->start + size - 1;
res->flags &= ~IORESOURCE_UNSET;
+ /*
+ * If we're behind a P2P or CardBus bridge, make sure we're
+ * inside the relevant forwarding window, or otherwise the
+ * assignment must have been bogus and accesses intended for
+ * the range assigned would not reach the device anyway.
+ * On the root bus accept anything under the assumption the
+ * host bridge will let it through.
+ */
root = pci_find_parent_resource(dev, res);
if (!root) {
- if (res->flags & IORESOURCE_IO)
+ if (dev->bus->parent)
+ return -ENXIO;
+ else if (res->flags & IORESOURCE_IO)
root = &ioport_resource;
else
root = &iomem_resource;
From: Xiu Jianfeng <xiujianfeng(a)huawei.com>
[ Upstream commit 51dd64bb99e4478fc5280171acd8e1b529eadaf7 ]
This reverts commit ccf11dbaa07b328fa469415c362d33459c140a37.
Commit ccf11dbaa07b ("evm: Fix memleak in init_desc") said there is
memleak in init_desc. That may be incorrect, as we can see, tmp_tfm is
saved in one of the two global variables hmac_tfm or evm_tfm[hash_algo],
then if init_desc is called next time, there is no need to alloc tfm
again, so in the error path of kmalloc desc or crypto_shash_init(desc),
It is not a problem without freeing tmp_tfm.
And also that commit did not reset the global variable to NULL after
freeing tmp_tfm and this makes *tfm a dangling pointer which may cause a
UAF issue.
Reported-by: Guozihua (Scott) <guozihua(a)huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng(a)huawei.com>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
security/integrity/evm/evm_crypto.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)
diff --git a/security/integrity/evm/evm_crypto.c b/security/integrity/evm/evm_crypto.c
index 25dac691491b..ee6bd945f3d6 100644
--- a/security/integrity/evm/evm_crypto.c
+++ b/security/integrity/evm/evm_crypto.c
@@ -75,7 +75,7 @@ static struct shash_desc *init_desc(char type, uint8_t hash_algo)
{
long rc;
const char *algo;
- struct crypto_shash **tfm, *tmp_tfm = NULL;
+ struct crypto_shash **tfm, *tmp_tfm;
struct shash_desc *desc;
if (type == EVM_XATTR_HMAC) {
@@ -120,16 +120,13 @@ static struct shash_desc *init_desc(char type, uint8_t hash_algo)
alloc:
desc = kmalloc(sizeof(*desc) + crypto_shash_descsize(*tfm),
GFP_KERNEL);
- if (!desc) {
- crypto_free_shash(tmp_tfm);
+ if (!desc)
return ERR_PTR(-ENOMEM);
- }
desc->tfm = *tfm;
rc = crypto_shash_init(desc);
if (rc) {
- crypto_free_shash(tmp_tfm);
kfree(desc);
return ERR_PTR(rc);
}
--
2.35.1
Fix Syzbot bug: kernel BUG in ntfs_lookup_inode_by_name
https://syzkaller.appspot.com/bug?id=32cf53b48c1846ffc25a185a2e92e170d1a95d…
Check whether $Extend is a directory or not( for NTFS3.0+) while
loading system files. If it isn't(as in the case of this bug where the
mft record for $Extend contains a regular file), load_system_files()
returns false.
Reported-by: syzbot+30b7f850c6d98ea461d2(a)syzkaller.appspotmail.com
CC: stable(a)vger.kernel.org # 4.9+
Signed-off-by: Soumya Negi <soumya.negi97(a)gmail.com>
---
Changes since v1:
* Added CC tag for stable
* Formatted changelog to fit within 72 cols
---
fs/ntfs/super.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs/super.c b/fs/ntfs/super.c
index 5ae8de09b271..18e2902531f9 100644
--- a/fs/ntfs/super.c
+++ b/fs/ntfs/super.c
@@ -2092,10 +2092,15 @@ static bool load_system_files(ntfs_volume *vol)
// TODO: Initialize security.
/* Get the extended system files' directory inode. */
vol->extend_ino = ntfs_iget(sb, FILE_Extend);
- if (IS_ERR(vol->extend_ino) || is_bad_inode(vol->extend_ino)) {
+ if (IS_ERR(vol->extend_ino) || is_bad_inode(vol->extend_ino) ||
+ !S_ISDIR(vol->extend_ino->i_mode)) {
+ static const char *es1 = "$Extend is not a directory";
+ static const char *es2 = "Failed to load $Extend";
+ const char *es = !S_ISDIR(vol->extend_ino->i_mode) ? es1 : es2;
+
if (!IS_ERR(vol->extend_ino))
iput(vol->extend_ino);
- ntfs_error(sb, "Failed to load $Extend.");
+ ntfs_error(sb, "%s.", es);
goto iput_sec_err_out;
}
#ifdef NTFS_RW
--
2.17.1
Older CPUs beyond its Servicing period are not listed in the affected
processor list for MMIO Stale Data vulnerabilities. These CPUs currently
report "Not affected" in sysfs, which may not be correct.
Add support for "Unknown" reporting for such CPUs. Mitigation is not
deployed when the status is "Unknown".
"CPU is beyond its Servicing period" means these CPUs are beyond their
Servicing [1] period and have reached End of Servicing Updates (ESU) [2].
[1] Servicing: The process of providing functional and security
updates to Intel processors or platforms, utilizing the Intel Platform
Update (IPU) process or other similar mechanisms.
[2] End of Servicing Updates (ESU): ESU is the date at which Intel
will no longer provide Servicing, such as through IPU or other similar
update processes. ESU dates will typically be aligned to end of
quarter.
Suggested-by: Andrew Cooper <andrew.cooper3(a)citrix.com>
Suggested-by: Tony Luck <tony.luck(a)intel.com>
Fixes: 8d50cdf8b834 ("x86/speculation/mmio: Add sysfs reporting for Processor MMIO Stale Data")
Cc: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
CPU vulnerability is unknown if, hardware doesn't set the immunity bits
and CPU is not in the known-affected-list.
In order to report the unknown status, this patch sets the MMIO bug
for all Intel CPUs that don't have the hardware immunity bits set.
Based on the known-affected-list of CPUs, mitigation selection then
deploys the mitigation or sets the "Unknown" status; which is ugly.
I will appreciate suggestions to improve this.
Thanks,
Pawan
.../hw-vuln/processor_mmio_stale_data.rst | 3 +++
arch/x86/kernel/cpu/bugs.c | 11 +++++++-
arch/x86/kernel/cpu/common.c | 26 +++++++++++++------
arch/x86/kernel/cpu/cpu.h | 1 +
4 files changed, 32 insertions(+), 9 deletions(-)
diff --git a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
index 9393c50b5afc..55524e0798da 100644
--- a/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
+++ b/Documentation/admin-guide/hw-vuln/processor_mmio_stale_data.rst
@@ -230,6 +230,9 @@ The possible values in this file are:
* - 'Mitigation: Clear CPU buffers'
- The processor is vulnerable and the CPU buffer clearing mitigation is
enabled.
+ * - 'Unknown: CPU is beyond its Servicing period'
+ - The processor vulnerability status is unknown because it is
+ out of Servicing period. Mitigation is not attempted.
If the processor is vulnerable then the following information is appended to
the above information:
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 0dd04713434b..dd6e78d370bc 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -416,6 +416,7 @@ enum mmio_mitigations {
MMIO_MITIGATION_OFF,
MMIO_MITIGATION_UCODE_NEEDED,
MMIO_MITIGATION_VERW,
+ MMIO_MITIGATION_UNKNOWN,
};
/* Default mitigation for Processor MMIO Stale Data vulnerabilities */
@@ -426,12 +427,18 @@ static const char * const mmio_strings[] = {
[MMIO_MITIGATION_OFF] = "Vulnerable",
[MMIO_MITIGATION_UCODE_NEEDED] = "Vulnerable: Clear CPU buffers attempted, no microcode",
[MMIO_MITIGATION_VERW] = "Mitigation: Clear CPU buffers",
+ [MMIO_MITIGATION_UNKNOWN] = "Unknown: CPU is beyond its servicing period",
};
static void __init mmio_select_mitigation(void)
{
u64 ia32_cap;
+ if (mmio_stale_data_unknown()) {
+ mmio_mitigation = MMIO_MITIGATION_UNKNOWN;
+ return;
+ }
+
if (!boot_cpu_has_bug(X86_BUG_MMIO_STALE_DATA) ||
cpu_mitigations_off()) {
mmio_mitigation = MMIO_MITIGATION_OFF;
@@ -1638,6 +1645,7 @@ void cpu_bugs_smt_update(void)
pr_warn_once(MMIO_MSG_SMT);
break;
case MMIO_MITIGATION_OFF:
+ case MMIO_MITIGATION_UNKNOWN:
break;
}
@@ -2235,7 +2243,8 @@ static ssize_t tsx_async_abort_show_state(char *buf)
static ssize_t mmio_stale_data_show_state(char *buf)
{
- if (mmio_mitigation == MMIO_MITIGATION_OFF)
+ if (mmio_mitigation == MMIO_MITIGATION_OFF ||
+ mmio_mitigation == MMIO_MITIGATION_UNKNOWN)
return sysfs_emit(buf, "%s\n", mmio_strings[mmio_mitigation]);
if (boot_cpu_has(X86_FEATURE_HYPERVISOR)) {
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 736262a76a12..82088410870e 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1286,6 +1286,22 @@ static bool arch_cap_mmio_immune(u64 ia32_cap)
ia32_cap & ARCH_CAP_SBDR_SSDP_NO);
}
+bool __init mmio_stale_data_unknown(void)
+{
+ u64 ia32_cap = x86_read_arch_cap_msr();
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_INTEL)
+ return false;
+ /*
+ * CPU vulnerability is unknown when, hardware doesn't set the
+ * immunity bits and CPU is not in the known affected list.
+ */
+ if (!cpu_matches(cpu_vuln_blacklist, MMIO) &&
+ !arch_cap_mmio_immune(ia32_cap))
+ return true;
+ return false;
+}
+
static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
{
u64 ia32_cap = x86_read_arch_cap_msr();
@@ -1349,14 +1365,8 @@ static void __init cpu_set_bug_bits(struct cpuinfo_x86 *c)
cpu_matches(cpu_vuln_blacklist, SRBDS | MMIO_SBDS))
setup_force_cpu_bug(X86_BUG_SRBDS);
- /*
- * Processor MMIO Stale Data bug enumeration
- *
- * Affected CPU list is generally enough to enumerate the vulnerability,
- * but for virtualization case check for ARCH_CAP MSR bits also, VMM may
- * not want the guest to enumerate the bug.
- */
- if (cpu_matches(cpu_vuln_blacklist, MMIO) &&
+ /* Processor MMIO Stale Data bug enumeration */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
!arch_cap_mmio_immune(ia32_cap))
setup_force_cpu_bug(X86_BUG_MMIO_STALE_DATA);
diff --git a/arch/x86/kernel/cpu/cpu.h b/arch/x86/kernel/cpu/cpu.h
index 7c9b5893c30a..a2dbfc1bbc49 100644
--- a/arch/x86/kernel/cpu/cpu.h
+++ b/arch/x86/kernel/cpu/cpu.h
@@ -82,6 +82,7 @@ unsigned int aperfmperf_get_khz(int cpu);
extern void x86_spec_ctrl_setup_ap(void);
extern void update_srbds_msr(void);
+extern bool mmio_stale_data_unknown(void);
extern u64 x86_read_arch_cap_msr(void);
base-commit: 4a57a8400075bc5287c5c877702c68aeae2a033d
--
2.35.3
The commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it")
incorrectly failed to account for the case in init_cache_modes() when
CPUs do support PAT and falsely reported PAT to be disabled when in
fact PAT is enabled. In some environments, notably in Xen PV domains,
MTRR is disabled but PAT is still enabled, and that is the case
that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does
not correctly report that PAT is enabled in such environments. The fix
is implemented in init_cache_modes() by setting pat_bp_enabled to true
in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf
("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed
to account for.
This approach arranges for pat_enabled() to return true in the Xen PV
environment without undermining the rest of PAT MSR management logic
that considers PAT to be disabled: Specifically, no writes to the PAT
MSR should occur.
This patch fixes a regression that some users are experiencing with
Linux as a Xen Dom0 driving particular Intel graphics devices by
correctly reporting to the Intel i915 driver that PAT is enabled where
previously it was falsely reporting that PAT is disabled. Some users
are experiencing system hangs in Xen PV Dom0 and all users on Xen PV
Dom0 are experiencing reduced graphics performance because the keying of
the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc())
means that in particular graphics frame buffer accesses are quite a bit
less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be
disabled if the administrator sets the "nopat" boot option. Introduce
a new boolean variable, pat_force_disable, to forcibly disable PAT
when the administrator sets the "nopat" option to override the default
behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs
moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
Co-developed-by: Jan Beulich <jbeulich(a)suse.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Chuck Zmudzinski <brchuckz(a)aol.com>
---
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich)
*Add the necessary code to incorporate the "nopat" fix
*void init_cache_modes(void) -> void __init init_cache_modes(void)
*Add Jan Beulich as Co-developer (Jan has not signed off yet)
*Expand the commit message to include relevant parts of the commit
message of Jan Beulich's proposed patch for this problem
*Fix 'else if ... {' placement and indentation
*Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to
include Jan's idea for fixing "nopat" that was missing from the first
version of the patch.
The patch has been tested, it works as expected with and without nopat
in the Xen PV Dom0 environment. That is, "nopat" causes the system to
exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c
index d5ef64ddd35e..10a37d309d23 100644
--- a/arch/x86/mm/pat/memtype.c
+++ b/arch/x86/mm/pat/memtype.c
@@ -62,6 +62,7 @@
static bool __read_mostly pat_bp_initialized;
static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT);
+static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT);
static bool __read_mostly pat_bp_enabled;
static bool __read_mostly pat_cm_initialized;
@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason)
static int __init nopat(char *str)
{
pat_disable("PAT support disabled via boot option.");
+ pat_force_disabled = true;
return 0;
}
early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat)
wrmsrl(MSR_IA32_CR_PAT, pat);
}
-void init_cache_modes(void)
+void __init init_cache_modes(void)
{
u64 pat = 0;
@@ -292,7 +294,7 @@ void init_cache_modes(void)
rdmsrl(MSR_IA32_CR_PAT, pat);
}
- if (!pat) {
+ if (!pat || pat_force_disabled) {
/*
* No PAT. Emulate the PAT table that corresponds to the two
* cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void)
*/
pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) |
PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
+ } else if (!pat_bp_enabled) {
+ /*
+ * In some environments, specifically Xen PV, PAT
+ * initialization is skipped because MTRRs are disabled even
+ * though PAT is available. In such environments, set PAT to
+ * enabled to correctly indicate to callers of pat_enabled()
+ * that CPU support for PAT is available.
+ */
+ pat_bp_enabled = true;
+ pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
}
__init_cache_modes(pat);
--
2.36.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4d8f24eeedc58d5f87b650ddda73c16e8ba56559 Mon Sep 17 00:00:00 2001
From: Wei Wang <weiwan(a)google.com>
Date: Thu, 21 Jul 2022 20:44:04 +0000
Subject: [PATCH] Revert "tcp: change pingpong threshold to 3"
This reverts commit 4a41f453bedfd5e9cd040bad509d9da49feb3e2c.
This to-be-reverted commit was meant to apply a stricter rule for the
stack to enter pingpong mode. However, the condition used to check for
interactive session "before(tp->lsndtime, icsk->icsk_ack.lrcvtime)" is
jiffy based and might be too coarse, which delays the stack entering
pingpong mode.
We revert this patch so that we no longer use the above condition to
determine interactive session, and also reduce pingpong threshold to 1.
Fixes: 4a41f453bedf ("tcp: change pingpong threshold to 3")
Reported-by: LemmyHuang <hlm3280(a)163.com>
Suggested-by: Neal Cardwell <ncardwell(a)google.com>
Signed-off-by: Wei Wang <weiwan(a)google.com>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Reviewed-by: Eric Dumazet <edumazet(a)google.com>
Link: https://lore.kernel.org/r/20220721204404.388396-1-weiwan@google.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/include/net/inet_connection_sock.h b/include/net/inet_connection_sock.h
index 85cd695e7fd1..ee88f0f1350f 100644
--- a/include/net/inet_connection_sock.h
+++ b/include/net/inet_connection_sock.h
@@ -321,7 +321,7 @@ void inet_csk_update_fastreuse(struct inet_bind_bucket *tb,
struct dst_entry *inet_csk_update_pmtu(struct sock *sk, u32 mtu);
-#define TCP_PINGPONG_THRESH 3
+#define TCP_PINGPONG_THRESH 1
static inline void inet_csk_enter_pingpong_mode(struct sock *sk)
{
@@ -338,14 +338,6 @@ static inline bool inet_csk_in_pingpong_mode(struct sock *sk)
return inet_csk(sk)->icsk_ack.pingpong >= TCP_PINGPONG_THRESH;
}
-static inline void inet_csk_inc_pingpong_cnt(struct sock *sk)
-{
- struct inet_connection_sock *icsk = inet_csk(sk);
-
- if (icsk->icsk_ack.pingpong < U8_MAX)
- icsk->icsk_ack.pingpong++;
-}
-
static inline bool inet_csk_has_ulp(struct sock *sk)
{
return inet_sk(sk)->is_icsk && !!inet_csk(sk)->icsk_ulp_ops;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index cf6713c9567e..2efe41c84ee8 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -167,16 +167,13 @@ static void tcp_event_data_sent(struct tcp_sock *tp,
if (tcp_packets_in_flight(tp) == 0)
tcp_ca_event(sk, CA_EVENT_TX_START);
- /* If this is the first data packet sent in response to the
- * previous received data,
- * and it is a reply for ato after last received packet,
- * increase pingpong count.
- */
- if (before(tp->lsndtime, icsk->icsk_ack.lrcvtime) &&
- (u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
- inet_csk_inc_pingpong_cnt(sk);
-
tp->lsndtime = now;
+
+ /* If it is a reply for ato after last received
+ * packet, enter pingpong mode.
+ */
+ if ((u32)(now - icsk->icsk_ack.lrcvtime) < icsk->icsk_ack.ato)
+ inet_csk_enter_pingpong_mode(sk);
}
/* Account for an ACK we sent. */
Today PAT can't be used without MTRR being available, unless MTRR is at
least configured via CONFIG_MTRR and the system is running as Xen PV
guest. In this case PAT is automatically available via the hypervisor,
but the PAT MSR can't be modified by the kernel and MTRR is disabled.
As an additional complexity the availability of PAT can't be queried
via pat_enabled() in the Xen PV case, as the lack of MTRR will set PAT
to be disabled. This leads to some drivers believing that not all cache
modes are available, resulting in failures or degraded functionality.
The same applies to a kernel built with no MTRR support: it won't
allow to use the PAT MSR, even if there is no technical reason for
that, other than setting up PAT on all cpus the same way (which is a
requirement of the processor's cache management) is relying on some
MTRR specific code.
Fix all of that by:
- moving the function needed by PAT from MTRR specific code one level
up
- adding a PAT indirection layer supporting the 3 cases "no or disabled
PAT", "PAT under kernel control", and "PAT under Xen control"
- removing the dependency of PAT on MTRR
Juergen Gross (3):
x86: move some code out of arch/x86/kernel/cpu/mtrr
x86: add wrapper functions for mtrr functions handling also pat
x86: decouple pat and mtrr handling
arch/x86/include/asm/memtype.h | 13 ++-
arch/x86/include/asm/mtrr.h | 27 ++++--
arch/x86/include/asm/processor.h | 10 +++
arch/x86/kernel/cpu/common.c | 123 +++++++++++++++++++++++++++-
arch/x86/kernel/cpu/mtrr/generic.c | 90 ++------------------
arch/x86/kernel/cpu/mtrr/mtrr.c | 58 ++++---------
arch/x86/kernel/cpu/mtrr/mtrr.h | 1 -
arch/x86/kernel/setup.c | 12 +--
arch/x86/kernel/smpboot.c | 8 +-
arch/x86/mm/pat/memtype.c | 127 +++++++++++++++++++++--------
arch/x86/power/cpu.c | 2 +-
arch/x86/xen/enlighten_pv.c | 4 +
12 files changed, 289 insertions(+), 186 deletions(-)
--
2.35.3