During the High-Speed Isochronous Audio transfers, xHCI
controller on certain AMD platforms experiences momentary data
loss. This results in Missed Service Errors (MSE) being
generated by the xHCI.
The root cause of the MSE is attributed to the ISOC OUT endpoint
being omitted from scheduling. This can happen either when an IN
endpoint with a 64ms service interval is pre-scheduled prior to
the ISOC OUT endpoint or when the interval of the ISOC OUT
endpoint is shorter than that of the IN endpoint. Consequently,
the OUT service is neglected when an IN endpoint with a service
interval exceeding 32ms is scheduled concurrently (every 64ms in
this scenario).
This issue is particularly seen on certain older AMD platforms.
To mitigate this problem, it is recommended to adjust the service
interval of the IN endpoint to not exceed 32ms (interval 8). This
adjustment ensures that the OUT endpoint will not be bypassed,
even if a smaller interval value is utilized.
Cc: stable(a)vger.kernel.org
Signed-off-by: Raju Rangoju <Raju.Rangoju(a)amd.com>
---
Changes since v3:
- Bump up the enum number XHCI_LIMIT_ENDPOINT_INTERVAL_9
Changes since v2:
- added stable tag to backport to all stable kernels
Changes since v1:
- replaced hex values with pci device names
- corrected the commit message
drivers/usb/host/xhci-mem.c | 5 +++++
drivers/usb/host/xhci-pci.c | 25 +++++++++++++++++++++++++
drivers/usb/host/xhci.h | 1 +
3 files changed, 31 insertions(+)
diff --git a/drivers/usb/host/xhci-mem.c b/drivers/usb/host/xhci-mem.c
index 92703efda1f7..d3182ba98788 100644
--- a/drivers/usb/host/xhci-mem.c
+++ b/drivers/usb/host/xhci-mem.c
@@ -1420,6 +1420,11 @@ int xhci_endpoint_init(struct xhci_hcd *xhci,
/* Periodic endpoint bInterval limit quirk */
if (usb_endpoint_xfer_int(&ep->desc) ||
usb_endpoint_xfer_isoc(&ep->desc)) {
+ if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_9) &&
+ usb_endpoint_xfer_int(&ep->desc) &&
+ interval >= 9) {
+ interval = 8;
+ }
if ((xhci->quirks & XHCI_LIMIT_ENDPOINT_INTERVAL_7) &&
udev->speed >= USB_SPEED_HIGH &&
interval >= 7) {
diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 2d1e205c14c6..d23884afdf3f 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -69,12 +69,22 @@
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_4C_XHCI 0x15ec
#define PCI_DEVICE_ID_INTEL_TITAN_RIDGE_DD_XHCI 0x15f0
+#define PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI 0x13ed
+#define PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI 0x13ee
+#define PCI_DEVICE_ID_AMD_STARSHIP_XHCI 0x148c
+#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI 0x15d4
+#define PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI 0x15d5
+#define PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI 0x15e0
+#define PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI 0x15e1
+#define PCI_DEVICE_ID_AMD_RAVEN2_XHCI 0x15e5
#define PCI_DEVICE_ID_AMD_RENOIR_XHCI 0x1639
#define PCI_DEVICE_ID_AMD_PROMONTORYA_4 0x43b9
#define PCI_DEVICE_ID_AMD_PROMONTORYA_3 0x43ba
#define PCI_DEVICE_ID_AMD_PROMONTORYA_2 0x43bb
#define PCI_DEVICE_ID_AMD_PROMONTORYA_1 0x43bc
+#define PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI 0x7316
+
#define PCI_DEVICE_ID_ASMEDIA_1042_XHCI 0x1042
#define PCI_DEVICE_ID_ASMEDIA_1042A_XHCI 0x1142
#define PCI_DEVICE_ID_ASMEDIA_1142_XHCI 0x1242
@@ -278,6 +288,21 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
if (pdev->vendor == PCI_VENDOR_ID_NEC)
xhci->quirks |= XHCI_NEC_HOST;
+ if (pdev->vendor == PCI_VENDOR_ID_AMD &&
+ (pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEC_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_ARIEL_TYPEA_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_STARSHIP_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D4_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_FIREFLIGHT_15D5_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E0_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN_15E1_XHCI ||
+ pdev->device == PCI_DEVICE_ID_AMD_RAVEN2_XHCI))
+ xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9;
+
+ if (pdev->vendor == PCI_VENDOR_ID_ATI &&
+ pdev->device == PCI_DEVICE_ID_ATI_NAVI10_7316_XHCI)
+ xhci->quirks |= XHCI_LIMIT_ENDPOINT_INTERVAL_9;
+
if (pdev->vendor == PCI_VENDOR_ID_AMD && xhci->hci_version == 0x96)
xhci->quirks |= XHCI_AMD_0x96_HOST;
diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h
index 4914f0a10cff..36b77d3c0e7b 100644
--- a/drivers/usb/host/xhci.h
+++ b/drivers/usb/host/xhci.h
@@ -1633,6 +1633,7 @@ struct xhci_hcd {
#define XHCI_WRITE_64_HI_LO BIT_ULL(47)
#define XHCI_CDNS_SCTX_QUIRK BIT_ULL(48)
#define XHCI_ETRON_HOST BIT_ULL(49)
+#define XHCI_LIMIT_ENDPOINT_INTERVAL_9 BIT_ULL(50)
unsigned int num_active_eps;
unsigned int limit_active_eps;
--
2.34.1
From: Kairui Song <kasong(a)tencent.com>
This series fixes the page cache corruption issue reported by Christian
Theune [1]. The issue was reported affects kernels back to 5.19.
Current maintained effected branches includes 6.1 and 6.6 and the fix
was included in 6.10 already.
This series can be applied for both 6.1 and 6.6.
Patch 3/3 is the fixing patch. It was initially submitted and merge as
an optimization but found to have fixed the corruption by handling race
correctly.
Patch 1/3 and 2/3 is required for 3/3.
Patch 3/3 included some unit test code, making the LOC of the backport a
bit higher, but should be OK to be kept, since they are just test code.
Note there seems still some unresolved problem in Link [1] but that
should be a different issue, and the commits being backported have been
well tested, they fix the corruption issue just fine.
Link: https://lore.kernel.org/linux-mm/A5A976CB-DB57-4513-A700-656580488AB6@flyin… [1]
Kairui Song (3):
mm/filemap: return early if failed to allocate memory for split
lib/xarray: introduce a new helper xas_get_order
mm/filemap: optimize filemap folio adding
include/linux/xarray.h | 6 +++
lib/test_xarray.c | 93 ++++++++++++++++++++++++++++++++++++++++++
lib/xarray.c | 49 ++++++++++++++--------
mm/filemap.c | 50 ++++++++++++++++++-----
4 files changed, 169 insertions(+), 29 deletions(-)
--
2.46.1
Commit 60e3318e3e900 ("cifs: use fs_context for automounts") was
released in v6.1.54 and broke the failover when one of the servers
inside DFS becomes unavailable. We reproduced the problem on the EC2
instances of different types. Reverting aforementioned commint on top of
the latest stable verison v6.1.94 helps to resolve the problem.
Earliest working version is v6.2-rc1. There were two big merges of CIFS fixes:
[1] and [2]. We would like to ask for the help to investigate this problem and
if some of those patches need to be backported. Also, is it safe to just revert
problematic commit until proper fixes/backports will be available?
We will help to do testing and confirm if fix works, but let me also list the
steps we used to reproduce the problem if it will help to identify the problem:
1. Create Active Directory domain eg. 'corp.fsxtest.local' in AWS Directory
Service with:
- three AWS FSX file systems filesystem1..filesystem3
- three Windows servers; They have DFS installed as per
https://learn.microsoft.com/en-us/windows-server/storage/dfs-namespaces/dfs…:
- dfs-srv1: EC2AMAZ-2EGTM59
- dfs-srv2: EC2AMAZ-1N36PRD
- dfs-srv3: EC2AMAZ-0PAUH2U
2. Create DFS namespace eg. 'dfs-namespace' in Windows server 2008 mode
and three folders targets in it:
- referral-a mapped to filesystem1.corp.local
- referral-b mapped to filesystem2.corp.local
- referral-c mapped to filesystem3.corp.local
- local folders dfs-srv1..dfs-srv3 in C:\DFSRoots\dfs-namespace of every
Windows server. This helps to quickly define underlying server when
DFS is mounted.
3. Enabled cifs debug logs:
```
echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
echo 7 > /proc/fs/cifs/cifsFYI
```
4. Mount DFS namespace on Amazon Linux 2023 instance running any vanilla
kernel v6.1.54+:
```
dmesg -c &>/dev/null
cd /mnt
mount -t cifs -o cred=/mnt/creds,echo_interval=5 \
//corp.fsxtest.local/dfs-namespace \
./dfs-namespace
```
5. List DFS root, it's also required to avoid recursive mounts that happen
during regular 'ls' run:
```
sh -c 'ls dfs-namespace'
dfs-srv2 referral-a referral-b
```
The DFS server is EC2AMAZ-1N36PRD, it's also listed in mount:
```
[root@ip-172-31-2-82 mnt]# mount | grep dfs
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
List files in first folder:
```
sh -c 'ls dfs-namespace/referral-a'
filea.txt.txt
```
6. Shutdown DFS server-2.
List DFS root again, server changed from dfs-srv2 to dfs-srv1 EC2AMAZ-2EGTM59:
```
sh -c 'ls dfs-namespace'
dfs-srv1 referral-a referral-b
```
7. Try to list files in another folder, this causes ls to fail with error:
```
sh -c 'ls dfs-namespace/referral-b'
ls: cannot access 'dfs-namespace/referral-b': No route to host```
Sometimes it's also 'Operation now in progress' error.
mount shows the same output:
```
//corp.fsxtest.local/dfs-namespace on /mnt/dfs-namespace type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.11.26,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
//EC2AMAZ-1N36PRD.corp.fsxtest.local/dfs-namespace/referral-a on /mnt/dfs-namespace/referral-a type cifs (rw,relatime,vers=3.1.1,cache=strict,username=Admin,domain=corp.fsxtest.local,uid=0,noforceuid,gid=0,noforcegid,addr=172.31.12.80,file_mode=0755,dir_mode=0755,soft,nounix,mapposix,rsize=4194304,wsize=4194304,bsize=1048576,echo_interval=5,actimeo=1,closetimeo=1)
```
I also attached kernel debug logs from this test.
[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?…
Reported-by: Andrei Paniakin <apanyaki(a)amazon.com>
Bisected-by: Simba Bonga <simbarb(a)amazon.com>
---
#regzbot introduced: v6.1.54..v6.2-rc1
From: Ard Biesheuvel <ardb(a)kernel.org>
GCC and Clang both implement stack protector support based on Thread
Local Storage (TLS) variables, and this is used in the kernel to
implement per-task stack cookies, by copying a task's stack cookie into
a per-CPU variable every time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits
the TLS variable to be specified directly. This is useful because it
will allow us to move away from using a fixed offset of 40 bytes into
the per-CPU area on x86_64, which requires a lot of special handling in
the per-CPU code and the runtime relocation code.
However, while GCC is rather lax in its implementation of this command
line option, Clang actually requires that the provided symbol name
refers to a TLS variable (i.e., one declared with __thread), although it
also permits the variable to be undeclared entirely, in which case it
will use an implicit declaration of the right type.
The upshot of this is that Clang will emit the correct references to the
stack cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in
the same compilation unit (which amounts to the whole of vmlinux if LTO
is enabled), it will drop the per-CPU prefix and emit a load from a
bogus address.
Work around this by using a symbol name that never occurs in C code, and
emit it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Cc: <stable(a)vger.kernel.org>
Cc: Fangrui Song <i(a)maskray.me>
Cc: Uros Bizjak <ubizjak(a)gmail.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Andy Lutomirski <luto(a)kernel.org>
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
---
arch/x86/Makefile | 5 +++--
arch/x86/entry/entry.S | 16 ++++++++++++++++
arch/x86/include/asm/asm-prototypes.h | 3 +++
arch/x86/kernel/cpu/common.c | 2 ++
arch/x86/kernel/vmlinux.lds.S | 3 +++
5 files changed, 27 insertions(+), 2 deletions(-)
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 8f41ab219cf1..9d42bd15e06c 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2091,8 +2091,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index 410546bacc0f..d61c3584f3e6 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -468,6 +468,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
--
2.47.0