I'm announcing the release of the 5.4.101 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 1
drivers/hid/hid-core.c | 6 +-
drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 1
drivers/net/usb/qmi_wwan.c | 1
drivers/usb/core/quirks.c | 9 ++-
fs/cifs/connect.c | 1
fs/dax.c | 10 +--
fs/ntfs/inode.c | 6 ++
include/linux/mm.h | 8 +-
kernel/bpf/verifier.c | 10 ++-
mm/memory.c | 57 +++++++++++----------
scripts/Makefile | 9 ++-
scripts/recordmcount.pl | 6 +-
virt/kvm/kvm_main.c | 17 ++++--
15 files changed, 92 insertions(+), 52 deletions(-)
Christoph Hellwig (2):
mm: unexport follow_pte_pmd
mm: simplify follow_pte{,pmd}
Christoph Schemmel (1):
NET: usb: qmi_wwan: Adding support for Cinterion MV31
Daniel Borkmann (1):
bpf: Fix truncation handling for mod32 dst reg wrt zero
Greg Kroah-Hartman (1):
Linux 5.4.101
Johan Hovold (1):
USB: quirks: sort quirk entries
Paolo Bonzini (2):
KVM: do not assume PTE is writable after follow_pfn
mm: provide a saner PTE walking API for modules
Raju Rangoju (1):
cxgb4: Add new T6 PCI device id 0x6092
Rolf Eike Beer (2):
scripts: use pkg-config to locate libcrypto
scripts: set proper OpenSSL include dir also for sign-file
Rong Chen (1):
scripts/recordmcount.pl: support big endian for ARCH sh
Rustam Kovhaev (1):
ntfs: check for valid standard information attribute
Sameer Pujar (1):
arm64: tegra: Add power-domain for Tegra210 HDA
Sean Christopherson (1):
KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
Shyam Prasad N (1):
cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.
Stefan Ursella (1):
usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable
Will McVicker (1):
HID: make arrays usage and value to be the same
This is the start of the stable review cycle for the 5.4.101 release.
There are 17 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 27 Feb 2021 09:25:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.101-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.101-rc1
Rong Chen <rong.a.chen(a)intel.com>
scripts/recordmcount.pl: support big endian for ARCH sh
Shyam Prasad N <sprasad(a)microsoft.com>
cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting cifs_sb->prepath.
Raju Rangoju <rajur(a)chelsio.com>
cxgb4: Add new T6 PCI device id 0x6092
Christoph Schemmel <christoph.schemmel(a)gmail.com>
NET: usb: qmi_wwan: Adding support for Cinterion MV31
Sean Christopherson <seanjc(a)google.com>
KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
Paolo Bonzini <pbonzini(a)redhat.com>
mm: provide a saner PTE walking API for modules
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: do not assume PTE is writable after follow_pfn
Christoph Hellwig <hch(a)lst.de>
mm: simplify follow_pte{,pmd}
Christoph Hellwig <hch(a)lst.de>
mm: unexport follow_pte_pmd
Rolf Eike Beer <eb(a)emlix.com>
scripts: set proper OpenSSL include dir also for sign-file
Rolf Eike Beer <eb(a)emlix.com>
scripts: use pkg-config to locate libcrypto
Sameer Pujar <spujar(a)nvidia.com>
arm64: tegra: Add power-domain for Tegra210 HDA
Rustam Kovhaev <rkovhaev(a)gmail.com>
ntfs: check for valid standard information attribute
Stefan Ursella <stefan.ursella(a)wolfvision.net>
usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable
Johan Hovold <johan(a)kernel.org>
USB: quirks: sort quirk entries
Will McVicker <willmcvicker(a)google.com>
HID: make arrays usage and value to be the same
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix truncation handling for mod32 dst reg wrt zero
-------------
Diffstat:
Makefile | 4 +-
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 1 +
drivers/hid/hid-core.c | 6 +--
drivers/net/ethernet/chelsio/cxgb4/t4_pci_id_tbl.h | 1 +
drivers/net/usb/qmi_wwan.c | 1 +
drivers/usb/core/quirks.c | 9 ++--
fs/cifs/connect.c | 1 +
fs/dax.c | 10 ++--
fs/ntfs/inode.c | 6 +++
include/linux/mm.h | 8 +--
kernel/bpf/verifier.c | 10 ++--
mm/memory.c | 57 ++++++++++++----------
scripts/Makefile | 9 +++-
scripts/recordmcount.pl | 6 ++-
virt/kvm/kvm_main.c | 17 +++++--
15 files changed, 93 insertions(+), 53 deletions(-)
coprocessor_flush is not a part of fast exception handlers, but it uses
parts of fast coprocessor handling code that's why it's in the same
source file. It uses call0 opcode to invoke those parts so there are no
limitations on their relative location, but the rest of the code calls
coprocessor_flush with call8 and that doesn't work when vectors are
placed in a different gigabyte-aligned area than the rest of the kernel.
Move coprocessor_flush from the .exception.text section to the .text so
that it's reachable from the rest of the kernel with call8.
Cc: stable(a)vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc(a)gmail.com>
---
arch/xtensa/kernel/coprocessor.S | 64 ++++++++++++++++----------------
1 file changed, 33 insertions(+), 31 deletions(-)
diff --git a/arch/xtensa/kernel/coprocessor.S b/arch/xtensa/kernel/coprocessor.S
index c426b846beef..45cc0ae0af6f 100644
--- a/arch/xtensa/kernel/coprocessor.S
+++ b/arch/xtensa/kernel/coprocessor.S
@@ -99,37 +99,6 @@
LOAD_CP_REGS_TAB(6)
LOAD_CP_REGS_TAB(7)
-/*
- * coprocessor_flush(struct thread_info*, index)
- * a2 a3
- *
- * Save coprocessor registers for coprocessor 'index'.
- * The register values are saved to or loaded from the coprocessor area
- * inside the task_info structure.
- *
- * Note that this function doesn't update the coprocessor_owner information!
- *
- */
-
-ENTRY(coprocessor_flush)
-
- /* reserve 4 bytes on stack to save a0 */
- abi_entry(4)
-
- s32i a0, a1, 0
- movi a0, .Lsave_cp_regs_jump_table
- addx8 a3, a3, a0
- l32i a4, a3, 4
- l32i a3, a3, 0
- add a2, a2, a4
- beqz a3, 1f
- callx0 a3
-1: l32i a0, a1, 0
-
- abi_ret(4)
-
-ENDPROC(coprocessor_flush)
-
/*
* Entry condition:
*
@@ -245,6 +214,39 @@ ENTRY(fast_coprocessor)
ENDPROC(fast_coprocessor)
+ .text
+
+/*
+ * coprocessor_flush(struct thread_info*, index)
+ * a2 a3
+ *
+ * Save coprocessor registers for coprocessor 'index'.
+ * The register values are saved to or loaded from the coprocessor area
+ * inside the task_info structure.
+ *
+ * Note that this function doesn't update the coprocessor_owner information!
+ *
+ */
+
+ENTRY(coprocessor_flush)
+
+ /* reserve 4 bytes on stack to save a0 */
+ abi_entry(4)
+
+ s32i a0, a1, 0
+ movi a0, .Lsave_cp_regs_jump_table
+ addx8 a3, a3, a0
+ l32i a4, a3, 4
+ l32i a3, a3, 0
+ add a2, a2, a4
+ beqz a3, 1f
+ callx0 a3
+1: l32i a0, a1, 0
+
+ abi_ret(4)
+
+ENDPROC(coprocessor_flush)
+
.data
ENTRY(coprocessor_owner)
--
2.20.1
This is the start of the stable review cycle for the 5.11.2 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 27 Feb 2021 09:25:06 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.2-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.11.2-rc1
Sean Christopherson <seanjc(a)google.com>
KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
Paolo Bonzini <pbonzini(a)redhat.com>
mm: provide a saner PTE walking API for modules
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: do not assume PTE is writable after follow_pfn
Sean Christopherson <seanjc(a)google.com>
KVM: x86: Zap the oldest MMU pages, not the newest
Thomas Hebb <tommyhebb(a)gmail.com>
hwmon: (dell-smm) Add XPS 15 L502X to fan control blacklist
Sameer Pujar <spujar(a)nvidia.com>
arm64: tegra: Add power-domain for Tegra210 HDA
Hui Wang <hui.wang(a)canonical.com>
Bluetooth: btusb: Some Qualcomm Bluetooth adapters stop working
Rustam Kovhaev <rkovhaev(a)gmail.com>
ntfs: check for valid standard information attribute
Stefan Ursella <stefan.ursella(a)wolfvision.net>
usb: quirks: add quirk to start video capture on ELMO L-12F document camera reliable
Johan Hovold <johan(a)kernel.org>
USB: quirks: sort quirk entries
Will McVicker <willmcvicker(a)google.com>
HID: make arrays usage and value to be the same
Daniel Borkmann <daniel(a)iogearbox.net>
bpf: Fix truncation handling for mod32 dst reg wrt zero
-------------
Diffstat:
Makefile | 4 ++--
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 1 +
arch/s390/pci/pci_mmio.c | 4 ++--
arch/x86/kvm/mmu/mmu.c | 2 +-
drivers/bluetooth/btusb.c | 7 ++++++
drivers/hid/hid-core.c | 6 ++---
drivers/hwmon/dell-smm-hwmon.c | 7 ++++++
drivers/usb/core/quirks.c | 9 ++++---
fs/dax.c | 5 ++--
fs/ntfs/inode.c | 6 +++++
include/linux/mm.h | 6 +++--
kernel/bpf/verifier.c | 10 ++++----
mm/memory.c | 41 ++++++++++++++++++++++++++++----
virt/kvm/kvm_main.c | 17 +++++++++----
14 files changed, 97 insertions(+), 28 deletions(-)
From: NeilBrown <neilb(a)suse.de>
Subject: x86: fix seq_file iteration for pat/memtype.c
The memtype seq_file iterator allocates a buffer in the ->start and ->next
functions and frees it in the ->show function. The preferred handling for
such resources is to free them in the subsequent ->next or ->stop function
call.
Since Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration
code and interface") there is no guarantee that ->show will be called
after ->next, so this function can now leak memory.
So move the freeing of the buffer to ->next and ->stop.
Link: https://lkml.kernel.org/r/161248539022.21478.13874455485854739066.stgit@nob…
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Cc: Xin Long <lucien.xin(a)gmail.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Cc: Neil Horman <nhorman(a)tuxdriver.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Vlad Yasevich <vyasevich(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/x86/mm/pat/memtype.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/arch/x86/mm/pat/memtype.c~x86-fix-seq_file-iteration-for-pat-memtypec
+++ a/arch/x86/mm/pat/memtype.c
@@ -1164,12 +1164,14 @@ static void *memtype_seq_start(struct se
static void *memtype_seq_next(struct seq_file *seq, void *v, loff_t *pos)
{
+ kfree(v);
++*pos;
return memtype_get_idx(*pos);
}
static void memtype_seq_stop(struct seq_file *seq, void *v)
{
+ kfree(v);
}
static int memtype_seq_show(struct seq_file *seq, void *v)
@@ -1181,8 +1183,6 @@ static int memtype_seq_show(struct seq_f
entry_print->end,
cattr_name(entry_print->type));
- kfree(entry_print);
-
return 0;
}
_
From: NeilBrown <neilb(a)suse.de>
Subject: seq_file: document how per-entry resources are managed.
Patch series "Fix some seq_file users that were recently broken".
A recent change to seq_file broke some users which were using seq_file
in a non-"standard" way ... though the "standard" isn't documented, so
they can be excused. The result is a possible leak - of memory in one
case, of references to a 'transport' in the other.
These three patches:
1/ document and explain the problem
2/ fix the problem user in x86
3/ fix the problem user in net/sctp
This patch (of 3):
Users of seq_file will sometimes find it convenient to take a resource,
such as a lock or memory allocation, in the ->start or ->next operations.
These are per-entry resources, distinct from per-session resources which
are taken in ->start and released in ->stop.
The preferred management of these is release the resource on the
subsequent call to ->next or ->stop.
However prior to Commit 1f4aace60b0e ("fs/seq_file.c: simplify seq_file
iteration code and interface") it happened that ->show would always be
called after ->start or ->next, and a few users chose to release the
resource in ->show.
This is no longer reliable. Since the mentioned commit, ->next will
always come after a successful ->show (to ensure m->index is updated
correctly), so the original ordering cannot be maintained.
This patch updates the documentation to clearly state the required
behaviour. Other patches will fix the few problematic users.
[akpm(a)linux-foundation.org: fix typo, per Willy]
Link: https://lkml.kernel.org/r/161248518659.21478.2484341937387294998.stgit@nobl…
Link: https://lkml.kernel.org/r/161248539020.21478.3147971477400875336.stgit@nobl…
Fixes: 1f4aace60b0e ("fs/seq_file.c: simplify seq_file iteration code and interface")
Signed-off-by: NeilBrown <neilb(a)suse.de>
Cc: Xin Long <lucien.xin(a)gmail.com>
Cc: Alexander Viro <viro(a)zeniv.linux.org.uk>
Cc: Jonathan Corbet <corbet(a)lwn.net>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Vlad Yasevich <vyasevich(a)gmail.com>
Cc: Neil Horman <nhorman(a)tuxdriver.com>
Cc: Marcelo Ricardo Leitner <marcelo.leitner(a)gmail.com>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/filesystems/seq_file.rst | 6 ++++++
1 file changed, 6 insertions(+)
--- a/Documentation/filesystems/seq_file.rst~seq_file-document-how-per-entry-resources-are-managed
+++ a/Documentation/filesystems/seq_file.rst
@@ -217,6 +217,12 @@ between the calls to start() and stop(),
is a reasonable thing to do. The seq_file code will also avoid taking any
other locks while the iterator is active.
+The iterater value returned by start() or next() is guaranteed to be
+passed to a subsequent next() or stop() call. This allows resources
+such as locks that were taken to be reliably released. There is *no*
+guarantee that the iterator will be passed to show(), though in practice
+it often will be.
+
Formatted output
================
_
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 68eabe17bf08 - Linux 5.11.2-rc1
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ Loopdev Sanity
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ LTP
✅ Loopdev Sanity
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ ACPI table test
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: sanity smoke test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
This patch fixes a circular locking dependency in the CI introduced by
commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated"). The lockdep only occurs when starting a Secure
Execution guest. Crypto virtualization (vfio_ap) is not yet supported for
SE guests; however, in order to avoid CI errors, this fix is being
provided.
The circular lockdep was introduced when the masks in the guest's APCB
were taken under the matrix_dev->lock. While the lock is definitely
needed to protect the setting/unsetting of the KVM pointer, it is not
necessarily critical for setting the masks, so this will not be done under
protection of the matrix_dev->lock.
Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak(a)linux.ibm.com>
Acked-by: Cornelia Huck <cohuck(a)redhat.com>
---
drivers/s390/crypto/vfio_ap_ops.c | 169 ++++++++++++++++++--------
drivers/s390/crypto/vfio_ap_private.h | 1 +
2 files changed, 122 insertions(+), 48 deletions(-)
diff --git a/drivers/s390/crypto/vfio_ap_ops.c b/drivers/s390/crypto/vfio_ap_ops.c
index 41fc2e4135fe..82b204f573b7 100644
--- a/drivers/s390/crypto/vfio_ap_ops.c
+++ b/drivers/s390/crypto/vfio_ap_ops.c
@@ -294,6 +294,21 @@ static int handle_pqap(struct kvm_vcpu *vcpu)
matrix_mdev = container_of(vcpu->kvm->arch.crypto.pqap_hook,
struct ap_matrix_mdev, pqap_hook);
+ /*
+ * If the KVM pointer is in the process of being set, wait until the
+ * process has completed. Unlock the matrix_dev->lock while waiting to
+ * allow the value of the KVM pointer to be set.
+ */
+ while (matrix_mdev->kvm_busy) {
+ mutex_unlock(&matrix_dev->lock);
+ msleep(100);
+ mutex_lock(&matrix_dev->lock);
+ }
+
+ /* If the there is no guest using the mdev, there is nothing to do */
+ if (!matrix_mdev->kvm)
+ goto out_unlock;
+
q = vfio_ap_get_queue(matrix_mdev, apqn);
if (!q)
goto out_unlock;
@@ -350,8 +365,11 @@ static int vfio_ap_mdev_create(struct kobject *kobj, struct mdev_device *mdev)
static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
-
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * removal of the mdev
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
mutex_lock(&matrix_dev->lock);
@@ -606,8 +624,11 @@ static ssize_t assign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of adapter */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of adapter
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &apid);
@@ -672,8 +693,11 @@ static ssize_t unassign_adapter_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of adapter */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of adapter
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &apid);
@@ -753,8 +777,11 @@ static ssize_t assign_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_apqi = matrix_mdev->matrix.aqm_max;
- /* If the guest is running, disallow assignment of domain */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of domain
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &apqi);
@@ -814,8 +841,11 @@ static ssize_t unassign_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow un-assignment of domain */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of domain
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &apqi);
@@ -858,8 +888,11 @@ static ssize_t assign_control_domain_store(struct device *dev,
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
- /* If the guest is running, disallow assignment of control domain */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of control domain
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &id);
@@ -908,8 +941,11 @@ static ssize_t unassign_control_domain_store(struct device *dev,
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
unsigned long max_domid = matrix_mdev->matrix.adm_max;
- /* If the guest is running, disallow un-assignment of control domain */
- if (matrix_mdev->kvm)
+ /*
+ * If the KVM pointer is in flux or the guest is running, disallow
+ * un-assignment of control domain
+ */
+ if (matrix_mdev->kvm_busy || matrix_mdev->kvm)
return -EBUSY;
ret = kstrtoul(buf, 0, &domid);
@@ -1027,8 +1063,15 @@ static const struct attribute_group *vfio_ap_mdev_attr_groups[] = {
* @matrix_mdev: a mediated matrix device
* @kvm: reference to KVM instance
*
- * Verifies no other mediated matrix device has @kvm and sets a reference to
- * it in @matrix_mdev->kvm.
+ * Sets all data for @matrix_mdev that are needed to manage AP resources
+ * for the guest whose state is represented by @kvm.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling
+ * this function; however, the lock will be temporarily released while the
+ * guest's AP configuration is set to avoid a potential lockdep splat.
+ * The kvm->lock is taken to set the guest's AP configuration which, under
+ * certain circumstances, will result in a circular lock dependency if this is
+ * done under the @matrix_mdev->lock.
*
* Return 0 if no other mediated matrix device has a reference to @kvm;
* otherwise, returns an -EPERM.
@@ -1043,9 +1086,18 @@ static int vfio_ap_mdev_set_kvm(struct ap_matrix_mdev *matrix_mdev,
return -EPERM;
}
- matrix_mdev->kvm = kvm;
- kvm_get_kvm(kvm);
- kvm->arch.crypto.pqap_hook = &matrix_mdev->pqap_hook;
+ if (kvm->arch.crypto.crycbd) {
+ matrix_mdev->kvm_busy = true;
+ kvm_get_kvm(kvm);
+ mutex_unlock(&matrix_dev->lock);
+ kvm_arch_crypto_set_masks(kvm,
+ matrix_mdev->matrix.apm,
+ matrix_mdev->matrix.aqm,
+ matrix_mdev->matrix.adm);
+ mutex_lock(&matrix_dev->lock);
+ matrix_mdev->kvm = kvm;
+ matrix_mdev->kvm_busy = false;
+ }
return 0;
}
@@ -1079,51 +1131,54 @@ static int vfio_ap_mdev_iommu_notifier(struct notifier_block *nb,
return NOTIFY_DONE;
}
+/**
+ * vfio_ap_mdev_unset_kvm
+ *
+ * @matrix_mdev: a matrix mediated device
+ *
+ * Performs clean-up of resources no longer needed by @matrix_mdev.
+ *
+ * Note: The matrix_dev->lock must be taken prior to calling
+ * this function; however, the lock will be temporarily released while the
+ * guest's AP configuration is cleared to avoid a potential lockdep splat.
+ * The kvm->lock is taken to clear the guest's AP configuration which, under
+ * certain circumstances, will result in a circular lock dependency if this is
+ * done under the @matrix_mdev->lock.
+ *
+ */
static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
{
- kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
- matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
- vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
- kvm_put_kvm(matrix_mdev->kvm);
- matrix_mdev->kvm = NULL;
+ if (matrix_mdev->kvm) {
+ matrix_mdev->kvm_busy = true;
+ mutex_unlock(&matrix_dev->lock);
+ kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+ mutex_lock(&matrix_dev->lock);
+ vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+ kvm_put_kvm(matrix_mdev->kvm);
+ matrix_mdev->kvm = NULL;
+ matrix_mdev->kvm_busy = false;
+ }
}
static int vfio_ap_mdev_group_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
- int ret, notify_rc = NOTIFY_OK;
+ int notify_rc = NOTIFY_OK;
struct ap_matrix_mdev *matrix_mdev;
if (action != VFIO_GROUP_NOTIFY_SET_KVM)
return NOTIFY_OK;
- matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
mutex_lock(&matrix_dev->lock);
+ matrix_mdev = container_of(nb, struct ap_matrix_mdev, group_notifier);
- if (!data) {
- if (matrix_mdev->kvm)
- vfio_ap_mdev_unset_kvm(matrix_mdev);
- goto notify_done;
- }
-
- ret = vfio_ap_mdev_set_kvm(matrix_mdev, data);
- if (ret) {
- notify_rc = NOTIFY_DONE;
- goto notify_done;
- }
-
- /* If there is no CRYCB pointer, then we can't copy the masks */
- if (!matrix_mdev->kvm->arch.crypto.crycbd) {
+ if (!data)
+ vfio_ap_mdev_unset_kvm(matrix_mdev);
+ else if (vfio_ap_mdev_set_kvm(matrix_mdev, data))
notify_rc = NOTIFY_DONE;
- goto notify_done;
- }
-
- kvm_arch_crypto_set_masks(matrix_mdev->kvm, matrix_mdev->matrix.apm,
- matrix_mdev->matrix.aqm,
- matrix_mdev->matrix.adm);
-notify_done:
mutex_unlock(&matrix_dev->lock);
+
return notify_rc;
}
@@ -1258,8 +1313,20 @@ static void vfio_ap_mdev_release(struct mdev_device *mdev)
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
mutex_lock(&matrix_dev->lock);
+ /*
+ * If the KVM pointer is in the process of being set, wait until the
+ * process has completed. Unlock the matrix_dev->lock while waiting to
+ * allow the value of the KVM pointer to be set.
+ */
+ while (matrix_mdev->kvm_busy) {
+ mutex_unlock(&matrix_dev->lock);
+ msleep(100);
+ mutex_lock(&matrix_dev->lock);
+ }
+
if (matrix_mdev->kvm)
vfio_ap_mdev_unset_kvm(matrix_mdev);
+
mutex_unlock(&matrix_dev->lock);
vfio_unregister_notifier(mdev_dev(mdev), VFIO_IOMMU_NOTIFY,
@@ -1293,6 +1360,7 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
unsigned int cmd, unsigned long arg)
{
int ret;
+ struct ap_matrix_mdev *matrix_mdev;
mutex_lock(&matrix_dev->lock);
switch (cmd) {
@@ -1300,7 +1368,12 @@ static ssize_t vfio_ap_mdev_ioctl(struct mdev_device *mdev,
ret = vfio_ap_mdev_get_device_info(arg);
break;
case VFIO_DEVICE_RESET:
- ret = vfio_ap_mdev_reset_queues(mdev);
+ matrix_mdev = mdev_get_drvdata(mdev);
+
+ if (matrix_mdev->kvm_busy)
+ ret = -EBUSY;
+ else
+ ret = vfio_ap_mdev_reset_queues(mdev);
break;
default:
ret = -EOPNOTSUPP;
diff --git a/drivers/s390/crypto/vfio_ap_private.h b/drivers/s390/crypto/vfio_ap_private.h
index 28e9d9989768..d1c8d25a899e 100644
--- a/drivers/s390/crypto/vfio_ap_private.h
+++ b/drivers/s390/crypto/vfio_ap_private.h
@@ -83,6 +83,7 @@ struct ap_matrix_mdev {
struct ap_matrix matrix;
struct notifier_block group_notifier;
struct notifier_block iommu_notifier;
+ bool kvm_busy;
struct kvm *kvm;
struct kvm_s390_module_hook pqap_hook;
struct mdev_device *mdev;
--
2.21.3
There are a couple new drivers and support for the new management
interface for mlx under review now. I figured I'll send them separately
if review is done in time, lots of people are waiting for the vdpa tool
patches to I want to make sure they make this release.
The following changes since commit f40ddce88593482919761f74910f42f4b84c004b:
Linux 5.11 (2021-02-14 14:32:24 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git tags/for_linus
for you to fetch changes up to 16c10bede8b3d8594279752bf53153491f3f944f:
virtio-input: add multi-touch support (2021-02-23 07:52:59 -0500)
----------------------------------------------------------------
virtio: features, fixes
new vdpa features to allow creation and deletion of new devices
virtio-blk support per-device queue depth
fixes, cleanups all over the place
Signed-off-by: Michael S. Tsirkin <mst(a)redhat.com>
----------------------------------------------------------------
Colin Xu (1):
virtio_input: Prevent EV_MSC/MSC_TIMESTAMP loop storm for MT.
Dongli Zhang (1):
vhost scsi: alloc vhost_scsi with kvzalloc() to avoid delay
Gustavo A. R. Silva (1):
virtio_net: Fix fall-through warnings for Clang
Jason Wang (17):
virtio-pci: do not access iomem via struct virtio_pci_device directly
virtio-pci: split out modern device
virtio-pci-modern: factor out modern device initialization logic
virtio-pci-modern: introduce vp_modern_remove()
virtio-pci-modern: introduce helper to set config vector
virtio-pci-modern: introduce helpers for setting and getting status
virtio-pci-modern: introduce helpers for setting and getting features
virtio-pci-modern: introduce vp_modern_generation()
virtio-pci-modern: introduce vp_modern_set_queue_vector()
virtio-pci-modern: introduce vp_modern_queue_address()
virtio-pci-modern: introduce helper to set/get queue_enable
virtio-pci-modern: introduce helper for setting/geting queue size
virtio-pci-modern: introduce helper for getting queue nums
virtio-pci-modern: introduce helper to get notification offset
virito-pci-modern: rename map_capability() to vp_modern_map_capability()
virtio-pci: introduce modern device module
virtio_vdpa: don't warn when fail to disable vq
Jiapeng Zhong (1):
virtio-mem: Assign boolean values to a bool variable
Joseph Qi (1):
virtio-blk: support per-device queue depth
Mathias Crombez (1):
virtio-input: add multi-touch support
Parav Pandit (6):
vdpa_sim_net: Make mac address array static
vdpa: Extend routine to accept vdpa device name
vdpa: Define vdpa mgmt device, ops and a netlink interface
vdpa: Enable a user to add and delete a vdpa device
vdpa: Enable user to query vdpa device info
vdpa_sim_net: Add support for user supported devices
Stefano Garzarella (1):
vdpa/mlx5: fix param validation in mlx5_vdpa_get_config()
Xianting Tian (1):
virtio_mmio: fix one typo
drivers/block/virtio_blk.c | 11 +-
drivers/net/virtio_net.c | 1 +
drivers/vdpa/Kconfig | 1 +
drivers/vdpa/ifcvf/ifcvf_main.c | 2 +-
drivers/vdpa/mlx5/net/mlx5_vnet.c | 4 +-
drivers/vdpa/vdpa.c | 503 ++++++++++++++++++++++++++-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 3 +-
drivers/vdpa/vdpa_sim/vdpa_sim.h | 2 +
drivers/vdpa/vdpa_sim/vdpa_sim_net.c | 100 ++++--
drivers/vhost/scsi.c | 9 +-
drivers/virtio/Kconfig | 9 +
drivers/virtio/Makefile | 1 +
drivers/virtio/virtio_input.c | 26 +-
drivers/virtio/virtio_mem.c | 2 +-
drivers/virtio/virtio_mmio.c | 2 +-
drivers/virtio/virtio_pci_common.h | 22 +-
drivers/virtio/virtio_pci_modern.c | 504 ++++-----------------------
drivers/virtio/virtio_pci_modern_dev.c | 599 +++++++++++++++++++++++++++++++++
drivers/virtio/virtio_vdpa.c | 3 +-
include/linux/vdpa.h | 44 ++-
include/linux/virtio_pci_modern.h | 111 ++++++
include/uapi/linux/vdpa.h | 40 +++
22 files changed, 1492 insertions(+), 507 deletions(-)
create mode 100644 drivers/virtio/virtio_pci_modern_dev.c
create mode 100644 include/linux/virtio_pci_modern.h
create mode 100644 include/uapi/linux/vdpa.h
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7ffddd499ba6122b1a07828f023d1d67629aa017 Mon Sep 17 00:00:00 2001
From: Muchun Song <songmuchun(a)bytedance.com>
Date: Thu, 4 Feb 2021 18:32:06 -0800
Subject: [PATCH] mm: hugetlb: fix a race between freeing and dissolving the
page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race window is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list when it is
dissolved.
As a result __free_huge_page would corrupt page(s) already in the buddy
allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f0e242d38ca..c6ee3c28a04e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+retry:
/* Not to disrupt normal path by vainly holding hugetlb_lock */
if (!PageHuge(page))
return 0;
@@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page *page)
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head))) {
+ spin_unlock(&hugetlb_lock);
+ cond_resched();
+
+ /*
+ * Theoretically, we should return -EBUSY when we
+ * encounter this race. In fact, we have a chance
+ * to successfully dissolve the page if we do a
+ * retry. Because the race window is quite small.
+ * If we seize this opportunity, it is an optimization
+ * for increasing the success rate of dissolving page.
+ */
+ goto retry;
+ }
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 7ffddd499ba6122b1a07828f023d1d67629aa017 Mon Sep 17 00:00:00 2001
From: Muchun Song <songmuchun(a)bytedance.com>
Date: Thu, 4 Feb 2021 18:32:06 -0800
Subject: [PATCH] mm: hugetlb: fix a race between freeing and dissolving the
page
There is a race condition between __free_huge_page()
and dissolve_free_huge_page().
CPU0: CPU1:
// page_count(page) == 1
put_page(page)
__free_huge_page(page)
dissolve_free_huge_page(page)
spin_lock(&hugetlb_lock)
// PageHuge(page) && !page_count(page)
update_and_free_page(page)
// page is freed to the buddy
spin_unlock(&hugetlb_lock)
spin_lock(&hugetlb_lock)
clear_page_huge_active(page)
enqueue_huge_page(page)
// It is wrong, the page is already freed
spin_unlock(&hugetlb_lock)
The race window is between put_page() and dissolve_free_huge_page().
We should make sure that the page is already on the free list when it is
dissolved.
As a result __free_huge_page would corrupt page(s) already in the buddy
allocator.
Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbbdd36 ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 6f0e242d38ca..c6ee3c28a04e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -79,6 +79,21 @@ DEFINE_SPINLOCK(hugetlb_lock);
static int num_fault_mutexes;
struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
+static inline bool PageHugeFreed(struct page *head)
+{
+ return page_private(head + 4) == -1UL;
+}
+
+static inline void SetPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, -1UL);
+}
+
+static inline void ClearPageHugeFreed(struct page *head)
+{
+ set_page_private(head + 4, 0);
+}
+
/* Forward declaration */
static int hugetlb_acct_memory(struct hstate *h, long delta);
@@ -1028,6 +1043,7 @@ static void enqueue_huge_page(struct hstate *h, struct page *page)
list_move(&page->lru, &h->hugepage_freelists[nid]);
h->free_huge_pages++;
h->free_huge_pages_node[nid]++;
+ SetPageHugeFreed(page);
}
static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
@@ -1044,6 +1060,7 @@ static struct page *dequeue_huge_page_node_exact(struct hstate *h, int nid)
list_move(&page->lru, &h->hugepage_activelist);
set_page_refcounted(page);
+ ClearPageHugeFreed(page);
h->free_huge_pages--;
h->free_huge_pages_node[nid]--;
return page;
@@ -1505,6 +1522,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
spin_lock(&hugetlb_lock);
h->nr_huge_pages++;
h->nr_huge_pages_node[nid]++;
+ ClearPageHugeFreed(page);
spin_unlock(&hugetlb_lock);
}
@@ -1755,6 +1773,7 @@ int dissolve_free_huge_page(struct page *page)
{
int rc = -EBUSY;
+retry:
/* Not to disrupt normal path by vainly holding hugetlb_lock */
if (!PageHuge(page))
return 0;
@@ -1771,6 +1790,26 @@ int dissolve_free_huge_page(struct page *page)
int nid = page_to_nid(head);
if (h->free_huge_pages - h->resv_huge_pages == 0)
goto out;
+
+ /*
+ * We should make sure that the page is already on the free list
+ * when it is dissolved.
+ */
+ if (unlikely(!PageHugeFreed(head))) {
+ spin_unlock(&hugetlb_lock);
+ cond_resched();
+
+ /*
+ * Theoretically, we should return -EBUSY when we
+ * encounter this race. In fact, we have a chance
+ * to successfully dissolve the page if we do a
+ * retry. Because the race window is quite small.
+ * If we seize this opportunity, it is an optimization
+ * for increasing the success rate of dissolving page.
+ */
+ goto retry;
+ }
+
/*
* Move PageHWPoison flag from head page to the raw error page,
* which makes any subpages rather than the error page reusable.
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: 9c7a62998dea - KVM: Use kvm_pfn_t for local PFN variable in hva_to_pfn_remapped()
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
,-. ,-.
( C ) ( K ) Continuous
`-',-.`-' Kernel
( I ) Integration
`-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j30 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ Loopdev Sanity
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Networking socket: fuzz
⏱ Networking: igmp conformance test
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ storage: SCSI VPD
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ Firmware test suite
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
ppc64le:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ storage: software RAID testing
🚧 ⚡⚡⚡ xfstests - ext4
🚧 ⚡⚡⚡ xfstests - xfs
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Networking socket: fuzz
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ pciutils: update pci ids test
⚡⚡⚡ ALSA PCM loopback test
⚡⚡⚡ ALSA Control (mixer) Userspace Element test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
⚡⚡⚡ LTP
⚡⚡⚡ Loopdev Sanity
⚡⚡⚡ Memory: fork_mem
⚡⚡⚡ Memory function: memfd_create
⚡⚡⚡ AMTU (Abstract Machine Test Utility)
⚡⚡⚡ Networking bridge: sanity
⚡⚡⚡ Networking route: pmtu
⚡⚡⚡ Networking route_func - local
⚡⚡⚡ Networking route_func - forward
⚡⚡⚡ Networking TCP: keepalive test
⚡⚡⚡ Networking UDP: socket
⚡⚡⚡ Networking tunnel: geneve basic test
⚡⚡⚡ Networking tunnel: gre basic
⚡⚡⚡ L2TP basic test
⚡⚡⚡ Networking tunnel: vxlan basic
⚡⚡⚡ Networking ipsec: basic netns - transport
⚡⚡⚡ Networking ipsec: basic netns - tunnel
⚡⚡⚡ Libkcapi AF_ALG test
⚡⚡⚡ trace: ftrace/tracer
🚧 ⚡⚡⚡ CIFS Connectathon
🚧 ⚡⚡⚡ POSIX pjd-fstest suites
🚧 ⚡⚡⚡ jvm - jcstress tests
🚧 ⚡⚡⚡ Memory function: kaslr
🚧 ⚡⚡⚡ Ethernet drivers sanity
🚧 ⚡⚡⚡ Networking firewall: basic netfilter test
🚧 ⚡⚡⚡ audit: audit testsuite test
x86_64:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ storage: software RAID testing
🚧 ⚡⚡⚡ CPU: Frequency Driver Test
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - ext4
🚧 ⚡⚡⚡ xfstests - xfs
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ xfstests - nfsv4.2
🚧 ⚡⚡⚡ xfstests - cifsv3.11
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ power-management: cpupower/sanity test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
On 2/25/21 8:53 AM, Tony Krowiak wrote:
>
>
> On 2/25/21 6:28 AM, Halil Pasic wrote:
>> On Wed, 24 Feb 2021 22:28:50 -0500
>> Tony Krowiak<akrowiak(a)linux.ibm.com> wrote:
>>
>>>>> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
>>>>> {
>>>>> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
>>>>> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
>>>>> - vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
>>>>> - kvm_put_kvm(matrix_mdev->kvm);
>>>>> - matrix_mdev->kvm = NULL;
>>>>> + struct kvm *kvm;
>>>>> +
>>>>> + if (matrix_mdev->kvm) {
>>>>> + kvm = matrix_mdev->kvm;
>>>>> + kvm_get_kvm(kvm);
>>>>> + matrix_mdev->kvm = NULL;
>>>> I think if there were two threads dong the unset in parallel, one
>>>> of them could bail out and carry on before the cleanup is done. But
>>>> since nothing much happens in release after that, I don't see an
>>>> immediate problem.
>>>>
>>>> Another thing to consider is, that setting ->kvm to NULL arms
>>>> vfio_ap_mdev_remove()...
>>> I'm not entirely sure what you mean by this, but my
>>> assumption is that you are talking about the check
>>> for matrix_mdev->kvm != NULL at the start of
>>> that function.
>> Yes I was talking about the check
>>
>> static int vfio_ap_mdev_remove(struct mdev_device *mdev)
>> {
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> if (matrix_mdev->kvm)
>> return -EBUSY;
>> ...
>> kfree(matrix_mdev);
>> ...
>> }
>>
>> As you see, we bail out if kvm is still set, otherwise we clean up the
>> matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
>> initiated via the sysfs, i.e. can be initiated at any time. If we were
>> to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
>> with mutex_lock(&matrix_dev->lock); that would be bad.
>
> I agree.
>
>>
>>> The reason
>>> matrix_mdev->kvm is set to NULL before giving up
>>> the matrix_dev->lock is so that functions that check
>>> for the presence of the matrix_mdev->kvm pointer,
>>> such as assign_adapter_store() - will exit if they get
>>> control while the masks are being cleared.
>> I disagree!
>>
>> static ssize_t assign_adapter_store(struct device *dev,
>> struct device_attribute *attr,
>> const char *buf, size_t count)
>> {
>> int ret;
>> unsigned long apid;
>> struct mdev_device *mdev = mdev_from_dev(dev);
>> struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
>>
>> /* If the guest is running, disallow assignment of adapter */
>> if (matrix_mdev->kvm)
>> return -EBUSY;
>>
>> We bail out when kvm != NULL, so having it set to NULL while the
>> mask are being cleared will make these not bail out.
>
> You are correct, I am an idiot.
>
>>> So what we have
>>> here is a catch-22; in other words, we have the case
>>> you pointed out above and the cases related to
>>> assigning/unassigning adapters, domains and
>>> control domains which should exit when a guest
>>> is running.
>> See above.
>
> Ditto.
>
>>> I may have an idea to resolve this. Suppose we add:
>>>
>>> struct ap_matrix_mdev {
>>> ...
>>> bool kvm_busy;
>>> ...
>>> }
>>>
>>> This flag will be set to true at the start of both the
>>> vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm()
>>> and set to false at the end. The assignment/unassignment
>>> and remove callback functions can test this flag and
>>> return -EBUSY if the flag is true. That will preclude assigning
>>> or unassigning adapters, domains and control domains when
>>> the KVM pointer is being set/unset. Likewise, removal of the
>>> mediated device will also be prevented while the KVM pointer
>>> is being set/unset.
>>>
>>> In the case of the PQAP handler function, it can wait for the
>>> set/unset of the KVM pointer as follows:
>>>
>>> /while (matrix_mdev->kvm_busy) {//
>>> // mutex_unlock(&matrix_dev->lock);//
>>> // msleep(100);//
>>> // mutex_lock(&matrix_dev->lock);//
>>> //}//
>>> //
>>> //if (!matrix_mdev->kvm)//
>>> // goto out_unlock;
>>>
>>> /What say you?
>>> //
>> I'm not sure. Since I disagree with your analysis above it is difficult
>> to deal with the conclusion. I'm not against decoupling the tracking of
>> the state of the mdev_matrix device from the value of the kvm pointer. I
>> think we should first get a common understanding of the problem, before
>> we proceed to the solution.
>
> Regardless of my brain fog regarding the testing of the
> matrix_mdev->kvm pointer, I stand by what I stated
> in the paragraphs just before the code snippet.
>
> The problem is there are 10 functions that depend upon
> the value of the matrix_mdev->kvm pointer that can get
> control while the pointer is being set/unset and the
> matrix_dev->lock is given up to set/clear the masks:
* vfio_ap_irq_enable: called by handle_pqap() when AQIC is intercepted
* vfio_ap_irq_disable: called by handle_pqap() when AQIC is intercepted
* assign_adapter_store: sysfs
* unassign_adapter_store: sysfs
* assign_domain_store: sysfs
* unassign_domain_store: sysfs
* assign__control_domain_store: sysfs
* unassign_control_domain_store: sysfs
* vfio_ap_mdev_remove: sysfs
* vfio_ap_mdev_release: mdev fd closed by userspace (i.e., qemu)If we
add the proposed flag to indicate when the matrix_mdev->kvm
> pointer is in flux, then we can check that before allowing the functions
> in the list above to proceed.
>
>> Regards,
>> Halil
>
The patch titled
Subject: mm, compaction: make fast_isolate_freepages() stay within zone
has been removed from the -mm tree. Its filename was
mm-compaction-make-fast_isolate_freepages-stay-within-zone.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, compaction: make fast_isolate_freepages() stay within zone
Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages. Pageblock boundaries are
intersected with zone boundaries to be safe in case zone starts or ends in
the middle of pageblock. The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.
The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operate outside
of zone boundaries:
- in fast_isolate_freepages() if we get a pfn from the first pageblock
of a zone that starts in the middle of that pageblock, 'highest' can be
a pfn outside of the zone. If we fail to isolate anything in this
function, we may then call fast_isolate_around() on a pfn outside of the
zone and there effectively do a set_pageblock_skip(page_to_pfn(highest))
which may currently hit a VM_BUG_ON() in some configurations
- fast_isolate_around() checks only the zone end boundary and not
beginning, nor that the pageblock is contiguous (with
pageblock_pfn_to_page()) so it's possible that we end up calling
isolate_freepages_block() on a range of pfn's from two different zones
and end up e.g. isolating freepages under the wrong zone's lock.
This patch should fix the above issues.
Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/mm/compaction.c~mm-compaction-make-fast_isolate_freepages-stay-within-zone
+++ a/mm/compaction.c
@@ -1284,7 +1284,7 @@ static void
fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
{
unsigned long start_pfn, end_pfn;
- struct page *page = pfn_to_page(pfn);
+ struct page *page;
/* Do not search around if there are enough pages already */
if (cc->nr_freepages >= cc->nr_migratepages)
@@ -1295,8 +1295,12 @@ fast_isolate_around(struct compact_contr
return;
/* Pageblock boundaries */
- start_pfn = pageblock_start_pfn(pfn);
- end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone)) - 1;
+ start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
+ end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));
+
+ page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
+ if (!page)
+ return;
/* Scan before */
if (start_pfn != pfn) {
@@ -1398,7 +1402,8 @@ fast_isolate_freepages(struct compact_co
pfn = page_to_pfn(freepage);
if (pfn >= highest)
- highest = pageblock_start_pfn(pfn);
+ highest = max(pageblock_start_pfn(pfn),
+ cc->zone->zone_start_pfn);
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
@@ -1468,7 +1473,8 @@ fast_isolate_freepages(struct compact_co
} else {
if (cc->direct_compaction && pfn_valid(min_pfn)) {
page = pageblock_pfn_to_page(min_pfn,
- pageblock_end_pfn(min_pfn),
+ min(pageblock_end_pfn(min_pfn),
+ zone_end_pfn(cc->zone)),
cc->zone);
cc->free_pfn = min_pfn;
}
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
maintainers-add-uapi-directories-to-api-abi-section.patch
The patch titled
Subject: mm/vmscan: restore zone_reclaim_mode ABI
has been removed from the -mm tree. Its filename was
mm-vmscan-restore-zone_reclaim_mode-abi.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Subject: mm/vmscan: restore zone_reclaim_mode ABI
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly checks the RECLAIM_ZONE bit. The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'. The
RECLAIM_ZONE #define was removed in a cleanup. That, by itself is
fine.
But, when the bit was removed (bit 0) the _other_ bit locations also got
changed. That's not OK because the bit values are documented to mean one
specific thing. Users surely do not expect the meaning to change from
kernel to kernel.
The end result is that if someone had a script that did:
sysctl vm.zone_reclaim_mode=1
it would have gone from enabling node reclaim for clean unmapped pages to
writing out pages during node reclaim after the commit in question.
That's not great.
Put the bits back the way they were and add a comment so something like
this is a bit harder to do again. Update the documentation to make it
clear that the first bit is ignored.
Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky(a)intel.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Daniel Wagner <dwagner(a)suse.de>
Cc: "Tobin C. Harding" <tobin(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/sysctl/vm.rst | 10 +++++-----
mm/vmscan.c | 9 +++++++--
2 files changed, 12 insertions(+), 7 deletions(-)
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/Documentation/admin-guide/sysctl/vm.rst
@@ -983,11 +983,11 @@ that benefit from having their data cach
left disabled as the caching effect is likely to be more important than
data locality.
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
--- a/mm/vmscan.c~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/mm/vmscan.c
@@ -4085,8 +4085,13 @@ module_init(kswapd_init)
*/
int node_reclaim_mode __read_mostly;
-#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
/*
* Priority for NODE_RECLAIM. This determines the fraction of pages
_
Patches currently in -mm which might be from dave.hansen(a)linux.intel.com are
The patch titled
Subject: hugetlb: fix copy_huge_page_from_user contig page struct assumption
has been removed from the -mm tree. Its filename was
hugetlb-fix-copy_huge_page_from_user-contig-page-struct-assumption.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix copy_huge_page_from_user contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.
Since page structs for the target gigantic page are not contiguous, the
data copied from user space could overwrite other pages not associated
with the gigantic page and cause data corruption.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated.
Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/mm/memory.c~hugetlb-fix-copy_huge_page_from_user-contig-page-struct-assumption
+++ a/mm/memory.c
@@ -5177,17 +5177,19 @@ long copy_huge_page_from_user(struct pag
void *page_kaddr;
unsigned long i, rc = 0;
unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+ struct page *subpage = dst_page;
- for (i = 0; i < pages_per_huge_page; i++) {
+ for (i = 0; i < pages_per_huge_page;
+ i++, subpage = mem_map_next(subpage, dst_page, i)) {
if (allow_pagefault)
- page_kaddr = kmap(dst_page + i);
+ page_kaddr = kmap(subpage);
else
- page_kaddr = kmap_atomic(dst_page + i);
+ page_kaddr = kmap_atomic(subpage);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
- kunmap(dst_page + i);
+ kunmap(subpage);
else
kunmap_atomic(page_kaddr);
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: hugetlb: fix update_and_free_page contig page struct assumption
has been removed from the -mm tree. Its filename was
hugetlb-fix-update_and_free_page-contig-page-struct-assumption.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix update_and_free_page contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.
If update_and_free_page encounters non-contiguous page structs, we can see
“BUG: Bad page state in process …” errors.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated. Zi Yan outlined steps to reproduce
here [1].
[1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidi…
Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com
Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~hugetlb-fix-update_and_free_page-contig-page-struct-assumption
+++ a/mm/hugetlb.c
@@ -1321,14 +1321,16 @@ static inline void destroy_compound_giga
static void update_and_free_page(struct hstate *h, struct page *page)
{
int i;
+ struct page *subpage = page;
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
h->nr_huge_pages--;
h->nr_huge_pages_node[page_to_nid(page)]--;
- for (i = 0; i < pages_per_huge_page(h); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+ for (i = 0; i < pages_per_huge_page(h);
+ i++, subpage = mem_map_next(subpage, page, i)) {
+ subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced | 1 << PG_dirty |
1 << PG_active | 1 << PG_private |
1 << PG_writeback);
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
The patch titled
Subject: mm: memcontrol: fix get_active_memcg return value
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-get_active_memcg-return-value.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix get_active_memcg return value
We use a global percpu int_active_memcg variable to store the remote memcg
when we are in the interrupt context. But get_active_memcg always return
the current->active_memcg or root_mem_cgroup. The remote memcg (set in
the interrupt context) is ignored. This is not what we want. So fix it.
Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-get_active_memcg-return-value
+++ a/mm/memcontrol.c
@@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup
rcu_read_lock();
memcg = active_memcg();
- if (memcg) {
- /* current->active_memcg must hold a ref. */
- if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
- memcg = root_mem_cgroup;
- else
- memcg = current->active_memcg;
- }
+ /* remote memcg must hold a ref. */
+ if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
+ memcg = root_mem_cgroup;
rcu_read_unlock();
return memcg;
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
The patch titled
Subject: mm: memcontrol: fix swap undercounting in cgroup2
has been removed from the -mm tree. Its filename was
mm-memcontrol-fix-swap-undercounting-in-cgroup2.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix swap undercounting in cgroup2
When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future. It's also retained if shared pages are
faulted back in some processes, but not in others. During that time we
have an in-memory copy of the page, as well as an on-swap copy. Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently due to
the nature of how they account memory and swap:
Cgroup1 has a unified memory+swap counter that tracks a data page
regardless whether it's in-core or swapped out. On swapin, we transfer
the charge from the swap entry to the newly allocated swapcache page, even
though the swap entry might stick around for a while. That's why we have
a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().
Cgroup2 tracks memory and swap as separate, independent resources and thus
has split memory and swap counters. On swapin, we charge the newly
allocated swapcache page as memory, while the swap slot in turn must
remain charged to the swap counter as long as its allocated too.
The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make
swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_uncharge() that was supposed to tell the difference between the
charge transfer in cgroup1 and the separate counters in cgroup2.
As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or total
available swap space; partially faulted back shared pages are only limited
by physical capacity. This in turn allows cgroups to significantly
overconsume their alloted swap space.
Add the do_memsw_account() check back to fix this problem.
Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com
Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-swap-undercounting-in-cgroup2
+++ a/mm/memcontrol.c
@@ -6748,7 +6748,19 @@ int mem_cgroup_charge(struct page *page,
memcg_check_events(memcg, page);
local_irq_enable();
- if (PageSwapCache(page)) {
+ /*
+ * Cgroup1's unified memory+swap counter has been charged with the
+ * new swapcache page, finish the transfer by uncharging the swap
+ * slot. The swap slot would also get uncharged when it dies, but
+ * it can stick around indefinitely and we'd count the page twice
+ * the entire time.
+ *
+ * Cgroup2 has separate resource counters for memory and swap,
+ * so this is a non-issue here. Memory and swap charge lifetimes
+ * correspond 1:1 to page and swap slot lifetimes: we charge the
+ * page to memory here, and uncharge swap when the slot is freed.
+ */
+ if (do_memsw_account() && PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
The patch titled
Subject: ntfs: check for valid standard information attribute
has been removed from the -mm tree. Its filename was
ntfs-check-for-valid-standard-information-attribute.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Rustam Kovhaev <rkovhaev(a)gmail.com>
Subject: ntfs: check for valid standard information attribute
Mounting a corrupted filesystem with NTFS resulted in a kernel crash.
We should check for valid STANDARD_INFORMATION attribute offset and length
before trying to access it
Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com
Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969
Signed-off-by: Rustam Kovhaev <rkovhaev(a)gmail.com>
Reported-by: syzbot+c584225dabdea2f71969(a)syzkaller.appspotmail.com
Tested-by: syzbot+c584225dabdea2f71969(a)syzkaller.appspotmail.com
Acked-by: Anton Altaparmakov <anton(a)tuxera.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ntfs/inode.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/fs/ntfs/inode.c~ntfs-check-for-valid-standard-information-attribute
+++ a/fs/ntfs/inode.c
@@ -629,6 +629,12 @@ static int ntfs_read_locked_inode(struct
}
a = ctx->attr;
/* Get the standard information attribute value. */
+ if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+ + le32_to_cpu(a->data.resident.value_length) >
+ (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+ goto unm_err_out;
+ }
si = (STANDARD_INFORMATION*)((u8*)a +
le16_to_cpu(a->data.resident.value_offset));
_
Patches currently in -mm which might be from rkovhaev(a)gmail.com are
From: Florian Fainelli <florian.fainelli(a)broadcom.com>
Hi Greg, Sasha, Jaakub and David,
This patch series contains backports for a change that recently made it
upstream as:
commit f3f9be9c58085d11f4448ec199bf49dc2f9b7fb9
Merge: 18755e270666 f9b3827ee66c
Author: Jakub Kicinski <kuba(a)kernel.org>
Date: Tue Feb 23 12:23:06 2021 -0800
Merge branch 'net-dsa-learning-fixes-for-b53-bcm_sf2'
The way this was fixed in the netdev group's net tree is slightly
different from how it should be backported to stable trees which is why
you will find a patch for each branch in the thread started by this
cover letter.
Let me know if this does not apply for some reason. The changes from 4.9
through 4.19 are nearly identical and then from 5.4 through 5.11 are
about the same.
Thank you very much!
--
2.25.1
On Thu, 25 Feb 2021 08:53:50 -0500
Tony Krowiak <akrowiak(a)linux.ibm.com> wrote:
> If we add the proposed flag to indicate when the matrix_mdev->kvm
> pointer is in flux, then we can check that before allowing the functions
> in the list above to proceed.
I'm not against that. Go ahead!
Regards,
Halil
Hi Greg,
Here is a cherry-pick of a bluetooth fix which just landed in Linus' tree.
A lot of users are complaining about this:
https://bugzilla.kernel.org/show_bug.cgi?id=210681
It would be good if we can get this added to the 5.10.y and 5.11.y series.
Older kernels are not affected unless the commit mentioned by the Fixes
tag was backported.
I've based this cherry-pick on top of 5.10.y, it should also apply cleanly
to 5.11.y.
Regards,
Hans
Hui Wang (1):
Bluetooth: btusb: Some Qualcomm Bluetooth adapters stop working
drivers/bluetooth/btusb.c | 7 +++++++
1 file changed, 7 insertions(+)
--
2.30.1
On Wed, 24 Feb 2021 22:28:50 -0500
Tony Krowiak <akrowiak(a)linux.ibm.com> wrote:
> >> static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
> >> {
> >> - kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
> >> - matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
> >> - vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
> >> - kvm_put_kvm(matrix_mdev->kvm);
> >> - matrix_mdev->kvm = NULL;
> >> + struct kvm *kvm;
> >> +
> >> + if (matrix_mdev->kvm) {
> >> + kvm = matrix_mdev->kvm;
> >> + kvm_get_kvm(kvm);
> >> + matrix_mdev->kvm = NULL;
> > I think if there were two threads dong the unset in parallel, one
> > of them could bail out and carry on before the cleanup is done. But
> > since nothing much happens in release after that, I don't see an
> > immediate problem.
> >
> > Another thing to consider is, that setting ->kvm to NULL arms
> > vfio_ap_mdev_remove()...
>
> I'm not entirely sure what you mean by this, but my
> assumption is that you are talking about the check
> for matrix_mdev->kvm != NULL at the start of
> that function.
Yes I was talking about the check
static int vfio_ap_mdev_remove(struct mdev_device *mdev)
{
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
if (matrix_mdev->kvm)
return -EBUSY;
...
kfree(matrix_mdev);
...
}
As you see, we bail out if kvm is still set, otherwise we clean up the
matrix_mdev which includes kfree-ing it. And vfio_ap_mdev_remove() is
initiated via the sysfs, i.e. can be initiated at any time. If we were
to free matrix_mdev in mdev_remove() and then carry on with kvm_unset()
with mutex_lock(&matrix_dev->lock); that would be bad.
> The reason
> matrix_mdev->kvm is set to NULL before giving up
> the matrix_dev->lock is so that functions that check
> for the presence of the matrix_mdev->kvm pointer,
> such as assign_adapter_store() - will exit if they get
> control while the masks are being cleared.
I disagree!
static ssize_t assign_adapter_store(struct device *dev,
struct device_attribute *attr,
const char *buf, size_t count)
{
int ret;
unsigned long apid;
struct mdev_device *mdev = mdev_from_dev(dev);
struct ap_matrix_mdev *matrix_mdev = mdev_get_drvdata(mdev);
/* If the guest is running, disallow assignment of adapter */
if (matrix_mdev->kvm)
return -EBUSY;
We bail out when kvm != NULL, so having it set to NULL while the
mask are being cleared will make these not bail out.
> So what we have
> here is a catch-22; in other words, we have the case
> you pointed out above and the cases related to
> assigning/unassigning adapters, domains and
> control domains which should exit when a guest
> is running.
See above.
>
> I may have an idea to resolve this. Suppose we add:
>
> struct ap_matrix_mdev {
> ...
> bool kvm_busy;
> ...
> }
>
> This flag will be set to true at the start of both the
> vfio_ap_mdev_set_kvm() and vfio_ap_mdev_unset_kvm()
> and set to false at the end. The assignment/unassignment
> and remove callback functions can test this flag and
> return -EBUSY if the flag is true. That will preclude assigning
> or unassigning adapters, domains and control domains when
> the KVM pointer is being set/unset. Likewise, removal of the
> mediated device will also be prevented while the KVM pointer
> is being set/unset.
>
> In the case of the PQAP handler function, it can wait for the
> set/unset of the KVM pointer as follows:
>
> /while (matrix_mdev->kvm_busy) {//
> // mutex_unlock(&matrix_dev->lock);//
> // msleep(100);//
> // mutex_lock(&matrix_dev->lock);//
> //}//
> //
> //if (!matrix_mdev->kvm)//
> // goto out_unlock;
>
> /What say you?
> //
I'm not sure. Since I disagree with your analysis above it is difficult
to deal with the conclusion. I'm not against decoupling the tracking of
the state of the mdev_matrix device from the value of the kvm pointer. I
think we should first get a common understanding of the problem, before
we proceed to the solution.
Regards,
Halil
When vbus auto discharge is enabled, TCPM can sometimes be faster than
the TCPC i.e. TCPM can go ahead and move the port to unattached state
(involves disabling vbus auto discharge) before TCPC could effectively
discharge vbus to VSAFE0V. This leaves vbus with residual charge and
increases the decay time which prevents tsafe0v from being met.
This change introduces a new state VBUS_DISCHARGE where the TCPM waits
for a maximum of tSafe0V(max) for vbus to discharge to VSAFE0V before
transitioning to unattached state and re-enable toggling. If vbus
discharges to vsafe0v sooner, then, transition to unattached state
happens right away.
Also, while in SNK_READY, when auto discharge is enabled, drive
disconnect based on vbus turning off instead of Rp disappearing on
CC pins. Rp disappearing on CC pins is almost instanteous compared
to vbus decay.
Fixes: f321a02caebd ("usb: typec: tcpm: Implement enabling Auto
Discharge disconnect support")
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
Changes since V1:
- Add Fixes tag
---
drivers/usb/typec/tcpm/tcpm.c | 60 +++++++++++++++++++++++++++++++----
1 file changed, 53 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index be0b6469dd3d..0ed71725980f 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -62,6 +62,8 @@
S(SNK_TRANSITION_SINK_VBUS), \
S(SNK_READY), \
\
+ S(VBUS_DISCHARGE), \
+ \
S(ACC_UNATTACHED), \
S(DEBUG_ACC_ATTACHED), \
S(AUDIO_ACC_ATTACHED), \
@@ -438,6 +440,9 @@ struct tcpm_port {
enum tcpm_ams next_ams;
bool in_ams;
+ /* Auto vbus discharge state */
+ bool auto_vbus_discharge_enabled;
+
#ifdef CONFIG_DEBUG_FS
struct dentry *dentry;
struct mutex logbuffer_lock; /* log buffer access lock */
@@ -3413,6 +3418,8 @@ static int tcpm_src_attach(struct tcpm_port *port)
if (port->tcpc->enable_auto_vbus_discharge) {
ret = port->tcpc->enable_auto_vbus_discharge(port->tcpc, true);
tcpm_log_force(port, "enable vbus discharge ret:%d", ret);
+ if (!ret)
+ port->auto_vbus_discharge_enabled = true;
}
ret = tcpm_set_roles(port, true, TYPEC_SOURCE, tcpm_data_role_for_source(port));
@@ -3495,6 +3502,8 @@ static void tcpm_reset_port(struct tcpm_port *port)
if (port->tcpc->enable_auto_vbus_discharge) {
ret = port->tcpc->enable_auto_vbus_discharge(port->tcpc, false);
tcpm_log_force(port, "Disable vbus discharge ret:%d", ret);
+ if (!ret)
+ port->auto_vbus_discharge_enabled = false;
}
port->in_ams = false;
port->ams = NONE_AMS;
@@ -3568,6 +3577,8 @@ static int tcpm_snk_attach(struct tcpm_port *port)
tcpm_set_auto_vbus_discharge_threshold(port, TYPEC_PWR_MODE_USB, false, VSAFE5V);
ret = port->tcpc->enable_auto_vbus_discharge(port->tcpc, true);
tcpm_log_force(port, "enable vbus discharge ret:%d", ret);
+ if (!ret)
+ port->auto_vbus_discharge_enabled = true;
}
ret = tcpm_set_roles(port, true, TYPEC_SINK, tcpm_data_role_for_sink(port));
@@ -3684,6 +3695,12 @@ static void run_state_machine(struct tcpm_port *port)
switch (port->state) {
case TOGGLING:
break;
+ case VBUS_DISCHARGE:
+ if (port->port_type == TYPEC_PORT_SRC)
+ tcpm_set_state(port, SRC_UNATTACHED, PD_T_SAFE_0V);
+ else
+ tcpm_set_state(port, SNK_UNATTACHED, PD_T_SAFE_0V);
+ break;
/* SRC states */
case SRC_UNATTACHED:
if (!port->non_pd_role_swap)
@@ -4669,7 +4686,9 @@ static void _tcpm_cc_change(struct tcpm_port *port, enum typec_cc_status cc1,
case SRC_READY:
if (tcpm_port_is_disconnected(port) ||
!tcpm_port_is_source(port)) {
- if (port->port_type == TYPEC_PORT_SRC)
+ if (port->auto_vbus_discharge_enabled && !port->vbus_vsafe0v)
+ tcpm_set_state(port, VBUS_DISCHARGE, 0);
+ else if (port->port_type == TYPEC_PORT_SRC)
tcpm_set_state(port, SRC_UNATTACHED, 0);
else
tcpm_set_state(port, SNK_UNATTACHED, 0);
@@ -4703,7 +4722,18 @@ static void _tcpm_cc_change(struct tcpm_port *port, enum typec_cc_status cc1,
tcpm_set_state(port, SNK_DEBOUNCED, 0);
break;
case SNK_READY:
- if (tcpm_port_is_disconnected(port))
+ /*
+ * When set_auto_vbus_discharge_threshold is enabled, CC pins go
+ * away before vbus decays to disconnect threshold. Allow
+ * disconnect to be driven by vbus disconnect when auto vbus
+ * discharge is enabled.
+ *
+ * EXIT condition is based primarily on vbus disconnect and CC is secondary.
+ * "A port that has entered into USB PD communications with the Source and
+ * has seen the CC voltage exceed vRd-USB may monitor the CC pin to detect
+ * cable disconnect in addition to monitoring VBUS.
+ */
+ if (!port->auto_vbus_discharge_enabled && tcpm_port_is_disconnected(port))
tcpm_set_state(port, unattached_state(port), 0);
else if (!port->pd_capable &&
(cc1 != old_cc1 || cc2 != old_cc2))
@@ -4803,9 +4833,16 @@ static void _tcpm_cc_change(struct tcpm_port *port, enum typec_cc_status cc1,
*/
break;
+ case VBUS_DISCHARGE:
+ /* Do nothing. Waiting for vsafe0v signal */
+ break;
default:
- if (tcpm_port_is_disconnected(port))
- tcpm_set_state(port, unattached_state(port), 0);
+ if (tcpm_port_is_disconnected(port)) {
+ if (port->auto_vbus_discharge_enabled && !port->vbus_vsafe0v)
+ tcpm_set_state(port, VBUS_DISCHARGE, 0);
+ else
+ tcpm_set_state(port, unattached_state(port), 0);
+ }
break;
}
}
@@ -4988,9 +5025,12 @@ static void _tcpm_pd_vbus_off(struct tcpm_port *port)
break;
default:
- if (port->pwr_role == TYPEC_SINK &&
- port->attached)
- tcpm_set_state(port, SNK_UNATTACHED, 0);
+ if (port->pwr_role == TYPEC_SINK && port->attached) {
+ if (port->auto_vbus_discharge_enabled && !port->vbus_vsafe0v)
+ tcpm_set_state(port, VBUS_DISCHARGE, 0);
+ else
+ tcpm_set_state(port, SNK_UNATTACHED, 0);
+ }
break;
}
}
@@ -5012,6 +5052,12 @@ static void _tcpm_pd_vbus_vsafe0v(struct tcpm_port *port)
tcpm_set_state(port, tcpm_try_snk(port) ? SNK_TRY : SRC_ATTACHED,
PD_T_CC_DEBOUNCE);
break;
+ case VBUS_DISCHARGE:
+ if (port->port_type == TYPEC_PORT_SRC)
+ tcpm_set_state(port, SRC_UNATTACHED, 0);
+ else
+ tcpm_set_state(port, SNK_UNATTACHED, 0);
+ break;
default:
break;
}
--
2.30.0.617.g56c4b15f3c-goog
From: Thomas Gleixner <tglx(a)linutronix.de>
The handle_exit_race() function is defined in commit c158b461306df82
("futex: Cure exit race"), which never returns -EBUSY. This results
in a small piece of dead code in the attach_to_pi_owner() function:
int ret = handle_exit_race(uaddr, uval, p); /* Never return -EBUSY */
...
if (ret == -EBUSY)
*exiting = p; /* dead code */
The return value -EBUSY is added to handle_exit_race() in upsteam
commit ac31c7ff8624409 ("futex: Provide distinct return value when
owner is exiting"). This commit was incorporated into v4.9.255, before
the function handle_exit_race() was introduced, whitout Modify
handle_exit_race().
To fix dead code, extract the change of handle_exit_race() from
commit ac31c7ff8624409 ("futex: Provide distinct return value when owner
is exiting"), re-incorporated.
Fixes: c158b461306df82 ("futex: Cure exit race")
Cc: stable(a)vger.kernel.org # 4.9.258-rc1
Signed-off-by: Xiaoming Ni <nixiaoming(a)huawei.com>
---
kernel/futex.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/kernel/futex.c b/kernel/futex.c
index b65dbb5d60bb..0fd785410150 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -1207,11 +1207,11 @@ static int handle_exit_race(u32 __user *uaddr, u32 uval,
u32 uval2;
/*
- * If the futex exit state is not yet FUTEX_STATE_DEAD, wait
- * for it to finish.
+ * If the futex exit state is not yet FUTEX_STATE_DEAD, tell the
+ * caller that the alleged owner is busy.
*/
if (tsk && tsk->futex_state != FUTEX_STATE_DEAD)
- return -EAGAIN;
+ return -EBUSY;
/*
* Reread the user space value to handle the following situation:
--
2.27.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a9545779ee9e9e103648f6f2552e73cfe808d0f4 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 8 Feb 2021 12:19:40 -0800
Subject: [PATCH] KVM: Use kvm_pfn_t for local PFN variable in
hva_to_pfn_remapped()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair. In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.
This bug was inadvertantly exposed by commit bd2fae8da794 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to comlain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
to ‘long unsigned int’ changes value from
‘9218868437227405314’ to ‘2’ [-Werror=overflow]
89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
| ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
Cc: stable(a)vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210208201940.1258328-1-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee4ac2618ec5..001b9de4e727 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1906,7 +1906,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool write_fault, bool *writable,
kvm_pfn_t *p_pfn)
{
- unsigned long pfn;
+ kvm_pfn_t pfn;
pte_t *ptep;
spinlock_t *ptl;
int r;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a9545779ee9e9e103648f6f2552e73cfe808d0f4 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 8 Feb 2021 12:19:40 -0800
Subject: [PATCH] KVM: Use kvm_pfn_t for local PFN variable in
hva_to_pfn_remapped()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair. In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.
This bug was inadvertantly exposed by commit bd2fae8da794 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to comlain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
to ‘long unsigned int’ changes value from
‘9218868437227405314’ to ‘2’ [-Werror=overflow]
89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
| ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
Cc: stable(a)vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210208201940.1258328-1-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee4ac2618ec5..001b9de4e727 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1906,7 +1906,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool write_fault, bool *writable,
kvm_pfn_t *p_pfn)
{
- unsigned long pfn;
+ kvm_pfn_t pfn;
pte_t *ptep;
spinlock_t *ptl;
int r;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From a9545779ee9e9e103648f6f2552e73cfe808d0f4 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Mon, 8 Feb 2021 12:19:40 -0800
Subject: [PATCH] KVM: Use kvm_pfn_t for local PFN variable in
hva_to_pfn_remapped()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Use kvm_pfn_t, a.k.a. u64, for the local 'pfn' variable when retrieving
a so called "remapped" hva/pfn pair. In theory, the hva could resolve to
a pfn in high memory on a 32-bit kernel.
This bug was inadvertantly exposed by commit bd2fae8da794 ("KVM: do not
assume PTE is writable after follow_pfn"), which added an error PFN value
to the mix, causing gcc to comlain about overflowing the unsigned long.
arch/x86/kvm/../../../virt/kvm/kvm_main.c: In function ‘hva_to_pfn_remapped’:
include/linux/kvm_host.h:89:30: error: conversion from ‘long long unsigned int’
to ‘long unsigned int’ changes value from
‘9218868437227405314’ to ‘2’ [-Werror=overflow]
89 | #define KVM_PFN_ERR_RO_FAULT (KVM_PFN_ERR_MASK + 2)
| ^
virt/kvm/kvm_main.c:1935:9: note: in expansion of macro ‘KVM_PFN_ERR_RO_FAULT’
Cc: stable(a)vger.kernel.org
Fixes: add6a0cd1c5b ("KVM: MMU: try to fix up page faults before giving up")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210208201940.1258328-1-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index ee4ac2618ec5..001b9de4e727 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1906,7 +1906,7 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
bool write_fault, bool *writable,
kvm_pfn_t *p_pfn)
{
- unsigned long pfn;
+ kvm_pfn_t pfn;
pte_t *ptep;
spinlock_t *ptl;
int r;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd2fae8da794b55bf2ac02632da3a151b10e664c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 1 Feb 2021 05:12:11 -0500
Subject: [PATCH] KVM: do not assume PTE is writable after follow_pfn
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af33 ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd(a)google.com>
Cc: 3pvd(a)google.com
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..335a1a2b8edc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1904,9 +1904,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
kvm_pfn_t *p_pfn)
{
unsigned long pfn;
+ pte_t *ptep;
+ spinlock_t *ptl;
int r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1921,14 +1923,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r)
return r;
+ }
+ if (write_fault && !pte_write(*ptep)) {
+ pfn = KVM_PFN_ERR_RO_FAULT;
+ goto out;
}
if (writable)
- *writable = true;
+ *writable = pte_write(*ptep);
+ pfn = pte_pfn(*ptep);
/*
* Get a reference here because callers of *hva_to_pfn* and
@@ -1943,6 +1950,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
*/
kvm_get_pfn(pfn);
+out:
+ pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
return 0;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd2fae8da794b55bf2ac02632da3a151b10e664c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 1 Feb 2021 05:12:11 -0500
Subject: [PATCH] KVM: do not assume PTE is writable after follow_pfn
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af33 ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd(a)google.com>
Cc: 3pvd(a)google.com
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..335a1a2b8edc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1904,9 +1904,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
kvm_pfn_t *p_pfn)
{
unsigned long pfn;
+ pte_t *ptep;
+ spinlock_t *ptl;
int r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1921,14 +1923,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r)
return r;
+ }
+ if (write_fault && !pte_write(*ptep)) {
+ pfn = KVM_PFN_ERR_RO_FAULT;
+ goto out;
}
if (writable)
- *writable = true;
+ *writable = pte_write(*ptep);
+ pfn = pte_pfn(*ptep);
/*
* Get a reference here because callers of *hva_to_pfn* and
@@ -1943,6 +1950,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
*/
kvm_get_pfn(pfn);
+out:
+ pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
return 0;
}
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd2fae8da794b55bf2ac02632da3a151b10e664c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 1 Feb 2021 05:12:11 -0500
Subject: [PATCH] KVM: do not assume PTE is writable after follow_pfn
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af33 ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd(a)google.com>
Cc: 3pvd(a)google.com
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..335a1a2b8edc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1904,9 +1904,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
kvm_pfn_t *p_pfn)
{
unsigned long pfn;
+ pte_t *ptep;
+ spinlock_t *ptl;
int r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1921,14 +1923,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r)
return r;
+ }
+ if (write_fault && !pte_write(*ptep)) {
+ pfn = KVM_PFN_ERR_RO_FAULT;
+ goto out;
}
if (writable)
- *writable = true;
+ *writable = pte_write(*ptep);
+ pfn = pte_pfn(*ptep);
/*
* Get a reference here because callers of *hva_to_pfn* and
@@ -1943,6 +1950,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
*/
kvm_get_pfn(pfn);
+out:
+ pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
return 0;
}
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd2fae8da794b55bf2ac02632da3a151b10e664c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 1 Feb 2021 05:12:11 -0500
Subject: [PATCH] KVM: do not assume PTE is writable after follow_pfn
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af33 ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd(a)google.com>
Cc: 3pvd(a)google.com
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..335a1a2b8edc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1904,9 +1904,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
kvm_pfn_t *p_pfn)
{
unsigned long pfn;
+ pte_t *ptep;
+ spinlock_t *ptl;
int r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1921,14 +1923,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r)
return r;
+ }
+ if (write_fault && !pte_write(*ptep)) {
+ pfn = KVM_PFN_ERR_RO_FAULT;
+ goto out;
}
if (writable)
- *writable = true;
+ *writable = pte_write(*ptep);
+ pfn = pte_pfn(*ptep);
/*
* Get a reference here because callers of *hva_to_pfn* and
@@ -1943,6 +1950,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
*/
kvm_get_pfn(pfn);
+out:
+ pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
return 0;
}
This is the start of the stable review cycle for the 4.4.258 release.
There are 35 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.258-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.258-rc1
Lai Jiangshan <laijs(a)linux.alibaba.com>
kvm: check tlbs_dirty directly
Arun Easi <aeasi(a)marvell.com>
scsi: qla2xxx: Fix crash during driver load on big endian machines
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Vasily Gorbik <gor(a)linux.ibm.com>
tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
Greg Thelen <gthelen(a)google.com>
tracing: Fix SKIP_STACK_VALIDATION=1 build due to bad merge with -mrecord-mcount
Andi Kleen <ak(a)linux.intel.com>
trace: Use -mcount-record for dynamic ftrace
Borislav Petkov <bp(a)suse.de>
x86/build: Disable CET instrumentation in the kernel for 32-bit too
Stefano Garzarella <sgarzare(a)redhat.com>
vsock: fix locking in vsock_shutdown()
Edwin Peer <edwin.peer(a)broadcom.com>
net: watchdog: hold device global xmit lock during tx disable
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
Felipe Balbi <balbi(a)kernel.org>
usb: dwc3: ulpi: fix checkpatch warning
Randy Dunlap <rdunlap(a)infradead.org>
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
Jozsef Kadlecsik <kadlec(a)mail.kfki.hu>
netfilter: xt_recent: Fix attempt to update deleted entry
Roman Gushchin <guro(a)fb.com>
memblock: do not start bottom-up allocations with kernel_end
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in xattr id lookup
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in inode lookup
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in id lookup
Theodore Ts'o <tytso(a)mit.edu>
memcg: fix a crash in wb_workfn when a device disappears
Qian Cai <cai(a)lca.pw>
include/trace/events/writeback.h: fix -Wstringop-truncation warnings
Tobin C. Harding <tobin(a)kernel.org>
lib/string: Add strscpy_pad() function
Dave Wysochanski <dwysocha(a)redhat.com>
SUNRPC: Handle 0 length opaque XDR object data properly
Dave Wysochanski <dwysocha(a)redhat.com>
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
Johannes Berg <johannes.berg(a)intel.com>
iwlwifi: mvm: guard against device removal in reprobe
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
Cong Wang <cong.wang(a)bytedance.com>
af_key: relax availability checks for skb size calculation
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
fgraph: Initialize tracing_graph_pause at task creation
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Do not count ftrace events in top level enable output
-------------
Diffstat:
Makefile | 11 +++++-
arch/arm/xen/p2m.c | 6 ++-
arch/h8300/kernel/asm-offsets.c | 3 ++
arch/x86/Makefile | 6 +--
arch/x86/xen/p2m.c | 15 ++++----
drivers/block/xen-blkback/blkback.c | 28 ++++++++------
drivers/net/wireless/iwlwifi/mvm/ops.c | 3 +-
drivers/net/wireless/iwlwifi/pcie/tx.c | 5 +++
drivers/net/xen-netback/netback.c | 4 +-
drivers/scsi/qla2xxx/qla_tmpl.c | 9 +++--
drivers/scsi/qla2xxx/qla_tmpl.h | 2 +-
drivers/usb/dwc3/ulpi.c | 20 ++++++++--
drivers/xen/gntdev.c | 33 +++++++++++------
drivers/xen/xen-scsiback.c | 4 +-
fs/fs-writeback.c | 2 +-
fs/squashfs/export.c | 41 ++++++++++++++++----
fs/squashfs/id.c | 40 ++++++++++++++++----
fs/squashfs/squashfs_fs_sb.h | 1 +
fs/squashfs/super.c | 6 +--
fs/squashfs/xattr.h | 10 ++++-
fs/squashfs/xattr_id.c | 66 ++++++++++++++++++++++++++++-----
include/linux/backing-dev.h | 10 +++++
include/linux/ftrace.h | 4 +-
include/linux/netdevice.h | 2 +
include/linux/string.h | 4 ++
include/linux/sunrpc/xdr.h | 3 +-
include/trace/events/writeback.h | 35 +++++++++--------
include/xen/grant_table.h | 1 +
kernel/trace/ftrace.c | 2 -
kernel/trace/trace_events.c | 3 +-
lib/string.c | 47 +++++++++++++++++++----
mm/backing-dev.c | 1 +
mm/memblock.c | 49 +++---------------------
net/key/af_key.c | 6 +--
net/netfilter/xt_recent.c | 12 +++++-
net/sunrpc/auth_gss/auth_gss.c | 30 +--------------
net/sunrpc/auth_gss/auth_gss_internal.h | 45 ++++++++++++++++++++++
net/sunrpc/auth_gss/gss_krb5_mech.c | 31 +---------------
net/vmw_vsock/af_vsock.c | 8 ++--
scripts/Makefile.build | 3 ++
virt/kvm/kvm_main.c | 3 +-
41 files changed, 389 insertions(+), 225 deletions(-)
The patch below does not apply to the 5.11-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f156abec725f945f9884bc6a5bd0dccb5aac16a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 3 Feb 2021 16:01:06 -0800
Subject: [PATCH] KVM: x86: Set so called 'reserved CR3 bits in LM mask' at
vCPU reset
Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
reset. The reserved bits check needs to be done even if userspace never
configures the guest's CPUID model.
Cc: stable(a)vger.kernel.org
Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210204000117.3303214-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 838ce5e9814b..10414a78b951 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10080,6 +10080,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
fx_init(vcpu);
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
+ vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f156abec725f945f9884bc6a5bd0dccb5aac16a8 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Wed, 3 Feb 2021 16:01:06 -0800
Subject: [PATCH] KVM: x86: Set so called 'reserved CR3 bits in LM mask' at
vCPU reset
Set cr3_lm_rsvd_bits, which is effectively an invalid GPA mask, at vCPU
reset. The reserved bits check needs to be done even if userspace never
configures the guest's CPUID model.
Cc: stable(a)vger.kernel.org
Fixes: 0107973a80ad ("KVM: x86: Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210204000117.3303214-2-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 838ce5e9814b..10414a78b951 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10080,6 +10080,7 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
fx_init(vcpu);
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
+ vcpu->arch.cr3_lm_rsvd_bits = rsvd_bits(cpuid_maxphyaddr(vcpu), 63);
vcpu->arch.pat = MSR_IA32_CR_PAT_DEFAULT;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From bd2fae8da794b55bf2ac02632da3a151b10e664c Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini(a)redhat.com>
Date: Mon, 1 Feb 2021 05:12:11 -0500
Subject: [PATCH] KVM: do not assume PTE is writable after follow_pfn
In order to convert an HVA to a PFN, KVM usually tries to use
the get_user_pages family of functinso. This however is not
possible for VM_IO vmas; in that case, KVM instead uses follow_pfn.
In doing this however KVM loses the information on whether the
PFN is writable. That is usually not a problem because the main
use of VM_IO vmas with KVM is for BARs in PCI device assignment,
however it is a bug. To fix it, use follow_pte and check pte_write
while under the protection of the PTE lock. The information can
be used to fail hva_to_pfn_remapped or passed back to the
caller via *writable.
Usage of follow_pfn was introduced in commit add6a0cd1c5b ("KVM: MMU: try to fix
up page faults before giving up", 2016-07-05); however, even older version
have the same issue, all the way back to commit 2e2e3738af33 ("KVM:
Handle vma regions with no backing page", 2008-07-20), as they also did
not check whether the PFN was writable.
Fixes: 2e2e3738af33 ("KVM: Handle vma regions with no backing page")
Reported-by: David Stevens <stevensd(a)google.com>
Cc: 3pvd(a)google.com
Cc: Jann Horn <jannh(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 8367d88ce39b..335a1a2b8edc 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1904,9 +1904,11 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
kvm_pfn_t *p_pfn)
{
unsigned long pfn;
+ pte_t *ptep;
+ spinlock_t *ptl;
int r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r) {
/*
* get_user_pages fails for VM_IO and VM_PFNMAP vmas and does
@@ -1921,14 +1923,19 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
if (r)
return r;
- r = follow_pfn(vma, addr, &pfn);
+ r = follow_pte(vma->vm_mm, addr, NULL, &ptep, NULL, &ptl);
if (r)
return r;
+ }
+ if (write_fault && !pte_write(*ptep)) {
+ pfn = KVM_PFN_ERR_RO_FAULT;
+ goto out;
}
if (writable)
- *writable = true;
+ *writable = pte_write(*ptep);
+ pfn = pte_pfn(*ptep);
/*
* Get a reference here because callers of *hva_to_pfn* and
@@ -1943,6 +1950,8 @@ static int hva_to_pfn_remapped(struct vm_area_struct *vma,
*/
kvm_get_pfn(pfn);
+out:
+ pte_unmap_unlock(ptep, ptl);
*p_pfn = pfn;
return 0;
}
Hi,
the attached patches are the backport of these 2 patches for tools/Makefile
that allows building when OpenSSL is not at the default location. They apply
cleanly to both to 5.4.99 and 4.19.176. Backports for older stable kernels
will follow.
Greetings,
Eike
--
Rolf Eike Beer, emlix GmbH, http://www.emlix.com
Fon +49 551 30664-0, Fax +49 551 30664-11
Gothaer Platz 3, 37083 Göttingen, Germany
Sitz der Gesellschaft: Göttingen, Amtsgericht Göttingen HR B 3160
Geschäftsführung: Heike Jordan, Dr. Uwe Kracke – Ust-IdNr.: DE 205 198 055
emlix - smart embedded open source
This is the start of the stable review cycle for the 5.4.100 release.
There are 13 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.100-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.100-rc1
Matwey V. Kornilov <matwey(a)sai.msu.ru>
media: pwc: Use correct device for DMA
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Wang Hai <wanghai38(a)huawei.com>
net: bridge: Fix a warning when del bridge sysfs
Loic Poulain <loic.poulain(a)linaro.org>
net: qrtr: Fix port ID for control messages
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: SEV: fix double locking due to incorrect backport
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/xen/p2m.c | 6 ++++--
arch/x86/kvm/svm.c | 1 -
arch/x86/xen/p2m.c | 15 +++++++--------
drivers/block/xen-blkback/blkback.c | 30 ++++++++++++++++--------------
drivers/media/usb/pwc/pwc-if.c | 22 +++++++++++++---------
drivers/net/xen-netback/netback.c | 4 +---
drivers/xen/gntdev.c | 37 ++++++++++++++++++++-----------------
drivers/xen/xen-scsiback.c | 4 ++--
include/xen/grant_table.h | 1 +
net/bridge/br.c | 5 ++++-
net/qrtr/qrtr.c | 2 +-
12 files changed, 71 insertions(+), 60 deletions(-)
Hi
Please do consider applying 234f414efd11 ("Bluetooth: btusb: Some
Qualcomm Bluetooth adapters stop working") back to 5.10.y versions.
The issue got introduced in starting in 5.10-rc1.
Greg, I realize this is for now only in mainline but not even in
released tag, so following your earlier suggestion this might as well
be delayed a bit.
In https://bugzilla.kernel.org/show_bug.cgi?id=211571,
https://bugzilla.kernel.org/show_bug.cgi?id=210681 and
https://bugs.debian.org/981005 several users indicated to be affected
and would appreciate a fix to be backported to the stable series.
Regards,
Salvatore
While testing MST hotplug events on daisy chain monitors, find out
that CLEAR_PAYLOAD_ID_TABLE is not broadcasted and payload id table
is not reset. Dig in deeper and find out two parts needed to be fixed.
1. Link_Count_Total & Link_Count_Remaining of Broadcast message are
incorrect. Should set lct=1 & lcr=6
2. CLEAR_PAYLOAD_ID_TABLE request message is not set as path broadcast
request message. Should fix this.
Changes since v1:
*Refer to the suggestion from Ville Syrjala. While preparing hdr-rad,
take broadcast case into consideration.
Wayne Lin (2):
drm/dp_mst: Revise broadcast msg lct & lcr
drm/dp_mst: Set CLEAR_PAYLOAD_ID_TABLE as broadcast
drivers/gpu/drm/drm_dp_mst_topology.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)
--
2.17.1
Commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM
pointer invalidated") introduced a change that results in a circular
lockdep when a Secure Execution guest that is configured with
crypto devices is started. The problem resulted due to the fact that the
patch moved the setting of the guest's AP masks within the protection of
the matrix_dev->lock when the vfio_ap driver is notified that the KVM
pointer has been set. Since it is not critical that setting/clearing of
the guest's AP masks be done under the matrix_dev->lock when the driver is
notified, the masks will not be updated under the matrix_dev->lock. The
lock is necessary for the setting/unsetting of the KVM pointer, however,
so that will remain in place.
The dependency chain for the circular lockdep resolved by this patch
is (in reverse order):
2: vfio_ap_mdev_group_notifier: kvm->lock
matrix_dev->lock
1: handle_pqap: matrix_dev->lock
kvm_vcpu_ioctl: vcpu->mutex
0: kvm_s390_cpus_to_pv: vcpu->mutex
kvm_vm_ioctl: kvm->lock
Please note that if checkpatch is run against this patch series, you may
get a "WARNING: Unknown commit id 'f21916ec4826', maybe rebased or not
pulled?" message. The commit 'f21916ec4826', however, is definitely
in the master branch on top of which this patch series was built, so I'm
not sure why this message is being output by checkpatch.
Change log v1=> v2:
------------------
* No longer holding the matrix_dev->lock prior to setting/clearing the
masks supplying the AP configuration to a KVM guest.
* Make all updates to the data in the matrix mdev that is used to manage
AP resources used by the KVM guest in the vfio_ap_mdev_set_kvm() function
instead of the group notifier callback.
* Check for the matrix mdev's KVM pointer in the vfio_ap_mdev_unset_kvm()
function instead of the vfio_ap_mdev_release() function.
Tony Krowiak (1):
s390/vfio-ap: fix circular lockdep when setting/clearing crypto masks
drivers/s390/crypto/vfio_ap_ops.c | 119 +++++++++++++++++++++---------
1 file changed, 84 insertions(+), 35 deletions(-)
--
2.21.1
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, compaction: make fast_isolate_freepages() stay within zone
Compaction always operates on pages from a single given zone when
isolating both pages to migrate and freepages. Pageblock boundaries are
intersected with zone boundaries to be safe in case zone starts or ends in
the middle of pageblock. The use of pageblock_pfn_to_page() protects
against non-contiguous pageblocks.
The functions fast_isolate_freepages() and fast_isolate_around() don't
currently protect the fast freepage isolation thoroughly enough against
these corner cases, and can result in freepage isolation operate outside
of zone boundaries:
- in fast_isolate_freepages() if we get a pfn from the first pageblock
of a zone that starts in the middle of that pageblock, 'highest' can be
a pfn outside of the zone. If we fail to isolate anything in this
function, we may then call fast_isolate_around() on a pfn outside of the
zone and there effectively do a set_pageblock_skip(page_to_pfn(highest))
which may currently hit a VM_BUG_ON() in some configurations
- fast_isolate_around() checks only the zone end boundary and not
beginning, nor that the pageblock is contiguous (with
pageblock_pfn_to_page()) so it's possible that we end up calling
isolate_freepages_block() on a range of pfn's from two different zones
and end up e.g. isolating freepages under the wrong zone's lock.
This patch should fix the above issues.
Link: https://lkml.kernel.org/r/20210217173300.6394-1-vbabka@suse.cz
Fixes: 5a811889de10 ("mm, compaction: use free lists to quickly locate a migration target")
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/compaction.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
--- a/mm/compaction.c~mm-compaction-make-fast_isolate_freepages-stay-within-zone
+++ a/mm/compaction.c
@@ -1284,7 +1284,7 @@ static void
fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated)
{
unsigned long start_pfn, end_pfn;
- struct page *page = pfn_to_page(pfn);
+ struct page *page;
/* Do not search around if there are enough pages already */
if (cc->nr_freepages >= cc->nr_migratepages)
@@ -1295,8 +1295,12 @@ fast_isolate_around(struct compact_contr
return;
/* Pageblock boundaries */
- start_pfn = pageblock_start_pfn(pfn);
- end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone)) - 1;
+ start_pfn = max(pageblock_start_pfn(pfn), cc->zone->zone_start_pfn);
+ end_pfn = min(pageblock_end_pfn(pfn), zone_end_pfn(cc->zone));
+
+ page = pageblock_pfn_to_page(start_pfn, end_pfn, cc->zone);
+ if (!page)
+ return;
/* Scan before */
if (start_pfn != pfn) {
@@ -1398,7 +1402,8 @@ fast_isolate_freepages(struct compact_co
pfn = page_to_pfn(freepage);
if (pfn >= highest)
- highest = pageblock_start_pfn(pfn);
+ highest = max(pageblock_start_pfn(pfn),
+ cc->zone->zone_start_pfn);
if (pfn >= low_pfn) {
cc->fast_search_fail = 0;
@@ -1468,7 +1473,8 @@ fast_isolate_freepages(struct compact_co
} else {
if (cc->direct_compaction && pfn_valid(min_pfn)) {
page = pageblock_pfn_to_page(min_pfn,
- pageblock_end_pfn(min_pfn),
+ min(pageblock_end_pfn(min_pfn),
+ zone_end_pfn(cc->zone)),
cc->zone);
cc->free_pfn = min_pfn;
}
_
From: Dave Hansen <dave.hansen(a)linux.intel.com>
Subject: mm/vmscan: restore zone_reclaim_mode ABI
I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
sysctl. Like a good kernel developer, I also went to go update the
documentation. I noticed that the bits in the documentation didn't
match the bits in the #defines.
The VM never explicitly checks the RECLAIM_ZONE bit. The bit is,
however implicitly checked when checking 'node_reclaim_mode==0'. The
RECLAIM_ZONE #define was removed in a cleanup. That, by itself is
fine.
But, when the bit was removed (bit 0) the _other_ bit locations also got
changed. That's not OK because the bit values are documented to mean one
specific thing. Users surely do not expect the meaning to change from
kernel to kernel.
The end result is that if someone had a script that did:
sysctl vm.zone_reclaim_mode=1
it would have gone from enabling node reclaim for clean unmapped pages to
writing out pages during node reclaim after the commit in question.
That's not great.
Put the bits back the way they were and add a comment so something like
this is a bit harder to do again. Update the documentation to make it
clear that the first bit is ignored.
Link: https://lkml.kernel.org/r/20210219172555.FF0CDF23@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com>
Fixes: 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
Reviewed-by: Ben Widawsky <ben.widawsky(a)intel.com>
Reviewed-by: Oscar Salvador <osalvador(a)suse.de>
Acked-by: David Rientjes <rientjes(a)google.com>
Acked-by: Christoph Lameter <cl(a)linux.com>
Cc: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Daniel Wagner <dwagner(a)suse.de>
Cc: "Tobin C. Harding" <tobin(a)kernel.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Huang Ying <ying.huang(a)intel.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Qian Cai <cai(a)lca.pw>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/sysctl/vm.rst | 10 +++++-----
mm/vmscan.c | 9 +++++++--
2 files changed, 12 insertions(+), 7 deletions(-)
--- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/Documentation/admin-guide/sysctl/vm.rst
@@ -983,11 +983,11 @@ that benefit from having their data cach
left disabled as the caching effect is likely to be more important than
data locality.
-zone_reclaim may be enabled if it's known that the workload is partitioned
-such that each partition fits within a NUMA node and that accessing remote
-memory would cause a measurable performance reduction. The page allocator
-will then reclaim easily reusable pages (those page cache pages that are
-currently not used) before allocating off node pages.
+Consider enabling one or more zone_reclaim mode bits if it's known that the
+workload is partitioned such that each partition fits within a NUMA node
+and that accessing remote memory would cause a measurable performance
+reduction. The page allocator will take additional actions before
+allocating off node pages.
Allowing zone reclaim to write out pages stops processes that are
writing large amounts of data from dirtying pages on other nodes. Zone
--- a/mm/vmscan.c~mm-vmscan-restore-zone_reclaim_mode-abi
+++ a/mm/vmscan.c
@@ -4085,8 +4085,13 @@ module_init(kswapd_init)
*/
int node_reclaim_mode __read_mostly;
-#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
-#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
+/*
+ * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
+ * ABI. New bits are OK, but existing bits can never change.
+ */
+#define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
+#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
+#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
/*
* Priority for NODE_RECLAIM. This determines the fraction of pages
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix copy_huge_page_from_user contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine copy_huge_page_from_user can encounter gigantic pages, yet it
assumes page structs are contiguous when copying pages from user space.
Since page structs for the target gigantic page are not contiguous, the
data copied from user space could overwrite other pages not associated
with the gigantic page and cause data corruption.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated.
Link: https://lkml.kernel.org/r/20210217184926.33567-2-mike.kravetz@oracle.com
Fixes: 8fb5debc5fcd ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
--- a/mm/memory.c~hugetlb-fix-copy_huge_page_from_user-contig-page-struct-assumption
+++ a/mm/memory.c
@@ -5177,17 +5177,19 @@ long copy_huge_page_from_user(struct pag
void *page_kaddr;
unsigned long i, rc = 0;
unsigned long ret_val = pages_per_huge_page * PAGE_SIZE;
+ struct page *subpage = dst_page;
- for (i = 0; i < pages_per_huge_page; i++) {
+ for (i = 0; i < pages_per_huge_page;
+ i++, subpage = mem_map_next(subpage, dst_page, i)) {
if (allow_pagefault)
- page_kaddr = kmap(dst_page + i);
+ page_kaddr = kmap(subpage);
else
- page_kaddr = kmap_atomic(dst_page + i);
+ page_kaddr = kmap_atomic(subpage);
rc = copy_from_user(page_kaddr,
(const void __user *)(src + i * PAGE_SIZE),
PAGE_SIZE);
if (allow_pagefault)
- kunmap(dst_page + i);
+ kunmap(subpage);
else
kunmap_atomic(page_kaddr);
_
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: fix update_and_free_page contig page struct assumption
page structs are not guaranteed to be contiguous for gigantic pages. The
routine update_and_free_page can encounter a gigantic page, yet it assumes
page structs are contiguous when setting page flags in subpages.
If update_and_free_page encounters non-contiguous page structs, we can see
“BUG: Bad page state in process …” errors.
Non-contiguous page structs are generally not an issue. However, they can
exist with a specific kernel configuration and hotplug operations. For
example: Configure the kernel with CONFIG_SPARSEMEM and
!CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where
the gigantic page will be allocated. Zi Yan outlined steps to reproduce
here [1].
[1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidi…
Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com
Fixes: 944d9fec8d7a ("hugetlb: add support for gigantic page allocation at runtime")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Davidlohr Bueso <dbueso(a)suse.de>
Cc: "Kirill A . Shutemov" <kirill.shutemov(a)linux.intel.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Joao Martins <joao.m.martins(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/hugetlb.c~hugetlb-fix-update_and_free_page-contig-page-struct-assumption
+++ a/mm/hugetlb.c
@@ -1321,14 +1321,16 @@ static inline void destroy_compound_giga
static void update_and_free_page(struct hstate *h, struct page *page)
{
int i;
+ struct page *subpage = page;
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
h->nr_huge_pages--;
h->nr_huge_pages_node[page_to_nid(page)]--;
- for (i = 0; i < pages_per_huge_page(h); i++) {
- page[i].flags &= ~(1 << PG_locked | 1 << PG_error |
+ for (i = 0; i < pages_per_huge_page(h);
+ i++, subpage = mem_map_next(subpage, page, i)) {
+ subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
1 << PG_referenced | 1 << PG_dirty |
1 << PG_active | 1 << PG_private |
1 << PG_writeback);
_
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix get_active_memcg return value
We use a global percpu int_active_memcg variable to store the remote memcg
when we are in the interrupt context. But get_active_memcg always return
the current->active_memcg or root_mem_cgroup. The remote memcg (set in
the interrupt context) is ignored. This is not what we want. So fix it.
Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-get_active_memcg-return-value
+++ a/mm/memcontrol.c
@@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup
rcu_read_lock();
memcg = active_memcg();
- if (memcg) {
- /* current->active_memcg must hold a ref. */
- if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
- memcg = root_mem_cgroup;
- else
- memcg = current->active_memcg;
- }
+ /* remote memcg must hold a ref. */
+ if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
+ memcg = root_mem_cgroup;
rcu_read_unlock();
return memcg;
_
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix swap undercounting in cgroup2
When pages are swapped in, the VM may retain the swap copy to avoid
repeated writes in the future. It's also retained if shared pages are
faulted back in some processes, but not in others. During that time we
have an in-memory copy of the page, as well as an on-swap copy. Cgroup1
and cgroup2 handle these overlapping lifetimes slightly differently due to
the nature of how they account memory and swap:
Cgroup1 has a unified memory+swap counter that tracks a data page
regardless whether it's in-core or swapped out. On swapin, we transfer
the charge from the swap entry to the newly allocated swapcache page, even
though the swap entry might stick around for a while. That's why we have
a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge().
Cgroup2 tracks memory and swap as separate, independent resources and thus
has split memory and swap counters. On swapin, we charge the newly
allocated swapcache page as memory, while the swap slot in turn must
remain charged to the swap counter as long as its allocated too.
The cgroup2 logic was broken by commit 2d1c498072de ("mm: memcontrol: make
swap tracking an integral part of memory control"), because it
accidentally removed the do_memsw_account() check in the branch inside
mem_cgroup_uncharge() that was supposed to tell the difference between the
charge transfer in cgroup1 and the separate counters in cgroup2.
As a result, cgroup2 currently undercounts retained swap to varying
degrees: swap slots are cached up to 50% of the configured limit or total
available swap space; partially faulted back shared pages are only limited
by physical capacity. This in turn allows cgroups to significantly
overconsume their alloted swap space.
Add the do_memsw_account() check back to fix this problem.
Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com
Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part of memory control")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Acked-by: Johannes Weiner <hannes(a)cmpxchg.org>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org> [5.8+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-swap-undercounting-in-cgroup2
+++ a/mm/memcontrol.c
@@ -6748,7 +6748,19 @@ int mem_cgroup_charge(struct page *page,
memcg_check_events(memcg, page);
local_irq_enable();
- if (PageSwapCache(page)) {
+ /*
+ * Cgroup1's unified memory+swap counter has been charged with the
+ * new swapcache page, finish the transfer by uncharging the swap
+ * slot. The swap slot would also get uncharged when it dies, but
+ * it can stick around indefinitely and we'd count the page twice
+ * the entire time.
+ *
+ * Cgroup2 has separate resource counters for memory and swap,
+ * so this is a non-issue here. Memory and swap charge lifetimes
+ * correspond 1:1 to page and swap slot lifetimes: we charge the
+ * page to memory here, and uncharge swap when the slot is freed.
+ */
+ if (do_memsw_account() && PageSwapCache(page)) {
swp_entry_t entry = { .val = page_private(page) };
/*
* The swap entry might not get freed for a long time,
_
From: Rustam Kovhaev <rkovhaev(a)gmail.com>
Subject: ntfs: check for valid standard information attribute
Mounting a corrupted filesystem with NTFS resulted in a kernel crash.
We should check for valid STANDARD_INFORMATION attribute offset and length
before trying to access it
Link: https://lkml.kernel.org/r/20210217155930.1506815-1-rkovhaev@gmail.com
Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969
Signed-off-by: Rustam Kovhaev <rkovhaev(a)gmail.com>
Reported-by: syzbot+c584225dabdea2f71969(a)syzkaller.appspotmail.com
Tested-by: syzbot+c584225dabdea2f71969(a)syzkaller.appspotmail.com
Acked-by: Anton Altaparmakov <anton(a)tuxera.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/ntfs/inode.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/fs/ntfs/inode.c~ntfs-check-for-valid-standard-information-attribute
+++ a/fs/ntfs/inode.c
@@ -629,6 +629,12 @@ static int ntfs_read_locked_inode(struct
}
a = ctx->attr;
/* Get the standard information attribute value. */
+ if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
+ + le32_to_cpu(a->data.resident.value_length) >
+ (u8 *)ctx->mrec + vol->mft_record_size) {
+ ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
+ goto unm_err_out;
+ }
si = (STANDARD_INFORMATION*)((u8*)a +
le16_to_cpu(a->data.resident.value_offset));
_
This is the start of the stable review cycle for the 5.10.18 release.
There are 29 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.18-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.18-rc1
Matwey V. Kornilov <matwey(a)sai.msu.ru>
media: pwc: Use correct device for DMA
Filipe Manana <fdmanana(a)suse.com>
btrfs: fix crash after non-aligned direct IO write with O_DSYNC
David Sterba <dsterba(a)suse.com>
btrfs: fix backport of 2175bf57dc952 in 5.10.13
Trent Piepho <tpiepho(a)gmail.com>
Bluetooth: btusb: Always fallback to alt 1 for WBS
Linus Torvalds <torvalds(a)linux-foundation.org>
tty: protect tty_write from odd low-level tty disciplines
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Yonatan Linik <yonatanlinik(a)gmail.com>
net: fix proc_fs init handling in af_packet and tls
Wang Hai <wanghai38(a)huawei.com>
net: bridge: Fix a warning when del bridge sysfs
Eelco Chaudron <echaudro(a)redhat.com>
net: openvswitch: fix TTL decrement exception action execution
Pablo Neira Ayuso <pablo(a)netfilter.org>
net: sched: incorrect Kconfig dependencies on Netfilter modules
Lorenzo Bianconi <lorenzo(a)kernel.org>
mt76: mt7615: fix rdd mcu cmd endianness
Felix Fietkau <nbd(a)nbd.name>
mt76: mt7915: fix endian issues
wenxu <wenxu(a)ucloud.cn>
net/sched: fix miss init the mru in qdisc_skb_cb
Florian Westphal <fw(a)strlen.de>
mptcp: skip to next candidate if subflow has unacked data
Loic Poulain <loic.poulain(a)linaro.org>
net: qrtr: Fix port ID for control messages
Max Gurtovoy <mgurtovoy(a)nvidia.com>
IB/isert: add module param to set sg_tablesize for IO cmd
Stefano Garzarella <sgarzare(a)redhat.com>
vdpa_sim: add get_config callback in vdpasim_dev_attr
Stefano Garzarella <sgarzare(a)redhat.com>
vdpa_sim: make 'config' generic and usable for any device type
Stefano Garzarella <sgarzare(a)redhat.com>
vdpa_sim: store parsed MAC address in a buffer
Stefano Garzarella <sgarzare(a)redhat.com>
vdpa_sim: add struct vdpasim_dev_attr for device attributes
Max Gurtovoy <mgurtovoy(a)nvidia.com>
vdpa_sim: remove hard-coded virtq count
-------------
Diffstat:
Makefile | 4 +-
arch/arm/xen/p2m.c | 6 +-
arch/x86/xen/p2m.c | 15 ++---
drivers/block/xen-blkback/blkback.c | 32 +++++----
drivers/bluetooth/btusb.c | 20 ++----
drivers/infiniband/ulp/isert/ib_isert.c | 27 +++++++-
drivers/infiniband/ulp/isert/ib_isert.h | 6 ++
drivers/media/usb/pwc/pwc-if.c | 22 +++---
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c | 89 ++++++++++++++++++-------
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 87 ++++++++++++++++++------
drivers/net/xen-netback/netback.c | 4 +-
drivers/tty/tty_io.c | 5 +-
drivers/vdpa/vdpa_sim/vdpa_sim.c | 83 ++++++++++++++++-------
drivers/xen/gntdev.c | 37 +++++-----
drivers/xen/xen-scsiback.c | 4 +-
fs/btrfs/ctree.h | 6 +-
fs/btrfs/inode.c | 6 +-
include/xen/grant_table.h | 1 +
net/bridge/br.c | 5 +-
net/core/dev.c | 2 +
net/mptcp/protocol.c | 5 +-
net/openvswitch/actions.c | 15 ++---
net/packet/af_packet.c | 2 +
net/qrtr/qrtr.c | 2 +-
net/sched/Kconfig | 6 +-
net/tls/tls_proc.c | 3 +
26 files changed, 337 insertions(+), 157 deletions(-)
Hi, we'd like to request that you pull this commit into stable kernels.
It's a trivial patch that just downgrades a (harmless) kernel warning
message to debug level. The warning can be very chatty in some
situations, and it'd be nice to silence it.
-----------------------------8<-------------------------------
commit ccd1acdf1c49b835504b235461fd24e2ed826764
Author: Luis Henriques <lhenriques(a)suse.de>
Date: Thu Nov 12 11:25:32 2020 +0000
ceph: downgrade warning from mdsmap decode to debug
While the MDS cluster is unstable and changing state the client may get
mdsmap updates that will trigger warnings:
[144692.478400] ceph: mdsmap_decode got incorrect state(up:standby-replay)
[144697.489552] ceph: mdsmap_decode got incorrect state(up:standby-replay)
[144697.489580] ceph: mdsmap_decode got incorrect state(up:standby-replay)
This patch downgrades these warnings to debug, as they may flood the logs
if the cluster is unstable for a while.
Signed-off-by: Luis Henriques <lhenriques(a)suse.de>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Ilya Dryomov <idryomov(a)gmail.com>
Thanks!
--
Jeff Layton <jlayton(a)kernel.org>
While testing MST hotplug events on daisy chain monitors, find out
that CLEAR_PAYLOAD_ID_TABLE is not broadcasted and payload id table
is not reset. Dig in deeper and find out two parts needed to be fixed.
1. Link_Count_Total & Link_Count_Remaining of Broadcast message are
incorrect. Should set lct=1 & lcr=6
2. CLEAR_PAYLOAD_ID_TABLE request message is not set as path broadcast
request message. Should fix this.
Wayne Lin (2):
drm/dp_mst: Revise broadcast msg lct & lcr
drm/dp_mst: Set CLEAR_PAYLOAD_ID_TABLE as broadcast
drivers/gpu/drm/drm_dp_mst_topology.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
--
2.17.1
This is a request for merge of upstream commit 4a658527271b ("xen-netback:
delete NAPI instance when queue fails to initialize") on v4.4.y tree.
If 'xenvif_connect()' fails after successful 'netif_napi_add()', the napi is
not cleaned up. Because 'create_queues()' frees the queues in its error
handling code, if the 'xenvif_free()' is called for the vif, use-after-free
occurs. The upstream commit fixes the problem by cleaning up the napi in the
'xenvif_connect()'.
Attaching the original patch below for your convenience.
Tested-by: Markus Boehme <markubo(a)amazon.de>
Thanks,
SeongJae Park
==================================== >8 =======================================
>From 4a658527271bce43afb1cf4feec89afe6716ca59 Mon Sep 17 00:00:00 2001
From: David Vrabel <david.vrabel(a)citrix.com>
Date: Fri, 15 Jan 2016 14:55:35 +0000
Subject: [PATCH] xen-netback: delete NAPI instance when queue fails to
initialize
When xenvif_connect() fails it may leave a stale NAPI instance added to
the device. Make sure we delete it in the error path.
Signed-off-by: David Vrabel <david.vrabel(a)citrix.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
drivers/net/xen-netback/interface.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index e7bd63eb2876..3bba6ceee132 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -615,6 +615,7 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref,
queue->tx_irq = 0;
err_unmap:
xenvif_unmap_frontend_rings(queue);
+ netif_napi_del(&queue->napi);
err:
module_put(THIS_MODULE);
return err;
--
2.17.1
From: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
The following sequence of operations results in a refcount warning:
1. Open device /dev/tpmrm.
2. Remove module tpm_tis_spi.
3. Write a TPM command to the file descriptor opened at step 1.
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1161 at lib/refcount.c:25 kobject_get+0xa0/0xa4
refcount_t: addition on 0; use-after-free.
Modules linked in: tpm_tis_spi tpm_tis_core tpm mdio_bcm_unimac brcmfmac
sha256_generic libsha256 sha256_arm hci_uart btbcm bluetooth cfg80211 vc4
brcmutil ecdh_generic ecc snd_soc_core crc32_arm_ce libaes
raspberrypi_hwmon ac97_bus snd_pcm_dmaengine bcm2711_thermal snd_pcm
snd_timer genet snd phy_generic soundcore [last unloaded: spi_bcm2835]
CPU: 3 PID: 1161 Comm: hold_open Not tainted 5.10.0ls-main-dirty #2
Hardware name: BCM2711
[<c0410c3c>] (unwind_backtrace) from [<c040b580>] (show_stack+0x10/0x14)
[<c040b580>] (show_stack) from [<c1092174>] (dump_stack+0xc4/0xd8)
[<c1092174>] (dump_stack) from [<c0445a30>] (__warn+0x104/0x108)
[<c0445a30>] (__warn) from [<c0445aa8>] (warn_slowpath_fmt+0x74/0xb8)
[<c0445aa8>] (warn_slowpath_fmt) from [<c08435d0>] (kobject_get+0xa0/0xa4)
[<c08435d0>] (kobject_get) from [<bf0a715c>] (tpm_try_get_ops+0x14/0x54 [tpm])
[<bf0a715c>] (tpm_try_get_ops [tpm]) from [<bf0a7d6c>] (tpm_common_write+0x38/0x60 [tpm])
[<bf0a7d6c>] (tpm_common_write [tpm]) from [<c05a7ac0>] (vfs_write+0xc4/0x3c0)
[<c05a7ac0>] (vfs_write) from [<c05a7ee4>] (ksys_write+0x58/0xcc)
[<c05a7ee4>] (ksys_write) from [<c04001a0>] (ret_fast_syscall+0x0/0x4c)
Exception stack(0xc226bfa8 to 0xc226bff0)
bfa0: 00000000 000105b4 00000003 beafe664 00000014 00000000
bfc0: 00000000 000105b4 000103f8 00000004 00000000 00000000 b6f9c000 beafe684
bfe0: 0000006c beafe648 0001056c b6eb6944
---[ end trace d4b8409def9b8b1f ]---
The reason for this warning is the attempt to get the chip->dev reference
in tpm_common_write() although the reference counter is already zero.
Since commit 8979b02aaf1d ("tpm: Fix reference count to main device") the
extra reference used to prevent a premature zero counter is never taken,
because the required TPM_CHIP_FLAG_TPM2 flag is never set.
Fix this by moving the TPM 2 character device handling from
tpm_chip_alloc() to tpm_add_char_device() which is called at a later point
in time when the flag has been set in case of TPM2.
Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
already introduced function tpm_devs_release() to release the extra
reference but did not implement the required put on chip->devs that results
in the call of this function.
Fix this by putting chip->devs in tpm_chip_unregister().
Finally move the new implementation for the TPM 2 handling into a new
function to avoid multiple checks for the TPM_CHIP_FLAG_TPM2 flag in the
good case and error cases.
Cc: stable(a)vger.kernel.org
Fixes: fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
Fixes: 8979b02aaf1d ("tpm: Fix reference count to main device")
Co-developed-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
---
drivers/char/tpm/tpm-chip.c | 48 ++++++++-----------------------------
drivers/char/tpm/tpm.h | 1 +
drivers/char/tpm/tpm2-space.c | 55 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+), 38 deletions(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..3f59f09 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -274,14 +274,6 @@ static void tpm_dev_release(struct device *dev)
kfree(chip);
}
-static void tpm_devs_release(struct device *dev)
-{
- struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
-
- /* release the master device reference */
- put_device(&chip->dev);
-}
-
/**
* tpm_class_shutdown() - prepare the TPM device for loss of power.
* @dev: device to which the chip is associated.
@@ -344,7 +336,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
chip->dev_num = rc;
device_initialize(&chip->dev);
- device_initialize(&chip->devs);
chip->dev.class = tpm_class;
chip->dev.class->shutdown_pre = tpm_class_shutdown;
@@ -352,39 +343,20 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
chip->dev.parent = pdev;
chip->dev.groups = chip->groups;
- chip->devs.parent = pdev;
- chip->devs.class = tpmrm_class;
- chip->devs.release = tpm_devs_release;
- /* get extra reference on main device to hold on
- * behalf of devs. This holds the chip structure
- * while cdevs is in use. The corresponding put
- * is in the tpm_devs_release (TPM2 only)
- */
- if (chip->flags & TPM_CHIP_FLAG_TPM2)
- get_device(&chip->dev);
-
if (chip->dev_num == 0)
chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
else
chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
- chip->devs.devt =
- MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
-
rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);
if (rc)
goto out;
- rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
- if (rc)
- goto out;
if (!pdev)
chip->flags |= TPM_CHIP_FLAG_VIRTUAL;
cdev_init(&chip->cdev, &tpm_fops);
- cdev_init(&chip->cdevs, &tpmrm_fops);
chip->cdev.owner = THIS_MODULE;
- chip->cdevs.owner = THIS_MODULE;
rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE);
if (rc) {
@@ -396,7 +368,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
return chip;
out:
- put_device(&chip->devs);
put_device(&chip->dev);
return ERR_PTR(rc);
}
@@ -445,14 +416,9 @@ static int tpm_add_char_device(struct tpm_chip *chip)
}
if (chip->flags & TPM_CHIP_FLAG_TPM2) {
- rc = cdev_device_add(&chip->cdevs, &chip->devs);
- if (rc) {
- dev_err(&chip->devs,
- "unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",
- dev_name(&chip->devs), MAJOR(chip->devs.devt),
- MINOR(chip->devs.devt), rc);
- return rc;
- }
+ rc = tpm_devs_add(chip);
+ if (rc)
+ goto out_cdev;
}
/* Make the chip available. */
@@ -460,6 +426,10 @@ static int tpm_add_char_device(struct tpm_chip *chip)
idr_replace(&dev_nums_idr, chip, chip->dev_num);
mutex_unlock(&idr_lock);
+ return 0;
+
+out_cdev:
+ cdev_device_del(&chip->cdev, &chip->dev);
return rc;
}
@@ -640,8 +610,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)
if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))
hwrng_unregister(&chip->hwrng);
tpm_bios_log_teardown(chip);
- if (chip->flags & TPM_CHIP_FLAG_TPM2)
+ if (chip->flags & TPM_CHIP_FLAG_TPM2) {
cdev_device_del(&chip->cdevs, &chip->devs);
+ put_device(&chip->devs);
+ }
tpm_del_char_device(chip);
}
EXPORT_SYMBOL_GPL(tpm_chip_unregister);
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db..ac4b2dc 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -238,6 +238,7 @@ int tpm2_prepare_space(struct tpm_chip *chip, struct tpm_space *space, u8 *cmd,
size_t cmdsiz);
int tpm2_commit_space(struct tpm_chip *chip, struct tpm_space *space, void *buf,
size_t *bufsiz);
+int tpm_devs_add(struct tpm_chip *chip);
void tpm_bios_log_setup(struct tpm_chip *chip);
void tpm_bios_log_teardown(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-space.c
index 784b8b3..2f8bb3e 100644
--- a/drivers/char/tpm/tpm2-space.c
+++ b/drivers/char/tpm/tpm2-space.c
@@ -571,3 +571,58 @@ int tpm2_commit_space(struct tpm_chip *chip, struct tpm_space *space,
dev_err(&chip->dev, "%s: error %d\n", __func__, rc);
return rc;
}
+
+/*
+ * Put the reference to the main device.
+ */
+static void tpm_devs_release(struct device *dev)
+{
+ struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
+
+ /* release the master device reference */
+ put_device(&chip->dev);
+}
+
+/*
+ * Add a device file to expose TPM spaces. Also take a reference to the
+ * main device.
+ */
+int tpm_devs_add(struct tpm_chip *chip)
+{
+ int rc;
+
+ device_initialize(&chip->devs);
+ chip->devs.parent = chip->dev.parent;
+ chip->devs.class = tpmrm_class;
+
+ /*
+ * Get extra reference on main device to hold on behalf of devs.
+ * This holds the chip structure while cdevs is in use. The
+ * corresponding put is in the tpm_devs_release.
+ */
+ get_device(&chip->dev);
+ chip->devs.release = tpm_devs_release;
+ chip->devs.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
+ cdev_init(&chip->cdevs, &tpmrm_fops);
+ chip->cdevs.owner = THIS_MODULE;
+
+ rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
+ if (rc)
+ goto out_devs;
+
+ rc = cdev_device_add(&chip->cdevs, &chip->devs);
+ if (rc) {
+ dev_err(&chip->devs,
+ "unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",
+ dev_name(&chip->devs), MAJOR(chip->devs.devt),
+ MINOR(chip->devs.devt), rc);
+ goto out_devs;
+ }
+
+ return 0;
+
+out_devs:
+ put_device(&chip->devs);
+
+ return rc;
+}
--
2.7.4
From: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
The following sequence of operations results in a refcount warning:
1. Open device /dev/tpmrm.
2. Remove module tpm_tis_spi.
3. Write a TPM command to the file descriptor opened at step 1.
------------[ cut here ]------------
WARNING: CPU: 3 PID: 1161 at lib/refcount.c:25 kobject_get+0xa0/0xa4
refcount_t: addition on 0; use-after-free.
Modules linked in: tpm_tis_spi tpm_tis_core tpm mdio_bcm_unimac brcmfmac
sha256_generic libsha256 sha256_arm hci_uart btbcm bluetooth cfg80211 vc4
brcmutil ecdh_generic ecc snd_soc_core crc32_arm_ce libaes
raspberrypi_hwmon ac97_bus snd_pcm_dmaengine bcm2711_thermal snd_pcm
snd_timer genet snd phy_generic soundcore [last unloaded: spi_bcm2835]
CPU: 3 PID: 1161 Comm: hold_open Not tainted 5.10.0ls-main-dirty #2
Hardware name: BCM2711
[<c0410c3c>] (unwind_backtrace) from [<c040b580>] (show_stack+0x10/0x14)
[<c040b580>] (show_stack) from [<c1092174>] (dump_stack+0xc4/0xd8)
[<c1092174>] (dump_stack) from [<c0445a30>] (__warn+0x104/0x108)
[<c0445a30>] (__warn) from [<c0445aa8>] (warn_slowpath_fmt+0x74/0xb8)
[<c0445aa8>] (warn_slowpath_fmt) from [<c08435d0>] (kobject_get+0xa0/0xa4)
[<c08435d0>] (kobject_get) from [<bf0a715c>] (tpm_try_get_ops+0x14/0x54 [tpm])
[<bf0a715c>] (tpm_try_get_ops [tpm]) from [<bf0a7d6c>] (tpm_common_write+0x38/0x60 [tpm])
[<bf0a7d6c>] (tpm_common_write [tpm]) from [<c05a7ac0>] (vfs_write+0xc4/0x3c0)
[<c05a7ac0>] (vfs_write) from [<c05a7ee4>] (ksys_write+0x58/0xcc)
[<c05a7ee4>] (ksys_write) from [<c04001a0>] (ret_fast_syscall+0x0/0x4c)
Exception stack(0xc226bfa8 to 0xc226bff0)
bfa0: 00000000 000105b4 00000003 beafe664 00000014 00000000
bfc0: 00000000 000105b4 000103f8 00000004 00000000 00000000 b6f9c000 beafe684
bfe0: 0000006c beafe648 0001056c b6eb6944
---[ end trace d4b8409def9b8b1f ]---
The reason for this warning is the attempt to get the chip->dev reference
in tpm_common_write() although the reference counter is already zero.
Since commit 8979b02aaf1d ("tpm: Fix reference count to main device") the
extra reference used to prevent a premature zero counter is never taken,
because the required TPM_CHIP_FLAG_TPM2 flag is never set.
Fix this by moving the TPM 2 character device handling from
tpm_chip_alloc() to tpm_add_char_device() which is called at a later point
in time when the flag has been set in case of TPM2.
Commit fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
already introduced function tpm_devs_release() to release the extra
reference but did not implement the required put on chip->devs that results
in the call of this function.
Fix this by putting chip->devs in tpm_chip_unregister().
Finally move the new implementation for the TPM 2 handling into a new
function to avoid multiple checks for the TPM_CHIP_FLAG_TPM2 flag in the
good case and error cases.
Cc: stable(a)vger.kernel.org
Fixes: fdc915f7f719 ("tpm: expose spaces via a device link /dev/tpmrm<n>")
Fixes: 8979b02aaf1d ("tpm: Fix reference count to main device")
Co-developed-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Jason Gunthorpe <jgg(a)ziepe.ca>
Signed-off-by: Lino Sanfilippo <l.sanfilippo(a)kunbus.com>
---
drivers/char/tpm/tpm-chip.c | 48 ++++++++-----------------------------
drivers/char/tpm/tpm.h | 1 +
drivers/char/tpm/tpm2-space.c | 55 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 66 insertions(+), 38 deletions(-)
diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
index ddaeceb..ae6e414 100644
--- a/drivers/char/tpm/tpm-chip.c
+++ b/drivers/char/tpm/tpm-chip.c
@@ -274,14 +274,6 @@ static void tpm_dev_release(struct device *dev)
kfree(chip);
}
-static void tpm_devs_release(struct device *dev)
-{
- struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
-
- /* release the master device reference */
- put_device(&chip->dev);
-}
-
/**
* tpm_class_shutdown() - prepare the TPM device for loss of power.
* @dev: device to which the chip is associated.
@@ -344,7 +336,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
chip->dev_num = rc;
device_initialize(&chip->dev);
- device_initialize(&chip->devs);
chip->dev.class = tpm_class;
chip->dev.class->shutdown_pre = tpm_class_shutdown;
@@ -352,39 +343,20 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
chip->dev.parent = pdev;
chip->dev.groups = chip->groups;
- chip->devs.parent = pdev;
- chip->devs.class = tpmrm_class;
- chip->devs.release = tpm_devs_release;
- /* get extra reference on main device to hold on
- * behalf of devs. This holds the chip structure
- * while cdevs is in use. The corresponding put
- * is in the tpm_devs_release (TPM2 only)
- */
- if (chip->flags & TPM_CHIP_FLAG_TPM2)
- get_device(&chip->dev);
-
if (chip->dev_num == 0)
chip->dev.devt = MKDEV(MISC_MAJOR, TPM_MINOR);
else
chip->dev.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num);
- chip->devs.devt =
- MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
-
rc = dev_set_name(&chip->dev, "tpm%d", chip->dev_num);
if (rc)
goto out;
- rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
- if (rc)
- goto out;
if (!pdev)
chip->flags |= TPM_CHIP_FLAG_VIRTUAL;
cdev_init(&chip->cdev, &tpm_fops);
- cdev_init(&chip->cdevs, &tpmrm_fops);
chip->cdev.owner = THIS_MODULE;
- chip->cdevs.owner = THIS_MODULE;
rc = tpm2_init_space(&chip->work_space, TPM2_SPACE_BUFFER_SIZE);
if (rc) {
@@ -396,7 +368,6 @@ struct tpm_chip *tpm_chip_alloc(struct device *pdev,
return chip;
out:
- put_device(&chip->devs);
put_device(&chip->dev);
return ERR_PTR(rc);
}
@@ -445,14 +416,9 @@ static int tpm_add_char_device(struct tpm_chip *chip)
}
if (chip->flags & TPM_CHIP_FLAG_TPM2) {
- rc = cdev_device_add(&chip->cdevs, &chip->devs);
- if (rc) {
- dev_err(&chip->devs,
- "unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",
- dev_name(&chip->devs), MAJOR(chip->devs.devt),
- MINOR(chip->devs.devt), rc);
- return rc;
- }
+ rc = tpm_devs_add(chip);
+ if (rc)
+ goto del_cdev;
}
/* Make the chip available. */
@@ -460,6 +426,10 @@ static int tpm_add_char_device(struct tpm_chip *chip)
idr_replace(&dev_nums_idr, chip, chip->dev_num);
mutex_unlock(&idr_lock);
+ return 0;
+
+del_cdev:
+ cdev_device_del(&chip->cdev, &chip->dev);
return rc;
}
@@ -640,8 +610,10 @@ void tpm_chip_unregister(struct tpm_chip *chip)
if (IS_ENABLED(CONFIG_HW_RANDOM_TPM))
hwrng_unregister(&chip->hwrng);
tpm_bios_log_teardown(chip);
- if (chip->flags & TPM_CHIP_FLAG_TPM2)
+ if (chip->flags & TPM_CHIP_FLAG_TPM2) {
cdev_device_del(&chip->cdevs, &chip->devs);
+ put_device(&chip->devs);
+ }
tpm_del_char_device(chip);
}
EXPORT_SYMBOL_GPL(tpm_chip_unregister);
diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
index 947d1db..ac4b2dc 100644
--- a/drivers/char/tpm/tpm.h
+++ b/drivers/char/tpm/tpm.h
@@ -238,6 +238,7 @@ int tpm2_prepare_space(struct tpm_chip *chip, struct tpm_space *space, u8 *cmd,
size_t cmdsiz);
int tpm2_commit_space(struct tpm_chip *chip, struct tpm_space *space, void *buf,
size_t *bufsiz);
+int tpm_devs_add(struct tpm_chip *chip);
void tpm_bios_log_setup(struct tpm_chip *chip);
void tpm_bios_log_teardown(struct tpm_chip *chip);
diff --git a/drivers/char/tpm/tpm2-space.c b/drivers/char/tpm/tpm2-space.c
index 784b8b3..6d17f77 100644
--- a/drivers/char/tpm/tpm2-space.c
+++ b/drivers/char/tpm/tpm2-space.c
@@ -571,3 +571,58 @@ int tpm2_commit_space(struct tpm_chip *chip, struct tpm_space *space,
dev_err(&chip->dev, "%s: error %d\n", __func__, rc);
return rc;
}
+
+/*
+ * Put the reference to the main device.
+ */
+static void tpm_devs_release(struct device *dev)
+{
+ struct tpm_chip *chip = container_of(dev, struct tpm_chip, devs);
+
+ /* release the master device reference */
+ put_device(&chip->dev);
+}
+
+/*
+ * Add a device file to expose TPM spaces. Also take a reference to the
+ * main device.
+ */
+int tpm_devs_add(struct tpm_chip *chip)
+{
+ int rc;
+
+ device_initialize(&chip->devs);
+ chip->devs.parent = chip->dev.parent;
+ chip->devs.class = tpmrm_class;
+
+ /*
+ * Get extra reference on main device to hold on behalf of devs.
+ * This holds the chip structure while cdevs is in use. The
+ * corresponding put is in the tpm_devs_release.
+ */
+ get_device(&chip->dev);
+ chip->devs.release = tpm_devs_release;
+ chip->devs.devt = MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
+ cdev_init(&chip->cdevs, &tpmrm_fops);
+ chip->cdevs.owner = THIS_MODULE;
+
+ rc = dev_set_name(&chip->devs, "tpmrm%d", chip->dev_num);
+ if (rc)
+ goto out_put_devs;
+
+ rc = cdev_device_add(&chip->cdevs, &chip->devs);
+ if (rc) {
+ dev_err(&chip->devs,
+ "unable to cdev_device_add() %s, major %d, minor %d, err=%d\n",
+ dev_name(&chip->devs), MAJOR(chip->devs.devt),
+ MINOR(chip->devs.devt), rc);
+ goto out_put_devs;
+ }
+
+ return 0;
+
+out_put_devs:
+ put_device(&chip->devs);
+
+ return rc;
+}
--
2.7.4
When mmapping the shmem, it would previously adjust the pgoff in the
vm_area_struct to remove the fake offset that is added to be able to
identify the buffer. This patch removes the adjustment and makes the
fault handler use the vm_fault address to calculate the page offset
instead. Although using this address is apparently discouraged, several
DRM drivers seem to be doing it anyway.
The problem with removing the pgoff is that it prevents
drm_vma_node_unmap from working because that searches the mapping tree
by address. That doesn't work because all of the mappings are at offset
0. drm_vma_node_unmap is being used by the shmem helpers when purging
the buffer.
This fixes a bug in Panfrost which is using drm_gem_shmem_purge. Without
this the mapping for the purged buffer can still be accessed which might
mean it would access random pages from other buffers
v2: Don't check whether the unsigned page_offset is less than 0.
Cc: stable(a)vger.kernel.org
Fixes: 17acb9f35ed7 ("drm/shmem: Add madvise state and purge helpers")
Signed-off-by: Neil Roberts <nroberts(a)igalia.com>
---
drivers/gpu/drm/drm_gem_shmem_helper.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/drm_gem_shmem_helper.c b/drivers/gpu/drm/drm_gem_shmem_helper.c
index b26139b1dc35..5b5c095e86a9 100644
--- a/drivers/gpu/drm/drm_gem_shmem_helper.c
+++ b/drivers/gpu/drm/drm_gem_shmem_helper.c
@@ -527,15 +527,19 @@ static vm_fault_t drm_gem_shmem_fault(struct vm_fault *vmf)
loff_t num_pages = obj->size >> PAGE_SHIFT;
vm_fault_t ret;
struct page *page;
+ pgoff_t page_offset;
+
+ /* We don't use vmf->pgoff since that has the fake offset */
+ page_offset = (vmf->address - vma->vm_start) >> PAGE_SHIFT;
mutex_lock(&shmem->pages_lock);
- if (vmf->pgoff >= num_pages ||
+ if (page_offset >= num_pages ||
WARN_ON_ONCE(!shmem->pages) ||
shmem->madv < 0) {
ret = VM_FAULT_SIGBUS;
} else {
- page = shmem->pages[vmf->pgoff];
+ page = shmem->pages[page_offset];
ret = vmf_insert_page(vma, vmf->address, page);
}
@@ -591,9 +595,6 @@ int drm_gem_shmem_mmap(struct drm_gem_object *obj, struct vm_area_struct *vma)
struct drm_gem_shmem_object *shmem;
int ret;
- /* Remove the fake offset */
- vma->vm_pgoff -= drm_vma_node_start(&obj->vma_node);
-
if (obj->import_attach) {
/* Drop the reference drm_gem_mmap_obj() acquired.*/
drm_gem_object_put(obj);
--
2.29.2
The AFBC decoder used in the Rockchip VOP assumes the use of the
YUV-like colourspace transform (YTR). YTR is lossless for RGB(A)
buffers, which covers the RGBA8 and RGB565 formats supported in
vop_convert_afbc_format. Use of YTR is signaled with the
AFBC_FORMAT_MOD_YTR modifier, which prior to this commit was missing. As
such, a producer would have to generate buffers that do not use YTR,
which the VOP would erroneously decode as YTR, leading to severe visual
corruption.
The upstream AFBC support was developed against a captured frame, which
failed to exercise modifier support. Prior to bring-up of AFBC in Mesa
(in the Panfrost driver), no open userspace respected modifier
reporting. As such, this change is not expected to affect broken
userspaces.
Tested on RK3399 with Panfrost and Weston.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig(a)collabora.com>
---
drivers/gpu/drm/rockchip/rockchip_drm_vop.h | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
index 4a2099cb5..857d97cdc 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.h
@@ -17,9 +17,20 @@
#define NUM_YUV2YUV_COEFFICIENTS 12
+/* AFBC supports a number of configurable modes. Relevant to us is block size
+ * (16x16 or 32x8), storage modifiers (SPARSE, SPLIT), and the YUV-like
+ * colourspace transform (YTR). 16x16 SPARSE mode is always used. SPLIT mode
+ * could be enabled via the hreg_block_split register, but is not currently
+ * handled. The colourspace transform is implicitly always assumed by the
+ * decoder, so consumers must use this transform as well.
+ *
+ * Failure to match modifiers will cause errors displaying AFBC buffers
+ * produced by conformant AFBC producers, including Mesa.
+ */
#define ROCKCHIP_AFBC_MOD \
DRM_FORMAT_MOD_ARM_AFBC( \
AFBC_FORMAT_MOD_BLOCK_SIZE_16x16 | AFBC_FORMAT_MOD_SPARSE \
+ | AFBC_FORMAT_MOD_YTR \
)
enum vop_data_format {
--
2.28.0
I found a dead code in the queue/4.9 branch of the stable-rc repository.
2021-02-03:
commit c27f392040e2f6 ("futex: Provide distinct return value when
owner is exiting")
The function handle_exit_race does not exist. Therefore, the
change in handle_exit_race() is ignored in the patch round.
2021-02-22:
commit e55cb811e612 ("futex: Cure exit race")
Define the handle_exit_race() function,
but no branch in the function returns EBUSY.
As a result, dead code occurs in the attach_to_pi_owner():
int ret = handle_exit_race(uaddr, uval, p);
...
if (ret == -EBUSY)
*exiting = p; /* dead code */
To fix the dead code, modify the commit e55cb811e612 ("futex: Cure exit race"),
or install a patch to incorporate the changes in handle_exit_race().
I am unfamiliar with the processing of the stable-rc queue branch,
and I cannot find the patch mail of the current branch in
https://lore.kernel.org/lkml/?q=%22futex%3A+Cure+exit+race%22
Therefore, I re-integrated commit ac31c7ff8624 ("futex: Provide distinct
return value when owner is exiting").
-----
Thomas Gleixner (1):
futex: Provide distinct return value when owner is exiting
kernel/futex.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--
2.27.0
This is the start of the stable review cycle for the 5.11.1 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.11.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.11.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.11.1-rc1
Matwey V. Kornilov <matwey(a)sai.msu.ru>
media: pwc: Use correct device for DMA
Trent Piepho <tpiepho(a)gmail.com>
Bluetooth: btusb: Always fallback to alt 1 for WBS
Linus Torvalds <torvalds(a)linux-foundation.org>
tty: protect tty_write from odd low-level tty disciplines
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/xen/p2m.c | 6 ++++--
arch/x86/xen/p2m.c | 15 +++++++--------
drivers/block/xen-blkback/blkback.c | 32 ++++++++++++++++++--------------
drivers/bluetooth/btusb.c | 20 ++++++--------------
drivers/media/usb/pwc/pwc-if.c | 22 +++++++++++++---------
drivers/net/xen-netback/netback.c | 4 +---
drivers/tty/tty_io.c | 5 ++++-
drivers/xen/gntdev.c | 37 ++++++++++++++++++++-----------------
drivers/xen/xen-scsiback.c | 4 ++--
include/xen/grant_table.h | 1 +
11 files changed, 78 insertions(+), 72 deletions(-)
From: Thomas Gleixner <tglx(a)linutronix.de>
task #32972991
commit 6f1a4891a5928a5969c87fa5a584844c983ec823 upstream
Evan tracked down a subtle race between the update of the MSI message and
the device raising an interrupt internally on PCI devices which do not
support MSI masking. The update of the MSI message is non-atomic and
consists of either 2 or 3 sequential 32bit wide writes to the PCI config
space.
- Write address low 32bits
- Write address high 32bits (If supported by device)
- Write data
When an interrupt is migrated then both address and data might change, so
the kernel attempts to mask the MSI interrupt first. But for MSI masking is
optional, so there exist devices which do not provide it. That means that
if the device raises an interrupt internally between the writes then a MSI
message is sent built from half updated state.
On x86 this can lead to spurious interrupts on the wrong interrupt
vector when the affinity setting changes both address and data. As a
consequence the device interrupt can be lost causing the device to
become stuck or malfunctioning.
Evan tried to handle that by disabling MSI accross an MSI message
update. That's not feasible because disabling MSI has issues on its own:
If MSI is disabled the PCI device is routing an interrupt to the legacy
INTx mechanism. The INTx delivery can be disabled, but the disablement is
not working on all devices.
Some devices lose interrupts when both MSI and INTx delivery are disabled.
Another way to solve this would be to enforce the allocation of the same
vector on all CPUs in the system for this kind of screwed devices. That
could be done, but it would bring back the vector space exhaustion problems
which got solved a few years ago.
Fortunately the high address (if supported by the device) is only relevant
when X2APIC is enabled which implies interrupt remapping. In the interrupt
remapping case the affinity setting is happening at the interrupt remapping
unit and the PCI MSI message is programmed only once when the PCI device is
initialized.
That makes it possible to solve it with a two step update:
1) Target the MSI msg to the new vector on the current target CPU
2) Target the MSI msg to the new vector on the new target CPU
In both cases writing the MSI message is only changing a single 32bit word
which prevents the issue of inconsistency.
After writing the final destination it is necessary to check whether the
device issued an interrupt while the intermediate state #1 (new vector,
current CPU) was in effect.
This is possible because the affinity change is always happening on the
current target CPU. The code runs with interrupts disabled, so the
interrupt can be detected by checking the IRR of the local APIC. If the
vector is pending in the IRR then the interrupt is retriggered on the new
target CPU by sending an IPI for the associated vector on the target CPU.
This can cause spurious interrupts on both the local and the new target
CPU.
1) If the new vector is not in use on the local CPU and the device
affected by the affinity change raised an interrupt during the
transitional state (step #1 above) then interrupt entry code will
ignore that spurious interrupt. The vector is marked so that the
'No irq handler for vector' warning is supressed once.
2) If the new vector is in use already on the local CPU then the IRR check
might see an pending interrupt from the device which is using this
vector. The IPI to the new target CPU will then invoke the handler of
the device, which got the affinity change, even if that device did not
issue an interrupt
3) If the new vector is in use already on the local CPU and the device
affected by the affinity change raised an interrupt during the
transitional state (step #1 above) then the handler of the device which
uses that vector on the local CPU will be invoked.
expose issues in device driver interrupt handlers which are not prepared to
handle a spurious interrupt correctly. This not a regression, it's just
exposing something which was already broken as spurious interrupts can
happen for a lot of reasons and all driver handlers need to be able to deal
with them.
Conflicts:
arch/x86/kernel/apic/msi.c
include/linux/irq.h
Reported-by: Evan Green <evgreen(a)chromium.org>
Debugged-by: Evan Green <evgreen(a)chromium.org>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Evan Green <evgreen(a)chromium.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/87imkr4s7n.fsf@nanos.tec.linutronix.de
Signed-off-by: Zelin Deng <zelin.deng(a)linux.alibaba.com>
---
arch/x86/include/asm/apic.h | 8 +++
arch/x86/kernel/apic/msi.c | 128 +++++++++++++++++++++++++++++++++++-
include/linux/irq.h | 15 +++++
include/linux/irqdomain.h | 7 ++
kernel/irq/debugfs.c | 1 +
kernel/irq/msi.c | 5 +-
6 files changed, 160 insertions(+), 4 deletions(-)
diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 050368db9d35..3c1e51ead072 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -448,6 +448,14 @@ static inline void ack_APIC_irq(void)
apic_eoi();
}
+
+static inline bool lapic_vector_set_in_irr(unsigned int vector)
+{
+ u32 irr = apic_read(APIC_IRR + (vector / 32 * 0x10));
+
+ return !!(irr & (1U << (vector % 32)));
+}
+
static inline unsigned default_get_apic_id(unsigned long x)
{
unsigned int ver = GET_APIC_VERSION(apic_read(APIC_LVR));
diff --git a/arch/x86/kernel/apic/msi.c b/arch/x86/kernel/apic/msi.c
index 37465ab9d030..a81e8e2ede3d 100644
--- a/arch/x86/kernel/apic/msi.c
+++ b/arch/x86/kernel/apic/msi.c
@@ -30,10 +30,8 @@ static struct irq_domain *platform_msi_default_domain;
static struct msi_domain_info platform_msi_domain_info;
#endif
-static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+static void __irq_msi_compose_msg(struct irq_cfg *cfg, struct msi_msg *msg)
{
- struct irq_cfg *cfg = irqd_cfg(data);
-
msg->address_hi = MSI_ADDR_BASE_HI;
if (x2apic_enabled())
@@ -54,6 +52,127 @@ static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
MSI_DATA_VECTOR(cfg->vector);
}
+static void irq_msi_compose_msg(struct irq_data *data, struct msi_msg *msg)
+{
+ __irq_msi_compose_msg(irqd_cfg(data), msg);
+}
+
+static void irq_msi_update_msg(struct irq_data *irqd, struct irq_cfg *cfg)
+{
+ struct msi_msg msg[2] = { [1] = { }, };
+
+ __irq_msi_compose_msg(cfg, msg);
+ irq_data_get_irq_chip(irqd)->irq_write_msi_msg(irqd, msg);
+}
+
+static int
+msi_set_affinity(struct irq_data *irqd, const struct cpumask *mask, bool force)
+{
+ struct irq_cfg old_cfg, *cfg = irqd_cfg(irqd);
+ struct irq_data *parent = irqd->parent_data;
+ unsigned int cpu;
+ int ret;
+
+ /* Save the current configuration */
+ cpu = cpumask_first(irq_data_get_effective_affinity_mask(irqd));
+ old_cfg = *cfg;
+
+ /* Allocate a new target vector */
+ ret = parent->chip->irq_set_affinity(parent, mask, force);
+ if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
+ return ret;
+
+ /*
+ * For non-maskable and non-remapped MSI interrupts the migration
+ * to a different destination CPU and a different vector has to be
+ * done careful to handle the possible stray interrupt which can be
+ * caused by the non-atomic update of the address/data pair.
+ *
+ * Direct update is possible when:
+ * - The MSI is maskable (remapped MSI does not use this code path)).
+ * The quirk bit is not set in this case.
+ * - The new vector is the same as the old vector
+ * - The old vector is MANAGED_IRQ_SHUTDOWN_VECTOR (interrupt starts up)
+ * - The new destination CPU is the same as the old destination CPU
+ */
+ if (!irqd_msi_nomask_quirk(irqd) ||
+ cfg->vector == old_cfg.vector ||
+ old_cfg.vector == MANAGED_IRQ_SHUTDOWN_VECTOR ||
+ cfg->dest_apicid == old_cfg.dest_apicid) {
+ irq_msi_update_msg(irqd, cfg);
+ return ret;
+ }
+
+ /*
+ * Paranoia: Validate that the interrupt target is the local
+ * CPU.
+ */
+ if (WARN_ON_ONCE(cpu != smp_processor_id())) {
+ irq_msi_update_msg(irqd, cfg);
+ return ret;
+ }
+
+ /*
+ * Redirect the interrupt to the new vector on the current CPU
+ * first. This might cause a spurious interrupt on this vector if
+ * the device raises an interrupt right between this update and the
+ * update to the final destination CPU.
+ *
+ * If the vector is in use then the installed device handler will
+ * denote it as spurious which is no harm as this is a rare event
+ * and interrupt handlers have to cope with spurious interrupts
+ * anyway. If the vector is unused, then it is marked so it won't
+ * trigger the 'No irq handler for vector' warning in do_IRQ().
+ *
+ * This requires to hold vector lock to prevent concurrent updates to
+ * the affected vector.
+ */
+ lock_vector_lock();
+
+ /*
+ * Mark the new target vector on the local CPU if it is currently
+ * unused. Reuse the VECTOR_RETRIGGERED state which is also used in
+ * the CPU hotplug path for a similar purpose. This cannot be
+ * undone here as the current CPU has interrupts disabled and
+ * cannot handle the interrupt before the whole set_affinity()
+ * section is done. In the CPU unplug case, the current CPU is
+ * about to vanish and will not handle any interrupts anymore. The
+ * vector is cleaned up when the CPU comes online again.
+ */
+ if (IS_ERR_OR_NULL(this_cpu_read(vector_irq[cfg->vector])))
+ this_cpu_write(vector_irq[cfg->vector], VECTOR_RETRIGGERED);
+
+ /* Redirect it to the new vector on the local CPU temporarily */
+ old_cfg.vector = cfg->vector;
+ irq_msi_update_msg(irqd, &old_cfg);
+
+ /* Now transition it to the target CPU */
+ irq_msi_update_msg(irqd, cfg);
+
+ /*
+ * All interrupts after this point are now targeted at the new
+ * vector/CPU.
+ *
+ * Drop vector lock before testing whether the temporary assignment
+ * to the local CPU was hit by an interrupt raised in the device,
+ * because the retrigger function acquires vector lock again.
+ */
+ unlock_vector_lock();
+
+ /*
+ * Check whether the transition raced with a device interrupt and
+ * is pending in the local APICs IRR. It is safe to do this outside
+ * of vector lock as the irq_desc::lock of this interrupt is still
+ * held and interrupts are disabled: The check is not accessing the
+ * underlying vector store. It's just checking the local APIC's
+ * IRR.
+ */
+ if (lapic_vector_set_in_irr(cfg->vector))
+ irq_data_get_irq_chip(irqd)->irq_retrigger(irqd);
+
+ return ret;
+}
+
/*
* IRQ Chip for MSI PCI/PCI-X/PCI-Express Devices,
* which implement the MSI or MSI-X Capability Structure.
@@ -65,6 +184,7 @@ static struct irq_chip pci_msi_controller = {
.irq_ack = irq_chip_ack_parent,
.irq_retrigger = irq_chip_retrigger_hierarchy,
.irq_compose_msi_msg = irq_msi_compose_msg,
+ .irq_set_affinity = msi_set_affinity,
.flags = IRQCHIP_SKIP_SET_WAKE,
};
@@ -153,6 +273,8 @@ void __init arch_init_msi_domain(struct irq_domain *parent)
}
if (!msi_default_domain)
pr_warn("failed to initialize irqdomain for MSI/MSI-x.\n");
+ else
+ msi_default_domain->flags |= IRQ_DOMAIN_MSI_NOMASK_QUIRK;
#ifdef CONFIG_X86_PLATFORM_MSI
fn = irq_domain_alloc_named_fwnode("PLATFORM-MSI");
if (fn) {
diff --git a/include/linux/irq.h b/include/linux/irq.h
index ea06077e066d..f7c1177f1840 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -405,6 +405,21 @@ static inline bool irqd_can_reserve(struct irq_data *d)
return __irqd_to_state(d) & IRQD_CAN_RESERVE;
}
+static inline void irqd_set_msi_nomask_quirk(struct irq_data *d)
+{
+ __irqd_to_state(d) |= IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline void irqd_clr_msi_nomask_quirk(struct irq_data *d)
+{
+ __irqd_to_state(d) &= ~IRQD_MSI_NOMASK_QUIRK;
+}
+
+static inline bool irqd_msi_nomask_quirk(struct irq_data *d)
+{
+ return __irqd_to_state(d) & IRQD_MSI_NOMASK_QUIRK;
+}
+
#undef __irqd_to_state
static inline irq_hw_number_t irqd_to_hwirq(struct irq_data *d)
diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h
index dccfa65aee96..8301f1df0682 100644
--- a/include/linux/irqdomain.h
+++ b/include/linux/irqdomain.h
@@ -202,6 +202,13 @@ enum {
/* Irq domain implements MSI remapping */
IRQ_DOMAIN_FLAG_MSI_REMAP = (1 << 5),
+ /*
+ * Quirk to handle MSI implementations which do not provide
+ * masking. Currently known to affect x86, but partially
+ * handled in core code.
+ */
+ IRQ_DOMAIN_MSI_NOMASK_QUIRK = (1 << 6),
+
/*
* Flags starting from IRQ_DOMAIN_FLAG_NONCORE are reserved
* for implementation specific purposes and ignored by the
diff --git a/kernel/irq/debugfs.c b/kernel/irq/debugfs.c
index 7c426da3a37e..ccfe1fd1721b 100644
--- a/kernel/irq/debugfs.c
+++ b/kernel/irq/debugfs.c
@@ -114,6 +114,7 @@ static const struct irq_bit_descr irqdata_states[] = {
BIT_MASK_DESCR(IRQD_AFFINITY_MANAGED),
BIT_MASK_DESCR(IRQD_MANAGED_SHUTDOWN),
BIT_MASK_DESCR(IRQD_CAN_RESERVE),
+ BIT_MASK_DESCR(IRQD_MSI_NOMASK_QUIRK),
BIT_MASK_DESCR(IRQD_FORWARDED_TO_VCPU),
diff --git a/kernel/irq/msi.c b/kernel/irq/msi.c
index 4ca2fd46645d..dc1186ce3ecd 100644
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -453,8 +453,11 @@ int msi_domain_alloc_irqs(struct irq_domain *domain, struct device *dev,
continue;
irq_data = irq_domain_get_irq_data(domain, desc->irq);
- if (!can_reserve)
+ if (!can_reserve) {
irqd_clr_can_reserve(irq_data);
+ if (domain->flags & IRQ_DOMAIN_MSI_NOMASK_QUIRK)
+ irqd_set_msi_nomask_quirk(irq_data);
+ }
ret = irq_domain_activate_irq(irq_data, can_reserve);
if (ret)
goto cleanup;
--
2.30.0.81.g72c4083ddf
Hi All,
Below commits needs to be backported to 5.10.y kernels to avoid NULL
pointer deref panic(while trying to access 'numa_node' from NULL
'dma_device'), @ Line:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drive…)
commit:22dd4c707673129ed17e803b4bf68a567b2731db
commit:8ecfca68dc4cbee1272a0161e3f2fb9387dc6930
The panic is observed at nvme host(provider/transport being SoftiWARP),
while perfroming "nvme discover".
Hence, please backport the below patch series to '5.10.y' kernel.
"https://lore.kernel.org/linux-iommu/20201106181941.1878556-1-hch@lst.de/"
Thanks,
Krishnamraju.
The patch titled
Subject: mm: memcontrol: fix get_active_memcg return value
has been added to the -mm tree. Its filename is
mm-memcontrol-fix-get_active_memcg-return-value.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/mm-memcontrol-fix-get_active_memc…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/mm-memcontrol-fix-get_active_memc…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Muchun Song <songmuchun(a)bytedance.com>
Subject: mm: memcontrol: fix get_active_memcg return value
We use a global percpu int_active_memcg variable to store the remote memcg
when we are in the interrupt context. But get_active_memcg always return
the current->active_memcg or root_mem_cgroup. The remote memcg (set in
the interrupt context) is ignored. This is not what we want. So fix it.
Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com
Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Vladimir Davydov <vdavydov.dev(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 10 +++-------
1 file changed, 3 insertions(+), 7 deletions(-)
--- a/mm/memcontrol.c~mm-memcontrol-fix-get_active_memcg-return-value
+++ a/mm/memcontrol.c
@@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup
rcu_read_lock();
memcg = active_memcg();
- if (memcg) {
- /* current->active_memcg must hold a ref. */
- if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
- memcg = root_mem_cgroup;
- else
- memcg = current->active_memcg;
- }
+ /* remote memcg must hold a ref. */
+ if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
+ memcg = root_mem_cgroup;
rcu_read_unlock();
return memcg;
_
Patches currently in -mm which might be from songmuchun(a)bytedance.com are
mm-memcontrol-optimize-per-lruvec-stats-counter-memory-usage.patch
mm-memcontrol-fix-nr_anon_thps-accounting-in-charge-moving.patch
mm-memcontrol-convert-nr_anon_thps-account-to-pages.patch
mm-memcontrol-convert-nr_file_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_thps-account-to-pages.patch
mm-memcontrol-convert-nr_shmem_pmdmapped-account-to-pages.patch
mm-memcontrol-convert-nr_file_pmdmapped-account-to-pages.patch
mm-memcontrol-make-the-slab-calculation-consistent.patch
mm-memcontrol-replace-the-loop-with-a-list_for_each_entry.patch
mm-memcontrol-fix-swap-undercounting-in-cgroup2.patch
mm-memcontrol-fix-get_active_memcg-return-value.patch
hugetlb-convert-page_huge_active-hpagemigratable-flag-fix.patch
This is the start of the stable review cycle for the 4.9.258 release.
There are 49 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.258-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.258-rc1
Lai Jiangshan <laijs(a)linux.alibaba.com>
kvm: check tlbs_dirty directly
Arun Easi <aeasi(a)marvell.com>
scsi: qla2xxx: Fix crash during driver load on big endian machines
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Vasily Gorbik <gor(a)linux.ibm.com>
tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
Greg Thelen <gthelen(a)google.com>
tracing: Fix SKIP_STACK_VALIDATION=1 build due to bad merge with -mrecord-mcount
Andi Kleen <ak(a)linux.intel.com>
trace: Use -mcount-record for dynamic ftrace
Borislav Petkov <bp(a)suse.de>
x86/build: Disable CET instrumentation in the kernel for 32-bit too
Stefano Garzarella <sgarzare(a)redhat.com>
vsock: fix locking in vsock_shutdown()
Stefano Garzarella <sgarzare(a)redhat.com>
vsock/virtio: update credit only if socket is not closed
Edwin Peer <edwin.peer(a)broadcom.com>
net: watchdog: hold device global xmit lock during tx disable
Norbert Slusarek <nslusarek(a)gmx.net>
net/vmw_vsock: improve locking in vsock_connect_timeout()
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
Felipe Balbi <balbi(a)kernel.org>
usb: dwc3: ulpi: fix checkpatch warning
Randy Dunlap <rdunlap(a)infradead.org>
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: skip identical origin tuple in same zone only
Juergen Gross <jgross(a)suse.com>
xen/netback: avoid race in xenvif_rx_ring_slots_available()
Jozsef Kadlecsik <kadlec(a)mail.kfki.hu>
netfilter: xt_recent: Fix attempt to update deleted entry
Bui Quang Minh <minhquangbui99(a)gmail.com>
bpf: Check for integer overflow when using roundup_pow_of_two()
Roman Gushchin <guro(a)fb.com>
memblock: do not start bottom-up allocations with kernel_end
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
Amir Goldstein <amir73il(a)gmail.com>
ovl: skip getxattr of security labels
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Check length before giving out the filter buffer
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Do not count ftrace events in top level enable output
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in xattr id lookup
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in inode lookup
Phillip Lougher <phillip(a)squashfs.org.uk>
squashfs: add more sanity checks in id lookup
Thomas Gleixner <tglx(a)linutronix.de>
futex: Cure exit race
Peter Zijlstra <peterz(a)infradead.org>
futex: Change locking rules
Thomas Gleixner <tglx(a)linutronix.de>
futex: Ensure the correct return value from futex_lock_pi()
Theodore Ts'o <tytso(a)mit.edu>
memcg: fix a crash in wb_workfn when a device disappears
Qian Cai <cai(a)lca.pw>
include/trace/events/writeback.h: fix -Wstringop-truncation warnings
Tobin C. Harding <tobin(a)kernel.org>
lib/string: Add strscpy_pad() function
Dave Wysochanski <dwysocha(a)redhat.com>
SUNRPC: Handle 0 length opaque XDR object data properly
Dave Wysochanski <dwysocha(a)redhat.com>
SUNRPC: Move simple_get_bytes and simple_get_netobj into private header
Johannes Berg <johannes.berg(a)intel.com>
iwlwifi: mvm: guard against device removal in reprobe
Emmanuel Grumbach <emmanuel.grumbach(a)intel.com>
iwlwifi: pcie: add a NULL check in iwl_pcie_txq_unmap
Johannes Berg <johannes.berg(a)intel.com>
iwlwifi: mvm: take mutex for calling iwl_mvm_get_sync_time()
Cong Wang <cong.wang(a)bytedance.com>
af_key: relax availability checks for skb size calculation
Sibi Sankar <sibis(a)codeaurora.org>
remoteproc: qcom_q6v5_mss: Validate MBA firmware size before load
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
fgraph: Initialize tracing_graph_pause at task creation
Johannes Weiner <hannes(a)cmpxchg.org>
mm: memcontrol: fix NULL pointer crash in test_clear_page_writeback()
-------------
Diffstat:
Makefile | 11 +-
arch/arm/boot/dts/lpc32xx.dtsi | 3 -
arch/arm/xen/p2m.c | 6 +-
arch/h8300/kernel/asm-offsets.c | 3 +
arch/x86/Makefile | 6 +-
arch/x86/xen/p2m.c | 15 +-
drivers/block/xen-blkback/blkback.c | 30 +--
.../net/wireless/intel/iwlwifi/mvm/debugfs-vif.c | 3 +
drivers/net/wireless/intel/iwlwifi/mvm/ops.c | 3 +-
drivers/net/wireless/intel/iwlwifi/pcie/tx.c | 5 +
drivers/net/xen-netback/netback.c | 4 +-
drivers/net/xen-netback/rx.c | 9 +-
drivers/remoteproc/qcom_q6v5_pil.c | 6 +
drivers/scsi/qla2xxx/qla_tmpl.c | 9 +-
drivers/scsi/qla2xxx/qla_tmpl.h | 2 +-
drivers/usb/dwc3/ulpi.c | 20 +-
drivers/xen/gntdev.c | 33 ++-
drivers/xen/xen-scsiback.c | 4 +-
fs/fs-writeback.c | 2 +-
fs/overlayfs/copy_up.c | 15 +-
fs/squashfs/export.c | 41 +++-
fs/squashfs/id.c | 40 +++-
fs/squashfs/squashfs_fs_sb.h | 1 +
fs/squashfs/super.c | 6 +-
fs/squashfs/xattr.h | 10 +-
fs/squashfs/xattr_id.c | 66 +++++-
include/linux/backing-dev.h | 10 +
include/linux/ftrace.h | 4 +-
include/linux/memcontrol.h | 33 ++-
include/linux/netdevice.h | 2 +
include/linux/string.h | 4 +
include/linux/sunrpc/xdr.h | 3 +-
include/trace/events/writeback.h | 35 ++--
include/xen/grant_table.h | 1 +
kernel/bpf/stackmap.c | 2 +
kernel/futex.c | 233 +++++++++++++++++----
kernel/trace/ftrace.c | 2 -
kernel/trace/trace.c | 2 +-
kernel/trace/trace_events.c | 3 +-
lib/string.c | 47 ++++-
mm/backing-dev.c | 1 +
mm/memblock.c | 48 +----
mm/memcontrol.c | 43 ++--
mm/page-writeback.c | 14 +-
net/key/af_key.c | 6 +-
net/netfilter/nf_conntrack_core.c | 3 +-
net/netfilter/xt_recent.c | 12 +-
net/sunrpc/auth_gss/auth_gss.c | 30 +--
net/sunrpc/auth_gss/auth_gss_internal.h | 45 ++++
net/sunrpc/auth_gss/gss_krb5_mech.c | 31 +--
net/vmw_vsock/af_vsock.c | 13 +-
net/vmw_vsock/virtio_transport_common.c | 4 +-
scripts/Makefile.build | 3 +
virt/kvm/kvm_main.c | 3 +-
54 files changed, 681 insertions(+), 309 deletions(-)
This is the start of the stable review cycle for the 4.19.177 release.
There are 50 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed, 24 Feb 2021 12:07:46 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.177-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.177-rc1
Lai Jiangshan <laijs(a)linux.alibaba.com>
kvm: check tlbs_dirty directly
Arun Easi <aeasi(a)marvell.com>
scsi: qla2xxx: Fix crash during driver load on big endian machines
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: fix error handling in xen_blkbk_map()
Jan Beulich <jbeulich(a)suse.com>
xen-scsiback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-netback: don't "handle" error by BUG()
Jan Beulich <jbeulich(a)suse.com>
xen-blkback: don't "handle" error by BUG()
Stefano Stabellini <stefano.stabellini(a)xilinx.com>
xen/arm: don't ignore return errors from set_phys_to_machine
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Jan Beulich <jbeulich(a)suse.com>
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Loic Poulain <loic.poulain(a)linaro.org>
net: qrtr: Fix port ID for control messages
Paolo Bonzini <pbonzini(a)redhat.com>
KVM: SEV: fix double locking due to incorrect backport
Borislav Petkov <bp(a)suse.de>
x86/build: Disable CET instrumentation in the kernel for 32-bit too
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: expand warning in ovl_d_real()
Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
net/qrtr: restrict user-controlled length in qrtr_tun_write_iter()
Sabyrzhan Tasbolatov <snovitoll(a)gmail.com>
net/rds: restrict iovecs length for RDS_CMSG_RDMA_ARGS
Stefano Garzarella <sgarzare(a)redhat.com>
vsock: fix locking in vsock_shutdown()
Stefano Garzarella <sgarzare(a)redhat.com>
vsock/virtio: update credit only if socket is not closed
Edwin Peer <edwin.peer(a)broadcom.com>
net: watchdog: hold device global xmit lock during tx disable
Norbert Slusarek <nslusarek(a)gmx.net>
net/vmw_vsock: improve locking in vsock_connect_timeout()
NeilBrown <neilb(a)suse.de>
net: fix iteration for sctp transport seq_files
Serge Semin <Sergey.Semin(a)baikalelectronics.ru>
usb: dwc3: ulpi: Replace CPU-based busyloop with Protocol-based one
Felipe Balbi <balbi(a)kernel.org>
usb: dwc3: ulpi: fix checkpatch warning
Randy Dunlap <rdunlap(a)infradead.org>
h8300: fix PREEMPTION build, TI_PRE_COUNT undefined
Alain Volmat <alain.volmat(a)foss.st.com>
i2c: stm32f7: fix configuration of the digital filter
Fangrui Song <maskray(a)google.com>
firmware_loader: align .builtin_fw to 8
Yufeng Mo <moyufeng(a)huawei.com>
net: hns3: add a check for queue_id in hclge_reset_vf_queue()
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: skip identical origin tuple in same zone only
Mohammad Athari Bin Ismail <mohammad.athari.ismail(a)intel.com>
net: stmmac: set TxQ mode back to DCB after disabling CBS
Juergen Gross <jgross(a)suse.com>
xen/netback: avoid race in xenvif_rx_ring_slots_available()
Sven Auhagen <sven.auhagen(a)voleatech.de>
netfilter: flowtable: fix tcp and udp header checksum update
Jozsef Kadlecsik <kadlec(a)mail.kfki.hu>
netfilter: xt_recent: Fix attempt to update deleted entry
Bui Quang Minh <minhquangbui99(a)gmail.com>
bpf: Check for integer overflow when using roundup_pow_of_two()
Lorenzo Bianconi <lorenzo(a)kernel.org>
mt76: dma: fix a possible memory leak in mt76_add_fragment()
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: kexec: fix oops after TLB are invalidated
Russell King <rmk+kernel(a)armlinux.org.uk>
ARM: ensure the signal page contains defined contents
Alexandre Belloni <alexandre.belloni(a)bootlin.com>
ARM: dts: lpc32xx: Revert set default clock rate of HCLK PLL
Lin Feng <linf(a)wangsu.com>
bfq-iosched: Revert "bfq: Fix computation of shallow depth"
Alexandre Ghiti <alex(a)ghiti.fr>
riscv: virt_addr_valid must check the address belongs to linear mapping
Victor Lu <victorchengchi.lu(a)amd.com>
drm/amd/display: Free atomic state after drm_atomic_commit
Victor Lu <victorchengchi.lu(a)amd.com>
drm/amd/display: Fix dc_sink kref count in emulated_link_detect
Amir Goldstein <amir73il(a)gmail.com>
ovl: skip getxattr of security labels
Miklos Szeredi <mszeredi(a)redhat.com>
cap: fix conversions on getxattr
Miklos Szeredi <mszeredi(a)redhat.com>
ovl: perform vfs_getxattr() with mounter creds
Hans de Goede <hdegoede(a)redhat.com>
platform/x86: hp-wmi: Disable tablet-mode reporting by default
Marc Zyngier <maz(a)kernel.org>
arm64: dts: rockchip: Fix PCIe DT properties on rk3399
Julien Grall <jgrall(a)amazon.com>
arm/xen: Don't probe xenbus as part of an early initcall
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Check length before giving out the filter buffer
Steven Rostedt (VMware) <rostedt(a)goodmis.org>
tracing: Do not count ftrace events in top level enable output
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/lpc32xx.dtsi | 3 -
arch/arm/include/asm/kexec-internal.h | 12 ++++
arch/arm/kernel/asm-offsets.c | 5 ++
arch/arm/kernel/machine_kexec.c | 20 +++----
arch/arm/kernel/relocate_kernel.S | 38 ++++--------
arch/arm/kernel/signal.c | 14 +++--
arch/arm/xen/enlighten.c | 2 -
arch/arm/xen/p2m.c | 6 +-
arch/arm64/boot/dts/rockchip/rk3399.dtsi | 2 +-
arch/h8300/kernel/asm-offsets.c | 3 +
arch/riscv/include/asm/page.h | 5 +-
arch/x86/Makefile | 6 +-
arch/x86/kvm/svm.c | 1 -
arch/x86/xen/p2m.c | 15 +++--
block/bfq-iosched.c | 8 +--
drivers/block/xen-blkback/blkback.c | 30 +++++-----
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 18 +++---
drivers/i2c/busses/i2c-stm32f7.c | 11 +++-
.../ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 7 +++
drivers/net/ethernet/stmicro/stmmac/stmmac_tc.c | 7 ++-
drivers/net/wireless/mediatek/mt76/dma.c | 8 ++-
drivers/net/xen-netback/netback.c | 4 +-
drivers/net/xen-netback/rx.c | 9 ++-
drivers/platform/x86/hp-wmi.c | 14 +++--
drivers/scsi/qla2xxx/qla_tmpl.c | 9 +--
drivers/scsi/qla2xxx/qla_tmpl.h | 2 +-
drivers/usb/dwc3/ulpi.c | 20 +++++--
drivers/xen/gntdev.c | 37 ++++++------
drivers/xen/xen-scsiback.c | 4 +-
drivers/xen/xenbus/xenbus.h | 1 -
drivers/xen/xenbus/xenbus_probe.c | 2 +-
fs/overlayfs/copy_up.c | 15 ++---
fs/overlayfs/inode.c | 2 +
fs/overlayfs/super.c | 13 +++--
include/asm-generic/vmlinux.lds.h | 2 +-
include/linux/netdevice.h | 2 +
include/xen/grant_table.h | 1 +
include/xen/xenbus.h | 2 -
kernel/bpf/stackmap.c | 2 +
kernel/trace/trace.c | 2 +-
kernel/trace/trace_events.c | 3 +-
net/netfilter/nf_conntrack_core.c | 3 +-
net/netfilter/nf_flow_table_core.c | 4 +-
net/netfilter/xt_recent.c | 12 +++-
net/qrtr/qrtr.c | 2 +-
net/qrtr/tun.c | 6 ++
net/rds/rdma.c | 3 +
net/sctp/proc.c | 16 ++++--
net/vmw_vsock/af_vsock.c | 13 ++---
net/vmw_vsock/hyperv_transport.c | 4 --
net/vmw_vsock/virtio_transport_common.c | 4 +-
security/commoncap.c | 67 ++++++++++++++--------
virt/kvm/kvm_main.c | 3 +-
54 files changed, 303 insertions(+), 205 deletions(-)
From: Mikulas Patocka <mpatocka(a)redhat.com>
dm_bufio_get_device_size returns returns the device size in blocks. Before
returning the value, we must subtract the nubmer of starting sectors. The
number of starting sectos may not be divisible by block size.
Note that currently, no target is using dm_bufio_set_sector_offset and
dm_bufio_get_device_size simultaneously, so this patch has no effect.
However, an upcoming patch that fixes dm-verity-fec needs this change.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Reviewed-by: Milan Broz <gmazyland(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/md/dm-bufio.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index fce4cbf9529d..50f3e673729c 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1526,6 +1526,10 @@ EXPORT_SYMBOL_GPL(dm_bufio_get_block_size);
sector_t dm_bufio_get_device_size(struct dm_bufio_client *c)
{
sector_t s = i_size_read(c->bdev->bd_inode) >> SECTOR_SHIFT;
+ if (s >= c->start)
+ s -= c->start;
+ else
+ s = 0;
if (likely(c->sectors_per_block_bits >= 0))
s >>= c->sectors_per_block_bits;
else
--
2.30.1
cppcheck warning:
sound/soc/samsung/tm2_wm5110.c:605:6: style: Variable 'ret' is
reassigned a value before the old one has been
used. [redundantAssignment]
ret = devm_snd_soc_register_component(dev, &tm2_component,
^
sound/soc/samsung/tm2_wm5110.c:554:7: note: ret is assigned
ret = of_parse_phandle_with_args(dev->of_node, "i2s-controller",
^
sound/soc/samsung/tm2_wm5110.c:605:6: note: ret is overwritten
ret = devm_snd_soc_register_component(dev, &tm2_component,
^
The args is a stack variable, so it could have junk (uninitialized)
therefore args.np could have a non-NULL and random value even though
property was missing. Later could trigger invalid pointer dereference.
This patch provides the correct fix, there's no need to check for
args.np because args.np won't be initialized on errors.
Fixes: 75fa6833aef3 ("ASoC: samsung: tm2_wm5110: check of_parse return value")
Fixes: 8d1513cef51a ("ASoC: samsung: Add support for HDMI audio on TM2board")
Cc: <stable(a)vger.kernel.org>
Suggested-by: Krzysztof Kozlowski <krzk(a)kernel.org>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart(a)linux.intel.com>
---
sound/soc/samsung/tm2_wm5110.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/sound/soc/samsung/tm2_wm5110.c b/sound/soc/samsung/tm2_wm5110.c
index da6204248f82..125e07f65d2b 100644
--- a/sound/soc/samsung/tm2_wm5110.c
+++ b/sound/soc/samsung/tm2_wm5110.c
@@ -553,7 +553,7 @@ static int tm2_probe(struct platform_device *pdev)
ret = of_parse_phandle_with_args(dev->of_node, "i2s-controller",
cells_name, i, &args);
- if (ret || !args.np) {
+ if (ret) {
dev_err(dev, "i2s-controller property parse error: %d\n", i);
ret = -EINVAL;
goto dai_node_put;
--
2.25.1
Extend kvm_s390_shadow_fault to return the pointer to the valid leaf
DAT table entry, or to the invalid entry.
Also return some flags in the lower bits of the address:
PEI_DAT_PROT: indicates that DAT protection applies because of the
protection bit in the segment (or, if EDAT, region) tables.
PEI_NOT_PTE: indicates that the address of the DAT table entry returned
does not refer to a PTE, but to a segment or region table.
Signed-off-by: Claudio Imbrenda <imbrenda(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Janosch Frank <frankja(a)de.ibm.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
---
arch/s390/kvm/gaccess.c | 30 +++++++++++++++++++++++++-----
arch/s390/kvm/gaccess.h | 6 +++++-
arch/s390/kvm/vsie.c | 8 ++++----
3 files changed, 34 insertions(+), 10 deletions(-)
diff --git a/arch/s390/kvm/gaccess.c b/arch/s390/kvm/gaccess.c
index 6d6b57059493..33a03e518640 100644
--- a/arch/s390/kvm/gaccess.c
+++ b/arch/s390/kvm/gaccess.c
@@ -976,7 +976,9 @@ int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra)
* kvm_s390_shadow_tables - walk the guest page table and create shadow tables
* @sg: pointer to the shadow guest address space structure
* @saddr: faulting address in the shadow gmap
- * @pgt: pointer to the page table address result
+ * @pgt: pointer to the beginning of the page table for the given address if
+ * successful (return value 0), or to the first invalid DAT entry in
+ * case of exceptions (return value > 0)
* @fake: pgt references contiguous guest memory block, not a pgtable
*/
static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
@@ -1034,6 +1036,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rfte.val = ptr;
goto shadow_r2t;
}
+ *pgt = ptr + vaddr.rfx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rfx * 8, &rfte.val);
if (rc)
return rc;
@@ -1060,6 +1063,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rste.val = ptr;
goto shadow_r3t;
}
+ *pgt = ptr + vaddr.rsx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rsx * 8, &rste.val);
if (rc)
return rc;
@@ -1087,6 +1091,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
rtte.val = ptr;
goto shadow_sgt;
}
+ *pgt = ptr + vaddr.rtx * 8;
rc = gmap_read_table(parent, ptr + vaddr.rtx * 8, &rtte.val);
if (rc)
return rc;
@@ -1123,6 +1128,7 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
ste.val = ptr;
goto shadow_pgt;
}
+ *pgt = ptr + vaddr.sx * 8;
rc = gmap_read_table(parent, ptr + vaddr.sx * 8, &ste.val);
if (rc)
return rc;
@@ -1157,6 +1163,8 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
* @vcpu: virtual cpu
* @sg: pointer to the shadow guest address space structure
* @saddr: faulting address in the shadow gmap
+ * @datptr: will contain the address of the faulting DAT table entry, or of
+ * the valid leaf, plus some flags
*
* Returns: - 0 if the shadow fault was successfully resolved
* - > 0 (pgm exception code) on exceptions while faulting
@@ -1165,11 +1173,11 @@ static int kvm_s390_shadow_tables(struct gmap *sg, unsigned long saddr,
* - -ENOMEM if out of memory
*/
int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
- unsigned long saddr)
+ unsigned long saddr, unsigned long *datptr)
{
union vaddress vaddr;
union page_table_entry pte;
- unsigned long pgt;
+ unsigned long pgt = 0;
int dat_protection, fake;
int rc;
@@ -1191,8 +1199,20 @@ int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *sg,
pte.val = pgt + vaddr.px * PAGE_SIZE;
goto shadow_page;
}
- if (!rc)
- rc = gmap_read_table(sg->parent, pgt + vaddr.px * 8, &pte.val);
+
+ switch (rc) {
+ case PGM_SEGMENT_TRANSLATION:
+ case PGM_REGION_THIRD_TRANS:
+ case PGM_REGION_SECOND_TRANS:
+ case PGM_REGION_FIRST_TRANS:
+ pgt |= PEI_NOT_PTE;
+ break;
+ case 0:
+ pgt += vaddr.px * 8;
+ rc = gmap_read_table(sg->parent, pgt, &pte.val);
+ }
+ if (*datptr)
+ *datptr = pgt | dat_protection * PEI_DAT_PROT;
if (!rc && pte.i)
rc = PGM_PAGE_TRANSLATION;
if (!rc && pte.z)
diff --git a/arch/s390/kvm/gaccess.h b/arch/s390/kvm/gaccess.h
index f4c51756c462..41e99cea7e4b 100644
--- a/arch/s390/kvm/gaccess.h
+++ b/arch/s390/kvm/gaccess.h
@@ -359,7 +359,11 @@ void ipte_unlock(struct kvm_vcpu *vcpu);
int ipte_lock_held(struct kvm_vcpu *vcpu);
int kvm_s390_check_low_addr_prot_real(struct kvm_vcpu *vcpu, unsigned long gra);
+/* MVPG PEI indication bits */
+#define PEI_DAT_PROT 2
+#define PEI_NOT_PTE 4
+
int kvm_s390_shadow_fault(struct kvm_vcpu *vcpu, struct gmap *shadow,
- unsigned long saddr);
+ unsigned long saddr, unsigned long *datptr);
#endif /* __KVM_S390_GACCESS_H */
diff --git a/arch/s390/kvm/vsie.c b/arch/s390/kvm/vsie.c
index bd803e091918..78b604326016 100644
--- a/arch/s390/kvm/vsie.c
+++ b/arch/s390/kvm/vsie.c
@@ -620,10 +620,10 @@ static int map_prefix(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
/* with mso/msl, the prefix lies at offset *mso* */
prefix += scb_s->mso;
- rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix);
+ rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap, prefix, NULL);
if (!rc && (scb_s->ecb & ECB_TE))
rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
- prefix + PAGE_SIZE);
+ prefix + PAGE_SIZE, NULL);
/*
* We don't have to mprotect, we will be called for all unshadows.
* SIE will detect if protection applies and trigger a validity.
@@ -914,7 +914,7 @@ static int handle_fault(struct kvm_vcpu *vcpu, struct vsie_page *vsie_page)
current->thread.gmap_addr, 1);
rc = kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
- current->thread.gmap_addr);
+ current->thread.gmap_addr, NULL);
if (rc > 0) {
rc = inject_fault(vcpu, rc,
current->thread.gmap_addr,
@@ -936,7 +936,7 @@ static void handle_last_fault(struct kvm_vcpu *vcpu,
{
if (vsie_page->fault_addr)
kvm_s390_shadow_fault(vcpu, vsie_page->gmap,
- vsie_page->fault_addr);
+ vsie_page->fault_addr, NULL);
vsie_page->fault_addr = 0;
}
--
2.26.2
On Tue, Feb 23, 2021 at 8:38 AM Kun-Chuan Hsieh <jetswayss(a)gmail.com> wrote:
>
> Hi Jiri Olsa,
>
> Yes, exactly. The missing SHF_COMPRESSED value causes build failure.
>
> Compared to conditional compilation, your suggestion seems to be a better
> solution. Thank you for your suggestion. I will submit the patch v2.
>
>
> Hi Andrii Nakryiko,
>
> Yes, it is possible to detect whether libelf supports compressed ELF
> sections at compilation time. I can think of two possible methods.
> 1. Check SHF_COMPRESSED is defined. (If libelf supports compressed ELF
> sections, SHF_COMPRESSED should be defined in libelf.h or elf.h)
> 2. Check the libelf version by _ELFUTILS_PREREQ (from elfutils/version.h)
>
> Best regards,
> Kun-Chuan Hsieh
>
> On Tue, Feb 23, 2021 at 10:43 PM Jiri Olsa <jolsa(a)redhat.com> wrote:
>>
>> On Tue, Feb 23, 2021 at 01:20:01AM +0000, Kun-Chuan Hsieh wrote:
>> > Older versions of libelf cannot recognize the compressed section.
>>
>> so it's the SHF_COMPRESSED value the build fails on?
>>
>> maybe we could do just this:
>>
>> #ifndef SHF_COMPRESSED
>> #define SHF_COMPRESSED (1 << 11) /* Section with compressed data. */
>> #endif
Yeah, I was hoping it would be something like this. Or we can just
#ifdef two regions of code that rely on SHF_COMPRESSED.
>>
>> jirka
>>
>> > However, it's only required to fix the compressed section info when
>> > compiling with CONFIG_DEBUG_INFO_COMPRESSED flag is set.
>> >
>> > Only compile the compressed_section_fix function when necessary will make
>> > it easier to enable the BTF function. Since the tool resolve_btfids is
>> > compiled with host toolchain. The host toolchain might be older than the
>> > cross compile toolchain.
>> >
>> > Cc: stable(a)vger.kernel.org
>> > Signed-off-by: Kun-Chuan Hsieh <jetswayss(a)gmail.com>
>> > ---
>> > tools/bpf/resolve_btfids/main.c | 4 ++++
>> > 1 file changed, 4 insertions(+)
>> >
>> > diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
>> > index 7409d7860aa6..ad40346c6631 100644
>> > --- a/tools/bpf/resolve_btfids/main.c
>> > +++ b/tools/bpf/resolve_btfids/main.c
>> > @@ -260,6 +260,7 @@ static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size)
>> > return btf_id__add(root, id, false);
>> > }
>> >
>> > +#ifdef CONFIG_DEBUG_INFO_COMPRESSED
>> > /*
>> > * The data of compressed section should be aligned to 4
>> > * (for 32bit) or 8 (for 64 bit) bytes. The binutils ld
>> > @@ -292,6 +293,7 @@ static int compressed_section_fix(Elf *elf, Elf_Scn *scn, GElf_Shdr *sh)
>> > }
>> > return 0;
>> > }
>> > +#endif
>> >
>> > static int elf_collect(struct object *obj)
>> > {
>> > @@ -370,8 +372,10 @@ static int elf_collect(struct object *obj)
>> > obj->efile.idlist_addr = sh.sh_addr;
>> > }
>> >
>> > +#ifdef CONFIG_DEBUG_INFO_COMPRESSED
>> > if (compressed_section_fix(elf, scn, &sh))
>> > return -1;
>> > +#endif
>> > }
>> >
>> > return 0;
>> > --
>> > 2.25.1
>> >
>>
On a 32-bit fast syscall that fails to read its arguments from user
memory, the kernel currently does syscall exit work but not
syscall exit work. This would confuse audit and ptrace.
This is a minimal fix intended for ease of backporting. A more
complete cleanup is coming.
Cc: stable(a)vger.kernel.org
Fixes: 0b085e68f407 ("x86/entry: Consolidate 32/64 bit syscall entry")
Signed-off-by: Andy Lutomirski <luto(a)kernel.org>
---
arch/x86/entry/common.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
index 0904f5676e4d..cf4dcf346ca8 100644
--- a/arch/x86/entry/common.c
+++ b/arch/x86/entry/common.c
@@ -128,7 +128,8 @@ static noinstr bool __do_fast_syscall_32(struct pt_regs *regs)
regs->ax = -EFAULT;
instrumentation_end();
- syscall_exit_to_user_mode(regs);
+ local_irq_disable();
+ exit_to_user_mode();
return false;
}
--
2.29.2
On Tue, Feb 23, 2021 at 1:14 AM Muchun Song <songmuchun(a)bytedance.com> wrote:
>
> We use a global percpu int_active_memcg variable to store the remote
> memcg when we are in the interrupt context. But get_active_memcg always
> return the current->active_memcg or root_mem_cgroup. The remote memcg
> (set in the interrupt context) is ignored. This is not what we want.
> So fix it.
>
> Fixes: 37d5985c003d ("mm: kmem: prepare remote memcg charging infra for interrupt contexts")
> Signed-off-by: Muchun Song <songmuchun(a)bytedance.com>
Good catch.
Cc: stable(a)vger.kernel.org
Reviewed-by: Shakeel Butt <shakeelb(a)google.com>
> ---
> mm/memcontrol.c | 10 +++-------
> 1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index be6bc5044150..bbe25655f7eb 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -1061,13 +1061,9 @@ static __always_inline struct mem_cgroup *get_active_memcg(void)
>
> rcu_read_lock();
> memcg = active_memcg();
> - if (memcg) {
> - /* current->active_memcg must hold a ref. */
> - if (WARN_ON_ONCE(!css_tryget(&memcg->css)))
> - memcg = root_mem_cgroup;
> - else
> - memcg = current->active_memcg;
> - }
> + /* remote memcg must hold a ref. */
> + if (memcg && WARN_ON_ONCE(!css_tryget(&memcg->css)))
> + memcg = root_mem_cgroup;
> rcu_read_unlock();
>
> return memcg;
> --
> 2.11.0
>
I'm announcing the release of the 5.10.18 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/arm/xen/p2m.c | 6 +
arch/x86/xen/p2m.c | 15 +---
drivers/block/xen-blkback/blkback.c | 32 ++++----
drivers/bluetooth/btusb.c | 20 +----
drivers/infiniband/ulp/isert/ib_isert.c | 27 +++++++
drivers/infiniband/ulp/isert/ib_isert.h | 6 +
drivers/media/usb/pwc/pwc-if.c | 22 +++--
drivers/net/wireless/mediatek/mt76/mt7615/mcu.c | 89 +++++++++++++++++-------
drivers/net/wireless/mediatek/mt76/mt7915/mcu.c | 87 ++++++++++++++++++-----
drivers/net/xen-netback/netback.c | 4 -
drivers/tty/tty_io.c | 5 +
drivers/vdpa/vdpa_sim/vdpa_sim.c | 83 ++++++++++++++++------
drivers/xen/gntdev.c | 37 +++++----
drivers/xen/xen-scsiback.c | 4 -
fs/btrfs/ctree.h | 6 -
fs/btrfs/inode.c | 6 +
include/xen/grant_table.h | 1
net/bridge/br.c | 5 +
net/core/dev.c | 2
net/mptcp/protocol.c | 5 +
net/openvswitch/actions.c | 15 +---
net/packet/af_packet.c | 2
net/qrtr/qrtr.c | 2
net/sched/Kconfig | 6 -
net/tls/tls_proc.c | 3
26 files changed, 336 insertions(+), 156 deletions(-)
David Sterba (1):
btrfs: fix backport of 2175bf57dc952 in 5.10.13
Eelco Chaudron (1):
net: openvswitch: fix TTL decrement exception action execution
Felix Fietkau (1):
mt76: mt7915: fix endian issues
Filipe Manana (1):
btrfs: fix crash after non-aligned direct IO write with O_DSYNC
Florian Westphal (1):
mptcp: skip to next candidate if subflow has unacked data
Greg Kroah-Hartman (1):
Linux 5.10.18
Jan Beulich (8):
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
xen-blkback: don't "handle" error by BUG()
xen-netback: don't "handle" error by BUG()
xen-scsiback: don't "handle" error by BUG()
xen-blkback: fix error handling in xen_blkbk_map()
Linus Torvalds (1):
tty: protect tty_write from odd low-level tty disciplines
Loic Poulain (1):
net: qrtr: Fix port ID for control messages
Lorenzo Bianconi (1):
mt76: mt7615: fix rdd mcu cmd endianness
Matwey V. Kornilov (1):
media: pwc: Use correct device for DMA
Max Gurtovoy (2):
vdpa_sim: remove hard-coded virtq count
IB/isert: add module param to set sg_tablesize for IO cmd
Pablo Neira Ayuso (1):
net: sched: incorrect Kconfig dependencies on Netfilter modules
Stefano Garzarella (4):
vdpa_sim: add struct vdpasim_dev_attr for device attributes
vdpa_sim: store parsed MAC address in a buffer
vdpa_sim: make 'config' generic and usable for any device type
vdpa_sim: add get_config callback in vdpasim_dev_attr
Stefano Stabellini (1):
xen/arm: don't ignore return errors from set_phys_to_machine
Trent Piepho (1):
Bluetooth: btusb: Always fallback to alt 1 for WBS
Wang Hai (1):
net: bridge: Fix a warning when del bridge sysfs
Yonatan Linik (1):
net: fix proc_fs init handling in af_packet and tls
wenxu (1):
net/sched: fix miss init the mru in qdisc_skb_cb
Older versions of libelf cannot recognize the compressed section.
However, it's only required to fix the compressed section info when
compiling with CONFIG_DEBUG_INFO_COMPRESSED flag is set.
Only compile the compressed_section_fix function when necessary will make
it easier to enable the BTF function. Since the tool resolve_btfids is
compiled with host toolchain. The host toolchain might be older than the
cross compile toolchain.
Cc: stable(a)vger.kernel.org
Signed-off-by: Kun-Chuan Hsieh <jetswayss(a)gmail.com>
---
tools/bpf/resolve_btfids/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
index 7409d7860aa6..ad40346c6631 100644
--- a/tools/bpf/resolve_btfids/main.c
+++ b/tools/bpf/resolve_btfids/main.c
@@ -260,6 +260,7 @@ static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size)
return btf_id__add(root, id, false);
}
+#ifdef CONFIG_DEBUG_INFO_COMPRESSED
/*
* The data of compressed section should be aligned to 4
* (for 32bit) or 8 (for 64 bit) bytes. The binutils ld
@@ -292,6 +293,7 @@ static int compressed_section_fix(Elf *elf, Elf_Scn *scn, GElf_Shdr *sh)
}
return 0;
}
+#endif
static int elf_collect(struct object *obj)
{
@@ -370,8 +372,10 @@ static int elf_collect(struct object *obj)
obj->efile.idlist_addr = sh.sh_addr;
}
+#ifdef CONFIG_DEBUG_INFO_COMPRESSED
if (compressed_section_fix(elf, scn, &sh))
return -1;
+#endif
}
return 0;
--
2.25.1
From: Richard Gong <richard.gong(a)intel.com>
Clean up COMMAND_RECONFIG_FLAG_PARTIAL flag by resetting it to 0, which
aligns with the firmware settings.
Cc: <stable(a)vger.kernel.org> # 5.9+
Fixes: 36847f9e3e56 ("firmware: stratix10-svc: correct reconfig flag and timeout values")
Signed-off-by: Richard Gong <richard.gong(a)intel.com>
---
v3: correct the missing item in the Fixes subject line
v2: add tag Cc: <stable(a)vger.kernel.org> # 5.9+
add 'Fixes: ... ' line in the comment
---
include/linux/firmware/intel/stratix10-svc-client.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/firmware/intel/stratix10-svc-client.h b/include/linux/firmware/intel/stratix10-svc-client.h
index a93d859..f843c6a 100644
--- a/include/linux/firmware/intel/stratix10-svc-client.h
+++ b/include/linux/firmware/intel/stratix10-svc-client.h
@@ -56,7 +56,7 @@
* COMMAND_RECONFIG_FLAG_PARTIAL:
* Set to FPGA configuration type (full or partial).
*/
-#define COMMAND_RECONFIG_FLAG_PARTIAL 1
+#define COMMAND_RECONFIG_FLAG_PARTIAL 0
/**
* Timeout settings for service clients:
--
2.7.4
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
parameter. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the parameter
holding the sample value to `unsigned short`.
[Note: the bug was introduced in commit edf4537bcbf5 ("staging: comedi:
pcl818: use comedi_buf_write_samples()") but the patch applies better to
commit d615416de615 ("staging: comedi: pcl818: introduce
pcl818_ai_write_sample()").]
Fixes: d615416de615 ("staging: comedi: pcl818: introduce pcl818_ai_write_sample()")
Cc: <stable(a)vger.kernel.org> # 4.0+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/pcl818.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/pcl818.c b/drivers/staging/comedi/drivers/pcl818.c
index 63e3011158f2..f4b4a686c710 100644
--- a/drivers/staging/comedi/drivers/pcl818.c
+++ b/drivers/staging/comedi/drivers/pcl818.c
@@ -423,7 +423,7 @@ static int pcl818_ai_eoc(struct comedi_device *dev,
static bool pcl818_ai_write_sample(struct comedi_device *dev,
struct comedi_subdevice *s,
- unsigned int chan, unsigned int val)
+ unsigned int chan, unsigned short val)
{
struct pcl818_private *devpriv = dev->private;
struct comedi_cmd *cmd = &s->async->cmd;
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variable
holding the sample value to `unsigned short`.
Fixes: 1f44c034de2e ("staging: comedi: pcl711: use comedi_buf_write_samples()")
Cc: <stable(a)vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/pcl711.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/pcl711.c b/drivers/staging/comedi/drivers/pcl711.c
index 2dbf69e30965..bd6f42fe9e3c 100644
--- a/drivers/staging/comedi/drivers/pcl711.c
+++ b/drivers/staging/comedi/drivers/pcl711.c
@@ -184,7 +184,7 @@ static irqreturn_t pcl711_interrupt(int irq, void *d)
struct comedi_device *dev = d;
struct comedi_subdevice *s = dev->read_subdev;
struct comedi_cmd *cmd = &s->async->cmd;
- unsigned int data;
+ unsigned short data;
if (!dev->attached) {
dev_err(dev->class_dev, "spurious interrupt\n");
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the calls to
`comedi_buf_write_samples()` are passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variable
holding the sample value to `unsigned short`.
Fixes: de88924f67d ("staging: comedi: me4000: use comedi_buf_write_samples()")
Cc: <stable(a)vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/me4000.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/me4000.c b/drivers/staging/comedi/drivers/me4000.c
index 726e40dc17b6..0d3d4cafce2e 100644
--- a/drivers/staging/comedi/drivers/me4000.c
+++ b/drivers/staging/comedi/drivers/me4000.c
@@ -924,7 +924,7 @@ static irqreturn_t me4000_ai_isr(int irq, void *dev_id)
struct comedi_subdevice *s = dev->read_subdev;
int i;
int c = 0;
- unsigned int lval;
+ unsigned short lval;
if (!dev->attached)
return IRQ_NONE;
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variable
holding the sample value to `unsigned short`.
[Note: the bug was introduced in commit 1700529b24cc ("staging: comedi:
dmm32at: use comedi_buf_write_samples()") but the patch applies better
to the later (but in the same kernel release) commit 0c0eadadcbe6e
("staging: comedi: dmm32at: introduce dmm32_ai_get_sample()").]
Fixes: 0c0eadadcbe6e ("staging: comedi: dmm32at: introduce dmm32_ai_get_sample()")
Cc: <stable(a)vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/dmm32at.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/dmm32at.c b/drivers/staging/comedi/drivers/dmm32at.c
index 17e6018918bb..56682f01242f 100644
--- a/drivers/staging/comedi/drivers/dmm32at.c
+++ b/drivers/staging/comedi/drivers/dmm32at.c
@@ -404,7 +404,7 @@ static irqreturn_t dmm32at_isr(int irq, void *d)
{
struct comedi_device *dev = d;
unsigned char intstat;
- unsigned int val;
+ unsigned short val;
int i;
if (!dev->attached) {
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variable
holding the sample value to `unsigned short`.
Fixes: ad9eb43c93d8 ("staging: comedi: das800: use comedi_buf_write_samples()"
Cc: <stable(a)vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/das800.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/das800.c b/drivers/staging/comedi/drivers/das800.c
index 4ea100ff6930..2881808d6606 100644
--- a/drivers/staging/comedi/drivers/das800.c
+++ b/drivers/staging/comedi/drivers/das800.c
@@ -427,7 +427,7 @@ static irqreturn_t das800_interrupt(int irq, void *d)
struct comedi_cmd *cmd;
unsigned long irq_flags;
unsigned int status;
- unsigned int val;
+ unsigned short val;
bool fifo_empty;
bool fifo_overflow;
int i;
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the call to
`comedi_buf_write_samples()` is passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variable
holding the sample value to `unsigned short`.
Fixes: d1d24cb65ee3 ("staging: comedi: das6402: read analog input samples in interrupt handler")
Cc: <stable(a)vger.kernel.org> # 3.19+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/das6402.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/staging/comedi/drivers/das6402.c b/drivers/staging/comedi/drivers/das6402.c
index 04e224f8b779..96f4107b8054 100644
--- a/drivers/staging/comedi/drivers/das6402.c
+++ b/drivers/staging/comedi/drivers/das6402.c
@@ -186,7 +186,7 @@ static irqreturn_t das6402_interrupt(int irq, void *d)
if (status & DAS6402_STATUS_FFULL) {
async->events |= COMEDI_CB_OVERFLOW;
} else if (status & DAS6402_STATUS_FFNE) {
- unsigned int val;
+ unsigned short val;
val = das6402_ai_read_sample(dev, s);
comedi_buf_write_samples(s, &val, 1);
--
2.30.0
The analog input subdevice supports Comedi asynchronous commands that
use Comedi's 16-bit sample format. However, the calls to
`comedi_buf_write_samples()` are passing the address of a 32-bit integer
variable. On bigendian machines, this will copy 2 bytes from the wrong
end of the 32-bit value. Fix it by changing the type of the variables
holding the sample value to `unsigned short`. The type of the `val`
parameter of `pci1710_ai_read_sample()` is changed to `unsigned short *`
accordingly. The type of the `val` variable in `pci1710_ai_insn_read()`
is also changed to `unsigned short` since its address is passed to
`pci1710_ai_read_sample()`.
Fixes: a9c3a015c12f ("staging: comedi: adv_pci1710: use comedi_buf_write_samples()")
Cc: <stable(a)vger.kernel.org> # 4.0+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
drivers/staging/comedi/drivers/adv_pci1710.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/staging/comedi/drivers/adv_pci1710.c b/drivers/staging/comedi/drivers/adv_pci1710.c
index 692893c7e5c3..090607760be6 100644
--- a/drivers/staging/comedi/drivers/adv_pci1710.c
+++ b/drivers/staging/comedi/drivers/adv_pci1710.c
@@ -300,11 +300,11 @@ static int pci1710_ai_eoc(struct comedi_device *dev,
static int pci1710_ai_read_sample(struct comedi_device *dev,
struct comedi_subdevice *s,
unsigned int cur_chan,
- unsigned int *val)
+ unsigned short *val)
{
const struct boardtype *board = dev->board_ptr;
struct pci1710_private *devpriv = dev->private;
- unsigned int sample;
+ unsigned short sample;
unsigned int chan;
sample = inw(dev->iobase + PCI171X_AD_DATA_REG);
@@ -345,7 +345,7 @@ static int pci1710_ai_insn_read(struct comedi_device *dev,
pci1710_ai_setup_chanlist(dev, s, &insn->chanspec, 1, 1);
for (i = 0; i < insn->n; i++) {
- unsigned int val;
+ unsigned short val;
/* start conversion */
outw(0, dev->iobase + PCI171X_SOFTTRG_REG);
@@ -395,7 +395,7 @@ static void pci1710_handle_every_sample(struct comedi_device *dev,
{
struct comedi_cmd *cmd = &s->async->cmd;
unsigned int status;
- unsigned int val;
+ unsigned short val;
int ret;
status = inw(dev->iobase + PCI171X_STATUS_REG);
@@ -455,7 +455,7 @@ static void pci1710_handle_fifo(struct comedi_device *dev,
}
for (i = 0; i < devpriv->max_samples; i++) {
- unsigned int val;
+ unsigned short val;
int ret;
ret = pci1710_ai_read_sample(dev, s, s->async->cur_chan, &val);
--
2.30.0
The digital input subdevice supports Comedi asynchronous commands that
read interrupt status information. This uses 16-bit Comedi samples (of
which only the bottom 8 bits contain status information). However, the
interrupt handler is calling `comedi_buf_write_samples()` with the
address of a 32-bit variable `unsigned int status`. On a bigendian
machine, this will copy 2 bytes from the wrong end of the variable. Fix
it by changing the type of the variable to `unsigned short`.
Fixes: a8c66b684efa ("staging: comedi: addi_apci_1500: rewrite the subdevice support functions")
Cc: <stable(a)vger.kernel.org> #4.0+
Signed-off-by: Ian Abbott <abbotti(a)mev.co.uk>
---
.../staging/comedi/drivers/addi_apci_1500.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/staging/comedi/drivers/addi_apci_1500.c b/drivers/staging/comedi/drivers/addi_apci_1500.c
index 11efb21555e3..b04c15dcfb57 100644
--- a/drivers/staging/comedi/drivers/addi_apci_1500.c
+++ b/drivers/staging/comedi/drivers/addi_apci_1500.c
@@ -208,7 +208,7 @@ static irqreturn_t apci1500_interrupt(int irq, void *d)
struct comedi_device *dev = d;
struct apci1500_private *devpriv = dev->private;
struct comedi_subdevice *s = dev->read_subdev;
- unsigned int status = 0;
+ unsigned short status = 0;
unsigned int val;
val = inl(devpriv->amcc + AMCC_OP_REG_INTCSR);
@@ -238,14 +238,14 @@ static irqreturn_t apci1500_interrupt(int irq, void *d)
*
* Mask Meaning
* ---------- ------------------------------------------
- * 0x00000001 Event 1 has occurred
- * 0x00000010 Event 2 has occurred
- * 0x00000100 Counter/timer 1 has run down (not implemented)
- * 0x00001000 Counter/timer 2 has run down (not implemented)
- * 0x00010000 Counter 3 has run down (not implemented)
- * 0x00100000 Watchdog has run down (not implemented)
- * 0x01000000 Voltage error
- * 0x10000000 Short-circuit error
+ * 0b00000001 Event 1 has occurred
+ * 0b00000010 Event 2 has occurred
+ * 0b00000100 Counter/timer 1 has run down (not implemented)
+ * 0b00001000 Counter/timer 2 has run down (not implemented)
+ * 0b00010000 Counter 3 has run down (not implemented)
+ * 0b00100000 Watchdog has run down (not implemented)
+ * 0b01000000 Voltage error
+ * 0b10000000 Short-circuit error
*/
comedi_buf_write_samples(s, &status, 1);
comedi_handle_events(dev, s);
--
2.30.0
I'm announcing the release of the 5.4.100 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
arch/arm/xen/p2m.c | 6 +++--
arch/x86/kvm/svm.c | 1
arch/x86/xen/p2m.c | 15 ++++++--------
drivers/block/xen-blkback/blkback.c | 30 +++++++++++++++--------------
drivers/media/usb/pwc/pwc-if.c | 22 ++++++++++++---------
drivers/net/xen-netback/netback.c | 4 ---
drivers/xen/gntdev.c | 37 +++++++++++++++++++-----------------
drivers/xen/xen-scsiback.c | 4 +--
fs/btrfs/ctree.h | 6 ++---
include/xen/grant_table.h | 1
net/bridge/br.c | 5 +++-
net/qrtr/qrtr.c | 2 -
13 files changed, 73 insertions(+), 62 deletions(-)
David Sterba (1):
btrfs: fix backport of 2175bf57dc952 in 5.4.95
Greg Kroah-Hartman (1):
Linux 5.4.100
Jan Beulich (8):
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
xen-blkback: don't "handle" error by BUG()
xen-netback: don't "handle" error by BUG()
xen-scsiback: don't "handle" error by BUG()
xen-blkback: fix error handling in xen_blkbk_map()
Loic Poulain (1):
net: qrtr: Fix port ID for control messages
Matwey V. Kornilov (1):
media: pwc: Use correct device for DMA
Paolo Bonzini (1):
KVM: SEV: fix double locking due to incorrect backport
Stefano Stabellini (1):
xen/arm: don't ignore return errors from set_phys_to_machine
Wang Hai (1):
net: bridge: Fix a warning when del bridge sysfs
I'm announcing the release of the 5.11.1 kernel.
All users of the 5.11 kernel series must upgrade.
The updated 5.11.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.11.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
arch/arm/xen/p2m.c | 6 +++--
arch/x86/xen/p2m.c | 15 ++++++--------
drivers/block/xen-blkback/blkback.c | 32 +++++++++++++++++--------------
drivers/bluetooth/btusb.c | 20 +++++--------------
drivers/media/usb/pwc/pwc-if.c | 22 ++++++++++++---------
drivers/net/xen-netback/netback.c | 4 ---
drivers/tty/tty_io.c | 5 +++-
drivers/xen/gntdev.c | 37 +++++++++++++++++++-----------------
drivers/xen/xen-scsiback.c | 4 +--
include/xen/grant_table.h | 1
11 files changed, 77 insertions(+), 71 deletions(-)
Greg Kroah-Hartman (1):
Linux 5.11.1
Jan Beulich (8):
Xen/x86: don't bail early from clear_foreign_p2m_mapping()
Xen/x86: also check kernel mapping in set_foreign_p2m_mapping()
Xen/gntdev: correct dev_bus_addr handling in gntdev_map_grant_pages()
Xen/gntdev: correct error checking in gntdev_map_grant_pages()
xen-blkback: don't "handle" error by BUG()
xen-netback: don't "handle" error by BUG()
xen-scsiback: don't "handle" error by BUG()
xen-blkback: fix error handling in xen_blkbk_map()
Linus Torvalds (1):
tty: protect tty_write from odd low-level tty disciplines
Matwey V. Kornilov (1):
media: pwc: Use correct device for DMA
Stefano Stabellini (1):
xen/arm: don't ignore return errors from set_phys_to_machine
Trent Piepho (1):
Bluetooth: btusb: Always fallback to alt 1 for WBS
From: Mike Rapoport <rppt(a)linux.ibm.com>
There could be struct pages that are not backed by actual physical memory.
This can happen when the actual memory bank is not a multiple of
SECTION_SIZE or when an architecture does not register memory holes
reserved by the firmware as memblock.memory.
Such pages are currently initialized using init_unavailable_mem() function
that iterates through PFNs in holes in memblock.memory and if there is a
struct page corresponding to a PFN, the fields of this page are set to
default values and it is marked as Reserved.
init_unavailable_mem() does not take into account zone and node the page
belongs to and sets both zone and node links in struct page to zero.
Before commit 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions
rather that check each PFN") the holes inside a zone were re-initialized
during memmap_init() and got their zone/node links right. However, after
that commit nothing updates the struct pages representing such holes.
On a system that has firmware reserved holes in a zone above ZONE_DMA, for
instance in a configuration below:
# grep -A1 E820 /proc/iomem
7a17b000-7a216fff : Unknown E820 type
7a217000-7bffffff : System RAM
unset zone link in struct page will trigger
VM_BUG_ON_PAGE(!zone_spans_pfn(page_zone(page), pfn), page);
because there are pages in both ZONE_DMA32 and ZONE_DMA (unset zone link
in struct page) in the same pageblock.
Interleave initialization of the unavailable pages with the normal
initialization of memory map, so that zone and node information will be
properly set on struct pages that are not backed by the actual memory.
With this change the pages for holes inside a zone will get proper
zone/node links and the pages that are not spanned by any node will get
links to the adjacent zone/node.
Fixes: 73a6e474cb37 ("mm: memmap_init: iterate over memblock regions rather that check each PFN")
Signed-off-by: Mike Rapoport <rppt(a)linux.ibm.com>
Reported-by: Qian Cai <cai(a)lca.pw>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Baoquan He <bhe(a)redhat.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: Qian Cai <cai(a)lca.pw>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
---
mm/page_alloc.c | 144 ++++++++++++++++++++----------------------------
1 file changed, 61 insertions(+), 83 deletions(-)
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3e93f8b29bae..1f1db70b7789 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6280,12 +6280,60 @@ static void __meminit zone_init_free_lists(struct zone *zone)
}
}
+#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
+/*
+ * Only struct pages that correspond to ranges defined by memblock.memory
+ * are zeroed and initialized by going through __init_single_page() during
+ * memmap_init_zone().
+ *
+ * But, there could be struct pages that correspond to holes in
+ * memblock.memory. This can happen because of the following reasons:
+ * - phyiscal memory bank size is not necessarily the exact multiple of the
+ * arbitrary section size
+ * - early reserved memory may not be listed in memblock.memory
+ * - memory layouts defined with memmap= kernel parameter may not align
+ * nicely with memmap sections
+ *
+ * Explicitly initialize those struct pages so that:
+ * - PG_Reserved is set
+ * - zone and node links point to zone and node that span the page
+ */
+static u64 __meminit init_unavailable_range(unsigned long spfn,
+ unsigned long epfn,
+ int zone, int node)
+{
+ unsigned long pfn;
+ u64 pgcnt = 0;
+
+ for (pfn = spfn; pfn < epfn; pfn++) {
+ if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
+ pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
+ + pageblock_nr_pages - 1;
+ continue;
+ }
+ __init_single_page(pfn_to_page(pfn), pfn, zone, node);
+ __SetPageReserved(pfn_to_page(pfn));
+ pgcnt++;
+ }
+
+ return pgcnt;
+}
+#else
+static inline u64 init_unavailable_range(unsigned long spfn, unsigned long epfn,
+ int zone, int node)
+{
+ return 0;
+}
+#endif
+
void __meminit __weak memmap_init_zone(struct zone *zone)
{
unsigned long zone_start_pfn = zone->zone_start_pfn;
unsigned long zone_end_pfn = zone_start_pfn + zone->spanned_pages;
int i, nid = zone_to_nid(zone), zone_id = zone_idx(zone);
+ static unsigned long hole_pfn = 0;
unsigned long start_pfn, end_pfn;
+ u64 pgcnt = 0;
for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
start_pfn = clamp(start_pfn, zone_start_pfn, zone_end_pfn);
@@ -6295,7 +6343,20 @@ void __meminit __weak memmap_init_zone(struct zone *zone)
memmap_init_range(end_pfn - start_pfn, nid,
zone_id, start_pfn, zone_end_pfn,
MEMINIT_EARLY, NULL, MIGRATE_MOVABLE);
+
+ if (hole_pfn < start_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, start_pfn,
+ zone_id, nid);
+ hole_pfn = end_pfn;
}
+
+ if (hole_pfn < zone_end_pfn)
+ pgcnt += init_unavailable_range(hole_pfn, zone_end_pfn,
+ zone_id, nid);
+
+ if (pgcnt)
+ pr_info(" %s zone: %lld pages in unavailable ranges\n",
+ zone->name, pgcnt);
}
static int zone_batchsize(struct zone *zone)
@@ -7092,88 +7153,6 @@ void __init free_area_init_memoryless_node(int nid)
free_area_init_node(nid);
}
-#if !defined(CONFIG_FLAT_NODE_MEM_MAP)
-/*
- * Initialize all valid struct pages in the range [spfn, epfn) and mark them
- * PageReserved(). Return the number of struct pages that were initialized.
- */
-static u64 __init init_unavailable_range(unsigned long spfn, unsigned long epfn)
-{
- unsigned long pfn;
- u64 pgcnt = 0;
-
- for (pfn = spfn; pfn < epfn; pfn++) {
- if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
- pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
- + pageblock_nr_pages - 1;
- continue;
- }
- /*
- * Use a fake node/zone (0) for now. Some of these pages
- * (in memblock.reserved but not in memblock.memory) will
- * get re-initialized via reserve_bootmem_region() later.
- */
- __init_single_page(pfn_to_page(pfn), pfn, 0, 0);
- __SetPageReserved(pfn_to_page(pfn));
- pgcnt++;
- }
-
- return pgcnt;
-}
-
-/*
- * Only struct pages that are backed by physical memory are zeroed and
- * initialized by going through __init_single_page(). But, there are some
- * struct pages which are reserved in memblock allocator and their fields
- * may be accessed (for example page_to_pfn() on some configuration accesses
- * flags). We must explicitly initialize those struct pages.
- *
- * This function also addresses a similar issue where struct pages are left
- * uninitialized because the physical address range is not covered by
- * memblock.memory or memblock.reserved. That could happen when memblock
- * layout is manually configured via memmap=, or when the highest physical
- * address (max_pfn) does not end on a section boundary.
- */
-static void __init init_unavailable_mem(void)
-{
- phys_addr_t start, end;
- u64 i, pgcnt;
- phys_addr_t next = 0;
-
- /*
- * Loop through unavailable ranges not covered by memblock.memory.
- */
- pgcnt = 0;
- for_each_mem_range(i, &start, &end) {
- if (next < start)
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- PFN_UP(start));
- next = end;
- }
-
- /*
- * Early sections always have a fully populated memmap for the whole
- * section - see pfn_valid(). If the last section has holes at the
- * end and that section is marked "online", the memmap will be
- * considered initialized. Make sure that memmap has a well defined
- * state.
- */
- pgcnt += init_unavailable_range(PFN_DOWN(next),
- round_up(max_pfn, PAGES_PER_SECTION));
-
- /*
- * Struct pages that do not have backing memory. This could be because
- * firmware is using some of this memory, or for some other reasons.
- */
- if (pgcnt)
- pr_info("Zeroed struct page in unavailable ranges: %lld pages", pgcnt);
-}
-#else
-static inline void __init init_unavailable_mem(void)
-{
-}
-#endif /* !CONFIG_FLAT_NODE_MEM_MAP */
-
#if MAX_NUMNODES > 1
/*
* Figure out the number of possible node ids.
@@ -7597,7 +7576,6 @@ void __init free_area_init(unsigned long *max_zone_pfn)
/* Initialise every node */
mminit_verify_pageflags_layout();
setup_nr_node_ids();
- init_unavailable_mem();
for_each_online_node(nid) {
pg_data_t *pgdat = NODE_DATA(nid);
free_area_init_node(nid);
--
2.28.0
some binary, for example the output of golang, may be mark as FPXX,
while in fact they are still FP32.
Since FPXX binary can work with both FR=1 and FR=0, we introduce a
config option CONFIG_MIPS_O32_FPXX_USE_FR0 to force it to use FR=0 here.
https://go-review.googlesource.com/c/go/+/239217https://go-review.googlesource.com/c/go/+/237058
v4->v5:
Fix CONFIG_MIPS_O32_FPXX_USE_FR0 usage: if -> ifdef
v3->v4:
introduce a config option: CONFIG_MIPS_O32_FPXX_USE_FR0
v2->v3:
commit message: add Signed-off-by and Cc to stable.
v1->v2:
Fix bad commit message: in fact, we are switching to FR=0
Signed-off-by: YunQiang Su <yunqiang.su(a)cipunited.com>
Cc: stable(a)vger.kernel.org # 4.19+
---
arch/mips/Kconfig | 11 +++++++++++
arch/mips/kernel/elf.c | 13 ++++++++++---
2 files changed, 21 insertions(+), 3 deletions(-)
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 0a17bedf4f0d..442db620636f 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3100,6 +3100,17 @@ config MIPS_O32_FP64_SUPPORT
If unsure, say N.
+config MIPS_O32_FPXX_USE_FR0
+ bool "Use FR=0 mode for O32 FPXX binaries" if !CPU_MIPSR6
+ depends on MIPS_O32_FP64_SUPPORT
+ help
+ O32 FPXX can works on both FR=0 and FR=1 mode, so by default, the
+ mode preferred by hardware is used.
+
+ While some binaries may be marked as FPXX by mistake, for example
+ output of golang: they are in fact FP32 mode. To compatiable with
+ these binaries, we should use FR=0 mode for them.
+
config USE_OF
bool
select OF
diff --git a/arch/mips/kernel/elf.c b/arch/mips/kernel/elf.c
index 7b045d2a0b51..54b40b2b40b7 100644
--- a/arch/mips/kernel/elf.c
+++ b/arch/mips/kernel/elf.c
@@ -234,9 +234,10 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, void *_interp_ehdr,
* fpxx case. This is because, in any-ABI (or no-ABI) we have no FPU
* instructions so we don't care about the mode. We will simply use
* the one preferred by the hardware. In fpxx case, that ABI can
- * handle both FR=1 and FR=0, so, again, we simply choose the one
- * preferred by the hardware. Next, if we only use single-precision
- * FPU instructions, and the default ABI FPU mode is not good
+ * handle both FR=1 and FR=0. Here, we may need to use FR=0, because
+ * some binaries may be mark as FPXX by mistake (ie, output of golang).
+ * - If we only use single-precision FPU instructions,
+ * and the default ABI FPU mode is not good
* (ie single + any ABI combination), we set again the FPU mode to the
* one is preferred by the hardware. Next, if we know that the code
* will only use single-precision instructions, shown by single being
@@ -248,8 +249,14 @@ int arch_check_elf(void *_ehdr, bool has_interpreter, void *_interp_ehdr,
*/
if (prog_req.fre && !prog_req.frdefault && !prog_req.fr1)
state->overall_fp_mode = FP_FRE;
+#ifdef CONFIG_MIPS_O32_FPXX_USE_FR0
+ else if (prog_req.fr1 && prog_req.frdefault)
+ state->overall_fp_mode = FP_FR0;
+ else if (prog_req.single && !prog_req.frdefault)
+#else
else if ((prog_req.fr1 && prog_req.frdefault) ||
(prog_req.single && !prog_req.frdefault))
+#endif
/* Make sure 64-bit MIPS III/IV/64R1 will not pick FR1 */
state->overall_fp_mode = ((raw_current_cpu_data.fpu_id & MIPS_FPIR_F64) &&
cpu_has_mips_r2_r6) ?
--
2.20.1
Fix an off by three orders of magnitude error in the AB8500
GPADC driver. Luckily it showed up quite quickly when trying
to make use of it. The processed reads were returning
microvolts, microamperes and microcelsius instead of millivolts,
milliamperes and millicelsius as advertised.
Cc: stable(a)vger.kernel.org
Fixes: 07063bbfa98e ("iio: adc: New driver for the AB8500 GPADC")
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
drivers/iio/adc/ab8500-gpadc.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iio/adc/ab8500-gpadc.c b/drivers/iio/adc/ab8500-gpadc.c
index 6f9a3e2d5533..7b5212ba5501 100644
--- a/drivers/iio/adc/ab8500-gpadc.c
+++ b/drivers/iio/adc/ab8500-gpadc.c
@@ -918,7 +918,7 @@ static int ab8500_gpadc_read_raw(struct iio_dev *indio_dev,
return processed;
/* Return millivolt or milliamps or millicentigrades */
- *val = processed * 1000;
+ *val = processed;
return IIO_VAL_INT;
}
--
2.29.2
Older verions of libelf cannot recognize the compressed section.
However, it's only required to fix the compressed section info when compiling with CONFIG_DEBUG_INFO_COMPRESSED flag is set.
Only compile the compressed_section_fix function when necessary will make it easier to enable the BTF function.
Since the tool resolve_btfids is compiled with host toolchain.
The host toolchain might be older than the cross compile toolchain.
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Kun-Chuan Hsieh <jetswayss(a)gmail.com>
---
tools/bpf/resolve_btfids/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/bpf/resolve_btfids/main.c b/tools/bpf/resolve_btfids/main.c
index 7409d7860aa6..ad40346c6631 100644
--- a/tools/bpf/resolve_btfids/main.c
+++ b/tools/bpf/resolve_btfids/main.c
@@ -260,6 +260,7 @@ static struct btf_id *add_symbol(struct rb_root *root, char *name, size_t size)
return btf_id__add(root, id, false);
}
+#ifdef CONFIG_DEBUG_INFO_COMPRESSED
/*
* The data of compressed section should be aligned to 4
* (for 32bit) or 8 (for 64 bit) bytes. The binutils ld
@@ -292,6 +293,7 @@ static int compressed_section_fix(Elf *elf, Elf_Scn *scn, GElf_Shdr *sh)
}
return 0;
}
+#endif
static int elf_collect(struct object *obj)
{
@@ -370,8 +372,10 @@ static int elf_collect(struct object *obj)
obj->efile.idlist_addr = sh.sh_addr;
}
+#ifdef CONFIG_DEBUG_INFO_COMPRESSED
if (compressed_section_fix(elf, scn, &sh))
return -1;
+#endif
}
return 0;
--
2.25.1