This is the start of the stable review cycle for the 4.19.81 release.
There are 93 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue 29 Oct 2019 08:27:02 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.81-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.81-rc1
Greg KH <gregkh(a)linuxfoundation.org>
RDMA/cxgb4: Do not dma memory off of the stack
Tejun Heo <tj(a)kernel.org>
blk-rq-qos: fix first node deletion of rq_qos_del()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
PCI: PM: Fix pci_power_up()
Juergen Gross <jgross(a)suse.com>
xen/netback: fix error path of xenvif_connect_data()
Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
cpufreq: Avoid cpufreq_suspend() deadlock on system shutdown
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
memstick: jmb38x_ms: Fix an error handling path in 'jmb38x_ms_probe()'
Qu Wenruo <wqu(a)suse.com>
btrfs: tracepoints: Fix bad entry members of qgroup events
Filipe Manana <fdmanana(a)suse.com>
Btrfs: check for the full sync flag while holding the inode lock during fsync
Filipe Manana <fdmanana(a)suse.com>
Btrfs: add missing extents release on file extent cluster relocation error
Qu Wenruo <wqu(a)suse.com>
btrfs: block-group: Fix a memory leak due to missing btrfs_put_block_group()
Patrick Williams <alpawi(a)amazon.com>
pinctrl: armada-37xx: swap polarity on LED group
Patrick Williams <alpawi(a)amazon.com>
pinctrl: armada-37xx: fix control of pins 32 and up
Dmitry Torokhov <dmitry.torokhov(a)gmail.com>
pinctrl: cherryview: restore Strago DMI workaround for all versions
Sean Christopherson <sean.j.christopherson(a)intel.com>
x86/apic/x2apic: Fix a NULL pointer deref when handling a dying cpu
Steve Wahl <steve.wahl(a)hpe.com>
x86/boot/64: Make level2_kernel_pgt pages invalid outside kernel area
Mikulas Patocka <mpatocka(a)redhat.com>
dm cache: fix bugs when a GFP_NOWAIT allocation fails
Prateek Sood <prsood(a)codeaurora.org>
tracing: Fix race in perf_trace_buf initialization
Alexander Shishkin <alexander.shishkin(a)linux.intel.com>
perf/aux: Fix AUX output stopping
Pavel Shilovsky <pshilov(a)microsoft.com>
CIFS: Fix use after free of file info structures
Roberto Bergantinos Corpas <rbergant(a)redhat.com>
CIFS: avoid using MID 0xFFFF
Marc Zyngier <marc.zyngier(a)arm.com>
arm64: Enable workaround for Cavium TX2 erratum 219 when running SMT
James Morse <james.morse(a)arm.com>
EDAC/ghes: Fix Use after free in ghes_edac remove path
Helge Deller <deller(a)gmx.de>
parisc: Fix vmap memory leak in ioremap()/iounmap()
Max Filippov <jcmvbkbc(a)gmail.com>
xtensa: drop EXPORT_SYMBOL for outs*/ins*
Jane Chu <jane.chu(a)oracle.com>
mm/memory-failure: poison read receives SIGKILL instead of SIGBUS if mmaped more than once
David Hildenbrand <david(a)redhat.com>
hugetlbfs: don't access uninitialized memmaps in pfn_range_valid_gigantic()
Qian Cai <cai(a)lca.pw>
mm/page_owner: don't access uninitialized memmaps when reading /proc/pagetypeinfo
Qian Cai <cai(a)lca.pw>
mm/slub: fix a deadlock in show_slab_objects()
David Hildenbrand <david(a)redhat.com>
mm/memory-failure.c: don't access uninitialized memmaps in memory_failure()
Faiz Abbas <faiz_abbas(a)ti.com>
mmc: cqhci: Commit descriptors before setting the doorbell
David Hildenbrand <david(a)redhat.com>
fs/proc/page.c: don't access uninitialized memmaps in fs/proc/page.c
David Hildenbrand <david(a)redhat.com>
drivers/base/memory.c: don't access uninitialized memmaps in soft_offline_page_store()
Hans de Goede <hdegoede(a)redhat.com>
drm/amdgpu: Bail earlier when amdgpu.cik_/si_support is not set to 1
Thomas Hellstrom <thellstrom(a)vmware.com>
drm/ttm: Restore ttm prefaulting
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
drm/edid: Add 6 bpc quirk for SDC panel in Lenovo G50
Will Deacon <will(a)kernel.org>
mac80211: Reject malformed SSID elements
Will Deacon <will(a)kernel.org>
cfg80211: wext: avoid copying malformed SSIDs
John Garry <john.garry(a)huawei.com>
ACPI: CPPC: Set pcc_data[pcc_ss_id] to NULL in acpi_cppc_processor_exit()
Junya Monden <jmonden(a)jp.adit-jv.com>
ASoC: rsnd: Reinitialize bit clock inversion flag for every format setting
Evan Green <evgreen(a)chromium.org>
Input: synaptics-rmi4 - avoid processing unknown IRQs
Marco Felsch <m.felsch(a)pengutronix.de>
Input: da9063 - fix capability and drop KEY_SLEEP
Bart Van Assche <bvanassche(a)acm.org>
scsi: ch: Make it possible to open a ch device multiple times again
Yufen Yu <yuyufen(a)huawei.com>
scsi: core: try to get module before removing device
Damien Le Moal <damien.lemoal(a)wdc.com>
scsi: core: save/restore command resid for error handling
Oliver Neukum <oneukum(a)suse.com>
scsi: sd: Ignore a failure to sync cache due to lack of authorization
Steffen Maier <maier(a)linux.ibm.com>
scsi: zfcp: fix reaction on bit error threshold notification
Colin Ian King <colin.king(a)canonical.com>
staging: wlan-ng: fix exit return when sme->key_idx >= NUM_WEPKEYS
Paul Burton <paulburton(a)kernel.org>
MIPS: tlbex: Fix build_restore_pagemask KScratch restore
Johan Hovold <johan(a)kernel.org>
USB: ldusb: fix read info leaks
Johan Hovold <johan(a)kernel.org>
USB: usblp: fix use-after-free on disconnect
Johan Hovold <johan(a)kernel.org>
USB: ldusb: fix memleak on disconnect
Johan Hovold <johan(a)kernel.org>
USB: serial: ti_usb_3410_5052: fix port-close races
Gustavo A. R. Silva <gustavo(a)embeddedor.com>
usb: udc: lpc32xx: fix bad bit shift operation
Lukas Wunner <lukas(a)wunner.de>
ALSA: hda - Force runtime PM on Nvidia HDMI codecs
Szabolcs Szőke <szszoke.code(a)gmail.com>
ALSA: usb-audio: Disable quirks for BOSS Katana amplifiers
Daniel Drake <drake(a)endlessm.com>
ALSA: hda/realtek - Enable headset mic on Asus MJ401TA
Kailang Yang <kailang(a)realtek.com>
ALSA: hda/realtek - Add support for ALC711
Johan Hovold <johan(a)kernel.org>
USB: legousbtower: fix memleak on disconnect
Matthew Wilcox (Oracle) <willy(a)infradead.org>
memfd: Fix locking when tagging pins
Xin Long <lucien.xin(a)gmail.com>
sctp: change sctp_prot .no_autobind with true
Biao Huang <biao.huang(a)mediatek.com>
net: stmmac: disable/enable ptp_ref_clk in suspend/resume flow
Xin Long <lucien.xin(a)gmail.com>
net: ipv6: fix listify ip6_rcv_finish in case of forwarding
Cédric Le Goater <clg(a)kaod.org>
net/ibmvnic: Fix EOI when running in XIVE mode.
Thomas Bogendoerfer <tbogendoerfer(a)suse.de>
net: i82596: fix dma_alloc_attr for sni_82596
Florian Fainelli <f.fainelli(a)gmail.com>
net: bcmgenet: Set phydev->dev_flags only for internal PHYs
Florian Fainelli <f.fainelli(a)gmail.com>
net: bcmgenet: Fix RGMII_MODE_EN value for GENET v1/2/3
Eric Dumazet <edumazet(a)google.com>
net: avoid potential infinite loop in tc_ctl_action()
Stefano Brivio <sbrivio(a)redhat.com>
ipv4: Return -ENETUNREACH if we can't create route but saddr is valid
Wei Wang <weiwan(a)google.com>
ipv4: fix race condition between route lookup and invalidation
Yi Li <yilikernel(a)gmail.com>
ocfs2: fix panic due to ocfs2_wq is null
Alex Deucher <alexander.deucher(a)amd.com>
Revert "drm/radeon: Fix EEH during kexec"
Song Liu <songliubraving(a)fb.com>
md/raid0: fix warning message for parameter default_layout
Dan Williams <dan.j.williams(a)intel.com>
libata/ahci: Fix PCS quirk application
Jacob Keller <jacob.e.keller(a)intel.com>
namespace: fix namespace.pl script to support relative paths
Kai-Heng Feng <kai.heng.feng(a)canonical.com>
r8152: Set macpassthru in reset_resume callback
Randy Dunlap <rdunlap(a)infradead.org>
lib: textsearch: fix escapes in example code
Yizhuo <yzhai003(a)ucr.edu>
net: hisilicon: Fix usage of uninitialized variable in function mdio_sc_cfg_reg_write()
Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
mips: Loongson: Fix the link time qualifier of 'serial_exit()'
Wen Yang <wenyang(a)linux.alibaba.com>
net: dsa: rtl8366rb: add missing of_node_put after calling of_get_child_by_name
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nft_connlimit: disable bh on garbage collection
Miaoqing Pan <miaoqing(a)codeaurora.org>
mac80211: fix txq null pointer dereference
Miaoqing Pan <miaoqing(a)codeaurora.org>
nl80211: fix null pointer dereference
Ross Lagerwall <ross.lagerwall(a)citrix.com>
xen/efi: Set nonblocking callbacks
Oleksij Rempel <o.rempel(a)pengutronix.de>
MIPS: dts: ar9331: fix interrupt-controller size
Michal Vokáč <michal.vokac(a)ysoft.com>
net: dsa: qca8k: Use up to 7 ports for all operations
Peter Ujfalusi <peter.ujfalusi(a)ti.com>
ARM: dts: am4372: Set memory bandwidth limit for DISPC
Navid Emamdoost <navid.emamdoost(a)gmail.com>
ieee802154: ca8210: prevent memory leak
Tony Lindgren <tony(a)atomide.com>
ARM: OMAP2+: Fix warnings with broken omap2_set_init_voltage()
Tony Lindgren <tony(a)atomide.com>
ARM: OMAP2+: Fix missing reset done flag for am3 and am43
Quinn Tran <qutran(a)marvell.com>
scsi: qla2xxx: Fix unbound sleep in fcport delete path.
Xiang Chen <chenxiang66(a)hisilicon.com>
scsi: megaraid: disable device when probe failed after enabled device
Stanley Chu <stanley.chu(a)mediatek.com>
scsi: ufs: skip shutdown if hba is not powered
Balbir Singh <sblbir(a)amzn.com>
nvme-pci: Fix a race in controller removal
-------------
Diffstat:
Makefile | 4 +-
arch/arm/boot/dts/am4372.dtsi | 2 +
.../mach-omap2/omap_hwmod_33xx_43xx_ipblock_data.c | 3 +-
arch/arm/mach-omap2/pm.c | 100 ---------------------
arch/arm/xen/efi.c | 2 +
arch/arm64/kernel/cpu_errata.c | 33 +++++++
arch/mips/boot/dts/qca/ar9331.dtsi | 2 +-
arch/mips/loongson64/common/serial.c | 2 +-
arch/mips/mm/tlbex.c | 23 +++--
arch/parisc/mm/ioremap.c | 12 +--
arch/x86/kernel/apic/x2apic_cluster.c | 3 +-
arch/x86/kernel/head64.c | 22 ++++-
arch/x86/xen/efi.c | 2 +
arch/xtensa/kernel/xtensa_ksyms.c | 7 --
block/blk-rq-qos.h | 13 ++-
drivers/acpi/cppc_acpi.c | 2 +-
drivers/ata/ahci.c | 4 +-
drivers/base/core.c | 3 +
drivers/base/memory.c | 3 +
drivers/cpufreq/cpufreq.c | 10 ---
drivers/edac/ghes_edac.c | 4 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 35 ++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 35 --------
drivers/gpu/drm/drm_edid.c | 3 +
drivers/gpu/drm/radeon/radeon_drv.c | 8 --
drivers/gpu/drm/ttm/ttm_bo_vm.c | 16 ++--
drivers/infiniband/hw/cxgb4/mem.c | 28 +++---
drivers/input/misc/da9063_onkey.c | 5 +-
drivers/input/rmi4/rmi_driver.c | 6 +-
drivers/md/dm-cache-target.c | 28 +-----
drivers/md/raid0.c | 2 +-
drivers/memstick/host/jmb38x_ms.c | 2 +-
drivers/mmc/host/cqhci.c | 3 +-
drivers/net/dsa/qca8k.c | 4 +-
drivers/net/dsa/rtl8366rb.c | 16 ++--
drivers/net/ethernet/broadcom/genet/bcmgenet.h | 1 +
drivers/net/ethernet/broadcom/genet/bcmmii.c | 11 ++-
drivers/net/ethernet/hisilicon/hns_mdio.c | 6 +-
drivers/net/ethernet/i825xx/lasi_82596.c | 4 +-
drivers/net/ethernet/i825xx/lib82596.c | 4 +-
drivers/net/ethernet/i825xx/sni_82596.c | 4 +-
drivers/net/ethernet/ibm/ibmvnic.c | 8 +-
drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 12 ++-
drivers/net/ieee802154/ca8210.c | 2 +-
drivers/net/usb/r8152.c | 3 +-
drivers/net/xen-netback/interface.c | 1 -
drivers/nvme/host/core.c | 5 +-
drivers/pci/pci.c | 24 +++--
drivers/pinctrl/intel/pinctrl-cherryview.c | 4 -
drivers/pinctrl/mvebu/pinctrl-armada-37xx.c | 26 +++---
drivers/s390/scsi/zfcp_fsf.c | 16 +++-
drivers/scsi/ch.c | 1 -
drivers/scsi/megaraid.c | 4 +-
drivers/scsi/qla2xxx/qla_target.c | 4 +
drivers/scsi/scsi_error.c | 3 +
drivers/scsi/scsi_sysfs.c | 11 ++-
drivers/scsi/sd.c | 3 +-
drivers/scsi/ufs/ufshcd.c | 3 +
drivers/staging/wlan-ng/cfg80211.c | 6 +-
drivers/usb/class/usblp.c | 4 +-
drivers/usb/gadget/udc/lpc32xx_udc.c | 6 +-
drivers/usb/misc/ldusb.c | 23 ++---
drivers/usb/misc/legousbtower.c | 5 +-
drivers/usb/serial/ti_usb_3410_5052.c | 10 +--
fs/btrfs/extent-tree.c | 1 +
fs/btrfs/file.c | 36 ++++----
fs/btrfs/relocation.c | 2 +
fs/cifs/file.c | 6 +-
fs/cifs/smb1ops.c | 3 +
fs/ocfs2/journal.c | 3 +-
fs/ocfs2/localalloc.c | 3 +-
fs/proc/page.c | 28 +++---
include/scsi/scsi_eh.h | 1 +
include/trace/events/btrfs.h | 3 +-
kernel/events/core.c | 2 +-
kernel/trace/trace_event_perf.c | 4 +
lib/textsearch.c | 4 +-
mm/hugetlb.c | 5 +-
mm/memfd.c | 18 ++--
mm/memory-failure.c | 36 ++++----
mm/page_owner.c | 5 +-
mm/slub.c | 13 ++-
net/ipv4/route.c | 11 ++-
net/ipv6/ip6_input.c | 4 +-
net/mac80211/debugfs_netdev.c | 11 ++-
net/mac80211/mlme.c | 5 +-
net/netfilter/nft_connlimit.c | 7 +-
net/sched/act_api.c | 14 +--
net/sctp/socket.c | 4 +-
net/wireless/nl80211.c | 3 +
net/wireless/wext-sme.c | 8 +-
scripts/namespace.pl | 13 +--
sound/pci/hda/patch_hdmi.c | 2 +
sound/pci/hda/patch_realtek.c | 14 +++
sound/soc/sh/rcar/core.c | 1 +
sound/usb/pcm.c | 3 +
96 files changed, 495 insertions(+), 439 deletions(-)
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
commit 93cad5f789951eaa27c3392b15294b4e51253944 upstream.
Cc: stable(a)vger.kernel.org # v5.3
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
[mpe: Some minor fixes to make it build]
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20190924035254.24612-4-aneesh.kumar@linux.ibm.com
[sandipan: Backported to v5.3]
Signed-off-by: Sandipan Das <sandipan(a)linux.ibm.com>
---
tools/testing/selftests/powerpc/mm/Makefile | 2 +
.../testing/selftests/powerpc/mm/tlbie_test.c | 734 ++++++++++++++++++
2 files changed, 736 insertions(+)
create mode 100644 tools/testing/selftests/powerpc/mm/tlbie_test.c
diff --git a/tools/testing/selftests/powerpc/mm/Makefile b/tools/testing/selftests/powerpc/mm/Makefile
index f1fbc15800c4..ed1565809d2b 100644
--- a/tools/testing/selftests/powerpc/mm/Makefile
+++ b/tools/testing/selftests/powerpc/mm/Makefile
@@ -4,6 +4,7 @@ noarg:
TEST_GEN_PROGS := hugetlb_vs_thp_test subpage_prot prot_sao segv_errors wild_bctr \
large_vm_fork_separation
+TEST_GEN_PROGS_EXTENDED := tlbie_test
TEST_GEN_FILES := tempfile
top_srcdir = ../../../../..
@@ -19,3 +20,4 @@ $(OUTPUT)/large_vm_fork_separation: CFLAGS += -m64
$(OUTPUT)/tempfile:
dd if=/dev/zero of=$@ bs=64k count=1
+$(OUTPUT)/tlbie_test: LDLIBS += -lpthread
diff --git a/tools/testing/selftests/powerpc/mm/tlbie_test.c b/tools/testing/selftests/powerpc/mm/tlbie_test.c
new file mode 100644
index 000000000000..9868a5ddd847
--- /dev/null
+++ b/tools/testing/selftests/powerpc/mm/tlbie_test.c
@@ -0,0 +1,734 @@
+// SPDX-License-Identifier: GPL-2.0
+
+/*
+ * Copyright 2019, Nick Piggin, Gautham R. Shenoy, Aneesh Kumar K.V, IBM Corp.
+ */
+
+/*
+ *
+ * Test tlbie/mtpidr race. We have 4 threads doing flush/load/compare/store
+ * sequence in a loop. The same threads also rung a context switch task
+ * that does sched_yield() in loop.
+ *
+ * The snapshot thread mark the mmap area PROT_READ in between, make a copy
+ * and copy it back to the original area. This helps us to detect if any
+ * store continued to happen after we marked the memory PROT_READ.
+ */
+
+#define _GNU_SOURCE
+#include <stdio.h>
+#include <sys/mman.h>
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ipc.h>
+#include <sys/shm.h>
+#include <sys/stat.h>
+#include <sys/time.h>
+#include <linux/futex.h>
+#include <unistd.h>
+#include <asm/unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include <fcntl.h>
+#include <sched.h>
+#include <time.h>
+#include <stdarg.h>
+#include <sched.h>
+#include <pthread.h>
+#include <signal.h>
+#include <sys/prctl.h>
+
+static inline void dcbf(volatile unsigned int *addr)
+{
+ __asm__ __volatile__ ("dcbf %y0; sync" : : "Z"(*(unsigned char *)addr) : "memory");
+}
+
+static void err_msg(char *msg)
+{
+
+ time_t now;
+ time(&now);
+ printf("=================================\n");
+ printf(" Error: %s\n", msg);
+ printf(" %s", ctime(&now));
+ printf("=================================\n");
+ exit(1);
+}
+
+static char *map1;
+static char *map2;
+static pid_t rim_process_pid;
+
+/*
+ * A "rim-sequence" is defined to be the sequence of the following
+ * operations performed on a memory word:
+ * 1) FLUSH the contents of that word.
+ * 2) LOAD the contents of that word.
+ * 3) COMPARE the contents of that word with the content that was
+ * previously stored at that word
+ * 4) STORE new content into that word.
+ *
+ * The threads in this test that perform the rim-sequence are termed
+ * as rim_threads.
+ */
+
+/*
+ * A "corruption" is defined to be the failed COMPARE operation in a
+ * rim-sequence.
+ *
+ * A rim_thread that detects a corruption informs about it to all the
+ * other rim_threads, and the mem_snapshot thread.
+ */
+static volatile unsigned int corruption_found;
+
+/*
+ * This defines the maximum number of rim_threads in this test.
+ *
+ * The THREAD_ID_BITS denote the number of bits required
+ * to represent the thread_ids [0..MAX_THREADS - 1].
+ * We are being a bit paranoid here and set it to 8 bits,
+ * though 6 bits suffice.
+ *
+ */
+#define MAX_THREADS 64
+#define THREAD_ID_BITS 8
+#define THREAD_ID_MASK ((1 << THREAD_ID_BITS) - 1)
+static unsigned int rim_thread_ids[MAX_THREADS];
+static pthread_t rim_threads[MAX_THREADS];
+
+
+/*
+ * Each rim_thread works on an exclusive "chunk" of size
+ * RIM_CHUNK_SIZE.
+ *
+ * The ith rim_thread works on the ith chunk.
+ *
+ * The ith chunk begins at
+ * map1 + (i * RIM_CHUNK_SIZE)
+ */
+#define RIM_CHUNK_SIZE 1024
+#define BITS_PER_BYTE 8
+#define WORD_SIZE (sizeof(unsigned int))
+#define WORD_BITS (WORD_SIZE * BITS_PER_BYTE)
+#define WORDS_PER_CHUNK (RIM_CHUNK_SIZE/WORD_SIZE)
+
+static inline char *compute_chunk_start_addr(unsigned int thread_id)
+{
+ char *chunk_start;
+
+ chunk_start = (char *)((unsigned long)map1 +
+ (thread_id * RIM_CHUNK_SIZE));
+
+ return chunk_start;
+}
+
+/*
+ * The "word-offset" of a word-aligned address inside a chunk, is
+ * defined to be the number of words that precede the address in that
+ * chunk.
+ *
+ * WORD_OFFSET_BITS denote the number of bits required to represent
+ * the word-offsets of all the word-aligned addresses of a chunk.
+ */
+#define WORD_OFFSET_BITS (__builtin_ctz(WORDS_PER_CHUNK))
+#define WORD_OFFSET_MASK ((1 << WORD_OFFSET_BITS) - 1)
+
+static inline unsigned int compute_word_offset(char *start, unsigned int *addr)
+{
+ unsigned int delta_bytes, ret;
+ delta_bytes = (unsigned long)addr - (unsigned long)start;
+
+ ret = delta_bytes/WORD_SIZE;
+
+ return ret;
+}
+
+/*
+ * A "sweep" is defined to be the sequential execution of the
+ * rim-sequence by a rim_thread on its chunk one word at a time,
+ * starting from the first word of its chunk and ending with the last
+ * word of its chunk.
+ *
+ * Each sweep of a rim_thread is uniquely identified by a sweep_id.
+ * SWEEP_ID_BITS denote the number of bits required to represent
+ * the sweep_ids of rim_threads.
+ *
+ * As to why SWEEP_ID_BITS are computed as a function of THREAD_ID_BITS,
+ * WORD_OFFSET_BITS, and WORD_BITS, see the "store-pattern" below.
+ */
+#define SWEEP_ID_BITS (WORD_BITS - (THREAD_ID_BITS + WORD_OFFSET_BITS))
+#define SWEEP_ID_MASK ((1 << SWEEP_ID_BITS) - 1)
+
+/*
+ * A "store-pattern" is the word-pattern that is stored into a word
+ * location in the 4)STORE step of the rim-sequence.
+ *
+ * In the store-pattern, we shall encode:
+ *
+ * - The thread-id of the rim_thread performing the store
+ * (The most significant THREAD_ID_BITS)
+ *
+ * - The word-offset of the address into which the store is being
+ * performed (The next WORD_OFFSET_BITS)
+ *
+ * - The sweep_id of the current sweep in which the store is
+ * being performed. (The lower SWEEP_ID_BITS)
+ *
+ * Store Pattern: 32 bits
+ * |------------------|--------------------|---------------------------------|
+ * | Thread id | Word offset | sweep_id |
+ * |------------------|--------------------|---------------------------------|
+ * THREAD_ID_BITS WORD_OFFSET_BITS SWEEP_ID_BITS
+ *
+ * In the store pattern, the (Thread-id + Word-offset) uniquely identify the
+ * address to which the store is being performed i.e,
+ * address == map1 +
+ * (Thread-id * RIM_CHUNK_SIZE) + (Word-offset * WORD_SIZE)
+ *
+ * And the sweep_id in the store pattern identifies the time when the
+ * store was performed by the rim_thread.
+ *
+ * We shall use this property in the 3)COMPARE step of the
+ * rim-sequence.
+ */
+#define SWEEP_ID_SHIFT 0
+#define WORD_OFFSET_SHIFT (SWEEP_ID_BITS)
+#define THREAD_ID_SHIFT (WORD_OFFSET_BITS + SWEEP_ID_BITS)
+
+/*
+ * Compute the store pattern for a given thread with id @tid, at
+ * location @addr in the sweep identified by @sweep_id
+ */
+static inline unsigned int compute_store_pattern(unsigned int tid,
+ unsigned int *addr,
+ unsigned int sweep_id)
+{
+ unsigned int ret = 0;
+ char *start = compute_chunk_start_addr(tid);
+ unsigned int word_offset = compute_word_offset(start, addr);
+
+ ret += (tid & THREAD_ID_MASK) << THREAD_ID_SHIFT;
+ ret += (word_offset & WORD_OFFSET_MASK) << WORD_OFFSET_SHIFT;
+ ret += (sweep_id & SWEEP_ID_MASK) << SWEEP_ID_SHIFT;
+ return ret;
+}
+
+/* Extract the thread-id from the given store-pattern */
+static inline unsigned int extract_tid(unsigned int pattern)
+{
+ unsigned int ret;
+
+ ret = (pattern >> THREAD_ID_SHIFT) & THREAD_ID_MASK;
+ return ret;
+}
+
+/* Extract the word-offset from the given store-pattern */
+static inline unsigned int extract_word_offset(unsigned int pattern)
+{
+ unsigned int ret;
+
+ ret = (pattern >> WORD_OFFSET_SHIFT) & WORD_OFFSET_MASK;
+
+ return ret;
+}
+
+/* Extract the sweep-id from the given store-pattern */
+static inline unsigned int extract_sweep_id(unsigned int pattern)
+
+{
+ unsigned int ret;
+
+ ret = (pattern >> SWEEP_ID_SHIFT) & SWEEP_ID_MASK;
+
+ return ret;
+}
+
+/************************************************************
+ * *
+ * Logging the output of the verification *
+ * *
+ ************************************************************/
+#define LOGDIR_NAME_SIZE 100
+static char logdir[LOGDIR_NAME_SIZE];
+
+static FILE *fp[MAX_THREADS];
+static const char logfilename[] ="Thread-%02d-Chunk";
+
+static inline void start_verification_log(unsigned int tid,
+ unsigned int *addr,
+ unsigned int cur_sweep_id,
+ unsigned int prev_sweep_id)
+{
+ FILE *f;
+ char logfile[30];
+ char path[LOGDIR_NAME_SIZE + 30];
+ char separator[2] = "/";
+ char *chunk_start = compute_chunk_start_addr(tid);
+ unsigned int size = RIM_CHUNK_SIZE;
+
+ sprintf(logfile, logfilename, tid);
+ strcpy(path, logdir);
+ strcat(path, separator);
+ strcat(path, logfile);
+ f = fopen(path, "w");
+
+ if (!f) {
+ err_msg("Unable to create logfile\n");
+ }
+
+ fp[tid] = f;
+
+ fprintf(f, "----------------------------------------------------------\n");
+ fprintf(f, "PID = %d\n", rim_process_pid);
+ fprintf(f, "Thread id = %02d\n", tid);
+ fprintf(f, "Chunk Start Addr = 0x%016lx\n", (unsigned long)chunk_start);
+ fprintf(f, "Chunk Size = %d\n", size);
+ fprintf(f, "Next Store Addr = 0x%016lx\n", (unsigned long)addr);
+ fprintf(f, "Current sweep-id = 0x%08x\n", cur_sweep_id);
+ fprintf(f, "Previous sweep-id = 0x%08x\n", prev_sweep_id);
+ fprintf(f, "----------------------------------------------------------\n");
+}
+
+static inline void log_anamoly(unsigned int tid, unsigned int *addr,
+ unsigned int expected, unsigned int observed)
+{
+ FILE *f = fp[tid];
+
+ fprintf(f, "Thread %02d: Addr 0x%lx: Expected 0x%x, Observed 0x%x\n",
+ tid, (unsigned long)addr, expected, observed);
+ fprintf(f, "Thread %02d: Expected Thread id = %02d\n", tid, extract_tid(expected));
+ fprintf(f, "Thread %02d: Observed Thread id = %02d\n", tid, extract_tid(observed));
+ fprintf(f, "Thread %02d: Expected Word offset = %03d\n", tid, extract_word_offset(expected));
+ fprintf(f, "Thread %02d: Observed Word offset = %03d\n", tid, extract_word_offset(observed));
+ fprintf(f, "Thread %02d: Expected sweep-id = 0x%x\n", tid, extract_sweep_id(expected));
+ fprintf(f, "Thread %02d: Observed sweep-id = 0x%x\n", tid, extract_sweep_id(observed));
+ fprintf(f, "----------------------------------------------------------\n");
+}
+
+static inline void end_verification_log(unsigned int tid, unsigned nr_anamolies)
+{
+ FILE *f = fp[tid];
+ char logfile[30];
+ char path[LOGDIR_NAME_SIZE + 30];
+ char separator[] = "/";
+
+ fclose(f);
+
+ if (nr_anamolies == 0) {
+ remove(path);
+ return;
+ }
+
+ sprintf(logfile, logfilename, tid);
+ strcpy(path, logdir);
+ strcat(path, separator);
+ strcat(path, logfile);
+
+ printf("Thread %02d chunk has %d corrupted words. For details check %s\n",
+ tid, nr_anamolies, path);
+}
+
+/*
+ * When a COMPARE step of a rim-sequence fails, the rim_thread informs
+ * everyone else via the shared_memory pointed to by
+ * corruption_found variable. On seeing this, every thread verifies the
+ * content of its chunk as follows.
+ *
+ * Suppose a thread identified with @tid was about to store (but not
+ * yet stored) to @next_store_addr in its current sweep identified
+ * @cur_sweep_id. Let @prev_sweep_id indicate the previous sweep_id.
+ *
+ * This implies that for all the addresses @addr < @next_store_addr,
+ * Thread @tid has already performed a store as part of its current
+ * sweep. Hence we expect the content of such @addr to be:
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | cur_sweep_id |
+ * |-------------------------------------------------|
+ *
+ * Since Thread @tid is yet to perform stores on address
+ * @next_store_addr and above, we expect the content of such an
+ * address @addr to be:
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | prev_sweep_id |
+ * |-------------------------------------------------|
+ *
+ * The verifier function @verify_chunk does this verification and logs
+ * any anamolies that it finds.
+ */
+static void verify_chunk(unsigned int tid, unsigned int *next_store_addr,
+ unsigned int cur_sweep_id,
+ unsigned int prev_sweep_id)
+{
+ unsigned int *iter_ptr;
+ unsigned int size = RIM_CHUNK_SIZE;
+ unsigned int expected;
+ unsigned int observed;
+ char *chunk_start = compute_chunk_start_addr(tid);
+
+ int nr_anamolies = 0;
+
+ start_verification_log(tid, next_store_addr,
+ cur_sweep_id, prev_sweep_id);
+
+ for (iter_ptr = (unsigned int *)chunk_start;
+ (unsigned long)iter_ptr < (unsigned long)chunk_start + size;
+ iter_ptr++) {
+ unsigned int expected_sweep_id;
+
+ if (iter_ptr < next_store_addr) {
+ expected_sweep_id = cur_sweep_id;
+ } else {
+ expected_sweep_id = prev_sweep_id;
+ }
+
+ expected = compute_store_pattern(tid, iter_ptr, expected_sweep_id);
+
+ dcbf((volatile unsigned int*)iter_ptr); //Flush before reading
+ observed = *iter_ptr;
+
+ if (observed != expected) {
+ nr_anamolies++;
+ log_anamoly(tid, iter_ptr, expected, observed);
+ }
+ }
+
+ end_verification_log(tid, nr_anamolies);
+}
+
+static void set_pthread_cpu(pthread_t th, int cpu)
+{
+ cpu_set_t run_cpu_mask;
+ struct sched_param param;
+
+ CPU_ZERO(&run_cpu_mask);
+ CPU_SET(cpu, &run_cpu_mask);
+ pthread_setaffinity_np(th, sizeof(cpu_set_t), &run_cpu_mask);
+
+ param.sched_priority = 1;
+ if (0 && sched_setscheduler(0, SCHED_FIFO, ¶m) == -1) {
+ /* haven't reproduced with this setting, it kills random preemption which may be a factor */
+ fprintf(stderr, "could not set SCHED_FIFO, run as root?\n");
+ }
+}
+
+static void set_mycpu(int cpu)
+{
+ cpu_set_t run_cpu_mask;
+ struct sched_param param;
+
+ CPU_ZERO(&run_cpu_mask);
+ CPU_SET(cpu, &run_cpu_mask);
+ sched_setaffinity(0, sizeof(cpu_set_t), &run_cpu_mask);
+
+ param.sched_priority = 1;
+ if (0 && sched_setscheduler(0, SCHED_FIFO, ¶m) == -1) {
+ fprintf(stderr, "could not set SCHED_FIFO, run as root?\n");
+ }
+}
+
+static volatile int segv_wait;
+
+static void segv_handler(int signo, siginfo_t *info, void *extra)
+{
+ while (segv_wait) {
+ sched_yield();
+ }
+
+}
+
+static void set_segv_handler(void)
+{
+ struct sigaction sa;
+
+ sa.sa_flags = SA_SIGINFO;
+ sa.sa_sigaction = segv_handler;
+
+ if (sigaction(SIGSEGV, &sa, NULL) == -1) {
+ perror("sigaction");
+ exit(EXIT_FAILURE);
+ }
+}
+
+int timeout = 0;
+/*
+ * This function is executed by every rim_thread.
+ *
+ * This function performs sweeps over the exclusive chunks of the
+ * rim_threads executing the rim-sequence one word at a time.
+ */
+static void *rim_fn(void *arg)
+{
+ unsigned int tid = *((unsigned int *)arg);
+
+ int size = RIM_CHUNK_SIZE;
+ char *chunk_start = compute_chunk_start_addr(tid);
+
+ unsigned int prev_sweep_id;
+ unsigned int cur_sweep_id = 0;
+
+ /* word access */
+ unsigned int pattern = cur_sweep_id;
+ unsigned int *pattern_ptr = &pattern;
+ unsigned int *w_ptr, read_data;
+
+ set_segv_handler();
+
+ /*
+ * Let us initialize the chunk:
+ *
+ * Each word-aligned address addr in the chunk,
+ * is initialized to :
+ * |-------------------------------------------------|
+ * | tid | word_offset(addr) | 0 |
+ * |-------------------------------------------------|
+ */
+ for (w_ptr = (unsigned int *)chunk_start;
+ (unsigned long)w_ptr < (unsigned long)(chunk_start) + size;
+ w_ptr++) {
+
+ *pattern_ptr = compute_store_pattern(tid, w_ptr, cur_sweep_id);
+ *w_ptr = *pattern_ptr;
+ }
+
+ while (!corruption_found && !timeout) {
+ prev_sweep_id = cur_sweep_id;
+ cur_sweep_id = cur_sweep_id + 1;
+
+ for (w_ptr = (unsigned int *)chunk_start;
+ (unsigned long)w_ptr < (unsigned long)(chunk_start) + size;
+ w_ptr++) {
+ unsigned int old_pattern;
+
+ /*
+ * Compute the pattern that we would have
+ * stored at this location in the previous
+ * sweep.
+ */
+ old_pattern = compute_store_pattern(tid, w_ptr, prev_sweep_id);
+
+ /*
+ * FLUSH:Ensure that we flush the contents of
+ * the cache before loading
+ */
+ dcbf((volatile unsigned int*)w_ptr); //Flush
+
+ /* LOAD: Read the value */
+ read_data = *w_ptr; //Load
+
+ /*
+ * COMPARE: Is it the same as what we had stored
+ * in the previous sweep ? It better be!
+ */
+ if (read_data != old_pattern) {
+ /* No it isn't! Tell everyone */
+ corruption_found = 1;
+ }
+
+ /*
+ * Before performing a store, let us check if
+ * any rim_thread has found a corruption.
+ */
+ if (corruption_found || timeout) {
+ /*
+ * Yes. Someone (including us!) has found
+ * a corruption :(
+ *
+ * Let us verify that our chunk is
+ * correct.
+ */
+ /* But first, let us allow the dust to settle down! */
+ verify_chunk(tid, w_ptr, cur_sweep_id, prev_sweep_id);
+
+ return 0;
+ }
+
+ /*
+ * Compute the new pattern that we are going
+ * to write to this location
+ */
+ *pattern_ptr = compute_store_pattern(tid, w_ptr, cur_sweep_id);
+
+ /*
+ * STORE: Now let us write this pattern into
+ * the location
+ */
+ *w_ptr = *pattern_ptr;
+ }
+ }
+
+ return NULL;
+}
+
+
+static unsigned long start_cpu = 0;
+static unsigned long nrthreads = 4;
+
+static pthread_t mem_snapshot_thread;
+
+static void *mem_snapshot_fn(void *arg)
+{
+ int page_size = getpagesize();
+ size_t size = page_size;
+ void *tmp = malloc(size);
+
+ while (!corruption_found && !timeout) {
+ /* Stop memory migration once corruption is found */
+ segv_wait = 1;
+
+ mprotect(map1, size, PROT_READ);
+
+ /*
+ * Load from the working alias (map1). Loading from map2
+ * also fails.
+ */
+ memcpy(tmp, map1, size);
+
+ /*
+ * Stores must go via map2 which has write permissions, but
+ * the corrupted data tends to be seen in the snapshot buffer,
+ * so corruption does not appear to be introduced at the
+ * copy-back via map2 alias here.
+ */
+ memcpy(map2, tmp, size);
+ /*
+ * Before releasing other threads, must ensure the copy
+ * back to
+ */
+ asm volatile("sync" ::: "memory");
+ mprotect(map1, size, PROT_READ|PROT_WRITE);
+ asm volatile("sync" ::: "memory");
+ segv_wait = 0;
+
+ usleep(1); /* This value makes a big difference */
+ }
+
+ return 0;
+}
+
+void alrm_sighandler(int sig)
+{
+ timeout = 1;
+}
+
+int main(int argc, char *argv[])
+{
+ int c;
+ int page_size = getpagesize();
+ time_t now;
+ int i, dir_error;
+ pthread_attr_t attr;
+ key_t shm_key = (key_t) getpid();
+ int shmid, run_time = 20 * 60;
+ struct sigaction sa_alrm;
+
+ snprintf(logdir, LOGDIR_NAME_SIZE,
+ "/tmp/logdir-%u", (unsigned int)getpid());
+ while ((c = getopt(argc, argv, "r:hn:l:t:")) != -1) {
+ switch(c) {
+ case 'r':
+ start_cpu = strtoul(optarg, NULL, 10);
+ break;
+ case 'h':
+ printf("%s [-r <start_cpu>] [-n <nrthreads>] [-l <logdir>] [-t <timeout>]\n", argv[0]);
+ exit(0);
+ break;
+ case 'n':
+ nrthreads = strtoul(optarg, NULL, 10);
+ break;
+ case 'l':
+ strncpy(logdir, optarg, LOGDIR_NAME_SIZE);
+ break;
+ case 't':
+ run_time = strtoul(optarg, NULL, 10);
+ break;
+ default:
+ printf("invalid option\n");
+ exit(0);
+ break;
+ }
+ }
+
+ if (nrthreads > MAX_THREADS)
+ nrthreads = MAX_THREADS;
+
+ shmid = shmget(shm_key, page_size, IPC_CREAT|0666);
+ if (shmid < 0) {
+ err_msg("Failed shmget\n");
+ }
+
+ map1 = shmat(shmid, NULL, 0);
+ if (map1 == (void *) -1) {
+ err_msg("Failed shmat");
+ }
+
+ map2 = shmat(shmid, NULL, 0);
+ if (map2 == (void *) -1) {
+ err_msg("Failed shmat");
+ }
+
+ dir_error = mkdir(logdir, 0755);
+
+ if (dir_error) {
+ err_msg("Failed mkdir");
+ }
+
+ printf("start_cpu list:%lu\n", start_cpu);
+ printf("number of worker threads:%lu + 1 snapshot thread\n", nrthreads);
+ printf("Allocated address:0x%016lx + secondary map:0x%016lx\n", (unsigned long)map1, (unsigned long)map2);
+ printf("logdir at : %s\n", logdir);
+ printf("Timeout: %d seconds\n", run_time);
+
+ time(&now);
+ printf("=================================\n");
+ printf(" Starting Test\n");
+ printf(" %s", ctime(&now));
+ printf("=================================\n");
+
+ for (i = 0; i < nrthreads; i++) {
+ if (1 && !fork()) {
+ prctl(PR_SET_PDEATHSIG, SIGKILL);
+ set_mycpu(start_cpu + i);
+ for (;;)
+ sched_yield();
+ exit(0);
+ }
+ }
+
+
+ sa_alrm.sa_handler = &alrm_sighandler;
+ sigemptyset(&sa_alrm.sa_mask);
+ sa_alrm.sa_flags = 0;
+
+ if (sigaction(SIGALRM, &sa_alrm, 0) == -1) {
+ err_msg("Failed signal handler registration\n");
+ }
+
+ alarm(run_time);
+
+ pthread_attr_init(&attr);
+ for (i = 0; i < nrthreads; i++) {
+ rim_thread_ids[i] = i;
+ pthread_create(&rim_threads[i], &attr, rim_fn, &rim_thread_ids[i]);
+ set_pthread_cpu(rim_threads[i], start_cpu + i);
+ }
+
+ pthread_create(&mem_snapshot_thread, &attr, mem_snapshot_fn, map1);
+ set_pthread_cpu(mem_snapshot_thread, start_cpu + i);
+
+
+ pthread_join(mem_snapshot_thread, NULL);
+ for (i = 0; i < nrthreads; i++) {
+ pthread_join(rim_threads[i], NULL);
+ }
+
+ if (!timeout) {
+ time(&now);
+ printf("=================================\n");
+ printf(" Data Corruption Detected\n");
+ printf(" %s", ctime(&now));
+ printf(" See logfiles in %s\n", logdir);
+ printf("=================================\n");
+ return 1;
+ }
+ return 0;
+}
--
2.21.0
This is a backport of "iio: stm32-adc: fix a race with dma and irq" series
on top of v4.14.148 at the time of writing. The original series doesn't
apply cleanly on v4.14.x: the precursor patch needed a slight update.
Original series can be found in [1].
[1] https://www.lkml.org/lkml/2019/9/17/394
Original cover-letter:
This series fixes a race condition observed when using several ADCs with DMA
and irq.
There's a precusor patch to the fix. It keeps registers definitions as a whole
block, to ease readability and allow simple (readl) access path to EOC bits in
stm32-adc-core driver.
Fabrice Gasnier (2):
iio: adc: stm32-adc: move registers definitions
iio: adc: stm32-adc: fix a race when using several adcs with dma and
irq
drivers/iio/adc/stm32-adc-core.c | 70 +++++++++++---------
drivers/iio/adc/stm32-adc-core.h | 135 +++++++++++++++++++++++++++++++++++++++
drivers/iio/adc/stm32-adc.c | 107 -------------------------------
3 files changed, 175 insertions(+), 137 deletions(-)
--
2.7.4
The OMAP36xx and AM/DM37x TRMs say that the maximum divider for DSS fclk
(in CM_CLKSEL_DSS) is 32. Experimentation shows that this is not
correct, and using divider of 32 breaks DSS with a flood or underflows
and sync losts. Dividers up to 31 seem to work fine.
There is another patch to the DT files to limit the divider correctly,
but as the DSS driver also needs to know the maximum divider to be able
to iteratively find good rates, we also need to do the fix in the DSS
driver.
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen(a)ti.com>
Cc: stable(a)vger.kernel.org # linux-4.4.y only
diff --git a/drivers/video/fbdev/omap2/dss/dss.c b/drivers/video/fbdev/omap2/dss/dss.c
index 9200a8668b49..a57c3a5f4bf8 100644
--- a/drivers/video/fbdev/omap2/dss/dss.c
+++ b/drivers/video/fbdev/omap2/dss/dss.c
@@ -843,7 +843,7 @@ static const struct dss_features omap34xx_dss_feats = {
};
static const struct dss_features omap3630_dss_feats = {
- .fck_div_max = 32,
+ .fck_div_max = 31,
.dss_fck_multiplier = 1,
.parent_clk_name = "dpll4_ck",
.dpi_select_source = &dss_dpi_select_source_omap2_omap3,
--
2.17.1
The OMAP36xx and AM/DM37x TRMs say that the maximum divider for DSS fclk
(in CM_CLKSEL_DSS) is 32. Experimentation shows that this is not
correct, and using divider of 32 breaks DSS with a flood or underflows
and sync losts. Dividers up to 31 seem to work fine.
There is another patch to the DT files to limit the divider correctly,
but as the DSS driver also needs to know the maximum divider to be able
to iteratively find good rates, we also need to do the fix in the DSS
driver.
Signed-off-by: Adam Ford <aford173(a)gmail.com>
Cc: Tomi Valkeinen <tomi.valkeinen(a)ti.com>
Cc: stable(a)vger.kernel.org #linux-4.9.y+
diff --git a/drivers/video/fbdev/omap2/omapfb/dss/dss.c b/drivers/video/fbdev/omap2/omapfb/dss/dss.c
index 48c6500c24e1..4429ad37b64c 100644
--- a/drivers/video/fbdev/omap2/omapfb/dss/dss.c
+++ b/drivers/video/fbdev/omap2/omapfb/dss/dss.c
@@ -843,7 +843,7 @@ static const struct dss_features omap34xx_dss_feats = {
};
static const struct dss_features omap3630_dss_feats = {
- .fck_div_max = 32,
+ .fck_div_max = 31,
.dss_fck_multiplier = 1,
.parent_clk_name = "dpll4_ck",
.dpi_select_source = &dss_dpi_select_source_omap2_omap3,
--
2.17.1
The x86 version of get_user_pages_fast() relies on disabled interrupts to
synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
releases the page. As TLB flush is done synchronously via IPI disabling
interrupts blocks the page release, and get_page(), which assumes existing
reference on page, is thus safe.
However when TLB flush is done by a hypercall, e.g. in a Xen PV guest, there is
no blocking thanks to disabled interrupts, and get_page() can succeed on a page
that was already freed or even reused.
We have recently seen this happen with our 4.4 and 4.12 based kernels, with
userspace (java) that exits a thread, where mm_release() performs a futex_wake()
on tsk->clear_child_tid, and another thread in parallel unmaps the page where
tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
immediately releases the page again, while it's already on a freelist. Symptoms
include a bad page state warning, general protection faults acessing a poisoned
list prev/next pointer in the freelist, or free page pcplists of two cpus joined
together in a single list. Oscar has also reproduced this scenario, with a
patch inserting delays before the get_page() to make the race window larger.
Fix this by removing the dependency on TLB flush interrupts the same way as the
generic get_user_pages_fast() code by using page_cache_add_speculative() and
revalidating the PTE contents after pinning the page. Mainline is safe since
4.13 where the x86 gup code was removed in favor of the common code. Accessing
the page table itself safely also relies on disabled interrupts and TLB flush
IPIs that don't happen with hypercalls, which was acknowledged in commit
9e52fc2b50de ("x86/mm: Enable RCU based page table freeing
(CONFIG_HAVE_RCU_TABLE_FREE=y)"). That commit with follups should also be
backported for full safety, although our reproducer didn't hit a problem
without that backport.
Reproduced-by: Oscar Salvador <osalvador(a)suse.de>
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Juergen Gross <jgross(a)suse.com>
Cc: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Vitaly Kuznetsov <vkuznets(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Borislav Petkov <bp(a)alien8.de>
Cc: Dave Hansen <dave.hansen(a)linux.intel.com>
Cc: Andy Lutomirski <luto(a)kernel.org>
---
Hi, I'm sending this stable-only patch for consideration because it's probably
unrealistic to backport the 4.13 switch to generic GUP. I can look at 4.4 and
3.16 if accepted. The RCU page table freeing could be also considered.
Note the patch also includes page refcount protection. I found out that
8fde12ca79af ("mm: prevent get_user_pages() from overflowing page refcount")
backport to 4.9 missed the arch-specific gup implementations:
https://lore.kernel.org/lkml/6650323f-dbc9-f069-000b-f6b0f941a065@suse.cz/
arch/x86/mm/gup.c | 32 ++++++++++++++++++++++++++++++--
1 file changed, 30 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index 1680768d392c..d7db45bdfb3b 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -97,6 +97,20 @@ static inline int pte_allows_gup(unsigned long pteval, int write)
return 1;
}
+/*
+ * Return the compund head page with ref appropriately incremented,
+ * or NULL if that failed.
+ */
+static inline struct page *try_get_compound_head(struct page *page, int refs)
+{
+ struct page *head = compound_head(page);
+ if (WARN_ON_ONCE(page_ref_count(head) < 0))
+ return NULL;
+ if (unlikely(!page_cache_add_speculative(head, refs)))
+ return NULL;
+ return head;
+}
+
/*
* The performance critical leaf functions are made noinline otherwise gcc
* inlines everything into a single function which results in too much
@@ -112,7 +126,7 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
ptep = pte_offset_map(&pmd, addr);
do {
pte_t pte = gup_get_pte(ptep);
- struct page *page;
+ struct page *head, *page;
/* Similar to the PMD case, NUMA hinting must take slow path */
if (pte_protnone(pte)) {
@@ -138,7 +152,21 @@ static noinline int gup_pte_range(pmd_t pmd, unsigned long addr,
}
VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
page = pte_page(pte);
- get_page(page);
+
+ head = try_get_compound_head(page, 1);
+ if (!head) {
+ put_dev_pagemap(pgmap);
+ pte_unmap(ptep);
+ return 0;
+ }
+
+ if (unlikely(pte_val(pte) != pte_val(*ptep))) {
+ put_page(head);
+ put_dev_pagemap(pgmap);
+ pte_unmap(ptep);
+ return 0;
+ }
+
put_dev_pagemap(pgmap);
SetPageReferenced(page);
pages[*nr] = page;
--
2.22.0
Following a try_to_unmap() we may want to remove the userptr and so call
put_pages(). However, try_to_unmap() acquires the page lock and so we
must avoid recursively locking the pages ourselves -- which means that
we cannot safely acquire the lock around set_page_dirty(). Since we
can't be sure of the lock, we have to risk skip dirtying the page, or
else risk calling set_page_dirty() without a lock and so risk fs
corruption.
Reported-by: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Fixes: cb6d7c7dc7ff ("drm/i915/userptr: Acquire the page lock around set_page_dirty()")
Signed-off-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin(a)intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 16 ++++++++++++++--
1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index b9d2bb15e4a6..1ad2047a6dbd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -672,7 +672,7 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
obj->mm.dirty = false;
for_each_sgt_page(page, sgt_iter, pages) {
- if (obj->mm.dirty)
+ if (obj->mm.dirty && trylock_page(page)) {
/*
* As this may not be anonymous memory (e.g. shmem)
* but exist on a real mapping, we have to lock
@@ -680,8 +680,20 @@ i915_gem_userptr_put_pages(struct drm_i915_gem_object *obj,
* the page reference is not sufficient to
* prevent the inode from being truncated.
* Play safe and take the lock.
+ *
+ * However...!
+ *
+ * The mmu-notifier can be invalidated for a
+ * migrate_page, that is alreadying holding the lock
+ * on the page. Such a try_to_unmap() will result
+ * in us calling put_pages() and so recursively try
+ * to lock the page. We avoid that deadlock with
+ * a trylock_page() and in exchange we risk missing
+ * some page dirtying.
*/
- set_page_dirty_lock(page);
+ set_page_dirty(page);
+ unlock_page(page);
+ }
mark_page_accessed(page);
put_page(page);
--
2.22.0
Validate the stack arguments and setup the stack depening on whether or not
it is growing down or up.
Legacy clone() required userspace to know in which direction the stack is
growing and pass down the stack pointer appropriately. To make things more
confusing microblaze uses a variant of the clone() syscall selected by
CONFIG_CLONE_BACKWARDS3 that takes an additional stack_size argument.
IA64 has a separate clone2() syscall which also takes an additional
stack_size argument. Finally, parisc has a stack that is growing upwards.
Userspace therefore has a lot nasty code like the following:
#define __STACK_SIZE (8 * 1024 * 1024)
pid_t sys_clone(int (*fn)(void *), void *arg, int flags, int *pidfd)
{
pid_t ret;
void *stack;
stack = malloc(__STACK_SIZE);
if (!stack)
return -ENOMEM;
#ifdef __ia64__
ret = __clone2(fn, stack, __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#elif defined(__parisc__) /* stack grows up */
ret = clone(fn, stack, flags | SIGCHLD, arg, pidfd);
#else
ret = clone(fn, stack + __STACK_SIZE, flags | SIGCHLD, arg, pidfd);
#endif
return ret;
}
or even crazier variants such as [3].
With clone3() we have the ability to validate the stack. We can check that
when stack_size is passed, the stack pointer is valid and the other way
around. We can also check that the memory area userspace gave us is fine to
use via access_ok(). Furthermore, we probably should not require
userspace to know in which direction the stack is growing. It is easy
for us to do this in the kernel and I couldn't find the original
reasoning behind exposing this detail to userspace.
/* Intentional user visible API change */
clone3() was released with 5.3. Currently, it is not documented and very
unclear to userspace how the stack and stack_size argument have to be
passed. After talking to glibc folks we concluded that trying to change
clone3() to setup the stack instead of requiring userspace to do this is
the right course of action.
Note, that this is an explicit change in user visible behavior we introduce
with this patch. If it breaks someone's use-case we will revert! (And then
e.g. place the new behavior under an appropriate flag.)
Breaking someone's use-case is very unlikely though. First, neither glibc
nor musl currently expose a wrapper for clone3(). Second, there is no real
motivation for anyone to use clone3() directly since it does not provide
features that legacy clone doesn't. New features for clone3() will first
happen in v5.5 which is why v5.4 is still a good time to try and make that
change now and backport it to v5.3. Searches on [4] did not reveal any
packages calling clone3().
[1]: https://lore.kernel.org/r/CAG48ez3q=BeNcuVTKBN79kJui4vC6nw0Bfq6xc-i0neheT17…
[2]: https://lore.kernel.org/r/20191028172143.4vnnjpdljfnexaq5@wittgenstein
[3]: https://github.com/systemd/systemd/blob/5238e9575906297608ff802a27e2ff9effa…
[4]: https://codesearch.debian.net
Fixes: 7f192e3cd316 ("fork: add clone3")
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Florian Weimer <fweimer(a)redhat.com>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: linux-api(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Cc: <stable(a)vger.kernel.org> # 5.3
Cc: GNU C Library <libc-alpha(a)sourceware.org>
Signed-off-by: Christian Brauner <christian.brauner(a)ubuntu.com>
---
include/uapi/linux/sched.h | 4 ++++
kernel/fork.c | 33 ++++++++++++++++++++++++++++++++-
2 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/include/uapi/linux/sched.h b/include/uapi/linux/sched.h
index 99335e1f4a27..25b4fa00bad1 100644
--- a/include/uapi/linux/sched.h
+++ b/include/uapi/linux/sched.h
@@ -51,6 +51,10 @@
* sent when the child exits.
* @stack: Specify the location of the stack for the
* child process.
+ * Note, @stack is expected to point to the
+ * lowest address. The stack direction will be
+ * determined by the kernel and set up
+ * appropriately based on @stack_size.
* @stack_size: The size of the stack for the child process.
* @tls: If CLONE_SETTLS is set, the tls descriptor
* is set to tls.
diff --git a/kernel/fork.c b/kernel/fork.c
index bcdf53125210..55af6931c6ec 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2561,7 +2561,35 @@ noinline static int copy_clone_args_from_user(struct kernel_clone_args *kargs,
return 0;
}
-static bool clone3_args_valid(const struct kernel_clone_args *kargs)
+/**
+ * clone3_stack_valid - check and prepare stack
+ * @kargs: kernel clone args
+ *
+ * Verify that the stack arguments userspace gave us are sane.
+ * In addition, set the stack direction for userspace since it's easy for us to
+ * determine.
+ */
+static inline bool clone3_stack_valid(struct kernel_clone_args *kargs)
+{
+ if (kargs->stack == 0) {
+ if (kargs->stack_size > 0)
+ return false;
+ } else {
+ if (kargs->stack_size == 0)
+ return false;
+
+ if (!access_ok((void __user *)kargs->stack, kargs->stack_size))
+ return false;
+
+#if !defined(CONFIG_STACK_GROWSUP) && !defined(CONFIG_IA64)
+ kargs->stack += kargs->stack_size;
+#endif
+ }
+
+ return true;
+}
+
+static bool clone3_args_valid(struct kernel_clone_args *kargs)
{
/*
* All lower bits of the flag word are taken.
@@ -2581,6 +2609,9 @@ static bool clone3_args_valid(const struct kernel_clone_args *kargs)
kargs->exit_signal)
return false;
+ if (!clone3_stack_valid(kargs))
+ return false;
+
return true;
}
--
2.23.0