[PATCH AUTOSEL 4.14 01/53] gpio: pl061: Move irq_chip definition inside struct pl061

List overview All Threads
Download

newer

older

[PATCH AUTOSEL 4.9 01/36]...

[PATCH AUTOSEL 4.19 01/97]...

Sasha Levin

8 Jan 2019 8 Jan '19

7:31 p.m.

From: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org

[ Upstream commit ed8dce4c6f726b7f3c6bf40859b92a9e32f189c1 ]

Keeping the irq_chip definition static will make it shared with multiple giochips in the system. This practice is considered to be bad and now we will get the below warning from gpiolib core:

"detected irqchip that is shared with multiple gpiochips: please fix the driver."

Hence, move the irq_chip definition from static to `struct pl061` for using a unique irq_chip for each gpiochip.

Signed-off-by: Manivannan Sadhasivam manivannan.sadhasivam@linaro.org Signed-off-by: Linus Walleij linus.walleij@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpio/gpio-pl061.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/drivers/gpio/gpio-pl061.c b/drivers/gpio/gpio-pl061.c index 6aaaab79c205..f6e1e0e306a3 100644 --- a/drivers/gpio/gpio-pl061.c +++ b/drivers/gpio/gpio-pl061.c @@ -54,6 +54,7 @@ struct pl061 {

void __iomem *base; struct gpio_chip gc; + struct irq_chip irq_chip; int parent_irq;

#ifdef CONFIG_PM @@ -281,15 +282,6 @@ static int pl061_irq_set_wake(struct irq_data *d, unsigned int state) return irq_set_irq_wake(pl061->parent_irq, state); }

-static struct irq_chip pl061_irqchip = { - .name = "pl061", - .irq_ack = pl061_irq_ack, - .irq_mask = pl061_irq_mask, - .irq_unmask = pl061_irq_unmask, - .irq_set_type = pl061_irq_type, - .irq_set_wake = pl061_irq_set_wake, -}; - static int pl061_probe(struct amba_device *adev, const struct amba_id *id) { struct device *dev = &adev->dev; @@ -328,6 +320,13 @@ static int pl061_probe(struct amba_device *adev, const struct amba_id *id) /* * irq_chip support */ + pl061->irq_chip.name = dev_name(dev); + pl061->irq_chip.irq_ack = pl061_irq_ack; + pl061->irq_chip.irq_mask = pl061_irq_mask; + pl061->irq_chip.irq_unmask = pl061_irq_unmask; + pl061->irq_chip.irq_set_type = pl061_irq_type; + pl061->irq_chip.irq_set_wake = pl061_irq_set_wake; + writeb(0, pl061->base + GPIOIE); /* disable irqs */ irq = adev->irq[0]; if (irq < 0) { @@ -336,14 +335,14 @@ static int pl061_probe(struct amba_device *adev, const struct amba_id *id) } pl061->parent_irq = irq;

- ret = gpiochip_irqchip_add(&pl061->gc, &pl061_irqchip, + ret = gpiochip_irqchip_add(&pl061->gc, &pl061->irq_chip, 0, handle_bad_irq, IRQ_TYPE_NONE); if (ret) { dev_info(&adev->dev, "could not add irqchip\n"); return ret; } - gpiochip_set_chained_irqchip(&pl061->gc, &pl061_irqchip, + gpiochip_set_chained_irqchip(&pl061->gc, &pl061->irq_chip, irq, pl061_irq_handler);

amba_set_drvdata(adev, pl061);

-- 2.19.1

Show replies by date

Sasha Levin

8 Jan 8 Jan

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 02/53] platform/x86: asus-wmi: Tell the EC the OS will handle the display off hotkey

From: João Paulo Rechi Vita jprvita@gmail.com

[ Upstream commit 78f3ac76d9e5219589718b9e4733bee21627b3f5 ]

In the past, Asus firmwares would change the panel backlight directly through the EC when the display off hotkey (Fn+F7) was pressed, and only notify the OS of such change, with 0x33 when the LCD was ON and 0x34 when the LCD was OFF. These are currently mapped to KEY_DISPLAYTOGGLE and KEY_DISPLAY_OFF, respectively.

Most recently the EC on Asus most machines lost ability to toggle the LCD backlight directly, but unless the OS informs the firmware it is going to handle the display toggle hotkey events, the firmware still tries change the brightness through the EC, to no effect. The end result is a long list (at Endless we counted 11) of Asus laptop models where the display toggle hotkey does not perform any action. Our firmware engineers contacts at Asus were surprised that there were still machines out there with the old behavior.

Calling WMNB(ASUS_WMI_DEVID_BACKLIGHT==0x00050011, 2) on the _WDG device tells the firmware that it should let the OS handle the display toggle event, in which case it will simply notify the OS of a key press with 0x35, as shown by the DSDT excerpts bellow.

Scope (_SB) { (...)

Device (ATKD) { (...)

Name (_WDG, Buffer (0x28) { /* 0000 */ 0xD0, 0x5E, 0x84, 0x97, 0x6D, 0x4E, 0xDE, 0x11, /* 0008 */ 0x8A, 0x39, 0x08, 0x00, 0x20, 0x0C, 0x9A, 0x66, /* 0010 */ 0x4E, 0x42, 0x01, 0x02, 0x35, 0xBB, 0x3C, 0x0B, /* 0018 */ 0xC2, 0xE3, 0xED, 0x45, 0x91, 0xC2, 0x4C, 0x5A, /* 0020 */ 0x6D, 0x19, 0x5D, 0x1C, 0xFF, 0x00, 0x01, 0x08 }) Method (WMNB, 3, Serialized) { CreateDWordField (Arg2, Zero, IIA0) CreateDWordField (Arg2, 0x04, IIA1) Local0 = (Arg1 & 0xFFFFFFFF)

(...)

If ((Local0 == 0x53564544)) { (...)

If ((IIA0 == 0x00050011)) { If ((IIA1 == 0x02)) { ^^PCI0.SBRG.EC0.SPIN (0x72, One) ^^PCI0.SBRG.EC0.BLCT = One }

Return (One) } } (...) } (...) } (...) } (...)

Scope (_SB.PCI0.SBRG.EC0) { (...)

Name (BLCT, Zero)

(...)

Method (_Q10, 0, NotSerialized) // _Qxx: EC Query { If ((BLCT == Zero)) { Local0 = One Local0 = RPIN (0x72) Local0 ^= One SPIN (0x72, Local0) If (ATKP) { Local0 = (0x34 - Local0) ^^^^ATKD.IANE (Local0) } } ElseIf ((BLCT == One)) { If (ATKP) { ^^^^ATKD.IANE (0x35) } } } (...) }

Signed-off-by: João Paulo Rechi Vita jprvita@endlessm.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/platform/x86/asus-wmi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/platform/x86/asus-wmi.c b/drivers/platform/x86/asus-wmi.c index 7440f650e81a..3f662cd774d7 100644 --- a/drivers/platform/x86/asus-wmi.c +++ b/drivers/platform/x86/asus-wmi.c @@ -2147,7 +2147,8 @@ static int asus_wmi_add(struct platform_device *pdev) err = asus_wmi_backlight_init(asus); if (err && err != -ENODEV) goto fail_backlight; - } + } else + err = asus_wmi_set_devstate(ASUS_WMI_DEVID_BACKLIGHT, 2, NULL);

status = wmi_install_notify_handler(asus->driver->event_guid, asus_wmi_notify, asus);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 03/53] e1000e: allow non-monotonic SYSTIM readings

From: Miroslav Lichvar mlichvar@redhat.com

[ Upstream commit e1f65b0d70e9e5c80e15105cd96fa00174d7c436 ]

It seems with some NICs supported by the e1000e driver a SYSTIM reading may occasionally be few microseconds before the previous reading and if enabled also pass e1000e_sanitize_systim() without reaching the maximum number of rereads, even if the function is modified to check three consecutive readings (i.e. it doesn't look like a double read error). This causes an underflow in the timecounter and the PHC time jumps hours ahead.

This was observed on 82574, I217 and I219. The fastest way to reproduce it is to run a program that continuously calls the PTP_SYS_OFFSET ioctl on the PHC.

Modify e1000e_phc_gettime() to use timecounter_cyc2time() instead of timecounter_read() in order to allow non-monotonic SYSTIM readings and prevent the PHC from jumping.

Cc: Richard Cochran richardcochran@gmail.com Signed-off-by: Miroslav Lichvar mlichvar@redhat.com Acked-by: Jacob Keller jacob.e.keller@intel.com Tested-by: Aaron Brown aaron.f.brown@intel.com Signed-off-by: Jeff Kirsher jeffrey.t.kirsher@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/intel/e1000e/ptp.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/e1000e/ptp.c b/drivers/net/ethernet/intel/e1000e/ptp.c index b366885487a8..cd16b70a4e70 100644 --- a/drivers/net/ethernet/intel/e1000e/ptp.c +++ b/drivers/net/ethernet/intel/e1000e/ptp.c @@ -191,10 +191,14 @@ static int e1000e_phc_gettime(struct ptp_clock_info *ptp, struct timespec64 *ts) struct e1000_adapter *adapter = container_of(ptp, struct e1000_adapter, ptp_clock_info); unsigned long flags; - u64 ns; + u64 cycles, ns;

spin_lock_irqsave(&adapter->systim_lock, flags); - ns = timecounter_read(&adapter->tc); + + /* Use timecounter_cyc2time() to allow non-monotonic SYSTIM readings */ + cycles = adapter->cc.read(&adapter->cc); + ns = timecounter_cyc2time(&adapter->tc, cycles); + spin_unlock_irqrestore(&adapter->systim_lock, flags);

*ts = ns_to_timespec64(ns); @@ -250,9 +254,12 @@ static void e1000e_systim_overflow_work(struct work_struct *work) systim_overflow_work.work); struct e1000_hw *hw = &adapter->hw; struct timespec64 ts; + u64 ns;

- adapter->ptp_clock_info.gettime64(&adapter->ptp_clock_info, &ts); + /* Update the timecounter */ + ns = timecounter_read(&adapter->tc);

+ ts = ns_to_timespec64(ns); e_dbg("SYSTIM overflow check at %lld.%09lu\n", (long long) ts.tv_sec, ts.tv_nsec);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 04/53] writeback: don't decrement wb->refcnt if !wb->bdi

From: Anders Roxell anders.roxell@linaro.org

[ Upstream commit 347a28b586802d09604a149c1a1f6de5dccbe6fa ]

This happened while running in qemu-system-aarch64, the AMBA PL011 UART driver when enabling CONFIG_DEBUG_TEST_DRIVER_REMOVE. arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init), devtmpfs' handle_remove() crashes because the reference count is a NULL pointer only because wb->bdi hasn't been initialized yet.

Rework so that wb_put have an extra check if wb->bdi before decrement wb->refcnt and also add a WARN_ON_ONCE to get a warning if it happens again in other drivers.

Fixes: 52ebea749aae ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") Co-developed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Anders Roxell anders.roxell@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/backing-dev-defs.h | 8 ++++++++ 1 file changed, 8 insertions(+)

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 3c1beffc861a..19240379637f 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -233,6 +233,14 @@ static inline void wb_get(struct bdi_writeback *wb) */ static inline void wb_put(struct bdi_writeback *wb) { + if (WARN_ON_ONCE(!wb->bdi)) { + /* + * A driver bug might cause a file to be removed before bdi was + * initialized. + */ + return; + } + if (wb != &wb->bdi->wb) percpu_ref_put(&wb->refcnt); }

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 05/53] serial: set suppress_bind_attrs flag only if builtin

From: Anders Roxell anders.roxell@linaro.org

[ Upstream commit 646097940ad35aa2c1f2012af932d55976a9f255 ]

When the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, arch_initcall(pl011_init) came before subsys_initcall(default_bdi_init). devtmpfs gets killed because we try to remove a file and decrement the wb reference count before the noop_backing_device_info gets initialized.

[ 0.332075] Serial: AMBA PL011 UART driver [ 0.485276] 9000000.pl011: ttyAMA0 at MMIO 0x9000000 (irq = 39, base_baud = 0) is a PL011 rev1 [ 0.502382] console [ttyAMA0] enabled [ 0.515710] Unable to handle kernel paging request at virtual address 0000800074c12000 [ 0.516053] Mem abort info: [ 0.516222] ESR = 0x96000004 [ 0.516417] Exception class = DABT (current EL), IL = 32 bits [ 0.516641] SET = 0, FnV = 0 [ 0.516826] EA = 0, S1PTW = 0 [ 0.516984] Data abort info: [ 0.517149] ISV = 0, ISS = 0x00000004 [ 0.517339] CM = 0, WnR = 0 [ 0.517553] [0000800074c12000] user address but active_mm is swapper [ 0.517928] Internal error: Oops: 96000004 [#1] PREEMPT SMP [ 0.518305] Modules linked in: [ 0.518839] CPU: 0 PID: 13 Comm: kdevtmpfs Not tainted 4.19.0-rc5-next-20180928-00002-g2ba39ab0cd01-dirty #82 [ 0.519307] Hardware name: linux,dummy-virt (DT) [ 0.519681] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 0.519959] pc : __destroy_inode+0x94/0x2a8 [ 0.520212] lr : __destroy_inode+0x78/0x2a8 [ 0.520401] sp : ffff0000098c3b20 [ 0.520590] x29: ffff0000098c3b20 x28: 00000000087a3714 [ 0.520904] x27: 0000000000002000 x26: 0000000000002000 [ 0.521179] x25: ffff000009583000 x24: 0000000000000000 [ 0.521467] x23: ffff80007bb52000 x22: ffff80007bbaa7c0 [ 0.521737] x21: ffff0000093f9338 x20: 0000000000000000 [ 0.522033] x19: ffff80007bbb05d8 x18: 0000000000000400 [ 0.522376] x17: 0000000000000000 x16: 0000000000000000 [ 0.522727] x15: 0000000000000400 x14: 0000000000000400 [ 0.523068] x13: 0000000000000001 x12: 0000000000000001 [ 0.523421] x11: 0000000000000000 x10: 0000000000000970 [ 0.523749] x9 : ffff0000098c3a60 x8 : ffff80007bbab190 [ 0.524017] x7 : ffff80007bbaa880 x6 : 0000000000000c88 [ 0.524305] x5 : ffff0000093d96c8 x4 : 61c8864680b583eb [ 0.524567] x3 : ffff0000093d6180 x2 : ffffffffffffffff [ 0.524872] x1 : 0000800074c12000 x0 : 0000800074c12000 [ 0.525207] Process kdevtmpfs (pid: 13, stack limit = 0x(____ptrval____)) [ 0.525529] Call trace: [ 0.525806] __destroy_inode+0x94/0x2a8 [ 0.526108] destroy_inode+0x34/0x88 [ 0.526370] evict+0x144/0x1c8 [ 0.526636] iput+0x184/0x230 [ 0.526871] dentry_unlink_inode+0x118/0x130 [ 0.527152] d_delete+0xd8/0xe0 [ 0.527420] vfs_unlink+0x240/0x270 [ 0.527665] handle_remove+0x1d8/0x330 [ 0.527875] devtmpfsd+0x138/0x1c8 [ 0.528085] kthread+0x14c/0x158 [ 0.528291] ret_from_fork+0x10/0x18 [ 0.528720] Code: 92800002 aa1403e0 d538d081 8b010000 (c85f7c04) [ 0.529367] ---[ end trace 5a3dee47727f877c ]---

Rework to set suppress_bind_attrs flag to avoid removing the device when CONFIG_DEBUG_TEST_DRIVER_REMOVE=y. This applies for pic32_uart and xilinx_uartps as well.

Co-developed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Anders Roxell anders.roxell@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/amba-pl011.c | 2 ++ drivers/tty/serial/pic32_uart.c | 1 + drivers/tty/serial/xilinx_uartps.c | 1 + 3 files changed, 4 insertions(+)

diff --git a/drivers/tty/serial/amba-pl011.c b/drivers/tty/serial/amba-pl011.c index c9f701aca677..4a4a9f33715c 100644 --- a/drivers/tty/serial/amba-pl011.c +++ b/drivers/tty/serial/amba-pl011.c @@ -2800,6 +2800,7 @@ static struct platform_driver arm_sbsa_uart_platform_driver = { .name = "sbsa-uart", .of_match_table = of_match_ptr(sbsa_uart_of_match), .acpi_match_table = ACPI_PTR(sbsa_uart_acpi_match), + .suppress_bind_attrs = IS_BUILTIN(CONFIG_SERIAL_AMBA_PL011), }, };

@@ -2828,6 +2829,7 @@ static struct amba_driver pl011_driver = { .drv = { .name = "uart-pl011", .pm = &pl011_dev_pm_ops, + .suppress_bind_attrs = IS_BUILTIN(CONFIG_SERIAL_AMBA_PL011), }, .id_table = pl011_ids, .probe = pl011_probe, diff --git a/drivers/tty/serial/pic32_uart.c b/drivers/tty/serial/pic32_uart.c index 00a33eb859d3..e3d7d9d6c599 100644 --- a/drivers/tty/serial/pic32_uart.c +++ b/drivers/tty/serial/pic32_uart.c @@ -920,6 +920,7 @@ static struct platform_driver pic32_uart_platform_driver = { .driver = { .name = PIC32_DEV_NAME, .of_match_table = of_match_ptr(pic32_serial_dt_ids), + .suppress_bind_attrs = IS_BUILTIN(CONFIG_SERIAL_PIC32), }, };

diff --git a/drivers/tty/serial/xilinx_uartps.c b/drivers/tty/serial/xilinx_uartps.c index 21c35ad72b99..772d07f85b3d 100644 --- a/drivers/tty/serial/xilinx_uartps.c +++ b/drivers/tty/serial/xilinx_uartps.c @@ -1644,6 +1644,7 @@ static struct platform_driver cdns_uart_platform_driver = { .name = CDNS_UART_NAME, .of_match_table = cdns_uart_of_match, .pm = &cdns_uart_dev_pm_ops, + .suppress_bind_attrs = IS_BUILTIN(CONFIG_SERIAL_XILINX_PS_UART), }, };

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 06/53] ALSA: oxfw: add support for APOGEE duet FireWire

From: Takashi Sakamoto o-takashi@sakamocchi.jp

[ Upstream commit fba43f454cdf9caa3185219d116bd2a6e6354552 ]

This commit adds support for APOGEE duet FireWire, launched 2007, already discontinued. This model uses Oxford Semiconductor FW971 as its communication engine. Below is information on Configuration ROM of this unit. The unit supports some AV/C commands defined by Audio subunit specification and vendor dependent commands.

$ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': False, 'chip_ID': 42949742248, 'cmc': False, 'cyc_clk_acc': 255, 'generation': 0, 'imc': False, 'isc': True, 'link_spd': 3, 'max_ROM': 0, 'max_rec': 64, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 122333], ['DESCRIPTOR', 'Duet'], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 122333], ['DESCRIPTOR', 'Duet']]]]}

Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/firewire/Kconfig | 1 + sound/firewire/oxfw/oxfw.c | 8 ++++++++ 2 files changed, 9 insertions(+)

diff --git a/sound/firewire/Kconfig b/sound/firewire/Kconfig index 529d9f405fa9..0cb65d0864cc 100644 --- a/sound/firewire/Kconfig +++ b/sound/firewire/Kconfig @@ -41,6 +41,7 @@ config SND_OXFW * Mackie(Loud) U.420/U.420d * TASCAM FireOne * Stanton Controllers & Systems 1 Deck/Mixer + * APOGEE duet FireWire

To compile this driver as a module, choose M here: the module will be called snd-oxfw. diff --git a/sound/firewire/oxfw/oxfw.c b/sound/firewire/oxfw/oxfw.c index a315d5b6b86b..1554dc98e092 100644 --- a/sound/firewire/oxfw/oxfw.c +++ b/sound/firewire/oxfw/oxfw.c @@ -20,6 +20,7 @@ #define VENDOR_LACIE 0x00d04b #define VENDOR_TASCAM 0x00022e #define OUI_STANTON 0x001260 +#define OUI_APOGEE 0x0003db

#define MODEL_SATELLITE 0x00200f

@@ -442,6 +443,13 @@ static const struct ieee1394_device_id oxfw_id_table[] = { .vendor_id = OUI_STANTON, .model_id = 0x002000, }, + // APOGEE, duet FireWire + { + .match_flags = IEEE1394_MATCH_VENDOR_ID | + IEEE1394_MATCH_MODEL_ID, + .vendor_id = OUI_APOGEE, + .model_id = 0x01dddd, + }, { } }; MODULE_DEVICE_TABLE(ieee1394, oxfw_id_table);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 07/53] x86/mce: Fix -Wmissing-prototypes warnings

From: Borislav Petkov bp@suse.de

[ Upstream commit 68b5e4326e4b8ac9080835005d8254fed0fb3c56 ]

Add the proper includes and make smca_get_name() static.

Fix an actual bug too which the warning triggered:

arch/x86/kernel/cpu/mcheck/therm_throt.c:395:39: error: conflicting \ types for ‘smp_thermal_interrupt’ asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *r) ^~~~~~~~~~~~~~~~~~~~~ In file included from arch/x86/kernel/cpu/mcheck/therm_throt.c:29: ./arch/x86/include/asm/traps.h:107:17: note: previous declaration of \ ‘smp_thermal_interrupt’ was here asmlinkage void smp_thermal_interrupt(void);

Signed-off-by: Borislav Petkov bp@suse.de Cc: Yi Wang wang.yi59@zte.com.cn Cc: Michael Matz matz@suse.de Cc: x86@kernel.org Link: https://lkml.kernel.org/r/alpine.DEB.2.21.1811081633160.1549@nanos.tec.linut... Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/include/asm/traps.h | 6 +++--- arch/x86/kernel/cpu/mcheck/mce_amd.c | 5 +++-- arch/x86/kernel/cpu/mcheck/therm_throt.c | 3 ++- arch/x86/kernel/cpu/mcheck/threshold.c | 3 ++- 4 files changed, 10 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/traps.h b/arch/x86/include/asm/traps.h index 3de69330e6c5..afbc87206886 100644 --- a/arch/x86/include/asm/traps.h +++ b/arch/x86/include/asm/traps.h @@ -104,9 +104,9 @@ extern int panic_on_unrecovered_nmi;

void math_emulate(struct math_emu_info *); #ifndef CONFIG_X86_32 -asmlinkage void smp_thermal_interrupt(void); -asmlinkage void smp_threshold_interrupt(void); -asmlinkage void smp_deferred_error_interrupt(void); +asmlinkage void smp_thermal_interrupt(struct pt_regs *regs); +asmlinkage void smp_threshold_interrupt(struct pt_regs *regs); +asmlinkage void smp_deferred_error_interrupt(struct pt_regs *regs); #endif

extern void ist_enter(struct pt_regs *regs); diff --git a/arch/x86/kernel/cpu/mcheck/mce_amd.c b/arch/x86/kernel/cpu/mcheck/mce_amd.c index beec0daecbc5..4fa97a44e73f 100644 --- a/arch/x86/kernel/cpu/mcheck/mce_amd.c +++ b/arch/x86/kernel/cpu/mcheck/mce_amd.c @@ -23,6 +23,7 @@ #include <linux/string.h>

#include <asm/amd_nb.h> +#include <asm/traps.h> #include <asm/apic.h> #include <asm/mce.h> #include <asm/msr.h> @@ -99,7 +100,7 @@ static u32 smca_bank_addrs[MAX_NR_BANKS][NR_BLOCKS] __ro_after_init = [0 ... MAX_NR_BANKS - 1] = { [0 ... NR_BLOCKS - 1] = -1 } };

-const char *smca_get_name(enum smca_bank_types t) +static const char *smca_get_name(enum smca_bank_types t) { if (t >= N_SMCA_BANK_TYPES) return NULL; @@ -823,7 +824,7 @@ static void __log_error(unsigned int bank, u64 status, u64 addr, u64 misc) mce_log(&m); }

-asmlinkage __visible void __irq_entry smp_deferred_error_interrupt(void) +asmlinkage __visible void __irq_entry smp_deferred_error_interrupt(struct pt_regs *regs) { entering_irq(); trace_deferred_error_apic_entry(DEFERRED_ERROR_VECTOR); diff --git a/arch/x86/kernel/cpu/mcheck/therm_throt.c b/arch/x86/kernel/cpu/mcheck/therm_throt.c index 2da67b70ba98..ee229ceee745 100644 --- a/arch/x86/kernel/cpu/mcheck/therm_throt.c +++ b/arch/x86/kernel/cpu/mcheck/therm_throt.c @@ -25,6 +25,7 @@ #include <linux/cpu.h>

#include <asm/processor.h> +#include <asm/traps.h> #include <asm/apic.h> #include <asm/mce.h> #include <asm/msr.h> @@ -390,7 +391,7 @@ static void unexpected_thermal_interrupt(void)

static void (*smp_thermal_vector)(void) = unexpected_thermal_interrupt;

-asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *r) +asmlinkage __visible void __irq_entry smp_thermal_interrupt(struct pt_regs *regs) { entering_irq(); trace_thermal_apic_entry(THERMAL_APIC_VECTOR); diff --git a/arch/x86/kernel/cpu/mcheck/threshold.c b/arch/x86/kernel/cpu/mcheck/threshold.c index 2b584b319eff..c21e0a1efd0f 100644 --- a/arch/x86/kernel/cpu/mcheck/threshold.c +++ b/arch/x86/kernel/cpu/mcheck/threshold.c @@ -6,6 +6,7 @@ #include <linux/kernel.h>

#include <asm/irq_vectors.h> +#include <asm/traps.h> #include <asm/apic.h> #include <asm/mce.h> #include <asm/trace/irq_vectors.h> @@ -18,7 +19,7 @@ static void default_threshold_interrupt(void)

void (*mce_threshold_vector)(void) = default_threshold_interrupt;

-asmlinkage __visible void __irq_entry smp_threshold_interrupt(void) +asmlinkage __visible void __irq_entry smp_threshold_interrupt(struct pt_regs *regs) { entering_irq(); trace_threshold_apic_entry(THRESHOLD_APIC_VECTOR);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 08/53] MIPS: SiByte: Enable swiotlb for SWARM, LittleSur and BigSur

From: "Maciej W. Rozycki" macro@linux-mips.org

[ Upstream commit e4849aff1e169b86c561738daf8ff020e9de1011 ]

The Broadcom SiByte BCM1250, BCM1125, and BCM1125H SOCs have an onchip DRAM controller that supports memory amounts of up to 16GiB, and due to how the address decoder has been wired in the SOC any memory beyond 1GiB is actually mapped starting from 4GiB physical up, that is beyond the 32-bit addressable limit[1]. Consequently if the maximum amount of memory has been installed, then it will span up to 19GiB.

Many of the evaluation boards we support that are based on one of these SOCs have their memory soldered and the amount present fits in the 32-bit address range. The BCM91250A SWARM board however has actual DIMM slots and accepts, depending on the peripherals revision of the SOC, up to 4GiB or 8GiB of memory in commercially available JEDEC modules[2]. I believe this is also the case with the BCM91250C2 LittleSur board. This means that up to either 3GiB or 7GiB of memory requires 64-bit addressing to access.

I believe the BCM91480B BigSur board, which has the BCM1480 SOC instead, accepts at least as much memory, although I have no documentation or actual hardware available to verify that.

Both systems have PCI slots installed for use by any PCI option boards, including ones that only support 32-bit addressing (additionally the 32-bit PCI host bridge of the BCM1250, BCM1125, and BCM1125H SOCs limits addressing to 32-bits), and there is no IOMMU available. Therefore for PCI DMA to work in the presence of memory beyond enable swiotlb for the affected systems.

All the other SOC onchip DMA devices use 40-bit addressing and therefore can address the whole memory, so only enable swiotlb if PCI support and support for DMA beyond 4GiB have been both enabled in the configuration of the kernel.

This shows up as follows:

Broadcom SiByte BCM1250 B2 @ 800 MHz (SB1 rev 2) Board type: SiByte BCM91250A (SWARM) Determined physical RAM map: memory: 000000000fe7fe00 @ 0000000000000000 (usable) memory: 000000001ffffe00 @ 0000000080000000 (usable) memory: 000000000ffffe00 @ 00000000c0000000 (usable) memory: 0000000087fffe00 @ 0000000100000000 (usable) software IO TLB: mapped [mem 0xcbffc000-0xcfffc000] (64MB)

in the bootstrap log and removes failures like these:

defxx 0000:02:00.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi0: Receive buffer allocation failed fddi0: Adapter open failed! IP-Config: Failed to open fddi0 defxx 0000:09:08.0: dma_direct_map_page: overflow 0x0000000185bc6080+4608 of device mask ffffffff bus mask 0 fddi1: Receive buffer allocation failed fddi1: Adapter open failed! IP-Config: Failed to open fddi1

when memory beyond 4GiB is handed out to devices that can only do 32-bit addressing.

This updates commit cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.").

References:

[1] "BCM1250/BCM1125/BCM1125H User Manual", Revision 1250_1125-UM100-R, Broadcom Corporation, 21 Oct 2002, Section 3: "System Overview", "Memory Map", pp. 34-38

[2] "BCM91250A User Manual", Revision 91250A-UM100-R, Broadcom Corporation, 18 May 2004, Section 3: "Physical Description", "Supported DRAM", p. 23

Signed-off-by: Maciej W. Rozycki macro@linux-mips.org [paul.burton@mips.com: Remove GPL text from dma.c; SPDX tag covers it] Signed-off-by: Paul Burton paul.burton@mips.com Reviewed-by: Christoph Hellwig hch@lst.de Patchwork: https://patchwork.linux-mips.org/patch/21108/ References: cce335ae47e2 ("[MIPS] 64-bit Sibyte kernels need DMA32.") Cc: Ralf Baechle ralf@linux-mips.org Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- arch/mips/Kconfig | 3 +++ arch/mips/sibyte/common/Makefile | 1 + arch/mips/sibyte/common/dma.c | 14 ++++++++++++++ 3 files changed, 18 insertions(+) create mode 100644 arch/mips/sibyte/common/dma.c

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index 23e3d3e0ee5b..4fbf8082c8f2 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -800,6 +800,7 @@ config SIBYTE_SWARM select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI

config SIBYTE_LITTLESUR bool "Sibyte BCM91250C2-LittleSur" @@ -822,6 +823,7 @@ config SIBYTE_SENTOSA select SYS_HAS_CPU_SB1 select SYS_SUPPORTS_BIG_ENDIAN select SYS_SUPPORTS_LITTLE_ENDIAN + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI

config SIBYTE_BIGSUR bool "Sibyte BCM91480B-BigSur" @@ -835,6 +837,7 @@ config SIBYTE_BIGSUR select SYS_SUPPORTS_HIGHMEM select SYS_SUPPORTS_LITTLE_ENDIAN select ZONE_DMA32 if 64BIT + select SWIOTLB if ARCH_DMA_ADDR_T_64BIT && PCI

config SNI_RM bool "SNI RM200/300/400" diff --git a/arch/mips/sibyte/common/Makefile b/arch/mips/sibyte/common/Makefile index b3d6bf23a662..3ef3fb658136 100644 --- a/arch/mips/sibyte/common/Makefile +++ b/arch/mips/sibyte/common/Makefile @@ -1,4 +1,5 @@ obj-y := cfe.o +obj-$(CONFIG_SWIOTLB) += dma.o obj-$(CONFIG_SIBYTE_BUS_WATCHER) += bus_watcher.o obj-$(CONFIG_SIBYTE_CFE_CONSOLE) += cfe_console.o obj-$(CONFIG_SIBYTE_TBPROF) += sb_tbprof.o diff --git a/arch/mips/sibyte/common/dma.c b/arch/mips/sibyte/common/dma.c new file mode 100644 index 000000000000..eb47a94f3583 --- /dev/null +++ b/arch/mips/sibyte/common/dma.c @@ -0,0 +1,14 @@ +// SPDX-License-Identifier: GPL-2.0+ +/* + * DMA support for Broadcom SiByte platforms. + * + * Copyright (c) 2018 Maciej W. Rozycki + */ + +#include <linux/swiotlb.h> +#include <asm/bootinfo.h> + +void __init plat_swiotlb_setup(void) +{ + swiotlb_init(1); +}

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 09/53] arm64: perf: set suppress_bind_attrs flag to true

From: Anders Roxell anders.roxell@linaro.org

[ Upstream commit 81e9fa8bab381f8b6eb04df7cdf0f71994099bd4 ]

The armv8_pmuv3 driver doesn't have a remove function, and when the test 'CONFIG_DEBUG_TEST_DRIVER_REMOVE=y' is enabled, the following Call trace can be seen.

[ 1.424287] Failed to register pmu: armv8_pmuv3, reason -17 [ 1.424870] WARNING: CPU: 0 PID: 1 at ../kernel/events/core.c:11771 perf_event_sysfs_init+0x98/0xdc [ 1.425220] Modules linked in: [ 1.425531] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G W 4.19.0-rc7-next-20181012-00003-ge7a97b1ad77b-dirty #35 [ 1.425951] Hardware name: linux,dummy-virt (DT) [ 1.426212] pstate: 80000005 (Nzcv daif -PAN -UAO) [ 1.426458] pc : perf_event_sysfs_init+0x98/0xdc [ 1.426720] lr : perf_event_sysfs_init+0x98/0xdc [ 1.426908] sp : ffff00000804bd50 [ 1.427077] x29: ffff00000804bd50 x28: ffff00000934e078 [ 1.427429] x27: ffff000009546000 x26: 0000000000000007 [ 1.427757] x25: ffff000009280710 x24: 00000000ffffffef [ 1.428086] x23: ffff000009408000 x22: 0000000000000000 [ 1.428415] x21: ffff000009136008 x20: ffff000009408730 [ 1.428744] x19: ffff80007b20b400 x18: 000000000000000a [ 1.429075] x17: 0000000000000000 x16: 0000000000000000 [ 1.429418] x15: 0000000000000400 x14: 2e79726f74636572 [ 1.429748] x13: 696420656d617320 x12: 656874206e692065 [ 1.430060] x11: 6d616e20656d6173 x10: 2065687420687469 [ 1.430335] x9 : ffff00000804bd50 x8 : 206e6f7361657220 [ 1.430610] x7 : 2c3376756d705f38 x6 : ffff00000954d7ce [ 1.430880] x5 : 0000000000000000 x4 : 0000000000000000 [ 1.431226] x3 : 0000000000000000 x2 : ffffffffffffffff [ 1.431554] x1 : 4d151327adc50b00 x0 : 0000000000000000 [ 1.431868] Call trace: [ 1.432102] perf_event_sysfs_init+0x98/0xdc [ 1.432382] do_one_initcall+0x6c/0x1a8 [ 1.432637] kernel_init_freeable+0x1bc/0x280 [ 1.432905] kernel_init+0x18/0x160 [ 1.433115] ret_from_fork+0x10/0x18 [ 1.433297] ---[ end trace 27fd415390eb9883 ]---

Rework to set suppress_bind_attrs flag to avoid removing the device when CONFIG_DEBUG_TEST_DRIVER_REMOVE=y, since there's no real reason to remove the armv8_pmuv3 driver.

Cc: Arnd Bergmann arnd@arndb.de Co-developed-by: Arnd Bergmann arnd@arndb.de Signed-off-by: Anders Roxell anders.roxell@linaro.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/kernel/perf_event.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 86249a24592d..05fdae70e9f6 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -1130,6 +1130,7 @@ static struct platform_driver armv8_pmu_driver = { .driver = { .name = ARMV8_PMU_PDEV_NAME, .of_match_table = armv8_pmu_of_device_ids, + .suppress_bind_attrs = true, }, .probe = armv8_pmu_device_probe, };

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 10/53] USB: serial: ftdi_sio: use rounding when calculating baud rate divisors

From: Nikolaj Fogh nikolajfogh@gmail.com

[ Upstream commit 6abd837104a3a8e1cda64fc4d7675f6c3ece9d8b ]

Improve baud-rate generation by using rounding-to-closest instead of truncation in divisor calculation.

Results have been verified by logic analyzer on an FT232RT (232BM) chip. The following table shows the wanted baud rate, the baud rate obtained with the old method (truncation), with the new method (rounding) and the baud rate generated by the windows 10 driver. The numbers in parentheses is the error.

+- Wanted --+------ Old -------+------ New -------+------ Win -------+ | 9600 | 9600 (0.00%) | 9604 (0.05%) | 9605 (0.05%) | | 19200 | 19200 (0.00%) | 19199 (0.01%) | 19198 (0.01%) | | 38400 | 38395 (0.01%) | 38431 (0.08%) | 38394 (0.02%) | | 57600 | 57725 (0.22%) | 57540 (0.10%) | 57673 (0.13%) | | 115200 | 115307 (0.09%) | 115330 (0.11%) | 115320 (0.10%) | | 921600 | 919963 (0.18%) | 920386 (0.13%) | 920810 (0.09%) | | 961200 | 996512 (3.67%) | 956480 (0.49%) | 956937 (0.44%) | +-----------+------------------+------------------+------------------+

The error due to noise in the measurements is in the order of a few tenths of a %. As can be seen, the baud rate is significantly improved for some rates (e.g. 961200), and corresponds to the output given by the windows driver.

The theoretical baud rate has been calculated for all baud rates from 1 to 3M, and as expected, the error is centered around 0, with a triangle shape instead of a sawtooth, so the maximum error is decreased to half.

Signed-off-by: Nikolaj Fogh nikolajfogh@gmail.com [ johan: edit commit message slightly ] Signed-off-by: Johan Hovold johan@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/serial/ftdi_sio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c index 385f2ae3be24..a5e716577016 100644 --- a/drivers/usb/serial/ftdi_sio.c +++ b/drivers/usb/serial/ftdi_sio.c @@ -1119,7 +1119,7 @@ static unsigned short int ftdi_232am_baud_base_to_divisor(int baud, int base) { unsigned short int divisor; /* divisor shifted 3 bits to the left */ - int divisor3 = base / 2 / baud; + int divisor3 = DIV_ROUND_CLOSEST(base, 2 * baud); if ((divisor3 & 0x7) == 7) divisor3++; /* round x.7/8 up to x+1 */ divisor = divisor3 >> 3; @@ -1145,7 +1145,7 @@ static __u32 ftdi_232bm_baud_base_to_divisor(int baud, int base) static const unsigned char divfrac[8] = { 0, 3, 2, 4, 1, 5, 6, 7 }; __u32 divisor; /* divisor shifted 3 bits to the left */ - int divisor3 = base / 2 / baud; + int divisor3 = DIV_ROUND_CLOSEST(base, 2 * baud); divisor = divisor3 >> 3; divisor |= (__u32)divfrac[divisor3 & 0x7] << 14; /* Deal with special cases for highest baud rates. */ @@ -1168,7 +1168,7 @@ static __u32 ftdi_2232h_baud_base_to_divisor(int baud, int base) int divisor3;

/* hi-speed baud rate is 10-bit sampling instead of 16-bit */ - divisor3 = base * 8 / (baud * 10); + divisor3 = DIV_ROUND_CLOSEST(8 * base, 10 * baud);

divisor = divisor3 >> 3; divisor |= (__u32)divfrac[divisor3 & 0x7] << 14;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 11/53] usb: gadget: udc: renesas_usb3: add a safety connection way for forced_b_device

From: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com

[ Upstream commit ceb94bc52c437463f0903e61060a94a2226fb672 ]

This patch adds a safety connection way for "forced_b_device" with "workaround_for_vbus" like below:

< Example for R-Car E3 Ebisu > # modprobe <any usb gadget driver> # echo 1 > /sys/kernel/debug/ee020000.usb/b_device (connect a usb cable to host side.) # echo 2 > /sys/kernel/debug/ee020000.usb/b_device

Previous code should have connected a usb cable before the "b_device" is set to 1 on the Ebisu board. However, if xHCI driver on the board is probed, it causes some troubles: - Conflicts USB VBUS/signals between the board and another host. - "Cannot enable. Maybe the USB cable is bad?" might happen on both the board and another host with a usb hub. - Cannot enumerate a usb gadget correctly because an interruption of VBUS change happens unexpectedly.

Reported-by: Kazuya Mizuguchi kazuya.mizuguchi.ks@renesas.com Signed-off-by: Yoshihiro Shimoda yoshihiro.shimoda.uh@renesas.com Signed-off-by: Felipe Balbi felipe.balbi@linux.intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/gadget/udc/renesas_usb3.c | 15 ++++++++++----- 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/udc/renesas_usb3.c b/drivers/usb/gadget/udc/renesas_usb3.c index ade0723787e5..e5355ede2c46 100644 --- a/drivers/usb/gadget/udc/renesas_usb3.c +++ b/drivers/usb/gadget/udc/renesas_usb3.c @@ -352,6 +352,7 @@ struct renesas_usb3 { bool extcon_host; /* check id and set EXTCON_USB_HOST */ bool extcon_usb; /* check vbus and set EXTCON_USB */ bool forced_b_device; + bool start_to_connect; };

#define gadget_to_renesas_usb3(_gadget) \ @@ -470,7 +471,8 @@ static void usb3_init_axi_bridge(struct renesas_usb3 *usb3) static void usb3_init_epc_registers(struct renesas_usb3 *usb3) { usb3_write(usb3, ~0, USB3_USB_INT_STA_1); - usb3_enable_irq_1(usb3, USB_INT_1_VBUS_CNG); + if (!usb3->workaround_for_vbus) + usb3_enable_irq_1(usb3, USB_INT_1_VBUS_CNG); }

static bool usb3_wakeup_usb2_phy(struct renesas_usb3 *usb3) @@ -676,8 +678,7 @@ static void usb3_mode_config(struct renesas_usb3 *usb3, bool host, bool a_dev) usb3_set_mode(usb3, host); usb3_vbus_out(usb3, a_dev); /* for A-Peripheral or forced B-device mode */ - if ((!host && a_dev) || - (usb3->workaround_for_vbus && usb3->forced_b_device)) + if ((!host && a_dev) || usb3->start_to_connect) usb3_connect(usb3); spin_unlock_irqrestore(&usb3->lock, flags); } @@ -2369,7 +2370,11 @@ static ssize_t renesas_usb3_b_device_write(struct file *file, if (copy_from_user(&buf, ubuf, min_t(size_t, sizeof(buf) - 1, count))) return -EFAULT;

- if (!strncmp(buf, "1", 1)) + usb3->start_to_connect = false; + if (usb3->workaround_for_vbus && usb3->forced_b_device && + !strncmp(buf, "2", 1)) + usb3->start_to_connect = true; + else if (!strncmp(buf, "1", 1)) usb3->forced_b_device = true; else usb3->forced_b_device = false; @@ -2377,7 +2382,7 @@ static ssize_t renesas_usb3_b_device_write(struct file *file, if (usb3->workaround_for_vbus) usb3_disconnect(usb3);

- /* Let this driver call usb3_connect() anyway */ + /* Let this driver call usb3_connect() if needed */ usb3_check_id(usb3);

return count;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 12/53] selinux: always allow mounting submounts

From: Ondrej Mosnacek omosnace@redhat.com

[ Upstream commit 2cbdcb882f97a45f7475c67ac6257bbc16277dfe ]

If a superblock has the MS_SUBMOUNT flag set, we should always allow mounting it. These mounts are done automatically by the kernel either as part of mounting some parent mount (e.g. debugfs always mounts tracefs under "tracing" for compatibility) or they are mounted automatically as needed on subdirectory accesses (e.g. NFS crossmnt mounts). Since such automounts are either an implicit consequence of the parent mount (which is already checked) or they can happen during regular accesses (where it doesn't make sense to check against the current task's context), the mount permission check should be skipped for them.

Without this patch, attempts to access contents of an automounted directory can cause unexpected SELinux denials.

In the current kernel tree, the MS_SUBMOUNT flag is set only via vfs_submount(), which is called only from the following places: - AFS, when automounting special "symlinks" referencing other cells - CIFS, when automounting "referrals" - NFS, when automounting subtrees - debugfs, when automounting tracefs

In all cases the submounts are meant to be transparent to the user and it makes sense that if mounting the master is allowed, then so should be the automounts. Note that CAP_SYS_ADMIN capability checking is already skipped for (SB_KERNMOUNT|SB_SUBMOUNT) in: - sget_userns() in fs/super.c: if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !(type->fs_flags & FS_USERNS_MOUNT) && !capable(CAP_SYS_ADMIN)) return ERR_PTR(-EPERM); - sget() in fs/super.c: /* Ensure the requestor has permissions over the target filesystem */ if (!(flags & (SB_KERNMOUNT|SB_SUBMOUNT)) && !ns_capable(user_ns, CAP_SYS_ADMIN)) return ERR_PTR(-EPERM);

Verified internally on patched RHEL 7.6 with a reproducer using NFS+httpd and selinux-tesuite.

Fixes: 93faccbbfa95 ("fs: Better permission checking for submounts") Signed-off-by: Ondrej Mosnacek omosnace@redhat.com Signed-off-by: Paul Moore paul@paul-moore.com Signed-off-by: Sasha Levin sashal@kernel.org --- security/selinux/hooks.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index f5d304736852..d6b9ed34ceae 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -2820,7 +2820,7 @@ static int selinux_sb_kern_mount(struct super_block *sb, int flags, void *data) return rc;

/* Allow all mounts performed by the kernel */ - if (flags & MS_KERNMOUNT) + if (flags & (MS_KERNMOUNT | MS_SUBMOUNT)) return 0;

ad.type = LSM_AUDIT_DATA_DENTRY;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 13/53] drm/amdgpu: Correct get_crtc_scanoutpos behavior when vpos >= vtotal

From: Nicholas Kazlauskas nicholas.kazlauskas@amd.com

[ Upstream commit 520f08df45fbe300ed650da786a74093d658b7e1 ]

When variable refresh rate is active the hardware counter can return a position >= vtotal. This results in a vpos being returned from amdgpu_display_get_crtc_scanoutpos that's a positive value. The positive value indicates to the caller that the display is currently in scanout when the display is actually still in vblank.

This is because the vfront porch duration is unknown with variable refresh active and will end when either a page flip occurs or the timeout specified by the driver/display is reached.

The behavior of the amdgpu_display_get_crtc_scanoutpos remains the same when the position is below vtotal. When the position is above vtotal the function will return a value that is effectively -vbl_end, the size of the vback porch.

The only caller affected by this change is the DRM helper for calculating vblank timestamps. This change corrects behavior for calculating the page flip timestamp from being the previous timestamp to the calculation to the next timestamp when position >= vtotal.

Signed-off-by: Nicholas Kazlauskas nicholas.kazlauskas@amd.com Reviewed-by: Harry Wentland harry.wentland@amd.com Signed-off-by: Alex Deucher alexander.deucher@amd.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c index 6ad243293a78..8c0a208e43c5 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c @@ -855,7 +855,12 @@ int amdgpu_get_crtc_scanoutpos(struct drm_device *dev, unsigned int pipe, /* Inside "upper part" of vblank area? Apply corrective offset if so: */ if (in_vbl && (*vpos >= vbl_start)) { vtotal = mode->crtc_vtotal; - *vpos = *vpos - vtotal; + + /* With variable refresh rate displays the vpos can exceed + * the vtotal value. Clamp to 0 to return -vbl_end instead + * of guessing the remaining number of lines until scanout. + */ + *vpos = (*vpos < vtotal) ? (*vpos - vtotal) : 0; }

/* Correct for shifted end of vbl at vbl_end. */

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 14/53] rxe: IB_WR_REG_MR does not capture MR's iova field

From: Chuck Lever chuck.lever@oracle.com

[ Upstream commit b024dd0eba6e6d568f69d63c5e3153aba94c23e3 ]

FRWR memory registration is done with a series of calls and WRs. 1. ULP invokes ib_dma_map_sg() 2. ULP invokes ib_map_mr_sg() 3. ULP posts an IB_WR_REG_MR on the Send queue

Step 2 generates an iova. It is permissible for ULPs to change this iova (with certain restrictions) between steps 2 and 3.

rxe_map_mr_sg captures the MR's iova but later when rxe processes the REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova after step 2 but before step 3, rxe never captures that change.

When the remote sends an RDMA Read targeting that MR, rxe looks up the R_key, but the altered iova does not match the iova stored in the MR, causing the RDMA Read request to fail.

Reported-by: Anna Schumaker schumaker.anna@gmail.com Signed-off-by: Chuck Lever chuck.lever@oracle.com Reviewed-by: Sagi Grimberg sagi@grimberg.me Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/sw/rxe/rxe_req.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/infiniband/sw/rxe/rxe_req.c b/drivers/infiniband/sw/rxe/rxe_req.c index de853bcc2384..08ae4f3a6a37 100644 --- a/drivers/infiniband/sw/rxe/rxe_req.c +++ b/drivers/infiniband/sw/rxe/rxe_req.c @@ -640,6 +640,7 @@ int rxe_requester(void *arg) rmr->access = wqe->wr.wr.reg.access; rmr->lkey = wqe->wr.wr.reg.key; rmr->rkey = wqe->wr.wr.reg.key; + rmr->iova = wqe->wr.wr.reg.mr->iova; wqe->state = wqe_state_done; wqe->status = IB_WC_SUCCESS; } else {

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 15/53] jffs2: Fix use of uninitialized delayed_work, lockdep breakage

From: Daniel Santos daniel.santos@pobox.com

[ Upstream commit a788c5272769ddbcdbab297cf386413eeac04463 ]

jffs2_sync_fs makes the assumption that if CONFIG_JFFS2_FS_WRITEBUFFER is defined then a write buffer is available and has been initialized. However, this does is not the case when the mtd device has no out-of-band buffer:

int jffs2_nand_flash_setup(struct jffs2_sb_info *c) { if (!c->mtd->oobsize) return 0; ...

The resulting call to cancel_delayed_work_sync passing a uninitialized (but zeroed) delayed_work struct forces lockdep to become disabled.

[ 90.050639] overlayfs: upper fs does not support tmpfile. [ 90.652264] INFO: trying to register non-static key. [ 90.662171] the code is fine but needs lockdep annotation. [ 90.673090] turning off the locking correctness validator. [ 90.684021] CPU: 0 PID: 1762 Comm: mount_root Not tainted 4.14.63 #0 [ 90.696672] Stack : 00000000 00000000 80d8f6a2 00000038 805f0000 80444600 8fe364f4 805dfbe7 [ 90.713349] 80563a30 000006e2 8068370c 00000001 00000000 00000001 8e2fdc48 ffffffff [ 90.730020] 00000000 00000000 80d90000 00000000 00000106 00000000 6465746e 312e3420 [ 90.746690] 6b636f6c 03bf0000 f8000000 20676e69 00000000 80000000 00000000 8e2c2a90 [ 90.763362] 80d90000 00000001 00000000 8e2c2a90 00000003 80260dc0 08052098 80680000 [ 90.780033] ... [ 90.784902] Call Trace: [ 90.789793] [<8000f0d8>] show_stack+0xb8/0x148 [ 90.798659] [<8005a000>] register_lock_class+0x270/0x55c [ 90.809247] [<8005cb64>] __lock_acquire+0x13c/0xf7c [ 90.818964] [<8005e314>] lock_acquire+0x194/0x1dc [ 90.828345] [<8003f27c>] flush_work+0x200/0x24c [ 90.837374] [<80041dfc>] __cancel_work_timer+0x158/0x210 [ 90.847958] [<801a8770>] jffs2_sync_fs+0x20/0x54 [ 90.857173] [<80125cf4>] iterate_supers+0xf4/0x120 [ 90.866729] [<80158fc4>] sys_sync+0x44/0x9c [ 90.875067] [<80014424>] syscall_common+0x34/0x58

Signed-off-by: Daniel Santos daniel.santos@pobox.com Reviewed-by: Hou Tao houtao1@huawei.com Signed-off-by: Boris Brezillon boris.brezillon@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/jffs2/super.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/jffs2/super.c b/fs/jffs2/super.c index bc00cc385b77..83340496645b 100644 --- a/fs/jffs2/super.c +++ b/fs/jffs2/super.c @@ -101,7 +101,8 @@ static int jffs2_sync_fs(struct super_block *sb, int wait) struct jffs2_sb_info *c = JFFS2_SB_INFO(sb);

#ifdef CONFIG_JFFS2_FS_WRITEBUFFER - cancel_delayed_work_sync(&c->wbuf_dwork); + if (jffs2_is_writebuffered(c)) + cancel_delayed_work_sync(&c->wbuf_dwork); #endif

mutex_lock(&c->alloc_sem);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 16/53] clk: imx: make mux parent strings const

From: "A.s. Dong" aisheng.dong@nxp.com

[ Upstream commit 9e5ef7a57ca75a1b9411c46caeeb6881124284a3 ]

As the commit 2893c379461a ("clk: make strings in parent name arrays const"), let's make the parent strings const, otherwise we may meet the following warning when compiling:

drivers/clk/imx/clk-imx7ulp.c: In function 'imx7ulp_clocks_init': drivers/clk/imx/clk-imx7ulp.c:73:35: warning: passing argument 5 of 'imx_clk_mux_flags' discards 'const' qualifier from pointer target type

clks[IMX7ULP_CLK_APLL_PRE_SEL] = imx_clk_mux_flags("apll_pre_sel", base + 0x508, 0, 1, pll_pre_sels, ARRAY_SIZE(pll_pre_sels), CLK_SET_PARENT_GATE); ^ In file included from drivers/clk/imx/clk-imx7ulp.c:23:0: drivers/clk/imx/clk.h:200:27: note: expected 'const char **' but argument is of type 'const char * const*' ...

Cc: Stephen Boyd sboyd@codeaurora.org Cc: Michael Turquette mturquette@baylibre.com Cc: Shawn Guo shawnguo@kernel.org Signed-off-by: Dong Aisheng aisheng.dong@nxp.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/imx/clk-busy.c | 2 +- drivers/clk/imx/clk-fixup-mux.c | 2 +- drivers/clk/imx/clk.h | 18 +++++++++++------- 3 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/drivers/clk/imx/clk-busy.c b/drivers/clk/imx/clk-busy.c index 5cc99590f9a3..097625c5715c 100644 --- a/drivers/clk/imx/clk-busy.c +++ b/drivers/clk/imx/clk-busy.c @@ -154,7 +154,7 @@ static struct clk_ops clk_busy_mux_ops = {

struct clk *imx_clk_busy_mux(const char *name, void __iomem *reg, u8 shift, u8 width, void __iomem *busy_reg, u8 busy_shift, - const char **parent_names, int num_parents) + const char * const *parent_names, int num_parents) { struct clk_busy_mux *busy; struct clk *clk; diff --git a/drivers/clk/imx/clk-fixup-mux.c b/drivers/clk/imx/clk-fixup-mux.c index c9b327e0a8dd..44817c1b0b88 100644 --- a/drivers/clk/imx/clk-fixup-mux.c +++ b/drivers/clk/imx/clk-fixup-mux.c @@ -70,7 +70,7 @@ static const struct clk_ops clk_fixup_mux_ops = { };

struct clk *imx_clk_fixup_mux(const char *name, void __iomem *reg, - u8 shift, u8 width, const char **parents, + u8 shift, u8 width, const char * const *parents, int num_parents, void (*fixup)(u32 *val)) { struct clk_fixup_mux *fixup_mux; diff --git a/drivers/clk/imx/clk.h b/drivers/clk/imx/clk.h index d69c4bbf3597..b03fbd502528 100644 --- a/drivers/clk/imx/clk.h +++ b/drivers/clk/imx/clk.h @@ -63,14 +63,14 @@ struct clk *imx_clk_busy_divider(const char *name, const char *parent_name,

struct clk *imx_clk_busy_mux(const char *name, void __iomem *reg, u8 shift, u8 width, void __iomem *busy_reg, u8 busy_shift, - const char **parent_names, int num_parents); + const char * const *parent_names, int num_parents);

struct clk *imx_clk_fixup_divider(const char *name, const char *parent, void __iomem *reg, u8 shift, u8 width, void (*fixup)(u32 *val));

static inline struct clk *imx_clk_fixed(const char *name, int rate) @@ -79,7 +79,8 @@ static inline struct clk *imx_clk_fixed(const char *name, int rate) }

static inline struct clk *imx_clk_mux_ldb(const char *name, void __iomem *reg, - u8 shift, u8 width, const char **parents, int num_parents) + u8 shift, u8 width, const char * const *parents, + int num_parents) { return clk_register_mux(NULL, name, parents, num_parents, CLK_SET_RATE_NO_REPARENT | CLK_SET_RATE_PARENT, reg, @@ -178,7 +179,8 @@ static inline struct clk *imx_clk_gate4(const char *name, const char *parent, }

static inline struct clk *imx_clk_mux(const char *name, void __iomem *reg, - u8 shift, u8 width, const char **parents, int num_parents) + u8 shift, u8 width, const char * const *parents, + int num_parents) { return clk_register_mux(NULL, name, parents, num_parents, CLK_SET_RATE_NO_REPARENT, reg, shift, @@ -186,7 +188,8 @@ static inline struct clk *imx_clk_mux(const char *name, void __iomem *reg, }

static inline struct clk *imx_clk_mux2(const char *name, void __iomem *reg, - u8 shift, u8 width, const char **parents, int num_parents) + u8 shift, u8 width, const char * const *parents, + int num_parents) { return clk_register_mux(NULL, name, parents, num_parents, CLK_SET_RATE_NO_REPARENT | CLK_OPS_PARENT_ENABLE, @@ -194,8 +197,9 @@ static inline struct clk *imx_clk_mux2(const char *name, void __iomem *reg, }

static inline struct clk *imx_clk_mux_flags(const char *name, - void __iomem *reg, u8 shift, u8 width, const char **parents, - int num_parents, unsigned long flags) + void __iomem *reg, u8 shift, u8 width, + const char * const *parents, int num_parents, + unsigned long flags) { return clk_register_mux(NULL, name, parents, num_parents, flags | CLK_SET_RATE_NO_REPARENT, reg, shift, width, 0,

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 17/53] pstore/ram: Do not treat empty buffers as valid

From: "Joel Fernandes (Google)" joel@joelfernandes.org

[ Upstream commit 30696378f68a9e3dad6bfe55938b112e72af00c2 ]

The ramoops backend currently calls persistent_ram_save_old() even if a buffer is empty. While this appears to work, it is does not seem like the right thing to do and could lead to future bugs so lets avoid that. It also prevents misleading prints in the logs which claim the buffer is valid.

I got something like:

found existing buffer, size 0, start 0

When I was expecting:

no valid data in buffer (sig = ...)

This bails out early (and reports with pr_debug()), since it's an acceptable state.

Signed-off-by: Joel Fernandes (Google) joel@joelfernandes.org Co-developed-by: Kees Cook keescook@chromium.org Signed-off-by: Kees Cook keescook@chromium.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/pstore/ram_core.c | 5 +++++ 1 file changed, 5 insertions(+)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c index ecdb3baa1283..11e558efd61e 100644 --- a/fs/pstore/ram_core.c +++ b/fs/pstore/ram_core.c @@ -488,6 +488,11 @@ static int persistent_ram_post_init(struct persistent_ram_zone *prz, u32 sig, sig ^= PERSISTENT_RAM_SIG;

if (prz->buffer->sig == sig) { + if (buffer_size(prz) == 0) { + pr_debug("found existing empty buffer\n"); + return 0; + } + if (buffer_size(prz) > prz->buffer_size || buffer_start(prz) > buffer_size(prz)) pr_info("found existing invalid buffer, size %zu, start %zu\n",

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 18/53] powerpc/xmon: Fix invocation inside lock region

From: Breno Leitao leitao@debian.org

[ Upstream commit 8d4a862276a9c30a269d368d324fb56529e6d5fd ]

Currently xmon needs to get devtree_lock (through rtas_token()) during its invocation (at crash time). If there is a crash while devtree_lock is being held, then xmon tries to get the lock but spins forever and never get into the interactive debugger, as in the following case:

int *ptr = NULL; raw_spin_lock_irqsave(&devtree_lock, flags); *ptr = 0xdeadbeef;

This patch avoids calling rtas_token(), thus trying to get the same lock, at crash time. This new mechanism proposes getting the token at initialization time (xmon_init()) and just consuming it at crash time.

This would allow xmon to be possible invoked independent of devtree_lock being held or not.

Signed-off-by: Breno Leitao leitao@debian.org Reviewed-by: Thiago Jung Bauermann bauerman@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/xmon/xmon.c | 18 ++++++++++++++---- 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/xmon/xmon.c b/arch/powerpc/xmon/xmon.c index a5938fadd031..f752f771f29d 100644 --- a/arch/powerpc/xmon/xmon.c +++ b/arch/powerpc/xmon/xmon.c @@ -78,6 +78,9 @@ static int xmon_gate; #define xmon_owner 0 #endif /* CONFIG_SMP */

+#ifdef CONFIG_PPC_PSERIES +static int set_indicator_token = RTAS_UNKNOWN_SERVICE; +#endif static unsigned long in_xmon __read_mostly = 0; static int xmon_on = IS_ENABLED(CONFIG_XMON_DEFAULT);

@@ -357,7 +360,6 @@ static inline void disable_surveillance(void) #ifdef CONFIG_PPC_PSERIES /* Since this can't be a module, args should end up below 4GB. */ static struct rtas_args args; - int token;

/* * At this point we have got all the cpus we can into @@ -366,11 +368,11 @@ static inline void disable_surveillance(void) * If we did try to take rtas.lock there would be a * real possibility of deadlock. */ - token = rtas_token("set-indicator"); - if (token == RTAS_UNKNOWN_SERVICE) + if (set_indicator_token == RTAS_UNKNOWN_SERVICE) return;

- rtas_call_unlocked(&args, token, 3, 1, NULL, SURVEILLANCE_TOKEN, 0, 0); + rtas_call_unlocked(&args, set_indicator_token, 3, 1, NULL, + SURVEILLANCE_TOKEN, 0, 0);

#endif /* CONFIG_PPC_PSERIES */ } @@ -3472,6 +3474,14 @@ static void xmon_init(int enable) __debugger_iabr_match = xmon_iabr_match; __debugger_break_match = xmon_break_match; __debugger_fault_handler = xmon_fault_handler; + +#ifdef CONFIG_PPC_PSERIES + /* + * Get the token here to avoid trying to get a lock + * during the crash, causing a deadlock. + */ + set_indicator_token = rtas_token("set-indicator"); +#endif } else { __debugger = NULL; __debugger_ipi = NULL;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 19/53] powerpc/pseries/cpuidle: Fix preempt warning

From: Breno Leitao leitao@debian.org

[ Upstream commit 2b038cbc5fcf12a7ee1cc9bfd5da1e46dacdee87 ]

When booting a pseries kernel with PREEMPT enabled, it dumps the following warning:

BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 caller is pseries_processor_idle_init+0x5c/0x22c CPU: 13 PID: 1 Comm: swapper/0 Not tainted 4.20.0-rc3-00090-g12201a0128bc-dirty #828 Call Trace: [c000000429437ab0] [c0000000009c8878] dump_stack+0xec/0x164 (unreliable) [c000000429437b00] [c0000000005f2f24] check_preemption_disabled+0x154/0x160 [c000000429437b90] [c000000000cab8e8] pseries_processor_idle_init+0x5c/0x22c [c000000429437c10] [c000000000010ed4] do_one_initcall+0x64/0x300 [c000000429437ce0] [c000000000c54500] kernel_init_freeable+0x3f0/0x500 [c000000429437db0] [c0000000000112dc] kernel_init+0x2c/0x160 [c000000429437e20] [c00000000000c1d0] ret_from_kernel_thread+0x5c/0x6c

This happens because the code calls get_lppaca() which calls get_paca() and it checks if preemption is disabled through check_preemption_disabled().

Preemption should be disabled because the per CPU variable may make no sense if there is a preemption (and a CPU switch) after it reads the per CPU data and when it is used.

In this device driver specifically, it is not a problem, because this code just needs to have access to one lppaca struct, and it does not matter if it is the current per CPU lppaca struct or not (i.e. when there is a preemption and a CPU migration).

That said, the most appropriate fix seems to be related to avoiding the debug_smp_processor_id() call at get_paca(), instead of calling preempt_disable() before get_paca().

Signed-off-by: Breno Leitao leitao@debian.org Signed-off-by: Michael Ellerman mpe@ellerman.id.au Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpuidle/cpuidle-pseries.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/cpuidle/cpuidle-pseries.c b/drivers/cpuidle/cpuidle-pseries.c index a187a39fb866..7f21c6a57178 100644 --- a/drivers/cpuidle/cpuidle-pseries.c +++ b/drivers/cpuidle/cpuidle-pseries.c @@ -239,7 +239,13 @@ static int pseries_idle_probe(void) return -ENODEV;

if (firmware_has_feature(FW_FEATURE_SPLPAR)) { - if (lppaca_shared_proc(get_lppaca())) { + /* + * Use local_paca instead of get_lppaca() since + * preemption is not disabled, and it is not required in + * fact, since lppaca_ptr does not need to be the value + * associated to the current CPU, it can be from any CPU. + */ + if (lppaca_shared_proc(local_paca->lppaca_ptr)) { cpuidle_state_table = shared_states; max_idle_state = ARRAY_SIZE(shared_states); } else {

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 20/53] arm64: relocatable: fix inconsistencies in linker script and options

From: Ard Biesheuvel ard.biesheuvel@linaro.org

[ Upstream commit 3bbd3db86470c701091fb1d67f1fab6621debf50 ]

readelf complains about the section layout of vmlinux when building with CONFIG_RELOCATABLE=y (for KASLR):

readelf: Warning: [21]: Link field (0) should index a symtab section. readelf: Warning: [21]: Info field (0) should index a relocatable section.

Also, it seems that our use of '-pie -shared' is contradictory, and thus ambiguous. In general, the way KASLR is wired up at the moment is highly tailored to how ld.bfd happens to implement (and conflate) PIE executables and shared libraries, so given the current effort to support other toolchains, let's fix some of these issues as well.

- Drop the -pie linker argument and just leave -shared. In ld.bfd, the differences between them are unclear (except for the ELF type of the produced image [0]) but lld chokes on seeing both at the same time.

- Rename the .rela output section to .rela.dyn, as is customary for shared libraries and PIE executables, so that it is not misidentified by readelf as a static relocation section (producing the warnings above).

- Pass the -z notext and -z norelro options to explicitly instruct the linker to permit text relocations, and to omit the RELRO program header (which requires a certain section layout that we don't adhere to in the kernel). These are the defaults for current versions of ld.bfd.

- Discard .eh_frame and .gnu.hash sections to avoid them from being emitted between .head.text and .text, screwing up the section layout.

These changes only affect the ELF image, and produce the same binary image.

[0] b9dce7f1ba01 ("arm64: kernel: force ET_DYN ELF type for ...")

Cc: Nick Desaulniers ndesaulniers@google.com Cc: Peter Smith peter.smith@linaro.org Tested-by: Nick Desaulniers ndesaulniers@google.com Signed-off-by: Ard Biesheuvel ard.biesheuvel@linaro.org Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/Makefile | 2 +- arch/arm64/kernel/vmlinux.lds.S | 9 +++++---- 2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile index 48f2b3657507..0c5f70e6d5cf 100644 --- a/arch/arm64/Makefile +++ b/arch/arm64/Makefile @@ -18,7 +18,7 @@ ifeq ($(CONFIG_RELOCATABLE), y) # Pass --no-apply-dynamic-relocs to restore pre-binutils-2.27 behaviour # for relative relocs, since this leads to better Image compression # with the relocation offsets always being zero. -LDFLAGS_vmlinux += -pie -shared -Bsymbolic \ +LDFLAGS_vmlinux += -shared -Bsymbolic -z notext -z norelro \ $(call ld-option, --no-apply-dynamic-relocs) endif

diff --git a/arch/arm64/kernel/vmlinux.lds.S b/arch/arm64/kernel/vmlinux.lds.S index ddfd3c0942f7..6edfdf5b061d 100644 --- a/arch/arm64/kernel/vmlinux.lds.S +++ b/arch/arm64/kernel/vmlinux.lds.S @@ -99,7 +99,8 @@ SECTIONS *(.discard) *(.discard.*) *(.interp .dynamic) - *(.dynsym .dynstr .hash) + *(.dynsym .dynstr .hash .gnu.hash) + *(.eh_frame) }

. = KIMAGE_VADDR + TEXT_OFFSET; @@ -176,12 +177,12 @@ SECTIONS

PERCPU_SECTION(L1_CACHE_BYTES)

- .rela : ALIGN(8) { + .rela.dyn : ALIGN(8) { *(.rela .rela*) }

- __rela_offset = ABSOLUTE(ADDR(.rela) - KIMAGE_VADDR); - __rela_size = SIZEOF(.rela); + __rela_offset = ABSOLUTE(ADDR(.rela.dyn) - KIMAGE_VADDR); + __rela_size = SIZEOF(.rela.dyn);

. = ALIGN(SEGMENT_ALIGN); __initdata_end = .;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 21/53] media: firewire: Fix app_info parameter type in avc_ca{, _app}_info

From: Nathan Chancellor natechancellor@gmail.com

[ Upstream commit b2e9a4eda11fd2cb1e6714e9ad3f455c402568ff ]

Clang warns:

drivers/media/firewire/firedtv-avc.c:999:45: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_APP_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1000:45: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_APP_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1040:44: warning: implicit conversion from 'int' to 'char' changes value from 159 to -97 [-Wconstant-conversion] app_info[0] = (EN50221_TAG_CA_INFO >> 16) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ drivers/media/firewire/firedtv-avc.c:1041:44: warning: implicit conversion from 'int' to 'char' changes value from 128 to -128 [-Wconstant-conversion] app_info[1] = (EN50221_TAG_CA_INFO >> 8) & 0xff; ~ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~ 4 warnings generated.

Change app_info's type to unsigned char to match the type of the member msg in struct ca_msg, which is the only thing passed into the app_info parameter in this function.

Link: https://github.com/ClangBuiltLinux/linux/issues/105

Signed-off-by: Nathan Chancellor natechancellor@gmail.com Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/firewire/firedtv-avc.c | 6 ++++-- drivers/media/firewire/firedtv.h | 6 ++++-- 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/media/firewire/firedtv-avc.c b/drivers/media/firewire/firedtv-avc.c index 5bde6c209cd7..b243e4a52f10 100644 --- a/drivers/media/firewire/firedtv-avc.c +++ b/drivers/media/firewire/firedtv-avc.c @@ -968,7 +968,8 @@ static int get_ca_object_length(struct avc_response_frame *r) return r->operand[7]; }

-int avc_ca_app_info(struct firedtv *fdtv, char *app_info, unsigned int *len) +int avc_ca_app_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len) { struct avc_command_frame *c = (void *)fdtv->avc_data; struct avc_response_frame *r = (void *)fdtv->avc_data; @@ -1009,7 +1010,8 @@ int avc_ca_app_info(struct firedtv *fdtv, char *app_info, unsigned int *len) return ret; }

-int avc_ca_info(struct firedtv *fdtv, char *app_info, unsigned int *len) +int avc_ca_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len) { struct avc_command_frame *c = (void *)fdtv->avc_data; struct avc_response_frame *r = (void *)fdtv->avc_data; diff --git a/drivers/media/firewire/firedtv.h b/drivers/media/firewire/firedtv.h index 345d1eda8c05..5b18a08c6285 100644 --- a/drivers/media/firewire/firedtv.h +++ b/drivers/media/firewire/firedtv.h @@ -124,8 +124,10 @@ int avc_lnb_control(struct firedtv *fdtv, char voltage, char burst, struct dvb_diseqc_master_cmd *diseqcmd); void avc_remote_ctrl_work(struct work_struct *work); int avc_register_remote_control(struct firedtv *fdtv); -int avc_ca_app_info(struct firedtv *fdtv, char *app_info, unsigned int *len); -int avc_ca_info(struct firedtv *fdtv, char *app_info, unsigned int *len); +int avc_ca_app_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len); +int avc_ca_info(struct firedtv *fdtv, unsigned char *app_info, + unsigned int *len); int avc_ca_reset(struct firedtv *fdtv); int avc_ca_pmt(struct firedtv *fdtv, char *app_info, int length); int avc_ca_get_time_date(struct firedtv *fdtv, int *interval);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 22/53] IB/hfi1: Incorrect sizing of sge for PIO will OOPs

From: "Michael J. Ruhl" michael.j.ruhl@intel.com

[ Upstream commit dbc2970caef74e8ff41923d302aa6fb5a4812d0e ]

An incorrect sge sizing in the HFI PIO path will cause an OOPs similar to this:

BUG: unable to handle kernel NULL pointer dereference at (null) IP: [] hfi1_verbs_send_pio+0x3d8/0x530 [hfi1] PGD 0 Oops: 0000 1 SMP Call Trace: ? hfi1_verbs_send_dma+0xad0/0xad0 [hfi1] hfi1_verbs_send+0xdf/0x250 [hfi1] ? make_rc_ack+0xa80/0xa80 [hfi1] hfi1_do_send+0x192/0x430 [hfi1] hfi1_do_send_from_rvt+0x10/0x20 [hfi1] rvt_post_send+0x369/0x820 [rdmavt] ib_uverbs_post_send+0x317/0x570 [ib_uverbs] ib_uverbs_write+0x26f/0x420 [ib_uverbs] ? security_file_permission+0x21/0xa0 vfs_write+0xbd/0x1e0 ? mntput+0x24/0x40 SyS_write+0x7f/0xe0 system_call_fastpath+0x16/0x1b

Fix by adding the missing sizing check to correctly determine the sge length.

Fixes: 7724105686e7 ("IB/hfi1: add driver files") Reviewed-by: Mike Marciniszyn mike.marciniszyn@intel.com Signed-off-by: Michael J. Ruhl michael.j.ruhl@intel.com Signed-off-by: Dennis Dalessandro dennis.dalessandro@intel.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/hfi1/verbs.c | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/drivers/infiniband/hw/hfi1/verbs.c b/drivers/infiniband/hw/hfi1/verbs.c index 12cf0f7ca7bb..2e8854ba18cf 100644 --- a/drivers/infiniband/hw/hfi1/verbs.c +++ b/drivers/infiniband/hw/hfi1/verbs.c @@ -1123,6 +1123,8 @@ int hfi1_verbs_send_pio(struct rvt_qp *qp, struct hfi1_pkt_state *ps,

if (slen > len) slen = len; + if (slen > ss->sge.sge_length) + slen = ss->sge.sge_length; rvt_update_sge(ss, slen, false); seg_pio_copy_mid(pbuf, addr, slen); len -= slen;

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 23/53] media: venus: core: Set dma maximum segment size

From: Vivek Gautam vivek.gautam@codeaurora.org

[ Upstream commit de2563bce7a157f5296bab94f3843d7d64fb14b4 ]

Turning on CONFIG_DMA_API_DEBUG_SG results in the following error:

[ 460.308650] ------------[ cut here ]------------ [ 460.313490] qcom-venus aa00000.video-codec: DMA-API: mapping sg segment longer than device claims to support [len=4194304] [max=65536] [ 460.326017] WARNING: CPU: 3 PID: 3555 at src/kernel/dma/debug.c:1301 debug_dma_map_sg+0x174/0x254 [ 460.338888] Modules linked in: venus_dec venus_enc videobuf2_dma_sg videobuf2_memops hci_uart btqca bluetooth venus_core v4l2_mem2mem videobuf2_v4l2 videobuf2_common ath10k_snoc ath10k_core ath lzo lzo_compress zramjoydev [ 460.375811] CPU: 3 PID: 3555 Comm: V4L2DecoderThre Tainted: G W 4.19.1 #82 [ 460.384223] Hardware name: Google Cheza (rev1) (DT) [ 460.389251] pstate: 60400009 (nZCv daif +PAN -UAO) [ 460.394191] pc : debug_dma_map_sg+0x174/0x254 [ 460.398680] lr : debug_dma_map_sg+0x174/0x254 [ 460.403162] sp : ffffff80200c37d0 [ 460.406583] x29: ffffff80200c3830 x28: 0000000000010000 [ 460.412056] x27: 00000000ffffffff x26: ffffffc0f785ea80 [ 460.417532] x25: 0000000000000000 x24: ffffffc0f4ea1290 [ 460.423001] x23: ffffffc09e700300 x22: ffffffc0f4ea1290 [ 460.428470] x21: ffffff8009037000 x20: 0000000000000001 [ 460.433936] x19: ffffff80091b0000 x18: 0000000000000000 [ 460.439411] x17: 0000000000000000 x16: 000000000000f251 [ 460.444885] x15: 0000000000000006 x14: 0720072007200720 [ 460.450354] x13: ffffff800af536e0 x12: 0000000000000000 [ 460.455822] x11: 0000000000000000 x10: 0000000000000000 [ 460.461288] x9 : 537944d9c6c48d00 x8 : 537944d9c6c48d00 [ 460.466758] x7 : 0000000000000000 x6 : ffffffc0f8d98f80 [ 460.472230] x5 : 0000000000000000 x4 : 0000000000000000 [ 460.477703] x3 : 000000000000008a x2 : ffffffc0fdb13948 [ 460.483170] x1 : ffffffc0fdb0b0b0 x0 : 000000000000007a [ 460.488640] Call trace: [ 460.491165] debug_dma_map_sg+0x174/0x254 [ 460.495307] vb2_dma_sg_alloc+0x260/0x2dc [videobuf2_dma_sg] [ 460.501150] __vb2_queue_alloc+0x164/0x374 [videobuf2_common] [ 460.507076] vb2_core_reqbufs+0xfc/0x23c [videobuf2_common] [ 460.512815] vb2_reqbufs+0x44/0x5c [videobuf2_v4l2] [ 460.517853] v4l2_m2m_reqbufs+0x44/0x78 [v4l2_mem2mem] [ 460.523144] v4l2_m2m_ioctl_reqbufs+0x1c/0x28 [v4l2_mem2mem] [ 460.528976] v4l_reqbufs+0x30/0x40 [ 460.532480] __video_do_ioctl+0x36c/0x454 [ 460.536610] video_usercopy+0x25c/0x51c [ 460.540572] video_ioctl2+0x38/0x48 [ 460.544176] v4l2_ioctl+0x60/0x74 [ 460.547602] do_video_ioctl+0x948/0x3520 [ 460.551648] v4l2_compat_ioctl32+0x60/0x98 [ 460.555872] __arm64_compat_sys_ioctl+0x134/0x20c [ 460.560718] el0_svc_common+0x9c/0xe4 [ 460.564498] el0_svc_compat_handler+0x2c/0x38 [ 460.568982] el0_svc_compat+0x8/0x18 [ 460.572672] ---[ end trace ce209b87b2f3af88 ]---

...

From above warning one would deduce that the sg segment will overflow

the device's capacity. In reality, the hardware can accommodate larger sg segments. So, initialize the max segment size properly to weed out this warning.

Based on a similar patch sent by Sean Paul for mdss: https://patchwork.kernel.org/patch/10671457/

Signed-off-by: Vivek Gautam vivek.gautam@codeaurora.org Acked-by: Stanimir Varbanov stanimir.varbanov@linaro.org Signed-off-by: Hans Verkuil hverkuil-cisco@xs4all.nl Signed-off-by: Mauro Carvalho Chehab mchehab+samsung@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/media/platform/qcom/venus/core.c | 8 ++++++++ 1 file changed, 8 insertions(+)

diff --git a/drivers/media/platform/qcom/venus/core.c b/drivers/media/platform/qcom/venus/core.c index 41eef376eb2d..769e9e68562d 100644 --- a/drivers/media/platform/qcom/venus/core.c +++ b/drivers/media/platform/qcom/venus/core.c @@ -187,6 +187,14 @@ static int venus_probe(struct platform_device *pdev) if (ret) return ret;

+ if (!dev->dma_parms) { + dev->dma_parms = devm_kzalloc(dev, sizeof(*dev->dma_parms), + GFP_KERNEL); + if (!dev->dma_parms) + return -ENOMEM; + } + dma_set_max_seg_size(dev, DMA_BIT_MASK(32)); + INIT_LIST_HEAD(&core->instances); mutex_init(&core->lock); INIT_DELAYED_WORK(&core->work, venus_sys_error_handler);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 24/53] net: call sk_dst_reset when set SO_DONTROUTE

From: yupeng yupeng0921@gmail.com

[ Upstream commit 0fbe82e628c817e292ff588cd5847fc935e025f2 ]

after set SO_DONTROUTE to 1, the IP layer should not route packets if the dest IP address is not in link scope. But if the socket has cached the dst_entry, such packets would be routed until the sk_dst_cache expires. So we should clean the sk_dst_cache when a user set SO_DONTROUTE option. Below are server/client python scripts which could reprodue this issue:

server side code:

========================================================================== import socket import struct import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(('0.0.0.0', 9000)) s.listen(1) sock, addr = s.accept() sock.setsockopt(socket.SOL_SOCKET, socket.SO_DONTROUTE, struct.pack('i', 1)) while True: sock.send(b'foo') time.sleep(1) ==========================================================================

client side code: ========================================================================== import socket import time

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.connect(('server_address', 9000)) while True: data = s.recv(1024) print(data) ==========================================================================

Signed-off-by: yupeng yupeng0921@gmail.com Signed-off-by: David S. Miller davem@davemloft.net Signed-off-by: Sasha Levin sashal@kernel.org --- net/core/sock.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/net/core/sock.c b/net/core/sock.c index 36f19458e2fe..33584eb7f0cf 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -735,6 +735,7 @@ int sock_setsockopt(struct socket *sock, int level, int optname, break; case SO_DONTROUTE: sock_valbool_flag(sk, SOCK_LOCALROUTE, valbool); + sk_dst_reset(sk); break; case SO_BROADCAST: sock_valbool_flag(sk, SOCK_BROADCAST, valbool);

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 25/53] scsi: target: use consistent left-aligned ASCII INQUIRY data

From: David Disseldorp ddiss@suse.de

[ Upstream commit 0de263577de5d5e052be5f4f93334e63cc8a7f0b ]

spc5r17.pdf specifies:

4.3.1 ASCII data field requirements ASCII data fields shall contain only ASCII printable characters (i.e., code values 20h to 7Eh) and may be terminated with one or more ASCII null (00h) characters. ASCII data fields described as being left-aligned shall have any unused bytes at the end of the field (i.e., highest offset) and the unused bytes shall be filled with ASCII space characters (20h).

LIO currently space-pads the T10 VENDOR IDENTIFICATION and PRODUCT IDENTIFICATION fields in the standard INQUIRY data. However, the PRODUCT REVISION LEVEL field in the standard INQUIRY data as well as the T10 VENDOR IDENTIFICATION field in the INQUIRY Device Identification VPD Page are zero-terminated/zero-padded.

Fix this inconsistency by using space-padding for all of the above fields.

Signed-off-by: David Disseldorp ddiss@suse.de Reviewed-by: Christoph Hellwig hch@lst.de Reviewed-by: Bryant G. Ly bly@catalogicsoftware.com Reviewed-by: Lee Duncan lduncan@suse.com Reviewed-by: Hannes Reinecke hare@suse.com Reviewed-by: Roman Bolshakov r.bolshakov@yadro.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/target_core_spc.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/drivers/target/target_core_spc.c b/drivers/target/target_core_spc.c index cb0461a10808..93424db5f002 100644 --- a/drivers/target/target_core_spc.c +++ b/drivers/target/target_core_spc.c @@ -108,12 +108,17 @@ spc_emulate_inquiry_std(struct se_cmd *cmd, unsigned char *buf)

buf[7] = 0x2; /* CmdQue=1 */

- memcpy(&buf[8], "LIO-ORG ", 8); - memset(&buf[16], 0x20, 16); + /* + * ASCII data fields described as being left-aligned shall have any + * unused bytes at the end of the field (i.e., highest offset) and the + * unused bytes shall be filled with ASCII space characters (20h). + */ + memset(&buf[8], 0x20, 8 + 16 + 4); + memcpy(&buf[8], "LIO-ORG", sizeof("LIO-ORG") - 1); memcpy(&buf[16], dev->t10_wwn.model, - min_t(size_t, strlen(dev->t10_wwn.model), 16)); + strnlen(dev->t10_wwn.model, 16)); memcpy(&buf[32], dev->t10_wwn.revision, - min_t(size_t, strlen(dev->t10_wwn.revision), 4)); + strnlen(dev->t10_wwn.revision, 4)); buf[4] = 31; /* Set additional length to 31 */

return 0; @@ -251,7 +256,9 @@ spc_emulate_evpd_83(struct se_cmd *cmd, unsigned char *buf) buf[off] = 0x2; /* ASCII */ buf[off+1] = 0x1; /* T10 Vendor ID */ buf[off+2] = 0x0; - memcpy(&buf[off+4], "LIO-ORG", 8); + /* left align Vendor ID and pad with spaces */ + memset(&buf[off+4], 0x20, 8); + memcpy(&buf[off+4], "LIO-ORG", sizeof("LIO-ORG") - 1); /* Extra Byte for NULL Terminator */ id_len++; /* Identifier Length */

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 26/53] selftests: do not macro-expand failed assertion expressions

From: "Dmitry V. Levin" ldv@altlinux.org

[ Upstream commit b708a3cc9600390ccaa2b68a88087dd265154b2b ]

I've stumbled over the current macro-expand behaviour of the test harness:

$ gcc -Wall -xc - <<'__EOF__' TEST(macro) { int status = 0; ASSERT_TRUE(WIFSIGNALED(status)); } TEST_HARNESS_MAIN __EOF__ $ ./a.out [==========] Running 1 tests from 1 test cases. [ RUN ] global.macro <stdin>:4:global.macro:Expected 0 (0) != (((signed char) (((status) & 0x7f) + 1) >> 1) > 0) (0) global.macro: Test terminated by assertion [ FAIL ] global.macro [==========] 0 / 1 tests passed. [ FAILED ]

With this change the output of the same test looks much more comprehensible:

[==========] Running 1 tests from 1 test cases. [ RUN ] global.macro <stdin>:4:global.macro:Expected 0 (0) != WIFSIGNALED(status) (0) global.macro: Test terminated by assertion [ FAIL ] global.macro [==========] 0 / 1 tests passed. [ FAILED ]

The issue is very similar to the bug fixed in glibc assert(3) three years ago: https://sourceware.org/bugzilla/show_bug.cgi?id=18604

Cc: Shuah Khan shuah@kernel.org Cc: Kees Cook keescook@chromium.org Cc: Andy Lutomirski luto@amacapital.net Cc: Will Drewry wad@chromium.org Cc: linux-kselftest@vger.kernel.org Signed-off-by: Dmitry V. Levin ldv@altlinux.org Acked-by: Kees Cook keescook@chromium.org Signed-off-by: Shuah Khan shuah@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- tools/testing/selftests/kselftest_harness.h | 42 ++++++++++----------- 1 file changed, 21 insertions(+), 21 deletions(-)

diff --git a/tools/testing/selftests/kselftest_harness.h b/tools/testing/selftests/kselftest_harness.h index e81bd28bdd89..8f32d699d6c5 100644 --- a/tools/testing/selftests/kselftest_harness.h +++ b/tools/testing/selftests/kselftest_harness.h @@ -330,7 +330,7 @@ * ASSERT_EQ(expected, measured): expected == measured */ #define ASSERT_EQ(expected, seen) \ - __EXPECT(expected, seen, ==, 1) + __EXPECT(expected, #expected, seen, #seen, ==, 1)

/** * ASSERT_NE(expected, seen) @@ -341,7 +341,7 @@ * ASSERT_NE(expected, measured): expected != measured */ #define ASSERT_NE(expected, seen) \ - __EXPECT(expected, seen, !=, 1) + __EXPECT(expected, #expected, seen, #seen, !=, 1)

/** * ASSERT_LT(expected, seen) @@ -352,7 +352,7 @@ * ASSERT_LT(expected, measured): expected < measured */ #define ASSERT_LT(expected, seen) \ - __EXPECT(expected, seen, <, 1) + __EXPECT(expected, #expected, seen, #seen, <, 1)

/** * ASSERT_LE(expected, seen) @@ -363,7 +363,7 @@ * ASSERT_LE(expected, measured): expected <= measured */ #define ASSERT_LE(expected, seen) \ - __EXPECT(expected, seen, <=, 1) + __EXPECT(expected, #expected, seen, #seen, <=, 1)

/** * ASSERT_GT(expected, seen) @@ -374,7 +374,7 @@ * ASSERT_GT(expected, measured): expected > measured */ #define ASSERT_GT(expected, seen) \ - __EXPECT(expected, seen, >, 1) + __EXPECT(expected, #expected, seen, #seen, >, 1)

/** * ASSERT_GE(expected, seen) @@ -385,7 +385,7 @@ * ASSERT_GE(expected, measured): expected >= measured */ #define ASSERT_GE(expected, seen) \ - __EXPECT(expected, seen, >=, 1) + __EXPECT(expected, #expected, seen, #seen, >=, 1)

/** * ASSERT_NULL(seen) @@ -395,7 +395,7 @@ * ASSERT_NULL(measured): NULL == measured */ #define ASSERT_NULL(seen) \ - __EXPECT(NULL, seen, ==, 1) + __EXPECT(NULL, "NULL", seen, #seen, ==, 1)

/** * ASSERT_TRUE(seen) @@ -405,7 +405,7 @@ * ASSERT_TRUE(measured): measured != 0 */ #define ASSERT_TRUE(seen) \ - ASSERT_NE(0, seen) + __EXPECT(0, "0", seen, #seen, !=, 1)

/** * ASSERT_FALSE(seen) @@ -415,7 +415,7 @@ * ASSERT_FALSE(measured): measured == 0 */ #define ASSERT_FALSE(seen) \ - ASSERT_EQ(0, seen) + __EXPECT(0, "0", seen, #seen, ==, 1)

/** * ASSERT_STREQ(expected, seen) @@ -448,7 +448,7 @@ * EXPECT_EQ(expected, measured): expected == measured */ #define EXPECT_EQ(expected, seen) \ - __EXPECT(expected, seen, ==, 0) + __EXPECT(expected, #expected, seen, #seen, ==, 0)

/** * EXPECT_NE(expected, seen) @@ -459,7 +459,7 @@ * EXPECT_NE(expected, measured): expected != measured */ #define EXPECT_NE(expected, seen) \ - __EXPECT(expected, seen, !=, 0) + __EXPECT(expected, #expected, seen, #seen, !=, 0)

/** * EXPECT_LT(expected, seen) @@ -470,7 +470,7 @@ * EXPECT_LT(expected, measured): expected < measured */ #define EXPECT_LT(expected, seen) \ - __EXPECT(expected, seen, <, 0) + __EXPECT(expected, #expected, seen, #seen, <, 0)

/** * EXPECT_LE(expected, seen) @@ -481,7 +481,7 @@ * EXPECT_LE(expected, measured): expected <= measured */ #define EXPECT_LE(expected, seen) \ - __EXPECT(expected, seen, <=, 0) + __EXPECT(expected, #expected, seen, #seen, <=, 0)

/** * EXPECT_GT(expected, seen) @@ -492,7 +492,7 @@ * EXPECT_GT(expected, measured): expected > measured */ #define EXPECT_GT(expected, seen) \ - __EXPECT(expected, seen, >, 0) + __EXPECT(expected, #expected, seen, #seen, >, 0)

/** * EXPECT_GE(expected, seen) @@ -503,7 +503,7 @@ * EXPECT_GE(expected, measured): expected >= measured */ #define EXPECT_GE(expected, seen) \ - __EXPECT(expected, seen, >=, 0) + __EXPECT(expected, #expected, seen, #seen, >=, 0)

/** * EXPECT_NULL(seen) @@ -513,7 +513,7 @@ * EXPECT_NULL(measured): NULL == measured */ #define EXPECT_NULL(seen) \ - __EXPECT(NULL, seen, ==, 0) + __EXPECT(NULL, "NULL", seen, #seen, ==, 0)

/** * EXPECT_TRUE(seen) @@ -523,7 +523,7 @@ * EXPECT_TRUE(measured): 0 != measured */ #define EXPECT_TRUE(seen) \ - EXPECT_NE(0, seen) + __EXPECT(0, "0", seen, #seen, !=, 0)

/** * EXPECT_FALSE(seen) @@ -533,7 +533,7 @@ * EXPECT_FALSE(measured): 0 == measured */ #define EXPECT_FALSE(seen) \ - EXPECT_EQ(0, seen) + __EXPECT(0, "0", seen, #seen, ==, 0)

/** * EXPECT_STREQ(expected, seen) @@ -573,7 +573,7 @@ if (_metadata->passed && _metadata->step < 255) \ _metadata->step++;

-#define __EXPECT(_expected, _seen, _t, _assert) do { \ +#define __EXPECT(_expected, _expected_str, _seen, _seen_str, _t, _assert) do { \ /* Avoid multiple evaluation of the cases */ \ __typeof__(_expected) __exp = (_expected); \ __typeof__(_seen) __seen = (_seen); \ @@ -582,8 +582,8 @@ unsigned long long __exp_print = (uintptr_t)__exp; \ unsigned long long __seen_print = (uintptr_t)__seen; \ __TH_LOG("Expected %s (%llu) %s %s (%llu)", \ - #_expected, __exp_print, #_t, \ - #_seen, __seen_print); \ + _expected_str, __exp_print, #_t, \ + _seen_str, __seen_print); \ _metadata->passed = 0; \ /* Ensure the optional handler is triggered */ \ _metadata->trigger = 1; \

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 27/53] clk: imx6q: reset exclusive gates on init

From: Lucas Stach l.stach@pengutronix.de

[ Upstream commit f7542d817733f461258fd3a47d77da35b2d9fc81 ]

The exclusive gates may be set up in the wrong way by software running before the clock driver comes up. In that case the exclusive setup is locked in its initial state, as the complementary function can't be activated without disabling the initial setup first.

To avoid this lock situation, reset the exclusive gates to the off state and allow the kernel to provide the proper setup.

Signed-off-by: Lucas Stach l.stach@pengutronix.de Reviewed-by: Dong Aisheng Aisheng.dong@nxp.com Signed-off-by: Stephen Boyd sboyd@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clk/imx/clk-imx6q.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/clk/imx/clk-imx6q.c b/drivers/clk/imx/clk-imx6q.c index 8d518ad5dc13..8eb93eb2f857 100644 --- a/drivers/clk/imx/clk-imx6q.c +++ b/drivers/clk/imx/clk-imx6q.c @@ -515,8 +515,12 @@ static void __init imx6q_clocks_init(struct device_node *ccm_node) * lvds1_gate and lvds2_gate are pseudo-gates. Both can be * independently configured as clock inputs or outputs. We treat * the "output_enable" bit as a gate, even though it's really just - * enabling clock output. + * enabling clock output. Initially the gate bits are cleared, as + * otherwise the exclusive configuration gets locked in the setup done + * by software running before the clock driver, with no way to change + * it. */ + writel(readl(base + 0x160) & ~0x3c00, base + 0x160); clk[IMX6QDL_CLK_LVDS1_GATE] = imx_clk_gate_exclusive("lvds1_gate", "lvds1_sel", base + 0x160, 10, BIT(12)); clk[IMX6QDL_CLK_LVDS2_GATE] = imx_clk_gate_exclusive("lvds2_gate", "lvds2_sel", base + 0x160, 11, BIT(13));

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 28/53] arm64: Fix minor issues with the dcache_by_line_op macro

From: Will Deacon will.deacon@arm.com

[ Upstream commit 33309ecda0070506c49182530abe7728850ebe78 ]

The dcache_by_line_op macro suffers from a couple of small problems:

First, the GAS directives that are currently being used rely on assembler behavior that is not documented, and probably not guaranteed to produce the correct behavior going forward. As a result, we end up with some undefined symbols in cache.o:

$ nm arch/arm64/mm/cache.o ... U civac ... U cvac U cvap U cvau

This is due to the fact that the comparisons used to select the operation type in the dcache_by_line_op macro are comparing symbols not strings, and even though it seems that GAS is doing the right thing here (undefined symbols by the same name are equal to each other), it seems unwise to rely on this.

Second, when patching in a DC CVAP instruction on CPUs that support it, the fallback path consists of a DC CVAU instruction which may be affected by CPU errata that require ARM64_WORKAROUND_CLEAN_CACHE.

Solve these issues by unrolling the various maintenance routines and using the conditional directives that are documented as operating on strings. To avoid the complexity of nested alternatives, we move the DC CVAP patching to __clean_dcache_area_pop, falling back to a branch to __clean_dcache_area_poc if DCPOP is not supported by the CPU.

Reported-by: Ard Biesheuvel ard.biesheuvel@linaro.org Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Will Deacon will.deacon@arm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/arm64/include/asm/assembler.h | 30 ++++++++++++++++++------------ arch/arm64/mm/cache.S | 3 +++ 2 files changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/arm64/include/asm/assembler.h b/arch/arm64/include/asm/assembler.h index 66aea4aa455d..02d73d83f0de 100644 --- a/arch/arm64/include/asm/assembler.h +++ b/arch/arm64/include/asm/assembler.h @@ -373,27 +373,33 @@ alternative_endif * size: size of the region * Corrupts: kaddr, size, tmp1, tmp2 */ + .macro __dcache_op_workaround_clean_cache, op, kaddr +alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE + dc \op, \kaddr +alternative_else + dc civac, \kaddr +alternative_endif + .endm + .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2 dcache_line_size \tmp1, \tmp2 add \size, \kaddr, \size sub \tmp2, \tmp1, #1 bic \kaddr, \kaddr, \tmp2 9998: - .if (\op == cvau || \op == cvac) -alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE - dc \op, \kaddr -alternative_else - dc civac, \kaddr -alternative_endif - .elseif (\op == cvap) -alternative_if ARM64_HAS_DCPOP - sys 3, c7, c12, 1, \kaddr // dc cvap -alternative_else - dc cvac, \kaddr -alternative_endif + .ifc \op, cvau + __dcache_op_workaround_clean_cache \op, \kaddr + .else + .ifc \op, cvac + __dcache_op_workaround_clean_cache \op, \kaddr + .else + .ifc \op, cvap + sys 3, c7, c12, 1, \kaddr // dc cvap .else dc \op, \kaddr .endif + .endif + .endif add \kaddr, \kaddr, \tmp1 cmp \kaddr, \size b.lo 9998b diff --git a/arch/arm64/mm/cache.S b/arch/arm64/mm/cache.S index 91464e7f77cc..c1e8f3c6ffd5 100644 --- a/arch/arm64/mm/cache.S +++ b/arch/arm64/mm/cache.S @@ -181,6 +181,9 @@ ENDPROC(__dma_clean_area) * - size - size in question */ ENTRY(__clean_dcache_area_pop) + alternative_if_not ARM64_HAS_DCPOP + b __clean_dcache_area_poc + alternative_else_nop_endif dcache_by_line_op cvap, sy, x0, x1, x2, x3 ret ENDPIPROC(__clean_dcache_area_pop)

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 29/53] kconfig: fix file name and line number of warn_ignored_character()

From: Masahiro Yamada yamada.masahiro@socionext.com

[ Upstream commit 77c1c0fa8b1477c5799bdad65026ea5ff676da44 ]

Currently, warn_ignore_character() displays invalid file name and line number.

The lexer should use current_file->name and yylineno, while the parser should use zconf_curname() and zconf_lineno().

This difference comes from that the lexer is always going ahead of the parser. The parser needs to look ahead one token to make a shift/reduce decision, so the lexer is requested to scan more text from the input file.

This commit fixes the warning message from warn_ignored_character().

[Test Code]

----(Kconfig begin)---- / -----(Kconfig end)-----

[Output]

Before the fix:

<none>:0:warning: ignoring unsupported character '/'

After the fix:

Kconfig:1:warning: ignoring unsupported character '/'

Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/kconfig/zconf.l | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/kconfig/zconf.l b/scripts/kconfig/zconf.l index c410d257da06..6534dc5ac803 100644 --- a/scripts/kconfig/zconf.l +++ b/scripts/kconfig/zconf.l @@ -71,7 +71,7 @@ static void warn_ignored_character(char chr) { fprintf(stderr, "%s:%d:warning: ignoring unsupported character '%c'\n", - zconf_curname(), zconf_lineno(), chr); + current_file->name, yylineno, chr); } %}

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 30/53] kconfig: fix memory leak when EOF is encountered in quotation

From: Masahiro Yamada yamada.masahiro@socionext.com

[ Upstream commit fbac5977d81cb2b2b7e37b11c459055d9585273c ]

An unterminated string literal followed by new line is passed to the parser (with "multi-line strings not supported" warning shown), then handled properly there.

On the other hand, an unterminated string literal at end of file is never passed to the parser, then results in memory leak.

[Test Code]

----------(Kconfig begin)---------- source "Kconfig.inc"

config A bool "a" -----------(Kconfig end)-----------

--------(Kconfig.inc begin)-------- config B bool "b\No new line at end of file ---------(Kconfig.inc end)---------

[Summary from Valgrind]

Before the fix:

LEAK SUMMARY: definitely lost: 16 bytes in 1 blocks ...

After the fix:

LEAK SUMMARY: definitely lost: 0 bytes in 0 blocks ...

Eliminate the memory leak path by handling this case. Of course, such a Kconfig file is wrong already, so I will add an error message later.

Signed-off-by: Masahiro Yamada yamada.masahiro@socionext.com Signed-off-by: Sasha Levin sashal@kernel.org --- scripts/kconfig/zconf.l | 2 ++ 1 file changed, 2 insertions(+)

diff --git a/scripts/kconfig/zconf.l b/scripts/kconfig/zconf.l index 6534dc5ac803..0c7800112ff5 100644 --- a/scripts/kconfig/zconf.l +++ b/scripts/kconfig/zconf.l @@ -191,6 +191,8 @@ n [A-Za-z0-9_-] } <<EOF>> { BEGIN(INITIAL); + yylval.string = text; + return T_WORD_QUOTE; } }

-- 2.19.1

Sasha Levin

7:31 p.m.

New subject: [PATCH AUTOSEL 4.14 31/53] mmc: atmel-mci: do not assume idle after atmci_request_end

From: Jonas Danielsson jonas@orbital-systems.com

[ Upstream commit ae460c115b7aa50c9a36cf78fced07b27962c9d0 ]

On our AT91SAM9260 board we use the same sdio bus for wifi and for the sd card slot. This caused the atmel-mci to give the following splat on the serial console:

------------[ cut here ]------------ WARNING: CPU: 0 PID: 538 at drivers/mmc/host/atmel-mci.c:859 atmci_send_command+0x24/0x44 Modules linked in: CPU: 0 PID: 538 Comm: mmcqd/0 Not tainted 4.14.76 #14 Hardware name: Atmel AT91SAM9 [<c000fccc>] (unwind_backtrace) from [<c000d3dc>] (show_stack+0x10/0x14) [<c000d3dc>] (show_stack) from [<c0017644>] (__warn+0xd8/0xf4) [<c0017644>] (__warn) from [<c0017704>] (warn_slowpath_null+0x1c/0x24) [<c0017704>] (warn_slowpath_null) from [<c033bb9c>] (atmci_send_command+0x24/0x44) [<c033bb9c>] (atmci_send_command) from [<c033e984>] (atmci_start_request+0x1f4/0x2dc) [<c033e984>] (atmci_start_request) from [<c033f3b4>] (atmci_request+0xf0/0x164) [<c033f3b4>] (atmci_request) from [<c0327108>] (mmc_start_request+0x280/0x2d0) [<c0327108>] (mmc_start_request) from [<c032800c>] (mmc_start_areq+0x230/0x330) [<c032800c>] (mmc_start_areq) from [<c03366f8>] (mmc_blk_issue_rw_rq+0xc4/0x310) [<c03366f8>] (mmc_blk_issue_rw_rq) from [<c03372c4>] (mmc_blk_issue_rq+0x118/0x5ac) [<c03372c4>] (mmc_blk_issue_rq) from [<c033781c>] (mmc_queue_thread+0xc4/0x118) [<c033781c>] (mmc_queue_thread) from [<c002daf8>] (kthread+0x100/0x118) [<c002daf8>] (kthread) from [<c000a580>] (ret_from_fork+0x14/0x34) ---[ end trace 594371ddfa284bd6 ]---

This is: WARN_ON(host->cmd);

This was fixed on our board by letting atmci_request_end determine what state we are in. Instead of unconditionally setting it to STATE_IDLE on STATE_END_REQUEST.

Signed-off-by: Jonas Danielsson jonas@orbital-systems.com Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/mmc/host/atmel-mci.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/atmel-mci.c b/drivers/mmc/host/atmel-mci.c index 0a0ebf3a096d..c8a591d8a3d9 100644 --- a/drivers/mmc/host/atmel-mci.c +++ b/drivers/mmc/host/atmel-mci.c @@ -1954,13 +1954,14 @@ static void atmci_tasklet_func(unsigned long priv) }

atmci_request_end(host, host->mrq); - state = STATE_IDLE; + goto unlock; /* atmci_request_end() sets host->state */ break; } } while (state != prev_state);

host->state = state;

+unlock: spin_unlock(&host->lock); }

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 32/53] btrfs: improve error handling of btrfs_add_link

From: Johannes Thumshirn jthumshirn@suse.de

[ Upstream commit 1690dd41e0cb1dade80850ed8a3eb0121b96d22f ]

In the error handling block, err holds the return value of either btrfs_del_root_ref() or btrfs_del_inode_ref() but it hasn't been checked since it's introduction with commit fe66a05a0679 (Btrfs: improve error handling for btrfs_insert_dir_item callers) in 2012.

If the error handling in the error handling fails, there's not much left to do and the abort either happened earlier in the callees or is necessary here.

So if one of btrfs_del_root_ref() or btrfs_del_inode_ref() failed, abort the transaction, but still return the original code of the failure stored in 'ret' as this will be reported to the user.

Signed-off-by: Johannes Thumshirn jthumshirn@suse.de Reviewed-by: David Sterba dsterba@suse.com Signed-off-by: David Sterba dsterba@suse.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/btrfs/inode.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 1c340d6c8568..997ed98ce8cb 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -6582,14 +6582,19 @@ int btrfs_add_link(struct btrfs_trans_handle *trans, err = btrfs_del_root_ref(trans, fs_info, key.objectid, root->root_key.objectid, parent_ino, &local_index, name, name_len); - + if (err) + btrfs_abort_transaction(trans, err); } else if (add_backref) { u64 local_index; int err;

err = btrfs_del_inode_ref(trans, root, name, name_len, ino, parent_ino, &local_index); + if (err) + btrfs_abort_transaction(trans, err); } + + /* Return the original error code */ return ret; }

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 33/53] tty/serial: do not free trasnmit buffer page under port lock

From: Sergey Senozhatsky sergey.senozhatsky.work@gmail.com

[ Upstream commit d72402145ace0697a6a9e8e75a3de5bf3375f78d ]

LKP has hit yet another circular locking dependency between uart console drivers and debugobjects [1]:

CPU0 CPU1

rhltable_init() __init_work() debug_object_init uart_shutdown() /* db->lock */ /* uart_port->lock */ debug_print_object() free_page() printk() call_console_drivers() debug_check_no_obj_freed() /* uart_port->lock */ /* db->lock */ debug_print_object()

So there are two dependency chains: uart_port->lock -> db->lock And db->lock -> uart_port->lock

This particular circular locking dependency can be addressed in several ways:

a) One way would be to move debug_print_object() out of db->lock scope and, thus, break the db->lock -> uart_port->lock chain. b) Another one would be to free() transmit buffer page out of db->lock in UART code; which is what this patch does.

It makes sense to apply a) and b) independently: there are too many things going on behind free(), none of which depend on uart_port->lock.

The patch fixes transmit buffer page free() in uart_shutdown() and, additionally, in uart_port_startup() (as was suggested by Dmitry Safonov).

[1] https://lore.kernel.org/lkml/20181211091154.GL23332@shao2-debian/T/#u Signed-off-by: Sergey Senozhatsky sergey.senozhatsky@gmail.com Reviewed-by: Petr Mladek pmladek@suse.com Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Cc: Jiri Slaby jslaby@suse.com Cc: Andrew Morton akpm@linux-foundation.org Cc: Waiman Long longman@redhat.com Cc: Dmitry Safonov dima@arista.com Cc: Steven Rostedt rostedt@goodmis.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/tty/serial/serial_core.c | 22 ++++++++++++++++------ 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/serial_core.c b/drivers/tty/serial/serial_core.c index 6db8844ef3ec..543d0f95f094 100644 --- a/drivers/tty/serial/serial_core.c +++ b/drivers/tty/serial/serial_core.c @@ -218,10 +218,15 @@ static int uart_port_startup(struct tty_struct *tty, struct uart_state *state, if (!state->xmit.buf) { state->xmit.buf = (unsigned char *) page; uart_circ_clear(&state->xmit); + uart_port_unlock(uport, flags); } else { + uart_port_unlock(uport, flags); + /* + * Do not free() the page under the port lock, see + * uart_shutdown(). + */ free_page(page); } - uart_port_unlock(uport, flags);

retval = uport->ops->startup(uport); if (retval == 0) { @@ -281,6 +286,7 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) struct uart_port *uport = uart_port_check(state); struct tty_port *port = &state->port; unsigned long flags = 0; + char *xmit_buf = NULL;

/* * Set the TTY IO error marker @@ -311,14 +317,18 @@ static void uart_shutdown(struct tty_struct *tty, struct uart_state *state) tty_port_set_suspended(port, 0);

/* - * Free the transmit buffer page. + * Do not free() the transmit buffer page under the port lock since + * this can create various circular locking scenarios. For instance, + * console driver may need to allocate/free a debug object, which + * can endup in printk() recursion. */ uart_port_lock(state, flags); - if (state->xmit.buf) { - free_page((unsigned long)state->xmit.buf); - state->xmit.buf = NULL; - } + xmit_buf = state->xmit.buf; + state->xmit.buf = NULL; uart_port_unlock(uport, flags); + + if (xmit_buf) + free_page((unsigned long)xmit_buf); }

/**

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 34/53] perf intel-pt: Fix error with config term "pt=0"

From: Adrian Hunter adrian.hunter@intel.com

[ Upstream commit 1c6f709b9f96366cc47af23c05ecec9b8c0c392d ]

Users should never use 'pt=0', but if they do it may give a meaningless error:

$ perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u).

Fix that by forcing 'pt=1'.

Committer testing:

# perf record -e intel_pt/pt=0/u uname Error: The sys_perf_event_open() syscall returned with 22 (Invalid argument) for event (intel_pt/pt=0/u). /bin/dmesg | grep -i perf may provide additional information.

# perf record -e intel_pt/pt=0/u uname pt=0 doesn't make sense, forcing pt=1 Linux [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.020 MB perf.data ] #

Signed-off-by: Adrian Hunter adrian.hunter@intel.com Tested-by: Arnaldo Carvalho de Melo acme@redhat.com Cc: Jiri Olsa jolsa@redhat.com Link: http://lkml.kernel.org/r/b7c5b4e5-9497-10e5-fd43-5f3e4a0fe51d@intel.com Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/arch/x86/util/intel-pt.c | 11 +++++++++++ 1 file changed, 11 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c index db0ba8caf5a2..ba8ecaf52200 100644 --- a/tools/perf/arch/x86/util/intel-pt.c +++ b/tools/perf/arch/x86/util/intel-pt.c @@ -524,10 +524,21 @@ static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu, struct perf_evsel *evsel) { int err; + char c;

if (!evsel) return 0;

+ /* + * If supported, force pass-through config term (pt=1) even if user + * sets pt=0, which avoids senseless kernel errors. + */ + if (perf_pmu__scan_file(intel_pt_pmu, "format/pt", "%c", &c) == 1 && + !(evsel->attr.config & 1)) { + pr_warning("pt=0 doesn't make sense, forcing pt=1\n"); + evsel->attr.config |= 1; + } + err = intel_pt_val_config_term(intel_pt_pmu, "caps/cycle_thresholds", "cyc_thresh", "caps/psb_cyc", evsel->attr.config);

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 35/53] perf svghelper: Fix unchecked usage of strncpy()

From: Arnaldo Carvalho de Melo acme@redhat.com

[ Upstream commit 2f5302533f306d5ee87bd375aef9ca35b91762cb ]

The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it.

In this specific case this would only happen if fgets() was buggy, as its man page states that it should read one less byte than the size of the destination buffer, so that it can put the nul byte at the end of it, so it would never copy 255 non-nul chars, as fgets reads into the orig buffer at most 254 non-nul chars and terminates it. But lets just switch to strlcpy to keep the original intent and silence the gcc 8.2 warning.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

In function 'cpu_model', inlined from 'svg_cpu_box' at util/svghelper.c:378:2: util/svghelper.c:337:5: error: 'strncpy' output may be truncated copying 255 bytes from a string of length 255 [-Werror=stringop-truncation] strncpy(cpu_m, &buf[13], 255); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Cc: Arjan van de Ven arjan@linux.intel.com Fixes: f48d55ce7871 ("perf: Add a SVG helper library file") Link: https://lkml.kernel.org/n/tip-xzkoo0gyr56gej39ltivuh9g@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/svghelper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/svghelper.c b/tools/perf/util/svghelper.c index 1cbada2dc6be..f735ee038713 100644 --- a/tools/perf/util/svghelper.c +++ b/tools/perf/util/svghelper.c @@ -334,7 +334,7 @@ static char *cpu_model(void) if (file) { while (fgets(buf, 255, file)) { if (strstr(buf, "model name")) { - strncpy(cpu_m, &buf[13], 255); + strlcpy(cpu_m, &buf[13], 255); break; } }

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 36/53] perf parse-events: Fix unchecked usage of strncpy()

From: Arnaldo Carvalho de Melo acme@redhat.com

[ Upstream commit bd8d57fb7e25e9fcf67a9eef5fa13aabe2016e07 ]

The strncpy() function may leave the destination string buffer unterminated, better use strlcpy() that we have a __weak fallback implementation for systems without it.

This fixes this warning on an Alpine Linux Edge system with gcc 8.2:

util/parse-events.c: In function 'print_symbol_events': util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2508:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In function 'print_symbol_events.constprop', inlined from 'print_events' at util/parse-events.c:2511:2: util/parse-events.c:2465:4: error: 'strncpy' specified bound 100 equals destination size [-Werror=stringop-truncation] strncpy(name, syms->symbol, MAX_NAME_LEN); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors

Cc: Adrian Hunter adrian.hunter@intel.com Cc: Jiri Olsa jolsa@kernel.org Cc: Namhyung Kim namhyung@kernel.org Fixes: 947b4ad1d198 ("perf list: Fix max event string size") Link: https://lkml.kernel.org/n/tip-b663e33bm6x8hrkie4uxh7u2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/perf/util/parse-events.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c index 53f620472151..d0b92d374ba9 100644 --- a/tools/perf/util/parse-events.c +++ b/tools/perf/util/parse-events.c @@ -2300,7 +2300,7 @@ void print_symbol_events(const char *event_glob, unsigned type, if (!name_only && strlen(syms->alias)) snprintf(name, MAX_NAME_LEN, "%s OR %s", syms->symbol, syms->alias); else - strncpy(name, syms->symbol, MAX_NAME_LEN); + strlcpy(name, syms->symbol, MAX_NAME_LEN);

evt_list[evt_i] = strdup(name); if (evt_list[evt_i] == NULL)

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 37/53] netfilter: ipt_CLUSTERIP: check MAC address when duplicate config is set

From: Taehee Yoo ap420073@gmail.com

[ Upstream commit 06aa151ad1fc74a49b45336672515774a678d78d ]

If same destination IP address config is already existing, that config is just used. MAC address also should be same. However, there is no MAC address checking routine. So that MAC address checking routine is added.

test commands: %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \ -j CLUSTERIP --new --hashmode sourceip \ --clustermac 01:00:5e:00:00:20 --total-nodes 2 --local-node 1 %iptables -A INPUT -p tcp -i lo -d 192.168.0.5 --dport 80 \ -j CLUSTERIP --new --hashmode sourceip \ --clustermac 01:00:5e:00:00:21 --total-nodes 2 --local-node 1

After this patch, above commands are disallowed.

Signed-off-by: Taehee Yoo ap420073@gmail.com Signed-off-by: Pablo Neira Ayuso pablo@netfilter.org Signed-off-by: Sasha Levin sashal@kernel.org --- net/ipv4/netfilter/ipt_CLUSTERIP.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/netfilter/ipt_CLUSTERIP.c b/net/ipv4/netfilter/ipt_CLUSTERIP.c index cc7c9d67ac19..45f21489f515 100644 --- a/net/ipv4/netfilter/ipt_CLUSTERIP.c +++ b/net/ipv4/netfilter/ipt_CLUSTERIP.c @@ -492,7 +492,8 @@ static int clusterip_tg_check(const struct xt_tgchk_param *par) if (IS_ERR(config)) return PTR_ERR(config); } - } + } else if (memcmp(&config->clustermac, &cipinfo->clustermac, ETH_ALEN)) + return -EINVAL;

ret = nf_ct_netns_get(par->net, par->family); if (ret < 0) {

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 38/53] dm crypt: use u64 instead of sector_t to store iv_offset

From: AliOS system security alios_sys_security@linux.alibaba.com

[ Upstream commit 8d683dcd65c037efc9fb38c696ec9b65b306e573 ]

The iv_offset in the mapping table of crypt target is a 64bit number when IV algorithm is plain64, plain64be, essiv or benbi. It will be assigned to iv_offset of struct crypt_config, cc_sector of struct convert_context and iv_sector of struct dm_crypt_request. These structures members are defined as a sector_t. But sector_t is 32bit when CONFIG_LBDAF is not set in 32bit kernel. In this situation sector_t is not big enough to store the 64bit iv_offset.

Here is a reproducer. Prepare test image and device (loop is automatically allocated by cryptsetup):

# dd if=/dev/zero of=tst.img bs=1M count=1 # echo "tst"|cryptsetup open --type plain -c aes-xts-plain64 \ --skip 500000000000000000 tst.img test

On 32bit system (use IV offset value that overflows to 64bit; CONFIG_LBDAF if off) and device checksum is wrong:

# dmsetup table test --showkeys 0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 3551657984 7:0 0

# sha256sum /dev/mapper/test 533e25c09176632b3794f35303488c4a8f3f965dffffa6ec2df347c168cb6c19 /dev/mapper/test

On 64bit system (and on 32bit system with the patch), table and checksum is now correct:

# dmsetup table test --showkeys 0 2048 crypt aes-xts-plain64 dfa7cfe3c481f2239155739c42e539ae8f2d38f304dcc89d20b26f69daaf0933 500000000000000000 7:0 0

# sha256sum /dev/mapper/test 5d16160f9d5f8c33d8051e65fdb4f003cc31cd652b5abb08f03aa6fce0df75fc /dev/mapper/test

Signed-off-by: AliOS system security alios_sys_security@linux.alibaba.com Tested-and-Reviewed-by: Milan Broz gmazyland@gmail.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-crypt.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c index c60d29d09687..2652ef68d58d 100644 --- a/drivers/md/dm-crypt.c +++ b/drivers/md/dm-crypt.c @@ -49,7 +49,7 @@ struct convert_context { struct bio *bio_out; struct bvec_iter iter_in; struct bvec_iter iter_out; - sector_t cc_sector; + u64 cc_sector; atomic_t cc_pending; union { struct skcipher_request *req; @@ -81,7 +81,7 @@ struct dm_crypt_request { struct convert_context *ctx; struct scatterlist sg_in[4]; struct scatterlist sg_out[4]; - sector_t iv_sector; + u64 iv_sector; };

struct crypt_config; @@ -172,7 +172,7 @@ struct crypt_config { struct iv_lmk_private lmk; struct iv_tcw_private tcw; } iv_gen_private; - sector_t iv_offset; + u64 iv_offset; unsigned int iv_size; unsigned short int sector_size; unsigned char sector_shift;

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 39/53] dm kcopyd: Fix bug causing workqueue stalls

From: Nikos Tsironis ntsironis@arrikto.com

[ Upstream commit d7e6b8dfc7bcb3f4f3a18313581f67486a725b52 ]

When using kcopyd to run callbacks through dm_kcopyd_do_callback() or submitting copy jobs with a source size of 0, the jobs are pushed directly to the complete_jobs list, which could be under processing by the kcopyd thread. As a result, the kcopyd thread can continue running completed jobs indefinitely, without releasing the CPU, as long as someone keeps submitting new completed jobs through the aforementioned paths. Processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is thus stalled for excessive amounts of time, hurting performance.

Running the following test, from the device mapper test suite [1],

dmtest run --suite snapshot -n parallel_io_to_many_snaps_N

, with 8 active snapshots, we get, in dmesg, messages like the following:

[68899.948523] BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 95s! [68899.949282] Showing busy workqueues and worker pools: [68899.949288] workqueue events: flags=0x0 [68899.949295] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949306] pending: vmstat_shepherd, cache_reap [68899.949331] workqueue mm_percpu_wq: flags=0x8 [68899.949337] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949345] pending: vmstat_update [68899.949387] workqueue dm_bufio_cache: flags=0x8 [68899.949392] pwq 4: cpus=2 node=0 flags=0x0 nice=0 active=1/256 [68899.949400] pending: work_fn [dm_bufio] [68899.949423] workqueue kcopyd: flags=0x8 [68899.949429] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949437] pending: do_work [dm_mod] [68899.949452] workqueue kcopyd: flags=0x8 [68899.949458] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256 [68899.949466] in-flight: 13:do_work [dm_mod] [68899.949474] pending: do_work [dm_mod] [68899.949487] workqueue kcopyd: flags=0x8 [68899.949493] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949501] pending: do_work [dm_mod] [68899.949515] workqueue kcopyd: flags=0x8 [68899.949521] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949529] pending: do_work [dm_mod] [68899.949541] workqueue kcopyd: flags=0x8 [68899.949547] pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/256 [68899.949555] pending: do_work [dm_mod] [68899.949568] pool 0: cpus=0 node=0 flags=0x0 nice=0 hung=95s workers=4 idle: 27130 27223 1084

Fix this by splitting the complete_jobs list into two parts: A user facing part, named callback_jobs, and one used internally by kcopyd, retaining the name complete_jobs. dm_kcopyd_do_callback() and dispatch_job() now push their jobs to the callback_jobs list, which is spliced to the complete_jobs list once, every time the kcopyd thread wakes up. This prevents kcopyd from hogging the CPU indefinitely and causing workqueue stalls.

Re-running the aforementioned test:

* Workqueue stalls are eliminated * The maximum writing time among all targets is reduced from 09m37.10s to 06m04.85s and the total run time of the test is reduced from 10m43.591s to 7m19.199s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis ntsironis@arrikto.com Signed-off-by: Ilias Tsitsimpis iliastsi@arrikto.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-kcopyd.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c index d4b326914f06..b9d1897bcf5b 100644 --- a/drivers/md/dm-kcopyd.c +++ b/drivers/md/dm-kcopyd.c @@ -55,15 +55,17 @@ struct dm_kcopyd_client { struct dm_kcopyd_throttle *throttle;

/* - * We maintain three lists of jobs: + * We maintain four lists of jobs: * * i) jobs waiting for pages * ii) jobs that have pages, and are waiting for the io to be issued. - * iii) jobs that have completed. + * iii) jobs that don't need to do any IO and just run a callback + * iv) jobs that have completed. * - * All three of these are protected by job_lock. + * All four of these are protected by job_lock. */ spinlock_t job_lock; + struct list_head callback_jobs; struct list_head complete_jobs; struct list_head io_jobs; struct list_head pages_jobs; @@ -622,6 +624,7 @@ static void do_work(struct work_struct *work) struct dm_kcopyd_client *kc = container_of(work, struct dm_kcopyd_client, kcopyd_work); struct blk_plug plug; + unsigned long flags;

/* * The order that these are called is *very* important. @@ -630,6 +633,10 @@ static void do_work(struct work_struct *work) * list. io jobs call wake when they complete and it all * starts again. */ + spin_lock_irqsave(&kc->job_lock, flags); + list_splice_tail_init(&kc->callback_jobs, &kc->complete_jobs); + spin_unlock_irqrestore(&kc->job_lock, flags); + blk_start_plug(&plug); process_jobs(&kc->complete_jobs, kc, run_complete_job); process_jobs(&kc->pages_jobs, kc, run_pages_job); @@ -647,7 +654,7 @@ static void dispatch_job(struct kcopyd_job *job) struct dm_kcopyd_client *kc = job->kc; atomic_inc(&kc->nr_jobs); if (unlikely(!job->source.count)) - push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); else if (job->pages == &zero_page_list) push(&kc->io_jobs, job); else @@ -857,7 +864,7 @@ void dm_kcopyd_do_callback(void *j, int read_err, unsigned long write_err) job->read_err = read_err; job->write_err = write_err;

- push(&kc->complete_jobs, job); + push(&kc->callback_jobs, job); wake(kc); } EXPORT_SYMBOL(dm_kcopyd_do_callback); @@ -887,6 +894,7 @@ struct dm_kcopyd_client *dm_kcopyd_client_create(struct dm_kcopyd_throttle *thro return ERR_PTR(-ENOMEM);

spin_lock_init(&kc->job_lock); + INIT_LIST_HEAD(&kc->callback_jobs); INIT_LIST_HEAD(&kc->complete_jobs); INIT_LIST_HEAD(&kc->io_jobs); INIT_LIST_HEAD(&kc->pages_jobs); @@ -936,6 +944,7 @@ void dm_kcopyd_client_destroy(struct dm_kcopyd_client *kc) /* Wait for completion of all jobs submitted by this client. */ wait_event(kc->destroyq, !atomic_read(&kc->nr_jobs));

+ BUG_ON(!list_empty(&kc->callback_jobs)); BUG_ON(!list_empty(&kc->complete_jobs)); BUG_ON(!list_empty(&kc->io_jobs)); BUG_ON(!list_empty(&kc->pages_jobs));

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 40/53] tools lib subcmd: Don't add the kernel sources to the include path

From: Arnaldo Carvalho de Melo acme@redhat.com

[ Upstream commit ece9804985b57e1ccd83b1fb6288520955a29d51 ]

At some point we decided not to directly include kernel sources files when building tools/perf/, but when tools/lib/subcmd/ was forked from tools/perf it somehow ended up adding it via these two lines in its Makefile:

CFLAGS += -I$(srctree)/include/uapi CFLAGS += -I$(srctree)/include

As $(srctree) points to the kernel sources.

Removing those lines and keeping just:

CFLAGS += -I$(srctree)/tools/include/

Is enough to build tools/perf and tools/objtool.

This fixes the build when building from the sources in environments such as the Android NDK crossbuilding from a fedora:26 system:

subcmd-util.h:11:15: error: expected ',' or ';' before 'void' static inline void report(const char *prefix, const char *err, va_list params) ^ In file included from /git/perf/include/uapi/linux/stddef.h:2:0, from /git/perf/include/uapi/linux/posix_types.h:5, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h:36, from /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/unistd.h:33, from run-command.c:2: subcmd-util.h:18:17: error: '__no_instrument_function__' attribute applies only to functions

The /opt/android-ndk-r12b/platforms/android-24/arch-arm/usr/include/sys/types.h file that includes linux/posix_types.h ends up getting the one in the kernel sources causing the breakage. Fix it.

Test built tools/objtool/ too.

Reported-by: Jiri Olsa jolsa@kernel.org Tested-by: Jiri Olsa jolsa@kernel.org Cc: Adrian Hunter adrian.hunter@intel.com Cc: Josh Poimboeuf jpoimboe@redhat.com Cc: Namhyung Kim namhyung@kernel.org Fixes: 4b6ab94eabe4 ("perf subcmd: Create subcmd library") Link: https://lkml.kernel.org/n/tip-5lhaoecrj12t0bqwvpiu14sm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo acme@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/lib/subcmd/Makefile | 2 -- 1 file changed, 2 deletions(-)

diff --git a/tools/lib/subcmd/Makefile b/tools/lib/subcmd/Makefile index 95563b8e1ad7..ed61fb3a46c0 100644 --- a/tools/lib/subcmd/Makefile +++ b/tools/lib/subcmd/Makefile @@ -36,8 +36,6 @@ endif CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE

CFLAGS += -I$(srctree)/tools/include/ -CFLAGS += -I$(srctree)/include/uapi -CFLAGS += -I$(srctree)/include

SUBCMD_IN := $(OUTPUT)libsubcmd-in.o

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 41/53] dm snapshot: Fix excessive memory usage and workqueue stalls

From: Nikos Tsironis ntsironis@arrikto.com

[ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ]

kcopyd has no upper limit to the number of jobs one can allocate and issue. Under certain workloads this can lead to excessive memory usage and workqueue stalls. For example, when creating multiple dm-snapshot targets with a 4K chunk size and then writing to the origin through the page cache. Syncing the page cache causes a large number of BIOs to be issued to the dm-snapshot origin target, which itself issues an even larger (because of the BIO splitting taking place) number of kcopyd jobs.

Running the following test, from the device mapper test suite [1],

dmtest run --suite snapshot -n many_snapshots_of_same_volume_N

, with 8 active snapshots, results in the kcopyd job slab cache growing to 10G. Depending on the available system RAM this can lead to the OOM killer killing user processes:

[463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP), nodemask=(null), order=1, oom_score_adj=0 [463.492894] kthreadd cpuset=/ mems_allowed=0 [463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3 [463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 [463.492952] Call Trace: [463.492964] dump_stack+0x7d/0xbb [463.492973] dump_header+0x6b/0x2fc [463.492987] ? lockdep_hardirqs_on+0xee/0x190 [463.493012] oom_kill_process+0x302/0x370 [463.493021] out_of_memory+0x113/0x560 [463.493030] __alloc_pages_slowpath+0xf40/0x1020 [463.493055] __alloc_pages_nodemask+0x348/0x3c0 [463.493067] cache_grow_begin+0x81/0x8b0 [463.493072] ? cache_grow_begin+0x874/0x8b0 [463.493078] fallback_alloc+0x1e4/0x280 [463.493092] kmem_cache_alloc_node+0xd6/0x370 [463.493098] ? copy_process.part.31+0x1c5/0x20d0 [463.493105] copy_process.part.31+0x1c5/0x20d0 [463.493115] ? __lock_acquire+0x3cc/0x1550 [463.493121] ? __switch_to_asm+0x34/0x70 [463.493129] ? kthread_create_worker_on_cpu+0x70/0x70 [463.493135] ? finish_task_switch+0x90/0x280 [463.493165] _do_fork+0xe0/0x6d0 [463.493191] ? kthreadd+0x19f/0x220 [463.493233] kernel_thread+0x25/0x30 [463.493235] kthreadd+0x1bf/0x220 [463.493242] ? kthread_create_on_cpu+0x90/0x90 [463.493248] ret_from_fork+0x3a/0x50 [463.493279] Mem-Info: [463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0 [463.493285] active_file:80216 inactive_file:80107 isolated_file:435 [463.493285] unevictable:0 dirty:51266 writeback:109372 unstable:0 [463.493285] slab_reclaimable:31191 slab_unreclaimable:3483521 [463.493285] mapped:526 shmem:4903 pagetables:1759 bounce:0 [463.493285] free:33623 free_pcp:2392 free_cma:0 ... [463.493489] Unreclaimable slab info: [463.493513] Name Used Total [463.493522] bio-6 1028KB 1028KB [463.493525] bio-5 1028KB 1028KB [463.493528] dm_snap_pending_exception 236783KB 243789KB [463.493531] dm_exception 41KB 42KB [463.493534] bio-4 1216KB 1216KB [463.493537] bio-3 439396KB 439396KB [463.493539] kcopyd_job 6973427KB 6973427KB ... [463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child [463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB [463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Moreover, issuing a large number of kcopyd jobs results in kcopyd hogging the CPU, while processing them. As a result, processing of work items, queued for execution on the same CPU as the currently running kcopyd thread, is stalled for long periods of time, hurting performance. Running the aforementioned test we get, in dmesg, messages like the following:

[67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s! [67501.195586] Showing busy workqueues and worker pools: [67501.195591] workqueue events: flags=0x0 [67501.195597] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195611] pending: cache_reap [67501.195641] workqueue mm_percpu_wq: flags=0x8 [67501.195645] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195656] pending: vmstat_update [67501.195682] workqueue kblockd: flags=0x18 [67501.195687] pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256 [67501.195698] pending: blk_timeout_work [67501.195753] workqueue kcopyd: flags=0x8 [67501.195757] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195768] pending: do_work [dm_mod] [67501.195802] workqueue kcopyd: flags=0x8 [67501.195806] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195817] pending: do_work [dm_mod] [67501.195834] workqueue kcopyd: flags=0x8 [67501.195838] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195848] pending: do_work [dm_mod] [67501.195881] workqueue kcopyd: flags=0x8 [67501.195885] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256 [67501.195896] pending: do_work [dm_mod] [67501.195920] workqueue kcopyd: flags=0x8 [67501.195924] pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256 [67501.195935] in-flight: 67:do_work [dm_mod] [67501.195945] pending: do_work [dm_mod] [67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765

The root cause for these issues is the way dm-snapshot uses kcopyd. In particular, the lack of an explicit or implicit limit to the maximum number of in-flight COW jobs. The merging path is not affected because it implicitly limits the in-flight kcopyd jobs to one.

Fix these issues by using a semaphore to limit the maximum number of in-flight kcopyd jobs. We grab the semaphore before allocating a new kcopyd job in start_copy() and start_full_bio() and release it after the job finishes in copy_callback().

The initial semaphore value is configurable through a module parameter, to allow fine tuning the maximum number of in-flight COW jobs. Setting this parameter to zero initializes the semaphore to INT_MAX.

A default value of 2048 maximum in-flight kcopyd jobs was chosen. This value was decided experimentally as a trade-off between memory consumption, stalling the kernel's workqueues and maintaining a high enough throughput.

Re-running the aforementioned test:

* Workqueue stalls are eliminated * kcopyd's job slab cache uses a maximum of 130MB * The time taken by the test to write to the snapshot-origin target is reduced from 05m20.48s to 03m26.38s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis ntsironis@arrikto.com Signed-off-by: Ilias Tsitsimpis iliastsi@arrikto.com Signed-off-by: Mike Snitzer snitzer@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/md/dm-snap.c | 22 ++++++++++++++++++++++ 1 file changed, 22 insertions(+)

diff --git a/drivers/md/dm-snap.c b/drivers/md/dm-snap.c index a0613bd8ed00..b502debc6df3 100644 --- a/drivers/md/dm-snap.c +++ b/drivers/md/dm-snap.c @@ -19,6 +19,7 @@ #include <linux/vmalloc.h> #include <linux/log2.h> #include <linux/dm-kcopyd.h> +#include <linux/semaphore.h>

#include "dm.h"

@@ -105,6 +106,9 @@ struct dm_snapshot { /* The on disk metadata handler */ struct dm_exception_store *store;

+ /* Maximum number of in-flight COW jobs. */ + struct semaphore cow_count; + struct dm_kcopyd_client *kcopyd_client;

/* Wait for events based on state_bits */ @@ -145,6 +149,19 @@ struct dm_snapshot { #define RUNNING_MERGE 0 #define SHUTDOWN_MERGE 1

+/* + * Maximum number of chunks being copied on write. + * + * The value was decided experimentally as a trade-off between memory + * consumption, stalling the kernel's workqueues and maintaining a high enough + * throughput. + */ +#define DEFAULT_COW_THRESHOLD 2048 + +static int cow_threshold = DEFAULT_COW_THRESHOLD; +module_param_named(snapshot_cow_threshold, cow_threshold, int, 0644); +MODULE_PARM_DESC(snapshot_cow_threshold, "Maximum number of chunks being copied on write"); + DECLARE_DM_KCOPYD_THROTTLE_WITH_MODULE_PARM(snapshot_copy_throttle, "A percentage of time allocated for copy on write");

@@ -1189,6 +1206,8 @@ static int snapshot_ctr(struct dm_target *ti, unsigned int argc, char **argv) goto bad_hash_tables; }

+ sema_init(&s->cow_count, (cow_threshold > 0) ? cow_threshold : INT_MAX); + s->kcopyd_client = dm_kcopyd_client_create(&dm_kcopyd_throttle); if (IS_ERR(s->kcopyd_client)) { r = PTR_ERR(s->kcopyd_client); @@ -1560,6 +1579,7 @@ static void copy_callback(int read_err, unsigned long write_err, void *context) } list_add(&pe->out_of_order_entry, lh); } + up(&s->cow_count); }

/* @@ -1583,6 +1603,7 @@ static void start_copy(struct dm_snap_pending_exception *pe) dest.count = src.count;

/* Hand over to kcopyd */ + down(&s->cow_count); dm_kcopyd_copy(s->kcopyd_client, &src, 1, &dest, 0, copy_callback, pe); }

@@ -1602,6 +1623,7 @@ static void start_full_bio(struct dm_snap_pending_exception *pe, pe->full_bio = bio; pe->full_bio_end_io = bio->bi_end_io;

+ down(&s->cow_count); callback_data = dm_kcopyd_prepare_callback(s->kcopyd_client, copy_callback, pe);

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 42/53] quota: Lock s_umount in exclusive mode for Q_XQUOTA{ON, OFF} quotactls.

From: Javier Barrio javier.barrio.mart@gmail.com

[ Upstream commit 41c4f85cdac280d356df1f483000ecec4a8868be ]

Commit 1fa5efe3622db58cb8c7b9a50665e9eb9a6c7e97 (ext4: Use generic helpers for quotaon and quotaoff) made possible to call quotactl(Q_XQUOTAON/OFF) on ext4 filesystems with sysfile quota support. This leads to calling dquot_enable/disable without s_umount held in excl. mode, because quotactl_cmd_onoff checks only for Q_QUOTAON/OFF.

The following WARN_ON_ONCE triggers (in this case for dquot_enable, ext4, latest Linus' tree):

[ 117.807056] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: quota,prjquota

[...]

[ 155.036847] WARNING: CPU: 0 PID: 2343 at fs/quota/dquot.c:2469 dquot_enable+0x34/0xb9 [ 155.036851] Modules linked in: quota_v2 quota_tree ipv6 af_packet joydev mousedev psmouse serio_raw pcspkr i2c_piix4 intel_agp intel_gtt e1000 ttm drm_kms_helper drm agpgart fb_sys_fops syscopyarea sysfillrect sysimgblt i2c_core input_leds kvm_intel kvm irqbypass qemu_fw_cfg floppy evdev parport_pc parport button crc32c_generic dm_mod ata_generic pata_acpi ata_piix libata loop ext4 crc16 mbcache jbd2 usb_storage usbcore sd_mod scsi_mod [ 155.036901] CPU: 0 PID: 2343 Comm: qctl Not tainted 4.20.0-rc6-00025-gf5d582777bcb #9 [ 155.036903] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014 [ 155.036911] RIP: 0010:dquot_enable+0x34/0xb9 [ 155.036915] Code: 41 56 41 55 41 54 55 53 4c 8b 6f 28 74 02 0f 0b 4d 8d 7d 70 49 89 fc 89 cb 41 89 d6 89 f5 4c 89 ff e8 23 09 ea ff 85 c0 74 0a <0f> 0b 4c 89 ff e8 8b 09 ea ff 85 db 74 6a 41 8b b5 f8 00 00 00 0f [ 155.036918] RSP: 0018:ffffb09b00493e08 EFLAGS: 00010202 [ 155.036922] RAX: 0000000000000001 RBX: 0000000000000008 RCX: 0000000000000008 [ 155.036924] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff9781b67cd870 [ 155.036926] RBP: 0000000000000002 R08: 0000000000000000 R09: 61c8864680b583eb [ 155.036929] R10: ffffb09b00493e48 R11: ffffffffff7ce7d4 R12: ffff9781b7ee8d78 [ 155.036932] R13: ffff9781b67cd800 R14: 0000000000000004 R15: ffff9781b67cd870 [ 155.036936] FS: 00007fd813250b88(0000) GS:ffff9781ba000000(0000) knlGS:0000000000000000 [ 155.036939] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 155.036942] CR2: 00007fd812ff61d6 CR3: 000000007c882000 CR4: 00000000000006b0 [ 155.036951] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 155.036953] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 155.036955] Call Trace: [ 155.037004] dquot_quota_enable+0x8b/0xd0 [ 155.037011] kernel_quotactl+0x628/0x74e [ 155.037027] ? do_mprotect_pkey+0x2a6/0x2cd [ 155.037034] __x64_sys_quotactl+0x1a/0x1d [ 155.037041] do_syscall_64+0x55/0xe4 [ 155.037078] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 155.037105] RIP: 0033:0x7fd812fe1198 [ 155.037109] Code: 02 77 0d 48 89 c1 48 c1 e9 3f 75 04 48 8b 04 24 48 83 c4 50 5b c3 48 83 ec 08 49 89 ca 48 63 d2 48 63 ff b8 b3 00 00 00 0f 05 <48> 89 c7 e8 c1 eb ff ff 5a c3 48 63 ff b8 bb 00 00 00 0f 05 48 89 [ 155.037112] RSP: 002b:00007ffe8cd7b050 EFLAGS: 00000206 ORIG_RAX: 00000000000000b3 [ 155.037116] RAX: ffffffffffffffda RBX: 00007ffe8cd7b148 RCX: 00007fd812fe1198 [ 155.037119] RDX: 0000000000000000 RSI: 00007ffe8cd7cea9 RDI: 0000000000580102 [ 155.037121] RBP: 00007ffe8cd7b0f0 R08: 000055fc8eba8a9d R09: 0000000000000000 [ 155.037124] R10: 00007ffe8cd7b074 R11: 0000000000000206 R12: 00007ffe8cd7b168 [ 155.037126] R13: 000055fc8eba8897 R14: 0000000000000000 R15: 0000000000000000 [ 155.037131] ---[ end trace 210f864257175c51 ]---

and then the syscall proceeds without s_umount locking.

This patch locks the superblock ->s_umount sem. in exclusive mode for all Q_XQUOTAON/OFF quotactls too in addition to Q_QUOTAON/OFF.

AFAICT, other than ext4, only xfs and ocfs2 are affected by this change. The VFS will now call in xfs_quota_* functions with s_umount held, which wasn't the case before. This looks good to me but I can not say for sure. Ext4 and ocfs2 where already beeing called with s_umount exclusive via quota_quotaon/off which is basically the same.

Signed-off-by: Javier Barrio javier.barrio.mart@gmail.com Signed-off-by: Jan Kara jack@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- fs/quota/quota.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/quota/quota.c b/fs/quota/quota.c index 3f02bab0db4e..a89c1f05b22e 100644 --- a/fs/quota/quota.c +++ b/fs/quota/quota.c @@ -793,7 +793,8 @@ static int quotactl_cmd_write(int cmd) /* Return true if quotactl command is manipulating quota on/off state */ static bool quotactl_cmd_onoff(int cmd) { - return (cmd == Q_QUOTAON) || (cmd == Q_QUOTAOFF); + return (cmd == Q_QUOTAON) || (cmd == Q_QUOTAOFF) || + (cmd == Q_XQUOTAON) || (cmd == Q_XQUOTAOFF); }

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 43/53] clocksource/drivers/integrator-ap: Add missing of_node_put()

From: Yangtao Li tiny.windzz@gmail.com

[ Upstream commit 5eb73c831171115d3b4347e1e7124a5a35d8086c ]

The function of_find_node_by_path() acquires a reference to the node returned by it and that reference needs to be dropped by its caller.

integrator_ap_timer_init_of() doesn't do that. The pri_node and the sec_node are used as an identifier to compare against the current node, so we can directly drop the refcount after getting the node from the path as it is not used as pointer.

By dropping the refcount right after getting it, a single variable is needed instead of two.

Fix this by use a single variable and drop the refcount right after of_find_node_by_path().

Signed-off-by: Yangtao Li tiny.windzz@gmail.com Signed-off-by: Daniel Lezcano daniel.lezcano@linaro.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/clocksource/timer-integrator-ap.c | 25 +++++++++++++++-------- 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/clocksource/timer-integrator-ap.c b/drivers/clocksource/timer-integrator-ap.c index 62d24690ba02..9701107806a7 100644 --- a/drivers/clocksource/timer-integrator-ap.c +++ b/drivers/clocksource/timer-integrator-ap.c @@ -181,8 +181,7 @@ static int __init integrator_ap_timer_init_of(struct device_node *node) int irq; struct clk *clk; unsigned long rate; - struct device_node *pri_node; - struct device_node *sec_node; + struct device_node *alias_node;

base = of_io_request_and_map(node, 0, "integrator-timer"); if (IS_ERR(base)) @@ -204,7 +203,18 @@ static int __init integrator_ap_timer_init_of(struct device_node *node) return err; }

- pri_node = of_find_node_by_path(path); + alias_node = of_find_node_by_path(path); + + /* + * The pointer is used as an identifier not as a pointer, we + * can drop the refcount on the of__node immediately after + * getting it. + */ + of_node_put(alias_node); + + if (node == alias_node) + /* The primary timer lacks IRQ, use as clocksource */ + return integrator_clocksource_init(rate, base);

err = of_property_read_string(of_aliases, "arm,timer-secondary", &path); @@ -213,14 +223,11 @@ static int __init integrator_ap_timer_init_of(struct device_node *node) return err; }

+ alias_node = of_find_node_by_path(path);

- sec_node = of_find_node_by_path(path); - - if (node == pri_node) - /* The primary timer lacks IRQ, use as clocksource */ - return integrator_clocksource_init(rate, base); + of_node_put(alias_node);

- if (node == sec_node) { + if (node == alias_node) { /* The secondary timer will drive the clock event */ irq = irq_of_parse_and_map(node, 0); return integrator_clockevent_init(rate, base, irq);

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 44/53] ALSA: bebob: fix model-id of unit for Apogee Ensemble

From: Takashi Sakamoto o-takashi@sakamocchi.jp

[ Upstream commit 644b2e97405b0b74845e1d3c2b4fe4c34858062b ]

This commit fixes hard-coded model-id for an unit of Apogee Ensemble with a correct value. This unit uses DM1500 ASIC produced ArchWave AG (formerly known as BridgeCo AG).

I note that this model supports three modes in the number of data channels in tx/rx streams; 8 ch pairs, 10 ch pairs, 18 ch pairs. The mode is switched by Vendor-dependent AV/C command, like:

$ cd linux-firewire-utils $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0600000000 (8ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0601000000 (10ch pairs) $ ./firewire-request /dev/fw1 fcp 0x00ff000003dbeb0602000000 (18ch pairs)

When switching between different mode, the unit disappears from IEEE 1394 bus, then appears on the bus with different combination of stream formats. In a mode of 18 ch pairs, available sampling rate is up to 96.0 kHz, else up to 192.0 kHz.

$ ./hinawa-config-rom-printer /dev/fw1 { 'bus-info': { 'adj': False, 'bmc': True, 'chip_ID': 21474898341, 'cmc': True, 'cyc_clk_acc': 100, 'generation': 2, 'imc': True, 'isc': True, 'link_spd': 2, 'max_ROM': 1, 'max_rec': 512, 'name': '1394', 'node_vendor_ID': 987, 'pmc': False}, 'root-directory': [ ['HARDWARE_VERSION', 19], [ 'NODE_CAPABILITIES', { 'addressing': {'64': True, 'fix': True, 'prv': False}, 'misc': {'int': False, 'ms': False, 'spt': True}, 'state': { 'atn': False, 'ded': False, 'drq': True, 'elo': False, 'init': False, 'lst': True, 'off': False}, 'testing': {'bas': False, 'ext': False}}], ['VENDOR', 987], ['DESCRIPTOR', 'Apogee Electronics'], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble'], ['VERSION', 5297], [ 'UNIT', [ ['SPECIFIER_ID', 41005], ['VERSION', 65537], ['MODEL', 126702], ['DESCRIPTOR', 'Ensemble']]], [ 'DEPENDENT_INFO', [ ['SPECIFIER_ID', 2037], ['VERSION', 1], [(58, 'IMMEDIATE'), 16777159], [(59, 'IMMEDIATE'), 1048576], [(60, 'IMMEDIATE'), 16777159], [(61, 'IMMEDIATE'), 6291456]]]]}

Signed-off-by: Takashi Sakamoto o-takashi@sakamocchi.jp Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- sound/firewire/bebob/bebob.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/sound/firewire/bebob/bebob.c b/sound/firewire/bebob/bebob.c index 93676354f87f..de4af8a41ff0 100644 --- a/sound/firewire/bebob/bebob.c +++ b/sound/firewire/bebob/bebob.c @@ -434,7 +434,7 @@ static const struct ieee1394_device_id bebob_id_table[] = { /* Apogee Electronics, DA/AD/DD-16X (X-FireWire card) */ SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x00010048, &spec_normal), /* Apogee Electronics, Ensemble */ - SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x00001eee, &spec_normal), + SND_BEBOB_DEV_ENTRY(VEN_APOGEE, 0x01eeee, &spec_normal), /* ESI, Quatafire610 */ SND_BEBOB_DEV_ENTRY(VEN_ESI, 0x00010064, &spec_normal), /* AcousticReality, eARMasterOne */

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 45/53] sysfs: Disable lockdep for driver bind/unbind files

From: Daniel Vetter daniel.vetter@ffwll.ch

[ Upstream commit 4f4b374332ec0ae9c738ff8ec9bed5cd97ff9adc ]

This is the much more correct fix for my earlier attempt at:

https://lkml.org/lkml/2018/12/10/118

Short recap:

- There's not actually a locking issue, it's just lockdep being a bit too eager to complain about a possible deadlock.

- Contrary to what I claimed the real problem is recursion on kn->count. Greg pointed me at sysfs_break_active_protection(), used by the scsi subsystem to allow a sysfs file to unbind itself. That would be a real deadlock, which isn't what's happening here. Also, breaking the active protection means we'd need to manually handle all the lifetime fun.

- With Rafael we discussed the task_work approach, which kinda works, but has two downsides: It's a functional change for a lockdep annotation issue, and it won't work for the bind file (which needs to get the errno from the driver load function back to userspace).

- Greg also asked why this never showed up: To hit this you need to unregister a 2nd driver from the unload code of your first driver. I guess only gpus do that. The bug has always been there, but only with a recent patch series did we add more locks so that lockdep built a chain from unbinding the snd-hda driver to the acpi_video_unregister call.

Full lockdep splat:

[12301.898799] ============================================ [12301.898805] WARNING: possible recursive locking detected [12301.898811] 4.20.0-rc7+ #84 Not tainted [12301.898815] -------------------------------------------- [12301.898821] bash/5297 is trying to acquire lock: [12301.898826] 00000000f61c6093 (kn->count#39){++++}, at: kernfs_remove_by_name_ns+0x3b/0x80 [12301.898841] but task is already holding lock: [12301.898847] 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898856] other info that might help us debug this: [12301.898862] Possible unsafe locking scenario: [12301.898867] CPU0 [12301.898870] ---- [12301.898874] lock(kn->count#39); [12301.898879] lock(kn->count#39); [12301.898883] *** DEADLOCK *** [12301.898891] May be due to missing lock nesting notation [12301.898899] 5 locks held by bash/5297: [12301.898903] #0: 00000000cd800e54 (sb_writers#4){.+.+}, at: vfs_write+0x17f/0x1b0 [12301.898915] #1: 000000000465e7c2 (&of->mutex){+.+.}, at: kernfs_fop_write+0xd3/0x190 [12301.898925] #2: 000000005f634021 (kn->count#39){++++}, at: kernfs_fop_write+0xdc/0x190 [12301.898936] #3: 00000000414ef7ac (&dev->mutex){....}, at: device_release_driver_internal+0x34/0x240 [12301.898950] #4: 000000003218fbdf (register_count_mutex){+.+.}, at: acpi_video_unregister+0xe/0x40 [12301.898960] stack backtrace: [12301.898968] CPU: 1 PID: 5297 Comm: bash Not tainted 4.20.0-rc7+ #84 [12301.898974] Hardware name: Hewlett-Packard HP EliteBook 8460p/161C, BIOS 68SCF Ver. F.01 03/11/2011 [12301.898982] Call Trace: [12301.898989] dump_stack+0x67/0x9b [12301.898997] __lock_acquire+0x6ad/0x1410 [12301.899003] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899010] ? find_held_lock+0x2d/0x90 [12301.899017] ? mutex_spin_on_owner+0xe4/0x150 [12301.899023] ? find_held_lock+0x2d/0x90 [12301.899030] ? lock_acquire+0x90/0x180 [12301.899036] lock_acquire+0x90/0x180 [12301.899042] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899049] __kernfs_remove+0x296/0x310 [12301.899055] ? kernfs_remove_by_name_ns+0x3b/0x80 [12301.899060] ? kernfs_name_hash+0xd/0x80 [12301.899066] ? kernfs_find_ns+0x6c/0x100 [12301.899073] kernfs_remove_by_name_ns+0x3b/0x80 [12301.899080] bus_remove_driver+0x92/0xa0 [12301.899085] acpi_video_unregister+0x24/0x40 [12301.899127] i915_driver_unload+0x42/0x130 [i915] [12301.899160] i915_pci_remove+0x19/0x30 [i915] [12301.899169] pci_device_remove+0x36/0xb0 [12301.899176] device_release_driver_internal+0x185/0x240 [12301.899183] unbind_store+0xaf/0x180 [12301.899189] kernfs_fop_write+0x104/0x190 [12301.899195] __vfs_write+0x31/0x180 [12301.899203] ? rcu_read_lock_sched_held+0x6f/0x80 [12301.899209] ? rcu_sync_lockdep_assert+0x29/0x50 [12301.899216] ? __sb_start_write+0x13c/0x1a0 [12301.899221] ? vfs_write+0x17f/0x1b0 [12301.899227] vfs_write+0xb9/0x1b0 [12301.899233] ksys_write+0x50/0xc0 [12301.899239] do_syscall_64+0x4b/0x180 [12301.899247] entry_SYSCALL_64_after_hwframe+0x49/0xbe [12301.899253] RIP: 0033:0x7f452ac7f7a4 [12301.899259] Code: 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 80 00 00 00 00 8b 05 aa f0 2c 00 48 63 ff 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 f3 c3 66 90 55 53 48 89 d5 48 89 f3 48 83 [12301.899273] RSP: 002b:00007ffceafa6918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [12301.899282] RAX: ffffffffffffffda RBX: 000000000000000d RCX: 00007f452ac7f7a4 [12301.899288] RDX: 000000000000000d RSI: 00005612a1abf7c0 RDI: 0000000000000001 [12301.899295] RBP: 00005612a1abf7c0 R08: 000000000000000a R09: 00005612a1c46730 [12301.899301] R10: 000000000000000a R11: 0000000000000246 R12: 000000000000000d [12301.899308] R13: 0000000000000001 R14: 00007f452af4a740 R15: 000000000000000d

Looking around I've noticed that usb and i2c already handle similar recursion problems, where a sysfs file can unbind the same type of sysfs somewhere else in the hierarchy. Relevant commits are:

commit 356c05d58af05d582e634b54b40050c73609617b Author: Alan Stern stern@rowland.harvard.edu Date: Mon May 14 13:30:03 2012 -0400

sysfs: get rid of some lockdep false positives

commit e9b526fe704812364bca07edd15eadeba163ebfb Author: Alexander Sverdlin alexander.sverdlin@nsn.com Date: Fri May 17 14:56:35 2013 +0200

i2c: suppress lockdep warning on delete_device

Implement the same trick for driver bind/unbind.

v2: Put the macro into bus.c (Greg).

Reviewed-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Cc: Ramalingam C ramalingam.c@intel.com Cc: Arend van Spriel aspriel@gmail.com Cc: Andy Shevchenko andriy.shevchenko@linux.intel.com Cc: Geert Uytterhoeven geert+renesas@glider.be Cc: Bartosz Golaszewski brgl@bgdev.pl Cc: Heikki Krogerus heikki.krogerus@linux.intel.com Cc: Vivek Gautam vivek.gautam@codeaurora.org Cc: Joe Perches joe@perches.com Signed-off-by: Daniel Vetter daniel.vetter@intel.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/base/bus.c | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c index 22a64fd3309b..1cf1460f8c90 100644 --- a/drivers/base/bus.c +++ b/drivers/base/bus.c @@ -33,6 +33,9 @@ static struct kset *system_kset;

#define to_drv_attr(_attr) container_of(_attr, struct driver_attribute, attr)

+#define DRIVER_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store) \ + struct driver_attribute driver_attr_##_name = \ + __ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store)

static int __must_check bus_rescan_devices_helper(struct device *dev, void *data); @@ -197,7 +200,7 @@ static ssize_t unbind_store(struct device_driver *drv, const char *buf, bus_put(bus); return err; } -static DRIVER_ATTR_WO(unbind); +static DRIVER_ATTR_IGNORE_LOCKDEP(unbind, S_IWUSR, NULL, unbind_store);

/* * Manually attach a device to a driver. @@ -233,7 +236,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, bus_put(bus); return err; } -static DRIVER_ATTR_WO(bind); +static DRIVER_ATTR_IGNORE_LOCKDEP(bind, S_IWUSR, NULL, bind_store);

static ssize_t show_drivers_autoprobe(struct bus_type *bus, char *buf) {

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 46/53] IB/usnic: Fix potential deadlock

From: Parvi Kaustubhi pkaustub@cisco.com

[ Upstream commit 8036e90f92aae2784b855a0007ae2d8154d28b3c ]

Acquiring the rtnl lock while holding usdev_lock could result in a deadlock.

For example:

usnic_ib_query_port() | mutex_lock(&us_ibdev->usdev_lock) | ib_get_eth_speed() | rtnl_lock()

rtnl_lock() | usnic_ib_netdevice_event() | mutex_lock(&us_ibdev->usdev_lock)

This commit moves the usdev_lock acquisition after the rtnl lock has been released.

This is safe to do because usdev_lock is not protecting anything being accessed in ib_get_eth_speed(). Hence, the correct order of holding locks (rtnl -> usdev_lock) is not violated.

Signed-off-by: Parvi Kaustubhi pkaustub@cisco.com Signed-off-by: Jason Gunthorpe jgg@mellanox.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/infiniband/hw/usnic/usnic_ib_verbs.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c index 3c3453d213dc..fdfa25059723 100644 --- a/drivers/infiniband/hw/usnic/usnic_ib_verbs.c +++ b/drivers/infiniband/hw/usnic/usnic_ib_verbs.c @@ -310,13 +310,16 @@ int usnic_ib_query_port(struct ib_device *ibdev, u8 port,

usnic_dbg("\n");

- mutex_lock(&us_ibdev->usdev_lock); if (ib_get_eth_speed(ibdev, port, &props->active_speed, - &props->active_width)) { - mutex_unlock(&us_ibdev->usdev_lock); + &props->active_width)) return -EINVAL; - }

+ /* + * usdev_lock is acquired after (and not before) ib_get_eth_speed call + * because acquiring rtnl_lock in ib_get_eth_speed, while holding + * usdev_lock could lead to a deadlock. + */ + mutex_lock(&us_ibdev->usdev_lock); /* props being zeroed by the caller, avoid zeroing it here */

props->lid = 0;

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 47/53] scsi: smartpqi: correct lun reset issues

From: Kevin Barnett kevin.barnett@microsemi.com

[ Upstream commit 2ba55c9851d74eb015a554ef69ddf2ef061d5780 ]

Problem: The Linux kernel takes a logical volume offline after a LUN reset. This is generally accompanied by this message in the dmesg output:

Device offlined - not ready after error recovery

Root Cause: The root cause is a "quirk" in the timeout handling in the Linux SCSI layer. The Linux kernel places a 30-second timeout on most media access commands (reads and writes) that it send to device drivers. When a media access command times out, the Linux kernel goes into error recovery mode for the LUN that was the target of the command that timed out. Every command that timed out is kept on a list inside of the Linux kernel to be retried later. The kernel attempts to recover the command(s) that timed out by issuing a LUN reset followed by a TEST UNIT READY. If the LUN reset and TEST UNIT READY commands are successful, the kernel retries the command(s) that timed out.

Each SCSI command issued by the kernel has a result field associated with it. This field indicates the final result of the command (success or error). When a command times out, the kernel places a value in this result field indicating that the command timed out.

The "quirk" is that after the LUN reset and TEST UNIT READY commands are completed, the kernel checks each command on the timed-out command list before retrying it. If the result field is still "timed out", the kernel treats that command as not having been successfully recovered for a retry. If the number of commands that are in this state are greater than two, the kernel takes the LUN offline.

Fix: When our RAIDStack receives a LUN reset, it simply waits until all outstanding commands complete. Generally, all of these outstanding commands complete successfully. Therefore, the fix in the smartpqi driver is to always set the command result field to indicate success when a request completes successfully. This normally isn’t necessary because the result field is always initialized to success when the command is submitted to the driver. So when the command completes successfully, the result field is left untouched. But in this case, the kernel changes the result field behind the driver’s back and then expects the field to be changed by the driver as the commands that timed-out complete.

Reviewed-by: Dave Carroll david.carroll@microsemi.com Reviewed-by: Scott Teel scott.teel@microsemi.com Signed-off-by: Kevin Barnett kevin.barnett@microsemi.com Signed-off-by: Don Brace don.brace@microsemi.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/smartpqi/smartpqi_init.c | 3 +++ 1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index 83bdbd84eb01..b662f58203ac 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -2709,6 +2709,9 @@ static unsigned int pqi_process_io_intr(struct pqi_ctrl_info *ctrl_info, switch (response->header.iu_type) { case PQI_RESPONSE_IU_RAID_PATH_IO_SUCCESS: case PQI_RESPONSE_IU_AIO_PATH_IO_SUCCESS: + if (io_request->scmd) + io_request->scmd->result = 0; + /* fall through */ case PQI_RESPONSE_IU_GENERAL_MANAGEMENT: break; case PQI_RESPONSE_IU_TASK_MANAGEMENT:

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 48/53] scsi: smartpqi: call pqi_free_interrupts() in pqi_shutdown()

From: Yanjiang Jin yanjiang.jin@hxt-semitech.com

[ Upstream commit e57b2945aa654e48f85a41e8917793c64ecb9de8 ]

We must free all irqs during shutdown, else kexec's 2nd kernel would hang in pqi_wait_for_completion_io() as below:

Call trace:

pqi_wait_for_completion_io pqi_submit_raid_request_synchronous.constprop.78+0x23c/0x310 [smartpqi] pqi_configure_events+0xec/0x1f8 [smartpqi] pqi_ctrl_init+0x814/0xca0 [smartpqi] pqi_pci_probe+0x400/0x46c [smartpqi] local_pci_probe+0x48/0xb0 pci_device_probe+0x14c/0x1b0 really_probe+0x218/0x3fc driver_probe_device+0x70/0x140 __driver_attach+0x11c/0x134 bus_for_each_dev+0x70/0xc8 driver_attach+0x30/0x38 bus_add_driver+0x1f0/0x294 driver_register+0x74/0x12c __pci_register_driver+0x64/0x70 pqi_init+0xd0/0x10000 [smartpqi] do_one_initcall+0x60/0x1d8 do_init_module+0x64/0x1f8 load_module+0x10ec/0x1350 __se_sys_finit_module+0xd4/0x100 __arm64_sys_finit_module+0x28/0x34 el0_svc_handler+0x104/0x160 el0_svc+0x8/0xc

This happens only in the following combinations:

1. smartpqi is built as module, not built-in; 2. We have a disk connected to smartpqi card; 3. Both kexec's 1st and 2nd kernels use this disk as Rootfs' mount point.

Signed-off-by: Yanjiang Jin yanjiang.jin@hxt-semitech.com Acked-by: Don Brace don.brace@microsemi.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/smartpqi/smartpqi_init.c | 1 + 1 file changed, 1 insertion(+)

diff --git a/drivers/scsi/smartpqi/smartpqi_init.c b/drivers/scsi/smartpqi/smartpqi_init.c index b662f58203ac..bc15999f1c7c 100644 --- a/drivers/scsi/smartpqi/smartpqi_init.c +++ b/drivers/scsi/smartpqi/smartpqi_init.c @@ -6703,6 +6703,7 @@ static void pqi_shutdown(struct pci_dev *pci_dev) * storage. */ rc = pqi_flush_cache(ctrl_info, SHUTDOWN); + pqi_free_interrupts(ctrl_info); pqi_reset(ctrl_info); if (rc == 0) return;

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 49/53] scsi: megaraid: fix out-of-bound array accesses

From: Qian Cai cai@lca.pw

[ Upstream commit c7a082e4242fd8cd21a441071e622f87c16bdacc ]

UBSAN reported those with MegaRAID SAS-3 3108,

[ 77.467308] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 77.475402] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 77.481677] CPU: 16 PID: 333 Comm: kworker/16:1 Not tainted 4.20.0-rc5+ #1 [ 77.488556] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 77.495791] Workqueue: events work_for_cpu_fn [ 77.500154] Call trace: [ 77.502610] dump_backtrace+0x0/0x2c8 [ 77.506279] show_stack+0x24/0x30 [ 77.509604] dump_stack+0x118/0x19c [ 77.513098] ubsan_epilogue+0x14/0x60 [ 77.516765] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 77.521767] mr_update_load_balance_params+0x150/0x158 [megaraid_sas] [ 77.528230] MR_ValidateMapInfo+0x2cc/0x10d0 [megaraid_sas] [ 77.533825] megasas_get_map_info+0x244/0x2f0 [megaraid_sas] [ 77.539505] megasas_init_adapter_fusion+0x9b0/0xf48 [megaraid_sas] [ 77.545794] megasas_init_fw+0x1ab4/0x3518 [megaraid_sas] [ 77.551212] megasas_probe_one+0x2c4/0xbe0 [megaraid_sas] [ 77.556614] local_pci_probe+0x7c/0xf0 [ 77.560365] work_for_cpu_fn+0x34/0x50 [ 77.564118] process_one_work+0x61c/0xf08 [ 77.568129] worker_thread+0x534/0xa70 [ 77.571882] kthread+0x1c8/0x1d0 [ 77.575114] ret_from_fork+0x10/0x1c

[ 89.240332] UBSAN: Undefined behaviour in drivers/scsi/megaraid/megaraid_sas_fp.c:117:32 [ 89.248426] index 255 is out of range for type 'MR_LD_SPAN_MAP [1]' [ 89.254700] CPU: 16 PID: 95 Comm: kworker/u130:0 Not tainted 4.20.0-rc5+ #1 [ 89.261665] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.50 06/01/2018 [ 89.268903] Workqueue: events_unbound async_run_entry_fn [ 89.274222] Call trace: [ 89.276680] dump_backtrace+0x0/0x2c8 [ 89.280348] show_stack+0x24/0x30 [ 89.283671] dump_stack+0x118/0x19c [ 89.287167] ubsan_epilogue+0x14/0x60 [ 89.290835] __ubsan_handle_out_of_bounds+0xfc/0x13c [ 89.295828] MR_LdRaidGet+0x50/0x58 [megaraid_sas] [ 89.300638] megasas_build_io_fusion+0xbb8/0xd90 [megaraid_sas] [ 89.306576] megasas_build_and_issue_cmd_fusion+0x138/0x460 [megaraid_sas] [ 89.313468] megasas_queue_command+0x398/0x3d0 [megaraid_sas] [ 89.319222] scsi_dispatch_cmd+0x1dc/0x8a8 [ 89.323321] scsi_request_fn+0x8e8/0xdd0 [ 89.327249] __blk_run_queue+0xc4/0x158 [ 89.331090] blk_execute_rq_nowait+0xf4/0x158 [ 89.335449] blk_execute_rq+0xdc/0x158 [ 89.339202] __scsi_execute+0x130/0x258 [ 89.343041] scsi_probe_and_add_lun+0x2fc/0x1488 [ 89.347661] __scsi_scan_target+0x1cc/0x8c8 [ 89.351848] scsi_scan_channel.part.3+0x8c/0xc0 [ 89.356382] scsi_scan_host_selected+0x130/0x1f0 [ 89.361002] do_scsi_scan_host+0xd8/0xf0 [ 89.364927] do_scan_async+0x9c/0x320 [ 89.368594] async_run_entry_fn+0x138/0x420 [ 89.372780] process_one_work+0x61c/0xf08 [ 89.376793] worker_thread+0x13c/0xa70 [ 89.380546] kthread+0x1c8/0x1d0 [ 89.383778] ret_from_fork+0x10/0x1c

This is because when populating Driver Map using firmware raid map, all non-existing VDs set their ldTgtIdToLd to 0xff, so it can be skipped later.

...

From drivers/scsi/megaraid/megaraid_sas_base.c ,

memset(instance->ld_ids, 0xff, MEGASAS_MAX_LD_IDS);

...

From drivers/scsi/megaraid/megaraid_sas_fp.c ,

/* For non existing VDs, iterate to next VD*/ if (ld >= (MAX_LOGICAL_DRIVES_EXT - 1)) continue;

However, there are a few places that failed to skip those non-existing VDs due to off-by-one errors. Then, those 0xff leaked into MR_LdRaidGet(0xff, map) and triggered the out-of-bound accesses.

Fixes: 51087a8617fe ("megaraid_sas : Extended VD support") Signed-off-by: Qian Cai cai@lca.pw Acked-by: Sumit Saxena sumit.saxena@broadcom.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/megaraid/megaraid_sas_fp.c | 2 +- drivers/scsi/megaraid/megaraid_sas_fusion.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_fp.c b/drivers/scsi/megaraid/megaraid_sas_fp.c index f2ffde430ec1..9a2a62e39e4c 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fp.c +++ b/drivers/scsi/megaraid/megaraid_sas_fp.c @@ -1266,7 +1266,7 @@ void mr_update_load_balance_params(struct MR_DRV_RAID_MAP_ALL *drv_map,

for (ldCount = 0; ldCount < MAX_LOGICAL_DRIVES_EXT; ldCount++) { ld = MR_TargetIdToLdGet(ldCount, drv_map); - if (ld >= MAX_LOGICAL_DRIVES_EXT) { + if (ld >= MAX_LOGICAL_DRIVES_EXT - 1) { lbInfo[ldCount].loadBalanceFlag = 0; continue; } diff --git a/drivers/scsi/megaraid/megaraid_sas_fusion.c b/drivers/scsi/megaraid/megaraid_sas_fusion.c index 06a2e3d9fc5b..7be2b9e11332 100644 --- a/drivers/scsi/megaraid/megaraid_sas_fusion.c +++ b/drivers/scsi/megaraid/megaraid_sas_fusion.c @@ -2529,7 +2529,7 @@ static void megasas_build_ld_nonrw_fusion(struct megasas_instance *instance, device_id < instance->fw_supported_vd_count)) {

ld = MR_TargetIdToLdGet(device_id, local_map_ptr); - if (ld >= instance->fw_supported_vd_count) + if (ld >= instance->fw_supported_vd_count - 1) fp_possible = 0; else { raid = MR_LdRaidGet(ld, local_map_ptr);

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 50/53] ocfs2: fix panic due to unrecovered local alloc

From: Junxiao Bi junxiao.bi@oracle.com

[ Upstream commit 532e1e54c8140188e192348c790317921cb2dc1c ]

mount.ocfs2 ignore the inconsistent error that journal is clean but local alloc is unrecovered. After mount, local alloc not empty, then reserver cluster didn't alloc a new local alloc window, reserveration map is empty(ocfs2_reservation_map.m_bitmap_len = 0), that triggered the following panic.

This issue was reported at

https://oss.oracle.com/pipermail/ocfs2-devel/2015-May/010854.html

and was advised to fixed during mount. But this is a very unusual inconsistent state, usually journal dirty flag should be cleared at the last stage of umount until every other things go right. We may need do further debug to check that. Any way to avoid possible futher corruption, mount should be abort and fsck should be run.

(mount.ocfs2,1765,1):ocfs2_load_local_alloc:353 ERROR: Local alloc hasn't been recovered! found = 6518, set = 6518, taken = 8192, off = 15912372 ocfs2: Mounting device (202,64) on (node 0, slot 3) with ordered data mode. o2dlm: Joining domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 8 ) 8 nodes ocfs2: Mounting device (202,80) on (node 0, slot 3) with ordered data mode. o2hb: Region 89CEAC63CC4F4D03AC185B44E0EE0F3F (xvdf) is now a quorum device o2net: Accepted connection from node yvwsoa17p (num 7) at 172.22.77.88:7777 o2dlm: Node 7 joins domain 64FE421C8C984E6D96ED12C55FEE2435 ( 0 1 2 3 4 5 6 7 8 ) 9 nodes o2dlm: Node 7 joins domain 89CEAC63CC4F4D03AC185B44E0EE0F3F ( 0 1 2 3 4 5 6 7 8 ) 9 nodes ------------[ cut here ]------------ kernel BUG at fs/ocfs2/reservations.c:507! invalid opcode: 0000 [#1] SMP Modules linked in: ocfs2 rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs fscache lockd grace ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs sunrpc ipt_REJECT nf_reject_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 ovmapi ppdev parport_pc parport xen_netfront fb_sys_fops sysimgblt sysfillrect syscopyarea acpi_cpufreq pcspkr i2c_piix4 i2c_core sg ext4 jbd2 mbcache2 sr_mod cdrom xen_blkfront pata_acpi ata_generic ata_piix floppy dm_mirror dm_region_hash dm_log dm_mod CPU: 0 PID: 4349 Comm: startWebLogic.s Not tainted 4.1.12-124.19.2.el6uek.x86_64 #2 Hardware name: Xen HVM domU, BIOS 4.4.4OVM 09/06/2018 task: ffff8803fb04e200 ti: ffff8800ea4d8000 task.ti: ffff8800ea4d8000 RIP: 0010:[<ffffffffa05e96a8>] [<ffffffffa05e96a8>] __ocfs2_resv_find_window+0x498/0x760 [ocfs2] Call Trace: ocfs2_resmap_resv_bits+0x10d/0x400 [ocfs2] ocfs2_claim_local_alloc_bits+0xd0/0x640 [ocfs2] __ocfs2_claim_clusters+0x178/0x360 [ocfs2] ocfs2_claim_clusters+0x1f/0x30 [ocfs2] ocfs2_convert_inline_data_to_extents+0x634/0xa60 [ocfs2] ocfs2_write_begin_nolock+0x1c6/0x1da0 [ocfs2] ocfs2_write_begin+0x13e/0x230 [ocfs2] generic_perform_write+0xbf/0x1c0 __generic_file_write_iter+0x19c/0x1d0 ocfs2_file_write_iter+0x589/0x1360 [ocfs2] __vfs_write+0xb8/0x110 vfs_write+0xa9/0x1b0 SyS_write+0x46/0xb0 system_call_fastpath+0x18/0xd7 Code: ff ff 8b 75 b8 39 75 b0 8b 45 c8 89 45 98 0f 84 e5 fe ff ff 45 8b 74 24 18 41 8b 54 24 1c e9 56 fc ff ff 85 c0 0f 85 48 ff ff ff <0f> 0b 48 8b 05 cf c3 de ff 48 ba 00 00 00 00 00 00 00 10 48 85 RIP __ocfs2_resv_find_window+0x498/0x760 [ocfs2] RSP <ffff8800ea4db668> ---[ end trace 566f07529f2edf3c ]--- Kernel panic - not syncing: Fatal exception Kernel Offset: disabled

Link: http://lkml.kernel.org/r/20181121020023.3034-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi junxiao.bi@oracle.com Reviewed-by: Yiwen Jiang jiangyiwen@huawei.com Acked-by: Joseph Qi jiangqi903@gmail.com Cc: Jun Piao piaojun@huawei.com Cc: Mark Fasheh mfasheh@versity.com Cc: Joel Becker jlbec@evilplan.org Cc: Changwei Ge ge.changwei@h3c.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ocfs2/localalloc.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/fs/ocfs2/localalloc.c b/fs/ocfs2/localalloc.c index fe0d1f9571bb..5d53d0d63d19 100644 --- a/fs/ocfs2/localalloc.c +++ b/fs/ocfs2/localalloc.c @@ -345,13 +345,18 @@ int ocfs2_load_local_alloc(struct ocfs2_super *osb) if (num_used || alloc->id1.bitmap1.i_used || alloc->id1.bitmap1.i_total - || la->la_bm_off) - mlog(ML_ERROR, "Local alloc hasn't been recovered!\n" + || la->la_bm_off) { + mlog(ML_ERROR, "inconsistent detected, clean journal with" + " unrecovered local alloc, please run fsck.ocfs2!\n" "found = %u, set = %u, taken = %u, off = %u\n", num_used, le32_to_cpu(alloc->id1.bitmap1.i_used), le32_to_cpu(alloc->id1.bitmap1.i_total), OCFS2_LOCAL_ALLOC(alloc)->la_bm_off);

+ status = -EINVAL; + goto bail; + } + osb->local_alloc_bh = alloc_bh; osb->local_alloc_state = OCFS2_LA_ENABLED;

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 51/53] mm/page-writeback.c: don't break integrity writeback on ->writepage() error

From: Brian Foster bfoster@redhat.com

[ Upstream commit 3fa750dcf29e8606e3969d13d8e188cc1c0f511d ]

write_cache_pages() is used in both background and integrity writeback scenarios by various filesystems. Background writeback is mostly concerned with cleaning a certain number of dirty pages based on various mm heuristics. It may not write the full set of dirty pages or wait for I/O to complete. Integrity writeback is responsible for persisting a set of dirty pages before the writeback job completes. For example, an fsync() call must perform integrity writeback to ensure data is on disk before the call returns.

write_cache_pages() unconditionally breaks out of its processing loop in the event of a ->writepage() error. This is fine for background writeback, which had no strict requirements and will eventually come around again. This can cause problems for integrity writeback on filesystems that might need to clean up state associated with failed page writeouts. For example, XFS performs internal delayed allocation accounting before returning a ->writepage() error, where applicable. If the current writeback happens to be associated with an unmount and write_cache_pages() completes the writeback prematurely due to error, the filesystem is unmounted in an inconsistent state if dirty+delalloc pages still exist.

To handle this problem, update write_cache_pages() to always process the full set of pages for integrity writeback regardless of ->writepage() errors. Save the first encountered error and return it to the caller once complete. This facilitates XFS (or any other fs that expects integrity writeback to process the entire set of dirty pages) to clean up its internal state completely in the event of persistent mapping errors. Background writeback continues to exit on the first error encountered.

[akpm@linux-foundation.org: fix typo in comment] Link: http://lkml.kernel.org/r/20181116134304.32440-1-bfoster@redhat.com Signed-off-by: Brian Foster bfoster@redhat.com Reviewed-by: Jan Kara jack@suse.cz Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- mm/page-writeback.c | 35 +++++++++++++++++++++-------------- 1 file changed, 21 insertions(+), 14 deletions(-)

diff --git a/mm/page-writeback.c b/mm/page-writeback.c index 3175ac850a53..e001de5ac50c 100644 --- a/mm/page-writeback.c +++ b/mm/page-writeback.c @@ -2157,6 +2157,7 @@ int write_cache_pages(struct address_space *mapping, { int ret = 0; int done = 0; + int error; struct pagevec pvec; int nr_pages; pgoff_t uninitialized_var(writeback_index); @@ -2253,25 +2254,31 @@ int write_cache_pages(struct address_space *mapping, goto continue_unlock;

trace_wbc_writepage(wbc, inode_to_bdi(mapping->host)); - ret = (*writepage)(page, wbc, data); - if (unlikely(ret)) { - if (ret == AOP_WRITEPAGE_ACTIVATE) { + error = (*writepage)(page, wbc, data); + if (unlikely(error)) { + /* + * Handle errors according to the type of + * writeback. There's no need to continue for + * background writeback. Just push done_index + * past this page so media errors won't choke + * writeout for the entire file. For integrity + * writeback, we must process the entire dirty + * set regardless of errors because the fs may + * still have state to clear for each page. In + * that case we continue processing and return + * the first error. + */ + if (error == AOP_WRITEPAGE_ACTIVATE) { unlock_page(page); - ret = 0; - } else { - /* - * done_index is set past this page, - * so media errors will not choke - * background writeout for the entire - * file. This has consequences for - * range_cyclic semantics (ie. it may - * not be suitable for data integrity - * writeout). - */ + error = 0; + } else if (wbc->sync_mode != WB_SYNC_ALL) { + ret = error; done_index = page->index + 1; done = 1; break; } + if (!ret) + ret = error; }

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 52/53] mm/swap: use nr_node_ids for avail_lists in swap_info_struct

From: Aaron Lu aaron.lu@intel.com

[ Upstream commit 66f71da9dd38af17dc17209cdde7987d4679a699 ]

Since a2468cc9bfdf ("swap: choose swap device according to numa node"), avail_lists field of swap_info_struct is changed to an array with MAX_NUMNODES elements. This made swap_info_struct size increased to 40KiB and needs an order-4 page to hold it.

This is not optimal in that: 1 Most systems have way less than MAX_NUMNODES(1024) nodes so it is a waste of memory; 2 It could cause swapon failure if the swap device is swapped on after system has been running for a while, due to no order-4 page is available as pointed out by Vasily Averin.

Solve the above two issues by using nr_node_ids(which is the actual possible node number the running system has) for avail_lists instead of MAX_NUMNODES.

nr_node_ids is unknown at compile time so can't be directly used when declaring this array. What I did here is to declare avail_lists as zero element array and allocate space for it when allocating space for swap_info_struct. The reason why keep using array but not pointer is plist_for_each_entry needs the field to be part of the struct, so pointer will not work.

This patch is on top of Vasily Averin's fix commit. I think the use of kvzalloc for swap_info_struct is still needed in case nr_node_ids is really big on some systems.

Link: http://lkml.kernel.org/r/20181115083847.GA11129@intel.com Signed-off-by: Aaron Lu aaron.lu@intel.com Reviewed-by: Andrew Morton akpm@linux-foundation.org Acked-by: Michal Hocko mhocko@suse.com Cc: Vasily Averin vvs@virtuozzo.com Cc: Huang Ying ying.huang@intel.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/swap.h | 11 ++++++++++- mm/swapfile.c | 3 ++- 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h index f02fb5db8914..4fd1ab9565ba 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -231,7 +231,6 @@ struct swap_info_struct { unsigned long flags; /* SWP_USED etc: see above */ signed short prio; /* swap priority of this type */ struct plist_node list; /* entry in swap_active_head */ - struct plist_node avail_lists[MAX_NUMNODES];/* entry in swap_avail_heads */ signed char type; /* strange name for an index */ unsigned int max; /* extent of the swap_map */ unsigned char *swap_map; /* vmalloc'ed array of usage counts */ @@ -272,6 +271,16 @@ struct swap_info_struct { */ struct work_struct discard_work; /* discard worker */ struct swap_cluster_list discard_clusters; /* discard clusters list */ + struct plist_node avail_lists[0]; /* + * entries in swap_avail_heads, one + * entry per node. + * Must be last as the number of the + * array is nr_node_ids, which is not + * a fixed value so have to allocate + * dynamically. + * And it has to be an array so that + * plist_for_each_* can work. + */ };

#ifdef CONFIG_64BIT diff --git a/mm/swapfile.c b/mm/swapfile.c index 08e8cd21770c..d687dc3c7050 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -2829,8 +2829,9 @@ static struct swap_info_struct *alloc_swap_info(void) struct swap_info_struct *p; unsigned int type; int i; + int size = sizeof(*p) + nr_node_ids * sizeof(struct plist_node);

- p = kvzalloc(sizeof(*p), GFP_KERNEL); + p = kvzalloc(size, GFP_KERNEL); if (!p) return ERR_PTR(-ENOMEM);

-- 2.19.1

Sasha Levin

7:32 p.m.

New subject: [PATCH AUTOSEL 4.14 53/53] mm, proc: be more verbose about unstable VMA flags in /proc/<pid>/smaps

From: Michal Hocko mhocko@suse.com

[ Upstream commit 7550c6079846a24f30d15ac75a941c8515dbedfb ]

Patch series "THP eligibility reporting via proc".

This series of three patches aims at making THP eligibility reporting much more robust and long term sustainable. The trigger for the change is a regression report [2] and the long follow up discussion. In short the specific application didn't have good API to query whether a particular mapping can be backed by THP so it has used VMA flags to workaround that. These flags represent a deep internal state of VMAs and as such they should be used by userspace with a great deal of caution.

A similar has happened for [3] when users complained that VM_MIXEDMAP is no longer set on DAX mappings. Again a lack of a proper API led to an abuse.

The first patch in the series tries to emphasise that that the semantic of flags might change and any application consuming those should be really careful.

The remaining two patches provide a more suitable interface to address [2] and provide a consistent API to query the THP status both for each VMA and process wide as well. [1]

http://lkml.kernel.org/r/20181120103515.25280-1-mhocko@kernel.org [2] http://lkml.kernel.org/r/http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054... [3] http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz

This patch (of 3):

Even though vma flags exported via /proc/<pid>/smaps are explicitly documented to be not guaranteed for future compatibility the warning doesn't go far enough because it doesn't mention semantic changes to those flags. And they are important as well because these flags are a deep implementation internal to the MM code and the semantic might change at any time.

Let's consider two recent examples: http://lkml.kernel.org/r/20181002100531.GC4135@quack2.suse.cz : commit e1fb4a086495 "dax: remove VM_MIXEDMAP for fsdax and device dax" has : removed VM_MIXEDMAP flag from DAX VMAs. Now our testing shows that in the : mean time certain customer of ours started poking into /proc/<pid>/smaps : and looks at VMA flags there and if VM_MIXEDMAP is missing among the VMA : flags, the application just fails to start complaining that DAX support is : missing in the kernel.

http://lkml.kernel.org/r/alpine.DEB.2.21.1809241054050.224429@chino.kir.corp... : Commit 1860033237d4 ("mm: make PR_SET_THP_DISABLE immediately active") : introduced a regression in that userspace cannot always determine the set : of vmas where thp is ineligible. : Userspace relies on the "nh" flag being emitted as part of /proc/pid/smaps : to determine if a vma is eligible to be backed by hugepages. : Previous to this commit, prctl(PR_SET_THP_DISABLE, 1) would cause thp to : be disabled and emit "nh" as a flag for the corresponding vmas as part of : /proc/pid/smaps. After the commit, thp is disabled by means of an mm : flag and "nh" is not emitted. : This causes smaps parsing libraries to assume a vma is eligible for thp : and ends up puzzling the user on why its memory is not backed by thp.

In both cases userspace was relying on a semantic of a specific VMA flag. The primary reason why that happened is a lack of a proper interface. While this has been worked on and it will be fixed properly, it seems that our wording could see some refinement and be more vocal about semantic aspect of these flags as well.

Link: http://lkml.kernel.org/r/20181211143641.3503-2-mhocko@kernel.org Signed-off-by: Michal Hocko mhocko@suse.com Acked-by: Jan Kara jack@suse.cz Acked-by: Dan Williams dan.j.williams@intel.com Acked-by: David Rientjes rientjes@google.com Acked-by: Mike Rapoport rppt@linux.ibm.com Acked-by: Vlastimil Babka vbabka@suse.cz Cc: Dan Williams dan.j.williams@intel.com Cc: David Rientjes rientjes@google.com Cc: Paul Oppenheimer bepvte@gmail.com Cc: William Kucharski william.kucharski@oracle.com Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Linus Torvalds torvalds@linux-foundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/filesystems/proc.txt | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index adba21b5ada7..4cee34ce496e 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -494,7 +494,9 @@ manner. The codes are the following:

Note that there is no guarantee that every flag and associated mnemonic will be present in all further kernel releases. Things get changed, the flags may -be vanished or the reverse -- new added. +be vanished or the reverse -- new added. Interpretation of their meaning +might change in future as well. So each consumer of these flags has to +follow each specific kernel version for the exact semantic.

This file is only present if the CONFIG_MMU kernel configuration option is enabled.

-- 2.19.1

2555

days inactive

2555

days old

linux-stable-mirror@lists.linaro.org

52 comments

participants

tags (0)

participants (1)

Sasha Levin