From: Martijn van de Streek martijn@zeewinde.xyz
[ Upstream commit 022fc5315b7aff69d3df2c953b892a6232642d50 ]
The Trust Flex Design Tablet has an UGTizer USB ID and requires the same initialization as the UGTizer GP0610 to be detected as a graphics tablet instead of a mouse.
Signed-off-by: Martijn van de Streek martijn@zeewinde.xyz Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-uclogic-core.c | 2 ++ drivers/hid/hid-uclogic-params.c | 2 ++ 3 files changed, 5 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 79495e218b7fc..6b7f90a4721f0 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -1297,6 +1297,7 @@
#define USB_VENDOR_ID_UGTIZER 0x2179 #define USB_DEVICE_ID_UGTIZER_TABLET_GP0610 0x0053 +#define USB_DEVICE_ID_UGTIZER_TABLET_GT5040 0x0077
#define USB_VENDOR_ID_VIEWSONIC 0x0543 #define USB_DEVICE_ID_VIEWSONIC_PD1011 0xe621 diff --git a/drivers/hid/hid-uclogic-core.c b/drivers/hid/hid-uclogic-core.c index 86b568037cb8a..8e9c9e646cb7d 100644 --- a/drivers/hid/hid-uclogic-core.c +++ b/drivers/hid/hid-uclogic-core.c @@ -385,6 +385,8 @@ static const struct hid_device_id uclogic_devices[] = { USB_DEVICE_ID_UCLOGIC_DRAWIMAGE_G3) }, { HID_USB_DEVICE(USB_VENDOR_ID_UGTIZER, USB_DEVICE_ID_UGTIZER_TABLET_GP0610) }, + { HID_USB_DEVICE(USB_VENDOR_ID_UGTIZER, + USB_DEVICE_ID_UGTIZER_TABLET_GT5040) }, { HID_USB_DEVICE(USB_VENDOR_ID_UGEE, USB_DEVICE_ID_UGEE_TABLET_G5) }, { HID_USB_DEVICE(USB_VENDOR_ID_UGEE, diff --git a/drivers/hid/hid-uclogic-params.c b/drivers/hid/hid-uclogic-params.c index 7d20d1fcf8d20..d26d8cd98efcf 100644 --- a/drivers/hid/hid-uclogic-params.c +++ b/drivers/hid/hid-uclogic-params.c @@ -997,6 +997,8 @@ int uclogic_params_init(struct uclogic_params *params, break; case VID_PID(USB_VENDOR_ID_UGTIZER, USB_DEVICE_ID_UGTIZER_TABLET_GP0610): + case VID_PID(USB_VENDOR_ID_UGTIZER, + USB_DEVICE_ID_UGTIZER_TABLET_GT5040): case VID_PID(USB_VENDOR_ID_UGEE, USB_DEVICE_ID_UGEE_XPPEN_TABLET_G540): case VID_PID(USB_VENDOR_ID_UGEE,
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 3c785a06dee99501a17f8e8cf29b2b7e3f1e94ea ]
The usb-hid keyboard-dock for the Acer Switch 10 SW5-012 model declares an application and hid-usage page of 0x0088 for the INPUT(4) report which it sends. This reports contains 2 8-bit fields which are declared as HID_MAIN_ITEM_VARIABLE.
The keyboard-touchpad combo never actually generates this report, except when the touchpad is toggled on/off with the Fn + F7 hotkey combo. The toggle on/off is handled inside the keyboard-dock, when the touchpad is toggled off it simply stops sending events.
When the touchpad is toggled on/off an INPUT(4) report is generated with the first content byte set to 120/121, before this commit the kernel would report this as ABS_MISC 120/121 events.
Patch the descriptor to replace the HID_MAIN_ITEM_VARIABLE with HID_MAIN_ITEM_RELATIVE (because no key-presss release events are send) and add mappings for the 0x00880078 and 0x00880079 usages to generate touchpad on/off key events when the touchpad is toggled on/off.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ite.c | 61 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 60 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-ite.c b/drivers/hid/hid-ite.c index 044a93f3c1178..742c052b0110a 100644 --- a/drivers/hid/hid-ite.c +++ b/drivers/hid/hid-ite.c @@ -11,6 +11,48 @@
#include "hid-ids.h"
+#define QUIRK_TOUCHPAD_ON_OFF_REPORT BIT(0) + +static __u8 *ite_report_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) +{ + unsigned long quirks = (unsigned long)hid_get_drvdata(hdev); + + if (quirks & QUIRK_TOUCHPAD_ON_OFF_REPORT) { + if (*rsize == 188 && rdesc[162] == 0x81 && rdesc[163] == 0x02) { + hid_info(hdev, "Fixing up ITE keyboard report descriptor\n"); + rdesc[163] = HID_MAIN_ITEM_RELATIVE; + } + } + + return rdesc; +} + +static int ite_input_mapping(struct hid_device *hdev, + struct hid_input *hi, struct hid_field *field, + struct hid_usage *usage, unsigned long **bit, + int *max) +{ + + unsigned long quirks = (unsigned long)hid_get_drvdata(hdev); + + if ((quirks & QUIRK_TOUCHPAD_ON_OFF_REPORT) && + (usage->hid & HID_USAGE_PAGE) == 0x00880000) { + if (usage->hid == 0x00880078) { + /* Touchpad on, userspace expects F22 for this */ + hid_map_usage_clear(hi, usage, bit, max, EV_KEY, KEY_F22); + return 1; + } + if (usage->hid == 0x00880079) { + /* Touchpad off, userspace expects F23 for this */ + hid_map_usage_clear(hi, usage, bit, max, EV_KEY, KEY_F23); + return 1; + } + return -1; + } + + return 0; +} + static int ite_event(struct hid_device *hdev, struct hid_field *field, struct hid_usage *usage, __s32 value) { @@ -37,13 +79,27 @@ static int ite_event(struct hid_device *hdev, struct hid_field *field, return 0; }
+static int ite_probe(struct hid_device *hdev, const struct hid_device_id *id) +{ + int ret; + + hid_set_drvdata(hdev, (void *)id->driver_data); + + ret = hid_open_report(hdev); + if (ret) + return ret; + + return hid_hw_start(hdev, HID_CONNECT_DEFAULT); +} + static const struct hid_device_id ite_devices[] = { { HID_USB_DEVICE(USB_VENDOR_ID_ITE, USB_DEVICE_ID_ITE8595) }, { HID_USB_DEVICE(USB_VENDOR_ID_258A, USB_DEVICE_ID_258A_6A88) }, /* ITE8595 USB kbd ctlr, with Synaptics touchpad connected to it. */ { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, USB_VENDOR_ID_SYNAPTICS, - USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5_012) }, + USB_DEVICE_ID_SYNAPTICS_ACER_SWITCH5_012), + .driver_data = QUIRK_TOUCHPAD_ON_OFF_REPORT }, /* ITE8910 USB kbd ctlr, with Synaptics touchpad connected to it. */ { HID_DEVICE(BUS_USB, HID_GROUP_GENERIC, USB_VENDOR_ID_SYNAPTICS, @@ -55,6 +111,9 @@ MODULE_DEVICE_TABLE(hid, ite_devices); static struct hid_driver ite_driver = { .name = "itetech", .id_table = ite_devices, + .probe = ite_probe, + .report_fixup = ite_report_fixup, + .input_mapping = ite_input_mapping, .event = ite_event, }; module_hid_driver(ite_driver);
From: Frank Yang puilp0502@gmail.com
[ Upstream commit 652f3d00de523a17b0cebe7b90debccf13aa8c31 ]
The Varmilo VA104M Keyboard (04b4:07b1, reported as Varmilo Z104M) exposes media control hotkeys as a USB HID consumer control device, but these keys do not work in the current (5.8-rc1) kernel due to the incorrect HID report descriptor. Fix the problem by modifying the internal HID report descriptor.
More specifically, the keyboard report descriptor specifies the logical boundary as 572~10754 (0x023c ~ 0x2a02) while the usage boundary is specified as 0~10754 (0x00 ~ 0x2a02). This results in an incorrect interpretation of input reports, causing inputs to be ignored. By setting the Logical Minimum to zero, we align the logical boundary with the Usage ID boundary.
Some notes:
* There seem to be multiple variants of the VA104M keyboard. This patch specifically targets 04b4:07b1 variant.
* The device works out-of-the-box on Windows platform with the generic consumer control device driver (hidserv.inf). This suggests that Windows either ignores the Logical Minimum/Logical Maximum or interprets the Usage ID assignment differently from the linux implementation; Maybe there are other devices out there that only works on Windows due to this problem?
Signed-off-by: Frank Yang puilp0502@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-cypress.c | 44 ++++++++++++++++++++++++++++++++++----- drivers/hid/hid-ids.h | 2 ++ 2 files changed, 41 insertions(+), 5 deletions(-)
diff --git a/drivers/hid/hid-cypress.c b/drivers/hid/hid-cypress.c index a50ba4a4a1d71..b88f889b3932e 100644 --- a/drivers/hid/hid-cypress.c +++ b/drivers/hid/hid-cypress.c @@ -23,19 +23,17 @@ #define CP_2WHEEL_MOUSE_HACK 0x02 #define CP_2WHEEL_MOUSE_HACK_ON 0x04
+#define VA_INVAL_LOGICAL_BOUNDARY 0x08 + /* * Some USB barcode readers from cypress have usage min and usage max in * the wrong order */ -static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc, +static __u8 *cp_rdesc_fixup(struct hid_device *hdev, __u8 *rdesc, unsigned int *rsize) { - unsigned long quirks = (unsigned long)hid_get_drvdata(hdev); unsigned int i;
- if (!(quirks & CP_RDESC_SWAPPED_MIN_MAX)) - return rdesc; - if (*rsize < 4) return rdesc;
@@ -48,6 +46,40 @@ static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc, return rdesc; }
+static __u8 *va_logical_boundary_fixup(struct hid_device *hdev, __u8 *rdesc, + unsigned int *rsize) +{ + /* + * Varmilo VA104M (with VID Cypress and device ID 07B1) incorrectly + * reports Logical Minimum of its Consumer Control device as 572 + * (0x02 0x3c). Fix this by setting its Logical Minimum to zero. + */ + if (*rsize == 25 && + rdesc[0] == 0x05 && rdesc[1] == 0x0c && + rdesc[2] == 0x09 && rdesc[3] == 0x01 && + rdesc[6] == 0x19 && rdesc[7] == 0x00 && + rdesc[11] == 0x16 && rdesc[12] == 0x3c && rdesc[13] == 0x02) { + hid_info(hdev, + "fixing up varmilo VA104M consumer control report descriptor\n"); + rdesc[12] = 0x00; + rdesc[13] = 0x00; + } + return rdesc; +} + +static __u8 *cp_report_fixup(struct hid_device *hdev, __u8 *rdesc, + unsigned int *rsize) +{ + unsigned long quirks = (unsigned long)hid_get_drvdata(hdev); + + if (quirks & CP_RDESC_SWAPPED_MIN_MAX) + rdesc = cp_rdesc_fixup(hdev, rdesc, rsize); + if (quirks & VA_INVAL_LOGICAL_BOUNDARY) + rdesc = va_logical_boundary_fixup(hdev, rdesc, rsize); + + return rdesc; +} + static int cp_input_mapped(struct hid_device *hdev, struct hid_input *hi, struct hid_field *field, struct hid_usage *usage, unsigned long **bit, int *max) @@ -128,6 +160,8 @@ static const struct hid_device_id cp_devices[] = { .driver_data = CP_RDESC_SWAPPED_MIN_MAX }, { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_MOUSE), .driver_data = CP_2WHEEL_MOUSE_HACK }, + { HID_USB_DEVICE(USB_VENDOR_ID_CYPRESS, USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1), + .driver_data = VA_INVAL_LOGICAL_BOUNDARY }, { } }; MODULE_DEVICE_TABLE(hid, cp_devices); diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 6b7f90a4721f0..8e7bd9f1b029c 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -331,6 +331,8 @@ #define USB_DEVICE_ID_CYPRESS_BARCODE_4 0xed81 #define USB_DEVICE_ID_CYPRESS_TRUETOUCH 0xc001
+#define USB_DEVICE_ID_CYPRESS_VARMILO_VA104M_07B1 0X07b1 + #define USB_VENDOR_ID_DATA_MODUL 0x7374 #define USB_VENDOR_ID_DATA_MODUL_EASYMAXTOUCH 0x1201
From: Jiri Kosina jkosina@suse.cz
[ Upstream commit 1811977cb11354aef8cbd13e35ff50db716728a4 ]
This device needs HID_QUIRK_MULTI_INPUT in order to be presented to userspace in a consistent way.
Reported-and-tested-by: David Gámiz Jiménez david.gamiz@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-quirks.c | 1 + 2 files changed, 2 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 8e7bd9f1b029c..6d6c961f8fee7 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -487,6 +487,7 @@ #define USB_DEVICE_ID_PENPOWER 0x00f4
#define USB_VENDOR_ID_GREENASIA 0x0e8f +#define USB_DEVICE_ID_GREENASIA_DUAL_SAT_ADAPTOR 0x3010 #define USB_DEVICE_ID_GREENASIA_DUAL_USB_JOYPAD 0x3013
#define USB_VENDOR_ID_GRETAGMACBETH 0x0971 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index 7a2be0205dfd1..b154e11d43bfa 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -83,6 +83,7 @@ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_FORMOSA, USB_DEVICE_ID_FORMOSA_IR_RECEIVER), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_FREESCALE, USB_DEVICE_ID_FREESCALE_MX28), HID_QUIRK_NOGET }, { HID_USB_DEVICE(USB_VENDOR_ID_FUTABA, USB_DEVICE_ID_LED_DISPLAY), HID_QUIRK_NO_INIT_REPORTS }, + { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_SAT_ADAPTOR), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_USB_JOYPAD), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_DRIVING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FIGHTING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit b1884583fcd17d6a1b1bba94bbb5826e6b5c6e17 ]
The i8042 module exports several symbols which may be used by other modules.
Before this commit it would refuse to load (when built as a module itself) on systems without an i8042 controller.
This is a problem specifically for the asus-nb-wmi module. Many Asus laptops support the Asus WMI interface. Some of them have an i8042 controller and need to use i8042_install_filter() to filter some kbd events. Other models do not have an i8042 controller (e.g. they use an USB attached kbd).
Before this commit the asus-nb-wmi driver could not be loaded on Asus models without an i8042 controller, when the i8042 code was built as a module (as Arch Linux does) because the module_init function of the i8042 module would fail with -ENODEV and thus the i8042_install_filter symbol could not be loaded.
This commit fixes this by exiting from module_init with a return code of 0 if no controller is found. It also adds a i8042_present bool to make the module_exit function a no-op in this case and also adds a check for i8042_present to the exported i8042_command function.
The latter i8042_present check should not really be necessary because when builtin that function can already be used on systems without an i8042 controller, but better safe then sorry.
Reported-and-tested-by: Marius Iacob themariusus@gmail.com Signed-off-by: Hans de Goede hdegoede@redhat.com Link: https://lore.kernel.org/r/20201008112628.3979-2-hdegoede@redhat.com Signed-off-by: Dmitry Torokhov dmitry.torokhov@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/input/serio/i8042.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/input/serio/i8042.c b/drivers/input/serio/i8042.c index d3eda48032e39..944cbb519c6d7 100644 --- a/drivers/input/serio/i8042.c +++ b/drivers/input/serio/i8042.c @@ -122,6 +122,7 @@ module_param_named(unmask_kbd_data, i8042_unmask_kbd_data, bool, 0600); MODULE_PARM_DESC(unmask_kbd_data, "Unconditional enable (may reveal sensitive data) of normally sanitize-filtered kbd data traffic debug log [pre-condition: i8042.debug=1 enabled]"); #endif
+static bool i8042_present; static bool i8042_bypass_aux_irq_test; static char i8042_kbd_firmware_id[128]; static char i8042_aux_firmware_id[128]; @@ -343,6 +344,9 @@ int i8042_command(unsigned char *param, int command) unsigned long flags; int retval;
+ if (!i8042_present) + return -1; + spin_lock_irqsave(&i8042_lock, flags); retval = __i8042_command(param, command); spin_unlock_irqrestore(&i8042_lock, flags); @@ -1612,12 +1616,15 @@ static int __init i8042_init(void)
err = i8042_platform_init(); if (err) - return err; + return (err == -ENODEV) ? 0 : err;
err = i8042_controller_check(); if (err) goto err_platform_exit;
+ /* Set this before creating the dev to allow i8042_command to work right away */ + i8042_present = true; + pdev = platform_create_bundle(&i8042_driver, i8042_probe, NULL, 0, NULL, 0); if (IS_ERR(pdev)) { err = PTR_ERR(pdev); @@ -1636,6 +1643,9 @@ static int __init i8042_init(void)
static void __exit i8042_exit(void) { + if (!i8042_present) + return; + platform_device_unregister(i8042_platform_device); platform_driver_unregister(&i8042_driver); i8042_platform_exit();
From: Pablo Ceballos pceballos@google.com
[ Upstream commit 34a9fa2025d9d3177c99351c7aaf256c5f50691f ]
Some HID devices don't use a report ID because they only have a single report. In those cases, the report ID in struct hid_report will be zero and the data for the report will start at the first byte, so don't skip over the first byte.
Signed-off-by: Pablo Ceballos pceballos@google.com Acked-by: Srinivas Pandruvada srinivas.pandruvada@linux.intel.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-sensor-hub.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/hid/hid-sensor-hub.c b/drivers/hid/hid-sensor-hub.c index 94c7398b5c279..3dd7d32467378 100644 --- a/drivers/hid/hid-sensor-hub.c +++ b/drivers/hid/hid-sensor-hub.c @@ -483,7 +483,8 @@ static int sensor_hub_raw_event(struct hid_device *hdev, return 1;
ptr = raw_data; - ptr++; /* Skip report id */ + if (report->id) + ptr++; /* Skip report id */
spin_lock_irqsave(&pdata->lock, flags);
From: Necip Fazil Yildiran fazilyildiran@gmail.com
[ Upstream commit 06ea594051707c6b8834ef5b24e9b0730edd391b ]
When DMA_RALINK is enabled and DMADEVICES is disabled, it results in the following Kbuild warnings:
WARNING: unmet direct dependencies detected for DMA_ENGINE Depends on [n]: DMADEVICES [=n] Selected by [y]: - DMA_RALINK [=y] && STAGING [=y] && RALINK [=y] && !SOC_RT288X [=n]
WARNING: unmet direct dependencies detected for DMA_VIRTUAL_CHANNELS Depends on [n]: DMADEVICES [=n] Selected by [y]: - DMA_RALINK [=y] && STAGING [=y] && RALINK [=y] && !SOC_RT288X [=n]
The reason is that DMA_RALINK selects DMA_ENGINE and DMA_VIRTUAL_CHANNELS without depending on or selecting DMADEVICES while DMA_ENGINE and DMA_VIRTUAL_CHANNELS are subordinate to DMADEVICES. This can also fail building the kernel as demonstrated in a bug report.
Honor the kconfig dependency to remove unmet direct dependency warnings and avoid any potential build failures.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=210055 Signed-off-by: Necip Fazil Yildiran fazilyildiran@gmail.com Link: https://lore.kernel.org/r/20201104181522.43567-1-fazilyildiran@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/staging/ralink-gdma/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/staging/ralink-gdma/Kconfig b/drivers/staging/ralink-gdma/Kconfig index 54e8029e6b1af..0017376234e28 100644 --- a/drivers/staging/ralink-gdma/Kconfig +++ b/drivers/staging/ralink-gdma/Kconfig @@ -2,6 +2,7 @@ config DMA_RALINK tristate "RALINK DMA support" depends on RALINK && !SOC_RT288X + depends on DMADEVICES select DMA_ENGINE select DMA_VIRTUAL_CHANNELS
From: Chris Ye lzye@google.com
[ Upstream commit f59ee399de4a8ca4d7d19cdcabb4b63e94867f09 ]
Kernel 5.4 introduces HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE, devices need to be set explicitly with this flag.
Signed-off-by: Chris Ye lzye@google.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 4 ++++ drivers/hid/hid-quirks.c | 4 ++++ 2 files changed, 8 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 6d6c961f8fee7..96e84a51f6b2c 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -445,6 +445,10 @@ #define USB_VENDOR_ID_FRUCTEL 0x25B6 #define USB_DEVICE_ID_GAMETEL_MT_MODE 0x0002
+#define USB_VENDOR_ID_GAMEVICE 0x27F8 +#define USB_DEVICE_ID_GAMEVICE_GV186 0x0BBE +#define USB_DEVICE_ID_GAMEVICE_KISHI 0x0BBF + #define USB_VENDOR_ID_GAMERON 0x0810 #define USB_DEVICE_ID_GAMERON_DUAL_PSX_ADAPTOR 0x0001 #define USB_DEVICE_ID_GAMERON_DUAL_PCS_ADAPTOR 0x0002 diff --git a/drivers/hid/hid-quirks.c b/drivers/hid/hid-quirks.c index b154e11d43bfa..bf7ecab5d9e5e 100644 --- a/drivers/hid/hid-quirks.c +++ b/drivers/hid/hid-quirks.c @@ -85,6 +85,10 @@ static const struct hid_device_id hid_quirks[] = { { HID_USB_DEVICE(USB_VENDOR_ID_FUTABA, USB_DEVICE_ID_LED_DISPLAY), HID_QUIRK_NO_INIT_REPORTS }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_SAT_ADAPTOR), HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_GREENASIA, USB_DEVICE_ID_GREENASIA_DUAL_USB_JOYPAD), HID_QUIRK_MULTI_INPUT }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_GAMEVICE, USB_DEVICE_ID_GAMEVICE_GV186), + HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, + { HID_USB_DEVICE(USB_VENDOR_ID_GAMEVICE, USB_DEVICE_ID_GAMEVICE_KISHI), + HID_QUIRK_INCREMENT_USAGE_ON_DUPLICATE }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_DRIVING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FIGHTING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT }, { HID_USB_DEVICE(USB_VENDOR_ID_HAPP, USB_DEVICE_ID_UGCI_FLYING), HID_QUIRK_BADPAD | HID_QUIRK_MULTI_INPUT },
From: Marc Ferland ferlandm@amotus.ca
[ Upstream commit 0ba2df09f1500d3f27398a3382b86d39c3e6abe2 ]
The xilinx_dma_poll_timeout macro is sometimes called while holding a spinlock (see xilinx_dma_issue_pending() for an example) this means we shouldn't sleep when polling the dma channel registers. To address it in xilinx poll timeout macro use readl_poll_timeout_atomic instead of readl_poll_timeout variant.
Signed-off-by: Marc Ferland ferlandm@amotus.ca Signed-off-by: Radhey Shyam Pandey radhey.shyam.pandey@xilinx.com Link: https://lore.kernel.org/r/1604473206-32573-2-git-send-email-radhey.shyam.pan... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/xilinx/xilinx_dma.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/dma/xilinx/xilinx_dma.c b/drivers/dma/xilinx/xilinx_dma.c index 0fc432567b857..993297d585c01 100644 --- a/drivers/dma/xilinx/xilinx_dma.c +++ b/drivers/dma/xilinx/xilinx_dma.c @@ -517,8 +517,8 @@ struct xilinx_dma_device { #define to_dma_tx_descriptor(tx) \ container_of(tx, struct xilinx_dma_tx_descriptor, async_tx) #define xilinx_dma_poll_timeout(chan, reg, val, cond, delay_us, timeout_us) \ - readl_poll_timeout(chan->xdev->regs + chan->ctrl_offset + reg, val, \ - cond, delay_us, timeout_us) + readl_poll_timeout_atomic(chan->xdev->regs + chan->ctrl_offset + reg, \ + val, cond, delay_us, timeout_us)
/* IO accessors */ static inline u32 dma_read(struct xilinx_dma_chan *chan, u32 reg)
From: Brian Masney bmasney@redhat.com
[ Upstream commit 65cae18882f943215d0505ddc7e70495877308e6 ]
When booting a hyperthreaded system with the kernel parameter 'mitigations=auto,nosmt', the following warning occurs:
WARNING: CPU: 0 PID: 1 at drivers/xen/events/events_base.c:1112 unbind_from_irqhandler+0x4e/0x60 ... Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006 ... Call Trace: xen_uninit_lock_cpu+0x28/0x62 xen_hvm_cpu_die+0x21/0x30 takedown_cpu+0x9c/0xe0 ? trace_suspend_resume+0x60/0x60 cpuhp_invoke_callback+0x9a/0x530 _cpu_up+0x11a/0x130 cpu_up+0x7e/0xc0 bringup_nonboot_cpus+0x48/0x50 smp_init+0x26/0x79 kernel_init_freeable+0xea/0x229 ? rest_init+0xaa/0xaa kernel_init+0xa/0x106 ret_from_fork+0x35/0x40
The secondary CPUs are not activated with the nosmt mitigations and only the primary thread on each CPU core is used. In this situation, xen_hvm_smp_prepare_cpus(), and more importantly xen_init_lock_cpu(), is not called, so the lock_kicker_irq is not initialized for the secondary CPUs. Let's fix this by exiting early in xen_uninit_lock_cpu() if the irq is not set to avoid the warning from above for each secondary CPU.
Signed-off-by: Brian Masney bmasney@redhat.com Link: https://lore.kernel.org/r/20201107011119.631442-1-bmasney@redhat.com Reviewed-by: Juergen Gross jgross@suse.com Signed-off-by: Boris Ostrovsky boris.ostrovsky@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/xen/spinlock.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index 799f4eba0a621..043c73dfd2c98 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -93,10 +93,20 @@ void xen_init_lock_cpu(int cpu)
void xen_uninit_lock_cpu(int cpu) { + int irq; + if (!xen_pvspin) return;
- unbind_from_irqhandler(per_cpu(lock_kicker_irq, cpu), NULL); + /* + * When booting the kernel with 'mitigations=auto,nosmt', the secondary + * CPUs are not activated, and lock_kicker_irq is not initialized. + */ + irq = per_cpu(lock_kicker_irq, cpu); + if (irq == -1) + return; + + unbind_from_irqhandler(irq, NULL); per_cpu(lock_kicker_irq, cpu) = -1; kfree(per_cpu(irq_name, cpu)); per_cpu(irq_name, cpu) = NULL;
From: Daniel Latypov dlatypov@google.com
[ Upstream commit 3084db0e0d5076cd48408274ab0911cd3ccdae88 ]
Currently the following expectation KUNIT_EXPECT_STREQ(test, "hi", "bye"); will produce: Expected "hi" == "bye", but "hi" == 1625079497 "bye" == 1625079500
After this patch: Expected "hi" == "bye", but "hi" == hi "bye" == bye
KUNIT_INIT_BINARY_STR_ASSERT_STRUCT() was written but just mistakenly not actually used by KUNIT_EXPECT_STREQ() and friends.
Signed-off-by: Daniel Latypov dlatypov@google.com Reviewed-by: Brendan Higgins brendanhiggins@google.com Tested-by: Brendan Higgins brendanhiggins@google.com Signed-off-by: Shuah Khan skhan@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- include/kunit/test.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/kunit/test.h b/include/kunit/test.h index 59f3144f009a5..b68ba33c16937 100644 --- a/include/kunit/test.h +++ b/include/kunit/test.h @@ -1064,7 +1064,7 @@ do { \ KUNIT_ASSERTION(test, \ strcmp(__left, __right) op 0, \ kunit_binary_str_assert, \ - KUNIT_INIT_BINARY_ASSERT_STRUCT(test, \ + KUNIT_INIT_BINARY_STR_ASSERT_STRUCT(test, \ assert_type, \ #op, \ #left, \
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit c27168a04a438a457c100253b1aaf0c779218aae ]
Like the MX5000 and MX5500 quad/bluetooth keyboards the Dinovo Edge also needs the HIDPP_CONSUMER_VENDOR_KEYS quirk for some special keys to work. Specifically without this the "Phone" and the 'A' - 'D' Smart Keys do not send any events.
In addition to fixing these keys not sending any events, adding the Bluetooth match, so that hid-logitech-hidpp is used instead of the generic HID driver, also adds battery monitoring support when the keyboard is connected over Bluetooth.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-logitech-hidpp.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c index a2991622702ae..0ca7231195473 100644 --- a/drivers/hid/hid-logitech-hidpp.c +++ b/drivers/hid/hid-logitech-hidpp.c @@ -3997,6 +3997,9 @@ static const struct hid_device_id hidpp_devices[] = { { /* Keyboard MX5000 (Bluetooth-receiver in HID proxy mode) */ LDJ_DEVICE(0xb305), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, + { /* Dinovo Edge (Bluetooth-receiver in HID proxy mode) */ + LDJ_DEVICE(0xb309), + .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* Keyboard MX5500 (Bluetooth-receiver in HID proxy mode) */ LDJ_DEVICE(0xb30b), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, @@ -4039,6 +4042,9 @@ static const struct hid_device_id hidpp_devices[] = { { /* MX5000 keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb305), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, + { /* Dinovo Edge keyboard over Bluetooth */ + HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb309), + .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS }, { /* MX5500 keyboard over Bluetooth */ HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, 0xb30b), .driver_data = HIDPP_QUIRK_HIDPP_CONSUMER_VENDOR_KEYS },
From: Hans de Goede hdegoede@redhat.com
[ Upstream commit 7940fb035abd88040d56be209962feffa33b03d0 ]
The battery status is also being reported by the logitech-hidpp driver, so ignore the standard HID battery status to avoid reporting the same info twice.
Note the logitech-hidpp battery driver provides more info, such as properly differentiating between charging and discharging. Also the standard HID battery info seems to be wrong, reporting a capacity of just 26% after fully charging the device.
Signed-off-by: Hans de Goede hdegoede@redhat.com Signed-off-by: Benjamin Tissoires benjamin.tissoires@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 1 + drivers/hid/hid-input.c | 3 +++ 2 files changed, 4 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 96e84a51f6b2c..a6d63a7590434 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -749,6 +749,7 @@ #define USB_VENDOR_ID_LOGITECH 0x046d #define USB_DEVICE_ID_LOGITECH_AUDIOHUB 0x0a0e #define USB_DEVICE_ID_LOGITECH_T651 0xb00c +#define USB_DEVICE_ID_LOGITECH_DINOVO_EDGE_KBD 0xb309 #define USB_DEVICE_ID_LOGITECH_C007 0xc007 #define USB_DEVICE_ID_LOGITECH_C077 0xc077 #define USB_DEVICE_ID_LOGITECH_RECEIVER 0xc101 diff --git a/drivers/hid/hid-input.c b/drivers/hid/hid-input.c index 9770db624bfaf..4dca113924593 100644 --- a/drivers/hid/hid-input.c +++ b/drivers/hid/hid-input.c @@ -319,6 +319,9 @@ static const struct hid_device_id hid_battery_quirks[] = { { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_ASUSTEK, USB_DEVICE_ID_ASUSTEK_T100CHI_KEYBOARD), HID_BATTERY_QUIRK_IGNORE }, + { HID_BLUETOOTH_DEVICE(USB_VENDOR_ID_LOGITECH, + USB_DEVICE_ID_LOGITECH_DINOVO_EDGE_KBD), + HID_BATTERY_QUIRK_IGNORE }, {} };
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 8d4c3e76e3be11a64df95ddee52e99092d42fc19 ]
If this is attempted by a kthread, then return -EOPNOTSUPP as we don't currently support that. Once we can get task_pid_ptr() doing the right thing, then this can go away again.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/proc/self.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/proc/self.c b/fs/proc/self.c index 72cd69bcaf4ad..cc71ce3466dc0 100644 --- a/fs/proc/self.c +++ b/fs/proc/self.c @@ -16,6 +16,13 @@ static const char *proc_self_get_link(struct dentry *dentry, pid_t tgid = task_tgid_nr_ns(current, ns); char *name;
+ /* + * Not currently supported. Once we can inherit all of struct pid, + * we can allow this. + */ + if (current->flags & PF_KTHREAD) + return ERR_PTR(-EOPNOTSUPP); + if (!tgid) return ERR_PTR(-ENOENT); /* max length of unsigned int in decimal + NULL term */
From: Minwoo Im minwoo.im.dev@gmail.com
[ Upstream commit 0f0d2c876c96d4908a9ef40959a44bec21bdd6cf ]
If Doorbell Buffer Config command fails even 'dev->dbbuf_dbs != NULL' which means OACS indicates that NVME_CTRL_OACS_DBBUF_SUPP is set, nvme_dbbuf_update_and_check_event() will check event even it's not been successfully set.
This patch fixes mismatch among dbbuf for sq/cqs in case that dbbuf command fails.
Signed-off-by: Minwoo Im minwoo.im.dev@gmail.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 15 +++++++++++++++ 1 file changed, 15 insertions(+)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index a6af96aaa0eb7..3448f7ac209a0 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -292,9 +292,21 @@ static void nvme_dbbuf_init(struct nvme_dev *dev, nvmeq->dbbuf_cq_ei = &dev->dbbuf_eis[cq_idx(qid, dev->db_stride)]; }
+static void nvme_dbbuf_free(struct nvme_queue *nvmeq) +{ + if (!nvmeq->qid) + return; + + nvmeq->dbbuf_sq_db = NULL; + nvmeq->dbbuf_cq_db = NULL; + nvmeq->dbbuf_sq_ei = NULL; + nvmeq->dbbuf_cq_ei = NULL; +} + static void nvme_dbbuf_set(struct nvme_dev *dev) { struct nvme_command c; + unsigned int i;
if (!dev->dbbuf_dbs) return; @@ -308,6 +320,9 @@ static void nvme_dbbuf_set(struct nvme_dev *dev) dev_warn(dev->ctrl.device, "unable to set dbbuf\n"); /* Free memory and continue on */ nvme_dbbuf_dma_free(dev); + + for (i = 1; i <= dev->online_queues; i++) + nvme_dbbuf_free(&dev->queues[i]); } }
From: Jens Axboe axboe@kernel.dk
[ Upstream commit 944d1444d53f5a213457e5096db370cfd06923d4 ]
Any attempt to do path resolution on /proc/self from an async worker will yield -EOPNOTSUPP. We can safely do that resolution from the task itself, and without blocking, so retry it from there.
Ideally io_uring would know this upfront and not have to go through the worker thread to find out, but that doesn't currently seem feasible.
Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- fs/io_uring.c | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-)
diff --git a/fs/io_uring.c b/fs/io_uring.c index e74a56f6915c0..46a68a8439895 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -435,6 +435,7 @@ struct io_sr_msg { struct io_open { struct file *file; int dfd; + bool ignore_nonblock; struct filename *filename; struct open_how how; unsigned long nofile; @@ -3590,6 +3591,7 @@ static int __io_openat_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe return ret; } req->open.nofile = rlimit(RLIMIT_NOFILE); + req->open.ignore_nonblock = false; req->flags |= REQ_F_NEED_CLEANUP; return 0; } @@ -3637,7 +3639,7 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock) struct file *file; int ret;
- if (force_nonblock) + if (force_nonblock && !req->open.ignore_nonblock) return -EAGAIN;
ret = build_open_flags(&req->open.how, &op); @@ -3652,6 +3654,21 @@ static int io_openat2(struct io_kiocb *req, bool force_nonblock) if (IS_ERR(file)) { put_unused_fd(ret); ret = PTR_ERR(file); + /* + * A work-around to ensure that /proc/self works that way + * that it should - if we get -EOPNOTSUPP back, then assume + * that proc_self_get_link() failed us because we're in async + * context. We should be safe to retry this from the task + * itself with force_nonblock == false set, as it should not + * block on lookup. Would be nice to know this upfront and + * avoid the async dance, but doesn't seem feasible. + */ + if (ret == -EOPNOTSUPP && io_wq_current_is_worker()) { + req->open.ignore_nonblock = true; + refcount_inc(&req->refs); + io_req_task_queue(req); + return 0; + } } else { fsnotify_open(file); fd_install(ret, file);
From: Jisheng Zhang Jisheng.Zhang@synaptics.com
[ Upstream commit 56311a315da7ebc668dbcc2f1c99689cc10796c4 ]
If the phy enables power saving technology, the dwmac's software reset needs more time to complete, enlarge dma reset timeout to 200000us.
Signed-off-by: Jisheng Zhang Jisheng.Zhang@synaptics.com Link: https://lore.kernel.org/r/20201113090902.5c7aab1a@xhacker.debian Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c index cb87d31a99dfb..57a53a600aa55 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac_lib.c @@ -23,7 +23,7 @@ int dwmac_dma_reset(void __iomem *ioaddr)
return readl_poll_timeout(ioaddr + DMA_BUS_MODE, value, !(value & DMA_BUS_MODE_SFT_RESET), - 10000, 100000); + 10000, 200000); }
/* CSR1 enables the transmit DMA to check for new descriptor */
From: Laurent Vivier lvivier@redhat.com
[ Upstream commit a312db697cb05dfa781848afe8585a1e1f2a5a99 ]
ERROR: modpost: "mac_pton" [drivers/vdpa/vdpa_sim/vdpa_sim.ko] undefined!
mac_pton() is defined in lib/net_utils.c and is not built if NET is not set.
Select GENERIC_NET_UTILS as vdpasim doesn't depend on NET.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Laurent Vivier lvivier@redhat.com Link: https://lore.kernel.org/r/20201113155706.599434-1-lvivier@redhat.com Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Randy Dunlap rdunlap@infradead.org # build-tested Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vdpa/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/drivers/vdpa/Kconfig b/drivers/vdpa/Kconfig index d7d32b6561021..358f6048dd3ce 100644 --- a/drivers/vdpa/Kconfig +++ b/drivers/vdpa/Kconfig @@ -13,6 +13,7 @@ config VDPA_SIM depends on RUNTIME_TESTING_MENU && HAS_DMA select DMA_OPS select VHOST_RING + select GENERIC_NET_UTILS default n help vDPA networking device simulator which loop TX traffic back
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 6bcf34224ac1e94103797fd68b9836061762f2b2 ]
This adds a helper check if a vq has been setup. The next patches will use this when we move the vhost scsi cmd preallocation from per session to per vq. In the per vq case, we only want to allocate cmds for vqs that have actually been setup and not for all the possible vqs.
Signed-off-by: Mike Christie michael.christie@oracle.com Link: https://lore.kernel.org/r/1604986403-4931-2-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Jason Wang jasowang@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/vhost.c | 6 ++++++ drivers/vhost/vhost.h | 1 + 2 files changed, 7 insertions(+)
diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 9ad45e1d27f0f..23e7b2d624511 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -305,6 +305,12 @@ static void vhost_vring_call_reset(struct vhost_vring_call *call_ctx) spin_lock_init(&call_ctx->ctx_lock); }
+bool vhost_vq_is_setup(struct vhost_virtqueue *vq) +{ + return vq->avail && vq->desc && vq->used && vhost_vq_access_ok(vq); +} +EXPORT_SYMBOL_GPL(vhost_vq_is_setup); + static void vhost_vq_reset(struct vhost_dev *dev, struct vhost_virtqueue *vq) { diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 9032d3c2a9f48..3d30b3da7bcf5 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -190,6 +190,7 @@ int vhost_get_vq_desc(struct vhost_virtqueue *, struct vhost_log *log, unsigned int *log_num); void vhost_discard_vq_desc(struct vhost_virtqueue *, int n);
+bool vhost_vq_is_setup(struct vhost_virtqueue *vq); int vhost_vq_init_access(struct vhost_virtqueue *); int vhost_add_used(struct vhost_virtqueue *, unsigned int head, int len); int vhost_add_used_n(struct vhost_virtqueue *, struct vring_used_elem *heads,
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 25b98b64e28423b0769313dcaf96423836b1f93d ]
We currently are limited to 256 cmds per session. This leads to problems where if the user has increased virtqueue_size to more than 2 or cmd_per_lun to more than 256 vhost_scsi_get_tag can fail and the guest will get IO errors.
This patch moves the cmd allocation to per vq so we can easily match whatever the user has specified for num_queues and virtqueue_size/cmd_per_lun. It also makes it easier to control how much memory we preallocate. For cases, where perf is not as important and we can use the current defaults (1 vq and 128 cmds per vq) memory use from preallocate cmds is cut in half. For cases, where we are willing to use more memory for higher perf, cmd mem use will now increase as the num queues and queue depth increases.
Signed-off-by: Mike Christie michael.christie@oracle.com Link: https://lore.kernel.org/r/1604986403-4931-3-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Reviewed-by: Maurizio Lombardi mlombard@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/scsi.c | 207 ++++++++++++++++++++++++++----------------- 1 file changed, 128 insertions(+), 79 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index b22adf03f5842..e31339be7dd78 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -52,7 +52,6 @@ #define VHOST_SCSI_VERSION "v0.1" #define VHOST_SCSI_NAMELEN 256 #define VHOST_SCSI_MAX_CDB_SIZE 32 -#define VHOST_SCSI_DEFAULT_TAGS 256 #define VHOST_SCSI_PREALLOC_SGLS 2048 #define VHOST_SCSI_PREALLOC_UPAGES 2048 #define VHOST_SCSI_PREALLOC_PROT_SGLS 2048 @@ -189,6 +188,9 @@ struct vhost_scsi_virtqueue { * Writers must also take dev mutex and flush under it. */ int inflight_idx; + struct vhost_scsi_cmd *scsi_cmds; + struct sbitmap scsi_tags; + int max_cmds; };
struct vhost_scsi { @@ -324,7 +326,9 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) { struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd, struct vhost_scsi_cmd, tvc_se_cmd); - struct se_session *se_sess = tv_cmd->tvc_nexus->tvn_se_sess; + struct vhost_scsi_virtqueue *svq = container_of(tv_cmd->tvc_vq, + struct vhost_scsi_virtqueue, vq); + struct vhost_scsi_inflight *inflight = tv_cmd->inflight; int i;
if (tv_cmd->tvc_sgl_count) { @@ -336,8 +340,8 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) put_page(sg_page(&tv_cmd->tvc_prot_sgl[i])); }
- vhost_scsi_put_inflight(tv_cmd->inflight); - target_free_tag(se_sess, se_cmd); + sbitmap_clear_bit(&svq->scsi_tags, se_cmd->map_tag); + vhost_scsi_put_inflight(inflight); }
static u32 vhost_scsi_sess_get_index(struct se_session *se_sess) @@ -566,31 +570,31 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) }
static struct vhost_scsi_cmd * -vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, +vhost_scsi_get_cmd(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, unsigned char *cdb, u64 scsi_tag, u16 lun, u8 task_attr, u32 exp_data_len, int data_direction) { + struct vhost_scsi_virtqueue *svq = container_of(vq, + struct vhost_scsi_virtqueue, vq); struct vhost_scsi_cmd *cmd; struct vhost_scsi_nexus *tv_nexus; - struct se_session *se_sess; struct scatterlist *sg, *prot_sg; struct page **pages; - int tag, cpu; + int tag;
tv_nexus = tpg->tpg_nexus; if (!tv_nexus) { pr_err("Unable to locate active struct vhost_scsi_nexus\n"); return ERR_PTR(-EIO); } - se_sess = tv_nexus->tvn_se_sess;
- tag = sbitmap_queue_get(&se_sess->sess_tag_pool, &cpu); + tag = sbitmap_get(&svq->scsi_tags, 0, false); if (tag < 0) { pr_err("Unable to obtain tag for vhost_scsi_cmd\n"); return ERR_PTR(-ENOMEM); }
- cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[tag]; + cmd = &svq->scsi_cmds[tag]; sg = cmd->tvc_sgl; prot_sg = cmd->tvc_prot_sgl; pages = cmd->tvc_upages; @@ -599,7 +603,6 @@ vhost_scsi_get_tag(struct vhost_virtqueue *vq, struct vhost_scsi_tpg *tpg, cmd->tvc_prot_sgl = prot_sg; cmd->tvc_upages = pages; cmd->tvc_se_cmd.map_tag = tag; - cmd->tvc_se_cmd.map_cpu = cpu; cmd->tvc_tag = scsi_tag; cmd->tvc_lun = lun; cmd->tvc_task_attr = task_attr; @@ -1065,11 +1068,11 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) scsi_command_size(cdb), VHOST_SCSI_MAX_CDB_SIZE); goto err; } - cmd = vhost_scsi_get_tag(vq, tpg, cdb, tag, lun, task_attr, + cmd = vhost_scsi_get_cmd(vq, tpg, cdb, tag, lun, task_attr, exp_data_len + prot_bytes, data_direction); if (IS_ERR(cmd)) { - vq_err(vq, "vhost_scsi_get_tag failed %ld\n", + vq_err(vq, "vhost_scsi_get_cmd failed %ld\n", PTR_ERR(cmd)); goto err; } @@ -1373,6 +1376,83 @@ static void vhost_scsi_flush(struct vhost_scsi *vs) wait_for_completion(&old_inflight[i]->comp); }
+static void vhost_scsi_destroy_vq_cmds(struct vhost_virtqueue *vq) +{ + struct vhost_scsi_virtqueue *svq = container_of(vq, + struct vhost_scsi_virtqueue, vq); + struct vhost_scsi_cmd *tv_cmd; + unsigned int i; + + if (!svq->scsi_cmds) + return; + + for (i = 0; i < svq->max_cmds; i++) { + tv_cmd = &svq->scsi_cmds[i]; + + kfree(tv_cmd->tvc_sgl); + kfree(tv_cmd->tvc_prot_sgl); + kfree(tv_cmd->tvc_upages); + } + + sbitmap_free(&svq->scsi_tags); + kfree(svq->scsi_cmds); + svq->scsi_cmds = NULL; +} + +static int vhost_scsi_setup_vq_cmds(struct vhost_virtqueue *vq, int max_cmds) +{ + struct vhost_scsi_virtqueue *svq = container_of(vq, + struct vhost_scsi_virtqueue, vq); + struct vhost_scsi_cmd *tv_cmd; + unsigned int i; + + if (svq->scsi_cmds) + return 0; + + if (sbitmap_init_node(&svq->scsi_tags, max_cmds, -1, GFP_KERNEL, + NUMA_NO_NODE)) + return -ENOMEM; + svq->max_cmds = max_cmds; + + svq->scsi_cmds = kcalloc(max_cmds, sizeof(*tv_cmd), GFP_KERNEL); + if (!svq->scsi_cmds) { + sbitmap_free(&svq->scsi_tags); + return -ENOMEM; + } + + for (i = 0; i < max_cmds; i++) { + tv_cmd = &svq->scsi_cmds[i]; + + tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS, + sizeof(struct scatterlist), + GFP_KERNEL); + if (!tv_cmd->tvc_sgl) { + pr_err("Unable to allocate tv_cmd->tvc_sgl\n"); + goto out; + } + + tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES, + sizeof(struct page *), + GFP_KERNEL); + if (!tv_cmd->tvc_upages) { + pr_err("Unable to allocate tv_cmd->tvc_upages\n"); + goto out; + } + + tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS, + sizeof(struct scatterlist), + GFP_KERNEL); + if (!tv_cmd->tvc_prot_sgl) { + pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n"); + goto out; + } + } + return 0; +out: + vhost_scsi_destroy_vq_cmds(vq); + return -ENOMEM; +} + /* * Called from vhost_scsi_ioctl() context to walk the list of available * vhost_scsi_tpg with an active struct vhost_scsi_nexus @@ -1427,10 +1507,9 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs,
if (!strcmp(tv_tport->tport_name, t->vhost_wwpn)) { if (vs->vs_tpg && vs->vs_tpg[tpg->tport_tpgt]) { - kfree(vs_tpg); mutex_unlock(&tpg->tv_tpg_mutex); ret = -EEXIST; - goto out; + goto undepend; } /* * In order to ensure individual vhost-scsi configfs @@ -1442,9 +1521,8 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, ret = target_depend_item(&se_tpg->tpg_group.cg_item); if (ret) { pr_warn("target_depend_item() failed: %d\n", ret); - kfree(vs_tpg); mutex_unlock(&tpg->tv_tpg_mutex); - goto out; + goto undepend; } tpg->tv_tpg_vhost_count++; tpg->vhost_scsi = vs; @@ -1457,6 +1535,16 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, if (match) { memcpy(vs->vs_vhost_wwpn, t->vhost_wwpn, sizeof(vs->vs_vhost_wwpn)); + + for (i = VHOST_SCSI_VQ_IO; i < VHOST_SCSI_MAX_VQ; i++) { + vq = &vs->vqs[i].vq; + if (!vhost_vq_is_setup(vq)) + continue; + + if (vhost_scsi_setup_vq_cmds(vq, vq->num)) + goto destroy_vq_cmds; + } + for (i = 0; i < VHOST_SCSI_MAX_VQ; i++) { vq = &vs->vqs[i].vq; mutex_lock(&vq->mutex); @@ -1476,7 +1564,22 @@ vhost_scsi_set_endpoint(struct vhost_scsi *vs, vhost_scsi_flush(vs); kfree(vs->vs_tpg); vs->vs_tpg = vs_tpg; + goto out;
+destroy_vq_cmds: + for (i--; i >= VHOST_SCSI_VQ_IO; i--) { + if (!vhost_vq_get_backend(&vs->vqs[i].vq)) + vhost_scsi_destroy_vq_cmds(&vs->vqs[i].vq); + } +undepend: + for (i = 0; i < VHOST_SCSI_MAX_TARGET; i++) { + tpg = vs_tpg[i]; + if (tpg) { + tpg->tv_tpg_vhost_count--; + target_undepend_item(&tpg->se_tpg.tpg_group.cg_item); + } + } + kfree(vs_tpg); out: mutex_unlock(&vs->dev.mutex); mutex_unlock(&vhost_scsi_mutex); @@ -1549,6 +1652,12 @@ vhost_scsi_clear_endpoint(struct vhost_scsi *vs, mutex_lock(&vq->mutex); vhost_vq_set_backend(vq, NULL); mutex_unlock(&vq->mutex); + /* + * Make sure cmds are not running before tearing them + * down. + */ + vhost_scsi_flush(vs); + vhost_scsi_destroy_vq_cmds(vq); } } /* @@ -1842,23 +1951,6 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg, mutex_unlock(&vhost_scsi_mutex); }
-static void vhost_scsi_free_cmd_map_res(struct se_session *se_sess) -{ - struct vhost_scsi_cmd *tv_cmd; - unsigned int i; - - if (!se_sess->sess_cmd_map) - return; - - for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) { - tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i]; - - kfree(tv_cmd->tvc_sgl); - kfree(tv_cmd->tvc_prot_sgl); - kfree(tv_cmd->tvc_upages); - } -} - static ssize_t vhost_scsi_tpg_attrib_fabric_prot_type_store( struct config_item *item, const char *page, size_t count) { @@ -1898,45 +1990,6 @@ static struct configfs_attribute *vhost_scsi_tpg_attrib_attrs[] = { NULL, };
-static int vhost_scsi_nexus_cb(struct se_portal_group *se_tpg, - struct se_session *se_sess, void *p) -{ - struct vhost_scsi_cmd *tv_cmd; - unsigned int i; - - for (i = 0; i < VHOST_SCSI_DEFAULT_TAGS; i++) { - tv_cmd = &((struct vhost_scsi_cmd *)se_sess->sess_cmd_map)[i]; - - tv_cmd->tvc_sgl = kcalloc(VHOST_SCSI_PREALLOC_SGLS, - sizeof(struct scatterlist), - GFP_KERNEL); - if (!tv_cmd->tvc_sgl) { - pr_err("Unable to allocate tv_cmd->tvc_sgl\n"); - goto out; - } - - tv_cmd->tvc_upages = kcalloc(VHOST_SCSI_PREALLOC_UPAGES, - sizeof(struct page *), - GFP_KERNEL); - if (!tv_cmd->tvc_upages) { - pr_err("Unable to allocate tv_cmd->tvc_upages\n"); - goto out; - } - - tv_cmd->tvc_prot_sgl = kcalloc(VHOST_SCSI_PREALLOC_PROT_SGLS, - sizeof(struct scatterlist), - GFP_KERNEL); - if (!tv_cmd->tvc_prot_sgl) { - pr_err("Unable to allocate tv_cmd->tvc_prot_sgl\n"); - goto out; - } - } - return 0; -out: - vhost_scsi_free_cmd_map_res(se_sess); - return -ENOMEM; -} - static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg, const char *name) { @@ -1960,12 +2013,9 @@ static int vhost_scsi_make_nexus(struct vhost_scsi_tpg *tpg, * struct se_node_acl for the vhost_scsi struct se_portal_group with * the SCSI Initiator port name of the passed configfs group 'name'. */ - tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg, - VHOST_SCSI_DEFAULT_TAGS, - sizeof(struct vhost_scsi_cmd), + tv_nexus->tvn_se_sess = target_setup_session(&tpg->se_tpg, 0, 0, TARGET_PROT_DIN_PASS | TARGET_PROT_DOUT_PASS, - (unsigned char *)name, tv_nexus, - vhost_scsi_nexus_cb); + (unsigned char *)name, tv_nexus, NULL); if (IS_ERR(tv_nexus->tvn_se_sess)) { mutex_unlock(&tpg->tv_tpg_mutex); kfree(tv_nexus); @@ -2015,7 +2065,6 @@ static int vhost_scsi_drop_nexus(struct vhost_scsi_tpg *tpg) " %s Initiator Port: %s\n", vhost_scsi_dump_proto_id(tpg->tport), tv_nexus->tvn_se_sess->se_node_acl->initiatorname);
- vhost_scsi_free_cmd_map_res(se_sess); /* * Release the SCSI I_T Nexus to the emulated vhost Target Port */
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 47a3565e8bb14ec48a75b48daf57aa830e2691f8 ]
We might not do the final se_cmd put from vhost_scsi_complete_cmd_work. When the last put happens a little later then we could race where vhost_scsi_complete_cmd_work does vhost_signal, the guest runs and sends more IO, and vhost_scsi_handle_vq runs but does not find any free cmds.
This patch has us delay completing the cmd until the last lio core ref is dropped. We then know that once we signal to the guest that the cmd is completed that if it queues a new command it will find a free cmd.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Maurizio Lombardi mlombard@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-4-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/scsi.c | 42 +++++++++++++++--------------------------- 1 file changed, 15 insertions(+), 27 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index e31339be7dd78..5d8850f5aef16 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -322,7 +322,7 @@ static u32 vhost_scsi_tpg_get_inst_index(struct se_portal_group *se_tpg) return 1; }
-static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) +static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd) { struct vhost_scsi_cmd *tv_cmd = container_of(se_cmd, struct vhost_scsi_cmd, tvc_se_cmd); @@ -344,6 +344,16 @@ static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) vhost_scsi_put_inflight(inflight); }
+static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) +{ + struct vhost_scsi_cmd *cmd = container_of(se_cmd, + struct vhost_scsi_cmd, tvc_se_cmd); + struct vhost_scsi *vs = cmd->tvc_vhost; + + llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list); + vhost_work_queue(&vs->dev, &vs->vs_completion_work); +} + static u32 vhost_scsi_sess_get_index(struct se_session *se_sess) { return 0; @@ -366,28 +376,15 @@ static int vhost_scsi_get_cmd_state(struct se_cmd *se_cmd) return 0; }
-static void vhost_scsi_complete_cmd(struct vhost_scsi_cmd *cmd) -{ - struct vhost_scsi *vs = cmd->tvc_vhost; - - llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list); - - vhost_work_queue(&vs->dev, &vs->vs_completion_work); -} - static int vhost_scsi_queue_data_in(struct se_cmd *se_cmd) { - struct vhost_scsi_cmd *cmd = container_of(se_cmd, - struct vhost_scsi_cmd, tvc_se_cmd); - vhost_scsi_complete_cmd(cmd); + transport_generic_free_cmd(se_cmd, 0); return 0; }
static int vhost_scsi_queue_status(struct se_cmd *se_cmd) { - struct vhost_scsi_cmd *cmd = container_of(se_cmd, - struct vhost_scsi_cmd, tvc_se_cmd); - vhost_scsi_complete_cmd(cmd); + transport_generic_free_cmd(se_cmd, 0); return 0; }
@@ -433,15 +430,6 @@ vhost_scsi_allocate_evt(struct vhost_scsi *vs, return evt; }
-static void vhost_scsi_free_cmd(struct vhost_scsi_cmd *cmd) -{ - struct se_cmd *se_cmd = &cmd->tvc_se_cmd; - - /* TODO locking against target/backend threads? */ - transport_generic_free_cmd(se_cmd, 0); - -} - static int vhost_scsi_check_stop_free(struct se_cmd *se_cmd) { return target_put_sess_cmd(se_cmd); @@ -560,7 +548,7 @@ static void vhost_scsi_complete_cmd_work(struct vhost_work *work) } else pr_err("Faulted on virtio_scsi_cmd_resp\n");
- vhost_scsi_free_cmd(cmd); + vhost_scsi_release_cmd_res(se_cmd); }
vq = -1; @@ -1091,7 +1079,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) &prot_iter, exp_data_len, &data_iter))) { vq_err(vq, "Failed to map iov to sgl\n"); - vhost_scsi_release_cmd(&cmd->tvc_se_cmd); + vhost_scsi_release_cmd_res(&cmd->tvc_se_cmd); goto err; } }
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/scsi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 5d8850f5aef16..ed7dc6b998f65 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -898,6 +898,11 @@ vhost_scsi_get_req(struct vhost_virtqueue *vq, struct vhost_scsi_ctx *vc, return ret; }
+static u16 vhost_buf_to_lun(u8 *lun_buf) +{ + return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF; +} + static void vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) { @@ -1036,12 +1041,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) tag = vhost64_to_cpu(vq, v_req_pi.tag); task_attr = v_req_pi.task_attr; cdb = &v_req_pi.cdb[0]; - lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF; + lun = vhost_buf_to_lun(v_req_pi.lun); } else { tag = vhost64_to_cpu(vq, v_req.tag); task_attr = v_req.task_attr; cdb = &v_req.cdb[0]; - lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF; + lun = vhost_buf_to_lun(v_req.lun); } /* * Check that the received CDB size does not exceeded our
On 25/11/20 16:35, Sasha Levin wrote:
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This doesn't seem like stable material, does it?
Paolo
drivers/vhost/scsi.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index 5d8850f5aef16..ed7dc6b998f65 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -898,6 +898,11 @@ vhost_scsi_get_req(struct vhost_virtqueue *vq, struct vhost_scsi_ctx *vc, return ret; } +static u16 vhost_buf_to_lun(u8 *lun_buf) +{
- return ((lun_buf[2] << 8) | lun_buf[3]) & 0x3FFF;
+}
- static void vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) {
@@ -1036,12 +1041,12 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) tag = vhost64_to_cpu(vq, v_req_pi.tag); task_attr = v_req_pi.task_attr; cdb = &v_req_pi.cdb[0];
lun = ((v_req_pi.lun[2] << 8) | v_req_pi.lun[3]) & 0x3FFF;
} else { tag = vhost64_to_cpu(vq, v_req.tag); task_attr = v_req.task_attr; cdb = &v_req.cdb[0];lun = vhost_buf_to_lun(v_req_pi.lun);
lun = ((v_req.lun[2] << 8) | v_req.lun[3]) & 0x3FFF;
} /*lun = vhost_buf_to_lun(v_req.lun);
- Check that the received CDB size does not exceeded our
On Wed, Nov 25, 2020 at 06:48:21PM +0100, Paolo Bonzini wrote:
On 25/11/20 16:35, Sasha Levin wrote:
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This doesn't seem like stable material, does it?
It went in as a dependency for efd838fec17b ("vhost scsi: Add support for LUN resets."), which is the next patch.
On 25/11/20 19:01, Sasha Levin wrote:
On Wed, Nov 25, 2020 at 06:48:21PM +0100, Paolo Bonzini wrote:
On 25/11/20 16:35, Sasha Levin wrote:
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@...
Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This doesn't seem like stable material, does it?
It went in as a dependency for efd838fec17b ("vhost scsi: Add support for LUN resets."), which is the next patch.
Which doesn't seem to be suitable for stable either... Patch 3/5 in the series might be (vhost scsi: fix cmd completion race), so I can understand including 1/5 and 2/5 just in case, but not the rest. Does the bot not understand diffstats?
Paolo
On Wed, Nov 25, 2020 at 07:08:54PM +0100, Paolo Bonzini wrote:
On 25/11/20 19:01, Sasha Levin wrote:
On Wed, Nov 25, 2020 at 06:48:21PM +0100, Paolo Bonzini wrote:
On 25/11/20 16:35, Sasha Levin wrote:
From: Mike Christie michael.christie@oracle.com
[ Upstream commit 18f1becb6948cd411fd01968a0a54af63732e73c ]
Move code to parse lun from req's lun_buf to helper, so tmf code can use it in the next patch.
Signed-off-by: Mike Christie michael.christie@oracle.com Reviewed-by: Paolo Bonzini pbonzini@redhat.com Acked-by: Jason Wang jasowang@redhat.com Link: https://lore.kernel.org/r/1604986403-4931-5-git-send-email-michael.christie@...
Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org
This doesn't seem like stable material, does it?
It went in as a dependency for efd838fec17b ("vhost scsi: Add support for LUN resets."), which is the next patch.
Which doesn't seem to be suitable for stable either... Patch 3/5 in
Why not? It was sent as a fix to Linus.
the series might be (vhost scsi: fix cmd completion race), so I can understand including 1/5 and 2/5 just in case, but not the rest. Does the bot not understand diffstats?
Not on their own, no. What's wrong with the diffstats?
On 29/11/20 05:13, Sasha Levin wrote:
Which doesn't seem to be suitable for stable either... Patch 3/5 in
Why not? It was sent as a fix to Linus.
Dunno, 120 lines of new code? Even if it's okay for an rc, I don't see why it is would be backported to stable releases and release it without any kind of testing. Maybe for 5.9 the chances of breaking things are low, but stuff like locking rules might have changed since older releases like 5.4 or 4.19. The autoselection bot does not know that, it basically crosses fingers that these larger-scale changes cause the patches not to apply or compile anymore.
Maybe it's just me, but the whole "autoselect stable patches" and release them is very suspicious. You are basically crossing fingers and are ready to release any kind of untested crap, because you do not trust maintainers of marking stable patches right. Only then, when a backport is broken, it's maintainers who get the blame and have to fix it.
Personally I don't care because I have asked you to opt KVM out of autoselection, but this is the opposite of what Greg brags about when he touts the virtues of the upstream stable process over vendor kernels.
Paolo
the series might be (vhost scsi: fix cmd completion race), so I can understand including 1/5 and 2/5 just in case, but not the rest. Does the bot not understand diffstats?
Not on their own, no. What's wrong with the diffstats?
On Sun, Nov 29, 2020 at 06:34:01PM +0100, Paolo Bonzini wrote:
On 29/11/20 05:13, Sasha Levin wrote:
Which doesn't seem to be suitable for stable either... Patch 3/5 in
Why not? It was sent as a fix to Linus.
Dunno, 120 lines of new code? Even if it's okay for an rc, I don't see why it is would be backported to stable releases and release it without any kind of testing. Maybe for 5.9 the chances of breaking
Lines of code is not everything. If you think that this needs additional testing then that's fine and we can drop it, but not picking up a fix just because it's 120 lines is not something we'd do.
things are low, but stuff like locking rules might have changed since older releases like 5.4 or 4.19. The autoselection bot does not know that, it basically crosses fingers that these larger-scale changes cause the patches not to apply or compile anymore.
Plus all the testing we have for the stable trees, yes. It goes beyond just compiling at this point.
Your very own co-workers (https://cki-project.org/) are pushing hard on this effort around stable kernel testing, and statements like these aren't helping anyone.
If on the other hand, you'd like to see specific KVM/virtio/etc tests as part of the stable release process, we should all work together to make sure they're included in the current test suite.
Maybe it's just me, but the whole "autoselect stable patches" and release them is very suspicious. You are basically crossing fingers
Historically autoselected patches were later fixed/reverted at a lower ratio than patches tagged with a stable tag. I *think* that it's because they get a longer review cycle than some of the stable tagged patches.
and are ready to release any kind of untested crap, because you do not trust maintainers of marking stable patches right. Only then, when a
It's not that I don't trust - some folks forget, or not realize that something should go in stable. We're all humans. This is to complement the work done by maintainers, not replace it.
backport is broken, it's maintainers who get the blame and have to fix it.
What blame? Who's blaming who?
Personally I don't care because I have asked you to opt KVM out of autoselection, but this is the opposite of what Greg brags about when he touts the virtues of the upstream stable process over vendor kernels.
What, that we try and include all fixes rather than the ones I'm paid to pick up?
If you have a vendor you pay $$$ to, then yes - you're probably better off with a vendor kernel. This is actually in line (I think) with Greg's views on this (http://kroah.com/log/blog/2018/08/24/what-stable-kernel-should-i-use/).
On 29/11/20 22:06, Sasha Levin wrote:
On Sun, Nov 29, 2020 at 06:34:01PM +0100, Paolo Bonzini wrote:
On 29/11/20 05:13, Sasha Levin wrote:
Which doesn't seem to be suitable for stable either... Patch 3/5 in
Why not? It was sent as a fix to Linus.
Dunno, 120 lines of new code? Even if it's okay for an rc, I don't see why it is would be backported to stable releases and release it without any kind of testing. Maybe for 5.9 the chances of breaking
Lines of code is not everything. If you think that this needs additional testing then that's fine and we can drop it, but not picking up a fix just because it's 120 lines is not something we'd do.
Starting with the first two steps in stable-kernel-rules.rst:
Rules on what kind of patches are accepted, and which ones are not, into the "-stable" tree:
- It must be obviously correct and tested. - It cannot be bigger than 100 lines, with context.
Plus all the testing we have for the stable trees, yes. It goes beyond just compiling at this point.
Your very own co-workers (https://cki-project.org/) are pushing hard on this effort around stable kernel testing, and statements like these aren't helping anyone.
I am not aware of any public CI being done _at all_ done on vhost-scsi, by CKI or everyone else. So autoselection should be done only on subsystems that have very high coverage in CI.
Paolo
On Mon, Nov 30, 2020 at 09:33:46AM +0100, Paolo Bonzini wrote:
On 29/11/20 22:06, Sasha Levin wrote:
On Sun, Nov 29, 2020 at 06:34:01PM +0100, Paolo Bonzini wrote:
On 29/11/20 05:13, Sasha Levin wrote:
Which doesn't seem to be suitable for stable either... Patch 3/5 in
Why not? It was sent as a fix to Linus.
Dunno, 120 lines of new code? Even if it's okay for an rc, I don't see why it is would be backported to stable releases and release it without any kind of testing. Maybe for 5.9 the chances of breaking
Lines of code is not everything. If you think that this needs additional testing then that's fine and we can drop it, but not picking up a fix just because it's 120 lines is not something we'd do.
Starting with the first two steps in stable-kernel-rules.rst:
Rules on what kind of patches are accepted, and which ones are not, into the "-stable" tree:
- It must be obviously correct and tested.
- It cannot be bigger than 100 lines, with context.
We do obviously take patches that are bigger than 100 lines, as there are always exceptions to the rules here. Look at all of the spectre/meltdown patches as one such example. Should we refuse a patch just because it fixes a real issue yet is 101 lines long?
thanks,
greg k-h
On 30/11/20 14:28, Greg KH wrote:
Lines of code is not everything. If you think that this needs additional testing then that's fine and we can drop it, but not picking up a fix just because it's 120 lines is not something we'd do.
Starting with the first two steps in stable-kernel-rules.rst:
Rules on what kind of patches are accepted, and which ones are not, into the "-stable" tree:
- It must be obviously correct and tested.
- It cannot be bigger than 100 lines, with context.
We do obviously take patches that are bigger than 100 lines, as there are always exceptions to the rules here. Look at all of the spectre/meltdown patches as one such example. Should we refuse a patch just because it fixes a real issue yet is 101 lines long?
Every patch should be "fixing a real issue"---even a new feature. But the larger the patch, the more the submitters and maintainers should be trusted rather than a bot. The line between feature and bugfix _sometimes_ is blurry, I would say that in this case it's not, and it makes me question how the bot decided that this patch would be acceptable for stable (which AFAIK is not something that can be answered).
Paolo
On Mon, Nov 30, 2020 at 02:52:11PM +0100, Paolo Bonzini wrote:
On 30/11/20 14:28, Greg KH wrote:
Lines of code is not everything. If you think that this needs additional testing then that's fine and we can drop it, but not picking up a fix just because it's 120 lines is not something we'd do.
Starting with the first two steps in stable-kernel-rules.rst:
Rules on what kind of patches are accepted, and which ones are not, into the "-stable" tree:
- It must be obviously correct and tested.
- It cannot be bigger than 100 lines, with context.
We do obviously take patches that are bigger than 100 lines, as there are always exceptions to the rules here. Look at all of the spectre/meltdown patches as one such example. Should we refuse a patch just because it fixes a real issue yet is 101 lines long?
Every patch should be "fixing a real issue"---even a new feature. But the larger the patch, the more the submitters and maintainers should be trusted rather than a bot. The line between feature and bugfix _sometimes_ is blurry, I would say that in this case it's not, and it makes me question how the bot decided that this patch would be acceptable for stable (which AFAIK is not something that can be answered).
I thought that earlier Sasha said that this patch was needed as a prerequisite patch for a later fix, right? If not, sorry, I've lost the train of thought in this thread...
thanks,
greg k-h
On 30/11/20 14:57, Greg KH wrote:
Every patch should be "fixing a real issue"---even a new feature. But the larger the patch, the more the submitters and maintainers should be trusted rather than a bot. The line between feature and bugfix_sometimes_ is blurry, I would say that in this case it's not, and it makes me question how the bot decided that this patch would be acceptable for stable (which AFAIK is not something that can be answered).
I thought that earlier Sasha said that this patch was needed as a prerequisite patch for a later fix, right? If not, sorry, I've lost the train of thought in this thread...
Yeah---sorry I am replying to 22/33 but referring to 23/33, which is the one that in my opinion should not be blindly accepted for stable kernels without the agreement of the submitter or maintainer.
Paolo
On Mon, Nov 30, 2020 at 03:00:13PM +0100, Paolo Bonzini wrote:
On 30/11/20 14:57, Greg KH wrote:
Every patch should be "fixing a real issue"---even a new feature. But the larger the patch, the more the submitters and maintainers should be trusted rather than a bot. The line between feature and bugfix_sometimes_ is blurry, I would say that in this case it's not, and it makes me question how the bot decided that this patch would be acceptable for stable (which AFAIK is not something that can be answered).
I thought that earlier Sasha said that this patch was needed as a prerequisite patch for a later fix, right? If not, sorry, I've lost the train of thought in this thread...
Yeah---sorry I am replying to 22/33 but referring to 23/33, which is the one that in my opinion should not be blindly accepted for stable kernels without the agreement of the submitter or maintainer.
But it's not "blindly", right? I've sent this review mail over a week ago, and if it goes into the queue there will be at least two more emails going out to the author/maintainers.
During all this time it gets tested by various entities who do things that go beyond simple boot testing.
I'd argue that the backports we push in the stable tree sometimes get tested and reviewed better than the commits that land upstream.
On Mon, Nov 30, 2020 at 09:33:46AM +0100, Paolo Bonzini wrote:
On 29/11/20 22:06, Sasha Levin wrote:
Plus all the testing we have for the stable trees, yes. It goes beyond just compiling at this point.
Your very own co-workers (https://cki-project.org/) are pushing hard on this effort around stable kernel testing, and statements like these aren't helping anyone.
I am not aware of any public CI being done _at all_ done on vhost-scsi, by CKI or everyone else. So autoselection should be done only on subsystems that have very high coverage in CI.
Where can I find a testsuite for virtio/vhost? I see one for KVM, but where is the one that the maintainers of virtio/vhost run on patches that come in?
On 30/11/20 18:38, Sasha Levin wrote:
I am not aware of any public CI being done _at all_ done on vhost-scsi, by CKI or everyone else. So autoselection should be done only on subsystems that have very high coverage in CI.
Where can I find a testsuite for virtio/vhost? I see one for KVM, but where is the one that the maintainers of virtio/vhost run on patches that come in?
I don't know of any, especially for vhost-scsi. MikeC?
Paolo
On 11/30/20 11:52 AM, Paolo Bonzini wrote:
On 30/11/20 18:38, Sasha Levin wrote:
I am not aware of any public CI being done _at all_ done on vhost-scsi, by CKI or everyone else. So autoselection should be done only on subsystems that have very high coverage in CI.
Where can I find a testsuite for virtio/vhost? I see one for KVM, but where is the one that the maintainers of virtio/vhost run on patches that come in?
I don't know of any, especially for vhost-scsi. MikeC?
Sorry for the late reply on the thread. I was out of the office.
I have never seen a public/open-source vhost-scsi testsuite.
For patch 23 (the one that adds the lun reset support which is built on patch 22), we can't add it to stable right now if you wanted to, because it has a bug in it. Michael T, sent the fix:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linu...
to Linus today.
On 30/11/20 20:44, Mike Christie wrote:
I have never seen a public/open-source vhost-scsi testsuite.
For patch 23 (the one that adds the lun reset support which is built on patch 22), we can't add it to stable right now if you wanted to, because it has a bug in it. Michael T, sent the fix:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linu...
to Linus today.
Ok, so at least it was only a close call and anyway not for something that most people would be running on their machines. But it still seems to me that the state of CI in Linux is abysmal compared to what is needed to arbitrarily(*) pick up patches and commit them to "stable" trees.
Paolo
(*) A ML bot is an arbitrary choice as far as we are concerned since we cannot know how it makes a decision.
On Mon, Nov 30, 2020 at 09:29:02PM +0100, Paolo Bonzini wrote:
On 30/11/20 20:44, Mike Christie wrote:
I have never seen a public/open-source vhost-scsi testsuite.
For patch 23 (the one that adds the lun reset support which is built on patch 22), we can't add it to stable right now if you wanted to, because it has a bug in it. Michael T, sent the fix:
https://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost.git/commit/?h=linu...
to Linus today.
Ok, so at least it was only a close call and anyway not for something that most people would be running on their machines. But it still seems to me that the state of CI in Linux is abysmal compared to what is needed to arbitrarily(*) pick up patches and commit them to "stable" trees.
Paolo
(*) A ML bot is an arbitrary choice as far as we are concerned since we cannot know how it makes a decision.
The choice of patches is "arbitrary", but the decision is human. The patches are reviewed coming out of the AI, sent to public mailing list(s) for review, followed by 2 reminders asking for reviews.
The process for AUTOSEL patches generally takes longer than most patches do for upstream.
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
On 01/12/20 00:59, Sasha Levin wrote:
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
The maintainers are _already_ marking patches with "Cc: stable". That (plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them.
Paolo
On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote:
On 01/12/20 00:59, Sasha Levin wrote:
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
The maintainers are _already_ marking patches with "Cc: stable". That
They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long.
(plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them.
This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them?
Are you saying that you have always gotten stable tags right? never missed a stable tag where one should go?
On Fri, 2020-12-04 at 10:49 -0500, Sasha Levin wrote:
On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote:
On 01/12/20 00:59, Sasha Levin wrote:
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
The maintainers are _already_ marking patches with "Cc: stable". That
They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long.
(plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them.
This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them?
Are you saying that you have always gotten stable tags right? never missed a stable tag where one should go?
I think this simply adds to the burden of being a maintainer without all that much value.
I think the primary value here would be getting people to upgrade to current versions rather than backporting to nominally stable and relatively actively changed old versions.
This is very much related to this thread about trivial patches and maintainer burdening:
https://lore.kernel.org/lkml/1c7d7fde126bc0acf825766de64bf2f9b888f216.camel@...
On 04/12/20 16:49, Sasha Levin wrote:
On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote:
On 01/12/20 00:59, Sasha Levin wrote:
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
The maintainers are _already_ marking patches with "Cc: stable". That
They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long.
That means some subsystems will be worse as far as stable release support goes. That's not a problem:
- some subsystems have people paid to do backports to LTS releases when patches don't apply; others don't, if the patch doesn't apply the bug is simply not fixed in LTS releases
- some subsystems are worse than others even in "normal" releases :)
(plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them.
This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them?
No, it's not the same. CI helps finding bugs before you have to waste time spending bisecting regressions across thousands of commits. The lack of stable tags _can_ certainly be a problem, but it solves itself sooner or later when people upgrade their kernel.
Are you saying that you have always gotten stable tags right? never missed a stable tag where one should go?
Of course I did, just like I have introduced bugs. But at least I try to do my best both at adding stable tags and at not introducing bugs.
Paolo
On Fri, Dec 04, 2020 at 06:08:13PM +0100, Paolo Bonzini wrote:
On 04/12/20 16:49, Sasha Levin wrote:
On Fri, Dec 04, 2020 at 09:27:28AM +0100, Paolo Bonzini wrote:
On 01/12/20 00:59, Sasha Levin wrote:
It's quite easy to NAK a patch too, just reply saying "no" and it'll be dropped (just like this patch was dropped right after your first reply) so the burden on maintainers is minimal.
The maintainers are _already_ marking patches with "Cc: stable". That
They're not, though. Some forget, some subsystems don't mark anything, some don't mark it as it's not stable material when it lands in their tree but then it turns out to be one if it sits there for too long.
That means some subsystems will be worse as far as stable release support goes. That's not a problem:
- some subsystems have people paid to do backports to LTS releases
when patches don't apply; others don't, if the patch doesn't apply the bug is simply not fixed in LTS releases
Why not? A warning mail is originated and folks fix those up. I fixed a whole bunch of these myself for subsystems I'm not "paid" to do so.
- some subsystems are worse than others even in "normal" releases :)
Agree with that.
(plus backports) is where the burden on maintainers should start and end. I don't see the need to second guess them.
This is similar to describing our CI infrastructure as "second guessing": why are we second guessing authors and maintainers who are obviously doing the right thing by testing their patches and reporting issues to them?
No, it's not the same. CI helps finding bugs before you have to waste time spending bisecting regressions across thousands of commits. The lack of stable tags _can_ certainly be a problem, but it solves itself sooner or later when people upgrade their kernel.
If just waiting with fixing issues is ok until a user might "eventually" upgrade is acceptable then why bother with a stable tree to begin with?
From: Mike Christie michael.christie@oracle.com
[ Upstream commit efd838fec17bd8756da852a435800a7e6281bfbc ]
In newer versions of virtio-scsi we just reset the timer when an a command times out, so TMFs are never sent for the cmd time out case. However, in older kernels and for the TMF inject cases, we can still get resets and we end up just failing immediately so the guest might see the device get offlined and IO errors.
For the older kernel cases, we want the same end result as the modern virtio-scsi driver where we let the lower levels fire their error handling and handle the problem. And at the upper levels we want to wait. This patch ties the LUN reset handling into the LIO TMF code which will just wait for outstanding commands to complete like we are doing in the modern virtio-scsi case.
Note: I did not handle the ABORT case to keep this simple. For ABORTs LIO just waits on the cmd like how it does for the RESET case. If an ABORT fails, the guest OS ends up escalating to LUN RESET, so in the end we get the same behavior where we wait on the outstanding cmds.
Signed-off-by: Mike Christie michael.christie@oracle.com Link: https://lore.kernel.org/r/1604986403-4931-6-git-send-email-michael.christie@... Signed-off-by: Michael S. Tsirkin mst@redhat.com Acked-by: Stefan Hajnoczi stefanha@redhat.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/vhost/scsi.c | 147 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 134 insertions(+), 13 deletions(-)
diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index ed7dc6b998f65..f22fce5498626 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -139,6 +139,7 @@ struct vhost_scsi_tpg { struct se_portal_group se_tpg; /* Pointer back to vhost_scsi, protected by tv_tpg_mutex */ struct vhost_scsi *vhost_scsi; + struct list_head tmf_queue; };
struct vhost_scsi_tport { @@ -211,6 +212,20 @@ struct vhost_scsi { int vs_events_nr; /* num of pending events, protected by vq->mutex */ };
+struct vhost_scsi_tmf { + struct vhost_work vwork; + struct vhost_scsi_tpg *tpg; + struct vhost_scsi *vhost; + struct vhost_scsi_virtqueue *svq; + struct list_head queue_entry; + + struct se_cmd se_cmd; + struct vhost_scsi_inflight *inflight; + struct iovec resp_iov; + int in_iovs; + int vq_desc; +}; + /* * Context for processing request and control queue operations. */ @@ -344,14 +359,32 @@ static void vhost_scsi_release_cmd_res(struct se_cmd *se_cmd) vhost_scsi_put_inflight(inflight); }
+static void vhost_scsi_release_tmf_res(struct vhost_scsi_tmf *tmf) +{ + struct vhost_scsi_tpg *tpg = tmf->tpg; + struct vhost_scsi_inflight *inflight = tmf->inflight; + + mutex_lock(&tpg->tv_tpg_mutex); + list_add_tail(&tpg->tmf_queue, &tmf->queue_entry); + mutex_unlock(&tpg->tv_tpg_mutex); + vhost_scsi_put_inflight(inflight); +} + static void vhost_scsi_release_cmd(struct se_cmd *se_cmd) { - struct vhost_scsi_cmd *cmd = container_of(se_cmd, + if (se_cmd->se_cmd_flags & SCF_SCSI_TMR_CDB) { + struct vhost_scsi_tmf *tmf = container_of(se_cmd, + struct vhost_scsi_tmf, se_cmd); + + vhost_work_queue(&tmf->vhost->dev, &tmf->vwork); + } else { + struct vhost_scsi_cmd *cmd = container_of(se_cmd, struct vhost_scsi_cmd, tvc_se_cmd); - struct vhost_scsi *vs = cmd->tvc_vhost; + struct vhost_scsi *vs = cmd->tvc_vhost;
- llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list); - vhost_work_queue(&vs->dev, &vs->vs_completion_work); + llist_add(&cmd->tvc_completion_list, &vs->vs_completion_list); + vhost_work_queue(&vs->dev, &vs->vs_completion_work); + } }
static u32 vhost_scsi_sess_get_index(struct se_session *se_sess) @@ -390,7 +423,10 @@ static int vhost_scsi_queue_status(struct se_cmd *se_cmd)
static void vhost_scsi_queue_tm_rsp(struct se_cmd *se_cmd) { - return; + struct vhost_scsi_tmf *tmf = container_of(se_cmd, struct vhost_scsi_tmf, + se_cmd); + + transport_generic_free_cmd(&tmf->se_cmd, 0); }
static void vhost_scsi_aborted_task(struct se_cmd *se_cmd) @@ -1120,9 +1156,9 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) }
static void -vhost_scsi_send_tmf_reject(struct vhost_scsi *vs, - struct vhost_virtqueue *vq, - struct vhost_scsi_ctx *vc) +vhost_scsi_send_tmf_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq, + int in_iovs, int vq_desc, struct iovec *resp_iov, + int tmf_resp_code) { struct virtio_scsi_ctrl_tmf_resp rsp; struct iov_iter iov_iter; @@ -1130,17 +1166,87 @@ vhost_scsi_send_tmf_reject(struct vhost_scsi *vs,
pr_debug("%s\n", __func__); memset(&rsp, 0, sizeof(rsp)); - rsp.response = VIRTIO_SCSI_S_FUNCTION_REJECTED; + rsp.response = tmf_resp_code;
- iov_iter_init(&iov_iter, READ, &vq->iov[vc->out], vc->in, sizeof(rsp)); + iov_iter_init(&iov_iter, READ, resp_iov, in_iovs, sizeof(rsp));
ret = copy_to_iter(&rsp, sizeof(rsp), &iov_iter); if (likely(ret == sizeof(rsp))) - vhost_add_used_and_signal(&vs->dev, vq, vc->head, 0); + vhost_add_used_and_signal(&vs->dev, vq, vq_desc, 0); else pr_err("Faulted on virtio_scsi_ctrl_tmf_resp\n"); }
+static void vhost_scsi_tmf_resp_work(struct vhost_work *work) +{ + struct vhost_scsi_tmf *tmf = container_of(work, struct vhost_scsi_tmf, + vwork); + int resp_code; + + if (tmf->se_cmd.se_tmr_req->response == TMR_FUNCTION_COMPLETE) + resp_code = VIRTIO_SCSI_S_FUNCTION_SUCCEEDED; + else + resp_code = VIRTIO_SCSI_S_FUNCTION_REJECTED; + + vhost_scsi_send_tmf_resp(tmf->vhost, &tmf->svq->vq, tmf->in_iovs, + tmf->vq_desc, &tmf->resp_iov, resp_code); + vhost_scsi_release_tmf_res(tmf); +} + +static void +vhost_scsi_handle_tmf(struct vhost_scsi *vs, struct vhost_scsi_tpg *tpg, + struct vhost_virtqueue *vq, + struct virtio_scsi_ctrl_tmf_req *vtmf, + struct vhost_scsi_ctx *vc) +{ + struct vhost_scsi_virtqueue *svq = container_of(vq, + struct vhost_scsi_virtqueue, vq); + struct vhost_scsi_tmf *tmf; + + if (vhost32_to_cpu(vq, vtmf->subtype) != + VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET) + goto send_reject; + + if (!tpg->tpg_nexus || !tpg->tpg_nexus->tvn_se_sess) { + pr_err("Unable to locate active struct vhost_scsi_nexus for LUN RESET.\n"); + goto send_reject; + } + + mutex_lock(&tpg->tv_tpg_mutex); + if (list_empty(&tpg->tmf_queue)) { + pr_err("Missing reserve TMF. Could not handle LUN RESET.\n"); + mutex_unlock(&tpg->tv_tpg_mutex); + goto send_reject; + } + + tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf, + queue_entry); + list_del_init(&tmf->queue_entry); + mutex_unlock(&tpg->tv_tpg_mutex); + + tmf->tpg = tpg; + tmf->vhost = vs; + tmf->svq = svq; + tmf->resp_iov = vq->iov[vc->out]; + tmf->vq_desc = vc->head; + tmf->in_iovs = vc->in; + tmf->inflight = vhost_scsi_get_inflight(vq); + + if (target_submit_tmr(&tmf->se_cmd, tpg->tpg_nexus->tvn_se_sess, NULL, + vhost_buf_to_lun(vtmf->lun), NULL, + TMR_LUN_RESET, GFP_KERNEL, 0, + TARGET_SCF_ACK_KREF) < 0) { + vhost_scsi_release_tmf_res(tmf); + goto send_reject; + } + + return; + +send_reject: + vhost_scsi_send_tmf_resp(vs, vq, vc->in, vc->head, &vq->iov[vc->out], + VIRTIO_SCSI_S_FUNCTION_REJECTED); +} + static void vhost_scsi_send_an_resp(struct vhost_scsi *vs, struct vhost_virtqueue *vq, @@ -1166,6 +1272,7 @@ vhost_scsi_send_an_resp(struct vhost_scsi *vs, static void vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) { + struct vhost_scsi_tpg *tpg; union { __virtio32 type; struct virtio_scsi_ctrl_an_req an; @@ -1247,12 +1354,12 @@ vhost_scsi_ctl_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) vc.req += typ_size; vc.req_size -= typ_size;
- ret = vhost_scsi_get_req(vq, &vc, NULL); + ret = vhost_scsi_get_req(vq, &vc, &tpg); if (ret) goto err;
if (v_req.type == VIRTIO_SCSI_T_TMF) - vhost_scsi_send_tmf_reject(vs, vq, &vc); + vhost_scsi_handle_tmf(vs, tpg, vq, &v_req.tmf, &vc); else vhost_scsi_send_an_resp(vs, vq, &vc); err: @@ -1913,11 +2020,19 @@ static int vhost_scsi_port_link(struct se_portal_group *se_tpg, { struct vhost_scsi_tpg *tpg = container_of(se_tpg, struct vhost_scsi_tpg, se_tpg); + struct vhost_scsi_tmf *tmf; + + tmf = kzalloc(sizeof(*tmf), GFP_KERNEL); + if (!tmf) + return -ENOMEM; + INIT_LIST_HEAD(&tmf->queue_entry); + vhost_work_init(&tmf->vwork, vhost_scsi_tmf_resp_work);
mutex_lock(&vhost_scsi_mutex);
mutex_lock(&tpg->tv_tpg_mutex); tpg->tv_tpg_port_count++; + list_add_tail(&tmf->queue_entry, &tpg->tmf_queue); mutex_unlock(&tpg->tv_tpg_mutex);
vhost_scsi_hotplug(tpg, lun); @@ -1932,11 +2047,16 @@ static void vhost_scsi_port_unlink(struct se_portal_group *se_tpg, { struct vhost_scsi_tpg *tpg = container_of(se_tpg, struct vhost_scsi_tpg, se_tpg); + struct vhost_scsi_tmf *tmf;
mutex_lock(&vhost_scsi_mutex);
mutex_lock(&tpg->tv_tpg_mutex); tpg->tv_tpg_port_count--; + tmf = list_first_entry(&tpg->tmf_queue, struct vhost_scsi_tmf, + queue_entry); + list_del(&tmf->queue_entry); + kfree(tmf); mutex_unlock(&tpg->tv_tpg_mutex);
vhost_scsi_hotunplug(tpg, lun); @@ -2197,6 +2317,7 @@ vhost_scsi_make_tpg(struct se_wwn *wwn, const char *name) } mutex_init(&tpg->tv_tpg_mutex); INIT_LIST_HEAD(&tpg->tv_tpg_list); + INIT_LIST_HEAD(&tpg->tmf_queue); tpg->tport = tport; tpg->tport_tpgt = tpgt;
From: Dmitry Osipenko digetx@gmail.com
[ Upstream commit c39de538a06e76d89b7e598a71e16688009cd56c ]
Annotate tegra_pm_set[clear]_cpu_in_lp2() with RCU_NONIDLE in order to fix lockdep warning about suspicious RCU usage of a spinlock during late idling phase.
WARNING: suspicious RCU usage ... include/trace/events/lock.h:13 suspicious rcu_dereference_check() usage! ... (dump_stack) from (lock_acquire) (lock_acquire) from (_raw_spin_lock) (_raw_spin_lock) from (tegra_pm_set_cpu_in_lp2) (tegra_pm_set_cpu_in_lp2) from (tegra_cpuidle_enter) (tegra_cpuidle_enter) from (cpuidle_enter_state) (cpuidle_enter_state) from (cpuidle_enter_state_coupled) (cpuidle_enter_state_coupled) from (cpuidle_enter) (cpuidle_enter) from (do_idle) ...
Tested-by: Peter Geis pgwipeout@gmail.com Reported-by: Peter Geis pgwipeout@gmail.com Signed-off-by: Dmitry Osipenko digetx@gmail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/cpuidle/cpuidle-tegra.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/cpuidle/cpuidle-tegra.c b/drivers/cpuidle/cpuidle-tegra.c index e8956706a2917..191966dc8d023 100644 --- a/drivers/cpuidle/cpuidle-tegra.c +++ b/drivers/cpuidle/cpuidle-tegra.c @@ -189,7 +189,7 @@ static int tegra_cpuidle_state_enter(struct cpuidle_device *dev, }
local_fiq_disable(); - tegra_pm_set_cpu_in_lp2(); + RCU_NONIDLE(tegra_pm_set_cpu_in_lp2()); cpu_pm_enter();
switch (index) { @@ -207,7 +207,7 @@ static int tegra_cpuidle_state_enter(struct cpuidle_device *dev, }
cpu_pm_exit(); - tegra_pm_clear_cpu_in_lp2(); + RCU_NONIDLE(tegra_pm_clear_cpu_in_lp2()); local_fiq_enable();
return err ?: index;
From: Sugar Zhang sugar.zhang@rock-chips.com
[ Upstream commit e773ca7da8beeca7f17fe4c9d1284a2b66839cc1 ]
Actually, burst size is equal to '1 << desc->rqcfg.brst_size'. we should use burst size, not desc->rqcfg.brst_size.
dma memcpy performance on Rockchip RV1126 @ 1512MHz A7, 1056MHz LPDDR3, 200MHz DMA:
dmatest:
/# echo dma0chan0 > /sys/module/dmatest/parameters/channel /# echo 4194304 > /sys/module/dmatest/parameters/test_buf_size /# echo 8 > /sys/module/dmatest/parameters/iterations /# echo y > /sys/module/dmatest/parameters/norandom /# echo y > /sys/module/dmatest/parameters/verbose /# echo 1 > /sys/module/dmatest/parameters/run
dmatest: dma0chan0-copy0: result #1: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #2: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #3: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #4: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #5: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #6: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #7: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000 dmatest: dma0chan0-copy0: result #8: 'test passed' with src_off=0x0 dst_off=0x0 len=0x400000
Before:
dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 48 iops 200338 KB/s (0)
After this patch:
dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 179 iops 734873 KB/s (0)
After this patch and increase dma clk to 400MHz:
dmatest: dma0chan0-copy0: summary 8 tests, 0 failures 259 iops 1062929 KB/s (0)
Signed-off-by: Sugar Zhang sugar.zhang@rock-chips.com Link: https://lore.kernel.org/r/1605326106-55681-1-git-send-email-sugar.zhang@rock... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/dma/pl330.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index 5274a0704d960..2c3c47e4f7770 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -2802,7 +2802,7 @@ pl330_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dst, * If burst size is smaller than bus width then make sure we only * transfer one at a time to avoid a burst stradling an MFIFO entry. */ - if (desc->rqcfg.brst_size * 8 < pl330->pcfg.data_bus_width) + if (burst * 8 < pl330->pcfg.data_bus_width) desc->rqcfg.brst_len = 1;
desc->bytes_requested = len;
From: Lee Duncan lduncan@suse.com
[ Upstream commit fe0a8a95e7134d0b44cd407bc0085b9ba8d8fe31 ]
iSCSI NOPs are sometimes "lost", mistakenly sent to the user-land iscsid daemon instead of handled in the kernel, as they should be, resulting in a message from the daemon like:
iscsid: Got nop in, but kernel supports nop handling.
This can occur because of the new forward- and back-locks, and the fact that an iSCSI NOP response can occur before processing of the NOP send is complete. This can result in "conn->ping_task" being NULL in iscsi_nop_out_rsp(), when the pointer is actually in the process of being set.
To work around this, we add a new state to the "ping_task" pointer. In addition to NULL (not assigned) and a pointer (assigned), we add the state "being set", which is signaled with an INVALID pointer (using "-1").
Link: https://lore.kernel.org/r/20201106193317.16993-1-leeman.duncan@gmail.com Reviewed-by: Mike Christie michael.christie@oracle.com Signed-off-by: Lee Duncan lduncan@suse.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/scsi/libiscsi.c | 23 +++++++++++++++-------- include/scsi/libiscsi.h | 3 +++ 2 files changed, 18 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c index 1e9c3171fa9f4..f9314f1393fbd 100644 --- a/drivers/scsi/libiscsi.c +++ b/drivers/scsi/libiscsi.c @@ -533,8 +533,8 @@ static void iscsi_complete_task(struct iscsi_task *task, int state) if (conn->task == task) conn->task = NULL;
- if (conn->ping_task == task) - conn->ping_task = NULL; + if (READ_ONCE(conn->ping_task) == task) + WRITE_ONCE(conn->ping_task, NULL);
/* release get from queueing */ __iscsi_put_task(task); @@ -738,6 +738,9 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr, task->conn->session->age); }
+ if (unlikely(READ_ONCE(conn->ping_task) == INVALID_SCSI_TASK)) + WRITE_ONCE(conn->ping_task, task); + if (!ihost->workq) { if (iscsi_prep_mgmt_task(conn, task)) goto free_task; @@ -941,8 +944,11 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr) struct iscsi_nopout hdr; struct iscsi_task *task;
- if (!rhdr && conn->ping_task) - return -EINVAL; + if (!rhdr) { + if (READ_ONCE(conn->ping_task)) + return -EINVAL; + WRITE_ONCE(conn->ping_task, INVALID_SCSI_TASK); + }
memset(&hdr, 0, sizeof(struct iscsi_nopout)); hdr.opcode = ISCSI_OP_NOOP_OUT | ISCSI_OP_IMMEDIATE; @@ -957,11 +963,12 @@ static int iscsi_send_nopout(struct iscsi_conn *conn, struct iscsi_nopin *rhdr)
task = __iscsi_conn_send_pdu(conn, (struct iscsi_hdr *)&hdr, NULL, 0); if (!task) { + if (!rhdr) + WRITE_ONCE(conn->ping_task, NULL); iscsi_conn_printk(KERN_ERR, conn, "Could not send nopout\n"); return -EIO; } else if (!rhdr) { /* only track our nops */ - conn->ping_task = task; conn->last_ping = jiffies; }
@@ -984,7 +991,7 @@ static int iscsi_nop_out_rsp(struct iscsi_task *task, struct iscsi_conn *conn = task->conn; int rc = 0;
- if (conn->ping_task != task) { + if (READ_ONCE(conn->ping_task) != task) { /* * If this is not in response to one of our * nops then it must be from userspace. @@ -1923,7 +1930,7 @@ static void iscsi_start_tx(struct iscsi_conn *conn) */ static int iscsi_has_ping_timed_out(struct iscsi_conn *conn) { - if (conn->ping_task && + if (READ_ONCE(conn->ping_task) && time_before_eq(conn->last_recv + (conn->recv_timeout * HZ) + (conn->ping_timeout * HZ), jiffies)) return 1; @@ -2058,7 +2065,7 @@ enum blk_eh_timer_return iscsi_eh_cmd_timed_out(struct scsi_cmnd *sc) * Checking the transport already or nop from a cmd timeout still * running */ - if (conn->ping_task) { + if (READ_ONCE(conn->ping_task)) { task->have_checked_conn = true; rc = BLK_EH_RESET_TIMER; goto done; diff --git a/include/scsi/libiscsi.h b/include/scsi/libiscsi.h index c25fb86ffae95..b3bbd10eb3f07 100644 --- a/include/scsi/libiscsi.h +++ b/include/scsi/libiscsi.h @@ -132,6 +132,9 @@ struct iscsi_task { void *dd_data; /* driver/transport data */ };
+/* invalid scsi_task pointer */ +#define INVALID_SCSI_TASK (struct iscsi_task *)-1l + static inline int iscsi_task_has_unsol_data(struct iscsi_task *task) { return task->unsol_r2t.data_length > task->unsol_r2t.sent;
From: Mike Christie michael.christie@oracle.com
[ Upstream commit f36199355c64a39fe82cfddc7623d827c7e050da ]
Maurizio found a race where the abort and cmd stop paths can race as follows:
1. thread1 runs iscsit_release_commands_from_conn and sets CMD_T_FABRIC_STOP.
2. thread2 runs iscsit_aborted_task and then does __iscsit_free_cmd. It then returns from the aborted_task callout and we finish target_handle_abort and do:
target_handle_abort -> transport_cmd_check_stop_to_fabric -> lio_check_stop_free -> target_put_sess_cmd
The cmd is now freed.
3. thread1 now finishes iscsit_release_commands_from_conn and runs iscsit_free_cmd while accessing a command we just released.
In __target_check_io_state we check for CMD_T_FABRIC_STOP and set the CMD_T_ABORTED if the driver is not cleaning up the cmd because of a session shutdown. However, iscsit_release_commands_from_conn only sets the CMD_T_FABRIC_STOP and does not check to see if the abort path has claimed completion ownership of the command.
This adds a check in iscsit_release_commands_from_conn so only the abort or fabric stop path cleanup the command.
Link: https://lore.kernel.org/r/1605318378-9269-1-git-send-email-michael.christie@... Reported-by: Maurizio Lombardi mlombard@redhat.com Reviewed-by: Maurizio Lombardi mlombard@redhat.com Signed-off-by: Mike Christie michael.christie@oracle.com Signed-off-by: Martin K. Petersen martin.petersen@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/target/iscsi/iscsi_target.c | 17 +++++++++++++---- 1 file changed, 13 insertions(+), 4 deletions(-)
diff --git a/drivers/target/iscsi/iscsi_target.c b/drivers/target/iscsi/iscsi_target.c index 7b56fe9f10628..2e18ec42c7045 100644 --- a/drivers/target/iscsi/iscsi_target.c +++ b/drivers/target/iscsi/iscsi_target.c @@ -483,8 +483,7 @@ EXPORT_SYMBOL(iscsit_queue_rsp); void iscsit_aborted_task(struct iscsi_conn *conn, struct iscsi_cmd *cmd) { spin_lock_bh(&conn->cmd_lock); - if (!list_empty(&cmd->i_conn_node) && - !(cmd->se_cmd.transport_state & CMD_T_FABRIC_STOP)) + if (!list_empty(&cmd->i_conn_node)) list_del_init(&cmd->i_conn_node); spin_unlock_bh(&conn->cmd_lock);
@@ -4083,12 +4082,22 @@ static void iscsit_release_commands_from_conn(struct iscsi_conn *conn) spin_lock_bh(&conn->cmd_lock); list_splice_init(&conn->conn_cmd_list, &tmp_list);
- list_for_each_entry(cmd, &tmp_list, i_conn_node) { + list_for_each_entry_safe(cmd, cmd_tmp, &tmp_list, i_conn_node) { struct se_cmd *se_cmd = &cmd->se_cmd;
if (se_cmd->se_tfo != NULL) { spin_lock_irq(&se_cmd->t_state_lock); - se_cmd->transport_state |= CMD_T_FABRIC_STOP; + if (se_cmd->transport_state & CMD_T_ABORTED) { + /* + * LIO's abort path owns the cleanup for this, + * so put it back on the list and let + * aborted_task handle it. + */ + list_move_tail(&cmd->i_conn_node, + &conn->conn_cmd_list); + } else { + se_cmd->transport_state |= CMD_T_FABRIC_STOP; + } spin_unlock_irq(&se_cmd->t_state_lock); } }
From: Boqun Feng boqun.feng@gmail.com
[ Upstream commit 43be4388e94b915799a24f0eaf664bf95b85231f ]
A warning was hit when running xfstests/generic/068 in a Hyper-V guest:
[...] ------------[ cut here ]------------ [...] DEBUG_LOCKS_WARN_ON(lockdep_hardirqs_enabled()) [...] WARNING: CPU: 2 PID: 1350 at kernel/locking/lockdep.c:5280 check_flags.part.0+0x165/0x170 [...] ... [...] Workqueue: events pwq_unbound_release_workfn [...] RIP: 0010:check_flags.part.0+0x165/0x170 [...] ... [...] Call Trace: [...] lock_is_held_type+0x72/0x150 [...] ? lock_acquire+0x16e/0x4a0 [...] rcu_read_lock_sched_held+0x3f/0x80 [...] __send_ipi_one+0x14d/0x1b0 [...] hv_send_ipi+0x12/0x30 [...] __pv_queued_spin_unlock_slowpath+0xd1/0x110 [...] __raw_callee_save___pv_queued_spin_unlock_slowpath+0x11/0x20 [...] .slowpath+0x9/0xe [...] lockdep_unregister_key+0x128/0x180 [...] pwq_unbound_release_workfn+0xbb/0xf0 [...] process_one_work+0x227/0x5c0 [...] worker_thread+0x55/0x3c0 [...] ? process_one_work+0x5c0/0x5c0 [...] kthread+0x153/0x170 [...] ? __kthread_bind_mask+0x60/0x60 [...] ret_from_fork+0x1f/0x30
The cause of the problem is we have call chain lockdep_unregister_key() -> <irq disabled by raw_local_irq_save()> lockdep_unlock() -> arch_spin_unlock() -> __pv_queued_spin_unlock_slowpath() -> pv_kick() -> __send_ipi_one() -> trace_hyperv_send_ipi_one().
Although this particular warning is triggered because Hyper-V has a trace point in ipi sending, but in general arch_spin_unlock() may call another function having a trace point in it, so put the arch_spin_lock() and arch_spin_unlock() after lock_recursion protection to fix this problem and avoid similiar problems.
Signed-off-by: Boqun Feng boqun.feng@gmail.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Link: https://lkml.kernel.org/r/20201113110512.1056501-1-boqun.feng@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- kernel/locking/lockdep.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 3eb35ad1b5241..04134a242f3d5 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -108,19 +108,21 @@ static inline void lockdep_lock(void) { DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+ __this_cpu_inc(lockdep_recursion); arch_spin_lock(&__lock); __owner = current; - __this_cpu_inc(lockdep_recursion); }
static inline void lockdep_unlock(void) { + DEBUG_LOCKS_WARN_ON(!irqs_disabled()); + if (debug_locks && DEBUG_LOCKS_WARN_ON(__owner != current)) return;
- __this_cpu_dec(lockdep_recursion); __owner = NULL; arch_spin_unlock(&__lock); + __this_cpu_dec(lockdep_recursion); }
static inline bool lockdep_assert_locked(void)
From: Sami Tolvanen samitolvanen@google.com
[ Upstream commit ebd19fc372e3e78bf165f230e7c084e304441c08 ]
This change switches rapl to use PMU_FORMAT_ATTR, and fixes two other macros to use device_attribute instead of kobj_attribute to avoid callback type mismatches that trip indirect call checking with Clang's Control-Flow Integrity (CFI).
Reported-by: Sedat Dilek sedat.dilek@gmail.com Signed-off-by: Sami Tolvanen samitolvanen@google.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Kees Cook keescook@chromium.org Link: https://lkml.kernel.org/r/20201113183126.1239404-1-samitolvanen@google.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/events/intel/cstate.c | 6 +++--- arch/x86/events/intel/uncore.c | 4 ++-- arch/x86/events/intel/uncore.h | 12 ++++++------ arch/x86/events/rapl.c | 14 +------------- 4 files changed, 12 insertions(+), 24 deletions(-)
diff --git a/arch/x86/events/intel/cstate.c b/arch/x86/events/intel/cstate.c index 442e1ed4acd49..4eb7ee5fed72d 100644 --- a/arch/x86/events/intel/cstate.c +++ b/arch/x86/events/intel/cstate.c @@ -107,14 +107,14 @@ MODULE_LICENSE("GPL");
#define DEFINE_CSTATE_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __cstate_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ +static ssize_t __cstate_##_var##_show(struct device *dev, \ + struct device_attribute *attr, \ char *page) \ { \ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ return sprintf(page, _format "\n"); \ } \ -static struct kobj_attribute format_attr_##_var = \ +static struct device_attribute format_attr_##_var = \ __ATTR(_name, 0444, __cstate_##_var##_show, NULL)
static ssize_t cstate_get_attr_cpumask(struct device *dev, diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index d5c6d3b340c50..803601baa753d 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -92,8 +92,8 @@ struct pci2phy_map *__find_pci2phy_map(int segment) return map; }
-ssize_t uncore_event_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf) +ssize_t uncore_event_show(struct device *dev, + struct device_attribute *attr, char *buf) { struct uncore_event_desc *event = container_of(attr, struct uncore_event_desc, attr); diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h index 105fdc69825eb..c5744783e05d0 100644 --- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -157,7 +157,7 @@ struct intel_uncore_box { #define UNCORE_BOX_FLAG_CFL8_CBOX_MSR_OFFS 2
struct uncore_event_desc { - struct kobj_attribute attr; + struct device_attribute attr; const char *config; };
@@ -179,8 +179,8 @@ struct pci2phy_map { struct pci2phy_map *__find_pci2phy_map(int segment); int uncore_pcibus_to_physid(struct pci_bus *bus);
-ssize_t uncore_event_show(struct kobject *kobj, - struct kobj_attribute *attr, char *buf); +ssize_t uncore_event_show(struct device *dev, + struct device_attribute *attr, char *buf);
static inline struct intel_uncore_pmu *dev_to_uncore_pmu(struct device *dev) { @@ -201,14 +201,14 @@ extern int __uncore_max_dies; }
#define DEFINE_UNCORE_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __uncore_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ +static ssize_t __uncore_##_var##_show(struct device *dev, \ + struct device_attribute *attr, \ char *page) \ { \ BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ return sprintf(page, _format "\n"); \ } \ -static struct kobj_attribute format_attr_##_var = \ +static struct device_attribute format_attr_##_var = \ __ATTR(_name, 0444, __uncore_##_var##_show, NULL)
static inline bool uncore_pmc_fixed(int idx) diff --git a/arch/x86/events/rapl.c b/arch/x86/events/rapl.c index 67b411f7e8c41..abaed36212250 100644 --- a/arch/x86/events/rapl.c +++ b/arch/x86/events/rapl.c @@ -93,18 +93,6 @@ static const char *const rapl_domain_names[NR_RAPL_DOMAINS] __initconst = { * any other bit is reserved */ #define RAPL_EVENT_MASK 0xFFULL - -#define DEFINE_RAPL_FORMAT_ATTR(_var, _name, _format) \ -static ssize_t __rapl_##_var##_show(struct kobject *kobj, \ - struct kobj_attribute *attr, \ - char *page) \ -{ \ - BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE); \ - return sprintf(page, _format "\n"); \ -} \ -static struct kobj_attribute format_attr_##_var = \ - __ATTR(_name, 0444, __rapl_##_var##_show, NULL) - #define RAPL_CNTR_WIDTH 32
#define RAPL_EVENT_ATTR_STR(_name, v, str) \ @@ -441,7 +429,7 @@ static struct attribute_group rapl_pmu_events_group = { .attrs = attrs_empty, };
-DEFINE_RAPL_FORMAT_ATTR(event, event, "config:0-7"); +PMU_FORMAT_ATTR(event, "config:0-7"); static struct attribute *rapl_formats_attr[] = { &format_attr_event.attr, NULL,
From: Laurent Pinchart laurent.pinchart@ideasonboard.com
[ Upstream commit dc293f2106903ab9c24e9cea18c276e32c394c33 ]
When adding __user annotations in commit 2adf5352a34a, the strncpy_from_user() function declaration for the CONFIG_GENERIC_STRNCPY_FROM_USER case was missed. Fix it.
Reported-by: kernel test robot lkp@intel.com Signed-off-by: Laurent Pinchart laurent.pinchart@ideasonboard.com Message-Id: 20200831210937.17938-1-laurent.pinchart@ideasonboard.com Signed-off-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/xtensa/include/asm/uaccess.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/xtensa/include/asm/uaccess.h b/arch/xtensa/include/asm/uaccess.h index b9758119feca1..5c9fb8005aa89 100644 --- a/arch/xtensa/include/asm/uaccess.h +++ b/arch/xtensa/include/asm/uaccess.h @@ -302,7 +302,7 @@ strncpy_from_user(char *dst, const char __user *src, long count) return -EFAULT; } #else -long strncpy_from_user(char *dst, const char *src, long count); +long strncpy_from_user(char *dst, const char __user *src, long count); #endif
/*
From: Thomas Gleixner tglx@linutronix.de
[ Upstream commit 860aaabac8235cfde10fe556aa82abbbe3117888 ]
sysrq-t ends up invoking show_opcodes() for each task which tries to access the user space code of other processes, which is obviously bogus.
It either manages to dump where the foreign task's regs->ip points to in a valid mapping of the current task or triggers a pagefault and prints "Code: Bad RIP value.". Both is just wrong.
Add a safeguard in copy_code() and check whether the @regs pointer matches currents pt_regs. If not, do not even try to access it.
While at it, add commentary why using copy_from_user_nmi() is safe in copy_code() even if the function name suggests otherwise.
Reported-by: Oleg Nesterov oleg@redhat.com Signed-off-by: Thomas Gleixner tglx@linutronix.de Signed-off-by: Borislav Petkov bp@suse.de Reviewed-by: Borislav Petkov bp@suse.de Acked-by: Oleg Nesterov oleg@redhat.com Tested-by: Borislav Petkov bp@suse.de Link: https://lkml.kernel.org/r/20201117202753.667274723@linutronix.de Signed-off-by: Sasha Levin sashal@kernel.org --- arch/x86/kernel/dumpstack.c | 23 +++++++++++++++++++---- 1 file changed, 19 insertions(+), 4 deletions(-)
diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c index ea8d51ec251bb..4da8345d34bb0 100644 --- a/arch/x86/kernel/dumpstack.c +++ b/arch/x86/kernel/dumpstack.c @@ -77,6 +77,9 @@ static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, if (!user_mode(regs)) return copy_from_kernel_nofault(buf, (u8 *)src, nbytes);
+ /* The user space code from other tasks cannot be accessed. */ + if (regs != task_pt_regs(current)) + return -EPERM; /* * Make sure userspace isn't trying to trick us into dumping kernel * memory by pointing the userspace instruction pointer at it. @@ -84,6 +87,12 @@ static int copy_code(struct pt_regs *regs, u8 *buf, unsigned long src, if (__chk_range_not_ok(src, nbytes, TASK_SIZE_MAX)) return -EINVAL;
+ /* + * Even if named copy_from_user_nmi() this can be invoked from + * other contexts and will not try to resolve a pagefault, which is + * the correct thing to do here as this code can be called from any + * context. + */ return copy_from_user_nmi(buf, (void __user *)src, nbytes); }
@@ -114,13 +123,19 @@ void show_opcodes(struct pt_regs *regs, const char *loglvl) u8 opcodes[OPCODE_BUFSIZE]; unsigned long prologue = regs->ip - PROLOGUE_SIZE;
- if (copy_code(regs, opcodes, prologue, sizeof(opcodes))) { - printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n", - loglvl, prologue); - } else { + switch (copy_code(regs, opcodes, prologue, sizeof(opcodes))) { + case 0: printk("%sCode: %" __stringify(PROLOGUE_SIZE) "ph <%02x> %" __stringify(EPILOGUE_SIZE) "ph\n", loglvl, opcodes, opcodes[PROLOGUE_SIZE], opcodes + PROLOGUE_SIZE + 1); + break; + case -EPERM: + /* No access to the user space stack of other tasks. Ignore. */ + break; + default: + printk("%sCode: Unable to access opcode bytes at RIP 0x%lx.\n", + loglvl, prologue); + break; } }
From: Andrew Lunn andrew@lunn.ch
[ Upstream commit a3dcb3e7e70c72a68a79b30fc3a3adad5612731c ]
When the switch is hardware reset, it reads the contents of the EEPROM. This can contain instructions for programming values into registers and to perform waits between such programming. Reading the EEPROM can take longer than the 100ms mv88e6xxx_hardware_reset() waits after deasserting the reset GPIO. So poll the EEPROM done bit to ensure it is complete.
Signed-off-by: Andrew Lunn andrew@lunn.ch Signed-off-by: Ruslan Sushko rus@sushko.dev Link: https://lore.kernel.org/r/20201116164301.977661-1-rus@sushko.dev Signed-off-by: Jakub Kicinski kuba@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/net/dsa/mv88e6xxx/chip.c | 2 ++ drivers/net/dsa/mv88e6xxx/global1.c | 31 +++++++++++++++++++++++++++++ drivers/net/dsa/mv88e6xxx/global1.h | 1 + 3 files changed, 34 insertions(+)
diff --git a/drivers/net/dsa/mv88e6xxx/chip.c b/drivers/net/dsa/mv88e6xxx/chip.c index f0dbc05e30a4d..16040b13579ef 100644 --- a/drivers/net/dsa/mv88e6xxx/chip.c +++ b/drivers/net/dsa/mv88e6xxx/chip.c @@ -2299,6 +2299,8 @@ static void mv88e6xxx_hardware_reset(struct mv88e6xxx_chip *chip) usleep_range(10000, 20000); gpiod_set_value_cansleep(gpiod, 0); usleep_range(10000, 20000); + + mv88e6xxx_g1_wait_eeprom_done(chip); } }
diff --git a/drivers/net/dsa/mv88e6xxx/global1.c b/drivers/net/dsa/mv88e6xxx/global1.c index f62aa83ca08d4..33d443a37efc4 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.c +++ b/drivers/net/dsa/mv88e6xxx/global1.c @@ -75,6 +75,37 @@ static int mv88e6xxx_g1_wait_init_ready(struct mv88e6xxx_chip *chip) return mv88e6xxx_g1_wait_bit(chip, MV88E6XXX_G1_STS, bit, 1); }
+void mv88e6xxx_g1_wait_eeprom_done(struct mv88e6xxx_chip *chip) +{ + const unsigned long timeout = jiffies + 1 * HZ; + u16 val; + int err; + + /* Wait up to 1 second for the switch to finish reading the + * EEPROM. + */ + while (time_before(jiffies, timeout)) { + err = mv88e6xxx_g1_read(chip, MV88E6XXX_G1_STS, &val); + if (err) { + dev_err(chip->dev, "Error reading status"); + return; + } + + /* If the switch is still resetting, it may not + * respond on the bus, and so MDIO read returns + * 0xffff. Differentiate between that, and waiting for + * the EEPROM to be done by bit 0 being set. + */ + if (val != 0xffff && + val & BIT(MV88E6XXX_G1_STS_IRQ_EEPROM_DONE)) + return; + + usleep_range(1000, 2000); + } + + dev_err(chip->dev, "Timeout waiting for EEPROM done"); +} + /* Offset 0x01: Switch MAC Address Register Bytes 0 & 1 * Offset 0x02: Switch MAC Address Register Bytes 2 & 3 * Offset 0x03: Switch MAC Address Register Bytes 4 & 5 diff --git a/drivers/net/dsa/mv88e6xxx/global1.h b/drivers/net/dsa/mv88e6xxx/global1.h index 1e3546f8b0727..e05abe61fa114 100644 --- a/drivers/net/dsa/mv88e6xxx/global1.h +++ b/drivers/net/dsa/mv88e6xxx/global1.h @@ -278,6 +278,7 @@ int mv88e6xxx_g1_set_switch_mac(struct mv88e6xxx_chip *chip, u8 *addr); int mv88e6185_g1_reset(struct mv88e6xxx_chip *chip); int mv88e6352_g1_reset(struct mv88e6xxx_chip *chip); int mv88e6250_g1_reset(struct mv88e6xxx_chip *chip); +void mv88e6xxx_g1_wait_eeprom_done(struct mv88e6xxx_chip *chip);
int mv88e6185_g1_ppu_enable(struct mv88e6xxx_chip *chip); int mv88e6185_g1_ppu_disable(struct mv88e6xxx_chip *chip);
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ]
Jens has reported a situation where partial direct IOs can be issued and completed yet still return -EAGAIN. We don't want this to report a short IO as we want XFS to complete user DIO entirely or not at all.
This partial IO situation can occur on a write IO that is split across an allocated extent and a hole, and the second mapping is returning EAGAIN because allocation would be required.
The trivial reproducer:
$ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec) pwrite: Resource temporarily unavailable $
The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done the first 4kB write:
xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000 iomap_apply: dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin xfs_iomap_found: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1 iomap_apply_dstmap: dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
Here the first iomap loop has mapped the first 4kB of the file and issued the IO, and we enter the second iomap_apply loop:
iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
And we exit with -EAGAIN out because we hit the allocate case trying to make the second 4kB block.
Then IO completes on the first 4kB and the original IO context completes and unlocks the inode, returning -EAGAIN to userspace:
xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096 xfs_iunlock: dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
There are other vectors to the same problem when we re-enter the mapping code if we have to make multiple mappinfs under NOWAIT conditions. e.g. failing trylocks, COW extents being found, allocation being required, and so on.
Avoid all these potential problems by only allowing IOMAP_NOWAIT IO to go ahead if the mapping we retrieve for the IO spans an entire allocated extent. This avoids the possibility of subsequent mappings to complete the IO from triggering NOWAIT semantics by any means as NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
Reported-and-tested-by: Jens Axboe axboe@kernel.dk Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c index 3abb8b9d6f4c6..7b9ff824e82d4 100644 --- a/fs/xfs/xfs_iomap.c +++ b/fs/xfs/xfs_iomap.c @@ -706,6 +706,23 @@ xfs_ilock_for_iomap( return 0; }
+/* + * Check that the imap we are going to return to the caller spans the entire + * range that the caller requested for the IO. + */ +static bool +imap_spans_range( + struct xfs_bmbt_irec *imap, + xfs_fileoff_t offset_fsb, + xfs_fileoff_t end_fsb) +{ + if (imap->br_startoff > offset_fsb) + return false; + if (imap->br_startoff + imap->br_blockcount < end_fsb) + return false; + return true; +} + static int xfs_direct_write_iomap_begin( struct inode *inode, @@ -766,6 +783,18 @@ xfs_direct_write_iomap_begin( if (imap_needs_alloc(inode, flags, &imap, nimaps)) goto allocate_blocks;
+ /* + * NOWAIT IO needs to span the entire requested IO with a single map so + * that we avoid partial IO failures due to the rest of the IO range not + * covered by this map triggering an EAGAIN condition when it is + * subsequently mapped and aborting the IO. + */ + if ((flags & IOMAP_NOWAIT) && + !imap_spans_range(&imap, offset_fsb, end_fsb)) { + error = -EAGAIN; + goto out_unlock; + } + xfs_iunlock(ip, lockmode); trace_xfs_iomap_found(ip, offset, length, XFS_DATA_FORK, &imap); return xfs_bmbt_to_iomap(ip, iomap, &imap, iomap_flags);
On Wed, Nov 25, 2020 at 10:35:50AM -0500, Sasha Levin wrote:
From: Dave Chinner dchinner@redhat.com
[ Upstream commit 883a790a84401f6f55992887fd7263d808d4d05d ]
Jens has reported a situation where partial direct IOs can be issued and completed yet still return -EAGAIN. We don't want this to report a short IO as we want XFS to complete user DIO entirely or not at all.
This partial IO situation can occur on a write IO that is split across an allocated extent and a hole, and the second mapping is returning EAGAIN because allocation would be required.
The trivial reproducer:
$ sudo xfs_io -fdt -c "pwrite 0 4k" -c "pwrite -V 1 -b 8k -N 0 8k" /mnt/scr/foo wrote 4096/4096 bytes at offset 0 4 KiB, 1 ops; 0.0001 sec (27.509 MiB/sec and 7042.2535 ops/sec) pwrite: Resource temporarily unavailable $
The pwritev2(0, 8kB, RWF_NOWAIT) call returns EAGAIN having done the first 4kB write:
xfs_file_direct_write: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 0x2000 iomap_apply: dev 259:1 ino 0x83 pos 0 length 8192 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin xfs_iomap_found: dev 259:1 ino 0x83 size 0x1000 offset 0x0 count 8192 fork data startoff 0x0 startblock 24 blockcount 0x1 iomap_apply_dstmap: dev 259:1 ino 0x83 bdev 259:1 addr 102400 offset 0 length 4096 type MAPPED flags DIRTY
Here the first iomap loop has mapped the first 4kB of the file and issued the IO, and we enter the second iomap_apply loop:
iomap_apply: dev 259:1 ino 0x83 pos 4096 length 4096 flags WRITE|DIRECT|NOWAIT (0x31) ops xfs_direct_write_iomap_ops caller iomap_dio_rw actor iomap_dio_actor xfs_ilock_nowait: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_ilock_for_iomap xfs_iunlock: dev 259:1 ino 0x83 flags ILOCK_SHARED caller xfs_direct_write_iomap_begin
And we exit with -EAGAIN out because we hit the allocate case trying to make the second 4kB block.
Then IO completes on the first 4kB and the original IO context completes and unlocks the inode, returning -EAGAIN to userspace:
xfs_end_io_direct_write: dev 259:1 ino 0x83 isize 0x1000 disize 0x1000 offset 0x0 count 4096 xfs_iunlock: dev 259:1 ino 0x83 flags IOLOCK_SHARED caller xfs_file_dio_aio_write
There are other vectors to the same problem when we re-enter the mapping code if we have to make multiple mappinfs under NOWAIT conditions. e.g. failing trylocks, COW extents being found, allocation being required, and so on.
Avoid all these potential problems by only allowing IOMAP_NOWAIT IO to go ahead if the mapping we retrieve for the IO spans an entire allocated extent. This avoids the possibility of subsequent mappings to complete the IO from triggering NOWAIT semantics by any means as NOWAIT IO will now only enter the mapping code once per NOWAIT IO.
Reported-and-tested-by: Jens Axboe axboe@kernel.dk Signed-off-by: Dave Chinner dchinner@redhat.com Reviewed-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Darrick J. Wong darrick.wong@oracle.com Signed-off-by: Sasha Levin sashal@kernel.org
fs/xfs/xfs_iomap.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+)
No, please don't pick this up for stable kernels until at least 5.10 is released. This still needs some time integration testing time to ensure that we haven't introduced any other performance regressions with this change.
We've already had one XFS upstream kernel regression in this -rc cycle propagated to the stable kernels in 5.9.9 because the stable process picked up a bunch of random XFS fixes within hours of them being merged by Linus. One of those commits was a result of a thinko, and despite the fact we found it and reverted it within a few days, users of stable kernels have been exposed to it for a couple of weeks. That *should never have happened*.
This has happened before, and *again* we were lucky this wasn't worse than it was. We were saved by the flaw being caught by own internal pre-write corruption verifiers (which exist because we don't trust our code to be bug-free, let alone the collections of random, poorly tested backports) so that it only resulted in corruption shutdowns rather than permanent on-disk damage and data loss.
Put simply: the stable process is flawed because it shortcuts the necessary stabilisation testing for new code. It doesn't matter if the merged commits have a "fixes" tag in them, that tag doesn't mean the change is ready to be exposed to production systems. We need the *-rc stabilisation process* to weed out thinkos, brown paper bag bugs, etc, because we all make mistakes, and bugs in filesystem code can *lose user data permanently*.
Hence I ask that the stable maintainers only do automated pulls of iomap and XFS changes from upstream kernels when Linus officially releases them rather than at random points in time in the -rc cycle. If there is a critical fix we need to go back to stable kernels immediately, we will let stable@kernel.org know directly that we want this done.
Cheers,
Dave.
On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote:
We've already had one XFS upstream kernel regression in this -rc cycle propagated to the stable kernels in 5.9.9 because the stable process picked up a bunch of random XFS fixes within hours of them being merged by Linus. One of those commits was a result of a thinko, and despite the fact we found it and reverted it within a few days, users of stable kernels have been exposed to it for a couple of weeks. That *should never have happened*.
No, what shouldn't have happened is a commit that never went out for a review on the public mailing lists nor spending any time in linux-next ending up in Linus's tree.
It's ridiculous that you see a failure in the maintainership workflow of XFS and turn around to blame it somehow on the stable process.
This has happened before, and *again* we were lucky this wasn't worse than it was. We were saved by the flaw being caught by own internal pre-write corruption verifiers (which exist because we don't trust our code to be bug-free, let alone the collections of random, poorly tested backports) so that it only resulted in corruption shutdowns rather than permanent on-disk damage and data loss.
Put simply: the stable process is flawed because it shortcuts the necessary stabilisation testing for new code. It doesn't matter if
The stable process assumes that commits that ended up upstream were reviewed and tested; the stable process doesn't offer much in the way of in-depth review of specific patches but mostly focuses on testing the product of backporting hundreds of patches into each stable branch.
Release candidate cycles are here to squash the bugs that went in during the merge window, not to introduce new "thinkos" in the way of pulling patches out of your hip in the middle of the release cycle.
the merged commits have a "fixes" tag in them, that tag doesn't mean the change is ready to be exposed to production systems. We need the *-rc stabilisation process* to weed out thinkos, brown paper bag bugs, etc, because we all make mistakes, and bugs in filesystem code can *lose user data permanently*.
What needed to happen here is that XFS's internal testing story would run *before* this patch was merged anywhere and catch this bug. Why didn't it happen?
Hence I ask that the stable maintainers only do automated pulls of iomap and XFS changes from upstream kernels when Linus officially releases them rather than at random points in time in the -rc cycle. If there is a critical fix we need to go back to stable kernels immediately, we will let stable@kernel.org know directly that we want this done.
I'll happily switch back to a model where we look only for stable tags from XFS, but sadly this happened only *once* in the past year. How is this helping to prevent the dangerous bugs that may cause users to lose their data permanently?
On Wed, Nov 25, 2020 at 06:46:54PM -0500, Sasha Levin wrote:
On Thu, Nov 26, 2020 at 08:52:47AM +1100, Dave Chinner wrote:
We've already had one XFS upstream kernel regression in this -rc cycle propagated to the stable kernels in 5.9.9 because the stable process picked up a bunch of random XFS fixes within hours of them being merged by Linus. One of those commits was a result of a thinko, and despite the fact we found it and reverted it within a few days, users of stable kernels have been exposed to it for a couple of weeks. That *should never have happened*.
No, what shouldn't have happened is a commit that never went out for a review on the public mailing lists nor spending any time in linux-next ending up in Linus's tree.
<sigh>
I think you've got your wires crossed somewhere, Sasha, because none of that happened here. From the public record, the patch was first posted here by Darrick:
https://lore.kernel.org/linux-xfs/160494584816.772693.2490433010759557816.st...
on Nov 9, and was reviewed by Christoph a day later. It was merged into the XFS tree on Nov 10, with the rvb tag:
https://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git/commit/?h=for-next&i...
Which means it should have been in linux-next on Nov 11, 12 and 13, when Darrick sent the pull request:
https://lore.kernel.org/linux-xfs/20201113231738.GX9695@magnolia/
It was merged into Linus's tree an hour later.
So, in contrast to your claims, the evidence is that the patch was, in fact, publicly posted, reviewed, and spent time in linux-next before ending up in Linus's tree.
FWIW, on November 17, GregKH sent the patch to lkml for stable review after being selected by the stable process for a stable backport. This was not cc'd to the XFS list, and it was committed without comment into the 5.9.x tree is on November 18.
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/fs/x...
IOWs, the XFS developers didn't ask for it to be backported to stable kernels - the commit did not contain a "cc: stable@kernel.org", nor was the original patch posting cc'd to the stable list.
The fact is that entire decision to backport this commit was made by stable maintainers and/or their tools, and the stable maintainers themselves chose not to tell the XFS list they had selected it for backport.
Hence:
It's ridiculous that you see a failure in the maintainership workflow of XFS and turn around to blame it somehow on the stable process.
.... I think you really need to have another look at the evidence before you dig yourself a deeper hole and waste more of my time....
This has happened before, and *again* we were lucky this wasn't worse than it was. We were saved by the flaw being caught by own internal pre-write corruption verifiers (which exist because we don't trust our code to be bug-free, let alone the collections of random, poorly tested backports) so that it only resulted in corruption shutdowns rather than permanent on-disk damage and data loss.
Put simply: the stable process is flawed because it shortcuts the necessary stabilisation testing for new code. It doesn't matter if
The stable process assumes that commits that ended up upstream were reviewed and tested; the stable process doesn't offer much in the way of in-depth review of specific patches but mostly focuses on testing the product of backporting hundreds of patches into each stable branch.
And I've lost count of the number of times I've told the stable maintainers that this is an invalid assumption.
Yet here we are again.
How many times do we have to make the same mistake before we learn from it?
Release candidate cycles are here to squash the bugs that went in during the merge window, not to introduce new "thinkos" in the way of pulling patches out of your hip in the middle of the release cycle.
"pulling patches out of your hip"
Nice insult. Avoids profanity filters and everything. But I don't know why you're trying to insult me over something I played no part in.
Seriously, merging critical fixes discovered in the -rc cycle happens *all the time* and has been done for as long as we've had -rc cycles. Even Linus himself does this.
The fact is that the -rc process is intended to accomodate merging fixes quickly whilst still allowing sufficient testing time to be confident that no regressions were introduced or have been found and addressed before release.
And that's the whole point of having an iterative integration testing phase in the release cycle - it can be adapted in duration to the current state of the code base and the fixes that are being made late in the cycle.
You *should* know all this Sasha, so I'm not sure why you are claiming that long standing, well founded software engineering practices are suddenly a problem...
the merged commits have a "fixes" tag in them, that tag doesn't mean the change is ready to be exposed to production systems. We need the *-rc stabilisation process* to weed out thinkos, brown paper bag bugs, etc, because we all make mistakes, and bugs in filesystem code can *lose user data permanently*.
What needed to happen here is that XFS's internal testing story would run *before* this patch was merged anywhere and catch this bug. Why didn't it happen?
It did. It's simply that kernel unit testing didn't discover the bug. That happens all the time, not just in XFS. In fact, it was XFS userspace unit testing that found the bug.
The kernel by itself didn't trigger inconsistencies because everything it created matches what it expected, and an old userspace checking a new kernel would ignore the bits changed by the bad commit in the kernel. IOWs, the typical kernel unit testing configuration of "new kernel, stable userspace" didn't see anything wrong.
It wasn't until a new userspace was run with an old kernel that the problem was found. The new userspace expected bits in the keys on disk to be set that an old kernel didn't set and that's when inconsistencies were flagged and the problem uncovered. This was found by the userspace XFS maintainer as when testing the changes merged from the kernel code. Unit testing userspace is the opposite of the kernel - "stable kernel, new userspace" - and that's the combination that triggered the errors that made us aware of the problem.
So, yes, our normal testing processes found the bug, it's just a the filesystem is *much more than just the kernel code* and sometimes bugs in the core code are found on the userspace side before they are found in the kernel.
However, testing variations in kernel and/or userspace tool versions is not generally part of the unit tests developers run. It is, however, something that we cover as the new code rolls out to testers and developers code as part of the integration testing process. That's when version mismatch and upgrade/downgrade bugs tend to show up as that's when the variety of installations the code is tested on rapidly expands. IOWs, we caught the problem exactly when hindsight tells us we should expect to catch such a problem.
The reality is that a single developer cannot do this sort of testing, hence we do not expect a single developer to do this. We don't even expect the maintainer to be able to cover such a huge testing scope before merging commits. It's simply not realistic to cover everything before code changes are merged. Perfection is the enemy of good.
I'll repeat the lesson to be learned here: merging a commit into Linus's tree does not mean it is fully tested and production ready. It just means it's passed a wide range of unit tests without regressions and so is considered ready for wider integration testing by a larger population of developers and testers.
Hence I ask that the stable maintainers only do automated pulls of iomap and XFS changes from upstream kernels when Linus officially releases them rather than at random points in time in the -rc cycle. If there is a critical fix we need to go back to stable kernels immediately, we will let stable@kernel.org know directly that we want this done.
I'll happily switch back to a model where we look only for stable tags from XFS, but sadly this happened only *once* in the past year. How is this helping to prevent the dangerous bugs that may cause users to lose their data permanently?
Given that the only dangerous bug that XFS users have been directly exposed to in recent times has been caused by an automated stable kernel backport short-circuiting our normal commit-test-release cycle, I can't see how XFS users will be worse off by turning off automated backports. :/
But that is not what I asked you to do or consider, it's just a strawman you constructed. What I want is for users to benefit from the overall stable process, but I don't want them exposed to the potential catastrophic risks that the current stable process exposes them to.
As developers, the stable process gives us no margin for error. We need at least some margin to prevent users for being exposed to avoidable regressions in stable kernels. So what I'm asking you to do is back off the bots a bit to provide that margin because, as we all known, bugs and mistakes happen.
An acceptible compromise from our perspective would be to run automated scans on released kernels that have already run the full test cycle. This way the users get the benefits of both full integration testing of the patches that get backported, and all the changes that the stable maintainers think their kernels should be getting end up in the stable kernel.
For the sorts of changes your process is typically backporting from XFS, an extra couple of weeks time lag is going to make no difference to users at all. But that extra margin means that situations like this recent stable kernel regression do not occur, resulting in stable kernels having a much lower risk profile for it's users.
That seems like a win-win-win scenario to me, so rather than throw shade and insults at the messenger, how about listening, learning from mistakes and trying to improve the process in a way that benefits everyone?
Cheers,
Dave.
linux-stable-mirror@lists.linaro.org