From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit 1693d5442c458ae8d5b0d58463b873cd879569ed ]
Add a helper function to determine if a block group is being used and make
use of it in btrfs_delete_unused_bgs(). This helper will also be used in
future code changes.
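For illustration, a minimal sketch of the intended calling convention,
mirroring the btrfs_delete_unused_bgs() call site in the diff below (the
helper asserts via lockdep that bg->lock is held):

	bool used;

	spin_lock(&block_group->lock);
	/* Must hold bg->lock; checks used, reserved and pinned bytes. */
	used = btrfs_is_block_group_used(block_group);
	spin_unlock(&block_group->lock);

	if (!used) {
		/* No allocations left: candidate for deletion. */
	}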
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Reviewed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/block-group.c | 3 +--
fs/btrfs/block-group.h | 7 +++++++
2 files changed, 8 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c
index c4e3c1a5de059..9a7c7e0f7c233 100644
--- a/fs/btrfs/block-group.c
+++ b/fs/btrfs/block-group.c
@@ -1393,8 +1393,7 @@ void btrfs_delete_unused_bgs(struct btrfs_fs_info *fs_info)
}
spin_lock(&block_group->lock);
- if (block_group->reserved || block_group->pinned ||
- block_group->used || block_group->ro ||
+ if (btrfs_is_block_group_used(block_group) || block_group->ro ||
list_is_singular(&block_group->list)) {
/*
* We want to bail if we made new allocations or have
diff --git a/fs/btrfs/block-group.h b/fs/btrfs/block-group.h
index 4c7614346f724..0d02b75f9e7e3 100644
--- a/fs/btrfs/block-group.h
+++ b/fs/btrfs/block-group.h
@@ -196,6 +196,13 @@ static inline u64 btrfs_block_group_end(struct btrfs_block_group *block_group)
return (block_group->start + block_group->length);
}
+static inline bool btrfs_is_block_group_used(const struct btrfs_block_group *bg)
+{
+ lockdep_assert_held(&bg->lock);
+
+ return (bg->used > 0 || bg->reserved > 0 || bg->pinned > 0);
+}
+
static inline bool btrfs_is_block_group_data_only(
struct btrfs_block_group *block_group)
{
--
2.43.0
From: Arınç ÜNAL <arinc.unal(a)arinc9.com>
commit f490c492e946d8ffbe65ad4efc66de3c5ede30a4 upstream.
On MT7530, the HT_XTAL_FSEL field of the HWTRAP register stores a 2-bit
value that represents the frequency of the crystal oscillator connected to
the switch IC. The field is populated by the state of the ESW_P4_LED_0 and
ESW_P3_LED_0 pins, which is done right after reset is deasserted.
ESW_P4_LED_0   ESW_P3_LED_0   Frequency
----------------------------------------
     0              0         Reserved
     0              1         20MHz
     1              0         40MHz
     1              1         25MHz
On MT7531, the XTAL25 bit of the STRAP register stores this value. The
LAN0LED0 pin populates the bit: 25MHz when the pin is high, 40MHz when
it's low.
These pins are also used to drive LEDs, so their state can be set to
something other than the bootstrapping configuration. For example, a link
may be established on port 3 before the DSA subdriver takes control of the
switch, which would drive ESW_P3_LED_0 high.
Currently, mt7530_setup() and mt7531_setup() insert a 1000 - 1100 usec
delay between reset assertion and deassertion. Under real-life conditions,
some switch ICs cannot always get these pins back to the bootstrapping
configuration within this delay before reset is deasserted. This causes the
wrong crystal frequency to be selected, which leaves the switch in a
nonfunctional state after reset deassertion.
The tests below were conducted by Justin Swartz on an MT7530 with a 40MHz
crystal oscillator.
With a cable from an active peer connected to port 3 before reset, an
incorrect crystal frequency (0b11 = 25MHz) is selected:
[1] [3] [5]
: : :
_____________________________ __________________
ESW_P4_LED_0 |_______|
_____________________________
ESW_P3_LED_0 |__________________________
: : : :
: : [4]...:
: :
[2]................:
[1] Reset is asserted.
[2] Period of 1000 - 1100 usec.
[3] Reset is deasserted.
[4] Period of 315 usec. HWTRAP register is populated with incorrect
XTAL frequency.
[5] Signals reflect the bootstrapped configuration.
Increase the delay between reset_control_assert() and
reset_control_deassert(), and between gpiod_set_value_cansleep(priv->reset,
0) and gpiod_set_value_cansleep(priv->reset, 1), to 5000 - 5100 usec. This
makes it far more likely that the switch IC will have these pins back at
the bootstrapping configuration before reset is deasserted.
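For reference, the latched value ends up readable from the HWTRAP register;
a sketch of such a sanity check follows (register and field names as I
recall them from drivers/net/dsa/mt7530.h, so treat them as assumptions):

	u32 val = mt7530_read(priv, MT7530_HWTRAP);

	switch (val & HWTRAP_XTAL_MASK) {
	case HWTRAP_XTAL_40MHZ:
		/* Expected on a board with a 40MHz crystal. */
		break;
	case HWTRAP_XTAL_25MHZ:
		/* Wrong strap latched: the LED pins were not back at
		 * the bootstrapping configuration in time.
		 */
		break;
	}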
With a cable from an active peer connected to port 3 before reset, the
correct crystal frequency (0b10 = 40MHz) is selected:
[1] [2-1] [3] [5]
: : : :
_____________________________ __________________
ESW_P4_LED_0 |_______|
___________________ _______
ESW_P3_LED_0 |_________| |__________________
: : : : :
: [2-2]...: [4]...:
[2]................:
[1] Reset is asserted.
[2] Period of 5000 - 5100 usec.
[2-1] ESW_P3_LED_0 goes low.
[2-2] Remaining period of 5000 - 5100 usec.
[3] Reset is deasserted.
[4] Period of 310 usec. HWTRAP register is populated with bootstrapped
XTAL frequency.
[5] Signals reflect the bootstrapped configuration.
ESW_P3_LED_0 low period before reset deassertion
(target reset hold: 5000 - 5100 usec):

TEST   RESET HOLD
   #   (usec)
---------------------
1 5410
2 5440
3 4375
4 5490
5 5475
6 4335
7 4370
8 5435
9 4205
10 4335
11 3750
12 3170
13 4395
14 4375
15 3515
16 4335
17 4220
18 4175
19 4175
20 4350
Min 3170
Max 5490
Median 4342.500
Avg 4466.500
Fixes: b8f126a8d543 ("net-next: dsa: add dsa support for Mediatek MT7530 switch")
Fixes: c288575f7810 ("net: dsa: mt7530: Add the support of MT7531 switch")
Co-developed-by: Justin Swartz <justin.swartz(a)risingedge.co.za>
Signed-off-by: Justin Swartz <justin.swartz(a)risingedge.co.za>
Signed-off-by: Arınç ÜNAL <arinc.unal(a)arinc9.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
---
Hi.
I've adjusted the patch to suit the stable trees by removing the last
paragraph of the patch log. This is an important patch that should be
backported to stable releases. Please apply it to the 6.8, 6.7, 6.6, 6.1,
and 5.15 stable trees.
---
drivers/net/dsa/mt7530.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/dsa/mt7530.c b/drivers/net/dsa/mt7530.c
index 3c1f657593a8..9c9a592c053e 100644
--- a/drivers/net/dsa/mt7530.c
+++ b/drivers/net/dsa/mt7530.c
@@ -2243,11 +2243,11 @@ mt7530_setup(struct dsa_switch *ds)
*/
if (priv->mcm) {
reset_control_assert(priv->rstc);
- usleep_range(1000, 1100);
+ usleep_range(5000, 5100);
reset_control_deassert(priv->rstc);
} else {
gpiod_set_value_cansleep(priv->reset, 0);
- usleep_range(1000, 1100);
+ usleep_range(5000, 5100);
gpiod_set_value_cansleep(priv->reset, 1);
}
@@ -2449,11 +2449,11 @@ mt7531_setup(struct dsa_switch *ds)
*/
if (priv->mcm) {
reset_control_assert(priv->rstc);
- usleep_range(1000, 1100);
+ usleep_range(5000, 5100);
reset_control_deassert(priv->rstc);
} else {
gpiod_set_value_cansleep(priv->reset, 0);
- usleep_range(1000, 1100);
+ usleep_range(5000, 5100);
gpiod_set_value_cansleep(priv->reset, 1);
}
---
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
change-id: 20240318-b4-for-stable-f486c9ff8d86
Best regards,
--
Arınç ÜNAL <arinc.unal(a)arinc9.com>
Hi,
I seem to be the only one in the world to have this problem. :-(
On one of my machines, updating to 6.6.18 and later (including the mainline
branch) leads to an unbootable system. All other computers are unaffected.
Bisecting the history leads to:
commit 8117961d98fb2d335ab6de2cad7afb8b6171f5fe
Author: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue Sep 12 09:00:53 2023 +0000
x86/efi: Disregard setup header of loaded image
commit 7e50262229faad0c7b8c54477cd1c883f31cc4a7 upstream.
The native EFI entrypoint does not take a struct boot_params from the
loader, but instead, it constructs one from scratch, using the setup
header data placed at the start of the image.
This setup header is placed in a way that permits legacy loaders to
manipulate the contents (i.e., to pass the kernel command line or the
address and size of an initial ramdisk), but EFI boot does not use it in
that way - it only copies the contents that were placed there at build
time, but EFI loaders will not (and should not) manipulate the setup
header to configure the boot. (Commit 63bf28ceb3ebbe76 "efi: x86: Wipe
setup_data on pure EFI boot" deals with some of the fallout of using
setup_data in a way that breaks EFI boot.)
Given that none of the non-zero values that are copied from the setup
header into the EFI stub's struct boot_params are relevant to the boot
now that the EFI stub no longer enters via the legacy decompressor, the
copy can be omitted altogether.
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Ingo Molnar <mingo(a)kernel.org>
Link: https://lore.kernel.org/r/20230912090051.4014114-19-ardb@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
This seems to be the commit that introduced the regression.
I have no idea where to look or what information to provide to explain
what the problem might be, or why this single machine is different. :-(
Please don't hesitate to ask me further questions - I can do any debugging
you may need on your behalf.
Sincerely,
R.
#regzbot introduced: 8117961d98fb2d335ab6de2cad7afb8b6171f5fe
Due to a long-standing bug in the Qualcomm Bluetooth driver, the device
address has so far been reversed when configuring the controller.
This has led to one vendor reversing the address provided by the
boot firmware using the 'local-bd-address' devicetree property.
The only device affected by this should be the WCN3991 used in some
Chromebooks. The corresponding compatible string has now been deprecated
so that the underlying driver bug can be fixed without breaking
backwards compatibility.
Set the HCI_QUIRK_BDADDR_PROPERTY_BROKEN quirk for the deprecated
compatible string and add the new 'qcom,wcn3991-bt-bdaddr-le' string to
the match table for boot firmware that conforms with the binding.
Fixes: 7d250a062f75 ("Bluetooth: hci_qca: Add support for Qualcomm Bluetooth SoC WCN3991")
Cc: stable(a)vger.kernel.org # 5.10
Cc: Balakrishna Godavarthi <quic_bgodavar(a)quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/bluetooth/hci_qca.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index f989c05f8177..346274fe66d8 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1904,6 +1904,16 @@ static int qca_setup(struct hci_uart *hu)
case QCA_WCN6855:
case QCA_WCN7850:
set_bit(HCI_QUIRK_USE_BDADDR_PROPERTY, &hdev->quirks);
+
+ if (soc_type == QCA_WCN3991) {
+ struct device *dev = GET_HCIDEV_DEV(hdev);
+
+ if (device_is_compatible(dev, "qcom,wcn3991-bt")) {
+ set_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
+ &hdev->quirks);
+ }
+ }
+
hci_set_aosp_capable(hdev);
ret = qca_read_soc_version(hdev, &ver, soc_type);
@@ -2597,6 +2607,7 @@ static const struct of_device_id qca_bluetooth_of_match[] = {
{ .compatible = "qcom,wcn3988-bt", .data = &qca_soc_data_wcn3988},
{ .compatible = "qcom,wcn3990-bt", .data = &qca_soc_data_wcn3990},
{ .compatible = "qcom,wcn3991-bt", .data = &qca_soc_data_wcn3991},
+ { .compatible = "qcom,wcn3991-bt-bdaddr-le", .data = &qca_soc_data_wcn3991},
{ .compatible = "qcom,wcn3998-bt", .data = &qca_soc_data_wcn3998},
{ .compatible = "qcom,wcn6750-bt", .data = &qca_soc_data_wcn6750},
{ .compatible = "qcom,wcn6855-bt", .data = &qca_soc_data_wcn6855},
--
2.43.2
The WCN6855 firmware on the Lenovo ThinkPad X13s expects the Bluetooth
device address in big-endian order when setting it using the
EDL_WRITE_BD_ADDR_OPCODE command.
Presumably, this is the case for all non-ROME devices, which all use the
EDL_WRITE_BD_ADDR_OPCODE command for this (unlike the ROME devices, which
use a different command and expect the address in little-endian order).
Reverse the little-endian address before setting it to make sure that
the address can be configured using tools like btmgmt or using the
'local-bd-address' devicetree property.
Note that this can potentially break systems with boot firmware which
has started relying on the broken behaviour and is incorrectly passing
the address via devicetree in big-endian order.
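For illustration, here is a self-contained sketch of what the byte reversal
amounts to (bdaddr_t as defined in the Bluetooth core headers; the loop is
functionally what the kernel's baswap() helper does, and the sample address
is made up):

	#include <stdint.h>

	typedef struct { uint8_t b[6]; } bdaddr_t;

	/* Mirror the six address bytes, converting between the
	 * little-endian in-memory representation and big-endian order.
	 */
	static void baswap(bdaddr_t *dst, const bdaddr_t *src)
	{
		int i;

		for (i = 0; i < 6; i++)
			dst->b[i] = src->b[5 - i];
	}

	/* Address 00:11:22:33:44:55 stored little-endian:
	 *   le.b[] = { 0x55, 0x44, 0x33, 0x22, 0x11, 0x00 }
	 * after baswap():
	 *   be.b[] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 }
	 */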
Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address")
Cc: stable(a)vger.kernel.org # 5.1
Cc: Balakrishna Godavarthi <quic_bgodavar(a)quicinc.com>
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Tested-by: Nikita Travkin <nikita(a)trvn.ru> # sc7180
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/bluetooth/btqca.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/bluetooth/btqca.c b/drivers/bluetooth/btqca.c
index b40b32fa7f1c..19cfc342fc7b 100644
--- a/drivers/bluetooth/btqca.c
+++ b/drivers/bluetooth/btqca.c
@@ -826,11 +826,15 @@ EXPORT_SYMBOL_GPL(qca_uart_setup);
int qca_set_bdaddr(struct hci_dev *hdev, const bdaddr_t *bdaddr)
{
+ bdaddr_t bdaddr_swapped;
struct sk_buff *skb;
int err;
- skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6, bdaddr,
- HCI_EV_VENDOR, HCI_INIT_TIMEOUT);
+ baswap(&bdaddr_swapped, bdaddr);
+
+ skb = __hci_cmd_sync_ev(hdev, EDL_WRITE_BD_ADDR_OPCODE, 6,
+ &bdaddr_swapped, HCI_EV_VENDOR,
+ HCI_INIT_TIMEOUT);
if (IS_ERR(skb)) {
err = PTR_ERR(skb);
bt_dev_err(hdev, "QCA Change address cmd failed (%d)", err);
--
2.43.2
Some Bluetooth controllers lack persistent storage for the device
address and instead one can be provided by the boot firmware using the
'local-bd-address' devicetree property.
The Bluetooth devicetree bindings clearly state that the address should
be specified in little-endian order, but due to a long-standing bug in
the Qualcomm driver, which reversed the address, some bootloaders have
been providing the address in big-endian order instead.
Add a new quirk that can be used to mark deprecated compatible strings
that expect such broken devicetree properties and use it to reverse the
address when parsing the property so that the underlying driver bug can
be fixed.
Fixes: 5c0a1001c8be ("Bluetooth: hci_qca: Add helper to set device address")
Cc: stable(a)vger.kernel.org # 5.1
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
include/net/bluetooth/hci.h | 10 ++++++++++
net/bluetooth/hci_sync.c | 5 ++++-
2 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index bdee5d649cc6..556cffed5698 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -176,6 +176,16 @@ enum {
*/
HCI_QUIRK_USE_BDADDR_PROPERTY,
+ /* When this quirk is set, the Bluetooth Device Address provided by
+ * the 'local-bd-address' fwnode property is incorrectly specified in
+ * big-endian order.
+ *
+ * This quirk can be set before hci_register_dev is called or
+ * during the hdev->setup vendor callback and must only be used for
+ * deprecated compatible strings.
+ */
+ HCI_QUIRK_BDADDR_PROPERTY_BROKEN,
+
/* When this quirk is set, the duplicate filtering during
* scanning is based on Bluetooth devices addresses. To allow
* RSSI based updates, restart scanning if needed.
diff --git a/net/bluetooth/hci_sync.c b/net/bluetooth/hci_sync.c
index 5716345a26df..283ae8edc1e5 100644
--- a/net/bluetooth/hci_sync.c
+++ b/net/bluetooth/hci_sync.c
@@ -3215,7 +3215,10 @@ static void hci_dev_get_bd_addr_from_property(struct hci_dev *hdev)
if (ret < 0 || !bacmp(&ba, BDADDR_ANY))
return;
- bacpy(&hdev->public_addr, &ba);
+ if (test_bit(HCI_QUIRK_BDADDR_PROPERTY_BROKEN, &hdev->quirks))
+ baswap(&hdev->public_addr, &ba);
+ else
+ bacpy(&hdev->public_addr, &ba);
}
struct hci_init_stage {
--
2.43.2
Since commit 0c65dc062611 ("drm/i915/jsl: s/JSL/JASPERLAKE for
platform/subplatform defines"), boot freezes on a Jasper Lake tablet
(Librem 11), usually with graphical corruption on the eDP display,
but sometimes just a black screen. This commit was included in 6.6 and
later.
That commit was intended to refactor EHL and JSL macros, but the change
to ehl_combo_pll_div_frac_wa_needed() started matching JSL incorrectly
when it was only intended to match EHL.
It replaced:
return ((IS_PLATFORM(i915, INTEL_ELKHARTLAKE) &&
IS_JSL_EHL_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) ||
with:
return (((IS_ELKHARTLAKE(i915) || IS_JASPERLAKE(i915)) &&
IS_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) ||
Remove IS_JASPERLAKE() to fix the regression.
Signed-off-by: Jonathon Hall <jonathon.hall(a)puri.sm>
Cc: stable(a)vger.kernel.org
---
drivers/gpu/drm/i915/display/intel_dpll_mgr.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
index ef57dad1a9cb..57a97880dcb3 100644
--- a/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
+++ b/drivers/gpu/drm/i915/display/intel_dpll_mgr.c
@@ -2509,7 +2509,7 @@ static void icl_wrpll_params_populate(struct skl_wrpll_params *params,
static bool
ehl_combo_pll_div_frac_wa_needed(struct drm_i915_private *i915)
{
- return (((IS_ELKHARTLAKE(i915) || IS_JASPERLAKE(i915)) &&
+ return ((IS_ELKHARTLAKE(i915) &&
IS_DISPLAY_STEP(i915, STEP_B0, STEP_FOREVER)) ||
IS_TIGERLAKE(i915) || IS_ALDERLAKE_S(i915) || IS_ALDERLAKE_P(i915)) &&
i915->display.dpll.ref_clks.nssc == 38400;
--
2.39.2
In the RISC-V specification, the stimecmp register doesn't have a default
value. To prevent the timer interrupt from being triggered during timer
initialization, clear the timer interrupt by writing the maximum value to
stimecmp.
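For reference, a sketch of what such a "stop" operation boils down to
(assuming the Sstc CSR definitions from asm/csr.h; this is an illustration,
not the exact riscv_clock_event_stop() body):

	/* Park the S-mode timer: with the comparator at the maximum
	 * value, no timer interrupt can fire until a real expiry is
	 * programmed later.
	 */
	csr_write(CSR_STIMECMP, ULONG_MAX);
	if (IS_ENABLED(CONFIG_32BIT))
		csr_write(CSR_STIMECMPH, ULONG_MAX);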
Fixes: 9f7a8ff6391f ("RISC-V: Prefer sstc extension if available")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ley Foon Tan <leyfoon.tan(a)starfivetech.com>
---
v3:
Resolved comment from Samuel Holland.
- Function riscv_clock_event_stop() needs to be called before
clockevents_config_and_register(), move riscv_clock_event_stop().
v2:
Resolved comments from Anup.
- Moved riscv_clock_event_stop() to riscv_timer_starting_cpu().
- Added Fixes tag
---
drivers/clocksource/timer-riscv.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index e66dcbd66566..79bb9a98baa7 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -108,6 +108,9 @@ static int riscv_timer_starting_cpu(unsigned int cpu)
{
struct clock_event_device *ce = per_cpu_ptr(&riscv_clock_event, cpu);
+ /* Clear timer interrupt */
+ riscv_clock_event_stop();
+
ce->cpumask = cpumask_of(cpu);
ce->irq = riscv_clock_event_irq;
if (riscv_timer_cannot_wake_cpu)
--
2.43.0
This is a backport of the recently upstreamed fix for the XPS 9530 sound
issue. The changes wouldn't apply cleanly, so they were backported manually
to 6.7.y. Ideally this should be applied to all branches where upstream
commit d110858a6925827609d11db8513d76750483ec06 exists (6.8.y) or was
backported (6.7.y), as that commit adds initial yet incomplete support for
this laptop.
Signed-off-by: Aleksandrs Vinarskis <alex.vinarskis(a)gmail.com>
Aleksandrs Vinarskis (2):
mfd: intel-lpss: Switch to generalized quirk table
mfd: intel-lpss: Introduce QUIRK_CLOCK_DIVIDER_UNITY for XPS 9530
drivers/mfd/intel-lpss-pci.c | 28 ++++++++++++++++++++--------
drivers/mfd/intel-lpss.c | 9 ++++++++-
drivers/mfd/intel-lpss.h | 14 +++++++++++++-
3 files changed, 41 insertions(+), 10 deletions(-)
--
2.40.1
When userland uses DRM_IOCTL_MODE_ADDFB to add a framebuffer, the DRM
subsystem tries to find a pixel format from the supplied depth and
bpp values. It does this by calling drm_driver_legacy_fb_format().
Unfortunately, drm_driver_legacy_fb_format() can return formats not
supported by the underlying hardware. Patch 1 of this series remedies
this problem.
So that the same logic determines the pixel format when an fbdev adds a
framebuffer as when userland does, patches 2 to 11 migrate fbdev users
of drm_mode_legacy_fb_format() to drm_driver_legacy_fb_format().
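For context, a rough sketch of the difference between the two helpers
(signatures as declared in include/drm/drm_fourcc.h; "dev" stands in for
the driver's struct drm_device, and the return value shown is
illustrative):

	#include <drm/drm_fourcc.h>

	u32 fmt;

	/* Pure table lookup from bpp/depth; may return a fourcc the
	 * hardware cannot actually scan out:
	 */
	fmt = drm_mode_legacy_fb_format(32, 24); /* DRM_FORMAT_XRGB8888 */

	/* Same lookup, but restricted to the formats the driver's
	 * planes actually advertise:
	 */
	fmt = drm_driver_legacy_fb_format(dev, 32, 24);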
This series has been tested with the nouveau and modesetting drivers
on a NVIDIA NV96, the modesetting driver on Beagleboard Black, and
with the Intel and modesetting drivers on an Intel HD Graphics 4000
chipset.
This is an evolved version of the changes proposed in "drm: Don't
return unsupported formats in drm_mode_legacy_fb_format" [1] following
the suggestions of Thomas Zimmermann.
Cc: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
Cc: Alim Akhtar <alim.akhtar(a)samsung.com>
Cc: amd-gfx(a)lists.freedesktop.org
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Cc: dri-devel(a)lists.freedesktop.org
Cc: freedreno(a)lists.freedesktop.org
Cc: intel-gfx(a)lists.freedesktop.org
Cc: intel-xe(a)lists.freedesktop.org
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Cc: linux-arm-kernel(a)lists.infradead.org
Cc: linux-arm-msm(a)vger.kernel.org
Cc: linux-samsung-soc(a)vger.kernel.org
Cc: linux-tegra(a)vger.kernel.org
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: "Maíra Canal" <mcanal(a)igalia.com>
Cc: Marijn Suijten <marijn.suijten(a)somainline.org>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Patrik Jakobsson <patrik.r.jakobsson(a)gmail.com>
Cc: Rob Clark <robdclark(a)gmail.com>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Sean Paul <sean(a)poorly.run>
Cc: stable(a)vger.kernel.org
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Cc: "Ville Syrjälä" <ville.syrjala(a)linux.intel.com>
[1] https://lore.kernel.org/all/20240310152803.3315-1-frej.drejhammar@gmail.com/
Frej Drejhammar (11):
drm: Only return supported formats from drm_driver_legacy_fb_format
drm/fbdev_generic: Use drm_driver_legacy_fb_format() for fbdev
drm/armada: Use drm_driver_legacy_fb_format() for fbdev
drm/exynos: Use drm_driver_legacy_fb_format() for fbdev
drm/gma500: Use drm_driver_legacy_fb_format() for fbdev
drm/i915: Use drm_driver_legacy_fb_format() for fbdev
drm/msm: Use drm_driver_legacy_fb_format() for fbdev
drm/omapdrm: Use drm_driver_legacy_fb_format() for fbdev
drm/radeon: Use drm_driver_legacy_fb_format() for fbdev
drm/tegra: Use drm_driver_legacy_fb_format() for fbdev
drm/xe: Use drm_driver_legacy_fb_format() for fbdev
drivers/gpu/drm/armada/armada_fbdev.c | 5 +-
drivers/gpu/drm/drm_fb_helper.c | 2 +-
drivers/gpu/drm/drm_fbdev_dma.c | 4 +-
drivers/gpu/drm/drm_fbdev_generic.c | 4 +-
drivers/gpu/drm/drm_fourcc.c | 83 +++++++++++++++++++
drivers/gpu/drm/exynos/exynos_drm_fbdev.c | 6 +-
drivers/gpu/drm/gma500/fbdev.c | 2 +-
drivers/gpu/drm/i915/display/intel_fbdev_fb.c | 6 +-
drivers/gpu/drm/msm/msm_fbdev.c | 4 +-
drivers/gpu/drm/omapdrm/omap_fbdev.c | 6 +-
drivers/gpu/drm/radeon/radeon_fbdev.c | 6 +-
drivers/gpu/drm/tegra/fbdev.c | 5 +-
drivers/gpu/drm/xe/display/intel_fbdev_fb.c | 5 +-
13 files changed, 119 insertions(+), 19 deletions(-)
base-commit: 119b225f01e4d3ce974cd3b4d982c76a380c796d
--
2.44.0
There are several functions which call qcom_scm_bw_enable() and then
return immediately if the call fails, leaving the clocks enabled.
Change the code of these functions to disable the clocks when the
qcom_scm_bw_enable() call fails. This also fixes a possible DMA
buffer leak in the qcom_scm_pas_init_image() function.
Compile tested only due to lack of hardware with interconnect
support.
Cc: stable(a)vger.kernel.org
Fixes: 65b7ebda5028 ("firmware: qcom_scm: Add bw voting support to the SCM interface")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Based on v6.8-rc7.
Note: Removing the two empty lines from the qcom_scm_pas_init_image()
and qcom_scm_pas_shutdown() functions is intentional, to make them
consistent with the other two functions.
---
drivers/firmware/qcom/qcom_scm.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
diff --git a/drivers/firmware/qcom/qcom_scm.c b/drivers/firmware/qcom/qcom_scm.c
index 520de9b5633ab..e8460626fb0c4 100644
--- a/drivers/firmware/qcom/qcom_scm.c
+++ b/drivers/firmware/qcom/qcom_scm.c
@@ -569,13 +569,14 @@ int qcom_scm_pas_init_image(u32 peripheral, const void *metadata, size_t size,
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
desc.args[1] = mdata_phys;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
out:
@@ -637,10 +638,12 @@ int qcom_scm_pas_mem_setup(u32 peripheral, phys_addr_t addr, phys_addr_t size)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -672,10 +675,12 @@ int qcom_scm_pas_auth_and_reset(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
@@ -706,11 +711,12 @@ int qcom_scm_pas_shutdown(u32 peripheral)
ret = qcom_scm_bw_enable();
if (ret)
- return ret;
+ goto disable_clk;
ret = qcom_scm_call(__scm->dev, &desc, &res);
-
qcom_scm_bw_disable();
+
+disable_clk:
qcom_scm_clk_disable();
return ret ? : res.result[0];
---
base-commit: 90d35da658da8cff0d4ecbb5113f5fac9d00eb72
change-id: 20240304-qcom-scm-disable-clk-08e7ad853fa1
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
No upstream commit exists for this patch.
The backport of commit 705318a99a13 ("io_uring/af_unix: disable sending
io_uring over sockets") introduced registered file leaks in the 5.10/5.15
stable branches when CONFIG_UNIX is enabled.
The 5.10/5.15 backports removed io_sqe_file_register() calls from
io_install_fixed_file() and __io_sqe_files_update() so that newly added
files aren't passed to UNIX-related skbs and thus can't be put during the
unregistering process. Skbs in the ring socket receive queue are released,
but no skb holds a reference to the newly updated file.
In other words, when CONFIG_UNIX is enabled, there is no fput() at
unregistering time to match the fget() from io_install_fixed_file() and
__io_sqe_files_update().
Drop several code paths related to SCM_RIGHTS as a partial change from
commit 6e5e6d274956 ("io_uring: drop any code related to SCM_RIGHTS").
This code is useless in the stable branches now, too, and is causing the
leaks in 5.10/5.15.
As stated above, the affected code was removed upstream by
commit 6e5e6d274956 ("io_uring: drop any code related to SCM_RIGHTS").
Newer stable branches from 6.1 on have an io_file_need_scm() stub function
whose usage is effectively equivalent to dropping most of the SCM-related
code.
5.4 seems not to be affected by this problem since the SCM-related
functions were already dropped there by the backport patch.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 705318a99a13 ("io_uring/af_unix: disable sending io_uring over sockets")
Signed-off-by: Fedor Pchelkin <pchelkin(a)ispras.ru>
---
I feel the io_uring SCM-related code should be dropped entirely from the
stable branches, as the backports already differ greatly between versions
and some parts are still kept while others have been dropped in an
inconsistent order. Though this might contradict the stable kernel rules
or be inappropriate for some other reason.
io_uring/io_uring.c | 177 --------------------------------------------
1 file changed, 177 deletions(-)
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index 936abc6ee450..6ad078a3bf30 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -62,7 +62,6 @@
#include <linux/net.h>
#include <net/sock.h>
#include <net/af_unix.h>
-#include <net/scm.h>
#include <linux/anon_inodes.h>
#include <linux/sched/mm.h>
#include <linux/uaccess.h>
@@ -7989,15 +7988,6 @@ static void io_free_file_tables(struct io_file_table *table)
static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
{
-#if defined(CONFIG_UNIX)
- if (ctx->ring_sock) {
- struct sock *sock = ctx->ring_sock->sk;
- struct sk_buff *skb;
-
- while ((skb = skb_dequeue(&sock->sk_receive_queue)) != NULL)
- kfree_skb(skb);
- }
-#else
int i;
for (i = 0; i < ctx->nr_user_files; i++) {
@@ -8007,7 +7997,6 @@ static void __io_sqe_files_unregister(struct io_ring_ctx *ctx)
if (file)
fput(file);
}
-#endif
io_free_file_tables(&ctx->file_table);
io_rsrc_data_free(ctx->file_data);
ctx->file_data = NULL;
@@ -8159,170 +8148,10 @@ static struct io_sq_data *io_get_sq_data(struct io_uring_params *p,
return sqd;
}
-#if defined(CONFIG_UNIX)
-/*
- * Ensure the UNIX gc is aware of our file set, so we are certain that
- * the io_uring can be safely unregistered on process exit, even if we have
- * loops in the file referencing.
- */
-static int __io_sqe_files_scm(struct io_ring_ctx *ctx, int nr, int offset)
-{
- struct sock *sk = ctx->ring_sock->sk;
- struct scm_fp_list *fpl;
- struct sk_buff *skb;
- int i, nr_files;
-
- fpl = kzalloc(sizeof(*fpl), GFP_KERNEL);
- if (!fpl)
- return -ENOMEM;
-
- skb = alloc_skb(0, GFP_KERNEL);
- if (!skb) {
- kfree(fpl);
- return -ENOMEM;
- }
-
- skb->sk = sk;
- skb->scm_io_uring = 1;
-
- nr_files = 0;
- fpl->user = get_uid(current_user());
- for (i = 0; i < nr; i++) {
- struct file *file = io_file_from_index(ctx, i + offset);
-
- if (!file)
- continue;
- fpl->fp[nr_files] = get_file(file);
- unix_inflight(fpl->user, fpl->fp[nr_files]);
- nr_files++;
- }
-
- if (nr_files) {
- fpl->max = SCM_MAX_FD;
- fpl->count = nr_files;
- UNIXCB(skb).fp = fpl;
- skb->destructor = unix_destruct_scm;
- refcount_add(skb->truesize, &sk->sk_wmem_alloc);
- skb_queue_head(&sk->sk_receive_queue, skb);
-
- for (i = 0; i < nr; i++) {
- struct file *file = io_file_from_index(ctx, i + offset);
-
- if (file)
- fput(file);
- }
- } else {
- kfree_skb(skb);
- free_uid(fpl->user);
- kfree(fpl);
- }
-
- return 0;
-}
-
-/*
- * If UNIX sockets are enabled, fd passing can cause a reference cycle which
- * causes regular reference counting to break down. We rely on the UNIX
- * garbage collection to take care of this problem for us.
- */
-static int io_sqe_files_scm(struct io_ring_ctx *ctx)
-{
- unsigned left, total;
- int ret = 0;
-
- total = 0;
- left = ctx->nr_user_files;
- while (left) {
- unsigned this_files = min_t(unsigned, left, SCM_MAX_FD);
-
- ret = __io_sqe_files_scm(ctx, this_files, total);
- if (ret)
- break;
- left -= this_files;
- total += this_files;
- }
-
- if (!ret)
- return 0;
-
- while (total < ctx->nr_user_files) {
- struct file *file = io_file_from_index(ctx, total);
-
- if (file)
- fput(file);
- total++;
- }
-
- return ret;
-}
-#else
-static int io_sqe_files_scm(struct io_ring_ctx *ctx)
-{
- return 0;
-}
-#endif
-
static void io_rsrc_file_put(struct io_ring_ctx *ctx, struct io_rsrc_put *prsrc)
{
struct file *file = prsrc->file;
-#if defined(CONFIG_UNIX)
- struct sock *sock = ctx->ring_sock->sk;
- struct sk_buff_head list, *head = &sock->sk_receive_queue;
- struct sk_buff *skb;
- int i;
-
- __skb_queue_head_init(&list);
-
- /*
- * Find the skb that holds this file in its SCM_RIGHTS. When found,
- * remove this entry and rearrange the file array.
- */
- skb = skb_dequeue(head);
- while (skb) {
- struct scm_fp_list *fp;
-
- fp = UNIXCB(skb).fp;
- for (i = 0; i < fp->count; i++) {
- int left;
-
- if (fp->fp[i] != file)
- continue;
-
- unix_notinflight(fp->user, fp->fp[i]);
- left = fp->count - 1 - i;
- if (left) {
- memmove(&fp->fp[i], &fp->fp[i + 1],
- left * sizeof(struct file *));
- }
- fp->count--;
- if (!fp->count) {
- kfree_skb(skb);
- skb = NULL;
- } else {
- __skb_queue_tail(&list, skb);
- }
- fput(file);
- file = NULL;
- break;
- }
-
- if (!file)
- break;
-
- __skb_queue_tail(&list, skb);
-
- skb = skb_dequeue(head);
- }
-
- if (skb_peek(&list)) {
- spin_lock_irq(&head->lock);
- while ((skb = __skb_dequeue(&list)) != NULL)
- __skb_queue_tail(head, skb);
- spin_unlock_irq(&head->lock);
- }
-#else
fput(file);
-#endif
}
static void __io_rsrc_put_work(struct io_rsrc_node *ref_node)
@@ -8433,12 +8262,6 @@ static int io_sqe_files_register(struct io_ring_ctx *ctx, void __user *arg,
io_fixed_file_set(io_fixed_file_slot(&ctx->file_table, i), file);
}
- ret = io_sqe_files_scm(ctx);
- if (ret) {
- __io_sqe_files_unregister(ctx);
- return ret;
- }
-
io_rsrc_node_switch(ctx, NULL);
return ret;
out_fput:
--
2.44.0
commit b979f2d50a099f3402418d7ff5f26c3952fb08bb upstream.
A recent DRM series purporting to simplify support for "transparent
bridges" and handling of probe deferrals ironically exposed a
use-after-free issue on pmic_glink_altmode probe deferral.
This has manifested itself as the display subsystem occasionally failing
to initialise and NULL-pointer dereferences during boot of machines like
the Lenovo ThinkPad X13s.
Specifically, the dp-hpd bridge is currently registered before all
resources have been acquired which means that it can also be
deregistered on probe deferrals.
In the meantime there is a race window where the new aux bridge driver
(or PHY driver previously) may have looked up the dp-hpd bridge and
stored a (non-reference-counted) pointer to the bridge which is about to
be deallocated.
When the display controller is later initialised, this triggers a
use-after-free when attaching the bridges:
dp -> aux -> dp-hpd (freed)
which may, for example, result in the freed bridge failing to attach:
[drm:drm_bridge_attach [drm]] *ERROR* failed to attach bridge /soc@0/phy@88eb000 to encoder TMDS-31: -16
or a NULL-pointer dereference:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
...
Call trace:
drm_bridge_attach+0x70/0x1a8 [drm]
drm_aux_bridge_attach+0x24/0x38 [aux_bridge]
drm_bridge_attach+0x80/0x1a8 [drm]
dp_bridge_init+0xa8/0x15c [msm]
msm_dp_modeset_init+0x28/0xc4 [msm]
The DRM bridge implementation is clearly fragile and implicitly built on
the assumption that bridges may never go away. In this case, the fix is
to move the bridge registration in the pmic_glink_altmode driver to
after all resources have been looked up.
Incidentally, with the new dp-hpd bridge implementation, which registers
child devices, this is also a requirement due to a long-standing issue
in driver core that can otherwise lead to a probe deferral loop (see
commit fbc35b45f9f6 ("Add documentation on meaning of -EPROBE_DEFER")).
[DB: slightly fixed commit message by adding the word 'commit']
Fixes: 080b4e24852b ("soc: qcom: pmic_glink: Introduce altmode support")
Fixes: 2bcca96abfbf ("soc: qcom: pmic-glink: switch to DRM_AUX_HPD_BRIDGE")
Cc: <stable(a)vger.kernel.org> # 6.3
Cc: Bjorn Andersson <andersson(a)kernel.org>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Reviewed-by: Bjorn Andersson <andersson(a)kernel.org>
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240217150228.5788-4-johan+l…
[ johan: backport to 6.7 which does not have DRM aux bridge ]
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/soc/qcom/pmic_glink_altmode.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/soc/qcom/pmic_glink_altmode.c b/drivers/soc/qcom/pmic_glink_altmode.c
index ad922f0dca6b..a890fafdafb8 100644
--- a/drivers/soc/qcom/pmic_glink_altmode.c
+++ b/drivers/soc/qcom/pmic_glink_altmode.c
@@ -469,12 +469,6 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
alt_port->bridge.ops = DRM_BRIDGE_OP_HPD;
alt_port->bridge.type = DRM_MODE_CONNECTOR_DisplayPort;
- ret = devm_drm_bridge_add(dev, &alt_port->bridge);
- if (ret) {
- fwnode_handle_put(fwnode);
- return ret;
- }
-
alt_port->dp_alt.svid = USB_TYPEC_DP_SID;
alt_port->dp_alt.mode = USB_TYPEC_DP_MODE;
alt_port->dp_alt.active = 1;
@@ -525,6 +519,16 @@ static int pmic_glink_altmode_probe(struct auxiliary_device *adev,
}
}
+ for (port = 0; port < ARRAY_SIZE(altmode->ports); port++) {
+ alt_port = &altmode->ports[port];
+ if (!alt_port->altmode)
+ continue;
+
+ ret = devm_drm_bridge_add(dev, &alt_port->bridge);
+ if (ret)
+ return ret;
+ }
+
altmode->client = devm_pmic_glink_register_client(dev,
altmode->owner_id,
pmic_glink_altmode_callback,
--
2.43.0
Hi all,
This patch series includes backports for the changes that fix CVE-2023-52447.
Commit e6c86c513f44 ("rcu-tasks: Provide rcu_trace_implies_rcu_gp()")
applied cleanly.
Commit 876673364161 ("bpf: Defer the free of inner map when necessary")
had one significant conflict, which was due to missing commit
8d5a8011b35d ("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.").
The conflict was because of the switch to queue_work() from schedule_work() in
__bpf_map_put(). From what I can tell, the switch to queue_work() from
schedule_work() isn't relevant in the context of this bug, so I resolved the
conflict by keeping schedule_work() and not including 8d5a8011b35d
("bpf: Batch call_rcu callbacks instead of SLAB_TYPESAFE_BY_RCU.").
I also noticed that commit a6fb03a9c9c8
("bpf: add percpu stats for bpf_map elements insertions/deletions") is tagged as
a stable dependency of commit 876673364161. However, I don't see the functions
and fields added in that patch used at all in commit 876673364161. This patch
was backported to linux-6.1.y, but a `git grep` seems to show that
`bpf_map_init_elem_count` is never referenced in linux-6.1.y. It seems to me
that this patch is not actually a dependency of commit 876673364161, so I didn't
include it in this backport.
I ran the selftests added in commit 1624918be84a
("selftests/bpf: Add test cases for inner map"), and they passed with no KASAN
warnings. However, I did not manage to find a kernel on which these tests did
generate a KASAN warning, so the test result may not be very meaningful. Apart
from that, my typical build+boot test passed.
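For context, the usage pattern that rcu_trace_implies_rcu_gp() enables
looks roughly like this (a sketch following the helper's kernel-doc; the
callback names and the embedded rcu_head are illustrative, not the exact
ones from the backported patch):

	static void inner_map_free_cb(struct rcu_head *rhp)
	{
		/* Actually free the inner map here. */
	}

	static void inner_map_trace_cb(struct rcu_head *rhp)
	{
		/* An RCU Tasks Trace grace period has elapsed; chain a
		 * regular RCU grace period unless it was already implied.
		 */
		if (rcu_trace_implies_rcu_gp())
			inner_map_free_cb(rhp);
		else
			call_rcu(rhp, inner_map_free_cb);
	}

	/* Defer the free of the inner map: */
	call_rcu_tasks_trace(&inner_map->rcu, inner_map_trace_cb);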
Hou Tao (1):
bpf: Defer the free of inner map when necessary
Paul E. McKenney (1):
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
include/linux/bpf.h | 7 ++++++-
include/linux/rcupdate.h | 12 ++++++++++++
kernel/bpf/map_in_map.c | 11 ++++++++---
kernel/bpf/syscall.c | 26 ++++++++++++++++++++++++--
kernel/rcu/tasks.h | 2 ++
5 files changed, 52 insertions(+), 6 deletions(-)
--
2.44.0.278.ge034bb2e1d-goog
Please take the following 2 commits from 6.5 to 6.1-LTS. They're
already present in 5.10 and 5.15, but were seemingly missed for 6.1.
They apply cleanly and compile for me.
873f50ece41a ("md: fix data corruption for raid456 when reshape
restart while grow up")
010444623e7f ("md/raid10: prevent soft lockup while flush writes")
Dear Greg & stable team,
could you please queue up the patch below for the stable-6.7 kernel?
This is upstream commit:
eba38cc7578bef94865341c73608bdf49193a51d
Thanks,
Helge
From eba38cc7578bef94865341c73608bdf49193a51d Mon Sep 17 00:00:00 2001
From: Helge Deller <deller(a)kernel.org>
Subject: [PATCH] bcachefs: Fix build on parisc by avoiding __multi3()
The gcc compiler on parisc does support the __int128 type, although the
architecture does not have native 128-bit support.
The effect is that the bcachefs u128_square() function will pull in the
libgcc __multi3() helper, which breaks the kernel build when bcachefs is
built as a module, since this function isn't currently exported in
arch/parisc/kernel/parisc_ksyms.c.
The build failure can be seen in the latest Debian kernel build at:
https://buildd.debian.org/status/fetch.php?pkg=linux&arch=hppa&ver=6.7.1-1%…
We prefer to not export that symbol, so fall back to the optional 64-bit
implementation provided by bcachefs and thus avoid usage of __multi3().
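For illustration, a minimal example of how such a helper call gets pulled
in (a sketch, not the bcachefs code itself, which lives in
fs/bcachefs/mean_and_variance.h):

	typedef unsigned __int128 u128;

	static u128 u128_square(u128 i)
	{
		/* gcc accepts __int128 on parisc, but without native
		 * 128-bit multiply support it lowers this to a call to
		 * the libgcc helper __multi3(), which the parisc kernel
		 * does not export to modules.
		 */
		return i * i;
	}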
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: Kent Overstreet <kent.overstreet(a)linux.dev>
Signed-off-by: Kent Overstreet <kent.overstreet(a)linux.dev>
diff --git a/fs/bcachefs/mean_and_variance.h b/fs/bcachefs/mean_and_variance.h
index b2be565bb8f2..64df11ab422b 100644
--- a/fs/bcachefs/mean_and_variance.h
+++ b/fs/bcachefs/mean_and_variance.h
@@ -17,7 +17,7 @@
* Rust and rustc has issues with u128.
*/
-#if defined(__SIZEOF_INT128__) && defined(__KERNEL__)
+#if defined(__SIZEOF_INT128__) && defined(__KERNEL__) && !defined(CONFIG_PARISC)
typedef struct {
unsigned __int128 v;
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 4.19.310 kernel.
All users of the 4.19 kernel series must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
- ------------
Makefile | 2 +-
arch/alpha/kernel/osf_sys.c | 2 +-
arch/um/Kconfig | 13 +
arch/um/Makefile | 3 +-
arch/x86/Makefile.um | 2 +-
drivers/input/serio/i8042-x86ia64io.h | 6 +
drivers/net/geneve.c | 18 +-
drivers/net/hyperv/netvsc_drv.c | 96 ++-
drivers/net/loopback.c | 6 -
drivers/net/nlmon.c | 6 -
drivers/net/usb/lan78xx.c | 966 +++++++++++++++++++++++--------
drivers/net/vsockmon.c | 14 +-
fs/btrfs/ref-verify.c | 6 +-
include/linux/netdevice.h | 6 +
include/uapi/linux/resource.h | 4 +-
kernel/sys.c | 91 +--
net/ipv6/route.c | 21 +-
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netrom/af_netrom.c | 14 +-
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
tools/testing/selftests/vm/map_hugetlb.c | 50 +-
27 files changed, 989 insertions(+), 373 deletions(-)
Arnd Bergmann (1):
y2038: rusage: use __kernel_old_timeval
Christophe Leroy (3):
tools/selftest/vm: allow choosing mem size and page size in map_hugetlb
selftests/vm: fix display of page size in map_hugetlb
selftests/vm: fix map_hugetlb length used for testing read and write
Dexuan Cui (1):
hv_netvsc: Make netvsc/VF binding check both MAC and serial number
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Fedor Pchelkin (1):
btrfs: ref-verify: free ref cache before clearing mount opt
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Johannes Berg (1):
um: allow not setting extra rpaths in the linux binary
John Efstathiades (4):
lan78xx: Fix white space and style issues
lan78xx: Add missing return code checks
lan78xx: Fix partial packet errors on suspend/resume
lan78xx: Fix race conditions in suspend/resume handling
Juhee Kang (1):
hv_netvsc: use netif_is_bond_master() instead of open code
Lee Jones (1):
net: usb: lan78xx: Remove lots of set but unused 'ret' variables
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Li RongQing (1):
net: move definition of pcpu_lstats to header file
Nico Pache (1):
selftests: mm: fix map_hugetlb failure on 64K page size systems
Oleg Nesterov (4):
getrusage: add the "signal_struct *sig" local variable
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use __for_each_thread()
getrusage: use sig->stats_lock rather than lock_task_sighand()
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Sasha Levin (1):
Linux 4.19.310
Shradha Gupta (1):
hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Werner Sembach (1):
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmX0nTsACgkQ3qZv95d3
LNxqzQ//Zn+wIORW6+gZGsidAvNaWWvtGOJII2rW1ZJsCfduCvuFPNRrab9m6JrQ
O+S7QgyLhLQpdqwbjA+MhGbai2KVPuv/MgsOT8ext5y9WpEDcy2uLSoabC9yOd0U
VDq27hhE6F3TgZpJwuVzHByuMmWSd96bLNuqhUEL5+07AIsRO+xqOK9Otwt6iuLd
uCMTtE0GlHIVAT73l8Pv7u1mOMXvEou/7xjDqQAB8Oo8pO+gHZfUqyExAjip4g9M
Ve6Sj8//zZWZHodKs/xtSX94ljP1qS3b3PdkkHJB2gbgqeymWm4v2X6IQgZ2PTXd
/Iy/4r9d9unBvEhfzXDcyHpvqZBOlR9nd6J0pjmERTpr6KWAwhRU/iaKDwt9lfij
nZ31PWR+qS8i7VxQGi+IKJ9QDLAhvMuR6yIF74ITuYWc+y19qsDE+KL8775eAGzA
/qOUqJsICtISSSsZ1jPJOJ4o469sxmAaT/xElFfVEbZLmtkx0FdI2mPeZ7WasHkN
421UnsoZnHr43Zc2uscquhmZuDEnoBkmGBhp2z6PTDe1SgMbR8c+qGJUNY31hKLv
XvuVMgrkLfXRu1LuZAaaxhPdIbjX9dDca5/I75C8S6a5eCpVmjZv+GhwZMuajxFU
qEUVkHQRch2RFyvV0Avl8IEIprRW46EuCx2eX2wXF/Fq4Mp20G4=
=zydN
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 5.4.272 kernel.
All users of the 5.4 kernel series must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
- ------------
Makefile | 2 +-
arch/alpha/kernel/osf_sys.c | 2 +-
arch/arm64/boot/dts/qcom/sdm845.dtsi | 25 +-
arch/um/Kconfig | 13 +
arch/um/Makefile | 3 +-
arch/x86/Makefile.um | 2 +-
drivers/base/regmap/internal.h | 4 +
drivers/base/regmap/regmap.c | 77 ++-
drivers/input/serio/i8042-x86ia64io.h | 6 +
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 +-
drivers/net/geneve.c | 18 +-
drivers/net/hyperv/netvsc_drv.c | 96 ++-
drivers/net/usb/lan78xx.c | 910 ++++++++++++++++++++------
drivers/tty/serial/Kconfig | 1 +
drivers/tty/serial/max310x.c | 378 ++++++++---
include/linux/regmap.h | 19 +
include/uapi/linux/resource.h | 4 +-
kernel/sys.c | 91 +--
net/ipv6/route.c | 21 +-
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +-
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
tools/testing/selftests/vm/map_hugetlb.c | 7 +
31 files changed, 1334 insertions(+), 464 deletions(-)
Andy Shevchenko (4):
serial: max310x: Use devm_clk_get_optional() to get the input clock
serial: max310x: Try to get crystal clock rate from property
serial: max310x: Make use of device properties
serial: max310x: Unprepare and disable clock in error path
Ansuel Smith (1):
regmap: allow to define reg_update_bits for no bus configuration
Arnd Bergmann (1):
y2038: rusage: use __kernel_old_timeval
Cosmin Tanislav (4):
serial: max310x: use regmap methods for SPI batch operations
serial: max310x: use a separate regmap for each port
serial: max310x: make accessing revision id interface-agnostic
serial: max310x: implement I2C support
Dexuan Cui (1):
hv_netvsc: Make netvsc/VF binding check both MAC and serial number
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Hugo Villeneuve (2):
serial: max310x: fail probe if clock crystal is unstable
serial: max310x: prevent infinite while() loop in port startup
Jan Kundrát (1):
serial: max310x: fix IO data corruption in batched operations
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Johan Hovold (1):
arm64: dts: qcom: sdm845: fix USB DP/DM HS PHY interrupts
Johannes Berg (1):
um: allow not setting extra rpaths in the linux binary
John Efstathiades (4):
lan78xx: Fix white space and style issues
lan78xx: Add missing return code checks
lan78xx: Fix partial packet errors on suspend/resume
lan78xx: Fix race conditions in suspend/resume handling
Juhee Kang (1):
hv_netvsc: use netif_is_bond_master() instead of open code
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Lina Iyer (1):
arm64: dts: qcom: add PDC interrupt controller for SDM845
Maciej Fijalkowski (1):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
Marek Vasut (1):
regmap: Add bulk read/write callbacks into regmap_config
Nico Pache (1):
selftests: mm: fix map_hugetlb failure on 64K page size systems
Oleg Nesterov (4):
getrusage: add the "signal_struct *sig" local variable
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use __for_each_thread()
getrusage: use sig->stats_lock rather than lock_task_sighand()
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Sasha Levin (1):
Linux 5.4.272
Shradha Gupta (1):
hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Werner Sembach (1):
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEE4n5dijQDou9mhzu83qZv95d3LNwFAmX0nLoACgkQ3qZv95d3
LNxrdw/+OcKPa2m7vFsJC7VGQgT9PtU14MOS3rrsVBotR76muYTk9xCXk2JX2fnK
qaJKLv59rX0bkXTqzrHZbBlWSzslaM9Ka4gdB3hIyLglIAmjXNRb2PKPb0HVA9Dw
a7vCLxGvv3wGPopTHd5/kED/XPh361VZP3OuNM9+qGsSPh/G4sgy031/6ABLx/i/
NyUKqY/+5CIx8uzMXKdytseNwBKh0oEaq6JbZ9o+9J6Uf5ZxonW5svSYhWPnr8ks
WbkH1+Ykxdz77w84WGL1EHjeR5FmeXECUcDviQ9UqHVJabpJManmSV1t4O+8uoev
x/1+/En9+1viEyUN4F+kbQFwb0a3icN+uF8LSz1GPOR4zJP/AKw7qec4vgcIXElm
MDGSRNss41pNcCGXQzqQ40fXtdbP+ZOQS1l4GqrLzwt0YKEmF/rAgCxMCfPtiQRI
BmUoTlTnpD/Yq7VudOXs8dNcH1Fmmfr1jGkEKTulcK8xyHbjxEMaJE19YSa7lnOy
A1jdfwuNmlY+aM+keUqST5gJDM0qiLTMizxiZwSR9faMefh2teKekLEbCSenJXxw
OEGlUBZ7EW1P6N6/IUO07Q5BJAcX+pgpj1u3lKYA/x+SxQscAh38F6k/Z44929UK
F50bBtIsGliRxdktknGA7lYuUJTuVCYulLHYCqP7ovxsOu7D9z8=
=116+
-----END PGP SIGNATURE-----
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
I'm announcing the release of the 5.10.213 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
- ------------
Makefile | 2 +-
arch/um/Kconfig | 13 +
arch/um/Makefile | 3 +-
arch/x86/Makefile.um | 2 +-
drivers/base/regmap/internal.h | 4 +
drivers/base/regmap/regmap.c | 77 +-
drivers/hv/channel.c | 174 +++-
drivers/hv/hyperv_vmbus.h | 3 +-
drivers/hv/ring_buffer.c | 28 +-
drivers/mmc/host/mmci_stm32_sdmmc.c | 112 ++-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 +-
drivers/net/geneve.c | 18 +-
drivers/net/hyperv/hyperv_net.h | 13 +
drivers/net/hyperv/netvsc.c | 55 +-
drivers/net/hyperv/netvsc_drv.c | 107 ++-
drivers/net/hyperv/rndis_filter.c | 1 +
drivers/net/usb/lan78xx.c | 910 ++++++++++++++++-----
drivers/tty/serial/Kconfig | 1 +
drivers/tty/serial/max310x.c | 378 ++++++---
drivers/usb/host/xhci-ring.c | 143 +++-
drivers/usb/host/xhci.h | 3 +
fs/ext4/extents.c | 5 +-
fs/ext4/extents_status.c | 14 +-
fs/ext4/extents_status.h | 6 +-
fs/ext4/inode.c | 65 +-
fs/hugetlbfs/inode.c | 17 +-
include/linux/filter.h | 3 +-
include/linux/hugetlb.h | 2 +-
include/linux/hyperv.h | 23 +
include/linux/lsm_hook_defs.h | 6 +-
include/linux/lsm_hooks.h | 4 +-
include/linux/regmap.h | 19 +
include/linux/security.h | 11 +-
include/linux/sockptr.h | 5 +
include/trace/events/qdisc.h | 20 +-
kernel/bpf/cpumap.c | 2 +-
kernel/sys.c | 91 ++-
mm/hugetlb.c | 37 +-
net/core/filter.c | 5 +-
net/core/sock.c | 52 +-
net/ipv6/route.c | 21 +-
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +-
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
security/apparmor/lsm.c | 29 +-
security/security.c | 35 +-
security/selinux/hooks.c | 13 +-
security/smack/smack_lsm.c | 19 +-
.../selftests/vm/charge_reserved_hugetlb.sh | 2 +-
tools/testing/selftests/vm/map_hugetlb.c | 7 +
tools/testing/selftests/vm/write_hugetlb_memory.sh | 2 +-
60 files changed, 1981 insertions(+), 702 deletions(-)
Andrea Parri (Microsoft) (1):
Drivers: hv: vmbus: Drop error message when 'No request id available'
Andres Beltran (2):
Drivers: hv: vmbus: Add vmbus_requestor data structure for VMBus hardening
hv_netvsc: Use vmbus_requestor to generate transaction IDs for VMBus hardening
Andy Shevchenko (4):
serial: max310x: Use devm_clk_get_optional() to get the input clock
serial: max310x: Try to get crystal clock rate from property
serial: max310x: Make use of device properties
serial: max310x: Unprepare and disable clock in error path
Ansuel Smith (1):
regmap: allow to define reg_update_bits for no bus configuration
Baokun Li (1):
ext4: make ext4_es_insert_extent() return void
Christophe Kerello (1):
mmc: mmci: stm32: fix DMA API overlapping mappings warning
Cosmin Tanislav (4):
serial: max310x: use regmap methods for SPI batch operations
serial: max310x: use a separate regmap for each port
serial: max310x: make accessing revision id interface-agnostic
serial: max310x: implement I2C support
Dexuan Cui (1):
hv_netvsc: Make netvsc/VF binding check both MAC and serial number
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Hugo Villeneuve (2):
serial: max310x: fail probe if clock crystal is unstable
serial: max310x: prevent infinite while() loop in port startup
Jan Kundrát (1):
serial: max310x: fix IO data corruption in batched operations
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Johannes Berg (1):
um: allow not setting extra rpaths in the linux binary
John Efstathiades (4):
lan78xx: Fix white space and style issues
lan78xx: Add missing return code checks
lan78xx: Fix partial packet errors on suspend/resume
lan78xx: Fix race conditions in suspend/resume handling
Juhee Kang (1):
hv_netvsc: use netif_is_bond_master() instead of open code
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Long Li (2):
hv_netvsc: Wait for completion on request SWITCH_DATA_PATH
hv_netvsc: Process NETDEV_GOING_DOWN on VF hot remove
Maciej Fijalkowski (2):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
Marek Vasut (1):
regmap: Add bulk read/write callbacks into regmap_config
Martin KaFai Lau (2):
net: Change sock_getsockopt() to take the sk ptr instead of the sock ptr
bpf: net: Change sk_getsockopt() to take the sockptr_t argument
Mathias Nyman (3):
xhci: remove extra loop in interrupt context
xhci: prevent double-fetch of transfer and transfer event TRBs
xhci: process isoc TD properly when there was a transaction error mid TD.
Michal Pecio (1):
xhci: handle isoc Babble and Buffer Overrun events properly
Mike Kravetz (1):
mm/hugetlb: change hugetlb_reserve_pages() to type bool
Muhammad Usama Anjum (1):
selftests/mm: switch to bash from sh
Nico Pache (1):
selftests: mm: fix map_hugetlb failure on 64K page size systems
Oleg Nesterov (4):
getrusage: add the "signal_struct *sig" local variable
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use __for_each_thread()
getrusage: use sig->stats_lock rather than lock_task_sighand()
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Ondrej Mosnacek (1):
lsm: fix default return value of the socket_getpeersec_*() hooks
Paul Moore (1):
lsm: make security_socket_getpeersec_stream() sockptr_t safe
Prakash Sangappa (1):
mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVE
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Sasha Levin (1):
Linux 5.10.213
Shradha Gupta (1):
hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Steven Rostedt (Google) (1):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
Yann Gautier (1):
mmc: mmci: stm32: use a buffer for unaligned DMA requests
Zhang Yi (2):
ext4: refactor ext4_da_map_blocks()
ext4: convert to exclusive lock while inserting delalloc extents
I'm announcing the release of the 6.1.82 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
I'm announcing the release of the 6.6.22 kernel.
All users of the 6.6 kernel series must upgrade.
The updated 6.6.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.6.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
I'm announcing the release of the 6.7.10 kernel.
All users of the 6.7 kernel series must upgrade.
The updated 6.7.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.7.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
I'm announcing the release of the 6.8.1 kernel.
All users of the 6.8 kernel series must upgrade.
The updated 6.8.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.8.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
Thanks,
Sasha
This is the start of the stable review cycle for the 6.8.1 release.
There are 5 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Mar 15 04:28:11 PM UTC 2024.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.8.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Sasha Levin (1):
Linux 6.8.1-rc1
.../ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 21 ++++
Makefile | 4 +-
arch/x86/Kconfig | 11 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
13 files changed, 280 insertions(+), 11 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
--
2.43.0
This is the start of the stable review cycle for the 6.6.22 release.
There are 60 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Mar 15 04:36:58 PM UTC 2024.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.6.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Byungchul Park (1):
mm/vmscan: fix a bug calling wakeup_kswapd() with a wrong zone index
Christian Borntraeger (1):
KVM: s390: vsie: fix race during shadow creation
Daniel Borkmann (2):
xdp, bonding: Fix feature flags when there are no slave devs anymore
selftests/bpf: Fix up xdp bonding test wrt feature flags
Eduard Zingerman (1):
bpf: check bpf_func_state->callback_depth when pruning states
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Emeel Hakim (1):
net/mlx5e: Fix MACsec state loss upon state update in offload path
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Kauer (1):
igc: avoid returning frame twice in XDP_REDIRECT
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Frank Li (3):
dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts
dmaengine: fsl-edma: utilize common dt-binding header file
dmaengine: fsl-edma: correct max_segment_size setting
Gao Xiang (1):
erofs: apply proper VMA alignment for memory mapped files on THP
Gavin Li (1):
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
Horatiu Vultur (1):
net: sparx5: Fix use after free inside sparx5_del_mact_entry
Jacob Keller (1):
ice: virtchnl: stop pretending to support RSS over AQ or registers
Jan Kara (1):
readahead: avoid multiple marked readahead pages
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Jianbo Liu (2):
net/mlx5: E-switch, Change flow rule destination checking
net/mlx5e: Change the warning when ignore_flow_level is not supported
Kefeng Wang (3):
mm: migrate: remove PageTransHuge check in numamigrate_isolate_page()
mm: migrate: remove THP mapcount check in numamigrate_isolate_page()
mm: migrate: convert numamigrate_isolate_page() to numamigrate_isolate_folio()
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Leon Romanovsky (1):
xfrm: Pass UDP encapsulation in TX packet offload
Maciej Fijalkowski (3):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ice: reorder disabling IRQ and NAPI in ice_qp_dis
Matthieu Baerts (NGI0) (1):
selftests: mptcp: decrease BW in simult flows
Moshe Shemesh (1):
net/mlx5: Check capability for fw_reset
Nico Boehr (1):
KVM: s390: add stat counter for shadow gmap events
Oleg Nesterov (1):
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Rahul Rameshbabu (2):
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Saeed Mahameed (1):
Revert "net/mlx5e: Check the number of elements before walk TC
rhashtable"
Sasha Levin (1):
Linux 6.6.22-rc1
Steven Rostedt (Google) (1):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
Tobias Jakobi (Compleo) (1):
net: dsa: microchip: fix register write order in ksz8_ind_write8()
Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
Xiubo Li (1):
ceph: switch to corrected encoding of max_xattr_size in mdsmap
Yongzhi Liu (1):
net: pds_core: Fix possible double free in error handling path
.../ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 21 ++++
Makefile | 4 +-
arch/s390/include/asm/kvm_host.h | 7 ++
arch/s390/kvm/gaccess.c | 7 ++
arch/s390/kvm/kvm-s390.c | 9 +-
arch/s390/kvm/vsie.c | 6 +-
arch/s390/mm/gmap.c | 1 +
arch/x86/Kconfig | 11 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
drivers/dma/fsl-edma-common.h | 5 +-
drivers/dma/fsl-edma-main.c | 21 ++--
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/dsa/microchip/ksz8795.c | 4 +-
drivers/net/ethernet/amd/pds_core/auxbus.c | 12 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +-
.../intel/ice/ice_virtchnl_allowlist.c | 2 -
drivers/net/ethernet/intel/ice/ice_xsk.c | 9 +-
drivers/net/ethernet/intel/igc/igc_main.c | 13 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++--
.../net/ethernet/mellanox/mlx5/core/devlink.c | 6 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 12 +-
.../mellanox/mlx5/core/en/tc/post_act.c | 2 +-
.../mellanox/mlx5/core/en_accel/macsec.c | 82 ++++++++------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +
.../mellanox/mlx5/core/esw/ipsec_fs.c | 2 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 46 +++-----
.../ethernet/mellanox/mlx5/core/fw_reset.c | 22 +++-
.../microchip/sparx5/sparx5_mactable.c | 4 +-
drivers/net/geneve.c | 18 ++-
drivers/net/usb/lan78xx.c | 3 +-
fs/ceph/mdsmap.c | 7 +-
fs/erofs/data.c | 1 +
include/dt-bindings/dma/fsl-edma.h | 21 ++++
include/linux/ceph/mdsmap.h | 6 +-
include/linux/cpu.h | 2 +
include/linux/mlx5/mlx5_ifc.h | 4 +-
include/trace/events/qdisc.h | 20 ++--
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/verifier.c | 3 +
kernel/exit.c | 10 +-
mm/migrate.c | 34 +++---
mm/readahead.c | 4 +-
net/ipv6/route.c | 21 ++--
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +--
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
net/xfrm/xfrm_device.c | 2 +-
.../selftests/bpf/prog_tests/xdp_bonding.c | 4 +-
.../selftests/net/mptcp/simult_flows.sh | 8 +-
66 files changed, 628 insertions(+), 237 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
create mode 100644 include/dt-bindings/dma/fsl-edma.h
--
2.43.0
The vfio-platform SET_IRQS ioctl currently allows loopback triggering of
an interrupt before a signaling eventfd has been configured by the user,
which thereby allows a NULL pointer dereference.
Rather than register the IRQ relative to a valid trigger, register all
IRQs in a disabled state in the device open path. This allows mask
operations on the IRQ to nest within the overall enable state governed
by a valid eventfd signal. This decouples @masked, protected by the
@lock spinlock, from @trigger, protected by the @igate mutex.
In doing so, it's guaranteed that changes to @trigger cannot race the
IRQ handlers because the IRQ handler is synchronously disabled before
modifying the trigger, and loopback triggering of the IRQ via ioctl is
safe due to serialization with trigger changes via igate.
For compatibility, request_irq() failures are maintained to be local to
the SET_IRQS ioctl rather than a fatal error in the open device path.
This allows, for example, a userspace driver with polling mode support
to continue to work regardless of moving the request_irq() call site.
This necessarily blocks all SET_IRQS access to the failed index.
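As a rough illustration of the lifecycle described above — not the actual
driver code; 'struct example_irq', its fields and the helper names are
hypothetical — the pattern boils down to registering the handler once with
IRQF_NO_AUTOEN and bracketing every trigger change with
disable_irq()/enable_irq():

    /* Open path: persistent handler, registered in a disabled state. */
    static int example_irq_init(struct example_irq *ctx, int hwirq)
    {
        /* IRQF_NO_AUTOEN keeps the line off until enable_irq(). */
        return request_irq(hwirq, example_handler, IRQF_NO_AUTOEN,
                           "example-irq", ctx);
    }

    /* SET_IRQS path: swap the eventfd while the IRQ is disabled. */
    static int example_set_trigger(struct example_irq *ctx, int fd)
    {
        struct eventfd_ctx *trigger;

        if (ctx->trigger) {
            disable_irq(ctx->hwirq);    /* waits for running handlers */
            eventfd_ctx_put(ctx->trigger);
            ctx->trigger = NULL;
        }

        if (fd < 0)                     /* disable only */
            return 0;

        trigger = eventfd_ctx_fdget(fd);
        if (IS_ERR(trigger))
            return PTR_ERR(trigger);

        ctx->trigger = trigger;
        enable_irq(ctx->hwirq);
        return 0;
    }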
Cc: Eric Auger <eric.auger(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 57f972e2b341 ("vfio/platform: trigger an interrupt via eventfd")
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/platform/vfio_platform_irq.c | 100 +++++++++++++++-------
1 file changed, 68 insertions(+), 32 deletions(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index e5dcada9e86c..ef41ecef83af 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -136,6 +136,16 @@ static int vfio_platform_set_irq_unmask(struct vfio_platform_device *vdev,
return 0;
}
+/*
+ * The trigger eventfd is guaranteed valid in the interrupt path
+ * and protected by the igate mutex when triggered via ioctl.
+ */
+static void vfio_send_eventfd(struct vfio_platform_irq *irq_ctx)
+{
+ if (likely(irq_ctx->trigger))
+ eventfd_signal(irq_ctx->trigger);
+}
+
static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
{
struct vfio_platform_irq *irq_ctx = dev_id;
@@ -155,7 +165,7 @@ static irqreturn_t vfio_automasked_irq_handler(int irq, void *dev_id)
spin_unlock_irqrestore(&irq_ctx->lock, flags);
if (ret == IRQ_HANDLED)
- eventfd_signal(irq_ctx->trigger);
+ vfio_send_eventfd(irq_ctx);
return ret;
}
@@ -164,52 +174,40 @@ static irqreturn_t vfio_irq_handler(int irq, void *dev_id)
{
struct vfio_platform_irq *irq_ctx = dev_id;
- eventfd_signal(irq_ctx->trigger);
+ vfio_send_eventfd(irq_ctx);
return IRQ_HANDLED;
}
static int vfio_set_trigger(struct vfio_platform_device *vdev, int index,
- int fd, irq_handler_t handler)
+ int fd)
{
struct vfio_platform_irq *irq = &vdev->irqs[index];
struct eventfd_ctx *trigger;
- int ret;
if (irq->trigger) {
- irq_clear_status_flags(irq->hwirq, IRQ_NOAUTOEN);
- free_irq(irq->hwirq, irq);
- kfree(irq->name);
+ disable_irq(irq->hwirq);
eventfd_ctx_put(irq->trigger);
irq->trigger = NULL;
}
if (fd < 0) /* Disable only */
return 0;
- irq->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-irq[%d](%s)",
- irq->hwirq, vdev->name);
- if (!irq->name)
- return -ENOMEM;
trigger = eventfd_ctx_fdget(fd);
- if (IS_ERR(trigger)) {
- kfree(irq->name);
+ if (IS_ERR(trigger))
return PTR_ERR(trigger);
- }
irq->trigger = trigger;
- irq_set_status_flags(irq->hwirq, IRQ_NOAUTOEN);
- ret = request_irq(irq->hwirq, handler, 0, irq->name, irq);
- if (ret) {
- kfree(irq->name);
- eventfd_ctx_put(trigger);
- irq->trigger = NULL;
- return ret;
- }
-
- if (!irq->masked)
- enable_irq(irq->hwirq);
+ /*
+ * irq->masked effectively provides nested disables within the overall
+ * enable relative to trigger. Specifically request_irq() is called
+ * with NO_AUTOEN, therefore the IRQ is initially disabled. The user
+ * may only further disable the IRQ with a MASK operation because
+ * irq->masked is initially false.
+ */
+ enable_irq(irq->hwirq);
return 0;
}
@@ -228,7 +226,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
handler = vfio_irq_handler;
if (!count && (flags & VFIO_IRQ_SET_DATA_NONE))
- return vfio_set_trigger(vdev, index, -1, handler);
+ return vfio_set_trigger(vdev, index, -1);
if (start != 0 || count != 1)
return -EINVAL;
@@ -236,7 +234,7 @@ static int vfio_platform_set_irq_trigger(struct vfio_platform_device *vdev,
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
int32_t fd = *(int32_t *)data;
- return vfio_set_trigger(vdev, index, fd, handler);
+ return vfio_set_trigger(vdev, index, fd);
}
if (flags & VFIO_IRQ_SET_DATA_NONE) {
@@ -260,6 +258,14 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
unsigned start, unsigned count, uint32_t flags,
void *data) = NULL;
+ /*
+ * For compatibility, errors from request_irq() are local to the
+ * SET_IRQS path and reflected in the name pointer. This allows,
+ * for example, polling mode fallback for an exclusive IRQ failure.
+ */
+ if (IS_ERR(vdev->irqs[index].name))
+ return PTR_ERR(vdev->irqs[index].name);
+
switch (flags & VFIO_IRQ_SET_ACTION_TYPE_MASK) {
case VFIO_IRQ_SET_ACTION_MASK:
func = vfio_platform_set_irq_mask;
@@ -280,7 +286,7 @@ int vfio_platform_set_irqs_ioctl(struct vfio_platform_device *vdev,
int vfio_platform_irq_init(struct vfio_platform_device *vdev)
{
- int cnt = 0, i;
+ int cnt = 0, i, ret = 0;
while (vdev->get_irq(vdev, cnt) >= 0)
cnt++;
@@ -292,29 +298,54 @@ int vfio_platform_irq_init(struct vfio_platform_device *vdev)
for (i = 0; i < cnt; i++) {
int hwirq = vdev->get_irq(vdev, i);
+ irq_handler_t handler = vfio_irq_handler;
- if (hwirq < 0)
+ if (hwirq < 0) {
+ ret = -EINVAL;
goto err;
+ }
spin_lock_init(&vdev->irqs[i].lock);
vdev->irqs[i].flags = VFIO_IRQ_INFO_EVENTFD;
- if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK)
+ if (irq_get_trigger_type(hwirq) & IRQ_TYPE_LEVEL_MASK) {
vdev->irqs[i].flags |= VFIO_IRQ_INFO_MASKABLE
| VFIO_IRQ_INFO_AUTOMASKED;
+ handler = vfio_automasked_irq_handler;
+ }
vdev->irqs[i].count = 1;
vdev->irqs[i].hwirq = hwirq;
vdev->irqs[i].masked = false;
+ vdev->irqs[i].name = kasprintf(GFP_KERNEL_ACCOUNT,
+ "vfio-irq[%d](%s)", hwirq,
+ vdev->name);
+ if (!vdev->irqs[i].name) {
+ ret = -ENOMEM;
+ goto err;
+ }
+
+ ret = request_irq(hwirq, handler, IRQF_NO_AUTOEN,
+ vdev->irqs[i].name, &vdev->irqs[i]);
+ if (ret) {
+ kfree(vdev->irqs[i].name);
+ vdev->irqs[i].name = ERR_PTR(ret);
+ }
}
vdev->num_irqs = cnt;
return 0;
err:
+ for (--i; i >= 0; i--) {
+ if (!IS_ERR(vdev->irqs[i].name)) {
+ free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
+ kfree(vdev->irqs[i].name);
+ }
+ }
kfree(vdev->irqs);
- return -EINVAL;
+ return ret;
}
void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
@@ -324,7 +355,12 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
for (i = 0; i < vdev->num_irqs; i++) {
vfio_virqfd_disable(&vdev->irqs[i].mask);
vfio_virqfd_disable(&vdev->irqs[i].unmask);
- vfio_set_trigger(vdev, i, -1, NULL);
+ if (!IS_ERR(vdev->irqs[i].name)) {
+ free_irq(vdev->irqs[i].hwirq, &vdev->irqs[i]);
+ if (vdev->irqs[i].trigger)
+ eventfd_ctx_put(vdev->irqs[i].trigger);
+ kfree(vdev->irqs[i].name);
+ }
}
vdev->num_irqs = 0;
--
2.44.0
This is the start of the stable review cycle for the 6.7.10 release.
There are 61 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Mar 15 04:32:27 PM UTC 2024.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.7.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Aya Levin (1):
net/mlx5: Fix fw reporter diagnose output
Daniel Borkmann (2):
xdp, bonding: Fix feature flags when there are no slave devs anymore
selftests/bpf: Fix up xdp bonding test wrt feature flags
Eduard Zingerman (1):
bpf: check bpf_func_state->callback_depth when pruning states
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Emeel Hakim (1):
net/mlx5e: Fix MACsec state loss upon state update in offload path
Emil Tantilov (1):
idpf: disable local BH when scheduling napi for marker packets
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Kauer (1):
igc: avoid returning frame twice in XDP_REDIRECT
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Frank Li (3):
dt-bindings: dma: fsl-edma: Add fsl-edma.h to prevent hardcoding in dts
dmaengine: fsl-edma: utilize common dt-binding header file
dmaengine: fsl-edma: correct max_segment_size setting
Gao Xiang (1):
erofs: apply proper VMA alignment for memory mapped files on THP
Gavin Li (1):
Revert "net/mlx5: Block entering switchdev mode with ns inconsistency"
Guillaume Nault (1):
xfrm: Clear low order bits of ->flowi4_tos in decode_session4().
Horatiu Vultur (1):
net: sparx5: Fix use after free inside sparx5_del_mact_entry
Jacob Keller (2):
ice: replace ice_vf_recreate_vsi() with ice_vf_reconfig_vsi()
ice: virtchnl: stop pretending to support RSS over AQ or registers
Jan Kara (1):
readahead: avoid multiple marked readahead pages
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Jianbo Liu (2):
net/mlx5: E-switch, Change flow rule destination checking
net/mlx5e: Change the warning when ignore_flow_level is not supported
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Leon Romanovsky (1):
xfrm: Pass UDP encapsulation in TX packet offload
Maciej Fijalkowski (3):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
i40e: disable NAPI right after disabling irqs when handling xsk_pool
ice: reorder disabling IRQ and NAPI in ice_qp_dis
Matthieu Baerts (NGI0) (1):
selftests: mptcp: decrease BW in simult flows
Michal Schmidt (1):
ice: fix uninitialized dplls mutex usage
Michal Swiatkowski (1):
ice: reconfig host after changing MSI-X on VF
Moshe Shemesh (1):
net/mlx5: Check capability for fw_reset
Oleg Nesterov (1):
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Rahul Rameshbabu (2):
net/mlx5e: Use a memory barrier to enforce PTP WQ xmit submission tracking occurs after populating the metadata_map
net/mlx5e: Switch to using _bh variant of of spinlock API in port timestamping NAPI poll context
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Saeed Mahameed (1):
Revert "net/mlx5e: Check the number of elements before walk TC
rhashtable"
Sasha Levin (1):
Linux 6.7.10-rc1
Steven Rostedt (Google) (1):
tracing/net_sched: Fix tracepoints that save qdisc_dev() as a string
Suren Baghdasaryan (1):
arch/arm/mm: fix major fault accounting when retrying under per-VMA lock
Tobias Jakobi (Compleo) (1):
net: dsa: microchip: fix register write order in ksz8_ind_write8()
Toke Høiland-Jørgensen (1):
cpumap: Zero-initialise xdp_rxq_info struct before running XDP program
Wang Kefeng (1):
ARM: 9328/1: mm: try VMA lock-based page fault handling first
Yongzhi Liu (1):
net: pds_core: Fix possible double free in error handling path
.../ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../hw-vuln/reg-file-data-sampling.rst | 104 ++++++++++++++++++
.../admin-guide/kernel-parameters.txt | 21 ++++
Makefile | 4 +-
arch/arm/Kconfig | 1 +
arch/arm/mm/fault.c | 32 ++++++
arch/x86/Kconfig | 11 ++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 ++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
drivers/dma/fsl-edma-common.h | 5 +-
drivers/dma/fsl-edma-main.c | 21 ++--
drivers/net/bonding/bond_main.c | 2 +-
drivers/net/dsa/microchip/ksz8795.c | 4 +-
drivers/net/ethernet/amd/pds_core/auxbus.c | 12 +-
drivers/net/ethernet/intel/i40e/i40e_main.c | 2 +-
drivers/net/ethernet/intel/ice/ice_dpll.c | 2 +-
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ice/ice_sriov.c | 33 ++----
drivers/net/ethernet/intel/ice/ice_vf_lib.c | 35 ++++--
drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 -
.../ethernet/intel/ice/ice_vf_lib_private.h | 1 +
drivers/net/ethernet/intel/ice/ice_virtchnl.c | 9 +-
.../intel/ice/ice_virtchnl_allowlist.c | 2 -
drivers/net/ethernet/intel/ice/ice_xsk.c | 9 +-
.../net/ethernet/intel/idpf/idpf_virtchnl.c | 2 +
drivers/net/ethernet/intel/igc/igc_main.c | 13 +--
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++--
.../net/ethernet/mellanox/mlx5/core/devlink.c | 6 +
.../net/ethernet/mellanox/mlx5/core/en/ptp.c | 12 +-
.../mellanox/mlx5/core/en/tc/post_act.c | 2 +-
.../mellanox/mlx5/core/en_accel/macsec.c | 82 ++++++++------
.../net/ethernet/mellanox/mlx5/core/en_tx.c | 2 +
.../mellanox/mlx5/core/esw/ipsec_fs.c | 2 +-
.../mellanox/mlx5/core/eswitch_offloads.c | 46 +++-----
.../ethernet/mellanox/mlx5/core/fw_reset.c | 22 +++-
.../net/ethernet/mellanox/mlx5/core/health.c | 2 +-
.../microchip/sparx5/sparx5_mactable.c | 4 +-
drivers/net/geneve.c | 18 ++-
drivers/net/usb/lan78xx.c | 3 +-
fs/erofs/data.c | 1 +
include/dt-bindings/dma/fsl-edma.h | 21 ++++
include/linux/cpu.h | 2 +
include/linux/mlx5/mlx5_ifc.h | 4 +-
include/trace/events/qdisc.h | 20 ++--
kernel/bpf/cpumap.c | 2 +-
kernel/bpf/verifier.c | 3 +
kernel/exit.c | 10 +-
mm/readahead.c | 4 +-
net/ipv6/route.c | 21 ++--
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +--
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
net/xfrm/xfrm_device.c | 2 +-
net/xfrm/xfrm_policy.c | 2 +-
.../selftests/bpf/prog_tests/xdp_bonding.c | 4 +-
.../selftests/net/mptcp/simult_flows.sh | 8 +-
68 files changed, 648 insertions(+), 251 deletions(-)
create mode 100644 Documentation/admin-guide/hw-vuln/reg-file-data-sampling.rst
create mode 100644 include/dt-bindings/dma/fsl-edma.h
--
2.43.0
It is possible to trigger the below warning message from the mass storage function:
WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
pc : usb_ep_queue+0x7c/0x104
lr : fsg_main_thread+0x494/0x1b3c
The root cause is that the mass storage function tries to queue a request
from its main thread, while another thread may have already disabled the
endpoint during function disable.
As there is no functional failure in the driver, rather than spending
effort to avoid the warning, change the WARN_ON_ONCE() in usb_ep_queue()
to a pr_debug().
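For clarity, the interleaving looks roughly like this (an illustrative
timeline, not taken from a trace; the thread on the right could be a
composite/configfs context tearing the function down):

    /*
     *   fsg_main_thread()                  other thread
     *   -----------------                  ------------
     *   pick up pending request
     *                                      function disable ->
     *                                          usb_ep_disable(ep)
     *                                          (ep->enabled = false)
     *   usb_ep_queue(ep, req, GFP_KERNEL)
     *       sees !ep->enabled && ep->address;
     *       previously WARN_ON_ONCE(), now just
     *       pr_debug() and -ESHUTDOWN
     */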
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: yuan linyu <yuanlinyu(a)hihonor.com>
---
v4: add version info in subject
v3: add more debug info, remove two line commit description
https://lore.kernel.org/linux-usb/20240315015854.2715357-1-yuanlinyu@hihono…
v2: change WARN_ON_ONCE() in usb_ep_queue() to pr_debug()
https://lore.kernel.org/linux-usb/20240315013019.2711135-1-yuanlinyu@hihono…
v1: https://lore.kernel.org/linux-usb/20240314065949.2627778-1-yuanlinyu@hihono…
drivers/usb/gadget/udc/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 9d4150124fdb..b3a9d18a8dcd 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -292,7 +292,9 @@ int usb_ep_queue(struct usb_ep *ep,
{
int ret = 0;
- if (WARN_ON_ONCE(!ep->enabled && ep->address)) {
+ if (!ep->enabled && ep->address) {
+ pr_debug("USB gadget: queue request to disabled ep 0x%x (%s)\n",
+ ep->address, ep->name);
ret = -ESHUTDOWN;
goto out;
}
--
2.25.1
Hi!
This is not exactly a regression, as I am not aware of a prior working
state, but the kernel documentation advises me to CC the regressions list
anyway [1].
I am trying to put data on an external Kingston XS-2000 4 TB SSD using a
self-compiled Linux 6.7.4 kernel and an encrypted BCacheFS. I do not think
BCacheFS has any part in the errors I see, but if you disagree feel free
to CC the BCacheFS mailing list as you reply.
I am using a ThinkPad T14 AMD Gen 1 with AMD Ryzen 7 PRO 4750U and 32
GiB of RAM.
I connected the SSD directly to a USB-C port on the ThinkPad. lsusb
lists it as:
Bus 007 Device 004: ID 0951:176b Kingston Technology XS2000
The SSD is detected as follows:
[20303.913644] usb 7-1: new SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[20303.926616] usb 7-1: New USB device found, idVendor=0951, idProduct=176b, bcdDevice= 1.00
[20303.926633] usb 7-1: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[20303.926641] usb 7-1: Product: XS2000
[20303.926647] usb 7-1: Manufacturer: Kingston
[20303.926652] usb 7-1: SerialNumber: […]
[20303.929078] scsi host0: uas
[20303.983859] scsi 0:0:0:0: Direct-Access Kingston XS2000 1000 PQ: 0 ANSI: 6
[20303.984426] sd 0:0:0:0: Attached scsi generic sg0 type 0
[20303.985197] sd 0:0:0:0: [sda] 8001573552 512-byte logical blocks: (4.10 TB/3.73 TiB)
[20303.985331] sd 0:0:0:0: [sda] Write Protect is off
[20303.985341] sd 0:0:0:0: [sda] Mode Sense: 43 00 00 00
[20303.985579] sd 0:0:0:0: [sda] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[20303.989516] sda: sda1
[20303.989611] sd 0:0:0:0: [sda] Attached SCSI disk
BCacheFS is mounted as follows – but I suspect BCacheFS is not involved in
those errors anyway:
[20310.437864] bcachefs (sda1): mounting version 1.3: rebalance_work opts=metadata_checksum=xxhash,data_checksum=xxhash,compression=lz4
[20310.437895] bcachefs (sda1): recovering from clean shutdown, journal seq 5094
[20310.450813] bcachefs (sda1): alloc_read... done
[20310.450851] bcachefs (sda1): stripes_read... done
[20310.450855] bcachefs (sda1): snapshots_read... done
[20310.470815] bcachefs (sda1): journal_replay... done
[20310.470824] bcachefs (sda1): resume_logged_ops... done
[20310.470835] bcachefs (sda1): going read-write
While rsync'ing about 1.4 TB of data, after roughly an hour I got
things like this:
[33963.462694] sd 0:0:0:0: [sda] tag#10 uas_zap_pending 0 uas-tag 1 inflight: CMD
[33963.462708] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
[33963.462718] sd 0:0:0:0: [sda] tag#11 uas_zap_pending 0 uas-tag 2 inflight: CMD
[33963.462725] sd 0:0:0:0: [sda] tag#11 CDB: Write(16) 8a 00 00 00 00 00 82 c1 c8 00 00 00 04 00 00 00
[33963.462733] sd 0:0:0:0: [sda] tag#15 uas_zap_pending 0 uas-tag 3 inflight: CMD
[33963.462740] sd 0:0:0:0: [sda] tag#15 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d2 4c 00 00 01 2f 00 00
[33963.462748] sd 0:0:0:0: [sda] tag#12 uas_zap_pending 0 uas-tag 4 inflight: CMD
[33963.462754] sd 0:0:0:0: [sda] tag#12 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d0 00 00 00 02 4c 00 00
[33963.462762] sd 0:0:0:0: [sda] tag#13 uas_zap_pending 0 uas-tag 5 inflight: CMD
[33963.462769] sd 0:0:0:0: [sda] tag#13 CDB: Write(16) 8a 00 00 00 00 00 82 c1 d4 00 00 00 00 ff 00 00
[33963.462777] sd 0:0:0:0: [sda] tag#14 uas_zap_pending 0 uas-tag 6 inflight: CMD
[33963.462783] sd 0:0:0:0: [sda] tag#14 CDB: Write(16) 8a 00 00 00 00 00 82 c1 ce 00 00 00 00 cc 00 00
[33963.576991] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 9 using xhci_hcd
[33963.590793] scsi host0: uas_eh_device_reset_handler success
[33963.592857] sd 0:0:0:0: [sda] tag#10 timing out command, waited 180s
[33963.592872] sd 0:0:0:0: [sda] tag#10 FAILED Result: hostbyte=DID_RESET driverbyte=DRIVER_OK cmd_age=182s
[33963.592881] sd 0:0:0:0: [sda] tag#10 CDB: Write(16) 8a 00 00 00 00 00 82 c1 bc 00 00 00 04 00 00 00
[33963.592886] I/O error, dev sda, sector 2193734656 op 0x1:(WRITE) flags 0x104000 phys_seg 773 prio class 2
[33963.592898] bcachefs (sda1 inum 1073761281 offset 265216): data write error: I/O
[33963.592925] bcachefs (sda1 inum 1073761281 offset 467456): data write error: I/O
[33963.592933] bcachefs (sda1 inum 1073761281 offset 470016): data write error: I/O
[33963.592939] bcachefs (sda1 inum 1073761281 offset 471552): data write error: I/O
[33963.592949] bcachefs (sda1 inum 1073761281 offset 514560): data write error: I/O
[33963.592956] bcachefs (sda1 inum 1073761281 offset 517120): data write error: I/O
[33963.592963] bcachefs (sda1 inum 1073761281 offset 519168): data write error: I/O
[33963.592969] bcachefs (sda1 inum 1073761281 offset 521728): data write error: I/O
[33963.592976] bcachefs (sda1 inum 1073761281 offset 523776): data write error: I/O
[33963.592983] bcachefs (sda1 inum 1073761281 offset 526336): data write error: I/O
The rsync completed but I did not trust the result, even though
"bcachefs fsck" told me the filesystem structure is okay.
Thus I reran rsync with option "-c" for checksumming. After a long time
with data that did match, it started to transfer a file again, which should
not have happened if the data were identical. As it ran into I/O errors
again, I stopped the rsync process.
I looked for that UAS error message and, according to the article [2] I
found, I disabled UAS as follows:
% cat /etc/modprobe.d/disable-uas.conf
# Does not work with external SSD Transcend XS2000 4TB
options usb-storage quirks=0951:176b:u
The quirk was applied as I reconnected the devices after unloading
both usb-storage and uas modules:
[ 55.871301] usb 7-1: UAS is ignored for this device, using usb-storage instead
[ 55.871310] usb-storage 7-1:1.0: USB Mass Storage device detected
[ 55.871559] usb-storage 7-1:1.0: Quirks match for vid 0951 pid 176b: 800000
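(For reference, the same quirk can also be applied without a modprobe.d
file via the kernel command line — assuming the same vendor:product pair:

    usb-storage.quirks=0951:176b:u

where the 'u' flag asks the kernel to ignore UAS for that device and bind
usb-storage instead.)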
I recreated the BCacheFS filesystem and tried again. This time it did
not take more than 10 minutes for the first I/O error to appear. Unlike
with UAS, it made rsync stop with an I/O error immediately. Before that
there were several USB resets. Here is an excerpt from dmesg:
[ 795.768306] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 932.976677] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 963.189438] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1000.057333] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1036.917137] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1073.782876] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1110.647786] usb 7-1: reset SuperSpeed Plus Gen 2x1 USB device number 4 using xhci_hcd
[ 1117.163693] sd 0:0:0:0: [sda] tag#0 FAILED Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK cmd_age=214s
[ 1117.163718] sd 0:0:0:0: [sda] tag#0 CDB: Write(16) 8a 00 00 00 00 00 02 72 20 00 00 00 08 00 00 00
[ 1117.163725] I/O error, dev sda, sector 41033728 op 0x1:(WRITE) flags 0x104000 phys_seg 1551 prio class 2
[ 1117.163739] bcachefs (sda1 inum 1879048481 offset 2572800): data write error: I/O
[ 1117.163763] bcachefs (sda1 inum 1879048481 offset 2576384): data write error: I/O
[ 1117.163771] bcachefs (sda1 inum 1879048481 offset 2578432): data write error: I/O
[ 1117.163779] bcachefs (sda1 inum 1879048481 offset 2580480): data write error: I/O
[ 1117.163786] bcachefs (sda1 inum 1879048481 offset 2582528): data write error: I/O
[ 1117.163794] bcachefs (sda1 inum 1879048481 offset 2584576): data write error: I/O
[ 1117.163803] bcachefs (sda1 inum 1879048481 offset 2586624): data write error: I/O
[ 1117.163811] bcachefs (sda1 inum 1879048481 offset 2588672): data write error: I/O
[ 1117.163818] bcachefs (sda1 inum 1879048481 offset 2590720): data write error: I/O
[ 1117.163824] bcachefs (sda1 inum 1879048481 offset 2592768): data write error: I/O
So even without UAS, the device does not seem to write data reliably on
Linux.
Next steps may involve looking for a firmware update for the external SSD
as well as trying to obtain its SMART status. So far I have not succeeded in
finding the right options for smartctl. In case there is enough evidence
that the device is defective I'd try to RMA it.
I will keep a copy of the kernel log and can do some further tests as time
permits. So let me know whether you need anything else, but for now
the mail is long enough as it is.
[1] https://www.kernel.org/doc/html/latest/admin-guide/reporting-issues.html
[2] How to disable USB Attached Storage (UAS)
https://leo.leung.xyz/wiki/How_to_disable_USB_Attached_Storage_(UAS)
Ciao,
--
Martin
Commit 0499a78369ad ("ARM64: Dynamically allocate cpumasks and increase
supported CPUs to 512") changed the handling of cpumasks on ARM 64bit,
what resulted in the strange issues and warnings during cpufreq-dt
initialization on some big.LITTLE platforms.
This was caused by mixing OPPs between big and LITTLE cores, because
OPP-sharing information between big and LITTLE cores is computed on
a cpumask, which in turn was not zeroed on allocation. Fix this by
switching to the zalloc_cpumask_var() call.
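A minimal sketch of the difference — a hypothetical driver fragment, the
actual fix being the one-line diff below:

    cpumask_var_t cpus;

    /* zalloc_cpumask_var() hands back a fully cleared mask ... */
    if (!zalloc_cpumask_var(&cpus, GFP_KERNEL))
        return -ENOMEM;

    cpumask_set_cpu(cpu, cpus);    /* ... so exactly one bit is set here. */

    /*
     * With plain alloc_cpumask_var() the mask contents are
     * indeterminate, so stale bits could make unrelated CPUs appear
     * to share this policy's OPP table.
     */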
Fixes: dc279ac6e5b4 ("cpufreq: dt: Refactor initialization to handle probe deferral properly")
CC: stable(a)vger.kernel.org # v5.10+
Signed-off-by: Marek Szyprowski <m.szyprowski(a)samsung.com>
---
drivers/cpufreq/cpufreq-dt.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/cpufreq/cpufreq-dt.c b/drivers/cpufreq/cpufreq-dt.c
index 8bd6e5e8f121..2d83bbc65dd0 100644
--- a/drivers/cpufreq/cpufreq-dt.c
+++ b/drivers/cpufreq/cpufreq-dt.c
@@ -208,7 +208,7 @@ static int dt_cpufreq_early_init(struct device *dev, int cpu)
if (!priv)
return -ENOMEM;
- if (!alloc_cpumask_var(&priv->cpus, GFP_KERNEL))
+ if (!zalloc_cpumask_var(&priv->cpus, GFP_KERNEL))
return -ENOMEM;
cpumask_set_cpu(cpu, priv->cpus);
--
2.34.1
It is possible to trigger the below warning message from the mass storage function:
WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
pc : usb_ep_queue+0x7c/0x104
lr : fsg_main_thread+0x494/0x1b3c
The root cause is that the mass storage function tries to queue a request
from its main thread, while another thread may have already disabled the
endpoint during function disable.
As there is no functional failure in the driver, rather than spending
effort to avoid the warning, change the WARN_ON_ONCE() in usb_ep_queue()
to a pr_debug().
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: yuan linyu <yuanlinyu(a)hihonor.com>
---
v3: add more debug info, remove two line commit description
v2: change WARN_ON_ONCE() in usb_ep_queue() to pr_debug()
https://lore.kernel.org/linux-usb/20240315013019.2711135-1-yuanlinyu@hihono…
v1: https://lore.kernel.org/linux-usb/20240314065949.2627778-1-yuanlinyu@hihono…
drivers/usb/gadget/udc/core.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 9d4150124fdb..b3a9d18a8dcd 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -292,7 +292,9 @@ int usb_ep_queue(struct usb_ep *ep,
{
int ret = 0;
- if (WARN_ON_ONCE(!ep->enabled && ep->address)) {
+ if (!ep->enabled && ep->address) {
+ pr_debug("USB gadget: queue request to disabled ep 0x%x (%s)\n",
+ ep->address, ep->name);
ret = -ESHUTDOWN;
goto out;
}
--
2.25.1
It is possible to trigger the below warning message from the mass storage function:
WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
CPU: 6 PID: 3839 Comm: file-storage Tainted: G S WC O 6.1.25-android14-11-g354e2a7e7cd9 #1
pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
pc : usb_ep_queue+0x7c/0x104
lr : fsg_main_thread+0x494/0x1b3c
The root cause is that the mass storage function tries to queue a request
from its main thread, while another thread may have already disabled the
endpoint during function disable.
As there is no functional failure in the driver, rather than spending
effort to avoid the warning, change the WARN_ON_ONCE() in usb_ep_queue()
to a pr_debug().
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: yuan linyu <yuanlinyu(a)hihonor.com>
---
v2: change WARN_ON_ONCE() in usb_ep_queue() to pr_debug()
v1: https://lore.kernel.org/linux-usb/20240314065949.2627778-1-yuanlinyu@hihono…
drivers/usb/gadget/udc/core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/udc/core.c b/drivers/usb/gadget/udc/core.c
index 9d4150124fdb..2fbe5977c11d 100644
--- a/drivers/usb/gadget/udc/core.c
+++ b/drivers/usb/gadget/udc/core.c
@@ -292,7 +292,8 @@ int usb_ep_queue(struct usb_ep *ep,
{
int ret = 0;
- if (WARN_ON_ONCE(!ep->enabled && ep->address)) {
+ if (!ep->enabled && ep->address) {
+ pr_debug("queue disabled ep %x\n", ep->address);
ret = -ESHUTDOWN;
goto out;
}
--
2.25.1
If the device is configured for system wakeup, then make sure that the
xHCI driver knows about it and make sure to permit wakeup only at the
appropriate time.
For host mode, if the controller goes through the dwc3 code path, then a
child xHCI platform device is created. Make sure the platform device
also inherits the wakeup setting for xHCI to enable remote wakeup.
For device mode, make sure to disable system wakeup if no gadget driver
is bound. We may experience unwanted system wakeup due to the wakeup
signal from the controller PMU detecting connection/disconnection when
in low power (D3). E.g. In the case of Steam Deck, the PCI PME prevents
the system staying in suspend.
Cc: stable(a)vger.kernel.org
Reported-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com>
Closes: https://lore.kernel.org/linux-usb/70a7692d-647c-9be7-00a6-06fc60f77294@igal…
Fixes: d07e8819a03d ("usb: dwc3: add xHCI Host support")
Signed-off-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
---
drivers/usb/dwc3/core.c | 2 ++
drivers/usb/dwc3/core.h | 2 ++
drivers/usb/dwc3/gadget.c | 10 ++++++++++
drivers/usb/dwc3/host.c | 11 +++++++++++
4 files changed, 25 insertions(+)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index 3e55838c0001..31684cdaaae3 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1519,6 +1519,8 @@ static void dwc3_get_properties(struct dwc3 *dwc)
else
dwc->sysdev = dwc->dev;
+ dwc->sys_wakeup = device_may_wakeup(dwc->sysdev);
+
ret = device_property_read_string(dev, "usb-psy-name", &usb_psy_name);
if (ret >= 0) {
dwc->usb_psy = power_supply_get_by_name(usb_psy_name);
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index e120611a5174..893b1e694efe 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -1132,6 +1132,7 @@ struct dwc3_scratchpad_array {
* 3 - Reserved
* @dis_metastability_quirk: set to disable metastability quirk.
* @dis_split_quirk: set to disable split boundary.
+ * @sys_wakeup: set if the device may do system wakeup.
* @wakeup_configured: set if the device is configured for remote wakeup.
* @suspended: set to track suspend event due to U3/L2.
* @imod_interval: set the interrupt moderation interval in 250ns
@@ -1355,6 +1356,7 @@ struct dwc3 {
unsigned dis_split_quirk:1;
unsigned async_callbacks:1;
+ unsigned sys_wakeup:1;
unsigned wakeup_configured:1;
unsigned suspended:1;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 28f49400f3e8..07820b1a88a2 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2968,6 +2968,9 @@ static int dwc3_gadget_start(struct usb_gadget *g,
dwc->gadget_driver = driver;
spin_unlock_irqrestore(&dwc->lock, flags);
+ if (dwc->sys_wakeup)
+ device_wakeup_enable(dwc->sysdev);
+
return 0;
}
@@ -2983,6 +2986,9 @@ static int dwc3_gadget_stop(struct usb_gadget *g)
struct dwc3 *dwc = gadget_to_dwc(g);
unsigned long flags;
+ if (dwc->sys_wakeup)
+ device_wakeup_disable(dwc->sysdev);
+
spin_lock_irqsave(&dwc->lock, flags);
dwc->gadget_driver = NULL;
dwc->max_cfg_eps = 0;
@@ -4664,6 +4670,10 @@ int dwc3_gadget_init(struct dwc3 *dwc)
else
dwc3_gadget_set_speed(dwc->gadget, dwc->maximum_speed);
+ /* No system wakeup if no gadget driver bound */
+ if (dwc->sys_wakeup)
+ device_wakeup_disable(dwc->sysdev);
+
return 0;
err5:
diff --git a/drivers/usb/dwc3/host.c b/drivers/usb/dwc3/host.c
index 43230915323c..f6a020d77fa1 100644
--- a/drivers/usb/dwc3/host.c
+++ b/drivers/usb/dwc3/host.c
@@ -123,6 +123,14 @@ int dwc3_host_init(struct dwc3 *dwc)
goto err;
}
+ if (dwc->sys_wakeup) {
+ /* Restore wakeup setting if switched from device */
+ device_wakeup_enable(dwc->sysdev);
+
+ /* Pass on wakeup setting to the new xhci platform device */
+ device_init_wakeup(&xhci->dev, true);
+ }
+
return 0;
err:
platform_device_put(xhci);
@@ -131,6 +139,9 @@ int dwc3_host_init(struct dwc3 *dwc)
void dwc3_host_exit(struct dwc3 *dwc)
{
+ if (dwc->sys_wakeup)
+ device_init_wakeup(&dwc->xhci->dev, false);
+
platform_device_unregister(dwc->xhci);
dwc->xhci = NULL;
}
base-commit: b234c70fefa7532d34ebee104de64cc16f1b21e4
--
2.28.0
This is the start of the stable review cycle for the 5.4.272 release.
There are 51 patches in this series, all of which will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri Mar 15 05:02:10 PM UTC 2024.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Andy Shevchenko (4):
serial: max310x: Use devm_clk_get_optional() to get the input clock
serial: max310x: Try to get crystal clock rate from property
serial: max310x: Make use of device properties
serial: max310x: Unprepare and disable clock in error path
Ansuel Smith (1):
regmap: allow to define reg_update_bits for no bus configuration
Arnd Bergmann (1):
y2038: rusage: use __kernel_old_timeval
Cosmin Tanislav (4):
serial: max310x: use regmap methods for SPI batch operations
serial: max310x: use a separate regmap for each port
serial: max310x: make accessing revision id interface-agnostic
serial: max310x: implement I2C support
Dexuan Cui (1):
hv_netvsc: Make netvsc/VF binding check both MAC and serial number
Edward Adam Davis (1):
net/rds: fix WARNING in rds_conn_connect_if_down
Eric Dumazet (2):
geneve: make sure to pull inner header in geneve_rx()
net/ipv6: avoid possible UAF in ip6_route_mpath_notify()
Florian Westphal (1):
netfilter: nft_ct: fix l3num expectations with inet pseudo family
Hugo Villeneuve (2):
serial: max310x: fail probe if clock crystal is unstable
serial: max310x: prevent infinite while() loop in port startup
Ingo Molnar (1):
exit: Fix typo in comment: s/sub-theads/sub-threads
Jan Kundrát (1):
serial: max310x: fix IO data corruption in batched operations
Jason Xing (12):
netrom: Fix a data-race around sysctl_netrom_default_path_quality
netrom: Fix a data-race around sysctl_netrom_obsolescence_count_initialiser
netrom: Fix data-races around sysctl_netrom_network_ttl_initialiser
netrom: Fix a data-race around sysctl_netrom_transport_timeout
netrom: Fix a data-race around sysctl_netrom_transport_maximum_tries
netrom: Fix a data-race around sysctl_netrom_transport_acknowledge_delay
netrom: Fix a data-race around sysctl_netrom_transport_busy_delay
netrom: Fix a data-race around sysctl_netrom_transport_requested_window_size
netrom: Fix a data-race around sysctl_netrom_transport_no_activity_timeout
netrom: Fix a data-race around sysctl_netrom_routing_control
netrom: Fix a data-race around sysctl_netrom_link_fails_count
netrom: Fix data-races around sysctl_net_busy_read
Johannes Berg (1):
um: allow not setting extra rpaths in the linux binary
John Efstathiades (4):
lan78xx: Fix white space and style issues
lan78xx: Add missing return code checks
lan78xx: Fix partial packet errors on suspend/resume
lan78xx: Fix race conditions in suspend/resume handling
Juhee Kang (1):
hv_netvsc: use netif_is_bond_master() instead of open code
Lena Wang (1):
netfilter: nf_conntrack_h323: Add protection for bmp length out of range
Maciej Fijalkowski (1):
ixgbe: {dis, en}able irqs in ixgbe_txrx_ring_{dis, en}able
Marek Vasut (1):
regmap: Add bulk read/write callbacks into regmap_config
Nico Pache (1):
selftests: mm: fix map_hugetlb failure on 64K page size systems
Oleg Nesterov (5):
getrusage: add the "signal_struct *sig" local variable
getrusage: move thread_group_cputime_adjusted() outside of lock_task_sighand()
getrusage: use __for_each_thread()
getrusage: use sig->stats_lock rather than lock_task_sighand()
exit: wait_task_zombie: kill the no longer necessary spin_lock_irq(siglock)
Oleksij Rempel (1):
net: lan78xx: fix runtime PM count underflow on link stop
Rand Deeb (1):
net: ice: Fix potential NULL pointer dereference in ice_bridge_setlink()
Sasha Levin (1):
Linux 5.4.272-rc1
Shradha Gupta (1):
hv_netvsc: Register VF in netvsc_probe if NET_DEVICE_REGISTER missed
Werner Sembach (1):
Input: i8042 - fix strange behavior of touchpad on Clevo NS70PU
Makefile | 4 +-
arch/alpha/kernel/osf_sys.c | 2 +-
arch/um/Kconfig | 13 +
arch/um/Makefile | 3 +-
arch/x86/Makefile.um | 2 +-
drivers/base/regmap/internal.h | 4 +
drivers/base/regmap/regmap.c | 77 +-
drivers/input/serio/i8042-x86ia64io.h | 6 +
drivers/net/ethernet/intel/ice/ice_main.c | 2 +
drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 +-
drivers/net/geneve.c | 18 +-
drivers/net/hyperv/netvsc_drv.c | 96 +-
drivers/net/usb/lan78xx.c | 910 ++++++++++++++----
drivers/tty/serial/Kconfig | 1 +
drivers/tty/serial/max310x.c | 378 ++++++--
include/linux/regmap.h | 19 +
include/uapi/linux/resource.h | 4 +-
kernel/exit.c | 12 +-
kernel/sys.c | 91 +-
net/ipv6/route.c | 21 +-
net/netfilter/nf_conntrack_h323_asn1.c | 4 +
net/netfilter/nft_ct.c | 11 +-
net/netrom/af_netrom.c | 14 +-
net/netrom/nr_dev.c | 2 +-
net/netrom/nr_in.c | 6 +-
net/netrom/nr_out.c | 2 +-
net/netrom/nr_route.c | 8 +-
net/netrom/nr_subr.c | 5 +-
net/rds/rdma.c | 3 +
net/rds/send.c | 6 +-
tools/testing/selftests/vm/map_hugetlb.c | 7 +
31 files changed, 1322 insertions(+), 465 deletions(-)
--
2.43.0
Booting v6.8 results in a hang on various IPQ5018-based boards.
Investigating the problem showed that the hang happens when the
clk_alpha_pll_stromer_plus_set_rate() function tries to write
into the PLL_MODE register of the APSS PLL.
Checking the downstream code [1] revealed that it uses the stromer-specific
operations for IPQ5018, whereas the current code uses the stromer
plus-specific operations.
The ops in the 'ipq_pll_stromer_plus' clock definition can't be
changed since that is needed for IPQ5332, so add a new alpha pll
clock declaration which uses the correct stromer ops and use this
new clock for IPQ5018 to avoid the boot failure.
1. https://git.codelinaro.org/clo/qsdk/oss/kernel/linux-ipq-5.4/-/blob/NHSS.QS…
Cc: stable(a)vger.kernel.org
Fixes: 50492f929486 ("clk: qcom: apss-ipq-pll: add support for IPQ5018")
Signed-off-by: Gabor Juhos <j4g8y7(a)gmail.com>
---
Based on v6.8.
---
drivers/clk/qcom/apss-ipq-pll.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/drivers/clk/qcom/apss-ipq-pll.c b/drivers/clk/qcom/apss-ipq-pll.c
index 678b805f13d45..11f1ae59438f7 100644
--- a/drivers/clk/qcom/apss-ipq-pll.c
+++ b/drivers/clk/qcom/apss-ipq-pll.c
@@ -55,6 +55,24 @@ static struct clk_alpha_pll ipq_pll_huayra = {
},
};
+static struct clk_alpha_pll ipq_pll_stromer = {
+ .offset = 0x0,
+ .regs = ipq_pll_offsets[CLK_ALPHA_PLL_TYPE_STROMER_PLUS],
+ .flags = SUPPORTS_DYNAMIC_UPDATE,
+ .clkr = {
+ .enable_reg = 0x0,
+ .enable_mask = BIT(0),
+ .hw.init = &(struct clk_init_data){
+ .name = "a53pll",
+ .parent_data = &(const struct clk_parent_data) {
+ .fw_name = "xo",
+ },
+ .num_parents = 1,
+ .ops = &clk_alpha_pll_stromer_ops,
+ },
+ },
+};
+
static struct clk_alpha_pll ipq_pll_stromer_plus = {
.offset = 0x0,
.regs = ipq_pll_offsets[CLK_ALPHA_PLL_TYPE_STROMER_PLUS],
@@ -145,7 +163,7 @@ struct apss_pll_data {
static const struct apss_pll_data ipq5018_pll_data = {
.pll_type = CLK_ALPHA_PLL_TYPE_STROMER_PLUS,
- .pll = &ipq_pll_stromer_plus,
+ .pll = &ipq_pll_stromer,
.pll_config = &ipq5018_pll_config,
};
---
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
change-id: 20240311-apss-ipq-pll-ipq5018-hang-74d9a8f47136
Best regards,
--
Gabor Juhos <j4g8y7(a)gmail.com>
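For readers unfamiliar with how the driver picks per-SoC PLL ops, here is a
minimal stand-alone C sketch of the shape of the fix; all structs and names
are simplified stand-ins, not the real clk framework types. Two clock
definitions share the same register layout but carry different ops tables,
and the per-SoC match data selects between them:

#include <stdio.h>

struct clk_ops { const char *name; };

static const struct clk_ops stromer_ops      = { "stromer" };
static const struct clk_ops stromer_plus_ops = { "stromer_plus" };

struct clk_alpha_pll { const struct clk_ops *ops; };

static struct clk_alpha_pll ipq_pll_stromer      = { &stromer_ops };
static struct clk_alpha_pll ipq_pll_stromer_plus = { &stromer_plus_ops };

struct apss_pll_data { struct clk_alpha_pll *pll; };

/* IPQ5018 now points at the plain stromer ops; IPQ5332 keeps plus. */
static const struct apss_pll_data ipq5018_pll_data = { &ipq_pll_stromer };
static const struct apss_pll_data ipq5332_pll_data = { &ipq_pll_stromer_plus };

int main(void)
{
	printf("ipq5018 uses %s ops\n", ipq5018_pll_data.pll->ops->name);
	printf("ipq5332 uses %s ops\n", ipq5332_pll_data.pll->ops->name);
	return 0;
}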
From: Shyam Prasad N <sprasad(a)microsoft.com>
Several users have reported this message getting dumped into the
kernel log too frequently. The likely root cause has been identified,
and it suggests that this situation is expected for some
configurations (for example SMB2.1).
Since the function returns appropriately even in such cases, it is
fairly harmless to make this a debug log. When needed, the verbosity
can be increased to capture it.
Cc: Stable <stable(a)vger.kernel.org>
Reported-by: Jan Čermák <sairon(a)sairon.cz>
Signed-off-by: Shyam Prasad N <sprasad(a)microsoft.com>
---
fs/smb/client/sess.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/smb/client/sess.c b/fs/smb/client/sess.c
index 8f37373fd333..37cdf5b55108 100644
--- a/fs/smb/client/sess.c
+++ b/fs/smb/client/sess.c
@@ -230,7 +230,7 @@ int cifs_try_adding_channels(struct cifs_ses *ses)
spin_lock(&ses->iface_lock);
if (!ses->iface_count) {
spin_unlock(&ses->iface_lock);
- cifs_dbg(VFS, "server %s does not advertise interfaces\n",
+ cifs_dbg(FYI, "server %s does not advertise interfaces\n",
ses->server->hostname);
break;
}
@@ -396,7 +396,7 @@ cifs_chan_update_iface(struct cifs_ses *ses, struct TCP_Server_Info *server)
spin_lock(&ses->iface_lock);
if (!ses->iface_count) {
spin_unlock(&ses->iface_lock);
- cifs_dbg(VFS, "server %s does not advertise interfaces\n", ses->server->hostname);
+ cifs_dbg(FYI, "server %s does not advertise interfaces\n", ses->server->hostname);
return;
}
--
2.34.1
The expression result >= 0 will be true even if usb_stor_ctrl_transfer()
returns an error code. It is necessary to compare result with
USB_STOR_XFER_GOOD.
Found by Linux Verification Center (linuxtesting.org) with Svace.
Signed-off-by: Roman Smirnov <r.smirnov(a)omp.ru>
Cc: stable(a)vger.kernel.org
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
---
drivers/usb/storage/isd200.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/storage/isd200.c b/drivers/usb/storage/isd200.c
index 300aeef160e7..2a1531793820 100644
--- a/drivers/usb/storage/isd200.c
+++ b/drivers/usb/storage/isd200.c
@@ -774,7 +774,7 @@ static int isd200_write_config( struct us_data *us )
(void *) &info->ConfigData,
sizeof(info->ConfigData));
- if (result >= 0) {
+ if (result == USB_STOR_XFER_GOOD) {
usb_stor_dbg(us, " ISD200 Config Data was written successfully\n");
} else {
usb_stor_dbg(us, " Request to write ISD200 Config Data failed!\n");
@@ -816,7 +816,7 @@ static int isd200_read_config( struct us_data *us )
sizeof(info->ConfigData));
- if (result >= 0) {
+ if (result == USB_STOR_XFER_GOOD) {
usb_stor_dbg(us, " Retrieved the following ISD200 Config Data:\n");
#ifdef CONFIG_USB_STORAGE_DEBUG
isd200_log_config(us, info);
--
2.34.1
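To illustrate why the old check passed for failed transfers, here is a
small user-space sketch; the enum values below mirror the USB_STOR_XFER_*
constants from drivers/usb/storage/usb.h (an assumption of this sketch, not
text from the patch). Every status code is a small non-negative integer, so
"result >= 0" is true even on error:

#include <stdio.h>

enum {
	USB_STOR_XFER_GOOD = 0,	/* transfer completed successfully */
	USB_STOR_XFER_SHORT,	/* transferred less than expected */
	USB_STOR_XFER_STALLED,	/* endpoint stalled */
	USB_STOR_XFER_LONG,	/* device tried to send too much */
	USB_STOR_XFER_ERROR,	/* transfer died in the middle */
};

int main(void)
{
	int result = USB_STOR_XFER_ERROR;	/* a failed transfer */

	/* The old check treats every status code as success. */
	printf("result >= 0                  -> %s\n",
	       result >= 0 ? "treated as success (wrong)" : "failure");

	/* The fixed check only accepts a clean transfer. */
	printf("result == USB_STOR_XFER_GOOD -> %s\n",
	       result == USB_STOR_XFER_GOOD ? "success" : "failure (correct)");
	return 0;
}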
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. This can break atomicity due to a mismatch between the old
file size and the new data.
To prevent the problem, let's mark the inode dirty when the
FI_ATOMIC_COMMITTED flag is set:
Atomic write thread                     Writeback thread

                                        __writeback_single_inode
                                          write_inode
                                            f2fs_update_inode
                                              - skip i_size update
f2fs_ioc_commit_atomic_write
  f2fs_commit_atomic_write
    set_inode_flag(inode, FI_ATOMIC_COMMITTED)
  f2fs_do_sync_file
    f2fs_fsync_node_pages
      - skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Reviewed-by: Yeongjin Gil <youngjin.gil(a)samsung.com>
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/f2fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 543898482f8b..a000cb024dbe 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3039,6 +3039,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode,
case FI_INLINE_DOTS:
case FI_PIN_FILE:
case FI_COMPRESS_RELEASED:
+ case FI_ATOMIC_COMMITTED:
f2fs_mark_inode_dirty_sync(inode, true);
}
}
--
2.25.1
The quilt patch titled
Subject: nilfs2: prevent kernel bug at submit_bh_wbc()
has been removed from the -mm tree. Its filename was
nilfs2-prevent-kernel-bug-at-submit_bh_wbc.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: prevent kernel bug at submit_bh_wbc()
Date: Wed, 13 Mar 2024 19:58:27 +0900
Fix a bug where nilfs_get_block() returns a successful status when
searching and inserting the specified block both fail inconsistently. If
this inconsistent behavior is not due to a previously fixed bug, then an
unexpected race is occurring, so return a temporary error -EAGAIN instead.
This prevents callers such as __block_write_begin_int() from requesting a
read into a buffer that is not mapped, which would cause the BUG_ON check
for the BH_Mapped flag in submit_bh_wbc() to fail.
Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/inode.c~nilfs2-prevent-kernel-bug-at-submit_bh_wbc
+++ a/fs/nilfs2/inode.c
@@ -112,7 +112,7 @@ int nilfs_get_block(struct inode *inode,
"%s (ino=%lu): a race condition while inserting a data block at offset=%llu",
__func__, inode->i_ino,
(unsigned long long)blkoff);
- err = 0;
+ err = -EAGAIN;
}
nilfs_transaction_abort(inode->i_sb);
goto out;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
The quilt patch titled
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
has been removed from the -mm tree. Its filename was
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
This patch was dropped because it was merged into the mm-nonmm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
Date: Wed, 13 Mar 2024 19:58:26 +0900
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()".
This resolves a kernel BUG reported by syzbot. Since there are two
flaws involved, I've made each one a separate patch.
The first patch alone resolves the syzbot-reported bug, but I think
both fixes should be sent to stable, so I've tagged them as such.
This patch (of 2):
Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data
to a nilfs2 file system whose metadata is corrupted.
There are two flaws involved in this issue.
The first flaw is that when nilfs_get_block() locates a data block using
btree or direct mapping, if the disk address translation routine
nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata
corruption, it can be passed back to nilfs_get_block(). This causes
nilfs_get_block() to misidentify an existing block as non-existent,
causing both data block lookup and insertion to fail inconsistently.
The second flaw is that nilfs_get_block() returns a successful status in
this inconsistent state. This causes the caller __block_write_begin_int()
or others to request a read even though the buffer is not mapped,
resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc()
failing.
This fixes the first issue by changing the return value to code -EINVAL
when a conversion using DAT fails with code -ENOENT, avoiding the
conflicting condition that leads to the kernel bug described above. Here,
code -EINVAL indicates that metadata corruption was detected during the
block lookup, which will be properly handled as a file system error and
converted to -EIO when passing through the nilfs2 bmap layer.
Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com
Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+cfed5b56649bddf80d6e(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/btree.c | 9 +++++++--
fs/nilfs2/direct.c | 9 +++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/btree.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/btree.c
@@ -724,7 +724,7 @@ static int nilfs_btree_lookup_contig(con
dat = nilfs_bmap_get_dat(btree);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr = blocknr;
}
cnt = 1;
@@ -743,7 +743,7 @@ static int nilfs_btree_lookup_contig(con
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt || ++cnt == maxblocks)
@@ -781,6 +781,11 @@ static int nilfs_btree_lookup_contig(con
out:
nilfs_btree_free_path(path);
return ret;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ goto out;
}
static void nilfs_btree_promote_key(struct nilfs_bmap *btree,
--- a/fs/nilfs2/direct.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/direct.c
@@ -66,7 +66,7 @@ static int nilfs_direct_lookup_contig(co
dat = nilfs_bmap_get_dat(direct);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr = blocknr;
}
@@ -79,7 +79,7 @@ static int nilfs_direct_lookup_contig(co
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt)
@@ -87,6 +87,11 @@ static int nilfs_direct_lookup_contig(co
}
*ptrp = ptr;
return cnt;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ return ret;
}
static __u64
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
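The error-remapping idea above can be shown with a tiny stand-alone sketch
(stub function and illustrative names, not the nilfs2 code itself): an
internal -ENOENT from DAT translation is converted to -EINVAL so the caller
treats it as metadata corruption rather than a genuinely absent block.

#include <errno.h>
#include <stdio.h>
#include <string.h>

static int nilfs_dat_translate_stub(void)
{
	return -ENOENT;		/* pretend the DAT entry is corrupted */
}

static int lookup_block(void)
{
	int ret = nilfs_dat_translate_stub();

	if (ret < 0) {
		if (ret == -ENOENT)
			ret = -EINVAL;	/* metadata corruption, not "absent" */
		return ret;
	}
	return 0;
}

int main(void)
{
	int ret = lookup_block();

	printf("lookup_block() -> %d (%s)\n", ret, strerror(-ret));
	return 0;
}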
It is possible to trigger the warning message below from the mass storage function:
------------[ cut here ]------------
WARNING: CPU: 6 PID: 3839 at drivers/usb/gadget/udc/core.c:294 usb_ep_queue+0x7c/0x104
CPU: 6 PID: 3839 Comm: file-storage Tainted: G S WC O 6.1.25-android14-11-g354e2a7e7cd9 #1
pstate: 22400005 (nzCv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
pc : usb_ep_queue+0x7c/0x104
lr : fsg_main_thread+0x494/0x1b3c
The root cause is that the mass storage function tries to queue a request
from its main thread, while another thread may have already disabled the
endpoint during function disable.
As the mass storage function keeps a record of the endpoint enable/disable
state, add a state check before queueing a request to the UDC, which
avoids the warning.
Also use a common lock to protect the endpoint state, which avoids the
race between the main thread and function disable.
Cc: <stable(a)vger.kernel.org> # 6.1
Signed-off-by: yuan linyu <yuanlinyu(a)hihonor.com>
---
drivers/usb/gadget/function/f_mass_storage.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
index c265a1f62fc1..056083cb68cb 100644
--- a/drivers/usb/gadget/function/f_mass_storage.c
+++ b/drivers/usb/gadget/function/f_mass_storage.c
@@ -520,12 +520,25 @@ static int fsg_setup(struct usb_function *f,
static int start_transfer(struct fsg_dev *fsg, struct usb_ep *ep,
struct usb_request *req)
{
+ unsigned long flags;
int rc;
- if (ep == fsg->bulk_in)
+ spin_lock_irqsave(&fsg->common->lock, flags);
+ if (ep == fsg->bulk_in) {
+ if (!fsg->bulk_in_enabled) {
+ spin_unlock_irqrestore(&fsg->common->lock, flags);
+ return -ESHUTDOWN;
+ }
dump_msg(fsg, "bulk-in", req->buf, req->length);
+ } else {
+ if (!fsg->bulk_out_enabled) {
+ spin_unlock_irqrestore(&fsg->common->lock, flags);
+ return -ESHUTDOWN;
+ }
+ }
rc = usb_ep_queue(ep, req, GFP_KERNEL);
+ spin_unlock_irqrestore(&fsg->common->lock, flags);
if (rc) {
/* We can't do much more than wait for a reset */
@@ -2406,8 +2419,10 @@ static int fsg_set_alt(struct usb_function *f, unsigned intf, unsigned alt)
static void fsg_disable(struct usb_function *f)
{
struct fsg_dev *fsg = fsg_from_func(f);
+ unsigned long flags;
/* Disable the endpoints */
+ spin_lock_irqsave(&fsg->common->lock, flags);
if (fsg->bulk_in_enabled) {
usb_ep_disable(fsg->bulk_in);
fsg->bulk_in_enabled = 0;
@@ -2416,6 +2431,7 @@ static void fsg_disable(struct usb_function *f)
usb_ep_disable(fsg->bulk_out);
fsg->bulk_out_enabled = 0;
}
+ spin_unlock_irqrestore(&fsg->common->lock, flags);
__raise_exception(fsg->common, FSG_STATE_CONFIG_CHANGE, NULL);
}
--
2.25.1
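Below is a minimal user-space sketch of the locking pattern the patch
applies, using a pthread mutex in place of the spinlock (names are
illustrative, not the driver's): the enabled flag is re-checked under the
same lock that the disable path takes, so a concurrent disable cannot slip
in between the check and the queue operation.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static bool bulk_in_enabled = true;

static int start_transfer(void)
{
	int rc;

	pthread_mutex_lock(&lock);
	if (!bulk_in_enabled) {		/* ep already disabled */
		pthread_mutex_unlock(&lock);
		return -1;		/* -ESHUTDOWN in the driver */
	}
	rc = 0;				/* usb_ep_queue() would run here */
	pthread_mutex_unlock(&lock);
	return rc;
}

static void disable_function(void)
{
	pthread_mutex_lock(&lock);
	bulk_in_enabled = false;	/* usb_ep_disable() would run here */
	pthread_mutex_unlock(&lock);
}

int main(void)
{
	disable_function();
	printf("start_transfer() -> %d (expect -1)\n", start_transfer());
	return 0;
}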
From: Hans de Goede <hdegoede(a)redhat.com>
commit aec7d25b497ce4a8d044e9496de0aa433f7f8f06 upstream.
On Goldmont p2sb_bar() only ever gets called for 2 devices, the actual P2SB
devfn 13,0 and the SPI controller which is part of the P2SB, devfn 13,2.
But the current p2sb code tries to cache BAR0 info for all of
devfn 13,0 to 13,7. This involves calling pci_scan_single_device()
for device 13 functions 0-7 and the hw does not seem to like
pci_scan_single_device() getting called for some of the other hidden
devices. E.g. on an ASUS VivoBook D540NV-GQ065T this leads to continuous
ACPI errors leading to high CPU usage.
Fix this by only caching BAR0 info and thus only calling
pci_scan_single_device() for the P2SB and the SPI controller.
Fixes: 5913320eb0b3 ("platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe")
Reported-by: Danil Rybakov <danilrybakov249(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218531
Tested-by: Danil Rybakov <danilrybakov249(a)gmail.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240304134356.305375-2-hdegoede@redhat.com
Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu(a)toshiba.co.jp>
---
drivers/platform/x86/p2sb.c | 25 +++++++++----------------
1 file changed, 9 insertions(+), 16 deletions(-)
diff --git a/drivers/platform/x86/p2sb.c b/drivers/platform/x86/p2sb.c
index 17cc4b45e023..a64f56ddd4a4 100644
--- a/drivers/platform/x86/p2sb.c
+++ b/drivers/platform/x86/p2sb.c
@@ -20,9 +20,11 @@
#define P2SBC_HIDE BIT(8)
#define P2SB_DEVFN_DEFAULT PCI_DEVFN(31, 1)
+#define P2SB_DEVFN_GOLDMONT PCI_DEVFN(13, 0)
+#define SPI_DEVFN_GOLDMONT PCI_DEVFN(13, 2)
static const struct x86_cpu_id p2sb_cpu_ids[] = {
- X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, PCI_DEVFN(13, 0)),
+ X86_MATCH_INTEL_FAM6_MODEL(ATOM_GOLDMONT, P2SB_DEVFN_GOLDMONT),
{}
};
@@ -98,21 +100,12 @@ static void p2sb_scan_and_cache_devfn(struct pci_bus *bus, unsigned int devfn)
static int p2sb_scan_and_cache(struct pci_bus *bus, unsigned int devfn)
{
- unsigned int slot, fn;
-
- if (PCI_FUNC(devfn) == 0) {
- /*
- * When function number of the P2SB device is zero, scan it and
- * other function numbers, and if devices are available, cache
- * their BAR0s.
- */
- slot = PCI_SLOT(devfn);
- for (fn = 0; fn < NR_P2SB_RES_CACHE; fn++)
- p2sb_scan_and_cache_devfn(bus, PCI_DEVFN(slot, fn));
- } else {
- /* Scan the P2SB device and cache its BAR0 */
- p2sb_scan_and_cache_devfn(bus, devfn);
- }
+ /* Scan the P2SB device and cache its BAR0 */
+ p2sb_scan_and_cache_devfn(bus, devfn);
+
+ /* On Goldmont p2sb_bar() also gets called for the SPI controller */
+ if (devfn == P2SB_DEVFN_GOLDMONT)
+ p2sb_scan_and_cache_devfn(bus, SPI_DEVFN_GOLDMONT);
if (!p2sb_valid_resource(&p2sb_resources[PCI_FUNC(devfn)].res))
return -ENOENT;
--
2.43.0
stop_qc_helper() is called while a spinlock is held. cancel_work_sync()
may sleep. Sleeping in atomic sections is not allowed. Hence change the
cancel_work_sync() call into a cancel_work() call.
Cc: Douglas Gilbert <dgilbert(a)interlog.com>
Cc: John Garry <john.g.garry(a)oracle.com>
Fixes: 1107c7b24ee3 ("scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bart Van Assche <bvanassche(a)acm.org>
---
drivers/scsi/scsi_debug.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 36368c71221b..7a0b7402b715 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -5648,7 +5648,7 @@ static void scsi_debug_slave_destroy(struct scsi_device *sdp)
sdp->hostdata = NULL;
}
-/* Returns true if we require the queued memory to be freed by the caller. */
+/* Returns true if the queued command memory should be freed by the caller. */
static bool stop_qc_helper(struct sdebug_defer *sd_dp,
enum sdeb_defer_type defer_t)
{
@@ -5664,11 +5664,8 @@ static bool stop_qc_helper(struct sdebug_defer *sd_dp,
return true;
}
} else if (defer_t == SDEB_DEFER_WQ) {
- /* Cancel if pending */
- if (cancel_work_sync(&sd_dp->ew.work))
- return true;
- /* Was not pending, so it must have run */
- return false;
+ /* The caller must free qcmd if cancellation succeeds. */
+ return cancel_work(&sd_dp->ew.work);
} else if (defer_t == SDEB_DEFER_POLL) {
return true;
}
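As a rough sketch of the semantic difference the patch relies on (a
hypothetical stand-in, not the workqueue API): cancel_work() never sleeps
and returns true only when it cancelled a still-pending item, while
cancel_work_sync() may additionally sleep waiting for a running item to
finish, which is what made it unsafe under the spinlock.

#include <stdbool.h>
#include <stdio.h>

struct work { bool pending; };

/* Non-sleeping cancel: succeeds only if the work had not started yet. */
static bool cancel_work_stub(struct work *w)
{
	bool was_pending = w->pending;

	w->pending = false;
	return was_pending;	/* true -> caller must free the command */
}

int main(void)
{
	struct work w = { .pending = true };

	printf("first cancel:  %d (pending work cancelled, caller frees)\n",
	       cancel_work_stub(&w));
	printf("second cancel: %d (already ran, handler owned the free)\n",
	       cancel_work_stub(&w));
	return 0;
}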
From: Johannes Berg <johannes.berg(a)intel.com>
Wireless extensions are already disabled if MLO is enabled,
given that we cannot support MLO there with all the hard-
coded assumptions about BSSID etc.
However, the WiFi7 ecosystem is still stabilizing, and some
devices may need MLO disabled while that happens. In that
case, we might end up with a device that supports wext (but
not MLO) in one kernel, and then breaks wext in the future
(by enabling MLO), which is not desirable.
Add a flag to let such drivers/devices disable wext even if
MLO isn't yet enabled.
Cc: stable(a)vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
---
include/net/cfg80211.h | 2 ++
net/wireless/wext-core.c | 7 +++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/include/net/cfg80211.h b/include/net/cfg80211.h
index 2e2be4fd2bb6..1e09329acc42 100644
--- a/include/net/cfg80211.h
+++ b/include/net/cfg80211.h
@@ -4991,6 +4991,7 @@ struct cfg80211_ops {
* set this flag to update channels on beacon hints.
* @WIPHY_FLAG_SUPPORTS_NSTR_NONPRIMARY: support connection to non-primary link
* of an NSTR mobile AP MLD.
+ * @WIPHY_FLAG_DISABLE_WEXT: disable wireless extensions for this device
*/
enum wiphy_flags {
WIPHY_FLAG_SUPPORTS_EXT_KEK_KCK = BIT(0),
@@ -5002,6 +5003,7 @@ enum wiphy_flags {
WIPHY_FLAG_4ADDR_STATION = BIT(6),
WIPHY_FLAG_CONTROL_PORT_PROTOCOL = BIT(7),
WIPHY_FLAG_IBSS_RSN = BIT(8),
+ WIPHY_FLAG_DISABLE_WEXT = BIT(9),
WIPHY_FLAG_MESH_AUTH = BIT(10),
WIPHY_FLAG_SUPPORTS_EXT_KCK_32 = BIT(11),
WIPHY_FLAG_SUPPORTS_NSTR_NONPRIMARY = BIT(12),
diff --git a/net/wireless/wext-core.c b/net/wireless/wext-core.c
index a161c64d1765..838ad6541a17 100644
--- a/net/wireless/wext-core.c
+++ b/net/wireless/wext-core.c
@@ -4,6 +4,7 @@
* Authors : Jean Tourrilhes - HPL - <jt(a)hpl.hp.com>
* Copyright (c) 1997-2007 Jean Tourrilhes, All Rights Reserved.
* Copyright 2009 Johannes Berg <johannes(a)sipsolutions.net>
+ * Copyright (C) 2024 Intel Corporation
*
* (As all part of the Linux kernel, this file is GPL)
*/
@@ -662,7 +663,8 @@ struct iw_statistics *get_wireless_stats(struct net_device *dev)
dev->ieee80211_ptr->wiphy->wext &&
dev->ieee80211_ptr->wiphy->wext->get_wireless_stats) {
wireless_warn_cfg80211_wext();
- if (dev->ieee80211_ptr->wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO)
+ if (dev->ieee80211_ptr->wiphy->flags & (WIPHY_FLAG_SUPPORTS_MLO |
+ WIPHY_FLAG_DISABLE_WEXT))
return NULL;
return dev->ieee80211_ptr->wiphy->wext->get_wireless_stats(dev);
}
@@ -704,7 +706,8 @@ static iw_handler get_handler(struct net_device *dev, unsigned int cmd)
#ifdef CONFIG_CFG80211_WEXT
if (dev->ieee80211_ptr && dev->ieee80211_ptr->wiphy) {
wireless_warn_cfg80211_wext();
- if (dev->ieee80211_ptr->wiphy->flags & WIPHY_FLAG_SUPPORTS_MLO)
+ if (dev->ieee80211_ptr->wiphy->flags & (WIPHY_FLAG_SUPPORTS_MLO |
+ WIPHY_FLAG_DISABLE_WEXT))
return NULL;
handlers = dev->ieee80211_ptr->wiphy->wext;
}
--
2.44.0
From: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index e524e1fc53fa3..8325875bfc4ed 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -7238,7 +7238,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
--
2.43.0
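The control-flow fix can be reduced to a stand-alone sketch (stub names are
illustrative): without an explicit "return 0;" before the label, the
successful path falls straight through into the diagnostic-reset code.

#include <stdio.h>

static int diag_reset_done;

static int base_diag_reset(void)
{
	diag_reset_done = 1;	/* records that a reset was issued */
	return 0;
}

static int wait_for_iocstate(int ready)
{
	if (!ready)
		goto issue_diag_reset;

	return 0;	/* the fix: don't fall through when already ready */

issue_diag_reset:
	return base_diag_reset();
}

int main(void)
{
	wait_for_iocstate(1);	/* controller became ready in time */
	printf("diag reset issued: %d (expect 0)\n", diag_reset_done);
	return 0;
}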
Make sure to put the runtime PM usage count (and suspend) also when
receiving a disconnect event while in the ST_MAINLINK_READY state.
This specifically avoids leaking a runtime PM usage count on every
disconnect with display servers that do not automatically enable
external displays when receiving a hotplug notification.
Fixes: 5814b8bf086a ("drm/msm/dp: incorporate pm_runtime framework into DP driver")
Cc: stable(a)vger.kernel.org # 6.8
Cc: Kuogee Hsieh <quic_khsieh(a)quicinc.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/gpu/drm/msm/dp/dp_display.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/msm/dp/dp_display.c b/drivers/gpu/drm/msm/dp/dp_display.c
index 4c72124ffb5d..8e8cf531da45 100644
--- a/drivers/gpu/drm/msm/dp/dp_display.c
+++ b/drivers/gpu/drm/msm/dp/dp_display.c
@@ -655,6 +655,7 @@ static int dp_hpd_unplug_handle(struct dp_display_private *dp, u32 data)
dp_display_host_phy_exit(dp);
dp->hpd_state = ST_DISCONNECTED;
dp_display_notify_disconnect(&dp->dp_display.pdev->dev);
+ pm_runtime_put_sync(&pdev->dev);
mutex_unlock(&dp->event_mutex);
return 0;
}
--
2.43.2
The quilt patch titled
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
has been removed from the -mm tree. Its filename was
memtest-use-readwrite_once-in-memory-scanning.patch
This patch was dropped because it was merged into the mm-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Qiang Zhang <qiang4.zhang(a)intel.com>
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
Date: Tue, 12 Mar 2024 16:04:23 +0800
memtest failed to find bad memory when compiled with clang. So use
{WRITE,READ}_ONCE to access memory, keeping the compiler from
over-optimizing the accesses away.
Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang(a)intel.com>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memtest.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memtest.c~memtest-use-readwrite_once-in-memory-scanning
+++ a/mm/memtest.c
@@ -51,10 +51,10 @@ static void __init memtest(u64 pattern,
last_bad = 0;
for (p = start; p < end; p++)
- *p = pattern;
+ WRITE_ONCE(*p, pattern);
for (p = start; p < end; p++, start_phys_aligned += incr) {
- if (*p == pattern)
+ if (READ_ONCE(*p) == pattern)
continue;
if (start_phys_aligned == last_bad + incr) {
last_bad += incr;
_
Patches currently in -mm which might be from qiang4.zhang(a)intel.com are
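A stand-alone sketch of the failure mode, assuming the usual volatile-based
one-line definitions of READ_ONCE()/WRITE_ONCE() for scalar types: without
the forced accesses, the compiler is free to fuse the write and read loops
or elide a read it can prove redundant, which is what hid real bad memory.

#include <stdio.h>
#include <stdint.h>

#define WRITE_ONCE(x, val) (*(volatile typeof(x) *)&(x) = (val))
#define READ_ONCE(x)       (*(volatile typeof(x) *)&(x))

static uint64_t cell;		/* stands in for one word of tested RAM */

int main(void)
{
	uint64_t pattern = 0x5555555555555555ULL;

	WRITE_ONCE(cell, pattern);		/* forced store */
	if (READ_ONCE(cell) != pattern)		/* forced load */
		printf("bad memory at %p\n", (void *)&cell);
	else
		printf("pattern survived\n");
	return 0;
}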
The patch titled
Subject: nilfs2: prevent kernel bug at submit_bh_wbc()
has been added to the -mm mm-nonmm-unstable branch. Its filename is
nilfs2-prevent-kernel-bug-at-submit_bh_wbc.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: prevent kernel bug at submit_bh_wbc()
Date: Wed, 13 Mar 2024 19:58:27 +0900
Fix a bug where nilfs_get_block() returns a successful status when
searching and inserting the specified block both fail inconsistently. If
this inconsistent behavior is not due to a previously fixed bug, then an
unexpected race is occurring, so return a temporary error -EAGAIN instead.
This prevents callers such as __block_write_begin_int() from requesting a
read into a buffer that is not mapped, which would cause the BUG_ON check
for the BH_Mapped flag in submit_bh_wbc() to fail.
Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com
Fixes: 1f5abe7e7dbc ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/inode.c~nilfs2-prevent-kernel-bug-at-submit_bh_wbc
+++ a/fs/nilfs2/inode.c
@@ -112,7 +112,7 @@ int nilfs_get_block(struct inode *inode,
"%s (ino=%lu): a race condition while inserting a data block at offset=%llu",
__func__, inode->i_ino,
(unsigned long long)blkoff);
- err = 0;
+ err = -EAGAIN;
}
nilfs_transaction_abort(inode->i_sb);
goto out;
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
nilfs2-prevent-kernel-bug-at-submit_bh_wbc.patch
The patch titled
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
has been added to the -mm mm-nonmm-unstable branch. Its filename is
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-nonmm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix failure to detect DAT corruption in btree and direct mappings
Date: Wed, 13 Mar 2024 19:58:26 +0900
Patch series "nilfs2: fix kernel bug at submit_bh_wbc()".
This resolves a kernel BUG reported by syzbot. Since there are two
flaws involved, I've made each one a separate patch.
The first patch alone resolves the syzbot-reported bug, but I think
both fixes should be sent to stable, so I've tagged them as such.
This patch (of 2):
Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data
to a nilfs2 file system whose metadata is corrupted.
There are two flaws involved in this issue.
The first flaw is that when nilfs_get_block() locates a data block using
btree or direct mapping, if the disk address translation routine
nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata
corruption, it can be passed back to nilfs_get_block(). This causes
nilfs_get_block() to misidentify an existing block as non-existent,
causing both data block lookup and insertion to fail inconsistently.
The second flaw is that nilfs_get_block() returns a successful status in
this inconsistent state. This causes the caller __block_write_begin_int()
or others to request a read even though the buffer is not mapped,
resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc()
failing.
This fixes the first issue by changing the return value to code -EINVAL
when a conversion using DAT fails with code -ENOENT, avoiding the
conflicting condition that leads to the kernel bug described above. Here,
code -EINVAL indicates that metadata corruption was detected during the
block lookup, which will be properly handled as a file system error and
converted to -EIO when passing through the nilfs2 bmap layer.
Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com
Fixes: c3a7abf06ce7 ("nilfs2: support contiguous lookup of blocks")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+cfed5b56649bddf80d6e(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e
Tested-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/btree.c | 9 +++++++--
fs/nilfs2/direct.c | 9 +++++++--
2 files changed, 14 insertions(+), 4 deletions(-)
--- a/fs/nilfs2/btree.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/btree.c
@@ -724,7 +724,7 @@ static int nilfs_btree_lookup_contig(con
dat = nilfs_bmap_get_dat(btree);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr = blocknr;
}
cnt = 1;
@@ -743,7 +743,7 @@ static int nilfs_btree_lookup_contig(con
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- goto out;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt || ++cnt == maxblocks)
@@ -781,6 +781,11 @@ static int nilfs_btree_lookup_contig(con
out:
nilfs_btree_free_path(path);
return ret;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ goto out;
}
static void nilfs_btree_promote_key(struct nilfs_bmap *btree,
--- a/fs/nilfs2/direct.c~nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings
+++ a/fs/nilfs2/direct.c
@@ -66,7 +66,7 @@ static int nilfs_direct_lookup_contig(co
dat = nilfs_bmap_get_dat(direct);
ret = nilfs_dat_translate(dat, ptr, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr = blocknr;
}
@@ -79,7 +79,7 @@ static int nilfs_direct_lookup_contig(co
if (dat) {
ret = nilfs_dat_translate(dat, ptr2, &blocknr);
if (ret < 0)
- return ret;
+ goto dat_error;
ptr2 = blocknr;
}
if (ptr2 != ptr + cnt)
@@ -87,6 +87,11 @@ static int nilfs_direct_lookup_contig(co
}
*ptrp = ptr;
return cnt;
+
+ dat_error:
+ if (ret == -ENOENT)
+ ret = -EINVAL; /* Notify bmap layer of metadata corruption */
+ return ret;
}
static __u64
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-failure-to-detect-dat-corruption-in-btree-and-direct-mappings.patch
nilfs2-prevent-kernel-bug-at-submit_bh_wbc.patch
Boris managed to create a device capable of changing its maj:min without
altering its device path.
Only multi-device filesystems can be scanned. A device that gets scanned
and remains in the Btrfs kernel cache might end up with an incorrect
maj:min.
Although the tempfsid feature patch did not introduce this bug, it could
lead to issues if the above multi-device filesystem is converted to a
single device with a stale maj:min. Subsequently, attempting to mount the
same device with the correct maj:min might mistake it for another device
with the same fsid, potentially resulting in wrongly auto-enabling the
tempfsid feature.
To address this, this patch validates the device's maj:min at the time of
device open and updates it if it has changed since the last scan.
CC: stable(a)vger.kernel.org # 6.7+
Fixes: a5b8a5f9f835 ("btrfs: support cloned-device mount capability")
Reported-by: Boris Burkov <boris(a)bur.io>
Co-developed-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: Anand Jain <anand.jain(a)oracle.com>
---
v2:
Drop using lookup_bdev() instead, get it from device->bdev->bd_dev.
v1:
https://lore.kernel.org/linux-btrfs/752b8526be21d984e0ee58c7f66d312664ff5ac…
fs/btrfs/volumes.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e49935a54da0..c318640b4472 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -692,6 +692,16 @@ static int btrfs_open_one_device(struct btrfs_fs_devices *fs_devices,
device->bdev = bdev_handle->bdev;
clear_bit(BTRFS_DEV_STATE_IN_FS_METADATA, &device->dev_state);
+ if (device->devt != device->bdev->bd_dev) {
+ btrfs_warn(NULL,
+ "device %s maj:min changed from %d:%d to %d:%d",
+ device->name->str, MAJOR(device->devt),
+ MINOR(device->devt), MAJOR(device->bdev->bd_dev),
+ MINOR(device->bdev->bd_dev));
+
+ device->devt = device->bdev->bd_dev;
+ }
+
fs_devices->open_devices++;
if (test_bit(BTRFS_DEV_STATE_WRITEABLE, &device->dev_state) &&
device->devid != BTRFS_DEV_REPLACE_DEVID) {
--
2.38.1
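A hypothetical user-space sketch of the devt re-validation (simplified
struct and illustrative names, not the btrfs types): compare the cached
maj:min against what the freshly opened block device reports, and refresh
the cache when they diverge.

#include <stdio.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

struct btrfs_device { dev_t devt; };

static void validate_devt(struct btrfs_device *device, dev_t bd_dev)
{
	if (device->devt != bd_dev) {
		printf("device maj:min changed from %u:%u to %u:%u\n",
		       major(device->devt), minor(device->devt),
		       major(bd_dev), minor(bd_dev));
		device->devt = bd_dev;	/* refresh the stale cache */
	}
}

int main(void)
{
	struct btrfs_device dev = { .devt = makedev(259, 0) };

	validate_devt(&dev, makedev(259, 2));	/* simulated renumbering */
	return 0;
}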
The current code in pcie_wait_for_link_delay() handles the value
returned by pcie_failed_link_retrain() as an integer, expecting 0
when the link has been successfully retrained. The issue is that
pcie_failed_link_retrain() returns a boolean: "true" if the link
has been successfully retrained and "false" otherwise. This leads
pcie_wait_for_link_delay() to return an incorrect "active link"
status when pcie_failed_link_retrain() is called.
This patch fixes the check of the value returned by
pcie_failed_link_retrain() in pcie_wait_for_link_delay().
Note that this bug induces abnormal timeout delays when a PCI device
is unplugged (around 60 seconds per bridge / secondary bus removed).
Cc: stable(a)vger.kernel.org
Fixes: 1abb47390350 ("Merge branch 'pci/enumeration'")
Signed-off-by: Simon Guinot <simon.guinot(a)seagate.com>
---
drivers/pci/pci.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ccee56615f78..7ec91b4c5d03 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -5101,9 +5101,7 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
msleep(20);
rc = pcie_wait_for_link_status(pdev, false, active);
if (active) {
- if (rc)
- rc = pcie_failed_link_retrain(pdev);
- if (rc)
+ if (rc && !pcie_failed_link_retrain(pdev))
return false;
msleep(delay);
--
2.43.0
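The inverted check can be reproduced with a few lines of stand-alone C
(stub functions and values, not the PCI core): pcie_failed_link_retrain()
returns true on a *successful* retrain, so treating any non-zero value as
"link still down" makes every successful retrain look like a failure.

#include <stdbool.h>
#include <stdio.h>

/* true means the retrain *succeeded* -- that is the whole problem. */
static bool pcie_failed_link_retrain(void)
{
	return true;
}

static bool old_check(int rc)
{
	if (rc)
		rc = pcie_failed_link_retrain();	/* true -> rc != 0 */
	if (rc)
		return false;	/* wrongly reports "link down" */
	return true;
}

static bool new_check(int rc)
{
	if (rc && !pcie_failed_link_retrain())
		return false;	/* only fail when the retrain also failed */
	return true;
}

int main(void)
{
	int rc = -1;	/* pcie_wait_for_link_status() timed out */

	printf("old logic says link active: %d (wrong)\n", old_check(rc));
	printf("new logic says link active: %d (correct)\n", new_check(rc));
	return 0;
}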
arch/powerpc/lib/xor_vmx.o is built with '-msoft-float' (from the main
powerpc Makefile) and '-maltivec' (from its CFLAGS), which causes an
error when building with clang after a recent change in main:
error: option '-msoft-float' cannot be specified with '-maltivec'
make[6]: *** [scripts/Makefile.build:243: arch/powerpc/lib/xor_vmx.o] Error 1
Explicitly add '-mhard-float' before '-maltivec' in xor_vmx.o's CFLAGS
to override the previous inclusion of '-msoft-float' (as the last option
wins), which matches how other areas of the kernel use '-maltivec', such
as AMDGPU.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/1986
Link: https://github.com/llvm/llvm-project/commit/4792f912b232141ecba4cbae538873b…
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
arch/powerpc/lib/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile
index 6eac63e79a89..0ab65eeb93ee 100644
--- a/arch/powerpc/lib/Makefile
+++ b/arch/powerpc/lib/Makefile
@@ -76,7 +76,7 @@ obj-$(CONFIG_PPC_LIB_RHEAP) += rheap.o
obj-$(CONFIG_FTR_FIXUP_SELFTEST) += feature-fixups-test.o
obj-$(CONFIG_ALTIVEC) += xor_vmx.o xor_vmx_glue.o
-CFLAGS_xor_vmx.o += -maltivec $(call cc-option,-mabi=altivec)
+CFLAGS_xor_vmx.o += -mhard-float -maltivec $(call cc-option,-mabi=altivec)
# Enable <altivec.h>
CFLAGS_xor_vmx.o += -isystem $(shell $(CC) -print-file-name=include)
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240127-ppc-xor_vmx-drop-msoft-float-ad68b437f86c
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
Without the device table the driver will not auto-load when compiled as
a module.
Fixes: 273db8f03509 ("Input: add IOC3 serio driver")
Cc: stable(a)vger.kernel.org
Signed-off-by: Karel Balej <balejk(a)matfyz.cz>
---
I do not own any device using this driver, but I verified that modinfo
does not show the "platform:ioc3-kbd" alias when run on the module
compiled without this patch and it does with it.
drivers/input/serio/ioc3kbd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/input/serio/ioc3kbd.c b/drivers/input/serio/ioc3kbd.c
index 50552dc7b4f5..676b0bda3d72 100644
--- a/drivers/input/serio/ioc3kbd.c
+++ b/drivers/input/serio/ioc3kbd.c
@@ -200,9 +200,16 @@ static void ioc3kbd_remove(struct platform_device *pdev)
serio_unregister_port(d->aux);
}
+static const struct platform_device_id ioc3kbd_id_table[] = {
+ { "ioc3-kbd", },
+ { }
+};
+MODULE_DEVICE_TABLE(platform, ioc3kbd_id_table);
+
static struct platform_driver ioc3kbd_driver = {
.probe = ioc3kbd_probe,
.remove_new = ioc3kbd_remove,
+ .id_table = ioc3kbd_id_table,
.driver = {
.name = "ioc3-kbd",
},
--
2.44.0
From: Zi Yan <ziy(a)nvidia.com>
The tail pages in a THP can have swap entry information stored in their
private field. When migrating to a new page, all tail pages of the new
page need to update ->private to avoid future data corruption.
This fix is stable-only, since after commit 07e09c483cbe ("mm/huge_memory:
work on folio->swap instead of page->private when splitting folio"),
subpages of a swapcached THP no longer require this maintenance.
Adding THPs to the swapcache was introduced in commit
38d8b4e6bdc87 ("mm, THP, swap: delay splitting THP during swap out"),
where each subpage of a THP added to the swapcache had its own swapcache
entry and required the ->private field to point to the correct swapcache
entry. Later, when THP migration functionality was implemented in commit
616b8371539a6 ("mm: thp: enable thp migration in generic path"),
it initially did not handle the subpages of swapcached THPs, failing to
update their ->private fields or replace the subpage pointers in the
swapcache. Subsequently, commit e71769ae5260 ("mm: enable thp migration
for shmem thp") addressed the swapcache update aspect. This patch fixes
the update of subpage ->private fields.
Closes: https://lore.kernel.org/linux-mm/1707814102-22682-1-git-send-email-quic_cha…
Fixes: 616b8371539a ("mm: thp: enable thp migration in generic path")
Signed-off-by: Zi Yan <ziy(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
---
mm/migrate.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index c93dd6a31c31..c5968021fde0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -423,8 +423,12 @@ int folio_migrate_mapping(struct address_space *mapping,
if (folio_test_swapbacked(folio)) {
__folio_set_swapbacked(newfolio);
if (folio_test_swapcache(folio)) {
+ int i;
+
folio_set_swapcache(newfolio);
- newfolio->private = folio_get_private(folio);
+ for (i = 0; i < nr; i++)
+ set_page_private(folio_page(newfolio, i),
+ page_private(folio_page(folio, i)));
}
entries = nr;
} else {
--
2.43.0
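As a simplified illustration (a hypothetical struct page carrying only a
private field, not the kernel's), the difference between the old and the
fixed behavior is copying the swap entry for the head page only versus for
every subpage:

#include <stdio.h>

#define NR_SUBPAGES 4

struct page { unsigned long private; };	/* holds the swap entry */

int main(void)
{
	struct page oldp[NR_SUBPAGES] = { {10}, {11}, {12}, {13} };
	struct page newp[NR_SUBPAGES] = { {0} };
	int i;

	/* Old behavior: only the head page's swap entry is carried over. */
	newp[0].private = oldp[0].private;

	/* Fixed behavior: every subpage keeps its own swap entry. */
	for (i = 0; i < NR_SUBPAGES; i++)
		newp[i].private = oldp[i].private;

	for (i = 0; i < NR_SUBPAGES; i++)
		printf("subpage %d: private=%lu\n", i, newp[i].private);
	return 0;
}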
In f2fs_do_write_data_page, the FI_ATOMIC_FILE flag selects the target
inode between the original inode and the COW inode. When an atomic write
abort and writeback occur simultaneously, invalid data can be written to
the original inode if the FI_ATOMIC_FILE flag is cleared in the meantime.
To prevent the problem, let's truncate all pages before clearing the flag:
Atomic write thread                     Writeback thread

f2fs_abort_atomic_write
  clear_inode_flag(inode, FI_ATOMIC_FILE)
                                        __writeback_single_inode
                                          do_writepages
                                            f2fs_do_write_data_page
                                              - use dn of original inode
  truncate_inode_pages_final
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/segment.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7901ede58113..7e47b8054413 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -192,6 +192,9 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
if (!f2fs_is_atomic_file(inode))
return;
+ if (clean)
+ truncate_inode_pages_final(inode->i_mapping);
+
release_atomic_write_cnt(inode);
clear_inode_flag(inode, FI_ATOMIC_COMMITTED);
clear_inode_flag(inode, FI_ATOMIC_REPLACE);
@@ -201,7 +204,6 @@ void f2fs_abort_atomic_write(struct inode *inode, bool clean)
F2FS_I(inode)->atomic_write_task = NULL;
if (clean) {
- truncate_inode_pages_final(inode->i_mapping);
f2fs_i_size_write(inode, fi->original_i_size);
fi->original_i_size = 0;
}
--
2.25.1
In f2fs_update_inode, i_size of the atomic file isn't updated until
FI_ATOMIC_COMMITTED flag is set. When committing atomic write right
after the writeback of the inode, i_size of the raw inode will not be
updated. This can break atomicity due to a mismatch between the old
file size and the new data.
To prevent the problem, let's mark the inode dirty when the
FI_ATOMIC_COMMITTED flag is set:
Atomic write thread                     Writeback thread

                                        __writeback_single_inode
                                          write_inode
                                            f2fs_update_inode
                                              - skip i_size update
f2fs_ioc_commit_atomic_write
  f2fs_commit_atomic_write
    set_inode_flag(inode, FI_ATOMIC_COMMITTED)
  f2fs_do_sync_file
    f2fs_fsync_node_pages
      - skip f2fs_update_inode since the inode is clean
Fixes: 3db1de0e582c ("f2fs: change the current atomic write way")
Cc: stable(a)vger.kernel.org #v5.19+
Signed-off-by: Sunmin Jeong <s_min.jeong(a)samsung.com>
---
fs/f2fs/f2fs.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 543898482f8b..a000cb024dbe 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -3039,6 +3039,7 @@ static inline void __mark_inode_dirty_flag(struct inode *inode,
case FI_INLINE_DOTS:
case FI_PIN_FILE:
case FI_COMPRESS_RELEASED:
+ case FI_ATOMIC_COMMITTED:
f2fs_mark_inode_dirty_sync(inode, true);
}
}
--
2.25.1
Many architectures' switch_mm() (e.g. arm64) do not have an smp_mb()
which the core scheduler code has depended upon since commit:
commit 223baf9d17f25 ("sched: Fix performance regression introduced by mm_cid")
If switch_mm() doesn't call smp_mb(), sched_mm_cid_remote_clear() can
unset the actively used cid when it fails to observe the active task after
it sets lazy_put.
There *is* a memory barrier between storing to rq->curr and _return to
userspace_ (as required by membarrier), but the rseq mm_cid has stricter
requirements: the barrier needs to be issued between the store to rq->curr
and switch_mm_cid(), which happens earlier than:
- spin_unlock(),
- switch_to().
So it's fine when the architecture switch_mm happens to have that barrier
already, but less so when the architecture only provides the full barrier
in switch_to() or spin_unlock().
It is a bug in the rseq switch_mm_cid() implementation. All architectures
that don't have memory barriers in switch_mm(), but rather have the full
barrier either in finish_lock_switch() or switch_to() have them too late
for the needs of switch_mm_cid().
Introduce a new smp_mb__after_switch_mm(), defined as smp_mb() in the
generic barrier.h header, and use it in switch_mm_cid() for scheduler
transitions where switch_mm() is expected to provide a memory barrier.
Architectures can override smp_mb__after_switch_mm() if their
switch_mm() implementation provides an implicit memory barrier.
Override it with a no-op on x86 which implicitly provide this memory
barrier by writing to CR3.
Reported-by: levi.yun <yeoreum.yun(a)arm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 223baf9d17f2 ("sched: Fix performance regression introduced by mm_cid")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: Aaron Lu <aaron.lu(a)intel.com>
---
arch/x86/include/asm/barrier.h | 3 +++
include/asm-generic/barrier.h | 8 ++++++++
kernel/sched/sched.h | 19 +++++++++++++------
3 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 35389b2af88e..0d5e54201eb2 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -79,6 +79,9 @@ do { \
#define __smp_mb__before_atomic() do { } while (0)
#define __smp_mb__after_atomic() do { } while (0)
+/* Writing to CR3 provides a full memory barrier in switch_mm(). */
+#define smp_mb__after_switch_mm() do { } while (0)
+
#include <asm-generic/barrier.h>
/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 961f4d88f9ef..5a6c94d7a598 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -296,5 +296,13 @@ do { \
#define io_stop_wc() do { } while (0)
#endif
+/*
+ * Architectures that guarantee an implicit smp_mb() in switch_mm()
+ * can override smp_mb__after_switch_mm.
+ */
+#ifndef smp_mb__after_switch_mm
+#define smp_mb__after_switch_mm() smp_mb()
+#endif
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 2e5a95486a42..638ebd355912 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -79,6 +79,8 @@
# include <asm/paravirt_api_clock.h>
#endif
+#include <asm/barrier.h>
+
#include "cpupri.h"
#include "cpudeadline.h"
@@ -3481,13 +3483,18 @@ static inline void switch_mm_cid(struct rq *rq,
* between rq->curr store and load of {prev,next}->mm->pcpu_cid[cpu].
* Provide it here.
*/
- if (!prev->mm) // from kernel
+ if (!prev->mm) { // from kernel
smp_mb();
- /*
- * user -> user transition guarantees a memory barrier through
- * switch_mm() when current->mm changes. If current->mm is
- * unchanged, no barrier is needed.
- */
+ } else { // from user
+ /*
+ * user -> user transition relies on an implicit
+ * memory barrier in switch_mm() when current->mm
+ * changes. If the architecture switch_mm() does not
+ * have an implicit memory barrier, it is emitted here.
+ * If current->mm is unchanged, no barrier is needed.
+ */
+ smp_mb__after_switch_mm();
+ }
}
if (prev->mm_cid_active) {
mm_cid_snapshot_time(rq, prev->mm);
--
2.39.2
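The override pattern itself can be demonstrated stand-alone, using
__sync_synchronize() as a stand-in for smp_mb() (compile with -DARCH_X86 to
select the no-op variant): the generic header only supplies the full
barrier when the arch header has not already defined the macro.

#include <stdio.h>

#ifdef ARCH_X86
/* x86: writing to CR3 in switch_mm() already serializes. */
#define smp_mb__after_switch_mm() do { } while (0)
#endif

/* asm-generic fallback when the arch provides no implicit barrier. */
#ifndef smp_mb__after_switch_mm
#define smp_mb__after_switch_mm() __sync_synchronize()
#endif

int main(void)
{
	smp_mb__after_switch_mm();
#ifdef ARCH_X86
	printf("no-op barrier (implicit in the CR3 write)\n");
#else
	printf("full memory barrier emitted\n");
#endif
	return 0;
}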
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
  CPU 0                                 CPU 1
  -----                                 -----
                                        wait_index++;
  index = wait_index;
                                        ring_buffer_wake_waiters();
  wait_on_pipe()
    ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
        schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Link: https://lore.kernel.org/linux-trace-kernel/20240312121703.557950713@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 13 ++++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 45 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index f4c34b7c7e1e..350607cce869 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -902,23 +902,26 @@ static bool rb_wait_once(void *data)
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
* @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @cond: condition function to break out of wait (NULL to run once)
+ * @data: the data to pass to @cond.
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
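Here is a minimal user-space sketch of the cond/data callback shape
introduced above (illustrative names, and a busy-wait standing in for the
real sleep): the waiter re-evaluates an externally supplied predicate every
time it checks, so a wait_index bump cannot be missed between the check and
the sleep.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

typedef bool (*ring_buffer_cond_fn)(void *data);

struct pipe_wait {
	atomic_int *wait_index;	/* shared counter bumped by wakers */
	int snapshot;		/* value observed before sleeping */
};

static bool wait_pipe_cond(void *data)
{
	struct pipe_wait *pwait = data;

	/* Break out if another thread bumped wait_index since the snapshot. */
	return atomic_load(pwait->wait_index) != pwait->snapshot;
}

static void ring_buffer_wait_stub(ring_buffer_cond_fn cond, void *data)
{
	while (!cond(data))
		;	/* a real implementation sleeps here */
}

int main(void)
{
	atomic_int wait_index = 0;
	struct pipe_wait pwait = { &wait_index, atomic_load(&wait_index) };

	atomic_fetch_add(&wait_index, 1);	/* e.g. the .flush() path */
	ring_buffer_wait_stub(wait_pipe_cond, &pwait);
	printf("woken: wait_index moved past the snapshot\n");
	return 0;
}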
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either are true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and returns false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Link: https://lore.kernel.org/linux-trace-kernel/20240312121703.399598519@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 8c3730a88662..f4c34b7c7e1e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -843,43 +843,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
@@ -902,27 +874,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+	/* wait_event() actually calls this twice before scheduling */
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
CPU 0 CPU 1
----- -----
[ Poll ]
[ shortest_full = 0 ]
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
if ([ buffer percent ] > full)
break;
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
[ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_waiters_pending, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending after the check if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
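A condensed sketch of the ordering the fix enforces on the poll side
(the names come from the diff below; this is illustration, not new code):

	cpu_buffer->shortest_full = full;     /* (A) publish the watermark */
	smp_mb();                             /* keep (A) ordered before (B) */
	rbwork->full_waiters_pending = true;  /* (B) advertise a waiter */

A writer that observes (B) is then guaranteed to also observe (A), so it
compares the buffer fill level against the correct shortest_full before
queueing the irq_work wakeup.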
Link: https://lore.kernel.org/linux-trace-kernel/20240312131952.630922155@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ffbccb9bcf0..99fdda29ce4e 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -965,16 +965,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -990,9 +1006,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The rb_watermark_hit() checks if the amount of data in the ring buffer is
above the percentage level passed in by the "full" variable. If it is, it
returns true.
But it also sets the "shortest_full" field of the cpu_buffer that informs
writers that it needs to call the irq_work if the amount of data on the
ring buffer is above the requested amount.
The rb_watermark_hit() always sets the shortest_full even if the amount in
the ring buffer is already what the caller wants. As the caller is not going
to wait, because it has what it wants, there's no reason to set shortest_full.
Link: https://lore.kernel.org/linux-trace-kernel/20240312115641.6aa8ba08@gandalf.…
Cc: stable(a)vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Reviewed-by: Masami Hiramatsu (Google) <mhiramat(a)kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..6ffbccb9bcf0 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -834,9 +834,10 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
ret = !pagebusy && full_hit(buffer, cpu, full);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
+ if (!ret && (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)) {
+ cpu_buffer->shortest_full = full;
+ }
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
}
return ret;
--
2.43.0
Hi,
this series does basically two things:
1. Disables automatic load balancing as advised by the hardware
workaround.
2. Assigns all the CCS slices to one single user engine. The user
will then be able to query only one CCS engine.
In this v5 I have created a new file, gt/intel_gt_ccs_mode.c
where I added the intel_gt_apply_ccs_mode(). In the upcoming
patches, this file will contain the implementation for dynamic
CCS mode setting.
I also found it necessary to create a new mechanism for looping
through engines in order to exclude the CCSes that are merged
into one single stream. It's called for_each_available_engine()
and I started using it in the hangcheck selftest. I might still
need to iterate a few CI runs in order to cover more cases where
this call is needed.
I'm using the "Requires: " tag here, but I'm not sure the commit
id will be valid; on the other hand, I don't know what commit id
I should use.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
=========
v4 -> v5
- Use the workaround framework to do all the CCS balancing
settings in order to always apply the modes also when the
engine resets. Put everything in its own specific function to
be executed for the first CCS engine encountered. (Thanks
Matt)
- Calculate the CCS ID for the CCS mode as the first available
CCS among all the engines (Thanks Matt)
- Create the intel_gt_ccs_mode.c file to host the CCS
configuration. We will have it ready for the next series.
- Fix a selftest that was failing because it could not set CCS2.
- Add the for_each_available_engine() macro to exclude CCS1+ and
start using it in the hangcheck selftest.
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported
uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list
(Patch 2) (Thanks, Matt)
- Patch 4 now uses the refactoring from Patch 1, with a cleaner
outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed
UABI engine list and adapt the engine counting accordingly
(thanks Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (4):
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Refactor uabi engine class/instance list creation
drm/i915/gt: Disable tests for CCS engines beyond the first
drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/Makefile | 1 +
drivers/gpu/drm/i915/gt/intel_engine_user.c | 40 ++++++++++++++------
drivers/gpu/drm/i915/gt/intel_gt.h | 13 +++++++
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c | 39 +++++++++++++++++++
drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h | 13 +++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 +++
drivers/gpu/drm/i915/gt/intel_workarounds.c | 30 ++++++++++++++-
drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 22 +++++------
8 files changed, 139 insertions(+), 25 deletions(-)
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.c
create mode 100644 drivers/gpu/drm/i915/gt/intel_gt_ccs_mode.h
--
2.43.0
This is a backport of the recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already present in
v6.8.
v6.8 just got released so the backport was very smooth.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
change-id: 20240312-rfds-backport-6-8-y-d67dcdfe4e51
Best regards,
--
Thanks,
Pawan
This is a backport of the recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.1.y.
- There was a minor conflict for patch 2/4 in the Documentation index.
- There were many easy-to-resolve conflicts in patch 3/4 related to
sysfs reporting.
- s/ATOM_GRACEMONT/ALDERLAKE_N/
- ATOM_GRACEMONT is called ALDERLAKE_N in 6.6.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 8 ++
include/linux/cpu.h | 2 +
12 files changed, 283 insertions(+), 9 deletions(-)
---
base-commit: 61adba85cc40287232a539e607164f273260e0fe
change-id: 20240312-rfds-backport-6-1-y-d17ecf8c1ec5
Best regards,
--
Thanks,
Pawan
This is a backport of the recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.6.y.
There were no hiccups in backporting this.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: 62e5ae5007ef14cf9b12da6520d50fe90079d8d4
change-id: 20240312-rfds-backport-6-6-y-e1425616b52a
Best regards,
--
Thanks,
Pawan
This is a backport of the recently upstreamed mitigation of a CPU
vulnerability Register File Data Sampling (RFDS) (CVE-2023-28746). It
has a dependency on "Delay VERW" series which is already backported and
merged in linux-6.7.y.
There were no hiccups in backporting this.
Cc: Sasha Levin <sashal(a)kernel.org>
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
Pawan Gupta (4):
x86/mmio: Disable KVM mitigation when X86_FEATURE_CLEAR_CPU_BUF is set
Documentation/hw-vuln: Add documentation for RFDS
x86/rfds: Mitigate Register File Data Sampling (RFDS)
KVM/x86: Export RFDS_NO and RFDS_CLEAR to guests
Documentation/ABI/testing/sysfs-devices-system-cpu | 1 +
Documentation/admin-guide/hw-vuln/index.rst | 1 +
.../admin-guide/hw-vuln/reg-file-data-sampling.rst | 104 +++++++++++++++++++++
Documentation/admin-guide/kernel-parameters.txt | 21 +++++
arch/x86/Kconfig | 11 +++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 8 ++
arch/x86/kernel/cpu/bugs.c | 92 +++++++++++++++++-
arch/x86/kernel/cpu/common.c | 38 +++++++-
arch/x86/kvm/x86.c | 5 +-
drivers/base/cpu.c | 3 +
include/linux/cpu.h | 2 +
12 files changed, 278 insertions(+), 9 deletions(-)
---
base-commit: 2e7cdd29fc42c410eab52fffe5710bf656619222
change-id: 20240312-rfds-backport-6-7-y-4fbe3fc5366a
Best regards,
--
Thanks,
Pawan
On 3/11/24 15:32, Vasant k wrote:
> Hi Tom,
>
> Right, it just escaped my mind that the SNP uses the secrets page
> to hand over APs to the next stage. I will correct that in the next
Not quite... The MADT table lists the APs and the GHCB AP Create NAE event
is used to start the APs.
Thanks,
Tom
> version. Please let me know if you have any corrections or improvement
> suggestions on the rest of the patchset.
>
> Thanks,
> Vasant
>
The patch titled
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
has been added to the -mm mm-unstable branch. Its filename is
memtest-use-readwrite_once-in-memory-scanning.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Qiang Zhang <qiang4.zhang(a)intel.com>
Subject: memtest: use {READ,WRITE}_ONCE in memory scanning
Date: Tue, 12 Mar 2024 16:04:23 +0800
memtest failed to find bad memory when compiled with clang. So use
{WRITE,READ}_ONCE to access memory to avoid compiler over-optimization.
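For illustration (not part of the patch), the kernel's accessors are
roughly equivalent to a volatile cast, which is what prevents the
compiler from caching, reordering, or eliding the accesses; a
simplified sketch:

	/* Simplified; the real macros also handle non-scalar sizes
	 * and add compile-time type checks.
	 */
	#define WRITE_ONCE(x, val) (*(volatile typeof(x) *)&(x) = (val))
	#define READ_ONCE(x)       (*(volatile typeof(x) *)&(x))

With plain accesses, clang may legally assume the just-written pattern
is still in memory and fold the comparison away, so faulty bits are
never observed.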
Link: https://lkml.kernel.org/r/20240312080422.691222-1-qiang4.zhang@intel.com
Signed-off-by: Qiang Zhang <qiang4.zhang(a)intel.com>
Cc: Bill Wendling <morbo(a)google.com>
Cc: Justin Stitt <justinstitt(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memtest.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/memtest.c~memtest-use-readwrite_once-in-memory-scanning
+++ a/mm/memtest.c
@@ -51,10 +51,10 @@ static void __init memtest(u64 pattern,
last_bad = 0;
for (p = start; p < end; p++)
- *p = pattern;
+ WRITE_ONCE(*p, pattern);
for (p = start; p < end; p++, start_phys_aligned += incr) {
- if (*p == pattern)
+ if (READ_ONCE(*p) == pattern)
continue;
if (start_phys_aligned == last_bad + incr) {
last_bad += incr;
_
Patches currently in -mm which might be from qiang4.zhang(a)intel.com are
memtest-use-readwrite_once-in-memory-scanning.patch
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
CPU 0 CPU 1
----- -----
[ Poll ]
[ shortest_full = 0 ]
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
if ([ buffer percent ] > full)
break;
rbwork->full_waiters_pending = true;
if (rbwork->full_waiters_pending &&
[ buffer percent ] > shortest_full) {
rbwork->wakeup_full = true;
[ queue_irqwork ]
cpu_buffer->shortest_full = full;
[ IRQ work ]
if (rbwork->wakeup_full) {
cpu_buffer->shortest_full = 0;
wakeup poll waiters;
[woken]
[ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_waiters_pending, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending after the check if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
Cc: stable(a)vger.kernel.org
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..adfe603a769b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -964,16 +964,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -989,9 +1005,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
Since commit 1a50d9403fb9 ("treewide: Fix probing of devices in DT
overlays"), when using device-tree overlays, the FWNODE_FLAG_NOT_DEVICE
is set on each overlay nodes.
When an overlay contains a node related to a bus (i2c for instance)
and its children nodes representing i2c devices, the flag is cleared for
the bus node by the OF notifier but the "standard" probe sequence takes
place (the same one is performed without an overlay) for the bus and
children devices are created simply by walking the children DT nodes
without clearing the FWNODE_FLAG_NOT_DEVICE flag for these devices.
Clear the FWNODE_FLAG_NOT_DEVICE when the device is added, no matter if
an overlay is used or not.
Fixes: 1a50d9403fb9 ("treewide: Fix probing of devices in DT overlays")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/base/core.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/base/core.c b/drivers/base/core.c
index 14d46af40f9a..61d09ac57bfb 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3619,6 +3619,7 @@ int device_add(struct device *dev)
*/
if (dev->fwnode && !dev->fwnode->dev) {
dev->fwnode->dev = dev;
+ dev->fwnode->flags &= ~FWNODE_FLAG_NOT_DEVICE;
fw_devlink_link_device(dev);
}
--
2.43.0
Dear developers,
(sorry for the CC list, it looks quite long to me, but I tried to
follow the issue reporting guide as closely as possible)
Since patches [1], [2] and [3] were applied to the kernel, there is a
regression with Lenovo ThinkPad Compact USB Keyboard (old model, not II).
[1]
https://github.com/torvalds/linux/commit/46a0a2c96f0f47628190f122c2e3d879e5…
[2]
https://github.com/torvalds/linux/commit/2f2bd7cbd1d1548137b351040dc4e037d1…
[3]
https://github.com/torvalds/linux/commit/43527a0094c10dfbf0d5a2e7979395a38d…
The regression is that a middle click is performed when releasing middle
button after wheel emulation.
The bug appears randomly, it can be after 5 minutes or 1 hour of
keyboard usage, and can only be worked around by unplugging/re-plugging
the keyboard. (I ended up resorting to simulating an unplug/replug, with a
script which echoes 0 then 1 to /sys/bus/usb/devices/<id>/authorized,
since I was afraid to damage the Micro-USB outlet by physically
unplugging/re-plugging too much).
Those spurious clicks are very annoying, since they can open links in
new tabs when scrolling in Firefox, paste text when scrolling in
terminals, or do other unwanted stuff.
I witnessed it with latest kernels (Debian unstable) as well as stable
kernels (Debian 12 Bookworm, stable).
On Debian Stable, the last working kernel was 5.10.127, the regression
appeared in 5.10.136 (I read all changelogs on kernel.org between those
two releases but couldn't find anything about hid-lenovo, so I can't
tell exactly in which release the regression appeared, Debian upgraded
directly from .127 to .136).
I reported it in Debian [4], and apparently I'm not the only person
suffering from it [5].
[4] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#32
[5] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1058758#42
I would understand that such bugs would end up in a development kernel
like the ones provided by Debian Unstable, but not in stable kernels
like the ones provided by Debian Stable.
Regards,
--
Raphaël Halimi
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
CPU 0 CPU 1
----- -----
wait_index++;
index = wait_index;
ring_buffer_wake_waiters();
wait_on_pipe()
ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
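A minimal sketch of the acquire/release pairing described above (names
taken from the diff below):

	/* closer side (.flush() / ioctl) */
	iter->closed = true;
	(void)atomic_fetch_inc_release(&iter->wait_index);

	/* waiter side: an acquire read that observes the bumped
	 * counter is guaranteed to also observe closed == true
	 */
	if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
		return true;
	return iter->closed;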
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 13 ++++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 45 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c198ba466853..67d8405f4451 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -901,23 +901,26 @@ static bool rb_wait_once(void *data)
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
* @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @cond: condition function to break out of wait (NULL to run once)
+ * @data: the data to pass to @cond.
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either are true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and returns false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ef763f57c66..c198ba466853 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,43 +842,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
@@ -901,27 +873,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+	/* wait_event() actually calls this twice before scheduling */
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
CPU 0 CPU 1
----- -----
wait_index++;
index = wait_index;
ring_buffer_wake_waiters();
wait_on_pipe()
ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->wait_index update.
Have the ring_buffer_wait() take a conditional callback function and a
data parameter that can be used within the wait_event_interruptible() of
the ring_buffer_wait() function.
In wait_on_pipe(), pass a condition function that will check if the
wait_index has been updated, if it has, it will return true to break out
of the wait_event_interruptible() loop.
Create a new field "closed" in the trace_iterator and set it in the
.flush() callback before calling ring_buffer_wake_waiters().
This will keep any new readers from waiting on a closed file descriptor.
Have the wait_on_pipe() condition callback also check the closed field.
Change the wait_index field of the trace_iterator to atomic_t. There's no
reason it needs to be 'long' and making it atomic and using
atomic_read_acquire() and atomic_fetch_inc_release() will provide the
necessary memory barriers.
Add a "woken" flag to tracing_buffers_splice_read() to exit the loop after
one more try to fetch data. That is, if it waited for data and something
woke it up, it should try to collect any new data and then exit back to
user space.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 3 ++-
include/linux/trace_events.h | 5 ++++-
kernel/trace/ring_buffer.c | 11 ++++-----
kernel/trace/trace.c | 43 ++++++++++++++++++++++++++----------
4 files changed, 43 insertions(+), 19 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 338a33db1577..dc5ae4e96aee 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -99,7 +99,8 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
})
typedef bool (*ring_buffer_cond_fn)(void *data);
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..fc6d0af56bb1 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,13 +103,16 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ atomic_t wait_index;
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
cpumask_var_t started;
+ /* Set when the file is closed to prevent new waiters */
+ bool closed;
+
/* it's true when current open file is snapshot */
bool snapshot;
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index c198ba466853..81a5303bdc09 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -906,18 +906,19 @@ static bool rb_wait_once(void *data)
* as data is added to any of the @buffer's cpu buffers. Otherwise
* it will wait for data to be added to a specific cpu buffer.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full,
+ ring_buffer_cond_fn cond, void *data)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct wait_queue_head *waitq;
- ring_buffer_cond_fn cond;
struct rb_irq_work *rbwork;
- void *data;
long once = 0;
int ret = 0;
- cond = rb_wait_once;
- data = &once;
+ if (!cond) {
+ cond = rb_wait_once;
+ data = &once;
+ }
/*
* Depending on what the caller is waiting for, either any
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..d390fea3a6a5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,15 +1955,36 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+struct pipe_wait {
+ struct trace_iterator *iter;
+ int wait_index;
+};
+
+static bool wait_pipe_cond(void *data)
+{
+ struct pipe_wait *pwait = data;
+ struct trace_iterator *iter = pwait->iter;
+
+ if (atomic_read_acquire(&iter->wait_index) != pwait->wait_index)
+ return true;
+
+ return iter->closed;
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct pipe_wait pwait;
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ pwait.wait_index = atomic_read_acquire(&iter->wait_index);
+ pwait.iter = iter;
+
+ ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full,
+ wait_pipe_cond, &pwait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
@@ -8398,9 +8419,9 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
+ iter->closed = true;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
@@ -8500,6 +8521,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ bool woken = false;
int page_size;
int entries, i;
ssize_t ret = 0;
@@ -8573,17 +8595,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8614,8 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ /* Iterate one more time to collect any new data then exit */
+ woken = true;
goto again;
}
@@ -8618,9 +8638,8 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
mutex_lock(&trace_types_lock);
- iter->wait_index++;
/* Make sure the waiters see the new wait_index */
- smp_wmb();
+ (void)atomic_fetch_inc_release(&iter->wait_index);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Convert ring_buffer_wait() over to wait_event_interruptible(). The default
condition is to execute the wait loop inside __wait_event() just once.
This does not change the ring_buffer_wait() prototype yet, but
restructures the code so that it can take a "cond" and "data" parameter
and will call wait_event_interruptible() with a helper function as the
condition.
The helper function (rb_wait_cond) takes the cond function and data
parameters. It will first check if the buffer hit the watermark defined by
the "full" parameter and then call the passed in condition parameter. If
either are true, it returns true.
If rb_wait_cond() does not return true, it will set the appropriate
"waiters_pending" flag and returns false.
Link: https://lore.kernel.org/linux-trace-kernel/CAHk-=wgsNgewHFxZAJiAQznwPMqEtQm…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 1 +
kernel/trace/ring_buffer.c | 116 +++++++++++++++++++++---------------
2 files changed, 69 insertions(+), 48 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..338a33db1577 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,6 +98,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+typedef bool (*ring_buffer_cond_fn)(void *data);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 6ef763f57c66..c198ba466853 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,43 +842,15 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
-/**
- * ring_buffer_wait - wait for input to the ring buffer
- * @buffer: buffer to wait on
- * @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
- *
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
- */
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+static inline bool
+rb_wait_cond(struct rb_irq_work *rbwork, struct trace_buffer *buffer,
+ int cpu, int full, ring_buffer_cond_fn cond, void *data)
{
- struct ring_buffer_per_cpu *cpu_buffer;
- DEFINE_WAIT(wait);
- struct rb_irq_work *work;
- int ret = 0;
-
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ if (rb_watermark_hit(buffer, cpu, full))
+ return true;
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ if (cond(data))
+ return true;
/*
* The events can happen in critical sections where
@@ -901,27 +873,75 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return false;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
+/*
+ * The default wait condition for ring_buffer_wait() is to just exit the
+ * wait loop the first time it is woken up.
+ */
+static bool rb_wait_once(void *data)
+{
+ long *once = data;
+
+	/* wait_event() actually calls this twice before scheduling */
+ if (*once > 1)
+ return true;
+
+ (*once)++;
+ return false;
+}
+
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct wait_queue_head *waitq;
+ ring_buffer_cond_fn cond;
+ struct rb_irq_work *rbwork;
+ void *data;
+ long once = 0;
+ int ret = 0;
+
+ cond = rb_wait_once;
+ data = &once;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return -ENODEV;
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
}
- schedule();
- out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ waitq = &rbwork->full_waiters;
else
- finish_wait(&work->waiters, &wait);
+ waitq = &rbwork->waiters;
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+ ret = wait_event_interruptible((*waitq),
+ rb_wait_cond(rbwork, buffer, cpu, full, cond, data));
return ret;
}
--
2.43.0
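The "calls this twice" comment in rb_wait_once() refers to how
wait_event_interruptible() evaluates its condition: once up front as a fast
path, and once more after the task has been queued but before schedule(). A
simplified sketch of that control flow (an illustration of the macro's shape,
not the real code from <linux/wait.h>; the helper name is made up):

#include <linux/sched.h>
#include <linux/sched/signal.h>
#include <linux/wait.h>

/*
 * Simplified control flow of wait_event_interruptible(wq, cond); a
 * sketch, not the actual macro. The condition is evaluated once on
 * entry and once more after the task is queued but before schedule(),
 * which is why rb_wait_once() above must report "done" only on its
 * third call for the task to sleep exactly once.
 */
static long sketch_wait_event_interruptible(struct wait_queue_head *wq,
                                            bool (*cond)(void *), void *data)
{
        DEFINE_WAIT(wait);
        long ret = 0;

        if (cond(data))                 /* first evaluation: fast path */
                return 0;

        for (;;) {
                prepare_to_wait(wq, &wait, TASK_INTERRUPTIBLE);
                if (cond(data))         /* second evaluation, pre-sleep */
                        break;
                if (signal_pending(current)) {
                        ret = -ERESTARTSYS;
                        break;
                }
                schedule();             /* later evaluations follow wakeups */
        }
        finish_wait(wq, &wait);
        return ret;
}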
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
If a reader of the ring buffer is doing a poll, and waiting for the ring
buffer to hit a specific watermark, there could be a case where it gets
into an infinite ping-pong loop.
The poll code has:
rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
The writer will see full_waiters_pending and check if the ring buffer is
filled over the percentage of the shortest_full value. If it is, it calls
an irq_work to wake up all the waiters.
But the code could get into a circular loop:
         CPU 0                                CPU 1
         -----                                -----
  [ Poll ]
    [ shortest_full = 0 ]
  rbwork->full_waiters_pending = true;
                                       if (rbwork->full_waiters_pending &&
                                           [ buffer percent ] > shortest_full) {
                                              rbwork->wakeup_full = true;
                                              [ queue_irqwork ]

    cpu_buffer->shortest_full = full;

                                       [ IRQ work ]
                                       if (rbwork->wakeup_full) {
                                             cpu_buffer->shortest_full = 0;
                                             wakeup poll waiters;
  [woken]
   if ([ buffer percent ] > full)
      break;
   rbwork->full_waiters_pending = true;

                                       if (rbwork->full_waiters_pending &&
                                           [ buffer percent ] > shortest_full) {
                                              rbwork->wakeup_full = true;
                                              [ queue_irqwork ]

    cpu_buffer->shortest_full = full;

                                       [ IRQ work ]
                                       if (rbwork->wakeup_full) {
                                             cpu_buffer->shortest_full = 0;
                                             wakeup poll waiters;
  [woken]

 [ Wash, rinse, repeat! ]
In the poll, the shortest_full needs to be set before the
full_waiters_pending flag, as once that is set, the writer will compare the
current shortest_full (which is incorrect) to decide to call the irq_work,
which will reset the shortest_full (expecting the readers to update it).
Also move the setting of full_waiters_pending to after checking if the ring
buffer has the required percentage filled. There's no reason to tell the
writer to wake up waiters if there are no waiters.
Cc: stable(a)vger.kernel.org
Fixes: 42fb0a1e84ff5 ("tracing/ring-buffer: Have polling block on watermark")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 27 ++++++++++++++++++++-------
1 file changed, 20 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..adfe603a769b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -964,16 +964,32 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
poll_wait(filp, &rbwork->full_waiters, poll_table);
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- } else {
- poll_wait(filp, &rbwork->waiters, poll_table);
- rbwork->waiters_pending = true;
+ if (full_hit(buffer, cpu, full))
+ return EPOLLIN | EPOLLRDNORM;
+ /*
+ * Only allow full_waiters_pending update to be seen after
+ * the shortest_full is set. If the writer sees the
+ * full_waiters_pending flag set, it will compare the
+ * amount in the ring buffer to shortest_full. If the amount
+ * in the ring buffer is greater than the shortest_full
+ * percent, it will call the irq_work handler to wake up
+ * this list. The irq_handler will reset shortest_full
+ * back to zero. That's done under the reader_lock, but
+ * the below smp_mb() makes sure that the update to
+ * full_waiters_pending doesn't leak up into the above.
+ */
+ smp_mb();
+ rbwork->full_waiters_pending = true;
+ return 0;
}
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -989,9 +1005,6 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
- if (full)
- return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
-
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
--
2.43.0
The PD of a Type-C port needs to be updated in pd_set. Unlink the Type-C
port device from the old PD before linking it to a new one.
Fixes: cd099cde4ed2 ("usb: typec: tcpm: Support multiple capabilities")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kyle Tso <kyletso(a)google.com>
---
drivers/usb/typec/tcpm/tcpm.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 3d505614bff1..896f594b9328 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -6907,7 +6907,9 @@ static int tcpm_pd_set(struct typec_port *p, struct usb_power_delivery *pd)
port->port_source_caps = data->source_cap;
port->port_sink_caps = data->sink_cap;
+ typec_port_set_usb_power_delivery(p, NULL);
port->selected_pd = pd;
+ typec_port_set_usb_power_delivery(p, port->selected_pd);
unlock:
mutex_unlock(&port->lock);
return ret;
--
2.44.0.278.ge034bb2e1d-goog
From: Wenjing Liu <wenjing.liu(a)amd.com>
[why]
During a minimal transition commit, the base state could be freed if it is the
current state. This is because after committing the minimal transition state,
the current state is swapped to the minimal transition state and the old
current state is released, which could cause the old current state's memory to
be freed. However, dc will dereference this memory when releasing the minimal
transition state. Therefore, we need to retain the old current state until we
release the minimal transition state.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 21 +++++++++++++--------
1 file changed, 13 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index a372c4965adf..ab0c920333be 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -4203,7 +4203,6 @@ static void release_minimal_transition_state(struct dc *dc,
{
restore_minimal_pipe_split_policy(dc, base_context, policy);
dc_state_release(minimal_transition_context);
- /* restore previous pipe split and odm policy */
}
static void force_vsync_flip_in_minimal_transition_context(struct dc_state *context)
@@ -4258,7 +4257,7 @@ static bool is_pipe_topology_transition_seamless_with_intermediate_step(
intermediate_state, final_state);
}
-static void swap_and_free_current_context(struct dc *dc,
+static void swap_and_release_current_context(struct dc *dc,
struct dc_state *new_context, struct dc_stream_state *stream)
{
@@ -4321,7 +4320,7 @@ static bool commit_minimal_transition_based_on_new_context(struct dc *dc,
commit_planes_for_stream(dc, srf_updates,
surface_count, stream, NULL,
UPDATE_TYPE_FULL, intermediate_context);
- swap_and_free_current_context(
+ swap_and_release_current_context(
dc, intermediate_context, stream);
dc_state_retain(dc->current_state);
success = true;
@@ -4338,6 +4337,7 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
bool success = false;
struct pipe_split_policy_backup policy;
struct dc_state *intermediate_context;
+ struct dc_state *old_current_state = dc->current_state;
struct dc_surface_update srf_updates[MAX_SURFACE_NUM];
int surface_count;
@@ -4353,8 +4353,10 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
* with the current state.
*/
restore_planes_and_stream_state(&dc->scratch.current_state, stream);
+ dc_state_retain(old_current_state);
intermediate_context = create_minimal_transition_state(dc,
- dc->current_state, &policy);
+ old_current_state, &policy);
+
if (intermediate_context) {
if (is_pipe_topology_transition_seamless_with_intermediate_step(
dc,
@@ -4367,14 +4369,15 @@ static bool commit_minimal_transition_based_on_current_context(struct dc *dc,
commit_planes_for_stream(dc, srf_updates,
surface_count, stream, NULL,
UPDATE_TYPE_FULL, intermediate_context);
- swap_and_free_current_context(
+ swap_and_release_current_context(
dc, intermediate_context, stream);
dc_state_retain(dc->current_state);
success = true;
}
release_minimal_transition_state(dc, intermediate_context,
- dc->current_state, &policy);
+ old_current_state, &policy);
}
+ dc_state_release(old_current_state);
/*
* Restore stream and plane states back to the values associated with
* new context.
@@ -4496,12 +4499,14 @@ static bool commit_minimal_transition_state(struct dc *dc,
dc->debug.pipe_split_policy != MPC_SPLIT_AVOID ? "MPC in Use" :
"Unknown");
+ dc_state_retain(transition_base_context);
transition_context = create_minimal_transition_state(dc,
transition_base_context, &policy);
if (transition_context) {
ret = dc_commit_state_no_check(dc, transition_context);
release_minimal_transition_state(dc, transition_context, transition_base_context, &policy);
}
+ dc_state_release(transition_base_context);
if (ret != DC_OK) {
/* this should never happen */
@@ -4839,7 +4844,7 @@ static bool update_planes_and_stream_v2(struct dc *dc,
context);
}
if (dc->current_state != context)
- swap_and_free_current_context(dc, context, stream);
+ swap_and_release_current_context(dc, context, stream);
return true;
}
@@ -4941,7 +4946,7 @@ static bool update_planes_and_stream_v3(struct dc *dc,
commit_planes_and_stream_update_with_new_context(dc,
srf_updates, surface_count, stream,
stream_update, update_type, new_context);
- swap_and_free_current_context(dc, new_context, stream);
+ swap_and_release_current_context(dc, new_context, stream);
}
return true;
--
2.37.3
From: Chris Park <chris.park(a)amd.com>
[Why]
Disabling the stream encoder invokes a function that no longer exists
in bring-up.
[How]
Check whether the function pointer is NULL before calling it when
disabling the stream encoder.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Chris Park <chris.park(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 9d5df4c0da59..0ba1feaf96c0 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -1185,7 +1185,8 @@ void dce110_disable_stream(struct pipe_ctx *pipe_ctx)
if (dccg) {
dccg->funcs->disable_symclk32_se(dccg, dp_hpo_inst);
dccg->funcs->set_dpstreamclk(dccg, REFCLK, tg->inst, dp_hpo_inst);
- dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
+ if (dccg && dccg->funcs->set_dtbclk_dto)
+ dccg->funcs->set_dtbclk_dto(dccg, &dto_params);
}
} else if (dccg && dccg->funcs->disable_symclk_se) {
dccg->funcs->disable_symclk_se(dccg, stream_enc->stream_enc_inst,
--
2.37.3
From: Wenjing Liu <wenjing.liu(a)amd.com>
This reverts commit 6cf00f4c4d5c ("drm/amd/display: Remove pixle rate
limit for subvp")
[why]
The original commit causes a regression when subvp is applied
on an ODM-required 8k60hz timing. The display shows a black screen
on boot. The issue can be recovered with a hotplug. It also causes
MPO to fail. We will temporarily revert this commit and investigate
the root cause further.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chaitanya Dhere <chaitanya.dhere(a)amd.com>
Reviewed-by: Martin Leung <martin.leung(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index b49e1dc9d8ba..a0a65e099104 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -623,6 +623,7 @@ static bool dcn32_assign_subvp_pipe(struct dc *dc,
* - Not TMZ surface
*/
if (pipe->plane_state && !pipe->top_pipe && !dcn32_is_center_timing(pipe) &&
+ !(pipe->stream->timing.pix_clk_100hz / 10000 > DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ) &&
(!dcn32_is_psr_capable(pipe) || (context->stream_count == 1 && dc->caps.dmub_caps.subvp_psr)) &&
dc_state_get_pipe_subvp_type(context, pipe) == SUBVP_NONE &&
(refresh_rate < 120 || dcn32_allow_subvp_high_refresh_rate(dc, context, pipe)) &&
--
2.37.3
From: Leo Ma <hanghong.ma(a)amd.com>
[Why]
When mode switching is triggered, there is momentary noise visible on
some HDMI TVs or displays.
[How]
Wait for 2 frames to make sure we have enough time to send out AV mute
and sink receives a full frame.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Wenjing Liu <wenjing.liu(a)amd.com>
Acked-by: Wayne Lin <wayne.lin(a)amd.com>
Signed-off-by: Leo Ma <hanghong.ma(a)amd.com>
---
.../gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
index 7e6b7f2a6dc9..8bc3d01537bb 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn30/dcn30_hwseq.c
@@ -812,10 +812,20 @@ void dcn30_set_avmute(struct pipe_ctx *pipe_ctx, bool enable)
if (pipe_ctx == NULL)
return;
- if (dc_is_hdmi_signal(pipe_ctx->stream->signal) && pipe_ctx->stream_res.stream_enc != NULL)
+ if (dc_is_hdmi_signal(pipe_ctx->stream->signal) && pipe_ctx->stream_res.stream_enc != NULL) {
pipe_ctx->stream_res.stream_enc->funcs->set_avmute(
pipe_ctx->stream_res.stream_enc,
enable);
+
+ /* Wait for two frames to make sure AV mute is sent out */
+ if (enable) {
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VBLANK);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VBLANK);
+ pipe_ctx->stream_res.tg->funcs->wait_for_state(pipe_ctx->stream_res.tg, CRTC_STATE_VACTIVE);
+ }
+ }
}
void dcn30_update_info_frame(struct pipe_ctx *pipe_ctx)
--
2.37.3
When orientation switch is enabled in ucsi glink, there is an xhci
probe failure seen when booting up in host mode in reverse
orientation.
During bootup the following things happen in multiple drivers:
a) DWC3 controller driver initializes the core in device mode when the
dr_mode is set to DRD. It relies on role_switch call to change role to
host.
b) QMP driver initializes the lanes to TYPEC_ORIENTATION_NORMAL as a
normal routine. It relies on the typec_switch_set call to get notified
of orientation changes.
c) UCSI core reads the UCSI_GET_CONNECTOR_STATUS via the glink and
provides initial role switch to dwc3 controller.
When booting up in host mode with orientation TYPEC_ORIENTATION_REVERSE,
we see the following things happening in order:
a) UCSI gives initial role as host to dwc3 controller ucsi_register_port.
Upon receiving this notification, the dwc3 core needs to program GCTL from
PRTCAP_DEVICE to PRTCAP_HOST and as part of this change, it asserts GCTL
Core soft reset and waits for it to be completed before shifting it to
host. Only after the reset is done will the dwc3_host_init be invoked and
xhci is probed. DWC3 controller expects that the usb phy's are stable
during this process i.e., the phy init is already done.
b) During the 100ms wait for GCTL core soft reset, the actual notification
from PPM is received by ucsi_glink via pmic glink for changing role to
host. The pmic_glink_ucsi_notify routine first sends the orientation
change to QMP and then sends role to dwc3 via ucsi framework. This is
happening exactly at the time GCTL core soft reset is being processed.
c) When QMP driver receives typec switch to TYPEC_ORIENTATION_REVERSE, it
then re-programs the phy at the instant GCTL core soft reset has been
asserted by dwc3 controller due to which the QMP PLL lock fails in
qmp_combo_usb_power_on.
d) After the 100ms of GCTL core soft reset is completed, the dwc3 core
goes for initializing the host mode and invokes xhci probe. But at this
point the QMP is non-responsive and as a result, the xhci plat probe fails
during xhci_reset.
Fix this by passing the orientation switch to the available ucsi instances, if
their gpio configuration is available, before ucsi_register is invoked, so
that by the time pmic_glink_ucsi_notify provides the typec_switch to QMP,
the lane is already configured and the call is a NOP, thus not racing
with the role switch.
Cc: <stable(a)vger.kernel.org>
Fixes: c6165ed2f425 ("usb: ucsi: glink: use the connector orientation GPIO to provide switch events")
Suggested-by: Wesley Cheng <quic_wcheng(a)quicinc.com>
Signed-off-by: Krishna Kurapati <quic_kriskura(a)quicinc.com>
---
drivers/usb/typec/ucsi/ucsi_glink.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
diff --git a/drivers/usb/typec/ucsi/ucsi_glink.c b/drivers/usb/typec/ucsi/ucsi_glink.c
index 0bd3f6dee678..466df7b9f953 100644
--- a/drivers/usb/typec/ucsi/ucsi_glink.c
+++ b/drivers/usb/typec/ucsi/ucsi_glink.c
@@ -255,6 +255,20 @@ static void pmic_glink_ucsi_notify(struct work_struct *work)
static void pmic_glink_ucsi_register(struct work_struct *work)
{
struct pmic_glink_ucsi *ucsi = container_of(work, struct pmic_glink_ucsi, register_work);
+ int orientation;
+ int i;
+
+ for (i = 0; i < PMIC_GLINK_MAX_PORTS; i++) {
+ if (!ucsi->port_orientation[i])
+ continue;
+ orientation = gpiod_get_value(ucsi->port_orientation[i]);
+
+ if (orientation >= 0) {
+ typec_switch_set(ucsi->port_switch[i],
+ orientation ? TYPEC_ORIENTATION_REVERSE
+ : TYPEC_ORIENTATION_NORMAL);
+ }
+ }
ucsi_register(ucsi->ucsi);
}
--
2.34.1
While commit 69f89168b310 ("usb: typec: tpcm: Fix issues with power being
removed during reset") fixes the boot issues for bus powered devices such
as LibreTech Renegade Elite/Firefly, it trades off the CC pins NOT being
Hi-Zed during error recovery (i.e. PORT_RESET) for devices which are NOT
bus powered (a.k.a. self powered). This change Hi-Zs the CC pins only for
self powered devices, thus preventing brown out for bus powered devices.
Adhering to the spec is gaining more importance due to the Common Charger
initiative enforced by the European Union.
Quoting from the spec:
4.5.2.2.2.1 ErrorRecovery State Requirements
The port shall not drive VBUS or VCONN, and shall present a
high-impedance to ground (above zOPEN) on its CC1 and CC2 pins.
Hi-Zing the CC pins is the intended behavior for PORT_RESET.
CC pins are set to default state after tErrorRecovery in
PORT_RESET_WAIT_OFF.
4.5.2.2.2.2 Exiting From ErrorRecovery State
A Sink shall transition to Unattached.SNK after tErrorRecovery.
A Source shall transition to Unattached.SRC after tErrorRecovery.
Fixes: 69f89168b310 ("usb: typec: tpcm: Fix issues with power being removed during reset")
Cc: stable(a)vger.kernel.org
Cc: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Badhri Jagan Sridharan <badhri(a)google.com>
---
Changes since V1:
* Fix CC for linux stable
---
drivers/usb/typec/tcpm/tcpm.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index c9a78f55ca48..bbe1381232eb 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5593,8 +5593,11 @@ static void run_state_machine(struct tcpm_port *port)
break;
case PORT_RESET:
tcpm_reset_port(port);
- tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
- TYPEC_CC_RD : tcpm_rp_cc(port));
+ if (port->self_powered)
+ tcpm_set_cc(port, TYPEC_CC_OPEN);
+ else
+ tcpm_set_cc(port, tcpm_default_state(port) == SNK_UNATTACHED ?
+ TYPEC_CC_RD : tcpm_rp_cc(port));
tcpm_set_state(port, PORT_RESET_WAIT_OFF,
PD_T_ERROR_RECOVERY);
break;
base-commit: a560a5672826fc1e057068bda93b3d4c98d037a2
--
2.44.0.rc1.240.g4c46232300-goog
We have found a regression where more than 512 URBs cannot be reliably submitted to XHCI. URBs beyond that return 0x00 instead of valid data in the buffer.
Our software works reliably on kernel versions through 6.4.x and fails on versions 6.5, 6.6, 6.7, and 6.8.0-rc6. This was discovered when Ubuntu recently updated their latest kernel package to version 6.5.
The issue is limited to the XHCI driver and appears to be isolated to this specific commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/d…
Attached is a test program that demonstrates the problem. We used a few different USB-to-Serial adapters with no driver installed as a convenient way to reproduce. We check the TRB debug information before and after to verify the actual number of allocated TRBs.
With some adapters on unaffected kernels, the TRB map gets expanded correctly. This directly corresponds to correct functional behavior. On affected kernels, the TRB ring does not expand, and our functional tests also will fail.
We don't know exactly why this happens. Some adapters do work correctly, so there seems to also be some subtle problem that was being masked by the liberal expansion of the TRB ring in older kernels. We also saw on one system that the TRB expansion did work correctly with one particular adapter. However, on all systems at least two adapters did exhibit the problem and fail.
Would it be possible to resolve this regression for the 6.8 release and backport the fix to versions 6.5, 6.6, and 6.7?
#regzbot ^introduced: f5af638f0609af889f15c700c60b93c06cc76675
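For illustration, here is a minimal sketch of the kind of reproducer described
above (not the reporters' actual test program), assuming libusb-1.0. The
VID/PID pair and the bulk IN endpoint address are placeholders that would need
to match a real adapter:

#include <stdio.h>
#include <stdlib.h>
#include <libusb-1.0/libusb.h>

#define NUM_XFERS 600           /* well past the 512-URB boundary */
#define XFER_SIZE 512
#define EP_IN     0x81          /* placeholder bulk IN endpoint */

static int done;

static void cb(struct libusb_transfer *xfer)
{
        int i, nonzero = 0;

        for (i = 0; i < xfer->actual_length; i++)
                if (xfer->buffer[i])
                        nonzero = 1;

        /* The reported symptom: completed transfers carrying only 0x00 */
        if (xfer->status == LIBUSB_TRANSFER_COMPLETED && !nonzero)
                fprintf(stderr, "transfer %p: all-zero buffer\n", (void *)xfer);

        done++;
}

int main(void)
{
        libusb_device_handle *h;
        struct libusb_transfer *xfers[NUM_XFERS];
        int i;

        if (libusb_init(NULL))
                return 1;

        /* Placeholder IDs; substitute the adapter under test. */
        h = libusb_open_device_with_vid_pid(NULL, 0x0403, 0x6001);
        if (!h || libusb_claim_interface(h, 0))
                return 1;

        /* Queue all transfers up front to force TRB ring expansion. */
        for (i = 0; i < NUM_XFERS; i++) {
                unsigned char *buf = calloc(1, XFER_SIZE);

                xfers[i] = libusb_alloc_transfer(0);
                libusb_fill_bulk_transfer(xfers[i], h, EP_IN, buf, XFER_SIZE,
                                          cb, NULL, 5000);
                if (libusb_submit_transfer(xfers[i]))
                        fprintf(stderr, "submit %d failed\n", i);
        }

        while (done < NUM_XFERS)
                libusb_handle_events(NULL);

        libusb_release_interface(h, 0);
        libusb_close(h);
        libusb_exit(NULL);
        return 0;
}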
Hi all,
This patch series includes backports for the changes that fix CVE-2023-52447.
The changes were applied cleanly from my 5.15 backport, viewable at
https://lore.kernel.org/stable/cover.1710187165.git.rkolchmeyer@google.com/….
The notes in that patch set largely apply here.
The only note I have for 5.10 is that the test cases with sleepable BPF
programs in commit 1624918be84a ("selftests/bpf: Add test cases for inner map")
do not seem to be compatible with the 5.10 kernel. But the situation with the
other test cases matches the observations I shared in the 5.15 patch set.
Thanks,
-Robert
Hou Tao (1):
bpf: Defer the free of inner map when necessary
Paul E. McKenney (1):
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
include/linux/bpf.h | 7 ++++++-
include/linux/rcupdate.h | 12 ++++++++++++
kernel/bpf/map_in_map.c | 11 ++++++++---
kernel/bpf/syscall.c | 26 ++++++++++++++++++++++++--
kernel/rcu/tasks.h | 2 ++
5 files changed, 52 insertions(+), 6 deletions(-)
--
2.44.0.278.ge034bb2e1d-goog
From: Ryan Roberts <ryan.roberts(a)arm.com>
There was previously a theoretical window where swapoff() could run and
tear down a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from a
test case. But there has been agreement based on code review that this is
possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was not free. This isn't present in get_swap_device()
because it doesn't make sense in general due to the race between getting
the reference and swapoff. So I've added an equivalent check directly in
free_swap_and_cache().
Details of how to provoke one possible issue:
--8<-----
CPU0                             CPU1
----                             ----
shmem_undo_range
  shmem_free_swap
    xa_cmpxchg_irq
    free_swap_and_cache
      __swap_entry_free
      /* swap_count() become 0 */
                                 swapoff
                                   try_to_unuse
                                     shmem_unuse /* cannot find swap entry */
                                     find_next_to_unuse
                                       filemap_get_folio
                                       folio_free_swap
                                       /* remove swap cache */
                                       /* free si->swap_map[] */
      swap_page_trans_huge_swapped <-- access freed si->swap_map !!!
--8<-----
Link: https://lkml.kernel.org/r/20240306140356.3974886-1-ryan.roberts@arm.com
Closes: https://lore.kernel.org/linux-mm/8734t27awd.fsf@yhuang6-desk2.ccr.corp.inte…
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
Signed-off-by: "Huang, Ying" <ying.huang(a)intel.com> [patch description]
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Hi, Andrew,
If it's not too late. Please replace v2 of this patch in mm-stable
with this version.
Changes since v2:
- Remove comments for get_swap_device() because it's not correct.
- Revised patch description about the race condition description.
Changes since v1:
- Added comments for get_swap_device() as suggested by David
- Moved check that swap entry is not free from get_swap_device() to
free_swap_and_cache() since there are some paths that legitimately call with
a free offset.
Best Regards,
Huang, Ying
mm/swapfile.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..9e0691276f5e 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1609,13 +1609,19 @@ int free_swap_and_cache(swp_entry_t entry)
if (non_swap_entry(entry))
return 1;
- p = _swap_info_get(entry);
+ p = get_swap_device(entry);
if (p) {
+ if (WARN_ON(data_race(!p->swap_map[swp_offset(entry)]))) {
+ put_swap_device(p);
+ return 0;
+ }
+
count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ put_swap_device(p);
}
return p != NULL;
}
--
2.39.2
From: Vasant Karasulli <vkarasulli(a)suse.de>
Hi,
here are changes to enable kexec/kdump in SEV-ES guests. The biggest
problem for supporting kexec/kdump under SEV-ES is to find a way to
hand the non-boot CPUs (APs) from one kernel to another.
Without SEV-ES the first kernel parks the CPUs in a HLT loop until
they get reset by the kexec'ed kernel via an INIT-SIPI-SIPI sequence.
For virtual machines the CPU reset is emulated by the hypervisor,
which sets the vCPU registers back to reset state.
This does not work under SEV-ES, because the hypervisor has no access
to the vCPU registers and can't make modifications to them. So an
SEV-ES guest needs to reset the vCPU itself and park it using the
AP-reset-hold protocol. Upon wakeup the guest needs to jump to
real-mode and to the reset-vector configured in the AP-Jump-Table.
The code to do this is the main part of this patch-set. It works by
placing code on the AP Jump-Table page itself to park the vCPU and to
jump to the reset vector upon wakeup. The code on the AP Jump Table
runs in 16-bit protected mode with segment base set to the beginning
of the page. The AP Jump-Table is usually not within the first 1MB of
memory, so the code can't run in real-mode.
The AP Jump-Table is the best place to put the parking code, because
the memory is owned, but read-only by the firmware and writeable by
the OS. Only the first 4 bytes are used for the reset-vector, leaving
the rest of the page for code/data/stack to park a vCPU. The code
can't be in kernel memory because by the time the vCPU wakes up the
memory will be owned by the new kernel, which might have overwritten it
already.
The other patches add initial GHCB Version 2 protocol support, because
kexec/kdump need the MSR-based (without a GHCB) AP-reset-hold VMGEXIT,
which is a GHCB protocol version 2 feature.
The kexec'ed kernel is also entered via the decompressor and needs
MMIO support there, so this patch-set also adds MMIO #VC support to
the decompressor and support for handling CLFLUSH instructions.
Finally there is also code to disable kexec/kdump support at runtime
when the environment does not support it (e.g. no GHCB protocol
version 2 support or AP Jump Table over 4GB).
The diffstat looks big, but most of it is moving code for MMIO #VC
support around to make it available to the decompressor.
The previous version of this patch-set can be found here:
https://lore.kernel.org/lkml/20220127101044.13803-1-joro@8bytes.org/
Please review.
Thanks,
Vasant
Changes v3->v4:
- Rebased to v6.8 kernel
- Applied review comments by Sean Christopherson
- Combined sev_es_setup_ap_jump_table() and sev_setup_ap_jump_table()
into a single function which makes caching jump table address
unnecessary
- annotated struct sev_ap_jump_table_header with __packed attribute
- added code to set up real mode data segment at boot time instead of
hardcoding the value.
Changes v2->v3:
- Rebased to v5.17-rc1
- Applied most review comments by Boris
- Use the name 'AP jump table' consistently
- Make kexec-disabling for unsupported guests x86-specific
- Cleanup and consolidate patches to detect GHCB v2 protocol
support
Joerg Roedel (9):
x86/kexec/64: Disable kexec when SEV-ES is active
x86/sev: Save and print negotiated GHCB protocol version
x86/sev: Set GHCB data structure version
x86/sev: Setup code to park APs in the AP Jump Table
x86/sev: Park APs on AP Jump Table with GHCB protocol version 2
x86/sev: Use AP Jump Table blob to stop CPU
x86/sev: Add MMIO handling support to boot/compressed/ code
x86/sev: Handle CLFLUSH MMIO events
x86/kexec/64: Support kexec under SEV-ES with AP Jump Table Blob
arch/x86/boot/compressed/sev.c | 45 +-
arch/x86/include/asm/insn-eval.h | 1 +
arch/x86/include/asm/realmode.h | 5 +
arch/x86/include/asm/sev-ap-jumptable.h | 30 +
arch/x86/include/asm/sev.h | 7 +
arch/x86/kernel/machine_kexec_64.c | 12 +
arch/x86/kernel/process.c | 8 +
arch/x86/kernel/sev-shared.c | 234 +++++-
arch/x86/kernel/sev.c | 372 +++++-----
arch/x86/lib/insn-eval-shared.c | 912 ++++++++++++++++++++++++
arch/x86/lib/insn-eval.c | 911 +----------------------
arch/x86/realmode/Makefile | 9 +-
arch/x86/realmode/rm/Makefile | 11 +-
arch/x86/realmode/rm/header.S | 3 +
arch/x86/realmode/rm/sev.S | 85 +++
arch/x86/realmode/rmpiggy.S | 6 +
arch/x86/realmode/sev/Makefile | 33 +
arch/x86/realmode/sev/ap_jump_table.S | 131 ++++
arch/x86/realmode/sev/ap_jump_table.lds | 24 +
19 files changed, 1695 insertions(+), 1144 deletions(-)
create mode 100644 arch/x86/include/asm/sev-ap-jumptable.h
create mode 100644 arch/x86/lib/insn-eval-shared.c
create mode 100644 arch/x86/realmode/rm/sev.S
create mode 100644 arch/x86/realmode/sev/Makefile
create mode 100644 arch/x86/realmode/sev/ap_jump_table.S
create mode 100644 arch/x86/realmode/sev/ap_jump_table.lds
base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
--
2.34.1
This is the backport of the recently upstreamed series that moves VERW
execution to a later point in the exit-to-user path. This is needed because
in some cases data accessed after VERW execution may end up in MDS-affected
CPU buffers. Moving VERW closer to the ring transition reduces the attack
surface.
- The series includes a dependency commit f87bc8dc7a7c ("x86/asm: Add
_ASM_RIP() macro for x86-64 (%rip) suffix").
- Patch 2 includes a change that adds runtime patching for jmp (instead
of verw in original series) due to lack of rip-relative relocation
support in kernels <v6.5.
- Fixed warning:
arch/x86/entry/entry.o: warning: objtool: mds_verw_sel+0x0: unreachable instruction.
- Resolved merge conflicts in:
swapgs_restore_regs_and_return_to_usermode in entry_64.S.
__vmx_vcpu_run in vmenter.S.
vmx_update_fb_clear_dis in vmx.c.
- Boot tested with KASLR and KPTI enabled.
- Verified VERW being executed with mitigation ON, and not being
executed with mitigation turned OFF.
To: stable(a)vger.kernel.org
Signed-off-by: Pawan Gupta <pawan.kumar.gupta(a)linux.intel.com>
---
H. Peter Anvin (Intel) (1):
x86/asm: Add _ASM_RIP() macro for x86-64 (%rip) suffix
Pawan Gupta (5):
x86/bugs: Add asm helpers for executing VERW
x86/entry_64: Add VERW just before userspace transition
x86/entry_32: Add VERW just before userspace transition
x86/bugs: Use ALTERNATIVE() instead of mds_user_clear static key
KVM/VMX: Move VERW closer to VMentry for MDS mitigation
Sean Christopherson (1):
KVM/VMX: Use BT+JNC, i.e. EFLAGS.CF to select VMRESUME vs. VMLAUNCH
Documentation/x86/mds.rst | 38 +++++++++++++++++++++++++-----------
arch/x86/entry/entry.S | 23 ++++++++++++++++++++++
arch/x86/entry/entry_32.S | 3 +++
arch/x86/entry/entry_64.S | 11 +++++++++++
arch/x86/entry/entry_64_compat.S | 1 +
arch/x86/include/asm/asm.h | 5 +++++
arch/x86/include/asm/cpufeatures.h | 2 +-
arch/x86/include/asm/entry-common.h | 1 -
arch/x86/include/asm/nospec-branch.h | 27 +++++++++++++------------
arch/x86/kernel/cpu/bugs.c | 15 ++++++--------
arch/x86/kernel/nmi.c | 3 ---
arch/x86/kvm/vmx/run_flags.h | 7 +++++--
arch/x86/kvm/vmx/vmenter.S | 9 ++++++---
arch/x86/kvm/vmx/vmx.c | 12 ++++++++----
14 files changed, 111 insertions(+), 46 deletions(-)
---
base-commit: 80efc6265290d34b75921bf7294e0d9c5a8749dc
change-id: 20240304-delay-verw-backport-5-15-y-e16f07fbb71e
Best regards,
--
Thanks,
Pawan
From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit e06cc89475eddc1f3a7a4d471524256152c68166 ]
At space_info.c we have several places where we access the ->reserved
field of a block reserve without taking the block reserve's spinlock
first, which makes KCSAN warn about a data race since that field is
always updated while holding the spinlock.
The reports from KCSAN are like the following:
[117.193526] BUG: KCSAN: data-race in btrfs_block_rsv_release [btrfs] / need_preemptive_reclaim [btrfs]
[117.195148] read to 0x000000017f587190 of 8 bytes by task 6303 on cpu 3:
[117.195172] need_preemptive_reclaim+0x222/0x2f0 [btrfs]
[117.195992] __reserve_bytes+0xbb0/0xdc8 [btrfs]
[117.196807] btrfs_reserve_metadata_bytes+0x4c/0x120 [btrfs]
[117.197620] btrfs_block_rsv_add+0x78/0xa8 [btrfs]
[117.198434] btrfs_delayed_update_inode+0x154/0x368 [btrfs]
[117.199300] btrfs_update_inode+0x108/0x1c8 [btrfs]
[117.200122] btrfs_dirty_inode+0xb4/0x140 [btrfs]
[117.200937] btrfs_update_time+0x8c/0xb0 [btrfs]
[117.201754] touch_atime+0x16c/0x1e0
[117.201789] filemap_read+0x674/0x728
[117.201823] btrfs_file_read_iter+0xf8/0x410 [btrfs]
[117.202653] vfs_read+0x2b6/0x498
[117.203454] ksys_read+0xa2/0x150
[117.203473] __s390x_sys_read+0x68/0x88
[117.203495] do_syscall+0x1c6/0x210
[117.203517] __do_syscall+0xc8/0xf0
[117.203539] system_call+0x70/0x98
[117.203579] write to 0x000000017f587190 of 8 bytes by task 11 on cpu 0:
[117.203604] btrfs_block_rsv_release+0x2e8/0x578 [btrfs]
[117.204432] btrfs_delayed_inode_release_metadata+0x7c/0x1d0 [btrfs]
[117.205259] __btrfs_update_delayed_inode+0x37c/0x5e0 [btrfs]
[117.206093] btrfs_async_run_delayed_root+0x356/0x498 [btrfs]
[117.206917] btrfs_work_helper+0x160/0x7a0 [btrfs]
[117.207738] process_one_work+0x3b6/0x838
[117.207768] worker_thread+0x75e/0xb10
[117.207797] kthread+0x21a/0x230
[117.207830] __ret_from_fork+0x6c/0xb8
[117.207861] ret_from_fork+0xa/0x30
So add a helper to get the reserved amount of a block reserve while
holding the lock. The value may not be up to date anymore when used by
need_preemptive_reclaim() and btrfs_preempt_reclaim_metadata_space(), but
that's ok since the worst it can do is cause more reclaim work to be done
sooner rather than later. Reading the field while holding the lock, instead
of using the data_race() annotation, is done in order to prevent load
tearing.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/block-rsv.h | 16 ++++++++++++++++
fs/btrfs/space-info.c | 26 +++++++++++++-------------
2 files changed, 29 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index 578c3497a455c..cda79d3e0c263 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -101,4 +101,20 @@ static inline bool btrfs_block_rsv_full(const struct btrfs_block_rsv *rsv)
return data_race(rsv->full);
}
+/*
+ * Get the reserved amount of a block reserve in a context where getting a stale
+ * value is acceptable, instead of accessing it directly and trigger data race
+ * warning from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_reserved(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->reserved;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
#endif /* BTRFS_BLOCK_RSV_H */
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index 2635fb4bffa06..8b75f436a9a3c 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -847,7 +847,7 @@ btrfs_calc_reclaim_metadata_size(struct btrfs_fs_info *fs_info,
static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
struct btrfs_space_info *space_info)
{
- u64 global_rsv_size = fs_info->global_block_rsv.reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(&fs_info->global_block_rsv);
u64 ordered, delalloc;
u64 total = writable_total_bytes(fs_info, space_info);
u64 thresh;
@@ -948,8 +948,8 @@ static bool need_preemptive_reclaim(struct btrfs_fs_info *fs_info,
ordered = percpu_counter_read_positive(&fs_info->ordered_bytes) >> 1;
delalloc = percpu_counter_read_positive(&fs_info->delalloc_bytes);
if (ordered >= delalloc)
- used += fs_info->delayed_refs_rsv.reserved +
- fs_info->delayed_block_rsv.reserved;
+ used += btrfs_block_rsv_reserved(&fs_info->delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(&fs_info->delayed_block_rsv);
else
used += space_info->bytes_may_use - global_rsv_size;
@@ -1164,7 +1164,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
enum btrfs_flush_state flush;
u64 delalloc_size = 0;
u64 to_reclaim, block_rsv_size;
- u64 global_rsv_size = global_rsv->reserved;
+ const u64 global_rsv_size = btrfs_block_rsv_reserved(global_rsv);
loops++;
@@ -1176,9 +1176,9 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
* assume it's tied up in delalloc reservations.
*/
block_rsv_size = global_rsv_size +
- delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved +
- trans_rsv->reserved;
+ btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv) +
+ btrfs_block_rsv_reserved(trans_rsv);
if (block_rsv_size < space_info->bytes_may_use)
delalloc_size = space_info->bytes_may_use - block_rsv_size;
@@ -1198,16 +1198,16 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
to_reclaim = delalloc_size;
flush = FLUSH_DELALLOC;
} else if (space_info->bytes_pinned >
- (delayed_block_rsv->reserved +
- delayed_refs_rsv->reserved)) {
+ (btrfs_block_rsv_reserved(delayed_block_rsv) +
+ btrfs_block_rsv_reserved(delayed_refs_rsv))) {
to_reclaim = space_info->bytes_pinned;
flush = COMMIT_TRANS;
- } else if (delayed_block_rsv->reserved >
- delayed_refs_rsv->reserved) {
- to_reclaim = delayed_block_rsv->reserved;
+ } else if (btrfs_block_rsv_reserved(delayed_block_rsv) >
+ btrfs_block_rsv_reserved(delayed_refs_rsv)) {
+ to_reclaim = btrfs_block_rsv_reserved(delayed_block_rsv);
flush = FLUSH_DELAYED_ITEMS_NR;
} else {
- to_reclaim = delayed_refs_rsv->reserved;
+ to_reclaim = btrfs_block_rsv_reserved(delayed_refs_rsv);
flush = FLUSH_DELAYED_REFS_NR;
}
--
2.43.0
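The helper added above samples a lock-protected field under its lock purely so
the load cannot tear, while accepting that the value may already be stale once
the lock is dropped. This contrasts with the data_race() annotation, which
silences KCSAN but does not prevent load tearing. A minimal sketch of the two
options (struct and function names are illustrative, not from the btrfs code):

#include <linux/compiler.h>
#include <linux/spinlock.h>
#include <linux/types.h>

/* Illustrative only; not the btrfs structures. */
struct sampled_counter {
        spinlock_t lock;
        u64 value;              /* always written under lock */
};

/* Option 1: annotate the lockless read. KCSAN stops warning, but the
 * load itself may still tear (e.g. two 32-bit halves on 32-bit). */
static u64 sample_racy(struct sampled_counter *c)
{
        return data_race(c->value);
}

/* Option 2: read under the lock, as the patch above does. The load
 * cannot tear; the value may of course be stale once the lock drops. */
static u64 sample_locked(struct sampled_counter *c)
{
        u64 v;

        spin_lock(&c->lock);
        v = c->value;
        spin_unlock(&c->lock);

        return v;
}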
From: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 447ac667f4b2b..7588c2c11a879 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -5584,7 +5584,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
--
2.43.0
From: Filipe Manana <fdmanana(a)suse.com>
[ Upstream commit c7bb26b847e5b97814f522686068c5628e2b3646 ]
At btrfs_use_block_rsv() we read the size of a block reserve without
locking its spinlock, which makes KCSAN complain because the size of a
block reserve is always updated while holding its spinlock. The report
from KCSAN is the following:
[653.313148] BUG: KCSAN: data-race in btrfs_update_delayed_refs_rsv [btrfs] / btrfs_use_block_rsv [btrfs]
[653.314755] read to 0x000000017f5871b8 of 8 bytes by task 7519 on cpu 0:
[653.314779] btrfs_use_block_rsv+0xe4/0x2f8 [btrfs]
[653.315606] btrfs_alloc_tree_block+0xdc/0x998 [btrfs]
[653.316421] btrfs_force_cow_block+0x220/0xe38 [btrfs]
[653.317242] btrfs_cow_block+0x1ac/0x568 [btrfs]
[653.318060] btrfs_search_slot+0xda2/0x19b8 [btrfs]
[653.318879] btrfs_del_csums+0x1dc/0x798 [btrfs]
[653.319702] __btrfs_free_extent.isra.0+0xc24/0x2028 [btrfs]
[653.320538] __btrfs_run_delayed_refs+0xd3c/0x2390 [btrfs]
[653.321340] btrfs_run_delayed_refs+0xae/0x290 [btrfs]
[653.322140] flush_space+0x5e4/0x718 [btrfs]
[653.322958] btrfs_preempt_reclaim_metadata_space+0x102/0x2f8 [btrfs]
[653.323781] process_one_work+0x3b6/0x838
[653.323800] worker_thread+0x75e/0xb10
[653.323817] kthread+0x21a/0x230
[653.323836] __ret_from_fork+0x6c/0xb8
[653.323855] ret_from_fork+0xa/0x30
[653.323887] write to 0x000000017f5871b8 of 8 bytes by task 576 on cpu 3:
[653.323906] btrfs_update_delayed_refs_rsv+0x1a4/0x250 [btrfs]
[653.324699] btrfs_add_delayed_data_ref+0x468/0x6d8 [btrfs]
[653.325494] btrfs_free_extent+0x76/0x120 [btrfs]
[653.326280] __btrfs_mod_ref+0x6a8/0x6b8 [btrfs]
[653.327064] btrfs_dec_ref+0x50/0x70 [btrfs]
[653.327849] walk_up_proc+0x236/0xa50 [btrfs]
[653.328633] walk_up_tree+0x21c/0x448 [btrfs]
[653.329418] btrfs_drop_snapshot+0x802/0x1328 [btrfs]
[653.330205] btrfs_clean_one_deleted_snapshot+0x184/0x238 [btrfs]
[653.330995] cleaner_kthread+0x2b0/0x2f0 [btrfs]
[653.331781] kthread+0x21a/0x230
[653.331800] __ret_from_fork+0x6c/0xb8
[653.331818] ret_from_fork+0xa/0x30
So add a helper to get the size of a block reserve while holding the lock.
Reading the field while holding the lock, instead of using the data_race()
annotation, is done in order to prevent load tearing.
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/btrfs/block-rsv.c | 2 +-
fs/btrfs/block-rsv.h | 16 ++++++++++++++++
2 files changed, 17 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/block-rsv.c b/fs/btrfs/block-rsv.c
index 36ef3228bac86..63205d2f4d84c 100644
--- a/fs/btrfs/block-rsv.c
+++ b/fs/btrfs/block-rsv.c
@@ -392,7 +392,7 @@ struct btrfs_block_rsv *btrfs_use_block_rsv(struct btrfs_trans_handle *trans,
block_rsv = get_block_rsv(trans, root);
- if (unlikely(block_rsv->size == 0))
+ if (unlikely(btrfs_block_rsv_size(block_rsv) == 0))
goto try_reserve;
again:
ret = btrfs_block_rsv_use_bytes(block_rsv, blocksize);
diff --git a/fs/btrfs/block-rsv.h b/fs/btrfs/block-rsv.h
index d1428bb73fc5a..69770360917cb 100644
--- a/fs/btrfs/block-rsv.h
+++ b/fs/btrfs/block-rsv.h
@@ -98,4 +98,20 @@ static inline void btrfs_unuse_block_rsv(struct btrfs_fs_info *fs_info,
btrfs_block_rsv_release(fs_info, block_rsv, 0);
}
+/*
+ * Get the size of a block reserve in a context where getting a stale value is
+ * acceptable, instead of accessing it directly and trigger data race warning
+ * from KCSAN.
+ */
+static inline u64 btrfs_block_rsv_size(struct btrfs_block_rsv *rsv)
+{
+ u64 ret;
+
+ spin_lock(&rsv->lock);
+ ret = rsv->size;
+ spin_unlock(&rsv->lock);
+
+ return ret;
+}
+
#endif /* BTRFS_BLOCK_RSV_H */
--
2.43.0
From: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
[ Upstream commit ee0017c3ed8a8abfa4d40e42f908fb38c31e7515 ]
If the driver detects that the controller is not ready before sending the
first IOC facts command, it will wait for a maximum of 10 seconds for it to
become ready. However, even if the controller becomes ready within 10
seconds, the driver will still issue a diagnostic reset.
Modify the driver to avoid sending a diag reset if the controller becomes
ready within the 10-second wait time.
Signed-off-by: Ranjan Kumar <ranjan.kumar(a)broadcom.com>
Link: https://lore.kernel.org/r/20240221071724.14986-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen(a)oracle.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/scsi/mpt3sas/mpt3sas_base.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_base.c b/drivers/scsi/mpt3sas/mpt3sas_base.c
index 814ac25238058..105d781d0cacf 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_base.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_base.c
@@ -6357,7 +6357,9 @@ _base_wait_for_iocstate(struct MPT3SAS_ADAPTER *ioc, int timeout)
return -EFAULT;
}
- issue_diag_reset:
+ return 0;
+
+issue_diag_reset:
rc = _base_diag_reset(ioc);
return rc;
}
--
2.43.0
Currently for devices requiring masking at the irqchip for INTx, i.e.
devices without DisINTx support, the IRQ is enabled in request_irq()
and subsequently disabled as necessary to align with the masked status
flag. This presents a window where the interrupt could fire between
these events, resulting in the IRQ incrementing the disable depth twice.
This would be unrecoverable for a user since the masked flag prevents
nested enables through vfio.
Instead, invert the logic using IRQF_NO_AUTOEN such that exclusive INTx
is never auto-enabled, then unmask as required.
Cc: stable(a)vger.kernel.org
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/pci/vfio_pci_intrs.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 237beac83809..136101179fcb 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -296,8 +296,15 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
ctx->trigger = trigger;
+ /*
+ * Devices without DisINTx support require an exclusive interrupt,
+ * IRQ masking is performed at the IRQ chip. The masked status is
+ * protected by vdev->irqlock. Setup the IRQ without auto-enable and
+ * unmask as necessary below under lock. DisINTx is unmodified by
+ * the IRQ configuration and may therefore use auto-enable.
+ */
if (!vdev->pci_2_3)
- irqflags = 0;
+ irqflags = IRQF_NO_AUTOEN;
ret = request_irq(pdev->irq, vfio_intx_handler,
irqflags, ctx->name, vdev);
@@ -308,13 +315,9 @@ static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
return ret;
}
- /*
- * INTx disable will stick across the new irq setup,
- * disable_irq won't.
- */
spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && ctx->masked)
- disable_irq_nosync(pdev->irq);
+ if (!vdev->pci_2_3 && !ctx->masked)
+ enable_irq(pdev->irq);
spin_unlock_irqrestore(&vdev->irqlock, flags);
return 0;
--
2.44.0
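For reference, a minimal sketch of the register-then-enable pattern the patch
adopts, as a generic driver illustration under made-up names rather than the
vfio code itself:

#include <linux/interrupt.h>
#include <linux/spinlock.h>

/* Illustrative only; names do not come from the vfio code. */
struct demo_dev {
        spinlock_t lock;
        bool masked;            /* user-visible mask state */
        int irq;
};

static irqreturn_t demo_handler(int irq, void *data)
{
        return IRQ_HANDLED;
}

static int demo_setup_irq(struct demo_dev *d)
{
        unsigned long flags;
        int ret;

        /* Register without auto-enabling the line: the handler cannot
         * fire before the mask state below has been applied, so the
         * disable depth can never be incremented twice. */
        ret = request_irq(d->irq, demo_handler, IRQF_NO_AUTOEN,
                          "demo-dev", d);
        if (ret)
                return ret;

        spin_lock_irqsave(&d->lock, flags);
        if (!d->masked)
                enable_irq(d->irq);     /* disable depth drops 1 -> 0 */
        spin_unlock_irqrestore(&d->lock, flags);

        return 0;
}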
From: Vitor Soares <vitor.soares(a)toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages, and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same
SPI interface and there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one
TX packet while still expecting a response in
mcp251xfd_handle_tefif_one().
This patch resolves the issue by decreasing tx_ring->head if
mcp251xfd_start_xmit() fails. With the fix, if we trigger the issue and
err == -EBUSY, the driver returns NETDEV_TX_BUSY and the network stack
retries transmitting the message.
Otherwise, it prints an error and discards the message.
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares(a)toradex.com>
---
V2->V3:
- Add tx_dropped stats.
- netdev_sent_queue() only if can_put_echo_skb() succeed.
V1->V2:
- Return NETDEV_TX_BUSY if mcp251xfd_tx_obj_write() == -EBUSY.
- Rework the commit message to address the change above.
- Change can_put_echo_skb() to be called after mcp251xfd_tx_obj_write() succeed.
Otherwise, we get Kernel NULL pointer dereference error.
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 34 ++++++++++++--------
1 file changed, 21 insertions(+), 13 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..146c44e47c60 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -166,6 +166,7 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
struct net_device *ndev)
{
struct mcp251xfd_priv *priv = netdev_priv(ndev);
+ struct net_device_stats *stats = &ndev->stats;
struct mcp251xfd_tx_ring *tx_ring = priv->tx;
struct mcp251xfd_tx_obj *tx_obj;
unsigned int frame_len;
@@ -181,25 +182,32 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
mcp251xfd_tx_obj_from_skb(priv, tx_obj, skb, tx_ring->head);
- /* Stop queue if we occupy the complete TX FIFO */
tx_head = mcp251xfd_get_tx_head(tx_ring);
- tx_ring->head++;
- if (mcp251xfd_get_tx_free(tx_ring) == 0)
- netif_stop_queue(ndev);
-
frame_len = can_skb_get_frame_len(skb);
- err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
- if (!err)
- netdev_sent_queue(priv->ndev, frame_len);
+
+ tx_ring->head++;
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
+ if (err) {
+ tx_ring->head--;
- return NETDEV_TX_OK;
+ if (err == -EBUSY)
+ return NETDEV_TX_BUSY;
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ stats->tx_dropped++;
+
+ if (net_ratelimit())
+ netdev_err(priv->ndev,
+ "ERROR in %s: %d\n", __func__, err);
+ } else {
+ err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
+ if (!err)
+ netdev_sent_queue(priv->ndev, frame_len);
+
+ /* Stop queue if we occupy the complete TX FIFO */
+ if (mcp251xfd_get_tx_free(tx_ring) == 0)
+ netif_stop_queue(ndev);
+ }
return NETDEV_TX_OK;
}
--
2.34.1
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improve the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
(cherry picked from commit 5e2f3c65af47e527ccac54060cf909e3306652ff)
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Conflicts in simult_flows.sh, because v5.15 doesn't have commit
675d99338e7a ("selftests: mptcp: simult flows: format subtests
results in TAP") which modifies the context for a new but unrelated
feature.
- This is a new version to the one recently proposed by Sasha, this
time without dependences:
https://lore.kernel.org/stable/9f185a3f-9373-401c-9a5c-ec0f106c0cbc@kernel.…
- This is the same patch as the one recently sent for v6.1:
https://lore.kernel.org/stable/20240311111224.1421344-2-matttbe@kernel.org/
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 752cef168804..cab3e3a5481d 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -289,10 +289,11 @@ done
setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
+
exit $ret
--
2.43.0
When running the simult_flow selftest in slow environments -- e.g. QEmu
without KVM support --, the results can be unstable. This selftest
checks if the aggregated bandwidth is (almost) fully used as expected.
To help improve the stability while still keeping the same validation
in place, the BW and the delay are reduced to lower the pressure on the
CPU.
Fixes: 1a418cb8e888 ("mptcp: simult flow self-tests")
Fixes: 219d04992b68 ("mptcp: push pending frames when subflow has free space")
Cc: stable(a)vger.kernel.org
Suggested-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Link: https://lore.kernel.org/r/20240131-upstream-net-20240131-mptcp-ci-issues-v1…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
(cherry picked from commit 5e2f3c65af47e527ccac54060cf909e3306652ff)
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
Notes:
- Conflicts in simult_flows.sh, because v6.1 doesn't have commit
675d99338e7a ("selftests: mptcp: simult flows: format subtests
results in TAP") which modifies the context for a new but unrelated
feature.
- This is a new version to the one recently proposed by Sasha, this
time without dependences:
https://lore.kernel.org/stable/9f185a3f-9373-401c-9a5c-ec0f106c0cbc@kernel.…
---
tools/testing/selftests/net/mptcp/simult_flows.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/net/mptcp/simult_flows.sh b/tools/testing/selftests/net/mptcp/simult_flows.sh
index 4a417f9d51d6..ee24e06521e6 100755
--- a/tools/testing/selftests/net/mptcp/simult_flows.sh
+++ b/tools/testing/selftests/net/mptcp/simult_flows.sh
@@ -301,10 +301,11 @@ done
setup
run_test 10 10 0 0 "balanced bwidth"
-run_test 10 10 1 50 "balanced bwidth with unbalanced delay"
+run_test 10 10 1 25 "balanced bwidth with unbalanced delay"
# we still need some additional infrastructure to pass the following test-cases
-run_test 30 10 0 0 "unbalanced bwidth"
-run_test 30 10 1 50 "unbalanced bwidth with unbalanced delay"
-run_test 30 10 50 1 "unbalanced bwidth with opposed, unbalanced delay"
+run_test 10 3 0 0 "unbalanced bwidth"
+run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
+run_test 10 3 25 1 "unbalanced bwidth with opposed, unbalanced delay"
+
exit $ret
--
2.43.0
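For reference, a sketch of what those positional arguments mean -- the
parameter names below are inferred from the diff and the commit message,
not quoted from the script:

  # run_test <rate1> <rate2> <delay1> <delay2> <label>
  # rates are the two subflows' link bandwidths in mbit, delays are
  # their one-way delays in ms. So, for example:
  run_test 10 3 1 25 "unbalanced bwidth with unbalanced delay"
  # sets up 10mbit and 3mbit links with 1ms and 25ms of delay, then
  # checks the transfer (almost) fully uses the aggregated ~13mbit.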
The eventfd_ctx trigger pointer of the vfio_fsl_mc_irq object is
initially NULL and may become NULL if the user sets the trigger
eventfd to -1. The interrupt handler itself is guaranteed a valid
trigger between request_irq() and free_irq(), but the loopback
testing mechanisms that invoke the handler function need to test
the trigger. The triggering and setting ioctl paths
both make use of igate and are therefore mutually exclusive.
The vfio-fsl-mc driver does not make use of irqfds, nor does it
support any sort of masking operations, therefore unlike vfio-pci
and vfio-platform, the flow can remain essentially unchanged.
Cc: Diana Craciun <diana.craciun(a)oss.nxp.com>
Cc: stable(a)vger.kernel.org
Fixes: cc0ee20bd969 ("vfio/fsl-mc: trigger an interrupt via eventfd")
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
index d62fbfff20b8..82b2afa9b7e3 100644
--- a/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
+++ b/drivers/vfio/fsl-mc/vfio_fsl_mc_intr.c
@@ -141,13 +141,14 @@ static int vfio_fsl_mc_set_irq_trigger(struct vfio_fsl_mc_device *vdev,
irq = &vdev->mc_irqs[index];
if (flags & VFIO_IRQ_SET_DATA_NONE) {
- vfio_fsl_mc_irq_handler(hwirq, irq);
+ if (irq->trigger)
+ eventfd_signal(irq->trigger);
} else if (flags & VFIO_IRQ_SET_DATA_BOOL) {
u8 trigger = *(u8 *)data;
- if (trigger)
- vfio_fsl_mc_irq_handler(hwirq, irq);
+ if (trigger && irq->trigger)
+ eventfd_signal(irq->trigger);
}
return 0;
--
2.44.0
irqfds for mask and unmask that are not specifically disabled by the
user are leaked. Remove any irqfds during cleanup.
Cc: Eric Auger <eric.auger(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: a7fa7c77cf15 ("vfio/platform: implement IRQ masking/unmasking via an eventfd")
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/platform/vfio_platform_irq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/vfio/platform/vfio_platform_irq.c b/drivers/vfio/platform/vfio_platform_irq.c
index 61a1bfb68ac7..e5dcada9e86c 100644
--- a/drivers/vfio/platform/vfio_platform_irq.c
+++ b/drivers/vfio/platform/vfio_platform_irq.c
@@ -321,8 +321,11 @@ void vfio_platform_irq_cleanup(struct vfio_platform_device *vdev)
{
int i;
- for (i = 0; i < vdev->num_irqs; i++)
+ for (i = 0; i < vdev->num_irqs; i++) {
+ vfio_virqfd_disable(&vdev->irqs[i].mask);
+ vfio_virqfd_disable(&vdev->irqs[i].unmask);
vfio_set_trigger(vdev, i, -1, NULL);
+ }
vdev->num_irqs = 0;
kfree(vdev->irqs);
--
2.44.0
A vulnerability exists where the eventfd for INTx signaling can be
deconfigured, which unregisters the IRQ handler but still allows
eventfds to be signaled with a NULL context through the SET_IRQS ioctl
or through unmask irqfd if the device interrupt is pending.
Ideally this could be solved with some additional locking; the igate
mutex serializes the ioctl and config space accesses, and the interrupt
handler is unregistered relative to the trigger, but the irqfd path
runs asynchronous to those. The igate mutex cannot be acquired from the
atomic context of the eventfd wake function. Disabling the irqfd
relative to the eventfd registration is potentially incompatible with
existing userspace.
As a result, the solution implemented here moves configuration of the
INTx interrupt handler to track the lifetime of the INTx context object
and irq_type configuration, rather than registration of a particular
trigger eventfd. Synchronization is added between the ioctl path and
eventfd_signal() wrapper such that the eventfd trigger can be
dynamically updated relative to in-flight interrupts or irqfd callbacks.
Cc: stable(a)vger.kernel.org
Fixes: 89e1f7d4c66d ("vfio: Add PCI device driver")
Reported-by: Reinette Chatre <reinette.chatre(a)intel.com>
Reviewed-by: Kevin Tian <kevin.tian(a)intel.com>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Signed-off-by: Alex Williamson <alex.williamson(a)redhat.com>
---
drivers/vfio/pci/vfio_pci_intrs.c | 145 ++++++++++++++++--------------
1 file changed, 78 insertions(+), 67 deletions(-)
diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 75c85eec21b3..fb5392b749ff 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -90,11 +90,15 @@ static void vfio_send_intx_eventfd(void *opaque, void *unused)
if (likely(is_intx(vdev) && !vdev->virq_disabled)) {
struct vfio_pci_irq_ctx *ctx;
+ struct eventfd_ctx *trigger;
ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx))
return;
- eventfd_signal(ctx->trigger);
+
+ trigger = READ_ONCE(ctx->trigger);
+ if (likely(trigger))
+ eventfd_signal(trigger);
}
}
@@ -253,100 +257,100 @@ static irqreturn_t vfio_intx_handler(int irq, void *dev_id)
return ret;
}
-static int vfio_intx_enable(struct vfio_pci_core_device *vdev)
+static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
+ struct eventfd_ctx *trigger)
{
+ struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
+ unsigned long irqflags;
+ char *name;
+ int ret;
if (!is_irq_none(vdev))
return -EINVAL;
- if (!vdev->pdev->irq)
+ if (!pdev->irq)
return -ENODEV;
+ name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)", pci_name(pdev));
+ if (!name)
+ return -ENOMEM;
+
ctx = vfio_irq_ctx_alloc(vdev, 0);
if (!ctx)
return -ENOMEM;
+ ctx->name = name;
+ ctx->trigger = trigger;
+
/*
- * If the virtual interrupt is masked, restore it. Devices
- * supporting DisINTx can be masked at the hardware level
- * here, non-PCI-2.3 devices will have to wait until the
- * interrupt is enabled.
+ * Fill the initial masked state based on virq_disabled. After
+ * enable, changing the DisINTx bit in vconfig directly changes INTx
+ * masking. igate prevents races during setup, once running masked
+ * is protected via irqlock.
+ *
+ * Devices supporting DisINTx also reflect the current mask state in
+ * the physical DisINTx bit, which is not affected during IRQ setup.
+ *
+ * Devices without DisINTx support require an exclusive interrupt.
+ * IRQ masking is performed at the IRQ chip. Again, igate protects
+ * against races during setup and IRQ handlers and irqfds are not
+ * yet active, therefore masked is stable and can be used to
+ * conditionally auto-enable the IRQ.
+ *
+ * irq_type must be stable while the IRQ handler is registered,
+ * therefore it must be set before request_irq().
*/
ctx->masked = vdev->virq_disabled;
- if (vdev->pci_2_3)
- pci_intx(vdev->pdev, !ctx->masked);
+ if (vdev->pci_2_3) {
+ pci_intx(pdev, !ctx->masked);
+ irqflags = IRQF_SHARED;
+ } else {
+ irqflags = ctx->masked ? IRQF_NO_AUTOEN : 0;
+ }
vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
+ ret = request_irq(pdev->irq, vfio_intx_handler,
+ irqflags, ctx->name, vdev);
+ if (ret) {
+ vdev->irq_type = VFIO_PCI_NUM_IRQS;
+ kfree(name);
+ vfio_irq_ctx_free(vdev, ctx, 0);
+ return ret;
+ }
+
return 0;
}
-static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev, int fd)
+static int vfio_intx_set_signal(struct vfio_pci_core_device *vdev,
+ struct eventfd_ctx *trigger)
{
struct pci_dev *pdev = vdev->pdev;
- unsigned long irqflags = IRQF_SHARED;
struct vfio_pci_irq_ctx *ctx;
- struct eventfd_ctx *trigger;
- unsigned long flags;
- int ret;
+ struct eventfd_ctx *old;
ctx = vfio_irq_ctx_get(vdev, 0);
if (WARN_ON_ONCE(!ctx))
return -EINVAL;
- if (ctx->trigger) {
- free_irq(pdev->irq, vdev);
- kfree(ctx->name);
- eventfd_ctx_put(ctx->trigger);
- ctx->trigger = NULL;
- }
-
- if (fd < 0) /* Disable only */
- return 0;
-
- ctx->name = kasprintf(GFP_KERNEL_ACCOUNT, "vfio-intx(%s)",
- pci_name(pdev));
- if (!ctx->name)
- return -ENOMEM;
-
- trigger = eventfd_ctx_fdget(fd);
- if (IS_ERR(trigger)) {
- kfree(ctx->name);
- return PTR_ERR(trigger);
- }
+ old = ctx->trigger;
- ctx->trigger = trigger;
+ WRITE_ONCE(ctx->trigger, trigger);
- /*
- * Devices without DisINTx support require an exclusive interrupt,
- * IRQ masking is performed at the IRQ chip. The masked status is
- * protected by vdev->irqlock. Setup the IRQ without auto-enable and
- * unmask as necessary below under lock. DisINTx is unmodified by
- * the IRQ configuration and may therefore use auto-enable.
- */
- if (!vdev->pci_2_3)
- irqflags = IRQF_NO_AUTOEN;
-
- ret = request_irq(pdev->irq, vfio_intx_handler,
- irqflags, ctx->name, vdev);
- if (ret) {
- ctx->trigger = NULL;
- kfree(ctx->name);
- eventfd_ctx_put(trigger);
- return ret;
+ /* Releasing an old ctx requires synchronizing in-flight users */
+ if (old) {
+ synchronize_irq(pdev->irq);
+ vfio_virqfd_flush_thread(&ctx->unmask);
+ eventfd_ctx_put(old);
}
- spin_lock_irqsave(&vdev->irqlock, flags);
- if (!vdev->pci_2_3 && !ctx->masked)
- enable_irq(pdev->irq);
- spin_unlock_irqrestore(&vdev->irqlock, flags);
-
return 0;
}
static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
{
+ struct pci_dev *pdev = vdev->pdev;
struct vfio_pci_irq_ctx *ctx;
ctx = vfio_irq_ctx_get(vdev, 0);
@@ -354,10 +358,13 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
if (ctx) {
vfio_virqfd_disable(&ctx->unmask);
vfio_virqfd_disable(&ctx->mask);
+ free_irq(pdev->irq, vdev);
+ if (ctx->trigger)
+ eventfd_ctx_put(ctx->trigger);
+ kfree(ctx->name);
+ vfio_irq_ctx_free(vdev, ctx, 0);
}
- vfio_intx_set_signal(vdev, -1);
vdev->irq_type = VFIO_PCI_NUM_IRQS;
- vfio_irq_ctx_free(vdev, ctx, 0);
}
/*
@@ -641,19 +648,23 @@ static int vfio_pci_set_intx_trigger(struct vfio_pci_core_device *vdev,
return -EINVAL;
if (flags & VFIO_IRQ_SET_DATA_EVENTFD) {
+ struct eventfd_ctx *trigger = NULL;
int32_t fd = *(int32_t *)data;
int ret;
- if (is_intx(vdev))
- return vfio_intx_set_signal(vdev, fd);
+ if (fd >= 0) {
+ trigger = eventfd_ctx_fdget(fd);
+ if (IS_ERR(trigger))
+ return PTR_ERR(trigger);
+ }
- ret = vfio_intx_enable(vdev);
- if (ret)
- return ret;
+ if (is_intx(vdev))
+ ret = vfio_intx_set_signal(vdev, trigger);
+ else
+ ret = vfio_intx_enable(vdev, trigger);
- ret = vfio_intx_set_signal(vdev, fd);
- if (ret)
- vfio_intx_disable(vdev);
+ if (ret && trigger)
+ eventfd_ctx_put(trigger);
return ret;
}
--
2.44.0
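At its core the fix is a lockless publication pattern around
ctx->trigger. A minimal sketch of that pattern, with illustrative names
rather than the driver's real structures:

  struct my_ctx {
  	struct eventfd_ctx *trigger;
  	unsigned int irq;
  };

  /*
   * The reader side (interrupt handler / irqfd callback) only does:
   *
   *	trigger = READ_ONCE(ctx->trigger);
   *	if (likely(trigger))
   *		eventfd_signal(trigger);
   *
   * so the updater may swap the pointer at any time, as long as it
   * waits out in-flight readers before dropping the old reference.
   */
  static void my_update_trigger(struct my_ctx *ctx, struct eventfd_ctx *new)
  {
  	struct eventfd_ctx *old = ctx->trigger;

  	WRITE_ONCE(ctx->trigger, new);		/* publish the new trigger */
  	if (old) {
  		synchronize_irq(ctx->irq);	/* no reader still sees old */
  		eventfd_ctx_put(old);		/* now safe to release */
  	}
  }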
This patch changes drm_mode_legacy_fb_format() to only return formats
which are supported by the current drm-device. The motivation for this
change is to fix a regression introduced by commit
c91acda3a380 ("drm/gem: Check for valid formats") which stops the Xorg
modesetting driver from working on the Beagleboard Black (it uses the
tilcdc kernel driver).
When the Xorg modesetting driver starts up, it tries to determine the
default bpp for the device. It does this by allocating a dumb 32bpp
frame buffer (using DRM_IOCTL_MODE_CREATE_DUMB) and then calling
drmModeAddFB() with that frame buffer asking for a 24-bit depth and 32
bpp. As the modesetting driver uses drmModeAddFB() which doesn't
supply a format, the kernel's drm_mode_legacy_fb_format() is called to
provide a format matching the requested depth and bpp. If the
drmModeAddFB() call fails, it forces both depth and bpp to 24. If
drmModeAddFB() succeeds, depth is assumed to be 24 and bpp 32. The
dummy frame buffer is then removed (using drmModeRmFB()).
If the modesetting driver finds that both the default bpp and depth
are 24, it forces the use of a 32bpp shadow buffer and a 24bpp front
buffer. Following this, the driver reads the user-specified color
depth option and tries to create a framebuffer of that depth, but if
the use of a shadow buffer has been forced, the bpp and depth of it
overrides the user-supplied option.
The Xorg modesetting driver on top of the tilcdc kernel driver used to
work on the Beagleboard Black if a 16 bit color depth was
configured. The hardware in the Beagleboard Black supports the RG16,
BG24, and XB24 formats. When drm_mode_legacy_fb_format() was called to
request a format for a 24-bit depth and 32 bpp, it would return the
unsupported RG24 format which drmModeAddFB() would happily accept (as
there was no check for a valid format). As a shadow buffer wasn't
forced, the modesetting driver would try the user specified 16 bit
color depth and drm_mode_legacy_fb_format() would return RG16 which is
supported by the hardware. Color depths of 24 bits were not supported,
as the unsupported RG24 would be detected when drmModeSetCrtc() was
called.
Following commit c91acda3a380 ("drm/gem: Check for valid formats"),
which adds a check for a valid (supported by the hardware) format to
the code path for the kernel part of drmModeAddFB(), the modesetting
driver fails to configure and add a frame buffer. This is because the
call to create a 24-bit depth and 32 bpp framebuffer during detection
of the default bpp will now fail and a 24-bit depth and 24 bpp front
buffer will be forced. As drm_mode_legacy_fb_format() will return RG24
which isn't supported, the creation of that framebuffer will also
fail.
To fix the regression, this patch extends drm_mode_legacy_fb_format()
to list all formats with a particular bpp and color depth known to the
kernel, and to probe the current drm-device for a supported
format. This fixes the regression and, as a bonus, a color depth of 24
bits on the Beagleboard Black is now working.
As this patch changes drm_mode_legacy_fb_format() which is used by
other drivers, it has, in addition to the Beagleboard Black, also been
tested with the nouveau and modesetting drivers on a NVIDIA NV96, and
with the intel and modesetting drivers on an intel HD Graphics 4000
chipset.
Signed-off-by: Frej Drejhammar <frej.drejhammar(a)gmail.com>
Fixes: c91acda3a380 ("drm/gem: Check for valid formats")
Cc: stable(a)vger.kernel.org
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: David Airlie <airlied(a)gmail.com>
Cc: Daniel Vetter <daniel(a)ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com>
Cc: Maxime Ripard <mripard(a)kernel.org>
Cc: Thomas Zimmermann <tzimmermann(a)suse.de>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Cc: Patrik Jakobsson <patrik.r.jakobsson(a)gmail.com>
Cc: Rob Clark <robdclark(a)gmail.com>
Cc: Abhinav Kumar <quic_abhinavk(a)quicinc.com>
Cc: Dmitry Baryshkov <dmitry.baryshkov(a)linaro.org>
Cc: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
Cc: Javier Martinez Canillas <javierm(a)redhat.com>
Cc: "Maíra Canal" <mcanal(a)igalia.com>
Cc: "Ville Syrjälä" <ville.syrjala(a)linux.intel.com>
Cc: dri-devel(a)lists.freedesktop.org
---
drivers/gpu/drm/armada/armada_fbdev.c | 2 +-
drivers/gpu/drm/drm_fb_helper.c | 2 +-
drivers/gpu/drm/drm_fbdev_dma.c | 3 +-
drivers/gpu/drm/drm_fbdev_generic.c | 3 +-
drivers/gpu/drm/drm_fourcc.c | 92 ++++++++++++++++---
drivers/gpu/drm/exynos/exynos_drm_fbdev.c | 3 +-
drivers/gpu/drm/gma500/fbdev.c | 2 +-
drivers/gpu/drm/i915/display/intel_fbdev_fb.c | 3 +-
drivers/gpu/drm/msm/msm_fbdev.c | 3 +-
drivers/gpu/drm/omapdrm/omap_fbdev.c | 2 +-
drivers/gpu/drm/radeon/radeon_fbdev.c | 3 +-
drivers/gpu/drm/tegra/fbdev.c | 3 +-
drivers/gpu/drm/tiny/ofdrm.c | 6 +-
drivers/gpu/drm/xe/display/intel_fbdev_fb.c | 3 +-
include/drm/drm_fourcc.h | 3 +-
15 files changed, 105 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/armada/armada_fbdev.c b/drivers/gpu/drm/armada/armada_fbdev.c
index d223176912b6..82f312f76980 100644
--- a/drivers/gpu/drm/armada/armada_fbdev.c
+++ b/drivers/gpu/drm/armada/armada_fbdev.c
@@ -54,7 +54,7 @@ static int armada_fbdev_create(struct drm_fb_helper *fbh,
mode.width = sizes->surface_width;
mode.height = sizes->surface_height;
mode.pitches[0] = armada_pitch(mode.width, sizes->surface_bpp);
- mode.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode.pixel_format = drm_mode_legacy_fb_format(dev, sizes->surface_bpp,
sizes->surface_depth);
size = mode.pitches[0] * mode.height;
diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c
index d612133e2cf7..62f81a14fb2e 100644
--- a/drivers/gpu/drm/drm_fb_helper.c
+++ b/drivers/gpu/drm/drm_fb_helper.c
@@ -1453,7 +1453,7 @@ static uint32_t drm_fb_helper_find_format(struct drm_fb_helper *fb_helper, const
* the framebuffer emulation can only deal with such
* formats, specifically RGB/BGA formats.
*/
- format = drm_mode_legacy_fb_format(bpp, depth);
+ format = drm_mode_legacy_fb_format(dev, bpp, depth);
if (!format)
goto err;
diff --git a/drivers/gpu/drm/drm_fbdev_dma.c b/drivers/gpu/drm/drm_fbdev_dma.c
index 6c9427bb4053..cdb315c6d110 100644
--- a/drivers/gpu/drm/drm_fbdev_dma.c
+++ b/drivers/gpu/drm/drm_fbdev_dma.c
@@ -90,7 +90,8 @@ static int drm_fbdev_dma_helper_fb_probe(struct drm_fb_helper *fb_helper,
sizes->surface_width, sizes->surface_height,
sizes->surface_bpp);
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
buffer = drm_client_framebuffer_create(client, sizes->surface_width,
sizes->surface_height, format);
if (IS_ERR(buffer))
diff --git a/drivers/gpu/drm/drm_fbdev_generic.c b/drivers/gpu/drm/drm_fbdev_generic.c
index d647d89764cb..aba8c272560c 100644
--- a/drivers/gpu/drm/drm_fbdev_generic.c
+++ b/drivers/gpu/drm/drm_fbdev_generic.c
@@ -84,7 +84,8 @@ static int drm_fbdev_generic_helper_fb_probe(struct drm_fb_helper *fb_helper,
sizes->surface_width, sizes->surface_height,
sizes->surface_bpp);
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
buffer = drm_client_framebuffer_create(client, sizes->surface_width,
sizes->surface_height, format);
if (IS_ERR(buffer))
diff --git a/drivers/gpu/drm/drm_fourcc.c b/drivers/gpu/drm/drm_fourcc.c
index 193cf8ed7912..034f2087af9a 100644
--- a/drivers/gpu/drm/drm_fourcc.c
+++ b/drivers/gpu/drm/drm_fourcc.c
@@ -29,47 +29,97 @@
#include <drm/drm_device.h>
#include <drm/drm_fourcc.h>
+#include <drm/drm_plane.h>
+#include <drm/drm_print.h>
+
+/*
+ * Internal helper to find a valid format among a list of potentially
+ * valid formats.
+ *
+ * Traverses the variadic arguments until a format supported by @dev
+ * or an DRM_FORMAT_INVALID argument is found. If a supported format
+ * is found it is returned, otherwise DRM_FORMAT_INVALID is returned.
+ */
+static uint32_t select_valid_format(struct drm_device *dev, ...)
+{
+ va_list va;
+ uint32_t fmt = DRM_FORMAT_INVALID;
+ uint32_t to_try;
+
+ va_start(va, dev);
+
+ for (to_try = va_arg(va, uint32_t);
+ to_try != DRM_FORMAT_INVALID;
+ to_try = va_arg(va, uint32_t)) {
+ if (drm_any_plane_has_format(dev, to_try, 0)) {
+ fmt = to_try;
+ break;
+ }
+ }
+
+ va_end(va);
+
+ return fmt;
+}
/**
* drm_mode_legacy_fb_format - compute drm fourcc code from legacy description
+ * @dev: DRM device
* @bpp: bits per pixels
* @depth: bit depth per pixel
*
* Computes a drm fourcc pixel format code for the given @bpp/@depth values.
* Useful in fbdev emulation code, since that deals in those values.
*/
-uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth)
+uint32_t drm_mode_legacy_fb_format(struct drm_device *dev,
+ uint32_t bpp, uint32_t depth)
{
uint32_t fmt = DRM_FORMAT_INVALID;
switch (bpp) {
case 1:
if (depth == 1)
- fmt = DRM_FORMAT_C1;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C1,
+ DRM_FORMAT_INVALID);
break;
case 2:
if (depth == 2)
- fmt = DRM_FORMAT_C2;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C2,
+ DRM_FORMAT_INVALID);
break;
case 4:
if (depth == 4)
- fmt = DRM_FORMAT_C4;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C4,
+ DRM_FORMAT_INVALID);
break;
case 8:
if (depth == 8)
- fmt = DRM_FORMAT_C8;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_C8,
+ DRM_FORMAT_INVALID);
break;
case 16:
switch (depth) {
case 15:
- fmt = DRM_FORMAT_XRGB1555;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB1555,
+ DRM_FORMAT_XBGR1555,
+ DRM_FORMAT_RGBX5551,
+ DRM_FORMAT_BGRX5551,
+ DRM_FORMAT_INVALID);
break;
case 16:
- fmt = DRM_FORMAT_RGB565;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_RGB565,
+ DRM_FORMAT_BGR565,
+ DRM_FORMAT_INVALID);
break;
default:
break;
@@ -78,19 +128,37 @@ uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth)
case 24:
 		if (depth == 24)
-			fmt = DRM_FORMAT_RGB888;
+			fmt = select_valid_format(dev,
+						  DRM_FORMAT_RGB888,
+						  DRM_FORMAT_BGR888,
+						  DRM_FORMAT_INVALID);
break;
case 32:
switch (depth) {
case 24:
- fmt = DRM_FORMAT_XRGB8888;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB8888,
+ DRM_FORMAT_XBGR8888,
+ DRM_FORMAT_RGBX8888,
+ DRM_FORMAT_BGRX8888,
+ DRM_FORMAT_INVALID);
break;
case 30:
- fmt = DRM_FORMAT_XRGB2101010;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_XRGB2101010,
+ DRM_FORMAT_XBGR2101010,
+ DRM_FORMAT_RGBX1010102,
+ DRM_FORMAT_BGRX1010102,
+ DRM_FORMAT_INVALID);
break;
case 32:
- fmt = DRM_FORMAT_ARGB8888;
+ fmt = select_valid_format(dev,
+ DRM_FORMAT_ARGB8888,
+ DRM_FORMAT_ABGR8888,
+ DRM_FORMAT_RGBA8888,
+ DRM_FORMAT_BGRA8888,
+ DRM_FORMAT_INVALID);
break;
default:
break;
@@ -119,7 +187,7 @@ EXPORT_SYMBOL(drm_mode_legacy_fb_format);
uint32_t drm_driver_legacy_fb_format(struct drm_device *dev,
uint32_t bpp, uint32_t depth)
{
- uint32_t fmt = drm_mode_legacy_fb_format(bpp, depth);
+ uint32_t fmt = drm_mode_legacy_fb_format(dev, bpp, depth);
if (dev->mode_config.quirk_addfb_prefer_host_byte_order) {
if (fmt == DRM_FORMAT_XRGB8888)
diff --git a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
index a379c8ca435a..e114ebd44169 100644
--- a/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
+++ b/drivers/gpu/drm/exynos/exynos_drm_fbdev.c
@@ -104,7 +104,8 @@ static int exynos_drm_fbdev_create(struct drm_fb_helper *helper,
mode_cmd.width = sizes->surface_width;
mode_cmd.height = sizes->surface_height;
mode_cmd.pitches[0] = sizes->surface_width * (sizes->surface_bpp >> 3);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/drivers/gpu/drm/gma500/fbdev.c b/drivers/gpu/drm/gma500/fbdev.c
index 98b44974d42d..811ae5cccf2c 100644
--- a/drivers/gpu/drm/gma500/fbdev.c
+++ b/drivers/gpu/drm/gma500/fbdev.c
@@ -189,7 +189,7 @@ static int psb_fbdev_fb_probe(struct drm_fb_helper *fb_helper,
mode_cmd.width = sizes->surface_width;
mode_cmd.height = sizes->surface_height;
mode_cmd.pitches[0] = ALIGN(mode_cmd.width * DIV_ROUND_UP(bpp, 8), 64);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(bpp, depth);
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev, bpp, depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
size = ALIGN(size, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/display/intel_fbdev_fb.c b/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
index 0665f943f65f..cb32fcff8fb5 100644
--- a/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/i915/display/intel_fbdev_fb.c
@@ -30,7 +30,8 @@ struct drm_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
DIV_ROUND_UP(sizes->surface_bpp, 8), 64);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/drivers/gpu/drm/msm/msm_fbdev.c b/drivers/gpu/drm/msm/msm_fbdev.c
index 030bedac632d..8748610299b4 100644
--- a/drivers/gpu/drm/msm/msm_fbdev.c
+++ b/drivers/gpu/drm/msm/msm_fbdev.c
@@ -77,7 +77,8 @@ static int msm_fbdev_create(struct drm_fb_helper *helper,
uint32_t format;
int ret, pitch;
- format = drm_mode_legacy_fb_format(sizes->surface_bpp, sizes->surface_depth);
+ format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp, sizes->surface_depth);
DBG("create fbdev: %dx%d@%d (%dx%d)", sizes->surface_width,
sizes->surface_height, sizes->surface_bpp,
diff --git a/drivers/gpu/drm/omapdrm/omap_fbdev.c b/drivers/gpu/drm/omapdrm/omap_fbdev.c
index 6b08b137af1a..98f01d80abd8 100644
--- a/drivers/gpu/drm/omapdrm/omap_fbdev.c
+++ b/drivers/gpu/drm/omapdrm/omap_fbdev.c
@@ -139,7 +139,7 @@ static int omap_fbdev_create(struct drm_fb_helper *helper,
sizes->surface_height, sizes->surface_bpp,
sizes->fb_width, sizes->fb_height);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev, sizes->surface_bpp,
sizes->surface_depth);
mode_cmd.width = sizes->surface_width;
diff --git a/drivers/gpu/drm/radeon/radeon_fbdev.c b/drivers/gpu/drm/radeon/radeon_fbdev.c
index 02bf25759059..bf1843529c7c 100644
--- a/drivers/gpu/drm/radeon/radeon_fbdev.c
+++ b/drivers/gpu/drm/radeon/radeon_fbdev.c
@@ -221,7 +221,8 @@ static int radeon_fbdev_fb_helper_fb_probe(struct drm_fb_helper *fb_helper,
if ((sizes->surface_bpp == 24) && ASIC_IS_AVIVO(rdev))
sizes->surface_bpp = 32;
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
ret = radeon_fbdev_create_pinned_object(fb_helper, &mode_cmd, &gobj);
diff --git a/drivers/gpu/drm/tegra/fbdev.c b/drivers/gpu/drm/tegra/fbdev.c
index db6eaac3d30e..290e8c426b0c 100644
--- a/drivers/gpu/drm/tegra/fbdev.c
+++ b/drivers/gpu/drm/tegra/fbdev.c
@@ -87,7 +87,8 @@ static int tegra_fbdev_probe(struct drm_fb_helper *helper,
cmd.pitches[0] = round_up(sizes->surface_width * bytes_per_pixel,
tegra->pitch_align);
- cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = cmd.pitches[0] * cmd.height;
diff --git a/drivers/gpu/drm/tiny/ofdrm.c b/drivers/gpu/drm/tiny/ofdrm.c
index ab89b7fc7bf6..ded868601aea 100644
--- a/drivers/gpu/drm/tiny/ofdrm.c
+++ b/drivers/gpu/drm/tiny/ofdrm.c
@@ -100,14 +100,14 @@ static const struct drm_format_info *display_get_validated_format(struct drm_dev
switch (depth) {
case 8:
- format = drm_mode_legacy_fb_format(8, 8);
+ format = drm_mode_legacy_fb_format(dev, 8, 8);
break;
case 15:
case 16:
- format = drm_mode_legacy_fb_format(16, depth);
+ format = drm_mode_legacy_fb_format(dev, 16, depth);
break;
case 32:
- format = drm_mode_legacy_fb_format(32, 24);
+ format = drm_mode_legacy_fb_format(dev, 32, 24);
break;
default:
drm_err(dev, "unsupported framebuffer depth %u\n", depth);
diff --git a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
index 51ae3561fd0d..a38a8143d632 100644
--- a/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
+++ b/drivers/gpu/drm/xe/display/intel_fbdev_fb.c
@@ -32,7 +32,8 @@ struct drm_framebuffer *intel_fbdev_fb_alloc(struct drm_fb_helper *helper,
mode_cmd.pitches[0] = ALIGN(mode_cmd.width *
DIV_ROUND_UP(sizes->surface_bpp, 8), XE_PAGE_SIZE);
- mode_cmd.pixel_format = drm_mode_legacy_fb_format(sizes->surface_bpp,
+ mode_cmd.pixel_format = drm_mode_legacy_fb_format(dev,
+ sizes->surface_bpp,
sizes->surface_depth);
size = mode_cmd.pitches[0] * mode_cmd.height;
diff --git a/include/drm/drm_fourcc.h b/include/drm/drm_fourcc.h
index ccf91daa4307..75d06393a564 100644
--- a/include/drm/drm_fourcc.h
+++ b/include/drm/drm_fourcc.h
@@ -310,7 +310,8 @@ const struct drm_format_info *drm_format_info(u32 format);
const struct drm_format_info *
drm_get_format_info(struct drm_device *dev,
const struct drm_mode_fb_cmd2 *mode_cmd);
-uint32_t drm_mode_legacy_fb_format(uint32_t bpp, uint32_t depth);
+uint32_t drm_mode_legacy_fb_format(struct drm_device *dev,
+ uint32_t bpp, uint32_t depth);
uint32_t drm_driver_legacy_fb_format(struct drm_device *dev,
uint32_t bpp, uint32_t depth);
unsigned int drm_format_info_block_width(const struct drm_format_info *info,
base-commit: b9511c6d277c31b13d4f3128eba46f4e0733d734
--
2.44.0
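To make the behavioral change concrete for the Beagleboard Black case
described above -- a hedged sketch, not code from the patch:

  /*
   * On tilcdc hardware, which only advertises RG16, BG24 and XB24,
   * asking for 32bpp at 24-bit depth now probes the candidate list:
   */
  u32 fmt = drm_mode_legacy_fb_format(dev, 32, 24);
  /*
   * XRGB8888 is tried first and rejected, XBGR8888 (XB24) is supported
   * and returned. Before the patch, XRGB8888 was returned blindly and
   * the subsequent drmModeAddFB() would fail the format check.
   */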
As a matter of fact, continuous reads require additional handling at the
operation level in order for them to work properly. The core helpers do
have this additional logic now, but any time a controller implements its
own page helper, this extra logic is "lost". This means we need another
level of per-controller driver checks to ensure they can leverage
continuous reads. This is for now unsupported, so in order to ensure
continuous reads are enabled only when fully using the core page
helpers, we need to add more initial checks.
Also, as performance is not relevant during raw accesses, we also
prevent these from enabling the feature.
This should solve the issue seen with controllers such as the STM32 FMC2
when in sequencer mode. In this case, the continuous read feature would
be enabled but not leveraged, and most importantly not disabled, causing
further operations to fail.
Reported-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Fixes: 003fe4b9545b ("mtd: rawnand: Support for sequential cache reads")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 4d5a663e4e05..2479fa98f991 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -3594,7 +3594,8 @@ static int nand_do_read_ops(struct nand_chip *chip, loff_t from,
oob = ops->oobbuf;
oob_required = oob ? 1 : 0;
- rawnand_enable_cont_reads(chip, page, readlen, col);
+ if (likely(ops->mode != MTD_OPS_RAW))
+ rawnand_enable_cont_reads(chip, page, readlen, col);
while (1) {
struct mtd_ecc_stats ecc_stats = mtd->ecc_stats;
@@ -5212,6 +5213,15 @@ static void rawnand_late_check_supported_ops(struct nand_chip *chip)
if (!nand_has_exec_op(chip))
return;
+ /*
+ * For now, continuous reads can only be used with the core page helpers.
+ * This can be extended later.
+ */
+ if (!(chip->ecc.read_page == nand_read_page_hwecc ||
+ chip->ecc.read_page == nand_read_page_syndrome ||
+ chip->ecc.read_page == nand_read_page_swecc))
+ return;
+
rawnand_check_cont_read_support(chip);
}
--
2.40.1
None of the callers of drm_panel_get_modes() expect it to return
negative error codes. Either they propagate the return value in their
struct drm_connector_helper_funcs .get_modes() hook (which is also not
supposed to return negative codes), or they add it to other counts,
leading to bogus values.
On the other hand, many of the struct drm_panel_funcs .get_modes() hooks
do return negative error codes, so handle them gracefully instead of
propagating them further.
Return 0 for no modes, whatever the reason.
Cc: Neil Armstrong <neil.armstrong(a)linaro.org>
Cc: Jessica Zhang <quic_jesszhan(a)quicinc.com>
Cc: Sam Ravnborg <sam(a)ravnborg.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/drm_panel.c | 17 +++++++++++------
1 file changed, 11 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/drm_panel.c b/drivers/gpu/drm/drm_panel.c
index e814020bbcd3..cfbe020de54e 100644
--- a/drivers/gpu/drm/drm_panel.c
+++ b/drivers/gpu/drm/drm_panel.c
@@ -274,19 +274,24 @@ EXPORT_SYMBOL(drm_panel_disable);
* The modes probed from the panel are automatically added to the connector
* that the panel is attached to.
*
- * Return: The number of modes available from the panel on success or a
- * negative error code on failure.
+ * Return: The number of modes available from the panel on success, or 0 on
+ * failure (no modes).
*/
int drm_panel_get_modes(struct drm_panel *panel,
struct drm_connector *connector)
{
if (!panel)
- return -EINVAL;
+ return 0;
- if (panel->funcs && panel->funcs->get_modes)
- return panel->funcs->get_modes(panel, connector);
+ if (panel->funcs && panel->funcs->get_modes) {
+ int num;
- return -EOPNOTSUPP;
+ num = panel->funcs->get_modes(panel, connector);
+ if (num > 0)
+ return num;
+ }
+
+ return 0;
}
EXPORT_SYMBOL(drm_panel_get_modes);
--
2.39.2
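The caller pattern being protected, as a hedged sketch (the connector
hook and the fallback are illustrative, not from a real driver):

  static int sketch_connector_get_modes(struct drm_connector *connector,
  					struct drm_panel *panel)
  {
  	int count = 0;

  	/* guaranteed >= 0 now, so the sum cannot go bogus */
  	count += drm_panel_get_modes(panel, connector);
  	/* add a fallback mode so the connector is never left empty */
  	count += drm_add_modes_noedid(connector, 1920, 1080);
  	return count;
  }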
Hi Sasha,
On 10/03/2024 03:33, Sasha Levin wrote:
> This is a note to let you know that I've just added the patch titled
>
> selftests: mptcp: simult flows: format subtests results in TAP
>
> to the 6.1-stable tree which can be found at:
> http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
>
> The filename of the patch is:
> selftests-mptcp-simult-flows-format-subtests-results.patch
> and it can be found in the queue-6.1 subdirectory.
>
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable(a)vger.kernel.org> know about it.
Thank you for having backported this commit 675d99338e7a ("selftests:
mptcp: simult flows: format subtests results in TAP") -- as well as
commit 4d8e0dde0403 ("selftests: mptcp: simult flows: fix some subtest
names"), a fix for it -- as a "dependence" for commit 5e2f3c65af47
("selftests: mptcp: decrease BW in simult flows"), but I think it is
better not to include 675d99338e7a (and 4d8e0dde0403): they are not
dependencies, they just modify the surrounding lines, and they depend
on other commits for this feature to work.
In other words, commit 675d99338e7a ("selftests: mptcp: simult flows:
format subtests results in TAP") -- and 4d8e0dde0403 ("selftests: mptcp:
simult flows: fix some subtest names") -- is now causing the MPTCP
simult flows selftest to fail. Would it be possible to remove them from
the 6.1 and 5.15 queues, please?
> commit 4eeef0aaffa567f812390612c30f800de02edd73
> Author: Matthieu Baerts <matttbe(a)kernel.org>
> Date: Mon Jul 17 15:21:31 2023 +0200
>
> selftests: mptcp: simult flows: format subtests results in TAP
>
> [ Upstream commit 675d99338e7a6cd925d61d7dbf8c26612f7f08a9 ]
>
> The current selftests infrastructure formats the results in TAP 13. This
> version doesn't support subtests and only the end result of each
> selftest is taken into account. It means that a single issue in a
> subtest of a selftest containing multiple subtests forces the whole
> selftest to be marked as failed. It also means that subtests results are
> not tracked by CIs executing selftests.
>
> MPTCP selftests run hundreds of various subtests. It is then important
> to track each of them and not one result per selftest.
>
> It is particularly interesting to do that when validating stable kernels
> with the last version of the test suite: tests might fail because a
> feature is not supported but the test didn't skip that part. In this
> case, if subtests are not tracked, the whole selftest will be marked as
> failed making the other subtests useless because their results are
> ignored.
>
> This patch formats subtests results in TAP in simult_flows.sh selftest.
>
> Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
> Acked-by: Paolo Abeni <pabeni(a)redhat.com>
> Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
> Signed-off-by: David S. Miller <davem(a)davemloft.net>
> Stable-dep-of: 5e2f3c65af47 ("selftests: mptcp: decrease BW in simult flows")
If needed, I can help to resolve the conflicts to have commit
5e2f3c65af47 ("selftests: mptcp: decrease BW in simult flows")
backported to 6.1 and 5.15.
Cheers,
Matt
--
Sponsored by the NGI0 Core fund.
On Sun, Mar 10, 2024 at 03:43:38PM -0400, Kent Overstreet wrote:
> The following changes since commit 2e7cdd29fc42c410eab52fffe5710bf656619222:
>
> Linux 6.7.9 (2024-03-06 14:54:01 +0000)
>
> are available in the Git repository at:
>
> https://evilpiepirate.org/git/bcachefs.git tags/bcachefs-for-v6.7-stable-20240310
>
> for you to fetch changes up to 560ceb6a4d9e3bea57c29f5f3a7a1d671dfc7983:
>
> bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree (2024-03-10 14:36:57 -0400)
>
> ----------------------------------------------------------------
> bcachefs fixes for 6.7 stable
>
> "bcachefs: fix simulateously upgrading & downgrading" is the important
> one here. This fixes a really nasty bug where in a rare situation we
> wouldn't downgrade; we'd write a superblock where the version number is
> higher than the currently supported version.
>
> This caused total failure to mount multi device filesystems with the
> splitbrain checking in 6.8, since now we wouldn't be updating the member
> sequence numbers used for splitbrain checking, but the version number
> said we would be - and newer versions would attempt to kick every device
> out of the fs.
>
> ----------------------------------------------------------------
> Helge Deller (1):
> bcachefs: Fix build on parisc by avoiding __multi3()
>
> Kent Overstreet (3):
> bcachefs: check for failure to downgrade
> bcachefs: fix simulateously upgrading & downgrading
> bcachefs: Fix BTREE_ITER_FILTER_SNAPSHOTS on inodes btree
>
> Mathias Krause (1):
> bcachefs: install fd later to avoid race with close
>
> fs/bcachefs/btree_iter.c | 4 +++-
> fs/bcachefs/chardev.c | 3 +--
> fs/bcachefs/errcode.h | 1 +
> fs/bcachefs/mean_and_variance.h | 2 +-
> fs/bcachefs/super-io.c | 27 ++++++++++++++++++++++++---
> 5 files changed, 30 insertions(+), 7 deletions(-)
Qcom SoCs making use of ARM SMMU require a BDF to SID translation table in
the driver to properly map the SID for the PCIe devices based on their BDF
identifier. This is currently achieved with the help of
qcom_pcie_config_sid_1_9_0() function for SoCs supporting the 1_9_0 config.
But with newer Qcom SoCs starting from SM8450, BDF to SID translation is
set to bypass mode by default in hardware. Due to this, the translation
table that is set in the qcom_pcie_config_sid_1_9_0() is essentially
unused and the default SID is used for all endpoints in SoCs starting from
SM8450.
This is a security concern and also warrants swapping the DeviceID in DT
while using the GIC ITS to handle MSIs from endpoints. The swapping is
currently done like below in DT when using GIC ITS:
/*
* MSIs for BDF (1:0.0) only works with Device ID 0x5980.
* Hence, the IDs are swapped.
*/
msi-map = <0x0 &gic_its 0x5981 0x1>,
<0x100 &gic_its 0x5980 0x1>;
Here, swapping the DeviceIDs ensures that the endpoint with BDF (1:0.0)
gets the DeviceID 0x5980 which is associated with the default SID as per
the iommu mapping in DT. So MSIs were delivered with IDs swapped so far.
But this also means the Root Port (0:0.0) won't receive any MSIs (for
PME, AER, etc.).
So let's fix these issues by clearing the BDF to SID bypass mode for all
SoCs making use of the 1_9_0 config. This allows the PCIe devices to use
the correct SID, thus avoiding the DeviceID swapping hack in DT and also
achieving the isolation between devices.
Cc: <stable(a)vger.kernel.org> # 5.11
Fixes: 4c9398822106 ("PCI: qcom: Add support for configuring BDF to SID mapping for SM8250")
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
I will send the DT patches to fix the msi-map entries once this patch gets
merged.
---
drivers/pci/controller/dwc/pcie-qcom.c | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 10f2d0bb86be..84e47c6f95fe 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -53,6 +53,7 @@
#define PARF_SLV_ADDR_SPACE_SIZE 0x358
#define PARF_DEVICE_TYPE 0x1000
#define PARF_BDF_TO_SID_TABLE_N 0x2000
+#define PARF_BDF_TO_SID_CFG 0x2c00
/* ELBI registers */
#define ELBI_SYS_CTRL 0x04
@@ -120,6 +121,9 @@
/* PARF_DEVICE_TYPE register fields */
#define DEVICE_TYPE_RC 0x4
+/* PARF_BDF_TO_SID_CFG fields */
+#define BDF_TO_SID_BYPASS BIT(0)
+
/* ELBI_SYS_CTRL register fields */
#define ELBI_SYS_CTRL_LT_ENABLE BIT(0)
@@ -1008,11 +1012,17 @@ static int qcom_pcie_config_sid_1_9_0(struct qcom_pcie *pcie)
u8 qcom_pcie_crc8_table[CRC8_TABLE_SIZE];
int i, nr_map, size = 0;
u32 smmu_sid_base;
+ u32 val;
of_get_property(dev->of_node, "iommu-map", &size);
if (!size)
return 0;
+ /* Enable BDF to SID translation by disabling bypass mode (default) */
+ val = readl(pcie->parf + PARF_BDF_TO_SID_CFG);
+ val &= ~BDF_TO_SID_BYPASS;
+ writel(val, pcie->parf + PARF_BDF_TO_SID_CFG);
+
map = kzalloc(size, GFP_KERNEL);
if (!map)
return -ENOMEM;
---
base-commit: 6613476e225e090cc9aad49be7fa504e290dd33d
change-id: 20240307-pci-bdf-sid-fix-c9cd8c0023d0
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
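For illustration, once bypass is disabled the DT workaround quoted above
can be dropped; a sketch (not the follow-up DT patch itself) of the
straightforward mapping this enables:

  /* Each BDF now maps to its own DeviceID, no swapping required. */
  msi-map = <0x0 &gic_its 0x5980 0x1>,
            <0x100 &gic_its 0x5981 0x1>;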
Clang enables -Wenum-enum-conversion and -Wenum-compare-conditional
under -Wenum-conversion. A recent change in Clang strengthened these
warnings and they appear frequently in common builds, primarily due to
several instances in common headers, but there are quite a few drivers
that have individual instances as well.
include/linux/vmstat.h:508:43: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum numa_stat_item') [-Wenum-enum-conversion]
508 | return vmstat_text[NR_VM_ZONE_STAT_ITEMS +
| ~~~~~~~~~~~~~~~~~~~~~ ^
509 | item];
| ~~~~
drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:955:24: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
955 | flags |= is_new_rate ? IWL_MAC_BEACON_CCK
| ^ ~~~~~~~~~~~~~~~~~~
956 | : IWL_MAC_BEACON_CCK_V1;
| ~~~~~~~~~~~~~~~~~~~~~
drivers/net/wireless/intel/iwlwifi/mvm/mac-ctxt.c:1120:21: warning: conditional expression between different enumeration types ('enum iwl_mac_beacon_flags' and 'enum iwl_mac_beacon_flags_v1') [-Wenum-compare-conditional]
1120 | 0) > 10 ?
| ^
1121 | IWL_MAC_BEACON_FILS :
| ~~~~~~~~~~~~~~~~~~~
1122 | IWL_MAC_BEACON_FILS_V1;
| ~~~~~~~~~~~~~~~~~~~~~~
Doing arithmetic between or returning two different types of enums could
be a bug, so each instance of the warning needs to be evaluated.
Unfortunately, as mentioned above, there are many instances of this
warning in many different configurations, which can break the build when
CONFIG_WERROR is enabled.
To avoid introducing new instances of the warnings while cleaning up the
disruption for the majority of users, disable these warnings for the
default build while leaving them on for W=1 builds.
Cc: stable(a)vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2002
Link: https://github.com/llvm/llvm-project/commit/8c2ae42b3e1c6aa7c18f873edcebff7…
Acked-by: Yonghong Song <yonghong.song(a)linux.dev>
Signed-off-by: Nathan Chancellor <nathan(a)kernel.org>
---
Changes in v2:
- Only disable the warning for the default build, leave it on for W=1 (Arnd)
- Add Yonghong's ack, as the warning is still disabled for the default
build.
- Link to v1: https://lore.kernel.org/r/20240305-disable-extra-clang-enum-warnings-v1-1-6…
---
scripts/Makefile.extrawarn | 2 ++
1 file changed, 2 insertions(+)
diff --git a/scripts/Makefile.extrawarn b/scripts/Makefile.extrawarn
index a9e552a1e910..2f25a1de129d 100644
--- a/scripts/Makefile.extrawarn
+++ b/scripts/Makefile.extrawarn
@@ -132,6 +132,8 @@ KBUILD_CFLAGS += $(call cc-disable-warning, pointer-to-enum-cast)
KBUILD_CFLAGS += -Wno-tautological-constant-out-of-range-compare
KBUILD_CFLAGS += $(call cc-disable-warning, unaligned-access)
KBUILD_CFLAGS += $(call cc-disable-warning, cast-function-type-strict)
+KBUILD_CFLAGS += -Wno-enum-compare-conditional
+KBUILD_CFLAGS += -Wno-enum-enum-conversion
endif
endif
---
base-commit: 90d35da658da8cff0d4ecbb5113f5fac9d00eb72
change-id: 20240304-disable-extra-clang-enum-warnings-bf574c7c99fd
Best regards,
--
Nathan Chancellor <nathan(a)kernel.org>
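A minimal reproducer for both diagnostics, using hypothetical enums (per
the message above, clang -Wenum-conversion enables both):

  enum a { A1 };
  enum b { B1 };

  int add(void)
  {
  	return A1 + B1;		/* -Wenum-enum-conversion */
  }

  int pick(int x)
  {
  	return x ? A1 : B1;	/* -Wenum-compare-conditional */
  }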
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would
wake up the other tasks because they are blocked on a different descriptor
than the one that was closed. But there are other wake-ups that solve that
issue. When a thread is blocked on a read, it can still hang even when
another thread closes its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202432.107909457@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
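A userspace sketch of the hang being fixed; the trace_pipe_raw path is
the usual per-CPU buffer file, the rest (including the omitted error
handling) is illustrative:

  #include <fcntl.h>
  #include <pthread.h>
  #include <unistd.h>

  static void *reader(void *arg)
  {
  	char buf[4096];

  	/* blocks in ring_buffer_wait() until data or a wakeup */
  	read(*(int *)arg, buf, sizeof(buf));
  	return NULL;
  }

  int main(void)
  {
  	int fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw",
  		      O_RDONLY);
  	pthread_t t;

  	pthread_create(&t, NULL, reader, &fd);
  	sleep(1);			/* let the reader block */

  	/*
  	 * Before the patch this close() could not wake the reader:
  	 * .release() only runs after the read() returns. With the
  	 * wakeup moved to .flush(), which close() calls right away,
  	 * the reader is woken and the join below completes.
  	 */
  	close(fd);
  	pthread_join(t, NULL);
  	return 0;
  }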
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a task waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the shortest_full on the cpu_buffer that stores the
smallest amount doesn't get reset when all the waiters are woken up. It
does get reset when the ring buffer is reset (echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often than they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.948914369@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3400f11286e3..aa332ace108b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -755,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
wake_up_all(&rbwork->waiters);
if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+ /* Only cpu_buffer sets the above flags */
+ struct ring_buffer_per_cpu *cpu_buffer =
+ container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+ /* Called from interrupt context */
+ raw_spin_lock(&cpu_buffer->reader_lock);
rbwork->wakeup_full = false;
rbwork->full_waiters_pending = false;
+
+ /* Waking up all waiters, they will reset the shortest full */
+ cpu_buffer->shortest_full = 0;
+ raw_spin_unlock(&cpu_buffer->reader_lock);
+
wake_up_all(&rbwork->full_waiters);
}
}
@@ -934,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct rb_irq_work *work;
+ struct rb_irq_work *rbwork;
if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
+ rbwork = &buffer->irq_work;
full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return EPOLLERR;
cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
+ rbwork = &cpu_buffer->irq_work;
}
if (full) {
- poll_wait(filp, &work->full_waiters, poll_table);
- work->full_waiters_pending = true;
+ unsigned long flags;
+
+ poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
} else {
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
}
/*
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there are threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs: one trivial, and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule(), which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to
ring_buffer_wait() also have their own loops, as the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Link: https://lore.kernel.org/linux-trace-kernel/20240308202431.792933613@goodmis…
Cc: stable(a)vger.kernel.org
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Mark Rutland <mark.rutland(a)arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: linke li <lilinke99(a)qq.com>
Cc: Rabin Vincent <rabin(a)rab.in>
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
--
2.43.0
The DebugSwap feature of SEV-ES provides a way for confidential guests to use
data breakpoints. However, because the status of the DebugSwap feature is
recorded in the VMSA, enabling it by default invalidates the attestation
signatures. In 6.10 we will introduce a new API to create SEV VMs that
will allow enabling DebugSwap based on what the user tells KVM to do.
At the same time, we will change the legacy KVM_SEV_ES_INIT API to never
enable DebugSwap.
For compatibility with kernels that pre-date the introduction of DebugSwap,
as well as with those where KVM_SEV_ES_INIT will never enable it, do not enable
the feature by default. If anybody wants to use it, for now they can enable
the sev_es_debug_swap_enabled module parameter, but this will result in a
warning.
Fixes: d1f85fbe836e ("KVM: SEV: Enable data breakpoints in SEV-ES")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
arch/x86/kvm/svm/sev.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index f760106c31f8..69b37956c1c8 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -57,7 +57,7 @@ static bool sev_es_enabled = true;
module_param_named(sev_es, sev_es_enabled, bool, 0444);
/* enable/disable SEV-ES DebugSwap support */
-static bool sev_es_debug_swap_enabled = true;
+static bool sev_es_debug_swap_enabled = false;
module_param_named(debug_swap, sev_es_debug_swap_enabled, bool, 0444);
#else
#define sev_enabled false
@@ -612,8 +612,11 @@ static int sev_es_sync_vmsa(struct vcpu_svm *svm)
save->xss = svm->vcpu.arch.ia32_xss;
save->dr6 = svm->vcpu.arch.dr6;
- if (sev_es_debug_swap_enabled)
+ if (sev_es_debug_swap_enabled) {
save->sev_features |= SVM_SEV_FEAT_DEBUG_SWAP;
+ pr_warn_once("Enabling DebugSwap with KVM_SEV_ES_INIT. "
+ "This will not work starting with Linux 6.10\n");
+ }
pr_debug("Virtual Machine Save Area (VMSA):\n");
print_hex_dump_debug("", DUMP_PREFIX_NONE, 16, 1, save, sizeof(*save), false);
--
2.39.1
This is a small set of patches that address build breakage with
allyesconfig / allmodconfig.
This solves some, but not all, build breakage.
The parport fix depends on the previous patch, the rest are independent
fixes.
With v2 there is an extra patch that drops ZONE_DMA support.
It does not fix any build failure, but it is a nice cleanup.
Cc: Miquel Raynal <miquel.raynal(a)bootlin.com>
To: Maciej W. Rozycki <macro(a)orcam.me.uk>
To: <sparclinux(a)vger.kernel.org>
Cc: <linux-parport(a)lists.infradead.org>
Cc: David S. Miller <davem(a)davemloft.net>
To: Andreas Larsson <andreas(a)gaisler.com>
To: Randy Dunlap <rdunlap(a)infradead.org>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: <linux-kernel(a)vger.kernel.org>
Changes in v2:
- Added r-b/tested by (thanks to Randy and Maciej)
- Dropped patch for uhci-grlib.c as it is already upstream (Randy)
- Added a few Fixes (Maciej)
- Fixed commit message when dropping GENERIC_ISA_DMA (Maciej)
- Added new patch that drops ZONE_DMA (Maciej)
- Added new patch to fix section mismatch error
In an allmodconfig build I see a lot of:
modpost: "__udelay" [module] has no CRC!
Similar for a handful of other symbols.
Any hint how to get rid of them would be nice.
I have tried to add the prototype to asm-prototypes.h with no luck.
On top of this the link fails, but I assume this is the kernel growing
too big, which is no surprise.
- Link to v1: https://lore.kernel.org/r/20240223-sam-fix-sparc32-all-builds-v1-0-5c60fd5c…
---
Sam Ravnborg (7):
sparc32: Use generic cmpdi2/ucmpdi2 variants
sparc32: Fix build with trapbase
mtd: maps: sun_uflash: Declare uflash_devinit static
sparc32: Do not select ZONE_DMA
sparc32: Do not select GENERIC_ISA_DMA
sparc32: Fix parport build with sparc32
sparc32: Fix section mismatch in leon_pci_grpci
arch/sparc/Kconfig | 7 +-
arch/sparc/include/asm/parport.h | 259 +-----------------------------------
arch/sparc/include/asm/parport_64.h | 256 +++++++++++++++++++++++++++++++++++
arch/sparc/kernel/irq_32.c | 6 +-
arch/sparc/kernel/kernel.h | 8 +-
arch/sparc/kernel/kgdb_32.c | 4 +-
arch/sparc/kernel/leon_pci_grpci1.c | 2 +-
arch/sparc/kernel/leon_pci_grpci2.c | 2 +-
arch/sparc/kernel/leon_smp.c | 6 +-
arch/sparc/kernel/setup_32.c | 4 +-
arch/sparc/lib/Makefile | 4 +-
arch/sparc/lib/cmpdi2.c | 28 ----
arch/sparc/lib/ucmpdi2.c | 20 ---
arch/sparc/mm/srmmu.c | 1 -
drivers/mtd/maps/sun_uflash.c | 2 +-
15 files changed, 284 insertions(+), 325 deletions(-)
---
base-commit: 626db6ee8ee1edac206610db407114aa83b53fd3
change-id: 20240223-sam-fix-sparc32-all-builds-0a0403d6e1b3
Best regards,
--
Sam Ravnborg <sam(a)ravnborg.org>
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
CPU 0 CPU 1
----- -----
wait_woken_prepare()
if (waking)
woken = true;
index = wait_index;
wait_woken_set()
waking = true
wait_index++;
ring_buffer_wake_waiters();
wait_on_pipe()
ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->waking.
Either that condition check needs to be passed to ring_buffer_wait() or
the function needs to be broken up into three parts. This chooses to do
the break up.
Break ring_buffer_wait() into:
ring_buffer_prepare_to_wait();
ring_buffer_wait();
ring_buffer_finish_wait();
Now wait_on_pipe() can have:
ring_buffer_prepare_to_wait();
if (!iter->waking)
ring_buffer_wait();
ring_buffer_finish_wait();
And this will catch the above race, as the waiter will either see waking,
or already have been woken up.
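For reference, the classic sleep/wakeup idiom the break up restores looks
like this (a generic sketch, not code from the patch; "wq" and "condition"
are placeholders):

	DEFINE_WAIT(wait);

	/* Queue the task and set TASK_INTERRUPTIBLE *before* testing the
	 * condition. A wakeup that arrives between the test and schedule()
	 * puts the task back to TASK_RUNNING, so schedule() returns
	 * immediately instead of sleeping forever. */
	prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
	if (!condition)
		schedule();
	finish_wait(&wq, &wait);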
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 4 ++
kernel/trace/ring_buffer.c | 88 ++++++++++++++++++++++++++-----------
kernel/trace/trace.c | 14 +++++-
3 files changed, 78 insertions(+), 28 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..e5b5903cdc21 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,7 +98,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 856d0e5b0da5..fa7090f6b4fc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -868,29 +868,29 @@ rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
}
/**
- * ring_buffer_wait - wait for input to the ring buffer
+ * ring_buffer_prepare_to_wait - Prepare to wait for data on the ring buffer
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @full: wait until the percentage of pages are available,
+ * if @cpu != RING_BUFFER_ALL_CPUS. It may be updated via this function.
+ * @wait: The wait queue entry.
*
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
+ * This must be called before ring_buffer_wait(). It calls the prepare_to_wait()
+ * on @wait for the necessary wait queue defined by @buffer, @cpu, and @full.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait)
{
struct rb_irq_work *rbwork;
- DEFINE_WAIT(wait);
- int ret = 0;
- rbwork = rb_get_work_queue(buffer, cpu, &full);
+ rbwork = rb_get_work_queue(buffer, cpu, full);
if (IS_ERR(rbwork))
return PTR_ERR(rbwork);
- if (full)
- prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ if (*full)
+ prepare_to_wait(&rbwork->full_waiters, wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -912,30 +912,66 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
- if (full)
+ if (*full)
rbwork->full_waiters_pending = true;
else
rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return 0;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
- }
+/**
+ * ring_buffer_finish_wait - clean up of ring_buffer_prepare_to_wait()
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @wait: The wait queue entry.
+ *
+ * This must be called after ring_buffer_prepare_to_wait(). It cleans up
+ * the @wait for the queue defined by @buffer, @cpu, and @full.
+ */
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait)
+{
+ struct rb_irq_work *rbwork;
+
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (WARN_ON_ONCE(IS_ERR(rbwork)))
+ return;
- schedule();
- out:
if (full)
- finish_wait(&rbwork->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, wait);
else
- finish_wait(&rbwork->waiters, &wait);
+ finish_wait(&rbwork->waiters, wait);
+}
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ *
+ * ring_buffer_prepare_to_wait() must be called before this function
+ * and ring_buffer_finish_wait() must be called after.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ if (rb_watermark_hit(buffer, cpu, full))
+ return 0;
- return ret;
+ if (signal_pending(current))
+ return -EINTR;
+
+ schedule();
+
+ if (!rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ return -EINTR;
+
+ return 0;
}
/**
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index a184dbdf8e91..2d6bc6ee8a58 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1984,7 +1984,8 @@ static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
spin_lock(&wait_lock);
if (iter->waking)
woken = true;
- *wait_index = iter->wait_index;
+ if (wait_index)
+ *wait_index = iter->wait_index;
spin_unlock(&wait_lock);
return woken;
@@ -2019,13 +2020,22 @@ static void wait_woken_clear(struct trace_iterator *iter)
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct trace_buffer *buffer;
+ DEFINE_WAIT(wait);
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ buffer = iter->array_buffer->buffer;
+
+ ret = ring_buffer_prepare_to_wait(buffer, iter->cpu_file, &full, &wait);
+ if (ret < 0)
+ return ret;
+ if (!wait_woken_prepare(iter, NULL))
+ ret = ring_buffer_wait(buffer, iter->cpu_file, full);
+ ring_buffer_finish_wait(buffer, iter->cpu_file, full, &wait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ring_buffer_wait() needs to be broken into three functions for proper
synchronization from the context of the callers:
ring_buffer_prepare_to_wait()
ring_buffer_wait()
ring_buffer_finish_wait()
To simplify the process, pull out the logic for getting the right work
queue to wait on, as it will be needed for the above functions.
The work queue to wait on depends on the cpu value.
If cpu == RING_BUFFER_ALL_CPUS, then the main "buffer->irq_work" is used.
Otherwise, the irq_work of the cpu_buffer representing that CPU is used.
Create a rb_get_work_queue() helper function to retrieve the proper queue.
Also rename "work" to "rbwork", as the variable points to a struct rb_irq_work
and this is more consistent with the variable naming elsewhere in the file.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 58 +++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..856d0e5b0da5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,6 +842,31 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
+static struct rb_irq_work *
+rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ *full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-ENODEV);
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ return rbwork;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -854,31 +879,18 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
*/
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
{
- struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
DEFINE_WAIT(wait);
- struct rb_irq_work *work;
int ret = 0;
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (IS_ERR(rbwork))
+ return PTR_ERR(rbwork);
if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -901,9 +913,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
if (rb_watermark_hit(buffer, cpu, full))
goto out;
@@ -916,9 +928,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
schedule();
out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, &wait);
else
- finish_wait(&work->waiters, &wait);
+ finish_wait(&rbwork->waiters, &wait);
if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
ret = -EINTR;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the tracing_pipe_raw file is closed, if there are readers still
blocked on it, they need to be woken up. Currently a wait_index is used.
When the readers need to be woken, the index is updated and they are all
woken up.
But there is a race where a new reader could be coming in just as the file
is being closed, and it could still block if the wake up happens just
before the reader enters the wait.
Add another field called "waking", and protect both waking and wait_index
with a new wait_lock to synchronize them.
When a reader comes in, it will save the current wait_index, but if waking
is set, then it will not block no matter what wait_index is.
After it wakes from the wait, if either the waking is set or the
wait_index is not the same as what it read before, then it will not block.
The waker will set waking and increment the wait_index. The .flush()
function will not clear waking, so that no new reader will block.
There's an ioctl() that kicks all current waiters, but does not care about
new waiters. It will set the waking count back to what it was when it came
in.
There's still a race with the wait_on_pipe() with respect to the
ring_buffer_wait(), but that will be dealt with separately.
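Condensed, the reader side added below follows this shape (a sketch of the
tracing_buffers_read() flow; "nothing_read" stands in for the real
empty-buffer check, and error handling is omitted):

	int wait_index;
	int ret;
	bool woken;

	woken = wait_woken_prepare(iter, &wait_index);
 again:
	/* try to read a page from the ring buffer */
	if (nothing_read) {
		if (woken)
			return 0;	/* a waker fired: do not block again */
		ret = wait_on_pipe(iter, full);
		if (ret)
			return ret;
		woken = wait_woken_check(iter, &wait_index);
		goto again;
	}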
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
Changes since v1: https://lore.kernel.org/linux-trace-kernel/20240308184007.805898590@goodmis…
- My tests triggered a warning about calling a mutex_lock() after a
prepare_to_wait() that changed the task's state. Convert the affected
mutex over to a spinlock.
include/linux/trace_events.h | 3 +-
kernel/trace/trace.c | 104 +++++++++++++++++++++++++++++------
2 files changed, 89 insertions(+), 18 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..adf8e163a7be 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,7 +103,8 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ int wait_index;
+ int waking; /* set by a waker */
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..a184dbdf8e91 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,6 +1955,68 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+/*
+ * In order to wake up readers and have them return back to user space,
+ * the iterator has two counters:
+ *
+ * wait_index - always increases every time a waker wakes up the readers.
+ * waking - Set by the waker when waking and cleared afterward.
+ *
+ * Both are protected together with the wait_lock.
+ * When waking, the lock is taken and both indexes are incremented.
+ * The reader will first prepare the wait by taking the lock;
+ * if waking is set, it will not sleep regardless of what wait_index is.
+ * Then after it sleeps it checks if wait_index has been updated
+ * and if it has, it will not sleep again.
+ *
+ * Note, if wait_woken_clear() is not called, then all new readers
+ * will not sleep (this happens in closing the file).
+ *
+ * wait_lock needs to be a spinlock and not a mutex as it can be called
+ * after prepare_to_wait(), which changes the task's state.
+ */
+static DEFINE_SPINLOCK(wait_lock);
+
+static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ spin_lock(&wait_lock);
+ if (iter->waking)
+ woken = true;
+ *wait_index = iter->wait_index;
+ spin_unlock(&wait_lock);
+
+ return woken;
+}
+
+static bool wait_woken_check(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ spin_lock(&wait_lock);
+ if (iter->waking || *wait_index != iter->wait_index)
+ woken = true;
+ spin_unlock(&wait_lock);
+
+ return woken;
+}
+
+static void wait_woken_set(struct trace_iterator *iter)
+{
+ spin_lock(&wait_lock);
+ iter->waking++;
+ iter->wait_index++;
+ spin_unlock(&wait_lock);
+}
+
+static void wait_woken_clear(struct trace_iterator *iter)
+{
+ spin_lock(&wait_lock);
+ iter->waking--;
+ spin_unlock(&wait_lock);
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
int ret;
@@ -8312,9 +8374,11 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
struct ftrace_buffer_info *info = filp->private_data;
struct trace_iterator *iter = &info->iter;
void *trace_data;
+ int wait_index;
int page_size;
ssize_t ret = 0;
ssize_t size;
+ bool woken;
if (!count)
return 0;
@@ -8353,6 +8417,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (info->read < page_size)
goto read;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
ret = ring_buffer_read_page(iter->array_buffer->buffer,
@@ -8362,7 +8427,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !woken) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
@@ -8370,6 +8435,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (ret)
return ret;
+ woken = wait_woken_check(iter, &wait_index);
+
goto again;
}
return 0;
@@ -8398,12 +8465,14 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+ /*
+ * Do not call wait_woken_clear(), as the file is being closed.
+ * This will prevent any new readers from sleeping.
+ */
return 0;
}
@@ -8500,9 +8569,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ int wait_index;
int page_size;
int entries, i;
ssize_t ret = 0;
+ bool woken = false;
#ifdef CONFIG_TRACER_MAX_TRACE
if (iter->snapshot && iter->tr->current_trace->use_max_tr)
@@ -8522,6 +8593,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (splice_grow_spd(pipe, &spd))
return -ENOMEM;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
@@ -8573,17 +8645,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8664,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ woken = wait_woken_check(iter, &wait_index);
goto again;
}
@@ -8616,15 +8685,16 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
if (cmd)
return -ENOIOCTLCMD;
- mutex_lock(&trace_types_lock);
-
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
- mutex_unlock(&trace_types_lock);
+ /*
+ * This just kicks existing readers; a new reader coming in may
+ * still sleep.
+ */
+ wait_woken_clear(iter);
+
return 0;
}
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but there the readers are
actually other processes with their own file descriptors. So calling close()
would wake up the other tasks because they are blocked on a different
descriptor than the one that was closed. But there are other wake ups that
solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
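The distinction that makes .flush() the right hook: the VFS calls .flush()
on every close() of a file descriptor, while .release() runs only when the
last reference to the struct file is dropped. A minimal sketch of the two
callbacks (illustrative names, not from this patch):

	static int example_flush(struct file *file, fl_owner_t id)
	{
		/* runs on each close(fd) -- wake any blocked readers here */
		return 0;
	}

	static int example_release(struct inode *inode, struct file *file)
	{
		/* runs once, when the final reference to the file goes away */
		return 0;
	}

	static const struct file_operations example_fops = {
		.flush   = example_flush,
		.release = example_release,
	};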
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The "shortest_full" variable is used to keep track of the waiter that is
waiting for the smallest amount on the ring buffer before being woken up.
When a tasks waits on the ring buffer, it passes in a "full" value that is
a percentage. 0 means wake up on any data. 1-100 means wake up from 1% to
100% full buffer.
As all waiters are on the same wait queue, the wake up happens for the
waiter with the smallest percentage.
The problem is that the shortest_full field on the cpu_buffer, which stores
the smallest requested amount, doesn't get reset when all the waiters are
woken up. It only gets reset when the ring buffer is reset
(echo > /sys/kernel/tracing/trace).
This means that tasks may be woken up more often than they want to
be. Instead, have the shortest_full field get reset just before waking up
all the tasks. If the tasks wait again, they will update the shortest_full
before sleeping.
Also add locking around setting of shortest_full in the poll logic, and
change "work" to "rbwork" to match the variable name for rb_irq_work
structures that are used in other places.
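To make the bookkeeping concrete, here is a small userspace model of the
shortest_full logic (illustrative only; the names mirror the kernel's
fields but this is not kernel code):

	#include <stdio.h>

	static int shortest_full;	/* smallest percentage any waiter wants */

	static void waiter_sleep(int full)
	{
		/* reader side: arm the watermark with the smallest request */
		if (!shortest_full || shortest_full > full)
			shortest_full = full;
	}

	static void wake_all_waiters(void)
	{
		/* the fix: reset before waking so the next sleeper re-arms */
		shortest_full = 0;
	}

	int main(void)
	{
		waiter_sleep(25);
		waiter_sleep(75);
		printf("armed at %d%%\n", shortest_full);	/* 25 */
		wake_all_waiters();
		waiter_sleep(75);
		printf("re-armed at %d%%\n", shortest_full);	/* 75, not 25 */
		return 0;
	}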
Link: https://lore.kernel.org/linux-trace-kernel/20240308184007.485732758@goodmis…
Cc: stable(a)vger.kernel.org
Fixes: 2c2b0a78b3739 ("ring-buffer: Add percentage of ring buffer full to wake up reader")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 30 +++++++++++++++++++++++-------
1 file changed, 23 insertions(+), 7 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 3400f11286e3..aa332ace108b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -755,8 +755,19 @@ static void rb_wake_up_waiters(struct irq_work *work)
wake_up_all(&rbwork->waiters);
if (rbwork->full_waiters_pending || rbwork->wakeup_full) {
+ /* Only cpu_buffer sets the above flags */
+ struct ring_buffer_per_cpu *cpu_buffer =
+ container_of(rbwork, struct ring_buffer_per_cpu, irq_work);
+
+ /* Called from interrupt context */
+ raw_spin_lock(&cpu_buffer->reader_lock);
rbwork->wakeup_full = false;
rbwork->full_waiters_pending = false;
+
+ /* Waking up all waiters, they will reset the shortest full */
+ cpu_buffer->shortest_full = 0;
+ raw_spin_unlock(&cpu_buffer->reader_lock);
+
wake_up_all(&rbwork->full_waiters);
}
}
@@ -934,28 +945,33 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
- struct rb_irq_work *work;
+ struct rb_irq_work *rbwork;
if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
+ rbwork = &buffer->irq_work;
full = 0;
} else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return EPOLLERR;
cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
+ rbwork = &cpu_buffer->irq_work;
}
if (full) {
- poll_wait(filp, &work->full_waiters, poll_table);
- work->full_waiters_pending = true;
+ unsigned long flags;
+
+ poll_wait(filp, &rbwork->full_waiters, poll_table);
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ rbwork->full_waiters_pending = true;
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
} else {
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ poll_wait(filp, &rbwork->waiters, poll_table);
+ rbwork->waiters_pending = true;
}
/*
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs: one trivial, and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule(), which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers of
ring_buffer_wait() have their own loops, and the callers have a better
sense of the context in which to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
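After this change the callers look roughly like this (a sketch;
should_keep_waiting() is a stand-in for each caller's own exit condition,
not a function from the patch):

	for (;;) {
		ret = ring_buffer_wait(buffer, cpu, full);
		if (ret)			/* -EINTR: signal pending */
			return ret;
		if (!should_keep_waiting(iter))	/* caller-specific check */
			break;
	}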
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the tracing_pipe_raw file is closed, if there are readers still
blocked on it, they need to be woken up. Currently a wait_index is used.
When the readers need to be woken, the index is updated and they are all
woken up.
But there is a race where a new reader could be coming in just as the file
is being closed, and it could still block if the wake up happens just
before the reader enters the wait.
Add another field called "waking", and protect both waking and wait_index
with a new wait_mutex to synchronize them.
When a reader comes in, it will save the current wait_index, but if waking
is set, then it will not block no matter what wait_index is.
After it wakes from the wait, if either the waking is set or the
wait_index is not the same as what it read before, then it will not block.
The waker will set waking and increment the wait_index. The .flush()
function will not clear waking, so that no new reader will block.
There's an ioctl() that kicks all current waiters, but does not care about
new waiters. It will set the waking count back to what it was when it came
in.
There's still a race with the wait_on_pipe() with respect to the
ring_buffer_wait(), but that will be dealt with separately.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/trace_events.h | 3 +-
kernel/trace/trace.c | 101 +++++++++++++++++++++++++++++------
2 files changed, 86 insertions(+), 18 deletions(-)
diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h
index d68ff9b1247f..adf8e163a7be 100644
--- a/include/linux/trace_events.h
+++ b/include/linux/trace_events.h
@@ -103,7 +103,8 @@ struct trace_iterator {
unsigned int temp_size;
char *fmt; /* modified format holder */
unsigned int fmt_size;
- long wait_index;
+ int wait_index;
+ int waking; /* set by a waker */
/* trace_seq for __print_flags() and __print_symbolic() etc. */
struct trace_seq tmp_seq;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index c9c898307348..4e8f6cdeafd5 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1955,6 +1955,65 @@ update_max_tr_single(struct trace_array *tr, struct task_struct *tsk, int cpu)
#endif /* CONFIG_TRACER_MAX_TRACE */
+/*
+ * In order to wake up readers and have them return back to user space,
+ * the iterator has two counters:
+ *
+ * wait_index - always increases every time a waker wakes up the readers.
+ * waking - Set by the waker when waking and cleared afterward.
+ *
+ * Both are protected together with the wait_mutex.
+ * When waking, the lock is taken and both indexes are incremented.
+ * The reader will first prepare the wait by taking the lock;
+ * if waking is set, it will not sleep regardless of what wait_index is.
+ * Then after it sleeps it checks if wait_index has been updated
+ * and if it has, it will not sleep again.
+ *
+ * Note, if wait_woken_clear() is not called, then all new readers
+ * will not sleep (this happens in closing the file).
+ */
+static DEFINE_MUTEX(wait_mutex);
+
+static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ mutex_lock(&wait_mutex);
+ if (iter->waking)
+ woken = true;
+ *wait_index = iter->wait_index;
+ mutex_unlock(&wait_mutex);
+
+ return woken;
+}
+
+static bool wait_woken_check(struct trace_iterator *iter, int *wait_index)
+{
+ bool woken = false;
+
+ mutex_lock(&wait_mutex);
+ if (iter->waking || *wait_index != iter->wait_index)
+ woken = true;
+ mutex_unlock(&wait_mutex);
+
+ return woken;
+}
+
+static void wait_woken_set(struct trace_iterator *iter)
+{
+ mutex_lock(&wait_mutex);
+ iter->waking++;
+ iter->wait_index++;
+ mutex_unlock(&wait_mutex);
+}
+
+static void wait_woken_clear(struct trace_iterator *iter)
+{
+ mutex_lock(&wait_mutex);
+ iter->waking--;
+ mutex_unlock(&wait_mutex);
+}
+
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
int ret;
@@ -8312,9 +8371,11 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
struct ftrace_buffer_info *info = filp->private_data;
struct trace_iterator *iter = &info->iter;
void *trace_data;
+ int wait_index;
int page_size;
ssize_t ret = 0;
ssize_t size;
+ bool woken;
if (!count)
return 0;
@@ -8353,6 +8414,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (info->read < page_size)
goto read;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
ret = ring_buffer_read_page(iter->array_buffer->buffer,
@@ -8362,7 +8424,7 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
trace_access_unlock(iter->cpu_file);
if (ret < 0) {
- if (trace_empty(iter)) {
+ if (trace_empty(iter) && !woken) {
if ((filp->f_flags & O_NONBLOCK))
return -EAGAIN;
@@ -8370,6 +8432,8 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
if (ret)
return ret;
+ woken = wait_woken_check(iter, &wait_index);
+
goto again;
}
return 0;
@@ -8398,12 +8462,14 @@ static int tracing_buffers_flush(struct file *file, fl_owner_t id)
struct ftrace_buffer_info *info = file->private_data;
struct trace_iterator *iter = &info->iter;
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+ /*
+ * Do not call wait_woken_clear(), as the file is being closed.
+ * This will prevent any new readers from sleeping.
+ */
return 0;
}
@@ -8500,9 +8566,11 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
.spd_release = buffer_spd_release,
};
struct buffer_ref *ref;
+ int wait_index;
int page_size;
int entries, i;
ssize_t ret = 0;
+ bool woken = false;
#ifdef CONFIG_TRACER_MAX_TRACE
if (iter->snapshot && iter->tr->current_trace->use_max_tr)
@@ -8522,6 +8590,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (splice_grow_spd(pipe, &spd))
return -ENOMEM;
+ woken = wait_woken_prepare(iter, &wait_index);
again:
trace_access_lock(iter->cpu_file);
entries = ring_buffer_entries_cpu(iter->array_buffer->buffer, iter->cpu_file);
@@ -8573,17 +8642,17 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
/* did we read anything? */
if (!spd.nr_pages) {
- long wait_index;
if (ret)
goto out;
+ if (woken)
+ goto out;
+
ret = -EAGAIN;
if ((file->f_flags & O_NONBLOCK) || (flags & SPLICE_F_NONBLOCK))
goto out;
- wait_index = READ_ONCE(iter->wait_index);
-
ret = wait_on_pipe(iter, iter->snapshot ? 0 : iter->tr->buffer_percent);
if (ret)
goto out;
@@ -8592,10 +8661,7 @@ tracing_buffers_splice_read(struct file *file, loff_t *ppos,
if (!tracer_tracing_is_on(iter->tr))
goto out;
- /* Make sure we see the new wait_index */
- smp_rmb();
- if (wait_index != iter->wait_index)
- goto out;
+ woken = wait_woken_check(iter, &wait_index);
goto again;
}
@@ -8616,15 +8682,16 @@ static long tracing_buffers_ioctl(struct file *file, unsigned int cmd, unsigned
if (cmd)
return -ENOIOCTLCMD;
- mutex_lock(&trace_types_lock);
-
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
+ wait_woken_set(iter);
ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
- mutex_unlock(&trace_types_lock);
+ /*
+ * This just kicks existing readers; a new reader coming in may
+ * still sleep.
+ */
+ wait_woken_clear(iter);
+
return 0;
}
--
2.43.0
From: Vitor Soares <vitor.soares(a)toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same
SPI interface and there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one
TX packet while still expecting a response in
mcp251xfd_handle_tefif_one().
This patch resolves the issue by decreasing tx_ring->head if
mcp251xfd_start_xmit() fails. With the fix, if we trigger the issue and
err is -EBUSY, the driver returns NETDEV_TX_BUSY and the network stack
retries the transmission.
Otherwise, it prints an error and discards the message.
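Condensed, the reworked transmit path in the diff below behaves like this
(comments added for clarity):

	tx_ring->head++;
	err = mcp251xfd_tx_obj_write(priv, tx_obj);
	if (err) {
		tx_ring->head--;	/* undo: the object was never queued */
		if (err == -EBUSY)
			return NETDEV_TX_BUSY;	/* net core requeues the skb */
		netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
	}
	return NETDEV_TX_OK;	/* success path queues the echo skb first */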
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares(a)toradex.com>
---
V1->V2:
- Return NETDEV_TX_BUSY if mcp251xfd_tx_obj_write() == -EBUSY
- Rework the commit message to address the change above
- Change can_put_echo_skb() to be called after mcp251xfd_tx_obj_write() succeeds. Otherwise, we get a kernel NULL pointer dereference error.
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 29 +++++++++++---------
1 file changed, 16 insertions(+), 13 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..0fdaececebdd 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -181,25 +181,28 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
mcp251xfd_tx_obj_from_skb(priv, tx_obj, skb, tx_ring->head);
- /* Stop queue if we occupy the complete TX FIFO */
tx_head = mcp251xfd_get_tx_head(tx_ring);
- tx_ring->head++;
- if (mcp251xfd_get_tx_free(tx_ring) == 0)
- netif_stop_queue(ndev);
-
frame_len = can_skb_get_frame_len(skb);
- err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
- if (!err)
- netdev_sent_queue(priv->ndev, frame_len);
+
+ tx_ring->head++;
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
+ if (err) {
+ tx_ring->head--;
- return NETDEV_TX_OK;
+ if (err == -EBUSY)
+ return NETDEV_TX_BUSY;
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ } else {
+ can_put_echo_skb(skb, ndev, tx_head, frame_len);
+
+ /* Stop queue if we occupy the complete TX FIFO */
+ if (mcp251xfd_get_tx_free(tx_ring) == 0)
+ netif_stop_queue(ndev);
+
+ netdev_sent_queue(priv->ndev, frame_len);
+ }
return NETDEV_TX_OK;
}
--
2.34.1
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
When the trace_pipe_raw file is closed, there should be no new readers on
the file descriptor. This is mostly handled with the waking and wait_index
fields of the iterator. But there's still a slight race.
CPU 0 CPU 1
----- -----
wait_woken_prepare()
if (waking)
woken = true;
index = wait_index;
wait_woken_set()
waking = true
wait_index++;
ring_buffer_wake_waiters();
wait_on_pipe()
ring_buffer_wait();
The ring_buffer_wait() will miss the wakeup from CPU 1. The problem is
that the ring_buffer_wait() needs the logic of:
prepare_to_wait();
if (!condition)
schedule();
Where the missing condition check is the iter->waking.
Either that condition check needs to be passed to ring_buffer_wait() or
the function needs to be broken up into three parts. This chooses to do
the break up.
Break ring_buffer_wait() into:
ring_buffer_prepare_to_wait();
ring_buffer_wait();
ring_buffer_finish_wait();
Now wait_on_pipe() can have:
ring_buffer_prepare_to_wait();
if (!iter->waking)
ring_buffer_wait();
ring_buffer_finish_wait();
And this will catch the above race, as the waiter will either see waking,
or already have been woken up.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
include/linux/ring_buffer.h | 4 ++
kernel/trace/ring_buffer.c | 88 ++++++++++++++++++++++++++-----------
kernel/trace/trace.c | 14 +++++-
3 files changed, 78 insertions(+), 28 deletions(-)
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index fa802db216f9..e5b5903cdc21 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -98,7 +98,11 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
__ring_buffer_alloc((size), (flags), &__key); \
})
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait);
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 856d0e5b0da5..fa7090f6b4fc 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -868,29 +868,29 @@ rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
}
/**
- * ring_buffer_wait - wait for input to the ring buffer
+ * ring_buffer_prepare_to_wait - Prepare to wait for data on the ring buffer
* @buffer: buffer to wait on
* @cpu: the cpu buffer to wait on
- * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @full: wait until the percentage of pages are available,
+ * if @cpu != RING_BUFFER_ALL_CPUS. It may be updated via this function.
+ * @wait: The wait queue entry.
*
- * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
- * as data is added to any of the @buffer's cpu buffers. Otherwise
- * it will wait for data to be added to a specific cpu buffer.
+ * This must be called before ring_buffer_wait(). It calls the prepare_to_wait()
+ * on @wait for the necessary wait queue defined by @buffer, @cpu, and @full.
*/
-int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+int ring_buffer_prepare_to_wait(struct trace_buffer *buffer, int cpu, int *full,
+ struct wait_queue_entry *wait)
{
struct rb_irq_work *rbwork;
- DEFINE_WAIT(wait);
- int ret = 0;
- rbwork = rb_get_work_queue(buffer, cpu, &full);
+ rbwork = rb_get_work_queue(buffer, cpu, full);
if (IS_ERR(rbwork))
return PTR_ERR(rbwork);
- if (full)
- prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ if (*full)
+ prepare_to_wait(&rbwork->full_waiters, wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -912,30 +912,66 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* that is necessary is that the wake up happens after
* a task has been queued. It's OK for spurious wake ups.
*/
- if (full)
+ if (*full)
rbwork->full_waiters_pending = true;
else
rbwork->waiters_pending = true;
- if (rb_watermark_hit(buffer, cpu, full))
- goto out;
+ return 0;
+}
- if (signal_pending(current)) {
- ret = -EINTR;
- goto out;
- }
+/**
+ * ring_buffer_finish_wait - clean up of ring_buffer_prepare_to_wait()
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ * @wait: The wait queue entry.
+ *
+ * This must be called after ring_buffer_prepare_to_wait(). It cleans up
+ * the @wait for the queue defined by @buffer, @cpu, and @full.
+ */
+void ring_buffer_finish_wait(struct trace_buffer *buffer, int cpu, int full,
+ struct wait_queue_entry *wait)
+{
+ struct rb_irq_work *rbwork;
+
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (WARN_ON_ONCE(IS_ERR(rbwork)))
+ return;
- schedule();
- out:
if (full)
- finish_wait(&rbwork->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, wait);
else
- finish_wait(&rbwork->waiters, &wait);
+ finish_wait(&rbwork->waiters, wait);
+}
- if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
- ret = -EINTR;
+/**
+ * ring_buffer_wait - wait for input to the ring buffer
+ * @buffer: buffer to wait on
+ * @cpu: the cpu buffer to wait on
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
+ *
+ * If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
+ * as data is added to any of the @buffer's cpu buffers. Otherwise
+ * it will wait for data to be added to a specific cpu buffer.
+ *
+ * ring_buffer_prepare_to_wait() must be called before this function
+ * and ring_buffer_finish_wait() must be called after.
+ */
+int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
+{
+ if (rb_watermark_hit(buffer, cpu, full))
+ return 0;
- return ret;
+ if (signal_pending(current))
+ return -EINTR;
+
+ schedule();
+
+ if (!rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ return -EINTR;
+
+ return 0;
}
/**
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 4e8f6cdeafd5..790ce3ba2acb 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -1981,7 +1981,8 @@ static bool wait_woken_prepare(struct trace_iterator *iter, int *wait_index)
mutex_lock(&wait_mutex);
if (iter->waking)
woken = true;
- *wait_index = iter->wait_index;
+ if (wait_index)
+ *wait_index = iter->wait_index;
mutex_unlock(&wait_mutex);
return woken;
@@ -2016,13 +2017,22 @@ static void wait_woken_clear(struct trace_iterator *iter)
static int wait_on_pipe(struct trace_iterator *iter, int full)
{
+ struct trace_buffer *buffer;
+ DEFINE_WAIT(wait);
int ret;
/* Iterators are static, they should be filled or empty */
if (trace_buffer_iter(iter, iter->cpu_file))
return 0;
- ret = ring_buffer_wait(iter->array_buffer->buffer, iter->cpu_file, full);
+ buffer = iter->array_buffer->buffer;
+
+ ret = ring_buffer_prepare_to_wait(buffer, iter->cpu_file, &full, &wait);
+ if (ret < 0)
+ return ret;
+ if (!wait_woken_prepare(iter, NULL))
+ ret = ring_buffer_wait(buffer, iter->cpu_file, full);
+ ring_buffer_finish_wait(buffer, iter->cpu_file, full, &wait);
#ifdef CONFIG_TRACER_MAX_TRACE
/*
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The ring_buffer_wait() needs to be broken into three functions for proper
synchronization from the context of the callers:
ring_buffer_prepare_to_wait()
ring_buffer_wait()
ring_buffer_finish_wait()
To simplify the process, pull out the logic for getting the right work
queue to wait on, as it will be needed for the above functions.
There are three work queues depending on the cpu value.
If cpu == RING_BUFFER_ALL_CPUS, then the main "buffer->irq_work" is used.
Otherwise, the cpu_buffer representing the CPU buffer's irq_work is used.
Create a rb_get_work_queue() helper function to retrieve the proper queue.
Also rename "work" to "rbwork" as the variable point to struct rb_irq_work,
and to be more consistent with the variable naming elsewhere in the file.
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 58 +++++++++++++++++++++++---------------
1 file changed, 35 insertions(+), 23 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index aa332ace108b..856d0e5b0da5 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -842,6 +842,31 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
return ret;
}
+static struct rb_irq_work *
+rb_get_work_queue(struct trace_buffer *buffer, int cpu, int *full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
+
+ /*
+ * Depending on what the caller is waiting for, either any
+ * data in any cpu buffer, or a specific buffer, put the
+ * caller on the appropriate wait queue.
+ */
+ if (cpu == RING_BUFFER_ALL_CPUS) {
+ rbwork = &buffer->irq_work;
+ /* Full only makes sense on per cpu reads */
+ *full = 0;
+ } else {
+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-ENODEV);
+ cpu_buffer = buffer->buffers[cpu];
+ rbwork = &cpu_buffer->irq_work;
+ }
+
+ return rbwork;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -854,31 +879,18 @@ static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
*/
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
{
- struct ring_buffer_per_cpu *cpu_buffer;
+ struct rb_irq_work *rbwork;
DEFINE_WAIT(wait);
- struct rb_irq_work *work;
int ret = 0;
- /*
- * Depending on what the caller is waiting for, either any
- * data in any cpu buffer, or a specific buffer, put the
- * caller on the appropriate wait queue.
- */
- if (cpu == RING_BUFFER_ALL_CPUS) {
- work = &buffer->irq_work;
- /* Full only makes sense on per cpu reads */
- full = 0;
- } else {
- if (!cpumask_test_cpu(cpu, buffer->cpumask))
- return -ENODEV;
- cpu_buffer = buffer->buffers[cpu];
- work = &cpu_buffer->irq_work;
- }
+ rbwork = rb_get_work_queue(buffer, cpu, &full);
+ if (IS_ERR(rbwork))
+ return PTR_ERR(rbwork);
if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->full_waiters, &wait, TASK_INTERRUPTIBLE);
else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
+ prepare_to_wait(&rbwork->waiters, &wait, TASK_INTERRUPTIBLE);
/*
* The events can happen in critical sections where
@@ -901,9 +913,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* a task has been queued. It's OK for spurious wake ups.
*/
if (full)
- work->full_waiters_pending = true;
+ rbwork->full_waiters_pending = true;
else
- work->waiters_pending = true;
+ rbwork->waiters_pending = true;
if (rb_watermark_hit(buffer, cpu, full))
goto out;
@@ -916,9 +928,9 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
schedule();
out:
if (full)
- finish_wait(&work->full_waiters, &wait);
+ finish_wait(&rbwork->full_waiters, &wait);
else
- finish_wait(&work->waiters, &wait);
+ finish_wait(&rbwork->waiters, &wait);
if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
ret = -EINTR;
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
The .release() function does not get called until all readers of a file
descriptor are finished.
If a thread is blocked on reading a file descriptor in ring_buffer_wait(),
and another thread closes the file descriptor, it will not wake up the
other thread as ring_buffer_wake_waiters() is called by .release(), and
that will not get called until the .read() is finished.
The issue originally showed up in trace-cmd, but the readers are actually
other processes with their own file descriptors. So calling close() would wake
up the other tasks because they are blocked on a descriptor other than the one
that was closed. But there are other wake ups that solve that issue.
When a thread is blocked on a read, it can still hang even when another
thread closed its descriptor.
This is what the .flush() callback is for. Have the .flush() wake up the
readers.
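For context, a minimal sketch (assuming hypothetical example_* names, not
the actual tracing fops in the diff below) of why .flush() is the right
hook: the VFS calls .flush() from filp_close() on every close() of the
file, while .release() only runs once the last reference to the struct
file is dropped:

  /* hypothetical fops illustrating the two hooks */
  static int example_flush(struct file *file, fl_owner_t id)
  {
          /* runs on every close(), even with readers still blocked */
          return 0;
  }

  static int example_release(struct inode *inode, struct file *file)
  {
          /* runs only after all users of this struct file are done */
          return 0;
  }

  static const struct file_operations example_fops = {
          .flush   = example_flush,
          .release = example_release,
  };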
Cc: stable(a)vger.kernel.org
Fixes: f3ddb74ad0790 ("tracing: Wake up ring buffer waiters on closing of the file")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/trace.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index d16b95ca58a7..c9c898307348 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8393,6 +8393,20 @@ tracing_buffers_read(struct file *filp, char __user *ubuf,
return size;
}
+static int tracing_buffers_flush(struct file *file, fl_owner_t id)
+{
+ struct ftrace_buffer_info *info = file->private_data;
+ struct trace_iterator *iter = &info->iter;
+
+ iter->wait_index++;
+ /* Make sure the waiters see the new wait_index */
+ smp_wmb();
+
+ ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
+
+ return 0;
+}
+
static int tracing_buffers_release(struct inode *inode, struct file *file)
{
struct ftrace_buffer_info *info = file->private_data;
@@ -8404,12 +8418,6 @@ static int tracing_buffers_release(struct inode *inode, struct file *file)
__trace_array_put(iter->tr);
- iter->wait_index++;
- /* Make sure the waiters see the new wait_index */
- smp_wmb();
-
- ring_buffer_wake_waiters(iter->array_buffer->buffer, iter->cpu_file);
-
if (info->spare)
ring_buffer_free_read_page(iter->array_buffer->buffer,
info->spare_cpu, info->spare);
@@ -8625,6 +8633,7 @@ static const struct file_operations tracing_buffers_fops = {
.read = tracing_buffers_read,
.poll = tracing_buffers_poll,
.release = tracing_buffers_release,
+ .flush = tracing_buffers_flush,
.splice_read = tracing_buffers_splice_read,
.unlocked_ioctl = tracing_buffers_ioctl,
.llseek = no_llseek,
--
2.43.0
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
A task can wait on a ring buffer for when it fills up to a specific
watermark. The writer will check the minimum watermark that waiters are
waiting for and if the ring buffer is past that, it will wake up all the
waiters.
The waiters are in a wait loop, and will first check if a signal is
pending and then check if the ring buffer is at the desired level where it
should break out of the loop.
If a file that uses a ring buffer closes, and there's threads waiting on
the ring buffer, it needs to wake up those threads. To do this, a
"wait_index" was used.
Before entering the wait loop, the waiter will read the wait_index. On
wakeup, it will check if the wait_index is different than when it entered
the loop, and will exit the loop if it is. The waker will only need to
update the wait_index before waking up the waiters.
This had a couple of bugs: one trivial, and one broken by design.
The trivial bug was that the waiter checked the wait_index after the
schedule() call. It had to be checked between the prepare_to_wait() and
the schedule() which it was not.
The main bug is that the first check to set the default wait_index will
always be outside the prepare_to_wait() and the schedule(). That's because
the ring_buffer_wait() doesn't have enough context to know if it should
break out of the loop.
The loop itself is not needed, because all the callers to
ring_buffer_wait() also have their own loops, and the callers have a better
sense of what the context is to decide whether to break out of the loop
or not.
Just have the ring_buffer_wait() block once, and if it gets woken up, exit
the function and let the callers decide what to do next.
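A minimal sketch (with a hypothetical done_waiting() predicate; the real
exit checks live in the tracing callers) of the loop that now belongs to
the callers instead of ring_buffer_wait():

  for (;;) {
          ret = ring_buffer_wait(buffer, cpu, full);
          if (ret < 0)
                  break;                  /* e.g. -EINTR */
          if (done_waiting(iter))         /* caller-specific exit check */
                  break;
  }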
Link: https://lore.kernel.org/all/CAHk-=whs5MdtNjzFkTyaUy=vHi=qwWgPi0JgTe6OYUYMNS…
Cc: stable(a)vger.kernel.org
Fixes: e30f53aad2202 ("tracing: Do not busy wait in buffer splice")
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
---
kernel/trace/ring_buffer.c | 139 ++++++++++++++++++-------------------
1 file changed, 68 insertions(+), 71 deletions(-)
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 0699027b4f4c..3400f11286e3 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -384,7 +384,6 @@ struct rb_irq_work {
struct irq_work work;
wait_queue_head_t waiters;
wait_queue_head_t full_waiters;
- long wait_index;
bool waiters_pending;
bool full_waiters_pending;
bool wakeup_full;
@@ -798,14 +797,40 @@ void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu)
rbwork = &cpu_buffer->irq_work;
}
- rbwork->wait_index++;
- /* make sure the waiters see the new index */
- smp_wmb();
-
/* This can be called in any context */
irq_work_queue(&rbwork->work);
}
+static bool rb_watermark_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer;
+ bool ret = false;
+
+ /* Reads of all CPUs always waits for any data */
+ if (cpu == RING_BUFFER_ALL_CPUS)
+ return !ring_buffer_empty(buffer);
+
+ cpu_buffer = buffer->buffers[cpu];
+
+ if (!ring_buffer_empty_cpu(buffer, cpu)) {
+ unsigned long flags;
+ bool pagebusy;
+
+ if (!full)
+ return true;
+
+ raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
+ pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
+ ret = !pagebusy && full_hit(buffer, cpu, full);
+
+ if (!cpu_buffer->shortest_full ||
+ cpu_buffer->shortest_full > full)
+ cpu_buffer->shortest_full = full;
+ raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
+ }
+ return ret;
+}
+
/**
* ring_buffer_wait - wait for input to the ring buffer
* @buffer: buffer to wait on
@@ -821,7 +846,6 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
struct ring_buffer_per_cpu *cpu_buffer;
DEFINE_WAIT(wait);
struct rb_irq_work *work;
- long wait_index;
int ret = 0;
/*
@@ -840,81 +864,54 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
work = &cpu_buffer->irq_work;
}
- wait_index = READ_ONCE(work->wait_index);
-
- while (true) {
- if (full)
- prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
- else
- prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
-
- /*
- * The events can happen in critical sections where
- * checking a work queue can cause deadlocks.
- * After adding a task to the queue, this flag is set
- * only to notify events to try to wake up the queue
- * using irq_work.
- *
- * We don't clear it even if the buffer is no longer
- * empty. The flag only causes the next event to run
- * irq_work to do the work queue wake up. The worse
- * that can happen if we race with !trace_empty() is that
- * an event will cause an irq_work to try to wake up
- * an empty queue.
- *
- * There's no reason to protect this flag either, as
- * the work queue and irq_work logic will do the necessary
- * synchronization for the wake ups. The only thing
- * that is necessary is that the wake up happens after
- * a task has been queued. It's OK for spurious wake ups.
- */
- if (full)
- work->full_waiters_pending = true;
- else
- work->waiters_pending = true;
-
- if (signal_pending(current)) {
- ret = -EINTR;
- break;
- }
-
- if (cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer))
- break;
-
- if (cpu != RING_BUFFER_ALL_CPUS &&
- !ring_buffer_empty_cpu(buffer, cpu)) {
- unsigned long flags;
- bool pagebusy;
- bool done;
-
- if (!full)
- break;
-
- raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
- pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- done = !pagebusy && full_hit(buffer, cpu, full);
+ if (full)
+ prepare_to_wait(&work->full_waiters, &wait, TASK_INTERRUPTIBLE);
+ else
+ prepare_to_wait(&work->waiters, &wait, TASK_INTERRUPTIBLE);
- if (!cpu_buffer->shortest_full ||
- cpu_buffer->shortest_full > full)
- cpu_buffer->shortest_full = full;
- raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (done)
- break;
- }
+ /*
+ * The events can happen in critical sections where
+ * checking a work queue can cause deadlocks.
+ * After adding a task to the queue, this flag is set
+ * only to notify events to try to wake up the queue
+ * using irq_work.
+ *
+ * We don't clear it even if the buffer is no longer
+ * empty. The flag only causes the next event to run
+ * irq_work to do the work queue wake up. The worse
+ * that can happen if we race with !trace_empty() is that
+ * an event will cause an irq_work to try to wake up
+ * an empty queue.
+ *
+ * There's no reason to protect this flag either, as
+ * the work queue and irq_work logic will do the necessary
+ * synchronization for the wake ups. The only thing
+ * that is necessary is that the wake up happens after
+ * a task has been queued. It's OK for spurious wake ups.
+ */
+ if (full)
+ work->full_waiters_pending = true;
+ else
+ work->waiters_pending = true;
- schedule();
+ if (rb_watermark_hit(buffer, cpu, full))
+ goto out;
- /* Make sure to see the new wait index */
- smp_rmb();
- if (wait_index != work->wait_index)
- break;
+ if (signal_pending(current)) {
+ ret = -EINTR;
+ goto out;
}
+ schedule();
+ out:
if (full)
finish_wait(&work->full_waiters, &wait);
else
finish_wait(&work->waiters, &wait);
+ if (!ret && !rb_watermark_hit(buffer, cpu, full) && signal_pending(current))
+ ret = -EINTR;
+
return ret;
}
--
2.43.0
The .get_modes() callback is supposed to return the number of modes,
never a negative error code. If a negative value is returned, it'll just
be interpreted as a negative count, and added to previous calculations.
Document the rules, but handle the negative values gracefully with an
error message.
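As a minimal sketch of a compliant implementation (hypothetical connector
driver; example_fixed_mode is assumed, a real driver would probe EDID):

  static int example_connector_get_modes(struct drm_connector *connector)
  {
          struct drm_display_mode *mode;

          mode = drm_mode_duplicate(connector->dev, &example_fixed_mode);
          if (!mode)
                  return 0;       /* no modes on failure, never an errno */

          drm_mode_probed_add(connector, mode);
          return 1;               /* number of modes added */
  }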
Cc: stable(a)vger.kernel.org
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
---
drivers/gpu/drm/drm_probe_helper.c | 7 +++++++
include/drm/drm_modeset_helper_vtables.h | 3 ++-
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/drm_probe_helper.c b/drivers/gpu/drm/drm_probe_helper.c
index 4d60cc810b57..bf2dd1f46b6c 100644
--- a/drivers/gpu/drm/drm_probe_helper.c
+++ b/drivers/gpu/drm/drm_probe_helper.c
@@ -422,6 +422,13 @@ static int drm_helper_probe_get_modes(struct drm_connector *connector)
count = connector_funcs->get_modes(connector);
+ /* The .get_modes() callback should not return negative values. */
+ if (count < 0) {
+ drm_err(connector->dev, ".get_modes() returned %pe\n",
+ ERR_PTR(count));
+ count = 0;
+ }
+
/*
* Fallback for when DDC probe failed in drm_get_edid() and thus skipped
* override/firmware EDID.
diff --git a/include/drm/drm_modeset_helper_vtables.h b/include/drm/drm_modeset_helper_vtables.h
index 881b03e4dc28..9ed42469540e 100644
--- a/include/drm/drm_modeset_helper_vtables.h
+++ b/include/drm/drm_modeset_helper_vtables.h
@@ -898,7 +898,8 @@ struct drm_connector_helper_funcs {
*
* RETURNS:
*
- * The number of modes added by calling drm_mode_probed_add().
+ * The number of modes added by calling drm_mode_probed_add(). Return 0
+ * on failures (no modes) instead of negative error codes.
*/
int (*get_modes)(struct drm_connector *connector);
--
2.39.2
From: Ard Biesheuvel <ardb(a)kernel.org>
Our efi_tcg2_tagged_event is not defined in the EFI spec, but it is not
a local invention either: it was taken from the TCG PC Client spec,
where it is called TCG_PCClientTaggedEvent.
This spec also contains some guidance on how to populate it, which
is not being followed closely at the moment; the event size should cover
the TCG_PCClientTaggedEvent and its payload only, but it currently
covers the preceding efi_tcg2_event too, and this may result in trailing
garbage being measured into the TPM.
So rename the struct and document its provenance, and fix up the use so
only the tagged event data is represented in the size field.
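For reference, struct_size() from <linux/overflow.h> is what keeps the
size tagged-event-only; a sketch of what it evaluates to (a 5-byte payload
is assumed purely for illustration):

  /* struct_size(p, member, count) computes, with overflow checking:
   *     sizeof(*p) + count * element size of p->member
   * so for the flexible array in TCG_PCClientTaggedEvent:
   *     struct_size(&evt->tagged_event, tagged_event_data, 5)
   *         == sizeof(TCG_PCClientTaggedEvent) + 5
   * i.e. the two u32 header fields plus the payload, excluding the
   * preceding efi_tcg2_event. */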
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
---
drivers/firmware/efi/libstub/efi-stub-helper.c | 20 +++++++++++---------
drivers/firmware/efi/libstub/efistub.h | 12 ++++++------
2 files changed, 17 insertions(+), 15 deletions(-)
diff --git a/drivers/firmware/efi/libstub/efi-stub-helper.c b/drivers/firmware/efi/libstub/efi-stub-helper.c
index bfa30625f5d0..16843ab9b64d 100644
--- a/drivers/firmware/efi/libstub/efi-stub-helper.c
+++ b/drivers/firmware/efi/libstub/efi-stub-helper.c
@@ -11,6 +11,7 @@
#include <linux/efi.h>
#include <linux/kernel.h>
+#include <linux/overflow.h>
#include <asm/efi.h>
#include <asm/setup.h>
@@ -219,23 +220,24 @@ static const struct {
},
};
+struct efistub_measured_event {
+ efi_tcg2_event_t event_data;
+ TCG_PCClientTaggedEvent tagged_event;
+} __packed;
+
static efi_status_t efi_measure_tagged_event(unsigned long load_addr,
unsigned long load_size,
enum efistub_event event)
{
+ struct efistub_measured_event *evt;
+ int size = struct_size(&evt->tagged_event, tagged_event_data,
+ events[event].event_data_len);
efi_guid_t tcg2_guid = EFI_TCG2_PROTOCOL_GUID;
efi_tcg2_protocol_t *tcg2 = NULL;
efi_status_t status;
efi_bs_call(locate_protocol, &tcg2_guid, NULL, (void **)&tcg2);
if (tcg2) {
- struct efi_measured_event {
- efi_tcg2_event_t event_data;
- efi_tcg2_tagged_event_t tagged_event;
- u8 tagged_event_data[];
- } *evt;
- int size = sizeof(*evt) + events[event].event_data_len;
-
status = efi_bs_call(allocate_pool, EFI_LOADER_DATA, size,
(void **)&evt);
if (status != EFI_SUCCESS)
@@ -249,12 +251,12 @@ static efi_status_t efi_measure_tagged_event(unsigned long load_addr,
.event_header.event_type = EV_EVENT_TAG,
};
- evt->tagged_event = (struct efi_tcg2_tagged_event){
+ evt->tagged_event = (TCG_PCClientTaggedEvent){
.tagged_event_id = events[event].event_id,
.tagged_event_data_size = events[event].event_data_len,
};
- memcpy(evt->tagged_event_data, events[event].event_data,
+ memcpy(evt->tagged_event.tagged_event_data, events[event].event_data,
events[event].event_data_len);
status = efi_call_proto(tcg2, hash_log_extend_event, 0,
diff --git a/drivers/firmware/efi/libstub/efistub.h b/drivers/firmware/efi/libstub/efistub.h
index c04b82ea40f2..043a3ff435f3 100644
--- a/drivers/firmware/efi/libstub/efistub.h
+++ b/drivers/firmware/efi/libstub/efistub.h
@@ -843,14 +843,14 @@ struct efi_tcg2_event {
/* u8[] event follows here */
} __packed;
-struct efi_tcg2_tagged_event {
- u32 tagged_event_id;
- u32 tagged_event_data_size;
- /* u8 tagged event data follows here */
-} __packed;
+/* from TCG PC Client Platform Firmware Profile Specification */
+typedef struct tdTCG_PCClientTaggedEvent {
+ u32 tagged_event_id;
+ u32 tagged_event_data_size;
+ u8 tagged_event_data[];
+} TCG_PCClientTaggedEvent;
typedef struct efi_tcg2_event efi_tcg2_event_t;
-typedef struct efi_tcg2_tagged_event efi_tcg2_tagged_event_t;
typedef union efi_tcg2_protocol efi_tcg2_protocol_t;
union efi_tcg2_protocol {
--
2.44.0.278.ge034bb2e1d-goog
From: Jeff Vanhoof <qjv001(a)motorola.com>
arm-smmu related crashes are seen after a Missed ISOC interrupt when
no_interrupt=1 is used. This can happen if the hardware is still using
the data associated with a TRB after the usb_request's ->complete call
has been made. Instead of immediately releasing a request when a Missed
ISOC interrupt has occurred, this change will add logic to cancel the
request instead where it will eventually be released when the
END_TRANSFER command has completed. This logic is similar to some of the
cleanup done in dwc3_gadget_ep_dequeue.
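In short, the handling flow after this change (as implemented in the diff
below) is:

  1. On a Missed ISOC event (status == -EXDEV), issue END_TRANSFER via
     dwc3_stop_active_transfer() unless one is already pending.
  2. Move every started request to the cancelled list with the new
     DWC3_REQUEST_STATUS_MISSED_ISOC reason.
  3. When the END_TRANSFER command completes, the cancelled requests are
     given back with -EXDEV, after the hardware is done with their TRBs.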
Fixes: 6d8a019614f3 ("usb: dwc3: gadget: check for Missed Isoc from event status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jeff Vanhoof <qjv001(a)motorola.com>
Co-developed-by: Dan Vacura <w36195(a)motorola.com>
Signed-off-by: Dan Vacura <w36195(a)motorola.com>
---
V1 -> V3:
- no change, new patch in series
drivers/usb/dwc3/core.h | 1 +
drivers/usb/dwc3/gadget.c | 38 ++++++++++++++++++++++++++------------
2 files changed, 27 insertions(+), 12 deletions(-)
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8f9959ba9fd4..9b005d912241 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -943,6 +943,7 @@ struct dwc3_request {
#define DWC3_REQUEST_STATUS_DEQUEUED 3
#define DWC3_REQUEST_STATUS_STALLED 4
#define DWC3_REQUEST_STATUS_COMPLETED 5
+#define DWC3_REQUEST_STATUS_MISSED_ISOC 6
#define DWC3_REQUEST_STATUS_UNKNOWN -1
u8 epnum;
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 079cd333632e..411532c5c378 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -2021,6 +2021,9 @@ static void dwc3_gadget_ep_cleanup_cancelled_requests(struct dwc3_ep *dep)
case DWC3_REQUEST_STATUS_STALLED:
dwc3_gadget_giveback(dep, req, -EPIPE);
break;
+ case DWC3_REQUEST_STATUS_MISSED_ISOC:
+ dwc3_gadget_giveback(dep, req, -EXDEV);
+ break;
default:
dev_err(dwc->dev, "request cancelled with wrong reason:%d\n", req->status);
dwc3_gadget_giveback(dep, req, -ECONNRESET);
@@ -3402,21 +3405,32 @@ static bool dwc3_gadget_endpoint_trbs_complete(struct dwc3_ep *dep,
struct dwc3 *dwc = dep->dwc;
bool no_started_trb = true;
- dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
+ if (status == -EXDEV) {
+ struct dwc3_request *tmp;
+ struct dwc3_request *req;
- if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
- goto out;
+ if (!(dep->flags & DWC3_EP_END_TRANSFER_PENDING))
+ dwc3_stop_active_transfer(dep, true, true);
- if (!dep->endpoint.desc)
- return no_started_trb;
+ list_for_each_entry_safe(req, tmp, &dep->started_list, list)
+ dwc3_gadget_move_cancelled_request(req,
+ DWC3_REQUEST_STATUS_MISSED_ISOC);
+ } else {
+ dwc3_gadget_ep_cleanup_completed_requests(dep, event, status);
- if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
- list_empty(&dep->started_list) &&
- (list_empty(&dep->pending_list) || status == -EXDEV))
- dwc3_stop_active_transfer(dep, true, true);
- else if (dwc3_gadget_ep_should_continue(dep))
- if (__dwc3_gadget_kick_transfer(dep) == 0)
- no_started_trb = false;
+ if (dep->flags & DWC3_EP_END_TRANSFER_PENDING)
+ goto out;
+
+ if (!dep->endpoint.desc)
+ return no_started_trb;
+
+ if (usb_endpoint_xfer_isoc(dep->endpoint.desc) &&
+ list_empty(&dep->started_list) && list_empty(&dep->pending_list))
+ dwc3_stop_active_transfer(dep, true, true);
+ else if (dwc3_gadget_ep_should_continue(dep))
+ if (__dwc3_gadget_kick_transfer(dep) == 0)
+ no_started_trb = false;
+ }
out:
/*
--
2.34.1
Since kernel version 5.4.217 LTS, there has been an issue with the kernel live patching feature becoming unavailable.
After compiling the sample code for kernel live patching and enabling it, the following message is displayed:
livepatch: klp_check_stack: kworker/u256:6:23490 has an unreliable stack
Reproduction steps:
1. git checkout v5.4.269 -b v5.4.269
2. make defconfig
3. Set CONFIG_LIVEPATCH=y, CONFIG_SAMPLE_LIVEPATCH=m
4. make -j bzImage
5. make samples/livepatch/livepatch-sample.ko
6. qemu-system-x86_64 -kernel arch/x86_64/boot/bzImage -nographic -append "console=ttyS0" -initrd initrd.img -m 1024M
7. insmod livepatch-sample.ko
Kernel live patch cannot complete successfully.
After some debugging, the immediate cause of the patch failure is an error in stack checking. The logs are as follows:
[ 340.974853] livepatch: klp_check_stack: kworker/u256:0:23486 has an unreliable stack
[ 340.974858] livepatch: klp_check_stack: kworker/u256:1:23487 has an unreliable stack
[ 340.974863] livepatch: klp_check_stack: kworker/u256:2:23488 has an unreliable stack
[ 340.974868] livepatch: klp_check_stack: kworker/u256:5:23489 has an unreliable stack
[ 340.974872] livepatch: klp_check_stack: kworker/u256:6:23490 has an unreliable stack
......
BTW, if you use the v5.4.217 tag for testing, make sure to set CONFIG_RETPOLINE=y and CONFIG_LIVEPATCH=y; the other steps are consistent with v5.4.269.
After investigation, the problem is strongly related to commit 8afd1c7da2b0 ("x86/speculation: Change FILL_RETURN_BUFFER to work with objtool"),
which causes incorrect ORC entries to be generated; reverting this commit on v5.4.217 makes kernel live patching work normally.
It is a back-ported upstream patch with some code adjustments. From the git log, the author also mentioned there is no intra-function call validation support.
Based on commit 6e1f54a4985b63bc1b55a09e5e75a974c5d6719b (Linux 5.4.269), this patchset adds stack validation support for intra-function calls,
allowing the kernel live patching feature to work correctly.
v3 - v2
- fix the compile error in arch/x86/kvm/svm.c, the error message is../arch/x86/include/asm/nospec-branch.h: 313: Error: no such instruction: 'unwind_hint_empty'
v2 - v1
- add the tag "Cc: stable(a)vger.kernel.org" in the sign-off area for patch x86/speculation: Support intra-function call
- add my own Signed-off to all patches
Alexandre Chartre (2):
objtool: is_fentry_call() crashes if call has no destination
objtool: Add support for intra-function calls
Rui Qi (1):
x86/speculation: Support intra-function call validation
arch/x86/include/asm/nospec-branch.h | 7 ++
arch/x86/include/asm/unwind_hints.h | 2 +-
include/linux/frame.h | 11 ++++
.../Documentation/stack-validation.txt | 8 +++
tools/objtool/arch/x86/decode.c | 6 ++
tools/objtool/check.c | 64 +++++++++++++++++--
6 files changed, 92 insertions(+), 6 deletions(-)
--
2.20.1
There was previously a theoretical window where swapoff() could run and
teardown a swap_info_struct while a call to free_swap_and_cache() was
running in another thread. This could cause, amongst other bad
possibilities, swap_page_trans_huge_swapped() (called by
free_swap_and_cache()) to access the freed memory for swap_map.
This is a theoretical problem and I haven't been able to provoke it from
a test case. But there has been agreement based on code review that this
is possible (see link below).
Fix it by using get_swap_device()/put_swap_device(), which will stall
swapoff(). There was an extra check in _swap_info_get() to confirm that
the swap entry was valid. This wasn't present in get_swap_device() so
I've added it. I couldn't find any existing get_swap_device() call sites
where this extra check would cause any false alarms.
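A minimal sketch of the stabilizing pattern (not the exact
free_swap_and_cache() hunk below):

  struct swap_info_struct *si;

  si = get_swap_device(entry);    /* takes a reference, stalls swapoff() */
  if (si) {
          /* si->swap_map and friends are safe to access here */
          put_swap_device(si);    /* drops the reference */
  }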
Details of how to provoke one possible issue (thanks to David Hildenbrand
for deriving this):
--8<-----
__swap_entry_free() might be the last user and result in
"count == SWAP_HAS_CACHE".
swapoff->try_to_unuse() will stop as soon as si->inuse_pages==0.
So the question is: could someone reclaim the folio and turn
si->inuse_pages==0, before we completed swap_page_trans_huge_swapped()?
Imagine the following: 2 MiB folio in the swapcache. Only 2 subpages are
still referenced by swap entries.
Process 1 still references subpage 0 via swap entry.
Process 2 still references subpage 1 via swap entry.
Process 1 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
[then, preempted in the hypervisor etc.]
Process 2 quits. Calls free_swap_and_cache().
-> count == SWAP_HAS_CACHE
Process 2 goes ahead, passes swap_page_trans_huge_swapped(), and calls
__try_to_reclaim_swap().
__try_to_reclaim_swap()->folio_free_swap()->delete_from_swap_cache()->
put_swap_folio()->free_swap_slot()->swapcache_free_entries()->
swap_entry_free()->swap_range_free()->
...
WRITE_ONCE(si->inuse_pages, si->inuse_pages - nr_entries);
What stops swapoff from succeeding after process 2 reclaimed the swap cache
but before process 1 finished its call to swap_page_trans_huge_swapped()?
--8<-----
Fixes: 7c00bafee87c ("mm/swap: free swap slots in batch")
Closes: https://lore.kernel.org/linux-mm/65a66eb9-41f8-4790-8db2-0c70ea15979f@redha…
Cc: stable(a)vger.kernel.org
Signed-off-by: Ryan Roberts <ryan.roberts(a)arm.com>
---
Applies on top of v6.8-rc6 and mm-unstable (b38c34939fe4).
Thanks,
Ryan
mm/swapfile.c | 14 +++++++++++---
1 file changed, 11 insertions(+), 3 deletions(-)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b3a2d85e350..f580e6abc674 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1281,7 +1281,9 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
smp_rmb();
offset = swp_offset(entry);
if (offset >= si->max)
- goto put_out;
+ goto bad_offset;
+ if (data_race(!si->swap_map[swp_offset(entry)]))
+ goto bad_free;
return si;
bad_nofile:
@@ -1289,9 +1291,14 @@ struct swap_info_struct *get_swap_device(swp_entry_t entry)
out:
return NULL;
put_out:
- pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
percpu_ref_put(&si->users);
return NULL;
+bad_offset:
+ pr_err("%s: %s%08lx\n", __func__, Bad_offset, entry.val);
+ goto put_out;
+bad_free:
+ pr_err("%s: %s%08lx\n", __func__, Unused_offset, entry.val);
+ goto put_out;
}
static unsigned char __swap_entry_free(struct swap_info_struct *p,
@@ -1609,13 +1616,14 @@ int free_swap_and_cache(swp_entry_t entry)
if (non_swap_entry(entry))
return 1;
- p = _swap_info_get(entry);
+ p = get_swap_device(entry);
if (p) {
count = __swap_entry_free(p, entry);
if (count == SWAP_HAS_CACHE &&
!swap_page_trans_huge_swapped(p, entry))
__try_to_reclaim_swap(p, swp_offset(entry),
TTRS_UNMAPPED | TTRS_FULL);
+ put_swap_device(p);
}
return p != NULL;
}
--
2.25.1
Hi,
I have to admit that v3 was a lazy attempt. This one should be on
the right path.
this series does basically two things:
1. Disables automatic load balancing as advised by the hardware
workaround.
2. Assigns all the CCS slices to one single user engine. The user
will then be able to query only one CCS engine.
I'm using the "Requires: " tag here, but I'm not sure the commit
id will be valid; on the other hand, I don't know what commit id
I should use.
Thanks Tvrtko, Matt, John and Joonas for your reviews!
Andi
Changelog
=========
v3 -> v4
- Reword correctly the comment in the workaround
- Fix a buffer overflow (Thanks Joonas)
- Handle properly the fused engines when setting the CCS mode.
v2 -> v3
- Simplified the algorithm for creating the list of the exported
uabi engines. (Patch 1) (Thanks, Tvrtko)
- Consider the fused engines when creating the uabi engine list
(Patch 2) (Thanks, Matt)
- Patch 4 now uses the refactoring from patch 1, with a cleaner
outcome.
v1 -> v2
- In Patch 1 use the correct workaround number (thanks Matt).
- In Patch 2 do not add the extra CCS engines to the exposed UABI
engine list and adapt the engine counting accordingly (thanks
Tvrtko).
- Reword the commit of Patch 2 (thanks John).
Andi Shyti (3):
drm/i915/gt: Disable HW load balancing for CCS
drm/i915/gt: Refactor uabi engine class/instance list creation
drm/i915/gt: Enable only one CCS for compute workload
drivers/gpu/drm/i915/gt/intel_engine_user.c | 40 ++++++++++++++-------
drivers/gpu/drm/i915/gt/intel_gt.c | 23 ++++++++++++
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 6 ++++
drivers/gpu/drm/i915/gt/intel_workarounds.c | 5 +++
4 files changed, 62 insertions(+), 12 deletions(-)
--
2.43.0
Hi,
On Bugzilla, danilrybakov249(a)gmail.com reported a stable-specific ACPI error
regression that led to high CPU temperature [1]. He wrote:
> Overview:
>
> After updating from lts v6.6.14-2 to lts v6.6.17-1 noticed high CPU temperature and lag. After running htop noticed that journald was using 30-60% of CPU. Afterwards, tried switching to stable, or lts v6.6.18-1, but encountered the same issue.
>
> Running journalctl -f gives these lines over and over again:
>
> Feb 19 21:09:12 danirybe kernel: ACPI Error: Could not disable RealTimeClock events (20230628/evxfevnt-243)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 08, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0A, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No handler or method for GPE 0B, disabling event (20230628/evgpe-839)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PM_Timer (0), disabling (20230628/evevent-255)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - PowerButton (2), disabling (20230628/evevent-255)
> Feb 19 21:09:12 danirybe kernel: ACPI Error: No installed handler for fixed event - SleepButton (3), disabling (20230628/evevent-255)
>
> My system info:
>
> Laptop model: ASUS VivoBook D540NV-GQ065T
> OS: Arch Linux x86_64
> Kernel: 6.6.14-2-lts
> WM: sway
> CPU: Intel Pentium N420 (4) @ 2.500GHz
> GPU1: Intel Apollo Lake [HD Graphics 505]
> GPU2: NVIDIA GeForce 920MX
>
> I've pinned down the commit after which the problem occurs:
>
> 847e1eb30e269a094da046c08273abe3f3361cf2 is the first bad commit
> commit 847e1eb30e269a094da046c08273abe3f3361cf2
> Author: Shin'ichiro Kawasaki <shinichiro.kawasaki(a)wdc.com>
> Date: Mon Jan 8 15:20:58 2024 +0900
>
> platform/x86: p2sb: Allow p2sb_bar() calls during PCI device probe
>
> commit 5913320eb0b3ec88158cfcb0fa5e996bf4ef681b upstream.
>
> <snipped>...
See Bugzilla for the full thread.
Thanks.
[1]: https://bugzilla.kernel.org/show_bug.cgi?id=218531
--
An old man doll... just what I always wanted! - Clara
From: Vitor Soares <vitor.soares(a)toradex.com>
When the mcp251xfd_start_xmit() function fails, the driver stops
processing messages, and the interrupt routine does not return,
running indefinitely even after killing the running application.
Error messages:
[ 441.298819] mcp251xfd spi2.0 can0: ERROR in mcp251xfd_start_xmit: -16
[ 441.306498] mcp251xfd spi2.0 can0: Transmit Event FIFO buffer not empty. (seq=0x000017c7, tef_tail=0x000017cf, tef_head=0x000017d0, tx_head=0x000017d3).
... and repeat forever.
The issue can be triggered when multiple devices share the same
SPI interface and there is concurrent access to the bus.
The problem occurs because tx_ring->head increments even if
mcp251xfd_start_xmit() fails. Consequently, the driver skips one
TX package while still expecting a response in
mcp251xfd_handle_tefif_one().
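The failure sequence before the fix, as a sketch:

  1. tx_ring->head++                  (head = N + 1)
  2. mcp251xfd_tx_obj_write() fails   (TX object N never reaches the chip)
  3. No TEF completion ever arrives for sequence N, so
     mcp251xfd_handle_tefif_one() keeps seeing a sequence mismatch and the
     "Transmit Event FIFO buffer not empty" error repeats forever.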
This patch resolves the issue by decreasing tx_ring->head if
mcp251xfd_start_xmit() fails. With the fix, if we attempt to trigger
the issue again, the driver prints an error and discards the message.
Fixes: 55e5b97f003e ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Cc: stable(a)vger.kernel.org
Signed-off-by: Vitor Soares <vitor.soares(a)toradex.com>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c | 27 ++++++++++----------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
index 160528d3cc26..a8eb941c1b95 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tx.c
@@ -181,25 +181,26 @@ netdev_tx_t mcp251xfd_start_xmit(struct sk_buff *skb,
tx_obj = mcp251xfd_get_tx_obj_next(tx_ring);
mcp251xfd_tx_obj_from_skb(priv, tx_obj, skb, tx_ring->head);
- /* Stop queue if we occupy the complete TX FIFO */
tx_head = mcp251xfd_get_tx_head(tx_ring);
- tx_ring->head++;
- if (mcp251xfd_get_tx_free(tx_ring) == 0)
- netif_stop_queue(ndev);
-
frame_len = can_skb_get_frame_len(skb);
- err = can_put_echo_skb(skb, ndev, tx_head, frame_len);
- if (!err)
- netdev_sent_queue(priv->ndev, frame_len);
+ can_put_echo_skb(skb, ndev, tx_head, frame_len);
+
+ tx_ring->head++;
err = mcp251xfd_tx_obj_write(priv, tx_obj);
- if (err)
- goto out_err;
+ if (err) {
+ can_free_echo_skb(ndev, tx_head, NULL);
- return NETDEV_TX_OK;
+ tx_ring->head--;
+
+ netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ } else {
+ /* Stop queue if we occupy the complete TX FIFO */
+ if (mcp251xfd_get_tx_free(tx_ring) == 0)
+ netif_stop_queue(ndev);
- out_err:
- netdev_err(priv->ndev, "ERROR in %s: %d\n", __func__, err);
+ netdev_sent_queue(priv->ndev, frame_len);
+ }
return NETDEV_TX_OK;
}
--
2.34.1
From: Muhammad Ahmed <ahmed.ahmed(a)amd.com>
[WHY]
Blackscreen hang @ PC EF000025 when trying to wake up from S0i3. DCN
gets powered off due to dc_power_down_on_boot() being called after
timeout.
[HOW]
Setting the power_down_on_boot function pointer to null since we don't
expect the function to be called for APU.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Muhammad Ahmed <ahmed.ahmed(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
index dce620d359a6..d4e0abbef28e 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c
@@ -39,7 +39,7 @@
static const struct hw_sequencer_funcs dcn35_funcs = {
.program_gamut_remap = dcn30_program_gamut_remap,
.init_hw = dcn35_init_hw,
- .power_down_on_boot = dcn35_power_down_on_boot,
+ .power_down_on_boot = NULL,
.apply_ctx_to_hw = dce110_apply_ctx_to_hw,
.apply_ctx_for_surface = NULL,
.program_front_end_for_ctx = dcn20_program_front_end_for_ctx,
--
2.34.1
We need to avoid the first page if we don't read it entirely.
We need to avoid the last page if we don't read it entirely.
While rather simple, this logic was broken by the previous
fix. This time I wrote about 30 unit tests locally to check each
possible condition; hopefully I covered them all.
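As a worked example of the new derivation (a hypothetical device with
4096-byte pages is assumed):

  /* page = 10, col = 100, readlen = 3 * 4096 = 12288
   * (readlen >= 2 * writesize, so the early return does not trigger)
   *     first_page = 10 + 1 = 11    (col != 0, page 10 is partial)
   *     last_page  = 10 + (100 + 12288) / 4096 - 1 = 10 + 3 - 1 = 12
   * The read touches pages 10..13, but only 11 and 12 are read in full;
   * first_page < last_page, so continuous read covers pages 11-12. */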
Reported-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Closes: https://lore.kernel.org/linux-mtd/20240221175327.42f7076d@xps-13/T/#m399bac…
Suggested-by: Christophe Kerello <christophe.kerello(a)foss.st.com>
Fixes: 828f6df1bcba ("mtd: rawnand: Clarify conditions to enable continuous reads")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
drivers/mtd/nand/raw/nand_base.c | 38 ++++++++++++++++++--------------
1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index 3b3ce2926f5d..bcfd99a1699f 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -3466,30 +3466,36 @@ static void rawnand_enable_cont_reads(struct nand_chip *chip, unsigned int page,
u32 readlen, int col)
{
struct mtd_info *mtd = nand_to_mtd(chip);
- unsigned int end_page, end_col;
+ unsigned int first_page, last_page;
chip->cont_read.ongoing = false;
if (!chip->controller->supported_op.cont_read)
return;
- end_page = DIV_ROUND_UP(col + readlen, mtd->writesize);
- end_col = (col + readlen) % mtd->writesize;
+ /*
+ * Don't bother making any calculations if the length is too small.
+ * Side effect: avoids possible integer underflows below.
+ */
+ if (readlen < (2 * mtd->writesize))
+ return;
+ /* Derive the page where continuous read should start (the first full page read) */
+ first_page = page;
if (col)
- page++;
-
- if (end_col && end_page)
- end_page--;
-
- if (page + 1 > end_page)
- return;
-
- chip->cont_read.first_page = page;
- chip->cont_read.last_page = end_page;
- chip->cont_read.ongoing = true;
-
- rawnand_cap_cont_reads(chip);
+ first_page++;
+
+ /* Derive the page where continuous read should stop (the last full page read) */
+ last_page = page + ((col + readlen) / mtd->writesize) - 1;
+
+ /* Configure and enable continuous read when suitable */
+ if (first_page < last_page) {
+ chip->cont_read.first_page = first_page;
+ chip->cont_read.last_page = last_page;
+ chip->cont_read.ongoing = true;
+ /* May reset the ongoing flag */
+ rawnand_cap_cont_reads(chip);
+ }
}
static void rawnand_cont_read_skip_first_page(struct nand_chip *chip, unsigned int page)
--
2.34.1
While crossing a LUN boundary, it is probably safer (and clearer) to
keep all members of the continuous read structure aligned, including the
pause page (which is the last page of the LUN or the last page of the
continuous read). Once these members are properly in sync, we can use the
rawnand_cap_cont_reads() helper everywhere to "prepare" the next
continuous read if there is one.
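As a worked example (a hypothetical device with 1024 pages per LUN is
assumed), with first_page = 1023 (last page of LUN 0) and last_page = 1100,
rawnand_cap_cont_reads() now behaves as follows:

  1. pause_page is first capped to 1023, the end of LUN 0.
  2. first_page == pause_page, so first_page advances to 1024 and
     pause_page is re-capped to min(1100, 2047) = 1100.
  3. first_page (1024) < last_page (1100), so the read stays ongoing,
     with all members of the structure kept in sync across the boundary.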
Fixes: bbcd80f53a5e ("mtd: rawnand: Prevent crossing LUN boundaries during sequential reads")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
---
This is not 100% a fix but I believe it is worth backporting as there
may be corner cases which were not identified with the initial
implementation.
---
drivers/mtd/nand/raw/nand_base.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c
index d6a27e08b112..4d5a663e4e05 100644
--- a/drivers/mtd/nand/raw/nand_base.c
+++ b/drivers/mtd/nand/raw/nand_base.c
@@ -1232,6 +1232,15 @@ static void rawnand_cap_cont_reads(struct nand_chip *chip)
chip->cont_read.pause_page = rawnand_last_page_of_lun(ppl, first_lun);
else
chip->cont_read.pause_page = chip->cont_read.last_page;
+
+ if (chip->cont_read.first_page == chip->cont_read.pause_page) {
+ chip->cont_read.first_page++;
+ chip->cont_read.pause_page = min(chip->cont_read.last_page,
+ rawnand_last_page_of_lun(ppl, first_lun + 1));
+ }
+
+ if (chip->cont_read.first_page >= chip->cont_read.last_page)
+ chip->cont_read.ongoing = false;
}
static int nand_lp_exec_cont_read_page_op(struct nand_chip *chip, unsigned int page,
@@ -1298,12 +1307,11 @@ static int nand_lp_exec_cont_read_page_op(struct nand_chip *chip, unsigned int p
if (!chip->cont_read.ongoing)
return 0;
- if (page == chip->cont_read.pause_page &&
- page != chip->cont_read.last_page) {
- chip->cont_read.first_page = chip->cont_read.pause_page + 1;
- rawnand_cap_cont_reads(chip);
- } else if (page == chip->cont_read.last_page) {
+ if (page == chip->cont_read.last_page) {
chip->cont_read.ongoing = false;
+ } else if (page == chip->cont_read.pause_page) {
+ chip->cont_read.first_page++;
+ rawnand_cap_cont_reads(chip);
}
return 0;
@@ -3510,10 +3518,7 @@ static void rawnand_cont_read_skip_first_page(struct nand_chip *chip, unsigned i
return;
chip->cont_read.first_page++;
- if (chip->cont_read.first_page == chip->cont_read.pause_page)
- chip->cont_read.first_page++;
- if (chip->cont_read.first_page >= chip->cont_read.last_page)
- chip->cont_read.ongoing = false;
+ rawnand_cap_cont_reads(chip);
}
/**
--
2.34.1
To add another datapoint to this - I've seen the same problem on Dell
PowerEdge R6615 servers... but no others.
The problem also crept into the 6.1.79 kernel with the commit
mentioned earlier, and is fixed by reverting that commit. Adding
nogbpages to the kernel command line can cause the failure to
reproduce on that hardware as well.
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to the previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
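In short, the ordering (a sketch of the constraint, not the exact code):

  /* async completion handler, after the fix:
   *     1. schedule_delayed_work(&ctx->tx_work.work, 1);
   *     2. complete(&ctx->async_wait.completion);
   * submitting thread (recvmsg/sendmsg):
   *     1. waits on ctx->async_wait.completion
   *     2. may exit immediately afterwards, so the handler must not
   *        touch anything after calling complete(). */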
Reported-by: valis <sec(a)valis.email>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 6db22d6c7a6dc914b12c0469b94eb639b6a8a146)
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
net/tls/tls_sw.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 910da98d6bfb3..58d9b5c06cf89 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -440,7 +440,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
struct scatterlist *sge;
struct sk_msg *msg_en;
struct tls_rec *rec;
- bool ready = false;
int pending;
rec = container_of(aead_req, struct tls_rec, aead_req);
@@ -472,8 +471,12 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
spin_lock_bh(&ctx->encrypt_compl_lock);
@@ -482,13 +485,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
if (!pending && ctx->async_notify)
complete(&ctx->async_wait.completion);
spin_unlock_bh(&ctx->encrypt_compl_lock);
-
- if (!ready)
- return;
-
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
}
static int tls_do_encryption(struct sock *sk,
--
2.44.0.278.ge034bb2e1d-goog
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to the previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec(a)valis.email>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 6db22d6c7a6dc914b12c0469b94eb639b6a8a146)
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
net/tls/tls_sw.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 46f1c19f7c60b..25a408206b3e0 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -443,7 +443,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
struct scatterlist *sge;
struct sk_msg *msg_en;
struct tls_rec *rec;
- bool ready = false;
int pending;
rec = container_of(aead_req, struct tls_rec, aead_req);
@@ -475,8 +474,12 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
spin_lock_bh(&ctx->encrypt_compl_lock);
@@ -485,13 +488,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
if (!pending && ctx->async_notify)
complete(&ctx->async_wait.completion);
spin_unlock_bh(&ctx->encrypt_compl_lock);
-
- if (!ready)
- return;
-
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
}
static int tls_do_encryption(struct sock *sk,
--
2.44.0.278.ge034bb2e1d-goog
From: Jakub Kicinski <kuba(a)kernel.org>
[ Upstream commit e01e3934a1b2d122919f73bc6ddbe1cdafc4bbdb ]
Similarly to the previous commit, the submitting thread (recvmsg/sendmsg)
may exit as soon as the async crypto handler calls complete().
Reorder scheduling the work before calling complete().
This seems more logical in the first place, as it's
the inverse order of what the submitting thread will do.
Reported-by: valis <sec(a)valis.email>
Fixes: a42055e8d2c3 ("net/tls: Add support for async encryption of records for performance")
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Sabrina Dubroca <sd(a)queasysnail.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
(cherry picked from commit 6db22d6c7a6dc914b12c0469b94eb639b6a8a146)
[Lee: Fixed merge-conflict in Stable branches linux-6.1.y and older]
Signed-off-by: Lee Jones <lee(a)kernel.org>
---
net/tls/tls_sw.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index fc55b65695e5c..e51dc9d02b4e7 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -448,7 +448,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
struct scatterlist *sge;
struct sk_msg *msg_en;
struct tls_rec *rec;
- bool ready = false;
int pending;
rec = container_of(aead_req, struct tls_rec, aead_req);
@@ -480,8 +479,12 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
/* If received record is at head of tx_list, schedule tx */
first_rec = list_first_entry(&ctx->tx_list,
struct tls_rec, list);
- if (rec == first_rec)
- ready = true;
+ if (rec == first_rec) {
+ /* Schedule the transmission */
+ if (!test_and_set_bit(BIT_TX_SCHEDULED,
+ &ctx->tx_bitmask))
+ schedule_delayed_work(&ctx->tx_work.work, 1);
+ }
}
spin_lock_bh(&ctx->encrypt_compl_lock);
@@ -490,13 +493,6 @@ static void tls_encrypt_done(struct crypto_async_request *req, int err)
if (!pending && ctx->async_notify)
complete(&ctx->async_wait.completion);
spin_unlock_bh(&ctx->encrypt_compl_lock);
-
- if (!ready)
- return;
-
- /* Schedule the transmission */
- if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
- schedule_delayed_work(&ctx->tx_work.work, 1);
}
static int tls_do_encryption(struct sock *sk,
--
2.44.0.278.ge034bb2e1d-goog