In commit 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
---
sound/soc/codecs/cs42l51-i2c.c | 6 ++++++
sound/soc/codecs/cs42l51.c | 7 -------
sound/soc/codecs/cs42l51.h | 1 -
3 files changed, 6 insertions(+), 8 deletions(-)
diff --git a/sound/soc/codecs/cs42l51-i2c.c b/sound/soc/codecs/cs42l51-i2c.c
index b2106ff6a7cb..e7db7bcd0296 100644
--- a/sound/soc/codecs/cs42l51-i2c.c
+++ b/sound/soc/codecs/cs42l51-i2c.c
@@ -19,6 +19,12 @@ static struct i2c_device_id cs42l51_i2c_id[] = {
};
MODULE_DEVICE_TABLE(i2c, cs42l51_i2c_id);
+const struct of_device_id cs42l51_of_match[] = {
+ { .compatible = "cirrus,cs42l51", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, cs42l51_of_match);
+
static int cs42l51_i2c_probe(struct i2c_client *i2c)
{
struct regmap_config config;
diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c
index a67cd3ee84e0..a7079ae0ca09 100644
--- a/sound/soc/codecs/cs42l51.c
+++ b/sound/soc/codecs/cs42l51.c
@@ -823,13 +823,6 @@ int __maybe_unused cs42l51_resume(struct device *dev)
}
EXPORT_SYMBOL_GPL(cs42l51_resume);
-const struct of_device_id cs42l51_of_match[] = {
- { .compatible = "cirrus,cs42l51", },
- { }
-};
-MODULE_DEVICE_TABLE(of, cs42l51_of_match);
-EXPORT_SYMBOL_GPL(cs42l51_of_match);
-
MODULE_AUTHOR("Arnaud Patard <arnaud.patard(a)rtp-net.org>");
MODULE_DESCRIPTION("Cirrus Logic CS42L51 ALSA SoC Codec Driver");
MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs42l51.h b/sound/soc/codecs/cs42l51.h
index a79343e8a54e..125703ede113 100644
--- a/sound/soc/codecs/cs42l51.h
+++ b/sound/soc/codecs/cs42l51.h
@@ -16,7 +16,6 @@ int cs42l51_probe(struct device *dev, struct regmap *regmap);
void cs42l51_remove(struct device *dev);
int __maybe_unused cs42l51_suspend(struct device *dev);
int __maybe_unused cs42l51_resume(struct device *dev);
-extern const struct of_device_id cs42l51_of_match[];
#define CS42L51_CHIP_ID 0x1B
#define CS42L51_CHIP_REV_A 0x00
--
2.41.0
From: Fedor Ross <fedor.ross(a)ifm.com>
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v5…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 10 ++++++++--
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..eebf967f4711 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max_t(unsigned long, timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
--
2.40.1
If the gs_usb device driver is unloaded (or unbound) before the
interface is shut down, the USB stack first calls the struct
usb_driver::disconnect and then the struct net_device_ops::ndo_stop
callback.
In gs_usb_disconnect() all pending bulk URBs are killed, i.e. no more
RX'ed CAN frames are send from the USB device to the host. Later in
gs_can_close() a reset control message is send to each CAN channel to
remove the controller from the CAN bus. In this race window the USB
device can still receive CAN frames from the bus and internally queue
them to be send to the host.
At least in the current version of the candlelight firmware, the queue
of received CAN frames is not emptied during the reset command. After
loading (or binding) the gs_usb driver, new URBs are submitted during
the struct net_device_ops::ndo_open callback and the candlelight
firmware starts sending its already queued CAN frames to the host.
However, this scenario was not considered when implementing the
hardware timestamp function. The cycle counter/time counter
infrastructure is set up (gs_usb_timestamp_init()) after the USBs are
submitted, resulting in a NULL pointer dereference if
timecounter_cyc2time() (via the call chain:
gs_usb_receive_bulk_callback() -> gs_usb_set_timestamp() ->
gs_usb_skb_set_timestamp()) is called too early.
Move the gs_usb_timestamp_init() function before the URBs are
submitted to fix this problem.
For a comprehensive solution, we need to consider gs_usb devices with
more than 1 channel. The cycle counter/time counter infrastructure is
setup per channel, but the RX URBs are per device. Once gs_can_open()
of _a_ channel has been called, and URBs have been submitted, the
gs_usb_receive_bulk_callback() can be called for _all_ available
channels, even for channels that are not running, yet. As cycle
counter/time counter has not set up, this will again lead to a NULL
pointer dereference.
Convert the cycle counter/time counter from a "per channel" to a "per
device" functionality. Also set it up, before submitting any URBs to
the device.
Further in gs_usb_receive_bulk_callback(), don't process any URBs for
not started CAN channels, only resubmit the URB.
Fixes: 45dfa45f52e6 ("can: gs_usb: add RX and TX hardware timestamp support")
Closes: https://github.com/candle-usb/candleLight_fw/issues/137#issuecomment-162353…
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-2-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/gs_usb.c | 101 ++++++++++++++++++-----------------
1 file changed, 53 insertions(+), 48 deletions(-)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index 85b7b59c8426..f418066569fc 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -303,12 +303,6 @@ struct gs_can {
struct can_bittiming_const bt_const, data_bt_const;
unsigned int channel; /* channel number */
- /* time counter for hardware timestamps */
- struct cyclecounter cc;
- struct timecounter tc;
- spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */
- struct delayed_work timestamp;
-
u32 feature;
unsigned int hf_size_tx;
@@ -325,6 +319,13 @@ struct gs_usb {
struct gs_can *canch[GS_MAX_INTF];
struct usb_anchor rx_submitted;
struct usb_device *udev;
+
+ /* time counter for hardware timestamps */
+ struct cyclecounter cc;
+ struct timecounter tc;
+ spinlock_t tc_lock; /* spinlock to guard access tc->cycle_last */
+ struct delayed_work timestamp;
+
unsigned int hf_size_rx;
u8 active_channels;
};
@@ -388,15 +389,15 @@ static int gs_cmd_reset(struct gs_can *dev)
GFP_KERNEL);
}
-static inline int gs_usb_get_timestamp(const struct gs_can *dev,
+static inline int gs_usb_get_timestamp(const struct gs_usb *parent,
u32 *timestamp_p)
{
__le32 timestamp;
int rc;
- rc = usb_control_msg_recv(dev->udev, 0, GS_USB_BREQ_TIMESTAMP,
+ rc = usb_control_msg_recv(parent->udev, 0, GS_USB_BREQ_TIMESTAMP,
USB_DIR_IN | USB_TYPE_VENDOR | USB_RECIP_INTERFACE,
- dev->channel, 0,
+ 0, 0,
×tamp, sizeof(timestamp),
USB_CTRL_GET_TIMEOUT,
GFP_KERNEL);
@@ -410,20 +411,20 @@ static inline int gs_usb_get_timestamp(const struct gs_can *dev,
static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev->tc_lock)
{
- struct gs_can *dev = container_of(cc, struct gs_can, cc);
+ struct gs_usb *parent = container_of(cc, struct gs_usb, cc);
u32 timestamp = 0;
int err;
- lockdep_assert_held(&dev->tc_lock);
+ lockdep_assert_held(&parent->tc_lock);
/* drop lock for synchronous USB transfer */
- spin_unlock_bh(&dev->tc_lock);
- err = gs_usb_get_timestamp(dev, ×tamp);
- spin_lock_bh(&dev->tc_lock);
+ spin_unlock_bh(&parent->tc_lock);
+ err = gs_usb_get_timestamp(parent, ×tamp);
+ spin_lock_bh(&parent->tc_lock);
if (err)
- netdev_err(dev->netdev,
- "Error %d while reading timestamp. HW timestamps may be inaccurate.",
- err);
+ dev_err(&parent->udev->dev,
+ "Error %d while reading timestamp. HW timestamps may be inaccurate.",
+ err);
return timestamp;
}
@@ -431,14 +432,14 @@ static u64 gs_usb_timestamp_read(const struct cyclecounter *cc) __must_hold(&dev
static void gs_usb_timestamp_work(struct work_struct *work)
{
struct delayed_work *delayed_work = to_delayed_work(work);
- struct gs_can *dev;
+ struct gs_usb *parent;
- dev = container_of(delayed_work, struct gs_can, timestamp);
- spin_lock_bh(&dev->tc_lock);
- timecounter_read(&dev->tc);
- spin_unlock_bh(&dev->tc_lock);
+ parent = container_of(delayed_work, struct gs_usb, timestamp);
+ spin_lock_bh(&parent->tc_lock);
+ timecounter_read(&parent->tc);
+ spin_unlock_bh(&parent->tc_lock);
- schedule_delayed_work(&dev->timestamp,
+ schedule_delayed_work(&parent->timestamp,
GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ);
}
@@ -446,37 +447,38 @@ static void gs_usb_skb_set_timestamp(struct gs_can *dev,
struct sk_buff *skb, u32 timestamp)
{
struct skb_shared_hwtstamps *hwtstamps = skb_hwtstamps(skb);
+ struct gs_usb *parent = dev->parent;
u64 ns;
- spin_lock_bh(&dev->tc_lock);
- ns = timecounter_cyc2time(&dev->tc, timestamp);
- spin_unlock_bh(&dev->tc_lock);
+ spin_lock_bh(&parent->tc_lock);
+ ns = timecounter_cyc2time(&parent->tc, timestamp);
+ spin_unlock_bh(&parent->tc_lock);
hwtstamps->hwtstamp = ns_to_ktime(ns);
}
-static void gs_usb_timestamp_init(struct gs_can *dev)
+static void gs_usb_timestamp_init(struct gs_usb *parent)
{
- struct cyclecounter *cc = &dev->cc;
+ struct cyclecounter *cc = &parent->cc;
cc->read = gs_usb_timestamp_read;
cc->mask = CYCLECOUNTER_MASK(32);
cc->shift = 32 - bits_per(NSEC_PER_SEC / GS_USB_TIMESTAMP_TIMER_HZ);
cc->mult = clocksource_hz2mult(GS_USB_TIMESTAMP_TIMER_HZ, cc->shift);
- spin_lock_init(&dev->tc_lock);
- spin_lock_bh(&dev->tc_lock);
- timecounter_init(&dev->tc, &dev->cc, ktime_get_real_ns());
- spin_unlock_bh(&dev->tc_lock);
+ spin_lock_init(&parent->tc_lock);
+ spin_lock_bh(&parent->tc_lock);
+ timecounter_init(&parent->tc, &parent->cc, ktime_get_real_ns());
+ spin_unlock_bh(&parent->tc_lock);
- INIT_DELAYED_WORK(&dev->timestamp, gs_usb_timestamp_work);
- schedule_delayed_work(&dev->timestamp,
+ INIT_DELAYED_WORK(&parent->timestamp, gs_usb_timestamp_work);
+ schedule_delayed_work(&parent->timestamp,
GS_USB_TIMESTAMP_WORK_DELAY_SEC * HZ);
}
-static void gs_usb_timestamp_stop(struct gs_can *dev)
+static void gs_usb_timestamp_stop(struct gs_usb *parent)
{
- cancel_delayed_work_sync(&dev->timestamp);
+ cancel_delayed_work_sync(&parent->timestamp);
}
static void gs_update_state(struct gs_can *dev, struct can_frame *cf)
@@ -560,6 +562,9 @@ static void gs_usb_receive_bulk_callback(struct urb *urb)
if (!netif_device_present(netdev))
return;
+ if (!netif_running(netdev))
+ goto resubmit_urb;
+
if (hf->echo_id == -1) { /* normal rx */
if (hf->flags & GS_CAN_FLAG_FD) {
skb = alloc_canfd_skb(dev->netdev, &cfd);
@@ -856,6 +861,9 @@ static int gs_can_open(struct net_device *netdev)
}
if (!parent->active_channels) {
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_init(parent);
+
for (i = 0; i < GS_MAX_RX_URBS; i++) {
u8 *buf;
@@ -926,13 +934,9 @@ static int gs_can_open(struct net_device *netdev)
flags |= GS_CAN_MODE_FD;
/* if hardware supports timestamps, enable it */
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP) {
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
flags |= GS_CAN_MODE_HW_TIMESTAMP;
- /* start polling timestamp */
- gs_usb_timestamp_init(dev);
- }
-
/* finally start device */
dev->can.state = CAN_STATE_ERROR_ACTIVE;
dm.flags = cpu_to_le32(flags);
@@ -942,8 +946,6 @@ static int gs_can_open(struct net_device *netdev)
GFP_KERNEL);
if (rc) {
netdev_err(netdev, "Couldn't start device (err=%d)\n", rc);
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
- gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
goto out_usb_kill_anchored_urbs;
@@ -960,9 +962,13 @@ static int gs_can_open(struct net_device *netdev)
out_usb_free_urb:
usb_free_urb(urb);
out_usb_kill_anchored_urbs:
- if (!parent->active_channels)
+ if (!parent->active_channels) {
usb_kill_anchored_urbs(&dev->tx_submitted);
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_stop(parent);
+ }
+
close_candev(netdev);
return rc;
@@ -1011,14 +1017,13 @@ static int gs_can_close(struct net_device *netdev)
netif_stop_queue(netdev);
- /* stop polling timestamp */
- if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
- gs_usb_timestamp_stop(dev);
-
/* Stop polling */
parent->active_channels--;
if (!parent->active_channels) {
usb_kill_anchored_urbs(&parent->rx_submitted);
+
+ if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
+ gs_usb_timestamp_stop(parent);
}
/* Stop sending URBs */
--
2.40.1
The gs_usb driver handles USB devices with more than 1 CAN channel.
The RX path for all channels share the same bulk endpoint (the
transmitted bulk data encodes the channel number). These per-device
resources are allocated and submitted by the first opened channel.
During this allocation, the resources are either released immediately
in case of a failure or the URBs are anchored. All anchored URBs are
finally killed with gs_usb_disconnect().
Currently, gs_can_open() returns with an error if the allocation of a
URB or a buffer fails. However, if usb_submit_urb() fails, the driver
continues with the URBs submitted so far, even if no URBs were
successfully submitted.
Treat every error as fatal and free all allocated resources
immediately.
Switch to goto-style error handling, to prepare the driver for more
per-device resource allocation.
Cc: stable(a)vger.kernel.org
Cc: John Whittington <git(a)jbrengineering.co.uk>
Link: https://lore.kernel.org/all/20230716-gs_usb-fix-time-stamp-counter-v1-1-901…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/gs_usb.c | 31 ++++++++++++++++++++++---------
1 file changed, 22 insertions(+), 9 deletions(-)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index d476c2884008..85b7b59c8426 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -833,6 +833,7 @@ static int gs_can_open(struct net_device *netdev)
.mode = cpu_to_le32(GS_CAN_MODE_START),
};
struct gs_host_frame *hf;
+ struct urb *urb = NULL;
u32 ctrlmode;
u32 flags = 0;
int rc, i;
@@ -856,13 +857,14 @@ static int gs_can_open(struct net_device *netdev)
if (!parent->active_channels) {
for (i = 0; i < GS_MAX_RX_URBS; i++) {
- struct urb *urb;
u8 *buf;
/* alloc rx urb */
urb = usb_alloc_urb(0, GFP_KERNEL);
- if (!urb)
- return -ENOMEM;
+ if (!urb) {
+ rc = -ENOMEM;
+ goto out_usb_kill_anchored_urbs;
+ }
/* alloc rx buffer */
buf = kmalloc(dev->parent->hf_size_rx,
@@ -870,8 +872,8 @@ static int gs_can_open(struct net_device *netdev)
if (!buf) {
netdev_err(netdev,
"No memory left for USB buffer\n");
- usb_free_urb(urb);
- return -ENOMEM;
+ rc = -ENOMEM;
+ goto out_usb_free_urb;
}
/* fill, anchor, and submit rx urb */
@@ -894,9 +896,7 @@ static int gs_can_open(struct net_device *netdev)
netdev_err(netdev,
"usb_submit failed (err=%d)\n", rc);
- usb_unanchor_urb(urb);
- usb_free_urb(urb);
- break;
+ goto out_usb_unanchor_urb;
}
/* Drop reference,
@@ -945,7 +945,8 @@ static int gs_can_open(struct net_device *netdev)
if (dev->feature & GS_CAN_FEATURE_HW_TIMESTAMP)
gs_usb_timestamp_stop(dev);
dev->can.state = CAN_STATE_STOPPED;
- return rc;
+
+ goto out_usb_kill_anchored_urbs;
}
parent->active_channels++;
@@ -953,6 +954,18 @@ static int gs_can_open(struct net_device *netdev)
netif_start_queue(netdev);
return 0;
+
+out_usb_unanchor_urb:
+ usb_unanchor_urb(urb);
+out_usb_free_urb:
+ usb_free_urb(urb);
+out_usb_kill_anchored_urbs:
+ if (!parent->active_channels)
+ usb_kill_anchored_urbs(&dev->tx_submitted);
+
+ close_candev(netdev);
+
+ return rc;
}
static int gs_usb_get_state(const struct net_device *netdev,
--
2.40.1
From: Fedor Ross <fedor.ross(a)ifm.com>
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Hello,
picking up Fedor's and Marek's work. I decided to make it a minimal
patch and add stable on Cc. The mentioned cleanup patch that replaces
736 by can_frame_bits() can be done later and will go upstream via
can-next.
regards,
Marc
---
Changes in v5:
- use max_t() instead of max() to fix compilation on arm64
- Link to v4: https://lore.kernel.org/all/20230717-mcp251xfd-fix-increase-poll-timeout-v4…
Changes in v4:
- fix division by 0, if bit rate is not set yet
- Link to v3: https://lore.kernel.org/all/20230717100815.75764-1-mkl@pengutronix.de
Changes in v3:
- use 736 as max CAN frame length, calculated by Vincent Mailhol's
80a2fbce456e ("can: length: refactor frame lengths definition to add size in bits")
- update commit message
- drop patch 2/2
- Link to v2: https://lore.kernel.org/all/20230505222820.126441-1-marex@denx.de
Changes in v2:
- Add macros for CAN_BIT_STUFFING_OVERHEAD and CAN_IDLE_CONDITION_SAMPLES
(thanks Thomas, but please double check the comments)
- Update commit message
- Link to v1: https://lore.kernel.org/all/20230504195059.4706-1-marex@denx.de
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 10 ++++++++--
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..eebf967f4711 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max_t(unsigned long, timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
---
base-commit: 0dd1805fe498e0cf64f68e451a8baff7e64494ec
change-id: 20230717-mcp251xfd-fix-increase-poll-timeout-1f23b0e519b0
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
Initialize local struct spi_mem_op at declaration to avoid having
garbage data from stack for members that were not explicitly
initialized afterwards. Zeroise the local struct after the first
use, so that we have it clean for the second use.
Fixes: d73ee7534cc5 ("mtd: spi-nor: core: perform a Soft Reset on shutdown")
Cc: stable(a)vger.kernel.org
Signed-off-by: Tudor Ambarus <tudor.ambarus(a)linaro.org>
---
drivers/mtd/spi-nor/core.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 273258f7e77f..603791497523 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -3235,11 +3235,9 @@ static int spi_nor_init(struct spi_nor *nor)
*/
static void spi_nor_soft_reset(struct spi_nor *nor)
{
- struct spi_mem_op op;
+ struct spi_mem_op op = SPINOR_SRSTEN_OP;
int ret;
- op = (struct spi_mem_op)SPINOR_SRSTEN_OP;
-
spi_nor_spimem_setup_op(nor, &op, nor->reg_proto);
ret = spi_mem_exec_op(nor->spimem, &op);
@@ -3248,6 +3246,7 @@ static void spi_nor_soft_reset(struct spi_nor *nor)
return;
}
+ memset(&op, 0, sizeof(op));
op = (struct spi_mem_op)SPINOR_SRST_OP;
spi_nor_spimem_setup_op(nor, &op, nor->reg_proto);
--
2.34.1
From: Fedor Ross <fedor.ross(a)ifm.com>
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Cc: Manivannan Sadhasivam <mani(a)kernel.org>
Cc: Thomas Kopp <thomas.kopp(a)microchip.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Hello,
picking up Fedor's and Marek's work. I decided to make it a minimal
patch and add stable on Cc. The mentioned cleanup patch that replaces
736 by can_frame_bits() can be done later and will go upstream via
can-next.
regards,
Marc
v3:
- use 736 as max CAN frame length, calculated by Vincent Mailhol's
80a2fbce456e ("can: length: refactor frame lengths definition to add size in bits")
- update commit message
- drop patch 2/2
v2: https://lore.kernel.org/all/20230505222820.126441-1-marex@denx.de
- Add macros for CAN_BIT_STUFFING_OVERHEAD and CAN_IDLE_CONDITION_SAMPLES
(thanks Thomas, but please double check the comments)
- Update commit message
v1: https://lore.kernel.org/all/20230504195059.4706-1-marex@denx.de
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 4 +++-
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
2 files changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..876e8e3cbb0b 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,7 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -251,7 +252,8 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ max_t(unsigned long, MCP251XFD_POLL_TIMEOUT_US,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC / bt->bitrate));
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
--
2.40.1
This reverts commit b138e23d3dff90c0494925b4c1874227b81bddf7.
AutoRetry has been found to sometimes cause controller freezes when
communicating with buggy USB devices.
This controller feature allows the controller in host mode to send
non-terminating/burst retry ACKs instead of terminating retry ACKs
to devices when a transaction error (CRC error or overflow) occurs.
Unfortunately, if the USB device continues to respond with a CRC error,
the controller will not complete endpoint-related commands while it
keeps trying to auto-retry. [3] The xHCI driver will notice this once
it tries to abort the transfer using a Stop Endpoint command and
does not receive a completion in time. [1]
This situation is reported to dmesg:
[sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00
xhci-hcd: xHCI host not responding to stop endpoint command
xhci-hcd: xHCI host controller not responding, assume dead
xhci-hcd: HC died; cleaning up
Some users observed this problem on an Odroid HC2 with the JMS578
USB3-to-SATA bridge. The issue can be triggered by starting
a read-heavy workload on an attached SSD. After a while, the host
controller would die and the SSD would disappear from the system. [1]
Further analysis by Synopsys determined that controller revisions
other than the one in Odroid HC2 are also affected by this.
The recommended solution was to disable AutoRetry altogether.
This change does not have a noticeable performance impact. [2]
Revert the enablement commit. This will keep the AutoRetry bit in
the default state configured during SoC design [2].
Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller")
Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gm… [1]
Link: https://lore.kernel.org/r/20230711214834.kyr6ulync32d4ktk@synopsys.com/ [2]
Link: https://lore.kernel.org/r/20230712225518.2smu7wse6djc7l5o@synopsys.com/ [3]
Cc: stable(a)vger.kernel.org
Cc: Mauro Ribeiro <mauro.ribeiro(a)hardkernel.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Jakub Vanek <linuxtardis(a)gmail.com>
---
V2 -> V3: Include more findings in changelog
V1 -> V2: Updated to disable AutoRetry everywhere based on Synopsys feedback
Reworded the changelog a bit to make it clearer
drivers/usb/dwc3/core.c | 16 ----------------
drivers/usb/dwc3/core.h | 3 ---
2 files changed, 19 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f6689b731718..a4e079d37566 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1209,22 +1209,6 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
}
- if (dwc->dr_mode == USB_DR_MODE_HOST ||
- dwc->dr_mode == USB_DR_MODE_OTG) {
- reg = dwc3_readl(dwc->regs, DWC3_GUCTL);
-
- /*
- * Enable Auto retry Feature to make the controller operating in
- * Host mode on seeing transaction errors(CRC errors or internal
- * overrun scenerios) on IN transfers to reply to the device
- * with a non-terminating retry ACK (i.e, an ACK transcation
- * packet with Retry=1 & Nump != 0)
- */
- reg |= DWC3_GUCTL_HSTINAUTORETRY;
-
- dwc3_writel(dwc->regs, DWC3_GUCTL, reg);
- }
-
/*
* Must config both number of packets and max burst settings to enable
* RX and/or TX threshold.
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8b1295e4dcdd..a69ac67d89fe 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -256,9 +256,6 @@
#define DWC3_GCTL_GBLHIBERNATIONEN BIT(1)
#define DWC3_GCTL_DSBLCLKGTNG BIT(0)
-/* Global User Control Register */
-#define DWC3_GUCTL_HSTINAUTORETRY BIT(14)
-
/* Global User Control 1 Register */
#define DWC3_GUCTL1_DEV_DECOUPLE_L1L2_EVT BIT(31)
#define DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS BIT(28)
--
2.25.1
From: Fedor Ross <fedor.ross(a)ifm.com>
The mcp251xfd controller needs an idle bus to enter 'Normal CAN 2.0
mode' or . The maximum length of a CAN frame is 736 bits (64 data
bytes, CAN-FD, EFF mode, worst case bit stuffing and interframe
spacing). For low bit rates like 10 kbit/s the arbitrarily chosen
MCP251XFD_POLL_TIMEOUT_US of 1 ms is too small.
Otherwise during polling for the CAN controller to enter 'Normal CAN
2.0 mode' the timeout limit is exceeded and the configuration fails
with:
| $ ip link set dev can1 up type can bitrate 10000
| [ 731.911072] mcp251xfd spi2.1 can1: Controller failed to enter mode CAN 2.0 Mode (6) and stays in Configuration Mode (4) (con=0x068b0760, osc=0x00000468).
| [ 731.927192] mcp251xfd spi2.1 can1: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
| [ 731.938101] A link change request failed with some changes committed already. Interface can1 may have been left with an inconsistent configuration, please check.
| RTNETLINK answers: Connection timed out
Make MCP251XFD_POLL_TIMEOUT_US timeout calculation dynamic. Use
maximum of 1ms and bit time of 1 full 64 data bytes CAN-FD frame in
EFF mode, worst case bit stuffing and interframe spacing at the
current bit rate.
For easier backporting define the macro MCP251XFD_FRAME_LEN_MAX_BITS
that holds the max frame length in bits, which is 736. This can be
replaced by can_frame_bits(true, true, true, true, CANFD_MAX_DLEN) in
a cleanup patch later.
Fixes: 55e5b97f003e8 ("can: mcp25xxfd: add driver for Microchip MCP25xxFD SPI CAN")
Signed-off-by: Fedor Ross <fedor.ross(a)ifm.com>
Signed-off-by: Marek Vasut <marex(a)denx.de>
Cc: Vincent Mailhol <mailhol.vincent(a)wanadoo.fr>
Cc: Manivannan Sadhasivam <mani(a)kernel.org>
Cc: Thomas Kopp <thomas.kopp(a)microchip.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/20230717100815.75764-1-mkl@pengutronix.de
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Hello,
picking up Fedor's and Marek's work. I decided to make it a minimal
patch and add stable on Cc. The mentioned cleanup patch that replaces
736 by can_frame_bits() can be done later and will go upstream via
can-next.
regards,
Marc
v4:
- fix division by 0, if bit rate is not set yet
v3: https://lore.kernel.org/all/20230717100815.75764-1-mkl@pengutronix.de/
- use 736 as max CAN frame length, calculated by Vincent Mailhol's
80a2fbce456e ("can: length: refactor frame lengths definition to add size in bits")
- update commit message
- drop patch 2/2
v2: https://lore.kernel.org/all/20230505222820.126441-1-marex@denx.de
- Add macros for CAN_BIT_STUFFING_OVERHEAD and CAN_IDLE_CONDITION_SAMPLES
(thanks Thomas, but please double check the comments)
- Update commit message
v1: https://lore.kernel.org/all/20230504195059.4706-1-marex@denx.de
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c | 10 ++++++++--
drivers/net/can/spi/mcp251xfd/mcp251xfd.h | 1 +
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
index 68df6d4641b5..a1d58d09cc50 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
@@ -227,6 +227,8 @@ static int
__mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
const u8 mode_req, bool nowait)
{
+ const struct can_bittiming *bt = &priv->can.bittiming;
+ unsigned long timeout_us = MCP251XFD_POLL_TIMEOUT_US;
u32 con = 0, con_reqop, osc = 0;
u8 mode;
int err;
@@ -246,12 +248,16 @@ __mcp251xfd_chip_set_mode(const struct mcp251xfd_priv *priv,
if (mode_req == MCP251XFD_REG_CON_MODE_SLEEP || nowait)
return 0;
+ if (bt->bitrate)
+ timeout_us = max(timeout_us,
+ MCP251XFD_FRAME_LEN_MAX_BITS * USEC_PER_SEC /
+ bt->bitrate);
+
err = regmap_read_poll_timeout(priv->map_reg, MCP251XFD_REG_CON, con,
!mcp251xfd_reg_invalid(con) &&
FIELD_GET(MCP251XFD_REG_CON_OPMOD_MASK,
con) == mode_req,
- MCP251XFD_POLL_SLEEP_US,
- MCP251XFD_POLL_TIMEOUT_US);
+ MCP251XFD_POLL_SLEEP_US, timeout_us);
if (err != -ETIMEDOUT && err != -EBADMSG)
return err;
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
index 7024ff0cc2c0..24510b3b8020 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd.h
@@ -387,6 +387,7 @@ static_assert(MCP251XFD_TIMESTAMP_WORK_DELAY_SEC <
#define MCP251XFD_OSC_STAB_TIMEOUT_US (10 * MCP251XFD_OSC_STAB_SLEEP_US)
#define MCP251XFD_POLL_SLEEP_US (10)
#define MCP251XFD_POLL_TIMEOUT_US (USEC_PER_MSEC)
+#define MCP251XFD_FRAME_LEN_MAX_BITS (736)
/* Misc */
#define MCP251XFD_NAPI_WEIGHT 32
---
base-commit: 0dd1805fe498e0cf64f68e451a8baff7e64494ec
change-id: 20230717-mcp251xfd-fix-increase-poll-timeout-1f23b0e519b0
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
Don't assume that the device is fully under the control of PCI core.
Use RMW capability accessors in link retraining which do proper locking
to avoid losing concurrent updates to the register values.
Fixes: 4ec73791a64b ("PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
pci/enumeration branch moves the link retraining code into PCI core and
also conflicts with a link retraining fix in pci/aspm. The changelog
(and patch splitting) takes the move into account by not referring to
ASPM while the change itself is not based on pci/enumeration (as per
Bjorn's preference).
---
drivers/pci/pci.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..f7315b13bb82 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4927,7 +4927,6 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
{
int rc;
- u16 lnkctl;
/*
* Ensure the updated LNKCTL parameters are used during link
@@ -4939,17 +4938,14 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
if (rc)
return rc;
- pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
- lnkctl |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_set_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
if (pdev->clear_retrain_link) {
/*
* Due to an erratum in some devices the Retrain Link bit
* needs to be cleared again manually to allow the link
* training to succeed.
*/
- lnkctl &= ~PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
}
return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
--
2.30.2
Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and
this could result in losing the changes one of the writers intended to
make.
Add pcie_cap_lock spinlock into the struct pci_dev and use it to
protect bit changes made in the RMW capability accessors. Protect only
a selected set of registers by differentiating the RMW accessor
internally to locked/unlocked variants using a wrapper which has the
same signature as pcie_capability_clear_and_set_word(). As the
Capability Register (pos) given to the wrapper is always a constant,
the compiler should be able to simplify all the dead-code away.
So far only the Link Control Register (ASPM, hotplug, link retraining,
various drivers) and the Root Control Register (AER & PME) seem to
require RMW locking.
Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63eb1 ("r8169: more 8168dp support.")
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/pci/access.c | 20 +++++++++++++++++---
drivers/pci/probe.c | 1 +
include/linux/pci.h | 34 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3de58..0b2e90d2f04f 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -497,8 +497,8 @@ int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val)
}
EXPORT_SYMBOL(pcie_capability_write_dword);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set)
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
{
int ret;
u16 val;
@@ -512,7 +512,21 @@ int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
return ret;
}
-EXPORT_SYMBOL(pcie_capability_clear_and_set_word);
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_unlocked);
+
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&dev->pcie_cap_lock, flags);
+ ret = pcie_capability_clear_and_set_word_unlocked(dev, pos, clear, set);
+ spin_unlock_irqrestore(&dev->pcie_cap_lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_locked);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8bac3ce02609..f1587fb0ba71 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2324,6 +2324,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
.end = -1,
};
+ spin_lock_init(&dev->pcie_cap_lock);
#ifdef CONFIG_PCI_MSI
raw_spin_lock_init(&dev->msi_lock);
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c69a2cc1f412..7ee498cd1f37 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,7 @@ struct pci_dev {
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
+ spinlock_t pcie_cap_lock; /* Protects RMW ops in capability accessors */
u32 saved_config_space[16]; /* Config space saved at suspend time */
struct hlist_head saved_cap_space;
int rom_attr_enabled; /* Display of ROM attribute enabled? */
@@ -1217,11 +1218,40 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val);
int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set);
+/**
+ * pcie_capability_clear_and_set_word - RMW accessor for PCI Express Capability Registers
+ * @dev: PCI device structure of the PCI Express device
+ * @pos: PCI Express Capability Register
+ * @clear: Clear bitmask
+ * @set: Set bitmask
+ *
+ * Perform a Read-Modify-Write (RMW) operation using @clear and @set
+ * bitmasks on PCI Express Capability Register at @pos. Certain PCI Express
+ * Capability Registers are accessed concurrently in RMW fashion, hence
+ * require locking which is handled transparently to the caller.
+ */
+static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
+ int pos,
+ u16 clear, u16 set)
+{
+ switch (pos) {
+ case PCI_EXP_LNKCTL:
+ case PCI_EXP_RTCTL:
+ return pcie_capability_clear_and_set_word_locked(dev, pos,
+ clear, set);
+ default:
+ return pcie_capability_clear_and_set_word_unlocked(dev, pos,
+ clear, set);
+ }
+}
+
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
u16 set)
{
--
2.30.2
The length information for available buffer space for CCA
replies is covered with two fields in the T6 header prepended
on each CCA reply: fromcardlen1 and fromcardlen2. The sum of
these both values must not exceed the AP bus limit for this
card (24KB for CEX8, 12KB CEX7 and older) minus the always
present headers.
The current code adjusted the fromcardlen2 value in case
of exceeding the AP bus limit when there was a non-zero
value given from userspace. Some tests now showed that this
was the wrong assumption. Instead the userspace value given for
this field should always be trusted and if the sum of the
wo fields exceeds the AP bus limit for this card the first
field fromcardlen1 should be adjusted instead.
So now the calculation is done with this new insight in mind.
Also some additional checks for overflow have been introduced
and some comments to provide some documentation for future
maintainers of this complicated calculation code.
Furthermore the 128 bytes of fix overhead which is used
in the current code is not correct. Investications showed
that for a reply always the same two header structs are
prepended before a possible payload. So this is also fixed
with this patch.
Signed-off-by: Harald Freudenberger <freude(a)linux.ibm.com>
Cc: stable(a)vger.kernel.org
---
drivers/s390/crypto/zcrypt_msgtype6.c | 45 ++++++++++++++++++++-------
1 file changed, 33 insertions(+), 12 deletions(-)
diff --git a/drivers/s390/crypto/zcrypt_msgtype6.c b/drivers/s390/crypto/zcrypt_msgtype6.c
index 247f0ad38362..5ac110669327 100644
--- a/drivers/s390/crypto/zcrypt_msgtype6.c
+++ b/drivers/s390/crypto/zcrypt_msgtype6.c
@@ -551,6 +551,12 @@ static int xcrb_msg_to_type6_ep11cprb_msgx(bool userspace, struct ap_message *ap
*
* Returns 0 on success or -EINVAL, -EFAULT, -EAGAIN in case of an error.
*/
+struct type86_reply_hdrs {
+ struct type86_hdr hdr;
+ struct type86_fmt2_ext fmt2;
+ /* ... payload may follow ... */
+} __packed;
+
struct type86x_reply {
struct type86_hdr hdr;
struct type86_fmt2_ext fmt2;
@@ -1101,23 +1107,38 @@ static long zcrypt_msgtype6_send_cprb(bool userspace, struct zcrypt_queue *zq,
struct ica_xcRB *xcrb,
struct ap_message *ap_msg)
{
- int rc;
+ unsigned int reply_bufsize_minus_headers =
+ zq->reply.bufsize - sizeof(struct type86_reply_hdrs);
struct response_type *rtype = ap_msg->private;
struct {
struct type6_hdr hdr;
struct CPRBX cprbx;
/* ... more data blocks ... */
} __packed * msg = ap_msg->msg;
-
- /*
- * Set the queue's reply buffer length minus 128 byte padding
- * as reply limit for the card firmware.
- */
- msg->hdr.fromcardlen1 = min_t(unsigned int, msg->hdr.fromcardlen1,
- zq->reply.bufsize - 128);
- if (msg->hdr.fromcardlen2)
- msg->hdr.fromcardlen2 =
- zq->reply.bufsize - msg->hdr.fromcardlen1 - 128;
+ int rc, delta;
+
+ /* limit each of the two from fields to AP bus limit - headers */
+ msg->hdr.fromcardlen1 = min_t(unsigned int,
+ msg->hdr.fromcardlen1,
+ reply_bufsize_minus_headers);
+ msg->hdr.fromcardlen2 = min_t(unsigned int,
+ msg->hdr.fromcardlen2,
+ reply_bufsize_minus_headers);
+
+ /* calculate delta if the sum of both exceeds AP bus limit - headers */
+ delta = msg->hdr.fromcardlen1 + msg->hdr.fromcardlen2
+ - reply_bufsize_minus_headers;
+ if (delta > 0) {
+ /*
+ * Sum exceeds AP bus limit - headers, prune fromcardlen1
+ * (always trust fromcardlen2)
+ */
+ if (delta > msg->hdr.fromcardlen1) {
+ rc = -EINVAL;
+ goto out;
+ }
+ msg->hdr.fromcardlen1 -= delta;
+ }
init_completion(&rtype->work);
rc = ap_queue_message(zq->queue, ap_msg);
@@ -1240,7 +1261,7 @@ static long zcrypt_msgtype6_send_ep11_cprb(bool userspace, struct zcrypt_queue *
* as reply limit for the card firmware.
*/
msg->hdr.fromcardlen1 = zq->reply.bufsize -
- sizeof(struct type86_hdr) - sizeof(struct type86_fmt2_ext);
+ sizeof(struct type86_reply_hdrs);
init_completion(&rtype->work);
rc = ap_queue_message(zq->queue, ap_msg);
--
2.34.1
Hello,
I see that raspberry pi bootloader throws ton of warnings when supplied
DTB file does not contain /__symbols__/ node.
On RPI 1B rev1 it looks like this:
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
dterror: no symbols found
Bootloader also propagates these warnings to kernel via dtb property
chosen/user-warnings and they can be read by simple command:
$ cat /sys/firmware/devicetree/base/chosen/user-warnings
...
Upstream Linux kernel build process by default does not generate
/__symbols__/ node for DTB files, but DTB files provided by raspberrypi
foundation have them for a longer time.
I wanted to look at this issue, but I figured out that it is already
solved by just recent Aurelien's patches:
e925743edc0d ("arm: dts: bcm: Enable device-tree overlay support for RPi devices")
3cdba279c5e9 ("arm64: dts: broadcom: Enable device-tree overlay support for RPi devices")
My testing showed that /__symbols__/ node is required by rpi bootloader
for overlay support even when overlayed DTB file does not use any DTB
symbol (and reference everything via full node path). So seems that
/__symbols__/ node is crucial for rpi bootloader even when symbols from
them are not used at all.
So I would like to ask, would you consider backporting these two
raspberry pi specific patches to stable kernel trees? Upstream kernel
would get rid of those bootloader warnings and also allow users to use
overlayed dtbs...
(Btw, do you know if raspberry pi foundation or broadcom provides source
code of that bootloader? It would be interesting to understand it or
maybe also fix it...)
Dzień dobry,
od wielu lat tworzymy sklepy internetowe dla firm z różnych branż, które działają jak wirtualny przedstawiciel handlowy 24 godziny na dobę, siedem dni w tygodniu.
Nasz model działania sprawia, że nasi Klienci otrzymują więcej niż dotychczas zapytań ofertowych z e-sklepu.
Jest szansa, abyśmy mogli porozmawiać na temat możliwości stworzenia dla Państwa tego typu projektu?
Z pozdrowieniami
Kamil Durjasz
exfat_extract_uni_name copies characters from a given file name entry into
the 'uniname' variable. This variable is actually defined on the stack of
the exfat_readdir() function. According to the definition of
the 'exfat_uni_name' type, the file name should be limited 255 characters
(+ null teminator space), but the exfat_get_uniname_from_ext_entry()
function can write more characters because there is no check if filename
entries exceeds max filename length. This patch add the check not to copy
filename characters when exceeding max filename length.
Cc: stable(a)vger.kernel.org
Cc: Yuezhang Mo <Yuezhang.Mo(a)sony.com>
Reported-by: Maxim Suhanov <dfirblog(a)gmail.com>
Reviewed-by: Sungjong Seo <sj1557.seo(a)samsung.com>
Signed-off-by: Namjae Jeon <linkinjeon(a)kernel.org>
---
fs/exfat/dir.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/fs/exfat/dir.c b/fs/exfat/dir.c
index 957574180a5e..bc48f3329921 100644
--- a/fs/exfat/dir.c
+++ b/fs/exfat/dir.c
@@ -34,6 +34,7 @@ static int exfat_get_uniname_from_ext_entry(struct super_block *sb,
{
int i, err;
struct exfat_entry_set_cache es;
+ unsigned int uni_len = 0, len;
err = exfat_get_dentry_set(&es, sb, p_dir, entry, ES_ALL_ENTRIES);
if (err)
@@ -52,7 +53,10 @@ static int exfat_get_uniname_from_ext_entry(struct super_block *sb,
if (exfat_get_entry_type(ep) != TYPE_EXTEND)
break;
- exfat_extract_uni_name(ep, uniname);
+ len = exfat_extract_uni_name(ep, uniname);
+ uni_len += len;
+ if (len != EXFAT_FILE_NAME_LEN || uni_len >= MAX_NAME_LENGTH)
+ break;
uniname += EXFAT_FILE_NAME_LEN;
}
@@ -1079,7 +1083,8 @@ int exfat_find_dir_entry(struct super_block *sb, struct exfat_inode_info *ei,
if (entry_type == TYPE_EXTEND) {
unsigned short entry_uniname[16], unichar;
- if (step != DIRENT_STEP_NAME) {
+ if (step != DIRENT_STEP_NAME ||
+ name_len >= MAX_NAME_LENGTH) {
step = DIRENT_STEP_FILE;
continue;
}
--
2.25.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b6f3f28f604ba3de4724ad82bea6adb1300c0b5f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071116-umbrella-fog-a65f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b6f3f28f604ba3de4724ad82bea6adb1300c0b5f Mon Sep 17 00:00:00 2001
From: Michael Schmitz <schmitzmic(a)gmail.com>
Date: Wed, 21 Jun 2023 08:17:25 +1200
Subject: [PATCH] block: add overflow checks for Amiga partition support
The Amiga partition parser module uses signed int for partition sector
address and count, which will overflow for disks larger than 1 TB.
Use u64 as type for sector address and size to allow using disks up to
2 TB without LBD support, and disks larger than 2 TB with LBD. The RBD
format allows to specify disk sizes up to 2^128 bytes (though native
OS limitations reduce this somewhat, to max 2^68 bytes), so check for
u64 overflow carefully to protect against overflowing sector_t.
Bail out if sector addresses overflow 32 bits on kernels without LBD
support.
This bug was reported originally in 2012, and the fix was created by
the RDB author, Joanne Dow <jdow(a)earthlink.net>. A patch had been
discussed and reviewed on linux-m68k at that time but never officially
submitted (now resubmitted as patch 1 in this series).
This patch adds additional error checking and warning messages.
Reported-by: Martin Steigerwald <Martin(a)lichtvoll.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=43511
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Message-ID: <201206192146.09327.Martin(a)lichtvoll.de>
Cc: <stable(a)vger.kernel.org> # 5.2
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Reviewed-by: Christoph Hellwig <hch(a)infradead.org>
Link: https://lore.kernel.org/r/20230620201725.7020-4-schmitzmic@gmail.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
index 85c5c79aae48..ed222b9c901b 100644
--- a/block/partitions/amiga.c
+++ b/block/partitions/amiga.c
@@ -11,10 +11,18 @@
#define pr_fmt(fmt) fmt
#include <linux/types.h>
+#include <linux/mm_types.h>
+#include <linux/overflow.h>
#include <linux/affs_hardblocks.h>
#include "check.h"
+/* magic offsets in partition DosEnvVec */
+#define NR_HD 3
+#define NR_SECT 5
+#define LO_CYL 9
+#define HI_CYL 10
+
static __inline__ u32
checksum_block(__be32 *m, int size)
{
@@ -31,9 +39,12 @@ int amiga_partition(struct parsed_partitions *state)
unsigned char *data;
struct RigidDiskBlock *rdb;
struct PartitionBlock *pb;
- sector_t start_sect, nr_sects;
- int blk, part, res = 0;
- int blksize = 1; /* Multiplier for disk block size */
+ u64 start_sect, nr_sects;
+ sector_t blk, end_sect;
+ u32 cylblk; /* rdb_CylBlocks = nr_heads*sect_per_track */
+ u32 nr_hd, nr_sect, lo_cyl, hi_cyl;
+ int part, res = 0;
+ unsigned int blksize = 1; /* Multiplier for disk block size */
int slot = 1;
for (blk = 0; ; blk++, put_dev_sector(sect)) {
@@ -41,7 +52,7 @@ int amiga_partition(struct parsed_partitions *state)
goto rdb_done;
data = read_part_sector(state, blk, §);
if (!data) {
- pr_err("Dev %s: unable to read RDB block %d\n",
+ pr_err("Dev %s: unable to read RDB block %llu\n",
state->disk->disk_name, blk);
res = -1;
goto rdb_done;
@@ -58,12 +69,12 @@ int amiga_partition(struct parsed_partitions *state)
*(__be32 *)(data+0xdc) = 0;
if (checksum_block((__be32 *)data,
be32_to_cpu(rdb->rdb_SummedLongs) & 0x7F)==0) {
- pr_err("Trashed word at 0xd0 in block %d ignored in checksum calculation\n",
+ pr_err("Trashed word at 0xd0 in block %llu ignored in checksum calculation\n",
blk);
break;
}
- pr_err("Dev %s: RDB in block %d has bad checksum\n",
+ pr_err("Dev %s: RDB in block %llu has bad checksum\n",
state->disk->disk_name, blk);
}
@@ -80,10 +91,15 @@ int amiga_partition(struct parsed_partitions *state)
blk = be32_to_cpu(rdb->rdb_PartitionList);
put_dev_sector(sect);
for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
- blk *= blksize; /* Read in terms partition table understands */
+ /* Read in terms partition table understands */
+ if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
+ pr_err("Dev %s: overflow calculating partition block %llu! Skipping partitions %u and beyond\n",
+ state->disk->disk_name, blk, part);
+ break;
+ }
data = read_part_sector(state, blk, §);
if (!data) {
- pr_err("Dev %s: unable to read partition block %d\n",
+ pr_err("Dev %s: unable to read partition block %llu\n",
state->disk->disk_name, blk);
res = -1;
goto rdb_done;
@@ -95,19 +111,70 @@ int amiga_partition(struct parsed_partitions *state)
if (checksum_block((__be32 *)pb, be32_to_cpu(pb->pb_SummedLongs) & 0x7F) != 0 )
continue;
- /* Tell Kernel about it */
+ /* RDB gives us more than enough rope to hang ourselves with,
+ * many times over (2^128 bytes if all fields max out).
+ * Some careful checks are in order, so check for potential
+ * overflows.
+ * We are multiplying four 32 bit numbers to one sector_t!
+ */
+
+ nr_hd = be32_to_cpu(pb->pb_Environment[NR_HD]);
+ nr_sect = be32_to_cpu(pb->pb_Environment[NR_SECT]);
+
+ /* CylBlocks is total number of blocks per cylinder */
+ if (check_mul_overflow(nr_hd, nr_sect, &cylblk)) {
+ pr_err("Dev %s: heads*sects %u overflows u32, skipping partition!\n",
+ state->disk->disk_name, cylblk);
+ continue;
+ }
+
+ /* check for consistency with RDB defined CylBlocks */
+ if (cylblk > be32_to_cpu(rdb->rdb_CylBlocks)) {
+ pr_warn("Dev %s: cylblk %u > rdb_CylBlocks %u!\n",
+ state->disk->disk_name, cylblk,
+ be32_to_cpu(rdb->rdb_CylBlocks));
+ }
+
+ /* RDB allows for variable logical block size -
+ * normalize to 512 byte blocks and check result.
+ */
+
+ if (check_mul_overflow(cylblk, blksize, &cylblk)) {
+ pr_err("Dev %s: partition %u bytes per cyl. overflows u32, skipping partition!\n",
+ state->disk->disk_name, part);
+ continue;
+ }
+
+ /* Calculate partition start and end. Limit of 32 bit on cylblk
+ * guarantees no overflow occurs if LBD support is enabled.
+ */
+
+ lo_cyl = be32_to_cpu(pb->pb_Environment[LO_CYL]);
+ start_sect = ((u64) lo_cyl * cylblk);
+
+ hi_cyl = be32_to_cpu(pb->pb_Environment[HI_CYL]);
+ nr_sects = (((u64) hi_cyl - lo_cyl + 1) * cylblk);
- nr_sects = ((sector_t)be32_to_cpu(pb->pb_Environment[10]) + 1 -
- be32_to_cpu(pb->pb_Environment[9])) *
- be32_to_cpu(pb->pb_Environment[3]) *
- be32_to_cpu(pb->pb_Environment[5]) *
- blksize;
if (!nr_sects)
continue;
- start_sect = (sector_t)be32_to_cpu(pb->pb_Environment[9]) *
- be32_to_cpu(pb->pb_Environment[3]) *
- be32_to_cpu(pb->pb_Environment[5]) *
- blksize;
+
+ /* Warn user if partition end overflows u32 (AmigaDOS limit) */
+
+ if ((start_sect + nr_sects) > UINT_MAX) {
+ pr_warn("Dev %s: partition %u (%llu-%llu) needs 64 bit device support!\n",
+ state->disk->disk_name, part,
+ start_sect, start_sect + nr_sects);
+ }
+
+ if (check_add_overflow(start_sect, nr_sects, &end_sect)) {
+ pr_err("Dev %s: partition %u (%llu-%llu) needs LBD device support, skipping partition!\n",
+ state->disk->disk_name, part,
+ start_sect, end_sect);
+ continue;
+ }
+
+ /* Tell Kernel about it */
+
put_partition(state,slot++,start_sect,nr_sects);
{
/* Be even more informative to aid mounting */
Hi Greg, Sasha,
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
2) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
3) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
Please, apply,
Thanks.
Pablo Neira Ayuso (3):
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 1 +
net/netfilter/nf_tables_api.c | 29 +++++++++++++++++++++++++----
2 files changed, 26 insertions(+), 4 deletions(-)
--
2.30.2
Hi Greg, Sasha,
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 1e9451cbda45 ("netfilter: nf_tables: fix nat hook table deletion")
2) 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
3) 802b805162a1 ("netfilter: nftables: add helper function to set the base sequence number")
4) 19c28b1374fb ("netfilter: add helper function to set up the nfnetlink header and use it")
5) 0854db2aaef3 ("netfilter: nf_tables: use net_generic infra for transaction data")
6) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
7) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
8) 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase")
9) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
10) 2024439bd5ce ("netfilter: nf_tables: fix scheduling-while-atomic splat")
Please, apply,
Thanks.
Florian Westphal (3):
netfilter: nf_tables: fix nat hook table deletion
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_tables: fix scheduling-while-atomic splat
Pablo Neira Ayuso (7):
netfilter: nftables: add helper function to set the base sequence number
netfilter: add helper function to set up the nfnetlink header and use it
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/linux/netfilter/nfnetlink.h | 27 ++
include/net/netfilter/nf_tables.h | 14 +
include/net/netns/nftables.h | 5 -
net/netfilter/ipset/ip_set_core.c | 17 +-
net/netfilter/nf_conntrack_netlink.c | 77 ++---
net/netfilter/nf_tables_api.c | 483 ++++++++++++++++-----------
net/netfilter/nf_tables_trace.c | 9 +-
net/netfilter/nfnetlink_acct.c | 11 +-
net/netfilter/nfnetlink_cthelper.c | 11 +-
net/netfilter/nfnetlink_cttimeout.c | 22 +-
net/netfilter/nfnetlink_log.c | 11 +-
net/netfilter/nfnetlink_queue.c | 12 +-
net/netfilter/nft_chain_filter.c | 11 +-
net/netfilter/nft_compat.c | 11 +-
net/netfilter/nft_dynset.c | 6 +-
15 files changed, 383 insertions(+), 344 deletions(-)
--
2.30.2
Hi Greg, Sasha,
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 1e9451cbda45 ("netfilter: nf_tables: fix nat hook table deletion")
2) 802b805162a1 ("netfilter: nftables: add helper function to set the base sequence number")
3) 19c28b1374fb ("netfilter: add helper function to set up the nfnetlink header and use it")
4) 0854db2aaef3 ("netfilter: nf_tables: use net_generic infra for transaction data")
5) 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
6) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
7) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
8) 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase")
9) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
10) 2024439bd5ce ("netfilter: nf_tables: fix scheduling-while-atomic splat")
Please, apply,
Thanks
Florian Westphal (4):
netfilter: nf_tables: fix nat hook table deletion
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_tables: fix scheduling-while-atomic splat
Pablo Neira Ayuso (6):
netfilter: nftables: add helper function to set the base sequence number
netfilter: add helper function to set up the nfnetlink header and use it
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/linux/netfilter/nfnetlink.h | 27 ++
include/net/netfilter/nf_tables.h | 14 +
include/net/netns/nftables.h | 6 -
net/netfilter/ipset/ip_set_core.c | 17 +-
net/netfilter/nf_conntrack_netlink.c | 77 ++--
net/netfilter/nf_tables_api.c | 510 ++++++++++++++++-----------
net/netfilter/nf_tables_offload.c | 29 +-
net/netfilter/nf_tables_trace.c | 9 +-
net/netfilter/nfnetlink_acct.c | 11 +-
net/netfilter/nfnetlink_cthelper.c | 11 +-
net/netfilter/nfnetlink_cttimeout.c | 22 +-
net/netfilter/nfnetlink_log.c | 11 +-
net/netfilter/nfnetlink_queue.c | 12 +-
net/netfilter/nft_chain_filter.c | 11 +-
net/netfilter/nft_compat.c | 11 +-
net/netfilter/nft_dynset.c | 6 +-
16 files changed, 418 insertions(+), 366 deletions(-)
--
2.30.2
Hi Greg, Sasha,
[ This is v3:
- remove backported function that is useless in patch 09/11
- use: Upstream commit XYZ tag as suggested by Salvatore Bonaccorso.
]
The following list shows the backported patches. I am using original commit IDs
for reference:
1) 0854db2aaef3 ("netfilter: nf_tables: use net_generic infra for transaction data")
2) 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
3) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
4) 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic")
5) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
6) 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase")
7) 62e1e94b246e ("netfilter: nf_tables: reject unbound chain set before commit phase")
8) f8bb7889af58 ("netfilter: nftables: rename set element data activation/deactivation functions")
9) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
10) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
11) f838e0906dd3 ("netfilter: nf_tables: fix scheduling-while-atomic splat")
Notes:
- Patch #1 is a backported dependency patch required by these fixes.
- Patch #3 needs an incremental follow up fix coming in Patch #11.
Florian Westphal (3):
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_tables: fix scheduling-while-atomic splat
Pablo Neira Ayuso (8):
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: fix chain binding transaction logic
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nftables: rename set element data activation/deactivation functions
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 41 +-
include/net/netns/nftables.h | 7 -
net/netfilter/nf_tables_api.c | 662 +++++++++++++++++++++---------
net/netfilter/nf_tables_offload.c | 30 +-
net/netfilter/nft_chain_filter.c | 11 +-
net/netfilter/nft_dynset.c | 6 +-
net/netfilter/nft_immediate.c | 90 +++-
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 23 +-
net/netfilter/nft_set_pipapo.c | 14 +-
net/netfilter/nft_set_rbtree.c | 5 +-
11 files changed, 648 insertions(+), 246 deletions(-)
--
2.30.2
Hi,
Could you please backport the three following commits to 4.14 and 4.19:
24c363623361 ("spi: spi-fsl-spi: remove always-true conditional in
fsl_spi_do_one_msg")
17ecffa28948 ("spi: spi-fsl-spi: relax message sanity checking a little")
a798a7086c38 ("spi: spi-fsl-spi: allow changing bits_per_word while CS
is still active")
Those three commits (the last one indeed) are needed to overcome a
problem introduced in 4.14.230 by commit 42c04316d927 ("spi: fsl-cpm:
Use 16 bit mode for large transfers with even size"), which leads to the
following error in certain cases:
[ 174.900526] at25 spi0.3: bits_per_word/speed_hz should be same for
the same SPI transfer
[ 174.911844] spi_master spi0: failed to transfer one message from queue
[ 366.639467] INFO: task od:406 blocked for more than 120 seconds.
[ 366.645155] Not tainted 4.14.320-s3k-dev-dirty #342
[ 366.652996] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
[ 489.519450] INFO: task od:406 blocked for more than 120 seconds.
[ 489.525156] Not tainted 4.14.320-s3k-dev-dirty #342
[ 489.532915] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
disables this message.
...
Thanks
Christophe
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8a796565cec3601071cbbd27d6304e202019d014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071623-deafness-gargle-5297@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
d33a39e57768 ("io_uring: keep timeout in io_wait_queue")
46ae7eef44f6 ("io_uring: optimise non-timeout waiting")
846072f16eed ("io_uring: mimimise io_cqring_wait_schedule")
3fcf19d592d5 ("io_uring: parse check_cq out of wq waiting")
12521a5d5cb7 ("io_uring: fix CQ waiting timeout handling")
52ea806ad983 ("io_uring: finish waiting before flushing overflow entries")
35d90f95cfa7 ("io_uring: include task_work run after scheduling in wait for events")
1b346e4aa8e7 ("io_uring: don't check overflow flush failures")
a85381d8326d ("io_uring: skip overflow CQE posting for dying ring")
c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
b4c98d59a787 ("io_uring: introduce io_has_work")
78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()")
c34398a8c018 ("io_uring: remove __io_req_task_work_add")
ed5ccb3beeba ("io_uring: remove priority tw list optimisation")
4a0fef62788b ("io_uring: optimize io_uring_task layout")
253993210bd8 ("io_uring: introduce locking helpers for CQE posting")
305bef988708 ("io_uring: hide eventfd assumptions in eventfd paths")
affa87db9010 ("io_uring: fix multi ctx cancellation")
d9dee4302a7c ("io_uring: remove ->flush_cqes optimisation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a796565cec3601071cbbd27d6304e202019d014 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres(a)anarazel.de>
Date: Fri, 7 Jul 2023 09:20:07 -0700
Subject: [PATCH] io_uring: Use io_schedule* in cqring wait
I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().
The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0
This is repeatable with different filesystems, using raw block devices
and using different block devices.
Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.
After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).
There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.
Cc: stable(a)vger.kernel.org # 5.10+
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: io-uring(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Andres Freund <andres(a)anarazel.de>
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
[axboe: minor style fixup]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e8096d502a7c..7505de2428e0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2489,6 +2489,8 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
struct io_wait_queue *iowq)
{
+ int token, ret;
+
if (unlikely(READ_ONCE(ctx->check_cq)))
return 1;
if (unlikely(!llist_empty(&ctx->work_llist)))
@@ -2499,11 +2501,20 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
return -EINTR;
if (unlikely(io_should_wake(iowq)))
return 0;
+
+ /*
+ * Use io_schedule_prepare/finish, so cpufreq can take into account
+ * that the task is waiting for IO - turns out to be important for low
+ * QD IO.
+ */
+ token = io_schedule_prepare();
+ ret = 0;
if (iowq->timeout == KTIME_MAX)
schedule();
else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
- return -ETIME;
- return 0;
+ ret = -ETIME;
+ io_schedule_finish(token);
+ return ret;
}
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 20cb1c2fb7568a6054c55defe044311397e01ddb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071635-handsaw-enclosure-0363@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
20cb1c2fb756 ("blk-cgroup: Flush stats before releasing blkcg_gq")
2c275afeb61d ("block: make blkcg_punt_bio_submit optional")
12be09fe18f2 ("block: async_bio_lock does not need to be bh-safe")
3480373ebdf7 ("btrfs, block: move REQ_CGROUP_PUNT to btrfs")
0a0596fbbe5b ("btrfs, mm: remove the punt_to_cgroup field in struct writeback_control")
05d06a5c9d9c ("btrfs: move kthread_associate_blkcg out of btrfs_submit_compressed_write")
ae42a154ca89 ("btrfs: pass a btrfs_bio to btrfs_submit_bio")
34f888ce3a35 ("btrfs: cleanup main loop in btrfs_encoded_read_regular_fill_pages")
72b505dc5757 ("btrfs: add a wbc pointer to struct btrfs_bio_ctrl")
794c26e214ab ("btrfs: remove the sync_io flag in struct btrfs_bio_ctrl")
c000bc04bad4 ("btrfs: store the bio opf in struct btrfs_bio_ctrl")
eb8d0c6d042f ("btrfs: remove the force_bio_submit to submit_extent_page")
67998cf438e2 ("btrfs: don't set force_bio_submit in read_extent_buffer_subpage")
10e924bc320a ("btrfs: factor out a btrfs_add_compressed_bio_pages helper")
e7aff33e3161 ("btrfs: use the bbio file offset in btrfs_submit_compressed_read")
798c9fc74d03 ("btrfs: remove redundant free_extent_map in btrfs_submit_compressed_read")
544fe4a903ce ("btrfs: embed a btrfs_bio into struct compressed_bio")
3822a7c40997 ("Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 20cb1c2fb7568a6054c55defe044311397e01ddb Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Sat, 10 Jun 2023 07:42:49 +0800
Subject: [PATCH] blk-cgroup: Flush stats before releasing blkcg_gq
As noted by Michal, the blkg_iostat_set's in the lockless list hold
reference to blkg's to protect against their removal. Those blkg's
hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which
is called when the blkcg reference count reaches 0. This circular
dependency will prevent blkcg and some blkgs from being freed after
they are made offline.
It is less a problem if the cgroup to be destroyed also has other
controllers like memory that will call cgroup_rstat_flush() which will
clean up the reference count. If block is the only controller that uses
rstat, these offline blkcg and blkgs may never be freed leaking more
and more memory over time.
To prevent this potential memory leak:
- flush blkcg per-cpu stats list in __blkg_release(), when no new stat
can be added
- add global blkg_stat_lock for covering concurrent parent blkg stat
update
- don't grab bio->bi_blkg reference when adding the stats into blkcg's
per-cpu stat list since all stats are guaranteed to be consumed before
releasing blkg instance, and grabbing blkg reference for stats was the
most fragile part of original patch
Based on Waiman's patch:
https://lore.kernel.org/linux-block/20221215033132.230023-3-longman@redhat.…
Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Cc: stable(a)vger.kernel.org
Reported-by: Jay Shin <jaeshin(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Cc: Waiman Long <longman(a)redhat.com>
Cc: mkoutny(a)suse.com
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20230609234249.1412858-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 0ce64dd73cfe..f0b5c9c41cde 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -34,6 +34,8 @@
#include "blk-ioprio.h"
#include "blk-throttle.h"
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu);
+
/*
* blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation.
* blkcg_pol_register_mutex nests outside of it and synchronizes entire
@@ -56,6 +58,8 @@ static LIST_HEAD(all_blkcgs); /* protected by blkcg_pol_mutex */
bool blkcg_debug_stats = false;
+static DEFINE_RAW_SPINLOCK(blkg_stat_lock);
+
#define BLKG_DESTROY_BATCH_SIZE 64
/*
@@ -163,10 +167,20 @@ static void blkg_free(struct blkcg_gq *blkg)
static void __blkg_release(struct rcu_head *rcu)
{
struct blkcg_gq *blkg = container_of(rcu, struct blkcg_gq, rcu_head);
+ struct blkcg *blkcg = blkg->blkcg;
+ int cpu;
#ifdef CONFIG_BLK_CGROUP_PUNT_BIO
WARN_ON(!bio_list_empty(&blkg->async_bios));
#endif
+ /*
+ * Flush all the non-empty percpu lockless lists before releasing
+ * us, given these stat belongs to us.
+ *
+ * blkg_stat_lock is for serializing blkg stat update
+ */
+ for_each_possible_cpu(cpu)
+ __blkcg_rstat_flush(blkcg, cpu);
/* release the blkcg and parent blkg refs this blkg has been holding */
css_put(&blkg->blkcg->css);
@@ -951,23 +965,26 @@ static void blkcg_iostat_update(struct blkcg_gq *blkg, struct blkg_iostat *cur,
u64_stats_update_end_irqrestore(&blkg->iostat.sync, flags);
}
-static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+static void __blkcg_rstat_flush(struct blkcg *blkcg, int cpu)
{
- struct blkcg *blkcg = css_to_blkcg(css);
struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
struct llist_node *lnode;
struct blkg_iostat_set *bisc, *next_bisc;
- /* Root-level stats are sourced from system-wide IO stats */
- if (!cgroup_parent(css->cgroup))
- return;
-
rcu_read_lock();
lnode = llist_del_all(lhead);
if (!lnode)
goto out;
+ /*
+ * For covering concurrent parent blkg update from blkg_release().
+ *
+ * When flushing from cgroup, cgroup_rstat_lock is always held, so
+ * this lock won't cause contention most of time.
+ */
+ raw_spin_lock(&blkg_stat_lock);
+
/*
* Iterate only the iostat_cpu's queued in the lockless list.
*/
@@ -991,13 +1008,19 @@ static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
if (parent && parent->parent)
blkcg_iostat_update(parent, &blkg->iostat.cur,
&blkg->iostat.last);
- percpu_ref_put(&blkg->refcnt);
}
-
+ raw_spin_unlock(&blkg_stat_lock);
out:
rcu_read_unlock();
}
+static void blkcg_rstat_flush(struct cgroup_subsys_state *css, int cpu)
+{
+ /* Root-level stats are sourced from system-wide IO stats */
+ if (cgroup_parent(css->cgroup))
+ __blkcg_rstat_flush(css_to_blkcg(css), cpu);
+}
+
/*
* We source root cgroup stats from the system-wide stats to avoid
* tracking the same information twice and incurring overhead when no
@@ -2075,7 +2098,6 @@ void blk_cgroup_bio_start(struct bio *bio)
llist_add(&bis->lnode, lhead);
WRITE_ONCE(bis->lqueued, true);
- percpu_ref_get(&bis->blkg->refcnt);
}
u64_stats_update_end_irqrestore(&bis->sync, flags);
Hi stable team,
Building BPF selftests on 5.10.186 currently causes the following compile error:
$ make -C tools/testing/selftests/bpf
...
BINARY test_verifier
In file included from
/usr/src/linux-5.10.186/tools/testing/selftests/bpf/verifier/tests.h:59,
from test_verifier.c:355:
/usr/src/linux-5.10.186/tools/testing/selftests/bpf/verifier/ref_tracking.c:935:10:
error: 'struct bpf_test' has no member named 'fixup_map_ringbuf'; did
you mean 'fixup_map_in_map'?
935 | .fixup_map_ringbuf = { 11 },
| ^~~~~~~~~~~~~~~~~
| fixup_map_in_map
The problem was introduced by commit f4b8c0710ab6 ("selftests/bpf: Add
verifier test for release_reference()") in your tree.
Seems like at least commit 4237e9f4a962 ("selftests/bpf: Add verifier
test for PTR_TO_MEM spill") is required for the build to succeed.
I previously reported this but things probably fell through the
cracks: https://lore.kernel.org/stable/CAN+4W8iMcwwVjmSekZ9txzZNxOZ0x98nBXo4cEoTU9G…
Thanks!
Lorenz
Hi Greg, Sasha,
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
2) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
Please, apply.
Thanks.
Pablo Neira Ayuso (2):
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 5 +-
net/netfilter/nf_tables_api.c | 147 ++++++++++++++++++++++++++----
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 23 ++++-
net/netfilter/nft_set_pipapo.c | 14 ++-
net/netfilter/nft_set_rbtree.c | 5 +-
6 files changed, 168 insertions(+), 31 deletions(-)
--
2.30.2
commit 69562eb0bd3e6bb8e522a7b254334e0fb30dff0c upstream.
Hopefully, nobody is trying to abuse mount/sb marks for watching all
anonymous pipes/inodes.
I cannot think of a good reason to allow this - it looks like an
oversight that dated back to the original fanotify API.
Link: https://lore.kernel.org/linux-fsdevel/20230628101132.kvchg544mczxv2pm@quack…
Fixes: 0ff21db9fcc3 ("fanotify: hooks the fanotify_mark syscall to the vfsmount code")
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Message-Id: <20230629042044.25723-1-amir73il(a)gmail.com>
[backport to 5.x.y]
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
---
Greg,
This 5.15 backport should cleanly apply to all 5.x.y LTS kernels.
It will NOT apply to 4.x.y kernels.
The original upstream commit should apply cleanly to 6.x.y stable
kernels.
Thanks,
Amir.
fs/notify/fanotify/fanotify_user.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
diff --git a/fs/notify/fanotify/fanotify_user.c b/fs/notify/fanotify/fanotify_user.c
index 84ec851211d9..0e2a0eb7cb9e 100644
--- a/fs/notify/fanotify/fanotify_user.c
+++ b/fs/notify/fanotify/fanotify_user.c
@@ -1337,8 +1337,11 @@ static int fanotify_test_fid(struct path *path, __kernel_fsid_t *fsid)
return 0;
}
-static int fanotify_events_supported(struct path *path, __u64 mask)
+static int fanotify_events_supported(struct path *path, __u64 mask,
+ unsigned int flags)
{
+ unsigned int mark_type = flags & FANOTIFY_MARK_TYPE_BITS;
+
/*
* Some filesystems such as 'proc' acquire unusual locks when opening
* files. For them fanotify permission events have high chances of
@@ -1350,6 +1353,21 @@ static int fanotify_events_supported(struct path *path, __u64 mask)
if (mask & FANOTIFY_PERM_EVENTS &&
path->mnt->mnt_sb->s_type->fs_flags & FS_DISALLOW_NOTIFY_PERM)
return -EINVAL;
+
+ /*
+ * mount and sb marks are not allowed on kernel internal pseudo fs,
+ * like pipe_mnt, because that would subscribe to events on all the
+ * anonynous pipes in the system.
+ *
+ * SB_NOUSER covers all of the internal pseudo fs whose objects are not
+ * exposed to user's mount namespace, but there are other SB_KERNMOUNT
+ * fs, like nsfs, debugfs, for which the value of allowing sb and mount
+ * mark is questionable. For now we leave them alone.
+ */
+ if (mark_type != FAN_MARK_INODE &&
+ path->mnt->mnt_sb->s_flags & SB_NOUSER)
+ return -EINVAL;
+
return 0;
}
@@ -1476,7 +1494,7 @@ static int do_fanotify_mark(int fanotify_fd, unsigned int flags, __u64 mask,
goto fput_and_out;
if (flags & FAN_MARK_ADD) {
- ret = fanotify_events_supported(&path, mask);
+ ret = fanotify_events_supported(&path, mask, flags);
if (ret)
goto path_put_and_out;
}
--
2.16.5
Hi,
one 'BUG_ON(ret < 0);' is still left in queue-6.1/btrfs-do-not-bug_on-on-tree-mod-log-failure-at-balan.patch
so we need to rebase this patch.
Best Regards
Wang Yugui (wangyugui(a)e16-tech.com)
2023/07/15
Hi,
In linux-6.1.y we don't have ARCH_BCM4908 symbol anymore (see commit
dd5c672d7ca9 ("arm64: bcmbca: Merge ARCH_BCM4908 to ARCH_BCMBCA") but
drivers/mtd/parsers/Kconfig still references it.
Please kindly cherry-pick a fix for that: commit 085679b15b5a ("mtd:
parsers: refer to ARCH_BCMBCA instead of ARCH_BCM4908") - it's part of
v6.2.
--
Rafał
Stable team, please apply patch 1/1 in this patchset along with its
dependencies to the v6.1 stable tree. The patch required a trivial
rebase adding a header include, hence resending it, while its 2
dependencies listed at Cc: stable lines in the commit message can be
cherry-picked as-is.
Thanks,
Imre
Imre Deak (1):
drm/i915/tc: Fix system resume MST mode restore for DP-alt sinks
.../drm/i915/display/intel_display_types.h | 1 +
drivers/gpu/drm/i915/display/intel_tc.c | 51 +++++++++++++++++--
2 files changed, 48 insertions(+), 4 deletions(-)
--
2.37.2
commit 0503ea8f5ba73eb3ab13a81c1eefbaf51405385a upstream.
This was inadvertently fixed during the removal of __vma_adjust().
When __vma_adjust() is adjusting next with a negative value (pushing
vma->vm_end lower), there would be two writes to the maple tree. The
first write is unnecessary and uses all allocated nodes in the maple
state. The second write is necessary but will need to allocate nodes
since the first write has used the allocated nodes. This may be a
problem as it may not be safe to allocate at this time, such as a low
memory situation. Fix the issue by avoiding the first write and only
write the adjusted "next" VMA.
Reported-by: John Hsu <John.Hsu(a)mediatek.com>
Link: https://lore.kernel.org/lkml/9cb8c599b1d7f9c1c300d1a334d5eb70ec4d7357.camel…
Cc: stable(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
mm/mmap.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/mm/mmap.c b/mm/mmap.c
index b8af52db3bbe..bb2e0ff0ef61 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -767,7 +767,8 @@ int __vma_adjust(struct vm_area_struct *vma, unsigned long start,
}
if (end != vma->vm_end) {
if (vma->vm_end > end) {
- if (!insert || (insert->vm_start != end)) {
+ if ((vma->vm_end + adjust_next != end) &&
+ (!insert || (insert->vm_start != end))) {
vma_mas_szero(&mas, end, vma->vm_end);
mas_reset(&mas);
VM_WARN_ON(insert &&
--
2.39.2
Hi Greg and Sasha,
Please consider applying the following patches to 6.4 and 6.3, they
should apply cleanly.
08f6554ff90e ("mips: Include KBUILD_CPPFLAGS in CHECKFLAGS invocation")
a7e5eb53bf9b ("powerpc/vdso: Include CLANG_FLAGS explicitly in ldflags-y")
cff6e7f50bd3 ("kbuild: Add CLANG_FLAGS to as-instr")
43fc0a99906e ("kbuild: Add KBUILD_CPPFLAGS to as-option invocation")
feb843a469fb ("kbuild: add $(CLANG_FLAGS) to KBUILD_CPPFLAGS")
They resolve and help avoid build breakage with tip of tree clang, which
has become a little stricter in the flags that it will accept for a
particular target, which requires '--target=' to be passed along to all
invocations of $(CC). Our continuous integration does not currently show
any breakage with 6.1 and earlier; should these patches be needed there,
I will send them at a later time.
Cheers,
Nathan
Greg,
This is a fine selected set of backports from 6.4.
Patch 4 fixes a fix that was already backported to 5.15.y.
Patch 1 fixes a fix that is wanted in 5.15.y and this was the trigger
for creating this 6.1 backport series.
Leah will take care for the 5.15.y backports, but it may take some time.
None of these are relevant before 5.15.
These backpors have gone through the usual xfs review and testing.
Thanks,
Amir.
Darrick J. Wong (4):
xfs: explicitly specify cpu when forcing inodegc delayed work to run
immediately
xfs: check that per-cpu inodegc workers actually run on that cpu
xfs: disable reaping in fscounters scrub
xfs: fix xfs_inodegc_stop racing with mod_delayed_work
fs/xfs/scrub/common.c | 26 -------------------------
fs/xfs/scrub/common.h | 2 --
fs/xfs/scrub/fscounters.c | 13 ++++++-------
fs/xfs/scrub/scrub.c | 2 --
fs/xfs/scrub/scrub.h | 1 -
fs/xfs/xfs_icache.c | 40 ++++++++++++++++++++++++++++++++-------
fs/xfs/xfs_mount.h | 3 +++
fs/xfs/xfs_super.c | 3 +++
8 files changed, 45 insertions(+), 45 deletions(-)
--
2.34.1
Hello Greg,
I know you haven't formally released 4.14.321-rc1 for review yet, but our CI picked up a build issue so I thought I may as well report it in case it's useful information for you.
SHA: Linux 4.14.321-rc1 (bc1094b21392)
Failed build log: https://gitlab.com/cip-project/cip-testing/linux-stable-rc-ci/-/jobs/463572…
defconfig used: https://gitlab.com/cip-project/cip-kernel/cip-kernel-config/-/blob/master/4…
Error log:
/builds/cip-project/cip-testing/linux-stable-rc-ci/gcc/gcc-11.1.0-nolibc/arm-linux-gnueabi/bin/arm-linux-gnueabi-ld: arch/arm/probes/kprobes/core.o: in function `jprobe_return':
/builds/cip-project/cip-testing/linux-stable-rc-ci/arch/arm/probes/kprobes/core.c:555: undefined reference to `kprobe_handler'
Makefile:1049: recipe for target 'vmlinux' failed
make: *** [vmlinux] Error 1
Problem patch:
Reverting 1c18f6ba04d8 ("ARM: 9303/1: kprobes: avoid missing-declaration warnings") makes the problem go away.
Kind regards, Chris
Hi,
I notice a regression report on Bugzilla [1]. Quoting from it:
> Hello I have been having problems for some time now with displaying any linux distribution with the new kernal. The only thing that can fix it are older kernals.
>
> I can't describe it very well, so I'm attaching some pictures, but it's like whenever anything happens on the screen, it pushes the upper left half up from the bottom. But it only goes to a certain extent, after that it just shakes around and jumps back a bit. Also, for example, the firefox icon is not where it is displayed, but where it should be (I tried it with touch).
>
> I think it has something to do with the framebuffer, because an usb stick with just the arch iso show the same issue in the tty.
>
> The hardware is a Huawei Matebook E 2022.
> Cpu: Intel® Core™ i5-1130G7
> Graphics: Intel® Iris® Xe
>
> Example under Manjaro that worked was with the
> linux-lqx-6.1.0.lqx2-1-x86_64.pkg.tar.zst kernal
> 22-Dec-2022 14:44 154507169
> from https://repo.blacksky3.com/x86_64/linux-lqx/old/
>
> Kernals above that Version dosent work, Example with 6.1.1-lqx1-linux-lqx:
> `
> $inxi -F
> System:
> Host: Johannes Kernel: 6.1.1-lqx1-linux-lqx arch: x86_64 bits: 64
> Desktop: GNOME v: 43.5 Distro: Manjaro Linux
> Machine:
> Type: Detachable System: HUAWEI product: DRC-WXX v: M1010
> serial:
> Mobo: HUAWEI model: DRC-WXX-PCB v: M1010 serial:
> UEFI: HUAWEI v: 1.30 date: 06/29/2022
> Battery:
> ID-1: BAT1 charge: 27.8 Wh (66.0%) condition: 42.1/42.1 Wh (100.0%)
> volts: 11.9 min: 11.5
> CPU:
> Info: quad core model: 11th Gen Intel Core i5-1130G7 bits: 64 type: MT MCP
> cache: L2: 5 MiB
> Speed (MHz): avg: 1395 min/max: 400/1801 cores: 1: 897 2: 1046 3: 962
> 4: 1801 5: 1801 6: 1801 7: 1801 8: 1052
> Graphics:
> Device-1: Intel Tiger Lake-UP4 GT2 [Iris Xe Graphics] driver: i915 v: kernel
> Display: wayland server: X.org v: 1.21.1.8 with: Xwayland v: 23.1.1
> compositor: gnome-shell driver: gpu: i915 resolution: 2560x1600~60Hz
> API: OpenGL v: 4.6 Mesa 23.0.3 renderer: Mesa Intel Xe Graphics (TGL GT2)
> Audio:
> Device-1: Intel driver: N/A
> Device-2: Intel Tiger Lake-LP Smart Sound Audio
> driver: sof-audio-pci-intel-tgl
> API: ALSA v: k6.1.1-lqx1-linux-lqx status: kernel-api
> Server-1: PulseAudio v: 16.1 status: active
> Network:
> Device-1: Intel Wi-Fi 6 AX201 driver: iwlwifi
> IF: wlp0s20f3 state: up mac: f4:b3:01:b7:c6:6d
> Bluetooth:
> Device-1: Intel AX201 Bluetooth driver: btusb type: USB
> Report: rfkill ID: hci0 state: up address: see --recommends
> Drives:
> Local Storage: total: 476.94 GiB used: 9.65 GiB (2.0%)
> ID-1: /dev/nvme0n1 model: PCIe-8 SSD 512GB size: 476.94 GiB
> Partition:
> ID-1: / size: 468.09 GiB used: 9.65 GiB (2.1%) fs: ext4 dev: /dev/nvme0n1p2
> ID-2: /boot/efi size: 299.4 MiB used: 288 KiB (0.1%) fs: vfat
> dev: /dev/nvme0n1p1
> Swap:
> Alert: No swap data was found.
> Sensors:
> System Temperatures: cpu: 35.0 C mobo: N/A
> Fan Speeds (RPM): N/A
> Info:
> Processes: 241 Uptime: 0m Memory: available: 15.42 GiB used: 1.18 GiB (7.6%)
> Shell: Zsh inxi: 3.3.27
>
> `
>
> Distros I testet:
> Ubuntu 20.04.2 results in Blackscreen/poweroff
> Fedora 38 (Bug as I mentiond)
> Manjaro (Bug as I mentiond)
> It all works fine when you start from grub2 in Rescue mode.
>
> That all is very new to me, so I am sorry if I did something wrong.
> I mean, I reported that bug first at the wrong place.. Anyway
>
> Best regreds Johannes
See Bugzilla for the full thread and attached pictures that demonstrate
this regression.
Anyway, I'm adding it to regzbot:
#regzbot introduced: v6.1..v6.4 https://bugzilla.kernel.org/show_bug.cgi?id=217666
#regzbot title: screen shakes on Intel® Iris® Xe
(also Cc'ing stable list because it also occurs on v6.1.y, but not in
v6.1 mainline).
Thanks.
--
An old man doll... just what I always wanted! - Clara
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x ede600e497b1461d06d22a7d17703d9096868bc3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071617-easeful-catlike-d302@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
ede600e497b1 ("btrfs: fix extent buffer leak after tree mod log failure at split_node()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ede600e497b1461d06d22a7d17703d9096868bc3 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:38 +0100
Subject: [PATCH] btrfs: fix extent buffer leak after tree mod log failure at
split_node()
At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.
Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7f7f13965fe9..8496535828de 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3053,6 +3053,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
if (ret) {
+ btrfs_tree_unlock(split);
+ free_extent_buffer(split);
btrfs_abort_transaction(trans, ret);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x ede600e497b1461d06d22a7d17703d9096868bc3
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071616-ragweed-flyer-3ff6@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
ede600e497b1 ("btrfs: fix extent buffer leak after tree mod log failure at split_node()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ede600e497b1461d06d22a7d17703d9096868bc3 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:38 +0100
Subject: [PATCH] btrfs: fix extent buffer leak after tree mod log failure at
split_node()
At split_node(), if we fail to log the tree mod log copy operation, we
return without unlocking the split extent buffer we just allocated and
without decrementing the reference we own on it. Fix this by unlocking
it and decrementing the ref count before returning.
Fixes: 5de865eebb83 ("Btrfs: fix tree mod logging")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 7f7f13965fe9..8496535828de 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -3053,6 +3053,8 @@ static noinline int split_node(struct btrfs_trans_handle *trans,
ret = btrfs_tree_mod_log_eb_copy(split, c, 0, mid, c_nritems - mid);
if (ret) {
+ btrfs_tree_unlock(split);
+ free_extent_buffer(split);
btrfs_abort_transaction(trans, ret);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a46d37012a5be1737393b8f82fd35665e4556eee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071638-uplifting-pentagram-4572@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a46d37012a5b ("ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path")
8c32984bc7da ("ASoC: mediatek: mt8173: Fix debugfs registration for components")
fceef72b68d6 ("ASoC: mt8173: use devm_platform_ioremap_resource() to simplify code")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a46d37012a5be1737393b8f82fd35665e4556eee Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda Delgado <ribalda(a)chromium.org>
Date: Mon, 12 Jun 2023 11:05:31 +0200
Subject: [PATCH] ASoC: mediatek: mt8173: Fix snd_soc_component_initialize
error path
If the second component fails to initialize, cleanup the first on.
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Cc: stable(a)kernel.org
Fixes: f1b5bf07365d ("ASoC: mt2701/mt8173: replace platform to component")
Signed-off-by: Ricardo Ribalda Delgado <ribalda(a)chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://lore.kernel.org/r/20230612-mt8173-fixup-v2-1-432aa99ce24d@chromium.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c b/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
index f93c2ec8beb7..ff25c44070a3 100644
--- a/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
+++ b/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
@@ -1156,14 +1156,14 @@ static int mt8173_afe_pcm_dev_probe(struct platform_device *pdev)
comp_hdmi = devm_kzalloc(&pdev->dev, sizeof(*comp_hdmi), GFP_KERNEL);
if (!comp_hdmi) {
ret = -ENOMEM;
- goto err_pm_disable;
+ goto err_cleanup_components;
}
ret = snd_soc_component_initialize(comp_hdmi,
&mt8173_afe_hdmi_dai_component,
&pdev->dev);
if (ret)
- goto err_pm_disable;
+ goto err_cleanup_components;
#ifdef CONFIG_DEBUG_FS
comp_hdmi->debugfs_prefix = "hdmi";
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a46d37012a5be1737393b8f82fd35665e4556eee
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071637-hankering-crop-89ef@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a46d37012a5b ("ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path")
8c32984bc7da ("ASoC: mediatek: mt8173: Fix debugfs registration for components")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a46d37012a5be1737393b8f82fd35665e4556eee Mon Sep 17 00:00:00 2001
From: Ricardo Ribalda Delgado <ribalda(a)chromium.org>
Date: Mon, 12 Jun 2023 11:05:31 +0200
Subject: [PATCH] ASoC: mediatek: mt8173: Fix snd_soc_component_initialize
error path
If the second component fails to initialize, cleanup the first on.
Reported-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Cc: stable(a)kernel.org
Fixes: f1b5bf07365d ("ASoC: mt2701/mt8173: replace platform to component")
Signed-off-by: Ricardo Ribalda Delgado <ribalda(a)chromium.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno(a)collabora.com>
Link: https://lore.kernel.org/r/20230612-mt8173-fixup-v2-1-432aa99ce24d@chromium.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c b/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
index f93c2ec8beb7..ff25c44070a3 100644
--- a/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
+++ b/sound/soc/mediatek/mt8173/mt8173-afe-pcm.c
@@ -1156,14 +1156,14 @@ static int mt8173_afe_pcm_dev_probe(struct platform_device *pdev)
comp_hdmi = devm_kzalloc(&pdev->dev, sizeof(*comp_hdmi), GFP_KERNEL);
if (!comp_hdmi) {
ret = -ENOMEM;
- goto err_pm_disable;
+ goto err_cleanup_components;
}
ret = snd_soc_component_initialize(comp_hdmi,
&mt8173_afe_hdmi_dai_component,
&pdev->dev);
if (ret)
- goto err_pm_disable;
+ goto err_cleanup_components;
#ifdef CONFIG_DEBUG_FS
comp_hdmi->debugfs_prefix = "hdmi";
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071641-umpire-kilogram-716d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
a855fbe69229 ("btrfs: fix lockdep splat when enabling and disabling qgroups")
49e5fb46211d ("btrfs: qgroup: export qgroups in sysfs")
5958253cf65d ("btrfs: qgroup: catch reserved space leaks at unmount time")
81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
aac0023c2106 ("btrfs: move basic block_group definitions to their own header")
d7cd4dd907c1 ("Btrfs: fix sysfs warning and missing raid sysfs directories")
867363429d70 ("btrfs: migrate the delalloc space stuff to it's own home")
fb6dea26601b ("btrfs: migrate btrfs_trans_release_chunk_metadata")
6ef03debdb3d ("btrfs: migrate the delayed refs rsv code")
67f9c2209e88 ("btrfs: migrate the global_block_rsv helpers to block-rsv.c")
550fa228ee7e ("btrfs: migrate the block-rsv code to block-rsv.c")
424a47805a81 ("btrfs: stop using block_rsv_release_bytes everywhere")
fcec36224fc6 ("btrfs: cleanup the target logic in __btrfs_block_rsv_release")
fed14b323db8 ("btrfs: export __btrfs_block_rsv_release")
0b50174ad5e9 ("btrfs: export btrfs_block_rsv_add_bytes")
d12ffdd1aa4c ("btrfs: move btrfs_block_rsv definitions into it's own header")
0d9764f6d0fb ("btrfs: move reserve_metadata_bytes and supporting code to space-info.c")
5da6afeb32e9 ("btrfs: move dump_space_info to space-info.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071640-corset-tripping-baa8@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
a855fbe69229 ("btrfs: fix lockdep splat when enabling and disabling qgroups")
49e5fb46211d ("btrfs: qgroup: export qgroups in sysfs")
5958253cf65d ("btrfs: qgroup: catch reserved space leaks at unmount time")
81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
aac0023c2106 ("btrfs: move basic block_group definitions to their own header")
d7cd4dd907c1 ("Btrfs: fix sysfs warning and missing raid sysfs directories")
867363429d70 ("btrfs: migrate the delalloc space stuff to it's own home")
fb6dea26601b ("btrfs: migrate btrfs_trans_release_chunk_metadata")
6ef03debdb3d ("btrfs: migrate the delayed refs rsv code")
67f9c2209e88 ("btrfs: migrate the global_block_rsv helpers to block-rsv.c")
550fa228ee7e ("btrfs: migrate the block-rsv code to block-rsv.c")
424a47805a81 ("btrfs: stop using block_rsv_release_bytes everywhere")
fcec36224fc6 ("btrfs: cleanup the target logic in __btrfs_block_rsv_release")
fed14b323db8 ("btrfs: export __btrfs_block_rsv_release")
0b50174ad5e9 ("btrfs: export btrfs_block_rsv_add_bytes")
d12ffdd1aa4c ("btrfs: move btrfs_block_rsv definitions into it's own header")
0d9764f6d0fb ("btrfs: move reserve_metadata_bytes and supporting code to space-info.c")
5da6afeb32e9 ("btrfs: move dump_space_info to space-info.c")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071636-neatness-trapping-fe89@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
a855fbe69229 ("btrfs: fix lockdep splat when enabling and disabling qgroups")
49e5fb46211d ("btrfs: qgroup: export qgroups in sysfs")
5958253cf65d ("btrfs: qgroup: catch reserved space leaks at unmount time")
81f7eb00ff5b ("btrfs: destroy qgroup extent records on transaction abort")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071629-nag-gravel-c2b8@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071628-seclusion-applied-22c4@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
e804861bd4e6 ("btrfs: fix deadlock between quota disable and qgroup rescan worker")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071627-serotonin-shorts-8ca4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
8a4a0b2a3eaf ("btrfs: fix race between quota disable and relocation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071626-darkness-agony-ce5d@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a4a0b2a3eaf75ca8854f856ef29690c12b2f531 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Mon, 19 Jun 2023 17:21:50 +0100
Subject: [PATCH] btrfs: fix race between quota disable and relocation
If we disable quotas while we have a relocation of a metadata block group
that has extents belonging to the quota root, we can cause the relocation
to fail with -ENOENT. This is because relocation builds backref nodes for
extents of the quota root and later needs to walk the backrefs and access
the quota root - however if in between a task disables quotas, it results
in deleting the quota root from the root tree (with btrfs_del_root(),
called from btrfs_quota_disable().
This can be sporadically triggered by test case btrfs/255 from fstests:
$ ./check btrfs/255
FSTYP -- btrfs
PLATFORM -- Linux/x86_64 debian0 6.4.0-rc6-btrfs-next-134+ #1 SMP PREEMPT_DYNAMIC Thu Jun 15 11:59:28 WEST 2023
MKFS_OPTIONS -- /dev/sdc
MOUNT_OPTIONS -- /dev/sdc /home/fdmanana/btrfs-tests/scratch_1
btrfs/255 6s ... _check_dmesg: something found in dmesg (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.dmesg)
- output mismatch (see /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad)
--- tests/btrfs/255.out 2023-03-02 21:47:53.876609426 +0000
+++ /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad 2023-06-16 10:20:39.267563212 +0100
@@ -1,2 +1,4 @@
QA output created by 255
+ERROR: error during balancing '/home/fdmanana/btrfs-tests/scratch_1': No such file or directory
+There may be more info in syslog - try dmesg | tail
Silence is golden
...
(Run 'diff -u /home/fdmanana/git/hub/xfstests/tests/btrfs/255.out /home/fdmanana/git/hub/xfstests/results//btrfs/255.out.bad' to see the entire diff)
Ran: btrfs/255
Failures: btrfs/255
Failed 1 of 1 tests
To fix this make the quota disable operation take the cleaner mutex, as
relocation of a block group also takes this mutex. This is also what we
do when deleting a subvolume/snapshot, we take the cleaner mutex in the
cleaner kthread (at cleaner_kthread()) and then we call btrfs_del_root()
at btrfs_drop_snapshot() while under the protection of the cleaner mutex.
Fixes: bed92eae26cc ("Btrfs: qgroup implementation and prototypes")
CC: stable(a)vger.kernel.org # 5.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/qgroup.c b/fs/btrfs/qgroup.c
index f8735b31da16..da1f84a0eb29 100644
--- a/fs/btrfs/qgroup.c
+++ b/fs/btrfs/qgroup.c
@@ -1232,12 +1232,23 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
int ret = 0;
/*
- * We need to have subvol_sem write locked, to prevent races between
- * concurrent tasks trying to disable quotas, because we will unlock
- * and relock qgroup_ioctl_lock across BTRFS_FS_QUOTA_ENABLED changes.
+ * We need to have subvol_sem write locked to prevent races with
+ * snapshot creation.
*/
lockdep_assert_held_write(&fs_info->subvol_sem);
+ /*
+ * Lock the cleaner mutex to prevent races with concurrent relocation,
+ * because relocation may be building backrefs for blocks of the quota
+ * root while we are deleting the root. This is like dropping fs roots
+ * of deleted snapshots/subvolumes, we need the same protection.
+ *
+ * This also prevents races between concurrent tasks trying to disable
+ * quotas, because we will unlock and relock qgroup_ioctl_lock across
+ * BTRFS_FS_QUOTA_ENABLED changes.
+ */
+ mutex_lock(&fs_info->cleaner_mutex);
+
mutex_lock(&fs_info->qgroup_ioctl_lock);
if (!fs_info->quota_root)
goto out;
@@ -1319,6 +1330,7 @@ int btrfs_quota_disable(struct btrfs_fs_info *fs_info)
btrfs_end_transaction(trans);
else if (trans)
ret = btrfs_end_transaction(trans);
+ mutex_unlock(&fs_info->cleaner_mutex);
return ret;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 40b0a749388517de244643c09bdbb98f7dcb6ef1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071602-oxford-etching-4c52@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
40b0a7493885 ("btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
d85327b1d8b7 ("btrfs: prefetch chunk tree leaves at mount")
49e5fb46211d ("btrfs: qgroup: export qgroups in sysfs")
5958253cf65d ("btrfs: qgroup: catch reserved space leaks at unmount time")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40b0a749388517de244643c09bdbb98f7dcb6ef1 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:40 +0100
Subject: [PATCH] btrfs: do not BUG_ON() on tree mod log failure at
__btrfs_cow_block()
At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
record a tree mod log root insertion operation, do a transaction abort
instead. There's really no need for the BUG_ON(), we can properly
release all resources in this context and turn the filesystem to RO mode
and in an error state instead.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8496535828de..d6c29564ce49 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -584,9 +584,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
parent_start = buf->start;
- atomic_inc(&cow->refs);
ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
- BUG_ON(ret < 0);
+ if (ret < 0) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+ atomic_inc(&cow->refs);
rcu_assign_pointer(root->node, cow);
btrfs_free_tree_block(trans, btrfs_root_id(root), buf,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 40b0a749388517de244643c09bdbb98f7dcb6ef1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071601-gristle-construct-2916@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
40b0a7493885 ("btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()")
406808ab2f0b ("btrfs: use booleans where appropriate for the tree mod log functions")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 40b0a749388517de244643c09bdbb98f7dcb6ef1 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:40 +0100
Subject: [PATCH] btrfs: do not BUG_ON() on tree mod log failure at
__btrfs_cow_block()
At __btrfs_cow_block(), instead of doing a BUG_ON() in case we fail to
record a tree mod log root insertion operation, do a transaction abort
instead. There's really no need for the BUG_ON(), we can properly
release all resources in this context and turn the filesystem to RO mode
and in an error state instead.
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 8496535828de..d6c29564ce49 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -584,9 +584,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
btrfs_header_backref_rev(buf) < BTRFS_MIXED_BACKREF_REV)
parent_start = buf->start;
- atomic_inc(&cow->refs);
ret = btrfs_tree_mod_log_insert_root(root->node, cow, true);
- BUG_ON(ret < 0);
+ if (ret < 0) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
+ atomic_inc(&cow->refs);
rcu_assign_pointer(root->node, cow);
btrfs_free_tree_block(trans, btrfs_root_id(root), buf,
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x d09c51521f22f9cbdfb1cf63e5c456077c622c84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071633-bless-cola-21d7@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
d09c51521f22 ("btrfs: add missing error handling when logging operation while COWing extent buffer")
33cff222faff ("btrfs: remove gfp_t flag from btrfs_tree_mod_log_insert_key()")
879b22219831 ("btrfs: switch GFP_ATOMIC to GFP_NOFS when fixing up low keys")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
bf77467a93bd ("btrfs: introduce BTRFS_NESTING_LEFT/BTRFS_NESTING_RIGHT")
9631e4cc1a03 ("btrfs: introduce BTRFS_NESTING_COW for cow'ing blocks")
fd7ba1c1202d ("btrfs: add nesting tags to the locking helpers")
51899412dd95 ("btrfs: introduce btrfs_path::recurse")
329ced799be8 ("btrfs: rename extent_buffer::lock_nested to extent_buffer::lock_recursed")
d16c702fe4f2 ("btrfs: ctree: check key order before merging tree blocks")
d3beaa253fd6 ("btrfs: set the lockdep class for log tree extent buffers")
ad24466588ab ("btrfs: set the correct lockdep class for new nodes")
d85327b1d8b7 ("btrfs: prefetch chunk tree leaves at mount")
49e5fb46211d ("btrfs: qgroup: export qgroups in sysfs")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d09c51521f22f9cbdfb1cf63e5c456077c622c84 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:37 +0100
Subject: [PATCH] btrfs: add missing error handling when logging operation
while COWing extent buffer
When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.
Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
add_root_to_dirty_list(root);
} else {
WARN_ON(trans->transid != btrfs_header_generation(parent));
- btrfs_tree_mod_log_insert_key(parent, parent_slot,
- BTRFS_MOD_LOG_KEY_REPLACE);
+ ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+ BTRFS_MOD_LOG_KEY_REPLACE);
+ if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
btrfs_set_node_blockptr(parent, parent_slot,
cow->start);
btrfs_set_node_ptr_generation(parent, parent_slot,
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x d09c51521f22f9cbdfb1cf63e5c456077c622c84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071632-modified-gauze-0d86@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
d09c51521f22 ("btrfs: add missing error handling when logging operation while COWing extent buffer")
33cff222faff ("btrfs: remove gfp_t flag from btrfs_tree_mod_log_insert_key()")
879b22219831 ("btrfs: switch GFP_ATOMIC to GFP_NOFS when fixing up low keys")
f3a84ccd28d0 ("btrfs: move the tree mod log code into its own file")
dbcc7d57bffc ("btrfs: fix race when cloning extent buffer during rewind of an old root")
cac06d843f25 ("btrfs: introduce the skeleton of btrfs_subpage structure")
1b7ec85ef490 ("btrfs: pass root owner to read_tree_block")
bfb484d922a3 ("btrfs: cleanup extent buffer readahead")
ac5887c8e013 ("btrfs: locking: remove all the blocking helpers")
196d59ab9ccc ("btrfs: switch extent buffer tree lock to rw_semaphore")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d09c51521f22f9cbdfb1cf63e5c456077c622c84 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:37 +0100
Subject: [PATCH] btrfs: add missing error handling when logging operation
while COWing extent buffer
When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.
Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
add_root_to_dirty_list(root);
} else {
WARN_ON(trans->transid != btrfs_header_generation(parent));
- btrfs_tree_mod_log_insert_key(parent, parent_slot,
- BTRFS_MOD_LOG_KEY_REPLACE);
+ ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+ BTRFS_MOD_LOG_KEY_REPLACE);
+ if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
btrfs_set_node_blockptr(parent, parent_slot,
cow->start);
btrfs_set_node_ptr_generation(parent, parent_slot,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x d09c51521f22f9cbdfb1cf63e5c456077c622c84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071631-retake-dollar-89ea@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
d09c51521f22 ("btrfs: add missing error handling when logging operation while COWing extent buffer")
33cff222faff ("btrfs: remove gfp_t flag from btrfs_tree_mod_log_insert_key()")
879b22219831 ("btrfs: switch GFP_ATOMIC to GFP_NOFS when fixing up low keys")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d09c51521f22f9cbdfb1cf63e5c456077c622c84 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:37 +0100
Subject: [PATCH] btrfs: add missing error handling when logging operation
while COWing extent buffer
When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.
Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
add_root_to_dirty_list(root);
} else {
WARN_ON(trans->transid != btrfs_header_generation(parent));
- btrfs_tree_mod_log_insert_key(parent, parent_slot,
- BTRFS_MOD_LOG_KEY_REPLACE);
+ ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+ BTRFS_MOD_LOG_KEY_REPLACE);
+ if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
btrfs_set_node_blockptr(parent, parent_slot,
cow->start);
btrfs_set_node_ptr_generation(parent, parent_slot,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d09c51521f22f9cbdfb1cf63e5c456077c622c84
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071630-arrange-duchess-1bb1@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d09c51521f22 ("btrfs: add missing error handling when logging operation while COWing extent buffer")
33cff222faff ("btrfs: remove gfp_t flag from btrfs_tree_mod_log_insert_key()")
879b22219831 ("btrfs: switch GFP_ATOMIC to GFP_NOFS when fixing up low keys")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d09c51521f22f9cbdfb1cf63e5c456077c622c84 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 8 Jun 2023 11:27:37 +0100
Subject: [PATCH] btrfs: add missing error handling when logging operation
while COWing extent buffer
When COWing an extent buffer that is not the root node, we need to log in
the tree mod log that we replaced a pointer in the parent node, otherwise
a tree mod log user doing a search on the b+tree can return incorrect
results (that miss something). We are doing the call to
btrfs_tree_mod_log_insert_key() but we totally ignore its return value.
So fix this by adding the missing error handling, resulting in a
transaction abort and freeing the COWed extent buffer.
Fixes: f230475e62f7 ("Btrfs: put all block modifications into the tree mod log")
CC: stable(a)vger.kernel.org # 5.4+
Reviewed-by: Qu Wenruo <wqu(a)suse.com>
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 385524224037..7f7f13965fe9 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -595,8 +595,14 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
add_root_to_dirty_list(root);
} else {
WARN_ON(trans->transid != btrfs_header_generation(parent));
- btrfs_tree_mod_log_insert_key(parent, parent_slot,
- BTRFS_MOD_LOG_KEY_REPLACE);
+ ret = btrfs_tree_mod_log_insert_key(parent, parent_slot,
+ BTRFS_MOD_LOG_KEY_REPLACE);
+ if (ret) {
+ btrfs_tree_unlock(cow);
+ free_extent_buffer(cow);
+ btrfs_abort_transaction(trans, ret);
+ return ret;
+ }
btrfs_set_node_blockptr(parent, parent_slot,
cow->start);
btrfs_set_node_ptr_generation(parent, parent_slot,
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 95c8e349d8e8f190e28854e7ca96de866d2dc5a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071646-upheaval-antiquity-8775@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
95c8e349d8e8 ("btrfs: warn on invalid slot in tree mod log rewind")
e23efd8e8767 ("btrfs: add eb to btrfs_node_key_ptr_offset")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
8bb808c6ad91 ("btrfs: don't print stack trace when transaction is aborted due to ENOMEM")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 95c8e349d8e8f190e28854e7ca96de866d2dc5a4 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 1 Jun 2023 11:55:13 -0700
Subject: [PATCH] btrfs: warn on invalid slot in tree mod log rewind
The way that tree mod log tracks the ultimate length of the eb, the
variable 'n', eventually turns up the correct value, but at intermediate
steps during the rewind, n can be inaccurate as a representation of the
end of the eb. For example, it doesn't get updated on move rewinds, and
it does get updated for add/remove in the middle of the eb.
To detect cases with invalid moves, introduce a separate variable called
max_slot which tries to track the maximum valid slot in the rewind eb.
We can then warn if we do a move whose src range goes beyond the max
valid slot.
There is a commented caveat that it is possible to have this value be an
overestimate due to the challenge of properly handling 'add' operations
in the middle of the eb, but in practice it doesn't cause enough of a
problem to throw out the max idea in favor of tracking every valid slot.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index a555baa0143a..39545d1d2e9a 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -664,10 +664,27 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
unsigned long o_dst;
unsigned long o_src;
unsigned long p_size = sizeof(struct btrfs_key_ptr);
+ /*
+ * max_slot tracks the maximum valid slot of the rewind eb at every
+ * step of the rewind. This is in contrast with 'n' which eventually
+ * matches the number of items, but can be wrong during moves or if
+ * removes overlap on already valid slots (which is probably separately
+ * a bug). We do this to validate the offsets of memmoves for rewinding
+ * moves and detect invalid memmoves.
+ *
+ * Since a rewind eb can start empty, max_slot is a signed integer with
+ * a special meaning for -1, which is that no slot is valid to move out
+ * of. Any other negative value is invalid.
+ */
+ int max_slot;
+ int move_src_end_slot;
+ int move_dst_end_slot;
n = btrfs_header_nritems(eb);
+ max_slot = n - 1;
read_lock(&fs_info->tree_mod_log_lock);
while (tm && tm->seq >= time_seq) {
+ ASSERT(max_slot >= -1);
/*
* All the operations are recorded with the operator used for
* the modification. As we're going backwards, we do the
@@ -684,6 +701,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
btrfs_set_node_ptr_generation(eb, tm->slot,
tm->generation);
n++;
+ if (tm->slot > max_slot)
+ max_slot = tm->slot;
break;
case BTRFS_MOD_LOG_KEY_REPLACE:
BUG_ON(tm->slot >= n);
@@ -693,14 +712,37 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
tm->generation);
break;
case BTRFS_MOD_LOG_KEY_ADD:
+ /*
+ * It is possible we could have already removed keys
+ * behind the known max slot, so this will be an
+ * overestimate. In practice, the copy operation
+ * inserts them in increasing order, and overestimating
+ * just means we miss some warnings, so it's OK. It
+ * isn't worth carefully tracking the full array of
+ * valid slots to check against when moving.
+ */
+ if (tm->slot == max_slot)
+ max_slot--;
/* if a move operation is needed it's in the log */
n--;
break;
case BTRFS_MOD_LOG_MOVE_KEYS:
+ ASSERT(tm->move.nr_items > 0);
+ move_src_end_slot = tm->move.dst_slot + tm->move.nr_items - 1;
+ move_dst_end_slot = tm->slot + tm->move.nr_items - 1;
o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
+ if (WARN_ON(move_src_end_slot > max_slot ||
+ tm->move.nr_items <= 0)) {
+ btrfs_warn(fs_info,
+"move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %d",
+ eb->start, tm->slot,
+ tm->move.dst_slot, tm->move.nr_items,
+ tm->seq, n, max_slot);
+ }
memmove_extent_buffer(eb, o_dst, o_src,
tm->move.nr_items * p_size);
+ max_slot = move_dst_end_slot;
break;
case BTRFS_MOD_LOG_ROOT_REPLACE:
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 95c8e349d8e8f190e28854e7ca96de866d2dc5a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071644-boxer-satchel-5beb@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
95c8e349d8e8 ("btrfs: warn on invalid slot in tree mod log rewind")
e23efd8e8767 ("btrfs: add eb to btrfs_node_key_ptr_offset")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 95c8e349d8e8f190e28854e7ca96de866d2dc5a4 Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 1 Jun 2023 11:55:13 -0700
Subject: [PATCH] btrfs: warn on invalid slot in tree mod log rewind
The way that tree mod log tracks the ultimate length of the eb, the
variable 'n', eventually turns up the correct value, but at intermediate
steps during the rewind, n can be inaccurate as a representation of the
end of the eb. For example, it doesn't get updated on move rewinds, and
it does get updated for add/remove in the middle of the eb.
To detect cases with invalid moves, introduce a separate variable called
max_slot which tries to track the maximum valid slot in the rewind eb.
We can then warn if we do a move whose src range goes beyond the max
valid slot.
There is a commented caveat that it is possible to have this value be an
overestimate due to the challenge of properly handling 'add' operations
in the middle of the eb, but in practice it doesn't cause enough of a
problem to throw out the max idea in favor of tracking every valid slot.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index a555baa0143a..39545d1d2e9a 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -664,10 +664,27 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
unsigned long o_dst;
unsigned long o_src;
unsigned long p_size = sizeof(struct btrfs_key_ptr);
+ /*
+ * max_slot tracks the maximum valid slot of the rewind eb at every
+ * step of the rewind. This is in contrast with 'n' which eventually
+ * matches the number of items, but can be wrong during moves or if
+ * removes overlap on already valid slots (which is probably separately
+ * a bug). We do this to validate the offsets of memmoves for rewinding
+ * moves and detect invalid memmoves.
+ *
+ * Since a rewind eb can start empty, max_slot is a signed integer with
+ * a special meaning for -1, which is that no slot is valid to move out
+ * of. Any other negative value is invalid.
+ */
+ int max_slot;
+ int move_src_end_slot;
+ int move_dst_end_slot;
n = btrfs_header_nritems(eb);
+ max_slot = n - 1;
read_lock(&fs_info->tree_mod_log_lock);
while (tm && tm->seq >= time_seq) {
+ ASSERT(max_slot >= -1);
/*
* All the operations are recorded with the operator used for
* the modification. As we're going backwards, we do the
@@ -684,6 +701,8 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
btrfs_set_node_ptr_generation(eb, tm->slot,
tm->generation);
n++;
+ if (tm->slot > max_slot)
+ max_slot = tm->slot;
break;
case BTRFS_MOD_LOG_KEY_REPLACE:
BUG_ON(tm->slot >= n);
@@ -693,14 +712,37 @@ static void tree_mod_log_rewind(struct btrfs_fs_info *fs_info,
tm->generation);
break;
case BTRFS_MOD_LOG_KEY_ADD:
+ /*
+ * It is possible we could have already removed keys
+ * behind the known max slot, so this will be an
+ * overestimate. In practice, the copy operation
+ * inserts them in increasing order, and overestimating
+ * just means we miss some warnings, so it's OK. It
+ * isn't worth carefully tracking the full array of
+ * valid slots to check against when moving.
+ */
+ if (tm->slot == max_slot)
+ max_slot--;
/* if a move operation is needed it's in the log */
n--;
break;
case BTRFS_MOD_LOG_MOVE_KEYS:
+ ASSERT(tm->move.nr_items > 0);
+ move_src_end_slot = tm->move.dst_slot + tm->move.nr_items - 1;
+ move_dst_end_slot = tm->slot + tm->move.nr_items - 1;
o_dst = btrfs_node_key_ptr_offset(eb, tm->slot);
o_src = btrfs_node_key_ptr_offset(eb, tm->move.dst_slot);
+ if (WARN_ON(move_src_end_slot > max_slot ||
+ tm->move.nr_items <= 0)) {
+ btrfs_warn(fs_info,
+"move from invalid tree mod log slot eb %llu slot %d dst_slot %d nr_items %d seq %llu n %u max_slot %d",
+ eb->start, tm->slot,
+ tm->move.dst_slot, tm->move.nr_items,
+ tm->seq, n, max_slot);
+ }
memmove_extent_buffer(eb, o_dst, o_src,
tm->move.nr_items * p_size);
+ max_slot = move_dst_end_slot;
break;
case BTRFS_MOD_LOG_ROOT_REPLACE:
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5cead5422a0e3d13b0bcee986c0f5c4ebb94100b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071638-ultimatum-humbly-ca99@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5cead5422a0e ("btrfs: insert tree mod log move in push_node_left")
e23efd8e8767 ("btrfs: add eb to btrfs_node_key_ptr_offset")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
8bb808c6ad91 ("btrfs: don't print stack trace when transaction is aborted due to ENOMEM")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5cead5422a0e3d13b0bcee986c0f5c4ebb94100b Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 1 Jun 2023 11:55:14 -0700
Subject: [PATCH] btrfs: insert tree mod log move in push_node_left
There is a fairly unlikely race condition in tree mod log rewind that
can result in a kernel panic which has the following trace:
[530.569] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.585] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.602] BUG: kernel NULL pointer dereference, address: 0000000000000002
[530.618] #PF: supervisor read access in kernel mode
[530.629] #PF: error_code(0x0000) - not-present page
[530.641] PGD 0 P4D 0
[530.647] Oops: 0000 [#1] SMP
[530.654] CPU: 30 PID: 398973 Comm: below Kdump: loaded Tainted: G S O K 5.12.0-0_fbk13_clang_7455_gb24de3bdb045 #1
[530.680] Hardware name: Quanta Mono Lake-M.2 SATA 1HY9U9Z001G/Mono Lake-M.2 SATA, BIOS F20_3A15 08/16/2017
[530.703] RIP: 0010:__btrfs_map_block+0xaa/0xd00
[530.755] RSP: 0018:ffffc9002c2f7600 EFLAGS: 00010246
[530.767] RAX: ffffffffffffffea RBX: ffff888292e41000 RCX: f2702d8b8be15100
[530.784] RDX: ffff88885fda6fb8 RSI: ffff88885fd973c8 RDI: ffff88885fd973c8
[530.800] RBP: ffff888292e410d0 R08: ffffffff82fd7fd0 R09: 00000000fffeffff
[530.816] R10: ffffffff82e57fd0 R11: ffffffff82e57d70 R12: 0000000000000000
[530.832] R13: 0000000000001000 R14: 0000000000001000 R15: ffffc9002c2f76f0
[530.848] FS: 00007f38d64af000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000
[530.866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[530.880] CR2: 0000000000000002 CR3: 00000002b6770004 CR4: 00000000003706e0
[530.896] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[530.912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[530.928] Call Trace:
[530.934] ? btrfs_printk+0x13b/0x18c
[530.943] ? btrfs_bio_counter_inc_blocked+0x3d/0x130
[530.955] btrfs_map_bio+0x75/0x330
[530.963] ? kmem_cache_alloc+0x12a/0x2d0
[530.973] ? btrfs_submit_metadata_bio+0x63/0x100
[530.984] btrfs_submit_metadata_bio+0xa4/0x100
[530.995] submit_extent_page+0x30f/0x360
[531.004] read_extent_buffer_pages+0x49e/0x6d0
[531.015] ? submit_extent_page+0x360/0x360
[531.025] btree_read_extent_buffer_pages+0x5f/0x150
[531.037] read_tree_block+0x37/0x60
[531.046] read_block_for_search+0x18b/0x410
[531.056] btrfs_search_old_slot+0x198/0x2f0
[531.066] resolve_indirect_ref+0xfe/0x6f0
[531.076] ? ulist_alloc+0x31/0x60
[531.084] ? kmem_cache_alloc_trace+0x12e/0x2b0
[531.095] find_parent_nodes+0x720/0x1830
[531.105] ? ulist_alloc+0x10/0x60
[531.113] iterate_extent_inodes+0xea/0x370
[531.123] ? btrfs_previous_extent_item+0x8f/0x110
[531.134] ? btrfs_search_path_in_tree+0x240/0x240
[531.146] iterate_inodes_from_logical+0x98/0xd0
[531.157] ? btrfs_search_path_in_tree+0x240/0x240
[531.168] btrfs_ioctl_logical_to_ino+0xd9/0x180
[531.179] btrfs_ioctl+0xe2/0x2eb0
This occurs when logical inode resolution takes a tree mod log sequence
number, and then while backref walking hits a rewind on a busy node
which has the following sequence of tree mod log operations (numbers
filled in from a specific example, but they are somewhat arbitrary)
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
When this sequence gets applied via btrfs_tree_mod_log_rewind, it
allocates a fresh rewind eb, and first inserts the correct key info for
the 533 elements, then overwrites the first 456 of them, then decrements
the count by 456 via the add ops, then rewinds the move by doing a
memmove from 456:988->0:532. We have never written anything past 532, so
that memmove writes garbage into the 0:532 range. In practice, this
results in a lot of fully 0 keys. The rewind then puts valid keys into
slots 0:455 with the last removes, but 456:532 are still invalid.
When search_old_slot uses this eb, if it uses one of those invalid
slots, it can then read the extent buffer and issue a bio for offset 0
which ultimately panics looking up extent mappings.
This bad tree mod log sequence gets generated when the node balancing
code happens to do a balance_node_right followed by a push_node_left
while logging in the tree mod log. Illustrated for ebs L and R (left and
right):
L R
start:
[XXX|YYY|...] [ZZZ|...|...]
balance_node_right:
[XXX|YYY|...] [...|ZZZ|...] move Z to make room for Y
[XXX|...|...] [YYY|ZZZ|...] copy Y from L to R
push_node_left:
[XXX|YYY|...] [...|ZZZ|...] copy Y from R to L
[XXX|YYY|...] [ZZZ|...|...] move Z into emptied space (NOT LOGGED!)
This is because balance_node_right logs a move, but push_node_left
explicitly doesn't. That is because logging the move would remove the
overwritten src < dst range in the right eb, which was already logged
when we called btrfs_tree_mod_log_eb_copy. The correct sequence would
include a move from 456:988 to 0:532 after remove 0:455 and before
removing 0:532. Reversing that sequence would entail creating keys for
0:532, then moving those keys out to 456:988, then creating more keys
for 0:455.
i.e.,
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
MOVE src slot 456 -> dst slot 0 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
Fix this to log the move but avoid the double remove by putting all the
logging logic in btrfs_tree_mod_log_eb_copy which has enough information
to detect these cases and properly log moves, removes, and adds. Leave
btrfs_tree_mod_log_insert_move to handle insert_ptr and delete_ptr's
tree mod logging.
(Un)fortunately, this is quite difficult to reproduce, and I was only
able to reproduce it by adding sleeps in btrfs_search_old_slot that
would encourage more log rewinding during ino_to_logical ioctls. I was
able to hit the warning in the previous patch in the series without the
fix quite quickly, but not after this patch.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2f2071d64c52..385524224037 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2785,8 +2785,8 @@ static int push_node_left(struct btrfs_trans_handle *trans,
if (push_items < src_nritems) {
/*
- * Don't call btrfs_tree_mod_log_insert_move() here, key removal
- * was already fully logged by btrfs_tree_mod_log_eb_copy() above.
+ * btrfs_tree_mod_log_eb_copy handles logging the move, so we
+ * don't need to do an explicit tree mod log operation for it.
*/
memmove_extent_buffer(src, btrfs_node_key_ptr_offset(src, 0),
btrfs_node_key_ptr_offset(src, push_items),
@@ -2847,8 +2847,11 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
btrfs_abort_transaction(trans, ret);
return ret;
}
- ret = btrfs_tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
- BUG_ON(ret < 0);
+
+ /*
+ * btrfs_tree_mod_log_eb_copy handles logging the move, so we don't
+ * need to do an explicit tree mod log operation for it.
+ */
memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(dst, push_items),
btrfs_node_key_ptr_offset(dst, 0),
(dst_nritems) *
diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index 39545d1d2e9a..07c086f9e35e 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -248,6 +248,26 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
return ret;
}
+static struct tree_mod_elem *tree_mod_log_alloc_move(struct extent_buffer *eb,
+ int dst_slot, int src_slot,
+ int nr_items)
+{
+ struct tree_mod_elem *tm;
+
+ tm = kzalloc(sizeof(*tm), GFP_NOFS);
+ if (!tm)
+ return ERR_PTR(-ENOMEM);
+
+ tm->logical = eb->start;
+ tm->slot = src_slot;
+ tm->move.dst_slot = dst_slot;
+ tm->move.nr_items = nr_items;
+ tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
+ RB_CLEAR_NODE(&tm->node);
+
+ return tm;
+}
+
int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
int dst_slot, int src_slot,
int nr_items)
@@ -265,18 +285,13 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
if (!tm_list)
return -ENOMEM;
- tm = kzalloc(sizeof(*tm), GFP_NOFS);
- if (!tm) {
- ret = -ENOMEM;
+ tm = tree_mod_log_alloc_move(eb, dst_slot, src_slot, nr_items);
+ if (IS_ERR(tm)) {
+ ret = PTR_ERR(tm);
+ tm = NULL;
goto free_tms;
}
- tm->logical = eb->start;
- tm->slot = src_slot;
- tm->move.dst_slot = dst_slot;
- tm->move.nr_items = nr_items;
- tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
-
for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING);
@@ -489,6 +504,10 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
struct tree_mod_elem **tm_list_add, **tm_list_rem;
int i;
bool locked = false;
+ struct tree_mod_elem *dst_move_tm = NULL;
+ struct tree_mod_elem *src_move_tm = NULL;
+ u32 dst_move_nr_items = btrfs_header_nritems(dst) - dst_offset;
+ u32 src_move_nr_items = btrfs_header_nritems(src) - (src_offset + nr_items);
if (!tree_mod_need_log(fs_info, NULL))
return 0;
@@ -501,6 +520,26 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
if (!tm_list)
return -ENOMEM;
+ if (dst_move_nr_items) {
+ dst_move_tm = tree_mod_log_alloc_move(dst, dst_offset + nr_items,
+ dst_offset, dst_move_nr_items);
+ if (IS_ERR(dst_move_tm)) {
+ ret = PTR_ERR(dst_move_tm);
+ dst_move_tm = NULL;
+ goto free_tms;
+ }
+ }
+ if (src_move_nr_items) {
+ src_move_tm = tree_mod_log_alloc_move(src, src_offset,
+ src_offset + nr_items,
+ src_move_nr_items);
+ if (IS_ERR(src_move_tm)) {
+ ret = PTR_ERR(src_move_tm);
+ src_move_tm = NULL;
+ goto free_tms;
+ }
+ }
+
tm_list_add = tm_list;
tm_list_rem = tm_list + nr_items;
for (i = 0; i < nr_items; i++) {
@@ -523,6 +562,11 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
goto free_tms;
locked = true;
+ if (dst_move_tm) {
+ ret = tree_mod_log_insert(fs_info, dst_move_tm);
+ if (ret)
+ goto free_tms;
+ }
for (i = 0; i < nr_items; i++) {
ret = tree_mod_log_insert(fs_info, tm_list_rem[i]);
if (ret)
@@ -531,6 +575,11 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
if (ret)
goto free_tms;
}
+ if (src_move_tm) {
+ ret = tree_mod_log_insert(fs_info, src_move_tm);
+ if (ret)
+ goto free_tms;
+ }
write_unlock(&fs_info->tree_mod_log_lock);
kfree(tm_list);
@@ -538,6 +587,12 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
return 0;
free_tms:
+ if (dst_move_tm && !RB_EMPTY_NODE(&dst_move_tm->node))
+ rb_erase(&dst_move_tm->node, &fs_info->tree_mod_log);
+ kfree(dst_move_tm);
+ if (src_move_tm && !RB_EMPTY_NODE(&src_move_tm->node))
+ rb_erase(&src_move_tm->node, &fs_info->tree_mod_log);
+ kfree(src_move_tm);
for (i = 0; i < nr_items * 2; i++) {
if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5cead5422a0e3d13b0bcee986c0f5c4ebb94100b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071633-shell-happily-3a92@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5cead5422a0e ("btrfs: insert tree mod log move in push_node_left")
e23efd8e8767 ("btrfs: add eb to btrfs_node_key_ptr_offset")
07e81dc94474 ("btrfs: move accessor helpers into accessors.h")
ad1ac5012c2b ("btrfs: move btrfs_map_token to accessors")
55e5cfd36da5 ("btrfs: remove fs_info::pending_changes and related code")
7966a6b5959b ("btrfs: move fs_info::flags enum to fs.h")
fc97a410bd78 ("btrfs: move mount option definitions to fs.h")
0d3a9cf8c306 ("btrfs: convert incompat and compat flag test helpers to macros")
ec8eb376e271 ("btrfs: move BTRFS_FS_STATE* definitions and helpers to fs.h")
9b569ea0be6f ("btrfs: move the printk helpers out of ctree.h")
e118578a8df7 ("btrfs: move assert helpers out of ctree.h")
c7f13d428ea1 ("btrfs: move fs wide helpers out of ctree.h")
63a7cb130718 ("btrfs: auto enable discard=async when possible")
7a66eda351ba ("btrfs: move the btrfs_verity_descriptor_item defs up in ctree.h")
956504a331a6 ("btrfs: move trans_handle_cachep out of ctree.h")
f1e5c6185ca1 ("btrfs: move flush related definitions to space-info.h")
ed4c491a3db2 ("btrfs: move BTRFS_MAX_MIRRORS into scrub.c")
4300c58f8090 ("btrfs: move btrfs on-disk definitions out of ctree.h")
d60d956eb41f ("btrfs: remove unused set/clear_pending_info helpers")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5cead5422a0e3d13b0bcee986c0f5c4ebb94100b Mon Sep 17 00:00:00 2001
From: Boris Burkov <boris(a)bur.io>
Date: Thu, 1 Jun 2023 11:55:14 -0700
Subject: [PATCH] btrfs: insert tree mod log move in push_node_left
There is a fairly unlikely race condition in tree mod log rewind that
can result in a kernel panic which has the following trace:
[530.569] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.585] BTRFS critical (device sda3): unable to find logical 0 length 4096
[530.602] BUG: kernel NULL pointer dereference, address: 0000000000000002
[530.618] #PF: supervisor read access in kernel mode
[530.629] #PF: error_code(0x0000) - not-present page
[530.641] PGD 0 P4D 0
[530.647] Oops: 0000 [#1] SMP
[530.654] CPU: 30 PID: 398973 Comm: below Kdump: loaded Tainted: G S O K 5.12.0-0_fbk13_clang_7455_gb24de3bdb045 #1
[530.680] Hardware name: Quanta Mono Lake-M.2 SATA 1HY9U9Z001G/Mono Lake-M.2 SATA, BIOS F20_3A15 08/16/2017
[530.703] RIP: 0010:__btrfs_map_block+0xaa/0xd00
[530.755] RSP: 0018:ffffc9002c2f7600 EFLAGS: 00010246
[530.767] RAX: ffffffffffffffea RBX: ffff888292e41000 RCX: f2702d8b8be15100
[530.784] RDX: ffff88885fda6fb8 RSI: ffff88885fd973c8 RDI: ffff88885fd973c8
[530.800] RBP: ffff888292e410d0 R08: ffffffff82fd7fd0 R09: 00000000fffeffff
[530.816] R10: ffffffff82e57fd0 R11: ffffffff82e57d70 R12: 0000000000000000
[530.832] R13: 0000000000001000 R14: 0000000000001000 R15: ffffc9002c2f76f0
[530.848] FS: 00007f38d64af000(0000) GS:ffff88885fd80000(0000) knlGS:0000000000000000
[530.866] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[530.880] CR2: 0000000000000002 CR3: 00000002b6770004 CR4: 00000000003706e0
[530.896] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[530.912] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[530.928] Call Trace:
[530.934] ? btrfs_printk+0x13b/0x18c
[530.943] ? btrfs_bio_counter_inc_blocked+0x3d/0x130
[530.955] btrfs_map_bio+0x75/0x330
[530.963] ? kmem_cache_alloc+0x12a/0x2d0
[530.973] ? btrfs_submit_metadata_bio+0x63/0x100
[530.984] btrfs_submit_metadata_bio+0xa4/0x100
[530.995] submit_extent_page+0x30f/0x360
[531.004] read_extent_buffer_pages+0x49e/0x6d0
[531.015] ? submit_extent_page+0x360/0x360
[531.025] btree_read_extent_buffer_pages+0x5f/0x150
[531.037] read_tree_block+0x37/0x60
[531.046] read_block_for_search+0x18b/0x410
[531.056] btrfs_search_old_slot+0x198/0x2f0
[531.066] resolve_indirect_ref+0xfe/0x6f0
[531.076] ? ulist_alloc+0x31/0x60
[531.084] ? kmem_cache_alloc_trace+0x12e/0x2b0
[531.095] find_parent_nodes+0x720/0x1830
[531.105] ? ulist_alloc+0x10/0x60
[531.113] iterate_extent_inodes+0xea/0x370
[531.123] ? btrfs_previous_extent_item+0x8f/0x110
[531.134] ? btrfs_search_path_in_tree+0x240/0x240
[531.146] iterate_inodes_from_logical+0x98/0xd0
[531.157] ? btrfs_search_path_in_tree+0x240/0x240
[531.168] btrfs_ioctl_logical_to_ino+0xd9/0x180
[531.179] btrfs_ioctl+0xe2/0x2eb0
This occurs when logical inode resolution takes a tree mod log sequence
number, and then while backref walking hits a rewind on a busy node
which has the following sequence of tree mod log operations (numbers
filled in from a specific example, but they are somewhat arbitrary)
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
When this sequence gets applied via btrfs_tree_mod_log_rewind, it
allocates a fresh rewind eb, and first inserts the correct key info for
the 533 elements, then overwrites the first 456 of them, then decrements
the count by 456 via the add ops, then rewinds the move by doing a
memmove from 456:988->0:532. We have never written anything past 532, so
that memmove writes garbage into the 0:532 range. In practice, this
results in a lot of fully 0 keys. The rewind then puts valid keys into
slots 0:455 with the last removes, but 456:532 are still invalid.
When search_old_slot uses this eb, if it uses one of those invalid
slots, it can then read the extent buffer and issue a bio for offset 0
which ultimately panics looking up extent mappings.
This bad tree mod log sequence gets generated when the node balancing
code happens to do a balance_node_right followed by a push_node_left
while logging in the tree mod log. Illustrated for ebs L and R (left and
right):
L R
start:
[XXX|YYY|...] [ZZZ|...|...]
balance_node_right:
[XXX|YYY|...] [...|ZZZ|...] move Z to make room for Y
[XXX|...|...] [YYY|ZZZ|...] copy Y from L to R
push_node_left:
[XXX|YYY|...] [...|ZZZ|...] copy Y from R to L
[XXX|YYY|...] [ZZZ|...|...] move Z into emptied space (NOT LOGGED!)
This is because balance_node_right logs a move, but push_node_left
explicitly doesn't. That is because logging the move would remove the
overwritten src < dst range in the right eb, which was already logged
when we called btrfs_tree_mod_log_eb_copy. The correct sequence would
include a move from 456:988 to 0:532 after remove 0:455 and before
removing 0:532. Reversing that sequence would entail creating keys for
0:532, then moving those keys out to 456:988, then creating more keys
for 0:455.
i.e.,
REMOVE_WHILE_FREEING slot 532
REMOVE_WHILE_FREEING slot 531
REMOVE_WHILE_FREEING slot 530
...
REMOVE_WHILE_FREEING slot 0
MOVE src slot 456 -> dst slot 0 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
ADD slot 455
ADD slot 454
ADD slot 453
...
ADD slot 0
MOVE src slot 0 -> dst slot 456 nritems 533
REMOVE slot 455
REMOVE slot 454
REMOVE slot 453
...
REMOVE slot 0
Fix this to log the move but avoid the double remove by putting all the
logging logic in btrfs_tree_mod_log_eb_copy which has enough information
to detect these cases and properly log moves, removes, and adds. Leave
btrfs_tree_mod_log_insert_move to handle insert_ptr and delete_ptr's
tree mod logging.
(Un)fortunately, this is quite difficult to reproduce, and I was only
able to reproduce it by adding sleeps in btrfs_search_old_slot that
would encourage more log rewinding during ino_to_logical ioctls. I was
able to hit the warning in the previous patch in the series without the
fix quite quickly, but not after this patch.
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: Boris Burkov <boris(a)bur.io>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index 2f2071d64c52..385524224037 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2785,8 +2785,8 @@ static int push_node_left(struct btrfs_trans_handle *trans,
if (push_items < src_nritems) {
/*
- * Don't call btrfs_tree_mod_log_insert_move() here, key removal
- * was already fully logged by btrfs_tree_mod_log_eb_copy() above.
+ * btrfs_tree_mod_log_eb_copy handles logging the move, so we
+ * don't need to do an explicit tree mod log operation for it.
*/
memmove_extent_buffer(src, btrfs_node_key_ptr_offset(src, 0),
btrfs_node_key_ptr_offset(src, push_items),
@@ -2847,8 +2847,11 @@ static int balance_node_right(struct btrfs_trans_handle *trans,
btrfs_abort_transaction(trans, ret);
return ret;
}
- ret = btrfs_tree_mod_log_insert_move(dst, push_items, 0, dst_nritems);
- BUG_ON(ret < 0);
+
+ /*
+ * btrfs_tree_mod_log_eb_copy handles logging the move, so we don't
+ * need to do an explicit tree mod log operation for it.
+ */
memmove_extent_buffer(dst, btrfs_node_key_ptr_offset(dst, push_items),
btrfs_node_key_ptr_offset(dst, 0),
(dst_nritems) *
diff --git a/fs/btrfs/tree-mod-log.c b/fs/btrfs/tree-mod-log.c
index 39545d1d2e9a..07c086f9e35e 100644
--- a/fs/btrfs/tree-mod-log.c
+++ b/fs/btrfs/tree-mod-log.c
@@ -248,6 +248,26 @@ int btrfs_tree_mod_log_insert_key(struct extent_buffer *eb, int slot,
return ret;
}
+static struct tree_mod_elem *tree_mod_log_alloc_move(struct extent_buffer *eb,
+ int dst_slot, int src_slot,
+ int nr_items)
+{
+ struct tree_mod_elem *tm;
+
+ tm = kzalloc(sizeof(*tm), GFP_NOFS);
+ if (!tm)
+ return ERR_PTR(-ENOMEM);
+
+ tm->logical = eb->start;
+ tm->slot = src_slot;
+ tm->move.dst_slot = dst_slot;
+ tm->move.nr_items = nr_items;
+ tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
+ RB_CLEAR_NODE(&tm->node);
+
+ return tm;
+}
+
int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
int dst_slot, int src_slot,
int nr_items)
@@ -265,18 +285,13 @@ int btrfs_tree_mod_log_insert_move(struct extent_buffer *eb,
if (!tm_list)
return -ENOMEM;
- tm = kzalloc(sizeof(*tm), GFP_NOFS);
- if (!tm) {
- ret = -ENOMEM;
+ tm = tree_mod_log_alloc_move(eb, dst_slot, src_slot, nr_items);
+ if (IS_ERR(tm)) {
+ ret = PTR_ERR(tm);
+ tm = NULL;
goto free_tms;
}
- tm->logical = eb->start;
- tm->slot = src_slot;
- tm->move.dst_slot = dst_slot;
- tm->move.nr_items = nr_items;
- tm->op = BTRFS_MOD_LOG_MOVE_KEYS;
-
for (i = 0; i + dst_slot < src_slot && i < nr_items; i++) {
tm_list[i] = alloc_tree_mod_elem(eb, i + dst_slot,
BTRFS_MOD_LOG_KEY_REMOVE_WHILE_MOVING);
@@ -489,6 +504,10 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
struct tree_mod_elem **tm_list_add, **tm_list_rem;
int i;
bool locked = false;
+ struct tree_mod_elem *dst_move_tm = NULL;
+ struct tree_mod_elem *src_move_tm = NULL;
+ u32 dst_move_nr_items = btrfs_header_nritems(dst) - dst_offset;
+ u32 src_move_nr_items = btrfs_header_nritems(src) - (src_offset + nr_items);
if (!tree_mod_need_log(fs_info, NULL))
return 0;
@@ -501,6 +520,26 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
if (!tm_list)
return -ENOMEM;
+ if (dst_move_nr_items) {
+ dst_move_tm = tree_mod_log_alloc_move(dst, dst_offset + nr_items,
+ dst_offset, dst_move_nr_items);
+ if (IS_ERR(dst_move_tm)) {
+ ret = PTR_ERR(dst_move_tm);
+ dst_move_tm = NULL;
+ goto free_tms;
+ }
+ }
+ if (src_move_nr_items) {
+ src_move_tm = tree_mod_log_alloc_move(src, src_offset,
+ src_offset + nr_items,
+ src_move_nr_items);
+ if (IS_ERR(src_move_tm)) {
+ ret = PTR_ERR(src_move_tm);
+ src_move_tm = NULL;
+ goto free_tms;
+ }
+ }
+
tm_list_add = tm_list;
tm_list_rem = tm_list + nr_items;
for (i = 0; i < nr_items; i++) {
@@ -523,6 +562,11 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
goto free_tms;
locked = true;
+ if (dst_move_tm) {
+ ret = tree_mod_log_insert(fs_info, dst_move_tm);
+ if (ret)
+ goto free_tms;
+ }
for (i = 0; i < nr_items; i++) {
ret = tree_mod_log_insert(fs_info, tm_list_rem[i]);
if (ret)
@@ -531,6 +575,11 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
if (ret)
goto free_tms;
}
+ if (src_move_tm) {
+ ret = tree_mod_log_insert(fs_info, src_move_tm);
+ if (ret)
+ goto free_tms;
+ }
write_unlock(&fs_info->tree_mod_log_lock);
kfree(tm_list);
@@ -538,6 +587,12 @@ int btrfs_tree_mod_log_eb_copy(struct extent_buffer *dst,
return 0;
free_tms:
+ if (dst_move_tm && !RB_EMPTY_NODE(&dst_move_tm->node))
+ rb_erase(&dst_move_tm->node, &fs_info->tree_mod_log);
+ kfree(dst_move_tm);
+ if (src_move_tm && !RB_EMPTY_NODE(&src_move_tm->node))
+ rb_erase(&src_move_tm->node, &fs_info->tree_mod_log);
+ kfree(src_move_tm);
for (i = 0; i < nr_items * 2; i++) {
if (tm_list[i] && !RB_EMPTY_NODE(&tm_list[i]->node))
rb_erase(&tm_list[i]->node, &fs_info->tree_mod_log);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071607-crunchy-washbowl-2fd3@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01f Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:50 +0200
Subject: [PATCH] btrfs: fix range_end calculation in extent_write_locked_range
The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range. Not doing this
could cause extra writeout, which is harmless but suboptimal.
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5b3b15f7ab2..0726c82db309 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2310,7 +2310,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
struct writeback_control wbc_writepages = {
.sync_mode = WB_SYNC_ALL,
.range_start = start,
- .range_end = end + 1,
+ .range_end = end,
.no_cgroup_owner = 1,
};
struct btrfs_bio_ctrl bio_ctrl = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071603-hangup-pegboard-8db1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01f Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:50 +0200
Subject: [PATCH] btrfs: fix range_end calculation in extent_write_locked_range
The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range. Not doing this
could cause extra writeout, which is harmless but suboptimal.
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5b3b15f7ab2..0726c82db309 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2310,7 +2310,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
struct writeback_control wbc_writepages = {
.sync_mode = WB_SYNC_ALL,
.range_start = start,
- .range_end = end + 1,
+ .range_end = end,
.no_cgroup_owner = 1,
};
struct btrfs_bio_ctrl bio_ctrl = {
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071602-swinger-darn-2b82@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01f Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:50 +0200
Subject: [PATCH] btrfs: fix range_end calculation in extent_write_locked_range
The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range. Not doing this
could cause extra writeout, which is harmless but suboptimal.
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5b3b15f7ab2..0726c82db309 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2310,7 +2310,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
struct writeback_control wbc_writepages = {
.sync_mode = WB_SYNC_ALL,
.range_start = start,
- .range_end = end + 1,
+ .range_end = end,
.no_cgroup_owner = 1,
};
struct btrfs_bio_ctrl bio_ctrl = {
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071601-bagged-chunk-6945@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36614a3beba33a05ad78d4dcb9aa1d00e8a7d01f Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:50 +0200
Subject: [PATCH] btrfs: fix range_end calculation in extent_write_locked_range
The range_end field in struct writeback_control is inclusive, just like
the end parameter passed to extent_write_locked_range. Not doing this
could cause extra writeout, which is harmless but suboptimal.
Fixes: 771ed689d2cd ("Btrfs: Optimize compressed writeback and reads")
CC: stable(a)vger.kernel.org # 5.9+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d5b3b15f7ab2..0726c82db309 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2310,7 +2310,7 @@ int extent_write_locked_range(struct inode *inode, u64 start, u64 end)
struct writeback_control wbc_writepages = {
.sync_mode = WB_SYNC_ALL,
.range_start = start,
- .range_end = end + 1,
+ .range_end = end,
.no_cgroup_owner = 1,
};
struct btrfs_bio_ctrl bio_ctrl = {
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071649-pushcart-bobtail-27e7@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
2c14f0ffdd30 ("btrfs: fix fsverify read error handling in end_page_read")
ed9ee98ecb4f ("btrfs: factor out a btrfs_verify_page helper")
0571b6357c5e ("btrfs: remove the io_failure_record infrastructure")
7609afac6775 ("btrfs: handle checksum validation and repair at the storage layer")
7276aa7d3825 ("btrfs: save the bio iter for checksum validation in common code")
d0e5cb2be770 ("btrfs: add a btrfs_inode pointer to struct btrfs_bio")
e0cfbb2ccabb ("btrfs: better document struct btrfs_bio")
bacf60e51586 ("btrfs: move repair_io_failure to bio.c")
103c19723c80 ("btrfs: split the bio submission path into a separate file")
cb3e217bdb39 ("btrfs: use btrfs_dev_name() helper to handle missing devices better")
2c8f5e8cdf0f ("btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree")
947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()")
789d6a3a876e ("btrfs: concentrate all tree block parentness check parameters into one structure")
35da5a7edec3 ("btrfs: drop private_data parameter from extent_io_tree_init")
621af94af334 ("btrfs: pass btrfs_inode to btrfs_check_data_csum")
bb41632ea7d2 ("btrfs: pass btrfs_inode to btrfs_submit_dio_bio")
e2884c3d4456 ("btrfs: switch btrfs_dio_private::inode to btrfs_inode")
d8f9268ece91 ("btrfs: pass btrfs_inode to btrfs_repair_one_sector")
d781c1c315ce ("btrfs: pass btrfs_inode to btrfs_submit_dio_repair_bio")
b762041629e7 ("btrfs: pass btrfs_inode to btrfs_submit_data_read_bio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:52 +0200
Subject: [PATCH] btrfs: fix fsverify read error handling in end_page_read
Also clear the uptodate bit to make sure the page isn't seen as uptodate
in the page cache if fsverity verification fails.
Fixes: 146054090b08 ("btrfs: initial fsverity support")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e42ce48b52e..a943a6622489 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -497,12 +497,8 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
ASSERT(page_offset(page) <= start &&
start + len <= page_offset(page) + PAGE_SIZE);
- if (uptodate) {
- if (!btrfs_verify_page(page, start)) {
- btrfs_page_set_error(fs_info, page, start, len);
- } else {
- btrfs_page_set_uptodate(fs_info, page, start, len);
- }
+ if (uptodate && btrfs_verify_page(page, start)) {
+ btrfs_page_set_uptodate(fs_info, page, start, len);
} else {
btrfs_page_clear_uptodate(fs_info, page, start, len);
btrfs_page_set_error(fs_info, page, start, len);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071648-certainly-sculpture-22e3@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
2c14f0ffdd30 ("btrfs: fix fsverify read error handling in end_page_read")
ed9ee98ecb4f ("btrfs: factor out a btrfs_verify_page helper")
0571b6357c5e ("btrfs: remove the io_failure_record infrastructure")
7609afac6775 ("btrfs: handle checksum validation and repair at the storage layer")
7276aa7d3825 ("btrfs: save the bio iter for checksum validation in common code")
d0e5cb2be770 ("btrfs: add a btrfs_inode pointer to struct btrfs_bio")
e0cfbb2ccabb ("btrfs: better document struct btrfs_bio")
bacf60e51586 ("btrfs: move repair_io_failure to bio.c")
103c19723c80 ("btrfs: split the bio submission path into a separate file")
cb3e217bdb39 ("btrfs: use btrfs_dev_name() helper to handle missing devices better")
2c8f5e8cdf0f ("btrfs: remove leftover setting of EXTENT_UPTODATE state in an inode's io_tree")
947a629988f1 ("btrfs: move tree block parentness check into validate_extent_buffer()")
789d6a3a876e ("btrfs: concentrate all tree block parentness check parameters into one structure")
35da5a7edec3 ("btrfs: drop private_data parameter from extent_io_tree_init")
621af94af334 ("btrfs: pass btrfs_inode to btrfs_check_data_csum")
bb41632ea7d2 ("btrfs: pass btrfs_inode to btrfs_submit_dio_bio")
e2884c3d4456 ("btrfs: switch btrfs_dio_private::inode to btrfs_inode")
d8f9268ece91 ("btrfs: pass btrfs_inode to btrfs_repair_one_sector")
d781c1c315ce ("btrfs: pass btrfs_inode to btrfs_submit_dio_repair_bio")
b762041629e7 ("btrfs: pass btrfs_inode to btrfs_submit_data_read_bio")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:52 +0200
Subject: [PATCH] btrfs: fix fsverify read error handling in end_page_read
Also clear the uptodate bit to make sure the page isn't seen as uptodate
in the page cache if fsverity verification fails.
Fixes: 146054090b08 ("btrfs: initial fsverity support")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e42ce48b52e..a943a6622489 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -497,12 +497,8 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
ASSERT(page_offset(page) <= start &&
start + len <= page_offset(page) + PAGE_SIZE);
- if (uptodate) {
- if (!btrfs_verify_page(page, start)) {
- btrfs_page_set_error(fs_info, page, start, len);
- } else {
- btrfs_page_set_uptodate(fs_info, page, start, len);
- }
+ if (uptodate && btrfs_verify_page(page, start)) {
+ btrfs_page_set_uptodate(fs_info, page, start, len);
} else {
btrfs_page_clear_uptodate(fs_info, page, start, len);
btrfs_page_set_error(fs_info, page, start, len);
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071642-refinish-raking-4c3a@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2c14f0ffdd30bd3d321ad5fe76fcf701746e1df6 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Wed, 31 May 2023 08:04:52 +0200
Subject: [PATCH] btrfs: fix fsverify read error handling in end_page_read
Also clear the uptodate bit to make sure the page isn't seen as uptodate
in the page cache if fsverity verification fails.
Fixes: 146054090b08 ("btrfs: initial fsverity support")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Josef Bacik <josef(a)toxicpanda.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: David Sterba <dsterba(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 8e42ce48b52e..a943a6622489 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -497,12 +497,8 @@ static void end_page_read(struct page *page, bool uptodate, u64 start, u32 len)
ASSERT(page_offset(page) <= start &&
start + len <= page_offset(page) + PAGE_SIZE);
- if (uptodate) {
- if (!btrfs_verify_page(page, start)) {
- btrfs_page_set_error(fs_info, page, start, len);
- } else {
- btrfs_page_set_uptodate(fs_info, page, start, len);
- }
+ if (uptodate && btrfs_verify_page(page, start)) {
+ btrfs_page_set_uptodate(fs_info, page, start, len);
} else {
btrfs_page_clear_uptodate(fs_info, page, start, len);
btrfs_page_set_error(fs_info, page, start, len);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f18cc97845aa4ae0e795c088c979fe1642b3b8e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071617-same-snowbird-770b@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f18cc97845aa ("btrfs: fix dirty_metadata_bytes for redirtied buffers")
98c8d683c291 ("btrfs: combine btrfs_clear_buffer_dirty and clear_extent_buffer_dirty")
190a83391bc4 ("btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty")
c4e54a657116 ("btrfs: replace clearing extent buffer dirty bit with btrfs_clean_block")
ed25dab3a0d1 ("btrfs: add trans argument to btrfs_clean_tree_block")
d3fb66150c05 ("btrfs: always lock the block before calling btrfs_clean_tree_block")
7f0add250f82 ("btrfs: move super_block specific helpers into super.h")
c03b22076bd2 ("btrfs: move super prototypes into super.h")
5c11adcc383a ("btrfs: move verity prototypes into verity.h")
77407dc032e2 ("btrfs: move dev-replace prototypes into dev-replace.h")
2fc6822c99d7 ("btrfs: move scrub prototypes into scrub.h")
677074792a1d ("btrfs: move relocation prototypes into relocation.h")
33cf97a7b658 ("btrfs: move acl prototypes into acl.h")
b538a271ae9b ("btrfs: move the 32bit warn defines into messages.h")
af142b6f44d3 ("btrfs: move file prototypes to file.h")
7572dec8f522 ("btrfs: move ioctl prototypes into ioctl.h")
c7a03b524d30 ("btrfs: move uuid tree prototypes to uuid-tree.h")
7c8ede162805 ("btrfs: move file-item prototypes into their own header")
f2b39277b87d ("btrfs: move dir-item prototypes into dir-item.h")
59b818e064ab ("btrfs: move defrag related prototypes to their own header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f18cc97845aa4ae0e795c088c979fe1642b3b8e5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Mon, 8 May 2023 07:58:38 -0700
Subject: [PATCH] btrfs: fix dirty_metadata_bytes for redirtied buffers
dirty_metadata_bytes is decremented in both places that clear the dirty
bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which
means that a buffer that is redirtied using btrfs_redirty_list_add won't
be added to dirty_metadata_bytes, but it will be subtracted when written
out, leading an inconsistency in the counter.
Move the dirty_metadata_bytes from btrfs_mark_buffer_dirty into
set_extent_buffer_dirty to also account for the redirty case, and remove
the now unused set_extent_buffer_dirty return value.
Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 83518ed71bfd..112c99dde57f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4621,7 +4621,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
{
struct btrfs_fs_info *fs_info = buf->fs_info;
u64 transid = btrfs_header_generation(buf);
- int was_dirty;
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
/*
@@ -4636,11 +4635,7 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
if (transid != fs_info->generation)
WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",
buf->start, transid, fs_info->generation);
- was_dirty = set_extent_buffer_dirty(buf);
- if (!was_dirty)
- percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
- buf->len,
- fs_info->dirty_metadata_batch);
+ set_extent_buffer_dirty(buf);
#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
/*
* btrfs_check_leaf() won't check item data if we don't have WRITTEN
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a1adadd5d25d..a829390632a5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4148,7 +4148,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
WARN_ON(atomic_read(&eb->refs) == 0);
}
-bool set_extent_buffer_dirty(struct extent_buffer *eb)
+void set_extent_buffer_dirty(struct extent_buffer *eb)
{
int i;
int num_pages;
@@ -4183,13 +4183,14 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
eb->start, eb->len);
if (subpage)
unlock_page(eb->pages[0]);
+ percpu_counter_add_batch(&eb->fs_info->dirty_metadata_bytes,
+ eb->len,
+ eb->fs_info->dirty_metadata_batch);
}
#ifdef CONFIG_BTRFS_DEBUG
for (i = 0; i < num_pages; i++)
ASSERT(PageDirty(eb->pages[i]));
#endif
-
- return was_dirty;
}
void clear_extent_buffer_uptodate(struct extent_buffer *eb)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4341ad978fb8..f937654230d3 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -262,7 +262,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
unsigned long start, unsigned long pos,
unsigned long len);
-bool set_extent_buffer_dirty(struct extent_buffer *eb);
+void set_extent_buffer_dirty(struct extent_buffer *eb);
void set_extent_buffer_uptodate(struct extent_buffer *eb);
void clear_extent_buffer_uptodate(struct extent_buffer *eb);
int extent_buffer_under_io(const struct extent_buffer *eb);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f18cc97845aa4ae0e795c088c979fe1642b3b8e5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071616-exact-egotistic-c92e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f18cc97845aa ("btrfs: fix dirty_metadata_bytes for redirtied buffers")
98c8d683c291 ("btrfs: combine btrfs_clear_buffer_dirty and clear_extent_buffer_dirty")
190a83391bc4 ("btrfs: rename btrfs_clean_tree_block to btrfs_clear_buffer_dirty")
c4e54a657116 ("btrfs: replace clearing extent buffer dirty bit with btrfs_clean_block")
ed25dab3a0d1 ("btrfs: add trans argument to btrfs_clean_tree_block")
d3fb66150c05 ("btrfs: always lock the block before calling btrfs_clean_tree_block")
7f0add250f82 ("btrfs: move super_block specific helpers into super.h")
c03b22076bd2 ("btrfs: move super prototypes into super.h")
5c11adcc383a ("btrfs: move verity prototypes into verity.h")
77407dc032e2 ("btrfs: move dev-replace prototypes into dev-replace.h")
2fc6822c99d7 ("btrfs: move scrub prototypes into scrub.h")
677074792a1d ("btrfs: move relocation prototypes into relocation.h")
33cf97a7b658 ("btrfs: move acl prototypes into acl.h")
b538a271ae9b ("btrfs: move the 32bit warn defines into messages.h")
af142b6f44d3 ("btrfs: move file prototypes to file.h")
7572dec8f522 ("btrfs: move ioctl prototypes into ioctl.h")
c7a03b524d30 ("btrfs: move uuid tree prototypes to uuid-tree.h")
7c8ede162805 ("btrfs: move file-item prototypes into their own header")
f2b39277b87d ("btrfs: move dir-item prototypes into dir-item.h")
59b818e064ab ("btrfs: move defrag related prototypes to their own header")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f18cc97845aa4ae0e795c088c979fe1642b3b8e5 Mon Sep 17 00:00:00 2001
From: Christoph Hellwig <hch(a)lst.de>
Date: Mon, 8 May 2023 07:58:38 -0700
Subject: [PATCH] btrfs: fix dirty_metadata_bytes for redirtied buffers
dirty_metadata_bytes is decremented in both places that clear the dirty
bit in a buffer, but only incremented in btrfs_mark_buffer_dirty, which
means that a buffer that is redirtied using btrfs_redirty_list_add won't
be added to dirty_metadata_bytes, but it will be subtracted when written
out, leading an inconsistency in the counter.
Move the dirty_metadata_bytes from btrfs_mark_buffer_dirty into
set_extent_buffer_dirty to also account for the redirty case, and remove
the now unused set_extent_buffer_dirty return value.
Fixes: d3575156f662 ("btrfs: zoned: redirty released extent buffers")
CC: stable(a)vger.kernel.org # 5.15+
Reviewed-by: Naohiro Aota <naohiro.aota(a)wdc.com>
Signed-off-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 83518ed71bfd..112c99dde57f 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4621,7 +4621,6 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
{
struct btrfs_fs_info *fs_info = buf->fs_info;
u64 transid = btrfs_header_generation(buf);
- int was_dirty;
#ifdef CONFIG_BTRFS_FS_RUN_SANITY_TESTS
/*
@@ -4636,11 +4635,7 @@ void btrfs_mark_buffer_dirty(struct extent_buffer *buf)
if (transid != fs_info->generation)
WARN(1, KERN_CRIT "btrfs transid mismatch buffer %llu, found %llu running %llu\n",
buf->start, transid, fs_info->generation);
- was_dirty = set_extent_buffer_dirty(buf);
- if (!was_dirty)
- percpu_counter_add_batch(&fs_info->dirty_metadata_bytes,
- buf->len,
- fs_info->dirty_metadata_batch);
+ set_extent_buffer_dirty(buf);
#ifdef CONFIG_BTRFS_FS_CHECK_INTEGRITY
/*
* btrfs_check_leaf() won't check item data if we don't have WRITTEN
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index a1adadd5d25d..a829390632a5 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4148,7 +4148,7 @@ void btrfs_clear_buffer_dirty(struct btrfs_trans_handle *trans,
WARN_ON(atomic_read(&eb->refs) == 0);
}
-bool set_extent_buffer_dirty(struct extent_buffer *eb)
+void set_extent_buffer_dirty(struct extent_buffer *eb)
{
int i;
int num_pages;
@@ -4183,13 +4183,14 @@ bool set_extent_buffer_dirty(struct extent_buffer *eb)
eb->start, eb->len);
if (subpage)
unlock_page(eb->pages[0]);
+ percpu_counter_add_batch(&eb->fs_info->dirty_metadata_bytes,
+ eb->len,
+ eb->fs_info->dirty_metadata_batch);
}
#ifdef CONFIG_BTRFS_DEBUG
for (i = 0; i < num_pages; i++)
ASSERT(PageDirty(eb->pages[i]));
#endif
-
- return was_dirty;
}
void clear_extent_buffer_uptodate(struct extent_buffer *eb)
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index 4341ad978fb8..f937654230d3 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -262,7 +262,7 @@ void extent_buffer_bitmap_set(const struct extent_buffer *eb, unsigned long star
void extent_buffer_bitmap_clear(const struct extent_buffer *eb,
unsigned long start, unsigned long pos,
unsigned long len);
-bool set_extent_buffer_dirty(struct extent_buffer *eb);
+void set_extent_buffer_dirty(struct extent_buffer *eb);
void set_extent_buffer_uptodate(struct extent_buffer *eb);
void clear_extent_buffer_uptodate(struct extent_buffer *eb);
int extent_buffer_under_io(const struct extent_buffer *eb);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 2d8ae8c417db284f598dffb178cc01e7db0f1821
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071623-graceful-breeder-217d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
2d8ae8c417db ("nfsd: use vfs setgid helper")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2d8ae8c417db284f598dffb178cc01e7db0f1821 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Tue, 2 May 2023 15:36:02 +0200
Subject: [PATCH] nfsd: use vfs setgid helper
We've aligned setgid behavior over multiple kernel releases. The details
can be found in commit cf619f891971 ("Merge tag 'fs.ovl.setgid.v6.2' of
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping") and
commit 426b4ca2d6a5 ("Merge tag 'fs.setgid.v6.0' of
git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux").
Consistent setgid stripping behavior is now encapsulated in the
setattr_should_drop_sgid() helper which is used by all filesystems that
strip setgid bits outside of vfs proper. Usually ATTR_KILL_SGID is
raised in e.g., chown_common() and is subject to the
setattr_should_drop_sgid() check to determine whether the setgid bit can
be retained. Since nfsd is raising ATTR_KILL_SGID unconditionally it
will cause notify_change() to strip it even if the caller had the
necessary privileges to retain it. Ensure that nfsd only raises
ATR_KILL_SGID if the caller lacks the necessary privileges to retain the
setgid bit.
Without this patch the setgid stripping tests in LTP will fail:
> As you can see, the problem is S_ISGID (0002000) was dropped on a
> non-group-executable file while chown was invoked by super-user, while
[...]
> fchown02.c:66: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
[...]
> chown02.c:57: TFAIL: testfile2: wrong mode permissions 0100700, expected 0102700
With this patch all tests pass.
Reported-by: Sherry Yang <sherry.yang(a)oracle.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
diff --git a/fs/nfsd/vfs.c b/fs/nfsd/vfs.c
index db67f8e19344..0016bcc04a59 100644
--- a/fs/nfsd/vfs.c
+++ b/fs/nfsd/vfs.c
@@ -388,7 +388,9 @@ nfsd_sanitize_attrs(struct inode *inode, struct iattr *iap)
iap->ia_mode &= ~S_ISGID;
} else {
/* set ATTR_KILL_* bits and let VFS handle it */
- iap->ia_valid |= (ATTR_KILL_SUID | ATTR_KILL_SGID);
+ iap->ia_valid |= ATTR_KILL_SUID;
+ iap->ia_valid |=
+ setattr_should_drop_sgid(&nop_mnt_idmap, inode);
}
}
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071615-caddy-transport-24a5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
b719ebc37a1e ("wifi: ath10k: Serialize wake_tx_queue ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e Mon Sep 17 00:00:00 2001
From: Alexander Wetzel <alexander(a)wetzel-home.de>
Date: Thu, 23 Mar 2023 17:55:27 +0100
Subject: [PATCH] wifi: ath10k: Serialize wake_tx_queue ops
Serialize the ath10k implementation of the wake_tx_queue ops.
ath10k_mac_op_wake_tx_queue() must not run concurrent since it's using
ieee80211_txq_schedule_start().
The intend of this patch is to sort out an issue discovered in the discussion
referred to by the Link tag.
I can't test it with real hardware and thus just implemented the per-ac queue
lock Felix suggested. One obvious alternative to the per-ac lock would be to
bring back the txqs_lock commit bb2edb733586 ("ath10k: migrate to mac80211 txq
scheduling") dropped.
Fixes: bb2edb733586 ("ath10k: migrate to mac80211 txq scheduling")
Reported-by: Felix Fietkau <nbd(a)nbd.name>
Link: https://lore.kernel.org/r/519b5bb9-8899-ae7c-4eff-f3116cdfdb56@nbd.name
CC: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Wetzel <alexander(a)wetzel-home.de>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://lore.kernel.org/r/20230323165527.156414-1-alexander@wetzel-home.de
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 5eb131ab916f..533ed7169e11 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -3643,6 +3643,9 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,
mutex_init(&ar->dump_mutex);
spin_lock_init(&ar->data_lock);
+ for (int ac = 0; ac < IEEE80211_NUM_ACS; ac++)
+ spin_lock_init(&ar->queue_lock[ac]);
+
INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
init_waitqueue_head(&ar->htt.empty_tx_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index f5de8ce8fb45..4b5239de4018 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -1170,6 +1170,9 @@ struct ath10k {
/* protects shared structure data */
spinlock_t data_lock;
+ /* serialize wake_tx_queue calls per ac */
+ spinlock_t queue_lock[IEEE80211_NUM_ACS];
+
struct list_head arvifs;
struct list_head peers;
struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 7675858f069b..9c4bf2fdbc0f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4732,13 +4732,14 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
{
struct ath10k *ar = hw->priv;
int ret;
- u8 ac;
+ u8 ac = txq->ac;
ath10k_htt_tx_txq_update(hw, txq);
if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
return;
- ac = txq->ac;
+ spin_lock_bh(&ar->queue_lock[ac]);
+
ieee80211_txq_schedule_start(hw, ac);
txq = ieee80211_next_txq(hw, ac);
if (!txq)
@@ -4753,6 +4754,7 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
ath10k_htt_tx_txq_update(hw, txq);
out:
ieee80211_txq_schedule_end(hw, ac);
+ spin_unlock_bh(&ar->queue_lock[ac]);
}
/* Must not be called with conf_mutex held as workers can use that also. */
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071651-chill-scary-2f9f@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
b719ebc37a1e ("wifi: ath10k: Serialize wake_tx_queue ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e Mon Sep 17 00:00:00 2001
From: Alexander Wetzel <alexander(a)wetzel-home.de>
Date: Thu, 23 Mar 2023 17:55:27 +0100
Subject: [PATCH] wifi: ath10k: Serialize wake_tx_queue ops
Serialize the ath10k implementation of the wake_tx_queue ops.
ath10k_mac_op_wake_tx_queue() must not run concurrent since it's using
ieee80211_txq_schedule_start().
The intend of this patch is to sort out an issue discovered in the discussion
referred to by the Link tag.
I can't test it with real hardware and thus just implemented the per-ac queue
lock Felix suggested. One obvious alternative to the per-ac lock would be to
bring back the txqs_lock commit bb2edb733586 ("ath10k: migrate to mac80211 txq
scheduling") dropped.
Fixes: bb2edb733586 ("ath10k: migrate to mac80211 txq scheduling")
Reported-by: Felix Fietkau <nbd(a)nbd.name>
Link: https://lore.kernel.org/r/519b5bb9-8899-ae7c-4eff-f3116cdfdb56@nbd.name
CC: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Wetzel <alexander(a)wetzel-home.de>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://lore.kernel.org/r/20230323165527.156414-1-alexander@wetzel-home.de
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 5eb131ab916f..533ed7169e11 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -3643,6 +3643,9 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,
mutex_init(&ar->dump_mutex);
spin_lock_init(&ar->data_lock);
+ for (int ac = 0; ac < IEEE80211_NUM_ACS; ac++)
+ spin_lock_init(&ar->queue_lock[ac]);
+
INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
init_waitqueue_head(&ar->htt.empty_tx_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index f5de8ce8fb45..4b5239de4018 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -1170,6 +1170,9 @@ struct ath10k {
/* protects shared structure data */
spinlock_t data_lock;
+ /* serialize wake_tx_queue calls per ac */
+ spinlock_t queue_lock[IEEE80211_NUM_ACS];
+
struct list_head arvifs;
struct list_head peers;
struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 7675858f069b..9c4bf2fdbc0f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4732,13 +4732,14 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
{
struct ath10k *ar = hw->priv;
int ret;
- u8 ac;
+ u8 ac = txq->ac;
ath10k_htt_tx_txq_update(hw, txq);
if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
return;
- ac = txq->ac;
+ spin_lock_bh(&ar->queue_lock[ac]);
+
ieee80211_txq_schedule_start(hw, ac);
txq = ieee80211_next_txq(hw, ac);
if (!txq)
@@ -4753,6 +4754,7 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
ath10k_htt_tx_txq_update(hw, txq);
out:
ieee80211_txq_schedule_end(hw, ac);
+ spin_unlock_bh(&ar->queue_lock[ac]);
}
/* Must not be called with conf_mutex held as workers can use that also. */
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071638-cancel-excluding-0e69@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
b719ebc37a1e ("wifi: ath10k: Serialize wake_tx_queue ops")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b719ebc37a1eacd4fd4f1264f731b016e5ec0c6e Mon Sep 17 00:00:00 2001
From: Alexander Wetzel <alexander(a)wetzel-home.de>
Date: Thu, 23 Mar 2023 17:55:27 +0100
Subject: [PATCH] wifi: ath10k: Serialize wake_tx_queue ops
Serialize the ath10k implementation of the wake_tx_queue ops.
ath10k_mac_op_wake_tx_queue() must not run concurrent since it's using
ieee80211_txq_schedule_start().
The intend of this patch is to sort out an issue discovered in the discussion
referred to by the Link tag.
I can't test it with real hardware and thus just implemented the per-ac queue
lock Felix suggested. One obvious alternative to the per-ac lock would be to
bring back the txqs_lock commit bb2edb733586 ("ath10k: migrate to mac80211 txq
scheduling") dropped.
Fixes: bb2edb733586 ("ath10k: migrate to mac80211 txq scheduling")
Reported-by: Felix Fietkau <nbd(a)nbd.name>
Link: https://lore.kernel.org/r/519b5bb9-8899-ae7c-4eff-f3116cdfdb56@nbd.name
CC: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Wetzel <alexander(a)wetzel-home.de>
Signed-off-by: Kalle Valo <quic_kvalo(a)quicinc.com>
Link: https://lore.kernel.org/r/20230323165527.156414-1-alexander@wetzel-home.de
diff --git a/drivers/net/wireless/ath/ath10k/core.c b/drivers/net/wireless/ath/ath10k/core.c
index 5eb131ab916f..533ed7169e11 100644
--- a/drivers/net/wireless/ath/ath10k/core.c
+++ b/drivers/net/wireless/ath/ath10k/core.c
@@ -3643,6 +3643,9 @@ struct ath10k *ath10k_core_create(size_t priv_size, struct device *dev,
mutex_init(&ar->dump_mutex);
spin_lock_init(&ar->data_lock);
+ for (int ac = 0; ac < IEEE80211_NUM_ACS; ac++)
+ spin_lock_init(&ar->queue_lock[ac]);
+
INIT_LIST_HEAD(&ar->peers);
init_waitqueue_head(&ar->peer_mapping_wq);
init_waitqueue_head(&ar->htt.empty_tx_wq);
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h
index f5de8ce8fb45..4b5239de4018 100644
--- a/drivers/net/wireless/ath/ath10k/core.h
+++ b/drivers/net/wireless/ath/ath10k/core.h
@@ -1170,6 +1170,9 @@ struct ath10k {
/* protects shared structure data */
spinlock_t data_lock;
+ /* serialize wake_tx_queue calls per ac */
+ spinlock_t queue_lock[IEEE80211_NUM_ACS];
+
struct list_head arvifs;
struct list_head peers;
struct ath10k_peer *peer_map[ATH10K_MAX_NUM_PEER_IDS];
diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c
index 7675858f069b..9c4bf2fdbc0f 100644
--- a/drivers/net/wireless/ath/ath10k/mac.c
+++ b/drivers/net/wireless/ath/ath10k/mac.c
@@ -4732,13 +4732,14 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
{
struct ath10k *ar = hw->priv;
int ret;
- u8 ac;
+ u8 ac = txq->ac;
ath10k_htt_tx_txq_update(hw, txq);
if (ar->htt.tx_q_state.mode != HTT_TX_MODE_SWITCH_PUSH)
return;
- ac = txq->ac;
+ spin_lock_bh(&ar->queue_lock[ac]);
+
ieee80211_txq_schedule_start(hw, ac);
txq = ieee80211_next_txq(hw, ac);
if (!txq)
@@ -4753,6 +4754,7 @@ static void ath10k_mac_op_wake_tx_queue(struct ieee80211_hw *hw,
ath10k_htt_tx_txq_update(hw, txq);
out:
ieee80211_txq_schedule_end(hw, ac);
+ spin_unlock_bh(&ar->queue_lock[ac]);
}
/* Must not be called with conf_mutex held as workers can use that also. */
Hi Greg,
Could you please pick the below commit for v6.3, v6.1 and v5.15
Upstream Commit:
04292c695f82 2023-05-16ipvs: increase ip_vs_conn_tab_bits range for 64BIT [Pablo Neira Ayuso]
Thanks,
Allen
Hi,
A problem exists where dGPUs with type-C ports are considered power
supplies that power the system.
This leads to poor performance of the dGPU because graphics drivers like
amdgpu use power_supply_is_system_supplied() to decide how to configure
the dGPU.
This has been fixed in 6.5-rc1 by marking dGPUs as "DEVICE".
The logic to fix what to do when DEVICE is encountered was fixed in
6.4-rc4 and already backported to stable:
95339f40a8b6 ("power: supply: Fix logic checking if system is running
from battery")
So to wrap up the fix in stable kernels can you please backport:
6.4.y:
a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
6.1.y:
f510b0a3565b ("i2c: nvidia-gpu: Add ACPI property to align with
device-tree")
430b38764fbb ("i2c: nvidia-gpu: Remove ccgx,firmware-build property")
a7fbfd44c020 ("usb: typec: ucsi: Mark dGPUs as DEVICE scope")
Thanks!
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 36ce9d76b0a93bae799e27e4f5ac35478c676592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071640-facedown-shrine-ada1@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
36ce9d76b0a9 ("shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs")
d7167b149943 ("fs_parse: fold fs_parameter_desc/fs_parameter_spec")
96cafb9ccb15 ("fs_parser: remove fs_parameter_description name field")
cc3c0b533ab9 ("add prefix to fs_context->log")
c80c98f0dc5d ("ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log")
7f5d38141e30 ("new primitive: __fs_parse()")
2c3f3dc31556 ("switch rbd and libceph to p_log-based primitives")
3fbb8d5554a1 ("struct p_log, variants of warnf() et.al. taking that one instead")
9f09f649ca33 ("teach logfc() to handle prefices, give it saner calling conventions")
5eede625297f ("fold struct fs_parameter_enum into struct constant_table")
2710c957a8ef ("fs_parse: get rid of ->enums")
0f89589a8c6f ("Pass consistent param->type to fs_parse()")
f2aedb713c28 ("NFS: Add fs_context support.")
e38bb238ed8c ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
e558100fda7e ("NFS: Do some tidying of the parsing code")
48be8a66cf98 ("NFS: Add a small buffer in nfs_fs_context to avoid string dup")
cbd071b5daa0 ("NFS: Deindent nfs_fs_context_parse_option()")
f8ee01e3e2c8 ("NFS: Split nfs_parse_mount_options()")
5eb005caf538 ("NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context")
e0a626b12474 ("NFS: Constify mount argument match tables")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36ce9d76b0a93bae799e27e4f5ac35478c676592 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Wed, 7 Jun 2023 18:15:23 +0200
Subject: [PATCH] shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based
tmpfs
As the ramfs-based tmpfs uses ramfs_init_fs_context() for the
init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb()
to free it and avoid a memory leak.
Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweiclou…
Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 5ba580c78835..fef477c78107 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -278,7 +278,7 @@ int ramfs_init_fs_context(struct fs_context *fc)
return 0;
}
-static void ramfs_kill_sb(struct super_block *sb)
+void ramfs_kill_sb(struct super_block *sb)
{
kfree(sb->s_fs_info);
kill_litter_super(sb);
diff --git a/include/linux/ramfs.h b/include/linux/ramfs.h
index 917528d102c4..d506dc63dd47 100644
--- a/include/linux/ramfs.h
+++ b/include/linux/ramfs.h
@@ -7,6 +7,7 @@
struct inode *ramfs_get_inode(struct super_block *sb, const struct inode *dir,
umode_t mode, dev_t dev);
extern int ramfs_init_fs_context(struct fs_context *fc);
+extern void ramfs_kill_sb(struct super_block *sb);
#ifdef CONFIG_MMU
static inline int
diff --git a/mm/shmem.c b/mm/shmem.c
index 5e54ab5f61f2..c606ab89693a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4199,7 +4199,7 @@ static struct file_system_type shmem_fs_type = {
.name = "tmpfs",
.init_fs_context = ramfs_init_fs_context,
.parameters = ramfs_fs_parameters,
- .kill_sb = kill_litter_super,
+ .kill_sb = ramfs_kill_sb,
.fs_flags = FS_USERNS_MOUNT,
};
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 36ce9d76b0a93bae799e27e4f5ac35478c676592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071639-deeply-cuddly-0824@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
36ce9d76b0a9 ("shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs")
d7167b149943 ("fs_parse: fold fs_parameter_desc/fs_parameter_spec")
96cafb9ccb15 ("fs_parser: remove fs_parameter_description name field")
cc3c0b533ab9 ("add prefix to fs_context->log")
c80c98f0dc5d ("ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log")
7f5d38141e30 ("new primitive: __fs_parse()")
2c3f3dc31556 ("switch rbd and libceph to p_log-based primitives")
3fbb8d5554a1 ("struct p_log, variants of warnf() et.al. taking that one instead")
9f09f649ca33 ("teach logfc() to handle prefices, give it saner calling conventions")
5eede625297f ("fold struct fs_parameter_enum into struct constant_table")
2710c957a8ef ("fs_parse: get rid of ->enums")
0f89589a8c6f ("Pass consistent param->type to fs_parse()")
f2aedb713c28 ("NFS: Add fs_context support.")
e38bb238ed8c ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
e558100fda7e ("NFS: Do some tidying of the parsing code")
48be8a66cf98 ("NFS: Add a small buffer in nfs_fs_context to avoid string dup")
cbd071b5daa0 ("NFS: Deindent nfs_fs_context_parse_option()")
f8ee01e3e2c8 ("NFS: Split nfs_parse_mount_options()")
5eb005caf538 ("NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context")
e0a626b12474 ("NFS: Constify mount argument match tables")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36ce9d76b0a93bae799e27e4f5ac35478c676592 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Wed, 7 Jun 2023 18:15:23 +0200
Subject: [PATCH] shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based
tmpfs
As the ramfs-based tmpfs uses ramfs_init_fs_context() for the
init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb()
to free it and avoid a memory leak.
Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweiclou…
Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 5ba580c78835..fef477c78107 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -278,7 +278,7 @@ int ramfs_init_fs_context(struct fs_context *fc)
return 0;
}
-static void ramfs_kill_sb(struct super_block *sb)
+void ramfs_kill_sb(struct super_block *sb)
{
kfree(sb->s_fs_info);
kill_litter_super(sb);
diff --git a/include/linux/ramfs.h b/include/linux/ramfs.h
index 917528d102c4..d506dc63dd47 100644
--- a/include/linux/ramfs.h
+++ b/include/linux/ramfs.h
@@ -7,6 +7,7 @@
struct inode *ramfs_get_inode(struct super_block *sb, const struct inode *dir,
umode_t mode, dev_t dev);
extern int ramfs_init_fs_context(struct fs_context *fc);
+extern void ramfs_kill_sb(struct super_block *sb);
#ifdef CONFIG_MMU
static inline int
diff --git a/mm/shmem.c b/mm/shmem.c
index 5e54ab5f61f2..c606ab89693a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4199,7 +4199,7 @@ static struct file_system_type shmem_fs_type = {
.name = "tmpfs",
.init_fs_context = ramfs_init_fs_context,
.parameters = ramfs_fs_parameters,
- .kill_sb = kill_litter_super,
+ .kill_sb = ramfs_kill_sb,
.fs_flags = FS_USERNS_MOUNT,
};
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 36ce9d76b0a93bae799e27e4f5ac35478c676592
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071638-barbecue-imaginary-c2c3@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
36ce9d76b0a9 ("shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs")
d7167b149943 ("fs_parse: fold fs_parameter_desc/fs_parameter_spec")
96cafb9ccb15 ("fs_parser: remove fs_parameter_description name field")
cc3c0b533ab9 ("add prefix to fs_context->log")
c80c98f0dc5d ("ceph_parse_param(), ceph_parse_mon_ips(): switch to passing fc_log")
7f5d38141e30 ("new primitive: __fs_parse()")
2c3f3dc31556 ("switch rbd and libceph to p_log-based primitives")
3fbb8d5554a1 ("struct p_log, variants of warnf() et.al. taking that one instead")
9f09f649ca33 ("teach logfc() to handle prefices, give it saner calling conventions")
5eede625297f ("fold struct fs_parameter_enum into struct constant_table")
2710c957a8ef ("fs_parse: get rid of ->enums")
0f89589a8c6f ("Pass consistent param->type to fs_parse()")
f2aedb713c28 ("NFS: Add fs_context support.")
e38bb238ed8c ("NFS: Convert mount option parsing to use functionality from fs_parser.h")
e558100fda7e ("NFS: Do some tidying of the parsing code")
48be8a66cf98 ("NFS: Add a small buffer in nfs_fs_context to avoid string dup")
cbd071b5daa0 ("NFS: Deindent nfs_fs_context_parse_option()")
f8ee01e3e2c8 ("NFS: Split nfs_parse_mount_options()")
5eb005caf538 ("NFS: Rename struct nfs_parsed_mount_data to struct nfs_fs_context")
e0a626b12474 ("NFS: Constify mount argument match tables")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 36ce9d76b0a93bae799e27e4f5ac35478c676592 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Wed, 7 Jun 2023 18:15:23 +0200
Subject: [PATCH] shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based
tmpfs
As the ramfs-based tmpfs uses ramfs_init_fs_context() for the
init_fs_context method, which allocates fc->s_fs_info, use ramfs_kill_sb()
to free it and avoid a memory leak.
Link: https://lkml.kernel.org/r/20230607161523.2876433-1-roberto.sassu@huaweiclou…
Fixes: c3b1b1cbf002 ("ramfs: add support for "mode=" mount option")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/fs/ramfs/inode.c b/fs/ramfs/inode.c
index 5ba580c78835..fef477c78107 100644
--- a/fs/ramfs/inode.c
+++ b/fs/ramfs/inode.c
@@ -278,7 +278,7 @@ int ramfs_init_fs_context(struct fs_context *fc)
return 0;
}
-static void ramfs_kill_sb(struct super_block *sb)
+void ramfs_kill_sb(struct super_block *sb)
{
kfree(sb->s_fs_info);
kill_litter_super(sb);
diff --git a/include/linux/ramfs.h b/include/linux/ramfs.h
index 917528d102c4..d506dc63dd47 100644
--- a/include/linux/ramfs.h
+++ b/include/linux/ramfs.h
@@ -7,6 +7,7 @@
struct inode *ramfs_get_inode(struct super_block *sb, const struct inode *dir,
umode_t mode, dev_t dev);
extern int ramfs_init_fs_context(struct fs_context *fc);
+extern void ramfs_kill_sb(struct super_block *sb);
#ifdef CONFIG_MMU
static inline int
diff --git a/mm/shmem.c b/mm/shmem.c
index 5e54ab5f61f2..c606ab89693a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4199,7 +4199,7 @@ static struct file_system_type shmem_fs_type = {
.name = "tmpfs",
.init_fs_context = ramfs_init_fs_context,
.parameters = ramfs_fs_parameters,
- .kill_sb = kill_litter_super,
+ .kill_sb = ramfs_kill_sb,
.fs_flags = FS_USERNS_MOUNT,
};
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x a5a319ec2c2236bb96d147c16196d2f1f3799301
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071628-dazzling-pretense-8a8d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
a5a319ec2c22 ("um: Use HOST_DIR for mrproper")
0663c68c4d2d ("kbuild: remove {CLEAN,MRPROPER,DISTCLEAN}_DIRS")
610134b750bb ("kbuild: remove misleading stale FIXME comment")
63ec90f18204 ("um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/")
8b41fc4454e3 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf")
b1fbfcb4a209 ("kbuild: make single target builds even faster")
bbc55bded4aa ("modpost: dump missing namespaces into a single modules.nsdeps file")
0241ea8cae19 ("modpost: free ns_deps_buf.p after writing ns_deps files")
bff9c62b5d20 ("modpost: do not invoke extra modpost for nsdeps")
35e046a203ee ("kbuild: remove unneeded variable, single-all")
39808e451fdf ("kbuild: do not read $(KBUILD_EXTMOD)/Module.symvers")
1747269ab016 ("modpost: do not parse vmlinux for external module builds")
fab546e6cd7a ("kbuild: update comments in scripts/Makefile.modpost")
57baec7b1b04 ("scripts/nsdeps: make sure to pass all module source files to spatch")
09684950050b ("scripts/nsdeps: use alternative sed delimiter")
e0703556644a ("Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a319ec2c2236bb96d147c16196d2f1f3799301 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Tue, 6 Jun 2023 15:24:45 -0700
Subject: [PATCH] um: Use HOST_DIR for mrproper
When HEADER_ARCH was introduced, the MRPROPER_FILES (then MRPROPER_DIRS)
list wasn't adjusted, leaving SUBARCH as part of the path argument.
This resulted in the "mrproper" target not cleaning up arch/x86/... when
SUBARCH was specified. Since HOST_DIR is arch/$(HEADER_ARCH), use it
instead to get the correct path.
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Azeem Shaikh <azeemshaikh38(a)gmail.com>
Cc: linux-um(a)lists.infradead.org
Fixes: 7bbe7204e937 ("um: merge Makefile-{i386,x86_64}")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20230606222442.never.807-kees@kernel.org
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 8186d4761bda..da4d5256af2f 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -149,7 +149,7 @@ export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(CC_FLAGS_
# When cleaning we don't include .config, so we don't include
# TT or skas makefiles and don't clean skas_ptregs.h.
CLEAN_FILES += linux x.i gmon.out
-MRPROPER_FILES += arch/$(SUBARCH)/include/generated
+MRPROPER_FILES += $(HOST_DIR)/include/generated
archclean:
@find . \( -name '*.bb' -o -name '*.bbg' -o -name '*.da' \
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x a5a319ec2c2236bb96d147c16196d2f1f3799301
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071622-improving-scrambler-114b@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
a5a319ec2c22 ("um: Use HOST_DIR for mrproper")
0663c68c4d2d ("kbuild: remove {CLEAN,MRPROPER,DISTCLEAN}_DIRS")
610134b750bb ("kbuild: remove misleading stale FIXME comment")
63ec90f18204 ("um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/")
8b41fc4454e3 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf")
b1fbfcb4a209 ("kbuild: make single target builds even faster")
bbc55bded4aa ("modpost: dump missing namespaces into a single modules.nsdeps file")
0241ea8cae19 ("modpost: free ns_deps_buf.p after writing ns_deps files")
bff9c62b5d20 ("modpost: do not invoke extra modpost for nsdeps")
35e046a203ee ("kbuild: remove unneeded variable, single-all")
39808e451fdf ("kbuild: do not read $(KBUILD_EXTMOD)/Module.symvers")
1747269ab016 ("modpost: do not parse vmlinux for external module builds")
fab546e6cd7a ("kbuild: update comments in scripts/Makefile.modpost")
57baec7b1b04 ("scripts/nsdeps: make sure to pass all module source files to spatch")
09684950050b ("scripts/nsdeps: use alternative sed delimiter")
e0703556644a ("Merge tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a319ec2c2236bb96d147c16196d2f1f3799301 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Tue, 6 Jun 2023 15:24:45 -0700
Subject: [PATCH] um: Use HOST_DIR for mrproper
When HEADER_ARCH was introduced, the MRPROPER_FILES (then MRPROPER_DIRS)
list wasn't adjusted, leaving SUBARCH as part of the path argument.
This resulted in the "mrproper" target not cleaning up arch/x86/... when
SUBARCH was specified. Since HOST_DIR is arch/$(HEADER_ARCH), use it
instead to get the correct path.
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Azeem Shaikh <azeemshaikh38(a)gmail.com>
Cc: linux-um(a)lists.infradead.org
Fixes: 7bbe7204e937 ("um: merge Makefile-{i386,x86_64}")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20230606222442.never.807-kees@kernel.org
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 8186d4761bda..da4d5256af2f 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -149,7 +149,7 @@ export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(CC_FLAGS_
# When cleaning we don't include .config, so we don't include
# TT or skas makefiles and don't clean skas_ptregs.h.
CLEAN_FILES += linux x.i gmon.out
-MRPROPER_FILES += arch/$(SUBARCH)/include/generated
+MRPROPER_FILES += $(HOST_DIR)/include/generated
archclean:
@find . \( -name '*.bb' -o -name '*.bbg' -o -name '*.da' \
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x a5a319ec2c2236bb96d147c16196d2f1f3799301
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071621-diary-henchman-76a2@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
a5a319ec2c22 ("um: Use HOST_DIR for mrproper")
0663c68c4d2d ("kbuild: remove {CLEAN,MRPROPER,DISTCLEAN}_DIRS")
610134b750bb ("kbuild: remove misleading stale FIXME comment")
63ec90f18204 ("um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/")
8b41fc4454e3 ("kbuild: create modules.builtin without Makefile.modbuiltin or tristate.conf")
b1fbfcb4a209 ("kbuild: make single target builds even faster")
bbc55bded4aa ("modpost: dump missing namespaces into a single modules.nsdeps file")
0241ea8cae19 ("modpost: free ns_deps_buf.p after writing ns_deps files")
bff9c62b5d20 ("modpost: do not invoke extra modpost for nsdeps")
35e046a203ee ("kbuild: remove unneeded variable, single-all")
39808e451fdf ("kbuild: do not read $(KBUILD_EXTMOD)/Module.symvers")
1747269ab016 ("modpost: do not parse vmlinux for external module builds")
fab546e6cd7a ("kbuild: update comments in scripts/Makefile.modpost")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a319ec2c2236bb96d147c16196d2f1f3799301 Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Tue, 6 Jun 2023 15:24:45 -0700
Subject: [PATCH] um: Use HOST_DIR for mrproper
When HEADER_ARCH was introduced, the MRPROPER_FILES (then MRPROPER_DIRS)
list wasn't adjusted, leaving SUBARCH as part of the path argument.
This resulted in the "mrproper" target not cleaning up arch/x86/... when
SUBARCH was specified. Since HOST_DIR is arch/$(HEADER_ARCH), use it
instead to get the correct path.
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Azeem Shaikh <azeemshaikh38(a)gmail.com>
Cc: linux-um(a)lists.infradead.org
Fixes: 7bbe7204e937 ("um: merge Makefile-{i386,x86_64}")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Link: https://lore.kernel.org/r/20230606222442.never.807-kees@kernel.org
diff --git a/arch/um/Makefile b/arch/um/Makefile
index 8186d4761bda..da4d5256af2f 100644
--- a/arch/um/Makefile
+++ b/arch/um/Makefile
@@ -149,7 +149,7 @@ export CFLAGS_vmlinux := $(LINK-y) $(LINK_WRAPS) $(LD_FLAGS_CMDLINE) $(CC_FLAGS_
# When cleaning we don't include .config, so we don't include
# TT or skas makefiles and don't clean skas_ptregs.h.
CLEAN_FILES += linux x.i gmon.out
-MRPROPER_FILES += arch/$(SUBARCH)/include/generated
+MRPROPER_FILES += $(HOST_DIR)/include/generated
archclean:
@find . \( -name '*.bb' -o -name '*.bbg' -o -name '*.da' \
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 80fca8a10b604afad6c14213fdfd816c4eda3ee4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071622-crablike-glaucoma-f28c@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
80fca8a10b60 ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent")
17e4aed8309f ("bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()")
8792099f9ad4 ("bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set")
fc2d5988b597 ("bcache: add identifier names to arguments of function definitions")
6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle")
b467a6ac0b4b ("bcache: add code comments for bset.c")
b4cb6efc1af7 ("bcache: display rate debug parameters to 0 when writeback is not running")
94f71c16062e ("bcache: fix I/O significant decline while backend devices registering")
99a27d59bd7b ("bcache: simplify the calculation of the total amount of flash dirty data")
ddcf35d39797 ("block: Add and use op_stat_group() for indexing disk_stat fields.")
3f289dcb4b26 ("block: make bdev_ops->rw_page() take a REQ_OP instead of bool")
0f0709e6bfc3 ("bcache: stop bcache device when backing device is offline")
522a777566f5 ("block: consolidate struct request timestamp fields")
4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()")
84c7afcebed9 ("block: use ktime_get_ns() instead of sched_clock() for cfq and bfq")
544ccc8dc904 ("block: get rid of struct blk_issue_stat")
a8a45941706b ("block: pass struct request instead of struct blk_issue_stat to wbt")
934031a12980 ("block: move some wbt helpers to blk-wbt.c")
782f569774d7 ("blk-wbt: throttle discards like background writes")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80fca8a10b604afad6c14213fdfd816c4eda3ee4 Mon Sep 17 00:00:00 2001
From: Zheng Wang <zyytlz.wz(a)163.com>
Date: Thu, 15 Jun 2023 20:12:22 +0800
Subject: [PATCH] bcache: Fix __bch_btree_node_alloc to make the failure
behavior consistent
In some specific situations, the return value of __bch_btree_node_alloc
may be NULL. This may lead to a potential NULL pointer dereference in
caller function like a calling chain :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initializing the return value in __bch_btree_node_alloc.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 7c21e54468bf..0ddf91204782 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1090,10 +1090,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
struct btree *parent)
{
BKEY_PADDED(key) k;
- struct btree *b = ERR_PTR(-EAGAIN);
+ struct btree *b;
mutex_lock(&c->bucket_lock);
retry:
+ /* return ERR_PTR(-EAGAIN) when it fails */
+ b = ERR_PTR(-EAGAIN);
if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
goto err;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 80fca8a10b604afad6c14213fdfd816c4eda3ee4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071621-lubricate-dragging-07c6@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
80fca8a10b60 ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent")
17e4aed8309f ("bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()")
8792099f9ad4 ("bcache: use MAX_CACHES_PER_SET instead of magic number 8 in __bch_bucket_alloc_set")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80fca8a10b604afad6c14213fdfd816c4eda3ee4 Mon Sep 17 00:00:00 2001
From: Zheng Wang <zyytlz.wz(a)163.com>
Date: Thu, 15 Jun 2023 20:12:22 +0800
Subject: [PATCH] bcache: Fix __bch_btree_node_alloc to make the failure
behavior consistent
In some specific situations, the return value of __bch_btree_node_alloc
may be NULL. This may lead to a potential NULL pointer dereference in
caller function like a calling chain :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initializing the return value in __bch_btree_node_alloc.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 7c21e54468bf..0ddf91204782 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1090,10 +1090,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
struct btree *parent)
{
BKEY_PADDED(key) k;
- struct btree *b = ERR_PTR(-EAGAIN);
+ struct btree *b;
mutex_lock(&c->bucket_lock);
retry:
+ /* return ERR_PTR(-EAGAIN) when it fails */
+ b = ERR_PTR(-EAGAIN);
if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
goto err;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 80fca8a10b604afad6c14213fdfd816c4eda3ee4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071620-dangling-strike-bfaa@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
80fca8a10b60 ("bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent")
17e4aed8309f ("bcache: remove 'int n' from parameter list of bch_bucket_alloc_set()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80fca8a10b604afad6c14213fdfd816c4eda3ee4 Mon Sep 17 00:00:00 2001
From: Zheng Wang <zyytlz.wz(a)163.com>
Date: Thu, 15 Jun 2023 20:12:22 +0800
Subject: [PATCH] bcache: Fix __bch_btree_node_alloc to make the failure
behavior consistent
In some specific situations, the return value of __bch_btree_node_alloc
may be NULL. This may lead to a potential NULL pointer dereference in
caller function like a calling chain :
btree_split->bch_btree_node_alloc->__bch_btree_node_alloc.
Fix it by initializing the return value in __bch_btree_node_alloc.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-6-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 7c21e54468bf..0ddf91204782 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1090,10 +1090,12 @@ struct btree *__bch_btree_node_alloc(struct cache_set *c, struct btree_op *op,
struct btree *parent)
{
BKEY_PADDED(key) k;
- struct btree *b = ERR_PTR(-EAGAIN);
+ struct btree *b;
mutex_lock(&c->bucket_lock);
retry:
+ /* return ERR_PTR(-EAGAIN) when it fails */
+ b = ERR_PTR(-EAGAIN);
if (__bch_bucket_alloc_set(c, RESERVE_BTREE, &k.key, wait))
goto err;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 028ddcac477b691dd9205c92f991cc15259d033e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071604-panning-specimen-6e1a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
028ddcac477b ("bcache: Remove unnecessary NULL point check in node allocations")
1fae7cf05293 ("bcache: style fix to add a blank line after declarations")
6f10f7d1b02b ("bcache: style fix to replace 'unsigned' by 'unsigned int'")
ea8c5356d390 ("bcache: set max writeback rate when I/O request is idle")
b467a6ac0b4b ("bcache: add code comments for bset.c")
b4cb6efc1af7 ("bcache: display rate debug parameters to 0 when writeback is not running")
94f71c16062e ("bcache: fix I/O significant decline while backend devices registering")
99a27d59bd7b ("bcache: simplify the calculation of the total amount of flash dirty data")
ddcf35d39797 ("block: Add and use op_stat_group() for indexing disk_stat fields.")
3f289dcb4b26 ("block: make bdev_ops->rw_page() take a REQ_OP instead of bool")
d19936a26658 ("bcache: convert to bioset_init()/mempool_init()")
0f0709e6bfc3 ("bcache: stop bcache device when backing device is offline")
522a777566f5 ("block: consolidate struct request timestamp fields")
4bc6339a583c ("block: move blk_stat_add() to __blk_mq_end_request()")
84c7afcebed9 ("block: use ktime_get_ns() instead of sched_clock() for cfq and bfq")
544ccc8dc904 ("block: get rid of struct blk_issue_stat")
a8a45941706b ("block: pass struct request instead of struct blk_issue_stat to wbt")
934031a12980 ("block: move some wbt helpers to blk-wbt.c")
782f569774d7 ("blk-wbt: throttle discards like background writes")
8bea60901974 ("blk-wbt: pass in enum wbt_flags to get_rq_wait()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 028ddcac477b691dd9205c92f991cc15259d033e Mon Sep 17 00:00:00 2001
From: Zheng Wang <zyytlz.wz(a)163.com>
Date: Thu, 15 Jun 2023 20:12:21 +0800
Subject: [PATCH] bcache: Remove unnecessary NULL point check in node
allocations
Due to the previous fix of __bch_btree_node_alloc, the return value will
never be a NULL pointer. So IS_ERR is enough to handle the failure
situation. Fix it by replacing IS_ERR_OR_NULL check by an IS_ERR check.
Fixes: cafe56359144 ("bcache: A block layer cache")
Cc: stable(a)vger.kernel.org
Signed-off-by: Zheng Wang <zyytlz.wz(a)163.com>
Signed-off-by: Coly Li <colyli(a)suse.de>
Link: https://lore.kernel.org/r/20230615121223.22502-5-colyli@suse.de
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index 147c493a989a..7c21e54468bf 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -1138,7 +1138,7 @@ static struct btree *btree_node_alloc_replacement(struct btree *b,
{
struct btree *n = bch_btree_node_alloc(b->c, op, b->level, b->parent);
- if (!IS_ERR_OR_NULL(n)) {
+ if (!IS_ERR(n)) {
mutex_lock(&n->write_lock);
bch_btree_sort_into(&b->keys, &n->keys, &b->c->sort);
bkey_copy_key(&n->key, &b->key);
@@ -1340,7 +1340,7 @@ static int btree_gc_coalesce(struct btree *b, struct btree_op *op,
memset(new_nodes, 0, sizeof(new_nodes));
closure_init_stack(&cl);
- while (nodes < GC_MERGE_NODES && !IS_ERR_OR_NULL(r[nodes].b))
+ while (nodes < GC_MERGE_NODES && !IS_ERR(r[nodes].b))
keys += r[nodes++].keys;
blocks = btree_default_blocks(b->c) * 2 / 3;
@@ -1352,7 +1352,7 @@ static int btree_gc_coalesce(struct btree *b, struct btree_op *op,
for (i = 0; i < nodes; i++) {
new_nodes[i] = btree_node_alloc_replacement(r[i].b, NULL);
- if (IS_ERR_OR_NULL(new_nodes[i]))
+ if (IS_ERR(new_nodes[i]))
goto out_nocoalesce;
}
@@ -1487,7 +1487,7 @@ out_nocoalesce:
bch_keylist_free(&keylist);
for (i = 0; i < nodes; i++)
- if (!IS_ERR_OR_NULL(new_nodes[i])) {
+ if (!IS_ERR(new_nodes[i])) {
btree_node_free(new_nodes[i]);
rw_unlock(true, new_nodes[i]);
}
@@ -1669,7 +1669,7 @@ static int bch_btree_gc_root(struct btree *b, struct btree_op *op,
if (should_rewrite) {
n = btree_node_alloc_replacement(b, NULL);
- if (!IS_ERR_OR_NULL(n)) {
+ if (!IS_ERR(n)) {
bch_btree_node_write_sync(n);
bch_btree_set_root(n);
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 1f829e74db0a..e2a803683105 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1723,7 +1723,7 @@ static void cache_set_flush(struct closure *cl)
if (!IS_ERR_OR_NULL(c->gc_thread))
kthread_stop(c->gc_thread);
- if (!IS_ERR_OR_NULL(c->root))
+ if (!IS_ERR(c->root))
list_add(&c->root->list, &c->btree_cache);
/*
@@ -2087,7 +2087,7 @@ static int run_cache_set(struct cache_set *c)
err = "cannot allocate new btree root";
c->root = __bch_btree_node_alloc(c, NULL, 0, true, NULL);
- if (IS_ERR_OR_NULL(c->root))
+ if (IS_ERR(c->root))
goto err;
mutex_lock(&c->root->write_lock);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 525c469e5de9bf7e53574396196e80fc716ac9eb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071604-urgency-jiffy-cbcf@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
525c469e5de9 ("wifi: mt76: mt7921e: fix init command fail with enabled device")
28fec923d240 ("mt76: connac: move mt76_connac2_load_patch in connac module")
b9ec27102ac0 ("mt76: connac: move mt76_connac2_load_ram in connac module")
c132fc7d83bb ("mt76: mt7921: move fw toggle in mt7921_load_firmware")
3d8c636c3e9e ("mt76: connac: move shared fw structures in connac module")
a55a0c701c12 ("mt76: mt7921s: fix firmware download random fail")
99ad32a4ca3a ("mt76: mt7915: add support for MT7986")
ade25ca7950b ("mt76: mt7915: fix mcs_map in mt7915_mcu_set_sta_he_mcs()")
11005b18f453 ("mt76: mt7921s: fix a possible memory leak in mt7921_load_patch")
4a74ecc8f0f6 ("mt76: connac: move mt76_connac_lmac_mapping in mt76-connac module")
602cc0c9618a ("mt76: mt7921e: fix possible probe failure after reboot")
97cef84d1043 ("mt76: connac: move mt76_connac_mcu_rdd_cmd in mt76-connac module")
9e90c3511041 ("mt76: connac: move mt76_connac_mcu_gen_dl_mode in mt76-connac module")
a6ef46fcccf2 ("mt76: mt7915: rely on mt76_connac_mcu_init_download")
ad1a2333350f ("mt76: mt7915: rely on mt76_connac_mcu_patch_sem_ctrl/mt76_connac_mcu_start_patch")
ae90bdd6ad54 ("mt76: connac: move mt76_connac_mcu_restart in common module")
3dc531b92b69 ("mt76: mt7915: rely on mt76_connac_mcu_start_firmware")
48d743d185a5 ("mt76: connac: move mt76_connac_mcu_set_pm in connac module")
2fec2ea644c5 ("mt76: connac: introduce is_connac_v1 utility routine")
187169de13d1 ("mt76: mt7915: rely on mt76_connac_mcu_wtbl_ht_tlv")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 525c469e5de9bf7e53574396196e80fc716ac9eb Mon Sep 17 00:00:00 2001
From: Quan Zhou <quan.zhou(a)mediatek.com>
Date: Wed, 5 Jul 2023 23:26:38 +0800
Subject: [PATCH] wifi: mt76: mt7921e: fix init command fail with enabled
device
For some cases as below, we may encounter the unpreditable chip stats
in driver probe()
* The system reboot flow do not work properly, such as kernel oops while
rebooting, and then the driver do not go back to default status at
this moment.
* Similar to the flow above. If the device was enabled in BIOS or UEFI,
the system may switch to Linux without driver fully shutdown.
To avoid the problem, force push the device back to default in probe()
* mt7921e_mcu_fw_pmctrl() : return control privilege to chip side.
* mt7921_wfsys_reset() : cleanup chip config before resource init.
Error log
[59007.600714] mt7921e 0000:02:00.0: ASIC revision: 79220010
[59010.889773] mt7921e 0000:02:00.0: Message 00000010 (seq 1) timeout
[59010.889786] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59014.217839] mt7921e 0000:02:00.0: Message 00000010 (seq 2) timeout
[59014.217852] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59017.545880] mt7921e 0000:02:00.0: Message 00000010 (seq 3) timeout
[59017.545893] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59020.874086] mt7921e 0000:02:00.0: Message 00000010 (seq 4) timeout
[59020.874099] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59024.202019] mt7921e 0000:02:00.0: Message 00000010 (seq 5) timeout
[59024.202033] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59027.530082] mt7921e 0000:02:00.0: Message 00000010 (seq 6) timeout
[59027.530096] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59030.857888] mt7921e 0000:02:00.0: Message 00000010 (seq 7) timeout
[59030.857904] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59034.185946] mt7921e 0000:02:00.0: Message 00000010 (seq 8) timeout
[59034.185961] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59037.514249] mt7921e 0000:02:00.0: Message 00000010 (seq 9) timeout
[59037.514262] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.842362] mt7921e 0000:02:00.0: Message 00000010 (seq 10) timeout
[59040.842375] mt7921e 0000:02:00.0: Failed to get patch semaphore
[59040.923845] mt7921e 0000:02:00.0: hardware init failed
Cc: stable(a)vger.kernel.org
Fixes: 5c14a5f944b9 ("mt76: mt7921: introduce mt7921e support")
Tested-by: Kai-Heng Feng <kai.heng.feng(a)canonical.com>
Tested-by: Juan Martinez <juan.martinez(a)amd.com>
Co-developed-by: Leon Yen <leon.yen(a)mediatek.com>
Signed-off-by: Leon Yen <leon.yen(a)mediatek.com>
Signed-off-by: Quan Zhou <quan.zhou(a)mediatek.com>
Signed-off-by: Deren Wu <deren.wu(a)mediatek.com>
Message-ID: <39fcb7cee08d4ab940d38d82f21897483212483f.1688569385.git.deren.wu(a)mediatek.com>
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/dma.c b/drivers/net/wireless/mediatek/mt76/mt7921/dma.c
index f0a80c2b476a..4153cd6c2a01 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/dma.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/dma.c
@@ -231,10 +231,6 @@ int mt7921_dma_init(struct mt7921_dev *dev)
if (ret)
return ret;
- ret = mt7921_wfsys_reset(dev);
- if (ret)
- return ret;
-
/* init tx queue */
ret = mt76_connac_init_tx_queues(dev->phy.mt76, MT7921_TXQ_BAND0,
MT7921_TX_RING_SIZE,
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
index c69ce6df4956..f55caa00ac69 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/mcu.c
@@ -476,12 +476,6 @@ static int mt7921_load_firmware(struct mt7921_dev *dev)
{
int ret;
- ret = mt76_get_field(dev, MT_CONN_ON_MISC, MT_TOP_MISC2_FW_N9_RDY);
- if (ret && mt76_is_mmio(&dev->mt76)) {
- dev_dbg(dev->mt76.dev, "Firmware is already download\n");
- goto fw_loaded;
- }
-
ret = mt76_connac2_load_patch(&dev->mt76, mt7921_patch_name(dev));
if (ret)
return ret;
@@ -504,8 +498,6 @@ static int mt7921_load_firmware(struct mt7921_dev *dev)
return -EIO;
}
-fw_loaded:
-
#ifdef CONFIG_PM
dev->mt76.hw->wiphy->wowlan = &mt76_connac_wowlan_support;
#endif /* CONFIG_PM */
diff --git a/drivers/net/wireless/mediatek/mt76/mt7921/pci.c b/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
index ddb1fa4ee01d..95610a117d2f 100644
--- a/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
+++ b/drivers/net/wireless/mediatek/mt76/mt7921/pci.c
@@ -325,6 +325,10 @@ static int mt7921_pci_probe(struct pci_dev *pdev,
bus_ops->rmw = mt7921_rmw;
dev->mt76.bus = bus_ops;
+ ret = mt7921e_mcu_fw_pmctrl(dev);
+ if (ret)
+ goto err_free_dev;
+
ret = __mt7921e_mcu_drv_pmctrl(dev);
if (ret)
goto err_free_dev;
@@ -333,6 +337,10 @@ static int mt7921_pci_probe(struct pci_dev *pdev,
(mt7921_l1_rr(dev, MT_HW_REV) & 0xff);
dev_info(mdev->dev, "ASIC revision: %04x\n", mdev->rev);
+ ret = mt7921_wfsys_reset(dev);
+ if (ret)
+ goto err_free_dev;
+
mt76_wr(dev, MT_WFDMA0_HOST_INT_ENA, 0);
mt76_wr(dev, MT_PCIE_MAC_INT_ENABLE, 0xff);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x cd9489623c29aa2f8cc07088168afb6e0d5ef06d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071643-deputize-alias-9624@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
cd9489623c29 ("i2c: qup: Add missing unwind goto in qup_i2c_probe()")
e42688ed5cf5 ("i2c: busses: remove duplicate dev_err()")
e0442d762139 ("i2c: busses: convert to devm_platform_ioremap_resource")
90224e6468e1 ("i2c: drivers: Use generic definitions for bus frequencies")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9489623c29aa2f8cc07088168afb6e0d5ef06d Mon Sep 17 00:00:00 2001
From: Shuai Jiang <d202180596(a)hust.edu.cn>
Date: Tue, 18 Apr 2023 21:56:12 +0800
Subject: [PATCH] i2c: qup: Add missing unwind goto in qup_i2c_probe()
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596(a)hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91(a)hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v4.6+
diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 2e153f2f71b6..78682388e02e 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -1752,16 +1752,21 @@ static int qup_i2c_probe(struct platform_device *pdev)
if (!clk_freq || clk_freq > I2C_MAX_FAST_MODE_PLUS_FREQ) {
dev_err(qup->dev, "clock frequency not supported %d\n",
clk_freq);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail_dma;
}
qup->base = devm_platform_ioremap_resource(pdev, 0);
- if (IS_ERR(qup->base))
- return PTR_ERR(qup->base);
+ if (IS_ERR(qup->base)) {
+ ret = PTR_ERR(qup->base);
+ goto fail_dma;
+ }
qup->irq = platform_get_irq(pdev, 0);
- if (qup->irq < 0)
- return qup->irq;
+ if (qup->irq < 0) {
+ ret = qup->irq;
+ goto fail_dma;
+ }
if (has_acpi_companion(qup->dev)) {
ret = device_property_read_u32(qup->dev,
@@ -1775,13 +1780,15 @@ static int qup_i2c_probe(struct platform_device *pdev)
qup->clk = devm_clk_get(qup->dev, "core");
if (IS_ERR(qup->clk)) {
dev_err(qup->dev, "Could not get core clock\n");
- return PTR_ERR(qup->clk);
+ ret = PTR_ERR(qup->clk);
+ goto fail_dma;
}
qup->pclk = devm_clk_get(qup->dev, "iface");
if (IS_ERR(qup->pclk)) {
dev_err(qup->dev, "Could not get iface clock\n");
- return PTR_ERR(qup->pclk);
+ ret = PTR_ERR(qup->pclk);
+ goto fail_dma;
}
qup_i2c_enable_clocks(qup);
src_clk_freq = clk_get_rate(qup->clk);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x cd9489623c29aa2f8cc07088168afb6e0d5ef06d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071625-mulch-item-f645@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
cd9489623c29 ("i2c: qup: Add missing unwind goto in qup_i2c_probe()")
e42688ed5cf5 ("i2c: busses: remove duplicate dev_err()")
e0442d762139 ("i2c: busses: convert to devm_platform_ioremap_resource")
90224e6468e1 ("i2c: drivers: Use generic definitions for bus frequencies")
351c8a09b00b ("Merge branch 'i2c/for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9489623c29aa2f8cc07088168afb6e0d5ef06d Mon Sep 17 00:00:00 2001
From: Shuai Jiang <d202180596(a)hust.edu.cn>
Date: Tue, 18 Apr 2023 21:56:12 +0800
Subject: [PATCH] i2c: qup: Add missing unwind goto in qup_i2c_probe()
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596(a)hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91(a)hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v4.6+
diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 2e153f2f71b6..78682388e02e 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -1752,16 +1752,21 @@ static int qup_i2c_probe(struct platform_device *pdev)
if (!clk_freq || clk_freq > I2C_MAX_FAST_MODE_PLUS_FREQ) {
dev_err(qup->dev, "clock frequency not supported %d\n",
clk_freq);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail_dma;
}
qup->base = devm_platform_ioremap_resource(pdev, 0);
- if (IS_ERR(qup->base))
- return PTR_ERR(qup->base);
+ if (IS_ERR(qup->base)) {
+ ret = PTR_ERR(qup->base);
+ goto fail_dma;
+ }
qup->irq = platform_get_irq(pdev, 0);
- if (qup->irq < 0)
- return qup->irq;
+ if (qup->irq < 0) {
+ ret = qup->irq;
+ goto fail_dma;
+ }
if (has_acpi_companion(qup->dev)) {
ret = device_property_read_u32(qup->dev,
@@ -1775,13 +1780,15 @@ static int qup_i2c_probe(struct platform_device *pdev)
qup->clk = devm_clk_get(qup->dev, "core");
if (IS_ERR(qup->clk)) {
dev_err(qup->dev, "Could not get core clock\n");
- return PTR_ERR(qup->clk);
+ ret = PTR_ERR(qup->clk);
+ goto fail_dma;
}
qup->pclk = devm_clk_get(qup->dev, "iface");
if (IS_ERR(qup->pclk)) {
dev_err(qup->dev, "Could not get iface clock\n");
- return PTR_ERR(qup->pclk);
+ ret = PTR_ERR(qup->pclk);
+ goto fail_dma;
}
qup_i2c_enable_clocks(qup);
src_clk_freq = clk_get_rate(qup->clk);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x cd9489623c29aa2f8cc07088168afb6e0d5ef06d
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071624-swimwear-finisher-6443@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
cd9489623c29 ("i2c: qup: Add missing unwind goto in qup_i2c_probe()")
e42688ed5cf5 ("i2c: busses: remove duplicate dev_err()")
e0442d762139 ("i2c: busses: convert to devm_platform_ioremap_resource")
90224e6468e1 ("i2c: drivers: Use generic definitions for bus frequencies")
351c8a09b00b ("Merge branch 'i2c/for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cd9489623c29aa2f8cc07088168afb6e0d5ef06d Mon Sep 17 00:00:00 2001
From: Shuai Jiang <d202180596(a)hust.edu.cn>
Date: Tue, 18 Apr 2023 21:56:12 +0800
Subject: [PATCH] i2c: qup: Add missing unwind goto in qup_i2c_probe()
Smatch Warns:
drivers/i2c/busses/i2c-qup.c:1784 qup_i2c_probe()
warn: missing unwind goto?
The goto label "fail_runtime" and "fail" will disable qup->pclk,
but here qup->pclk failed to obtain, in order to be consistent,
change the direct return to goto label "fail_dma".
Fixes: 9cedf3b2f099 ("i2c: qup: Add bam dma capabilities")
Signed-off-by: Shuai Jiang <d202180596(a)hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91(a)hust.edu.cn>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
Cc: <stable(a)vger.kernel.org> # v4.6+
diff --git a/drivers/i2c/busses/i2c-qup.c b/drivers/i2c/busses/i2c-qup.c
index 2e153f2f71b6..78682388e02e 100644
--- a/drivers/i2c/busses/i2c-qup.c
+++ b/drivers/i2c/busses/i2c-qup.c
@@ -1752,16 +1752,21 @@ static int qup_i2c_probe(struct platform_device *pdev)
if (!clk_freq || clk_freq > I2C_MAX_FAST_MODE_PLUS_FREQ) {
dev_err(qup->dev, "clock frequency not supported %d\n",
clk_freq);
- return -EINVAL;
+ ret = -EINVAL;
+ goto fail_dma;
}
qup->base = devm_platform_ioremap_resource(pdev, 0);
- if (IS_ERR(qup->base))
- return PTR_ERR(qup->base);
+ if (IS_ERR(qup->base)) {
+ ret = PTR_ERR(qup->base);
+ goto fail_dma;
+ }
qup->irq = platform_get_irq(pdev, 0);
- if (qup->irq < 0)
- return qup->irq;
+ if (qup->irq < 0) {
+ ret = qup->irq;
+ goto fail_dma;
+ }
if (has_acpi_companion(qup->dev)) {
ret = device_property_read_u32(qup->dev,
@@ -1775,13 +1780,15 @@ static int qup_i2c_probe(struct platform_device *pdev)
qup->clk = devm_clk_get(qup->dev, "core");
if (IS_ERR(qup->clk)) {
dev_err(qup->dev, "Could not get core clock\n");
- return PTR_ERR(qup->clk);
+ ret = PTR_ERR(qup->clk);
+ goto fail_dma;
}
qup->pclk = devm_clk_get(qup->dev, "iface");
if (IS_ERR(qup->pclk)) {
dev_err(qup->dev, "Could not get iface clock\n");
- return PTR_ERR(qup->pclk);
+ ret = PTR_ERR(qup->pclk);
+ goto fail_dma;
}
qup_i2c_enable_clocks(qup);
src_clk_freq = clk_get_rate(qup->clk);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 8a796565cec3601071cbbd27d6304e202019d014
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023071622-exhale-throwing-6c38@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
8a796565cec3 ("io_uring: Use io_schedule* in cqring wait")
d33a39e57768 ("io_uring: keep timeout in io_wait_queue")
46ae7eef44f6 ("io_uring: optimise non-timeout waiting")
846072f16eed ("io_uring: mimimise io_cqring_wait_schedule")
3fcf19d592d5 ("io_uring: parse check_cq out of wq waiting")
12521a5d5cb7 ("io_uring: fix CQ waiting timeout handling")
52ea806ad983 ("io_uring: finish waiting before flushing overflow entries")
35d90f95cfa7 ("io_uring: include task_work run after scheduling in wait for events")
1b346e4aa8e7 ("io_uring: don't check overflow flush failures")
a85381d8326d ("io_uring: skip overflow CQE posting for dying ring")
c0e0d6ba25f1 ("io_uring: add IORING_SETUP_DEFER_TASKRUN")
b4c98d59a787 ("io_uring: introduce io_has_work")
78a861b94959 ("io_uring: add sync cancelation API through io_uring_register()")
c34398a8c018 ("io_uring: remove __io_req_task_work_add")
ed5ccb3beeba ("io_uring: remove priority tw list optimisation")
4a0fef62788b ("io_uring: optimize io_uring_task layout")
253993210bd8 ("io_uring: introduce locking helpers for CQE posting")
305bef988708 ("io_uring: hide eventfd assumptions in eventfd paths")
affa87db9010 ("io_uring: fix multi ctx cancellation")
d9dee4302a7c ("io_uring: remove ->flush_cqes optimisation")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8a796565cec3601071cbbd27d6304e202019d014 Mon Sep 17 00:00:00 2001
From: Andres Freund <andres(a)anarazel.de>
Date: Fri, 7 Jul 2023 09:20:07 -0700
Subject: [PATCH] io_uring: Use io_schedule* in cqring wait
I observed poor performance of io_uring compared to synchronous IO. That
turns out to be caused by deeper CPU idle states entered with io_uring,
due to io_uring using plain schedule(), whereas synchronous IO uses
io_schedule().
The losses due to this are substantial. On my cascade lake workstation,
t/io_uring from the fio repository e.g. yields regressions between 20%
and 40% with the following command:
./t/io_uring -r 5 -X0 -d 1 -s 1 -c 1 -p 0 -S$use_sync -R 0 /mnt/t2/fio/write.0.0
This is repeatable with different filesystems, using raw block devices
and using different block devices.
Use io_schedule_prepare() / io_schedule_finish() in
io_cqring_wait_schedule() to address the difference.
After that using io_uring is on par or surpassing synchronous IO (using
registered files etc makes it reliably win, but arguably is a less fair
comparison).
There are other calls to schedule() in io_uring/, but none immediately
jump out to be similarly situated, so I did not touch them. Similarly,
it's possible that mutex_lock_io() should be used, but it's not clear if
there are cases where that matters.
Cc: stable(a)vger.kernel.org # 5.10+
Cc: Pavel Begunkov <asml.silence(a)gmail.com>
Cc: io-uring(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Signed-off-by: Andres Freund <andres(a)anarazel.de>
Link: https://lore.kernel.org/r/20230707162007.194068-1-andres@anarazel.de
[axboe: minor style fixup]
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/io_uring/io_uring.c b/io_uring/io_uring.c
index e8096d502a7c..7505de2428e0 100644
--- a/io_uring/io_uring.c
+++ b/io_uring/io_uring.c
@@ -2489,6 +2489,8 @@ int io_run_task_work_sig(struct io_ring_ctx *ctx)
static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
struct io_wait_queue *iowq)
{
+ int token, ret;
+
if (unlikely(READ_ONCE(ctx->check_cq)))
return 1;
if (unlikely(!llist_empty(&ctx->work_llist)))
@@ -2499,11 +2501,20 @@ static inline int io_cqring_wait_schedule(struct io_ring_ctx *ctx,
return -EINTR;
if (unlikely(io_should_wake(iowq)))
return 0;
+
+ /*
+ * Use io_schedule_prepare/finish, so cpufreq can take into account
+ * that the task is waiting for IO - turns out to be important for low
+ * QD IO.
+ */
+ token = io_schedule_prepare();
+ ret = 0;
if (iowq->timeout == KTIME_MAX)
schedule();
else if (!schedule_hrtimeout(&iowq->timeout, HRTIMER_MODE_ABS))
- return -ETIME;
- return 0;
+ ret = -ETIME;
+ io_schedule_finish(token);
+ return ret;
}
/*
Greetings to You,
I have a good news for you.Please contact me for more details. Sorry
if
you received this letter in your spam, Due to recent connection error
here in my country.a
Looking forward for your immediate response to me through my private
e-
mail id: (jackthomsom7(a)gmail.com)
Best Regards,
Regards,
Dr. Thomsom Jack.
My Telephone number
Dear Sir :
Nice day!
This is lucy from metal & Harnesses factory ,we make metal & Harnesses,belt and buckles,BELT & STRAP,LEATHER Harnesses,tags ,belts,straps ,clips ,rings ,buckles, bars ,d rings stainless steel,d rings steel,d rings zinc ,o rings steel,o rings zinc ,steel d ring,steel d buckle ,hooks ,clasps ,clips , an array of fastening like rivets , grommets,rivets ,eyelets ,tools , metal gear , metal hardware , metal products, metal Components,S hooks ,d rings ,o rings ,snaps ,webbings , buckles, HOOK,brass buckles, clips , hooks ,straps , accessories ,brass snaps ,brass gear , stainless steel hardware,steel harness ,steel wires, wire rings , wire buckles ,brass gears ,brass products ,iron metal hardware ,Fasteners,safety buckles,brass buckles, brass rings ,belts,buckles ,Split Ring ,straps ,sliders,snaps ,Bars Rings,textile accessories , magnent buttons,carabiners,zipper ,zipper pull, buckles,snap fasteners ,trigger ,buttons ,snaps ,Bars Rings, Fasteners as required for our golbal clients ,
We are manufactory, we are the source, our price is very competitive ,you will get the best price , We have profuse designs with series quality grade, and expressly.
Our factory always produce customer designs and drawing , if you have any products looking please let me know
we could surely make for you
Sincerely hope could work with you !
Best regards
lucy
Hi,
This patch-series trying to avoid issues when plock ops with
DLM_PLOCK_FL_CLOSE flag is set sends a reply back which should never be
the case. This problem getting more serious when introducing a new plock
op and an answer was not expected as
I changed in v2 to check on DLM_PLOCK_FL_CLOSE flag for stable as this
can also being used to fix the potential issue for older kernels and it
does not change the UAPI. For newer user space applications the new flag
DLM_PLOCK_FL_NO_REPLY will tell the user space application to never send
an result back, it will handle this filter earlier in user space. For
older user space software we will filter the result in ther kernel.
This requires the behaviour that the flags are the same for the request
and the reply which is the case for dlm_controld.
Also fix the wrapped string and don't spam the user ignoring replies.
- Alex
Alexander Aring (3):
fs: dlm: ignore DLM_PLOCK_FL_CLOSE flag results
fs: dlm: introduce DLM_PLOCK_FL_NO_REPLY flag
fs: dlm: allow to F_SETLKW getting interrupted
fs/dlm/plock.c | 107 ++++++++++++++++++++++++---------
include/uapi/linux/dlm_plock.h | 2 +
2 files changed, 81 insertions(+), 28 deletions(-)
--
2.31.1
This patch introduces a new flag DLM_PLOCK_FL_NO_REPLY in case an dlm
plock operation should not send a reply back. Currently this is kind of
being handled in DLM_PLOCK_FL_CLOSE, but DLM_PLOCK_FL_CLOSE has more
meanings that it will remove all waiters for a specific nodeid/owner
values in by doing a unlock operation. In case of an error in dlm user
space software e.g. dlm_controld we get an reply with an error back.
This cannot be matched because there is no op to match in recv_list. We
filter now on DLM_PLOCK_FL_NO_REPLY in case we had an error back as
reply. In newer dlm_controld version it will never send a result back
when DLM_PLOCK_FL_NO_REPLY is set. This filter is a workaround to handle
older dlm_controld versions.
Fixes: 901025d2f319 ("dlm: make plock operation killable")
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/plock.c | 23 +++++++++++++++++++----
include/uapi/linux/dlm_plock.h | 1 +
2 files changed, 20 insertions(+), 4 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 70a4752ed913..7fe9f4b922d3 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -96,7 +96,7 @@ static void do_unlock_close(const struct dlm_plock_info *info)
op->info.end = OFFSET_MAX;
op->info.owner = info->owner;
- op->info.flags |= DLM_PLOCK_FL_CLOSE;
+ op->info.flags |= (DLM_PLOCK_FL_CLOSE | DLM_PLOCK_FL_NO_REPLY);
send_op(op);
}
@@ -293,7 +293,7 @@ int dlm_posix_unlock(dlm_lockspace_t *lockspace, u64 number, struct file *file,
op->info.owner = (__u64)(long) fl->fl_owner;
if (fl->fl_flags & FL_CLOSE) {
- op->info.flags |= DLM_PLOCK_FL_CLOSE;
+ op->info.flags |= (DLM_PLOCK_FL_CLOSE | DLM_PLOCK_FL_NO_REPLY);
send_op(op);
rv = 0;
goto out;
@@ -392,7 +392,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
spin_lock(&ops_lock);
if (!list_empty(&send_list)) {
op = list_first_entry(&send_list, struct plock_op, list);
- if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+ if (op->info.flags & DLM_PLOCK_FL_NO_REPLY)
list_del(&op->list);
else
list_move_tail(&op->list, &recv_list);
@@ -407,7 +407,7 @@ static ssize_t dev_read(struct file *file, char __user *u, size_t count,
that were generated by the vfs cleaning up for a close
(the process did not make an unlock call). */
- if (op->info.flags & DLM_PLOCK_FL_CLOSE)
+ if (op->info.flags & DLM_PLOCK_FL_NO_REPLY)
dlm_release_plock_op(op);
if (copy_to_user(u, &info, sizeof(info)))
@@ -433,6 +433,21 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
if (check_version(&info))
return -EINVAL;
+ /* Some old dlm user space software will send replies back,
+ * even if DLM_PLOCK_FL_NO_REPLY is set (because the flag is
+ * new) e.g. if a error occur. We can't match them in recv_list
+ * because they were never be part of it. We filter it here,
+ * new dlm user space software will filter it in user space.
+ *
+ * In future this handling can be removed.
+ */
+ if (info.flags & DLM_PLOCK_FL_NO_REPLY) {
+ pr_info("Received unexpected reply from op %d, "
+ "please update DLM user space software!\n",
+ info.optype);
+ return count;
+ }
+
/*
* The results for waiting ops (SETLKW) can be returned in any
* order, so match all fields to find the op. The results for
diff --git a/include/uapi/linux/dlm_plock.h b/include/uapi/linux/dlm_plock.h
index 63b6c1fd9169..8dfa272c929a 100644
--- a/include/uapi/linux/dlm_plock.h
+++ b/include/uapi/linux/dlm_plock.h
@@ -25,6 +25,7 @@ enum {
};
#define DLM_PLOCK_FL_CLOSE 1
+#define DLM_PLOCK_FL_NO_REPLY 2
struct dlm_plock_info {
__u32 version[3];
--
2.31.1
If the code fails to add histogram to hist_vars list, then ret should
contain error code before jumping to unregister histogram.
Cc: stable(a)vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
---
kernel/trace/trace_events_hist.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c8c61381eba4..d06938ae0717 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -6668,7 +6668,8 @@ static int event_hist_trigger_parse(struct event_command *cmd_ops,
goto out_unreg;
if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
- if (save_hist_vars(hist_data))
+ ret = save_hist_vars(hist_data);
+ if (ret)
goto out_unreg;
}
--
2.34.1
Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add histogram to hist_vars list. But the
commit failed to set ret to failed return code before jumping to
unregister histogram, fix it.
Cc: stable(a)vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
---
kernel/trace/trace_events_hist.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index c8c61381eba4..d06938ae0717 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -6668,7 +6668,8 @@ static int event_hist_trigger_parse(struct event_command *cmd_ops,
goto out_unreg;
if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
- if (save_hist_vars(hist_data))
+ ret = save_hist_vars(hist_data);
+ if (ret)
goto out_unreg;
}
--
2.34.1
The patch titled
Subject: lib/test_meminit: allocate pages up to order MAX_ORDER
has been added to the -mm mm-unstable branch. Its filename is
lib-test_meminit-allocate-pages-up-to-order-max_order.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrew Donnellan <ajd(a)linux.ibm.com>
Subject: lib/test_meminit: allocate pages up to order MAX_ORDER
Date: Fri, 14 Jul 2023 11:52:38 +1000
test_pages() tests the page allocator by calling alloc_pages() with
different orders up to order 10.
However, different architectures and platforms support different maximum
contiguous allocation sizes. The default maximum allocation order
(MAX_ORDER) is 10, but architectures can use CONFIG_ARCH_FORCE_MAX_ORDER
to override this. On platforms where this is less than 10, test_meminit()
will blow up with a WARN(). This is expected, so let's not do that.
Replace the hardcoded "10" with the MAX_ORDER macro so that we test
allocations up to the expected platform limit.
Link: https://lkml.kernel.org/r/20230714015238.47931-1-ajd@linux.ibm.com
Fixes: 5015a300a522 ("lib: introduce test_meminit module")
Signed-off-by: Andrew Donnellan <ajd(a)linux.ibm.com>
Reviewed-by: Alexander Potapenko <glider(a)google.com>
Cc: Xiaoke Wang <xkernel.wang(a)foxmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/test_meminit.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/test_meminit.c~lib-test_meminit-allocate-pages-up-to-order-max_order
+++ a/lib/test_meminit.c
@@ -93,7 +93,7 @@ static int __init test_pages(int *total_
int failures = 0, num_tests = 0;
int i;
- for (i = 0; i < 10; i++)
+ for (i = 0; i <= MAX_ORDER; i++)
num_tests += do_alloc_pages_order(i, &failures);
REPORT_FAILURES_IN_FN();
_
Patches currently in -mm which might be from ajd(a)linux.ibm.com are
lib-test_meminit-allocate-pages-up-to-order-max_order.patch
D e a r Sir,
I am M r. Taras Volodymyr from U k r a i n e but lived in Russia for many years, I am a successful business?m a n here in Russia as I have been involved oil serving business.
Based on what is going here I found you very capable of handling this h u g e business magnitude, there is a genuine need for an investment of a substantial amount in your country. If you are willing to p a r t n e r with me I will advise you to get back to me for proceedings-details on the way forward.
I wish for a prompt response from you regarding my letter.
Warm regards,
Mr. Taras Volodymyr
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable(a)vger.kernel.org
Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Suggested-by: Zenghui Yu <yuzenghui(a)huawei.com>
Tested-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Co-developed-by: Oliver Upton <oliver.upton(a)linux.dev>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
---
arch/arm64/include/asm/kvm_host.h | 2 ++
arch/arm64/kvm/arm.c | 6 ++++--
arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
arch/arm64/kvm/vgic/vgic-v4.c | 7 +++++--
include/kvm/arm_vgic.h | 2 +-
5 files changed, 13 insertions(+), 6 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 1e768481f62f..914fc9c26e40 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -817,6 +817,8 @@ struct kvm_vcpu_arch {
#define DBG_SS_ACTIVE_PENDING __vcpu_single_flag(sflags, BIT(5))
/* PMUSERENR for the guest EL0 is on physical CPU */
#define PMUSERENR_ON_CPU __vcpu_single_flag(sflags, BIT(6))
+/* WFI instruction trapped */
+#define IN_WFI __vcpu_single_flag(sflags, BIT(7))
/* vcpu entered with HCR_EL2.E2H set */
#define VCPU_HCR_E2H __vcpu_single_flag(oflags, BIT(0))
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 236c5f1c9090..cf208d30a9ea 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -725,13 +725,15 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
*/
preempt_disable();
kvm_vgic_vmcr_sync(vcpu);
- vgic_v4_put(vcpu, true);
+ vcpu_set_flag(vcpu, IN_WFI);
+ vgic_v4_put(vcpu);
preempt_enable();
kvm_vcpu_halt(vcpu);
vcpu_clear_flag(vcpu, IN_WFIT);
preempt_disable();
+ vcpu_clear_flag(vcpu, IN_WFI);
vgic_v4_load(vcpu);
preempt_enable();
}
@@ -799,7 +801,7 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_RELOAD_GICv4, vcpu)) {
/* The distributor enable bits were changed */
preempt_disable();
- vgic_v4_put(vcpu, false);
+ vgic_v4_put(vcpu);
vgic_v4_load(vcpu);
preempt_enable();
}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index 49d35618d576..df61ead7c757 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -780,7 +780,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
* done a vgic_v4_put) and when running a nested guest (the
* vPE was never resident in order to generate a doorbell).
*/
- WARN_ON(vgic_v4_put(vcpu, false));
+ WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index c1c28fe680ba..339a55194b2c 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -336,14 +336,14 @@ void vgic_v4_teardown(struct kvm *kvm)
its_vm->vpes = NULL;
}
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db)
+int vgic_v4_put(struct kvm_vcpu *vcpu)
{
struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
if (!vgic_supports_direct_msis(vcpu->kvm) || !vpe->resident)
return 0;
- return its_make_vpe_non_resident(vpe, need_db);
+ return its_make_vpe_non_resident(vpe, !!vcpu_get_flag(vcpu, IN_WFI));
}
int vgic_v4_load(struct kvm_vcpu *vcpu)
@@ -354,6 +354,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
return 0;
+ if (vcpu_get_flag(vcpu, IN_WFI))
+ return 0;
+
/*
* Before making the VPE resident, make sure the redistributor
* corresponding to our current CPU expects us here. See the
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 9b91a8135dac..765d801d1ddc 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -446,7 +446,7 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
+int vgic_v4_put(struct kvm_vcpu *vcpu);
bool vgic_state_is_nested(struct kvm_vcpu *vcpu);
--
2.34.1
This reverts commit b138e23d3dff90c0494925b4c1874227b81bddf7.
AutoRetry has been found to cause some issues. This feature allows
the controller in host mode (further referred to as the xHC) to send
non-terminating/burst retry ACKs (Retry=1 and Nump!=0) instead of
terminating retry ACKs (Retry=1 and Nump=0) to devices when
a transaction error occurs.
Unfortunately, some USB devices fail to retry transactions when
the xHC sends them a burst retry ACK. When this happens, the xHC
enters a strange state. After the affected transfer times out,
the xHCI driver tries to resume normal operation of the xHC
by sending it a Stop Endpoint command. However, the xHC fails
to respond to it, and the xHCI driver gives up. [1]
This fact is reported via dmesg:
[sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00
xhci-hcd: xHCI host not responding to stop endpoint command
xhci-hcd: xHCI host controller not responding, assume dead
xhci-hcd: HC died; cleaning up
Some users observed this problem on an Odroid HC2 with the JMS578
USB3-to-SATA bridge. The issue can be triggered by starting
a read-heavy workload on an attached SSD. After a while, the host
controller would die and the SSD would disappear from the system. [1]
Further analysis by Synopsys determined that controller revisions
other than the one in Odroid HC2 are also affected by this.
The recommended solution was to disable AutoRetry altogether.
This change does not have a noticeable performance impact. [2]
Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller")
Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gm… [1]
Link: https://lore.kernel.org/r/20230711214834.kyr6ulync32d4ktk@synopsys.com/ [2]
Cc: stable(a)vger.kernel.org
Cc: Mauro Ribeiro <mauro.ribeiro(a)hardkernel.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Jakub Vanek <linuxtardis(a)gmail.com>
---
V1 -> V2: Updated to disable AutoRetry everywhere based on Synopsys feedback
Reworded the changelog a bit to make it clearer
drivers/usb/dwc3/core.c | 16 ----------------
drivers/usb/dwc3/core.h | 3 ---
2 files changed, 19 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index f6689b731718..a4e079d37566 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1209,22 +1209,6 @@ static int dwc3_core_init(struct dwc3 *dwc)
dwc3_writel(dwc->regs, DWC3_GUCTL1, reg);
}
- if (dwc->dr_mode == USB_DR_MODE_HOST ||
- dwc->dr_mode == USB_DR_MODE_OTG) {
- reg = dwc3_readl(dwc->regs, DWC3_GUCTL);
-
- /*
- * Enable Auto retry Feature to make the controller operating in
- * Host mode on seeing transaction errors(CRC errors or internal
- * overrun scenerios) on IN transfers to reply to the device
- * with a non-terminating retry ACK (i.e, an ACK transcation
- * packet with Retry=1 & Nump != 0)
- */
- reg |= DWC3_GUCTL_HSTINAUTORETRY;
-
- dwc3_writel(dwc->regs, DWC3_GUCTL, reg);
- }
-
/*
* Must config both number of packets and max burst settings to enable
* RX and/or TX threshold.
diff --git a/drivers/usb/dwc3/core.h b/drivers/usb/dwc3/core.h
index 8b1295e4dcdd..a69ac67d89fe 100644
--- a/drivers/usb/dwc3/core.h
+++ b/drivers/usb/dwc3/core.h
@@ -256,9 +256,6 @@
#define DWC3_GCTL_GBLHIBERNATIONEN BIT(1)
#define DWC3_GCTL_DSBLCLKGTNG BIT(0)
-/* Global User Control Register */
-#define DWC3_GUCTL_HSTINAUTORETRY BIT(14)
-
/* Global User Control 1 Register */
#define DWC3_GUCTL1_DEV_DECOUPLE_L1L2_EVT BIT(31)
#define DWC3_GUCTL1_TX_IPGAP_LINECHECK_DIS BIT(28)
--
2.25.1
Commit 662d20b3a5af ("hwmon: (aquacomputer_d5next) Add support for
temperature sensor offsets") changed aqc_get_ctrl_val() to return
the value through a parameter instead of through the return value,
but didn't fix up a case that relied on the old behavior. Fix it
to use the proper received value and not the return code.
Fixes: 662d20b3a5af ("hwmon: (aquacomputer_d5next) Add support for temperature sensor offsets")
Cc: stable(a)vger.kernel.org
Signed-off-by: Aleksa Savic <savicaleksa83(a)gmail.com>
---
drivers/hwmon/aquacomputer_d5next.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/hwmon/aquacomputer_d5next.c b/drivers/hwmon/aquacomputer_d5next.c
index a981f7086114..a997dbcb563f 100644
--- a/drivers/hwmon/aquacomputer_d5next.c
+++ b/drivers/hwmon/aquacomputer_d5next.c
@@ -1027,7 +1027,7 @@ static int aqc_read(struct device *dev, enum hwmon_sensor_types type, u32 attr,
if (ret < 0)
return ret;
- *val = aqc_percent_to_pwm(ret);
+ *val = aqc_percent_to_pwm(*val);
break;
}
break;
--
2.35.0
The decision whether to enable a wake irq during suspend can not be done
based on the runtime PM state directly as a driver may use wake irqs
without implementing runtime PM. Such drivers specifically leave the
state set to the default 'suspended' and the wake irq is thus never
enabled at suspend.
Add a new wake irq flag to track whether a dedicated wake irq has been
enabled at runtime suspend and therefore must not be enabled at system
suspend.
Note that pm_runtime_enabled() can not be used as runtime PM is always
disabled during late suspend.
Fixes: 69728051f5bf ("PM / wakeirq: Fix unbalanced IRQ enable for wakeirq")
Cc: stable(a)vger.kernel.org # 4.16
Cc: Tony Lindgren <tony(a)atomide.com>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/base/power/power.h | 1 +
drivers/base/power/wakeirq.c | 12 ++++++++----
2 files changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/base/power/power.h b/drivers/base/power/power.h
index 0eb7f02b3ad5..922ed457db19 100644
--- a/drivers/base/power/power.h
+++ b/drivers/base/power/power.h
@@ -29,6 +29,7 @@ extern u64 pm_runtime_active_time(struct device *dev);
#define WAKE_IRQ_DEDICATED_MASK (WAKE_IRQ_DEDICATED_ALLOCATED | \
WAKE_IRQ_DEDICATED_MANAGED | \
WAKE_IRQ_DEDICATED_REVERSE)
+#define WAKE_IRQ_DEDICATED_ENABLED BIT(3)
struct wake_irq {
struct device *dev;
diff --git a/drivers/base/power/wakeirq.c b/drivers/base/power/wakeirq.c
index d487a6bac630..afd094dec5ca 100644
--- a/drivers/base/power/wakeirq.c
+++ b/drivers/base/power/wakeirq.c
@@ -314,8 +314,10 @@ void dev_pm_enable_wake_irq_check(struct device *dev,
return;
enable:
- if (!can_change_status || !(wirq->status & WAKE_IRQ_DEDICATED_REVERSE))
+ if (!can_change_status || !(wirq->status & WAKE_IRQ_DEDICATED_REVERSE)) {
enable_irq(wirq->irq);
+ wirq->status |= WAKE_IRQ_DEDICATED_ENABLED;
+ }
}
/**
@@ -336,8 +338,10 @@ void dev_pm_disable_wake_irq_check(struct device *dev, bool cond_disable)
if (cond_disable && (wirq->status & WAKE_IRQ_DEDICATED_REVERSE))
return;
- if (wirq->status & WAKE_IRQ_DEDICATED_MANAGED)
+ if (wirq->status & WAKE_IRQ_DEDICATED_MANAGED) {
+ wirq->status &= ~WAKE_IRQ_DEDICATED_ENABLED;
disable_irq_nosync(wirq->irq);
+ }
}
/**
@@ -376,7 +380,7 @@ void dev_pm_arm_wake_irq(struct wake_irq *wirq)
if (device_may_wakeup(wirq->dev)) {
if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED &&
- !pm_runtime_status_suspended(wirq->dev))
+ !(wirq->status & WAKE_IRQ_DEDICATED_ENABLED))
enable_irq(wirq->irq);
enable_irq_wake(wirq->irq);
@@ -399,7 +403,7 @@ void dev_pm_disarm_wake_irq(struct wake_irq *wirq)
disable_irq_wake(wirq->irq);
if (wirq->status & WAKE_IRQ_DEDICATED_ALLOCATED &&
- !pm_runtime_status_suspended(wirq->dev))
+ !(wirq->status & WAKE_IRQ_DEDICATED_ENABLED))
disable_irq_nosync(wirq->irq);
}
}
--
2.41.0
The runtime PM state should not be changed by drivers that do not
implement runtime PM even if it happens to work around a bug in PM core.
With the wake irq arming now fixed, drop the bogus runtime PM state
update which left the device in active state (and could potentially
prevent a parent device from suspending).
Fixes: f3974413cf02 ("tty: serial: qcom_geni_serial: Wakeup IRQ cleanup")
Cc: stable(a)vger.kernel.org # 5.6
Cc: Matthias Kaehlcke <mka(a)chromium.org>
Cc: Stephen Boyd <swboyd(a)chromium.org>
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/tty/serial/qcom_geni_serial.c | 7 -------
1 file changed, 7 deletions(-)
diff --git a/drivers/tty/serial/qcom_geni_serial.c b/drivers/tty/serial/qcom_geni_serial.c
index 88ed5bbe25a8..b825b05e6137 100644
--- a/drivers/tty/serial/qcom_geni_serial.c
+++ b/drivers/tty/serial/qcom_geni_serial.c
@@ -1681,13 +1681,6 @@ static int qcom_geni_serial_probe(struct platform_device *pdev)
if (ret)
return ret;
- /*
- * Set pm_runtime status as ACTIVE so that wakeup_irq gets
- * enabled/disabled from dev_pm_arm_wake_irq during system
- * suspend/resume respectively.
- */
- pm_runtime_set_active(&pdev->dev);
-
if (port->wakeup_irq > 0) {
device_init_wakeup(&pdev->dev, true);
ret = dev_pm_set_dedicated_wake_irq(&pdev->dev,
--
2.41.0
From: Gavin Shan <gshan(a)redhat.com>
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
--
2.25.1
From: Gavin Shan <gshan(a)redhat.com>
We run into guest hang in edk2 firmware when KSM is kept as running on
the host. The edk2 firmware is waiting for status 0x80 from QEMU's pflash
device (TYPE_PFLASH_CFI01) during the operation of sector erasing or
buffered write. The status is returned by reading the memory region of
the pflash device and the read request should have been forwarded to QEMU
and emulated by it. Unfortunately, the read request is covered by an
illegal stage2 mapping when the guest hang issue occurs. The read request
is completed with QEMU bypassed and wrong status is fetched. The edk2
firmware runs into an infinite loop with the wrong status.
The illegal stage2 mapping is populated due to same page sharing by KSM
at (C) even the associated memory slot has been marked as invalid at (B)
when the memory slot is requested to be deleted. It's notable that the
active and inactive memory slots can't be swapped when we're in the middle
of kvm_mmu_notifier_change_pte() because kvm->mn_active_invalidate_count
is elevated, and kvm_swap_active_memslots() will busy loop until it reaches
to zero again. Besides, the swapping from the active to the inactive memory
slots is also avoided by holding &kvm->srcu in __kvm_handle_hva_range(),
corresponding to synchronize_srcu_expedited() in kvm_swap_active_memslots().
CPU-A CPU-B
----- -----
ioctl(kvm_fd, KVM_SET_USER_MEMORY_REGION)
kvm_vm_ioctl_set_memory_region
kvm_set_memory_region
__kvm_set_memory_region
kvm_set_memslot(kvm, old, NULL, KVM_MR_DELETE)
kvm_invalidate_memslot
kvm_copy_memslot
kvm_replace_memslot
kvm_swap_active_memslots (A)
kvm_arch_flush_shadow_memslot (B)
same page sharing by KSM
kvm_mmu_notifier_invalidate_range_start
:
kvm_mmu_notifier_change_pte
kvm_handle_hva_range
__kvm_handle_hva_range
kvm_set_spte_gfn (C)
:
kvm_mmu_notifier_invalidate_range_end
Fix the issue by skipping the invalid memory slot at (C) to avoid the
illegal stage2 mapping so that the read request for the pflash's status
is forwarded to QEMU and emulated by it. In this way, the correct pflash's
status can be returned from QEMU to break the infinite loop in the edk2
firmware.
We tried a git-bisect and the first problematic commit is cd4c71835228 ("
KVM: arm64: Convert to the gfn-based MMU notifier callbacks"). With this,
clean_dcache_guest_page() is called after the memory slots are iterated
in kvm_mmu_notifier_change_pte(). clean_dcache_guest_page() is called
before the iteration on the memory slots before this commit. This change
literally enlarges the racy window between kvm_mmu_notifier_change_pte()
and memory slot removal so that we're able to reproduce the issue in a
practical test case. However, the issue exists since commit d5d8184d35c9
("KVM: ARM: Memory virtualization setup").
Cc: stable(a)vger.kernel.org # v3.9+
Fixes: d5d8184d35c9 ("KVM: ARM: Memory virtualization setup")
Reported-by: Shuai Hu <hshuai(a)redhat.com>
Reported-by: Zhenyu Zhang <zhenyzha(a)redhat.com>
Signed-off-by: Gavin Shan <gshan(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Reviewed-by: Oliver Upton <oliver.upton(a)linux.dev>
Reviewed-by: Peter Xu <peterx(a)redhat.com>
Reviewed-by: Sean Christopherson <seanjc(a)google.com>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Message-Id: <20230615054259.14911-1-gshan(a)redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
---
virt/kvm/kvm_main.c | 20 +++++++++++++++++++-
1 file changed, 19 insertions(+), 1 deletion(-)
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 479802a892d4..65f94f592ff8 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -686,6 +686,24 @@ static __always_inline int kvm_handle_hva_range_no_flush(struct mmu_notifier *mn
return __kvm_handle_hva_range(kvm, &range);
}
+
+static bool kvm_change_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
+{
+ /*
+ * Skipping invalid memslots is correct if and only change_pte() is
+ * surrounded by invalidate_range_{start,end}(), which is currently
+ * guaranteed by the primary MMU. If that ever changes, KVM needs to
+ * unmap the memslot instead of skipping the memslot to ensure that KVM
+ * doesn't hold references to the old PFN.
+ */
+ WARN_ON_ONCE(!READ_ONCE(kvm->mn_active_invalidate_count));
+
+ if (range->slot->flags & KVM_MEMSLOT_INVALID)
+ return false;
+
+ return kvm_set_spte_gfn(kvm, range);
+}
+
static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
struct mm_struct *mm,
unsigned long address,
@@ -707,7 +725,7 @@ static void kvm_mmu_notifier_change_pte(struct mmu_notifier *mn,
if (!READ_ONCE(kvm->mmu_invalidate_in_progress))
return;
- kvm_handle_hva_range(mn, address, address + 1, pte, kvm_set_spte_gfn);
+ kvm_handle_hva_range(mn, address, address + 1, pte, kvm_change_spte_gfn);
}
void kvm_mmu_invalidate_begin(struct kvm *kvm, unsigned long start,
--
2.25.1
This sysctl has the very unusal behaviour of not allowing any user (even
CAP_SYS_ADMIN) to reduce the restriction setting, meaning that if you
were to set this sysctl to a more restrictive option in the host pidns
you would need to reboot your machine in order to reset it.
The justification given in [1] is that this is a security feature and
thus it should not be possible to disable. Aside from the fact that we
have plenty of security-related sysctls that can be disabled after being
enabled (fs.protected_symlinks for instance), the protection provided by
the sysctl is to stop users from being able to create a binary and then
execute it. A user with CAP_SYS_ADMIN can trivially do this without
memfd_create(2):
% cat mount-memfd.c
#include <fcntl.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <linux/mount.h>
#define SHELLCODE "#!/bin/echo this file was executed from this totally private tmpfs:"
int main(void)
{
int fsfd = fsopen("tmpfs", FSOPEN_CLOEXEC);
assert(fsfd >= 0);
assert(!fsconfig(fsfd, FSCONFIG_CMD_CREATE, NULL, NULL, 2));
int dfd = fsmount(fsfd, FSMOUNT_CLOEXEC, 0);
assert(dfd >= 0);
int execfd = openat(dfd, "exe", O_CREAT | O_RDWR | O_CLOEXEC, 0782);
assert(execfd >= 0);
assert(write(execfd, SHELLCODE, strlen(SHELLCODE)) == strlen(SHELLCODE));
assert(!close(execfd));
char *execpath = NULL;
char *argv[] = { "bad-exe", NULL }, *envp[] = { NULL };
execfd = openat(dfd, "exe", O_PATH | O_CLOEXEC);
assert(execfd >= 0);
assert(asprintf(&execpath, "/proc/self/fd/%d", execfd) > 0);
assert(!execve(execpath, argv, envp));
}
% ./mount-memfd
this file was executed from this totally private tmpfs: /proc/self/fd/5
%
Given that it is possible for CAP_SYS_ADMIN users to create executable
binaries without memfd_create(2) and without touching the host
filesystem (not to mention the many other things a CAP_SYS_ADMIN process
would be able to do that would be equivalent or worse), it seems strange
to cause a fair amount of headache to admins when there doesn't appear
to be an actual security benefit to blocking this.
It should be noted that with this change, programs that can do an
unprivileged unshare(CLONE_NEWUSER) would be able to create an
executable memfd even if their current pidns didn't allow it. However,
the same sample program above can also be used in this scenario, meaning
that even with this consideration, blocking CAP_SYS_ADMIN makes little
sense:
% unshare -rm ./mount-memfd
this file was executed from this totally private tmpfs: /proc/self/fd/5
This simply further reinforces that locked-down environments need to
disallow CLONE_NEWUSER for unprivileged users (as is already the case in
most container environments).
[1]: https://lore.kernel.org/all/CABi2SkWnAgHK1i6iqSqPMYuNEhtHBkO8jUuCvmG3RmUB5T…
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: stable(a)vger.kernel.org # v6.3+
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
kernel/pid_sysctl.h | 7 -------
1 file changed, 7 deletions(-)
diff --git a/kernel/pid_sysctl.h b/kernel/pid_sysctl.h
index b26e027fc9cd..8a22bc29ebb4 100644
--- a/kernel/pid_sysctl.h
+++ b/kernel/pid_sysctl.h
@@ -24,13 +24,6 @@ static int pid_mfd_noexec_dointvec_minmax(struct ctl_table *table,
if (ns != &init_pid_ns)
table_copy.data = &ns->memfd_noexec_scope;
- /*
- * set minimum to current value, the effect is only bigger
- * value is accepted.
- */
- if (*(int *)table_copy.data > *(int *)table_copy.extra1)
- table_copy.extra1 = table_copy.data;
-
return proc_dointvec_minmax(&table_copy, write, buf, lenp, ppos);
}
--
2.41.0
The quilt patch titled
Subject: hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles
has been removed from the -mm tree. Its filename was
hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles.patch
This patch was dropped because an updated version will be merged
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles
Date: Tue, 11 Jul 2023 15:09:42 -0700
update_and_free_pages_bulk is designed to free a list of hugetlb pages
back to their associated lower level allocators. This may require
allocating vmemmap pages associated with each hugetlb page. The hugetlb
page destructor must be changed before pages are freed to lower level
allocators. However, the destructor must be changed under the hugetlb
lock. This means there is potentially one lock cycle per page.
Minimize the number of lock cycles in update_and_free_pages_bulk by:
1) allocating necessary vmemmap for all hugetlb pages on the list
2) take hugetlb lock and clear destructor for all pages on the list
3) free all pages on list back to low level allocators
Link: https://lkml.kernel.org/r/20230711220942.43706-3-mike.kravetz@oracle.com
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles
+++ a/mm/hugetlb.c
@@ -1855,11 +1855,44 @@ static void update_and_free_pages_bulk(s
{
struct page *page, *t_page;
struct folio *folio;
+ bool clear_dtor = false;
+ /*
+ * First allocate required vmemmmap for all pages on list. If vmemmap
+ * can not be allocated, we can not free page to lower level allocator,
+ * so add back as hugetlb surplus page.
+ */
+ list_for_each_entry_safe(page, t_page, list, lru) {
+ if (HPageVmemmapOptimized(page)) {
+ clear_dtor = true;
+ if (hugetlb_vmemmap_restore(h, page)) {
+ spin_lock_irq(&hugetlb_lock);
+ add_hugetlb_folio(h, folio, true);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+ cond_resched();
+ }
+ }
+
+ /*
+ * If vmemmmap allocation performed above, then take lock * to clear
+ * destructor of all pages on list.
+ */
+ if (clear_dtor) {
+ spin_lock_irq(&hugetlb_lock);
+ list_for_each_entry(page, list, lru)
+ __clear_hugetlb_destructor(h, page_folio(page));
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
+ /*
+ * Free pages back to low level allocators. vmemmap and destructors
+ * were taken care of above, so update_and_free_hugetlb_folio will
+ * not need to take hugetlb lock.
+ */
list_for_each_entry_safe(page, t_page, list, lru) {
folio = page_folio(page);
update_and_free_hugetlb_folio(h, folio, false);
- cond_resched();
}
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-do-not-clear-hugetlb-dtor-until-allocating-vmemmap.patch
The patch titled
Subject: hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: optimize update_and_free_pages_bulk to avoid lock cycles
Date: Tue, 11 Jul 2023 15:09:42 -0700
update_and_free_pages_bulk is designed to free a list of hugetlb pages
back to their associated lower level allocators. This may require
allocating vmemmap pages associated with each hugetlb page. The hugetlb
page destructor must be changed before pages are freed to lower level
allocators. However, the destructor must be changed under the hugetlb
lock. This means there is potentially one lock cycle per page.
Minimize the number of lock cycles in update_and_free_pages_bulk by:
1) allocating necessary vmemmap for all hugetlb pages on the list
2) take hugetlb lock and clear destructor for all pages on the list
3) free all pages on list back to low level allocators
Link: https://lkml.kernel.org/r/20230711220942.43706-3-mike.kravetz@oracle.com
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 35 ++++++++++++++++++++++++++++++++++-
1 file changed, 34 insertions(+), 1 deletion(-)
--- a/mm/hugetlb.c~hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles
+++ a/mm/hugetlb.c
@@ -1855,11 +1855,44 @@ static void update_and_free_pages_bulk(s
{
struct page *page, *t_page;
struct folio *folio;
+ bool clear_dtor = false;
+ /*
+ * First allocate required vmemmmap for all pages on list. If vmemmap
+ * can not be allocated, we can not free page to lower level allocator,
+ * so add back as hugetlb surplus page.
+ */
+ list_for_each_entry_safe(page, t_page, list, lru) {
+ if (HPageVmemmapOptimized(page)) {
+ clear_dtor = true;
+ if (hugetlb_vmemmap_restore(h, page)) {
+ spin_lock_irq(&hugetlb_lock);
+ add_hugetlb_folio(h, folio, true);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+ cond_resched();
+ }
+ }
+
+ /*
+ * If vmemmmap allocation performed above, then take lock * to clear
+ * destructor of all pages on list.
+ */
+ if (clear_dtor) {
+ spin_lock_irq(&hugetlb_lock);
+ list_for_each_entry(page, list, lru)
+ __clear_hugetlb_destructor(h, page_folio(page));
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
+ /*
+ * Free pages back to low level allocators. vmemmap and destructors
+ * were taken care of above, so update_and_free_hugetlb_folio will
+ * not need to take hugetlb lock.
+ */
list_for_each_entry_safe(page, t_page, list, lru) {
folio = page_folio(page);
update_and_free_hugetlb_folio(h, folio, false);
- cond_resched();
}
}
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-do-not-clear-hugetlb-dtor-until-allocating-vmemmap.patch
hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles.patch
The patch titled
Subject: hugetlb: do not clear hugetlb dtor until allocating vmemmap
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
hugetlb-do-not-clear-hugetlb-dtor-until-allocating-vmemmap.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mike Kravetz <mike.kravetz(a)oracle.com>
Subject: hugetlb: do not clear hugetlb dtor until allocating vmemmap
Date: Tue, 11 Jul 2023 15:09:41 -0700
Patch series "Fix hugetlb free path race with memory errors".
In the discussion of Jiaqi Yan's series "Improve hugetlbfs read on
HWPOISON hugepages" the race window was discovered.
https://lore.kernel.org/linux-mm/20230616233447.GB7371@monkey/
Freeing a hugetlb page back to low level memory allocators is performed
in two steps.
1) Under hugetlb lock, remove page from hugetlb lists and clear destructor
2) Outside lock, allocate vmemmap if necessary and call low level free
Between these two steps, the hugetlb page will appear as a normal
compound page. However, vmemmap for tail pages could be missing.
If a memory error occurs at this time, we could try to update page
flags non-existant page structs.
A much more detailed description is in the first patch.
The first patch addresses the race window. However, it adds a
hugetlb_lock lock/unlock cycle to every vmemmap optimized hugetlb page
free operation. This could lead to slowdowns if one is freeing a large
number of hugetlb pages.
The second path optimizes the update_and_free_pages_bulk routine to only
take the lock once in bulk operations.
The second patch is technically not a bug fix, but includes a Fixes tag
and Cc stable to avoid a performance regression. It can be combined with
the first, but was done separately make reviewing easier.
This patch (of 2):
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Link: https://lkml.kernel.org/r/20230711220942.43706-2-mike.kravetz@oracle.com
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Cc: Axel Rasmussen <axelrasmussen(a)google.com>
Cc: James Houghton <jthoughton(a)google.com>
Cc: Jiaqi Yan <jiaqiyan(a)google.com>
Cc: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Naoya Horiguchi <naoya.horiguchi(a)linux.dev>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/hugetlb.c | 75 +++++++++++++++++++++++++++++++++----------------
1 file changed, 51 insertions(+), 24 deletions(-)
--- a/mm/hugetlb.c~hugetlb-do-not-clear-hugetlb-dtor-until-allocating-vmemmap
+++ a/mm/hugetlb.c
@@ -1579,9 +1579,37 @@ static inline void destroy_compound_giga
unsigned int order) { }
#endif
+static inline void __clear_hugetlb_destructor(struct hstate *h,
+ struct folio *folio)
+{
+ lockdep_assert_held(&hugetlb_lock);
+
+ /*
+ * Very subtle
+ *
+ * For non-gigantic pages set the destructor to the normal compound
+ * page dtor. This is needed in case someone takes an additional
+ * temporary ref to the page, and freeing is delayed until they drop
+ * their reference.
+ *
+ * For gigantic pages set the destructor to the null dtor. This
+ * destructor will never be called. Before freeing the gigantic
+ * page destroy_compound_gigantic_folio will turn the folio into a
+ * simple group of pages. After this the destructor does not
+ * apply.
+ *
+ */
+ if (hstate_is_gigantic(h))
+ folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
+ else
+ folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+}
+
/*
- * Remove hugetlb folio from lists, and update dtor so that the folio appears
- * as just a compound page.
+ * Remove hugetlb folio from lists.
+ * If vmemmap exists for the folio, update dtor so that the folio appears
+ * as just a compound page. Otherwise, wait until after allocating vmemmap
+ * to update dtor.
*
* A reference is held on the folio, except in the case of demote.
*
@@ -1612,31 +1640,19 @@ static void __remove_hugetlb_folio(struc
}
/*
- * Very subtle
- *
- * For non-gigantic pages set the destructor to the normal compound
- * page dtor. This is needed in case someone takes an additional
- * temporary ref to the page, and freeing is delayed until they drop
- * their reference.
- *
- * For gigantic pages set the destructor to the null dtor. This
- * destructor will never be called. Before freeing the gigantic
- * page destroy_compound_gigantic_folio will turn the folio into a
- * simple group of pages. After this the destructor does not
- * apply.
- *
- * This handles the case where more than one ref is held when and
- * after update_and_free_hugetlb_folio is called.
- *
- * In the case of demote we do not ref count the page as it will soon
- * be turned into a page of smaller size.
+ * We can only clear the hugetlb destructor after allocating vmemmap
+ * pages. Otherwise, someone (memory error handling) may try to write
+ * to tail struct pages.
+ */
+ if (!folio_test_hugetlb_vmemmap_optimized(folio))
+ __clear_hugetlb_destructor(h, folio);
+
+ /*
+ * In the case of demote we do not ref count the page as it will soon
+ * be turned into a page of smaller size.
*/
if (!demote)
folio_ref_unfreeze(folio, 1);
- if (hstate_is_gigantic(h))
- folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
- else
- folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
h->nr_huge_pages--;
h->nr_huge_pages_node[nid]--;
@@ -1705,6 +1721,7 @@ static void __update_and_free_hugetlb_fo
{
int i;
struct page *subpage;
+ bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
@@ -1735,6 +1752,16 @@ static void __update_and_free_hugetlb_fo
if (unlikely(folio_test_hwpoison(folio)))
folio_clear_hugetlb_hwpoison(folio);
+ /*
+ * If vmemmap pages were allocated above, then we need to clear the
+ * hugetlb destructor under the hugetlb lock.
+ */
+ if (clear_dtor) {
+ spin_lock_irq(&hugetlb_lock);
+ __clear_hugetlb_destructor(h, folio);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
for (i = 0; i < pages_per_huge_page(h); i++) {
subpage = folio_page(folio, i);
subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
_
Patches currently in -mm which might be from mike.kravetz(a)oracle.com are
hugetlb-do-not-clear-hugetlb-dtor-until-allocating-vmemmap.patch
hugetlb-optimize-update_and_free_pages_bulk-to-avoid-lock-cycles.patch
This is a RFC and a bit rough for now. I only set down to create the
first of the three patches. But while doing so I noticed a few things
that seemed odd for me with my background on writing and editing texts.
So I just quickly performed a few additional changes to fix those to see
if the stable team would appreciate them, as this document is clearly
their domain.
If those changes or even the initial patch are not welcomed, I'll simply
drop them. I'd totally understand this, as texts like these are delicate
and it's easy to accidentlly change the intent or the meaning while
adjusting things in good faith.
At the same time I might be willing to do a few more changes, if people
like the direction this takes and want a bit more fine tuning.
CC: Greg KH <gregkh(a)linuxfoundation.org>
CC: Sasha Levin <sashal(a)kernel.org>
CC: Jonathan Corbet <corbet(a)lwn.net>
Thorsten Leemhuis (3):
docs: stable-kernel-rules: mention other usages for stable tag
comments
docs: stable-kernel-rules: make rule section more straight forward
docs: stable-kernel-rules: improve structure to optimize reading flow
Documentation/process/stable-kernel-rules.rst | 189 ++++++++++--------
1 file changed, 105 insertions(+), 84 deletions(-)
base-commit: 016571b6d52deb473676fb4d24baf8ed3667ae21
--
2.40.1
Due to an oversight in commit 1b3044e39a89 ("procfs: fix pthread
cross-thread naming if !PR_DUMPABLE") in switching from REG to NOD,
chmod operations on /proc/thread-self/comm were no longer blocked as
they are on almost all other procfs files.
A very similar situation with /proc/self/environ was used to as a root
exploit a long time ago, but procfs has SB_I_NOEXEC so this is simply a
correctness issue.
Ref: https://lwn.net/Articles/191954/
Ref: 6d76fa58b050 ("Don't allow chmod() on the /proc/<pid>/ files")
Fixes: 1b3044e39a89 ("procfs: fix pthread cross-thread naming if !PR_DUMPABLE")
Cc: stable(a)vger.kernel.org # v4.7+
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
v2: removed nolibc selftests as per review
v1: <https://lore.kernel.org/linux-fsdevel/20230713121907.9693-1-cyphar@cyphar.c…>
---
fs/proc/base.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 05452c3b9872..7394229816f3 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -3583,7 +3583,8 @@ static int proc_tid_comm_permission(struct mnt_idmap *idmap,
}
static const struct inode_operations proc_tid_comm_inode_operations = {
- .permission = proc_tid_comm_permission,
+ .setattr = proc_setattr,
+ .permission = proc_tid_comm_permission,
};
/*
--
2.41.0
From: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
We were reading the length of the scatterlist sg after copying value of
tsg inside.
So we are using the size of the previous scatterlist and for the first
one we are using an unitialised value.
Fix this by copying tsg in sg[0] before reading the size.
Fixes : 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
---
Changes in v2:
- Add Fixes 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
- Add Cc: stable(a)vger.kernel.org
drivers/crypto/stm32/stm32-hash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index c179a6c1a457..519fb716acee 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -678,9 +678,9 @@ static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
}
for_each_sg(rctx->sg, tsg, rctx->nents, i) {
+ sg[0] = *tsg;
len = sg->length;
- sg[0] = *tsg;
if (sg_is_last(sg)) {
if (hdev->dma_mode == 1) {
len = (ALIGN(sg->length, 16) - 16);
--
2.25.1
From: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
We were reading the length of the scatterlist sg after copying value of
tsg inside.
So we are using the size of the previous scatterlist and for the first
one we are using an unitialised value.
Fix this by copying tsg in sg[0] before reading the size.
Fixes : 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
---
Changes in v2:
- Add Fixes 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
- Add Cc: stable(a)vger.kernel.org
drivers/crypto/stm32/stm32-hash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index c179a6c1a457..519fb716acee 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -678,9 +678,9 @@ static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
}
for_each_sg(rctx->sg, tsg, rctx->nents, i) {
+ sg[0] = *tsg;
len = sg->length;
- sg[0] = *tsg;
if (sg_is_last(sg)) {
if (hdev->dma_mode == 1) {
len = (ALIGN(sg->length, 16) - 16);
--
2.25.1
From: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
We were reading the length of the scatterlist sg after copying value of
tsg inside.
So we are using the size of the previous scatterlist and for the first
one we are using an unitialised value.
Fix this by copying tsg in sg[0] before reading the size.
Fixes : 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Bourgoin <thomas.bourgoin(a)foss.st.com>
---
Changes since V1:
- Add Fixes 8a1012d3f2ab ("crypto: stm32 - Support for STM32 HASH module")
- Add Cc: stable(a)vger.kernel.org
---
drivers/crypto/stm32/stm32-hash.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/crypto/stm32/stm32-hash.c b/drivers/crypto/stm32/stm32-hash.c
index c179a6c1a457..519fb716acee 100644
--- a/drivers/crypto/stm32/stm32-hash.c
+++ b/drivers/crypto/stm32/stm32-hash.c
@@ -678,9 +678,9 @@ static int stm32_hash_dma_send(struct stm32_hash_dev *hdev)
}
for_each_sg(rctx->sg, tsg, rctx->nents, i) {
+ sg[0] = *tsg;
len = sg->length;
- sg[0] = *tsg;
if (sg_is_last(sg)) {
if (hdev->dma_mode == 1) {
len = (ALIGN(sg->length, 16) - 16);
--
2.25.1
The previous implementation of vm.memfd_noexec=2 did not actually
enforce the usage of MFD_NOEXEC_SEAL, as any program that set MFD_EXEC
would not be affected by the "enforcement" mechanism. This was fixed in
in Andrew's tree recently, but there were still some things that could
be cleaned up.
On the topic of older programs, it seems far less disruptive to have
vm.memfd_noexec=2 have the same behaviour as vm.memfd_noexec=1 (default
to MFD_NOEXEC_SEAL if unspecified) to avoid breaking older programs that
didn't actually care about the exec bits -- which includes the vast
majority of programs that use memfd_create(2), thus allowing users to be
able to enable this sysctl without all older programs needlessly
breaking. Otherwise vm.memfd_noexec=2 would be unusable on most
general-purpose systems as it would require an audit of all of
userspace.
While we're at it, fix the warnings emitted by memfd_create() to use
pr_warn_ratelimited(). If the intention of the warning is to get
developers to switch to explicitly specifying if they want exec bits or
not, you need to warn them whenever they use it. The systemd version on
my box doesn't pass MFD_EXEC, making the warning useless for most
userspace developers because it was already emitted during boot. Commit
105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC") mentions
that this was switched to pr_warn_once "as per review" but I couldn't
find the discussion anywhere, and the obvious issue (the ability for
unprivileged userspace to spam the kernel log) should be handled by
pr_warn_ratelimited. If the issue is that this is too spammy, we could
tie it to using vm.memfd_noexec=1 or higher.
This is a user-visible API change, but as it allows programs to do
something that would be blocked before, and the sysctl itself was broken
and recently released, it seems unlikely this will cause any issues.
Cc: Dominique Martinet <asmadeus(a)codewreck.org>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: stable(a)vger.kernel.org # v6.3+
Fixes: 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
Signed-off-by: Aleksa Sarai <cyphar(a)cyphar.com>
---
include/linux/pid_namespace.h | 16 +++--------
mm/memfd.c | 32 ++++++++--------------
tools/testing/selftests/memfd/memfd_test.c | 22 +++++++++++----
3 files changed, 33 insertions(+), 37 deletions(-)
diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
index c758809d5bcf..53974d79d98e 100644
--- a/include/linux/pid_namespace.h
+++ b/include/linux/pid_namespace.h
@@ -17,18 +17,10 @@
struct fs_pin;
#if defined(CONFIG_SYSCTL) && defined(CONFIG_MEMFD_CREATE)
-/*
- * sysctl for vm.memfd_noexec
- * 0: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
- * acts like MFD_EXEC was set.
- * 1: memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL
- * acts like MFD_NOEXEC_SEAL was set.
- * 2: memfd_create() without MFD_NOEXEC_SEAL will be
- * rejected.
- */
-#define MEMFD_NOEXEC_SCOPE_EXEC 0
-#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1
-#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2
+/* modes for vm.memfd_noexec sysctl */
+#define MEMFD_NOEXEC_SCOPE_EXEC 0 /* MFD_EXEC implied if unset */
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL 1 /* MFD_NOEXEC_SEAL implied if unset */
+#define MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED 2 /* same as 1, except MFD_EXEC rejected */
#endif
struct pid_namespace {
diff --git a/mm/memfd.c b/mm/memfd.c
index 0bdbd2335af7..4f1f841ae39d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -271,30 +271,22 @@ long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int arg)
static int check_sysctl_memfd_noexec(unsigned int *flags)
{
#ifdef CONFIG_SYSCTL
- char comm[TASK_COMM_LEN];
- int sysctl = MEMFD_NOEXEC_SCOPE_EXEC;
- struct pid_namespace *ns;
-
- ns = task_active_pid_ns(current);
- if (ns)
- sysctl = ns->memfd_noexec_scope;
+ int sysctl = task_active_pid_ns(current)->memfd_noexec_scope;
if (!(*flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
- if (sysctl == MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
+ if (sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_SEAL)
*flags |= MFD_NOEXEC_SEAL;
else
*flags |= MFD_EXEC;
}
- if (*flags & MFD_EXEC && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) {
- pr_warn_once(
- "memfd_create(): MFD_NOEXEC_SEAL is enforced, pid=%d '%s'\n",
- task_pid_nr(current), get_task_comm(comm, current));
-
+ if (!(*flags & MFD_NOEXEC_SEAL) && sysctl >= MEMFD_NOEXEC_SCOPE_NOEXEC_ENFORCED) {
+ pr_warn_ratelimited(
+ "%s[%d]: memfd_create() requires MFD_NOEXEC_SEAL with vm.memfd_noexec=%d\n",
+ current->comm, task_pid_nr(current), sysctl);
return -EACCES;
}
#endif
-
return 0;
}
@@ -302,7 +294,6 @@ SYSCALL_DEFINE2(memfd_create,
const char __user *, uname,
unsigned int, flags)
{
- char comm[TASK_COMM_LEN];
unsigned int *file_seals;
struct file *file;
int fd, error;
@@ -324,13 +315,14 @@ SYSCALL_DEFINE2(memfd_create,
return -EINVAL;
if (!(flags & (MFD_EXEC | MFD_NOEXEC_SEAL))) {
- pr_warn_once(
- "memfd_create() without MFD_EXEC nor MFD_NOEXEC_SEAL, pid=%d '%s'\n",
- task_pid_nr(current), get_task_comm(comm, current));
+ pr_warn_ratelimited(
+ "%s[%d]: memfd_create() called without MFD_EXEC or MFD_NOEXEC_SEAL set\n",
+ current->comm, task_pid_nr(current));
}
- if (check_sysctl_memfd_noexec(&flags) < 0)
- return -EACCES;
+ error = check_sysctl_memfd_noexec(&flags);
+ if (error < 0)
+ return error;
/* length includes terminating zero */
len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index dbdd9ec5e397..d8342989c547 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1145,11 +1145,23 @@ static void test_sysctl_child(void)
printf("%s sysctl 2\n", memfd_str);
sysctl_assert_write("2");
- mfd_fail_new("kern_memfd_sysctl_2",
- MFD_CLOEXEC | MFD_ALLOW_SEALING);
- mfd_fail_new("kern_memfd_sysctl_2_MFD_EXEC",
- MFD_CLOEXEC | MFD_EXEC);
- fd = mfd_assert_new("", 0, MFD_NOEXEC_SEAL);
+ mfd_fail_new("kern_memfd_sysctl_2_exec",
+ MFD_EXEC | MFD_CLOEXEC | MFD_ALLOW_SEALING);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_dfl",
+ mfd_def_size,
+ MFD_CLOEXEC | MFD_ALLOW_SEALING);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_fail_chmod(fd, 0777);
+ close(fd);
+
+ fd = mfd_assert_new("kern_memfd_sysctl_2_noexec_seal",
+ mfd_def_size,
+ MFD_NOEXEC_SEAL | MFD_CLOEXEC | MFD_ALLOW_SEALING);
+ mfd_assert_mode(fd, 0666);
+ mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_fail_chmod(fd, 0777);
close(fd);
sysctl_fail_write("0");
--
2.41.0
Don't assume that the device is fully under the control of PCI core.
Use RMW capability accessors in link retraining which do proper locking
to avoid losing concurrent updates to the register values.
Fixes: 4ec73791a64b ("PCI: Work around Pericom PCIe-to-PCI bridge Retrain Link erratum")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
pci/enumeration branch moves the link retraining code into PCI core and
also conflicts with a link retraining fix in pci/aspm. The changelog
(and patch splitting) takes the move into account by not referring to
ASPM while the change itself is not based on pci/enumeration (as per
Bjorn's preference).
---
drivers/pci/pci.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 60230da957e0..f7315b13bb82 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -4927,7 +4927,6 @@ static int pcie_wait_for_link_status(struct pci_dev *pdev,
int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
{
int rc;
- u16 lnkctl;
/*
* Ensure the updated LNKCTL parameters are used during link
@@ -4939,17 +4938,14 @@ int pcie_retrain_link(struct pci_dev *pdev, bool use_lt)
if (rc)
return rc;
- pcie_capability_read_word(pdev, PCI_EXP_LNKCTL, &lnkctl);
- lnkctl |= PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_set_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
if (pdev->clear_retrain_link) {
/*
* Due to an erratum in some devices the Retrain Link bit
* needs to be cleared again manually to allow the link
* training to succeed.
*/
- lnkctl &= ~PCI_EXP_LNKCTL_RL;
- pcie_capability_write_word(pdev, PCI_EXP_LNKCTL, lnkctl);
+ pcie_capability_clear_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_RL);
}
return pcie_wait_for_link_status(pdev, use_lt, !use_lt);
--
2.30.2
Many places in the kernel write the Link Control and Root Control PCI
Express Capability Registers without proper concurrency control and
this could result in losing the changes one of the writers intended to
make.
Add pcie_cap_lock spinlock into the struct pci_dev and use it to
protect bit changes made in the RMW capability accessors. Protect only
a selected set of registers by differentiating the RMW accessor
internally to locked/unlocked variants using a wrapper which has the
same signature as pcie_capability_clear_and_set_word(). As the
Capability Register (pos) given to the wrapper is always a constant,
the compiler should be able to simplify all the dead-code away.
So far only the Link Control Register (ASPM, hotplug, link retraining,
various drivers) and the Root Control Register (AER & PME) seem to
require RMW locking.
Fixes: c7f486567c1d ("PCI PM: PCIe PME root port service driver")
Fixes: f12eb72a268b ("PCI/ASPM: Use PCI Express Capability accessors")
Fixes: 7d715a6c1ae5 ("PCI: add PCI Express ASPM support")
Fixes: affa48de8417 ("staging/rdma/hfi1: Add support for enabling/disabling PCIe ASPM")
Fixes: 849a9366cba9 ("misc: rtsx: Add support new chip rts5228 mmc: rtsx: Add support MMC_CAP2_NO_MMC")
Fixes: 3d1e7aa80d1c ("misc: rtsx: Use pcie_capability_clear_and_set_word() for PCI_EXP_LNKCTL")
Fixes: c0e5f4e73a71 ("misc: rtsx: Add support for RTS5261")
Fixes: 3df4fce739e2 ("misc: rtsx: separate aspm mode into MODE_REG and MODE_CFG")
Fixes: 121e9c6b5c4c ("misc: rtsx: modify and fix init_hw function")
Fixes: 19f3bd548f27 ("mfd: rtsx: Remove LCTLR defination")
Fixes: 773ccdfd9cc6 ("mfd: rtsx: Read vendor setting from config space")
Fixes: 8275b77a1513 ("mfd: rts5249: Add support for RTS5250S power saving")
Fixes: 5da4e04ae480 ("misc: rtsx: Add support for RTS5260")
Fixes: 0f49bfbd0f2e ("tg3: Use PCI Express Capability accessors")
Fixes: 5e7dfd0fb94a ("tg3: Prevent corruption at 10 / 100Mbps w CLKREQ")
Fixes: b726e493e8dc ("r8169: sync existing 8168 device hardware start sequences with vendor driver")
Fixes: e6de30d63eb1 ("r8169: more 8168dp support.")
Fixes: 8a06127602de ("Bluetooth: hci_bcm4377: Add new driver for BCM4377 PCIe boards")
Fixes: 6f461f6c7c96 ("e1000e: enable/disable ASPM L0s and L1 and ERT according to hardware errata")
Fixes: 1eae4eb2a1c7 ("e1000e: Disable L1 ASPM power savings for 82573 mobile variants")
Fixes: 8060e169e02f ("ath9k: Enable extended synch for AR9485 to fix L0s recovery issue")
Fixes: 69ce674bfa69 ("ath9k: do btcoex ASPM disabling at initialization time")
Fixes: f37f05503575 ("mt76: mt76x2e: disable pcie_aspm by default")
Suggested-by: Lukas Wunner <lukas(a)wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen(a)linux.intel.com>
Reviewed-by: Rafael J. Wysocki <rafael(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/pci/access.c | 20 +++++++++++++++++---
drivers/pci/probe.c | 1 +
include/linux/pci.h | 34 ++++++++++++++++++++++++++++++++--
3 files changed, 50 insertions(+), 5 deletions(-)
diff --git a/drivers/pci/access.c b/drivers/pci/access.c
index 3c230ca3de58..0b2e90d2f04f 100644
--- a/drivers/pci/access.c
+++ b/drivers/pci/access.c
@@ -497,8 +497,8 @@ int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val)
}
EXPORT_SYMBOL(pcie_capability_write_dword);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set)
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
{
int ret;
u16 val;
@@ -512,7 +512,21 @@ int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
return ret;
}
-EXPORT_SYMBOL(pcie_capability_clear_and_set_word);
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_unlocked);
+
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set)
+{
+ unsigned long flags;
+ int ret;
+
+ spin_lock_irqsave(&dev->pcie_cap_lock, flags);
+ ret = pcie_capability_clear_and_set_word_unlocked(dev, pos, clear, set);
+ spin_unlock_irqrestore(&dev->pcie_cap_lock, flags);
+
+ return ret;
+}
+EXPORT_SYMBOL(pcie_capability_clear_and_set_word_locked);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set)
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8bac3ce02609..f1587fb0ba71 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2324,6 +2324,7 @@ struct pci_dev *pci_alloc_dev(struct pci_bus *bus)
.end = -1,
};
+ spin_lock_init(&dev->pcie_cap_lock);
#ifdef CONFIG_PCI_MSI
raw_spin_lock_init(&dev->msi_lock);
#endif
diff --git a/include/linux/pci.h b/include/linux/pci.h
index c69a2cc1f412..7ee498cd1f37 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -467,6 +467,7 @@ struct pci_dev {
pci_dev_flags_t dev_flags;
atomic_t enable_cnt; /* pci_enable_device has been called */
+ spinlock_t pcie_cap_lock; /* Protects RMW ops in capability accessors */
u32 saved_config_space[16]; /* Config space saved at suspend time */
struct hlist_head saved_cap_space;
int rom_attr_enabled; /* Display of ROM attribute enabled? */
@@ -1217,11 +1218,40 @@ int pcie_capability_read_word(struct pci_dev *dev, int pos, u16 *val);
int pcie_capability_read_dword(struct pci_dev *dev, int pos, u32 *val);
int pcie_capability_write_word(struct pci_dev *dev, int pos, u16 val);
int pcie_capability_write_dword(struct pci_dev *dev, int pos, u32 val);
-int pcie_capability_clear_and_set_word(struct pci_dev *dev, int pos,
- u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_unlocked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
+int pcie_capability_clear_and_set_word_locked(struct pci_dev *dev, int pos,
+ u16 clear, u16 set);
int pcie_capability_clear_and_set_dword(struct pci_dev *dev, int pos,
u32 clear, u32 set);
+/**
+ * pcie_capability_clear_and_set_word - RMW accessor for PCI Express Capability Registers
+ * @dev: PCI device structure of the PCI Express device
+ * @pos: PCI Express Capability Register
+ * @clear: Clear bitmask
+ * @set: Set bitmask
+ *
+ * Perform a Read-Modify-Write (RMW) operation using @clear and @set
+ * bitmasks on PCI Express Capability Register at @pos. Certain PCI Express
+ * Capability Registers are accessed concurrently in RMW fashion, hence
+ * require locking which is handled transparently to the caller.
+ */
+static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
+ int pos,
+ u16 clear, u16 set)
+{
+ switch (pos) {
+ case PCI_EXP_LNKCTL:
+ case PCI_EXP_RTCTL:
+ return pcie_capability_clear_and_set_word_locked(dev, pos,
+ clear, set);
+ default:
+ return pcie_capability_clear_and_set_word_unlocked(dev, pos,
+ clear, set);
+ }
+}
+
static inline int pcie_capability_set_word(struct pci_dev *dev, int pos,
u16 set)
{
--
2.30.2
Changing the mode of symlinks is meaningless as the vfs doesn't take the
mode of a symlink into account during path lookup permission checking.
However, the vfs doesn't block mode changes on symlinks. This however,
has lead to an untenable mess roughly classifiable into the following
two categories:
(1) Filesystems that don't implement a i_op->setattr() for symlinks.
Such filesystems may or may not know that without i_op->setattr()
defined, notify_change() falls back to simple_setattr() causing the
inode's mode in the inode cache to be changed.
That's a generic issue as this will affect all non-size changing
inode attributes including ownership changes.
Example: afs
(2) Filesystems that fail with EOPNOTSUPP but change the mode of the
symlink nonetheless.
Some filesystems will happily update the mode of a symlink but still
return EOPNOTSUPP. This is the biggest source of confusion for
userspace.
The EOPNOTSUPP in this case comes from POSIX ACLs. Specifically it
comes from filesystems that call posix_acl_chmod(), e.g., btrfs via
if (!err && attr->ia_valid & ATTR_MODE)
err = posix_acl_chmod(idmap, dentry, inode->i_mode);
Filesystems including btrfs don't implement i_op->set_acl() so
posix_acl_chmod() will report EOPNOTSUPP.
When posix_acl_chmod() is called, most filesystems will have
finished updating the inode.
Perversely, this has the consequences that this behavior may depend
on two kconfig options and mount options:
* CONFIG_POSIX_ACL={y,n}
* CONFIG_${FSTYPE}_POSIX_ACL={y,n}
* Opt_acl, Opt_noacl
Example: btrfs, ext4, xfs
The only way to change the mode on a symlink currently involves abusing
an O_PATH file descriptor in the following manner:
fd = openat(-1, "/path/to/link", O_CLOEXEC | O_PATH | O_NOFOLLOW);
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
chmod(path, 0000);
But for most major filesystems with POSIX ACL support such as btrfs,
ext4, ceph, tmpfs, xfs and others this will fail with EOPNOTSUPP with
the mode still updated due to the aforementioned posix_acl_chmod()
nonsense.
So, given that for all major filesystems this would fail with EOPNOTSUPP
and that both glibc (cf. [1]) and musl (cf. [2]) outright block mode
changes on symlinks we should just try and block mode changes on
symlinks directly in the vfs and have a clean break with this nonsense.
If this causes any regressions, we do the next best thing and fix up all
filesystems that do return EOPNOTSUPP with the mode updated to not call
posix_acl_chmod() on symlinks.
But as usual, let's try the clean cut solution first. It's a simple
patch that can be easily reverted. Not marking this for backport as I'll
do that manually if we're reasonably sure that this works and there are
no strong objections.
We could block this in chmod_common() but it's more appropriate to do it
notify_change() as it will also mean that we catch filesystems that
change symlink permissions explicitly or accidently.
Similar proposals were floated in the past as in [3] and [4] and again
recently in [5]. There's also a couple of bugs about this inconsistency
as in [6] and [7].
Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fc… [1]
Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c [2]
Link: https://lore.kernel.org/all/20200911065733.GA31579@infradead.org [3]
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00518.html [4]
Link: https://lore.kernel.org/lkml/87lefmbppo.fsf@oldenburg.str.redhat.com [5]
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html [6]
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=14578#c17 [7]
Cc: stable(a)vger.kernel.org # please backport to all LTSes but not before v6.6-rc2 is tagged
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Suggested-by: Florian Weimer <fweimer(a)redhat.com>
Reviewed-by: Aleksa Sarai <cyphar(a)cyphar.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
Changes in v2:
- declarations first...
- Link to v1: https://lore.kernel.org/r/20230712-vfs-chmod-symlinks-v1-1-27921df6011f@ker…
---
---
fs/attr.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/attr.c b/fs/attr.c
index d60dc1edb526..775bf77ce16f 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -394,9 +394,11 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
return error;
if ((ia_valid & ATTR_MODE)) {
- umode_t amode = attr->ia_mode;
+ if (S_ISLNK(inode->i_mode))
+ return -EOPNOTSUPP;
+
/* Flag setting protected by i_mutex */
- if (is_sxid(amode))
+ if (is_sxid(attr->ia_mode))
inode->i_flags &= ~S_NOSEC;
}
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230712-vfs-chmod-symlinks-a3ad1923a992
Conversion from big-endian to native is done in a common function
mmc_app_send_scr(). Converting in moxart_transfer_pio() is extra.
Double conversion on a LE system returns an incorrect SCR value,
leads to errors:
mmc0: unrecognised SCR structure version 8
Fixes: 1b66e94e6b99 ("mmc: moxart: Add MOXA ART SD/MMC driver")
Signed-off-by: Sergei Antonov <saproj(a)gmail.com>
Cc: Jonas Jensen <jonas.jensen(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/mmc/host/moxart-mmc.c | 8 +-------
1 file changed, 1 insertion(+), 7 deletions(-)
diff --git a/drivers/mmc/host/moxart-mmc.c b/drivers/mmc/host/moxart-mmc.c
index 2d002c81dcf3..d0d6ffcf78d4 100644
--- a/drivers/mmc/host/moxart-mmc.c
+++ b/drivers/mmc/host/moxart-mmc.c
@@ -338,13 +338,7 @@ static void moxart_transfer_pio(struct moxart_host *host)
return;
}
for (len = 0; len < remain && len < host->fifo_width;) {
- /* SCR data must be read in big endian. */
- if (data->mrq->cmd->opcode == SD_APP_SEND_SCR)
- *sgp = ioread32be(host->base +
- REG_DATA_WINDOW);
- else
- *sgp = ioread32(host->base +
- REG_DATA_WINDOW);
+ *sgp = ioread32(host->base + REG_DATA_WINDOW);
sgp++;
len += 4;
}
--
2.37.2
Hi Greg, Sasha,
[ this is v2, using "Upstream commit tag XYZ" as suggested by Salvatore Bonaccorso,
only this comestic change in this batch. ]
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
2) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
Please, apply.
Thanks.
Pablo Neira Ayuso (2):
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 5 +-
net/netfilter/nf_tables_api.c | 147 ++++++++++++++++++++++++++----
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 23 ++++-
net/netfilter/nft_set_pipapo.c | 14 ++-
net/netfilter/nft_set_rbtree.c | 5 +-
6 files changed, 168 insertions(+), 31 deletions(-)
--
2.30.2
If system is busy during the command status polling function, the driver
may not get the chance to poll the status register till the end of time
out and return the premature status. Do a final check after time out
happens to ensure reading the correct status.
Fixes: 9d2ee0a60b8b ("mtd: nand: brcmnand: Check flash #WP pin status before nand erase/program")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
---
Changes in v4:
- Update comment in the polling status function
- Add cc stable tag
Changes in v3: None
Changes in v2: None
drivers/mtd/nand/raw/brcmnand/brcmnand.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 9ea96911d16b..9a373a10304d 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1080,6 +1080,14 @@ static int bcmnand_ctrl_poll_status(struct brcmnand_controller *ctrl,
cpu_relax();
} while (time_after(limit, jiffies));
+ /*
+ * do a final check after time out in case the CPU was busy and the driver
+ * did not get enough time to perform the polling to avoid false alarms
+ */
+ val = brcmnand_read_reg(ctrl, BRCMNAND_INTFC_STATUS);
+ if ((val & mask) == expected_val)
+ return 0;
+
dev_warn(ctrl->dev, "timeout on status poll (expected %x got %x)\n",
expected_val, val & mask);
--
2.37.3
When executing a NAND command within the panic write path, wait for any
pending command instead of calling BUG_ON to avoid crashing while
already crashing.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Reviewed-by: Kursad Oney <kursad.oney(a)broadcom.com>
Reviewed-by: Kamal Dasu <kamal.dasu(a)broadcom.com>
Cc: stable(a)vger.kernel.org
---
Changes in v4:
- Update commit message
- Add cc stable tag
Changes in v3: None
Changes in v2: None
drivers/mtd/nand/raw/brcmnand/brcmnand.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index 9a373a10304d..b2c6396060db 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1608,7 +1608,17 @@ static void brcmnand_send_cmd(struct brcmnand_host *host, int cmd)
dev_dbg(ctrl->dev, "send native cmd %d addr 0x%llx\n", cmd, cmd_addr);
- BUG_ON(ctrl->cmd_pending != 0);
+ /*
+ * If we came here through _panic_write and there is a pending
+ * command, try to wait for it. If it times out, rather than
+ * hitting BUG_ON, just return so we don't crash while crashing.
+ */
+ if (oops_in_progress) {
+ if (ctrl->cmd_pending &&
+ bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0))
+ return;
+ } else
+ BUG_ON(ctrl->cmd_pending != 0);
ctrl->cmd_pending = cmd;
ret = bcmnand_ctrl_poll_status(ctrl, NAND_CTRL_RDY, NAND_CTRL_RDY, 0);
--
2.37.3
When the oob buffer length is not in multiple of words, the oob write
function does out-of-bounds read on the oob source buffer at the last
iteration. Fix that by always checking length limit on the oob buffer
read and fill with 0xff when reaching the end of the buffer to the oob
registers.
Fixes: 27c5b17cd1b1 ("mtd: nand: add NAND driver "library" for Broadcom STB NAND controller")
Signed-off-by: William Zhang <william.zhang(a)broadcom.com>
Reviewed-by: Florian Fainelli <florian.fainelli(a)broadcom.com>
Cc: stable(a)vger.kernel.org
---
Changes in v4:
- Add comments in the write_oob_to_regs function
- Add cc stable tag
Changes in v3:
- Fix kernel test robot sparse warning:
drivers/mtd/nand/raw/brcmnand/brcmnand.c:1500:54: sparse: expected unsigned int [usertype] data
drivers/mtd/nand/raw/brcmnand/brcmnand.c:1500:54: sparse: got restricted __be32 [usertype]
Changes in v2:
- Handle the remaining unaligned oob data after the oob data write loop
drivers/mtd/nand/raw/brcmnand/brcmnand.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/mtd/nand/raw/brcmnand/brcmnand.c b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
index b2c6396060db..71d0ba652bee 100644
--- a/drivers/mtd/nand/raw/brcmnand/brcmnand.c
+++ b/drivers/mtd/nand/raw/brcmnand/brcmnand.c
@@ -1477,19 +1477,33 @@ static int write_oob_to_regs(struct brcmnand_controller *ctrl, int i,
const u8 *oob, int sas, int sector_1k)
{
int tbytes = sas << sector_1k;
- int j;
+ int j, k = 0;
+ u32 last = 0xffffffff;
+ u8 *plast = (u8 *)&last;
/* Adjust OOB values for 1K sector size */
if (sector_1k && (i & 0x01))
tbytes = max(0, tbytes - (int)ctrl->max_oob);
tbytes = min_t(int, tbytes, ctrl->max_oob);
- for (j = 0; j < tbytes; j += 4)
+ /*
+ * tbytes may not be multiple of words. Make sure we don't read out of
+ * the boundary and stop at last word.
+ */
+ for (j = 0; (j + 3) < tbytes; j += 4)
oob_reg_write(ctrl, j,
(oob[j + 0] << 24) |
(oob[j + 1] << 16) |
(oob[j + 2] << 8) |
(oob[j + 3] << 0));
+
+ /* handle the remaing bytes */
+ while (j < tbytes)
+ plast[k++] = oob[j++];
+
+ if (tbytes & 0x3)
+ oob_reg_write(ctrl, (tbytes & ~0x3), (__force u32)cpu_to_be32(last));
+
return tbytes;
}
--
2.37.3
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq.
Fix it by consistently requesting a doorbell irq in the vcpu put path if
the vCPU is blocking. While this technically means we could drop the
early doorbell irq request in kvm_vcpu_wfi(), deliberately leave it
intact such that vCPU halt polling can properly detect the wakeup
condition before actually scheduling out a vCPU.
Cc: stable(a)vger.kernel.org
Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Tested-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
arch/arm64/kvm/vgic/vgic-v3.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index c3b8e132d599..8c467e9f4f11 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -749,7 +749,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
- WARN_ON(vgic_v4_put(vcpu, false));
+ WARN_ON(vgic_v4_put(vcpu, kvm_vcpu_is_blocking(vcpu)));
vgic_v3_vmcr_sync(vcpu);
--
2.41.0.255.g8b1d071c50-goog
This is the start of the stable review cycle for the 6.4.3 release.
There are 6 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Tue, 11 Jul 2023 20:38:10 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.4.3-rc2.…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.4.3-rc2
Suren Baghdasaryan <surenb(a)google.com>
fork: lock VMAs of the parent process when forking
Liu Shixin <liushixin2(a)huawei.com>
bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
Peter Collingbourne <pcc(a)google.com>
mm: call arch_swap_restore() from do_swap_page()
Hugh Dickins <hughd(a)google.com>
mm: lock newly mapped VMA with corrected ordering
Suren Baghdasaryan <surenb(a)google.com>
mm: lock newly mapped VMA which can be modified after it becomes visible
Suren Baghdasaryan <surenb(a)google.com>
mm: lock a vma before stack expansion
-------------
Diffstat:
Makefile | 4 ++--
include/linux/bootmem_info.h | 2 ++
kernel/fork.c | 1 +
mm/memory.c | 7 +++++++
mm/mmap.c | 6 ++++++
5 files changed, 18 insertions(+), 2 deletions(-)
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/hugetlb.c | 75 +++++++++++++++++++++++++++++++++++-----------------
1 file changed, 51 insertions(+), 24 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e4a28ce0667f..1b67bf341c32 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1580,9 +1580,37 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
unsigned int order) { }
#endif
+static inline void __clear_hugetlb_destructor(struct hstate *h,
+ struct folio *folio)
+{
+ lockdep_assert_held(&hugetlb_lock);
+
+ /*
+ * Very subtle
+ *
+ * For non-gigantic pages set the destructor to the normal compound
+ * page dtor. This is needed in case someone takes an additional
+ * temporary ref to the page, and freeing is delayed until they drop
+ * their reference.
+ *
+ * For gigantic pages set the destructor to the null dtor. This
+ * destructor will never be called. Before freeing the gigantic
+ * page destroy_compound_gigantic_folio will turn the folio into a
+ * simple group of pages. After this the destructor does not
+ * apply.
+ *
+ */
+ if (hstate_is_gigantic(h))
+ folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
+ else
+ folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+}
+
/*
- * Remove hugetlb folio from lists, and update dtor so that the folio appears
- * as just a compound page.
+ * Remove hugetlb folio from lists.
+ * If vmemmap exists for the folio, update dtor so that the folio appears
+ * as just a compound page. Otherwise, wait until after allocating vmemmap
+ * to update dtor.
*
* A reference is held on the folio, except in the case of demote.
*
@@ -1613,31 +1641,19 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
}
/*
- * Very subtle
- *
- * For non-gigantic pages set the destructor to the normal compound
- * page dtor. This is needed in case someone takes an additional
- * temporary ref to the page, and freeing is delayed until they drop
- * their reference.
- *
- * For gigantic pages set the destructor to the null dtor. This
- * destructor will never be called. Before freeing the gigantic
- * page destroy_compound_gigantic_folio will turn the folio into a
- * simple group of pages. After this the destructor does not
- * apply.
- *
- * This handles the case where more than one ref is held when and
- * after update_and_free_hugetlb_folio is called.
- *
- * In the case of demote we do not ref count the page as it will soon
- * be turned into a page of smaller size.
+ * We can only clear the hugetlb destructor after allocating vmemmap
+ * pages. Otherwise, someone (memory error handling) may try to write
+ * to tail struct pages.
+ */
+ if (!folio_test_hugetlb_vmemmap_optimized(folio))
+ __clear_hugetlb_destructor(h, folio);
+
+ /*
+ * In the case of demote we do not ref count the page as it will soon
+ * be turned into a page of smaller size.
*/
if (!demote)
folio_ref_unfreeze(folio, 1);
- if (hstate_is_gigantic(h))
- folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
- else
- folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
h->nr_huge_pages--;
h->nr_huge_pages_node[nid]--;
@@ -1706,6 +1722,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
{
int i;
struct page *subpage;
+ bool clear_dtor = folio_test_hugetlb_vmemmap_optimized(folio);
if (hstate_is_gigantic(h) && !gigantic_page_runtime_supported())
return;
@@ -1736,6 +1753,16 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
if (unlikely(folio_test_hwpoison(folio)))
folio_clear_hugetlb_hwpoison(folio);
+ /*
+ * If vmemmap pages were allocated above, then we need to clear the
+ * hugetlb destructor under the hugetlb lock.
+ */
+ if (clear_dtor) {
+ spin_lock_irq(&hugetlb_lock);
+ __clear_hugetlb_destructor(h, folio);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
for (i = 0; i < pages_per_huge_page(h); i++) {
subpage = folio_page(folio, i);
subpage->flags &= ~(1 << PG_locked | 1 << PG_error |
--
2.41.0
Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>
We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.
With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.
Signed-off-by: Mohamed Khalfella <mkhalfella(a)purestorage.com>
Cc: stable(a)vger.kernel.org
---
kernel/trace/trace_events_hist.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index b97d3ad832f1..c8c61381eba4 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -6663,13 +6663,15 @@ static int event_hist_trigger_parse(struct event_command *cmd_ops,
if (get_named_trigger_data(trigger_data))
goto enable;
- if (has_hist_vars(hist_data))
- save_hist_vars(hist_data);
-
ret = create_actions(hist_data);
if (ret)
goto out_unreg;
+ if (has_hist_vars(hist_data) || hist_data->n_var_refs) {
+ if (save_hist_vars(hist_data))
+ goto out_unreg;
+ }
+
ret = tracing_map_init(hist_data->map);
if (ret)
goto out_unreg;
--
2.34.1
AutoRetry has been found to cause issues in this controller revision.
This feature allows the controller to send non-terminating/burst retry
ACKs (Retry=1 and Nump!=0) as opposed to terminating retry ACKs
(Retry=1 and Nump=0) to devices when a transaction error occurs.
Unfortunately, some USB devices do not retry transfers when
the controller sends them a non-terminating retry ACK. After the transfer
times out, the xHCI driver tries to resume normal operation of the
controller by sending a Stop Endpoint command to it. However, this
revision of the controller fails to respond to that command in this
state and the xHCI driver therefore gives up. This is reported via dmesg:
[sda] tag#29 uas_eh_abort_handler 0 uas-tag 1 inflight: CMD IN
[sda] tag#29 CDB: opcode=0x28 28 00 00 69 42 80 00 00 48 00
xhci-hcd: xHCI host not responding to stop endpoint command
xhci-hcd: xHCI host controller not responding, assume dead
xhci-hcd: HC died; cleaning up
This problem has been observed on Odroid HC2. This board has
an integrated JMS578 USB3-to-SATA bridge. The above problem could be
triggered by starting a read-heavy workload on an attached SSD.
After a while, the host controller would die and the SSD would disappear
from the system.
Reverting b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in
the controller") stopped the issue from occurring on Odroid HC2.
As this problem hasn't been reported on other devices, disable
AutoRetry just for the dwc_usb3 revision v2.00a that the board SoC
(Exynos 5422) uses.
Fixes: b138e23d3dff ("usb: dwc3: core: Enable AutoRetry feature in the controller")
Link: https://lore.kernel.org/r/a21f34c04632d250cd0a78c7c6f4a1c9c7a43142.camel@gm…
Link: https://forum.odroid.com/viewtopic.php?t=42630
Cc: stable(a)vger.kernel.org
Cc: Mauro Ribeiro <mauro.ribeiro(a)hardkernel.com>
Cc: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
Suggested-by: Thinh Nguyen <Thinh.Nguyen(a)synopsys.com>
Signed-off-by: Jakub Vanek <linuxtardis(a)gmail.com>
---
drivers/usb/dwc3/core.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/drivers/usb/dwc3/core.c b/drivers/usb/dwc3/core.c
index d68958e151a7..d742fc882d7e 100644
--- a/drivers/usb/dwc3/core.c
+++ b/drivers/usb/dwc3/core.c
@@ -1218,9 +1218,15 @@ static int dwc3_core_init(struct dwc3 *dwc)
* Host mode on seeing transaction errors(CRC errors or internal
* overrun scenerios) on IN transfers to reply to the device
* with a non-terminating retry ACK (i.e, an ACK transcation
- * packet with Retry=1 & Nump != 0)
+ * packet with Retry=1 & Nump != 0).
+ *
+ * Do not enable this for DWC_usb3 v2.00a. This controller
+ * revision stops responding to Stop Endpoint commands if
+ * a USB device does not retry after a non-terminating retry
+ * ACK is sent to it.
*/
- reg |= DWC3_GUCTL_HSTINAUTORETRY;
+ if (!DWC3_VER_IS(DWC3, 200A))
+ reg |= DWC3_GUCTL_HSTINAUTORETRY;
dwc3_writel(dwc->regs, DWC3_GUCTL, reg);
}
--
2.25.1
When allocating the 2D array for handling IRQ type registers in
regmap_add_irq_chip_fwnode(), the intent is to allocate a matrix
with num_config_bases rows and num_config_regs columns.
This is currently handled by allocating a buffer to hold a pointer for
each row (i.e. num_config_bases). After that, the logic attempts to
allocate the memory required to hold the register configuration for
each row. However, instead of doing this allocation for each row
(i.e. num_config_bases allocations), the logic erroneously does this
allocation num_config_regs number of times.
This scenario can lead to out-of-bounds accesses when num_config_regs
is greater than num_config_bases. Fix this by updating the terminating
condition of the loop that allocates the memory for holding the register
configuration to allocate memory only for each row in the matrix.
Amit Pundir reported a crash that was occurring on his db845c device
due to memory corruption (see "Closes" tag for Amit's report). The KASAN
report below helped narrow it down to this issue:
[ 14.033877][ T1] ==================================================================
[ 14.042507][ T1] BUG: KASAN: invalid-access in regmap_add_irq_chip_fwnode+0x594/0x1364
[ 14.050796][ T1] Write of size 8 at addr 06ffff8081021850 by task init/1
[ 14.057841][ T1] Pointer tag: [06], memory tag: [fe]
[ 14.063124][ T1]
[ 14.065349][ T1] CPU: 2 PID: 1 Comm: init Tainted: G W E 6.4.0-mainline-g6a4b67fef3e2 #1
[ 14.075014][ T1] Hardware name: Thundercomm Dragonboard 845c (DT)
[ 14.081432][ T1] Call trace:
[ 14.084618][ T1] dump_backtrace+0xe8/0x108
[ 14.089144][ T1] show_stack+0x18/0x30
[ 14.093215][ T1] dump_stack_lvl+0x50/0x6c
[ 14.097642][ T1] print_report+0x178/0x4c0
[ 14.102070][ T1] kasan_report+0xd4/0x12c
[ 14.106407][ T1] kasan_tag_mismatch+0x28/0x40
[ 14.111178][ T1] __hwasan_tag_mismatch+0x2c/0x5c
[ 14.116222][ T1] regmap_add_irq_chip_fwnode+0x594/0x1364
[ 14.121961][ T1] devm_regmap_add_irq_chip+0xb8/0x144
[ 14.127346][ T1] wcd934x_slim_status+0x210/0x28c [wcd934x]
[ 14.133307][ T1] slim_device_alloc_laddr+0x1ac/0x1ec [slimbus]
[ 14.139669][ T1] slim_device_probe+0x80/0x124 [slimbus]
[ 14.145394][ T1] really_probe+0x250/0x4d8
[ 14.149826][ T1] __driver_probe_device+0x104/0x1ac
[ 14.155041][ T1] driver_probe_device+0x80/0x218
[ 14.159990][ T1] __driver_attach+0x19c/0x2e4
[ 14.164678][ T1] bus_for_each_dev+0x158/0x1b4
[ 14.169454][ T1] driver_attach+0x34/0x44
[ 14.173790][ T1] bus_add_driver+0x1fc/0x328
[ 14.178390][ T1] driver_register+0xdc/0x1b4
[ 14.182995][ T1] __slim_driver_register+0x6c/0x84 [slimbus]
[ 14.189068][ T1] init_module+0x20/0xfe4 [wcd934x]
[ 14.194219][ T1] do_one_initcall+0x110/0x418
[ 14.198916][ T1] do_init_module+0x124/0x30c
[ 14.203521][ T1] load_module+0x1938/0x1ab0
[ 14.208034][ T1] __arm64_sys_finit_module+0x110/0x138
[ 14.213509][ T1] invoke_syscall+0x70/0x170
[ 14.218015][ T1] el0_svc_common+0xf0/0x138
[ 14.222523][ T1] do_el0_svc+0x40/0xb8
[ 14.226596][ T1] el0_svc+0x2c/0x78
[ 14.230405][ T1] el0t_64_sync_handler+0x68/0xb4
[ 14.235354][ T1] el0t_64_sync+0x19c/0x1a0
[ 14.239778][ T1]
[ 14.242004][ T1] The buggy address belongs to the object at ffffff8081021850
[ 14.242004][ T1] which belongs to the cache kmalloc-8 of size 8
[ 14.255669][ T1] The buggy address is located 0 bytes inside of
[ 14.255669][ T1] 8-byte region [ffffff8081021850, ffffff8081021858)
[ 14.255685][ T1]
[ 14.255689][ T1] The buggy address belongs to the physical page:
[ 14.255699][ T1] page:0000000080887a30 refcount:1 mapcount:0 mapping:0000000000000000 index:0x85ffff8081021ee0 pfn:0x101021
[ 14.275062][ T1] flags: 0x4000000000000200(slab|zone=1|kasantag=0x0)
[ 14.275078][ T1] page_type: 0xffffffff()
[ 14.275091][ T1] raw: 4000000000000200 49ffff8080002200 dead000000000122 0000000000000000
[ 14.275103][ T1] raw: 85ffff8081021ee0 00000000810000ea 00000001ffffffff 0000000000000000
[ 14.275110][ T1] page dumped because: kasan: bad access detected
[ 14.275116][ T1]
[ 14.275119][ T1] Memory state around the buggy address:
[ 14.275125][ T1] ffffff8081021600: fe fe fe 9a fe fe 5b 3c fe fe c9 fe b4 3f fe 54
[ 14.275133][ T1] ffffff8081021700: fe fe ad 6b fe fe fe 87 fe fe 39 c9 fe 03 fe ea
[ 14.275143][ T1] >ffffff8081021800: fe fe e1 fe 06 fe fe 21 fe fe e7 fe de fe fe 70
[ 14.275149][ T1] ^
[ 14.371674][ T1] ffffff8081021900: d7 fe fe 87 fe a0 fe fe fe 80 e0 f0 05 fe fe fe
[ 14.379667][ T1] ffffff8081021a00: 94 fe 31 fe fe e5 c8 00 d0 fe a1 fe fe e2 e5 fe
[ 14.387664][ T1] ==================================================================
Fixes: faa87ce9196d ("regmap-irq: Introduce config registers for irq types")
Reported-by: Amit Pundir <amit.pundir(a)linaro.org>
Closes: https://lore.kernel.org/all/CAMi1Hd04mu6JojT3y6wyN2YeVkPR5R3qnkKJ8iR8if_YBy…
Tested-by: John Stultz <jstultz(a)google.com>
Cc: stable(a)vger.kernel.org # v6.0+
Cc: Aidan MacDonald <aidanmacdonald.0x0(a)gmail.com>
Cc: Saravana Kannan <saravanak(a)google.com>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Signed-off-by: Isaac J. Manjarres <isaacmanjarres(a)google.com>
---
drivers/base/regmap/regmap-irq.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/base/regmap/regmap-irq.c b/drivers/base/regmap/regmap-irq.c
index ced0dcf86e0b..45fd13ef13fc 100644
--- a/drivers/base/regmap/regmap-irq.c
+++ b/drivers/base/regmap/regmap-irq.c
@@ -717,7 +717,7 @@ int regmap_add_irq_chip_fwnode(struct fwnode_handle *fwnode,
if (!d->config_buf)
goto err_alloc;
- for (i = 0; i < chip->num_config_regs; i++) {
+ for (i = 0; i < chip->num_config_bases; i++) {
d->config_buf[i] = kcalloc(chip->num_config_regs,
sizeof(**d->config_buf),
GFP_KERNEL);
--
2.41.0.255.g8b1d071c50-goog
The Winbond W25Q128 (actual vendor name W25Q128JV)
has exactly the same flags as the sibling device
w25q128fw. The devices both require unlocking and
support dual and quad SPI transport.
The actual product naming between devices:
0xef4018: "w25q128" W25Q128JV-IM/JM
0xef7018: "w25q128fw" W25Q128JV-IN/IQ/JQ
The latter device, "w25q128fw" supports features
named DTQ and QPI, otherwise it is the same.
Not having the right flags has the annoying side
effect that write access does not work.
After this patch I can write to the flash on the
Inteno XG6846 router.
Cc: stable(a)vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
---
drivers/mtd/spi-nor/winbond.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/mtd/spi-nor/winbond.c b/drivers/mtd/spi-nor/winbond.c
index 834d6ba5ce70..a67e1d4206f3 100644
--- a/drivers/mtd/spi-nor/winbond.c
+++ b/drivers/mtd/spi-nor/winbond.c
@@ -121,7 +121,9 @@ static const struct flash_info winbond_nor_parts[] = {
{ "w25q80bl", INFO(0xef4014, 0, 64 * 1024, 16)
NO_SFDP_FLAGS(SECT_4K) },
{ "w25q128", INFO(0xef4018, 0, 64 * 1024, 256)
- NO_SFDP_FLAGS(SECT_4K) },
+ FLAGS(SPI_NOR_HAS_LOCK | SPI_NOR_HAS_TB)
+ NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ |
+ SPI_NOR_QUAD_READ) },
{ "w25q256", INFO(0xef4019, 0, 64 * 1024, 512)
NO_SFDP_FLAGS(SECT_4K | SPI_NOR_DUAL_READ | SPI_NOR_QUAD_READ)
.fixups = &w25q256_fixups },
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230711-spi-nor-winbond-w25q128-321a602ee267
Best regards,
--
Linus Walleij <linus.walleij(a)linaro.org>
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
---
arch/arm64/include/asm/kvm_pgtable.h | 26 ++++++------------
arch/arm64/kvm/hyp/pgtable.c | 41 ++++++++++++++++++++++------
arch/arm64/kvm/mmu.c | 18 ++++++------
3 files changed, 49 insertions(+), 36 deletions(-)
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index dc3c072e862f..75f437e8cd15 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -556,22 +556,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -593,18 +597,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 5282cb9ca4cf..5d701e9adf5c 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1153,25 +1153,48 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 3b9d4d24c361..8a7e9381710e 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1639,27 +1639,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
base-commit: 44c026a73be8038f03dbdeef028b642880cf1511
--
2.41.0.178.g377b9f9a00-goog
The patch titled
Subject: maple_tree: fix node allocation testing on 32 bit
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-fix-node-allocation-testing-on-32-bit.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: fix node allocation testing on 32 bit
Date: Wed, 12 Jul 2023 13:39:16 -0400
Internal node counting was altered and the 64 bit test was updated,
however the 32bit test was missed.
Restore the 32bit test to a functional state.
Link: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg…
Link: https://lkml.kernel.org/r/20230712173916.168805-2-Liam.Howlett@oracle.com
Fixes: 541e06b772c1 ("maple_tree: remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/radix-tree/maple.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/tools/testing/radix-tree/maple.c~maple_tree-fix-node-allocation-testing-on-32-bit
+++ a/tools/testing/radix-tree/maple.c
@@ -206,9 +206,9 @@ static noinline void __init check_new_no
e = i - 1;
} else {
if (i >= 4)
- e = i - 4;
- else if (i == 3)
- e = i - 2;
+ e = i - 3;
+ else if (i >= 1)
+ e = i - 1;
else
e = 0;
}
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
mm-mlock-fix-vma-iterator-conversion-of-apply_vma_lock_flags.patch
maple_tree-fix-32-bit-mas_next-testing.patch
maple_tree-fix-node-allocation-testing-on-32-bit.patch
The patch titled
Subject: maple_tree: fix 32 bit mas_next testing
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
maple_tree-fix-32-bit-mas_next-testing.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: fix 32 bit mas_next testing
Date: Wed, 12 Jul 2023 13:39:15 -0400
The test setup of mas_next is dependent on node entry size to create a 2
level tree, but the tests did not account for this in the expected value
when shifting beyond the scope of the tree.
Fix this by setting up the test to succeed depending on the node entries
which is dependent on the 32/64 bit setup.
Link: https://lkml.kernel.org/r/20230712173916.168805-1-Liam.Howlett@oracle.com
Fixes: 120b116208a0 ("maple_tree: reorganize testing to restore module testing")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Reported-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
Link: https://lore.kernel.org/linux-mm/CAMuHMdV4T53fOw7VPoBgPR7fP6RYqf=CBhD_y_vOg…
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/test_maple_tree.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
--- a/lib/test_maple_tree.c~maple_tree-fix-32-bit-mas_next-testing
+++ a/lib/test_maple_tree.c
@@ -1898,13 +1898,16 @@ static noinline void __init next_prev_te
725};
static const unsigned long level2_32[] = { 1747, 2000, 1750, 1755,
1760, 1765};
+ unsigned long last_index;
if (MAPLE_32BIT) {
nr_entries = 500;
level2 = level2_32;
+ last_index = 0x138e;
} else {
nr_entries = 200;
level2 = level2_64;
+ last_index = 0x7d6;
}
for (i = 0; i <= nr_entries; i++)
@@ -2011,7 +2014,7 @@ static noinline void __init next_prev_te
val = mas_next(&mas, ULONG_MAX);
MT_BUG_ON(mt, val != NULL);
- MT_BUG_ON(mt, mas.index != 0x7d6);
+ MT_BUG_ON(mt, mas.index != last_index);
MT_BUG_ON(mt, mas.last != ULONG_MAX);
val = mas_prev(&mas, 0);
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
mm-mlock-fix-vma-iterator-conversion-of-apply_vma_lock_flags.patch
maple_tree-fix-32-bit-mas_next-testing.patch
maple_tree-fix-node-allocation-testing-on-32-bit.patch
The patch titled
Subject: selftests/mm: mkdirty: Fix incorrect position of #endif
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-mm-mkdirty-fix-incorrect-position-of-endif.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Colin Ian King <colin.i.king(a)gmail.com>
Subject: selftests/mm: mkdirty: Fix incorrect position of #endif
Date: Wed, 12 Jul 2023 14:46:48 +0100
The #endif is the wrong side of a } causing a build failure when
__NR_userfaultfd is not defined. Fix this by moving the #end to enclose
the }
Link: https://lkml.kernel.org/r/20230712134648.456349-1-colin.i.king@gmail.com
Fixes: 9eac40fc0cc7 ("selftests/mm: mkdirty: test behavior of (pte|pmd)_mkdirty on VMAs without write permissions")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/mkdirty.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/mm/mkdirty.c~selftests-mm-mkdirty-fix-incorrect-position-of-endif
+++ a/tools/testing/selftests/mm/mkdirty.c
@@ -321,8 +321,8 @@ close_uffd:
munmap:
munmap(dst, pagesize);
free(src);
-#endif /* __NR_userfaultfd */
}
+#endif /* __NR_userfaultfd */
int main(void)
{
_
Patches currently in -mm which might be from colin.i.king(a)gmail.com are
selftests-mm-mkdirty-fix-incorrect-position-of-endif.patch
Changing the mode of symlinks is meaningless as the vfs doesn't take the
mode of a symlink into account during path lookup permission checking.
However, the vfs doesn't block mode changes on symlinks. This however,
has lead to an untenable mess roughly classifiable into the following
two categories:
(1) Filesystems that don't implement a i_op->setattr() for symlinks.
Such filesystems may or may not know that without i_op->setattr()
defined, notify_change() falls back to simple_setattr() causing the
inode's mode in the inode cache to be changed.
That's a generic issue as this will affect all non-size changing
inode attributes including ownership changes.
Example: afs
(2) Filesystems that fail with EOPNOTSUPP but change the mode of the
symlink nonetheless.
Some filesystems will happily update the mode of a symlink but still
return EOPNOTSUPP. This is the biggest source of confusion for
userspace.
The EOPNOTSUPP in this case comes from POSIX ACLs. Specifically it
comes from filesystems that call posix_acl_chmod(), e.g., btrfs via
if (!err && attr->ia_valid & ATTR_MODE)
err = posix_acl_chmod(idmap, dentry, inode->i_mode);
Filesystems including btrfs don't implement i_op->set_acl() so
posix_acl_chmod() will report EOPNOTSUPP.
When posix_acl_chmod() is called, most filesystems will have
finished updating the inode.
Perversely, this has the consequences that this behavior may depend
on two kconfig options and mount options:
* CONFIG_POSIX_ACL={y,n}
* CONFIG_${FSTYPE}_POSIX_ACL={y,n}
* Opt_acl, Opt_noacl
Example: btrfs, ext4, xfs
The only way to change the mode on a symlink currently involves abusing
an O_PATH file descriptor in the following manner:
fd = openat(-1, "/path/to/link", O_CLOEXEC | O_PATH | O_NOFOLLOW);
char path[PATH_MAX];
snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
chmod(path, 0000);
But for most major filesystems with POSIX ACL support such as btrfs,
ext4, ceph, tmpfs, xfs and others this will fail with EOPNOTSUPP with
the mode still updated due to the aforementioned posix_acl_chmod()
nonsense.
So, given that for all major filesystems this would fail with EOPNOTSUPP
and that both glibc (cf. [1]) and musl (cf. [2]) outright block mode
changes on symlinks we should just try and block mode changes on
symlinks directly in the vfs and have a clean break with this nonsense.
If this causes any regressions, we do the next best thing and fix up all
filesystems that do return EOPNOTSUPP with the mode updated to not call
posix_acl_chmod() on symlinks.
But as usual, let's try the clean cut solution first. It's a simple
patch that can be easily reverted. Not marking this for backport as I'll
do that manually if we're reasonably sure that this works and there are
no strong objections.
We could block this in chmod_common() but it's more appropriate to do it
notify_change() as it will also mean that we catch filesystems that
change symlink permissions explicitly or accidently.
Similar proposals were floated in the past as in [3] and [4] and again
recently in [5]. There's also a couple of bugs about this inconsistency
as in [6] and [7].
Link: https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/unix/sysv/linux/fc… [1]
Link: https://git.musl-libc.org/cgit/musl/tree/src/stat/fchmodat.c [2]
Link: https://lore.kernel.org/all/20200911065733.GA31579@infradead.org [3]
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00518.html [4]
Link: https://lore.kernel.org/lkml/87lefmbppo.fsf@oldenburg.str.redhat.com [5]
Link: https://sourceware.org/legacy-ml/libc-alpha/2020-02/msg00467.html [6]
Link: https://sourceware.org/bugzilla/show_bug.cgi?id=14578#c17 [7]
Cc: stable(a)vger.kernel.org # no backport before v6.6-rc2 is tagged
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Suggested-by: Florian Weimer <fweimer(a)redhat.com>
Reviewed-by: Aleksa Sarai <cyphar(a)cyphar.com>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
---
---
fs/attr.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/fs/attr.c b/fs/attr.c
index d60dc1edb526..529153ecde25 100644
--- a/fs/attr.c
+++ b/fs/attr.c
@@ -394,6 +394,9 @@ int notify_change(struct mnt_idmap *idmap, struct dentry *dentry,
return error;
if ((ia_valid & ATTR_MODE)) {
+ if (S_ISLNK(inode->i_mode))
+ return -EOPNOTSUPP;
+
umode_t amode = attr->ia_mode;
/* Flag setting protected by i_mutex */
if (is_sxid(amode))
---
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
change-id: 20230712-vfs-chmod-symlinks-a3ad1923a992
From: Zhikai Zhai <zhikai.zhai(a)amd.com>
[WHY]
All of pipes will be used when the MPC split enable on the dcn
which just has 2 pipes. Then MPO enter will trigger the minimal
transition which need programe dcn from 2 pipes MPC split to 2
pipes MPO. This action will cause lag if happen frequently.
[HOW]
Disable the MPC split for the platform which dcn resource is limited
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Zhikai Zhai <zhikai.zhai(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
index 45956ef6f3f9..131b8b82afc0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
@@ -65,7 +65,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.timing_trace = false,
.clock_trace = true,
.disable_pplib_clock_request = true,
- .pipe_split_policy = MPC_SPLIT_DYNAMIC,
+ .pipe_split_policy = MPC_SPLIT_AVOID,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
--
2.34.1
From: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
[Why & How]
Port of a change that went into DCN314 to keep the PHY enabled
when we have a connected and active DP display.
The PHY can hang if PHY refclk is disabled inadvertently.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
index 7ccd96959256..3db4ef564b99 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
@@ -87,6 +87,11 @@ static int dcn31_get_active_display_cnt_wa(
stream->signal == SIGNAL_TYPE_DVI_SINGLE_LINK ||
stream->signal == SIGNAL_TYPE_DVI_DUAL_LINK)
tmds_present = true;
+
+ /* Checking stream / link detection ensuring that PHY is active*/
+ if (dc_is_dp_signal(stream->signal) && !stream->dpms_off)
+ display_count++;
+
}
for (i = 0; i < dc->link_count; i++) {
--
2.34.1
From: Daniel Miess <daniel.miess(a)amd.com>
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.
[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Daniel Miess <daniel.miess(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index d9e049e7ff0a..ed8ddb75b333 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -295,7 +295,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
pipe = &res_ctx->pipe_ctx[i];
timing = &pipe->stream->timing;
- pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ if (pipe->stream->adjust.v_total_min != 0)
+ pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ else
+ pipes[pipe_cnt].pipe.dest.vtotal = timing->v_total;
+
pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
--
2.34.1
From: Zhikai Zhai <zhikai.zhai(a)amd.com>
[WHY]
All of pipes will be used when the MPC split enable on the dcn
which just has 2 pipes. Then MPO enter will trigger the minimal
transition which need programe dcn from 2 pipes MPC split to 2
pipes MPO. This action will cause lag if happen frequently.
[HOW]
Disable the MPC split for the platform which dcn resource is limited
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Alvin Lee <alvin.lee2(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Zhikai Zhai <zhikai.zhai(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
index 45956ef6f3f9..131b8b82afc0 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c
@@ -65,7 +65,7 @@ static const struct dc_debug_options debug_defaults_drv = {
.timing_trace = false,
.clock_trace = true,
.disable_pplib_clock_request = true,
- .pipe_split_policy = MPC_SPLIT_DYNAMIC,
+ .pipe_split_policy = MPC_SPLIT_AVOID,
.force_single_disp_pipe_split = false,
.disable_dcc = DCC_ENABLE,
.vsr_support = true,
--
2.34.1
From: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
[Why & How]
Port of a change that went into DCN314 to keep the PHY enabled
when we have a connected and active DP display.
The PHY can hang if PHY refclk is disabled inadvertently.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Josip Pavic <josip.pavic(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
index 7ccd96959256..3db4ef564b99 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn31/dcn31_clk_mgr.c
@@ -87,6 +87,11 @@ static int dcn31_get_active_display_cnt_wa(
stream->signal == SIGNAL_TYPE_DVI_SINGLE_LINK ||
stream->signal == SIGNAL_TYPE_DVI_DUAL_LINK)
tmds_present = true;
+
+ /* Checking stream / link detection ensuring that PHY is active*/
+ if (dc_is_dp_signal(stream->signal) && !stream->dpms_off)
+ display_count++;
+
}
for (i = 0; i < dc->link_count; i++) {
--
2.34.1
From: Daniel Miess <daniel.miess(a)amd.com>
[Why]
In dcn314 DML the destination pipe vtotal was being set
to the crtc adjustment vtotal_min value even in cases
where that value is 0.
[How]
Only set vtotal to the crtc adjustment vtotal_min value
in cases where the value is non-zero.
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas(a)amd.com>
Acked-by: Alan Liu <haoping.liu(a)amd.com>
Signed-off-by: Daniel Miess <daniel.miess(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
index d9e049e7ff0a..ed8ddb75b333 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn314/dcn314_fpu.c
@@ -295,7 +295,11 @@ int dcn314_populate_dml_pipes_from_context_fpu(struct dc *dc, struct dc_state *c
pipe = &res_ctx->pipe_ctx[i];
timing = &pipe->stream->timing;
- pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ if (pipe->stream->adjust.v_total_min != 0)
+ pipes[pipe_cnt].pipe.dest.vtotal = pipe->stream->adjust.v_total_min;
+ else
+ pipes[pipe_cnt].pipe.dest.vtotal = timing->v_total;
+
pipes[pipe_cnt].pipe.dest.vblank_nom = timing->v_total - pipes[pipe_cnt].pipe.dest.vactive;
pipes[pipe_cnt].pipe.dest.vblank_nom = min(pipes[pipe_cnt].pipe.dest.vblank_nom, dcn3_14_ip.VBlankNomDefaultUS);
pipes[pipe_cnt].pipe.dest.vblank_nom = max(pipes[pipe_cnt].pipe.dest.vblank_nom, timing->v_sync_width);
--
2.34.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 430635a0ef1ce958b7b4311f172694ece2c692b8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050813-twiddling-olive-0fd5@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
430635a0ef1c ("perf intel-pt: Fix CYC timestamps after standalone CBR")
3f05516758be ("perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packets")
f3c98c4b5ac8 ("perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip")
7b4b4f83881e ("perf intel-pt: Accumulate cycle count from CYC packets")
948e9dc8bb26 ("perf intel-pt: Factor out intel_pt_update_sample_time")
1b6599a9d8e6 ("perf intel-pt: Fix sample timestamp wrt non-taken branches")
bea6385789b8 ("perf intel-pt: Implement decoder flags for trace begin / end")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 430635a0ef1ce958b7b4311f172694ece2c692b8 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Mon, 3 Apr 2023 18:48:31 +0300
Subject: [PATCH] perf intel-pt: Fix CYC timestamps after standalone CBR
After a standalone CBR (not associated with TSC), update the cycles
reference timestamp and reset the cycle count, so that CYC timestamps
are calculated relative to that point with the new frequency.
Fixes: cc33618619cefc6d ("perf tools: Add Intel PT support for decoding CYC packets")
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 0ac860c8dd2b..7145c5890de0 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1998,6 +1998,8 @@ static void intel_pt_calc_cbr(struct intel_pt_decoder *decoder)
decoder->cbr = cbr;
decoder->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+ decoder->cyc_ref_timestamp = decoder->timestamp;
+ decoder->cycle_cnt = 0;
intel_pt_mtc_cyc_cnt_cbr(decoder);
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 430635a0ef1ce958b7b4311f172694ece2c692b8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023050812-swaddling-stardust-e90d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
430635a0ef1c ("perf intel-pt: Fix CYC timestamps after standalone CBR")
3f05516758be ("perf intel-pt: Accumulate cycle count from TSC/TMA/MTC packets")
f3c98c4b5ac8 ("perf intel-pt: Re-factor TIP cases in intel_pt_walk_to_ip")
7b4b4f83881e ("perf intel-pt: Accumulate cycle count from CYC packets")
948e9dc8bb26 ("perf intel-pt: Factor out intel_pt_update_sample_time")
1b6599a9d8e6 ("perf intel-pt: Fix sample timestamp wrt non-taken branches")
bea6385789b8 ("perf intel-pt: Implement decoder flags for trace begin / end")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 430635a0ef1ce958b7b4311f172694ece2c692b8 Mon Sep 17 00:00:00 2001
From: Adrian Hunter <adrian.hunter(a)intel.com>
Date: Mon, 3 Apr 2023 18:48:31 +0300
Subject: [PATCH] perf intel-pt: Fix CYC timestamps after standalone CBR
After a standalone CBR (not associated with TSC), update the cycles
reference timestamp and reset the cycle count, so that CYC timestamps
are calculated relative to that point with the new frequency.
Fixes: cc33618619cefc6d ("perf tools: Add Intel PT support for decoding CYC packets")
Signed-off-by: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Adrian Hunter <adrian.hunter(a)intel.com>
Cc: Ian Rogers <irogers(a)google.com>
Cc: Jiri Olsa <jolsa(a)kernel.org>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme(a)redhat.com>
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 0ac860c8dd2b..7145c5890de0 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1998,6 +1998,8 @@ static void intel_pt_calc_cbr(struct intel_pt_decoder *decoder)
decoder->cbr = cbr;
decoder->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+ decoder->cyc_ref_timestamp = decoder->timestamp;
+ decoder->cycle_cnt = 0;
intel_pt_mtc_cyc_cnt_cbr(decoder);
}
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit a24c1aab652ebacf9ea62470a166514174c96fe1 ]
The rcu_data structure's ->rcu_cpu_has_work field can be modified by
any CPU attempting to wake up the rcuc kthread. Therefore, this commit
marks accesses to this field from the rcu_cpu_kthread() function.
This data race was reported by KCSAN. Not appropriate for backporting
due to failure being unlikely.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/rcu/tree.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index eec8e2f7537eb..b2c1ab260ed56 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -2810,12 +2810,12 @@ static void rcu_cpu_kthread(unsigned int cpu)
*statusp = RCU_KTHREAD_RUNNING;
local_irq_disable();
work = *workp;
- *workp = 0;
+ WRITE_ONCE(*workp, 0);
local_irq_enable();
if (work)
rcu_core();
local_bh_enable();
- if (*workp == 0) {
+ if (!READ_ONCE(*workp)) {
trace_rcu_utilization(TPS("End CPU kthread@rcu_wait"));
*statusp = RCU_KTHREAD_WAITING;
return;
--
2.39.2
apply_vma_lock_flags() calls mlock_fixup(), which could merge the VMA
after where the vma iterator is located. Although this is not an issue,
the next iteration of the loop will check the start of the vma to be
equal to the locally saved 'tmp' variable and cause an incorrect failure
scenario. Fix the error by setting tmp to the end of the vma iterator
value before restarting the loop.
There is also a potential of the error code being overwritten when the
loop terminates early. Fix the return issue by directly returning when
an error is encountered since there is nothing to undo after the loop.
Reported-by: Ryan Roberts <ryan.roberts(a)arm.com>
Link: https://lore.kernel.org/linux-mm/50341ca1-d582-b33a-e3d0-acb08a65166f@arm.c…
Cc: <stable(a)vger.kernel.org>
Fixes: 37598f5a9d8b ("mlock: convert mlock to vma iterator")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
---
mm/mlock.c | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/mm/mlock.c b/mm/mlock.c
index d7db94519884..0a0c996c5c21 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -477,7 +477,6 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
{
unsigned long nstart, end, tmp;
struct vm_area_struct *vma, *prev;
- int error;
VMA_ITERATOR(vmi, current->mm, start);
VM_BUG_ON(offset_in_page(start));
@@ -498,6 +497,7 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
nstart = start;
tmp = vma->vm_start;
for_each_vma_range(vmi, vma, end) {
+ int error;
vm_flags_t newflags;
if (vma->vm_start != tmp)
@@ -511,14 +511,15 @@ static int apply_vma_lock_flags(unsigned long start, size_t len,
tmp = end;
error = mlock_fixup(&vmi, vma, &prev, nstart, tmp, newflags);
if (error)
- break;
+ return error;
+ tmp = vma_iter_end(&vmi);
nstart = tmp;
}
- if (vma_iter_end(&vmi) < end)
+ if (tmp < end)
return -ENOMEM;
- return error;
+ return 0;
}
/*
--
2.39.2
In below thousands of screen rotation loop tests with virtual display
enabled, a CPU hard lockup issue may happen, leading system to unresponsive
and crash.
do {
xrandr --output Virtual --rotate inverted
xrandr --output Virtual --rotate right
xrandr --output Virtual --rotate left
xrandr --output Virtual --rotate normal
} while (1);
NMI watchdog: Watchdog detected hard LOCKUP on cpu 1
? hrtimer_run_softirq+0x140/0x140
? store_vblank+0xe0/0xe0 [drm]
hrtimer_cancel+0x15/0x30
amdgpu_vkms_disable_vblank+0x15/0x30 [amdgpu]
drm_vblank_disable_and_save+0x185/0x1f0 [drm]
drm_crtc_vblank_off+0x159/0x4c0 [drm]
? record_print_text.cold+0x11/0x11
? wait_for_completion_timeout+0x232/0x280
? drm_crtc_wait_one_vblank+0x40/0x40 [drm]
? bit_wait_io_timeout+0xe0/0xe0
? wait_for_completion_interruptible+0x1d7/0x320
? mutex_unlock+0x81/0xd0
amdgpu_vkms_crtc_atomic_disable
It's caused by a stuck in lock dependency in such scenario on different
CPUs.
CPU1 CPU2
drm_crtc_vblank_off hrtimer_interrupt
grab event_lock (irq disabled) __hrtimer_run_queues
grab vbl_lock/vblank_time_block amdgpu_vkms_vblank_simulate
amdgpu_vkms_disable_vblank drm_handle_vblank
hrtimer_cancel grab dev->event_lock
So CPU1 stucks in hrtimer_cancel as timer callback is running endless on
current clock base, as that timer queue on CPU2 has no chance to finish it
because of failing to hold the lock. So NMI watchdog will throw the errors
after its threshold, and all later CPUs are impacted/blocked.
So use hrtimer_try_to_cancel to fix this, as disable_vblank callback
does not need to wait the handler to finish. And also it's not necessary
to check the return value of hrtimer_try_to_cancel, because even if it's
-1 which means current timer callback is running, it will be reprogrammed
in hrtimer_start with calling enable_vblank to make it works.
v2: only re-arm timer when vblank is enabled (Christian) and add a Fixes
tag as well
v3: drop warn printing (Christian)
v4: drop superfluous check of blank->enabled in timer function, as it's
guaranteed in drm_handle_vblank (Christian)
Fixes: 84ec374bd580("drm/amdgpu: create amdgpu_vkms (v4)")
Cc: stable(a)vger.kernel.org
Suggested-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: Guchun Chen <guchun.chen(a)amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
index 53ff91fc6cf6..d0748bcfad16 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c
@@ -55,8 +55,9 @@ static enum hrtimer_restart amdgpu_vkms_vblank_simulate(struct hrtimer *timer)
DRM_WARN("%s: vblank timer overrun\n", __func__);
ret = drm_crtc_handle_vblank(crtc);
+ /* Don't queue timer again when vblank is disabled. */
if (!ret)
- DRM_ERROR("amdgpu_vkms failure on handling vblank");
+ return HRTIMER_NORESTART;
return HRTIMER_RESTART;
}
@@ -81,7 +82,7 @@ static void amdgpu_vkms_disable_vblank(struct drm_crtc *crtc)
{
struct amdgpu_crtc *amdgpu_crtc = to_amdgpu_crtc(crtc);
- hrtimer_cancel(&amdgpu_crtc->vblank_timer);
+ hrtimer_try_to_cancel(&amdgpu_crtc->vblank_timer);
}
static bool amdgpu_vkms_get_vblank_timestamp(struct drm_crtc *crtc,
--
2.25.1
Hi Greg, Sasha,
[ This is v2 appends patch #11 f838e0906dd3 as Florian suggested. ]
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 0854db2aaef3 ("netfilter: nf_tables: use net_generic infra for transaction data")
2) 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
3) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
4) 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic")
5) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
6) 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase")
7) 62e1e94b246e ("netfilter: nf_tables: reject unbound chain set before commit phase")
8) f8bb7889af58 ("netfilter: nftables: rename set element data activation/deactivation functions")
9) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
10) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
11) f838e0906dd3 ("netfilter: nf_tables: fix scheduling-while-atomic splat")
Note:
- Patch #1 is a backported dependency patch required by these fixes.
- Patch #2 needs a follow up fix coming in this series as patch #11.
Please, apply,
Thanks.
Florian Westphal (3):
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: nf_tables: add rescheduling points during loop detection walks
netfilter: nf_tables: fix scheduling-while-atomic splat
Pablo Neira Ayuso (8):
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: fix chain binding transaction logic
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nftables: rename set element data activation/deactivation functions
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 41 +-
include/net/netns/nftables.h | 7 -
net/netfilter/nf_tables_api.c | 692 +++++++++++++++++++++---------
net/netfilter/nf_tables_offload.c | 30 +-
net/netfilter/nft_chain_filter.c | 11 +-
net/netfilter/nft_dynset.c | 6 +-
net/netfilter/nft_immediate.c | 90 +++-
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 23 +-
net/netfilter/nft_set_pipapo.c | 14 +-
net/netfilter/nft_set_rbtree.c | 5 +-
11 files changed, 678 insertions(+), 246 deletions(-)
--
2.30.2
Dzień dobry,
jakiś czas temu zgłosiła się do nas firma, której strona internetowa nie pozycjonowała się wysoko w wyszukiwarce Google.
Na podstawie wykonanego przez nas audytu SEO zoptymalizowaliśmy treści na stronie pod kątem wcześniej opracowanych słów kluczowych. Nasz wewnętrzny system codziennie analizuje prawidłowe działanie witryny. Dzięki indywidualnej strategii, firma zdobywa coraz więcej Klientów.
Czy chcieliby Państwo zwiększyć liczbę osób odwiedzających stronę internetową firmy? Mógłbym przedstawić ofertę?
Pozdrawiam serdecznie,
Dominik Perkowski