Fixing the interpretation of the length of the B9h VPD page
(concurrent positioning ranges). Adding 4 is necessary as
the first 4 bytes of the page is the header with page number
and length information. Adding 3 was likely a misinterpretation
of the SBC-5 specification which sets all offsets starting at zero.
This fixes the error in dmesg:
[ 9.014456] sd 1:0:0:0: [sda] Invalid Concurrent Positioning Ranges VPD page
Cc: stable(a)vger.kernel.org
Fixes: e815d36548f0 ("scsi: sd: add concurrent positioning ranges support")
Signed-off-by: Tyler Erickson <tyler.erickson(a)seagate.com>
Reviewed-by: Muhammad Ahmad <muhammad.ahmad(a)seagate.com>
Tested-by: Michael English <michael.english(a)seagate.com>
---
drivers/scsi/sd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 749316462075..f25b0cc5dd21 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -3072,7 +3072,7 @@ static void sd_read_cpr(struct scsi_disk *sdkp)
goto out;
/* We must have at least a 64B header and one 32B range descriptor */
- vpd_len = get_unaligned_be16(&buffer[2]) + 3;
+ vpd_len = get_unaligned_be16(&buffer[2]) + 4;
if (vpd_len > buf_len || vpd_len < 64 + 32 || (vpd_len & 31)) {
sd_printk(KERN_ERR, sdkp,
"Invalid Concurrent Positioning Ranges VPD page\n");
--
2.17.1
From: Eric Biggers <ebiggers(a)google.com>
commit 5f41fdaea63ddf96d921ab36b2af4a90ccdb5744 upstream.
Make the test_dummy_encryption mount option require that the encrypt
feature flag be already enabled on the filesystem, rather than
automatically enabling it. Practically, this means that "-O encrypt"
will need to be included in MKFS_OPTIONS when running xfstests with the
test_dummy_encryption mount option. (ext4/053 also needs an update.)
Moreover, as long as the preconditions for test_dummy_encryption are
being tightened anyway, take the opportunity to start rejecting it when
!CONFIG_FS_ENCRYPTION rather than ignoring it.
The motivation for requiring the encrypt feature flag is that:
- Having the filesystem auto-enable feature flags is problematic, as it
bypasses the usual sanity checks. The specific issue which came up
recently is that in kernel versions where ext4 supports casefold but
not encrypt+casefold (v5.1 through v5.10), the kernel will happily add
the encrypt flag to a filesystem that has the casefold flag, making it
unmountable -- but only for subsequent mounts, not the initial one.
This confused the casefold support detection in xfstests, causing
generic/556 to fail rather than be skipped.
- The xfstests-bld test runners (kvm-xfstests et al.) already use the
required mkfs flag, so they will not be affected by this change. Only
users of test_dummy_encryption alone will be affected. But, this
option has always been for testing only, so it should be fine to
require that the few users of this option update their test scripts.
- f2fs already requires it (for its equivalent feature flag).
Signed-off-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Gabriel Krisman Bertazi <krisman(a)collabora.com>
Link: https://lore.kernel.org/r/20220519204437.61645-1-ebiggers@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
---
fs/ext4/ext4.h | 6 ------
fs/ext4/super.c | 18 ++++++++++--------
2 files changed, 10 insertions(+), 14 deletions(-)
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index a0a9878578949..2d84030d7b7fc 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1435,12 +1435,6 @@ struct ext4_super_block {
#ifdef __KERNEL__
-#ifdef CONFIG_FS_ENCRYPTION
-#define DUMMY_ENCRYPTION_ENABLED(sbi) ((sbi)->s_dummy_enc_policy.policy != NULL)
-#else
-#define DUMMY_ENCRYPTION_ENABLED(sbi) (0)
-#endif
-
/* Number of quota types we support */
#define EXT4_MAXQUOTAS 3
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index d12f11c6fbf25..de7b0897f3d4b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -2058,6 +2058,12 @@ static int ext4_set_test_dummy_encryption(struct super_block *sb,
struct ext4_sb_info *sbi = EXT4_SB(sb);
int err;
+ if (!ext4_has_feature_encrypt(sb)) {
+ ext4_msg(sb, KERN_WARNING,
+ "test_dummy_encryption requires encrypt feature");
+ return -1;
+ }
+
/*
* This mount option is just for testing, and it's not worthwhile to
* implement the extra complexity (e.g. RCU protection) that would be
@@ -2085,11 +2091,13 @@ static int ext4_set_test_dummy_encryption(struct super_block *sb,
return -1;
}
ext4_msg(sb, KERN_WARNING, "Test dummy encryption mode enabled");
+ return 1;
#else
ext4_msg(sb, KERN_WARNING,
- "Test dummy encryption mount option ignored");
+ "test_dummy_encryption option not supported");
+ return -1;
+
#endif
- return 1;
}
struct ext4_parsed_options {
@@ -4783,12 +4791,6 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
goto failed_mount_wq;
}
- if (DUMMY_ENCRYPTION_ENABLED(sbi) && !sb_rdonly(sb) &&
- !ext4_has_feature_encrypt(sb)) {
- ext4_set_feature_encrypt(sb);
- ext4_commit_super(sb);
- }
-
/*
* Get the # of file system overhead blocks from the
* superblock if present.
base-commit: 207ca688162d4d77129981a8b4352114b97a52b5
--
2.36.1
There is a limitation in TI DP83867 PHY device where SGMII AN is only
triggered once after the device is booted up. Even after the PHY TPI is
down and up again, SGMII AN is not triggered and hence no new in-band
message from PHY to MAC side SGMII.
This could cause an issue during power up, when PHY is up prior to MAC.
At this condition, once MAC side SGMII is up, MAC side SGMII wouldn`t
receive new in-band message from TI PHY with correct link status, speed
and duplex info.
As suggested by TI, implemented a SW solution here to retrigger SGMII
Auto-Neg whenever there is a link change.
v2: Add Fixes tag in commit message.
Fixes: 2a10154abcb7 ("net: phy: dp83867: Add TI dp83867 phy")
Cc: <stable(a)vger.kernel.org> # 5.4.x
Signed-off-by: Sit, Michael Wei Hong <michael.wei.hong.sit(a)intel.com>
Reviewed-by: Voon Weifeng <weifeng.voon(a)intel.com>
Signed-off-by: Tan Tee Min <tee.min.tan(a)linux.intel.com>
---
drivers/net/phy/dp83867.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/net/phy/dp83867.c b/drivers/net/phy/dp83867.c
index 8561f2d4443b..13dafe7a29bd 100644
--- a/drivers/net/phy/dp83867.c
+++ b/drivers/net/phy/dp83867.c
@@ -137,6 +137,7 @@
#define DP83867_DOWNSHIFT_2_COUNT 2
#define DP83867_DOWNSHIFT_4_COUNT 4
#define DP83867_DOWNSHIFT_8_COUNT 8
+#define DP83867_SGMII_AUTONEG_EN BIT(7)
/* CFG3 bits */
#define DP83867_CFG3_INT_OE BIT(7)
@@ -855,6 +856,32 @@ static int dp83867_phy_reset(struct phy_device *phydev)
DP83867_PHYCR_FORCE_LINK_GOOD, 0);
}
+static void dp83867_link_change_notify(struct phy_device *phydev)
+{
+ /* There is a limitation in DP83867 PHY device where SGMII AN is
+ * only triggered once after the device is booted up. Even after the
+ * PHY TPI is down and up again, SGMII AN is not triggered and
+ * hence no new in-band message from PHY to MAC side SGMII.
+ * This could cause an issue during power up, when PHY is up prior
+ * to MAC. At this condition, once MAC side SGMII is up, MAC side
+ * SGMII wouldn`t receive new in-band message from TI PHY with
+ * correct link status, speed and duplex info.
+ * Thus, implemented a SW solution here to retrigger SGMII Auto-Neg
+ * whenever there is a link change.
+ */
+ if (phydev->interface == PHY_INTERFACE_MODE_SGMII) {
+ int val = 0;
+
+ val = phy_clear_bits(phydev, DP83867_CFG2,
+ DP83867_SGMII_AUTONEG_EN);
+ if (val < 0)
+ return;
+
+ phy_set_bits(phydev, DP83867_CFG2,
+ DP83867_SGMII_AUTONEG_EN);
+ }
+}
+
static struct phy_driver dp83867_driver[] = {
{
.phy_id = DP83867_PHY_ID,
@@ -879,6 +906,8 @@ static struct phy_driver dp83867_driver[] = {
.suspend = genphy_suspend,
.resume = genphy_resume,
+
+ .link_change_notify = dp83867_link_change_notify,
},
};
module_phy_driver(dp83867_driver);
--
2.25.1
Hello,
In this series are BFQ upstream fixes that didn't apply to 5.10 stable tree
cleanly and needed some massaging before they apply. The result did pass
some cgroup testing with bfq and the backport is based on the one we have
in our SLES kernel so I'm reasonably confident things are fine.
Honza
This series back-ports two bug fixes that landed very late in v5.18
cycle and didn't make it to the final release. They're present in
v5.19-rc1, but don't cherry-pick cleanly onto v5.18.2.
-Alex
Alex Elder (2):
net: ipa: fix page free in ipa_endpoint_trans_release()
net: ipa: fix page free in ipa_endpoint_replenish_one()
drivers/net/ipa/ipa_endpoint.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--
2.32.0
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0a55cf74ffb5d004b93647e4389096880ce37d6b Mon Sep 17 00:00:00 2001
From: Steve French <stfrench(a)microsoft.com>
Date: Thu, 12 May 2022 10:18:00 -0500
Subject: [PATCH] SMB3: EBADF/EIO errors in rename/open caused by race
condition in smb2_compound_op
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
There is a race condition in smb2_compound_op:
after_close:
num_rqst++;
if (cfile) {
cifsFileInfo_put(cfile); // sends SMB2_CLOSE to the server
cfile = NULL;
This is triggered by smb2_query_path_info operation that happens during
revalidate_dentry. In smb2_query_path_info, get_readable_path is called to
load the cfile, increasing the reference counter. If in the meantime, this
reference becomes the very last, this call to cifsFileInfo_put(cfile) will
trigger a SMB2_CLOSE request sent to the server just before sending this compound
request – and so then the compound request fails either with EBADF/EIO depending
on the timing at the server, because the handle is already closed.
In the first scenario, the race seems to be happening between smb2_query_path_info
triggered by the rename operation, and between “cleanup” of asynchronous writes – while
fsync(fd) likely waits for the asynchronous writes to complete, releasing the writeback
structures can happen after the close(fd) call. So the EBADF/EIO errors will pop up if
the timing is such that:
1) There are still outstanding references after close(fd) in the writeback structures
2) smb2_query_path_info successfully fetches the cfile, increasing the refcounter by 1
3) All writeback structures release the same cfile, reducing refcounter to 1
4) smb2_compound_op is called with that cfile
In the second scenario, the race seems to be similar – here open triggers the
smb2_query_path_info operation, and if all other threads in the meantime decrease the
refcounter to 1 similarly to the first scenario, again SMB2_CLOSE will be sent to the
server just before issuing the compound request. This case is harder to reproduce.
See https://bugzilla.samba.org/show_bug.cgi?id=15051
Cc: stable(a)vger.kernel.org
Fixes: 8de9e86c67ba ("cifs: create a helper to find a writeable handle by path name")
Signed-off-by: Ondrej Hubsch <ohubsch(a)purestorage.com>
Reviewed-by: Ronnie Sahlberg <lsahlber(a)redhat.com>
Reviewed-by: Paulo Alcantara (SUSE) <pc(a)cjr.nz>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
diff --git a/fs/cifs/smb2inode.c b/fs/cifs/smb2inode.c
index fe5bfa245fa7..1b89b9b8a212 100644
--- a/fs/cifs/smb2inode.c
+++ b/fs/cifs/smb2inode.c
@@ -362,8 +362,6 @@ smb2_compound_op(const unsigned int xid, struct cifs_tcon *tcon,
num_rqst++;
if (cfile) {
- cifsFileInfo_put(cfile);
- cfile = NULL;
rc = compound_send_recv(xid, ses, server,
flags, num_rqst - 2,
&rqst[1], &resp_buftype[1],
From: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay(a)intel.com>
[ Upstream commit 0a967f5bfd9134b89681cae58deb222e20840e76 ]
The VT-d spec requires (10.4.4 Global Command Register, TE
field) that:
Hardware implementations supporting DMA draining must drain
any in-flight DMA read/write requests queued within the
Root-Complex before completing the translation enable
command and reflecting the status of the command through
the TES field in the Global Status register.
Unfortunately, some integrated graphic devices fail to do
so after some kind of power state transition. As the
result, the system might stuck in iommu_disable_translati
on(), waiting for the completion of TE transition.
This adds RPLS to a quirk list for those devices and skips
TE disabling if the qurik hits.
Link: https://gitlab.freedesktop.org/drm/intel/-/issues/4898
Tested-by: Raviteja Goud Talla <ravitejax.goud.talla(a)intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Acked-by: Lu Baolu <baolu.lu(a)linux.intel.com>
Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay(a)intel.com>
Reviewed-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi(a)intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220302043256.191529-1-tejas…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/iommu/intel/iommu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ab2273300346..e3f15e0cae34 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -5764,7 +5764,7 @@ static void quirk_igfx_skip_te_disable(struct pci_dev *dev)
ver = (dev->device >> 8) & 0xff;
if (ver != 0x45 && ver != 0x46 && ver != 0x4c &&
ver != 0x4e && ver != 0x8a && ver != 0x98 &&
- ver != 0x9a)
+ ver != 0x9a && ver != 0xa7)
return;
if (risky_device(dev))
--
2.35.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8b917cbe38e9b0d002492477a9fc2bfee2412ce4 Mon Sep 17 00:00:00 2001
From: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
Date: Sun, 27 Mar 2022 14:15:16 +0800
Subject: [PATCH] tilcdc: tilcdc_external: fix an incorrect NULL check on list
iterator
The bug is here:
if (!encoder) {
The list iterator value 'encoder' will *always* be set and non-NULL
by list_for_each_entry(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
is found.
To fix the bug, use a new variable 'iter' as the list iterator,
while use the original variable 'encoder' as a dedicated pointer
to point to the found element.
Cc: stable(a)vger.kernel.org
Fixes: ec9eab097a500 ("drm/tilcdc: Add drm bridge support for attaching drm bridge drivers")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
Reviewed-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Tested-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Signed-off-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Link: https://patchwork.freedesktop.org/patch/msgid/20220327061516.5076-1-xiam0nd…
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_external.c b/drivers/gpu/drm/tilcdc/tilcdc_external.c
index 7594cf6e186e..3b86d002ef62 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_external.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_external.c
@@ -60,11 +60,13 @@ struct drm_connector *tilcdc_encoder_find_connector(struct drm_device *ddev,
int tilcdc_add_component_encoder(struct drm_device *ddev)
{
struct tilcdc_drm_private *priv = ddev->dev_private;
- struct drm_encoder *encoder;
+ struct drm_encoder *encoder = NULL, *iter;
- list_for_each_entry(encoder, &ddev->mode_config.encoder_list, head)
- if (encoder->possible_crtcs & (1 << priv->crtc->index))
+ list_for_each_entry(iter, &ddev->mode_config.encoder_list, head)
+ if (iter->possible_crtcs & (1 << priv->crtc->index)) {
+ encoder = iter;
break;
+ }
if (!encoder) {
dev_err(ddev->dev, "%s: No suitable encoder found\n", __func__);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8b917cbe38e9b0d002492477a9fc2bfee2412ce4 Mon Sep 17 00:00:00 2001
From: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
Date: Sun, 27 Mar 2022 14:15:16 +0800
Subject: [PATCH] tilcdc: tilcdc_external: fix an incorrect NULL check on list
iterator
The bug is here:
if (!encoder) {
The list iterator value 'encoder' will *always* be set and non-NULL
by list_for_each_entry(), so it is incorrect to assume that the
iterator value will be NULL if the list is empty or no element
is found.
To fix the bug, use a new variable 'iter' as the list iterator,
while use the original variable 'encoder' as a dedicated pointer
to point to the found element.
Cc: stable(a)vger.kernel.org
Fixes: ec9eab097a500 ("drm/tilcdc: Add drm bridge support for attaching drm bridge drivers")
Signed-off-by: Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
Reviewed-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Tested-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Signed-off-by: Jyri Sarha <jyri.sarha(a)iki.fi>
Link: https://patchwork.freedesktop.org/patch/msgid/20220327061516.5076-1-xiam0nd…
diff --git a/drivers/gpu/drm/tilcdc/tilcdc_external.c b/drivers/gpu/drm/tilcdc/tilcdc_external.c
index 7594cf6e186e..3b86d002ef62 100644
--- a/drivers/gpu/drm/tilcdc/tilcdc_external.c
+++ b/drivers/gpu/drm/tilcdc/tilcdc_external.c
@@ -60,11 +60,13 @@ struct drm_connector *tilcdc_encoder_find_connector(struct drm_device *ddev,
int tilcdc_add_component_encoder(struct drm_device *ddev)
{
struct tilcdc_drm_private *priv = ddev->dev_private;
- struct drm_encoder *encoder;
+ struct drm_encoder *encoder = NULL, *iter;
- list_for_each_entry(encoder, &ddev->mode_config.encoder_list, head)
- if (encoder->possible_crtcs & (1 << priv->crtc->index))
+ list_for_each_entry(iter, &ddev->mode_config.encoder_list, head)
+ if (iter->possible_crtcs & (1 << priv->crtc->index)) {
+ encoder = iter;
break;
+ }
if (!encoder) {
dev_err(ddev->dev, "%s: No suitable encoder found\n", __func__);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d9f3af4fbb1d955bbaf872d9e76502f6e3e803cb Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jirislaby(a)kernel.org>
Date: Tue, 3 May 2022 10:08:03 +0200
Subject: [PATCH] serial: pch: don't overwrite xmit->buf[0] by x_char
When x_char is to be sent, the TX path overwrites whatever is in the
circular buffer at offset 0 with x_char and sends it using
pch_uart_hal_write(). I don't understand how this was supposed to work
if xmit->buf[0] already contained some character. It must have been
lost.
Remove this whole pop_tx_x() concept and do the work directly in the
callers. (Without printing anything using dev_dbg().)
Cc: <stable(a)vger.kernel.org>
Fixes: 3c6a483275f4 (Serial: EG20T: add PCH_UART driver)
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Link: https://lore.kernel.org/r/20220503080808.28332-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/pch_uart.c b/drivers/tty/serial/pch_uart.c
index f872613a5e83..6cb631487383 100644
--- a/drivers/tty/serial/pch_uart.c
+++ b/drivers/tty/serial/pch_uart.c
@@ -624,22 +624,6 @@ static int push_rx(struct eg20t_port *priv, const unsigned char *buf,
return 0;
}
-static int pop_tx_x(struct eg20t_port *priv, unsigned char *buf)
-{
- int ret = 0;
- struct uart_port *port = &priv->port;
-
- if (port->x_char) {
- dev_dbg(priv->port.dev, "%s:X character send %02x (%lu)\n",
- __func__, port->x_char, jiffies);
- buf[0] = port->x_char;
- port->x_char = 0;
- ret = 1;
- }
-
- return ret;
-}
-
static int dma_push_rx(struct eg20t_port *priv, int size)
{
int room;
@@ -889,9 +873,10 @@ static unsigned int handle_tx(struct eg20t_port *priv)
fifo_size = max(priv->fifo_size, 1);
tx_empty = 1;
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
tx_empty = 0;
fifo_size--;
}
@@ -948,9 +933,11 @@ static unsigned int dma_handle_tx(struct eg20t_port *priv)
}
fifo_size = max(priv->fifo_size, 1);
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
fifo_size--;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d9f3af4fbb1d955bbaf872d9e76502f6e3e803cb Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jirislaby(a)kernel.org>
Date: Tue, 3 May 2022 10:08:03 +0200
Subject: [PATCH] serial: pch: don't overwrite xmit->buf[0] by x_char
When x_char is to be sent, the TX path overwrites whatever is in the
circular buffer at offset 0 with x_char and sends it using
pch_uart_hal_write(). I don't understand how this was supposed to work
if xmit->buf[0] already contained some character. It must have been
lost.
Remove this whole pop_tx_x() concept and do the work directly in the
callers. (Without printing anything using dev_dbg().)
Cc: <stable(a)vger.kernel.org>
Fixes: 3c6a483275f4 (Serial: EG20T: add PCH_UART driver)
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Link: https://lore.kernel.org/r/20220503080808.28332-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/pch_uart.c b/drivers/tty/serial/pch_uart.c
index f872613a5e83..6cb631487383 100644
--- a/drivers/tty/serial/pch_uart.c
+++ b/drivers/tty/serial/pch_uart.c
@@ -624,22 +624,6 @@ static int push_rx(struct eg20t_port *priv, const unsigned char *buf,
return 0;
}
-static int pop_tx_x(struct eg20t_port *priv, unsigned char *buf)
-{
- int ret = 0;
- struct uart_port *port = &priv->port;
-
- if (port->x_char) {
- dev_dbg(priv->port.dev, "%s:X character send %02x (%lu)\n",
- __func__, port->x_char, jiffies);
- buf[0] = port->x_char;
- port->x_char = 0;
- ret = 1;
- }
-
- return ret;
-}
-
static int dma_push_rx(struct eg20t_port *priv, int size)
{
int room;
@@ -889,9 +873,10 @@ static unsigned int handle_tx(struct eg20t_port *priv)
fifo_size = max(priv->fifo_size, 1);
tx_empty = 1;
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
tx_empty = 0;
fifo_size--;
}
@@ -948,9 +933,11 @@ static unsigned int dma_handle_tx(struct eg20t_port *priv)
}
fifo_size = max(priv->fifo_size, 1);
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
fifo_size--;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d9f3af4fbb1d955bbaf872d9e76502f6e3e803cb Mon Sep 17 00:00:00 2001
From: Jiri Slaby <jirislaby(a)kernel.org>
Date: Tue, 3 May 2022 10:08:03 +0200
Subject: [PATCH] serial: pch: don't overwrite xmit->buf[0] by x_char
When x_char is to be sent, the TX path overwrites whatever is in the
circular buffer at offset 0 with x_char and sends it using
pch_uart_hal_write(). I don't understand how this was supposed to work
if xmit->buf[0] already contained some character. It must have been
lost.
Remove this whole pop_tx_x() concept and do the work directly in the
callers. (Without printing anything using dev_dbg().)
Cc: <stable(a)vger.kernel.org>
Fixes: 3c6a483275f4 (Serial: EG20T: add PCH_UART driver)
Signed-off-by: Jiri Slaby <jslaby(a)suse.cz>
Link: https://lore.kernel.org/r/20220503080808.28332-1-jslaby@suse.cz
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/tty/serial/pch_uart.c b/drivers/tty/serial/pch_uart.c
index f872613a5e83..6cb631487383 100644
--- a/drivers/tty/serial/pch_uart.c
+++ b/drivers/tty/serial/pch_uart.c
@@ -624,22 +624,6 @@ static int push_rx(struct eg20t_port *priv, const unsigned char *buf,
return 0;
}
-static int pop_tx_x(struct eg20t_port *priv, unsigned char *buf)
-{
- int ret = 0;
- struct uart_port *port = &priv->port;
-
- if (port->x_char) {
- dev_dbg(priv->port.dev, "%s:X character send %02x (%lu)\n",
- __func__, port->x_char, jiffies);
- buf[0] = port->x_char;
- port->x_char = 0;
- ret = 1;
- }
-
- return ret;
-}
-
static int dma_push_rx(struct eg20t_port *priv, int size)
{
int room;
@@ -889,9 +873,10 @@ static unsigned int handle_tx(struct eg20t_port *priv)
fifo_size = max(priv->fifo_size, 1);
tx_empty = 1;
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
tx_empty = 0;
fifo_size--;
}
@@ -948,9 +933,11 @@ static unsigned int dma_handle_tx(struct eg20t_port *priv)
}
fifo_size = max(priv->fifo_size, 1);
- if (pop_tx_x(priv, xmit->buf)) {
- pch_uart_hal_write(priv, xmit->buf, 1);
+
+ if (port->x_char) {
+ pch_uart_hal_write(priv, &port->x_char, 1);
port->icount.tx++;
+ port->x_char = 0;
fifo_size--;
}
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43994049180704fd1faf78623fabd9a5cd443708 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Wed, 4 May 2022 12:36:31 +0900
Subject: [PATCH] kprobes: Fix build errors with CONFIG_KRETPROBES=n
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Max Filippov reported:
When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c
compilation fails with the following messages:
kernel/kprobes.c: In function ‘recycle_rp_inst’:
kernel/kprobes.c:1273:32: error: implicit declaration of function
‘get_kretprobe’
kernel/kprobes.c: In function ‘kprobe_flush_task’:
kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member
named ‘kretprobe_instances’
This came from the commit d741bf41d7c7 ("kprobes: Remove
kretprobe hash") which introduced get_kretprobe() and
kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y,
but did not make recycle_rp_inst() and kprobe_flush_task()
depending on CONFIG_KRETPORBES.
Since those functions are only used for kretprobe, move those
functions into #ifdef CONFIG_KRETPROBE area.
Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devn…
Reported-by: Max Filippov <jcmvbkbc(a)gmail.com>
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: "Naveen N . Rao" <naveen.n.rao(a)linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy(a)intel.com>
Cc: "David S . Miller" <davem(a)davemloft.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Tested-by: Max Filippov <jcmvbkbc(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 157168769fc2..55041d2f884d 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -424,7 +424,7 @@ void unregister_kretprobe(struct kretprobe *rp);
int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);
-#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+#if defined(CONFIG_KRETPROBE_ON_RETHOOK) || !defined(CONFIG_KRETPROBES)
#define kprobe_flush_task(tk) do {} while (0)
#else
void kprobe_flush_task(struct task_struct *tk);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index dbe57df2e199..172ba75a4a49 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1257,79 +1257,6 @@ void kprobe_busy_end(void)
preempt_enable();
}
-#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
-static void free_rp_inst_rcu(struct rcu_head *head)
-{
- struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
-
- if (refcount_dec_and_test(&ri->rph->ref))
- kfree(ri->rph);
- kfree(ri);
-}
-NOKPROBE_SYMBOL(free_rp_inst_rcu);
-
-static void recycle_rp_inst(struct kretprobe_instance *ri)
-{
- struct kretprobe *rp = get_kretprobe(ri);
-
- if (likely(rp))
- freelist_add(&ri->freelist, &rp->freelist);
- else
- call_rcu(&ri->rcu, free_rp_inst_rcu);
-}
-NOKPROBE_SYMBOL(recycle_rp_inst);
-
-/*
- * This function is called from delayed_put_task_struct() when a task is
- * dead and cleaned up to recycle any kretprobe instances associated with
- * this task. These left over instances represent probed functions that
- * have been called but will never return.
- */
-void kprobe_flush_task(struct task_struct *tk)
-{
- struct kretprobe_instance *ri;
- struct llist_node *node;
-
- /* Early boot, not yet initialized. */
- if (unlikely(!kprobes_initialized))
- return;
-
- kprobe_busy_begin();
-
- node = __llist_del_all(&tk->kretprobe_instances);
- while (node) {
- ri = container_of(node, struct kretprobe_instance, llist);
- node = node->next;
-
- recycle_rp_inst(ri);
- }
-
- kprobe_busy_end();
-}
-NOKPROBE_SYMBOL(kprobe_flush_task);
-
-static inline void free_rp_inst(struct kretprobe *rp)
-{
- struct kretprobe_instance *ri;
- struct freelist_node *node;
- int count = 0;
-
- node = rp->freelist.head;
- while (node) {
- ri = container_of(node, struct kretprobe_instance, freelist);
- node = node->next;
-
- kfree(ri);
- count++;
- }
-
- if (refcount_sub_and_test(count, &rp->rph->ref)) {
- kfree(rp->rph);
- rp->rph = NULL;
- }
-}
-#endif /* !CONFIG_KRETPROBE_ON_RETHOOK */
-
/* Add the new probe to 'ap->list'. */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
{
@@ -1928,6 +1855,77 @@ static struct notifier_block kprobe_exceptions_nb = {
#ifdef CONFIG_KRETPROBES
#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
+static void free_rp_inst_rcu(struct rcu_head *head)
+{
+ struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
+
+ if (refcount_dec_and_test(&ri->rph->ref))
+ kfree(ri->rph);
+ kfree(ri);
+}
+NOKPROBE_SYMBOL(free_rp_inst_rcu);
+
+static void recycle_rp_inst(struct kretprobe_instance *ri)
+{
+ struct kretprobe *rp = get_kretprobe(ri);
+
+ if (likely(rp))
+ freelist_add(&ri->freelist, &rp->freelist);
+ else
+ call_rcu(&ri->rcu, free_rp_inst_rcu);
+}
+NOKPROBE_SYMBOL(recycle_rp_inst);
+
+/*
+ * This function is called from delayed_put_task_struct() when a task is
+ * dead and cleaned up to recycle any kretprobe instances associated with
+ * this task. These left over instances represent probed functions that
+ * have been called but will never return.
+ */
+void kprobe_flush_task(struct task_struct *tk)
+{
+ struct kretprobe_instance *ri;
+ struct llist_node *node;
+
+ /* Early boot, not yet initialized. */
+ if (unlikely(!kprobes_initialized))
+ return;
+
+ kprobe_busy_begin();
+
+ node = __llist_del_all(&tk->kretprobe_instances);
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);
+ node = node->next;
+
+ recycle_rp_inst(ri);
+ }
+
+ kprobe_busy_end();
+}
+NOKPROBE_SYMBOL(kprobe_flush_task);
+
+static inline void free_rp_inst(struct kretprobe *rp)
+{
+ struct kretprobe_instance *ri;
+ struct freelist_node *node;
+ int count = 0;
+
+ node = rp->freelist.head;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, freelist);
+ node = node->next;
+
+ kfree(ri);
+ count++;
+ }
+
+ if (refcount_sub_and_test(count, &rp->rph->ref)) {
+ kfree(rp->rph);
+ rp->rph = NULL;
+ }
+}
+
/* This assumes the 'tsk' is the current task or the is not running. */
static kprobe_opcode_t *__kretprobe_find_ret_addr(struct task_struct *tsk,
struct llist_node **cur)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 43994049180704fd1faf78623fabd9a5cd443708 Mon Sep 17 00:00:00 2001
From: Masami Hiramatsu <mhiramat(a)kernel.org>
Date: Wed, 4 May 2022 12:36:31 +0900
Subject: [PATCH] kprobes: Fix build errors with CONFIG_KRETPROBES=n
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Max Filippov reported:
When building kernel with CONFIG_KRETPROBES=n kernel/kprobes.c
compilation fails with the following messages:
kernel/kprobes.c: In function ‘recycle_rp_inst’:
kernel/kprobes.c:1273:32: error: implicit declaration of function
‘get_kretprobe’
kernel/kprobes.c: In function ‘kprobe_flush_task’:
kernel/kprobes.c:1299:35: error: ‘struct task_struct’ has no member
named ‘kretprobe_instances’
This came from the commit d741bf41d7c7 ("kprobes: Remove
kretprobe hash") which introduced get_kretprobe() and
kretprobe_instances member in task_struct when CONFIG_KRETPROBES=y,
but did not make recycle_rp_inst() and kprobe_flush_task()
depending on CONFIG_KRETPORBES.
Since those functions are only used for kretprobe, move those
functions into #ifdef CONFIG_KRETPROBE area.
Link: https://lkml.kernel.org/r/165163539094.74407.3838114721073251225.stgit@devn…
Reported-by: Max Filippov <jcmvbkbc(a)gmail.com>
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: "Naveen N . Rao" <naveen.n.rao(a)linux.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy(a)intel.com>
Cc: "David S . Miller" <davem(a)davemloft.net>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: stable(a)vger.kernel.org
Signed-off-by: Masami Hiramatsu <mhiramat(a)kernel.org>
Tested-by: Max Filippov <jcmvbkbc(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h
index 157168769fc2..55041d2f884d 100644
--- a/include/linux/kprobes.h
+++ b/include/linux/kprobes.h
@@ -424,7 +424,7 @@ void unregister_kretprobe(struct kretprobe *rp);
int register_kretprobes(struct kretprobe **rps, int num);
void unregister_kretprobes(struct kretprobe **rps, int num);
-#ifdef CONFIG_KRETPROBE_ON_RETHOOK
+#if defined(CONFIG_KRETPROBE_ON_RETHOOK) || !defined(CONFIG_KRETPROBES)
#define kprobe_flush_task(tk) do {} while (0)
#else
void kprobe_flush_task(struct task_struct *tk);
diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index dbe57df2e199..172ba75a4a49 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1257,79 +1257,6 @@ void kprobe_busy_end(void)
preempt_enable();
}
-#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
-static void free_rp_inst_rcu(struct rcu_head *head)
-{
- struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
-
- if (refcount_dec_and_test(&ri->rph->ref))
- kfree(ri->rph);
- kfree(ri);
-}
-NOKPROBE_SYMBOL(free_rp_inst_rcu);
-
-static void recycle_rp_inst(struct kretprobe_instance *ri)
-{
- struct kretprobe *rp = get_kretprobe(ri);
-
- if (likely(rp))
- freelist_add(&ri->freelist, &rp->freelist);
- else
- call_rcu(&ri->rcu, free_rp_inst_rcu);
-}
-NOKPROBE_SYMBOL(recycle_rp_inst);
-
-/*
- * This function is called from delayed_put_task_struct() when a task is
- * dead and cleaned up to recycle any kretprobe instances associated with
- * this task. These left over instances represent probed functions that
- * have been called but will never return.
- */
-void kprobe_flush_task(struct task_struct *tk)
-{
- struct kretprobe_instance *ri;
- struct llist_node *node;
-
- /* Early boot, not yet initialized. */
- if (unlikely(!kprobes_initialized))
- return;
-
- kprobe_busy_begin();
-
- node = __llist_del_all(&tk->kretprobe_instances);
- while (node) {
- ri = container_of(node, struct kretprobe_instance, llist);
- node = node->next;
-
- recycle_rp_inst(ri);
- }
-
- kprobe_busy_end();
-}
-NOKPROBE_SYMBOL(kprobe_flush_task);
-
-static inline void free_rp_inst(struct kretprobe *rp)
-{
- struct kretprobe_instance *ri;
- struct freelist_node *node;
- int count = 0;
-
- node = rp->freelist.head;
- while (node) {
- ri = container_of(node, struct kretprobe_instance, freelist);
- node = node->next;
-
- kfree(ri);
- count++;
- }
-
- if (refcount_sub_and_test(count, &rp->rph->ref)) {
- kfree(rp->rph);
- rp->rph = NULL;
- }
-}
-#endif /* !CONFIG_KRETPROBE_ON_RETHOOK */
-
/* Add the new probe to 'ap->list'. */
static int add_new_kprobe(struct kprobe *ap, struct kprobe *p)
{
@@ -1928,6 +1855,77 @@ static struct notifier_block kprobe_exceptions_nb = {
#ifdef CONFIG_KRETPROBES
#if !defined(CONFIG_KRETPROBE_ON_RETHOOK)
+static void free_rp_inst_rcu(struct rcu_head *head)
+{
+ struct kretprobe_instance *ri = container_of(head, struct kretprobe_instance, rcu);
+
+ if (refcount_dec_and_test(&ri->rph->ref))
+ kfree(ri->rph);
+ kfree(ri);
+}
+NOKPROBE_SYMBOL(free_rp_inst_rcu);
+
+static void recycle_rp_inst(struct kretprobe_instance *ri)
+{
+ struct kretprobe *rp = get_kretprobe(ri);
+
+ if (likely(rp))
+ freelist_add(&ri->freelist, &rp->freelist);
+ else
+ call_rcu(&ri->rcu, free_rp_inst_rcu);
+}
+NOKPROBE_SYMBOL(recycle_rp_inst);
+
+/*
+ * This function is called from delayed_put_task_struct() when a task is
+ * dead and cleaned up to recycle any kretprobe instances associated with
+ * this task. These left over instances represent probed functions that
+ * have been called but will never return.
+ */
+void kprobe_flush_task(struct task_struct *tk)
+{
+ struct kretprobe_instance *ri;
+ struct llist_node *node;
+
+ /* Early boot, not yet initialized. */
+ if (unlikely(!kprobes_initialized))
+ return;
+
+ kprobe_busy_begin();
+
+ node = __llist_del_all(&tk->kretprobe_instances);
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, llist);
+ node = node->next;
+
+ recycle_rp_inst(ri);
+ }
+
+ kprobe_busy_end();
+}
+NOKPROBE_SYMBOL(kprobe_flush_task);
+
+static inline void free_rp_inst(struct kretprobe *rp)
+{
+ struct kretprobe_instance *ri;
+ struct freelist_node *node;
+ int count = 0;
+
+ node = rp->freelist.head;
+ while (node) {
+ ri = container_of(node, struct kretprobe_instance, freelist);
+ node = node->next;
+
+ kfree(ri);
+ count++;
+ }
+
+ if (refcount_sub_and_test(count, &rp->rph->ref)) {
+ kfree(rp->rph);
+ rp->rph = NULL;
+ }
+}
+
/* This assumes the 'tsk' is the current task or the is not running. */
static kprobe_opcode_t *__kretprobe_find_ret_addr(struct task_struct *tsk,
struct llist_node **cur)
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f44b3e74c33fe04defeff24ebcae98c3bcc5b285 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:22 +0100
Subject: [PATCH] MIPS: IP30: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 7505576d1c1a ("MIPS: add support for SGI Octane (IP30)")
Cc: stable(a)vger.kernel.org # v5.5+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
index 8ad0c424a9af..ce4e4c6e09e2 100644
--- a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
@@ -28,7 +28,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f44b3e74c33fe04defeff24ebcae98c3bcc5b285 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:22 +0100
Subject: [PATCH] MIPS: IP30: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 7505576d1c1a ("MIPS: add support for SGI Octane (IP30)")
Cc: stable(a)vger.kernel.org # v5.5+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
index 8ad0c424a9af..ce4e4c6e09e2 100644
--- a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
@@ -28,7 +28,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f44b3e74c33fe04defeff24ebcae98c3bcc5b285 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:22 +0100
Subject: [PATCH] MIPS: IP30: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 7505576d1c1a ("MIPS: add support for SGI Octane (IP30)")
Cc: stable(a)vger.kernel.org # v5.5+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
index 8ad0c424a9af..ce4e4c6e09e2 100644
--- a/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip30/cpu-feature-overrides.h
@@ -28,7 +28,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 424c3781dd1cb401857585331eaaa425a13f2429 Mon Sep 17 00:00:00 2001
From: "Maciej W. Rozycki" <macro(a)orcam.me.uk>
Date: Sun, 1 May 2022 23:14:16 +0100
Subject: [PATCH] MIPS: IP27: Remove incorrect `cpu_has_fpu' override
Remove unsupported forcing of `cpu_has_fpu' to 1, which makes the `nofpu'
kernel parameter non-functional, and also causes a link error:
ld: arch/mips/kernel/traps.o: in function `trap_init':
./arch/mips/include/asm/msa.h:(.init.text+0x348): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x354): undefined reference to `handle_fpe'
ld: ./arch/mips/include/asm/msa.h:(.init.text+0x360): undefined reference to `handle_fpe'
where the CONFIG_MIPS_FP_SUPPORT configuration option has been disabled.
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Reported-by: Stephen Zhang <starzhangzsd(a)gmail.com>
Fixes: 0ebb2f4159af ("MIPS: IP27: Update/restructure CPU overrides")
Cc: stable(a)vger.kernel.org # v4.2+
Signed-off-by: Thomas Bogendoerfer <tsbogend(a)alpha.franken.de>
diff --git a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
index c8385c4e8664..568fe09332eb 100644
--- a/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
+++ b/arch/mips/include/asm/mach-ip27/cpu-feature-overrides.h
@@ -25,7 +25,6 @@
#define cpu_has_4kex 1
#define cpu_has_3k_cache 0
#define cpu_has_4k_cache 1
-#define cpu_has_fpu 1
#define cpu_has_nofpuex 0
#define cpu_has_32fpr 1
#define cpu_has_counter 1
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From af9fb41ed315ce95f659f0b10b4d59a71975381d Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Tue, 17 May 2022 22:52:50 +0200
Subject: [PATCH] um: virtio_uml: Fix broken device handling in time-travel
If a device implementation crashes, virtio_uml will mark it
as dead by calling virtio_break_device() and scheduling the
work that will remove it.
This still seems like the right thing to do, but it's done
directly while reading the message, and if time-travel is
used, this is in the time-travel handler, outside of the
normal Linux machinery. Therefore, we cannot acquire locks
or do normal "linux-y" things because e.g. lockdep will be
confused about the context.
Move handling this situation out of the read function and
into the actual IRQ handler and response handling instead,
so that in the case of time-travel we don't call it in the
wrong context.
Chances are the system will still crash immediately, since
the device implementation crashing may also cause the time-
travel controller to go down, but at least all of that now
happens without strange warnings from lockdep.
Fixes: c8177aba37ca ("um: time-travel: rework interrupt handling in ext mode")
Cc: stable(a)vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Signed-off-by: Richard Weinberger <richard(a)nod.at>
diff --git a/arch/um/drivers/virtio_uml.c b/arch/um/drivers/virtio_uml.c
index ba562d68dc04..82ff3785bf69 100644
--- a/arch/um/drivers/virtio_uml.c
+++ b/arch/um/drivers/virtio_uml.c
@@ -63,6 +63,7 @@ struct virtio_uml_device {
u8 config_changed_irq:1;
uint64_t vq_irq_vq_map;
+ int recv_rc;
};
struct virtio_uml_vq_info {
@@ -148,14 +149,6 @@ static int vhost_user_recv(struct virtio_uml_device *vu_dev,
rc = vhost_user_recv_header(fd, msg);
- if (rc == -ECONNRESET && vu_dev->registered) {
- struct virtio_uml_platform_data *pdata;
-
- pdata = vu_dev->pdata;
-
- virtio_break_device(&vu_dev->vdev);
- schedule_work(&pdata->conn_broken_wk);
- }
if (rc)
return rc;
size = msg->header.size;
@@ -164,6 +157,21 @@ static int vhost_user_recv(struct virtio_uml_device *vu_dev,
return full_read(fd, &msg->payload, size, false);
}
+static void vhost_user_check_reset(struct virtio_uml_device *vu_dev,
+ int rc)
+{
+ struct virtio_uml_platform_data *pdata = vu_dev->pdata;
+
+ if (rc != -ECONNRESET)
+ return;
+
+ if (!vu_dev->registered)
+ return;
+
+ virtio_break_device(&vu_dev->vdev);
+ schedule_work(&pdata->conn_broken_wk);
+}
+
static int vhost_user_recv_resp(struct virtio_uml_device *vu_dev,
struct vhost_user_msg *msg,
size_t max_payload_size)
@@ -171,8 +179,10 @@ static int vhost_user_recv_resp(struct virtio_uml_device *vu_dev,
int rc = vhost_user_recv(vu_dev, vu_dev->sock, msg,
max_payload_size, true);
- if (rc)
+ if (rc) {
+ vhost_user_check_reset(vu_dev, rc);
return rc;
+ }
if (msg->header.flags != (VHOST_USER_FLAG_REPLY | VHOST_USER_VERSION))
return -EPROTO;
@@ -369,6 +379,7 @@ static irqreturn_t vu_req_read_message(struct virtio_uml_device *vu_dev,
sizeof(msg.msg.payload) +
sizeof(msg.extra_payload));
+ vu_dev->recv_rc = rc;
if (rc)
return IRQ_NONE;
@@ -412,7 +423,9 @@ static irqreturn_t vu_req_interrupt(int irq, void *data)
if (!um_irq_timetravel_handler_used())
ret = vu_req_read_message(vu_dev, NULL);
- if (vu_dev->vq_irq_vq_map) {
+ if (vu_dev->recv_rc) {
+ vhost_user_check_reset(vu_dev, vu_dev->recv_rc);
+ } else if (vu_dev->vq_irq_vq_map) {
struct virtqueue *vq;
virtio_device_for_each_vq((&vu_dev->vdev), vq) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 891163adf180bc369b2f11c9dfce6d2758d2a5bd Mon Sep 17 00:00:00 2001
From: GUO Zihua <guozihua(a)huawei.com>
Date: Thu, 7 Apr 2022 10:16:19 +0800
Subject: [PATCH] ima: remove the IMA_TEMPLATE Kconfig option
The original 'ima' measurement list template contains a hash, defined
as 20 bytes, and a null terminated pathname, limited to 255
characters. Other measurement list templates permit both larger hashes
and longer pathnames. When the "ima" template is configured as the
default, a new measurement list template (ima_template=) must be
specified before specifying a larger hash algorithm (ima_hash=) on the
boot command line.
To avoid this boot command line ordering issue, remove the legacy "ima"
template configuration option, allowing it to still be specified on the
boot command line.
The root cause of this issue is that during the processing of ima_hash,
we would try to check whether the hash algorithm is compatible with the
template. If the template is not set at the moment we do the check, we
check the algorithm against the configured default template. If the
default template is "ima", then we reject any hash algorithm other than
sha1 and md5.
For example, if the compiled default template is "ima", and the default
algorithm is sha1 (which is the current default). In the cmdline, we put
in "ima_hash=sha256 ima_template=ima-ng". The expected behavior would be
that ima starts with ima-ng as the template and sha256 as the hash
algorithm. However, during the processing of "ima_hash=",
"ima_template=" has not been processed yet, and hash_setup would check
the configured hash algorithm against the compiled default: ima, and
reject sha256. So at the end, the hash algorithm that is actually used
will be sha1.
With template "ima" removed from the configured default, we ensure that
the default tempalte would at least be "ima-ng" which allows for
basically any hash algorithm.
This change would not break the algorithm compatibility checks for IMA.
Fixes: 4286587dccd43 ("ima: add Kconfig default measurement list template")
Signed-off-by: GUO Zihua <guozihua(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index f3a9cc201c8c..7249f16257c7 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -69,10 +69,9 @@ choice
hash, defined as 20 bytes, and a null terminated pathname,
limited to 255 characters. The 'ima-ng' measurement list
template permits both larger hash digests and longer
- pathnames.
+ pathnames. The configured default template can be replaced
+ by specifying "ima_template=" on the boot command line.
- config IMA_TEMPLATE
- bool "ima"
config IMA_NG_TEMPLATE
bool "ima-ng (default)"
config IMA_SIG_TEMPLATE
@@ -82,7 +81,6 @@ endchoice
config IMA_DEFAULT_TEMPLATE
string
depends on IMA
- default "ima" if IMA_TEMPLATE
default "ima-ng" if IMA_NG_TEMPLATE
default "ima-sig" if IMA_SIG_TEMPLATE
@@ -102,19 +100,19 @@ choice
config IMA_DEFAULT_HASH_SHA256
bool "SHA256"
- depends on CRYPTO_SHA256=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA256=y
config IMA_DEFAULT_HASH_SHA512
bool "SHA512"
- depends on CRYPTO_SHA512=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA512=y
config IMA_DEFAULT_HASH_WP512
bool "WP512"
- depends on CRYPTO_WP512=y && !IMA_TEMPLATE
+ depends on CRYPTO_WP512=y
config IMA_DEFAULT_HASH_SM3
bool "SM3"
- depends on CRYPTO_SM3=y && !IMA_TEMPLATE
+ depends on CRYPTO_SM3=y
endchoice
config IMA_DEFAULT_HASH
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 891163adf180bc369b2f11c9dfce6d2758d2a5bd Mon Sep 17 00:00:00 2001
From: GUO Zihua <guozihua(a)huawei.com>
Date: Thu, 7 Apr 2022 10:16:19 +0800
Subject: [PATCH] ima: remove the IMA_TEMPLATE Kconfig option
The original 'ima' measurement list template contains a hash, defined
as 20 bytes, and a null terminated pathname, limited to 255
characters. Other measurement list templates permit both larger hashes
and longer pathnames. When the "ima" template is configured as the
default, a new measurement list template (ima_template=) must be
specified before specifying a larger hash algorithm (ima_hash=) on the
boot command line.
To avoid this boot command line ordering issue, remove the legacy "ima"
template configuration option, allowing it to still be specified on the
boot command line.
The root cause of this issue is that during the processing of ima_hash,
we would try to check whether the hash algorithm is compatible with the
template. If the template is not set at the moment we do the check, we
check the algorithm against the configured default template. If the
default template is "ima", then we reject any hash algorithm other than
sha1 and md5.
For example, if the compiled default template is "ima", and the default
algorithm is sha1 (which is the current default). In the cmdline, we put
in "ima_hash=sha256 ima_template=ima-ng". The expected behavior would be
that ima starts with ima-ng as the template and sha256 as the hash
algorithm. However, during the processing of "ima_hash=",
"ima_template=" has not been processed yet, and hash_setup would check
the configured hash algorithm against the compiled default: ima, and
reject sha256. So at the end, the hash algorithm that is actually used
will be sha1.
With template "ima" removed from the configured default, we ensure that
the default tempalte would at least be "ima-ng" which allows for
basically any hash algorithm.
This change would not break the algorithm compatibility checks for IMA.
Fixes: 4286587dccd43 ("ima: add Kconfig default measurement list template")
Signed-off-by: GUO Zihua <guozihua(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index f3a9cc201c8c..7249f16257c7 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -69,10 +69,9 @@ choice
hash, defined as 20 bytes, and a null terminated pathname,
limited to 255 characters. The 'ima-ng' measurement list
template permits both larger hash digests and longer
- pathnames.
+ pathnames. The configured default template can be replaced
+ by specifying "ima_template=" on the boot command line.
- config IMA_TEMPLATE
- bool "ima"
config IMA_NG_TEMPLATE
bool "ima-ng (default)"
config IMA_SIG_TEMPLATE
@@ -82,7 +81,6 @@ endchoice
config IMA_DEFAULT_TEMPLATE
string
depends on IMA
- default "ima" if IMA_TEMPLATE
default "ima-ng" if IMA_NG_TEMPLATE
default "ima-sig" if IMA_SIG_TEMPLATE
@@ -102,19 +100,19 @@ choice
config IMA_DEFAULT_HASH_SHA256
bool "SHA256"
- depends on CRYPTO_SHA256=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA256=y
config IMA_DEFAULT_HASH_SHA512
bool "SHA512"
- depends on CRYPTO_SHA512=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA512=y
config IMA_DEFAULT_HASH_WP512
bool "WP512"
- depends on CRYPTO_WP512=y && !IMA_TEMPLATE
+ depends on CRYPTO_WP512=y
config IMA_DEFAULT_HASH_SM3
bool "SM3"
- depends on CRYPTO_SM3=y && !IMA_TEMPLATE
+ depends on CRYPTO_SM3=y
endchoice
config IMA_DEFAULT_HASH
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 891163adf180bc369b2f11c9dfce6d2758d2a5bd Mon Sep 17 00:00:00 2001
From: GUO Zihua <guozihua(a)huawei.com>
Date: Thu, 7 Apr 2022 10:16:19 +0800
Subject: [PATCH] ima: remove the IMA_TEMPLATE Kconfig option
The original 'ima' measurement list template contains a hash, defined
as 20 bytes, and a null terminated pathname, limited to 255
characters. Other measurement list templates permit both larger hashes
and longer pathnames. When the "ima" template is configured as the
default, a new measurement list template (ima_template=) must be
specified before specifying a larger hash algorithm (ima_hash=) on the
boot command line.
To avoid this boot command line ordering issue, remove the legacy "ima"
template configuration option, allowing it to still be specified on the
boot command line.
The root cause of this issue is that during the processing of ima_hash,
we would try to check whether the hash algorithm is compatible with the
template. If the template is not set at the moment we do the check, we
check the algorithm against the configured default template. If the
default template is "ima", then we reject any hash algorithm other than
sha1 and md5.
For example, if the compiled default template is "ima", and the default
algorithm is sha1 (which is the current default). In the cmdline, we put
in "ima_hash=sha256 ima_template=ima-ng". The expected behavior would be
that ima starts with ima-ng as the template and sha256 as the hash
algorithm. However, during the processing of "ima_hash=",
"ima_template=" has not been processed yet, and hash_setup would check
the configured hash algorithm against the compiled default: ima, and
reject sha256. So at the end, the hash algorithm that is actually used
will be sha1.
With template "ima" removed from the configured default, we ensure that
the default tempalte would at least be "ima-ng" which allows for
basically any hash algorithm.
This change would not break the algorithm compatibility checks for IMA.
Fixes: 4286587dccd43 ("ima: add Kconfig default measurement list template")
Signed-off-by: GUO Zihua <guozihua(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index f3a9cc201c8c..7249f16257c7 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -69,10 +69,9 @@ choice
hash, defined as 20 bytes, and a null terminated pathname,
limited to 255 characters. The 'ima-ng' measurement list
template permits both larger hash digests and longer
- pathnames.
+ pathnames. The configured default template can be replaced
+ by specifying "ima_template=" on the boot command line.
- config IMA_TEMPLATE
- bool "ima"
config IMA_NG_TEMPLATE
bool "ima-ng (default)"
config IMA_SIG_TEMPLATE
@@ -82,7 +81,6 @@ endchoice
config IMA_DEFAULT_TEMPLATE
string
depends on IMA
- default "ima" if IMA_TEMPLATE
default "ima-ng" if IMA_NG_TEMPLATE
default "ima-sig" if IMA_SIG_TEMPLATE
@@ -102,19 +100,19 @@ choice
config IMA_DEFAULT_HASH_SHA256
bool "SHA256"
- depends on CRYPTO_SHA256=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA256=y
config IMA_DEFAULT_HASH_SHA512
bool "SHA512"
- depends on CRYPTO_SHA512=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA512=y
config IMA_DEFAULT_HASH_WP512
bool "WP512"
- depends on CRYPTO_WP512=y && !IMA_TEMPLATE
+ depends on CRYPTO_WP512=y
config IMA_DEFAULT_HASH_SM3
bool "SM3"
- depends on CRYPTO_SM3=y && !IMA_TEMPLATE
+ depends on CRYPTO_SM3=y
endchoice
config IMA_DEFAULT_HASH
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 891163adf180bc369b2f11c9dfce6d2758d2a5bd Mon Sep 17 00:00:00 2001
From: GUO Zihua <guozihua(a)huawei.com>
Date: Thu, 7 Apr 2022 10:16:19 +0800
Subject: [PATCH] ima: remove the IMA_TEMPLATE Kconfig option
The original 'ima' measurement list template contains a hash, defined
as 20 bytes, and a null terminated pathname, limited to 255
characters. Other measurement list templates permit both larger hashes
and longer pathnames. When the "ima" template is configured as the
default, a new measurement list template (ima_template=) must be
specified before specifying a larger hash algorithm (ima_hash=) on the
boot command line.
To avoid this boot command line ordering issue, remove the legacy "ima"
template configuration option, allowing it to still be specified on the
boot command line.
The root cause of this issue is that during the processing of ima_hash,
we would try to check whether the hash algorithm is compatible with the
template. If the template is not set at the moment we do the check, we
check the algorithm against the configured default template. If the
default template is "ima", then we reject any hash algorithm other than
sha1 and md5.
For example, if the compiled default template is "ima", and the default
algorithm is sha1 (which is the current default). In the cmdline, we put
in "ima_hash=sha256 ima_template=ima-ng". The expected behavior would be
that ima starts with ima-ng as the template and sha256 as the hash
algorithm. However, during the processing of "ima_hash=",
"ima_template=" has not been processed yet, and hash_setup would check
the configured hash algorithm against the compiled default: ima, and
reject sha256. So at the end, the hash algorithm that is actually used
will be sha1.
With template "ima" removed from the configured default, we ensure that
the default tempalte would at least be "ima-ng" which allows for
basically any hash algorithm.
This change would not break the algorithm compatibility checks for IMA.
Fixes: 4286587dccd43 ("ima: add Kconfig default measurement list template")
Signed-off-by: GUO Zihua <guozihua(a)huawei.com>
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index f3a9cc201c8c..7249f16257c7 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -69,10 +69,9 @@ choice
hash, defined as 20 bytes, and a null terminated pathname,
limited to 255 characters. The 'ima-ng' measurement list
template permits both larger hash digests and longer
- pathnames.
+ pathnames. The configured default template can be replaced
+ by specifying "ima_template=" on the boot command line.
- config IMA_TEMPLATE
- bool "ima"
config IMA_NG_TEMPLATE
bool "ima-ng (default)"
config IMA_SIG_TEMPLATE
@@ -82,7 +81,6 @@ endchoice
config IMA_DEFAULT_TEMPLATE
string
depends on IMA
- default "ima" if IMA_TEMPLATE
default "ima-ng" if IMA_NG_TEMPLATE
default "ima-sig" if IMA_SIG_TEMPLATE
@@ -102,19 +100,19 @@ choice
config IMA_DEFAULT_HASH_SHA256
bool "SHA256"
- depends on CRYPTO_SHA256=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA256=y
config IMA_DEFAULT_HASH_SHA512
bool "SHA512"
- depends on CRYPTO_SHA512=y && !IMA_TEMPLATE
+ depends on CRYPTO_SHA512=y
config IMA_DEFAULT_HASH_WP512
bool "WP512"
- depends on CRYPTO_WP512=y && !IMA_TEMPLATE
+ depends on CRYPTO_WP512=y
config IMA_DEFAULT_HASH_SM3
bool "SM3"
- depends on CRYPTO_SM3=y && !IMA_TEMPLATE
+ depends on CRYPTO_SM3=y
endchoice
config IMA_DEFAULT_HASH
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bc5e683c67d94bd839a1da2e796c15847b51b69 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:44 +0200
Subject: [PATCH] bfq: Split shared queues on move between cgroups
When bfqq is shared by multiple processes it can happen that one of the
processes gets moved to a different cgroup (or just starts submitting IO
for different cgroup). In case that happens we need to split the merged
bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
we will just account IO time to wrong entities etc.
Similarly if the bfqq is scheduled to merge with another bfqq but the
merge didn't happen yet, cancel the merge as it need not be valid
anymore.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 420eda2589c0..9352f3cc2377 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
}
if (sync_bfqq) {
- entity = &sync_bfqq->entity;
- if (entity->sched_data != &bfqg->sched_data)
- bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
+ /* We are the only user of this bfqq, just move it */
+ if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
+ bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ } else {
+ struct bfq_queue *bfqq;
+
+ /*
+ * The queue was merged to a different queue. Check
+ * that the merge chain still belongs to the same
+ * cgroup.
+ */
+ for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
+ if (bfqq->entity.sched_data !=
+ &bfqg->sched_data)
+ break;
+ if (bfqq) {
+ /*
+ * Some queue changed cgroup so the merge is
+ * not valid anymore. We cannot easily just
+ * cancel the merge (by clearing new_bfqq) as
+ * there may be other processes using this
+ * queue and holding refs to all queues below
+ * sync_bfqq->new_bfqq. Similarly if the merge
+ * already happened, we need to detach from
+ * bfqq now so that we cannot merge bio to a
+ * request from the old cgroup.
+ */
+ bfq_put_cooperator(sync_bfqq);
+ bfq_release_process_ref(bfqd, sync_bfqq);
+ bic_set_bfqq(bic, NULL, 1);
+ }
+ }
}
return bfqg;
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7d00b21ebe5d..89fe3f85eb3c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5315,7 +5315,7 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq)
bfq_put_queue(bfqq);
}
-static void bfq_put_cooperator(struct bfq_queue *bfqq)
+void bfq_put_cooperator(struct bfq_queue *bfqq)
{
struct bfq_queue *__bfqq, *next;
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..a56763045d19 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -979,6 +979,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool compensate, enum bfqq_expiration reason);
void bfq_put_queue(struct bfq_queue *bfqq);
+void bfq_put_cooperator(struct bfq_queue *bfqq);
void bfq_end_wr_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_schedule_dispatch(struct bfq_data *bfqd);
From: Alexander Usyskin <alexander.usyskin(a)intel.com>
Drop HBM responses also in the early shutdown phase where
the usual traffic is allowed.
Extend the rule that drop HBM responses received during the shutdown
phase by also in MEI_DEV_POWERING_DOWN state.
This resolves the stall if the driver is stopping in the middle
of the link initialization or link reset.
Drop the capabilities response on early shutdown.
Fixes: 6d7163f2c49f ("mei: hbm: drop hbm responses on early shutdown")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Alexander Usyskin <alexander.usyskin(a)intel.com>
Signed-off-by: Tomas Winkler <tomas.winkler(a)intel.com>
---
drivers/misc/mei/hbm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/misc/mei/hbm.c b/drivers/misc/mei/hbm.c
index cebcca6d6d3e..cf2b8261da14 100644
--- a/drivers/misc/mei/hbm.c
+++ b/drivers/misc/mei/hbm.c
@@ -1351,7 +1351,8 @@ int mei_hbm_dispatch(struct mei_device *dev, struct mei_msg_hdr *hdr)
if (dev->dev_state != MEI_DEV_INIT_CLIENTS ||
dev->hbm_state != MEI_HBM_CAP_SETUP) {
- if (dev->dev_state == MEI_DEV_POWER_DOWN) {
+ if (dev->dev_state == MEI_DEV_POWER_DOWN ||
+ dev->dev_state == MEI_DEV_POWERING_DOWN) {
dev_dbg(dev->dev, "hbm: capabilities response: on shutdown, ignoring\n");
return 0;
}
--
2.35.3
When ext4_xattr_block_set() decides to remove xattr block the following
race can happen:
CPU1 CPU2
ext4_xattr_block_set() ext4_xattr_release_block()
new_bh = ext4_xattr_block_cache_find()
lock_buffer(bh);
ref = le32_to_cpu(BHDR(bh)->h_refcount);
if (ref == 1) {
...
mb_cache_entry_delete();
unlock_buffer(bh);
ext4_free_blocks();
...
ext4_forget(..., bh, ...);
jbd2_journal_revoke(..., bh);
ext4_journal_get_write_access(..., new_bh, ...)
do_get_write_access()
jbd2_journal_cancel_revoke(..., new_bh);
Later the code in ext4_xattr_block_set() finds out the block got freed
and cancels reusal of the block but the revoke stays canceled and so in
case of block reuse and journal replay the filesystem can get corrupted.
If the race works out slightly differently, we can also hit assertions
in the jbd2 code. Fix the problem by waiting for users of the mbcache
entry (so xattr block reuse gets canceled) before we free xattr block to
prevent the issue with racing ext4_journal_get_write_access() call.
CC: stable(a)vger.kernel.org
Fixes: 82939d7999df ("ext4: convert to mbcache2")
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/xattr.c | 40 +++++++++++++++++++++++++++++++++-------
1 file changed, 33 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 042325349098..4eeeb3db618f 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -1241,17 +1241,29 @@ ext4_xattr_release_block(handle_t *handle, struct inode *inode,
hash = le32_to_cpu(BHDR(bh)->h_hash);
ref = le32_to_cpu(BHDR(bh)->h_refcount);
if (ref == 1) {
+ struct mb_cache_entry *ce = NULL;
+
ea_bdebug(bh, "refcount now=0; freeing");
/*
* This must happen under buffer lock for
* ext4_xattr_block_set() to reliably detect freed block
*/
if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bh->b_blocknr);
+ ce = mb_cache_entry_invalidate(ea_block_cache, hash,
+ bh->b_blocknr);
get_bh(bh);
unlock_buffer(bh);
+ if (ce) {
+ /*
+ * Wait for outstanding users of xattr entry so that we
+ * know they don't try to reuse xattr block before we
+ * free it - that revokes the block from the journal
+ * which upsets jbd2_journal_get_write_access()
+ */
+ mb_cache_entry_wait_unused(ce);
+ mb_cache_entry_put(ea_block_cache, ce);
+ }
if (ext4_has_feature_ea_inode(inode->i_sb))
ext4_xattr_inode_dec_ref_all(handle, inode, bh,
BFIRST(bh),
@@ -1847,7 +1859,7 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
struct buffer_head *new_bh = NULL;
struct ext4_xattr_search s_copy = bs->s;
struct ext4_xattr_search *s = &s_copy;
- struct mb_cache_entry *ce = NULL;
+ struct mb_cache_entry *ce = NULL, *old_ce = NULL;
int error = 0;
struct mb_cache *ea_block_cache = EA_BLOCK_CACHE(inode);
struct inode *ea_inode = NULL, *tmp_inode;
@@ -1871,11 +1883,15 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
/*
* This must happen under buffer lock for
* ext4_xattr_block_set() to reliably detect modified
- * block
+ * block. We keep ourselves entry reference so that
+ * we can wait for outstanding users of the entry
+ * before freeing the xattr block.
*/
- if (ea_block_cache)
- mb_cache_entry_delete(ea_block_cache, hash,
- bs->bh->b_blocknr);
+ if (ea_block_cache) {
+ old_ce = mb_cache_entry_invalidate(
+ ea_block_cache, hash,
+ bs->bh->b_blocknr);
+ }
ea_bdebug(bs->bh, "modifying in-place");
error = ext4_xattr_set_entry(i, s, handle, inode,
true /* is_block */);
@@ -2127,6 +2143,14 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
if (bs->bh && bs->bh != new_bh) {
struct ext4_xattr_inode_array *ea_inode_array = NULL;
+ /*
+ * Wait for outstanding users of xattr entry so that we know
+ * they don't try to reuse xattr block before we free it - that
+ * revokes the block from the journal which upsets
+ * jbd2_journal_get_write_access()
+ */
+ if (old_ce)
+ mb_cache_entry_wait_unused(old_ce);
ext4_xattr_release_block(handle, inode, bs->bh,
&ea_inode_array,
0 /* extra_credits */);
@@ -2151,6 +2175,8 @@ ext4_xattr_block_set(handle_t *handle, struct inode *inode,
}
if (ce)
mb_cache_entry_put(ea_block_cache, ce);
+ if (old_ce)
+ mb_cache_entry_put(ea_block_cache, old_ce);
brelse(new_bh);
if (!(bs->bh && s->base == bs->bh->b_data))
kfree(s->base);
--
2.35.3
This series back-ports two bug fixes, which are now in v5.19-rc1 but
do not back-port cleanly to older releases. This commit applies
cleanly to v5.17.13, v5.15.45, and v5.10.120.
-Alex
Alex Elder (2):
net: ipa: fix page free in ipa_endpoint_trans_release()
net: ipa: fix page free in ipa_endpoint_replenish_one()
drivers/net/ipa/ipa_endpoint.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--
2.32.0
Hello my dear.,
I sent this mail praying it will get to you in a good condition of
health, since I myself are in a very critical health condition in
which I sleep every night without knowing if I may be alive to see the
next day. I bring peace and love to you.. It is by the grace of God, I
had no choice than to do what is lawful and right in the sight of God
for eternal life and in the sight of man, for witness of God’s mercy
and glory upon my life. I am Mrs. Dina Howley Mckenna, a widow. I am
suffering from a long time brain tumor, It has defiled all forms of
medical treatment, and right now I have about a few months to leave,
according to medical experts. The situation has gotten complicated
recently with my inability to hear proper, am communicating with you
with the help of the chief nurse herein the hospital, from all
indication my conditions is really deteriorating and it is quite
obvious that, according to my doctors they have advised me that I may
not live too long, Because this illness has gotten to a very bad
stage. I plead that you will not expose or betray this trust and
confidence that I am about to repose on you for the mutual benefit of
the orphans and the less privilege. I have some funds I inherited from
my late husband, the sum of ($ 11,000,000.00, Eleven Million Dollars).
Having known my condition, I decided to donate this fund to you
believing that you will utilize it the way i am going to instruct
herein. I need you to assist me and reclaim this money and use it for
Charity works therein your country for orphanages and gives justice
and help to the poor, needy and widows says The Lord." Jeremiah
22:15-16.“ and also build schools for less privilege that will be
named after my late husband if possible and to promote the word of God
and the effort that the house of God is maintained. I do not want a
situation where this money will be used in an ungodly manner. That's
why I'm taking this decision. I'm not afraid of death, so I know where
I'm going. I accept this decision because I do not have any child who
will inherit this money after I die.. Please I want your sincerely and
urgent answer to know if you will be able to execute this project for
the glory of God, and I will give you more information on how the fund
will be transferred to your bank account. May the grace, peace, love
and the truth in the Word of God be with you and all those that you
love and care for..
I'm waiting for your immediate reply..
May God Bless you.,
Mrs. Dina Howley Mckenna..
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 083084df578a8bdb18334f69e7b32d690aaa3247 Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t(a)gmail.com>
Date: Thu, 24 Mar 2022 02:04:55 +0900
Subject: [PATCH] mtd: cfi_cmdset_0002: Move and rename
chip_check/chip_ready/chip_good_for_write
This is a preparation patch for the S29GL064N buffer writes fix. There
is no functional change.
Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.…
Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t(a)gmail.com>
Cc: stable(a)vger.kernel.org
Acked-by: Vignesh Raghavendra <vigneshr(a)ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220323170458.5608-2-ikegami.t@gmail.com
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..9ccde90dc180 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -802,21 +802,25 @@ static struct mtd_info *cfi_amdstd_setup(struct mtd_info *mtd)
}
/*
- * Return true if the chip is ready.
+ * Return true if the chip is ready and has the correct value.
*
* Ready is one of: read mode, query mode, erase-suspend-read mode (in any
* non-suspended sector) and is indicated by no toggle bits toggling.
*
+ * Error are indicated by toggling bits or bits held with the wrong value,
+ * or with bits toggling.
+ *
* Note that anything more complicated than checking if no bits are toggling
* (including checking DQ5 for an error status) is tricky to get working
* correctly and is therefore not done (particularly with interleaved chips
* as each chip must be checked independently of the others).
*/
static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
- unsigned long addr)
+ unsigned long addr, map_word *expected)
{
struct cfi_private *cfi = map->fldrv_priv;
map_word d, t;
+ int ret;
if (cfi_use_status_reg(cfi)) {
map_word ready = CMD(CFI_SR_DRB);
@@ -826,57 +830,20 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
*/
cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
cfi->device_type, NULL);
- d = map_read(map, addr);
+ t = map_read(map, addr);
- return map_word_andequal(map, d, ready, ready);
+ return map_word_andequal(map, t, ready, ready);
}
d = map_read(map, addr);
t = map_read(map, addr);
- return map_word_equal(map, d, t);
-}
-
-/*
- * Return true if the chip is ready and has the correct value.
- *
- * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
- * non-suspended sector) and it is indicated by no bits toggling.
- *
- * Error are indicated by toggling bits or bits held with the wrong value,
- * or with bits toggling.
- *
- * Note that anything more complicated than checking if no bits are toggling
- * (including checking DQ5 for an error status) is tricky to get working
- * correctly and is therefore not done (particularly with interleaved chips
- * as each chip must be checked independently of the others).
- *
- */
-static int __xipram chip_good(struct map_info *map, struct flchip *chip,
- unsigned long addr, map_word expected)
-{
- struct cfi_private *cfi = map->fldrv_priv;
- map_word oldd, curd;
-
- if (cfi_use_status_reg(cfi)) {
- map_word ready = CMD(CFI_SR_DRB);
-
- /*
- * For chips that support status register, check device
- * ready bit
- */
- cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
- cfi->device_type, NULL);
- curd = map_read(map, addr);
-
- return map_word_andequal(map, curd, ready, ready);
- }
+ ret = map_word_equal(map, d, t);
- oldd = map_read(map, addr);
- curd = map_read(map, addr);
+ if (!ret || !expected)
+ return ret;
- return map_word_equal(map, oldd, curd) &&
- map_word_equal(map, curd, expected);
+ return map_word_equal(map, t, *expected);
}
static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -893,7 +860,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
case FL_STATUS:
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -932,7 +899,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
chip->state = FL_ERASE_SUSPENDING;
chip->erase_suspended = 1;
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1463,7 +1430,7 @@ static int do_otp_lock(struct map_info *map, struct flchip *chip, loff_t adr,
/* wait for chip to become ready */
timeo = jiffies + msecs_to_jiffies(2);
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1695,11 +1662,11 @@ static int __xipram do_write_oneword_once(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
xip_enable(map, chip, adr);
printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
xip_disable(map, chip, adr);
@@ -1707,7 +1674,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -1975,18 +1942,18 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
__func__, adr);
ret = -EIO;
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2195,7 +2162,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
* If the driver thinks the chip is idle, and no toggle bits
* are changing, then the chip is actually idle for sure.
*/
- if (chip->state == FL_READY && chip_ready(map, chip, adr))
+ if (chip->state == FL_READY && chip_ready(map, chip, adr, NULL))
return 0;
/*
@@ -2212,7 +2179,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
/* wait for the chip to become ready */
for (i = 0; i < jiffies_to_usecs(timeo); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
return 0;
udelay(1);
@@ -2276,13 +2243,13 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip,
map_write(map, datum, adr);
for (i = 0; i < jiffies_to_usecs(uWriteTimeout); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
udelay(1);
}
- if (!chip_good(map, chip, adr, datum) ||
+ if (!chip_ready(map, chip, adr, &datum) ||
cfi_check_err_status(map, chip, adr)) {
/* reset on all failures. */
map_write(map, CMD(0xF0), chip->start);
@@ -2424,6 +2391,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr = cfi->addr_unlock1;
@@ -2478,7 +2446,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2523,6 +2491,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr += chip->start;
@@ -2577,7 +2546,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2771,7 +2740,7 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map,
*/
timeo = jiffies + msecs_to_jiffies(2000); /* 2s max (un)locking */
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 083084df578a8bdb18334f69e7b32d690aaa3247 Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t(a)gmail.com>
Date: Thu, 24 Mar 2022 02:04:55 +0900
Subject: [PATCH] mtd: cfi_cmdset_0002: Move and rename
chip_check/chip_ready/chip_good_for_write
This is a preparation patch for the S29GL064N buffer writes fix. There
is no functional change.
Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.…
Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t(a)gmail.com>
Cc: stable(a)vger.kernel.org
Acked-by: Vignesh Raghavendra <vigneshr(a)ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220323170458.5608-2-ikegami.t@gmail.com
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..9ccde90dc180 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -802,21 +802,25 @@ static struct mtd_info *cfi_amdstd_setup(struct mtd_info *mtd)
}
/*
- * Return true if the chip is ready.
+ * Return true if the chip is ready and has the correct value.
*
* Ready is one of: read mode, query mode, erase-suspend-read mode (in any
* non-suspended sector) and is indicated by no toggle bits toggling.
*
+ * Error are indicated by toggling bits or bits held with the wrong value,
+ * or with bits toggling.
+ *
* Note that anything more complicated than checking if no bits are toggling
* (including checking DQ5 for an error status) is tricky to get working
* correctly and is therefore not done (particularly with interleaved chips
* as each chip must be checked independently of the others).
*/
static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
- unsigned long addr)
+ unsigned long addr, map_word *expected)
{
struct cfi_private *cfi = map->fldrv_priv;
map_word d, t;
+ int ret;
if (cfi_use_status_reg(cfi)) {
map_word ready = CMD(CFI_SR_DRB);
@@ -826,57 +830,20 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
*/
cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
cfi->device_type, NULL);
- d = map_read(map, addr);
+ t = map_read(map, addr);
- return map_word_andequal(map, d, ready, ready);
+ return map_word_andequal(map, t, ready, ready);
}
d = map_read(map, addr);
t = map_read(map, addr);
- return map_word_equal(map, d, t);
-}
-
-/*
- * Return true if the chip is ready and has the correct value.
- *
- * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
- * non-suspended sector) and it is indicated by no bits toggling.
- *
- * Error are indicated by toggling bits or bits held with the wrong value,
- * or with bits toggling.
- *
- * Note that anything more complicated than checking if no bits are toggling
- * (including checking DQ5 for an error status) is tricky to get working
- * correctly and is therefore not done (particularly with interleaved chips
- * as each chip must be checked independently of the others).
- *
- */
-static int __xipram chip_good(struct map_info *map, struct flchip *chip,
- unsigned long addr, map_word expected)
-{
- struct cfi_private *cfi = map->fldrv_priv;
- map_word oldd, curd;
-
- if (cfi_use_status_reg(cfi)) {
- map_word ready = CMD(CFI_SR_DRB);
-
- /*
- * For chips that support status register, check device
- * ready bit
- */
- cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
- cfi->device_type, NULL);
- curd = map_read(map, addr);
-
- return map_word_andequal(map, curd, ready, ready);
- }
+ ret = map_word_equal(map, d, t);
- oldd = map_read(map, addr);
- curd = map_read(map, addr);
+ if (!ret || !expected)
+ return ret;
- return map_word_equal(map, oldd, curd) &&
- map_word_equal(map, curd, expected);
+ return map_word_equal(map, t, *expected);
}
static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -893,7 +860,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
case FL_STATUS:
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -932,7 +899,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
chip->state = FL_ERASE_SUSPENDING;
chip->erase_suspended = 1;
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1463,7 +1430,7 @@ static int do_otp_lock(struct map_info *map, struct flchip *chip, loff_t adr,
/* wait for chip to become ready */
timeo = jiffies + msecs_to_jiffies(2);
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1695,11 +1662,11 @@ static int __xipram do_write_oneword_once(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
xip_enable(map, chip, adr);
printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
xip_disable(map, chip, adr);
@@ -1707,7 +1674,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -1975,18 +1942,18 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
__func__, adr);
ret = -EIO;
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2195,7 +2162,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
* If the driver thinks the chip is idle, and no toggle bits
* are changing, then the chip is actually idle for sure.
*/
- if (chip->state == FL_READY && chip_ready(map, chip, adr))
+ if (chip->state == FL_READY && chip_ready(map, chip, adr, NULL))
return 0;
/*
@@ -2212,7 +2179,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
/* wait for the chip to become ready */
for (i = 0; i < jiffies_to_usecs(timeo); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
return 0;
udelay(1);
@@ -2276,13 +2243,13 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip,
map_write(map, datum, adr);
for (i = 0; i < jiffies_to_usecs(uWriteTimeout); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
udelay(1);
}
- if (!chip_good(map, chip, adr, datum) ||
+ if (!chip_ready(map, chip, adr, &datum) ||
cfi_check_err_status(map, chip, adr)) {
/* reset on all failures. */
map_write(map, CMD(0xF0), chip->start);
@@ -2424,6 +2391,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr = cfi->addr_unlock1;
@@ -2478,7 +2446,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2523,6 +2491,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr += chip->start;
@@ -2577,7 +2546,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2771,7 +2740,7 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map,
*/
timeo = jiffies + msecs_to_jiffies(2000); /* 2s max (un)locking */
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 083084df578a8bdb18334f69e7b32d690aaa3247 Mon Sep 17 00:00:00 2001
From: Tokunori Ikegami <ikegami.t(a)gmail.com>
Date: Thu, 24 Mar 2022 02:04:55 +0900
Subject: [PATCH] mtd: cfi_cmdset_0002: Move and rename
chip_check/chip_ready/chip_good_for_write
This is a preparation patch for the S29GL064N buffer writes fix. There
is no functional change.
Link: https://lore.kernel.org/r/b687c259-6413-26c9-d4c9-b3afa69ea124@pengutronix.…
Fixes: dfeae1073583("mtd: cfi_cmdset_0002: Change write buffer to check correct value")
Signed-off-by: Tokunori Ikegami <ikegami.t(a)gmail.com>
Cc: stable(a)vger.kernel.org
Acked-by: Vignesh Raghavendra <vigneshr(a)ti.com>
Signed-off-by: Miquel Raynal <miquel.raynal(a)bootlin.com>
Link: https://lore.kernel.org/linux-mtd/20220323170458.5608-2-ikegami.t@gmail.com
diff --git a/drivers/mtd/chips/cfi_cmdset_0002.c b/drivers/mtd/chips/cfi_cmdset_0002.c
index a761134fd3be..9ccde90dc180 100644
--- a/drivers/mtd/chips/cfi_cmdset_0002.c
+++ b/drivers/mtd/chips/cfi_cmdset_0002.c
@@ -802,21 +802,25 @@ static struct mtd_info *cfi_amdstd_setup(struct mtd_info *mtd)
}
/*
- * Return true if the chip is ready.
+ * Return true if the chip is ready and has the correct value.
*
* Ready is one of: read mode, query mode, erase-suspend-read mode (in any
* non-suspended sector) and is indicated by no toggle bits toggling.
*
+ * Error are indicated by toggling bits or bits held with the wrong value,
+ * or with bits toggling.
+ *
* Note that anything more complicated than checking if no bits are toggling
* (including checking DQ5 for an error status) is tricky to get working
* correctly and is therefore not done (particularly with interleaved chips
* as each chip must be checked independently of the others).
*/
static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
- unsigned long addr)
+ unsigned long addr, map_word *expected)
{
struct cfi_private *cfi = map->fldrv_priv;
map_word d, t;
+ int ret;
if (cfi_use_status_reg(cfi)) {
map_word ready = CMD(CFI_SR_DRB);
@@ -826,57 +830,20 @@ static int __xipram chip_ready(struct map_info *map, struct flchip *chip,
*/
cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
cfi->device_type, NULL);
- d = map_read(map, addr);
+ t = map_read(map, addr);
- return map_word_andequal(map, d, ready, ready);
+ return map_word_andequal(map, t, ready, ready);
}
d = map_read(map, addr);
t = map_read(map, addr);
- return map_word_equal(map, d, t);
-}
-
-/*
- * Return true if the chip is ready and has the correct value.
- *
- * Ready is one of: read mode, query mode, erase-suspend-read mode (in any
- * non-suspended sector) and it is indicated by no bits toggling.
- *
- * Error are indicated by toggling bits or bits held with the wrong value,
- * or with bits toggling.
- *
- * Note that anything more complicated than checking if no bits are toggling
- * (including checking DQ5 for an error status) is tricky to get working
- * correctly and is therefore not done (particularly with interleaved chips
- * as each chip must be checked independently of the others).
- *
- */
-static int __xipram chip_good(struct map_info *map, struct flchip *chip,
- unsigned long addr, map_word expected)
-{
- struct cfi_private *cfi = map->fldrv_priv;
- map_word oldd, curd;
-
- if (cfi_use_status_reg(cfi)) {
- map_word ready = CMD(CFI_SR_DRB);
-
- /*
- * For chips that support status register, check device
- * ready bit
- */
- cfi_send_gen_cmd(0x70, cfi->addr_unlock1, chip->start, map, cfi,
- cfi->device_type, NULL);
- curd = map_read(map, addr);
-
- return map_word_andequal(map, curd, ready, ready);
- }
+ ret = map_word_equal(map, d, t);
- oldd = map_read(map, addr);
- curd = map_read(map, addr);
+ if (!ret || !expected)
+ return ret;
- return map_word_equal(map, oldd, curd) &&
- map_word_equal(map, curd, expected);
+ return map_word_equal(map, t, *expected);
}
static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr, int mode)
@@ -893,7 +860,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
case FL_STATUS:
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -932,7 +899,7 @@ static int get_chip(struct map_info *map, struct flchip *chip, unsigned long adr
chip->state = FL_ERASE_SUSPENDING;
chip->erase_suspended = 1;
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1463,7 +1430,7 @@ static int do_otp_lock(struct map_info *map, struct flchip *chip, loff_t adr,
/* wait for chip to become ready */
timeo = jiffies + msecs_to_jiffies(2);
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
@@ -1695,11 +1662,11 @@ static int __xipram do_write_oneword_once(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
xip_enable(map, chip, adr);
printk(KERN_WARNING "MTD %s(): software timeout\n", __func__);
xip_disable(map, chip, adr);
@@ -1707,7 +1674,7 @@ static int __xipram do_write_oneword_once(struct map_info *map,
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -1975,18 +1942,18 @@ static int __xipram do_write_buffer_wait(struct map_info *map,
}
/*
- * We check "time_after" and "!chip_good" before checking
- * "chip_good" to avoid the failure due to scheduling.
+ * We check "time_after" and "!chip_ready" before checking
+ * "chip_ready" to avoid the failure due to scheduling.
*/
if (time_after(jiffies, timeo) &&
- !chip_good(map, chip, adr, datum)) {
+ !chip_ready(map, chip, adr, &datum)) {
pr_err("MTD %s(): software timeout, address:0x%.8lx.\n",
__func__, adr);
ret = -EIO;
break;
}
- if (chip_good(map, chip, adr, datum)) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2195,7 +2162,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
* If the driver thinks the chip is idle, and no toggle bits
* are changing, then the chip is actually idle for sure.
*/
- if (chip->state == FL_READY && chip_ready(map, chip, adr))
+ if (chip->state == FL_READY && chip_ready(map, chip, adr, NULL))
return 0;
/*
@@ -2212,7 +2179,7 @@ static int cfi_amdstd_panic_wait(struct map_info *map, struct flchip *chip,
/* wait for the chip to become ready */
for (i = 0; i < jiffies_to_usecs(timeo); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
return 0;
udelay(1);
@@ -2276,13 +2243,13 @@ static int do_panic_write_oneword(struct map_info *map, struct flchip *chip,
map_write(map, datum, adr);
for (i = 0; i < jiffies_to_usecs(uWriteTimeout); i++) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
udelay(1);
}
- if (!chip_good(map, chip, adr, datum) ||
+ if (!chip_ready(map, chip, adr, &datum) ||
cfi_check_err_status(map, chip, adr)) {
/* reset on all failures. */
map_write(map, CMD(0xF0), chip->start);
@@ -2424,6 +2391,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr = cfi->addr_unlock1;
@@ -2478,7 +2446,7 @@ static int __xipram do_erase_chip(struct map_info *map, struct flchip *chip)
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2523,6 +2491,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
DECLARE_WAITQUEUE(wait, current);
int ret;
int retry_cnt = 0;
+ map_word datum = map_word_ff(map);
adr += chip->start;
@@ -2577,7 +2546,7 @@ static int __xipram do_erase_oneblock(struct map_info *map, struct flchip *chip,
chip->erase_suspended = 0;
}
- if (chip_good(map, chip, adr, map_word_ff(map))) {
+ if (chip_ready(map, chip, adr, &datum)) {
if (cfi_check_err_status(map, chip, adr))
ret = -EIO;
break;
@@ -2771,7 +2740,7 @@ static int __maybe_unused do_ppb_xxlock(struct map_info *map,
*/
timeo = jiffies + msecs_to_jiffies(2000); /* 2s max (un)locking */
for (;;) {
- if (chip_ready(map, chip, adr))
+ if (chip_ready(map, chip, adr, NULL))
break;
if (time_after(jiffies, timeo)) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ea917819d12fed41ea4662cc26ffa0060a5c354 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Fri, 20 May 2022 12:46:00 +0300
Subject: [PATCH] drm/i915/dsi: fix VBT send packet port selection for ICL+
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The VBT send packet port selection was never updated for ICL+ where the
2nd link is on port B instead of port C as in VLV+ DSI.
First, single link DSI needs to use the configured port instead of
relying on the VBT sequence block port. Remove the hard-coded port C
check here and make it generic. For reference, see commit f915084edc5a
("drm/i915: Changes related to the sequence port no for") for the
original VLV specific fix.
Second, the sequence block port number is either 0 or 1, where 1
indicates the 2nd link. Remove the hard-coded port C here for 2nd
link. (This could be a "find second set bit" on DSI ports, but just
check the two possible options.)
Third, sanity check the result with a warning to avoid a NULL pointer
dereference.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5984
Cc: stable(a)vger.kernel.org # v4.19+
Cc: Ville Syrjala <ville.syrjala(a)linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220520094600.2066945-1-jani…
(cherry picked from commit 08c59dde71b73a0ac94e3ed2d431345b01f20485)
diff --git a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
index f370e9c4350d..dd24aef925f2 100644
--- a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
+++ b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
@@ -125,9 +125,25 @@ struct i2c_adapter_lookup {
#define ICL_GPIO_DDPA_CTRLCLK_2 8
#define ICL_GPIO_DDPA_CTRLDATA_2 9
-static enum port intel_dsi_seq_port_to_port(u8 port)
+static enum port intel_dsi_seq_port_to_port(struct intel_dsi *intel_dsi,
+ u8 seq_port)
{
- return port ? PORT_C : PORT_A;
+ /*
+ * If single link DSI is being used on any port, the VBT sequence block
+ * send packet apparently always has 0 for the port. Just use the port
+ * we have configured, and ignore the sequence block port.
+ */
+ if (hweight8(intel_dsi->ports) == 1)
+ return ffs(intel_dsi->ports) - 1;
+
+ if (seq_port) {
+ if (intel_dsi->ports & PORT_B)
+ return PORT_B;
+ else if (intel_dsi->ports & PORT_C)
+ return PORT_C;
+ }
+
+ return PORT_A;
}
static const u8 *mipi_exec_send_packet(struct intel_dsi *intel_dsi,
@@ -149,15 +165,10 @@ static const u8 *mipi_exec_send_packet(struct intel_dsi *intel_dsi,
seq_port = (flags >> MIPI_PORT_SHIFT) & 3;
- /* For DSI single link on Port A & C, the seq_port value which is
- * parsed from Sequence Block#53 of VBT has been set to 0
- * Now, read/write of packets for the DSI single link on Port A and
- * Port C will based on the DVO port from VBT block 2.
- */
- if (intel_dsi->ports == (1 << PORT_C))
- port = PORT_C;
- else
- port = intel_dsi_seq_port_to_port(seq_port);
+ port = intel_dsi_seq_port_to_port(intel_dsi, seq_port);
+
+ if (drm_WARN_ON(&dev_priv->drm, !intel_dsi->dsi_hosts[port]))
+ goto out;
dsi_device = intel_dsi->dsi_hosts[port]->device;
if (!dsi_device) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0ea917819d12fed41ea4662cc26ffa0060a5c354 Mon Sep 17 00:00:00 2001
From: Jani Nikula <jani.nikula(a)intel.com>
Date: Fri, 20 May 2022 12:46:00 +0300
Subject: [PATCH] drm/i915/dsi: fix VBT send packet port selection for ICL+
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The VBT send packet port selection was never updated for ICL+ where the
2nd link is on port B instead of port C as in VLV+ DSI.
First, single link DSI needs to use the configured port instead of
relying on the VBT sequence block port. Remove the hard-coded port C
check here and make it generic. For reference, see commit f915084edc5a
("drm/i915: Changes related to the sequence port no for") for the
original VLV specific fix.
Second, the sequence block port number is either 0 or 1, where 1
indicates the 2nd link. Remove the hard-coded port C here for 2nd
link. (This could be a "find second set bit" on DSI ports, but just
check the two possible options.)
Third, sanity check the result with a warning to avoid a NULL pointer
dereference.
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5984
Cc: stable(a)vger.kernel.org # v4.19+
Cc: Ville Syrjala <ville.syrjala(a)linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula(a)intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20220520094600.2066945-1-jani…
(cherry picked from commit 08c59dde71b73a0ac94e3ed2d431345b01f20485)
diff --git a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
index f370e9c4350d..dd24aef925f2 100644
--- a/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
+++ b/drivers/gpu/drm/i915/display/intel_dsi_vbt.c
@@ -125,9 +125,25 @@ struct i2c_adapter_lookup {
#define ICL_GPIO_DDPA_CTRLCLK_2 8
#define ICL_GPIO_DDPA_CTRLDATA_2 9
-static enum port intel_dsi_seq_port_to_port(u8 port)
+static enum port intel_dsi_seq_port_to_port(struct intel_dsi *intel_dsi,
+ u8 seq_port)
{
- return port ? PORT_C : PORT_A;
+ /*
+ * If single link DSI is being used on any port, the VBT sequence block
+ * send packet apparently always has 0 for the port. Just use the port
+ * we have configured, and ignore the sequence block port.
+ */
+ if (hweight8(intel_dsi->ports) == 1)
+ return ffs(intel_dsi->ports) - 1;
+
+ if (seq_port) {
+ if (intel_dsi->ports & PORT_B)
+ return PORT_B;
+ else if (intel_dsi->ports & PORT_C)
+ return PORT_C;
+ }
+
+ return PORT_A;
}
static const u8 *mipi_exec_send_packet(struct intel_dsi *intel_dsi,
@@ -149,15 +165,10 @@ static const u8 *mipi_exec_send_packet(struct intel_dsi *intel_dsi,
seq_port = (flags >> MIPI_PORT_SHIFT) & 3;
- /* For DSI single link on Port A & C, the seq_port value which is
- * parsed from Sequence Block#53 of VBT has been set to 0
- * Now, read/write of packets for the DSI single link on Port A and
- * Port C will based on the DVO port from VBT block 2.
- */
- if (intel_dsi->ports == (1 << PORT_C))
- port = PORT_C;
- else
- port = intel_dsi_seq_port_to_port(seq_port);
+ port = intel_dsi_seq_port_to_port(intel_dsi, seq_port);
+
+ if (drm_WARN_ON(&dev_priv->drm, !intel_dsi->dsi_hosts[port]))
+ goto out;
dsi_device = intel_dsi->dsi_hosts[port]->device;
if (!dsi_device) {
When we compile-in the CCI along with the imx412 driver and run on the RB5
we see that i2c_add_adapter() causes the probe of the imx412 driver to
happen.
This probe tries to perform an i2c xfer() and the xfer() in i2c-qcom-cci.c
fails on pm_runtime_get() because the i2c-qcom-cci.c::probe() function has
not completed to pm_runtime_enable(dev).
Fix this sequence by ensuring pm_runtime_xxx() calls happen prior to adding
the i2c adapter.
Fixes: e517526195de ("i2c: Add Qualcomm CCI I2C driver")
Reported-by: Vladimir Zapolskiy <vladimir.zapolskiy(a)linaro.org>
Reviewed-by: Vladimir Zapolskiy <vladimir.zapolskiy(a)linaro.org>
Tested-by: Vladimir Zapolskiy <vladimir.zapolskiy(a)linaro.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/i2c/busses/i2c-qcom-cci.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/drivers/i2c/busses/i2c-qcom-cci.c b/drivers/i2c/busses/i2c-qcom-cci.c
index 5c7cc862f08f..8d078bdb5c1b 100644
--- a/drivers/i2c/busses/i2c-qcom-cci.c
+++ b/drivers/i2c/busses/i2c-qcom-cci.c
@@ -638,6 +638,11 @@ static int cci_probe(struct platform_device *pdev)
if (ret < 0)
goto error;
+ pm_runtime_set_autosuspend_delay(dev, MSEC_PER_SEC);
+ pm_runtime_use_autosuspend(dev);
+ pm_runtime_set_active(dev);
+ pm_runtime_enable(dev);
+
for (i = 0; i < cci->data->num_masters; i++) {
if (!cci->master[i].cci)
continue;
@@ -649,14 +654,12 @@ static int cci_probe(struct platform_device *pdev)
}
}
- pm_runtime_set_autosuspend_delay(dev, MSEC_PER_SEC);
- pm_runtime_use_autosuspend(dev);
- pm_runtime_set_active(dev);
- pm_runtime_enable(dev);
-
return 0;
error_i2c:
+ pm_runtime_disable(dev);
+ pm_runtime_dont_use_autosuspend(dev);
+
for (--i ; i >= 0; i--) {
if (cci->master[i].cci) {
i2c_del_adapter(&cci->master[i].adap);
--
2.36.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 2 Jun 2022 10:12:42 +0200
Subject: [PATCH] block: fix bio_clone_blkg_association() to associate with
proper blkcg_gq
Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reported-and-tested-by: Donald Buczek <buczek(a)molgen.mpg.de>
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..764e740b0c0f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1974,12 +1974,8 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
*/
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
- if (src->bi_blkg) {
- if (dst->bi_blkg)
- blkg_put(dst->bi_blkg);
- blkg_get(src->bi_blkg);
- dst->bi_blkg = src->bi_blkg;
- }
+ if (src->bi_blkg)
+ bio_associate_blkg_from_css(dst, bio_blkcg_css(src));
}
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 2 Jun 2022 10:12:42 +0200
Subject: [PATCH] block: fix bio_clone_blkg_association() to associate with
proper blkcg_gq
Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reported-and-tested-by: Donald Buczek <buczek(a)molgen.mpg.de>
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..764e740b0c0f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1974,12 +1974,8 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
*/
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
- if (src->bi_blkg) {
- if (dst->bi_blkg)
- blkg_put(dst->bi_blkg);
- blkg_get(src->bi_blkg);
- dst->bi_blkg = src->bi_blkg;
- }
+ if (src->bi_blkg)
+ bio_associate_blkg_from_css(dst, bio_blkcg_css(src));
}
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 2 Jun 2022 10:12:42 +0200
Subject: [PATCH] block: fix bio_clone_blkg_association() to associate with
proper blkcg_gq
Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reported-and-tested-by: Donald Buczek <buczek(a)molgen.mpg.de>
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..764e740b0c0f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1974,12 +1974,8 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
*/
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
- if (src->bi_blkg) {
- if (dst->bi_blkg)
- blkg_put(dst->bi_blkg);
- blkg_get(src->bi_blkg);
- dst->bi_blkg = src->bi_blkg;
- }
+ if (src->bi_blkg)
+ bio_associate_blkg_from_css(dst, bio_blkcg_css(src));
}
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
The patch below does not apply to the 5.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 22b106e5355d6e7a9c3b5cb5ed4ef22ae585ea94 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 2 Jun 2022 10:12:42 +0200
Subject: [PATCH] block: fix bio_clone_blkg_association() to associate with
proper blkcg_gq
Commit d92c370a16cb ("block: really clone the block cgroup in
bio_clone_blkg_association") changed bio_clone_blkg_association() to
just clone bio->bi_blkg reference from source to destination bio. This
is however wrong if the source and destination bios are against
different block devices because struct blkcg_gq is different for each
bdev-blkcg pair. This will result in IOs being accounted (and throttled
as a result) multiple times against the same device (src bdev) while
throttling of the other device (dst bdev) is ignored. In case of BFQ the
inconsistency can even result in crashes in bfq_bic_update_cgroup().
Fix the problem by looking up correct blkcg_gq for the cloned bio.
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reported-and-tested-by: Donald Buczek <buczek(a)molgen.mpg.de>
Fixes: d92c370a16cb ("block: really clone the block cgroup in bio_clone_blkg_association")
CC: stable(a)vger.kernel.org
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220602081242.7731-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 40161a3f68d0..764e740b0c0f 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1974,12 +1974,8 @@ EXPORT_SYMBOL_GPL(bio_associate_blkg);
*/
void bio_clone_blkg_association(struct bio *dst, struct bio *src)
{
- if (src->bi_blkg) {
- if (dst->bi_blkg)
- blkg_put(dst->bi_blkg);
- blkg_get(src->bi_blkg);
- dst->bi_blkg = src->bi_blkg;
- }
+ if (src->bi_blkg)
+ bio_associate_blkg_from_css(dst, bio_blkcg_css(src));
}
EXPORT_SYMBOL_GPL(bio_clone_blkg_association);
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 5.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3a761d72fa62eec8913e45d29375344f61706541 Mon Sep 17 00:00:00 2001
From: Christian Brauner <brauner(a)kernel.org>
Date: Mon, 4 Apr 2022 12:51:41 +0200
Subject: [PATCH] exportfs: support idmapped mounts
Make the two locations where exportfs helpers check permission to lookup
a given inode idmapped mount aware by switching it to the lookup_one()
helper. This is a bugfix for the open_by_handle_at() system call which
doesn't take idmapped mounts into account currently. It's not tied to a
specific commit so we'll just Cc stable.
In addition this is required to support idmapped base layers in overlay.
The overlay filesystem uses exportfs to encode and decode file handles
for its index=on mount option and when nfs_export=on.
Cc: <stable(a)vger.kernel.org>
Cc: <linux-fsdevel(a)vger.kernel.org>
Tested-by: Giuseppe Scrivano <gscrivan(a)redhat.com>
Reviewed-by: Amir Goldstein <amir73il(a)gmail.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Christian Brauner (Microsoft) <brauner(a)kernel.org>
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/exportfs/expfs.c b/fs/exportfs/expfs.c
index 0106eba46d5a..3ef80d000e13 100644
--- a/fs/exportfs/expfs.c
+++ b/fs/exportfs/expfs.c
@@ -145,7 +145,7 @@ static struct dentry *reconnect_one(struct vfsmount *mnt,
if (err)
goto out_err;
dprintk("%s: found name: %s\n", __func__, nbuf);
- tmp = lookup_one_len_unlocked(nbuf, parent, strlen(nbuf));
+ tmp = lookup_one_unlocked(mnt_user_ns(mnt), nbuf, parent, strlen(nbuf));
if (IS_ERR(tmp)) {
dprintk("%s: lookup failed: %d\n", __func__, PTR_ERR(tmp));
err = PTR_ERR(tmp);
@@ -525,7 +525,8 @@ exportfs_decode_fh_raw(struct vfsmount *mnt, struct fid *fid, int fh_len,
}
inode_lock(target_dir->d_inode);
- nresult = lookup_one_len(nbuf, target_dir, strlen(nbuf));
+ nresult = lookup_one(mnt_user_ns(mnt), nbuf,
+ target_dir, strlen(nbuf));
if (!IS_ERR(nresult)) {
if (unlikely(nresult->d_inode != result->d_inode)) {
dput(nresult);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b954ebba296bb2eb2e38322f17aaa6426934bd7e Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 20:52:35 +0900
Subject: [PATCH] zonefs: Clear inode information flags on inode creation
Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3614c7834007..75d8dabe0807 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1142,6 +1142,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
inode_init_once(&zi->i_vnode);
mutex_init(&zi->i_truncate_mutex);
zi->i_wr_refcnt = 0;
+ zi->i_flags = 0;
return &zi->i_vnode;
}
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b954ebba296bb2eb2e38322f17aaa6426934bd7e Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 20:52:35 +0900
Subject: [PATCH] zonefs: Clear inode information flags on inode creation
Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3614c7834007..75d8dabe0807 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1142,6 +1142,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
inode_init_once(&zi->i_vnode);
mutex_init(&zi->i_truncate_mutex);
zi->i_wr_refcnt = 0;
+ zi->i_flags = 0;
return &zi->i_vnode;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b954ebba296bb2eb2e38322f17aaa6426934bd7e Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 20:52:35 +0900
Subject: [PATCH] zonefs: Clear inode information flags on inode creation
Ensure that the i_flags field of struct zonefs_inode_info is cleared to
0 when initializing a zone file inode, avoiding seeing the flag
ZONEFS_ZONE_OPEN being incorrectly set.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Chaitanya Kulkarni <kch(a)nvidia.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 3614c7834007..75d8dabe0807 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -1142,6 +1142,7 @@ static struct inode *zonefs_alloc_inode(struct super_block *sb)
inode_init_once(&zi->i_vnode);
mutex_init(&zi->i_truncate_mutex);
zi->i_wr_refcnt = 0;
+ zi->i_flags = 0;
return &zi->i_vnode;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 19139539207934aef6335bdef09c9e4bd70d1808 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 17:41:37 +0900
Subject: [PATCH] zonefs: Fix management of open zones
The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.
However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.
Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 75d8dabe0807..e20e7c841489 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -35,6 +35,17 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
lockdep_assert_held(&zi->i_truncate_mutex);
+ /*
+ * With ZNS drives, closing an explicitly open zone that has not been
+ * written will change the zone state to "closed", that is, the zone
+ * will remain active. Since this can then cause failure of explicit
+ * open operation on other zones if the drive active zone resources
+ * are exceeded, make sure that the zone does not remain active by
+ * resetting it.
+ */
+ if (op == REQ_OP_ZONE_CLOSE && !zi->i_wpoffset)
+ op = REQ_OP_ZONE_RESET;
+
trace_zonefs_zone_mgmt(inode, op);
ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
@@ -1294,12 +1305,13 @@ static void zonefs_init_dir_inode(struct inode *parent, struct inode *inode,
inc_nlink(parent);
}
-static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
- enum zonefs_ztype type)
+static int zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
+ enum zonefs_ztype type)
{
struct super_block *sb = inode->i_sb;
struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ int ret = 0;
inode->i_ino = zone->start >> sbi->s_zone_sectors_shift;
inode->i_mode = S_IFREG | sbi->s_perm;
@@ -1324,6 +1336,22 @@ static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
sb->s_maxbytes = max(zi->i_max_size, sb->s_maxbytes);
sbi->s_blocks += zi->i_max_size >> sb->s_blocksize_bits;
sbi->s_used_blocks += zi->i_wpoffset >> sb->s_blocksize_bits;
+
+ /*
+ * For sequential zones, make sure that any open zone is closed first
+ * to ensure that the initial number of open zones is 0, in sync with
+ * the open zone accounting done when the mount option
+ * ZONEFS_MNTOPT_EXPLICIT_OPEN is used.
+ */
+ if (type == ZONEFS_ZTYPE_SEQ &&
+ (zone->cond == BLK_ZONE_COND_IMP_OPEN ||
+ zone->cond == BLK_ZONE_COND_EXP_OPEN)) {
+ mutex_lock(&zi->i_truncate_mutex);
+ ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);
+ mutex_unlock(&zi->i_truncate_mutex);
+ }
+
+ return ret;
}
static struct dentry *zonefs_create_inode(struct dentry *parent,
@@ -1333,6 +1361,7 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
struct inode *dir = d_inode(parent);
struct dentry *dentry;
struct inode *inode;
+ int ret;
dentry = d_alloc_name(parent, name);
if (!dentry)
@@ -1343,10 +1372,16 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
goto dput;
inode->i_ctime = inode->i_mtime = inode->i_atime = dir->i_ctime;
- if (zone)
- zonefs_init_file_inode(inode, zone, type);
- else
+ if (zone) {
+ ret = zonefs_init_file_inode(inode, zone, type);
+ if (ret) {
+ iput(inode);
+ goto dput;
+ }
+ } else {
zonefs_init_dir_inode(dir, inode, type);
+ }
+
d_add(dentry, inode);
dir->i_size++;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 19139539207934aef6335bdef09c9e4bd70d1808 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 17:41:37 +0900
Subject: [PATCH] zonefs: Fix management of open zones
The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.
However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.
Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 75d8dabe0807..e20e7c841489 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -35,6 +35,17 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
lockdep_assert_held(&zi->i_truncate_mutex);
+ /*
+ * With ZNS drives, closing an explicitly open zone that has not been
+ * written will change the zone state to "closed", that is, the zone
+ * will remain active. Since this can then cause failure of explicit
+ * open operation on other zones if the drive active zone resources
+ * are exceeded, make sure that the zone does not remain active by
+ * resetting it.
+ */
+ if (op == REQ_OP_ZONE_CLOSE && !zi->i_wpoffset)
+ op = REQ_OP_ZONE_RESET;
+
trace_zonefs_zone_mgmt(inode, op);
ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
@@ -1294,12 +1305,13 @@ static void zonefs_init_dir_inode(struct inode *parent, struct inode *inode,
inc_nlink(parent);
}
-static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
- enum zonefs_ztype type)
+static int zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
+ enum zonefs_ztype type)
{
struct super_block *sb = inode->i_sb;
struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ int ret = 0;
inode->i_ino = zone->start >> sbi->s_zone_sectors_shift;
inode->i_mode = S_IFREG | sbi->s_perm;
@@ -1324,6 +1336,22 @@ static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
sb->s_maxbytes = max(zi->i_max_size, sb->s_maxbytes);
sbi->s_blocks += zi->i_max_size >> sb->s_blocksize_bits;
sbi->s_used_blocks += zi->i_wpoffset >> sb->s_blocksize_bits;
+
+ /*
+ * For sequential zones, make sure that any open zone is closed first
+ * to ensure that the initial number of open zones is 0, in sync with
+ * the open zone accounting done when the mount option
+ * ZONEFS_MNTOPT_EXPLICIT_OPEN is used.
+ */
+ if (type == ZONEFS_ZTYPE_SEQ &&
+ (zone->cond == BLK_ZONE_COND_IMP_OPEN ||
+ zone->cond == BLK_ZONE_COND_EXP_OPEN)) {
+ mutex_lock(&zi->i_truncate_mutex);
+ ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);
+ mutex_unlock(&zi->i_truncate_mutex);
+ }
+
+ return ret;
}
static struct dentry *zonefs_create_inode(struct dentry *parent,
@@ -1333,6 +1361,7 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
struct inode *dir = d_inode(parent);
struct dentry *dentry;
struct inode *inode;
+ int ret;
dentry = d_alloc_name(parent, name);
if (!dentry)
@@ -1343,10 +1372,16 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
goto dput;
inode->i_ctime = inode->i_mtime = inode->i_atime = dir->i_ctime;
- if (zone)
- zonefs_init_file_inode(inode, zone, type);
- else
+ if (zone) {
+ ret = zonefs_init_file_inode(inode, zone, type);
+ if (ret) {
+ iput(inode);
+ goto dput;
+ }
+ } else {
zonefs_init_dir_inode(dir, inode, type);
+ }
+
d_add(dentry, inode);
dir->i_size++;
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 19139539207934aef6335bdef09c9e4bd70d1808 Mon Sep 17 00:00:00 2001
From: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Date: Tue, 12 Apr 2022 17:41:37 +0900
Subject: [PATCH] zonefs: Fix management of open zones
The mount option "explicit_open" manages the device open zone
resources to ensure that if an application opens a sequential file for
writing, the file zone can always be written by explicitly opening
the zone and accounting for that state with the s_open_zones counter.
However, if some zones are already open when mounting, the device open
zone resource usage status will be larger than the initial s_open_zones
value of 0. Ensure that this inconsistency does not happen by closing
any sequential zone that is open when mounting.
Furthermore, with ZNS drives, closing an explicitly open zone that has
not been written will change the zone state to "closed", that is, the
zone will remain in an active state. Since this can then cause failures
of explicit open operations on other zones if the drive active zone
resources are exceeded, we need to make sure that the zone is not
active anymore by resetting it instead of closing it. To address this,
zonefs_zone_mgmt() is modified to change a REQ_OP_ZONE_CLOSE request
into a REQ_OP_ZONE_RESET for sequential zones that have not been
written.
Fixes: b5c00e975779 ("zonefs: open/close zone on file open/close")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Damien Le Moal <damien.lemoal(a)opensource.wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn(a)wdc.com>
Reviewed-by: Hans Holmberg <hans.holmberg(a)wdc.com>
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 75d8dabe0807..e20e7c841489 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -35,6 +35,17 @@ static inline int zonefs_zone_mgmt(struct inode *inode,
lockdep_assert_held(&zi->i_truncate_mutex);
+ /*
+ * With ZNS drives, closing an explicitly open zone that has not been
+ * written will change the zone state to "closed", that is, the zone
+ * will remain active. Since this can then cause failure of explicit
+ * open operation on other zones if the drive active zone resources
+ * are exceeded, make sure that the zone does not remain active by
+ * resetting it.
+ */
+ if (op == REQ_OP_ZONE_CLOSE && !zi->i_wpoffset)
+ op = REQ_OP_ZONE_RESET;
+
trace_zonefs_zone_mgmt(inode, op);
ret = blkdev_zone_mgmt(inode->i_sb->s_bdev, op, zi->i_zsector,
zi->i_zone_size >> SECTOR_SHIFT, GFP_NOFS);
@@ -1294,12 +1305,13 @@ static void zonefs_init_dir_inode(struct inode *parent, struct inode *inode,
inc_nlink(parent);
}
-static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
- enum zonefs_ztype type)
+static int zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
+ enum zonefs_ztype type)
{
struct super_block *sb = inode->i_sb;
struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ int ret = 0;
inode->i_ino = zone->start >> sbi->s_zone_sectors_shift;
inode->i_mode = S_IFREG | sbi->s_perm;
@@ -1324,6 +1336,22 @@ static void zonefs_init_file_inode(struct inode *inode, struct blk_zone *zone,
sb->s_maxbytes = max(zi->i_max_size, sb->s_maxbytes);
sbi->s_blocks += zi->i_max_size >> sb->s_blocksize_bits;
sbi->s_used_blocks += zi->i_wpoffset >> sb->s_blocksize_bits;
+
+ /*
+ * For sequential zones, make sure that any open zone is closed first
+ * to ensure that the initial number of open zones is 0, in sync with
+ * the open zone accounting done when the mount option
+ * ZONEFS_MNTOPT_EXPLICIT_OPEN is used.
+ */
+ if (type == ZONEFS_ZTYPE_SEQ &&
+ (zone->cond == BLK_ZONE_COND_IMP_OPEN ||
+ zone->cond == BLK_ZONE_COND_EXP_OPEN)) {
+ mutex_lock(&zi->i_truncate_mutex);
+ ret = zonefs_zone_mgmt(inode, REQ_OP_ZONE_CLOSE);
+ mutex_unlock(&zi->i_truncate_mutex);
+ }
+
+ return ret;
}
static struct dentry *zonefs_create_inode(struct dentry *parent,
@@ -1333,6 +1361,7 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
struct inode *dir = d_inode(parent);
struct dentry *dentry;
struct inode *inode;
+ int ret;
dentry = d_alloc_name(parent, name);
if (!dentry)
@@ -1343,10 +1372,16 @@ static struct dentry *zonefs_create_inode(struct dentry *parent,
goto dput;
inode->i_ctime = inode->i_mtime = inode->i_atime = dir->i_ctime;
- if (zone)
- zonefs_init_file_inode(inode, zone, type);
- else
+ if (zone) {
+ ret = zonefs_init_file_inode(inode, zone, type);
+ if (ret) {
+ iput(inode);
+ goto dput;
+ }
+ } else {
zonefs_init_dir_inode(dir, inode, type);
+ }
+
d_add(dentry, inode);
dir->i_size++;
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba58995909b5098ca4003af65b0ccd5a8d13dd25 Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Wed, 6 Apr 2022 13:34:16 -0400
Subject: [PATCH] dlm: fix pending remove if msg allocation fails
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 137cf09b51e5..f5330e58d1fc 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4100,13 +4100,14 @@ static void send_repeat_remove(struct dlm_ls *ls, char *ms_name, int len)
rv = _create_message(ls, sizeof(struct dlm_message) + len,
dir_nodeid, DLM_MSG_REMOVE, &ms, &mh);
if (rv)
- return;
+ goto out;
memcpy(ms->m_extra, name, len);
ms->m_hash = cpu_to_le32(hash);
send_message(mh, ms);
+out:
spin_lock(&ls->ls_remove_spin);
ls->ls_remove_len = 0;
memset(ls->ls_remove_name, 0, DLM_RESNAME_MAXLEN);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba58995909b5098ca4003af65b0ccd5a8d13dd25 Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Wed, 6 Apr 2022 13:34:16 -0400
Subject: [PATCH] dlm: fix pending remove if msg allocation fails
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 137cf09b51e5..f5330e58d1fc 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4100,13 +4100,14 @@ static void send_repeat_remove(struct dlm_ls *ls, char *ms_name, int len)
rv = _create_message(ls, sizeof(struct dlm_message) + len,
dir_nodeid, DLM_MSG_REMOVE, &ms, &mh);
if (rv)
- return;
+ goto out;
memcpy(ms->m_extra, name, len);
ms->m_hash = cpu_to_le32(hash);
send_message(mh, ms);
+out:
spin_lock(&ls->ls_remove_spin);
ls->ls_remove_len = 0;
memset(ls->ls_remove_name, 0, DLM_RESNAME_MAXLEN);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba58995909b5098ca4003af65b0ccd5a8d13dd25 Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Wed, 6 Apr 2022 13:34:16 -0400
Subject: [PATCH] dlm: fix pending remove if msg allocation fails
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 137cf09b51e5..f5330e58d1fc 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4100,13 +4100,14 @@ static void send_repeat_remove(struct dlm_ls *ls, char *ms_name, int len)
rv = _create_message(ls, sizeof(struct dlm_message) + len,
dir_nodeid, DLM_MSG_REMOVE, &ms, &mh);
if (rv)
- return;
+ goto out;
memcpy(ms->m_extra, name, len);
ms->m_hash = cpu_to_le32(hash);
send_message(mh, ms);
+out:
spin_lock(&ls->ls_remove_spin);
ls->ls_remove_len = 0;
memset(ls->ls_remove_name, 0, DLM_RESNAME_MAXLEN);
The patch below does not apply to the 5.18-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From ba58995909b5098ca4003af65b0ccd5a8d13dd25 Mon Sep 17 00:00:00 2001
From: Alexander Aring <aahringo(a)redhat.com>
Date: Wed, 6 Apr 2022 13:34:16 -0400
Subject: [PATCH] dlm: fix pending remove if msg allocation fails
This patch unsets ls_remove_len and ls_remove_name if a message
allocation of a remove messages fails. In this case we never send a
remove message out but set the per ls ls_remove_len ls_remove_name
variable for a pending remove. Unset those variable should indicate
possible waiters in wait_pending_remove() that no pending remove is
going on at this moment.
Cc: stable(a)vger.kernel.org
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
Signed-off-by: David Teigland <teigland(a)redhat.com>
diff --git a/fs/dlm/lock.c b/fs/dlm/lock.c
index 137cf09b51e5..f5330e58d1fc 100644
--- a/fs/dlm/lock.c
+++ b/fs/dlm/lock.c
@@ -4100,13 +4100,14 @@ static void send_repeat_remove(struct dlm_ls *ls, char *ms_name, int len)
rv = _create_message(ls, sizeof(struct dlm_message) + len,
dir_nodeid, DLM_MSG_REMOVE, &ms, &mh);
if (rv)
- return;
+ goto out;
memcpy(ms->m_extra, name, len);
ms->m_hash = cpu_to_le32(hash);
send_message(mh, ms);
+out:
spin_lock(&ls->ls_remove_spin);
ls->ls_remove_len = 0;
memset(ls->ls_remove_name, 0, DLM_RESNAME_MAXLEN);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 83013631f0f9961416abd812e228c8efbc2f6069 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Fri, 1 Apr 2022 15:38:54 +0200
Subject: [PATCH] PCI: qcom: Fix unbalanced PHY init on probe errors
Undo the PHY initialisation (e.g. balance runtime PM) if host
initialisation fails during probe.
Link: https://lore.kernel.org/r/20220401133854.10421-3-johan+linaro@kernel.org
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 9191be96d627..2e5464edc36e 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1631,11 +1631,13 @@ static int qcom_pcie_probe(struct platform_device *pdev)
ret = dw_pcie_host_init(pp);
if (ret) {
dev_err(dev, "cannot initialize host\n");
- goto err_pm_runtime_put;
+ goto err_phy_exit;
}
return 0;
+err_phy_exit:
+ phy_exit(pcie->phy);
err_pm_runtime_put:
pm_runtime_put(dev);
pm_runtime_disable(dev);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 83013631f0f9961416abd812e228c8efbc2f6069 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Fri, 1 Apr 2022 15:38:54 +0200
Subject: [PATCH] PCI: qcom: Fix unbalanced PHY init on probe errors
Undo the PHY initialisation (e.g. balance runtime PM) if host
initialisation fails during probe.
Link: https://lore.kernel.org/r/20220401133854.10421-3-johan+linaro@kernel.org
Fixes: 82a823833f4e ("PCI: qcom: Add Qualcomm PCIe controller driver")
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Acked-by: Stanimir Varbanov <svarbanov(a)mm-sol.com>
Cc: stable(a)vger.kernel.org # 4.5
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 9191be96d627..2e5464edc36e 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1631,11 +1631,13 @@ static int qcom_pcie_probe(struct platform_device *pdev)
ret = dw_pcie_host_init(pp);
if (ret) {
dev_err(dev, "cannot initialize host\n");
- goto err_pm_runtime_put;
+ goto err_phy_exit;
}
return 0;
+err_phy_exit:
+ phy_exit(pcie->phy);
err_pm_runtime_put:
pm_runtime_put(dev);
pm_runtime_disable(dev);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fdf6a2f533115ec5d4d9629178f8196331f1ac50 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Fri, 1 Apr 2022 15:33:51 +0200
Subject: [PATCH] PCI: qcom: Fix pipe clock imbalance
Fix a clock imbalance introduced by ed8cc3b1fc84 ("PCI: qcom: Add support
for SDM845 PCIe controller"), which enables the pipe clock both in init()
and in post_init() but only disables in post_deinit().
Note that the pipe clock was also never disabled in the init() error
paths and that enabling the clock before powering up the PHY looks
questionable.
Link: https://lore.kernel.org/r/20220401133351.10113-1-johan+linaro@kernel.org
Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller")
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Cc: stable(a)vger.kernel.org # 5.6
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 375f27ab9403..925324dece64 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1238,12 +1238,6 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
goto err_disable_clocks;
}
- ret = clk_prepare_enable(res->pipe_clk);
- if (ret) {
- dev_err(dev, "cannot prepare/enable pipe clock\n");
- goto err_disable_clocks;
- }
-
/* Wait for reset to complete, required on SM8450 */
usleep_range(1000, 1500);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fdf6a2f533115ec5d4d9629178f8196331f1ac50 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Fri, 1 Apr 2022 15:33:51 +0200
Subject: [PATCH] PCI: qcom: Fix pipe clock imbalance
Fix a clock imbalance introduced by ed8cc3b1fc84 ("PCI: qcom: Add support
for SDM845 PCIe controller"), which enables the pipe clock both in init()
and in post_init() but only disables in post_deinit().
Note that the pipe clock was also never disabled in the init() error
paths and that enabling the clock before powering up the PHY looks
questionable.
Link: https://lore.kernel.org/r/20220401133351.10113-1-johan+linaro@kernel.org
Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller")
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Cc: stable(a)vger.kernel.org # 5.6
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 375f27ab9403..925324dece64 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1238,12 +1238,6 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
goto err_disable_clocks;
}
- ret = clk_prepare_enable(res->pipe_clk);
- if (ret) {
- dev_err(dev, "cannot prepare/enable pipe clock\n");
- goto err_disable_clocks;
- }
-
/* Wait for reset to complete, required on SM8450 */
usleep_range(1000, 1500);
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From fdf6a2f533115ec5d4d9629178f8196331f1ac50 Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan+linaro(a)kernel.org>
Date: Fri, 1 Apr 2022 15:33:51 +0200
Subject: [PATCH] PCI: qcom: Fix pipe clock imbalance
Fix a clock imbalance introduced by ed8cc3b1fc84 ("PCI: qcom: Add support
for SDM845 PCIe controller"), which enables the pipe clock both in init()
and in post_init() but only disables in post_deinit().
Note that the pipe clock was also never disabled in the init() error
paths and that enabling the clock before powering up the PHY looks
questionable.
Link: https://lore.kernel.org/r/20220401133351.10113-1-johan+linaro@kernel.org
Fixes: ed8cc3b1fc84 ("PCI: qcom: Add support for SDM845 PCIe controller")
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas(a)google.com>
Reviewed-by: Bjorn Andersson <bjorn.andersson(a)linaro.org>
Cc: stable(a)vger.kernel.org # 5.6
diff --git a/drivers/pci/controller/dwc/pcie-qcom.c b/drivers/pci/controller/dwc/pcie-qcom.c
index 375f27ab9403..925324dece64 100644
--- a/drivers/pci/controller/dwc/pcie-qcom.c
+++ b/drivers/pci/controller/dwc/pcie-qcom.c
@@ -1238,12 +1238,6 @@ static int qcom_pcie_init_2_7_0(struct qcom_pcie *pcie)
goto err_disable_clocks;
}
- ret = clk_prepare_enable(res->pipe_clk);
- if (ret) {
- dev_err(dev, "cannot prepare/enable pipe clock\n");
- goto err_disable_clocks;
- }
-
/* Wait for reset to complete, required on SM8450 */
usleep_range(1000, 1500);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b27f266f74fbda4ee36c2b2b04d15992860cf23b Mon Sep 17 00:00:00 2001
From: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Date: Tue, 3 May 2022 14:05:46 +0900
Subject: [PATCH] tracing: Fix return value of trace_pid_write()
Setting set_event_pid with trailing whitespace lead to endless write
system calls like below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 4
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
write(1, "\n", 1) = 0
....
This is because, the result of trace_get_user's are not returned when it
read at least one pid. To fix it, update read variable even if
parser->idx == 0.
The result of applied patch is below.
$ strace echo "123 " > /sys/kernel/debug/tracing/set_event_pid
execve("/usr/bin/echo", ["echo", "123 "], ...) = 0
...
write(1, "123 \n", 5) = 5
close(1) = 0
Link: https://lkml.kernel.org/r/20220503050546.288911-1-vvghjk1234@gmail.com
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Baik Song An <bsahn(a)etri.re.kr>
Cc: Hong Yeon Kim <kimhy(a)etri.re.kr>
Cc: Taeung Song <taeung(a)reallinux.co.kr>
Cc: linuxgeek(a)linuxgeek.io
Cc: stable(a)vger.kernel.org
Fixes: 4909010788640 ("tracing: Add set_event_pid directory for future use")
Signed-off-by: Wonhyuk Yang <vvghjk1234(a)gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 498ae22d4ffa..4825883b2ffd 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -721,13 +721,16 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
pos = 0;
ret = trace_get_user(&parser, ubuf, cnt, &pos);
- if (ret < 0 || !trace_parser_loaded(&parser))
+ if (ret < 0)
break;
read += ret;
ubuf += ret;
cnt -= ret;
+ if (!trace_parser_loaded(&parser))
+ break;
+
ret = -EINVAL;
if (kstrtoul(parser.buffer, 0, &val))
break;
@@ -753,7 +756,6 @@ int trace_pid_write(struct trace_pid_list *filtered_pids,
if (!nr_pids) {
/* Cleared the list of pids */
trace_pid_list_free(pid_list);
- read = ret;
pid_list = NULL;
}
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 499f12168aebd6da8fa32c9b7d6203ca9b5eb88d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 7 Apr 2022 14:56:32 -0400
Subject: [PATCH] tracing: Have event format check not flag %p* on
__get_dynamic_array()
The print fmt check against trace events to make sure that the format does
not use pointers that may be freed from the time of the trace to the time
the event is read, gives a false positive on %pISpc when reading data that
was saved in __get_dynamic_array() when it is perfectly fine to do so, as
the data being read is on the ring buffer.
Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/
Cc: stable(a)vger.kernel.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 78f313b7b315..d5913487821a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -392,12 +392,6 @@ static void test_event_printk(struct trace_event_call *call)
if (!(dereference_flags & (1ULL << arg)))
goto next_arg;
- /* Check for __get_sockaddr */;
- if (str_has_prefix(fmt + i, "__get_sockaddr(")) {
- dereference_flags &= ~(1ULL << arg);
- goto next_arg;
- }
-
/* Find the REC-> in the argument */
c = strchr(fmt + i, ',');
r = strstr(fmt + i, "REC->");
@@ -413,7 +407,14 @@ static void test_event_printk(struct trace_event_call *call)
a = strchr(fmt + i, '&');
if ((a && (a < r)) || test_field(r, call))
dereference_flags &= ~(1ULL << arg);
+ } else if ((r = strstr(fmt + i, "__get_dynamic_array(")) &&
+ (!c || r < c)) {
+ dereference_flags &= ~(1ULL << arg);
+ } else if ((r = strstr(fmt + i, "__get_sockaddr(")) &&
+ (!c || r < c)) {
+ dereference_flags &= ~(1ULL << arg);
}
+
next_arg:
i--;
arg++;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 499f12168aebd6da8fa32c9b7d6203ca9b5eb88d Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 7 Apr 2022 14:56:32 -0400
Subject: [PATCH] tracing: Have event format check not flag %p* on
__get_dynamic_array()
The print fmt check against trace events to make sure that the format does
not use pointers that may be freed from the time of the trace to the time
the event is read, gives a false positive on %pISpc when reading data that
was saved in __get_dynamic_array() when it is perfectly fine to do so, as
the data being read is on the ring buffer.
Link: https://lore.kernel.org/all/20220407144524.2a592ed6@canb.auug.org.au/
Cc: stable(a)vger.kernel.org
Fixes: 5013f454a352c ("tracing: Add check of trace event print fmts for dereferencing pointers")
Reported-by: Stephen Rothwell <sfr(a)canb.auug.org.au>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events.c b/kernel/trace/trace_events.c
index 78f313b7b315..d5913487821a 100644
--- a/kernel/trace/trace_events.c
+++ b/kernel/trace/trace_events.c
@@ -392,12 +392,6 @@ static void test_event_printk(struct trace_event_call *call)
if (!(dereference_flags & (1ULL << arg)))
goto next_arg;
- /* Check for __get_sockaddr */;
- if (str_has_prefix(fmt + i, "__get_sockaddr(")) {
- dereference_flags &= ~(1ULL << arg);
- goto next_arg;
- }
-
/* Find the REC-> in the argument */
c = strchr(fmt + i, ',');
r = strstr(fmt + i, "REC->");
@@ -413,7 +407,14 @@ static void test_event_printk(struct trace_event_call *call)
a = strchr(fmt + i, '&');
if ((a && (a < r)) || test_field(r, call))
dereference_flags &= ~(1ULL << arg);
+ } else if ((r = strstr(fmt + i, "__get_dynamic_array(")) &&
+ (!c || r < c)) {
+ dereference_flags &= ~(1ULL << arg);
+ } else if ((r = strstr(fmt + i, "__get_sockaddr(")) &&
+ (!c || r < c)) {
+ dereference_flags &= ~(1ULL << arg);
}
+
next_arg:
i--;
arg++;
On each vcpu load, we set the KVM_ARM64_HOST_SVE_ENABLED
flag if SVE is enabled for EL0 on the host. This is used to restore
the correct state on vpcu put.
However, it appears that nothing ever clears this flag. Once
set, it will stick until the vcpu is destroyed, which has the
potential to spuriously enable SVE for userspace.
We probably never saw the issue because no VMM uses SVE, but
that's still pretty bad. Unconditionally clearing the flag
on vcpu load addresses the issue.
Fixes: 8383741ab2e7 ("KVM: arm64: Get rid of host SVE tracking/saving")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
arch/arm64/kvm/fpsimd.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/arm64/kvm/fpsimd.c b/arch/arm64/kvm/fpsimd.c
index 441edb9c398c..3c2cfc3adc51 100644
--- a/arch/arm64/kvm/fpsimd.c
+++ b/arch/arm64/kvm/fpsimd.c
@@ -80,6 +80,7 @@ void kvm_arch_vcpu_load_fp(struct kvm_vcpu *vcpu)
vcpu->arch.flags &= ~KVM_ARM64_FP_ENABLED;
vcpu->arch.flags |= KVM_ARM64_FP_HOST;
+ vcpu->arch.flags &= ~KVM_ARM64_HOST_SVE_ENABLED;
if (read_sysreg(cpacr_el1) & CPACR_EL1_ZEN_EL0EN)
vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED;
--
2.34.1
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d36f6ed761b53933b0b4126486c10d3da7751e7f Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Wed, 18 May 2022 20:08:16 +0800
Subject: [PATCH] ext4: fix bug_on in __es_tree_search
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
ext4_quota_enable fs/ext4/super.c:6137 [inline]
ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
mount_bdev+0x2e9/0x3b0 fs/super.c:1158
mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
ext4_enable_quotas
ext4_quota_enable
ext4_iget
__ext4_iget
ext4_ext_check_inode
ext4_ext_check
__ext4_ext_check
ext4_valid_extent_entries
Check for overlapping extents does't take effect
dquot_enable
vfs_load_quota_inode
v2_check_quota_file
v2_read_header
ext4_quota_read
ext4_bread
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_find_extent
ext4_cache_extents
ext4_es_cache_extent
ext4_es_cache_extent
__es_tree_search
ext4_es_end
BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000 extent_header
00000000 0100 0000 12000000 extent1
00000000 0100 0000 18000000 extent2
02000000 0400 0000 14000000 extent3
In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.
To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent is not less than
the next block of the previous extent.
The same applies to extent_idx.
Cc: stable(a)kernel.org
Fixes: 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 474479ce76e0..c148bb97b527 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -372,7 +372,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
{
unsigned short entries;
ext4_lblk_t lblock = 0;
- ext4_lblk_t prev = 0;
+ ext4_lblk_t cur = 0;
if (eh->eh_entries == 0)
return 1;
@@ -396,11 +396,11 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping extents */
lblock = le32_to_cpu(ext->ee_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_ext_pblock(ext);
return 0;
}
- prev = lblock + ext4_ext_get_actual_len(ext) - 1;
+ cur = lblock + ext4_ext_get_actual_len(ext);
ext++;
entries--;
}
@@ -420,13 +420,13 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping index extents */
lblock = le32_to_cpu(ext_idx->ei_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_idx_pblock(ext_idx);
return 0;
}
ext_idx++;
entries--;
- prev = lblock;
+ cur = lblock + 1;
}
}
return 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d36f6ed761b53933b0b4126486c10d3da7751e7f Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Wed, 18 May 2022 20:08:16 +0800
Subject: [PATCH] ext4: fix bug_on in __es_tree_search
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
ext4_quota_enable fs/ext4/super.c:6137 [inline]
ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
mount_bdev+0x2e9/0x3b0 fs/super.c:1158
mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
ext4_enable_quotas
ext4_quota_enable
ext4_iget
__ext4_iget
ext4_ext_check_inode
ext4_ext_check
__ext4_ext_check
ext4_valid_extent_entries
Check for overlapping extents does't take effect
dquot_enable
vfs_load_quota_inode
v2_check_quota_file
v2_read_header
ext4_quota_read
ext4_bread
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_find_extent
ext4_cache_extents
ext4_es_cache_extent
ext4_es_cache_extent
__es_tree_search
ext4_es_end
BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000 extent_header
00000000 0100 0000 12000000 extent1
00000000 0100 0000 18000000 extent2
02000000 0400 0000 14000000 extent3
In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.
To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent is not less than
the next block of the previous extent.
The same applies to extent_idx.
Cc: stable(a)kernel.org
Fixes: 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 474479ce76e0..c148bb97b527 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -372,7 +372,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
{
unsigned short entries;
ext4_lblk_t lblock = 0;
- ext4_lblk_t prev = 0;
+ ext4_lblk_t cur = 0;
if (eh->eh_entries == 0)
return 1;
@@ -396,11 +396,11 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping extents */
lblock = le32_to_cpu(ext->ee_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_ext_pblock(ext);
return 0;
}
- prev = lblock + ext4_ext_get_actual_len(ext) - 1;
+ cur = lblock + ext4_ext_get_actual_len(ext);
ext++;
entries--;
}
@@ -420,13 +420,13 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping index extents */
lblock = le32_to_cpu(ext_idx->ei_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_idx_pblock(ext_idx);
return 0;
}
ext_idx++;
entries--;
- prev = lblock;
+ cur = lblock + 1;
}
}
return 1;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d36f6ed761b53933b0b4126486c10d3da7751e7f Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Wed, 18 May 2022 20:08:16 +0800
Subject: [PATCH] ext4: fix bug_on in __es_tree_search
Hulk Robot reported a BUG_ON:
==================================================================
kernel BUG at fs/ext4/extents_status.c:199!
[...]
RIP: 0010:ext4_es_end fs/ext4/extents_status.c:199 [inline]
RIP: 0010:__es_tree_search+0x1e0/0x260 fs/ext4/extents_status.c:217
[...]
Call Trace:
ext4_es_cache_extent+0x109/0x340 fs/ext4/extents_status.c:766
ext4_cache_extents+0x239/0x2e0 fs/ext4/extents.c:561
ext4_find_extent+0x6b7/0xa20 fs/ext4/extents.c:964
ext4_ext_map_blocks+0x16b/0x4b70 fs/ext4/extents.c:4384
ext4_map_blocks+0xe26/0x19f0 fs/ext4/inode.c:567
ext4_getblk+0x320/0x4c0 fs/ext4/inode.c:980
ext4_bread+0x2d/0x170 fs/ext4/inode.c:1031
ext4_quota_read+0x248/0x320 fs/ext4/super.c:6257
v2_read_header+0x78/0x110 fs/quota/quota_v2.c:63
v2_check_quota_file+0x76/0x230 fs/quota/quota_v2.c:82
vfs_load_quota_inode+0x5d1/0x1530 fs/quota/dquot.c:2368
dquot_enable+0x28a/0x330 fs/quota/dquot.c:2490
ext4_quota_enable fs/ext4/super.c:6137 [inline]
ext4_enable_quotas+0x5d7/0x960 fs/ext4/super.c:6163
ext4_fill_super+0xa7c9/0xdc00 fs/ext4/super.c:4754
mount_bdev+0x2e9/0x3b0 fs/super.c:1158
mount_fs+0x4b/0x1e4 fs/super.c:1261
[...]
==================================================================
Above issue may happen as follows:
-------------------------------------
ext4_fill_super
ext4_enable_quotas
ext4_quota_enable
ext4_iget
__ext4_iget
ext4_ext_check_inode
ext4_ext_check
__ext4_ext_check
ext4_valid_extent_entries
Check for overlapping extents does't take effect
dquot_enable
vfs_load_quota_inode
v2_check_quota_file
v2_read_header
ext4_quota_read
ext4_bread
ext4_getblk
ext4_map_blocks
ext4_ext_map_blocks
ext4_find_extent
ext4_cache_extents
ext4_es_cache_extent
ext4_es_cache_extent
__es_tree_search
ext4_es_end
BUG_ON(es->es_lblk + es->es_len < es->es_lblk)
The error ext4 extents is as follows:
0af3 0300 0400 0000 00000000 extent_header
00000000 0100 0000 12000000 extent1
00000000 0100 0000 18000000 extent2
02000000 0400 0000 14000000 extent3
In the ext4_valid_extent_entries function,
if prev is 0, no error is returned even if lblock<=prev.
This was intended to skip the check on the first extent, but
in the error image above, prev=0+1-1=0 when checking the second extent,
so even though lblock<=prev, the function does not return an error.
As a result, bug_ON occurs in __es_tree_search and the system panics.
To solve this problem, we only need to check that:
1. The lblock of the first extent is not less than 0.
2. The lblock of the next extent is not less than
the next block of the previous extent.
The same applies to extent_idx.
Cc: stable(a)kernel.org
Fixes: 5946d089379a ("ext4: check for overlapping extents in ext4_valid_extent_entries()")
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220518120816.1541863-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index 474479ce76e0..c148bb97b527 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -372,7 +372,7 @@ static int ext4_valid_extent_entries(struct inode *inode,
{
unsigned short entries;
ext4_lblk_t lblock = 0;
- ext4_lblk_t prev = 0;
+ ext4_lblk_t cur = 0;
if (eh->eh_entries == 0)
return 1;
@@ -396,11 +396,11 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping extents */
lblock = le32_to_cpu(ext->ee_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_ext_pblock(ext);
return 0;
}
- prev = lblock + ext4_ext_get_actual_len(ext) - 1;
+ cur = lblock + ext4_ext_get_actual_len(ext);
ext++;
entries--;
}
@@ -420,13 +420,13 @@ static int ext4_valid_extent_entries(struct inode *inode,
/* Check for overlapping index extents */
lblock = le32_to_cpu(ext_idx->ei_block);
- if ((lblock <= prev) && prev) {
+ if (lblock < cur) {
*pblk = ext4_idx_pblock(ext_idx);
return 0;
}
ext_idx++;
entries--;
- prev = lblock;
+ cur = lblock + 1;
}
}
return 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f87c7a4b084afc13190cbb263538e444cb2b392a Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Thu, 28 Apr 2022 21:40:31 +0800
Subject: [PATCH] ext4: fix race condition between ext4_write and
ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable(a)vger.kernel.org
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e473fde6b64b..474479ce76e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4693,15 +4693,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ goto exit;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(file, offset, len);
goto exit;
}
- ret = ext4_convert_inline_data(inode);
- if (ret)
- goto exit;
-
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(file, offset, len);
goto exit;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5948bbba28e3..890f769d6e20 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3979,15 +3979,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
- ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
- if (ext4_has_inline_data(inode)) {
- filemap_invalidate_lock(mapping);
- ret = ext4_convert_inline_data(inode);
- filemap_invalidate_unlock(mapping);
- if (ret)
- return ret;
- }
-
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f87c7a4b084afc13190cbb263538e444cb2b392a Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Thu, 28 Apr 2022 21:40:31 +0800
Subject: [PATCH] ext4: fix race condition between ext4_write and
ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable(a)vger.kernel.org
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e473fde6b64b..474479ce76e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4693,15 +4693,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ goto exit;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(file, offset, len);
goto exit;
}
- ret = ext4_convert_inline_data(inode);
- if (ret)
- goto exit;
-
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(file, offset, len);
goto exit;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5948bbba28e3..890f769d6e20 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3979,15 +3979,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
- ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
- if (ext4_has_inline_data(inode)) {
- filemap_invalidate_lock(mapping);
- ret = ext4_convert_inline_data(inode);
- filemap_invalidate_unlock(mapping);
- if (ret)
- return ret;
- }
-
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f87c7a4b084afc13190cbb263538e444cb2b392a Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Thu, 28 Apr 2022 21:40:31 +0800
Subject: [PATCH] ext4: fix race condition between ext4_write and
ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable(a)vger.kernel.org
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e473fde6b64b..474479ce76e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4693,15 +4693,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ goto exit;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(file, offset, len);
goto exit;
}
- ret = ext4_convert_inline_data(inode);
- if (ret)
- goto exit;
-
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(file, offset, len);
goto exit;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5948bbba28e3..890f769d6e20 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3979,15 +3979,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
- ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
- if (ext4_has_inline_data(inode)) {
- filemap_invalidate_lock(mapping);
- ret = ext4_convert_inline_data(inode);
- filemap_invalidate_unlock(mapping);
- if (ret)
- return ret;
- }
-
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f87c7a4b084afc13190cbb263538e444cb2b392a Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Thu, 28 Apr 2022 21:40:31 +0800
Subject: [PATCH] ext4: fix race condition between ext4_write and
ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable(a)vger.kernel.org
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e473fde6b64b..474479ce76e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4693,15 +4693,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ goto exit;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(file, offset, len);
goto exit;
}
- ret = ext4_convert_inline_data(inode);
- if (ret)
- goto exit;
-
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(file, offset, len);
goto exit;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5948bbba28e3..890f769d6e20 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3979,15 +3979,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
- ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
- if (ext4_has_inline_data(inode)) {
- filemap_invalidate_lock(mapping);
- ret = ext4_convert_inline_data(inode);
- filemap_invalidate_unlock(mapping);
- if (ret)
- return ret;
- }
-
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f87c7a4b084afc13190cbb263538e444cb2b392a Mon Sep 17 00:00:00 2001
From: Baokun Li <libaokun1(a)huawei.com>
Date: Thu, 28 Apr 2022 21:40:31 +0800
Subject: [PATCH] ext4: fix race condition between ext4_write and
ext4_convert_inline_data
Hulk Robot reported a BUG_ON:
==================================================================
EXT4-fs error (device loop3): ext4_mb_generate_buddy:805: group 0,
block bitmap and bg descriptor inconsistent: 25 vs 31513 free clusters
kernel BUG at fs/ext4/ext4_jbd2.c:53!
invalid opcode: 0000 [#1] SMP KASAN PTI
CPU: 0 PID: 25371 Comm: syz-executor.3 Not tainted 5.10.0+ #1
RIP: 0010:ext4_put_nojournal fs/ext4/ext4_jbd2.c:53 [inline]
RIP: 0010:__ext4_journal_stop+0x10e/0x110 fs/ext4/ext4_jbd2.c:116
[...]
Call Trace:
ext4_write_inline_data_end+0x59a/0x730 fs/ext4/inline.c:795
generic_perform_write+0x279/0x3c0 mm/filemap.c:3344
ext4_buffered_write_iter+0x2e3/0x3d0 fs/ext4/file.c:270
ext4_file_write_iter+0x30a/0x11c0 fs/ext4/file.c:520
do_iter_readv_writev+0x339/0x3c0 fs/read_write.c:732
do_iter_write+0x107/0x430 fs/read_write.c:861
vfs_writev fs/read_write.c:934 [inline]
do_pwritev+0x1e5/0x380 fs/read_write.c:1031
[...]
==================================================================
Above issue may happen as follows:
cpu1 cpu2
__________________________|__________________________
do_pwritev
vfs_writev
do_iter_write
ext4_file_write_iter
ext4_buffered_write_iter
generic_perform_write
ext4_da_write_begin
vfs_fallocate
ext4_fallocate
ext4_convert_inline_data
ext4_convert_inline_data_nolock
ext4_destroy_inline_data_nolock
clear EXT4_STATE_MAY_INLINE_DATA
ext4_map_blocks
ext4_ext_map_blocks
ext4_mb_new_blocks
ext4_mb_regular_allocator
ext4_mb_good_group_nolock
ext4_mb_init_group
ext4_mb_init_cache
ext4_mb_generate_buddy --> error
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_restore_inline_data
set EXT4_STATE_MAY_INLINE_DATA
ext4_block_write_begin
ext4_da_write_end
ext4_test_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA)
ext4_write_inline_data_end
handle=NULL
ext4_journal_stop(handle)
__ext4_journal_stop
ext4_put_nojournal(handle)
ref_cnt = (unsigned long)handle
BUG_ON(ref_cnt == 0) ---> BUG_ON
The lock held by ext4_convert_inline_data is xattr_sem, but the lock
held by generic_perform_write is i_rwsem. Therefore, the two locks can
be concurrent.
To solve above issue, we add inode_lock() for ext4_convert_inline_data().
At the same time, move ext4_convert_inline_data() in front of
ext4_punch_hole(), remove similar handling from ext4_punch_hole().
Fixes: 0c8d414f163f ("ext4: let fallocate handle inline data correctly")
Cc: stable(a)vger.kernel.org
Reported-by: Hulk Robot <hulkci(a)huawei.com>
Signed-off-by: Baokun Li <libaokun1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20220428134031.4153381-1-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index e473fde6b64b..474479ce76e0 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -4693,15 +4693,17 @@ long ext4_fallocate(struct file *file, int mode, loff_t offset, loff_t len)
FALLOC_FL_INSERT_RANGE))
return -EOPNOTSUPP;
+ inode_lock(inode);
+ ret = ext4_convert_inline_data(inode);
+ inode_unlock(inode);
+ if (ret)
+ goto exit;
+
if (mode & FALLOC_FL_PUNCH_HOLE) {
ret = ext4_punch_hole(file, offset, len);
goto exit;
}
- ret = ext4_convert_inline_data(inode);
- if (ret)
- goto exit;
-
if (mode & FALLOC_FL_COLLAPSE_RANGE) {
ret = ext4_collapse_range(file, offset, len);
goto exit;
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5948bbba28e3..890f769d6e20 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3979,15 +3979,6 @@ int ext4_punch_hole(struct file *file, loff_t offset, loff_t length)
trace_ext4_punch_hole(inode, offset, length, 0);
- ext4_clear_inode_state(inode, EXT4_STATE_MAY_INLINE_DATA);
- if (ext4_has_inline_data(inode)) {
- filemap_invalidate_lock(mapping);
- ret = ext4_convert_inline_data(inode);
- filemap_invalidate_unlock(mapping);
- if (ret)
- return ret;
- }
-
/*
* Write out all dirty pages to avoid race conditions
* Then release them.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d63c00ea435a5352f486c259665a4ced60399421 Mon Sep 17 00:00:00 2001
From: Dmitry Monakhov <dmtrmonakhov(a)yandex-team.ru>
Date: Sun, 17 Apr 2022 20:03:15 +0300
Subject: [PATCH] ext4: mark group as trimmed only if it was fully scanned
Otherwise nonaligned fstrim calls will works inconveniently for iterative
scanners, for example:
// trim [0,16MB] for group-1, but mark full group as trimmed
fstrim -o $((1024*1024*128)) -l $((1024*1024*16)) ./m
// handle [16MB,16MB] for group-1, do nothing because group already has the flag.
fstrim -o $((1024*1024*144)) -l $((1024*1024*16)) ./m
[ Update function documentation for ext4_trim_all_free -- TYT ]
Signed-off-by: Dmitry Monakhov <dmtrmonakhov(a)yandex-team.ru>
Link: https://lore.kernel.org/r/1650214995-860245-1-git-send-email-dmtrmonakhov@y…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
Cc: stable(a)kernel.org
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 3e715b837e70..bc90635b757c 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -6395,6 +6395,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
* @start: first group block to examine
* @max: last group block to examine
* @minblocks: minimum extent block count
+ * @set_trimmed: set the trimmed flag if at least one block is trimmed
*
* ext4_trim_all_free walks through group's block bitmap searching for free
* extents. When the free extent is found, mark it as used in group buddy
@@ -6404,7 +6405,7 @@ __releases(ext4_group_lock_ptr(sb, e4b->bd_group))
static ext4_grpblk_t
ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
ext4_grpblk_t start, ext4_grpblk_t max,
- ext4_grpblk_t minblocks)
+ ext4_grpblk_t minblocks, bool set_trimmed)
{
struct ext4_buddy e4b;
int ret;
@@ -6423,7 +6424,7 @@ ext4_trim_all_free(struct super_block *sb, ext4_group_t group,
if (!EXT4_MB_GRP_WAS_TRIMMED(e4b.bd_info) ||
minblocks < EXT4_SB(sb)->s_last_trim_minblks) {
ret = ext4_try_to_trim_range(sb, &e4b, start, max, minblocks);
- if (ret >= 0)
+ if (ret >= 0 && set_trimmed)
EXT4_MB_GRP_SET_TRIMMED(e4b.bd_info);
} else {
ret = 0;
@@ -6460,6 +6461,7 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
ext4_fsblk_t first_data_blk =
le32_to_cpu(EXT4_SB(sb)->s_es->s_first_data_block);
ext4_fsblk_t max_blks = ext4_blocks_count(EXT4_SB(sb)->s_es);
+ bool whole_group, eof = false;
int ret = 0;
start = range->start >> sb->s_blocksize_bits;
@@ -6478,8 +6480,10 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
if (minlen > EXT4_CLUSTERS_PER_GROUP(sb))
goto out;
}
- if (end >= max_blks)
+ if (end >= max_blks - 1) {
end = max_blks - 1;
+ eof = true;
+ }
if (end <= first_data_blk)
goto out;
if (start < first_data_blk)
@@ -6493,6 +6497,7 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
/* end now represents the last cluster to discard in this group */
end = EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ whole_group = true;
for (group = first_group; group <= last_group; group++) {
grp = ext4_get_group_info(sb, group);
@@ -6509,12 +6514,13 @@ int ext4_trim_fs(struct super_block *sb, struct fstrim_range *range)
* change it for the last group, note that last_cluster is
* already computed earlier by ext4_get_group_no_and_offset()
*/
- if (group == last_group)
+ if (group == last_group) {
end = last_cluster;
-
+ whole_group = eof ? true : end == EXT4_CLUSTERS_PER_GROUP(sb) - 1;
+ }
if (grp->bb_free >= minlen) {
cnt = ext4_trim_all_free(sb, group, first_cluster,
- end, minlen);
+ end, minlen, whole_group);
if (cnt < 0) {
ret = cnt;
break;
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream.
Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression
instantiation"), it is possible to attach stateful expressions to set
elements.
cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate
and destroy phase") introduces conditional destruction on the object to
accomodate transaction semantics.
nft_expr_init() calls expr->ops->init() first, then check for
NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful
lookup expressions which points to a set, which might lead to UAF since
the set is not properly detached from the set->binding for this case.
Anyway, this combination is non-sense from nf_tables perspective.
This patch fixes this problem by checking for NFT_STATEFUL_EXPR before
expr->ops->init() is called.
The reporter provides a KASAN splat and a poc reproducer (similar to
those autogenerated by syzbot to report use-after-free errors). It is
unknown to me if they are using syzbot or if they use similar automated
tool to locate the bug that they are reporting.
For the record, this is the KASAN splat.
[ 85.431824] ==================================================================
[ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20
[ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776
[ 85.434756]
[ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2
[ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling")
Reported-and-tested-by: Aaron Adams <edg-e(a)nccgroup.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
[Ajay: Regenerated the patch for v5.4.y]
Signed-off-by: Ajay Kaher <akaher(a)vmware.com>
---
net/netfilter/nf_tables_api.c | 16 ++++++++++------
net/netfilter/nft_dynset.c | 3 ---
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 545da27..b51c192 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2267,27 +2267,31 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
err = nf_tables_expr_parse(ctx, nla, &info);
if (err < 0)
- goto err1;
+ goto err_expr_parse;
+
+ err = -EOPNOTSUPP;
+ if (!(info.ops->type->flags & NFT_EXPR_STATEFUL))
+ goto err_expr_stateful;
err = -ENOMEM;
expr = kzalloc(info.ops->size, GFP_KERNEL);
if (expr == NULL)
- goto err2;
+ goto err_expr_stateful;
err = nf_tables_newexpr(ctx, &info, expr);
if (err < 0)
- goto err3;
+ goto err_expr_new;
return expr;
-err3:
+err_expr_new:
kfree(expr);
-err2:
+err_expr_stateful:
owner = info.ops->type->owner;
if (info.ops->type->release_ops)
info.ops->type->release_ops(info.ops);
module_put(owner);
-err1:
+err_expr_parse:
return ERR_PTR(err);
}
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 6fdea0e..6bcc181 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -204,9 +204,6 @@ static int nft_dynset_init(const struct nft_ctx *ctx,
return PTR_ERR(priv->expr);
err = -EOPNOTSUPP;
- if (!(priv->expr->ops->type->flags & NFT_EXPR_STATEFUL))
- goto err1;
-
if (priv->expr->ops->type->flags & NFT_EXPR_GC) {
if (set->flags & NFT_SET_TIMEOUT)
goto err1;
--
2.7.4
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1f6530c3e373ddd7c76b05646052a27eead14ad Mon Sep 17 00:00:00 2001
From: Johannes Berg <johannes.berg(a)intel.com>
Date: Tue, 17 May 2022 12:05:08 +0300
Subject: [PATCH] iwlwifi: fw: init SAR GEO table only if data is present
When no table data was read from ACPI, then filling the data
and returning success here will fill zero values, which means
transmit power will be limited to 0 dBm. This is clearly not
intended.
Return an error from iwl_sar_geo_init() if there's no data to
fill into the command structure.
Cc: stable(a)vger.kernel.org
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
Fixes: 78a19d5285d9 ("iwlwifi: mvm: Read the PPAG and SAR tables at INIT stage")
Signed-off-by: Gregory Greenman <gregory.greenman(a)intel.com>
Link: https://lore.kernel.org/r/20220517120044.bc45923b74e9.Id2b4362234b7f8ced82c…
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
index 33aae639ad37..e6d64152c81a 100644
--- a/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
+++ b/drivers/net/wireless/intel/iwlwifi/fw/acpi.c
@@ -937,6 +937,9 @@ int iwl_sar_geo_init(struct iwl_fw_runtime *fwrt,
{
int i, j;
+ if (!fwrt->geo_enabled)
+ return -ENODATA;
+
if (!iwl_sar_geo_support(fwrt))
return -EOPNOTSUPP;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3bc5e683c67d94bd839a1da2e796c15847b51b69 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:44 +0200
Subject: [PATCH] bfq: Split shared queues on move between cgroups
When bfqq is shared by multiple processes it can happen that one of the
processes gets moved to a different cgroup (or just starts submitting IO
for different cgroup). In case that happens we need to split the merged
bfqq as otherwise we will have IO for multiple cgroups in one bfqq and
we will just account IO time to wrong entities etc.
Similarly if the bfqq is scheduled to merge with another bfqq but the
merge didn't happen yet, cancel the merge as it need not be valid
anymore.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-3-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 420eda2589c0..9352f3cc2377 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -743,9 +743,39 @@ static struct bfq_group *__bfq_bic_change_cgroup(struct bfq_data *bfqd,
}
if (sync_bfqq) {
- entity = &sync_bfqq->entity;
- if (entity->sched_data != &bfqg->sched_data)
- bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ if (!sync_bfqq->new_bfqq && !bfq_bfqq_coop(sync_bfqq)) {
+ /* We are the only user of this bfqq, just move it */
+ if (sync_bfqq->entity.sched_data != &bfqg->sched_data)
+ bfq_bfqq_move(bfqd, sync_bfqq, bfqg);
+ } else {
+ struct bfq_queue *bfqq;
+
+ /*
+ * The queue was merged to a different queue. Check
+ * that the merge chain still belongs to the same
+ * cgroup.
+ */
+ for (bfqq = sync_bfqq; bfqq; bfqq = bfqq->new_bfqq)
+ if (bfqq->entity.sched_data !=
+ &bfqg->sched_data)
+ break;
+ if (bfqq) {
+ /*
+ * Some queue changed cgroup so the merge is
+ * not valid anymore. We cannot easily just
+ * cancel the merge (by clearing new_bfqq) as
+ * there may be other processes using this
+ * queue and holding refs to all queues below
+ * sync_bfqq->new_bfqq. Similarly if the merge
+ * already happened, we need to detach from
+ * bfqq now so that we cannot merge bio to a
+ * request from the old cgroup.
+ */
+ bfq_put_cooperator(sync_bfqq);
+ bfq_release_process_ref(bfqd, sync_bfqq);
+ bic_set_bfqq(bic, NULL, 1);
+ }
+ }
}
return bfqg;
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c
index 7d00b21ebe5d..89fe3f85eb3c 100644
--- a/block/bfq-iosched.c
+++ b/block/bfq-iosched.c
@@ -5315,7 +5315,7 @@ static void bfq_put_stable_ref(struct bfq_queue *bfqq)
bfq_put_queue(bfqq);
}
-static void bfq_put_cooperator(struct bfq_queue *bfqq)
+void bfq_put_cooperator(struct bfq_queue *bfqq)
{
struct bfq_queue *__bfqq, *next;
diff --git a/block/bfq-iosched.h b/block/bfq-iosched.h
index 3b83e3d1c2e5..a56763045d19 100644
--- a/block/bfq-iosched.h
+++ b/block/bfq-iosched.h
@@ -979,6 +979,7 @@ void bfq_weights_tree_remove(struct bfq_data *bfqd,
void bfq_bfqq_expire(struct bfq_data *bfqd, struct bfq_queue *bfqq,
bool compensate, enum bfqq_expiration reason);
void bfq_put_queue(struct bfq_queue *bfqq);
+void bfq_put_cooperator(struct bfq_queue *bfqq);
void bfq_end_wr_async_queues(struct bfq_data *bfqd, struct bfq_group *bfqg);
void bfq_release_process_ref(struct bfq_data *bfqd, struct bfq_queue *bfqq);
void bfq_schedule_dispatch(struct bfq_data *bfqd);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 075a53b78b815301f8d3dd1ee2cd99554e34f0dd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:50 +0200
Subject: [PATCH] bfq: Make sure bfqg for which we are queueing requests is
online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 32d2c2a47480..09574af83566 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -612,10 +612,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
{
struct blkcg_gq *blkg = bio->bi_blkg;
+ struct bfq_group *bfqg;
- if (!blkg)
- return bfqd->root_group;
- return blkg_to_bfqg(blkg);
+ while (blkg) {
+ bfqg = blkg_to_bfqg(blkg);
+ if (bfqg->online) {
+ bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
+ return bfqg;
+ }
+ blkg = blkg->parent;
+ }
+ bio_associate_blkg_from_css(bio,
+ &bfqg_to_blkg(bfqd->root_group)->blkcg->css);
+ return bfqd->root_group;
}
/**
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 075a53b78b815301f8d3dd1ee2cd99554e34f0dd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:50 +0200
Subject: [PATCH] bfq: Make sure bfqg for which we are queueing requests is
online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 32d2c2a47480..09574af83566 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -612,10 +612,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
{
struct blkcg_gq *blkg = bio->bi_blkg;
+ struct bfq_group *bfqg;
- if (!blkg)
- return bfqd->root_group;
- return blkg_to_bfqg(blkg);
+ while (blkg) {
+ bfqg = blkg_to_bfqg(blkg);
+ if (bfqg->online) {
+ bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
+ return bfqg;
+ }
+ blkg = blkg->parent;
+ }
+ bio_associate_blkg_from_css(bio,
+ &bfqg_to_blkg(bfqd->root_group)->blkcg->css);
+ return bfqd->root_group;
}
/**
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 075a53b78b815301f8d3dd1ee2cd99554e34f0dd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:50 +0200
Subject: [PATCH] bfq: Make sure bfqg for which we are queueing requests is
online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 32d2c2a47480..09574af83566 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -612,10 +612,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
{
struct blkcg_gq *blkg = bio->bi_blkg;
+ struct bfq_group *bfqg;
- if (!blkg)
- return bfqd->root_group;
- return blkg_to_bfqg(blkg);
+ while (blkg) {
+ bfqg = blkg_to_bfqg(blkg);
+ if (bfqg->online) {
+ bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
+ return bfqg;
+ }
+ blkg = blkg->parent;
+ }
+ bio_associate_blkg_from_css(bio,
+ &bfqg_to_blkg(bfqd->root_group)->blkcg->css);
+ return bfqd->root_group;
}
/**
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 075a53b78b815301f8d3dd1ee2cd99554e34f0dd Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Fri, 1 Apr 2022 12:27:50 +0200
Subject: [PATCH] bfq: Make sure bfqg for which we are queueing requests is
online
Bios queued into BFQ IO scheduler can be associated with a cgroup that
was already offlined. This may then cause insertion of this bfq_group
into a service tree. But this bfq_group will get freed as soon as last
bio associated with it is completed leading to use after free issues for
service tree users. Fix the problem by making sure we always operate on
online bfq_group. If the bfq_group associated with the bio is not
online, we pick the first online parent.
CC: stable(a)vger.kernel.org
Fixes: e21b7a0b9887 ("block, bfq: add full hierarchical scheduling and cgroups support")
Tested-by: "yukuai (C)" <yukuai3(a)huawei.com>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Link: https://lore.kernel.org/r/20220401102752.8599-9-jack@suse.cz
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/bfq-cgroup.c b/block/bfq-cgroup.c
index 32d2c2a47480..09574af83566 100644
--- a/block/bfq-cgroup.c
+++ b/block/bfq-cgroup.c
@@ -612,10 +612,19 @@ static void bfq_link_bfqg(struct bfq_data *bfqd, struct bfq_group *bfqg)
struct bfq_group *bfq_bio_bfqg(struct bfq_data *bfqd, struct bio *bio)
{
struct blkcg_gq *blkg = bio->bi_blkg;
+ struct bfq_group *bfqg;
- if (!blkg)
- return bfqd->root_group;
- return blkg_to_bfqg(blkg);
+ while (blkg) {
+ bfqg = blkg_to_bfqg(blkg);
+ if (bfqg->online) {
+ bio_associate_blkg_from_css(bio, &blkg->blkcg->css);
+ return bfqg;
+ }
+ blkg = blkg->parent;
+ }
+ bio_associate_blkg_from_css(bio,
+ &bfqg_to_blkg(bfqd->root_group)->blkcg->css);
+ return bfqd->root_group;
}
/**
test_bit(), as any other bitmap op, takes `unsigned long *` as a
second argument (pointer to the actual bitmap), as any bitmap
itself is an array of unsigned longs. However, the ia64_get_irr()
code passes a ref to `u64` as a second argument.
This works with the ia64 bitops implementation due to that they
have `void *` as the second argument and then cast it later on.
This works with the bitmap API itself due to that `unsigned long`
has the same size on ia64 as `u64` (`unsigned long long`), but
from the compiler PoV those two are different.
Define @irr as `unsigned long` to fix that. That implies no
functional changes. Has been hidden for 16 years!
Fixes: a58786917ce2 ("[IA64] avoid broken SAL_CACHE_FLUSH implementations")
Cc: stable(a)vger.kernel.org # 2.6.16+
Reported-by: kernel test robot <lkp(a)intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin(a)intel.com>
---
arch/ia64/include/asm/processor.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 7cbce290f4e5..757c2f6d8d4b 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -538,7 +538,7 @@ ia64_get_irr(unsigned int vector)
{
unsigned int reg = vector / 64;
unsigned int bit = vector % 64;
- u64 irr;
+ unsigned long irr;
switch (reg) {
case 0: irr = ia64_getreg(_IA64_REG_CR_IRR0); break;
--
2.36.1
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a508e33956b538e034ed5df619a73ec7c15bda72 Mon Sep 17 00:00:00 2001
From: Miaoqian Lin <linmq006(a)gmail.com>
Date: Thu, 12 May 2022 08:44:45 +0400
Subject: [PATCH] ipmi:ipmb: Fix refcount leak in ipmi_ipmb_probe
of_parse_phandle() returns a node pointer with refcount
incremented, we should use of_node_put() on it when done.
Add missing of_node_put() to avoid refcount leak.
Fixes: 00d93611f002 ("ipmi:ipmb: Add the ability to have a separate slave and master device")
Signed-off-by: Miaoqian Lin <linmq006(a)gmail.com>
Message-Id: <20220512044445.3102-1-linmq006(a)gmail.com>
Cc: stable(a)vger.kernel.org # v5.17+
Signed-off-by: Corey Minyard <cminyard(a)mvista.com>
diff --git a/drivers/char/ipmi/ipmi_ipmb.c b/drivers/char/ipmi/ipmi_ipmb.c
index 7a83fbb4e379..ab19b4b3317e 100644
--- a/drivers/char/ipmi/ipmi_ipmb.c
+++ b/drivers/char/ipmi/ipmi_ipmb.c
@@ -475,6 +475,7 @@ static int ipmi_ipmb_probe(struct i2c_client *client)
slave_np = of_parse_phandle(dev->of_node, "slave-dev", 0);
if (slave_np) {
slave_adap = of_get_i2c_adapter_by_node(slave_np);
+ of_node_put(slave_np);
if (!slave_adap) {
dev_notice(&client->dev,
"Could not find slave adapter\n");
These 2 patches fix issues related to the promiscuous mode on VF.
Comments are welcome,
Olivier
Cc: stable(a)vger.kernel.org
Cc: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Changes since v1:
- resend with CC intel-wired-lan
- remove CC Hiroshi Shimamoto (address does not exist anymore)
Olivier Matz (2):
ixgbe: fix bcast packets Rx on VF after promisc removal
ixgbe: fix unexpected VLAN Rx in promisc mode on VF
drivers/net/ethernet/intel/ixgbe/ixgbe_sriov.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
--
2.30.2
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d09144745959bf7852ccafd73243dd7d1eaeb163 Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:17 +0100
Subject: [PATCH] crypto: qat - re-enable registration of algorithms
Re-enable the registration of algorithms after fixes to (1) use
pre-allocated buffers in the datapath and (2) support the
CRYPTO_TFM_REQ_MAY_BACKLOG flag.
This reverts commit 8893d27ffcaf6ec6267038a177cb87bcde4dd3de.
Cc: stable(a)vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_4xxx/adf_drv.c b/drivers/crypto/qat/qat_4xxx/adf_drv.c
index fa4c350c1bf9..a6c78b9c730b 100644
--- a/drivers/crypto/qat/qat_4xxx/adf_drv.c
+++ b/drivers/crypto/qat/qat_4xxx/adf_drv.c
@@ -75,13 +75,6 @@ static int adf_crypto_dev_config(struct adf_accel_dev *accel_dev)
if (ret)
goto err;
- /* Temporarily set the number of crypto instances to zero to avoid
- * registering the crypto algorithms.
- * This will be removed when the algorithms will support the
- * CRYPTO_TFM_REQ_MAY_BACKLOG flag
- */
- instances = 0;
-
for (i = 0; i < instances; i++) {
val = i;
bank = i * 2;
diff --git a/drivers/crypto/qat/qat_common/qat_crypto.c b/drivers/crypto/qat/qat_common/qat_crypto.c
index 80d905ed102e..9341d892533a 100644
--- a/drivers/crypto/qat/qat_common/qat_crypto.c
+++ b/drivers/crypto/qat/qat_common/qat_crypto.c
@@ -161,13 +161,6 @@ int qat_crypto_dev_config(struct adf_accel_dev *accel_dev)
if (ret)
goto err;
- /* Temporarily set the number of crypto instances to zero to avoid
- * registering the crypto algorithms.
- * This will be removed when the algorithms will support the
- * CRYPTO_TFM_REQ_MAY_BACKLOG flag
- */
- instances = 0;
-
for (i = 0; i < instances; i++) {
val = i;
snprintf(key, sizeof(key), ADF_CY "%d" ADF_RING_ASYM_BANK_NUM, i);
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2acbb8771f6ac82422886e63832ee7a0f4b1635b Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:15 +0100
Subject: [PATCH] crypto: qat - add param check for DH
Reject requests with a source buffer that is bigger than the size of the
key. This is to prevent a possible integer underflow that might happen
when copying the source scatterlist into a linear buffer.
Cc: stable(a)vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_asym_algs.c b/drivers/crypto/qat/qat_common/qat_asym_algs.c
index 947eeff181b4..7173a2a0a484 100644
--- a/drivers/crypto/qat/qat_common/qat_asym_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_asym_algs.c
@@ -235,6 +235,10 @@ static int qat_dh_compute_value(struct kpp_request *req)
req->dst_len = ctx->p_size;
return -EOVERFLOW;
}
+
+ if (req->src_len > ctx->p_size)
+ return -EINVAL;
+
memset(msg, '\0', sizeof(*msg));
ICP_QAT_FW_PKE_HDR_VALID_FLAG_SET(msg->pke_hdr,
ICP_QAT_FW_COMN_REQ_FLAG_SET);
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 029aa4624a7fe35233bdd3d1354dc7be260380bf Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:13 +0100
Subject: [PATCH] crypto: qat - remove dma_free_coherent() for DH
The functions qat_dh_compute_value() allocates memory with
dma_alloc_coherent() if the source or the destination buffers are made
of multiple flat buffers or of a size that is not compatible with the
hardware.
This memory is then freed with dma_free_coherent() in the context of a
tasklet invoked to handle the response for the corresponding request.
According to Documentation/core-api/dma-api-howto.rst, the function
dma_free_coherent() cannot be called in an interrupt context.
Replace allocations with dma_alloc_coherent() in the function
qat_dh_compute_value() with kmalloc() + dma_map_single().
Cc: stable(a)vger.kernel.org
Fixes: c9839143ebbf ("crypto: qat - Add DH support")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_asym_algs.c b/drivers/crypto/qat/qat_common/qat_asym_algs.c
index b31372bddb96..25bbd22085c3 100644
--- a/drivers/crypto/qat/qat_common/qat_asym_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_asym_algs.c
@@ -164,26 +164,21 @@ static void qat_dh_cb(struct icp_qat_fw_pke_resp *resp)
err = (err == ICP_QAT_FW_COMN_STATUS_FLAG_OK) ? 0 : -EINVAL;
if (areq->src) {
- if (req->src_align)
- dma_free_coherent(dev, req->ctx.dh->p_size,
- req->src_align, req->in.dh.in.b);
- else
- dma_unmap_single(dev, req->in.dh.in.b,
- req->ctx.dh->p_size, DMA_TO_DEVICE);
+ dma_unmap_single(dev, req->in.dh.in.b, req->ctx.dh->p_size,
+ DMA_TO_DEVICE);
+ kfree_sensitive(req->src_align);
}
areq->dst_len = req->ctx.dh->p_size;
if (req->dst_align) {
scatterwalk_map_and_copy(req->dst_align, areq->dst, 0,
areq->dst_len, 1);
-
- dma_free_coherent(dev, req->ctx.dh->p_size, req->dst_align,
- req->out.dh.r);
- } else {
- dma_unmap_single(dev, req->out.dh.r, req->ctx.dh->p_size,
- DMA_FROM_DEVICE);
+ kfree_sensitive(req->dst_align);
}
+ dma_unmap_single(dev, req->out.dh.r, req->ctx.dh->p_size,
+ DMA_FROM_DEVICE);
+
dma_unmap_single(dev, req->phy_in, sizeof(struct qat_dh_input_params),
DMA_TO_DEVICE);
dma_unmap_single(dev, req->phy_out,
@@ -231,6 +226,7 @@ static int qat_dh_compute_value(struct kpp_request *req)
struct icp_qat_fw_pke_request *msg = &qat_req->req;
int ret;
int n_input_params = 0;
+ u8 *vaddr;
if (unlikely(!ctx->xa))
return -EINVAL;
@@ -287,27 +283,24 @@ static int qat_dh_compute_value(struct kpp_request *req)
*/
if (sg_is_last(req->src) && req->src_len == ctx->p_size) {
qat_req->src_align = NULL;
- qat_req->in.dh.in.b = dma_map_single(dev,
- sg_virt(req->src),
- req->src_len,
- DMA_TO_DEVICE);
- if (unlikely(dma_mapping_error(dev,
- qat_req->in.dh.in.b)))
- return ret;
-
+ vaddr = sg_virt(req->src);
} else {
int shift = ctx->p_size - req->src_len;
- qat_req->src_align = dma_alloc_coherent(dev,
- ctx->p_size,
- &qat_req->in.dh.in.b,
- GFP_KERNEL);
+ qat_req->src_align = kzalloc(ctx->p_size, GFP_KERNEL);
if (unlikely(!qat_req->src_align))
return ret;
scatterwalk_map_and_copy(qat_req->src_align + shift,
req->src, 0, req->src_len, 0);
+
+ vaddr = qat_req->src_align;
}
+
+ qat_req->in.dh.in.b = dma_map_single(dev, vaddr, ctx->p_size,
+ DMA_TO_DEVICE);
+ if (unlikely(dma_mapping_error(dev, qat_req->in.dh.in.b)))
+ goto unmap_src;
}
/*
* dst can be of any size in valid range, but HW expects it to be the
@@ -318,20 +311,18 @@ static int qat_dh_compute_value(struct kpp_request *req)
*/
if (sg_is_last(req->dst) && req->dst_len == ctx->p_size) {
qat_req->dst_align = NULL;
- qat_req->out.dh.r = dma_map_single(dev, sg_virt(req->dst),
- req->dst_len,
- DMA_FROM_DEVICE);
-
- if (unlikely(dma_mapping_error(dev, qat_req->out.dh.r)))
- goto unmap_src;
-
+ vaddr = sg_virt(req->dst);
} else {
- qat_req->dst_align = dma_alloc_coherent(dev, ctx->p_size,
- &qat_req->out.dh.r,
- GFP_KERNEL);
+ qat_req->dst_align = kzalloc(ctx->p_size, GFP_KERNEL);
if (unlikely(!qat_req->dst_align))
goto unmap_src;
+
+ vaddr = qat_req->dst_align;
}
+ qat_req->out.dh.r = dma_map_single(dev, vaddr, ctx->p_size,
+ DMA_FROM_DEVICE);
+ if (unlikely(dma_mapping_error(dev, qat_req->out.dh.r)))
+ goto unmap_dst;
qat_req->in.dh.in_tab[n_input_params] = 0;
qat_req->out.dh.out_tab[1] = 0;
@@ -371,23 +362,17 @@ static int qat_dh_compute_value(struct kpp_request *req)
sizeof(struct qat_dh_input_params),
DMA_TO_DEVICE);
unmap_dst:
- if (qat_req->dst_align)
- dma_free_coherent(dev, ctx->p_size, qat_req->dst_align,
- qat_req->out.dh.r);
- else
- if (!dma_mapping_error(dev, qat_req->out.dh.r))
- dma_unmap_single(dev, qat_req->out.dh.r, ctx->p_size,
- DMA_FROM_DEVICE);
+ if (!dma_mapping_error(dev, qat_req->out.dh.r))
+ dma_unmap_single(dev, qat_req->out.dh.r, ctx->p_size,
+ DMA_FROM_DEVICE);
+ kfree_sensitive(qat_req->dst_align);
unmap_src:
if (req->src) {
- if (qat_req->src_align)
- dma_free_coherent(dev, ctx->p_size, qat_req->src_align,
- qat_req->in.dh.in.b);
- else
- if (!dma_mapping_error(dev, qat_req->in.dh.in.b))
- dma_unmap_single(dev, qat_req->in.dh.in.b,
- ctx->p_size,
- DMA_TO_DEVICE);
+ if (!dma_mapping_error(dev, qat_req->in.dh.in.b))
+ dma_unmap_single(dev, qat_req->in.dh.in.b,
+ ctx->p_size,
+ DMA_TO_DEVICE);
+ kfree_sensitive(qat_req->src_align);
}
return ret;
}
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 80a52e1ee7757b742f96bfb0d58f0c14eb6583d0 Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:11 +0100
Subject: [PATCH] crypto: qat - fix memory leak in RSA
When an RSA key represented in form 2 (as defined in PKCS #1 V2.1) is
used, some components of the private key persist even after the TFM is
released.
Replace the explicit calls to free the buffers in qat_rsa_exit_tfm()
with a call to qat_rsa_clear_ctx() which frees all buffers referenced in
the TFM context.
Cc: stable(a)vger.kernel.org
Fixes: 879f77e9071f ("crypto: qat - Add RSA CRT mode")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_asym_algs.c b/drivers/crypto/qat/qat_common/qat_asym_algs.c
index ff7249c093c9..2bc02c75398e 100644
--- a/drivers/crypto/qat/qat_common/qat_asym_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_asym_algs.c
@@ -1257,18 +1257,8 @@ static void qat_rsa_exit_tfm(struct crypto_akcipher *tfm)
struct qat_rsa_ctx *ctx = akcipher_tfm_ctx(tfm);
struct device *dev = &GET_DEV(ctx->inst->accel_dev);
- if (ctx->n)
- dma_free_coherent(dev, ctx->key_sz, ctx->n, ctx->dma_n);
- if (ctx->e)
- dma_free_coherent(dev, ctx->key_sz, ctx->e, ctx->dma_e);
- if (ctx->d) {
- memset(ctx->d, '\0', ctx->key_sz);
- dma_free_coherent(dev, ctx->key_sz, ctx->d, ctx->dma_d);
- }
+ qat_rsa_clear_ctx(dev, ctx);
qat_crypto_put_instance(ctx->inst);
- ctx->n = NULL;
- ctx->e = NULL;
- ctx->d = NULL;
}
static struct akcipher_alg rsa = {
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9714061423b8b24b8afb31b8eb4df977c63f19c4 Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:14 +0100
Subject: [PATCH] crypto: qat - add param check for RSA
Reject requests with a source buffer that is bigger than the size of the
key. This is to prevent a possible integer underflow that might happen
when copying the source scatterlist into a linear buffer.
Cc: stable(a)vger.kernel.org
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_asym_algs.c b/drivers/crypto/qat/qat_common/qat_asym_algs.c
index 25bbd22085c3..947eeff181b4 100644
--- a/drivers/crypto/qat/qat_common/qat_asym_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_asym_algs.c
@@ -656,6 +656,10 @@ static int qat_rsa_enc(struct akcipher_request *req)
req->dst_len = ctx->key_sz;
return -EOVERFLOW;
}
+
+ if (req->src_len > ctx->key_sz)
+ return -EINVAL;
+
memset(msg, '\0', sizeof(*msg));
ICP_QAT_FW_PKE_HDR_VALID_FLAG_SET(msg->pke_hdr,
ICP_QAT_FW_COMN_REQ_FLAG_SET);
@@ -785,6 +789,10 @@ static int qat_rsa_dec(struct akcipher_request *req)
req->dst_len = ctx->key_sz;
return -EOVERFLOW;
}
+
+ if (req->src_len > ctx->key_sz)
+ return -EINVAL;
+
memset(msg, '\0', sizeof(*msg));
ICP_QAT_FW_PKE_HDR_VALID_FLAG_SET(msg->pke_hdr,
ICP_QAT_FW_COMN_REQ_FLAG_SET);
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e0831e7af4e03f2715de102e18e9179ec0a81562 Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:34:08 +0100
Subject: [PATCH] crypto: qat - use pre-allocated buffers in datapath
In order to do DMAs, the QAT device requires that the scatterlist
structures are mapped and translated into a format that the firmware can
understand. This is defined as the composition of a scatter gather list
(SGL) descriptor header, the struct qat_alg_buf_list, plus a variable
number of flat buffer descriptors, the struct qat_alg_buf.
The allocation and mapping of these data structures is done each time a
request is received from the skcipher and aead APIs.
In an OOM situation, this behaviour might lead to a dead-lock if an
allocation fails.
Based on the conversation in [1], increase the size of the aead and
skcipher request contexts to include an SGL descriptor that can handle
a maximum of 4 flat buffers.
If requests exceed 4 entries buffers, memory is allocated dynamically.
[1] https://lore.kernel.org/linux-crypto/20200722072932.GA27544@gondor.apana.or…
Cc: stable(a)vger.kernel.org
Fixes: d370cec32194 ("crypto: qat - Intel(R) QAT crypto interface")
Reported-by: Mikulas Patocka <mpatocka(a)redhat.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Marco Chiappero <marco.chiappero(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_algs.c b/drivers/crypto/qat/qat_common/qat_algs.c
index f998ed58457c..ec635fe44c1f 100644
--- a/drivers/crypto/qat/qat_common/qat_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_algs.c
@@ -46,19 +46,6 @@
static DEFINE_MUTEX(algs_lock);
static unsigned int active_devs;
-struct qat_alg_buf {
- u32 len;
- u32 resrvd;
- u64 addr;
-} __packed;
-
-struct qat_alg_buf_list {
- u64 resrvd;
- u32 num_bufs;
- u32 num_mapped_bufs;
- struct qat_alg_buf bufers[];
-} __packed __aligned(64);
-
/* Common content descriptor */
struct qat_alg_cd {
union {
@@ -693,7 +680,10 @@ static void qat_alg_free_bufl(struct qat_crypto_instance *inst,
bl->bufers[i].len, DMA_BIDIRECTIONAL);
dma_unmap_single(dev, blp, sz, DMA_TO_DEVICE);
- kfree(bl);
+
+ if (!qat_req->buf.sgl_src_valid)
+ kfree(bl);
+
if (blp != blpout) {
/* If out of place operation dma unmap only data */
int bufless = blout->num_bufs - blout->num_mapped_bufs;
@@ -704,7 +694,9 @@ static void qat_alg_free_bufl(struct qat_crypto_instance *inst,
DMA_BIDIRECTIONAL);
}
dma_unmap_single(dev, blpout, sz_out, DMA_TO_DEVICE);
- kfree(blout);
+
+ if (!qat_req->buf.sgl_dst_valid)
+ kfree(blout);
}
}
@@ -721,15 +713,24 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
dma_addr_t blp = DMA_MAPPING_ERROR;
dma_addr_t bloutp = DMA_MAPPING_ERROR;
struct scatterlist *sg;
- size_t sz_out, sz = struct_size(bufl, bufers, n + 1);
+ size_t sz_out, sz = struct_size(bufl, bufers, n);
+ int node = dev_to_node(&GET_DEV(inst->accel_dev));
if (unlikely(!n))
return -EINVAL;
- bufl = kzalloc_node(sz, GFP_ATOMIC,
- dev_to_node(&GET_DEV(inst->accel_dev)));
- if (unlikely(!bufl))
- return -ENOMEM;
+ qat_req->buf.sgl_src_valid = false;
+ qat_req->buf.sgl_dst_valid = false;
+
+ if (n > QAT_MAX_BUFF_DESC) {
+ bufl = kzalloc_node(sz, GFP_ATOMIC, node);
+ if (unlikely(!bufl))
+ return -ENOMEM;
+ } else {
+ bufl = &qat_req->buf.sgl_src.sgl_hdr;
+ memset(bufl, 0, sizeof(struct qat_alg_buf_list));
+ qat_req->buf.sgl_src_valid = true;
+ }
for_each_sg(sgl, sg, n, i)
bufl->bufers[i].addr = DMA_MAPPING_ERROR;
@@ -760,12 +761,18 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
struct qat_alg_buf *bufers;
n = sg_nents(sglout);
- sz_out = struct_size(buflout, bufers, n + 1);
+ sz_out = struct_size(buflout, bufers, n);
sg_nctr = 0;
- buflout = kzalloc_node(sz_out, GFP_ATOMIC,
- dev_to_node(&GET_DEV(inst->accel_dev)));
- if (unlikely(!buflout))
- goto err_in;
+
+ if (n > QAT_MAX_BUFF_DESC) {
+ buflout = kzalloc_node(sz_out, GFP_ATOMIC, node);
+ if (unlikely(!buflout))
+ goto err_in;
+ } else {
+ buflout = &qat_req->buf.sgl_dst.sgl_hdr;
+ memset(buflout, 0, sizeof(struct qat_alg_buf_list));
+ qat_req->buf.sgl_dst_valid = true;
+ }
bufers = buflout->bufers;
for_each_sg(sglout, sg, n, i)
@@ -810,7 +817,9 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
dma_unmap_single(dev, buflout->bufers[i].addr,
buflout->bufers[i].len,
DMA_BIDIRECTIONAL);
- kfree(buflout);
+
+ if (!qat_req->buf.sgl_dst_valid)
+ kfree(buflout);
err_in:
if (!dma_mapping_error(dev, blp))
@@ -823,7 +832,8 @@ static int qat_alg_sgl_to_bufl(struct qat_crypto_instance *inst,
bufl->bufers[i].len,
DMA_BIDIRECTIONAL);
- kfree(bufl);
+ if (!qat_req->buf.sgl_src_valid)
+ kfree(bufl);
dev_err(dev, "Failed to map buf for dma\n");
return -ENOMEM;
diff --git a/drivers/crypto/qat/qat_common/qat_crypto.h b/drivers/crypto/qat/qat_common/qat_crypto.h
index b6a4c95ae003..0928f159ea99 100644
--- a/drivers/crypto/qat/qat_common/qat_crypto.h
+++ b/drivers/crypto/qat/qat_common/qat_crypto.h
@@ -21,6 +21,26 @@ struct qat_crypto_instance {
atomic_t refctr;
};
+#define QAT_MAX_BUFF_DESC 4
+
+struct qat_alg_buf {
+ u32 len;
+ u32 resrvd;
+ u64 addr;
+} __packed;
+
+struct qat_alg_buf_list {
+ u64 resrvd;
+ u32 num_bufs;
+ u32 num_mapped_bufs;
+ struct qat_alg_buf bufers[];
+} __packed;
+
+struct qat_alg_fixed_buf_list {
+ struct qat_alg_buf_list sgl_hdr;
+ struct qat_alg_buf descriptors[QAT_MAX_BUFF_DESC];
+} __packed __aligned(64);
+
struct qat_crypto_request_buffs {
struct qat_alg_buf_list *bl;
dma_addr_t blp;
@@ -28,6 +48,10 @@ struct qat_crypto_request_buffs {
dma_addr_t bloutp;
size_t sz;
size_t sz_out;
+ bool sgl_src_valid;
+ bool sgl_dst_valid;
+ struct qat_alg_fixed_buf_list sgl_src;
+ struct qat_alg_fixed_buf_list sgl_dst;
};
struct qat_crypto_request;
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1731160ff7c7bbb11bb1aacb14dd25e18d522779 Mon Sep 17 00:00:00 2001
From: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Date: Mon, 9 May 2022 14:19:27 +0100
Subject: [PATCH] crypto: qat - set to zero DH parameters before free
Set to zero the context buffers containing the DH key before they are
freed.
This is a defense in depth measure that avoids keys to be recovered from
memory in case the system is compromised between the free of the buffer
and when that area of memory (containing keys) gets overwritten.
Cc: stable(a)vger.kernel.org
Fixes: c9839143ebbf ("crypto: qat - Add DH support")
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu(a)intel.com>
Reviewed-by: Adam Guerin <adam.guerin(a)intel.com>
Reviewed-by: Wojciech Ziemba <wojciech.ziemba(a)intel.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/qat/qat_common/qat_asym_algs.c b/drivers/crypto/qat/qat_common/qat_asym_algs.c
index b0b78445418b..5633f9df3b6f 100644
--- a/drivers/crypto/qat/qat_common/qat_asym_algs.c
+++ b/drivers/crypto/qat/qat_common/qat_asym_algs.c
@@ -420,14 +420,17 @@ static int qat_dh_set_params(struct qat_dh_ctx *ctx, struct dh *params)
static void qat_dh_clear_ctx(struct device *dev, struct qat_dh_ctx *ctx)
{
if (ctx->g) {
+ memset(ctx->g, 0, ctx->p_size);
dma_free_coherent(dev, ctx->p_size, ctx->g, ctx->dma_g);
ctx->g = NULL;
}
if (ctx->xa) {
+ memset(ctx->xa, 0, ctx->p_size);
dma_free_coherent(dev, ctx->p_size, ctx->xa, ctx->dma_xa);
ctx->xa = NULL;
}
if (ctx->p) {
+ memset(ctx->p, 0, ctx->p_size);
dma_free_coherent(dev, ctx->p_size, ctx->p, ctx->dma_p);
ctx->p = NULL;
}
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5585f4f0ef5b17026bbd60fbff6fcc91b99d5bf Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 28 Apr 2022 14:59:46 +0100
Subject: [PATCH] btrfs: fix deadlock between concurrent dio writes when low on
free data space
When reserving data space for a direct IO write we can end up deadlocking
if we have multiple tasks attempting a write to the same file range, there
are multiple extents covered by that file range, we are low on available
space for data and the writes don't expand the inode's i_size.
The deadlock can happen like this:
1) We have a file with an i_size of 1M, at offset 0 it has an extent with
a size of 128K and at offset 128K it has another extent also with a
size of 128K;
2) Task A does a direct IO write against file range [0, 256K), and because
the write is within the i_size boundary, it takes the inode's lock (VFS
level) in shared mode;
3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and
then gets the extent map for the extent covering the range [0, 128K).
At btrfs_get_blocks_direct_write(), it creates an ordered extent for
that file range ([0, 128K));
4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file
range [0, 256K);
5) Task A executes btrfs_dio_iomap_begin() again, this time for the file
range [128K, 256K), and locks the file range [128K, 256K);
6) Task B starts a direct IO write against file range [0, 256K) as well.
It also locks the inode in shared mode, as it's within the i_size limit,
and then tries to lock file range [0, 256K). It is able to lock the
subrange [0, 128K) but then blocks waiting for the range [128K, 256K),
as it is currently locked by task A;
7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data
space. Because we are low on available free space, it triggers the
async data reclaim task, and waits for it to reserve data space;
8) The async reclaim task decides to wait for all existing ordered extents
to complete (through btrfs_wait_ordered_roots()).
It finds the ordered extent previously created by task A for the file
range [0, 128K) and waits for it to complete;
9) The ordered extent for the file range [0, 128K) can not complete
because it blocks at btrfs_finish_ordered_io() when trying to lock the
file range [0, 128K).
This results in a deadlock, because:
- task B is holding the file range [0, 128K) locked, waiting for the
range [128K, 256K) to be unlocked by task A;
- task A is holding the file range [128K, 256K) locked and it's waiting
for the async data reclaim task to satisfy its space reservation
request;
- the async data reclaim task is waiting for ordered extent [0, 128K)
to complete, but the ordered extent can not complete because the
file range [0, 128K) is currently locked by task B, which is waiting
on task A to unlock file range [128K, 256K) and task A waiting
on the async data reclaim task.
This results in a deadlock between 4 task: task A, task B, the async
data reclaim task and the task doing ordered extent completion (a work
queue task).
This type of deadlock can sporadically be triggered by the test case
generic/300 from fstests, and results in a stack trace like the following:
[12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds.
[12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000
[12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[12084.036599] Call Trace:
[12084.036601] <TASK>
[12084.036606] __schedule+0x3cb/0xed0
[12084.036616] schedule+0x4e/0xb0
[12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs]
[12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]
[12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.036719] ? lock_is_held_type+0xe8/0x140
[12084.036727] process_one_work+0x252/0x5a0
[12084.036736] ? process_one_work+0x5a0/0x5a0
[12084.036738] worker_thread+0x52/0x3b0
[12084.036743] ? process_one_work+0x5a0/0x5a0
[12084.036745] kthread+0xf2/0x120
[12084.036747] ? kthread_complete_and_exit+0x20/0x20
[12084.036751] ret_from_fork+0x22/0x30
[12084.036765] </TASK>
[12084.036769] INFO: task kworker/u16:11:153787 blocked for more than 241 seconds.
[12084.037702] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.038540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.039506] task:kworker/u16:11 state:D stack: 0 pid:153787 ppid: 2 flags:0x00004000
[12084.039511] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[12084.039551] Call Trace:
[12084.039553] <TASK>
[12084.039557] __schedule+0x3cb/0xed0
[12084.039566] schedule+0x4e/0xb0
[12084.039569] schedule_timeout+0xed/0x130
[12084.039573] ? mark_held_locks+0x50/0x80
[12084.039578] ? _raw_spin_unlock_irq+0x24/0x50
[12084.039580] ? lockdep_hardirqs_on+0x7d/0x100
[12084.039585] __wait_for_common+0xaf/0x1f0
[12084.039587] ? usleep_range_state+0xb0/0xb0
[12084.039596] btrfs_wait_ordered_extents+0x3d6/0x470 [btrfs]
[12084.039636] btrfs_wait_ordered_roots+0x175/0x240 [btrfs]
[12084.039670] flush_space+0x25b/0x630 [btrfs]
[12084.039712] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[12084.039747] process_one_work+0x252/0x5a0
[12084.039756] ? process_one_work+0x5a0/0x5a0
[12084.039758] worker_thread+0x52/0x3b0
[12084.039762] ? process_one_work+0x5a0/0x5a0
[12084.039765] kthread+0xf2/0x120
[12084.039766] ? kthread_complete_and_exit+0x20/0x20
[12084.039770] ret_from_fork+0x22/0x30
[12084.039783] </TASK>
[12084.039800] INFO: task kworker/u16:17:217907 blocked for more than 241 seconds.
[12084.040709] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.041398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.042404] task:kworker/u16:17 state:D stack: 0 pid:217907 ppid: 2 flags:0x00004000
[12084.042411] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[12084.042461] Call Trace:
[12084.042463] <TASK>
[12084.042471] __schedule+0x3cb/0xed0
[12084.042485] schedule+0x4e/0xb0
[12084.042490] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs]
[12084.042539] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.042551] lock_extent_bits+0x37/0x90 [btrfs]
[12084.042601] btrfs_finish_ordered_io.isra.0+0x3fd/0x960 [btrfs]
[12084.042656] ? lock_is_held_type+0xe8/0x140
[12084.042667] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.042716] ? lock_is_held_type+0xe8/0x140
[12084.042727] process_one_work+0x252/0x5a0
[12084.042742] worker_thread+0x52/0x3b0
[12084.042750] ? process_one_work+0x5a0/0x5a0
[12084.042754] kthread+0xf2/0x120
[12084.042757] ? kthread_complete_and_exit+0x20/0x20
[12084.042763] ret_from_fork+0x22/0x30
[12084.042783] </TASK>
[12084.042798] INFO: task fio:234517 blocked for more than 241 seconds.
[12084.043598] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.044282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.045244] task:fio state:D stack: 0 pid:234517 ppid:234515 flags:0x00004000
[12084.045248] Call Trace:
[12084.045250] <TASK>
[12084.045254] __schedule+0x3cb/0xed0
[12084.045263] schedule+0x4e/0xb0
[12084.045266] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs]
[12084.045298] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.045306] lock_extent_bits+0x37/0x90 [btrfs]
[12084.045336] btrfs_dio_iomap_begin+0x336/0xc60 [btrfs]
[12084.045370] ? lock_is_held_type+0xe8/0x140
[12084.045378] iomap_iter+0x184/0x4c0
[12084.045383] __iomap_dio_rw+0x2c6/0x8a0
[12084.045406] iomap_dio_rw+0xa/0x30
[12084.045408] btrfs_do_write_iter+0x370/0x5e0 [btrfs]
[12084.045440] aio_write+0xfa/0x2c0
[12084.045448] ? __might_fault+0x2a/0x70
[12084.045451] ? kvm_sched_clock_read+0x14/0x40
[12084.045455] ? lock_release+0x153/0x4a0
[12084.045463] io_submit_one+0x615/0x9f0
[12084.045467] ? __might_fault+0x2a/0x70
[12084.045469] ? kvm_sched_clock_read+0x14/0x40
[12084.045478] __x64_sys_io_submit+0x83/0x160
[12084.045483] ? syscall_enter_from_user_mode+0x1d/0x50
[12084.045489] do_syscall_64+0x3b/0x90
[12084.045517] entry_SYSCALL_64_after_hwframe+0x44/0xae
[12084.045521] RIP: 0033:0x7fa76511af79
[12084.045525] RSP: 002b:00007ffd6d6b9058 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[12084.045530] RAX: ffffffffffffffda RBX: 00007fa75ba6e760 RCX: 00007fa76511af79
[12084.045532] RDX: 0000557b304ff3f0 RSI: 0000000000000001 RDI: 00007fa75ba4c000
[12084.045535] RBP: 00007fa75ba4c000 R08: 00007fa751b76000 R09: 0000000000000330
[12084.045537] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[12084.045540] R13: 0000000000000000 R14: 0000557b304ff3f0 R15: 0000557b30521eb0
[12084.045561] </TASK>
Fix this issue by always reserving data space before locking a file range
at btrfs_dio_iomap_begin(). If we can't reserve the space, then we don't
error out immediately - instead after locking the file range, check if we
can do a NOCOW write, and if we can we don't error out since we don't need
to allocate a data extent, however if we can't NOCOW then error out with
-ENOSPC. This also implies that we may end up reserving space when it's
not needed because the write will end up being done in NOCOW mode - in that
case we just release the space after we noticed we did a NOCOW write - this
is the same type of logic that is done in the path for buffered IO writes.
Fixes: f0bfa76a11e93d ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range")
CC: stable(a)vger.kernel.org # 5.17+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6be5ebe7611..9cd12760e9dd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -64,6 +64,8 @@ struct btrfs_iget_args {
struct btrfs_dio_data {
ssize_t submitted;
struct extent_changeset *data_reserved;
+ bool data_space_reserved;
+ bool nocow_done;
};
struct btrfs_rename_ctx {
@@ -7481,6 +7483,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
ret = PTR_ERR(em2);
goto out;
}
+
+ dio_data->nocow_done = true;
} else {
/* Our caller expects us to free the input extent map. */
free_extent_map(em);
@@ -7489,10 +7493,19 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
if (nowait)
return -EAGAIN;
- /* We have to COW, so need to reserve metadata and data space. */
- ret = btrfs_delalloc_reserve_space(BTRFS_I(inode),
- &dio_data->data_reserved,
- start, len);
+ /*
+ * If we could not allocate data space before locking the file
+ * range and we can't do a NOCOW write, then we have to fail.
+ */
+ if (!dio_data->data_space_reserved)
+ return -ENOSPC;
+
+ /*
+ * We have to COW and we have already reserved data space before,
+ * so now we reserve only metadata.
+ */
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len, len,
+ false);
if (ret < 0)
goto out;
space_reserved = true;
@@ -7505,10 +7518,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
*map = em;
len = min(len, em->len - (start - em->start));
if (len < prev_len)
- btrfs_delalloc_release_space(BTRFS_I(inode),
- dio_data->data_reserved,
- start + len, prev_len - len,
- true);
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ prev_len - len, true);
}
/*
@@ -7526,15 +7537,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
out:
if (ret && space_reserved) {
btrfs_delalloc_release_extents(BTRFS_I(inode), len);
- if (can_nocow) {
- btrfs_delalloc_release_metadata(BTRFS_I(inode), len, true);
- } else {
- btrfs_delalloc_release_space(BTRFS_I(inode),
- dio_data->data_reserved,
- start, len, true);
- extent_changeset_free(dio_data->data_reserved);
- dio_data->data_reserved = NULL;
- }
+ btrfs_delalloc_release_metadata(BTRFS_I(inode), len, true);
}
return ret;
}
@@ -7551,6 +7554,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
const bool write = !!(flags & IOMAP_WRITE);
int ret = 0;
u64 len = length;
+ const u64 data_alloc_len = length;
bool unlock_extents = false;
if (!write)
@@ -7603,6 +7607,25 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
iomap->private = dio_data;
+ /*
+ * We always try to allocate data space and must do it before locking
+ * the file range, to avoid deadlocks with concurrent writes to the same
+ * range if the range has several extents and the writes don't expand the
+ * current i_size (the inode lock is taken in shared mode). If we fail to
+ * allocate data space here we continue and later, after locking the
+ * file range, we fail with ENOSPC only if we figure out we can not do a
+ * NOCOW write.
+ */
+ if (write && !(flags & IOMAP_NOWAIT)) {
+ ret = btrfs_check_data_free_space(BTRFS_I(inode),
+ &dio_data->data_reserved,
+ start, data_alloc_len);
+ if (!ret)
+ dio_data->data_space_reserved = true;
+ else if (ret && !(BTRFS_I(inode)->flags &
+ (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
+ goto err;
+ }
/*
* If this errors out it's because we couldn't invalidate pagecache for
@@ -7677,6 +7700,24 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
unlock_extents = true;
/* Recalc len in case the new em is smaller than requested */
len = min(len, em->len - (start - em->start));
+ if (dio_data->data_space_reserved) {
+ u64 release_offset;
+ u64 release_len = 0;
+
+ if (dio_data->nocow_done) {
+ release_offset = start;
+ release_len = data_alloc_len;
+ } else if (len < data_alloc_len) {
+ release_offset = start + len;
+ release_len = data_alloc_len - len;
+ }
+
+ if (release_len > 0)
+ btrfs_free_reserved_data_space(BTRFS_I(inode),
+ dio_data->data_reserved,
+ release_offset,
+ release_len);
+ }
} else {
/*
* We need to unlock only the end area that we aren't using.
@@ -7721,6 +7762,13 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
&cached_state);
err:
+ if (dio_data->data_space_reserved) {
+ btrfs_free_reserved_data_space(BTRFS_I(inode),
+ dio_data->data_reserved,
+ start, data_alloc_len);
+ extent_changeset_free(dio_data->data_reserved);
+ }
+
kfree(dio_data);
return ret;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f5585f4f0ef5b17026bbd60fbff6fcc91b99d5bf Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Thu, 28 Apr 2022 14:59:46 +0100
Subject: [PATCH] btrfs: fix deadlock between concurrent dio writes when low on
free data space
When reserving data space for a direct IO write we can end up deadlocking
if we have multiple tasks attempting a write to the same file range, there
are multiple extents covered by that file range, we are low on available
space for data and the writes don't expand the inode's i_size.
The deadlock can happen like this:
1) We have a file with an i_size of 1M, at offset 0 it has an extent with
a size of 128K and at offset 128K it has another extent also with a
size of 128K;
2) Task A does a direct IO write against file range [0, 256K), and because
the write is within the i_size boundary, it takes the inode's lock (VFS
level) in shared mode;
3) Task A locks the file range [0, 256K) at btrfs_dio_iomap_begin(), and
then gets the extent map for the extent covering the range [0, 128K).
At btrfs_get_blocks_direct_write(), it creates an ordered extent for
that file range ([0, 128K));
4) Before returning from btrfs_dio_iomap_begin(), it unlocks the file
range [0, 256K);
5) Task A executes btrfs_dio_iomap_begin() again, this time for the file
range [128K, 256K), and locks the file range [128K, 256K);
6) Task B starts a direct IO write against file range [0, 256K) as well.
It also locks the inode in shared mode, as it's within the i_size limit,
and then tries to lock file range [0, 256K). It is able to lock the
subrange [0, 128K) but then blocks waiting for the range [128K, 256K),
as it is currently locked by task A;
7) Task A enters btrfs_get_blocks_direct_write() and tries to reserve data
space. Because we are low on available free space, it triggers the
async data reclaim task, and waits for it to reserve data space;
8) The async reclaim task decides to wait for all existing ordered extents
to complete (through btrfs_wait_ordered_roots()).
It finds the ordered extent previously created by task A for the file
range [0, 128K) and waits for it to complete;
9) The ordered extent for the file range [0, 128K) can not complete
because it blocks at btrfs_finish_ordered_io() when trying to lock the
file range [0, 128K).
This results in a deadlock, because:
- task B is holding the file range [0, 128K) locked, waiting for the
range [128K, 256K) to be unlocked by task A;
- task A is holding the file range [128K, 256K) locked and it's waiting
for the async data reclaim task to satisfy its space reservation
request;
- the async data reclaim task is waiting for ordered extent [0, 128K)
to complete, but the ordered extent can not complete because the
file range [0, 128K) is currently locked by task B, which is waiting
on task A to unlock file range [128K, 256K) and task A waiting
on the async data reclaim task.
This results in a deadlock between 4 task: task A, task B, the async
data reclaim task and the task doing ordered extent completion (a work
queue task).
This type of deadlock can sporadically be triggered by the test case
generic/300 from fstests, and results in a stack trace like the following:
[12084.033689] INFO: task kworker/u16:7:123749 blocked for more than 241 seconds.
[12084.034877] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.035562] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.036548] task:kworker/u16:7 state:D stack: 0 pid:123749 ppid: 2 flags:0x00004000
[12084.036554] Workqueue: btrfs-flush_delalloc btrfs_work_helper [btrfs]
[12084.036599] Call Trace:
[12084.036601] <TASK>
[12084.036606] __schedule+0x3cb/0xed0
[12084.036616] schedule+0x4e/0xb0
[12084.036620] btrfs_start_ordered_extent+0x109/0x1c0 [btrfs]
[12084.036651] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.036659] btrfs_run_ordered_extent_work+0x1a/0x30 [btrfs]
[12084.036688] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.036719] ? lock_is_held_type+0xe8/0x140
[12084.036727] process_one_work+0x252/0x5a0
[12084.036736] ? process_one_work+0x5a0/0x5a0
[12084.036738] worker_thread+0x52/0x3b0
[12084.036743] ? process_one_work+0x5a0/0x5a0
[12084.036745] kthread+0xf2/0x120
[12084.036747] ? kthread_complete_and_exit+0x20/0x20
[12084.036751] ret_from_fork+0x22/0x30
[12084.036765] </TASK>
[12084.036769] INFO: task kworker/u16:11:153787 blocked for more than 241 seconds.
[12084.037702] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.038540] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.039506] task:kworker/u16:11 state:D stack: 0 pid:153787 ppid: 2 flags:0x00004000
[12084.039511] Workqueue: events_unbound btrfs_async_reclaim_data_space [btrfs]
[12084.039551] Call Trace:
[12084.039553] <TASK>
[12084.039557] __schedule+0x3cb/0xed0
[12084.039566] schedule+0x4e/0xb0
[12084.039569] schedule_timeout+0xed/0x130
[12084.039573] ? mark_held_locks+0x50/0x80
[12084.039578] ? _raw_spin_unlock_irq+0x24/0x50
[12084.039580] ? lockdep_hardirqs_on+0x7d/0x100
[12084.039585] __wait_for_common+0xaf/0x1f0
[12084.039587] ? usleep_range_state+0xb0/0xb0
[12084.039596] btrfs_wait_ordered_extents+0x3d6/0x470 [btrfs]
[12084.039636] btrfs_wait_ordered_roots+0x175/0x240 [btrfs]
[12084.039670] flush_space+0x25b/0x630 [btrfs]
[12084.039712] btrfs_async_reclaim_data_space+0x108/0x1b0 [btrfs]
[12084.039747] process_one_work+0x252/0x5a0
[12084.039756] ? process_one_work+0x5a0/0x5a0
[12084.039758] worker_thread+0x52/0x3b0
[12084.039762] ? process_one_work+0x5a0/0x5a0
[12084.039765] kthread+0xf2/0x120
[12084.039766] ? kthread_complete_and_exit+0x20/0x20
[12084.039770] ret_from_fork+0x22/0x30
[12084.039783] </TASK>
[12084.039800] INFO: task kworker/u16:17:217907 blocked for more than 241 seconds.
[12084.040709] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.041398] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.042404] task:kworker/u16:17 state:D stack: 0 pid:217907 ppid: 2 flags:0x00004000
[12084.042411] Workqueue: btrfs-endio-write btrfs_work_helper [btrfs]
[12084.042461] Call Trace:
[12084.042463] <TASK>
[12084.042471] __schedule+0x3cb/0xed0
[12084.042485] schedule+0x4e/0xb0
[12084.042490] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs]
[12084.042539] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.042551] lock_extent_bits+0x37/0x90 [btrfs]
[12084.042601] btrfs_finish_ordered_io.isra.0+0x3fd/0x960 [btrfs]
[12084.042656] ? lock_is_held_type+0xe8/0x140
[12084.042667] btrfs_work_helper+0xf8/0x400 [btrfs]
[12084.042716] ? lock_is_held_type+0xe8/0x140
[12084.042727] process_one_work+0x252/0x5a0
[12084.042742] worker_thread+0x52/0x3b0
[12084.042750] ? process_one_work+0x5a0/0x5a0
[12084.042754] kthread+0xf2/0x120
[12084.042757] ? kthread_complete_and_exit+0x20/0x20
[12084.042763] ret_from_fork+0x22/0x30
[12084.042783] </TASK>
[12084.042798] INFO: task fio:234517 blocked for more than 241 seconds.
[12084.043598] Not tainted 5.18.0-rc2-btrfs-next-115 #1
[12084.044282] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[12084.045244] task:fio state:D stack: 0 pid:234517 ppid:234515 flags:0x00004000
[12084.045248] Call Trace:
[12084.045250] <TASK>
[12084.045254] __schedule+0x3cb/0xed0
[12084.045263] schedule+0x4e/0xb0
[12084.045266] wait_extent_bit.constprop.0+0x1eb/0x260 [btrfs]
[12084.045298] ? prepare_to_wait_exclusive+0xc0/0xc0
[12084.045306] lock_extent_bits+0x37/0x90 [btrfs]
[12084.045336] btrfs_dio_iomap_begin+0x336/0xc60 [btrfs]
[12084.045370] ? lock_is_held_type+0xe8/0x140
[12084.045378] iomap_iter+0x184/0x4c0
[12084.045383] __iomap_dio_rw+0x2c6/0x8a0
[12084.045406] iomap_dio_rw+0xa/0x30
[12084.045408] btrfs_do_write_iter+0x370/0x5e0 [btrfs]
[12084.045440] aio_write+0xfa/0x2c0
[12084.045448] ? __might_fault+0x2a/0x70
[12084.045451] ? kvm_sched_clock_read+0x14/0x40
[12084.045455] ? lock_release+0x153/0x4a0
[12084.045463] io_submit_one+0x615/0x9f0
[12084.045467] ? __might_fault+0x2a/0x70
[12084.045469] ? kvm_sched_clock_read+0x14/0x40
[12084.045478] __x64_sys_io_submit+0x83/0x160
[12084.045483] ? syscall_enter_from_user_mode+0x1d/0x50
[12084.045489] do_syscall_64+0x3b/0x90
[12084.045517] entry_SYSCALL_64_after_hwframe+0x44/0xae
[12084.045521] RIP: 0033:0x7fa76511af79
[12084.045525] RSP: 002b:00007ffd6d6b9058 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
[12084.045530] RAX: ffffffffffffffda RBX: 00007fa75ba6e760 RCX: 00007fa76511af79
[12084.045532] RDX: 0000557b304ff3f0 RSI: 0000000000000001 RDI: 00007fa75ba4c000
[12084.045535] RBP: 00007fa75ba4c000 R08: 00007fa751b76000 R09: 0000000000000330
[12084.045537] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
[12084.045540] R13: 0000000000000000 R14: 0000557b304ff3f0 R15: 0000557b30521eb0
[12084.045561] </TASK>
Fix this issue by always reserving data space before locking a file range
at btrfs_dio_iomap_begin(). If we can't reserve the space, then we don't
error out immediately - instead after locking the file range, check if we
can do a NOCOW write, and if we can we don't error out since we don't need
to allocate a data extent, however if we can't NOCOW then error out with
-ENOSPC. This also implies that we may end up reserving space when it's
not needed because the write will end up being done in NOCOW mode - in that
case we just release the space after we noticed we did a NOCOW write - this
is the same type of logic that is done in the path for buffered IO writes.
Fixes: f0bfa76a11e93d ("btrfs: fix ENOSPC failure when attempting direct IO write into NOCOW range")
CC: stable(a)vger.kernel.org # 5.17+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index e6be5ebe7611..9cd12760e9dd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -64,6 +64,8 @@ struct btrfs_iget_args {
struct btrfs_dio_data {
ssize_t submitted;
struct extent_changeset *data_reserved;
+ bool data_space_reserved;
+ bool nocow_done;
};
struct btrfs_rename_ctx {
@@ -7481,6 +7483,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
ret = PTR_ERR(em2);
goto out;
}
+
+ dio_data->nocow_done = true;
} else {
/* Our caller expects us to free the input extent map. */
free_extent_map(em);
@@ -7489,10 +7493,19 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
if (nowait)
return -EAGAIN;
- /* We have to COW, so need to reserve metadata and data space. */
- ret = btrfs_delalloc_reserve_space(BTRFS_I(inode),
- &dio_data->data_reserved,
- start, len);
+ /*
+ * If we could not allocate data space before locking the file
+ * range and we can't do a NOCOW write, then we have to fail.
+ */
+ if (!dio_data->data_space_reserved)
+ return -ENOSPC;
+
+ /*
+ * We have to COW and we have already reserved data space before,
+ * so now we reserve only metadata.
+ */
+ ret = btrfs_delalloc_reserve_metadata(BTRFS_I(inode), len, len,
+ false);
if (ret < 0)
goto out;
space_reserved = true;
@@ -7505,10 +7518,8 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
*map = em;
len = min(len, em->len - (start - em->start));
if (len < prev_len)
- btrfs_delalloc_release_space(BTRFS_I(inode),
- dio_data->data_reserved,
- start + len, prev_len - len,
- true);
+ btrfs_delalloc_release_metadata(BTRFS_I(inode),
+ prev_len - len, true);
}
/*
@@ -7526,15 +7537,7 @@ static int btrfs_get_blocks_direct_write(struct extent_map **map,
out:
if (ret && space_reserved) {
btrfs_delalloc_release_extents(BTRFS_I(inode), len);
- if (can_nocow) {
- btrfs_delalloc_release_metadata(BTRFS_I(inode), len, true);
- } else {
- btrfs_delalloc_release_space(BTRFS_I(inode),
- dio_data->data_reserved,
- start, len, true);
- extent_changeset_free(dio_data->data_reserved);
- dio_data->data_reserved = NULL;
- }
+ btrfs_delalloc_release_metadata(BTRFS_I(inode), len, true);
}
return ret;
}
@@ -7551,6 +7554,7 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
const bool write = !!(flags & IOMAP_WRITE);
int ret = 0;
u64 len = length;
+ const u64 data_alloc_len = length;
bool unlock_extents = false;
if (!write)
@@ -7603,6 +7607,25 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
iomap->private = dio_data;
+ /*
+ * We always try to allocate data space and must do it before locking
+ * the file range, to avoid deadlocks with concurrent writes to the same
+ * range if the range has several extents and the writes don't expand the
+ * current i_size (the inode lock is taken in shared mode). If we fail to
+ * allocate data space here we continue and later, after locking the
+ * file range, we fail with ENOSPC only if we figure out we can not do a
+ * NOCOW write.
+ */
+ if (write && !(flags & IOMAP_NOWAIT)) {
+ ret = btrfs_check_data_free_space(BTRFS_I(inode),
+ &dio_data->data_reserved,
+ start, data_alloc_len);
+ if (!ret)
+ dio_data->data_space_reserved = true;
+ else if (ret && !(BTRFS_I(inode)->flags &
+ (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
+ goto err;
+ }
/*
* If this errors out it's because we couldn't invalidate pagecache for
@@ -7677,6 +7700,24 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
unlock_extents = true;
/* Recalc len in case the new em is smaller than requested */
len = min(len, em->len - (start - em->start));
+ if (dio_data->data_space_reserved) {
+ u64 release_offset;
+ u64 release_len = 0;
+
+ if (dio_data->nocow_done) {
+ release_offset = start;
+ release_len = data_alloc_len;
+ } else if (len < data_alloc_len) {
+ release_offset = start + len;
+ release_len = data_alloc_len - len;
+ }
+
+ if (release_len > 0)
+ btrfs_free_reserved_data_space(BTRFS_I(inode),
+ dio_data->data_reserved,
+ release_offset,
+ release_len);
+ }
} else {
/*
* We need to unlock only the end area that we aren't using.
@@ -7721,6 +7762,13 @@ static int btrfs_dio_iomap_begin(struct inode *inode, loff_t start,
unlock_extent_cached(&BTRFS_I(inode)->io_tree, lockstart, lockend,
&cached_state);
err:
+ if (dio_data->data_space_reserved) {
+ btrfs_free_reserved_data_space(BTRFS_I(inode),
+ dio_data->data_reserved,
+ start, data_alloc_len);
+ extent_changeset_free(dio_data->data_reserved);
+ }
+
kfree(dio_data);
return ret;
בבקשה אני צריך את עזרתך
אני שולח לך את ברכתי מסולטנות עומאן, בעיר הבירה מוסקט.
האם אוכל להשתמש במדיום הזה כדי לפתוח איתך תקשורת הדדית, ולחפש את קבלתך
להשקעה במדינה שלך בניהולך כשותפה שלי, שמי עיישה קדאפי ומתגוררת כיום
בעומאן, אני אלמנה ואם חד הורית עם שלושה ילדים , בתו הביולוגית היחידה
של נשיא לוב המנוח (קולונל המנוח מועמר קדאפי) וכרגע אני נמצאת תחת הגנת
מקלט מדיני על ידי ממשלת עומני.
יש לי כספים בשווי "עשרים ושבע מיליון וחמש מאות אלף דולר אמריקאי"
-$27.500.000.00 דולר אמריקאי שאני רוצה להפקיד בידך עבור פרויקט השקעה
בארצך. אם אתה מוכן לטפל בפרויקט זה בשמי, אנא השב דחוף כדי לאפשר לי
לספק לך פרטים נוספים כדי להתחיל בתהליך ההעברה.
אודה לתגובתך הדחופה באמצעות כתובת הדוא"ל שלי למטה: madamgadafiaisha(a)gmail.com
תודה
שלך באמת עיישה
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Please i need your help
I am sending my greetings to you from the Sultanate of Oman, In the
capital city of Muscat.
May i use this medium to open a mutual communication with you, and
seeking your acceptance towards investing in your country under your
management as my partner, My name is Aisha Gaddafi and presently
living in Oman, i am a Widow and single Mother with three Children,
the only biological Daughter of late Libyan President (Late Colonel
Muammar Gaddafi) and presently i am under political asylum protection
by the Omani Government.
I have funds worth “Twenty Seven Million Five Hundred Thousand United
State Dollars” -$27.500.000.00 US Dollars which i want to entrust on
you for investment project in your country. If you are willing to
handle this project on my behalf, kindly reply urgent to enable me
provide you more details to start the transfer process.
I shall appreciate your urgent response through my email address
below: madamgadafiaisha(a)gmail.com
Thanks
Yours Truly Aisha
This bug is marked as fixed by commit:
net: core: netlink: add helper refcount dec and lock function
net: sched: add helper function to take reference to Qdisc
net: sched: extend Qdisc with rcu
net: sched: rename qdisc_destroy() to qdisc_put()
net: sched: use Qdisc rcu API instead of relying on rtnl lock
But I can't find it in any tested tree for more than 90 days.
Is it a correct commit? Please update it by replying:
#syz fix: exact-commit-title
Until then the bug is still considered open and
new crashes with the same signature are ignored.
This is the start of the stable review cycle for the 5.10.120 release.
There are 53 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.10.120-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.10.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.10.120-rc1
Liu Jian <liujian56(a)huawei.com>
bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
Yuntao Wang <ytcoode(a)gmail.com>
bpf: Fix potential array overflow in bpf_trampoline_get_progs()
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix possible sleep during nfsd4_release_lockowner()
Trond Myklebust <trond.myklebust(a)hammerspace.com>
NFS: Memory allocation failures are not server fatal errors
Akira Yokosawa <akiyks(a)gmail.com>
docs: submitting-patches: Fix crossref to 'The canonical patch format'
Xiu Jianfeng <xiujianfeng(a)huawei.com>
tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
Stefan Mahnke-Hartmann <stefan.mahnke-hartmann(a)infineon.com>
tpm: Fix buffer access in tpm2_get_tpm_pt()
Tao Jin <tao-j(a)outlook.com>
HID: multitouch: add quirks to enable Lenovo X12 trackpoint
Marek Maślanka <mm(a)semihalf.com>
HID: multitouch: Add support for Google Whiskers Touchpad
Mariusz Tkaczyk <mariusz.tkaczyk(a)linux.intel.com>
raid5: introduce MD_BROKEN
Sarthak Kukreti <sarthakkukreti(a)google.com>
dm verity: set DM_TARGET_IMMUTABLE feature flag
Mikulas Patocka <mpatocka(a)redhat.com>
dm stats: add cond_resched when looping over entries
Mikulas Patocka <mpatocka(a)redhat.com>
dm crypt: make printing of the key constant-time
Dan Carpenter <dan.carpenter(a)oracle.com>
dm integrity: fix error code in dm_integrity_ctr()
Jonathan Bakker <xc-racer2(a)live.ca>
ARM: dts: s5pv210: Correct interrupt name for bluetooth in Aries
Steven Rostedt <rostedt(a)goodmis.org>
Bluetooth: hci_qca: Use del_timer_sync() before freeing
Sultan Alsawaf <sultan(a)kerneltoast.com>
zsmalloc: fix races between asynchronous zspage free and page migration
Vitaly Chikunov <vt(a)altlinux.org>
crypto: ecrdsa - Fix incorrect use of vli_cmp
Fabio Estevam <festevam(a)denx.de>
crypto: caam - fix i.MX6SX entropy delay value
Sean Christopherson <seanjc(a)google.com>
KVM: x86: avoid calling x86 emulator without a decoded instruction
Paolo Bonzini <pbonzini(a)redhat.com>
x86, kvm: use correct GFP flags for preemption disabled
Sean Christopherson <seanjc(a)google.com>
x86/kvm: Alloc dummy async #PF token outside of raw spinlock
Xiaomeng Tong <xiam0nd.tong(a)gmail.com>
KVM: PPC: Book3S HV: fix incorrect NULL check on list iterator
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: re-fetch conntrack after insertion
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: sanitize nft_set_desc_concat_parse()
Nicolai Stange <nstange(a)suse.de>
crypto: drbg - make reseeding from get_random_bytes() synchronous
Nicolai Stange <nstange(a)suse.de>
crypto: drbg - move dynamic ->reseed_threshold adjustments to __drbg_seed()
Nicolai Stange <nstange(a)suse.de>
crypto: drbg - track whether DRBG was seeded with !rng_is_initialized()
Nicolai Stange <nstange(a)suse.de>
crypto: drbg - prepare for more fine-grained tracking of seeding state
Justin M. Forbes <jforbes(a)fedoraproject.org>
lib/crypto: add prompts back to crypto libraries
Yuezhang Mo <Yuezhang.Mo(a)sony.com>
exfat: fix referencing wrong parent directory information after renaming
Tadeusz Struk <tadeusz.struk(a)linaro.org>
exfat: check if cluster num is valid
Gustavo A. R. Silva <gustavoars(a)kernel.org>
drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
Dave Chinner <dchinner(a)redhat.com>
xfs: Fix CIL throttle hang when CIL space used going backwards
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix an ABBA deadlock in xfs_rename
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: fix the forward progress assertion in xfs_iwalk_run_callbacks
Kaixu Xia <kaixuxia(a)tencent.com>
xfs: show the proper user quota options
Darrick J. Wong <darrick.wong(a)oracle.com>
xfs: detect overflows in bmbt records
Alex Elder <elder(a)linaro.org>
net: ipa: compute proper aggregation limit
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring: fix using under-expanded iters
Pavel Begunkov <asml.silence(a)gmail.com>
io_uring: don't re-import iovecs from callbacks
Stephen Brennan <stephen.s.brennan(a)oracle.com>
assoc_array: Fix BUG_ON during garbage collect
Miri Korenblit <miriam.rachel.korenblit(a)intel.com>
cfg80211: set custom regdomain after wiphy registration
David Howells <dhowells(a)redhat.com>
pipe: Fix missing lock in pipe_resize_ring()
Kuniyuki Iwashima <kuniyu(a)amazon.co.jp>
pipe: make poll_usage boolean and annotate its access
Pablo Neira Ayuso <pablo(a)netfilter.org>
netfilter: nf_tables: disallow non-stateful expression in sets earlier
Piyush Malgujar <pmalgujar(a)marvell.com>
drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
Mika Westerberg <mika.westerberg(a)linux.intel.com>
i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
Joel Stanley <joel(a)jms.id.au>
net: ftgmac100: Disable hardware checksum on AST2600
Lin Ma <linma(a)zju.edu.cn>
nfc: pn533: Fix buggy cleanup order
Thomas Bartschies <thomas.bartschies(a)cvk.de>
net: af_key: check encryption module availability consistency
Al Viro <viro(a)zeniv.linux.org.uk>
percpu_ref_init(): clean ->percpu_count_ref on failure
IotaHydrae <writeforever(a)foxmail.com>
pinctrl: sunxi: fix f1c100s uart2 function
-------------
Diffstat:
Documentation/process/submitting-patches.rst | 2 +-
Makefile | 4 +-
arch/arm/boot/dts/s5pv210-aries.dtsi | 2 +-
arch/powerpc/kvm/book3s_hv_uvmem.c | 8 +-
arch/x86/kernel/kvm.c | 41 ++++++----
arch/x86/kvm/x86.c | 31 +++++---
crypto/Kconfig | 2 -
crypto/drbg.c | 110 +++++++++++---------------
crypto/ecrdsa.c | 8 +-
drivers/bluetooth/hci_qca.c | 4 +-
drivers/char/random.c | 2 -
drivers/char/tpm/tpm2-cmd.c | 11 ++-
drivers/char/tpm/tpm_ibmvtpm.c | 1 +
drivers/crypto/caam/ctrl.c | 18 +++++
drivers/gpu/drm/i915/intel_pm.c | 2 +-
drivers/hid/hid-ids.h | 1 +
drivers/hid/hid-multitouch.c | 9 +++
drivers/i2c/busses/i2c-ismt.c | 14 ++++
drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 +
drivers/md/dm-crypt.c | 14 +++-
drivers/md/dm-integrity.c | 2 -
drivers/md/dm-stats.c | 8 ++
drivers/md/dm-verity-target.c | 1 +
drivers/md/raid5.c | 47 ++++++-----
drivers/net/ethernet/faraday/ftgmac100.c | 5 ++
drivers/net/ipa/ipa_endpoint.c | 4 +-
drivers/nfc/pn533/pn533.c | 5 +-
drivers/pinctrl/sunxi/pinctrl-suniv-f1c100s.c | 2 +-
fs/exfat/balloc.c | 8 +-
fs/exfat/exfat_fs.h | 8 ++
fs/exfat/fatent.c | 8 --
fs/exfat/namei.c | 27 +------
fs/io_uring.c | 47 ++---------
fs/nfs/internal.h | 1 +
fs/nfsd/nfs4state.c | 12 +--
fs/pipe.c | 33 ++++----
fs/xfs/libxfs/xfs_bmap.c | 5 ++
fs/xfs/libxfs/xfs_dir2.h | 2 -
fs/xfs/libxfs/xfs_dir2_sf.c | 2 +-
fs/xfs/xfs_buf_item.c | 37 +++++----
fs/xfs/xfs_inode.c | 42 ++++++----
fs/xfs/xfs_inode_item.c | 14 ++++
fs/xfs/xfs_iwalk.c | 2 +-
fs/xfs/xfs_log_cil.c | 22 ++++--
fs/xfs/xfs_super.c | 10 ++-
include/crypto/drbg.h | 10 ++-
include/linux/pipe_fs_i.h | 2 +-
include/net/netfilter/nf_conntrack_core.h | 7 +-
kernel/bpf/trampoline.c | 18 +++--
lib/Kconfig | 2 +
lib/assoc_array.c | 8 ++
lib/crypto/Kconfig | 17 ++--
lib/percpu-refcount.c | 1 +
mm/zsmalloc.c | 37 ++++++++-
net/core/filter.c | 4 +-
net/key/af_key.c | 6 +-
net/netfilter/nf_tables_api.c | 36 ++++++---
net/wireless/core.c | 8 +-
net/wireless/reg.c | 1 +
59 files changed, 461 insertions(+), 335 deletions(-)
This is the start of the stable review cycle for the 4.19.246 release.
There are 30 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.246-r…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.19.246-rc1
Liu Jian <liujian56(a)huawei.com>
bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix possible sleep during nfsd4_release_lockowner()
Akira Yokosawa <akiyks(a)gmail.com>
docs: submitting-patches: Fix crossref to 'The canonical patch format'
Xiu Jianfeng <xiujianfeng(a)huawei.com>
tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
Stefan Mahnke-Hartmann <stefan.mahnke-hartmann(a)infineon.com>
tpm: Fix buffer access in tpm2_get_tpm_pt()
Marek Maślanka <mm(a)semihalf.com>
HID: multitouch: Add support for Google Whiskers Touchpad
Sarthak Kukreti <sarthakkukreti(a)google.com>
dm verity: set DM_TARGET_IMMUTABLE feature flag
Mikulas Patocka <mpatocka(a)redhat.com>
dm stats: add cond_resched when looping over entries
Mikulas Patocka <mpatocka(a)redhat.com>
dm crypt: make printing of the key constant-time
Dan Carpenter <dan.carpenter(a)oracle.com>
dm integrity: fix error code in dm_integrity_ctr()
Sultan Alsawaf <sultan(a)kerneltoast.com>
zsmalloc: fix races between asynchronous zspage free and page migration
Florian Westphal <fw(a)strlen.de>
netfilter: conntrack: re-fetch conntrack after insertion
Kees Cook <keescook(a)chromium.org>
exec: Force single empty string when argv is empty
Haimin Zhang <tcs.kernel(a)gmail.com>
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
Gustavo A. R. Silva <gustavoars(a)kernel.org>
drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf tests bp_account: Make global variable static
Arnaldo Carvalho de Melo <acme(a)redhat.com>
perf bench: Share some global variables to fix build with gcc 10
Ben Hutchings <ben(a)decadent.org.uk>
libtraceevent: Fix build with binutils 2.35
Miri Korenblit <miriam.rachel.korenblit(a)intel.com>
cfg80211: set custom regdomain after wiphy registration
Stephen Brennan <stephen.s.brennan(a)oracle.com>
assoc_array: Fix BUG_ON during garbage collect
Piyush Malgujar <pmalgujar(a)marvell.com>
drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
Mika Westerberg <mika.westerberg(a)linux.intel.com>
i2c: ismt: Provide a DMA buffer for Interrupt Cause Logging
Joel Stanley <joel(a)jms.id.au>
net: ftgmac100: Disable hardware checksum on AST2600
Thomas Bartschies <thomas.bartschies(a)cvk.de>
net: af_key: check encryption module availability consistency
Lorenzo Pieralisi <lorenzo.pieralisi(a)arm.com>
ACPI: sysfs: Fix BERT error region memory mapping
Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
ACPI: sysfs: Make sparse happy about address space in use
Willy Tarreau <w(a)1wt.eu>
secure_seq: use the 64 bits of the siphash for port offset calculation
Eric Dumazet <edumazet(a)google.com>
tcp: change source port randomizarion at connect() time
Denis Efremov (Oracle) <efremov(a)linux.com>
staging: rtl8723bs: prevent ->Ssid overflow in rtw_wx_set_scan()
Thomas Gleixner <tglx(a)linutronix.de>
x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests
-------------
Diffstat:
Documentation/process/submitting-patches.rst | 2 +-
Makefile | 4 +--
arch/x86/pci/xen.c | 5 ++++
block/bio.c | 2 +-
drivers/acpi/sysfs.c | 23 +++++++++++-----
drivers/char/tpm/tpm2-cmd.c | 11 +++++++-
drivers/char/tpm/tpm_ibmvtpm.c | 1 +
drivers/gpu/drm/i915/intel_pm.c | 2 +-
drivers/hid/hid-multitouch.c | 3 +++
drivers/i2c/busses/i2c-ismt.c | 14 ++++++++++
drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 +
drivers/md/dm-crypt.c | 14 +++++++---
drivers/md/dm-integrity.c | 2 --
drivers/md/dm-stats.c | 8 ++++++
drivers/md/dm-verity-target.c | 1 +
drivers/net/ethernet/faraday/ftgmac100.c | 5 ++++
drivers/staging/rtl8723bs/os_dep/ioctl_linux.c | 6 +++--
fs/exec.c | 17 ++++++++++++
fs/nfsd/nfs4state.c | 12 +++------
include/net/inet_hashtables.h | 2 +-
include/net/netfilter/nf_conntrack_core.h | 7 ++++-
include/net/secure_seq.h | 4 +--
lib/assoc_array.c | 8 ++++++
mm/zsmalloc.c | 37 +++++++++++++++++++++++---
net/core/filter.c | 4 +--
net/core/secure_seq.c | 4 +--
net/ipv4/inet_hashtables.c | 28 ++++++++++++++-----
net/ipv6/inet6_hashtables.c | 4 +--
net/key/af_key.c | 6 ++---
net/wireless/core.c | 7 ++---
net/wireless/reg.c | 1 +
tools/lib/traceevent/Makefile | 2 +-
tools/perf/bench/bench.h | 4 +++
tools/perf/bench/futex-hash.c | 12 ++++-----
tools/perf/bench/futex-lock-pi.c | 11 ++++----
tools/perf/tests/bp_account.c | 2 +-
36 files changed, 209 insertions(+), 67 deletions(-)
This is the start of the stable review cycle for the 4.9.317 release.
There are 12 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 05 Jun 2022 17:38:05 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.9.317-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.9.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.9.317-rc1
Liu Jian <liujian56(a)huawei.com>
bpf: Enlarge offset check value to INT_MAX in bpf_skb_{load,store}_bytes
Chuck Lever <chuck.lever(a)oracle.com>
NFSD: Fix possible sleep during nfsd4_release_lockowner()
Xiu Jianfeng <xiujianfeng(a)huawei.com>
tpm: ibmvtpm: Correct the return value in tpm_ibmvtpm_probe()
Sarthak Kukreti <sarthakkukreti(a)google.com>
dm verity: set DM_TARGET_IMMUTABLE feature flag
Mikulas Patocka <mpatocka(a)redhat.com>
dm stats: add cond_resched when looping over entries
Mikulas Patocka <mpatocka(a)redhat.com>
dm crypt: make printing of the key constant-time
Kees Cook <keescook(a)chromium.org>
exec: Force single empty string when argv is empty
Haimin Zhang <tcs.kernel(a)gmail.com>
block-map: add __GFP_ZERO flag for alloc_page in function bio_copy_kern
Gustavo A. R. Silva <gustavoars(a)kernel.org>
drm/i915: Fix -Wstringop-overflow warning in call to intel_read_wm_latency()
Stephen Brennan <stephen.s.brennan(a)oracle.com>
assoc_array: Fix BUG_ON during garbage collect
Piyush Malgujar <pmalgujar(a)marvell.com>
drivers: i2c: thunderx: Allow driver to work with ACPI defined TWSI controllers
Thomas Bartschies <thomas.bartschies(a)cvk.de>
net: af_key: check encryption module availability consistency
-------------
Diffstat:
Makefile | 4 ++--
block/bio.c | 2 +-
drivers/char/tpm/tpm_ibmvtpm.c | 1 +
drivers/gpu/drm/i915/intel_pm.c | 2 +-
drivers/i2c/busses/i2c-thunderx-pcidrv.c | 1 +
drivers/md/dm-crypt.c | 15 +++++++++++----
drivers/md/dm-stats.c | 8 ++++++++
drivers/md/dm-verity-target.c | 1 +
fs/exec.c | 17 +++++++++++++++++
fs/nfsd/nfs4state.c | 12 ++++--------
lib/assoc_array.c | 8 ++++++++
net/core/filter.c | 4 ++--
net/key/af_key.c | 6 +++---
13 files changed, 60 insertions(+), 21 deletions(-)
From: Pablo Neira Ayuso <pablo(a)netfilter.org>
commit 520778042ccca019f3ffa136dd0ca565c486cedd upstream.
Since 3e135cd499bf ("netfilter: nft_dynset: dynamic stateful expression
instantiation"), it is possible to attach stateful expressions to set
elements.
cd5125d8f518 ("netfilter: nf_tables: split set destruction in deactivate
and destroy phase") introduces conditional destruction on the object to
accomodate transaction semantics.
nft_expr_init() calls expr->ops->init() first, then check for
NFT_STATEFUL_EXPR, this stills allows to initialize a non-stateful
lookup expressions which points to a set, which might lead to UAF since
the set is not properly detached from the set->binding for this case.
Anyway, this combination is non-sense from nf_tables perspective.
This patch fixes this problem by checking for NFT_STATEFUL_EXPR before
expr->ops->init() is called.
The reporter provides a KASAN splat and a poc reproducer (similar to
those autogenerated by syzbot to report use-after-free errors). It is
unknown to me if they are using syzbot or if they use similar automated
tool to locate the bug that they are reporting.
For the record, this is the KASAN splat.
[ 85.431824] ==================================================================
[ 85.432901] BUG: KASAN: use-after-free in nf_tables_bind_set+0x81b/0xa20
[ 85.433825] Write of size 8 at addr ffff8880286f0e98 by task poc/776
[ 85.434756]
[ 85.434999] CPU: 1 PID: 776 Comm: poc Tainted: G W 5.18.0+ #2
[ 85.436023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
Fixes: 0b2d8a7b638b ("netfilter: nf_tables: add helper functions for expression handling")
Reported-and-tested-by: Aaron Adams <edg-e(a)nccgroup.com>
Signed-off-by: Pablo Neira Ayuso <pablo(a)netfilter.org>
[Ajay: Regenerated the patch for v4.19.y]
Signed-off-by: Ajay Kaher <akaher(a)vmware.com>
---
net/netfilter/nf_tables_api.c | 16 ++++++++++------
net/netfilter/nft_dynset.c | 3 ---
2 files changed, 10 insertions(+), 9 deletions(-)
diff --git a/net/netfilter/nf_tables_api.c b/net/netfilter/nf_tables_api.c
index 9cc8e92..ab68076 100644
--- a/net/netfilter/nf_tables_api.c
+++ b/net/netfilter/nf_tables_api.c
@@ -2167,27 +2167,31 @@ struct nft_expr *nft_expr_init(const struct nft_ctx *ctx,
err = nf_tables_expr_parse(ctx, nla, &info);
if (err < 0)
- goto err1;
+ goto err_expr_parse;
+
+ err = -EOPNOTSUPP;
+ if (!(info.ops->type->flags & NFT_EXPR_STATEFUL))
+ goto err_expr_stateful;
err = -ENOMEM;
expr = kzalloc(info.ops->size, GFP_KERNEL);
if (expr == NULL)
- goto err2;
+ goto err_expr_stateful;
err = nf_tables_newexpr(ctx, &info, expr);
if (err < 0)
- goto err3;
+ goto err_expr_new;
return expr;
-err3:
+err_expr_new:
kfree(expr);
-err2:
+err_expr_stateful:
owner = info.ops->type->owner;
if (info.ops->type->release_ops)
info.ops->type->release_ops(info.ops);
module_put(owner);
-err1:
+err_expr_parse:
return ERR_PTR(err);
}
diff --git a/net/netfilter/nft_dynset.c b/net/netfilter/nft_dynset.c
index 4e54404..cc076d5 100644
--- a/net/netfilter/nft_dynset.c
+++ b/net/netfilter/nft_dynset.c
@@ -193,9 +193,6 @@ static int nft_dynset_init(const struct nft_ctx *ctx,
return PTR_ERR(priv->expr);
err = -EOPNOTSUPP;
- if (!(priv->expr->ops->type->flags & NFT_EXPR_STATEFUL))
- goto err1;
-
if (priv->expr->ops->type->flags & NFT_EXPR_GC) {
if (set->flags & NFT_SET_TIMEOUT)
goto err1;
--
2.7.4
בבקשה אני צריך את עזרתך
אני שולח לך את ברכתי מסולטנות עומאן, בעיר הבירה מוסקט.
האם אוכל להשתמש במדיום הזה כדי לפתוח איתך תקשורת הדדית, ולחפש את קבלתך
להשקעה במדינה שלך בניהולך כשותפה שלי, שמי עיישה קדאפי ומתגוררת כיום
בעומאן, אני אלמנה ואם חד הורית עם שלושה ילדים , בתו הביולוגית היחידה
של נשיא לוב המנוח (קולונל המנוח מועמר קדאפי) וכרגע אני נמצאת תחת הגנת
מקלט מדיני על ידי ממשלת עומני.
יש לי כספים בשווי "עשרים ושבע מיליון וחמש מאות אלף דולר אמריקאי"
-$27.500.000.00 דולר אמריקאי שאני רוצה להפקיד בידך עבור פרויקט השקעה
בארצך. אם אתה מוכן לטפל בפרויקט זה בשמי, אנא השב דחוף כדי לאפשר לי
לספק לך פרטים נוספים כדי להתחיל בתהליך ההעברה.
אודה לתגובתך הדחופה באמצעות כתובת הדוא"ל שלי למטה: madamgadafiaisha(a)gmail.com
תודה
שלך באמת עיישה
,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Please i need your help
I am sending my greetings to you from the Sultanate of Oman, In the
capital city of Muscat.
May i use this medium to open a mutual communication with you, and
seeking your acceptance towards investing in your country under your
management as my partner, My name is Aisha Gaddafi and presently
living in Oman, i am a Widow and single Mother with three Children,
the only biological Daughter of late Libyan President (Late Colonel
Muammar Gaddafi) and presently i am under political asylum protection
by the Omani Government.
I have funds worth “Twenty Seven Million Five Hundred Thousand United
State Dollars” -$27.500.000.00 US Dollars which i want to entrust on
you for investment project in your country. If you are willing to
handle this project on my behalf, kindly reply urgent to enable me
provide you more details to start the transfer process.
I shall appreciate your urgent response through my email address
below: madamgadafiaisha(a)gmail.com
Thanks
Yours Truly Aisha
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8ba0005ff418ec356e176b26eaa04a6ac755d05b Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:10:54 +0200
Subject: [PATCH] landlock: Fix same-layer rule unions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The original behavior was to check if the full set of requested accesses
was allowed by at least a rule of every relevant layer. This didn't
take into account requests for multiple accesses and same-layer rules
allowing the union of these accesses in a complementary way. As a
result, multiple accesses requested on a file hierarchy matching rules
that, together, allowed these accesses, but without a unique rule
allowing all of them, was illegitimately denied. This case should be
rare in practice and it can only be triggered by the path_rename or
file_open hook implementations.
For instance, if, for the same layer, a rule allows execution
beneath /a/b and another rule allows read beneath /a, requesting access
to read and execute at the same time for /a/b should be allowed for this
layer.
This was an inconsistency because the union of same-layer rule accesses
was already allowed if requested once at a time anyway.
This fix changes the way allowed accesses are gathered over a path walk.
To take into account all these rule accesses, we store in a matrix all
layer granting the set of requested accesses, according to the handled
accesses. To avoid heap allocation, we use an array on the stack which
is 2*13 bytes. A following commit bringing the LANDLOCK_ACCESS_FS_REFER
access right will increase this size to reach 112 bytes (2*14*4) in case
of link or rename actions.
Add a new layout1.layer_rule_unions test to check that accesses from
different rules pertaining to the same layer are ORed in a file
hierarchy. Also test that it is not the case for rules from different
layers.
Reviewed-by: Paul Moore <paul(a)paul-moore.com>
Link: https://lore.kernel.org/r/20220506161102.525323-5-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 20953bff8fd5..c5749301b37d 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -207,45 +207,67 @@ find_rule(const struct landlock_ruleset *const domain,
return rule;
}
-static inline layer_mask_t unmask_layers(const struct landlock_rule *const rule,
- const access_mask_t access_request,
- layer_mask_t layer_mask)
+/*
+ * @layer_masks is read and may be updated according to the access request and
+ * the matching rule.
+ *
+ * Returns true if the request is allowed (i.e. relevant layer masks for the
+ * request are empty).
+ */
+static inline bool
+unmask_layers(const struct landlock_rule *const rule,
+ const access_mask_t access_request,
+ layer_mask_t (*const layer_masks)[LANDLOCK_NUM_ACCESS_FS])
{
size_t layer_level;
+ if (!access_request || !layer_masks)
+ return true;
if (!rule)
- return layer_mask;
+ return false;
/*
* An access is granted if, for each policy layer, at least one rule
- * encountered on the pathwalk grants the requested accesses,
- * regardless of their position in the layer stack. We must then check
+ * encountered on the pathwalk grants the requested access,
+ * regardless of its position in the layer stack. We must then check
* the remaining layers for each inode, from the first added layer to
- * the last one.
+ * the last one. When there is multiple requested accesses, for each
+ * policy layer, the full set of requested accesses may not be granted
+ * by only one rule, but by the union (binary OR) of multiple rules.
+ * E.g. /a/b <execute> + /a <read> => /a/b <execute + read>
*/
for (layer_level = 0; layer_level < rule->num_layers; layer_level++) {
const struct landlock_layer *const layer =
&rule->layers[layer_level];
const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
+ const unsigned long access_req = access_request;
+ unsigned long access_bit;
+ bool is_empty;
- /* Checks that the layer grants access to the full request. */
- if ((layer->access & access_request) == access_request) {
- layer_mask &= ~layer_bit;
-
- if (layer_mask == 0)
- return layer_mask;
+ /*
+ * Records in @layer_masks which layer grants access to each
+ * requested access.
+ */
+ is_empty = true;
+ for_each_set_bit(access_bit, &access_req,
+ ARRAY_SIZE(*layer_masks)) {
+ if (layer->access & BIT_ULL(access_bit))
+ (*layer_masks)[access_bit] &= ~layer_bit;
+ is_empty = is_empty && !(*layer_masks)[access_bit];
}
+ if (is_empty)
+ return true;
}
- return layer_mask;
+ return false;
}
static int check_access_path(const struct landlock_ruleset *const domain,
const struct path *const path,
const access_mask_t access_request)
{
- bool allowed = false;
+ layer_mask_t layer_masks[LANDLOCK_NUM_ACCESS_FS] = {};
+ bool allowed = false, has_access = false;
struct path walker_path;
- layer_mask_t layer_mask;
size_t i;
if (!access_request)
@@ -265,13 +287,20 @@ static int check_access_path(const struct landlock_ruleset *const domain,
return -EACCES;
/* Saves all layers handling a subset of requested accesses. */
- layer_mask = 0;
for (i = 0; i < domain->num_layers; i++) {
- if (domain->fs_access_masks[i] & access_request)
- layer_mask |= BIT_ULL(i);
+ const unsigned long access_req = access_request;
+ unsigned long access_bit;
+
+ for_each_set_bit(access_bit, &access_req,
+ ARRAY_SIZE(layer_masks)) {
+ if (domain->fs_access_masks[i] & BIT_ULL(access_bit)) {
+ layer_masks[access_bit] |= BIT_ULL(i);
+ has_access = true;
+ }
+ }
}
/* An access request not handled by the domain is allowed. */
- if (layer_mask == 0)
+ if (!has_access)
return 0;
walker_path = *path;
@@ -283,14 +312,11 @@ static int check_access_path(const struct landlock_ruleset *const domain,
while (true) {
struct dentry *parent_dentry;
- layer_mask =
- unmask_layers(find_rule(domain, walker_path.dentry),
- access_request, layer_mask);
- if (layer_mask == 0) {
+ allowed = unmask_layers(find_rule(domain, walker_path.dentry),
+ access_request, &layer_masks);
+ if (allowed)
/* Stops when a rule from each layer grants access. */
- allowed = true;
break;
- }
jump_up:
if (walker_path.dentry == walker_path.mnt->mnt_root) {
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 521af2848951..d43231b783e4 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -22,6 +22,8 @@
typedef u16 access_mask_t;
/* Makes sure all filesystem access rights can be stored. */
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
+/* Makes sure for_each_set_bit() and for_each_clear_bit() calls are OK. */
+static_assert(sizeof(unsigned long) >= sizeof(access_mask_t));
typedef u16 layer_mask_t;
/* Makes sure all layers can be checked. */
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index e13f046a172a..a4fdcda62bde 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -758,6 +758,113 @@ TEST_F_FORK(layout1, ruleset_overlap)
ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY | O_DIRECTORY));
}
+TEST_F_FORK(layout1, layer_rule_unions)
+{
+ const struct rule layer1[] = {
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE,
+ },
+ /* dir_s1d3 should allow READ_FILE and WRITE_FILE (O_RDWR). */
+ {
+ .path = dir_s1d3,
+ .access = LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {},
+ };
+ const struct rule layer2[] = {
+ /* Doesn't change anything from layer1. */
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_READ_FILE |
+ LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {},
+ };
+ const struct rule layer3[] = {
+ /* Only allows write (but not read) to dir_s1d3. */
+ {
+ .path = dir_s1d2,
+ .access = LANDLOCK_ACCESS_FS_WRITE_FILE,
+ },
+ {},
+ };
+ int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, layer1);
+
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks s1d1 hierarchy with layer1. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d2 hierarchy with layer1. */
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d3 hierarchy with layer1. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_WRONLY));
+ /* dir_s1d3 should allow READ_FILE and WRITE_FILE (O_RDWR). */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Doesn't change anything from layer1. */
+ ruleset_fd = create_ruleset(_metadata, ACCESS_RW, layer2);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks s1d1 hierarchy with layer2. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d2 hierarchy with layer2. */
+ ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d3 hierarchy with layer2. */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_WRONLY));
+ /* dir_s1d3 should allow READ_FILE and WRITE_FILE (O_RDWR). */
+ ASSERT_EQ(0, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Only allows write (but not read) to dir_s1d3. */
+ ruleset_fd = create_ruleset(_metadata, ACCESS_RW, layer3);
+ ASSERT_LE(0, ruleset_fd);
+ enforce_ruleset(_metadata, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
+
+ /* Checks s1d1 hierarchy with layer3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d2 hierarchy with layer3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_RDONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_WRONLY));
+ ASSERT_EQ(EACCES, test_open(file1_s1d2, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+
+ /* Checks s1d3 hierarchy with layer3. */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d3, O_WRONLY));
+ /* dir_s1d3 should now deny READ_FILE and WRITE_FILE (O_RDWR). */
+ ASSERT_EQ(EACCES, test_open(file1_s1d3, O_RDWR));
+ ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY | O_DIRECTORY));
+}
+
TEST_F_FORK(layout1, non_overlapping_accesses)
{
const struct rule layer1[] = {
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2cd7cd6eed88b8383cfddce589afe9c0ae1d19b4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:10:53 +0200
Subject: [PATCH] landlock: Create find_rule() from unmask_layers()
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This refactoring will be useful in a following commit.
Reviewed-by: Paul Moore <paul(a)paul-moore.com>
Link: https://lore.kernel.org/r/20220506161102.525323-4-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index f48c0a3b1e75..20953bff8fd5 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -183,23 +183,36 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
/* Access-control management */
-static inline layer_mask_t
-unmask_layers(const struct landlock_ruleset *const domain,
- const struct path *const path, const access_mask_t access_request,
- layer_mask_t layer_mask)
+/*
+ * The lifetime of the returned rule is tied to @domain.
+ *
+ * Returns NULL if no rule is found or if @dentry is negative.
+ */
+static inline const struct landlock_rule *
+find_rule(const struct landlock_ruleset *const domain,
+ const struct dentry *const dentry)
{
const struct landlock_rule *rule;
const struct inode *inode;
- size_t i;
- if (d_is_negative(path->dentry))
- /* Ignore nonexistent leafs. */
- return layer_mask;
- inode = d_backing_inode(path->dentry);
+ /* Ignores nonexistent leafs. */
+ if (d_is_negative(dentry))
+ return NULL;
+
+ inode = d_backing_inode(dentry);
rcu_read_lock();
rule = landlock_find_rule(
domain, rcu_dereference(landlock_inode(inode)->object));
rcu_read_unlock();
+ return rule;
+}
+
+static inline layer_mask_t unmask_layers(const struct landlock_rule *const rule,
+ const access_mask_t access_request,
+ layer_mask_t layer_mask)
+{
+ size_t layer_level;
+
if (!rule)
return layer_mask;
@@ -210,8 +223,9 @@ unmask_layers(const struct landlock_ruleset *const domain,
* the remaining layers for each inode, from the first added layer to
* the last one.
*/
- for (i = 0; i < rule->num_layers; i++) {
- const struct landlock_layer *const layer = &rule->layers[i];
+ for (layer_level = 0; layer_level < rule->num_layers; layer_level++) {
+ const struct landlock_layer *const layer =
+ &rule->layers[layer_level];
const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
/* Checks that the layer grants access to the full request. */
@@ -269,8 +283,9 @@ static int check_access_path(const struct landlock_ruleset *const domain,
while (true) {
struct dentry *parent_dentry;
- layer_mask = unmask_layers(domain, &walker_path, access_request,
- layer_mask);
+ layer_mask =
+ unmask_layers(find_rule(domain, walker_path.dentry),
+ access_request, layer_mask);
if (layer_mask == 0) {
/* Stops when a rule from each layer grants access. */
allowed = true;
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eba39ca4b155c54adf471a69e91799cc1727873f Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:19 +0200
Subject: [PATCH] landlock: Change landlock_restrict_self(2) check ordering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the Landlock goal to be a security feature available to
unprivileges processes, it makes more sense to first check for
no_new_privs before checking anything else (i.e. syscall arguments).
Merge inval_fd_enforce and unpriv_enforce_without_no_new_privs tests
into the new restrict_self_checks_ordering. This is similar to the
previous commit checking other syscalls.
Link: https://lore.kernel.org/r/20220506160820.524344-10-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index a7396220c9d4..507d43827afe 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -405,10 +405,6 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
if (!landlock_initialized)
return -EOPNOTSUPP;
- /* No flag for now. */
- if (flags)
- return -EINVAL;
-
/*
* Similar checks as for seccomp(2), except that an -EPERM may be
* returned.
@@ -417,6 +413,10 @@ SYSCALL_DEFINE2(landlock_restrict_self, const int, ruleset_fd, const __u32,
!ns_capable_noaudit(current_user_ns(), CAP_SYS_ADMIN))
return -EPERM;
+ /* No flag for now. */
+ if (flags)
+ return -EINVAL;
+
/* Gets and checks the ruleset. */
ruleset = get_ruleset_from_fd(ruleset_fd, FMODE_CAN_READ);
if (IS_ERR(ruleset))
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 18b779471dcb..21fb33581419 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -168,22 +168,49 @@ TEST(add_rule_checks_ordering)
ASSERT_EQ(0, close(ruleset_fd));
}
-TEST(inval_fd_enforce)
+/* Tests ordering of syscall argument and permission checks. */
+TEST(restrict_self_checks_ordering)
{
+ const struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = LANDLOCK_ACCESS_FS_EXECUTE,
+ };
+ struct landlock_path_beneath_attr path_beneath_attr = {
+ .allowed_access = LANDLOCK_ACCESS_FS_EXECUTE,
+ .parent_fd = -1,
+ };
+ const int ruleset_fd =
+ landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+
+ ASSERT_LE(0, ruleset_fd);
+ path_beneath_attr.parent_fd =
+ open("/tmp", O_PATH | O_NOFOLLOW | O_DIRECTORY | O_CLOEXEC);
+ ASSERT_LE(0, path_beneath_attr.parent_fd);
+ ASSERT_EQ(0, landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
+ &path_beneath_attr, 0));
+ ASSERT_EQ(0, close(path_beneath_attr.parent_fd));
+
+ /* Checks unprivileged enforcement without no_new_privs. */
+ drop_caps(_metadata);
+ ASSERT_EQ(-1, landlock_restrict_self(-1, -1));
+ ASSERT_EQ(EPERM, errno);
+ ASSERT_EQ(-1, landlock_restrict_self(-1, 0));
+ ASSERT_EQ(EPERM, errno);
+ ASSERT_EQ(-1, landlock_restrict_self(ruleset_fd, 0));
+ ASSERT_EQ(EPERM, errno);
+
ASSERT_EQ(0, prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0));
+ /* Checks invalid flags. */
+ ASSERT_EQ(-1, landlock_restrict_self(-1, -1));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* Checks invalid ruleset FD. */
ASSERT_EQ(-1, landlock_restrict_self(-1, 0));
ASSERT_EQ(EBADF, errno);
-}
-
-TEST(unpriv_enforce_without_no_new_privs)
-{
- int err;
- drop_caps(_metadata);
- err = landlock_restrict_self(-1, 0);
- ASSERT_EQ(EPERM, errno);
- ASSERT_EQ(err, -1);
+ /* Checks valid call. */
+ ASSERT_EQ(0, landlock_restrict_self(ruleset_fd, 0));
+ ASSERT_EQ(0, close(ruleset_fd));
}
TEST(ruleset_fd_io)
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d1788ad990874734341b05ab8ccb6448c09c6422 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:17 +0200
Subject: [PATCH] selftests/landlock: Add tests for O_PATH
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The O_PATH flag is currently not handled by Landlock. Let's make sure
this behavior will remain consistent with the same ruleset over time.
Cc: Shuah Khan <shuah(a)kernel.org>
Link: https://lore.kernel.org/r/20220506160820.524344-8-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index 9165f6adf7b9..a8f54c4462eb 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -654,17 +654,23 @@ TEST_F_FORK(layout1, effective_access)
enforce_ruleset(_metadata, ruleset_fd);
ASSERT_EQ(0, close(ruleset_fd));
- /* Tests on a directory. */
+ /* Tests on a directory (with or without O_PATH). */
ASSERT_EQ(EACCES, test_open("/", O_RDONLY));
+ ASSERT_EQ(0, test_open("/", O_RDONLY | O_PATH));
ASSERT_EQ(EACCES, test_open(dir_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s1d1, O_RDONLY | O_PATH));
ASSERT_EQ(EACCES, test_open(file1_s1d1, O_RDONLY));
+ ASSERT_EQ(0, test_open(file1_s1d1, O_RDONLY | O_PATH));
+
ASSERT_EQ(0, test_open(dir_s1d2, O_RDONLY));
ASSERT_EQ(0, test_open(file1_s1d2, O_RDONLY));
ASSERT_EQ(0, test_open(dir_s1d3, O_RDONLY));
ASSERT_EQ(0, test_open(file1_s1d3, O_RDONLY));
- /* Tests on a file. */
+ /* Tests on a file (with or without O_PATH). */
ASSERT_EQ(EACCES, test_open(dir_s2d2, O_RDONLY));
+ ASSERT_EQ(0, test_open(dir_s2d2, O_RDONLY | O_PATH));
+
ASSERT_EQ(0, test_open(file1_s2d2, O_RDONLY));
/* Checks effective read and write actions. */
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 75c542d6c6cc48720376862d5496d51509160dfd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:10:52 +0200
Subject: [PATCH] landlock: Reduce the maximum number of layers to 16
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The maximum number of nested Landlock domains is currently 64. Because
of the following fix and to help reduce the stack size, let's reduce it
to 16. This seems large enough for a lot of use cases (e.g. sandboxed
init service, spawning a sandboxed SSH service, in nested sandboxed
containers). Reducing the number of nested domains may also help to
discover misuse of Landlock (e.g. creating a domain per rule).
Add and use a dedicated layer_mask_t typedef to fit with the number of
layers. This might be useful when changing it and to keep it consistent
with the maximum number of layers.
Reviewed-by: Paul Moore <paul(a)paul-moore.com>
Link: https://lore.kernel.org/r/20220506161102.525323-3-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/Documentation/userspace-api/landlock.rst b/Documentation/userspace-api/landlock.rst
index f35552ff19ba..b68e7a51009f 100644
--- a/Documentation/userspace-api/landlock.rst
+++ b/Documentation/userspace-api/landlock.rst
@@ -267,8 +267,8 @@ restrict such paths with dedicated ruleset flags.
Ruleset layers
--------------
-There is a limit of 64 layers of stacked rulesets. This can be an issue for a
-task willing to enforce a new ruleset in complement to its 64 inherited
+There is a limit of 16 layers of stacked rulesets. This can be an issue for a
+task willing to enforce a new ruleset in complement to its 16 inherited
rulesets. Once this limit is reached, sys_landlock_restrict_self() returns
E2BIG. It is then strongly suggested to carefully build rulesets once in the
life of a thread, especially for applications able to launch other applications
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index d4006add8bdf..f48c0a3b1e75 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -183,10 +183,10 @@ int landlock_append_fs_rule(struct landlock_ruleset *const ruleset,
/* Access-control management */
-static inline u64 unmask_layers(const struct landlock_ruleset *const domain,
- const struct path *const path,
- const access_mask_t access_request,
- u64 layer_mask)
+static inline layer_mask_t
+unmask_layers(const struct landlock_ruleset *const domain,
+ const struct path *const path, const access_mask_t access_request,
+ layer_mask_t layer_mask)
{
const struct landlock_rule *rule;
const struct inode *inode;
@@ -212,11 +212,11 @@ static inline u64 unmask_layers(const struct landlock_ruleset *const domain,
*/
for (i = 0; i < rule->num_layers; i++) {
const struct landlock_layer *const layer = &rule->layers[i];
- const u64 layer_level = BIT_ULL(layer->level - 1);
+ const layer_mask_t layer_bit = BIT_ULL(layer->level - 1);
/* Checks that the layer grants access to the full request. */
if ((layer->access & access_request) == access_request) {
- layer_mask &= ~layer_level;
+ layer_mask &= ~layer_bit;
if (layer_mask == 0)
return layer_mask;
@@ -231,12 +231,9 @@ static int check_access_path(const struct landlock_ruleset *const domain,
{
bool allowed = false;
struct path walker_path;
- u64 layer_mask;
+ layer_mask_t layer_mask;
size_t i;
- /* Make sure all layers can be checked. */
- BUILD_BUG_ON(BITS_PER_TYPE(layer_mask) < LANDLOCK_MAX_NUM_LAYERS);
-
if (!access_request)
return 0;
if (WARN_ON_ONCE(!domain || !path))
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 41372f22837f..17c2a2e7fe1e 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -15,7 +15,7 @@
/* clang-format off */
-#define LANDLOCK_MAX_NUM_LAYERS 64
+#define LANDLOCK_MAX_NUM_LAYERS 16
#define LANDLOCK_MAX_NUM_RULES U32_MAX
#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_MAKE_SYM
diff --git a/security/landlock/ruleset.h b/security/landlock/ruleset.h
index 8d5717594931..521af2848951 100644
--- a/security/landlock/ruleset.h
+++ b/security/landlock/ruleset.h
@@ -23,6 +23,10 @@ typedef u16 access_mask_t;
/* Makes sure all filesystem access rights can be stored. */
static_assert(BITS_PER_TYPE(access_mask_t) >= LANDLOCK_NUM_ACCESS_FS);
+typedef u16 layer_mask_t;
+/* Makes sure all layers can be checked. */
+static_assert(BITS_PER_TYPE(layer_mask_t) >= LANDLOCK_MAX_NUM_LAYERS);
+
/**
* struct landlock_layer - Access rights for a given layer
*/
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index a8f54c4462eb..e13f046a172a 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -1159,7 +1159,7 @@ TEST_F_FORK(layout1, max_layers)
const int ruleset_fd = create_ruleset(_metadata, ACCESS_RW, rules);
ASSERT_LE(0, ruleset_fd);
- for (i = 0; i < 64; i++)
+ for (i = 0; i < 16; i++)
enforce_ruleset(_metadata, ruleset_fd);
for (i = 0; i < 2; i++) {
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6533d0c3a86ee1cc74ff37ac92ca597deb87015c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:20 +0200
Subject: [PATCH] selftests/landlock: Test landlock_create_ruleset(2) argument
check ordering
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Add inval_create_ruleset_arguments, extension of
inval_create_ruleset_flags, to also check error ordering for
landlock_create_ruleset(2).
This is similar to the previous commit checking landlock_add_rule(2).
Test coverage for security/landlock is 94.4% of 504 lines accorging to
gcc/gcov-11.
Link: https://lore.kernel.org/r/20220506160820.524344-11-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 21fb33581419..35f64832b869 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -97,14 +97,17 @@ TEST(abi_version)
ASSERT_EQ(EINVAL, errno);
}
-TEST(inval_create_ruleset_flags)
+/* Tests ordering of syscall argument checks. */
+TEST(create_ruleset_checks_ordering)
{
const int last_flag = LANDLOCK_CREATE_RULESET_VERSION;
const int invalid_flag = last_flag << 1;
+ int ruleset_fd;
const struct landlock_ruleset_attr ruleset_attr = {
.handled_access_fs = LANDLOCK_ACCESS_FS_READ_FILE,
};
+ /* Checks priority for invalid flags. */
ASSERT_EQ(-1, landlock_create_ruleset(NULL, 0, invalid_flag));
ASSERT_EQ(EINVAL, errno);
@@ -119,6 +122,22 @@ TEST(inval_create_ruleset_flags)
landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr),
invalid_flag));
ASSERT_EQ(EINVAL, errno);
+
+ /* Checks too big ruleset_attr size. */
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, -1, 0));
+ ASSERT_EQ(E2BIG, errno);
+
+ /* Checks too small ruleset_attr size. */
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 0, 0));
+ ASSERT_EQ(EINVAL, errno);
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr, 1, 0));
+ ASSERT_EQ(EINVAL, errno);
+
+ /* Checks valid call. */
+ ruleset_fd =
+ landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+ ASSERT_LE(0, ruleset_fd);
+ ASSERT_EQ(0, close(ruleset_fd));
}
/* Tests ordering of syscall argument checks. */
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c56b3bf566da5a0dd3b58ad97a614b0928b06ebf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:14 +0200
Subject: [PATCH] selftests/landlock: Add tests for unknown access rights
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Make sure that trying to use unknown access rights returns an error.
Cc: Shuah Khan <shuah(a)kernel.org>
Link: https://lore.kernel.org/r/20220506160820.524344-5-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/tools/testing/selftests/landlock/fs_test.c b/tools/testing/selftests/landlock/fs_test.c
index cc7fa7b17578..f293b7e2a1a7 100644
--- a/tools/testing/selftests/landlock/fs_test.c
+++ b/tools/testing/selftests/landlock/fs_test.c
@@ -448,6 +448,22 @@ TEST_F_FORK(layout1, file_access_rights)
ASSERT_EQ(0, close(path_beneath.parent_fd));
}
+TEST_F_FORK(layout1, unknown_access_rights)
+{
+ __u64 access_mask;
+
+ for (access_mask = 1ULL << 63; access_mask != ACCESS_LAST;
+ access_mask >>= 1) {
+ struct landlock_ruleset_attr ruleset_attr = {
+ .handled_access_fs = access_mask,
+ };
+
+ ASSERT_EQ(-1, landlock_create_ruleset(&ruleset_attr,
+ sizeof(ruleset_attr), 0));
+ ASSERT_EQ(EINVAL, errno);
+ }
+}
+
static void add_path_beneath(struct __test_metadata *const _metadata,
const int ruleset_fd, const __u64 allowed_access,
const char *const path)
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 291865bd7e8bb4b4033d341fa02dafa728e6378c Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:13 +0200
Subject: [PATCH] selftests/landlock: Extend tests for minimal valid attribute
size
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
This might be useful when the struct landlock_ruleset_attr will get more
fields.
Cc: Shuah Khan <shuah(a)kernel.org>
Link: https://lore.kernel.org/r/20220506160820.524344-4-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/tools/testing/selftests/landlock/base_test.c b/tools/testing/selftests/landlock/base_test.c
index 3faeae4233a4..be9b937256ac 100644
--- a/tools/testing/selftests/landlock/base_test.c
+++ b/tools/testing/selftests/landlock/base_test.c
@@ -35,6 +35,8 @@ TEST(inconsistent_attr)
ASSERT_EQ(EINVAL, errno);
ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 1, 0));
ASSERT_EQ(EINVAL, errno);
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 7, 0));
+ ASSERT_EQ(EINVAL, errno);
ASSERT_EQ(-1, landlock_create_ruleset(NULL, 1, 0));
/* The size if less than sizeof(struct landlock_attr_enforce). */
@@ -47,6 +49,9 @@ TEST(inconsistent_attr)
ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, page_size + 1, 0));
ASSERT_EQ(E2BIG, errno);
+ /* Checks minimal valid attribute size. */
+ ASSERT_EQ(-1, landlock_create_ruleset(ruleset_attr, 8, 0));
+ ASSERT_EQ(ENOMSG, errno);
ASSERT_EQ(-1, landlock_create_ruleset(
ruleset_attr,
sizeof(struct landlock_ruleset_attr), 0));
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a13e248ff90e81e9322406c0e618cf2168702f4e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:08:11 +0200
Subject: [PATCH] landlock: Fix landlock_add_rule(2) documentation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It is not mandatory to pass a file descriptor obtained with the O_PATH
flag. Also, replace rule's accesses with ruleset's accesses.
Link: https://lore.kernel.org/r/20220506160820.524344-2-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index 15c31abb0d76..21c8d58283c9 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -62,8 +62,9 @@ struct landlock_path_beneath_attr {
*/
__u64 allowed_access;
/**
- * @parent_fd: File descriptor, open with ``O_PATH``, which identifies
- * the parent directory of a file hierarchy, or just a file.
+ * @parent_fd: File descriptor, preferably opened with ``O_PATH``,
+ * which identifies the parent directory of a file hierarchy, or just a
+ * file.
*/
__s32 parent_fd;
/*
diff --git a/security/landlock/syscalls.c b/security/landlock/syscalls.c
index 2fde978bf8ca..7edc1d50e2bf 100644
--- a/security/landlock/syscalls.c
+++ b/security/landlock/syscalls.c
@@ -292,14 +292,13 @@ static int get_path_from_fd(const s32 fd, struct path *const path)
*
* - EOPNOTSUPP: Landlock is supported by the kernel but disabled at boot time;
* - EINVAL: @flags is not 0, or inconsistent access in the rule (i.e.
- * &landlock_path_beneath_attr.allowed_access is not a subset of the rule's
- * accesses);
+ * &landlock_path_beneath_attr.allowed_access is not a subset of the
+ * ruleset handled accesses);
* - ENOMSG: Empty accesses (e.g. &landlock_path_beneath_attr.allowed_access);
* - EBADF: @ruleset_fd is not a file descriptor for the current thread, or a
* member of @rule_attr is not a file descriptor as expected;
* - EBADFD: @ruleset_fd is not a ruleset file descriptor, or a member of
- * @rule_attr is not the expected file descriptor type (e.g. file open
- * without O_PATH);
+ * @rule_attr is not the expected file descriptor type;
* - EPERM: @ruleset_fd has no write access to the underlying ruleset;
* - EFAULT: @rule_attr inconsistency.
*/
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9805a722db071e1772b80e6e0ff33f35355639ac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:05:12 +0200
Subject: [PATCH] samples/landlock: Add clang-format exceptions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In preparation to a following commit, add clang-format on and
clang-format off stanzas around constant definitions. This enables to
keep aligned values, which is much more readable than packed
definitions.
Link: https://lore.kernel.org/r/20220506160513.523257-7-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 8859fc193542..5ce961b5bda7 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -70,11 +70,15 @@ static int parse_path(char *env_path, const char ***const path_list)
return num_paths;
}
+/* clang-format off */
+
#define ACCESS_FILE ( \
LANDLOCK_ACCESS_FS_EXECUTE | \
LANDLOCK_ACCESS_FS_WRITE_FILE | \
LANDLOCK_ACCESS_FS_READ_FILE)
+/* clang-format on */
+
static int populate_ruleset(
const char *const env_var, const int ruleset_fd,
const __u64 allowed_access)
@@ -139,6 +143,8 @@ static int populate_ruleset(
return ret;
}
+/* clang-format off */
+
#define ACCESS_FS_ROUGHLY_READ ( \
LANDLOCK_ACCESS_FS_EXECUTE | \
LANDLOCK_ACCESS_FS_READ_FILE | \
@@ -156,6 +162,8 @@ static int populate_ruleset(
LANDLOCK_ACCESS_FS_MAKE_BLOCK | \
LANDLOCK_ACCESS_FS_MAKE_SYM)
+/* clang-format on */
+
int main(const int argc, char *const argv[], char *const *const envp)
{
const char *cmd_path;
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 81709f3dccacf4104a4bc2daa80bdd767a9c4c54 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:05:13 +0200
Subject: [PATCH] samples/landlock: Format with clang-format
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Let's follow a consistent and documented coding style. Everything may
not be to our liking but it is better than tacit knowledge. Moreover,
this will help maintain style consistency between different developers.
This contains only whitespace changes.
Automatically formatted with:
clang-format-14 -i samples/landlock/*.[ch]
Link: https://lore.kernel.org/r/20220506160513.523257-8-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/samples/landlock/sandboxer.c b/samples/landlock/sandboxer.c
index 5ce961b5bda7..c089e9cdaf32 100644
--- a/samples/landlock/sandboxer.c
+++ b/samples/landlock/sandboxer.c
@@ -22,9 +22,9 @@
#include <unistd.h>
#ifndef landlock_create_ruleset
-static inline int landlock_create_ruleset(
- const struct landlock_ruleset_attr *const attr,
- const size_t size, const __u32 flags)
+static inline int
+landlock_create_ruleset(const struct landlock_ruleset_attr *const attr,
+ const size_t size, const __u32 flags)
{
return syscall(__NR_landlock_create_ruleset, attr, size, flags);
}
@@ -32,17 +32,18 @@ static inline int landlock_create_ruleset(
#ifndef landlock_add_rule
static inline int landlock_add_rule(const int ruleset_fd,
- const enum landlock_rule_type rule_type,
- const void *const rule_attr, const __u32 flags)
+ const enum landlock_rule_type rule_type,
+ const void *const rule_attr,
+ const __u32 flags)
{
- return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type,
- rule_attr, flags);
+ return syscall(__NR_landlock_add_rule, ruleset_fd, rule_type, rule_attr,
+ flags);
}
#endif
#ifndef landlock_restrict_self
static inline int landlock_restrict_self(const int ruleset_fd,
- const __u32 flags)
+ const __u32 flags)
{
return syscall(__NR_landlock_restrict_self, ruleset_fd, flags);
}
@@ -79,9 +80,8 @@ static int parse_path(char *env_path, const char ***const path_list)
/* clang-format on */
-static int populate_ruleset(
- const char *const env_var, const int ruleset_fd,
- const __u64 allowed_access)
+static int populate_ruleset(const char *const env_var, const int ruleset_fd,
+ const __u64 allowed_access)
{
int num_paths, i, ret = 1;
char *env_path_name;
@@ -111,12 +111,10 @@ static int populate_ruleset(
for (i = 0; i < num_paths; i++) {
struct stat statbuf;
- path_beneath.parent_fd = open(path_list[i], O_PATH |
- O_CLOEXEC);
+ path_beneath.parent_fd = open(path_list[i], O_PATH | O_CLOEXEC);
if (path_beneath.parent_fd < 0) {
fprintf(stderr, "Failed to open \"%s\": %s\n",
- path_list[i],
- strerror(errno));
+ path_list[i], strerror(errno));
goto out_free_name;
}
if (fstat(path_beneath.parent_fd, &statbuf)) {
@@ -127,9 +125,10 @@ static int populate_ruleset(
if (!S_ISDIR(statbuf.st_mode))
path_beneath.allowed_access &= ACCESS_FILE;
if (landlock_add_rule(ruleset_fd, LANDLOCK_RULE_PATH_BENEATH,
- &path_beneath, 0)) {
- fprintf(stderr, "Failed to update the ruleset with \"%s\": %s\n",
- path_list[i], strerror(errno));
+ &path_beneath, 0)) {
+ fprintf(stderr,
+ "Failed to update the ruleset with \"%s\": %s\n",
+ path_list[i], strerror(errno));
close(path_beneath.parent_fd);
goto out_free_name;
}
@@ -171,55 +170,64 @@ int main(const int argc, char *const argv[], char *const *const envp)
int ruleset_fd;
struct landlock_ruleset_attr ruleset_attr = {
.handled_access_fs = ACCESS_FS_ROUGHLY_READ |
- ACCESS_FS_ROUGHLY_WRITE,
+ ACCESS_FS_ROUGHLY_WRITE,
};
if (argc < 2) {
- fprintf(stderr, "usage: %s=\"...\" %s=\"...\" %s <cmd> [args]...\n\n",
- ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
- fprintf(stderr, "Launch a command in a restricted environment.\n\n");
+ fprintf(stderr,
+ "usage: %s=\"...\" %s=\"...\" %s <cmd> [args]...\n\n",
+ ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+ fprintf(stderr,
+ "Launch a command in a restricted environment.\n\n");
fprintf(stderr, "Environment variables containing paths, "
"each separated by a colon:\n");
- fprintf(stderr, "* %s: list of paths allowed to be used in a read-only way.\n",
- ENV_FS_RO_NAME);
- fprintf(stderr, "* %s: list of paths allowed to be used in a read-write way.\n",
- ENV_FS_RW_NAME);
- fprintf(stderr, "\nexample:\n"
- "%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
- "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
- "%s bash -i\n",
- ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
+ fprintf(stderr,
+ "* %s: list of paths allowed to be used in a read-only way.\n",
+ ENV_FS_RO_NAME);
+ fprintf(stderr,
+ "* %s: list of paths allowed to be used in a read-write way.\n",
+ ENV_FS_RW_NAME);
+ fprintf(stderr,
+ "\nexample:\n"
+ "%s=\"/bin:/lib:/usr:/proc:/etc:/dev/urandom\" "
+ "%s=\"/dev/null:/dev/full:/dev/zero:/dev/pts:/tmp\" "
+ "%s bash -i\n",
+ ENV_FS_RO_NAME, ENV_FS_RW_NAME, argv[0]);
return 1;
}
- ruleset_fd = landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
+ ruleset_fd =
+ landlock_create_ruleset(&ruleset_attr, sizeof(ruleset_attr), 0);
if (ruleset_fd < 0) {
const int err = errno;
perror("Failed to create a ruleset");
switch (err) {
case ENOSYS:
- fprintf(stderr, "Hint: Landlock is not supported by the current kernel. "
- "To support it, build the kernel with "
- "CONFIG_SECURITY_LANDLOCK=y and prepend "
- "\"landlock,\" to the content of CONFIG_LSM.\n");
+ fprintf(stderr,
+ "Hint: Landlock is not supported by the current kernel. "
+ "To support it, build the kernel with "
+ "CONFIG_SECURITY_LANDLOCK=y and prepend "
+ "\"landlock,\" to the content of CONFIG_LSM.\n");
break;
case EOPNOTSUPP:
- fprintf(stderr, "Hint: Landlock is currently disabled. "
- "It can be enabled in the kernel configuration by "
- "prepending \"landlock,\" to the content of CONFIG_LSM, "
- "or at boot time by setting the same content to the "
- "\"lsm\" kernel parameter.\n");
+ fprintf(stderr,
+ "Hint: Landlock is currently disabled. "
+ "It can be enabled in the kernel configuration by "
+ "prepending \"landlock,\" to the content of CONFIG_LSM, "
+ "or at boot time by setting the same content to the "
+ "\"lsm\" kernel parameter.\n");
break;
}
return 1;
}
if (populate_ruleset(ENV_FS_RO_NAME, ruleset_fd,
- ACCESS_FS_ROUGHLY_READ)) {
+ ACCESS_FS_ROUGHLY_READ)) {
goto err_close_ruleset;
}
if (populate_ruleset(ENV_FS_RW_NAME, ruleset_fd,
- ACCESS_FS_ROUGHLY_READ | ACCESS_FS_ROUGHLY_WRITE)) {
+ ACCESS_FS_ROUGHLY_READ |
+ ACCESS_FS_ROUGHLY_WRITE)) {
goto err_close_ruleset;
}
if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
@@ -236,7 +244,7 @@ int main(const int argc, char *const argv[], char *const *const envp)
cmd_argv = argv + 1;
execvpe(cmd_path, cmd_argv, envp);
fprintf(stderr, "Failed to execute \"%s\": %s\n", cmd_path,
- strerror(errno));
+ strerror(errno));
fprintf(stderr, "Hint: access to the binary, the interpreter or "
"shared libraries may be denied.\n");
return 1;
The patch below was submitted to be applied to the 5.18-stable tree.
I fail to see how this patch meets the stable kernel rules as found at
Documentation/process/stable-kernel-rules.rst.
I could be totally wrong, and if so, please respond to
<stable(a)vger.kernel.org> and let me know why this patch should be
applied. Otherwise, it is now dropped from my patch queues, never to be
seen again.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 6cc2df8e3a3967e7c13a424f87f6efb1d4a62d80 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Micka=C3=ABl=20Sala=C3=BCn?= <mic(a)digikod.net>
Date: Fri, 6 May 2022 18:05:07 +0200
Subject: [PATCH] landlock: Add clang-format exceptions
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
In preparation to a following commit, add clang-format on and
clang-format off stanzas around constant definitions. This enables to
keep aligned values, which is much more readable than packed
definitions.
Link: https://lore.kernel.org/r/20220506160513.523257-2-mic@digikod.net
Cc: stable(a)vger.kernel.org
Signed-off-by: Mickaël Salaün <mic(a)digikod.net>
diff --git a/include/uapi/linux/landlock.h b/include/uapi/linux/landlock.h
index b3d952067f59..15c31abb0d76 100644
--- a/include/uapi/linux/landlock.h
+++ b/include/uapi/linux/landlock.h
@@ -33,7 +33,9 @@ struct landlock_ruleset_attr {
* - %LANDLOCK_CREATE_RULESET_VERSION: Get the highest supported Landlock ABI
* version.
*/
+/* clang-format off */
#define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)
+/* clang-format on */
/**
* enum landlock_rule_type - Landlock rule type
@@ -120,6 +122,7 @@ struct landlock_path_beneath_attr {
* :manpage:`access(2)`.
* Future Landlock evolutions will enable to restrict them.
*/
+/* clang-format off */
#define LANDLOCK_ACCESS_FS_EXECUTE (1ULL << 0)
#define LANDLOCK_ACCESS_FS_WRITE_FILE (1ULL << 1)
#define LANDLOCK_ACCESS_FS_READ_FILE (1ULL << 2)
@@ -133,5 +136,6 @@ struct landlock_path_beneath_attr {
#define LANDLOCK_ACCESS_FS_MAKE_FIFO (1ULL << 10)
#define LANDLOCK_ACCESS_FS_MAKE_BLOCK (1ULL << 11)
#define LANDLOCK_ACCESS_FS_MAKE_SYM (1ULL << 12)
+/* clang-format on */
#endif /* _UAPI_LINUX_LANDLOCK_H */
diff --git a/security/landlock/fs.c b/security/landlock/fs.c
index 97b8e421f617..4195a6be60b2 100644
--- a/security/landlock/fs.c
+++ b/security/landlock/fs.c
@@ -141,10 +141,12 @@ static struct landlock_object *get_inode_object(struct inode *const inode)
}
/* All access rights that can be tied to files. */
+/* clang-format off */
#define ACCESS_FILE ( \
LANDLOCK_ACCESS_FS_EXECUTE | \
LANDLOCK_ACCESS_FS_WRITE_FILE | \
LANDLOCK_ACCESS_FS_READ_FILE)
+/* clang-format on */
/*
* @path: Should have been checked by get_path_from_fd().
diff --git a/security/landlock/limits.h b/security/landlock/limits.h
index 2a0a1095ee27..a274ae6b5570 100644
--- a/security/landlock/limits.h
+++ b/security/landlock/limits.h
@@ -12,10 +12,14 @@
#include <linux/limits.h>
#include <uapi/linux/landlock.h>
+/* clang-format off */
+
#define LANDLOCK_MAX_NUM_LAYERS 64
#define LANDLOCK_MAX_NUM_RULES U32_MAX
#define LANDLOCK_LAST_ACCESS_FS LANDLOCK_ACCESS_FS_MAKE_SYM
#define LANDLOCK_MASK_ACCESS_FS ((LANDLOCK_LAST_ACCESS_FS << 1) - 1)
+/* clang-format on */
+
#endif /* _SECURITY_LANDLOCK_LIMITS_H */
The patch below does not apply to the 5.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b7a4f9b5d0e4b6dd937678c546c0b322dd1a4054 Mon Sep 17 00:00:00 2001
From: Kishon Vijay Abraham I <kishon(a)ti.com>
Date: Tue, 10 May 2022 14:46:30 +0530
Subject: [PATCH] xhci: Set HCD flag to defer primary roothub registration
Set "HCD_FLAG_DEFER_RH_REGISTER" to hcd->flags in xhci_run() to defer
registering primary roothub in usb_add_hcd() if xhci has two roothubs.
This will make sure both primary roothub and secondary roothub will be
registered along with the second HCD.
This is required for cold plugged USB devices to be detected in certain
PCIe USB cards (like Inateck USB card connected to AM64 EVM or J7200 EVM).
This patch has been added and reverted earier as it triggered a race
in usb device enumeration.
That race is now fixed in 5.16-rc3, and in stable back to 5.4
commit 6cca13de26ee ("usb: hub: Fix locking issues with address0_mutex")
commit 6ae6dc22d2d1 ("usb: hub: Fix usb enumeration issue due to address0
race")
[minor rebase change, and commit message update -Mathias]
CC: stable(a)vger.kernel.org # 5.4+
Suggested-by: Mathias Nyman <mathias.nyman(a)linux.intel.com>
Tested-by: Chris Chiu <chris.chiu(a)canonical.com>
Signed-off-by: Kishon Vijay Abraham I <kishon(a)ti.com>
Link: https://lore.kernel.org/r/20220510091630.16564-3-kishon@ti.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/host/xhci.c b/drivers/usb/host/xhci.c
index 25b87e99b4dd..2be38d9de8df 100644
--- a/drivers/usb/host/xhci.c
+++ b/drivers/usb/host/xhci.c
@@ -696,6 +696,8 @@ int xhci_run(struct usb_hcd *hcd)
xhci_dbg_trace(xhci, trace_xhci_dbg_init,
"Finished xhci_run for USB2 roothub");
+ set_bit(HCD_FLAG_DEFER_RH_REGISTER, &hcd->flags);
+
xhci_create_dbc_dev(xhci);
xhci_debugfs_init(xhci);