The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 44b1eba44dc537edf076f131f1eeee7544d0e04f Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Mon, 21 Jun 2021 21:46:10 +0530
Subject: [PATCH] bus: mhi: core: Fix power down latency
On graceful power-down/disable transition, when an MHI reset is
performed, the MHI device loses its context, including interrupt
configuration. However, the current implementation waits for an
event (IRQ) driven state change to confirm that the reset has completed,
which never happens and causes a reset timeout, leading to unexpectedly
high latency in the mhi_power_down procedure (up to 45 seconds).
Fix that by moving to the recently introduced poll_reg_field method,
waiting for the reset bit to be cleared, in the same way as the
power_on procedure.
Cc: stable(a)vger.kernel.org
Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions")
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Reviewed-by: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
Link: https://lore.kernel.org/r/1620029090-8975-1-git-send-email-loic.poulain@lin…
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210621161616.77524-3-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index e2e59a341fef..704a5e225097 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -465,23 +465,15 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl)
/* Trigger MHI RESET so that the device will not access host memory */
if (!MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state)) {
- u32 in_reset = -1;
- unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);
-
dev_dbg(dev, "Triggering MHI Reset in device\n");
mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
/* Wait for the reset bit to be cleared by the device */
- ret = wait_event_timeout(mhi_cntrl->state_event,
- mhi_read_reg_field(mhi_cntrl,
- mhi_cntrl->regs,
- MHICTRL,
- MHICTRL_RESET_MASK,
- MHICTRL_RESET_SHIFT,
- &in_reset) ||
- !in_reset, timeout);
- if (!ret || in_reset)
- dev_err(dev, "Device failed to exit MHI Reset state\n");
+ ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
+ MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
+ 25000);
+ if (ret)
+ dev_err(dev, "Device failed to clear MHI Reset\n");
/*
* Device will clear BHI_INTVEC as a part of RESET processing,
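If the target stable tree does not yet carry the mhi_poll_reg_field() helper
used above, a backport would either need to pick that helper up first or
open-code the poll using the mhi_read_reg_field() call this patch removes.
A minimal sketch of the open-coded variant, assuming the mhi_read_reg_field()
signature shown in the removed lines; the 25 ms poll interval mirrors the
upstream hunk, and reusing mhi_cntrl->timeout_ms as the overall budget is an
assumption:

    int ret = 0;
    u32 in_reset = 1;
    u32 retries = DIV_ROUND_UP(mhi_cntrl->timeout_ms, 25);

    dev_dbg(dev, "Triggering MHI Reset in device\n");
    mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);

    /* Poll MHICTRL until the device clears the RESET bit */
    while (retries--) {
            ret = mhi_read_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
                                     MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT,
                                     &in_reset);
            if (ret || !in_reset)
                    break;
            msleep(25);
    }
    if (ret || in_reset)
            dev_err(dev, "Device failed to clear MHI Reset\n");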
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 22 Jun 2021 09:15:35 +0200
Subject: [PATCH] fuse: reject internal errno
Don't allow userspace to report errors that could be kernel-internal.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko(a)gmail.com>
Fixes: 334f485df85a ("[PATCH] FUSE - device functions")
Cc: <stable(a)vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6e63bcba2a40..b8d58aa08206 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1867,7 +1867,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
}
err = -EINVAL;
- if (oh.error <= -1000 || oh.error > 0)
+ if (oh.error <= -512 || oh.error > 0)
goto copy_finish;
spin_lock(&fpq->lock);
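For context on the new bound: errno values from 512 upward (ERESTARTSYS,
ERESTARTNOINTR, ENOIOCTLCMD, ...) are reserved for in-kernel use, so a FUSE
server must only ever report 0 or an ordinary negative errno. A minimal
sketch of the same validity test factored into a helper; the helper name is
illustrative and not part of the patch:

    #include <linux/errno.h>

    /* Userspace may report success or a plain negative errno; anything at
     * or below -512 is in the kernel-internal range (ERESTARTSYS and
     * friends) and must be rejected. */
    static bool fuse_valid_errno(int error)
    {
            return error <= 0 && error > -512;
    }

With such a helper the check in fuse_dev_do_write() reads
if (!fuse_valid_errno(oh.error)) goto copy_finish;, which is equivalent to
the hunk above.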
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 22 Jun 2021 09:15:35 +0200
Subject: [PATCH] fuse: reject internal errno
Don't allow userspace to report errors that could be kernel-internal.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko(a)gmail.com>
Fixes: 334f485df85a ("[PATCH] FUSE - device functions")
Cc: <stable(a)vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6e63bcba2a40..b8d58aa08206 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1867,7 +1867,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
}
err = -EINVAL;
- if (oh.error <= -1000 || oh.error > 0)
+ if (oh.error <= -512 || oh.error > 0)
goto copy_finish;
spin_lock(&fpq->lock);
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49221cf86d18bb66fe95d3338cb33bd4b9880ca5 Mon Sep 17 00:00:00 2001
From: Miklos Szeredi <mszeredi(a)redhat.com>
Date: Tue, 22 Jun 2021 09:15:35 +0200
Subject: [PATCH] fuse: reject internal errno
Don't allow userspace to report errors that could be kernel-internal.
Reported-by: Anatoly Trosinenko <anatoly.trosinenko(a)gmail.com>
Fixes: 334f485df85a ("[PATCH] FUSE - device functions")
Cc: <stable(a)vger.kernel.org> # v2.6.14
Signed-off-by: Miklos Szeredi <mszeredi(a)redhat.com>
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 6e63bcba2a40..b8d58aa08206 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1867,7 +1867,7 @@ static ssize_t fuse_dev_do_write(struct fuse_dev *fud,
}
err = -EINVAL;
- if (oh.error <= -1000 || oh.error > 0)
+ if (oh.error <= -512 || oh.error > 0)
goto copy_finish;
spin_lock(&fpq->lock);
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9eea2904292c2d8fa98df141d3bf7c41ec9dc1b5 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Fri, 14 May 2021 17:27:42 +0200
Subject: [PATCH] evm: Execute evm_inode_init_security() only when an HMAC key
is loaded
evm_inode_init_security() requires an HMAC key to calculate the HMAC on
initial xattrs provided by LSMs. However, it checks generically whether a
key has been loaded, including also public keys, which is not correct as
public keys are not suitable to calculate the HMAC.
Originally, support for signature verification was introduced to verify a
possibly immutable initial ram disk, when no new files are created, and to
switch to HMAC for the root filesystem. By that time, an HMAC key should
have been loaded and usable to calculate HMACs for new files.
More recently support for requiring an HMAC key was removed from the
kernel, so that signature verification can be used alone. Since this is a
legitimate use case, evm_inode_init_security() should not return an error
when no HMAC key has been loaded.
This patch fixes this problem by replacing the evm_key_loaded() check with
a check of the EVM_INIT_HMAC flag in evm_initialized.
Fixes: 26ddabfe96b ("evm: enable EVM when X509 certificate is loaded")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: stable(a)vger.kernel.org # 4.5.x
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 0de367aaa2d3..7ac5204c8d1f 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -521,7 +521,7 @@ void evm_inode_post_setattr(struct dentry *dentry, int ia_valid)
}
/*
- * evm_inode_init_security - initializes security.evm
+ * evm_inode_init_security - initializes security.evm HMAC value
*/
int evm_inode_init_security(struct inode *inode,
const struct xattr *lsm_xattr,
@@ -530,7 +530,8 @@ int evm_inode_init_security(struct inode *inode,
struct evm_xattr *xattr_data;
int rc;
- if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
+ if (!(evm_initialized & EVM_INIT_HMAC) ||
+ !evm_protected_xattr(lsm_xattr->name))
return 0;
xattr_data = kzalloc(sizeof(*xattr_data), GFP_NOFS);
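The distinction being fixed: evm_key_loaded() tests the whole key mask, so a
loaded X.509 public key alone satisfies it even though no HMAC can be
computed from it. A sketch of the flag logic this relies on, with names as
they appear in security/integrity/evm/evm.h at the time (shown only to make
the distinction explicit, not as new code for the patch):

    #define EVM_INIT_HMAC   0x0001          /* symmetric HMAC key loaded */
    #define EVM_INIT_X509   0x0002          /* X.509 public key(s) loaded */
    #define EVM_KEY_MASK    (EVM_INIT_HMAC | EVM_INIT_X509)

    static inline bool evm_key_loaded(void)
    {
            /* true for either key type, hence too weak a test for HMAC */
            return (bool)(evm_initialized & EVM_KEY_MASK);
    }

That is why evm_inode_init_security() now tests EVM_INIT_HMAC specifically,
as in the hunk above.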
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 9eea2904292c2d8fa98df141d3bf7c41ec9dc1b5 Mon Sep 17 00:00:00 2001
From: Roberto Sassu <roberto.sassu(a)huawei.com>
Date: Fri, 14 May 2021 17:27:42 +0200
Subject: [PATCH] evm: Execute evm_inode_init_security() only when an HMAC key
is loaded
evm_inode_init_security() requires an HMAC key to calculate the HMAC on
initial xattrs provided by LSMs. However, it checks generically whether a
key has been loaded, including also public keys, which is not correct as
public keys are not suitable to calculate the HMAC.
Originally, support for signature verification was introduced to verify a
possibly immutable initial ram disk, when no new files are created, and to
switch to HMAC for the root filesystem. By that time, an HMAC key should
have been loaded and usable to calculate HMACs for new files.
More recently support for requiring an HMAC key was removed from the
kernel, so that signature verification can be used alone. Since this is a
legitimate use case, evm_inode_init_security() should not return an error
when no HMAC key has been loaded.
This patch fixes this problem by replacing the evm_key_loaded() check with
a check of the EVM_INIT_HMAC flag in evm_initialized.
Fixes: 26ddabfe96b ("evm: enable EVM when X509 certificate is loaded")
Signed-off-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Cc: stable(a)vger.kernel.org # 4.5.x
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
diff --git a/security/integrity/evm/evm_main.c b/security/integrity/evm/evm_main.c
index 0de367aaa2d3..7ac5204c8d1f 100644
--- a/security/integrity/evm/evm_main.c
+++ b/security/integrity/evm/evm_main.c
@@ -521,7 +521,7 @@ void evm_inode_post_setattr(struct dentry *dentry, int ia_valid)
}
/*
- * evm_inode_init_security - initializes security.evm
+ * evm_inode_init_security - initializes security.evm HMAC value
*/
int evm_inode_init_security(struct inode *inode,
const struct xattr *lsm_xattr,
@@ -530,7 +530,8 @@ int evm_inode_init_security(struct inode *inode,
struct evm_xattr *xattr_data;
int rc;
- if (!evm_key_loaded() || !evm_protected_xattr(lsm_xattr->name))
+ if (!(evm_initialized & EVM_INIT_HMAC) ||
+ !evm_protected_xattr(lsm_xattr->name))
return 0;
xattr_data = kzalloc(sizeof(*xattr_data), GFP_NOFS);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 314538041b5632ffaf64798faaeabaf2793fe029 Mon Sep 17 00:00:00 2001
From: Martin Fuzzey <martin.fuzzey(a)flowbird.group>
Date: Tue, 1 Jun 2021 18:19:53 +0200
Subject: [PATCH] rsi: fix AP mode with WPA failure due to encrypted EAPOL
In AP mode WPA2-PSK connections were not established.
The reason was that the AP was sending the first message
of the 4 way handshake encrypted, even though no pairwise
key had (correctly) yet been set.
Encryption was enabled if the "security_enable" driver flag
was set and encryption was not explicitly disabled by
IEEE80211_TX_INTFL_DONT_ENCRYPT.
However, security_enable was set when *any* key, including
the AP GTK key, had been set, which caused unwanted
encryption even if no key was available for the unicast
packet to be sent.
Fix this by adding a check that we have a key and drop
the old security_enable driver flag which is insufficient
and redundant.
The Redpine downstream out of tree driver does it this way too.
Regarding the Fixes tag the actual code being modified was
introduced earlier, with the original driver submission, in
dad0d04fa7ba ("rsi: Add RS9113 wireless driver"), however
at that time AP mode was not yet supported so there was
no bug at that point.
So I have tagged the introduction of AP support instead
which was part of the patch set "rsi: support for AP mode" [1]
It is not clear whether AP WPA has ever worked; I can see nothing
on the kernel side that broke it afterwards, yet the AP support
patch series says "Tests are performed to confirm aggregation,
connections in WEP and WPA/WPA2 security."
One possibility is that the initial tests were done with a modified
userspace (hostapd).
[1] https://www.spinics.net/lists/linux-wireless/msg165302.html
Signed-off-by: Martin Fuzzey <martin.fuzzey(a)flowbird.group>
Fixes: 38ef62353acb ("rsi: security enhancements for AP mode")
CC: stable(a)vger.kernel.org
Signed-off-by: Kalle Valo <kvalo(a)codeaurora.org>
Link: https://lore.kernel.org/r/1622564459-24430-1-git-send-email-martin.fuzzey@f…
diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c b/drivers/net/wireless/rsi/rsi_91x_hal.c
index ab837921d9a4..99b21a2c8386 100644
--- a/drivers/net/wireless/rsi/rsi_91x_hal.c
+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c
@@ -203,7 +203,7 @@ int rsi_prepare_data_desc(struct rsi_common *common, struct sk_buff *skb)
wh->frame_control |= cpu_to_le16(RSI_SET_PS_ENABLE);
if ((!(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT)) &&
- (common->secinfo.security_enable)) {
+ info->control.hw_key) {
if (rsi_is_cipher_wep(common))
ieee80211_size += 4;
else
diff --git a/drivers/net/wireless/rsi/rsi_91x_mac80211.c b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
index d9f1e73293aa..b66975f54567 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mac80211.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mac80211.c
@@ -1045,7 +1045,6 @@ static int rsi_mac80211_set_key(struct ieee80211_hw *hw,
mutex_lock(&common->mutex);
switch (cmd) {
case SET_KEY:
- secinfo->security_enable = true;
status = rsi_hal_key_config(hw, vif, key, sta);
if (status) {
mutex_unlock(&common->mutex);
@@ -1064,8 +1063,6 @@ static int rsi_mac80211_set_key(struct ieee80211_hw *hw,
break;
case DISABLE_KEY:
- if (vif->type == NL80211_IFTYPE_STATION)
- secinfo->security_enable = false;
rsi_dbg(ERR_ZONE, "%s: RSI del key\n", __func__);
memset(key, 0, sizeof(struct ieee80211_key_conf));
status = rsi_hal_key_config(hw, vif, key, sta);
diff --git a/drivers/net/wireless/rsi/rsi_91x_mgmt.c b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
index dffe1d6cc592..891fd5f0fa76 100644
--- a/drivers/net/wireless/rsi/rsi_91x_mgmt.c
+++ b/drivers/net/wireless/rsi/rsi_91x_mgmt.c
@@ -1803,8 +1803,7 @@ int rsi_send_wowlan_request(struct rsi_common *common, u16 flags,
RSI_WIFI_MGMT_Q);
cmd_frame->desc.desc_dword0.frame_type = WOWLAN_CONFIG_PARAMS;
cmd_frame->host_sleep_status = sleep_status;
- if (common->secinfo.security_enable &&
- common->secinfo.gtk_cipher)
+ if (common->secinfo.gtk_cipher)
flags |= RSI_WOW_GTK_REKEY;
if (sleep_status)
cmd_frame->wow_flags = flags;
diff --git a/drivers/net/wireless/rsi/rsi_main.h b/drivers/net/wireless/rsi/rsi_main.h
index a1065e5a92b4..0f535850a383 100644
--- a/drivers/net/wireless/rsi/rsi_main.h
+++ b/drivers/net/wireless/rsi/rsi_main.h
@@ -151,7 +151,6 @@ enum edca_queue {
};
struct security_info {
- bool security_enable;
u32 ptk_cipher;
u32 gtk_cipher;
};
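The reliable per-packet signal here is mac80211's
ieee80211_tx_info::control.hw_key, which is non-NULL only when a key has
actually been installed for the frame being transmitted, unlike the removed
device-global security_enable flag. A condensed view of the resulting
decision (it mirrors the first hunk above rather than adding new driver
logic):

    struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb);
    bool encrypt;

    /* Encrypt only when mac80211 has installed a key for this frame
     * (hw_key is NULL for the first EAPOL message, since no pairwise
     * key exists yet) and cleartext was not explicitly requested. */
    encrypt = info->control.hw_key &&
              !(info->flags & IEEE80211_TX_INTFL_DONT_ENCRYPT);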
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e7d9560aeae51415f6c9bc343feb783a441ff4c5 Mon Sep 17 00:00:00 2001
From: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Date: Wed, 16 Jun 2021 12:21:30 -0400
Subject: [PATCH] Revert "drm/amd/display: Fix overlay validation by
considering cursors"
This reverts commit 33f409e60eb0c59a4d0d06a62ab4642a988e17f7.
The patch that we are reverting here was originally applied because it
fixes multiple IGT issues and flickering in Android. However, after a
discussion with Sean Paul and Mark, it looks like that this patch might
cause problems on ChromeOS. For this reason, we decided to revert this
patch.
Cc: Nicholas Kazlauskas <Nicholas.Kazlauskas(a)amd.com>
Cc: Harry Wentland <Harry.Wentland(a)amd.com>
Cc: Hersen Wu <hersenxs.wu(a)amd.com>
Cc: Sean Paul <seanpaul(a)chromium.org>
Cc: Mark Yacoub <markyacoub(a)chromium.org>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Reviewed-by: Sean Paul <seanpaul(a)chromium.org>
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Cc: stable(a)vger.kernel.org
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index 2688a2e759de..cfb2f9e43661 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -10120,8 +10120,8 @@ static int validate_overlay(struct drm_atomic_state *state)
{
int i;
struct drm_plane *plane;
- struct drm_plane_state *new_plane_state;
- struct drm_plane_state *primary_state, *cursor_state, *overlay_state = NULL;
+ struct drm_plane_state *old_plane_state, *new_plane_state;
+ struct drm_plane_state *primary_state, *overlay_state = NULL;
/* Check if primary plane is contained inside overlay */
for_each_new_plane_in_state_reverse(state, plane, new_plane_state, i) {
@@ -10151,14 +10151,6 @@ static int validate_overlay(struct drm_atomic_state *state)
if (!primary_state->crtc)
return 0;
- /* check if cursor plane is enabled */
- cursor_state = drm_atomic_get_plane_state(state, overlay_state->crtc->cursor);
- if (IS_ERR(cursor_state))
- return PTR_ERR(cursor_state);
-
- if (drm_atomic_plane_disabling(plane->state, cursor_state))
- return 0;
-
/* Perform the bounds check to ensure the overlay plane covers the primary */
if (primary_state->crtc_x < overlay_state->crtc_x ||
primary_state->crtc_y < overlay_state->crtc_y ||
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8090d67421ddab0ae932abab5a60200598bf0bbb Mon Sep 17 00:00:00 2001
From: Stephan Gerhold <stephan(a)gerhold.net>
Date: Wed, 26 May 2021 11:44:07 +0200
Subject: [PATCH] iio: accel: bma180: Fix BMA25x bandwidth register values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the BMA253 datasheet [1] and BMA250 datasheet [2] the
bandwidth value for BMA25x should be set as 01xxx:
"Settings 00xxx result in a bandwidth of 7.81 Hz; [...]
It is recommended [...] to use the range from ´01000b´ to ´01111b´
only in order to be compatible with future products."
However, at the moment the driver sets bandwidth values from 0 to 6,
which is not recommended and always results in 7.81 Hz bandwidth
according to the datasheet.
Fix this by introducing a bw_offset = 8 = 01000b for BMA25x,
so the additional bit is always set for BMA25x.
[1]: https://www.bosch-sensortec.com/media/boschsensortec/downloads/datasheets/b…
[2]: https://datasheet.octopart.com/BMA250-Bosch-datasheet-15540103.pdf
Cc: Peter Meerwald <pmeerw(a)pmeerw.net>
Fixes: 2017cff24cc0 ("iio:bma180: Add BMA250 chip support")
Signed-off-by: Stephan Gerhold <stephan(a)gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/20210526094408.34298-2-stephan@gerhold.net
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c
index 4a07e60c0e21..e7c6b3096cb7 100644
--- a/drivers/iio/accel/bma180.c
+++ b/drivers/iio/accel/bma180.c
@@ -55,7 +55,7 @@ struct bma180_part_info {
u8 int_reset_reg, int_reset_mask;
u8 sleep_reg, sleep_mask;
- u8 bw_reg, bw_mask;
+ u8 bw_reg, bw_mask, bw_offset;
u8 scale_reg, scale_mask;
u8 power_reg, power_mask, lowpower_val;
u8 int_enable_reg, int_enable_mask;
@@ -127,6 +127,7 @@ struct bma180_part_info {
#define BMA250_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA250_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA250_BW_OFFSET 8
#define BMA250_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA250_LOWPOWER_MASK BIT(6)
#define BMA250_DATA_INTEN_MASK BIT(4)
@@ -143,6 +144,7 @@ struct bma180_part_info {
#define BMA254_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA254_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA254_BW_OFFSET 8
#define BMA254_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA254_LOWPOWER_MASK BIT(6)
#define BMA254_DATA_INTEN_MASK BIT(4)
@@ -287,7 +289,8 @@ static int bma180_set_bw(struct bma180_data *data, int val)
for (i = 0; i < data->part_info->num_bw; ++i) {
if (data->part_info->bw_table[i] == val) {
ret = bma180_set_bits(data, data->part_info->bw_reg,
- data->part_info->bw_mask, i);
+ data->part_info->bw_mask,
+ i + data->part_info->bw_offset);
if (ret) {
dev_err(&data->client->dev,
"failed to set bandwidth\n");
@@ -880,6 +883,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA250_SUSPEND_MASK,
.bw_reg = BMA250_BW_REG,
.bw_mask = BMA250_BW_MASK,
+ .bw_offset = BMA250_BW_OFFSET,
.scale_reg = BMA250_RANGE_REG,
.scale_mask = BMA250_RANGE_MASK,
.power_reg = BMA250_POWER_REG,
@@ -909,6 +913,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA254_SUSPEND_MASK,
.bw_reg = BMA254_BW_REG,
.bw_mask = BMA254_BW_MASK,
+ .bw_offset = BMA254_BW_OFFSET,
.scale_reg = BMA254_RANGE_REG,
.scale_mask = BMA254_RANGE_MASK,
.power_reg = BMA254_POWER_REG,
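The arithmetic is small: with bw_offset = 8 (01000b), table index i in 0..6
is written to the register as i + 8, i.e. 01000b through 01110b, inside the
range the datasheet recommends, while BMA180 keeps bw_offset = 0 and is
unaffected. A standalone check of that mapping (plain userspace C, purely
illustrative):

    #include <stdio.h>

    int main(void)
    {
            const unsigned int bw_offset = 8;       /* 01000b for BMA25x */
            unsigned int i;

            for (i = 0; i <= 6; i++)
                    printf("bw index %u -> register value 0x%02x\n",
                           i, i + bw_offset);
            return 0;       /* prints 0x08..0x0e, all of the form 01xxx */
    }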
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8090d67421ddab0ae932abab5a60200598bf0bbb Mon Sep 17 00:00:00 2001
From: Stephan Gerhold <stephan(a)gerhold.net>
Date: Wed, 26 May 2021 11:44:07 +0200
Subject: [PATCH] iio: accel: bma180: Fix BMA25x bandwidth register values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the BMA253 datasheet [1] and BMA250 datasheet [2] the
bandwidth value for BMA25x should be set as 01xxx:
"Settings 00xxx result in a bandwidth of 7.81 Hz; [...]
It is recommended [...] to use the range from ´01000b´ to ´01111b´
only in order to be compatible with future products."
However, at the moment the driver sets bandwidth values from 0 to 6,
which is not recommended and always results in 7.81 Hz bandwidth
according to the datasheet.
Fix this by introducing a bw_offset = 8 = 01000b for BMA25x,
so the additional bit is always set for BMA25x.
[1]: https://www.bosch-sensortec.com/media/boschsensortec/downloads/datasheets/b…
[2]: https://datasheet.octopart.com/BMA250-Bosch-datasheet-15540103.pdf
Cc: Peter Meerwald <pmeerw(a)pmeerw.net>
Fixes: 2017cff24cc0 ("iio:bma180: Add BMA250 chip support")
Signed-off-by: Stephan Gerhold <stephan(a)gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/20210526094408.34298-2-stephan@gerhold.net
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c
index 4a07e60c0e21..e7c6b3096cb7 100644
--- a/drivers/iio/accel/bma180.c
+++ b/drivers/iio/accel/bma180.c
@@ -55,7 +55,7 @@ struct bma180_part_info {
u8 int_reset_reg, int_reset_mask;
u8 sleep_reg, sleep_mask;
- u8 bw_reg, bw_mask;
+ u8 bw_reg, bw_mask, bw_offset;
u8 scale_reg, scale_mask;
u8 power_reg, power_mask, lowpower_val;
u8 int_enable_reg, int_enable_mask;
@@ -127,6 +127,7 @@ struct bma180_part_info {
#define BMA250_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA250_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA250_BW_OFFSET 8
#define BMA250_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA250_LOWPOWER_MASK BIT(6)
#define BMA250_DATA_INTEN_MASK BIT(4)
@@ -143,6 +144,7 @@ struct bma180_part_info {
#define BMA254_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA254_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA254_BW_OFFSET 8
#define BMA254_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA254_LOWPOWER_MASK BIT(6)
#define BMA254_DATA_INTEN_MASK BIT(4)
@@ -287,7 +289,8 @@ static int bma180_set_bw(struct bma180_data *data, int val)
for (i = 0; i < data->part_info->num_bw; ++i) {
if (data->part_info->bw_table[i] == val) {
ret = bma180_set_bits(data, data->part_info->bw_reg,
- data->part_info->bw_mask, i);
+ data->part_info->bw_mask,
+ i + data->part_info->bw_offset);
if (ret) {
dev_err(&data->client->dev,
"failed to set bandwidth\n");
@@ -880,6 +883,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA250_SUSPEND_MASK,
.bw_reg = BMA250_BW_REG,
.bw_mask = BMA250_BW_MASK,
+ .bw_offset = BMA250_BW_OFFSET,
.scale_reg = BMA250_RANGE_REG,
.scale_mask = BMA250_RANGE_MASK,
.power_reg = BMA250_POWER_REG,
@@ -909,6 +913,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA254_SUSPEND_MASK,
.bw_reg = BMA254_BW_REG,
.bw_mask = BMA254_BW_MASK,
+ .bw_offset = BMA254_BW_OFFSET,
.scale_reg = BMA254_RANGE_REG,
.scale_mask = BMA254_RANGE_MASK,
.power_reg = BMA254_POWER_REG,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8090d67421ddab0ae932abab5a60200598bf0bbb Mon Sep 17 00:00:00 2001
From: Stephan Gerhold <stephan(a)gerhold.net>
Date: Wed, 26 May 2021 11:44:07 +0200
Subject: [PATCH] iio: accel: bma180: Fix BMA25x bandwidth register values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the BMA253 datasheet [1] and BMA250 datasheet [2] the
bandwidth value for BMA25x should be set as 01xxx:
"Settings 00xxx result in a bandwidth of 7.81 Hz; [...]
It is recommended [...] to use the range from ´01000b´ to ´01111b´
only in order to be compatible with future products."
However, at the moment the driver sets bandwidth values from 0 to 6,
which is not recommended and always results in 7.81 Hz bandwidth
according to the datasheet.
Fix this by introducing a bw_offset = 8 = 01000b for BMA25x,
so the additional bit is always set for BMA25x.
[1]: https://www.bosch-sensortec.com/media/boschsensortec/downloads/datasheets/b…
[2]: https://datasheet.octopart.com/BMA250-Bosch-datasheet-15540103.pdf
Cc: Peter Meerwald <pmeerw(a)pmeerw.net>
Fixes: 2017cff24cc0 ("iio:bma180: Add BMA250 chip support")
Signed-off-by: Stephan Gerhold <stephan(a)gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/20210526094408.34298-2-stephan@gerhold.net
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c
index 4a07e60c0e21..e7c6b3096cb7 100644
--- a/drivers/iio/accel/bma180.c
+++ b/drivers/iio/accel/bma180.c
@@ -55,7 +55,7 @@ struct bma180_part_info {
u8 int_reset_reg, int_reset_mask;
u8 sleep_reg, sleep_mask;
- u8 bw_reg, bw_mask;
+ u8 bw_reg, bw_mask, bw_offset;
u8 scale_reg, scale_mask;
u8 power_reg, power_mask, lowpower_val;
u8 int_enable_reg, int_enable_mask;
@@ -127,6 +127,7 @@ struct bma180_part_info {
#define BMA250_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA250_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA250_BW_OFFSET 8
#define BMA250_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA250_LOWPOWER_MASK BIT(6)
#define BMA250_DATA_INTEN_MASK BIT(4)
@@ -143,6 +144,7 @@ struct bma180_part_info {
#define BMA254_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA254_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA254_BW_OFFSET 8
#define BMA254_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA254_LOWPOWER_MASK BIT(6)
#define BMA254_DATA_INTEN_MASK BIT(4)
@@ -287,7 +289,8 @@ static int bma180_set_bw(struct bma180_data *data, int val)
for (i = 0; i < data->part_info->num_bw; ++i) {
if (data->part_info->bw_table[i] == val) {
ret = bma180_set_bits(data, data->part_info->bw_reg,
- data->part_info->bw_mask, i);
+ data->part_info->bw_mask,
+ i + data->part_info->bw_offset);
if (ret) {
dev_err(&data->client->dev,
"failed to set bandwidth\n");
@@ -880,6 +883,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA250_SUSPEND_MASK,
.bw_reg = BMA250_BW_REG,
.bw_mask = BMA250_BW_MASK,
+ .bw_offset = BMA250_BW_OFFSET,
.scale_reg = BMA250_RANGE_REG,
.scale_mask = BMA250_RANGE_MASK,
.power_reg = BMA250_POWER_REG,
@@ -909,6 +913,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA254_SUSPEND_MASK,
.bw_reg = BMA254_BW_REG,
.bw_mask = BMA254_BW_MASK,
+ .bw_offset = BMA254_BW_OFFSET,
.scale_reg = BMA254_RANGE_REG,
.scale_mask = BMA254_RANGE_MASK,
.power_reg = BMA254_POWER_REG,
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 8090d67421ddab0ae932abab5a60200598bf0bbb Mon Sep 17 00:00:00 2001
From: Stephan Gerhold <stephan(a)gerhold.net>
Date: Wed, 26 May 2021 11:44:07 +0200
Subject: [PATCH] iio: accel: bma180: Fix BMA25x bandwidth register values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
According to the BMA253 datasheet [1] and BMA250 datasheet [2] the
bandwidth value for BMA25x should be set as 01xxx:
"Settings 00xxx result in a bandwidth of 7.81 Hz; [...]
It is recommended [...] to use the range from ´01000b´ to ´01111b´
only in order to be compatible with future products."
However, at the moment the driver sets bandwidth values from 0 to 6,
which is not recommended and always results in 7.81 Hz bandwidth
according to the datasheet.
Fix this by introducing a bw_offset = 8 = 01000b for BMA25x,
so the additional bit is always set for BMA25x.
[1]: https://www.bosch-sensortec.com/media/boschsensortec/downloads/datasheets/b…
[2]: https://datasheet.octopart.com/BMA250-Bosch-datasheet-15540103.pdf
Cc: Peter Meerwald <pmeerw(a)pmeerw.net>
Fixes: 2017cff24cc0 ("iio:bma180: Add BMA250 chip support")
Signed-off-by: Stephan Gerhold <stephan(a)gerhold.net>
Reviewed-by: Linus Walleij <linus.walleij(a)linaro.org>
Link: https://lore.kernel.org/r/20210526094408.34298-2-stephan@gerhold.net
Cc: <Stable(a)vger.kernel.org>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
diff --git a/drivers/iio/accel/bma180.c b/drivers/iio/accel/bma180.c
index 4a07e60c0e21..e7c6b3096cb7 100644
--- a/drivers/iio/accel/bma180.c
+++ b/drivers/iio/accel/bma180.c
@@ -55,7 +55,7 @@ struct bma180_part_info {
u8 int_reset_reg, int_reset_mask;
u8 sleep_reg, sleep_mask;
- u8 bw_reg, bw_mask;
+ u8 bw_reg, bw_mask, bw_offset;
u8 scale_reg, scale_mask;
u8 power_reg, power_mask, lowpower_val;
u8 int_enable_reg, int_enable_mask;
@@ -127,6 +127,7 @@ struct bma180_part_info {
#define BMA250_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA250_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA250_BW_OFFSET 8
#define BMA250_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA250_LOWPOWER_MASK BIT(6)
#define BMA250_DATA_INTEN_MASK BIT(4)
@@ -143,6 +144,7 @@ struct bma180_part_info {
#define BMA254_RANGE_MASK GENMASK(3, 0) /* Range of accel values */
#define BMA254_BW_MASK GENMASK(4, 0) /* Accel bandwidth */
+#define BMA254_BW_OFFSET 8
#define BMA254_SUSPEND_MASK BIT(7) /* chip will sleep */
#define BMA254_LOWPOWER_MASK BIT(6)
#define BMA254_DATA_INTEN_MASK BIT(4)
@@ -287,7 +289,8 @@ static int bma180_set_bw(struct bma180_data *data, int val)
for (i = 0; i < data->part_info->num_bw; ++i) {
if (data->part_info->bw_table[i] == val) {
ret = bma180_set_bits(data, data->part_info->bw_reg,
- data->part_info->bw_mask, i);
+ data->part_info->bw_mask,
+ i + data->part_info->bw_offset);
if (ret) {
dev_err(&data->client->dev,
"failed to set bandwidth\n");
@@ -880,6 +883,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA250_SUSPEND_MASK,
.bw_reg = BMA250_BW_REG,
.bw_mask = BMA250_BW_MASK,
+ .bw_offset = BMA250_BW_OFFSET,
.scale_reg = BMA250_RANGE_REG,
.scale_mask = BMA250_RANGE_MASK,
.power_reg = BMA250_POWER_REG,
@@ -909,6 +913,7 @@ static const struct bma180_part_info bma180_part_info[] = {
.sleep_mask = BMA254_SUSPEND_MASK,
.bw_reg = BMA254_BW_REG,
.bw_mask = BMA254_BW_MASK,
+ .bw_offset = BMA254_BW_OFFSET,
.scale_reg = BMA254_RANGE_REG,
.scale_mask = BMA254_RANGE_MASK,
.power_reg = BMA254_POWER_REG,
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 39307f8ee3539478c28e71b4909b5b028cce14b1 Mon Sep 17 00:00:00 2001
From: Daniel Rosenberg <drosen(a)google.com>
Date: Thu, 3 Jun 2021 09:50:37 +0000
Subject: [PATCH] f2fs: Show casefolding support only when supported
The casefolding feature is only supported when CONFIG_UNICODE is set.
This modifies the feature list f2fs presents under sysfs accordingly.
Fixes: 5aba54302a46 ("f2fs: include charset encoding information in the superblock")
Cc: stable(a)vger.kernel.org # v5.4+
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index c579d5d3a916..62fbe4f20dd6 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -725,7 +725,9 @@ F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
+#ifdef CONFIG_UNICODE
F2FS_FEATURE_RO_ATTR(casefold, FEAT_CASEFOLD);
+#endif
F2FS_FEATURE_RO_ATTR(readonly, FEAT_RO);
#ifdef CONFIG_F2FS_FS_COMPRESSION
F2FS_FEATURE_RO_ATTR(compression, FEAT_COMPRESSION);
@@ -836,7 +838,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(verity),
#endif
ATTR_LIST(sb_checksum),
+#ifdef CONFIG_UNICODE
ATTR_LIST(casefold),
+#endif
ATTR_LIST(readonly),
#ifdef CONFIG_F2FS_FS_COMPRESSION
ATTR_LIST(compression),
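Both hunks are needed because F2FS_FEATURE_RO_ATTR() is what defines the
attribute object that ATTR_LIST(casefold) later references, so guarding only
one side would leave either an unused attribute or an unresolved reference
when CONFIG_UNICODE is off. Reduced to its shape (illustrative, not the full
sysfs table):

    #ifdef CONFIG_UNICODE
    F2FS_FEATURE_RO_ATTR(casefold, FEAT_CASEFOLD);  /* defines the attribute */
    #endif

    static struct attribute *f2fs_feat_attrs[] = {
    #ifdef CONFIG_UNICODE
            ATTR_LIST(casefold),                    /* references it */
    #endif
            NULL,
    };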
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 39307f8ee3539478c28e71b4909b5b028cce14b1 Mon Sep 17 00:00:00 2001
From: Daniel Rosenberg <drosen(a)google.com>
Date: Thu, 3 Jun 2021 09:50:37 +0000
Subject: [PATCH] f2fs: Show casefolding support only when supported
The casefolding feature is only supported when CONFIG_UNICODE is set.
This modifies the feature list f2fs presents under sysfs accordingly.
Fixes: 5aba54302a46 ("f2fs: include charset encoding information in the superblock")
Cc: stable(a)vger.kernel.org # v5.4+
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index c579d5d3a916..62fbe4f20dd6 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -725,7 +725,9 @@ F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
+#ifdef CONFIG_UNICODE
F2FS_FEATURE_RO_ATTR(casefold, FEAT_CASEFOLD);
+#endif
F2FS_FEATURE_RO_ATTR(readonly, FEAT_RO);
#ifdef CONFIG_F2FS_FS_COMPRESSION
F2FS_FEATURE_RO_ATTR(compression, FEAT_COMPRESSION);
@@ -836,7 +838,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(verity),
#endif
ATTR_LIST(sb_checksum),
+#ifdef CONFIG_UNICODE
ATTR_LIST(casefold),
+#endif
ATTR_LIST(readonly),
#ifdef CONFIG_F2FS_FS_COMPRESSION
ATTR_LIST(compression),
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 39307f8ee3539478c28e71b4909b5b028cce14b1 Mon Sep 17 00:00:00 2001
From: Daniel Rosenberg <drosen(a)google.com>
Date: Thu, 3 Jun 2021 09:50:37 +0000
Subject: [PATCH] f2fs: Show casefolding support only when supported
The casefolding feature is only supported when CONFIG_UNICODE is set.
This modifies the feature list f2fs presents under sysfs accordingly.
Fixes: 5aba54302a46 ("f2fs: include charset encoding information in the superblock")
Cc: stable(a)vger.kernel.org # v5.4+
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index c579d5d3a916..62fbe4f20dd6 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -725,7 +725,9 @@ F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
+#ifdef CONFIG_UNICODE
F2FS_FEATURE_RO_ATTR(casefold, FEAT_CASEFOLD);
+#endif
F2FS_FEATURE_RO_ATTR(readonly, FEAT_RO);
#ifdef CONFIG_F2FS_FS_COMPRESSION
F2FS_FEATURE_RO_ATTR(compression, FEAT_COMPRESSION);
@@ -836,7 +838,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(verity),
#endif
ATTR_LIST(sb_checksum),
+#ifdef CONFIG_UNICODE
ATTR_LIST(casefold),
+#endif
ATTR_LIST(readonly),
#ifdef CONFIG_F2FS_FS_COMPRESSION
ATTR_LIST(compression),
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 39307f8ee3539478c28e71b4909b5b028cce14b1 Mon Sep 17 00:00:00 2001
From: Daniel Rosenberg <drosen(a)google.com>
Date: Thu, 3 Jun 2021 09:50:37 +0000
Subject: [PATCH] f2fs: Show casefolding support only when supported
The casefolding feature is only supported when CONFIG_UNICODE is set.
This modifies the feature list f2fs presents under sysfs accordingly.
Fixes: 5aba54302a46 ("f2fs: include charset encoding information in the superblock")
Cc: stable(a)vger.kernel.org # v5.4+
Signed-off-by: Daniel Rosenberg <drosen(a)google.com>
Reviewed-by: Eric Biggers <ebiggers(a)google.com>
Reviewed-by: Chao Yu <yuchao0(a)huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk(a)kernel.org>
diff --git a/fs/f2fs/sysfs.c b/fs/f2fs/sysfs.c
index c579d5d3a916..62fbe4f20dd6 100644
--- a/fs/f2fs/sysfs.c
+++ b/fs/f2fs/sysfs.c
@@ -725,7 +725,9 @@ F2FS_FEATURE_RO_ATTR(lost_found, FEAT_LOST_FOUND);
F2FS_FEATURE_RO_ATTR(verity, FEAT_VERITY);
#endif
F2FS_FEATURE_RO_ATTR(sb_checksum, FEAT_SB_CHECKSUM);
+#ifdef CONFIG_UNICODE
F2FS_FEATURE_RO_ATTR(casefold, FEAT_CASEFOLD);
+#endif
F2FS_FEATURE_RO_ATTR(readonly, FEAT_RO);
#ifdef CONFIG_F2FS_FS_COMPRESSION
F2FS_FEATURE_RO_ATTR(compression, FEAT_COMPRESSION);
@@ -836,7 +838,9 @@ static struct attribute *f2fs_feat_attrs[] = {
ATTR_LIST(verity),
#endif
ATTR_LIST(sb_checksum),
+#ifdef CONFIG_UNICODE
ATTR_LIST(casefold),
+#endif
ATTR_LIST(readonly),
#ifdef CONFIG_F2FS_FS_COMPRESSION
ATTR_LIST(compression),
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49c6f8756cdffeb9af1fbcb86bacacced26465d7 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:51 -0700
Subject: [PATCH] KVM: x86: Force all MMUs to reinitialize if guest CPUID is
modified
Invalidate all MMUs' roles after a CPUID update to force reinitialization
of the MMU context/helpers. Despite the efforts of commit de3ccd26fafc
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role. E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.
The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.
Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...
Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint. The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.
In other words, resetting the MMU after a CPUID update is largely a
superficial fix. But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported. For cases where KVM botches guest behavior, the damage is
limited to that guest. But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.
Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a474cd13b0c8..f1e4d5f2bf8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1496,6 +1496,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu);
void kvm_mmu_init_vm(struct kvm *kvm);
void kvm_mmu_uninit_vm(struct kvm *kvm);
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b4da665bb892..c42613cfb5ba 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -202,10 +202,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
/*
- * Except for the MMU, which needs to be reset after any vendor
- * specific adjustments to the reserved GPA bits.
+ * Except for the MMU, which needs to do its thing any vendor specific
+ * adjustments to the reserved GPA bits.
*/
- kvm_mmu_reset_context(vcpu);
+ kvm_mmu_after_set_cpuid(vcpu);
}
static int is_efer_nx(void)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa35762f325c..1ab3fdb1f2e4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4903,6 +4903,18 @@ kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
return role.base;
}
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Invalidate all MMU roles to force them to reinitialize as CPUID
+ * information is factored into reserved bit calculations.
+ */
+ vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+ kvm_mmu_reset_context(vcpu);
+}
+
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
kvm_mmu_unload(vcpu);
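The reason zeroing ext.valid is enough: the init_kvm_*_mmu() paths reached
from kvm_mmu_reset_context() recompute the role (which always has ext.valid
set) and skip reinitialization when it matches the cached role, so clearing
the cached valid bit guarantees a mismatch and a full rebuild with the new
CPUID state. A condensed sketch of that comparison as it appears in mmu.c of
this era (paraphrased, not a literal quote):

    union kvm_mmu_role new_role =
            kvm_calc_tdp_mmu_root_page_role(vcpu, false);

    /* new_role.ext.valid is always 1 when computed, so zeroing the cached
     * copy in kvm_mmu_after_set_cpuid() forces this test to miss. */
    if (new_role.as_u64 == context->mmu_role.as_u64)
            return;                 /* roles match: reuse current context */

    context->mmu_role.as_u64 = new_role.as_u64;
    /* reserved-bit masks, page-fault handlers, etc. are recomputed below */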
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 49c6f8756cdffeb9af1fbcb86bacacced26465d7 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:51 -0700
Subject: [PATCH] KVM: x86: Force all MMUs to reinitialize if guest CPUID is
modified
Invalidate all MMUs' roles after a CPUID update to force reinitialization
of the MMU context/helpers. Despite the efforts of commit de3ccd26fafc
("KVM: MMU: record maximum physical address width in kvm_mmu_extended_role"),
there are still a handful of CPUID-based properties that affect MMU
behavior but are not incorporated into mmu_role. E.g. 1gb hugepage
support, AMD vs. Intel handling of bit 8, and SEV's C-Bit location all
factor into the guest's reserved PTE bits.
The obvious alternative would be to add all such properties to mmu_role,
but doing so provides no benefit over simply forcing a reinitialization
on every CPUID update, as setting guest CPUID is a rare operation.
Note, reinitializing all MMUs after a CPUID update does not fix all of
KVM's woes. Specifically, kvm_mmu_page_role doesn't track the CPUID
properties, which means that a vCPU can reuse shadow pages that should
not exist for the new vCPU model, e.g. that map GPAs that are now illegal
(due to MAXPHYADDR changes) or that set bits that are now reserved
(PAGE_SIZE for 1gb pages), etc...
Tracking the relevant CPUID properties in kvm_mmu_page_role would address
the majority of problems, but fully tracking that much state in the
shadow page role comes with an unpalatable cost as it would require a
non-trivial increase in KVM's memory footprint. The GBPAGES case is even
worse, as neither Intel nor AMD provides a way to disable 1gb hugepage
support in the hardware page walker, i.e. it's a virtualization hole that
can't be closed when using TDP.
In other words, resetting the MMU after a CPUID update is largely a
superficial fix. But, it will allow reverting the tracking of MAXPHYADDR
in the mmu_role, and that case in particular needs to mostly work because
KVM's shadow_root_level depends on guest MAXPHYADDR when 5-level paging
is supported. For cases where KVM botches guest behavior, the damage is
limited to that guest. But for the shadow_root_level, a misconfigured
MMU can cause KVM to incorrectly access memory, e.g. due to walking off
the end of its shadow page tables.
Fixes: 7dcd57552008 ("x86/kvm/mmu: check if tdp/shadow MMU reconfiguration is needed")
Cc: Yu Zhang <yu.c.zhang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-7-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index a474cd13b0c8..f1e4d5f2bf8d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1496,6 +1496,7 @@ int kvm_mmu_create(struct kvm_vcpu *vcpu);
void kvm_mmu_init_vm(struct kvm *kvm);
void kvm_mmu_uninit_vm(struct kvm *kvm);
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu);
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu);
void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot,
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index b4da665bb892..c42613cfb5ba 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -202,10 +202,10 @@ static void kvm_vcpu_after_set_cpuid(struct kvm_vcpu *vcpu)
static_call(kvm_x86_vcpu_after_set_cpuid)(vcpu);
/*
- * Except for the MMU, which needs to be reset after any vendor
- * specific adjustments to the reserved GPA bits.
+ * Except for the MMU, which needs to do its thing any vendor specific
+ * adjustments to the reserved GPA bits.
*/
- kvm_mmu_reset_context(vcpu);
+ kvm_mmu_after_set_cpuid(vcpu);
}
static int is_efer_nx(void)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fa35762f325c..1ab3fdb1f2e4 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4903,6 +4903,18 @@ kvm_mmu_calc_root_page_role(struct kvm_vcpu *vcpu)
return role.base;
}
+void kvm_mmu_after_set_cpuid(struct kvm_vcpu *vcpu)
+{
+ /*
+ * Invalidate all MMU roles to force them to reinitialize as CPUID
+ * information is factored into reserved bit calculations.
+ */
+ vcpu->arch.root_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.guest_mmu.mmu_role.ext.valid = 0;
+ vcpu->arch.nested_mmu.mmu_role.ext.valid = 0;
+ kvm_mmu_reset_context(vcpu);
+}
+
void kvm_mmu_reset_context(struct kvm_vcpu *vcpu)
{
kvm_mmu_unload(vcpu);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef318b9edf66a082f23d00d79b70c17b4c055a26 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:49 -0700
Subject: [PATCH] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in
nested NPT walk
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
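For backports it helps to spell out what the replaced expression means: the
fetch-fault bit of the injected error code must be derived from the paging
mode of the MMU actually being walked (L1's NPT), not from the running
vCPU's CR4, which may describe L2. The hunk above with that distinction
annotated (same logic, comments added):

    errcode |= write_fault | user_fault;

    /* A fetch fault is only reported when NX or SMEP applies to the page
     * tables being walked; use the SMEP value captured in this MMU's role
     * rather than kvm_read_cr4_bits(vcpu, X86_CR4_SMEP), which reflects
     * the currently running vCPU (possibly L2), not the L1 NPT walk. */
    if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
            errcode |= PFERR_FETCH_MASK;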
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef318b9edf66a082f23d00d79b70c17b4c055a26 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:49 -0700
Subject: [PATCH] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in
nested NPT walk
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef318b9edf66a082f23d00d79b70c17b4c055a26 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:49 -0700
Subject: [PATCH] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in
nested NPT walk
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef318b9edf66a082f23d00d79b70c17b4c055a26 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:49 -0700
Subject: [PATCH] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in
nested NPT walk
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From ef318b9edf66a082f23d00d79b70c17b4c055a26 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:49 -0700
Subject: [PATCH] KVM: x86/mmu: Use MMU's role to detect CR4.SMEP value in
nested NPT walk
Use the MMU's role to get its effective SMEP value when injecting a fault
into the guest. When walking L1's (nested) NPT while L2 is active, vCPU
state will reflect L2, whereas NPT uses the host's (L1 in this case) CR0,
CR4, EFER, etc... If L1 and L2 have different settings for SMEP and
L1 does not have EFER.NX=1, this can result in an incorrect PFEC.FETCH
when injecting #NPF.
Fixes: e57d4a356ad3 ("KVM: Add instruction fetch checking when walking guest page table")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-5-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/paging_tmpl.h b/arch/x86/kvm/mmu/paging_tmpl.h
index 823a5919f9fa..52fffd68b522 100644
--- a/arch/x86/kvm/mmu/paging_tmpl.h
+++ b/arch/x86/kvm/mmu/paging_tmpl.h
@@ -471,8 +471,7 @@ static int FNAME(walk_addr_generic)(struct guest_walker *walker,
error:
errcode |= write_fault | user_fault;
- if (fetch_fault && (mmu->nx ||
- kvm_read_cr4_bits(vcpu, X86_CR4_SMEP)))
+ if (fetch_fault && (mmu->nx || mmu->mmu_role.ext.cr4_smep))
errcode |= PFERR_FETCH_MASK;
walker->fault.vector = PF_VECTOR;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 112022bdb5bc372e00e6e43cb88ee38ea67b97bd Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc(a)google.com>
Date: Tue, 22 Jun 2021 10:56:47 -0700
Subject: [PATCH] KVM: x86/mmu: Treat NX as used (not reserved) for all !TDP
shadow MMUs
Mark NX as being used for all non-nested shadow MMUs, as KVM will set the
NX bit for huge SPTEs if the iTLB multi-hit mitigation is enabled.
Checking the mitigation itself is not sufficient as it can be toggled on
at any time and KVM doesn't reset MMU contexts when that happens. KVM
could reset the contexts, but that would require purging all SPTEs in all
MMUs, for no real benefit. And, KVM already forces EFER.NX=1 when TDP is
disabled (for WP=0, SMEP=1, NX=0), so technically NX is never reserved
for shadow MMUs.
Fixes: b8e8c8303ff2 ("kvm: mmu: ITLB_MULTIHIT mitigation")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
Message-Id: <20210622175739.3610207-3-seanjc(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index b3be690d081a..444e068e6ad9 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4221,7 +4221,15 @@ static inline u64 reserved_hpa_bits(void)
void
reset_shadow_zero_bits_mask(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
{
- bool uses_nx = context->nx ||
+ /*
+ * KVM uses NX when TDP is disabled to handle a variety of scenarios,
+ * notably for huge SPTEs if iTLB multi-hit mitigation is enabled and
+ * to generate correct permissions for CR0.WP=0/CR4.SMEP=1/EFER.NX=0.
+ * The iTLB multi-hit workaround can be toggled at any time, so assume
+ * NX can be used by any non-nested shadow MMU to avoid having to reset
+ * MMU contexts. Note, KVM forces EFER.NX=1 when TDP is disabled.
+ */
+ bool uses_nx = context->nx || !tdp_enabled ||
context->mmu_role.base.smep_andnot_wp;
struct rsvd_bits_validate *shadow_zero_check;
int i;
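What any backport has to preserve is the uses_nx rule: NX may only be
treated as reserved when nothing can ever set it, and with TDP disabled
the iTLB multi-hit mitigation may set it at any time. A compilable toy
sketch of that rule follows; PTE_NX_BIT and shadow_rsvd_mask() are
illustrative helpers, not the kernel's reserved-bits machinery.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_NX_BIT (1ull << 63)         /* NX/XD bit in an x86-64 PTE */

/*
 * NX is reserved (must-be-zero) only when nothing can legitimately set
 * it.  Mirroring the fix, treat NX as usable whenever the guest enables
 * it, TDP is disabled (shadow paging), or the SMEP-and-not-WP case
 * applies.
 */
static uint64_t shadow_rsvd_mask(bool guest_nx, bool tdp_enabled,
                                 bool smep_andnot_wp)
{
        bool uses_nx = guest_nx || !tdp_enabled || smep_andnot_wp;

        return uses_nx ? 0 : PTE_NX_BIT;
}

int main(void)
{
        /* Shadow MMU (no TDP): NX never lands in the reserved mask, so the
         * iTLB multi-hit mitigation can set it without tripping checks. */
        printf("shadow MMU rsvd bits: %#llx\n",
               (unsigned long long)shadow_rsvd_mask(false, false, false));
        /* TDP with guest NX disabled: NX stays reserved. */
        printf("TDP MMU rsvd bits:    %#llx\n",
               (unsigned long long)shadow_rsvd_mask(false, true, false));
        return 0;
}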
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
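The change itself is POWER assembly, but the pattern it implements is
the usual save-around-guest-entry one: park the host FSCR in a stack
slot before loading guest state and restore it on the way out, so
nothing later in kvmppc_vcpu_run_hv has to. A small C sketch of that
pattern, with a toy SPR accessor standing in for mfspr/mtspr (names and
values are illustrative only):

#include <stdint.h>
#include <stdio.h>

static uint64_t fscr = 0x1c0;   /* pretend host FSCR value */

static uint64_t mfspr_fscr(void)        { return fscr; }
static void mtspr_fscr(uint64_t v)      { fscr = v; }

/* Guest entry/exit with the host value parked locally, in the same way
 * the rmhandlers change parks it in STACK_SLOT_FSCR(r1). */
static void run_guest(uint64_t guest_fscr)
{
        uint64_t host_fscr = mfspr_fscr();      /* mfspr + std to the slot */

        mtspr_fscr(guest_fscr);                 /* guest context is live */
        /* ... guest runs and may leave any value behind ... */
        mtspr_fscr(host_fscr);                  /* ld from the slot + mtspr */
}

int main(void)
{
        run_guest(0);
        printf("host FSCR after guest run: %#llx\n",
               (unsigned long long)mfspr_fscr());       /* still 0x1c0 */
        return 0;
}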
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 6ba53317d497dec029bfb040b1daf38328fa00ab Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin(a)gmail.com>
Date: Wed, 26 May 2021 22:58:51 +1000
Subject: [PATCH] KVM: PPC: Book3S HV: Save host FSCR in the P7/8 path
Similar to commit 25edcc50d76c ("KVM: PPC: Book3S HV: Save and restore
FSCR in the P9 path"), ensure the P7/8 path saves and restores the host
FSCR. The logic explained in that patch applies to the old path as
well: a context switch can happen before kvmppc_vcpu_run_hv restores
the host FSCR and returns.
Now that both the P9 and the P7/8 paths save and restore their FSCR,
it no longer needs to be restored at the end of kvmppc_vcpu_run_hv.
Fixes: b005255e12a3 ("KVM: PPC: Book3S HV: Context-switch new POWER8 SPRs")
Cc: stable(a)vger.kernel.org # v3.14+
Signed-off-by: Nicholas Piggin <npiggin(a)gmail.com>
Reviewed-by: Fabiano Rosas <farosas(a)linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe(a)ellerman.id.au>
Link: https://lore.kernel.org/r/20210526125851.3436735-1-npiggin@gmail.com
diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
index 28a80d240b76..13728495ac66 100644
--- a/arch/powerpc/kvm/book3s_hv.c
+++ b/arch/powerpc/kvm/book3s_hv.c
@@ -4455,7 +4455,6 @@ static int kvmppc_vcpu_run_hv(struct kvm_vcpu *vcpu)
mtspr(SPRN_EBBRR, ebb_regs[1]);
mtspr(SPRN_BESCR, ebb_regs[2]);
mtspr(SPRN_TAR, user_tar);
- mtspr(SPRN_FSCR, current->thread.fscr);
}
mtspr(SPRN_VRSAVE, user_vrsave);
diff --git a/arch/powerpc/kvm/book3s_hv_rmhandlers.S b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
index 5e634db4809b..004f0d4e665f 100644
--- a/arch/powerpc/kvm/book3s_hv_rmhandlers.S
+++ b/arch/powerpc/kvm/book3s_hv_rmhandlers.S
@@ -59,6 +59,7 @@ END_FTR_SECTION_IFCLR(CPU_FTR_ARCH_300)
#define STACK_SLOT_UAMOR (SFS-88)
#define STACK_SLOT_DAWR1 (SFS-96)
#define STACK_SLOT_DAWRX1 (SFS-104)
+#define STACK_SLOT_FSCR (SFS-112)
/* the following is used by the P9 short path */
#define STACK_SLOT_NVGPRS (SFS-152) /* 18 gprs */
@@ -686,6 +687,8 @@ BEGIN_FTR_SECTION
std r6, STACK_SLOT_DAWR0(r1)
std r7, STACK_SLOT_DAWRX0(r1)
std r8, STACK_SLOT_IAMR(r1)
+ mfspr r5, SPRN_FSCR
+ std r5, STACK_SLOT_FSCR(r1)
END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
BEGIN_FTR_SECTION
mfspr r6, SPRN_DAWR1
@@ -1663,6 +1666,10 @@ FTR_SECTION_ELSE
ld r7, STACK_SLOT_HFSCR(r1)
mtspr SPRN_HFSCR, r7
ALT_FTR_SECTION_END_IFCLR(CPU_FTR_ARCH_300)
+BEGIN_FTR_SECTION
+ ld r5, STACK_SLOT_FSCR(r1)
+ mtspr SPRN_FSCR, r5
+END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S)
/*
* Restore various registers to 0, where non-zero values
* set by the guest could disrupt the host.
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c24d37322548a6ec3caec67100d28b9c1f89f60a Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 28 Jun 2021 19:33:23 -0700
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with
split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..90262e448552 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -95,6 +127,14 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
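For anyone rebasing this onto an older mm/gup.c, the shape of the fix is
"speculatively pin the presumed head, then revalidate": take all the
references first, recheck that the tail still maps to that head, and
drop exactly the references taken if a split raced in. A self-contained
user-space sketch of that pattern follows; struct page,
try_get_compound_head() and put_page_refs() here are toy stand-ins built
on C11 atomics, not the kernel implementations.

#include <stdatomic.h>
#include <stdio.h>

/* Toy page: a refcount plus a pointer to its compound head
 * (points to itself when the page is not a tail). */
struct page {
        atomic_int refcount;
        struct page *head;
};

static void put_page_refs(struct page *page, int refs)
{
        /* One atomic op instead of refs separate puts. */
        atomic_fetch_sub(&page->refcount, refs);
}

/* Speculatively pin the head, then confirm the tail still belongs to it. */
static struct page *try_get_compound_head(struct page *page, int refs)
{
        struct page *head = page->head;

        if (atomic_load(&head->refcount) == 0)
                return NULL;                    /* head may already be freed */
        atomic_fetch_add(&head->refcount, refs);/* grab the refs first */

        if (page->head != head) {               /* a split raced with us */
                put_page_refs(head, refs);      /* undo exactly what we took */
                return NULL;
        }
        return head;
}

int main(void)
{
        struct page head, tail;

        atomic_init(&head.refcount, 1);
        head.head = &head;
        atomic_init(&tail.refcount, 0);
        tail.head = &head;

        struct page *got = try_get_compound_head(&tail, 2);
        printf("pinned head refcount: %d\n",
               got ? atomic_load(&got->refcount) : -1);
        return 0;
}

The ordering is the point: only after the refcount is elevated is it
safe to trust compound_head(), which is why the kernel change also moves
the pin-count (GUP_PIN_COUNTING_BIAS) adjustment below the
try_get_compound_head() call.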
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c24d37322548a6ec3caec67100d28b9c1f89f60a Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 28 Jun 2021 19:33:23 -0700
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with
split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..90262e448552 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -95,6 +127,14 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c24d37322548a6ec3caec67100d28b9c1f89f60a Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 28 Jun 2021 19:33:23 -0700
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with
split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..90262e448552 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -95,6 +127,14 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c24d37322548a6ec3caec67100d28b9c1f89f60a Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 28 Jun 2021 19:33:23 -0700
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with
split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..90262e448552 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -95,6 +127,14 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From c24d37322548a6ec3caec67100d28b9c1f89f60a Mon Sep 17 00:00:00 2001
From: Jann Horn <jannh(a)google.com>
Date: Mon, 28 Jun 2021 19:33:23 -0700
Subject: [PATCH] mm/gup: fix try_grab_compound_head() race with
split_huge_page()
try_grab_compound_head() is used to grab a reference to a page from
get_user_pages_fast(), which is only protected against concurrent freeing
of page tables (via local_irq_save()), but not against concurrent TLB
flushes, freeing of data pages, or splitting of compound pages.
Because no reference is held to the page when try_grab_compound_head() is
called, the page may have been freed and reallocated by the time its
refcount has been elevated; therefore, once we're holding a stable
reference to the page, the caller re-checks whether the PTE still points
to the same page (with the same access rights).
The problem is that try_grab_compound_head() has to grab a reference on
the head page; but between the time we look up what the head page is and
the time we actually grab a reference on the head page, the compound page
may have been split up (either explicitly through split_huge_page() or by
freeing the compound page to the buddy allocator and then allocating its
individual order-0 pages). If that happens, get_user_pages_fast() may end
up returning the right page but lifting the refcount on a now-unrelated
page, leading to use-after-free of pages.
To fix it: Re-check whether the pages still belong together after lifting
the refcount on the head page. Move anything else that checks
compound_head(page) below the refcount increment.
This can't actually happen on bare-metal x86 (because there, disabling
IRQs locks out remote TLB flushes), but it can happen on virtualized x86
(e.g. under KVM) and probably also on arm64. The race window is pretty
narrow, and constantly allocating and shattering hugepages isn't exactly
fast; for now I've only managed to reproduce this in an x86 KVM guest with
an artificially widened timing window (by adding a loop that repeatedly
calls `inl(0x3f8 + 5)` in `try_get_compound_head()` to force VM exits, so
that PV TLB flushes are used instead of IPIs).
As requested on the list, also replace the existing VM_BUG_ON_PAGE() with
a warning and bailout. Since the existing code only performed the BUG_ON
check on DEBUG_VM kernels, ensure that the new code also only performs the
check under that configuration - I don't want to mix two logically
separate changes together too much. The macro VM_WARN_ON_ONCE_PAGE()
doesn't return a value on !DEBUG_VM, so wrap the whole check in an #ifdef
block. An alternative would be to change the VM_WARN_ON_ONCE_PAGE()
definition for !DEBUG_VM such that it always returns false, but since that
would differ from the behavior of the normal WARN macros, it might be too
confusing for readers.
Link: https://lkml.kernel.org/r/20210615012014.1100672-1-jannh@google.com
Fixes: 7aef4172c795 ("mm: handle PTE-mapped tail pages in gerneric fast gup implementaiton")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: John Hubbard <jhubbard(a)nvidia.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Jan Kara <jack(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/mm/gup.c b/mm/gup.c
index 3ded6a5f26b2..90262e448552 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -44,6 +44,23 @@ static void hpage_pincount_sub(struct page *page, int refs)
atomic_sub(refs, compound_pincount_ptr(page));
}
+/* Equivalent to calling put_page() @refs times. */
+static void put_page_refs(struct page *page, int refs)
+{
+#ifdef CONFIG_DEBUG_VM
+ if (VM_WARN_ON_ONCE_PAGE(page_ref_count(page) < refs, page))
+ return;
+#endif
+
+ /*
+ * Calling put_page() for each ref is unnecessarily slow. Only the last
+ * ref needs a put_page().
+ */
+ if (refs > 1)
+ page_ref_sub(page, refs - 1);
+ put_page(page);
+}
+
/*
* Return the compound head page with ref appropriately incremented,
* or NULL if that failed.
@@ -56,6 +73,21 @@ static inline struct page *try_get_compound_head(struct page *page, int refs)
return NULL;
if (unlikely(!page_cache_add_speculative(head, refs)))
return NULL;
+
+ /*
+ * At this point we have a stable reference to the head page; but it
+ * could be that between the compound_head() lookup and the refcount
+ * increment, the compound page was split, in which case we'd end up
+ * holding a reference on a page that has nothing to do with the page
+ * we were given anymore.
+ * So now that the head page is stable, recheck that the pages still
+ * belong together.
+ */
+ if (unlikely(compound_head(page) != head)) {
+ put_page_refs(head, refs);
+ return NULL;
+ }
+
return head;
}
@@ -95,6 +127,14 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
!is_pinnable_page(page)))
return NULL;
+ /*
+ * CAUTION: Don't use compound_head() on the page before this
+ * point, the result won't be stable.
+ */
+ page = try_get_compound_head(page, refs);
+ if (!page)
+ return NULL;
+
/*
* When pinning a compound page of order > 1 (which is what
* hpage_pincount_available() checks for), use an exact count to
@@ -103,15 +143,10 @@ __maybe_unused struct page *try_grab_compound_head(struct page *page,
* However, be sure to *also* increment the normal page refcount
* field at least once, so that the page really is pinned.
*/
- if (!hpage_pincount_available(page))
- refs *= GUP_PIN_COUNTING_BIAS;
-
- page = try_get_compound_head(page, refs);
- if (!page)
- return NULL;
-
if (hpage_pincount_available(page))
hpage_pincount_add(page, refs);
+ else
+ page_ref_add(page, refs * (GUP_PIN_COUNTING_BIAS - 1));
mod_node_page_state(page_pgdat(page), NR_FOLL_PIN_ACQUIRED,
orig_refs);
@@ -135,14 +170,7 @@ static void put_compound_head(struct page *page, int refs, unsigned int flags)
refs *= GUP_PIN_COUNTING_BIAS;
}
- VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
- /*
- * Calling put_page() for each ref is unnecessarily slow. Only the last
- * ref needs a put_page().
- */
- if (refs > 1)
- page_ref_sub(page, refs - 1);
- put_page(page);
+ put_page_refs(page, refs);
}
/**
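A note on the other half of the race handling described above: the hunks only
show the head-page recheck inside try_get_compound_head(); the PTE recheck
mentioned in the commit message lives in the GUP-fast caller. A minimal sketch
of that caller-side pattern, simplified from gup_pte_range() in mm/gup.c
(error handling trimmed, not part of the patch itself):

	head = try_grab_compound_head(page, 1, flags);
	if (!head)
		goto pte_unmap;

	/*
	 * A reference is now held, but the PTE may have been changed (or the
	 * page freed and reused) in the meantime, so re-read it and back out
	 * if it no longer matches what was seen before the grab.
	 */
	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
		put_compound_head(head, 1, flags);
		goto pte_unmap;
	}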
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 44b1eba44dc537edf076f131f1eeee7544d0e04f Mon Sep 17 00:00:00 2001
From: Loic Poulain <loic.poulain(a)linaro.org>
Date: Mon, 21 Jun 2021 21:46:10 +0530
Subject: [PATCH] bus: mhi: core: Fix power down latency
On graceful power-down/disable transition, when an MHI reset is
performed, the MHI device loses its context, including interrupt
configuration. However, the current implementation is waiting for
event(irq) driven state change to confirm reset has been completed,
which never happens, and causes reset timeout, leading to unexpected
high latency of the mhi_power_down procedure (up to 45 seconds).
Fix that by moving to the recently introduced poll_reg_field method,
waiting for the reset bit to be cleared, in the same way as the
power_on procedure.
Cc: stable(a)vger.kernel.org
Fixes: a6e2e3522f29 ("bus: mhi: core: Add support for PM state transitions")
Signed-off-by: Loic Poulain <loic.poulain(a)linaro.org>
Reviewed-by: Bhaumik Bhatt <bbhatt(a)codeaurora.org>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Reviewed-by: Hemant Kumar <hemantk(a)codeaurora.org>
Link: https://lore.kernel.org/r/1620029090-8975-1-git-send-email-loic.poulain@lin…
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Link: https://lore.kernel.org/r/20210621161616.77524-3-manivannan.sadhasivam@lina…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index e2e59a341fef..704a5e225097 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -465,23 +465,15 @@ static void mhi_pm_disable_transition(struct mhi_controller *mhi_cntrl)
/* Trigger MHI RESET so that the device will not access host memory */
if (!MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state)) {
- u32 in_reset = -1;
- unsigned long timeout = msecs_to_jiffies(mhi_cntrl->timeout_ms);
-
dev_dbg(dev, "Triggering MHI Reset in device\n");
mhi_set_mhi_state(mhi_cntrl, MHI_STATE_RESET);
/* Wait for the reset bit to be cleared by the device */
- ret = wait_event_timeout(mhi_cntrl->state_event,
- mhi_read_reg_field(mhi_cntrl,
- mhi_cntrl->regs,
- MHICTRL,
- MHICTRL_RESET_MASK,
- MHICTRL_RESET_SHIFT,
- &in_reset) ||
- !in_reset, timeout);
- if (!ret || in_reset)
- dev_err(dev, "Device failed to exit MHI Reset state\n");
+ ret = mhi_poll_reg_field(mhi_cntrl, mhi_cntrl->regs, MHICTRL,
+ MHICTRL_RESET_MASK, MHICTRL_RESET_SHIFT, 0,
+ 25000);
+ if (ret)
+ dev_err(dev, "Device failed to clear MHI Reset\n");
/*
* Device will clear BHI_INTVEC as a part of RESET processing,
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d8ac76cdd1755b21e8c008c28d0b7251c0b14986 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 9 Jun 2021 11:25:03 +0100
Subject: [PATCH] btrfs: send: fix invalid path for unlink operations after
parent orphanization
During an incremental send operation, when processing the new references
for the current inode, we might send an unlink operation for another inode
that has a conflicting path and has more than one hard link. However this
path was computed and cached before we processed previous new references
for the current inode. We may have orphanized a directory of that path
while processing a previous new reference, in which case the path will
be invalid and cause the receiver process to fail.
The following reproducer triggers the problem and explains how/why it
happens in its comments:
$ cat test-send-unlink.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# Create our test files and directory. Inode 259 (file3) has two hard
# links.
touch $MNT/file1
touch $MNT/file2
touch $MNT/file3
mkdir $MNT/A
ln $MNT/file3 $MNT/A/hard_link
# Filesystem looks like:
#
# . (ino 256)
# |----- file1 (ino 257)
# |----- file2 (ino 258)
# |----- file3 (ino 259)
# |----- A/ (ino 260)
# |---- hard_link (ino 259)
#
# Now create the base snapshot, which is going to be the parent snapshot
# for a later incremental send.
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send -f /tmp/snap1.send $MNT/snap1
# Move inode 257 into directory inode 260. This results in computing the
# path for inode 260 as "/A" and caching it.
mv $MNT/file1 $MNT/A/file1
# Move inode 258 (file2) into directory inode 260, with a name of
# "hard_link", moving first inode 259 away since it currently has that
# location and name.
mv $MNT/A/hard_link $MNT/tmp
mv $MNT/file2 $MNT/A/hard_link
# Now rename inode 260 to something else (B for example) and then create
# a hard link for inode 258 that has the old name and location of inode
# 260 ("/A").
mv $MNT/A $MNT/B
ln $MNT/B/hard_link $MNT/A
# Filesystem now looks like:
#
# . (ino 256)
# |----- tmp (ino 259)
# |----- file3 (ino 259)
# |----- B/ (ino 260)
# | |---- file1 (ino 257)
# | |---- hard_link (ino 258)
# |
# |----- A (ino 258)
# Create another snapshot of our subvolume and use it for an incremental
# send.
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
# Now unmount the filesystem, create a new one, mount it and try to
# apply both send streams to recreate both snapshots.
umount $DEV
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# First add the first snapshot to the new filesystem by applying the
# first send stream.
btrfs receive -f /tmp/snap1.send $MNT
# The incremental receive operation below used to fail with the
# following error:
#
# ERROR: unlink A/hard_link failed: No such file or directory
#
# This is because when send is processing inode 257, it generates the
# path for inode 260 as "/A", since that inode is its parent in the send
# snapshot, and caches that path.
#
# Later when processing inode 258, it first processes its new reference
# that has the path of "/A", which results in orphanizing inode 260
# because there is a path collision. This results in issuing a rename
# operation from "/A" to "/o260-6-0".
#
# Finally when processing the new reference "B/hard_link" for inode 258,
# it notices that it collides with inode 259 (not yet processed, because
# it has a higher inode number), since that inode has the name
# "hard_link" under the directory inode 260. It also checks that inode
# 259 has two hard links, so it decides to issue an unlink operation for
# the name "hard_link" for inode 259. However the path passed to the
# unlink operation is "/A/hard_link", which is incorrect since currently
# "/A" does not exists, due to the orphanization of inode 260 mentioned
# before. The path is incorrect because it was computed and cached
# before the orphanization. This results in the receiver failing with
# the above error.
btrfs receive -f /tmp/snap2.send $MNT
umount $MNT
When running the test, it fails like this:
$ ./test-send-unlink.sh
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
At subvol /mnt/sdi/snap1
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
At subvol /mnt/sdi/snap2
At subvol snap1
At snapshot snap2
ERROR: unlink A/hard_link failed: No such file or directory
Fix this by recomputing a path before issuing an unlink operation when
processing the new references for the current inode if we previously
have orphanized a directory.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bd69db72acc5..a2b3c594379d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4064,6 +4064,17 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move)
if (ret < 0)
goto out;
} else {
+ /*
+ * If we previously orphanized a directory that
+ * collided with a new reference that we already
+ * processed, recompute the current path because
+ * that directory may be part of the path.
+ */
+ if (orphanized_dir) {
+ ret = refresh_ref_path(sctx, cur);
+ if (ret < 0)
+ goto out;
+ }
ret = send_unlink(sctx, cur->full_path);
if (ret < 0)
goto out;
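refresh_ref_path() itself is not shown in the hunk above; conceptually it
throws away the stale cached path and rebuilds it from the current send-tree
state, so an orphanized ancestor appears under its o<ino>-<gen> name. A rough,
simplified sketch of what that recomputation has to do (helper names as used
in fs/btrfs/send.c; this is not a verbatim copy of the kernel implementation):

	/*
	 * ref->name may point into ref->full_path's buffer, so keep a copy
	 * of the final name component before resetting the old path.
	 */
	name = kmemdup(ref->name, ref->name_len, GFP_KERNEL);
	if (!name)
		return -ENOMEM;

	fs_path_reset(ref->full_path);
	ret = get_cur_path(sctx, ref->dir, ref->dir_gen, ref->full_path);
	if (!ret)
		ret = fs_path_add(ref->full_path, name, ref->name_len);
	kfree(name);
	return ret;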
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d8ac76cdd1755b21e8c008c28d0b7251c0b14986 Mon Sep 17 00:00:00 2001
From: Filipe Manana <fdmanana(a)suse.com>
Date: Wed, 9 Jun 2021 11:25:03 +0100
Subject: [PATCH] btrfs: send: fix invalid path for unlink operations after
parent orphanization
During an incremental send operation, when processing the new references
for the current inode, we might send an unlink operation for another inode
that has a conflicting path and has more than one hard link. However this
path was computed and cached before we processed previous new references
for the current inode. We may have orphanized a directory of that path
while processing a previous new reference, in which case the path will
be invalid and cause the receiver process to fail.
The following reproducer triggers the problem and explains how/why it
happens in its comments:
$ cat test-send-unlink.sh
#!/bin/bash
DEV=/dev/sdi
MNT=/mnt/sdi
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# Create our test files and directory. Inode 259 (file3) has two hard
# links.
touch $MNT/file1
touch $MNT/file2
touch $MNT/file3
mkdir $MNT/A
ln $MNT/file3 $MNT/A/hard_link
# Filesystem looks like:
#
# . (ino 256)
# |----- file1 (ino 257)
# |----- file2 (ino 258)
# |----- file3 (ino 259)
# |----- A/ (ino 260)
# |---- hard_link (ino 259)
#
# Now create the base snapshot, which is going to be the parent snapshot
# for a later incremental send.
btrfs subvolume snapshot -r $MNT $MNT/snap1
btrfs send -f /tmp/snap1.send $MNT/snap1
# Move inode 257 into directory inode 260. This results in computing the
# path for inode 260 as "/A" and caching it.
mv $MNT/file1 $MNT/A/file1
# Move inode 258 (file2) into directory inode 260, with a name of
# "hard_link", moving first inode 259 away since it currently has that
# location and name.
mv $MNT/A/hard_link $MNT/tmp
mv $MNT/file2 $MNT/A/hard_link
# Now rename inode 260 to something else (B for example) and then create
# a hard link for inode 258 that has the old name and location of inode
# 260 ("/A").
mv $MNT/A $MNT/B
ln $MNT/B/hard_link $MNT/A
# Filesystem now looks like:
#
# . (ino 256)
# |----- tmp (ino 259)
# |----- file3 (ino 259)
# |----- B/ (ino 260)
# | |---- file1 (ino 257)
# | |---- hard_link (ino 258)
# |
# |----- A (ino 258)
# Create another snapshot of our subvolume and use it for an incremental
# send.
btrfs subvolume snapshot -r $MNT $MNT/snap2
btrfs send -f /tmp/snap2.send -p $MNT/snap1 $MNT/snap2
# Now unmount the filesystem, create a new one, mount it and try to
# apply both send streams to recreate both snapshots.
umount $DEV
mkfs.btrfs -f $DEV >/dev/null
mount $DEV $MNT
# First add the first snapshot to the new filesystem by applying the
# first send stream.
btrfs receive -f /tmp/snap1.send $MNT
# The incremental receive operation below used to fail with the
# following error:
#
# ERROR: unlink A/hard_link failed: No such file or directory
#
# This is because when send is processing inode 257, it generates the
# path for inode 260 as "/A", since that inode is its parent in the send
# snapshot, and caches that path.
#
# Later when processing inode 258, it first processes its new reference
# that has the path of "/A", which results in orphanizing inode 260
# because there is a path collision. This results in issuing a rename
# operation from "/A" to "/o260-6-0".
#
# Finally when processing the new reference "B/hard_link" for inode 258,
# it notices that it collides with inode 259 (not yet processed, because
# it has a higher inode number), since that inode has the name
# "hard_link" under the directory inode 260. It also checks that inode
# 259 has two hard links, so it decides to issue an unlink operation for
# the name "hard_link" for inode 259. However the path passed to the
# unlink operation is "/A/hard_link", which is incorrect since currently
# "/A" does not exists, due to the orphanization of inode 260 mentioned
# before. The path is incorrect because it was computed and cached
# before the orphanization. This results in the receiver failing with
# the above error.
btrfs receive -f /tmp/snap2.send $MNT
umount $MNT
When running the test, it fails like this:
$ ./test-send-unlink.sh
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap1'
At subvol /mnt/sdi/snap1
Create a readonly snapshot of '/mnt/sdi' in '/mnt/sdi/snap2'
At subvol /mnt/sdi/snap2
At subvol snap1
At snapshot snap2
ERROR: unlink A/hard_link failed: No such file or directory
Fix this by recomputing a path before issuing an unlink operation when
processing the new references for the current inode if we previously
have orphanized a directory.
A test case for fstests will follow soon.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: Filipe Manana <fdmanana(a)suse.com>
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
index bd69db72acc5..a2b3c594379d 100644
--- a/fs/btrfs/send.c
+++ b/fs/btrfs/send.c
@@ -4064,6 +4064,17 @@ static int process_recorded_refs(struct send_ctx *sctx, int *pending_move)
if (ret < 0)
goto out;
} else {
+ /*
+ * If we previously orphanized a directory that
+ * collided with a new reference that we already
+ * processed, recompute the current path because
+ * that directory may be part of the path.
+ */
+ if (orphanized_dir) {
+ ret = refresh_ref_path(sctx, cur);
+ if (ret < 0)
+ goto out;
+ }
ret = send_unlink(sctx, cur->full_path);
if (ret < 0)
goto out;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From e41eb3e408de27982a5f8f50b2dd8002bed96908 Mon Sep 17 00:00:00 2001
From: Felix Fietkau <nbd(a)nbd.name>
Date: Sat, 19 Jun 2021 12:15:17 +0200
Subject: [PATCH] mac80211: remove iwlwifi specific workaround that broke sta
NDP tx
Sending nulldata packets is important for sw AP link probing and detecting
4-address mode links. The checks that dropped these packets were apparently
added to work around an iwlwifi firmware bug with multi-TID aggregation.
Fixes: 41cbb0f5a295 ("mac80211: add support for HE")
Cc: stable(a)vger.kernel.org
Signed-off-by: Felix Fietkau <nbd(a)nbd.name>
Link: https://lore.kernel.org/r/20210619101517.90806-1-nbd@nbd.name
Signed-off-by: Johannes Berg <johannes.berg(a)intel.com>
diff --git a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
index 1ad621d13ad3..0a13c2bda2ee 100644
--- a/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
+++ b/drivers/net/wireless/intel/iwlwifi/mvm/tx.c
@@ -1032,6 +1032,9 @@ static int iwl_mvm_tx_mpdu(struct iwl_mvm *mvm, struct sk_buff *skb,
if (WARN_ON_ONCE(mvmsta->sta_id == IWL_MVM_INVALID_STA))
return -1;
+ if (unlikely(ieee80211_is_any_nullfunc(fc)) && sta->he_cap.has_he)
+ return -1;
+
if (unlikely(ieee80211_is_probe_resp(fc)))
iwl_mvm_probe_resp_set_noa(mvm, skb);
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 3f2aad2e7436..b1c44fa63a06 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -1094,11 +1094,6 @@ void ieee80211_send_nullfunc(struct ieee80211_local *local,
struct ieee80211_hdr_3addr *nullfunc;
struct ieee80211_if_managed *ifmgd = &sdata->u.mgd;
- /* Don't send NDPs when STA is connected HE */
- if (sdata->vif.type == NL80211_IFTYPE_STATION &&
- !(ifmgd->flags & IEEE80211_STA_DISABLE_HE))
- return;
-
skb = ieee80211_nullfunc_get(&local->hw, &sdata->vif,
!ieee80211_hw_check(&local->hw, DOESNT_SUPPORT_QOS_NDP));
if (!skb)
@@ -1130,10 +1125,6 @@ static void ieee80211_send_4addr_nullfunc(struct ieee80211_local *local,
if (WARN_ON(sdata->vif.type != NL80211_IFTYPE_STATION))
return;
- /* Don't send NDPs when connected HE */
- if (!(sdata->u.mgd.flags & IEEE80211_STA_DISABLE_HE))
- return;
-
skb = dev_alloc_skb(local->hw.extra_tx_headroom + 30);
if (!skb)
return;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fb8696ab14adadb2e3f6c17c18ed26b3ecd96691 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Fri, 18 Jun 2021 19:36:45 +0200
Subject: [PATCH] can: gw: synchronize rcu operations before removing gw job
entry
can_can_gw_rcv() is called under RCU protection, so after calling
can_rx_unregister(), we have to call synchronize_rcu in order to wait
for any RCU read-side critical sections to finish before removing the
kmem_cache entry with the referenced gw job entry.
Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net
Fixes: c1aabdf379bc ("can-gw: add netlink based CAN routing")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/gw.c b/net/can/gw.c
index ba4124805602..d8861e862f15 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -596,6 +596,7 @@ static int cgw_notifier(struct notifier_block *nb,
if (gwj->src.dev == dev || gwj->dst.dev == dev) {
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
}
}
@@ -1154,6 +1155,7 @@ static void cgw_remove_all_jobs(struct net *net)
hlist_for_each_entry_safe(gwj, nx, &net->can.cgw_list, list) {
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
}
}
@@ -1222,6 +1224,7 @@ static int cgw_remove_job(struct sk_buff *skb, struct nlmsghdr *nlh,
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
err = 0;
break;
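The reason freeing right after can_rx_unregister() is not safe: the CAN
receive path dereferences the gw job from inside an RCU read-side section, so
the job must stay valid until every such section has finished. A hedged
sketch of that reader side, simplified from the delivery loop in
net/can/af_can.c (synchronize_rcu() on the writer side is what waits these
readers out):

	rcu_read_lock();
	hlist_for_each_entry_rcu(rcv, rcv_list, list)
		rcv->func(skb, rcv->data);	/* e.g. can_can_gw_rcv(skb, gwj) */
	rcu_read_unlock();

	/*
	 * Only once synchronize_rcu() has returned on the removal side is it
	 * safe to kmem_cache_free(cgw_cache, gwj); until then a reader may
	 * still be inside rcv->func() with gwj as its data pointer.
	 */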
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fb8696ab14adadb2e3f6c17c18ed26b3ecd96691 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Fri, 18 Jun 2021 19:36:45 +0200
Subject: [PATCH] can: gw: synchronize rcu operations before removing gw job
entry
can_can_gw_rcv() is called under RCU protection, so after calling
can_rx_unregister(), we have to call synchronize_rcu in order to wait
for any RCU read-side critical sections to finish before removing the
kmem_cache entry with the referenced gw job entry.
Link: https://lore.kernel.org/r/20210618173645.2238-1-socketcan@hartkopp.net
Fixes: c1aabdf379bc ("can-gw: add netlink based CAN routing")
Cc: linux-stable <stable(a)vger.kernel.org>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/gw.c b/net/can/gw.c
index ba4124805602..d8861e862f15 100644
--- a/net/can/gw.c
+++ b/net/can/gw.c
@@ -596,6 +596,7 @@ static int cgw_notifier(struct notifier_block *nb,
if (gwj->src.dev == dev || gwj->dst.dev == dev) {
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
}
}
@@ -1154,6 +1155,7 @@ static void cgw_remove_all_jobs(struct net *net)
hlist_for_each_entry_safe(gwj, nx, &net->can.cgw_list, list) {
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
}
}
@@ -1222,6 +1224,7 @@ static int cgw_remove_job(struct sk_buff *skb, struct nlmsghdr *nlh,
hlist_del(&gwj->list);
cgw_unregister_filter(net, gwj);
+ synchronize_rcu();
kmem_cache_free(cgw_cache, gwj);
err = 0;
break;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d5f9023fa61ee8b94f37a93f08e94b136cf1e463 Mon Sep 17 00:00:00 2001
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Date: Sat, 19 Jun 2021 13:18:13 -0300
Subject: [PATCH] can: bcm: delay release of struct bcm_op after
synchronize_rcu()
can_rx_register() callbacks may be called concurrently to the call to
can_rx_unregister(). The callbacks and callback data, though, are
protected by RCU and the struct sock reference count.
So the callback data is really attached to the life of sk, meaning
that it should be released on sk_destruct. However, bcm_remove_op()
calls tasklet_kill(), and RCU callbacks may be called under RCU
softirq, so that cannot be used on kernels before the introduction of
HRTIMER_MODE_SOFT.
However, bcm_rx_handler() is called under RCU protection, so after
calling can_rx_unregister(), we may call synchronize_rcu() in order to
wait for any RCU read-side critical sections to finish. That is,
bcm_rx_handler() won't be called anymore for those ops. So we only
free them after we do that synchronize_rcu().
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
Cc: linux-stable <stable(a)vger.kernel.org>
Reported-by: syzbot+0f7e7e5e2f4f40fa89c0(a)syzkaller.appspotmail.com
Reported-by: Norbert Slusarek <nslusarek(a)gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/bcm.c b/net/can/bcm.c
index f3e4d9528fa3..0928a39c4423 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -785,6 +785,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
bcm_rx_handler, op);
list_del(&op->list);
+ synchronize_rcu();
bcm_remove_op(op);
return 1; /* done */
}
@@ -1533,9 +1534,13 @@ static int bcm_release(struct socket *sock)
REGMASK(op->can_id),
bcm_rx_handler, op);
- bcm_remove_op(op);
}
+ synchronize_rcu();
+
+ list_for_each_entry_safe(op, next, &bo->rx_ops, list)
+ bcm_remove_op(op);
+
#if IS_ENABLED(CONFIG_PROC_FS)
/* remove procfs entry */
if (net->can.bcmproc_dir && bo->bcm_proc_read)
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d5f9023fa61ee8b94f37a93f08e94b136cf1e463 Mon Sep 17 00:00:00 2001
From: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Date: Sat, 19 Jun 2021 13:18:13 -0300
Subject: [PATCH] can: bcm: delay release of struct bcm_op after
synchronize_rcu()
can_rx_register() callbacks may be called concurrently to the call to
can_rx_unregister(). The callbacks and callback data, though, are
protected by RCU and the struct sock reference count.
So the callback data is really attached to the life of sk, meaning
that it should be released on sk_destruct. However, bcm_remove_op()
calls tasklet_kill(), and RCU callbacks may be called under RCU
softirq, so that cannot be used on kernels before the introduction of
HRTIMER_MODE_SOFT.
However, bcm_rx_handler() is called under RCU protection, so after
calling can_rx_unregister(), we may call synchronize_rcu() in order to
wait for any RCU read-side critical sections to finish. That is,
bcm_rx_handler() won't be called anymore for those ops. So we only
free them after we do that synchronize_rcu().
Fixes: ffd980f976e7 ("[CAN]: Add broadcast manager (bcm) protocol")
Link: https://lore.kernel.org/r/20210619161813.2098382-1-cascardo@canonical.com
Cc: linux-stable <stable(a)vger.kernel.org>
Reported-by: syzbot+0f7e7e5e2f4f40fa89c0(a)syzkaller.appspotmail.com
Reported-by: Norbert Slusarek <nslusarek(a)gmx.net>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo(a)canonical.com>
Acked-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
diff --git a/net/can/bcm.c b/net/can/bcm.c
index f3e4d9528fa3..0928a39c4423 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -785,6 +785,7 @@ static int bcm_delete_rx_op(struct list_head *ops, struct bcm_msg_head *mh,
bcm_rx_handler, op);
list_del(&op->list);
+ synchronize_rcu();
bcm_remove_op(op);
return 1; /* done */
}
@@ -1533,9 +1534,13 @@ static int bcm_release(struct socket *sock)
REGMASK(op->can_id),
bcm_rx_handler, op);
- bcm_remove_op(op);
}
+ synchronize_rcu();
+
+ list_for_each_entry_safe(op, next, &bo->rx_ops, list)
+ bcm_remove_op(op);
+
#if IS_ENABLED(CONFIG_PROC_FS)
/* remove procfs entry */
if (net->can.bcmproc_dir && bo->bcm_proc_read)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f2165627319ffd33a6217275e5690b1ab5c45763 Mon Sep 17 00:00:00 2001
From: David Sterba <dsterba(a)suse.com>
Date: Mon, 14 Jun 2021 12:45:18 +0200
Subject: [PATCH] btrfs: compression: don't try to compress if we don't have
enough pages
The early check if we should attempt compression does not take into
account the number of input pages. It can happen that there's only one
page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.
The single page would be compressed but a later check would drop it
again because the result size must be at least one block shorter than
the input. That can never work with just one page.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a2494c645681..e6eb20987351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -629,7 +629,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
* inode has not been flagged as nocompress. This flag can
* change at any time if we discover bad compression ratios.
*/
- if (inode_need_compress(BTRFS_I(inode), start, end)) {
+ if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
WARN_ON(pages);
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages) {
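To make the "can never work" argument concrete, assuming 4 KiB pages and a
4 KiB sectorsize (the common x86_64 configuration):

	input size          = 1 page = 4096 bytes
	required compressed <= input - sectorsize = 4096 - 4096 = 0 bytes

so a single-page range is guaranteed to fail the later size check and the
compression work is always thrown away; only with nr_pages > 1 is there at
least one sector of headroom for the output to be shorter.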
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f2165627319ffd33a6217275e5690b1ab5c45763 Mon Sep 17 00:00:00 2001
From: David Sterba <dsterba(a)suse.com>
Date: Mon, 14 Jun 2021 12:45:18 +0200
Subject: [PATCH] btrfs: compression: don't try to compress if we don't have
enough pages
The early check if we should attempt compression does not take into
account the number of input pages. It can happen that there's only one
page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.
The single page would be compressed but a later check would drop it
again because the result size must be at least one block shorter than
the input. That can never work with just one page.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a2494c645681..e6eb20987351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -629,7 +629,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
* inode has not been flagged as nocompress. This flag can
* change at any time if we discover bad compression ratios.
*/
- if (inode_need_compress(BTRFS_I(inode), start, end)) {
+ if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
WARN_ON(pages);
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages) {
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From f2165627319ffd33a6217275e5690b1ab5c45763 Mon Sep 17 00:00:00 2001
From: David Sterba <dsterba(a)suse.com>
Date: Mon, 14 Jun 2021 12:45:18 +0200
Subject: [PATCH] btrfs: compression: don't try to compress if we don't have
enough pages
The early check if we should attempt compression does not take into
account the number of input pages. It can happen that there's only one
page, eg. a tail page after some ranges of the BTRFS_MAX_UNCOMPRESSED
have been processed, or an isolated page that won't be converted to an
inline extent.
The single page would be compressed but a later check would drop it
again because the result size must be at least one block shorter than
the input. That can never work with just one page.
CC: stable(a)vger.kernel.org # 4.4+
Signed-off-by: David Sterba <dsterba(a)suse.com>
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index a2494c645681..e6eb20987351 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -629,7 +629,7 @@ static noinline int compress_file_range(struct async_chunk *async_chunk)
* inode has not been flagged as nocompress. This flag can
* change at any time if we discover bad compression ratios.
*/
- if (inode_need_compress(BTRFS_I(inode), start, end)) {
+ if (nr_pages > 1 && inode_need_compress(BTRFS_I(inode), start, end)) {
WARN_ON(pages);
pages = kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
if (!pages) {
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c66120fda6596ad57f41e1607b3a5d51ca143d Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 16 Jun 2021 13:34:59 -0700
Subject: [PATCH] crypto: nx - Fix memcpy() over-reading in nonce
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request *req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
- memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+ memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
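For reference, the 16-byte counter block being assembled here follows the
RFC 3686 layout (sizes from include/crypto/ctr.h):

	iv[0..3]    nonce    (CTR_RFC3686_NONCE_SIZE = 4, taken from the key)
	iv[4..11]   IV       (CTR_RFC3686_IV_SIZE = 8, taken from req->iv)
	iv[12..15]  counter  (32-bit big-endian, initialised to 1)

so the old memcpy() of CTR_RFC3686_IV_SIZE (8) bytes read 4 bytes past the end
of the 4-byte nonce field; the destination offsets were already correct.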
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c66120fda6596ad57f41e1607b3a5d51ca143d Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 16 Jun 2021 13:34:59 -0700
Subject: [PATCH] crypto: nx - Fix memcpy() over-reading in nonce
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request *req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
- memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+ memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c66120fda6596ad57f41e1607b3a5d51ca143d Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 16 Jun 2021 13:34:59 -0700
Subject: [PATCH] crypto: nx - Fix memcpy() over-reading in nonce
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request *req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
- memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+ memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c66120fda6596ad57f41e1607b3a5d51ca143d Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 16 Jun 2021 13:34:59 -0700
Subject: [PATCH] crypto: nx - Fix memcpy() over-reading in nonce
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request *req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
- memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+ memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 74c66120fda6596ad57f41e1607b3a5d51ca143d Mon Sep 17 00:00:00 2001
From: Kees Cook <keescook(a)chromium.org>
Date: Wed, 16 Jun 2021 13:34:59 -0700
Subject: [PATCH] crypto: nx - Fix memcpy() over-reading in nonce
Fix typo in memcpy() where size should be CTR_RFC3686_NONCE_SIZE.
Fixes: 030f4e968741 ("crypto: nx - Fix reentrancy bugs")
Cc: stable(a)vger.kernel.org
Signed-off-by: Kees Cook <keescook(a)chromium.org>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
diff --git a/drivers/crypto/nx/nx-aes-ctr.c b/drivers/crypto/nx/nx-aes-ctr.c
index 13f518802343..6120e350ff71 100644
--- a/drivers/crypto/nx/nx-aes-ctr.c
+++ b/drivers/crypto/nx/nx-aes-ctr.c
@@ -118,7 +118,7 @@ static int ctr3686_aes_nx_crypt(struct skcipher_request *req)
struct nx_crypto_ctx *nx_ctx = crypto_skcipher_ctx(tfm);
u8 iv[16];
- memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_IV_SIZE);
+ memcpy(iv, nx_ctx->priv.ctr.nonce, CTR_RFC3686_NONCE_SIZE);
memcpy(iv + CTR_RFC3686_NONCE_SIZE, req->iv, CTR_RFC3686_IV_SIZE);
iv[12] = iv[13] = iv[14] = 0;
iv[15] = 1;
The patch below does not apply to the 5.13-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fed09e0bf9f0622a54f3963f959d914fa817f8a6 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Wed, 16 Jun 2021 01:32:06 +0800
Subject: [PATCH] usb: typec: tcpm: Ignore Vsafe0v in PR_SWAP_SNK_SRC_SOURCE_ON
state
In PR_SWAP_SNK_SRC_SOURCE_ON state, Vsafe0v is expected as well, so do
nothing here to avoid the state machine going into SNK_UNATTACHED.
Fixes: 28b43d3d746b ("usb: typec: tcpm: Introduce vsafe0v for vbus")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210615173206.1646477-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 197556038ba4..e11e9227107d 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5212,6 +5212,7 @@ static void _tcpm_pd_vbus_vsafe0v(struct tcpm_port *port)
}
break;
case PR_SWAP_SNK_SRC_SINK_OFF:
+ case PR_SWAP_SNK_SRC_SOURCE_ON:
/* Do nothing, vsafe0v is expected during transition */
break;
default:
The patch below does not apply to the 5.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From fed09e0bf9f0622a54f3963f959d914fa817f8a6 Mon Sep 17 00:00:00 2001
From: Kyle Tso <kyletso(a)google.com>
Date: Wed, 16 Jun 2021 01:32:06 +0800
Subject: [PATCH] usb: typec: tcpm: Ignore Vsafe0v in PR_SWAP_SNK_SRC_SOURCE_ON
state
In PR_SWAP_SNK_SRC_SOURCE_ON state, Vsafe0v is expected as well, so do
nothing here to avoid the state machine going into SNK_UNATTACHED.
Fixes: 28b43d3d746b ("usb: typec: tcpm: Introduce vsafe0v for vbus")
Cc: stable <stable(a)vger.kernel.org>
Reviewed-by: Badhri Jagan Sridharan <badhri(a)google.com>
Reviewed-by: Guenter Roeck <linux(a)roeck-us.net>
Acked-by: Heikki Krogerus <heikki.krogerus(a)linux.intel.com>
Signed-off-by: Kyle Tso <kyletso(a)google.com>
Link: https://lore.kernel.org/r/20210615173206.1646477-1-kyletso@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 197556038ba4..e11e9227107d 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -5212,6 +5212,7 @@ static void _tcpm_pd_vbus_vsafe0v(struct tcpm_port *port)
}
break;
case PR_SWAP_SNK_SRC_SINK_OFF:
+ case PR_SWAP_SNK_SRC_SOURCE_ON:
/* Do nothing, vsafe0v is expected during transition */
break;
default:
The patch below does not apply to the 4.9-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 362372ceb6556f338e230f2d90af27b47f82365a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 22 Jun 2021 11:06:47 +0200
Subject: [PATCH] ALSA: usb-audio: Fix OOB access at proc output
When extending the available mixer values to 32-bit types, we forgot to
add the corresponding entries for the format dump in the proc output.
This may result in an OOB access. Add the missing entries here.
Fixes: bc18e31c3042 ("ALSA: usb-audio: Fix parameter block size for UAC2 control requests")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210622090647.14021-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index 428d581f988f..4ea4875abdf8 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -3294,8 +3294,9 @@ static void snd_usb_mixer_dump_cval(struct snd_info_buffer *buffer,
struct usb_mixer_elem_list *list)
{
struct usb_mixer_elem_info *cval = mixer_elem_list_to_info(list);
- static const char * const val_types[] = {"BOOLEAN", "INV_BOOLEAN",
- "S8", "U8", "S16", "U16"};
+ static const char * const val_types[] = {
+ "BOOLEAN", "INV_BOOLEAN", "S8", "U8", "S16", "U16", "S32", "U32",
+ };
snd_iprintf(buffer, " Info: id=%i, control=%i, cmask=0x%x, "
"channels=%i, type=\"%s\"\n", cval->head.id,
cval->control, cval->cmask, cval->channels,
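The out-of-bounds access is a plain indexing overflow: the format-string
argument on the next (unquoted) line of this function is

	val_types[cval->val_type]

and for 32-bit controls val_type lands on the two new entries at indices 6
and 7 ("S32" and "U32"), while the old table only defined indices 0..5
("BOOLEAN" through "U16"), so any S32/U32 control read past the end of the
array.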
The patch below does not apply to the 4.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 362372ceb6556f338e230f2d90af27b47f82365a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 22 Jun 2021 11:06:47 +0200
Subject: [PATCH] ALSA: usb-audio: Fix OOB access at proc output
When extending the available mixer values to 32-bit types, we forgot to
add the corresponding entries for the format dump in the proc output.
This may result in an OOB access. Add the missing entries here.
Fixes: bc18e31c3042 ("ALSA: usb-audio: Fix parameter block size for UAC2 control requests")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210622090647.14021-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index 428d581f988f..4ea4875abdf8 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -3294,8 +3294,9 @@ static void snd_usb_mixer_dump_cval(struct snd_info_buffer *buffer,
struct usb_mixer_elem_list *list)
{
struct usb_mixer_elem_info *cval = mixer_elem_list_to_info(list);
- static const char * const val_types[] = {"BOOLEAN", "INV_BOOLEAN",
- "S8", "U8", "S16", "U16"};
+ static const char * const val_types[] = {
+ "BOOLEAN", "INV_BOOLEAN", "S8", "U8", "S16", "U16", "S32", "U32",
+ };
snd_iprintf(buffer, " Info: id=%i, control=%i, cmask=0x%x, "
"channels=%i, type=\"%s\"\n", cval->head.id,
cval->control, cval->cmask, cval->channels,
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 362372ceb6556f338e230f2d90af27b47f82365a Mon Sep 17 00:00:00 2001
From: Takashi Iwai <tiwai(a)suse.de>
Date: Tue, 22 Jun 2021 11:06:47 +0200
Subject: [PATCH] ALSA: usb-audio: Fix OOB access at proc output
When extending the available mixer values to 32-bit types, we forgot to
add the corresponding entries for the format dump in the proc output.
This may result in an OOB access. Add the missing entries here.
Fixes: bc18e31c3042 ("ALSA: usb-audio: Fix parameter block size for UAC2 control requests")
Cc: <stable(a)vger.kernel.org>
Link: https://lore.kernel.org/r/20210622090647.14021-1-tiwai@suse.de
Signed-off-by: Takashi Iwai <tiwai(a)suse.de>
diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index 428d581f988f..4ea4875abdf8 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -3294,8 +3294,9 @@ static void snd_usb_mixer_dump_cval(struct snd_info_buffer *buffer,
struct usb_mixer_elem_list *list)
{
struct usb_mixer_elem_info *cval = mixer_elem_list_to_info(list);
- static const char * const val_types[] = {"BOOLEAN", "INV_BOOLEAN",
- "S8", "U8", "S16", "U16"};
+ static const char * const val_types[] = {
+ "BOOLEAN", "INV_BOOLEAN", "S8", "U8", "S16", "U16", "S32", "U32",
+ };
snd_iprintf(buffer, " Info: id=%i, control=%i, cmask=0x%x, "
"channels=%i, type=\"%s\"\n", cval->head.id,
cval->control, cval->cmask, cval->channels,
I'm announcing the release of the 5.10.49 kernel.
All users of the 5.10 kernel series must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/hexagon/Makefile | 6 +-
arch/hexagon/include/asm/futex.h | 4 -
arch/hexagon/include/asm/timex.h | 3 -
arch/hexagon/kernel/hexagon_ksyms.c | 8 +--
arch/hexagon/kernel/ptrace.c | 4 -
arch/hexagon/lib/Makefile | 3 -
arch/hexagon/lib/divsi3.S | 67 +++++++++++++++++++++++++++++++
arch/hexagon/lib/memcpy_likely_aligned.S | 56 +++++++++++++++++++++++++
arch/hexagon/lib/modsi3.S | 46 +++++++++++++++++++++
arch/hexagon/lib/udivsi3.S | 38 +++++++++++++++++
arch/hexagon/lib/umodsi3.S | 36 ++++++++++++++++
arch/powerpc/kvm/book3s_hv.c | 4 +
drivers/media/usb/uvc/uvc_driver.c | 32 ++++++++++++++
drivers/xen/events/events_base.c | 23 ++++++++--
15 files changed, 314 insertions(+), 18 deletions(-)
Fabiano Rosas (1):
KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
Greg Kroah-Hartman (1):
Linux 5.10.49
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
Laurent Pinchart (1):
media: uvcvideo: Support devices that report an OT as an entity source
Sid Manning (3):
Hexagon: fix build errors
Hexagon: add target builtins to kernel
Hexagon: change jumps to must-extend in futex_atomic_*
I'm announcing the release of the 4.14.239 kernel.
All users of the 4.14 kernel series must upgrade.
The updated 4.14.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.14.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/gpu/drm/nouveau/nouveau_bo.c | 4
drivers/scsi/sr.c | 2
drivers/xen/events/events_base.c | 23 ++++-
include/linux/hugetlb.h | 16 ---
include/linux/kfifo.h | 3
include/linux/mmdebug.h | 21 +++-
include/linux/pagemap.h | 13 +-
include/linux/rmap.h | 3
kernel/futex.c | 2
kernel/kthread.c | 77 +++++++++++-----
mm/huge_memory.c | 26 +----
mm/hugetlb.c | 5 -
mm/internal.h | 53 ++++++++---
mm/page_vma_mapped.c | 160 +++++++++++++++++++++--------------
mm/rmap.c | 48 ++++++----
16 files changed, 280 insertions(+), 178 deletions(-)
Alex Shi (1):
mm: add VM_WARN_ON_ONCE_PAGE() macro
Christian König (1):
drm/nouveau: fix dma_address check for CPU/GPU sync
Greg Kroah-Hartman (1):
Linux 4.14.239
Hugh Dickins (13):
mm/thp: try_to_unmap() use TTU_SYNC for safe splitting
mm/thp: fix vma_address() if virtual address below file offset
mm: page_vma_mapped_walk(): use page for pvmw->page
mm: page_vma_mapped_walk(): settle PageHuge on entry
mm: page_vma_mapped_walk(): use pmde for *pvmw->pmd
mm: page_vma_mapped_walk(): prettify PVMW_MIGRATION block
mm: page_vma_mapped_walk(): crossing page table boundary
mm: page_vma_mapped_walk(): add a level of indentation
mm: page_vma_mapped_walk(): use goto instead of while (1)
mm: page_vma_mapped_walk(): get vma_address_end() earlier
mm/thp: fix page_vma_mapped_walk() if THP mapped by ptes
mm/thp: another PVMW_SYNC fix in page_vma_mapped_walk()
mm, futex: fix shared futex pgoff on shmem huge page
Jue Wang (1):
mm/thp: fix page_address_in_vma() on file THP tails
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
ManYi Li (1):
scsi: sr: Return appropriate error code when disk is ejected
Miaohe Lin (2):
mm/rmap: remove unneeded semicolon in page_not_mapped()
mm/rmap: use page_not_mapped in try_to_unmap()
Michal Hocko (1):
include/linux/mmdebug.h: make VM_WARN* non-rvals
Petr Mladek (2):
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Sean Young (1):
kfifo: DECLARE_KIFO_PTR(fifo, u64) does not work on arm 32 bit
Yang Shi (1):
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
I'm announcing the release of the 4.9.275 kernel.
All users of the 4.9 kernel series must upgrade.
The updated 4.9.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.9.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 -
drivers/scsi/sr.c | 2
drivers/xen/events/events_base.c | 23 ++++++++--
include/linux/hugetlb.h | 15 ------
include/linux/mmdebug.h | 21 +++++++--
include/linux/pagemap.h | 13 +++--
kernel/futex.c | 2
kernel/kthread.c | 77 +++++++++++++++++++++++------------
mm/huge_memory.c | 29 ++++---------
mm/hugetlb.c | 5 --
11 files changed, 111 insertions(+), 82 deletions(-)
Alex Shi (1):
mm: add VM_WARN_ON_ONCE_PAGE() macro
Christian König (1):
drm/nouveau: fix dma_address check for CPU/GPU sync
Greg Kroah-Hartman (1):
Linux 4.9.275
Hugh Dickins (1):
mm, futex: fix shared futex pgoff on shmem huge page
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
ManYi Li (1):
scsi: sr: Return appropriate error code when disk is ejected
Michal Hocko (1):
include/linux/mmdebug.h: make VM_WARN* non-rvals
Petr Mladek (2):
kthread_worker: split code for canceling the delayed work timer
kthread: prevent deadlock when kthread_mod_delayed_work() races with kthread_cancel_delayed_work_sync()
Yang Shi (1):
mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split
I'm announcing the release of the 4.4.275 kernel.
All users of the 4.4 kernel series must upgrade.
The updated 4.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
arch/arm/probes/kprobes/core.c | 6 ++++++
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++--
drivers/scsi/sr.c | 2 ++
drivers/xen/events/events_base.c | 23 +++++++++++++++++++----
5 files changed, 30 insertions(+), 7 deletions(-)
Christian König (1):
drm/nouveau: fix dma_address check for CPU/GPU sync
Greg Kroah-Hartman (1):
Linux 4.4.275
Juergen Gross (1):
xen/events: reset active flag for lateeoi events later
ManYi Li (1):
scsi: sr: Return appropriate error code when disk is ejected
Masami Hiramatsu (1):
arm: kprobes: Allow to handle reentered kprobe on single-stepping
This is the start of the stable review cycle for the 5.4.131 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.4.131-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-5.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 5.4.131-rc1
Juergen Gross <jgross(a)suse.com>
xen/events: reset active flag for lateeoi events later
Alper Gun <alpergun(a)google.com>
KVM: SVM: Call SEV Guest Decommission if ASID binding fails
Heiko Carstens <hca(a)linux.ibm.com>
s390/stack: fix possible register corruption with stack switch helper
David Rientjes <rientjes(a)google.com>
KVM: SVM: Periodically schedule when unregistering regions on destroy
-------------
Diffstat:
Makefile | 4 ++--
arch/s390/include/asm/stacktrace.h | 18 +++++++++++-------
arch/x86/kvm/svm.c | 33 ++++++++++++++++++++++-----------
drivers/xen/events/events_base.c | 23 +++++++++++++++++++----
4 files changed, 54 insertions(+), 24 deletions(-)
This reverts commit 9e31c1fe45d555a948ff66f1f0e3fe1f83ca63f7. Ever
since that commit, we've been having issues where a hang in one client
can propagate to another. In particular, a hang in an app can propagate
to the X server which causes the whole desktop to lock up.
Error propagation along fences sounds like a good idea, but as your bug
shows, it has surprising consequences, since propagating errors across
security boundaries is not a good thing.
What we do have is tracking of hangs on the ctx, and reporting of that
information to userspace using RESET_STATS. That's how arb_robustness
works. Also, if my understanding is still correct, the EIO from execbuf is
returned when your context is banned (because it is not recoverable or has
had too many hangs). And in all these cases it's up to userspace to figure
out everything that is impacted and should be reported to the application;
that's not on the kernel to guess and automatically propagate.
What's more, we're also building more features on top of ctx error
reporting with RESET_STATS ioctl: Encrypted buffers use the same, and the
userspace fence wait also relies on that mechanism. So it is the path
going forward for reporting gpu hangs and resets to userspace.
So all together that's why I think we should just bury this idea again as
not quite the direction we want to go to, hence why I think the revert is
the right option here.
For backporters: Please note that you _must_ have a backport of
https://lore.kernel.org/dri-devel/20210602164149.391653-2-jason@jlekstrand.…
for otherwise backporting just this patch opens up a security bug.
v2: Augment commit message. Also restore Jason's sob that I
accidentally lost.
v3: Add a note for backporters
Signed-off-by: Jason Ekstrand <jason(a)jlekstrand.net>
Reported-by: Marcin Slusarz <marcin.slusarz(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Cc: Jason Ekstrand <jason.ekstrand(a)intel.com>
Cc: Marcin Slusarz <marcin.slusarz(a)intel.com>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/3080
Fixes: 9e31c1fe45d5 ("drm/i915: Propagate errors on awaiting already signaled fences")
Acked-by: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Reviewed-by: Jon Bloomfield <jon.bloomfield(a)intel.com>
---
drivers/gpu/drm/i915/i915_request.c | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 86b4c9f2613d5..09ebea9a0090a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1399,10 +1399,8 @@ i915_request_await_execution(struct i915_request *rq,
do {
fence = *child++;
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
- i915_sw_fence_set_error_once(&rq->submit, fence->error);
+ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
- }
if (fence->context == rq->fence.context)
continue;
@@ -1499,10 +1497,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
do {
fence = *child++;
- if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
- i915_sw_fence_set_error_once(&rq->submit, fence->error);
+ if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
continue;
- }
/*
* Requests on the same timeline are explicitly ordered, along
--
2.31.1
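As a side note to the revert above: the RESET_STATS path mentioned there is the
userspace-visible way to learn about GPU hangs and resets. Below is a minimal,
hedged userspace sketch of querying it; the header paths, the render node path
and the use of ctx_id 0 for the default context are assumptions about the local
setup, not part of this patch.

	#include <fcntl.h>
	#include <stdio.h>
	#include <string.h>
	#include <sys/ioctl.h>
	#include <libdrm/drm.h>
	#include <libdrm/i915_drm.h>

	int main(void)
	{
		struct drm_i915_reset_stats stats;
		int fd = open("/dev/dri/renderD128", O_RDWR);

		if (fd < 0)
			return 1;

		memset(&stats, 0, sizeof(stats));
		stats.ctx_id = 0;	/* default context of this fd */

		/* Ask the kernel how many resets/hangs this context has seen. */
		if (ioctl(fd, DRM_IOCTL_I915_GET_RESET_STATS, &stats) == 0)
			printf("reset_count=%u batch_active=%u batch_pending=%u\n",
			       stats.reset_count, stats.batch_active,
			       stats.batch_pending);
		return 0;
	}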
This is the start of the stable review cycle for the 4.4.275 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sun, 11 Jul 2021 13:14:09 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.4.275-rc…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 4.4.275-rc1
Masami Hiramatsu <mhiramat(a)kernel.org>
arm: kprobes: Allow to handle reentered kprobe on single-stepping
Juergen Gross <jgross(a)suse.com>
xen/events: reset active flag for lateeoi events later
Christian König <christian.koenig(a)amd.com>
drm/nouveau: fix dma_address check for CPU/GPU sync
ManYi Li <limanyi(a)uniontech.com>
scsi: sr: Return appropriate error code when disk is ejected
-------------
Diffstat:
Makefile | 4 ++--
arch/arm/probes/kprobes/core.c | 6 ++++++
drivers/gpu/drm/nouveau/nouveau_bo.c | 4 ++--
drivers/scsi/sr.c | 2 ++
drivers/xen/events/events_base.c | 23 +++++++++++++++++++----
5 files changed, 31 insertions(+), 8 deletions(-)
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit b601a18f12383001e7a8da238de7ca1559ebc450 ]
A consumer is expected to disable a PWM before calling pwm_put(). If it
didn't, there is hopefully a good reason (or the consumer needs fixing).
Also, if disabling an enabled PWM were the right thing to do, it would be
better done in the framework instead of in each low-level driver.
So drop the hardware modification from the .remove() callback.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Signed-off-by: Thierry Reding <thierry.reding(a)gmail.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/pwm/pwm-spear.c | 4 ----
1 file changed, 4 deletions(-)
diff --git a/drivers/pwm/pwm-spear.c b/drivers/pwm/pwm-spear.c
index 6c6b44fd3f43..2d11ac277de8 100644
--- a/drivers/pwm/pwm-spear.c
+++ b/drivers/pwm/pwm-spear.c
@@ -231,10 +231,6 @@ static int spear_pwm_probe(struct platform_device *pdev)
static int spear_pwm_remove(struct platform_device *pdev)
{
struct spear_pwm_chip *pc = platform_get_drvdata(pdev);
- int i;
-
- for (i = 0; i < NUM_PWM; i++)
- pwm_disable(&pc->chip.pwms[i]);
/* clk was prepared in probe, hence unprepare it here */
clk_unprepare(pc->clk);
--
2.30.2
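For context, the consumer-side contract the commit message relies on looks
roughly like the sketch below. This is a hypothetical consumer using the legacy
PWM API of that era, not code from this patch; the function name and the
period/duty values are illustrative only.

	#include <linux/device.h>
	#include <linux/err.h>
	#include <linux/pwm.h>

	/* Hypothetical consumer: obtain, use and release a PWM channel. */
	static int example_pwm_user(struct device *dev)
	{
		struct pwm_device *pwm;
		int ret;

		pwm = pwm_get(dev, NULL);
		if (IS_ERR(pwm))
			return PTR_ERR(pwm);

		/* 1 ms period, 50% duty cycle (values are illustrative). */
		ret = pwm_config(pwm, 500000, 1000000);
		if (!ret)
			ret = pwm_enable(pwm);

		/* ... use the PWM ... */

		/* The consumer disables the PWM before dropping the reference;
		 * the low-level driver's .remove() must not do this behind its
		 * back, which is exactly what the patch above removes. */
		pwm_disable(pwm);
		pwm_put(pwm);

		return ret;
	}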
Commit 5551cb1c68d4 ("mac80211: do not accept/forward invalid EAPOL
frames") uses skb_mac_header() before eth_type_trans() is called,
leading to an incorrect pointer, and that pointer then gets written to.
This issue appeared during backporting to 4.4, 4.9 and 4.14.
Fixes: 5551cb1c68d4 ("mac80211: do not accept/forward invalid EAPOL frames")
Link: https://lore.kernel.org/r/CAHQn7pKcyC_jYmGyTcPCdk9xxATwW5QPNph=bsZV8d-HPwNs…
Cc: <stable(a)vger.kernel.org> # 4.9.x
Signed-off-by: Davis Mosenkovs <davis(a)mosenkovs.lv>
---
net/mac80211/rx.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/net/mac80211/rx.c b/net/mac80211/rx.c
index 3a069cb188b7..b40e71a5d795 100644
--- a/net/mac80211/rx.c
+++ b/net/mac80211/rx.c
@@ -2380,7 +2380,7 @@ ieee80211_deliver_skb(struct ieee80211_rx_data *rx)
#endif
if (skb) {
- struct ethhdr *ehdr = (void *)skb_mac_header(skb);
+ struct ethhdr *ehdr = (struct ethhdr *)skb->data;
/* deliver to local stack */
skb->protocol = eth_type_trans(skb, dev);
--
2.25.1
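To make the ordering constraint explicit: before eth_type_trans() runs, the
Ethernet header sits at skb->data; that call records the mac header offset and
pulls the header, so on these older kernels skb_mac_header() is not yet
meaningful at the point being patched. A minimal annotated sketch of the
pattern follows; it is an illustration around real skb helpers, not code from
the patch, and the function name is made up.

	#include <linux/etherdevice.h>
	#include <linux/netdevice.h>
	#include <linux/skbuff.h>

	static void deliver_example(struct sk_buff *skb, struct net_device *dev)
	{
		/* Valid here: the frame has been converted to 802.3 format and
		 * skb->data points at the Ethernet header. */
		struct ethhdr *ehdr = (struct ethhdr *)skb->data;

		/* eth_type_trans() sets the mac header offset and advances
		 * skb->data past the header; only after this call would
		 * skb_mac_header() reliably point at the same header as ehdr. */
		skb->protocol = eth_type_trans(skb, dev);

		/* ehdr can still be inspected afterwards, e.g.
		 * is_multicast_ether_addr(ehdr->h_dest). */
		netif_receive_skb(skb);
	}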
This is the start of the stable review cycle for the 4.4.274 release.
There are 57 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Wed 30 Jun 2021 02:42:54 PM UTC.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.4.y
and the diffstat can be found below.
Thanks,
Sasha
-------------
Pseudo-Shortlog of commits:
Anirudh Rayabharam (1):
HID: usbhid: fix info leak in hid_submit_ctrl
Antti Järvinen (1):
PCI: Mark TI C667X to avoid bus reset
Arnd Bergmann (1):
ARM: 9081/1: fix gcc-10 thumb2-kernel regression
Bixuan Cui (1):
HID: gt683r: add missing MODULE_DEVICE_TABLE
Bumyong Lee (1):
dmaengine: pl330: fix wrong usage of spinlock flags in dma_cyclc
Chen Li (1):
radeon: use memcpy_to/fromio for UVD fw upload
Christophe JAILLET (3):
qlcnic: Fix an error handling path in 'qlcnic_probe()'
netxen_nic: Fix an error handling path in 'netxen_nic_probe()'
be2net: Fix an error handling path in 'be_probe()'
Dongliang Mu (1):
net: usb: fix possible use-after-free in smsc75xx_bind
Du Cheng (1):
cfg80211: call cfg80211_leave_ocb when switching away from OCB
Eric Dumazet (3):
net/af_unix: fix a data-race in unix_dgram_sendmsg / unix_release_sock
inet: use bigger hash table for IP ID generation
inet: annotate date races around sk->sk_txhash
Esben Haabendal (1):
net: ll_temac: Avoid ndo_start_xmit returning NETDEV_TX_BUSY
Fugang Duan (1):
net: fec_ptp: add clock rate zero check
Hillf Danton (1):
gfs2: Fix use-after-free in gfs2_glock_shrink_scan
Ido Schimmel (1):
rtnetlink: Fix regression in bridge VLAN configuration
Jiapeng Chong (2):
ethernet: myri10ge: Fix missing error code in myri10ge_probe()
rtnetlink: Fix missing error code in rtnl_bridge_notify()
Jisheng Zhang (1):
net: stmmac: dwmac1000: Fix extended MAC address registers definition
Johan Hovold (1):
i2c: robotfuzz-osif: fix control-request directions
Johannes Berg (1):
mac80211: drop multicast fragments
Josh Triplett (1):
net: ipconfig: Don't override command-line hostnames or domains
Kees Cook (3):
r8152: Avoid memcpy() over-reading of ETH_SS_STATS
sh_eth: Avoid memcpy() over-reading of ETH_SS_STATS
r8169: Avoid memcpy() over-reading of ETH_SS_STATS
Linyu Yuan (1):
net: cdc_eem: fix tx fixup skb leak
Maciej Żenczykowski (1):
net: cdc_ncm: switch to eth%d interface naming
Mark Bolhuis (1):
HID: Add BUS_VIRTUAL to hid_connect logging
Maurizio Lombardi (1):
scsi: target: core: Fix warning on realtime kernels
Maxim Mikityanskiy (1):
netfilter: synproxy: Fix out of bounds when parsing TCP options
Ming Lei (1):
scsi: core: Put .shost_dev in failure path if host state changes to
RUNNING
Nanyong Sun (1):
net: ipv4: fix memory leak in netlbl_cipsov4_add_std
Nathan Chancellor (1):
Makefile: Move -Wno-unused-but-set-variable out of GCC only block
Norbert Slusarek (1):
can: bcm: fix infoleak in struct bcm_msg_head
Pavel Skripkin (5):
net: rds: fix memory leak in rds_recvmsg
net: hamradio: fix memory leak in mkiss_close
net: ethernet: fix potential use-after-free in ec_bhf_remove
net: caif: fix memory leak in ldisc_open
nilfs2: fix memory leak in nilfs_sysfs_delete_device_group
Rafael J. Wysocki (1):
Revert "PCI: PM: Do not read power state in pci_enable_device_flags()"
Sasha Levin (1):
Linux 4.4.274-rc1
Shanker Donthineni (1):
PCI: Mark some NVIDIA GPUs to avoid bus reset
Srinivas Pandruvada (1):
HID: hid-sensor-hub: Return error for hid_set_field() failure
Steven Rostedt (VMware) (3):
tracing: Do no increment trace_clock_global() by one
tracing: Do not stop recording cmdlines when tracing is off
tracing: Do not stop recording comms if the trace file is being read
Tetsuo Handa (1):
can: bcm/raw/isotp: use per module netdevice notifier
Thomas Gleixner (1):
x86/fpu: Reset state for all signal restore failures
Vineet Gupta (1):
ARCv2: save ABI registers across signal handling
Yang Yingliang (1):
dmaengine: stedma40: add missing iounmap() on error in d40_probe()
Yongqiang Liu (1):
ARM: OMAP2+: Fix build warning when mmc_omap is not built
Zheng Yongjun (4):
net/x25: Return the correct errno code
net: Return the correct errno code
fib: Return the correct errno code
ping: Check return value of function 'ping_queue_rcv_skb'
Makefile | 7 +-
arch/arc/include/uapi/asm/sigcontext.h | 1 +
arch/arc/kernel/signal.c | 43 +++++++++++++
arch/arm/kernel/setup.c | 16 +++--
arch/arm/mach-omap2/board-n8x0.c | 2 +-
arch/x86/kernel/fpu/signal.c | 18 ++++--
drivers/dma/pl330.c | 6 +-
drivers/dma/ste_dma40.c | 3 +
drivers/gpu/drm/radeon/radeon_uvd.c | 4 +-
drivers/hid/hid-core.c | 3 +
drivers/hid/hid-gt683r.c | 1 +
drivers/hid/hid-sensor-hub.c | 13 ++--
drivers/hid/usbhid/hid-core.c | 2 +-
drivers/i2c/busses/i2c-robotfuzz-osif.c | 4 +-
drivers/net/caif/caif_serial.c | 1 +
drivers/net/ethernet/ec_bhf.c | 4 +-
drivers/net/ethernet/emulex/benet/be_main.c | 1 +
drivers/net/ethernet/freescale/fec_ptp.c | 4 ++
.../net/ethernet/myricom/myri10ge/myri10ge.c | 1 +
.../ethernet/qlogic/netxen/netxen_nic_main.c | 2 +
.../net/ethernet/qlogic/qlcnic/qlcnic_main.c | 1 +
drivers/net/ethernet/realtek/r8169.c | 2 +-
drivers/net/ethernet/renesas/sh_eth.c | 2 +-
.../net/ethernet/stmicro/stmmac/dwmac1000.h | 8 +--
drivers/net/ethernet/xilinx/ll_temac_main.c | 5 ++
drivers/net/hamradio/mkiss.c | 1 +
drivers/net/usb/cdc_eem.c | 2 +-
drivers/net/usb/cdc_ncm.c | 2 +-
drivers/net/usb/r8152.c | 2 +-
drivers/net/usb/smsc75xx.c | 10 +--
drivers/pci/pci.c | 16 ++++-
drivers/pci/quirks.c | 22 +++++++
drivers/scsi/hosts.c | 8 ++-
drivers/target/target_core_transport.c | 4 +-
fs/gfs2/glock.c | 2 +-
fs/nilfs2/sysfs.c | 1 +
include/linux/hid.h | 3 +-
include/net/sock.h | 10 ++-
kernel/trace/trace.c | 12 ----
kernel/trace/trace_clock.c | 6 +-
net/can/bcm.c | 64 +++++++++++++++----
net/can/raw.c | 62 ++++++++++++++----
net/compat.c | 2 +-
net/core/fib_rules.c | 2 +-
net/core/rtnetlink.c | 4 ++
net/ipv4/cipso_ipv4.c | 1 +
net/ipv4/ipconfig.c | 13 ++--
net/ipv4/ping.c | 12 ++--
net/ipv4/route.c | 42 ++++++++----
net/mac80211/rx.c | 9 +--
net/netfilter/nf_synproxy_core.c | 5 ++
net/rds/recv.c | 2 +-
net/unix/af_unix.c | 7 +-
net/wireless/util.c | 3 +
net/x25/af_x25.c | 2 +-
55 files changed, 351 insertions(+), 134 deletions(-)
--
2.30.2
From: Sherry Sun <sherry.sun(a)nxp.com>
[ Upstream commit fcb10ee27fb91b25b68d7745db9817ecea9f1038 ]
We should be very careful about register values that will be used for
division or modulo operations. Although the possibility that the UARTBAUD
register value is zero is very low, we had better handle such "bad data"
from the hardware in advance, to avoid a division or modulo by zero leading
to undefined kernel behavior.
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20210427021226.27468-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/fsl_lpuart.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 5b6093dc3ff2..a4c1797e30d7 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -1766,6 +1766,9 @@ lpuart32_console_get_options(struct lpuart_port *sport, int *baud,
bd = lpuart32_read(sport->port.membase + UARTBAUD);
bd &= UARTBAUD_SBR_MASK;
+ if (!bd)
+ return;
+
sbr = bd;
uartclk = clk_get_rate(sport->clk);
/*
--
2.30.2
From: Sherry Sun <sherry.sun(a)nxp.com>
[ Upstream commit fcb10ee27fb91b25b68d7745db9817ecea9f1038 ]
We should be very careful about register values that will be used for
division or modulo operations. Although the possibility that the UARTBAUD
register value is zero is very low, we had better handle such "bad data"
from the hardware in advance, to avoid a division or modulo by zero leading
to undefined kernel behavior.
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20210427021226.27468-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/fsl_lpuart.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index cebebdcd091c..3d5fe53988e5 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -1998,6 +1998,9 @@ lpuart32_console_get_options(struct lpuart_port *sport, int *baud,
bd = lpuart32_read(&sport->port, UARTBAUD);
bd &= UARTBAUD_SBR_MASK;
+ if (!bd)
+ return;
+
sbr = bd;
uartclk = clk_get_rate(sport->clk);
/*
--
2.30.2
From: Sherry Sun <sherry.sun(a)nxp.com>
[ Upstream commit fcb10ee27fb91b25b68d7745db9817ecea9f1038 ]
We should be very careful about register values that will be used for
division or modulo operations. Although the possibility that the UARTBAUD
register value is zero is very low, we had better handle such "bad data"
from the hardware in advance, to avoid a division or modulo by zero leading
to undefined kernel behavior.
Signed-off-by: Sherry Sun <sherry.sun(a)nxp.com>
Link: https://lore.kernel.org/r/20210427021226.27468-1-sherry.sun@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/fsl_lpuart.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tty/serial/fsl_lpuart.c b/drivers/tty/serial/fsl_lpuart.c
index 1544a7cc76ff..1319f3dd5b70 100644
--- a/drivers/tty/serial/fsl_lpuart.c
+++ b/drivers/tty/serial/fsl_lpuart.c
@@ -1681,6 +1681,9 @@ lpuart32_console_get_options(struct lpuart_port *sport, int *baud,
bd = lpuart32_read(sport->port.membase + UARTBAUD);
bd &= UARTBAUD_SBR_MASK;
+ if (!bd)
+ return;
+
sbr = bd;
uartclk = clk_get_rate(sport->clk);
/*
--
2.30.2
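The three backports above all add the same guard: bail out before the SBR value
read from UARTBAUD is used as a divisor. As a generic illustration of that
pattern, here is a small standalone sketch in plain C; the clock rate, the
"16 * sbr" formula and the helper function are illustrative assumptions, not
driver code.

	#include <stdio.h>

	/* Stand-in for lpuart32_read(...) & UARTBAUD_SBR_MASK in the driver. */
	static unsigned int read_sbr_from_hw(void)
	{
		return 0;	/* pretend the hardware returned "bad data" */
	}

	int main(void)
	{
		unsigned int uartclk = 80000000;	/* illustrative clock rate */
		unsigned int sbr = read_sbr_from_hw();
		unsigned int baud;

		if (!sbr)	/* same guard the patch adds: never divide by zero */
			return 1;

		baud = uartclk / (16 * sbr);
		printf("baud=%u\n", baud);
		return 0;
	}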
The patch titled
Subject: shm: skip shm_destroy if task IPC namespace was changed
has been added to the -mm tree. Its filename is
shm-skip-shm_destroy-if-task-ipc-namespace-was-changed.patch
This patch should soon appear at
https://ozlabs.org/~akpm/mmots/broken-out/shm-skip-shm_destroy-if-task-ipc-…
and later at
https://ozlabs.org/~akpm/mmotm/broken-out/shm-skip-shm_destroy-if-task-ipc-…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Subject: shm: skip shm_destroy if task IPC namespace was changed
Patch series "shm: omit forced shm destroy if task IPC namespace was changed".
A task's IPC namespace has a shm_rmid_forced feature, which is per IPC
namespace and controlled by the kernel.shm_rmid_forced sysctl. When the
feature is turned on, then during task exit (and unshare(CLONE_NEWIPC)) all
sysvshm segments will be destroyed by the exit_shm(struct task_struct *task)
function.
But there is a problem if the task has changed its IPC namespace since the
shmget() call. In such a situation exit_shm() will try to call
shm_destroy(<new_ipc_namespace_ptr>, <sysvshmem_from_old_ipc_namespace>),
which leaves the sysvshm object still attached to the old IPC namespace but
freed; later, during the old IPC namespace cleanup, we will try to free that
sysvshm object a second time and hit the problem.
The first patch solves this by postponing shm_destroy to the moment when the
IPC namespace cleanup is called. The second patch helps to prevent (or
easily catch) such bugs in the future by adding corresponding WARNings.
This patch (of 2):
A task may change its IPC namespace by doing setns(), but sysvshm objects
remain in the original IPC namespace (the IPC namespace the task was in when
shmget() was called). Skip the forced shm destroy in such a case, because we
cannot determine the IPC namespace from the shm object alone. These
problematic sysvshm segments will be destroyed on IPC namespace cleanup.
Link: https://lkml.kernel.org/r/20210706132259.71740-1-alexander.mikhalitsyn@virt…
Link: https://lkml.kernel.org/r/20210706132259.71740-2-alexander.mikhalitsyn@virt…
Fixes: ab602f79915 ("shm: make exit_shm work proportional to task activity")
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn(a)virtuozzo.com>
Cc: Milton Miller <miltonm(a)bga.com>
Cc: Jack Miller <millerjo(a)us.ibm.com>
Cc: Pavel Tikhomirov <ptikhomirov(a)virtuozzo.com>
Cc: Alexander Mikhalitsyn <alexander(a)mihalicyn.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: Davidlohr Bueso <dave(a)stgolabs.net>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
--- a/ipc/shm.c~shm-skip-shm_destroy-if-task-ipc-namespace-was-changed
+++ a/ipc/shm.c
@@ -173,6 +173,14 @@ static inline struct shmid_kernel *shm_o
return container_of(ipcp, struct shmid_kernel, shm_perm);
}
+static inline bool is_shm_in_ns(struct ipc_namespace *ns, struct shmid_kernel *shp)
+{
+ int idx = ipcid_to_idx(shp->shm_perm.id);
+ struct shmid_kernel *tshp = shm_obtain_object(ns, idx);
+
+ return !IS_ERR(tshp) && tshp == shp;
+}
+
/*
* shm_lock_(check_) routines are called in the paths where the rwsem
* is not necessarily held.
@@ -415,7 +423,7 @@ void exit_shm(struct task_struct *task)
list_for_each_entry_safe(shp, n, &task->sysvshm.shm_clist, shm_clist) {
shp->shm_creator = NULL;
- if (shm_may_destroy(ns, shp)) {
+ if (is_shm_in_ns(ns, shp) && shm_may_destroy(ns, shp)) {
shm_lock_by_ptr(shp);
shm_destroy(ns, shp);
}
_
Patches currently in -mm which might be from alexander.mikhalitsyn(a)virtuozzo.com are
shm-skip-shm_destroy-if-task-ipc-namespace-was-changed.patch
ipc-warn-if-trying-to-remove-ipc-object-which-is-absent.patch
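For readers trying to picture the scenario described above, a rough,
hypothetical userspace reproducer sketch follows. It assumes root privileges,
kernel.shm_rmid_forced=1 and an affected kernel; the synchronization and error
handling are deliberately minimal and the exact trigger conditions may vary.

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <sched.h>
	#include <signal.h>
	#include <stdio.h>
	#include <sys/ipc.h>
	#include <sys/shm.h>
	#include <sys/types.h>
	#include <unistd.h>

	int main(void)
	{
		char path[64];
		pid_t child;

		/* Create a segment in the original IPC namespace. */
		if (shmget(IPC_PRIVATE, 4096, IPC_CREAT | 0600) < 0)
			return 1;

		/* Child sets up a second IPC namespace we can join via /proc. */
		child = fork();
		if (child == 0) {
			unshare(CLONE_NEWIPC);
			pause();
			_exit(0);
		}

		sleep(1);	/* crude synchronization, illustration only */
		snprintf(path, sizeof(path), "/proc/%d/ns/ipc", child);
		setns(open(path, O_RDONLY), CLONE_NEWIPC);

		/* On exit, exit_shm() used to call shm_destroy() against the new
		 * namespace for a segment that still belongs to the old one. */
		kill(child, SIGKILL);
		return 0;
	}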
This test passes pointers obtained from anon_allocate_area to the
userfaultfd and mremap APIs. This causes a problem if the system
allocator returns tagged pointers because with the tagged address ABI
the kernel rejects tagged addresses passed to these APIs, which would
end up causing the test to fail. To make this test compatible with
such system allocators, stop using the system allocator to allocate
memory in anon_allocate_area, and instead just use mmap.
Co-developed-by: Lokesh Gidra <lokeshgidra(a)google.com>
Signed-off-by: Lokesh Gidra <lokeshgidra(a)google.com>
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Fixes: c47174fc362a ("userfaultfd: selftest")
Cc: <stable(a)vger.kernel.org> # 5.4
Link: https://linux-review.googlesource.com/id/Icac91064fcd923f77a83e8e133f8631c5…
---
tools/testing/selftests/vm/userfaultfd.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index f5ab5e0312e7..d0f802053dfd 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -197,8 +197,10 @@ static int anon_release_pages(char *rel_area)
static void anon_allocate_area(void **alloc_area)
{
- if (posix_memalign(alloc_area, page_size, nr_pages * page_size)) {
- fprintf(stderr, "out of memory\n");
+ *alloc_area = mmap(NULL, nr_pages * page_size, PROT_READ | PROT_WRITE,
+ MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
+ if (*alloc_area == MAP_FAILED) {
+ fprintf(stderr, "anon memory mmap failed\n");
*alloc_area = NULL;
}
}
--
2.32.0.93.g670b81a890-goog
Hello,
We ran automated tests on a recent commit from this kernel tree:
Kernel repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Commit: fc3b667678f2 - Linux 5.12.15
The results of these automated tests are provided below.
Overall result: PASSED
Merge: OK
Compile: OK
Tests: OK
All kernel binaries, config files, and logs are available for download here:
https://arr-cki-prod-datawarehouse-public.s3.amazonaws.com/index.html?prefi…
Please reply to this email if you have any questions about the tests that we
ran or if you have any suggestions on how to make future tests more effective.
            ,-.   ,-.
           ( C ) ( K )  Continuous
            `-',-.`-'   Kernel
             ( I )      Integration
              `-'
______________________________________________________________________________
Compile testing
---------------
We compiled the kernel for 4 architectures:
aarch64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
ppc64le:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
s390x:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
x86_64:
make options: make -j24 INSTALL_MOD_STRIP=1 targz-pkg
Hardware testing
----------------
We booted each kernel and ran the following tests:
aarch64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ ACPI enabled test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ❌ Podman system integration test - as root
🚧 ❌ Podman system integration test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ stress: stress-ng
ppc64le:
Host 1:
✅ Boot test
✅ Reboot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ trace: ftrace/tracer
🚧 ✅ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ❌ Podman system integration test - as root
🚧 ❌ Podman system integration test - as user
🚧 ✅ xfstests - btrfs
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
s390x:
Host 1:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
✅ Boot test
✅ Reboot test
✅ selinux-policy: serge-testsuite
✅ Storage: swraid mdadm raid_module test
🚧 ❌ Podman system integration test - as root
🚧 ❌ Podman system integration test - as user
🚧 ✅ Storage blktests
🚧 ✅ Storage nvme - tcp
🚧 ⚡⚡⚡ stress: stress-ng
Host 2:
✅ Boot test
✅ Reboot test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
x86_64:
Host 1:
✅ Boot test
✅ Reboot test
✅ ACPI table test
✅ LTP
✅ CIFS Connectathon
✅ POSIX pjd-fstest suites
✅ Loopdev Sanity
✅ jvm - jcstress tests
✅ Memory: fork_mem
✅ Memory function: memfd_create
✅ AMTU (Abstract Machine Test Utility)
✅ Networking bridge: sanity
✅ Ethernet drivers sanity
✅ Networking socket: fuzz
✅ Networking: igmp conformance test
✅ Networking route: pmtu
✅ Networking route_func - local
✅ Networking route_func - forward
✅ Networking TCP: keepalive test
✅ Networking UDP: socket
✅ Networking cki netfilter test
✅ Networking tunnel: geneve basic test
✅ Networking tunnel: gre basic
✅ L2TP basic test
✅ Networking tunnel: vxlan basic
✅ Networking ipsec: basic netns - transport
✅ Networking ipsec: basic netns - tunnel
✅ Libkcapi AF_ALG test
✅ pciutils: sanity smoke test
✅ pciutils: update pci ids test
✅ ALSA PCM loopback test
✅ ALSA Control (mixer) Userspace Element test
✅ storage: SCSI VPD
✅ trace: ftrace/tracer
🚧 ❌ xarray-idr-radixtree-test
🚧 ✅ i2c: i2cdetect sanity
🚧 ✅ Firmware test suite
🚧 ✅ Memory function: kaslr
🚧 ✅ audit: audit testsuite test
Host 2:
⚡ Internal infrastructure issues prevented one or more tests (marked
with ⚡⚡⚡) from running on this architecture.
This is not the fault of the kernel that was tested.
⚡⚡⚡ Boot test
⚡⚡⚡ Reboot test
⚡⚡⚡ xfstests - ext4
⚡⚡⚡ xfstests - xfs
⚡⚡⚡ xfstests - nfsv4.2
⚡⚡⚡ selinux-policy: serge-testsuite
⚡⚡⚡ power-management: cpupower/sanity test
⚡⚡⚡ storage: software RAID testing
⚡⚡⚡ Storage: swraid mdadm raid_module test
🚧 ⚡⚡⚡ Podman system integration test - as root
🚧 ⚡⚡⚡ Podman system integration test - as user
🚧 ⚡⚡⚡ CPU: Idle Test
🚧 ⚡⚡⚡ xfstests - btrfs
🚧 ⚡⚡⚡ xfstests - cifsv3.11
🚧 ⚡⚡⚡ IPMI driver test
🚧 ⚡⚡⚡ IPMItool loop stress test
🚧 ⚡⚡⚡ Storage blktests
🚧 ⚡⚡⚡ Storage block - filesystem fio test
🚧 ⚡⚡⚡ Storage block - queue scheduler test
🚧 ⚡⚡⚡ Storage nvme - tcp
🚧 ⚡⚡⚡ Storage: lvm device-mapper test - upstream
🚧 ⚡⚡⚡ stress: stress-ng
Host 3:
✅ Boot test
✅ Reboot test
✅ xfstests - ext4
✅ xfstests - xfs
✅ xfstests - nfsv4.2
✅ selinux-policy: serge-testsuite
✅ storage: software RAID testing
✅ Storage: swraid mdadm raid_module test
🚧 ❌ Podman system integration test - as root
🚧 ❌ Podman system integration test - as user
🚧 ❌ CPU: Idle Test
🚧 ✅ xfstests - btrfs
🚧 ❌ xfstests - cifsv3.11
🚧 ✅ IPMI driver test
🚧 ✅ IPMItool loop stress test
🚧 ✅ Storage blktests
🚧 ✅ Storage block - filesystem fio test
🚧 ✅ Storage block - queue scheduler test
🚧 ✅ Storage nvme - tcp
🚧 ✅ Storage: lvm device-mapper test - upstream
🚧 ✅ stress: stress-ng
Test sources: https://gitlab.com/cki-project/kernel-tests
💚 Pull requests are welcome for new tests or improvements to existing tests!
Aborted tests
-------------
Tests that didn't complete running successfully are marked with ⚡⚡⚡.
If this was caused by an infrastructure issue, we try to mark that
explicitly in the report.
Waived tests
------------
If the test run included waived tests, they are marked with 🚧. Such tests are
executed but their results are not taken into account. Tests are waived when
their results are not reliable enough, e.g. when they're just introduced or are
being fixed.
Testing timeout
---------------
We aim to provide a report within reasonable timeframe. Tests that haven't
finished running yet are marked with ⏱.
The Tegra serial driver always prints an error message when enabling the
FIFO for devices that have support for checking the FIFO enable status.
Fix this by displaying the error message, only when an error occurs.
Finally, update the error message to make it clear that enabling the
FIFO failed and display the error code.
Fixes: 222dcdff3405 ("serial: tegra: check for FIFO mode enabled status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
---
Changes since V1:
- Updated the error message to make it more meaningful.
drivers/tty/serial/serial-tegra.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/serial-tegra.c b/drivers/tty/serial/serial-tegra.c
index 222032792d6c..eba5b9ecba34 100644
--- a/drivers/tty/serial/serial-tegra.c
+++ b/drivers/tty/serial/serial-tegra.c
@@ -1045,9 +1045,11 @@ static int tegra_uart_hw_init(struct tegra_uart_port *tup)
if (tup->cdata->fifo_mode_enable_status) {
ret = tegra_uart_wait_fifo_mode_enabled(tup);
- dev_err(tup->uport.dev, "FIFO mode not enabled\n");
- if (ret < 0)
+ if (ret < 0) {
+ dev_err(tup->uport.dev,
+ "Failed to enable FIFO mode: %d\n", ret);
return ret;
+ }
} else {
/*
* For all tegra devices (up to t210), there is a hardware
--
2.25.1
The patch titled
Subject: mm/mremap: hold the rmap lock in write mode when moving page table entries.
has been removed from the -mm tree. Its filename was
mm-mremap-hold-the-rmap-lock-in-write-mode-when-moving-page-table-entries.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/mremap: hold the rmap lock in write mode when moving page table entries.
To avoid a race between the rmap walk and mremap, mremap does
take_rmap_locks(). The lock is taken to ensure that the rmap walk doesn't
miss a page table entry due to PTE moves via move_page_tables(). The kernel
further optimizes this lock such that if we are going to find the newly
added vma after the old vma, the rmap lock is not taken. This is because
the rmap walk would find the vmas in the same order, and if we don't find
the page table attached to the older vma we would find it with the new vma,
which we iterate later.
As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page") mremap is special in that it doesn't take ownership
of the page. The optimized version for PUD/PMD aligned mremap also
doesn't hold the ptl lock. This can result in stale TLB entries as shown
below.
This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::
Optimized PMD move

CPU 1                           CPU 2                           CPU 3

mremap(old_addr, new_addr)      page_shrinker/try_to_unmap_one
mmap_write_lock_killable()

                                addr = old_addr
                                lock(pte_ptl)
lock(pmd_ptl)
pmd = *old_pmd
pmd_clear(old_pmd)
flush_tlb_range(old_addr)

*new_pmd = pmd
                                                                *new_addr = 10; and fills
                                                                TLB with new addr
                                                                and old pfn

unlock(pmd_ptl)
                                ptep_clear_flush()
                                old pfn is free.
                                                                Stale TLB entry
Optimized PUD move also suffers from a similar race. Both the above race
condition can be fixed if we force mremap path to take rmap lock.
Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP…
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-hold-the-rmap-lock-in-write-mode-when-moving-page-table-entries
+++ a/mm/mremap.c
@@ -504,7 +504,7 @@ unsigned long move_page_tables(struct vm
} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
- old_pud, new_pud, need_rmap_locks))
+ old_pud, new_pud, true))
continue;
}
@@ -531,7 +531,7 @@ unsigned long move_page_tables(struct vm
* moving at the PMD level if possible.
*/
if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
- old_pmd, new_pmd, need_rmap_locks))
+ old_pmd, new_pmd, true))
continue;
}
_
Patches currently in -mm which might be from aneesh.kumar(a)linux.ibm.com are
From: Charles Keepax <ckeepax(a)opensource.cirrus.com>
[ Upstream commit 0e793ba77c18382f08e440260fe72bc6fce2a3cb ]
Currently, the SPI core doesn't set the struct device fwnode pointer
when it creates a new SPI device. This means when the device is
registered the fwnode is NULL and the check in device_add which sets
the fwnode->dev pointer is skipped. This wasn't previously an issue,
however these two patches:
commit 4731210c09f5 ("gpiolib: Bind gpio_device to a driver to enable
fw_devlink=on by default")
commit ced2af419528 ("gpiolib: Don't probe gpio_device if it's not the
primary device")
Added some code to the GPIO core which relies on using that
fwnode->dev pointer to determine if a driver is bound to the fwnode
and if not bind a stub GPIO driver. This means the GPIO providers
behind SPI will get both the expected driver and this stub driver
causing the stub driver to fail if it attempts to request any pin
configuration. For example on my system:
madera-pinctrl madera-pinctrl: pin gpio5 already requested by madera-pinctrl; cannot claim for gpiochip3
madera-pinctrl madera-pinctrl: pin-4 (gpiochip3) status -22
madera-pinctrl madera-pinctrl: could not request pin 4 (gpio5) from group aif1 on device madera-pinctrl
gpio_stub_drv gpiochip3: Error applying setting, reverse things back
gpio_stub_drv: probe of gpiochip3 failed with error -22
The firmware node on the device created by the GPIO framework is set
through the of_node pointer, hence things generally work; however, that
fwnode->dev is never set, as the check was skipped at device_add time.
This fix appears to match how the I2C subsystem handles the same
situation.
Signed-off-by: Charles Keepax <ckeepax(a)opensource.cirrus.com>
Link: https://lore.kernel.org/r/20210421101402.8468-1-ckeepax@opensource.cirrus.c…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index e353b7a9e54e..d83f91a6ab13 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -2057,6 +2057,7 @@ of_register_spi_device(struct spi_controller *ctlr, struct device_node *nc)
/* Store a pointer to the node in the device structure */
of_node_get(nc);
spi->dev.of_node = nc;
+ spi->dev.fwnode = of_fwnode_handle(nc);
/* Register the new device */
rc = spi_add_device(spi);
--
2.30.2
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
With the addition of simple mathematical operations (plus and minus), the
parsing of the "sym-offset" modifier broke, as it took the '-' part of the
"sym-offset" as a minus, and tried to break it up into a mathematical
operation of "field.sym - offset", in which case it failed to parse
(unless the event had a field called "offset").
Both .sym and .sym-offset modifiers should not be entered into
mathematical calculations anyway. If ".sym-offset" is found in the
modifier, then simply make it not an operation that can be calculated on.
Link: https://lkml.kernel.org/r/20210707110821.188ae255@oasis.local.home
Cc: Ingo Molnar <mingo(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Namhyung Kim <namhyung(a)kernel.org>
Cc: Daniel Bristot de Oliveira <bristot(a)redhat.com>
Cc: stable(a)vger.kernel.org
Fixes: 100719dcef447 ("tracing: Add simple expression support to hist triggers")
Reviewed-by: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
---
kernel/trace/trace_events_hist.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index ba03b7d84fc2..0207aeed31e6 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -1555,6 +1555,13 @@ static int contains_operator(char *str)
switch (*op) {
case '-':
+ /*
+ * Unfortunately, the modifier ".sym-offset"
+ * can confuse things.
+ */
+ if (op - str >= 4 && !strncmp(op - 4, ".sym-offset", 11))
+ return FIELD_OP_NONE;
+
if (*str == '-')
field_op = FIELD_OP_UNARY_MINUS;
else
--
2.30.2
From: Deren Wu <deren.wu(a)mediatek.com>
Hi,
The bug is seen on some devices which power off the PCIe bus in
suspend mode. In this case, mt7921 fails to resume on interactive
commands. This patch allows recovering from the failure via the
command timeout path.
The patches below need to be applied as dependencies. Those patches
help with stability improvements as well.
upstream patches:
commit f92f81d35ac2 ("mt76: mt7921: check mcu returned values in mt7921_start")
commit d32464e68ffc ("mt76: mt7921: introduce mt7921_run_firmware utility routine.")
commit 1f7396acfef4 ("mt76: mt7921: introduce __mt7921_start utility routine")
commit 3990465db682 ("mt76: dma: introduce mt76_dma_queue_reset routine")
commit c001df978e4c ("mt76: dma: export mt76_dma_rx_cleanup routine")
commit 0c1ce9884607 ("mt76: mt7921: add wifi reset support")
commit e513ae49088b ("mt76: mt7921: abort uncompleted scan by wifi reset")
Lorenzo Bianconi (1):
mt76: mt7921: get rid of mcu_reset function pointer
drivers/net/wireless/mediatek/mt76/mt7921/mcu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--
2.25.1
From: Kan Liang <kan.liang(a)linux.intel.com>
A warning like the one below may occasionally be triggered on an ADL machine
when these conditions occur:
- Two perf record commands run one after the other. Both record a PEBS event.
- Both run on small cores.
- They have different adaptive PEBS configurations (PEBS_DATA_CFG).
[ 673.663291] WARNING: CPU: 4 PID: 9874 at
arch/x86/events/intel/ds.c:1743
setup_pebs_adaptive_sample_data+0x55e/0x5b0
[ 673.663348] RIP: 0010:setup_pebs_adaptive_sample_data+0x55e/0x5b0
[ 673.663357] Call Trace:
[ 673.663357] <NMI>
[ 673.663357] intel_pmu_drain_pebs_icl+0x48b/0x810
[ 673.663360] perf_event_nmi_handler+0x41/0x80
[ 673.663368] </NMI>
[ 673.663370] __perf_event_task_sched_in+0x2c2/0x3a0
Different from the big core, the small core requires the ACK before
re-enabling counters in the NMI handler, otherwise a stale PEBS record
may be dumped into the later NMI handler, which trigger the warning.
Add late_ack in the struct x86_hybrid_pmu to track the late_ack flag for
different types of PMUs. Apply late ACK only for the big cores on an
Alder Lake machine.
The existing hybrid() macro causes a compile error when taking the address
of a bit-field variable. Add a new macro, hybrid_bit(), to get the bit-field
value of a given PMU.
Fixes: f83d2f91d259 ("perf/x86/intel: Add Alder Lake Hybrid support")
Reported-by: Ammy Yi <ammy.yi(a)intel.com>
Reviewed-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/core.c | 11 +++++------
arch/x86/events/perf_event.h | 12 ++++++++++++
2 files changed, 17 insertions(+), 6 deletions(-)
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index a0dfa82..430a24d 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -2904,14 +2904,13 @@ static int handle_pmi_common(struct pt_regs *regs, u64 status)
*/
static int intel_pmu_handle_irq(struct pt_regs *regs)
{
- struct cpu_hw_events *cpuc;
+ struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+ bool late_ack = hybrid_bit(cpuc->pmu, late_ack);
int loops;
u64 status;
int handled;
int pmu_enabled;
- cpuc = this_cpu_ptr(&cpu_hw_events);
-
/*
* Save the PMU state.
* It needs to be restored when leaving the handler.
@@ -2921,7 +2920,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
* No known reason to not always do late ACK,
* but just in case do it opt-in.
*/
- if (!x86_pmu.late_ack)
+ if (!late_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
intel_bts_disable_local();
cpuc->enabled = 0;
@@ -2969,7 +2968,7 @@ static int intel_pmu_handle_irq(struct pt_regs *regs)
* have been reset. This avoids spurious NMIs on
* Haswell CPUs.
*/
- if (x86_pmu.late_ack)
+ if (late_ack)
apic_write(APIC_LVTPC, APIC_DM_NMI);
return handled;
}
@@ -6116,7 +6115,6 @@ __init int intel_pmu_init(void)
static_branch_enable(&perf_is_hybrid);
x86_pmu.num_hybrid_pmus = X86_HYBRID_NUM_PMUS;
- x86_pmu.late_ack = true;
x86_pmu.pebs_aliases = NULL;
x86_pmu.pebs_prec_dist = true;
x86_pmu.pebs_block = true;
@@ -6154,6 +6152,7 @@ __init int intel_pmu_init(void)
pmu = &x86_pmu.hybrid_pmu[X86_HYBRID_PMU_CORE_IDX];
pmu->name = "cpu_core";
pmu->cpu_type = hybrid_big;
+ pmu->late_ack = true;
if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) {
pmu->num_counters = x86_pmu.num_counters + 2;
pmu->num_counters_fixed = x86_pmu.num_counters_fixed + 1;
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index bc8836b..40fa3b1 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -656,6 +656,8 @@ struct x86_hybrid_pmu {
struct event_constraint *event_constraints;
struct event_constraint *pebs_constraints;
struct extra_reg *extra_regs;
+
+ unsigned int late_ack:1;
};
static __always_inline struct x86_hybrid_pmu *hybrid_pmu(struct pmu *pmu)
@@ -686,6 +688,16 @@ extern struct static_key_false perf_is_hybrid;
__Fp; \
}))
+#define hybrid_bit(_pmu, _field) \
+({ \
+ bool __Fp = x86_pmu._field; \
+ \
+ if (is_hybrid() && (_pmu)) \
+ __Fp = hybrid_pmu(_pmu)->_field; \
+ \
+ __Fp; \
+})
+
enum hybrid_pmu_type {
hybrid_big = 0x40,
hybrid_small = 0x20,
--
2.7.4
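The bit-field limitation mentioned in the patch above is a plain C constraint
and can be seen outside the kernel too. Below is a minimal standalone
illustration; the struct and macro names are made up, and only the idea (a
value-returning accessor instead of a pointer-based one) mirrors hybrid_bit().

	#include <stdio.h>

	struct pmu_caps {
		unsigned int late_ack:1;
		unsigned int version;
	};

	/* Value-based access, analogous in spirit to the new hybrid_bit(). */
	#define caps_bit(p, field) ((p)->field)

	int main(void)
	{
		struct pmu_caps caps = { .late_ack = 1, .version = 5 };

		/* unsigned int *p = &caps.late_ack;  <-- does not compile:
		 *                       cannot take the address of a bit-field */
		printf("late_ack=%u version=%u\n",
		       (unsigned int)caps_bit(&caps, late_ack), caps.version);
		return 0;
	}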
The standard printk() tries to flush the message to the console
immediately. It tries to take the console lock. If the lock is
already taken then the current owner is responsible for flushing
even the new message.
There is a small race window between checking whether a new message is
available and releasing the console lock. It is solved by re-checking
the state after releasing the console lock. If the check is positive
then console_unlock() tries to take the lock again and process the new
message as well.
Commit 996e966640ddea7b535c ("printk: remove logbuf_lock") means that
console_seq is no longer read atomically. As a result, the re-check might
be done with an inconsistent 64-bit index.
Solve it by using the last sequence number that has been checked under
the console lock. In the worst case, it will take the lock again only
to realize that the new message has already been processed. But that
was possible even before.
The variable next_seq is marked as __maybe_unused to avoid a compiler
warning when CONFIG_PRINTK is not defined.
Fixes: commit 996e966640ddea7b535c ("printk: remove logbuf_lock")
Reported-by: kernel test robot <lkp(a)intel.com> # unused next_seq warning
Cc: stable(a)vger.kernel.org # 5.13
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
Acked-by: Sergey Senozhatsky <senozhatsky(a)chromium.org>
---
Changes in v2:
- fixed warning about unused next_seq variable reported by kernel test robot
kernel/printk/printk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 142a58d124d9..6dad7da8f383 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -2545,6 +2545,7 @@ void console_unlock(void)
bool do_cond_resched, retry;
struct printk_info info;
struct printk_record r;
+ u64 __maybe_unused next_seq;
if (console_suspended) {
up_console_sem();
@@ -2654,8 +2655,10 @@ void console_unlock(void)
cond_resched();
}
- console_locked = 0;
+ /* Get consistent value of the next-to-be-used sequence number. */
+ next_seq = console_seq;
+ console_locked = 0;
up_console_sem();
/*
@@ -2664,7 +2667,7 @@ void console_unlock(void)
* there's a new owner and the console_unlock() from them will do the
* flush, no worries.
*/
- retry = prb_read_valid(prb, console_seq, NULL);
+ retry = prb_read_valid(prb, next_seq, NULL);
printk_safe_exit_irqrestore(flags);
if (retry && console_trylock())
--
2.26.2
"Hello,
We are interested in advertising. We want to advertise guest article
on your site planet.kernel.org. Our article will be written according
to your site theme and will include a link to a betting site.
What is the rate for this?
IMPORTANT: If you are not accepting links to gambling site we might be
still interested, Please mention the rate for regular links as well
Skype: shanker2020
Telegram 9966160005
WhatsApp number: 9966160005
Thank you"
Fix an issue with the Tyan Tomcat IV S1564D system, the BIOS of which
does not assign PCI buses beyond #2, where our resource reallocation
code preserves the reset default of an I/O BAR assignment outside its
upstream PCI-to-PCI bridge's I/O forwarding range for device 06:08.0 in
this log:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.0: BAR 4: assigned [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 3 try to assign unassigned res
pci 0000:00:11.0: resource 7 [io 0xe000-0xefff] released
[...]
pci 0000:06:08.1: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [io 0x2000-0x2fff]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0x1000-0x2fff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: resource 4 [io 0x0000-0xffff]
pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:01: resource 1 [mem 0xd8000000-0xdfffffff]
pci_bus 0000:01: resource 2 [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:02: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:02: resource 1 [mem 0xd8000000-0xd8bfffff]
pci_bus 0000:04: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:04: resource 1 [mem 0xd8600000-0xd8afffff]
pci_bus 0000:05: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:05: resource 1 [mem 0xd8000000-0xd85fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:06: resource 1 [mem 0xd8000000-0xd85fffff]
-- note that the assignment of 0xfce0-0xfcff is outside the range of
0x2000-0x2fff assigned to bus #6:
05:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: d8000000-d85fffff
Capabilities: [50] Power Management version 2
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
Capabilities: [80] #0d [0000]
Capabilities: [90] Express PCI/PCI-X Bridge IRQ 0
06:08.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at fce0 [size=32]
Capabilities: [80] Power Management version 2
06:08.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at 2000 [size=32]
Capabilities: [80] Power Management version 2
Since both 06:08.0 and 06:08.1 have the same reset defaults, the latter
device escapes this fate and gets a good assignment, owing to the
address conflict with the former device.
Consequently, when the device driver tries to access 06:08.0 according
to its designated address range, it pokes at an unassigned I/O location,
likely subtractively decoded by the southbridge and forwarded to ISA,
causing the driver to become confused and bail out:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host controller halted, very bad!
uhci_hcd 0000:06:08.0: HCRESET not completed yet!
uhci_hcd 0000:06:08.0: HC died; cleaning up
that is, with good luck; with bad luck, an infinite flood of messages
follows instead:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
[...]
making the system virtually unusable.
This is because we have code to deal with the situation from PR #16263,
where broken ACPI firmware reports the wrong address range for the host
bridge's decoding window, and trying to adjust to that window causes
more breakage than leaving the BIOS assignments intact.
That approach may work for a device sitting directly on the root bus,
decoded by the host bridge alone. For a device behind one or more
PCI-to-PCI (or CardBus) bridges, however, the bridges' forwarding
windows have been standardised and need to be respected: leaving
whatever happens to be in a downstream device's BAR has no effect,
because cycles for the addresses recorded there never get a chance to
appear on the bus the device is immediately attached to.
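Conceptually the containment check amounts to the minimal sketch below
(an illustration only: fw_bar_reachable() is a hypothetical helper name,
and the real logic is the pci_revert_fw_address() change shown in the
diff further down):

#include <linux/pci.h>

/* Hypothetical helper: can a firmware-assigned BAR value be decoded? */
static bool fw_bar_reachable(struct pci_dev *dev, struct resource *res)
{
	/* On the root bus, trust the host bridge to decode the range. */
	if (!dev->bus->parent)
		return true;

	/*
	 * Behind a P2P or CardBus bridge the range must sit inside one
	 * of the bridge's forwarding windows, or cycles for it will
	 * never appear on the bus the device is attached to.
	 */
	return pci_find_parent_resource(dev, res) != NULL;
}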
Therefore, for a device behind a PCI-to-PCI bridge, make sure any
firmware assignment lies within the bridge's relevant forwarding window,
and do not restore the assignment otherwise. This fixes the system
concerned as follows:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io 0xfce0-0xfcff]
[...]
pci_bus 0000:00: No. 3 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:06:08.1: BAR 4: assigned [io 0x2020-0x203f]
and making device 06:08.0 work correctly.
Cf. <https://bugzilla.kernel.org/show_bug.cgi?id=16263>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 58c84eda0756 ("PCI: fall back to original BIOS BAR addresses")
Cc: stable(a)vger.kernel.org # v2.6.35+
---
For the record the system's bus topology is as follows:
-[0000:00]-+-00.0
+-07.0
+-07.1
+-07.2
+-11.0-[0000:01-06]----00.0-[0000:02-06]--+-00.0-[0000:03]--
| +-01.0-[0000:04]--+-00.0
| | \-00.3
| \-02.0-[0000:05-06]----00.0-[0000:06]--+-05.0
| +-08.0
| +-08.1
| \-08.2
+-12.0
+-13.0
\-14.0
Changes from v1:
- Do restore firmware BAR assignments behind a PCI-PCI bridge, but only if
within the bridge's forwarding window.
- Update the change description and heading accordingly (was: PCI: Do not
restore firmware BAR assignments behind a PCI-PCI bridge).
---
drivers/pci/setup-res.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
linux-pci-setup-res-fw-address-nobridge.diff
Index: linux-macro-ide-tty/drivers/pci/setup-res.c
===================================================================
--- linux-macro-ide-tty.orig/drivers/pci/setup-res.c
+++ linux-macro-ide-tty/drivers/pci/setup-res.c
@@ -208,9 +208,19 @@ static int pci_revert_fw_address(struct
res->end = res->start + size - 1;
res->flags &= ~IORESOURCE_UNSET;
+ /*
+ * If we're behind a P2P or CardBus bridge, make sure we're
+ * inside the relevant forwarding window, or otherwise the
+ * assignment must have been bogus and accesses intended for
+ * the range assigned would not reach the device anyway.
+ * On the root bus accept anything under the assumption the
+ * host bridge will let it through.
+ */
root = pci_find_parent_resource(dev, res);
if (!root) {
- if (res->flags & IORESOURCE_IO)
+ if (dev->bus->parent)
+ return -ENXIO;
+ else if (res->flags & IORESOURCE_IO)
root = &ioport_resource;
else
root = &iomem_resource;
Fix an issue with the Tyan Tomcat IV S1564D system, the BIOS of which
does not assign PCI buses beyond #2, where our resource reallocation
code preserves the reset default of an I/O BAR assignment outside its
upstream PCI-to-PCI bridge's I/O forwarding range for device 06:08.0 in
this log:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.0: BAR 4: assigned [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: trying firmware assignment [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: [io 0xfce0-0xfcff] conflicts with 0000:06:08.0 [io 0xfce0-0xfcff]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0xe000-0xefff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: No. 3 try to assign unassigned res
pci 0000:00:11.0: resource 7 [io 0xe000-0xefff] released
[...]
pci 0000:06:08.1: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:05:00.0: PCI bridge to [bus 06]
pci 0000:05:00.0: bridge window [io 0x2000-0x2fff]
pci 0000:05:00.0: bridge window [mem 0xd8000000-0xd85fffff]
[...]
pci 0000:00:11.0: PCI bridge to [bus 01-06]
pci 0000:00:11.0: bridge window [io 0x1000-0x2fff]
pci 0000:00:11.0: bridge window [mem 0xd8000000-0xdfffffff]
pci 0000:00:11.0: bridge window [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:00: resource 4 [io 0x0000-0xffff]
pci_bus 0000:00: resource 5 [mem 0x00000000-0xffffffff]
pci_bus 0000:01: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:01: resource 1 [mem 0xd8000000-0xdfffffff]
pci_bus 0000:01: resource 2 [mem 0xa8000000-0xafffffff 64bit pref]
pci_bus 0000:02: resource 0 [io 0x1000-0x2fff]
pci_bus 0000:02: resource 1 [mem 0xd8000000-0xd8bfffff]
pci_bus 0000:04: resource 0 [io 0x1000-0x1fff]
pci_bus 0000:04: resource 1 [mem 0xd8600000-0xd8afffff]
pci_bus 0000:05: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:05: resource 1 [mem 0xd8000000-0xd85fffff]
pci_bus 0000:06: resource 0 [io 0x2000-0x2fff]
pci_bus 0000:06: resource 1 [mem 0xd8000000-0xd85fffff]
-- note that the assignment of 0xfce0-0xfcff is outside the range of
0x2000-0x2fff assigned to bus #6:
05:00.0 PCI bridge: Texas Instruments XIO2000(A)/XIO2200A PCI Express-to-PCI Bridge (rev 03) (prog-if 00 [Normal decode])
Flags: bus master, fast devsel, latency 0
Bus: primary=05, secondary=06, subordinate=06, sec-latency=0
I/O behind bridge: 00002000-00002fff
Memory behind bridge: d8000000-d85fffff
Capabilities: [50] Power Management version 2
Capabilities: [60] Message Signalled Interrupts: 64bit+ Queue=0/4 Enable-
Capabilities: [80] #0d [0000]
Capabilities: [90] Express PCI/PCI-X Bridge IRQ 0
06:08.0 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at fce0 [size=32]
Capabilities: [80] Power Management version 2
06:08.1 USB controller: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller (rev 61) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. VT82xx/62xx/VX700/8x0/900 UHCI USB 1.1 Controller
Flags: bus master, medium devsel, latency 22, IRQ 5
I/O ports at 2000 [size=32]
Capabilities: [80] Power Management version 2
Since both 06:08.0 and 06:08.1 have the same reset defaults, the latter
device escapes this fate and gets a good assignment, owing to the
address conflict with the former device.
Consequently, when the device driver tries to access 06:08.0 according
to its designated address range, it pokes at an unassigned I/O location,
likely subtractively decoded by the southbridge and forwarded to ISA,
causing the driver to become confused and bail out:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host controller halted, very bad!
uhci_hcd 0000:06:08.0: HCRESET not completed yet!
uhci_hcd 0000:06:08.0: HC died; cleaning up
that is, with good luck; with bad luck, an infinite flood of messages
follows instead:
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
uhci_hcd 0000:06:08.0: host system error, PCI problems?
uhci_hcd 0000:06:08.0: host controller process error, something bad happened!
[...]
making the system virtually unusable.
This is because we have code to deal with the situation from PR #16263,
where broken ACPI firmware reports the wrong address range for the host
bridge's decoding window, and trying to adjust to that window causes
more breakage than leaving the BIOS assignments intact.
That approach may work for a device sitting directly on the root bus,
decoded by the host bridge alone. For a device behind one or more
PCI-to-PCI (or CardBus) bridges, however, the bridges' forwarding
windows have been standardised and need to be respected: leaving
whatever happens to be in a downstream device's BAR has no effect,
because cycles for the addresses recorded there never get a chance to
appear on the bus the device is immediately attached to.
Therefore, do not restore the firmware assignment for a device behind a
PCI-to-PCI bridge, fixing the system concerned as follows:
pci_bus 0000:00: max bus depth: 4 pci_try_num: 5
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io size 0x0020]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
[...]
pci_bus 0000:00: No. 2 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.0: BAR 4: failed to assign [io size 0x0020]
pci 0000:06:08.1: BAR 4: no space for [io size 0x0020]
pci 0000:06:08.1: BAR 4: failed to assign [io size 0x0020]
[...]
pci_bus 0000:00: No. 3 try to assign unassigned res
[...]
pci 0000:06:08.0: BAR 4: assigned [io 0x2000-0x201f]
pci 0000:06:08.1: BAR 4: assigned [io 0x2020-0x203f]
and making device 06:08.0 work correctly.
Cf. <https://bugzilla.kernel.org/show_bug.cgi?id=16263>
Signed-off-by: Maciej W. Rozycki <macro(a)orcam.me.uk>
Fixes: 58c84eda0756 ("PCI: fall back to original BIOS BAR addresses")
Cc: stable(a)vger.kernel.org # v2.6.35+
---
For the record the system's bus topology is as follows:
-[0000:00]-+-00.0
+-07.0
+-07.1
+-07.2
+-11.0-[0000:01-06]----00.0-[0000:02-06]--+-00.0-[0000:03]--
| +-01.0-[0000:04]--+-00.0
| | \-00.3
| \-02.0-[0000:05-06]----00.0-[0000:06]--+-05.0
| +-08.0
| +-08.1
| \-08.2
+-12.0
+-13.0
\-14.0
---
drivers/pci/setup-res.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
linux-pci-setup-res-fw-address-nobridge.diff
Index: linux-macro-ide-tty/drivers/pci/setup-res.c
===================================================================
--- linux-macro-ide-tty.orig/drivers/pci/setup-res.c
+++ linux-macro-ide-tty/drivers/pci/setup-res.c
@@ -328,13 +328,15 @@ int pci_assign_resource(struct pci_dev *
ret = _pci_assign_resource(dev, resno, size, align);
/*
- * If we failed to assign anything, let's try the address
- * where firmware left it. That at least has a chance of
- * working, which is better than just leaving it disabled.
+ * If we failed to assign anything and we're not behind a P2P
+ * or CardBus bridge, let's try the address where firmware
+ * left it. That at least has a chance of working, which is
+ * better than just leaving it disabled.
*/
if (ret < 0) {
pci_info(dev, "BAR %d: no space for %pR\n", resno, res);
- ret = pci_revert_fw_address(res, dev, resno, size);
+ if (!dev->bus->parent)
+ ret = pci_revert_fw_address(res, dev, resno, size);
}
if (ret < 0) {
On Thu, Jul 08 2021 at 15:37, Thomas Gleixner wrote:
> The recent commit which fixed the entry/exit mismatch on failed 32-bit
> syscalls got the ordering vs. instrumentation_end() wrong, which makes
> objtool complain about tracer invocation in an instrumentation disabled
> region.
>
> Stick the offending local_irq_disable() into the instrumentation enabled
> region so objtool stops complaining.
>
> Fixes: 5d5675df792f ("x86/entry: Fix entry/exit mismatch on failed fast 32-bit syscalls")
Bah. I looked at the wrong branch. It's already fixed in Linus's tree:
commit 240001d4e3041832e8a2654adc3ccf1683132b92
Author: Peter Zijlstra <peterz(a)infradead.org>
Date: Mon Jun 21 13:12:34 2021 +0200
x86/entry: Fix noinstr fail in __do_fast_syscall_32()
Though that lacks a CC: stable tag, which would have been appropriate
because 5d5675df792f ("x86/entry: Fix entry/exit mismatch on failed fast
32-bit syscalls") has been backported.
Can the stable folks pick this up please?
Thanks,
tglx
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send the SEV_CMD_DECOMMISSION command to the PSP firmware if ASID
binding fails. If a failure happens after a successful LAUNCH_START
command, a decommission command should be executed; otherwise the guest
context is never freed inside the AMD SP. Once the firmware no longer
has memory to allocate more SEV guest contexts, the LAUNCH_START command
begins to fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but that
path is not taken if a failure happens before guest activation succeeds:
if sev_bind_asid fails, decommission is never called. The PSP firmware
has a limit on the number of guests, so if ASID binding fails many times
the firmware eventually has no resources left to create another guest
context.
Cc: stable(a)vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda(a)google.com>
Signed-off-by: Alper Gun <alpergun(a)google.com>
Reviewed-by: Marc Orr <marcorr(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun(a)google.com>
---
arch/x86/kvm/svm.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ad24e6777277..ae94ec036137 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1791,9 +1791,25 @@ static void sev_asid_free(struct kvm *kvm)
__sev_asid_free(sev->asid);
}
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission *decommission;
+
+ if (!handle)
+ return;
+
+ decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
+ if (!decommission)
+ return;
+
+ decommission->handle = handle;
+ sev_guest_decommission(decommission, NULL);
+
+ kfree(decommission);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate *data;
if (!handle)
@@ -1811,15 +1827,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_guest_df_flush(NULL);
kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
- if (!decommission)
- return;
-
- /* decommission handle */
- decommission->handle = handle;
- sev_guest_decommission(decommission, NULL);
-
- kfree(decommission);
+ sev_decommission(handle);
}
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
@@ -6468,8 +6476,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start->handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start->handle);
goto e_free_session;
+ }
/* return handle to userspace */
params.handle = start->handle;
--
2.32.0.93.g670b81a890-goog
Hi Greg,
the attached patch is a backport of upstream commit 3de218ff39b9e3f0d4
for Linux 5.10 and older (I've checked that it applies down to 4.4).
Juergen
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send the SEV_CMD_DECOMMISSION command to the PSP firmware if ASID
binding fails. If a failure happens after a successful LAUNCH_START
command, a decommission command should be executed; otherwise the guest
context is never freed inside the AMD SP. Once the firmware no longer
has memory to allocate more SEV guest contexts, the LAUNCH_START command
begins to fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid, but that
path is not taken if a failure happens before guest activation succeeds:
if sev_bind_asid fails, decommission is never called. The PSP firmware
has a limit on the number of guests, so if ASID binding fails many times
the firmware eventually has no resources left to create another guest
context.
Cc: stable(a)vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda(a)google.com>
Signed-off-by: Alper Gun <alpergun(a)google.com>
Reviewed-by: Marc Orr <marcorr(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun(a)google.com>
---
arch/x86/kvm/svm.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 074cd170912a..aa2da922ca99 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1794,9 +1794,25 @@ static void sev_asid_free(struct kvm *kvm)
__sev_asid_free(sev->asid);
}
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission *decommission;
+
+ if (!handle)
+ return;
+
+ decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
+ if (!decommission)
+ return;
+
+ decommission->handle = handle;
+ sev_guest_decommission(decommission, NULL);
+
+ kfree(decommission);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate *data;
if (!handle)
@@ -1814,15 +1830,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_guest_df_flush(NULL);
kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
- if (!decommission)
- return;
-
- /* decommission handle */
- decommission->handle = handle;
- sev_guest_decommission(decommission, NULL);
-
- kfree(decommission);
+ sev_decommission(handle);
}
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
@@ -6475,8 +6483,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start->handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start->handle);
goto e_free_session;
+ }
/* return handle to userspace */
params.handle = start->handle;
--
2.32.0.93.g670b81a890-goog
Hi,
please consider applying the following patches to v5.10.y and v5.12.y
stable branches.
788dcee0306e Hexagon: fix build errors
f1f99adf05f2 Hexagon: add target builtins to kernel
6fff7410f6be Hexagon: change jumps to must-extend in futex_atomic_*
The patches are needed to enable hexagon build tests using clang
in v5.10.y and v5.12.y.
Thanks,
Guenter
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 67147e96a332b56c7206238162771d82467f86c0 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <hca(a)linux.ibm.com>
Date: Fri, 18 Jun 2021 16:58:47 +0200
Subject: [PATCH] s390/stack: fix possible register corruption with stack
switch helper
The CALL_ON_STACK macro is used to call a C function from inline
assembly, and therefore must consider the C ABI, which says that only
registers 6-13, and 15 are non-volatile (restored by the called
function).
The inline assembly incorrectly marks all registers used to pass
parameters to the called function as read-only input operands, instead
of operands that are read and written to. This might result in
register corruption depending on usage, compiler, and compile options.
Fix this by marking all operands used to pass parameters as read/write
operands. To keep the code simple, even register 6, if used, is marked
as a read-write operand.
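As a simplified, s390-only illustration of the constraint semantics
(an assumption-level sketch, not the real CALL_ON_STACK macro, and it
omits the stack handling the macro performs):

static inline unsigned long constraint_demo(unsigned long arg)
{
	register unsigned long r2 asm("2") = arg;

	/*
	 * The asm overwrites r2, so it must be a read-write operand
	 * ("+d").  Listing it only as an input ("d" (r2)) would let the
	 * compiler assume the original value survives and reuse the
	 * register afterwards, silently corrupting data.
	 */
	asm volatile("	lghi	%0,0\n" : "+d" (r2));

	return r2;
}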
Fixes: ff340d2472ec ("s390: add stack switch helper")
Cc: <stable(a)kernel.org> # 4.20
Reviewed-by: Vasily Gorbik <gor(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 2b543163d90a..76c6034428be 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -91,12 +91,16 @@ struct stack_frame {
CALL_ARGS_4(arg1, arg2, arg3, arg4); \
register unsigned long r4 asm("6") = (unsigned long)(arg5)
-#define CALL_FMT_0 "=&d" (r2) :
-#define CALL_FMT_1 "+&d" (r2) :
-#define CALL_FMT_2 CALL_FMT_1 "d" (r3),
-#define CALL_FMT_3 CALL_FMT_2 "d" (r4),
-#define CALL_FMT_4 CALL_FMT_3 "d" (r5),
-#define CALL_FMT_5 CALL_FMT_4 "d" (r6),
+/*
+ * To keep this simple mark register 2-6 as being changed (volatile)
+ * by the called function, even though register 6 is saved/nonvolatile.
+ */
+#define CALL_FMT_0 "=&d" (r2)
+#define CALL_FMT_1 "+&d" (r2)
+#define CALL_FMT_2 CALL_FMT_1, "+&d" (r3)
+#define CALL_FMT_3 CALL_FMT_2, "+&d" (r4)
+#define CALL_FMT_4 CALL_FMT_3, "+&d" (r5)
+#define CALL_FMT_5 CALL_FMT_4, "+&d" (r6)
#define CALL_CLOBBER_5 "0", "1", "14", "cc", "memory"
#define CALL_CLOBBER_4 CALL_CLOBBER_5
@@ -118,7 +122,7 @@ struct stack_frame {
" brasl 14,%[_fn]\n" \
" la 15,0(%[_prev])\n" \
: [_prev] "=&a" (prev), CALL_FMT_##nr \
- [_stack] "R" (stack), \
+ : [_stack] "R" (stack), \
[_bc] "i" (offsetof(struct stack_frame, back_chain)), \
[_frame] "d" (frame), \
[_fn] "X" (fn) : CALL_CLOBBER_##nr); \
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek(a)suse.com>
Date: Thu, 24 Jun 2021 18:39:45 -0700
Subject: [PATCH] kthread_worker: split code for canceling the delayed work
timer
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync() including proper return value
handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
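For context, a minimal sketch of the kthread_worker delayed-work API
involved in the race (the worker name, callback and delays below are
arbitrary assumptions, not part of the patch):

#include <linux/err.h>
#include <linux/jiffies.h>
#include <linux/kthread.h>

static void demo_work_fn(struct kthread_work *work)
{
	/* Deferred processing would go here. */
}

static int demo_delayed_work(void)
{
	struct kthread_worker *worker;
	static struct kthread_delayed_work dwork;

	worker = kthread_create_worker(0, "demo-worker");
	if (IS_ERR(worker))
		return PTR_ERR(worker);

	kthread_init_delayed_work(&dwork, demo_work_fn);
	kthread_queue_delayed_work(worker, &dwork, HZ);

	/* One context re-arms the work ... */
	kthread_mod_delayed_work(worker, &dwork, 2 * HZ);
	/* ... while another cancels it: the racy pair this series fixes. */
	kthread_cancel_delayed_work_sync(&dwork);

	kthread_destroy_worker(worker);
	return 0;
}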
Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
Cc: <jenhaochen(a)google.com>
Cc: Martin Liu <liumartin(a)google.com>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index fe3f2a40d61e..121a0e1fc659 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1092,6 +1092,33 @@ void kthread_flush_work(struct kthread_work *work)
}
EXPORT_SYMBOL_GPL(kthread_flush_work);
+/*
+ * Make sure that the timer is neither set nor running and could
+ * not manipulate the work list_head any longer.
+ *
+ * The function is called under worker->lock. The lock is temporary
+ * released but the timer can't be set again in the meantime.
+ */
+static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
+ unsigned long *flags)
+{
+ struct kthread_delayed_work *dwork =
+ container_of(work, struct kthread_delayed_work, work);
+ struct kthread_worker *worker = work->worker;
+
+ /*
+ * del_timer_sync() must be called to make sure that the timer
+ * callback is not running. The lock must be temporary released
+ * to avoid a deadlock with the callback. In the meantime,
+ * any queuing is blocked by setting the canceling counter.
+ */
+ work->canceling++;
+ raw_spin_unlock_irqrestore(&worker->lock, *flags);
+ del_timer_sync(&dwork->timer);
+ raw_spin_lock_irqsave(&worker->lock, *flags);
+ work->canceling--;
+}
+
/*
* This function removes the work from the worker queue. Also it makes sure
* that it won't get queued later via the delayed work's timer.
@@ -1106,23 +1133,8 @@ static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
unsigned long *flags)
{
/* Try to cancel the timer if exists. */
- if (is_dwork) {
- struct kthread_delayed_work *dwork =
- container_of(work, struct kthread_delayed_work, work);
- struct kthread_worker *worker = work->worker;
-
- /*
- * del_timer_sync() must be called to make sure that the timer
- * callback is not running. The lock must be temporary released
- * to avoid a deadlock with the callback. In the meantime,
- * any queuing is blocked by setting the canceling counter.
- */
- work->canceling++;
- raw_spin_unlock_irqrestore(&worker->lock, *flags);
- del_timer_sync(&dwork->timer);
- raw_spin_lock_irqsave(&worker->lock, *flags);
- work->canceling--;
- }
+ if (is_dwork)
+ kthread_cancel_delayed_work_timer(work, flags);
/*
* Try to remove the work from a worker list. It might either
From: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
[ Upstream commit 4ca052b4ea621d0002a5e5feace51f60ad5e6b23 ]
Some devices reference an output terminal as the source of extension
units. This is incorrect, as output terminals only have an input pin,
and thus can't be connected to any entity in the forward direction. The
resulting topology would cause issues when registering the media
controller graph. To avoid this problem, connect the extension unit to
the source of the output terminal instead.
While at it, and while no device has been reported to be affected by
this issue, also handle forward scans where two output terminals would
be connected together, and skip the terminals found through such an
invalid connection.
Cc: stable(a)vger.kernel.org # v5.10
Reported-by: Hans de Goede <hdegoede(a)redhat.com>
Signed-off-by: Laurent Pinchart <laurent.pinchart(a)ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
drivers/media/usb/uvc/uvc_driver.c | 32 ++++++++++++++++++++++++++++++
1 file changed, 32 insertions(+)
diff --git a/drivers/media/usb/uvc/uvc_driver.c b/drivers/media/usb/uvc/uvc_driver.c
index 5ad528264135..282f3d2388cc 100644
--- a/drivers/media/usb/uvc/uvc_driver.c
+++ b/drivers/media/usb/uvc/uvc_driver.c
@@ -1588,6 +1588,31 @@ static int uvc_scan_chain_forward(struct uvc_video_chain *chain,
return -EINVAL;
}
+ /*
+ * Some devices reference an output terminal as the
+ * source of extension units. This is incorrect, as
+ * output terminals only have an input pin, and thus
+ * can't be connected to any entity in the forward
+ * direction. The resulting topology would cause issues
+ * when registering the media controller graph. To
+ * avoid this problem, connect the extension unit to
+ * the source of the output terminal instead.
+ */
+ if (UVC_ENTITY_IS_OTERM(entity)) {
+ struct uvc_entity *source;
+
+ source = uvc_entity_by_id(chain->dev,
+ entity->baSourceID[0]);
+ if (!source) {
+ uvc_trace(UVC_TRACE_DESCR,
+ "Can't connect extension unit %u in chain\n",
+ forward->id);
+ break;
+ }
+
+ forward->baSourceID[0] = source->id;
+ }
+
list_add_tail(&forward->chain, &chain->entities);
if (uvc_trace_param & UVC_TRACE_PROBE) {
if (!found)
@@ -1608,6 +1633,13 @@ static int uvc_scan_chain_forward(struct uvc_video_chain *chain,
return -EINVAL;
}
+ if (UVC_ENTITY_IS_OTERM(entity)) {
+ uvc_trace(UVC_TRACE_DESCR,
+ "Unsupported connection between output terminals %u and %u\n",
+ entity->id, forward->id);
+ break;
+ }
+
list_add_tail(&forward->chain, &chain->entities);
if (uvc_trace_param & UVC_TRACE_PROBE) {
if (!found)
--
2.31.1
From: Anson Huang <Anson.Huang(a)nxp.com>
commit 4521de30fbb3f5be0db58de93582ebce72c9d44f upstream.
The vdd3p0 LDO's input should come directly from the external USB VBUS,
NOT from the PMIC's power supply. The vdd3p0 LDO's target output voltage
can be controlled by SW, and it requires the input voltage to be high
enough. With an incorrect power supply assigned, if that supply's
voltage is lower than the LDO's target output voltage, the adjustment
returns a failure and the LDO voltage change is skipped. Remove the
power supply assignment for vdd3p0 to avoid this scenario.
Fixes: 93385546ba36 ("ARM: dts: imx6qdl-sabresd: Assign corresponding power supply for LDOs")
Signed-off-by: Anson Huang <Anson.Huang(a)nxp.com>
Signed-off-by: Shawn Guo <shawnguo(a)kernel.org>
Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu(a)toshiba.co.jp>
---
arch/arm/boot/dts/imx6qdl-sabresd.dtsi | 4 ----
1 file changed, 4 deletions(-)
diff --git a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
index 41384bbd2f60c6..03357d39870eed 100644
--- a/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
+++ b/arch/arm/boot/dts/imx6qdl-sabresd.dtsi
@@ -675,10 +675,6 @@
vin-supply = <&vgen5_reg>;
};
-&reg_vdd3p0 {
- vin-supply = <&sw2_reg>;
-};
-
&reg_vdd2p5 {
vin-supply = <&vgen5_reg>;
};
--
2.31.1
From: David Rientjes <rientjes(a)google.com>
commit 7be74942f184fdfba34ddd19a0d995deb34d4a03 upstream.
There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed. This can lead to soft lockups. For example, on a
host running 4.15:
watchdog: BUG: soft lockup - CPU#206 stuck for 11s! [t_virtual_machi:194348]
CPU: 206 PID: 194348 Comm: t_virtual_machi
RIP: 0010:free_unref_page_list+0x105/0x170
...
Call Trace:
[<0>] release_pages+0x159/0x3d0
[<0>] sev_unpin_memory+0x2c/0x50 [kvm_amd]
[<0>] __unregister_enc_region_locked+0x2f/0x70 [kvm_amd]
[<0>] svm_vm_destroy+0xa9/0x200 [kvm_amd]
[<0>] kvm_arch_destroy_vm+0x47/0x200
[<0>] kvm_put_kvm+0x1a8/0x2f0
[<0>] kvm_vm_release+0x25/0x30
[<0>] do_exit+0x335/0xc10
[<0>] do_group_exit+0x3f/0xa0
[<0>] get_signal+0x1bc/0x670
[<0>] do_signal+0x31/0x130
Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.
Periodically schedule if necessary. This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.
Signed-off-by: David Rientjes <rientjes(a)google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727(a)chino.kir.corp.google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
[iwamatsu: adjust filename.]
Reference: CVE-2020-36311
Signed-off-by: Nobuhiro Iwamatsu (CIP) <nobuhiro1.iwamatsu(a)toshiba.co.jp>
---
arch/x86/kvm/svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index c5673bda4b66df..3f776e654e3aec 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1910,6 +1910,7 @@ static void sev_vm_destroy(struct kvm *kvm)
list_for_each_safe(pos, q, head) {
__unregister_enc_region_locked(kvm,
list_entry(pos, struct enc_region, list));
+ cond_resched();
}
}
--
2.31.1
Hi,
I'd like to propose the following patch for inclusion into 5.10 LTS
commit: 25edcc50d76c834479d11fcc7de46f3da4d95121
subject: [PATCH] KVM: PPC: Book3S HV: Save and restore FSCR in the P9 path
Without this patch qemu does not work on POWER9 with a modern glibc,
so I think it should be included in at least the 5.10 LTS branch.
Unfortunately I cannot test on older LTS versions, only 5.10.
Started Virtual Machine qemu-2-gentoo-ppc64-stable.
Facility 'SCV' unavailable (12), exception at 0x7fff9f81d4b0, MSR=900000000280f033
CPU 0/KVM[2914766]: illegal instruction (4) at 7fff9f81d4b0 nip 7fff9f81d4b0 lr 13e6048e0 code 1 in libc-2.33.so[7fff9f6f0000+200000]
CPU 0/KVM[2914766]: code: e8010010 7c0803a6 4e800020 60420000 7ca42b78 4bffedb5 60000000 38210020
CPU 0/KVM[2914766]: code: e8010010 7c0803a6 4e800020 60420000 <44000001> 4bffffb8 60000000 60420000
--
Best regards,
Georgy
From: Kan Liang <kan.liang(a)linux.intel.com>
There are three IMC channels on an Ice Lake server, but only two of
them will ever be active. Current perf only enables two channels.
Support the extra IMC channel, which may be activated on some Ice Lake
machines. SW can still access a non-activated channel: writes are
ignored by the HW, and reads always return 0.
Fixes: 2b3b76b5ec67 ("perf/x86/intel/uncore: Add Ice Lake server uncore support")
Reviewed-by: Andi Kleen <ak(a)linux.intel.com>
Signed-off-by: Kan Liang <kan.liang(a)linux.intel.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/events/intel/uncore_snbep.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 9a178a9..72a4181 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -452,7 +452,7 @@
#define ICX_M3UPI_PCI_PMON_BOX_CTL 0xa0
/* ICX IMC */
-#define ICX_NUMBER_IMC_CHN 2
+#define ICX_NUMBER_IMC_CHN 3
#define ICX_IMC_MEM_STRIDE 0x4
/* SPR */
@@ -5458,7 +5458,7 @@ static struct intel_uncore_ops icx_uncore_mmio_ops = {
static struct intel_uncore_type icx_uncore_imc = {
.name = "imc",
.num_counters = 4,
- .num_boxes = 8,
+ .num_boxes = 12,
.perf_ctr_bits = 48,
.fixed_ctr_bits = 48,
.fixed_ctr = SNR_IMC_MMIO_PMON_FIXED_CTR,
--
2.7.4
If a user program uses userfaultfd on ranges of heap memory, it may
end up passing a tagged pointer to the kernel in the range.start
field of the UFFDIO_REGISTER ioctl. This can happen when using an
MTE-capable allocator, or on Android if using the Tagged Pointers
feature for MTE readiness [1].
When a fault subsequently occurs, the tag is stripped from the fault
address returned to the application in the fault.address field
of struct uffd_msg. However, from the application's perspective,
the tagged address *is* the memory address, so if the application
is unaware of memory tags, it may get confused by receiving an
address that is, from its point of view, outside of the bounds of the
allocation. We observed this behavior in the kselftest for userfaultfd
[2], but other applications could have the same problem.
Address this by not untagging pointers passed to the userfaultfd
ioctls. Instead, let the system call fail. This will provide an
early indication of problems with tag-unaware userspace code instead
of letting the code get confused later, and is consistent with how
we decided to handle brk/mmap/mremap in commit dcde237319e6 ("mm:
Avoid creating virtual address aliases in brk()/mmap()/mremap()"),
as well as being consistent with the existing tagged address ABI
documentation relating to how ioctl arguments are handled.
The code change is a revert of commit 7d0325749a6c ("userfaultfd:
untag user pointers") plus some fixups to some additional calls to
validate_range that have appeared since then.
[1] https://source.android.com/devices/tech/debug/tagged-pointers
[2] tools/testing/selftests/vm/userfaultfd.c
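As a hypothetical userspace illustration (not taken from the patch or
the kselftest; the tag value and mapping size are arbitrary
assumptions), with this change a tagged range.start makes
UFFDIO_REGISTER fail instead of being silently untagged:

#include <fcntl.h>
#include <linux/userfaultfd.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	long uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
	struct uffdio_api api = { .api = UFFD_API };
	void *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	ioctl(uffd, UFFDIO_API, &api);

	/* Pretend a tag-aware allocator returned a pointer with tag 0x5. */
	uint64_t tagged = (uint64_t)(uintptr_t)p | (0x5ULL << 56);
	struct uffdio_register reg = {
		.range = { .start = tagged, .len = 4096 },
		.mode  = UFFDIO_REGISTER_MODE_MISSING,
	};

	/* With this change the ioctl fails (EINVAL) on a tagged address. */
	if (ioctl(uffd, UFFDIO_REGISTER, &reg) < 0)
		perror("UFFDIO_REGISTER");

	return 0;
}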
Signed-off-by: Peter Collingbourne <pcc(a)google.com>
Reviewed-by: Andrey Konovalov <andreyknvl(a)gmail.com>
Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0…
Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI")
Cc: <stable(a)vger.kernel.org> # 5.4
---
v4:
- document the changes more accurately
- fix new calls to validate_range
Documentation/arm64/tagged-address-abi.rst | 26 +++++++++++++++-------
fs/userfaultfd.c | 26 ++++++++++------------
2 files changed, 30 insertions(+), 22 deletions(-)
diff --git a/Documentation/arm64/tagged-address-abi.rst b/Documentation/arm64/tagged-address-abi.rst
index 459e6b66ff68..0c9120ec58ae 100644
--- a/Documentation/arm64/tagged-address-abi.rst
+++ b/Documentation/arm64/tagged-address-abi.rst
@@ -45,14 +45,24 @@ how the user addresses are used by the kernel:
1. User addresses not accessed by the kernel but used for address space
management (e.g. ``mprotect()``, ``madvise()``). The use of valid
- tagged pointers in this context is allowed with the exception of
- ``brk()``, ``mmap()`` and the ``new_address`` argument to
- ``mremap()`` as these have the potential to alias with existing
- user addresses.
-
- NOTE: This behaviour changed in v5.6 and so some earlier kernels may
- incorrectly accept valid tagged pointers for the ``brk()``,
- ``mmap()`` and ``mremap()`` system calls.
+ tagged pointers in this context is allowed with these exceptions:
+
+ - ``brk()``, ``mmap()`` and the ``new_address`` argument to
+ ``mremap()`` as these have the potential to alias with existing
+ user addresses.
+
+ NOTE: This behaviour changed in v5.6 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for the ``brk()``,
+ ``mmap()`` and ``mremap()`` system calls.
+
+ - The ``range.start``, ``start`` and ``dst`` arguments to the
+ ``UFFDIO_*`` ``ioctl()``s used on a file descriptor obtained from
+ ``userfaultfd()``, as fault addresses subsequently obtained by reading
+ the file descriptor will be untagged, which may otherwise confuse
+ tag-unaware programs.
+
+ NOTE: This behaviour changed in v5.14 and so some earlier kernels may
+ incorrectly accept valid tagged pointers for this system call.
2. User addresses accessed by the kernel (e.g. ``write()``). This ABI
relaxation is disabled by default and the application thread needs to
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index dd7a6c62b56f..27af6b82a758 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1236,23 +1236,21 @@ static __always_inline void wake_userfault(struct userfaultfd_ctx *ctx,
}
static __always_inline int validate_range(struct mm_struct *mm,
- __u64 *start, __u64 len)
+ __u64 start, __u64 len)
{
__u64 task_size = mm->task_size;
- *start = untagged_addr(*start);
-
- if (*start & ~PAGE_MASK)
+ if (start & ~PAGE_MASK)
return -EINVAL;
if (len & ~PAGE_MASK)
return -EINVAL;
if (!len)
return -EINVAL;
- if (*start < mmap_min_addr)
+ if (start < mmap_min_addr)
return -EINVAL;
- if (*start >= task_size)
+ if (start >= task_size)
return -EINVAL;
- if (len > task_size - *start)
+ if (len > task_size - start)
return -EINVAL;
return 0;
}
@@ -1313,7 +1311,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
vm_flags |= VM_UFFD_MINOR;
}
- ret = validate_range(mm, &uffdio_register.range.start,
+ ret = validate_range(mm, uffdio_register.range.start,
uffdio_register.range.len);
if (ret)
goto out;
@@ -1519,7 +1517,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
if (copy_from_user(&uffdio_unregister, buf, sizeof(uffdio_unregister)))
goto out;
- ret = validate_range(mm, &uffdio_unregister.start,
+ ret = validate_range(mm, uffdio_unregister.start,
uffdio_unregister.len);
if (ret)
goto out;
@@ -1668,7 +1666,7 @@ static int userfaultfd_wake(struct userfaultfd_ctx *ctx,
if (copy_from_user(&uffdio_wake, buf, sizeof(uffdio_wake)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_wake.start, uffdio_wake.len);
+ ret = validate_range(ctx->mm, uffdio_wake.start, uffdio_wake.len);
if (ret)
goto out;
@@ -1708,7 +1706,7 @@ static int userfaultfd_copy(struct userfaultfd_ctx *ctx,
sizeof(uffdio_copy)-sizeof(__s64)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_copy.dst, uffdio_copy.len);
+ ret = validate_range(ctx->mm, uffdio_copy.dst, uffdio_copy.len);
if (ret)
goto out;
/*
@@ -1765,7 +1763,7 @@ static int userfaultfd_zeropage(struct userfaultfd_ctx *ctx,
sizeof(uffdio_zeropage)-sizeof(__s64)))
goto out;
- ret = validate_range(ctx->mm, &uffdio_zeropage.range.start,
+ ret = validate_range(ctx->mm, uffdio_zeropage.range.start,
uffdio_zeropage.range.len);
if (ret)
goto out;
@@ -1815,7 +1813,7 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
sizeof(struct uffdio_writeprotect)))
return -EFAULT;
- ret = validate_range(ctx->mm, &uffdio_wp.range.start,
+ ret = validate_range(ctx->mm, uffdio_wp.range.start,
uffdio_wp.range.len);
if (ret)
return ret;
@@ -1863,7 +1861,7 @@ static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
sizeof(uffdio_continue) - (sizeof(__s64))))
goto out;
- ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+ ret = validate_range(ctx->mm, uffdio_continue.range.start,
uffdio_continue.range.len);
if (ret)
goto out;
--
2.32.0.93.g670b81a890-goog
On 7/07/2021 at 01:16, Palmer Dabbelt wrote:
> On Thu, 24 Jun 2021 05:17:21 PDT (-0700), alex(a)ghiti.fr wrote:
>> BPF region was moved back to the region below the kernel at the end of
>> the
>> module region in commit 3a02764c372c ("riscv: Ensure BPF_JIT_REGION_START
>> aligned with PMD size"), so reflect this change in kernel page table
>> output.
>>
>> Signed-off-by: Alexandre Ghiti <alex(a)ghiti.fr>
>> ---
>> arch/riscv/mm/ptdump.c | 4 ++--
>> 1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
>> index 0536ac84b730..22d6555d89dc 100644
>> --- a/arch/riscv/mm/ptdump.c
>> +++ b/arch/riscv/mm/ptdump.c
>> @@ -98,8 +98,8 @@ static struct addr_marker address_markers[] = {
>> {0, "vmalloc() end"},
>> {0, "Linear mapping"},
>> #ifdef CONFIG_64BIT
>> - {0, "Modules mapping"},
>> - {0, "Kernel mapping (kernel, BPF)"},
>> + {0, "Modules/BPF mapping"},
>> + {0, "Kernel mapping"},
>> #endif
>> {-1, NULL},
>> };
>
> Thanks, this is on for-next.
As this fix is for 5.13, I am adding stable to Cc:
Cc: stable(a)vger.kernel.org # v5.13
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
From: "Aneesh Kumar K.V" <aneesh.kumar(a)linux.ibm.com>
Subject: mm/mremap: hold the rmap lock in write mode when moving page table entries.
To avoid a race between the rmap walk and mremap, mremap does
take_rmap_locks(). The lock is taken to ensure that the rmap walk
doesn't miss a page table entry due to PTE moves via move_page_tables().
The kernel further optimizes this lock such that, if the newly added vma
will be found after the old vma, the rmap lock is not taken. This is
because the rmap walk finds the vmas in the same order, so if the page
table is not found attached to the older vma it will be found with the
new vma, which is iterated later.
As explained in commit eb66ae030829 ("mremap: properly flush TLB before
releasing the page"), mremap is special in that it doesn't take
ownership of the page. The optimized version for PUD/PMD-aligned mremap
also doesn't hold the ptl lock. This can result in stale TLB entries, as
shown below.
This patch updates the rmap locking requirement in mremap to handle the race condition
explained below with optimized mremap::
Optimized PMD move

    CPU 1                          CPU 2                           CPU 3

    mremap(old_addr, new_addr)     page_shrinker/try_to_unmap_one

    mmap_write_lock_killable()

                                   addr = old_addr
                                   lock(pte_ptl)
    lock(pmd_ptl)
    pmd = *old_pmd
    pmd_clear(old_pmd)
    flush_tlb_range(old_addr)

    *new_pmd = pmd
                                                                   *new_addr = 10; and fills
                                                                   TLB with new addr
                                                                   and old pfn

    unlock(pmd_ptl)
                                   ptep_clear_flush()
                                   old pfn is free.
                                                                   Stale TLB entry

An optimized PUD move also suffers from a similar race. Both of the
above race conditions can be fixed if we force the mremap path to take
the rmap lock.
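For context, this is roughly what taking the rmap locks means,
paraphrased from mm/mremap.c (treat the exact form as an assumption
rather than part of this patch):

#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/rmap.h>

/* Lock the file and anon rmap structures that cover a vma. */
static void take_rmap_locks(struct vm_area_struct *vma)
{
	if (vma->vm_file)
		i_mmap_lock_write(vma->vm_file->f_mapping);
	if (vma->anon_vma)
		anon_vma_lock_write(vma->anon_vma);
}

static void drop_rmap_locks(struct vm_area_struct *vma)
{
	if (vma->anon_vma)
		anon_vma_unlock_write(vma->anon_vma);
	if (vma->vm_file)
		i_mmap_unlock_write(vma->vm_file->f_mapping);
}

With the change below, move_pgt_entry() is always called with the rmap
locks forced on for the PMD/PUD fast paths, so these locks bracket the
page table move.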
Link: https://lkml.kernel.org/r/20210616045239.370802-7-aneesh.kumar@linux.ibm.com
Fixes: 2c91bd4a4e2e ("mm: speed up mremap by 20x on large regions")
Fixes: c49dd3401802 ("mm: speedup mremap on 1GB or larger regions")
Link: https://lore.kernel.org/linux-mm/CAHk-=wgXVR04eBNtxQfevontWnP6FDm+oj5vauQXP…
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar(a)linux.ibm.com>
Acked-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov(a)linux.intel.com>
Cc: Christophe Leroy <christophe.leroy(a)csgroup.eu>
Cc: Joel Fernandes <joel(a)joelfernandes.org>
Cc: Kalesh Singh <kaleshsingh(a)google.com>
Cc: Kirill A. Shutemov <kirill(a)shutemov.name>
Cc: Michael Ellerman <mpe(a)ellerman.id.au>
Cc: Nicholas Piggin <npiggin(a)gmail.com>
Cc: Stephen Rothwell <sfr(a)canb.auug.org.au>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mremap.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/mremap.c~mm-mremap-hold-the-rmap-lock-in-write-mode-when-moving-page-table-entries
+++ a/mm/mremap.c
@@ -504,7 +504,7 @@ unsigned long move_page_tables(struct vm
} else if (IS_ENABLED(CONFIG_HAVE_MOVE_PUD) && extent == PUD_SIZE) {
if (move_pgt_entry(NORMAL_PUD, vma, old_addr, new_addr,
- old_pud, new_pud, need_rmap_locks))
+ old_pud, new_pud, true))
continue;
}
@@ -531,7 +531,7 @@ unsigned long move_page_tables(struct vm
* moving at the PMD level if possible.
*/
if (move_pgt_entry(NORMAL_PMD, vma, old_addr, new_addr,
- old_pmd, new_pmd, need_rmap_locks))
+ old_pmd, new_pmd, true))
continue;
}
_