The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x a7075f501bd33c93570af759b6f4302ef0175168
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102051-bonded-proofread-bd52@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a7075f501bd33c93570af759b6f4302ef0175168 Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Thu, 9 Oct 2025 17:03:49 -0700
Subject: [PATCH] ixgbevf: fix mailbox API compatibility by negotiating
supported features
There was backward compatibility in the terms of mailbox API. Various
drivers from various OSes supporting 10G adapters from Intel portfolio
could easily negotiate mailbox API.
This convention has been broken since introducing API 1.4.
Commit 0062e7cc955e ("ixgbevf: add VF IPsec offload code") added support
for IPSec which is specific only for the kernel ixgbe driver. None of the
rest of the Intel 10G PF/VF drivers supports it. And actually lack of
support was not included in the IPSec implementation - there were no such
code paths. No possibility to negotiate support for the feature was
introduced along with introduction of the feature itself.
Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication
between PF and VF") increasing API version to 1.5 did the same - it
introduced code supported specifically by the PF ESX driver. It altered API
version for the VF driver in the same time not touching the version
defined for the PF ixgbe driver. It led to additional discrepancies,
as the code provided within API 1.6 cannot be supported for Linux ixgbe
driver as it causes crashes.
The issue was noticed some time ago and mitigated by Jake within the commit
d0725312adf5 ("ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5").
As a result we have regression for IPsec support and after increasing API
to version 1.6 ixgbevf driver stopped to support ESX MBX.
To fix this mess add new mailbox op asking PF driver about supported
features. Basing on a response determine whether to set support for IPSec
and ESX-specific enhanced mailbox.
New mailbox op, for compatibility purposes, must be added within new API
revision, as API version of OOT PF & VF drivers is already increased to
1.6 and doesn't incorporate features negotiate op.
Features negotiation mechanism gives possibility to be extended with new
features when needed in the future.
Reported-by: Jacob Keller <jacob.e.keller(a)intel.com>
Closes: https://lore.kernel.org/intel-wired-lan/20241101-jk-ixgbevf-mailbox-v1-5-fi…
Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code")
Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF")
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-4-ef32a425b92a@i…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 65580b9cb06f..fce35924ff8b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -273,6 +273,9 @@ static int ixgbevf_ipsec_add_sa(struct net_device *dev,
adapter = netdev_priv(dev);
ipsec = adapter->ipsec;
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return -EOPNOTSUPP;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
NL_SET_ERR_MSG_MOD(extack, "Unsupported protocol for IPsec offload");
return -EINVAL;
@@ -405,6 +408,9 @@ static void ixgbevf_ipsec_del_sa(struct net_device *dev,
adapter = netdev_priv(dev);
ipsec = adapter->ipsec;
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return;
+
if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_RX_INDEX;
@@ -612,6 +618,10 @@ void ixgbevf_init_ipsec_offload(struct ixgbevf_adapter *adapter)
size_t size;
switch (adapter->hw.api_version) {
+ case ixgbe_mbox_api_17:
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return;
+ break;
case ixgbe_mbox_api_14:
break;
default:
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 3a379e6a3a2a..039187607e98 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -363,6 +363,13 @@ struct ixgbevf_adapter {
struct ixgbe_hw hw;
u16 msg_enable;
+ u32 pf_features;
+#define IXGBEVF_PF_SUP_IPSEC BIT(0)
+#define IXGBEVF_PF_SUP_ESX_MBX BIT(1)
+
+#define IXGBEVF_SUPPORTED_FEATURES (IXGBEVF_PF_SUP_IPSEC | \
+ IXGBEVF_PF_SUP_ESX_MBX)
+
struct ixgbevf_hw_stats stats;
unsigned long state;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 92671638b428..d5ce20f47def 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2271,10 +2271,35 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
adapter->stats.base_vfmprc = adapter->stats.last_vfmprc;
}
+/**
+ * ixgbevf_set_features - Set features supported by PF
+ * @adapter: pointer to the adapter struct
+ *
+ * Negotiate with PF supported features and then set pf_features accordingly.
+ */
+static void ixgbevf_set_features(struct ixgbevf_adapter *adapter)
+{
+ u32 *pf_features = &adapter->pf_features;
+ struct ixgbe_hw *hw = &adapter->hw;
+ int err;
+
+ err = hw->mac.ops.negotiate_features(hw, pf_features);
+ if (err && err != -EOPNOTSUPP)
+ netdev_dbg(adapter->netdev,
+ "PF feature negotiation failed.\n");
+
+ /* Address also pre API 1.7 cases */
+ if (hw->api_version == ixgbe_mbox_api_14)
+ *pf_features |= IXGBEVF_PF_SUP_IPSEC;
+ else if (hw->api_version == ixgbe_mbox_api_15)
+ *pf_features |= IXGBEVF_PF_SUP_ESX_MBX;
+}
+
static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
{
struct ixgbe_hw *hw = &adapter->hw;
static const int api[] = {
+ ixgbe_mbox_api_17,
ixgbe_mbox_api_16,
ixgbe_mbox_api_15,
ixgbe_mbox_api_14,
@@ -2295,8 +2320,9 @@ static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
idx++;
}
- /* Following is not supported by API 1.6, it is specific for 1.5 */
- if (hw->api_version == ixgbe_mbox_api_15) {
+ ixgbevf_set_features(adapter);
+
+ if (adapter->pf_features & IXGBEVF_PF_SUP_ESX_MBX) {
hw->mbx.ops.init_params(hw);
memcpy(&hw->mbx.ops, &ixgbevf_mbx_ops,
sizeof(struct ixgbe_mbx_operations));
@@ -2654,6 +2680,7 @@ static void ixgbevf_set_num_queues(struct ixgbevf_adapter *adapter)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
if (adapter->xdp_prog &&
hw->mac.max_tx_queues == rss)
rss = rss > 3 ? 2 : 1;
@@ -4649,6 +4676,7 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
netdev->max_mtu = IXGBE_MAX_JUMBO_FRAME_SIZE -
(ETH_HLEN + ETH_FCS_LEN);
break;
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index c1494fd1f67b..a8ed23ee66aa 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -67,6 +67,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_14, /* API version 1.4, linux/freebsd VF driver */
ixgbe_mbox_api_15, /* API version 1.5, linux/freebsd VF driver */
ixgbe_mbox_api_16, /* API version 1.6, linux/freebsd VF driver */
+ ixgbe_mbox_api_17, /* API version 1.7, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
};
@@ -106,6 +107,9 @@ enum ixgbe_pfvf_api_rev {
/* mailbox API, version 1.6 VF requests */
#define IXGBE_VF_GET_PF_LINK_STATE 0x11 /* request PF to send link info */
+/* mailbox API, version 1.7 VF requests */
+#define IXGBE_VF_FEATURES_NEGOTIATE 0x12 /* get features supported by PF*/
+
/* length of permanent address message returned from PF */
#define IXGBE_VF_PERMADDR_MSG_LEN 4
/* word in permanent address message with the current multicast type */
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index f05246fb5a74..74d320879513 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -313,6 +313,7 @@ int ixgbevf_get_reta_locked(struct ixgbe_hw *hw, u32 *reta, int num_rx_queues)
* is not supported for this device type.
*/
switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
case ixgbe_mbox_api_16:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_14:
@@ -383,6 +384,7 @@ int ixgbevf_get_rss_key_locked(struct ixgbe_hw *hw, u8 *rss_key)
* or if the operation is not supported for this device type.
*/
switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
case ixgbe_mbox_api_16:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_14:
@@ -555,6 +557,7 @@ static s32 ixgbevf_update_xcast_mode(struct ixgbe_hw *hw, int xcast_mode)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return -EOPNOTSUPP;
@@ -646,6 +649,7 @@ static int ixgbevf_get_pf_link_state(struct ixgbe_hw *hw, ixgbe_link_speed *spee
switch (hw->api_version) {
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return -EOPNOTSUPP;
@@ -669,6 +673,42 @@ static int ixgbevf_get_pf_link_state(struct ixgbe_hw *hw, ixgbe_link_speed *spee
return err;
}
+/**
+ * ixgbevf_negotiate_features_vf - negotiate supported features with PF driver
+ * @hw: pointer to the HW structure
+ * @pf_features: bitmask of features supported by PF
+ *
+ * Return: IXGBE_ERR_MBX in the case of mailbox error,
+ * -EOPNOTSUPP if the op is not supported or 0 on success.
+ */
+static int ixgbevf_negotiate_features_vf(struct ixgbe_hw *hw, u32 *pf_features)
+{
+ u32 msgbuf[2] = {};
+ int err;
+
+ switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ msgbuf[0] = IXGBE_VF_FEATURES_NEGOTIATE;
+ msgbuf[1] = IXGBEVF_SUPPORTED_FEATURES;
+
+ err = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf,
+ ARRAY_SIZE(msgbuf));
+
+ if (err || (msgbuf[0] & IXGBE_VT_MSGTYPE_FAILURE)) {
+ err = IXGBE_ERR_MBX;
+ *pf_features = 0x0;
+ } else {
+ *pf_features = msgbuf[1];
+ }
+
+ return err;
+}
+
/**
* ixgbevf_set_vfta_vf - Set/Unset VLAN filter table address
* @hw: pointer to the HW structure
@@ -799,6 +839,7 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
bool *link_up,
bool autoneg_wait_to_complete)
{
+ struct ixgbevf_adapter *adapter = hw->back;
struct ixgbe_mbx_info *mbx = &hw->mbx;
struct ixgbe_mac_info *mac = &hw->mac;
s32 ret_val = 0;
@@ -825,7 +866,7 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
* until we are called again and don't report an error
*/
if (mbx->ops.read(hw, &in_msg, 1)) {
- if (hw->api_version >= ixgbe_mbox_api_15)
+ if (adapter->pf_features & IXGBEVF_PF_SUP_ESX_MBX)
mac->get_link_status = false;
goto out;
}
@@ -1026,6 +1067,7 @@ int ixgbevf_get_queues(struct ixgbe_hw *hw, unsigned int *num_tcs,
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return 0;
@@ -1080,6 +1122,7 @@ static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
.setup_link = ixgbevf_setup_mac_link_vf,
.check_link = ixgbevf_check_mac_link_vf,
.negotiate_api_version = ixgbevf_negotiate_api_version_vf,
+ .negotiate_features = ixgbevf_negotiate_features_vf,
.set_rar = ixgbevf_set_rar_vf,
.update_mc_addr_list = ixgbevf_update_mc_addr_list_vf,
.update_xcast_mode = ixgbevf_update_xcast_mode,
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 2d791bc26ae4..4f19b8900c29 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -26,6 +26,7 @@ struct ixgbe_mac_operations {
s32 (*stop_adapter)(struct ixgbe_hw *);
s32 (*get_bus_info)(struct ixgbe_hw *);
s32 (*negotiate_api_version)(struct ixgbe_hw *hw, int api);
+ int (*negotiate_features)(struct ixgbe_hw *hw, u32 *pf_features);
/* Link */
s32 (*setup_link)(struct ixgbe_hw *, ixgbe_link_speed, bool, bool);
commit 3fcbf1c77d08 ("arch_topology: Fix cache attributes detection
in the CPU hotplug path")
adds a call to detect_cache_attributes() to populate the cacheinfo
before updating the siblings mask. detect_cache_attributes() allocates
memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
kernels, on secondary CPUs, this triggers a:
'BUG: sleeping function called from invalid context'
as the code is executed with preemption and interrupts disabled:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
| in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
| preempt_count: 1, expected: 0
| RCU nest depth: 1, expected: 1
| 3 locks held by swapper/111/0:
| #0: (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
| #1: (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
| #2: (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
| irq event stamp: 0
| hardirqs last enabled at (0): 0x0
| hardirqs last disabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last enabled at (0): copy_process+0x5dc/0x1ab8
| softirqs last disabled at (0): 0x0
| Preemption disabled at:
| migrate_enable+0x30/0x130
| CPU: 111 PID: 0 Comm: swapper/111 Tainted: G W 6.0.0-rc4-rt6-[...]
| Call trace:
| __kmalloc+0xbc/0x1e8
| detect_cache_attributes+0x2d4/0x5f0
| update_siblings_masks+0x30/0x368
| store_cpu_topology+0x78/0xb8
| secondary_start_kernel+0xd0/0x198
| __secondary_switched+0xb0/0xb4
Pierre fixed this issue in the upstream 6.3 and the original series is follows:
https://lore.kernel.org/all/167404285593.885445.6219705651301997538.b4-ty@a…
We also encountered the same issue on 6.1 stable branch, and need to backport this series:
cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
cacheinfo: Return error code in init_of_cache_level()
cacheinfo: Check 'cache-unified' property to count cache leaves
ACPI: PPTT: Remove acpi_find_cache_levels()
ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
arch_topology: Build cacheinfo from primary CPU
And there was a non-trivial number of follow-on fixes for patches in this
series, as pointed out by Greg in the 6.1.156-RC1 review:
cacheinfo: Initialize variables in fetch_cache_info()
cacheinfo: Fix LLC is not exported through sysfs
drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug
Finally, Jon discovered an issue in the Tegra platform caused by these patches:
https://lore.kernel.org/all/046f08cb-0610-48c9-af24-4804367df177@nvidia.com/
So we also need to backport the following patch:
arm64: tegra: Update cache properties
K Prateek Nayak (1):
drivers: base: cacheinfo: Update cpu_map_populated during CPU Hotplug
Pierre Gondois (8):
cacheinfo: Use RISC-V's init_cache_level() as generic OF
implementation
cacheinfo: Return error code in init_of_cache_level()
cacheinfo: Check 'cache-unified' property to count cache leaves
ACPI: PPTT: Remove acpi_find_cache_levels()
ACPI: PPTT: Update acpi_find_last_cache_level() to
acpi_get_cache_info()
arch_topology: Build cacheinfo from primary CPU
cacheinfo: Initialize variables in fetch_cache_info()
arm64: tegra: Update cache properties
Yicong Yang (1):
cacheinfo: Fix LLC is not exported through sysfs
arch/arm64/boot/dts/nvidia/tegra194.dtsi | 15 +++
arch/arm64/boot/dts/nvidia/tegra210.dtsi | 1 +
arch/arm64/boot/dts/nvidia/tegra234.dtsi | 33 +++++
arch/arm64/kernel/cacheinfo.c | 11 +-
arch/riscv/kernel/cacheinfo.c | 42 ------
drivers/acpi/pptt.c | 93 ++++++++------
drivers/base/arch_topology.c | 12 +-
drivers/base/cacheinfo.c | 156 +++++++++++++++++++----
include/linux/cacheinfo.h | 11 +-
9 files changed, 262 insertions(+), 112 deletions(-)
--
2.25.1
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 56094ad3eaa21e6621396cc33811d8f72847a834
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102038-hash-smashing-4b29@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56094ad3eaa21e6621396cc33811d8f72847a834 Mon Sep 17 00:00:00 2001
From: Jan Kara <jack(a)suse.cz>
Date: Thu, 2 Oct 2025 17:55:07 +0200
Subject: [PATCH] vfs: Don't leak disconnected dentries on umount
When user calls open_by_handle_at() on some inode that is not cached, we
will create disconnected dentry for it. If such dentry is a directory,
exportfs_decode_fh_raw() will then try to connect this dentry to the
dentry tree through reconnect_path(). It may happen for various reasons
(such as corrupted fs or race with rename) that the call to
lookup_one_unlocked() in reconnect_one() will fail to find the dentry we
are trying to reconnect and instead create a new dentry under the
parent. Now this dentry will not be marked as disconnected although the
parent still may well be disconnected (at least in case this
inconsistency happened because the fs is corrupted and .. doesn't point
to the real parent directory). This creates inconsistency in
disconnected flags but AFAICS it was mostly harmless. At least until
commit f1ee616214cb ("VFS: don't keep disconnected dentries on d_anon")
which removed adding of most disconnected dentries to sb->s_anon list.
Thus after this commit cleanup of disconnected dentries implicitely
relies on the fact that dput() will immediately reclaim such dentries.
However when some leaf dentry isn't marked as disconnected, as in the
scenario described above, the reclaim doesn't happen and the dentries
are "leaked". Memory reclaim can eventually reclaim them but otherwise
they stay in memory and if umount comes first, we hit infamous "Busy
inodes after unmount" bug. Make sure all dentries created under a
disconnected parent are marked as disconnected as well.
Reported-by: syzbot+1d79ebe5383fc016cf07(a)syzkaller.appspotmail.com
Fixes: f1ee616214cb ("VFS: don't keep disconnected dentries on d_anon")
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
diff --git a/fs/dcache.c b/fs/dcache.c
index a067fa0a965a..035cccbc9276 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2557,6 +2557,8 @@ struct dentry *d_alloc_parallel(struct dentry *parent,
spin_lock(&parent->d_lock);
new->d_parent = dget_dlock(parent);
hlist_add_head(&new->d_sib, &parent->d_children);
+ if (parent->d_flags & DCACHE_DISCONNECTED)
+ new->d_flags |= DCACHE_DISCONNECTED;
spin_unlock(&parent->d_lock);
retry:
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x a7075f501bd33c93570af759b6f4302ef0175168
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102050-stadium-reformer-c157@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a7075f501bd33c93570af759b6f4302ef0175168 Mon Sep 17 00:00:00 2001
From: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Date: Thu, 9 Oct 2025 17:03:49 -0700
Subject: [PATCH] ixgbevf: fix mailbox API compatibility by negotiating
supported features
There was backward compatibility in the terms of mailbox API. Various
drivers from various OSes supporting 10G adapters from Intel portfolio
could easily negotiate mailbox API.
This convention has been broken since introducing API 1.4.
Commit 0062e7cc955e ("ixgbevf: add VF IPsec offload code") added support
for IPSec which is specific only for the kernel ixgbe driver. None of the
rest of the Intel 10G PF/VF drivers supports it. And actually lack of
support was not included in the IPSec implementation - there were no such
code paths. No possibility to negotiate support for the feature was
introduced along with introduction of the feature itself.
Commit 339f28964147 ("ixgbevf: Add support for new mailbox communication
between PF and VF") increasing API version to 1.5 did the same - it
introduced code supported specifically by the PF ESX driver. It altered API
version for the VF driver in the same time not touching the version
defined for the PF ixgbe driver. It led to additional discrepancies,
as the code provided within API 1.6 cannot be supported for Linux ixgbe
driver as it causes crashes.
The issue was noticed some time ago and mitigated by Jake within the commit
d0725312adf5 ("ixgbevf: stop attempting IPSEC offload on Mailbox API 1.5").
As a result we have regression for IPsec support and after increasing API
to version 1.6 ixgbevf driver stopped to support ESX MBX.
To fix this mess add new mailbox op asking PF driver about supported
features. Basing on a response determine whether to set support for IPSec
and ESX-specific enhanced mailbox.
New mailbox op, for compatibility purposes, must be added within new API
revision, as API version of OOT PF & VF drivers is already increased to
1.6 and doesn't incorporate features negotiate op.
Features negotiation mechanism gives possibility to be extended with new
features when needed in the future.
Reported-by: Jacob Keller <jacob.e.keller(a)intel.com>
Closes: https://lore.kernel.org/intel-wired-lan/20241101-jk-ixgbevf-mailbox-v1-5-fi…
Fixes: 0062e7cc955e ("ixgbevf: add VF IPsec offload code")
Fixes: 339f28964147 ("ixgbevf: Add support for new mailbox communication between PF and VF")
Reviewed-by: Jacob Keller <jacob.e.keller(a)intel.com>
Reviewed-by: Przemek Kitszel <przemyslaw.kitszel(a)intel.com>
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov(a)intel.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski(a)intel.com>
Tested-by: Rafal Romanowski <rafal.romanowski(a)intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller(a)intel.com>
Link: https://patch.msgid.link/20251009-jk-iwl-net-2025-10-01-v3-4-ef32a425b92a@i…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/intel/ixgbevf/ipsec.c b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
index 65580b9cb06f..fce35924ff8b 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ipsec.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ipsec.c
@@ -273,6 +273,9 @@ static int ixgbevf_ipsec_add_sa(struct net_device *dev,
adapter = netdev_priv(dev);
ipsec = adapter->ipsec;
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return -EOPNOTSUPP;
+
if (xs->id.proto != IPPROTO_ESP && xs->id.proto != IPPROTO_AH) {
NL_SET_ERR_MSG_MOD(extack, "Unsupported protocol for IPsec offload");
return -EINVAL;
@@ -405,6 +408,9 @@ static void ixgbevf_ipsec_del_sa(struct net_device *dev,
adapter = netdev_priv(dev);
ipsec = adapter->ipsec;
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return;
+
if (xs->xso.dir == XFRM_DEV_OFFLOAD_IN) {
sa_idx = xs->xso.offload_handle - IXGBE_IPSEC_BASE_RX_INDEX;
@@ -612,6 +618,10 @@ void ixgbevf_init_ipsec_offload(struct ixgbevf_adapter *adapter)
size_t size;
switch (adapter->hw.api_version) {
+ case ixgbe_mbox_api_17:
+ if (!(adapter->pf_features & IXGBEVF_PF_SUP_IPSEC))
+ return;
+ break;
case ixgbe_mbox_api_14:
break;
default:
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
index 3a379e6a3a2a..039187607e98 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf.h
@@ -363,6 +363,13 @@ struct ixgbevf_adapter {
struct ixgbe_hw hw;
u16 msg_enable;
+ u32 pf_features;
+#define IXGBEVF_PF_SUP_IPSEC BIT(0)
+#define IXGBEVF_PF_SUP_ESX_MBX BIT(1)
+
+#define IXGBEVF_SUPPORTED_FEATURES (IXGBEVF_PF_SUP_IPSEC | \
+ IXGBEVF_PF_SUP_ESX_MBX)
+
struct ixgbevf_hw_stats stats;
unsigned long state;
diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
index 92671638b428..d5ce20f47def 100644
--- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
+++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
@@ -2271,10 +2271,35 @@ static void ixgbevf_init_last_counter_stats(struct ixgbevf_adapter *adapter)
adapter->stats.base_vfmprc = adapter->stats.last_vfmprc;
}
+/**
+ * ixgbevf_set_features - Set features supported by PF
+ * @adapter: pointer to the adapter struct
+ *
+ * Negotiate with PF supported features and then set pf_features accordingly.
+ */
+static void ixgbevf_set_features(struct ixgbevf_adapter *adapter)
+{
+ u32 *pf_features = &adapter->pf_features;
+ struct ixgbe_hw *hw = &adapter->hw;
+ int err;
+
+ err = hw->mac.ops.negotiate_features(hw, pf_features);
+ if (err && err != -EOPNOTSUPP)
+ netdev_dbg(adapter->netdev,
+ "PF feature negotiation failed.\n");
+
+ /* Address also pre API 1.7 cases */
+ if (hw->api_version == ixgbe_mbox_api_14)
+ *pf_features |= IXGBEVF_PF_SUP_IPSEC;
+ else if (hw->api_version == ixgbe_mbox_api_15)
+ *pf_features |= IXGBEVF_PF_SUP_ESX_MBX;
+}
+
static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
{
struct ixgbe_hw *hw = &adapter->hw;
static const int api[] = {
+ ixgbe_mbox_api_17,
ixgbe_mbox_api_16,
ixgbe_mbox_api_15,
ixgbe_mbox_api_14,
@@ -2295,8 +2320,9 @@ static void ixgbevf_negotiate_api(struct ixgbevf_adapter *adapter)
idx++;
}
- /* Following is not supported by API 1.6, it is specific for 1.5 */
- if (hw->api_version == ixgbe_mbox_api_15) {
+ ixgbevf_set_features(adapter);
+
+ if (adapter->pf_features & IXGBEVF_PF_SUP_ESX_MBX) {
hw->mbx.ops.init_params(hw);
memcpy(&hw->mbx.ops, &ixgbevf_mbx_ops,
sizeof(struct ixgbe_mbx_operations));
@@ -2654,6 +2680,7 @@ static void ixgbevf_set_num_queues(struct ixgbevf_adapter *adapter)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
if (adapter->xdp_prog &&
hw->mac.max_tx_queues == rss)
rss = rss > 3 ? 2 : 1;
@@ -4649,6 +4676,7 @@ static int ixgbevf_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
netdev->max_mtu = IXGBE_MAX_JUMBO_FRAME_SIZE -
(ETH_HLEN + ETH_FCS_LEN);
break;
diff --git a/drivers/net/ethernet/intel/ixgbevf/mbx.h b/drivers/net/ethernet/intel/ixgbevf/mbx.h
index c1494fd1f67b..a8ed23ee66aa 100644
--- a/drivers/net/ethernet/intel/ixgbevf/mbx.h
+++ b/drivers/net/ethernet/intel/ixgbevf/mbx.h
@@ -67,6 +67,7 @@ enum ixgbe_pfvf_api_rev {
ixgbe_mbox_api_14, /* API version 1.4, linux/freebsd VF driver */
ixgbe_mbox_api_15, /* API version 1.5, linux/freebsd VF driver */
ixgbe_mbox_api_16, /* API version 1.6, linux/freebsd VF driver */
+ ixgbe_mbox_api_17, /* API version 1.7, linux/freebsd VF driver */
/* This value should always be last */
ixgbe_mbox_api_unknown, /* indicates that API version is not known */
};
@@ -106,6 +107,9 @@ enum ixgbe_pfvf_api_rev {
/* mailbox API, version 1.6 VF requests */
#define IXGBE_VF_GET_PF_LINK_STATE 0x11 /* request PF to send link info */
+/* mailbox API, version 1.7 VF requests */
+#define IXGBE_VF_FEATURES_NEGOTIATE 0x12 /* get features supported by PF*/
+
/* length of permanent address message returned from PF */
#define IXGBE_VF_PERMADDR_MSG_LEN 4
/* word in permanent address message with the current multicast type */
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.c b/drivers/net/ethernet/intel/ixgbevf/vf.c
index f05246fb5a74..74d320879513 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.c
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.c
@@ -313,6 +313,7 @@ int ixgbevf_get_reta_locked(struct ixgbe_hw *hw, u32 *reta, int num_rx_queues)
* is not supported for this device type.
*/
switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
case ixgbe_mbox_api_16:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_14:
@@ -383,6 +384,7 @@ int ixgbevf_get_rss_key_locked(struct ixgbe_hw *hw, u8 *rss_key)
* or if the operation is not supported for this device type.
*/
switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
case ixgbe_mbox_api_16:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_14:
@@ -555,6 +557,7 @@ static s32 ixgbevf_update_xcast_mode(struct ixgbe_hw *hw, int xcast_mode)
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return -EOPNOTSUPP;
@@ -646,6 +649,7 @@ static int ixgbevf_get_pf_link_state(struct ixgbe_hw *hw, ixgbe_link_speed *spee
switch (hw->api_version) {
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return -EOPNOTSUPP;
@@ -669,6 +673,42 @@ static int ixgbevf_get_pf_link_state(struct ixgbe_hw *hw, ixgbe_link_speed *spee
return err;
}
+/**
+ * ixgbevf_negotiate_features_vf - negotiate supported features with PF driver
+ * @hw: pointer to the HW structure
+ * @pf_features: bitmask of features supported by PF
+ *
+ * Return: IXGBE_ERR_MBX in the case of mailbox error,
+ * -EOPNOTSUPP if the op is not supported or 0 on success.
+ */
+static int ixgbevf_negotiate_features_vf(struct ixgbe_hw *hw, u32 *pf_features)
+{
+ u32 msgbuf[2] = {};
+ int err;
+
+ switch (hw->api_version) {
+ case ixgbe_mbox_api_17:
+ break;
+ default:
+ return -EOPNOTSUPP;
+ }
+
+ msgbuf[0] = IXGBE_VF_FEATURES_NEGOTIATE;
+ msgbuf[1] = IXGBEVF_SUPPORTED_FEATURES;
+
+ err = ixgbevf_write_msg_read_ack(hw, msgbuf, msgbuf,
+ ARRAY_SIZE(msgbuf));
+
+ if (err || (msgbuf[0] & IXGBE_VT_MSGTYPE_FAILURE)) {
+ err = IXGBE_ERR_MBX;
+ *pf_features = 0x0;
+ } else {
+ *pf_features = msgbuf[1];
+ }
+
+ return err;
+}
+
/**
* ixgbevf_set_vfta_vf - Set/Unset VLAN filter table address
* @hw: pointer to the HW structure
@@ -799,6 +839,7 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
bool *link_up,
bool autoneg_wait_to_complete)
{
+ struct ixgbevf_adapter *adapter = hw->back;
struct ixgbe_mbx_info *mbx = &hw->mbx;
struct ixgbe_mac_info *mac = &hw->mac;
s32 ret_val = 0;
@@ -825,7 +866,7 @@ static s32 ixgbevf_check_mac_link_vf(struct ixgbe_hw *hw,
* until we are called again and don't report an error
*/
if (mbx->ops.read(hw, &in_msg, 1)) {
- if (hw->api_version >= ixgbe_mbox_api_15)
+ if (adapter->pf_features & IXGBEVF_PF_SUP_ESX_MBX)
mac->get_link_status = false;
goto out;
}
@@ -1026,6 +1067,7 @@ int ixgbevf_get_queues(struct ixgbe_hw *hw, unsigned int *num_tcs,
case ixgbe_mbox_api_14:
case ixgbe_mbox_api_15:
case ixgbe_mbox_api_16:
+ case ixgbe_mbox_api_17:
break;
default:
return 0;
@@ -1080,6 +1122,7 @@ static const struct ixgbe_mac_operations ixgbevf_mac_ops = {
.setup_link = ixgbevf_setup_mac_link_vf,
.check_link = ixgbevf_check_mac_link_vf,
.negotiate_api_version = ixgbevf_negotiate_api_version_vf,
+ .negotiate_features = ixgbevf_negotiate_features_vf,
.set_rar = ixgbevf_set_rar_vf,
.update_mc_addr_list = ixgbevf_update_mc_addr_list_vf,
.update_xcast_mode = ixgbevf_update_xcast_mode,
diff --git a/drivers/net/ethernet/intel/ixgbevf/vf.h b/drivers/net/ethernet/intel/ixgbevf/vf.h
index 2d791bc26ae4..4f19b8900c29 100644
--- a/drivers/net/ethernet/intel/ixgbevf/vf.h
+++ b/drivers/net/ethernet/intel/ixgbevf/vf.h
@@ -26,6 +26,7 @@ struct ixgbe_mac_operations {
s32 (*stop_adapter)(struct ixgbe_hw *);
s32 (*get_bus_info)(struct ixgbe_hw *);
s32 (*negotiate_api_version)(struct ixgbe_hw *hw, int api);
+ int (*negotiate_features)(struct ixgbe_hw *hw, u32 *pf_features);
/* Link */
s32 (*setup_link)(struct ixgbe_hw *, ixgbe_link_speed, bool, bool);
Hello.
We have observed a huge latency increase using `fork()` after ingesting the CVE-2025-38085 fix which leads to the commit `1013af4f585f: mm/hugetlb: fix huge_pmd_unshare() vs GUP-fast race`. On large machines with 1.5TB of memory with 196 cores, we identified mmapping of 1.2TB of shared memory and forking itself dozens or hundreds of times we see a increase of execution times of a factor of 4. The reproducer is at the end of the email.
Comparing the a kernel without this patch with a kernel with this patch applied when spawning 1000 children we see those execution times:
Patched kernel:
$ time make stress
...
real 0m11.275s
user 0m0.177s
sys 0m23.905s
Original kernel :
$ time make stress
...real 0m2.475s
user 0m1.398s
sys 0m2.501s
The patch in question: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id…
My observation/assumption is:
each child touches 100 random pages and despawns
on each despawn `huge_pmd_unshare()` is called
each call to `huge_pmd_unshare()` syncrhonizes all threads using `tlb_remove_table_sync_one()` leading to the regression
I'm happy to provide more information.
Thank you
Stanislav Uschakow
=== Reproducer ===
Setup:
#!/bin/bash
echo "Setting up hugepages for reproduction..."
# hugepages (1.2TB / 2MB = 614400 pages)
REQUIRED_PAGES=614400
# Check current hugepage allocation
CURRENT_PAGES=$(cat /proc/sys/vm/nr_hugepages)
echo "Current hugepages: $CURRENT_PAGES"
if [ "$CURRENT_PAGES" -lt "$REQUIRED_PAGES" ]; then
echo "Allocating $REQUIRED_PAGES hugepages..."
echo $REQUIRED_PAGES | sudo tee /proc/sys/vm/nr_hugepages
ALLOCATED=$(cat /proc/sys/vm/nr_hugepages)
echo "Allocated hugepages: $ALLOCATED"
if [ "$ALLOCATED" -lt "$REQUIRED_PAGES" ]; then
echo "Warning: Could not allocate all required hugepages"
echo "Available: $ALLOCATED, Required: $REQUIRED_PAGES"
fi
fi
echo never | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
echo -e "\nHugepage information:"
cat /proc/meminfo | grep -i huge
echo -e "\nSetup complete. You can now run the reproduction test."
Makefile:
CXX = gcc
CXXFLAGS = -O2 -Wall
TARGET = hugepage_repro
SOURCE = hugepage_repro.c
$(TARGET): $(SOURCE)
$(CXX) $(CXXFLAGS) -o $(TARGET) $(SOURCE)
clean:
rm -f $(TARGET)
setup:
chmod +x setup_hugepages.sh
./setup_hugepages.sh
test: $(TARGET)
./$(TARGET) 20 3
stress: $(TARGET)
./$(TARGET) 1000 1
.PHONY: clean setup test stress
hugepage_repro.c:
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <stdio.h>
#define HUGEPAGE_SIZE (2 * 1024 * 1024) // 2MB
#define TOTAL_SIZE (1200ULL * 1024 * 1024 * 1024) // 1.2TB
#define NUM_HUGEPAGES (TOTAL_SIZE / HUGEPAGE_SIZE)
void* create_hugepage_mapping() {
void* addr = mmap(NULL, TOTAL_SIZE, PROT_READ | PROT_WRITE,
MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
if (addr == MAP_FAILED) {
perror("mmap hugepages failed");
exit(1);
}
return addr;
}
void touch_random_pages(void* addr, int num_touches) {
char* base = (char*)addr;
for (int i = 0; i < num_touches; ++i) {
size_t offset = (rand() % NUM_HUGEPAGES) * HUGEPAGE_SIZE;
volatile char val = base[offset];
(void)val;
}
}
void child_process(void* shared_mem, int child_id) {
struct timespec start, end;
clock_gettime(CLOCK_MONOTONIC, &start);
touch_random_pages(shared_mem, 100);
clock_gettime(CLOCK_MONOTONIC, &end);
long duration = (end.tv_sec - start.tv_sec) * 1000000 +
(end.tv_nsec - start.tv_nsec) / 1000;
printf("Child %d completed in %ld μs\n", child_id, duration);
}
int main(int argc, char* argv[]) {
int num_processes = argc > 1 ? atoi(argv[1]) : 50;
int iterations = argc > 2 ? atoi(argv[2]) : 5;
printf("Creating %lluGB hugepage mapping...\n", TOTAL_SIZE / (1024*1024*1024));
void* shared_mem = create_hugepage_mapping();
for (int iter = 0; iter < iterations; ++iter) {
printf("\nIteration %d: Forking %d processes\n", iter + 1, num_processes);
pid_t children[num_processes];
struct timespec iter_start, iter_end;
clock_gettime(CLOCK_MONOTONIC, &iter_start);
for (int i = 0; i < num_processes; ++i) {
pid_t pid = fork();
if (pid == 0) {
child_process(shared_mem, i);
exit(0);
} else if (pid > 0) {
children[i] = pid;
}
}
for (int i = 0; i < num_processes; ++i) {
waitpid(children[i], NULL, 0);
}
clock_gettime(CLOCK_MONOTONIC, &iter_end);
long iter_duration = (iter_end.tv_sec - iter_start.tv_sec) * 1000 +
(iter_end.tv_nsec - iter_start.tv_nsec) / 1000000;
printf("Iteration completed in %ld ms\n", iter_duration);
}
munmap(shared_mem, TOTAL_SIZE);
return 0;
}
Amazon Web Services Development Center Germany GmbH
Tamara-Danz-Str. 13
10243 Berlin
Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss
Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B
Sitz: Berlin
Ust-ID: DE 365 538 597
The patch below does not apply to the 6.17-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.17.y
git checkout FETCH_HEAD
git cherry-pick -x 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102047-tissue-surplus-ff35@gregkh' --subject-prefix 'PATCH 6.17.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 Mon Sep 17 00:00:00 2001
From: Babu Moger <babu.moger(a)amd.com>
Date: Fri, 10 Oct 2025 12:08:35 -0500
Subject: [PATCH] x86/resctrl: Fix miscount of bandwidth event when
reactivating previously unavailable RMID
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$mount -t resctrl resctrl /sys/fs/resctrl
$mkdir /sys/fs/resctrl/mon_groups/test1/
$echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
17592178699059 <- Overflow on domain 0
3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
Bandwidth (MBM).
[ bp: Split commit message into smaller paragraph chunks for better
consumption. ]
Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Tested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Cc: stable(a)vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c8945610d455..2cd25a0d4637 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -242,7 +242,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 unused, u32 rmid, enum resctrl_event_id eventid,
u64 *val, void *ignored)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
int cpu = cpumask_any(&d->hdr.cpu_mask);
+ struct arch_mbm_state *am;
u64 msr_val;
u32 prmid;
int ret;
@@ -251,12 +253,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
ret = __rmid_read_phys(prmid, eventid, &msr_val);
- if (ret)
- return ret;
- *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ if (!ret) {
+ *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ } else if (ret == -EINVAL) {
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am)
+ am->prev_msr = 0;
+ }
- return 0;
+ return ret;
}
static int __cntr_id_read(u32 cntr_id, u64 *val)
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025102049-machine-domestic-c4b2@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 15292f1b4c55a3a7c940dbcb6cb8793871ed3d92 Mon Sep 17 00:00:00 2001
From: Babu Moger <babu.moger(a)amd.com>
Date: Fri, 10 Oct 2025 12:08:35 -0500
Subject: [PATCH] x86/resctrl: Fix miscount of bandwidth event when
reactivating previously unavailable RMID
Users can create as many monitoring groups as the number of RMIDs supported
by the hardware. However, on AMD systems, only a limited number of RMIDs
are guaranteed to be actively tracked by the hardware. RMIDs that exceed
this limit are placed in an "Unavailable" state.
When a bandwidth counter is read for such an RMID, the hardware sets
MSR_IA32_QM_CTR.Unavailable (bit 62). When such an RMID starts being tracked
again the hardware counter is reset to zero. MSR_IA32_QM_CTR.Unavailable
remains set on first read after tracking re-starts and is clear on all
subsequent reads as long as the RMID is tracked.
resctrl miscounts the bandwidth events after an RMID transitions from the
"Unavailable" state back to being tracked. This happens because when the
hardware starts counting again after resetting the counter to zero, resctrl
in turn compares the new count against the counter value stored from the
previous time the RMID was tracked.
This results in resctrl computing an event value that is either undercounting
(when new counter is more than stored counter) or a mistaken overflow (when
new counter is less than stored counter).
Reset the stored value (arch_mbm_state::prev_msr) of MSR_IA32_QM_CTR to
zero whenever the RMID is in the "Unavailable" state to ensure accurate
counting after the RMID resets to zero when it starts to be tracked again.
Example scenario that results in mistaken overflow
==================================================
1. The resctrl filesystem is mounted, and a task is assigned to a
monitoring group.
$mount -t resctrl resctrl /sys/fs/resctrl
$mkdir /sys/fs/resctrl/mon_groups/test1/
$echo 1234 > /sys/fs/resctrl/mon_groups/test1/tasks
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
21323 <- Total bytes on domain 0
"Unavailable" <- Total bytes on domain 1
Task is running on domain 0. Counter on domain 1 is "Unavailable".
2. The task runs on domain 0 for a while and then moves to domain 1. The
counter starts incrementing on domain 1.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
7345357 <- Total bytes on domain 0
4545 <- Total bytes on domain 1
3. At some point, the RMID in domain 0 transitions to the "Unavailable"
state because the task is no longer executing in that domain.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
"Unavailable" <- Total bytes on domain 0
434341 <- Total bytes on domain 1
4. Since the task continues to migrate between domains, it may eventually
return to domain 0.
$cat /sys/fs/resctrl/mon_groups/test1/mon_data/mon_L3_*/mbm_total_bytes
17592178699059 <- Overflow on domain 0
3232332 <- Total bytes on domain 1
In this case, the RMID on domain 0 transitions from "Unavailable" state to
active state. The hardware sets MSR_IA32_QM_CTR.Unavailable (bit 62) when
the counter is read and begins tracking the RMID counting from 0.
Subsequent reads succeed but return a value smaller than the previously
saved MSR value (7345357). Consequently, the resctrl's overflow logic is
triggered, it compares the previous value (7345357) with the new, smaller
value and incorrectly interprets this as a counter overflow, adding a large
delta.
In reality, this is a false positive: the counter did not overflow but was
simply reset when the RMID transitioned from "Unavailable" back to active
state.
Here is the text from APM [1] available from [2].
"In PQOS Version 2.0 or higher, the MBM hardware will set the U bit on the
first QM_CTR read when it begins tracking an RMID that it was not
previously tracking. The U bit will be zero for all subsequent reads from
that RMID while it is still tracked by the hardware. Therefore, a QM_CTR
read with the U bit set when that RMID is in use by a processor can be
considered 0 when calculating the difference with a subsequent read."
[1] AMD64 Architecture Programmer's Manual Volume 2: System Programming
Publication # 24593 Revision 3.41 section 19.3.3 Monitoring L3 Memory
Bandwidth (MBM).
[ bp: Split commit message into smaller paragraph chunks for better
consumption. ]
Fixes: 4d05bf71f157d ("x86/resctrl: Introduce AMD QOS feature")
Signed-off-by: Babu Moger <babu.moger(a)amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre(a)intel.com>
Tested-by: Reinette Chatre <reinette.chatre(a)intel.com>
Cc: stable(a)vger.kernel.org # needs adjustments for <= v6.17
Link: https://bugzilla.kernel.org/show_bug.cgi?id=206537 # [2]
diff --git a/arch/x86/kernel/cpu/resctrl/monitor.c b/arch/x86/kernel/cpu/resctrl/monitor.c
index c8945610d455..2cd25a0d4637 100644
--- a/arch/x86/kernel/cpu/resctrl/monitor.c
+++ b/arch/x86/kernel/cpu/resctrl/monitor.c
@@ -242,7 +242,9 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
u32 unused, u32 rmid, enum resctrl_event_id eventid,
u64 *val, void *ignored)
{
+ struct rdt_hw_mon_domain *hw_dom = resctrl_to_arch_mon_dom(d);
int cpu = cpumask_any(&d->hdr.cpu_mask);
+ struct arch_mbm_state *am;
u64 msr_val;
u32 prmid;
int ret;
@@ -251,12 +253,16 @@ int resctrl_arch_rmid_read(struct rdt_resource *r, struct rdt_mon_domain *d,
prmid = logical_rmid_to_physical_rmid(cpu, rmid);
ret = __rmid_read_phys(prmid, eventid, &msr_val);
- if (ret)
- return ret;
- *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ if (!ret) {
+ *val = get_corrected_val(r, d, rmid, eventid, msr_val);
+ } else if (ret == -EINVAL) {
+ am = get_arch_mbm_state(hw_dom, rmid, eventid);
+ if (am)
+ am->prev_msr = 0;
+ }
- return 0;
+ return ret;
}
static int __cntr_id_read(u32 cntr_id, u64 *val)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x eed0e3d305530066b4fc5370107cff8ef1a0d229
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101624-attitude-destruct-3559@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eed0e3d305530066b4fc5370107cff8ef1a0d229 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)kernel.org>
Date: Sat, 9 Aug 2025 10:19:39 -0700
Subject: [PATCH] KEYS: trusted_tpm1: Compare HMAC values in constant time
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
[For the Fixes commit I used the commit that introduced the memcmp().
It predates the introduction of crypto_memneq(), but it was still a bug
at the time even though a helper function didn't exist yet.]
Fixes: d00a1c72f7f4 ("keys: add new trusted key-type")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
diff --git a/security/keys/trusted-keys/trusted_tpm1.c b/security/keys/trusted-keys/trusted_tpm1.c
index 89c9798d1800..e73f2c6c817a 100644
--- a/security/keys/trusted-keys/trusted_tpm1.c
+++ b/security/keys/trusted-keys/trusted_tpm1.c
@@ -7,6 +7,7 @@
*/
#include <crypto/hash_info.h>
+#include <crypto/utils.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/parser.h>
@@ -241,7 +242,7 @@ int TSS_checkhmac1(unsigned char *buffer,
if (ret < 0)
goto out;
- if (memcmp(testhmac, authdata, SHA1_DIGEST_SIZE))
+ if (crypto_memneq(testhmac, authdata, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
kfree_sensitive(sdesc);
@@ -334,7 +335,7 @@ static int TSS_checkhmac2(unsigned char *buffer,
TPM_NONCE_SIZE, ononce, 1, continueflag1, 0, 0);
if (ret < 0)
goto out;
- if (memcmp(testhmac1, authdata1, SHA1_DIGEST_SIZE)) {
+ if (crypto_memneq(testhmac1, authdata1, SHA1_DIGEST_SIZE)) {
ret = -EINVAL;
goto out;
}
@@ -343,7 +344,7 @@ static int TSS_checkhmac2(unsigned char *buffer,
TPM_NONCE_SIZE, ononce, 1, continueflag2, 0, 0);
if (ret < 0)
goto out;
- if (memcmp(testhmac2, authdata2, SHA1_DIGEST_SIZE))
+ if (crypto_memneq(testhmac2, authdata2, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
kfree_sensitive(sdesc);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x eed0e3d305530066b4fc5370107cff8ef1a0d229
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101623-bleep-cold-406b@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From eed0e3d305530066b4fc5370107cff8ef1a0d229 Mon Sep 17 00:00:00 2001
From: Eric Biggers <ebiggers(a)kernel.org>
Date: Sat, 9 Aug 2025 10:19:39 -0700
Subject: [PATCH] KEYS: trusted_tpm1: Compare HMAC values in constant time
To prevent timing attacks, HMAC value comparison needs to be constant
time. Replace the memcmp() with the correct function, crypto_memneq().
[For the Fixes commit I used the commit that introduced the memcmp().
It predates the introduction of crypto_memneq(), but it was still a bug
at the time even though a helper function didn't exist yet.]
Fixes: d00a1c72f7f4 ("keys: add new trusted key-type")
Cc: stable(a)vger.kernel.org
Signed-off-by: Eric Biggers <ebiggers(a)kernel.org>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
diff --git a/security/keys/trusted-keys/trusted_tpm1.c b/security/keys/trusted-keys/trusted_tpm1.c
index 89c9798d1800..e73f2c6c817a 100644
--- a/security/keys/trusted-keys/trusted_tpm1.c
+++ b/security/keys/trusted-keys/trusted_tpm1.c
@@ -7,6 +7,7 @@
*/
#include <crypto/hash_info.h>
+#include <crypto/utils.h>
#include <linux/init.h>
#include <linux/slab.h>
#include <linux/parser.h>
@@ -241,7 +242,7 @@ int TSS_checkhmac1(unsigned char *buffer,
if (ret < 0)
goto out;
- if (memcmp(testhmac, authdata, SHA1_DIGEST_SIZE))
+ if (crypto_memneq(testhmac, authdata, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
kfree_sensitive(sdesc);
@@ -334,7 +335,7 @@ static int TSS_checkhmac2(unsigned char *buffer,
TPM_NONCE_SIZE, ononce, 1, continueflag1, 0, 0);
if (ret < 0)
goto out;
- if (memcmp(testhmac1, authdata1, SHA1_DIGEST_SIZE)) {
+ if (crypto_memneq(testhmac1, authdata1, SHA1_DIGEST_SIZE)) {
ret = -EINVAL;
goto out;
}
@@ -343,7 +344,7 @@ static int TSS_checkhmac2(unsigned char *buffer,
TPM_NONCE_SIZE, ononce, 1, continueflag2, 0, 0);
if (ret < 0)
goto out;
- if (memcmp(testhmac2, authdata2, SHA1_DIGEST_SIZE))
+ if (crypto_memneq(testhmac2, authdata2, SHA1_DIGEST_SIZE))
ret = -EINVAL;
out:
kfree_sensitive(sdesc);
The patch below does not apply to the 6.12-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.12.y
git checkout FETCH_HEAD
git cherry-pick -x 833d4313bc1e9e194814917d23e8874d6b651649
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2025101604-chamber-playhouse-5278@gregkh' --subject-prefix 'PATCH 6.12.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 833d4313bc1e9e194814917d23e8874d6b651649 Mon Sep 17 00:00:00 2001
From: "Matthieu Baerts (NGI0)" <matttbe(a)kernel.org>
Date: Thu, 18 Sep 2025 10:50:18 +0200
Subject: [PATCH] mptcp: reset blackhole on success with non-loopback ifaces
When a first MPTCP connection gets successfully established after a
blackhole period, 'active_disable_times' was supposed to be reset when
this connection was done via any non-loopback interfaces.
Unfortunately, the opposite condition was checked: only reset when the
connection was established via a loopback interface. Fixing this by
simply looking at the opposite.
This is similar to what is done with TCP FastOpen, see
tcp_fastopen_active_disable_ofo_check().
This patch is a follow-up of a previous discussion linked to commit
893c49a78d9f ("mptcp: Use __sk_dst_get() and dst_dev_rcu() in
mptcp_active_enable()."), see [1].
Fixes: 27069e7cb3d1 ("mptcp: disable active MPTCP in case of blackhole")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/4209a283-8822-47bd-95b7-87e96d9b7ea3@kernel.org [1]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu(a)google.com>
Link: https://patch.msgid.link/20250918-net-next-mptcp-blackhole-reset-loopback-v…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/mptcp/ctrl.c b/net/mptcp/ctrl.c
index e8ffa62ec183..d96130e49942 100644
--- a/net/mptcp/ctrl.c
+++ b/net/mptcp/ctrl.c
@@ -507,7 +507,7 @@ void mptcp_active_enable(struct sock *sk)
rcu_read_lock();
dst = __sk_dst_get(sk);
dev = dst ? dst_dev_rcu(dst) : NULL;
- if (dev && (dev->flags & IFF_LOOPBACK))
+ if (!(dev && (dev->flags & IFF_LOOPBACK)))
atomic_set(&pernet->active_disable_times, 0);
rcu_read_unlock();
}