On 14.02.2022 18:38, Borislav Petkov wrote:
On Wed, Feb 09, 2022 at 01:04:36PM -0800, Pawan Gupta wrote:
tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use MSR_IA32_TSX_CTRL when supported.
Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts") Reported-by: kernel test robot lkp@intel.com Tested-by: Neelima Krishnan neelima.krishnan@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com
<--- I'm assuming this needs to be
?
Yes, this needs to be backported to a few kernels that have the commit 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts"). Once this is reviewed, I will send a separate email to stable@ with the list of stable kernels.
@@ -106,13 +110,11 @@ void __init tsx_init(void) int ret;
/*
* Hardware will always abort a TSX transaction if both CPUID bits
* RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is
* better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them
* here.
* Hardware will always abort a TSX transaction when CPUID
* RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate
*/* CPUID.RTM and CPUID.HLE bits. Clear them here.
- if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) &&
boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) {
- if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
So you test here X86_FEATURE_RTM_ALWAYS_ABORT and tsx_clear_cpuid() tests it again. Simplify I guess.
X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of tsx_clear_cpuid() this condition is met, and test for X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers must also have this check, otherwise the MSR write will fault.
tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT; tsx_clear_cpuid(); setup_clear_cpu_cap(X86_FEATURE_RTM);
Also, here you clear X86_FEATURE_RTM and X86_FEATURE_HLE but the other caller of tsx_clear_cpuid() - init_intel() - doesn't clear those bits. Why?
Calling setup_clear_cpu_cap() by boot CPU is sufficient to clear these bits for secondary CPUs. Moreover, init_intel() is also called during cpu hotplug, clearing cached CPUID bits will not be safe after boot.
IOW, can we concentrate the whole tsx_clear_cpuid() logic inside it and have callers only call it unconditionally. Then the decision whether to disable TSX and clear bits will happen all solely in that function instead of being spread around the boot code and being inconsistent.
There are certain cases where this will leave the system in an inconsistent state, for example smt toggle after a late microcode update that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of the CPUs will report TSX feature and other half will not.
Thanks, Pawan
On Mon, Feb 14, 2022 at 02:41:21PM -0800, Pawan Gupta wrote:
Yes, this needs to be backported to a few kernels that have the commit 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts"). Once this is reviewed, I will send a separate email to stable@ with the list of stable kernels.
You don't have to send a separate email - CC: stable and the Fixes tag is enough for a patch to be picked up by the stable folks.
X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of tsx_clear_cpuid() this condition is met, and test for X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers must also have this check, otherwise the MSR write will fault.
I meant something like this (completely untested):
diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c index c2343ea911e8..9d08a6b1726a 100644 --- a/arch/x86/kernel/cpu/tsx.c +++ b/arch/x86/kernel/cpu/tsx.c @@ -84,7 +84,7 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void) return TSX_CTRL_ENABLE; }
-void tsx_clear_cpuid(void) +bool tsx_clear_cpuid(void) { u64 msr;
@@ -97,11 +97,14 @@ void tsx_clear_cpuid(void) rdmsrl(MSR_TSX_FORCE_ABORT, msr); msr |= MSR_TFA_TSX_CPUID_CLEAR; wrmsrl(MSR_TSX_FORCE_ABORT, msr); + return true; } else if (tsx_ctrl_is_supported()) { rdmsrl(MSR_IA32_TSX_CTRL, msr); msr |= TSX_CTRL_CPUID_CLEAR; wrmsrl(MSR_IA32_TSX_CTRL, msr); + return true; } + return false; }
void __init tsx_init(void) @@ -114,9 +117,8 @@ void __init tsx_init(void) * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate * CPUID.RTM and CPUID.HLE bits. Clear them here. */ - if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) { + if (tsx_clear_cpuid()) { tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT; - tsx_clear_cpuid(); setup_clear_cpu_cap(X86_FEATURE_RTM); setup_clear_cpu_cap(X86_FEATURE_HLE); return;
---
but I'm guessing TSX should be disabled by default during boot only when X86_FEATURE_RTM_ALWAYS_ABORT is set.
If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL but don't have MSR_TSX_FORCE_ABORT - if those CPUs set X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.
There are certain cases where this will leave the system in an inconsistent state, for example smt toggle after a late microcode update
What is a "smt toggle"?
You mean late microcode update and then offlining and onlining all logical CPUs except the BSP which would re-detect CPUID features?
that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of the CPUs will report TSX feature and other half will not.
That is important and should be documented. Something like this perhaps:
---
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 8321c43554a1..6c7bca9d6f2e 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -722,6 +722,13 @@ static void init_intel(struct cpuinfo_x86 *c) else if (tsx_ctrl_state == TSX_CTRL_DISABLE) tsx_disable(); else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT) + /* + * This call doesn't clear RTM and HLE X86_FEATURE bits because + * a late microcode reload adding MSR_TSX_FORCE_ABORT can cause + * for those bits to get cleared - something which the kernel + * cannot do due to userspace potentially already using said + * features. + */ tsx_clear_cpuid();
split_lock_init();
On 15.02.2022 00:28, Borislav Petkov wrote:
On Mon, Feb 14, 2022 at 02:41:21PM -0800, Pawan Gupta wrote:
X86_FEATURE_RTM_ALWAYS_ABORT is the precondition for MSR_TFA_TSX_CPUID_CLEAR bit to exist. For current callers of tsx_clear_cpuid() this condition is met, and test for X86_FEATURE_RTM_ALWAYS_ABORT can be removed. But, all the future callers must also have this check, otherwise the MSR write will fault.
I meant something like this (completely untested):
diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c index c2343ea911e8..9d08a6b1726a 100644 --- a/arch/x86/kernel/cpu/tsx.c +++ b/arch/x86/kernel/cpu/tsx.c @@ -84,7 +84,7 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void) return TSX_CTRL_ENABLE; }
-void tsx_clear_cpuid(void) +bool tsx_clear_cpuid(void) { u64 msr;
@@ -97,11 +97,14 @@ void tsx_clear_cpuid(void) rdmsrl(MSR_TSX_FORCE_ABORT, msr); msr |= MSR_TFA_TSX_CPUID_CLEAR; wrmsrl(MSR_TSX_FORCE_ABORT, msr);
} else if (tsx_ctrl_is_supported()) { rdmsrl(MSR_IA32_TSX_CTRL, msr); msr |= TSX_CTRL_CPUID_CLEAR; wrmsrl(MSR_IA32_TSX_CTRL, msr);return true;
This will clear TSX CPUID when X86_FEATURE_RTM_ALWAYS_ABORT=0 also, because ...
}return true;
- return false;
}
void __init tsx_init(void) @@ -114,9 +117,8 @@ void __init tsx_init(void) * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate * CPUID.RTM and CPUID.HLE bits. Clear them here. */
- if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) {
- if (tsx_clear_cpuid()) {
... we are calling tsx_clear_cpuid() unconditionally.
tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT;
setup_clear_cpu_cap(X86_FEATURE_RTM); setup_clear_cpu_cap(X86_FEATURE_HLE); return;tsx_clear_cpuid();
but I'm guessing TSX should be disabled by default during boot only when X86_FEATURE_RTM_ALWAYS_ABORT is set.
That is correct.
If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL but don't have MSR_TSX_FORCE_ABORT - if those CPUs set X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.
There are certain cases where this will leave the system in an inconsistent state, for example smt toggle after a late microcode update
What is a "smt toggle"?
Sorry, I should have been more clear. I meant:
# echo off > /sys/devices/system/cpu/smt/control # echo on > /sys/devices/system/cpu/smt/control
You mean late microcode update and then offlining and onlining all logical CPUs except the BSP which would re-detect CPUID features?
That could also be the problematic case.
that adds CPUID.RTM_ALWAYS_ABORT=1. During an smt toggle, if we unconditionally clear CPUID.RTM and CPUID.HLE in init_intel(), half of the CPUs will report TSX feature and other half will not.
That is important and should be documented. Something like this perhaps:
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 8321c43554a1..6c7bca9d6f2e 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -722,6 +722,13 @@ static void init_intel(struct cpuinfo_x86 *c) else if (tsx_ctrl_state == TSX_CTRL_DISABLE) tsx_disable(); else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT)
/*
* This call doesn't clear RTM and HLE X86_FEATURE bits because
* a late microcode reload adding MSR_TSX_FORCE_ABORT can cause
* for those bits to get cleared - something which the kernel
* cannot do due to userspace potentially already using said
* features.
tsx_clear_cpuid();*/
Thanks, I will add this comment in the next revision.
Pawan
On Mon, Feb 14, 2022 at 04:20:14PM -0800, Pawan Gupta wrote:
... we are calling tsx_clear_cpuid() unconditionally.
I know, that's why I asked...
If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL but don't have MSR_TSX_FORCE_ABORT - if those CPUs set X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.
... this^^.
IOW, what are you fixing here exactly?
Let's look at the two callsites of tsx_clear_cpuid():
1. tsx_init: that will do something on X86_FEATURE_RTM_ALWAYS_ABORT CPUs.
2. init_intel: that will get called when
tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT
But TSX_CTRL_RTM_ALWAYS_ABORT gets set only when X86_FEATURE_RTM_ALWAYS_ABORT is set. I.e., the first case, in tsx_init().
So, IIUC, you wanna fix the case where CPUs which set X86_FEATURE_RTM_ALWAYS_ABORT but *don't* have MSR_TSX_FORCE_ABORT, those CPUs should still disable TSX through MSR_IA32_TSX_CTRL.
Correct?
On 15.02.2022 11:24, Borislav Petkov wrote:
On Mon, Feb 14, 2022 at 04:20:14PM -0800, Pawan Gupta wrote:
... we are calling tsx_clear_cpuid() unconditionally.
I know, that's why I asked...
If those CPUs which support only disabling TSX through MSR_IA32_TSX_CTRL but don't have MSR_TSX_FORCE_ABORT - if those CPUs set X86_FEATURE_RTM_ALWAYS_ABORT too, then this should work.
... this^^.
IOW, what are you fixing here exactly?
Let's look at the two callsites of tsx_clear_cpuid():
tsx_init: that will do something on X86_FEATURE_RTM_ALWAYS_ABORT CPUs.
init_intel: that will get called when
tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT
But TSX_CTRL_RTM_ALWAYS_ABORT gets set only when X86_FEATURE_RTM_ALWAYS_ABORT is set. I.e., the first case, in tsx_init().
So, IIUC, you wanna fix the case where CPUs which set X86_FEATURE_RTM_ALWAYS_ABORT but *don't* have MSR_TSX_FORCE_ABORT, those CPUs should still disable TSX through MSR_IA32_TSX_CTRL.
Correct?
That is exactly what this patch is fixing. Please let me know if you have any questions.
Thanks, Pawan
On Tue, Feb 15, 2022 at 04:11:03AM -0800, Pawan Gupta wrote:
That is exactly what this patch is fixing. Please let me know if you have any questions.
Just one: does the explanation I've written for this mess, sound about right?
I'd like for this to be documented so that I don't scratch my head again when looking at this again later.
Btw, lemme add Cooper to Cc to doublecheck me - he usually knows those things.
Thx.
--- From: Pawan Gupta pawan.kumar.gupta@linux.intel.com Date: Wed, 9 Feb 2022 13:04:36 -0800 Subject: [PATCH] x86/tsx: Use MSR_TSX_CTRL to clear CPUID bits
tsx_clear_cpuid() uses MSR_TSX_FORCE_ABORT to clear CPUID.RTM and CPUID.HLE. Not all CPUs support MSR_TSX_FORCE_ABORT, alternatively use MSR_IA32_TSX_CTRL when supported.
[ bp: Document how and why TSX gets disabled. ]
Fixes: 293649307ef9 ("x86/tsx: Clear CPUID bits when TSX always force aborts") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Pawan Gupta pawan.kumar.gupta@linux.intel.com Signed-off-by: Borislav Petkov bp@suse.de Tested-by: Neelima Krishnan neelima.krishnan@intel.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/5bd785a1d6ea0b572250add0c6617b4504bc24d1.164444031... --- arch/x86/kernel/cpu/intel.c | 1 + arch/x86/kernel/cpu/tsx.c | 54 ++++++++++++++++++++++++++++++++----- 2 files changed, 48 insertions(+), 7 deletions(-)
diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 8321c43554a1..8abf995677a4 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -722,6 +722,7 @@ static void init_intel(struct cpuinfo_x86 *c) else if (tsx_ctrl_state == TSX_CTRL_DISABLE) tsx_disable(); else if (tsx_ctrl_state == TSX_CTRL_RTM_ALWAYS_ABORT) + /* See comment over that function for more details. */ tsx_clear_cpuid();
split_lock_init(); diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c index 9c7a5f049292..2835fa89fc6f 100644 --- a/arch/x86/kernel/cpu/tsx.c +++ b/arch/x86/kernel/cpu/tsx.c @@ -58,7 +58,7 @@ void tsx_enable(void) wrmsrl(MSR_IA32_TSX_CTRL, tsx); }
-static bool __init tsx_ctrl_is_supported(void) +static bool tsx_ctrl_is_supported(void) { u64 ia32_cap = x86_read_arch_cap_msr();
@@ -84,6 +84,44 @@ static enum tsx_ctrl_states x86_get_tsx_auto_mode(void) return TSX_CTRL_ENABLE; }
+/* + * Disabling TSX is not a trivial business. + * + * First of all, there's a CPUID bit: X86_FEATURE_RTM_ALWAYS_ABORT + * which says that TSX is practically disabled (all transactions are + * aborted by default). When that bit is set, the kernel unconditionally + * disables TSX. + * + * In order to do that, however, it needs to dance a bit: + * + * 1. The first method to disable it is through MSR_TSX_FORCE_ABORT and + * the MSR is present only when *two* CPUID bits are set: + * + * - X86_FEATURE_RTM_ALWAYS_ABORT + * - X86_FEATURE_TSX_FORCE_ABORT + * + * 2. The second method is for CPUs which do not have the above-mentioned + * MSR: those use a different MSR - MSR_IA32_TSX_CTRL and disable TSX + * through that one. Those CPUs can also have the initially mentioned + * CPUID bit X86_FEATURE_RTM_ALWAYS_ABORT set and for those the same strategy + * applies: TSX gets disabled unconditionally. + * + * When either of the two methods are present, the kernel disables TSX and + * clears the respective RTM and HLE feature flags. + * + * An additional twist in the whole thing presents late microcode loading + * which, when done, may cause for the X86_FEATURE_RTM_ALWAYS_ABORT CPUID + * bit to be set after the update. + * + * A subsequent hotplug operation on any logical CPU except the BSP will + * cause for the supported CPUID feature bits to get re-detected and, if + * RTM and HLE get cleared all of a sudden, but, userspace did consult + * them before the update, then funny explosions will happen. Long story + * short: the kernel doesn't modify CPUID feature bits after booting. + * + * That's why, this function's call in init_intel() doesn't clear the + * feature flags. + */ void tsx_clear_cpuid(void) { u64 msr; @@ -97,6 +135,10 @@ void tsx_clear_cpuid(void) rdmsrl(MSR_TSX_FORCE_ABORT, msr); msr |= MSR_TFA_TSX_CPUID_CLEAR; wrmsrl(MSR_TSX_FORCE_ABORT, msr); + } else if (tsx_ctrl_is_supported()) { + rdmsrl(MSR_IA32_TSX_CTRL, msr); + msr |= TSX_CTRL_CPUID_CLEAR; + wrmsrl(MSR_IA32_TSX_CTRL, msr); } }
@@ -106,13 +148,11 @@ void __init tsx_init(void) int ret;
/* - * Hardware will always abort a TSX transaction if both CPUID bits - * RTM_ALWAYS_ABORT and TSX_FORCE_ABORT are set. In this case, it is - * better not to enumerate CPUID.RTM and CPUID.HLE bits. Clear them - * here. + * Hardware will always abort a TSX transaction when CPUID + * RTM_ALWAYS_ABORT is set. In this case, it is better not to enumerate + * CPUID.RTM and CPUID.HLE bits. Clear them here. */ - if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT) && - boot_cpu_has(X86_FEATURE_TSX_FORCE_ABORT)) { + if (boot_cpu_has(X86_FEATURE_RTM_ALWAYS_ABORT)) { tsx_ctrl_state = TSX_CTRL_RTM_ALWAYS_ABORT; tsx_clear_cpuid(); setup_clear_cpu_cap(X86_FEATURE_RTM);
On 15.02.2022 17:31, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 04:11:03AM -0800, Pawan Gupta wrote:
That is exactly what this patch is fixing. Please let me know if you have any questions.
Just one: does the explanation I've written for this mess, sound about right?
I admit it has gotten complicated with so many bits associated with TSX. Your explanation is accurate. I just have a small suggestion below.
+/*
- Disabling TSX is not a trivial business.
- First of all, there's a CPUID bit: X86_FEATURE_RTM_ALWAYS_ABORT
- which says that TSX is practically disabled (all transactions are
- aborted by default). When that bit is set, the kernel unconditionally
- disables TSX.
- In order to do that, however, it needs to dance a bit:
- The first method to disable it is through MSR_TSX_FORCE_ABORT and
- the MSR is present only when *two* CPUID bits are set:
s/MSR/MSR bit MSR_TFA_TSX_CPUID_CLEAR/
Thanks, Pawan
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so. And lemme quote from that patch:
/* * Probe for the February 2022 microcode which de-features TSX on * TAA-vulnerable client parts - WHL-R/CFL-R. * * RTM_ALWAYS_ABORT (read above) enumerates the new functionality, * but is read as zero if MCU_OPT_CTRL.RTM_ALLOW has been set * before we run. Undo this. */
so MCU_OPT_CTRL.RTM_ALLOW=1 would have CPUID.X86_FEATURE_RTM_ALWAYS_ABORT=0, which means, we cannot tie TSX disabling on X86_FEATURE_RTM_ALWAYS_ABORT only.
So this would need more work, it seems to me.
Thx.
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
And lemme quote from that patch:
/* * Probe for the February 2022 microcode which de-features TSX on * TAA-vulnerable client parts - WHL-R/CFL-R. * * RTM_ALWAYS_ABORT (read above) enumerates the new functionality, * but is read as zero if MCU_OPT_CTRL.RTM_ALLOW has been set * before we run. Undo this. */
Such development-only bit (SDV_ENABLE_RTM) existed for previous round of TSX defeature, but we decided not to use it:
https://lore.kernel.org/lkml/20210611232114.3dmmkpkkcqg5orhw@gupta-dev2.loca...
I am not sure why do we need to handle RTM_ALLOW this time? I will update the patch if you think otherwise.
Thanks, Pawan
[1] Intel Transactional Synchronization Extension (Intel TSX) Disable Update for Selected Processors https://cdrdv2.intel.com/v1/dl/getContent/643557
so MCU_OPT_CTRL.RTM_ALLOW=1 would have CPUID.X86_FEATURE_RTM_ALWAYS_ABORT=0, which means, we cannot tie TSX disabling on X86_FEATURE_RTM_ALWAYS_ABORT only.
So this would need more work, it seems to me.
Thx.
-- Regards/Gruss, Boris.
On 16/02/2022 00:39, Pawan Gupta wrote:
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
The purpose of setting RTM_ALLOW isn't to enable TSX per say.
The purpose is to make MSR_TSX_CTRL.RTM_DISABLE behaves consistently on all hardware, which reduces the complexity and invasiveness of dealing with this special case, because the TAA workaround will still turn TSX off by default.
The configuration you don't want to be running with is RTM_ALLOW && !RTM_DISABLE, because that is "still vulnerable to TSX Async Abort".
~Andrew
On 16.02.2022 00:49, Andrew Cooper wrote:
On 16/02/2022 00:39, Pawan Gupta wrote:
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
The purpose of setting RTM_ALLOW isn't to enable TSX per say.
The purpose is to make MSR_TSX_CTRL.RTM_DISABLE behaves consistently on all hardware, which reduces the complexity and invasiveness of dealing with this special case, because the TAA workaround will still turn TSX off by default.
The configuration you don't want to be running with is RTM_ALLOW && !RTM_DISABLE, because that is "still vulnerable to TSX Async Abort".
I am not sure how a system can end up with RTM_ALLOW && !RTM_DISABLE? I have no problem taking care of this case, I am just debating why we need it.
One way to get to this state is BIOS sets RTM_ALLOW (dont know why?) and linux cmdline has tsx=on.
Thanks, Pawan
On 15.02.2022 17:28, Pawan Gupta wrote:
On 16.02.2022 00:49, Andrew Cooper wrote:
On 16/02/2022 00:39, Pawan Gupta wrote:
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
The purpose of setting RTM_ALLOW isn't to enable TSX per say.
The purpose is to make MSR_TSX_CTRL.RTM_DISABLE behaves consistently on all hardware, which reduces the complexity and invasiveness of dealing with this special case, because the TAA workaround will still turn TSX off by default.
The configuration you don't want to be running with is RTM_ALLOW && !RTM_DISABLE, because that is "still vulnerable to TSX Async Abort".
I am not sure how a system can end up with RTM_ALLOW && !RTM_DISABLE? I have no problem taking care of this case, I am just debating why we need it.
One way to get to this state is BIOS sets RTM_ALLOW (dont know why?) and linux cmdline has tsx=on.
If RTM_ALLOW is set for any reason, we can still deny tsx=on request. Below patch should do that (I haven't tested it yet).
Alternatively, we can reset RTM_ALLOW, which will set CPUID.RTM_ALWAYS_ABORT and may require re-enumeration of CPU features or otherwise setup_force_cpu_cap(X86_FEATURE_RTM_ALWAYS_ABORT) should also work.
--- diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h index 3faf0f97edb1..2ef58bcfb1e4 100644 --- a/arch/x86/include/asm/msr-index.h +++ b/arch/x86/include/asm/msr-index.h @@ -131,6 +131,8 @@ /* SRBDS support */ #define MSR_IA32_MCU_OPT_CTRL 0x00000123 #define RNGDS_MITG_DIS BIT(0) +#define MCU_OPT_CTRL_RTM_ALLOW BIT(1) +#define MCU_OPT_CTRL_RTM_LOCKED BIT(2)
#define MSR_IA32_SYSENTER_CS 0x00000174 #define MSR_IA32_SYSENTER_ESP 0x00000175 diff --git a/arch/x86/kernel/cpu/tsx.c b/arch/x86/kernel/cpu/tsx.c index 2835fa89fc6f..8ac085ac597f 100644 --- a/arch/x86/kernel/cpu/tsx.c +++ b/arch/x86/kernel/cpu/tsx.c @@ -142,6 +142,29 @@ void tsx_clear_cpuid(void) } }
+/* + * When the microcode released in Feb 2022 is applied, TSX will be disabled by + * default on some processors. MSR 0x122 (TSX_CTRL) and MSR 0x123 + * (IA32_MCU_OPT_CTRL) can be used to re-enable TSX for development, doing so is + * not recommended for production deployments. In particular, applying MD_CLEAR + * flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient + * execution attack may not be effective on these processors when TSX is + * enabled with updated microcode. + * + * Detect if this unsafe TSX development mode is enabled. + */ +static bool tsx_is_unsafe(void) +{ + u64 mcu_opt_ctrl; + + if (!boot_cpu_has(X86_FEATURE_SRBDS_CTRL)) + return false; + + rdmsrl(MSR_IA32_MCU_OPT_CTRL, mcu_opt_ctrl); + + return !!(mcu_opt_ctrl & MCU_OPT_CTRL_RTM_ALLOW); +} + void __init tsx_init(void) { char arg[5] = {}; @@ -163,6 +186,13 @@ void __init tsx_init(void) if (!tsx_ctrl_is_supported()) { tsx_ctrl_state = TSX_CTRL_NOT_SUPPORTED; return; + } else if (tsx_is_unsafe()) { + /* Do not allow tsx=on, when TSX is unsafe to use */ + tsx_ctrl_state = TSX_CTRL_DISABLE; + tsx_disable(); + setup_clear_cpu_cap(X86_FEATURE_RTM); + setup_clear_cpu_cap(X86_FEATURE_HLE); + return; }
ret = cmdline_find_option(boot_command_line, "tsx", arg, sizeof(arg));
On Tue, Feb 15, 2022 at 10:08:26PM -0800, Pawan Gupta wrote:
Alternatively, we can reset RTM_ALLOW,
I'd vote for that. Considering how plagued TSX is, I'm sceptical anyone would be interested in doing development with it so we can reset it for now and allow enabling it later if there's an actual use case for it...
Thx.
On 16.02.2022 11:30, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:08:26PM -0800, Pawan Gupta wrote:
Alternatively, we can reset RTM_ALLOW,
I'd vote for that. Considering how plagued TSX is, I'm sceptical anyone would be interested in doing development with it so we can reset it for now and allow enabling it later if there's an actual use case for it...
Okay, I will add RTM_ALLOW reset logic in the next revision.
Thanks, Pawan
On 16/02/2022 01:28, Pawan Gupta wrote:
On 16.02.2022 00:49, Andrew Cooper wrote:
On 16/02/2022 00:39, Pawan Gupta wrote:
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
The purpose of setting RTM_ALLOW isn't to enable TSX per say.
The purpose is to make MSR_TSX_CTRL.RTM_DISABLE behaves consistently on all hardware, which reduces the complexity and invasiveness of dealing with this special case, because the TAA workaround will still turn TSX off by default.
The configuration you don't want to be running with is RTM_ALLOW && !RTM_DISABLE, because that is "still vulnerable to TSX Async Abort".
I am not sure how a system can end up with RTM_ALLOW && !RTM_DISABLE? I have no problem taking care of this case, I am just debating why we need it.
Well for one, when Linux is starting up as the kexec environment following Xen.
You're not coding for "what logic should follow a clean microcode load". You're coding for "how to take the arbitrary state that my preceding environment left, and turn it into what I want".
Look no further than linuxboot for an environment where your bootloader has already altered these settings.
~Andrew
On 16.02.2022 11:46, Andrew Cooper wrote:
On 16/02/2022 01:28, Pawan Gupta wrote:
On 16.02.2022 00:49, Andrew Cooper wrote:
On 16/02/2022 00:39, Pawan Gupta wrote:
On 15.02.2022 20:33, Borislav Petkov wrote:
On Tue, Feb 15, 2022 at 10:19:31AM -0800, Pawan Gupta wrote:
I admit it has gotten complicated with so many bits associated with TSX.
Yah, and looka here:
https://github.com/andyhhp/xen/commit/ad9f7c3b2e0df38ad6d54f4769d4dccf765fbc...
It seems it isn't complicated enough. ;-\
Andy just made me aware of this thing where you guys have added a new MSR bit:
MSR_MCU_OPT_CTRL[1] which is called something like MCU_OPT_CTRL_RTM_ALLOW or so.
RTM_ALLOW bit was added to MSR_MCU_OPT_CTRL, but its not set by default, and it is *not* recommended to be used in production deployments [1]:
Although MSR 0x122 (TSX_CTRL) and MSR 0x123 (IA32_MCU_OPT_CTRL) can be used to reenable Intel TSX for development, doing so is not recommended for production deployments. In particular, applying MD_CLEAR flows for mitigation of the Intel TSX Asynchronous Abort (TAA) transient execution attack may not be effective on these processors when Intel TSX is enabled with updated microcode. The processors continue to be mitigated against TAA when Intel TSX is disabled.
The purpose of setting RTM_ALLOW isn't to enable TSX per say.
The purpose is to make MSR_TSX_CTRL.RTM_DISABLE behaves consistently on all hardware, which reduces the complexity and invasiveness of dealing with this special case, because the TAA workaround will still turn TSX off by default.
The configuration you don't want to be running with is RTM_ALLOW && !RTM_DISABLE, because that is "still vulnerable to TSX Async Abort".
I am not sure how a system can end up with RTM_ALLOW && !RTM_DISABLE? I have no problem taking care of this case, I am just debating why we need it.
Well for one, when Linux is starting up as the kexec environment following Xen.
You're not coding for "what logic should follow a clean microcode load". You're coding for "how to take the arbitrary state that my preceding environment left, and turn it into what I want".
I will add the handling for this case (and I am going to follow these words of wisdom in my future work).
Thanks, Pawan
linux-stable-mirror@lists.linaro.org