This was marked for stable, and honestly, nowhere in the discussion did I see any mention of just *how* bad the performance impact of this was.
When performance goes down by 50% on some loads, people need to start asking themselves whether it was worth it. It's apparently better to just disable SMT entirely, which is what security-conscious people do anyway.
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
I think we should use the same logic as for L1TF: we default to something that doesn't kill performance. Warn once about it, and let the crazy people say "I'd rather take a 50% performance hit than worry about a theoretical issue".
Linus
On Sun, 18 Nov 2018, Linus Torvalds wrote:
This was marked for stable, and honestly, nowhere in the discussion did I see any mention of just *how* bad the performance impact of this was.
Frankly, I ran some benchmarks myself, and am seeing very, very varying/noisy results, which were rather inconclusive across the individual workloads.
The article someone pointed me to at Phoronix yesterday also was talking about something between 3% and 20% IIRC.
When performance goes down by 50% on some loads, people need to start asking themselves whether it was worth it. It's apparently better to just disable SMT entirely, which is what security-conscious people do anyway.
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
BTW for them, there is no impact at all. And they probably did it because of crypto libraries being implemented in a way that their observable operation depends on value of the secrets (tlbleed etc), but this is being fixed in the said crypto code.
STIBP is only activated on systems with HT on; plus odds are that people who don't care about spectrev2 already have 'nospectre_v2' on their command-line, so they are fine as well.
I think we should use the same logic as for L1TF: we default to something that doesn't kill performance. Warn once about it, and let the crazy people say "I'd rather take a 50% performance hit than worry about a theoretical issue".
So, I think it's as theoretical as any other spectrev2 (only with the extra "HT" condition added on top).
Namely, I think that scenarios such as "one browser tab's javascript is attacking other browser's tab private data on a sibling" is rather a practical one, I've seen such demos (not with the sibling condition being in place, but I don't think that matters that much). The same likely holds for server threads serving individual requests in separate threads, etc (but yeah, you need to have proper gadgets there in place already in server's .text, which makes it much less practical).
For L1TF, the basic argument AFAICS was, that the default situation is "only special people care about strict isolation between VMs". But this is isolation between individual processess.
Tim is currently working [1] on a patchset on top of my original STIBP-enabling commit, that will make STIBP to be used in much smaller number of cases (taking prctl()-based aproach as one of the possible ones, similarly as we did for SSBD), and as I indicated in that thread, I think it should be really considered for this -rc cycle still if it lands in tip in a reasonable future.
To conclude -- I am quite happy that this finally started to move (although it's sad that some of it is due to clickbaity article titles, but whatever), as Intel didn't really provide any patch / guidance (*) in past ~1 year to those who care about spectrev2 isolation on HT, which is something I wasn't really feeling comfortable with, and that's why I submitted the patch.
If we make it opt-in (on kernel cmdline) and issue big fat warning if not mitigated, fine, but then we're bit incosistent how we deal with cross-core (IBPB) and cross-thread (STIBP) spectrev2 security in the kernel code.
If we take Tim's aproach, even better -- there would be means available to make system secure, although not sacrifying performance by default.
I would not prefer going the plain revert path though, as that leaves no other option to mitigate rather than turning SMT off, which might possibly have even *worse* performance numbers depending on workload.
[1] https://lore.kernel.org/lkml/?q=cover.1542418936.git.tim.c.chen%40linux.inte...
(*) no code to implement STIBP sanely, no recommendation about turning SMT off at least for some workloads
Thanks,
On Sun, Nov 18, 2018 at 10:49:44PM +0100, Jiri Kosina wrote:
odds are that people who don't care about spectrev2 already have 'nospectre_v2' on their command-line, so they are fine as well.
FWIW in our appliances, we never modify the boot loader's cmdline in field, we only provide new kernel+rootfs images. We've however disabled the config options for all this class of vulnerabilities. As long as it remains possible to disable the new ones using config options only, that's not an issue for me.
Cheers, Willy
On Sun, Nov 18, 2018 at 1:49 PM Jiri Kosina jikos@kernel.org wrote:
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
BTW for them, there is no impact at all.
Right. People who really care about security and are anal about it do not see *any* advantage of the patch.
But people who aren't that worried suddenly see potentially huge slowdowns.
In other words, the behavior of the patch is basically essentially exactly the reverse of what you'd want. You penalize the people who don't even want it and don't care.
STIBP is only activated on systems with HT on; plus odds are that people who don't care about spectrev2 already have 'nospectre_v2' on their command-line, so they are fine as well.
I'm talking about *normal* people. People who simply aren't all that invested in this all. People who just want to get their work done.
So, I think it's as theoretical as any other spectrev2 (only with the extra "HT" condition added on top).
What? No.
It's *way* more theoretical than something like meltdown, which could be trivially used to get data from another protection domain.
Have you seen any actual realistic attacks for normal human users? Things where the *kernel* should actually care?
The javascript thing is for the browser to fix up, not for the kernel to say "now everything should run up to 50% slower".
Linus
On Sun, 18 Nov 2018, Linus Torvalds wrote:
So, I think it's as theoretical as any other spectrev2 (only with the extra "HT" condition added on top).
What? No.
It's *way* more theoretical than something like meltdown, which could be trivially used to get data from another protection domain.
Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely different beasts.
Have you seen any actual realistic attacks for normal human users? Things where the *kernel* should actually care?
The javascript thing is for the browser to fix up,
It's probably not just browsers, but anything running JITed sandboxed code. So the most straightforward way might be the prctl() aproach, where userspace would claim "I do care about this, please fix it up for me". So prctl() + perhaps SECCOMP.
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that? I think that if Tim's fixup makes it through (it's currently missing SECCOMP handling, but that is trivial to add on top), it might be the best compromise. We'd also have have to make IBPB obey it to be consistent (and get even a few more % of performance back), but that's easy as well.
Thanks,
On Nov 18, 2018, at 2:17 PM, Jiri Kosina jikos@kernel.org wrote:
It's probably not just browsers, but anything running JITed sandboxed code. So the most straightforward way might be the prctl() aproach, where userspace would claim "I do care about this, please fix it up for me". So prctl() + perhaps SECCOMP.
Yeah, the prctl() shifts the pain to the right place: folks explicitly opting in. Always-on seemed way too draconian to me.
On Sun, Nov 18, 2018 at 2:19 PM Jiri Kosina jikos@kernel.org wrote:
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that? I think that if Tim's fixup makes it through (it's currently missing SECCOMP handling, but that is trivial to add on top), it might be the best compromise. We'd also have have to make IBPB obey it to be consistent (and get even a few more % of performance back), but that's easy as well.
+1 for Tim's patch. That make us more consistent with how we handled L1TF (giving the system owner a control knob to decide whether they want this level of fixup, based on their own analysis of their vulnerability).
-Tony
On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina jikos@kernel.org wrote:
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that?
I don't think the code needs to be reverted, but the *behavior* of just unconditionally enabling STIBP needs to be reverted.
Because it was clearly way more expensive than people were told.
Linus
On 11/18/2018 02:36 PM, Linus Torvalds wrote:
On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina jikos@kernel.org wrote:
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that?
I don't think the code needs to be reverted, but the *behavior* of just unconditionally enabling STIBP needs to be reverted.
Yes, in the patchset I proposed on top of Jiri's, the default will be enabling STIBP only for tasks that ask for STIBP via prctl or those that are non-dumpable, like sshd.
Because it was clearly way more expensive than people were told.
I did realize the performance implications of making STIBP on for all tasks. Here's to recap performance impact of STIBP when I posted my patchset:
"...leaving STIBP on all the time is expensive for certain applications that have frequent indirect branches. One such application is perlbench in the SpecInt Rate 2006 test suite which shows a 21% reduction in throughput."
I think if we include Jiri's patchset for 4.20, we should have my patchset also included to avoid the performance impact on regular tasks but still allow admin and applications to turn on STIBP if needed.
Thanks.
Tim
Linus Torvalds torvalds@linux-foundation.org writes:
On Sun, Nov 18, 2018 at 2:17 PM Jiri Kosina jikos@kernel.org wrote:
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that?
I don't think the code needs to be reverted, but the *behavior* of just unconditionally enabling STIBP needs to be reverted.
Actually I think it should be reverted. Yes of course opt-in is needed.
But also when you opt-in it doesn't make sense to set STIBP when the sibling is running the same security context, which is actually a common case. So to even use it properly you would need some scheduler support to detect these cases and only enable it then with opt-in. These patches didn't even try to tackle this problem.
-Andi
On 11/18/2018 02:17 PM, Jiri Kosina wrote:
On Sun, 18 Nov 2018, Linus Torvalds wrote:
So, I think it's as theoretical as any other spectrev2 (only with the extra "HT" condition added on top).
What? No.
It's *way* more theoretical than something like meltdown, which could be trivially used to get data from another protection domain.
Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely different beasts.
Have you seen any actual realistic attacks for normal human users? Things where the *kernel* should actually care?
The javascript thing is for the browser to fix up,
It's probably not just browsers, but anything running JITed sandboxed code. So the most straightforward way might be the prctl() aproach, where userspace would claim "I do care about this, please fix it up for me". So prctl() + perhaps SECCOMP.
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that? I think that if Tim's fixup makes it through (it's currently missing SECCOMP handling, but that is trivial to add on top), it might be the best compromise. We'd also have have to make IBPB obey it to be consistent (and get even a few more % of performance back), but that's easy as well.
Thanks,
I think if Thomas can merge my patchset along with Jiri's, the default option will become opt in for tasks that want the extra security and we won't lose performance.
Tasks that want extra security will enable that via prctl interface or making themselves non-dumpable.
Admin that want security for all tasks will enable the strict option on boot to enable STIBP for all tasks.
So my patchset and Jiri's patchset should probably be merged together, so the users have a choice of the behavior.
Tim
On Sun, Nov 18, 2018 at 02:40:28PM -0800, Tim Chen wrote:
Tasks that want extra security will enable that via prctl interface or making themselves non-dumpable.
Well, you need to be careful regarding the last part of your option above, because a number of network daemons become non-dumpable by executing setuid() at boot, and certainly don't want to suffer a performance loss as a side effect of wanting to become "normally" secure. I'd suggest to use the prctl only so that it doesn't randomly hit innocent applications that would only have as a last resort to turn off reasonable security features to avoid this impact.
Regards, Willy
On Sun, 18 Nov 2018, Tim Chen wrote:
On 11/18/2018 02:17 PM, Jiri Kosina wrote:
On Sun, 18 Nov 2018, Linus Torvalds wrote:
So, I think it's as theoretical as any other spectrev2 (only with the extra "HT" condition added on top).
What? No.
It's *way* more theoretical than something like meltdown, which could be trivially used to get data from another protection domain.
Oh yeah, I absolutely agree that spectrev2 and Meltdown and completely different beasts.
Have you seen any actual realistic attacks for normal human users? Things where the *kernel* should actually care?
The javascript thing is for the browser to fix up,
It's probably not just browsers, but anything running JITed sandboxed code. So the most straightforward way might be the prctl() aproach, where userspace would claim "I do care about this, please fix it up for me". So prctl() + perhaps SECCOMP.
Which gets us back to Tim's fixup patch. Do you still prefer the revert, given the existence of that? I think that if Tim's fixup makes it through (it's currently missing SECCOMP handling, but that is trivial to add on top), it might be the best compromise. We'd also have have to make IBPB obey it to be consistent (and get even a few more % of performance back), but that's easy as well.
I think if Thomas can merge my patchset along with Jiri's, the default option will become opt in for tasks that want the extra security and we won't lose performance.
If it would be in mergeable state ....
Thanks,
tglx
On Sun, 18 Nov 2018, Jiri Kosina wrote:
It's probably not just browsers, but anything running JITed sandboxed code. So the most straightforward way might be the prctl() aproach, where userspace would claim "I do care about this, please fix it up for me". So prctl() + perhaps SECCOMP.
I've just sent SECCOMP handling as a followup to Tim's set.
I still feel like we should make STIBP and IBPB behavior consistent (in a sense they should always be used both, or none of them), but that might be additional 4.21 optimization.
Thanks,
On 11/19/2018 6:00 AM, Linus Torvalds wrote:
On Sun, Nov 18, 2018 at 1:49 PM Jiri Kosina jikos@kernel.org wrote:
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
BTW for them, there is no impact at all.
Right. People who really care about security and are anal about it do not see *any* advantage of the patch.
In the documentation, AMD officially recommends against this by default, and I can speak for Intel that our position is that as well: this really must not be on by default.
STIBP and its friends are there as tools, and were created early on as big hammers because that is all that one can add in a microcode update.. expensive big hammers.
In some ways it's analogous to the "disable caches" bit in CR0. sure it's there as a big hammer, but you don't set that always just because caches could be used for a side channel
Using these tools much more surgically is fine, if a paranoid task wants it for example, or when you know you are doing a hard core security transition. But always on? Yikes.
On Mon, 19 Nov 2018, Arjan van de Ven wrote:
In the documentation, AMD officially recommends against this by default, and I can speak for Intel that our position is that as well: this really must not be on by default.
Thanks for pointing to the AMD doc, it's indeed clearly stated there.
Is there any chance this could perhaps be added to Intel documentation as well, so that we avoid cases like this in the future?
The revision 3.0 of Intel doc from may 2018 [1] I am always looking into doesn't say anything discouraging about STIBP to me.
[1] https://software.intel.com/security-software-guidance/api-app/sites/default/...
Thanks,
On 11/20/2018 11:27 PM, Jiri Kosina wrote:
On Mon, 19 Nov 2018, Arjan van de Ven wrote:
In the documentation, AMD officially recommends against this by default, and I can speak for Intel that our position is that as well: this really must not be on by default.
Thanks for pointing to the AMD doc, it's indeed clearly stated there.
Is there any chance this could perhaps be added to Intel documentation as well, so that we avoid cases like this in the future?
absolutely that's now already in progress; the doc publishing process is a bit on the long side unfortunately so it won't be today ;)
* Linus Torvalds torvalds@linux-foundation.org wrote:
This was marked for stable, and honestly, nowhere in the discussion did I see any mention of just *how* bad the performance impact of this was.
Yeah. This was an oversight - we'll fix it!
When performance goes down by 50% on some loads, people need to start asking themselves whether it was worth it. It's apparently better to just disable SMT entirely, which is what security-conscious people do anyway.
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
I think we should use the same logic as for L1TF: we default to something that doesn't kill performance. Warn once about it, and let the crazy people say "I'd rather take a 50% performance hit than worry about a theoretical issue".
Yeah, absolutely.
We'll also require performance measurements in changelogs enabling any sort of mitigation feature from now on - this requirement was implicit but 53c613fe6349 flew in under the radar, so it's going to be explicit an explicit requirement.
Thanks,
Ingo
On Mon, 19 Nov 2018, Ingo Molnar wrote:
This was marked for stable, and honestly, nowhere in the discussion did I see any mention of just *how* bad the performance impact of this was.
Yeah. This was an oversight - we'll fix it!
When performance goes down by 50% on some loads, people need to start asking themselves whether it was worth it. It's apparently better to just disable SMT entirely, which is what security-conscious people do anyway.
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
I think we should use the same logic as for L1TF: we default to something that doesn't kill performance. Warn once about it, and let the crazy people say "I'd rather take a 50% performance hit than worry about a theoretical issue".
Yeah, absolutely.
We'll also require performance measurements in changelogs enabling any sort of mitigation feature from now on - this requirement was implicit but 53c613fe6349 flew in under the radar, so it's going to be explicit an explicit requirement.
Do you already have an idea whether you'd proceed with Tim's patchset for current cycle? If so, great as far as I am concerned. If not, I'll send a patch that switches this to opt-in via kernel cmdline (+boot-time warning if not mitigated).
Thanks,
On Sun, 18 Nov 2018, Linus Torvalds wrote:
This was marked for stable, and honestly, nowhere in the discussion did I see any mention of just *how* bad the performance impact of this was.
When performance goes down by 50% on some loads, people need to start asking themselves whether it was worth it. It's apparently better to just disable SMT entirely, which is what security-conscious people do anyway.
So why do that STIBP slow-down by default when the people who *really* care already disabled SMT?
I think we should use the same logic as for L1TF: we default to something that doesn't kill performance. Warn once about it, and let the crazy people say "I'd rather take a 50% performance hit than worry about a theoretical issue".
Just to update status quo here -- Thomas is working on polishing Tim's set into mergeable state, I've just sent him small addition on top that makes IBPB also be controlled via the same toggle.
That should make the whole 'spectre v2 userspace-to-userspace' mitigation control consistent and undestandable. And also give us even few more % back that are lost due to IBPB as well.
linux-stable-mirror@lists.linaro.org