Mark notes that gcc optimization passes have the potential to elide necessary invocations of this instruction sequence, so include an optimization barrier.
> I think that either way, we have a potential problem if the compiler
> generates a branch dependent on the result of validate_index_nospec().
>
> In that case, we could end up with codegen approximating:
>
>	bool safe = false;
>
>	if (idx < bound) {
>		idx = array_index_nospec(idx, bound);
>		safe = true;
>	}
>
>	// this branch can be mispredicted
>	if (safe) {
>		foo = array[idx];
>	}
>
> ... and thus we lose the nospec protection.
I see GCC do this at -O0, but so far I haven't tricked it into doing this at -O1 or above.
Regardless, I worry this is fragile -- GCC *can* generate code as per the above, even if it's unlikely to.
Cc: <stable@vger.kernel.org>
Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
 arch/x86/include/asm/barrier.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 042b5e892ed1..41f7435c84a7 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -38,10 +38,11 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 {
 	unsigned long mask;
-	asm ("cmp %1,%2; sbb %0,%0;"
+	asm volatile ("cmp %1,%2; sbb %0,%0;"
 			:"=r" (mask)
 			:"g"(size),"r" (index)
 			:"cc");
+	barrier();
 	return mask;
 }
Hi Dan,
On Fri, May 25, 2018 at 10:21:08AM -0700, Dan Williams wrote:
> Mark notes that gcc optimization passes have the potential to elide
> necessary invocations of this instruction sequence, so include an
> optimization barrier.
> > I think that either way, we have a potential problem if the compiler
> > generates a branch dependent on the result of validate_index_nospec().
> >
> > In that case, we could end up with codegen approximating:
> >
> >	bool safe = false;
> >
> >	if (idx < bound) {
> >		idx = array_index_nospec(idx, bound);
> >		safe = true;
> >	}
> >
> >	// this branch can be mispredicted
> >	if (safe) {
> >		foo = array[idx];
> >	}
> >
> > ... and thus we lose the nospec protection.
>
> I see GCC do this at -O0, but so far I haven't tricked it into doing
> this at -O1 or above.
>
> Regardless, I worry this is fragile -- GCC *can* generate code as per
> the above, even if it's unlikely to.
I certainly believe that we want the volatile.
However, just to be clear, the barrier doesn't prevent the above example, since we don't currently have a mechanism to prevent the compiler splitting basic blocks and inserting additional branches between those blocks.
I've written up a rationale for the volatile below, if you want something for the commit message. There's a minor comment on the patch after that.
----
The volatile will inhibit *some* cases where the compiler could lift the
array_index_nospec() call out of a branch, e.g. where there are multiple
invocations of array_index_nospec() with the same arguments:
	if (idx < foo) {
		idx1 = array_idx_nospec(idx, foo);
		do_something(idx1);
	}
< some other code >
	if (idx < foo) {
		idx2 = array_idx_nospec(idx, foo);
		do_something_else(idx2);
	}
... since the compiler can determine that the two invocations yield the same result, and reuse the first result (likely the same register as idx was in originally) for the second branch, effectively re-writing the above as:
	if (idx < foo) {
		idx = array_idx_nospec(idx, foo);
		do_something(idx);
	}
< some other code >
	if (idx < foo) {
		do_something_else(idx);
	}

... if we don't take the first branch, then speculatively take the
second, we lose the nospec protection.
There's more info on volatile asm in the GCC docs:
  https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
----
> Cc: <stable@vger.kernel.org>
> Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec")
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Reported-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Dan Williams <dan.j.williams@intel.com>
>  arch/x86/include/asm/barrier.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> index 042b5e892ed1..41f7435c84a7 100644
> --- a/arch/x86/include/asm/barrier.h
> +++ b/arch/x86/include/asm/barrier.h
> @@ -38,10 +38,11 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
>  {
>  	unsigned long mask;
>
> -	asm ("cmp %1,%2; sbb %0,%0;"
> +	asm volatile ("cmp %1,%2; sbb %0,%0;"
>  			:"=r" (mask)
>  			:"g"(size),"r" (index)
>  			:"cc");
> +	barrier();
>  	return mask;
>  }
What does the barrier() prevent?
I don't think that inhibits the insertion of branches, and AFAIK the volatile is sufficient to prevent elision of identical array_idx_nospec() calls.
I don't have an objection to it, regardless.
So long as the example is updated in the commit message, feel free to add:
Acked-by: Mark Rutland <mark.rutland@arm.com>
Thanks,
Mark.
Dan,
On Wed, 30 May 2018, Mark Rutland wrote:
> > diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
> > index 042b5e892ed1..41f7435c84a7 100644
> > --- a/arch/x86/include/asm/barrier.h
> > +++ b/arch/x86/include/asm/barrier.h
> > @@ -38,10 +38,11 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
> >  {
> >  	unsigned long mask;
> >
> > -	asm ("cmp %1,%2; sbb %0,%0;"
> > +	asm volatile ("cmp %1,%2; sbb %0,%0;"
> >  			:"=r" (mask)
> >  			:"g"(size),"r" (index)
> >  			:"cc");
> > +	barrier();
> >  	return mask;
> >  }
> What does the barrier() prevent?
>
> I don't think that inhibits the insertion of branches, and AFAIK the
> volatile is sufficient to prevent elision of identical
> array_idx_nospec() calls.
>
> I don't have an objection to it, regardless.
>
> So long as the example is updated in the commit message, feel free to add:
Any update on this?
Thanks,
tglx
Mark notes that gcc optimization passes have the potential to elide necessary invocations of this instruction sequence, so mark the asm volatile.
---
From Mark:
The volatile will inhibit *some* cases where the compiler could lift the array_index_nospec() call out of a branch, e.g. where there are multiple invocations of array_index_nospec() with the same arguments:
	if (idx < foo) {
		idx1 = array_idx_nospec(idx, foo);
		do_something(idx1);
	}
< some other code >
	if (idx < foo) {
		idx2 = array_idx_nospec(idx, foo);
		do_something_else(idx2);
	}
... since the compiler can determine that the two invocations yield the same result, and reuse the first result (likely the same register as idx was in originally) for the second branch, effectively re-writing the above as:
	if (idx < foo) {
		idx = array_idx_nospec(idx, foo);
		do_something(idx);
	}
< some other code >
	if (idx < foo) {
		do_something_else(idx);
	}
... if we don't take the first branch, then speculatively take the second, we lose the nospec protection.
There's more info on volatile asm in the GCC docs:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Volatile
Cc: <stable@vger.kernel.org>
Fixes: babdde2698d4 ("x86: Implement array_index_mask_nospec")
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Ingo Molnar <mingo@kernel.org>
Reported-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---
Changes in v2:
* drop the barrier() call (Mark)
* update the example (Mark)
 arch/x86/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 042b5e892ed1..14de0432d288 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -38,7 +38,7 @@ static inline unsigned long array_index_mask_nospec(unsigned long index,
 {
 	unsigned long mask;
-	asm ("cmp %1,%2; sbb %0,%0;"
+	asm volatile ("cmp %1,%2; sbb %0,%0;"
 			:"=r" (mask)
 			:"g"(size),"r" (index)
 			:"cc");
On Thu, Jun 7, 2018 at 9:23 AM Dan Williams <dan.j.williams@intel.com> wrote:
> Mark notes that gcc optimization passes have the potential to elide
> necessary invocations of this instruction sequence, so mark the asm
> volatile.
Ack. I'm not entirely sure this matters much, but it certainly isn't wrong either.
The reason I'm not 100% convinced this matters is that gcc can *still* mess things up for us by simply adding conditionals elsewhere.
For example, let's say we write this:
	if (idx < foo) {
		idx = array_idx_nospec(idx, foo);
		do_something(idx);
	} else {
		do_something_else();
	}
then everything is obviously fine, right? With the volatile on the array_idx_nospec(), we're guaranteed to use the right reduced idx, and there's only one user, so we're all good.
Except maybe do_something(idx) looks something like this:
	do_something(int idx)
	{
		do_something_else();
		access(idx);
	}
and gcc decides that "hey, I can combine the two do_something_else() cases", and then generates code that is basically
	if (idx < foo)
		idx = array_idx_nospec(idx, foo);
	do_something_else();
	if (idx < foo)
		access(idx);
instead. And now we're back to the "first branch can be predicted correctly, second branch can be mis-predicted".
Honestly, I don't really care, and I don't think the kernel _should_ care. I don't think this is a problem in practice. I'm just saying that adding a "volatile" on array_idx_nospec() doesn't really guarantee anything, since it's not a volatile over the whole relevant sequence, only over that small part.
So I think the volatile is fine, but I really think it doesn't matter either. We're not going to plug every theoretical hole, and I think the hole that the volatile plugs is theoretical, not practical.
Linus