On Sep 21, 2022, at 12:52 AM, Peter Zijlstra peterz@infradead.org wrote:
⚠ External Email
On Tue, Sep 20, 2022 at 10:47:43PM +0000, Nadav Amit wrote:
Fix this issue with small backportable patch. Instead of trying to make RCU-like behavior for bp_desc, just eliminate the unnecessary level of indirection of bp_desc, and hold the whole descriptor on the stack. Anyhow, there is only a single descriptor at any given moment.
Because of text_mutex; indeed. No idea why I put that thing on the stack.
I've done a few minor edits to your patch, but it otherwise looks good to me.
Subject: x86/alternative: Fix race in try_get_desc() From: Nadav Amit namit@vmware.com Date: Tue, 20 Sep 2022 22:47:43 +0000
From: Nadav Amit namit@vmware.com
The text poke mechanism claims to have an RCU-like behavior, but it does not appear that there is any quiescent state to ensure that nobody holds reference to desc. As a result, the following race appears to be possible, which can lead to memory corruption.
CPU0 CPU1
text_poke_bp_batch() -> smp_store_release(&bp_desc, &desc)
[ notice that desc is on the stack ]
poke_int3_handler() [ int3 might be kprobe's so sync events are do not help ] -> try_get_desc(descp=&bp_desc) desc = __READ_ONCE(bp_desc) if (!desc) [false, success]
WRITE_ONCE(bp_desc, NULL); atomic_dec_and_test(&desc.refs)
[ success, desc space on the stack is being reused and might have non-zero value. ] arch_atomic_inc_not_zero(&desc->refs)
[ might succeed since desc points to stack memory that was freed and might be reused. ]
I encountered some occasional crashes of poke_int3_handler() when kprobes are set, while accessing desc->vec. The analysis has been done offline and I did not corroborate the cause of the crashes. Yet, it seems that this race might be the root cause.
Fix this issue with small backportable patch. Instead of trying to make RCU-like behavior for bp_desc, just eliminate the unnecessary level of indirection of bp_desc, and hold the whole descriptor on the stack.
Looks good to me. Thanks for improving my patch.
I just made one small mistake in commit message. It should say “hold the whole descriptor as a global” in the line above.
I will send v2 with your changes and the updated commit message.
Thanks again.