On Fri, Nov 24, 2023 at 12:54:30PM +0100, Peter Zijlstra wrote:
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
I think ARM64 approached this problem by adding the load-acquire/store-release instructions, and TSO-based code gets translated into those (e.g. by x86 -> arm64 transpilers).
Although those instructions impose a bit more ordering than TSO strictly needs.
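To make the two translation strategies concrete, here is a rough C11 sketch. The helper names (tso_load_acqrel() and friends) are made up for illustration and are not from any real transpiler. The assumption is that an arm64 compiler typically lowers acquire loads and release stores to load-acquire/store-release instructions, and lowers the release fence to DMB ISH; the exact instruction selection depends on the compiler and target flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helpers, named for illustration only. */

/* Option A: every emulated TSO load is an acquire load and every
 * emulated TSO store is a release store. */
static inline uint64_t tso_load_acqrel(_Atomic uint64_t *p)
{
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void tso_store_acqrel(_Atomic uint64_t *p, uint64_t v)
{
    atomic_store_explicit(p, v, memory_order_release);
}

/* Option B: plain (relaxed) accesses with explicit fences. The fence
 * after the load keeps later accesses behind it; the fence before the
 * store keeps earlier accesses ahead of it. Store -> load reordering
 * is still allowed, as it is under TSO. */
static inline uint64_t tso_load_fenced(_Atomic uint64_t *p)
{
    uint64_t v = atomic_load_explicit(p, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    return v;
}

static inline void tso_store_fenced(_Atomic uint64_t *p, uint64_t v)
{
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Single-threaded sanity check: both variants must round-trip a value.
 * This checks functional behavior only, not the ordering guarantees. */
static inline int tso_selfcheck(void)
{
    _Atomic uint64_t x = 0;

    tso_store_acqrel(&x, 42);
    assert(tso_load_acqrel(&x) == 42);

    tso_store_fenced(&x, 7);
    assert(tso_load_fenced(&x) == 7);

    return 1;
}
```

Option B pays for a separate barrier on every memory access, whereas Option A folds the ordering into the access itself.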
I have heard rumors that the Apple chips also have a register that can be set at runtime.
I can believe the rumor; smart design! Thanks for sharing.
Oh, I thought they made do with the load-acquire/store-release thingies. But to be fair, I haven't been paying *that* much attention to the Apple stuff.
I did read about how they fudged some of the x86 flags handling.
And there are some IBM machines that have a setting, but I'm not sure how it is controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I wasn't aware they had this.
IIRC RISC-V actually has such instructions as well, so *why* are you doing this?!?!
Unfortunately, at least last time I checked RISC-V still hadn't gotten such instructions. What they have is the *semantics* of the instructions, but no actual opcodes to encode them.
Well, that sucks..
I argued for them in the RISC-V memory group, but it was considered to be outside the scope of that group.
Transpiling with sufficient DMB ISH to get the desired ordering is really bad for performance.
Ha! Quite dreadful, I would imagine.
That is not to say that Linux should support this. Perhaps Linux should pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like a plan :-)