On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
I think ARM64 approached this problem by adding the load-acquire/store-release instructions, and TSO-based code gets translated into those (e.g. by x86 -> arm64 transpilers).
Although those instructions have a few more ordering constraints.
I have heard rumors that the Apple chips also have a register that can be set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies. But to be fair, I haven't been paying *that* much attention to the Apple stuff.
I did read about how they fudged some of the x86 flags thing.
And there are some IBM machines that have a setting, but not sure how it is controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I wasn't aware they had this.
IIRC RISC-V actually has such instructions as well, so *why* are you doing this?!?!
Unfortunately, at least last time I checked RISC-V still hadn't gotten such instructions. What they have is the *semantics* of the instructions, but no actual opcodes to encode them.
Well, that sucks..
I argued for them in the RISC-V memory group, but it was considered to be outside the scope of that group.
Transpiling with sufficient DMB ISH to get the desired ordering is really bad for performance.
Ha! Quite dreadful, I would imagine.
That is not to say that Linux should support this. Perhaps Linux should pressure RISC-V into supporting implicit barriers instead.
I'm not sure I count for much in this regard, but yeah, that sounds like a plan :-)
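To make the fence-versus-instruction point above concrete, here is a minimal sketch in C11 atomics of the two ways a translator could pin down ordering for a translated load or store. All helper names are hypothetical and not from this thread. The exact instructions emitted depend on the toolchain; on arm64, compilers typically lower the standalone fences to DMB variants and the acquire/release accesses to LDAR/STLR, but that is an assumption about the compiler, not something the code guarantees.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* All helper names below are hypothetical, for illustration only. */

/* Fence style: a relaxed access plus a separate, standalone barrier. */
static inline int load_then_fence(_Atomic int *p)
{
    assert(p != NULL);
    int v = atomic_load_explicit(p, memory_order_relaxed);
    /* Keeps later loads and stores from being reordered before the load. */
    atomic_thread_fence(memory_order_acquire);
    return v;
}

static inline void fence_then_store(_Atomic int *p, int v)
{
    assert(p != NULL);
    /* Keeps earlier loads and stores from being reordered after the store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(p, v, memory_order_relaxed);
}

/* Instruction style: the ordering is carried by the access itself. */
static inline int load_acquire(_Atomic int *p)
{
    assert(p != NULL);
    return atomic_load_explicit(p, memory_order_acquire);
}

static inline void store_release(_Atomic int *p, int v)
{
    assert(p != NULL);
    atomic_store_explicit(p, v, memory_order_release);
}
```

Both styles give the surrounding code acquire/release ordering; the difference is whether every translated access drags a separate barrier along with it, which is the cost being complained about above.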
Peter Zijlstra <peterz@infradead.org> writes:
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
And there are some IBM machines that have a setting, but not sure how it is controlled.
Cute, I'm assuming this is the Power series (s390 already being TSO)? I wasn't aware they had this.
Are you referring to Strong Access Ordering? That is a per-page attribute, not a CPU mode, and was removed in ISA v3.1 anyway.
cheers
On Fri, Nov 24, 2023 at 12:54:30PM +0100, Peter Zijlstra wrote:
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
I have heard rumors that the Apple chips also have a register that can be set at runtime.
I could understand the rumor, smart design! Thx for sharing.
On Fri, Nov 24, 2023 at 12:54:30PM +0100, Peter Zijlstra wrote:
On Fri, Nov 24, 2023 at 12:04:09PM +0100, Jonas Oberhauser wrote:
I think ARM64 approached this problem by adding the load-acquire/store-release instructions, and TSO-based code gets translated into those (e.g. by x86 -> arm64 transpilers).
Although those instructions have a few more ordering constraints.
I have heard rumors that the Apple chips also have a register that can be set at runtime.
Oh, I thought they made do with the load-acquire/store-release thingies. But to be fair, I haven't been paying *that* much attention to the Apple stuff.
I did read about how they fudged some of the x86 flags thing.
I don't know what others may have built specifically, but architecturally on arm64 we expect people to express ordering requirements through instructions. ARMv8.0 has load-acquire and store-release, ARMv8.3 added RCpc forms of load-acquire as part of FEAT_LRCPC, and ARMv8.4 added a number of instructions as part of FEAT_LRCPC2.
For a number of reasons we avoid IMPLEMENTATION DEFINED controls for things like this.
Thanks Mark.
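For readers following along, "expressing ordering requirements through instructions" looks roughly like the sketch below at the C level: a release store publishes data and an acquire load consumes it, with no standalone barrier. The `mailbox` type and both function names are hypothetical. On arm64 a compiler would typically emit STLR for the release store and LDAR for the acquire load (possibly the RCpc form when FEAT_LRCPC is enabled), but that mapping is a toolchain assumption, not something this code enforces.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-slot mailbox, for illustration only. */
struct mailbox {
    int payload;         /* plain data, protected by the flag below */
    _Atomic bool ready;  /* set with release, read with acquire */
};

/* Producer side: write the data, then publish it with a release store. */
static inline void publish(struct mailbox *m, int value)
{
    assert(m != NULL);
    m->payload = value;
    /* The release store orders the payload write before the flag write. */
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/* Consumer side: returns true and fills *out once the data is published. */
static inline bool try_consume(struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    /* The acquire load orders the flag read before the payload read. */
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

In real use `publish()` and `try_consume()` would run on different threads; a consumer that sees `ready == true` is then guaranteed to see the payload written before it.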