Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
For example: test_align_fail_data: bx lr .byte 0xaa .align .word 0x12345678
I would expect to see something like this: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 .word 0x12345678
But instead I have: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
As a result the word 0x12345678 will be stored in LE.
I've run several tests and here are my observations: - Double ".align" fixes the issue :) - Behavior is the same for LE/BE, ARM/Thumb, GCC 4.4.1/4.6.x/4.8.2 - Size of alignment doesn't matter. - Issue happens only if previous data is not instruction-aligned and 0's are added before NOPs. - Explicit filling with 0's (.align , 0) fixes the issue, but as a side effect data @0x4 is interpreted as a single ".word 0xaa000000" instead of ".byte .byte .short". I'm not sure if there can be any functional difference because of this. - Issue doesn't happen if there is no instructions before data (no "bx lr" in the example). - Issue doesn't happen if data after .align is defined as ".type <symbol>,%object".
On 15/10/13 23:38, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
I reported the crashes to Tixy along with a different method of sovling the problem (changed to using pointers to the strings) a while ago. However it seems that nothing has happened to fix this.
Since kprobes seems to work with the fixed tests I forgot to follow up and prod Jon about looking into this problem.
Jon, if you are not interested in fixing this, then please let me know and we can get a patch sorted to fix it.
PS, I am going to leave this out of the current be8 patchset as I want to get that merged, and at the moment kprobes-test is not essential to getting the system started.
For example: test_align_fail_data: bx lr .byte 0xaa .align .word 0x12345678
I would expect to see something like this: 00000000<test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 .word 0x12345678
But instead I have: 00000000<test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
As a result the word 0x12345678 will be stored in LE.
I've run several tests and here are my observations:
- Double ".align" fixes the issue :)
- Behavior is the same for LE/BE, ARM/Thumb, GCC 4.4.1/4.6.x/4.8.2
- Size of alignment doesn't matter.
- Issue happens only if previous data is not instruction-aligned and 0's are added before NOPs.
- Explicit filling with 0's (.align , 0) fixes the issue, but as a side effect data @0x4 is interpreted as a single ".word 0xaa000000" instead of ".byte .byte .short". I'm not sure if there can be any functional difference because of this.
- Issue doesn't happen if there is no instructions before data (no "bx lr" in the example).
- Issue doesn't happen if data after .align is defined as ".type<symbol>,%object".
Thanks for getting down to a simple test case.
My view is to fix this by not doing complicated things by trying to save a bit of space by embedding strings into the code. It is not as if we cannot get the compiler to put the strings into the relevant data area and give us a pointer we can use.
The code in this case is /not easy/ to follow and it would be nice if it could be cleaned up to just take the string as a argument to the test code instead of trying to find it via assembly magic.
On Wed, 2013-10-16 at 12:13 +0100, Ben Dooks wrote:
On 15/10/13 23:38, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
I reported the crashes to Tixy along with a different method of sovling the problem (changed to using pointers to the strings) a while ago.
I found that fix in the list archives: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-July/186528.html
However it seems that nothing has happened to fix this.
Since kprobes seems to work with the fixed tests I forgot to follow up and prod Jon about looking into this problem.
Sorry, I sorta forgot/ignored the issue, strange compiler problem with a vague explanation on a big-endian kernel (who uses that ;-)
Jon, if you are not interested in fixing this, then please let me know and we can get a patch sorted to fix it.
Looking at your old patch again, it looks good to me. So if someone could post this to the lists again, with the commit message updated to have a clearer explanation as to the symptoms, e.g. some of Taras' analysis, that would be good. It should go to the list again to give people another chance to comment...
Thanks
On Wed, 2013-10-16 at 17:06 +0100, Jon Medhurst (Tixy) wrote:
On Wed, 2013-10-16 at 12:13 +0100, Ben Dooks wrote:
On 15/10/13 23:38, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
I reported the crashes to Tixy along with a different method of sovling the problem (changed to using pointers to the strings) a while ago.
I found that fix in the list archives: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-July/186528.html
However it seems that nothing has happened to fix this.
Since kprobes seems to work with the fixed tests I forgot to follow up and prod Jon about looking into this problem.
Sorry, I sorta forgot/ignored the issue, strange compiler problem with a vague explanation on a big-endian kernel (who uses that ;-)
Jon, if you are not interested in fixing this, then please let me know and we can get a patch sorted to fix it.
Looking at your old patch again, it looks good to me.
Actually it's broken for thumb, the pointer to the title needs storing at a 4 byte alignment, which presumably will make the problem we're trying to workaround re-occur. I'll try and find some time to take a look for a different way of doing things...
On 10/16/2013 08:03 PM, Jon Medhurst (Tixy) wrote:
On Wed, 2013-10-16 at 17:06 +0100, Jon Medhurst (Tixy) wrote:
On Wed, 2013-10-16 at 12:13 +0100, Ben Dooks wrote:
On 15/10/13 23:38, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
I reported the crashes to Tixy along with a different method of sovling the problem (changed to using pointers to the strings) a while ago.
I found that fix in the list archives: http://lists.infradead.org/pipermail/linux-arm-kernel/2013-July/186528.html
However it seems that nothing has happened to fix this.
Since kprobes seems to work with the fixed tests I forgot to follow up and prod Jon about looking into this problem.
Sorry, I sorta forgot/ignored the issue, strange compiler problem with a vague explanation on a big-endian kernel (who uses that ;-)
Jon, if you are not interested in fixing this, then please let me know and we can get a patch sorted to fix it.
Looking at your old patch again, it looks good to me.
Actually it's broken for thumb, the pointer to the title needs storing at a 4 byte alignment, which presumably will make the problem we're trying to workaround re-occur. I'll try and find some time to take a look for a different way of doing things...
Word alignment here doesn't cause the problem, so this patch should workaround the issue.
On Wed, 2013-10-16 at 01:38 +0300, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
For example: test_align_fail_data: bx lr .byte 0xaa .align .word 0x12345678
I would expect to see something like this: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 .word 0x12345678
But instead I have: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
As a result the word 0x12345678 will be stored in LE.
Hmm looks like a GCC bug, or we're doing something it considers undefined behaviour?
I've run several tests and here are my observations:
- Double ".align" fixes the issue :)
Temping as a simple fix, but rather risky to rely on strange compiler behaviour to fix another strange compiler error. :-)
- Behavior is the same for LE/BE, ARM/Thumb, GCC 4.4.1/4.6.x/4.8.2
At least the compiler is being consistent and it's obviously an issue we need to work around in the kernel.
- Size of alignment doesn't matter.
- Issue happens only if previous data is not instruction-aligned and 0's are added before NOPs.
- Explicit filling with 0's (.align , 0) fixes the issue, but as a side effect data @0x4 is interpreted as a single ".word 0xaa000000" instead of ".byte .byte .short". I'm not sure if there can be any functional difference because of this.
- Issue doesn't happen if there is no instructions before data (no "bx lr" in the example).
- Issue doesn't happen if data after .align is defined as ".type <symbol>,%object".
Seems like something which should be reported to the GCC people, and you would think with the detailed diagnosis you've provided someone familiar with the code would be able to quickly get to the bottom of it.
Meanwhile, I like the solution Den Dooks came up with, I'll comment about that in my reply to his mail...
On Wed, 2013-10-16 at 01:38 +0300, Taras Kondratiuk wrote:
Hi
I was debugging kprobes-test for BE8 and noticed that some data fields are stored in LE instead of BE. It happens because these data fields get interpreted as instructions.
Is it a known issue?
For example: test_align_fail_data: bx lr .byte 0xaa .align .word 0x12345678
I would expect to see something like this: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 .word 0x12345678
But instead I have: 00000000 <test_align_fail_data>: 0: e12fff1e bx lr 4: aa .byte 0xaa 5: 00 .byte 0x00 6: 0000 .short 0x0000 8: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
As a result the word 0x12345678 will be stored in LE.
I've run several tests and here are my observations:
- Double ".align" fixes the issue :)
- Behavior is the same for LE/BE, ARM/Thumb, GCC 4.4.1/4.6.x/4.8.2
- Size of alignment doesn't matter.
- Issue happens only if previous data is not instruction-aligned and 0's are added before NOPs.
- Explicit filling with 0's (.align , 0) fixes the issue, but as a side effect data @0x4 is interpreted as a single ".word 0xaa000000" instead of ".byte .byte .short". I'm not sure if there can be any functional difference because of this.
After thinking about things overnight, I believe that this is the fix we should go with. We want to stick alignment padding between data laid down with .byte and .word so it makes sense to explicitly ask the toolchain to pad with zeros rather than leaving it the opportunity to get confused. (.align in the text section probably means it wants to align with nops, but then sees the initial alignment and/or surrounding statements look like binary data, not code, and then...)
I'll send a patch proposing that fix after I've worked out how to test it on a big-endian kernel. Or if someone else sends a patch for that with a good commit message that explains what's going on I'll happily ack that.
On 10/17/2013 03:17 PM, Jon Medhurst (Tixy) wrote:
On Wed, 2013-10-16 at 01:38 +0300, Taras Kondratiuk wrote:
- Explicit filling with 0's (.align , 0) fixes the issue, but as a side effect data @0x4 is interpreted as a single ".word 0xaa000000" instead of ".byte .byte .short". I'm not sure if there can be any functional difference because of this.
After thinking about things overnight, I believe that this is the fix we should go with. We want to stick alignment padding between data laid down with .byte and .word so it makes sense to explicitly ask the toolchain to pad with zeros rather than leaving it the opportunity to get confused. (.align in the text section probably means it wants to align with nops, but then sees the initial alignment and/or surrounding statements look like binary data, not code, and then...)
I'll send a patch proposing that fix after I've worked out how to test it on a big-endian kernel. Or if someone else sends a patch for that with a good commit message that explains what's going on I'll happily ack that.
I have several fixes for BE Thumb kprobes on top of Ben's series. ".aling , 0" workaround is one of them. All tests now pass for LE/BE and Thumb/ARM. I will clean up patches and send them to Ben.
linaro-kernel@lists.linaro.org