-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
As I've made a fair bit of headway since Linaro Connect, I wanted to drop a line on my current progress with porting TianoCore to KVM.
Summary (tl;dr version):
KVM can start TianoCore, boot all the way to the shell, and access HDDs via VirtioBlk. We can start grub and successfully retrieve files from ext partitions, load a device tree, and start the kernel. The kernel runs through most of the EFI stub, but falls over during ExitBootServices().
Long Version:
So, after much blood, sweat, and tears, we're finally at the point of trying to actually start a kernel, though this (for the moment) remains an elusive goal. The current problem is that once we call EBS(), we get an exception from EFI with no Image information, which means the exception handler doesn't know where it came from. After several seconds, we get a second exception from within DxeCore, and then EFI falls over.
Debugging EFI is difficult and error-prone, given the limited debug facilities of the gdb-stub in QEMU (no breakpoints) and no decent way to load symbols for all of EFI itself (you have to run add-symbol-file manually with the output of commands printed on the console; supposedly it's possible to generate a giant GdbSyms.dll file to import in a single go, but I haven't succeeded at this). This is further complicated by the fact that we appear to be asserting somewhere in a driver, and short of adding printfs to *every* driver, it's impossible to know which one is asserting.
Previous attempts to debug asserts show that EFI does "odd" things to the stack when we hit an exception, making walking it with GDB impossible. I need to figure out what madness EFI does with my SP so I can get the entire stack on an explosion, but this remains at best wishful thinking.
Further complicating things is that during EBS, my print debugging goes away. I might just cheat and roll a simple assembly function to bang out messages through serial without calling anything else. Ugly as sin, but this should let me get useful debug output through the EBS framework. Complicating matters is that I need to locate each and all EBS() event functions, which are spread *everywhere* in TianoCore, and then debug them each individually.
I'm open to ideas on how best to accomplish this.
On a larger scale, there are a couple of other bugs and odds and ends which currently affect us:
* wfi doesn't work
This is probably the biggest w.r.t. functionality that should work, but doesn't. The EFI event loop is built on checking the timer, then calling wfi to check the timer later. The problem here is we call wfi ... and UEFI never comes back despite events firing (I can put print code in the interrupt handler to confirm this). This may be related to the VGIC errors I get running kvm under foundation, but I haven't taken the time to properly nail down the bug here.
This was worked around by commenting out the wfi, turning the event loop into a busy loop, but this has to be resolved before we can ever consider merging it.
* No RTC
I looked through virt.c in KVM, and as best I can tell, I've got no RTC at all (no PL031). It also appears that the kernel can't get an RTC, as running a kernel gets me a 1970 clock. I'm not sure if this is by design or not, but it causes GetTime() to return EFI_ERROR, and I suspect this may be one of the exceptions I'm getting above (Shell prints a ton of warnings that GetTime is busted).
* No terminfo support (not ARM specific)
EFI assumes it's working against a real console or terminal. As such, it doesn't respect anything like termcap, and output gets jumbled due to incorrect escape sequences. This isn't specific to ARM, as I get identical behavior with OVMF running on stdio or over telnet. Backspace, for instance, requires me to type ^H into the console manually.
As we expect to have this usable remotely, we need to determine how to handle the terminal and escape sequences properly so we're not printing garbage to the screen. A "dumb" mode might just be the best way to handle this, but something like grub does display fancy graphics.
Ideas welcome.
If anyone wants to play with it, my code is on git, but requires a bit of setup to get running, and you need a patch to KVM to successfully start UEFI at the moment.
Michael
On Fri, Mar 28, 2014 at 04:26:59AM -0400, Michael Casadevall wrote:
Thanks for providing this status!
Debugging EFI is difficult and error-prone, given the limited debug facilities of the gdb-stub in QEMU (no breakpoints) and no decent way to load symbols for all of EFI itself (you have to run add-symbol-file manually with the output of commands printed on the console; supposedly it's possible to generate a giant GdbSyms.dll file to import in a single go, but I haven't succeeded at this). This is further complicated by the fact that we appear to be asserting somewhere in a driver, and short of adding printfs to *every* driver, it's impossible to know which one is asserting.
Maybe it's worth adding a hack-support-gdb-in-kvm implementation for this. If we go down this road, I can probably find time to help you out there.
Can you do some scripting to replace assert statements with "{ print("%s:%d\n", __FILE__, __LINE__); orig_assert(); }" type hack?
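A minimal sketch of that assert hack in plain C, for illustration only. The names `TRACE_ASSERT`, `orig_assert_failed`, `g_assert_failures` and `g_last_assert` are hypothetical stand-ins, not TianoCore identifiers; as far as I know, EDK2's own ASSERT() macro lives in MdePkg's DebugLib.h, so the real change would go there rather than into a new macro.

```c
#include <stdio.h>

/* Hypothetical bookkeeping so the effect of the hack can be observed. */
static int  g_assert_failures = 0;
static char g_last_assert[256];

/* Hypothetical stand-in for whatever the original assert handler does. */
static void orig_assert_failed(const char *expr)
{
    (void)expr;
    g_assert_failures++;
}

/* Print file and line first, then fall through to the original handler. */
#define TRACE_ASSERT(cond)                                              \
    do {                                                                \
        if (!(cond)) {                                                  \
            snprintf(g_last_assert, sizeof g_last_assert, "%s:%d: %s",  \
                     __FILE__, __LINE__, #cond);                        \
            fprintf(stderr, "ASSERT %s\n", g_last_assert);              \
            orig_assert_failed(#cond);                                  \
        }                                                               \
    } while (0)
```

Because the location is printed before the original handler runs, the failing driver is identified even if the handler itself never returns.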
Previous attempts to debug asserts show that EFI does "odd" things to the stack when we hit an exception, making walking it with GDB impossible. I need to figure out what madness EFI does with my SP so I can get the entire stack on an explosion, but this remains at best wishful thinking.
This sounds very strange - could it be that because you take an exception, you use a SP from a different mode and everything just messes up?
Further complicating things is that during EBS, my print debugging goes away. I might just cheat and roll a simple assembly function to bang out messages through serial without calling anything else. Ugly as sin, but this should let me get useful debug output through the EBS framework. Complicating matters is that I need to locate each and all EBS() event functions, which are spread *everywhere* in TianoCore, and then debug them each individually.
I'm a little confused, not knowing UEFI: is EBS() not a single function, and why does it matter that it's called from multiple places?
- wfi doesn't work
This is probably the biggest w.r.t. functionality that should work, but doesn't. The EFI event loop is built on checking the timer, then calling wfi to check the timer later. The problem here is we call wfi ... and UEFI never comes back despite events firing (I can put print code in the interrupt handler to confirm this). This may be related to the VGIC errors I get running kvm under foundation, but I haven't taken the time to properly nail down the bug here.
So if I understand it, the expected sequence of events is:
1. check timer (arch timer counter?)
2. WFI
3. virtual arch timer interrupt, causes wake-up from WFI
4. go to 1
But you seem to get stuck at (2)?
When you say "print code in the interrupt handler" is that the UEFI interrupt handler? In that case, you do wake up from the WFI...?
Do you see stuff happening in virt/kvm/arm/arch_timer.c: kvm_timer_inject_irq()?
That should call kvm_vgic_inject_irq(), which should vgic_kick_vcpus(kvm), which is what wakes you up from your WFI.
- No RTC
I looked through virt.c in KVM, and as best I can tell, I've got no RTC at all (no PL031). It also appears that the kernel can't get an RTC, as running a kernel gets me a 1970 clock. I'm not sure if this is by design or not, but it causes GetTime() to return EFI_ERROR, and I suspect this may be one of the exceptions I'm getting above (Shell prints a ton of warnings that GetTime is busted).
The only thing you can use to tell passing of time in mach-virt is the arch-timer counter and use a fixed starting point.
-Christoffer
On 03/28/2014 02:09 PM, Christoffer Dall wrote:
Maybe it's worth adding a hack-support-gdb-in-kvm implementation for this. If we go down this road, I can probably find time to help you out there.
Can you do some scripting to replace assert statements with "{ print("%s:%d\n", __FILE__, __LINE__); orig_assert(); }" type hack?
That's probably a decent idea if I can find where ASSERT() is defined. I'll try that in a bit.
This sounds very strange - could it be that because you take an exception, you use a SP from a different mode and everything just messes up?
This could be GDB just being unhappy. I've had issues walking the stack in KVM in general, but even if I walk the stack by hand, I don't see a pointer to the next frame when we're in an exception. To my knowledge, UEFI uses the standard AArch64 C ABI, but this might be a faulty assumption on my part.
I'm a little confused, not knowing UEFI: is EBS() not a single function, and why does it matter that it's called from multiple places?
So, drivers and applications can enlist to get notification of when ExitBootServices is called. This pushes a pointer to a function into an array, which is then iterated through, and each pointer is called so drivers can unregister themselves from boot services, etc.
Complicating the issue is that I can't use printf once GetMemoryMap() is called without breaking EBS() (I think this is a bug in UEFI; Leif, 2 cents?), but I think I can twiddle the serial port directly without breaking shit.
Having slept on it, it's probably easy to print out the pointers as we go through them, so I can get an idea of what's listening for EBS and try to narrow down my list of candidates.
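A rough sketch of that idea in plain C, assuming the notification list really is just an array of function pointers. Every name here (`ebs_listener`, `signal_exit_boot_services`, `remember_ptr`, `bump_counter`) is hypothetical and does not come from TianoCore; the real list lives inside DxeCore's event code.

```c
#include <stddef.h>
#include <stdint.h>

typedef void (*ebs_notify_fn)(void *context);

/* Hypothetical model of one registered ExitBootServices listener. */
struct ebs_listener {
    ebs_notify_fn fn;
    void         *context;
};

typedef void (*log_ptr_fn)(uintptr_t addr);

/* Walk the list, logging each callback address *before* calling it, so the
 * last address logged identifies the listener that blew up.
 * Returns the number of listeners actually called. */
static size_t signal_exit_boot_services(const struct ebs_listener *list,
                                        size_t count, log_ptr_fn log_ptr)
{
    size_t called = 0;
    for (size_t i = 0; i < count; i++) {
        if (list[i].fn == NULL)
            continue;
        if (log_ptr != NULL)
            /* Function-pointer-to-integer cast: fine on the usual
             * platforms, though not guaranteed by ISO C. */
            log_ptr((uintptr_t)list[i].fn);
        list[i].fn(list[i].context);
        called++;
    }
    return called;
}

/* Tiny demo helpers: a logger that remembers the last address, and a
 * listener that bumps a counter passed in through its context. */
static uintptr_t g_last_logged;
static void remember_ptr(uintptr_t addr) { g_last_logged = addr; }
static void bump_counter(void *context)  { (*(int *)context)++; }
```

In the firmware, the logger would have to avoid the normal print path entirely, since that is what stops working during EBS.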
So if I understand it, the expected sequence of events is:
1. check timer (arch timer counter?)
2. WFI
3. virtual arch timer interrupt, causes wake-up from WFI
4. go to 1
But you seem to get stuck at (2)?
Exactly.
When you say "print code in the interrupt handler" is that the UEFI interrupt handler? In that case, you do wake up from the WFI...?
I put a DEBUG print line in the Timer interrupt handler, which prints out a message every tick letting me know the timer was working. When we call wfi, the timer ticks still show up (and I can see them through vgic with debugging there enabled)
Do you see stuff happening in virt/kvm/arm/arch_timer.c: kvm_timer_inject_irq()?
That should call kvm_vgic_inject_irq(), which should vgic_kick_vcpus(kvm), which is what wakes you up from your WFI.
Hrm, I need some debug code in vgic_kick_vcpus. Thanks!
The only thing you can use to tell passing of time in mach-virt is the arch-timer counter and use a fixed starting point.
The problem here is that the spec says THOU SHALT HAVE RTC. We could fake it by counting up from system start and using the UEFI build time as a starting point, but this is not what the spec writers had in mind (nothing says GetTime() has to be accurate :-)).
For KVM, I'm wondering if we should just stick a PL031 on the bus and be done with it. For Xen, we're going to need a way to do this via xenbus.
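For illustration, the "fake it" option could look roughly like the sketch below: a fixed starting point plus elapsed arch-timer ticks. The function name and parameters are hypothetical; on AArch64 the tick count and frequency would presumably come from CNTVCT_EL0 and CNTFRQ_EL0, but here they are plain arguments so the arithmetic can be checked on any host.

```c
#include <stdint.h>

/* Seconds since the Unix epoch, derived from a fixed base (for example the
 * firmware build time) plus the ticks elapsed since boot.
 * All parameter names are illustrative assumptions. */
static uint64_t fake_rtc_seconds(uint64_t base_epoch,
                                 uint64_t ticks_at_boot,
                                 uint64_t ticks_now,
                                 uint64_t ticks_per_second)
{
    if (ticks_per_second == 0)
        return base_epoch;   /* no usable frequency: report the base time */
    return base_epoch + (ticks_now - ticks_at_boot) / ticks_per_second;
}
```

Such a clock only ever counts up from the base, so it is monotonic but drifts from wall-clock time by however long ago the base was taken.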
On Fri, Mar 28, 2014 at 03:38:28PM -0400, Michael Casadevall wrote:
Complicating the issue is that I can't use printf once GetMemoryMap() is called without breaking EBS() (I think this is a bug in UEFI; Leif, 2 cents?), but I think I can twiddle the serial port directly without breaking shit.
yeah, just writing to the pl011 out should be trivial, or add an hvc temporary hack to KVM, I've done things like that when originally debugging kernel boot under KVM.
Having slept on it, it's probably easy to print out the pointers as we go through them, so I can get an idea of what's listening for EBS and try to narrow down my list of candidates.
yes, add a function that side-steps all the UEFI-weirdness (should be a few lines static function) that can print the pointers of the functions you're calling.
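Something along these lines might do as the few-line helper: a polled PL011 writer that touches nothing but the UART registers. The register offsets (data register at 0x00, flag register at 0x18, TXFF in bit 5) are the standard PL011 layout; the function names are made up, and the UART base address is left as a parameter because it is an assumption to check against the device tree (0x09000000 is what I would expect on QEMU's virt board).

```c
#include <stdint.h>

#define PL011_DR      (0x00 / 4)   /* data register, as a 32-bit word index */
#define PL011_FR      (0x18 / 4)   /* flag register, as a 32-bit word index */
#define PL011_FR_TXFF (1u << 5)    /* transmit FIFO full */

/* Busy-wait until the FIFO has room, then write one character. */
static void raw_putc(volatile uint32_t *uart, char c)
{
    while (uart[PL011_FR] & PL011_FR_TXFF) {
        /* spin */
    }
    uart[PL011_DR] = (uint32_t)(unsigned char)c;
}

/* Print a 64-bit value as "0x" plus 16 hex digits and a newline. */
static void raw_put_hex(volatile uint32_t *uart, uint64_t value)
{
    static const char digits[] = "0123456789abcdef";

    raw_putc(uart, '0');
    raw_putc(uart, 'x');
    for (int shift = 60; shift >= 0; shift -= 4)
        raw_putc(uart, digits[(value >> shift) & 0xf]);
    raw_putc(uart, '\n');
}
```

Passing the base in also means the helper can be exercised on a host against a dummy register array before it goes into the firmware.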
I put a DEBUG print line in the Timer interrupt handler, which prints out a message every tick letting me know the timer was working. When we call wfi, the timer ticks still show up (and I can see them through vgic with debugging there enabled)
Which timer interrupt handler? The UEFI one?
If you get an interrupt for the timer in UEFI, then your WFIs are not hanging, the VCPU actually resumes. Assuming you receive the interrupts on the same CPU that did the WFI.
The problem here is that the spec says THOU SHALT HAVE RTC. We could fake it by counting up from system start and using the UEFI build time as a starting point, but this is not what the spec writers had in mind (nothing says GetTime() has to be accurate :-)).
For KVM, I'm wondering if we should just stick a PL031 on the bus and be done with it. For Xen, we're going to need a way to do this via xenbus.
So the QEMU virt platform is simply not equipped to run UEFI? That's interesting. Peter, any thoughts?
-Christoffer
On 03/28/2014 03:47 PM, Christoffer Dall wrote:
yeah, just writing to the pl011 out should be trivial, or add an hvc temporary hack to KVM, I've done things like that when originally debugging kernel boot under KVM.
Just for the record, hvc?
yes, add a function that side-steps all the UEFI-weirdness (should be a few lines static function) that can print the pointers of the functions you're calling.
Biggest issue now is that binutils doesn't like PE/AArch64 files (addr2line and friends don't work), but I think I can muddle through it. There are tricks I can use at this point, if I have a pointer, to get an idea of where in UEFI it is.
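One such trick, sketched in plain C under the assumption that the load address and size of each image can be collected from what the firmware prints on the console: look up which image contains a given pointer and report the offset into it. The `loaded_image` table and `find_image` are hypothetical helpers, not TianoCore APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one loaded image, filled in by hand or by a script
 * from the load addresses printed on the console. */
struct loaded_image {
    const char *name;
    uint64_t    base;
    uint64_t    size;
};

/* Return the image whose [base, base + size) range contains addr, or NULL.
 * If offset is non-NULL, it receives addr relative to the image base. */
static const struct loaded_image *find_image(const struct loaded_image *images,
                                             size_t count, uint64_t addr,
                                             uint64_t *offset)
{
    for (size_t i = 0; i < count; i++) {
        if (addr >= images[i].base && addr - images[i].base < images[i].size) {
            if (offset != NULL)
                *offset = addr - images[i].base;
            return &images[i];
        }
    }
    return NULL;
}
```

The image name plus offset is enough to pick the right file for add-symbol-file, even when addr2line refuses the binary.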
Which timer interrupt handler? The UEFI one?
If you get an interrupt for the timer in UEFI, then your WFIs are not hanging, the VCPU actually resumes. Assuming you receive the interrupts on the same CPU that did the WFI.
We're running uni-proc as that's all KVM supports ATM. What happens is we wfi, the interrupt fires, the interrupt handler fires, and we remain at the wfi.
So the QEMU virt platform is simply not equipped to run UEFI? That's interesting. Peter, any thoughts?
I didn't notice the lack of RTC until I went to connect it; I should have highlighted this one sooner :-/.
On Fri, Mar 28, 2014 at 04:08:52PM -0400, Michael Casadevall wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 03/28/2014 03:47 PM, Christoffer Dall wrote:
On Fri, Mar 28, 2014 at 03:38:28PM -0400, Michael Casadevall wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
On 03/28/2014 02:09 PM, Christoffer Dall wrote:
On Fri, Mar 28, 2014 at 04:26:59AM -0400, Michael Casadevall wrote:
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
As I've made a fair bit of headway since LinaroConnect, I wanted to drop a line on my current progress with porting TianoCore to KVM
Summary (tl;dr version):
KVM can start TianoCore, and boot all the way to shell, and access HDDs via VirtioBlk. We can start grub and successfully retrieve files from ext partitions, load a device tree, and start the kernel. The kernel runs through most of the EFI stub, but falls over during ExitBootServices()
Thanks for providing this status!
Long Version:
So, after much blood, sweat, and tears, we're finally at the point of trying to actually start a kernel, though this (for the moment) remains an elusive goal. The current problem is that once we call EBS(), we get an exception from EFI with no Image information, which means the exception handler doesn't know where it came from. After several seconds, we get a second exception from within DxeCore, and then EFI falls over.
Debugging EFI is difficult and error prone. The gdb-stub in QEMU offers limited debug facilities (no breakpoints), and there is no decent way to load symbols for all of EFI itself (you have to run add-symbol-file manually with the output of commands printed on the console; supposedly it's possible to generate a giant GdbSyms.dll file to import in a single go, but I haven't succeeded at this). This is further complicated by the fact that we appear to be asserting somewhere in a driver, and short of adding printfs to *every* driver, it's impossible to know which one is asserting.
Maybe it's worth adding a hack-support-gdb-in-kvm implementation for this. If we go down this road, I can probably find time to help you out there.
Can you do some scripting to replace assert statements with "{ print("%s:%d\n", __FILE__, __LINE__); orig_assert(); }" type hack?
That's probably a decent idea if I can find where ASSERT() is defined. I'll try that in a bit.
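The suggested hack could look roughly like the following. This is only a sketch of the idea, not the TianoCore macro: TRACE_ASSERT, orig_assert and trace_assert_failures are hypothetical names, and orig_assert here is a stub that merely counts failures so the example keeps running on a host:

```c
#include <stdio.h>

/* Hypothetical counter so a caller can see how many checks failed. */
static int trace_assert_failures;

/* Stand-in for the platform's real assert handler. */
static void orig_assert(void)
{
    trace_assert_failures++;
}

/* Print the location and the failing expression first, then fall
 * through to the original handler. */
#define TRACE_ASSERT(expr)                                         \
    do {                                                           \
        if (!(expr)) {                                             \
            fprintf(stderr, "%s:%d: %s\n",                         \
                    __FILE__, __LINE__, #expr);                    \
            orig_assert();                                         \
        }                                                          \
    } while (0)
```

Because __FILE__ and __LINE__ expand at the call site, redefining the macro once in the header that defines it is enough; no per-driver printf is needed.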
Previous attempts to debug asserts show that EFI does "odd" things to the stack when we hit an exception, making walking it with GDB impossible. I need to figure out what madness EFI does with my SP so I can get the entire stack on an explosion, but this remains at best wishful thinking.
This sounds very strange - could it be that because you take an exception, you use a SP from a different mode and everything just messes up?
This could be GDB just being unhappy. I've had issues walking the stack in KVM in general, but even if I walk the stack by hand, I don't see a pointer to the next frame when we're in an exception. To my knowledge, UEFI uses the standard AArch64 C ABI, but this might be a faulty assumption on my part.
Further complicating things is that during EBS, my print debugging goes away. I might just cheat and roll a simple assembly function to bang out messages through serial without calling anything else. Ugly as sin, but this should let me get useful debug output through the EBS framework. Complicating matters is that I need to locate each and every EBS() event function, which are spread *everywhere* in TianoCore, and then debug each of them individually.
I'm a little confused, not knowing UEFI: is EBS() not a single function, and why does it matter that it's called from multiple places?
So, drivers and applications can register to be notified when ExitBootServices() is called. Registering pushes a function pointer into an array, which is then iterated through, and each pointer is called so drivers can unregister themselves from boot services, etc.
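As a rough model of that mechanism (a sketch only; real UEFI uses events rather than this hand-rolled table, and every name below is hypothetical), the registration array and the walk over it might look like this, with each callback's address logged before it is called so the last address printed points at the suspect:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef void (*notify_fn)(void *context);

#define MAX_NOTIFY 32

struct notify_entry {
    notify_fn fn;
    void *context;
};

static struct notify_entry notify_list[MAX_NOTIFY];
static size_t notify_count;

/* Add a callback to the list. Returns 0 on success, -1 if the list is full. */
static int register_exit_notify(notify_fn fn, void *context)
{
    if (notify_count >= MAX_NOTIFY) {
        return -1;
    }
    notify_list[notify_count].fn = fn;
    notify_list[notify_count].context = context;
    notify_count++;
    return 0;
}

/* Call every registered callback in order, logging its address first.
 * Returns the number of callbacks that were invoked. */
static size_t signal_exit_notify(void)
{
    for (size_t i = 0; i < notify_count; i++) {
        fprintf(stderr, "notify[%zu] = 0x%" PRIxPTR "\n",
                i, (uintptr_t)notify_list[i].fn);
        notify_list[i].fn(notify_list[i].context);
    }
    return notify_count;
}

/* Example callback: bumps the int that context points at. */
static void count_calls(void *context)
{
    ++*(int *)context;
}
```

If the walk dies partway through, the last logged address can be matched against the load addresses of the driver images to find which one registered the failing callback.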
Complicating the issue is that I can't use printf once GetMemoryMap() is called without breaking EBS() (I think this is a bug in UEFI; Leif, 2 cents?), but I think I can twiddle the serial port directly without breaking shit.
yeah, just writing to the pl011 out should be trivial, or add an hvc temporary hack to KVM, I've done things like that when originally debugging kernel boot under KVM.
Just for the record, hvc?
Hypervisor Call. 'HVC' is the instruction that causes a trap to KVM, so you could do "mov r1, #0x41; mov r0, #0xff42; hvc #0;" to invent a hypercall number 0xff42 meaning "do print somewhere" and print 'A'. That would be useful if you don't know what your memory map is like and have no idea if you can even get to your pl011 registers, but if you know the address, it may be much easier to just hardcode it.
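For the hardcoded-address route, a minimal sketch of a PL011 writer that calls nothing else is shown below. The register offsets (data register at 0x00, flag register at 0x18, TXFF in bit 5) are from the PL011 programming model; the base address is platform specific and is deliberately left as a parameter here, and the function names are hypothetical:

```c
#include <stdint.h>

#define PL011_DR_OFFSET 0x00u        /* data register */
#define PL011_FR_OFFSET 0x18u        /* flag register */
#define PL011_FR_TXFF   (1u << 5)    /* transmit FIFO full */

/* Write one character, spinning while the transmit FIFO is full.
 * base points at the start of the UART's register window. */
static void pl011_putc(volatile uint32_t *base, char c)
{
    while (base[PL011_FR_OFFSET / 4] & PL011_FR_TXFF) {
        /* busy-wait: no timers, no library calls */
    }
    base[PL011_DR_OFFSET / 4] = (uint32_t)(unsigned char)c;
}

/* Write a NUL-terminated string one character at a time. */
static void pl011_puts(volatile uint32_t *base, const char *s)
{
    while (*s != '\0') {
        pl011_putc(base, *s++);
    }
}
```

This assumes the UART was already initialised earlier in boot and that its registers are still mapped at the point of the call.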
Having slept on it, it's probably easy to print out the pointers as we go through them, so I can get an idea of what's listening for EBS and try to narrow down my list of candidates.
yes, add a function that side-steps all the UEFI-weirdness (should be a few lines static function) that can print the pointers of the functions you're calling.
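Such a helper needs a way to turn a pointer into text without printf. A small sketch, assuming pointers fit in 64 bits; format_hex64 is a hypothetical name, and the resulting buffer would be handed to whatever raw output routine is available:

```c
#include <stdint.h>

/* Format value as exactly 16 lowercase hex digits plus a terminating
 * NUL. Uses no library calls, so it is safe where printf is not. */
static void format_hex64(uint64_t value, char out[17])
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 0; i < 16; i++) {
        /* Most significant nibble first. */
        out[i] = digits[(value >> (60 - 4 * i)) & 0xf];
    }
    out[16] = '\0';
}
```

A caller would convert the function pointer with a cast to uintptr_t, format it, and write the 16 characters out before invoking the callback.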
The biggest issue now is that binutils doesn't like PE/AArch64 files (addr2line and friends don't work), but I think I can muddle through it. There are tricks I can use at this point, if I have a pointer, to get an idea of where UEFI is.
I'm open to ideas on how best to accomplish this.
On a larger scale, there are a couple of other bugs and odds and ends which currently affect us:
- wfi doesn't work
This is probably the biggest piece of functionality that should work but doesn't. The EFI event loop is built on checking the timer, then calling wfi to check the timer later. The problem here is we call wfi ... and UEFI never comes back despite events firing (I can put print code in the interrupt handler to confirm this). This may be related to the VGIC errors I get running kvm under foundation, but I haven't taken the time to properly nail down the bug here.
So if I understand it, the expected sequence of events are:
1. check timer (arch timer counter?)
2. WFI
3. virtual arch timer interrupt, causes wake-up from WFI
4. go to 1
But you seem to get stuck at (2)?
Exactly.
When you say "print code in the interrupt handler" is that the UEFI interrupt handler? In that case, you do wake up from the WFI...?
I put a DEBUG print line in the Timer interrupt handler, which prints out a message every tick letting me know the timer was working. When we call wfi, the timer ticks still show up (and I can see them through vgic with debugging there enabled)
Which timer interrupt handler? The UEFI one?
If you get an interrupt for the timer in UEFI, then your WFIs are not hanging, the VCPU actually resumes. Assuming you receive the interrupts on the same CPU that did the WFI.
We're running uni-proc as that's all KVM supports ATM. What happens is we wfi, the interrupt fires, the interrupt handler fires, and we remain at the wfi.
again, which interrupt handler?
If it's the one in UEFI, it sounds like your problem is not that you're stuck at WFI, but that you re-execute WFI.
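The distinction can be illustrated with a host-side simulation of the loop. Everything here is a stand-in: wfi_stub pretends the timer interrupt handler ran and then returns, which is what a real WFI does after a wake-up, and the names are hypothetical:

```c
#include <stdint.h>

/* Simulated state. On hardware, ticks would be advanced by the timer
 * interrupt handler rather than by the stub below. */
static volatile uint64_t ticks;
static unsigned wfi_executions;

/* Stand-in for WFI: "sleeps" until the next interrupt, whose handler
 * advances the tick count, then returns to the instruction after it. */
static void wfi_stub(void)
{
    wfi_executions++;
    ticks++;
}

/* Wait until ticks reaches deadline. Every wake-up re-checks the
 * condition; if the condition can never become true, this loop keeps
 * re-executing WFI, which from the outside looks like a hang at WFI
 * even though the interrupt handler runs each time. */
static void wait_until(uint64_t deadline)
{
    while (ticks < deadline) {
        wfi_stub();
    }
}
```

In this model a broken wait condition and an exception return that lands back on the WFI itself produce the same symptom, so tracing the return address is what tells them apart.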
-Christoffer
On 03/28/2014 05:02 PM, Christoffer Dall wrote:
again, which interrupt handler?
If it's the one in UEFI, it sounds like your problem is not that you're stuck at WFI, but that you re-execute WFI.
The one in EFI, and I suspect that's the case: we re-execute wfi.
Michael
On Fri, Mar 28, 2014 at 05:08:03PM -0400, Michael Casadevall wrote:
The one in EFI, and I suspect that's the case: we re-execute wfi.
If you run the interrupt handler, you are not stuck in wfi but re-executing it, and it's your software wait condition which is the problem, likely related to the missing RTC?
-Christoffer
On 03/28/2014 05:29 PM, Christoffer Dall wrote:
On Fri, Mar 28, 2014 at 05:08:03PM -0400, Michael Casadevall wrote:
if you run the interrupt handler you are not stuck in wfi, but re-execute it, and it's your software wait condition which is the problem - likely related to the missing rtc?
The pc returns to the wfi instruction after executing the interrupt handler; this is in a bit of assembly, and not in C code.
On Fri, Mar 28, 2014 at 07:17:49PM -0400, Michael Casadevall wrote:
The pc returns to the wfi instruction after executing the interrupt handler; this is in a bit of assembly, and not in C code.
It shouldn't. It should return to the instruction after the wfi instruction.
-Christoffer
On 03/28/2014 08:17 PM, Christoffer Dall wrote:
It shouldn't. It should return to the instruction after the wfi instruction.
Which is why I think there's a bug somewhere :-) This was what I was banging my head on at LC until I commented it out. Obviously that's not the correct thing to do, but I'm unsure if this is KVM or EFI doing something it shouldn't. As I said, I'm not convinced the GIC driver in EFI is 100% correct.
On Fri, Mar 28, 2014 at 08:57:15PM -0400, Michael Casadevall wrote:
Which is why I think there's a bug somewhere :-) This was what I was banging my head on at LC until I commented it out. Obviously that's not the correct thing to do, but I'm unsure if this is KVM or EFI doing something it shouldn't. As I said, I'm not convinced the GIC driver in EFI is 100% correct.
If your exception return is directly to the WFI instruction (and it's not because you do another iteration of the loop), then it sounds like the exception handler in UEFI is written incorrectly. KVM should not be involved at all in an RFE from EL0 to EL1. Also reading the virtual counter happens directly without trapping, so I would be equally surprised there.
Sounds to me like you need to trace the execution of UEFI exactly and figure out what it's doing.
If you can write up a guide on how to reproduce your results and where to look in the UEFI code, then I can try taking a look. (Writing such a guide may be needed and useful in any case).
-Christoffer
On 28 March 2014 20:08, Michael Casadevall michael.casadevall@linaro.org wrote:
The biggest issue now is that binutils doesn't like PE/AArch64 files (addr2line and friends don't work), but I think I can muddle through it. There are tricks I can use at this point, if I have a pointer, to get an idea of where UEFI is.
If you could create a bug in the sourceware bugzilla for this issue then I'll add it to my list of things to fix.
Cheers,
On 28 March 2014 19:38, Michael Casadevall michael.casadevall@linaro.org wrote:
On 03/28/2014 02:09 PM, Christoffer Dall wrote:
On Fri, Mar 28, 2014 at 04:26:59AM -0400, Michael Casadevall wrote:
Previous attempts to debug asserts show that EFI does "odd" things to the stack when we hit an exception, making walking it with GDB impossible. I need to figure out what madness EFI does with my SP so I can get the entire stack on an explosion, but this remains at best wishful thinking.
This sounds very strange - could it be that because you take an exception, you use a SP from a different mode and everything just messes up?
This could be GDB just being unhappy. I've had issues walking the stack in KVM in general, but even if I walk the stack by hand, I don't see a pointer to the next frame when we're in an exception. To my knowledge, UEFI uses the standard AArch64 C ABI, but this might be a faulty assumption on my part.
There's a bug in QEMU's AArch64 KVM support which means we don't do the right thing with SP on syncing state to/from the kernel, so don't trust that.... (Fixed either in master or in my a64-system patchset, I forget which).
thanks -- PMM
On 03/28/2014 07:40 PM, Peter Maydell wrote:
There's a bug in QEMU's AArch64 KVM support which means we don't do the right thing with SP on syncing state to/from the kernel, so don't trust that.... (Fixed either in master or in my a64-system patchset, I forget which).
My initial attempts at getting ASSERT to give me useful information failed; is this patchset for KVM or the kernel, and where can I find it?
Michael
On 31 March 2014 23:07, Michael Casadevall michael.casadevall@linaro.org wrote:
On 03/28/2014 07:40 PM, Peter Maydell wrote:
There's a bug in QEMU's AArch64 KVM support which means we don't do the right thing with SP on syncing state to/from the kernel, so don't trust that.... (Fixed either in master or in my a64-system patchset, I forget which).
My initial attempts at getting ASSERT to give me useful information failed; is this patchset for KVM or the kernel, and where can I find it?
It's for QEMU. https://lists.nongnu.org/archive/html/qemu-devel/2014-03/msg05588.html
Patch 16 is the one I'm talking about, though it probably depends on some of the preceding ones (mostly textual conflicts if you try to apply it on its own I suspect). You can also find that in git://git.linaro.org/people/peter.maydell/qemu-arm.git branch a64-system but beware: that is my work-in-progress branch and it may rebase, break arbitrarily, etc etc.
thanks -- PMM
On 03/31/2014 06:27 PM, Peter Maydell wrote:
It's for QEMU. https://lists.nongnu.org/archive/html/qemu-devel/2014-03/msg05588.html
Patch 16 is the one I'm talking about, though it probably depends on some of the preceding ones (mostly textual conflicts if you try to apply it on its own I suspect). You can also find that in git://git.linaro.org/people/peter.maydell/qemu-arm.git branch a64-system but beware: that is my work-in-progress branch and it may rebase, break arbitrarily, etc etc.
thanks -- PMM
I took your branch, added the UEFI patch to it, and then with some fiddling:
(gdb) reload-uefi -o GdbSyms.dll
EFI_SYSTEM_TABLE @ 0xff7cff18
Connected to KVM EFI Mar 31 2014 17:46:10 (Rev. 0x0)
ConfigurationTable @ 0xff7b7e18, 0x6 entries
DebugImageInfoTable @ 0xc007e018, 0x1b entries
Loading new symbols...
add-symbol-file /home/mcasadevall/uefi/Build/AArch64Virtualization-KVM/DEBUG_ARMLINUXGCC/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll 0xff7f1260
add symbol table from file "/home/mcasadevall/uefi/Build/AArch64Virtualization-KVM/DEBUG_ARMLINUXGCC/AARCH64/MdeModulePkg/Core/Dxe/DxeMain/DEBUG/DxeCore.dll" at
(snip a LOT of output)
Program received signal SIGINT, Interrupt.
(gdb) bt
0x00000000ff80bba8 in SetMemN (Buffer=0x0, Length=0, Value=0) at /home/mcasadevall/uefi/ArmPkg/Library/BaseMemoryLibStm/SetMemWrapper.c:87
87 return SetMem64 (Buffer, Length, (UINT64)Value);
(gdb) bt
#0 0x00000000ff80bba8 in SetMemN (Buffer=0x0, Length=0, Value=0) at /home/mcasadevall/uefi/ArmPkg/Library/BaseMemoryLibStm/SetMemWrapper.c:87
#1 0x0000000000000000 in ?? ()
(gdb)
It's a fairly major improvement, and I managed to use GdbSyms.dll to load ALL the symbol files in a single go, but I'm still having issues with the stack. At least now I can reliably get the frame we're currently in, but the backtrace remains busted.
reload-uefi is also incredibly slow, probably because it's going through three SSH proxies to dump out the DebugImageInfoTable. I can combine this with printing points out, though, and get a reasonable idea of where stuff is breaking.
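Since reload-uefi is slow over a long link, one option is to capture its console output once and replay the add-symbol-file lines locally. The Python below is a minimal sketch under that assumption. The function names extract_symbol_commands and to_gdb_script are hypothetical, and the line format it parses is inferred from the output quoted above, not taken from any TianoCore or GDB tool.

```python
import re

# Matches lines such as:
#   add-symbol-file /path/to/DxeCore.dll 0xff7f1260
# The pattern is an assumption based on the console output quoted above.
_CMD = re.compile(r"add-symbol-file\s+(\S+)\s+(0x[0-9a-fA-F]+)")


def extract_symbol_commands(log_text):
    """Return unique (path, load_address) pairs in first-seen order."""
    seen = set()
    result = []
    for match in _CMD.finditer(log_text):
        pair = (match.group(1), int(match.group(2), 16))
        if pair not in seen:
            seen.add(pair)
            result.append(pair)
    return result


def to_gdb_script(pairs):
    """Render the pairs as text that GDB can load with its `source` command."""
    return "".join(
        f"add-symbol-file {path} {addr:#x}\n" for path, addr in pairs
    )
```

The resulting text can be written to a file and loaded with GDB's `source` command, which avoids walking the DebugImageInfoTable again as long as the images load at the same addresses.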
On 1 April 2014 01:35, Michael Casadevall michael.casadevall@linaro.org wrote:
> It's a fairly major improvement, and I managed to use GdbSyms.dll to load ALL the symbol files in a single go, but I'm still having issues with the stack. At least now I can get the frame we're currently in reliably, but the backtrace remains busted.
Hrmm. If you ask gdb about native register values and the raw contents of the stack, you might be able to figure out (in conjunction with the disassembly of the relevant functions and the procedure calling standard) whether the problem is QEMU reporting bogus register values or if UEFI has been compiled with some funny options that cause it not to put in stack frames, or something else.
thanks -- PMM
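The manual check suggested above can be sketched in Python. Under the AArch64 procedure call standard, code built with frame pointers keeps a frame record at the address in x29: the caller's frame pointer at [x29] and the saved link register at [x29 + 8]. The helper below is a hypothetical illustration, not part of GDB or QEMU; read_memory stands in for whatever returns raw stack bytes (for example, values copied by hand from `x/2gx $x29`).

```python
import struct


def walk_frame_records(read_memory, fp, max_frames=64):
    """Follow AArch64 frame records starting at frame pointer `fp`.

    `read_memory(address, length)` is a caller-supplied (hypothetical)
    callback returning `length` bytes, or None if the address is unreadable.
    Each record is two little-endian 64-bit words: the caller's frame
    pointer at [fp] and the saved link register at [fp + 8].
    Returns a list of (frame_pointer, return_address) tuples.
    """
    frames = []
    visited = set()
    while fp != 0 and fp not in visited and len(frames) < max_frames:
        visited.add(fp)
        raw = read_memory(fp, 16)
        if raw is None or len(raw) != 16:
            break  # unreadable memory: the chain is broken here
        prev_fp, saved_lr = struct.unpack("<QQ", raw)
        frames.append((fp, saved_lr))
        fp = prev_fp  # a zero frame pointer ends the chain
    return frames
```

If the walk stops after one frame or returns addresses that fall outside any loaded image, that would be consistent with either bogus register values from QEMU or firmware built without frame records; the sketch alone cannot tell the two apart.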
On 04/01/14 03:22, Peter Maydell wrote:
> On 1 April 2014 01:35, Michael Casadevall michael.casadevall@linaro.org wrote:
>> It's a fairly major improvement, and I managed to use GdbSyms.dll to load ALL the symbol files in a single go, but I'm still having issues with the stack. At least now I can get the frame we're currently in reliably, but the backtrace remains busted.
> Hrmm. If you ask gdb about native register values and the raw contents of the stack, you might be able to figure out (in conjunction with the disassembly of the relevant functions and the procedure calling standard) whether the problem is QEMU reporting bogus register values or if UEFI has been compiled with some funny options that cause it not to put in stack frames, or something else.
Would this be a situation where Ftrace, KernelShark + Ftrace, or KernelShark + trace-cmd would ease the pain of debugging?
If not, why?
curiously, James
[2] https://lwn.net/Articles/425583/
[3] http://chemnitzer.linux-tage.de/2012/vortraege/shortpaper/985_Rostedt.txt