On Fri, Mar 28, 2014 at 04:26:59AM -0400, Michael Casadevall wrote:
As I've made a fair bit of headway since LinaroConnect, I wanted to drop a line on my current progress with porting TianoCore to KVM.
Summary (tl;dr version):
KVM can start TianoCore, boot all the way to the shell, and access HDDs via VirtioBlk. We can start grub and successfully retrieve files from ext partitions, load a device tree, and start the kernel. The kernel runs through most of the EFI stub, but falls over during ExitBootServices().
Thanks for providing this status!
Long Version:
So, after much blood, sweat and tears, we're finally at the point of trying to actually start a kernel, though this (for the moment) remains an elusive goal. The current problem is that once we call EBS(), we get an exception from EFI with no Image information, which means the exception handler doesn't know where it came from. After several seconds, we get a second exception from within DxeCore, and then EFI falls over.
Debugging EFI is difficult and error prone, made worse by the limited debug facilities of the gdb-stub in QEMU (no breakpoints) and by the lack of a decent way to load all of EFI itself (you have to run add-symbol-file manually with the output of commands printed on the console; supposedly it's possible to generate a giant GdbSyms.dll file to import in a single go, but I haven't succeeded at this). This is further complicated by the fact that we appear to be asserting somewhere in a driver, and short of adding printfs to *every* driver, it's impossible to know which one is asserting.
Maybe it's worth adding a hack-support-gdb-in-kvm implementation for this. If we go down this road, I can probably find time to help you out there.
Can you do some scripting to replace assert statements with "{ print("%s:%d\n", __FILE__, __LINE__); orig_assert(); }" type hack?
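A minimal sketch of what such a wrapper could look like, written against the standard C assert() rather than TianoCore's own ASSERT macro. The names TRACED_ASSERT, report_assert_site and format_assert_site are hypothetical, and stderr stands in for whatever console output still works in the firmware:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format "file:line\n" into buf; returns the length. */
static int format_assert_site(char *buf, size_t len, const char *file, int line)
{
    return snprintf(buf, len, "%s:%d\n", file, line);
}

/* Hypothetical helper: print the failing site before the real assert runs. */
static void report_assert_site(const char *file, int line)
{
    char site[256];
    format_assert_site(site, sizeof site, file, line);
    fputs(site, stderr);
}

/* Evaluate the condition once, print the site on failure, then fall
 * through to the original assert so the existing behaviour is kept. */
#define TRACED_ASSERT(expr)                          \
    do {                                             \
        int traced_ok_ = !!(expr);                   \
        if (!traced_ok_)                             \
            report_assert_site(__FILE__, __LINE__);  \
        assert(traced_ok_);                          \
    } while (0)
```

A script would then only need to rewrite each assert call to the wrapper; the last file:line on the console before the crash is the culprit.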
Previous attempts to debug asserts show that EFI does "odd" things to the stack when we hit an exception, making walking it with GDB impossible. I need to figure out what madness EFI does with my SP so I can get the entire stack on an explosion, but this remains at best hopeful thinking.
This sounds very strange - could it be that because you take an exception, you use a SP from a different mode and everything just messes up?
Further complicating things is that during EBS, my print debugging goes away. I might just cheat and roll a simple assembly function to bang out messages through serial without calling anything else. Ugly as sin, but this should let me get useful debug output through the EBS framework. Complicating matters is that I need to locate each and all EBS() event functions, which are spread *everywhere* in TianoCore, and then debug them each individually.
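A rough sketch of such a stand-alone output routine, in C rather than assembly, assuming the console is a PL011 UART (data register at offset 0x00, flag register at offset 0x18, TXFF in bit 5). The function names are made up for illustration, and the base address is passed in rather than hard-coded, since the right value depends on the machine model:

```c
#include <stdint.h>

#define UART_DR_INDEX 0           /* PL011 data register, offset 0x00 */
#define UART_FR_INDEX (0x18 / 4)  /* PL011 flag register, offset 0x18 */
#define UART_FR_TXFF  (1u << 5)   /* transmit FIFO full */

/* Hypothetical: write one character by polling, calling nothing else. */
static void raw_putc(volatile uint32_t *uart, char c)
{
    while (uart[UART_FR_INDEX] & UART_FR_TXFF)
        ;  /* spin until there is room in the FIFO */
    uart[UART_DR_INDEX] = (uint32_t)(unsigned char)c;
}

/* Hypothetical: write a NUL-terminated string; returns characters sent. */
static unsigned raw_puts(volatile uint32_t *uart, const char *s)
{
    unsigned sent = 0;
    while (*s) {
        raw_putc(uart, *s++);
        sent++;
    }
    return sent;
}
```

Because it touches only two device registers and no boot services, something along these lines should keep working inside the EBS() event callbacks, provided the UART is still mapped at that point.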
I'm a little confused, not knowing UEFI: is EBS() not a single function, and why does it matter that it's called from multiple places?
I'm open to ideas on how best to accomplish this.
On a larger scale, there are a couple of other bugs and odds and ends which currently affect us:
- wfi doesn't work
This is probably the biggest w.r.t. functionality that should work, but doesn't. The EFI event loop is built on checking the timer, then calling wfi to check the timer later. The problem here is we call wfi ... and UEFI never comes back despite events firing (I can put print code in the interrupt handler to confirm this). This may be related to the VGIC errors I get running kvm under foundation, but I haven't taken the time to properly nail down the bug here.
So if I understand it, the expected sequence of events is:
1. check timer (arch timer counter?)
2. WFI
3. virtual arch timer interrupt, causes wake-up from WFI
4. go to 1
But you seem to get stuck at (2)?
When you say "print code in the interrupt handler" is that the UEFI interrupt handler? In that case, you do wake up from the WFI...?
Do you see stuff happening in virt/kvm/arm/arch_timer.c: kvm_timer_inject_irq()?
That should call kvm_vgic_inject_irq(), which should vgic_kick_vcpus(kvm), which is what wakes you up from your WFI.
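To make the guest-side sequence concrete, here is a small host-runnable model of it. Everything in it is hypothetical (struct idle_hooks, wait_until, and the fake_* stand-ins); on real hardware the two hooks would read the arch timer counter and execute WFI:

```c
#include <stdint.h>

/* Hypothetical hooks so the loop can be exercised off-target. */
struct idle_hooks {
    uint64_t (*read_counter)(void);    /* step 1: check the timer */
    void (*wait_for_interrupt)(void);  /* step 2: WFI */
};

/* Sleep until the counter reaches deadline; returns how many times
 * the wait hook was entered and left again. */
static unsigned wait_until(const struct idle_hooks *h, uint64_t deadline)
{
    unsigned wakeups = 0;
    while (h->read_counter() < deadline) {
        h->wait_for_interrupt();  /* returns only when an interrupt arrives (step 3) */
        wakeups++;                /* step 4: back to the timer check */
    }
    return wakeups;
}

/* Stand-ins for testing: each "interrupt" advances the counter by 10. */
static uint64_t fake_now;
static uint64_t fake_read_counter(void) { return fake_now; }
static void fake_wfi(void) { fake_now += 10; }
```

If the wake-up in step 3 never reaches the vcpu, the call into the wait hook simply never returns, which matches the hang described above; counting wakeups is one cheap way to see whether that is what happens.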
This was worked around by commenting out the wfi, turning the event loop into a busy loop, but this has to be resolved before we can ever consider merging it.
- No RTC
I looked through virt.c in KVM, and as best I can tell, I've got no RTC at all (no PL031). It also appears that the kernel can't get an RTC, as running a kernel gets me a 1970 clock. I'm not sure if this is by design or not, but it causes GetTime() to return EFI_ERROR, and I suspect it may be behind one of the exceptions I'm getting (Shell prints a ton of warnings that GetTime is busted).
The only thing you can use to tell the passing of time in mach-virt is the arch-timer counter, combined with a fixed starting point.
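As a sketch of that arithmetic (the function name and the choice of epoch are assumptions; on AArch64 the counter and its frequency would come from CNTVCT_EL0 and CNTFRQ_EL0):

```c
#include <stdint.h>

/* Hypothetical: derive a wall-clock value in seconds from a fixed
 * starting point plus the elapsed arch-timer ticks. */
static uint64_t wall_clock_seconds(uint64_t start_epoch,
                                   uint64_t counter_ticks,
                                   uint64_t counter_hz)
{
    if (counter_hz == 0)
        return start_epoch;  /* frequency unknown: report the starting point */
    return start_epoch + counter_ticks / counter_hz;
}
```

For example, with a starting point of 1400000000 and a 100 Hz counter reading 300, this gives 1400000003. It only tracks elapsed time since the chosen starting point; it cannot know the real date.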
-Christoffer