Thanks for jumping in!
On 29 February 2016 at 20:59, Ard Biesheuvel ard.biesheuvel@linaro.org wrote:
On 29 February 2016 at 21:05, Laszlo Ersek lersek@redhat.com wrote:
On 02/29/16 20:39, Ryan Harkin wrote:
Hi Ard/Leif/anyone who cares,
So I was trying to work out who broke MMC support in TC2 in the upstream EDK2 tree. It was difficult because the tree is borked on TC2 in so many interesting ways throughout history, but I eventually bisected down to this patch:
300fc77 2015-08-25 ArmPlatformPkg/PL180MciDxe: check PrimeCell ID before initializing [Ard Biesheuvel]
Basically, TC2 reads 0x02 for MCI_PERIPH_ID_REG3, when, according to the spec, the register is supposed to read 0x00 in all cases. So the driver doesn't probe and is never initialised. I guess this is an FPGA bug in TC2? It's probably known about, but not to me ;-)
Anyway, how to fix it??
We could mask off the "stuck" bit, we could not check ID_REG3, there are other things we could do.
I decided to mask off the bit rather than discard the register check in my patch below, just to get things working
But would you like to do?
For extra point.... this was extra fun to track down due to other problems. TC2 stopped booting since this patch was submitted
d340ef7 2014-08-26 ArmPkg/ArmArchTimerLib: Remove non required [depex] and IoLib [Olivier Martin]
I've always carried a revert patch in my tree because I was previously told I was wrong and that it wasn't a problem, even though it clearly is. TC2 is spewing out a constant stream of this message:
IRQ Exception PC at 0xBFB74C20 CPSR 0x60000133
^^ That only happens on a release build. I suspect there's a there bug too. The IRQ exception shouldn't happen infinitely. I think it should appear once and never again.
But anyway, on a debug build, I see this once I've fixed the probe function:
Probing ID registers at 0x11C050FE0 for a PL180
IRQ Exception PC at 0xBF71FC58 CPSR 0x60000133 nZCveAifT_svc /working/platforms/uefi/edk2/Build/ArmVExpress-CTA15-A7/DEBUG_GCC49/ARM/ArmPlatformPkg/Drivers/PL180MciDxe/PL180MciDxe/DEBUG/PL180MciDxe.dll loaded at 0xBF71E000 (PE/COFF offset) 0x1C58 (ELF or Mach-O offset) 0x1A38 0x9303 STR r3, [sp, #0xC] R0 0x00000001 R1 0x00000400 R2 0x00000800 R3 0x00000080 R4 0xBF810391 R5 0xBF831000 R6 0x00000000 R7 0xB000021C R8 0x80000100 R9 0xB8000000 R10 0xBFFEC000 R11 0x00000000 R12 0x00000000 SP 0xBFFFFB40 LR 0xBF71FC39 PC 0xBF71FC58 DFSR 0x00001E17 DFAR 0xE1D3C103 IFSR 0x00001236 IFAR 0xE9404847 No function: write to 0xE1D3C103 Instruction Access Flag fault on Page at 0xE9404847
ASSERT [ArmCpuDxe] /working/platforms/uefi/edk2/ArmPkg/Library/DefaultExceptionHandlerLib/Arm/DefaultExceptionHandler.c(258): ((BOOLEAN)(0==1))
It wasn't fixed until Ard's patch that broke MMC support. Ugh!
I'm suspecting that the MMC support has a dependency on IoLib - for that is the part of the patch that broke TC2 in the first place. But I have yet to investigate that problem; I don't even know what IoLib is.
IoLib is a library class that lets you massage IO ports and MMIO registers.
MdePkg/Include/Library/IoLib.h
The patch you quoted does two things: it removes ArmArchTimerLib's build-time dependency on the IoLib class, and it removes the runtime (dispatch) dependency on EFI_CPU_ARCH_PROTOCOL of any module that is linked against ArmArchTimerLib (unless that module has the same dependency due to another library instance it links against, or due to its own explicit [depex] section).
Removing the library class dependency could introduce such a problem only if the actual library instance used for that dependency had a constructor function that is henceforth no longer called, and this function changed something related to interrupts. Very unlikely.
Removing the DXE dispatch dependency on EFI_CPU_ARCH_PROTOCOL is the likely culprit, in my opinion. The driver that provides said architectural protocol probably massages interrupt configuration on the CPU or the GIC in its entry point function in such a way that ArmArchTimerLib actually silently depends on, without explicitly calling EFI_CPU_ARCH_PROTOCOL member functions. By removing the depex, the DXE core may have reordered another driver (that links against ArmArchTimerLib) versus the driver providing EFI_CPU_ARCH_PROTOCOL -- for which reason the timerlib functions may now run without the necessary interrupt setup.
Instances of the TimerLib class have always been finicky. For example, in OvmfPkg we have three instances (for various module types & firmware phases). The two instances that get linked into early module types (SEC, and PEI_CORE, PEIM, DXE_CORE) massage chipset registers directly, because that was the only robust way to make sure that whichever of these module (types) needed the ACPI timer could actually utilize it. Through these library instances, every such "early" module (that needs TimerLib) looks at the chipset registers, and sets the needed bits if they are not in place yet.
Thanks for the analysis
It appears that PL180MciDxe's dependency on gEfiCpuArchProtocolGuid was transitively fulfilled by its dependency on TimerLib, which is implemented by ArmArchTimerLib on TC2, and the patch removes it from the depex
It looks like both of you are right: If I only revert the Depex part of the commit, all works.
+[Depex] + gEfiCpuArchProtocolGuid
If I only revert the IoLib part, then the board still dies with the same ASSERT.
Simply replacing TRUE with gEfiCpuArchProtocolGuid in the Depex section of PL180MciDxe.inf should do the trick.
Yes, that works, thanks. It looks like a better fix than adding it back into
As far as the Primecell ID is concerned, let's just whitelist whatever TC2 exposes, even if in error.
You mean rather than mask the "stuck bit", specifically check if the register is 0 or 2? (Where 2 is added as a #define to the header).
I intend to also add a patch that prints debug if the probed registers don't match, sound reasonable?