On 8 February 2016 at 18:12, Ryan Harkin ryan.harkin@linaro.org wrote:
On 5 February 2016 at 18:50, Laszlo Ersek lersek@redhat.com wrote:
On 02/05/16 19:36, Ryan Harkin wrote:
Hi Laszlo,
On 5 February 2016 at 17:19, Laszlo Ersek lersek@redhat.com wrote:
On 02/05/16 17:35, Ryan Harkin wrote:
Hello all,
I'm having a problem that shows up on one specific platform, but it may be a more generic problem.
When ARM's Juno board boots, not all devices are connected. The first boot creates the boot variables and sets their order, meaning that we get the following list on the first attempt:
EFI Misc Device
EFI Misc Device 1
EFI Internal Shell
Intel BDS then attempts to boot from one of the devices and ends up in Shell. After exiting Shell, the Intel BDS console GUI comes up. Selecting the Boot Manager option shows more devices being connected and the list becomes longer:
EFI Misc Device
EFI Misc Device 1
EFI Internal Shell
EFI Hard Drive
EFI Network
Subsequent boots will never attempt to boot from Hard Drive or Network because Shell will always succeed. That is not good.
Leif has a patch in his working tree that solves this problem [1] by making the platform call BdsLibConnectAll() at init time. So now, the first-time boot order looks sane:
EFI Misc Device
EFI Misc Device 1
EFI Hard Drive
EFI Network
EFI Internal Shell
However, when the board is booting, the "EFI Network" option fails to boot the first time and so the board drops back to Shell again:
Warning: LAN9118 Driver in stopped state
Link timeout in auto-negotiation.
Lan9118: Auto Negociation not supported.
EhcExecTransfer: transfer failed with 2
EhcControlTransfer: error - Device Error, transfer - 2
Buffer: EFI Hard Drive
Booting EFI Misc Device
Booting EFI Misc Device 1
Booting EFI Hard Drive
Booting EFI Network
Warning: LAN9118 Driver not initialized
Link timeout in auto-negotiation.
Lan9118: Auto Negociation not supported.
Booting EFI Internal Shell
Exiting Shell drops the user back to the Intel BDS UI. Selecting "Continue" then succeeds in booting from the EFI Network:
Booting EFI Misc Device
Booting EFI Misc Device 1
Booting EFI Hard Drive
Booting EFI Network
..MnpFreeTxBuf: Duplicated recycle report from SNP.
MnpFreeTxBuf: Duplicated recycle report from SNP.
[snip repeated SNP errors]
If I duplicate the call to BdsLibConnectAll() [2], then boot works as expected. On first boot, the boot order is created correctly and EFI Network pulls down a file and boots it.
I'm assuming that the 2nd call is connecting things that didn't connect the first time. And from that, I suspect/guess that perhaps they didn't connect due to either ordering or timing.
Is there a recommended way to set the order things are connected? Is it even possible to specify dependencies or order? And if so, how do we work out what the order should be?
I cannot give a coherent answer, just a few thoughts.
(1) I think BdsLibConnectAll() actually succeeds for the first time as well. All devices are enumerated, all drivers are connected, aren't they? The boot order is a separate question.
Yes, you're right, they are all connected because they all appear in the boot list.
(2) The network, the NIC, or the NIC driver are more probable suspects. If I see right, you always have a misc / misc1 / hd / network sequence of attempts, it's just that on the first few occasions, the network fails. ("Link timeout in auto-negotiation".)
Correct.
(3) I think repeated BdsLibConnectAll() calls may only give more time to the NIC to bring itself into working shape. What if you keep only one BdsLibConnectAll(), and replace the second BdsLibConnectAll() with a sizeable gBS->Stall()?
Eureka! I replaced the 2nd BdsLibConnectAll() with "gBS->Stall(500000);" (0.5 seconds) and this also works every time.
So time to negociate (sic) would seem like the culprit. I suppose a 2nd BdsLibConnectAll() buys the NIC some time.
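A minimal sketch of that experiment, written as stand-alone C so it can be compiled outside the firmware tree. StubConnectAll() and StubStall() are made-up stand-ins for BdsLibConnectAll() and gBS->Stall(), and PlatformConnectDevices() is a hypothetical name, not a function from the Juno platform code. Only the 500000 microsecond value comes from the test described above.

```c
#include <stdint.h>

/* Counters so the effect of the sequence can be inspected. */
unsigned connect_all_calls = 0;
uint64_t stalled_us = 0;

/* Stand-in for BdsLibConnectAll(): just records that it was called. */
void StubConnectAll(void)
{
    connect_all_calls++;
}

/* Stand-in for gBS->Stall(): records the requested delay in microseconds. */
void StubStall(uint64_t microseconds)
{
    stalled_us += microseconds;
}

/* 0.5 s, the value that worked in the experiment. */
#define NIC_SETTLE_DELAY_US 500000u

/* Connect everything once, then give the NIC time to finish negotiating
 * instead of calling the connect routine a second time. */
void PlatformConnectDevices(void)
{
    StubConnectAll();
    StubStall(NIC_SETTLE_DELAY_US);
}
```

This only models the ordering of the two calls. It says nothing about whether a fixed stall in platform code is the right place for the wait.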
I'm left wondering if the "Boot EFI Network" option should actually be waiting for negotiation, however. I'm sure it's common on first boot that the network needs a little time to negotiate. I'll look into that. Perhaps there is a setting or an override to tell it to be patient?
AutoNegotiate() in "EmbeddedPkg/Drivers/Lan9118Dxe/Lan9118DxeUtil.c" uses a fixed timeout of 2000 * LAN9118_STALL (where LAN9118_STALL is 2 microseconds). LAN9118_STALL seems extremely short, but even if it is correct as a polling interval for the NIC, the 2000 should be made a PCD, probably.
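For scale: 2000 polls at 2 microseconds each is a total budget of about 4 milliseconds, far short of the 0.5 second stall that made the network boot work. The sketch below shows the general shape of a poll loop with a configurable retry count. It is not the actual AutoNegotiate() code: FakePhy, FakePhyLinkIsUp(), FakeStall() and WaitForAutoNegotiation() are hypothetical names, and the max_polls parameter only plays the role that a PCD would play in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Polling interval quoted above for LAN9118_STALL, in microseconds. */
#define LAN9118_STALL_US 2u

/* Hypothetical model of a PHY whose link comes up after a number of polls. */
typedef struct {
    uint32_t polls_until_link_up;
    uint32_t polls_seen;
    uint64_t waited_us;
} FakePhy;

/* Reports link-up once the PHY has been polled often enough. */
bool FakePhyLinkIsUp(FakePhy *phy)
{
    phy->polls_seen++;
    return phy->polls_seen >= phy->polls_until_link_up;
}

/* Stand-in for a stall: only accumulates the time that would be spent. */
void FakeStall(FakePhy *phy, uint32_t microseconds)
{
    phy->waited_us += microseconds;
}

/* Poll until the link is up or max_polls attempts have been used.
 * Returns true on link-up, false on timeout. */
bool WaitForAutoNegotiation(FakePhy *phy, uint32_t max_polls)
{
    for (uint32_t i = 0; i < max_polls; i++) {
        if (FakePhyLinkIsUp(phy)) {
            return true;
        }
        FakeStall(phy, LAN9118_STALL_US);
    }
    return false;
}
```

With a fake PHY that needs 30000 polls (an arbitrary figure chosen for illustration), a limit of 2000 times out after 4000 microseconds of waiting, while a larger limit such as 400000 lets the loop see the link come up.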
As you'll see, I've just submitted a patch to convert the value to a PCD, keeping the same default value.
If I use this patch and another patch in the Juno .dsc file to set the PCD to 2000, then networking works just fine. I don't need any extra stalls in the platform code.
I've spent all day trying to test the same patches on Versatile Express TC2, but I've now discovered that Ethernet has not worked on my TC2 since the day the driver was submitted. And it was submitted specifically for TC2. Quality.
Just an FYI on this TC2 issue:
LAN9118 works on release builds but not on debug builds.
I've traced back to the original "known good" code from before the LAN9118 driver was submitted and it shows the same symptoms: debug bad, release good.
Either way, to get TC2 to auto-negotiate, I have to set the new PCD timeout to 10 times higher than the one on Juno. Currently I'm using 400000. But transmit still fails no matter what.
Of course, the problems could have nothing to do with LAN9118. To eliminate the networking stack, I tried with the FVP models and they work under both release and debug, although debug gives repeated errors:
LAN91x: SnpTransmit(): TxQueue insert failure.
So nothing conclusive there. More work is needed.