Hello all,
I'm having a problem that is platform-specific, but it perhaps points to a more generic issue.
When ARM's Juno board boots, not all devices are connected. The first boot creates the boot variables and sets their order, meaning that we get the following list on the first attempt:
EFI Misc Device
EFI Misc Device 1
EFI Internal Shell
Intel BDS then attempts to boot from one of the devices and ends up in Shell. After exiting Shell, the Intel BDS console GUI comes up. Selecting the Boot Manager option shows more devices being connected and the list becomes longer:
EFI Misc Device
EFI Misc Device 1
EFI Internal Shell
EFI Hard Drive
EFI Network
Subsequent boots will never attempt to boot from Hard Drive or Network because Shell will always succeed. That is not good.
Leif has a patch in his working tree that solves this problem [1] by making the platform call BdsLibConnectAll() at init time. So now, the first time boot order looks sane:
EFI Misc Device
EFI Misc Device 1
EFI Hard Drive
EFI Network
EFI Internal Shell
However, when the board is booting, "EFI Network" fails to boot the first time, and so the board drops back to Shell again:
Warning: LAN9118 Driver in stopped state
Link timeout in auto-negotiation.
Lan9118: Auto Negociation not supported.
EhcExecTransfer: transfer failed with 2
EhcControlTransfer: error - Device Error, transfer - 2
Buffer: EFI Hard Drive
Booting EFI Misc Device
Booting EFI Misc Device 1
Booting EFI Hard Drive
Booting EFI Network
Warning: LAN9118 Driver not initialized
Link timeout in auto-negotiation.
Lan9118: Auto Negociation not supported.
Booting EFI Internal Shell
Exiting Shell drops the user back to the Intel BDS UI. Selecting "Continue" then succeeds in booting from the EFI Network:
Booting EFI Misc Device
Booting EFI Misc Device 1
Booting EFI Hard Drive
Booting EFI Network
..MnpFreeTxBuf: Duplicated recycle report from SNP.
MnpFreeTxBuf: Duplicated recycle report from SNP.
[snip repeated SNP errors]
If I duplicate the call to BdsLibConnectAll() [2], then boot works as expected. On first boot, the boot order is created correctly and EFI Network pulls down a file and boots it.
I'm assuming that the 2nd call is connecting things that didn't connect the first time. And from that, I suspect/guess that perhaps they didn't connect due to either ordering or timing.
Is there a recommended way to set the order things are connected? Is it even possible to specify dependencies or order? And if so, how do we work out what the order should be?
Regards, Ryan.
[1] https://git.linaro.org/uefi/linaro-edk2.git/commitdiff/bfbd0ef1a182e1baa120f...
[2] https://git.linaro.org/landing-teams/working/arm/edk2.git/commitdiff/25320ba...
On 02/05/16 17:35, Ryan Harkin wrote:
I cannot give a coherent answer, just a few thoughts.
(1) I think BdsLibConnectAll() actually succeeds for the first time as well. All devices are enumerated, all drivers are connected, aren't they? The boot order is a separate question.
(2) The network, the NIC, or the NIC driver are more probable suspects. If I see right, you always have a misc / misc1 / hd / network sequence of attempts, it's just that on the first few occasions, the network fails. ("Link timeout in auto-negotiation".)
(3) I think repeated BdsLibConnectAll() calls may only give more time to the NIC to bring itself into working shape. What if you keep only one BdsLibConnectAll(), and replace the second BdsLibConnectAll() with a sizeable gBS->Stall()?
(4) What the boot order should be can be influenced by the platform BDS lib, in the PlatformBdsPolicyBehavior() function.
Namely, the BdsEntry() function in "IntelFrameworkModulePkg/Universal/BdsDxe/BdsEntry.c" initializes the "BootOptionList" variable to an empty list. Then it calls PlatformBdsPolicyBehavior(), which takes "BootOptionList" as an input/output parameter -- if it wishes, it can populate it.
In ArmVirtPkg and in OvmfPkg, we perform the following steps in PlatformBdsPolicyBehavior():
(a) connect the console(s)
(b) BdsLibConnectAll()
(c) BdsLibEnumerateAllBootOption (BootOptionList) -- this relies on the presence of all devices, from the previous step. This function (in "IntelFrameworkModulePkg/Library/GenericBdsLib/BdsBoot.c") has extensive documentation in its leading comment.
It will enumerate everything sensible (modifying BootOrder as well I think), and output a BootOptionList that contains all the possible boot options, in a sane order. Sanity means, if I remember correctly, that all options that existed previously and were referenced by BootOrder, retain their positions at the front of the list, and any new auto-detected boot options are tacked to the end.
(d) SetBootOrderFromQemu (BootOptionList) -- this is the really platform specific part for massaging the boot order. We read through BootOptionList -- we don't modify it --, do various calculations, and then rewrite the BootOrder variable. Importantly, all Boot#### variables that become *unreferenced* by BootOrder as a result of this, must be deleted (otherwise they constitute a leak). Again, BootOptionList is not modified.
(e) BdsLibBuildOptionFromVar (BootOptionList, L"BootOrder") -- it rebuilds BootOptionList from the new BootOrder contents. (We are again in PlatformBdsPolicyBehavior(), where BootOptionList counts as input/output.)
On a physical platform, I think you just go with (b) and (c), and then let the user customize the boot order. Next time you boot, (c) will respect that.
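To illustrate the ordering rule described in (c), here is a minimal, self-contained C sketch. It is not the GenericBdsLib implementation: merge_boot_order() and the option numbers are hypothetical, and the code models only the "existing entries keep their place, newly detected ones are appended" behaviour as recalled above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if value v occurs in list[0..n-1], else 0. */
static int contains(const uint16_t *list, size_t n, uint16_t v)
{
    for (size_t i = 0; i < n; i++) {
        if (list[i] == v) {
            return 1;
        }
    }
    return 0;
}

/*
 * Hypothetical helper: build a new boot order in out[].
 * Entries already in the existing order come first, in their old order;
 * detected entries that are not yet listed are appended at the end.
 * Returns the number of entries written (at most out_cap).
 */
size_t merge_boot_order(const uint16_t *existing, size_t n_existing,
                        const uint16_t *detected, size_t n_detected,
                        uint16_t *out, size_t out_cap)
{
    assert(out != NULL);
    size_t n = 0;

    for (size_t i = 0; i < n_existing && n < out_cap; i++) {
        out[n++] = existing[i];
    }
    for (size_t i = 0; i < n_detected && n < out_cap; i++) {
        if (!contains(out, n, detected[i])) {
            out[n++] = detected[i];
        }
    }
    return n;
}
```

With an existing order of Misc, Misc 1, Shell and newly detected Hard Drive and Network options, this rule leaves Shell ahead of the new entries, which is the situation described at the start of this thread.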
There are further possibilities; there is a "boot mode" HOB with which your low-level platform code can control your BDS policy, in order to speed up things. See BdsLibGetBootMode() and the macros in "MdePkg/Include/Pi/PiBootMode.h". Those macros are documented in one of the PI spec volumes.
For example, I think BOOT_ASSUMING_NO_CONFIGURATION_CHANGES is meant to be very fast (no need to connect all devices to all drivers), but such a HOB must be produced by your own PEI phase somehow -- you must know for example that the chassis was never opened while the machine was off.
FWIW, OVMF only uses BOOT_WITH_FULL_CONFIGURATION, and BOOT_ON_S3_RESUME, and these two are differentiated in OVMF's PEI phase by reading a CMOS register.
Anyway, I think what you need is:
- call BdsLibConnectAll() exactly once
- give that NIC more time (?)
- if you'd like to regenerate all possible boot options *at the end* of BootOrder that the user may have deleted (or have become available by installing new hardware), call BdsLibEnumerateAllBootOption() too.
Laszlo
Hi Laszlo,
On 5 February 2016 at 17:19, Laszlo Ersek lersek@redhat.com wrote:
> (1) I think BdsLibConnectAll() actually succeeds for the first time as well. All devices are enumerated, all drivers are connected, aren't they? The boot order is a separate question.
Yes, you're right, they are all connected because they all appear in the boot list.
> (2) The network, the NIC, or the NIC driver are more probable suspects. If I see right, you always have a misc / misc1 / hd / network sequence of attempts, it's just that on the first few occasions, the network fails. ("Link timeout in auto-negotiation".)
Correct.
> (3) I think repeated BdsLibConnectAll() calls may only give more time to the NIC to bring itself into working shape. What if you keep only one BdsLibConnectAll(), and replace the second BdsLibConnectAll() with a sizeable gBS->Stall()?
Eureka! I replaced the 2nd BdsLibConnectAll() with "gBS->Stall(500000);" (0.5 seconds) and this also works every time.
So time to negociate (sic) would seem like the culprit. I suppose a 2nd BdsLibConnectAll() buys the NIC some time.
I'm left wondering if the "Boot EFI Network" option should actually be waiting for negotiation, however. I'm sure it's common on first boot that the network needs a little time to negotiate. I'll look into that. Perhaps there is a setting or an override to tell it to be patient?
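One way to be "patient" without a fixed delay is to poll the link state in short steps up to an upper bound. The sketch below is plain C with a simulated NIC; wait_for_link(), the callback types and the fake_* helpers are hypothetical and are not part of the LAN9118 driver or of EDK2. In firmware, the delay callback would presumably wrap gBS->Stall() and the link callback would read the PHY status.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callbacks standing in for the platform. */
typedef bool (*link_up_fn)(void *ctx);           /* true once the link is up */
typedef void (*stall_fn)(uint32_t microseconds); /* busy-wait style delay    */

/*
 * Poll for link-up every step_us microseconds, giving up once at least
 * timeout_us microseconds have been spent waiting.
 * Returns true as soon as the link reports up, false on timeout.
 */
bool wait_for_link(link_up_fn link_up, void *ctx, stall_fn stall,
                   uint32_t timeout_us, uint32_t step_us)
{
    assert(link_up != NULL && stall != NULL && step_us > 0);
    uint32_t waited = 0;

    for (;;) {
        if (link_up(ctx)) {
            return true;
        }
        if (waited >= timeout_us) {
            return false;
        }
        stall(step_us);
        waited += step_us;
    }
}

/* Simulated NIC for illustration: the link comes up after a set number of polls. */
struct fake_nic {
    unsigned polls_until_up; /* polls that report "down" before the link is up */
    unsigned polls;          /* polls made so far */
};

bool fake_link_up(void *ctx)
{
    struct fake_nic *nic = ctx;
    nic->polls++;
    return nic->polls > nic->polls_until_up;
}

/* The simulation does not need a real delay. */
void fake_stall(uint32_t microseconds)
{
    (void)microseconds;
}
```

Compared with an unconditional 0.5 second stall, this returns as soon as the link is ready and still bounds the worst-case delay when no cable is attached.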
> On a physical platform, I think you just go with (b) and (c), and then let the user customize the boot order. Next time you boot, (c) will respect that.
Excellent answer, thanks. It looks like (c) is exactly the thing I'm looking for. For example, make HDD boot before USB. That sort of thing.
I'm quite happy that once the default boot order has been set, it stays that way unless the user changes it. I don't (think I) want to customise the boot order after the initial boot.
> Anyway, I think what you need is:
> - call BdsLibConnectAll() exactly once
> - give that NIC more time (?)
> - if you'd like to regenerate all possible boot options *at the end* of BootOrder that the user may have deleted (or have become available by installing new hardware), call BdsLibEnumerateAllBootOption() too.
Yes, that sounds about right. I have concerns about the negotiation timing, but the boot order hacking sounds like what I'm looking for.
Thanks again, Ryan.
On Feb 5, 2016, at 10:36 AM, Ryan Harkin ryan.harkin@linaro.org wrote:
> I'm left wondering if the "Boot EFI Network" option should actually be waiting for negotiation, however. I'm sure it's common on first boot that the network needs a little time to negotiate. I'll look into that. Perhaps there is a setting or an override to tell it to be patient?
This is either a bug in the driver, or a bug in the config of the PXE server.
Thanks,
Andrew Fish
edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
Hi Andrew,
On 5 February 2016 at 18:39, Andrew Fish afish@apple.com wrote:
> This is either a bug in the driver, or a bug in the config of the PXE server.
I doubt it's a bug in the server: negotiation here refers to the NIC training on the network, so the server isn't involved at this stage.
But a bug in the driver is a distinct possibility.
Thanks, Ryan.
Thanks,
Andrew Fish
(4) What the boot order should be can be influenced by the platform BDS lib, in the PlatformBdsPolicyBehavior() function.
Namely, the BdsEntry() function in "MdeModulePkg/Universal/Variable/RuntimeDxe/Variable.c" initializes the "BootOptionList" variable to an empty list. Then it calls PlatformBdsPolicyBehavior(), which takes "BootOptionList" as an input/output parameter -- if it wishes, it can populate it.
In ArmVirtPkg and in OvmfPkg, we perform the following steps in PlatformBdsPolicyBehavior():
(a) connect the console(s)
(b) BdsLibConnectAll()
(c) BdsLibEnumerateAllBootOption (BootOptionList) -- this relies on the presence of all devices, from the previous step. This function (in "IntelFrameworkModulePkg/Library/GenericBdsLib/BdsBoot.c") has extensive documentation in its leading comment.
It will enumerate everything sensible (modifying BootOrder as well I think), and output a BootOptionList that contains all the possible boot options, in a sane order. Sanity means, if I remember correctly, that all options that existed previously and were referenced by BootOrder, retain their positions at the front of the list, and any new auto-detected boot options are tacked to the end.
(d) SetBootOrderFromQemu (BootOptionList) -- this is the really platform specific part for massaging the boot order. We read through BootOptionList -- we don't modify it --, do various calculations, and then rewrite the BootOrder variable. Importantly, all Boot#### variables that become *unreferenced* by BootOrder as a result of this, must be deleted (otherwise they constitute a leak). Again, BootOptionList is not modified.
(e) BdsLibBuildOptionFromVar (BootOptionList, L"BootOrder") -- it rebuilds BootOptionList from the new BootOrder contents. (We are again in PlatformBdsPolicyBehavior(), where BootOptionList counts as input/output.)
On a physical platform, I think you just go with (b) and (c), and then let the user customize the boot order. Next time you boot, (c) will respect that.
Excellent answer, thanks. It looks like (c) is exactly the thing I'm looking for. For example, make HDD boot before USB. That sort of thing.
I'm quite happy that once the default boot order has been set that it stays that way unless the user changes it. I don't (think I) want to customise the boot order after the initial boot.
There are further possibilities; there is a "boot mode" HOB with which your low-level platform code can control your BDS policy, in order to speed up things. See BdsLibGetBootMode() and the macros in "MdePkg/Include/Pi/PiBootMode.h". Those macros are documented in one of the PI spec volumes.
For example, I think BOOT_ASSUMING_NO_CONFIGURATION_CHANGES is meant to be very fast (no need to connect all devices to all drivers), but such a HOB must be produced by your own PEI phase somehow -- you must know for example that the chassis was never opened while the machine was off.
FWIW, OVMF only uses BOOT_WITH_FULL_CONFIGURATION, and BOOT_ON_S3_RESUME, and these two are differentiated in OVMF's PEI phase by reading a CMOS register.
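A rough model of how a BDS policy might branch on the boot mode, written in Python for brevity. The two numeric values are the ones defined in PiBootMode.h; the function name and the decision it makes are assumptions for illustration and do not come from any existing platform code.

```python
# Values as defined in MdePkg/Include/Pi/PiBootMode.h.
BOOT_WITH_FULL_CONFIGURATION = 0x00
BOOT_ASSUMING_NO_CONFIGURATION_CHANGES = 0x02

def needs_connect_all(boot_mode):
    """Hypothetical policy: skip the full connect only when PEI reported no changes."""
    return boot_mode != BOOT_ASSUMING_NO_CONFIGURATION_CHANGES

for mode in (BOOT_WITH_FULL_CONFIGURATION, BOOT_ASSUMING_NO_CONFIGURATION_CHANGES):
    print(hex(mode), "connect all" if needs_connect_all(mode) else "fast path")
```

The fast path is only safe if the PEI phase can really guarantee that nothing changed, which is the caveat raised above.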
Anyway, I think what you need is:
- call BdsLibConnectAll() exactly once
- give that NIC more time (?)
- if you'd like to regenerate all possible boot options *at the end* of BootOrder that the user may have deleted (or have become available by installing new hardware), call BdsLibEnumerateAllBootOption() too.
Yes, that sounds about right. I have concerns about the negotiation timing, but the boot order hacking sounds like what I'm looking for.
Thanks again, Ryan.
Laszlo
Regards, Ryan.
[1] https://git.linaro.org/uefi/linaro-edk2.git/commitdiff/bfbd0ef1a182e1baa120f...
[2] https://git.linaro.org/landing-teams/working/arm/edk2.git/commitdiff/25320ba...
edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel
On Feb 5, 2016, at 10:47 AM, Ryan Harkin ryan.harkin@linaro.org wrote:
Hi Andrew,
On 5 February 2016 at 18:39, Andrew Fish afish@apple.com wrote:
On Feb 5, 2016, at 10:36 AM, Ryan Harkin ryan.harkin@linaro.org wrote:
Hi Laszlo,
On 5 February 2016 at 17:19, Laszlo Ersek lersek@redhat.com wrote:
On 02/05/16 17:35, Ryan Harkin wrote:
I'm assuming that the 2nd call is connecting things that didn't connect the first time. And from that, I suspect/guess that perhaps they didn't connect due to either ordering or timing.
Is there a recommended way to set the order things are connected? Is it even possible to specify dependencies or order? And if so, how do we work out what the order should be?
I cannot give a coherent answer, just a few thoughts.
(1) I think BdsLibConnectAll() actually succeeds the first time as well. All devices are enumerated, all drivers are connected, aren't they? The boot order is a separate question.
Yes, you're right, they are all connected because they all appear in the boot list.
(2) The network, the NIC, or the NIC driver are more probable suspects. If I see right, you always have a misc / misc1 / hd / network sequence of attempts, it's just that on the first few occasions, the network fails. ("Link timeout in auto-negotiation".)
Correct.
(3) I think repeated BdsLibConnectAll() calls may only give more time to the NIC to bring itself into working shape. What if you keep only one BdsLibConnectAll(), and replace the second BdsLibConnectAll() with a sizeable gBS->Stall()?
Eureka! I replace the 2nd BdsLibConnectAll() with "gBS->Stall(500000);" (0.5 seconds) and this works every time also.
So time to negociate (sic) would seem like the culprit. I suppose a 2nd BdsLibConnectAll() buys the NIC some time.
I'm left wondering if the "Boot EFI Network" option should actually be waiting for negotiation, however. I'm sure it's common on first boot that the network needs a little time to negotiate. I'll look into that. Perhaps there is a setting or an override to tell it to be patient?
This is either a bug in the driver, or a bug in the config of the PXE server.
I doubt it's a bug in the server: negotiation here refers to the NIC training on the network, so the server isn't involved at this stage.
OK I was thinking the PXE Server could have been configured with timeout values that were too short or something like that.
But a bug in the driver is a distinct possibility.
If adding gBS->Stall() fixes your issue it IS a bug in the driver.
Thanks,
Andrew Fish
Thanks, Ryan.
Thanks,
Andrew Fish
(4) What the boot order should be can be influenced by the platform BDS lib, in the PlatformBdsPolicyBehavior() function.
Namely, the BdsEntry() function in "IntelFrameworkModulePkg/Universal/BdsDxe/BdsEntry.c" initializes the "BootOptionList" variable to an empty list. Then it calls PlatformBdsPolicyBehavior(), which takes "BootOptionList" as an input/output parameter -- if it wishes, it can populate it.
In ArmVirtPkg and in OvmfPkg, we perform the following steps in PlatformBdsPolicyBehavior():
(a) connect the console(s)
(b) BdsLibConnectAll()
(c) BdsLibEnumerateAllBootOption (BootOptionList) -- this relies on the presence of all devices, from the previous step. This function (in "IntelFrameworkModulePkg/Library/GenericBdsLib/BdsBoot.c") has extensive documentation in its leading comment.
It will enumerate everything sensible (modifying BootOrder as well I think), and output a BootOptionList that contains all the possible boot options, in a sane order. Sanity means, if I remember correctly, that all options that existed previously and were referenced by BootOrder, retain their positions at the front of the list, and any new auto-detected boot options are tacked to the end.
(d) SetBootOrderFromQemu (BootOptionList) -- this is the really platform specific part for massaging the boot order. We read through BootOptionList -- we don't modify it --, do various calculations, and then rewrite the BootOrder variable. Importantly, all Boot#### variables that become *unreferenced* by BootOrder as a result of this, must be deleted (otherwise they constitute a leak). Again, BootOptionList is not modified.
(e) BdsLibBuildOptionFromVar (BootOptionList, L"BootOrder") -- it rebuilds BootOptionList from the new BootOrder contents. (We are again in PlatformBdsPolicyBehavior(), where BootOptionList counts as input/output.)
On a physical platform, I think you just go with (b) and (c), and then let the user customize the boot order. Next time you boot, (c) will respect that.
Excellent answer, thanks. It looks like (c) is exactly the thing I'm looking for. For example, make HDD boot before USB. That sort of thing.
I'm quite happy that once the default boot order has been set that it stays that way unless the user changes it. I don't (think I) want to customise the boot order after the initial boot.
There are further possibilities; there is a "boot mode" HOB with which your low-level platform code can control your BDS policy, in order to speed up things. See BdsLibGetBootMode() and the macros in "MdePkg/Include/Pi/PiBootMode.h". Those macros are documented in one of the PI spec volumes.
For example, I think BOOT_ASSUMING_NO_CONFIGURATION_CHANGES is meant to be very fast (no need to connect all devices to all drivers), but such a HOB must be produced by your own PEI phase somehow -- you must know for example that the chassis was never opened while the machine was off.
FWIW, OVMF only uses BOOT_WITH_FULL_CONFIGURATION, and BOOT_ON_S3_RESUME, and these two are differentiated in OVMF's PEI phase by reading a CMOS register.
Anyway, I think what you need is:
- call BdsLibConnectAll() exactly once
- give that NIC more time (?)
- if you'd like to regenerate all possible boot options *at the end* of BootOrder that the user may have deleted (or have become available by installing new hardware), call BdsLibEnumerateAllBootOption() too.
Yes, that sounds about right. I have concerns about the negotiation timing, but the boot order hacking sounds like what I'm looking for.
Thanks again, Ryan.
Laszlo
Regards, Ryan.
[1] https://git.linaro.org/uefi/linaro-edk2.git/commitdiff/bfbd0ef1a182e1baa120f...
[2] https://git.linaro.org/landing-teams/working/arm/edk2.git/commitdiff/25320ba...
On 02/05/16 19:36, Ryan Harkin wrote:
I'm left wondering if the "Boot EFI Network" option should actually be waiting for negotiation, however. I'm sure it's common on first boot that the network needs a little time to negotiate. I'll look into that. Perhaps there is a setting or an override to tell it to be patient?
AutoNegotiate() in "EmbeddedPkg/Drivers/Lan9118Dxe/Lan9118DxeUtil.c" uses a fixed timeout of 2000 * LAN9118_STALL (where LAN9118_STALL is 2 microseconds). LAN9118_STALL seems extremely short, but even if it is correct as a polling interval for the NIC, the 2000 should be made a PCD, probably.
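To put numbers on this, here is a small Python model of a bounded polling loop using the figures quoted above. It is a sketch, not the Lan9118Dxe code: the loop shape, the simulated link and the 300 ms link-up time are assumptions, and "retries" stands in for the value that is suggested as a PCD.

```python
LAN9118_STALL_US = 2    # polling interval quoted above, in microseconds
DEFAULT_RETRIES = 2000  # fixed retry count quoted above

def negotiation_budget_us(retries=DEFAULT_RETRIES, stall_us=LAN9118_STALL_US):
    """Total time the loop is willing to wait for the link."""
    return retries * stall_us

def auto_negotiate(link_up_after_us, retries=DEFAULT_RETRIES, stall_us=LAN9118_STALL_US):
    """Poll a simulated link; return True if it comes up within the retry budget."""
    elapsed_us = 0
    for _ in range(retries):
        if elapsed_us >= link_up_after_us:
            return True
        elapsed_us += stall_us  # stands in for a stall of stall_us microseconds
    return False

print(negotiation_budget_us())                   # 4000 microseconds, i.e. 4 ms
print(auto_negotiate(300_000))                   # hypothetical 300 ms link-up: times out
print(auto_negotiate(300_000, retries=200_000))  # larger budget: link comes up
```

The default budget works out to 4 ms, while the gBS->Stall(500000) that made the boot succeed is 125 times longer, so a link that needs a few hundred milliseconds cannot come up inside the default window.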
[snip]
Thanks Laszlo
Hi Laszlo,
On 5 February 2016 at 18:50, Laszlo Ersek lersek@redhat.com wrote:
AutoNegotiate() in "EmbeddedPkg/Drivers/Lan9118Dxe/Lan9118DxeUtil.c" uses a fixed timeout of 2000 * LAN9118_STALL (where LAN9118_STALL is 2 microseconds). LAN9118_STALL seems extremely short, but even if it is correct as a polling interval for the NIC, the 2000 should be made a PCD, probably.
Yeah, that looks like a bad idea! I'll investigate on Monday.
Thanks, Ryan.
[snip]
Thanks Laszlo
On 5 February 2016 at 18:50, Laszlo Ersek lersek@redhat.com wrote:
AutoNegotiate() in "EmbeddedPkg/Drivers/Lan9118Dxe/Lan9118DxeUtil.c" uses a fixed timeout of 2000 * LAN9118_STALL (where LAN9118_STALL is 2 microseconds). LAN9118_STALL seems extremely short, but even if it is correct as a polling interval for the NIC, the 2000 should be made a PCD, probably.
As you'll see, I've just submitted a patch to convert the value to a PCD, keeping the same default value.
If I use this patch and another patch in the Juno .dsc file to set the PCD to 40000, then networking works just fine. I don't need any extra stalls in the platform code.
I've spent all day trying to test the same patches on Versatile Express TC2, but I've now discovered that Ethernet doesn't work on my TC2 at all, and hasn't since the driver was first submitted. And it was submitted specifically for TC2. Quality.
On 8 February 2016 at 18:12, Ryan Harkin ryan.harkin@linaro.org wrote:
I've spent all day trying to test the same patches on Versatile Express TC2, but I've now discovered that ethernet doesn't even work on my TC2 even from the very time the driver was submitted. And it was submitted specifically for TC2. Quality.
Just an FYI on this TC2 issue:
LAN9118 works on release builds but not on debug builds.
I've traced back to the original "known good" code from before the LAN9118 driver was submitted and it shows the same symptoms: debug bad, release good.
Either way, to get TC2 to auto-negotiate, I have to set the new PCD timeout to 10 times higher than the one on Juno. Currently I'm using 400000. But transmit still fails no matter what.
Of course, the problems could have nothing to do with LAN9118. To eliminate the networking stack, I tried with the FVP models and they work under release and debug, although debug gives repeated errors:
LAN91x: SnpTransmit(): TxQueue insert failure.
So nothing conclusive there. More work is needed.
If I duplicate the call to BdsLibConnectAll() [2], then boot works as expected. On first boot, the boot order is created correctly and EFI Network pulls down a file and boots it.
I have seen components that have asynchronous initialization characteristics, meaning that they return from the Driver Binding Start() call but keep processing in callbacks. If this processing eventually creates new handles (USB bus enumeration comes to mind), then the asynchronous processing is effectively racing the driver connection process (others feel free to jump in and disagree with this assertion). I have no idea if this is your issue but wanted to raise it as a possibility for completeness.
I've been thinking about proposing an enhancement to the spec to cover this case but haven't been motivated enough yet.
Eugene
On 02/05/16 18:56, Cohen, Eugene wrote:
I've been thinking about proposing an enhancement to the spec to cover this case but haven't been motivated enough yet.
Perhaps install a NULL protocol interface (with a new GUID), similar to gEfiPciEnumerationCompleteProtocolGuid?
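As a sketch of that idea, the following Python model shows a consumer waiting for a "ready" marker instead of stalling for a fixed time. The class, the GUID string and the callback are all hypothetical; the model only loosely mirrors protocol notification and, unlike the real boot services, fires a late registration immediately.

```python
class ProtocolDatabase:
    """Toy model of protocol installation with notification callbacks."""

    def __init__(self):
        self._installed = set()
        self._waiting = {}

    def register_notify(self, guid, callback):
        # Simplification: fire at once if the marker is already installed.
        if guid in self._installed:
            callback()
        else:
            self._waiting.setdefault(guid, []).append(callback)

    def install(self, guid):
        # Installing the marker (a NULL interface in the proposal) wakes all waiters.
        self._installed.add(guid)
        for callback in self._waiting.pop(guid, []):
            callback()

# Hypothetical GUID name, not defined anywhere in edk2.
NIC_INIT_COMPLETE_GUID = "hypothetical-nic-init-complete-guid"

db = ProtocolDatabase()
events = []
db.register_notify(NIC_INIT_COMPLETE_GUID, lambda: events.append("boot EFI Network"))
print(events)                       # nothing yet, driver still initialising
db.install(NIC_INIT_COMPLETE_GUID)  # driver signals its asynchronous work is done
print(events)
```

The point of the design is that the waiting side is released as soon as the driver reports completion, rather than after a delay that has to be tuned per board.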
Thanks Laszlo
On Feb 5, 2016, at 9:56 AM, Cohen, Eugene eugene@hp.com wrote:
I have seen components that have asynchronous initialization characteristics, meaning that they return from the Driver Binding Start() call but keep processing in callbacks. If this processing eventually creates new handles (USB bus enumeration comes to mind), then the asynchronous processing is effectively racing the driver connection process (others feel free to jump in and disagree with this assertion). I have no idea if this is your issue but wanted to raise it as a possibility for completeness.
Well USB is different since it supports hot-plug. The USB bus driver is doing the gBS->ConnectController() on the children as they are discovered, but the Start() should connect the current topology.
If drivers are picking and choosing what to connect, that seems like a bug in the driver. The architecture is that the platform passes a remaining device path to the bus driver as a hint about the only device that NEEDS to be connected, and it is a driver implementation choice whether it enumerates just that device or the entire bus. There is no concept of the driver choosing; that is non-conformant driver behavior.
In general connecting all drivers on boot is a performance bug.
How the BDS configures boot options is just an implementation choice, so you can modify it to be whatever your platform needs.
Thanks,
Andrew Fish
I've been thinking about proposing an enhancement to the spec to cover this case but haven't been motivated enough yet.
Eugene
On 02/05/16 19:26, Andrew Fish wrote:
[snip]
In general connecting all drivers on boot is a performance bug.
I agree in general, but for a QEMU VM specifically, bootable devices can show up and go away at every boot, at wildly different addresses, and there's no way to know about such changes within the firmware (so that we could set the boot mode HOB accordingly). A full enumeration is not super-fast, but it works correctly.
[snip]
Thanks Laszlo
Hi Eugene,
On 5 February 2016 at 17:56, Cohen, Eugene eugene@hp.com wrote:
If I duplicate the call to BdsLibConnectAll() [2], then boot works as expected. On first boot, the boot order is created correctly and EFI Network pulls down a file and boots it.
I have seen components that have asynchronous initialization characteristics, meaning that they return from the Driver Binding Start() call but keep processing in callbacks. If this processing eventually creates new handles (USB bus enumeration comes to mind), then the asynchronous processing is effectively racing the driver connection process (others, feel free to jump in and disagree with this assertion). I have no idea if this is your issue, but I wanted to raise it as a possibility for completeness.
That could be part of the problem.
Laszlo has proposed that I add a gBS->Stall() to prevent the problem, and that works. I'm uncertain about how the auto-negotiation should be handled; it sounds like something asynchronous. A delay is simple, but my gut feeling is that it's never the "right" solution.
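One common alternative to a fixed delay is to poll for the condition with a bounded timeout, so that the wait ends as soon as the link comes up. The sketch below shows that pattern in plain C with the hardware abstracted behind function pointers. Everything here is hypothetical (`WaitForLink()`, `FAKE_PHY`, and the callbacks); it is not taken from the LAN9118 driver, and whether polling fits that driver's auto-negotiation path is an assumption. In real firmware the stall callback would presumably wrap gBS->Stall().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef bool (*LINK_UP_FN)(void *Context);
typedef void (*STALL_FN)(unsigned Microseconds, void *Context);

/*
 * Poll IsLinkUp() every PollUs microseconds until it reports true or
 * TimeoutUs has elapsed. Returns true if the link came up in time.
 * WaitedUs (optional) receives the total time spent stalling.
 */
static bool WaitForLink(LINK_UP_FN IsLinkUp, STALL_FN Stall, void *Context,
                        unsigned PollUs, unsigned TimeoutUs,
                        unsigned *WaitedUs) {
  assert(IsLinkUp != NULL && Stall != NULL && PollUs > 0);

  unsigned Elapsed = 0;
  bool     Up      = IsLinkUp(Context);
  while (!Up && Elapsed < TimeoutUs) {
    Stall(PollUs, Context);
    Elapsed += PollUs;
    Up = IsLinkUp(Context);
  }
  if (WaitedUs != NULL) {
    *WaitedUs = Elapsed;
  }
  return Up;
}

/* Fake PHY for demonstration: the link comes up at a fixed point in time. */
typedef struct {
  unsigned NowUs;
  unsigned LinkUpAtUs;
} FAKE_PHY;

static bool FakeLinkUp(void *Context) {
  const FAKE_PHY *Phy = Context;
  return Phy->NowUs >= Phy->LinkUpAtUs;
}

static void FakeStall(unsigned Microseconds, void *Context) {
  FAKE_PHY *Phy = Context;
  Phy->NowUs += Microseconds;  /* advance simulated time instead of sleeping */
}
```

Compared with a single fixed stall, the worst case is the same (the full timeout), but the common case returns after only as many poll intervals as the link actually needs.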
Thanks, Ryan.
I've been thinking about proposing an enhancement to the spec to cover this case but haven't been motivated enough yet.
Eugene