Hi,
I have been working on the performance of the PCI controller (root bridge) driver for my ARMv8 platform. I have integrated my PciHostBridgeDxe code with MdeModulePkg/Bus/Pci/PciBusDxe, following the PCI host bridge resource allocation and root bridge I/O protocols as used in other existing PCI root bridge drivers.
My concern is this:
TFTP over the PCI interface is painfully slow. It takes around 10 minutes to transfer a 30 MB file through a PCI NIC using TFTP.
Has anyone else observed slow network transfers over the PCI interface?
I couldn't find any bottleneck in the PCI root bridge driver (it uses ArmDmaLib for Allocate, Free, Map and Unmap), nor does the PciBus driver seem to have one.
Could the slowness come from the network stack (SNP, MNP and the other protocols), the E1000 driver code, or the TFTP command code?
Any pointers would be really helpful!
Thanks in advance for your time!
Best regards,
Shaveta
On 14 July 2016 at 15:05, Shaveta Leekha shaveta.leekha@nxp.com wrote:
TFTP over the PCI interface is painfully slow. It takes around 10 minutes to transfer a 30 MB file through a PCI NIC using TFTP.
How much time does it take on other hardware? Did you try it on your PC?
Hi Ard,
TFTP has been tested under U-Boot and Linux on the same platform/board (the hardware on which TFTP is being tested under UEFI).
There it takes around 1 minute (or even less) to fetch this big file (an initrd image of around 30 MB) over PCI using TFTP.
So the issue is not with the hardware; it is somewhere in UEFI (PCI + the network stack on top of it + the E1000 card driver + the TFTP implementation).
I have not been able to pin it down yet.
Thanks and Regards, Shaveta
However, I have not tested the UEFI code on any other hardware or board, as I have not yet ported the PCI code to another board.
Regards, Shaveta
On 14 July 2016 at 15:29, Shaveta Leekha shaveta.leekha@nxp.com wrote:
However, I have not tested the UEFI code on any other hardware or board, as I have not yet ported the PCI code to another board.
I would recommend basing your expectations not on U-Boot but on UEFI running on a different architecture with similar network hardware.
OK, I can try that!
Thanks and Regards, Shaveta
I've been down this road before...
Network performance (on non-coherent DMA architectures) can be affected by:
1. Excessive double buffering caused by unaligned buffers (PCI BusMasterRead / BusMasterWrite cases)
2. Excessive accesses to uncached buffers (like PCI common buffer cases)
3. Packet loss due to the lack of interrupts in UEFI, I mean, due to a network polling rate that is too slow (look at the MNP poll and UEFI tick periods)
You should be able to get far better performance than 3MB/min!
Eugene
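To put rough numbers on point 3: TFTP is a lock-step protocol, so if every data block has to wait for the next receive poll, the poll period caps throughput regardless of link speed. The sketch below is only an illustration. The helper name `lockstep_transfer_us` is hypothetical, and the 10 ms poll period used in the note after it is an assumption, not a value measured on this platform; 512 bytes is the default TFTP block size when no larger size is negotiated.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Lower bound on transfer time, in microseconds, when each lock-step block
 * costs at least one receive-poll period. All inputs are illustrative
 * assumptions, not measurements.
 */
static uint64_t lockstep_transfer_us(uint64_t total_bytes,
                                     uint32_t block_bytes,
                                     uint32_t poll_period_us)
{
    assert(block_bytes > 0);
    /* Round up: a final partial block still waits for one poll. */
    uint64_t blocks = (total_bytes + block_bytes - 1) / block_bytes;
    return blocks * poll_period_us;
}
```

Under these assumptions, `lockstep_transfer_us(30ULL * 1024 * 1024, 512, 10000)` gives 61440 blocks at 10 ms each, about 614 seconds, which is in the same range as the 10 minutes reported. Doubling the block size or halving the poll period halves the bound.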
You are right, Eugene!
Performance is affected by all of these factors, but it is true that I should be getting a better figure than this. That is why I am trying to identify possible optimizations in the code that may help improve it.
Have you tried any optimizations to get better performance in the UEFI code?
Thanks and Regards, Shaveta
Shaveta,
Have you tried any optimizations to get better performance in the UEFI code?
I think the question you should be asking is how to measure how the current code is performing; this is a matter of tools and methodology. With ARM there are all sorts of options, from JTAG debuggers that can sample execution to full ETM/PTM trace that can show the program flow. So first figure out where the time is being spent during your network transfers using your favorite debug tools; with that data you will know where to focus.
Eugene
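Where a JTAG or trace setup is not at hand, a coarse software alternative is to accumulate elapsed counter ticks per code region and see which region dominates. The following is a minimal sketch of that idea, not an existing EDK II facility: the `profile_*` names and the phase list are hypothetical, regions may not be nested, and the tick source is passed in so that on target it could wrap a free-running counter (for example TimerLib's `GetPerformanceCounter()`), while the stand-in `fake_now` lets it be exercised on a host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Caller-supplied free-running counter (hypothetical abstraction). */
typedef uint64_t (*tick_source_fn)(void);

/* Illustrative regions of interest; adjust to the code under test. */
enum phase { PHASE_DMA_MAP, PHASE_NIC_POLL, PHASE_NET_STACK, PHASE_COUNT };

struct profile {
    tick_source_fn now;
    uint64_t start;               /* tick value at the last profile_begin() */
    uint64_t total[PHASE_COUNT];  /* accumulated ticks per phase */
};

static void profile_init(struct profile *p, tick_source_fn now)
{
    memset(p, 0, sizeof *p);
    p->now = now;
}

/* Mark the start of a region. */
static void profile_begin(struct profile *p)
{
    p->start = p->now();
}

/* Charge the ticks elapsed since profile_begin() to the given phase. */
static void profile_end(struct profile *p, enum phase ph)
{
    assert((int)ph >= 0 && (int)ph < PHASE_COUNT);
    p->total[ph] += p->now() - p->start;
}

/* Return the phase with the largest accumulated tick count. */
static enum phase profile_hottest(const struct profile *p)
{
    enum phase best = PHASE_DMA_MAP;
    for (int i = 1; i < PHASE_COUNT; i++) {
        if (p->total[i] > p->total[best]) {
            best = (enum phase)i;
        }
    }
    return best;
}

/* Deterministic stand-in counter so the sketch can be run off-target. */
static uint64_t fake_ticks;
static uint64_t fake_now(void) { return fake_ticks; }
```

Wrapping the DMA map/unmap calls, the NIC receive poll, and the protocol processing with `profile_begin()` / `profile_end()` and printing `total[]` after a transfer would give a first indication of where to look more closely.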
Of all the gin joints in all the towns in all the world, Cohen, Eugene had to walk into mine at 05:32:47 on Friday 15 July 2016 and say:
Shaveta,
Have you tried any optimizations to get better performance in the UEFI code?
If I may ask, what device driver code are you using? I think earlier you mentioned you're using an Intel E1000 device. Is that correct? If so, which driver are you using with it?
I'm guessing the ARMv8 platform you're using is a Freescale/NXP Layerscape processor. Do you know if it implements bus snooping for PCI devices? One of my biggest beefs with ARM-based SoCs is that you don't always have hardware cache coherence enforced for I/O peripherals. You usually do have coherence between cores, and you might have it with some devices, but not all of them. PCI is one that often gets shortchanged.
If you don't have PCI snooping, that means you have to enforce cache coherence in software in your driver, and it can be tricky to get right. If you use cached buffers for RX packet data for example, you have to do a cache invalidate on the buffer both before and after the DMA transfer in order to fully avoid any cache effects. (There was a long discussion on one of the Linux development forums about these topics on ARM, but unfortunately I don't have the link handy.)
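A toy model can make the "invalidate before and after" ordering concrete. The sketch below simulates a single cache line in front of a single line of RAM; it is not real cache maintenance code, and every name in it (`toy_line`, `cpu_read`, `evict`, and so on) as well as the 64-byte line size is an assumption made for illustration. Skipping the invalidate after the DMA leaves the CPU reading a stale cached copy; skipping the one before the DMA leaves a dirty line that a later eviction can write back over the received data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64u  /* assumed cache line size */

/* One cache line shadowing one line of RAM. */
struct toy_line {
    uint8_t ram[LINE_SIZE];
    uint8_t cache[LINE_SIZE];
    int valid;  /* cache holds a copy of the line */
    int dirty;  /* cached copy is newer than RAM */
};

/* CPU read: fill the line from RAM on a miss, then read the cached copy. */
static uint8_t cpu_read(struct toy_line *l, unsigned i)
{
    assert(i < LINE_SIZE);
    if (!l->valid) {
        memcpy(l->cache, l->ram, LINE_SIZE);
        l->valid = 1;
        l->dirty = 0;
    }
    return l->cache[i];
}

/* CPU write: allocate the line, update the cached copy, mark it dirty. */
static void cpu_write(struct toy_line *l, unsigned i, uint8_t v)
{
    (void)cpu_read(l, i);
    l->cache[i] = v;
    l->dirty = 1;
}

/* Eviction can happen at any time; a dirty line is written back to RAM. */
static void evict(struct toy_line *l)
{
    if (l->valid && l->dirty) {
        memcpy(l->ram, l->cache, LINE_SIZE);
    }
    l->valid = 0;
    l->dirty = 0;
}

/* Invalidate discards the cached copy without writing it back. */
static void invalidate(struct toy_line *l)
{
    l->valid = 0;
    l->dirty = 0;
}

/* Non-snooped device DMA writes RAM only and never touches the cache. */
static void dma_write(struct toy_line *l, uint8_t v)
{
    memset(l->ram, v, LINE_SIZE);
}
```

In this model the safe RX sequence is `invalidate()`, `dma_write()`, `invalidate()`, `cpu_read()`. Dropping either invalidate produces one of the two failure modes described above.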
Also, when allocating DMA descriptor rings, you may need to mark them as uncached. The DMA descriptors are often smaller than a single cache line and yet must be allocated contiguously (at least for the E1000 hardware), so you can't flush/invalidate them individually.
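The descriptor-size point can be checked with a little arithmetic. The sketch below assumes a 16-byte descriptor (the size of an E1000 legacy RX/TX descriptor), a 64-byte cache line, and a ring whose base address is cache-line aligned; the line size in particular is platform dependent, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DESC_SIZE  16u  /* E1000 legacy descriptor size in bytes */
#define CACHE_LINE 64u  /* assumed cache line size; check your CPU */

/* Number of descriptors packed into one cache line. */
static size_t descs_per_line(void)
{
    assert(CACHE_LINE % DESC_SIZE == 0);
    return CACHE_LINE / DESC_SIZE;
}

/* Index of the cache line holding a descriptor, counted from the ring base
   (the base is assumed to be cache-line aligned). */
static size_t desc_cache_line(size_t index)
{
    return (index * DESC_SIZE) / CACHE_LINE;
}

/* Nonzero if cache maintenance on descriptor a also affects descriptor b. */
static int share_cache_line(size_t a, size_t b)
{
    return desc_cache_line(a) == desc_cache_line(b);
}
```

With these values four descriptors share each line, so flushing or invalidating descriptor 0 necessarily also flushes or invalidates descriptors 1 to 3, including any status the device may have written to them in the meantime.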
And depending on the circumstances, you may be forced to use the Device memory attribute for those uncached allocations instead of Normal or Strongly Ordered in order to truly defeat all cache effects (for both the L1 and L2 caches). It is not enough, for example, to just specify Normal memory and inner+outer uncached. (We found that at least with the ARM Cortex-A9 processors with an L2 cache controller, the L2 cache would still buffer writes unless you used Device memory.)
If you don't get it right, then you may observe what seems like inconsistent hardware behavior which may manifest as long traffic stalls (among other things). It might pay to use Wireshark to monitor the traffic pattern between the UEFI test system and the TFTP host to see if there are a lot of retries.
Since the IA32 and X64 platforms always do PCI bus snooping, drivers originally written for those platforms may rely on hardware cache coherence and will not work correctly on other platforms without modifications. I must confess I'm not familiar enough with the UEFI driver model to know exactly how these issues are supposed to be handled, but if UEFI is running with the caches enabled, then they have to be handled somehow.
-Bill
edk2-devel mailing list edk2-devel@lists.01.org https://lists.01.org/mailman/listinfo/edk2-devel