Of all the gin joints in all the towns in all the world, Cohen, Eugene had to walk into mine at 05:32:47 on Friday 15 July 2016 and say:
Shaveta,
Have you tried some optimizations to get better performance in UEFI code?
If I may ask, what device driver code are you using? I think earlier you mentioned you're using an Intel E1000 device. Is that correct? If so, which driver are you using with it?
I'm guessing the ARMv8 platform you're using is a Freescale/NXP Layerscape processor. Do you know if it implements bus snooping for PCI devices? One of my biggest beefs with ARM-based SoCs is that you don't always have hardware cache coherence enforced for I/O peripherals. You usually do have coherence between cores, and you might have it with some devices, but not all of them. PCI is one that often gets shortchanged.
If you don't have PCI snooping, that means you have to enforce cache coherence in software in your driver, and it can be tricky to get right. If you use cached buffers for RX packet data for example, you have to do a cache invalidate on the buffer both before and after the DMA transfer in order to fully avoid any cache effects. (There was a long discussion on one of the Linux development forums about these topics on ARM, but unfortunately I don't have the link handy.)
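Roughly what I mean, as a sketch only (InvalidateDataCacheRange() is the CacheMaintenanceLib-style helper; the descriptor/packet helpers are made-up names for illustration):

  // RX buffer handling on a non-coherent platform (illustrative sketch).
  // 1) Invalidate BEFORE handing the buffer to the device, so no dirty
  //    lines can be evicted on top of the incoming DMA data.
  InvalidateDataCacheRange (RxBuffer, RX_BUFFER_SIZE);
  ProgramRxDescriptor (Ring, Index, RxBuffer);        // hypothetical helper

  // ... device DMAs the packet in and reports completion ...

  // 2) Invalidate AGAIN before the CPU reads the data, so any lines
  //    speculatively fetched during the transfer are discarded.
  InvalidateDataCacheRange (RxBuffer, ReceivedLength);
  ProcessPacket (RxBuffer, ReceivedLength);           // hypothetical helper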
Also, when allocating DMA descriptor rings, you may need to mark them as uncached. The DMA descriptors are often smaller than a single cache line and yet must be allocated contiguously (at least for the E1000 hardware), so you can't flush/invalidate them individually.
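In UEFI terms, the usual way to get that is to let EFI_PCI_IO_PROTOCOL hand you a common-buffer mapping instead of carving the ring out of ordinary cached pool memory. A rough sketch (RING_SIZE is illustrative and error handling is omitted):

  EFI_STATUS            Status;
  VOID                  *Ring;
  UINTN                 Bytes;
  EFI_PHYSICAL_ADDRESS  DeviceAddress;
  VOID                  *Mapping;

  // Let the platform's PciIo/DMA layer choose the memory attributes for
  // the descriptor ring (uncached or remapped on non-coherent SoCs,
  // plain cached memory on snooping ones).
  Status = PciIo->AllocateBuffer (
                    PciIo,
                    AllocateAnyPages,
                    EfiBootServicesData,
                    EFI_SIZE_TO_PAGES (RING_SIZE),    // RING_SIZE: illustrative
                    &Ring,
                    0
                    );

  Bytes  = RING_SIZE;
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterCommonBuffer,
                    Ring,
                    &Bytes,
                    &DeviceAddress,
                    &Mapping
                    );

  // Program DeviceAddress (not the CPU-side pointer) into the NIC's
  // descriptor base registers, e.g. RDBAL/RDBAH on the E1000.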
And depending on the circumstances, you may be forced to use the Device memory attribute for those uncached allocations instead of Normal or Strongly Ordered in order to truly defeat all cache effects (for both the L1 and L2 caches). It is not enough for example to just specify Normal memory and inner+outer uncached. (We found that at least with the ARM A9 processors with L2 cache controller, the L2 cache would still buffer writes, unless you used Device memory.)
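If you do end up forcing the attribute yourself, the DXE GCD services are one way to do it; on ARM platforms EFI_MEMORY_UC is generally translated to Device memory while EFI_MEMORY_WC ends up as Normal non-cacheable. Sketch only, with placeholder address/size:

  #include <Library/DxeServicesTableLib.h>   // for gDS

  // Illustrative: remap an already-allocated, page-aligned ring region as
  // Device memory. Any stale lines for this region should be cleaned/
  // evicted from the caches before the device starts using it.
  Status = gDS->SetMemorySpaceAttributes (
                  RingPhysicalAddress,                   // placeholder
                  ALIGN_VALUE (RING_SIZE, EFI_PAGE_SIZE),
                  EFI_MEMORY_UC
                  );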
If you don't get it right, then you may observe what seems like inconsistent hardware behavior which may manifest as long traffic stalls (among other things). It might pay to use Wireshark to monitor the traffic pattern between the UEFI test system and the TFTP host to see if there are a lot of retries.
Since the IA32 and X64 platforms always do PCI bus snooping, drivers originally written for those platforms may rely on hardware cache coherence and will not work correctly on other platforms without modifications. I must confess I'm not familiar enough with the UEFI driver model to know exactly how these issues are supposed to be handled, but if UEFI is running with the caches enabled, then they have to be handled somehow.
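As far as I can tell, the hook the driver model does give you is the EFI_PCI_IO_PROTOCOL Map()/Unmap() pair: on a snooping platform they can be close to no-ops, while on a non-coherent one the platform's implementation is where the cache maintenance (or bounce buffering) is supposed to live. A driver that skips Map() and hands the device raw CPU addresses will only work where the bus snoops. Roughly, for the TX side:

  UINTN                 Length = PacketLength;
  EFI_PHYSICAL_ADDRESS  DeviceAddress;
  VOID                  *Mapping;

  // BusMasterRead: the device reads the packet from system memory, so the
  // platform Map() implementation cleans/flushes (or bounce-buffers) it.
  Status = PciIo->Map (
                    PciIo,
                    EfiPciIoOperationBusMasterRead,
                    Packet,
                    &Length,
                    &DeviceAddress,
                    &Mapping
                    );

  QueueTxDescriptor (Ring, DeviceAddress, Length);    // hypothetical helper

  // ... wait for the transmit-complete status from the device ...

  PciIo->Unmap (PciIo, Mapping);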
-Bill
I think the question you should be asking is how to measure how the current code is performing; this is a tools and methodology question. With ARM there are all sorts of options, from JTAG debuggers that can sample execution to full-up ETM/PTM trace that can show the flow. So first figure out where time is being spent during your network transfers using your favorite debug tools, and with that data you will know where to focus.
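Even without a trace probe you can get a first-order picture by wrapping the suspect paths with TimerLib counters. A crude sketch (assumes DEBUG output is enabled on your build):

  #include <Library/TimerLib.h>
  #include <Library/DebugLib.h>

  UINT64  Start, End, Ticks;

  Start = GetPerformanceCounter ();
  // ... region under test: one poll cycle, one RX completion, etc. ...
  End = GetPerformanceCounter ();

  // The performance counter may count up or down depending on the platform.
  Ticks = (End > Start) ? (End - Start) : (Start - End);
  DEBUG ((DEBUG_INFO, "region took %lu ns\n", GetTimeInNanoSecond (Ticks)));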
Eugene
-----Original Message-----
From: Shaveta Leekha [mailto:shaveta.leekha@nxp.com]
Sent: Thursday, July 14, 2016 10:03 PM
To: Cohen, Eugene <eugene@hp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: edk2-devel@lists.01.org; Linaro UEFI Mailman List <linaro-uefi@lists.linaro.org>
Subject: RE: PCI performance issue
You are right Eugene!
Performance is being impacted by all these factors, and it's true that I should be getting better performance figures than this. That's why I am trying to figure out possible code optimizations that may help improve it.
Have you tried some optimizations to get better performance in UEFI code?
Thanks and Regards, Shaveta
-----Original Message-----
From: Cohen, Eugene [mailto:eugene@hp.com]
Sent: Thursday, July 14, 2016 7:29 PM
To: Shaveta Leekha <shaveta.leekha@nxp.com>; Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: edk2-devel@lists.01.org; Linaro UEFI Mailman List <linaro-uefi@lists.linaro.org>
Subject: RE: PCI performance issue
I've been down this road before...
Network performance (on non-coherent DMA architectures) can be affected by:
1. Excessive double buffering caused by unaligned buffers (PCI BusMasterRead / BusMasterWrite cases)
2. Excessive accesses to uncached buffers (like PCI common buffer cases)
3. Packet loss due to the lack of interrupts in UEFI, i.e. a network polling rate that is too slow (look at the MNP poll and UEFI tick periods; a quick SNP polling check is sketched below)
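One quick way to separate raw driver throughput from MNP's poll rate is to bypass MNP and poll EFI_SIMPLE_NETWORK_PROTOCOL directly for a while, counting bytes. Illustrative sketch: it assumes Snp is already initialized and a sender is blasting packets at the board, and TenSecondsElapsed() is a hypothetical helper for the test interval.

  UINT8       Buffer[2048];
  UINTN       BufferSize;
  UINT64      TotalBytes = 0;
  EFI_STATUS  Status;

  // Spin on Snp->Receive() and tally the bytes. If this raw loop is fast
  // but MNP/TFTP is slow, the polling rate is the bottleneck; if this is
  // slow too, look at the driver/DMA path instead.
  while (!TenSecondsElapsed ()) {                      // hypothetical helper
    BufferSize = sizeof (Buffer);
    Status = Snp->Receive (Snp, NULL, &BufferSize, Buffer, NULL, NULL, NULL);
    if (Status == EFI_SUCCESS) {
      TotalBytes += BufferSize;
    } else if (Status != EFI_NOT_READY) {
      break;                                           // real error, not "no packet yet"
    }
  }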
You should be able to get far better performance than 3MB/min!
Eugene
-----Original Message-----
From: edk2-devel [mailto:edk2-devel-bounces@lists.01.org] On Behalf Of Shaveta Leekha
Sent: Thursday, July 14, 2016 7:45 AM
To: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: edk2-devel@lists.01.org; Linaro UEFI Mailman List <linaro-uefi@lists.linaro.org>
Subject: Re: [edk2] PCI performance issue
Ok, I can try that!
Thanks and Regards, Shaveta
-----Original Message-----
From: Ard Biesheuvel [mailto:ard.biesheuvel@linaro.org]
Sent: Thursday, July 14, 2016 7:11 PM
To: Shaveta Leekha <shaveta.leekha@nxp.com>
Cc: edk2-devel@lists.01.org; Linaro UEFI Mailman List <linaro-uefi@lists.linaro.org>
Subject: Re: PCI performance issue
On 14 July 2016 at 15:29, Shaveta Leekha <shaveta.leekha@nxp.com> wrote:
> But I have not tested the code (software) on any other hardware/board, as I have not yet ported the PCI code to any other board.
I would recommend basing your expectations not on U-Boot but on UEFI running on a different architecture using similar network hardware.