On 30 August 2016 at 14:14, Alan Ott alan@softiron.co.uk wrote:
On 08/30/2016 02:49 AM, Ard Biesheuvel wrote:
On 30 August 2016 at 03:59, Alan Ott alan@softiron.co.uk wrote:
On 08/29/2016 03:34 PM, Ard Biesheuvel wrote:
On 29 August 2016 at 20:02, Alan Ott alan@softiron.co.uk wrote:
Change the receive buffers to be single-use only. This involves a few changes that work together:
- Change msk_newbuf() to not attempt to re-use or free a buffer,
- Change msk_rxeof() free the buffer when it is done with it,
- Store a temporary copy of the received data for passing to the Receive Queue,
- Call the required Flush() and Unmap() on the DMA buffer.
In addition this means failure to allocate a new buffer is a failure of msk_rxeof().
Note that this change makes the driver work the way the FreeBSD driver (from which this driver was derived) works, and this simply removes an optimization (the code in msk_newbuf() which re-uses the buffers. This removal of the optimization is done for two reasons:
- The optimization failed to work for 64-bit DMA transfers;
- The UEFI specification, version 2.6, section 13.4 requires calls to Flush() and Unmap() before reading a DMA write buffer from the
CPU, which doesn't fit with the optimization as it existed.
Reverting back to the behavior as it was in the FreeBSD driver solves number 1 and 2 above, and makes this driver more consistent with something we know to be working. There is slightly more overhead, but it is more consistent with the UEFI standard.
Contributed-under: TianoCore Contribution Agreement 1.0 Signed-off-by: Alan Ott alan@softiron.co.uk
Drivers/Net/MarvellYukonDxe/if_msk.c | 32 +++++++++++++++++++++----------- 1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/Drivers/Net/MarvellYukonDxe/if_msk.c b/Drivers/Net/MarvellYukonDxe/if_msk.c index d2102dc..ad7a1c2 100644 --- a/Drivers/Net/MarvellYukonDxe/if_msk.c +++ b/Drivers/Net/MarvellYukonDxe/if_msk.c @@ -581,13 +581,6 @@ msk_newbuf (
rxd = &sc_if->msk_cdata.msk_rxdesc[idx];
- if ((rxd->rx_m.Buf != NULL) && (rxd->rx_m.Length >= Length)) {
- return EFI_ALREADY_STARTED;
- } else if (rxd->rx_m.Buf != NULL) {
- mPciIo->Unmap (mPciIo, rxd->rx_m.DmaMapping);
- mPciIo->FreeBuffer (mPciIo, EFI_SIZE_TO_PAGES (rxd->rx_m.Length),
rxd->rx_m.Buf);
- }
- Status = mPciIo->AllocateBuffer (mPciIo, AllocateAnyPages,
EfiBootServicesData, EFI_SIZE_TO_PAGES (Length), &Buffer, 0); if (EFI_ERROR (Status)) { return Status; @@ -1848,6 +1841,7 @@ msk_rxeof ( struct msk_rxdesc *rxd; INTN cons; INTN rxlen;
MSK_DMA_BUF m;
DEBUG ((EFI_D_NET, "Marvell Yukon: rxeof\n"));
@@ -1871,26 +1865,40 @@ msk_rxeof ( break; }
- rxd = &sc_if->msk_cdata.msk_rxdesc[cons];
- m = rxd->rx_m;
Struct assignment is not allowed in EDK2 (but I haven't checked the existing code).
I had a look in the EDKII C Coding Standards 2.1 Draft PDF here:
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II-Specifications and didn't see it, but maybe I'm looking at the wrong thing.
I will change this to
gBS->CopyMem (&m, &rxd->rx_m, sizeof(MSK_DMA_BUF));
when merging the patch, unless there is a pressing reason not to.
Feel free, but: gBS->CopyMem (&m, &rxd->rx_m, sizeof(m)); is a little safer (sizeof(m) instead of sizeof(MSK_DMA_BUF)), but of course the struct assignment method is the safest (with compile-time checks for equivalent types and automatic determination of size (which already helped me once today)).
Whatever you feel needs to be done is fine.
OK
Status = msk_newbuf (sc_if, cons); if (EFI_ERROR (Status)) {
// This is a dropped packet, but we aren't counting drops // Reuse old buffer msk_discard_rxbuf (sc_if, cons);
break;
- }
- Status = mPciIo->Flush (mPciIo);
- if (EFI_ERROR (Status)) {
DEBUG ((EFI_D_NET, "Marvell Yukon: failed to Flush DMA\n")); }
- Status = mPciIo->Unmap (mPciIo, m.DmaMapping);
- if (EFI_ERROR (Status)) {
DEBUG ((EFI_D_NET, "Marvell Yukon: failed to Unmap DMA\n"));
- }
Status = gBS->AllocatePool (EfiBootServicesData, sizeof (MSK_LINKED_DMA_BUF), (VOID**) &m_link); if (!EFI_ERROR (Status)) { gBS->SetMem (m_link, sizeof (MSK_LINKED_DMA_BUF), 0);
rxd = &sc_if->msk_cdata.msk_rxdesc[cons];
m_link->Signature = RX_MBUF_SIGNATURE; Status = gBS->AllocatePool (EfiBootServicesData, len, (VOID**) &m_link->DmaBuf.Buf); if(!EFI_ERROR (Status)) {
gBS->CopyMem (m_link->DmaBuf.Buf, rxd->rx_m.Buf, len);
gBS->CopyMem (m_link->DmaBuf.Buf, m.Buf, len); m_link->DmaBuf.Length = len;
m_link->DmaBuf.DmaMapping = rxd->rx_m.DmaMapping;
m_link->DmaBuf.DmaMapping = m.DmaMapping;
Do we still need DmaMapping after we unmapped the buffer?
Certainly not, but the consumer of that queue doesn't do anything with the DMA mapping, which is why it didn't cause an issue, even before my patches.
The Receive Queue should use a MSK_LINKED_SYSTEM_BUF instead of a MSK_LINKED_DMA_BUF. I made a patch for this, which I will send tomorrow (Tuesday) as part of v7.
OK, but before you do that, could we get rid of the PciIo->AllocateBuffer()
I've acutally been testing with something similar already (although using gBS->AllocatePages() as you originally suggested), but I'd like to do one thing at a time.
OK, fair enough.
Further, from my reading of the documents, I'm not even sure it's right to try to DMA to memory that's not allocated by the PciIo. I had partially typed up an email to this effect, but I was hoping for one thing at a time. But since you brought it up...
What do you think about section 5.1.1.2 of the Driver Writer's Guide where it says:
A UEFI driver must never directly allocate a memory buffer for DMA access. The UEFI driver cannot know enough about the system architecture to predict what system memory areas are available for DMA or if CPU caches are coherent with DMA operations. Instead, a UEFI driver must use the services provided by the I/O protocol for the bus to allocate and free buffers required for DMA operations. There should also be services to initiate and complete DMA transactions. For example, the PCI Root Bridge I/O Protocol and PCI I/O Protocol both provide services for PCI DMA operations. As additional I/O bus types with DMA capabilities are introduced, new protocols that abstract the DMA services must be provided.
Similar language is in section 5.3.11. It seems like it's stating you can't make the assumption that the bus is able to perform DMA operations to all of RAM, but that the bus driver knows these (platform- and bus-specific) limitations, so you should use the bus (in this case PciIo), to allocate the memory.
However, the examples of streaming PCI DMA in the DWG do _not_ do an allocation this way, and rely on a pointer passed in from outside the example code (DWG 18.5.7).
I've looked hard in the UEFI specification, but I can't find anything else which corroborates this. The UEFI spec only seems to mandate PciIo->AllocateBuffer() for common buffers (EfiPciOperationBusMasterCommonBuffer). This leads me to suspect the above quoted paragraph (DWG 5.1.1.2) may only be talking about common buffers and not for streaming buffers, but it's not clear.
No, it is not clear, but AllocateBuffer() is simply not required for streaming DMA. Note that the transmit path in this driver already performs streaming DMA on arbitrary buffers, so there is no reason for the RX path to be different.
In general, the directionality of streaming DMA combined with the restrictions on when the data is valid and accessible (before Map, after Unmap, etc) means that the implementation can reallocate the buffer if it is not located in suitable memory, which is exactly what the non-coherent DmaLib does.
There are two reasons I'd prefer not to use AllocateBuffer() here: - it gives you uncached memory on non-coherent platforms, which means that every single read performed by the CopyMem() after flush/unmap goes all the way to main memory - AllocateBuffer() changes the memory map (and so does AllocatePages(), which is why I suggested AllocatePool() instead in the second instance), which is costly
In my testing, your suggestion (AllocatePages()) works, but I'm not convinced it's correct.
Either way, I'm going to push for this being a separate patch than the current patch series. I'm happy to do it (if we decide ultimately that it is correct), but it's really a separate problem (as you've indicated before).
Let me know if this is ok with you.
Sure.
and the redundant copy as well? Something like below, but in a way that it doesn't leak memory?
Can this also be a separate from the series? Again, it's a separate problem. Like the above, I'm happy to do it, but rebasing this same set continuously is getting tiresome. v7 is nearly ready, and I'd like it to be the last one. I'm happy to submit a second series after this series is in, to optimize these issues you point out.
Let me know, and again, thanks for your review and interest in this series.
No, that is perfectly fine. I am very happy you are willing to spend more time on this, since I don't have access to the hardware (and am not that familiar with this version or the BSD version of the driver)
It just seemed that changing from MSK_DMA_BUF to MSK_SYSTEM_BUF etc would combine naturally with the changes above, but perhaps not. It's up to you.
Thanks, Ard.