On Wednesday 21 May 2014 16:32:16 Catalin Marinas wrote:
On Wed, May 21, 2014 at 03:43:39PM +0100, Arnd Bergmann wrote:
On Wednesday 21 May 2014 14:56:36 Catalin Marinas wrote:
On Tue, May 20, 2014 at 08:37:10PM +0100, Arnd Bergmann wrote:
On Tuesday 20 May 2014 15:12:49 Catalin Marinas wrote:
The point I was trying to make is a different one: For the streaming mapping, you can allow any mask as long as the swiotlb is able to create a bounce buffer.
dma_mask is a hardware parameter and is used by the streaming DMA API implementation (which could be swiotlb) to decide whether to bounce or not. The streaming DMA API can take any CPU address as input, independent of the dma_mask, and bounce accordingly. If it doesn't have a bounce buffer matching the dma_mask, dma_supported() would return false.
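A minimal sketch of that decision, modelled on the kernel's dma_capable() check (the wrapper name here is illustrative):

#include <linux/dma-mapping.h>

/* Bounce only if the buffer's bus address range does not fit
 * within the device's streaming dma_mask. */
static bool needs_bounce(struct device *dev, dma_addr_t addr, size_t size)
{
	return addr + size - 1 > *dev->dma_mask;
}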
This does not work for the coherent mapping, because that by definition cannot use bounce buffers.
Yes but most of the time these masks have the same value.
Let me start again with an example:
io_tlb_end = 0x1000000;     // 16 MB
dma_mask   = 0x10000000;    // 256 MB
ZONE_DMA   = 0x100000000;   // 4 GB
max_pfn    = 0x1000000000;  // 64 GB
The device driver has set dma_mask and coherent_dma_mask to 256 MB because of a limitation in the hardware. This succeeded because io_tlb_end is well within the DMA mask.
Since the coherent_dma_mask is smaller than max_pfn, the swiotlb version of dma_alloc_coherent then allocates the buffer using GFP_DMA. However, that likely returns an address above dma_mask/coherent_dma_mask.
swiotlb_alloc_coherent() first tries __get_free_pages(GFP_DMA). If the returned address is not within coherent_dma_mask, it frees the allocated pages and tries to allocate from its bounce buffer (io_tlb).
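A condensed paraphrase of that logic (not the exact swiotlb code: virt_to_phys() stands in for the bus-address conversion, and alloc_from_io_tlb() is a hypothetical stand-in for swiotlb_tbl_map_single()):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Hypothetical stand-in for carving a buffer out of the io_tlb area. */
void *alloc_from_io_tlb(struct device *dev, size_t size, dma_addr_t *handle);

void *sketch_alloc_coherent(struct device *dev, size_t size,
			    dma_addr_t *handle, gfp_t flags)
{
	int order = get_order(size);
	void *ret = (void *)__get_free_pages(flags | GFP_DMA, order);

	/* First attempt: a ZONE_DMA page, which may still sit above a
	 * mask that is smaller than the zone (256 MB vs 4 GB here). */
	if (ret && virt_to_phys(ret) + size - 1 > dev->coherent_dma_mask) {
		free_pages((unsigned long)ret, order);
		ret = NULL;
	}

	/* Fallback: the bounce buffer itself, already known to be within
	 * the mask because dma_supported() checked io_tlb_end. */
	if (!ret)
		ret = alloc_from_io_tlb(dev, size, handle);
	else
		*handle = virt_to_phys(ret);

	return ret;
}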
Ok, got it now. Thanks for your patience. I suppose this was done to maintain the API guarantee that dma_set_coherent_mask(dev, dma_get_mask(dev)) always succeeds.
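For reference, the pattern that guarantee supports, per the DMA API documentation (the 28-bit mask below approximates the 256 MB limit from the example):

#include <linux/dma-mapping.h>

static int example_probe(struct device *dev)
{
	/* hardware limited to 28 bits (256 MB) of addressing */
	if (dma_set_mask(dev, DMA_BIT_MASK(28)))
		return -EIO;	/* no DMA possible at all */

	/* Setting the same (or a smaller) coherent mask is then
	 * guaranteed to succeed. */
	dma_set_coherent_mask(dev, dma_get_mask(dev));
	return 0;
}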
My point was that to fix this case, you must not compare the coherent_dma_mask requested by the device against io_tlb_end but against ZONE_DMA.
See above, I think swiotlb does the right thing.
We have a problem, however, with CMA, as we assume that the returned address is within coherent_dma_mask (we need a contiguous_dma_supported function, similar to swiotlb_dma_supported). We also don't limit the CMA buffer to ZONE_DMA on arm64.
Right. I guess the answer to that is to teach CMA to honor GFP_DMA. This was probably never needed on ARM32 but is an obvious extension.
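A sketch of the check Catalin asks for, modelled on swiotlb_dma_supported(); the cma_area_end() helper for querying the device's CMA region is hypothetical:

#include <linux/device.h>
#include <linux/types.h>

/* Hypothetical: bus address of the first byte past the CMA area. */
phys_addr_t cma_area_end(struct device *dev);

/* The device is usable only if the whole CMA area is addressable
 * through its mask, mirroring the io_tlb_end check in swiotlb. */
static int contiguous_dma_supported(struct device *dev, u64 mask)
{
	return cma_area_end(dev) - 1 <= mask;
}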
As far as I understand it, you are not allowed to create a non-cacheable mapping for a page that also has a cacheable mapping, or did that change between ARMv7 and ARMv8?
No change between ARMv7 and ARMv8. In fact, ARMv7 clarified what "unpredictable" means when you mix different attributes. Basically, we can mix the same memory type (Normal Cacheable vs Normal NonCacheable), but if the cacheability differs, we need to do some cache maintenance before the first access via the non-cacheable mapping (clean potentially dirty lines that could otherwise be evicted at random). Any cache lines loaded speculatively via the cacheable mapping will not be hit when the memory is accessed via the non-cacheable one.
Ah, only for the first access? I had remembered this differently, thanks for the clarification.
Only the first time, and assuming that you no longer write to the cacheable alias and dirty it again (basically it's like streaming DMA: if you want to access the cacheable alias, you need cache maintenance).
Ok.
For CMA, we use the trick of changing the mapping attributes in place, but that doesn't work for normal pages, which are mapped using huge TLB entries.
That's true for arm32, and we could do the same for arm64, though I don't think it's worth it because we would have to break very large block mappings (e.g. 1GB ones). We have plenty of VA space, so we just create a new non-cacheable mapping (this latter part could be optimised, even generically, to use huge pages where possible).
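A sketch of that scheme, assuming __flush_dcache_area() (the arm64-internal clean+invalidate by VA helper) is available; the rest uses the generic vmap() interface:

#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/vmalloc.h>
#include <asm/cacheflush.h>

/* Create a second, non-cacheable mapping for contiguous pages that
 * stay mapped cacheable (possibly via 1GB blocks) in the linear map. */
static void *map_noncacheable_alias(struct page *page, int count)
{
	void *kaddr = page_address(page);	/* cacheable linear alias */
	struct page **pages;
	void *vaddr;
	int i;

	/* Per the mismatched-attributes rules: clean the cacheable alias
	 * before the first non-cacheable access, so no dirty line can be
	 * evicted over data written through the new mapping. */
	__flush_dcache_area(kaddr, (size_t)count << PAGE_SHIFT);

	pages = kmalloc_array(count, sizeof(*pages), GFP_KERNEL);
	if (!pages)
		return NULL;
	for (i = 0; i < count; i++)
		pages[i] = page + i;

	/* New mapping in vmalloc space; the linear map is left untouched. */
	vaddr = vmap(pages, count, VM_MAP, pgprot_writecombine(PAGE_KERNEL));
	kfree(pages);
	return vaddr;
}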
I thought we specifically did this on arm32 to avoid the duplicate mapping because we had concluded that it would be broken even with the updated ARMv7 definition.
We did it because it was vaguely defined as "unpredictable" in the ARM ARM for a long time until the hw guys realised it's not easy to change Linux and decided to clarify what could actually happen (see A3.5.7 in the ARMv7 ARM, "Mismatched memory attributes").
But in the past we used to have coherent DMA buffers mapped as Strongly Ordered, and that is a different memory type from Normal NonCacheable. Even in this case, though, the ARM ARM now states that such a mapping is permitted, but the Strongly Ordered properties are not guaranteed (it could simply behave like Normal NonCacheable).
Right.
Of course, another approach would be to change the cacheability attributes of the io_tlb buffer (or the CMA one), but the current approach has the advantage that the io_tlb buffer can be used for both coherent and non-coherent devices.
That definitely shouldn't be done without performance testing. However, I wonder whether such measurements were done when the swiotlb was first designed: it's possible that it's cheaper to have the bounce buffer for noncoherent devices mapped noncacheable, so that we do a cache-bypassing copy into and out of the bounce buffer and save the extra flushes.
It doesn't do this AFAICT, but it's an interesting idea. Well, to be optimised when we can benchmark this on hardware.
One issue though is when some (or all) devices are coherent. In this case, even if you use a bounce buffer, it needs to be mapped coherently at the CPU level (e.g. the device has its own cache that can be snooped, so CPU accesses need to be cacheable).
Do you mean there are cases where an uncached buffer is not coherent with a device cache?
Yes. If, for example, you have a device (e.g. GPU, microcontroller) on a coherent interconnect, non-cacheable CPU accesses are not guaranteed to (and probably should not) hit in the other device's cache (it could be seen as a side-effect of allowing cacheable vs non-cacheable aliases).
Right, makes sense. So if we do this trick, it should only be done for devices that are not coherent to start with.
Arnd