On 01/19/13 21:05, Arnd Bergmann wrote:
I found at least one source line that incorrectly uses an atomic allocation, in ehci_mem_init():
dma_alloc_coherent (ehci_to_hcd(ehci)->self.controller, ehci->periodic_size * sizeof(__le32), &ehci->periodic_dma, 0);
The last argument is the GFP_ flag, which should never be zero, because a zero mask lacks __GFP_WAIT and is therefore implicitly !wait. This function is called only once, so it is not the actual culprit, but there could be other instances where we accidentally allocate something as GFP_ATOMIC.
The total allocations I found for each type are:

sata_mv:              66 pages (270336 bytes)
mv643xx_eth:           4 pages (16384 bytes)
orion_ehci:          154 pages (630784 bytes)
orion_ehci (atomic): 256 pages (1048576 bytes)
From the distribution of the numbers, it seems that there is exactly 1 MB of data allocated between bus addresses 0x1f90000 and 0x1f9ffff, allocated in individual pages. This matches the size of your pool, so it's definitely something coming from USB, and no single other allocation, but it does not directly point to a specific line of code.
Very interesting, so this is neither a fragmentation problem nor something caused by SATA or Ethernet.
One thing I found was that the ARM dma-mapping code seems buggy in the way that it does a bitwise AND between the gfp mask and GFP_ATOMIC, which does not work because GFP_ATOMIC is defined by the absence of __GFP_WAIT.
I believe we need the patch below, but it is not clear to me if that issue is related to your problem or not.
Out of curiosity I checked include/linux/gfp.h. GFP_ATOMIC is defined as __GFP_HIGH (which means 'use emergency pool', and no wait), so this patch should not make any difference for "normal" (GFP_ATOMIC / GFP_KERNEL) allocations, only for gfp_flags accidentally set to zero. So, can a new test with this patch help to debug the pool exhaustion?
diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c
index 6b2fb87..c57975f 100644
--- a/arch/arm/mm/dma-mapping.c
+++ b/arch/arm/mm/dma-mapping.c
@@ -640,7 +641,7 @@ static void *__dma_alloc(struct device *dev, size_t size, dma_addr_t *handle,
 	if (is_coherent || nommu())
 		addr = __alloc_simple_buffer(dev, size, gfp, &page);
-	else if (gfp & GFP_ATOMIC)
+	else if (!(gfp & __GFP_WAIT))
 		addr = __alloc_from_pool(size, &page);
 	else if (!IS_ENABLED(CONFIG_CMA))
 		addr = __alloc_remap_buffer(dev, size, gfp, prot, &page, caller);
@@ -1272,7 +1273,7 @@ static void *arm_iommu_alloc_attrs(struct device *dev, size_t size,
 	*handle = DMA_ERROR_CODE;
 	size = PAGE_ALIGN(size);
-	if (gfp & GFP_ATOMIC)
+	if (!(gfp & __GFP_WAIT))
 		return __iommu_alloc_atomic(dev, size, handle);
 	pages = __iommu_alloc_buffer(dev, size, gfp, attrs);
8<-------
There is one more code path I could find, which is

usb_submit_urb()
  => usb_hcd_submit_urb()
  => ehci_urb_enqueue()
  => submit_async()
  => qh_append_tds()
  => qh_make(GFP_ATOMIC)
  => ehci_qh_alloc()
  => dma_pool_alloc()
  => pool_alloc_page()
  => dma_alloc_coherent()
So even for a GFP_KERNEL passed into usb_submit_urb, the ehci driver causes the low-level allocation to be GFP_ATOMIC, because qh_append_tds() is called under a spinlock. If we have hundreds of URBs in flight, that will exhaust the pool rather quickly.
Maybe there are hundreds of URBs in flight in my application; I have no idea how to check this. It seems to me that bad reception conditions (lost lock / regained lock messages for some DVB channels) accelerate the buffer exhaustion. But even with a 4 MB coherent pool I see the error. Is there any chance to fix this in the USB or DVB subsystem (or wherever)? Should I try to increase the pool size further, or what else can I do besides using an older kernel?
Soeren