Currently, SEV-enabled guests hit might_sleep() warnings when a driver (nvme in this case) allocates through the DMA API in a non-blockable context. Having these unencrypted, non-blocking DMA allocations come from the coherent pools prevents this BUG:
BUG: sleeping function called from invalid context at mm/vmalloc.c:1710
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 3383, name: fio
2 locks held by fio/3383:
 #0: ffff93b6a8568348 (&sb->s_type->i_mutex_key#16){+.+.}, at: ext4_file_write_iter+0xa2/0x5d0
 #1: ffffffffa52a61a0 (rcu_read_lock){....}, at: hctx_lock+0x1a/0xe0
CPU: 0 PID: 3383 Comm: fio Tainted: G W 5.5.10 #14
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
 dump_stack+0x98/0xd5
 ___might_sleep+0x175/0x260
 __might_sleep+0x4a/0x80
 _vm_unmap_aliases+0x45/0x250
 vm_unmap_aliases+0x19/0x20
 __set_memory_enc_dec+0xa4/0x130
 set_memory_decrypted+0x10/0x20
 dma_direct_alloc_pages+0x148/0x150
 dma_direct_alloc+0xe/0x10
 dma_alloc_attrs+0x86/0xc0
 dma_pool_alloc+0x16f/0x2b0
 nvme_queue_rq+0x878/0xc30 [nvme]
 __blk_mq_try_issue_directly+0x135/0x200
 blk_mq_request_issue_directly+0x4f/0x80
 blk_mq_try_issue_list_directly+0x46/0xb0
 blk_mq_sched_insert_requests+0x19b/0x2b0
 blk_mq_flush_plug_list+0x22f/0x3b0
 blk_flush_plug_list+0xd1/0x100
 blk_finish_plug+0x2c/0x40
 iomap_dio_rw+0x427/0x490
 ext4_file_write_iter+0x181/0x5d0
 aio_write+0x109/0x1b0
 io_submit_one+0x7d0/0xfa0
 __x64_sys_io_submit+0xa2/0x280
 do_syscall_64+0x5f/0x250
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
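The pattern that produces this trace is roughly the following (a minimal, hypothetical driver sketch, not code from this series): a DMA allocation is issued while sleeping is forbidden, and on an SEV guest the direct-mapping path previously still reached set_memory_decrypted(), which may sleep.

    #include <linux/dmapool.h>
    #include <linux/spinlock.h>

    /* Hypothetical helper: allocate a descriptor while holding a spinlock. */
    static void *alloc_desc_atomic(struct dma_pool *pool, spinlock_t *lock,
                                   dma_addr_t *dma)
    {
            unsigned long flags;
            void *desc;

            spin_lock_irqsave(lock, flags);
            /* Sleeping is forbidden here, so GFP_ATOMIC is used ... */
            desc = dma_pool_alloc(pool, GFP_ATOMIC, dma);
            /*
             * ... but before this series an SEV guest could still end up in
             * set_memory_decrypted() -> vm_unmap_aliases(), which may sleep.
             */
            spin_unlock_irqrestore(lock, flags);
            return desc;
    }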
Christoph Hellwig (9):
  dma-direct: remove __dma_direct_free_pages
  dma-direct: remove the dma_handle argument to __dma_direct_alloc_pages
  dma-direct: provide mmap and get_sgtable method overrides
  dma-mapping: merge the generic remapping helpers into dma-direct
  dma-direct: consolidate the error handling in dma_direct_alloc_pages
  xtensa: use the generic uncached segment support
  dma-direct: make uncached_kernel_address more general
  dma-mapping: DMA_COHERENT_POOL should select GENERIC_ALLOCATOR
  dma-pool: fix coherent pool allocations for IOMMU mappings

David Rientjes (13):
  dma-remap: separate DMA atomic pools from direct remap code
  dma-pool: add additional coherent pools to map to gfp mask
  dma-pool: dynamically expanding atomic pools
  dma-direct: atomic allocations must come from atomic coherent pools
  dma-pool: add pool sizes to debugfs
  x86/mm: unencrypted non-blocking DMA allocations use coherent pools
  dma-pool: scale the default DMA coherent pool size with memory capacity
  dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL
  dma-direct: always align allocation size in dma_direct_alloc_pages()
  dma-direct: re-encrypt memory if dma_direct_alloc_pages() fails
  dma-direct: check return value when encrypting or decrypting memory
  dma-direct: add missing set_memory_decrypted() for coherent mapping
  dma-mapping: warn when coherent pool is depleted

Geert Uytterhoeven (1):
  dma-pool: fix too large DMA pools on medium memory size systems

Huang Shijie (1):
  lib/genalloc.c: rename addr_in_gen_pool to gen_pool_has_addr

Nicolas Saenz Julienne (6):
  dma-direct: provide function to check physical memory area validity
  dma-pool: get rid of dma_in_atomic_pool()
  dma-pool: introduce dma_guess_pool()
  dma-pool: make sure atomic pool suits device
  dma-pool: do not allocate pool memory from CMA
  dma/direct: turn ARCH_ZONE_DMA_BITS into a variable
 Documentation/core-api/genalloc.rst    |   2 +-
 arch/Kconfig                           |   8 +-
 arch/arc/Kconfig                       |   1 -
 arch/arm/Kconfig                       |   1 -
 arch/arm/mm/dma-mapping.c              |   8 +-
 arch/arm64/Kconfig                     |   1 -
 arch/arm64/mm/init.c                   |   9 +-
 arch/ia64/Kconfig                      |   2 +-
 arch/ia64/kernel/dma-mapping.c         |   6 -
 arch/microblaze/Kconfig                |   3 +-
 arch/microblaze/mm/consistent.c        |   2 +-
 arch/mips/Kconfig                      |   7 +-
 arch/mips/mm/dma-noncoherent.c         |   8 +-
 arch/nios2/Kconfig                     |   3 +-
 arch/nios2/mm/dma-mapping.c            |   2 +-
 arch/powerpc/include/asm/page.h        |   9 -
 arch/powerpc/mm/mem.c                  |  20 +-
 arch/powerpc/platforms/Kconfig.cputype |   1 -
 arch/s390/include/asm/page.h           |   2 -
 arch/s390/mm/init.c                    |   1 +
 arch/x86/Kconfig                       |   1 +
 arch/xtensa/Kconfig                    |   6 +-
 arch/xtensa/include/asm/platform.h     |  27 ---
 arch/xtensa/kernel/Makefile            |   3 +-
 arch/xtensa/kernel/pci-dma.c           | 121 ++---------
 drivers/iommu/dma-iommu.c              |   5 +-
 drivers/misc/sram-exec.c               |   2 +-
 include/linux/dma-direct.h             |  12 +-
 include/linux/dma-mapping.h            |   7 +-
 include/linux/dma-noncoherent.h        |   4 +-
 include/linux/genalloc.h               |   2 +-
 kernel/dma/Kconfig                     |  20 +-
 kernel/dma/Makefile                    |   1 +
 kernel/dma/direct.c                    | 224 ++++++++++++++++----
 kernel/dma/mapping.c                   |  45 +----
 kernel/dma/pool.c                      | 270 +++++++++++++++++++++++++
 kernel/dma/remap.c                     | 176 +---------------
 lib/genalloc.c                         |   5 +-
 38 files changed, 564 insertions(+), 463 deletions(-)
 create mode 100644 kernel/dma/pool.c
From: Christoph Hellwig <hch@lst.de>
upstream acaade1af3587132e7ea585f470a95261e14f60c commit.
We can just call dma_free_contiguous directly instead of wrapping it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 include/linux/dma-direct.h |  1 -
 kernel/dma/direct.c        | 11 +++--------
 kernel/dma/remap.c         |  4 ++--
 3 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 6a18a97b76a8..02a418520062 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -76,6 +76,5 @@ void dma_direct_free_pages(struct device *dev, size_t size,
 		void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs);
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
 		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
-void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 0a093a675b63..86f580439c9c 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -154,7 +154,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 		 * so log an error and fail.
 		 */
 		dev_info(dev, "Rejecting highmem page from CMA.\n");
-		__dma_direct_free_pages(dev, size, page);
+		dma_free_contiguous(dev, page, size);
 		return NULL;
 	}
 
@@ -176,11 +176,6 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 	return ret;
 }
 
-void __dma_direct_free_pages(struct device *dev, size_t size, struct page *page)
-{
-	dma_free_contiguous(dev, page, size);
-}
-
 void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs)
 {
@@ -189,7 +184,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 	if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) &&
 	    !force_dma_unencrypted(dev)) {
 		/* cpu_addr is a struct page cookie, not a kernel address */
-		__dma_direct_free_pages(dev, size, cpu_addr);
+		dma_free_contiguous(dev, cpu_addr, size);
 		return;
 	}
 
@@ -199,7 +194,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 	if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) &&
 	    dma_alloc_need_uncached(dev, attrs))
 		cpu_addr = cached_kernel_address(cpu_addr);
-	__dma_direct_free_pages(dev, size, virt_to_page(cpu_addr));
+	dma_free_contiguous(dev, virt_to_page(cpu_addr), size);
 }
 
 void *dma_direct_alloc(struct device *dev, size_t size,
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index c00b9258fa6a..fb1e50c2d48a 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -238,7 +238,7 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 			dma_pgprot(dev, PAGE_KERNEL, attrs),
 			__builtin_return_address(0));
 	if (!ret) {
-		__dma_direct_free_pages(dev, size, page);
+		dma_free_contiguous(dev, page, size);
 		return ret;
 	}
 
@@ -256,7 +256,7 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr,
 		struct page *page = pfn_to_page(__phys_to_pfn(phys));
 
 		vunmap(vaddr);
-		__dma_direct_free_pages(dev, size, page);
+		dma_free_contiguous(dev, page, size);
 	}
 }
From: Christoph Hellwig <hch@lst.de>
upstream 4e1003aa56a7d60ddb048e43a7a51368fcfe36af commit.
The argument isn't used anywhere, so stop passing it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 include/linux/dma-direct.h | 2 +-
 kernel/dma/direct.c        | 4 ++--
 kernel/dma/remap.c         | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h
index 02a418520062..3238177e65ad 100644
--- a/include/linux/dma-direct.h
+++ b/include/linux/dma-direct.h
@@ -75,6 +75,6 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
 		dma_addr_t dma_addr, unsigned long attrs);
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs);
+		gfp_t gfp, unsigned long attrs);
 int dma_direct_supported(struct device *dev, u64 mask);
 #endif /* _LINUX_DMA_DIRECT_H */
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c
index 86f580439c9c..9621993bf2bb 100644
--- a/kernel/dma/direct.c
+++ b/kernel/dma/direct.c
@@ -84,7 +84,7 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size)
 }
 
 struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
-		dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs)
+		gfp_t gfp, unsigned long attrs)
 {
 	size_t alloc_size = PAGE_ALIGN(size);
 	int node = dev_to_node(dev);
@@ -132,7 +132,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
 	struct page *page;
 	void *ret;
 
-	page = __dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs);
+	page = __dma_direct_alloc_pages(dev, size, gfp, attrs);
 	if (!page)
 		return NULL;
 
diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c
index fb1e50c2d48a..90d5ce77c189 100644
--- a/kernel/dma/remap.c
+++ b/kernel/dma/remap.c
@@ -226,7 +226,7 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle,
 		goto done;
 	}
 
-	page = __dma_direct_alloc_pages(dev, size, dma_handle, flags, attrs);
+	page = __dma_direct_alloc_pages(dev, size, flags, attrs);
 	if (!page)
 		return NULL;
From: Christoph Hellwig <hch@lst.de>
upstream 34dc0ea6bc960f1f57b2148f01a3f4da23f87013 commit.
For dma-direct we know that the DMA address is an encoding of the physical address, which we can trivially decode. Use that fact to provide implementations that do not need the arch_dma_coherent_to_pfn architecture hook. Note that we can still only support mmap of non-coherent memory if the architecture provides a way to set an uncached bit in the page tables. This must be true for architectures that use the generic remap helpers, but other architectures can also manually select it.
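The decode mentioned above is the heart of the new helpers; condensed from the diff below, it is just:

    /* DMA address -> physical address -> struct page, no arch hook needed. */
    static inline struct page *dma_direct_to_page(struct device *dev,
                    dma_addr_t dma_addr)
    {
            return pfn_to_page(PHYS_PFN(dma_to_phys(dev, dma_addr)));
    }

dma_direct_get_sgtable() wraps that page in a single-entry scatterlist, and dma_direct_mmap() feeds the same PFN into remap_pfn_range().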
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 arch/arc/Kconfig                       |  1 -
 arch/arm/Kconfig                       |  1 -
 arch/arm/mm/dma-mapping.c              |  6 ---
 arch/arm64/Kconfig                     |  1 -
 arch/ia64/Kconfig                      |  2 +-
 arch/ia64/kernel/dma-mapping.c         |  6 ---
 arch/microblaze/Kconfig                |  1 -
 arch/mips/Kconfig                      |  4 +-
 arch/mips/mm/dma-noncoherent.c         |  6 ---
 arch/powerpc/platforms/Kconfig.cputype |  1 -
 include/linux/dma-direct.h             |  7 +++
 include/linux/dma-noncoherent.h        |  2 -
 kernel/dma/Kconfig                     | 12 ++++--
 kernel/dma/direct.c                    | 59 ++++++++++++++++++++++++++
 kernel/dma/mapping.c                   | 45 +++----------------
 kernel/dma/remap.c                     |  6 ---
 16 files changed, 85 insertions(+), 75 deletions(-)

diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 8383155c8c82..4d7b671c8ff4 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -6,7 +6,6 @@ config ARC def_bool y select ARC_TIMERS - select ARCH_HAS_DMA_COHERENT_TO_PFN select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SETUP_DMA_OPS diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 05c9bbfe444d..fac9999d6ef5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -7,7 +7,6 @@ config ARM select ARCH_HAS_BINFMT_FLAT select ARCH_HAS_DEBUG_VIRTUAL if MMU select ARCH_HAS_DEVMEM_IS_ALLOWED - select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 27576c7b836e..58d5765fb129 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -2346,12 +2346,6 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr, size, dir); }
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 8383155c8c82..4d7b671c8ff4 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -6,7 +6,6 @@ config ARC def_bool y select ARC_TIMERS - select ARCH_HAS_DMA_COHERENT_TO_PFN select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_PTE_SPECIAL select ARCH_HAS_SETUP_DMA_OPS diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 05c9bbfe444d..fac9999d6ef5 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -7,7 +7,6 @@ config ARM select ARCH_HAS_BINFMT_FLAT select ARCH_HAS_DEBUG_VIRTUAL if MMU select ARCH_HAS_DEVMEM_IS_ALLOWED - select ARCH_HAS_DMA_COHERENT_TO_PFN if SWIOTLB select ARCH_HAS_DMA_WRITE_COMBINE if !ARM_DMA_MEM_BUFFERABLE select ARCH_HAS_ELF_RANDOMIZE select ARCH_HAS_FORTIFY_SOURCE diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 27576c7b836e..58d5765fb129 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -2346,12 +2346,6 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr, size, dir); }
-long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, - dma_addr_t dma_addr) -{ - return dma_to_pfn(dev, dma_addr); -} - void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index a0bc9bbb92f3..bc45a704987f 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -12,7 +12,6 @@ config ARM64 select ARCH_CLOCKSOURCE_DATA select ARCH_HAS_DEBUG_VIRTUAL select ARCH_HAS_DEVMEM_IS_ALLOWED - select ARCH_HAS_DMA_COHERENT_TO_PFN select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_ACPI_TABLE_UPGRADE if ACPI select ARCH_HAS_FAST_MULTIPLIER diff --git a/arch/ia64/Kconfig b/arch/ia64/Kconfig index 16714477eef4..bab7cd878464 100644 --- a/arch/ia64/Kconfig +++ b/arch/ia64/Kconfig @@ -33,7 +33,7 @@ config IA64 select HAVE_ARCH_TRACEHOOK select HAVE_MEMBLOCK_NODE_MAP select HAVE_VIRT_CPU_ACCOUNTING - select ARCH_HAS_DMA_COHERENT_TO_PFN + select DMA_NONCOHERENT_MMAP select ARCH_HAS_SYNC_DMA_FOR_CPU select VIRT_TO_BUS select GENERIC_IRQ_PROBE diff --git a/arch/ia64/kernel/dma-mapping.c b/arch/ia64/kernel/dma-mapping.c index 4a3262795890..09ef9ce9988d 100644 --- a/arch/ia64/kernel/dma-mapping.c +++ b/arch/ia64/kernel/dma-mapping.c @@ -19,9 +19,3 @@ void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, { dma_direct_free_pages(dev, size, cpu_addr, dma_addr, attrs); } - -long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, - dma_addr_t dma_addr) -{ - return page_to_pfn(virt_to_page(cpu_addr)); -} diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index c9c4be822456..261c26df1c9f 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -4,7 +4,6 @@ config MICROBLAZE select ARCH_32BIT_OFF_T select ARCH_NO_SWAP select ARCH_HAS_BINFMT_FLAT if !MMU - select ARCH_HAS_DMA_COHERENT_TO_PFN if MMU select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_SYNC_DMA_FOR_CPU diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index e5c2d47608fe..c1c3da4fc667 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1134,9 +1134,9 @@ config DMA_NONCOHERENT select ARCH_HAS_DMA_WRITE_COMBINE select ARCH_HAS_SYNC_DMA_FOR_DEVICE select ARCH_HAS_UNCACHED_SEGMENT - select NEED_DMA_MAP_STATE - select ARCH_HAS_DMA_COHERENT_TO_PFN + select DMA_NONCOHERENT_MMAP select DMA_NONCOHERENT_CACHE_SYNC + select NEED_DMA_MAP_STATE
config SYS_HAS_EARLY_PRINTK bool diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index 1d4d57dd9acf..fcf6d3eaac66 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -59,12 +59,6 @@ void *cached_kernel_address(void *addr) return __va(addr) - UNCAC_BASE; }
-long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, - dma_addr_t dma_addr) -{ - return page_to_pfn(virt_to_page(cached_kernel_address(cpu_addr))); -} - static inline void dma_sync_virt(void *addr, size_t size, enum dma_data_direction dir) { diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype index f0330ce498d1..97af19141aed 100644 --- a/arch/powerpc/platforms/Kconfig.cputype +++ b/arch/powerpc/platforms/Kconfig.cputype @@ -459,7 +459,6 @@ config NOT_COHERENT_CACHE bool depends on 4xx || PPC_8xx || E200 || PPC_MPC512x || \ GAMECUBE_COMMON || AMIGAONE - select ARCH_HAS_DMA_COHERENT_TO_PFN select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_SYNC_DMA_FOR_DEVICE select ARCH_HAS_SYNC_DMA_FOR_CPU diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 3238177e65ad..6db863c3eb93 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -76,5 +76,12 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs); struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs); +int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + unsigned long attrs); +bool dma_direct_can_mmap(struct device *dev); +int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + unsigned long attrs); int dma_direct_supported(struct device *dev, u64 mask); #endif /* _LINUX_DMA_DIRECT_H */ diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index dd3de6d88fc0..e30fca1f1b12 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -41,8 +41,6 @@ void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void arch_dma_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs); -long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, - dma_addr_t dma_addr);
#ifdef CONFIG_MMU /* diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 73c5c2b8e824..4c103a24e380 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -51,9 +51,6 @@ config ARCH_HAS_SYNC_DMA_FOR_CPU_ALL config ARCH_HAS_DMA_PREP_COHERENT bool
-config ARCH_HAS_DMA_COHERENT_TO_PFN - bool - config ARCH_HAS_FORCE_DMA_UNENCRYPTED bool
@@ -68,9 +65,18 @@ config SWIOTLB bool select NEED_DMA_MAP_STATE
+# +# Should be selected if we can mmap non-coherent mappings to userspace. +# The only thing that is really required is a way to set an uncached bit +# in the pagetables +# +config DMA_NONCOHERENT_MMAP + bool + config DMA_REMAP depends on MMU select GENERIC_ALLOCATOR + select DMA_NONCOHERENT_MMAP bool
config DMA_DIRECT_REMAP diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 9621993bf2bb..76c722bc9e0c 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -43,6 +43,12 @@ static inline dma_addr_t phys_to_dma_direct(struct device *dev, return phys_to_dma(dev, phys); }
+static inline struct page *dma_direct_to_page(struct device *dev, + dma_addr_t dma_addr) +{ + return pfn_to_page(PHYS_PFN(dma_to_phys(dev, dma_addr))); +} + u64 dma_direct_get_required_mask(struct device *dev) { phys_addr_t phys = (phys_addr_t)(max_pfn - 1) << PAGE_SHIFT; @@ -380,6 +386,59 @@ dma_addr_t dma_direct_map_resource(struct device *dev, phys_addr_t paddr, } EXPORT_SYMBOL(dma_direct_map_resource);
+int dma_direct_get_sgtable(struct device *dev, struct sg_table *sgt, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + unsigned long attrs) +{ + struct page *page = dma_direct_to_page(dev, dma_addr); + int ret; + + ret = sg_alloc_table(sgt, 1, GFP_KERNEL); + if (!ret) + sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); + return ret; +} + +#ifdef CONFIG_MMU +bool dma_direct_can_mmap(struct device *dev) +{ + return dev_is_dma_coherent(dev) || + IS_ENABLED(CONFIG_DMA_NONCOHERENT_MMAP); +} + +int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + unsigned long attrs) +{ + unsigned long user_count = vma_pages(vma); + unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; + unsigned long pfn = PHYS_PFN(dma_to_phys(dev, dma_addr)); + int ret = -ENXIO; + + vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs); + + if (dma_mmap_from_dev_coherent(dev, vma, cpu_addr, size, &ret)) + return ret; + + if (vma->vm_pgoff >= count || user_count > count - vma->vm_pgoff) + return -ENXIO; + return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, + user_count << PAGE_SHIFT, vma->vm_page_prot); +} +#else /* CONFIG_MMU */ +bool dma_direct_can_mmap(struct device *dev) +{ + return false; +} + +int dma_direct_mmap(struct device *dev, struct vm_area_struct *vma, + void *cpu_addr, dma_addr_t dma_addr, size_t size, + unsigned long attrs) +{ + return -ENXIO; +} +#endif /* CONFIG_MMU */ + /* * Because 32-bit DMA masks are so common we expect every architecture to be * able to satisfy them - either by not supporting more physical memory, or by diff --git a/kernel/dma/mapping.c b/kernel/dma/mapping.c index 8682a5305cb3..98e3d873792e 100644 --- a/kernel/dma/mapping.c +++ b/kernel/dma/mapping.c @@ -112,24 +112,9 @@ int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, dma_addr_t dma_addr, size_t size, unsigned long attrs) { - struct page *page; + struct page *page = virt_to_page(cpu_addr); int ret;
- if (!dev_is_dma_coherent(dev)) { - unsigned long pfn; - - if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN)) - return -ENXIO; - - /* If the PFN is not valid, we do not have a struct page */ - pfn = arch_dma_coherent_to_pfn(dev, cpu_addr, dma_addr); - if (!pfn_valid(pfn)) - return -ENXIO; - page = pfn_to_page(pfn); - } else { - page = virt_to_page(cpu_addr); - } - ret = sg_alloc_table(sgt, 1, GFP_KERNEL); if (!ret) sg_set_page(sgt->sgl, page, PAGE_ALIGN(size), 0); @@ -154,7 +139,7 @@ int dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, const struct dma_map_ops *ops = get_dma_ops(dev);
if (dma_is_direct(ops)) - return dma_common_get_sgtable(dev, sgt, cpu_addr, dma_addr, + return dma_direct_get_sgtable(dev, sgt, cpu_addr, dma_addr, size, attrs); if (!ops->get_sgtable) return -ENXIO; @@ -194,7 +179,6 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, unsigned long user_count = vma_pages(vma); unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; unsigned long off = vma->vm_pgoff; - unsigned long pfn; int ret = -ENXIO;
vma->vm_page_prot = dma_pgprot(dev, vma->vm_page_prot, attrs); @@ -205,19 +189,8 @@ int dma_common_mmap(struct device *dev, struct vm_area_struct *vma, if (off >= count || user_count > count - off) return -ENXIO;
- if (!dev_is_dma_coherent(dev)) { - if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN)) - return -ENXIO; - - /* If the PFN is not valid, we do not have a struct page */ - pfn = arch_dma_coherent_to_pfn(dev, cpu_addr, dma_addr); - if (!pfn_valid(pfn)) - return -ENXIO; - } else { - pfn = page_to_pfn(virt_to_page(cpu_addr)); - } - - return remap_pfn_range(vma, vma->vm_start, pfn + vma->vm_pgoff, + return remap_pfn_range(vma, vma->vm_start, + page_to_pfn(virt_to_page(cpu_addr)) + vma->vm_pgoff, user_count << PAGE_SHIFT, vma->vm_page_prot); #else return -ENXIO; @@ -235,12 +208,8 @@ bool dma_can_mmap(struct device *dev) { const struct dma_map_ops *ops = get_dma_ops(dev);
- if (dma_is_direct(ops)) { - return IS_ENABLED(CONFIG_MMU) && - (dev_is_dma_coherent(dev) || - IS_ENABLED(CONFIG_ARCH_HAS_DMA_COHERENT_TO_PFN)); - } - + if (dma_is_direct(ops)) + return dma_direct_can_mmap(dev); return ops->mmap != NULL; } EXPORT_SYMBOL_GPL(dma_can_mmap); @@ -265,7 +234,7 @@ int dma_mmap_attrs(struct device *dev, struct vm_area_struct *vma, const struct dma_map_ops *ops = get_dma_ops(dev);
if (dma_is_direct(ops)) - return dma_common_mmap(dev, vma, cpu_addr, dma_addr, size, + return dma_direct_mmap(dev, vma, cpu_addr, dma_addr, size, attrs); if (!ops->mmap) return -ENXIO; diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c index 90d5ce77c189..3c49499ee6b0 100644 --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -259,10 +259,4 @@ void arch_dma_free(struct device *dev, size_t size, void *vaddr, dma_free_contiguous(dev, page, size); } } - -long arch_dma_coherent_to_pfn(struct device *dev, void *cpu_addr, - dma_addr_t dma_addr) -{ - return __phys_to_pfn(dma_to_phys(dev, dma_addr)); -} #endif /* CONFIG_DMA_DIRECT_REMAP */
From: Christoph Hellwig <hch@lst.de>
upstream 3acac065508f6cc60ac9d3e4b7c6cc37fd91d531 commit.
Integrate the generic DMA remapping implementation into the main flow. This prepares for architectures like xtensa that use an uncached segment for pages in the kernel mapping, but can also remap highmem from CMA. To simplify that implementation, we now always derive the page from the physical address via the DMA address instead of the virtual address.
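Condensed from the hunks below, the resulting dma_direct_free_pages() no longer needs the kernel virtual address to find the pages; any vmalloc alias is simply unmapped and the pages are located via the DMA address:

    	if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) &&
    	    dma_free_from_pool(cpu_addr, PAGE_ALIGN(size)))
    		return;

    	if (force_dma_unencrypted(dev))
    		set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);

    	if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr))
    		vunmap(cpu_addr);

    	dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size);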
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 kernel/dma/direct.c | 60 ++++++++++++++++++++++++++++++++++++---------
 kernel/dma/remap.c  | 49 ------------------------------------
 2 files changed, 48 insertions(+), 61 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 76c722bc9e0c..d30c5468a91a 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -12,6 +12,7 @@ #include <linux/dma-contiguous.h> #include <linux/dma-noncoherent.h> #include <linux/pfn.h> +#include <linux/vmalloc.h> #include <linux/set_memory.h> #include <linux/swiotlb.h>
@@ -138,6 +139,15 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret;
+ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + dma_alloc_need_uncached(dev, attrs) && + !gfpflags_allow_blocking(gfp)) { + ret = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + if (!ret) + return NULL; + goto done; + } + page = __dma_direct_alloc_pages(dev, size, gfp, attrs); if (!page) return NULL; @@ -147,9 +157,28 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, /* remove any dirty cache lines on the kernel alias */ if (!PageHighMem(page)) arch_dma_prep_coherent(page, size); - *dma_handle = phys_to_dma(dev, page_to_phys(page)); /* return the page pointer as the opaque cookie */ - return page; + ret = page; + goto done; + } + + if ((IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + dma_alloc_need_uncached(dev, attrs)) || + (IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) { + /* remove any dirty cache lines on the kernel alias */ + arch_dma_prep_coherent(page, PAGE_ALIGN(size)); + + /* create a coherent mapping */ + ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), + dma_pgprot(dev, PAGE_KERNEL, attrs), + __builtin_return_address(0)); + if (!ret) { + dma_free_contiguous(dev, page, size); + return ret; + } + + memset(ret, 0, size); + goto done; }
if (PageHighMem(page)) { @@ -165,12 +194,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, }
ret = page_address(page); - if (force_dma_unencrypted(dev)) { + if (force_dma_unencrypted(dev)) set_memory_decrypted((unsigned long)ret, 1 << get_order(size)); - *dma_handle = __phys_to_dma(dev, page_to_phys(page)); - } else { - *dma_handle = phys_to_dma(dev, page_to_phys(page)); - } + memset(ret, 0, size);
if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && @@ -178,7 +204,11 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, arch_dma_prep_coherent(page, size); ret = uncached_kernel_address(ret); } - +done: + if (force_dma_unencrypted(dev)) + *dma_handle = __phys_to_dma(dev, page_to_phys(page)); + else + *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; }
@@ -194,19 +224,24 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, return; }
+ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && + dma_free_from_pool(cpu_addr, PAGE_ALIGN(size))) + return; + if (force_dma_unencrypted(dev)) set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
- if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && - dma_alloc_need_uncached(dev, attrs)) - cpu_addr = cached_kernel_address(cpu_addr); - dma_free_contiguous(dev, virt_to_page(cpu_addr), size); + if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) + vunmap(cpu_addr); + + dma_free_contiguous(dev, dma_direct_to_page(dev, dma_addr), size); }
void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs)) return arch_dma_alloc(dev, size, dma_handle, gfp, attrs); return dma_direct_alloc_pages(dev, size, dma_handle, gfp, attrs); @@ -216,6 +251,7 @@ void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs) { if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && + !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs)) arch_dma_free(dev, size, cpu_addr, dma_addr, attrs); else diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c index 3c49499ee6b0..d47bd40fc0f5 100644 --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -210,53 +210,4 @@ bool dma_free_from_pool(void *start, size_t size) gen_pool_free(atomic_pool, (unsigned long)start, size); return true; } - -void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, - gfp_t flags, unsigned long attrs) -{ - struct page *page = NULL; - void *ret; - - size = PAGE_ALIGN(size); - - if (!gfpflags_allow_blocking(flags)) { - ret = dma_alloc_from_pool(size, &page, flags); - if (!ret) - return NULL; - goto done; - } - - page = __dma_direct_alloc_pages(dev, size, flags, attrs); - if (!page) - return NULL; - - /* remove any dirty cache lines on the kernel alias */ - arch_dma_prep_coherent(page, size); - - /* create a coherent mapping */ - ret = dma_common_contiguous_remap(page, size, - dma_pgprot(dev, PAGE_KERNEL, attrs), - __builtin_return_address(0)); - if (!ret) { - dma_free_contiguous(dev, page, size); - return ret; - } - - memset(ret, 0, size); -done: - *dma_handle = phys_to_dma(dev, page_to_phys(page)); - return ret; -} - -void arch_dma_free(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_handle, unsigned long attrs) -{ - if (!dma_free_from_pool(vaddr, PAGE_ALIGN(size))) { - phys_addr_t phys = dma_to_phys(dev, dma_handle); - struct page *page = pfn_to_page(__phys_to_pfn(phys)); - - vunmap(vaddr); - dma_free_contiguous(dev, page, size); - } -} #endif /* CONFIG_DMA_DIRECT_REMAP */
From: Huang Shijie <sjhuang@iluvatar.ai>
upstream 964975ac6677c97ae61ec9d6969dd5d03f18d1c3 commit.
Follow the kernel naming conventions and rename addr_in_gen_pool to gen_pool_has_addr.
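A usage sketch of the renamed helper (hypothetical caller, mirroring how __in_atomic_pool() and dma_free_from_pool() use it):

    #include <linux/genalloc.h>

    /* Free vaddr back to the pool only if it actually came from that pool. */
    static bool maybe_free_from_pool(struct gen_pool *pool, void *vaddr,
                                     size_t size)
    {
            if (!gen_pool_has_addr(pool, (unsigned long)vaddr, size))
                    return false;
            gen_pool_free(pool, (unsigned long)vaddr, size);
            return true;
    }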
[sjhuang@iluvatar.ai: fix Documentation/ too]
Link: http://lkml.kernel.org/r/20181229015914.5573-1-sjhuang@iluvatar.ai
Link: http://lkml.kernel.org/r/20181228083950.20398-1-sjhuang@iluvatar.ai
Signed-off-by: Huang Shijie <sjhuang@iluvatar.ai>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 Documentation/core-api/genalloc.rst | 2 +-
 arch/arm/mm/dma-mapping.c           | 2 +-
 drivers/misc/sram-exec.c            | 2 +-
 include/linux/genalloc.h            | 2 +-
 kernel/dma/remap.c                  | 2 +-
 lib/genalloc.c                      | 5 +++--
 6 files changed, 8 insertions(+), 7 deletions(-)
diff --git a/Documentation/core-api/genalloc.rst b/Documentation/core-api/genalloc.rst index 6b38a39fab24..a534cc7ebd05 100644 --- a/Documentation/core-api/genalloc.rst +++ b/Documentation/core-api/genalloc.rst @@ -129,7 +129,7 @@ writing of special-purpose memory allocators in the future. :functions: gen_pool_for_each_chunk
.. kernel-doc:: lib/genalloc.c - :functions: addr_in_gen_pool + :functions: gen_pool_has_addr
.. kernel-doc:: lib/genalloc.c :functions: gen_pool_avail diff --git a/arch/arm/mm/dma-mapping.c b/arch/arm/mm/dma-mapping.c index 58d5765fb129..84ecbaefb9cf 100644 --- a/arch/arm/mm/dma-mapping.c +++ b/arch/arm/mm/dma-mapping.c @@ -529,7 +529,7 @@ static void *__alloc_from_pool(size_t size, struct page **ret_page)
static bool __in_atomic_pool(void *start, size_t size) { - return addr_in_gen_pool(atomic_pool, (unsigned long)start, size); + return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); }
static int __free_from_pool(void *start, size_t size) diff --git a/drivers/misc/sram-exec.c b/drivers/misc/sram-exec.c index 426ad912b441..d054e2842a5f 100644 --- a/drivers/misc/sram-exec.c +++ b/drivers/misc/sram-exec.c @@ -96,7 +96,7 @@ void *sram_exec_copy(struct gen_pool *pool, void *dst, void *src, if (!part) return NULL;
- if (!addr_in_gen_pool(pool, (unsigned long)dst, size)) + if (!gen_pool_has_addr(pool, (unsigned long)dst, size)) return NULL;
base = (unsigned long)part->base; diff --git a/include/linux/genalloc.h b/include/linux/genalloc.h index 4bd583bd6934..5b14a0f38124 100644 --- a/include/linux/genalloc.h +++ b/include/linux/genalloc.h @@ -206,7 +206,7 @@ extern struct gen_pool *devm_gen_pool_create(struct device *dev, int min_alloc_order, int nid, const char *name); extern struct gen_pool *gen_pool_get(struct device *dev, const char *name);
-bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start, +extern bool gen_pool_has_addr(struct gen_pool *pool, unsigned long start, size_t size);
#ifdef CONFIG_OF diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c index d47bd40fc0f5..d14cbc83986a 100644 --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -178,7 +178,7 @@ bool dma_in_atomic_pool(void *start, size_t size) if (unlikely(!atomic_pool)) return false;
- return addr_in_gen_pool(atomic_pool, (unsigned long)start, size); + return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); }
void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) diff --git a/lib/genalloc.c b/lib/genalloc.c index 9fc31292cfa1..e43d6107fd62 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -540,7 +540,7 @@ void gen_pool_for_each_chunk(struct gen_pool *pool, EXPORT_SYMBOL(gen_pool_for_each_chunk);
/** - * addr_in_gen_pool - checks if an address falls within the range of a pool + * gen_pool_has_addr - checks if an address falls within the range of a pool * @pool: the generic memory pool * @start: start address * @size: size of the region @@ -548,7 +548,7 @@ EXPORT_SYMBOL(gen_pool_for_each_chunk); * Check if the range of addresses falls within the specified pool. Returns * true if the entire range is contained in the pool and false otherwise. */ -bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start, +bool gen_pool_has_addr(struct gen_pool *pool, unsigned long start, size_t size) { bool found = false; @@ -567,6 +567,7 @@ bool addr_in_gen_pool(struct gen_pool *pool, unsigned long start, rcu_read_unlock(); return found; } +EXPORT_SYMBOL(gen_pool_has_addr);
/** * gen_pool_avail - get available free space of the pool
From: David Rientjes <rientjes@google.com>
upstream e860c299ac0d738b44ff91693f11e63080a29698 commit.
DMA atomic pools will be needed beyond only CONFIG_DMA_DIRECT_REMAP so separate them out into their own file.
This also adds a new Kconfig option that can be subsequently used for options, such as CONFIG_AMD_MEM_ENCRYPT, that will utilize the coherent pools but do not have a dependency on direct remapping.
For this patch alone, there is no functional change introduced.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: David Rientjes <rientjes@google.com>
[hch: fixup copyrights and remove unused includes]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 kernel/dma/Kconfig  |   6 ++-
 kernel/dma/Makefile |   1 +
 kernel/dma/pool.c   | 123 ++++++++++++++++++++++++++++++++++++++++++++
 kernel/dma/remap.c  | 121 +------------------------------------------
 4 files changed, 130 insertions(+), 121 deletions(-)
 create mode 100644 kernel/dma/pool.c
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index 4c103a24e380..d006668c0027 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -79,10 +79,14 @@ config DMA_REMAP select DMA_NONCOHERENT_MMAP bool
-config DMA_DIRECT_REMAP +config DMA_COHERENT_POOL bool select DMA_REMAP
+config DMA_DIRECT_REMAP + bool + select DMA_COHERENT_POOL + config DMA_CMA bool "DMA Contiguous Memory Allocator" depends on HAVE_DMA_CONTIGUOUS && CMA diff --git a/kernel/dma/Makefile b/kernel/dma/Makefile index d237cf3dc181..370f63344e9c 100644 --- a/kernel/dma/Makefile +++ b/kernel/dma/Makefile @@ -6,4 +6,5 @@ obj-$(CONFIG_DMA_DECLARE_COHERENT) += coherent.o obj-$(CONFIG_DMA_VIRT_OPS) += virt.o obj-$(CONFIG_DMA_API_DEBUG) += debug.o obj-$(CONFIG_SWIOTLB) += swiotlb.o +obj-$(CONFIG_DMA_COHERENT_POOL) += pool.o obj-$(CONFIG_DMA_REMAP) += remap.o diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c new file mode 100644 index 000000000000..3df5d9d39922 --- /dev/null +++ b/kernel/dma/pool.c @@ -0,0 +1,123 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Copyright (C) 2012 ARM Ltd. + * Copyright (C) 2020 Google LLC + */ +#include <linux/dma-direct.h> +#include <linux/dma-noncoherent.h> +#include <linux/dma-contiguous.h> +#include <linux/init.h> +#include <linux/genalloc.h> +#include <linux/slab.h> + +static struct gen_pool *atomic_pool __ro_after_init; + +#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K +static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; + +static int __init early_coherent_pool(char *p) +{ + atomic_pool_size = memparse(p, &p); + return 0; +} +early_param("coherent_pool", early_coherent_pool); + +static gfp_t dma_atomic_pool_gfp(void) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + return GFP_DMA; + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + return GFP_DMA32; + return GFP_KERNEL; +} + +static int __init dma_atomic_pool_init(void) +{ + unsigned int pool_size_order = get_order(atomic_pool_size); + unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; + struct page *page; + void *addr; + int ret; + + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, nr_pages, + pool_size_order, false); + else + page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); + if (!page) + goto out; + + arch_dma_prep_coherent(page, atomic_pool_size); + + atomic_pool = gen_pool_create(PAGE_SHIFT, -1); + if (!atomic_pool) + goto free_page; + + addr = dma_common_contiguous_remap(page, atomic_pool_size, + pgprot_dmacoherent(PAGE_KERNEL), + __builtin_return_address(0)); + if (!addr) + goto destroy_genpool; + + ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, + page_to_phys(page), atomic_pool_size, -1); + if (ret) + goto remove_mapping; + gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); + + pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", + atomic_pool_size / 1024); + return 0; + +remove_mapping: + dma_common_free_remap(addr, atomic_pool_size); +destroy_genpool: + gen_pool_destroy(atomic_pool); + atomic_pool = NULL; +free_page: + if (!dma_release_from_contiguous(NULL, page, nr_pages)) + __free_pages(page, pool_size_order); +out: + pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n", + atomic_pool_size / 1024); + return -ENOMEM; +} +postcore_initcall(dma_atomic_pool_init); + +bool dma_in_atomic_pool(void *start, size_t size) +{ + if (unlikely(!atomic_pool)) + return false; + + return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); +} + +void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) +{ + unsigned long val; + void *ptr = NULL; + + if (!atomic_pool) { + WARN(1, "coherent pool not initialised!\n"); + return NULL; + } + + val = gen_pool_alloc(atomic_pool, size); + if (val) { + phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val); + + *ret_page = 
pfn_to_page(__phys_to_pfn(phys)); + ptr = (void *)val; + memset(ptr, 0, size); + } + + return ptr; +} + +bool dma_free_from_pool(void *start, size_t size) +{ + if (!dma_in_atomic_pool(start, size)) + return false; + gen_pool_free(atomic_pool, (unsigned long)start, size); + return true; +} diff --git a/kernel/dma/remap.c b/kernel/dma/remap.c index d14cbc83986a..f7b402849891 100644 --- a/kernel/dma/remap.c +++ b/kernel/dma/remap.c @@ -1,13 +1,8 @@ // SPDX-License-Identifier: GPL-2.0 /* - * Copyright (C) 2012 ARM Ltd. * Copyright (c) 2014 The Linux Foundation */ -#include <linux/dma-direct.h> -#include <linux/dma-noncoherent.h> -#include <linux/dma-contiguous.h> -#include <linux/init.h> -#include <linux/genalloc.h> +#include <linux/dma-mapping.h> #include <linux/slab.h> #include <linux/vmalloc.h>
@@ -97,117 +92,3 @@ void dma_common_free_remap(void *cpu_addr, size_t size) unmap_kernel_range((unsigned long)cpu_addr, PAGE_ALIGN(size)); vunmap(cpu_addr); } - -#ifdef CONFIG_DMA_DIRECT_REMAP -static struct gen_pool *atomic_pool __ro_after_init; - -#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; - -static int __init early_coherent_pool(char *p) -{ - atomic_pool_size = memparse(p, &p); - return 0; -} -early_param("coherent_pool", early_coherent_pool); - -static gfp_t dma_atomic_pool_gfp(void) -{ - if (IS_ENABLED(CONFIG_ZONE_DMA)) - return GFP_DMA; - if (IS_ENABLED(CONFIG_ZONE_DMA32)) - return GFP_DMA32; - return GFP_KERNEL; -} - -static int __init dma_atomic_pool_init(void) -{ - unsigned int pool_size_order = get_order(atomic_pool_size); - unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; - struct page *page; - void *addr; - int ret; - - if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, - pool_size_order, false); - else - page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); - if (!page) - goto out; - - arch_dma_prep_coherent(page, atomic_pool_size); - - atomic_pool = gen_pool_create(PAGE_SHIFT, -1); - if (!atomic_pool) - goto free_page; - - addr = dma_common_contiguous_remap(page, atomic_pool_size, - pgprot_dmacoherent(PAGE_KERNEL), - __builtin_return_address(0)); - if (!addr) - goto destroy_genpool; - - ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, - page_to_phys(page), atomic_pool_size, -1); - if (ret) - goto remove_mapping; - gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); - - pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", - atomic_pool_size / 1024); - return 0; - -remove_mapping: - dma_common_free_remap(addr, atomic_pool_size); -destroy_genpool: - gen_pool_destroy(atomic_pool); - atomic_pool = NULL; -free_page: - if (!dma_release_from_contiguous(NULL, page, nr_pages)) - __free_pages(page, pool_size_order); -out: - pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n", - atomic_pool_size / 1024); - return -ENOMEM; -} -postcore_initcall(dma_atomic_pool_init); - -bool dma_in_atomic_pool(void *start, size_t size) -{ - if (unlikely(!atomic_pool)) - return false; - - return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); -} - -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) -{ - unsigned long val; - void *ptr = NULL; - - if (!atomic_pool) { - WARN(1, "coherent pool not initialised!\n"); - return NULL; - } - - val = gen_pool_alloc(atomic_pool, size); - if (val) { - phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val); - - *ret_page = pfn_to_page(__phys_to_pfn(phys)); - ptr = (void *)val; - memset(ptr, 0, size); - } - - return ptr; -} - -bool dma_free_from_pool(void *start, size_t size) -{ - if (!dma_in_atomic_pool(start, size)) - return false; - gen_pool_free(atomic_pool, (unsigned long)start, size); - return true; -} -#endif /* CONFIG_DMA_DIRECT_REMAP */
From: David Rientjes <rientjes@google.com>
upstream c84dc6e68a1d2464e050d9694be4e4ff49e32bfd commit.
The single atomic pool is allocated from the lowest zone possible since it is guaranteed to be applicable for any DMA allocation.
Devices may allocate through the DMA API but not have a strict reliance on GFP_DMA memory. Since the atomic pool will be used for all non-blockable allocations, returning all memory from ZONE_DMA may unnecessarily deplete the zone.
Provision for multiple atomic pools that will map to the optimal gfp mask of the device.
When allocating non-blockable memory, determine the optimal gfp mask of the device and use the appropriate atomic pool.
The coherent DMA mask will remain the same between allocation and free and, thus, memory will be freed to the same atomic pool it was allocated from.
__dma_atomic_pool_init() will be changed to return struct gen_pool * later once dynamic expansion is added.
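The selection logic reduces to the dev_to_pool() helper added in this patch; restated compactly:

    static struct gen_pool *dev_to_pool(struct device *dev)
    {
            u64 phys_mask;
            gfp_t gfp;

            /* Which zone would a direct allocation for this device use? */
            gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask,
                                              &phys_mask);
            if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA)
                    return atomic_pool_dma;
            if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32)
                    return atomic_pool_dma32;
            return atomic_pool_kernel;
    }

Because the same coherent_dma_mask is consulted on the free path, dma_free_from_pool() resolves to the pool the buffer was allocated from.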
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 drivers/iommu/dma-iommu.c   |   5 +-
 include/linux/dma-direct.h  |   2 +
 include/linux/dma-mapping.h |   6 +-
 kernel/dma/direct.c         |  10 +--
 kernel/dma/pool.c           | 120 +++++++++++++++++++++++-------------
 5 files changed, 90 insertions(+), 53 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index 76bd2309e023..b642c1123a29 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -927,7 +927,7 @@ static void __iommu_dma_free(struct device *dev, size_t size, void *cpu_addr)
/* Non-coherent atomic allocation? Easy */ if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, alloc_size)) + dma_free_from_pool(dev, cpu_addr, alloc_size)) return;
if (IS_ENABLED(CONFIG_DMA_REMAP) && is_vmalloc_addr(cpu_addr)) { @@ -1010,7 +1010,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) - cpu_addr = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, + gfp); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); if (!cpu_addr) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 6db863c3eb93..fb5ec847ddf3 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -66,6 +66,8 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr) }
u64 dma_direct_get_required_mask(struct device *dev); +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_mask); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index 4d450672b7d6..e4be706d8f5e 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -633,9 +633,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot, const void *caller); void dma_common_free_remap(void *cpu_addr, size_t size);
-bool dma_in_atomic_pool(void *start, size_t size); -void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags); -bool dma_free_from_pool(void *start, size_t size); +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page, gfp_t flags); +bool dma_free_from_pool(struct device *dev, void *start, size_t size);
int dma_common_get_sgtable(struct device *dev, struct sg_table *sgt, void *cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index d30c5468a91a..38266fb2797d 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -58,8 +58,8 @@ u64 dma_direct_get_required_mask(struct device *dev) return (1ULL << (fls64(max_dma) - 1)) * 2 - 1; }
-static gfp_t __dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, - u64 *phys_mask) +gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, + u64 *phys_mask) { if (dev->bus_dma_mask && dev->bus_dma_mask < dma_mask) dma_mask = dev->bus_dma_mask; @@ -103,7 +103,7 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size,
/* we always manually zero the memory once we are done: */ gfp &= ~__GFP_ZERO; - gfp |= __dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_mask); page = dma_alloc_contiguous(dev, alloc_size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { @@ -142,7 +142,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs) && !gfpflags_allow_blocking(gfp)) { - ret = dma_alloc_from_pool(PAGE_ALIGN(size), &page, gfp); + ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) return NULL; goto done; @@ -225,7 +225,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, }
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(cpu_addr, PAGE_ALIGN(size))) + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) return;
if (force_dma_unencrypted(dev)) diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 3df5d9d39922..db4f89ac5f5f 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -10,7 +10,9 @@ #include <linux/genalloc.h> #include <linux/slab.h>
-static struct gen_pool *atomic_pool __ro_after_init; +static struct gen_pool *atomic_pool_dma __ro_after_init; +static struct gen_pool *atomic_pool_dma32 __ro_after_init; +static struct gen_pool *atomic_pool_kernel __ro_after_init;
#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; @@ -22,89 +24,119 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool);
-static gfp_t dma_atomic_pool_gfp(void) +static int __init __dma_atomic_pool_init(struct gen_pool **pool, + size_t pool_size, gfp_t gfp) { - if (IS_ENABLED(CONFIG_ZONE_DMA)) - return GFP_DMA; - if (IS_ENABLED(CONFIG_ZONE_DMA32)) - return GFP_DMA32; - return GFP_KERNEL; -} - -static int __init dma_atomic_pool_init(void) -{ - unsigned int pool_size_order = get_order(atomic_pool_size); - unsigned long nr_pages = atomic_pool_size >> PAGE_SHIFT; + const unsigned int order = get_order(pool_size); + const unsigned long nr_pages = pool_size >> PAGE_SHIFT; struct page *page; void *addr; int ret;
if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, - pool_size_order, false); + page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); else - page = alloc_pages(dma_atomic_pool_gfp(), pool_size_order); + page = alloc_pages(gfp, order); if (!page) goto out;
- arch_dma_prep_coherent(page, atomic_pool_size); + arch_dma_prep_coherent(page, pool_size);
- atomic_pool = gen_pool_create(PAGE_SHIFT, -1); - if (!atomic_pool) + *pool = gen_pool_create(PAGE_SHIFT, -1); + if (!*pool) goto free_page;
- addr = dma_common_contiguous_remap(page, atomic_pool_size, + addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto destroy_genpool;
- ret = gen_pool_add_virt(atomic_pool, (unsigned long)addr, - page_to_phys(page), atomic_pool_size, -1); + ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), + pool_size, -1); if (ret) goto remove_mapping; - gen_pool_set_algo(atomic_pool, gen_pool_first_fit_order_align, NULL); + gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
- pr_info("DMA: preallocated %zu KiB pool for atomic allocations\n", - atomic_pool_size / 1024); + pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", + pool_size >> 10, &gfp); return 0;
remove_mapping: - dma_common_free_remap(addr, atomic_pool_size); + dma_common_free_remap(addr, pool_size); destroy_genpool: - gen_pool_destroy(atomic_pool); - atomic_pool = NULL; + gen_pool_destroy(*pool); + *pool = NULL; free_page: if (!dma_release_from_contiguous(NULL, page, nr_pages)) - __free_pages(page, pool_size_order); + __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB pool for atomic coherent allocation\n", - atomic_pool_size / 1024); + pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", + pool_size >> 10, &gfp); return -ENOMEM; } + +static int __init dma_atomic_pool_init(void) +{ + int ret = 0; + int err; + + ret = __dma_atomic_pool_init(&atomic_pool_kernel, atomic_pool_size, + GFP_KERNEL); + if (IS_ENABLED(CONFIG_ZONE_DMA)) { + err = __dma_atomic_pool_init(&atomic_pool_dma, + atomic_pool_size, GFP_DMA); + if (!ret && err) + ret = err; + } + if (IS_ENABLED(CONFIG_ZONE_DMA32)) { + err = __dma_atomic_pool_init(&atomic_pool_dma32, + atomic_pool_size, GFP_DMA32); + if (!ret && err) + ret = err; + } + return ret; +} postcore_initcall(dma_atomic_pool_init);
-bool dma_in_atomic_pool(void *start, size_t size) +static inline struct gen_pool *dev_to_pool(struct device *dev) { - if (unlikely(!atomic_pool)) - return false; + u64 phys_mask; + gfp_t gfp; + + gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + &phys_mask); + if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA) + return atomic_pool_dma; + if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32) + return atomic_pool_dma32; + return atomic_pool_kernel; +}
- return gen_pool_has_addr(atomic_pool, (unsigned long)start, size); +static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size) +{ + struct gen_pool *pool = dev_to_pool(dev); + + if (unlikely(!pool)) + return false; + return gen_pool_has_addr(pool, (unsigned long)start, size); }
-void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) +void *dma_alloc_from_pool(struct device *dev, size_t size, + struct page **ret_page, gfp_t flags) { + struct gen_pool *pool = dev_to_pool(dev); unsigned long val; void *ptr = NULL;
- if (!atomic_pool) { - WARN(1, "coherent pool not initialised!\n"); + if (!pool) { + WARN(1, "%pGg atomic pool not initialised!\n", &flags); return NULL; }
- val = gen_pool_alloc(atomic_pool, size); + val = gen_pool_alloc(pool, size); if (val) { - phys_addr_t phys = gen_pool_virt_to_phys(atomic_pool, val); + phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
*ret_page = pfn_to_page(__phys_to_pfn(phys)); ptr = (void *)val; @@ -114,10 +146,12 @@ void *dma_alloc_from_pool(size_t size, struct page **ret_page, gfp_t flags) return ptr; }
-bool dma_free_from_pool(void *start, size_t size) +bool dma_free_from_pool(struct device *dev, void *start, size_t size) { - if (!dma_in_atomic_pool(start, size)) + struct gen_pool *pool = dev_to_pool(dev); + + if (!dma_in_atomic_pool(dev, start, size)) return false; - gen_pool_free(atomic_pool, (unsigned long)start, size); + gen_pool_free(pool, (unsigned long)start, size); return true; }
From: David Rientjes <rientjes@google.com>
upstream 54adadf9b08571fb8b11dc9d0d3a2ddd39825efd commit.
When an atomic pool becomes fully depleted because it is now relied upon for all non-blocking allocations through the DMA API, allow background expansion of each pool by a kworker.
When an atomic pool has less than the default size of memory left, kick off a kworker to dynamically expand the pool in the background. The pool is doubled in size, up to MAX_ORDER-1. If memory cannot be allocated at the requested order, smaller allocation(s) are attempted.
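The expansion step itself is bounded and degrades gracefully; the core of atomic_pool_expand() added below is roughly the following (CMA branch omitted):

    	/* Cannot allocate larger than MAX_ORDER-1 */
    	order = min(get_order(pool_size), MAX_ORDER - 1);

    	do {
    		pool_size = 1 << (PAGE_SHIFT + order);
    		page = alloc_pages(gfp, order);
    	} while (!page && order-- > 0);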
This allows the default size to be kept quite low when one or more of the atomic pools is not used.
Allocations for lowmem should also use GFP_KERNEL for the benefits of reclaim, so use GFP_KERNEL | GFP_DMA and GFP_KERNEL | GFP_DMA32 for lowmem allocations.
This also allows __dma_atomic_pool_init() to return a pointer to the pool to make initialization cleaner.
Also switch over some node ids to the more appropriate NUMA_NO_NODE.
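Refill is driven from the allocation path: dma_alloc_from_pool() schedules atomic_pool_work whenever gen_pool_avail() drops below atomic_pool_size, and the worker then tops up each pool, as in this condensed excerpt:

    static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp)
    {
            if (pool && gen_pool_avail(pool) < atomic_pool_size)
                    atomic_pool_expand(pool, gen_pool_size(pool), gfp);
    }

    static void atomic_pool_work_fn(struct work_struct *work)
    {
            if (IS_ENABLED(CONFIG_ZONE_DMA))
                    atomic_pool_resize(atomic_pool_dma, GFP_KERNEL | GFP_DMA);
            if (IS_ENABLED(CONFIG_ZONE_DMA32))
                    atomic_pool_resize(atomic_pool_dma32, GFP_KERNEL | GFP_DMA32);
            atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL);
    }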
Signed-off-by: David Rientjes <rientjes@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Peter Gonda <pgonda@google.com>
---
 kernel/dma/pool.c | 122 +++++++++++++++++++++++++++++++---------------
 1 file changed, 84 insertions(+), 38 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index db4f89ac5f5f..ffe866c2c034 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -9,13 +9,17 @@ #include <linux/init.h> #include <linux/genalloc.h> #include <linux/slab.h> +#include <linux/workqueue.h>
static struct gen_pool *atomic_pool_dma __ro_after_init; static struct gen_pool *atomic_pool_dma32 __ro_after_init; static struct gen_pool *atomic_pool_kernel __ro_after_init;
#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size __initdata = DEFAULT_DMA_COHERENT_POOL_SIZE; +static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; + +/* Dynamic background expansion when the atomic pool is near capacity */ +static struct work_struct atomic_pool_work;
static int __init early_coherent_pool(char *p) { @@ -24,76 +28,116 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool);
-static int __init __dma_atomic_pool_init(struct gen_pool **pool, - size_t pool_size, gfp_t gfp) +static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, + gfp_t gfp) { - const unsigned int order = get_order(pool_size); - const unsigned long nr_pages = pool_size >> PAGE_SHIFT; + unsigned int order; struct page *page; void *addr; - int ret; + int ret = -ENOMEM; + + /* Cannot allocate larger than MAX_ORDER-1 */ + order = min(get_order(pool_size), MAX_ORDER-1); + + do { + pool_size = 1 << (PAGE_SHIFT + order);
- if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, nr_pages, order, false); - else - page = alloc_pages(gfp, order); + if (dev_get_cma_area(NULL)) + page = dma_alloc_from_contiguous(NULL, 1 << order, + order, false); + else + page = alloc_pages(gfp, order); + } while (!page && order-- > 0); if (!page) goto out;
arch_dma_prep_coherent(page, pool_size);
- *pool = gen_pool_create(PAGE_SHIFT, -1); - if (!*pool) - goto free_page; - addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) - goto destroy_genpool; + goto free_page;
- ret = gen_pool_add_virt(*pool, (unsigned long)addr, page_to_phys(page), - pool_size, -1); + ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), + pool_size, NUMA_NO_NODE); if (ret) goto remove_mapping; - gen_pool_set_algo(*pool, gen_pool_first_fit_order_align, NULL);
- pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", - pool_size >> 10, &gfp); return 0;
remove_mapping: dma_common_free_remap(addr, pool_size); -destroy_genpool: - gen_pool_destroy(*pool); - *pool = NULL; free_page: - if (!dma_release_from_contiguous(NULL, page, nr_pages)) + if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out: - pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", - pool_size >> 10, &gfp); - return -ENOMEM; + return ret; +} + +static void atomic_pool_resize(struct gen_pool *pool, gfp_t gfp) +{ + if (pool && gen_pool_avail(pool) < atomic_pool_size) + atomic_pool_expand(pool, gen_pool_size(pool), gfp); +} + +static void atomic_pool_work_fn(struct work_struct *work) +{ + if (IS_ENABLED(CONFIG_ZONE_DMA)) + atomic_pool_resize(atomic_pool_dma, + GFP_KERNEL | GFP_DMA); + if (IS_ENABLED(CONFIG_ZONE_DMA32)) + atomic_pool_resize(atomic_pool_dma32, + GFP_KERNEL | GFP_DMA32); + atomic_pool_resize(atomic_pool_kernel, GFP_KERNEL); +} + +static __init struct gen_pool *__dma_atomic_pool_init(size_t pool_size, + gfp_t gfp) +{ + struct gen_pool *pool; + int ret; + + pool = gen_pool_create(PAGE_SHIFT, NUMA_NO_NODE); + if (!pool) + return NULL; + + gen_pool_set_algo(pool, gen_pool_first_fit_order_align, NULL); + + ret = atomic_pool_expand(pool, pool_size, gfp); + if (ret) { + gen_pool_destroy(pool); + pr_err("DMA: failed to allocate %zu KiB %pGg pool for atomic allocation\n", + pool_size >> 10, &gfp); + return NULL; + } + + pr_info("DMA: preallocated %zu KiB %pGg pool for atomic allocations\n", + gen_pool_size(pool) >> 10, &gfp); + return pool; }
static int __init dma_atomic_pool_init(void) { int ret = 0; - int err;
- ret = __dma_atomic_pool_init(&atomic_pool_kernel, atomic_pool_size, - GFP_KERNEL); + INIT_WORK(&atomic_pool_work, atomic_pool_work_fn); + + atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size, + GFP_KERNEL); + if (!atomic_pool_kernel) + ret = -ENOMEM; if (IS_ENABLED(CONFIG_ZONE_DMA)) { - err = __dma_atomic_pool_init(&atomic_pool_dma, - atomic_pool_size, GFP_DMA); - if (!ret && err) - ret = err; + atomic_pool_dma = __dma_atomic_pool_init(atomic_pool_size, + GFP_KERNEL | GFP_DMA); + if (!atomic_pool_dma) + ret = -ENOMEM; } if (IS_ENABLED(CONFIG_ZONE_DMA32)) { - err = __dma_atomic_pool_init(&atomic_pool_dma32, - atomic_pool_size, GFP_DMA32); - if (!ret && err) - ret = err; + atomic_pool_dma32 = __dma_atomic_pool_init(atomic_pool_size, + GFP_KERNEL | GFP_DMA32); + if (!atomic_pool_dma32) + ret = -ENOMEM; } return ret; } @@ -142,6 +186,8 @@ void *dma_alloc_from_pool(struct device *dev, size_t size, ptr = (void *)val; memset(ptr, 0, size); } + if (gen_pool_avail(pool) < atomic_pool_size) + schedule_work(&atomic_pool_work);
return ptr; }
From: David Rientjes rientjes@google.com
upstream 76a19940bd62a81148c303f3df6d0cee9ae4b509 commit.
When a device requires unencrypted memory and the context does not allow blocking, memory must be returned from the atomic coherent pools.
This avoids the remap when CONFIG_DMA_DIRECT_REMAP is not enabled and the config only requires CONFIG_DMA_COHERENT_POOL. This will be used for CONFIG_AMD_MEM_ENCRYPT in a subsequent patch.
Keep all memory in these pools unencrypted. When set_memory_decrypted() fails, this prohibits the memory from being added. If adding memory to the genpool fails, and set_memory_encrypted() subsequently fails, there is no alternative other than leaking the memory.
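Condensed from the hunk below, the resulting error-unwind order in atomic_pool_expand() is:

	if (set_memory_decrypted((unsigned long)page_to_virt(page), 1 << order))
		goto remove_mapping;	/* decryption failed: unmap and free, nothing was added */
	if (gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page),
			      pool_size, NUMA_NO_NODE))
		goto encrypt_mapping;	/* adding failed: re-encrypt before unmapping/freeing */
	...
encrypt_mapping:
	if (WARN_ON_ONCE(set_memory_encrypted((unsigned long)page_to_virt(page), 1 << order)))
		goto out;		/* re-encryption failed too: deliberately leak the pages */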
Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 46 ++++++++++++++++++++++++++++++++++++++------- kernel/dma/pool.c | 27 +++++++++++++++++++++++--- 2 files changed, 63 insertions(+), 10 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 38266fb2797d..210ea469028c 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -90,6 +90,39 @@ static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask); }
+/* + * Decrypting memory is allowed to block, so if this device requires + * unencrypted memory it must come from atomic pools. + */ +static inline bool dma_should_alloc_from_pool(struct device *dev, gfp_t gfp, + unsigned long attrs) +{ + if (!IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return false; + if (gfpflags_allow_blocking(gfp)) + return false; + if (force_dma_unencrypted(dev)) + return true; + if (!IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return false; + if (dma_alloc_need_uncached(dev, attrs)) + return true; + return false; +} + +static inline bool dma_should_free_from_pool(struct device *dev, + unsigned long attrs) +{ + if (IS_ENABLED(CONFIG_DMA_COHERENT_POOL)) + return true; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && + !force_dma_unencrypted(dev)) + return false; + if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP)) + return true; + return false; +} + struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { @@ -139,9 +172,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret;
- if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_alloc_need_uncached(dev, attrs) && - !gfpflags_allow_blocking(gfp)) { + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); if (!ret) return NULL; @@ -217,6 +248,11 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, { unsigned int page_order = get_order(size);
+ /* If cpu_addr is not from an atomic pool, dma_free_from_pool() fails */ + if (dma_should_free_from_pool(dev, attrs) && + dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) + return; + if ((attrs & DMA_ATTR_NO_KERNEL_MAPPING) && !force_dma_unencrypted(dev)) { /* cpu_addr is a struct page cookie, not a kernel address */ @@ -224,10 +260,6 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, return; }
- if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && - dma_free_from_pool(dev, cpu_addr, PAGE_ALIGN(size))) - return; - if (force_dma_unencrypted(dev)) set_memory_encrypted((unsigned long)cpu_addr, 1 << page_order);
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index ffe866c2c034..c8d61b3a7bd6 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -8,6 +8,7 @@ #include <linux/dma-contiguous.h> #include <linux/init.h> #include <linux/genalloc.h> +#include <linux/set_memory.h> #include <linux/slab.h> #include <linux/workqueue.h>
@@ -53,22 +54,42 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
arch_dma_prep_coherent(page, pool_size);
+#ifdef CONFIG_DMA_DIRECT_REMAP addr = dma_common_contiguous_remap(page, pool_size, pgprot_dmacoherent(PAGE_KERNEL), __builtin_return_address(0)); if (!addr) goto free_page; - +#else + addr = page_to_virt(page); +#endif + /* + * Memory in the atomic DMA pools must be unencrypted, the pools do not + * shrink so no re-encryption occurs in dma_direct_free_pages(). + */ + ret = set_memory_decrypted((unsigned long)page_to_virt(page), + 1 << order); + if (ret) + goto remove_mapping; ret = gen_pool_add_virt(pool, (unsigned long)addr, page_to_phys(page), pool_size, NUMA_NO_NODE); if (ret) - goto remove_mapping; + goto encrypt_mapping;
return 0;
+encrypt_mapping: + ret = set_memory_encrypted((unsigned long)page_to_virt(page), + 1 << order); + if (WARN_ON_ONCE(ret)) { + /* Decrypt succeeded but encrypt failed, purposely leak */ + goto out; + } remove_mapping: +#ifdef CONFIG_DMA_DIRECT_REMAP dma_common_free_remap(addr, pool_size); -free_page: +#endif +free_page: __maybe_unused if (!dma_release_from_contiguous(NULL, page, 1 << order)) __free_pages(page, order); out:
From: David Rientjes rientjes@google.com
upstream 2edc5bb3c5cc42131438460a50b7b16905c81c2a commit.
The atomic DMA pools can dynamically expand based on non-blocking allocations that need to use it.
Export the sizes of each of these pools, in bytes, through debugfs for measurement.
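Assuming debugfs is mounted at its usual /sys/kernel/debug location, the counters appear as:

	/sys/kernel/debug/dma_pools/pool_size_dma
	/sys/kernel/debug/dma_pools/pool_size_dma32
	/sys/kernel/debug/dma_pools/pool_size_kernel

each a root-readable (0400) file reporting the current size of the corresponding pool in bytes.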
Suggested-by: Christoph Hellwig hch@lst.de Signed-off-by: David Rientjes rientjes@google.com [hch: remove the !CONFIG_DEBUG_FS stubs] Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 30 ++++++++++++++++++++++++++++++ 1 file changed, 30 insertions(+)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index c8d61b3a7bd6..dde6de7f8e83 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -3,6 +3,7 @@ * Copyright (C) 2012 ARM Ltd. * Copyright (C) 2020 Google LLC */ +#include <linux/debugfs.h> #include <linux/dma-direct.h> #include <linux/dma-noncoherent.h> #include <linux/dma-contiguous.h> @@ -13,8 +14,11 @@ #include <linux/workqueue.h>
static struct gen_pool *atomic_pool_dma __ro_after_init; +static unsigned long pool_size_dma; static struct gen_pool *atomic_pool_dma32 __ro_after_init; +static unsigned long pool_size_dma32; static struct gen_pool *atomic_pool_kernel __ro_after_init; +static unsigned long pool_size_kernel;
#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; @@ -29,6 +33,29 @@ static int __init early_coherent_pool(char *p) } early_param("coherent_pool", early_coherent_pool);
+static void __init dma_atomic_pool_debugfs_init(void) +{ + struct dentry *root; + + root = debugfs_create_dir("dma_pools", NULL); + if (IS_ERR_OR_NULL(root)) + return; + + debugfs_create_ulong("pool_size_dma", 0400, root, &pool_size_dma); + debugfs_create_ulong("pool_size_dma32", 0400, root, &pool_size_dma32); + debugfs_create_ulong("pool_size_kernel", 0400, root, &pool_size_kernel); +} + +static void dma_atomic_pool_size_add(gfp_t gfp, size_t size) +{ + if (gfp & __GFP_DMA) + pool_size_dma += size; + else if (gfp & __GFP_DMA32) + pool_size_dma32 += size; + else + pool_size_kernel += size; +} + static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, gfp_t gfp) { @@ -76,6 +103,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, if (ret) goto encrypt_mapping;
+ dma_atomic_pool_size_add(gfp, pool_size); return 0;
encrypt_mapping: @@ -160,6 +188,8 @@ static int __init dma_atomic_pool_init(void) if (!atomic_pool_dma32) ret = -ENOMEM; } + + dma_atomic_pool_debugfs_init(); return ret; } postcore_initcall(dma_atomic_pool_init);
From: David Rientjes rientjes@google.com
upstream 82fef0ad811fb5976cf36ccc3d2c3bc0195dfb72 commit.
When CONFIG_AMD_MEM_ENCRYPT is enabled and a device requires unencrypted DMA, all non-blocking allocations must originate from the atomic DMA coherent pools.
Select CONFIG_DMA_COHERENT_POOL for CONFIG_AMD_MEM_ENCRYPT.
Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- arch/x86/Kconfig | 1 + 1 file changed, 1 insertion(+)
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 8ef85139553f..be8746e9d864 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1530,6 +1530,7 @@ config X86_CPA_STATISTICS config AMD_MEM_ENCRYPT bool "AMD Secure Memory Encryption (SME) support" depends on X86_64 && CPU_SUP_AMD + select DMA_COHERENT_POOL select DYNAMIC_PHYSICAL_MASK select ARCH_USE_MEMREMAP_PROT select ARCH_HAS_FORCE_DMA_UNENCRYPTED
From: David Rientjes rientjes@google.com
upstream 1d659236fb43c4d2b37af7a4309681e834e9ec9a commit.
When AMD memory encryption is enabled, some devices may use more than 256KB/sec from the atomic pools. It would be more appropriate to scale the default size based on memory capacity unless the coherent_pool option is used on the kernel command line.
This provides a slight optimization on initial expansion and is deemed appropriate due to the increased reliance on the atomic pools. Note that, at the default of 128KB per pool, the total reserved memory will normally be larger than with the previous single coherent pool implementation, since there are now up to three coherent pools (DMA, DMA32, and kernel).
Note that even prior to this patch, coherent_pool= for sizes larger than 1 << (PAGE_SHIFT + MAX_ORDER-1) can fail. With new dynamic expansion support, this would be trivially extensible to allow even larger initial sizes.
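For reference, with 4 KiB pages and the common MAX_ORDER of 11:

	1 << (PAGE_SHIFT + MAX_ORDER - 1) = 1 << (12 + 10) = 4 MiB

so 4 MiB is the effective ceiling on the initial size of each pool.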
Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index dde6de7f8e83..35bb51c31fff 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -20,8 +20,8 @@ static unsigned long pool_size_dma32; static struct gen_pool *atomic_pool_kernel __ro_after_init; static unsigned long pool_size_kernel;
-#define DEFAULT_DMA_COHERENT_POOL_SIZE SZ_256K -static size_t atomic_pool_size = DEFAULT_DMA_COHERENT_POOL_SIZE; +/* Size can be defined by the coherent_pool command line */ +static size_t atomic_pool_size;
/* Dynamic background expansion when the atomic pool is near capacity */ static struct work_struct atomic_pool_work; @@ -170,6 +170,16 @@ static int __init dma_atomic_pool_init(void) { int ret = 0;
+ /* + * If coherent_pool was not used on the command line, default the pool + * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1. + */ + if (!atomic_pool_size) { + atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) * + SZ_128K; + atomic_pool_size = min_t(size_t, atomic_pool_size, + 1 << (PAGE_SHIFT + MAX_ORDER-1)); + } INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
atomic_pool_kernel = __dma_atomic_pool_init(atomic_pool_size,
From: Geert Uytterhoeven geert@linux-m68k.org
upstream 3ee06a6d532f75f20528ff4d2c473cda36c484fe commit.
On systems with at least 32 MiB, but less than 32 GiB of RAM, the DMA memory pools are much larger than intended (e.g. 2 MiB instead of 128 KiB on a 256 MiB system).
Fix this by correcting the calculation of the number of GiBs of RAM in the system. Invert the order of the min/max operations, to keep on calculating in pages until the last step, which aids readability.
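A quick check of the corrected formula, assuming 4 KiB pages:

	256 MiB system: 65536 pages / (SZ_1G / SZ_128K) = 65536 / 8192 = 8 pages = 32 KiB, clamped up to the 128 KiB minimum
	16 GiB system:  4194304 pages / 8192 = 512 pages = 2 MiB per pool

which matches the intended 128 KiB per 1 GiB of RAM, instead of the 2 MiB the old calculation produced on a 256 MiB machine.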
Fixes: 1d659236fb43c4d2 ("dma-pool: scale the default DMA coherent pool size with memory capacity") Signed-off-by: Geert Uytterhoeven geert@linux-m68k.org Acked-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 35bb51c31fff..8cfa01243ed2 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -175,10 +175,9 @@ static int __init dma_atomic_pool_init(void) * sizes to 128KB per 1GB of memory, min 128KB, max MAX_ORDER-1. */ if (!atomic_pool_size) { - atomic_pool_size = max(totalram_pages() >> PAGE_SHIFT, 1UL) * - SZ_128K; - atomic_pool_size = min_t(size_t, atomic_pool_size, - 1 << (PAGE_SHIFT + MAX_ORDER-1)); + unsigned long pages = totalram_pages() / (SZ_1G / SZ_128K); + pages = min_t(unsigned long, pages, MAX_ORDER_NR_PAGES); + atomic_pool_size = max_t(size_t, pages << PAGE_SHIFT, SZ_128K); } INIT_WORK(&atomic_pool_work, atomic_pool_work_fn);
From: David Rientjes rientjes@google.com
upstream dbed452a078d56bc7f1abecc3edd6a75e8e4484e commit.
DMA_REMAP is an unnecessary requirement for AMD SEV, which requires DMA_COHERENT_POOL, so avoid selecting it when it is otherwise unnecessary.
The only other requirement for DMA coherent pools is DMA_DIRECT_REMAP, so ensure that properly selects the config option when needed.
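After this change the select/depends chain looks roughly like this (a non-authoritative summary of the Kconfig hunks in this series):

	AMD_MEM_ENCRYPT  -> select DMA_COHERENT_POOL
	DMA_DIRECT_REMAP -> select DMA_REMAP, select DMA_COHERENT_POOL
	DMA_REMAP        -> depends on MMU, select DMA_NONCOHERENT_MMAP

so SEV only pulls in the coherent pools, while the remapping machinery is built only when something selects DMA_DIRECT_REMAP.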
Fixes: 82fef0ad811f ("x86/mm: unencrypted non-blocking DMA allocations use coherent pools") Reported-by: Alex Xu (Hello71) alex_y_xu@yahoo.ca Signed-off-by: David Rientjes rientjes@google.com Tested-by: Alex Xu (Hello71) alex_y_xu@yahoo.ca Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/Kconfig | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index d006668c0027..a0ce3c1494fd 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -73,18 +73,18 @@ config SWIOTLB config DMA_NONCOHERENT_MMAP bool
+config DMA_COHERENT_POOL + bool + config DMA_REMAP + bool depends on MMU select GENERIC_ALLOCATOR select DMA_NONCOHERENT_MMAP - bool - -config DMA_COHERENT_POOL - bool - select DMA_REMAP
config DMA_DIRECT_REMAP bool + select DMA_REMAP select DMA_COHERENT_POOL
config DMA_CMA
From: Christoph Hellwig hch@lst.de
upstream 3d0fc341c4bb66b2c41c0d1ec954a6d300e100b7 commit.
Use a goto label to merge two error return cases.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 210ea469028c..5343afbb8af3 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -203,11 +203,8 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); - if (!ret) { - dma_free_contiguous(dev, page, size); - return ret; - } - + if (!ret) + goto out_free_pages; memset(ret, 0, size); goto done; } @@ -220,8 +217,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, * so log an error and fail. */ dev_info(dev, "Rejecting highmem page from CMA.\n"); - dma_free_contiguous(dev, page, size); - return NULL; + goto out_free_pages; }
ret = page_address(page); @@ -241,6 +237,9 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; +out_free_pages: + dma_free_contiguous(dev, page, size); + return NULL; }
void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr,
From: Christoph Hellwig hch@lst.de
upstream 0f665b9e2a6d4cc963e6cd349d40320ed5281f95 commit.
Switch xtensa over to use the generic uncached support, and thus the generic implementations of dma_alloc_* and dma_free_*, which also gain support for mmapping DMA memory. The non-working nommu DMA support has been disabled, but could be re-enabled easily if platforms that actually have an uncached segment show up.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Max Filippov jcmvbkbc@gmail.com Tested-by: Max Filippov jcmvbkbc@gmail.com Signed-off-by: Peter Gonda pgonda@google.com --- arch/xtensa/Kconfig | 6 +- arch/xtensa/include/asm/platform.h | 27 ------- arch/xtensa/kernel/Makefile | 3 +- arch/xtensa/kernel/pci-dma.c | 121 +++-------------------------- 4 files changed, 18 insertions(+), 139 deletions(-)
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index 8352037322df..d3a5891eff2e 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -3,8 +3,10 @@ config XTENSA def_bool y select ARCH_32BIT_OFF_T select ARCH_HAS_BINFMT_FLAT if !MMU - select ARCH_HAS_SYNC_DMA_FOR_CPU - select ARCH_HAS_SYNC_DMA_FOR_DEVICE + select ARCH_HAS_DMA_PREP_COHERENT if MMU + select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU + select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU + select ARCH_HAS_UNCACHED_SEGMENT if MMU select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_FRAME_POINTERS diff --git a/arch/xtensa/include/asm/platform.h b/arch/xtensa/include/asm/platform.h index 913826dfa838..f2c48522c5a1 100644 --- a/arch/xtensa/include/asm/platform.h +++ b/arch/xtensa/include/asm/platform.h @@ -65,31 +65,4 @@ extern void platform_calibrate_ccount (void); */ void cpu_reset(void) __attribute__((noreturn));
-/* - * Memory caching is platform-dependent in noMMU xtensa configurations. - * The following set of functions should be implemented in platform code - * in order to enable coherent DMA memory operations when CONFIG_MMU is not - * enabled. Default implementations do nothing and issue a warning. - */ - -/* - * Check whether p points to a cached memory. - */ -bool platform_vaddr_cached(const void *p); - -/* - * Check whether p points to an uncached memory. - */ -bool platform_vaddr_uncached(const void *p); - -/* - * Return pointer to an uncached view of the cached sddress p. - */ -void *platform_vaddr_to_uncached(void *p); - -/* - * Return pointer to a cached view of the uncached sddress p. - */ -void *platform_vaddr_to_cached(void *p); - #endif /* _XTENSA_PLATFORM_H */ diff --git a/arch/xtensa/kernel/Makefile b/arch/xtensa/kernel/Makefile index 6f629027ac7d..d4082c6a121b 100644 --- a/arch/xtensa/kernel/Makefile +++ b/arch/xtensa/kernel/Makefile @@ -5,10 +5,11 @@
extra-y := head.o vmlinux.lds
-obj-y := align.o coprocessor.o entry.o irq.o pci-dma.o platform.o process.o \ +obj-y := align.o coprocessor.o entry.o irq.o platform.o process.o \ ptrace.o setup.o signal.o stacktrace.o syscall.o time.o traps.o \ vectors.o
+obj-$(CONFIG_MMU) += pci-dma.o obj-$(CONFIG_PCI) += pci.o obj-$(CONFIG_MODULES) += xtensa_ksyms.o module.o obj-$(CONFIG_FUNCTION_TRACER) += mcount.o diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index 154979d62b73..1c82e21de4f6 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -81,122 +81,25 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr, } }
-#ifdef CONFIG_MMU -bool platform_vaddr_cached(const void *p) -{ - unsigned long addr = (unsigned long)p; - - return addr >= XCHAL_KSEG_CACHED_VADDR && - addr - XCHAL_KSEG_CACHED_VADDR < XCHAL_KSEG_SIZE; -} - -bool platform_vaddr_uncached(const void *p) -{ - unsigned long addr = (unsigned long)p; - - return addr >= XCHAL_KSEG_BYPASS_VADDR && - addr - XCHAL_KSEG_BYPASS_VADDR < XCHAL_KSEG_SIZE; -} - -void *platform_vaddr_to_uncached(void *p) -{ - return p + XCHAL_KSEG_BYPASS_VADDR - XCHAL_KSEG_CACHED_VADDR; -} - -void *platform_vaddr_to_cached(void *p) -{ - return p + XCHAL_KSEG_CACHED_VADDR - XCHAL_KSEG_BYPASS_VADDR; -} -#else -bool __attribute__((weak)) platform_vaddr_cached(const void *p) -{ - WARN_ONCE(1, "Default %s implementation is used\n", __func__); - return true; -} - -bool __attribute__((weak)) platform_vaddr_uncached(const void *p) -{ - WARN_ONCE(1, "Default %s implementation is used\n", __func__); - return false; -} - -void __attribute__((weak)) *platform_vaddr_to_uncached(void *p) +void arch_dma_prep_coherent(struct page *page, size_t size) { - WARN_ONCE(1, "Default %s implementation is used\n", __func__); - return p; -} - -void __attribute__((weak)) *platform_vaddr_to_cached(void *p) -{ - WARN_ONCE(1, "Default %s implementation is used\n", __func__); - return p; + __invalidate_dcache_range((unsigned long)page_address(page), size); } -#endif
/* - * Note: We assume that the full memory space is always mapped to 'kseg' - * Otherwise we have to use page attributes (not implemented). + * Memory caching is platform-dependent in noMMU xtensa configurations. + * The following two functions should be implemented in platform code + * in order to enable coherent DMA memory operations when CONFIG_MMU is not + * enabled. */ - -void *arch_dma_alloc(struct device *dev, size_t size, dma_addr_t *handle, - gfp_t flag, unsigned long attrs) -{ - unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; - struct page *page = NULL; - - /* ignore region speicifiers */ - - flag &= ~(__GFP_DMA | __GFP_HIGHMEM); - - if (dev == NULL || (dev->coherent_dma_mask < 0xffffffff)) - flag |= GFP_DMA; - - if (gfpflags_allow_blocking(flag)) - page = dma_alloc_from_contiguous(dev, count, get_order(size), - flag & __GFP_NOWARN); - - if (!page) - page = alloc_pages(flag | __GFP_ZERO, get_order(size)); - - if (!page) - return NULL; - - *handle = phys_to_dma(dev, page_to_phys(page)); - #ifdef CONFIG_MMU - if (PageHighMem(page)) { - void *p; - - p = dma_common_contiguous_remap(page, size, - pgprot_noncached(PAGE_KERNEL), - __builtin_return_address(0)); - if (!p) { - if (!dma_release_from_contiguous(dev, page, count)) - __free_pages(page, get_order(size)); - } - return p; - } -#endif - BUG_ON(!platform_vaddr_cached(page_address(page))); - __invalidate_dcache_range((unsigned long)page_address(page), size); - return platform_vaddr_to_uncached(page_address(page)); +void *uncached_kernel_address(void *p) +{ + return p + XCHAL_KSEG_BYPASS_VADDR - XCHAL_KSEG_CACHED_VADDR; }
-void arch_dma_free(struct device *dev, size_t size, void *vaddr, - dma_addr_t dma_handle, unsigned long attrs) +void *cached_kernel_address(void *p) { - unsigned long count = PAGE_ALIGN(size) >> PAGE_SHIFT; - struct page *page; - - if (platform_vaddr_uncached(vaddr)) { - page = virt_to_page(platform_vaddr_to_cached(vaddr)); - } else { -#ifdef CONFIG_MMU - dma_common_free_remap(vaddr, size); -#endif - page = pfn_to_page(PHYS_PFN(dma_to_phys(dev, dma_handle))); - } - - if (!dma_release_from_contiguous(dev, page, count)) - __free_pages(page, get_order(size)); + return p + XCHAL_KSEG_CACHED_VADDR - XCHAL_KSEG_BYPASS_VADDR; } +#endif /* CONFIG_MMU */
From: Christoph Hellwig hch@lst.de
upstream fa7e2247c5729f990c7456fe09f3af99c8f2571b commit.
Rename the symbol to arch_dma_set_uncached, and pass a size to it as well as allow an error return. That will allow reusing this hook for in-place pagetable remapping.
As the in-place remap doesn't always require an explicit cache flush, also detangle ARCH_HAS_DMA_PREP_COHERENT from ARCH_HAS_DMA_SET_UNCACHED.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Robin Murphy robin.murphy@arm.com Change-Id: I69aaa5ee5f73547aaf4de8fb0a57494709fa5eb5 Signed-off-by: Peter Gonda pgonda@google.com --- arch/Kconfig | 8 ++++---- arch/microblaze/Kconfig | 2 +- arch/microblaze/mm/consistent.c | 2 +- arch/mips/Kconfig | 3 ++- arch/mips/mm/dma-noncoherent.c | 2 +- arch/nios2/Kconfig | 3 ++- arch/nios2/mm/dma-mapping.c | 2 +- arch/xtensa/Kconfig | 2 +- arch/xtensa/kernel/pci-dma.c | 2 +- include/linux/dma-noncoherent.h | 2 +- kernel/dma/direct.c | 10 ++++++---- 11 files changed, 21 insertions(+), 17 deletions(-)
diff --git a/arch/Kconfig b/arch/Kconfig index 238dccfa7691..38b6e74750fc 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -248,11 +248,11 @@ config ARCH_HAS_SET_DIRECT_MAP bool
# -# Select if arch has an uncached kernel segment and provides the -# uncached_kernel_address / cached_kernel_address symbols to use it +# Select if the architecture provides the arch_dma_set_uncached symbol to +# either provide an uncached segement alias for a DMA allocation, or +# to remap the page tables in place. # -config ARCH_HAS_UNCACHED_SEGMENT - select ARCH_HAS_DMA_PREP_COHERENT +config ARCH_HAS_DMA_SET_UNCACHED bool
# Select if arch init_task must go in the __init_task_data section diff --git a/arch/microblaze/Kconfig b/arch/microblaze/Kconfig index 261c26df1c9f..2bdb3ceb525d 100644 --- a/arch/microblaze/Kconfig +++ b/arch/microblaze/Kconfig @@ -8,7 +8,7 @@ config MICROBLAZE select ARCH_HAS_GCOV_PROFILE_ALL select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE - select ARCH_HAS_UNCACHED_SEGMENT if !MMU + select ARCH_HAS_DMA_SET_UNCACHED if !MMU select ARCH_MIGHT_HAVE_PC_PARPORT select ARCH_WANT_IPC_PARSE_VERSION select BUILDTIME_EXTABLE_SORT diff --git a/arch/microblaze/mm/consistent.c b/arch/microblaze/mm/consistent.c index 8c5f0c332d8b..457581fb74cc 100644 --- a/arch/microblaze/mm/consistent.c +++ b/arch/microblaze/mm/consistent.c @@ -40,7 +40,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) #define UNCACHED_SHADOW_MASK 0 #endif /* CONFIG_XILINX_UNCACHED_SHADOW */
-void *uncached_kernel_address(void *ptr) +void *arch_dma_set_uncached(void *ptr, size_t size) { unsigned long addr = (unsigned long)ptr;
diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index c1c3da4fc667..ab98d8bad08e 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -1132,8 +1132,9 @@ config DMA_NONCOHERENT # significant advantages. # select ARCH_HAS_DMA_WRITE_COMBINE + select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_SYNC_DMA_FOR_DEVICE - select ARCH_HAS_UNCACHED_SEGMENT + select ARCH_HAS_DMA_SET_UNCACHED select DMA_NONCOHERENT_MMAP select DMA_NONCOHERENT_CACHE_SYNC select NEED_DMA_MAP_STATE diff --git a/arch/mips/mm/dma-noncoherent.c b/arch/mips/mm/dma-noncoherent.c index fcf6d3eaac66..d71b947a2121 100644 --- a/arch/mips/mm/dma-noncoherent.c +++ b/arch/mips/mm/dma-noncoherent.c @@ -49,7 +49,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) dma_cache_wback_inv((unsigned long)page_address(page), size); }
-void *uncached_kernel_address(void *addr) +void *arch_dma_set_uncached(void *addr, size_t size) { return (void *)(__pa(addr) + UNCAC_BASE); } diff --git a/arch/nios2/Kconfig b/arch/nios2/Kconfig index 44b5da37e8bd..2fc4ed210b5f 100644 --- a/arch/nios2/Kconfig +++ b/arch/nios2/Kconfig @@ -2,9 +2,10 @@ config NIOS2 def_bool y select ARCH_32BIT_OFF_T + select ARCH_HAS_DMA_PREP_COHERENT select ARCH_HAS_SYNC_DMA_FOR_CPU select ARCH_HAS_SYNC_DMA_FOR_DEVICE - select ARCH_HAS_UNCACHED_SEGMENT + select ARCH_HAS_DMA_SET_UNCACHED select ARCH_NO_SWAP select TIMER_OF select GENERIC_ATOMIC64 diff --git a/arch/nios2/mm/dma-mapping.c b/arch/nios2/mm/dma-mapping.c index 9cb238664584..19f6d6b394e6 100644 --- a/arch/nios2/mm/dma-mapping.c +++ b/arch/nios2/mm/dma-mapping.c @@ -67,7 +67,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) flush_dcache_range(start, start + size); }
-void *uncached_kernel_address(void *ptr) +void *arch_dma_set_uncached(void *ptr, size_t size) { unsigned long addr = (unsigned long)ptr;
diff --git a/arch/xtensa/Kconfig b/arch/xtensa/Kconfig index d3a5891eff2e..75bc567c5c10 100644 --- a/arch/xtensa/Kconfig +++ b/arch/xtensa/Kconfig @@ -6,7 +6,7 @@ config XTENSA select ARCH_HAS_DMA_PREP_COHERENT if MMU select ARCH_HAS_SYNC_DMA_FOR_CPU if MMU select ARCH_HAS_SYNC_DMA_FOR_DEVICE if MMU - select ARCH_HAS_UNCACHED_SEGMENT if MMU + select ARCH_HAS_DMA_SET_UNCACHED if MMU select ARCH_USE_QUEUED_RWLOCKS select ARCH_USE_QUEUED_SPINLOCKS select ARCH_WANT_FRAME_POINTERS diff --git a/arch/xtensa/kernel/pci-dma.c b/arch/xtensa/kernel/pci-dma.c index 1c82e21de4f6..d704eb67867c 100644 --- a/arch/xtensa/kernel/pci-dma.c +++ b/arch/xtensa/kernel/pci-dma.c @@ -93,7 +93,7 @@ void arch_dma_prep_coherent(struct page *page, size_t size) * enabled. */ #ifdef CONFIG_MMU -void *uncached_kernel_address(void *p) +void *arch_dma_set_uncached(void *p, size_t size) { return p + XCHAL_KSEG_BYPASS_VADDR - XCHAL_KSEG_CACHED_VADDR; } diff --git a/include/linux/dma-noncoherent.h b/include/linux/dma-noncoherent.h index e30fca1f1b12..dc6ddbb26846 100644 --- a/include/linux/dma-noncoherent.h +++ b/include/linux/dma-noncoherent.h @@ -108,7 +108,7 @@ static inline void arch_dma_prep_coherent(struct page *page, size_t size) } #endif /* CONFIG_ARCH_HAS_DMA_PREP_COHERENT */
-void *uncached_kernel_address(void *addr); +void *arch_dma_set_uncached(void *addr, size_t size); void *cached_kernel_address(void *addr);
#endif /* _LINUX_DMA_NONCOHERENT_H */ diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 5343afbb8af3..bb5cb5af9f7d 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -226,10 +226,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size,
memset(ret, 0, size);
- if (IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && + if (IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && dma_alloc_need_uncached(dev, attrs)) { arch_dma_prep_coherent(page, size); - ret = uncached_kernel_address(ret); + ret = arch_dma_set_uncached(ret, size); + if (IS_ERR(ret)) + goto out_free_pages; } done: if (force_dma_unencrypted(dev)) @@ -271,7 +273,7 @@ void dma_direct_free_pages(struct device *dev, size_t size, void *cpu_addr, void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs) { - if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && + if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs)) return arch_dma_alloc(dev, size, dma_handle, gfp, attrs); @@ -281,7 +283,7 @@ void *dma_direct_alloc(struct device *dev, size_t size, void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, dma_addr_t dma_addr, unsigned long attrs) { - if (!IS_ENABLED(CONFIG_ARCH_HAS_UNCACHED_SEGMENT) && + if (!IS_ENABLED(CONFIG_ARCH_HAS_DMA_SET_UNCACHED) && !IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && dma_alloc_need_uncached(dev, attrs)) arch_dma_free(dev, size, cpu_addr, dma_addr, attrs);
From: David Rientjes rientjes@google.com
upstream 633d5fce78a61e8727674467944939f55b0bcfab commit.
dma_alloc_contiguous() does size >> PAGE_SHIFT and set_memory_decrypted() works at page granularity. It's necessary to page align the allocation size in dma_direct_alloc_pages() for consistent behavior.
This also fixes an issue when arch_dma_prep_coherent() is called on an unaligned allocation size for dma_alloc_need_uncached() when CONFIG_DMA_DIRECT_REMAP is disabled but CONFIG_ARCH_HAS_DMA_SET_UNCACHED is enabled.
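As a small illustration, assuming 4 KiB pages:

	PAGE_ALIGN(10000) = 12288 bytes = 3 pages

so the size handed to dma_alloc_contiguous(), arch_dma_prep_coherent() and the set_memory_*() helpers is already a whole number of pages, rather than a raw byte count that each would otherwise round on its own.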
Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Change-Id: I6ede6ca2864a9fb3ace42df7a0da6725ae453f1c Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index bb5cb5af9f7d..e72bb0dc8150 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -126,11 +126,12 @@ static inline bool dma_should_free_from_pool(struct device *dev, struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp_t gfp, unsigned long attrs) { - size_t alloc_size = PAGE_ALIGN(size); int node = dev_to_node(dev); struct page *page = NULL; u64 phys_mask;
+ WARN_ON_ONCE(!PAGE_ALIGNED(size)); + if (attrs & DMA_ATTR_NO_WARN) gfp |= __GFP_NOWARN;
@@ -138,14 +139,14 @@ struct page *__dma_direct_alloc_pages(struct device *dev, size_t size, gfp &= ~__GFP_ZERO; gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, &phys_mask); - page = dma_alloc_contiguous(dev, alloc_size, gfp); + page = dma_alloc_contiguous(dev, size, gfp); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { - dma_free_contiguous(dev, page, alloc_size); + dma_free_contiguous(dev, page, size); page = NULL; } again: if (!page) - page = alloc_pages_node(node, gfp, get_order(alloc_size)); + page = alloc_pages_node(node, gfp, get_order(size)); if (page && !dma_coherent_ok(dev, page_to_phys(page), size)) { dma_free_contiguous(dev, page, size); page = NULL; @@ -172,8 +173,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, struct page *page; void *ret;
+ size = PAGE_ALIGN(size); + if (dma_should_alloc_from_pool(dev, gfp, attrs)) { - ret = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, gfp); + ret = dma_alloc_from_pool(dev, size, &page, gfp); if (!ret) return NULL; goto done; @@ -197,10 +200,10 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, dma_alloc_need_uncached(dev, attrs)) || (IS_ENABLED(CONFIG_DMA_REMAP) && PageHighMem(page))) { /* remove any dirty cache lines on the kernel alias */ - arch_dma_prep_coherent(page, PAGE_ALIGN(size)); + arch_dma_prep_coherent(page, size);
/* create a coherent mapping */ - ret = dma_common_contiguous_remap(page, PAGE_ALIGN(size), + ret = dma_common_contiguous_remap(page, size, dma_pgprot(dev, PAGE_KERNEL, attrs), __builtin_return_address(0)); if (!ret)
From: David Rientjes rientjes@google.com
upstream 96a539fa3bb71f443ae08e57b9f63d6e5bb2207c commit.
If arch_dma_set_uncached() fails after memory has been decrypted, it needs to be re-encrypted before freeing.
Fixes: fa7e2247c572 ("dma-direct: make uncached_kernel_address more general") Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index e72bb0dc8150..b4a5b7076399 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -234,7 +234,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, arch_dma_prep_coherent(page, size); ret = arch_dma_set_uncached(ret, size); if (IS_ERR(ret)) - goto out_free_pages; + goto out_encrypt_pages; } done: if (force_dma_unencrypted(dev)) @@ -242,6 +242,11 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, else *dma_handle = phys_to_dma(dev, page_to_phys(page)); return ret; + +out_encrypt_pages: + if (force_dma_unencrypted(dev)) + set_memory_encrypted((unsigned long)page_address(page), + 1 << get_order(size)); out_free_pages: dma_free_contiguous(dev, page, size); return NULL;
From: David Rientjes rientjes@google.com
upstream 56fccf21d1961a06e2a0c96ce446ebf036651062 commit.
__change_page_attr() can fail which will cause set_memory_encrypted() and set_memory_decrypted() to return non-zero.
If the device requires unencrypted DMA memory and decryption fails, simply free the memory and fail.
If attempting to re-encrypt in the failure path and that encryption fails, there is no alternative other than to leak the memory.
Fixes: c10f07aa27da ("dma/direct: Handle force decryption for DMA coherent buffers in common code") Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 19 ++++++++++++++----- 1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index b4a5b7076399..ac611b4f65b9 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -172,6 +172,7 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, { struct page *page; void *ret; + int err;
size = PAGE_ALIGN(size);
@@ -224,8 +225,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, }
ret = page_address(page); - if (force_dma_unencrypted(dev)) - set_memory_decrypted((unsigned long)ret, 1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + }
memset(ret, 0, size);
@@ -244,9 +249,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, return ret;
out_encrypt_pages: - if (force_dma_unencrypted(dev)) - set_memory_encrypted((unsigned long)page_address(page), - 1 << get_order(size)); + if (force_dma_unencrypted(dev)) { + err = set_memory_encrypted((unsigned long)page_address(page), + 1 << get_order(size)); + /* If memory cannot be re-encrypted, it must be leaked */ + if (err) + return NULL; + } out_free_pages: dma_free_contiguous(dev, page, size); return NULL;
From: David Rientjes rientjes@google.com
upstream 1a2b3357e860d890f8045367b179c7e7e802cd71 commit.
When a coherent mapping is created in dma_direct_alloc_pages(), it needs to be decrypted if the device requires unencrypted DMA before returning.
Fixes: 3acac065508f ("dma-mapping: merge the generic remapping helpers into dma-direct") Signed-off-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/direct.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index ac611b4f65b9..6c677ffdbd53 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -209,6 +209,12 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, __builtin_return_address(0)); if (!ret) goto out_free_pages; + if (force_dma_unencrypted(dev)) { + err = set_memory_decrypted((unsigned long)ret, + 1 << get_order(size)); + if (err) + goto out_free_pages; + } memset(ret, 0, size); goto done; }
From: Christoph Hellwig hch@lst.de
upstream d07ae4c486908615ab336b987c7c367d132fd844 commit.
The dma coherent pool code needs genalloc. Move the select over from DMA_REMAP, which doesn't actually need it.
Fixes: dbed452a078d ("dma-pool: decouple DMA_REMAP from DMA_COHERENT_POOL") Reported-by: kernel test robot lkp@intel.com Signed-off-by: Christoph Hellwig hch@lst.de Acked-by: David Rientjes rientjes@google.com Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/dma/Kconfig b/kernel/dma/Kconfig index a0ce3c1494fd..6ad16b7c9652 100644 --- a/kernel/dma/Kconfig +++ b/kernel/dma/Kconfig @@ -74,12 +74,12 @@ config DMA_NONCOHERENT_MMAP bool
config DMA_COHERENT_POOL + select GENERIC_ALLOCATOR bool
config DMA_REMAP bool depends on MMU - select GENERIC_ALLOCATOR select DMA_NONCOHERENT_MMAP
config DMA_DIRECT_REMAP
From: David Rientjes rientjes@google.com
upstream 71cdec4fab76667dabdbb2ca232b039004ebd40f commit.
When a DMA coherent pool is depleted, allocation failures may or may not get reported in the kernel log depending on the allocator.
The admin does have a workaround, however, by using coherent_pool= on the kernel command line.
Provide some guidance on the failure and a recommended minimum size for the pools (double the size).
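The recommended value printed by the new warning is simply double the current pool size: gen_pool_size() returns bytes, so shifting right by 9 instead of 10 yields twice the size in KiB. For a depleted 256 KiB pool:

	262144 >> 9 = 512, so the message suggests coherent_pool=512K.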
Signed-off-by: David Rientjes rientjes@google.com Tested-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 8cfa01243ed2..39ca26fa41b5 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -239,12 +239,16 @@ void *dma_alloc_from_pool(struct device *dev, size_t size, }
val = gen_pool_alloc(pool, size); - if (val) { + if (likely(val)) { phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
*ret_page = pfn_to_page(__phys_to_pfn(phys)); ptr = (void *)val; memset(ptr, 0, size); + } else { + WARN_ONCE(1, "DMA coherent pool depleted, increase size " + "(recommended min coherent_pool=%zuK)\n", + gen_pool_size(pool) >> 9); } if (gen_pool_avail(pool) < atomic_pool_size) schedule_work(&atomic_pool_work);
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream 567f6a6eba0c09e5f502e0290e57651befa8aacb commit.
dma_coherent_ok() checks if a physical memory area fits a device's DMA constraints.
Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- include/linux/dma-direct.h | 1 + kernel/dma/direct.c | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index fb5ec847ddf3..8ccddee1f78a 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -68,6 +68,7 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr) u64 dma_direct_get_required_mask(struct device *dev); gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, u64 *phys_mask); +bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 6c677ffdbd53..54c1c3a20c09 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -84,7 +84,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, return 0; }
-static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) +bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) { return phys_to_dma_direct(dev, phys) + size - 1 <= min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask);
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream 23e469be6239d9cf3d921fc3e38545491df56534 commit.
The function is only used once and can be simplified to a one-liner.
Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 11 +---------- 1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 39ca26fa41b5..318035e093fb 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -217,15 +217,6 @@ static inline struct gen_pool *dev_to_pool(struct device *dev) return atomic_pool_kernel; }
-static bool dma_in_atomic_pool(struct device *dev, void *start, size_t size) -{ - struct gen_pool *pool = dev_to_pool(dev); - - if (unlikely(!pool)) - return false; - return gen_pool_has_addr(pool, (unsigned long)start, size); -} - void *dma_alloc_from_pool(struct device *dev, size_t size, struct page **ret_page, gfp_t flags) { @@ -260,7 +251,7 @@ bool dma_free_from_pool(struct device *dev, void *start, size_t size) { struct gen_pool *pool = dev_to_pool(dev);
- if (!dma_in_atomic_pool(dev, start, size)) + if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size)) return false; gen_pool_free(pool, (unsigned long)start, size); return true;
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream 48b6703858dd5526c82d8ff2dbac59acab3a9dda commit.
dma-pool's dev_to_pool() creates the false impression that there is a way to guarantee a mapping between a device's DMA constraints and an atomic pool. It turns out it's just a guess, and the device might need to use an atomic pool containing memory from a 'safer' (or lower) memory zone.
To help mitigate this, introduce dma_guess_pool(), which can be fed a device's DMA constraints and atomic pools already known to be faulty, in order for it to provide a better guess on which pool to use.
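Condensed from the hunk below, dma_get_safer_pool() steps down one zone at a time:

	kernel pool -> DMA32 pool (or straight to the DMA pool when DMA32 is absent) -> DMA pool -> NULL

so a caller that keeps feeding the previous 'bad' pool back in eventually runs out of candidates and gets NULL.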
Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 26 +++++++++++++++++++++++--- 1 file changed, 23 insertions(+), 3 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 318035e093fb..5b9eaa2b498d 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -203,7 +203,7 @@ static int __init dma_atomic_pool_init(void) } postcore_initcall(dma_atomic_pool_init);
-static inline struct gen_pool *dev_to_pool(struct device *dev) +static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev) { u64 phys_mask; gfp_t gfp; @@ -217,10 +217,30 @@ static inline struct gen_pool *dev_to_pool(struct device *dev) return atomic_pool_kernel; }
+static inline struct gen_pool *dma_get_safer_pool(struct gen_pool *bad_pool) +{ + if (bad_pool == atomic_pool_kernel) + return atomic_pool_dma32 ? : atomic_pool_dma; + + if (bad_pool == atomic_pool_dma32) + return atomic_pool_dma; + + return NULL; +} + +static inline struct gen_pool *dma_guess_pool(struct device *dev, + struct gen_pool *bad_pool) +{ + if (bad_pool) + return dma_get_safer_pool(bad_pool); + + return dma_guess_pool_from_device(dev); +} + void *dma_alloc_from_pool(struct device *dev, size_t size, struct page **ret_page, gfp_t flags) { - struct gen_pool *pool = dev_to_pool(dev); + struct gen_pool *pool = dma_guess_pool(dev, NULL); unsigned long val; void *ptr = NULL;
@@ -249,7 +269,7 @@ void *dma_alloc_from_pool(struct device *dev, size_t size,
bool dma_free_from_pool(struct device *dev, void *start, size_t size) { - struct gen_pool *pool = dev_to_pool(dev); + struct gen_pool *pool = dma_guess_pool(dev, NULL);
if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size)) return false;
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream 81e9d894e03f9a279102c7aac62ea7cbf9949f4b commit.
When allocating DMA memory from a pool, the core can only guess which atomic pool will fit a device's constraints. If it doesn't, get a safer atomic pool and try again.
Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask") Reported-by: Jeremy Linton jeremy.linton@arm.com Suggested-by: Robin Murphy robin.murphy@arm.com Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 57 ++++++++++++++++++++++++++++++----------------- 1 file changed, 37 insertions(+), 20 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 5b9eaa2b498d..d48d9acb585f 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -240,39 +240,56 @@ static inline struct gen_pool *dma_guess_pool(struct device *dev, void *dma_alloc_from_pool(struct device *dev, size_t size, struct page **ret_page, gfp_t flags) { - struct gen_pool *pool = dma_guess_pool(dev, NULL); - unsigned long val; + struct gen_pool *pool = NULL; + unsigned long val = 0; void *ptr = NULL; - - if (!pool) { - WARN(1, "%pGg atomic pool not initialised!\n", &flags); - return NULL; + phys_addr_t phys; + + while (1) { + pool = dma_guess_pool(dev, pool); + if (!pool) { + WARN(1, "Failed to get suitable pool for %s\n", + dev_name(dev)); + break; + } + + val = gen_pool_alloc(pool, size); + if (!val) + continue; + + phys = gen_pool_virt_to_phys(pool, val); + if (dma_coherent_ok(dev, phys, size)) + break; + + gen_pool_free(pool, val, size); + val = 0; }
- val = gen_pool_alloc(pool, size); - if (likely(val)) { - phys_addr_t phys = gen_pool_virt_to_phys(pool, val);
+ if (val) { *ret_page = pfn_to_page(__phys_to_pfn(phys)); ptr = (void *)val; memset(ptr, 0, size); - } else { - WARN_ONCE(1, "DMA coherent pool depleted, increase size " - "(recommended min coherent_pool=%zuK)\n", - gen_pool_size(pool) >> 9); + + if (gen_pool_avail(pool) < atomic_pool_size) + schedule_work(&atomic_pool_work); } - if (gen_pool_avail(pool) < atomic_pool_size) - schedule_work(&atomic_pool_work);
return ptr; }
bool dma_free_from_pool(struct device *dev, void *start, size_t size) { - struct gen_pool *pool = dma_guess_pool(dev, NULL); + struct gen_pool *pool = NULL;
- if (!pool || !gen_pool_has_addr(pool, (unsigned long)start, size)) - return false; - gen_pool_free(pool, (unsigned long)start, size); - return true; + while (1) { + pool = dma_guess_pool(dev, pool); + if (!pool) + return false; + + if (gen_pool_has_addr(pool, (unsigned long)start, size)) { + gen_pool_free(pool, (unsigned long)start, size); + return true; + } + } }
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream d9765e41d8e9ea2251bf73735a2895c8bad546fc commit.
There is no guarantee as to where CMA's memory is placed, so allocating a zone-specific atomic pool from CMA might return memory from a completely different memory zone. So stop using it.
Fixes: c84dc6e68a1d ("dma-pool: add additional coherent pools to map to gfp mask") Reported-by: Jeremy Linton jeremy.linton@arm.com Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Tested-by: Jeremy Linton jeremy.linton@arm.com Acked-by: David Rientjes rientjes@google.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Peter Gonda pgonda@google.com --- kernel/dma/pool.c | 11 ++--------- 1 file changed, 2 insertions(+), 9 deletions(-)
diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index d48d9acb585f..6bc74a2d5127 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -6,7 +6,6 @@ #include <linux/debugfs.h> #include <linux/dma-direct.h> #include <linux/dma-noncoherent.h> -#include <linux/dma-contiguous.h> #include <linux/init.h> #include <linux/genalloc.h> #include <linux/set_memory.h> @@ -69,12 +68,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size,
do { pool_size = 1 << (PAGE_SHIFT + order); - - if (dev_get_cma_area(NULL)) - page = dma_alloc_from_contiguous(NULL, 1 << order, - order, false); - else - page = alloc_pages(gfp, order); + page = alloc_pages(gfp, order); } while (!page && order-- > 0); if (!page) goto out; @@ -118,8 +112,7 @@ static int atomic_pool_expand(struct gen_pool *pool, size_t pool_size, dma_common_free_remap(addr, pool_size); #endif free_page: __maybe_unused - if (!dma_release_from_contiguous(NULL, page, 1 << order)) - __free_pages(page, order); + __free_pages(page, order); out: return ret; }
From: Christoph Hellwig hch@lst.de
upstream 9420139f516d7fbc248ce17f35275cb005ed98ea commit.
When allocating coherent pool memory for an IOMMU mapping we don't care about the DMA mask. Move the guess for the initial GFP mask into dma_direct_alloc_pages() and pass dma_coherent_ok as a function pointer argument so that it doesn't get applied to the IOMMU case.
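The two call sites, taken from the hunks below, then differ only in the predicate they pass:

	/* dma-direct: respect the device's coherent DMA mask */
	page = dma_alloc_from_pool(dev, size, &ret, gfp, dma_coherent_ok);

	/* dma-iommu: any pool page will do, the IOMMU supplies the DMA address */
	page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr, gfp, NULL);

Passing NULL simply skips the physical-address check inside __dma_alloc_from_pool().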
Signed-off-by: Christoph Hellwig hch@lst.de Tested-by: Amit Pundir amit.pundir@linaro.org Change-Id: I343ae38a73135948f8f8bb9ae9a12034c7d4c405 Signed-off-by: Peter Gonda pgonda@google.com --- drivers/iommu/dma-iommu.c | 4 +- include/linux/dma-direct.h | 3 - include/linux/dma-mapping.h | 5 +- kernel/dma/direct.c | 11 +++- kernel/dma/pool.c | 114 +++++++++++++++--------------------- 5 files changed, 61 insertions(+), 76 deletions(-)
diff --git a/drivers/iommu/dma-iommu.c b/drivers/iommu/dma-iommu.c index b642c1123a29..f917bd10f47c 100644 --- a/drivers/iommu/dma-iommu.c +++ b/drivers/iommu/dma-iommu.c @@ -1010,8 +1010,8 @@ static void *iommu_dma_alloc(struct device *dev, size_t size,
if (IS_ENABLED(CONFIG_DMA_DIRECT_REMAP) && !gfpflags_allow_blocking(gfp) && !coherent) - cpu_addr = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &page, - gfp); + page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr, + gfp, NULL); else cpu_addr = iommu_dma_alloc_pages(dev, size, &page, gfp, attrs); if (!cpu_addr) diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 8ccddee1f78a..6db863c3eb93 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -66,9 +66,6 @@ static inline phys_addr_t dma_to_phys(struct device *dev, dma_addr_t daddr) }
u64 dma_direct_get_required_mask(struct device *dev); -gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, - u64 *phys_mask); -bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size); void *dma_direct_alloc(struct device *dev, size_t size, dma_addr_t *dma_handle, gfp_t gfp, unsigned long attrs); void dma_direct_free(struct device *dev, size_t size, void *cpu_addr, diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h index e4be706d8f5e..246a4b429612 100644 --- a/include/linux/dma-mapping.h +++ b/include/linux/dma-mapping.h @@ -633,8 +633,9 @@ void *dma_common_pages_remap(struct page **pages, size_t size, pgprot_t prot, const void *caller); void dma_common_free_remap(void *cpu_addr, size_t size);
-void *dma_alloc_from_pool(struct device *dev, size_t size, - struct page **ret_page, gfp_t flags); +struct page *dma_alloc_from_pool(struct device *dev, size_t size, + void **cpu_addr, gfp_t flags, + bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t)); bool dma_free_from_pool(struct device *dev, void *start, size_t size);
int diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 54c1c3a20c09..71be82b07743 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -84,7 +84,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, return 0; }
-bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) +static bool dma_coherent_ok(struct device *dev, phys_addr_t phys, size_t size) { return phys_to_dma_direct(dev, phys) + size - 1 <= min_not_zero(dev->coherent_dma_mask, dev->bus_dma_mask); @@ -177,8 +177,13 @@ void *dma_direct_alloc_pages(struct device *dev, size_t size, size = PAGE_ALIGN(size);
if (dma_should_alloc_from_pool(dev, gfp, attrs)) { - ret = dma_alloc_from_pool(dev, size, &page, gfp); - if (!ret) + u64 phys_mask; + + gfp |= dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, + &phys_mask); + page = dma_alloc_from_pool(dev, size, &ret, gfp, + dma_coherent_ok); + if (!page) return NULL; goto done; } diff --git a/kernel/dma/pool.c b/kernel/dma/pool.c index 6bc74a2d5127..5d071d4a3cba 100644 --- a/kernel/dma/pool.c +++ b/kernel/dma/pool.c @@ -196,93 +196,75 @@ static int __init dma_atomic_pool_init(void) } postcore_initcall(dma_atomic_pool_init);
-static inline struct gen_pool *dma_guess_pool_from_device(struct device *dev) +static inline struct gen_pool *dma_guess_pool(struct gen_pool *prev, gfp_t gfp) { - u64 phys_mask; - gfp_t gfp; - - gfp = dma_direct_optimal_gfp_mask(dev, dev->coherent_dma_mask, - &phys_mask); - if (IS_ENABLED(CONFIG_ZONE_DMA) && gfp == GFP_DMA) + if (prev == NULL) { + if (IS_ENABLED(CONFIG_ZONE_DMA32) && (gfp & GFP_DMA32)) + return atomic_pool_dma32; + if (IS_ENABLED(CONFIG_ZONE_DMA) && (gfp & GFP_DMA)) + return atomic_pool_dma; + return atomic_pool_kernel; + } + if (prev == atomic_pool_kernel) + return atomic_pool_dma32 ? atomic_pool_dma32 : atomic_pool_dma; + if (prev == atomic_pool_dma32) return atomic_pool_dma; - if (IS_ENABLED(CONFIG_ZONE_DMA32) && gfp == GFP_DMA32) - return atomic_pool_dma32; - return atomic_pool_kernel; + return NULL; }
-static inline struct gen_pool *dma_get_safer_pool(struct gen_pool *bad_pool) +static struct page *__dma_alloc_from_pool(struct device *dev, size_t size, + struct gen_pool *pool, void **cpu_addr, + bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t)) { - if (bad_pool == atomic_pool_kernel) - return atomic_pool_dma32 ? : atomic_pool_dma; + unsigned long addr; + phys_addr_t phys;
- if (bad_pool == atomic_pool_dma32) - return atomic_pool_dma; + addr = gen_pool_alloc(pool, size); + if (!addr) + return NULL;
- return NULL; -} + phys = gen_pool_virt_to_phys(pool, addr); + if (phys_addr_ok && !phys_addr_ok(dev, phys, size)) { + gen_pool_free(pool, addr, size); + return NULL; + }
-static inline struct gen_pool *dma_guess_pool(struct device *dev, - struct gen_pool *bad_pool) -{ - if (bad_pool) - return dma_get_safer_pool(bad_pool); + if (gen_pool_avail(pool) < atomic_pool_size) + schedule_work(&atomic_pool_work);
- return dma_guess_pool_from_device(dev); + *cpu_addr = (void *)addr; + memset(*cpu_addr, 0, size); + return pfn_to_page(__phys_to_pfn(phys)); }
-void *dma_alloc_from_pool(struct device *dev, size_t size, - struct page **ret_page, gfp_t flags) +struct page *dma_alloc_from_pool(struct device *dev, size_t size, + void **cpu_addr, gfp_t gfp, + bool (*phys_addr_ok)(struct device *, phys_addr_t, size_t)) { struct gen_pool *pool = NULL; - unsigned long val = 0; - void *ptr = NULL; - phys_addr_t phys; - - while (1) { - pool = dma_guess_pool(dev, pool); - if (!pool) { - WARN(1, "Failed to get suitable pool for %s\n", - dev_name(dev)); - break; - } - - val = gen_pool_alloc(pool, size); - if (!val) - continue; - - phys = gen_pool_virt_to_phys(pool, val); - if (dma_coherent_ok(dev, phys, size)) - break; - - gen_pool_free(pool, val, size); - val = 0; - } - - - if (val) { - *ret_page = pfn_to_page(__phys_to_pfn(phys)); - ptr = (void *)val; - memset(ptr, 0, size); + struct page *page;
- if (gen_pool_avail(pool) < atomic_pool_size) - schedule_work(&atomic_pool_work); + while ((pool = dma_guess_pool(pool, gfp))) { + page = __dma_alloc_from_pool(dev, size, pool, cpu_addr, + phys_addr_ok); + if (page) + return page; }
- return ptr; + WARN(1, "Failed to get suitable pool for %s\n", dev_name(dev)); + return NULL; }
bool dma_free_from_pool(struct device *dev, void *start, size_t size) { struct gen_pool *pool = NULL;
- while (1) { - pool = dma_guess_pool(dev, pool); - if (!pool) - return false; - - if (gen_pool_has_addr(pool, (unsigned long)start, size)) { - gen_pool_free(pool, (unsigned long)start, size); - return true; - } + while ((pool = dma_guess_pool(pool, 0))) { + if (!gen_pool_has_addr(pool, (unsigned long)start, size)) + continue; + gen_pool_free(pool, (unsigned long)start, size); + return true; } + + return false; }
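For readers following the flattened hunks above, here is a minimal sketch of the reworked dma_alloc_from_pool() calling convention. The wrapper below (example_pool_alloc()) is hypothetical and exists only to show the call shape; the real callers touched by this patch are dma_direct_alloc_pages() and the dma-iommu allocation path.

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * Illustration only: dma_alloc_from_pool() now returns the backing
 * struct page, hands the zeroed kernel address back through @cpu_addr,
 * and takes an optional callback so a caller can reject pool pages the
 * device cannot address (in which case the next pool is tried).
 */
static void *example_pool_alloc(struct device *dev, size_t size)
{
        void *cpu_addr;
        struct page *page;

        /*
         * dma-direct passes dma_coherent_ok() as the filter; dma-iommu
         * passes NULL because the IOMMU can map any page.
         */
        page = dma_alloc_from_pool(dev, PAGE_ALIGN(size), &cpu_addr,
                                   GFP_ATOMIC, NULL);
        if (!page)
                return NULL;

        return cpu_addr;
}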
From: Nicolas Saenz Julienne nsaenzjulienne@suse.de
upstream 8b5369ea580964dbc982781bfb9fb93459fc5e8d commit.
Some architectures, notably ARM, are interested in tweaking this depending on their runtime DMA addressing limitations.
Acked-by: Christoph Hellwig hch@lst.de Signed-off-by: Nicolas Saenz Julienne nsaenzjulienne@suse.de Signed-off-by: Catalin Marinas catalin.marinas@arm.com Change-Id: I890f2bfbbf5758e3868acddd7bba6f655ec2b357 Signed-off-by: Peter Gonda pgonda@google.com --- arch/arm64/mm/init.c | 9 ++++++++- arch/powerpc/include/asm/page.h | 9 --------- arch/powerpc/mm/mem.c | 20 +++++++++++++++----- arch/s390/include/asm/page.h | 2 -- arch/s390/mm/init.c | 1 + include/linux/dma-direct.h | 2 ++ kernel/dma/direct.c | 13 ++++++------- 7 files changed, 32 insertions(+), 24 deletions(-)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 45c00a54909c..214cedc9271c 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -20,6 +20,7 @@ #include <linux/sort.h> #include <linux/of.h> #include <linux/of_fdt.h> +#include <linux/dma-direct.h> #include <linux/dma-mapping.h> #include <linux/dma-contiguous.h> #include <linux/efi.h> @@ -41,6 +42,8 @@ #include <asm/tlb.h> #include <asm/alternative.h>
+#define ARM64_ZONE_DMA_BITS 30 + /* * We need to be able to catch inadvertent references to memstart_addr * that occur (potentially in generic code) before arm64_memblock_init() @@ -418,7 +421,11 @@ void __init arm64_memblock_init(void)
early_init_fdt_scan_reserved_mem();
- /* 4GB maximum for 32-bit only capable devices */ + if (IS_ENABLED(CONFIG_ZONE_DMA)) { + zone_dma_bits = ARM64_ZONE_DMA_BITS; + arm64_dma_phys_limit = max_zone_phys(ARM64_ZONE_DMA_BITS); + } + if (IS_ENABLED(CONFIG_ZONE_DMA32)) arm64_dma_phys_limit = max_zone_dma_phys(); else diff --git a/arch/powerpc/include/asm/page.h b/arch/powerpc/include/asm/page.h index 6ba5adb96a3b..d568ce08e3b2 100644 --- a/arch/powerpc/include/asm/page.h +++ b/arch/powerpc/include/asm/page.h @@ -334,13 +334,4 @@ struct vm_area_struct; #endif /* __ASSEMBLY__ */ #include <asm/slice.h>
-/* - * Allow 30-bit DMA for very limited Broadcom wifi chips on many powerbooks. - */ -#ifdef CONFIG_PPC32 -#define ARCH_ZONE_DMA_BITS 30 -#else -#define ARCH_ZONE_DMA_BITS 31 -#endif - #endif /* _ASM_POWERPC_PAGE_H */ diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 96ca90ce0264..3b99b6b67fb5 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -31,6 +31,7 @@ #include <linux/slab.h> #include <linux/vmalloc.h> #include <linux/memremap.h> +#include <linux/dma-direct.h>
#include <asm/pgalloc.h> #include <asm/prom.h> @@ -223,10 +224,10 @@ static int __init mark_nonram_nosave(void) * everything else. GFP_DMA32 page allocations automatically fall back to * ZONE_DMA. * - * By using 31-bit unconditionally, we can exploit ARCH_ZONE_DMA_BITS to - * inform the generic DMA mapping code. 32-bit only devices (if not handled - * by an IOMMU anyway) will take a first dip into ZONE_NORMAL and get - * otherwise served by ZONE_DMA. + * By using 31-bit unconditionally, we can exploit zone_dma_bits to inform the + * generic DMA mapping code. 32-bit only devices (if not handled by an IOMMU + * anyway) will take a first dip into ZONE_NORMAL and get otherwise served by + * ZONE_DMA. */ static unsigned long max_zone_pfns[MAX_NR_ZONES];
@@ -259,9 +260,18 @@ void __init paging_init(void) printk(KERN_DEBUG "Memory hole size: %ldMB\n", (long int)((top_of_ram - total_ram) >> 20));
+ /* + * Allow 30-bit DMA for very limited Broadcom wifi chips on many + * powerbooks. + */ + if (IS_ENABLED(CONFIG_PPC32)) + zone_dma_bits = 30; + else + zone_dma_bits = 31; + #ifdef CONFIG_ZONE_DMA max_zone_pfns[ZONE_DMA] = min(max_low_pfn, - 1UL << (ARCH_ZONE_DMA_BITS - PAGE_SHIFT)); + 1UL << (zone_dma_bits - PAGE_SHIFT)); #endif max_zone_pfns[ZONE_NORMAL] = max_low_pfn; #ifdef CONFIG_HIGHMEM diff --git a/arch/s390/include/asm/page.h b/arch/s390/include/asm/page.h index e399102367af..1019efd85b9d 100644 --- a/arch/s390/include/asm/page.h +++ b/arch/s390/include/asm/page.h @@ -179,8 +179,6 @@ static inline int devmem_is_allowed(unsigned long pfn) #define VM_DATA_DEFAULT_FLAGS (VM_READ | VM_WRITE | \ VM_MAYREAD | VM_MAYWRITE | VM_MAYEXEC)
-#define ARCH_ZONE_DMA_BITS 31 - #include <asm-generic/memory_model.h> #include <asm-generic/getorder.h>
diff --git a/arch/s390/mm/init.c b/arch/s390/mm/init.c index c1d96e588152..ac44bd76db4b 100644 --- a/arch/s390/mm/init.c +++ b/arch/s390/mm/init.c @@ -118,6 +118,7 @@ void __init paging_init(void)
sparse_memory_present_with_active_regions(MAX_NUMNODES); sparse_init(); + zone_dma_bits = 31; memset(max_zone_pfns, 0, sizeof(max_zone_pfns)); max_zone_pfns[ZONE_DMA] = PFN_DOWN(MAX_DMA_ADDRESS); max_zone_pfns[ZONE_NORMAL] = max_low_pfn; diff --git a/include/linux/dma-direct.h b/include/linux/dma-direct.h index 6db863c3eb93..f3b276242f2d 100644 --- a/include/linux/dma-direct.h +++ b/include/linux/dma-direct.h @@ -6,6 +6,8 @@ #include <linux/memblock.h> /* for min_low_pfn */ #include <linux/mem_encrypt.h>
+extern unsigned int zone_dma_bits; + static inline dma_addr_t phys_to_dma(struct device *dev, phys_addr_t paddr);
#ifdef CONFIG_ARCH_HAS_PHYS_TO_DMA diff --git a/kernel/dma/direct.c b/kernel/dma/direct.c index 71be82b07743..2af418b4b6f9 100644 --- a/kernel/dma/direct.c +++ b/kernel/dma/direct.c @@ -17,12 +17,11 @@ #include <linux/swiotlb.h>
/* - * Most architectures use ZONE_DMA for the first 16 Megabytes, but - * some use it for entirely different regions: + * Most architectures use ZONE_DMA for the first 16 Megabytes, but some use it + * it for entirely different regions. In that case the arch code needs to + * override the variable below for dma-direct to work properly. */ -#ifndef ARCH_ZONE_DMA_BITS -#define ARCH_ZONE_DMA_BITS 24 -#endif +unsigned int zone_dma_bits __ro_after_init = 24;
static void report_addr(struct device *dev, dma_addr_t dma_addr, size_t size) { @@ -77,7 +76,7 @@ gfp_t dma_direct_optimal_gfp_mask(struct device *dev, u64 dma_mask, * Note that GFP_DMA32 and GFP_DMA are no ops without the corresponding * zones. */ - if (*phys_mask <= DMA_BIT_MASK(ARCH_ZONE_DMA_BITS)) + if (*phys_mask <= DMA_BIT_MASK(zone_dma_bits)) return GFP_DMA; if (*phys_mask <= DMA_BIT_MASK(32)) return GFP_DMA32; @@ -547,7 +546,7 @@ int dma_direct_supported(struct device *dev, u64 mask) u64 min_mask;
if (IS_ENABLED(CONFIG_ZONE_DMA)) - min_mask = DMA_BIT_MASK(ARCH_ZONE_DMA_BITS); + min_mask = DMA_BIT_MASK(zone_dma_bits); else min_mask = DMA_BIT_MASK(32);
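As a rough sketch of the pattern this patch enables (the hook below is hypothetical; arm64, powerpc and s390 do the equivalent in the hunks above): an architecture overrides the 24-bit default early in boot, before any DMA allocations, and dma-direct then consults zone_dma_bits when choosing between GFP_DMA, GFP_DMA32 and normal allocations.

#include <linux/init.h>
#include <linux/dma-direct.h>   /* declares: extern unsigned int zone_dma_bits; */

/*
 * Hypothetical early-boot hook for an architecture whose DMA-limited
 * devices can only address the first 1 GiB of memory.
 */
void __init example_arch_zone_dma_init(void)
{
        if (IS_ENABLED(CONFIG_ZONE_DMA))
                zone_dma_bits = 30;
}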
On Fri, Sep 25, 2020 at 09:18:46AM -0700, Peter Gonda wrote:
Currently SEV enabled guests hit might_sleep() warnings when a driver (nvme in this case) allocates through the DMA API in a non-blockable context. Making these unencrypted non-blocking DMA allocations come from the coherent pools prevents this BUG.
This is a pretty big set of changes for a stable tree. Can I get some acks/reviews/whatever by the developers involved here that this really is a good idea to take into the 5.4.y tree?
thanks,
greg k-h
We realize this is a large set of changes, but it's the only way for us to remove that customer-facing BUG for SEV guests. When David asked back in May, a full series was preferred; is there another way forward? I've CC'd the authors of the changes.
On Mon, Oct 5, 2020 at 7:06 AM Greg KH greg@kroah.com wrote:
This is a pretty big set of changes for a stable tree. Can I get some acks/reviews/whatever by the developers involved here that this really is a good idea to take into the 5.4.y tree?
thanks,
greg k-h
Thanks Peter.
The series of commits certainly expanded from my initial set that I asked about in a thread with the subject "DMA API stable backports for AMD SEV" on May 19. Turns out that switching how DMA memory is allocated based on various characteristics of the allocation and device is trickier than originally thought :) There were a number of fixes that were needed for subtleties and corner cases that folks ran into, but they were addressed and have been merged by Linus. I believe it's stable in upstream and that we've been thorough in compiling a full set of changes that are required for 5.4.
Note that without this series, all SEV-enabled guests will run into the "sleeping function called from invalid context" issue in the vmalloc layer that Peter cites when using certain drivers. For such configurations, there is no way to avoid the "BUG" messages in the guest kernel when using AMD SEV unless this series is merged into an LTS kernel that the distros will then pick up.
For my 13 patches in the 30 patch series, I fully stand by Peter's backports and rationale for merge into 5.4 LTS.
On Tue, 6 Oct 2020, Peter Gonda wrote:
We realize this is a large set of changes, but it's the only way for us to remove that customer-facing BUG for SEV guests. When David asked back in May, a full series was preferred; is there another way forward? I've CC'd the authors of the changes.
On Mon, Oct 5, 2020 at 7:06 AM Greg KH greg@kroah.com wrote:
This is a pretty big set of changes for a stable tree. Can I get some acks/reviews/whatever by the developers involved here that this really is a good idea to take into the 5.4.y tree?
thanks,
greg k-h
On Tue, Oct 06, 2020 at 11:10:41AM -0700, David Rientjes wrote:
For my 13 patches in the 30 patch series, I fully stand by Peter's backports and rationale for merge into 5.4 LTS.
Given that this "feature" has never worked in the 5.4 or older kernels, why should this be backported there? This isn't a bugfix from what I can tell, is it? And if so, what kernel version did work properly?
And if someone really wants this new feature, why can't they just use a newer kernel release?
thanks,
greg k-h
On Mon, 4 Jan 2021, Greg KH wrote:
Given that this "feature" has never worked in the 5.4 or older kernels, why should this be backported there? This isn't a bugfix from what I can tell, is it? And if so, what kernel version did work properly?
I think it can be considered a bug fix.
Today, if you boot an SEV encrypted guest running 5.4 and it requires atomic DMA allocations, you'll get the "sleeping function called from invalid context" bugs. We see this in our Cloud because there is a reliance on atomic allocations through the DMA API by the NVMe driver. Likely nobody else has triggered this because they don't have such driver dependencies.
No previous kernel version worked properly since SEV guest support was introduced in 4.14.
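To make the failure concrete, here is a rough sketch of the allocation pattern that trips the warning, loosely modeled on the nvme_queue_rq() path in the trace above. The queue structure and names are invented; only the pattern of an atomic-context dma_pool_alloc() matters.

#include <linux/dmapool.h>
#include <linux/errno.h>
#include <linux/gfp.h>
#include <linux/spinlock.h>
#include <linux/types.h>

struct example_queue {                  /* hypothetical driver queue */
        spinlock_t lock;
        struct dma_pool *prp_pool;
};

static int example_queue_rq(struct example_queue *q)
{
        dma_addr_t dma;
        void *prp;

        spin_lock(&q->lock);                    /* atomic context */
        prp = dma_pool_alloc(q->prp_pool, GFP_ATOMIC, &dma);
        /*
         * If the dma_pool has to grow here, the allocation falls through
         * to dma_direct_alloc(); on an SEV guest that calls
         * set_memory_decrypted() -> vm_unmap_aliases(), which may sleep,
         * hence the might_sleep() splat.
         */
        spin_unlock(&q->lock);

        if (!prp)
                return -ENOMEM;
        dma_pool_free(q->prp_pool, prp, dma);   /* freed again; sketch only */
        return 0;
}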
And if someone really wants this new feature, why can't they just use a newer kernel release?
This is more of a product question that I'll defer to Peter and he can loop the necessary people in if required.
Since the SEV feature provides confidentiality for guest-managed memory, running an unmodified guest versus a guest the cloud provider has modified to avoid these bugs is a very different experience from the perspective of the customer trying to protect their data.
These customers are running standard distros that may be slow to upgrade to new kernels released over the past few months. We could certainly work with the distros to backport this support directly to them on a case-by-case basis, but the thought was to first attempt to fix this in 5.4 stable for everybody and allow them to receive the fixes necessary for running a non-buggy SEV encrypted guest that way vs multiple distros doing the backport so they can run with SEV.
On Mon, Jan 04, 2021 at 02:37:00PM -0800, David Rientjes wrote:
I think it can be considered a bug fix.
Today, if you boot an SEV encrypted guest running 5.4 and it requires atomic DMA allocations, you'll get the "sleeping function called from invalid context" bugs. We see this in our Cloud because there is a reliance on atomic allocations through the DMA API by the NVMe driver. Likely nobody else has triggered this because they don't have such driver dependencies.
No previous kernel version worked properly since SEV guest support was introduced in 4.14.
So since this has never worked, it is not a regression that is being fixed, but rather, a "new feature". And because of that, if you want it to work properly, please use a new kernel that has all of these major changes in it.
And if someone really wants this new feature, why can't they just use a newer kernel release?
This is more of a product question that I'll defer to Peter and he can loop the necessary people in if required.
If you want to make a "product" of a new feature, using an old kernel base, then yes, you have to backport this and you are on your own here. That's just totally normal for all "products" that do not want to use the latest kernel release.
Since the SEV feature provides confidentiality for guest-managed memory, running an unmodified guest versus a guest the cloud provider has modified to avoid these bugs is a very different experience from the perspective of the customer trying to protect their data.
These customers are running standard distros that may be slow to upgrade to new kernels released over the past few months. We could certainly work with the distros to backport this support directly to them on a case-by-case basis, but the thought was to first attempt to fix this in 5.4 stable for everybody and allow them to receive the fixes necessary for running a non-buggy SEV encrypted guest that way vs multiple distros doing the backport so they can run with SEV.
What distro that is based on 5.4 that follows the upstream stable trees have not already included these patches in their releases? And what prevents them from using a newer kernel release entirely for this new feature their customers are requesting?
thanks,
greg k-h
On Tue, 5 Jan 2021, Greg KH wrote:
So since this has never worked, it is not a regression that is being fixed, but rather, a "new feature". And because of that, if you want it to work properly, please use a new kernel that has all of these major changes in it.
Hmm, maybe :) AMD shipped guest support in 4.14 and host support in 4.16 for the SEV feature. It turns out that a subset of drivers (for Google, NVMe) would run into scheduling-while-atomic bugs because they do GFP_ATOMIC allocations through the DMA API, and for SEV that uses set_memory_decrypted(), which can block. I'd argue that's a bug in the SEV feature for a subset of configs.
So this never worked correctly for a subset of drivers until I added atomic DMA pools in 5.7, which was the preferred way of fixing it. But SEV as a feature works for everybody not using this subset of drivers. I wouldn't say that the fix is a "new feature" because it's the means by which we provide unencrypted DMA memory for atomic allocators that can't make the transition from encrypted to unencrypted during allocation because of their context; it specifically addresses the bug.
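A condensed sketch of that fix, following the names used in kernel/dma/direct.c after the series (simplified to the SEV case; the real dma_direct_alloc_pages() also handles remapping, zone fallback and error unwinding, and dma_coherent_ok() is static to that file after this series):

/* Sketch only: assumes the helpers visible inside kernel/dma/direct.c. */
static void *sketch_dma_direct_alloc(struct device *dev, size_t size,
                                     gfp_t gfp)
{
        struct page *page;
        void *ret;

        if (force_dma_unencrypted(dev) && !gfpflags_allow_blocking(gfp)) {
                /*
                 * Atomic allocation that must be unencrypted: hand out
                 * memory that was already set_memory_decrypted() when the
                 * atomic pool was filled (or refilled from a workqueue),
                 * so nothing on this path can sleep.
                 */
                page = dma_alloc_from_pool(dev, size, &ret, gfp,
                                           dma_coherent_ok);
                return page ? ret : NULL;
        }

        /*
         * Blocking context: allocate normally and set_memory_decrypted()
         * the fresh pages, which may sleep.
         */
        return NULL;            /* regular allocation path elided */
}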
What distro that is based on 5.4 that follows the upstream stable trees have not already included these patches in their releases? And what prevents them from using a newer kernel release entirely for this new feature their customers are requesting?
I'll defer this to Peter who would have a far better understanding of the base kernel versions that our customers use with SEV.
Thanks
Reviving this thread.
This is reproducible on 5.4 LTS guests as long as CONFIG_DEBUG_ATOMIC_SLEEP is set. I am not sure whether this can be classified as a regression, since I think this issue would have shown up on all SEV guests using NVMe boot disks since the start of SEV. So it seems like a regression to me.
Currently I know of Ubuntu 20.04.1 being based on 5.4. I have also sent this backport to 4.19, which is close to the 4.18 that RHEL 8 is based on. I do not know what prevents the distros from using a more recent kernel, but that seems like their choice. I was just hoping to get these backports submitted into LTS to show the issue and help any distro wanting to backport these changes to prevent it. Greg, is there more information I can help provide as justification?