From: Ira Weiny ira.weiny@intel.com
Should a stray write in the kernel occur, persistent memory is affected more than regular memory. A write to the wrong area of memory could result in latent data corruption which will persist after a reboot. PKS provides a nice way to restrict access to persistent memory kernel mappings, while providing fast access when needed.
Since the last RFC[1] this patch set has grown quite a bit. It now depends on the core patches submitted separately.
https://lore.kernel.org/lkml/20201009194258.3207172-1-ira.weiny@intel.com/
And contained in the git tree here:
https://github.com/weiny2/linux-kernel/tree/pks-rfc-v3
However, functionally there is only 1 major change from the last RFC. Specifically, kmap() is most often used within a single thread in a 'map/do something/unmap' pattern. In fact this is the pattern used in ~90% of the callers of kmap(). This pattern works very well for the pmem use case and the testing which was done. However, there were another ~20-30 kmap users which do not follow this pattern. Some of them seem to expect the mapping to be 'global' while others require a detailed audit to be sure.[2][3]
While we don't anticipate global mappings to pmem, there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why.
There were a number of options considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping should
   be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they
   require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping
   is to be used within that thread of execution only
Option 1 is simply not feasible: kmap_atomic() does not have the same semantics as kmap() within a single thread. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior the developer intended. Therefore, option #4 was chosen.
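For illustration, here is what the conversion of a typical thread-local call site looks like (the page, buf, offset, and len variables are hypothetical; only kmap_thread()/kunmap_thread() come from this series):

    /* Before: thread-local use through the generic interface */
    addr = kmap(page);
    memcpy(buf, addr + offset, len);
    kunmap(page);

    /* After: the mapping is declared thread-local */
    addr = kmap_thread(page);
    memcpy(buf, addr + offset, len);
    kunmap_thread(page);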
To handle the global PKRS state in the most efficient manner possible, we lazily override the thread-specific PKRS key value only when needed, because we anticipate PKS will not be needed most of the time. And even when it is used, ~90% of the time it is a thread-local call.
[1] https://lore.kernel.org/lkml/20200717072056.73134-1-ira.weiny@intel.com/
[2] The following list of callers continue calling kmap() (utilizing the global PKRS). It would be nice if more of them could be converted to kmap_thread()
drivers/firewire/net.c:	ptr = kmap(dev->broadcast_rcv_buffer.pages[u]);
drivers/gpu/drm/i915/gem/i915_gem_pages.c:	return kmap(sg_page(sgt->sgl));
drivers/gpu/drm/ttm/ttm_bo_util.c:	map->virtual = kmap(map->page);
drivers/infiniband/hw/qib/qib_user_sdma.c:	mpage = kmap(page);
drivers/misc/vmw_vmci/vmci_host.c:	context->notify = kmap(context->notify_page) + (uva & (PAGE_SIZE - 1));
drivers/misc/xilinx_sdfec.c:	addr = kmap(pages[i]);
drivers/mmc/host/usdhi6rol0.c:	host->pg.mapped = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c:	host->pg.mapped = kmap(host->pg.page);
drivers/mmc/host/usdhi6rol0.c:	host->pg.mapped = kmap(host->pg.page);
drivers/nvme/target/tcp.c:	iov->iov_base = kmap(sg_page(sg)) + sg->offset + sg_offset;
drivers/scsi/libiscsi_tcp.c:	segment->sg_mapped = kmap(sg_page(sg));
drivers/target/iscsi/iscsi_target.c:	iov[i].iov_base = kmap(sg_page(sg)) + sg->offset + page_off;
drivers/target/target_core_transport.c:	return kmap(sg_page(sg)) + sg->offset;
fs/btrfs/check-integrity.c:	block_ctx->datav[i] = kmap(block_ctx->pagev[i]);
fs/ceph/dir.c:	cache_ctl->dentries = kmap(cache_ctl->page);
fs/ceph/inode.c:	ctl->dentries = kmap(ctl->page);
fs/erofs/zpvec.h:	kmap_atomic(ctor->curr) : kmap(ctor->curr);
lib/scatterlist.c:	miter->addr = kmap(miter->page) + miter->__offset;
net/ceph/pagelist.c:	pl->mapped_tail = kmap(page);
net/ceph/pagelist.c:	pl->mapped_tail = kmap(page);
virt/kvm/kvm_main.c:	hva = kmap(page);
[3] The following appear to follow the same pattern as ext2 which was converted after some code audit. So I _think_ they too could be converted to k[un]map_thread().
fs/freevxfs/vxfs_subr.c|75| kmap(pp);
fs/jfs/jfs_metapage.c|102| kmap(page);
fs/jfs/jfs_metapage.c|156| kmap(page);
fs/minix/dir.c|72| kmap(page);
fs/nilfs2/dir.c|195| kmap(page);
fs/nilfs2/ifile.h|24| void *kaddr = kmap(ibh->b_page);
fs/ntfs/aops.h|78| kmap(page);
fs/ntfs/compress.c|574| kmap(page);
fs/qnx6/dir.c|32| kmap(page);
fs/qnx6/dir.c|58| kmap(*p = page);
fs/qnx6/inode.c|190| kmap(page);
fs/qnx6/inode.c|557| kmap(page);
fs/reiserfs/inode.c|2397| kmap(bh_result->b_page);
fs/reiserfs/xattr.c|444| kmap(page);
fs/sysv/dir.c|60| kmap(page);
fs/sysv/dir.c|262| kmap(page);
fs/ufs/dir.c|194| kmap(page);
fs/ufs/dir.c|562| kmap(page);
Ira Weiny (58):
  x86/pks: Add a global pkrs option
  x86/pks/test: Add testing for global option
  memremap: Add zone device access protection
  kmap: Add stray access protection for device pages
  kmap: Introduce k[un]map_thread
  kmap: Introduce k[un]map_thread debugging
  drivers/drbd: Utilize new kmap_thread()
  drivers/firmware_loader: Utilize new kmap_thread()
  drivers/gpu: Utilize new kmap_thread()
  drivers/rdma: Utilize new kmap_thread()
  drivers/net: Utilize new kmap_thread()
  fs/afs: Utilize new kmap_thread()
  fs/btrfs: Utilize new kmap_thread()
  fs/cifs: Utilize new kmap_thread()
  fs/ecryptfs: Utilize new kmap_thread()
  fs/gfs2: Utilize new kmap_thread()
  fs/nilfs2: Utilize new kmap_thread()
  fs/hfs: Utilize new kmap_thread()
  fs/hfsplus: Utilize new kmap_thread()
  fs/jffs2: Utilize new kmap_thread()
  fs/nfs: Utilize new kmap_thread()
  fs/f2fs: Utilize new kmap_thread()
  fs/fuse: Utilize new kmap_thread()
  fs/freevxfs: Utilize new kmap_thread()
  fs/reiserfs: Utilize new kmap_thread()
  fs/zonefs: Utilize new kmap_thread()
  fs/ubifs: Utilize new kmap_thread()
  fs/cachefiles: Utilize new kmap_thread()
  fs/ntfs: Utilize new kmap_thread()
  fs/romfs: Utilize new kmap_thread()
  fs/vboxsf: Utilize new kmap_thread()
  fs/hostfs: Utilize new kmap_thread()
  fs/cramfs: Utilize new kmap_thread()
  fs/erofs: Utilize new kmap_thread()
  fs: Utilize new kmap_thread()
  fs/ext2: Use ext2_put_page
  fs/ext2: Utilize new kmap_thread()
  fs/isofs: Utilize new kmap_thread()
  fs/jffs2: Utilize new kmap_thread()
  net: Utilize new kmap_thread()
  drivers/target: Utilize new kmap_thread()
  drivers/scsi: Utilize new kmap_thread()
  drivers/mmc: Utilize new kmap_thread()
  drivers/xen: Utilize new kmap_thread()
  drivers/firmware: Utilize new kmap_thread()
  drives/staging: Utilize new kmap_thread()
  drivers/mtd: Utilize new kmap_thread()
  drivers/md: Utilize new kmap_thread()
  drivers/misc: Utilize new kmap_thread()
  drivers/android: Utilize new kmap_thread()
  kernel: Utilize new kmap_thread()
  mm: Utilize new kmap_thread()
  lib: Utilize new kmap_thread()
  powerpc: Utilize new kmap_thread()
  samples: Utilize new kmap_thread()
  dax: Stray access protection for dax_direct_access()
  nvdimm/pmem: Stray access protection for pmem->virt_addr
  [dax|pmem]: Enable stray access protection
Documentation/core-api/protection-keys.rst | 11 +- arch/powerpc/mm/mem.c | 4 +- arch/x86/entry/common.c | 28 +++ arch/x86/include/asm/pkeys.h | 6 +- arch/x86/include/asm/pkeys_common.h | 8 +- arch/x86/kernel/process.c | 74 ++++++- arch/x86/mm/fault.c | 193 ++++++++++++++---- arch/x86/mm/pkeys.c | 88 ++++++-- drivers/android/binder_alloc.c | 4 +- drivers/base/firmware_loader/fallback.c | 4 +- drivers/base/firmware_loader/main.c | 4 +- drivers/block/drbd/drbd_main.c | 4 +- drivers/block/drbd/drbd_receiver.c | 12 +- drivers/dax/device.c | 2 + drivers/dax/super.c | 2 + drivers/firmware/efi/capsule-loader.c | 6 +- drivers/firmware/efi/capsule.c | 4 +- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 +- drivers/gpu/drm/gma500/gma_display.c | 4 +- drivers/gpu/drm/gma500/mmu.c | 10 +- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 +- .../drm/i915/gem/selftests/i915_gem_context.c | 4 +- .../drm/i915/gem/selftests/i915_gem_mman.c | 8 +- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 +- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 +- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 +- drivers/gpu/drm/i915/i915_gem.c | 8 +- drivers/gpu/drm/i915/i915_gpu_error.c | 4 +- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 +- drivers/gpu/drm/radeon/radeon_ttm.c | 4 +- drivers/infiniband/hw/hfi1/sdma.c | 4 +- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +- drivers/md/bcache/request.c | 4 +- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 +- drivers/mmc/host/mmc_spi.c | 4 +- drivers/mmc/host/sdricoh_cs.c | 4 +- drivers/mtd/mtd_blkdevs.c | 12 +- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 +- .../net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 +- drivers/nvdimm/pmem.c | 6 + drivers/scsi/ipr.c | 8 +- drivers/scsi/pmcraid.c | 8 +- drivers/staging/rts5208/rtsx_transport.c | 4 +- drivers/target/target_core_iblock.c | 4 +- drivers/target/target_core_rd.c | 4 +- drivers/target/target_core_transport.c | 4 +- drivers/xen/gntalloc.c | 4 +- fs/afs/dir.c | 16 +- fs/afs/dir_edit.c | 16 +- fs/afs/mntpt.c | 4 +- fs/afs/write.c | 4 +- fs/aio.c | 4 +- fs/binfmt_elf.c | 4 +- fs/binfmt_elf_fdpic.c | 4 +- fs/btrfs/check-integrity.c | 4 +- fs/btrfs/compression.c | 4 +- fs/btrfs/inode.c | 16 +- fs/btrfs/lzo.c | 24 +-- fs/btrfs/raid56.c | 34 +-- fs/btrfs/reflink.c | 8 +- fs/btrfs/send.c | 4 +- fs/btrfs/zlib.c | 32 +-- fs/btrfs/zstd.c | 20 +- fs/cachefiles/rdwr.c | 4 +- fs/cifs/cifsencrypt.c | 6 +- fs/cifs/file.c | 16 +- fs/cifs/smb2ops.c | 8 +- fs/cramfs/inode.c | 10 +- fs/ecryptfs/crypto.c | 8 +- fs/ecryptfs/read_write.c | 8 +- fs/erofs/super.c | 4 +- fs/erofs/xattr.c | 4 +- fs/exec.c | 10 +- fs/ext2/dir.c | 8 +- fs/ext2/ext2.h | 8 + fs/ext2/namei.c | 15 +- fs/f2fs/f2fs.h | 8 +- fs/freevxfs/vxfs_immed.c | 4 +- fs/fuse/readdir.c | 4 +- fs/gfs2/bmap.c | 4 +- fs/gfs2/ops_fstype.c | 4 +- fs/hfs/bnode.c | 14 +- fs/hfs/btree.c | 20 +- fs/hfsplus/bitmap.c | 20 +- fs/hfsplus/bnode.c | 102 ++++----- fs/hfsplus/btree.c | 18 +- fs/hostfs/hostfs_kern.c | 12 +- fs/io_uring.c | 4 +- fs/isofs/compress.c | 4 +- fs/jffs2/file.c | 8 +- fs/jffs2/gc.c | 4 +- fs/nfs/dir.c | 20 +- fs/nilfs2/alloc.c | 34 +-- fs/nilfs2/cpfile.c | 4 +- fs/ntfs/aops.c | 4 +- fs/reiserfs/journal.c | 4 +- fs/romfs/super.c | 4 +- fs/splice.c | 4 +- fs/ubifs/file.c | 16 +- fs/vboxsf/file.c | 12 +- fs/zonefs/super.c | 4 +- include/linux/entry-common.h | 3 + include/linux/highmem.h | 63 +++++- include/linux/memremap.h | 1 + include/linux/mm.h | 43 ++++ include/linux/pkeys.h | 6 +- include/linux/sched.h | 8 + include/trace/events/kmap_thread.h | 56 +++++ 
init/init_task.c | 6 + kernel/fork.c | 18 ++ kernel/kexec_core.c | 8 +- lib/Kconfig.debug | 8 + lib/iov_iter.c | 12 +- lib/pks/pks_test.c | 138 +++++++++++-- lib/test_bpf.c | 4 +- lib/test_hmm.c | 8 +- mm/Kconfig | 13 ++ mm/debug.c | 23 +++ mm/memory.c | 8 +- mm/memremap.c | 90 ++++++++ mm/swapfile.c | 4 +- mm/userfaultfd.c | 4 +- net/ceph/messenger.c | 4 +- net/core/datagram.c | 4 +- net/core/sock.c | 8 +- net/ipv4/ip_output.c | 4 +- net/sunrpc/cache.c | 4 +- net/sunrpc/xdr.c | 8 +- net/tls/tls_device.c | 4 +- samples/vfio-mdev/mbochs.c | 4 +- 131 files changed, 1284 insertions(+), 565 deletions(-) create mode 100644 include/trace/events/kmap_thread.h
From: Ira Weiny ira.weiny@intel.com
Some users, such as kmap(), sometimes require PKS to be global. However, updating all CPUs, and worse yet all threads, is expensive.
Introduce a global PKRS state which is checked at critical times to enable access when global PKS is required. To accomplish this with minimal locking, the code is carefully designed around the following key concepts.
1) Borrow the idea of lazy TLB invalidations from the fault handler code. When enabling PKS access we anticipate that other threads are not yet running. However, if they are we catch the fault and clean up the MSR value.
2) When disabling PKS access we force the MSR values across all CPUs. This is required to block access as soon as possible.[1] However, it is key that we never attempt to update the per-task PKS values directly. See the next point.
3) Per-task PKS values never get updated with global PKS values. This is key to prevent locking requirements and a nearly intractable problem of trying to update every task in the system. Here are a few key points.
3a) The MSR value can be updated with the global PKS value if that global value happened to change while the task was running.
3b) If the task was sleeping while the global PKS was updated then the global value is added in when tasks are scheduled.
3c) If the global PKS value restricts access the MSR is updated as soon as possible[1] and the thread value is not updated which ensures the thread does not retain the elevated privileges after a context switch.
4) Follow on patches must be careful to preserve the separation of the thread PKRS value and the MSR value.
5) Access Disable on any individual pkey is turned into (Access Disable | Write Disable) to facilitate faster integration of the global value into the thread local MSR through a simple '&' operation. Doing otherwise would result in complicated individual bit manipulation for each pkey.
[1] There is a race condition which is ignored for performance reasons. It potentially allows a thread access until the end of its time slice. After the context switch the global value will be restored.
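To illustrate point 5 with a minimal sketch (the pkey values below are hypothetical; the merge itself is what pks_sched_in() does in this patch):

    /*
     * Sketch of the context switch merge; values are hypothetical.
     *
     * thread value for pkey 1: 11b (Access Disable | Write Disable)
     * global value for pkey 1: 00b (read/write override)
     *
     * 11b & 00b == 00b, so the scheduled thread picks up the global
     * read/write access with a single AND.  Had the API allowed 01b for
     * "no access", 01b & 10b would yield 00b (read/write) instead of the
     * desired 10b (read only), hence no access is forced to 11b.
     */
    u32 msr_val = current->thread.saved_pkrs & pkrs_global_cache;

    write_pkrs(msr_val);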
Signed-off-by: Ira Weiny ira.weiny@intel.com
---
 Documentation/core-api/protection-keys.rst |  11 +-
 arch/x86/entry/common.c                    |   7 +
 arch/x86/include/asm/pkeys.h               |   6 +-
 arch/x86/include/asm/pkeys_common.h        |   8 +-
 arch/x86/kernel/process.c                  |  74 +++++++-
 arch/x86/mm/fault.c                        | 189 ++++++++++++++----
 arch/x86/mm/pkeys.c                        |  88 ++++++++--
 include/linux/pkeys.h                      |   6 +-
 lib/pks/pks_test.c                         |  16 +-
 9 files changed, 329 insertions(+), 76 deletions(-)
diff --git a/Documentation/core-api/protection-keys.rst b/Documentation/core-api/protection-keys.rst index c60366921d60..9e8a98653e13 100644 --- a/Documentation/core-api/protection-keys.rst +++ b/Documentation/core-api/protection-keys.rst @@ -121,9 +121,9 @@ mapping adds that mapping to the protection domain. int pks_key_alloc(const char * const pkey_user); #define PAGE_KERNEL_PKEY(pkey) #define _PAGE_KEY(pkey) - void pks_mknoaccess(int pkey); - void pks_mkread(int pkey); - void pks_mkrdwr(int pkey); + void pks_mknoaccess(int pkey, bool global); + void pks_mkread(int pkey, bool global); + void pks_mkrdwr(int pkey, bool global); void pks_key_free(int pkey);
pks_key_alloc() allocates keys dynamically to allow better use of the limited @@ -141,7 +141,10 @@ _PAGE_KEY(). The pks_mk*() family of calls allows kernel users the ability to change the protections for the domain identified by the pkey specified. 3 states are available pks_mknoaccess(), pks_mkread(), and pks_mkrdwr() which set the access -to none, read, and read/write respectively. +to none, read, and read/write respectively. 'global' specifies that the +protection should be set across all threads (logical CPU's) not just the +current running thread/CPU. This increases the overhead of PKS and lessens the +protection so it should be used sparingly.
Finally, pks_key_free() allows a user to return the key to the allocator for use by others. diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 324a8fd5ac10..86ad32e0095e 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -261,12 +261,19 @@ noinstr void idtentry_exit_nmi(struct pt_regs *regs, irqentry_state_t *irq_state * current running value and set the default PKRS value for the duration of the * exception. Thus preventing exception handlers from having the elevated * access of the interrupted task. + * + * NOTE That the thread saved PKRS must be preserved separately to ensure + * global overrides do not 'stick' on a thread. */ noinstr void irq_save_pkrs(irqentry_state_t *state) { if (!cpu_feature_enabled(X86_FEATURE_PKS)) return;
+ /* + * The thread_pkrs must be maintained separately to prevent global + * overrides from 'sticking' on a thread. + */ state->thread_pkrs = current->thread.saved_pkrs; state->pkrs = this_cpu_read(pkrs_cache); write_pkrs(INIT_PKRS_VALUE); diff --git a/arch/x86/include/asm/pkeys.h b/arch/x86/include/asm/pkeys.h index 79952216474e..cae0153a5480 100644 --- a/arch/x86/include/asm/pkeys.h +++ b/arch/x86/include/asm/pkeys.h @@ -143,9 +143,9 @@ u32 update_pkey_val(u32 pk_reg, int pkey, unsigned int flags); int pks_key_alloc(const char *const pkey_user); void pks_key_free(int pkey);
-void pks_mknoaccess(int pkey); -void pks_mkread(int pkey); -void pks_mkrdwr(int pkey); +void pks_mknoaccess(int pkey, bool global); +void pks_mkread(int pkey, bool global); +void pks_mkrdwr(int pkey, bool global);
#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
diff --git a/arch/x86/include/asm/pkeys_common.h b/arch/x86/include/asm/pkeys_common.h index 8961e2ddd6ff..e380679ba1bb 100644 --- a/arch/x86/include/asm/pkeys_common.h +++ b/arch/x86/include/asm/pkeys_common.h @@ -6,7 +6,12 @@ #define PKR_WD_BIT 0x2 #define PKR_BITS_PER_PKEY 2
-#define PKR_AD_KEY(pkey) (PKR_AD_BIT << ((pkey) * PKR_BITS_PER_PKEY)) +/* + * We must define 11b as the default to make global overrides efficient. + * See arch/x86/kernel/process.c where the global pkrs is factored in during + * context switch. + */ +#define PKR_AD_KEY(pkey) ((PKR_WD_BIT | PKR_AD_BIT) << ((pkey) * PKR_BITS_PER_PKEY))
/* * Define a default PKRS value for each task. @@ -27,6 +32,7 @@ #define PKS_NUM_KEYS 16
#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +extern u32 pkrs_global_cache; DECLARE_PER_CPU(u32, pkrs_cache); noinstr void write_pkrs(u32 new_pkrs); #else diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c index eb3a95a69392..58edd162d9cb 100644 --- a/arch/x86/kernel/process.c +++ b/arch/x86/kernel/process.c @@ -43,7 +43,7 @@ #include <asm/io_bitmap.h> #include <asm/proto.h> #include <asm/frame.h> -#include <asm/pkeys_common.h> +#include <linux/pkeys.h>
#include "process.h"
@@ -189,15 +189,83 @@ int copy_thread(unsigned long clone_flags, unsigned long sp, unsigned long arg, }
#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS -DECLARE_PER_CPU(u32, pkrs_cache); static inline void pks_init_task(struct task_struct *tsk) { /* New tasks get the most restrictive PKRS value */ tsk->thread.saved_pkrs = INIT_PKRS_VALUE; } + +extern u32 pkrs_global_cache; + +/** + * The global PKRS value can only increase access. Because 01b and 11b both + * disable access. The following truth table is our desired result for each of + * the pkeys when we add in the global permissions. + * + * 00 R/W - Write enabled (all access) + * 10 Read - write disabled (Read only) + * 01 NO Acc - access disabled + * 11 NO Acc - also access disabled + * + * local global desired required + * result operation + * 00 00 00 & + * 00 10 00 & + * 00 01 00 & + * 00 11 00 & + * + * 10 00 00 & + * 10 10 10 & + * 10 01 10 ^ special case + * 10 11 10 & + * + * 01 00 00 & + * 01 10 10 ^ special case + * 01 01 01 & + * 01 11 01 & + * + * 11 00 00 & + * 11 10 10 & + * 11 01 01 & + * 11 11 11 & + * + * In order to eliminate the need to loop through each pkey and deal with the 2 + * above special cases we force all 01b values to 11b through the API thus + * resulting in the simplified truth table below. + * + * 00 R/W - Write enabled (all access) + * 10 Read - write disabled (Read only) + * 01 NO Acc - access disabled + * (Not allowed in the API always use 11) + * 11 NO Acc - access disabled + * + * local global desired effective + * result operation + * 00 00 00 & + * 00 10 00 & + * 00 11 00 & + * 00 11 00 & + * + * 10 00 00 & + * 10 10 10 & + * 10 11 10 & + * 10 11 10 & + * + * 11 00 00 & + * 11 10 10 & + * 11 11 11 & + * 11 11 11 & + * + * 11 00 00 & + * 11 10 10 & + * 11 11 11 & + * 11 11 11 & + * + * Thus we can simply 'AND' in the global pkrs value. + */ static inline void pks_sched_in(void) { - write_pkrs(current->thread.saved_pkrs); + write_pkrs(current->thread.saved_pkrs & pkrs_global_cache); } #else static inline void pks_init_task(struct task_struct *tsk) { } diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index dd5af9399131..4b4ff9efa298 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -32,6 +32,8 @@ #include <asm/pgtable_areas.h> /* VMALLOC_START, ... */ #include <asm/kvm_para.h> /* kvm_handle_async_pf */
+#include <asm-generic/mman-common.h> + #define CREATE_TRACE_POINTS #include <asm/trace/exceptions.h>
@@ -995,9 +997,124 @@ mm_fault_error(struct pt_regs *regs, unsigned long error_code, } }
-static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) +#ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +/* + * check if we have had a 'global' pkey update. If so, handle this like a lazy + * TLB; fix up the local MSR and return + * + * See arch/x86/kernel/process.c for the explanation on how global is handled + * with a simple '&' operation. + * + * Also we don't update the current thread saved_pkrs because we don't want the + * global value to 'stick' with the thread. Rather we want this to be valid + * only for the remainder of this time slice. For subsequent time slices the + * global value will be factored in during schedule; see arch/x86/kernel/process.c + * + * Finally we have a trade off between performance and forcing a restriction of + * permissions across all CPUs on a global update. + * + * Given the following window. + * + * Global PKRS CPU #0 CPU #1 + * cache MSR MSR + * + * | | | + * Global |----------\ | | + * Restriction | ------------> read | <= T1 + * (on CPU #0) | | | | + * ------\ | | | | + * ------>| | | | + * | | | | + * Update CPU #1 |--------\ | | | + * | --------------\ | | + * | | --|------------>| + * Global remote | | | | + * MSR update | | | | + * (CPU 2-n) | | | | + * |-----> CPU's | v | + * local | (2-N) | local --\ | + * update | | update ------>|(Update <= T2 + * ----------------\ | | Incorrect) + * | -----------\ | | + * | --->|(Update OK) | + * Context | | | + * Switch |----------\ | | + * | ------------> read | + * | | | | + * | | | | + * | | v | + * | | local --\ | + * | | update ------>|(Update + * | | | Correct) + * + * We allow for a larger window of the global pkey being open because global + * updates should be rare and we don't want to burden normal faults with having + * to read the global state. + */ +static bool global_pkey_is_enabled(pte_t *pte, bool is_write, + irqentry_state_t *irq_state) +{ + u8 pkey = pte_flags_pkey(pte->pte); + int pkey_shift = pkey * PKR_BITS_PER_PKEY; + u32 mask = (((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift); + u32 global = READ_ONCE(pkrs_global_cache); + u32 val; + + /* Return early if global access is not valid */ + val = (global & mask) >> pkey_shift; + if ((val & PKR_AD_BIT) || (is_write && (val & PKR_WD_BIT))) + return false; + + irq_state->pkrs &= global; + + return true; +} + +#else /* !CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ +__always_inline bool global_pkey_is_enabled(pte_t *pte, bool is_write, + irqentry_state_t *irq_state) +{ + return false; +} +#endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */ + +#ifdef CONFIG_PKS_TESTING +bool pks_test_callback(irqentry_state_t *irq_state); +static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state) +{ + /* + * If we get a protection key exception it could be because we + * are running the PKS test. 
If so, pks_test_callback() will + * clear the protection mechanism and return true to indicate + * the fault was handled + */ + return pks_test_callback(irq_state); +} +#else /* !CONFIG_PKS_TESTING */ +static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state) +{ + return false; +} +#endif /* CONFIG_PKS_TESTING */ + + +static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte, + irqentry_state_t *irq_state) { - if ((error_code & X86_PF_WRITE) && !pte_write(*pte)) + bool is_write = (error_code & X86_PF_WRITE); + + if (IS_ENABLED(CONFIG_ARCH_HAS_SUPERVISOR_PKEYS) && + error_code & X86_PF_PK) { + if (global_pkey_is_enabled(pte, is_write, irq_state)) + return 1; + + if (handle_pks_testing(error_code, irq_state)) + return 1; + + return 0; + } + + if (is_write && !pte_write(*pte)) return 0;
if ((error_code & X86_PF_INSTR) && !pte_exec(*pte)) @@ -1007,7 +1124,7 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) }
/* - * Handle a spurious fault caused by a stale TLB entry. + * Handle a spurious fault caused by a stale TLB entry or a lazy PKRS update. * * This allows us to lazily refresh the TLB when increasing the * permissions of a kernel page (RO -> RW or NX -> X). Doing it @@ -1022,13 +1139,19 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte) * There are no security implications to leaving a stale TLB when * increasing the permissions on a page. * + * Similarly, PKRS increases in permissions are done on a thread local level. + * But if the caller indicates the permission should be allowd globaly we can + * lazily update only those threads which fault and avoid a global IPI MSR + * update. + * * Returns non-zero if a spurious fault was handled, zero otherwise. * * See Intel Developer's Manual Vol 3 Section 4.10.4.3, bullet 3 * (Optional Invalidation). */ static noinline int -spurious_kernel_fault(unsigned long error_code, unsigned long address) +spurious_kernel_fault(unsigned long error_code, unsigned long address, + irqentry_state_t *irq_state) { pgd_t *pgd; p4d_t *p4d; @@ -1038,17 +1161,19 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) int ret;
/* - * Only writes to RO or instruction fetches from NX may cause - * spurious faults. + * Only PKey faults or writes to RO or instruction fetches from NX may + * cause spurious faults. * * These could be from user or supervisor accesses but the TLB * is only lazily flushed after a kernel mapping protection * change, so user accesses are not expected to cause spurious * faults. */ - if (error_code != (X86_PF_WRITE | X86_PF_PROT) && - error_code != (X86_PF_INSTR | X86_PF_PROT)) - return 0; + if (!(error_code & X86_PF_PK)) { + if (error_code != (X86_PF_WRITE | X86_PF_PROT) && + error_code != (X86_PF_INSTR | X86_PF_PROT)) + return 0; + }
pgd = init_mm.pgd + pgd_index(address); if (!pgd_present(*pgd)) @@ -1059,27 +1184,31 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) return 0;
if (p4d_large(*p4d)) - return spurious_kernel_fault_check(error_code, (pte_t *) p4d); + return spurious_kernel_fault_check(error_code, (pte_t *) p4d, + irq_state);
pud = pud_offset(p4d, address); if (!pud_present(*pud)) return 0;
if (pud_large(*pud)) - return spurious_kernel_fault_check(error_code, (pte_t *) pud); + return spurious_kernel_fault_check(error_code, (pte_t *) pud, + irq_state);
pmd = pmd_offset(pud, address); if (!pmd_present(*pmd)) return 0;
if (pmd_large(*pmd)) - return spurious_kernel_fault_check(error_code, (pte_t *) pmd); + return spurious_kernel_fault_check(error_code, (pte_t *) pmd, + irq_state);
pte = pte_offset_kernel(pmd, address); if (!pte_present(*pte)) return 0;
- ret = spurious_kernel_fault_check(error_code, pte); + ret = spurious_kernel_fault_check(error_code, pte, + irq_state); if (!ret) return 0;
@@ -1087,7 +1216,8 @@ spurious_kernel_fault(unsigned long error_code, unsigned long address) * Make sure we have permissions in PMD. * If not, then there's a bug in the page tables: */ - ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd); + ret = spurious_kernel_fault_check(error_code, (pte_t *) pmd, + irq_state); WARN_ONCE(!ret, "PMD has incorrect permission bits\n");
return ret; @@ -1150,25 +1280,6 @@ static int fault_in_kernel_space(unsigned long address) return address >= TASK_SIZE_MAX; }
-#ifdef CONFIG_PKS_TESTING -bool pks_test_callback(irqentry_state_t *irq_state); -static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state) -{ - /* - * If we get a protection key exception it could be because we - * are running the PKS test. If so, pks_test_callback() will - * clear the protection mechanism and return true to indicate - * the fault was handled. - */ - return (hw_error_code & X86_PF_PK) && pks_test_callback(irq_state); -} -#else -static bool handle_pks_testing(unsigned long hw_error_code, irqentry_state_t *irq_state) -{ - return false; -} -#endif - /* * Called for all faults where 'address' is part of the kernel address * space. Might get called for faults that originate from *code* that @@ -1186,9 +1297,6 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, !cpu_feature_enabled(X86_FEATURE_PKS)) WARN_ON_ONCE(hw_error_code & X86_PF_PK);
- if (handle_pks_testing(hw_error_code, irq_state)) - return; - #ifdef CONFIG_X86_32 /* * We can fault-in kernel-space virtual memory on-demand. The @@ -1220,8 +1328,11 @@ do_kern_addr_fault(struct pt_regs *regs, unsigned long hw_error_code, } #endif
- /* Was the fault spurious, caused by lazy TLB invalidation? */ - if (spurious_kernel_fault(hw_error_code, address)) + /* + * Was the fault spurious; caused by lazy TLB invalidation or PKRS + * update? + */ + if (spurious_kernel_fault(hw_error_code, address, irq_state)) return;
/* kprobes don't want to hook the spurious faults: */ @@ -1492,7 +1603,7 @@ DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault) * * Fingers crossed. * - * The async #PF handling code takes care of idtentry handling + * The async #PF handling code takes care of irqentry handling * itself. */ if (kvm_handle_async_pf(regs, (u32)address)) diff --git a/arch/x86/mm/pkeys.c b/arch/x86/mm/pkeys.c index 2431c68ef752..a45893069877 100644 --- a/arch/x86/mm/pkeys.c +++ b/arch/x86/mm/pkeys.c @@ -263,33 +263,84 @@ noinstr void write_pkrs(u32 new_pkrs) } EXPORT_SYMBOL_GPL(write_pkrs);
+/* + * NOTE: The pkrs_global_cache is _never_ stored in the per thread PKRS cache + * values [thread.saved_pkrs] by design + * + * This allows us to invalidate access on running threads immediately upon + * invalidate. Sleeping threads will not be enabled due to the algorithm + * during pkrs_sched_in() + */ +DEFINE_SPINLOCK(pkrs_global_cache_lock); +u32 pkrs_global_cache = INIT_PKRS_VALUE; +EXPORT_SYMBOL_GPL(pkrs_global_cache); + +static inline void update_global_pkrs(int pkey, unsigned long protection) +{ + int pkey_shift = pkey * PKR_BITS_PER_PKEY; + u32 mask = (((1 << PKR_BITS_PER_PKEY) - 1) << pkey_shift); + u32 old_val; + + spin_lock(&pkrs_global_cache_lock); + old_val = (pkrs_global_cache & mask) >> pkey_shift; + pkrs_global_cache &= ~mask; + if (protection & PKEY_DISABLE_ACCESS) + pkrs_global_cache |= PKR_AD_BIT << pkey_shift; + if (protection & PKEY_DISABLE_WRITE) + pkrs_global_cache |= PKR_WD_BIT << pkey_shift; + + /* + * If we are preventing access from the old value. Force the + * update on all running threads. + */ + if (((old_val == 0) && protection) || + ((old_val & PKR_WD_BIT) && (protection & PKEY_DISABLE_ACCESS))) { + int cpu; + + for_each_online_cpu(cpu) { + u32 *ptr = per_cpu_ptr(&pkrs_cache, cpu); + + *ptr = update_pkey_val(*ptr, pkey, protection); + wrmsrl_on_cpu(cpu, MSR_IA32_PKRS, *ptr); + put_cpu_ptr(ptr); + } + } + spin_unlock(&pkrs_global_cache_lock); +} + /** * Do not call this directly, see pks_mk*() below. * * @pkey: Key for the domain to change * @protection: protection bits to be used + * @global: should this change be made globally or not. * * Protection utilizes the same protection bits specified for User pkeys * PKEY_DISABLE_ACCESS * PKEY_DISABLE_WRITE * */ -static inline void pks_update_protection(int pkey, unsigned long protection) +static inline void pks_update_protection(int pkey, unsigned long protection, + bool global) { - current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs, - pkey, protection); preempt_disable(); + if (global) + update_global_pkrs(pkey, protection); + + current->thread.saved_pkrs = update_pkey_val(current->thread.saved_pkrs, pkey, + protection); write_pkrs(current->thread.saved_pkrs); + preempt_enable(); }
/** * PKS access control functions * - * Change the access of the domain specified by the pkey. These are global - * updates. They only affects the current running thread. It is undefined and - * a bug for users to call this without having allocated a pkey and using it as - * pkey here. + * Change the access of the domain specified by the pkey. These may be global + * updates depending on the value of global. It is undefined and a bug for + * users to call this without having allocated a pkey and using it as pkey + * here. * * pks_mknoaccess() * Disable all access to the domain @@ -299,23 +350,30 @@ static inline void pks_update_protection(int pkey, unsigned long protection) * Make the domain Read/Write * * @pkey the pkey for which the access should change. - * + * @global if true the access is enabled on all threads/logical cpus */ -void pks_mknoaccess(int pkey) +void pks_mknoaccess(int pkey, bool global) { - pks_update_protection(pkey, PKEY_DISABLE_ACCESS); + /* + * We force disable access to be 11b + * (PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE) + * instaed of 01b See arch/x86/kernel/process.c where the global pkrs + * is factored in during context switch. + */ + pks_update_protection(pkey, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE, + global); } EXPORT_SYMBOL_GPL(pks_mknoaccess);
-void pks_mkread(int pkey) +void pks_mkread(int pkey, bool global) { - pks_update_protection(pkey, PKEY_DISABLE_WRITE); + pks_update_protection(pkey, PKEY_DISABLE_WRITE, global); } EXPORT_SYMBOL_GPL(pks_mkread);
-void pks_mkrdwr(int pkey) +void pks_mkrdwr(int pkey, bool global) { - pks_update_protection(pkey, 0); + pks_update_protection(pkey, 0, global); } EXPORT_SYMBOL_GPL(pks_mkrdwr);
@@ -377,7 +435,7 @@ void pks_key_free(int pkey) return;
/* Restore to default of no access */ - pks_mknoaccess(pkey); + pks_mknoaccess(pkey, true); pks_key_users[pkey] = NULL; __clear_bit(pkey, &pks_key_allocation_map); } diff --git a/include/linux/pkeys.h b/include/linux/pkeys.h index f9552bd9341f..8f3bfec83949 100644 --- a/include/linux/pkeys.h +++ b/include/linux/pkeys.h @@ -57,15 +57,15 @@ static inline int pks_key_alloc(const char * const pkey_user) static inline void pks_key_free(int pkey) { } -static inline void pks_mknoaccess(int pkey) +static inline void pks_mknoaccess(int pkey, bool global) { WARN_ON_ONCE(1); } -static inline void pks_mkread(int pkey) +static inline void pks_mkread(int pkey, bool global) { WARN_ON_ONCE(1); } -static inline void pks_mkrdwr(int pkey) +static inline void pks_mkrdwr(int pkey, bool global) { WARN_ON_ONCE(1); } diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index d7dbf92527bd..286c8b8457da 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -163,12 +163,12 @@ static void check_exception(irqentry_state_t *irq_state) * Check we can update the value during exception without affecting the * calling thread. The calling thread is checked after exception... */ - pks_mkrdwr(test_armed_key); + pks_mkrdwr(test_armed_key, false); if (!check_pkrs(test_armed_key, 0)) { pr_err(" FAIL: exception did not change register to 0\n"); test_exception_ctx->pass = false; } - pks_mknoaccess(test_armed_key); + pks_mknoaccess(test_armed_key, false); if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) { pr_err(" FAIL: exception did not change register to 0x3\n"); test_exception_ctx->pass = false; @@ -314,13 +314,13 @@ static int run_access_test(struct pks_test_ctx *ctx, { switch (test->mode) { case PKS_TEST_NO_ACCESS: - pks_mknoaccess(ctx->pkey); + pks_mknoaccess(ctx->pkey, false); break; case PKS_TEST_RDWR: - pks_mkrdwr(ctx->pkey); + pks_mkrdwr(ctx->pkey, false); break; case PKS_TEST_RDONLY: - pks_mkread(ctx->pkey); + pks_mkread(ctx->pkey, false); break; default: pr_err("BUG in test invalid mode\n"); @@ -476,7 +476,7 @@ static void run_exception_test(void) goto free_context; }
- pks_mkread(ctx->pkey); + pks_mkread(ctx->pkey, false);
spin_lock(&test_lock); WRITE_ONCE(test_exception_ctx, ctx); @@ -556,7 +556,7 @@ static void crash_it(void) return; }
- pks_mknoaccess(ctx->pkey); + pks_mknoaccess(ctx->pkey, false);
spin_lock(&test_lock); WRITE_ONCE(test_armed_key, 0); @@ -618,7 +618,7 @@ static ssize_t pks_write_file(struct file *file, const char __user *user_buf, /* start of context switch test */ if (!strcmp(buf, "1")) { /* Ensure a known state to test context switch */ - pks_mknoaccess(ctx->pkey); + pks_mknoaccess(ctx->pkey, false); }
/* After context switch msr should be restored */
From: Ira Weiny ira.weiny@intel.com
Now that PKS can be enabled globally (for all threads), add a test which spawns a thread and tests the same PKS functionality.
The test enables/disables PKS in 1 thread while attempting to access the page in another thread. We use the same test array as in the 'local' PKS testing.
Signed-off-by: Ira Weiny ira.weiny@intel.com
---
 arch/x86/mm/fault.c |   4 ++
 lib/pks/pks_test.c  | 128 +++++++++++++++++++++++++++++++++++++++++---
 2 files changed, 124 insertions(+), 8 deletions(-)
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c index 4b4ff9efa298..4c74f52fbc23 100644 --- a/arch/x86/mm/fault.c +++ b/arch/x86/mm/fault.c @@ -1108,6 +1108,10 @@ static int spurious_kernel_fault_check(unsigned long error_code, pte_t *pte, if (global_pkey_is_enabled(pte, is_write, irq_state)) return 1;
+ /* + * NOTE: This must be after the global_pkey_is_enabled() call + * to allow the fixup code to be tested. + */ if (handle_pks_testing(error_code, irq_state)) return 1;
diff --git a/lib/pks/pks_test.c b/lib/pks/pks_test.c index 286c8b8457da..dfddccbe4cb6 100644 --- a/lib/pks/pks_test.c +++ b/lib/pks/pks_test.c @@ -154,7 +154,8 @@ static void check_exception(irqentry_state_t *irq_state) }
/* Check the exception state */ - if (!check_pkrs(test_armed_key, PKEY_DISABLE_ACCESS)) { + if (!check_pkrs(test_armed_key, + PKEY_DISABLE_ACCESS | PKEY_DISABLE_WRITE)) { pr_err(" FAIL: PKRS cache and MSR\n"); test_exception_ctx->pass = false; } @@ -308,24 +309,29 @@ static int test_it(struct pks_test_ctx *ctx, struct pks_access_test *test, void return ret; }
-static int run_access_test(struct pks_test_ctx *ctx, - struct pks_access_test *test, - void *ptr) +static void set_protection(int pkey, enum pks_access_mode mode, bool global) { - switch (test->mode) { + switch (mode) { case PKS_TEST_NO_ACCESS: - pks_mknoaccess(ctx->pkey, false); + pks_mknoaccess(pkey, global); break; case PKS_TEST_RDWR: - pks_mkrdwr(ctx->pkey, false); + pks_mkrdwr(pkey, global); break; case PKS_TEST_RDONLY: - pks_mkread(ctx->pkey, false); + pks_mkread(pkey, global); break; default: pr_err("BUG in test invalid mode\n"); break; } +} + +static int run_access_test(struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + set_protection(ctx->pkey, test->mode, false);
return test_it(ctx, test, ptr); } @@ -516,6 +522,110 @@ static void run_exception_test(void) pass ? "PASS" : "FAIL"); }
+struct shared_data { + struct mutex lock; + struct pks_test_ctx *ctx; + void *kmap_addr; + struct pks_access_test *test; +}; + +static int thread_main(void *d) +{ + struct shared_data *data = d; + struct pks_test_ctx *ctx = data->ctx; + + while (!kthread_should_stop()) { + mutex_lock(&data->lock); + /* + * wait for the main thread to hand us the page + * We should be spinning so hopefully we will not have gotten + * the global value from a schedule in. + */ + if (data->kmap_addr) { + if (test_it(ctx, data->test, data->kmap_addr)) + ctx->pass = false; + data->kmap_addr = NULL; + } + mutex_unlock(&data->lock); + } + + return 0; +} + +static void run_thread_access_test(struct shared_data *data, + struct pks_test_ctx *ctx, + struct pks_access_test *test, + void *ptr) +{ + set_protection(ctx->pkey, test->mode, true); + + pr_info("checking... mode %s; write %s\n", + get_mode_str(test->mode), test->write ? "TRUE" : "FALSE"); + + mutex_lock(&data->lock); + data->test = test; + data->kmap_addr = ptr; + mutex_unlock(&data->lock); + + while (data->kmap_addr) { + msleep(10); + } +} + +static void run_global_test(void) +{ + struct task_struct *other_task; + struct pks_test_ctx *ctx; + struct shared_data data; + bool pass = true; + void *ptr; + int i; + + pr_info(" ***** BEGIN: global pkey checking\n"); + + /* Set up context, data pgae, and thread */ + ctx = alloc_ctx("global pkey test"); + if (IS_ERR(ctx)) { + pr_err(" FAIL: no context\n"); + pass = false; + goto result; + } + ptr = alloc_test_page(ctx->pkey); + if (!ptr) { + pr_err(" FAIL: no vmalloc page\n"); + pass = false; + goto free_context; + } + other_task = kthread_run(thread_main, &data, "PKRS global test"); + if (IS_ERR(other_task)) { + pr_err(" FAIL: Failed to start thread\n"); + pass = false; + goto free_page; + } + + memset(&data, 0, sizeof(data)); + mutex_init(&data.lock); + data.ctx = ctx; + + /* Start testing */ + ctx->pass = true; + + for (i = 0; i < ARRAY_SIZE(pkey_test_ary); i++) { + run_thread_access_test(&data, ctx, &pkey_test_ary[i], ptr); + } + + kthread_stop(other_task); + pass = ctx->pass; + +free_page: + vfree(ptr); +free_context: + free_ctx(ctx); +result: + pr_info(" ***** END: global pkey checking : %s\n", + pass ? "PASS" : "FAIL"); +} + static void run_all(void) { struct pks_test_ctx *ctx[PKS_NUM_KEYS]; @@ -538,6 +648,8 @@ static void run_all(void) }
run_exception_test(); + + run_global_test(); }
static void crash_it(void)
From: Ira Weiny ira.weiny@intel.com
Device managed memory exposes itself to the kernel direct map which allows stray pointers to access these device memories.
Stray pointers to normal memory may result in a crash or other undesirable behavior which, while unfortunate, is usually recoverable with a reboot. Stray accesses, specifically stray writes, to areas such as non-volatile memory are permanent in nature and thus are more likely to result in permanent user data loss than stray accesses to other memory areas.
Furthermore, we protect against reads, which can also help with speculative reads of poisoned areas. But this is a secondary reason.
Set up an infrastructure for extra device access protection. Then implement the new protection using the new Protection Keys Supervisor (PKS) on architectures which support it.
To enable this extra protection, devices specify a flag in the pgmap to indicate that these areas wish to use additional protection.
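As a rough sketch of how a driver opts in (the surrounding driver structures and variables are hypothetical; only the PGMAP_PROT_ENABLED flag is introduced by this patch):

    /* Hypothetical driver setup; only PGMAP_PROT_ENABLED is from this patch */
    pgmap->type = MEMORY_DEVICE_FS_DAX;
    pgmap->flags |= PGMAP_PROT_ENABLED;	/* request stray access protection */

    virt_addr = devm_memremap_pages(dev, pgmap);
    if (IS_ERR(virt_addr))
            return PTR_ERR(virt_addr);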
Kernel code which intends to access this memory can do so automatically through the use of the kmap infrastructure calling into dev_access_[enable|disable]() described here. The kmap infrastructure is implemented in a follow on patch.
In addition, users can directly enable/disable the access through dev_access_[enable|disable]() if they have a priori knowledge of the type of pages they are accessing.
All calls to enable/disable protection flow through dev_access_[enable|disable]() and are nestable by the use of a per-task reference count (a usage sketch follows the list below). This reference count does 2 things.
1) Allows a thread to nest calls to disable protection such that the first call to re-enable protection does not 'break' the last access of the pmem device memory.
2) Provides faster performance by avoiding lots of MSR writes. For example, looping over a sequence of pmem pages.
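A minimal usage sketch of the nesting (the pmem copy helper and buffers are hypothetical; dev_access_[enable|disable]() are the interfaces added here):

    dev_access_enable(false);           /* ref 0 -> 1: protection disabled */
    copy_from_pmem(dst1, src1, len);    /* hypothetical helper */

    dev_access_enable(false);           /* ref 1 -> 2: still disabled */
    copy_from_pmem(dst2, src2, len);
    dev_access_disable(false);          /* ref 2 -> 1: still disabled */

    copy_from_pmem(dst3, src3, len);    /* last access remains valid */
    dev_access_disable(false);          /* ref 1 -> 0: protection re-enabled */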
In addition, we must ensure the reference count is preserved through an exception, so we add the count to irqentry_state_t and save/restore it across the exception, giving exceptions their own count should they use a kmap call.
The following shows how this works through an exception:
	...
		// ref == 0
	dev_access_enable()  // ref += 1 ==> disable protection
		irq()
			// enable protection
			// ref = 0
		_handler()
			dev_access_enable()   // ref += 1 ==> disable protection
			dev_access_disable()  // ref -= 1 ==> enable protection
		// WARN_ON(ref != 0)
		// disable protection
	do_pmem_thing()  // all good here
	dev_access_disable()  // ref -= 1 ==> 0 ==> enable protection
	...
Nested exceptions operate the same way with each exception storing the interrupted exception state all the way down.
The pkey value is never freed as this optimizes the implementation to be either on or off using a static branch conditional in the fast paths.
Cc: Juri Lelli juri.lelli@redhat.com
Cc: Vincent Guittot vincent.guittot@linaro.org
Cc: Dietmar Eggemann dietmar.eggemann@arm.com
Cc: Steven Rostedt rostedt@goodmis.org
Cc: Ben Segall bsegall@google.com
Cc: Mel Gorman mgorman@suse.de
Signed-off-by: Ira Weiny ira.weiny@intel.com
---
 arch/x86/entry/common.c      | 21 +++++++++
 include/linux/entry-common.h |  3 ++
 include/linux/memremap.h     |  1 +
 include/linux/mm.h           | 43 +++++++++++++++++
 include/linux/sched.h        |  3 ++
 init/init_task.c             |  3 ++
 kernel/fork.c                |  3 ++
 mm/Kconfig                   | 13 ++++++
 mm/memremap.c                | 90 ++++++++++++++++++++++++++++++++++++
 9 files changed, 180 insertions(+)
diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c index 86ad32e0095e..3680724c1a4d 100644 --- a/arch/x86/entry/common.c +++ b/arch/x86/entry/common.c @@ -264,12 +264,27 @@ noinstr void idtentry_exit_nmi(struct pt_regs *regs, irqentry_state_t *irq_state * * NOTE That the thread saved PKRS must be preserved separately to ensure * global overrides do not 'stick' on a thread. + * + * Furthermore, Zone Device Access Protection maintains access in a re-entrant + * manner through a reference count which also needs to be maintained should + * exception handlers use those interfaces for memory access. Here we start + * off the exception handler ref count to 0 and ensure it is 0 when the + * exception is done. Then restore it for the interrupted task. */ noinstr void irq_save_pkrs(irqentry_state_t *state) { if (!cpu_feature_enabled(X86_FEATURE_PKS)) return;
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + /* + * Save the ref count of the current running process and set it to 0 + * for any irq users to properly track re-entrance + */ + state->pkrs_ref = current->dev_page_access_ref; + current->dev_page_access_ref = 0; +#endif + /* * The thread_pkrs must be maintained separately to prevent global * overrides from 'sticking' on a thread. @@ -286,6 +301,12 @@ noinstr void irq_restore_pkrs(irqentry_state_t *state)
write_pkrs(state->pkrs); current->thread.saved_pkrs = state->thread_pkrs; + +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + WARN_ON_ONCE(current->dev_page_access_ref != 0); + /* Restore the interrupted process reference */ + current->dev_page_access_ref = state->pkrs_ref; +#endif } #endif /* CONFIG_ARCH_HAS_SUPERVISOR_PKEYS */
diff --git a/include/linux/entry-common.h b/include/linux/entry-common.h index c3b361ffa059..06743cce2dbf 100644 --- a/include/linux/entry-common.h +++ b/include/linux/entry-common.h @@ -343,6 +343,9 @@ void irqentry_exit_to_user_mode(struct pt_regs *regs); #ifndef irqentry_state typedef struct irqentry_state { #ifdef CONFIG_ARCH_HAS_SUPERVISOR_PKEYS +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + unsigned int pkrs_ref; +#endif u32 pkrs; u32 thread_pkrs; #endif diff --git a/include/linux/memremap.h b/include/linux/memremap.h index e5862746751b..b6713ee7b218 100644 --- a/include/linux/memremap.h +++ b/include/linux/memremap.h @@ -89,6 +89,7 @@ struct dev_pagemap_ops { };
#define PGMAP_ALTMAP_VALID (1 << 0) +#define PGMAP_PROT_ENABLED (1 << 1)
/** * struct dev_pagemap - metadata for ZONE_DEVICE mappings diff --git a/include/linux/mm.h b/include/linux/mm.h index 16b799a0522c..9e845515ff15 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1141,6 +1141,49 @@ static inline bool is_pci_p2pdma_page(const struct page *page) page->pgmap->type == MEMORY_DEVICE_PCI_P2PDMA; }
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION +DECLARE_STATIC_KEY_FALSE(dev_protection_static_key); + +/* + * We make page_is_access_protected() as quick as possible. + * 1) If no mappings have been enabled with extra protection we skip this + * entirely + * 2) Skip pages which are not ZONE_DEVICE + * 3) Only then check if this particular page was mapped with extra + * protections. + */ +static inline bool page_is_access_protected(struct page *page) +{ + if (!static_branch_unlikely(&dev_protection_static_key)) + return false; + if (!is_zone_device_page(page)) + return false; + if (page->pgmap->flags & PGMAP_PROT_ENABLED) + return true; + return false; +} + +void __dev_access_enable(bool global); +void __dev_access_disable(bool global); +static __always_inline void dev_access_enable(bool global) +{ + if (static_branch_unlikely(&dev_protection_static_key)) + __dev_access_enable(global); +} +static __always_inline void dev_access_disable(bool global) +{ + if (static_branch_unlikely(&dev_protection_static_key)) + __dev_access_disable(global); +} +#else +static inline bool page_is_access_protected(struct page *page) +{ + return false; +} +static inline void dev_access_enable(bool global) { } +static inline void dev_access_disable(bool global) { } +#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */ + /* 127: arbitrary random number, small enough to assemble well */ #define page_ref_zero_or_close_to_overflow(page) \ ((unsigned int) page_ref_count(page) + 127u <= 127u) diff --git a/include/linux/sched.h b/include/linux/sched.h index afe01e232935..25d97ab6c757 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1315,6 +1315,9 @@ struct task_struct { struct callback_head mce_kill_me; #endif
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + unsigned int dev_page_access_ref; +#endif /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/init/init_task.c b/init/init_task.c index f6889fce64af..9b39f25de59b 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -209,6 +209,9 @@ struct task_struct init_task #ifdef CONFIG_SECCOMP .seccomp = { .filter_count = ATOMIC_INIT(0) }, #endif +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + .dev_page_access_ref = 0, +#endif }; EXPORT_SYMBOL(init_task);
diff --git a/kernel/fork.c b/kernel/fork.c index da8d360fb032..b6a3ee328a89 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -940,6 +940,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node)
#ifdef CONFIG_MEMCG tsk->active_memcg = NULL; +#endif +#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION + tsk->dev_page_access_ref = 0; #endif return tsk;
diff --git a/mm/Kconfig b/mm/Kconfig index 1b9bc004d9bc..01dd75720ae6 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -794,6 +794,19 @@ config ZONE_DEVICE
If FS_DAX is enabled, then say Y.
+config ZONE_DEVICE_ACCESS_PROTECTION + bool "Device memory access protection" + depends on ZONE_DEVICE + depends on ARCH_HAS_SUPERVISOR_PKEYS + + help + Enable the option of having access protections on device memory + areas. This protects against access to device memory which is not + intended such as stray writes. This feature is particularly useful + to protect against corruption of persistent memory. + + If in doubt, say 'Y'. + config DEV_PAGEMAP_OPS bool
diff --git a/mm/memremap.c b/mm/memremap.c index fbfc79fd9c24..edad2aa0bd24 100644 --- a/mm/memremap.c +++ b/mm/memremap.c @@ -6,12 +6,16 @@ #include <linux/memory_hotplug.h> #include <linux/mm.h> #include <linux/pfn_t.h> +#include <linux/pkeys.h> #include <linux/swap.h> #include <linux/mmzone.h> #include <linux/swapops.h> #include <linux/types.h> #include <linux/wait_bit.h> #include <linux/xarray.h> +#include <uapi/asm-generic/mman-common.h> + +#define PKEY_INVALID (INT_MIN)
static DEFINE_XARRAY(pgmap_array);
@@ -67,6 +71,89 @@ static void devmap_managed_enable_put(void) } #endif /* CONFIG_DEV_PAGEMAP_OPS */
+#ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION +/* + * Note; all devices which have asked for protections share the same key. The + * key may, or may not, have been provided by the core. If not, protection + * will remain disabled. The key acquisition is attempted at init time and + * never again. So we don't have to worry about dev_page_pkey changing. + */ +static int dev_page_pkey = PKEY_INVALID; +DEFINE_STATIC_KEY_FALSE(dev_protection_static_key); +EXPORT_SYMBOL(dev_protection_static_key); + +static pgprot_t dev_pgprot_get(struct dev_pagemap *pgmap, pgprot_t prot) +{ + if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID) { + pgprotval_t val = pgprot_val(prot); + + static_branch_inc(&dev_protection_static_key); + prot = __pgprot(val | _PAGE_PKEY(dev_page_pkey)); + } + return prot; +} + +static void dev_pgprot_put(struct dev_pagemap *pgmap) +{ + if (pgmap->flags & PGMAP_PROT_ENABLED && dev_page_pkey != PKEY_INVALID) + static_branch_dec(&dev_protection_static_key); +} + +void __dev_access_disable(bool global) +{ + unsigned long flags; + + local_irq_save(flags); + if (!--current->dev_page_access_ref) + pks_mknoaccess(dev_page_pkey, global); + local_irq_restore(flags); +} +EXPORT_SYMBOL_GPL(__dev_access_disable); + +void __dev_access_enable(bool global) +{ + unsigned long flags; + + local_irq_save(flags); + /* 0 clears the PKEY_DISABLE_ACCESS bit, allowing access */ + if (!current->dev_page_access_ref++) + pks_mkrdwr(dev_page_pkey, global); + local_irq_restore(flags); +} +EXPORT_SYMBOL_GPL(__dev_access_enable); + +/** + * dev_access_protection_init: Configure a PKS key domain for device pages + * + * The domain defaults to the protected state. Device page mappings should set + * the PGMAP_PROT_ENABLED flag when mapping pages. + * + * Note the pkey is never free'ed. This is run at init time and we either get + * the key or we do not. We need to do this to maintian a constant key (or + * not) as device memory is added or removed. + */ +static int __init __dev_access_protection_init(void) +{ + int pkey = pks_key_alloc("Device Memory"); + + if (pkey < 0) + return 0; + + dev_page_pkey = pkey; + + return 0; +} +subsys_initcall(__dev_access_protection_init); +#else +static pgprot_t dev_pgprot_get(struct dev_pagemap *pgmap, pgprot_t prot) +{ + return prot; +} +static void dev_pgprot_put(struct dev_pagemap *pgmap) +{ +} +#endif /* CONFIG_ZONE_DEVICE_ACCESS_PROTECTION */ + static void pgmap_array_delete(struct resource *res) { xa_store_range(&pgmap_array, PHYS_PFN(res->start), PHYS_PFN(res->end), @@ -156,6 +243,7 @@ void memunmap_pages(struct dev_pagemap *pgmap) pgmap_array_delete(res); WARN_ONCE(pgmap->altmap.alloc, "failed to free all reserved pages\n"); devmap_managed_enable_put(); + dev_pgprot_put(pgmap); } EXPORT_SYMBOL_GPL(memunmap_pages);
@@ -191,6 +279,8 @@ void *memremap_pages(struct dev_pagemap *pgmap, int nid) int error, is_ram; bool need_devmap_managed = true;
+ params.pgprot = dev_pgprot_get(pgmap, params.pgprot); + switch (pgmap->type) { case MEMORY_DEVICE_PRIVATE: if (!IS_ENABLED(CONFIG_DEVICE_PRIVATE)) {
From: Ira Weiny ira.weiny@intel.com
Device managed pages may have additional protections. These protections need to be removed prior to valid use by kernel users.
Check for special treatment of device managed pages in kmap and take action if needed. We use kmap as an interface for generic kernel code because under normal circumstances it would be a bug for general kernel code to not use kmap prior to accessing kernel memory. Therefore, this should allow any valid kernel users to seamlessly use these pages without issues.
Because of the critical nature of kmap(), it must be pointed out that the overhead on regular DRAM is carefully implemented to be as fast as possible. Furthermore, the underlying MSR write required on device pages when protected is better than a normal MSR write.
Specifically, WRMSR(MSR_IA32_PKRS) is not serializing but still maintains ordering properties similar to WRPKRU. The current SDM section on PKRS needs updating but should be the same as that of WRPKRU. So to quote from the WRPKRU text:
WRPKRU will never execute speculatively. Memory accesses affected by PKRU register will not execute (even speculatively) until all prior executions of WRPKRU have completed execution and updated the PKRU register.
Still, this will make accessing pmem more expensive from the kernel, but the overhead is minimized and many pmem users access this memory through user page mappings which are not affected at all.
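For callers nothing changes; a hypothetical snippet to show the transparent behavior on both DRAM and protected device pages:

    /* Hypothetical caller; identical for DRAM and protected pmem pages */
    addr = kmap(page);              /* dev_page_enable_access() if needed */
    memcpy(buf, addr, PAGE_SIZE);
    kunmap(page);                   /* dev_page_disable_access() if needed */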
Cc: Randy Dunlap rdunlap@infradead.org
Signed-off-by: Ira Weiny ira.weiny@intel.com
---
 include/linux/highmem.h | 32 +++++++++++++++++++++++++++++++-
 1 file changed, 31 insertions(+), 1 deletion(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 14e6202ce47f..2a9806e3b8d2 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -8,6 +8,7 @@ #include <linux/mm.h> #include <linux/uaccess.h> #include <linux/hardirq.h> +#include <linux/memremap.h>
#include <asm/cacheflush.h>
@@ -31,6 +32,20 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
#include <asm/kmap_types.h>
+static inline void dev_page_enable_access(struct page *page, bool global) +{ + if (!page_is_access_protected(page)) + return; + dev_access_enable(global); +} + +static inline void dev_page_disable_access(struct page *page, bool global) +{ + if (!page_is_access_protected(page)) + return; + dev_access_disable(global); +} + #ifdef CONFIG_HIGHMEM extern void *kmap_atomic_high_prot(struct page *page, pgprot_t prot); extern void kunmap_atomic_high(void *kvaddr); @@ -55,6 +70,11 @@ static inline void *kmap(struct page *page) else addr = kmap_high(page); kmap_flush_tlb((unsigned long)addr); + /* + * Even non-highmem pages may have additional access protections which + * need to be checked and potentially enabled. + */ + dev_page_enable_access(page, true); return addr; }
@@ -63,6 +83,11 @@ void kunmap_high(struct page *page); static inline void kunmap(struct page *page) { might_sleep(); + /* + * Even non-highmem pages may have additional access protections which + * need to be checked and potentially disabled. + */ + dev_page_disable_access(page, true); if (!PageHighMem(page)) return; kunmap_high(page); @@ -85,6 +110,7 @@ static inline void *kmap_atomic_prot(struct page *page, pgprot_t prot) { preempt_disable(); pagefault_disable(); + dev_page_enable_access(page, false); if (!PageHighMem(page)) return page_address(page); return kmap_atomic_high_prot(page, prot); @@ -137,6 +163,7 @@ static inline unsigned long totalhigh_pages(void) { return 0UL; } static inline void *kmap(struct page *page) { might_sleep(); + dev_page_enable_access(page, true); return page_address(page); }
@@ -146,6 +173,7 @@ static inline void kunmap_high(struct page *page)
static inline void kunmap(struct page *page) { + dev_page_disable_access(page, true); #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(page_address(page)); #endif @@ -155,6 +183,7 @@ static inline void *kmap_atomic(struct page *page) { preempt_disable(); pagefault_disable(); + dev_page_enable_access(page, false); return page_address(page); } #define kmap_atomic_prot(page, prot) kmap_atomic(page) @@ -216,7 +245,8 @@ static inline void kmap_atomic_idx_pop(void) #define kunmap_atomic(addr) \ do { \ BUILD_BUG_ON(__same_type((addr), struct page *)); \ - kunmap_atomic_high(addr); \ + dev_page_disable_access(kmap_to_page(addr), false); \ + kunmap_atomic_high(addr); \ pagefault_enable(); \ preempt_enable(); \ } while (0)
From: Ira Weiny ira.weiny@intel.com
To correctly support the semantics of kmap() with Kernel protection keys (PKS), kmap() may be required to set the protections on multiple processors (globally). Enabling PKS globally can be very expensive depending on the requested operation. Furthermore, enabling a domain globally reduces the protection afforded by PKS.
Most kmap() callers (approx. 209 of 229) use the map within a single thread and have no need for the protection domain to be enabled globally. However, the remaining callers do not follow this pattern and, as best I can tell, expect the mapping to be 'global' and available to any thread which may access the mapping.[1]
We don't anticipate global mappings to pmem; however, in general there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why the failure occurred.
To resolve this a number of options were considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
3) Change ~20 call sites to 'kmap_global()' to indicate that they require a global enablement of the pages.
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
Option 1 is simply not feasible. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior the developer intended. Therefore, #4 was chosen.
Subsequent patches will convert ~90% of the kmap() callers to this new call, leaving about 10% of the existing kmap() callers to enable PKS globally.
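As a rough illustration of the intended split (kmap_thread()/kunmap_thread() come from this patch; the surrounding functions are made up):

        #include <linux/highmem.h>

        /* ~90% case: map, use, unmap within one thread -> thread-local PKRS update */
        static u32 sum_page(struct page *page)
        {
                u8 *p = kmap_thread(page);
                u32 sum = 0;
                int i;

                for (i = 0; i < PAGE_SIZE; i++)
                        sum += p[i];
                kunmap_thread(page);
                return sum;
        }

        /*
         * Remaining callers: the mapping may be used from other threads, so they
         * keep kmap()/kunmap(), which enables the pkey globally.
         */
        static void *publish_mapping(struct page *page)
        {
                return kmap(page);
        }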
Cc: Randy Dunlap rdunlap@infradead.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- include/linux/highmem.h | 34 ++++++++++++++++++++++++++-------- 1 file changed, 26 insertions(+), 8 deletions(-)
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index 2a9806e3b8d2..ef7813544719 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -60,7 +60,7 @@ static inline void kmap_flush_tlb(unsigned long addr) { } #endif
void *kmap_high(struct page *page); -static inline void *kmap(struct page *page) +static inline void *__kmap(struct page *page, bool global) { void *addr;
@@ -74,20 +74,20 @@ static inline void *kmap(struct page *page) * Even non-highmem pages may have additional access protections which * need to be checked and potentially enabled. */ - dev_page_enable_access(page, true); + dev_page_enable_access(page, global); return addr; }
void kunmap_high(struct page *page);
-static inline void kunmap(struct page *page) +static inline void __kunmap(struct page *page, bool global) { might_sleep(); /* * Even non-highmem pages may have additional access protections which * need to be checked and potentially disabled. */ - dev_page_disable_access(page, true); + dev_page_disable_access(page, global); if (!PageHighMem(page)) return; kunmap_high(page); @@ -160,10 +160,10 @@ static inline struct page *kmap_to_page(void *addr)
static inline unsigned long totalhigh_pages(void) { return 0UL; }
-static inline void *kmap(struct page *page) +static inline void *__kmap(struct page *page, bool global) { might_sleep(); - dev_page_enable_access(page, true); + dev_page_enable_access(page, global); return page_address(page); }
@@ -171,9 +171,9 @@ static inline void kunmap_high(struct page *page) { }
-static inline void kunmap(struct page *page) +static inline void __kunmap(struct page *page, bool global) { - dev_page_disable_access(page, true); + dev_page_disable_access(page, global); #ifdef ARCH_HAS_FLUSH_ON_KUNMAP kunmap_flush_on_unmap(page_address(page)); #endif @@ -238,6 +238,24 @@ static inline void kmap_atomic_idx_pop(void)
#endif
+static inline void *kmap(struct page *page) +{ + return __kmap(page, true); +} +static inline void kunmap(struct page *page) +{ + __kunmap(page, true); +} + +static inline void *kmap_thread(struct page *page) +{ + return __kmap(page, false); +} +static inline void kunmap_thread(struct page *page) +{ + __kunmap(page, false); +} + /* * Prevent people trying to call kunmap_atomic() as if it were kunmap() * kunmap_atomic() should get the return value of kmap_atomic, not the page.
Ira,
On Fri, Oct 09 2020 at 12:49, ira weiny wrote:
From: Ira Weiny ira.weiny@intel.com
To correctly support the semantics of kmap() with Kernel protection keys (PKS), kmap() may be required to set the protections on multiple processors (globally). Enabling PKS globally can be very expensive depending on the requested operation. Furthermore, enabling a domain globally reduces the protection afforded by PKS.
Most kmap() (Aprox 209 of 229) callers use the map within a single thread and have no need for the protection domain to be enabled globally. However, the remaining callers do not follow this pattern and, as best I can tell, expect the mapping to be 'global' and available to any thread who may access the mapping.[1]
We don't anticipate global mappings to pmem, however in general there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why the failure occurred.
To resolve this a number of options were considered.
- Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
- Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
- Change ~20 call sites to 'kmap_global()' to indicate that they require a global enablement of the pages.
- Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
Option 1 is simply not feasible. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior the developer intended. Therefore, #4 was chosen.
There is Option #5:
Convert the thread local kmap() invocations to the proposed kmap_local() interface which is coming along [1].
That solves a couple of issues:
1) It relieves the current kmap_atomic() usage sites from the implicit pagefault/preempt disable semantics which apply even when CONFIG_HIGHMEM is disabled. kmap_local() can still be invoked from atomic context.
2) Due to #1 it allows replacing the conditional usage of kmap() and kmap_atomic() for purely thread-local mappings.
3) It puts the burden on the HIGHMEM inflicted systems
4) It is actually more efficient for most of the pure thread local use cases on HIGHMEM inflicted systems because it avoids the overhead of the global lock and the potential kmap slot exhaustion. A potential preemption will be more expensive, but that's not really the case we want to optimize for.
5) It solves the RT issue vs. kmap_atomic()
So instead of creating yet another variety of kmap() which is just scratching the particular PKRS itch, can we please consolidate all of that on the wider reaching kmap_local() approach?
Thanks,
tglx
[1] https://lore.kernel.org/lkml/20201103092712.714480842@linutronix.de/
On Tue, Nov 10, 2020 at 02:13:56AM +0100, Thomas Gleixner wrote:
Ira,
On Fri, Oct 09 2020 at 12:49, ira weiny wrote:
From: Ira Weiny ira.weiny@intel.com
To correctly support the semantics of kmap() with Kernel protection keys (PKS), kmap() may be required to set the protections on multiple processors (globally). Enabling PKS globally can be very expensive depending on the requested operation. Furthermore, enabling a domain globally reduces the protection afforded by PKS.
Most kmap() (Aprox 209 of 229) callers use the map within a single thread and have no need for the protection domain to be enabled globally. However, the remaining callers do not follow this pattern and, as best I can tell, expect the mapping to be 'global' and available to any thread who may access the mapping.[1]
We don't anticipate global mappings to pmem, however in general there is a danger in changing the semantics of kmap(). Effectively, this would cause an unresolved page fault with little to no information about why the failure occurred.
To resolve this a number of options were considered.
- Attempt to change all the thread local kmap() calls to kmap_atomic()[2]
- Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
- Change ~20 call sites to 'kmap_global()' to indicate that they require a global enablement of the pages.
- Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
Option 1 is simply not feasible. Option 2 would require all of the call sites of kmap() to change. Option 3 seems like a good minimal change but there is a danger that new code may miss the semantic change of kmap() and not get the behavior the developer intended. Therefore, #4 was chosen.
There is Option #5:
There is now yes. :-D
Convert the thread local kmap() invocations to the proposed kmap_local() interface which is coming along [1].
I've been trying to follow that thread.
That solves a couple of issues:
It relieves the current kmap_atomic() usage sites from the implict pagefault/preempt disable semantics which apply even when CONFIG_HIGHMEM is disabled. kmap_local() still can be invoked from atomic context.
Due to #1 it allows to replace the conditional usage of kmap() and kmap_atomic() for purely thread local mappings.
It puts the burden on the HIGHMEM inflicted systems
It is actually more efficient for most of the pure thread local use cases on HIGHMEM inflicted systems because it avoids the overhead of the global lock and the potential kmap slot exhaustion. A potential preemption will be more expensive, but that's not really the case we want to optimize for.
It solves the RT issue vs. kmap_atomic()
So instead of creating yet another variety of kmap() which is just scratching the particular PKRS itch, can we please consolidate all of that on the wider reaching kmap_local() approach?
Yes I agree. We absolutely don't want more kmap*() calls and I was hoping to dovetail into your kmap_local() work.[2]
I've pivoted away from this work a bit to clean up all the kmap()/memcpy*()/kunmaps() as discussed elsewhere in the thread first.[3] I was hoping your work would land and then I could s/kmap_thread()/kmap_local()/ on all of these patches.
Also, we can convert the new memcpy_*_page() calls to kmap_local() as well. [For now my patch just uses kmap_atomic().]
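(For reference, the memcpy_*_page() helpers mentioned here are along these lines, with kmap_atomic() as the mapping primitive for now; the exact form in the posted patch may differ:)

        static inline void memcpy_to_page(struct page *page, size_t offset,
                                          const char *from, size_t len)
        {
                char *to = kmap_atomic(page);

                memcpy(to + offset, from, len);
                kunmap_atomic(to);
        }

        static inline void memcpy_from_page(char *to, struct page *page,
                                            size_t offset, size_t len)
        {
                char *from = kmap_atomic(page);

                memcpy(to, from + offset, len);
                kunmap_atomic(from);
        }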
I've not looked at all of the patches in your latest version. Have you included converting any of the kmap() call sites? I thought you were more focused on converting the kmap_atomic() to kmap_local()?
Ira
Thanks,
tglx
[1] https://lore.kernel.org/lkml/20201103092712.714480842@linutronix.de/
[2] https://lore.kernel.org/lkml/20201012195354.GC2046448@iweiny-DESK2.sc.intel.... [3] https://lore.kernel.org/lkml/20201009213434.GA839@sol.localdomain/ https://lore.kernel.org/lkml/20201013200149.GI3576660@ZenIV.linux.org.uk/
On Mon, Nov 09 2020 at 20:59, Ira Weiny wrote:
On Tue, Nov 10, 2020 at 02:13:56AM +0100, Thomas Gleixner wrote: Also, we can convert the new memcpy_*_page() calls to kmap_local() as well. [For now my patch just uses kmap_atomic().]
I've not looked at all of the patches in your latest version. Have you included converting any of the kmap() call sites? I thought you were more focused on converting the kmap_atomic() to kmap_local()?
I did not touch any of those yet, but it's a logical consequence to convert all kmap() instances which are _not_ creating a global mapping over to it.
Thanks,
tglx
From: Ira Weiny ira.weiny@intel.com
Most kmap() callers use the map within a single thread and have no need for the protection domain to be enabled globally.
To differentiate these kmap() users, new k[un]map_thread() calls were introduced which are thread-local.

To aid in debugging the new use of kmap_thread(), add a reference count, a check on that count at task exit, and tracing to identify where mapping errors occur.
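The class of bug this is meant to catch is an unbalanced map on an error path; a contrived sketch (the helper and the verify_magic() check are made up):

        #include <linux/crc32.h>
        #include <linux/errno.h>
        #include <linux/highmem.h>

        static int checksum_page(struct page *page, u32 *csum)
        {
                void *addr = kmap_thread(page);

                if (!verify_magic(addr))        /* hypothetical validity check */
                        return -EINVAL;         /* BUG: kunmap_thread() is skipped */

                *csum = crc32(0, addr, PAGE_SIZE);
                kunmap_thread(page);
                return 0;
        }

        /*
         * With CONFIG_DEBUG_KMAP_THREAD the leaked reference shows up as a
         * non-zero kmap_thread_cnt warning when the task exits, and the
         * kmap_thread/kunmap_thread tracepoints point at the caller.
         */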
Cc: Juri Lelli juri.lelli@redhat.com Cc: Vincent Guittot vincent.guittot@linaro.org Cc: Dietmar Eggemann dietmar.eggemann@arm.com Cc: Steven Rostedt rostedt@goodmis.org Cc: Ben Segall bsegall@google.com Cc: Mel Gorman mgorman@suse.de Signed-off-by: Ira Weiny ira.weiny@intel.com --- include/linux/highmem.h | 5 +++ include/linux/sched.h | 5 +++ include/trace/events/kmap_thread.h | 56 ++++++++++++++++++++++++++++++ init/init_task.c | 3 ++ kernel/fork.c | 15 ++++++++ lib/Kconfig.debug | 8 +++++ mm/debug.c | 23 ++++++++++++ 7 files changed, 115 insertions(+) create mode 100644 include/trace/events/kmap_thread.h
diff --git a/include/linux/highmem.h b/include/linux/highmem.h index ef7813544719..22d1c000802e 100644 --- a/include/linux/highmem.h +++ b/include/linux/highmem.h @@ -247,6 +247,10 @@ static inline void kunmap(struct page *page) __kunmap(page, true); }
+#ifdef CONFIG_DEBUG_KMAP_THREAD +void *kmap_thread(struct page *page); +void kunmap_thread(struct page *page); +#else static inline void *kmap_thread(struct page *page) { return __kmap(page, false); @@ -255,6 +259,7 @@ static inline void kunmap_thread(struct page *page) { __kunmap(page, false); } +#endif
/* * Prevent people trying to call kunmap_atomic() as if it were kunmap() diff --git a/include/linux/sched.h b/include/linux/sched.h index 25d97ab6c757..4627ea4a49e6 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1318,6 +1318,11 @@ struct task_struct { #ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION unsigned int dev_page_access_ref; #endif + +#ifdef CONFIG_DEBUG_KMAP_THREAD + unsigned int kmap_thread_cnt; +#endif + /* * New fields for task_struct should be added above here, so that * they are included in the randomized portion of task_struct. diff --git a/include/trace/events/kmap_thread.h b/include/trace/events/kmap_thread.h new file mode 100644 index 000000000000..e7143cfe0daf --- /dev/null +++ b/include/trace/events/kmap_thread.h @@ -0,0 +1,56 @@ +/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */ + +/* + * Copyright (c) 2020 Intel Corporation. All rights reserved. + * + */ + +#undef TRACE_SYSTEM +#define TRACE_SYSTEM kmap_thread + +#if !defined(_TRACE_KMAP_THREAD_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_KMAP_THREAD_H + +#include <linux/tracepoint.h> + +DECLARE_EVENT_CLASS(kmap_thread_template, + TP_PROTO(struct task_struct *tsk, struct page *page, + void *caller_addr, int cnt), + TP_ARGS(tsk, page, caller_addr, cnt), + + TP_STRUCT__entry( + __field(int, pid) + __field(struct page *, page) + __field(void *, caller_addr) + __field(int, cnt) + ), + + TP_fast_assign( + __entry->pid = tsk->pid; + __entry->page = page; + __entry->caller_addr = caller_addr; + __entry->cnt = cnt; + ), + + TP_printk("PID %d; (%d) %pS %p", + __entry->pid, + __entry->cnt, + __entry->caller_addr, + __entry->page + ) +); + +DEFINE_EVENT(kmap_thread_template, kmap_thread, + TP_PROTO(struct task_struct *tsk, struct page *page, + void *caller_addr, int cnt), + TP_ARGS(tsk, page, caller_addr, cnt)); + +DEFINE_EVENT(kmap_thread_template, kunmap_thread, + TP_PROTO(struct task_struct *tsk, struct page *page, + void *caller_addr, int cnt), + TP_ARGS(tsk, page, caller_addr, cnt)); + + +#endif /* _TRACE_KMAP_THREAD_H */ + +#include <trace/define_trace.h> diff --git a/init/init_task.c b/init/init_task.c index 9b39f25de59b..19f09965eb34 100644 --- a/init/init_task.c +++ b/init/init_task.c @@ -212,6 +212,9 @@ struct task_struct init_task #ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION .dev_page_access_ref = 0, #endif +#ifdef CONFIG_DEBUG_KMAP_THREAD + .kmap_thread_cnt = 0, +#endif }; EXPORT_SYMBOL(init_task);
diff --git a/kernel/fork.c b/kernel/fork.c index b6a3ee328a89..2c66e49b7614 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -722,6 +722,17 @@ static inline void put_signal_struct(struct signal_struct *sig) free_signal_struct(sig); }
+#ifdef CONFIG_DEBUG_KMAP_THREAD +static void check_outstanding_kmap_thread(struct task_struct *tsk) +{ + if (tsk->kmap_thread_cnt) + pr_warn(KERN_ERR "WARNING: PID %d; Failed to kunmap_thread() [cnt %d]\n", + tsk->pid, tsk->kmap_thread_cnt); +} +#else +static void check_outstanding_kmap_thread(struct task_struct *tsk) { } +#endif + void __put_task_struct(struct task_struct *tsk) { WARN_ON(!tsk->exit_state); @@ -734,6 +745,7 @@ void __put_task_struct(struct task_struct *tsk) exit_creds(tsk); delayacct_tsk_free(tsk); put_signal_struct(tsk->signal); + check_outstanding_kmap_thread(tsk);
if (!profile_handoff_task(tsk)) free_task(tsk); @@ -943,6 +955,9 @@ static struct task_struct *dup_task_struct(struct task_struct *orig, int node) #endif #ifdef CONFIG_ZONE_DEVICE_ACCESS_PROTECTION tsk->dev_page_access_ref = 0; +#endif +#ifdef CONFIG_DEBUG_KMAP_THREAD + tsk->kmap_thread_cnt = 0; #endif return tsk;
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index f015c09ba5a1..6507b43d5b0c 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -858,6 +858,14 @@ config DEBUG_HIGHMEM This option enables additional error checking for high memory systems. Disable for production systems.
+config DEBUG_KMAP_THREAD + bool "Kmap debugging" + depends on DEBUG_KERNEL + help + This option enables additional error checking for kernel mapping code + specifically the k[un]map_thread() calls. Disable for production + systems. + config HAVE_DEBUG_STACKOVERFLOW bool
diff --git a/mm/debug.c b/mm/debug.c index ca8d1cacdecc..68d186f3570e 100644 --- a/mm/debug.c +++ b/mm/debug.c @@ -320,3 +320,26 @@ void page_init_poison(struct page *page, size_t size) } EXPORT_SYMBOL_GPL(page_init_poison); #endif /* CONFIG_DEBUG_VM */ + +#define CREATE_TRACE_POINTS +#include <trace/events/kmap_thread.h> + +#ifdef CONFIG_DEBUG_KMAP_THREAD +void *kmap_thread(struct page *page) +{ + trace_kmap_thread(current, page, __builtin_return_address(0), + current->kmap_thread_cnt); + current->kmap_thread_cnt++; + return __kmap(page, false); +} +EXPORT_SYMBOL_GPL(kmap_thread); + +void kunmap_thread(struct page *page) +{ + __kunmap(page, false); + current->kmap_thread_cnt--; + trace_kunmap_thread(current, page, __builtin_return_address(0), + current->kmap_thread_cnt); +} +EXPORT_SYMBOL_GPL(kunmap_thread); +#endif
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this driver are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/block/drbd/drbd_main.c | 4 ++-- drivers/block/drbd/drbd_receiver.c | 12 ++++++------ 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c index 573dbf6f0c31..f0d0c6b0745e 100644 --- a/drivers/block/drbd/drbd_main.c +++ b/drivers/block/drbd/drbd_main.c @@ -1532,9 +1532,9 @@ static int _drbd_no_send_page(struct drbd_peer_device *peer_device, struct page int err;
socket = peer_device->connection->data.socket; - addr = kmap(page) + offset; + addr = kmap_thread(page) + offset; err = drbd_send_all(peer_device->connection, socket, addr, size, msg_flags); - kunmap(page); + kunmap_thread(page); if (!err) peer_device->device->send_cnt += size >> 9; return err; diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 422363daa618..4704bc0564e2 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -1951,13 +1951,13 @@ read_in_block(struct drbd_peer_device *peer_device, u64 id, sector_t sector, page = peer_req->pages; page_chain_for_each(page) { unsigned len = min_t(int, ds, PAGE_SIZE); - data = kmap(page); + data = kmap_thread(page); err = drbd_recv_all_warn(peer_device->connection, data, len); if (drbd_insert_fault(device, DRBD_FAULT_RECEIVE)) { drbd_err(device, "Fault injection: Corrupting data on receive\n"); data[0] = data[0] ^ (unsigned long)-1; } - kunmap(page); + kunmap_thread(page); if (err) { drbd_free_peer_req(device, peer_req); return NULL; @@ -1992,7 +1992,7 @@ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size)
page = drbd_alloc_pages(peer_device, 1, 1);
- data = kmap(page); + data = kmap_thread(page); while (data_size) { unsigned int len = min_t(int, data_size, PAGE_SIZE);
@@ -2001,7 +2001,7 @@ static int drbd_drain_block(struct drbd_peer_device *peer_device, int data_size) break; data_size -= len; } - kunmap(page); + kunmap_thread(page); drbd_free_pages(peer_device->device, page, 0); return err; } @@ -2033,10 +2033,10 @@ static int recv_dless_read(struct drbd_peer_device *peer_device, struct drbd_req D_ASSERT(peer_device->device, sector == bio->bi_iter.bi_sector);
bio_for_each_segment(bvec, bio, iter) { - void *mapped = kmap(bvec.bv_page) + bvec.bv_offset; + void *mapped = kmap_thread(bvec.bv_page) + bvec.bv_offset; expect = min_t(int, data_size, bvec.bv_len); err = drbd_recv_all_warn(peer_device->connection, mapped, expect); - kunmap(bvec.bv_page); + kunmap_thread(bvec.bv_page); if (err) return err; data_size -= expect;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this driver are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Luis Chamberlain mcgrof@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/base/firmware_loader/fallback.c | 4 ++-- drivers/base/firmware_loader/main.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/base/firmware_loader/fallback.c b/drivers/base/firmware_loader/fallback.c index 283ca2de76d4..22dea9ba7a37 100644 --- a/drivers/base/firmware_loader/fallback.c +++ b/drivers/base/firmware_loader/fallback.c @@ -322,14 +322,14 @@ static void firmware_rw(struct fw_priv *fw_priv, char *buffer, int page_ofs = offset & (PAGE_SIZE-1); int page_cnt = min_t(size_t, PAGE_SIZE - page_ofs, count);
- page_data = kmap(fw_priv->pages[page_nr]); + page_data = kmap_thread(fw_priv->pages[page_nr]);
if (read) memcpy(buffer, page_data + page_ofs, page_cnt); else memcpy(page_data + page_ofs, buffer, page_cnt);
- kunmap(fw_priv->pages[page_nr]); + kunmap_thread(fw_priv->pages[page_nr]); buffer += page_cnt; offset += page_cnt; count -= page_cnt; diff --git a/drivers/base/firmware_loader/main.c b/drivers/base/firmware_loader/main.c index 63b9714a0154..cc884c9f8742 100644 --- a/drivers/base/firmware_loader/main.c +++ b/drivers/base/firmware_loader/main.c @@ -409,11 +409,11 @@ static int fw_decompress_xz_pages(struct device *dev, struct fw_priv *fw_priv,
/* decompress onto the new allocated page */ page = fw_priv->pages[fw_priv->nr_pages - 1]; - xz_buf.out = kmap(page); + xz_buf.out = kmap_thread(page); xz_buf.out_pos = 0; xz_buf.out_size = PAGE_SIZE; xz_ret = xz_dec_run(xz_dec, &xz_buf); - kunmap(page); + kunmap_thread(page); fw_priv->size += xz_buf.out_pos; /* partial decompression means either end or error */ if (xz_buf.out_pos != PAGE_SIZE)
From: Ira Weiny ira.weiny@intel.com
These kmap() calls in the GPU stack are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Patrik Jakobsson patrik.r.jakobsson@gmail.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++++++------ drivers/gpu/drm/gma500/gma_display.c | 4 ++-- drivers/gpu/drm/gma500/mmu.c | 10 +++++----- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++-- .../gpu/drm/i915/gem/selftests/i915_gem_context.c | 4 ++-- drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 ++++---- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 ++-- drivers/gpu/drm/i915/i915_gem.c | 8 ++++---- drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 ++-- drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++-- 13 files changed, 37 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 978bae731398..bd564bccb7a3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2437,11 +2437,11 @@ static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf,
page = adev->gart.pages[p]; if (page) { - ptr = kmap(page); + ptr = kmap_thread(page); ptr += off;
r = copy_to_user(buf, ptr, cur_size); - kunmap(adev->gart.pages[p]); + kunmap_thread(adev->gart.pages[p]); } else r = clear_user(buf, cur_size);
@@ -2507,9 +2507,9 @@ static ssize_t amdgpu_iomem_read(struct file *f, char __user *buf, if (p->mapping != adev->mman.bdev.dev_mapping) return -EPERM;
- ptr = kmap(p); + ptr = kmap_thread(p); r = copy_to_user(buf, ptr + off, bytes); - kunmap(p); + kunmap_thread(p); if (r) return -EFAULT;
@@ -2558,9 +2558,9 @@ static ssize_t amdgpu_iomem_write(struct file *f, const char __user *buf, if (p->mapping != adev->mman.bdev.dev_mapping) return -EPERM;
- ptr = kmap(p); + ptr = kmap_thread(p); r = copy_from_user(ptr + off, buf, bytes); - kunmap(p); + kunmap_thread(p); if (r) return -EFAULT;
diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c index 3df6d6e850f5..35f4e55c941f 100644 --- a/drivers/gpu/drm/gma500/gma_display.c +++ b/drivers/gpu/drm/gma500/gma_display.c @@ -400,9 +400,9 @@ int gma_crtc_cursor_set(struct drm_crtc *crtc, /* Copy the cursor to cursor mem */ tmp_dst = dev_priv->vram_addr + cursor_gt->offset; for (i = 0; i < cursor_pages; i++) { - tmp_src = kmap(gt->pages[i]); + tmp_src = kmap_thread(gt->pages[i]); memcpy(tmp_dst, tmp_src, PAGE_SIZE); - kunmap(gt->pages[i]); + kunmap_thread(gt->pages[i]); tmp_dst += PAGE_SIZE; }
diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c index 505044c9a673..fba7a3a461fd 100644 --- a/drivers/gpu/drm/gma500/mmu.c +++ b/drivers/gpu/drm/gma500/mmu.c @@ -192,20 +192,20 @@ struct psb_mmu_pd *psb_mmu_alloc_pd(struct psb_mmu_driver *driver, pd->invalid_pte = 0; }
- v = kmap(pd->dummy_pt); + v = kmap_thread(pd->dummy_pt); for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i) v[i] = pd->invalid_pte;
- kunmap(pd->dummy_pt); + kunmap_thread(pd->dummy_pt);
- v = kmap(pd->p); + v = kmap_thread(pd->p); for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i) v[i] = pd->invalid_pde;
- kunmap(pd->p); + kunmap_thread(pd->p);
clear_page(kmap(pd->dummy_page)); - kunmap(pd->dummy_page); + kunmap_thread(pd->dummy_page);
pd->tables = vmalloc_user(sizeof(struct psb_mmu_pt *) * 1024); if (!pd->tables) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 38113d3c0138..274424795fb7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -566,9 +566,9 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv, if (err < 0) goto fail;
- vaddr = kmap(page); + vaddr = kmap_thread(page); memcpy(vaddr, data, len); - kunmap(page); + kunmap_thread(page);
err = pagecache_write_end(file, file->f_mapping, offset, len, len, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 7ffc3c751432..b466c677d007 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1754,7 +1754,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) return -EINVAL; }
- vaddr = kmap(page); + vaddr = kmap_thread(page); if (!vaddr) { pr_err("No (mappable) scratch page!\n"); return -EINVAL; @@ -1765,7 +1765,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) pr_err("Inconsistent initial state of scratch page!\n"); err = -EINVAL; } - kunmap(page); + kunmap_thread(page);
return err; } diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 9c7402ce5bf9..447df22e2e06 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -143,7 +143,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_thread(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -161,7 +161,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_thread(p);
out: __i915_vma_put(vma); @@ -236,7 +236,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt);
p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT); - cpu = kmap(p) + offset_in_page(offset); + cpu = kmap_thread(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n", @@ -254,7 +254,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu)); - kunmap(p); + kunmap_thread(p); if (err) return err;
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 7fb36b12fe7a..38da348282f1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -731,7 +731,7 @@ static void swizzle_page(struct page *page) char *vaddr; int i;
- vaddr = kmap(page); + vaddr = kmap_thread(page);
for (i = 0; i < PAGE_SIZE; i += 128) { memcpy(temp, &vaddr[i], 64); @@ -739,7 +739,7 @@ static void swizzle_page(struct page *page) memcpy(&vaddr[i + 64], temp, 64); }
- kunmap(page); + kunmap_thread(page); }
/** diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 2a72cce63fd9..4cfb24e9ed62 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -312,9 +312,9 @@ static void poison_scratch_page(struct page *page, unsigned long size) do { void *vaddr;
- vaddr = kmap(page); + vaddr = kmap_thread(page); memset(vaddr, POISON_FREE, PAGE_SIZE); - kunmap(page); + kunmap_thread(page);
page = pfn_to_page(page_to_pfn(page) + 1); size -= PAGE_SIZE; diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 43c7acbdc79d..a40d3130cebf 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -142,12 +142,12 @@ static int __shmem_rw(struct file *file, loff_t off, if (IS_ERR(page)) return PTR_ERR(page);
- vaddr = kmap(page); + vaddr = kmap_thread(page); if (write) memcpy(vaddr + offset_in_page(off), ptr, this); else memcpy(ptr, vaddr + offset_in_page(off), this); - kunmap(page); + kunmap_thread(page); put_page(page);
len -= this; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 9aa3066cb75d..cae8300fd224 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -312,14 +312,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret;
- vaddr = kmap(page); + vaddr = kmap_thread(page);
if (needs_clflush) drm_clflush_virt_range(vaddr + offset, len);
ret = __copy_to_user(user_data, vaddr + offset, len);
- kunmap(page); + kunmap_thread(page);
return ret ? -EFAULT : 0; } @@ -708,7 +708,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret;
- vaddr = kmap(page); + vaddr = kmap_thread(page);
if (needs_clflush_before) drm_clflush_virt_range(vaddr + offset, len); @@ -717,7 +717,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, if (!ret && needs_clflush_after) drm_clflush_virt_range(vaddr + offset, len);
- kunmap(page); + kunmap_thread(page);
return ret ? -EFAULT : 0; } diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 3e6cbb0d1150..aecd469b6b6e 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1058,9 +1058,9 @@ i915_vma_coredump_create(const struct intel_gt *gt,
drm_clflush_pages(&page, 1);
- s = kmap(page); + s = kmap_thread(page); ret = compress_page(compress, s, dst, false); - kunmap(page); + kunmap_thread(page);
drm_clflush_pages(&page, 1);
diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c index c2d001d9c0ec..7f7ef2d056f4 100644 --- a/drivers/gpu/drm/i915/selftests/i915_perf.c +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg) }
/* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */ - scratch = kmap(ce->vm->scratch[0].base.page); + scratch = kmap_thread(ce->vm->scratch[0].base.page); memset(scratch, POISON_FREE, PAGE_SIZE);
rq = intel_context_create_request(ce); @@ -405,7 +405,7 @@ static int live_noa_gpr(void *arg) out_rq: i915_request_put(rq); out_ce: - kunmap(ce->vm->scratch[0].base.page); + kunmap_thread(ce->vm->scratch[0].base.page); intel_context_put(ce); out: stream_destroy(stream); diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 004344dce140..0aba0cac51e1 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -1013,11 +1013,11 @@ static ssize_t radeon_ttm_gtt_read(struct file *f, char __user *buf,
page = rdev->gart.pages[p]; if (page) { - ptr = kmap(page); + ptr = kmap_thread(page); ptr += off;
r = copy_to_user(buf, ptr, cur_size); - kunmap(rdev->gart.pages[p]); + kunmap_thread(rdev->gart.pages[p]); } else r = clear_user(buf, cur_size);
On Fri, Oct 09, 2020 at 12:49:44PM -0700, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
These kmap() calls in the gpu stack are localized to a single thread. To avoid the over head of global PKRS updates use the new kmap_thread() call.
Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Patrik Jakobsson patrik.r.jakobsson@gmail.com Signed-off-by: Ira Weiny ira.weiny@intel.com
I'm guessing the entire pile goes in through some other tree. If so:
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
If you want this to land through maintainer trees, then we need a per-driver split (since aside from amdgpu and radeon they're all different subtrees).
BTW, the two kmap() calls in drm you highlight in the cover letter should also be convertible to kmap_thread(). We only hold vmalloc mappings for a longer time (or it'd be quite a driver bug). So if you want, maybe throw those two in as two additional patches on top, and we can do some careful review & testing for them. -Daniel
On Sat, Oct 10, 2020 at 12:03:49AM +0200, Daniel Vetter wrote:
On Fri, Oct 09, 2020 at 12:49:44PM -0700, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
These kmap() calls in the gpu stack are localized to a single thread. To avoid the over head of global PKRS updates use the new kmap_thread() call.
Cc: David Airlie airlied@linux.ie Cc: Daniel Vetter daniel@ffwll.ch Cc: Patrik Jakobsson patrik.r.jakobsson@gmail.com Signed-off-by: Ira Weiny ira.weiny@intel.com
I'm guessing the entire pile goes in through some other tree.
Apologies for not realizing there were multiple maintainers here.
But, I was thinking it would land together through the mm tree once the core support lands. I've tried to split these out in a way they can be easily reviewed/acked by the correct developers.
If so:
Acked-by: Daniel Vetter daniel.vetter@ffwll.ch
If you want this to land through maintainer trees, then we need a per-driver split (since aside from amdgpu and radeon they're all different subtrees).
It is just an RFC for the moment. I need to get the core support accepted first; then this can land.
btw the two kmap calls in drm you highlight in the cover letter should also be convertible to kmap_thread. We only hold vmalloc mappings for a longer time (or it'd be quite a driver bug). So if you want maybe throw those two as two additional patches on top, and we can do some careful review & testing for them.
Cool. I'll add them in.
Ira
-Daniel
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 12 ++++++------ drivers/gpu/drm/gma500/gma_display.c | 4 ++-- drivers/gpu/drm/gma500/mmu.c | 10 +++++----- drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 4 ++-- .../gpu/drm/i915/gem/selftests/i915_gem_context.c | 4 ++-- drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c | 8 ++++---- drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c | 4 ++-- drivers/gpu/drm/i915/gt/intel_gtt.c | 4 ++-- drivers/gpu/drm/i915/gt/shmem_utils.c | 4 ++-- drivers/gpu/drm/i915/i915_gem.c | 8 ++++---- drivers/gpu/drm/i915/i915_gpu_error.c | 4 ++-- drivers/gpu/drm/i915/selftests/i915_perf.c | 4 ++-- drivers/gpu/drm/radeon/radeon_ttm.c | 4 ++-- 13 files changed, 37 insertions(+), 37 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 978bae731398..bd564bccb7a3 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c @@ -2437,11 +2437,11 @@ static ssize_t amdgpu_ttm_gtt_read(struct file *f, char __user *buf, page = adev->gart.pages[p]; if (page) {
ptr = kmap(page);
ptr = kmap_thread(page); ptr += off;
r = copy_to_user(buf, ptr, cur_size);
kunmap(adev->gart.pages[p]);
} else r = clear_user(buf, cur_size);kunmap_thread(adev->gart.pages[p]);
@@ -2507,9 +2507,9 @@ static ssize_t amdgpu_iomem_read(struct file *f, char __user *buf, if (p->mapping != adev->mman.bdev.dev_mapping) return -EPERM;
ptr = kmap(p);
r = copy_to_user(buf, ptr + off, bytes);ptr = kmap_thread(p);
kunmap(p);
if (r) return -EFAULT;kunmap_thread(p);
@@ -2558,9 +2558,9 @@ static ssize_t amdgpu_iomem_write(struct file *f, const char __user *buf, if (p->mapping != adev->mman.bdev.dev_mapping) return -EPERM;
ptr = kmap(p);
r = copy_from_user(ptr + off, buf, bytes);ptr = kmap_thread(p);
kunmap(p);
if (r) return -EFAULT;kunmap_thread(p);
diff --git a/drivers/gpu/drm/gma500/gma_display.c b/drivers/gpu/drm/gma500/gma_display.c index 3df6d6e850f5..35f4e55c941f 100644 --- a/drivers/gpu/drm/gma500/gma_display.c +++ b/drivers/gpu/drm/gma500/gma_display.c @@ -400,9 +400,9 @@ int gma_crtc_cursor_set(struct drm_crtc *crtc, /* Copy the cursor to cursor mem */ tmp_dst = dev_priv->vram_addr + cursor_gt->offset; for (i = 0; i < cursor_pages; i++) {
tmp_src = kmap(gt->pages[i]);
tmp_src = kmap_thread(gt->pages[i]); memcpy(tmp_dst, tmp_src, PAGE_SIZE);
kunmap(gt->pages[i]);
}kunmap_thread(gt->pages[i]); tmp_dst += PAGE_SIZE;
diff --git a/drivers/gpu/drm/gma500/mmu.c b/drivers/gpu/drm/gma500/mmu.c index 505044c9a673..fba7a3a461fd 100644 --- a/drivers/gpu/drm/gma500/mmu.c +++ b/drivers/gpu/drm/gma500/mmu.c @@ -192,20 +192,20 @@ struct psb_mmu_pd *psb_mmu_alloc_pd(struct psb_mmu_driver *driver, pd->invalid_pte = 0; }
- v = kmap(pd->dummy_pt);
- v = kmap_thread(pd->dummy_pt); for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i) v[i] = pd->invalid_pte;
- kunmap(pd->dummy_pt);
- kunmap_thread(pd->dummy_pt);
- v = kmap(pd->p);
- v = kmap_thread(pd->p); for (i = 0; i < (PAGE_SIZE / sizeof(uint32_t)); ++i) v[i] = pd->invalid_pde;
- kunmap(pd->p);
- kunmap_thread(pd->p);
clear_page(kmap(pd->dummy_page));
- kunmap(pd->dummy_page);
- kunmap_thread(pd->dummy_page);
pd->tables = vmalloc_user(sizeof(struct psb_mmu_pt *) * 1024); if (!pd->tables) diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c index 38113d3c0138..274424795fb7 100644 --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c @@ -566,9 +566,9 @@ i915_gem_object_create_shmem_from_data(struct drm_i915_private *dev_priv, if (err < 0) goto fail;
vaddr = kmap(page);
memcpy(vaddr, data, len);vaddr = kmap_thread(page);
kunmap(page);
kunmap_thread(page);
err = pagecache_write_end(file, file->f_mapping, offset, len, len, diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c index 7ffc3c751432..b466c677d007 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_context.c @@ -1754,7 +1754,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) return -EINVAL; }
- vaddr = kmap(page);
- vaddr = kmap_thread(page); if (!vaddr) { pr_err("No (mappable) scratch page!\n"); return -EINVAL;
@@ -1765,7 +1765,7 @@ static int check_scratch_page(struct i915_gem_context *ctx, u32 *out) pr_err("Inconsistent initial state of scratch page!\n"); err = -EINVAL; }
- kunmap(page);
- kunmap_thread(page);
return err; } diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c index 9c7402ce5bf9..447df22e2e06 100644 --- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c +++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_mman.c @@ -143,7 +143,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
- cpu = kmap(p) + offset_in_page(offset);
- cpu = kmap_thread(p) + offset_in_page(offset); drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",
@@ -161,7 +161,7 @@ static int check_partial_mapping(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu));
- kunmap(p);
- kunmap_thread(p);
out: __i915_vma_put(vma); @@ -236,7 +236,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, intel_gt_flush_ggtt_writes(&to_i915(obj->base.dev)->gt); p = i915_gem_object_get_page(obj, offset >> PAGE_SHIFT);
cpu = kmap(p) + offset_in_page(offset);
drm_clflush_virt_range(cpu, sizeof(*cpu)); if (*cpu != (u32)page) { pr_err("Partial view for %lu [%u] (offset=%llu, size=%u [%llu, row size %u], fence=%d, tiling=%d, stride=%d) misalignment, expected write to page (%llu + %u [0x%llx]) of 0x%x, found 0x%x\n",cpu = kmap_thread(p) + offset_in_page(offset);
@@ -254,7 +254,7 @@ static int check_partial_mappings(struct drm_i915_gem_object *obj, } *cpu = 0; drm_clflush_virt_range(cpu, sizeof(*cpu));
kunmap(p);
if (err) return err;kunmap_thread(p);
diff --git a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c index 7fb36b12fe7a..38da348282f1 100644 --- a/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c +++ b/drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c @@ -731,7 +731,7 @@ static void swizzle_page(struct page *page) char *vaddr; int i;
- vaddr = kmap(page);
- vaddr = kmap_thread(page);
for (i = 0; i < PAGE_SIZE; i += 128) { memcpy(temp, &vaddr[i], 64); @@ -739,7 +739,7 @@ static void swizzle_page(struct page *page) memcpy(&vaddr[i + 64], temp, 64); }
- kunmap(page);
- kunmap_thread(page);
} /** diff --git a/drivers/gpu/drm/i915/gt/intel_gtt.c b/drivers/gpu/drm/i915/gt/intel_gtt.c index 2a72cce63fd9..4cfb24e9ed62 100644 --- a/drivers/gpu/drm/i915/gt/intel_gtt.c +++ b/drivers/gpu/drm/i915/gt/intel_gtt.c @@ -312,9 +312,9 @@ static void poison_scratch_page(struct page *page, unsigned long size) do { void *vaddr;
vaddr = kmap(page);
memset(vaddr, POISON_FREE, PAGE_SIZE);vaddr = kmap_thread(page);
kunmap(page);
kunmap_thread(page);
page = pfn_to_page(page_to_pfn(page) + 1); size -= PAGE_SIZE; diff --git a/drivers/gpu/drm/i915/gt/shmem_utils.c b/drivers/gpu/drm/i915/gt/shmem_utils.c index 43c7acbdc79d..a40d3130cebf 100644 --- a/drivers/gpu/drm/i915/gt/shmem_utils.c +++ b/drivers/gpu/drm/i915/gt/shmem_utils.c @@ -142,12 +142,12 @@ static int __shmem_rw(struct file *file, loff_t off, if (IS_ERR(page)) return PTR_ERR(page);
vaddr = kmap(page);
if (write) memcpy(vaddr + offset_in_page(off), ptr, this); else memcpy(ptr, vaddr + offset_in_page(off), this);vaddr = kmap_thread(page);
kunmap(page);
put_page(page);kunmap_thread(page);
len -= this; diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 9aa3066cb75d..cae8300fd224 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -312,14 +312,14 @@ shmem_pread(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret;
- vaddr = kmap(page);
+ vaddr = kmap_thread(page);
if (needs_clflush) drm_clflush_virt_range(vaddr + offset, len); ret = __copy_to_user(user_data, vaddr + offset, len);
- kunmap(page);
+ kunmap_thread(page);
return ret ? -EFAULT : 0; } @@ -708,7 +708,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, char *vaddr; int ret;
- vaddr = kmap(page);
+ vaddr = kmap_thread(page);
if (needs_clflush_before) drm_clflush_virt_range(vaddr + offset, len); @@ -717,7 +717,7 @@ shmem_pwrite(struct page *page, int offset, int len, char __user *user_data, if (!ret && needs_clflush_after) drm_clflush_virt_range(vaddr + offset, len);
- kunmap(page);
+ kunmap_thread(page);
return ret ? -EFAULT : 0; } diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c index 3e6cbb0d1150..aecd469b6b6e 100644 --- a/drivers/gpu/drm/i915/i915_gpu_error.c +++ b/drivers/gpu/drm/i915/i915_gpu_error.c @@ -1058,9 +1058,9 @@ i915_vma_coredump_create(const struct intel_gt *gt, drm_clflush_pages(&page, 1);
- s = kmap(page);
+ s = kmap_thread(page);
ret = compress_page(compress, s, dst, false);
- kunmap(page);
+ kunmap_thread(page);
drm_clflush_pages(&page, 1); diff --git a/drivers/gpu/drm/i915/selftests/i915_perf.c b/drivers/gpu/drm/i915/selftests/i915_perf.c index c2d001d9c0ec..7f7ef2d056f4 100644 --- a/drivers/gpu/drm/i915/selftests/i915_perf.c +++ b/drivers/gpu/drm/i915/selftests/i915_perf.c @@ -307,7 +307,7 @@ static int live_noa_gpr(void *arg) } /* Poison the ce->vm so we detect writes not to the GGTT gt->scratch */
- scratch = kmap(ce->vm->scratch[0].base.page);
+ scratch = kmap_thread(ce->vm->scratch[0].base.page); memset(scratch, POISON_FREE, PAGE_SIZE);
rq = intel_context_create_request(ce); @@ -405,7 +405,7 @@ static int live_noa_gpr(void *arg) out_rq: i915_request_put(rq); out_ce:
- kunmap(ce->vm->scratch[0].base.page);
+ kunmap_thread(ce->vm->scratch[0].base.page); intel_context_put(ce);
out: stream_destroy(stream); diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c index 004344dce140..0aba0cac51e1 100644 --- a/drivers/gpu/drm/radeon/radeon_ttm.c +++ b/drivers/gpu/drm/radeon/radeon_ttm.c @@ -1013,11 +1013,11 @@ static ssize_t radeon_ttm_gtt_read(struct file *f, char __user *buf, page = rdev->gart.pages[p]; if (page) {
- ptr = kmap(page);
+ ptr = kmap_thread(page); ptr += off;
r = copy_to_user(buf, ptr, cur_size);
- kunmap(rdev->gart.pages[p]);
+ kunmap_thread(rdev->gart.pages[p]); } else r = clear_user(buf, cur_size);
-- 2.28.0.rc0.12.gb6a658bd00c9
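To make the pattern behind these conversions concrete, here is a minimal sketch of the thread-local 'map, use, unmap' idiom every hunk above follows. The helper name is made up for illustration, and kmap_thread()/kunmap_thread() are assumed to take the same arguments as kmap()/kunmap():

#include <linux/highmem.h>
#include <linux/string.h>

/* Illustrative only: not part of the posted series. */
static void example_zero_page(struct page *page)
{
	void *vaddr;

	/* The mapping is created and torn down by the same thread... */
	vaddr = kmap_thread(page);
	memset(vaddr, 0, PAGE_SIZE);
	kunmap_thread(page);
	/* ...so only the thread-local PKRS value needs to change. */
}

Because the mapping never leaves the thread, each conversion is a mechanical substitution of the map and unmap calls, which is exactly what the hunks above do.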
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in these drivers are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Mike Marciniszyn mike.marciniszyn@intel.com Cc: Dennis Dalessandro dennis.dalessandro@intel.com Cc: Doug Ledford dledford@redhat.com Cc: Jason Gunthorpe jgg@ziepe.ca Cc: Faisal Latif faisal.latif@intel.com Cc: Shiraz Saleem shiraz.saleem@intel.com Cc: Bernard Metzler bmt@zurich.ibm.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/infiniband/hw/hfi1/sdma.c | 4 ++-- drivers/infiniband/hw/i40iw/i40iw_cm.c | 10 +++++----- drivers/infiniband/sw/siw/siw_qp_tx.c | 14 +++++++------- 3 files changed, 14 insertions(+), 14 deletions(-)
diff --git a/drivers/infiniband/hw/hfi1/sdma.c b/drivers/infiniband/hw/hfi1/sdma.c index 04575c9afd61..09d206e3229a 100644 --- a/drivers/infiniband/hw/hfi1/sdma.c +++ b/drivers/infiniband/hw/hfi1/sdma.c @@ -3130,7 +3130,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx, }
if (type == SDMA_MAP_PAGE) { - kvaddr = kmap(page); + kvaddr = kmap_thread(page); kvaddr += offset; } else if (WARN_ON(!kvaddr)) { __sdma_txclean(dd, tx); @@ -3140,7 +3140,7 @@ int ext_coal_sdma_tx_descs(struct hfi1_devdata *dd, struct sdma_txreq *tx, memcpy(tx->coalesce_buf + tx->coalesce_idx, kvaddr, len); tx->coalesce_idx += len; if (type == SDMA_MAP_PAGE) - kunmap(page); + kunmap_thread(page);
/* If there is more data, return */ if (tx->tlen - tx->coalesce_idx) diff --git a/drivers/infiniband/hw/i40iw/i40iw_cm.c b/drivers/infiniband/hw/i40iw/i40iw_cm.c index a3b95805c154..122d7a5642a1 100644 --- a/drivers/infiniband/hw/i40iw/i40iw_cm.c +++ b/drivers/infiniband/hw/i40iw/i40iw_cm.c @@ -3721,7 +3721,7 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) ibmr->device = iwpd->ibpd.device; iwqp->lsmm_mr = ibmr; if (iwqp->page) - iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page); + iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page); dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp, iwqp->ietf_mem.va, (accept.size + conn_param->private_data_len), @@ -3729,12 +3729,12 @@ int i40iw_accept(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param)
} else { if (iwqp->page) - iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page); + iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page); dev->iw_priv_qp_ops->qp_send_lsmm(&iwqp->sc_qp, NULL, 0, 0); }
if (iwqp->page) - kunmap(iwqp->page); + kunmap_thread(iwqp->page);
iwqp->cm_id = cm_id; cm_node->cm_id = cm_id; @@ -4102,10 +4102,10 @@ static void i40iw_cm_event_connected(struct i40iw_cm_event *event) i40iw_cm_init_tsa_conn(iwqp, cm_node); read0 = (cm_node->send_rdma0_op == SEND_RDMA_READ_ZERO); if (iwqp->page) - iwqp->sc_qp.qp_uk.sq_base = kmap(iwqp->page); + iwqp->sc_qp.qp_uk.sq_base = kmap_thread(iwqp->page); dev->iw_priv_qp_ops->qp_send_rtt(&iwqp->sc_qp, read0); if (iwqp->page) - kunmap(iwqp->page); + kunmap_thread(iwqp->page);
memset(&attr, 0, sizeof(attr)); attr.qp_state = IB_QPS_RTS; diff --git a/drivers/infiniband/sw/siw/siw_qp_tx.c b/drivers/infiniband/sw/siw/siw_qp_tx.c index d19d8325588b..4ed37c328d02 100644 --- a/drivers/infiniband/sw/siw/siw_qp_tx.c +++ b/drivers/infiniband/sw/siw/siw_qp_tx.c @@ -76,7 +76,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr) if (unlikely(!p)) return -EFAULT;
- buffer = kmap(p); + buffer = kmap_thread(p);
if (likely(PAGE_SIZE - off >= bytes)) { memcpy(paddr, buffer + off, bytes); @@ -84,7 +84,7 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr) unsigned long part = bytes - (PAGE_SIZE - off);
memcpy(paddr, buffer + off, part); - kunmap(p); + kunmap_thread(p);
if (!mem->is_pbl) p = siw_get_upage(mem->umem, @@ -96,10 +96,10 @@ static int siw_try_1seg(struct siw_iwarp_tx *c_tx, void *paddr) if (unlikely(!p)) return -EFAULT;
- buffer = kmap(p); + buffer = kmap_thread(p); memcpy(paddr + part, buffer, bytes - part); } - kunmap(p); + kunmap_thread(p); } } return (int)bytes; @@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) page_array[seg] = p;
if (!c_tx->use_sendpage) { - iov[seg].iov_base = kmap(p) + fp_off; + iov[seg].iov_base = kmap_thread(p) + fp_off; iov[seg].iov_len = plen;
/* Remember for later kunmap() */ @@ -518,9 +518,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) plen); } else if (do_crc) { crypto_shash_update(c_tx->mpa_crc_hd, - kmap(p) + fp_off, + kmap_thread(p) + fp_off, plen); - kunmap(p); + kunmap_thread(p); } } else { u64 va = sge->laddr + sge_off;
-----ira.weiny@intel.com wrote: -----
To: "Andrew Morton" akpm@linux-foundation.org, "Thomas Gleixner" tglx@linutronix.de, "Ingo Molnar" mingo@redhat.com, "Borislav Petkov" bp@alien8.de, "Andy Lutomirski" luto@kernel.org, "Peter Zijlstra" peterz@infradead.org From: ira.weiny@intel.com Date: 10/09/2020 09:52PM Cc: "Ira Weiny" ira.weiny@intel.com, "Mike Marciniszyn" mike.marciniszyn@intel.com, "Dennis Dalessandro" dennis.dalessandro@intel.com, "Doug Ledford" dledford@redhat.com, "Jason Gunthorpe" jgg@ziepe.ca, "Faisal Latif" faisal.latif@intel.com, "Shiraz Saleem" shiraz.saleem@intel.com, "Bernard Metzler" bmt@zurich.ibm.com, x86@kernel.org, "Dave Hansen" dave.hansen@linux.intel.com, "Dan Williams" dan.j.williams@intel.com, "Fenghua Yu" fenghua.yu@intel.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-nvdimm@lists.01.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, linux-kselftest@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, kvm@vger.kernel.org, netdev@vger.kernel.org, bpf@vger.kernel.org, kexec@lists.infradead.org, linux-bcache@vger.kernel.org, linux-mtd@lists.infradead.org, devel@driverdev.osuosl.org, linux-efi@vger.kernel.org, linux-mmc@vger.kernel.org, linux-scsi@vger.kernel.org, target-devel@vger.kernel.org, linux-nfs@vger.kernel.org, ceph-devel@vger.kernel.org, linux-ext4@vger.kernel.org, linux-aio@kvack.org, io-uring@vger.kernel.org, linux-erofs@lists.ozlabs.org, linux-um@lists.infradead.org, linux-ntfs-dev@lists.sourceforge.net, reiserfs-devel@vger.kernel.org, linux-f2fs-devel@lists.sourceforge.net, linux-nilfs@vger.kernel.org, cluster-devel@redhat.com, ecryptfs@vger.kernel.org, linux-cifs@vger.kernel.org, linux-btrfs@vger.kernel.org, linux-afs@lists.infradead.org, linux-rdma@vger.kernel.org, amd-gfx@lists.freedesktop.org, dri-devel@lists.freedesktop.org, intel-gfx@lists.freedesktop.org, drbd-dev@tron.linbit.com, linux-block@vger.kernel.org, xen-devel@lists.xenproject.org, linux-cachefs@redhat.com, samba-technical@lists.samba.org, intel-wired-lan@lists.osuosl.org Subject: [EXTERNAL] [PATCH RFC PKS/PMEM 10/58] drivers/rdma: Utilize new kmap_thread()
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in these drivers are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
[snip]
@@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) page_array[seg] = p;
if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
+ iov[seg].iov_base = kmap_thread(p) + fp_off;
This misses a corresponding kunmap_thread() in siw_unmap_pages() (pls change line 403 in siw_qp_tx.c as well)
Thanks, Bernard.
iov[seg].iov_len = plen; /* Remember for later kunmap() */
@@ -518,9 +518,9 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) plen); } else if (do_crc) { crypto_shash_update(c_tx->mpa_crc_hd,
- kmap(p) + fp_off,
+ kmap_thread(p) + fp_off, plen);
- kunmap(p);
+ kunmap_thread(p); } } else { u64 va = sge->laddr + sge_off;
-- 2.28.0.rc0.12.gb6a658bd00c9
On Sat, Oct 10, 2020 at 11:36:49AM +0000, Bernard Metzler wrote:
-----ira.weiny@intel.com wrote: -----
[snip]
@@ -505,7 +505,7 @@ static int siw_tx_hdt(struct siw_iwarp_tx *c_tx, struct socket *s) page_array[seg] = p;
if (!c_tx->use_sendpage) {
- iov[seg].iov_base = kmap(p) + fp_off;
+ iov[seg].iov_base = kmap_thread(p) + fp_off;
This misses a corresponding kunmap_thread() in siw_unmap_pages() (pls change line 403 in siw_qp_tx.c as well)
Thanks, I missed that.
Done.
Ira
Thanks, Bernard.
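For context, the fix Bernard asks for lands in the siw_unmap_pages() helper in siw_qp_tx.c. The body below is an approximation of that helper, written to show where the kunmap() to kunmap_thread() substitution would go; it is not copied from the posted patch:

static void siw_unmap_pages(struct page **pp, unsigned long kmap_mask)
{
	while (kmap_mask) {
		if (kmap_mask & BIT(0))
			kunmap_thread(*pp);	/* was kunmap(*pp) */
		pp++;
		kmap_mask >>= 1;
	}
}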
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in these drivers are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: Jesse Brandeburg jesse.brandeburg@intel.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/net/ethernet/intel/igb/igb_ethtool.c | 4 ++-- drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/net/ethernet/intel/igb/igb_ethtool.c b/drivers/net/ethernet/intel/igb/igb_ethtool.c index 6e8231c1ddf0..ac9189752012 100644 --- a/drivers/net/ethernet/intel/igb/igb_ethtool.c +++ b/drivers/net/ethernet/intel/igb/igb_ethtool.c @@ -1794,14 +1794,14 @@ static int igb_check_lbtest_frame(struct igb_rx_buffer *rx_buffer,
frame_size >>= 1;
- data = kmap(rx_buffer->page); + data = kmap_thread(rx_buffer->page);
if (data[3] != 0xFF || data[frame_size + 10] != 0xBE || data[frame_size + 12] != 0xAF) match = false;
- kunmap(rx_buffer->page); + kunmap_thread(rx_buffer->page);
return match; } diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c index 71ec908266a6..7d469425f8b4 100644 --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_ethtool.c @@ -1963,14 +1963,14 @@ static bool ixgbe_check_lbtest_frame(struct ixgbe_rx_buffer *rx_buffer,
frame_size >>= 1;
- data = kmap(rx_buffer->page) + rx_buffer->page_offset; + data = kmap_thread(rx_buffer->page) + rx_buffer->page_offset;
if (data[3] != 0xFF || data[frame_size + 10] != 0xBE || data[frame_size + 12] != 0xAF) match = false;
- kunmap(rx_buffer->page); + kunmap_thread(rx_buffer->page);
return match; }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: David Howells dhowells@redhat.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/afs/dir.c | 16 ++++++++-------- fs/afs/dir_edit.c | 16 ++++++++-------- fs/afs/mntpt.c | 4 ++-- fs/afs/write.c | 4 ++-- 4 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/fs/afs/dir.c b/fs/afs/dir.c index 1d2e61e0ab04..5d01cdb590de 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -127,14 +127,14 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, qty /= sizeof(union afs_xdr_dir_block);
/* check them */ - dbuf = kmap(page); + dbuf = kmap_thread(page); for (tmp = 0; tmp < qty; tmp++) { if (dbuf->blocks[tmp].hdr.magic != AFS_DIR_MAGIC) { printk("kAFS: %s(%lx): bad magic %d/%d is %04hx\n", __func__, dvnode->vfs_inode.i_ino, tmp, qty, ntohs(dbuf->blocks[tmp].hdr.magic)); trace_afs_dir_check_failed(dvnode, off, i_size); - kunmap(page); + kunmap_thread(page); trace_afs_file_error(dvnode, -EIO, afs_file_error_dir_bad_magic); goto error; } @@ -146,7 +146,7 @@ static bool afs_dir_check_page(struct afs_vnode *dvnode, struct page *page, ((u8 *)&dbuf->blocks[tmp])[AFS_DIR_BLOCK_SIZE - 1] = 0; }
- kunmap(page); + kunmap_thread(page);
checked: afs_stat_v(dvnode, n_read_dir); @@ -177,13 +177,13 @@ static bool afs_dir_check_pages(struct afs_vnode *dvnode, struct afs_read *req) req->pos, req->index, req->nr_pages, req->offset);
for (i = 0; i < req->nr_pages; i++) { - dbuf = kmap(req->pages[i]); + dbuf = kmap_thread(req->pages[i]); for (j = 0; j < qty; j++) { union afs_xdr_dir_block *block = &dbuf->blocks[j];
pr_warn("[%02x] %32phN\n", i * qty + j, block); } - kunmap(req->pages[i]); + kunmap_thread(req->pages[i]); } return false; } @@ -481,7 +481,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
limit = blkoff & ~(PAGE_SIZE - 1);
- dbuf = kmap(page); + dbuf = kmap_thread(page);
/* deal with the individual blocks stashed on this page */ do { @@ -489,7 +489,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx, sizeof(union afs_xdr_dir_block)]; ret = afs_dir_iterate_block(dvnode, ctx, dblock, blkoff); if (ret != 1) { - kunmap(page); + kunmap_thread(page); goto out; }
@@ -497,7 +497,7 @@ static int afs_dir_iterate(struct inode *dir, struct dir_context *ctx,
} while (ctx->pos < dir->i_size && blkoff < limit);
- kunmap(page); + kunmap_thread(page); ret = 0; }
diff --git a/fs/afs/dir_edit.c b/fs/afs/dir_edit.c index b108528bf010..35ed6828e205 100644 --- a/fs/afs/dir_edit.c +++ b/fs/afs/dir_edit.c @@ -218,7 +218,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode, need_slots = round_up(12 + name->len + 1 + 4, AFS_DIR_DIRENT_SIZE); need_slots /= AFS_DIR_DIRENT_SIZE;
- meta_page = kmap(page0); + meta_page = kmap_thread(page0); meta = &meta_page->blocks[0]; if (i_size == 0) goto new_directory; @@ -247,7 +247,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode, set_page_private(page, 1); SetPagePrivate(page); } - dir_page = kmap(page); + dir_page = kmap_thread(page); }
/* Abandon the edit if we got a callback break. */ @@ -284,7 +284,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode,
if (page != page0) { unlock_page(page); - kunmap(page); + kunmap_thread(page); put_page(page); } } @@ -323,7 +323,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode, afs_set_contig_bits(block, slot, need_slots); if (page != page0) { unlock_page(page); - kunmap(page); + kunmap_thread(page); put_page(page); }
@@ -337,7 +337,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode,
out_unmap: unlock_page(page0); - kunmap(page0); + kunmap_thread(page0); put_page(page0); _leave(""); return; @@ -346,7 +346,7 @@ void afs_edit_dir_add(struct afs_vnode *vnode, trace_afs_edit_dir(vnode, why, afs_edit_dir_create_inval, 0, 0, 0, 0, name->name); clear_bit(AFS_VNODE_DIR_VALID, &vnode->flags); if (page != page0) { - kunmap(page); + kunmap_thread(page); put_page(page); } goto out_unmap; @@ -398,7 +398,7 @@ void afs_edit_dir_remove(struct afs_vnode *vnode, need_slots = round_up(12 + name->len + 1 + 4, AFS_DIR_DIRENT_SIZE); need_slots /= AFS_DIR_DIRENT_SIZE;
- meta_page = kmap(page0); + meta_page = kmap_thread(page0); meta = &meta_page->blocks[0];
/* Find a page that has sufficient slots available. Each VM page @@ -410,7 +410,7 @@ void afs_edit_dir_remove(struct afs_vnode *vnode, page = find_lock_page(vnode->vfs_inode.i_mapping, index); if (!page) goto error; - dir_page = kmap(page); + dir_page = kmap_thread(page); } else { page = page0; dir_page = meta_page; diff --git a/fs/afs/mntpt.c b/fs/afs/mntpt.c index 79bc5f1338ed..562454e2fd5c 100644 --- a/fs/afs/mntpt.c +++ b/fs/afs/mntpt.c @@ -139,11 +139,11 @@ static int afs_mntpt_set_params(struct fs_context *fc, struct dentry *mntpt) return ret; }
- buf = kmap(page); + buf = kmap_thread(page); ret = -EINVAL; if (buf[size - 1] == '.') ret = vfs_parse_fs_string(fc, "source", buf, size - 1); - kunmap(page); + kunmap_thread(page); put_page(page); if (ret < 0) return ret; diff --git a/fs/afs/write.c b/fs/afs/write.c index 4b2265cb1891..c56e5b4db4ae 100644 --- a/fs/afs/write.c +++ b/fs/afs/write.c @@ -38,9 +38,9 @@ static int afs_fill_page(struct afs_vnode *vnode, struct key *key, if (pos >= vnode->vfs_inode.i_size) { p = pos & ~PAGE_MASK; ASSERTCMP(p + len, <=, PAGE_SIZE); - data = kmap(page); + data = kmap_thread(page); memset(data + p, 0, len); - kunmap(page); + kunmap_thread(page); return 0; }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Chris Mason clm@fb.com Cc: Josef Bacik josef@toxicpanda.com Cc: David Sterba dsterba@suse.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/btrfs/check-integrity.c | 4 ++-- fs/btrfs/compression.c | 4 ++-- fs/btrfs/inode.c | 16 ++++++++-------- fs/btrfs/lzo.c | 24 ++++++++++++------------ fs/btrfs/raid56.c | 34 +++++++++++++++++----------------- fs/btrfs/reflink.c | 8 ++++---- fs/btrfs/send.c | 4 ++-- fs/btrfs/zlib.c | 32 ++++++++++++++++---------------- fs/btrfs/zstd.c | 20 ++++++++++---------- 9 files changed, 73 insertions(+), 73 deletions(-)
diff --git a/fs/btrfs/check-integrity.c b/fs/btrfs/check-integrity.c index 81a8c87a5afb..9e5a02512ab5 100644 --- a/fs/btrfs/check-integrity.c +++ b/fs/btrfs/check-integrity.c @@ -2706,7 +2706,7 @@ static void __btrfsic_submit_bio(struct bio *bio)
bio_for_each_segment(bvec, bio, iter) { BUG_ON(bvec.bv_len != PAGE_SIZE); - mapped_datav[i] = kmap(bvec.bv_page); + mapped_datav[i] = kmap_thread(bvec.bv_page); i++;
if (dev_state->state->print_mask & @@ -2720,7 +2720,7 @@ static void __btrfsic_submit_bio(struct bio *bio) bio, &bio_is_patched, bio->bi_opf); bio_for_each_segment(bvec, bio, iter) - kunmap(bvec.bv_page); + kunmap_thread(bvec.bv_page); kfree(mapped_datav); } else if (NULL != dev_state && (bio->bi_opf & REQ_PREFLUSH)) { if (dev_state->state->print_mask & diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 1ab56a734e70..5944fb36d68a 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -1626,7 +1626,7 @@ static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end, curr_sample_pos = 0; while (index < index_end) { page = find_get_page(inode->i_mapping, index); - in_data = kmap(page); + in_data = kmap_thread(page); /* Handle case where the start is not aligned to PAGE_SIZE */ i = start % PAGE_SIZE; while (i < PAGE_SIZE - SAMPLING_READ_SIZE) { @@ -1639,7 +1639,7 @@ static void heuristic_collect_sample(struct inode *inode, u64 start, u64 end, start += SAMPLING_INTERVAL; curr_sample_pos += SAMPLING_READ_SIZE; } - kunmap(page); + kunmap_thread(page); put_page(page);
index++; diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 9570458aa847..9710a52c6c42 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -4603,7 +4603,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, if (offset != blocksize) { if (!len) len = blocksize - offset; - kaddr = kmap(page); + kaddr = kmap_thread(page); if (front) memset(kaddr + (block_start - page_offset(page)), 0, offset); @@ -4611,7 +4611,7 @@ int btrfs_truncate_block(struct inode *inode, loff_t from, loff_t len, memset(kaddr + (block_start - page_offset(page)) + offset, 0, len); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); } ClearPageChecked(page); set_page_dirty(page); @@ -6509,9 +6509,9 @@ static noinline int uncompress_inline(struct btrfs_path *path, */
if (max_size + pg_offset < PAGE_SIZE) { - char *map = kmap(page); + char *map = kmap_thread(page); memset(map + pg_offset + max_size, 0, PAGE_SIZE - max_size - pg_offset); - kunmap(page); + kunmap_thread(page); } kfree(tmp); return ret; @@ -6704,7 +6704,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, goto out; } } else { - map = kmap(page); + map = kmap_thread(page); read_extent_buffer(leaf, map + pg_offset, ptr, copy_size); if (pg_offset + copy_size < PAGE_SIZE) { @@ -6712,7 +6712,7 @@ struct extent_map *btrfs_get_extent(struct btrfs_inode *inode, PAGE_SIZE - pg_offset - copy_size); } - kunmap(page); + kunmap_thread(page); } flush_dcache_page(page); } @@ -8326,10 +8326,10 @@ vm_fault_t btrfs_page_mkwrite(struct vm_fault *vmf) zero_start = PAGE_SIZE;
if (zero_start != PAGE_SIZE) { - kaddr = kmap(page); + kaddr = kmap_thread(page); memset(kaddr + zero_start, 0, PAGE_SIZE - zero_start); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); } ClearPageChecked(page); set_page_dirty(page); diff --git a/fs/btrfs/lzo.c b/fs/btrfs/lzo.c index aa9cd11f4b78..f29dcc9ec573 100644 --- a/fs/btrfs/lzo.c +++ b/fs/btrfs/lzo.c @@ -140,7 +140,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, *total_in = 0;
in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = kmap(in_page); + data_in = kmap_thread(in_page);
/* * store the size of all chunks of compressed data in @@ -151,7 +151,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, ret = -ENOMEM; goto out; } - cpage_out = kmap(out_page); + cpage_out = kmap_thread(out_page); out_offset = LZO_LEN; tot_out = LZO_LEN; pages[0] = out_page; @@ -209,7 +209,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, if (out_len == 0 && tot_in >= len) break;
- kunmap(out_page); + kunmap_thread(out_page); if (nr_pages == nr_dest_pages) { out_page = NULL; ret = -E2BIG; @@ -221,7 +221,7 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, ret = -ENOMEM; goto out; } - cpage_out = kmap(out_page); + cpage_out = kmap_thread(out_page); pages[nr_pages++] = out_page;
pg_bytes_left = PAGE_SIZE; @@ -243,12 +243,12 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, break;
bytes_left = len - tot_in; - kunmap(in_page); + kunmap_thread(in_page); put_page(in_page);
start += PAGE_SIZE; in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = kmap(in_page); + data_in = kmap_thread(in_page); in_len = min(bytes_left, PAGE_SIZE); }
@@ -258,10 +258,10 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, }
/* store the size of all chunks of compressed data */ - cpage_out = kmap(pages[0]); + cpage_out = kmap_thread(pages[0]); write_compress_length(cpage_out, tot_out);
- kunmap(pages[0]); + kunmap_thread(pages[0]);
ret = 0; *total_out = tot_out; @@ -269,10 +269,10 @@ int lzo_compress_pages(struct list_head *ws, struct address_space *mapping, out: *out_pages = nr_pages; if (out_page) - kunmap(out_page); + kunmap_thread(out_page);
if (in_page) { - kunmap(in_page); + kunmap_thread(in_page); put_page(in_page); }
@@ -305,7 +305,7 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb) u64 disk_start = cb->start; struct bio *orig_bio = cb->orig_bio;
- data_in = kmap(pages_in[0]); + data_in = kmap_thread(pages_in[0]); tot_len = read_compress_length(data_in); /* * Compressed data header check. @@ -387,7 +387,7 @@ int lzo_decompress_bio(struct list_head *ws, struct compressed_bio *cb) else kunmap(pages_in[page_in_index]);
- data_in = kmap(pages_in[++page_in_index]); + data_in = kmap_thread(pages_in[++page_in_index]);
in_page_bytes_left = PAGE_SIZE; in_offset = 0; diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c index 255490f42b5d..34e646e4548c 100644 --- a/fs/btrfs/raid56.c +++ b/fs/btrfs/raid56.c @@ -262,13 +262,13 @@ static void cache_rbio_pages(struct btrfs_raid_bio *rbio) if (!rbio->bio_pages[i]) continue;
- s = kmap(rbio->bio_pages[i]); - d = kmap(rbio->stripe_pages[i]); + s = kmap_thread(rbio->bio_pages[i]); + d = kmap_thread(rbio->stripe_pages[i]);
copy_page(d, s);
- kunmap(rbio->bio_pages[i]); - kunmap(rbio->stripe_pages[i]); + kunmap_thread(rbio->bio_pages[i]); + kunmap_thread(rbio->stripe_pages[i]); SetPageUptodate(rbio->stripe_pages[i]); } set_bit(RBIO_CACHE_READY_BIT, &rbio->flags); @@ -1241,13 +1241,13 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) /* first collect one page from each data stripe */ for (stripe = 0; stripe < nr_data; stripe++) { p = page_in_rbio(rbio, stripe, pagenr, 0); - pointers[stripe] = kmap(p); + pointers[stripe] = kmap_thread(p); }
/* then add the parity stripe */ p = rbio_pstripe_page(rbio, pagenr); SetPageUptodate(p); - pointers[stripe++] = kmap(p); + pointers[stripe++] = kmap_thread(p);
if (has_qstripe) {
@@ -1257,7 +1257,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio) */ p = rbio_qstripe_page(rbio, pagenr); SetPageUptodate(p); - pointers[stripe++] = kmap(p); + pointers[stripe++] = kmap_thread(p);
raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE, pointers); @@ -1269,7 +1269,7 @@ static noinline void finish_rmw(struct btrfs_raid_bio *rbio)
for (stripe = 0; stripe < rbio->real_stripes; stripe++) - kunmap(page_in_rbio(rbio, stripe, pagenr, 0)); + kunmap_thread(page_in_rbio(rbio, stripe, pagenr, 0)); }
/* @@ -1835,7 +1835,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio) } else { page = rbio_stripe_page(rbio, stripe, pagenr); } - pointers[stripe] = kmap(page); + pointers[stripe] = kmap_thread(page); }
/* all raid6 handling here */ @@ -1940,7 +1940,7 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio) } else { page = rbio_stripe_page(rbio, stripe, pagenr); } - kunmap(page); + kunmap_thread(page); } }
@@ -2379,18 +2379,18 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio, /* first collect one page from each data stripe */ for (stripe = 0; stripe < nr_data; stripe++) { p = page_in_rbio(rbio, stripe, pagenr, 0); - pointers[stripe] = kmap(p); + pointers[stripe] = kmap_thread(p); }
/* then add the parity stripe */ - pointers[stripe++] = kmap(p_page); + pointers[stripe++] = kmap_thread(p_page);
if (has_qstripe) { /* * raid6, add the qstripe and call the * library function to fill in our p/q */ - pointers[stripe++] = kmap(q_page); + pointers[stripe++] = kmap_thread(q_page);
raid6_call.gen_syndrome(rbio->real_stripes, PAGE_SIZE, pointers); @@ -2402,17 +2402,17 @@ static noinline void finish_parity_scrub(struct btrfs_raid_bio *rbio,
/* Check scrubbing parity and repair it */ p = rbio_stripe_page(rbio, rbio->scrubp, pagenr); - parity = kmap(p); + parity = kmap_thread(p); if (memcmp(parity, pointers[rbio->scrubp], PAGE_SIZE)) copy_page(parity, pointers[rbio->scrubp]); else /* Parity is right, needn't writeback */ bitmap_clear(rbio->dbitmap, pagenr, 1); - kunmap(p); + kunmap_thread(p);
for (stripe = 0; stripe < nr_data; stripe++) - kunmap(page_in_rbio(rbio, stripe, pagenr, 0)); - kunmap(p_page); + kunmap_thread(page_in_rbio(rbio, stripe, pagenr, 0)); + kunmap_thread(p_page); }
__free_page(p_page); diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c index 5cd02514cf4d..10e53d7eba8c 100644 --- a/fs/btrfs/reflink.c +++ b/fs/btrfs/reflink.c @@ -92,10 +92,10 @@ static int copy_inline_to_page(struct inode *inode, if (comp_type == BTRFS_COMPRESS_NONE) { char *map;
- map = kmap(page); + map = kmap_thread(page); memcpy(map, data_start, datal); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); } else { ret = btrfs_decompress(comp_type, data_start, page, 0, inline_size, datal); @@ -119,10 +119,10 @@ static int copy_inline_to_page(struct inode *inode, if (datal < block_size) { char *map;
- map = kmap(page); + map = kmap_thread(page); memset(map + datal, 0, block_size - datal); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); }
SetPageUptodate(page); diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c index d9813a5b075a..06c383d3dc43 100644 --- a/fs/btrfs/send.c +++ b/fs/btrfs/send.c @@ -4863,9 +4863,9 @@ static ssize_t fill_read_buf(struct send_ctx *sctx, u64 offset, u32 len) } }
- addr = kmap(page); + addr = kmap_thread(page); memcpy(sctx->read_buf + ret, addr + pg_offset, cur_len); - kunmap(page); + kunmap_thread(page); unlock_page(page); put_page(page); index++; diff --git a/fs/btrfs/zlib.c b/fs/btrfs/zlib.c index 05615a1099db..45b7a907bab3 100644 --- a/fs/btrfs/zlib.c +++ b/fs/btrfs/zlib.c @@ -126,7 +126,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, ret = -ENOMEM; goto out; } - cpage_out = kmap(out_page); + cpage_out = kmap_thread(out_page); pages[0] = out_page; nr_pages = 1;
@@ -149,12 +149,12 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping,
for (i = 0; i < in_buf_pages; i++) { if (in_page) { - kunmap(in_page); + kunmap_thread(in_page); put_page(in_page); } in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = kmap(in_page); + data_in = kmap_thread(in_page); memcpy(workspace->buf + i * PAGE_SIZE, data_in, PAGE_SIZE); start += PAGE_SIZE; @@ -162,12 +162,12 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, workspace->strm.next_in = workspace->buf; } else { if (in_page) { - kunmap(in_page); + kunmap_thread(in_page); put_page(in_page); } in_page = find_get_page(mapping, start >> PAGE_SHIFT); - data_in = kmap(in_page); + data_in = kmap_thread(in_page); start += PAGE_SIZE; workspace->strm.next_in = data_in; } @@ -196,7 +196,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, * the stream end if required */ if (workspace->strm.avail_out == 0) { - kunmap(out_page); + kunmap_thread(out_page); if (nr_pages == nr_dest_pages) { out_page = NULL; ret = -E2BIG; @@ -207,7 +207,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, ret = -ENOMEM; goto out; } - cpage_out = kmap(out_page); + cpage_out = kmap_thread(out_page); pages[nr_pages] = out_page; nr_pages++; workspace->strm.avail_out = PAGE_SIZE; @@ -234,7 +234,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, goto out; } else if (workspace->strm.avail_out == 0) { /* get another page for the stream end */ - kunmap(out_page); + kunmap_thread(out_page); if (nr_pages == nr_dest_pages) { out_page = NULL; ret = -E2BIG; @@ -245,7 +245,7 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, ret = -ENOMEM; goto out; } - cpage_out = kmap(out_page); + cpage_out = kmap_thread(out_page); pages[nr_pages] = out_page; nr_pages++; workspace->strm.avail_out = PAGE_SIZE; @@ -265,10 +265,10 @@ int zlib_compress_pages(struct list_head *ws, struct address_space *mapping, out: *out_pages = nr_pages; if (out_page) - kunmap(out_page); + kunmap_thread(out_page);
if (in_page) { - kunmap(in_page); + kunmap_thread(in_page); put_page(in_page); } return ret; @@ -289,7 +289,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb) u64 disk_start = cb->start; struct bio *orig_bio = cb->orig_bio;
- data_in = kmap(pages_in[page_in_index]); + data_in = kmap_thread(pages_in[page_in_index]); workspace->strm.next_in = data_in; workspace->strm.avail_in = min_t(size_t, srclen, PAGE_SIZE); workspace->strm.total_in = 0; @@ -311,7 +311,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
if (Z_OK != zlib_inflateInit2(&workspace->strm, wbits)) { pr_warn("BTRFS: inflateInit failed\n"); - kunmap(pages_in[page_in_index]); + kunmap_thread(pages_in[page_in_index]); return -EIO; } while (workspace->strm.total_in < srclen) { @@ -339,13 +339,13 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb)
if (workspace->strm.avail_in == 0) { unsigned long tmp; - kunmap(pages_in[page_in_index]); + kunmap_thread(pages_in[page_in_index]); page_in_index++; if (page_in_index >= total_pages_in) { data_in = NULL; break; } - data_in = kmap(pages_in[page_in_index]); + data_in = kmap_thread(pages_in[page_in_index]); workspace->strm.next_in = data_in; tmp = srclen - workspace->strm.total_in; workspace->strm.avail_in = min(tmp, @@ -359,7 +359,7 @@ int zlib_decompress_bio(struct list_head *ws, struct compressed_bio *cb) done: zlib_inflateEnd(&workspace->strm); if (data_in) - kunmap(pages_in[page_in_index]); + kunmap_thread(pages_in[page_in_index]); if (!ret) zero_fill_bio(orig_bio); return ret; diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c index 9a4871636c6c..48e03f6dcef7 100644 --- a/fs/btrfs/zstd.c +++ b/fs/btrfs/zstd.c @@ -399,7 +399,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping,
/* map in the first page of input data */ in_page = find_get_page(mapping, start >> PAGE_SHIFT); - workspace->in_buf.src = kmap(in_page); + workspace->in_buf.src = kmap_thread(in_page); workspace->in_buf.pos = 0; workspace->in_buf.size = min_t(size_t, len, PAGE_SIZE);
@@ -411,7 +411,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping, goto out; } pages[nr_pages++] = out_page; - workspace->out_buf.dst = kmap(out_page); + workspace->out_buf.dst = kmap_thread(out_page); workspace->out_buf.pos = 0; workspace->out_buf.size = min_t(size_t, max_out, PAGE_SIZE);
@@ -446,7 +446,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping, if (workspace->out_buf.pos == workspace->out_buf.size) { tot_out += PAGE_SIZE; max_out -= PAGE_SIZE; - kunmap(out_page); + kunmap_thread(out_page); if (nr_pages == nr_dest_pages) { out_page = NULL; ret = -E2BIG; @@ -458,7 +458,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping, goto out; } pages[nr_pages++] = out_page; - workspace->out_buf.dst = kmap(out_page); + workspace->out_buf.dst = kmap_thread(out_page); workspace->out_buf.pos = 0; workspace->out_buf.size = min_t(size_t, max_out, PAGE_SIZE); @@ -479,7 +479,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping, start += PAGE_SIZE; len -= PAGE_SIZE; in_page = find_get_page(mapping, start >> PAGE_SHIFT); - workspace->in_buf.src = kmap(in_page); + workspace->in_buf.src = kmap_thread(in_page); workspace->in_buf.pos = 0; workspace->in_buf.size = min_t(size_t, len, PAGE_SIZE); } @@ -518,7 +518,7 @@ int zstd_compress_pages(struct list_head *ws, struct address_space *mapping, goto out; } pages[nr_pages++] = out_page; - workspace->out_buf.dst = kmap(out_page); + workspace->out_buf.dst = kmap_thread(out_page); workspace->out_buf.pos = 0; workspace->out_buf.size = min_t(size_t, max_out, PAGE_SIZE); } @@ -565,7 +565,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb) goto done; }
- workspace->in_buf.src = kmap(pages_in[page_in_index]); + workspace->in_buf.src = kmap_thread(pages_in[page_in_index]); workspace->in_buf.pos = 0; workspace->in_buf.size = min_t(size_t, srclen, PAGE_SIZE);
@@ -601,14 +601,14 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb) break;
if (workspace->in_buf.pos == workspace->in_buf.size) { - kunmap(pages_in[page_in_index++]); + kunmap_thread(pages_in[page_in_index++]); if (page_in_index >= total_pages_in) { workspace->in_buf.src = NULL; ret = -EIO; goto done; } srclen -= PAGE_SIZE; - workspace->in_buf.src = kmap(pages_in[page_in_index]); + workspace->in_buf.src = kmap_thread(pages_in[page_in_index]); workspace->in_buf.pos = 0; workspace->in_buf.size = min_t(size_t, srclen, PAGE_SIZE); } @@ -617,7 +617,7 @@ int zstd_decompress_bio(struct list_head *ws, struct compressed_bio *cb) zero_fill_bio(orig_bio); done: if (workspace->in_buf.src) - kunmap(pages_in[page_in_index]); + kunmap_thread(pages_in[page_in_index]); return ret; }
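The compression paths above all share a streaming shape: only one input page is mapped at a time, and the previous mapping is dropped before the next page is taken. A condensed, illustrative version of that loop step (helper name and signature invented for the example, not from the posted patch) looks like this:

#include <linux/highmem.h>
#include <linux/pagemap.h>

/* Illustrative helper only. */
static void *example_next_input_page(struct address_space *mapping,
				     struct page **in_page, u64 *start)
{
	if (*in_page) {
		/* Drop the previous thread-local mapping before moving on. */
		kunmap_thread(*in_page);
		put_page(*in_page);
	}
	*in_page = find_get_page(mapping, *start >> PAGE_SHIFT);
	*start += PAGE_SIZE;
	return *in_page ? kmap_thread(*in_page) : NULL;
}

Keeping at most one input page mapped per thread is what makes the thread-local conversion safe for these loops.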
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Steve French sfrench@samba.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/cifs/cifsencrypt.c | 6 +++--- fs/cifs/file.c | 16 ++++++++-------- fs/cifs/smb2ops.c | 8 ++++---- 3 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/fs/cifs/cifsencrypt.c b/fs/cifs/cifsencrypt.c index 9daa256f69d4..2f8232d01a56 100644 --- a/fs/cifs/cifsencrypt.c +++ b/fs/cifs/cifsencrypt.c @@ -82,17 +82,17 @@ int __cifs_calc_signature(struct smb_rqst *rqst,
rqst_page_get_length(rqst, i, &len, &offset);
- kaddr = (char *) kmap(rqst->rq_pages[i]) + offset; + kaddr = (char *) kmap_thread(rqst->rq_pages[i]) + offset;
rc = crypto_shash_update(shash, kaddr, len); if (rc) { cifs_dbg(VFS, "%s: Could not update with payload\n", __func__); - kunmap(rqst->rq_pages[i]); + kunmap_thread(rqst->rq_pages[i]); return rc; }
- kunmap(rqst->rq_pages[i]); + kunmap_thread(rqst->rq_pages[i]); }
rc = crypto_shash_final(shash, signature); diff --git a/fs/cifs/file.c b/fs/cifs/file.c index be46fab4c96d..6db2caab8852 100644 --- a/fs/cifs/file.c +++ b/fs/cifs/file.c @@ -2145,17 +2145,17 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) inode = page->mapping->host;
offset += (loff_t)from; - write_data = kmap(page); + write_data = kmap_thread(page); write_data += from;
if ((to > PAGE_SIZE) || (from > to)) { - kunmap(page); + kunmap_thread(page); return -EIO; }
/* racing with truncate? */ if (offset > mapping->host->i_size) { - kunmap(page); + kunmap_thread(page); return 0; /* don't care */ }
@@ -2183,7 +2183,7 @@ static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to) rc = -EIO; }
- kunmap(page); + kunmap_thread(page); return rc; }
@@ -2559,10 +2559,10 @@ static int cifs_write_end(struct file *file, struct address_space *mapping, known which we might as well leverage */ /* BB check if anything else missing out of ppw such as updating last write time */ - page_data = kmap(page); + page_data = kmap_thread(page); rc = cifs_write(cfile, pid, page_data + offset, copied, &pos); /* if (rc < 0) should we set writebehind rc? */ - kunmap(page); + kunmap_thread(page);
free_xid(xid); } else { @@ -4511,7 +4511,7 @@ static int cifs_readpage_worker(struct file *file, struct page *page, if (rc == 0) goto read_complete;
- read_data = kmap(page); + read_data = kmap_thread(page); /* for reads over a certain size could initiate async read ahead */
rc = cifs_read(file, read_data, PAGE_SIZE, poffset); @@ -4540,7 +4540,7 @@ static int cifs_readpage_worker(struct file *file, struct page *page, rc = 0;
io_error: - kunmap(page); + kunmap_thread(page); unlock_page(page);
read_complete: diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c index 32f90dc82c84..a3e7ebab38b6 100644 --- a/fs/cifs/smb2ops.c +++ b/fs/cifs/smb2ops.c @@ -4068,12 +4068,12 @@ smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst,
rqst_page_get_length(&new_rq[i], j, &len, &offset);
- dst = (char *) kmap(new_rq[i].rq_pages[j]) + offset; - src = (char *) kmap(old_rq[i - 1].rq_pages[j]) + offset; + dst = (char *) kmap_thread(new_rq[i].rq_pages[j]) + offset; + src = (char *) kmap_thread(old_rq[i - 1].rq_pages[j]) + offset;
memcpy(dst, src, len); - kunmap(new_rq[i].rq_pages[j]); - kunmap(old_rq[i - 1].rq_pages[j]); + kunmap_thread(new_rq[i].rq_pages[j]); + kunmap_thread(old_rq[i - 1].rq_pages[j]); } }
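One wrinkle in the hunks above: smb3_init_transform_rq() keeps two thread-local mappings live at once (source and destination), so both must be released on every exit path. A small illustrative sketch (names invented for the example, not from the posted patch):

#include <linux/highmem.h>
#include <linux/string.h>

/* Illustrative only. */
static void example_copy_between_pages(struct page *dst_page,
				       struct page *src_page,
				       unsigned int offset, unsigned int len)
{
	char *dst = (char *)kmap_thread(dst_page) + offset;
	char *src = (char *)kmap_thread(src_page) + offset;

	memcpy(dst, src, len);

	kunmap_thread(src_page);
	kunmap_thread(dst_page);
}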
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Herbert Xu herbert@gondor.apana.org.au Cc: Eric Biggers ebiggers@google.com Cc: Aditya Pakki pakki001@umn.edu Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/ecryptfs/crypto.c | 8 ++++---- fs/ecryptfs/read_write.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c index 0681540c48d9..e73e00994bee 100644 --- a/fs/ecryptfs/crypto.c +++ b/fs/ecryptfs/crypto.c @@ -469,10 +469,10 @@ int ecryptfs_encrypt_page(struct page *page) }
lower_offset = lower_offset_for_page(crypt_stat, page); - enc_extent_virt = kmap(enc_extent_page); + enc_extent_virt = kmap_thread(enc_extent_page); rc = ecryptfs_write_lower(ecryptfs_inode, enc_extent_virt, lower_offset, PAGE_SIZE); - kunmap(enc_extent_page); + kunmap_thread(enc_extent_page); if (rc < 0) { ecryptfs_printk(KERN_ERR, "Error attempting to write lower page; rc = [%d]\n", @@ -518,10 +518,10 @@ int ecryptfs_decrypt_page(struct page *page) BUG_ON(!(crypt_stat->flags & ECRYPTFS_ENCRYPTED));
lower_offset = lower_offset_for_page(crypt_stat, page); - page_virt = kmap(page); + page_virt = kmap_thread(page); rc = ecryptfs_read_lower(page_virt, lower_offset, PAGE_SIZE, ecryptfs_inode); - kunmap(page); + kunmap_thread(page); if (rc < 0) { ecryptfs_printk(KERN_ERR, "Error attempting to read lower page; rc = [%d]\n", diff --git a/fs/ecryptfs/read_write.c b/fs/ecryptfs/read_write.c index 0438997ac9d8..5eca4330c0c0 100644 --- a/fs/ecryptfs/read_write.c +++ b/fs/ecryptfs/read_write.c @@ -64,11 +64,11 @@ int ecryptfs_write_lower_page_segment(struct inode *ecryptfs_inode,
offset = ((((loff_t)page_for_lower->index) << PAGE_SHIFT) + offset_in_page); - virt = kmap(page_for_lower); + virt = kmap_thread(page_for_lower); rc = ecryptfs_write_lower(ecryptfs_inode, virt, offset, size); if (rc > 0) rc = 0; - kunmap(page_for_lower); + kunmap_thread(page_for_lower); return rc; }
@@ -251,11 +251,11 @@ int ecryptfs_read_lower_page_segment(struct page *page_for_ecryptfs, int rc;
offset = ((((loff_t)page_index) << PAGE_SHIFT) + offset_in_page); - virt = kmap(page_for_ecryptfs); + virt = kmap_thread(page_for_ecryptfs); rc = ecryptfs_read_lower(virt, offset, size, ecryptfs_inode); if (rc > 0) rc = 0; - kunmap(page_for_ecryptfs); + kunmap_thread(page_for_ecryptfs); flush_dcache_page(page_for_ecryptfs); return rc; }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Bob Peterson rpeterso@redhat.com Cc: Andreas Gruenbacher agruenba@redhat.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/gfs2/bmap.c | 4 ++-- fs/gfs2/ops_fstype.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/gfs2/bmap.c b/fs/gfs2/bmap.c index 0f69fbd4af66..375af4528411 100644 --- a/fs/gfs2/bmap.c +++ b/fs/gfs2/bmap.c @@ -67,7 +67,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh, }
if (!PageUptodate(page)) { - void *kaddr = kmap(page); + void *kaddr = kmap_thread(page); u64 dsize = i_size_read(inode);
if (dsize > gfs2_max_stuffed_size(ip)) @@ -75,7 +75,7 @@ static int gfs2_unstuffer_page(struct gfs2_inode *ip, struct buffer_head *dibh,
memcpy(kaddr, dibh->b_data + sizeof(struct gfs2_dinode), dsize); memset(kaddr + dsize, 0, PAGE_SIZE - dsize); - kunmap(page); + kunmap_thread(page);
SetPageUptodate(page); } diff --git a/fs/gfs2/ops_fstype.c b/fs/gfs2/ops_fstype.c index 6d18d2c91add..a5d20d9b504a 100644 --- a/fs/gfs2/ops_fstype.c +++ b/fs/gfs2/ops_fstype.c @@ -263,9 +263,9 @@ static int gfs2_read_super(struct gfs2_sbd *sdp, sector_t sector, int silent) __free_page(page); return -EIO; } - p = kmap(page); + p = kmap_thread(page); gfs2_sb_in(sdp, p); - kunmap(page); + kunmap_thread(page); __free_page(page); return gfs2_check_sb(sdp, silent); }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Ryusuke Konishi konishi.ryusuke@gmail.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/nilfs2/alloc.c | 34 +++++++++++++++++----------------- fs/nilfs2/cpfile.c | 4 ++-- 2 files changed, 19 insertions(+), 19 deletions(-)
diff --git a/fs/nilfs2/alloc.c b/fs/nilfs2/alloc.c index adf3bb0a8048..2aa4c34094ef 100644 --- a/fs/nilfs2/alloc.c +++ b/fs/nilfs2/alloc.c @@ -524,7 +524,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, ret = nilfs_palloc_get_desc_block(inode, group, 1, &desc_bh); if (ret < 0) return ret; - desc_kaddr = kmap(desc_bh->b_page); + desc_kaddr = kmap_thread(desc_bh->b_page); desc = nilfs_palloc_block_get_group_desc( inode, group, desc_bh, desc_kaddr); n = nilfs_palloc_rest_groups_in_desc_block(inode, group, @@ -536,7 +536,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, inode, group, 1, &bitmap_bh); if (ret < 0) goto out_desc; - bitmap_kaddr = kmap(bitmap_bh->b_page); + bitmap_kaddr = kmap_thread(bitmap_bh->b_page); bitmap = bitmap_kaddr + bh_offset(bitmap_bh); pos = nilfs_palloc_find_available_slot( bitmap, group_offset, @@ -547,21 +547,21 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, desc, lock, -1); req->pr_entry_nr = entries_per_group * group + pos; - kunmap(desc_bh->b_page); - kunmap(bitmap_bh->b_page); + kunmap_thread(desc_bh->b_page); + kunmap_thread(bitmap_bh->b_page);
req->pr_desc_bh = desc_bh; req->pr_bitmap_bh = bitmap_bh; return 0; } - kunmap(bitmap_bh->b_page); + kunmap_thread(bitmap_bh->b_page); brelse(bitmap_bh); }
group_offset = 0; }
- kunmap(desc_bh->b_page); + kunmap_thread(desc_bh->b_page); brelse(desc_bh); }
@@ -569,7 +569,7 @@ int nilfs_palloc_prepare_alloc_entry(struct inode *inode, return -ENOSPC;
out_desc: - kunmap(desc_bh->b_page); + kunmap_thread(desc_bh->b_page); brelse(desc_bh); return ret; } @@ -605,10 +605,10 @@ void nilfs_palloc_commit_free_entry(struct inode *inode, spinlock_t *lock;
group = nilfs_palloc_group(inode, req->pr_entry_nr, &group_offset); - desc_kaddr = kmap(req->pr_desc_bh->b_page); + desc_kaddr = kmap_thread(req->pr_desc_bh->b_page); desc = nilfs_palloc_block_get_group_desc(inode, group, req->pr_desc_bh, desc_kaddr); - bitmap_kaddr = kmap(req->pr_bitmap_bh->b_page); + bitmap_kaddr = kmap_thread(req->pr_bitmap_bh->b_page); bitmap = bitmap_kaddr + bh_offset(req->pr_bitmap_bh); lock = nilfs_mdt_bgl_lock(inode, group);
@@ -620,8 +620,8 @@ void nilfs_palloc_commit_free_entry(struct inode *inode, else nilfs_palloc_group_desc_add_entries(desc, lock, 1);
- kunmap(req->pr_bitmap_bh->b_page); - kunmap(req->pr_desc_bh->b_page); + kunmap_thread(req->pr_bitmap_bh->b_page); + kunmap_thread(req->pr_desc_bh->b_page);
mark_buffer_dirty(req->pr_desc_bh); mark_buffer_dirty(req->pr_bitmap_bh); @@ -646,10 +646,10 @@ void nilfs_palloc_abort_alloc_entry(struct inode *inode, spinlock_t *lock;
group = nilfs_palloc_group(inode, req->pr_entry_nr, &group_offset); - desc_kaddr = kmap(req->pr_desc_bh->b_page); + desc_kaddr = kmap_thread(req->pr_desc_bh->b_page); desc = nilfs_palloc_block_get_group_desc(inode, group, req->pr_desc_bh, desc_kaddr); - bitmap_kaddr = kmap(req->pr_bitmap_bh->b_page); + bitmap_kaddr = kmap_thread(req->pr_bitmap_bh->b_page); bitmap = bitmap_kaddr + bh_offset(req->pr_bitmap_bh); lock = nilfs_mdt_bgl_lock(inode, group);
@@ -661,8 +661,8 @@ void nilfs_palloc_abort_alloc_entry(struct inode *inode, else nilfs_palloc_group_desc_add_entries(desc, lock, 1);
- kunmap(req->pr_bitmap_bh->b_page); - kunmap(req->pr_desc_bh->b_page); + kunmap_thread(req->pr_bitmap_bh->b_page); + kunmap_thread(req->pr_desc_bh->b_page);
brelse(req->pr_bitmap_bh); brelse(req->pr_desc_bh); @@ -754,7 +754,7 @@ int nilfs_palloc_freev(struct inode *inode, __u64 *entry_nrs, size_t nitems) /* Get the first entry number of the group */ group_min_nr = (__u64)group * epg;
- bitmap_kaddr = kmap(bitmap_bh->b_page); + bitmap_kaddr = kmap_thread(bitmap_bh->b_page); bitmap = bitmap_kaddr + bh_offset(bitmap_bh); lock = nilfs_mdt_bgl_lock(inode, group);
@@ -800,7 +800,7 @@ int nilfs_palloc_freev(struct inode *inode, __u64 *entry_nrs, size_t nitems) entry_start = rounddown(group_offset, epb); } while (true);
- kunmap(bitmap_bh->b_page); + kunmap_thread(bitmap_bh->b_page); mark_buffer_dirty(bitmap_bh); brelse(bitmap_bh);
diff --git a/fs/nilfs2/cpfile.c b/fs/nilfs2/cpfile.c index 86d4d850d130..402ab8bfce29 100644 --- a/fs/nilfs2/cpfile.c +++ b/fs/nilfs2/cpfile.c @@ -235,11 +235,11 @@ int nilfs_cpfile_get_checkpoint(struct inode *cpfile, ret = nilfs_cpfile_get_checkpoint_block(cpfile, cno, create, &cp_bh); if (ret < 0) goto out_header; - kaddr = kmap(cp_bh->b_page); + kaddr = kmap_thread(cp_bh->b_page); cp = nilfs_cpfile_block_get_checkpoint(cpfile, cno, cp_bh, kaddr); if (nilfs_checkpoint_invalid(cp)) { if (!create) { - kunmap(cp_bh->b_page); + kunmap_thread(cp_bh->b_page); brelse(cp_bh); ret = -ENOENT; goto out_header;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/hfs/bnode.c | 14 +++++++------- fs/hfs/btree.c | 20 ++++++++++---------- 2 files changed, 17 insertions(+), 17 deletions(-)
diff --git a/fs/hfs/bnode.c b/fs/hfs/bnode.c index b63a4df7327b..8b4d02576405 100644 --- a/fs/hfs/bnode.c +++ b/fs/hfs/bnode.c @@ -23,8 +23,8 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, off += node->page_offset; page = node->page[0];
- memcpy(buf, kmap(page) + off, len); - kunmap(page); + memcpy(buf, kmap_thread(page) + off, len); + kunmap_thread(page); }
u16 hfs_bnode_read_u16(struct hfs_bnode *node, int off) @@ -108,9 +108,9 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst, src_page = src_node->page[0]; dst_page = dst_node->page[0];
- memcpy(kmap(dst_page) + dst, kmap(src_page) + src, len); - kunmap(src_page); - kunmap(dst_page); + memcpy(kmap_thread(dst_page) + dst, kmap_thread(src_page) + src, len); + kunmap_thread(src_page); + kunmap_thread(dst_page); set_page_dirty(dst_page); }
@@ -125,9 +125,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len) src += node->page_offset; dst += node->page_offset; page = node->page[0]; - ptr = kmap(page); + ptr = kmap_thread(page); memmove(ptr + dst, ptr + src, len); - kunmap(page); + kunmap_thread(page); set_page_dirty(page); }
diff --git a/fs/hfs/btree.c b/fs/hfs/btree.c index 19017d296173..bd4a6d35e361 100644 --- a/fs/hfs/btree.c +++ b/fs/hfs/btree.c @@ -80,7 +80,7 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id, btree_keycmp ke goto free_inode;
/* Load the header */ - head = (struct hfs_btree_header_rec *)(kmap(page) + sizeof(struct hfs_bnode_desc)); + head = (struct hfs_btree_header_rec *)(kmap_thread(page) + sizeof(struct hfs_bnode_desc)); tree->root = be32_to_cpu(head->root); tree->leaf_count = be32_to_cpu(head->leaf_count); tree->leaf_head = be32_to_cpu(head->leaf_head); @@ -119,7 +119,7 @@ struct hfs_btree *hfs_btree_open(struct super_block *sb, u32 id, btree_keycmp ke tree->node_size_shift = ffs(size) - 1; tree->pages_per_bnode = (tree->node_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
- kunmap(page); + kunmap_thread(page); put_page(page); return tree;
@@ -268,7 +268,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
off += node->page_offset; pagep = node->page + (off >> PAGE_SHIFT); - data = kmap(*pagep); + data = kmap_thread(*pagep); off &= ~PAGE_MASK; idx = 0;
@@ -281,7 +281,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) idx += i; data[off] |= m; set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); tree->free_nodes--; mark_inode_dirty(tree->inode); hfs_bnode_put(node); @@ -290,14 +290,14 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) } } if (++off >= PAGE_SIZE) { - kunmap(*pagep); - data = kmap(*++pagep); + kunmap_thread(*pagep); + data = kmap_thread(*++pagep); off = 0; } idx += 8; len--; } - kunmap(*pagep); + kunmap_thread(*pagep); nidx = node->next; if (!nidx) { printk(KERN_DEBUG "create new bmap node...\n"); @@ -313,7 +313,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) off = off16; off += node->page_offset; pagep = node->page + (off >> PAGE_SHIFT); - data = kmap(*pagep); + data = kmap_thread(*pagep); off &= ~PAGE_MASK; } } @@ -360,7 +360,7 @@ void hfs_bmap_free(struct hfs_bnode *node) } off += node->page_offset + nidx / 8; page = node->page[off >> PAGE_SHIFT]; - data = kmap(page); + data = kmap_thread(page); off &= ~PAGE_MASK; m = 1 << (~nidx & 7); byte = data[off]; @@ -373,7 +373,7 @@ void hfs_bmap_free(struct hfs_bnode *node) } data[off] = byte & ~m; set_page_dirty(page); - kunmap(page); + kunmap_thread(page); hfs_bnode_put(node); tree->free_nodes++; mark_inode_dirty(tree->inode);
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/hfsplus/bitmap.c | 20 ++++----- fs/hfsplus/bnode.c | 102 ++++++++++++++++++++++---------------------- fs/hfsplus/btree.c | 18 ++++---- 3 files changed, 70 insertions(+), 70 deletions(-)
diff --git a/fs/hfsplus/bitmap.c b/fs/hfsplus/bitmap.c index cebce0cfe340..9ec7c1559a0c 100644 --- a/fs/hfsplus/bitmap.c +++ b/fs/hfsplus/bitmap.c @@ -39,7 +39,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, start = size; goto out; } - pptr = kmap(page); + pptr = kmap_thread(page); curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32; i = offset % 32; offset &= ~(PAGE_CACHE_BITS - 1); @@ -74,7 +74,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, } curr++; } - kunmap(page); + kunmap_thread(page); offset += PAGE_CACHE_BITS; if (offset >= size) break; @@ -84,7 +84,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, start = size; goto out; } - curr = pptr = kmap(page); + curr = pptr = kmap_thread(page); if ((size ^ offset) / PAGE_CACHE_BITS) end = pptr + PAGE_CACHE_BITS / 32; else @@ -127,7 +127,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, len -= 32; } set_page_dirty(page); - kunmap(page); + kunmap_thread(page); offset += PAGE_CACHE_BITS; page = read_mapping_page(mapping, offset / PAGE_CACHE_BITS, NULL); @@ -135,7 +135,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, start = size; goto out; } - pptr = kmap(page); + pptr = kmap_thread(page); curr = pptr; end = pptr + PAGE_CACHE_BITS / 32; } @@ -151,7 +151,7 @@ int hfsplus_block_allocate(struct super_block *sb, u32 size, done: *curr = cpu_to_be32(n); set_page_dirty(page); - kunmap(page); + kunmap_thread(page); *max = offset + (curr - pptr) * 32 + i - start; sbi->free_blocks -= *max; hfsplus_mark_mdb_dirty(sb); @@ -185,7 +185,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count) page = read_mapping_page(mapping, pnr, NULL); if (IS_ERR(page)) goto kaboom; - pptr = kmap(page); + pptr = kmap_thread(page); curr = pptr + (offset & (PAGE_CACHE_BITS - 1)) / 32; end = pptr + PAGE_CACHE_BITS / 32; len = count; @@ -215,11 +215,11 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count) if (!count) break; set_page_dirty(page); - kunmap(page); + kunmap_thread(page); page = read_mapping_page(mapping, ++pnr, NULL); if (IS_ERR(page)) goto kaboom; - pptr = kmap(page); + pptr = kmap_thread(page); curr = pptr; end = pptr + PAGE_CACHE_BITS / 32; } @@ -231,7 +231,7 @@ int hfsplus_block_free(struct super_block *sb, u32 offset, u32 count) } out: set_page_dirty(page); - kunmap(page); + kunmap_thread(page); sbi->free_blocks += len; hfsplus_mark_mdb_dirty(sb); mutex_unlock(&sbi->alloc_mutex); diff --git a/fs/hfsplus/bnode.c b/fs/hfsplus/bnode.c index 177fae4e6581..62757d92fbbd 100644 --- a/fs/hfsplus/bnode.c +++ b/fs/hfsplus/bnode.c @@ -29,14 +29,14 @@ void hfs_bnode_read(struct hfs_bnode *node, void *buf, int off, int len) off &= ~PAGE_MASK;
l = min_t(int, len, PAGE_SIZE - off); - memcpy(buf, kmap(*pagep) + off, l); - kunmap(*pagep); + memcpy(buf, kmap_thread(*pagep) + off, l); + kunmap_thread(*pagep);
while ((len -= l) != 0) { buf += l; l = min_t(int, len, PAGE_SIZE); - memcpy(buf, kmap(*++pagep), l); - kunmap(*pagep); + memcpy(buf, kmap_thread(*++pagep), l); + kunmap_thread(*pagep); } }
@@ -82,16 +82,16 @@ void hfs_bnode_write(struct hfs_bnode *node, void *buf, int off, int len) off &= ~PAGE_MASK;
l = min_t(int, len, PAGE_SIZE - off); - memcpy(kmap(*pagep) + off, buf, l); + memcpy(kmap_thread(*pagep) + off, buf, l); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep);
while ((len -= l) != 0) { buf += l; l = min_t(int, len, PAGE_SIZE); - memcpy(kmap(*++pagep), buf, l); + memcpy(kmap_thread(*++pagep), buf, l); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); } }
@@ -112,15 +112,15 @@ void hfs_bnode_clear(struct hfs_bnode *node, int off, int len) off &= ~PAGE_MASK;
l = min_t(int, len, PAGE_SIZE - off); - memset(kmap(*pagep) + off, 0, l); + memset(kmap_thread(*pagep) + off, 0, l); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep);
while ((len -= l) != 0) { l = min_t(int, len, PAGE_SIZE); - memset(kmap(*++pagep), 0, l); + memset(kmap_thread(*++pagep), 0, l); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); } }
@@ -142,24 +142,24 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst,
if (src == dst) { l = min_t(int, len, PAGE_SIZE - src); - memcpy(kmap(*dst_page) + src, kmap(*src_page) + src, l); - kunmap(*src_page); + memcpy(kmap_thread(*dst_page) + src, kmap_thread(*src_page) + src, l); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page);
while ((len -= l) != 0) { l = min_t(int, len, PAGE_SIZE); - memcpy(kmap(*++dst_page), kmap(*++src_page), l); - kunmap(*src_page); + memcpy(kmap_thread(*++dst_page), kmap_thread(*++src_page), l); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); } } else { void *src_ptr, *dst_ptr;
do { - src_ptr = kmap(*src_page) + src; - dst_ptr = kmap(*dst_page) + dst; + src_ptr = kmap_thread(*src_page) + src; + dst_ptr = kmap_thread(*dst_page) + dst; if (PAGE_SIZE - src < PAGE_SIZE - dst) { l = PAGE_SIZE - src; src = 0; @@ -171,9 +171,9 @@ void hfs_bnode_copy(struct hfs_bnode *dst_node, int dst, } l = min(len, l); memcpy(dst_ptr, src_ptr, l); - kunmap(*src_page); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); if (!dst) dst_page++; else @@ -202,27 +202,27 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)
if (src == dst) { while (src < len) { - memmove(kmap(*dst_page), kmap(*src_page), src); - kunmap(*src_page); + memmove(kmap_thread(*dst_page), kmap_thread(*src_page), src); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); len -= src; src = PAGE_SIZE; src_page--; dst_page--; } src -= len; - memmove(kmap(*dst_page) + src, - kmap(*src_page) + src, len); - kunmap(*src_page); + memmove(kmap_thread(*dst_page) + src, + kmap_thread(*src_page) + src, len); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); } else { void *src_ptr, *dst_ptr;
do { - src_ptr = kmap(*src_page) + src; - dst_ptr = kmap(*dst_page) + dst; + src_ptr = kmap_thread(*src_page) + src; + dst_ptr = kmap_thread(*dst_page) + dst; if (src < dst) { l = src; src = PAGE_SIZE; @@ -234,9 +234,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len) } l = min(len, l); memmove(dst_ptr - l, src_ptr - l, l); - kunmap(*src_page); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); if (dst == PAGE_SIZE) dst_page--; else @@ -251,26 +251,26 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len)
if (src == dst) { l = min_t(int, len, PAGE_SIZE - src); - memmove(kmap(*dst_page) + src, - kmap(*src_page) + src, l); - kunmap(*src_page); + memmove(kmap_thread(*dst_page) + src, + kmap_thread(*src_page) + src, l); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page);
while ((len -= l) != 0) { l = min_t(int, len, PAGE_SIZE); - memmove(kmap(*++dst_page), - kmap(*++src_page), l); - kunmap(*src_page); + memmove(kmap_thread(*++dst_page), + kmap_thread(*++src_page), l); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); } } else { void *src_ptr, *dst_ptr;
do { - src_ptr = kmap(*src_page) + src; - dst_ptr = kmap(*dst_page) + dst; + src_ptr = kmap_thread(*src_page) + src; + dst_ptr = kmap_thread(*dst_page) + dst; if (PAGE_SIZE - src < PAGE_SIZE - dst) { l = PAGE_SIZE - src; @@ -283,9 +283,9 @@ void hfs_bnode_move(struct hfs_bnode *node, int dst, int src, int len) } l = min(len, l); memmove(dst_ptr, src_ptr, l); - kunmap(*src_page); + kunmap_thread(*src_page); set_page_dirty(*dst_page); - kunmap(*dst_page); + kunmap_thread(*dst_page); if (!dst) dst_page++; else @@ -502,14 +502,14 @@ struct hfs_bnode *hfs_bnode_find(struct hfs_btree *tree, u32 num) if (!test_bit(HFS_BNODE_NEW, &node->flags)) return node;
- desc = (struct hfs_bnode_desc *)(kmap(node->page[0]) + + desc = (struct hfs_bnode_desc *)(kmap_thread(node->page[0]) + node->page_offset); node->prev = be32_to_cpu(desc->prev); node->next = be32_to_cpu(desc->next); node->num_recs = be16_to_cpu(desc->num_recs); node->type = desc->type; node->height = desc->height; - kunmap(node->page[0]); + kunmap_thread(node->page[0]);
switch (node->type) { case HFS_NODE_HEADER: @@ -593,14 +593,14 @@ struct hfs_bnode *hfs_bnode_create(struct hfs_btree *tree, u32 num) }
pagep = node->page; - memset(kmap(*pagep) + node->page_offset, 0, + memset(kmap_thread(*pagep) + node->page_offset, 0, min_t(int, PAGE_SIZE, tree->node_size)); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); for (i = 1; i < tree->pages_per_bnode; i++) { - memset(kmap(*++pagep), 0, PAGE_SIZE); + memset(kmap_thread(*++pagep), 0, PAGE_SIZE); set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); } clear_bit(HFS_BNODE_NEW, &node->flags); wake_up(&node->lock_wq); diff --git a/fs/hfsplus/btree.c b/fs/hfsplus/btree.c index 66774f4cb4fd..74fcef3a1628 100644 --- a/fs/hfsplus/btree.c +++ b/fs/hfsplus/btree.c @@ -394,7 +394,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree)
off += node->page_offset; pagep = node->page + (off >> PAGE_SHIFT); - data = kmap(*pagep); + data = kmap_thread(*pagep); off &= ~PAGE_MASK; idx = 0;
@@ -407,7 +407,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) idx += i; data[off] |= m; set_page_dirty(*pagep); - kunmap(*pagep); + kunmap_thread(*pagep); tree->free_nodes--; mark_inode_dirty(tree->inode); hfs_bnode_put(node); @@ -417,14 +417,14 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) } } if (++off >= PAGE_SIZE) { - kunmap(*pagep); - data = kmap(*++pagep); + kunmap_thread(*pagep); + data = kmap_thread(*++pagep); off = 0; } idx += 8; len--; } - kunmap(*pagep); + kunmap_thread(*pagep); nidx = node->next; if (!nidx) { hfs_dbg(BNODE_MOD, "create new bmap node\n"); @@ -440,7 +440,7 @@ struct hfs_bnode *hfs_bmap_alloc(struct hfs_btree *tree) off = off16; off += node->page_offset; pagep = node->page + (off >> PAGE_SHIFT); - data = kmap(*pagep); + data = kmap_thread(*pagep); off &= ~PAGE_MASK; } } @@ -490,7 +490,7 @@ void hfs_bmap_free(struct hfs_bnode *node) } off += node->page_offset + nidx / 8; page = node->page[off >> PAGE_SHIFT]; - data = kmap(page); + data = kmap_thread(page); off &= ~PAGE_MASK; m = 1 << (~nidx & 7); byte = data[off]; @@ -498,13 +498,13 @@ void hfs_bmap_free(struct hfs_bnode *node) pr_crit("trying to free free bnode " "%u(%d)\n", node->this, node->type); - kunmap(page); + kunmap_thread(page); hfs_bnode_put(node); return; } data[off] = byte & ~m; set_page_dirty(page); - kunmap(page); + kunmap_thread(page); hfs_bnode_put(node); tree->free_nodes++; mark_inode_dirty(tree->inode);
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: David Woodhouse dwmw2@infradead.org Cc: Richard Weinberger richard@nod.at Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/jffs2/file.c | 4 ++-- fs/jffs2/gc.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c index f8fb89b10227..3e6d54f9b011 100644 --- a/fs/jffs2/file.c +++ b/fs/jffs2/file.c @@ -88,7 +88,7 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg)
BUG_ON(!PageLocked(pg));
- pg_buf = kmap(pg); + pg_buf = kmap_thread(pg); /* FIXME: Can kmap fail? */
ret = jffs2_read_inode_range(c, f, pg_buf, pg->index << PAGE_SHIFT, @@ -103,7 +103,7 @@ static int jffs2_do_readpage_nolock (struct inode *inode, struct page *pg) }
flush_dcache_page(pg); - kunmap(pg); + kunmap_thread(pg);
jffs2_dbg(2, "readpage finished\n"); return ret; diff --git a/fs/jffs2/gc.c b/fs/jffs2/gc.c index 373b3b7c9f44..a7259783ab84 100644 --- a/fs/jffs2/gc.c +++ b/fs/jffs2/gc.c @@ -1335,7 +1335,7 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era return PTR_ERR(page); }
- pg_ptr = kmap(page); + pg_ptr = kmap_thread(page); mutex_lock(&f->sem);
offset = start; @@ -1400,7 +1400,7 @@ static int jffs2_garbage_collect_dnode(struct jffs2_sb_info *c, struct jffs2_era } }
- kunmap(page); + kunmap_thread(page); put_page(page); return ret; }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Trond Myklebust trond.myklebust@hammerspace.com Cc: Anna Schumaker anna.schumaker@netapp.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/nfs/dir.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/fs/nfs/dir.c b/fs/nfs/dir.c index cb52db9a0cfb..fee321acccb4 100644 --- a/fs/nfs/dir.c +++ b/fs/nfs/dir.c @@ -213,7 +213,7 @@ int nfs_readdir_make_qstr(struct qstr *string, const char *name, unsigned int le static int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page) { - struct nfs_cache_array *array = kmap(page); + struct nfs_cache_array *array = kmap_thread(page); struct nfs_cache_array_entry *cache_entry; int ret;
@@ -235,7 +235,7 @@ int nfs_readdir_add_to_array(struct nfs_entry *entry, struct page *page) if (entry->eof != 0) array->eof_index = array->size; out: - kunmap(page); + kunmap_thread(page); return ret; }
@@ -347,7 +347,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc) struct nfs_cache_array *array; int status;
- array = kmap(desc->page); + array = kmap_thread(desc->page);
if (*desc->dir_cookie == 0) status = nfs_readdir_search_for_pos(array, desc); @@ -359,7 +359,7 @@ int nfs_readdir_search_array(nfs_readdir_descriptor_t *desc) desc->current_index += array->size; desc->page_index++; } - kunmap(desc->page); + kunmap_thread(desc->page); return status; }
@@ -602,10 +602,10 @@ int nfs_readdir_page_filler(nfs_readdir_descriptor_t *desc, struct nfs_entry *en
out_nopages: if (count == 0 || (status == -EBADCOOKIE && entry->eof != 0)) { - array = kmap(page); + array = kmap_thread(page); array->eof_index = array->size; status = 0; - kunmap(page); + kunmap_thread(page); }
put_page(scratch); @@ -669,7 +669,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page, goto out; }
- array = kmap(page); + array = kmap_thread(page);
status = nfs_readdir_alloc_pages(pages, array_size); if (status < 0) @@ -691,7 +691,7 @@ int nfs_readdir_xdr_to_array(nfs_readdir_descriptor_t *desc, struct page *page,
nfs_readdir_free_pages(pages, array_size); out_release_array: - kunmap(page); + kunmap_thread(page); nfs4_label_free(entry.label); out: nfs_free_fattr(entry.fattr); @@ -803,7 +803,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc) struct nfs_cache_array *array = NULL; struct nfs_open_dir_context *ctx = file->private_data;
- array = kmap(desc->page); + array = kmap_thread(desc->page); for (i = desc->cache_entry_index; i < array->size; i++) { struct nfs_cache_array_entry *ent;
@@ -827,7 +827,7 @@ int nfs_do_filldir(nfs_readdir_descriptor_t *desc) if (array->eof_index >= 0) desc->eof = true;
- kunmap(desc->page); + kunmap_thread(desc->page); dfprintk(DIRCACHE, "NFS: nfs_do_filldir() filling ended @ cookie %Lu; returning = %d\n", (unsigned long long)*desc->dir_cookie, res); return res;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jaegeuk Kim jaegeuk@kernel.org Cc: Chao Yu chao@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/f2fs/f2fs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d9e52a7f3702..ff72a45a577e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page(
static inline void f2fs_copy_page(struct page *src, struct page *dst) { - char *src_kaddr = kmap(src); - char *dst_kaddr = kmap(dst); + char *src_kaddr = kmap_thread(src); + char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE); - kunmap(dst); - kunmap(src); + kunmap_thread(dst); + kunmap_thread(src); }
static inline void f2fs_put_page(struct page *page, int unlock)
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jaegeuk Kim jaegeuk@kernel.org Cc: Chao Yu chao@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/f2fs/f2fs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d9e52a7f3702..ff72a45a577e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page( static inline void f2fs_copy_page(struct page *src, struct page *dst) {
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
- char *src_kaddr = kmap_thread(src);
- char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
- kunmap_thread(dst);
- kunmap_thread(src);
}
Wouldn't it make more sense to switch cases like this to kmap_atomic()? The pages are only mapped to do a memcpy(), then they're immediately unmapped.
- Eric
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page( static inline void f2fs_copy_page(struct page *src, struct page *dst) {
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
- char *src_kaddr = kmap_thread(src);
- char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
- kunmap_thread(dst);
- kunmap_thread(src);
}
Wouldn't it make more sense to switch cases like this to kmap_atomic()? The pages are only mapped to do a memcpy(), then they're immediately unmapped.
Maybe you missed the earlier thread from Thomas trying to do something similar for rather different reasons ...
https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page( static inline void f2fs_copy_page(struct page *src, struct page *dst) {
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
- char *src_kaddr = kmap_thread(src);
- char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
- kunmap_thread(dst);
- kunmap_thread(src);
}
Wouldn't it make more sense to switch cases like this to kmap_atomic()? The pages are only mapped to do a memcpy(), then they're immediately unmapped.
Maybe you missed the earlier thread from Thomas trying to do something similar for rather different reasons ...
https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
I did miss it. I'm not subscribed to any of the mailing lists it was sent to.
Anyway, it shouldn't matter. Patchsets should be standalone, and not require reading random prior threads on linux-kernel to understand.
And I still don't really understand. After this patchset, there is still code nearly identical to the above (doing a temporary mapping just for a memcpy) that would still be using kmap_atomic(). Is the idea that later, such code will be converted to use kmap_thread() instead? If not, why use one over the other?
- Eric
On Fri, Oct 09, 2020 at 06:30:36PM -0700, Eric Biggers wrote:
On Sat, Oct 10, 2020 at 01:39:54AM +0100, Matthew Wilcox wrote:
On Fri, Oct 09, 2020 at 02:34:34PM -0700, Eric Biggers wrote:
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
@@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page( static inline void f2fs_copy_page(struct page *src, struct page *dst) {
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
- char *src_kaddr = kmap_thread(src);
- char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
- kunmap_thread(dst);
- kunmap_thread(src);
}
Wouldn't it make more sense to switch cases like this to kmap_atomic()? The pages are only mapped to do a memcpy(), then they're immediately unmapped.
Maybe you missed the earlier thread from Thomas trying to do something similar for rather different reasons ...
https://lore.kernel.org/lkml/20200919091751.011116649@linutronix.de/
I did miss it. I'm not subscribed to any of the mailing lists it was sent to.
Anyway, it shouldn't matter. Patchsets should be standalone, and not require reading random prior threads on linux-kernel to understand.
Sorry, but I did not think that the discussion above was directly related. If I'm not mistaken, Thomas' work was directed at relaxing kmap_atomic() into kmap_thread() calls. While interesting, it is not the point of this series. I want to restrict kmap() callers into kmap_thread().
For this series, converting the kmap_thread() call sites to kmap_atomic() was considered. But like I said in the cover letter, kmap_atomic() does not have the same semantics; it is too strict. Perhaps I should have expanded on that explanation.
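To illustrate the distinction (a sketch only, not code from this series, mirroring the cachefiles conversion later in this thread; data, file, len and pos are placeholders for whatever the caller has in hand): a call site that must sleep while the page is mapped cannot use kmap_atomic(), but fits the proposed kmap_thread() semantics.

	data = kmap_thread(page);			/* thread-local, sleepable mapping */
	ret = kernel_write(file, data, len, &pos);	/* kernel_write() may sleep */
	kunmap_thread(page);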
And I still don't really understand. After this patchset, there is still code nearly identical to the above (doing a temporary mapping just for a memcpy) that would still be using kmap_atomic().
I don't understand. You mean there would be other call sites calling:
kmap_atomic()
memcpy()
kunmap_atomic()
?
Is the idea that later, such code will be converted to use kmap_thread() instead? If not, why use one over the other?
The reason for the new call is that, with PKS added behind kmap, we have 3 levels of mapping we want.
global kmap   (can span threads and sleep)
'thread' kmap (can sleep but not span threads)
'atomic' kmap (can't sleep nor span threads [by definition])
As Matthew said perhaps 'global kmaps' may be best changed to vmaps? I just don't know the details of every call site.
And since I don't know the call site details, if there are kmap_thread() calls which are better off as kmap_atomic() calls, I think it is worth converting them. But I made the assumption that kmap() users would already be calling kmap_atomic() if they could (because it is more efficient).
Ira
On Sun, Oct 11, 2020 at 11:56:35PM -0700, Ira Weiny wrote:
And I still don't really understand. After this patchset, there is still code nearly identical to the above (doing a temporary mapping just for a memcpy) that would still be using kmap_atomic().
I don't understand. You mean there would be other call sites calling:
kmap_atomic()
memcpy()
kunmap_atomic()
Yes, there are tons of places that do this. Try 'git grep -A6 kmap_atomic' and look for memcpy().
Hence why I'm asking what will be the "recommended" way to do this... kmap_thread() or kmap_atomic()?
And since I don't know the call site details if there are kmap_thread() calls which are better off as kmap_atomic() calls I think it is worth converting them. But I made the assumption that kmap users would already be calling kmap_atomic() if they could (because it is more efficient).
Not necessarily. In cases where either one is correct, people might not have put much thought into which of kmap() and kmap_atomic() they are using.
- Eric
On 10/12/20 9:19 AM, Eric Biggers wrote:
On Sun, Oct 11, 2020 at 11:56:35PM -0700, Ira Weiny wrote:
And I still don't really understand. After this patchset, there is still code nearly identical to the above (doing a temporary mapping just for a memcpy) that would still be using kmap_atomic().
I don't understand. You mean there would be other call sites calling:
kmap_atomic()
memcpy()
kunmap_atomic()
Yes, there are tons of places that do this. Try 'git grep -A6 kmap_atomic' and look for memcpy().
Hence why I'm asking what will be the "recommended" way to do this... kmap_thread() or kmap_atomic()?
kmap_atomic() is always preferred over kmap()/kmap_thread(). kmap_atomic() is _much_ more lightweight since its TLB invalidation is always CPU-local and never broadcast.
So, basically, unless you *must* sleep while the mapping is in place, kmap_atomic() is preferred.
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
kmap_atomic() is always preferred over kmap()/kmap_thread(). kmap_atomic() is _much_ more lightweight since its TLB invalidation is always CPU-local and never broadcast.
So, basically, unless you *must* sleep while the mapping is in place, kmap_atomic() is preferred.
But kmap_atomic() disables preemption, so the _ideal_ interface would map it only locally, then on preemption make it global. I don't even know if that _can_ be done. But this email makes it seem like kmap_atomic() has no downsides.
On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
kmap_atomic() is always preferred over kmap()/kmap_thread(). kmap_atomic() is _much_ more lightweight since its TLB invalidation is always CPU-local and never broadcast.
So, basically, unless you *must* sleep while the mapping is in place, kmap_atomic() is preferred.
But kmap_atomic() disables preemption, so the _ideal_ interface would map it only locally, then on preemption make it global. I don't even know if that _can_ be done. But this email makes it seem like kmap_atomic() has no downsides.
And that is IIUC what Thomas was trying to solve.
Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
From what I can see all of these discussions support the need to have something
between kmap() and kmap_atomic().
However, the reasons behind converting call sites to kmap_thread() are different between Thomas' patch set and mine. Both require more kmap granularity. However, they do so for different reasons and with different underlying implementations, but with the _same_ resulting semantics: a thread-local mapping which is preemptable.[2] Therefore they each focus on changing different call sites.
While this patch set is huge, I think it serves a valuable purpose by identifying a large number of call sites which are candidates for this new semantic.
Ira
[1] https://lore.kernel.org/lkml/CAHk-=wgbmwsTOKs23Z=71EBTrULoeaH2U3TNqT2atHEWvk... [2] It is important to note these implementations are not incompatible with each other. So I don't see yet another 'kmap_something()' being required.
On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
kmap_atomic() is always preferred over kmap()/kmap_thread(). kmap_atomic() is _much_ more lightweight since its TLB invalidation is always CPU-local and never broadcast.
So, basically, unless you *must* sleep while the mapping is in place, kmap_atomic() is preferred.
But kmap_atomic() disables preemption, so the _ideal_ interface would map it only locally, then on preemption make it global. I don't even know if that _can_ be done. But this email makes it seem like kmap_atomic() has no downsides.
And that is IIUC what Thomas was trying to solve.
Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
From what I can see all of these discussions support the need to have something
between kmap() and kmap_atomic().
However, the reasons behind converting call sites to kmap_thread() are different between Thomas' patch set and mine. Both require more kmap granularity. However, they do so for different reasons and with different underlying implementations, but with the _same_ resulting semantics: a thread-local mapping which is preemptable.[2] Therefore they each focus on changing different call sites.
While this patch set is huge, I think it serves a valuable purpose by identifying a large number of call sites which are candidates for this new semantic.
Yes, I agree. My problem with this patch-set is that it ties it to some Intel feature that almost nobody cares about. Maybe we should care about it, but you didn't try very hard to make anyone care about it in the cover letter.
For a future patch-set, I'd like to see you just introduce the new API. Then you can optimise the Intel implementation of it afterwards. Those patch-sets have entirely different reviewers.
On Mon, Oct 12, 2020 at 09:02:54PM +0100, Matthew Wilcox wrote:
On Mon, Oct 12, 2020 at 12:53:54PM -0700, Ira Weiny wrote:
On Mon, Oct 12, 2020 at 05:44:38PM +0100, Matthew Wilcox wrote:
On Mon, Oct 12, 2020 at 09:28:29AM -0700, Dave Hansen wrote:
kmap_atomic() is always preferred over kmap()/kmap_thread(). kmap_atomic() is _much_ more lightweight since its TLB invalidation is always CPU-local and never broadcast.
So, basically, unless you *must* sleep while the mapping is in place, kmap_atomic() is preferred.
But kmap_atomic() disables preemption, so the _ideal_ interface would map it only locally, then on preemption make it global. I don't even know if that _can_ be done. But this email makes it seem like kmap_atomic() has no downsides.
And that is IIUC what Thomas was trying to solve.
Also, Linus brought up that kmap_atomic() has quirks in nesting.[1]
From what I can see all of these discussions support the need to have something
between kmap() and kmap_atomic().
However, the reasons behind converting call sites to kmap_thread() are different between Thomas' patch set and mine. Both require more kmap granularity. However, they do so for different reasons and with different underlying implementations, but with the _same_ resulting semantics: a thread-local mapping which is preemptable.[2] Therefore they each focus on changing different call sites.
While this patch set is huge, I think it serves a valuable purpose by identifying a large number of call sites which are candidates for this new semantic.
Yes, I agree. My problem with this patch-set is that it ties it to some Intel feature that almost nobody cares about.
I humbly disagree. At this level the only thing this is tied to is the idea that there are additional memory protections available which can be enabled quickly on a per-thread basis. PKS on Intel is but 1 implementation of that.
Even the kmap code only knows that something special needs to be done for a devm page.
Maybe we should care about it, but you didn't try very hard to make anyone care about it in the cover letter.
Ok my bad. We have customers who care very much about restricting access to the PMEM pages to prevent bugs in the kernel from causing permanent damage to their data/file systems. I'll reword the cover letter better.
For a future patch-set, I'd like to see you just introduce the new API. Then you can optimise the Intel implementation of it afterwards. Those patch-sets have entirely different reviewers.
I considered doing this. But this seemed more logical because the feature is being driven by PMEM, which is behind the kmap interface, not by the users of the API.
I can introduce a patch set with a kmap_thread() call which does nothing if that is more palatable but it seems wrong to me to do so.
Ira
On Fri, 2020-10-09 at 14:34 -0700, Eric Biggers wrote:
On Fri, Oct 09, 2020 at 12:49:57PM -0700, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jaegeuk Kim jaegeuk@kernel.org Cc: Chao Yu chao@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/f2fs/f2fs.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h index d9e52a7f3702..ff72a45a577e 100644 --- a/fs/f2fs/f2fs.h +++ b/fs/f2fs/f2fs.h @@ -2410,12 +2410,12 @@ static inline struct page *f2fs_pagecache_get_page( static inline void f2fs_copy_page(struct page *src, struct page *dst) {
- char *src_kaddr = kmap(src);
- char *dst_kaddr = kmap(dst);
- char *src_kaddr = kmap_thread(src);
- char *dst_kaddr = kmap_thread(dst);
memcpy(dst_kaddr, src_kaddr, PAGE_SIZE);
- kunmap(dst);
- kunmap(src);
- kunmap_thread(dst);
- kunmap_thread(src);
}
Wouldn't it make more sense to switch cases like this to kmap_atomic()? The pages are only mapped to do a memcpy(), then they're immediately unmapped.
On a VIPT/VIVT architecture, this is horrendously wasteful. You're taking something that was mapped at colour c_src, mapping it to a new address src_kaddr, which is likely a different colour and necessitates flushing the original c_src; then you copy it to dst_kaddr, which is also likely a different colour from c_dst, so dst_kaddr has to be flushed on kunmap and c_dst has to be invalidated on kmap.

What we should have is an architectural primitive for doing this, something like kmemcopy_arch(dst, src). PIPT architectures can implement it as the above (possibly losing kmap if they don't need it) but VIPT/VIVT architectures can set up a correctly coloured mapping so they can simply copy from c_src to c_dst without any need to flush, and the data arrives cache hot at c_dst.
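For reference, a hedged sketch of what the generic (PIPT) fallback for such a primitive might look like; kmemcopy_arch() is the hypothetical name suggested above, not an existing kernel API, and VIPT/VIVT architectures would provide their own correctly coloured implementation:

	static inline void kmemcopy_arch(struct page *dst, struct page *src)
	{
		void *vdst = kmap_atomic(dst);
		void *vsrc = kmap_atomic(src);

		memcpy(vdst, vsrc, PAGE_SIZE);
		kunmap_atomic(vsrc);	/* LIFO: unmap the most recent mapping first */
		kunmap_atomic(vdst);
	}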
James
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Miklos Szeredi miklos@szeredi.hu Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/fuse/readdir.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/fuse/readdir.c b/fs/fuse/readdir.c index 90e3f01bd796..953ffe6f56e3 100644 --- a/fs/fuse/readdir.c +++ b/fs/fuse/readdir.c @@ -536,9 +536,9 @@ static int fuse_readdir_cached(struct file *file, struct dir_context *ctx) * Contents of the page are now protected against changing by holding * the page lock. */ - addr = kmap(page); + addr = kmap_thread(page); res = fuse_parse_cache(ff, addr, size, ctx); - kunmap(page); + kunmap_thread(page); unlock_page(page); put_page(page);
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Christoph Hellwig hch@infradead.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/freevxfs/vxfs_immed.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/freevxfs/vxfs_immed.c b/fs/freevxfs/vxfs_immed.c index bfc780c682fb..9c42fec4cd85 100644 --- a/fs/freevxfs/vxfs_immed.c +++ b/fs/freevxfs/vxfs_immed.c @@ -69,9 +69,9 @@ vxfs_immed_readpage(struct file *fp, struct page *pp) u_int64_t offset = (u_int64_t)pp->index << PAGE_SHIFT; caddr_t kaddr;
- kaddr = kmap(pp); + kaddr = kmap_thread(pp); memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE); - kunmap(pp); + kunmap_thread(pp); flush_dcache_page(pp); SetPageUptodate(pp);
- kaddr = kmap(pp);
- kaddr = kmap_thread(pp); memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
- kunmap(pp);
- kunmap_thread(pp);
You only Cced me on this particular patch, which means I have absolutely no idea what kmap_thread and kunmap_thread actually do, and thus can't provide an informed review.
That being said, I think your life would be a lot easier if you add helpers for the above code sequence and its counterpart that copies to a potential highmem page first, as that hides the implementation details from most users.
On Tue, Oct 13, 2020 at 12:25:44PM +0100, Christoph Hellwig wrote:
- kaddr = kmap(pp);
- kaddr = kmap_thread(pp); memcpy(kaddr, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
- kunmap(pp);
- kunmap_thread(pp);
You only Cced me on this particular patch, which means I have absolutely no idea what kmap_thread and kunmap_thread actually do, and thus can't provide an informed review.
Sorry the list was so big I struggled with who to CC and on which patches.
That being said, I think your life would be a lot easier if you add helpers for the above code sequence and its counterpart that copies to a potential highmem page first, as that hides the implementation details from most users.
Matthew Wilcox and Al Viro have suggested similar ideas.
https://lore.kernel.org/lkml/20201013205012.GI2046448@iweiny-DESK2.sc.intel....
Ira
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jan Kara jack@suse.cz Cc: "Theodore Ts'o" tytso@mit.edu Cc: Randy Dunlap rdunlap@infradead.org Cc: Alex Shi alex.shi@linux.alibaba.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/reiserfs/journal.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/reiserfs/journal.c b/fs/reiserfs/journal.c index e98f99338f8f..be8f56261e8c 100644 --- a/fs/reiserfs/journal.c +++ b/fs/reiserfs/journal.c @@ -4194,11 +4194,11 @@ static int do_journal_end(struct reiserfs_transaction_handle *th, int flags) SB_ONDISK_JOURNAL_SIZE(sb))); set_buffer_uptodate(tmp_bh); page = cn->bh->b_page; - addr = kmap(page); + addr = kmap_thread(page); memcpy(tmp_bh->b_data, addr + offset_in_page(cn->bh->b_data), cn->bh->b_size); - kunmap(page); + kunmap_thread(page); mark_buffer_dirty(tmp_bh); jindex++; set_buffer_journal_dirty(cn->bh);
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Damien Le Moal damien.lemoal@wdc.com Cc: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/zonefs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 8ec7c8f109d7..2fd6c86beee1 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb) if (ret) goto free_page;
- super = kmap(page); + super = kmap_thread(page);
ret = -EINVAL; if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC) @@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb) ret = 0;
unmap: - kunmap(page); + kunmap_thread(page); free_page: __free_page(page);
On 2020/10/10 4:52, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Damien Le Moal damien.lemoal@wdc.com Cc: Naohiro Aota naohiro.aota@wdc.com Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/zonefs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c index 8ec7c8f109d7..2fd6c86beee1 100644 --- a/fs/zonefs/super.c +++ b/fs/zonefs/super.c @@ -1297,7 +1297,7 @@ static int zonefs_read_super(struct super_block *sb) if (ret) goto free_page;
- super = kmap(page);
- super = kmap_thread(page);
ret = -EINVAL; if (le32_to_cpu(super->s_magic) != ZONEFS_MAGIC) @@ -1349,7 +1349,7 @@ static int zonefs_read_super(struct super_block *sb) ret = 0; unmap:
- kunmap(page);
- kunmap_thread(page);
free_page: __free_page(page);
acked-by: Damien Le Moal damien.lemoal@wdc.com
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Richard Weinberger richard@nod.at Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/ubifs/file.c | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-)
diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index b77d1637bbbc..a3537447a885 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -111,7 +111,7 @@ static int do_readpage(struct page *page) ubifs_assert(c, !PageChecked(page)); ubifs_assert(c, !PagePrivate(page));
- addr = kmap(page); + addr = kmap_thread(page);
block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT; beyond = (i_size + UBIFS_BLOCK_SIZE - 1) >> UBIFS_BLOCK_SHIFT; @@ -174,7 +174,7 @@ static int do_readpage(struct page *page) SetPageUptodate(page); ClearPageError(page); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); return 0;
error: @@ -182,7 +182,7 @@ static int do_readpage(struct page *page) ClearPageUptodate(page); SetPageError(page); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); return err; }
@@ -616,7 +616,7 @@ static int populate_page(struct ubifs_info *c, struct page *page, dbg_gen("ino %lu, pg %lu, i_size %lld, flags %#lx", inode->i_ino, page->index, i_size, page->flags);
- addr = zaddr = kmap(page); + addr = zaddr = kmap_thread(page);
end_index = (i_size - 1) >> PAGE_SHIFT; if (!i_size || page->index > end_index) { @@ -692,7 +692,7 @@ static int populate_page(struct ubifs_info *c, struct page *page, SetPageUptodate(page); ClearPageError(page); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); *n = nn; return 0;
@@ -700,7 +700,7 @@ static int populate_page(struct ubifs_info *c, struct page *page, ClearPageUptodate(page); SetPageError(page); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); ubifs_err(c, "bad data node (block %u, inode %lu)", page_block, inode->i_ino); return -EINVAL; @@ -918,7 +918,7 @@ static int do_writepage(struct page *page, int len) /* Update radix tree tags */ set_page_writeback(page);
- addr = kmap(page); + addr = kmap_thread(page); block = page->index << UBIFS_BLOCKS_PER_PAGE_SHIFT; i = 0; while (len) { @@ -950,7 +950,7 @@ static int do_writepage(struct page *page, int len) ClearPagePrivate(page); ClearPageChecked(page);
- kunmap(page); + kunmap_thread(page); unlock_page(page); end_page_writeback(page); return err;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: David Howells dhowells@redhat.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/cachefiles/rdwr.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 3080cda9e824..2468e5c067ba 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -936,9 +936,9 @@ int cachefiles_write_page(struct fscache_storage *op, struct page *page) } }
- data = kmap(page); + data = kmap_thread(page); ret = kernel_write(file, data, len, &pos); - kunmap(page); + kunmap_thread(page); fput(file); if (ret != len) goto error_eio;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Anton Altaparmakov anton@tuxera.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/ntfs/aops.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs/aops.c b/fs/ntfs/aops.c index bb0a43860ad2..11633d732809 100644 --- a/fs/ntfs/aops.c +++ b/fs/ntfs/aops.c @@ -1099,7 +1099,7 @@ static int ntfs_write_mst_block(struct page *page, if (!nr_bhs) goto done; /* Map the page so we can access its contents. */ - kaddr = kmap(page); + kaddr = kmap_thread(page); /* Clear the page uptodate flag whilst the mst fixups are applied. */ BUG_ON(!PageUptodate(page)); ClearPageUptodate(page); @@ -1276,7 +1276,7 @@ static int ntfs_write_mst_block(struct page *page, iput(VFS_I(base_tni)); } SetPageUptodate(page); - kunmap(page); + kunmap_thread(page); done: if (unlikely(err && err != -ENOMEM)) { /*
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/romfs/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/romfs/super.c b/fs/romfs/super.c index e582d001f792..9050074c6755 100644 --- a/fs/romfs/super.c +++ b/fs/romfs/super.c @@ -107,7 +107,7 @@ static int romfs_readpage(struct file *file, struct page *page) void *buf; int ret;
- buf = kmap(page); + buf = kmap_thread(page); if (!buf) return -ENOMEM;
@@ -136,7 +136,7 @@ static int romfs_readpage(struct file *file, struct page *page) SetPageUptodate(page);
flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); unlock_page(page); return ret; }
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Hans de Goede hdegoede@redhat.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/vboxsf/file.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/vboxsf/file.c b/fs/vboxsf/file.c index c4ab5996d97a..d9c7e6b7b4cc 100644 --- a/fs/vboxsf/file.c +++ b/fs/vboxsf/file.c @@ -216,7 +216,7 @@ static int vboxsf_readpage(struct file *file, struct page *page) u8 *buf; int err;
- buf = kmap(page); + buf = kmap_thread(page);
err = vboxsf_read(sf_handle->root, sf_handle->handle, off, &nread, buf); if (err == 0) { @@ -227,7 +227,7 @@ static int vboxsf_readpage(struct file *file, struct page *page) SetPageError(page); }
- kunmap(page); + kunmap_thread(page); unlock_page(page); return err; } @@ -268,10 +268,10 @@ static int vboxsf_writepage(struct page *page, struct writeback_control *wbc) if (!sf_handle) return -EBADF;
- buf = kmap(page); + buf = kmap_thread(page); err = vboxsf_write(sf_handle->root, sf_handle->handle, off, &nwrite, buf); - kunmap(page); + kunmap_thread(page);
kref_put(&sf_handle->refcount, vboxsf_handle_release);
@@ -302,10 +302,10 @@ static int vboxsf_write_end(struct file *file, struct address_space *mapping, if (!PageUptodate(page) && copied < len) zero_user(page, from + copied, len - copied);
- buf = kmap(page); + buf = kmap_thread(page); err = vboxsf_write(sf_handle->root, sf_handle->handle, pos, &nwritten, buf + from); - kunmap(page); + kunmap_thread(page);
if (err) { nwritten = 0;
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Jeff Dike jdike@addtoit.com Cc: Richard Weinberger richard@nod.at Cc: Anton Ivanov anton.ivanov@cambridgegreys.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/hostfs/hostfs_kern.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/fs/hostfs/hostfs_kern.c b/fs/hostfs/hostfs_kern.c index c070c0d8e3e9..608efd0f83cb 100644 --- a/fs/hostfs/hostfs_kern.c +++ b/fs/hostfs/hostfs_kern.c @@ -409,7 +409,7 @@ static int hostfs_writepage(struct page *page, struct writeback_control *wbc) if (page->index >= end_index) count = inode->i_size & (PAGE_SIZE-1);
- buffer = kmap(page); + buffer = kmap_thread(page);
err = write_file(HOSTFS_I(inode)->fd, &base, buffer, count); if (err != count) { @@ -425,7 +425,7 @@ static int hostfs_writepage(struct page *page, struct writeback_control *wbc) err = 0;
out: - kunmap(page); + kunmap_thread(page);
unlock_page(page); return err; @@ -437,7 +437,7 @@ static int hostfs_readpage(struct file *file, struct page *page) loff_t start = page_offset(page); int bytes_read, ret = 0;
- buffer = kmap(page); + buffer = kmap_thread(page); bytes_read = read_file(FILE_HOSTFS_I(file)->fd, &start, buffer, PAGE_SIZE); if (bytes_read < 0) { @@ -454,7 +454,7 @@ static int hostfs_readpage(struct file *file, struct page *page)
out: flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); unlock_page(page); return ret; } @@ -480,9 +480,9 @@ static int hostfs_write_end(struct file *file, struct address_space *mapping, unsigned from = pos & (PAGE_SIZE - 1); int err;
- buffer = kmap(page); + buffer = kmap_thread(page); err = write_file(FILE_HOSTFS_I(file)->fd, &pos, buffer + from, copied); - kunmap(page); + kunmap_thread(page);
if (!PageUptodate(page) && err == PAGE_SIZE) SetPageUptodate(page);
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i];
if (page) { - memcpy(data, kmap(page), PAGE_SIZE); - kunmap(page); + memcpy(data, kmap_thread(page), PAGE_SIZE); + kunmap_thread(page); put_page(page); } else memset(data, 0, PAGE_SIZE); @@ -826,7 +826,7 @@ static int cramfs_readpage(struct file *file, struct page *page)
maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT; bytes_filled = 0; - pgdata = kmap(page); + pgdata = kmap_thread(page);
if (page->index < maxblock) { struct super_block *sb = inode->i_sb; @@ -914,13 +914,13 @@ static int cramfs_readpage(struct file *file, struct page *page)
memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled); flush_dcache_page(page); - kunmap(page); + kunmap_thread(page); SetPageUptodate(page); unlock_page(page); return 0;
err: - kunmap(page); + kunmap_thread(page); ClearPageUptodate(page); SetPageError(page); unlock_page(page);
On Fri, 9 Oct 2020, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com
Acked-by: Nicolas Pitre nico@fluxnic.net
fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i]; if (page) {
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);
memcpy(data, kmap_thread(page), PAGE_SIZE);
kunmap_thread(page);
put_page(page);
} else
	memset(data, 0, PAGE_SIZE);
@@ -826,7 +826,7 @@ static int cramfs_readpage(struct file *file, struct page *page) maxblock = (inode->i_size + PAGE_SIZE - 1) >> PAGE_SHIFT; bytes_filled = 0;
- pgdata = kmap(page);
- pgdata = kmap_thread(page);
if (page->index < maxblock) { struct super_block *sb = inode->i_sb; @@ -914,13 +914,13 @@ static int cramfs_readpage(struct file *file, struct page *page) memset(pgdata + bytes_filled, 0, PAGE_SIZE - bytes_filled); flush_dcache_page(page);
- kunmap(page);
- kunmap_thread(page); SetPageUptodate(page); unlock_page(page); return 0;
err:
- kunmap(page);
- kunmap_thread(page); ClearPageUptodate(page); SetPageError(page); unlock_page(page);
-- 2.28.0.rc0.12.gb6a658bd00c9
On Fri, Oct 9, 2020 at 12:52 PM ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i];
if (page) {
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);
memcpy(data, kmap_thread(page), PAGE_SIZE);
kunmap_thread(page);
Why does this need a sleepable kmap? This looks like a textbook kmap_atomic() use case.
On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
On Fri, Oct 9, 2020 at 12:52 PM ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i];
if (page) {
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);
memcpy(data, kmap_thread(page), PAGE_SIZE);
kunmap_thread(page);
Why does this need a sleepable kmap? This looks like a textbook kmap_atomic() use case.
There's a lot of code of this form. Could we perhaps have:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
{
	char *vto = kmap_atomic(to);

	memcpy(vto, vfrom, size);
	kunmap_atomic(vto);
}
in linux/highmem.h ?
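As a usage sketch only (copy_to_highpage() is the helper proposed just above, not an existing API at this point), the vxfs_immed_readpage() hunk earlier in this thread could then collapse to:

	/* copy the inline data into the (possibly highmem) page cache page */
	copy_to_highpage(pp, vip->vii_immed.vi_immed + offset, PAGE_SIZE);
	flush_dcache_page(pp);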
On Tue, Oct 13, 2020 at 12:37 PM Matthew Wilcox willy@infradead.org wrote:
On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
On Fri, Oct 9, 2020 at 12:52 PM ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i];
if (page) {
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);
memcpy(data, kmap_thread(page), PAGE_SIZE);
kunmap_thread(page);
Why does this need a sleepable kmap? This looks like a textbook kmap_atomic() use case.
There's a lot of code of this form. Could we perhaps have:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size) { char *vto = kmap_atomic(to);
memcpy(vto, vfrom, size); kunmap_atomic(vto);
}
in linux/highmem.h ?
Nice, yes, that could also replace the local ones in lib/iov_iter.c (memcpy_{to,from}_page())
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size) { char *vto = kmap_atomic(to);
memcpy(vto, vfrom, size); kunmap_atomic(vto); }
in linux/highmem.h ?
You mean, like

static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len)
{
	char *from = kmap_atomic(page);
	memcpy(to, from + offset, len);
	kunmap_atomic(from);
}

static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len)
{
	char *to = kmap_atomic(page);
	memcpy(to + offset, from, len);
	kunmap_atomic(to);
}

static void memzero_page(struct page *page, size_t offset, size_t len)
{
	char *addr = kmap_atomic(page);
	memset(addr + offset, 0, len);
	kunmap_atomic(addr);
}
in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and highmem.h as location - these make perfect sense regardless of highmem; they are normal memory operations with page + offset used instead of a pointer...
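As an illustrative sketch (not part of this series), with a lifted memcpy_from_page() helper like the one above, the cramfs_blkdev_read() hunk quoted earlier in this thread would reduce to:

	if (page) {
		/* map, copy out one page of data, unmap -- all in the helper */
		memcpy_from_page(data, page, 0, PAGE_SIZE);
		put_page(page);
	} else
		memset(data, 0, PAGE_SIZE);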
On Tue, Oct 13, 2020 at 09:01:49PM +0100, Al Viro wrote:
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size) { char *vto = kmap_atomic(to);
memcpy(vto, vfrom, size); kunmap_atomic(vto); }
in linux/highmem.h ?
You mean, like static void memcpy_from_page(char *to, struct page *page, size_t offset, size_t len) { char *from = kmap_atomic(page); memcpy(to, from + offset, len); kunmap_atomic(from); }
static void memcpy_to_page(struct page *page, size_t offset, const char *from, size_t len) { char *to = kmap_atomic(page); memcpy(to + offset, from, len); kunmap_atomic(to); }
static void memzero_page(struct page *page, size_t offset, size_t len) { char *addr = kmap_atomic(page); memset(addr + offset, 0, len); kunmap_atomic(addr); }
in lib/iov_iter.c? FWIW, I don't like that "highpage" in the name and highmem.h as location - these make perfect sense regardless of highmem; they are normal memory operations with page + offset used instead of a pointer...
I was thinking along those lines as well, especially because of the direction this patch set takes kmap().
Thanks for pointing these out to me. How about I lift them to a common header? But if not highmem.h, where?
Ira
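For illustration, the cramfs hunk quoted above would shrink to a single call if such a helper were lifted somewhere common. This is only a sketch: memcpy_from_page() here is assumed to have the signature Al posted, and its final name and home were still open questions in this thread.

/* Before: open-coded map/copy/unmap, as in cramfs_blkdev_read() today. */
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);

/* After: the kmap_atomic()/kunmap_atomic() pair is hidden inside the
 * helper, so neither kmap() nor kmap_thread() is needed at this site. */
memcpy_from_page(data, page, 0, PAGE_SIZE);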
On Tue, Oct 13, 2020 at 08:36:43PM +0100, Matthew Wilcox wrote:
On Tue, Oct 13, 2020 at 11:44:29AM -0700, Dan Williams wrote:
On Fri, Oct 9, 2020 at 12:52 PM ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Nicolas Pitre nico@fluxnic.net Signed-off-by: Ira Weiny ira.weiny@intel.com
fs/cramfs/inode.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/cramfs/inode.c b/fs/cramfs/inode.c index 912308600d39..003c014a42ed 100644 --- a/fs/cramfs/inode.c +++ b/fs/cramfs/inode.c @@ -247,8 +247,8 @@ static void *cramfs_blkdev_read(struct super_block *sb, unsigned int offset, struct page *page = pages[i];
if (page) {
memcpy(data, kmap(page), PAGE_SIZE);
kunmap(page);
memcpy(data, kmap_thread(page), PAGE_SIZE);
kunmap_thread(page);
Why does this need a sleepable kmap? This looks like a textbook kmap_atomic() use case.
There's a lot of code of this form. Could we perhaps have:
static inline void copy_to_highpage(struct page *to, void *vfrom, unsigned int size)
{
	char *vto = kmap_atomic(to);

	memcpy(vto, vfrom, size);
	kunmap_atomic(vto);
}
in linux/highmem.h ?
Christoph had the same idea. I'll work on it.
Ira
From: Ira Weiny ira.weiny@intel.com
The kmap() calls in this FS are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Gao Xiang xiang@kernel.org Cc: Chao Yu chao@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/erofs/super.c | 4 ++-- fs/erofs/xattr.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/fs/erofs/super.c b/fs/erofs/super.c index ddaa516c008a..41696b60f1b3 100644 --- a/fs/erofs/super.c +++ b/fs/erofs/super.c @@ -139,7 +139,7 @@ static int erofs_read_superblock(struct super_block *sb)
sbi = EROFS_SB(sb);
- data = kmap(page); + data = kmap_thread(page); dsb = (struct erofs_super_block *)(data + EROFS_SUPER_OFFSET);
ret = -EINVAL; @@ -189,7 +189,7 @@ static int erofs_read_superblock(struct super_block *sb) } ret = 0; out: - kunmap(page); + kunmap_thread(page); put_page(page); return ret; } diff --git a/fs/erofs/xattr.c b/fs/erofs/xattr.c index c8c381eadcd6..1771baa99d77 100644 --- a/fs/erofs/xattr.c +++ b/fs/erofs/xattr.c @@ -20,7 +20,7 @@ static inline void xattr_iter_end(struct xattr_iter *it, bool atomic) { /* the only user of kunmap() is 'init_inode_xattrs' */ if (!atomic) - kunmap(it->page); + kunmap_thread(it->page); else kunmap_atomic(it->kaddr);
@@ -96,7 +96,7 @@ static int init_inode_xattrs(struct inode *inode) }
/* read in shared xattr array (non-atomic, see kmalloc below) */ - it.kaddr = kmap(it.page); + it.kaddr = kmap_thread(it.page); atomic_map = false;
ih = (struct erofs_xattr_ibody_header *)(it.kaddr + it.ofs);
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: Jens Axboe axboe@kernel.dk Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/aio.c | 4 ++-- fs/binfmt_elf.c | 4 ++-- fs/binfmt_elf_fdpic.c | 4 ++-- fs/exec.c | 10 +++++----- fs/io_uring.c | 4 ++-- fs/splice.c | 4 ++-- 6 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/fs/aio.c b/fs/aio.c index d5ec30385566..27f95996d25f 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1223,10 +1223,10 @@ static long aio_read_events_ring(struct kioctx *ctx, avail = min(avail, nr - ret); avail = min_t(long, avail, AIO_EVENTS_PER_PAGE - pos);
- ev = kmap(page); + ev = kmap_thread(page); copy_ret = copy_to_user(event + ret, ev + pos, sizeof(*ev) * avail); - kunmap(page); + kunmap_thread(page);
if (unlikely(copy_ret)) { ret = -EFAULT; diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c index 13d053982dd7..1a332ef1ae03 100644 --- a/fs/binfmt_elf.c +++ b/fs/binfmt_elf.c @@ -2430,9 +2430,9 @@ static int elf_core_dump(struct coredump_params *cprm)
page = get_dump_page(addr); if (page) { - void *kaddr = kmap(page); + void *kaddr = kmap_thread(page); stop = !dump_emit(cprm, kaddr, PAGE_SIZE); - kunmap(page); + kunmap_thread(page); put_page(page); } else stop = !dump_skip(cprm, PAGE_SIZE); diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c index 50f845702b92..8fbe188e0fdd 100644 --- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -1542,9 +1542,9 @@ static bool elf_fdpic_dump_segments(struct coredump_params *cprm) bool res; struct page *page = get_dump_page(addr); if (page) { - void *kaddr = kmap(page); + void *kaddr = kmap_thread(page); res = dump_emit(cprm, kaddr, PAGE_SIZE); - kunmap(page); + kunmap_thread(page); put_page(page); } else { res = dump_skip(cprm, PAGE_SIZE); diff --git a/fs/exec.c b/fs/exec.c index a91003e28eaa..3948b8511e3a 100644 --- a/fs/exec.c +++ b/fs/exec.c @@ -575,11 +575,11 @@ static int copy_strings(int argc, struct user_arg_ptr argv,
if (kmapped_page) { flush_kernel_dcache_page(kmapped_page); - kunmap(kmapped_page); + kunmap_thread(kmapped_page); put_arg_page(kmapped_page); } kmapped_page = page; - kaddr = kmap(kmapped_page); + kaddr = kmap_thread(kmapped_page); kpos = pos & PAGE_MASK; flush_arg_page(bprm, kpos, kmapped_page); } @@ -593,7 +593,7 @@ static int copy_strings(int argc, struct user_arg_ptr argv, out: if (kmapped_page) { flush_kernel_dcache_page(kmapped_page); - kunmap(kmapped_page); + kunmap_thread(kmapped_page); put_arg_page(kmapped_page); } return ret; @@ -871,11 +871,11 @@ int transfer_args_to_stack(struct linux_binprm *bprm,
for (index = MAX_ARG_PAGES - 1; index >= stop; index--) { unsigned int offset = index == stop ? bprm->p & ~PAGE_MASK : 0; - char *src = kmap(bprm->page[index]) + offset; + char *src = kmap_thread(bprm->page[index]) + offset; sp -= PAGE_SIZE - offset; if (copy_to_user((void *) sp, src, PAGE_SIZE - offset) != 0) ret = -EFAULT; - kunmap(bprm->page[index]); + kunmap_thread(bprm->page[index]); if (ret) goto out; } diff --git a/fs/io_uring.c b/fs/io_uring.c index aae0ef2ec34d..f59bb079822d 100644 --- a/fs/io_uring.c +++ b/fs/io_uring.c @@ -2903,7 +2903,7 @@ static ssize_t loop_rw_iter(int rw, struct file *file, struct kiocb *kiocb, iovec = iov_iter_iovec(iter); } else { /* fixed buffers import bvec */ - iovec.iov_base = kmap(iter->bvec->bv_page) + iovec.iov_base = kmap_thread(iter->bvec->bv_page) + iter->iov_offset; iovec.iov_len = min(iter->count, iter->bvec->bv_len - iter->iov_offset); @@ -2918,7 +2918,7 @@ static ssize_t loop_rw_iter(int rw, struct file *file, struct kiocb *kiocb, }
if (iov_iter_is_bvec(iter)) - kunmap(iter->bvec->bv_page); + kunmap_thread(iter->bvec->bv_page);
if (nr < 0) { if (!ret) diff --git a/fs/splice.c b/fs/splice.c index ce75aec52274..190c4d218c30 100644 --- a/fs/splice.c +++ b/fs/splice.c @@ -815,9 +815,9 @@ static int write_pipe_buf(struct pipe_inode_info *pipe, struct pipe_buffer *buf, void *data; loff_t tmp = sd->pos;
- data = kmap(buf->page); + data = kmap_thread(buf->page); ret = __kernel_write(sd->u.file, data + buf->offset, sd->len, &tmp); - kunmap(buf->page); + kunmap_thread(buf->page);
return ret; }
From: Ira Weiny ira.weiny@intel.com
There are 3 places in namei.c where the equivalent of ext2_put_page() is open-coded. We want to use k[un]map_thread() instead of k[un]map() in ext2_[get|put]_page().
Move ext2_put_page() to ext2.h and use it in namei.c in preparation for converting the k[un]map() code.
Cc: Jan Kara jack@suse.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/ext2/dir.c | 6 ------ fs/ext2/ext2.h | 8 ++++++++ fs/ext2/namei.c | 15 +++++---------- 3 files changed, 13 insertions(+), 16 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c index 70355ab6740e..f3194bf20733 100644 --- a/fs/ext2/dir.c +++ b/fs/ext2/dir.c @@ -66,12 +66,6 @@ static inline unsigned ext2_chunk_size(struct inode *inode) return inode->i_sb->s_blocksize; }
-static inline void ext2_put_page(struct page *page) -{ - kunmap(page); - put_page(page); -} - /* * Return the offset into page `page_nr' of the last valid * byte in that page, plus one. diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 5136b7289e8d..021ec8b42ac3 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -16,6 +16,8 @@ #include <linux/blockgroup_lock.h> #include <linux/percpu_counter.h> #include <linux/rbtree.h> +#include <linux/mm.h> +#include <linux/highmem.h>
/* XXX Here for now... not interested in restructing headers JUST now */
@@ -745,6 +747,12 @@ extern int ext2_delete_entry (struct ext2_dir_entry_2 *, struct page *); extern int ext2_empty_dir (struct inode *); extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **); extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int); +static inline void ext2_put_page(struct page *page) +{ + kunmap(page); + put_page(page); +} +
/* ialloc.c */ extern struct inode * ext2_new_inode (struct inode *, umode_t, const struct qstr *); diff --git a/fs/ext2/namei.c b/fs/ext2/namei.c index 5bf2c145643b..ea980f1e2e99 100644 --- a/fs/ext2/namei.c +++ b/fs/ext2/namei.c @@ -389,23 +389,18 @@ static int ext2_rename (struct inode * old_dir, struct dentry * old_dentry, if (dir_de) { if (old_dir != new_dir) ext2_set_link(old_inode, dir_de, dir_page, new_dir, 0); - else { - kunmap(dir_page); - put_page(dir_page); - } + else + ext2_put_page(dir_page); inode_dec_link_count(old_dir); } return 0;
out_dir: - if (dir_de) { - kunmap(dir_page); - put_page(dir_page); - } + if (dir_de) + ext2_put_page(dir_page); out_old: - kunmap(old_page); - put_page(old_page); + ext2_put_page(old_page); out: return err; }
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call instead.
Cc: Jan Kara jack@suse.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/ext2/dir.c | 2 +- fs/ext2/ext2.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ext2/dir.c b/fs/ext2/dir.c index f3194bf20733..abe97ba458c8 100644 --- a/fs/ext2/dir.c +++ b/fs/ext2/dir.c @@ -196,7 +196,7 @@ static struct page * ext2_get_page(struct inode *dir, unsigned long n, struct address_space *mapping = dir->i_mapping; struct page *page = read_mapping_page(mapping, n, NULL); if (!IS_ERR(page)) { - kmap(page); + kmap_thread(page); if (unlikely(!PageChecked(page))) { if (PageError(page) || !ext2_check_page(page, quiet)) goto fail; diff --git a/fs/ext2/ext2.h b/fs/ext2/ext2.h index 021ec8b42ac3..9bcb6714c255 100644 --- a/fs/ext2/ext2.h +++ b/fs/ext2/ext2.h @@ -749,7 +749,7 @@ extern struct ext2_dir_entry_2 * ext2_dotdot (struct inode *, struct page **); extern void ext2_set_link(struct inode *, struct ext2_dir_entry_2 *, struct page *, struct inode *, int); static inline void ext2_put_page(struct page *page) { - kunmap(page); + kunmap_thread(page); put_page(page); }
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/isofs/compress.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/isofs/compress.c b/fs/isofs/compress.c index bc12ac7e2312..ddd3fd99d2e1 100644 --- a/fs/isofs/compress.c +++ b/fs/isofs/compress.c @@ -344,7 +344,7 @@ static int zisofs_readpage(struct file *file, struct page *page) pages[i] = grab_cache_page_nowait(mapping, index); if (pages[i]) { ClearPageError(pages[i]); - kmap(pages[i]); + kmap_thread(pages[i]); } }
@@ -356,7 +356,7 @@ static int zisofs_readpage(struct file *file, struct page *page) flush_dcache_page(pages[i]); if (i == full_page && err) SetPageError(pages[i]); - kunmap(pages[i]); + kunmap_thread(pages[i]); unlock_page(pages[i]); if (i != full_page) put_page(pages[i]);
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- fs/jffs2/file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/jffs2/file.c b/fs/jffs2/file.c index 3e6d54f9b011..14dd2b18cc16 100644 --- a/fs/jffs2/file.c +++ b/fs/jffs2/file.c @@ -287,13 +287,13 @@ static int jffs2_write_end(struct file *filp, struct address_space *mapping,
/* In 2.4, it was already kmapped by generic_file_write(). Doesn't hurt to do it again. The alternative is ifdefs, which are ugly. */ - kmap(pg); + kmap_thread(pg);
ret = jffs2_write_inode_range(c, f, ri, page_address(pg) + aligned_start, (pg->index << PAGE_SHIFT) + aligned_start, end - aligned_start, &writtenlen);
- kunmap(pg); + kunmap_thread(pg);
if (ret) { /* There was an error writing. */
From: Ira Weiny ira.weiny@intel.com
These kmap() calls in these drivers are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: "David S. Miller" davem@davemloft.net Cc: Jakub Kicinski kuba@kernel.org Cc: Alexey Kuznetsov kuznet@ms2.inr.ac.ru Cc: Hideaki YOSHIFUJI yoshfuji@linux-ipv6.org Cc: Trond Myklebust trond.myklebust@hammerspace.com Cc: Anna Schumaker anna.schumaker@netapp.com Cc: Boris Pismenny borisp@nvidia.com Cc: Aviad Yehezkel aviadye@nvidia.com Cc: John Fastabend john.fastabend@gmail.com Cc: Daniel Borkmann daniel@iogearbox.net Signed-off-by: Ira Weiny ira.weiny@intel.com --- net/ceph/messenger.c | 4 ++-- net/core/datagram.c | 4 ++-- net/core/sock.c | 8 ++++---- net/ipv4/ip_output.c | 4 ++-- net/sunrpc/cache.c | 4 ++-- net/sunrpc/xdr.c | 8 ++++---- net/tls/tls_device.c | 4 ++-- 7 files changed, 18 insertions(+), 18 deletions(-)
diff --git a/net/ceph/messenger.c b/net/ceph/messenger.c index d4d7a0e52491..0c49b8e333da 100644 --- a/net/ceph/messenger.c +++ b/net/ceph/messenger.c @@ -1535,10 +1535,10 @@ static u32 ceph_crc32c_page(u32 crc, struct page *page, { char *kaddr;
- kaddr = kmap(page); + kaddr = kmap_thread(page); BUG_ON(kaddr == NULL); crc = crc32c(crc, kaddr + page_offset, length); - kunmap(page); + kunmap_thread(page);
return crc; } diff --git a/net/core/datagram.c b/net/core/datagram.c index 639745d4f3b9..cbd0a343074a 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -441,14 +441,14 @@ static int __skb_datagram_iter(const struct sk_buff *skb, int offset, end = start + skb_frag_size(frag); if ((copy = end - offset) > 0) { struct page *page = skb_frag_page(frag); - u8 *vaddr = kmap(page); + u8 *vaddr = kmap_thread(page);
if (copy > len) copy = len; n = INDIRECT_CALL_1(cb, simple_copy_to_iter, vaddr + skb_frag_off(frag) + offset - start, copy, data, to); - kunmap(page); + kunmap_thread(page); offset += n; if (n != copy) goto short_copy; diff --git a/net/core/sock.c b/net/core/sock.c index 6c5c6b18eff4..9b46a75cd8c1 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2846,11 +2846,11 @@ ssize_t sock_no_sendpage(struct socket *sock, struct page *page, int offset, siz ssize_t res; struct msghdr msg = {.msg_flags = flags}; struct kvec iov; - char *kaddr = kmap(page); + char *kaddr = kmap_thread(page); iov.iov_base = kaddr + offset; iov.iov_len = size; res = kernel_sendmsg(sock, &msg, &iov, 1, size); - kunmap(page); + kunmap_thread(page); return res; } EXPORT_SYMBOL(sock_no_sendpage); @@ -2861,12 +2861,12 @@ ssize_t sock_no_sendpage_locked(struct sock *sk, struct page *page, ssize_t res; struct msghdr msg = {.msg_flags = flags}; struct kvec iov; - char *kaddr = kmap(page); + char *kaddr = kmap_thread(page);
iov.iov_base = kaddr + offset; iov.iov_len = size; res = kernel_sendmsg_locked(sk, &msg, &iov, 1, size); - kunmap(page); + kunmap_thread(page); return res; } EXPORT_SYMBOL(sock_no_sendpage_locked); diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index e6f2ada9e7d5..05304fb251a4 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -949,9 +949,9 @@ csum_page(struct page *page, int offset, int copy) { char *kaddr; __wsum csum; - kaddr = kmap(page); + kaddr = kmap_thread(page); csum = csum_partial(kaddr + offset, copy, 0); - kunmap(page); + kunmap_thread(page); return csum; }
diff --git a/net/sunrpc/cache.c b/net/sunrpc/cache.c index baef5ee43dbb..88193f2a8e6f 100644 --- a/net/sunrpc/cache.c +++ b/net/sunrpc/cache.c @@ -935,9 +935,9 @@ static ssize_t cache_downcall(struct address_space *mapping, if (!page) goto out_slow;
- kaddr = kmap(page); + kaddr = kmap_thread(page); ret = cache_do_downcall(kaddr, buf, count, cd); - kunmap(page); + kunmap_thread(page); unlock_page(page); put_page(page); return ret; diff --git a/net/sunrpc/xdr.c b/net/sunrpc/xdr.c index be11d672b5b9..00afbb48fb0a 100644 --- a/net/sunrpc/xdr.c +++ b/net/sunrpc/xdr.c @@ -1353,7 +1353,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base, base &= ~PAGE_MASK; avail_page = min_t(unsigned int, PAGE_SIZE - base, avail_here); - c = kmap(*ppages) + base; + c = kmap_thread(*ppages) + base;
while (avail_here) { avail_here -= avail_page; @@ -1429,9 +1429,9 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base, } } if (avail_here) { - kunmap(*ppages); + kunmap_thread(*ppages); ppages++; - c = kmap(*ppages); + c = kmap_thread(*ppages); }
avail_page = min(avail_here, @@ -1471,7 +1471,7 @@ xdr_xcode_array2(struct xdr_buf *buf, unsigned int base, out: kfree(elem); if (ppages) - kunmap(*ppages); + kunmap_thread(*ppages); return err; }
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c index b74e2741f74f..ead5b1c485f8 100644 --- a/net/tls/tls_device.c +++ b/net/tls/tls_device.c @@ -576,13 +576,13 @@ int tls_device_sendpage(struct sock *sk, struct page *page, goto out; }
- kaddr = kmap(page); + kaddr = kmap_thread(page); iov.iov_base = kaddr + offset; iov.iov_len = size; iov_iter_kvec(&msg_iter, WRITE, &iov, 1, size); rc = tls_push_data(sk, &msg_iter, size, flags, TLS_RECORD_TYPE_DATA); - kunmap(page); + kunmap_thread(page);
out: release_sock(sk);
From: Ira Weiny ira.weiny@intel.com
These kmap() calls in this driver are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/target/target_core_iblock.c | 4 ++-- drivers/target/target_core_rd.c | 4 ++-- drivers/target/target_core_transport.c | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/target/target_core_iblock.c b/drivers/target/target_core_iblock.c index 1c181d31f4c8..df7b1568edb3 100644 --- a/drivers/target/target_core_iblock.c +++ b/drivers/target/target_core_iblock.c @@ -415,7 +415,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct se_cmd *cmd) unsigned char *buf, *not_zero; int ret;
- buf = kmap(sg_page(sg)) + sg->offset; + buf = kmap_thread(sg_page(sg)) + sg->offset; if (!buf) return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; /* @@ -423,7 +423,7 @@ iblock_execute_zero_out(struct block_device *bdev, struct se_cmd *cmd) * incoming WRITE_SAME payload does not contain zeros. */ not_zero = memchr_inv(buf, 0x00, cmd->data_length); - kunmap(sg_page(sg)); + kunmap_thread(sg_page(sg));
if (not_zero) return TCM_LOGICAL_UNIT_COMMUNICATION_FAILURE; diff --git a/drivers/target/target_core_rd.c b/drivers/target/target_core_rd.c index 408bd975170b..dbbdd39c5bf9 100644 --- a/drivers/target/target_core_rd.c +++ b/drivers/target/target_core_rd.c @@ -159,9 +159,9 @@ static int rd_allocate_sgl_table(struct rd_dev *rd_dev, struct rd_dev_sg_table * sg_assign_page(&sg[j], pg); sg[j].length = PAGE_SIZE;
- p = kmap(pg); + p = kmap_thread(pg); memset(p, init_payload, PAGE_SIZE); - kunmap(pg); + kunmap_thread(pg); }
page_offset += sg_per_table; diff --git a/drivers/target/target_core_transport.c b/drivers/target/target_core_transport.c index ff26ab0a5f60..8d0bae5a92e5 100644 --- a/drivers/target/target_core_transport.c +++ b/drivers/target/target_core_transport.c @@ -1692,11 +1692,11 @@ int target_submit_cmd_map_sgls(struct se_cmd *se_cmd, struct se_session *se_sess unsigned char *buf = NULL;
if (sgl) - buf = kmap(sg_page(sgl)) + sgl->offset; + buf = kmap_thread(sg_page(sgl)) + sgl->offset;
if (buf) { memset(buf, 0, sgl->length); - kunmap(sg_page(sgl)); + kunmap_thread(sg_page(sgl)); } }
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: "James E.J. Bottomley" jejb@linux.ibm.com Cc: "Martin K. Petersen" martin.petersen@oracle.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/scsi/ipr.c | 8 ++++---- drivers/scsi/pmcraid.c | 8 ++++---- 2 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/drivers/scsi/ipr.c b/drivers/scsi/ipr.c index b0aa58d117cc..a5a0b8feb661 100644 --- a/drivers/scsi/ipr.c +++ b/drivers/scsi/ipr.c @@ -3923,9 +3923,9 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist, buffer += bsize_elem) { struct page *page = sg_page(sg);
- kaddr = kmap(page); + kaddr = kmap_thread(page); memcpy(kaddr, buffer, bsize_elem); - kunmap(page); + kunmap_thread(page);
sg->length = bsize_elem;
@@ -3938,9 +3938,9 @@ static int ipr_copy_ucode_buffer(struct ipr_sglist *sglist, if (len % bsize_elem) { struct page *page = sg_page(sg);
- kaddr = kmap(page); + kaddr = kmap_thread(page); memcpy(kaddr, buffer, len % bsize_elem); - kunmap(page); + kunmap_thread(page);
sg->length = len % bsize_elem; } diff --git a/drivers/scsi/pmcraid.c b/drivers/scsi/pmcraid.c index aa9ae2ae8579..4b05ba4b8a11 100644 --- a/drivers/scsi/pmcraid.c +++ b/drivers/scsi/pmcraid.c @@ -3269,13 +3269,13 @@ static int pmcraid_copy_sglist( for (i = 0; i < (len / bsize_elem); i++, sg = sg_next(sg), buffer += bsize_elem) { struct page *page = sg_page(sg);
- kaddr = kmap(page); + kaddr = kmap_thread(page); if (direction == DMA_TO_DEVICE) rc = copy_from_user(kaddr, buffer, bsize_elem); else rc = copy_to_user(buffer, kaddr, bsize_elem);
- kunmap(page); + kunmap_thread(page);
if (rc) { pmcraid_err("failed to copy user data into sg list\n"); @@ -3288,14 +3288,14 @@ static int pmcraid_copy_sglist( if (len % bsize_elem) { struct page *page = sg_page(sg);
- kaddr = kmap(page); + kaddr = kmap_thread(page);
if (direction == DMA_TO_DEVICE) rc = copy_from_user(kaddr, buffer, len % bsize_elem); else rc = copy_to_user(buffer, kaddr, len % bsize_elem);
- kunmap(page); + kunmap_thread(page);
sg->length = len % bsize_elem; }
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Ulf Hansson ulf.hansson@linaro.org Cc: Sascha Sommer saschasommer@freenet.de Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/mmc/host/mmc_spi.c | 4 ++-- drivers/mmc/host/sdricoh_cs.c | 4 ++-- 2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/mmc/host/mmc_spi.c b/drivers/mmc/host/mmc_spi.c index 18a850f37ddc..ab28e7103b8d 100644 --- a/drivers/mmc/host/mmc_spi.c +++ b/drivers/mmc/host/mmc_spi.c @@ -918,7 +918,7 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd, }
/* allow pio too; we don't allow highmem */ - kmap_addr = kmap(sg_page(sg)); + kmap_addr = kmap_thread(sg_page(sg)); if (direction == DMA_TO_DEVICE) t->tx_buf = kmap_addr + sg->offset; else @@ -950,7 +950,7 @@ mmc_spi_data_do(struct mmc_spi_host *host, struct mmc_command *cmd, /* discard mappings */ if (direction == DMA_FROM_DEVICE) flush_kernel_dcache_page(sg_page(sg)); - kunmap(sg_page(sg)); + kunmap_thread(sg_page(sg)); if (dma_dev) dma_unmap_page(dma_dev, dma_addr, PAGE_SIZE, dir);
diff --git a/drivers/mmc/host/sdricoh_cs.c b/drivers/mmc/host/sdricoh_cs.c index 76a8cd3a186f..7806bc69c4f1 100644 --- a/drivers/mmc/host/sdricoh_cs.c +++ b/drivers/mmc/host/sdricoh_cs.c @@ -312,11 +312,11 @@ static void sdricoh_request(struct mmc_host *mmc, struct mmc_request *mrq) int result; page = sg_page(data->sg);
- buf = kmap(page) + data->sg->offset + (len * i); + buf = kmap_thread(page) + data->sg->offset + (len * i); result = sdricoh_blockio(host, data->flags & MMC_DATA_READ, buf, len); - kunmap(page); + kunmap_thread(page); flush_dcache_page(page); if (result) { dev_err(dev, "sdricoh_request: cmd %i "
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Stefano Stabellini sstabellini@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/xen/gntalloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/xen/gntalloc.c b/drivers/xen/gntalloc.c index 3fa40c723e8e..3b78e055feff 100644 --- a/drivers/xen/gntalloc.c +++ b/drivers/xen/gntalloc.c @@ -184,9 +184,9 @@ static int add_grefs(struct ioctl_gntalloc_alloc_gref *op, static void __del_gref(struct gntalloc_gref *gref) { if (gref->notify.flags & UNMAP_NOTIFY_CLEAR_BYTE) { - uint8_t *tmp = kmap(gref->page); + uint8_t *tmp = kmap_thread(gref->page); tmp[gref->notify.pgoff] = 0; - kunmap(gref->page); + kunmap_thread(gref->page); } if (gref->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { notify_remote_via_evtchn(gref->notify.event);
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Ard Biesheuvel ardb@kernel.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/firmware/efi/capsule-loader.c | 6 +++--- drivers/firmware/efi/capsule.c | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/firmware/efi/capsule-loader.c b/drivers/firmware/efi/capsule-loader.c index 4dde8edd53b6..aa2e0b5940fd 100644 --- a/drivers/firmware/efi/capsule-loader.c +++ b/drivers/firmware/efi/capsule-loader.c @@ -197,7 +197,7 @@ static ssize_t efi_capsule_write(struct file *file, const char __user *buff, page = cap_info->pages[cap_info->index - 1]; }
- kbuff = kmap(page); + kbuff = kmap_thread(page); kbuff += PAGE_SIZE - cap_info->page_bytes_remain;
/* Copy capsule binary data from user space to kernel space buffer */ @@ -217,7 +217,7 @@ static ssize_t efi_capsule_write(struct file *file, const char __user *buff, }
cap_info->count += write_byte; - kunmap(page); + kunmap_thread(page);
/* Submit the full binary to efi_capsule_update() API */ if (cap_info->header.headersize > 0 && @@ -236,7 +236,7 @@ static ssize_t efi_capsule_write(struct file *file, const char __user *buff, return write_byte;
fail_unmap: - kunmap(page); + kunmap_thread(page); failed: efi_free_all_buff_pages(cap_info); return ret; diff --git a/drivers/firmware/efi/capsule.c b/drivers/firmware/efi/capsule.c index 598b7800d14e..edb7797b0e4f 100644 --- a/drivers/firmware/efi/capsule.c +++ b/drivers/firmware/efi/capsule.c @@ -244,7 +244,7 @@ int efi_capsule_update(efi_capsule_header_t *capsule, phys_addr_t *pages) for (i = 0; i < sg_count; i++) { efi_capsule_block_desc_t *sglist;
- sglist = kmap(sg_pages[i]); + sglist = kmap_thread(sg_pages[i]);
for (j = 0; j < SGLIST_PER_PAGE && count > 0; j++) { u64 sz = min_t(u64, imagesize, @@ -265,7 +265,7 @@ int efi_capsule_update(efi_capsule_header_t *capsule, phys_addr_t *pages) else sglist[j].data = page_to_phys(sg_pages[i + 1]);
- kunmap(sg_pages[i]); + kunmap_thread(sg_pages[i]); }
mutex_lock(&capsule_mutex);
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/staging/rts5208/rtsx_transport.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/staging/rts5208/rtsx_transport.c b/drivers/staging/rts5208/rtsx_transport.c index 0027bcf638ad..f747cc23951b 100644 --- a/drivers/staging/rts5208/rtsx_transport.c +++ b/drivers/staging/rts5208/rtsx_transport.c @@ -92,13 +92,13 @@ unsigned int rtsx_stor_access_xfer_buf(unsigned char *buffer, while (sglen > 0) { unsigned int plen = min(sglen, (unsigned int) PAGE_SIZE - poff); - unsigned char *ptr = kmap(page); + unsigned char *ptr = kmap_thread(page);
if (dir == TO_XFER_BUF) memcpy(ptr + poff, buffer + cnt, plen); else memcpy(buffer + cnt, ptr + poff, plen); - kunmap(page); + kunmap_thread(page);
/* Start at the beginning of the next page */ poff = 0;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Miquel Raynal miquel.raynal@bootlin.com Cc: Richard Weinberger richard@nod.at Cc: Vignesh Raghavendra vigneshr@ti.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/mtd/mtd_blkdevs.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/mtd/mtd_blkdevs.c b/drivers/mtd/mtd_blkdevs.c index 0c05f77f9b21..4b18998273fa 100644 --- a/drivers/mtd/mtd_blkdevs.c +++ b/drivers/mtd/mtd_blkdevs.c @@ -88,14 +88,14 @@ static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr, return BLK_STS_IOERR; return BLK_STS_OK; case REQ_OP_READ: - buf = kmap(bio_page(req->bio)) + bio_offset(req->bio); + buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio); for (; nsect > 0; nsect--, block++, buf += tr->blksize) { if (tr->readsect(dev, block, buf)) { - kunmap(bio_page(req->bio)); + kunmap_thread(bio_page(req->bio)); return BLK_STS_IOERR; } } - kunmap(bio_page(req->bio)); + kunmap_thread(bio_page(req->bio)); rq_flush_dcache_pages(req); return BLK_STS_OK; case REQ_OP_WRITE: @@ -103,14 +103,14 @@ static blk_status_t do_blktrans_request(struct mtd_blktrans_ops *tr, return BLK_STS_IOERR;
rq_flush_dcache_pages(req); - buf = kmap(bio_page(req->bio)) + bio_offset(req->bio); + buf = kmap_thread(bio_page(req->bio)) + bio_offset(req->bio); for (; nsect > 0; nsect--, block++, buf += tr->blksize) { if (tr->writesect(dev, block, buf)) { - kunmap(bio_page(req->bio)); + kunmap_thread(bio_page(req->bio)); return BLK_STS_IOERR; } } - kunmap(bio_page(req->bio)); + kunmap_thread(bio_page(req->bio)); return BLK_STS_OK; default: return BLK_STS_IOERR;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Coly Li colyli@suse.de (maintainer:BCACHE (BLOCK LAYER CACHE)) Cc: Kent Overstreet kent.overstreet@gmail.com (maintainer:BCACHE (BLOCK LAYER CACHE)) Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/md/bcache/request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index c7cadaafa947..a4571f6d09dd 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k) uint64_t csum = 0;
bio_for_each_segment(bv, bio, iter) { - void *d = kmap(bv.bv_page) + bv.bv_offset; + void *d = kmap_thread(bv.bv_page) + bv.bv_offset;
csum = bch_crc64_update(csum, d, bv.bv_len); - kunmap(bv.bv_page); + kunmap_thread(bv.bv_page); }
k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
On 2020/10/10 03:50, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Hi Ira,
There were a number of options considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
I copied the above information from patch 00/58 to this message. The idea behind kmap_thread() is fine to me, but as you said the new API is very easy to miss in new code (even for me). I would be supportive of option 2), introducing a flag to kmap(); then we won't forget the new thread-localized kmap method, and people won't ask why a _thread() function is called when no kthread is created.
Thanks.
Coly Li
Cc: Coly Li colyli@suse.de (maintainer:BCACHE (BLOCK LAYER CACHE)) Cc: Kent Overstreet kent.overstreet@gmail.com (maintainer:BCACHE (BLOCK LAYER CACHE)) Signed-off-by: Ira Weiny ira.weiny@intel.com
drivers/md/bcache/request.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c index c7cadaafa947..a4571f6d09dd 100644 --- a/drivers/md/bcache/request.c +++ b/drivers/md/bcache/request.c @@ -44,10 +44,10 @@ static void bio_csum(struct bio *bio, struct bkey *k) uint64_t csum = 0; bio_for_each_segment(bv, bio, iter) {
- void *d = kmap(bv.bv_page) + bv.bv_offset;
+ void *d = kmap_thread(bv.bv_page) + bv.bv_offset;
csum = bch_crc64_update(csum, d, bv.bv_len);
- kunmap(bv.bv_page);
+ kunmap_thread(bv.bv_page);
}
k->ptr[KEY_PTRS(k)] = csum & (~0ULL >> 1);
On Sat, Oct 10, 2020 at 10:20:34AM +0800, Coly Li wrote:
On 2020/10/10 03:50, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Hi Ira,
There were a number of options considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
I copied the above information from patch 00/58 to this message. The idea behind kmap_thread() is fine to me, but as you said the new API is very easy to miss in new code (even for me). I would be supportive of option 2), introducing a flag to kmap(); then we won't forget the new thread-localized kmap method, and people won't ask why a _thread() function is called when no kthread is created.
Thanks for the feedback.
I'm going to hold off making any changes until others weigh in. FWIW, I kind of like option 2 as well. But there is already kmap_atomic() so it seemed like kmap_XXXX() was more in line with the current API.
Thanks, Ira
Thanks.
Coly Li
On 2020/10/12 13:28, Ira Weiny wrote:
On Sat, Oct 10, 2020 at 10:20:34AM +0800, Coly Li wrote:
On 2020/10/10 03:50, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Hi Ira,
There were a number of options considered.
1) Attempt to change all the thread local kmap() calls to kmap_atomic()
2) Introduce a flags parameter to kmap() to indicate if the mapping should be global or not
3) Change ~20-30 call sites to 'kmap_global()' to indicate that they require a global mapping of the pages
4) Change ~209 call sites to 'kmap_thread()' to indicate that the mapping is to be used within that thread of execution only
I copied the above information from patch 00/58 to this message. The idea behind kmap_thread() is fine to me, but as you said the new API is very easy to miss in new code (even for me). I would be supportive of option 2), introducing a flag to kmap(); then we won't forget the new thread-localized kmap method, and people won't ask why a _thread() function is called when no kthread is created.
Thanks for the feedback.
I'm going to hold off making any changes until others weigh in. FWIW, I kind of like option 2 as well. But there is already kmap_atomic() so it seemed like kmap_XXXX() was more in line with the current API.
I understand it now; the idea is fine with me.
Acked-by: Coly Li colyli@suse.de
Thanks.
Coly Li
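For comparison, here is a rough sketch of what option 2 could look like at this call site. None of these names (KMAP_GLOBAL, KMAP_THREAD, a two-argument kmap()) exist in the posted series; they are only here to make the trade-off being discussed concrete.

/* Hypothetical option 2: kmap() itself takes a scope flag. */
#define KMAP_GLOBAL	0	/* mapping may be used from any thread */
#define KMAP_THREAD	1	/* mapping is private to the calling thread */

void *kmap(struct page *page, unsigned int flags);
void kunmap(struct page *page);

/* The bcache hunk above would then become: */
bio_for_each_segment(bv, bio, iter) {
	void *d = kmap(bv.bv_page, KMAP_THREAD) + bv.bv_offset;

	csum = bch_crc64_update(csum, d, bv.bv_len);
	kunmap(bv.bv_page);
}

The cost, as the cover letter notes, is that every existing kmap() call site has to change; the benefit, per Coly's point, is that new code cannot silently pick the wrong scope.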
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/misc/vmw_vmci/vmci_queue_pair.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/misc/vmw_vmci/vmci_queue_pair.c b/drivers/misc/vmw_vmci/vmci_queue_pair.c index 8531ae781195..f308abb8ad03 100644 --- a/drivers/misc/vmw_vmci/vmci_queue_pair.c +++ b/drivers/misc/vmw_vmci/vmci_queue_pair.c @@ -343,7 +343,7 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue, size_t to_copy;
if (kernel_if->host) - va = kmap(kernel_if->u.h.page[page_index]); + va = kmap_thread(kernel_if->u.h.page[page_index]); else va = kernel_if->u.g.vas[page_index + 1]; /* Skip header. */ @@ -357,12 +357,12 @@ static int qp_memcpy_to_queue_iter(struct vmci_queue *queue, if (!copy_from_iter_full((u8 *)va + page_offset, to_copy, from)) { if (kernel_if->host) - kunmap(kernel_if->u.h.page[page_index]); + kunmap_thread(kernel_if->u.h.page[page_index]); return VMCI_ERROR_INVALID_ARGS; } bytes_copied += to_copy; if (kernel_if->host) - kunmap(kernel_if->u.h.page[page_index]); + kunmap_thread(kernel_if->u.h.page[page_index]); }
return VMCI_SUCCESS; @@ -391,7 +391,7 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to, int err;
if (kernel_if->host) - va = kmap(kernel_if->u.h.page[page_index]); + va = kmap_thread(kernel_if->u.h.page[page_index]); else va = kernel_if->u.g.vas[page_index + 1]; /* Skip header. */ @@ -405,12 +405,12 @@ static int qp_memcpy_from_queue_iter(struct iov_iter *to, err = copy_to_iter((u8 *)va + page_offset, to_copy, to); if (err != to_copy) { if (kernel_if->host) - kunmap(kernel_if->u.h.page[page_index]); + kunmap_thread(kernel_if->u.h.page[page_index]); return VMCI_ERROR_INVALID_ARGS; } bytes_copied += to_copy; if (kernel_if->host) - kunmap(kernel_if->u.h.page[page_index]); + kunmap_thread(kernel_if->u.h.page[page_index]); }
return VMCI_SUCCESS;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/android/binder_alloc.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/android/binder_alloc.c b/drivers/android/binder_alloc.c index 69609696a843..5f50856caad7 100644 --- a/drivers/android/binder_alloc.c +++ b/drivers/android/binder_alloc.c @@ -1118,9 +1118,9 @@ binder_alloc_copy_user_to_buffer(struct binder_alloc *alloc, page = binder_alloc_get_page(alloc, buffer, buffer_offset, &pgoff); size = min_t(size_t, bytes, PAGE_SIZE - pgoff); - kptr = kmap(page) + pgoff; + kptr = kmap_thread(page) + pgoff; ret = copy_from_user(kptr, from, size); - kunmap(page); + kunmap_thread(page); if (ret) return bytes - size + ret; bytes -= size;
From: Ira Weiny ira.weiny@intel.com
This kmap() call is localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Eric Biederman ebiederm@xmission.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- kernel/kexec_core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index c19c0dad1ebe..272a9920c0d6 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image, if (result < 0) goto out;
- ptr = kmap(page); + ptr = kmap_thread(page); /* Start with a clear page */ clear_page(ptr); ptr += maddr & ~PAGE_MASK; @@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image, memcpy(ptr, kbuf, uchunk); else result = copy_from_user(ptr, buf, uchunk); - kunmap(page); + kunmap_thread(page); if (result) { result = -EFAULT; goto out; @@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image, goto out; } arch_kexec_post_alloc_pages(page_address(page), 1, 0); - ptr = kmap(page); + ptr = kmap_thread(page); ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, PAGE_SIZE - (maddr & ~PAGE_MASK)); @@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image, else result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page); - kunmap(page); + kunmap_thread(page); arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT;
ira.weiny@intel.com writes:
From: Ira Weiny ira.weiny@intel.com
This kmap() call is localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Acked-by: "Eric W. Biederman" ebiederm@xmission.com
Cc: Eric Biederman ebiederm@xmission.com Signed-off-by: Ira Weiny ira.weiny@intel.com
kernel/kexec_core.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c index c19c0dad1ebe..272a9920c0d6 100644 --- a/kernel/kexec_core.c +++ b/kernel/kexec_core.c @@ -815,7 +815,7 @@ static int kimage_load_normal_segment(struct kimage *image, if (result < 0) goto out;
- ptr = kmap(page);
+ ptr = kmap_thread(page);
/* Start with a clear page */ clear_page(ptr); ptr += maddr & ~PAGE_MASK;
@@ -828,7 +828,7 @@ static int kimage_load_normal_segment(struct kimage *image, memcpy(ptr, kbuf, uchunk); else result = copy_from_user(ptr, buf, uchunk);
- kunmap(page);
+ kunmap_thread(page);
if (result) { result = -EFAULT; goto out;
@@ -879,7 +879,7 @@ static int kimage_load_crash_segment(struct kimage *image, goto out; } arch_kexec_post_alloc_pages(page_address(page), 1, 0);
- ptr = kmap(page);
+ ptr = kmap_thread(page);
ptr += maddr & ~PAGE_MASK; mchunk = min_t(size_t, mbytes, PAGE_SIZE - (maddr & ~PAGE_MASK));
@@ -895,7 +895,7 @@ static int kimage_load_crash_segment(struct kimage *image, else result = copy_from_user(ptr, buf, uchunk); kexec_flush_icache_page(page);
- kunmap(page);
+ kunmap_thread(page);
arch_kexec_pre_free_pages(page_address(page), 1); if (result) { result = -EFAULT;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- mm/memory.c | 8 ++++---- mm/swapfile.c | 4 ++-- mm/userfaultfd.c | 4 ++-- 3 files changed, 8 insertions(+), 8 deletions(-)
diff --git a/mm/memory.c b/mm/memory.c index fcfc4ca36eba..75a054882d7a 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -4945,7 +4945,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, if (bytes > PAGE_SIZE-offset) bytes = PAGE_SIZE-offset;
- maddr = kmap(page); + maddr = kmap_thread(page); if (write) { copy_to_user_page(vma, page, addr, maddr + offset, buf, bytes); @@ -4954,7 +4954,7 @@ int __access_remote_vm(struct task_struct *tsk, struct mm_struct *mm, copy_from_user_page(vma, page, addr, buf, maddr + offset, bytes); } - kunmap(page); + kunmap_thread(page); put_page(page); } len -= bytes; @@ -5216,14 +5216,14 @@ long copy_huge_page_from_user(struct page *dst_page,
for (i = 0; i < pages_per_huge_page; i++) { if (allow_pagefault) - page_kaddr = kmap(dst_page + i); + page_kaddr = kmap_thread(dst_page + i); else page_kaddr = kmap_atomic(dst_page + i); rc = copy_from_user(page_kaddr, (const void __user *)(src + i * PAGE_SIZE), PAGE_SIZE); if (allow_pagefault) - kunmap(dst_page + i); + kunmap_thread(dst_page + i); else kunmap_atomic(page_kaddr);
diff --git a/mm/swapfile.c b/mm/swapfile.c index debc94155f74..e3296ff95648 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -3219,7 +3219,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) error = PTR_ERR(page); goto bad_swap_unlock_inode; } - swap_header = kmap(page); + swap_header = kmap_thread(page);
maxpages = read_swap_header(p, swap_header, inode); if (unlikely(!maxpages)) { @@ -3395,7 +3395,7 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags) filp_close(swap_file, NULL); out: if (page && !IS_ERR(page)) { - kunmap(page); + kunmap_thread(page); put_page(page); } if (name) diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 9a3d451402d7..4d38c881bb2d 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -586,11 +586,11 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, mmap_read_unlock(dst_mm); BUG_ON(!page);
- page_kaddr = kmap(page); + page_kaddr = kmap_thread(page); err = copy_from_user(page_kaddr, (const void __user *) src_addr, PAGE_SIZE); - kunmap(page); + kunmap_thread(page); if (unlikely(err)) { err = -EFAULT; goto out;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Alexander Viro viro@zeniv.linux.org.uk Cc: "Jérôme Glisse" jglisse@redhat.com Cc: Martin KaFai Lau kafai@fb.com Cc: Song Liu songliubraving@fb.com Cc: Yonghong Song yhs@fb.com Cc: Andrii Nakryiko andriin@fb.com Cc: John Fastabend john.fastabend@gmail.com Cc: KP Singh kpsingh@chromium.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- lib/iov_iter.c | 12 ++++++------ lib/test_bpf.c | 4 ++-- lib/test_hmm.c | 8 ++++---- 3 files changed, 12 insertions(+), 12 deletions(-)
diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 5e40786c8f12..1d47f957cf95 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -208,7 +208,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b } /* Too bad - revert to non-atomic kmap */
- kaddr = kmap(page); + kaddr = kmap_thread(page); from = kaddr + offset; left = copyout(buf, from, copy); copy -= left; @@ -225,7 +225,7 @@ static size_t copy_page_to_iter_iovec(struct page *page, size_t offset, size_t b from += copy; bytes -= copy; } - kunmap(page); + kunmap_thread(page);
done: if (skip == iov->iov_len) { @@ -292,7 +292,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t } /* Too bad - revert to non-atomic kmap */
- kaddr = kmap(page); + kaddr = kmap_thread(page); to = kaddr + offset; left = copyin(to, buf, copy); copy -= left; @@ -309,7 +309,7 @@ static size_t copy_page_from_iter_iovec(struct page *page, size_t offset, size_t to += copy; bytes -= copy; } - kunmap(page); + kunmap_thread(page);
done: if (skip == iov->iov_len) { @@ -1742,10 +1742,10 @@ int iov_iter_for_each_range(struct iov_iter *i, size_t bytes, return 0;
iterate_all_kinds(i, bytes, v, -EINVAL, ({ - w.iov_base = kmap(v.bv_page) + v.bv_offset; + w.iov_base = kmap_thread(v.bv_page) + v.bv_offset; w.iov_len = v.bv_len; err = f(&w, context); - kunmap(v.bv_page); + kunmap_thread(v.bv_page); err;}), ({ w = v; err = f(&w, context);}) diff --git a/lib/test_bpf.c b/lib/test_bpf.c index ca7d635bccd9..441f822f56ba 100644 --- a/lib/test_bpf.c +++ b/lib/test_bpf.c @@ -6506,11 +6506,11 @@ static void *generate_test_data(struct bpf_test *test, int sub) if (!page) goto err_kfree_skb;
- ptr = kmap(page); + ptr = kmap_thread(page); if (!ptr) goto err_free_page; memcpy(ptr, test->frag_data, MAX_DATA); - kunmap(page); + kunmap_thread(page); skb_add_rx_frag(skb, 0, page, 0, MAX_DATA, MAX_DATA); }
diff --git a/lib/test_hmm.c b/lib/test_hmm.c index e7dc3de355b7..e40d26f97f45 100644 --- a/lib/test_hmm.c +++ b/lib/test_hmm.c @@ -329,9 +329,9 @@ static int dmirror_do_read(struct dmirror *dmirror, unsigned long start, if (!page) return -ENOENT;
- tmp = kmap(page); + tmp = kmap_thread(page); memcpy(ptr, tmp, PAGE_SIZE); - kunmap(page); + kunmap_thread(page);
ptr += PAGE_SIZE; bounce->cpages++; @@ -398,9 +398,9 @@ static int dmirror_do_write(struct dmirror *dmirror, unsigned long start, if (!page || xa_pointer_tag(entry) != DPT_XA_TAG_WRITE) return -ENOENT;
- tmp = kmap(page); + tmp = kmap_thread(page); memcpy(tmp, ptr, PAGE_SIZE); - kunmap(page); + kunmap_thread(page);
ptr += PAGE_SIZE; bounce->cpages++;
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Benjamin Herrenschmidt benh@kernel.crashing.org Cc: Paul Mackerras paulus@samba.org Signed-off-by: Ira Weiny ira.weiny@intel.com --- arch/powerpc/mm/mem.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 42e25874f5a8..6ef557b8dda6 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -573,9 +573,9 @@ void flush_icache_user_page(struct vm_area_struct *vma, struct page *page, { unsigned long maddr;
- maddr = (unsigned long) kmap(page) + (addr & ~PAGE_MASK); + maddr = (unsigned long) kmap_thread(page) + (addr & ~PAGE_MASK); flush_icache_range(maddr, maddr + len); - kunmap(page); + kunmap_thread(page); }
/*
From: Ira Weiny ira.weiny@intel.com
These kmap() calls are localized to a single thread. To avoid the overhead of global PKRS updates, use the new kmap_thread() call.
Cc: Kirti Wankhede kwankhede@nvidia.com Signed-off-by: Ira Weiny ira.weiny@intel.com --- samples/vfio-mdev/mbochs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/samples/vfio-mdev/mbochs.c b/samples/vfio-mdev/mbochs.c index 3cc5e5921682..6d95422c0b46 100644 --- a/samples/vfio-mdev/mbochs.c +++ b/samples/vfio-mdev/mbochs.c @@ -479,12 +479,12 @@ static ssize_t mdev_access(struct mdev_device *mdev, char *buf, size_t count, pos -= MBOCHS_MMIO_BAR_OFFSET; poff = pos & ~PAGE_MASK; pg = __mbochs_get_page(mdev_state, pos >> PAGE_SHIFT); - map = kmap(pg); + map = kmap_thread(pg); if (is_write) memcpy(map + poff, buf, count); else memcpy(buf, map + poff, count); - kunmap(pg); + kunmap_thread(pg); put_page(pg);
} else {
From: Ira Weiny ira.weiny@intel.com
dax_direct_access() is a special case of accessing pmem via a page offset and without a struct page.
Because the dax driver is well aware of the special protections it has mapped memory with, call dev_access_[en|dis]able() directly rather than paying the unnecessary overhead of trying to get a page to kmap.
Similar to kmap(), we leverage the existing functions dax_read_[un]lock(), because they are already required to surround the use of the memory returned from dax_direct_access().
Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/dax/super.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/drivers/dax/super.c b/drivers/dax/super.c index e84070b55463..0ddb3ee73e36 100644 --- a/drivers/dax/super.c +++ b/drivers/dax/super.c @@ -30,6 +30,7 @@ static DEFINE_SPINLOCK(dax_host_lock);
int dax_read_lock(void) { + dev_access_enable(false); return srcu_read_lock(&dax_srcu); } EXPORT_SYMBOL_GPL(dax_read_lock); @@ -37,6 +38,7 @@ EXPORT_SYMBOL_GPL(dax_read_lock); void dax_read_unlock(int id) { srcu_read_unlock(&dax_srcu, id); + dev_access_disable(false); } EXPORT_SYMBOL_GPL(dax_read_unlock);
From: Ira Weiny ira.weiny@intel.com
The pmem driver uses a cached virtual address to access its memory directly. Because the nvdimm driver is well aware of the special protections it has mapped memory with, we call dev_access_[en|dis]able() around the direct pmem->virt_addr (pmem_addr) usage rather than paying the unnecessary overhead of trying to get a page to kmap.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/nvdimm/pmem.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index fab29b514372..e4dc1ae990fc 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR;
+ dev_access_enable(false); rc = read_pmem(page, page_off, pmem_addr, len); + dev_access_disable(false); flush_dcache_page(page); return rc; } @@ -180,11 +182,13 @@ static blk_status_t pmem_do_write(struct pmem_device *pmem, * after clear poison. */ flush_dcache_page(page); + dev_access_enable(false); write_pmem(pmem_addr, page, page_off, len); if (unlikely(bad_pmem)) { rc = pmem_clear_poison(pmem, pmem_off, len); write_pmem(pmem_addr, page, page_off, len); } + dev_access_disable(false);
return rc; }
On 10/9/20 12:50 PM, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The pmem driver uses a cached virtual address to access its memory directly. Because the nvdimm driver is well aware of the special protections it has mapped memory with, we call dev_access_[en|dis]able() around the direct pmem->virt_addr (pmem_addr) usage rather than paying the unnecessary overhead of trying to get a page to kmap.
Signed-off-by: Ira Weiny ira.weiny@intel.com
drivers/nvdimm/pmem.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index fab29b514372..e4dc1ae990fc 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR;
+ dev_access_enable(false);
rc = read_pmem(page, page_off, pmem_addr, len);
+ dev_access_disable(false);
Hi Ira!
The APIs should be tweaked to use a symbol (GLOBAL, PER_THREAD), instead of true/false. Try reading the above and you'll see that it sounds like it's doing the opposite of what it is ("enable_this(false)" sounds like a clumsy API design to *disable*, right?). And there is no hint about the scope.
And it *could* be so much more readable like this:
dev_access_enable(DEV_ACCESS_THIS_THREAD);
thanks,
On Fri, Oct 09, 2020 at 07:53:07PM -0700, John Hubbard wrote:
On 10/9/20 12:50 PM, ira.weiny@intel.com wrote:
From: Ira Weiny ira.weiny@intel.com
The pmem driver uses a cached virtual address to access its memory directly. Because the nvdimm driver is well aware of the special protections it has mapped memory with, we call dev_access_[en|dis]able() around the direct pmem->virt_addr (pmem_addr) usage rather than paying the unnecessary overhead of trying to get a page to kmap.
Signed-off-by: Ira Weiny ira.weiny@intel.com
drivers/nvdimm/pmem.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index fab29b514372..e4dc1ae990fc 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -148,7 +148,9 @@ static blk_status_t pmem_do_read(struct pmem_device *pmem, if (unlikely(is_bad_pmem(&pmem->bb, sector, len))) return BLK_STS_IOERR;
+ dev_access_enable(false);
rc = read_pmem(page, page_off, pmem_addr, len);
+ dev_access_disable(false);
Hi Ira!
The APIs should be tweaked to use a symbol (GLOBAL, PER_THREAD), instead of true/false. Try reading the above and you'll see that it sounds like it's doing the opposite of what it is ("enable_this(false)" sounds like a clumsy API design to *disable*, right?). And there is no hint about the scope.
Sounds reasonable.
And it *could* be so much more readable like this:
dev_access_enable(DEV_ACCESS_THIS_THREAD);
I'll think about the flag name. I'm not liking 'this thread'.
Maybe DEV_ACCESS_[GLOBAL|THREAD]
Ira
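A rough sketch of the symbolic-scope variant being discussed; these names are hypothetical and do not exist in the posted series, which passes a bool.

/* Hypothetical follow-up to the review comments above. */
enum dev_access_scope {
	DEV_ACCESS_GLOBAL,	/* enable access for all threads */
	DEV_ACCESS_THREAD,	/* enable access for the calling thread only */
};

void dev_access_enable(enum dev_access_scope scope);
void dev_access_disable(enum dev_access_scope scope);

/* pmem_do_read() would then read: */
dev_access_enable(DEV_ACCESS_THREAD);
rc = read_pmem(page, page_off, pmem_addr, len);
dev_access_disable(DEV_ACCESS_THREAD);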
From: Ira Weiny ira.weiny@intel.com
Protecting against stray writes is particularly important for PMEM because, unlike writes to anonymous memory, writes to PMEM persist across a reboot. Thus data corruption could result in permanent loss of data.
While stray writes are more serious than reads, protection is also enabled for reads. This helps to detect bugs in code which would incorrectly access device memory, and prevents more serious machine checks should those buggy accesses read from a poisoned page.
Enable stray access protection by setting the pgmap flag which requests it. There is no option presented to the user. If Zone Device Access Protection is not supported, this flag has no effect.
Signed-off-by: Ira Weiny ira.weiny@intel.com --- drivers/dax/device.c | 2 ++ drivers/nvdimm/pmem.c | 2 ++ 2 files changed, 4 insertions(+)
diff --git a/drivers/dax/device.c b/drivers/dax/device.c index 1e89513f3c59..e6fb35b4f0fb 100644 --- a/drivers/dax/device.c +++ b/drivers/dax/device.c @@ -430,6 +430,8 @@ int dev_dax_probe(struct device *dev) }
dev_dax->pgmap.type = MEMORY_DEVICE_GENERIC; + dev_dax->pgmap.flags |= PGMAP_PROT_ENABLED; + addr = devm_memremap_pages(dev, &dev_dax->pgmap); if (IS_ERR(addr)) return PTR_ERR(addr); diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c index e4dc1ae990fc..9fcd8338e23f 100644 --- a/drivers/nvdimm/pmem.c +++ b/drivers/nvdimm/pmem.c @@ -426,6 +426,8 @@ static int pmem_attach_disk(struct device *dev, return -EBUSY; }
+ pmem->pgmap.flags |= PGMAP_PROT_ENABLED; + q = blk_alloc_queue(dev_to_node(dev)); if (!q) return -ENOMEM;