This is a resend of previously submitted series: https://lore.kernel.org/patchwork/cover/1462071/ https://lore.kernel.org/patchwork/patch/1458703/ https://lore.kernel.org/lkml/YPG8SdsbQ+sxjk0w@yury-ThinkPad/T/ https://lore.kernel.org/lkml/YMVSHCY9yEocmfVD@yury-ThinkPad/T/
Most of the patches received testing and review. If I missed to add someone's review tag putting all together - my kind apologise. Please resend it here.
I believe I addessed all comments except Joe's one. In comment to patch 3, Joe Perches suggested to rename include/linux/find.h, but didn't give a new name, so I leave it as is. Since this header is not for direct inclusion, I'm OK with any reasonable name, and we can change it later.
Andrew, can you please take this series in linux-next?
Andy Shevchenko (1): tools: Rename bitmap_alloc() to bitmap_zalloc()
Yury Norov (16): bitops: protect find_first_{,zero}_bit properly bitops: move find_bit_*_le functions from le.h to find.h include: move find.h from asm_generic to linux arch: remove GENERIC_FIND_FIRST_BIT entirely lib: add find_first_and_bit() cpumask: use find_first_and_bit() all: replace find_next{,_zero}_bit with find_first{,_zero}_bit where appropriate tools: sync tools/bitmap with mother linux cpumask: replace cpumask_next_* with cpumask_first_* where appropriate include/linux: move for_each_bit() macros from bitops.h to find.h find: micro-optimize for_each_{set,clear}_bit() Replace for_each_*_bit_from() with for_each_*_bit() where appropriate mm/percpu: micro-optimize pcpu_is_populated() bitmap: unify find_bit operations lib: bitmap: add performance test for bitmap_print_to_pagebuf vsprintf: rework bitmap_list_string
MAINTAINERS | 4 +- arch/alpha/include/asm/bitops.h | 2 - arch/arc/Kconfig | 1 - arch/arc/include/asm/bitops.h | 1 - arch/arm/include/asm/bitops.h | 1 - arch/arm64/Kconfig | 1 - arch/arm64/include/asm/bitops.h | 1 - arch/csky/include/asm/bitops.h | 1 - arch/h8300/include/asm/bitops.h | 1 - arch/hexagon/include/asm/bitops.h | 1 - arch/ia64/include/asm/bitops.h | 2 - arch/m68k/include/asm/bitops.h | 2 - arch/mips/Kconfig | 1 - arch/mips/include/asm/bitops.h | 1 - arch/openrisc/include/asm/bitops.h | 1 - arch/parisc/include/asm/bitops.h | 2 - arch/powerpc/include/asm/bitops.h | 2 - arch/powerpc/include/asm/cputhreads.h | 2 +- arch/powerpc/platforms/pasemi/dma_lib.c | 4 +- arch/riscv/include/asm/bitops.h | 1 - arch/s390/Kconfig | 1 - arch/s390/include/asm/bitops.h | 1 - arch/s390/kvm/kvm-s390.c | 2 +- arch/sh/include/asm/bitops.h | 1 - arch/sparc/include/asm/bitops_32.h | 1 - arch/sparc/include/asm/bitops_64.h | 2 - arch/x86/Kconfig | 1 - arch/x86/include/asm/bitops.h | 2 - arch/x86/kernel/apic/vector.c | 4 +- arch/x86/um/Kconfig | 1 - arch/xtensa/include/asm/bitops.h | 1 - block/blk-mq.c | 2 +- drivers/block/rnbd/rnbd-clt.c | 2 +- drivers/dma/ti/edma.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 +- drivers/hwmon/ltc2992.c | 3 +- drivers/iio/adc/ad7124.c | 2 +- drivers/infiniband/hw/irdma/hw.c | 16 +- drivers/media/cec/core/cec-core.c | 2 +- drivers/media/mc/mc-devnode.c | 2 +- drivers/mmc/host/renesas_sdhi_core.c | 2 +- drivers/net/virtio_net.c | 2 +- drivers/pci/controller/dwc/pci-dra7xx.c | 2 +- drivers/scsi/lpfc/lpfc_sli.c | 10 +- drivers/soc/fsl/qbman/bman_portal.c | 2 +- drivers/soc/fsl/qbman/qman_portal.c | 2 +- drivers/soc/ti/k3-ringacc.c | 4 +- drivers/tty/n_tty.c | 2 +- drivers/virt/acrn/ioreq.c | 3 +- fs/f2fs/segment.c | 8 +- fs/ocfs2/cluster/heartbeat.c | 2 +- fs/ocfs2/dlm/dlmdomain.c | 4 +- fs/ocfs2/dlm/dlmmaster.c | 18 +- fs/ocfs2/dlm/dlmrecovery.c | 2 +- fs/ocfs2/dlm/dlmthread.c | 2 +- include/asm-generic/bitops.h | 1 - include/asm-generic/bitops/le.h | 64 --- include/linux/bitmap.h | 34 +- include/linux/bitops.h | 34 -- include/linux/cpumask.h | 46 ++- include/linux/find.h | 372 ++++++++++++++++++ kernel/time/clocksource.c | 4 +- lib/Kconfig | 3 - lib/find_bit.c | 21 + lib/find_bit_benchmark.c | 21 + lib/genalloc.c | 2 +- lib/test_bitmap.c | 37 ++ lib/vsprintf.c | 24 +- mm/percpu.c | 35 +- net/ncsi/ncsi-manage.c | 4 +- tools/include/asm-generic/bitops.h | 1 - tools/include/asm-generic/bitops/find.h | 145 ------- tools/include/linux/bitmap.h | 11 +- .../bitops => tools/include/linux}/find.h | 54 ++- tools/lib/find_bit.c | 20 + tools/perf/bench/find-bit-bench.c | 2 +- tools/perf/builtin-c2c.c | 6 +- tools/perf/builtin-record.c | 2 +- tools/perf/tests/bitmap.c | 2 +- tools/perf/tests/mem2node.c | 2 +- tools/perf/util/affinity.c | 4 +- tools/perf/util/header.c | 4 +- tools/perf/util/metricgroup.c | 2 +- tools/perf/util/mmap.c | 4 +- .../selftests/kvm/dirty_log_perf_test.c | 2 +- tools/testing/selftests/kvm/dirty_log_test.c | 4 +- .../selftests/kvm/x86_64/vmx_dirty_log_test.c | 2 +- 87 files changed, 657 insertions(+), 461 deletions(-) create mode 100644 include/linux/find.h delete mode 100644 tools/include/asm-generic/bitops/find.h rename {include/asm-generic/bitops => tools/include/linux}/find.h (83%)
find_first_bit() and find_first_zero_bit() are not protected with ifdefs as other functions in find.h. It causes build errors on some platforms if CONFIG_GENERIC_FIND_FIRST_BIT is enabled.
Signed-off-by: Yury Norov yury.norov@gmail.com Fixes: 2cc7b6a44ac2 ("lib: add fast path for find_first_*_bit() and find_last_bit()") Reported-by: kernel test robot lkp@intel.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- include/asm-generic/bitops/find.h | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h index 0d132ee2a291..835f959a25f2 100644 --- a/include/asm-generic/bitops/find.h +++ b/include/asm-generic/bitops/find.h @@ -97,6 +97,7 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size,
#ifdef CONFIG_GENERIC_FIND_FIRST_BIT
+#ifndef find_first_bit /** * find_first_bit - find the first set bit in a memory region * @addr: The address to start the search at @@ -116,7 +117,9 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
return _find_first_bit(addr, size); } +#endif
+#ifndef find_first_zero_bit /** * find_first_zero_bit - find the first cleared bit in a memory region * @addr: The address to start the search at @@ -136,6 +139,8 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size)
return _find_first_zero_bit(addr, size); } +#endif + #else /* CONFIG_GENERIC_FIND_FIRST_BIT */
#ifndef find_first_bit
It's convenient to have all find_bit declarations in one place.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- include/asm-generic/bitops/find.h | 69 +++++++++++++++++++++++++++++++ include/asm-generic/bitops/le.h | 64 ---------------------------- 2 files changed, 69 insertions(+), 64 deletions(-)
diff --git a/include/asm-generic/bitops/find.h b/include/asm-generic/bitops/find.h index 835f959a25f2..91b1b23f2b0c 100644 --- a/include/asm-generic/bitops/find.h +++ b/include/asm-generic/bitops/find.h @@ -190,4 +190,73 @@ extern unsigned long find_next_clump8(unsigned long *clump, #define find_first_clump8(clump, bits, size) \ find_next_clump8((clump), (bits), (size), 0)
+#if defined(__LITTLE_ENDIAN) + +static inline unsigned long find_next_zero_bit_le(const void *addr, + unsigned long size, unsigned long offset) +{ + return find_next_zero_bit(addr, size, offset); +} + +static inline unsigned long find_next_bit_le(const void *addr, + unsigned long size, unsigned long offset) +{ + return find_next_bit(addr, size, offset); +} + +static inline unsigned long find_first_zero_bit_le(const void *addr, + unsigned long size) +{ + return find_first_zero_bit(addr, size); +} + +#elif defined(__BIG_ENDIAN) + +#ifndef find_next_zero_bit_le +static inline +unsigned long find_next_zero_bit_le(const void *addr, unsigned + long size, unsigned long offset) +{ + if (small_const_nbits(size)) { + unsigned long val = *(const unsigned long *)addr; + + if (unlikely(offset >= size)) + return size; + + val = swab(val) | ~GENMASK(size - 1, offset); + return val == ~0UL ? size : ffz(val); + } + + return _find_next_bit(addr, NULL, size, offset, ~0UL, 1); +} +#endif + +#ifndef find_next_bit_le +static inline +unsigned long find_next_bit_le(const void *addr, unsigned + long size, unsigned long offset) +{ + if (small_const_nbits(size)) { + unsigned long val = *(const unsigned long *)addr; + + if (unlikely(offset >= size)) + return size; + + val = swab(val) & GENMASK(size - 1, offset); + return val ? __ffs(val) : size; + } + + return _find_next_bit(addr, NULL, size, offset, 0UL, 1); +} +#endif + +#ifndef find_first_zero_bit_le +#define find_first_zero_bit_le(addr, size) \ + find_next_zero_bit_le((addr), (size), 0) +#endif + +#else +#error "Please fix <asm/byteorder.h>" +#endif + #endif /*_ASM_GENERIC_BITOPS_FIND_H_ */ diff --git a/include/asm-generic/bitops/le.h b/include/asm-generic/bitops/le.h index 5a28629cbf4d..d51beff60375 100644 --- a/include/asm-generic/bitops/le.h +++ b/include/asm-generic/bitops/le.h @@ -2,83 +2,19 @@ #ifndef _ASM_GENERIC_BITOPS_LE_H_ #define _ASM_GENERIC_BITOPS_LE_H_
-#include <asm-generic/bitops/find.h> #include <asm/types.h> #include <asm/byteorder.h> -#include <linux/swab.h>
#if defined(__LITTLE_ENDIAN)
#define BITOP_LE_SWIZZLE 0
-static inline unsigned long find_next_zero_bit_le(const void *addr, - unsigned long size, unsigned long offset) -{ - return find_next_zero_bit(addr, size, offset); -} - -static inline unsigned long find_next_bit_le(const void *addr, - unsigned long size, unsigned long offset) -{ - return find_next_bit(addr, size, offset); -} - -static inline unsigned long find_first_zero_bit_le(const void *addr, - unsigned long size) -{ - return find_first_zero_bit(addr, size); -} - #elif defined(__BIG_ENDIAN)
#define BITOP_LE_SWIZZLE ((BITS_PER_LONG-1) & ~0x7)
-#ifndef find_next_zero_bit_le -static inline -unsigned long find_next_zero_bit_le(const void *addr, unsigned - long size, unsigned long offset) -{ - if (small_const_nbits(size)) { - unsigned long val = *(const unsigned long *)addr; - - if (unlikely(offset >= size)) - return size; - - val = swab(val) | ~GENMASK(size - 1, offset); - return val == ~0UL ? size : ffz(val); - } - - return _find_next_bit(addr, NULL, size, offset, ~0UL, 1); -} -#endif - -#ifndef find_next_bit_le -static inline -unsigned long find_next_bit_le(const void *addr, unsigned - long size, unsigned long offset) -{ - if (small_const_nbits(size)) { - unsigned long val = *(const unsigned long *)addr; - - if (unlikely(offset >= size)) - return size; - - val = swab(val) & GENMASK(size - 1, offset); - return val ? __ffs(val) : size; - } - - return _find_next_bit(addr, NULL, size, offset, 0UL, 1); -} #endif
-#ifndef find_first_zero_bit_le -#define find_first_zero_bit_le(addr, size) \ - find_next_zero_bit_le((addr), (size), 0) -#endif - -#else -#error "Please fix <asm/byteorder.h>" -#endif
static inline int test_bit_le(int nr, const void *addr) {
find_bit API and bitmap API are closely related, but inclusion paths are different - include/asm-generic and include/linux, correspondingly. In the past it made a lot of troubles due to circular dependencies and/or undefined symbols. Fix this by moving find.h under include/linux.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- MAINTAINERS | 2 +- arch/alpha/include/asm/bitops.h | 2 -- arch/arc/include/asm/bitops.h | 1 - arch/arm/include/asm/bitops.h | 1 - arch/arm64/include/asm/bitops.h | 1 - arch/csky/include/asm/bitops.h | 1 - arch/h8300/include/asm/bitops.h | 1 - arch/hexagon/include/asm/bitops.h | 1 - arch/ia64/include/asm/bitops.h | 2 -- arch/m68k/include/asm/bitops.h | 2 -- arch/mips/include/asm/bitops.h | 1 - arch/openrisc/include/asm/bitops.h | 1 - arch/parisc/include/asm/bitops.h | 2 -- arch/powerpc/include/asm/bitops.h | 2 -- arch/riscv/include/asm/bitops.h | 1 - arch/s390/include/asm/bitops.h | 1 - arch/sh/include/asm/bitops.h | 1 - arch/sparc/include/asm/bitops_32.h | 1 - arch/sparc/include/asm/bitops_64.h | 2 -- arch/x86/include/asm/bitops.h | 2 -- arch/xtensa/include/asm/bitops.h | 1 - include/asm-generic/bitops.h | 1 - include/linux/bitmap.h | 1 + include/{asm-generic/bitops => linux}/find.h | 12 +++++++++--- 24 files changed, 11 insertions(+), 32 deletions(-) rename include/{asm-generic/bitops => linux}/find.h (97%)
diff --git a/MAINTAINERS b/MAINTAINERS index b63403793c81..9b62293f7b72 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3271,8 +3271,8 @@ M: Yury Norov yury.norov@gmail.com R: Andy Shevchenko andriy.shevchenko@linux.intel.com R: Rasmus Villemoes linux@rasmusvillemoes.dk S: Maintained -F: include/asm-generic/bitops/find.h F: include/linux/bitmap.h +F: include/linux/find.h F: lib/bitmap.c F: lib/find_bit.c F: lib/find_bit_benchmark.c diff --git a/arch/alpha/include/asm/bitops.h b/arch/alpha/include/asm/bitops.h index 5adca78830b5..e1d8483a45f2 100644 --- a/arch/alpha/include/asm/bitops.h +++ b/arch/alpha/include/asm/bitops.h @@ -430,8 +430,6 @@ static inline unsigned int __arch_hweight8(unsigned int w)
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h> - #ifdef __KERNEL__
/* diff --git a/arch/arc/include/asm/bitops.h b/arch/arc/include/asm/bitops.h index a7daaf64ae34..bdb7e190a294 100644 --- a/arch/arc/include/asm/bitops.h +++ b/arch/arc/include/asm/bitops.h @@ -189,7 +189,6 @@ static inline __attribute__ ((const)) unsigned long __ffs(unsigned long x) #include <asm-generic/bitops/atomic.h> #include <asm-generic/bitops/non-atomic.h>
-#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/le.h> #include <asm-generic/bitops/ext2-atomic-setbit.h>
diff --git a/arch/arm/include/asm/bitops.h b/arch/arm/include/asm/bitops.h index c92e42a5c8f7..8e94fe7ab5eb 100644 --- a/arch/arm/include/asm/bitops.h +++ b/arch/arm/include/asm/bitops.h @@ -264,7 +264,6 @@ static inline int find_next_bit_le(const void *p, int size, int offset)
#endif
-#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/le.h>
/* diff --git a/arch/arm64/include/asm/bitops.h b/arch/arm64/include/asm/bitops.h index 81a3e519b07d..9b3c787132d2 100644 --- a/arch/arm64/include/asm/bitops.h +++ b/arch/arm64/include/asm/bitops.h @@ -18,7 +18,6 @@
#include <asm-generic/bitops/ffz.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/hweight.h> diff --git a/arch/csky/include/asm/bitops.h b/arch/csky/include/asm/bitops.h index 91818787d860..9604f47bd850 100644 --- a/arch/csky/include/asm/bitops.h +++ b/arch/csky/include/asm/bitops.h @@ -59,7 +59,6 @@ static __always_inline unsigned long __fls(unsigned long x)
#include <asm-generic/bitops/ffz.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H #error only <linux/bitops.h> can be included directly diff --git a/arch/h8300/include/asm/bitops.h b/arch/h8300/include/asm/bitops.h index c867a80cab5b..4489e3d6edd3 100644 --- a/arch/h8300/include/asm/bitops.h +++ b/arch/h8300/include/asm/bitops.h @@ -168,7 +168,6 @@ static inline unsigned long __ffs(unsigned long word) return result; }
-#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/hweight.h> #include <asm-generic/bitops/lock.h> diff --git a/arch/hexagon/include/asm/bitops.h b/arch/hexagon/include/asm/bitops.h index 71429f756af0..75d6ba3643b8 100644 --- a/arch/hexagon/include/asm/bitops.h +++ b/arch/hexagon/include/asm/bitops.h @@ -271,7 +271,6 @@ static inline unsigned long __fls(unsigned long word) }
#include <asm-generic/bitops/lock.h> -#include <asm-generic/bitops/find.h>
#include <asm-generic/bitops/fls64.h> #include <asm-generic/bitops/sched.h> diff --git a/arch/ia64/include/asm/bitops.h b/arch/ia64/include/asm/bitops.h index 2f24ee6459d2..577be93c0818 100644 --- a/arch/ia64/include/asm/bitops.h +++ b/arch/ia64/include/asm/bitops.h @@ -441,8 +441,6 @@ static __inline__ unsigned long __arch_hweight64(unsigned long x)
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h> - #ifdef __KERNEL__
#include <asm-generic/bitops/le.h> diff --git a/arch/m68k/include/asm/bitops.h b/arch/m68k/include/asm/bitops.h index 7b414099e5fc..f551a2160294 100644 --- a/arch/m68k/include/asm/bitops.h +++ b/arch/m68k/include/asm/bitops.h @@ -529,6 +529,4 @@ static inline int __fls(int x) #include <asm-generic/bitops/le.h> #endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h> - #endif /* _M68K_BITOPS_H */ diff --git a/arch/mips/include/asm/bitops.h b/arch/mips/include/asm/bitops.h index dc2a6234dd3c..c09d57f907f7 100644 --- a/arch/mips/include/asm/bitops.h +++ b/arch/mips/include/asm/bitops.h @@ -446,7 +446,6 @@ static inline int ffs(int word) }
#include <asm-generic/bitops/ffz.h> -#include <asm-generic/bitops/find.h>
#ifdef __KERNEL__
diff --git a/arch/openrisc/include/asm/bitops.h b/arch/openrisc/include/asm/bitops.h index 7f1ca35213d8..d773ed938acb 100644 --- a/arch/openrisc/include/asm/bitops.h +++ b/arch/openrisc/include/asm/bitops.h @@ -30,7 +30,6 @@ #include <asm/bitops/fls.h> #include <asm/bitops/__fls.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H #error only <linux/bitops.h> can be included directly diff --git a/arch/parisc/include/asm/bitops.h b/arch/parisc/include/asm/bitops.h index aa4e883431c1..c7a9997ac9cb 100644 --- a/arch/parisc/include/asm/bitops.h +++ b/arch/parisc/include/asm/bitops.h @@ -208,8 +208,6 @@ static __inline__ int fls(unsigned int x)
#endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h> - #ifdef __KERNEL__
#include <asm-generic/bitops/le.h> diff --git a/arch/powerpc/include/asm/bitops.h b/arch/powerpc/include/asm/bitops.h index 299ab33505a6..ce2c1fa1a45d 100644 --- a/arch/powerpc/include/asm/bitops.h +++ b/arch/powerpc/include/asm/bitops.h @@ -255,8 +255,6 @@ unsigned long __arch_hweight64(__u64 w); #include <asm-generic/bitops/hweight.h> #endif
-#include <asm-generic/bitops/find.h> - /* wrappers that deal with KASAN instrumentation */ #include <asm-generic/bitops/instrumented-atomic.h> #include <asm-generic/bitops/instrumented-lock.h> diff --git a/arch/riscv/include/asm/bitops.h b/arch/riscv/include/asm/bitops.h index 396a3303c537..3540b690944b 100644 --- a/arch/riscv/include/asm/bitops.h +++ b/arch/riscv/include/asm/bitops.h @@ -20,7 +20,6 @@ #include <asm-generic/bitops/fls.h> #include <asm-generic/bitops/__fls.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/ffs.h>
diff --git a/arch/s390/include/asm/bitops.h b/arch/s390/include/asm/bitops.h index fd149480b6e2..f7cefdde7c24 100644 --- a/arch/s390/include/asm/bitops.h +++ b/arch/s390/include/asm/bitops.h @@ -387,7 +387,6 @@ static inline int fls(unsigned int word) #endif /* CONFIG_HAVE_MARCH_Z9_109_FEATURES */
#include <asm-generic/bitops/ffz.h> -#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/hweight.h> #include <asm-generic/bitops/sched.h> #include <asm-generic/bitops/le.h> diff --git a/arch/sh/include/asm/bitops.h b/arch/sh/include/asm/bitops.h index 3b6c7b5b7ec9..10ceb0d6b5a9 100644 --- a/arch/sh/include/asm/bitops.h +++ b/arch/sh/include/asm/bitops.h @@ -68,6 +68,5 @@ static inline unsigned long __ffs(unsigned long word) #include <asm-generic/bitops/fls64.h>
#include <asm-generic/bitops/le.h> -#include <asm-generic/bitops/find.h>
#endif /* __ASM_SH_BITOPS_H */ diff --git a/arch/sparc/include/asm/bitops_32.h b/arch/sparc/include/asm/bitops_32.h index 0ceff3b915a8..889afa9f990f 100644 --- a/arch/sparc/include/asm/bitops_32.h +++ b/arch/sparc/include/asm/bitops_32.h @@ -100,7 +100,6 @@ static inline void change_bit(unsigned long nr, volatile unsigned long *addr) #include <asm-generic/bitops/fls64.h> #include <asm-generic/bitops/hweight.h> #include <asm-generic/bitops/lock.h> -#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/le.h> #include <asm-generic/bitops/ext2-atomic.h>
diff --git a/arch/sparc/include/asm/bitops_64.h b/arch/sparc/include/asm/bitops_64.h index ca7ea5913494..005a8ae858f1 100644 --- a/arch/sparc/include/asm/bitops_64.h +++ b/arch/sparc/include/asm/bitops_64.h @@ -52,8 +52,6 @@ unsigned int __arch_hweight8(unsigned int w); #include <asm-generic/bitops/lock.h> #endif /* __KERNEL__ */
-#include <asm-generic/bitops/find.h> - #ifdef __KERNEL__
#include <asm-generic/bitops/le.h> diff --git a/arch/x86/include/asm/bitops.h b/arch/x86/include/asm/bitops.h index 0367efdc5b7a..a288ecd230ab 100644 --- a/arch/x86/include/asm/bitops.h +++ b/arch/x86/include/asm/bitops.h @@ -380,8 +380,6 @@ static __always_inline int fls64(__u64 x) #include <asm-generic/bitops/fls64.h> #endif
-#include <asm-generic/bitops/find.h> - #include <asm-generic/bitops/sched.h>
#include <asm/arch_hweight.h> diff --git a/arch/xtensa/include/asm/bitops.h b/arch/xtensa/include/asm/bitops.h index 3f71d364ba90..cd225896c40f 100644 --- a/arch/xtensa/include/asm/bitops.h +++ b/arch/xtensa/include/asm/bitops.h @@ -205,7 +205,6 @@ BIT_OPS(change, "xor", ) #undef BIT_OP #undef TEST_AND_BIT_OP
-#include <asm-generic/bitops/find.h> #include <asm-generic/bitops/le.h>
#include <asm-generic/bitops/ext2-atomic-setbit.h> diff --git a/include/asm-generic/bitops.h b/include/asm-generic/bitops.h index df9b5bc3d282..a47b8a71d6fe 100644 --- a/include/asm-generic/bitops.h +++ b/include/asm-generic/bitops.h @@ -20,7 +20,6 @@ #include <asm-generic/bitops/fls.h> #include <asm-generic/bitops/__fls.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h>
#ifndef _LINUX_BITOPS_H #error only <linux/bitops.h> can be included directly diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index a36cfcec4e77..3f7c6731b203 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -6,6 +6,7 @@
#include <linux/align.h> #include <linux/bitops.h> +#include <linux/find.h> #include <linux/limits.h> #include <linux/string.h> #include <linux/types.h> diff --git a/include/asm-generic/bitops/find.h b/include/linux/find.h similarity index 97% rename from include/asm-generic/bitops/find.h rename to include/linux/find.h index 91b1b23f2b0c..c5410c243e04 100644 --- a/include/asm-generic/bitops/find.h +++ b/include/linux/find.h @@ -1,6 +1,12 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _ASM_GENERIC_BITOPS_FIND_H_ -#define _ASM_GENERIC_BITOPS_FIND_H_ +#ifndef __LINUX_FIND_H_ +#define __LINUX_FIND_H_ + +#ifndef __LINUX_BITMAP_H +#error only <linux/bitmap.h> can be included directly +#endif + +#include <linux/bitops.h>
extern unsigned long _find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, @@ -259,4 +265,4 @@ unsigned long find_next_bit_le(const void *addr, unsigned #error "Please fix <asm/byteorder.h>" #endif
-#endif /*_ASM_GENERIC_BITOPS_FIND_H_ */ +#endif /*__LINUX_FIND_H_ */
In 5.12 cycle we enabled GENERIC_FIND_FIRST_BIT config option for ARM64 and MIPS. It increased performance and shrunk .text size; and so far I didn't receive any negative feedback on the change.
https://lore.kernel.org/linux-arch/20210225135700.1381396-1-yury.norov@gmail...
Now I think it's a good time to switch all architectures to use find_{first,last}_bit() unconditionally, and so remove corresponding config option.
The patch does't introduce functioal changes for arc, arm, arm64, mips, m68k, s390 and x86, for other architectures I expect improvement both in performance and .text size.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Alexander Lobakin alobakin@pm.me (mips) Reviewed-by: Alexander Lobakin alobakin@pm.me (mips) Reviewed-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Acked-by: Will Deacon will@kernel.org Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- arch/arc/Kconfig | 1 - arch/arm64/Kconfig | 1 - arch/mips/Kconfig | 1 - arch/s390/Kconfig | 1 - arch/x86/Kconfig | 1 - arch/x86/um/Kconfig | 1 - include/linux/find.h | 13 ------------- lib/Kconfig | 3 --- 8 files changed, 22 deletions(-)
diff --git a/arch/arc/Kconfig b/arch/arc/Kconfig index 53d143fc42fe..f35994edc165 100644 --- a/arch/arc/Kconfig +++ b/arch/arc/Kconfig @@ -20,7 +20,6 @@ config ARC select COMMON_CLK select DMA_DIRECT_REMAP select GENERIC_ATOMIC64 if !ISA_ARCV2 || !(ARC_HAS_LL64 && ARC_HAS_LLSC) - select GENERIC_FIND_FIRST_BIT # for now, we don't need GENERIC_IRQ_PROBE, CONFIG_GENERIC_IRQ_CHIP select GENERIC_IRQ_SHOW select GENERIC_PCI_IOMAP diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index fdcd54d39c1e..7b2269ad3e12 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -119,7 +119,6 @@ config ARM64 select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP - select GENERIC_FIND_FIRST_BIT select GENERIC_IDLE_POLL_SETUP select GENERIC_IRQ_IPI select GENERIC_IRQ_PROBE diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig index da874375fa4b..d7758c8f7e37 100644 --- a/arch/mips/Kconfig +++ b/arch/mips/Kconfig @@ -30,7 +30,6 @@ config MIPS select GENERIC_ATOMIC64 if !64BIT select GENERIC_CMOS_UPDATE select GENERIC_CPU_AUTOPROBE - select GENERIC_FIND_FIRST_BIT select GENERIC_GETTIMEOFDAY select GENERIC_IOMAP select GENERIC_IRQ_PROBE diff --git a/arch/s390/Kconfig b/arch/s390/Kconfig index 92c0a1b4c528..7a3bbaf19640 100644 --- a/arch/s390/Kconfig +++ b/arch/s390/Kconfig @@ -126,7 +126,6 @@ config S390 select GENERIC_CPU_AUTOPROBE select GENERIC_CPU_VULNERABILITIES select GENERIC_ENTRY - select GENERIC_FIND_FIRST_BIT select GENERIC_GETTIMEOFDAY select GENERIC_PTDUMP select GENERIC_SMP_IDLE_THREAD diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 45962aaf2b2c..7edce30dc214 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -134,7 +134,6 @@ config X86 select GENERIC_CPU_VULNERABILITIES select GENERIC_EARLY_IOREMAP select GENERIC_ENTRY - select GENERIC_FIND_FIRST_BIT select GENERIC_IOMAP select GENERIC_IRQ_EFFECTIVE_AFF_MASK if SMP select GENERIC_IRQ_MATRIX_ALLOCATOR if X86_LOCAL_APIC diff --git a/arch/x86/um/Kconfig b/arch/x86/um/Kconfig index 95d26a69088b..40d6a06e41c8 100644 --- a/arch/x86/um/Kconfig +++ b/arch/x86/um/Kconfig @@ -8,7 +8,6 @@ endmenu
config UML_X86 def_bool y - select GENERIC_FIND_FIRST_BIT
config 64BIT bool "64-bit kernel" if "$(SUBARCH)" = "x86" diff --git a/include/linux/find.h b/include/linux/find.h index c5410c243e04..ea57f7f38c49 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -101,8 +101,6 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, } #endif
-#ifdef CONFIG_GENERIC_FIND_FIRST_BIT - #ifndef find_first_bit /** * find_first_bit - find the first set bit in a memory region @@ -147,17 +145,6 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size) } #endif
-#else /* CONFIG_GENERIC_FIND_FIRST_BIT */ - -#ifndef find_first_bit -#define find_first_bit(addr, size) find_next_bit((addr), (size), 0) -#endif -#ifndef find_first_zero_bit -#define find_first_zero_bit(addr, size) find_next_zero_bit((addr), (size), 0) -#endif - -#endif /* CONFIG_GENERIC_FIND_FIRST_BIT */ - #ifndef find_last_bit /** * find_last_bit - find the last set bit in a memory region diff --git a/lib/Kconfig b/lib/Kconfig index 5c9c0687f76d..f8874240835b 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -59,9 +59,6 @@ config GENERIC_STRNLEN_USER config GENERIC_NET_UTILS bool
-config GENERIC_FIND_FIRST_BIT - bool - source "lib/math/Kconfig"
config NO_GENERIC_PCI_IOPORT_MAP
Currently find_first_and_bit() is an alias to find_next_and_bit(). However, it is widely used in cpumask, so it worth to optimize it. This patch adds its own implementation for find_first_and_bit().
On x86_64 find_bit_benchmark says:
Before (#define find_first_and_bit(...) find_next_and_bit(..., 0): Start testing find_bit() with random-filled bitmap [ 140.291468] find_first_and_bit: 46890919 ns, 32671 iterations Start testing find_bit() with sparse bitmap [ 140.295028] find_first_and_bit: 7103 ns, 1 iterations
After: Start testing find_bit() with random-filled bitmap [ 162.574907] find_first_and_bit: 25045813 ns, 32846 iterations Start testing find_bit() with sparse bitmap [ 162.578458] find_first_and_bit: 4900 ns, 1 iterations
(Thanks to Alexey Klimov for thorough testing.)
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com Tested-by: Alexey Klimov aklimov@redhat.com --- include/linux/find.h | 27 +++++++++++++++++++++++++++ lib/find_bit.c | 21 +++++++++++++++++++++ lib/find_bit_benchmark.c | 21 +++++++++++++++++++++ 3 files changed, 69 insertions(+)
diff --git a/include/linux/find.h b/include/linux/find.h index ea57f7f38c49..6048f8c97418 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -12,6 +12,8 @@ extern unsigned long _find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, unsigned long start, unsigned long invert, unsigned long le); extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size); +extern unsigned long _find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long size); extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size); extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
@@ -123,6 +125,31 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size) } #endif
+#ifndef find_first_and_bit +/** + * find_first_and_bit - find the first set bit in both memory regions + * @addr1: The first address to base the search on + * @addr2: The second address to base the search on + * @size: The bitmap size in bits + * + * Returns the bit number for the next set bit + * If no bits are set, returns @size. + */ +static inline +unsigned long find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, + unsigned long size) +{ + if (small_const_nbits(size)) { + unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0); + + return val ? __ffs(val) : size; + } + + return _find_first_and_bit(addr1, addr2, size); +} +#endif + #ifndef find_first_zero_bit /** * find_first_zero_bit - find the first cleared bit in a memory region diff --git a/lib/find_bit.c b/lib/find_bit.c index 0f8e2e369b1d..1b8e4b2a9cba 100644 --- a/lib/find_bit.c +++ b/lib/find_bit.c @@ -89,6 +89,27 @@ unsigned long _find_first_bit(const unsigned long *addr, unsigned long size) EXPORT_SYMBOL(_find_first_bit); #endif
+#ifndef find_first_and_bit +/* + * Find the first set bit in two memory regions. + */ +unsigned long _find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, + unsigned long size) +{ + unsigned long idx, val; + + for (idx = 0; idx * BITS_PER_LONG < size; idx++) { + val = addr1[idx] & addr2[idx]; + if (val) + return min(idx * BITS_PER_LONG + __ffs(val), size); + } + + return size; +} +EXPORT_SYMBOL(_find_first_and_bit); +#endif + #ifndef find_first_zero_bit /* * Find the first cleared bit in a memory region. diff --git a/lib/find_bit_benchmark.c b/lib/find_bit_benchmark.c index 5637c5711db9..db904b57d4b8 100644 --- a/lib/find_bit_benchmark.c +++ b/lib/find_bit_benchmark.c @@ -49,6 +49,25 @@ static int __init test_find_first_bit(void *bitmap, unsigned long len) return 0; }
+static int __init test_find_first_and_bit(void *bitmap, const void *bitmap2, unsigned long len) +{ + static DECLARE_BITMAP(cp, BITMAP_LEN) __initdata; + unsigned long i, cnt; + ktime_t time; + + bitmap_copy(cp, bitmap, BITMAP_LEN); + + time = ktime_get(); + for (cnt = i = 0; i < len; cnt++) { + i = find_first_and_bit(cp, bitmap2, len); + __clear_bit(i, cp); + } + time = ktime_get() - time; + pr_err("find_first_and_bit: %18llu ns, %6ld iterations\n", time, cnt); + + return 0; +} + static int __init test_find_next_bit(const void *bitmap, unsigned long len) { unsigned long i, cnt; @@ -129,6 +148,7 @@ static int __init find_bit_test(void) * traverse only part of bitmap to avoid soft lockup. */ test_find_first_bit(bitmap, BITMAP_LEN / 10); + test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN / 2); test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
pr_err("\nStart testing find_bit() with sparse bitmap\n"); @@ -145,6 +165,7 @@ static int __init find_bit_test(void) test_find_next_zero_bit(bitmap, BITMAP_LEN); test_find_last_bit(bitmap, BITMAP_LEN); test_find_first_bit(bitmap, BITMAP_LEN); + test_find_first_and_bit(bitmap, bitmap2, BITMAP_LEN); test_find_next_and_bit(bitmap, bitmap2, BITMAP_LEN);
/*
Now we have an efficient implementation for find_first_and_bit(), so switch cpumask to use it where appropriate.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- include/linux/cpumask.h | 30 ++++++++++++++++++++---------- 1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index f3689a52bfd0..4a03f6636e6c 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -123,6 +123,12 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp) return 0; }
+static inline unsigned int cpumask_first_and(const struct cpumask *srcp1, + const struct cpumask *srcp2) +{ + return 0; +} + static inline unsigned int cpumask_last(const struct cpumask *srcp) { return 0; @@ -167,7 +173,7 @@ static inline unsigned int cpumask_local_spread(unsigned int i, int node)
static inline int cpumask_any_and_distribute(const struct cpumask *src1p, const struct cpumask *src2p) { - return cpumask_next_and(-1, src1p, src2p); + return cpumask_first_and(src1p, src2p); }
static inline int cpumask_any_distribute(const struct cpumask *srcp) @@ -195,6 +201,19 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp) return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits); }
+/** + * cpumask_first_and - return the first cpu from *srcp1 & *srcp2 + * @src1p: the first input + * @src2p: the second input + * + * Returns >= nr_cpu_ids if no cpus set in both. See also cpumask_next_and(). + */ +static inline +unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask *srcp2) +{ + return find_first_and_bit(cpumask_bits(srcp1), cpumask_bits(srcp2), nr_cpumask_bits); +} + /** * cpumask_last - get the last CPU in a cpumask * @srcp: - the cpumask pointer @@ -585,15 +604,6 @@ static inline void cpumask_copy(struct cpumask *dstp, */ #define cpumask_any(srcp) cpumask_first(srcp)
-/** - * cpumask_first_and - return the first cpu from *srcp1 & *srcp2 - * @src1p: the first input - * @src2p: the second input - * - * Returns >= nr_cpu_ids if no cpus set in both. See also cpumask_next_and(). - */ -#define cpumask_first_and(src1p, src2p) cpumask_next_and(-1, (src1p), (src2p)) - /** * cpumask_any_and - pick a "random" cpu from *mask1 & *mask2 * @mask1: the first input cpumask
find_first{,_zero}_bit is a more effective analogue of 'next' version if start == 0. This patch replaces 'next' with 'first' where things look trivial.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- arch/powerpc/platforms/pasemi/dma_lib.c | 4 ++-- arch/s390/kvm/kvm-s390.c | 2 +- drivers/block/rnbd/rnbd-clt.c | 2 +- drivers/dma/ti/edma.c | 2 +- drivers/iio/adc/ad7124.c | 2 +- drivers/infiniband/hw/irdma/hw.c | 16 ++++++++-------- drivers/media/cec/core/cec-core.c | 2 +- drivers/media/mc/mc-devnode.c | 2 +- drivers/pci/controller/dwc/pci-dra7xx.c | 2 +- drivers/scsi/lpfc/lpfc_sli.c | 10 +++++----- drivers/soc/ti/k3-ringacc.c | 4 ++-- drivers/tty/n_tty.c | 2 +- drivers/virt/acrn/ioreq.c | 3 +-- fs/f2fs/segment.c | 8 ++++---- fs/ocfs2/cluster/heartbeat.c | 2 +- fs/ocfs2/dlm/dlmdomain.c | 4 ++-- fs/ocfs2/dlm/dlmmaster.c | 18 +++++++++--------- fs/ocfs2/dlm/dlmrecovery.c | 2 +- fs/ocfs2/dlm/dlmthread.c | 2 +- lib/genalloc.c | 2 +- net/ncsi/ncsi-manage.c | 4 ++-- 21 files changed, 47 insertions(+), 48 deletions(-)
diff --git a/arch/powerpc/platforms/pasemi/dma_lib.c b/arch/powerpc/platforms/pasemi/dma_lib.c index 270fa3c0d372..26427311fc72 100644 --- a/arch/powerpc/platforms/pasemi/dma_lib.c +++ b/arch/powerpc/platforms/pasemi/dma_lib.c @@ -375,7 +375,7 @@ int pasemi_dma_alloc_flag(void) int bit;
retry: - bit = find_next_bit(flags_free, MAX_FLAGS, 0); + bit = find_first_bit(flags_free, MAX_FLAGS); if (bit >= MAX_FLAGS) return -ENOSPC; if (!test_and_clear_bit(bit, flags_free)) @@ -440,7 +440,7 @@ int pasemi_dma_alloc_fun(void) int bit;
retry: - bit = find_next_bit(fun_free, MAX_FLAGS, 0); + bit = find_first_bit(fun_free, MAX_FLAGS); if (bit >= MAX_FLAGS) return -ENOSPC; if (!test_and_clear_bit(bit, fun_free)) diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 02574d7b3612..a5443a0accb3 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -2023,7 +2023,7 @@ static unsigned long kvm_s390_next_dirty_cmma(struct kvm_memslots *slots, while ((slotidx > 0) && (ofs >= ms->npages)) { slotidx--; ms = slots->memslots + slotidx; - ofs = find_next_bit(kvm_second_dirty_bitmap(ms), ms->npages, 0); + ofs = find_first_bit(kvm_second_dirty_bitmap(ms), ms->npages); } return ms->base_gfn + ofs; } diff --git a/drivers/block/rnbd/rnbd-clt.c b/drivers/block/rnbd/rnbd-clt.c index bd4a41afbbfc..5864c9b46cb9 100644 --- a/drivers/block/rnbd/rnbd-clt.c +++ b/drivers/block/rnbd/rnbd-clt.c @@ -196,7 +196,7 @@ rnbd_get_cpu_qlist(struct rnbd_clt_session *sess, int cpu) return per_cpu_ptr(sess->cpu_queues, bit); } else if (cpu != 0) { /* Search from 0 to cpu */ - bit = find_next_bit(sess->cpu_queues_bm, cpu, 0); + bit = find_first_bit(sess->cpu_queues_bm, cpu); if (bit < cpu) return per_cpu_ptr(sess->cpu_queues, bit); } diff --git a/drivers/dma/ti/edma.c b/drivers/dma/ti/edma.c index 35d81bd857f1..caa4050ecc02 100644 --- a/drivers/dma/ti/edma.c +++ b/drivers/dma/ti/edma.c @@ -1681,7 +1681,7 @@ static irqreturn_t dma_ccerr_handler(int irq, void *data)
dev_dbg(ecc->dev, "EMR%d 0x%08x\n", j, val); emr = val; - for (i = find_next_bit(&emr, 32, 0); i < 32; + for (i = find_first_bit(&emr, 32); i < 32; i = find_next_bit(&emr, 32, i + 1)) { int k = (j << 5) + i;
diff --git a/drivers/iio/adc/ad7124.c b/drivers/iio/adc/ad7124.c index e45c600fccc0..bc2cfa5f9592 100644 --- a/drivers/iio/adc/ad7124.c +++ b/drivers/iio/adc/ad7124.c @@ -347,7 +347,7 @@ static int ad7124_find_free_config_slot(struct ad7124_state *st) { unsigned int free_cfg_slot;
- free_cfg_slot = find_next_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS, 0); + free_cfg_slot = find_first_zero_bit(&st->cfg_slots_status, AD7124_MAX_CONFIGS); if (free_cfg_slot == AD7124_MAX_CONFIGS) return -1;
diff --git a/drivers/infiniband/hw/irdma/hw.c b/drivers/infiniband/hw/irdma/hw.c index 00de5ee9a260..a003893a431e 100644 --- a/drivers/infiniband/hw/irdma/hw.c +++ b/drivers/infiniband/hw/irdma/hw.c @@ -1696,14 +1696,14 @@ static enum irdma_status_code irdma_setup_init_state(struct irdma_pci_f *rf) */ static void irdma_get_used_rsrc(struct irdma_device *iwdev) { - iwdev->rf->used_pds = find_next_zero_bit(iwdev->rf->allocated_pds, - iwdev->rf->max_pd, 0); - iwdev->rf->used_qps = find_next_zero_bit(iwdev->rf->allocated_qps, - iwdev->rf->max_qp, 0); - iwdev->rf->used_cqs = find_next_zero_bit(iwdev->rf->allocated_cqs, - iwdev->rf->max_cq, 0); - iwdev->rf->used_mrs = find_next_zero_bit(iwdev->rf->allocated_mrs, - iwdev->rf->max_mr, 0); + iwdev->rf->used_pds = find_first_zero_bit(iwdev->rf->allocated_pds, + iwdev->rf->max_pd); + iwdev->rf->used_qps = find_first_zero_bit(iwdev->rf->allocated_qps, + iwdev->rf->max_qp); + iwdev->rf->used_cqs = find_first_zero_bit(iwdev->rf->allocated_cqs, + iwdev->rf->max_cq); + iwdev->rf->used_mrs = find_first_zero_bit(iwdev->rf->allocated_mrs, + iwdev->rf->max_mr); }
void irdma_ctrl_deinit_hw(struct irdma_pci_f *rf) diff --git a/drivers/media/cec/core/cec-core.c b/drivers/media/cec/core/cec-core.c index 551689d371a7..7322e7cd9753 100644 --- a/drivers/media/cec/core/cec-core.c +++ b/drivers/media/cec/core/cec-core.c @@ -106,7 +106,7 @@ static int __must_check cec_devnode_register(struct cec_devnode *devnode,
/* Part 1: Find a free minor number */ mutex_lock(&cec_devnode_lock); - minor = find_next_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES, 0); + minor = find_first_zero_bit(cec_devnode_nums, CEC_NUM_DEVICES); if (minor == CEC_NUM_DEVICES) { mutex_unlock(&cec_devnode_lock); pr_err("could not get a free minor\n"); diff --git a/drivers/media/mc/mc-devnode.c b/drivers/media/mc/mc-devnode.c index f11382afe23b..680fbb3a9340 100644 --- a/drivers/media/mc/mc-devnode.c +++ b/drivers/media/mc/mc-devnode.c @@ -217,7 +217,7 @@ int __must_check media_devnode_register(struct media_device *mdev,
/* Part 1: Find a free minor number */ mutex_lock(&media_devnode_lock); - minor = find_next_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES, 0); + minor = find_first_zero_bit(media_devnode_nums, MEDIA_NUM_DEVICES); if (minor == MEDIA_NUM_DEVICES) { mutex_unlock(&media_devnode_lock); pr_err("could not get a free minor\n"); diff --git a/drivers/pci/controller/dwc/pci-dra7xx.c b/drivers/pci/controller/dwc/pci-dra7xx.c index fbbb78f6885e..ab4b133e1ccd 100644 --- a/drivers/pci/controller/dwc/pci-dra7xx.c +++ b/drivers/pci/controller/dwc/pci-dra7xx.c @@ -211,7 +211,7 @@ static int dra7xx_pcie_handle_msi(struct pcie_port *pp, int index) if (!val) return 0;
- pos = find_next_bit(&val, MAX_MSI_IRQS_PER_CTRL, 0); + pos = find_first_bit(&val, MAX_MSI_IRQS_PER_CTRL); while (pos != MAX_MSI_IRQS_PER_CTRL) { generic_handle_domain_irq(pp->irq_domain, (index * MAX_MSI_IRQS_PER_CTRL) + pos); diff --git a/drivers/scsi/lpfc/lpfc_sli.c b/drivers/scsi/lpfc/lpfc_sli.c index 47dd13719901..5274536f9ea1 100644 --- a/drivers/scsi/lpfc/lpfc_sli.c +++ b/drivers/scsi/lpfc/lpfc_sli.c @@ -17286,8 +17286,8 @@ lpfc_sli4_alloc_xri(struct lpfc_hba *phba) * the driver starts at 0 each time. */ spin_lock_irq(&phba->hbalock); - xri = find_next_zero_bit(phba->sli4_hba.xri_bmask, - phba->sli4_hba.max_cfg_param.max_xri, 0); + xri = find_first_zero_bit(phba->sli4_hba.xri_bmask, + phba->sli4_hba.max_cfg_param.max_xri); if (xri >= phba->sli4_hba.max_cfg_param.max_xri) { spin_unlock_irq(&phba->hbalock); return NO_XRI; @@ -18964,7 +18964,7 @@ lpfc_sli4_alloc_rpi(struct lpfc_hba *phba) max_rpi = phba->sli4_hba.max_cfg_param.max_rpi; rpi_limit = phba->sli4_hba.next_rpi;
- rpi = find_next_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit, 0); + rpi = find_first_zero_bit(phba->sli4_hba.rpi_bmask, rpi_limit); if (rpi >= rpi_limit) rpi = LPFC_RPI_ALLOC_ERROR; else { @@ -19607,8 +19607,8 @@ lpfc_sli4_fcf_rr_next_index_get(struct lpfc_hba *phba) * have been tested so that we can detect when we should * change the priority level. */ - next_fcf_index = find_next_bit(phba->fcf.fcf_rr_bmask, - LPFC_SLI4_FCF_TBL_INDX_MAX, 0); + next_fcf_index = find_first_bit(phba->fcf.fcf_rr_bmask, + LPFC_SLI4_FCF_TBL_INDX_MAX); }
diff --git a/drivers/soc/ti/k3-ringacc.c b/drivers/soc/ti/k3-ringacc.c index 312ba0f98ad7..573be88f8191 100644 --- a/drivers/soc/ti/k3-ringacc.c +++ b/drivers/soc/ti/k3-ringacc.c @@ -358,8 +358,8 @@ struct k3_ring *k3_ringacc_request_ring(struct k3_ringacc *ringacc, goto out;
if (flags & K3_RINGACC_RING_USE_PROXY) { - proxy_id = find_next_zero_bit(ringacc->proxy_inuse, - ringacc->num_proxies, 0); + proxy_id = find_first_zero_bit(ringacc->proxy_inuse, + ringacc->num_proxies); if (proxy_id == ringacc->num_proxies) goto error; } diff --git a/drivers/tty/n_tty.c b/drivers/tty/n_tty.c index 0ec93f1a61f5..0965793dfe4f 100644 --- a/drivers/tty/n_tty.c +++ b/drivers/tty/n_tty.c @@ -1975,7 +1975,7 @@ static bool canon_copy_from_read_buf(struct tty_struct *tty, more = n - (size - tail); if (eol == N_TTY_BUF_SIZE && more) { /* scan wrapped without finding set bit */ - eol = find_next_bit(ldata->read_flags, more, 0); + eol = find_first_bit(ldata->read_flags, more); found = eol != more; } else found = eol != size; diff --git a/drivers/virt/acrn/ioreq.c b/drivers/virt/acrn/ioreq.c index 80b2e3f0e276..5ff1c53740c0 100644 --- a/drivers/virt/acrn/ioreq.c +++ b/drivers/virt/acrn/ioreq.c @@ -246,8 +246,7 @@ void acrn_ioreq_request_clear(struct acrn_vm *vm) spin_lock_bh(&vm->ioreq_clients_lock); client = vm->default_client; if (client) { - vcpu = find_next_bit(client->ioreqs_map, - ACRN_IO_REQUEST_MAX, 0); + vcpu = find_first_bit(client->ioreqs_map, ACRN_IO_REQUEST_MAX); while (vcpu < ACRN_IO_REQUEST_MAX) { acrn_ioreq_complete_request(client, vcpu, NULL); vcpu = find_next_bit(client->ioreqs_map, diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c index b4dd22134a73..2a4a101b3fe3 100644 --- a/fs/f2fs/segment.c +++ b/fs/f2fs/segment.c @@ -2524,8 +2524,8 @@ static void get_new_segment(struct f2fs_sb_info *sbi, secno = find_next_zero_bit(free_i->free_secmap, MAIN_SECS(sbi), hint); if (secno >= MAIN_SECS(sbi)) { if (dir == ALLOC_RIGHT) { - secno = find_next_zero_bit(free_i->free_secmap, - MAIN_SECS(sbi), 0); + secno = find_first_zero_bit(free_i->free_secmap, + MAIN_SECS(sbi)); f2fs_bug_on(sbi, secno >= MAIN_SECS(sbi)); } else { go_left = 1; @@ -2540,8 +2540,8 @@ static void get_new_segment(struct f2fs_sb_info *sbi, left_start--; continue; } - left_start = find_next_zero_bit(free_i->free_secmap, - MAIN_SECS(sbi), 0); + left_start = find_first_zero_bit(free_i->free_secmap, + MAIN_SECS(sbi)); f2fs_bug_on(sbi, left_start >= MAIN_SECS(sbi)); break; } diff --git a/fs/ocfs2/cluster/heartbeat.c b/fs/ocfs2/cluster/heartbeat.c index f89ffcbd585f..a17be1618bf7 100644 --- a/fs/ocfs2/cluster/heartbeat.c +++ b/fs/ocfs2/cluster/heartbeat.c @@ -379,7 +379,7 @@ static void o2hb_nego_timeout(struct work_struct *work)
o2hb_fill_node_map(live_node_bitmap, sizeof(live_node_bitmap)); /* lowest node as master node to make negotiate decision. */ - master_node = find_next_bit(live_node_bitmap, O2NM_MAX_NODES, 0); + master_node = find_first_bit(live_node_bitmap, O2NM_MAX_NODES);
if (master_node == o2nm_this_node()) { if (!test_bit(master_node, reg->hr_nego_node_bitmap)) { diff --git a/fs/ocfs2/dlm/dlmdomain.c b/fs/ocfs2/dlm/dlmdomain.c index 9f90fc9551e1..c4eccd499db8 100644 --- a/fs/ocfs2/dlm/dlmdomain.c +++ b/fs/ocfs2/dlm/dlmdomain.c @@ -1045,7 +1045,7 @@ static int dlm_send_regions(struct dlm_ctxt *dlm, unsigned long *node_map) int status, ret = 0, i; char *p;
- if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES) + if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES) goto bail;
qr = kzalloc(sizeof(struct dlm_query_region), GFP_KERNEL); @@ -1217,7 +1217,7 @@ static int dlm_send_nodeinfo(struct dlm_ctxt *dlm, unsigned long *node_map) struct o2nm_node *node; int ret = 0, status, count, i;
- if (find_next_bit(node_map, O2NM_MAX_NODES, 0) >= O2NM_MAX_NODES) + if (find_first_bit(node_map, O2NM_MAX_NODES) >= O2NM_MAX_NODES) goto bail;
qn = kzalloc(sizeof(struct dlm_query_nodeinfo), GFP_KERNEL); diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c index 9b88219febb5..227da5b1b6ab 100644 --- a/fs/ocfs2/dlm/dlmmaster.c +++ b/fs/ocfs2/dlm/dlmmaster.c @@ -861,7 +861,7 @@ struct dlm_lock_resource * dlm_get_lock_resource(struct dlm_ctxt *dlm, * to see if there are any nodes that still need to be * considered. these will not appear in the mle nodemap * but they might own this lockres. wait on them. */ - bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0); + bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES); if (bit < O2NM_MAX_NODES) { mlog(0, "%s: res %.*s, At least one node (%d) " "to recover before lock mastery can begin\n", @@ -912,7 +912,7 @@ struct dlm_lock_resource * dlm_get_lock_resource(struct dlm_ctxt *dlm, dlm_wait_for_recovery(dlm);
spin_lock(&dlm->spinlock); - bit = find_next_bit(dlm->recovery_map, O2NM_MAX_NODES, 0); + bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES); if (bit < O2NM_MAX_NODES) { mlog(0, "%s: res %.*s, At least one node (%d) " "to recover before lock mastery can begin\n", @@ -1079,7 +1079,7 @@ static int dlm_wait_for_lock_mastery(struct dlm_ctxt *dlm, sleep = 1; /* have all nodes responded? */ if (voting_done && !*blocked) { - bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0); + bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES); if (dlm->node_num <= bit) { /* my node number is lowest. * now tell other nodes that I am @@ -1234,8 +1234,8 @@ static int dlm_restart_lock_mastery(struct dlm_ctxt *dlm, } else { mlog(ML_ERROR, "node down! %d\n", node); if (blocked) { - int lowest = find_next_bit(mle->maybe_map, - O2NM_MAX_NODES, 0); + int lowest = find_first_bit(mle->maybe_map, + O2NM_MAX_NODES);
/* act like it was never there */ clear_bit(node, mle->maybe_map); @@ -1795,7 +1795,7 @@ int dlm_assert_master_handler(struct o2net_msg *msg, u32 len, void *data, "MLE for it! (%.*s)\n", assert->node_idx, namelen, name); } else { - int bit = find_next_bit (mle->maybe_map, O2NM_MAX_NODES, 0); + int bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES); if (bit >= O2NM_MAX_NODES) { /* not necessarily an error, though less likely. * could be master just re-asserting. */ @@ -2521,7 +2521,7 @@ static int dlm_is_lockres_migratable(struct dlm_ctxt *dlm, }
if (!nonlocal) { - node_ref = find_next_bit(res->refmap, O2NM_MAX_NODES, 0); + node_ref = find_first_bit(res->refmap, O2NM_MAX_NODES); if (node_ref >= O2NM_MAX_NODES) return 0; } @@ -3303,7 +3303,7 @@ static void dlm_clean_block_mle(struct dlm_ctxt *dlm, BUG_ON(mle->type != DLM_MLE_BLOCK);
spin_lock(&mle->spinlock); - bit = find_next_bit(mle->maybe_map, O2NM_MAX_NODES, 0); + bit = find_first_bit(mle->maybe_map, O2NM_MAX_NODES); if (bit != dead_node) { mlog(0, "mle found, but dead node %u would not have been " "master\n", dead_node); @@ -3542,7 +3542,7 @@ void dlm_force_free_mles(struct dlm_ctxt *dlm) spin_lock(&dlm->master_lock);
BUG_ON(dlm->dlm_state != DLM_CTXT_LEAVING); - BUG_ON((find_next_bit(dlm->domain_map, O2NM_MAX_NODES, 0) < O2NM_MAX_NODES)); + BUG_ON((find_first_bit(dlm->domain_map, O2NM_MAX_NODES) < O2NM_MAX_NODES));
for (i = 0; i < DLM_HASH_BUCKETS; i++) { bucket = dlm_master_hash(dlm, i); diff --git a/fs/ocfs2/dlm/dlmrecovery.c b/fs/ocfs2/dlm/dlmrecovery.c index 0e7aad1b11cc..e24e327524f8 100644 --- a/fs/ocfs2/dlm/dlmrecovery.c +++ b/fs/ocfs2/dlm/dlmrecovery.c @@ -451,7 +451,7 @@ static int dlm_do_recovery(struct dlm_ctxt *dlm) if (dlm->reco.dead_node == O2NM_INVALID_NODE_NUM) { int bit;
- bit = find_next_bit (dlm->recovery_map, O2NM_MAX_NODES, 0); + bit = find_first_bit(dlm->recovery_map, O2NM_MAX_NODES); if (bit >= O2NM_MAX_NODES || bit < 0) dlm_set_reco_dead_node(dlm, O2NM_INVALID_NODE_NUM); else diff --git a/fs/ocfs2/dlm/dlmthread.c b/fs/ocfs2/dlm/dlmthread.c index c350bd4df770..eedf07ca23ca 100644 --- a/fs/ocfs2/dlm/dlmthread.c +++ b/fs/ocfs2/dlm/dlmthread.c @@ -92,7 +92,7 @@ int __dlm_lockres_unused(struct dlm_lock_resource *res) return 0;
/* Another node has this resource with this node as the master */ - bit = find_next_bit(res->refmap, O2NM_MAX_NODES, 0); + bit = find_first_bit(res->refmap, O2NM_MAX_NODES); if (bit < O2NM_MAX_NODES) return 0;
diff --git a/lib/genalloc.c b/lib/genalloc.c index 9a57257988c7..00fc50d0a640 100644 --- a/lib/genalloc.c +++ b/lib/genalloc.c @@ -251,7 +251,7 @@ void gen_pool_destroy(struct gen_pool *pool) list_del(&chunk->next_chunk);
end_bit = chunk_size(chunk) >> order; - bit = find_next_bit(chunk->bits, end_bit, 0); + bit = find_first_bit(chunk->bits, end_bit); BUG_ON(bit < end_bit);
vfree(chunk); diff --git a/net/ncsi/ncsi-manage.c b/net/ncsi/ncsi-manage.c index 89c7742cd72e..a554837d86dd 100644 --- a/net/ncsi/ncsi-manage.c +++ b/net/ncsi/ncsi-manage.c @@ -608,7 +608,7 @@ static int clear_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc, bitmap = &ncf->bitmap;
spin_lock_irqsave(&nc->lock, flags); - index = find_next_bit(bitmap, ncf->n_vids, 0); + index = find_first_bit(bitmap, ncf->n_vids); if (index >= ncf->n_vids) { spin_unlock_irqrestore(&nc->lock, flags); return -1; @@ -667,7 +667,7 @@ static int set_one_vid(struct ncsi_dev_priv *ndp, struct ncsi_channel *nc, return -1; }
- index = find_next_zero_bit(bitmap, ncf->n_vids, 0); + index = find_first_zero_bit(bitmap, ncf->n_vids); if (index < 0 || index >= ncf->n_vids) { netdev_err(ndp->ndev.dev, "Channel %u already has all VLAN filters set\n",
Remove tools/include/asm-generic/bitops/find.h and copy include/linux/bitmap.h to tools. find_*_le() functions are not copied because not needed in tools.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- MAINTAINERS | 2 +- tools/include/asm-generic/bitops.h | 1 - tools/include/linux/bitmap.h | 7 +- .../{asm-generic/bitops => linux}/find.h | 81 +++++++++++++++++-- tools/lib/find_bit.c | 20 +++++ 5 files changed, 100 insertions(+), 11 deletions(-) rename tools/include/{asm-generic/bitops => linux}/find.h (63%)
diff --git a/MAINTAINERS b/MAINTAINERS index 9b62293f7b72..b033083dbb42 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -3277,8 +3277,8 @@ F: lib/bitmap.c F: lib/find_bit.c F: lib/find_bit_benchmark.c F: lib/test_bitmap.c -F: tools/include/asm-generic/bitops/find.h F: tools/include/linux/bitmap.h +F: tools/include/linux/find.h F: tools/lib/bitmap.c F: tools/lib/find_bit.c
diff --git a/tools/include/asm-generic/bitops.h b/tools/include/asm-generic/bitops.h index 5d2ab38965cc..9ab313e93555 100644 --- a/tools/include/asm-generic/bitops.h +++ b/tools/include/asm-generic/bitops.h @@ -18,7 +18,6 @@ #include <asm-generic/bitops/fls.h> #include <asm-generic/bitops/__fls.h> #include <asm-generic/bitops/fls64.h> -#include <asm-generic/bitops/find.h>
#ifndef _TOOLS_LINUX_BITOPS_H_ #error only <linux/bitops.h> can be included directly diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index 9d959bc24859..13d90b574970 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -1,9 +1,10 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _PERF_BITOPS_H -#define _PERF_BITOPS_H +#ifndef _TOOLS_LINUX_BITMAP_H +#define _TOOLS_LINUX_BITMAP_H
#include <string.h> #include <linux/bitops.h> +#include <linux/find.h> #include <stdlib.h> #include <linux/kernel.h>
@@ -181,4 +182,4 @@ static inline int bitmap_intersects(const unsigned long *src1, return __bitmap_intersects(src1, src2, nbits); }
-#endif /* _PERF_BITOPS_H */ +#endif /* _TOOLS_LINUX_BITMAP_H */ diff --git a/tools/include/asm-generic/bitops/find.h b/tools/include/linux/find.h similarity index 63% rename from tools/include/asm-generic/bitops/find.h rename to tools/include/linux/find.h index 6481fd11012a..47e2bd6c5174 100644 --- a/tools/include/asm-generic/bitops/find.h +++ b/tools/include/linux/find.h @@ -1,11 +1,19 @@ /* SPDX-License-Identifier: GPL-2.0 */ -#ifndef _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ -#define _TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ +#ifndef _TOOLS_LINUX_FIND_H_ +#define _TOOLS_LINUX_FIND_H_ + +#ifndef _TOOLS_LINUX_BITMAP_H +#error tools: only <linux/bitmap.h> can be included directly +#endif + +#include <linux/bitops.h>
extern unsigned long _find_next_bit(const unsigned long *addr1, const unsigned long *addr2, unsigned long nbits, unsigned long start, unsigned long invert, unsigned long le); extern unsigned long _find_first_bit(const unsigned long *addr, unsigned long size); +extern unsigned long _find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, unsigned long size); extern unsigned long _find_first_zero_bit(const unsigned long *addr, unsigned long size); extern unsigned long _find_last_bit(const unsigned long *addr, unsigned long size);
@@ -96,7 +104,6 @@ unsigned long find_next_zero_bit(const unsigned long *addr, unsigned long size, #endif
#ifndef find_first_bit - /** * find_first_bit - find the first set bit in a memory region * @addr: The address to start the search at @@ -116,11 +123,34 @@ unsigned long find_first_bit(const unsigned long *addr, unsigned long size)
return _find_first_bit(addr, size); } +#endif + +#ifndef find_first_and_bit +/** + * find_first_and_bit - find the first set bit in both memory regions + * @addr1: The first address to base the search on + * @addr2: The second address to base the search on + * @size: The bitmap size in bits + * + * Returns the bit number for the next set bit + * If no bits are set, returns @size. + */ +static inline +unsigned long find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, + unsigned long size) +{ + if (small_const_nbits(size)) { + unsigned long val = *addr1 & *addr2 & GENMASK(size - 1, 0);
-#endif /* find_first_bit */ + return val ? __ffs(val) : size; + }
-#ifndef find_first_zero_bit + return _find_first_and_bit(addr1, addr2, size); +} +#endif
+#ifndef find_first_zero_bit /** * find_first_zero_bit - find the first cleared bit in a memory region * @addr: The address to start the search at @@ -142,4 +172,43 @@ unsigned long find_first_zero_bit(const unsigned long *addr, unsigned long size) } #endif
-#endif /*_TOOLS_LINUX_ASM_GENERIC_BITOPS_FIND_H_ */ +#ifndef find_last_bit +/** + * find_last_bit - find the last set bit in a memory region + * @addr: The address to start the search at + * @size: The number of bits to search + * + * Returns the bit number of the last set bit, or size. + */ +static inline +unsigned long find_last_bit(const unsigned long *addr, unsigned long size) +{ + if (small_const_nbits(size)) { + unsigned long val = *addr & GENMASK(size - 1, 0); + + return val ? __fls(val) : size; + } + + return _find_last_bit(addr, size); +} +#endif + +/** + * find_next_clump8 - find next 8-bit clump with set bits in a memory region + * @clump: location to store copy of found clump + * @addr: address to base the search on + * @size: bitmap size in number of bits + * @offset: bit offset at which to start searching + * + * Returns the bit offset for the next set clump; the found clump value is + * copied to the location pointed by @clump. If no bits are set, returns @size. + */ +extern unsigned long find_next_clump8(unsigned long *clump, + const unsigned long *addr, + unsigned long size, unsigned long offset); + +#define find_first_clump8(clump, bits, size) \ + find_next_clump8((clump), (bits), (size), 0) + + +#endif /*__LINUX_FIND_H_ */ diff --git a/tools/lib/find_bit.c b/tools/lib/find_bit.c index 109aa7ffcf97..ba4b8d94e004 100644 --- a/tools/lib/find_bit.c +++ b/tools/lib/find_bit.c @@ -96,6 +96,26 @@ unsigned long _find_first_bit(const unsigned long *addr, unsigned long size) } #endif
+#ifndef find_first_and_bit +/* + * Find the first set bit in two memory regions. + */ +unsigned long _find_first_and_bit(const unsigned long *addr1, + const unsigned long *addr2, + unsigned long size) +{ + unsigned long idx, val; + + for (idx = 0; idx * BITS_PER_LONG < size; idx++) { + val = addr1[idx] & addr2[idx]; + if (val) + return min(idx * BITS_PER_LONG + __ffs(val), size); + } + + return size; +} +#endif + #ifndef find_first_zero_bit /* * Find the first cleared bit in a memory region.
cpumask_first() is a more effective analogue of 'next' version if n == -1 (which means start == 0). This patch replaces 'next' with 'first' where things look trivial.
There's no cpumask_first_zero() function, so create it.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- arch/powerpc/include/asm/cputhreads.h | 2 +- block/blk-mq.c | 2 +- drivers/net/virtio_net.c | 2 +- drivers/soc/fsl/qbman/bman_portal.c | 2 +- drivers/soc/fsl/qbman/qman_portal.c | 2 +- include/linux/cpumask.h | 16 ++++++++++++++++ kernel/time/clocksource.c | 4 ++-- 7 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/arch/powerpc/include/asm/cputhreads.h b/arch/powerpc/include/asm/cputhreads.h index b167186aaee4..44286df21d2a 100644 --- a/arch/powerpc/include/asm/cputhreads.h +++ b/arch/powerpc/include/asm/cputhreads.h @@ -52,7 +52,7 @@ static inline cpumask_t cpu_thread_mask_to_cores(const struct cpumask *threads) for (i = 0; i < NR_CPUS; i += threads_per_core) { cpumask_shift_left(&tmp, &threads_core_mask, i); if (cpumask_intersects(threads, &tmp)) { - cpu = cpumask_next_and(-1, &tmp, cpu_online_mask); + cpu = cpumask_first_and(&tmp, cpu_online_mask); if (cpu < nr_cpu_ids) cpumask_set_cpu(cpu, &res); } diff --git a/block/blk-mq.c b/block/blk-mq.c index 3ab4540320ca..39c38a8a996d 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -2544,7 +2544,7 @@ static bool blk_mq_hctx_has_requests(struct blk_mq_hw_ctx *hctx) static inline bool blk_mq_last_cpu_in_hctx(unsigned int cpu, struct blk_mq_hw_ctx *hctx) { - if (cpumask_next_and(-1, hctx->cpumask, cpu_online_mask) != cpu) + if (cpumask_first_and(hctx->cpumask, cpu_online_mask) != cpu) return false; if (cpumask_next_and(cpu, hctx->cpumask, cpu_online_mask) < nr_cpu_ids) return false; diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 2e42210a6503..7fb87f8a7973 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -2080,7 +2080,7 @@ static void virtnet_set_affinity(struct virtnet_info *vi) stragglers = num_cpu >= vi->curr_queue_pairs ? num_cpu % vi->curr_queue_pairs : 0; - cpu = cpumask_next(-1, cpu_online_mask); + cpu = cpumask_first(cpu_online_mask);
for (i = 0; i < vi->curr_queue_pairs; i++) { group_size = stride + (i < stragglers ? 1 : 0); diff --git a/drivers/soc/fsl/qbman/bman_portal.c b/drivers/soc/fsl/qbman/bman_portal.c index acda8a5637c5..4d7b9caee1c4 100644 --- a/drivers/soc/fsl/qbman/bman_portal.c +++ b/drivers/soc/fsl/qbman/bman_portal.c @@ -155,7 +155,7 @@ static int bman_portal_probe(struct platform_device *pdev) }
spin_lock(&bman_lock); - cpu = cpumask_next_zero(-1, &portal_cpus); + cpu = cpumask_first_zero(&portal_cpus); if (cpu >= nr_cpu_ids) { __bman_portals_probed = 1; /* unassigned portal, skip init */ diff --git a/drivers/soc/fsl/qbman/qman_portal.c b/drivers/soc/fsl/qbman/qman_portal.c index 96f74a1dc603..e23b60618c1a 100644 --- a/drivers/soc/fsl/qbman/qman_portal.c +++ b/drivers/soc/fsl/qbman/qman_portal.c @@ -248,7 +248,7 @@ static int qman_portal_probe(struct platform_device *pdev) pcfg->pools = qm_get_pools_sdqcr();
spin_lock(&qman_lock); - cpu = cpumask_next_zero(-1, &portal_cpus); + cpu = cpumask_first_zero(&portal_cpus); if (cpu >= nr_cpu_ids) { __qman_portals_probed = 1; /* unassigned portal, skip init */ diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 4a03f6636e6c..f5883a8f28ca 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -123,6 +123,11 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp) return 0; }
+static inline unsigned int cpumask_first_zero(const struct cpumask *srcp) +{ + return 0; +} + static inline unsigned int cpumask_first_and(const struct cpumask *srcp1, const struct cpumask *srcp2) { @@ -201,6 +206,17 @@ static inline unsigned int cpumask_first(const struct cpumask *srcp) return find_first_bit(cpumask_bits(srcp), nr_cpumask_bits); }
+/** + * cpumask_first_zero - get the first unset cpu in a cpumask + * @srcp: the cpumask pointer + * + * Returns >= nr_cpu_ids if all cpus are set. + */ +static inline unsigned int cpumask_first_zero(const struct cpumask *srcp) +{ + return find_first_zero_bit(cpumask_bits(srcp), nr_cpumask_bits); +} + /** * cpumask_first_and - return the first cpu from *srcp1 & *srcp2 * @src1p: the first input diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c index b038b81f8d32..2f170383b00a 100644 --- a/kernel/time/clocksource.c +++ b/kernel/time/clocksource.c @@ -262,7 +262,7 @@ static void clocksource_verify_choose_cpus(void) return;
/* Make sure to select at least one CPU other than the current CPU. */ - cpu = cpumask_next(-1, cpu_online_mask); + cpu = cpumask_first(cpu_online_mask); if (cpu == smp_processor_id()) cpu = cpumask_next(cpu, cpu_online_mask); if (WARN_ON_ONCE(cpu >= nr_cpu_ids)) @@ -284,7 +284,7 @@ static void clocksource_verify_choose_cpus(void) cpu = prandom_u32() % nr_cpu_ids; cpu = cpumask_next(cpu - 1, cpu_online_mask); if (cpu >= nr_cpu_ids) - cpu = cpumask_next(-1, cpu_online_mask); + cpu = cpumask_first(cpu_online_mask); if (!WARN_ON_ONCE(cpu >= nr_cpu_ids)) cpumask_set_cpu(cpu, &cpus_chosen); }
for_each_bit() macros depend on find_bit() machinery, and so the proper place for them is the find.h header.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- include/linux/bitops.h | 34 ---------------------------------- include/linux/find.h | 34 ++++++++++++++++++++++++++++++++++ 2 files changed, 34 insertions(+), 34 deletions(-)
diff --git a/include/linux/bitops.h b/include/linux/bitops.h index 5e62e2383b7f..7aaed501f768 100644 --- a/include/linux/bitops.h +++ b/include/linux/bitops.h @@ -32,40 +32,6 @@ extern unsigned long __sw_hweight64(__u64 w); */ #include <asm/bitops.h>
-#define for_each_set_bit(bit, addr, size) \ - for ((bit) = find_first_bit((addr), (size)); \ - (bit) < (size); \ - (bit) = find_next_bit((addr), (size), (bit) + 1)) - -/* same as for_each_set_bit() but use bit as value to start with */ -#define for_each_set_bit_from(bit, addr, size) \ - for ((bit) = find_next_bit((addr), (size), (bit)); \ - (bit) < (size); \ - (bit) = find_next_bit((addr), (size), (bit) + 1)) - -#define for_each_clear_bit(bit, addr, size) \ - for ((bit) = find_first_zero_bit((addr), (size)); \ - (bit) < (size); \ - (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) - -/* same as for_each_clear_bit() but use bit as value to start with */ -#define for_each_clear_bit_from(bit, addr, size) \ - for ((bit) = find_next_zero_bit((addr), (size), (bit)); \ - (bit) < (size); \ - (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) - -/** - * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits - * @start: bit offset to start search and to store the current iteration offset - * @clump: location to store copy of current 8-bit clump - * @bits: bitmap address to base the search on - * @size: bitmap size in number of bits - */ -#define for_each_set_clump8(start, clump, bits, size) \ - for ((start) = find_first_clump8(&(clump), (bits), (size)); \ - (start) < (size); \ - (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8)) - static inline int get_bitmask_order(unsigned int count) { int order; diff --git a/include/linux/find.h b/include/linux/find.h index 6048f8c97418..4500e8ab93e2 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -279,4 +279,38 @@ unsigned long find_next_bit_le(const void *addr, unsigned #error "Please fix <asm/byteorder.h>" #endif
+#define for_each_set_bit(bit, addr, size) \ + for ((bit) = find_first_bit((addr), (size)); \ + (bit) < (size); \ + (bit) = find_next_bit((addr), (size), (bit) + 1)) + +/* same as for_each_set_bit() but use bit as value to start with */ +#define for_each_set_bit_from(bit, addr, size) \ + for ((bit) = find_next_bit((addr), (size), (bit)); \ + (bit) < (size); \ + (bit) = find_next_bit((addr), (size), (bit) + 1)) + +#define for_each_clear_bit(bit, addr, size) \ + for ((bit) = find_first_zero_bit((addr), (size)); \ + (bit) < (size); \ + (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) + +/* same as for_each_clear_bit() but use bit as value to start with */ +#define for_each_clear_bit_from(bit, addr, size) \ + for ((bit) = find_next_zero_bit((addr), (size), (bit)); \ + (bit) < (size); \ + (bit) = find_next_zero_bit((addr), (size), (bit) + 1)) + +/** + * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits + * @start: bit offset to start search and to store the current iteration offset + * @clump: location to store copy of current 8-bit clump + * @bits: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_set_clump8(start, clump, bits, size) \ + for ((start) = find_first_clump8(&(clump), (bits), (size)); \ + (start) < (size); \ + (start) = find_next_clump8(&(clump), (bits), (size), (start) + 8)) + #endif /*__LINUX_FIND_H_ */
The macros iterate thru all set/clear bits in a bitmap. They search a first bit using find_first_bit(), and the rest bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save few lines of I-cache by not using find_first_bit().
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- include/linux/find.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/find.h b/include/linux/find.h index 4500e8ab93e2..ae9ed52b52b8 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned #endif
#define for_each_set_bit(bit, addr, size) \ - for ((bit) = find_first_bit((addr), (size)); \ + for ((bit) = find_next_bit((addr), (size), 0); \ (bit) < (size); \ (bit) = find_next_bit((addr), (size), (bit) + 1))
@@ -291,7 +291,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned (bit) = find_next_bit((addr), (size), (bit) + 1))
#define for_each_clear_bit(bit, addr, size) \ - for ((bit) = find_first_zero_bit((addr), (size)); \ + for ((bit) = find_next_zero_bit((addr), (size), 0); \ (bit) < (size); \ (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
On Sat 2021-08-14 14:17:07, Yury Norov wrote:
The macros iterate thru all set/clear bits in a bitmap. They search a first bit using find_first_bit(), and the rest bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save few lines of I-cache by not using find_first_bit().
Is this only a speculation or does it fix a real performance problem?
The macro is used like:
for_each_set_bit(bit, addr, size) { fn(bit); }
IMHO, the micro-opimization does not help when fn() is non-trivial.
--- a/include/linux/find.h +++ b/include/linux/find.h @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned #endif #define for_each_set_bit(bit, addr, size) \
- for ((bit) = find_first_bit((addr), (size)); \
- for ((bit) = find_next_bit((addr), (size), 0); \ (bit) < (size); \ (bit) = find_next_bit((addr), (size), (bit) + 1))
It is not a big deal. I just think that the original code is slightly more self-explaining.
Best Regards, Petr
On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
On Sat 2021-08-14 14:17:07, Yury Norov wrote:
The macros iterate thru all set/clear bits in a bitmap. They search a first bit using find_first_bit(), and the rest bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save few lines of I-cache by not using find_first_bit().
Is this only a speculation or does it fix a real performance problem?
The macro is used like:
for_each_set_bit(bit, addr, size) { fn(bit); }
IMHO, the micro-opimization does not help when fn() is non-trivial.
The effect is measurable:
Start testing for_each_bit() for_each_set_bit: 15296 ns, 1000 iterations for_each_set_bit_from: 15225 ns, 1000 iterations
Start testing for_each_bit() with cash flushing for_each_set_bit: 547626 ns, 1000 iterations for_each_set_bit_from: 497899 ns, 1000 iterations
Refer this:
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
Thanks, Yury
--- a/include/linux/find.h +++ b/include/linux/find.h @@ -280,7 +280,7 @@ unsigned long find_next_bit_le(const void *addr, unsigned #endif #define for_each_set_bit(bit, addr, size) \
- for ((bit) = find_first_bit((addr), (size)); \
- for ((bit) = find_next_bit((addr), (size), 0); \ (bit) < (size); \ (bit) = find_next_bit((addr), (size), (bit) + 1))
It is not a big deal. I just think that the original code is slightly more self-explaining.
Best Regards, Petr
On Thu 2021-08-26 14:09:55, Yury Norov wrote:
On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
On Sat 2021-08-14 14:17:07, Yury Norov wrote:
The macros iterate thru all set/clear bits in a bitmap. They search a first bit using find_first_bit(), and the rest bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save few lines of I-cache by not using find_first_bit().
Is this only a speculation or does it fix a real performance problem?
The macro is used like:
for_each_set_bit(bit, addr, size) { fn(bit); }
IMHO, the micro-opimization does not help when fn() is non-trivial.
The effect is measurable:
Start testing for_each_bit() for_each_set_bit: 15296 ns, 1000 iterations for_each_set_bit_from: 15225 ns, 1000 iterations
Start testing for_each_bit() with cash flushing for_each_set_bit: 547626 ns, 1000 iterations for_each_set_bit_from: 497899 ns, 1000 iterations
Refer this:
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
I see. The results look convincing on the first look.
But I am still not sure. This patch is basically contradicting many other patches from this patchset:
+ 5th patch optimizes find_first_and_bit() and proves that it is much faster:
Before (#define find_first_and_bit(...) find_next_and_bit(..., 0): Start testing find_bit() with random-filled bitmap [ 140.291468] find_first_and_bit: 46890919 ns, 32671 iterations Start testing find_bit() with sparse bitmap [ 140.295028] find_first_and_bit: 7103 ns, 1 iterations
After: Start testing find_bit() with random-filled bitmap [ 162.574907] find_first_and_bit: 25045813 ns, 32846 iterations Start testing find_bit() with sparse bitmap [ 162.578458] find_first_and_bit: 4900 ns, 1 iterations
=> saves 46% in random bitmap saves 31% in sparse bitmap
+ 6th, 7th, and 9th patch makes the code use find_first_bit() because it is faster than find_next_bit(mask, size, 0);
+ Now, 11th (this) patch replaces find_first_bit() with find_next_bit(mask, size, 0) because find_first_bit() makes things slower. It is suspicious at minimum.
By other words. The I-cache could safe 10% in one case. But find_first_bit() might safe 46% in random case.
Does I-cache cost more than the faster code?
Or was for_each_set_bit() tested only with a bitmap where find_first_bit() optimization did not help much?
How would for_each_set_bit() work with random bitmap? How does it work with larger bitmaps?
Best Regards, Petr
On Mon, Aug 30, 2021 at 02:12:49PM +0200, Petr Mladek wrote:
On Thu 2021-08-26 14:09:55, Yury Norov wrote:
On Thu, Aug 26, 2021 at 03:57:13PM +0200, Petr Mladek wrote:
On Sat 2021-08-14 14:17:07, Yury Norov wrote:
The macros iterate thru all set/clear bits in a bitmap. They search a first bit using find_first_bit(), and the rest bits using find_next_bit().
Since find_next_bit() is called shortly after find_first_bit(), we can save few lines of I-cache by not using find_first_bit().
Is this only a speculation or does it fix a real performance problem?
The macro is used like:
for_each_set_bit(bit, addr, size) { fn(bit); }
IMHO, the micro-opimization does not help when fn() is non-trivial.
The effect is measurable:
Start testing for_each_bit() for_each_set_bit: 15296 ns, 1000 iterations for_each_set_bit_from: 15225 ns, 1000 iterations
Start testing for_each_bit() with cash flushing for_each_set_bit: 547626 ns, 1000 iterations for_each_set_bit_from: 497899 ns, 1000 iterations
Refer this:
https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg356151.html
I see. The results look convincing on the first look.
But I am still not sure. This patch is basically contradicting many other patches from this patchset:
5th patch optimizes find_first_and_bit() and proves that it is much faster:
Before (#define find_first_and_bit(...) find_next_and_bit(..., 0): Start testing find_bit() with random-filled bitmap [ 140.291468] find_first_and_bit: 46890919 ns, 32671 iterations Start testing find_bit() with sparse bitmap [ 140.295028] find_first_and_bit: 7103 ns, 1 iterations
After: Start testing find_bit() with random-filled bitmap [ 162.574907] find_first_and_bit: 25045813 ns, 32846 iterations Start testing find_bit() with sparse bitmap [ 162.578458] find_first_and_bit: 4900 ns, 1 iterations
=> saves 46% in random bitmap saves 31% in sparse bitmap
6th, 7th, and 9th patch makes the code use find_first_bit() because it is faster than find_next_bit(mask, size, 0);
Now, 11th (this) patch replaces find_first_bit() with find_next_bit(mask, size, 0) because find_first_bit() makes things slower. It is suspicious at minimum.
By other words. The I-cache could safe 10% in one case. But find_first_bit() might safe 46% in random case.
Those are different cases. find_first_bit() is approximately twice faster than find_next_bit, and much smaller. The conclusion is simple: use 'first' version whenever possible if there's no other considerations.
In case of for_each_bit() macros, however, we have such a consideration. In contrast to regular pattern, where user calls either first, or next versions N times, here we call find_first_bit once, and then find_next_bit N-1 times.
Because we know for sure that we'll call find_next_bit shortly, we can benefit from locality under heavy pressure on I-cache, if replace 'first' with 'next'. Consider it as a prefetch mechanism for the following calls to find_next_bit().
Does I-cache cost more than the faster code?
In this case cache miss is more expensive.
Or was for_each_set_bit() tested only with a bitmap where find_first_bit() optimization did not help much?
I tried to ensure that the effect of I-cache is real and in this case more important than code performance, so in the test I called 'first' once and 'next' twice.
How would for_each_set_bit() work with random bitmap?
It would work for all bitmaps.
How does it work with larger bitmaps?
Percentage gain (but not absolute) will decrease proportionally to the number of calls of find_next_bit() for big N.
Thanks, Yury
Best Regards, Petr
A couple of kernel functions call for_each_*_bit_from() with start bit equal to 0. Replace them with for_each_*_bit().
No functional changes, but might improve on readability.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- arch/x86/kernel/apic/vector.c | 4 ++-- drivers/gpu/drm/etnaviv/etnaviv_gpu.c | 4 ++-- drivers/hwmon/ltc2992.c | 3 +-- 3 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c index c132daabe615..3e6f6b448f6a 100644 --- a/arch/x86/kernel/apic/vector.c +++ b/arch/x86/kernel/apic/vector.c @@ -760,9 +760,9 @@ void __init lapic_update_legacy_vectors(void)
void __init lapic_assign_system_vectors(void) { - unsigned int i, vector = 0; + unsigned int i, vector;
- for_each_set_bit_from(vector, system_vectors, NR_VECTORS) + for_each_set_bit(vector, system_vectors, NR_VECTORS) irq_matrix_assign_system(vector_matrix, vector, false);
if (nr_legacy_irqs() > 1) diff --git a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c index c297fffe06eb..cb9a2e493e8a 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_gpu.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_gpu.c @@ -1038,7 +1038,7 @@ int etnaviv_gpu_debugfs(struct etnaviv_gpu *gpu, struct seq_file *m)
void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu) { - unsigned int i = 0; + unsigned int i;
dev_err(gpu->dev, "recover hung GPU!\n");
@@ -1051,7 +1051,7 @@ void etnaviv_gpu_recover_hang(struct etnaviv_gpu *gpu)
/* complete all events, the GPU won't do it after the reset */ spin_lock(&gpu->event_spinlock); - for_each_set_bit_from(i, gpu->event_bitmap, ETNA_NR_EVENTS) + for_each_set_bit(i, gpu->event_bitmap, ETNA_NR_EVENTS) complete(&gpu->event_free); bitmap_zero(gpu->event_bitmap, ETNA_NR_EVENTS); spin_unlock(&gpu->event_spinlock); diff --git a/drivers/hwmon/ltc2992.c b/drivers/hwmon/ltc2992.c index 2a4bed0ab226..7352d2b3c756 100644 --- a/drivers/hwmon/ltc2992.c +++ b/drivers/hwmon/ltc2992.c @@ -248,8 +248,7 @@ static int ltc2992_gpio_get_multiple(struct gpio_chip *chip, unsigned long *mask
gpio_status = reg;
- gpio_nr = 0; - for_each_set_bit_from(gpio_nr, mask, LTC2992_GPIO_NR) { + for_each_set_bit(gpio_nr, mask, LTC2992_GPIO_NR) { if (test_bit(LTC2992_GPIO_BIT(gpio_nr), &gpio_status)) set_bit(gpio_nr, bits); }
From: Andy Shevchenko andriy.shevchenko@linux.intel.com
Rename bitmap_alloc() to bitmap_zalloc() in tools to follow the bitmap API in the kernel.
No functional changes intended.
Suggested-by: Yury Norov yury.norov@gmail.com Acked-by: Yury Norov yury.norov@gmail.com Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com Acked-by: Jiri Olsa jolsa@redhat.com --- tools/include/linux/bitmap.h | 4 ++-- tools/perf/bench/find-bit-bench.c | 2 +- tools/perf/builtin-c2c.c | 6 +++--- tools/perf/builtin-record.c | 2 +- tools/perf/tests/bitmap.c | 2 +- tools/perf/tests/mem2node.c | 2 +- tools/perf/util/affinity.c | 4 ++-- tools/perf/util/header.c | 4 ++-- tools/perf/util/metricgroup.c | 2 +- tools/perf/util/mmap.c | 4 ++-- tools/testing/selftests/kvm/dirty_log_perf_test.c | 2 +- tools/testing/selftests/kvm/dirty_log_test.c | 4 ++-- tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c | 2 +- 13 files changed, 20 insertions(+), 20 deletions(-)
diff --git a/tools/include/linux/bitmap.h b/tools/include/linux/bitmap.h index 13d90b574970..ea97804d04d4 100644 --- a/tools/include/linux/bitmap.h +++ b/tools/include/linux/bitmap.h @@ -112,10 +112,10 @@ static inline int test_and_clear_bit(int nr, unsigned long *addr) }
/** - * bitmap_alloc - Allocate bitmap + * bitmap_zalloc - Allocate bitmap * @nbits: Number of bits */ -static inline unsigned long *bitmap_alloc(int nbits) +static inline unsigned long *bitmap_zalloc(int nbits) { return calloc(1, BITS_TO_LONGS(nbits) * sizeof(unsigned long)); } diff --git a/tools/perf/bench/find-bit-bench.c b/tools/perf/bench/find-bit-bench.c index 73b5bcc5946a..22b5cfe97023 100644 --- a/tools/perf/bench/find-bit-bench.c +++ b/tools/perf/bench/find-bit-bench.c @@ -54,7 +54,7 @@ static bool asm_test_bit(long nr, const unsigned long *addr)
static int do_for_each_set_bit(unsigned int num_bits) { - unsigned long *to_test = bitmap_alloc(num_bits); + unsigned long *to_test = bitmap_zalloc(num_bits); struct timeval start, end, diff; u64 runtime_us; struct stats fb_time_stats, tb_time_stats; diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index 6dea37f141b2..c34d77bee4ef 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -139,11 +139,11 @@ static void *c2c_he_zalloc(size_t size) if (!c2c_he) return NULL;
- c2c_he->cpuset = bitmap_alloc(c2c.cpus_cnt); + c2c_he->cpuset = bitmap_zalloc(c2c.cpus_cnt); if (!c2c_he->cpuset) return NULL;
- c2c_he->nodeset = bitmap_alloc(c2c.nodes_cnt); + c2c_he->nodeset = bitmap_zalloc(c2c.nodes_cnt); if (!c2c_he->nodeset) return NULL;
@@ -2047,7 +2047,7 @@ static int setup_nodes(struct perf_session *session) struct perf_cpu_map *map = n[node].map; unsigned long *set;
- set = bitmap_alloc(c2c.cpus_cnt); + set = bitmap_zalloc(c2c.cpus_cnt); if (!set) return -ENOMEM;
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 671a21c9ee4d..f1b30ac094cb 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -2786,7 +2786,7 @@ int cmd_record(int argc, const char **argv)
if (rec->opts.affinity != PERF_AFFINITY_SYS) { rec->affinity_mask.nbits = cpu__max_cpu(); - rec->affinity_mask.bits = bitmap_alloc(rec->affinity_mask.nbits); + rec->affinity_mask.bits = bitmap_zalloc(rec->affinity_mask.nbits); if (!rec->affinity_mask.bits) { pr_err("Failed to allocate thread mask for %zd cpus\n", rec->affinity_mask.nbits); err = -ENOMEM; diff --git a/tools/perf/tests/bitmap.c b/tools/perf/tests/bitmap.c index 96c137360918..12b805efdca0 100644 --- a/tools/perf/tests/bitmap.c +++ b/tools/perf/tests/bitmap.c @@ -14,7 +14,7 @@ static unsigned long *get_bitmap(const char *str, int nbits) unsigned long *bm = NULL; int i;
- bm = bitmap_alloc(nbits); + bm = bitmap_zalloc(nbits);
if (map && bm) { for (i = 0; i < map->nr; i++) diff --git a/tools/perf/tests/mem2node.c b/tools/perf/tests/mem2node.c index a258bd51f1a4..e4d0d58b97f8 100644 --- a/tools/perf/tests/mem2node.c +++ b/tools/perf/tests/mem2node.c @@ -27,7 +27,7 @@ static unsigned long *get_bitmap(const char *str, int nbits) unsigned long *bm = NULL; int i;
- bm = bitmap_alloc(nbits); + bm = bitmap_zalloc(nbits);
if (map && bm) { for (i = 0; i < map->nr; i++) { diff --git a/tools/perf/util/affinity.c b/tools/perf/util/affinity.c index a5e31f826828..7b12bd7a3080 100644 --- a/tools/perf/util/affinity.c +++ b/tools/perf/util/affinity.c @@ -25,11 +25,11 @@ int affinity__setup(struct affinity *a) { int cpu_set_size = get_cpu_set_size();
- a->orig_cpus = bitmap_alloc(cpu_set_size * 8); + a->orig_cpus = bitmap_zalloc(cpu_set_size * 8); if (!a->orig_cpus) return -1; sched_getaffinity(0, cpu_set_size, (cpu_set_t *)a->orig_cpus); - a->sched_cpus = bitmap_alloc(cpu_set_size * 8); + a->sched_cpus = bitmap_zalloc(cpu_set_size * 8); if (!a->sched_cpus) { zfree(&a->orig_cpus); return -1; diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index 44249027507a..563dec72adeb 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -278,7 +278,7 @@ static int do_read_bitmap(struct feat_fd *ff, unsigned long **pset, u64 *psize) if (ret) return ret;
- set = bitmap_alloc(size); + set = bitmap_zalloc(size); if (!set) return -ENOMEM;
@@ -1294,7 +1294,7 @@ static int memory_node__read(struct memory_node *n, unsigned long idx)
size++;
- n->set = bitmap_alloc(size); + n->set = bitmap_zalloc(size); if (!n->set) { closedir(dir); return -ENOMEM; diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c index 99d047c5ead0..29b747ac31c1 100644 --- a/tools/perf/util/metricgroup.c +++ b/tools/perf/util/metricgroup.c @@ -313,7 +313,7 @@ static int metricgroup__setup_events(struct list_head *groups, struct evsel *evsel, *tmp; unsigned long *evlist_used;
- evlist_used = bitmap_alloc(perf_evlist->core.nr_entries); + evlist_used = bitmap_zalloc(perf_evlist->core.nr_entries); if (!evlist_used) return -ENOMEM;
diff --git a/tools/perf/util/mmap.c b/tools/perf/util/mmap.c index ab7108d22428..512dc8b9c168 100644 --- a/tools/perf/util/mmap.c +++ b/tools/perf/util/mmap.c @@ -106,7 +106,7 @@ static int perf_mmap__aio_bind(struct mmap *map, int idx, int cpu, int affinity) data = map->aio.data[idx]; mmap_len = mmap__mmap_len(map); node_index = cpu__get_node(cpu); - node_mask = bitmap_alloc(node_index + 1); + node_mask = bitmap_zalloc(node_index + 1); if (!node_mask) { pr_err("Failed to allocate node mask for mbind: error %m\n"); return -1; @@ -258,7 +258,7 @@ static void build_node_mask(int node, struct mmap_cpu_mask *mask) static int perf_mmap__setup_affinity_mask(struct mmap *map, struct mmap_params *mp) { map->affinity_mask.nbits = cpu__max_cpu(); - map->affinity_mask.bits = bitmap_alloc(map->affinity_mask.nbits); + map->affinity_mask.bits = bitmap_zalloc(map->affinity_mask.nbits); if (!map->affinity_mask.bits) return -1;
diff --git a/tools/testing/selftests/kvm/dirty_log_perf_test.c b/tools/testing/selftests/kvm/dirty_log_perf_test.c index 3c30d0045d8d..479868570d59 100644 --- a/tools/testing/selftests/kvm/dirty_log_perf_test.c +++ b/tools/testing/selftests/kvm/dirty_log_perf_test.c @@ -171,7 +171,7 @@ static void run_test(enum vm_guest_mode mode, void *arg) guest_num_pages = (nr_vcpus * guest_percpu_mem_size) >> vm_get_page_shift(vm); guest_num_pages = vm_adjust_num_guest_pages(mode, guest_num_pages); host_num_pages = vm_num_host_pages(mode, guest_num_pages); - bmap = bitmap_alloc(host_num_pages); + bmap = bitmap_zalloc(host_num_pages);
if (dirty_log_manual_caps) { cap.cap = KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2; diff --git a/tools/testing/selftests/kvm/dirty_log_test.c b/tools/testing/selftests/kvm/dirty_log_test.c index 5fe0140e407e..792c60e1b17d 100644 --- a/tools/testing/selftests/kvm/dirty_log_test.c +++ b/tools/testing/selftests/kvm/dirty_log_test.c @@ -749,8 +749,8 @@ static void run_test(enum vm_guest_mode mode, void *arg)
pr_info("guest physical test memory offset: 0x%lx\n", guest_test_phys_mem);
- bmap = bitmap_alloc(host_num_pages); - host_bmap_track = bitmap_alloc(host_num_pages); + bmap = bitmap_zalloc(host_num_pages); + host_bmap_track = bitmap_zalloc(host_num_pages);
/* Add an extra memory slot for testing dirty logging */ vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, diff --git a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c index 06a64980a5d2..68f26a8b4f42 100644 --- a/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c +++ b/tools/testing/selftests/kvm/x86_64/vmx_dirty_log_test.c @@ -111,7 +111,7 @@ int main(int argc, char *argv[]) nested_map(vmx, vm, NESTED_TEST_MEM1, GUEST_TEST_MEM, 4096); nested_map(vmx, vm, NESTED_TEST_MEM2, GUEST_TEST_MEM, 4096);
- bmap = bitmap_alloc(TEST_MEM_PAGES); + bmap = bitmap_zalloc(TEST_MEM_PAGES); host_test_mem = addr_gpa2hva(vm, GUEST_TEST_MEM);
while (!done) {
bitmap_next_clear_region() calls find_next_zero_bit() and find_next_bit() sequentially to find a range of clear bits. In case of pcpu_is_populated() there's a chance to return earlier if bitmap has all bits set.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com Acked-by: Dennis Zhou dennis@kernel.org --- mm/percpu.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-)
diff --git a/mm/percpu.c b/mm/percpu.c index a43039629aa4..8bf8b88446d7 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -1070,17 +1070,18 @@ static void pcpu_block_update_hint_free(struct pcpu_chunk *chunk, int bit_off, static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits, int *next_off) { - unsigned int page_start, page_end, rs, re; + unsigned int start, end;
- page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE); - page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE); + start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE); + end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
- rs = page_start; - bitmap_next_clear_region(chunk->populated, &rs, &re, page_end); - if (rs >= page_end) + start = find_next_zero_bit(chunk->populated, end, start); + if (start >= end) return true;
- *next_off = re * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE; + end = find_next_bit(chunk->populated, end, start + 1); + + *next_off = end * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE; return false; }
bitmap_for_each_{set,clear}_region() are similar to for_each_bit() macros in include/linux/find.h, but interface and implementation of them are different.
This patch adds for_each_bitrange() macros and drops unused bitmap_*_region() API in sake of unification.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com Acked-by: Dennis Zhou dennis@kernel.org Acked-by: Ulf Hansson ulf.hansson@linaro.org # For MMC --- drivers/mmc/host/renesas_sdhi_core.c | 2 +- include/linux/bitmap.h | 33 ---------------- include/linux/find.h | 56 ++++++++++++++++++++++++++++ mm/percpu.c | 20 ++++------ 4 files changed, 65 insertions(+), 46 deletions(-)
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c index e49ca0f7fe9a..efd33b1fc467 100644 --- a/drivers/mmc/host/renesas_sdhi_core.c +++ b/drivers/mmc/host/renesas_sdhi_core.c @@ -647,7 +647,7 @@ static int renesas_sdhi_select_tuning(struct tmio_mmc_host *host) * is at least SH_MOBILE_SDHI_MIN_TAP_ROW probes long then use the * center index as the tap, otherwise bail out. */ - bitmap_for_each_set_region(bitmap, rs, re, 0, taps_size) { + for_each_set_bitrange(rs, re, bitmap, taps_size) { if (re - rs > tap_cnt) { tap_end = re; tap_start = rs; diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 3f7c6731b203..6938edd457c2 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -55,12 +55,6 @@ struct device; * bitmap_clear(dst, pos, nbits) Clear specified bit area * bitmap_find_next_zero_area(buf, len, pos, n, mask) Find bit free area * bitmap_find_next_zero_area_off(buf, len, pos, n, mask, mask_off) as above - * bitmap_next_clear_region(map, &start, &end, nbits) Find next clear region - * bitmap_next_set_region(map, &start, &end, nbits) Find next set region - * bitmap_for_each_clear_region(map, rs, re, start, end) - * Iterate over all clear regions - * bitmap_for_each_set_region(map, rs, re, start, end) - * Iterate over all set regions * bitmap_shift_right(dst, src, n, nbits) *dst = *src >> n * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n * bitmap_cut(dst, src, first, n, nbits) Cut n bits from first, copy rest @@ -459,14 +453,6 @@ static inline void bitmap_replace(unsigned long *dst, __bitmap_replace(dst, old, new, mask, nbits); }
-static inline void bitmap_next_clear_region(unsigned long *bitmap, - unsigned int *rs, unsigned int *re, - unsigned int end) -{ - *rs = find_next_zero_bit(bitmap, end, *rs); - *re = find_next_bit(bitmap, end, *rs + 1); -} - static inline void bitmap_next_set_region(unsigned long *bitmap, unsigned int *rs, unsigned int *re, unsigned int end) @@ -475,25 +461,6 @@ static inline void bitmap_next_set_region(unsigned long *bitmap, *re = find_next_zero_bit(bitmap, end, *rs + 1); }
-/* - * Bitmap region iterators. Iterates over the bitmap between [@start, @end). - * @rs and @re should be integer variables and will be set to start and end - * index of the current clear or set region. - */ -#define bitmap_for_each_clear_region(bitmap, rs, re, start, end) \ - for ((rs) = (start), \ - bitmap_next_clear_region((bitmap), &(rs), &(re), (end)); \ - (rs) < (re); \ - (rs) = (re) + 1, \ - bitmap_next_clear_region((bitmap), &(rs), &(re), (end))) - -#define bitmap_for_each_set_region(bitmap, rs, re, start, end) \ - for ((rs) = (start), \ - bitmap_next_set_region((bitmap), &(rs), &(re), (end)); \ - (rs) < (re); \ - (rs) = (re) + 1, \ - bitmap_next_set_region((bitmap), &(rs), &(re), (end))) - /** * BITMAP_FROM_U64() - Represent u64 value in the format suitable for bitmap. * @n: u64 value diff --git a/include/linux/find.h b/include/linux/find.h index ae9ed52b52b8..5bb6db213bcb 100644 --- a/include/linux/find.h +++ b/include/linux/find.h @@ -301,6 +301,62 @@ unsigned long find_next_bit_le(const void *addr, unsigned (bit) < (size); \ (bit) = find_next_zero_bit((addr), (size), (bit) + 1))
+/** + * for_each_set_bitrange - iterate over all set bit ranges [b; e) + * @b: bit offset of start of current bitrange (first set bit) + * @e: bit offset of end of current bitrange (first unset bit) + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_set_bitrange(b, e, addr, size) \ + for ((b) = find_next_bit((addr), (size), 0), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1); \ + (b) < (size); \ + (b) = find_next_bit((addr), (size), (e) + 1), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1)) + +/** + * for_each_set_bitrange_from - iterate over all set bit ranges [b; e) + * @b: bit offset of start of current bitrange (first set bit); must be initialized + * @e: bit offset of end of current bitrange (first unset bit) + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_set_bitrange_from(b, e, addr, size) \ + for ((b) = find_next_bit((addr), (size), (b)), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1); \ + (b) < (size); \ + (b) = find_next_bit((addr), (size), (e) + 1), \ + (e) = find_next_zero_bit((addr), (size), (b) + 1)) + +/** + * for_each_clear_bitrange - iterate over all unset bit ranges [b; e) + * @b: bit offset of start of current bitrange (first unset bit) + * @e: bit offset of end of current bitrange (first set bit) + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_clear_bitrange(b, e, addr, size) \ + for ((b) = find_next_zero_bit((addr), (size), 0), \ + (e) = find_next_bit((addr), (size), (b) + 1); \ + (b) < (size); \ + (b) = find_next_zero_bit((addr), (size), (e) + 1), \ + (e) = find_next_bit((addr), (size), (b) + 1)) + +/** + * for_each_clear_bitrange_from - iterate over all unset bit ranges [b; e) + * @b: bit offset of start of current bitrange (first set bit); must be initialized + * @e: bit offset of end of current bitrange (first unset bit) + * @addr: bitmap address to base the search on + * @size: bitmap size in number of bits + */ +#define for_each_clear_bitrange_from(b, e, addr, size) \ + for ((b) = find_next_zero_bit((addr), (size), (b)), \ + (e) = find_next_bit((addr), (size), (b) + 1); \ + (b) < (size); \ + (b) = find_next_zero_bit((addr), (size), (e) + 1), \ + (e) = find_next_bit((addr), (size), (b) + 1)) + /** * for_each_set_clump8 - iterate over bitmap for each 8-bit clump with set bits * @start: bit offset to start search and to store the current iteration offset diff --git a/mm/percpu.c b/mm/percpu.c index 8bf8b88446d7..8c16c8e04eee 100644 --- a/mm/percpu.c +++ b/mm/percpu.c @@ -779,7 +779,7 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index) { struct pcpu_block_md *block = chunk->md_blocks + index; unsigned long *alloc_map = pcpu_index_alloc_map(chunk, index); - unsigned int rs, re, start; /* region start, region end */ + unsigned int start, end; /* region start, region end */
/* promote scan_hint to contig_hint */ if (block->scan_hint) { @@ -795,9 +795,8 @@ static void pcpu_block_refresh_hint(struct pcpu_chunk *chunk, int index) block->right_free = 0;
/* iterate over free areas and update the contig hints */ - bitmap_for_each_clear_region(alloc_map, rs, re, start, - PCPU_BITMAP_BLOCK_BITS) - pcpu_block_update(block, rs, re); + for_each_clear_bitrange_from(start, end, alloc_map, PCPU_BITMAP_BLOCK_BITS) + pcpu_block_update(block, start, end); }
/** @@ -1855,13 +1854,12 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
/* populate if not all pages are already there */ if (!is_atomic) { - unsigned int page_start, page_end, rs, re; + unsigned int page_end, rs, re;
- page_start = PFN_DOWN(off); + rs = PFN_DOWN(off); page_end = PFN_UP(off + size);
- bitmap_for_each_clear_region(chunk->populated, rs, re, - page_start, page_end) { + for_each_clear_bitrange_from(rs, re, chunk->populated, page_end) { WARN_ON(chunk->immutable);
ret = pcpu_populate_chunk(chunk, rs, re, pcpu_gfp); @@ -2017,8 +2015,7 @@ static void pcpu_balance_free(bool empty_only) list_for_each_entry_safe(chunk, next, &to_free, list) { unsigned int rs, re;
- bitmap_for_each_set_region(chunk->populated, rs, re, 0, - chunk->nr_pages) { + for_each_set_bitrange(rs, re, chunk->populated, chunk->nr_pages) { pcpu_depopulate_chunk(chunk, rs, re); spin_lock_irq(&pcpu_lock); pcpu_chunk_depopulated(chunk, rs, re); @@ -2088,8 +2085,7 @@ static void pcpu_balance_populated(void) continue;
/* @chunk can't go away while pcpu_alloc_mutex is held */ - bitmap_for_each_clear_region(chunk->populated, rs, re, 0, - chunk->nr_pages) { + for_each_clear_bitrange(rs, re, chunk->populated, chunk->nr_pages) { int nr = min_t(int, re - rs, nr_to_pop);
spin_unlock_irq(&pcpu_lock);
Functional tests for bitmap_print_to_pagebuf() are provided in lib/test_printf.c. This patch adds performance test for a case of fully set bitmap.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- lib/test_bitmap.c | 37 +++++++++++++++++++++++++++++++++++++ 1 file changed, 37 insertions(+)
diff --git a/lib/test_bitmap.c b/lib/test_bitmap.c index 4ea73f5aed41..452d525007da 100644 --- a/lib/test_bitmap.c +++ b/lib/test_bitmap.c @@ -430,6 +430,42 @@ static void __init test_bitmap_parselist(void) } }
+static void __init test_bitmap_printlist(void) +{ + unsigned long *bmap = kmalloc(PAGE_SIZE, GFP_KERNEL); + char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL); + char expected[256]; + int ret, slen; + ktime_t time; + + if (!buf || !bmap) + goto out; + + memset(bmap, -1, PAGE_SIZE); + slen = snprintf(expected, 256, "0-%ld", PAGE_SIZE * 8 - 1); + if (slen < 0) + goto out; + + time = ktime_get(); + ret = bitmap_print_to_pagebuf(true, buf, bmap, PAGE_SIZE * 8); + time = ktime_get() - time; + + if (ret != slen + 1) { + pr_err("bitmap_print_to_pagebuf: result is %d, expected %d\n", ret, slen); + goto out; + } + + if (strncmp(buf, expected, slen)) { + pr_err("bitmap_print_to_pagebuf: result is %s, expected %s\n", buf, expected); + goto out; + } + + pr_err("bitmap_print_to_pagebuf: input is '%s', Time: %llu\n", buf, time); +out: + kfree(buf); + kfree(bmap); +} + static const unsigned long parse_test[] __initconst = { BITMAP_FROM_U64(0), BITMAP_FROM_U64(1), @@ -669,6 +705,7 @@ static void __init selftest(void) test_bitmap_arr32(); test_bitmap_parse(); test_bitmap_parselist(); + test_bitmap_printlist(); test_mem_optimisations(); test_for_each_set_clump8(); test_bitmap_cut();
bitmap_list_string() is very ineffective when printing bitmaps with long ranges of set bits because it calls find_next_bit for each bit in the bitmap. We can do better by detecting ranges of set bits.
In my environment, before/after is 943008/31008 ns.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com --- lib/vsprintf.c | 24 +++++++----------------- 1 file changed, 7 insertions(+), 17 deletions(-)
diff --git a/lib/vsprintf.c b/lib/vsprintf.c index dd006adfe853..29a384eee286 100644 --- a/lib/vsprintf.c +++ b/lib/vsprintf.c @@ -1241,20 +1241,13 @@ char *bitmap_list_string(char *buf, char *end, unsigned long *bitmap, struct printf_spec spec, const char *fmt) { int nr_bits = max_t(int, spec.field_width, 0); - /* current bit is 'cur', most recently seen range is [rbot, rtop] */ - int cur, rbot, rtop; bool first = true; + int rbot, rtop;
if (check_pointer(&buf, end, bitmap, spec)) return buf;
- rbot = cur = find_first_bit(bitmap, nr_bits); - while (cur < nr_bits) { - rtop = cur; - cur = find_next_bit(bitmap, nr_bits, cur + 1); - if (cur < nr_bits && cur <= rtop + 1) - continue; - + for_each_set_bitrange(rbot, rtop, bitmap, nr_bits) { if (!first) { if (buf < end) *buf = ','; @@ -1263,15 +1256,12 @@ char *bitmap_list_string(char *buf, char *end, unsigned long *bitmap, first = false;
buf = number(buf, end, rbot, default_dec_spec); - if (rbot < rtop) { - if (buf < end) - *buf = '-'; - buf++; - - buf = number(buf, end, rtop, default_dec_spec); - } + if (rtop == rbot + 1) + continue;
- rbot = cur; + if (buf < end) + *buf = '-'; + buf = number(++buf, end, rtop - 1, default_dec_spec); } return buf; }
On Sun, Aug 15, 2021 at 12:21 AM Yury Norov yury.norov@gmail.com wrote:
bitmap_list_string() is very ineffective when printing bitmaps with long ranges of set bits because it calls find_next_bit for each bit in the bitmap. We can do better by detecting ranges of set bits.
In my environment, before/after is 943008/31008 ns.
I would add a couple of words, maybe in parentheses, to describe what your environment is.
...
buf = number(++buf, end, rtop - 1, default_dec_spec);
++buf is a bit confusing here. Since you will rewrite the buf value anyway, I would write the parameter as buf + 1.
On Sun, Aug 15, 2021 at 02:09:45PM +0300, Andy Shevchenko wrote:
On Sun, Aug 15, 2021 at 12:21 AM Yury Norov yury.norov@gmail.com wrote:
bitmap_list_string() is very ineffective when printing bitmaps with long ranges of set bits because it calls find_next_bit for each bit in the bitmap. We can do better by detecting ranges of set bits.
In my environment, before/after is 943008/31008 ns.
I would add a couple of words, maybe in parentheses, to describe what your environment is.
...
buf = number(++buf, end, rtop - 1, default_dec_spec);
++buf is a bit confusing here. Since you will rewrite the buf value anyway, I would write the parameter as buf + 1.
Agree, it's sloppy. I'll send the patch by tomorrow.
On Sat 2021-08-14 14:17:13, Yury Norov wrote:
bitmap_list_string() is very ineffective when printing bitmaps with long ranges of set bits because it calls find_next_bit for each bit in the bitmap. We can do better by detecting ranges of set bits.
In my environment, before/after is 943008/31008 ns.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com
I like the patch. The new code is much easier to follow. Feel free to use:
Reviewed-by: Petr Mladek pmladek@suse.com
Best Regards, Petr
On Thu, Aug 26, 2021 at 04:15:09PM +0200, Petr Mladek wrote:
On Sat 2021-08-14 14:17:13, Yury Norov wrote:
bitmap_list_string() is very ineffective when printing bitmaps with long ranges of set bits because it calls find_next_bit for each bit in the bitmap. We can do better by detecting ranges of set bits.
In my environment, before/after is 943008/31008 ns.
Signed-off-by: Yury Norov yury.norov@gmail.com Tested-by: Wolfram Sang wsa+renesas@sang-engineering.com
I like the patch. The new code is much easier to follow. Feel free to use:
Reviewed-by: Petr Mladek pmladek@suse.com
Best Regards, Petr
Thanks Petr! The patch is already in the linux-next.
Andrew, Stephen, can you please append Petr's reviewed-by?
Thanks, Yury
linux-kselftest-mirror@lists.linaro.org