New subject: [PATCH v2 1/4] kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers

7 Oct 2021


      The kernel.h is a set of something which is not related to each other
and often used in non-crossed compilation units, especially when drivers
need only one or two macro definitions from it.
Here is the split of container_of(). The goals are the following:
- untwist the dependency hell a bit
- drop kernel.h inclusion where it's only used for container_of()
- speed up C preprocessing.
People, like Greg KH and Miguel Ojeda, were asking about the latter.
Read below the methodology and test setup with outcome numbers.
The methodology
===============
The question here is how to measure in the more or less clean way
the C preprocessing time when building a project like Linux kernel.
To answer it, let's look around and see what tools do we have that
may help. Aha, here is ccache tool that seems quite plausible to
be used. Its core idea is to preprocess C file, count hash (MD4)
and compare to ones that are in the cache. If found, return the
object file, avoiding compilation stage.
Taking into account the property of the ccache, configure and use
it in the below steps:
1. Configure kernel with allyesconfig
2. Make it with `make` to be sure that the cache is filled with
   the latest data. I.o.w. warm up the cache.
3. Run `make -s` (silent mode to reduce the influence of
   the unrelated things, like console output) 10 times and
   measure 'real' time spent.
4. Repeat 1-3 for each patch or patch set to get data sets before
   and after.
When we get the raw data, calculating median will show us the number.
Comparing them before and after we will see the difference.
The setup
=========
I have used the Intel x86_64 server platform (see partial output of
 `lscpu` below):
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  88
  On-line CPU(s) list:   0-87
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  22
    Socket(s):           2
    Stepping:            1
    CPU max MHz:         3600.0000
    CPU min MHz:         1200.0000
...
Caches (sum of all):
  L1d:                   1.4 MiB (44 instances)
  L1i:                   1.4 MiB (44 instances)
  L2:                    11 MiB (44 instances)
  L3:                    110 MiB (2 instances)
NUMA:
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-21,44-65
  NUMA node1 CPU(s):     22-43,66-87
Vulnerabilities:
  Itlb multihit:         KVM: Mitigation: Split huge pages
  L1tf:                  Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
  Mds:                   Mitigation; Clear CPU buffers; SMT vulnerable
  Meltdown:              Mitigation; PTI
  Spec store bypass:     Mitigation; Speculative Store Bypass disabled via prctl and seccomp
  Spectre v1:            Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:            Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
  Tsx async abort:       Mitigation; Clear CPU buffers; SMT vulnerable
With the following GCC:
$ gcc --version
gcc (Debian 10.3.0-11) 10.3.0
The commands I have run during the measurement were:
rm -rf $O
    make O=$O allyesconfig
    time make O=$O -s -j64	# this step has been measured
The raw data and median
=======================
Before patch 2 (yes, I have measured the only patch 2 effect) in the series
(the data is sorted by time):
real    2m8.794s
real    2m11.183s
real    2m11.235s
real    2m11.639s
real    2m11.960s
real    2m12.014s
real    2m12.609s
real    2m13.177s
real    2m13.462s
real    2m19.132s
After patch 2 has been applied:
real    2m8.536s
real    2m8.776s
real    2m9.071s
real    2m9.459s
real    2m9.531s
real    2m9.610s
real    2m10.356s
real    2m10.430s
real    2m11.117s
real    2m11.885s
Median values are:
    131.987s before
    129.571s after
We see the steady speedup as of 1.83%.
Andy Shevchenko (4):
  kernel.h: Drop unneeded <linux/kernel.h> inclusion from other headers
  kernel.h: Split out container_of() and typeof_member() macros
  lib/rhashtable: Replace kernel.h with the necessary inclusions
  kunit: Replace kernel.h with the necessary inclusions
include/kunit/test.h         | 14 ++++++++++++--
 include/linux/container_of.h | 37 ++++++++++++++++++++++++++++++++++++
 include/linux/kernel.h       | 31 +-----------------------------
 include/linux/kobject.h      |  1 +
 include/linux/list.h         |  6 ++++--
 include/linux/llist.h        |  4 +++-
 include/linux/plist.h        |  5 ++++-
 include/linux/rwsem.h        |  1 -
 include/linux/spinlock.h     |  1 -
 include/media/media-entity.h |  3 ++-
 lib/radix-tree.c             |  6 +++++-
 lib/rhashtable.c             |  7 ++++++-
 12 files changed, 75 insertions(+), 41 deletions(-)
 create mode 100644 include/linux/container_of.h
-- 
2.33.0

[PATCH v2 0/4] kernel.h further split

Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com

Signed-off-by: Andy Shevchenko andriy.shevchenko@linux.intel.com

The methodology

The setup

The raw data and median

The methodology

The setup

The raw data and median

The methodology

The setup

The raw data and median

The methodology

The setup

The raw data and median

The methodology

The setup

The raw data and median