Commit a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves that binary tests doesn't hang forever. However, scripts
can still hang forever, this adds an timeout to each test script run. This
assumes that an individual test doesn't take longer than 30 seconds.
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/kselftest/runner.sh | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 00c9020bdda8..cff7d2d83648 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -5,6 +5,7 @@
export skip_rc=4
export logfile=/dev/stdout
export per_test_logging=
+export TEST_TIMEOUT_DEFAULT=30
# There isn't a shell-agnostic way to find the path of a sourced file,
# so we must rely on BASE_DIR being set to find other tools.
@@ -24,6 +25,14 @@ tap_prefix()
fi
}
+tap_timeout()
+{
+ if [ -x /usr/bin/timeout ] && [ -x "$BASENAME_TEST" ] \
+ && file $BASENAME_TEST |grep -q "shell script"; then
+ echo -n "timeout $TEST_TIMEOUT_DEFAULT"
+ fi
+}
+
run_one()
{
DIR="$1"
@@ -44,7 +53,7 @@ run_one()
echo "not ok $test_num $TEST_HDR_MSG"
else
cd `dirname $TEST` > /dev/null
- (((((./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.11.0
Hello,
The proposed introduction of a relaxed ARM64 ABI [1] will allow tagged memory
addresses to be passed through the user-kernel syscall ABI boundary. Tagged
memory addresses are those which contain a non-zero top byte (the hardware
has always ignored this top byte due to TCR_EL1.TBI0) and may be useful
for features such as HWASan.
To permit this relaxation a proposed patchset [2] strips the top byte (tag)
from user provided memory addresses prior to use in kernel functions which
require untagged addresses (for example comparasion/arithmetic of addresses).
The author of this patchset relied on a variety of techniques [2] (such as
grep, BUG_ON, sparse etc) to identify as many instances of possible where
tags need to be stipped.
To support this effort and to catch future regressions (e.g. in new syscalls
or ioctls), I've devised an additional approach for detecting the use of
tagged addresses in functions that do not want them. This approach makes
use of Smatch [3] and is outlined in this RFC. Due to the ability of Smatch
to do flow analysis I believe we can annotate the kernel in fewer places
than a similar approach in sparse.
I'm keen for feedback on the likely usefulness of this approach.
We first add some new annotations that are exclusively consumed by Smatch:
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -19,6 +19,7 @@
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
# define __rcu __attribute__((noderef, address_space(4)))
+# define __untagged __attribute__((address_space(5)))
# define __private __attribute__((noderef))
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
@@ -45,6 +46,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
...
The purpose of this annotation is to indicate in function prototypes that a
given argument must not be a user tagged memory address. (The address space
number isn't significant here and could be replaced with any other annotation
that we get Smatch to understand).
An example of how we use this annotation is as follows:
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2224,7 +2224,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
EXPORT_SYMBOL(get_unmapped_area);
/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
-struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long __untagged addr)
{
struct rb_node *rb_node;
struct vm_area_struct *vma;
Thus our intent here is to flag up whenever addr is tagged. Specifically
modifications I've made to Smatch to test this will flag an issue where all
the following conditions are met:
1. the parameter is used in the function
2. the data in the parameter has originated or been derived from userspace
(this relies on existing Smatch functionality to detect where data has
come from)
3. the data's top byte is non-zero (via flow analysis to determine the
range of values that it may be given the known call-tree).
Due to the use of smatch and its flow-analysis we don't need to propogate the
__untagged annotation up the call chain to the callers and their callers - thus
allowing us to only annotate functions that actually does something with the address
and it's only necessary if the function has the potential to receive user data. I.e.
we only need to tag find_vma to catch an issue with mmap_region (because it calls
count_vma_pages_range which calls find_vma_intersection which calls find_vma).
Due to condition 3 above, the use of the existing untagged_addr macro (or anything
that does something similar) will prevent smatch from producing a warning.
Using a vanilla (v5.2-rc2) kernel, and a single find_vma annotation, smatch will
produce the following warnings:
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
...
The warning is printed for each call site that calls find_vma with a tagged
address from userspace. After 6 runs of smatch, 24 warnings are produced.
The warnings are helpful in detecting issues, but not useful in providing enough
information to debug the issue and find the offending functions higher up the
call stack that call find_vma. Smatch is good at determining the ranges of values
that can be passed to a function, but it doesn't keep track of how it determined
those ranges - this makes it difficult to determine the offending function.
However even this level of limited functionality is helpful - as once the kernel
is initially sanitized of tagged addresses, then the use of smatch here can
spot regressions and offending code identified via git aiaiai/bisect.
Smatch builds a database which includes a table of functions, who they are
called by and with what range of parameters. Smatch also provides a bunch
of perl and python scripts which can be used to extract helpful information
for example to produce a call tree for a given function. I've adapted these
scripts such that for a given function (e.g. find_vma) it will show you the
call tree where callers pass user data and where that data is tagged addresses.
The output looks something like this:
find_vma() - 0-u64max (1)
kvm_arch_prepare_memory_region() - 0-u64max (1)
__kvm_set_memory_region() - 0-u64max (1)
kvm_set_memory_region() - 0-u64max (1)
kvm_vm_ioctl_set_memory_region() - 0-u64max (1)
hugetlb_get_unmapped_area() - 0-u64max (1)
shm_get_unmapped_area() - 0-u64max (1)
shm_get_unmapped_area() - 32785,40977,98321,106513,2097151-u64max[c] (1)
...
In summary the following are found, note this currently unhelpfuly includes
functions inbetween find_vma and the leaf functions:
$ cat find_vma_tree_orig | sed -e 's/^[ \t]*//' | cut -d ' ' -f 1 | sort | uniq
call_mmap()
check_and_migrate_cma_pages()
compat_ipv6_getsockopt()
compat_sock_common_getsockopt()
compat_tcp_getsockopt()
count_vma_pages_range()
__do_compat_sys_get_mempolicy()
do_get_mempolicy()
do_ioctl()
do_mincore()
do_mlock()
do_mmap()
do_mmap_pgoff()
...
As you can see, this gives a good point in the right direction for hunting
down callers of find_vma with tagged addresses.
This can be further improved - the problem here is that for a given function,
e.g. find_vma we look for callers where *any* of the parameters
passed to find_vma are tagged addresses from userspace - i.e. not *just*
the annotated parameter. This is also true for find_vma's callers' callers'.
This results in the call tree having false positives.
It *is* possible to track parameters (e.g. find_vma arg 1 comes from arg 3 of
do_pages_stat_array etc), but this is limited as if functions modify the
data then the tracking is stopped (however this can be fixed).
After apply the patchset ([PATCH v16 00/16] arm64: untag user pointers passed
to the kernel)[2] which untags user addresses, smatch indicates 11 issues. The
call tree is reduced. After grep'ing the call tree output, there are some valid
instances where untagging is needed, e.g:
gntdev_ioctl_get_offset_for_vaddr()
kvm_vm_ioctl_set_memory_region()
privcmd_ioctl_mmap_batch()
privcmd_ioctl_mmap_resource()
__se_sys_brk()
__se_sys_mremap()
__se_sys_munmap()
__se_sys_remap_file_pages()
__se_sys_shmat()
__se_sys_shmdt()
__vm_munmap()
...
An example of a false positve is do_mlock. We untag the address and pass that
to apply_vma_lock_flags - however we also pass a length - because the length
came from userspace and could have the top bits set - it's flagged. However
with improved parameter tracking we can remove this false positive and similar.
Prior to smatch I attempted a similar approach with sparse - however it seemed
necessary to propogate the __untagged annotation in every function up the call tree,
and resulted in adding the __untagged annotation to functions that would never
get near user provided data. This leads to a littering of __untagged all over the
kernel which doesn't seem appealing. Smatch is more capable, however it almost
certainly won't pick up 100% of issues due to the difficulity of making flow
analysis understand everything a compiler can.
Is it likely to be acceptable to use the __untagged annotation in user-path
functions that require untagged addresses across the kernel?
Thanks,
Andrew Murray
[1] https://lkml.org/lkml/2019/6/13/534
[2] https://patchwork.kernel.org/cover/10989517/
[3] http://smatch.sourceforge.net/
vDSO (virtual dynamic shared object) is a mechanism that the Linux
kernel provides as an alternative to system calls to reduce where
possible the costs in terms of cycles.
This is possible because certain syscalls like gettimeofday() do
not write any data and return one or more values that are stored
in the kernel, which makes relatively safe calling them directly
as a library function.
Even if the mechanism is pretty much standard, every architecture
in the last few years ended up implementing their own vDSO library
in the architectural code.
The purpose of this patch-set is to identify the commonalities in
between the architectures and try to consolidate the common code
paths, starting with gettimeofday().
This implementation contains the following design choices:
* Every architecture defines the arch specific code in an header in
"asm/vdso/".
* The generic implementation includes the arch specific one and lives
in "lib/vdso".
* The arch specific code for gettimeofday lives in
"<arch path>/vdso/gettimeofday.c" and includes the generic code only.
* The generic implementation of update_vsyscall and update_vsyscall_tz
lives in kernel/vdso and provide the bindings that can be implemented
by each architecture.
* Each architecture provides its implementation of the bindings in
"asm/vdso/vsyscall.h".
* This approach allows to consolidate the common code in a single place
with the benefit of avoiding code duplication.
This implementation contains the portings to the common library for: arm64,
compat mode for arm64, arm, mips, x86_64, x32, compat mode for x86_64 and
i386.
The mips porting has been tested on qemu for mips32el. A configuration to
repeat the tests can be found at [4].
The x86_64 porting has been tested on an Intel Xeon 5120T based machine
running Ubuntu 18.04 and using the Ubuntu provided defconfig.
The i386 porting has been tested on qemu using the i386_defconfig
configuration.
Last but not least from this porting arm64, compat arm64, arm and mips gain
the support for:
* CLOCK_BOOTTIME that can be useful in certain scenarios since it keeps
track of the time during sleep as well.
* CLOCK_TAI that is like CLOCK_REALTIME, but uses the International
Atomic Time (TAI) reference instead of UTC to avoid jumping on leap
second updates.
for both clock_gettime and clock_getres.
The porting has been validated using the vdsotest test-suite [1] extended
to cover all the clock ids [2].
A new test has been added to the linux kselftest in order to validate the
newly added library.
The porting has been benchmarked and the performance results are
provided as part of this cover letter.
To simplify the testing, a copy of the patchset on top of a recent linux
tree can be found at [3] and [4].
[1] https://github.com/nathanlynch/vdsotest
[2] https://github.com/fvincenzo/vdsotest
[3] git://linux-arm.org/linux-vf.git vdso/v6
[4] git://linux-arm.org/linux-vf.git vdso-mips/v6
Changes:
--------
v6:
- Rebased on 5.2-rc2.
- Added performance numbers.
- Removed vdso_types.h.
- Unified update_vsyscall and update_vsyscall_tz.
- Reworked the kselftest included in this patchset.
- Addressed review comments.
v5:
- Rebased on 5.0-rc7.
- Added x86_64, compat mode for x86_64 and i386 portings.
- Extended vDSO kselftest.
- Addressed review comments.
v4:
- Rebased on 5.0-rc2.
- Addressed review comments.
- Disabled compat vdso on arm64 when the kernel is compiled with
clang.
v3:
- Ported the latest fixes and optimizations done on the x86
architecture to the generic library.
- Addressed review comments.
- Improved the documentation of the interfaces.
- Changed the HAVE_ARCH_TIMER config option to a more generic
HAVE_HW_COUNTER.
v2:
- Added -ffixed-x18 to arm64
- Repleced occurrences of timeval and timespec
- Modified datapage.h to be compliant with y2038 on all the architectures
- Removed __u_vdso type
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)mips.com>
Cc: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Mark Salyzyn <salyzyn(a)android.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Huw Davies <huw(a)codeweavers.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Performance Numbers: Linux 5.2.0-rc2 - Xeon Gold 5120T
======================================================
Unified vDSO:
-------------
clock-gettime-monotonic: syscall: 342 nsec/call
clock-gettime-monotonic: libc: 25 nsec/call
clock-gettime-monotonic: vdso: 24 nsec/call
clock-getres-monotonic: syscall: 296 nsec/call
clock-getres-monotonic: libc: 296 nsec/call
clock-getres-monotonic: vdso: 3 nsec/call
clock-gettime-monotonic-coarse: syscall: 294 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 5 nsec/call
clock-getres-monotonic-coarse: syscall: 295 nsec/call
clock-getres-monotonic-coarse: libc: 292 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 343 nsec/call
clock-gettime-monotonic-raw: libc: 25 nsec/call
clock-gettime-monotonic-raw: vdso: 23 nsec/call
clock-getres-monotonic-raw: syscall: 290 nsec/call
clock-getres-monotonic-raw: libc: 290 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 332 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 288 nsec/call
clock-getres-tai: libc: 288 nsec/call
clock-getres-tai: vdso: 3 nsec/call
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 24 nsec/call
clock-gettime-boottime: vdso: 23 nsec/call
clock-getres-boottime: syscall: 284 nsec/call
clock-getres-boottime: libc: 291 nsec/call
clock-getres-boottime: vdso: 3 nsec/call
clock-gettime-realtime: syscall: 337 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 23 nsec/call
clock-getres-realtime: syscall: 287 nsec/call
clock-getres-realtime: libc: 284 nsec/call
clock-getres-realtime: vdso: 3 nsec/call
clock-gettime-realtime-coarse: syscall: 307 nsec/call
clock-gettime-realtime-coarse: libc: 4 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 294 nsec/call
clock-getres-realtime-coarse: libc: 291 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 246 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 26 nsec/call
gettimeofday: vdso: 25 nsec/call
Stock Kernel:
-------------
clock-gettime-monotonic: syscall: 338 nsec/call
clock-gettime-monotonic: libc: 24 nsec/call
clock-gettime-monotonic: vdso: 23 nsec/call
clock-getres-monotonic: syscall: 291 nsec/call
clock-getres-monotonic: libc: 304 nsec/call
clock-getres-monotonic: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-coarse: syscall: 297 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 4 nsec/call
clock-getres-monotonic-coarse: syscall: 281 nsec/call
clock-getres-monotonic-coarse: libc: 286 nsec/call
clock-getres-monotonic-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-raw: syscall: 336 nsec/call
clock-gettime-monotonic-raw: libc: 340 nsec/call
clock-gettime-monotonic-raw: vdso: 346 nsec/call
clock-getres-monotonic-raw: syscall: 297 nsec/call
clock-getres-monotonic-raw: libc: 301 nsec/call
clock-getres-monotonic-raw: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-tai: syscall: 351 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 298 nsec/call
clock-getres-tai: libc: 290 nsec/call
clock-getres-tai: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 347 nsec/call
clock-gettime-boottime: vdso: 355 nsec/call
clock-getres-boottime: syscall: 296 nsec/call
clock-getres-boottime: libc: 295 nsec/call
clock-getres-boottime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime: syscall: 346 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 22 nsec/call
clock-getres-realtime: syscall: 295 nsec/call
clock-getres-realtime: libc: 291 nsec/call
clock-getres-realtime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime-coarse: syscall: 292 nsec/call
clock-gettime-realtime-coarse: libc: 5 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 300 nsec/call
clock-getres-realtime-coarse: libc: 301 nsec/call
clock-getres-realtime-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
getcpu: syscall: 252 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 24 nsec/call
gettimeofday: vdso: 25 nsec/call
Peter Collingbourne (1):
arm64: Build vDSO with -ffixed-x18
Vincenzo Frascino (18):
kernel: Standardize vdso_datapage
kernel: Define gettimeofday vdso common code
kernel: Unify update_vsyscall implementation
arm64: Substitute gettimeofday with C implementation
arm64: compat: Add missing syscall numbers
arm64: compat: Expose signal related structures
arm64: compat: Generate asm offsets for signals
lib: vdso: Add compat support
arm64: compat: Add vDSO
arm64: Refactor vDSO code
arm64: compat: vDSO setup for compat layer
arm64: elf: vDSO code page discovery
arm64: compat: Get sigreturn trampolines from vDSO
arm64: Add vDSO compat support
arm: Add support for generic vDSO
mips: Add support for generic vDSO
x86: Add support for generic vDSO
kselftest: Extend vDSO selftest
arch/arm/Kconfig | 3 +
arch/arm/include/asm/vdso/gettimeofday.h | 96 +++++
arch/arm/include/asm/vdso/vsyscall.h | 71 ++++
arch/arm/include/asm/vdso_datapage.h | 29 +-
arch/arm/kernel/vdso.c | 87 +----
arch/arm/vdso/Makefile | 13 +-
arch/arm/vdso/note.c | 15 +
arch/arm/vdso/vdso.lds.S | 2 +
arch/arm/vdso/vgettimeofday.c | 268 +------------
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 23 +-
arch/arm64/include/asm/elf.h | 14 +
arch/arm64/include/asm/signal32.h | 46 +++
arch/arm64/include/asm/unistd.h | 5 +
arch/arm64/include/asm/vdso.h | 3 +
arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++
.../include/asm/vdso/compat_gettimeofday.h | 108 ++++++
arch/arm64/include/asm/vdso/gettimeofday.h | 84 +++++
arch/arm64/include/asm/vdso/vsyscall.h | 53 +++
arch/arm64/include/asm/vdso_datapage.h | 48 ---
arch/arm64/kernel/Makefile | 6 +-
arch/arm64/kernel/asm-offsets.c | 39 +-
arch/arm64/kernel/signal32.c | 72 ++--
arch/arm64/kernel/vdso.c | 356 ++++++++++++------
arch/arm64/kernel/vdso/Makefile | 34 +-
arch/arm64/kernel/vdso/gettimeofday.S | 334 ----------------
arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++
arch/arm64/kernel/vdso32/.gitignore | 2 +
arch/arm64/kernel/vdso32/Makefile | 184 +++++++++
arch/arm64/kernel/vdso32/note.c | 15 +
arch/arm64/kernel/vdso32/sigreturn.S | 62 +++
arch/arm64/kernel/vdso32/vdso.S | 19 +
arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++
arch/arm64/kernel/vdso32/vgettimeofday.c | 59 +++
arch/mips/Kconfig | 2 +
arch/mips/include/asm/vdso.h | 78 +---
arch/mips/include/asm/vdso/gettimeofday.h | 175 +++++++++
arch/mips/{ => include/asm}/vdso/vdso.h | 6 +-
arch/mips/include/asm/vdso/vsyscall.h | 43 +++
arch/mips/kernel/vdso.c | 37 +-
arch/mips/vdso/Makefile | 25 +-
arch/mips/vdso/elf.S | 2 +-
arch/mips/vdso/gettimeofday.c | 273 --------------
arch/mips/vdso/sigreturn.S | 2 +-
arch/mips/vdso/vdso.lds.S | 4 +
arch/mips/vdso/vgettimeofday.c | 57 +++
arch/x86/Kconfig | 3 +
arch/x86/entry/vdso/Makefile | 9 +
arch/x86/entry/vdso/vclock_gettime.c | 251 +++---------
arch/x86/entry/vdso/vdso.lds.S | 2 +
arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
arch/x86/entry/vdso/vdsox32.lds.S | 1 +
arch/x86/entry/vsyscall/Makefile | 2 -
arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 ----
arch/x86/include/asm/mshyperv-tsc.h | 76 ++++
arch/x86/include/asm/mshyperv.h | 70 +---
arch/x86/include/asm/pvclock.h | 2 +-
arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++
arch/x86/include/asm/vdso/vsyscall.h | 44 +++
arch/x86/include/asm/vgtod.h | 75 +---
arch/x86/include/asm/vvar.h | 7 +-
arch/x86/kernel/pvclock.c | 1 +
include/asm-generic/vdso/vsyscall.h | 56 +++
include/linux/hrtimer.h | 15 +-
include/linux/hrtimer_defs.h | 25 ++
include/linux/timekeeper_internal.h | 9 +
include/vdso/datapage.h | 91 +++++
include/vdso/helpers.h | 56 +++
include/vdso/vsyscall.h | 11 +
kernel/Makefile | 1 +
kernel/vdso/Makefile | 2 +
kernel/vdso/vsyscall.c | 139 +++++++
lib/Kconfig | 5 +
lib/vdso/Kconfig | 36 ++
lib/vdso/Makefile | 22 ++
lib/vdso/gettimeofday.c | 229 +++++++++++
tools/testing/selftests/vDSO/Makefile | 2 +
tools/testing/selftests/vDSO/vdso_full_test.c | 261 +++++++++++++
78 files changed, 3042 insertions(+), 1767 deletions(-)
create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm/include/asm/vdso/vsyscall.h
create mode 100644 arch/arm/vdso/note.c
create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h
create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h
delete mode 100644 arch/arm64/include/asm/vdso_datapage.h
delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
create mode 100644 arch/arm64/kernel/vdso32/.gitignore
create mode 100644 arch/arm64/kernel/vdso32/Makefile
create mode 100644 arch/arm64/kernel/vdso32/note.c
create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S
create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h
rename arch/mips/{ => include/asm}/vdso/vdso.h (90%)
create mode 100644 arch/mips/include/asm/vdso/vsyscall.h
delete mode 100644 arch/mips/vdso/gettimeofday.c
create mode 100644 arch/mips/vdso/vgettimeofday.c
delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
create mode 100644 include/asm-generic/vdso/vsyscall.h
create mode 100644 include/linux/hrtimer_defs.h
create mode 100644 include/vdso/datapage.h
create mode 100644 include/vdso/helpers.h
create mode 100644 include/vdso/vsyscall.h
create mode 100644 kernel/vdso/Makefile
create mode 100644 kernel/vdso/vsyscall.c
create mode 100644 lib/vdso/Kconfig
create mode 100644 lib/vdso/Makefile
create mode 100644 lib/vdso/gettimeofday.c
create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c
--
2.21.0
This patch series moves Hyper-V clock/timer code to a separate Hyper-V
clocksource driver. Previously, Hyper-V clock/timer code and data
structures were mixed in with other Hyper-V code in the ISA independent
drivers/hv code as well as in ISA dependent code. The new Hyper-V
clocksource driver is ISA independent, with a just few dependencies on
ISA specific functions. The patch series does not change any behavior
or functionality -- it only reorganizes the existing code and fixes up
the linkages. A few places outside of Hyper-V code are fixed up to use
the new #include file structure.
This restructuring is in response to Marc Zyngier's review comments
on supporting Hyper-V running on ARM64, and is a good idea in general.
It increases the amount of code shared between the x86 and ARM64
architectures, and reduces the size of the new code for supporting
Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
follow once this clocksource restructuring is accepted.
The code is diff'ed against the upstream tip tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso
Changes in v4:
* Revised commit messages
* Rebased to upstream tip tree
Changes in v3:
* Removed boolean argument to hv_init_clocksource(). Always call
sched_clock_register, which is needed on ARM64 but a no-op on x86.
* Removed separate cpuhp setup in hv_stimer_alloc() and instead
directly call hv_stimer_init() and hv_stimer_cleanup() from
corresponding VMbus functions. This more closely matches original
code and avoids clocksource stop/restart problems on ARM64 when
VMbus code denies CPU offlining request.
Changes in v2:
* Revised commit short descriptions so the distinction between
the first and second patches is clearer [GregKH]
* Renamed new clocksource driver files and functions to use
existing "timer" and "stimer" names instead of introducing
"syntimer". [Vitaly Kuznetsov]
* Introduced CONFIG_HYPER_TIMER to fix build problem when
CONFIG_HYPERV=m [Vitaly Kuznetsov]
* Added "Suggested-by: Marc Zyngier"
Michael Kelley (2):
Drivers: hv: Create Hyper-V clocksource driver from existing
clockevents code
Drivers: hv: Move Hyper-V clocksource code to new clocksource driver
MAINTAINERS | 2 +
arch/x86/entry/vdso/vma.c | 2 +-
arch/x86/hyperv/hv_init.c | 91 +--------
arch/x86/include/asm/hyperv-tlfs.h | 6 +
arch/x86/include/asm/mshyperv.h | 81 ++------
arch/x86/include/asm/vdso/gettimeofday.h | 2 +-
arch/x86/kernel/cpu/mshyperv.c | 2 +
arch/x86/kvm/x86.c | 1 +
drivers/clocksource/Makefile | 1 +
drivers/clocksource/hyperv_timer.c | 321 +++++++++++++++++++++++++++++++
drivers/hv/Kconfig | 3 +
drivers/hv/hv.c | 156 +--------------
drivers/hv/hyperv_vmbus.h | 3 -
drivers/hv/vmbus_drv.c | 42 ++--
include/clocksource/hyperv_timer.h | 105 ++++++++++
15 files changed, 483 insertions(+), 335 deletions(-)
create mode 100644 drivers/clocksource/hyperv_timer.c
create mode 100644 include/clocksource/hyperv_timer.h
--
1.8.3.1
The blockdev book basically contains user-faced documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
---
.../blockdev/drbd/DRBD-8.3-data-packets.svg | 0
.../blockdev/drbd/DRBD-data-packets.svg | 0
.../blockdev/drbd/conn-states-8.dot | 0
.../blockdev/drbd/data-structure-v9.rst | 0
.../blockdev/drbd/disk-states-8.dot | 0
.../drbd/drbd-connection-state-overview.dot | 0
.../blockdev/drbd/figures.rst | 0
.../{ => admin-guide}/blockdev/drbd/index.rst | 0
.../blockdev/drbd/node-states-8.dot | 1 -
.../{ => admin-guide}/blockdev/floppy.rst | 0
.../{ => admin-guide}/blockdev/index.rst | 2 --
.../{ => admin-guide}/blockdev/nbd.rst | 0
.../{ => admin-guide}/blockdev/paride.rst | 0
.../{ => admin-guide}/blockdev/ramdisk.rst | 0
.../{ => admin-guide}/blockdev/zram.rst | 0
Documentation/admin-guide/index.rst | 1 +
.../admin-guide/kernel-parameters.txt | 18 +++++++++---------
MAINTAINERS | 10 +++++-----
drivers/block/Kconfig | 8 ++++----
drivers/block/floppy.c | 2 +-
drivers/block/zram/Kconfig | 6 +++---
tools/testing/selftests/zram/README | 2 +-
22 files changed, 24 insertions(+), 26 deletions(-)
rename Documentation/{ => admin-guide}/blockdev/drbd/DRBD-8.3-data-packets.svg (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/DRBD-data-packets.svg (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/conn-states-8.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/data-structure-v9.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/disk-states-8.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/drbd-connection-state-overview.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/figures.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/index.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/node-states-8.dot (99%)
rename Documentation/{ => admin-guide}/blockdev/floppy.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/index.rst (94%)
rename Documentation/{ => admin-guide}/blockdev/nbd.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/paride.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/ramdisk.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/zram.rst (100%)
diff --git a/Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
similarity index 100%
rename from Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
rename to Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
diff --git a/Documentation/blockdev/drbd/DRBD-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
similarity index 100%
rename from Documentation/blockdev/drbd/DRBD-data-packets.svg
rename to Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
diff --git a/Documentation/blockdev/drbd/conn-states-8.dot b/Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
similarity index 100%
rename from Documentation/blockdev/drbd/conn-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
diff --git a/Documentation/blockdev/drbd/data-structure-v9.rst b/Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
similarity index 100%
rename from Documentation/blockdev/drbd/data-structure-v9.rst
rename to Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
diff --git a/Documentation/blockdev/drbd/disk-states-8.dot b/Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
similarity index 100%
rename from Documentation/blockdev/drbd/disk-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
diff --git a/Documentation/blockdev/drbd/drbd-connection-state-overview.dot b/Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
similarity index 100%
rename from Documentation/blockdev/drbd/drbd-connection-state-overview.dot
rename to Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/admin-guide/blockdev/drbd/figures.rst
similarity index 100%
rename from Documentation/blockdev/drbd/figures.rst
rename to Documentation/admin-guide/blockdev/drbd/figures.rst
diff --git a/Documentation/blockdev/drbd/index.rst b/Documentation/admin-guide/blockdev/drbd/index.rst
similarity index 100%
rename from Documentation/blockdev/drbd/index.rst
rename to Documentation/admin-guide/blockdev/drbd/index.rst
diff --git a/Documentation/blockdev/drbd/node-states-8.dot b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
similarity index 99%
rename from Documentation/blockdev/drbd/node-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/node-states-8.dot
index 4a2b00c23547..bfa54e1f8016 100644
--- a/Documentation/blockdev/drbd/node-states-8.dot
+++ b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
@@ -11,4 +11,3 @@ digraph peer_states {
Unknown -> Primary [ label = "connected" ]
Unknown -> Secondary [ label = "connected" ]
}
-
diff --git a/Documentation/blockdev/floppy.rst b/Documentation/admin-guide/blockdev/floppy.rst
similarity index 100%
rename from Documentation/blockdev/floppy.rst
rename to Documentation/admin-guide/blockdev/floppy.rst
diff --git a/Documentation/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
similarity index 94%
rename from Documentation/blockdev/index.rst
rename to Documentation/admin-guide/blockdev/index.rst
index a9af6ed8b4aa..20a738d9d047 100644
--- a/Documentation/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
===========================
The Linux RapidIO Subsystem
===========================
diff --git a/Documentation/blockdev/nbd.rst b/Documentation/admin-guide/blockdev/nbd.rst
similarity index 100%
rename from Documentation/blockdev/nbd.rst
rename to Documentation/admin-guide/blockdev/nbd.rst
diff --git a/Documentation/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst
similarity index 100%
rename from Documentation/blockdev/paride.rst
rename to Documentation/admin-guide/blockdev/paride.rst
diff --git a/Documentation/blockdev/ramdisk.rst b/Documentation/admin-guide/blockdev/ramdisk.rst
similarity index 100%
rename from Documentation/blockdev/ramdisk.rst
rename to Documentation/admin-guide/blockdev/ramdisk.rst
diff --git a/Documentation/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
similarity index 100%
rename from Documentation/blockdev/zram.rst
rename to Documentation/admin-guide/blockdev/zram.rst
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 65e821a03aca..c073af461cdf 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -73,6 +73,7 @@ configure specific aspects of kernel behavior to your liking.
java
ras
bcache
+ blockdev/index
ext4
pm/index
thunderbolt
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9b535c0e22f3..49ad034c4675 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
See also Documentation/fault-injection/.
floppy= [HW]
- See Documentation/blockdev/floppy.rst.
+ See Documentation/admin-guide/blockdev/floppy.rst.
force_pal_cache_flush
[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2247,7 +2247,7 @@
memblock=debug [KNL] Enable memblock debug messages.
load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -3294,7 +3294,7 @@
pcd. [PARIDE]
See header of drivers/block/paride/pcd.c.
- See also Documentation/blockdev/paride.rst.
+ See also Documentation/admin-guide/blockdev/paride.rst.
pci=option[,option...] [PCI] various PCI subsystem options.
@@ -3538,7 +3538,7 @@
needed on a platform with proper driver support.
pd. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
@@ -3553,10 +3553,10 @@
and performance comparison.
pf. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pg. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pirq= [SMP,APIC] Manual mp-table setup
See Documentation/x86/i386/IO-APIC.rst.
@@ -3668,7 +3668,7 @@
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
psi= [KNL] Enable or disable pressure stall information
tracking.
@@ -3690,7 +3690,7 @@
pstore.backend= Specify the name of the pstore backend to use
pt. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pti= [X86_64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
@@ -3719,7 +3719,7 @@
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
diff --git a/MAINTAINERS b/MAINTAINERS
index 4c622a19ab7d..3f0f654d1166 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4974,7 +4974,7 @@ T: git git://git.linbit.com/drbd-8.4.git
S: Supported
F: drivers/block/drbd/
F: lib/lru_cache.c
-F: Documentation/blockdev/drbd/
+F: Documentation/admin-guide/blockdev/
DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS
M: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
@@ -11024,7 +11024,7 @@ M: Josef Bacik <josef(a)toxicpanda.com>
S: Maintained
L: linux-block(a)vger.kernel.org
L: nbd(a)other.debian.org
-F: Documentation/blockdev/nbd.rst
+F: Documentation/admin-guide/blockdev/nbd.rst
F: drivers/block/nbd.c
F: include/trace/events/nbd.h
F: include/uapi/linux/nbd.h
@@ -12028,7 +12028,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
M: Tim Waugh <tim(a)cyberelk.net>
L: linux-parport(a)lists.infradead.org (subscribers-only)
S: Maintained
-F: Documentation/blockdev/paride.rst
+F: Documentation/admin-guide/blockdev/paride.rst
F: drivers/block/paride/
PARISC ARCHITECTURE
@@ -13310,7 +13310,7 @@ F: drivers/net/wireless/ralink/rt2x00/
RAMDISK RAM BLOCK DEVICE DRIVER
M: Jens Axboe <axboe(a)kernel.dk>
S: Maintained
-F: Documentation/blockdev/ramdisk.rst
+F: Documentation/admin-guide/blockdev/ramdisk.rst
F: drivers/block/brd.c
RANCHU VIRTUAL BOARD FOR MIPS
@@ -17672,7 +17672,7 @@ R: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
L: linux-kernel(a)vger.kernel.org
S: Maintained
F: drivers/block/zram/
-F: Documentation/blockdev/zram.rst
+F: Documentation/admin-guide/blockdev/zram.rst
ZS DECSTATION Z85C30 SERIAL DRIVER
M: "Maciej W. Rozycki" <macro(a)linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index c43690b973d8..1bb8ec575352 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
If you want to use the floppy disk drive(s) of your PC under Linux,
say Y. Information about this driver, especially important for IBM
Thinkpad users, is contained in
- <file:Documentation/blockdev/floppy.rst>.
+ <file:Documentation/admin-guide/blockdev/floppy.rst>.
That file also contains the location of the Floppy driver FAQ as
well as location of the fdutils package used to configure additional
parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
your computer's parallel port. Most of them are actually IDE devices
using a parallel port IDE adapter. This option enables the PARIDE
subsystem which contains drivers for many of these external drives.
- Read <file:Documentation/blockdev/paride.rst> for more information.
+ Read <file:Documentation/admin-guide/blockdev/paride.rst> for more information.
If you have said Y to the "Parallel-port support" configuration
option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
userland (making server and client physically the same computer,
communicating using the loopback network device).
- Read <file:Documentation/blockdev/nbd.rst> for more information,
+ Read <file:Documentation/admin-guide/blockdev/nbd.rst> for more information,
especially about where to find the server code, which runs in user
space and does not need special kernel support.
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
during the initial install of Linux.
Note that the kernel command line option "ramdisk=XX" is now obsolete.
- For details, read <file:Documentation/blockdev/ramdisk.rst>.
+ For details, read <file:Documentation/admin-guide/blockdev/ramdisk.rst>.
To compile this driver as a module, choose M here: the
module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 5c99e52f9dc1..f652c1ac3ae9 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
pr_cont("\n");
} else
DPRINT("botched floppy option\n");
- DPRINT("Read Documentation/blockdev/floppy.rst\n");
+ DPRINT("Read Documentation/admin-guide/blockdev/floppy.rst\n");
return 0;
}
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index e06b99d54816..fe7a4b7d30cf 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
It has several use cases, for example: /tmp storage, use as swap
disks and maybe many more.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
config ZRAM_WRITEBACK
bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
With /sys/block/zramX/{idle,writeback}, application could ask
idle page's writeback to the backing device to save in memory.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
config ZRAM_MEMORY_TRACKING
bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
of zRAM. Admin could see the information via
/sys/kernel/debug/zram/zramX/block_state.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 5fa378391d3b..110b34834a6f 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
- mkfs/ mkfs.ext4
For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.rst
+kernel-source-tree/Documentation/admin-guide/blockdev/zram.rst
--
2.21.0
Rename the blockdev documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
The drbd sub-directory contains some graphs and data flows.
Add those too to the documentation.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
---
.../admin-guide/kernel-parameters.txt | 18 +-
...structure-v9.txt => data-structure-v9.rst} | 6 +-
Documentation/blockdev/drbd/figures.rst | 28 +++
.../blockdev/drbd/{README.txt => index.rst} | 15 +-
.../blockdev/{floppy.txt => floppy.rst} | 88 ++++----
Documentation/blockdev/index.rst | 16 ++
Documentation/blockdev/{nbd.txt => nbd.rst} | 2 +-
.../blockdev/{paride.txt => paride.rst} | 194 +++++++++--------
.../blockdev/{ramdisk.txt => ramdisk.rst} | 55 ++---
Documentation/blockdev/{zram.txt => zram.rst} | 195 ++++++++++++------
MAINTAINERS | 8 +-
drivers/block/Kconfig | 8 +-
drivers/block/floppy.c | 2 +-
drivers/block/zram/Kconfig | 6 +-
tools/testing/selftests/zram/README | 2 +-
15 files changed, 398 insertions(+), 245 deletions(-)
rename Documentation/blockdev/drbd/{data-structure-v9.txt => data-structure-v9.rst} (94%)
create mode 100644 Documentation/blockdev/drbd/figures.rst
rename Documentation/blockdev/drbd/{README.txt => index.rst} (55%)
rename Documentation/blockdev/{floppy.txt => floppy.rst} (81%)
create mode 100644 Documentation/blockdev/index.rst
rename Documentation/blockdev/{nbd.txt => nbd.rst} (96%)
rename Documentation/blockdev/{paride.txt => paride.rst} (81%)
rename Documentation/blockdev/{ramdisk.txt => ramdisk.rst} (84%)
rename Documentation/blockdev/{zram.txt => zram.rst} (76%)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 92d335837c52..7ed94527a4a0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
See also Documentation/fault-injection/.
floppy= [HW]
- See Documentation/blockdev/floppy.txt.
+ See Documentation/blockdev/floppy.rst.
force_pal_cache_flush
[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2247,7 +2247,7 @@
memblock=debug [KNL] Enable memblock debug messages.
load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -3294,7 +3294,7 @@
pcd. [PARIDE]
See header of drivers/block/paride/pcd.c.
- See also Documentation/blockdev/paride.txt.
+ See also Documentation/blockdev/paride.rst.
pci=option[,option...] [PCI] various PCI subsystem options.
@@ -3538,7 +3538,7 @@
needed on a platform with proper driver support.
pd. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
@@ -3553,10 +3553,10 @@
and performance comparison.
pf. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pg. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pirq= [SMP,APIC] Manual mp-table setup
See Documentation/x86/i386/IO-APIC.rst.
@@ -3668,7 +3668,7 @@
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
psi= [KNL] Enable or disable pressure stall information
tracking.
@@ -3690,7 +3690,7 @@
pstore.backend= Specify the name of the pstore backend to use
pt. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pti= [X86_64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
@@ -3719,7 +3719,7 @@
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
diff --git a/Documentation/blockdev/drbd/data-structure-v9.txt b/Documentation/blockdev/drbd/data-structure-v9.rst
similarity index 94%
rename from Documentation/blockdev/drbd/data-structure-v9.txt
rename to Documentation/blockdev/drbd/data-structure-v9.rst
index 1e52a0e32624..66036b901644 100644
--- a/Documentation/blockdev/drbd/data-structure-v9.txt
+++ b/Documentation/blockdev/drbd/data-structure-v9.rst
@@ -1,3 +1,7 @@
+================================
+kernel data structure for DRBD-9
+================================
+
This describes the in kernel data structure for DRBD-9. Starting with
Linux v3.14 we are reorganizing DRBD to use this data structure.
@@ -10,7 +14,7 @@ device is represented by a block device locally.
The DRBD objects are interconnected to form a matrix as depicted below; a
drbd_peer_device object sits at each intersection between a drbd_device and a
-drbd_connection:
+drbd_connection::
/--------------+---------------+.....+---------------\
| resource | device | | device |
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/blockdev/drbd/figures.rst
new file mode 100644
index 000000000000..3e3fd4b8a478
--- /dev/null
+++ b/Documentation/blockdev/drbd/figures.rst
@@ -0,0 +1,28 @@
+.. The here included files are intended to help understand the implementation
+
+Data flows that Relate some functions, and write packets
+========================================================
+
+.. kernel-figure:: DRBD-8.3-data-packets.svg
+ :alt: DRBD-8.3-data-packets.svg
+ :align: center
+
+.. kernel-figure:: DRBD-data-packets.svg
+ :alt: DRBD-data-packets.svg
+ :align: center
+
+
+Sub graphs of DRBD's state transitions
+======================================
+
+.. kernel-figure:: conn-states-8.dot
+ :alt: conn-states-8.dot
+ :align: center
+
+.. kernel-figure:: disk-states-8.dot
+ :alt: disk-states-8.dot
+ :align: center
+
+.. kernel-figure:: node-states-8.dot
+ :alt: node-states-8.dot
+ :align: center
diff --git a/Documentation/blockdev/drbd/README.txt b/Documentation/blockdev/drbd/index.rst
similarity index 55%
rename from Documentation/blockdev/drbd/README.txt
rename to Documentation/blockdev/drbd/index.rst
index 627b0a1bf35e..68ecd5c113e9 100644
--- a/Documentation/blockdev/drbd/README.txt
+++ b/Documentation/blockdev/drbd/index.rst
@@ -1,4 +1,9 @@
+==========================================
+Distributed Replicated Block Device - DRBD
+==========================================
+
Description
+===========
DRBD is a shared-nothing, synchronously replicated block device. It
is designed to serve as a building block for high availability
@@ -7,10 +12,8 @@ Description
Please visit http://www.drbd.org to find out more.
-The here included files are intended to help understand the implementation
+.. toctree::
+ :maxdepth: 1
-DRBD-8.3-data-packets.svg, DRBD-data-packets.svg
- relates some functions, and write packets.
-
-conn-states-8.dot, disk-states-8.dot, node-states-8.dot
- The sub graphs of DRBD's state transitions
+ data-structure-v9
+ figures
diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.rst
similarity index 81%
rename from Documentation/blockdev/floppy.txt
rename to Documentation/blockdev/floppy.rst
index e2240f5ab64d..4a8f31cf4139 100644
--- a/Documentation/blockdev/floppy.txt
+++ b/Documentation/blockdev/floppy.rst
@@ -1,35 +1,37 @@
-This file describes the floppy driver.
+=============
+Floppy Driver
+=============
FAQ list:
=========
- A FAQ list may be found in the fdutils package (see below), and also
+A FAQ list may be found in the fdutils package (see below), and also
at <http://fdutils.linux.lu/faq.html>.
LILO configuration options (Thinkpad users, read this)
======================================================
- The floppy driver is configured using the 'floppy=' option in
+The floppy driver is configured using the 'floppy=' option in
lilo. This option can be typed at the boot prompt, or entered in the
lilo configuration file.
- Example: If your kernel is called linux-2.6.9, type the following line
-at the lilo boot prompt (if you have a thinkpad):
+Example: If your kernel is called linux-2.6.9, type the following line
+at the lilo boot prompt (if you have a thinkpad)::
linux-2.6.9 floppy=thinkpad
You may also enter the following line in /etc/lilo.conf, in the description
-of linux-2.6.9:
+of linux-2.6.9::
append = "floppy=thinkpad"
- Several floppy related options may be given, example:
+Several floppy related options may be given, example::
linux-2.6.9 floppy=daring floppy=two_fdc
append = "floppy=daring floppy=two_fdc"
- If you give options both in the lilo config file and on the boot
+If you give options both in the lilo config file and on the boot
prompt, the option strings of both places are concatenated, the boot
prompt options coming last. That's why there are also options to
restore the default behavior.
@@ -38,21 +40,23 @@ restore the default behavior.
Module configuration options
============================
- If you use the floppy driver as a module, use the following syntax:
-modprobe floppy floppy="<options>"
+If you use the floppy driver as a module, use the following syntax::
-Example:
- modprobe floppy floppy="omnibook messages"
+ modprobe floppy floppy="<options>"
- If you need certain options enabled every time you load the floppy driver,
-you can put:
+Example::
- options floppy floppy="omnibook messages"
+ modprobe floppy floppy="omnibook messages"
+
+If you need certain options enabled every time you load the floppy driver,
+you can put::
+
+ options floppy floppy="omnibook messages"
in a configuration file in /etc/modprobe.d/.
- The floppy driver related options are:
+The floppy driver related options are:
floppy=asus_pci
Sets the bit mask to allow only units 0 and 1. (default)
@@ -70,8 +74,7 @@ in a configuration file in /etc/modprobe.d/.
Tells the floppy driver that you have only one floppy controller.
(default)
- floppy=two_fdc
- floppy=<address>,two_fdc
+ floppy=two_fdc / floppy=<address>,two_fdc
Tells the floppy driver that you have two floppy controllers.
The second floppy controller is assumed to be at <address>.
This option is not needed if the second controller is at address
@@ -84,8 +87,7 @@ in a configuration file in /etc/modprobe.d/.
floppy=0,thinkpad
Tells the floppy driver that you don't have a Thinkpad.
- floppy=omnibook
- floppy=nodma
+ floppy=omnibook / floppy=nodma
Tells the floppy driver not to use Dma for data transfers.
This is needed on HP Omnibooks, which don't have a workable
DMA channel for the floppy driver. This option is also useful
@@ -144,14 +146,16 @@ in a configuration file in /etc/modprobe.d/.
described in the physical CMOS), or if your BIOS uses
non-standard CMOS types. The CMOS types are:
- 0 - Use the value of the physical CMOS
- 1 - 5 1/4 DD
- 2 - 5 1/4 HD
- 3 - 3 1/2 DD
- 4 - 3 1/2 HD
- 5 - 3 1/2 ED
- 6 - 3 1/2 ED
- 16 - unknown or not installed
+ == ==================================
+ 0 Use the value of the physical CMOS
+ 1 5 1/4 DD
+ 2 5 1/4 HD
+ 3 3 1/2 DD
+ 4 3 1/2 HD
+ 5 3 1/2 ED
+ 6 3 1/2 ED
+ 16 unknown or not installed
+ == ==================================
(Note: there are two valid types for ED drives. This is because 5 was
initially chosen to represent floppy *tapes*, and 6 for ED drives.
@@ -162,8 +166,7 @@ in a configuration file in /etc/modprobe.d/.
Print a warning message when an unexpected interrupt is received.
(default)
- floppy=no_unexpected_interrupts
- floppy=L40SX
+ floppy=no_unexpected_interrupts / floppy=L40SX
Don't print a message when an unexpected interrupt is received. This
is needed on IBM L40SX laptops in certain video modes. (There seems
to be an interaction between video and floppy. The unexpected
@@ -199,47 +202,54 @@ in a configuration file in /etc/modprobe.d/.
Sets the floppy DMA channel to <nr> instead of 2.
floppy=slow
- Use PS/2 stepping rate:
- " PS/2 floppies have much slower step rates than regular floppies.
+ Use PS/2 stepping rate::
+
+ PS/2 floppies have much slower step rates than regular floppies.
It's been recommended that take about 1/4 of the default speed
- in some more extreme cases."
+ in some more extreme cases.
Supporting utilities and additional documentation:
==================================================
- Additional parameters of the floppy driver can be configured at
+Additional parameters of the floppy driver can be configured at
runtime. Utilities which do this can be found in the fdutils package.
This package also contains a new version of mtools which allows to
access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
It also contains additional documentation about the floppy driver.
The latest version can be found at fdutils homepage:
+
http://fdutils.linux.lu
The fdutils releases can be found at:
+
http://fdutils.linux.lu/download.html
+
http://www.tux.org/pub/knaff/fdutils/
+
ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
Reporting problems about the floppy driver
==========================================
- If you have a question or a bug report about the floppy driver, mail
+If you have a question or a bug report about the floppy driver, mail
me at Alain.Knaff(a)poboxes.com . If you post to Usenet, preferably use
comp.os.linux.hardware. As the volume in these groups is rather high,
be sure to include the word "floppy" (or "FLOPPY") in the subject
line. If the reported problem happens when mounting floppy disks, be
sure to mention also the type of the filesystem in the subject line.
- Be sure to read the FAQ before mailing/posting any bug reports!
+Be sure to read the FAQ before mailing/posting any bug reports!
- Alain
+Alain
Changelog
=========
-10-30-2004 : Cleanup, updating, add reference to module configuration.
+10-30-2004 :
+ Cleanup, updating, add reference to module configuration.
James Nelson <james4765(a)gmail.com>
-6-3-2000 : Original Document
+6-3-2000 :
+ Original Document
diff --git a/Documentation/blockdev/index.rst b/Documentation/blockdev/index.rst
new file mode 100644
index 000000000000..a9af6ed8b4aa
--- /dev/null
+++ b/Documentation/blockdev/index.rst
@@ -0,0 +1,16 @@
+:orphan:
+
+===========================
+The Linux RapidIO Subsystem
+===========================
+
+.. toctree::
+ :maxdepth: 1
+
+ floppy
+ nbd
+ paride
+ ramdisk
+ zram
+
+ drbd/index
diff --git a/Documentation/blockdev/nbd.txt b/Documentation/blockdev/nbd.rst
similarity index 96%
rename from Documentation/blockdev/nbd.txt
rename to Documentation/blockdev/nbd.rst
index db242ea2bce8..d78dfe559dcf 100644
--- a/Documentation/blockdev/nbd.txt
+++ b/Documentation/blockdev/nbd.rst
@@ -1,3 +1,4 @@
+==================================
Network Block Device (TCP version)
==================================
@@ -28,4 +29,3 @@ max_part
nbds_max
Number of block devices that should be initialized (default: 16).
-
diff --git a/Documentation/blockdev/paride.txt b/Documentation/blockdev/paride.rst
similarity index 81%
rename from Documentation/blockdev/paride.txt
rename to Documentation/blockdev/paride.rst
index ee6717e3771d..87b4278bf314 100644
--- a/Documentation/blockdev/paride.txt
+++ b/Documentation/blockdev/paride.rst
@@ -1,15 +1,17 @@
-
- Linux and parallel port IDE devices
+===================================
+Linux and parallel port IDE devices
+===================================
PARIDE v1.03 (c) 1997-8 Grant Guenther <grant(a)torque.net>
1. Introduction
+===============
Owing to the simplicity and near universality of the parallel port interface
to personal computers, many external devices such as portable hard-disk,
CD-ROM, LS-120 and tape drives use the parallel port to connect to their
host computer. While some devices (notably scanners) use ad-hoc methods
-to pass commands and data through the parallel port interface, most
+to pass commands and data through the parallel port interface, most
external devices are actually identical to an internal model, but with
a parallel-port adapter chip added in. Some of the original parallel port
adapters were little more than mechanisms for multiplexing a SCSI bus.
@@ -28,47 +30,50 @@ were to open up a parallel port CD-ROM drive, for instance, one would
find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
that interconnected a standard PC parallel port cable and a standard
IDE cable. It is usually possible to exchange the CD-ROM device with
-any other device using the IDE interface.
+any other device using the IDE interface.
The document describes the support in Linux for parallel port IDE
devices. It does not cover parallel port SCSI devices, "ditto" tape
-drives or scanners. Many different devices are supported by the
+drives or scanners. Many different devices are supported by the
parallel port IDE subsystem, including:
- MicroSolutions backpack CD-ROM
- MicroSolutions backpack PD/CD
- MicroSolutions backpack hard-drives
- MicroSolutions backpack 8000t tape drive
- SyQuest EZ-135, EZ-230 & SparQ drives
- Avatar Shark
- Imation Superdisk LS-120
- Maxell Superdisk LS-120
- FreeCom Power CD
- Hewlett-Packard 5GB and 8GB tape drives
- Hewlett-Packard 7100 and 7200 CD-RW drives
+ - MicroSolutions backpack CD-ROM
+ - MicroSolutions backpack PD/CD
+ - MicroSolutions backpack hard-drives
+ - MicroSolutions backpack 8000t tape drive
+ - SyQuest EZ-135, EZ-230 & SparQ drives
+ - Avatar Shark
+ - Imation Superdisk LS-120
+ - Maxell Superdisk LS-120
+ - FreeCom Power CD
+ - Hewlett-Packard 5GB and 8GB tape drives
+ - Hewlett-Packard 7100 and 7200 CD-RW drives
as well as most of the clone and no-name products on the market.
To support such a wide range of devices, PARIDE, the parallel port IDE
subsystem, is actually structured in three parts. There is a base
paride module which provides a registry and some common methods for
-accessing the parallel ports. The second component is a set of
-high-level drivers for each of the different types of supported devices:
+accessing the parallel ports. The second component is a set of
+high-level drivers for each of the different types of supported devices:
+ === =============
pd IDE disk
pcd ATAPI CD-ROM
pf ATAPI disk
pt ATAPI tape
pg ATAPI generic
+ === =============
(Currently, the pg driver is only used with CD-R drives).
The high-level drivers function according to the relevant standards.
The third component of PARIDE is a set of low-level protocol drivers
for each of the parallel port IDE adapter chips. Thanks to the interest
-and encouragement of Linux users from many parts of the world,
+and encouragement of Linux users from many parts of the world,
support is available for almost all known adapter protocols:
+ ==== ====================================== ====
aten ATEN EH-100 (HK)
bpck Microsolutions backpack (US)
comm DataStor (old-type) "commuter" adapter (TW)
@@ -83,9 +88,11 @@ support is available for almost all known adapter protocols:
ktti KT Technology PHd adapter (SG)
on20 OnSpec 90c20 (US)
on26 OnSpec 90c26 (US)
+ ==== ====================================== ====
2. Using the PARIDE subsystem
+=============================
While configuring the Linux kernel, you may choose either to build
the PARIDE drivers into your kernel, or to build them as modules.
@@ -94,10 +101,10 @@ In either case, you will need to select "Parallel port IDE device support"
as well as at least one of the high-level drivers and at least one
of the parallel port communication protocols. If you do not know
what kind of parallel port adapter is used in your drive, you could
-begin by checking the file names and any text files on your DOS
+begin by checking the file names and any text files on your DOS
installation floppy. Alternatively, you can look at the markings on
the adapter chip itself. That's usually sufficient to identify the
-correct device.
+correct device.
You can actually select all the protocol modules, and allow the PARIDE
subsystem to try them all for you.
@@ -105,8 +112,9 @@ subsystem to try them all for you.
For the "brand-name" products listed above, here are the protocol
and high-level drivers that you would use:
+ ================ ============ ====== ========
Manufacturer Model Driver Protocol
-
+ ================ ============ ====== ========
MicroSolutions CD-ROM pcd bpck
MicroSolutions PD drive pf bpck
MicroSolutions hard-drive pd bpck
@@ -119,8 +127,10 @@ and high-level drivers that you would use:
Hewlett-Packard 5GB Tape pt epat
Hewlett-Packard 7200e (CD) pcd epat
Hewlett-Packard 7200e (CD-R) pg epat
+ ================ ============ ====== ========
2.1 Configuring built-in drivers
+---------------------------------
We recommend that you get to know how the drivers work and how to
configure them as loadable modules, before attempting to compile a
@@ -143,7 +153,7 @@ protocol identification number and, for some devices, the drive's
chain ID. While your system is booting, a number of messages are
displayed on the console. Like all such messages, they can be
reviewed with the 'dmesg' command. Among those messages will be
-some lines like:
+some lines like::
paride: bpck registered as protocol 0
paride: epat registered as protocol 1
@@ -158,10 +168,10 @@ the last two digits of the drive's serial number (but read MicroSolutions'
documentation about this).
As an example, let's assume that you have a MicroSolutions PD/CD drive
-with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
-EZ-135 connected to the chained port on the PD/CD drive and also an
-Imation Superdisk connected to port 0x278. You could give the following
-options on your boot command:
+with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
+EZ-135 connected to the chained port on the PD/CD drive and also an
+Imation Superdisk connected to port 0x278. You could give the following
+options on your boot command::
pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
@@ -169,24 +179,27 @@ In the last option, pf.drive1 configures device /dev/pf1, the 0x378
is the parallel port base address, the 0 is the protocol registration
number and 36 is the chain ID.
-Please note: while PARIDE will work both with and without the
+Please note: while PARIDE will work both with and without the
PARPORT parallel port sharing system that is included by the
"Parallel port support" option, PARPORT must be included and enabled
if you want to use chains of devices on the same parallel port.
2.2 Loading and configuring PARIDE as modules
+----------------------------------------------
It is much faster and simpler to get to understand the PARIDE drivers
-if you use them as loadable kernel modules.
+if you use them as loadable kernel modules.
-Note 1: using these drivers with the "kerneld" automatic module loading
-system is not recommended for beginners, and is not documented here.
+Note 1:
+ using these drivers with the "kerneld" automatic module loading
+ system is not recommended for beginners, and is not documented here.
-Note 2: if you build PARPORT support as a loadable module, PARIDE must
-also be built as loadable modules, and PARPORT must be loaded before the
-PARIDE modules.
+Note 2:
+ if you build PARPORT support as a loadable module, PARIDE must
+ also be built as loadable modules, and PARPORT must be loaded before
+ the PARIDE modules.
-To use PARIDE, you must begin by
+To use PARIDE, you must begin by::
insmod paride
@@ -195,8 +208,8 @@ among other tasks.
Then, load as many of the protocol modules as you think you might need.
As you load each module, it will register the protocols that it supports,
-and print a log message to your kernel log file and your console. For
-example:
+and print a log message to your kernel log file and your console. For
+example::
# insmod epat
paride: epat registered as protocol 0
@@ -205,22 +218,22 @@ example:
paride: k971 registered as protocol 2
Finally, you can load high-level drivers for each kind of device that
-you have connected. By default, each driver will autoprobe for a single
+you have connected. By default, each driver will autoprobe for a single
device, but you can support up to four similar devices by giving their
individual co-ordinates when you load the driver.
For example, if you had two no-name CD-ROM drives both using the
KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
-you could give the following command:
+you could give the following command::
# insmod pcd drive0=0x378,1 drive1=0x3bc,1
For most adapters, giving a port address and protocol number is sufficient,
-but check the source files in linux/drivers/block/paride for more
+but check the source files in linux/drivers/block/paride for more
information. (Hopefully someone will write some man pages one day !).
As another example, here's what happens when PARPORT is installed, and
-a SyQuest EZ-135 is attached to port 0x378:
+a SyQuest EZ-135 is attached to port 0x378::
# insmod paride
paride: version 1.0 installed
@@ -237,46 +250,47 @@ Note that the last line is the output from the generic partition table
scanner - in this case it reports that it has found a disk with one partition.
2.3 Using a PARIDE device
+--------------------------
Once the drivers have been loaded, you can access PARIDE devices in the
same way as their traditional counterparts. You will probably need to
create the device "special files". Here is a simple script that you can
-cut to a file and execute:
+cut to a file and execute::
-#!/bin/bash
-#
-# mkd -- a script to create the device special files for the PARIDE subsystem
-#
-function mkdev {
- mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
-}
-#
-function pd {
- D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
- mkdev pd$D b 45 $[ $1 * 16 ]
- for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
- done
-}
-#
-cd /dev
-#
-for u in 0 1 2 3 ; do pd $u ; done
-for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
-for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
-for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
-for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
-for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
-#
-# end of mkd
+ #!/bin/bash
+ #
+ # mkd -- a script to create the device special files for the PARIDE subsystem
+ #
+ function mkdev {
+ mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
+ }
+ #
+ function pd {
+ D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
+ mkdev pd$D b 45 $[ $1 * 16 ]
+ for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+ do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
+ done
+ }
+ #
+ cd /dev
+ #
+ for u in 0 1 2 3 ; do pd $u ; done
+ for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
+ for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
+ for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
+ for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
+ for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
+ #
+ # end of mkd
With the device files and drivers in place, you can access PARIDE devices
-like any other Linux device. For example, to mount a CD-ROM in pcd0, use:
+like any other Linux device. For example, to mount a CD-ROM in pcd0, use::
mount /dev/pcd0 /cdrom
If you have a fresh Avatar Shark cartridge, and the drive is pda, you
-might do something like:
+might do something like::
fdisk /dev/pda -- make a new partition table with
partition 1 of type 83
@@ -289,41 +303,46 @@ might do something like:
Devices like the Imation superdisk work in the same way, except that
they do not have a partition table. For example to make a 120MB
-floppy that you could share with a DOS system:
+floppy that you could share with a DOS system::
mkdosfs /dev/pf0
mount /dev/pf0 /mnt
2.4 The pf driver
+------------------
The pf driver is intended for use with parallel port ATAPI disk
devices. The most common devices in this category are PD drives
and LS-120 drives. Traditionally, media for these devices are not
partitioned. Consequently, the pf driver does not support partitioned
-media. This may be changed in a future version of the driver.
+media. This may be changed in a future version of the driver.
2.5 Using the pt driver
+------------------------
The pt driver for parallel port ATAPI tape drives is a minimal driver.
-It does not yet support many of the standard tape ioctl operations.
+It does not yet support many of the standard tape ioctl operations.
For best performance, a block size of 32KB should be used. You will
probably want to set the parallel port delay to 0, if you can.
2.6 Using the pg driver
+------------------------
The pg driver can be used in conjunction with the cdrecord program
to create CD-ROMs. Please get cdrecord version 1.6.1 or later
-from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
-your parallel port should ideally be set to EPP mode, and the "port delay"
-should be set to 0. With those settings it is possible to record at 2x
+from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
+your parallel port should ideally be set to EPP mode, and the "port delay"
+should be set to 0. With those settings it is possible to record at 2x
speed without any buffer underruns. If you cannot get the driver to work
in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
3. Troubleshooting
+==================
3.1 Use EPP mode if you can
+----------------------------
The most common problems that people report with the PARIDE drivers
concern the parallel port CMOS settings. At this time, none of the
@@ -332,6 +351,7 @@ If you are able to do so, please set your parallel port into EPP mode
using your CMOS setup procedure.
3.2 Check the port delay
+-------------------------
Some parallel ports cannot reliably transfer data at full speed. To
offset the errors, the PARIDE protocol modules introduce a "port
@@ -347,23 +367,25 @@ read the comments at the beginning of the driver source files in
linux/drivers/block/paride.
3.3 Some drives need a printer reset
+-------------------------------------
There appear to be a number of "noname" external drives on the market
that do not always power up correctly. We have noticed this with some
drives based on OnSpec and older Freecom adapters. In these rare cases,
the adapter can often be reinitialised by issuing a "printer reset" on
-the parallel port. As the reset operation is potentially disruptive in
-multiple device environments, the PARIDE drivers will not do it
-automatically. You can however, force a printer reset by doing:
+the parallel port. As the reset operation is potentially disruptive in
+multiple device environments, the PARIDE drivers will not do it
+automatically. You can however, force a printer reset by doing::
insmod lp reset=1
rmmod lp
If you have one of these marginal cases, you should probably build
your paride drivers as modules, and arrange to do the printer reset
-before loading the PARIDE drivers.
+before loading the PARIDE drivers.
3.4 Use the verbose option and dmesg if you need help
+------------------------------------------------------
While a lot of testing has gone into these drivers to make them work
as smoothly as possible, problems will arise. If you do have problems,
@@ -373,7 +395,7 @@ clues, then please make sure that only one drive is hooked to your system,
and that either (a) PARPORT is enabled or (b) no other device driver
is using your parallel port (check in /proc/ioports). Then, load the
appropriate drivers (you can load several protocol modules if you want)
-as in:
+as in::
# insmod paride
# insmod epat
@@ -394,12 +416,14 @@ by e-mail to grant(a)torque.net, or join the linux-parport mailing list
and post your report there.
3.5 For more information or help
+---------------------------------
You can join the linux-parport mailing list by sending a mail message
-to
+to:
+
linux-parport-request(a)torque.net
-with the single word
+with the single word::
subscribe
@@ -412,6 +436,4 @@ have in your mail headers, when sending mail to the list server.
You might also find some useful information on the linux-parport
web pages (although they are not always up to date) at
- http://web.archive.org/web/*/http://www.torque.net/parport/
-
-
+ http://web.archive.org/web/%2E/http://www.torque.net/parport/
diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.rst
similarity index 84%
rename from Documentation/blockdev/ramdisk.txt
rename to Documentation/blockdev/ramdisk.rst
index 501e12e0323e..b7c2268f8dec 100644
--- a/Documentation/blockdev/ramdisk.txt
+++ b/Documentation/blockdev/ramdisk.rst
@@ -1,7 +1,8 @@
+==========================================
Using the RAM disk block device with Linux
-------------------------------------------
+==========================================
-Contents:
+.. Contents:
1) Overview
2) Kernel Command Line Parameters
@@ -42,7 +43,7 @@ rescue floppy disk.
2a) Kernel Command Line Parameters
ramdisk_size=N
- ==============
+ Size of the ramdisk.
This parameter tells the RAM disk driver to set up RAM disks of N k size. The
default is 4096 (4 MB).
@@ -50,16 +51,13 @@ default is 4096 (4 MB).
2b) Module parameters
rd_nr
- =====
- /dev/ramX devices created.
+ /dev/ramX devices created.
max_part
- ========
- Maximum partition number.
+ Maximum partition number.
rd_size
- =======
- See ramdisk_size.
+ See ramdisk_size.
3) Using "rdev -r"
------------------
@@ -71,11 +69,11 @@ to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
prompt/wait sequence is to be given before trying to read the RAM disk. Since
the RAM disk dynamically grows as data is being written into it, a size field
is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below:
+These numbers are no magical secrets, as seen below::
-./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
-./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
-./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
+ ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
+ ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
+ ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
Consider a typical two floppy disk setup, where you will have the
kernel on disk one, and have already put a RAM disk image onto disk #2.
@@ -92,20 +90,23 @@ sequence so that you have a chance to switch floppy disks.
The command line equivalent is: "prompt_ramdisk=1"
Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do:
+So to create disk one of the set, you would do::
/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
/usr/src/linux# rdev /dev/fd0 /dev/fd0
/usr/src/linux# rdev -r /dev/fd0 49152
-If you make a boot disk that has LILO, then for the above, you would use:
+If you make a boot disk that has LILO, then for the above, you would use::
+
append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-Since the default start = 0 and the default prompt = 1, you could use:
+
+Since the default start = 0 and the default prompt = 1, you could use::
+
append = "load_ramdisk=1"
4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
+-----------------------------------------------
To create a RAM disk image, you will need a spare block device to
construct it on. This can be the RAM disk device itself, or an
@@ -120,11 +121,11 @@ a) Decide on the RAM disk size that you want. Say 2 MB for this example.
Create it by writing to the RAM disk device. (This step is not currently
required, but may be in the future.) It is wise to zero out the
area (esp. for disks) so that maximal compression is achieved for
- the unused blocks of the image that you are about to create.
+ the unused blocks of the image that you are about to create::
dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
-b) Make a filesystem on it. Say ext2fs for this example.
+b) Make a filesystem on it. Say ext2fs for this example::
mke2fs -vm0 /dev/ram0 2048
@@ -133,11 +134,11 @@ c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
d) Compress the contents of the RAM disk. The level of compression
will be approximately 50% of the space used by the files. Unused
- space on the RAM disk will compress to almost nothing.
+ space on the RAM disk will compress to almost nothing::
dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
-e) Put the kernel onto the floppy
+e) Put the kernel onto the floppy::
dd if=zImage of=/dev/fd0 bs=1k
@@ -146,13 +147,13 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
(possibly larger) kernel onto the same floppy later without overlapping
the RAM disk image. An offset of 400 kB for kernels about 350 kB in
size would be reasonable. Make sure offset+size of ram_image.gz is
- not larger than the total space on your floppy (usually 1440 kB).
+ not larger than the total space on your floppy (usually 1440 kB)::
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
- have 2^15 + 2^14 + 400 = 49552.
+ have 2^15 + 2^14 + 400 = 49552::
rdev /dev/fd0 /dev/fd0
rdev -r /dev/fd0 49552
@@ -160,15 +161,17 @@ g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
That is it. You now have your boot/root compressed RAM disk floppy. Some
users may wish to combine steps (d) and (f) by using a pipe.
---------------------------------------------------------------------------
+
Paul Gortmaker 12/95
Changelog:
----------
-10-22-04 : Updated to reflect changes in command line options, remove
+10-22-04 :
+ Updated to reflect changes in command line options, remove
obsolete references, general cleanup.
James Nelson (james4765(a)gmail.com)
-12-95 : Original Document
+12-95 :
+ Original Document
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.rst
similarity index 76%
rename from Documentation/blockdev/zram.txt
rename to Documentation/blockdev/zram.rst
index 4df0ce271085..2111231c9c0f 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.rst
@@ -1,7 +1,9 @@
+========================================
zram: Compressed RAM based block devices
-----------------------------------------
+========================================
-* Introduction
+Introduction
+============
The zram module creates RAM based block devices named /dev/zram<id>
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
@@ -12,9 +14,11 @@ use as swap disks, various caches under /var and maybe many more :)
Statistics for individual zram devices are exported through sysfs nodes at
/sys/block/zram<id>/
-* Usage
+Usage
+=====
There are several ways to configure and manage zram device(-s):
+
a) using zram and zram_control sysfs attributes
b) using zramctl utility, provided by util-linux (util-linux(a)vger.kernel.org).
@@ -22,7 +26,7 @@ In this document we will describe only 'manual' zram configuration steps,
IOW, zram and zram_control sysfs attributes.
In order to get a better idea about zramctl please consult util-linux
-documentation, zramctl man-page or `zramctl --help'. Please be informed
+documentation, zramctl man-page or `zramctl --help`. Please be informed
that zram maintainers do not develop/maintain util-linux or zramctl, should
you have any questions please contact util-linux(a)vger.kernel.org
@@ -30,19 +34,23 @@ Following shows a typical sequence of steps for using zram.
WARNING
=======
+
For the sake of simplicity we skip error checking parts in most of the
examples below. However, it is your sole responsibility to handle errors.
zram sysfs attributes always return negative values in case of errors.
The list of possible return codes:
--EBUSY -- an attempt to modify an attribute that cannot be changed once
-the device has been initialised. Please reset device first;
--ENOMEM -- zram was not able to allocate enough memory to fulfil your
-needs;
--EINVAL -- invalid input has been provided.
+
+======== =============================================================
+-EBUSY an attempt to modify an attribute that cannot be changed once
+ the device has been initialised. Please reset device first;
+-ENOMEM zram was not able to allocate enough memory to fulfil your
+ needs;
+-EINVAL invalid input has been provided.
+======== =============================================================
If you use 'echo', the returned value that is changed by 'echo' utility,
-and, in general case, something like:
+and, in general case, something like::
echo 3 > /sys/block/zram0/max_comp_streams
if [ $? -ne 0 ];
@@ -51,7 +59,11 @@ and, in general case, something like:
should suffice.
-1) Load Module:
+1) Load Module
+==============
+
+::
+
modprobe zram num_devices=4
This creates 4 devices: /dev/zram{0,1,2,3}
@@ -59,6 +71,8 @@ num_devices parameter is optional and tells zram how many devices should be
pre-created. Default: 1.
2) Set max number of compression streams
+========================================
+
Regardless the value passed to this attribute, ZRAM will always
allocate multiple compression streams - one per online CPUs - thus
allowing several concurrent compression operations. The number of
@@ -66,16 +80,20 @@ allocated compression streams goes down when some of the CPUs
become offline. There is no single-compression-stream mode anymore,
unless you are running a UP system or has only 1 CPU online.
-To find out how many streams are currently available:
+To find out how many streams are currently available::
+
cat /sys/block/zram0/max_comp_streams
3) Select compression algorithm
+===============================
+
Using comp_algorithm device attribute one can see available and
currently selected (shown in square brackets) compression algorithms,
change selected compression algorithm (once the device is initialised
there is no way to change compression algorithm).
-Examples:
+Examples::
+
#show supported compression algorithms
cat /sys/block/zram0/comp_algorithm
lzo [lz4]
@@ -83,20 +101,23 @@ Examples:
#select lzo compression algorithm
echo lzo > /sys/block/zram0/comp_algorithm
-For the time being, the `comp_algorithm' content does not necessarily
+For the time being, the `comp_algorithm` content does not necessarily
show every compression algorithm supported by the kernel. We keep this
list primarily to simplify device configuration and one can configure
a new device with a compression algorithm that is not listed in
-`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
+`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
and, if some of the algorithms were built as modules, it's impossible
to list all of them using, for instance, /proc/crypto or any other
method. This, however, has an advantage of permitting the usage of
custom crypto compression modules (implementing S/W or H/W compression).
4) Set Disksize
+===============
+
Set disk size by writing the value to sysfs node 'disksize'.
The value can be either in bytes or you can use mem suffixes.
-Examples:
+Examples::
+
# Initialize /dev/zram0 with 50MB disksize
echo $((50*1024*1024)) > /sys/block/zram0/disksize
@@ -111,10 +132,13 @@ since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.
5) Set memory limit: Optional
+=============================
+
Set memory limit by writing the value to sysfs node 'mem_limit'.
The value can be either in bytes or you can use mem suffixes.
In addition, you could change the value in runtime.
-Examples:
+Examples::
+
# limit /dev/zram0 with 50MB memory
echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
@@ -126,7 +150,11 @@ Examples:
# To disable memory limit
echo 0 > /sys/block/zram0/mem_limit
-6) Activate:
+6) Activate
+===========
+
+::
+
mkswap /dev/zram0
swapon /dev/zram0
@@ -134,6 +162,7 @@ Examples:
mount /dev/zram1 /tmp
7) Add/remove zram devices
+==========================
zram provides a control interface, which enables dynamic (on-demand) device
addition and removal.
@@ -142,37 +171,44 @@ In order to add a new /dev/zramX device, perform read operation on hot_add
attribute. This will return either new device's device id (meaning that you
can use /dev/zram<id>) or error code.
-Example:
+Example::
+
cat /sys/class/zram-control/hot_add
1
To remove the existing /dev/zramX device (where X is a device id)
-execute
+execute::
+
echo X > /sys/class/zram-control/hot_remove
-8) Stats:
+8) Stats
+========
+
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
A brief description of exported device attributes. For more details please
read Documentation/ABI/testing/sysfs-block-zram.
+====================== ====== ===============================================
Name access description
----- ------ -----------
+====================== ====== ===============================================
disksize RW show and set the device's disk size
initstate RO shows the initialization state of the device
reset WO trigger device reset
-mem_used_max WO reset the `mem_used_max' counter (see later)
-mem_limit WO specifies the maximum amount of memory ZRAM can use
- to store the compressed data
-writeback_limit WO specifies the maximum amount of write IO zram can
- write out to backing device as 4KB unit
+mem_used_max WO reset the `mem_used_max` counter (see later)
+mem_limit WO specifies the maximum amount of memory ZRAM can
+ use to store the compressed data
+writeback_limit WO specifies the maximum amount of write IO zram
+ can write out to backing device as 4KB unit
writeback_limit_enable RW show and set writeback_limit feature
-max_comp_streams RW the number of possible concurrent compress operations
+max_comp_streams RW the number of possible concurrent compress
+ operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
idle WO mark allocated slot as idle
+====================== ====== ===============================================
User space is advised to use the following files to read the device statistics.
@@ -188,23 +224,31 @@ The stat file represents device's I/O statistics not accounted by block
layer and, thus, not available in zram<id>/stat file. It consists of a
single line of text and contains the following stats separated by
whitespace:
- failed_reads the number of failed reads
- failed_writes the number of failed writes
- invalid_io the number of non-page-size-aligned I/O requests
+
+ ============= =============================================================
+ failed_reads The number of failed reads
+ failed_writes The number of failed writes
+ invalid_io The number of non-page-size-aligned I/O requests
notify_free Depending on device usage scenario it may account
+
a) the number of pages freed because of swap slot free
- notifications or b) the number of pages freed because of
- REQ_OP_DISCARD requests sent by bio. The former ones are
- sent to a swap block device when a swap slot is freed,
- which implies that this disk is being used as a swap disk.
+ notifications
+ b) the number of pages freed because of
+ REQ_OP_DISCARD requests sent by bio. The former ones are
+ sent to a swap block device when a swap slot is freed,
+ which implies that this disk is being used as a swap disk.
+
The latter ones are sent by filesystem mounted with
discard option, whenever some data blocks are getting
discarded.
+ ============= =============================================================
File /sys/block/zram<id>/mm_stat
The stat file represents device's mm statistics. It consists of a single
line of text and contains the following stats separated by whitespace:
+
+ ================ =============================================================
orig_data_size uncompressed size of data stored in this disk.
This excludes same-element-filled pages (same_pages) since
no memory is allocated for them.
@@ -223,58 +267,71 @@ line of text and contains the following stats separated by whitespace:
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
+ ================ =============================================================
File /sys/block/zram<id>/bd_stat
The stat file represents device's backing device statistics. It consists of
a single line of text and contains the following stats separated by whitespace:
+
+ ============== =============================================================
bd_count size of data written in backing device.
Unit: 4K bytes
bd_reads the number of reads from backing device
Unit: 4K bytes
bd_writes the number of writes to backing device
Unit: 4K bytes
+ ============== =============================================================
+
+9) Deactivate
+=============
+
+::
-9) Deactivate:
swapoff /dev/zram0
umount /dev/zram1
-10) Reset:
- Write any positive value to 'reset' sysfs node
- echo 1 > /sys/block/zram0/reset
- echo 1 > /sys/block/zram1/reset
+10) Reset
+=========
+
+ Write any positive value to 'reset' sysfs node::
+
+ echo 1 > /sys/block/zram0/reset
+ echo 1 > /sys/block/zram1/reset
This frees all the memory allocated for the given device and
resets the disksize to zero. You must set the disksize again
before reusing the device.
-* Optional Feature
+Optional Feature
+================
-= writeback
+writeback
+---------
With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
to backing storage rather than keeping it in memory.
-To use the feature, admin should set up backing device via
+To use the feature, admin should set up backing device via::
- "echo /dev/sda5 > /sys/block/zramX/backing_dev"
+ echo /dev/sda5 > /sys/block/zramX/backing_dev
before disksize setting. It supports only partition at this moment.
-If admin want to use incompressible page writeback, they could do via
+If admin want to use incompressible page writeback, they could do via::
- "echo huge > /sys/block/zramX/write"
+ echo huge > /sys/block/zramX/write
To use idle page writeback, first, user need to declare zram pages
-as idle.
+as idle::
- "echo all > /sys/block/zramX/idle"
+ echo all > /sys/block/zramX/idle
From now on, any pages on zram are idle pages. The idle mark
will be removed until someone request access of the block.
IOW, unless there is access request, those pages are still idle pages.
-Admin can request writeback of those idle pages at right timing via
+Admin can request writeback of those idle pages at right timing via::
- "echo idle > /sys/block/zramX/writeback"
+ echo idle > /sys/block/zramX/writeback
With the command, zram writeback idle pages from memory to the storage.
@@ -285,7 +342,7 @@ to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
any writeback. IOW, if admin want to apply writeback budget, he should
-enable writeback_limit_enable via
+enable writeback_limit_enable via::
$ echo 1 > /sys/block/zramX/writeback_limit_enable
@@ -296,7 +353,7 @@ until admin set the budget via /sys/block/zramX/writeback_limit.
assigned via /sys/block/zramX/writeback_limit is meaninless.)
If admin want to limit writeback as per-day 400M, he could do it
-like below.
+like below::
$ MB_SHIFT=20
$ 4K_SHIFT=12
@@ -305,16 +362,16 @@ like below.
$ echo 1 > /sys/block/zram0/writeback_limit_enable
If admin want to allow further write again once the bugdet is exausted,
-he could do it like below
+he could do it like below::
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit
-If admin want to see remaining writeback budget since he set,
+If admin want to see remaining writeback budget since he set::
$ cat /sys/block/zramX/writeback_limit
-If admin want to disable writeback limit, he could do
+If admin want to disable writeback limit, he could do::
$ echo 0 > /sys/block/zramX/writeback_limit_enable
@@ -326,25 +383,35 @@ budget in next setting is user's job.
If admin want to measure writeback count in a certain period, he could
know it via /sys/block/zram0/bd_stat's 3rd column.
-= memory tracking
+memory tracking
+===============
With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
zram block. It could be useful to catch cold or incompressible
pages of the process with*pagemap.
+
If you enable the feature, you could see block state via
-/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
+/sys/kernel/debug/zram/zram0/block_state". The output is as follows::
300 75.033841 .wh.
301 63.806904 s...
302 63.806919 ..hi
-First column is zram's block index.
-Second column is access time since the system was booted
-Third column is state of the block.
-(s: same page
-w: written page to backing store
-h: huge page
-i: idle page)
+First column
+ zram's block index.
+Second column
+ access time since the system was booted
+Third column
+ state of the block:
+
+ s:
+ same page
+ w:
+ written page to backing store
+ h:
+ huge page
+ i:
+ idle page
First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
diff --git a/MAINTAINERS b/MAINTAINERS
index 85648a944446..808f65e06ad8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11024,7 +11024,7 @@ M: Josef Bacik <josef(a)toxicpanda.com>
S: Maintained
L: linux-block(a)vger.kernel.org
L: nbd(a)other.debian.org
-F: Documentation/blockdev/nbd.txt
+F: Documentation/blockdev/nbd.rst
F: drivers/block/nbd.c
F: include/trace/events/nbd.h
F: include/uapi/linux/nbd.h
@@ -12028,7 +12028,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
M: Tim Waugh <tim(a)cyberelk.net>
L: linux-parport(a)lists.infradead.org (subscribers-only)
S: Maintained
-F: Documentation/blockdev/paride.txt
+F: Documentation/blockdev/paride.rst
F: drivers/block/paride/
PARISC ARCHITECTURE
@@ -13310,7 +13310,7 @@ F: drivers/net/wireless/ralink/rt2x00/
RAMDISK RAM BLOCK DEVICE DRIVER
M: Jens Axboe <axboe(a)kernel.dk>
S: Maintained
-F: Documentation/blockdev/ramdisk.txt
+F: Documentation/blockdev/ramdisk.rst
F: drivers/block/brd.c
RANCHU VIRTUAL BOARD FOR MIPS
@@ -17672,7 +17672,7 @@ R: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
L: linux-kernel(a)vger.kernel.org
S: Maintained
F: drivers/block/zram/
-F: Documentation/blockdev/zram.txt
+F: Documentation/blockdev/zram.rst
ZS DECSTATION Z85C30 SERIAL DRIVER
M: "Maciej W. Rozycki" <macro(a)linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 96ec7e0fc1ea..c43690b973d8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
If you want to use the floppy disk drive(s) of your PC under Linux,
say Y. Information about this driver, especially important for IBM
Thinkpad users, is contained in
- <file:Documentation/blockdev/floppy.txt>.
+ <file:Documentation/blockdev/floppy.rst>.
That file also contains the location of the Floppy driver FAQ as
well as location of the fdutils package used to configure additional
parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
your computer's parallel port. Most of them are actually IDE devices
using a parallel port IDE adapter. This option enables the PARIDE
subsystem which contains drivers for many of these external drives.
- Read <file:Documentation/blockdev/paride.txt> for more information.
+ Read <file:Documentation/blockdev/paride.rst> for more information.
If you have said Y to the "Parallel-port support" configuration
option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
userland (making server and client physically the same computer,
communicating using the loopback network device).
- Read <file:Documentation/blockdev/nbd.txt> for more information,
+ Read <file:Documentation/blockdev/nbd.rst> for more information,
especially about where to find the server code, which runs in user
space and does not need special kernel support.
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
during the initial install of Linux.
Note that the kernel command line option "ramdisk=XX" is now obsolete.
- For details, read <file:Documentation/blockdev/ramdisk.txt>.
+ For details, read <file:Documentation/blockdev/ramdisk.rst>.
To compile this driver as a module, choose M here: the
module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index b933a7eea52b..5c99e52f9dc1 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
pr_cont("\n");
} else
DPRINT("botched floppy option\n");
- DPRINT("Read Documentation/blockdev/floppy.txt\n");
+ DPRINT("Read Documentation/blockdev/floppy.rst\n");
return 0;
}
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 1ffc64770643..e06b99d54816 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
It has several use cases, for example: /tmp storage, use as swap
disks and maybe many more.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
config ZRAM_WRITEBACK
bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
With /sys/block/zramX/{idle,writeback}, application could ask
idle page's writeback to the backing device to save in memory.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
config ZRAM_MEMORY_TRACKING
bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
of zRAM. Admin could see the information via
/sys/kernel/debug/zram/zramX/block_state.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 7972cc512408..5fa378391d3b 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
- mkfs/ mkfs.ext4
For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.txt
+kernel-source-tree/Documentation/blockdev/zram.rst
--
2.21.0
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:
int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this api.
In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.
Signed-off-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-api(a)vger.kernel.org
---
v1:
- kbuild test robot <lkp(a)intel.com>:
- add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov <oleg(a)redhat.com>:
- use simpler thread-group leader check
v2:
- Oleg Nesterov <oleg(a)redhat.com>:
- avoid using additional variable
- remove unneeded comment
- Arnd Bergmann <arnd(a)arndb.de>:
- switch from 428 to 434 since the new mount api has taken it
- bump syscall numbers in arch/arm64/include/asm/unistd.h
- Joel Fernandes (Google) <joel(a)joelfernandes.org>:
- switch from ESRCH to EINVAL when the passed-in pid does not refer to a
thread-group leader
- Christian Brauner <christian(a)brauner.io>:
- rebase on v5.2-rc1
- adapt syscall number to account for new mount api syscalls
v3:
- Arnd Bergmann <arnd(a)arndb.de>:
- add missing syscall entries for mips-o32 and mips-n64
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/pid.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/pid.c | 43 +++++++++++++++++++++
23 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 9e7704e44f6d..1db9bbcfb84e 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
541 common fsconfig sys_fsconfig
542 common fsmount sys_fsmount
543 common fspick sys_fspick
+544 common pidfd_open sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..81e6e1817c45 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..e8f7d95a1481 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 434
+#define __NR_compat_syscalls 435
#endif
#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c39e90600bb3..7a3158ccd68e 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index e01df3f2f80d..ecc44926737b 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -354,3 +354,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7e3d0734b2f3..9a3eb2558568 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 26339e417695..ad706f83c755 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0e2dd68ade57..97035e19ad03 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -372,3 +372,4 @@
431 n32 fsconfig sys_fsconfig
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
+434 n32 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 5eebfa0d155c..d7292722d3b0 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -348,3 +348,4 @@
431 n64 fsconfig sys_fsconfig
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
+434 n64 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 3cc1374e02d0..dba084c92f14 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -421,3 +421,4 @@
431 o32 fsconfig sys_fsconfig
432 o32 fsmount sys_fsmount
433 o32 fspick sys_fspick
+434 o32 pidfd_open sys_pidfd_open
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index c9e377d59232..5022b9e179c2 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -430,3 +430,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 103655d84b4b..f2c3bda2d39f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -515,3 +515,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e822b2964a83..6ebacfeaf853 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig sys_fsconfig
432 common fsmount sys_fsmount sys_fsmount
433 common fspick sys_fspick sys_fspick
+434 common pidfd_open sys_pidfd_open sys_pidfd_open
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 016a727d4357..834c9c7d79fa 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index e047480b1605..c58e71f21129 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ad968b7bac72..43e4429a5272 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
432 i386 fsmount sys_fsmount __ia32_sys_fsmount
433 i386 fspick sys_fspick __ia32_sys_fspick
+434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b4e6f9e6204a..1bee0a77fdd3 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
431 common fsconfig __x64_sys_fsconfig
432 common fsmount __x64_sys_fsmount
433 common fspick __x64_sys_fspick
+434 common pidfd_open __x64_sys_pidfd_open
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5fa0ee1c8e00..782b81945ccc 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -404,3 +404,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 3c8ef5a199ca..c938a92eab99 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -67,6 +67,7 @@ struct pid
extern struct pid init_struct_pid;
extern const struct file_operations pidfd_fops;
+extern int pidfd_create(struct pid *pid);
static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..989055e0b501 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
struct old_timex32 __user *tx);
asmlinkage long sys_syncfs(int fd);
asmlinkage long sys_setns(int fd, int nstype);
+asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
unsigned int vlen, unsigned flags);
asmlinkage long sys_process_vm_readv(pid_t pid,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a87904daf103..e5684a4512c0 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 435
/*
* 32 bit systems traditionally used different
diff --git a/kernel/fork.c b/kernel/fork.c
index b4cba953040a..c3df226f47a1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1724,7 +1724,7 @@ const struct file_operations pidfd_fops = {
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-static int pidfd_create(struct pid *pid)
+int pidfd_create(struct pid *pid)
{
int fd;
diff --git a/kernel/pid.c b/kernel/pid.c
index 89548d35eefb..8fc9d94f6ac1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -37,6 +37,7 @@
#include <linux/syscalls.h>
#include <linux/proc_ns.h>
#include <linux/proc_fs.h>
+#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
@@ -450,6 +451,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return idr_get_next(&ns->idr, &nr);
}
+/**
+ * pidfd_open() - Open new pid file descriptor.
+ *
+ * @pid: pid for which to retrieve a pidfd
+ * @flags: flags to pass
+ *
+ * This creates a new pid file descriptor with the O_CLOEXEC flag set for
+ * the process identified by @pid. Currently, the process identified by
+ * @pid must be a thread-group leader. This restriction currently exists
+ * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
+ * be used with CLONE_THREAD) and pidfd polling (only supports thread group
+ * leaders).
+ *
+ * Return: On success, a cloexec pidfd is returned.
+ * On error, a negative errno number will be returned.
+ */
+SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
+{
+ int fd, ret;
+ struct pid *p;
+
+ if (flags)
+ return -EINVAL;
+
+ if (pid <= 0)
+ return -EINVAL;
+
+ p = find_get_pid(pid);
+ if (!p)
+ return -ESRCH;
+
+ ret = 0;
+ rcu_read_lock();
+ if (!pid_task(p, PIDTYPE_TGID))
+ ret = -EINVAL;
+ rcu_read_unlock();
+
+ fd = ret ?: pidfd_create(p);
+ put_pid(p);
+ return fd;
+}
+
void __init pid_idr_init(void)
{
/* Verify no one has done anything silly: */
--
2.21.0