From: Alexei Starovoitov <ast(a)kernel.org>
[ Upstream commit 7c0c6095d48dcd0e67c917aa73cdbb2715aafc36 ]
Adjust scale tests to check for new jmp sequence limit.
BPF_JGT had to be changed to BPF_JEQ because the verifier was
too smart. It tracked the known safe range of R0 values
and pruned the search earlier before hitting exact 8192 limit.
bpf_semi_rand_get() was too (un)?lucky.
k = 0; was missing in bpf_fill_scale2.
It was testing a bit shorter sequence of jumps than intended.
Signed-off-by: Alexei Starovoitov <ast(a)kernel.org>
Acked-by: Andrii Nakryiko <andriin(a)fb.com>
Signed-off-by: Daniel Borkmann <daniel(a)iogearbox.net>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/bpf/test_verifier.c | 31 +++++++++++----------
1 file changed, 17 insertions(+), 14 deletions(-)
diff --git a/tools/testing/selftests/bpf/test_verifier.c b/tools/testing/selftests/bpf/test_verifier.c
index 288cb740e005..6438d4dc8ae1 100644
--- a/tools/testing/selftests/bpf/test_verifier.c
+++ b/tools/testing/selftests/bpf/test_verifier.c
@@ -207,33 +207,35 @@ static void bpf_fill_rand_ld_dw(struct bpf_test *self)
self->retval = (uint32_t)res;
}
-/* test the sequence of 1k jumps */
+#define MAX_JMP_SEQ 8192
+
+/* test the sequence of 8k jumps */
static void bpf_fill_scale1(struct bpf_test *self)
{
struct bpf_insn *insn = self->fill_insns;
int i = 0, k = 0;
insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
- /* test to check that the sequence of 1024 jumps is acceptable */
- while (k++ < 1024) {
+ /* test to check that the long sequence of jumps is acceptable */
+ while (k++ < MAX_JMP_SEQ) {
insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32);
- insn[i++] = BPF_JMP_IMM(BPF_JGT, BPF_REG_0, bpf_semi_rand_get(), 2);
+ insn[i++] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, bpf_semi_rand_get(), 2);
insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
insn[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
-8 * (k % 64 + 1));
}
- /* every jump adds 1024 steps to insn_processed, so to stay exactly
- * within 1m limit add MAX_TEST_INSNS - 1025 MOVs and 1 EXIT
+ /* every jump adds 1 step to insn_processed, so to stay exactly
+ * within 1m limit add MAX_TEST_INSNS - MAX_JMP_SEQ - 1 MOVs and 1 EXIT
*/
- while (i < MAX_TEST_INSNS - 1025)
+ while (i < MAX_TEST_INSNS - MAX_JMP_SEQ - 1)
insn[i++] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 42);
insn[i] = BPF_EXIT_INSN();
self->prog_len = i + 1;
self->retval = 42;
}
-/* test the sequence of 1k jumps in inner most function (function depth 8)*/
+/* test the sequence of 8k jumps in inner most function (function depth 8)*/
static void bpf_fill_scale2(struct bpf_test *self)
{
struct bpf_insn *insn = self->fill_insns;
@@ -245,19 +247,20 @@ static void bpf_fill_scale2(struct bpf_test *self)
insn[i++] = BPF_EXIT_INSN();
}
insn[i++] = BPF_MOV64_REG(BPF_REG_6, BPF_REG_1);
- /* test to check that the sequence of 1024 jumps is acceptable */
- while (k++ < 1024) {
+ /* test to check that the long sequence of jumps is acceptable */
+ k = 0;
+ while (k++ < MAX_JMP_SEQ) {
insn[i++] = BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
BPF_FUNC_get_prandom_u32);
- insn[i++] = BPF_JMP_IMM(BPF_JGT, BPF_REG_0, bpf_semi_rand_get(), 2);
+ insn[i++] = BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, bpf_semi_rand_get(), 2);
insn[i++] = BPF_MOV64_REG(BPF_REG_1, BPF_REG_10);
insn[i++] = BPF_STX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6,
-8 * (k % (64 - 4 * FUNC_NEST) + 1));
}
- /* every jump adds 1024 steps to insn_processed, so to stay exactly
- * within 1m limit add MAX_TEST_INSNS - 1025 MOVs and 1 EXIT
+ /* every jump adds 1 step to insn_processed, so to stay exactly
+ * within 1m limit add MAX_TEST_INSNS - MAX_JMP_SEQ - 1 MOVs and 1 EXIT
*/
- while (i < MAX_TEST_INSNS - 1025)
+ while (i < MAX_TEST_INSNS - MAX_JMP_SEQ - 1)
insn[i++] = BPF_ALU32_IMM(BPF_MOV, BPF_REG_0, 42);
insn[i] = BPF_EXIT_INSN();
self->prog_len = i + 1;
--
2.20.1
Hi Linus,
Please pull the following Kselftest update for Linux 5.3-rc1.
This Kselftest update for Linux 5.3-rc1 consists of build failure
fixes and minor code cleaning patch to remove duplicate headers.
diff is attached.
thanks,
-- Shuah
----------------------------------------------------------------
The following changes since commit 4b972a01a7da614b4796475f933094751a295a2f:
Linux 5.2-rc6 (2019-06-22 16:01:36 -0700)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
tags/linux-kselftest-5.3-rc1
for you to fetch changes up to ee8a84c60bcc1f1615bd9cb3edfe501e26cdc85b:
rseq/selftests: Fix Thumb mode build failure on arm32 (2019-07-08
13:00:41 -0600)
----------------------------------------------------------------
linux-kselftest-5.3-rc1
This Kselftest update for Linux 5.3-rc1 consists of build failure
fixes and minor code cleaning patch to remove duplicate headers.
----------------------------------------------------------------
Mathieu Desnoyers (1):
rseq/selftests: Fix Thumb mode build failure on arm32
Naresh Kamboju (1):
selftests: dma-buf: Adding kernel config fragment CONFIG_UDMABUF=y
Shuah Khan (1):
selftests: timestamping: Fix SIOCGSTAMP undeclared build failure
YueHaibing (1):
kselftests: cgroup: remove duplicated include from test_freezer.c
tools/testing/selftests/cgroup/test_freezer.c | 1 -
tools/testing/selftests/drivers/dma-buf/config | 1 +
.../networking/timestamping/timestamping.c | 9 +---
tools/testing/selftests/rseq/rseq-arm.h | 61
++++++++++++----------
4 files changed, 35 insertions(+), 37 deletions(-)
create mode 100644 tools/testing/selftests/drivers/dma-buf/config
----------------------------------------------------------------
From: Paolo Pisati <paolo.pisati(a)canonical.com>
After applying patch 0001, all checksum implementations i could test (x86-64, arm64 and
arm), now agree on the return value.
Patch 0002 fix the expected return value for test #13: i did the calculation manually,
and it correspond.
Unfortunately, after applying patch 0001, other test cases now fail in
test_verifier:
$ sudo ./tools/testing/selftests/bpf/test_verifier
...
#417/p helper access to variable memory: size = 0 allowed on NULL (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
#419/p helper access to variable memory: size = 0 allowed on != NULL stack pointer (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
#423/p helper access to variable memory: size possible = 0 allowed on != NULL packet pointer (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
...
Summary: 1500 PASSED, 0 SKIPPED, 3 FAILED
And there are probably other fallouts in other selftests - someone familiar
should take a look before applying these patches.
Paolo Pisati (2):
bpf: bpf_csum_diff: fold the checksum before returning the
value
bpf, selftest: fix checksum value for test #13
net/core/filter.c | 2 +-
tools/testing/selftests/bpf/verifier/array_access.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.17.1
From: Paolo Pisati <paolo.pisati(a)canonical.com>
After applying patch 0001, all checksum implementations i could test (x86-64, arm64 and
arm), now agree on the return value.
Patch 0002 fix the expected return value for test #13: i did the calculation manually,
and it correspond.
Unfortunately, after applying patch 0001, other test cases now fail in
test_verifier:
$ sudo ./tools/testing/selftests/bpf/test_verifier
...
#417/p helper access to variable memory: size = 0 allowed on NULL (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
#419/p helper access to variable memory: size = 0 allowed on != NULL stack pointer (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
#423/p helper access to variable memory: size possible = 0 allowed on != NULL packet pointer (ARG_PTR_TO_MEM_OR_NULL) FAIL retval 65535 != 0
...
Summary: 1500 PASSED, 0 SKIPPED, 3 FAILED
And there are probably other fallouts in other selftests - someone familiar
should take a look before applying these patches.
Paolo Pisati (2):
bpf: bpf_csum_diff: fold the checksum before returning the
value
bpf, selftest: fix checksum value for test #13
net/core/filter.c | 2 +-
tools/testing/selftests/bpf/verifier/array_access.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--
2.17.1
On 32-bit x86 when building with clang-9, the loop gets turned back into
an inefficient division that causes a link error:
kernel/time/vsyscall.o: In function `update_vsyscall':
vsyscall.c:(.text+0xe3): undefined reference to `__udivdi3'
Use the provided __iter_div_u64_rem() function that is meant to address
the same case in other files.
Fixes: 44f57d788e7d ("timekeeping: Provide a generic update_vsyscall() implementation")
Signed-off-by: Arnd Bergmann <arnd(a)arndb.de>
---
kernel/time/vsyscall.c | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/kernel/time/vsyscall.c b/kernel/time/vsyscall.c
index a80893180826..8cf3596a4ce6 100644
--- a/kernel/time/vsyscall.c
+++ b/kernel/time/vsyscall.c
@@ -104,11 +104,7 @@ void update_vsyscall(struct timekeeper *tk)
vdso_ts->sec = tk->xtime_sec + tk->wall_to_monotonic.tv_sec;
nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
nsec = nsec + tk->wall_to_monotonic.tv_nsec;
- while (nsec >= NSEC_PER_SEC) {
- nsec = nsec - NSEC_PER_SEC;
- vdso_ts->sec++;
- }
- vdso_ts->nsec = nsec;
+ vdso_ts->sec += __iter_div_u64_rem(nsec, NSEC_PER_SEC, &vdso_ts->nsec);
if (__arch_use_vsyscall(vdata))
update_vdso_data(vdata, tk);
--
2.20.0
By undefined run_test(), compile of breakpoint_test_arm64.c fails.
This changes arun_test to run_test.
----
reakpoint_test_arm64.c: In function 'main':
breakpoint_test_arm64.c:217:14: warning: implicit declaration of function
'run_test'; did you mean 'arun_test'? [-Wimplicit-function-declaration]
result = run_test(size, MIN(size, 8), wr, wp);
^~~~~~~~
arun_test
----
Fixes: 5821ba969511 ("selftests: Add test plan API to kselftest.h and adjust callers")
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu(a)nigauri.org>
---
tools/testing/selftests/breakpoints/breakpoint_test_arm64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
index 58ed5eeab709..ad41ea69001b 100644
--- a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -109,7 +109,7 @@ static bool set_watchpoint(pid_t pid, int size, int wp)
return false;
}
-static bool arun_test(int wr_size, int wp_size, int wr, int wp)
+static bool run_test(int wr_size, int wp_size, int wr, int wp)
{
int status;
siginfo_t siginfo;
--
2.20.1
## TL;DR
This is a pretty straightforward follow-up to Luis' comments on PATCH
v6: There is nothing that changes any functionality or usage.
As for our current status, we only need reviews/acks on the following
patches:
- [PATCH v7 06/18] kbuild: enable building KUnit
- Need a review or ack from Masahiro Yamada or Michal Marek
- [PATCH v7 08/18] objtool: add kunit_try_catch_throw to the noreturn
list
- Need a review or ack from Josh Poimboeuf or Peter Zijlstra
Other than that, I think we should be good to go.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2/v7 branch.
## Changes Since Last Version
Aside from renaming `struct kunit_module` to `struct kunit_suite`, there
isn't really anything in here that changes any functionality:
- Rebased on v5.2
- Added Iurii as a maintainer for PROC SYSCTL, as suggested by Luis.
- Removed some references to spinlock that I failed to remove in the
previous version, as pointed out by Luis.
- Cleaned up some comments, as suggested by Luis.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2/v7
--
2.22.0.410.gd8fdbe21b5-goog
## TL;DR
This new patch set only contains a very minor change suggested by
Masahiro to [PATCH v7 06/18] and is otherwise identical to PATCH v7.
Also, with Josh's ack on the preceding patch set, I think we now have
all necessary reviews and acks from all interested parties.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2/v8 branch.
## Changes Since Last Version
Like I said in the TL;DR, there is only one minor change since the
previous revision. That change only affects patch 06/18; it makes it so
that make doesn't attempt to scan the kunit/ directory when CONFIG_KUNIT
is not set as suggested by Masahiro.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2/v8
--
2.22.0.410.gd8fdbe21b5-goog
> On Mon, Jul 8, 2019 at 4:16 PM Brendan Higgins <brendanhiggins(a)google.com> wrote:
>>
>> CC'ing Iurii Zaikin
>>
>> On Fri, Jul 5, 2019 at 1:48 PM Luis Chamberlain <mcgrof(a)kernel.org> wrote:
>> >
>> > On Wed, Jul 03, 2019 at 05:36:15PM -0700, Brendan Higgins wrote:
>> > > Add entry for the new proc sysctl KUnit test to the PROC SYSCTL section.
>> > >
>> > > Signed-off-by: Brendan Higgins <brendanhiggins(a)google.com>
>> > > Reviewed-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
>> > > Reviewed-by: Logan Gunthorpe <logang(a)deltatee.com>
>> > > Acked-by: Luis Chamberlain <mcgrof(a)kernel.org>
>> >
>> > Come to think of it, I'd welcome Iurii to be added as a maintainer,
>> > with the hope Iurii would be up to review only the kunit changes. Of
>> > course if Iurii would be up to also help review future proc changes,
>> > even better. 3 pair of eyeballs is better than 2 pairs.
>>
>> What do you think, Iurii?
I'm in.
## TL;DR
This is a pretty straightforward follow-up to Stephen and Luis' comments
on PATCH v5: There is nothing that really changes any functionality or
usage with the minor exception that I renamed `struct kunit_module` to
`struct kunit_suite`. Nevertheless, a good deal of clean ups and fixes.
See "Changes Since Last Version" section below.
As for our current status, right now we got Reviewed-bys on all patches
except:
- [PATCH v6 08/18] objtool: add kunit_try_catch_throw to the noreturn
list
However, it would probably be good to get reviews/acks from the
subsystem maintainers on:
- [PATCH v6 06/18] kbuild: enable building KUnit
- [PATCH v6 08/18] objtool: add kunit_try_catch_throw to the noreturn
list
- [PATCH v6 15/18] Documentation: kunit: add documentation for KUnit
- [PATCH v6 17/18] kernel/sysctl-test: Add null pointer test for
sysctl.c:proc_dointvec()
Other than that, I think we should be good to go.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in about a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[2].
Additionally for convenience, I have applied these patches to a
branch[3]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2-rc7/v6 branch.
## Changes Since Last Version
Aside from renaming `struct kunit_module` to `struct kunit_suite`, there
isn't really anything in here that changes any functionality:
- Rebased on v5.2-rc7
- Got rid of spinlocks.
- Now update success field using WRITE_ONCE. - Suggested by Stephen
- Other instances replaced by mutex. - Suggested by Stephen and Luis
- Renamed `struct kunit_module` to `struct kunit_suite`. - Inspired by
Luis.
- Fixed a broken example of how to use kunit_tool. - Pointed out by Ted
- Added descriptions to unit tests. - Suggested by Luis
- Labeled unit tests which tested the API. - Suggested by Luis
- Made a number of minor cleanups. - Suggested by Luis and Stephen.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://google.github.io/kunit-docs/third_party/kernel/docs/
[3] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2-rc7/v6
--
2.22.0.410.gd8fdbe21b5-goog
Using ".arm .inst" for the arm signature introduces build issues for
programs compiled in Thumb mode because the assembler stays in the
arm mode for the rest of the inline assembly. Revert to using a ".word"
to express the signature as data instead.
The choice of signature is a valid trap instruction on arm32 little
endian, where both code and data are little endian.
ARMv6+ big endian (BE8) generates mixed endianness code vs data:
little-endian code and big-endian data. The data value of the signature
needs to have its byte order reversed to generate the trap instruction.
Prior to ARMv6, -mbig-endian generates big-endian code and data
(which match), so the endianness of the data representation of the
signature should not be reversed. However, the choice between BE32
and BE8 is done by the linker, so we cannot know whether code and
data endianness will be mixed before the linker is invoked. So rather
than try to play tricks with the linker, the rseq signature is simply
data (not a trap instruction) prior to ARMv6 on big endian. This is
why the signature is expressed as data (.word) rather than as
instruction (.inst) in assembler.
Because a ".word" is used to emit the signature, it will be interpreted
as a literal pool by a disassembler, not as an actual instruction.
Considering that the signature is not meant to be executed except in
scenarios where the program execution is completely bogus, this should
not be an issue.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
Acked-by: Will Deacon <will.deacon(a)arm.com>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Joel Fernandes <joelaf(a)google.com>
CC: Catalin Marinas <catalin.marinas(a)arm.com>
CC: Dave Watson <davejwatson(a)fb.com>
CC: Will Deacon <will.deacon(a)arm.com>
CC: Shuah Khan <shuah(a)kernel.org>
CC: Andi Kleen <andi(a)firstfloor.org>
CC: linux-kselftest(a)vger.kernel.org
CC: "H . Peter Anvin" <hpa(a)zytor.com>
CC: Chris Lameter <cl(a)linux.com>
CC: Russell King <linux(a)arm.linux.org.uk>
CC: Michael Kerrisk <mtk.manpages(a)gmail.com>
CC: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
CC: Paul Turner <pjt(a)google.com>
CC: Boqun Feng <boqun.feng(a)gmail.com>
CC: Josh Triplett <josh(a)joshtriplett.org>
CC: Steven Rostedt <rostedt(a)goodmis.org>
CC: Ben Maurer <bmaurer(a)fb.com>
CC: linux-api(a)vger.kernel.org
CC: Andy Lutomirski <luto(a)amacapital.net>
CC: Andrew Morton <akpm(a)linux-foundation.org>
CC: Linus Torvalds <torvalds(a)linux-foundation.org>
CC: Carlos O'Donell <carlos(a)redhat.com>
CC: Florian Weimer <fweimer(a)redhat.com>
---
tools/testing/selftests/rseq/rseq-arm.h | 61 ++++++++++++++++++---------------
1 file changed, 33 insertions(+), 28 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 84f28f147fb6..5943c816c07c 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -6,6 +6,8 @@
*/
/*
+ * - ARM little endian
+ *
* RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
* value 0x5de3. This traps if user-space reaches this instruction by mistake,
* and the uncommon operand ensures the kernel does not move the instruction
@@ -22,36 +24,40 @@
* def3 udf #243 ; 0xf3
* e7f5 b.n <7f5>
*
- * pre-ARMv6 big endian code:
- * e7f5 b.n <7f5>
- * def3 udf #243 ; 0xf3
+ * - ARMv6+ big endian (BE8):
*
* ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
- * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
- * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
- * (which match), so there is no need to reverse the endianness of the data
- * representation of the signature. However, the choice between BE32 and BE8
- * is done by the linker, so we cannot know whether code and data endianness
- * will be mixed before the linker is invoked.
+ * code and big-endian data. The data value of the signature needs to have its
+ * byte order reversed to generate the trap instruction:
+ *
+ * Data: 0xf3def5e7
+ *
+ * Translates to this A32 instruction pattern:
+ *
+ * e7f5def3 udf #24035 ; 0x5de3
+ *
+ * Translates to this T16 instruction pattern:
+ *
+ * def3 udf #243 ; 0xf3
+ * e7f5 b.n <7f5>
+ *
+ * - Prior to ARMv6 big endian (BE32):
+ *
+ * Prior to ARMv6, -mbig-endian generates big-endian code and data
+ * (which match), so the endianness of the data representation of the
+ * signature should not be reversed. However, the choice between BE32
+ * and BE8 is done by the linker, so we cannot know whether code and
+ * data endianness will be mixed before the linker is invoked. So rather
+ * than try to play tricks with the linker, the rseq signature is simply
+ * data (not a trap instruction) prior to ARMv6 on big endian. This is
+ * why the signature is expressed as data (.word) rather than as
+ * instruction (.inst) in assembler.
*/
-#define RSEQ_SIG_CODE 0xe7f5def3
-
-#ifndef __ASSEMBLER__
-
-#define RSEQ_SIG_DATA \
- ({ \
- int sig; \
- asm volatile ("b 2f\n\t" \
- "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
- "2:\n\t" \
- "ldr %[sig], 1b\n\t" \
- : [sig] "=r" (sig)); \
- sig; \
- })
-
-#define RSEQ_SIG RSEQ_SIG_DATA
-
+#ifdef __ARMEB__
+#define RSEQ_SIG 0xf3def5e7 /* udf #24035 ; 0x5de3 (ARMv6+) */
+#else
+#define RSEQ_SIG 0xe7f5def3 /* udf #24035 ; 0x5de3 */
#endif
#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
@@ -125,8 +131,7 @@ do { \
__rseq_str(table_label) ":\n\t" \
".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
- ".arm\n\t" \
- ".inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
+ ".word " __rseq_str(RSEQ_SIG) "\n\t" \
__rseq_str(label) ":\n\t" \
teardown \
"b %l[" __rseq_str(abort_label) "]\n\t"
--
2.11.0
The ftrace test will need to have CONFIG_FTRACE enabled to make the
ftrace directory available.
Add an additional check to skip this test if the CONFIG_FTRACE was not
enabled.
This will be helpful to avoid a false-positive test result when testing
it directly with the following commad against a kernel that does not
have CONFIG_FTRACE enabled:
make -C tools/testing/selftests TARGETS=ftrace run_tests
The test result on an Ubuntu KVM kernel will be changed from:
selftests: ftrace: ftracetest
========================================
Error: No ftrace directory found
not ok 1..1 selftests: ftrace: ftracetest [FAIL]
To:
selftests: ftrace: ftracetest
========================================
CONFIG_FTRACE was not enabled, test skipped.
not ok 1..1 selftests: ftrace: ftracetest [SKIP]
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/ftrace/ftracetest | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/ftrace/ftracetest b/tools/testing/selftests/ftrace/ftracetest
index 6d5e9e8..6c8322e 100755
--- a/tools/testing/selftests/ftrace/ftracetest
+++ b/tools/testing/selftests/ftrace/ftracetest
@@ -7,6 +7,9 @@
# Written by Masami Hiramatsu <masami.hiramatsu.pt(a)hitachi.com>
#
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+
usage() { # errno [message]
[ ! -z "$2" ] && echo $2
echo "Usage: ftracetest [options] [testcase(s)] [testcase-directory(s)]"
@@ -139,7 +142,13 @@ parse_opts $*
# Verify parameters
if [ -z "$TRACING_DIR" -o ! -d "$TRACING_DIR" ]; then
- errexit "No ftrace directory found"
+ ftrace_enabled=`grep "^CONFIG_FTRACE=y" /lib/modules/$(uname -r)/build/.config`
+ if [ -z "$ftrace_enabled" ]; then
+ echo "CONFIG_FTRACE was not enabled, test skipped."
+ exit $ksft_skip
+ else
+ errexit "No ftrace directory found"
+ fi
fi
# Preparing logs
--
2.7.4
## TL;DR
A not so quick follow-up to Stephen's suggestions on PATCH v4. Nothing
that really changes any functionality or usage with the minor exception
of a couple public functions that Stephen asked me to rename.
Nevertheless, a good deal of clean up and fixes. See changes below.
As for our current status, right now we got Reviewed-bys on all patches
except:
- [PATCH v5 08/18] objtool: add kunit_try_catch_throw to the noreturn
list
However, it would probably be good to get reviews/acks from the
subsystem maintainers on:
- [PATCH v5 06/18] kbuild: enable building KUnit
- [PATCH v5 08/18] objtool: add kunit_try_catch_throw to the noreturn
list
- [PATCH v5 15/18] Documentation: kunit: add documentation for KUnit
- [PATCH v5 17/18] kernel/sysctl-test: Add null pointer test for
sysctl.c:proc_dointvec()
- [PATCH v5 18/18] MAINTAINERS: add proc sysctl KUnit test to PROC
SYSCTL section
Other than that, I think we should be good to go.
One last thing, I updated the background to include my thoughts on KUnit
vs. in kernel testing with kselftest in the background sections as
suggested by Frank in the discussion on PATCH v2.
## Background
This patch set proposes KUnit, a lightweight unit testing and mocking
framework for the Linux kernel.
Unlike Autotest and kselftest, KUnit is a true unit testing framework;
it does not require installing the kernel on a test machine or in a VM
(however, KUnit still allows you to run tests on test machines or in VMs
if you want[1]) and does not require tests to be written in userspace
running on a host kernel. Additionally, KUnit is fast: From invocation
to completion KUnit can run several dozen tests in under a second.
Currently, the entire KUnit test suite for KUnit runs in under a second
from the initial invocation (build time excluded).
KUnit is heavily inspired by JUnit, Python's unittest.mock, and
Googletest/Googlemock for C++. KUnit provides facilities for defining
unit test cases, grouping related test cases into test suites, providing
common infrastructure for running tests, mocking, spying, and much more.
### But wait! Doesn't kselftest support in kernel testing?!
In a previous version of this patchset Frank pointed out that kselftest
already supports writing a test that resides in the kernel using the
test module feature[2]. LWN did a really great summary on this
discussion here[3].
Kselftest has a feature that allows a test module to be loaded into a
kernel using the kselftest framework; this does allow someone to write
tests against kernel code not directly exposed to userland; however, it
does not provide much of a framework around how to structure the tests.
The kselftest test module feature just provides a header which has a
standardized way of reporting test failures, and then provides
infrastructure to load and run the tests using the kselftest test
harness.
The kselftest test module does not seem to be opinionated at all in
regards to how tests are structured, how they check for failures, how
tests are organized. Even in the method it provides for reporting
failures is pretty simple; it doesn't have any more advanced failure
reporting or logging features. Given what's there, I think it is fair to
say that it is not actually a framework, but a feature that makes it
possible for someone to do some checks in kernel space.
Furthermore, kselftest test module has very few users. I checked for all
the tests that use it using the following grep command:
grep -Hrn -e 'kselftest_module\.h'
and only got three results: lib/test_strscpy.c, lib/test_printf.c, and
lib/test_bitmap.c.
So despite kselftest test module's existence, there really is no feature
overlap between kselftest and KUnit, save one: that you can use either
to write an in-kernel test, but this is a very small feature in
comparison to everything that KUnit allows you to do. KUnit is a full
x-unit style unit testing framework, whereas kselftest looks a lot more
like an end-to-end/functional testing framework, with a feature that
makes it possible to write in-kernel tests.
### What's so special about unit testing?
A unit test is supposed to test a single unit of code in isolation,
hence the name. There should be no dependencies outside the control of
the test; this means no external dependencies, which makes tests orders
of magnitudes faster. Likewise, since there are no external dependencies,
there are no hoops to jump through to run the tests. Additionally, this
makes unit tests deterministic: a failing unit test always indicates a
problem. Finally, because unit tests necessarily have finer granularity,
they are able to test all code paths easily solving the classic problem
of difficulty in exercising error handling code.
### Is KUnit trying to replace other testing frameworks for the kernel?
No. Most existing tests for the Linux kernel are end-to-end tests, which
have their place. A well tested system has lots of unit tests, a
reasonable number of integration tests, and some end-to-end tests. KUnit
is just trying to address the unit test space which is currently not
being addressed.
### More information on KUnit
There is a bunch of documentation near the end of this patch set that
describes how to use KUnit and best practices for writing unit tests.
For convenience I am hosting the compiled docs here[4].
Additionally for convenience, I have applied these patches to a
branch[5]. The repo may be cloned with:
git clone https://kunit.googlesource.com/linux
This patchset is on the kunit/rfc/v5.2-rc4/v5 branch.
## Changes Since Last Version
Aside from a couple public function renames, there isn't really anything
in here that changes any functionality.
- Went through and fixed a couple of anti-patterns suggested by Stephen
Boyd. Things like:
- Dropping an else clause at the end of a function.
- Dropping the comma on the closing sentinel, `{}`, of a list.
- Inlines a bunch of functions in the test case running logic in patch
01/18 to make it more readable as suggested by Stephen Boyd
- Found and fixed bug in resource deallocation logic in patch 02/18. Bug
was discovered as a result of making a change suggested by Stephen
Boyd. This does not substantially change how any of the code works
conceptually.
- Renamed new_string_stream() to alloc_string_stream() as suggested by
Stephen Boyd.
- Made string-stream a KUnit managed object - based on a suggestion made
by Stephen Boyd.
- Renamed kunit_new_stream() to alloc_kunit_stream() as suggested by
Stephen Boyd.
- Removed the ability to set log level after allocating a kunit_stream,
as suggested by Stephen Boyd.
[1] https://google.github.io/kunit-docs/third_party/kernel/docs/usage.html#kuni…
[2] https://www.kernel.org/doc/html/latest/dev-tools/kselftest.html#test-module
[3] https://lwn.net/Articles/790235/
[4] https://google.github.io/kunit-docs/third_party/kernel/docs/
[5] https://kunit.googlesource.com/linux/+/kunit/rfc/v5.2-rc4/v5
--
2.22.0.410.gd8fdbe21b5-goog
Hi
this patchset aims to add the initial arch-specific arm64 support to
kselftest starting with signals-related test-cases.
A common internal test-case layout is proposed which then it is
anyway wired-up to the toplevel kselftest Makefile, so that it
should be possible at the end to run it on an arm64 target in the
usual way with KSFT.
~/linux# make TARGETS=arm64 kselftest
New KSFT arm64 testcases live inside tools/testing/selftests/arm64 grouped by
family inside subdirectories: arm64/signal is the first family proposed with
this series. arm64/signal tests can be run via KSFT or standalone.
Thanks
Cristian
Notes:
-----
- further details in the included READMEs
- more tests still to be written (current strategy is going through the related
Kernel signal-handling code and write a test for each possible and sensible code-path)
- a bit of overlap around KSFT arm64/ Makefiles is expected with this recently
proposed patch-series:
http://lists.infradead.org/pipermail/linux-arm-kernel/2019-June/659432.html
Changes:
--------
v1-->v2:
- rebased on 5.2-rc7
- various makefile's cleanups
- mixed READMEs fixes
- fixed test_arm64_signals.sh runner script
- cleaned up assembly code in signal.S
- improved get_current_context() logic
- fixed SAFE_WRITE()
- common support code splitted into more chunks, each one introduced when
needed by some new testcases
- fixed some headers validation routines in testcases.c
- removed some still broken/immature tests:
+ fake_sigreturn_misaligned
+ fake_sigreturn_overflow_reserved
+ mangle_pc_invalid
+ mangle_sp_misaligned
- fixed some other testcases:
+ mangle_pstate_ssbs_regs: better checks of SSBS bit when feature unsupported
+ mangle_pstate_invalid_compat_toggle: name fix
+ mangle_pstate_invalid_mode_el[1-3]: precautionary zeroing PSTATE.MODE
+ fake_sigreturn_bad_magic, fake_sigreturn_bad_size,
fake_sigreturn_bad_size_for_magic0:
- accounting for available space...dropping extra when needed
- keeping alignent
- new testcases on FPSMID context:
+ fake_sigreturn_missing_fpsimd
+ fake_sigreturn_duplicated_fpsimd
Cristian Marussi (10):
kselftest: arm64: introduce new boilerplate code
kselftest: arm64: adds first test and common utils
kselftest: arm64: mangle_pstate_invalid_daif_bits
kselftest: arm64: mangle_pstate_invalid_mode_el
kselftest: arm64: mangle_pstate_ssbs_regs
kselftest: arm64: fake_sigreturn_bad_magic
kselftest: arm64: fake_sigreturn_bad_size_for_magic0
kselftest: arm64: fake_sigreturn_missing_fpsimd
kselftest: arm64: fake_sigreturn_duplicated_fpsimd
kselftest: arm64: fake_sigreturn_bad_size
tools/testing/selftests/Makefile | 1 +
tools/testing/selftests/arm64/Makefile | 51 +++
tools/testing/selftests/arm64/README | 43 +++
.../testing/selftests/arm64/signal/.gitignore | 5 +
tools/testing/selftests/arm64/signal/Makefile | 86 +++++
tools/testing/selftests/arm64/signal/README | 59 ++++
.../testing/selftests/arm64/signal/signals.S | 64 ++++
.../arm64/signal/test_arm64_signals.sh | 55 +++
.../selftests/arm64/signal/test_signals.c | 26 ++
.../selftests/arm64/signal/test_signals.h | 141 ++++++++
.../arm64/signal/test_signals_utils.c | 331 ++++++++++++++++++
.../arm64/signal/test_signals_utils.h | 31 ++
.../arm64/signal/testcases/.gitignore | 11 +
.../testcases/fake_sigreturn_bad_magic.c | 63 ++++
.../testcases/fake_sigreturn_bad_size.c | 85 +++++
.../fake_sigreturn_bad_size_for_magic0.c | 57 +++
.../fake_sigreturn_duplicated_fpsimd.c | 62 ++++
.../testcases/fake_sigreturn_missing_fpsimd.c | 44 +++
.../mangle_pstate_invalid_compat_toggle.c | 25 ++
.../mangle_pstate_invalid_daif_bits.c | 28 ++
.../mangle_pstate_invalid_mode_el1.c | 29 ++
.../mangle_pstate_invalid_mode_el2.c | 29 ++
.../mangle_pstate_invalid_mode_el3.c | 29 ++
.../testcases/mangle_pstate_ssbs_regs.c | 56 +++
.../arm64/signal/testcases/testcases.c | 150 ++++++++
.../arm64/signal/testcases/testcases.h | 83 +++++
26 files changed, 1644 insertions(+)
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100644 tools/testing/selftests/arm64/README
create mode 100644 tools/testing/selftests/arm64/signal/.gitignore
create mode 100644 tools/testing/selftests/arm64/signal/Makefile
create mode 100644 tools/testing/selftests/arm64/signal/README
create mode 100644 tools/testing/selftests/arm64/signal/signals.S
create mode 100755 tools/testing/selftests/arm64/signal/test_arm64_signals.sh
create mode 100644 tools/testing/selftests/arm64/signal/test_signals.c
create mode 100644 tools/testing/selftests/arm64/signal/test_signals.h
create mode 100644 tools/testing/selftests/arm64/signal/test_signals_utils.c
create mode 100644 tools/testing/selftests/arm64/signal/test_signals_utils.h
create mode 100644 tools/testing/selftests/arm64/signal/testcases/.gitignore
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_magic.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_size.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_bad_size_for_magic0.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_duplicated_fpsimd.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/fake_sigreturn_missing_fpsimd.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_compat_toggle.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_daif_bits.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el1.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el2.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_invalid_mode_el3.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/mangle_pstate_ssbs_regs.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/testcases.c
create mode 100644 tools/testing/selftests/arm64/signal/testcases/testcases.h
--
2.17.1
This patch series moves Hyper-V clock/timer code to a separate Hyper-V
clocksource driver. Previously, Hyper-V clock/timer code and data
structures were mixed in with other Hyper-V code in the ISA independent
drivers/hv code as well as in ISA dependent code. The new Hyper-V
clocksource driver is ISA agnostic, with a just few dependencies on
ISA specific functions. The patch series does not change any behavior
or functionality -- it only reorganizes the existing code and fixes up
the linkages. A few places outside of Hyper-V code are fixed up to use
the new #include file structure.
This restructuring is in response to Marc Zyngier's review comments
on supporting Hyper-V running on ARM64, and is a good idea in general.
It increases the amount of code shared between the x86 and ARM64
architectures, and reduces the size of the new code for supporting
Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
follow once this clocksource restructuring is accepted.
The code is diff'ed against the upstream tip tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso
Changes in v5:
* Revised commit summaries [Thomas Gleixner]
* Removed call to clockevents_unbind_device() [Thomas Gleixner]
* Restructured hv_init_clocksource() [Thomas Gleixner]
* Various other small code cleanups [Thomas Gleixner]
Changes in v4:
* Revised commit messages
* Rebased to upstream tip tree
Changes in v3:
* Removed boolean argument to hv_init_clocksource(). Always call
sched_clock_register, which is needed on ARM64 but a no-op on x86.
* Removed separate cpuhp setup in hv_stimer_alloc() and instead
directly call hv_stimer_init() and hv_stimer_cleanup() from
corresponding VMbus functions. This more closely matches original
code and avoids clocksource stop/restart problems on ARM64 when
VMbus code denies CPU offlining request.
Changes in v2:
* Revised commit short descriptions so the distinction between
the first and second patches is clearer [GregKH]
* Renamed new clocksource driver files and functions to use
existing "timer" and "stimer" names instead of introducing
"syntimer". [Vitaly Kuznetsov]
* Introduced CONFIG_HYPER_TIMER to fix build problem when
CONFIG_HYPERV=m [Vitaly Kuznetsov]
* Added "Suggested-by: Marc Zyngier"
Michael Kelley (2):
clocksource/drivers: Make Hyper-V clocksource ISA agnostic
clocksource/drivers: Continue making Hyper-V clocksource ISA agnostic
MAINTAINERS | 2 +
arch/x86/entry/vdso/vma.c | 2 +-
arch/x86/hyperv/hv_init.c | 91 +--------
arch/x86/include/asm/hyperv-tlfs.h | 6 +
arch/x86/include/asm/mshyperv.h | 81 +-------
arch/x86/include/asm/vdso/gettimeofday.h | 2 +-
arch/x86/kernel/cpu/mshyperv.c | 4 +-
arch/x86/kvm/x86.c | 1 +
drivers/clocksource/Makefile | 1 +
drivers/clocksource/hyperv_timer.c | 339 +++++++++++++++++++++++++++++++
drivers/hv/Kconfig | 3 +
drivers/hv/hv.c | 156 +-------------
drivers/hv/hv_util.c | 1 +
drivers/hv/hyperv_vmbus.h | 3 -
drivers/hv/vmbus_drv.c | 42 ++--
include/clocksource/hyperv_timer.h | 105 ++++++++++
16 files changed, 503 insertions(+), 336 deletions(-)
create mode 100644 drivers/clocksource/hyperv_timer.c
create mode 100644 include/clocksource/hyperv_timer.h
--
1.8.3.1
The psock_tpacket test will need to access /proc/kallsyms, this would
require the kernel config CONFIG_KALLSYMS to be enabled first.
Apart from adding CONFIG_KALLSYMS to the net/config file here, check the
file existence to determine if we can run this test will be helpful to
avoid a false-positive test result when testing it directly with the
following commad against a kernel that have CONFIG_KALLSYMS disabled:
make -C tools/testing/selftests TARGETS=net run_tests
Signed-off-by: Po-Hsu Lin <po-hsu.lin(a)canonical.com>
---
tools/testing/selftests/net/config | 1 +
tools/testing/selftests/net/run_afpackettests | 14 +++++++++-----
2 files changed, 10 insertions(+), 5 deletions(-)
diff --git a/tools/testing/selftests/net/config b/tools/testing/selftests/net/config
index 4740404..3dea2cb 100644
--- a/tools/testing/selftests/net/config
+++ b/tools/testing/selftests/net/config
@@ -25,3 +25,4 @@ CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NFT_CHAIN_NAT_IPV6=m
CONFIG_NFT_CHAIN_NAT_IPV4=m
+CONFIG_KALLSYMS=y
diff --git a/tools/testing/selftests/net/run_afpackettests b/tools/testing/selftests/net/run_afpackettests
index ea5938e..8b42e8b 100755
--- a/tools/testing/selftests/net/run_afpackettests
+++ b/tools/testing/selftests/net/run_afpackettests
@@ -21,12 +21,16 @@ fi
echo "--------------------"
echo "running psock_tpacket test"
echo "--------------------"
-./in_netns.sh ./psock_tpacket
-if [ $? -ne 0 ]; then
- echo "[FAIL]"
- ret=1
+if [ -f /proc/kallsyms ]; then
+ ./in_netns.sh ./psock_tpacket
+ if [ $? -ne 0 ]; then
+ echo "[FAIL]"
+ ret=1
+ else
+ echo "[PASS]"
+ fi
else
- echo "[PASS]"
+ echo "[SKIP] CONFIG_KALLSYMS not enabled"
fi
echo "--------------------"
--
2.7.4
From: Colin Ian King <colin.king(a)canonical.com>
There is an spelling mistake in an a test error message. Fix it.
Signed-off-by: Colin Ian King <colin.king(a)canonical.com>
---
tools/testing/selftests/x86/test_vsyscall.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/x86/test_vsyscall.c b/tools/testing/selftests/x86/test_vsyscall.c
index 4602326b8f5b..a4f4d4cf22c3 100644
--- a/tools/testing/selftests/x86/test_vsyscall.c
+++ b/tools/testing/selftests/x86/test_vsyscall.c
@@ -451,7 +451,7 @@ static int test_vsys_x(void)
printf("[OK]\tExecuting the vsyscall page failed: #PF(0x%lx)\n",
segv_err);
} else {
- printf("[FAILT]\tExecution failed with the wrong error: #PF(0x%lx)\n",
+ printf("[FAIL]\tExecution failed with the wrong error: #PF(0x%lx)\n",
segv_err);
return 1;
}
--
2.20.1
Commit a745f7af3cbd ("selftests/harness: Add 30 second timeout per
test") solves that binary tests doesn't hang forever. However, scripts
can still hang forever, this adds an timeout to each test script run. This
assumes that an individual test doesn't take longer than 30 seconds.
Signed-off-by: Anders Roxell <anders.roxell(a)linaro.org>
---
tools/testing/selftests/kselftest/runner.sh | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/kselftest/runner.sh b/tools/testing/selftests/kselftest/runner.sh
index 00c9020bdda8..cff7d2d83648 100644
--- a/tools/testing/selftests/kselftest/runner.sh
+++ b/tools/testing/selftests/kselftest/runner.sh
@@ -5,6 +5,7 @@
export skip_rc=4
export logfile=/dev/stdout
export per_test_logging=
+export TEST_TIMEOUT_DEFAULT=30
# There isn't a shell-agnostic way to find the path of a sourced file,
# so we must rely on BASE_DIR being set to find other tools.
@@ -24,6 +25,14 @@ tap_prefix()
fi
}
+tap_timeout()
+{
+ if [ -x /usr/bin/timeout ] && [ -x "$BASENAME_TEST" ] \
+ && file $BASENAME_TEST |grep -q "shell script"; then
+ echo -n "timeout $TEST_TIMEOUT_DEFAULT"
+ fi
+}
+
run_one()
{
DIR="$1"
@@ -44,7 +53,7 @@ run_one()
echo "not ok $test_num $TEST_HDR_MSG"
else
cd `dirname $TEST` > /dev/null
- (((((./$BASENAME_TEST 2>&1; echo $? >&3) |
+ ((((( tap_timeout ./$BASENAME_TEST 2>&1; echo $? >&3) |
tap_prefix >&4) 3>&1) |
(read xs; exit $xs)) 4>>"$logfile" &&
echo "ok $test_num $TEST_HDR_MSG") ||
--
2.11.0
The t->rcu_read_unlock_special union's need_qs bit can be set by the
scheduler tick (in rcu_flavor_sched_clock_irq) to indicate that help is
needed from the rcu_read_unlock path. When this help arrives however, we
can do better to speed up the quiescent state reporting which if
rcu_read_unlock_special::need_qs is set might be quite urgent. Make use
of this information in deciding when to do heavy-weight softirq raising
where possible.
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
kernel/rcu/tree_plugin.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index c588ef98efd3..bff6410fac06 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -622,7 +622,8 @@ static void rcu_read_unlock_special(struct task_struct *t)
t->rcu_read_unlock_special.b.exp_hint = false;
exp = (t->rcu_blocked_node && t->rcu_blocked_node->exp_tasks) ||
(rdp->grpmask & rnp->expmask) ||
- tick_nohz_full_cpu(rdp->cpu);
+ tick_nohz_full_cpu(rdp->cpu) ||
+ t->rcu_read_unlock_special.b.need_qs;
// Need to defer quiescent state until everything is enabled.
if (irqs_were_disabled && use_softirq &&
(in_interrupt() ||
--
2.22.0.410.gd8fdbe21b5-goog
Hello,
The proposed introduction of a relaxed ARM64 ABI [1] will allow tagged memory
addresses to be passed through the user-kernel syscall ABI boundary. Tagged
memory addresses are those which contain a non-zero top byte (the hardware
has always ignored this top byte due to TCR_EL1.TBI0) and may be useful
for features such as HWASan.
To permit this relaxation a proposed patchset [2] strips the top byte (tag)
from user provided memory addresses prior to use in kernel functions which
require untagged addresses (for example comparasion/arithmetic of addresses).
The author of this patchset relied on a variety of techniques [2] (such as
grep, BUG_ON, sparse etc) to identify as many instances of possible where
tags need to be stipped.
To support this effort and to catch future regressions (e.g. in new syscalls
or ioctls), I've devised an additional approach for detecting the use of
tagged addresses in functions that do not want them. This approach makes
use of Smatch [3] and is outlined in this RFC. Due to the ability of Smatch
to do flow analysis I believe we can annotate the kernel in fewer places
than a similar approach in sparse.
I'm keen for feedback on the likely usefulness of this approach.
We first add some new annotations that are exclusively consumed by Smatch:
--- a/include/linux/compiler_types.h
+++ b/include/linux/compiler_types.h
@@ -19,6 +19,7 @@
# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)
# define __percpu __attribute__((noderef, address_space(3)))
# define __rcu __attribute__((noderef, address_space(4)))
+# define __untagged __attribute__((address_space(5)))
# define __private __attribute__((noderef))
extern void __chk_user_ptr(const volatile void __user *);
extern void __chk_io_ptr(const volatile void __iomem *);
@@ -45,6 +46,7 @@ extern void __chk_io_ptr(const volatile void __iomem *);
...
The purpose of this annotation is to indicate in function prototypes that a
given argument must not be a user tagged memory address. (The address space
number isn't significant here and could be replaced with any other annotation
that we get Smatch to understand).
An example of how we use this annotation is as follows:
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -2224,7 +2224,7 @@ get_unmapped_area(struct file *file, unsigned long addr, unsigned long len,
EXPORT_SYMBOL(get_unmapped_area);
/* Look up the first VMA which satisfies addr < vm_end, NULL if none. */
-struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long addr)
+struct vm_area_struct *find_vma(struct mm_struct *mm, unsigned long __untagged addr)
{
struct rb_node *rb_node;
struct vm_area_struct *vma;
Thus our intent here is to flag up whenever addr is tagged. Specifically
modifications I've made to Smatch to test this will flag an issue where all
the following conditions are met:
1. the parameter is used in the function
2. the data in the parameter has originated or been derived from userspace
(this relies on existing Smatch functionality to detect where data has
come from)
3. the data's top byte is non-zero (via flow analysis to determine the
range of values that it may be given the known call-tree).
Due to the use of smatch and its flow-analysis we don't need to propogate the
__untagged annotation up the call chain to the callers and their callers - thus
allowing us to only annotate functions that actually does something with the address
and it's only necessary if the function has the potential to receive user data. I.e.
we only need to tag find_vma to catch an issue with mmap_region (because it calls
count_vma_pages_range which calls find_vma_intersection which calls find_vma).
Due to condition 3 above, the use of the existing untagged_addr macro (or anything
that does something similar) will prevent smatch from producing a warning.
Using a vanilla (v5.2-rc2) kernel, and a single find_vma annotation, smatch will
produce the following warnings:
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
mm/mmap.c:2227 find_vma() warn: Variable addr looks like a tagged address - it is not allowed here
...
The warning is printed for each call site that calls find_vma with a tagged
address from userspace. After 6 runs of smatch, 24 warnings are produced.
The warnings are helpful in detecting issues, but not useful in providing enough
information to debug the issue and find the offending functions higher up the
call stack that call find_vma. Smatch is good at determining the ranges of values
that can be passed to a function, but it doesn't keep track of how it determined
those ranges - this makes it difficult to determine the offending function.
However even this level of limited functionality is helpful - as once the kernel
is initially sanitized of tagged addresses, then the use of smatch here can
spot regressions and offending code identified via git aiaiai/bisect.
Smatch builds a database which includes a table of functions, who they are
called by and with what range of parameters. Smatch also provides a bunch
of perl and python scripts which can be used to extract helpful information
for example to produce a call tree for a given function. I've adapted these
scripts such that for a given function (e.g. find_vma) it will show you the
call tree where callers pass user data and where that data is tagged addresses.
The output looks something like this:
find_vma() - 0-u64max (1)
kvm_arch_prepare_memory_region() - 0-u64max (1)
__kvm_set_memory_region() - 0-u64max (1)
kvm_set_memory_region() - 0-u64max (1)
kvm_vm_ioctl_set_memory_region() - 0-u64max (1)
hugetlb_get_unmapped_area() - 0-u64max (1)
shm_get_unmapped_area() - 0-u64max (1)
shm_get_unmapped_area() - 32785,40977,98321,106513,2097151-u64max[c] (1)
...
In summary the following are found, note this currently unhelpfuly includes
functions inbetween find_vma and the leaf functions:
$ cat find_vma_tree_orig | sed -e 's/^[ \t]*//' | cut -d ' ' -f 1 | sort | uniq
call_mmap()
check_and_migrate_cma_pages()
compat_ipv6_getsockopt()
compat_sock_common_getsockopt()
compat_tcp_getsockopt()
count_vma_pages_range()
__do_compat_sys_get_mempolicy()
do_get_mempolicy()
do_ioctl()
do_mincore()
do_mlock()
do_mmap()
do_mmap_pgoff()
...
As you can see, this gives a good point in the right direction for hunting
down callers of find_vma with tagged addresses.
This can be further improved - the problem here is that for a given function,
e.g. find_vma we look for callers where *any* of the parameters
passed to find_vma are tagged addresses from userspace - i.e. not *just*
the annotated parameter. This is also true for find_vma's callers' callers'.
This results in the call tree having false positives.
It *is* possible to track parameters (e.g. find_vma arg 1 comes from arg 3 of
do_pages_stat_array etc), but this is limited as if functions modify the
data then the tracking is stopped (however this can be fixed).
After apply the patchset ([PATCH v16 00/16] arm64: untag user pointers passed
to the kernel)[2] which untags user addresses, smatch indicates 11 issues. The
call tree is reduced. After grep'ing the call tree output, there are some valid
instances where untagging is needed, e.g:
gntdev_ioctl_get_offset_for_vaddr()
kvm_vm_ioctl_set_memory_region()
privcmd_ioctl_mmap_batch()
privcmd_ioctl_mmap_resource()
__se_sys_brk()
__se_sys_mremap()
__se_sys_munmap()
__se_sys_remap_file_pages()
__se_sys_shmat()
__se_sys_shmdt()
__vm_munmap()
...
An example of a false positve is do_mlock. We untag the address and pass that
to apply_vma_lock_flags - however we also pass a length - because the length
came from userspace and could have the top bits set - it's flagged. However
with improved parameter tracking we can remove this false positive and similar.
Prior to smatch I attempted a similar approach with sparse - however it seemed
necessary to propogate the __untagged annotation in every function up the call tree,
and resulted in adding the __untagged annotation to functions that would never
get near user provided data. This leads to a littering of __untagged all over the
kernel which doesn't seem appealing. Smatch is more capable, however it almost
certainly won't pick up 100% of issues due to the difficulity of making flow
analysis understand everything a compiler can.
Is it likely to be acceptable to use the __untagged annotation in user-path
functions that require untagged addresses across the kernel?
Thanks,
Andrew Murray
[1] https://lkml.org/lkml/2019/6/13/534
[2] https://patchwork.kernel.org/cover/10989517/
[3] http://smatch.sourceforge.net/
vDSO (virtual dynamic shared object) is a mechanism that the Linux
kernel provides as an alternative to system calls to reduce where
possible the costs in terms of cycles.
This is possible because certain syscalls like gettimeofday() do
not write any data and return one or more values that are stored
in the kernel, which makes relatively safe calling them directly
as a library function.
Even if the mechanism is pretty much standard, every architecture
in the last few years ended up implementing their own vDSO library
in the architectural code.
The purpose of this patch-set is to identify the commonalities in
between the architectures and try to consolidate the common code
paths, starting with gettimeofday().
This implementation contains the following design choices:
* Every architecture defines the arch specific code in an header in
"asm/vdso/".
* The generic implementation includes the arch specific one and lives
in "lib/vdso".
* The arch specific code for gettimeofday lives in
"<arch path>/vdso/gettimeofday.c" and includes the generic code only.
* The generic implementation of update_vsyscall and update_vsyscall_tz
lives in kernel/vdso and provide the bindings that can be implemented
by each architecture.
* Each architecture provides its implementation of the bindings in
"asm/vdso/vsyscall.h".
* This approach allows to consolidate the common code in a single place
with the benefit of avoiding code duplication.
This implementation contains the portings to the common library for: arm64,
compat mode for arm64, arm, mips, x86_64, x32, compat mode for x86_64 and
i386.
The mips porting has been tested on qemu for mips32el. A configuration to
repeat the tests can be found at [4].
The x86_64 porting has been tested on an Intel Xeon 5120T based machine
running Ubuntu 18.04 and using the Ubuntu provided defconfig.
The i386 porting has been tested on qemu using the i386_defconfig
configuration.
Last but not least from this porting arm64, compat arm64, arm and mips gain
the support for:
* CLOCK_BOOTTIME that can be useful in certain scenarios since it keeps
track of the time during sleep as well.
* CLOCK_TAI that is like CLOCK_REALTIME, but uses the International
Atomic Time (TAI) reference instead of UTC to avoid jumping on leap
second updates.
for both clock_gettime and clock_getres.
The porting has been validated using the vdsotest test-suite [1] extended
to cover all the clock ids [2].
A new test has been added to the linux kselftest in order to validate the
newly added library.
The porting has been benchmarked and the performance results are
provided as part of this cover letter.
To simplify the testing, a copy of the patchset on top of a recent linux
tree can be found at [3] and [4].
[1] https://github.com/nathanlynch/vdsotest
[2] https://github.com/fvincenzo/vdsotest
[3] git://linux-arm.org/linux-vf.git vdso/v6
[4] git://linux-arm.org/linux-vf.git vdso-mips/v6
Changes:
--------
v6:
- Rebased on 5.2-rc2.
- Added performance numbers.
- Removed vdso_types.h.
- Unified update_vsyscall and update_vsyscall_tz.
- Reworked the kselftest included in this patchset.
- Addressed review comments.
v5:
- Rebased on 5.0-rc7.
- Added x86_64, compat mode for x86_64 and i386 portings.
- Extended vDSO kselftest.
- Addressed review comments.
v4:
- Rebased on 5.0-rc2.
- Addressed review comments.
- Disabled compat vdso on arm64 when the kernel is compiled with
clang.
v3:
- Ported the latest fixes and optimizations done on the x86
architecture to the generic library.
- Addressed review comments.
- Improved the documentation of the interfaces.
- Changed the HAVE_ARCH_TIMER config option to a more generic
HAVE_HW_COUNTER.
v2:
- Added -ffixed-x18 to arm64
- Repleced occurrences of timeval and timespec
- Modified datapage.h to be compliant with y2038 on all the architectures
- Removed __u_vdso type
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Russell King <linux(a)armlinux.org.uk>
Cc: Ralf Baechle <ralf(a)linux-mips.org>
Cc: Paul Burton <paul.burton(a)mips.com>
Cc: Daniel Lezcano <daniel.lezcano(a)linaro.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Mark Salyzyn <salyzyn(a)android.com>
Cc: Peter Collingbourne <pcc(a)google.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Dmitry Safonov <0x7f454c46(a)gmail.com>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Huw Davies <huw(a)codeweavers.com>
Signed-off-by: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Performance Numbers: Linux 5.2.0-rc2 - Xeon Gold 5120T
======================================================
Unified vDSO:
-------------
clock-gettime-monotonic: syscall: 342 nsec/call
clock-gettime-monotonic: libc: 25 nsec/call
clock-gettime-monotonic: vdso: 24 nsec/call
clock-getres-monotonic: syscall: 296 nsec/call
clock-getres-monotonic: libc: 296 nsec/call
clock-getres-monotonic: vdso: 3 nsec/call
clock-gettime-monotonic-coarse: syscall: 294 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 5 nsec/call
clock-getres-monotonic-coarse: syscall: 295 nsec/call
clock-getres-monotonic-coarse: libc: 292 nsec/call
clock-getres-monotonic-coarse: vdso: 5 nsec/call
clock-gettime-monotonic-raw: syscall: 343 nsec/call
clock-gettime-monotonic-raw: libc: 25 nsec/call
clock-gettime-monotonic-raw: vdso: 23 nsec/call
clock-getres-monotonic-raw: syscall: 290 nsec/call
clock-getres-monotonic-raw: libc: 290 nsec/call
clock-getres-monotonic-raw: vdso: 4 nsec/call
clock-gettime-tai: syscall: 332 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 288 nsec/call
clock-getres-tai: libc: 288 nsec/call
clock-getres-tai: vdso: 3 nsec/call
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 24 nsec/call
clock-gettime-boottime: vdso: 23 nsec/call
clock-getres-boottime: syscall: 284 nsec/call
clock-getres-boottime: libc: 291 nsec/call
clock-getres-boottime: vdso: 3 nsec/call
clock-gettime-realtime: syscall: 337 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 23 nsec/call
clock-getres-realtime: syscall: 287 nsec/call
clock-getres-realtime: libc: 284 nsec/call
clock-getres-realtime: vdso: 3 nsec/call
clock-gettime-realtime-coarse: syscall: 307 nsec/call
clock-gettime-realtime-coarse: libc: 4 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 294 nsec/call
clock-getres-realtime-coarse: libc: 291 nsec/call
clock-getres-realtime-coarse: vdso: 4 nsec/call
getcpu: syscall: 246 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 26 nsec/call
gettimeofday: vdso: 25 nsec/call
Stock Kernel:
-------------
clock-gettime-monotonic: syscall: 338 nsec/call
clock-gettime-monotonic: libc: 24 nsec/call
clock-gettime-monotonic: vdso: 23 nsec/call
clock-getres-monotonic: syscall: 291 nsec/call
clock-getres-monotonic: libc: 304 nsec/call
clock-getres-monotonic: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-coarse: syscall: 297 nsec/call
clock-gettime-monotonic-coarse: libc: 5 nsec/call
clock-gettime-monotonic-coarse: vdso: 4 nsec/call
clock-getres-monotonic-coarse: syscall: 281 nsec/call
clock-getres-monotonic-coarse: libc: 286 nsec/call
clock-getres-monotonic-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-monotonic-raw: syscall: 336 nsec/call
clock-gettime-monotonic-raw: libc: 340 nsec/call
clock-gettime-monotonic-raw: vdso: 346 nsec/call
clock-getres-monotonic-raw: syscall: 297 nsec/call
clock-getres-monotonic-raw: libc: 301 nsec/call
clock-getres-monotonic-raw: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-tai: syscall: 351 nsec/call
clock-gettime-tai: libc: 24 nsec/call
clock-gettime-tai: vdso: 23 nsec/call
clock-getres-tai: syscall: 298 nsec/call
clock-getres-tai: libc: 290 nsec/call
clock-getres-tai: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-boottime: syscall: 342 nsec/call
clock-gettime-boottime: libc: 347 nsec/call
clock-gettime-boottime: vdso: 355 nsec/call
clock-getres-boottime: syscall: 296 nsec/call
clock-getres-boottime: libc: 295 nsec/call
clock-getres-boottime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime: syscall: 346 nsec/call
clock-gettime-realtime: libc: 24 nsec/call
clock-gettime-realtime: vdso: 22 nsec/call
clock-getres-realtime: syscall: 295 nsec/call
clock-getres-realtime: libc: 291 nsec/call
clock-getres-realtime: vdso: not tested
Note: vDSO version of clock_getres not found
clock-gettime-realtime-coarse: syscall: 292 nsec/call
clock-gettime-realtime-coarse: libc: 5 nsec/call
clock-gettime-realtime-coarse: vdso: 4 nsec/call
clock-getres-realtime-coarse: syscall: 300 nsec/call
clock-getres-realtime-coarse: libc: 301 nsec/call
clock-getres-realtime-coarse: vdso: not tested
Note: vDSO version of clock_getres not found
getcpu: syscall: 252 nsec/call
getcpu: libc: 14 nsec/call
getcpu: vdso: 11 nsec/call
gettimeofday: syscall: 293 nsec/call
gettimeofday: libc: 24 nsec/call
gettimeofday: vdso: 25 nsec/call
Peter Collingbourne (1):
arm64: Build vDSO with -ffixed-x18
Vincenzo Frascino (18):
kernel: Standardize vdso_datapage
kernel: Define gettimeofday vdso common code
kernel: Unify update_vsyscall implementation
arm64: Substitute gettimeofday with C implementation
arm64: compat: Add missing syscall numbers
arm64: compat: Expose signal related structures
arm64: compat: Generate asm offsets for signals
lib: vdso: Add compat support
arm64: compat: Add vDSO
arm64: Refactor vDSO code
arm64: compat: vDSO setup for compat layer
arm64: elf: vDSO code page discovery
arm64: compat: Get sigreturn trampolines from vDSO
arm64: Add vDSO compat support
arm: Add support for generic vDSO
mips: Add support for generic vDSO
x86: Add support for generic vDSO
kselftest: Extend vDSO selftest
arch/arm/Kconfig | 3 +
arch/arm/include/asm/vdso/gettimeofday.h | 96 +++++
arch/arm/include/asm/vdso/vsyscall.h | 71 ++++
arch/arm/include/asm/vdso_datapage.h | 29 +-
arch/arm/kernel/vdso.c | 87 +----
arch/arm/vdso/Makefile | 13 +-
arch/arm/vdso/note.c | 15 +
arch/arm/vdso/vdso.lds.S | 2 +
arch/arm/vdso/vgettimeofday.c | 268 +------------
arch/arm64/Kconfig | 3 +
arch/arm64/Makefile | 23 +-
arch/arm64/include/asm/elf.h | 14 +
arch/arm64/include/asm/signal32.h | 46 +++
arch/arm64/include/asm/unistd.h | 5 +
arch/arm64/include/asm/vdso.h | 3 +
arch/arm64/include/asm/vdso/compat_barrier.h | 51 +++
.../include/asm/vdso/compat_gettimeofday.h | 108 ++++++
arch/arm64/include/asm/vdso/gettimeofday.h | 84 +++++
arch/arm64/include/asm/vdso/vsyscall.h | 53 +++
arch/arm64/include/asm/vdso_datapage.h | 48 ---
arch/arm64/kernel/Makefile | 6 +-
arch/arm64/kernel/asm-offsets.c | 39 +-
arch/arm64/kernel/signal32.c | 72 ++--
arch/arm64/kernel/vdso.c | 356 ++++++++++++------
arch/arm64/kernel/vdso/Makefile | 34 +-
arch/arm64/kernel/vdso/gettimeofday.S | 334 ----------------
arch/arm64/kernel/vdso/vgettimeofday.c | 28 ++
arch/arm64/kernel/vdso32/.gitignore | 2 +
arch/arm64/kernel/vdso32/Makefile | 184 +++++++++
arch/arm64/kernel/vdso32/note.c | 15 +
arch/arm64/kernel/vdso32/sigreturn.S | 62 +++
arch/arm64/kernel/vdso32/vdso.S | 19 +
arch/arm64/kernel/vdso32/vdso.lds.S | 82 ++++
arch/arm64/kernel/vdso32/vgettimeofday.c | 59 +++
arch/mips/Kconfig | 2 +
arch/mips/include/asm/vdso.h | 78 +---
arch/mips/include/asm/vdso/gettimeofday.h | 175 +++++++++
arch/mips/{ => include/asm}/vdso/vdso.h | 6 +-
arch/mips/include/asm/vdso/vsyscall.h | 43 +++
arch/mips/kernel/vdso.c | 37 +-
arch/mips/vdso/Makefile | 25 +-
arch/mips/vdso/elf.S | 2 +-
arch/mips/vdso/gettimeofday.c | 273 --------------
arch/mips/vdso/sigreturn.S | 2 +-
arch/mips/vdso/vdso.lds.S | 4 +
arch/mips/vdso/vgettimeofday.c | 57 +++
arch/x86/Kconfig | 3 +
arch/x86/entry/vdso/Makefile | 9 +
arch/x86/entry/vdso/vclock_gettime.c | 251 +++---------
arch/x86/entry/vdso/vdso.lds.S | 2 +
arch/x86/entry/vdso/vdso32/vdso32.lds.S | 2 +
arch/x86/entry/vdso/vdsox32.lds.S | 1 +
arch/x86/entry/vsyscall/Makefile | 2 -
arch/x86/entry/vsyscall/vsyscall_gtod.c | 83 ----
arch/x86/include/asm/mshyperv-tsc.h | 76 ++++
arch/x86/include/asm/mshyperv.h | 70 +---
arch/x86/include/asm/pvclock.h | 2 +-
arch/x86/include/asm/vdso/gettimeofday.h | 203 ++++++++++
arch/x86/include/asm/vdso/vsyscall.h | 44 +++
arch/x86/include/asm/vgtod.h | 75 +---
arch/x86/include/asm/vvar.h | 7 +-
arch/x86/kernel/pvclock.c | 1 +
include/asm-generic/vdso/vsyscall.h | 56 +++
include/linux/hrtimer.h | 15 +-
include/linux/hrtimer_defs.h | 25 ++
include/linux/timekeeper_internal.h | 9 +
include/vdso/datapage.h | 91 +++++
include/vdso/helpers.h | 56 +++
include/vdso/vsyscall.h | 11 +
kernel/Makefile | 1 +
kernel/vdso/Makefile | 2 +
kernel/vdso/vsyscall.c | 139 +++++++
lib/Kconfig | 5 +
lib/vdso/Kconfig | 36 ++
lib/vdso/Makefile | 22 ++
lib/vdso/gettimeofday.c | 229 +++++++++++
tools/testing/selftests/vDSO/Makefile | 2 +
tools/testing/selftests/vDSO/vdso_full_test.c | 261 +++++++++++++
78 files changed, 3042 insertions(+), 1767 deletions(-)
create mode 100644 arch/arm/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm/include/asm/vdso/vsyscall.h
create mode 100644 arch/arm/vdso/note.c
create mode 100644 arch/arm64/include/asm/vdso/compat_barrier.h
create mode 100644 arch/arm64/include/asm/vdso/compat_gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/gettimeofday.h
create mode 100644 arch/arm64/include/asm/vdso/vsyscall.h
delete mode 100644 arch/arm64/include/asm/vdso_datapage.h
delete mode 100644 arch/arm64/kernel/vdso/gettimeofday.S
create mode 100644 arch/arm64/kernel/vdso/vgettimeofday.c
create mode 100644 arch/arm64/kernel/vdso32/.gitignore
create mode 100644 arch/arm64/kernel/vdso32/Makefile
create mode 100644 arch/arm64/kernel/vdso32/note.c
create mode 100644 arch/arm64/kernel/vdso32/sigreturn.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.S
create mode 100644 arch/arm64/kernel/vdso32/vdso.lds.S
create mode 100644 arch/arm64/kernel/vdso32/vgettimeofday.c
create mode 100644 arch/mips/include/asm/vdso/gettimeofday.h
rename arch/mips/{ => include/asm}/vdso/vdso.h (90%)
create mode 100644 arch/mips/include/asm/vdso/vsyscall.h
delete mode 100644 arch/mips/vdso/gettimeofday.c
create mode 100644 arch/mips/vdso/vgettimeofday.c
delete mode 100644 arch/x86/entry/vsyscall/vsyscall_gtod.c
create mode 100644 arch/x86/include/asm/mshyperv-tsc.h
create mode 100644 arch/x86/include/asm/vdso/gettimeofday.h
create mode 100644 arch/x86/include/asm/vdso/vsyscall.h
create mode 100644 include/asm-generic/vdso/vsyscall.h
create mode 100644 include/linux/hrtimer_defs.h
create mode 100644 include/vdso/datapage.h
create mode 100644 include/vdso/helpers.h
create mode 100644 include/vdso/vsyscall.h
create mode 100644 kernel/vdso/Makefile
create mode 100644 kernel/vdso/vsyscall.c
create mode 100644 lib/vdso/Kconfig
create mode 100644 lib/vdso/Makefile
create mode 100644 lib/vdso/gettimeofday.c
create mode 100644 tools/testing/selftests/vDSO/vdso_full_test.c
--
2.21.0
This patch series moves Hyper-V clock/timer code to a separate Hyper-V
clocksource driver. Previously, Hyper-V clock/timer code and data
structures were mixed in with other Hyper-V code in the ISA independent
drivers/hv code as well as in ISA dependent code. The new Hyper-V
clocksource driver is ISA independent, with a just few dependencies on
ISA specific functions. The patch series does not change any behavior
or functionality -- it only reorganizes the existing code and fixes up
the linkages. A few places outside of Hyper-V code are fixed up to use
the new #include file structure.
This restructuring is in response to Marc Zyngier's review comments
on supporting Hyper-V running on ARM64, and is a good idea in general.
It increases the amount of code shared between the x86 and ARM64
architectures, and reduces the size of the new code for supporting
Hyper-V on ARM64. A new version of the Hyper-V on ARM64 patches will
follow once this clocksource restructuring is accepted.
The code is diff'ed against the upstream tip tree:
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git timers/vdso
Changes in v4:
* Revised commit messages
* Rebased to upstream tip tree
Changes in v3:
* Removed boolean argument to hv_init_clocksource(). Always call
sched_clock_register, which is needed on ARM64 but a no-op on x86.
* Removed separate cpuhp setup in hv_stimer_alloc() and instead
directly call hv_stimer_init() and hv_stimer_cleanup() from
corresponding VMbus functions. This more closely matches original
code and avoids clocksource stop/restart problems on ARM64 when
VMbus code denies CPU offlining request.
Changes in v2:
* Revised commit short descriptions so the distinction between
the first and second patches is clearer [GregKH]
* Renamed new clocksource driver files and functions to use
existing "timer" and "stimer" names instead of introducing
"syntimer". [Vitaly Kuznetsov]
* Introduced CONFIG_HYPER_TIMER to fix build problem when
CONFIG_HYPERV=m [Vitaly Kuznetsov]
* Added "Suggested-by: Marc Zyngier"
Michael Kelley (2):
Drivers: hv: Create Hyper-V clocksource driver from existing
clockevents code
Drivers: hv: Move Hyper-V clocksource code to new clocksource driver
MAINTAINERS | 2 +
arch/x86/entry/vdso/vma.c | 2 +-
arch/x86/hyperv/hv_init.c | 91 +--------
arch/x86/include/asm/hyperv-tlfs.h | 6 +
arch/x86/include/asm/mshyperv.h | 81 ++------
arch/x86/include/asm/vdso/gettimeofday.h | 2 +-
arch/x86/kernel/cpu/mshyperv.c | 2 +
arch/x86/kvm/x86.c | 1 +
drivers/clocksource/Makefile | 1 +
drivers/clocksource/hyperv_timer.c | 321 +++++++++++++++++++++++++++++++
drivers/hv/Kconfig | 3 +
drivers/hv/hv.c | 156 +--------------
drivers/hv/hyperv_vmbus.h | 3 -
drivers/hv/vmbus_drv.c | 42 ++--
include/clocksource/hyperv_timer.h | 105 ++++++++++
15 files changed, 483 insertions(+), 335 deletions(-)
create mode 100644 drivers/clocksource/hyperv_timer.c
create mode 100644 include/clocksource/hyperv_timer.h
--
1.8.3.1
The blockdev book basically contains user-faced documentation.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
---
.../blockdev/drbd/DRBD-8.3-data-packets.svg | 0
.../blockdev/drbd/DRBD-data-packets.svg | 0
.../blockdev/drbd/conn-states-8.dot | 0
.../blockdev/drbd/data-structure-v9.rst | 0
.../blockdev/drbd/disk-states-8.dot | 0
.../drbd/drbd-connection-state-overview.dot | 0
.../blockdev/drbd/figures.rst | 0
.../{ => admin-guide}/blockdev/drbd/index.rst | 0
.../blockdev/drbd/node-states-8.dot | 1 -
.../{ => admin-guide}/blockdev/floppy.rst | 0
.../{ => admin-guide}/blockdev/index.rst | 2 --
.../{ => admin-guide}/blockdev/nbd.rst | 0
.../{ => admin-guide}/blockdev/paride.rst | 0
.../{ => admin-guide}/blockdev/ramdisk.rst | 0
.../{ => admin-guide}/blockdev/zram.rst | 0
Documentation/admin-guide/index.rst | 1 +
.../admin-guide/kernel-parameters.txt | 18 +++++++++---------
MAINTAINERS | 10 +++++-----
drivers/block/Kconfig | 8 ++++----
drivers/block/floppy.c | 2 +-
drivers/block/zram/Kconfig | 6 +++---
tools/testing/selftests/zram/README | 2 +-
22 files changed, 24 insertions(+), 26 deletions(-)
rename Documentation/{ => admin-guide}/blockdev/drbd/DRBD-8.3-data-packets.svg (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/DRBD-data-packets.svg (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/conn-states-8.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/data-structure-v9.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/disk-states-8.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/drbd-connection-state-overview.dot (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/figures.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/index.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/drbd/node-states-8.dot (99%)
rename Documentation/{ => admin-guide}/blockdev/floppy.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/index.rst (94%)
rename Documentation/{ => admin-guide}/blockdev/nbd.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/paride.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/ramdisk.rst (100%)
rename Documentation/{ => admin-guide}/blockdev/zram.rst (100%)
diff --git a/Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
similarity index 100%
rename from Documentation/blockdev/drbd/DRBD-8.3-data-packets.svg
rename to Documentation/admin-guide/blockdev/drbd/DRBD-8.3-data-packets.svg
diff --git a/Documentation/blockdev/drbd/DRBD-data-packets.svg b/Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
similarity index 100%
rename from Documentation/blockdev/drbd/DRBD-data-packets.svg
rename to Documentation/admin-guide/blockdev/drbd/DRBD-data-packets.svg
diff --git a/Documentation/blockdev/drbd/conn-states-8.dot b/Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
similarity index 100%
rename from Documentation/blockdev/drbd/conn-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/conn-states-8.dot
diff --git a/Documentation/blockdev/drbd/data-structure-v9.rst b/Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
similarity index 100%
rename from Documentation/blockdev/drbd/data-structure-v9.rst
rename to Documentation/admin-guide/blockdev/drbd/data-structure-v9.rst
diff --git a/Documentation/blockdev/drbd/disk-states-8.dot b/Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
similarity index 100%
rename from Documentation/blockdev/drbd/disk-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/disk-states-8.dot
diff --git a/Documentation/blockdev/drbd/drbd-connection-state-overview.dot b/Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
similarity index 100%
rename from Documentation/blockdev/drbd/drbd-connection-state-overview.dot
rename to Documentation/admin-guide/blockdev/drbd/drbd-connection-state-overview.dot
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/admin-guide/blockdev/drbd/figures.rst
similarity index 100%
rename from Documentation/blockdev/drbd/figures.rst
rename to Documentation/admin-guide/blockdev/drbd/figures.rst
diff --git a/Documentation/blockdev/drbd/index.rst b/Documentation/admin-guide/blockdev/drbd/index.rst
similarity index 100%
rename from Documentation/blockdev/drbd/index.rst
rename to Documentation/admin-guide/blockdev/drbd/index.rst
diff --git a/Documentation/blockdev/drbd/node-states-8.dot b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
similarity index 99%
rename from Documentation/blockdev/drbd/node-states-8.dot
rename to Documentation/admin-guide/blockdev/drbd/node-states-8.dot
index 4a2b00c23547..bfa54e1f8016 100644
--- a/Documentation/blockdev/drbd/node-states-8.dot
+++ b/Documentation/admin-guide/blockdev/drbd/node-states-8.dot
@@ -11,4 +11,3 @@ digraph peer_states {
Unknown -> Primary [ label = "connected" ]
Unknown -> Secondary [ label = "connected" ]
}
-
diff --git a/Documentation/blockdev/floppy.rst b/Documentation/admin-guide/blockdev/floppy.rst
similarity index 100%
rename from Documentation/blockdev/floppy.rst
rename to Documentation/admin-guide/blockdev/floppy.rst
diff --git a/Documentation/blockdev/index.rst b/Documentation/admin-guide/blockdev/index.rst
similarity index 94%
rename from Documentation/blockdev/index.rst
rename to Documentation/admin-guide/blockdev/index.rst
index a9af6ed8b4aa..20a738d9d047 100644
--- a/Documentation/blockdev/index.rst
+++ b/Documentation/admin-guide/blockdev/index.rst
@@ -1,5 +1,3 @@
-:orphan:
-
===========================
The Linux RapidIO Subsystem
===========================
diff --git a/Documentation/blockdev/nbd.rst b/Documentation/admin-guide/blockdev/nbd.rst
similarity index 100%
rename from Documentation/blockdev/nbd.rst
rename to Documentation/admin-guide/blockdev/nbd.rst
diff --git a/Documentation/blockdev/paride.rst b/Documentation/admin-guide/blockdev/paride.rst
similarity index 100%
rename from Documentation/blockdev/paride.rst
rename to Documentation/admin-guide/blockdev/paride.rst
diff --git a/Documentation/blockdev/ramdisk.rst b/Documentation/admin-guide/blockdev/ramdisk.rst
similarity index 100%
rename from Documentation/blockdev/ramdisk.rst
rename to Documentation/admin-guide/blockdev/ramdisk.rst
diff --git a/Documentation/blockdev/zram.rst b/Documentation/admin-guide/blockdev/zram.rst
similarity index 100%
rename from Documentation/blockdev/zram.rst
rename to Documentation/admin-guide/blockdev/zram.rst
diff --git a/Documentation/admin-guide/index.rst b/Documentation/admin-guide/index.rst
index 65e821a03aca..c073af461cdf 100644
--- a/Documentation/admin-guide/index.rst
+++ b/Documentation/admin-guide/index.rst
@@ -73,6 +73,7 @@ configure specific aspects of kernel behavior to your liking.
java
ras
bcache
+ blockdev/index
ext4
pm/index
thunderbolt
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 9b535c0e22f3..49ad034c4675 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
See also Documentation/fault-injection/.
floppy= [HW]
- See Documentation/blockdev/floppy.rst.
+ See Documentation/admin-guide/blockdev/floppy.rst.
force_pal_cache_flush
[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2247,7 +2247,7 @@
memblock=debug [KNL] Enable memblock debug messages.
load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -3294,7 +3294,7 @@
pcd. [PARIDE]
See header of drivers/block/paride/pcd.c.
- See also Documentation/blockdev/paride.rst.
+ See also Documentation/admin-guide/blockdev/paride.rst.
pci=option[,option...] [PCI] various PCI subsystem options.
@@ -3538,7 +3538,7 @@
needed on a platform with proper driver support.
pd. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
@@ -3553,10 +3553,10 @@
and performance comparison.
pf. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pg. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pirq= [SMP,APIC] Manual mp-table setup
See Documentation/x86/i386/IO-APIC.rst.
@@ -3668,7 +3668,7 @@
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
psi= [KNL] Enable or disable pressure stall information
tracking.
@@ -3690,7 +3690,7 @@
pstore.backend= Specify the name of the pstore backend to use
pt. [PARIDE]
- See Documentation/blockdev/paride.rst.
+ See Documentation/admin-guide/blockdev/paride.rst.
pti= [X86_64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
@@ -3719,7 +3719,7 @@
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
- See Documentation/blockdev/ramdisk.rst.
+ See Documentation/admin-guide/blockdev/ramdisk.rst.
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
diff --git a/MAINTAINERS b/MAINTAINERS
index 4c622a19ab7d..3f0f654d1166 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4974,7 +4974,7 @@ T: git git://git.linbit.com/drbd-8.4.git
S: Supported
F: drivers/block/drbd/
F: lib/lru_cache.c
-F: Documentation/blockdev/drbd/
+F: Documentation/admin-guide/blockdev/
DRIVER CORE, KOBJECTS, DEBUGFS AND SYSFS
M: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
@@ -11024,7 +11024,7 @@ M: Josef Bacik <josef(a)toxicpanda.com>
S: Maintained
L: linux-block(a)vger.kernel.org
L: nbd(a)other.debian.org
-F: Documentation/blockdev/nbd.rst
+F: Documentation/admin-guide/blockdev/nbd.rst
F: drivers/block/nbd.c
F: include/trace/events/nbd.h
F: include/uapi/linux/nbd.h
@@ -12028,7 +12028,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
M: Tim Waugh <tim(a)cyberelk.net>
L: linux-parport(a)lists.infradead.org (subscribers-only)
S: Maintained
-F: Documentation/blockdev/paride.rst
+F: Documentation/admin-guide/blockdev/paride.rst
F: drivers/block/paride/
PARISC ARCHITECTURE
@@ -13310,7 +13310,7 @@ F: drivers/net/wireless/ralink/rt2x00/
RAMDISK RAM BLOCK DEVICE DRIVER
M: Jens Axboe <axboe(a)kernel.dk>
S: Maintained
-F: Documentation/blockdev/ramdisk.rst
+F: Documentation/admin-guide/blockdev/ramdisk.rst
F: drivers/block/brd.c
RANCHU VIRTUAL BOARD FOR MIPS
@@ -17672,7 +17672,7 @@ R: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
L: linux-kernel(a)vger.kernel.org
S: Maintained
F: drivers/block/zram/
-F: Documentation/blockdev/zram.rst
+F: Documentation/admin-guide/blockdev/zram.rst
ZS DECSTATION Z85C30 SERIAL DRIVER
M: "Maciej W. Rozycki" <macro(a)linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index c43690b973d8..1bb8ec575352 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
If you want to use the floppy disk drive(s) of your PC under Linux,
say Y. Information about this driver, especially important for IBM
Thinkpad users, is contained in
- <file:Documentation/blockdev/floppy.rst>.
+ <file:Documentation/admin-guide/blockdev/floppy.rst>.
That file also contains the location of the Floppy driver FAQ as
well as location of the fdutils package used to configure additional
parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
your computer's parallel port. Most of them are actually IDE devices
using a parallel port IDE adapter. This option enables the PARIDE
subsystem which contains drivers for many of these external drives.
- Read <file:Documentation/blockdev/paride.rst> for more information.
+ Read <file:Documentation/admin-guide/blockdev/paride.rst> for more information.
If you have said Y to the "Parallel-port support" configuration
option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
userland (making server and client physically the same computer,
communicating using the loopback network device).
- Read <file:Documentation/blockdev/nbd.rst> for more information,
+ Read <file:Documentation/admin-guide/blockdev/nbd.rst> for more information,
especially about where to find the server code, which runs in user
space and does not need special kernel support.
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
during the initial install of Linux.
Note that the kernel command line option "ramdisk=XX" is now obsolete.
- For details, read <file:Documentation/blockdev/ramdisk.rst>.
+ For details, read <file:Documentation/admin-guide/blockdev/ramdisk.rst>.
To compile this driver as a module, choose M here: the
module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index 5c99e52f9dc1..f652c1ac3ae9 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
pr_cont("\n");
} else
DPRINT("botched floppy option\n");
- DPRINT("Read Documentation/blockdev/floppy.rst\n");
+ DPRINT("Read Documentation/admin-guide/blockdev/floppy.rst\n");
return 0;
}
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index e06b99d54816..fe7a4b7d30cf 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
It has several use cases, for example: /tmp storage, use as swap
disks and maybe many more.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
config ZRAM_WRITEBACK
bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
With /sys/block/zramX/{idle,writeback}, application could ask
idle page's writeback to the backing device to save in memory.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
config ZRAM_MEMORY_TRACKING
bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
of zRAM. Admin could see the information via
/sys/kernel/debug/zram/zramX/block_state.
- See Documentation/blockdev/zram.rst for more information.
+ See Documentation/admin-guide/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 5fa378391d3b..110b34834a6f 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
- mkfs/ mkfs.ext4
For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.rst
+kernel-source-tree/Documentation/admin-guide/blockdev/zram.rst
--
2.21.0
Rename the blockdev documentation files to ReST, add an
index for them and adjust in order to produce a nice html
output via the Sphinx build system.
The drbd sub-directory contains some graphs and data flows.
Add those too to the documentation.
At its new index.rst, let's add a :orphan: while this is not linked to
the main index.rst file, in order to avoid build warnings.
Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung(a)kernel.org>
---
.../admin-guide/kernel-parameters.txt | 18 +-
...structure-v9.txt => data-structure-v9.rst} | 6 +-
Documentation/blockdev/drbd/figures.rst | 28 +++
.../blockdev/drbd/{README.txt => index.rst} | 15 +-
.../blockdev/{floppy.txt => floppy.rst} | 88 ++++----
Documentation/blockdev/index.rst | 16 ++
Documentation/blockdev/{nbd.txt => nbd.rst} | 2 +-
.../blockdev/{paride.txt => paride.rst} | 194 +++++++++--------
.../blockdev/{ramdisk.txt => ramdisk.rst} | 55 ++---
Documentation/blockdev/{zram.txt => zram.rst} | 195 ++++++++++++------
MAINTAINERS | 8 +-
drivers/block/Kconfig | 8 +-
drivers/block/floppy.c | 2 +-
drivers/block/zram/Kconfig | 6 +-
tools/testing/selftests/zram/README | 2 +-
15 files changed, 398 insertions(+), 245 deletions(-)
rename Documentation/blockdev/drbd/{data-structure-v9.txt => data-structure-v9.rst} (94%)
create mode 100644 Documentation/blockdev/drbd/figures.rst
rename Documentation/blockdev/drbd/{README.txt => index.rst} (55%)
rename Documentation/blockdev/{floppy.txt => floppy.rst} (81%)
create mode 100644 Documentation/blockdev/index.rst
rename Documentation/blockdev/{nbd.txt => nbd.rst} (96%)
rename Documentation/blockdev/{paride.txt => paride.rst} (81%)
rename Documentation/blockdev/{ramdisk.txt => ramdisk.rst} (84%)
rename Documentation/blockdev/{zram.txt => zram.rst} (76%)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 92d335837c52..7ed94527a4a0 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,7 +1249,7 @@
See also Documentation/fault-injection/.
floppy= [HW]
- See Documentation/blockdev/floppy.txt.
+ See Documentation/blockdev/floppy.rst.
force_pal_cache_flush
[IA-64] Avoid check_sal_cache_flush which may hang on
@@ -2247,7 +2247,7 @@
memblock=debug [KNL] Enable memblock debug messages.
load_ramdisk= [RAM] List of ramdisks to load from floppy
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
lockd.nlm_grace_period=P [NFS] Assign grace period.
Format: <integer>
@@ -3294,7 +3294,7 @@
pcd. [PARIDE]
See header of drivers/block/paride/pcd.c.
- See also Documentation/blockdev/paride.txt.
+ See also Documentation/blockdev/paride.rst.
pci=option[,option...] [PCI] various PCI subsystem options.
@@ -3538,7 +3538,7 @@
needed on a platform with proper driver support.
pd. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pdcchassis= [PARISC,HW] Disable/Enable PDC Chassis Status codes at
boot time.
@@ -3553,10 +3553,10 @@
and performance comparison.
pf. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pg. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pirq= [SMP,APIC] Manual mp-table setup
See Documentation/x86/i386/IO-APIC.rst.
@@ -3668,7 +3668,7 @@
prompt_ramdisk= [RAM] List of RAM disks to prompt for floppy disk
before loading.
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
psi= [KNL] Enable or disable pressure stall information
tracking.
@@ -3690,7 +3690,7 @@
pstore.backend= Specify the name of the pstore backend to use
pt. [PARIDE]
- See Documentation/blockdev/paride.txt.
+ See Documentation/blockdev/paride.rst.
pti= [X86_64] Control Page Table Isolation of user and
kernel address spaces. Disabling this feature
@@ -3719,7 +3719,7 @@
See Documentation/admin-guide/md.rst.
ramdisk_size= [RAM] Sizes of RAM disks in kilobytes
- See Documentation/blockdev/ramdisk.txt.
+ See Documentation/blockdev/ramdisk.rst.
random.trust_cpu={on,off}
[KNL] Enable or disable trusting the use of the
diff --git a/Documentation/blockdev/drbd/data-structure-v9.txt b/Documentation/blockdev/drbd/data-structure-v9.rst
similarity index 94%
rename from Documentation/blockdev/drbd/data-structure-v9.txt
rename to Documentation/blockdev/drbd/data-structure-v9.rst
index 1e52a0e32624..66036b901644 100644
--- a/Documentation/blockdev/drbd/data-structure-v9.txt
+++ b/Documentation/blockdev/drbd/data-structure-v9.rst
@@ -1,3 +1,7 @@
+================================
+kernel data structure for DRBD-9
+================================
+
This describes the in kernel data structure for DRBD-9. Starting with
Linux v3.14 we are reorganizing DRBD to use this data structure.
@@ -10,7 +14,7 @@ device is represented by a block device locally.
The DRBD objects are interconnected to form a matrix as depicted below; a
drbd_peer_device object sits at each intersection between a drbd_device and a
-drbd_connection:
+drbd_connection::
/--------------+---------------+.....+---------------\
| resource | device | | device |
diff --git a/Documentation/blockdev/drbd/figures.rst b/Documentation/blockdev/drbd/figures.rst
new file mode 100644
index 000000000000..3e3fd4b8a478
--- /dev/null
+++ b/Documentation/blockdev/drbd/figures.rst
@@ -0,0 +1,28 @@
+.. The here included files are intended to help understand the implementation
+
+Data flows that Relate some functions, and write packets
+========================================================
+
+.. kernel-figure:: DRBD-8.3-data-packets.svg
+ :alt: DRBD-8.3-data-packets.svg
+ :align: center
+
+.. kernel-figure:: DRBD-data-packets.svg
+ :alt: DRBD-data-packets.svg
+ :align: center
+
+
+Sub graphs of DRBD's state transitions
+======================================
+
+.. kernel-figure:: conn-states-8.dot
+ :alt: conn-states-8.dot
+ :align: center
+
+.. kernel-figure:: disk-states-8.dot
+ :alt: disk-states-8.dot
+ :align: center
+
+.. kernel-figure:: node-states-8.dot
+ :alt: node-states-8.dot
+ :align: center
diff --git a/Documentation/blockdev/drbd/README.txt b/Documentation/blockdev/drbd/index.rst
similarity index 55%
rename from Documentation/blockdev/drbd/README.txt
rename to Documentation/blockdev/drbd/index.rst
index 627b0a1bf35e..68ecd5c113e9 100644
--- a/Documentation/blockdev/drbd/README.txt
+++ b/Documentation/blockdev/drbd/index.rst
@@ -1,4 +1,9 @@
+==========================================
+Distributed Replicated Block Device - DRBD
+==========================================
+
Description
+===========
DRBD is a shared-nothing, synchronously replicated block device. It
is designed to serve as a building block for high availability
@@ -7,10 +12,8 @@ Description
Please visit http://www.drbd.org to find out more.
-The here included files are intended to help understand the implementation
+.. toctree::
+ :maxdepth: 1
-DRBD-8.3-data-packets.svg, DRBD-data-packets.svg
- relates some functions, and write packets.
-
-conn-states-8.dot, disk-states-8.dot, node-states-8.dot
- The sub graphs of DRBD's state transitions
+ data-structure-v9
+ figures
diff --git a/Documentation/blockdev/floppy.txt b/Documentation/blockdev/floppy.rst
similarity index 81%
rename from Documentation/blockdev/floppy.txt
rename to Documentation/blockdev/floppy.rst
index e2240f5ab64d..4a8f31cf4139 100644
--- a/Documentation/blockdev/floppy.txt
+++ b/Documentation/blockdev/floppy.rst
@@ -1,35 +1,37 @@
-This file describes the floppy driver.
+=============
+Floppy Driver
+=============
FAQ list:
=========
- A FAQ list may be found in the fdutils package (see below), and also
+A FAQ list may be found in the fdutils package (see below), and also
at <http://fdutils.linux.lu/faq.html>.
LILO configuration options (Thinkpad users, read this)
======================================================
- The floppy driver is configured using the 'floppy=' option in
+The floppy driver is configured using the 'floppy=' option in
lilo. This option can be typed at the boot prompt, or entered in the
lilo configuration file.
- Example: If your kernel is called linux-2.6.9, type the following line
-at the lilo boot prompt (if you have a thinkpad):
+Example: If your kernel is called linux-2.6.9, type the following line
+at the lilo boot prompt (if you have a thinkpad)::
linux-2.6.9 floppy=thinkpad
You may also enter the following line in /etc/lilo.conf, in the description
-of linux-2.6.9:
+of linux-2.6.9::
append = "floppy=thinkpad"
- Several floppy related options may be given, example:
+Several floppy related options may be given, example::
linux-2.6.9 floppy=daring floppy=two_fdc
append = "floppy=daring floppy=two_fdc"
- If you give options both in the lilo config file and on the boot
+If you give options both in the lilo config file and on the boot
prompt, the option strings of both places are concatenated, the boot
prompt options coming last. That's why there are also options to
restore the default behavior.
@@ -38,21 +40,23 @@ restore the default behavior.
Module configuration options
============================
- If you use the floppy driver as a module, use the following syntax:
-modprobe floppy floppy="<options>"
+If you use the floppy driver as a module, use the following syntax::
-Example:
- modprobe floppy floppy="omnibook messages"
+ modprobe floppy floppy="<options>"
- If you need certain options enabled every time you load the floppy driver,
-you can put:
+Example::
- options floppy floppy="omnibook messages"
+ modprobe floppy floppy="omnibook messages"
+
+If you need certain options enabled every time you load the floppy driver,
+you can put::
+
+ options floppy floppy="omnibook messages"
in a configuration file in /etc/modprobe.d/.
- The floppy driver related options are:
+The floppy driver related options are:
floppy=asus_pci
Sets the bit mask to allow only units 0 and 1. (default)
@@ -70,8 +74,7 @@ in a configuration file in /etc/modprobe.d/.
Tells the floppy driver that you have only one floppy controller.
(default)
- floppy=two_fdc
- floppy=<address>,two_fdc
+ floppy=two_fdc / floppy=<address>,two_fdc
Tells the floppy driver that you have two floppy controllers.
The second floppy controller is assumed to be at <address>.
This option is not needed if the second controller is at address
@@ -84,8 +87,7 @@ in a configuration file in /etc/modprobe.d/.
floppy=0,thinkpad
Tells the floppy driver that you don't have a Thinkpad.
- floppy=omnibook
- floppy=nodma
+ floppy=omnibook / floppy=nodma
Tells the floppy driver not to use Dma for data transfers.
This is needed on HP Omnibooks, which don't have a workable
DMA channel for the floppy driver. This option is also useful
@@ -144,14 +146,16 @@ in a configuration file in /etc/modprobe.d/.
described in the physical CMOS), or if your BIOS uses
non-standard CMOS types. The CMOS types are:
- 0 - Use the value of the physical CMOS
- 1 - 5 1/4 DD
- 2 - 5 1/4 HD
- 3 - 3 1/2 DD
- 4 - 3 1/2 HD
- 5 - 3 1/2 ED
- 6 - 3 1/2 ED
- 16 - unknown or not installed
+ == ==================================
+ 0 Use the value of the physical CMOS
+ 1 5 1/4 DD
+ 2 5 1/4 HD
+ 3 3 1/2 DD
+ 4 3 1/2 HD
+ 5 3 1/2 ED
+ 6 3 1/2 ED
+ 16 unknown or not installed
+ == ==================================
(Note: there are two valid types for ED drives. This is because 5 was
initially chosen to represent floppy *tapes*, and 6 for ED drives.
@@ -162,8 +166,7 @@ in a configuration file in /etc/modprobe.d/.
Print a warning message when an unexpected interrupt is received.
(default)
- floppy=no_unexpected_interrupts
- floppy=L40SX
+ floppy=no_unexpected_interrupts / floppy=L40SX
Don't print a message when an unexpected interrupt is received. This
is needed on IBM L40SX laptops in certain video modes. (There seems
to be an interaction between video and floppy. The unexpected
@@ -199,47 +202,54 @@ in a configuration file in /etc/modprobe.d/.
Sets the floppy DMA channel to <nr> instead of 2.
floppy=slow
- Use PS/2 stepping rate:
- " PS/2 floppies have much slower step rates than regular floppies.
+ Use PS/2 stepping rate::
+
+ PS/2 floppies have much slower step rates than regular floppies.
It's been recommended that take about 1/4 of the default speed
- in some more extreme cases."
+ in some more extreme cases.
Supporting utilities and additional documentation:
==================================================
- Additional parameters of the floppy driver can be configured at
+Additional parameters of the floppy driver can be configured at
runtime. Utilities which do this can be found in the fdutils package.
This package also contains a new version of mtools which allows to
access high capacity disks (up to 1992K on a high density 3 1/2 disk!).
It also contains additional documentation about the floppy driver.
The latest version can be found at fdutils homepage:
+
http://fdutils.linux.lu
The fdutils releases can be found at:
+
http://fdutils.linux.lu/download.html
+
http://www.tux.org/pub/knaff/fdutils/
+
ftp://metalab.unc.edu/pub/Linux/utils/disk-management/
Reporting problems about the floppy driver
==========================================
- If you have a question or a bug report about the floppy driver, mail
+If you have a question or a bug report about the floppy driver, mail
me at Alain.Knaff(a)poboxes.com . If you post to Usenet, preferably use
comp.os.linux.hardware. As the volume in these groups is rather high,
be sure to include the word "floppy" (or "FLOPPY") in the subject
line. If the reported problem happens when mounting floppy disks, be
sure to mention also the type of the filesystem in the subject line.
- Be sure to read the FAQ before mailing/posting any bug reports!
+Be sure to read the FAQ before mailing/posting any bug reports!
- Alain
+Alain
Changelog
=========
-10-30-2004 : Cleanup, updating, add reference to module configuration.
+10-30-2004 :
+ Cleanup, updating, add reference to module configuration.
James Nelson <james4765(a)gmail.com>
-6-3-2000 : Original Document
+6-3-2000 :
+ Original Document
diff --git a/Documentation/blockdev/index.rst b/Documentation/blockdev/index.rst
new file mode 100644
index 000000000000..a9af6ed8b4aa
--- /dev/null
+++ b/Documentation/blockdev/index.rst
@@ -0,0 +1,16 @@
+:orphan:
+
+===========================
+The Linux RapidIO Subsystem
+===========================
+
+.. toctree::
+ :maxdepth: 1
+
+ floppy
+ nbd
+ paride
+ ramdisk
+ zram
+
+ drbd/index
diff --git a/Documentation/blockdev/nbd.txt b/Documentation/blockdev/nbd.rst
similarity index 96%
rename from Documentation/blockdev/nbd.txt
rename to Documentation/blockdev/nbd.rst
index db242ea2bce8..d78dfe559dcf 100644
--- a/Documentation/blockdev/nbd.txt
+++ b/Documentation/blockdev/nbd.rst
@@ -1,3 +1,4 @@
+==================================
Network Block Device (TCP version)
==================================
@@ -28,4 +29,3 @@ max_part
nbds_max
Number of block devices that should be initialized (default: 16).
-
diff --git a/Documentation/blockdev/paride.txt b/Documentation/blockdev/paride.rst
similarity index 81%
rename from Documentation/blockdev/paride.txt
rename to Documentation/blockdev/paride.rst
index ee6717e3771d..87b4278bf314 100644
--- a/Documentation/blockdev/paride.txt
+++ b/Documentation/blockdev/paride.rst
@@ -1,15 +1,17 @@
-
- Linux and parallel port IDE devices
+===================================
+Linux and parallel port IDE devices
+===================================
PARIDE v1.03 (c) 1997-8 Grant Guenther <grant(a)torque.net>
1. Introduction
+===============
Owing to the simplicity and near universality of the parallel port interface
to personal computers, many external devices such as portable hard-disk,
CD-ROM, LS-120 and tape drives use the parallel port to connect to their
host computer. While some devices (notably scanners) use ad-hoc methods
-to pass commands and data through the parallel port interface, most
+to pass commands and data through the parallel port interface, most
external devices are actually identical to an internal model, but with
a parallel-port adapter chip added in. Some of the original parallel port
adapters were little more than mechanisms for multiplexing a SCSI bus.
@@ -28,47 +30,50 @@ were to open up a parallel port CD-ROM drive, for instance, one would
find a standard ATAPI CD-ROM drive, a power supply, and a single adapter
that interconnected a standard PC parallel port cable and a standard
IDE cable. It is usually possible to exchange the CD-ROM device with
-any other device using the IDE interface.
+any other device using the IDE interface.
The document describes the support in Linux for parallel port IDE
devices. It does not cover parallel port SCSI devices, "ditto" tape
-drives or scanners. Many different devices are supported by the
+drives or scanners. Many different devices are supported by the
parallel port IDE subsystem, including:
- MicroSolutions backpack CD-ROM
- MicroSolutions backpack PD/CD
- MicroSolutions backpack hard-drives
- MicroSolutions backpack 8000t tape drive
- SyQuest EZ-135, EZ-230 & SparQ drives
- Avatar Shark
- Imation Superdisk LS-120
- Maxell Superdisk LS-120
- FreeCom Power CD
- Hewlett-Packard 5GB and 8GB tape drives
- Hewlett-Packard 7100 and 7200 CD-RW drives
+ - MicroSolutions backpack CD-ROM
+ - MicroSolutions backpack PD/CD
+ - MicroSolutions backpack hard-drives
+ - MicroSolutions backpack 8000t tape drive
+ - SyQuest EZ-135, EZ-230 & SparQ drives
+ - Avatar Shark
+ - Imation Superdisk LS-120
+ - Maxell Superdisk LS-120
+ - FreeCom Power CD
+ - Hewlett-Packard 5GB and 8GB tape drives
+ - Hewlett-Packard 7100 and 7200 CD-RW drives
as well as most of the clone and no-name products on the market.
To support such a wide range of devices, PARIDE, the parallel port IDE
subsystem, is actually structured in three parts. There is a base
paride module which provides a registry and some common methods for
-accessing the parallel ports. The second component is a set of
-high-level drivers for each of the different types of supported devices:
+accessing the parallel ports. The second component is a set of
+high-level drivers for each of the different types of supported devices:
+ === =============
pd IDE disk
pcd ATAPI CD-ROM
pf ATAPI disk
pt ATAPI tape
pg ATAPI generic
+ === =============
(Currently, the pg driver is only used with CD-R drives).
The high-level drivers function according to the relevant standards.
The third component of PARIDE is a set of low-level protocol drivers
for each of the parallel port IDE adapter chips. Thanks to the interest
-and encouragement of Linux users from many parts of the world,
+and encouragement of Linux users from many parts of the world,
support is available for almost all known adapter protocols:
+ ==== ====================================== ====
aten ATEN EH-100 (HK)
bpck Microsolutions backpack (US)
comm DataStor (old-type) "commuter" adapter (TW)
@@ -83,9 +88,11 @@ support is available for almost all known adapter protocols:
ktti KT Technology PHd adapter (SG)
on20 OnSpec 90c20 (US)
on26 OnSpec 90c26 (US)
+ ==== ====================================== ====
2. Using the PARIDE subsystem
+=============================
While configuring the Linux kernel, you may choose either to build
the PARIDE drivers into your kernel, or to build them as modules.
@@ -94,10 +101,10 @@ In either case, you will need to select "Parallel port IDE device support"
as well as at least one of the high-level drivers and at least one
of the parallel port communication protocols. If you do not know
what kind of parallel port adapter is used in your drive, you could
-begin by checking the file names and any text files on your DOS
+begin by checking the file names and any text files on your DOS
installation floppy. Alternatively, you can look at the markings on
the adapter chip itself. That's usually sufficient to identify the
-correct device.
+correct device.
You can actually select all the protocol modules, and allow the PARIDE
subsystem to try them all for you.
@@ -105,8 +112,9 @@ subsystem to try them all for you.
For the "brand-name" products listed above, here are the protocol
and high-level drivers that you would use:
+ ================ ============ ====== ========
Manufacturer Model Driver Protocol
-
+ ================ ============ ====== ========
MicroSolutions CD-ROM pcd bpck
MicroSolutions PD drive pf bpck
MicroSolutions hard-drive pd bpck
@@ -119,8 +127,10 @@ and high-level drivers that you would use:
Hewlett-Packard 5GB Tape pt epat
Hewlett-Packard 7200e (CD) pcd epat
Hewlett-Packard 7200e (CD-R) pg epat
+ ================ ============ ====== ========
2.1 Configuring built-in drivers
+---------------------------------
We recommend that you get to know how the drivers work and how to
configure them as loadable modules, before attempting to compile a
@@ -143,7 +153,7 @@ protocol identification number and, for some devices, the drive's
chain ID. While your system is booting, a number of messages are
displayed on the console. Like all such messages, they can be
reviewed with the 'dmesg' command. Among those messages will be
-some lines like:
+some lines like::
paride: bpck registered as protocol 0
paride: epat registered as protocol 1
@@ -158,10 +168,10 @@ the last two digits of the drive's serial number (but read MicroSolutions'
documentation about this).
As an example, let's assume that you have a MicroSolutions PD/CD drive
-with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
-EZ-135 connected to the chained port on the PD/CD drive and also an
-Imation Superdisk connected to port 0x278. You could give the following
-options on your boot command:
+with unit ID number 36 connected to the parallel port at 0x378, a SyQuest
+EZ-135 connected to the chained port on the PD/CD drive and also an
+Imation Superdisk connected to port 0x278. You could give the following
+options on your boot command::
pd.drive0=0x378,1 pf.drive0=0x278,1 pf.drive1=0x378,0,36
@@ -169,24 +179,27 @@ In the last option, pf.drive1 configures device /dev/pf1, the 0x378
is the parallel port base address, the 0 is the protocol registration
number and 36 is the chain ID.
-Please note: while PARIDE will work both with and without the
+Please note: while PARIDE will work both with and without the
PARPORT parallel port sharing system that is included by the
"Parallel port support" option, PARPORT must be included and enabled
if you want to use chains of devices on the same parallel port.
2.2 Loading and configuring PARIDE as modules
+----------------------------------------------
It is much faster and simpler to get to understand the PARIDE drivers
-if you use them as loadable kernel modules.
+if you use them as loadable kernel modules.
-Note 1: using these drivers with the "kerneld" automatic module loading
-system is not recommended for beginners, and is not documented here.
+Note 1:
+ using these drivers with the "kerneld" automatic module loading
+ system is not recommended for beginners, and is not documented here.
-Note 2: if you build PARPORT support as a loadable module, PARIDE must
-also be built as loadable modules, and PARPORT must be loaded before the
-PARIDE modules.
+Note 2:
+ if you build PARPORT support as a loadable module, PARIDE must
+ also be built as loadable modules, and PARPORT must be loaded before
+ the PARIDE modules.
-To use PARIDE, you must begin by
+To use PARIDE, you must begin by::
insmod paride
@@ -195,8 +208,8 @@ among other tasks.
Then, load as many of the protocol modules as you think you might need.
As you load each module, it will register the protocols that it supports,
-and print a log message to your kernel log file and your console. For
-example:
+and print a log message to your kernel log file and your console. For
+example::
# insmod epat
paride: epat registered as protocol 0
@@ -205,22 +218,22 @@ example:
paride: k971 registered as protocol 2
Finally, you can load high-level drivers for each kind of device that
-you have connected. By default, each driver will autoprobe for a single
+you have connected. By default, each driver will autoprobe for a single
device, but you can support up to four similar devices by giving their
individual co-ordinates when you load the driver.
For example, if you had two no-name CD-ROM drives both using the
KingByte KBIC-951A adapter, one on port 0x378 and the other on 0x3bc
-you could give the following command:
+you could give the following command::
# insmod pcd drive0=0x378,1 drive1=0x3bc,1
For most adapters, giving a port address and protocol number is sufficient,
-but check the source files in linux/drivers/block/paride for more
+but check the source files in linux/drivers/block/paride for more
information. (Hopefully someone will write some man pages one day !).
As another example, here's what happens when PARPORT is installed, and
-a SyQuest EZ-135 is attached to port 0x378:
+a SyQuest EZ-135 is attached to port 0x378::
# insmod paride
paride: version 1.0 installed
@@ -237,46 +250,47 @@ Note that the last line is the output from the generic partition table
scanner - in this case it reports that it has found a disk with one partition.
2.3 Using a PARIDE device
+--------------------------
Once the drivers have been loaded, you can access PARIDE devices in the
same way as their traditional counterparts. You will probably need to
create the device "special files". Here is a simple script that you can
-cut to a file and execute:
+cut to a file and execute::
-#!/bin/bash
-#
-# mkd -- a script to create the device special files for the PARIDE subsystem
-#
-function mkdev {
- mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
-}
-#
-function pd {
- D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
- mkdev pd$D b 45 $[ $1 * 16 ]
- for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
- do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
- done
-}
-#
-cd /dev
-#
-for u in 0 1 2 3 ; do pd $u ; done
-for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
-for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
-for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
-for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
-for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
-#
-# end of mkd
+ #!/bin/bash
+ #
+ # mkd -- a script to create the device special files for the PARIDE subsystem
+ #
+ function mkdev {
+ mknod $1 $2 $3 $4 ; chmod 0660 $1 ; chown root:disk $1
+ }
+ #
+ function pd {
+ D=$( printf \\$( printf "x%03x" $[ $1 + 97 ] ) )
+ mkdev pd$D b 45 $[ $1 * 16 ]
+ for P in 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
+ do mkdev pd$D$P b 45 $[ $1 * 16 + $P ]
+ done
+ }
+ #
+ cd /dev
+ #
+ for u in 0 1 2 3 ; do pd $u ; done
+ for u in 0 1 2 3 ; do mkdev pcd$u b 46 $u ; done
+ for u in 0 1 2 3 ; do mkdev pf$u b 47 $u ; done
+ for u in 0 1 2 3 ; do mkdev pt$u c 96 $u ; done
+ for u in 0 1 2 3 ; do mkdev npt$u c 96 $[ $u + 128 ] ; done
+ for u in 0 1 2 3 ; do mkdev pg$u c 97 $u ; done
+ #
+ # end of mkd
With the device files and drivers in place, you can access PARIDE devices
-like any other Linux device. For example, to mount a CD-ROM in pcd0, use:
+like any other Linux device. For example, to mount a CD-ROM in pcd0, use::
mount /dev/pcd0 /cdrom
If you have a fresh Avatar Shark cartridge, and the drive is pda, you
-might do something like:
+might do something like::
fdisk /dev/pda -- make a new partition table with
partition 1 of type 83
@@ -289,41 +303,46 @@ might do something like:
Devices like the Imation superdisk work in the same way, except that
they do not have a partition table. For example to make a 120MB
-floppy that you could share with a DOS system:
+floppy that you could share with a DOS system::
mkdosfs /dev/pf0
mount /dev/pf0 /mnt
2.4 The pf driver
+------------------
The pf driver is intended for use with parallel port ATAPI disk
devices. The most common devices in this category are PD drives
and LS-120 drives. Traditionally, media for these devices are not
partitioned. Consequently, the pf driver does not support partitioned
-media. This may be changed in a future version of the driver.
+media. This may be changed in a future version of the driver.
2.5 Using the pt driver
+------------------------
The pt driver for parallel port ATAPI tape drives is a minimal driver.
-It does not yet support many of the standard tape ioctl operations.
+It does not yet support many of the standard tape ioctl operations.
For best performance, a block size of 32KB should be used. You will
probably want to set the parallel port delay to 0, if you can.
2.6 Using the pg driver
+------------------------
The pg driver can be used in conjunction with the cdrecord program
to create CD-ROMs. Please get cdrecord version 1.6.1 or later
-from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
-your parallel port should ideally be set to EPP mode, and the "port delay"
-should be set to 0. With those settings it is possible to record at 2x
+from ftp://ftp.fokus.gmd.de/pub/unix/cdrecord/ . To record CD-R media
+your parallel port should ideally be set to EPP mode, and the "port delay"
+should be set to 0. With those settings it is possible to record at 2x
speed without any buffer underruns. If you cannot get the driver to work
in EPP mode, try to use "bidirectional" or "PS/2" mode and 1x speeds only.
3. Troubleshooting
+==================
3.1 Use EPP mode if you can
+----------------------------
The most common problems that people report with the PARIDE drivers
concern the parallel port CMOS settings. At this time, none of the
@@ -332,6 +351,7 @@ If you are able to do so, please set your parallel port into EPP mode
using your CMOS setup procedure.
3.2 Check the port delay
+-------------------------
Some parallel ports cannot reliably transfer data at full speed. To
offset the errors, the PARIDE protocol modules introduce a "port
@@ -347,23 +367,25 @@ read the comments at the beginning of the driver source files in
linux/drivers/block/paride.
3.3 Some drives need a printer reset
+-------------------------------------
There appear to be a number of "noname" external drives on the market
that do not always power up correctly. We have noticed this with some
drives based on OnSpec and older Freecom adapters. In these rare cases,
the adapter can often be reinitialised by issuing a "printer reset" on
-the parallel port. As the reset operation is potentially disruptive in
-multiple device environments, the PARIDE drivers will not do it
-automatically. You can however, force a printer reset by doing:
+the parallel port. As the reset operation is potentially disruptive in
+multiple device environments, the PARIDE drivers will not do it
+automatically. You can however, force a printer reset by doing::
insmod lp reset=1
rmmod lp
If you have one of these marginal cases, you should probably build
your paride drivers as modules, and arrange to do the printer reset
-before loading the PARIDE drivers.
+before loading the PARIDE drivers.
3.4 Use the verbose option and dmesg if you need help
+------------------------------------------------------
While a lot of testing has gone into these drivers to make them work
as smoothly as possible, problems will arise. If you do have problems,
@@ -373,7 +395,7 @@ clues, then please make sure that only one drive is hooked to your system,
and that either (a) PARPORT is enabled or (b) no other device driver
is using your parallel port (check in /proc/ioports). Then, load the
appropriate drivers (you can load several protocol modules if you want)
-as in:
+as in::
# insmod paride
# insmod epat
@@ -394,12 +416,14 @@ by e-mail to grant(a)torque.net, or join the linux-parport mailing list
and post your report there.
3.5 For more information or help
+---------------------------------
You can join the linux-parport mailing list by sending a mail message
-to
+to:
+
linux-parport-request(a)torque.net
-with the single word
+with the single word::
subscribe
@@ -412,6 +436,4 @@ have in your mail headers, when sending mail to the list server.
You might also find some useful information on the linux-parport
web pages (although they are not always up to date) at
- http://web.archive.org/web/*/http://www.torque.net/parport/
-
-
+ http://web.archive.org/web/%2E/http://www.torque.net/parport/
diff --git a/Documentation/blockdev/ramdisk.txt b/Documentation/blockdev/ramdisk.rst
similarity index 84%
rename from Documentation/blockdev/ramdisk.txt
rename to Documentation/blockdev/ramdisk.rst
index 501e12e0323e..b7c2268f8dec 100644
--- a/Documentation/blockdev/ramdisk.txt
+++ b/Documentation/blockdev/ramdisk.rst
@@ -1,7 +1,8 @@
+==========================================
Using the RAM disk block device with Linux
-------------------------------------------
+==========================================
-Contents:
+.. Contents:
1) Overview
2) Kernel Command Line Parameters
@@ -42,7 +43,7 @@ rescue floppy disk.
2a) Kernel Command Line Parameters
ramdisk_size=N
- ==============
+ Size of the ramdisk.
This parameter tells the RAM disk driver to set up RAM disks of N k size. The
default is 4096 (4 MB).
@@ -50,16 +51,13 @@ default is 4096 (4 MB).
2b) Module parameters
rd_nr
- =====
- /dev/ramX devices created.
+ /dev/ramX devices created.
max_part
- ========
- Maximum partition number.
+ Maximum partition number.
rd_size
- =======
- See ramdisk_size.
+ See ramdisk_size.
3) Using "rdev -r"
------------------
@@ -71,11 +69,11 @@ to 2 MB (2^11) of where to find the RAM disk (this used to be the size). Bit
prompt/wait sequence is to be given before trying to read the RAM disk. Since
the RAM disk dynamically grows as data is being written into it, a size field
is not required. Bits 11 to 13 are not currently used and may as well be zero.
-These numbers are no magical secrets, as seen below:
+These numbers are no magical secrets, as seen below::
-./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
-./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
-./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
+ ./arch/x86/kernel/setup.c:#define RAMDISK_IMAGE_START_MASK 0x07FF
+ ./arch/x86/kernel/setup.c:#define RAMDISK_PROMPT_FLAG 0x8000
+ ./arch/x86/kernel/setup.c:#define RAMDISK_LOAD_FLAG 0x4000
Consider a typical two floppy disk setup, where you will have the
kernel on disk one, and have already put a RAM disk image onto disk #2.
@@ -92,20 +90,23 @@ sequence so that you have a chance to switch floppy disks.
The command line equivalent is: "prompt_ramdisk=1"
Putting that together gives 2^15 + 2^14 + 0 = 49152 for an rdev word.
-So to create disk one of the set, you would do:
+So to create disk one of the set, you would do::
/usr/src/linux# cat arch/x86/boot/zImage > /dev/fd0
/usr/src/linux# rdev /dev/fd0 /dev/fd0
/usr/src/linux# rdev -r /dev/fd0 49152
-If you make a boot disk that has LILO, then for the above, you would use:
+If you make a boot disk that has LILO, then for the above, you would use::
+
append = "ramdisk_start=0 load_ramdisk=1 prompt_ramdisk=1"
-Since the default start = 0 and the default prompt = 1, you could use:
+
+Since the default start = 0 and the default prompt = 1, you could use::
+
append = "load_ramdisk=1"
4) An Example of Creating a Compressed RAM Disk
-----------------------------------------------
+-----------------------------------------------
To create a RAM disk image, you will need a spare block device to
construct it on. This can be the RAM disk device itself, or an
@@ -120,11 +121,11 @@ a) Decide on the RAM disk size that you want. Say 2 MB for this example.
Create it by writing to the RAM disk device. (This step is not currently
required, but may be in the future.) It is wise to zero out the
area (esp. for disks) so that maximal compression is achieved for
- the unused blocks of the image that you are about to create.
+ the unused blocks of the image that you are about to create::
dd if=/dev/zero of=/dev/ram0 bs=1k count=2048
-b) Make a filesystem on it. Say ext2fs for this example.
+b) Make a filesystem on it. Say ext2fs for this example::
mke2fs -vm0 /dev/ram0 2048
@@ -133,11 +134,11 @@ c) Mount it, copy the files you want to it (eg: /etc/* /dev/* ...)
d) Compress the contents of the RAM disk. The level of compression
will be approximately 50% of the space used by the files. Unused
- space on the RAM disk will compress to almost nothing.
+ space on the RAM disk will compress to almost nothing::
dd if=/dev/ram0 bs=1k count=2048 | gzip -v9 > /tmp/ram_image.gz
-e) Put the kernel onto the floppy
+e) Put the kernel onto the floppy::
dd if=zImage of=/dev/fd0 bs=1k
@@ -146,13 +147,13 @@ f) Put the RAM disk image onto the floppy, after the kernel. Use an offset
(possibly larger) kernel onto the same floppy later without overlapping
the RAM disk image. An offset of 400 kB for kernels about 350 kB in
size would be reasonable. Make sure offset+size of ram_image.gz is
- not larger than the total space on your floppy (usually 1440 kB).
+ not larger than the total space on your floppy (usually 1440 kB)::
dd if=/tmp/ram_image.gz of=/dev/fd0 bs=1k seek=400
g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
For prompt_ramdisk=1, load_ramdisk=1, ramdisk_start=400, one would
- have 2^15 + 2^14 + 400 = 49552.
+ have 2^15 + 2^14 + 400 = 49552::
rdev /dev/fd0 /dev/fd0
rdev -r /dev/fd0 49552
@@ -160,15 +161,17 @@ g) Use "rdev" to set the boot device, RAM disk offset, prompt flag, etc.
That is it. You now have your boot/root compressed RAM disk floppy. Some
users may wish to combine steps (d) and (f) by using a pipe.
---------------------------------------------------------------------------
+
Paul Gortmaker 12/95
Changelog:
----------
-10-22-04 : Updated to reflect changes in command line options, remove
+10-22-04 :
+ Updated to reflect changes in command line options, remove
obsolete references, general cleanup.
James Nelson (james4765(a)gmail.com)
-12-95 : Original Document
+12-95 :
+ Original Document
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.rst
similarity index 76%
rename from Documentation/blockdev/zram.txt
rename to Documentation/blockdev/zram.rst
index 4df0ce271085..2111231c9c0f 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.rst
@@ -1,7 +1,9 @@
+========================================
zram: Compressed RAM based block devices
-----------------------------------------
+========================================
-* Introduction
+Introduction
+============
The zram module creates RAM based block devices named /dev/zram<id>
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
@@ -12,9 +14,11 @@ use as swap disks, various caches under /var and maybe many more :)
Statistics for individual zram devices are exported through sysfs nodes at
/sys/block/zram<id>/
-* Usage
+Usage
+=====
There are several ways to configure and manage zram device(-s):
+
a) using zram and zram_control sysfs attributes
b) using zramctl utility, provided by util-linux (util-linux(a)vger.kernel.org).
@@ -22,7 +26,7 @@ In this document we will describe only 'manual' zram configuration steps,
IOW, zram and zram_control sysfs attributes.
In order to get a better idea about zramctl please consult util-linux
-documentation, zramctl man-page or `zramctl --help'. Please be informed
+documentation, zramctl man-page or `zramctl --help`. Please be informed
that zram maintainers do not develop/maintain util-linux or zramctl, should
you have any questions please contact util-linux(a)vger.kernel.org
@@ -30,19 +34,23 @@ Following shows a typical sequence of steps for using zram.
WARNING
=======
+
For the sake of simplicity we skip error checking parts in most of the
examples below. However, it is your sole responsibility to handle errors.
zram sysfs attributes always return negative values in case of errors.
The list of possible return codes:
--EBUSY -- an attempt to modify an attribute that cannot be changed once
-the device has been initialised. Please reset device first;
--ENOMEM -- zram was not able to allocate enough memory to fulfil your
-needs;
--EINVAL -- invalid input has been provided.
+
+======== =============================================================
+-EBUSY an attempt to modify an attribute that cannot be changed once
+ the device has been initialised. Please reset device first;
+-ENOMEM zram was not able to allocate enough memory to fulfil your
+ needs;
+-EINVAL invalid input has been provided.
+======== =============================================================
If you use 'echo', the returned value that is changed by 'echo' utility,
-and, in general case, something like:
+and, in general case, something like::
echo 3 > /sys/block/zram0/max_comp_streams
if [ $? -ne 0 ];
@@ -51,7 +59,11 @@ and, in general case, something like:
should suffice.
-1) Load Module:
+1) Load Module
+==============
+
+::
+
modprobe zram num_devices=4
This creates 4 devices: /dev/zram{0,1,2,3}
@@ -59,6 +71,8 @@ num_devices parameter is optional and tells zram how many devices should be
pre-created. Default: 1.
2) Set max number of compression streams
+========================================
+
Regardless the value passed to this attribute, ZRAM will always
allocate multiple compression streams - one per online CPUs - thus
allowing several concurrent compression operations. The number of
@@ -66,16 +80,20 @@ allocated compression streams goes down when some of the CPUs
become offline. There is no single-compression-stream mode anymore,
unless you are running a UP system or has only 1 CPU online.
-To find out how many streams are currently available:
+To find out how many streams are currently available::
+
cat /sys/block/zram0/max_comp_streams
3) Select compression algorithm
+===============================
+
Using comp_algorithm device attribute one can see available and
currently selected (shown in square brackets) compression algorithms,
change selected compression algorithm (once the device is initialised
there is no way to change compression algorithm).
-Examples:
+Examples::
+
#show supported compression algorithms
cat /sys/block/zram0/comp_algorithm
lzo [lz4]
@@ -83,20 +101,23 @@ Examples:
#select lzo compression algorithm
echo lzo > /sys/block/zram0/comp_algorithm
-For the time being, the `comp_algorithm' content does not necessarily
+For the time being, the `comp_algorithm` content does not necessarily
show every compression algorithm supported by the kernel. We keep this
list primarily to simplify device configuration and one can configure
a new device with a compression algorithm that is not listed in
-`comp_algorithm'. The thing is that, internally, ZRAM uses Crypto API
+`comp_algorithm`. The thing is that, internally, ZRAM uses Crypto API
and, if some of the algorithms were built as modules, it's impossible
to list all of them using, for instance, /proc/crypto or any other
method. This, however, has an advantage of permitting the usage of
custom crypto compression modules (implementing S/W or H/W compression).
4) Set Disksize
+===============
+
Set disk size by writing the value to sysfs node 'disksize'.
The value can be either in bytes or you can use mem suffixes.
-Examples:
+Examples::
+
# Initialize /dev/zram0 with 50MB disksize
echo $((50*1024*1024)) > /sys/block/zram0/disksize
@@ -111,10 +132,13 @@ since we expect a 2:1 compression ratio. Note that zram uses about 0.1% of the
size of the disk when not in use so a huge zram is wasteful.
5) Set memory limit: Optional
+=============================
+
Set memory limit by writing the value to sysfs node 'mem_limit'.
The value can be either in bytes or you can use mem suffixes.
In addition, you could change the value in runtime.
-Examples:
+Examples::
+
# limit /dev/zram0 with 50MB memory
echo $((50*1024*1024)) > /sys/block/zram0/mem_limit
@@ -126,7 +150,11 @@ Examples:
# To disable memory limit
echo 0 > /sys/block/zram0/mem_limit
-6) Activate:
+6) Activate
+===========
+
+::
+
mkswap /dev/zram0
swapon /dev/zram0
@@ -134,6 +162,7 @@ Examples:
mount /dev/zram1 /tmp
7) Add/remove zram devices
+==========================
zram provides a control interface, which enables dynamic (on-demand) device
addition and removal.
@@ -142,37 +171,44 @@ In order to add a new /dev/zramX device, perform read operation on hot_add
attribute. This will return either new device's device id (meaning that you
can use /dev/zram<id>) or error code.
-Example:
+Example::
+
cat /sys/class/zram-control/hot_add
1
To remove the existing /dev/zramX device (where X is a device id)
-execute
+execute::
+
echo X > /sys/class/zram-control/hot_remove
-8) Stats:
+8) Stats
+========
+
Per-device statistics are exported as various nodes under /sys/block/zram<id>/
A brief description of exported device attributes. For more details please
read Documentation/ABI/testing/sysfs-block-zram.
+====================== ====== ===============================================
Name access description
----- ------ -----------
+====================== ====== ===============================================
disksize RW show and set the device's disk size
initstate RO shows the initialization state of the device
reset WO trigger device reset
-mem_used_max WO reset the `mem_used_max' counter (see later)
-mem_limit WO specifies the maximum amount of memory ZRAM can use
- to store the compressed data
-writeback_limit WO specifies the maximum amount of write IO zram can
- write out to backing device as 4KB unit
+mem_used_max WO reset the `mem_used_max` counter (see later)
+mem_limit WO specifies the maximum amount of memory ZRAM can
+ use to store the compressed data
+writeback_limit WO specifies the maximum amount of write IO zram
+ can write out to backing device as 4KB unit
writeback_limit_enable RW show and set writeback_limit feature
-max_comp_streams RW the number of possible concurrent compress operations
+max_comp_streams RW the number of possible concurrent compress
+ operations
comp_algorithm RW show and change the compression algorithm
compact WO trigger memory compaction
debug_stat RO this file is used for zram debugging purposes
backing_dev RW set up backend storage for zram to write out
idle WO mark allocated slot as idle
+====================== ====== ===============================================
User space is advised to use the following files to read the device statistics.
@@ -188,23 +224,31 @@ The stat file represents device's I/O statistics not accounted by block
layer and, thus, not available in zram<id>/stat file. It consists of a
single line of text and contains the following stats separated by
whitespace:
- failed_reads the number of failed reads
- failed_writes the number of failed writes
- invalid_io the number of non-page-size-aligned I/O requests
+
+ ============= =============================================================
+ failed_reads The number of failed reads
+ failed_writes The number of failed writes
+ invalid_io The number of non-page-size-aligned I/O requests
notify_free Depending on device usage scenario it may account
+
a) the number of pages freed because of swap slot free
- notifications or b) the number of pages freed because of
- REQ_OP_DISCARD requests sent by bio. The former ones are
- sent to a swap block device when a swap slot is freed,
- which implies that this disk is being used as a swap disk.
+ notifications
+ b) the number of pages freed because of
+ REQ_OP_DISCARD requests sent by bio. The former ones are
+ sent to a swap block device when a swap slot is freed,
+ which implies that this disk is being used as a swap disk.
+
The latter ones are sent by filesystem mounted with
discard option, whenever some data blocks are getting
discarded.
+ ============= =============================================================
File /sys/block/zram<id>/mm_stat
The stat file represents device's mm statistics. It consists of a single
line of text and contains the following stats separated by whitespace:
+
+ ================ =============================================================
orig_data_size uncompressed size of data stored in this disk.
This excludes same-element-filled pages (same_pages) since
no memory is allocated for them.
@@ -223,58 +267,71 @@ line of text and contains the following stats separated by whitespace:
No memory is allocated for such pages.
pages_compacted the number of pages freed during compaction
huge_pages the number of incompressible pages
+ ================ =============================================================
File /sys/block/zram<id>/bd_stat
The stat file represents device's backing device statistics. It consists of
a single line of text and contains the following stats separated by whitespace:
+
+ ============== =============================================================
bd_count size of data written in backing device.
Unit: 4K bytes
bd_reads the number of reads from backing device
Unit: 4K bytes
bd_writes the number of writes to backing device
Unit: 4K bytes
+ ============== =============================================================
+
+9) Deactivate
+=============
+
+::
-9) Deactivate:
swapoff /dev/zram0
umount /dev/zram1
-10) Reset:
- Write any positive value to 'reset' sysfs node
- echo 1 > /sys/block/zram0/reset
- echo 1 > /sys/block/zram1/reset
+10) Reset
+=========
+
+ Write any positive value to 'reset' sysfs node::
+
+ echo 1 > /sys/block/zram0/reset
+ echo 1 > /sys/block/zram1/reset
This frees all the memory allocated for the given device and
resets the disksize to zero. You must set the disksize again
before reusing the device.
-* Optional Feature
+Optional Feature
+================
-= writeback
+writeback
+---------
With CONFIG_ZRAM_WRITEBACK, zram can write idle/incompressible page
to backing storage rather than keeping it in memory.
-To use the feature, admin should set up backing device via
+To use the feature, admin should set up backing device via::
- "echo /dev/sda5 > /sys/block/zramX/backing_dev"
+ echo /dev/sda5 > /sys/block/zramX/backing_dev
before disksize setting. It supports only partition at this moment.
-If admin want to use incompressible page writeback, they could do via
+If admin want to use incompressible page writeback, they could do via::
- "echo huge > /sys/block/zramX/write"
+ echo huge > /sys/block/zramX/write
To use idle page writeback, first, user need to declare zram pages
-as idle.
+as idle::
- "echo all > /sys/block/zramX/idle"
+ echo all > /sys/block/zramX/idle
From now on, any pages on zram are idle pages. The idle mark
will be removed until someone request access of the block.
IOW, unless there is access request, those pages are still idle pages.
-Admin can request writeback of those idle pages at right timing via
+Admin can request writeback of those idle pages at right timing via::
- "echo idle > /sys/block/zramX/writeback"
+ echo idle > /sys/block/zramX/writeback
With the command, zram writeback idle pages from memory to the storage.
@@ -285,7 +342,7 @@ to guarantee storage health for entire product life.
To overcome the concern, zram supports "writeback_limit" feature.
The "writeback_limit_enable"'s default value is 0 so that it doesn't limit
any writeback. IOW, if admin want to apply writeback budget, he should
-enable writeback_limit_enable via
+enable writeback_limit_enable via::
$ echo 1 > /sys/block/zramX/writeback_limit_enable
@@ -296,7 +353,7 @@ until admin set the budget via /sys/block/zramX/writeback_limit.
assigned via /sys/block/zramX/writeback_limit is meaninless.)
If admin want to limit writeback as per-day 400M, he could do it
-like below.
+like below::
$ MB_SHIFT=20
$ 4K_SHIFT=12
@@ -305,16 +362,16 @@ like below.
$ echo 1 > /sys/block/zram0/writeback_limit_enable
If admin want to allow further write again once the bugdet is exausted,
-he could do it like below
+he could do it like below::
$ echo $((400<<MB_SHIFT>>4K_SHIFT)) > \
/sys/block/zram0/writeback_limit
-If admin want to see remaining writeback budget since he set,
+If admin want to see remaining writeback budget since he set::
$ cat /sys/block/zramX/writeback_limit
-If admin want to disable writeback limit, he could do
+If admin want to disable writeback limit, he could do::
$ echo 0 > /sys/block/zramX/writeback_limit_enable
@@ -326,25 +383,35 @@ budget in next setting is user's job.
If admin want to measure writeback count in a certain period, he could
know it via /sys/block/zram0/bd_stat's 3rd column.
-= memory tracking
+memory tracking
+===============
With CONFIG_ZRAM_MEMORY_TRACKING, user can know information of the
zram block. It could be useful to catch cold or incompressible
pages of the process with*pagemap.
+
If you enable the feature, you could see block state via
-/sys/kernel/debug/zram/zram0/block_state". The output is as follows,
+/sys/kernel/debug/zram/zram0/block_state". The output is as follows::
300 75.033841 .wh.
301 63.806904 s...
302 63.806919 ..hi
-First column is zram's block index.
-Second column is access time since the system was booted
-Third column is state of the block.
-(s: same page
-w: written page to backing store
-h: huge page
-i: idle page)
+First column
+ zram's block index.
+Second column
+ access time since the system was booted
+Third column
+ state of the block:
+
+ s:
+ same page
+ w:
+ written page to backing store
+ h:
+ huge page
+ i:
+ idle page
First line of above example says 300th block is accessed at 75.033841sec
and the block's state is huge so it is written back to the backing
diff --git a/MAINTAINERS b/MAINTAINERS
index 85648a944446..808f65e06ad8 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -11024,7 +11024,7 @@ M: Josef Bacik <josef(a)toxicpanda.com>
S: Maintained
L: linux-block(a)vger.kernel.org
L: nbd(a)other.debian.org
-F: Documentation/blockdev/nbd.txt
+F: Documentation/blockdev/nbd.rst
F: drivers/block/nbd.c
F: include/trace/events/nbd.h
F: include/uapi/linux/nbd.h
@@ -12028,7 +12028,7 @@ PARIDE DRIVERS FOR PARALLEL PORT IDE DEVICES
M: Tim Waugh <tim(a)cyberelk.net>
L: linux-parport(a)lists.infradead.org (subscribers-only)
S: Maintained
-F: Documentation/blockdev/paride.txt
+F: Documentation/blockdev/paride.rst
F: drivers/block/paride/
PARISC ARCHITECTURE
@@ -13310,7 +13310,7 @@ F: drivers/net/wireless/ralink/rt2x00/
RAMDISK RAM BLOCK DEVICE DRIVER
M: Jens Axboe <axboe(a)kernel.dk>
S: Maintained
-F: Documentation/blockdev/ramdisk.txt
+F: Documentation/blockdev/ramdisk.rst
F: drivers/block/brd.c
RANCHU VIRTUAL BOARD FOR MIPS
@@ -17672,7 +17672,7 @@ R: Sergey Senozhatsky <sergey.senozhatsky.work(a)gmail.com>
L: linux-kernel(a)vger.kernel.org
S: Maintained
F: drivers/block/zram/
-F: Documentation/blockdev/zram.txt
+F: Documentation/blockdev/zram.rst
ZS DECSTATION Z85C30 SERIAL DRIVER
M: "Maciej W. Rozycki" <macro(a)linux-mips.org>
diff --git a/drivers/block/Kconfig b/drivers/block/Kconfig
index 96ec7e0fc1ea..c43690b973d8 100644
--- a/drivers/block/Kconfig
+++ b/drivers/block/Kconfig
@@ -31,7 +31,7 @@ config BLK_DEV_FD
If you want to use the floppy disk drive(s) of your PC under Linux,
say Y. Information about this driver, especially important for IBM
Thinkpad users, is contained in
- <file:Documentation/blockdev/floppy.txt>.
+ <file:Documentation/blockdev/floppy.rst>.
That file also contains the location of the Floppy driver FAQ as
well as location of the fdutils package used to configure additional
parameters of the driver at run time.
@@ -96,7 +96,7 @@ config PARIDE
your computer's parallel port. Most of them are actually IDE devices
using a parallel port IDE adapter. This option enables the PARIDE
subsystem which contains drivers for many of these external drives.
- Read <file:Documentation/blockdev/paride.txt> for more information.
+ Read <file:Documentation/blockdev/paride.rst> for more information.
If you have said Y to the "Parallel-port support" configuration
option, you may share a single port between your printer and other
@@ -261,7 +261,7 @@ config BLK_DEV_NBD
userland (making server and client physically the same computer,
communicating using the loopback network device).
- Read <file:Documentation/blockdev/nbd.txt> for more information,
+ Read <file:Documentation/blockdev/nbd.rst> for more information,
especially about where to find the server code, which runs in user
space and does not need special kernel support.
@@ -303,7 +303,7 @@ config BLK_DEV_RAM
during the initial install of Linux.
Note that the kernel command line option "ramdisk=XX" is now obsolete.
- For details, read <file:Documentation/blockdev/ramdisk.txt>.
+ For details, read <file:Documentation/blockdev/ramdisk.rst>.
To compile this driver as a module, choose M here: the
module will be called brd. An alias "rd" has been defined
diff --git a/drivers/block/floppy.c b/drivers/block/floppy.c
index b933a7eea52b..5c99e52f9dc1 100644
--- a/drivers/block/floppy.c
+++ b/drivers/block/floppy.c
@@ -4424,7 +4424,7 @@ static int __init floppy_setup(char *str)
pr_cont("\n");
} else
DPRINT("botched floppy option\n");
- DPRINT("Read Documentation/blockdev/floppy.txt\n");
+ DPRINT("Read Documentation/blockdev/floppy.rst\n");
return 0;
}
diff --git a/drivers/block/zram/Kconfig b/drivers/block/zram/Kconfig
index 1ffc64770643..e06b99d54816 100644
--- a/drivers/block/zram/Kconfig
+++ b/drivers/block/zram/Kconfig
@@ -12,7 +12,7 @@ config ZRAM
It has several use cases, for example: /tmp storage, use as swap
disks and maybe many more.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
config ZRAM_WRITEBACK
bool "Write back incompressible or idle page to backing device"
@@ -26,7 +26,7 @@ config ZRAM_WRITEBACK
With /sys/block/zramX/{idle,writeback}, application could ask
idle page's writeback to the backing device to save in memory.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
config ZRAM_MEMORY_TRACKING
bool "Track zRam block status"
@@ -36,4 +36,4 @@ config ZRAM_MEMORY_TRACKING
of zRAM. Admin could see the information via
/sys/kernel/debug/zram/zramX/block_state.
- See Documentation/blockdev/zram.txt for more information.
+ See Documentation/blockdev/zram.rst for more information.
diff --git a/tools/testing/selftests/zram/README b/tools/testing/selftests/zram/README
index 7972cc512408..5fa378391d3b 100644
--- a/tools/testing/selftests/zram/README
+++ b/tools/testing/selftests/zram/README
@@ -37,4 +37,4 @@ Commands required for testing:
- mkfs/ mkfs.ext4
For more information please refer:
-kernel-source-tree/Documentation/blockdev/zram.txt
+kernel-source-tree/Documentation/blockdev/zram.rst
--
2.21.0
This adds the pidfd_open() syscall. It allows a caller to retrieve pollable
pidfds for a process which did not get created via CLONE_PIDFD, i.e. for a
process that is created via traditional fork()/clone() calls that is only
referenced by a PID:
int pidfd = pidfd_open(1234, 0);
ret = pidfd_send_signal(pidfd, SIGSTOP, NULL, 0);
With the introduction of pidfds through CLONE_PIDFD it is possible to
created pidfds at process creation time.
However, a lot of processes get created with traditional PID-based calls
such as fork() or clone() (without CLONE_PIDFD). For these processes a
caller can currently not create a pollable pidfd. This is a problem for
Android's low memory killer (LMK) and service managers such as systemd.
Both are examples of tools that want to make use of pidfds to get reliable
notification of process exit for non-parents (pidfd polling) and race-free
signal sending (pidfd_send_signal()). They intend to switch to this API for
process supervision/management as soon as possible. Having no way to get
pollable pidfds from PID-only processes is one of the biggest blockers for
them in adopting this api. With pidfd_open() making it possible to retrieve
pidfds for PID-based processes we enable them to adopt this api.
In line with Arnd's recent changes to consolidate syscall numbers across
architectures, I have added the pidfd_open() syscall to all architectures
at the same time.
Signed-off-by: Christian Brauner <christian(a)brauner.io>
Reviewed-by: Oleg Nesterov <oleg(a)redhat.com>
Acked-by: Arnd Bergmann <arnd(a)arndb.de>
Cc: "Eric W. Biederman" <ebiederm(a)xmission.com>
Cc: Kees Cook <keescook(a)chromium.org>
Cc: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Jann Horn <jannh(a)google.com>
Cc: David Howells <dhowells(a)redhat.com>
Cc: Andy Lutomirsky <luto(a)kernel.org>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Aleksa Sarai <cyphar(a)cyphar.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Al Viro <viro(a)zeniv.linux.org.uk>
Cc: linux-api(a)vger.kernel.org
---
v1:
- kbuild test robot <lkp(a)intel.com>:
- add missing entry for pidfd_open to arch/arm/tools/syscall.tbl
- Oleg Nesterov <oleg(a)redhat.com>:
- use simpler thread-group leader check
v2:
- Oleg Nesterov <oleg(a)redhat.com>:
- avoid using additional variable
- remove unneeded comment
- Arnd Bergmann <arnd(a)arndb.de>:
- switch from 428 to 434 since the new mount api has taken it
- bump syscall numbers in arch/arm64/include/asm/unistd.h
- Joel Fernandes (Google) <joel(a)joelfernandes.org>:
- switch from ESRCH to EINVAL when the passed-in pid does not refer to a
thread-group leader
- Christian Brauner <christian(a)brauner.io>:
- rebase on v5.2-rc1
- adapt syscall number to account for new mount api syscalls
v3:
- Arnd Bergmann <arnd(a)arndb.de>:
- add missing syscall entries for mips-o32 and mips-n64
---
arch/alpha/kernel/syscalls/syscall.tbl | 1 +
arch/arm/tools/syscall.tbl | 1 +
arch/arm64/include/asm/unistd.h | 2 +-
arch/arm64/include/asm/unistd32.h | 2 +
arch/ia64/kernel/syscalls/syscall.tbl | 1 +
arch/m68k/kernel/syscalls/syscall.tbl | 1 +
arch/microblaze/kernel/syscalls/syscall.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n32.tbl | 1 +
arch/mips/kernel/syscalls/syscall_n64.tbl | 1 +
arch/mips/kernel/syscalls/syscall_o32.tbl | 1 +
arch/parisc/kernel/syscalls/syscall.tbl | 1 +
arch/powerpc/kernel/syscalls/syscall.tbl | 1 +
arch/s390/kernel/syscalls/syscall.tbl | 1 +
arch/sh/kernel/syscalls/syscall.tbl | 1 +
arch/sparc/kernel/syscalls/syscall.tbl | 1 +
arch/x86/entry/syscalls/syscall_32.tbl | 1 +
arch/x86/entry/syscalls/syscall_64.tbl | 1 +
arch/xtensa/kernel/syscalls/syscall.tbl | 1 +
include/linux/pid.h | 1 +
include/linux/syscalls.h | 1 +
include/uapi/asm-generic/unistd.h | 4 +-
kernel/fork.c | 2 +-
kernel/pid.c | 43 +++++++++++++++++++++
23 files changed, 68 insertions(+), 3 deletions(-)
diff --git a/arch/alpha/kernel/syscalls/syscall.tbl b/arch/alpha/kernel/syscalls/syscall.tbl
index 9e7704e44f6d..1db9bbcfb84e 100644
--- a/arch/alpha/kernel/syscalls/syscall.tbl
+++ b/arch/alpha/kernel/syscalls/syscall.tbl
@@ -473,3 +473,4 @@
541 common fsconfig sys_fsconfig
542 common fsmount sys_fsmount
543 common fspick sys_fspick
+544 common pidfd_open sys_pidfd_open
diff --git a/arch/arm/tools/syscall.tbl b/arch/arm/tools/syscall.tbl
index aaf479a9e92d..81e6e1817c45 100644
--- a/arch/arm/tools/syscall.tbl
+++ b/arch/arm/tools/syscall.tbl
@@ -447,3 +447,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/arm64/include/asm/unistd.h b/arch/arm64/include/asm/unistd.h
index 70e6882853c0..e8f7d95a1481 100644
--- a/arch/arm64/include/asm/unistd.h
+++ b/arch/arm64/include/asm/unistd.h
@@ -44,7 +44,7 @@
#define __ARM_NR_compat_set_tls (__ARM_NR_COMPAT_BASE + 5)
#define __ARM_NR_COMPAT_END (__ARM_NR_COMPAT_BASE + 0x800)
-#define __NR_compat_syscalls 434
+#define __NR_compat_syscalls 435
#endif
#define __ARCH_WANT_SYS_CLONE
diff --git a/arch/arm64/include/asm/unistd32.h b/arch/arm64/include/asm/unistd32.h
index c39e90600bb3..7a3158ccd68e 100644
--- a/arch/arm64/include/asm/unistd32.h
+++ b/arch/arm64/include/asm/unistd32.h
@@ -886,6 +886,8 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
/*
* Please add new compat syscalls above this comment and update
diff --git a/arch/ia64/kernel/syscalls/syscall.tbl b/arch/ia64/kernel/syscalls/syscall.tbl
index e01df3f2f80d..ecc44926737b 100644
--- a/arch/ia64/kernel/syscalls/syscall.tbl
+++ b/arch/ia64/kernel/syscalls/syscall.tbl
@@ -354,3 +354,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/m68k/kernel/syscalls/syscall.tbl b/arch/m68k/kernel/syscalls/syscall.tbl
index 7e3d0734b2f3..9a3eb2558568 100644
--- a/arch/m68k/kernel/syscalls/syscall.tbl
+++ b/arch/m68k/kernel/syscalls/syscall.tbl
@@ -433,3 +433,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/microblaze/kernel/syscalls/syscall.tbl b/arch/microblaze/kernel/syscalls/syscall.tbl
index 26339e417695..ad706f83c755 100644
--- a/arch/microblaze/kernel/syscalls/syscall.tbl
+++ b/arch/microblaze/kernel/syscalls/syscall.tbl
@@ -439,3 +439,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n32.tbl b/arch/mips/kernel/syscalls/syscall_n32.tbl
index 0e2dd68ade57..97035e19ad03 100644
--- a/arch/mips/kernel/syscalls/syscall_n32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n32.tbl
@@ -372,3 +372,4 @@
431 n32 fsconfig sys_fsconfig
432 n32 fsmount sys_fsmount
433 n32 fspick sys_fspick
+434 n32 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_n64.tbl b/arch/mips/kernel/syscalls/syscall_n64.tbl
index 5eebfa0d155c..d7292722d3b0 100644
--- a/arch/mips/kernel/syscalls/syscall_n64.tbl
+++ b/arch/mips/kernel/syscalls/syscall_n64.tbl
@@ -348,3 +348,4 @@
431 n64 fsconfig sys_fsconfig
432 n64 fsmount sys_fsmount
433 n64 fspick sys_fspick
+434 n64 pidfd_open sys_pidfd_open
diff --git a/arch/mips/kernel/syscalls/syscall_o32.tbl b/arch/mips/kernel/syscalls/syscall_o32.tbl
index 3cc1374e02d0..dba084c92f14 100644
--- a/arch/mips/kernel/syscalls/syscall_o32.tbl
+++ b/arch/mips/kernel/syscalls/syscall_o32.tbl
@@ -421,3 +421,4 @@
431 o32 fsconfig sys_fsconfig
432 o32 fsmount sys_fsmount
433 o32 fspick sys_fspick
+434 o32 pidfd_open sys_pidfd_open
diff --git a/arch/parisc/kernel/syscalls/syscall.tbl b/arch/parisc/kernel/syscalls/syscall.tbl
index c9e377d59232..5022b9e179c2 100644
--- a/arch/parisc/kernel/syscalls/syscall.tbl
+++ b/arch/parisc/kernel/syscalls/syscall.tbl
@@ -430,3 +430,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/powerpc/kernel/syscalls/syscall.tbl b/arch/powerpc/kernel/syscalls/syscall.tbl
index 103655d84b4b..f2c3bda2d39f 100644
--- a/arch/powerpc/kernel/syscalls/syscall.tbl
+++ b/arch/powerpc/kernel/syscalls/syscall.tbl
@@ -515,3 +515,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/s390/kernel/syscalls/syscall.tbl b/arch/s390/kernel/syscalls/syscall.tbl
index e822b2964a83..6ebacfeaf853 100644
--- a/arch/s390/kernel/syscalls/syscall.tbl
+++ b/arch/s390/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig sys_fsconfig
432 common fsmount sys_fsmount sys_fsmount
433 common fspick sys_fspick sys_fspick
+434 common pidfd_open sys_pidfd_open sys_pidfd_open
diff --git a/arch/sh/kernel/syscalls/syscall.tbl b/arch/sh/kernel/syscalls/syscall.tbl
index 016a727d4357..834c9c7d79fa 100644
--- a/arch/sh/kernel/syscalls/syscall.tbl
+++ b/arch/sh/kernel/syscalls/syscall.tbl
@@ -436,3 +436,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/sparc/kernel/syscalls/syscall.tbl b/arch/sparc/kernel/syscalls/syscall.tbl
index e047480b1605..c58e71f21129 100644
--- a/arch/sparc/kernel/syscalls/syscall.tbl
+++ b/arch/sparc/kernel/syscalls/syscall.tbl
@@ -479,3 +479,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
index ad968b7bac72..43e4429a5272 100644
--- a/arch/x86/entry/syscalls/syscall_32.tbl
+++ b/arch/x86/entry/syscalls/syscall_32.tbl
@@ -438,3 +438,4 @@
431 i386 fsconfig sys_fsconfig __ia32_sys_fsconfig
432 i386 fsmount sys_fsmount __ia32_sys_fsmount
433 i386 fspick sys_fspick __ia32_sys_fspick
+434 i386 pidfd_open sys_pidfd_open __ia32_sys_pidfd_open
diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl
index b4e6f9e6204a..1bee0a77fdd3 100644
--- a/arch/x86/entry/syscalls/syscall_64.tbl
+++ b/arch/x86/entry/syscalls/syscall_64.tbl
@@ -355,6 +355,7 @@
431 common fsconfig __x64_sys_fsconfig
432 common fsmount __x64_sys_fsmount
433 common fspick __x64_sys_fspick
+434 common pidfd_open __x64_sys_pidfd_open
#
# x32-specific system call numbers start at 512 to avoid cache impact
diff --git a/arch/xtensa/kernel/syscalls/syscall.tbl b/arch/xtensa/kernel/syscalls/syscall.tbl
index 5fa0ee1c8e00..782b81945ccc 100644
--- a/arch/xtensa/kernel/syscalls/syscall.tbl
+++ b/arch/xtensa/kernel/syscalls/syscall.tbl
@@ -404,3 +404,4 @@
431 common fsconfig sys_fsconfig
432 common fsmount sys_fsmount
433 common fspick sys_fspick
+434 common pidfd_open sys_pidfd_open
diff --git a/include/linux/pid.h b/include/linux/pid.h
index 3c8ef5a199ca..c938a92eab99 100644
--- a/include/linux/pid.h
+++ b/include/linux/pid.h
@@ -67,6 +67,7 @@ struct pid
extern struct pid init_struct_pid;
extern const struct file_operations pidfd_fops;
+extern int pidfd_create(struct pid *pid);
static inline struct pid *get_pid(struct pid *pid)
{
diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index e2870fe1be5b..989055e0b501 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -929,6 +929,7 @@ asmlinkage long sys_clock_adjtime32(clockid_t which_clock,
struct old_timex32 __user *tx);
asmlinkage long sys_syncfs(int fd);
asmlinkage long sys_setns(int fd, int nstype);
+asmlinkage long sys_pidfd_open(pid_t pid, unsigned int flags);
asmlinkage long sys_sendmmsg(int fd, struct mmsghdr __user *msg,
unsigned int vlen, unsigned flags);
asmlinkage long sys_process_vm_readv(pid_t pid,
diff --git a/include/uapi/asm-generic/unistd.h b/include/uapi/asm-generic/unistd.h
index a87904daf103..e5684a4512c0 100644
--- a/include/uapi/asm-generic/unistd.h
+++ b/include/uapi/asm-generic/unistd.h
@@ -844,9 +844,11 @@ __SYSCALL(__NR_fsconfig, sys_fsconfig)
__SYSCALL(__NR_fsmount, sys_fsmount)
#define __NR_fspick 433
__SYSCALL(__NR_fspick, sys_fspick)
+#define __NR_pidfd_open 434
+__SYSCALL(__NR_pidfd_open, sys_pidfd_open)
#undef __NR_syscalls
-#define __NR_syscalls 434
+#define __NR_syscalls 435
/*
* 32 bit systems traditionally used different
diff --git a/kernel/fork.c b/kernel/fork.c
index b4cba953040a..c3df226f47a1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1724,7 +1724,7 @@ const struct file_operations pidfd_fops = {
* Return: On success, a cloexec pidfd is returned.
* On error, a negative errno number will be returned.
*/
-static int pidfd_create(struct pid *pid)
+int pidfd_create(struct pid *pid)
{
int fd;
diff --git a/kernel/pid.c b/kernel/pid.c
index 89548d35eefb..8fc9d94f6ac1 100644
--- a/kernel/pid.c
+++ b/kernel/pid.c
@@ -37,6 +37,7 @@
#include <linux/syscalls.h>
#include <linux/proc_ns.h>
#include <linux/proc_fs.h>
+#include <linux/sched/signal.h>
#include <linux/sched/task.h>
#include <linux/idr.h>
@@ -450,6 +451,48 @@ struct pid *find_ge_pid(int nr, struct pid_namespace *ns)
return idr_get_next(&ns->idr, &nr);
}
+/**
+ * pidfd_open() - Open new pid file descriptor.
+ *
+ * @pid: pid for which to retrieve a pidfd
+ * @flags: flags to pass
+ *
+ * This creates a new pid file descriptor with the O_CLOEXEC flag set for
+ * the process identified by @pid. Currently, the process identified by
+ * @pid must be a thread-group leader. This restriction currently exists
+ * for all aspects of pidfds including pidfd creation (CLONE_PIDFD cannot
+ * be used with CLONE_THREAD) and pidfd polling (only supports thread group
+ * leaders).
+ *
+ * Return: On success, a cloexec pidfd is returned.
+ * On error, a negative errno number will be returned.
+ */
+SYSCALL_DEFINE2(pidfd_open, pid_t, pid, unsigned int, flags)
+{
+ int fd, ret;
+ struct pid *p;
+
+ if (flags)
+ return -EINVAL;
+
+ if (pid <= 0)
+ return -EINVAL;
+
+ p = find_get_pid(pid);
+ if (!p)
+ return -ESRCH;
+
+ ret = 0;
+ rcu_read_lock();
+ if (!pid_task(p, PIDTYPE_TGID))
+ ret = -EINVAL;
+ rcu_read_unlock();
+
+ fd = ret ?: pidfd_create(p);
+ put_pid(p);
+ return fd;
+}
+
void __init pid_idr_init(void)
{
/* Verify no one has done anything silly: */
--
2.21.0
This reverts commit 2b845d4b4acd9422bbb668989db8dc36dfc8f438.
That commit introduces build issues for programs compiled in Thumb mode.
Rather than try to be clever and emit a valid trap instruction on arm32,
which requires special care about big/little endian handling on that
architecture, just emit plain data. Data in the instruction stream is
technically expected on arm32: this is how literal pools are
implemented. Reverting to the prior behavior does exactly that.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
CC: Peter Zijlstra <peterz(a)infradead.org>
CC: Thomas Gleixner <tglx(a)linutronix.de>
CC: Joel Fernandes <joelaf(a)google.com>
CC: Catalin Marinas <catalin.marinas(a)arm.com>
CC: Dave Watson <davejwatson(a)fb.com>
CC: Will Deacon <will.deacon(a)arm.com>
CC: Shuah Khan <shuah(a)kernel.org>
CC: Andi Kleen <andi(a)firstfloor.org>
CC: linux-kselftest(a)vger.kernel.org
CC: "H . Peter Anvin" <hpa(a)zytor.com>
CC: Chris Lameter <cl(a)linux.com>
CC: Russell King <linux(a)arm.linux.org.uk>
CC: Michael Kerrisk <mtk.manpages(a)gmail.com>
CC: "Paul E . McKenney" <paulmck(a)linux.vnet.ibm.com>
CC: Paul Turner <pjt(a)google.com>
CC: Boqun Feng <boqun.feng(a)gmail.com>
CC: Josh Triplett <josh(a)joshtriplett.org>
CC: Steven Rostedt <rostedt(a)goodmis.org>
CC: Ben Maurer <bmaurer(a)fb.com>
CC: linux-api(a)vger.kernel.org
CC: Andy Lutomirski <luto(a)amacapital.net>
CC: Andrew Morton <akpm(a)linux-foundation.org>
CC: Linus Torvalds <torvalds(a)linux-foundation.org>
CC: Carlos O'Donell <carlos(a)redhat.com>
CC: Florian Weimer <fweimer(a)redhat.com>
---
tools/testing/selftests/rseq/rseq-arm.h | 52 ++-------------------------------
1 file changed, 2 insertions(+), 50 deletions(-)
diff --git a/tools/testing/selftests/rseq/rseq-arm.h b/tools/testing/selftests/rseq/rseq-arm.h
index 84f28f147fb6..5f262c54364f 100644
--- a/tools/testing/selftests/rseq/rseq-arm.h
+++ b/tools/testing/selftests/rseq/rseq-arm.h
@@ -5,54 +5,7 @@
* (C) Copyright 2016-2018 - Mathieu Desnoyers <mathieu.desnoyers(a)efficios.com>
*/
-/*
- * RSEQ_SIG uses the udf A32 instruction with an uncommon immediate operand
- * value 0x5de3. This traps if user-space reaches this instruction by mistake,
- * and the uncommon operand ensures the kernel does not move the instruction
- * pointer to attacker-controlled code on rseq abort.
- *
- * The instruction pattern in the A32 instruction set is:
- *
- * e7f5def3 udf #24035 ; 0x5de3
- *
- * This translates to the following instruction pattern in the T16 instruction
- * set:
- *
- * little endian:
- * def3 udf #243 ; 0xf3
- * e7f5 b.n <7f5>
- *
- * pre-ARMv6 big endian code:
- * e7f5 b.n <7f5>
- * def3 udf #243 ; 0xf3
- *
- * ARMv6+ -mbig-endian generates mixed endianness code vs data: little-endian
- * code and big-endian data. Ensure the RSEQ_SIG data signature matches code
- * endianness. Prior to ARMv6, -mbig-endian generates big-endian code and data
- * (which match), so there is no need to reverse the endianness of the data
- * representation of the signature. However, the choice between BE32 and BE8
- * is done by the linker, so we cannot know whether code and data endianness
- * will be mixed before the linker is invoked.
- */
-
-#define RSEQ_SIG_CODE 0xe7f5def3
-
-#ifndef __ASSEMBLER__
-
-#define RSEQ_SIG_DATA \
- ({ \
- int sig; \
- asm volatile ("b 2f\n\t" \
- "1: .inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
- "2:\n\t" \
- "ldr %[sig], 1b\n\t" \
- : [sig] "=r" (sig)); \
- sig; \
- })
-
-#define RSEQ_SIG RSEQ_SIG_DATA
-
-#endif
+#define RSEQ_SIG 0x53053053
#define rseq_smp_mb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
#define rseq_smp_rmb() __asm__ __volatile__ ("dmb" ::: "memory", "cc")
@@ -125,8 +78,7 @@ do { \
__rseq_str(table_label) ":\n\t" \
".word " __rseq_str(version) ", " __rseq_str(flags) "\n\t" \
".word " __rseq_str(start_ip) ", 0x0, " __rseq_str(post_commit_offset) ", 0x0, " __rseq_str(abort_ip) ", 0x0\n\t" \
- ".arm\n\t" \
- ".inst " __rseq_str(RSEQ_SIG_CODE) "\n\t" \
+ ".word " __rseq_str(RSEQ_SIG) "\n\t" \
__rseq_str(label) ":\n\t" \
teardown \
"b %l[" __rseq_str(abort_label) "]\n\t"
--
2.11.0
selftests: bpf test_libbpf.sh failed running Linux -next kernel
20190618 and 20190619.
Here is the log from x86_64,
# selftests bpf test_libbpf.sh
bpf: test_libbpf.sh_ #
# [0] libbpf BTF is required, but is missing or corrupted.
libbpf: BTF_is #
# test_libbpf failed at file test_l4lb.o
failed: at_file #
# selftests test_libbpf [FAILED]
test_libbpf: [FAILED]_ #
[FAIL] 29 selftests bpf test_libbpf.sh
selftests: bpf_test_libbpf.sh [FAIL]
Full test log,
https://qa-reports.linaro.org/lkft/linux-next-oe/build/next-20190619/testru…
Test results comparison,
https://qa-reports.linaro.org/lkft/linux-next-oe/tests/kselftest/bpf_test_l…
Good linux -next tag: next-20190617
Bad linux -next tag: next-20190618
git branch master
git commit 1c6b40509daf5190b1fd2c758649f7df1da4827b
git repo
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
Best regards
Naresh Kamboju
---------- Forwarded message ---------
From: Jeffrin Thalakkottoor <jeffrin(a)rajagiritech.edu.in>
Date: Fri, Jun 14, 2019 at 12:06 AM
Subject: BUG: KASAN: global-out-of-bounds in ata_exec_internal_sg+0x50f/0xc70
To: <axboe(a)kernel.dk>, <jejb(a)linux.ibm.com>, <martin.petersen(a)oracle.com>
Cc: lkml <linux-kernel(a)vger.kernel.org>, <linux-ide(a)vger.kernel.org>,
<linux-scsi(a)vger.kernel.org>
hello ,
[ 55.169278] ==================================================================
[ 55.169899] BUG: KASAN: global-out-of-bounds in
ata_exec_internal_sg+0x50f/0xc70 [libata]
[ 55.170039] Read of size 16 at addr ffffffffc0723500 by task scsi_eh_1/149
[ 55.186354] The buggy address belongs to the variable:
[ 55.186972] cdb.48295+0x0/0xfffffffffffeab00 [libata]
[ 55.187171] Memory state around the buggy address:
[ 55.187290] ffffffffc0723400: 00 00 fa fa fa fa fa fa 00 00 fa fa
fa fa fa fa
[ 55.187417] ffffffffc0723480: 00 00 00 00 00 00 00 00 05 fa fa fa
fa fa fa fa
[ 55.187544] >ffffffffc0723500: 00 04 fa fa fa fa fa fa 00 00 fa fa
fa fa fa fa
[ 55.187686] ^
[ 55.187810] ffffffffc0723580: 00 00 05 fa fa fa fa fa 00 00 00 00
00 00 00 00
[ 55.187940] ffffffffc0723600: 00 00 00 00 00 00 00 00 00 00 00 00
00 00 00 00
[ 55.188060] ==================================================================
output of dmesg with the filename kasan.txt attached
--
software engineer
rajagiri school of engineering and technology
=== Overview
arm64 has a feature called Top Byte Ignore, which allows to embed pointer
tags into the top byte of each pointer. Userspace programs (such as
HWASan, a memory debugging tool [1]) might use this feature and pass
tagged user pointers to the kernel through syscalls or other interfaces.
Right now the kernel is already able to handle user faults with tagged
pointers, due to these patches:
1. 81cddd65 ("arm64: traps: fix userspace cache maintenance emulation on a
tagged pointer")
2. 7dcd9dd8 ("arm64: hw_breakpoint: fix watchpoint matching for tagged
pointers")
3. 276e9327 ("arm64: entry: improve data abort handling of tagged
pointers")
This patchset extends tagged pointer support to syscall arguments.
As per the proposed ABI change [3], tagged pointers are only allowed to be
passed to syscalls when they point to memory ranges obtained by anonymous
mmap() or sbrk() (see the patchset [3] for more details).
For non-memory syscalls this is done by untaging user pointers when the
kernel performs pointer checking to find out whether the pointer comes
from userspace (most notably in access_ok). The untagging is done only
when the pointer is being checked, the tag is preserved as the pointer
makes its way through the kernel and stays tagged when the kernel
dereferences the pointer when perfoming user memory accesses.
The mmap and mremap (only new_addr) syscalls do not currently accept
tagged addresses. Architectures may interpret the tag as a background
colour for the corresponding vma.
Other memory syscalls (mprotect, etc.) don't do user memory accesses but
rather deal with memory ranges, and untagged pointers are better suited to
describe memory ranges internally. Thus for memory syscalls we untag
pointers completely when they enter the kernel.
=== Other approaches
One of the alternative approaches to untagging that was considered is to
completely strip the pointer tag as the pointer enters the kernel with
some kind of a syscall wrapper, but that won't work with the countless
number of different ioctl calls. With this approach we would need a custom
wrapper for each ioctl variation, which doesn't seem practical.
An alternative approach to untagging pointers in memory syscalls prologues
is to inspead allow tagged pointers to be passed to find_vma() (and other
vma related functions) and untag them there. Unfortunately, a lot of
find_vma() callers then compare or subtract the returned vma start and end
fields against the pointer that was being searched. Thus this approach
would still require changing all find_vma() callers.
=== Testing
The following testing approaches has been taken to find potential issues
with user pointer untagging:
1. Static testing (with sparse [2] and separately with a custom static
analyzer based on Clang) to track casts of __user pointers to integer
types to find places where untagging needs to be done.
2. Static testing with grep to find parts of the kernel that call
find_vma() (and other similar functions) or directly compare against
vm_start/vm_end fields of vma.
3. Static testing with grep to find parts of the kernel that compare
user pointers with TASK_SIZE or other similar consts and macros.
4. Dynamic testing: adding BUG_ON(has_tag(addr)) to find_vma() and running
a modified syzkaller version that passes tagged pointers to the kernel.
Based on the results of the testing the requried patches have been added
to the patchset.
=== Notes
This patchset is meant to be merged together with "arm64 relaxed ABI" [3].
This patchset is a prerequisite for ARM's memory tagging hardware feature
support [4].
This patchset has been merged into the Pixel 2 & 3 kernel trees and is
now being used to enable testing of Pixel phones with HWASan.
Thanks!
[1] http://clang.llvm.org/docs/HardwareAssistedAddressSanitizerDesign.html
[2] https://github.com/lucvoo/sparse-dev/commit/5f960cb10f56ec2017c128ef9d16060…
[3] https://lkml.org/lkml/2019/3/18/819
[4] https://community.arm.com/processors/b/blog/posts/arm-a-profile-architectur…
=== History
Changes in v17:
- The "uaccess: add noop untagged_addr definition" patch is dropped, as it
was merged into upstream named as "uaccess: add noop untagged_addr
definition".
- Merged "mm, arm64: untag user pointers in do_pages_move" into
"mm, arm64: untag user pointers passed to memory syscalls".
- Added "arm64: Introduce prctl() options to control the tagged user
addresses ABI" patch from Catalin.
- Add tags_lib.so to tools/testing/selftests/arm64/.gitignore.
- Added a comment clarifying untagged in mremap.
- Moved untagging back into mlx4_get_umem_mr() for the IB patch.
Changes in v16:
- Moved untagging for memory syscalls from arm64 wrappers back to generic
code.
- Dropped untagging for the following memory syscalls: brk, mmap, munmap;
mremap (only dropped for new_address); mmap_pgoff (not used on arm64);
remap_file_pages (deprecated); shmat, shmdt (work on shared memory).
- Changed kselftest to LD_PRELOAD a shared library that overrides malloc
to return tagged pointers.
- Rebased onto 5.2-rc3.
Changes in v15:
- Removed unnecessary untagging from radeon_ttm_tt_set_userptr().
- Removed unnecessary untagging from amdgpu_ttm_tt_set_userptr().
- Moved untagging to validate_range() in userfaultfd code.
- Moved untagging to ib_uverbs_(re)reg_mr() from mlx4_get_umem_mr().
- Rebased onto 5.1.
Changes in v14:
- Moved untagging for most memory syscalls to an arm64 specific
implementation, instead of doing that in the common code.
- Dropped "net, arm64: untag user pointers in tcp_zerocopy_receive", since
the provided user pointers don't come from an anonymous map and thus are
not covered by this ABI relaxation.
- Dropped "kernel, arm64: untag user pointers in prctl_set_mm*".
- Moved untagging from __check_mem_type() to tee_shm_register().
- Updated untagging for the amdgpu and radeon drivers to cover the MMU
notifier, as suggested by Felix.
- Since this ABI relaxation doesn't actually allow tagged instruction
pointers, dropped the following patches:
- Dropped "tracing, arm64: untag user pointers in seq_print_user_ip".
- Dropped "uprobes, arm64: untag user pointers in find_active_uprobe".
- Dropped "bpf, arm64: untag user pointers in stack_map_get_build_id_offset".
- Rebased onto 5.1-rc7 (37624b58).
Changes in v13:
- Simplified untagging in tcp_zerocopy_receive().
- Looked at find_vma() callers in drivers/, which allowed to identify a
few other places where untagging is needed.
- Added patch "mm, arm64: untag user pointers in get_vaddr_frames".
- Added patch "drm/amdgpu, arm64: untag user pointers in
amdgpu_ttm_tt_get_user_pages".
- Added patch "drm/radeon, arm64: untag user pointers in
radeon_ttm_tt_pin_userptr".
- Added patch "IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr".
- Added patch "media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get".
- Added patch "tee/optee, arm64: untag user pointers in check_mem_type".
- Added patch "vfio/type1, arm64: untag user pointers".
Changes in v12:
- Changed untagging in tcp_zerocopy_receive() to also untag zc->address.
- Fixed untagging in prctl_set_mm* to only untag pointers for vma lookups
and validity checks, but leave them as is for actual user space accesses.
- Updated the link to the v2 of the "arm64 relaxed ABI" patchset [3].
- Dropped the documentation patch, as the "arm64 relaxed ABI" patchset [3]
handles that.
Changes in v11:
- Added "uprobes, arm64: untag user pointers in find_active_uprobe" patch.
- Added "bpf, arm64: untag user pointers in stack_map_get_build_id_offset"
patch.
- Fixed "tracing, arm64: untag user pointers in seq_print_user_ip" to
correctly perform subtration with a tagged addr.
- Moved untagged_addr() from SYSCALL_DEFINE3(mprotect) and
SYSCALL_DEFINE4(pkey_mprotect) to do_mprotect_pkey().
- Moved untagged_addr() definition for other arches from
include/linux/memory.h to include/linux/mm.h.
- Changed untagging in strn*_user() to perform userspace accesses through
tagged pointers.
- Updated the documentation to mention that passing tagged pointers to
memory syscalls is allowed.
- Updated the test to use malloc'ed memory instead of stack memory.
Changes in v10:
- Added "mm, arm64: untag user pointers passed to memory syscalls" back.
- New patch "fs, arm64: untag user pointers in fs/userfaultfd.c".
- New patch "net, arm64: untag user pointers in tcp_zerocopy_receive".
- New patch "kernel, arm64: untag user pointers in prctl_set_mm*".
- New patch "tracing, arm64: untag user pointers in seq_print_user_ip".
Changes in v9:
- Rebased onto 4.20-rc6.
- Used u64 instead of __u64 in type casts in the untagged_addr macro for
arm64.
- Added braces around (addr) in the untagged_addr macro for other arches.
Changes in v8:
- Rebased onto 65102238 (4.20-rc1).
- Added a note to the cover letter on why syscall wrappers/shims that untag
user pointers won't work.
- Added a note to the cover letter that this patchset has been merged into
the Pixel 2 kernel tree.
- Documentation fixes, in particular added a list of syscalls that don't
support tagged user pointers.
Changes in v7:
- Rebased onto 17b57b18 (4.19-rc6).
- Dropped the "arm64: untag user address in __do_user_fault" patch, since
the existing patches already handle user faults properly.
- Dropped the "usb, arm64: untag user addresses in devio" patch, since the
passed pointer must come from a vma and therefore be untagged.
- Dropped the "arm64: annotate user pointers casts detected by sparse"
patch (see the discussion to the replies of the v6 of this patchset).
- Added more context to the cover letter.
- Updated Documentation/arm64/tagged-pointers.txt.
Changes in v6:
- Added annotations for user pointer casts found by sparse.
- Rebased onto 050cdc6c (4.19-rc1+).
Changes in v5:
- Added 3 new patches that add untagging to places found with static
analysis.
- Rebased onto 44c929e1 (4.18-rc8).
Changes in v4:
- Added a selftest for checking that passing tagged pointers to the
kernel succeeds.
- Rebased onto 81e97f013 (4.18-rc1+).
Changes in v3:
- Rebased onto e5c51f30 (4.17-rc6+).
- Added linux-arch@ to the list of recipients.
Changes in v2:
- Rebased onto 2d618bdf (4.17-rc3+).
- Removed excessive untagging in gup.c.
- Removed untagging pointers returned from __uaccess_mask_ptr.
Changes in v1:
- Rebased onto 4.17-rc1.
Changes in RFC v2:
- Added "#ifndef untagged_addr..." fallback in linux/uaccess.h instead of
defining it for each arch individually.
- Updated Documentation/arm64/tagged-pointers.txt.
- Dropped "mm, arm64: untag user addresses in memory syscalls".
- Rebased onto 3eb2ce82 (4.16-rc7).
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Andrey Konovalov (14):
arm64: untag user pointers in access_ok and __uaccess_mask_ptr
lib, arm64: untag user pointers in strn*_user
mm, arm64: untag user pointers passed to memory syscalls
mm, arm64: untag user pointers in mm/gup.c
mm, arm64: untag user pointers in get_vaddr_frames
fs, arm64: untag user pointers in copy_mount_options
userfaultfd, arm64: untag user pointers
drm/amdgpu, arm64: untag user pointers
drm/radeon, arm64: untag user pointers in radeon_gem_userptr_ioctl
IB/mlx4, arm64: untag user pointers in mlx4_get_umem_mr
media/v4l2-core, arm64: untag user pointers in
videobuf_dma_contig_user_get
tee/shm, arm64: untag user pointers in tee_shm_register
vfio/type1, arm64: untag user pointers in vaddr_get_pfn
selftests, arm64: add a selftest for passing tagged pointers to kernel
Catalin Marinas (1):
arm64: Introduce prctl() options to control the tagged user addresses
ABI
arch/arm64/include/asm/processor.h | 6 ++
arch/arm64/include/asm/thread_info.h | 1 +
arch/arm64/include/asm/uaccess.h | 11 ++-
arch/arm64/kernel/process.c | 67 +++++++++++++++++++
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +
drivers/gpu/drm/radeon/radeon_gem.c | 2 +
drivers/infiniband/hw/mlx4/mr.c | 7 +-
drivers/media/v4l2-core/videobuf-dma-contig.c | 9 +--
drivers/tee/tee_shm.c | 1 +
drivers/vfio/vfio_iommu_type1.c | 2 +
fs/namespace.c | 2 +-
fs/userfaultfd.c | 22 +++---
include/uapi/linux/prctl.h | 5 ++
kernel/sys.c | 16 +++++
lib/strncpy_from_user.c | 3 +-
lib/strnlen_user.c | 3 +-
mm/frame_vector.c | 2 +
mm/gup.c | 4 ++
mm/madvise.c | 2 +
mm/mempolicy.c | 3 +
mm/migrate.c | 2 +-
mm/mincore.c | 2 +
mm/mlock.c | 4 ++
mm/mprotect.c | 2 +
mm/mremap.c | 7 ++
mm/msync.c | 2 +
tools/testing/selftests/arm64/.gitignore | 2 +
tools/testing/selftests/arm64/Makefile | 22 ++++++
.../testing/selftests/arm64/run_tags_test.sh | 12 ++++
tools/testing/selftests/arm64/tags_lib.c | 62 +++++++++++++++++
tools/testing/selftests/arm64/tags_test.c | 18 +++++
32 files changed, 282 insertions(+), 25 deletions(-)
create mode 100644 tools/testing/selftests/arm64/.gitignore
create mode 100644 tools/testing/selftests/arm64/Makefile
create mode 100755 tools/testing/selftests/arm64/run_tags_test.sh
create mode 100644 tools/testing/selftests/arm64/tags_lib.c
create mode 100644 tools/testing/selftests/arm64/tags_test.c
--
2.22.0.rc2.383.gf4fbbf30c2-goog
[ Upstream commit 00e38a5d753d7788852f81703db804a60a84c26e ]
The cgroup testing relys on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, some test cases will be failed
as following:
$sudo ./test_core
not ok 1 test_cgcore_internal_process_constraint
ok 2 test_cgcore_top_down_constraint_enable
not ok 3 test_cgcore_top_down_constraint_disable
...
To correct this unexpected failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Claudio Zumbo <claudioz(a)fb.com>
Cc: Claudio <claudiozumbo(a)gmail.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/cgroup/test_core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
index be59f9c34ea2..d78f1c5366d3 100644
--- a/tools/testing/selftests/cgroup/test_core.c
+++ b/tools/testing/selftests/cgroup/test_core.c
@@ -376,6 +376,11 @@ int main(int argc, char *argv[])
if (cg_find_unified_root(root, sizeof(root)))
ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+ if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
+ if (cg_write(root, "cgroup.subtree_control", "+memory"))
+ ksft_exit_skip("Failed to set memory controller\n");
+
for (i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
--
2.20.1
[ Upstream commit f6131f28057d4fd8922599339e701a2504e0f23d ]
The cgroup testing relies on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, all test cases will be failed
as following:
$ sudo ./test_memcontrol
not ok 1 test_memcg_subtree_control
not ok 2 test_memcg_current
ok 3 # skip test_memcg_min
not ok 4 test_memcg_low
not ok 5 test_memcg_high
not ok 6 test_memcg_max
not ok 7 test_memcg_oom_events
ok 8 # skip test_memcg_swap_max
not ok 9 test_memcg_sock
not ok 10 test_memcg_oom_group_leaf_events
not ok 11 test_memcg_oom_group_parent_events
not ok 12 test_memcg_oom_group_score_events
To correct this unexpected failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Jay Kamat <jgkamat(a)fb.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 6f339882a6ca..c19a97dd02d4 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1205,6 +1205,10 @@ int main(int argc, char **argv)
if (cg_read_strstr(root, "cgroup.controllers", "memory"))
ksft_exit_skip("memory controller isn't available\n");
+ if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
+ if (cg_write(root, "cgroup.subtree_control", "+memory"))
+ ksft_exit_skip("Failed to set memory controller\n");
+
for (i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
--
2.20.1
[ Upstream commit 00e38a5d753d7788852f81703db804a60a84c26e ]
The cgroup testing relys on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, some test cases will be failed
as following:
$sudo ./test_core
not ok 1 test_cgcore_internal_process_constraint
ok 2 test_cgcore_top_down_constraint_enable
not ok 3 test_cgcore_top_down_constraint_disable
...
To correct this unexpected failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Claudio Zumbo <claudioz(a)fb.com>
Cc: Claudio <claudiozumbo(a)gmail.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/cgroup/test_core.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_core.c b/tools/testing/selftests/cgroup/test_core.c
index be59f9c34ea2..d78f1c5366d3 100644
--- a/tools/testing/selftests/cgroup/test_core.c
+++ b/tools/testing/selftests/cgroup/test_core.c
@@ -376,6 +376,11 @@ int main(int argc, char *argv[])
if (cg_find_unified_root(root, sizeof(root)))
ksft_exit_skip("cgroup v2 isn't mounted\n");
+
+ if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
+ if (cg_write(root, "cgroup.subtree_control", "+memory"))
+ ksft_exit_skip("Failed to set memory controller\n");
+
for (i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
--
2.20.1
[ Upstream commit f6131f28057d4fd8922599339e701a2504e0f23d ]
The cgroup testing relies on the root cgroup's subtree_control setting,
If the 'memory' controller isn't set, all test cases will be failed
as following:
$ sudo ./test_memcontrol
not ok 1 test_memcg_subtree_control
not ok 2 test_memcg_current
ok 3 # skip test_memcg_min
not ok 4 test_memcg_low
not ok 5 test_memcg_high
not ok 6 test_memcg_max
not ok 7 test_memcg_oom_events
ok 8 # skip test_memcg_swap_max
not ok 9 test_memcg_sock
not ok 10 test_memcg_oom_group_leaf_events
not ok 11 test_memcg_oom_group_parent_events
not ok 12 test_memcg_oom_group_score_events
To correct this unexpected failure, this patch write the 'memory' to
subtree_control of root to get a right result.
Signed-off-by: Alex Shi <alex.shi(a)linux.alibaba.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: Roman Gushchin <guro(a)fb.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Jay Kamat <jgkamat(a)fb.com>
Cc: linux-kselftest(a)vger.kernel.org
Cc: linux-kernel(a)vger.kernel.org
Reviewed-by: Roman Gushchin <guro(a)fb.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Shuah Khan <skhan(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
tools/testing/selftests/cgroup/test_memcontrol.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/tools/testing/selftests/cgroup/test_memcontrol.c b/tools/testing/selftests/cgroup/test_memcontrol.c
index 6f339882a6ca..c19a97dd02d4 100644
--- a/tools/testing/selftests/cgroup/test_memcontrol.c
+++ b/tools/testing/selftests/cgroup/test_memcontrol.c
@@ -1205,6 +1205,10 @@ int main(int argc, char **argv)
if (cg_read_strstr(root, "cgroup.controllers", "memory"))
ksft_exit_skip("memory controller isn't available\n");
+ if (cg_read_strstr(root, "cgroup.subtree_control", "memory"))
+ if (cg_write(root, "cgroup.subtree_control", "+memory"))
+ ksft_exit_skip("Failed to set memory controller\n");
+
for (i = 0; i < ARRAY_SIZE(tests); i++) {
switch (tests[i].fn(root)) {
case KSFT_PASS:
--
2.20.1
When ordinarily running the tests, upon `make install', the
following error is encountered:
ImportError: No module named tpm2_tests
because the Python files are not installed at the moment.
Fix this by adding both Python modules as accompanying
TEST_FILES in the Makefile.
Signed-off-by: Daniel Díaz <daniel.diaz(a)linaro.org>
---
tools/testing/selftests/tpm2/Makefile | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/testing/selftests/tpm2/Makefile b/tools/testing/selftests/tpm2/Makefile
index 9dd848427a7b..bf401f725eef 100644
--- a/tools/testing/selftests/tpm2/Makefile
+++ b/tools/testing/selftests/tpm2/Makefile
@@ -2,3 +2,4 @@
include ../lib.mk
TEST_PROGS := test_smoke.sh test_space.sh
+TEST_FILES := tpm2.py tpm2_tests.py
--
2.20.1