This is a followup from [1], in which a split of commits was suggested
by Greg. Additionally, the following changes were removed and not
included in this v2 version:
- dropped the binder_transaction_log_entry->strerr[] logic
- dropped the binder_transaction_error() do-it-all function
- dropped the re-work of current binder_user_error() messages
[1] https://lore.kernel.org/r/20220421042040.759068-1-cmllamas@google.com/
Carlos Llamas (5):
binder: add failed transaction logging info
binder: add BINDER_GET_EXTENDED_ERROR ioctl
binderfs: add extended_error feature entry
binder: convert logging macros into functions
binder: additional transaction error logs
drivers/android/binder.c | 153 ++++++++++++++++--
drivers/android/binder_internal.h | 3 +
drivers/android/binderfs.c | 8 +
include/uapi/linux/android/binder.h | 16 ++
.../filesystems/binderfs/binderfs_test.c | 1 +
5 files changed, 165 insertions(+), 16 deletions(-)
base-commit: 8013d1d3d2e33236dee13a133fba49ad55045e79
--
2.36.0.464.gb9c8b46e94-goog
When clatd starts with ebpf offloaing, and NETIF_F_GRO_FRAGLIST is enable,
several skbs are gathered in skb_shinfo(skb)->frag_list. The first skb's
ipv6 header will be changed to ipv4 after bpf_skb_proto_6_to_4,
network_header\transport_header\mac_header have been updated as ipv4 acts,
but other skbs in frag_list didnot update anything, just ipv6 packets.
udp_queue_rcv_skb will call skb_segment_list to traverse other skbs in
frag_list and make sure right udp payload is delivered to user space.
Unfortunately, other skbs in frag_list who are still ipv6 packets are
updated like the first skb and will have wrong transport header length.
e.g.before bpf_skb_proto_6_to_4,the first skb and other skbs in frag_list
has the same network_header(24)& transport_header(64), after
bpf_skb_proto_6_to_4, ipv6 protocol has been changed to ipv4, the first
skb's network_header is 44,transport_header is 64, other skbs in frag_list
didnot change.After skb_segment_list, the other skbs in frag_list has
different network_header(24) and transport_header(44), so there will be 20
bytes different from original,that is difference between ipv6 header and
ipv4 header. Just change transport_header to be the same with original.
Actually, there are two solutions to fix it, one is traversing all skbs
and changing every skb header in bpf_skb_proto_6_to_4, the other is
modifying frag_list skb's header in skb_segment_list. Considering
efficiency, adopt the second one--- when the first skb and other skbs in
frag_list has different network_header length, restore them to make sure
right udp payload is delivered to user space.
Signed-off-by: Lina Wang <lina.wang(a)mediatek.com>
---
net/core/skbuff.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 10bde7c6db44..e8006e0a1b25 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -3897,7 +3897,7 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
unsigned int delta_len = 0;
struct sk_buff *tail = NULL;
struct sk_buff *nskb, *tmp;
- int err;
+ int len_diff, err;
skb_push(skb, -skb_network_offset(skb) + offset);
@@ -3937,9 +3937,11 @@ struct sk_buff *skb_segment_list(struct sk_buff *skb,
skb_push(nskb, -skb_network_offset(nskb) + offset);
skb_release_head_state(nskb);
+ len_diff = skb_network_header_len(nskb) - skb_network_header_len(skb);
__copy_skb_header(nskb, skb);
skb_headers_offset_update(nskb, skb_headroom(nskb) - skb_headroom(skb));
+ nskb->transport_header += len_diff;
skb_copy_from_linear_data_offset(skb, -tnl_hlen,
nskb->data - tnl_hlen,
offset + tnl_hlen);
--
2.18.0
The gup_test binary will fail showing only the output of perror("open") in
the case that /sys/kernel/debug/gup_test is not found. This will almost
always be due to CONFIG_GUP_TEST not being set, which enables
compilation of a kernel that provides this file.
Add a short error message to clarify this failure and point the user to
the solution.
Signed-off-by: Joel Savitz <jsavitz(a)redhat.com>
---
tools/testing/selftests/vm/gup_test.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/gup_test.c b/tools/testing/selftests/vm/gup_test.c
index cda837a14736..ac4e804d47f0 100644
--- a/tools/testing/selftests/vm/gup_test.c
+++ b/tools/testing/selftests/vm/gup_test.c
@@ -18,6 +18,8 @@
#define FOLL_WRITE 0x01 /* check pte is writable */
#define FOLL_TOUCH 0x02 /* mark page accessed */
+#define GUP_TEST_FILE "/sys/kernel/debug/gup_test"
+
static unsigned long cmd = GUP_FAST_BENCHMARK;
static int gup_fd, repeats = 1;
static unsigned long size = 128 * MB;
@@ -204,9 +206,11 @@ int main(int argc, char **argv)
if (write)
gup.gup_flags |= FOLL_WRITE;
- gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
+ gup_fd = open(GUP_TEST_FILE, O_RDWR);
if (gup_fd == -1) {
perror("open");
+ fprintf(stderr, "Unable to open %s: check that CONFIG_GUP_TEST=y\n",
+ GUP_TEST_FILE);
exit(1);
}
--
2.27.0
My name is Warren Buffett, an American businessman and investor I have
something important to discuss with you.
Mr. Warren Buffett
warren001buffett(a)gmail.com
Chief Executive Officer: Berkshire Hathaway
aphy/Warren-Edward-Buffett
[memo to self: don't send stuff on Friday evenings]
Sorry about the spam, resend w/o config, see
https://people.redhat.com/~cohuck/config-mte
On Fri, May 06 2022, Cornelia Huck <cohuck(a)redhat.com> wrote:
> Hi,
>
> I'm currently trying to run the MTE selftests on the FVP simulator (Base
> Model)[1], mainly to verify things are sane on the host before wiring up
> the KVM support in QEMU. However, I'm seeing some failures (the non-mte
> tests seemed all fine):
>
> # selftests: /arm64: check_buffer_fill
> # 1..20
> # ok 1 Check buffer correctness by byte with sync err mode and mmap memory
> # ok 2 Check buffer correctness by byte with async err mode and mmap memory
> # ok 3 Check buffer correctness by byte with sync err mode and mmap/mprotect memory
> # ok 4 Check buffer correctness by byte with async err mode and mmap/mprotect memory
> # not ok 5 Check buffer write underflow by byte with sync mode and mmap memory
> # not ok 6 Check buffer write underflow by byte with async mode and mmap memory
> # ok 7 Check buffer write underflow by byte with tag check fault ignore and mmap memory
> # not ok 8 Check buffer write underflow by byte with sync mode and mmap memory
> # not ok 9 Check buffer write underflow by byte with async mode and mmap memory
> # ok 10 Check buffer write underflow by byte with tag check fault ignore and mmap memory
> # not ok 11 Check buffer write overflow by byte with sync mode and mmap memory
> # not ok 12 Check buffer write overflow by byte with async mode and mmap memory
> # ok 13 Check buffer write overflow by byte with tag fault ignore mode and mmap memory
> # ok 14 Check buffer write correctness by block with sync mode and mmap memory
> # ok 15 Check buffer write correctness by block with async mode and mmap memory
> # ok 16 Check buffer write correctness by block with tag fault ignore and mmap memory
> # ok 17 Check initial tags with private mapping, sync error mode and mmap memory
> # ok 18 Check initial tags with private mapping, sync error mode and mmap/mprotect memory
> # ok 19 Check initial tags with shared mapping, sync error mode and mmap memory
> # ok 20 Check initial tags with shared mapping, sync error mode and mmap/mprotect memory
> # # Totals: pass:14 fail:6 xfail:0 xpass:0 skip:0 error:0
> not ok 24 selftests: /arm64: check_buffer_fill # exit=1
>
> # selftests: /arm64: check_child_memory
> # 1..12
> # not ok 1 Check child anonymous memory with private mapping, precise mode and mmap memory
> # not ok 2 Check child anonymous memory with shared mapping, precise mode and mmap memory
> # not ok 3 Check child anonymous memory with private mapping, imprecise mode and mmap memory
> # not ok 4 Check child anonymous memory with shared mapping, imprecise mode and mmap memory
> # not ok 5 Check child anonymous memory with private mapping, precise mode and mmap/mprotect memory
> # not ok 6 Check child anonymous memory with shared mapping, precise mode and mmap/mprotect memory
> # not ok 7 Check child file memory with private mapping, precise mode and mmap memory
> # not ok 8 Check child file memory with shared mapping, precise mode and mmap memory
> # not ok 9 Check child file memory with private mapping, imprecise mode and mmap memory
> # not ok 10 Check child file memory with shared mapping, imprecise mode and mmap memory
> # not ok 11 Check child file memory with private mapping, precise mode and mmap/mprotect memory
> # not ok 12 Check child file memory with shared mapping, precise mode and mmap/mprotect memory
> # # Totals: pass:0 fail:12 xfail:0 xpass:0 skip:0 error:0
> not ok 25 selftests: /arm64: check_child_memory # exit=1
>
> # selftests: /arm64: check_gcr_el1_cswitch
> # 1..1
> # 1..1
> # 1..1
> # 1..1
> [...many more lines of the same...]
> # 1..1
> #
> not ok 26 selftests: /arm64: check_gcr_el1_cswitch # TIMEOUT 45 seconds
>
> # selftests: /arm64: check_mmap_options
> # 1..22
> # ok 1 Check anonymous memory with private mapping, sync error mode, mmap memory and tag check off
> # ok 2 Check file memory with private mapping, sync error mode, mmap/mprotect memory and tag check off
> # ok 3 Check anonymous memory with private mapping, no error mode, mmap memory and tag check off
> # ok 4 Check file memory with private mapping, no error mode, mmap/mprotect memory and tag check off
> # not ok 5 Check anonymous memory with private mapping, sync error mode, mmap memory and tag check on
> # not ok 6 Check anonymous memory with private mapping, sync error mode, mmap/mprotect memory and tag check on
> # not ok 7 Check anonymous memory with shared mapping, sync error mode, mmap memory and tag check on
> # not ok 8 Check anonymous memory with shared mapping, sync error mode, mmap/mprotect memory and tag check on
> # not ok 9 Check anonymous memory with private mapping, async error mode, mmap memory and tag check on
> # not ok 10 Check anonymous memory with private mapping, async error mode, mmap/mprotect memory and tag check on
> # not ok 11 Check anonymous memory with shared mapping, async error mode, mmap memory and tag check on
> # not ok 12 Check anonymous memory with shared mapping, async error mode, mmap/mprotect memory and tag check on
> # not ok 13 Check file memory with private mapping, sync error mode, mmap memory and tag check on
> # not ok 14 Check file memory with private mapping, sync error mode, mmap/mprotect memory and tag check on
> # not ok 15 Check file memory with shared mapping, sync error mode, mmap memory and tag check on
> # not ok 16 Check file memory with shared mapping, sync error mode, mmap/mprotect memory and tag check on
> # not ok 17 Check file memory with private mapping, async error mode, mmap memory and tag check on
> # not ok 18 Check file memory with private mapping, async error mode, mmap/mprotect memory and tag check on
> # not ok 19 Check file memory with shared mapping, async error mode, mmap memory and tag check on
> # not ok 20 Check file memory with shared mapping, async error mode, mmap/mprotect memory and tag check on
> # not ok 21 Check clear PROT_MTE flags with private mapping, sync error mode and mmap memory
> # not ok 22 Check clear PROT_MTE flags with private mapping and sync error mode and mmap/mprotect memory
> # # Totals: pass:4 fail:18 xfail:0 xpass:0 skip:0 error:0
> not ok 28 selftests: /arm64: check_mmap_options # exit=1
>
> # selftests: /arm64: check_tags_inclusion
> # 1..4
> # not ok 1 Check an included tag value with sync mode
> # not ok 2 Check different included tags value with sync mode
> # ok 3 Check none included tags value with sync mode
> # not ok 4 Check all included tags value with sync mode
> # # Totals: pass:1 fail:3 xfail:0 xpass:0 skip:0 error:0
> not ok 29 selftests: /arm64: check_tags_inclusion # exit=1
>
> check_ksm_options and check_user_mem work as expected.
>
> Are the MTE tests supposed to work on the FVP model? Something broken in
> my config? Anything I can debug?
>
> [1] Command line:
> "$MODEL" \
> -C cache_state_modelled=0 \
> -C bp.refcounter.non_arch_start_at_default=1 \
> -C bp.secure_memory=false \
> -C cluster0.has_arm_v8-1=1 \
> -C cluster0.has_arm_v8-2=1 \
> -C cluster0.has_arm_v8-3=1 \
> -C cluster0.has_arm_v8-4=1 \
> -C cluster0.has_arm_v8-5=1 \
> -C cluster0.has_amu=1 \
> -C cluster0.NUM_CORES=4 \
> -C cluster0.memory_tagging_support_level=2 \
> -a "cluster0.*=$AXF" \
>
> where $AXF contains a kernel at v5.18-rc5-16-g107c948d1d3e[2] and an
> initrd built by mbuto[3] from that level with a slightly tweaked "kselftests"
> profile (adding /dev/shm).
>
> [2] CONFIG_ARM64_MTE=y, no modules; complete config below[4]
>
> [3] https://mbuto.sh/mbuto/
>
> [4] kernel config: https://people.redhat.com/~cohuck/config-mte
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam,
Mikołaj Rudzik
This patch series is preparation to fix the problems when execute
the multiple_kprobes.tc selftest on mips, some more work needs to
be done.
Tiezhu Yang (2):
selftests/ftrace: Save kprobe_events to test log
MIPS: Use NOKPROBE_SYMBOL() instead of __kprobes annotation
arch/mips/kernel/kprobes.c | 45 +++++++++++++++-------
arch/mips/mm/fault.c | 6 ++-
.../ftrace/test.d/kprobe/multiple_kprobes.tc | 2 +
3 files changed, 38 insertions(+), 15 deletions(-)
--
2.1.0
H e l l o,
I lead family investment vehicles who want to invest a proportion of their funds with a trust party .
Please are you interested in discussing investment in your sector?
Please email, or simply write to me here: allen.large(a)cheapnet.it I value promptness and will make every attempt to respond within a short time.
Thank you.
Allen S.
From: Frank Rowand <frank.rowand(a)sony.com>
The process to create version 2 of the KTAP Specification is documented
in email discussions. I am attempting to capture this information at
https://elinux.org/Test_Results_Format_Notes#KTAP_version_2
I am already not following the suggested process, which says:
"...please try to follow this principal of one major topic per email
thread." I think that is ok in this case because the two patches
are related and (hopefully) not controversial.
Changes since patch version 1:
- drop patch 1/2. Jonathan Corbet has already applied this patch
into version 1 of the Specification
- rename patch 2/2 to patch 1/2, with updated patch comment
- add new patch 2/2
Frank Rowand (2):
ktap_v2: change version to 2-rc in KTAP specification
ktap_v2: change "version 1" to "version 2" in examples
Documentation/dev-tools/ktap.rst | 25 +++++++++++++------------
1 file changed, 13 insertions(+), 12 deletions(-)
--
Frank Rowand <frank.rowand(a)sony.com>
When writing tests, it'd often be very useful to be able to intercept
calls to a function in the code being tested and replace it with a
test-specific stub. This has always been an obviously missing piece of
KUnit, and the solutions always involve some tradeoffs with cleanliness,
performance, or impact on non-test code. See the folowing document for
some of the challenges:
https://kunit.dev/mocking.html
This series consists of two prototype patches which add support for this
sort of redirection to KUnit tests:
1: static_stub: Any function which might want to be intercepted adds a
call to a macro which checks if a test has redirected calls to it, and
calls the corresponding replacement.
2: ftrace_stub: Functions are intercepted using ftrace and livepatch.
This doesn't require adding a new prologue to each function being
replaced, but does have more dependencies (which restricts it to a small
number of architectures, not including UML), and doesn't work well with
inline functions.
The API for both implementations is very similar, so it should be easy
to migrate from one to the other if necessary. Both of these
implementations restrict the redirection to the test context: it is
automatically undone after the KUnit test completes, and does not affect
calls in other threads. If CONFIG_KUNIT is not enabled, there should be
no overhead in either implementation.
Does either (or both) of these features sound useful, and is this
sort-of API the right model? (Personally, I think there's a reasonable
scope for both.) Is anything obviously missing or wrong? Do the names,
descriptions etc. make any sense?
Note that these patches are definitely still at the "prototype" level,
and things like error-handling, documentation, and testing are still
pretty sparse. There is also quite a bit of room for optimisation.
These'll all be improved for v1 if the concept seems good.
Cheers,
-- David
Daniel Latypov (1):
kunit: expose ftrace-based API for stubbing out functions during tests
David Gow (1):
kunit: Expose 'static stub' API to redirect functions
include/kunit/ftrace_stub.h | 84 +++++++++++++++++
include/kunit/static_stub.h | 106 +++++++++++++++++++++
lib/kunit/Kconfig | 11 +++
lib/kunit/Makefile | 5 +
lib/kunit/ftrace_stub.c | 138 ++++++++++++++++++++++++++++
lib/kunit/kunit-example-test.c | 64 +++++++++++++
lib/kunit/static_stub.c | 125 +++++++++++++++++++++++++
lib/kunit/stubs_example.kunitconfig | 11 +++
8 files changed, 544 insertions(+)
create mode 100644 include/kunit/ftrace_stub.h
create mode 100644 include/kunit/static_stub.h
create mode 100644 lib/kunit/ftrace_stub.c
create mode 100644 lib/kunit/static_stub.c
create mode 100644 lib/kunit/stubs_example.kunitconfig
--
2.35.1.894.gb6a874cedc-goog
Currently the arm64 kselftests attempt to locate the ABI headers using
custom logic which doesn't work correctly in the case of out of tree builds
if KBUILD_OUTPUT is not specified. Since lib.mk defines KHDR_INCLUDES with
the appropriate flags we can simply remove the custom logic and use that
instead.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
This fix is required to get us able to run the arm64 kselftests
in KernelCI, it does out of tree kselftest builds triggering the
issue especially in conjunction with the addition of the new
definitions for SME.
tools/testing/selftests/arm64/Makefile | 11 +----------
1 file changed, 1 insertion(+), 10 deletions(-)
diff --git a/tools/testing/selftests/arm64/Makefile b/tools/testing/selftests/arm64/Makefile
index 1e8d9a8f59df..9460cbe81bcc 100644
--- a/tools/testing/selftests/arm64/Makefile
+++ b/tools/testing/selftests/arm64/Makefile
@@ -17,16 +17,7 @@ top_srcdir = $(realpath ../../../../)
# Additional include paths needed by kselftest.h and local headers
CFLAGS += -I$(top_srcdir)/tools/testing/selftests/
-# Guessing where the Kernel headers could have been installed
-# depending on ENV config
-ifeq ($(KBUILD_OUTPUT),)
-khdr_dir = $(top_srcdir)/usr/include
-else
-# the KSFT preferred location when KBUILD_OUTPUT is set
-khdr_dir = $(KBUILD_OUTPUT)/kselftest/usr/include
-endif
-
-CFLAGS += -I$(khdr_dir)
+CFLAGS += $(KHDR_INCLUDES)
export CFLAGS
export top_srcdir
--
2.30.2
There is a spelling mistake in an error message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index 29c973f606b2..136df5b76319 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4320,7 +4320,7 @@ static ssize_t get_nth(struct __test_metadata *_metadata, const char *path,
f = fopen(path, "r");
ASSERT_NE(f, NULL) {
- TH_LOG("Coud not open %s: %s", path, strerror(errno));
+ TH_LOG("Could not open %s: %s", path, strerror(errno));
}
for (i = 0; i < position; i++) {
--
2.35.1
v10:
- Relax constraints for changes made to "cpuset.cpus"
and "cpuset.cpus.partition" as suggested. Now almost all changes
are allowed.
v9:
- Add a new patch 1 to remove the child cpuset restriction on parent's
"cpuset.cpus".
- Relax initial root partition entry limitation to allow cpuset.cpus to
overlap that of parent's.
- An "isolated invalid" displayed type is added to
cpuset.cpus.partition.
- Resetting partition root to "member" will leave child partition root
as invalid.
- Update documentation and test accordingly.
v8:
- Reorganize the patch series and rationalize the features and
constraints of a partition.
- Update patch descriptions and documentation accordingly.
This patchset include the following enhancements to the cpuset v2
partition code.
1) Allow partitions that have no task to have empty effective cpus.
2) Relax the constraints on what changes are allowed in cpuset.cpus
and cpuset.cpus.partition. However, the partition remain invalid
until the constraints of a valid partition root is satisfied.
3) Add a new "isolated" partition type for partitions with no load
balancing which is available in v1 but not yet in v2.
4) Allow the reading of cpuset.cpus.partition to include a reason
string as to why the partition remain invalid.
In addition, the cgroup-v2.rst documentation file is updated and
a self test is added to verify the correctness the partition code.
Waiman Long (8):
cgroup/cpuset: Add top_cpuset check in update_tasks_cpumask()
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Allow no-task partition to have empty
cpuset.cpus.effective
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Update description of cpuset.cpus.partition in
cgroup-v2.rst
kselftest/cgroup: Add cpuset v2 partition root state test
Documentation/admin-guide/cgroup-v2.rst | 145 ++--
kernel/cgroup/cpuset.c | 712 +++++++++++-------
tools/testing/selftests/cgroup/Makefile | 5 +-
.../selftests/cgroup/test_cpuset_prs.sh | 674 +++++++++++++++++
tools/testing/selftests/cgroup/wait_inotify.c | 87 +++
5 files changed, 1295 insertions(+), 328 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c
--
2.27.0
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.
---
Changes in V5:
- Fixed comment formating and added Co-developed-by in patch 1.
- Modified selftest to work if swap is enabled or not, and retry
multiple times to wait for background allocation before failing
with a clear message.
Changes in V4:
mm/memcontrol.c:
- Return -EINTR on signal_pending().
- On the final retry, drain percpu lru caches hoping that it might
introduce some evictable pages for reclaim.
- Simplified the retry loop as suggested by Dan Schatzberg.
selftests:
- Always return -errno on failure from cg_write() (whether open() or
write() fail), also update cg_read() and read_text() to return -errno
as well for consistency. Also make sure to correctly check that the
whole buffer was written in cg_write().
- Added a maximum number of retries for the reclaim selftest.
Changes in V3:
- Fix cg_write() (in patch 2) to properly return -1 if open() fails
and not fail if len == errno.
- Remove debug printf() in patch 3.
Changes in V2:
- Add the interface to root as well.
- Added a selftest.
- Documented the interface as a nested-keyed interface, which makes
adding optional arguments in the future easier (see doc updates in the
first patch).
- Modified the commit message to reflect changes and added a timeout
argument as a suggested possible extension
- Return -EAGAIN if the kernel fails to reclaim the full requested
amount.
---
Shakeel Butt (1):
memcg: introduce per-memcg reclaim interface
Yosry Ahmed (3):
selftests: cgroup: return -errno from cg_read()/cg_write() on failure
selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
selftests: cgroup: add a selftest for memory.reclaim
Documentation/admin-guide/cgroup-v2.rst | 21 ++++
mm/memcontrol.c | 45 +++++++
tools/testing/selftests/cgroup/cgroup_util.c | 44 +++----
.../selftests/cgroup/test_memcontrol.c | 114 +++++++++++++++++-
4 files changed, 197 insertions(+), 27 deletions(-)
--
2.36.0.rc2.479.g8af0fa9b8e-goog
The first patch of this series is a documentation fix.
The second patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two next patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The third patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fourth patch adds a selftest, which includes an XDP program and a
userspace control application. The XDP program uses socketless SYN
cookie helpers and queries conntrack status instead of socket status.
The userspace control application allows to tune parameters of the XDP
program. This program also serves as a minimal example of usage of the
new functionality.
The last patch exposes the new helpers to TC BPF.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
v6 changes:
Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced
fclose with pclose and fixed the MSS for IPv6 in the selftest.
v7 changes:
Fixed the off-by-one error in indices, changed the section name to
"xdp", added missing kernel config options to vmtest in CI.
v8 changes:
Properly rebased, dropped the first patch (the same change was applied
by someone else), updated the cover letter.
[1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/
[2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/
[3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP
Maxim Mikityanskiy (5):
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf: Allow helpers to accept pointers with a fixed size
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Add selftests for raw syncookie helpers
bpf: Allow the new syncookie helpers to work with SKBs
include/linux/bpf.h | 10 +
include/net/tcp.h | 1 +
include/uapi/linux/bpf.h | 88 +-
kernel/bpf/verifier.c | 26 +-
net/core/filter.c | 128 +++
net/ipv4/tcp_input.c | 3 +-
scripts/bpf_doc.py | 4 +
tools/include/uapi/linux/bpf.h | 88 +-
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/xdp_synproxy.c | 144 +++
.../selftests/bpf/progs/xdp_synproxy_kern.c | 819 ++++++++++++++++++
tools/testing/selftests/bpf/xdp_synproxy.c | 466 ++++++++++
13 files changed, 1759 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c
--
2.30.2
Dear linux-kselftest
We are interested in having some of your hot selling product in
our stores and outlets spread all over United Kingdom, Northern
Island and Africa. ASDA Stores Limited is one of the highest-
ranking Wholesale & Retail outlets in the United Kingdom.
We shall furnish our detailed company profile in our next
correspondent. However, it would be appreciated if you can send
us your catalog through email to learn more about your company's
products and wholesale quote. It is hopeful that we can start a
viable long-lasting business relationship (partnership) with you.
Your prompt response would be delightfully appreciated.
Best Wishes
Hanes S. Thomas
Procurement Office.
ASDA Stores Limited
Tel: + 44 - 7451271650
WhatsApp: + 44 – 7441440360
Website: www.asda.co.uk
This splits up the get_proc_stat function to make it so we can use it as a
generic helper to read the nth field from multiple different files, versus
replicating the logic in multiple places.
Signed-off-by: Sargun Dhillon <sargun(a)sargun.me>
Cc: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index ab340c4759a3..4fb5eda89223 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4231,32 +4231,54 @@ TEST(user_notification_addfd_rlimit)
close(memfd);
}
-static char get_proc_stat(int pid)
+/*
+ * gen_nth - Get the nth, space separated entry in a file.
+ *
+ * Returns the length of the read field.
+ * Throws error if field is zero-lengthed.
+ */
+static ssize_t get_nth(struct __test_metadata *_metadata, const char *path,
+ const unsigned int position, char **entry)
{
- char proc_path[100] = {0};
char *line = NULL;
- size_t len = 0;
+ unsigned int i;
ssize_t nread;
- char status;
+ size_t len = 0;
FILE *f;
- int i;
- snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
- f = fopen(proc_path, "r");
- if (f == NULL)
- ksft_exit_fail_msg("%s - Could not open %s\n",
- strerror(errno), proc_path);
+ f = fopen(path, "r");
+ ASSERT_NE(f, NULL) {
+ TH_LOG("Coud not open %s: %s", path, strerror(errno));
+ }
- for (i = 0; i < 3; i++) {
+ for (i = 0; i < position; i++) {
nread = getdelim(&line, &len, ' ', f);
- if (nread <= 0)
- ksft_exit_fail_msg("Failed to read status: %s\n",
- strerror(errno));
+ ASSERT_GE(nread, 0) {
+ TH_LOG("Failed to read %d entry in file %s", i, path);
+ }
}
+ fclose(f);
+
+ ASSERT_GT(nread, 0) {
+ TH_LOG("Entry in file %s had zero length", path);
+ }
+
+ *entry = line;
+ return nread - 1;
+}
+
+/* For a given PID, get the task state (D, R, etc...) */
+static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid)
+{
+ char proc_path[100] = {0};
+ char status;
+ char *line;
+
+ snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
+ ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1);
status = *line;
free(line);
- fclose(f);
return status;
}
@@ -4317,7 +4339,7 @@ TEST(user_notification_fifo)
/* This spins until all of the children are sleeping */
restart_wait:
for (i = 0; i < ARRAY_SIZE(pids); i++) {
- if (get_proc_stat(pids[i]) != 'S') {
+ if (get_proc_stat(_metadata, pids[i]) != 'S') {
nanosleep(&delay, NULL);
goto restart_wait;
}
--
2.25.1
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.
Changelog:
v6:
Move documentation into cgroup-v2.rst per Tejun Heo.
Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
Return error on transfer failure per Carlos Llamas.
v5:
Rebase on top of v5.18-rc3
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.
Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.
Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.
Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.
Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.
Support all strings for gpucg_register_bucket instead of just string
literals.
Enforce globally unique gpucg_bucket names.
Constrain gpucg_bucket name lengths to 64 bytes.
Append "-heap" to gpucg_bucket names from dmabuf-heaps.
Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy@kernel.org/
Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size
Tests:
Support both binder_fd_array_object and binder_fd_object. This is
necessary because new versions of Android will use binder_fd_object
instead of binder_fd_array_object, and we need to support both.
Tests for both binder_fd_array_object and binder_fd_object.
For binder_utils return error codes instead of
struct binder{fs}_ctx.
Use ifdef __ANDROID__ to choose platform-dependent temp path instead
of a runtime fallback.
Ensure binderfs_mntpt ends with a trailing '/' character instead of
prepending it where used.
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (3):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
binder: Add flags to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
selftests: Add binder cgroup gpu memory transfer tests
Documentation/admin-guide/cgroup-v2.rst | 24 +
drivers/android/binder.c | 31 +-
drivers/dma-buf/dma-buf.c | 80 ++-
drivers/dma-buf/dma-heap.c | 39 ++
drivers/dma-buf/heaps/system_heap.c | 28 +-
include/linux/cgroup_gpu.h | 137 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 49 +-
include/linux/dma-heap.h | 15 +
include/uapi/linux/android/binder.h | 23 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 386 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 250 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 526 ++++++++++++++++++
18 files changed, 1621 insertions(+), 23 deletions(-)
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.36.0.464.gb9c8b46e94-goog
If a memop fails due to key checked protection, after already having
written to the guest, don't indicate suppression to the guest, as that
would imply that memory wasn't modified.
This could be considered a fix to the code introducing storage key
support, however this is a bug in KVM only if we emulate an
instructions writing to an operand spanning multiple pages, which I
don't believe we do.
v1 -> v2
* Reword commit message of patch 1
Janis Schoetterl-Glausch (2):
KVM: s390: Don't indicate suppression on dirtying, failing memop
KVM: s390: selftest: Test suppression indication on key prot exception
arch/s390/kvm/gaccess.c | 47 ++++++++++++++---------
tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++-
2 files changed, 70 insertions(+), 20 deletions(-)
base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
--
2.32.0
These names sound more general than they are.
The _end() function increments a `static int kunit_suite_counter`, so it
can only safely be called on suites, aka top-level subtests.
It would need to have a separate counter for each level of subtest to be
generic enough.
So rename it to make it clear it's only appropriate for suites.
Signed-off-by: Daniel Latypov <dlatypov(a)google.com>
Reviewed-by: David Gow <davidgow(a)google.com>
---
v1 -> v2: no change (see patch 2 and 4)
---
lib/kunit/test.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/lib/kunit/test.c b/lib/kunit/test.c
index 0f66c13d126e..64ee6a9d8003 100644
--- a/lib/kunit/test.c
+++ b/lib/kunit/test.c
@@ -134,7 +134,7 @@ size_t kunit_suite_num_test_cases(struct kunit_suite *suite)
}
EXPORT_SYMBOL_GPL(kunit_suite_num_test_cases);
-static void kunit_print_subtest_start(struct kunit_suite *suite)
+static void kunit_print_suite_start(struct kunit_suite *suite)
{
kunit_log(KERN_INFO, suite, KUNIT_SUBTEST_INDENT "# Subtest: %s",
suite->name);
@@ -192,7 +192,7 @@ EXPORT_SYMBOL_GPL(kunit_suite_has_succeeded);
static size_t kunit_suite_counter = 1;
-static void kunit_print_subtest_end(struct kunit_suite *suite)
+static void kunit_print_suite_end(struct kunit_suite *suite)
{
kunit_print_ok_not_ok((void *)suite, false,
kunit_suite_has_succeeded(suite),
@@ -498,7 +498,7 @@ int kunit_run_tests(struct kunit_suite *suite)
struct kunit_result_stats suite_stats = { 0 };
struct kunit_result_stats total_stats = { 0 };
- kunit_print_subtest_start(suite);
+ kunit_print_suite_start(suite);
kunit_suite_for_each_test_case(suite, test_case) {
struct kunit test = { .param_value = NULL, .param_index = 0 };
@@ -552,7 +552,7 @@ int kunit_run_tests(struct kunit_suite *suite)
}
kunit_print_suite_stats(suite, suite_stats, total_stats);
- kunit_print_subtest_end(suite);
+ kunit_print_suite_end(suite);
return 0;
}
base-commit: 59729170afcd4900e08997a482467ffda8d88c7f
--
2.36.0.464.gb9c8b46e94-goog
The kvm_binary_stats_test test currently does not have any output (unless
one of the TEST_ASSERT statement fails), so it's hard to say for a user
how far it did proceed already. Thus let's make this a little bit more
user-friendly and include some TAP output via the kselftest.h interface.
Signed-off-by: Thomas Huth <thuth(a)redhat.com>
---
.../testing/selftests/kvm/kvm_binary_stats_test.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/kvm/kvm_binary_stats_test.c b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
index 17f65d514915..aa648834e178 100644
--- a/tools/testing/selftests/kvm/kvm_binary_stats_test.c
+++ b/tools/testing/selftests/kvm/kvm_binary_stats_test.c
@@ -19,6 +19,7 @@
#include "kvm_util.h"
#include "asm/kvm.h"
#include "linux/kvm.h"
+#include "kselftest.h"
static void stats_test(int stats_fd)
{
@@ -52,7 +53,7 @@ static void stats_test(int stats_fd)
/* Sanity check for other fields in header */
if (header->num_desc == 0) {
- printf("No KVM stats defined!");
+ ksft_print_msg("No KVM stats defined!\n");
return;
}
/* Check overlap */
@@ -219,12 +220,15 @@ int main(int argc, char *argv[])
max_vcpu = DEFAULT_NUM_VCPU;
}
+ ksft_print_header();
+
/* Check the extension for binary stats */
if (kvm_check_cap(KVM_CAP_BINARY_STATS_FD) <= 0) {
- print_skip("Binary form statistics interface is not supported");
- exit(KSFT_SKIP);
+ ksft_exit_skip("Binary form statistics interface is not supported\n");
}
+ ksft_set_plan(max_vm);
+
/* Create VMs and VCPUs */
vms = malloc(sizeof(vms[0]) * max_vm);
TEST_ASSERT(vms, "Allocate memory for storing VM pointers");
@@ -240,10 +244,12 @@ int main(int argc, char *argv[])
vm_stats_test(vms[i]);
for (j = 0; j < max_vcpu; ++j)
vcpu_stats_test(vms[i], j);
+ ksft_test_result_pass("vm%i\n", i);
}
for (i = 0; i < max_vm; ++i)
kvm_vm_free(vms[i]);
free(vms);
- return 0;
+
+ ksft_finished(); /* Print results and exit() accordingly */
}
--
2.27.0
This patch series is motivated by Shuah's suggestion here:
https://lore.kernel.org/kvm/d576d8f7-980f-3bc6-87ad-5a6ae45609b8@linuxfound…
Many s390x KVM selftests do not output any information about which
tests have been run, so it's hard to say whether a test binary
contains a certain sub-test or not. To improve this situation let's
add some TAP output via the kselftest.h interface to these tests,
so that it easier to understand what has been executed or not.
v2:
- Reworked the extension checking in the first patch
- Make sure to always print the TAP 13 header in the second patch
- Reworked the SKIP printing in the third patch
Thomas Huth (4):
KVM: s390: selftests: Use TAP interface in the memop test
KVM: s390: selftests: Use TAP interface in the sync_regs test
KVM: s390: selftests: Use TAP interface in the tprot test
KVM: s390: selftests: Use TAP interface in the reset test
tools/testing/selftests/kvm/s390x/memop.c | 90 +++++++++++++++----
tools/testing/selftests/kvm/s390x/resets.c | 38 ++++++--
.../selftests/kvm/s390x/sync_regs_test.c | 87 +++++++++++++-----
tools/testing/selftests/kvm/s390x/tprot.c | 28 ++++--
4 files changed, 192 insertions(+), 51 deletions(-)
--
2.27.0
From: Frank Rowand <frank.rowand(a)sony.com>
An August 2021 RFC patch [1] to create the KTAP Specification resulted in
some discussion of possible items to add to the specification.
The conversation ended without completing the document.
Progress resumed with a December 2021 RFC patch [2] to add a KTAP
Specification file (Version 1) to the Linux kernel. Many of the
suggestions from the August 2021 discussion were not included in
Version 1. This patch series is intended to revisit some of the
suggestions from the August 2021 discussion.
Patch 1 changes the Specification version to "2-rc" to indicate
that following patches are not yet accepted into a final version 2.
Patch 2 is an example of a simple change to the Specification. The
change does not change the content of the Specification, but updates
a formatting directive as suggested by the Documentation maintainer.
I intend to take some specific suggestions from the August 2021
discussion to create stand-alone RFC patches to the Specification
instead of adding them as additional patches in this series. The
intent is to focus discussion on a single area of the Specification
in each patch email thread.
[1] https://lore.kernel.org/r/CA+GJov6tdjvY9x12JsJT14qn6c7NViJxqaJk+r-K1YJzPggF…
[2] https://lore.kernel.org/r/20211207190251.18426-1-davidgow@google.com
Frank Rowand (2):
Documentation: dev-tools: KTAP spec change version to 2-rc
Documentation: dev-tools: use literal block instead of code-block
Documentation/dev-tools/ktap.rst | 20 +++++++++-----------
1 file changed, 9 insertions(+), 11 deletions(-)
--
Frank Rowand <frank.rowand(a)sony.com>
Currently the arm64 selftests don't support building with O=, this
series fixes that, bringing them more into line with how the kselftest
Makefiles want to work.
v3:
- Rebase onto arm64/for-next/core.
v2:
- Rebase onto v5.18-rc3.
Mark Brown (4):
selftests/arm64: Use TEST_GEN_PROGS_EXTENDED in the FP Makefile
selftests/arm64: Define top_srcdir for the fp tests
selftests/arm64: Clean the fp helper libraries
selftests/arm64: Fix O= builds for the floating point tests
tools/testing/selftests/arm64/fp/Makefile | 49 ++++++++++++-----------
1 file changed, 26 insertions(+), 23 deletions(-)
base-commit: 5c346f94d2933ba320af8325cfe77fc58c6e537a
--
2.30.2
The first patch of this series is an improvement to the existing
syncookie BPF helper. The second patch is a documentation fix.
The third patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two last patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The fourth patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fifth patch adds a selftest, which consists of a script, an XDP
program and a userspace control application. The XDP program uses
socketless SYN cookie helpers and queries conntrack status instead of
socket status. The userspace control application allows to tune
parameters of the XDP program. This program also serves as a minimal
example of usage of the new functionality.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
v6 changes:
Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced
fclose with pclose and fixed the MSS for IPv6 in the selftest.
v7 changes:
Fixed the off-by-one error in indices, changed the section name to
"xdp", added missing kernel config options to vmtest in CI.
[1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/
[2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/
[3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP
Maxim Mikityanskiy (6):
bpf: Use ipv6_only_sock in bpf_tcp_gen_syncookie
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf: Allow helpers to accept pointers with a fixed size
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Add selftests for raw syncookie helpers
bpf: Allow the new syncookie helpers to work with SKBs
include/linux/bpf.h | 10 +
include/net/tcp.h | 1 +
include/uapi/linux/bpf.h | 88 +-
kernel/bpf/verifier.c | 26 +-
net/core/filter.c | 130 ++-
net/ipv4/tcp_input.c | 3 +-
scripts/bpf_doc.py | 4 +
tools/include/uapi/linux/bpf.h | 88 +-
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/xdp_synproxy.c | 144 +++
.../selftests/bpf/progs/xdp_synproxy_kern.c | 819 ++++++++++++++++++
tools/testing/selftests/bpf/xdp_synproxy.c | 466 ++++++++++
13 files changed, 1760 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c
--
2.30.2
Dzień dobry,
stworzyliśmy specjalną ofertę dla firm, na kompleksową obsługę inwestycji w fotowoltaikę.
Specjalizujemy się w zakresie doboru, montażu i serwisie instalacji fotowoltaicznych, dysponujemy najnowocześniejszymi rozwiązania, które zapewnią Państwu oczekiwane rezultaty.
Możemy przygotować dla Państwa wstępną kalkulację i przeanalizować efekty możliwe do osiągnięcia.
Czy są Państwo otwarci na wstępną rozmowę w tym temacie?
Pozdrawiam
Arkadiusz Sokołowski
The first patch of this series is an improvement to the existing
syncookie BPF helper. The second patch is a documentation fix.
The third patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two last patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The fourth patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fifth patch adds a selftest, which consists of a script, an XDP
program and a userspace control application. The XDP program uses
socketless SYN cookie helpers and queries conntrack status instead of
socket status. The userspace control application allows to tune
parameters of the XDP program. This program also serves as a minimal
example of usage of the new functionality.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
v6 changes:
Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced
fclose with pclose and fixed the MSS for IPv6 in the selftest.
[1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/
[2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/
[3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP
Maxim Mikityanskiy (6):
bpf: Use ipv6_only_sock in bpf_tcp_gen_syncookie
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf: Allow helpers to accept pointers with a fixed size
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Add selftests for raw syncookie helpers
bpf: Allow the new syncookie helpers to work with SKBs
include/linux/bpf.h | 10 +
include/net/tcp.h | 1 +
include/uapi/linux/bpf.h | 88 +-
kernel/bpf/verifier.c | 26 +-
net/core/filter.c | 130 ++-
net/ipv4/tcp_input.c | 3 +-
scripts/bpf_doc.py | 4 +
tools/include/uapi/linux/bpf.h | 88 +-
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/xdp_synproxy.c | 144 +++
.../selftests/bpf/progs/xdp_synproxy_kern.c | 819 ++++++++++++++++++
tools/testing/selftests/bpf/xdp_synproxy.c | 466 ++++++++++
13 files changed, 1760 insertions(+), 22 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c
--
2.30.2
Currently the arm64 selftests don't support building with O=, this
series fixes that, bringing them more into line with how the kselftest
Makefiles want to work.
v2:
- Rebase onto v5.18-rc3.
Mark Brown (4):
selftests/arm64: Use TEST_GEN_PROGS_EXTENDED in the FP Makefile
selftests/arm64: Define top_srcdir for the fp tests
selftests/arm64: Clean the fp helper libraries
selftests/arm64: Fix O= builds for the floating point tests
tools/testing/selftests/arm64/fp/Makefile | 29 +++++++++++++----------
1 file changed, 17 insertions(+), 12 deletions(-)
base-commit: b2d229d4ddb17db541098b83524d901257e93845
--
2.30.2
The code is copied from the Android Open Source Project and the author(
Maciej ��enczykowski) has gave permission to relicense it under GPLv2.
The test is to change input IPv6 packets to IPv4 ones and output IPv4 to
IPv6 with bpf_skb_change_proto.
Signed-off-by: Maciej ��enczykowski <maze(a)google.com>
Signed-off-by: Lina Wang <lina.wang(a)mediatek.com>
---
tools/testing/selftests/bpf/progs/nat6to4.c | 293 ++++++++++++++++++++
1 file changed, 293 insertions(+)
create mode 100644 tools/testing/selftests/bpf/progs/nat6to4.c
diff --git a/tools/testing/selftests/bpf/progs/nat6to4.c b/tools/testing/selftests/bpf/progs/nat6to4.c
new file mode 100644
index 000000000000..099950f7a6cc
--- /dev/null
+++ b/tools/testing/selftests/bpf/progs/nat6to4.c
@@ -0,0 +1,293 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This code is taken from the Android Open Source Project and the author
+ * (Maciej ��enczykowski) has gave permission to relicense it under the
+ * GPLv2. Therefore this program is free software;
+ * You can redistribute it and/or modify it under the terms of the GNU
+ * General Public License version 2 as published by the Free Software
+ * Foundation
+
+ * The original headers, including the original license headers, are
+ * included below for completeness.
+ *
+ * Copyright (C) 2019 The Android Open Source Project
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+#include <linux/bpf.h>
+#include <linux/if.h>
+#include <linux/if_ether.h>
+#include <linux/if_packet.h>
+#include <linux/in.h>
+#include <linux/in6.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/swab.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+// bionic kernel uapi linux/udp.h header is munged...
+#define __kernel_udphdr udphdr
+#include <linux/udp.h>
+
+#include <bpf/bpf_helpers.h>
+
+#define htons(x) (__builtin_constant_p(x) ? ___constant_swab16(x) : __builtin_bswap16(x))
+#define htonl(x) (__builtin_constant_p(x) ? ___constant_swab32(x) : __builtin_bswap32(x))
+#define ntohs(x) htons(x)
+#define ntohl(x) htonl(x)
+
+// From kernel:include/net/ip.h
+#define IP_DF 0x4000 // Flag: "Don't Fragment"
+
+SEC("schedcls/ingress6/nat_6")
+int sched_cls_ingress6_nat_6_prog(struct __sk_buff *skb)
+{
+
+ const int l2_header_size = sizeof(struct ethhdr);
+ void *data = (void *)(long)skb->data;
+ const void *data_end = (void *)(long)skb->data_end;
+ const struct ethhdr * const eth = data; // used iff is_ethernet
+ const struct ipv6hdr * const ip6 = (void *)(eth + 1);
+
+ // Require ethernet dst mac address to be our unicast address.
+ if (skb->pkt_type != PACKET_HOST)
+ return TC_ACT_OK;
+
+ // Must be meta-ethernet IPv6 frame
+ if (skb->protocol != htons(ETH_P_IPV6))
+ return TC_ACT_OK;
+
+ // Must have (ethernet and) ipv6 header
+ if (data + l2_header_size + sizeof(*ip6) > data_end)
+ return TC_ACT_OK;
+
+ // Ethertype - if present - must be IPv6
+ if (eth->h_proto != htons(ETH_P_IPV6))
+ return TC_ACT_OK;
+
+ // IP version must be 6
+ if (ip6->version != 6)
+ return TC_ACT_OK;
+ // Maximum IPv6 payload length that can be translated to IPv4
+ if (ntohs(ip6->payload_len) > 0xFFFF - sizeof(struct iphdr))
+ return TC_ACT_OK;
+ switch (ip6->nexthdr) {
+ case IPPROTO_TCP: // For TCP & UDP the checksum neutrality of the chosen IPv6
+ case IPPROTO_UDP: // address means there is no need to update their checksums.
+ case IPPROTO_GRE: // We do not need to bother looking at GRE/ESP headers,
+ case IPPROTO_ESP: // since there is never a checksum to update.
+ break;
+ default: // do not know how to handle anything else
+ return TC_ACT_OK;
+ }
+
+ struct ethhdr eth2; // used iff is_ethernet
+
+ eth2 = *eth; // Copy over the ethernet header (src/dst mac)
+ eth2.h_proto = htons(ETH_P_IP); // But replace the ethertype
+
+ struct iphdr ip = {
+ .version = 4, // u4
+ .ihl = sizeof(struct iphdr) / sizeof(__u32), // u4
+ .tos = (ip6->priority << 4) + (ip6->flow_lbl[0] >> 4), // u8
+ .tot_len = htons(ntohs(ip6->payload_len) + sizeof(struct iphdr)), // u16
+ .id = 0, // u16
+ .frag_off = htons(IP_DF), // u16
+ .ttl = ip6->hop_limit, // u8
+ .protocol = ip6->nexthdr, // u8
+ .check = 0, // u16
+ .saddr = 0x0201a8c0, // u32
+ .daddr = 0x0101a8c0, // u32
+ };
+
+ // Calculate the IPv4 one's complement checksum of the IPv4 header.
+ __wsum sum4 = 0;
+
+ for (int i = 0; i < sizeof(ip) / sizeof(__u16); ++i)
+ sum4 += ((__u16 *)&ip)[i];
+
+ // Note that sum4 is guaranteed to be non-zero by virtue of ip.version == 4
+ sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse u32 into range 1 .. 0x1FFFE
+ sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse any potential carry into u16
+ ip.check = (__u16)~sum4; // sum4 cannot be zero, so this is never 0xFFFF
+
+ // Calculate the *negative* IPv6 16-bit one's complement checksum of the IPv6 header.
+ __wsum sum6 = 0;
+ // We'll end up with a non-zero sum due to ip6->version == 6 (which has '0' bits)
+ for (int i = 0; i < sizeof(*ip6) / sizeof(__u16); ++i)
+ sum6 += ~((__u16 *)ip6)[i]; // note the bitwise negation
+
+ // Note that there is no L4 checksum update: we are relying on the checksum neutrality
+ // of the ipv6 address chosen by netd's ClatdController.
+
+ // Packet mutations begin - point of no return, but if this first modification fails
+ // the packet is probably still pristine, so let clatd handle it.
+ if (bpf_skb_change_proto(skb, htons(ETH_P_IP), 0))
+ return TC_ACT_OK;
+ bpf_csum_update(skb, sum6);
+
+ data = (void *)(long)skb->data;
+ data_end = (void *)(long)skb->data_end;
+ if (data + l2_header_size + sizeof(struct iphdr) > data_end)
+ return TC_ACT_SHOT;
+
+ struct ethhdr *new_eth = data;
+
+ // Copy over the updated ethernet header
+ *new_eth = eth2;
+
+ // Copy over the new ipv4 header.
+ *(struct iphdr *)(new_eth + 1) = ip;
+ return bpf_redirect(skb->ifindex, BPF_F_INGRESS);
+}
+SEC("schedcls/egress4/snat4")
+int sched_cls_egress4_snat4_prog(struct __sk_buff *skb)
+{
+ const int l2_header_size = sizeof(struct ethhdr);
+ void *data = (void *)(long)skb->data;
+ const void *data_end = (void *)(long)skb->data_end;
+ const struct ethhdr *const eth = data; // used iff is_ethernet
+ const struct iphdr *const ip4 = (void *)(eth + 1);
+
+
+ // Must be meta-ethernet IPv4 frame
+ if (skb->protocol != htons(ETH_P_IP))
+ return TC_ACT_OK;
+
+ // Must have ipv4 header
+ if (data + l2_header_size + sizeof(struct ipv6hdr) > data_end)
+ return TC_ACT_OK;
+
+ // Ethertype - if present - must be IPv4
+ if (eth->h_proto != htons(ETH_P_IP))
+ return TC_ACT_OK;
+
+ // IP version must be 4
+ if (ip4->version != 4)
+ return TC_ACT_OK;
+
+ // We cannot handle IP options, just standard 20 byte == 5 dword minimal IPv4 header
+ if (ip4->ihl != 5)
+ return TC_ACT_OK;
+
+ // Maximum IPv6 payload length that can be translated to IPv4
+ if (htons(ip4->tot_len) > 0xFFFF - sizeof(struct ipv6hdr))
+ return TC_ACT_OK;
+
+ // Calculate the IPv4 one's complement checksum of the IPv4 header.
+ __wsum sum4 = 0;
+
+ for (int i = 0; i < sizeof(*ip4) / sizeof(__u16); ++i)
+ sum4 += ((__u16 *)ip4)[i];
+
+ // Note that sum4 is guaranteed to be non-zero by virtue of ip4->version == 4
+ sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse u32 into range 1 .. 0x1FFFE
+ sum4 = (sum4 & 0xFFFF) + (sum4 >> 16); // collapse any potential carry into u16
+ // for a correct checksum we should get *a* zero, but sum4 must be positive, ie 0xFFFF
+ if (sum4 != 0xFFFF)
+ return TC_ACT_OK;
+
+ // Minimum IPv4 total length is the size of the header
+ if (ntohs(ip4->tot_len) < sizeof(*ip4))
+ return TC_ACT_OK;
+
+ // We are incapable of dealing with IPv4 fragments
+ if (ip4->frag_off & ~htons(IP_DF))
+ return TC_ACT_OK;
+
+ switch (ip4->protocol) {
+ case IPPROTO_TCP: // For TCP & UDP the checksum neutrality of the chosen IPv6
+ case IPPROTO_GRE: // address means there is no need to update their checksums.
+ case IPPROTO_ESP: // We do not need to bother looking at GRE/ESP headers,
+ break; // since there is never a checksum to update.
+
+ case IPPROTO_UDP: // See above comment, but must also have UDP header...
+ if (data + sizeof(*ip4) + sizeof(struct udphdr) > data_end)
+ return TC_ACT_OK;
+ const struct udphdr *uh = (const struct udphdr *)(ip4 + 1);
+ // If IPv4/UDP checksum is 0 then fallback to clatd so it can calculate the
+ // checksum. Otherwise the network or more likely the NAT64 gateway might
+ // drop the packet because in most cases IPv6/UDP packets with a zero checksum
+ // are invalid. See RFC 6935. TODO: calculate checksum via bpf_csum_diff()
+ if (!uh->check)
+ return TC_ACT_OK;
+ break;
+
+ default: // do not know how to handle anything else
+ return TC_ACT_OK;
+ }
+ struct ethhdr eth2; // used iff is_ethernet
+
+ eth2 = *eth; // Copy over the ethernet header (src/dst mac)
+ eth2.h_proto = htons(ETH_P_IPV6); // But replace the ethertype
+
+ struct ipv6hdr ip6 = {
+ .version = 6, // __u8:4
+ .priority = ip4->tos >> 4, // __u8:4
+ .flow_lbl = {(ip4->tos & 0xF) << 4, 0, 0}, // __u8[3]
+ .payload_len = htons(ntohs(ip4->tot_len) - 20), // __be16
+ .nexthdr = ip4->protocol, // __u8
+ .hop_limit = ip4->ttl, // __u8
+ };
+ ip6.saddr.in6_u.u6_addr32[0] = htonl(0x20010db8);
+ ip6.saddr.in6_u.u6_addr32[1] = 0;
+ ip6.saddr.in6_u.u6_addr32[2] = 0;
+ ip6.saddr.in6_u.u6_addr32[3] = htonl(1);
+ ip6.daddr.in6_u.u6_addr32[0] = htonl(0x20010db8);
+ ip6.daddr.in6_u.u6_addr32[1] = 0;
+ ip6.daddr.in6_u.u6_addr32[2] = 0;
+ ip6.daddr.in6_u.u6_addr32[3] = htonl(2);
+
+ // Calculate the IPv6 16-bit one's complement checksum of the IPv6 header.
+ __wsum sum6 = 0;
+ // We'll end up with a non-zero sum due to ip6.version == 6
+ for (int i = 0; i < sizeof(ip6) / sizeof(__u16); ++i)
+ sum6 += ((__u16 *)&ip6)[i];
+
+ // Packet mutations begin - point of no return, but if this first modification fails
+ // the packet is probably still pristine, so let clatd handle it.
+ if (bpf_skb_change_proto(skb, htons(ETH_P_IPV6), 0))
+ return TC_ACT_OK;
+
+ // This takes care of updating the skb->csum field for a CHECKSUM_COMPLETE packet.
+ // In such a case, skb->csum is a 16-bit one's complement sum of the entire payload,
+ // thus we need to subtract out the ipv4 header's sum, and add in the ipv6 header's sum.
+ // However, we've already verified the ipv4 checksum is correct and thus 0.
+ // Thus we only need to add the ipv6 header's sum.
+ //
+ // bpf_csum_update() always succeeds if the skb is CHECKSUM_COMPLETE and returns an error
+ // (-ENOTSUPP) if it isn't. So we just ignore the return code (see above for more details).
+ bpf_csum_update(skb, sum6);
+
+ // bpf_skb_change_proto() invalidates all pointers - reload them.
+ data = (void *)(long)skb->data;
+ data_end = (void *)(long)skb->data_end;
+
+ // I cannot think of any valid way for this error condition to trigger, however I do
+ // believe the explicit check is required to keep the in kernel ebpf verifier happy.
+ if (data + l2_header_size + sizeof(ip6) > data_end)
+ return TC_ACT_SHOT;
+
+ struct ethhdr *new_eth = data;
+
+ // Copy over the updated ethernet header
+ *new_eth = eth2;
+ // Copy over the new ipv4 header.
+ *(struct ipv6hdr *)(new_eth + 1) = ip6;
+ return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = ("GPL");
+
--
2.18.0