v10:
- Relax constraints for changes made to "cpuset.cpus"
and "cpuset.cpus.partition" as suggested. Now almost all changes
are allowed.
v9:
- Add a new patch 1 to remove the child cpuset restriction on parent's
"cpuset.cpus".
- Relax initial root partition entry limitation to allow cpuset.cpus to
overlap that of parent's.
- An "isolated invalid" displayed type is added to
cpuset.cpus.partition.
- Resetting partition root to "member" will leave child partition root
as invalid.
- Update documentation and test accordingly.
v8:
- Reorganize the patch series and rationalize the features and
constraints of a partition.
- Update patch descriptions and documentation accordingly.
This patchset include the following enhancements to the cpuset v2
partition code.
1) Allow partitions that have no task to have empty effective cpus.
2) Relax the constraints on what changes are allowed in cpuset.cpus
and cpuset.cpus.partition. However, the partition remain invalid
until the constraints of a valid partition root is satisfied.
3) Add a new "isolated" partition type for partitions with no load
balancing which is available in v1 but not yet in v2.
4) Allow the reading of cpuset.cpus.partition to include a reason
string as to why the partition remain invalid.
In addition, the cgroup-v2.rst documentation file is updated and
a self test is added to verify the correctness the partition code.
Waiman Long (8):
cgroup/cpuset: Add top_cpuset check in update_tasks_cpumask()
cgroup/cpuset: Miscellaneous cleanups & add helper functions
cgroup/cpuset: Allow no-task partition to have empty
cpuset.cpus.effective
cgroup/cpuset: Relax constraints to partition & cpus changes
cgroup/cpuset: Add a new isolated cpus.partition type
cgroup/cpuset: Show invalid partition reason string
cgroup/cpuset: Update description of cpuset.cpus.partition in
cgroup-v2.rst
kselftest/cgroup: Add cpuset v2 partition root state test
Documentation/admin-guide/cgroup-v2.rst | 145 ++--
kernel/cgroup/cpuset.c | 712 +++++++++++-------
tools/testing/selftests/cgroup/Makefile | 5 +-
.../selftests/cgroup/test_cpuset_prs.sh | 674 +++++++++++++++++
tools/testing/selftests/cgroup/wait_inotify.c | 87 +++
5 files changed, 1295 insertions(+), 328 deletions(-)
create mode 100755 tools/testing/selftests/cgroup/test_cpuset_prs.sh
create mode 100644 tools/testing/selftests/cgroup/wait_inotify.c
--
2.27.0
This patch series adds a memory.reclaim proactive reclaim interface.
The rationale behind the interface and how it works are in the first
patch.
---
Changes in V5:
- Fixed comment formating and added Co-developed-by in patch 1.
- Modified selftest to work if swap is enabled or not, and retry
multiple times to wait for background allocation before failing
with a clear message.
Changes in V4:
mm/memcontrol.c:
- Return -EINTR on signal_pending().
- On the final retry, drain percpu lru caches hoping that it might
introduce some evictable pages for reclaim.
- Simplified the retry loop as suggested by Dan Schatzberg.
selftests:
- Always return -errno on failure from cg_write() (whether open() or
write() fail), also update cg_read() and read_text() to return -errno
as well for consistency. Also make sure to correctly check that the
whole buffer was written in cg_write().
- Added a maximum number of retries for the reclaim selftest.
Changes in V3:
- Fix cg_write() (in patch 2) to properly return -1 if open() fails
and not fail if len == errno.
- Remove debug printf() in patch 3.
Changes in V2:
- Add the interface to root as well.
- Added a selftest.
- Documented the interface as a nested-keyed interface, which makes
adding optional arguments in the future easier (see doc updates in the
first patch).
- Modified the commit message to reflect changes and added a timeout
argument as a suggested possible extension
- Return -EAGAIN if the kernel fails to reclaim the full requested
amount.
---
Shakeel Butt (1):
memcg: introduce per-memcg reclaim interface
Yosry Ahmed (3):
selftests: cgroup: return -errno from cg_read()/cg_write() on failure
selftests: cgroup: fix alloc_anon_noexit() instantly freeing memory
selftests: cgroup: add a selftest for memory.reclaim
Documentation/admin-guide/cgroup-v2.rst | 21 ++++
mm/memcontrol.c | 45 +++++++
tools/testing/selftests/cgroup/cgroup_util.c | 44 +++----
.../selftests/cgroup/test_memcontrol.c | 114 +++++++++++++++++-
4 files changed, 197 insertions(+), 27 deletions(-)
--
2.36.0.rc2.479.g8af0fa9b8e-goog
The first patch of this series is a documentation fix.
The second patch allows BPF helpers to accept memory regions of fixed
size without doing runtime size checks.
The two next patches add new functionality that allows XDP to
accelerate iptables synproxy.
v1 of this series [1] used to include a patch that exposed conntrack
lookup to BPF using stable helpers. It was superseded by series [2] by
Kumar Kartikeya Dwivedi, which implements this functionality using
unstable helpers.
The third patch adds new helpers to issue and check SYN cookies without
binding to a socket, which is useful in the synproxy scenario.
The fourth patch adds a selftest, which includes an XDP program and a
userspace control application. The XDP program uses socketless SYN
cookie helpers and queries conntrack status instead of socket status.
The userspace control application allows to tune parameters of the XDP
program. This program also serves as a minimal example of usage of the
new functionality.
The last patch exposes the new helpers to TC BPF.
The draft of the new functionality was presented on Netdev 0x15 [3].
v2 changes:
Split into two series, submitted bugfixes to bpf, dropped the conntrack
patches, implemented the timestamp cookie in BPF using bpf_loop, dropped
the timestamp cookie patch.
v3 changes:
Moved some patches from bpf to bpf-next, dropped the patch that changed
error codes, split the new helpers into IPv4/IPv6, added verifier
functionality to accept memory regions of fixed size.
v4 changes:
Converted the selftest to the test_progs runner. Replaced some
deprecated functions in xdp_synproxy userspace helper.
v5 changes:
Fixed a bug in the selftest. Added questionable functionality to support
new helpers in TC BPF, added selftests for it.
v6 changes:
Wrap the new helpers themselves into #ifdef CONFIG_SYN_COOKIES, replaced
fclose with pclose and fixed the MSS for IPv6 in the selftest.
v7 changes:
Fixed the off-by-one error in indices, changed the section name to
"xdp", added missing kernel config options to vmtest in CI.
v8 changes:
Properly rebased, dropped the first patch (the same change was applied
by someone else), updated the cover letter.
[1]: https://lore.kernel.org/bpf/20211020095815.GJ28644@breakpoint.cc/t/
[2]: https://lore.kernel.org/bpf/20220114163953.1455836-1-memxor@gmail.com/
[3]: https://netdevconf.info/0x15/session.html?Accelerating-synproxy-with-XDP
Maxim Mikityanskiy (5):
bpf: Fix documentation of th_len in bpf_tcp_{gen,check}_syncookie
bpf: Allow helpers to accept pointers with a fixed size
bpf: Add helpers to issue and check SYN cookies in XDP
bpf: Add selftests for raw syncookie helpers
bpf: Allow the new syncookie helpers to work with SKBs
include/linux/bpf.h | 10 +
include/net/tcp.h | 1 +
include/uapi/linux/bpf.h | 88 +-
kernel/bpf/verifier.c | 26 +-
net/core/filter.c | 128 +++
net/ipv4/tcp_input.c | 3 +-
scripts/bpf_doc.py | 4 +
tools/include/uapi/linux/bpf.h | 88 +-
tools/testing/selftests/bpf/.gitignore | 1 +
tools/testing/selftests/bpf/Makefile | 2 +-
.../selftests/bpf/prog_tests/xdp_synproxy.c | 144 +++
.../selftests/bpf/progs/xdp_synproxy_kern.c | 819 ++++++++++++++++++
tools/testing/selftests/bpf/xdp_synproxy.c | 466 ++++++++++
13 files changed, 1759 insertions(+), 21 deletions(-)
create mode 100644 tools/testing/selftests/bpf/prog_tests/xdp_synproxy.c
create mode 100644 tools/testing/selftests/bpf/progs/xdp_synproxy_kern.c
create mode 100644 tools/testing/selftests/bpf/xdp_synproxy.c
--
2.30.2
Dear linux-kselftest
We are interested in having some of your hot selling product in
our stores and outlets spread all over United Kingdom, Northern
Island and Africa. ASDA Stores Limited is one of the highest-
ranking Wholesale & Retail outlets in the United Kingdom.
We shall furnish our detailed company profile in our next
correspondent. However, it would be appreciated if you can send
us your catalog through email to learn more about your company's
products and wholesale quote. It is hopeful that we can start a
viable long-lasting business relationship (partnership) with you.
Your prompt response would be delightfully appreciated.
Best Wishes
Hanes S. Thomas
Procurement Office.
ASDA Stores Limited
Tel: + 44 - 7451271650
WhatsApp: + 44 – 7441440360
Website: www.asda.co.uk
This splits up the get_proc_stat function to make it so we can use it as a
generic helper to read the nth field from multiple different files, versus
replicating the logic in multiple places.
Signed-off-by: Sargun Dhillon <sargun(a)sargun.me>
Cc: linux-kselftest(a)vger.kernel.org
---
tools/testing/selftests/seccomp/seccomp_bpf.c | 54 +++++++++++++------
1 file changed, 38 insertions(+), 16 deletions(-)
diff --git a/tools/testing/selftests/seccomp/seccomp_bpf.c b/tools/testing/selftests/seccomp/seccomp_bpf.c
index ab340c4759a3..4fb5eda89223 100644
--- a/tools/testing/selftests/seccomp/seccomp_bpf.c
+++ b/tools/testing/selftests/seccomp/seccomp_bpf.c
@@ -4231,32 +4231,54 @@ TEST(user_notification_addfd_rlimit)
close(memfd);
}
-static char get_proc_stat(int pid)
+/*
+ * gen_nth - Get the nth, space separated entry in a file.
+ *
+ * Returns the length of the read field.
+ * Throws error if field is zero-lengthed.
+ */
+static ssize_t get_nth(struct __test_metadata *_metadata, const char *path,
+ const unsigned int position, char **entry)
{
- char proc_path[100] = {0};
char *line = NULL;
- size_t len = 0;
+ unsigned int i;
ssize_t nread;
- char status;
+ size_t len = 0;
FILE *f;
- int i;
- snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
- f = fopen(proc_path, "r");
- if (f == NULL)
- ksft_exit_fail_msg("%s - Could not open %s\n",
- strerror(errno), proc_path);
+ f = fopen(path, "r");
+ ASSERT_NE(f, NULL) {
+ TH_LOG("Coud not open %s: %s", path, strerror(errno));
+ }
- for (i = 0; i < 3; i++) {
+ for (i = 0; i < position; i++) {
nread = getdelim(&line, &len, ' ', f);
- if (nread <= 0)
- ksft_exit_fail_msg("Failed to read status: %s\n",
- strerror(errno));
+ ASSERT_GE(nread, 0) {
+ TH_LOG("Failed to read %d entry in file %s", i, path);
+ }
}
+ fclose(f);
+
+ ASSERT_GT(nread, 0) {
+ TH_LOG("Entry in file %s had zero length", path);
+ }
+
+ *entry = line;
+ return nread - 1;
+}
+
+/* For a given PID, get the task state (D, R, etc...) */
+static char get_proc_stat(struct __test_metadata *_metadata, pid_t pid)
+{
+ char proc_path[100] = {0};
+ char status;
+ char *line;
+
+ snprintf(proc_path, sizeof(proc_path), "/proc/%d/stat", pid);
+ ASSERT_EQ(get_nth(_metadata, proc_path, 3, &line), 1);
status = *line;
free(line);
- fclose(f);
return status;
}
@@ -4317,7 +4339,7 @@ TEST(user_notification_fifo)
/* This spins until all of the children are sleeping */
restart_wait:
for (i = 0; i < ARRAY_SIZE(pids); i++) {
- if (get_proc_stat(pids[i]) != 'S') {
+ if (get_proc_stat(_metadata, pids[i]) != 'S') {
nanosleep(&delay, NULL);
goto restart_wait;
}
--
2.25.1
This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcements.
Changelog:
v6:
Move documentation into cgroup-v2.rst per Tejun Heo.
Rename BINDER_FD{A}_FLAG_SENDER_NO_NEED ->
BINDER_FD{A}_FLAG_XFER_CHARGE per Carlos Llamas.
Return error on transfer failure per Carlos Llamas.
v5:
Rebase on top of v5.18-rc3
Drop the global GPU cgroup "total" (sum of all device totals) portion
of the design since there is no currently known use for this per
Tejun Heo.
Fix commit message which still contained the old name for
dma_buf_transfer_charge per Michal Koutný.
Remove all GPU cgroup code except what's necessary to support charge transfer
from dma_buf. Previously charging was done in export, but for non-Android
graphics use-cases this is not ideal since there may be a delay between
allocation and export, during which time there is no accounting.
Merge dmabuf: Use the GPU cgroup charge/uncharge APIs patch into
dmabuf: heaps: export system_heap buffers with GPU cgroup charging as a
result of above.
Put the charge and uncharge code in the same file (system_heap_allocate,
system_heap_dma_buf_release) instead of splitting them between the heap and
the dma_buf_release. This avoids asymmetric management of the gpucg charges.
Modify the dma_buf_transfer_charge API to accept a task_struct instead
of a gpucg. This avoids requiring the caller to manage the refcount
of the gpucg upon failure and confusing ownership transfer logic.
Support all strings for gpucg_register_bucket instead of just string
literals.
Enforce globally unique gpucg_bucket names.
Constrain gpucg_bucket name lengths to 64 bytes.
Append "-heap" to gpucg_bucket names from dmabuf-heaps.
Drop patch 7 from the series, which changed the types of
binder_transaction_data's sender_pid and sender_euid fields. This was
done in another commit here:
https://lore.kernel.org/all/20220210021129.3386083-4-masahiroy@kernel.org/
Rename:
gpucg_try_charge -> gpucg_charge
find_cg_rpool_locked -> cg_rpool_find_locked
init_cg_rpool -> cg_rpool_init
get_cg_rpool_locked -> cg_rpool_get_locked
"gpu cgroup controller" -> "GPU controller"
gpucg_device -> gpucg_bucket
usage -> size
Tests:
Support both binder_fd_array_object and binder_fd_object. This is
necessary because new versions of Android will use binder_fd_object
instead of binder_fd_array_object, and we need to support both.
Tests for both binder_fd_array_object and binder_fd_object.
For binder_utils return error codes instead of
struct binder{fs}_ctx.
Use ifdef __ANDROID__ to choose platform-dependent temp path instead
of a runtime fallback.
Ensure binderfs_mntpt ends with a trailing '/' character instead of
prepending it where used.
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and all related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (3):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
binder: Add flags to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
selftests: Add binder cgroup gpu memory transfer tests
Documentation/admin-guide/cgroup-v2.rst | 24 +
drivers/android/binder.c | 31 +-
drivers/dma-buf/dma-buf.c | 80 ++-
drivers/dma-buf/dma-heap.c | 39 ++
drivers/dma-buf/heaps/system_heap.c | 28 +-
include/linux/cgroup_gpu.h | 137 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 49 +-
include/linux/dma-heap.h | 15 +
include/uapi/linux/android/binder.h | 23 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 386 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 250 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 526 ++++++++++++++++++
18 files changed, 1621 insertions(+), 23 deletions(-)
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.36.0.464.gb9c8b46e94-goog
If a memop fails due to key checked protection, after already having
written to the guest, don't indicate suppression to the guest, as that
would imply that memory wasn't modified.
This could be considered a fix to the code introducing storage key
support, however this is a bug in KVM only if we emulate an
instructions writing to an operand spanning multiple pages, which I
don't believe we do.
v1 -> v2
* Reword commit message of patch 1
Janis Schoetterl-Glausch (2):
KVM: s390: Don't indicate suppression on dirtying, failing memop
KVM: s390: selftest: Test suppression indication on key prot exception
arch/s390/kvm/gaccess.c | 47 ++++++++++++++---------
tools/testing/selftests/kvm/s390x/memop.c | 43 ++++++++++++++++++++-
2 files changed, 70 insertions(+), 20 deletions(-)
base-commit: af2d861d4cd2a4da5137f795ee3509e6f944a25b
--
2.32.0