This patch series revisits the proposal for a GPU cgroup controller to
track and limit memory allocations by various device/allocator
subsystems. The patch series also contains a simple prototype to
illustrate how Android intends to implement DMA-BUF allocator
attribution using the GPU cgroup controller. The prototype does not
include resource limit enforcement.
Changelog:
v4:
Skip test if not run as root per Shuah Khan
Add better test logging for abnormal child termination per Shuah Khan
Adjust ordering of charge/uncharge during transfer to avoid potentially
hitting cgroup limit per Michal Koutný
Adjust gpucg_try_charge critical section for charge transfer functionality
Fix uninitialized return code error for dmabuf_try_charge error case
v3:
Remove Upstreaming Plan from gpu-cgroup.rst per John Stultz
Use more common dual author commit message format per John Stultz
Remove android from binder changes title per Todd Kjos
Add a kselftest for this new behavior per Greg Kroah-Hartman
Include details on behavior for all combinations of kernel/userspace
versions in changelog (thanks Suren Baghdasaryan) per Greg Kroah-Hartman.
Fix pid and uid types in binder UAPI header
v2:
See the previous revision of this change submitted by Hridya Valsaraju
at: https://lore.kernel.org/all/20220115010622.3185921-1-hridya@google.com/
Move dma-buf cgroup charge transfer from a dma_buf_op defined by every
heap to a single dma-buf function for all heaps per Daniel Vetter and
Christian König. Pointers to struct gpucg and struct gpucg_device
tracking the current associations were added to the dma_buf struct to
achieve this.
Fix incorrect Kconfig help section indentation per Randy Dunlap.
History of the GPU cgroup controller
====================================
The GPU/DRM cgroup controller came into being when a consensus[1]
was reached that the resources it tracked were unsuitable to be integrated
into memcg. Originally, the proposed controller was specific to the DRM
subsystem and was intended to track GEM buffers and GPU-specific
resources[2]. In order to help establish a unified memory accounting model
for all GPU and related subsystems, Daniel Vetter put forth a
suggestion to move it out of the DRM subsystem so that it can be used by
other DMA-BUF exporters as well[3]. This RFC proposes an interface that
does the same.
[1]: https://patchwork.kernel.org/project/dri-devel/cover/20190501140438.9506-1-…
[2]: https://lore.kernel.org/amd-gfx/20210126214626.16260-1-brian.welty@intel.co…
[3]: https://lore.kernel.org/amd-gfx/YCVOl8%2F87bqRSQei@phenom.ffwll.local/
Hridya Valsaraju (5):
gpu: rfc: Proposal for a GPU cgroup controller
cgroup: gpu: Add a cgroup controller for allocator attribution of GPU
memory
dmabuf: heaps: export system_heap buffers with GPU cgroup charging
dmabuf: Add gpu cgroup charge transfer function
binder: Add a buffer flag to relinquish ownership of fds
T.J. Mercier (3):
dmabuf: Use the GPU cgroup charge/uncharge APIs
binder: use __kernel_pid_t and __kernel_uid_t for userspace
selftests: Add binder cgroup gpu memory transfer test
Documentation/gpu/rfc/gpu-cgroup.rst | 183 +++++++
Documentation/gpu/rfc/index.rst | 4 +
drivers/android/binder.c | 26 +
drivers/dma-buf/dma-buf.c | 107 ++++
drivers/dma-buf/dma-heap.c | 27 +
drivers/dma-buf/heaps/system_heap.c | 3 +
include/linux/cgroup_gpu.h | 139 +++++
include/linux/cgroup_subsys.h | 4 +
include/linux/dma-buf.h | 22 +-
include/linux/dma-heap.h | 11 +
include/uapi/linux/android/binder.h | 5 +-
init/Kconfig | 7 +
kernel/cgroup/Makefile | 1 +
kernel/cgroup/gpu.c | 362 +++++++++++++
.../selftests/drivers/android/binder/Makefile | 8 +
.../drivers/android/binder/binder_util.c | 254 +++++++++
.../drivers/android/binder/binder_util.h | 32 ++
.../selftests/drivers/android/binder/config | 4 +
.../binder/test_dmabuf_cgroup_transfer.c | 484 ++++++++++++++++++
19 files changed, 1679 insertions(+), 4 deletions(-)
create mode 100644 Documentation/gpu/rfc/gpu-cgroup.rst
create mode 100644 include/linux/cgroup_gpu.h
create mode 100644 kernel/cgroup/gpu.c
create mode 100644 tools/testing/selftests/drivers/android/binder/Makefile
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.c
create mode 100644 tools/testing/selftests/drivers/android/binder/binder_util.h
create mode 100644 tools/testing/selftests/drivers/android/binder/config
create mode 100644 tools/testing/selftests/drivers/android/binder/test_dmabuf_cgroup_transfer.c
--
2.35.1.1021.g381101b075-goog
On Fri, 2022-03-11 at 17:24 +0100, Vincent Whitchurch wrote:
> Import the libvhost-user from QEMU for use in the implementation of the
> virtio devices in the roadtest backend.
>
So hm, I wonder if this is the sensible thing to do?
Not that I mind importing qemu code, but:
1) the implementation is rather complex in some places, and has support
for a LOT of virtio/vhost-user features that are really not needed
in these cases, for performance etc. It's also close to 4k LOC.
2) the implementation doesn't support time-travel mode which might come
in handy
We have another implementation that might be simpler:
https://github.com/linux-test-project/usfstl/blob/main/src/vhost.c
It probably has dependencies on other things in this library, but
vhost.c itself is only ~1k LOC. (I need to update it though, I'm sure we
have some unpublished bugfixes etc. in this code)
johannes
Hi,
This is a followup of my v1 at [0] and v2 at [1].
The short summary of the previous cover letter and discussions is that
HID could benefit from BPF for the following use cases:
- simple fixup of report descriptor:
benefits are faster development time and testing, with the produced
bpf program being shipped in the kernel directly (the shipping part
is *not* addressed here).
- Universal Stylus Interface:
allows a user-space program to define its own kernel interface
- Surface Dial:
somewhat similar to the previous one except that userspace can decide
to change the shape of the exported device
- firewall:
still partly missing there, there is not yet interception of hidraw
calls, but it's coming in a followup series, I promise
- tracing:
well, tracing.
I think I addressed the comments from the previous version, but there are
a few things I'd like to note here:
- I did not take the various rev-by and tested-by (thanks a lot for those)
because the uapi changed significantly in v3, so I am not very confident
in taking those rev-by blindly
- I mentioned in my discussion with Song that I'll put a summary of the uapi
in the cover letter, but I ended up adding a (long) file in the Documentation
directory. So please maybe start by reading 17/17 to have an overview of
what I want to achieve
- I added in the libbpf and bpf the new type BPF_HID_DRIVER_EVENT, even though
I don't have a user of it right now in the kernel. I wanted to have them in
the docs, but we might not want to have them ready here.
In terms of code, it just means that we can attach such program types
but that they will never get triggered.
Anyway, I have been mulling over this for the past 2 weeks, and I think that
maybe sharing this now is better than me just staring at the code over and
over.
Short summary of changes:
v3:
===
- squashed back together most of the libbpf and bpf changes into bigger
commits that give a better overview of the whole interactions
- reworked the user API to not expose .data as a directly accessible field
from the context, but instead forces everyone to use hid_bpf_get_data (or
get/set_bits)
- added BPF_HID_DRIVER_EVENT (see note above)
- addressed the various nitpicks from v2
- added a big Documentation file (and so adding now the doc maintainers to the
long list of recipients)
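To make the bit-level access behind get/set_bits concrete, here is a minimal userspace model of reading a bit field out of a report buffer. It is only a sketch and not the kernel helper: the name model_get_bits(), its signature and its error convention are made up for illustration, and it assumes the usual HID convention of little-endian bit packing (bit 0 is the least significant bit of byte 0).

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model, not the hid_bpf helper: extract n_bits (1..32)
 * starting at bit position `offset` from a little-endian packed buffer.
 * Returns 0 on success, -1 if the request does not fit in the buffer.
 */
static int model_get_bits(const uint8_t *buf, size_t buf_len,
			  size_t offset, unsigned int n_bits, uint32_t *out)
{
	uint32_t value = 0;
	unsigned int i;
	size_t total_bits;

	if (n_bits == 0 || n_bits > 32)
		return -1;
	if (buf_len > SIZE_MAX / 8)
		return -1;
	total_bits = buf_len * 8;
	if (offset > total_bits || n_bits > total_bits - offset)
		return -1;

	/* Walk the field bit by bit; bit i of the result comes from
	 * absolute bit (offset + i) of the buffer.
	 */
	for (i = 0; i < n_bits; i++) {
		size_t bit = offset + i;

		if (buf[bit / 8] & (1u << (bit % 8)))
			value |= (uint32_t)1 << i;
	}

	*out = value;
	return 0;
}
```

With a two-byte buffer { 0x34, 0x12 }, reading 16 bits at offset 0 gives 0x1234, and reading 8 bits at offset 4 gives 0x23, since the field straddles the byte boundary.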
v2:
===
- split the series by subsystem (bpf, HID, libbpf, selftests and
samples)
- Added an extra patch at the beginning to not require CAP_NET_ADMIN for
BPF_PROG_TYPE_LIRC_MODE2 (please shout if this is wrong)
- made the bpf context attached to HID program of dynamic size:
* the first 1 kB will be able to be addressed directly
* the rest can be retrieved through bpf_hid_{set|get}_data
(note that I am definitely not happy with that API, because part of
it is in bits and the rest in bytes. ouch)
- added an extra patch to prevent non-GPL bpf programs of type
BPF_PROG_TYPE_HID from being loaded
* same here, not really happy but I don't know where to put that check
in verifier.c
- added a new flag BPF_F_INSERT_HEAD for BPF_LINK_CREATE syscall when
used with HID program types.
* this flag is used for tracing, to be able to load a program before
any others that might already have been inserted and that might
change the data stream.
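The ordering effect of BPF_F_INSERT_HEAD can be pictured with a toy model of the per-device program list. This is not the code from the series: struct model_prog_list, model_attach() and MODEL_F_INSERT_HEAD are hypothetical stand-ins, and the fixed-size array is an assumption made to keep the sketch short. The only point is that a program attached with the flag runs before those already attached, so a tracer sees the data stream before anything has modified it.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_PROGS		8
#define MODEL_F_INSERT_HEAD	0x1	/* hypothetical stand-in for BPF_F_INSERT_HEAD */

/* Toy model: programs run in array order, ids[0] first. */
struct model_prog_list {
	int ids[MODEL_MAX_PROGS];
	size_t count;
};

/* Attach a program id; returns 0 on success, -1 if the list is full. */
static int model_attach(struct model_prog_list *l, int id, unsigned int flags)
{
	if (l->count >= MODEL_MAX_PROGS)
		return -1;

	if (flags & MODEL_F_INSERT_HEAD) {
		/* Shift existing entries up by one and take slot 0. */
		memmove(&l->ids[1], &l->ids[0], l->count * sizeof(l->ids[0]));
		l->ids[0] = id;
	} else {
		/* Default: append, so the program runs after the others. */
		l->ids[l->count] = id;
	}
	l->count++;
	return 0;
}
```

Attaching ids 10 and 20 without the flag and then 99 with it leaves the list as 99, 10, 20.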
Cheers,
Benjamin
[0] https://lore.kernel.org/linux-input/20220224110828.2168231-1-benjamin.tisso…
[1] https://lore.kernel.org/linux-input/20220304172852.274126-1-benjamin.tissoi…
Benjamin Tissoires (17):
bpf: add new is_sys_admin_prog_type() helper
bpf: introduce hid program type
bpf/verifier: prevent non GPL programs to be loaded against HID
libbpf: add HID program type and API
HID: hook up with bpf
HID: allow to change the report descriptor from an eBPF program
selftests/bpf: add tests for the HID-bpf initial implementation
selftests/bpf: add report descriptor fixup tests
selftests/bpf: Add a test for BPF_F_INSERT_HEAD
selftests/bpf: add test for user call of HID bpf programs
samples/bpf: add new hid_mouse example
bpf/hid: add more HID helpers
HID: bpf: implement hid_bpf_get|set_bits
HID: add implementation of bpf_hid_raw_request
selftests/bpf: add tests for hid_{get|set}_bits helpers
selftests/bpf: add tests for bpf_hid_hw_request
Documentation: add HID-BPF docs
Documentation/hid/hid-bpf.rst | 444 +++++++++++
Documentation/hid/index.rst | 1 +
drivers/hid/Makefile | 1 +
drivers/hid/hid-bpf.c | 328 ++++++++
drivers/hid/hid-core.c | 34 +-
include/linux/bpf-hid.h | 127 +++
include/linux/bpf_types.h | 4 +
include/linux/hid.h | 36 +-
include/uapi/linux/bpf.h | 67 ++
include/uapi/linux/bpf_hid.h | 71 ++
include/uapi/linux/hid.h | 10 +
kernel/bpf/Makefile | 3 +
kernel/bpf/btf.c | 1 +
kernel/bpf/hid.c | 728 +++++++++++++++++
kernel/bpf/syscall.c | 27 +-
kernel/bpf/verifier.c | 7 +
samples/bpf/.gitignore | 1 +
samples/bpf/Makefile | 4 +
samples/bpf/hid_mouse_kern.c | 117 +++
samples/bpf/hid_mouse_user.c | 129 +++
tools/include/uapi/linux/bpf.h | 67 ++
tools/lib/bpf/libbpf.c | 23 +-
tools/lib/bpf/libbpf.h | 2 +
tools/lib/bpf/libbpf.map | 1 +
tools/testing/selftests/bpf/config | 3 +
tools/testing/selftests/bpf/prog_tests/hid.c | 788 +++++++++++++++++++
tools/testing/selftests/bpf/progs/hid.c | 205 +++++
27 files changed, 3204 insertions(+), 25 deletions(-)
create mode 100644 Documentation/hid/hid-bpf.rst
create mode 100644 drivers/hid/hid-bpf.c
create mode 100644 include/linux/bpf-hid.h
create mode 100644 include/uapi/linux/bpf_hid.h
create mode 100644 kernel/bpf/hid.c
create mode 100644 samples/bpf/hid_mouse_kern.c
create mode 100644 samples/bpf/hid_mouse_user.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/hid.c
create mode 100644 tools/testing/selftests/bpf/progs/hid.c
--
2.35.1
Before Linux 5.17, there was a problem with the CMOS RTC driver:
cmos_read_alarm() and cmos_set_alarm() did not check for the UIP (Update
in progress) bit, which could have caused them to sometimes fail silently
and read bogus values or not set the alarm correctly.
Luckily, this issue was masked by cmos_read_time() invocations in core
RTC code - see https://marc.info/?l=linux-rtc&m=164858416511425&w=4
To avoid such a problem in the future in some other driver, I wrote a
test unit that reads the alarm time many times in a row. As the alarm
time is usually read once and cached by the RTC core, this requires a
way for userspace to trigger a direct alarm time read from hardware. I
think that debugfs is the natural choice for this.
So, introduce /sys/kernel/debug/rtc/rtcX/wakealarm_raw. This interface
as implemented here does not seem to be that useful to userspace, so
there is little risk that it will become kernel ABI.
Is this approach correct and worth it?
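For illustration, a userspace consistency check on top of wakealarm_raw could look roughly like the sketch below. This is not the rtctest.c code; the helper names read_alarm_raw() and alarm_reads_consistent() and their return conventions are made up here. The sketch only assumes what the patch provides: a file that yields one signed decimal value per read, with negative values for a disabled or not fully defined alarm.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: read one value from a wakealarm_raw-style file.
 * The file is reopened for every read so that each call triggers a
 * fresh read on the kernel side. Returns 0 on success, -1 on failure.
 */
static int read_alarm_raw(const char *path, long long *out)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%lld", out) == 1);
	fclose(f);
	return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read the file `iterations` times in a row.
 * Returns 1 if every read gave the same value, 0 if any read differed,
 * -1 if a read failed.
 */
static int alarm_reads_consistent(const char *path, size_t iterations)
{
	long long first, cur;
	size_t i;

	if (read_alarm_raw(path, &first))
		return -1;

	for (i = 1; i < iterations; i++) {
		if (read_alarm_raw(path, &cur))
			return -1;
		if (cur != first)
			return 0;
	}
	return 1;
}
```

A test would call alarm_reads_consistent("/sys/kernel/debug/rtc/rtc0/wakealarm_raw", N) with a large N after setting an alarm, and treat a return value of 0 as a driver bug.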
TODO:
- should I add a new Kconfig option (like CONFIG_RTC_INTF_DEBUGFS), or
just use CONFIG_DEBUG_FS here? I wouldn't like to create unnecessary
config options in the kernel.
Signed-off-by: Mateusz Jończyk <mat.jonczyk(a)o2.pl>
Cc: Alessandro Zummo <a.zummo(a)towertech.it>
Cc: Alexandre Belloni <alexandre.belloni(a)bootlin.com>
Cc: Shuah Khan <shuah(a)kernel.org>
---
drivers/rtc/Makefile | 1 +
drivers/rtc/class.c | 3 ++
drivers/rtc/debugfs.c | 112 ++++++++++++++++++++++++++++++++++++++++
drivers/rtc/interface.c | 3 +-
include/linux/rtc.h | 16 ++++++
5 files changed, 133 insertions(+), 2 deletions(-)
create mode 100644 drivers/rtc/debugfs.c
diff --git a/drivers/rtc/Makefile b/drivers/rtc/Makefile
index 678a8ef4abae..50e166a97f54 100644
--- a/drivers/rtc/Makefile
+++ b/drivers/rtc/Makefile
@@ -14,6 +14,7 @@ rtc-core-$(CONFIG_RTC_NVMEM) += nvmem.o
rtc-core-$(CONFIG_RTC_INTF_DEV) += dev.o
rtc-core-$(CONFIG_RTC_INTF_PROC) += proc.o
rtc-core-$(CONFIG_RTC_INTF_SYSFS) += sysfs.o
+rtc-core-$(CONFIG_DEBUG_FS) += debugfs.o
obj-$(CONFIG_RTC_LIB_KUNIT_TEST) += lib_test.o
diff --git a/drivers/rtc/class.c b/drivers/rtc/class.c
index 4b460c61f1d8..5673b7b26c0d 100644
--- a/drivers/rtc/class.c
+++ b/drivers/rtc/class.c
@@ -334,6 +334,7 @@ static void devm_rtc_unregister_device(void *data)
* Remove innards of this RTC, then disable it, before
* letting any rtc_class_open() users access it again
*/
+ rtc_debugfs_del_device(rtc);
rtc_proc_del_device(rtc);
if (!test_bit(RTC_NO_CDEV, &rtc->flags))
cdev_device_del(&rtc->char_dev, &rtc->dev);
@@ -417,6 +418,7 @@ int __devm_rtc_register_device(struct module *owner, struct rtc_device *rtc)
}
rtc_proc_add_device(rtc);
+ rtc_debugfs_add_device(rtc);
dev_info(rtc->dev.parent, "registered as %s\n",
dev_name(&rtc->dev));
@@ -476,6 +478,7 @@ static int __init rtc_init(void)
}
rtc_class->pm = RTC_CLASS_DEV_PM_OPS;
rtc_dev_init();
+ rtc_debugfs_init();
return 0;
}
subsys_initcall(rtc_init);
diff --git a/drivers/rtc/debugfs.c b/drivers/rtc/debugfs.c
new file mode 100644
index 000000000000..5ceed5504033
--- /dev/null
+++ b/drivers/rtc/debugfs.c
@@ -0,0 +1,112 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+/*
+ * Debugfs interface for testing RTC alarms.
+ */
+#include <linux/debugfs.h>
+#include <linux/err.h>
+#include <linux/rtc.h>
+
+static struct dentry *rtc_main_debugfs_dir;
+
+void rtc_debugfs_init(void)
+{
+ struct dentry *ret = debugfs_create_dir("rtc", NULL);
+
+ // No error is critical here
+ if (!IS_ERR(ret))
+ rtc_main_debugfs_dir = ret;
+}
+
+/*
+ * Handler for /sys/kernel/debug/rtc/rtcX/wakealarm_raw .
+ * This function reads the RTC alarm time directly from hardware. If the RTC
+ * alarm is enabled, this function returns the alarm time modulo 24h in seconds
+ * since midnight.
+ *
+ * Should be only used for testing of the RTC alarm read functionality in
+ * drivers - to make sure that the driver returns consistent values.
+ *
+ * Used in tools/testing/selftests/rtc/rtctest.c .
+ */
+static int rtc_debugfs_alarm_read(void *p, u64 *out)
+{
+ int ret;
+ struct rtc_device *rtc = p;
+ struct rtc_wkalrm alm;
+
+ /* Using rtc_read_alarm_internal() instead of __rtc_read_alarm() will
+ * allow us to avoid any interaction with rtc_read_time() and possibly
+ * see more issues.
+ */
+ ret = rtc_read_alarm_internal(rtc, &alm);
+ if (ret != 0)
+ return ret;
+
+ if (!alm.enabled) {
+ *out = -1;
+ return 0;
+ }
+
+ /* It does not matter if the device does not support seconds resolution
+ * of the RTC alarm.
+ */
+ if (test_bit(RTC_FEATURE_ALARM_RES_MINUTE, rtc->features))
+ alm.time.tm_sec = 0;
+
+ /* The selftest code works with fully defined alarms only.
+ */
+ if (alm.time.tm_sec == -1 || alm.time.tm_min == -1 || alm.time.tm_hour == -1) {
+ *out = -2;
+ return 0;
+ }
+
+ /* Check if the alarm time is correct.
+ * rtc_valid_tm() does not allow fields containing "-1", so put in
+ * something to satisfy it.
+ */
+ if (alm.time.tm_year == -1)
+ alm.time.tm_year = 100;
+ if (alm.time.tm_mon == -1)
+ alm.time.tm_mon = 0;
+ if (alm.time.tm_mday == -1)
+ alm.time.tm_mday = 1;
+ if (rtc_valid_tm(&alm.time))
+ return -EINVAL;
+
+ /* We do not duplicate the logic in __rtc_read_alarm() and instead only
+ * return the alarm time modulo 24h, which all devices should support.
+ * This should be enough for testing purposes.
+ */
+ *out = alm.time.tm_hour * 3600 + alm.time.tm_min * 60 + alm.time.tm_sec;
+
+ return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(rtc_alarm_raw, rtc_debugfs_alarm_read, NULL, "%lld\n");
+
+void rtc_debugfs_add_device(struct rtc_device *rtc)
+{
+ struct dentry *dev_dir;
+
+ if (!rtc_main_debugfs_dir)
+ return;
+
+ dev_dir = debugfs_create_dir(dev_name(&rtc->dev), rtc_main_debugfs_dir);
+
+ if (IS_ERR(dev_dir)) {
+ rtc->debugfs_dir = NULL;
+ return;
+ }
+ rtc->debugfs_dir = dev_dir;
+
+ if (test_bit(RTC_FEATURE_ALARM, rtc->features) && rtc->ops->read_alarm) {
+ debugfs_create_file("wakealarm_raw", 0444, dev_dir,
+ rtc, &rtc_alarm_raw);
+ }
+}
+
+void rtc_debugfs_del_device(struct rtc_device *rtc)
+{
+ debugfs_remove_recursive(rtc->debugfs_dir);
+ rtc->debugfs_dir = NULL;
+}
diff --git a/drivers/rtc/interface.c b/drivers/rtc/interface.c
index d8e835798153..51c801c82472 100644
--- a/drivers/rtc/interface.c
+++ b/drivers/rtc/interface.c
@@ -175,8 +175,7 @@ int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm)
}
EXPORT_SYMBOL_GPL(rtc_set_time);
-static int rtc_read_alarm_internal(struct rtc_device *rtc,
- struct rtc_wkalrm *alarm)
+int rtc_read_alarm_internal(struct rtc_device *rtc, struct rtc_wkalrm *alarm)
{
int err;
diff --git a/include/linux/rtc.h b/include/linux/rtc.h
index 47fd1c2d3a57..4665bc238a94 100644
--- a/include/linux/rtc.h
+++ b/include/linux/rtc.h
@@ -41,6 +41,7 @@ static inline time64_t rtc_tm_sub(struct rtc_time *lhs, struct rtc_time *rhs)
#include <linux/mutex.h>
#include <linux/timerqueue.h>
#include <linux/workqueue.h>
+#include <linux/debugfs.h>
extern struct class *rtc_class;
@@ -152,6 +153,10 @@ struct rtc_device {
time64_t offset_secs;
bool set_start_time;
+#ifdef CONFIG_DEBUG_FS
+ struct dentry *debugfs_dir;
+#endif
+
#ifdef CONFIG_RTC_INTF_DEV_UIE_EMUL
struct work_struct uie_task;
struct timer_list uie_timer;
@@ -190,6 +195,7 @@ extern int rtc_set_time(struct rtc_device *rtc, struct rtc_time *tm);
int __rtc_read_alarm(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
extern int rtc_read_alarm(struct rtc_device *rtc,
struct rtc_wkalrm *alrm);
+int rtc_read_alarm_internal(struct rtc_device *rtc, struct rtc_wkalrm *alarm);
extern int rtc_set_alarm(struct rtc_device *rtc,
struct rtc_wkalrm *alrm);
extern int rtc_initialize_alarm(struct rtc_device *rtc,
@@ -262,4 +268,14 @@ int rtc_add_groups(struct rtc_device *rtc, const struct attribute_group **grps)
return 0;
}
#endif
+
+#ifdef CONFIG_DEBUG_FS
+void rtc_debugfs_init(void);
+void rtc_debugfs_add_device(struct rtc_device *rtc);
+void rtc_debugfs_del_device(struct rtc_device *rtc);
+#else /* CONFIG_DEBUG_FS */
+static inline void rtc_debugfs_init(void) {}
+static inline void rtc_debugfs_add_device(struct rtc_device *rtc) {}
+static inline void rtc_debugfs_del_device(struct rtc_device *rtc) {}
+#endif /* CONFIG_DEBUG_FS */
#endif /* _LINUX_RTC_H_ */
--
2.25.1
Print two possible reasons /sys/kernel/debug/gup_test
cannot be opened to help users of this test diagnose
failures.
Signed-off-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: stable(a)vger.kernel.org # 5.15+
---
tools/testing/selftests/vm/gup_test.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tools/testing/selftests/vm/gup_test.c b/tools/testing/selftests/vm/gup_test.c
index fe043f67798b0..c496bcefa7a0e 100644
--- a/tools/testing/selftests/vm/gup_test.c
+++ b/tools/testing/selftests/vm/gup_test.c
@@ -205,7 +205,9 @@ int main(int argc, char **argv)
gup_fd = open("/sys/kernel/debug/gup_test", O_RDWR);
if (gup_fd == -1) {
- perror("open");
+ perror("failed to open /sys/kernel/debug/gup_test");
+ printf("check if CONFIG_GUP_TEST is enabled in kernel config\n");
+ printf("check if debugfs is mounted at /sys/kernel/debug\n");
exit(1);
}
--
2.24.1
On 3/30/2022 12:03 PM, Jarkko Sakkinen wrote:
> On Wed, 2022-03-30 at 10:40 -0700, Reinette Chatre wrote:
>> Could you please elaborate how the compiler will fix it up?
>
> Sure.
>
> Here's the disassembly of the RBX version:
>
> [0x000021a9]> pi 1
> lea rax, [rbx + loc.encl_stack]
>
> Here's the same with s/RBX/RIP/:
>
> [0x000021a9]> pi 5
> lea rax, loc.encl_stack
>
> Compiler will substitute correct offset relative to the RIP,
> well, because it can and it makes sense.
It does not make sense to me because, as proven with my test,
the two threads end up sharing the same stack memory.
Reinette