This patch set implements out-of-the-box support in perf tools for persistent events. For this, the kernel must provide the necessary information about existing persistent events to userland via sysfs. Persistent events are provided by the kernel with read-only event buffers. To let any process use the event buffers independently without limiting other processes, multiple users of a single event must be handled too.
The basic concept is to use a pmu as an event container for persistent events. The pmu registers events in sysfs and provides format and event information for userland. The persistent event framework requires that events can be added to the pmu dynamically.
With the information in sysfs, userland knows how to set up the perf_event attribute of a persistent event. Since a persistent event always has the persistent flag set, a way is needed to express this in sysfs. A new syntax is used for this. With 'attr<num>:<mask>' any bit in the attribute structure may be set in a similar way as with 'config<num>', but <num> is an index that points to the u64 value to change within the attribute.
For persistent events the persistent flag (bit 23 of the flag field in struct perf_event_attr) needs to be set, which is expressed in sysfs with "attr5:23". E.g. the mce_record event is described in sysfs as follows:
 /sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
 /sys/bus/event_source/devices/persistent/format/persistent:attr5:23
Note that perf tools need to support the 'attr<num>' syntax, which is added in a separate patch set. With it, we are able to run perf tool commands to read persistent events, e.g.:
 # perf record -e persistent/mce_record/ sleep 10
 # perf top -e persistent/mce_record/
In general, the new syntax is flexible enough to describe in sysfs any event to be set up by perf tools.
The first patches also contain fixes and reworks made after code review. They could be applied separately.
The patches are based on Boris' patches, which I have rebased onto the latest tip/perf/core. All patches can be found here:
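To illustrate how a format entry such as "attr5:23" could be applied, here is a minimal userland sketch. It treats the attribute as an array of u64 words, as described above. The helper names attr_set_bits() and parse_attr_format() are hypothetical and are not taken from the perf tools patch set; the "lo-hi" range form is an assumption borrowed from the existing 'config<num>' format entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper (not from the patch set): set bits lo..hi in the
 * u64 word at position idx of an attribute viewed as an array of u64.
 * Returns 0 on success, -1 if the index or bit range is out of bounds.
 */
static int attr_set_bits(uint64_t *words, size_t nwords,
			 unsigned int idx, unsigned int lo, unsigned int hi)
{
	uint64_t mask;

	assert(words != NULL);

	if (idx >= nwords || lo > hi || hi > 63)
		return -1;

	/* Build a mask covering bits lo..hi without ever shifting by 64. */
	mask = (hi == 63) ? ~0ULL : ((1ULL << (hi + 1)) - 1);
	mask &= ~((1ULL << lo) - 1);

	words[idx] |= mask;
	return 0;
}

/*
 * Hypothetical parser for a format string such as "attr5:23" or
 * "attr5:15-16". Fills idx/lo/hi and returns 0 on success, -1 otherwise.
 */
static int parse_attr_format(const char *fmt, unsigned int *idx,
			     unsigned int *lo, unsigned int *hi)
{
	int n = sscanf(fmt, "attr%u:%u-%u", idx, lo, hi);

	if (n == 2)
		*hi = *lo;	/* single bit, e.g. "attr5:23" */
	return (n >= 2) ? 0 : -1;
}
```

For the mce_record entry, parsing "attr5:23" and applying the result sets bit 23 of the sixth u64 word, which is the position this cover letter gives for the persistent flag.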
git://git.kernel.org/pub/scm/linux/kernel/git/rric/oprofile.git persistent-v2
I wonder if this patch set could be applied to a tip/perf topic branch? This would avoid reposting already reviewed patches.
Note: The perf tools patch set does not need to be reposted.
-Robert
Changes for V2:
* Merged minor changes into Boris' patches
* Included Boris' patches for review
* Document attr<index> syntax in sysfs ABI
* Adding cpu check to perf_get_persistent_event_fd()
* Rebased to latest tip/perf/core
Borislav Petkov (4):
  perf, ring_buffer: Use same prefix
  perf: Add persistent events
  perf: Add persistent event facilities
  MCE: Enable persistent event
Robert Richter (10):
  perf, persistent: Rework struct pers_event_desc
  perf, persistent: Remove rb_put()
  perf, persistent: Introduce get_persistent_event()
  perf, persistent: Reworking perf_get_persistent_event_fd()
  perf, persistent: Protect event lists with mutex
  perf, persistent: Avoid adding identical events
  perf, persistent: Implementing a persistent pmu
  perf, persistent: Name each persistent event
  perf, persistent: Exposing persistent events using sysfs
  perf, persistent: Allow multiple users for an event
 .../testing/sysfs-bus-event_source-devices-format |  43 ++-
 arch/x86/kernel/cpu/mcheck/mce.c                   |  19 ++
 include/linux/perf_event.h                         |   9 +-
 include/uapi/linux/perf_event.h                    |   3 +-
 kernel/events/Makefile                             |   2 +-
 kernel/events/core.c                               |  56 ++--
 kernel/events/internal.h                           |   3 +
 kernel/events/persistent.c                         | 320 +++++++++++++++++++++
 kernel/events/ring_buffer.c                        |   7 +-
 9 files changed, 419 insertions(+), 43 deletions(-)
 create mode 100644 kernel/events/persistent.c
From: Borislav Petkov <bp@suse.de>
Rename ring_buffer-handling functions consistently with the "rb_" prefix.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/core.c        | 37 +++++++++++++++++--------------------
 kernel/events/ring_buffer.c |  7 +++----
 2 files changed, 20 insertions(+), 24 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index a0780b3..b790ab6 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -198,8 +198,7 @@ static void cpu_ctx_sched_in(struct perf_cpu_context *cpuctx, static void update_context_time(struct perf_event_context *ctx); static u64 perf_event_time(struct perf_event *event);
-static void ring_buffer_attach(struct perf_event *event, - struct ring_buffer *rb); +static void rb_attach(struct perf_event *event, struct ring_buffer *rb);
void __weak perf_event_print_debug(void) { }
@@ -3022,7 +3021,7 @@ static void free_event_rcu(struct rcu_head *head) kfree(event); }
-static void ring_buffer_put(struct ring_buffer *rb); +static void rb_put(struct ring_buffer *rb);
static void free_event(struct perf_event *event) { @@ -3054,7 +3053,7 @@ static void free_event(struct perf_event *event) }
if (event->rb) { - ring_buffer_put(event->rb); + rb_put(event->rb); event->rb = NULL; }
@@ -3299,8 +3298,8 @@ static unsigned int perf_poll(struct file *file, poll_table *wait) * t0: T1, rb = rcu_dereference(event->rb) * t1: T2, old_rb = event->rb * t2: T2, event->rb = new rb - * t3: T2, ring_buffer_detach(old_rb) - * t4: T1, ring_buffer_attach(rb1) + * t3: T2, rb_detach(old_rb) + * t4: T1, rb_attach(rb1) * t5: T1, poll_wait(event->waitq) * * To avoid this problem, we grab mmap_mutex in perf_poll() @@ -3312,7 +3311,7 @@ static unsigned int perf_poll(struct file *file, poll_table *wait) rcu_read_lock(); rb = rcu_dereference(event->rb); if (rb) { - ring_buffer_attach(event, rb); + rb_attach(event, rb); events = atomic_xchg(&rb->poll, 0); } rcu_read_unlock(); @@ -3617,8 +3616,7 @@ unlock: return ret; }
-static void ring_buffer_attach(struct perf_event *event, - struct ring_buffer *rb) +static void rb_attach(struct perf_event *event, struct ring_buffer *rb) { unsigned long flags;
@@ -3634,8 +3632,7 @@ unlock: spin_unlock_irqrestore(&rb->event_lock, flags); }
-static void ring_buffer_detach(struct perf_event *event, - struct ring_buffer *rb) +static void rb_detach(struct perf_event *event, struct ring_buffer *rb) { unsigned long flags;
@@ -3648,7 +3645,7 @@ static void ring_buffer_detach(struct perf_event *event, spin_unlock_irqrestore(&rb->event_lock, flags); }
-static void ring_buffer_wakeup(struct perf_event *event) +static void rb_wakeup(struct perf_event *event) { struct ring_buffer *rb;
@@ -3672,7 +3669,7 @@ static void rb_free_rcu(struct rcu_head *rcu_head) rb_free(rb); }
-static struct ring_buffer *ring_buffer_get(struct perf_event *event) +static struct ring_buffer *rb_get(struct perf_event *event) { struct ring_buffer *rb;
@@ -3687,7 +3684,7 @@ static struct ring_buffer *ring_buffer_get(struct perf_event *event) return rb; }
-static void ring_buffer_put(struct ring_buffer *rb) +static void rb_put(struct ring_buffer *rb) { struct perf_event *event, *n; unsigned long flags; @@ -3724,10 +3721,10 @@ static void perf_mmap_close(struct vm_area_struct *vma) atomic_long_sub((size >> PAGE_SHIFT) + 1, &user->locked_vm); vma->vm_mm->pinned_vm -= event->mmap_locked; rcu_assign_pointer(event->rb, NULL); - ring_buffer_detach(event, rb); + rb_detach(event, rb); mutex_unlock(&event->mmap_mutex);
- ring_buffer_put(rb); + rb_put(rb); free_uid(user); } } @@ -3881,7 +3878,7 @@ static const struct file_operations perf_fops = {
void perf_event_wakeup(struct perf_event *event) { - ring_buffer_wakeup(event); + rb_wakeup(event);
if (event->pending_kill) { kill_fasync(&event->fasync, SIGIO, event->pending_kill); @@ -6567,7 +6564,7 @@ set:
if (output_event) { /* get the rb we want to redirect to */ - rb = ring_buffer_get(output_event); + rb = rb_get(output_event); if (!rb) goto unlock; } @@ -6575,13 +6572,13 @@ set: old_rb = event->rb; rcu_assign_pointer(event->rb, rb); if (old_rb) - ring_buffer_detach(event, old_rb); + rb_detach(event, old_rb); ret = 0; unlock: mutex_unlock(&event->mmap_mutex);
if (old_rb) - ring_buffer_put(old_rb); + rb_put(old_rb); out: return ret; } diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c index cd55144..b514c56 100644 --- a/kernel/events/ring_buffer.c +++ b/kernel/events/ring_buffer.c @@ -212,8 +212,7 @@ void perf_output_end(struct perf_output_handle *handle) rcu_read_unlock(); }
-static void -ring_buffer_init(struct ring_buffer *rb, long watermark, int flags) +static void rb_init(struct ring_buffer *rb, long watermark, int flags) { long max_size = perf_data_size(rb);
@@ -290,7 +289,7 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
rb->nr_pages = nr_pages;
- ring_buffer_init(rb, watermark, flags); + rb_init(rb, watermark, flags);
return rb;
@@ -395,7 +394,7 @@ struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags) rb->page_order = ilog2(nr_pages); rb->nr_pages = !!nr_pages;
- ring_buffer_init(rb, watermark, flags); + rb_init(rb, watermark, flags);
return rb;
From: Borislav Petkov <bp@alien8.de>
Add the needed pieces for persistent events which make them process-agnostic. Also, make their buffers read-only when mmapping them from userspace.
While at it, do not return a void function, as caught by Fengguang's build robot.
Changes made by Robert Richter <robert.richter@linaro.org>:
* mmap should return an EACCES error if the buffer cannot be mapped writable. This error code also helps userland fall back to mapping the buffers read-only on failure.
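The fallback that this error code enables could look roughly like the following userland sketch. The helper name map_event_buffer() is hypothetical and not part of this patch or of perf tools; it only assumes the behaviour described above, i.e. that a writable mapping request fails with EACCES when the buffer must stay read-only.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* open() flags for callers */
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical userland helper (not part of this patch): map an event
 * buffer writable if possible, otherwise fall back to a read-only
 * mapping when mmap() reports EACCES.
 *
 * On success returns the mapping and stores 1 or 0 in *writable.
 * On failure returns MAP_FAILED with errno set by mmap().
 */
static void *map_event_buffer(int fd, size_t len, int *writable)
{
	void *base;

	assert(writable != NULL);

	base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (base != MAP_FAILED) {
		*writable = 1;
		return base;
	}

	/* Only EACCES signals a read-only buffer; other errors are fatal. */
	if (errno != EACCES)
		return MAP_FAILED;

	base = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	if (base != MAP_FAILED)
		*writable = 0;
	return base;
}
```

A caller that gets *writable == 0 back knows it cannot update the data tail in the mapped control page, so it has to treat the buffer as a pure reader would.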
Signed-off-by: Borislav Petkov <bp@suse.de>
[ Return -EACCES if mapped buffers must be readonly ]
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c            | 10 +++++++++-
 2 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index fb104e5..6032361 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -272,8 +272,9 @@ struct perf_event_attr {
 
 				exclude_callchain_kernel : 1, /* exclude kernel callchains */
 				exclude_callchain_user : 1, /* exclude user callchains */
+				persistent : 1, /* always-on event */
 
-				__reserved_1 : 41;
+				__reserved_1 : 40;
 
 	union {
 		__u32 wakeup_events; /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index b790ab6..a13e457 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3713,6 +3713,11 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
 
+	if (event->attr.persistent) {
+		atomic_dec(&event->mmap_count);
+		return;
+	}
+
 	if (atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex)) {
 		unsigned long size = perf_data_size(event->rb);
 		struct user_struct *user = event->mmap_user;
@@ -3756,9 +3761,12 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	if (event->cpu == -1 && event->attr.inherit)
 		return -EINVAL;
 
-	if (!(vma->vm_flags & VM_SHARED))
+	if (!(vma->vm_flags & VM_SHARED) && !event->attr.persistent)
 		return -EINVAL;
 
+	if (event->attr.persistent && (vma->vm_flags & VM_WRITE))
+		return -EACCES;
+
 	vma_size = vma->vm_end - vma->vm_start;
 	nr_pages = (vma_size / PAGE_SIZE) - 1;
 
From: Borislav Petkov <bp@alien8.de>
Add a barebones implementation for registering persistent events with perf. For that, we don't destroy the buffers when they're unmapped; also, we map them read-only so that multiple agents can access them.
Also, we allocate the event buffers at event init time and not at mmap time so that we can log samples into them regardless of whether there are readers in userspace or not.
Changes made by Robert Richter <robert.richter@linaro.org>:
* Fixing wrongly determined attribute size.
* The default buffer size used by perf tools to set up event buffers is 512k. Using the same buffer size for persistent events. This also avoids failed mmap calls due to different buffer sizes.
* Improve error reporting.
* Returning -ENODEV if no file descriptor is found. An error code of -1 (-EPERM) is misleading in this case.
* Adding cpu check to perf_get_persistent_event_fd()
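The buffer size arithmetic behind the 512k item can be sketched as follows. This is an illustration only: the helper names are hypothetical, and the "+ 1" assumes the extra control page implied by the "(vma_size / PAGE_SIZE) - 1" computation in perf_mmap().

```c
#include <assert.h>
#include <stddef.h>

/* 512 KiB, the default perf tools buffer size named in the changelog. */
#define PERS_BUFFER_BYTES	(512UL * 1024UL)

/* Number of data pages for a given page size (hypothetical helper). */
static size_t pers_buffer_nr_pages(size_t page_size)
{
	assert(page_size != 0);
	return PERS_BUFFER_BYTES / page_size;
}

/*
 * Length a reader would pass to mmap(): the data pages plus one extra
 * page, mirroring nr_pages = (vma_size / PAGE_SIZE) - 1 in perf_mmap().
 */
static size_t pers_buffer_mmap_len(size_t page_size)
{
	return (pers_buffer_nr_pages(page_size) + 1) * page_size;
}
```

With 4 KiB pages this gives 128 data pages and a mapping length of 129 pages. A reader that requests a different length would not match the preallocated buffer, which is the mmap failure the changelog item refers to.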
[ make percpu variable static ]
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
[ Fix attr size ]
[ Setting default buffer size to 512k as in perf tools ]
[ Print error code on failure when adding events ]
[ Return reasonable error code ]
[ Adding cpu check to perf_get_persistent_event_fd() ]
Reported-by: Jiri Olsa <jolsa@redhat.com>
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/linux/perf_event.h |  16 +++-
 kernel/events/Makefile     |   2 +-
 kernel/events/core.c       |  13 ++--
 kernel/events/internal.h   |   4 +
 kernel/events/persistent.c | 181 +++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 207 insertions(+), 9 deletions(-)
 create mode 100644 kernel/events/persistent.c
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 6fddac1..d2a42b7 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -518,6 +518,13 @@ struct perf_output_handle { int page; };
+struct pers_event_desc { + struct perf_event_attr *attr; + struct perf_event *event; + struct list_head plist; + int fd; +}; + #ifdef CONFIG_PERF_EVENTS
extern int perf_pmu_register(struct pmu *pmu, char *name, int type); @@ -750,7 +757,9 @@ extern void perf_event_enable(struct perf_event *event); extern void perf_event_disable(struct perf_event *event); extern int __perf_event_disable(void *info); extern void perf_event_task_tick(void); -#else +extern int perf_add_persistent_event(struct perf_event_attr *, unsigned); +extern int perf_add_persistent_event_by_id(int id); +#else /* !CONFIG_PERF_EVENTS */ static inline void perf_event_task_sched_in(struct task_struct *prev, struct task_struct *task) { } @@ -790,7 +799,10 @@ static inline void perf_event_enable(struct perf_event *event) { } static inline void perf_event_disable(struct perf_event *event) { } static inline int __perf_event_disable(void *info) { return -1; } static inline void perf_event_task_tick(void) { } -#endif +static inline int perf_add_persistent_event(struct perf_event_attr *attr, + unsigned nr_pages) { return -EINVAL; } +static inline int perf_add_persistent_event_by_id(int id) { return -EINVAL; } +#endif /* !CONFIG_PERF_EVENTS */
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL) extern bool perf_event_can_stop_tick(void); diff --git a/kernel/events/Makefile b/kernel/events/Makefile index 103f5d1..70990d5 100644 --- a/kernel/events/Makefile +++ b/kernel/events/Makefile @@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER CFLAGS_REMOVE_core.o = -pg endif
-obj-y := core.o ring_buffer.o callchain.o +obj-y := core.o ring_buffer.o callchain.o persistent.o
obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o obj-$(CONFIG_UPROBES) += uprobes.o diff --git a/kernel/events/core.c b/kernel/events/core.c index a13e457..a9b6470 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3021,8 +3021,6 @@ static void free_event_rcu(struct rcu_head *head) kfree(event); }
-static void rb_put(struct ring_buffer *rb); - static void free_event(struct perf_event *event) { irq_work_sync(&event->pending); @@ -3398,8 +3396,6 @@ unlock: return ret; }
-static const struct file_operations perf_fops; - static inline int perf_fget_light(int fd, struct fd *p) { struct fd f = fdget(fd); @@ -3684,7 +3680,7 @@ static struct ring_buffer *rb_get(struct perf_event *event) return rb; }
-static void rb_put(struct ring_buffer *rb) +void rb_put(struct ring_buffer *rb) { struct perf_event *event, *n; unsigned long flags; @@ -3866,7 +3862,7 @@ static int perf_fasync(int fd, struct file *filp, int on) return 0; }
-static const struct file_operations perf_fops = { +const struct file_operations perf_fops = { .llseek = no_llseek, .release = perf_release, .read = perf_read, @@ -6623,6 +6619,9 @@ SYSCALL_DEFINE5(perf_event_open, if (err) return err;
+ if (attr.persistent) + return perf_get_persistent_event_fd(cpu, &attr); + if (!attr.exclude_kernel) { if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN)) return -EACCES; @@ -7579,6 +7578,8 @@ void __init perf_event_init(void) */ BUILD_BUG_ON((offsetof(struct perf_event_mmap_page, data_head)) != 1024); + + persistent_events_init(); }
static int __init perf_event_sysfs_init(void) diff --git a/kernel/events/internal.h b/kernel/events/internal.h index eb675c4..3b481be 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -38,6 +38,7 @@ struct ring_buffer { extern void rb_free(struct ring_buffer *rb); extern struct ring_buffer * rb_alloc(int nr_pages, long watermark, int cpu, int flags); +extern void rb_put(struct ring_buffer *rb); extern void perf_event_wakeup(struct perf_event *event);
extern void @@ -174,4 +175,7 @@ static inline bool arch_perf_have_user_stack_dump(void) #define perf_user_stack_pointer(regs) 0 #endif /* CONFIG_HAVE_PERF_USER_STACK_DUMP */
+extern const struct file_operations perf_fops; +extern int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr); +extern void __init persistent_events_init(void); #endif /* _KERNEL_EVENTS_INTERNAL_H */ diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c new file mode 100644 index 0000000..53411b4 --- /dev/null +++ b/kernel/events/persistent.c @@ -0,0 +1,181 @@ +#include <linux/slab.h> +#include <linux/file.h> +#include <linux/perf_event.h> +#include <linux/anon_inodes.h> + +#include "internal.h" + +/* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */ +#define CPU_BUFFER_NR_PAGES ((512 * 1024) / PAGE_SIZE) + +static DEFINE_PER_CPU(struct list_head, pers_events); + +static struct perf_event * +add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, + unsigned nr_pages) +{ + struct perf_event *event = ERR_PTR(-ENOMEM); + struct pers_event_desc *desc; + struct ring_buffer *buf; + + desc = kzalloc(sizeof(*desc), GFP_KERNEL); + if (!desc) + goto out; + + buf = rb_alloc(nr_pages, 0, cpu, 0); + if (!buf) + goto err_rb; + + event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL); + if (IS_ERR(event)) + goto err_event; + + rcu_assign_pointer(event->rb, buf); + + desc->event = event; + desc->attr = attr; + + INIT_LIST_HEAD(&desc->plist); + list_add_tail(&desc->plist, &per_cpu(pers_events, cpu)); + + /* All workie, enable event now */ + perf_event_enable(event); + + goto out; + + err_event: + rb_put(buf); + + err_rb: + kfree(desc); + + out: + return event; +} + +static void del_persistent_event(int cpu, struct perf_event_attr *attr) +{ + struct pers_event_desc *desc, *tmp; + struct perf_event *event = NULL; + + list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) { + if (desc->attr->config == attr->config) { + event = desc->event; + break; + } + } + + if (!event) + return; + + list_del(&desc->plist); + + perf_event_disable(event); + if (event->rb) { + rb_put(event->rb); 
+ rcu_assign_pointer(event->rb, NULL); + } + + perf_event_release_kernel(event); + put_unused_fd(desc->fd); + kfree(desc->attr); + kfree(desc); +} + +static int __alloc_persistent_event_fd(struct pers_event_desc *desc) +{ + struct file *event_file = NULL; + int event_fd = -1; + + event_fd = get_unused_fd(); + if (event_fd < 0) + goto out; + + event_file = anon_inode_getfile("[pers_event]", &perf_fops, + desc->event, O_RDONLY); + if (IS_ERR(event_file)) + goto err_event_file; + + desc->fd = event_fd; + fd_install(event_fd, event_file); + + goto out; + + + err_event_file: + put_unused_fd(event_fd); + + out: + return event_fd; +} + +/* + * Create and enable the persistent version of the perf event described by + * @attr. + * + * @attr: perf event descriptor + * @nr_pages: size in pages + */ +int perf_add_persistent_event(struct perf_event_attr *attr, unsigned nr_pages) +{ + struct perf_event *event; + int i; + + for_each_possible_cpu(i) { + event = add_persistent_event_on_cpu(i, attr, nr_pages); + if (IS_ERR(event)) + goto unwind; + } + return 0; + +unwind: + pr_err("%s: Error adding persistent event on cpu %d: %ld\n", + __func__, i, PTR_ERR(event)); + + while (--i >= 0) + del_persistent_event(i, attr); + + return PTR_ERR(event); +} + +int perf_add_persistent_event_by_id(int id) +{ + struct perf_event_attr *attr; + + attr = kzalloc(sizeof(*attr), GFP_KERNEL); + if (!attr) + return -ENOMEM; + + attr->sample_period = 1; + attr->wakeup_events = 1; + attr->sample_type = PERF_SAMPLE_RAW; + attr->persistent = 1; + attr->config = id; + attr->type = PERF_TYPE_TRACEPOINT; + attr->size = sizeof(*attr); + + return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES); +} + +int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) +{ + struct pers_event_desc *desc; + + if (cpu >= (unsigned)nr_cpu_ids) + return -EINVAL; + + list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) + if (desc->attr->config == attr->config) + return 
__alloc_persistent_event_fd(desc); + + return -ENODEV; +} + + +void __init persistent_events_init(void) +{ + int i; + + for_each_possible_cpu(i) + INIT_LIST_HEAD(&per_cpu(pers_events, i)); +}
From: Borislav Petkov <bp@suse.de>
... for MCEs collection.
Changes made by Robert Richter <robert.richter@linaro.org>:
The mce_record tracepoint needs tracepoints to be enabled. Fixing build error for no-tracepoints configs.
Signed-off-by: Borislav Petkov <bp@suse.de>
[ Fix build error for no-tracepoints configs ]
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 arch/x86/kernel/cpu/mcheck/mce.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c
index 9239504..d421937 100644
--- a/arch/x86/kernel/cpu/mcheck/mce.c
+++ b/arch/x86/kernel/cpu/mcheck/mce.c
@@ -1987,6 +1987,24 @@ int __init mcheck_init(void)
 	return 0;
 }
 
+#ifdef CONFIG_TRACEPOINTS
+
+int __init mcheck_init_tp(void)
+{
+	if (perf_add_persistent_event_by_id(event_mce_record.event.type)) {
+		pr_err("Error adding MCE persistent event.\n");
+		return -EINVAL;
+	}
+	return 0;
+}
+/*
+ * We can't run earlier because persistent events uses anon_inode_getfile and
+ * its anon_inode_mnt gets initialized as a fs_initcall.
+ */
+fs_initcall_sync(mcheck_init_tp);
+
+#endif /* CONFIG_TRACEPOINTS */
+
 /*
  * mce_syscore: PM support
  */
From: Robert Richter <robert.richter@linaro.org>
Struct pers_event_desc is only used in kernel/events/persistent.c, so move it there. Also, remove the attr member, as it is a copy of event->attr.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 include/linux/perf_event.h |  7 -------
 kernel/events/persistent.c | 12 ++++++++----
 2 files changed, 8 insertions(+), 11 deletions(-)
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index d2a42b7..dc72c93 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -518,13 +518,6 @@ struct perf_output_handle { int page; };
-struct pers_event_desc { - struct perf_event_attr *attr; - struct perf_event *event; - struct list_head plist; - int fd; -}; - #ifdef CONFIG_PERF_EVENTS
extern int perf_pmu_register(struct pmu *pmu, char *name, int type); diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 53411b4..d7aaf95 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -8,6 +8,12 @@ /* 512 kiB: default perf tools memory size, see perf_evlist__mmap() */ #define CPU_BUFFER_NR_PAGES ((512 * 1024) / PAGE_SIZE)
+struct pers_event_desc { + struct perf_event *event; + struct list_head plist; + int fd; +}; + static DEFINE_PER_CPU(struct list_head, pers_events);
static struct perf_event * @@ -33,7 +39,6 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, rcu_assign_pointer(event->rb, buf);
desc->event = event; - desc->attr = attr;
INIT_LIST_HEAD(&desc->plist); list_add_tail(&desc->plist, &per_cpu(pers_events, cpu)); @@ -59,7 +64,7 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) struct perf_event *event = NULL;
list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) { - if (desc->attr->config == attr->config) { + if (desc->event->attr.config == attr->config) { event = desc->event; break; } @@ -78,7 +83,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr)
perf_event_release_kernel(event); put_unused_fd(desc->fd); - kfree(desc->attr); kfree(desc); }
@@ -165,7 +169,7 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) return -EINVAL;
list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) - if (desc->attr->config == attr->config) + if (desc->event->attr.config == attr->config) return __alloc_persistent_event_fd(desc);
return -ENODEV;
From: Robert Richter <robert.richter@linaro.org>
rb_put() is already called in perf_event_release_kernel(), so there is no need to do the same in del_persistent_event(). After a rework we also don't need it in add_persistent_event_on_cpu(). Since there are no users of rb_put() outside of events/core.c anymore, we can make it private to that file again.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/core.c       |  4 +++-
 kernel/events/internal.h   |  1 -
 kernel/events/persistent.c | 29 +++++++++++------------------
 3 files changed, 14 insertions(+), 20 deletions(-)
diff --git a/kernel/events/core.c b/kernel/events/core.c index a9b6470..8f85caa 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -3021,6 +3021,8 @@ static void free_event_rcu(struct rcu_head *head) kfree(event); }
+static void rb_put(struct ring_buffer *rb); + static void free_event(struct perf_event *event) { irq_work_sync(&event->pending); @@ -3680,7 +3682,7 @@ static struct ring_buffer *rb_get(struct perf_event *event) return rb; }
-void rb_put(struct ring_buffer *rb) +static void rb_put(struct ring_buffer *rb) { struct perf_event *event, *n; unsigned long flags; diff --git a/kernel/events/internal.h b/kernel/events/internal.h index 3b481be..3647289 100644 --- a/kernel/events/internal.h +++ b/kernel/events/internal.h @@ -38,7 +38,6 @@ struct ring_buffer { extern void rb_free(struct ring_buffer *rb); extern struct ring_buffer * rb_alloc(int nr_pages, long watermark, int cpu, int flags); -extern void rb_put(struct ring_buffer *rb); extern void perf_event_wakeup(struct perf_event *event);
extern void diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index d7aaf95..e6a7664 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -20,22 +20,22 @@ static struct perf_event * add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, unsigned nr_pages) { - struct perf_event *event = ERR_PTR(-ENOMEM); + struct perf_event *event; struct pers_event_desc *desc; struct ring_buffer *buf;
desc = kzalloc(sizeof(*desc), GFP_KERNEL); if (!desc) - goto out; - - buf = rb_alloc(nr_pages, 0, cpu, 0); - if (!buf) - goto err_rb; + return ERR_PTR(-ENOMEM);
event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL); if (IS_ERR(event)) goto err_event;
+ buf = rb_alloc(nr_pages, 0, cpu, 0); + if (!buf) + goto err_rb; + rcu_assign_pointer(event->rb, buf);
desc->event = event; @@ -47,14 +47,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, perf_event_enable(event);
goto out; - - err_event: - rb_put(buf); - - err_rb: +err_rb: + perf_event_release_kernel(event); + event = ERR_PTR(-ENOMEM); +err_event: kfree(desc); - - out: +out: return event; }
@@ -76,11 +74,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) list_del(&desc->plist);
perf_event_disable(event); - if (event->rb) { - rb_put(event->rb); - rcu_assign_pointer(event->rb, NULL); - } - perf_event_release_kernel(event); put_unused_fd(desc->fd); kfree(desc);
From: Robert Richter <robert.richter@linaro.org>
Reduce duplicate code by introducing the function get_persistent_event() to get the descriptor of a persistent event.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 36 ++++++++++++++++++++++--------------
 1 file changed, 22 insertions(+), 14 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index e6a7664..7d91871 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -16,6 +16,19 @@ struct pers_event_desc {
static DEFINE_PER_CPU(struct list_head, pers_events);
+static struct pers_event_desc +*get_persistent_event(int cpu, struct perf_event_attr *attr) +{ + struct pers_event_desc *desc; + + list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) { + if (desc->event->attr.config == attr->config) + return desc; + } + + return NULL; +} + static struct perf_event * add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, unsigned nr_pages) @@ -58,18 +71,13 @@ out:
static void del_persistent_event(int cpu, struct perf_event_attr *attr) { - struct pers_event_desc *desc, *tmp; - struct perf_event *event = NULL; - - list_for_each_entry_safe(desc, tmp, &per_cpu(pers_events, cpu), plist) { - if (desc->event->attr.config == attr->config) { - event = desc->event; - break; - } - } + struct pers_event_desc *desc; + struct perf_event *event;
- if (!event) + desc = get_persistent_event(cpu, attr); + if (!desc) return; + event = desc->event;
list_del(&desc->plist);
@@ -161,11 +169,11 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) if (cpu >= (unsigned)nr_cpu_ids) return -EINVAL;
- list_for_each_entry(desc, &per_cpu(pers_events, cpu), plist) - if (desc->event->attr.config == attr->config) - return __alloc_persistent_event_fd(desc); + desc = get_persistent_event(cpu, attr); + if (!desc) + return -ENODEV;
- return -ENODEV; + return __alloc_persistent_event_fd(desc); }
From: Robert Richter <robert.richter@linaro.org>
The existing function anon_inode_getfd() already does all the work. Rework and simplify the code to use it.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 35 +++++++----------------------------
 1 file changed, 7 insertions(+), 28 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 7d91871..16ed47c 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -87,33 +87,6 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) kfree(desc); }
-static int __alloc_persistent_event_fd(struct pers_event_desc *desc) -{ - struct file *event_file = NULL; - int event_fd = -1; - - event_fd = get_unused_fd(); - if (event_fd < 0) - goto out; - - event_file = anon_inode_getfile("[pers_event]", &perf_fops, - desc->event, O_RDONLY); - if (IS_ERR(event_file)) - goto err_event_file; - - desc->fd = event_fd; - fd_install(event_fd, event_file); - - goto out; - - - err_event_file: - put_unused_fd(event_fd); - - out: - return event_fd; -} - /* * Create and enable the persistent version of the perf event described by * @attr. @@ -165,6 +138,7 @@ int perf_add_persistent_event_by_id(int id) int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) { struct pers_event_desc *desc; + int event_fd;
if (cpu >= (unsigned)nr_cpu_ids) return -EINVAL; @@ -173,7 +147,12 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) if (!desc) return -ENODEV;
- return __alloc_persistent_event_fd(desc); + event_fd = anon_inode_getfd("[pers_event]", &perf_fops, + desc->event, O_RDONLY); + if (event_fd >= 0) + desc->fd = event_fd; + + return event_fd; }
From: Robert Richter <robert.richter@linaro.org>
Protect the event lists with a mutex, especially access to struct pers_event_desc *desc. Race conditions are possible where the descriptor is removed from the list while it is in use.
Signed-off-by: Robert Richter <robert.richter@linaro.org>
Signed-off-by: Robert Richter <rric@kernel.org>
---
 kernel/events/persistent.c | 33 ++++++++++++++++++++++++---------
 1 file changed, 24 insertions(+), 9 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 16ed47c..586cea5 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -15,6 +15,7 @@ struct pers_event_desc { };
static DEFINE_PER_CPU(struct list_head, pers_events); +static DEFINE_PER_CPU(struct mutex, pers_events_lock);
static struct pers_event_desc *get_persistent_event(int cpu, struct perf_event_attr *attr) @@ -37,9 +38,13 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr, struct pers_event_desc *desc; struct ring_buffer *buf;
+ mutex_lock(&per_cpu(pers_events_lock, cpu)); + desc = kzalloc(sizeof(*desc), GFP_KERNEL); - if (!desc) - return ERR_PTR(-ENOMEM); + if (!desc) { + event = ERR_PTR(-ENOMEM); + goto out; + }
event = perf_event_create_kernel_counter(attr, cpu, NULL, NULL, NULL); if (IS_ERR(event)) @@ -66,6 +71,7 @@ err_rb: err_event: kfree(desc); out: + mutex_unlock(&per_cpu(pers_events_lock, cpu)); return event; }
@@ -74,9 +80,11 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) struct pers_event_desc *desc; struct perf_event *event;
+ mutex_lock(&per_cpu(pers_events_lock, cpu)); + desc = get_persistent_event(cpu, attr); if (!desc) - return; + goto out; event = desc->event;
list_del(&desc->plist); @@ -85,6 +93,8 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) perf_event_release_kernel(event); put_unused_fd(desc->fd); kfree(desc); +out: + mutex_unlock(&per_cpu(pers_events_lock, cpu)); }
/* @@ -138,28 +148,33 @@ int perf_add_persistent_event_by_id(int id) int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) { struct pers_event_desc *desc; - int event_fd; + int event_fd = -ENODEV;
if (cpu >= (unsigned)nr_cpu_ids) return -EINVAL;
+ mutex_lock(&per_cpu(pers_events_lock, cpu)); + desc = get_persistent_event(cpu, attr); if (!desc) - return -ENODEV; + goto out;
event_fd = anon_inode_getfd("[pers_event]", &perf_fops, desc->event, O_RDONLY); if (event_fd >= 0) desc->fd = event_fd; +out: + mutex_unlock(&per_cpu(pers_events_lock, cpu));
return event_fd; }
- void __init persistent_events_init(void) { - int i; + int cpu;
- for_each_possible_cpu(i) - INIT_LIST_HEAD(&per_cpu(pers_events, i)); + for_each_possible_cpu(cpu) { + INIT_LIST_HEAD(&per_cpu(pers_events, cpu)); + mutex_init(&per_cpu(pers_events_lock, cpu)); + } }
From: Robert Richter robert.richter@linaro.org
Check if an event already exists before adding it.
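The existence check only helps if lookup and insertion happen in one critical section, which is what the per-cpu mutex from the previous patch provides. Below is a minimal userspace sketch of that pattern, not the kernel code: it uses pthread mutexes instead of kernel mutexes, and NR_CPUS, struct desc, events_init(), add_event() and del_event() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_CPUS 4	/* hypothetical fixed cpu count for this sketch */

/* Userspace stand-in for struct pers_event_desc: one node per event. */
struct desc {
	int id;			/* stands in for the attr used as lookup key */
	struct desc *next;
};

static struct desc *events[NR_CPUS];
static pthread_mutex_t events_lock[NR_CPUS];

static void events_init(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		events[cpu] = NULL;
		pthread_mutex_init(&events_lock[cpu], NULL);
	}
}

/* Caller must hold events_lock[cpu]. */
static struct desc *lookup(int cpu, int id)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	for (struct desc *d = events[cpu]; d; d = d->next)
		if (d->id == id)
			return d;
	return NULL;
}

/* Lookup and insertion form one critical section, so no duplicates. */
static int add_event(int cpu, int id)
{
	struct desc *d;
	int ret = 0;

	pthread_mutex_lock(&events_lock[cpu]);
	if (lookup(cpu, id)) {
		ret = -EEXIST;
		goto out;
	}
	d = calloc(1, sizeof(*d));
	if (!d) {
		ret = -ENOMEM;
		goto out;
	}
	d->id = id;
	d->next = events[cpu];
	events[cpu] = d;
out:
	pthread_mutex_unlock(&events_lock[cpu]);
	return ret;
}

/* Unlink and free under the same lock that readers take. */
static int del_event(int cpu, int id)
{
	struct desc **pp;
	int ret = -ENODEV;

	pthread_mutex_lock(&events_lock[cpu]);
	for (pp = &events[cpu]; *pp; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			struct desc *d = *pp;

			*pp = d->next;
			free(d);
			ret = 0;
			break;
		}
	}
	pthread_mutex_unlock(&events_lock[cpu]);
	return ret;
}
```

Every exit path goes through the single unlock, mirroring the goto-out style of the patch. A second add_event() for the same cpu and id fails with -EEXIST instead of creating a duplicate entry.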
Signed-off-by: Robert Richter robert.richter@linaro.org Signed-off-by: Robert Richter rric@kernel.org --- kernel/events/persistent.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 586cea5..4fcd071 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -40,6 +40,12 @@ add_persistent_event_on_cpu(unsigned int cpu, struct perf_event_attr *attr,
mutex_lock(&per_cpu(pers_events_lock, cpu));
+ desc = get_persistent_event(cpu, attr); + if (desc) { + event = ERR_PTR(-EEXIST); + goto out; + } + desc = kzalloc(sizeof(*desc), GFP_KERNEL); if (!desc) { event = ERR_PTR(-ENOMEM);
From: Robert Richter robert.richter@linaro.org
We want to use the kernel's pmu design to later expose persistent events via sysfs to userland. As a first step, implement a persistent pmu.
A new format syntax is introduced that allows setting bits anywhere in struct perf_event_attr. It is used here to set the persistent flag (attr5:23). The syntax is attr<num>:<bits>, where <num> is the index of a u64 value when struct perf_event_attr is viewed as an array of u64. Otherwise the syntax is the same as for config<num>.
Patches that implement this functionality for perf tools are sent in a separate patchset.
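To illustrate how a format string such as "attr5:23" could be applied, here is a minimal userspace sketch. It is not the perf tools implementation: it treats the attribute as a plain array of u64 words, handles only a single bit (no ranges or lists), and ATTR_WORDS, parse_attr_format() and set_attr_bit() are hypothetical names. In the struct perf_event_attr layout, word 0 holds type and size and words 1 to 4 hold config, sample_period, sample_type and read_format, which is why the flag bits end up in word 5.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ATTR_WORDS 8	/* hypothetical size; the real struct is larger */

/*
 * Parse a single-bit format string such as "attr5:23".
 * Returns 0 on success, -1 on malformed input or out-of-range values.
 * Bit ranges and lists (e.g. "1,6-10,44") are not handled here.
 */
static int parse_attr_format(const char *fmt, unsigned *index, unsigned *bit)
{
	unsigned i, b;
	int consumed = 0;

	if (sscanf(fmt, "attr%u:%u%n", &i, &b, &consumed) != 2)
		return -1;
	if (fmt[consumed] != '\0' || i >= ATTR_WORDS || b > 63)
		return -1;
	*index = i;
	*bit = b;
	return 0;
}

/* Set one bit in the u64 word selected by index. */
static void set_attr_bit(uint64_t *words, unsigned index, unsigned bit)
{
	assert(index < ATTR_WORDS && bit < 64);
	words[index] |= UINT64_C(1) << bit;
}
```

Parsing "attr5:23" yields index 5 and bit 23, so set_attr_bit() sets bit 23 of the sixth u64 word and leaves all other words untouched.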
Signed-off-by: Robert Richter robert.richter@linaro.org Signed-off-by: Robert Richter rric@kernel.org --- kernel/events/persistent.c | 35 +++++++++++++++++++++++++++++++++++ 1 file changed, 35 insertions(+)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 4fcd071..97c57c9 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -175,10 +175,45 @@ out: return event_fd; }
+PMU_FORMAT_ATTR(persistent, "attr5:23"); + +static struct attribute *persistent_format_attrs[] = { + &format_attr_persistent.attr, + NULL, +}; + +static struct attribute_group persistent_format_group = { + .name = "format", + .attrs = persistent_format_attrs, +}; + +static const struct attribute_group *persistent_attr_groups[] = { + &persistent_format_group, + NULL, +}; + +static struct pmu persistent_pmu; + +static int persistent_pmu_init(struct perf_event *event) +{ + if (persistent_pmu.type != event->attr.type) + return -ENOENT; + + /* Not a persistent event. */ + return -EFAULT; +} + +static struct pmu persistent_pmu = { + .event_init = persistent_pmu_init, + .attr_groups = persistent_attr_groups, +}; + void __init persistent_events_init(void) { int cpu;
+ perf_pmu_register(&persistent_pmu, "persistent", -1); + for_each_possible_cpu(cpu) { INIT_LIST_HEAD(&per_cpu(pers_events, cpu)); mutex_init(&per_cpu(pers_events_lock, cpu));
From: Robert Richter robert.richter@linaro.org
To add persistent events to sysfs later, each event needs a name. Add a name to each persistent event.
Signed-off-by: Robert Richter robert.richter@linaro.org Signed-off-by: Robert Richter rric@kernel.org --- arch/x86/kernel/cpu/mcheck/mce.c | 3 ++- include/linux/perf_event.h | 4 ++-- kernel/events/persistent.c | 30 +++++++++++++++++++++++++----- 3 files changed, 29 insertions(+), 8 deletions(-)
diff --git a/arch/x86/kernel/cpu/mcheck/mce.c b/arch/x86/kernel/cpu/mcheck/mce.c index d421937..833eb7a 100644 --- a/arch/x86/kernel/cpu/mcheck/mce.c +++ b/arch/x86/kernel/cpu/mcheck/mce.c @@ -1991,7 +1991,8 @@ int __init mcheck_init(void)
int __init mcheck_init_tp(void) { - if (perf_add_persistent_event_by_id(event_mce_record.event.type)) { + if (perf_add_persistent_event_by_id(event_mce_record.name, + event_mce_record.event.type)) { pr_err("Error adding MCE persistent event.\n"); return -EINVAL; } diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index dc72c93..06b4357b 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -751,7 +751,7 @@ extern void perf_event_disable(struct perf_event *event); extern int __perf_event_disable(void *info); extern void perf_event_task_tick(void); extern int perf_add_persistent_event(struct perf_event_attr *, unsigned); -extern int perf_add_persistent_event_by_id(int id); +extern int perf_add_persistent_event_by_id(char *name, int id); #else /* !CONFIG_PERF_EVENTS */ static inline void perf_event_task_sched_in(struct task_struct *prev, @@ -794,7 +794,7 @@ static inline int __perf_event_disable(void *info) { return -1; } static inline void perf_event_task_tick(void) { } static inline int perf_add_persistent_event(struct perf_event_attr *attr, unsigned nr_pages) { return -EINVAL; } -static inline int perf_add_persistent_event_by_id(int id) { return -EINVAL; } +static inline int perf_add_persistent_event_by_id(char *name, int id) { return -EINVAL; } #endif /* !CONFIG_PERF_EVENTS */
#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_NO_HZ_FULL) diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 97c57c9..96201c1 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -14,6 +14,11 @@ struct pers_event_desc { int fd; };
+struct pers_event { + char *name; + struct perf_event_attr attr; +}; + static DEFINE_PER_CPU(struct list_head, pers_events); static DEFINE_PER_CPU(struct mutex, pers_events_lock);
@@ -132,14 +137,20 @@ unwind: return PTR_ERR(event); }
-int perf_add_persistent_event_by_id(int id) +int perf_add_persistent_event_by_id(char* name, int id) { - struct perf_event_attr *attr; + struct pers_event *event; + struct perf_event_attr *attr; + int ret = -ENOMEM;
- attr = kzalloc(sizeof(*attr), GFP_KERNEL); - if (!attr) + event = kzalloc(sizeof(*event), GFP_KERNEL); + if (!event) return -ENOMEM; + event->name = kstrdup(name, GFP_KERNEL); + if (!event->name) + goto fail;
+ attr = &event->attr; attr->sample_period = 1; attr->wakeup_events = 1; attr->sample_type = PERF_SAMPLE_RAW; @@ -148,7 +159,16 @@ int perf_add_persistent_event_by_id(int id) attr->type = PERF_TYPE_TRACEPOINT; attr->size = sizeof(*attr);
- return perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES); + ret = perf_add_persistent_event(attr, CPU_BUFFER_NR_PAGES); + if (ret) + goto fail; + + return 0; +fail: + kfree(event->name); + kfree(event); + + return ret; }
int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr)
From: Robert Richter robert.richter@linaro.org
Expose persistent events in the system to userland using sysfs. Perf tools are able to read existing pmu events from sysfs. Now we use a persistent pmu as an event container that holds all registered persistent events of the system. This patch adds dynamic registration of persistent events to sysfs, e.g.:
/sys/bus/event_source/devices/persistent/events/mce_record:persistent,config=106
/sys/bus/event_source/devices/persistent/format/persistent:attr5:23
Perf tools need to support the attr<num> syntax that is added in a separate patch set. With it we are able to run perf tool commands to read persistent events, e.g.:
# perf record -e persistent/mce_record/ sleep 10
# perf top -e persistent/mce_record/
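For illustration, the content of an events file such as "persistent,config=106" can be split into its terms roughly as follows. This is a simplified userspace sketch, not the parser used by perf tools: it only knows the bare "persistent" term and a numeric "config=<n>" term, and struct event_terms and parse_event_terms() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct event_terms {
	int persistent;		/* 1 if the bare "persistent" term is present */
	uint64_t config;	/* value of the "config=<n>" term, 0 if absent */
};

/*
 * Split a comma separated event description into its terms.
 * A trailing newline, as may be returned when reading a sysfs file,
 * is accepted. Returns 0 on success, -1 on an unknown or malformed term.
 */
static int parse_event_terms(const char *desc, struct event_terms *out)
{
	memset(out, 0, sizeof(*out));

	while (*desc) {
		size_t len = strcspn(desc, ",\n");

		if (len == 10 && strncmp(desc, "persistent", 10) == 0) {
			out->persistent = 1;
		} else if (len > 7 && strncmp(desc, "config=", 7) == 0) {
			char *end;

			out->config = strtoull(desc + 7, &end, 0);
			if (end != desc + len)
				return -1;	/* trailing garbage in number */
		} else {
			return -1;		/* unknown or empty term */
		}
		desc += len;
		if (*desc)			/* skip the separator */
			desc++;
	}
	return 0;
}
```

For the mce_record example this yields persistent = 1 and config = 106. The "persistent" term would then be resolved through the format file ("attr5:23") to the actual bit in the attribute.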
[ Document attr<index> syntax in sysfs ABI ] Reported-by: Jiri Olsa jolsa@redhat.com Signed-off-by: Robert Richter robert.richter@linaro.org Signed-off-by: Robert Richter rric@kernel.org --- .../testing/sysfs-bus-event_source-devices-format | 43 ++++++++++++----- kernel/events/persistent.c | 55 +++++++++++++++++++++- 2 files changed, 86 insertions(+), 12 deletions(-)
diff --git a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format index 77f47ff..47b7353 100644 --- a/Documentation/ABI/testing/sysfs-bus-event_source-devices-format +++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-format @@ -1,13 +1,14 @@ -Where: /sys/bus/event_source/devices/<dev>/format +Where: /sys/bus/event_source/devices/<pmu>/format/<name> Date: January 2012 -Kernel Version: 3.3 +Kernel Version: 3.3 + 3.12 (added attr<index>:<bits>) Contact: Jiri Olsa jolsa@redhat.com -Description: - Attribute group to describe the magic bits that go into - perf_event_attr::config[012] for a particular pmu. - Each attribute of this group defines the 'hardware' bitmask - we want to export, so that userspace can deal with sane - name/value pairs. + +Description: Define formats for bit ranges in perf_event_attr + + Attribute group to describe the magic bits that go + into struct perf_event_attr for a particular pmu. Bit + range may be any bit mask of an u64 (bits 0 to 63).
Userspace must be prepared for the possibility that attributes define overlapping bit ranges. For example: @@ -15,6 +16,26 @@ Description: attr2 = 'config:0-7' attr3 = 'config:12-35'
- Example: 'config1:1,6-10,44' - Defines contents of attribute that occupies bits 1,6-10,44 of - perf_event_attr::config1. + Syntax Description + + config[012]*:<bits> Each attribute of this group + defines the 'hardware' bitmask + we want to export, so that + userspace can deal with sane + name/value pairs. + + attr<index>:<bits> Set any field of the event + attribute. The index is a + decimal number that specifies + the u64 value to be set within + struct perf_event_attr. + + Examples: + + 'config1:1,6-10,44' Defines contents of attribute + that occupies bits 1,6-10,44 + of perf_event_attr::config1. + + 'attr5:23' Define the persistent event + flag (bit 23 of the attribute + flags) diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 96201c1..8be7c05 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -17,8 +17,10 @@ struct pers_event_desc { struct pers_event { char *name; struct perf_event_attr attr; + struct perf_pmu_events_attr sysfs; };
+static struct pmu persistent_pmu; static DEFINE_PER_CPU(struct list_head, pers_events); static DEFINE_PER_CPU(struct mutex, pers_events_lock);
@@ -137,6 +139,8 @@ unwind: return PTR_ERR(event); }
+static int pers_event_sysfs_register(struct pers_event *event); + int perf_add_persistent_event_by_id(char* name, int id) { struct pers_event *event; @@ -150,6 +154,8 @@ int perf_add_persistent_event_by_id(char* name, int id) if (!event->name) goto fail;
+ event->sysfs.id = id; + attr = &event->attr; attr->sample_period = 1; attr->wakeup_events = 1; @@ -163,6 +169,8 @@ int perf_add_persistent_event_by_id(char* name, int id) if (ret) goto fail;
+ pers_event_sysfs_register(event); + return 0; fail: kfree(event->name); @@ -207,12 +215,57 @@ static struct attribute_group persistent_format_group = { .attrs = persistent_format_attrs, };
+#define MAX_EVENTS 16 + +static struct attribute *persistent_events_attr[MAX_EVENTS + 1] = { }; + +static struct attribute_group persistent_events_group = { + .name = "events", + .attrs = persistent_events_attr, +}; + static const struct attribute_group *persistent_attr_groups[] = { &persistent_format_group, + NULL, /* placeholder: &persistent_events_group */ NULL, }; +#define EVENTS_GROUP (persistent_attr_groups[1])
-static struct pmu persistent_pmu; +static ssize_t pers_event_sysfs_show(struct device *dev, + struct device_attribute *__attr, char *page) +{ + struct perf_pmu_events_attr *attr = + container_of(__attr, struct perf_pmu_events_attr, attr); + return sprintf(page, "persistent,config=%lld", + (unsigned long long)attr->id); +} + +static int pers_event_sysfs_register(struct pers_event *event) +{ + struct device_attribute *attr = &event->sysfs.attr; + int idx; + + *attr = (struct device_attribute)__ATTR(, 0444, pers_event_sysfs_show, + NULL); + attr->attr.name = event->name; + + /* add sysfs attr to events: */ + for (idx = 0; idx < MAX_EVENTS; idx++) { + if (!cmpxchg(persistent_events_attr + idx, NULL, &attr->attr)) + break; + } + + if (idx >= MAX_EVENTS) + return -ENOSPC; + if (!idx) + EVENTS_GROUP = &persistent_events_group; + if (!persistent_pmu.dev) + return 0; /* sysfs not yet initialized */ + if (idx) + return sysfs_update_group(&persistent_pmu.dev->kobj, + EVENTS_GROUP); + return sysfs_create_group(&persistent_pmu.dev->kobj, EVENTS_GROUP); +}
static int persistent_pmu_init(struct perf_event *event) {
From: Robert Richter robert.richter@linaro.org
Usually closing an fd also releases the event. For persistent events this is different, as the events should stay permanently enabled in the system. Use reference counting to avoid releasing an event when an fd is closed. This also allows a single persistent event to have multiple users (open file descriptors).
While at it, we don't need desc->fd any longer. The fd is attached to a task and reference counting keeps the event alive. Remove desc->fd.
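The reference counting scheme can be sketched in userspace with C11 atomics. This is only an illustration under stated assumptions, not the kernel implementation: it assumes that the persistent event list holds one initial reference, that each open fd holds one more, and that the event is released when the last reference is dropped. struct event, event_init(), ref_get_not_zero() and ref_put() are hypothetical names. ref_get_not_zero() mimics the semantics of atomic_long_inc_not_zero().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct event {
	atomic_long refcount;
	bool released;		/* set when the last reference is dropped */
};

/* The persistent event list owns the initial reference. */
static void event_init(struct event *e)
{
	atomic_init(&e->refcount, 1);
	e->released = false;
}

/* Take a reference unless the count has already dropped to zero. */
static bool ref_get_not_zero(struct event *e)
{
	long old = atomic_load(&e->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; whoever drops the last one releases the event. */
static void ref_put(struct event *e)
{
	if (atomic_fetch_sub(&e->refcount, 1) == 1)
		e->released = true;
}
```

With two open fds the count is 3. Closing one fd and removing the event from the list leaves the event alive for the remaining fd, and only the final put releases it. After that, a new reference can no longer be taken.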
Signed-off-by: Robert Richter robert.richter@linaro.org Signed-off-by: Robert Richter rric@kernel.org --- kernel/events/persistent.c | 46 ++++++++++++++++++++++++++++++++++++---------- 1 file changed, 36 insertions(+), 10 deletions(-)
diff --git a/kernel/events/persistent.c b/kernel/events/persistent.c index 8be7c05..dd20b55 100644 --- a/kernel/events/persistent.c +++ b/kernel/events/persistent.c @@ -11,7 +11,6 @@ struct pers_event_desc { struct perf_event *event; struct list_head plist; - int fd; };
struct pers_event { @@ -88,6 +87,18 @@ out: return event; }
+static void detach_persistent_event(struct pers_event_desc *desc) +{ + list_del(&desc->plist); + kfree(desc); +} + +static void release_persistent_event(struct perf_event *event) +{ + perf_event_disable(event); + perf_event_release_kernel(event); +} + static void del_persistent_event(int cpu, struct perf_event_attr *attr) { struct pers_event_desc *desc; @@ -100,12 +111,14 @@ static void del_persistent_event(int cpu, struct perf_event_attr *attr) goto out; event = desc->event;
- list_del(&desc->plist); - - perf_event_disable(event); - perf_event_release_kernel(event); - put_unused_fd(desc->fd); - kfree(desc); + /* + * We primarily want to remove desc from the list. If there + * are no open files, the refcount is 0 and we need to release + * the event too. + */ + detach_persistent_event(desc); + if (atomic_long_dec_and_test(&event->refcount)) + release_persistent_event(event); out: mutex_unlock(&per_cpu(pers_events_lock, cpu)); } @@ -182,6 +195,7 @@ fail: int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) { struct pers_event_desc *desc; + struct perf_event *event; int event_fd = -ENODEV;
if (cpu >= (unsigned)nr_cpu_ids) @@ -190,13 +204,25 @@ int perf_get_persistent_event_fd(unsigned cpu, struct perf_event_attr *attr) mutex_lock(&per_cpu(pers_events_lock, cpu));
desc = get_persistent_event(cpu, attr); - if (!desc) + + /* Increment refcount to keep event on put_event() */ + if (!desc || !atomic_long_inc_not_zero(&desc->event->refcount)) goto out;
event_fd = anon_inode_getfd("[pers_event]", &perf_fops, desc->event, O_RDONLY); - if (event_fd >= 0) - desc->fd = event_fd; + + if (event_fd < 0) { + event = desc->event; + if (WARN_ON(atomic_long_dec_and_test(&event->refcount))) { + /* + * May not happen since decrementing refcount is + * protected by pers_events_lock. + */ + detach_persistent_event(desc); + release_persistent_event(event); + } + } out: mutex_unlock(&per_cpu(pers_events_lock, cpu));