Hi Greg,
On Thu, Jun 17, 2021 at 2:03 AM Greg KH greg@kroah.com wrote:
On Thu, Jun 17, 2021 at 04:41:43AM +0000, Jing Zhang wrote:
Provides a file descriptor per VM to read VM stats info/data. Provides a file descriptor per vCPU to read vCPU stats info/data.
The KVM stats now is only accessible by debugfs, which has some shortcomings this change are supposed to fix:
- Debugfs is not a stable interface for production and it is disabled when kernel Lockdown mode is enabled.
debugfs _could_ be a stable interface if you want it to be and make that rule for your subsystem. Disabling it for lockdown mode is a different issue, and that is a system-wide-policy-decision, not a debugfs-specific thing.
- Debugfs is organized as "one value per file", it is good for debugging, but not supposed to be used for production.
debugfs IS NOT one-value-per-file, you can do whatever you want in there. sysfs IS one-value-per-file, do not get the two confused there.
- Debugfs read/clear in KVM are protected by the global kvm_lock.
That's your implementation issue, not a debugfs issue.
The only "rule" in debugfs is: There are no rules.
So while your subsystem might have issues with using debugfs for statistics like this, that's not debugfs's fault, that's how you want to use the debugfs files for your subsystem.
You are right. The issues are from how the debugfs is used in KVM stats. Will fix the text accordingly.
Besides that, there are some other benefits with this change:
- All KVM VM/VCPU stats can be read out in a bulk by one copy to userspace.
- A schema is used to describe KVM statistics. From userspace's perspective, the KVM statistics are self-describing.
- Fd-based solution provides the possibility that a telemetry can read KVM stats in a less privileged situation.
"possiblity"? Does this work or not? Have you tested it?
I should've said "We are able to read KVM stats in a less privileged process".
+static ssize_t kvm_vm_stats_read(struct file *file, char __user *user_buffer,
size_t size, loff_t *offset)
+{
struct kvm *kvm = file->private_data;
snprintf(&kvm_vm_stats_header.id[0], sizeof(kvm_vm_stats_header.id),
"kvm-%d", task_pid_nr(current));
Why do you write to this static variable for EVERY read? Shouldn't you just do it once at open? How can it change?
Wait, it's a single shared variable, what happens when multiple tasks open this thing and read from it? You race between writing to this variable here and then:
return kvm_stats_read(&kvm_vm_stats_header, &kvm_vm_stats_desc[0],
&kvm->stat, sizeof(kvm->stat), user_buffer, size, offset);
Accessing it here.
So how is this really working?
You are right. We only need to do it once at the open. Will fix it according to Paolo's suggestion.
thanks,
greg k-h
Thanks, Jing