Hi Paolo,
On Wed, Mar 10, 2021 at 8:55 AM Paolo Bonzini pbonzini@redhat.com wrote:
On 10/03/21 01:30, Jing Zhang wrote:
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 383df23514b9..87dd62516c8b 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -3464,6 +3464,51 @@ static long kvm_vcpu_ioctl(struct file *filp, r = kvm_arch_vcpu_ioctl_set_fpu(vcpu, fpu); break; }
case KVM_STATS_GET_INFO: {
struct kvm_stats_info stats_info;
r = -EFAULT;
stats_info.num_stats = VCPU_STAT_COUNT;
if (copy_to_user(argp, &stats_info, sizeof(stats_info)))
goto out;
r = 0;
break;
}
case KVM_STATS_GET_NAMES: {
struct kvm_stats_names stats_names;
r = -EFAULT;
if (copy_from_user(&stats_names, argp, sizeof(stats_names)))
goto out;
r = -EINVAL;
if (stats_names.size < VCPU_STAT_COUNT * KVM_STATS_NAME_LEN)
goto out;
r = -EFAULT;
if (copy_to_user(argp + sizeof(stats_names),
kvm_vcpu_stat_strings,
VCPU_STAT_COUNT * KVM_STATS_NAME_LEN))
The only reason to separate the strings in patch 1 is to pass them here. But this is a poor API because it imposes a limit on the length of the statistics, and makes that length part of the binary interface.
Agreed. I am considering returning the length of stats name strings in the kvm_stats_info structure instead of exporting it as a constant in uapi, which would put no limit on the length of the stats name strings.
I would prefer a completely different interface, where you have a file descriptor that can be created and associated to a vCPU or VM (or even to /dev/kvm). Having a file descriptor is important because the fd can be passed to a less-privileged process that takes care of gathering the metrics
Separate file descriptor solution is very tempting. We are still considering it seriously. Our biggest concern is that the metrics gathering/handling process is not necessary running on the same node as the one file descriptor belongs to. It scales better to pass metrics data directly than to pass file descriptors.
The result of reading the file descriptor could be either ASCII or binary. IMO the real cost lies in opening and reading a multitude of files rather than in the ASCII<->binary conversion.
Agreed.
The format could be one of the following:
- binary:
4 bytes flags (always zero) 4 bytes number of statistics 4 bytes offset of the first stat description 4 bytes offset of the first stat value stat descriptions:
- 4 bytes for the type (for now always zero: uint64_t)
- 4 bytes for the flags (for now always zero)
- length of name
- name
statistics in 64-bit format
- text:
stat1_name uint64 123 stat2_name uint64 456 ...
What do you think?
The binary format presented above is very flexible. I understand why it is organized this way. In our situation, the metrics data could be pulled periodically as short as half second. They are used by different kinds of monitors/triggers/alerts. To enhance efficiency and reduce traffic caused by metrics passing, we treat all metrics info/data as two kinds. One is immutable information, which doesn't change in a given system boot. The other is mutable data (statistics data), which is pulled/transferred periodically at a high frequency.
Paolo
Thanks, Jing