Current usage of kvm_io_device requires users to destruct it with an extra call of kvm_iodevice_destructor after the device gets unregistered from the kvm_io_bus. This is not necessary and can cause errors if a user forgot to make the extra call.
Simplify the usage by combining kvm_iodevice_destructor into kvm_io_bus_unregister_dev. This reduces LOCs a bit for users and can avoid the leakage of destructing the device explicitly.
The fix was originally provided by Sean Christopherson. Link: https://lore.kernel.org/lkml/DS0PR11MB6373F27D0EE6CD28C784478BDCEC9@DS0PR11M... Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed") Cc: stable@vger.kernel.org Reported-by: ��������� liujingfeng@qianxin.com Signed-off-by: Wei Wang wei.w.wang@intel.com --- include/kvm/iodev.h | 6 ------ virt/kvm/coalesced_mmio.c | 1 - virt/kvm/eventfd.c | 1 - virt/kvm/kvm_main.c | 24 +++++++++++++++--------- 4 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/include/kvm/iodev.h b/include/kvm/iodev.h index d75fc4365746..56619e33251e 100644 --- a/include/kvm/iodev.h +++ b/include/kvm/iodev.h @@ -55,10 +55,4 @@ static inline int kvm_iodevice_write(struct kvm_vcpu *vcpu, : -EOPNOTSUPP; }
-static inline void kvm_iodevice_destructor(struct kvm_io_device *dev) -{ - if (dev->ops->destructor) - dev->ops->destructor(dev); -} - #endif /* __KVM_IODEV_H__ */ diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 0be80c213f7f..d7135a5e76f8 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -195,7 +195,6 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm, */ if (r) break; - kvm_iodevice_destructor(&dev->dev); } }
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2a3ed401ce46..1b277afb545b 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -898,7 +898,6 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx, bus = kvm_get_bus(kvm, bus_idx); if (bus) bus->ioeventfd_count--; - ioeventfd_release(p); ret = 0; break; } diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 13e88297f999..582757ebdce6 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5200,6 +5200,12 @@ static struct notifier_block kvm_reboot_notifier = { .priority = 0, };
+static void kvm_iodevice_destructor(struct kvm_io_device *dev) +{ + if (dev->ops->destructor) + dev->ops->destructor(dev); +} + static void kvm_io_bus_destroy(struct kvm_io_bus *bus) { int i; @@ -5423,7 +5429,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, struct kvm_io_device *dev) { - int i, j; + int i; struct kvm_io_bus *new_bus, *bus;
lockdep_assert_held(&kvm->slots_lock); @@ -5453,18 +5459,18 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu);
- /* Destroy the old bus _after_ installing the (null) bus. */ + /* + * If (null) bus is installed, destroy the old bus, including all the + * attached devices. Otherwise, destroy the caller's device only. + */ if (!new_bus) { pr_err("kvm: failed to shrink bus, removing it completely\n"); - for (j = 0; j < bus->dev_count; j++) { - if (j == i) - continue; - kvm_iodevice_destructor(bus->range[j].dev); - } + kvm_io_bus_destroy(bus); + return -ENOMEM; }
- kfree(bus); - return new_bus ? 0 : -ENOMEM; + kvm_iodevice_destructor(dev); + return 0; }
struct kvm_io_device *kvm_io_bus_get_dev(struct kvm *kvm, enum kvm_bus bus_idx,
On Thu, Dec 29, 2022, Wei Wang wrote:
Current usage of kvm_io_device requires users to destruct it with an extra call of kvm_iodevice_destructor after the device gets unregistered from the kvm_io_bus. This is not necessary and can cause errors if a user forgot to make the extra call.
Simplify the usage by combining kvm_iodevice_destructor into kvm_io_bus_unregister_dev. This reduces LOCs a bit for users and can avoid the leakage of destructing the device explicitly.
The fix was originally provided by Sean Christopherson. Link: https://lore.kernel.org/lkml/DS0PR11MB6373F27D0EE6CD28C784478BDCEC9@DS0PR11M... Fixes: 5d3c4c79384a ("KVM: Stop looking for coalesced MMIO zones if the bus is destroyed") Cc: stable@vger.kernel.org Reported-by: 柳菁峰 liujingfeng@qianxin.com Signed-off-by: Wei Wang wei.w.wang@intel.com
include/kvm/iodev.h | 6 ------ virt/kvm/coalesced_mmio.c | 1 - virt/kvm/eventfd.c | 1 - virt/kvm/kvm_main.c | 24 +++++++++++++++--------- 4 files changed, 15 insertions(+), 17 deletions(-)
diff --git a/include/kvm/iodev.h b/include/kvm/iodev.h index d75fc4365746..56619e33251e 100644 --- a/include/kvm/iodev.h +++ b/include/kvm/iodev.h @@ -55,10 +55,4 @@ static inline int kvm_iodevice_write(struct kvm_vcpu *vcpu, : -EOPNOTSUPP; } -static inline void kvm_iodevice_destructor(struct kvm_io_device *dev) -{
- if (dev->ops->destructor)
dev->ops->destructor(dev);
-}
#endif /* __KVM_IODEV_H__ */ diff --git a/virt/kvm/coalesced_mmio.c b/virt/kvm/coalesced_mmio.c index 0be80c213f7f..d7135a5e76f8 100644 --- a/virt/kvm/coalesced_mmio.c +++ b/virt/kvm/coalesced_mmio.c @@ -195,7 +195,6 @@ int kvm_vm_ioctl_unregister_coalesced_mmio(struct kvm *kvm, */ if (r) break;
kvm_iodevice_destructor(&dev->dev);
The comment lurking above this is now stale.
}
} diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2a3ed401ce46..1b277afb545b 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -898,7 +898,6 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx, bus = kvm_get_bus(kvm, bus_idx); if (bus) bus->ioeventfd_count--;
ret = 0; break; }ioeventfd_release(p);
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 13e88297f999..582757ebdce6 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -5200,6 +5200,12 @@ static struct notifier_block kvm_reboot_notifier = { .priority = 0, }; +static void kvm_iodevice_destructor(struct kvm_io_device *dev) +{
- if (dev->ops->destructor)
dev->ops->destructor(dev);
+}
static void kvm_io_bus_destroy(struct kvm_io_bus *bus) { int i; @@ -5423,7 +5429,7 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, struct kvm_io_device *dev) {
- int i, j;
- int i; struct kvm_io_bus *new_bus, *bus;
lockdep_assert_held(&kvm->slots_lock); @@ -5453,18 +5459,18 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu);
- /* Destroy the old bus _after_ installing the (null) bus. */
- /*
* If (null) bus is installed, destroy the old bus, including all the
* attached devices. Otherwise, destroy the caller's device only.
if (!new_bus) { pr_err("kvm: failed to shrink bus, removing it completely\n");*/
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
continue;
kvm_iodevice_destructor(bus->range[j].dev);
}
kvm_io_bus_destroy(bus);
return -ENOMEM;
Returning an error code is unnecessary if unregister_dev() destroys the bus. Nothing ultimately consumes the result, e.g. kvm_vm_ioctl_unregister_coalesced_mmio() intentionally ignores the result other than to bail from the loop, and destroying the bus means it will immediately bail from the loop anyways.
}
- kfree(bus);
- return new_bus ? 0 : -ENOMEM;
- kvm_iodevice_destructor(dev);
- return 0;
Unless I'm misreading things, this path leaks "bus".
Given that that intent is to send the fix for stable, that this is as much a cleanup as it is a bug fix, and that it's not super trivial, I'm inclined to queue my patch and then land this on top as cleanup.
On 1/24/23 00:25, Sean Christopherson wrote:
On Thu, Dec 29, 2022, Wei Wang wrote:
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2a3ed401ce46..1b277afb545b 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -898,7 +898,6 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx, bus = kvm_get_bus(kvm, bus_idx); if (bus) bus->ioeventfd_count--;
ret = 0; break; }ioeventfd_release(p);
I was wondering: would it make sense to simplify from list_for_each_entry_safe() to list_for_each_entry() in this loop?
@@ -5453,18 +5459,18 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu);
- /* Destroy the old bus _after_ installing the (null) bus. */
- /*
* If (null) bus is installed, destroy the old bus, including all the
* attached devices. Otherwise, destroy the caller's device only.
if (!new_bus) { pr_err("kvm: failed to shrink bus, removing it completely\n");*/
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
continue;
kvm_iodevice_destructor(bus->range[j].dev);
}
kvm_io_bus_destroy(bus);
return -ENOMEM;
Returning an error code is unnecessary if unregister_dev() destroys the bus. Nothing ultimately consumes the result, e.g. kvm_vm_ioctl_unregister_coalesced_mmio() intentionally ignores the result other than to bail from the loop, and destroying the bus means it will immediately bail from the loop anyways.
But it is important to know _if_ the bus was destroyed, right? IOW, doesn't your comment from commit 5d3c4c79384a still hold?
(...) But, it doesn't tell the caller that it obliterated the bus and invoked the destructor for all devices that were on the bus. In the coalesced MMIO case, this can result in a deleted list entry dereference due to attempting to continue iterating on coalesced_zones after future entries (in the walk) have been deleted.
Michal
On Tue, Jan 24, 2023, Michal Luczaj wrote:
On 1/24/23 00:25, Sean Christopherson wrote:
On Thu, Dec 29, 2022, Wei Wang wrote:
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 2a3ed401ce46..1b277afb545b 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -898,7 +898,6 @@ kvm_deassign_ioeventfd_idx(struct kvm *kvm, enum kvm_bus bus_idx, bus = kvm_get_bus(kvm, bus_idx); if (bus) bus->ioeventfd_count--;
ret = 0; break; }ioeventfd_release(p);
I was wondering: would it make sense to simplify from list_for_each_entry_safe() to list_for_each_entry() in this loop?
Ooh, yeah, that's super confusing, at least to me, because the "safe" part implies that the loop processes entries after kvm_io_bus_unregister_dev(), i.e. needs to guard against failure same as the coalesced MMIO case.
Wei, want to tack on a patch in v2?
@@ -5453,18 +5459,18 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu);
- /* Destroy the old bus _after_ installing the (null) bus. */
- /*
* If (null) bus is installed, destroy the old bus, including all the
* attached devices. Otherwise, destroy the caller's device only.
if (!new_bus) { pr_err("kvm: failed to shrink bus, removing it completely\n");*/
for (j = 0; j < bus->dev_count; j++) {
if (j == i)
continue;
kvm_iodevice_destructor(bus->range[j].dev);
}
kvm_io_bus_destroy(bus);
return -ENOMEM;
Returning an error code is unnecessary if unregister_dev() destroys the bus. Nothing ultimately consumes the result, e.g. kvm_vm_ioctl_unregister_coalesced_mmio() intentionally ignores the result other than to bail from the loop, and destroying the bus means it will immediately bail from the loop anyways.
But it is important to know _if_ the bus was destroyed, right? IOW, doesn't your comment from commit 5d3c4c79384a still hold?
/facepalm
Yes, it matters. I somehow got on the train of thought that list_for_each_entry_safe() magically bails if the list is purged, but the safe variant only plays nice with the _current_ entry being deleted.
So yeah, the return code needs to stay.
(...) But, it doesn't tell the caller that it obliterated the bus and invoked the destructor for all devices that were on the bus. In the coalesced MMIO case, this can result in a deleted list entry dereference due to attempting to continue iterating on coalesced_zones after future entries (in the walk) have been deleted.
Michal
On Wednesday, January 25, 2023 4:55 AM, Sean Christopherson wrote:
I was wondering: would it make sense to simplify from list_for_each_entry_safe() to list_for_each_entry() in this loop?
Ooh, yeah, that's super confusing, at least to me, because the "safe" part implies that the loop processes entries after kvm_io_bus_unregister_dev(), i.e. needs to guard against failure same as the coalesced MMIO case.
Wei, want to tack on a patch in v2?
Yes. I will include it in v2.
On Tuesday, January 24, 2023 7:25 AM, Sean Christopherson wrote:
- kfree(bus);
- return new_bus ? 0 : -ENOMEM;
- kvm_iodevice_destructor(dev);
- return 0;
Unless I'm misreading things, this path leaks "bus".
Right, should keep the kfree above.
Given that that intent is to send the fix for stable, that this is as much a cleanup as it is a bug fix, and that it's not super trivial, I'm inclined to queue my patch and then land this on top as cleanup.
Sounds good to me.
linux-stable-mirror@lists.linaro.org