On 2/11/21 7:23 AM, Cornelia Huck wrote:
On Wed, 10 Feb 2021 15:34:24 -0500 Tony Krowiak <akrowiak@linux.ibm.com> wrote:
On 2/10/21 5:53 AM, Cornelia Huck wrote:
On Tue, 9 Feb 2021 14:48:30 -0500 Tony Krowiak <akrowiak@linux.ibm.com> wrote:
This patch fixes a circular locking dependency, reported by the CI, that was introduced by commit f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated"). The lockdep splat only occurs when starting a Secure Execution guest. Crypto virtualization (vfio_ap) is not yet supported for SE guests; however, this fix is being provided to avoid CI errors.
The circular lockdep dependency was introduced when the masks in the guest's APCB began to be set while holding matrix_dev->lock. While the lock is definitely needed to protect setting and unsetting the KVM pointer, it is not necessarily critical for setting the masks, so the masks will no longer be set under the protection of matrix_dev->lock.
Fixes: f21916ec4826 ("s390/vfio-ap: clean up vfio_ap resources when KVM pointer invalidated")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Krowiak <akrowiak@linux.ibm.com>
 drivers/s390/crypto/vfio_ap_ops.c | 75 ++++++++++++++++++-------------
 1 file changed, 45 insertions(+), 30 deletions(-)
 static void vfio_ap_mdev_unset_kvm(struct ap_matrix_mdev *matrix_mdev)
 {
-	kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
-	matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
-	vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
-	kvm_put_kvm(matrix_mdev->kvm);
-	matrix_mdev->kvm = NULL;
+	if (matrix_mdev->kvm) {
If you're doing setting/unsetting under matrix_dev->lock, is it possible that matrix_mdev->kvm gets unset between here and the next line, as you don't hold the lock?
That is highly unlikely, because the matrix_mdev->kvm pointer is cleared only in this function, which is called from only two places: the notifier that handles the VFIO_GROUP_NOTIFY_SET_KVM notification when the KVM pointer is cleared, and the vfio_ap_mdev_release() function, which is called when the mdev fd is closed (i.e., when the guest is shut down). In practice, with the only end-to-end implementation currently available, the notifier callback is never invoked to clear the KVM pointer, because the vfio_ap_mdev_release() callback is invoked first and unregisters the notifier callback.
Having said that, I suppose there is no guarantee that future userspace clients won't do things in a different order. At the very least, it wouldn't hurt to protect against that, as you suggest below.
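To make the window in the question concrete, here is a single-threaded userspace model, not driver code. The names `struct fake_kvm`, `struct fake_mdev`, `other_caller_clears()` and `check_then_use()` are hypothetical, and a function-pointer hook stands in for the other caller (the SET_KVM notifier or vfio_ap_mdev_release()) being scheduled between the unlocked check and the use. It only shows that an unlocked check does not guarantee the pointer is still set afterwards; it makes no claim that current userspace triggers this ordering.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the real struct kvm / struct ap_matrix_mdev. */
struct fake_kvm {
	int refcount;          /* models the reference held by the mdev */
};

struct fake_mdev {
	struct fake_kvm *kvm;  /* models matrix_mdev->kvm */
};

/* Models "the other caller runs now", e.g. the notifier clearing the
 * pointer while release() sits between its check and its use. */
typedef void (*interleave_fn)(struct fake_mdev *mdev);

static void other_caller_clears(struct fake_mdev *mdev)
{
	if (mdev->kvm) {
		mdev->kvm->refcount--;  /* models kvm_put_kvm() */
		mdev->kvm = NULL;
	}
}

/*
 * Unlocked check-then-use. Returns true if there was nothing to do or the
 * pointer is still set where the real code would dereference it, and false
 * if it vanished in the window after the check.
 */
static bool check_then_use(struct fake_mdev *mdev, interleave_fn between)
{
	assert(mdev != NULL);
	if (!mdev->kvm)
		return true;            /* already cleared: nothing to do */
	if (between)
		between(mdev);          /* simulated preemption point */
	return mdev->kvm != NULL;   /* where kvm_arch_crypto_clear_masks() would run */
}
```

Passing NULL as the hook models the well-behaved ordering described above, in which the pointer survives; passing other_caller_clears models a userspace that triggers the two teardown paths in a different order.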
Yes, if userspace is able to use the interfaces in a certain way, we should always make sure that nothing bad happens if it does so, even if known userspace applications are well-behaved.
[Can we make an 'evil userspace' test program, maybe? The hardware dependency makes this hard to run, though.]
Of course it is possible to create such a test program, but off the top of my head, I can't come up with an algorithm that would result in the scenario you have laid out. I haven't dabbled in the QEMU space in quite some time, so there would also be a bit of a learning curve. I'm not sure it would be worth the effort given how unlikely this scenario is, but I will take it into consideration, as it is a good idea.
Maybe you could
- grab a reference to kvm while holding the lock
- call the mask handling functions with that kvm reference
- lock again, drop the reference, and do the rest of the processing?
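The three steps suggested above can be sketched as a userspace model, under stated assumptions: `struct sketch_kvm`, `struct sketch_mdev`, `dev_lock_acquire()`/`dev_lock_release()`, `sketch_clear_masks()` and `sketch_unset_kvm()` are hypothetical names, a pthread mutex stands in for matrix_dev->lock, plain counter updates stand in for kvm_get_kvm()/kvm_put_kvm(), and queue reset is omitted. The re-check of the pointer after relocking is an addition of this sketch, not part of the suggestion.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these names are kernel APIs. */
struct sketch_kvm {
	int refcount;           /* models kvm_get_kvm()/kvm_put_kvm() */
	bool masks_cleared;     /* set by the mask-handling step */
	bool masks_under_lock;  /* true if that step ran with dev_lock held */
	bool pqap_hook_set;     /* models kvm->arch.crypto.pqap_hook != NULL */
};

struct sketch_mdev {
	struct sketch_kvm *kvm; /* protected by dev_lock */
};

static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER; /* models matrix_dev->lock */
static bool dev_lock_held;  /* bookkeeping, only meaningful single-threaded */

static void dev_lock_acquire(void)
{
	pthread_mutex_lock(&dev_lock);
	dev_lock_held = true;
}

static void dev_lock_release(void)
{
	dev_lock_held = false;
	pthread_mutex_unlock(&dev_lock);
}

/* Models kvm_arch_crypto_clear_masks(); records whether dev_lock was held. */
static void sketch_clear_masks(struct sketch_kvm *kvm)
{
	kvm->masks_under_lock = dev_lock_held;
	kvm->masks_cleared = true;
}

static void sketch_unset_kvm(struct sketch_mdev *mdev)
{
	struct sketch_kvm *kvm;

	/* 1. Under the lock: snapshot the pointer and pin it with a reference. */
	dev_lock_acquire();
	kvm = mdev->kvm;
	if (!kvm) {
		dev_lock_release();
		return;
	}
	kvm->refcount++;            /* models kvm_get_kvm() */
	dev_lock_release();

	/* 2. Mask handling on the pinned reference, with the lock not held. */
	sketch_clear_masks(kvm);

	/* 3. Relock, drop the pin, and finish teardown only if no other caller
	 *    cleared mdev->kvm while the lock was dropped (assumed re-check). */
	dev_lock_acquire();
	assert(kvm->refcount > 0);
	kvm->refcount--;            /* models kvm_put_kvm() of the pin */
	if (mdev->kvm == kvm) {
		kvm->pqap_hook_set = false;
		kvm->refcount--;        /* the reference the mdev itself held */
		mdev->kvm = NULL;
	}
	dev_lock_release();
}
```

The extra reference keeps the object alive while the lock is dropped, but it does not stop another caller from clearing mdev->kvm in that window, which is why the sketch compares the pointer again after relocking. A second call on an already-cleared mdev returns early without touching the counter.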
+		kvm_arch_crypto_clear_masks(matrix_mdev->kvm);
+		mutex_lock(&matrix_dev->lock);
+		matrix_mdev->kvm->arch.crypto.pqap_hook = NULL;
+		vfio_ap_mdev_reset_queues(matrix_mdev->mdev);
+		kvm_put_kvm(matrix_mdev->kvm);
+		matrix_mdev->kvm = NULL;
+		mutex_unlock(&matrix_dev->lock);
+	}
 }