On Fri, 2019-12-06 at 15:15 -0500, Boris Ostrovsky wrote:
On 12/6/19 1:09 PM, Nuernberger, Stefan wrote:
On Fri, 2019-12-06 at 10:11 -0500, Boris Ostrovsky wrote:
On 12/6/19 8:48 AM, Stefan Nuernberger wrote:
From: Uwe Dannowski <uwed@amazon.de>
list_for_each_entry(cfg_entry, &dev_data->config_fields, list) {
Couldn't you have the same race here?
Not quite the same, but it might not be entirely safe yet. quirks_show() takes device_ids_lock and races with unbind / pcistub_device_release(), which takes the device_lock mutex. So this might now be a UAF read access instead of a NULL pointer dereference.
Yes, that's what I meant (although I don't see much difference in this context).
Well, the NULL ptr access causes an instant kernel panic whereas we have not attributed crashes to the possible UAF read until now.
We have not observed adverse effects in our testing (compared to the obvious issues with the NULL pointer), but that's not a guarantee of course.
So should quirks_show actually be protected by pcistub_devices_lock instead as are other functions that access dev_data? Does it need both locks in that case?
device_ids_lock protects the device_ids list, which is not what you are trying to access, so that doesn't look like the right lock to hold. And AFAICT pcistub_devices_lock is not held when device data is cleared in pcistub_device_release() (which I think is where we are racing).
Indeed. The xen_pcibk_quirks list does not have a separate lock to protect it. It's either modified under 'pcistub_devices_lock', from pcistub_remove(), or iterated over with the 'device_ids_lock' held in quirks_show(). Also the quirks list is amended from pcistub_init_device() -> xen_pcibk_config_init_dev() -> xen_pcibk_config_quirks_init() without holding any lock at all. In fact the pcistub_init_devices_late() and pcistub_seize() functions deliberately release the pcistub_devices_lock before calling pcistub_init_device(). This looks broken.
The race is between pcistub_remove() -> pcistub_device_put() -> pcistub_device_release() on one side and the quirks_show() on the other side. The problematic quirk is freed from the xen_pcibk_quirks list in pcistub_remove() early on under pcistub_devices_lock before the associated dev_data is freed eventually. So switching from device_ids_lock to pcistub_devices_lock in quirks_show() could be sufficient to always have valid dev_data for all quirks in the list.
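To make the reasoning concrete, here is a rough userspace model of that idea, not the driver code. All names (devices_lock, quirk_model, model_*) are made up for illustration, a pthread mutex stands in for pcistub_devices_lock, and the model adds entries under the lock, which the real init path currently does not do. The only point is that the remover unlinks the quirk under the same lock the reader holds, so the reader never walks an entry whose dev_data is about to be freed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; these are not the kernel structures. */
struct dev_data_model {
	int config_field_count;
};

struct quirk_model {
	struct quirk_model *next;
	struct dev_data_model *dev_data;	/* freed when the device goes away */
	unsigned int devid;
};

/* Models pcistub_devices_lock and the xen_pcibk_quirks list. */
static pthread_mutex_t devices_lock = PTHREAD_MUTEX_INITIALIZER;
static struct quirk_model *quirks;

/* Assumption: insertion happens under the lock (not true in the driver today). */
static int model_add_device(unsigned int devid, int nfields)
{
	struct quirk_model *q = malloc(sizeof(*q));
	struct dev_data_model *d = malloc(sizeof(*d));

	if (!q || !d) {
		free(q);
		free(d);
		return -1;
	}
	d->config_field_count = nfields;
	q->dev_data = d;
	q->devid = devid;

	pthread_mutex_lock(&devices_lock);
	q->next = quirks;
	quirks = q;
	pthread_mutex_unlock(&devices_lock);
	return 0;
}

/* Models the remove path: unlink under the lock first, free dev_data later. */
static int model_remove_device(unsigned int devid)
{
	struct quirk_model **pp, *q = NULL;

	pthread_mutex_lock(&devices_lock);
	for (pp = &quirks; *pp; pp = &(*pp)->next) {
		if ((*pp)->devid == devid) {
			q = *pp;
			*pp = q->next;
			break;
		}
	}
	pthread_mutex_unlock(&devices_lock);

	if (!q)
		return -1;
	/* No reader can reach q any more, so freeing outside the lock is fine. */
	free(q->dev_data);
	free(q);
	return 0;
}

/* Models the show path: walk the list under the same lock as the remover. */
static int model_quirks_show(void)
{
	struct quirk_model *q;
	int total = 0;

	pthread_mutex_lock(&devices_lock);
	for (q = quirks; q; q = q->next)
		total += q->dev_data->config_field_count;
	pthread_mutex_unlock(&devices_lock);
	return total;
}
```

The model only holds if every path that adds to or removes from the list takes the same lock, which is exactly the part that looks unsettled for the init path described above.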
There is also pcistub_put_pci_dev() possibly in the race, called from xen_pcibk_remove_device(), or xen_pcibk_xenbus_remove(), or pcistub_remove(). The pcistub_remove() call site is safe when we switch to pcistub_devices_lock (same reasoning as above). For the others I currently do not see when the quirks are ever freed?
- Stefan