Hi
On 01/03/2022 13:30, Jinlong Mao wrote:
Hi Mike,
On 3/1/2022 9:15 PM, Mike Leach wrote:
Hi,
On Tue, 1 Mar 2022 at 11:42, Jinlong Mao quic_jinlmao@quicinc.com wrote:
On 2/28/2022 10:51 PM, Suzuki K Poulose wrote:
...
Hi Suzuki,
This issue happens when race condition happens. The condition is that the device and its child_device's probe happens at the same time.
For example: device0 and its child device device1. Both of them are calling coresight_register function. device0 is calling coresight_fixup_device_conns. device1 is waiting for device0 to release the coresight_mutex. Because device1's csdev node is allocated, coresight_make_links will be called for device0. Then in coresight_add_sysfs_link, has_conns_grp is true for device0, but has_conns_grp is false for device1 as has_conns_grp is set to true in coresight_create_conns_sysfs_group . The probe of device0 will fail for at this condition.
struct coresight_device *coresight_register(struct coresight_desc *desc) { ......... mutex_lock(&coresight_mutex);
ret = coresight_create_conns_sysfs_group(csdev); if (!ret) ret = coresight_fixup_device_conns(csdev); if (!ret) ret = coresight_fixup_orphan_conns(csdev); if (!ret && cti_assoc_ops && cti_assoc_ops->add) cti_assoc_ops->add(csdev);
mutex_unlock(&coresight_mutex);
.........
}
static int coresight_fixup_device_conns(struct coresight_device *csdev) { .......... conn->child_dev = coresight_find_csdev_by_fwnode(conn->child_fwnode);
The issue appears to be a constraint hidden in the lower layers of the code. Would a better solution not be to alter the code here:
if (conn->child_dev && conn->child_dev->has_conns_grp) { ... } else { csdev->orphan = true; }
which would mean that the connection attempt would drop through to label the connection as an orphan, to be cleaned up by the child itself when it runs coresight_fixup_orphan_conns()
Tnanks Mike, I think that is a good solution. Alternatively, we could make sure that device_register() and the fixup following that are atomic.
i.e.
mutex_lock()
device_register() fixup_connections() create_sysfs()
mutex_unlock();
The fix may be a bit invasive than Mike's proposal, but it makes sure we don't end up with half baked device on the coresight-bus.
Suzuki