Thanks Greg for the inputs!!
On 5/10/2022 4:30 PM, Greg KH wrote:
The dmabuf file uses get_next_ino()(through dma_buf_getfile() -> alloc_anon_inode()) to get an inode number and uses the same as a directory name under /sys/kernel/dmabuf/buffers/<ino>. This directory is used to collect the dmabuf stats and it is created through dma_buf_stats_setup(). At current, failure to create this directory entry can make the dma_buf_export() to fail.
Now, as the get_next_ino() can definitely give a repetitive inode no causing the directory entry creation to fail with -EEXIST. This is a problem on the systems where dmabuf stats functionality is enabled on the production builds can make the dma_buf_export(), though the dmabuf memory is allocated successfully, to fail just because it couldn't create stats entry.
Then maybe we should not fail the creation path of the kobject fails to be created? It's just for debugging, it should be fine if the creation of it isn't there.
Not creating the debug node under some special cases can make this interface not reliable if one wants to know info about the created dmabuf buffers. Please help in correcting me If my perspective is wrong here.
IIUC, except this -EEXIST condition, under the other conditions (-EINVAL and -ENOMEM) failure is fine. Since, we are going to fix the -EEXIST error in this patch, my opinion is failure in the kobject creation path is acceptable for the reasons: a) The user is expected to pass the valid dmabuf to create the stats node, b) The user can undefine the CONFIG_DMABUF_SYSFS_STATS if he don't want this stats.
This issue we are able to see on the snapdragon system within 13 days where there already exists a directory with inode no "122602" so dma_buf_stats_setup() failed with -EEXIST as it is trying to create the same directory entry.
To make the directory entry as unique, append the inode creation time to the inode. With this change the stats directory entries will be in the format of: /sys/kernel/dmabuf/buffers/<inode no>-<inode creation time in secs>.
As you are changing the format here, shouldn't the Documentation/ABI/ entry for this also be changed?
And what's to keep the seconds field from also being the same?
get_next_ino() just increases the inode number monotonically and return to the caller and it is 'unsigned int' data type. Thus 2 successive calls always generate the different inode_number but can be the same secs value. With inode-secs format, this will be still be a unique string. Say it will be like ino1-sec1 and ino2-sec1.
Now after the inode number overflow and wraps, we may get the ino1 again from the get_next_ino() but then secs will be different i.e. say it may be like ino1-secn and ion2-secn. So, it always be a unique string.
IOW, with secs field added, to get the same inode-secs string, the uint should overflow in the same second which is impossible.
Thanks for pointing out the changes to be done in ABI document. Will do it in the next spin.