Hi Bryan.
On 6/12/25 11:07, Bryan O'Donoghue wrote:
msm_vfe_register_entities loops through each Raw Data Interface input line. For each loop we add video device with its associated pads.
Once a single /dev/video0 node has been populated it is possible for
Here is a typo, /dev/video0 should be replaced by something like /dev/videoX.
camss_find_sensor_pad to run. This routine scans through a list of media entities taking a pointer pad = media_entity->pad[0] and assuming that pointer is always valid.
It is possible for both the enumeration loop in msm_vfe_register_entities() and a call from user-space to run concurrently.
Here comes my insufficient understanding, please explain further.
Per se this concurrent execution shall not lead to the encountered bug, both an initialization of media entity pads by media_entity_pads_init() and a registration of a v4l2 devnode inside msm_video_register() are done under in a proper sequence, aren't they?
From what I read there is no bug stated.
Adding some deliberate sleep code into the loop in msm_vfe_register_entities() and constructing a user-space program to open every /dev/videoX node in a tight continuous loop, quickly shows the following error.
[ 691.074558] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000030 [ 691.074933] Call trace: [ 691.074935] camss_find_sensor_pad+0x74/0x114 [qcom_camss] (P) [ 691.074946] camss_get_pixel_clock+0x18/0x64 [qcom_camss] [ 691.074956] vfe_get+0xc0/0x54c [qcom_camss] [ 691.074968] vfe_set_power+0x58/0xf4c [qcom_camss] [ 691.074978] pipeline_pm_power_one+0x124/0x140 [videodev] [ 691.074986] pipeline_pm_power+0x70/0x100 [videodev] [ 691.074992] v4l2_pipeline_pm_use+0x54/0x90 [videodev] [ 691.074998] v4l2_pipeline_pm_get+0x14/0x20 [videodev] [ 691.075005] video_open+0x74/0xe0 [qcom_camss] [ 691.075014] v4l2_open+0xa8/0x124 [videodev] [ 691.075021] chrdev_open+0xb0/0x21c [ 691.075031] do_dentry_open+0x138/0x4c4 [ 691.075040] vfs_open+0x2c/0xe8 [ 691.075044] path_openat+0x6f0/0x10a0 [ 691.075050] do_filp_open+0xa8/0x164 [ 691.075054] do_sys_openat2+0x94/0x104 [ 691.075058] __arm64_sys_openat+0x64/0xc0 [ 691.075061] invoke_syscall+0x48/0x104 [ 691.075069] el0_svc_common.constprop.0+0x40/0xe0 [ 691.075075] do_el0_svc+0x1c/0x28 [ 691.075080] el0_svc+0x30/0xcc [ 691.075085] el0t_64_sync_handler+0x10c/0x138 [ 691.075088] el0t_64_sync+0x198/0x19c
Taking the vfe->power_lock is not possible since v4l2_device_register_subdev takes the mdev->graph_lock. Later on fops->open takes the mdev->graph_lock followed by vfe_get() -> taking vfe->power_lock.
It's unclear what is the connection between the issue and a call to v4l2_device_register_subdev(), the latter is related to /dev/v4l-subdevX devnodes, but all way above the talk was about /dev/videoX devnodes, no?
Introduce a simple enumeration_complete bool which is false initially and only set true once in our init routine after we complete enumeration.
It might be a fix (what is the bug actually? it's still left unexplained) at the price of the machine state complification, a much better fix would be not to create and expose a non-ready /dev/videoX devnode by calling video_register_device() too early.
If user-space tries to interact with the VFE before complete enumeration it will receive -EAGAIN.
It sounds like a critical change in the kernel to userspace ABI of open(2) syscall for CAMSS V4L2 devnodes, unfortunately... EAGAIN could be received, if open() is called with O_NONBLOCK flag, otherwise the syscall shall be blocked.
I believe a completion of media device entities/pads registration before creating a devnode should solve all the issues in a proper way.
Cc: stable@vger.kernel.org Fixes: 4c98a5f57f90 ("media: camss: Add VFE files") Reported-by: Johan Hovold johan+linaro@kernel.org Closes: https://lore.kernel.org/all/Zwjw6XfVWcufMlqM@hovoldconsulting.com Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org
-- Best wishes, Vladimir