v3: - Amends the commit log for patch #1 per Johan's suggestion. - Link to v2: https://lore.kernel.org/r/20240716-linux-next-24-07-13-camss-fixes-v2-0-e60c...
v2: - Updates commits with Johan's Review/Reported tags - Adds Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com - Cc's stable - Adds in suggested kernel log to allow others to more easily match kernel log to fixes - Link to v1: https://lore.kernel.org/r/20240714-linux-next-24-07-13-camss-fixes-v1-0-8f89...
V1: Dogfooding with SoftISP has uncovered two bugs in this series which I'm posting fixes for.
- The first error: A simple race condition which to be honest I'm surprised I haven't found earlier nor has anybody else. Simply stated the order we typically end up loading CAMSS on boot has masked out the pm_runtime_enable() race condition that has been present in CAMSS for a long time.
If you blacklist qcom-camss in modules.d and then modprobe after boot, the race condition shows up easily.
Moving the pm_runtime_enable prior to subdevice registration fixes the problem.
The second error: Nomenclature: - CSIPHY: CSI Physical layer analogue to digital domain serialiser - CSID: CSI Decoder - VFE: Video Front End - RDI: Raw Data Interface - VC: Virtual Channel
In order to support streaming multiple virtual-channels on the same RDI a V4L2 provided use_count variable is used to decide whether or not to actually terminate streaming and release buffers for 'msm_vfe_rdiX'.
Unfortunately use_count indicates the number of times msm_vfe_rdiX has been opened by user-space not the number of concurrent streams on msm_vfe_rdiX.
Simply stated use_count and stream_count are two different things.
The silicon enabling code to select between VCs is valid but, a different solution needs to be found to support _concurrent_ VC streams.
Right now the upstream use_count as-is is breaking the non concurrent VC case and I don't believe there are upstream users of concurrent VCs on CAMSS.
This series implements a revert for the invalid use_count check, retaining the ability to select which VC is active on the RDI.
Dogfooding with libcamera's SoftISP in Hangouts, Zoom and multiple runs of libcamera's "qcam" application is a very different test-case to the simple capture of frames we previously did when validating the 'use_count' change.
A partial revert in expectation of a renewed push to fixup that concurrent VC issue is included.
Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org --- Bryan O'Donoghue (2): media: qcom: camss: Remove use_count guard in stop_streaming media: qcom: camss: Fix ordering of pm_runtime_enable
drivers/media/platform/qcom/camss/camss-video.c | 6 ------ drivers/media/platform/qcom/camss/camss.c | 5 +++-- 2 files changed, 3 insertions(+), 8 deletions(-) --- base-commit: c6ce8f9ab92edc9726996a0130bfc1c408132d47 change-id: 20240713-linux-next-24-07-13-camss-fixes-fa98c0965a5d
Best regards,
The use_count check was introduced so that multiple concurrent Raw Data Interfaces RDIs could be driven by different virtual channels VCs on the CSIPHY input driving the video pipeline.
This is an invalid use of use_count though as use_count pertains to the number of times a video entity has been opened by user-space not the number of active streams.
If use_count and stream-on count don't agree then stop_streaming() will break as is currently the case and has become apparent when using CAMSS with libcamera's released softisp 0.3.
The use of use_count like this is a bit hacky and right now breaks regular usage of CAMSS for a single stream case. Stopping qcam results in the splat below, and then it cannot be started again and any attempts to do so fails with -EBUSY.
[ 1265.509831] WARNING: CPU: 5 PID: 919 at drivers/media/common/videobuf2/videobuf2-core.c:2183 __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common] ... [ 1265.510630] Call trace: [ 1265.510636] __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common] [ 1265.510648] vb2_core_streamoff+0x24/0xcc [videobuf2_common] [ 1265.510660] vb2_ioctl_streamoff+0x5c/0xa8 [videobuf2_v4l2] [ 1265.510673] v4l_streamoff+0x24/0x30 [videodev] [ 1265.510707] __video_do_ioctl+0x190/0x3f4 [videodev] [ 1265.510732] video_usercopy+0x304/0x8c4 [videodev] [ 1265.510757] video_ioctl2+0x18/0x34 [videodev] [ 1265.510782] v4l2_ioctl+0x40/0x60 [videodev] ... [ 1265.510944] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 0 in active state [ 1265.511175] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 1 in active state [ 1265.511398] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 2 in active st
One CAMSS specific way to handle multiple VCs on the same RDI might be:
- Reference count each pipeline enable for CSIPHY, CSID, VFE and RDIx. - The video buffers are already associated with msm_vfeN_rdiX so release video buffers when told to do so by stop_streaming. - Only release the power-domains for the CSIPHY, CSID and VFE when their internal refcounts drop.
Either way refusing to release video buffers based on use_count is erroneous and should be reverted. The silicon enabling code for selecting VCs is perfectly fine. Its a "known missing feature" that concurrent VCs won't work with CAMSS right now.
Initial testing with this code didn't show an error but, SoftISP and "real" usage with Google Hangouts breaks the upstream code pretty quickly, we need to do a partial revert and take another pass at VCs.
This commit partially reverts commit 89013969e232 ("media: camss: sm8250: Pipeline starting and stopping for multiple virtual channels")
Fixes: 89013969e232 ("media: camss: sm8250: Pipeline starting and stopping for multiple virtual channels") Reported-by: Johan Hovold johan+linaro@kernel.org Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com/ Tested-by: Johan Hovold johan+linaro@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org --- drivers/media/platform/qcom/camss/camss-video.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-video.c b/drivers/media/platform/qcom/camss/camss-video.c index cd72feca618c..3b8fc31d957c 100644 --- a/drivers/media/platform/qcom/camss/camss-video.c +++ b/drivers/media/platform/qcom/camss/camss-video.c @@ -297,12 +297,6 @@ static void video_stop_streaming(struct vb2_queue *q)
ret = v4l2_subdev_call(subdev, video, s_stream, 0);
- if (entity->use_count > 1) { - /* Don't stop if other instances of the pipeline are still running */ - dev_dbg(video->camss->dev, "Video pipeline still used, don't stop streaming.\n"); - return; - } - if (ret) { dev_err(video->camss->dev, "Video pipeline stop failed: %d\n", ret); return;
On 7/29/24 15:42, Bryan O'Donoghue wrote:
The use_count check was introduced so that multiple concurrent Raw Data Interfaces RDIs could be driven by different virtual channels VCs on the CSIPHY input driving the video pipeline.
This is an invalid use of use_count though as use_count pertains to the number of times a video entity has been opened by user-space not the number of active streams.
If use_count and stream-on count don't agree then stop_streaming() will break as is currently the case and has become apparent when using CAMSS with libcamera's released softisp 0.3.
The use of use_count like this is a bit hacky and right now breaks regular usage of CAMSS for a single stream case. Stopping qcam results in the splat below, and then it cannot be started again and any attempts to do so fails with -EBUSY.
[ 1265.509831] WARNING: CPU: 5 PID: 919 at drivers/media/common/videobuf2/videobuf2-core.c:2183 __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common] ... [ 1265.510630] Call trace: [ 1265.510636] __vb2_queue_cancel+0x230/0x2c8 [videobuf2_common] [ 1265.510648] vb2_core_streamoff+0x24/0xcc [videobuf2_common] [ 1265.510660] vb2_ioctl_streamoff+0x5c/0xa8 [videobuf2_v4l2] [ 1265.510673] v4l_streamoff+0x24/0x30 [videodev] [ 1265.510707] __video_do_ioctl+0x190/0x3f4 [videodev] [ 1265.510732] video_usercopy+0x304/0x8c4 [videodev] [ 1265.510757] video_ioctl2+0x18/0x34 [videodev] [ 1265.510782] v4l2_ioctl+0x40/0x60 [videodev] ... [ 1265.510944] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 0 in active state [ 1265.511175] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 1 in active state [ 1265.511398] videobuf2_common: driver bug: stop_streaming operation is leaving buffer 2 in active st
One CAMSS specific way to handle multiple VCs on the same RDI might be:
- Reference count each pipeline enable for CSIPHY, CSID, VFE and RDIx.
- The video buffers are already associated with msm_vfeN_rdiX so release video buffers when told to do so by stop_streaming.
- Only release the power-domains for the CSIPHY, CSID and VFE when their internal refcounts drop.
Either way refusing to release video buffers based on use_count is erroneous and should be reverted. The silicon enabling code for selecting VCs is perfectly fine. Its a "known missing feature" that concurrent VCs won't work with CAMSS right now.
Initial testing with this code didn't show an error but, SoftISP and "real" usage with Google Hangouts breaks the upstream code pretty quickly, we need to do a partial revert and take another pass at VCs.
This commit partially reverts commit 89013969e232 ("media: camss: sm8250: Pipeline starting and stopping for multiple virtual channels")
Fixes: 89013969e232 ("media: camss: sm8250: Pipeline starting and stopping for multiple virtual channels") Reported-by: Johan Hovold johan+linaro@kernel.org Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com/ Tested-by: Johan Hovold johan+linaro@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org
drivers/media/platform/qcom/camss/camss-video.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-video.c b/drivers/media/platform/qcom/camss/camss-video.c index cd72feca618c..3b8fc31d957c 100644 --- a/drivers/media/platform/qcom/camss/camss-video.c +++ b/drivers/media/platform/qcom/camss/camss-video.c @@ -297,12 +297,6 @@ static void video_stop_streaming(struct vb2_queue *q) ret = v4l2_subdev_call(subdev, video, s_stream, 0);
if (entity->use_count > 1) {
/* Don't stop if other instances of the pipeline are still running */
dev_dbg(video->camss->dev, "Video pipeline still used, don't stop streaming.\n");
return;
}
- if (ret) { dev_err(video->camss->dev, "Video pipeline stop failed: %d\n", ret); return;
Reviewed-by: Vladimir Zapolskiy vladimir.zapolskiy@linaro.org
-- Best wishes, Vladimir
pm_runtime_enable() should happen prior to vfe_get() since vfe_get() calls pm_runtime_resume_and_get().
This is a basic race condition that doesn't show up for most users so is not widely reported. If you blacklist qcom-camss in modules.d and then subsequently modprobe the module post-boot it is possible to reliably show this error up.
The kernel log for this error looks like this:
qcom-camss ac5a000.camss: Failed to power up pipeline: -13
Fixes: 02afa816dbbf ("media: camss: Add basic runtime PM support") Reported-by: Johan Hovold johan+linaro@kernel.org Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com/ Tested-by: Johan Hovold johan+linaro@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org --- drivers/media/platform/qcom/camss/camss.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss.c b/drivers/media/platform/qcom/camss/camss.c index 51b1d3550421..d64985ca6e88 100644 --- a/drivers/media/platform/qcom/camss/camss.c +++ b/drivers/media/platform/qcom/camss/camss.c @@ -2283,6 +2283,8 @@ static int camss_probe(struct platform_device *pdev)
v4l2_async_nf_init(&camss->notifier, &camss->v4l2_dev);
+ pm_runtime_enable(dev); + num_subdevs = camss_of_parse_ports(camss); if (num_subdevs < 0) { ret = num_subdevs; @@ -2323,8 +2325,6 @@ static int camss_probe(struct platform_device *pdev) } }
- pm_runtime_enable(dev); - return 0;
err_register_subdevs: @@ -2332,6 +2332,7 @@ static int camss_probe(struct platform_device *pdev) err_v4l2_device_unregister: v4l2_device_unregister(&camss->v4l2_dev); v4l2_async_nf_cleanup(&camss->notifier); + pm_runtime_disable(dev); err_genpd_cleanup: camss_genpd_cleanup(camss);
On 29.07.2024 2:42 PM, Bryan O'Donoghue wrote:
pm_runtime_enable() should happen prior to vfe_get() since vfe_get() calls pm_runtime_resume_and_get().
This is a basic race condition that doesn't show up for most users so is not widely reported. If you blacklist qcom-camss in modules.d and then subsequently modprobe the module post-boot it is possible to reliably show this error up.
The kernel log for this error looks like this:
qcom-camss ac5a000.camss: Failed to power up pipeline: -13
Fixes: 02afa816dbbf ("media: camss: Add basic runtime PM support") Reported-by: Johan Hovold johan+linaro@kernel.org Closes: https://lore.kernel.org/lkml/ZoVNHOTI0PKMNt4_@hovoldconsulting.com/ Tested-by: Johan Hovold johan+linaro@kernel.org Cc: stable@vger.kernel.org Signed-off-by: Bryan O'Donoghue bryan.odonoghue@linaro.org
Reviewed-by: Konrad Dybcio konradybcio@kernel.org
Konrad
linux-stable-mirror@lists.linaro.org