I have a hung task call trace from a debug kernel in case it's helpful: https://gist.github.com/jmontleon/a6dff2ad949cc50bb8f162d7b306b320
On Thu, Feb 9, 2023 at 11:13 AM Jason Montleon jmontleo@redhat.com wrote:
I've done some more digging. Only one line from f2bd1c5ae2cb0cf9525c9bffc0038c12dd7e1338 needs to be reverted, by moving it from snd_hda_codec_device_init back to snd_hda_codec_device_new: codec->core.exec_verb = codec_exec_verb; (https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/sound/...)
I added a bunch of debug statements. With the assignment in snd_hda_codec_device_init, all the code in codec_exec_verb runs at boot, whereas it does not when the assignment is in snd_hda_codec_device_new.
From what I can tell we end up in snd_hda_power_up_pm and then get hung up at snd_hdac_power_up.
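To make the reasoning above easier to follow, here is a toy model in Python, not kernel code. It assumes that verb dispatch checks for an optional per-codec hook and otherwise falls back to a direct bus path, and that the hook powers the codec up before sending the verb. Every name in it (Device, dispatch_verb, hooked_exec_verb, direct_bus_exec) is hypothetical and only stands in for the kernel functions mentioned above.

```python
class Device:
    """Hypothetical stand-in for a codec device with an optional verb hook."""

    def __init__(self):
        self.exec_verb = None  # hook; unset until some setup step assigns it
        self.trace = []        # records which path each verb took


def direct_bus_exec(dev, cmd):
    # Assumed fallback path: send the verb without touching runtime PM.
    dev.trace.append("direct_bus_exec")
    return 0


def hooked_exec_verb(dev, cmd):
    # Assumed hook behavior: power up first, then send the verb.
    dev.trace.append("power_up")
    return direct_bus_exec(dev, cmd)


def dispatch_verb(dev, cmd):
    # Use the hook when it is installed, otherwise take the direct path.
    if dev.exec_verb is not None:
        return dev.exec_verb(dev, cmd)
    return direct_bus_exec(dev, cmd)


# Hook installed early (like the init placement): the first verb powers up.
early = Device()
early.exec_verb = hooked_exec_verb
dispatch_verb(early, 0)

# Hook installed late (like the new placement): the first verb skips power-up.
late = Device()
dispatch_verb(late, 0)
late.exec_verb = hooked_exec_verb
dispatch_verb(late, 0)
```

In this model, the only difference between the two placements is whether verbs sent before the later setup step go through the power-up path, which is consistent with the debug output described above but does not by itself explain the hang.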
There are a bunch of pin port messages that show up from hdac_hdmi_query_port_connlist when things are working, that never appear when broken:
[ 14.618805] HDMI HDA Codec ehdaudio0D2: No connections found for pin:port 5:0
[ 14.619242] HDMI HDA Codec ehdaudio0D2: No connections found for pin:port 5:1
[ 14.619703] HDMI HDA Codec ehdaudio0D2: No connections found for pin:port 5:2
...
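Since the presence of these messages separates a working boot from a broken one, a saved dmesg log can be checked for them mechanically. The following is a minimal sketch, assuming the log lines have exactly the format quoted above; the function name pin_port_messages is made up for illustration.

```python
import re

# Matches the "No connections found" lines in the format quoted above.
PIN_PORT_RE = re.compile(
    r"HDMI HDA Codec (?P<dev>\S+): "
    r"No connections found for pin:port (?P<pin>\d+):(?P<port>\d+)"
)


def pin_port_messages(dmesg_text):
    """Return a list of (device, pin, port) tuples found in dmesg_text."""
    return [
        (m["dev"], int(m["pin"]), int(m["port"]))
        for m in PIN_PORT_RE.finditer(dmesg_text)
    ]
```

An empty result for a given boot would suggest the connection-list query never ran, which matches the broken case described here.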
I do see hdac_hdmi_runtime_suspend run a moment before things go bad, but I have no idea if it is related.
Without patching anything and CONFIG_PM unset everything works.
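For anyone comparing kernel configs across distributions, a small helper can confirm whether CONFIG_PM is set in a given config file. This is a sketch that assumes the usual kernel .config text format ("CONFIG_X=y" or "# CONFIG_X is not set"); where the config text comes from (for example a file under /boot or /proc/config.gz) depends on the distribution and is left to the caller. The function name config_option_state is hypothetical.

```python
def config_option_state(config_text, option):
    """Return the value assigned to option (e.g. 'y' or 'm'), or None if unset or absent."""
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {option} is not set":
            return None
        # Require the '=' so CONFIG_PM does not match CONFIG_PM_SLEEP.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None
```

For example, config_option_state(text, "CONFIG_PM") returning None would correspond to the working configuration described above.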
I don't know if that helps anyone see where the problem is. If not I'll keep plugging away.
Incidentally, commit 3fd63658caed9494cca1d4789a66d3d2def2a0ab, pointed to by my second bisect, starts making use of skl_codec_device_init, where I believe snd_hda_codec_device_init is called, and that starts the problem. I believe this is why reverting either of the two works around the problem.
On Mon, Feb 6, 2023 at 2:57 PM Jason Montleon jmontleo@redhat.com wrote:
On Mon, Feb 6, 2023 at 8:51 AM Jason Montleon jmontleo@redhat.com wrote:
On Mon, Feb 6, 2023 at 4:04 AM Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com wrote:
On 2/4/2023 4:16 PM, Jason Montleon wrote:
I have built kernels for 6.0.19 (I don't think anyone confirmed whether or not it worked), plus every 6.1 tag from 6.1-rc1 up to 6.1.7. 6.0.19 worked. No 6.1 kernels worked. For rc1 to rc5 I built with and without the legacy dai renaming patch added in rc6 that I believe would be necessary, but it made no difference either way.
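The tag-by-tag results can be reduced to a "last good, first bad" pair. Below is a minimal sketch that orders tags by version number and reports where the results flip. The function names are made up, and the ordering is purely by version string: it does not reflect git ancestry, so a stable tag such as v6.0.19 is only an approximate stand-in for a point before v6.1-rc1.

```python
import re


def version_key(tag):
    """Sort key for tags such as v6.0.19, v6.1-rc1, v6.1 and v6.1.7."""
    m = re.fullmatch(r"v?(\d+)\.(\d+)(?:\.(\d+))?(?:-rc(\d+))?", tag)
    if m is None:
        raise ValueError(f"unrecognized tag: {tag}")
    major, minor, patch, rc = m.groups()
    # A release candidate sorts before the final release of the same version.
    rc_rank = int(rc) if rc is not None else float("inf")
    return (int(major), int(minor), int(patch or 0), rc_rank)


def last_good_first_bad(results):
    """results maps tag -> True (audio worked) or False (audio broken)."""
    last_good = None
    for tag in sorted(results, key=version_key):
        if results[tag]:
            last_good = tag
        else:
            return last_good, tag
    return last_good, None
```

With only the results reported in this message (v6.0.19 working, v6.1-rc1 and later broken), the helper would point at the range between v6.0.19 and v6.1-rc1.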
Hi,
Thank you for trying to narrow it down. If I understand correctly, -rc1 doesn't work, which means the problem was introduced somewhere between 6.0 and 6.1-rc1 (just to be sure, can you test 6.0 instead of 6.0.19?). There is one commit I'm a bit suspicious about: ef6f5494faf6a37c74990689a3bb3cee76d2544c. It changes how HDMI is assigned, and since the machine board present on EVE makes use of HDMI, it may cause some problems. Can you try reverting it? (If reverting on top of v6.1.8, you need to revert both f9aafff5448b1d8d457052271cd9a11b24e4d0bd and ef6f5494faf6a37c74990689a3bb3cee76d2544c; the latter has a minor conflict, easily resolved by just adding both lines.)
Yes, happy to give that a shot and will report back.
Removing f9aafff5448b1d8d457052271cd9a11b24e4d0bd and ef6f5494faf6a37c74990689a3bb3cee76d2544c did not make things work.
You may be onto something with pulseaudio and/or HDMI, however. When setting up Slackware I saw an interesting aplay hang. Normally aplay -l will list like this with working audio:
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: kblr55145663max [kbl-r5514-5663-max], device 0: Audio (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: kblr55145663max [kbl-r5514-5663-max], device 2: Headset Audio (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: kblr55145663max [kbl-r5514-5663-max], device 6: Hdmi1 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: kblr55145663max [kbl-r5514-5663-max], device 7: Hdmi2 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
Both on Slackware and Fedora with broken audio it hangs like so (haven't tried on Arch):
$ aplay -l
**** List of PLAYBACK Hardware Devices ****
card 0: kblr55145663max [kbl-r5514-5663-max], device 0: Audio (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: kblr55145663max [kbl-r5514-5663-max], device 2: Headset Audio (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
card 0: kblr55145663max [kbl-r5514-5663-max], device 6: Hdmi1 (*) []
  Subdevices: 1/1
  Subdevice #0: subdevice #0
If I remove or disable pulseaudio it lists without hanging, but it's difficult for me to tell whether it's working since aplay, etc. seem to want pulseaudio to play anything. Shutdown hangs persist regardless.
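For anyone trying to reproduce this on another distribution, the hang can be detected without tying up a terminal by running the listing under a timeout. This is a rough sketch using only the Python standard library; the function name finishes_within and the timeout values are arbitrary choices, not anything prescribed in this thread.

```python
import subprocess
import sys


def finishes_within(cmd, timeout_s):
    """Return True if cmd exits within timeout_s seconds, False if it is still running."""
    proc = subprocess.Popen(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    try:
        proc.wait(timeout=timeout_s)
        return True
    except subprocess.TimeoutExpired:
        # Best effort only: a task blocked inside the kernel may ignore the kill.
        proc.kill()
        try:
            proc.wait(timeout=1)  # reap the child if the kill worked
        except subprocess.TimeoutExpired:
            pass
        return False


# Example (not run here): finishes_within(["aplay", "-l"], 10)
```

A False result for aplay -l would correspond to the hang shown above. The helper deliberately avoids waiting indefinitely after the kill, since a process stuck in the kernel might never exit.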
Also, Slackware with 6.1.9 behaves as badly for me as everything else. If Sasa has working audio I do not know how he has managed to configure it. On each distro, as soon as I add topology and firmware files everything goes bad, regardless of whether I add ucm configuration or not, etc.
I also still wonder why the problem reproduces only on some distributions... Any chance you can try booting with pipewire/pulseaudio disabled and see if it still happens? IIRC those tools try to check all FEs, and this may be breaking something during enumeration.
I can definitely try disabling pulseaudio and switching to pipewire and seeing if anything changes as well.
FWIW, I installed Arch on a thumb drive this weekend and was able to reproduce the issue and work around it by reverting the commit from my first bisect. So, for me it behaves just like Fedora. The instructions for Arch for building a custom kernel are great except they generalize the bootloader instructions, so you need to know what to do at the end to add the grub boot entries, if using grub for example, and I suspect that may be where the confusion came from, though I don't know. I'm trying to get one of the two to reproduce my results to confirm and at least get them a workaround.
I have Slackware on another thumb drive already, but I have yet to even get it updated to 6.1.8.
If any of them behave differently I was hoping to tease out whether it's firmware, kernel config, or something else, but so far the first has been more of the same.
Thanks, Amadeusz
On Wed, Feb 1, 2023 at 9:33 AM Jason Montleon jmontleo@redhat.com wrote:
On Wed, Feb 1, 2023 at 6:05 AM Amadeusz Sławiński amadeuszx.slawinski@linux.intel.com wrote:
>
> On 1/31/2023 4:16 PM, Jason Montleon wrote:
>> On Tue, Jan 31, 2023 at 7:37 AM Cezary Rojewski
>> cezary.rojewski@intel.com wrote:
>>>
>>> On 2023-01-30 1:22 PM, Sasa Ostrouska wrote:
>>>
>>>> Dear Czarek, many thanks for the answer and taking care of it. If
>>>> needed something from my side please jest let me know
>>>> and I will try to do it.
>>>
>>>
>>> Hello Sasa,
>>>
>>> Could you provide us with the topology and firmware binary present on
>>> your machine?
>>>
>>> Audio topology is located at /lib/firmware and named:
>>>
>>> 9d71-GOOGLE-EVEMAX-0-tplg.bin
>>> -or-
>>> dfw_sst.bin
>>>
>>> Firmware on the other hand is found in /lib/firmware/intel/.
>>> 'dsp_fw_kbl.bin' will lie there, it shall be a symlink pointing to an
>>> actual AudioDSP firmware binary.
>>>
>> Maybe this is the problem.
>>
>> I think most of us are pulling the topology and firmware from the
>> chromeos recovery images for lack of any other known source, and it
>> looks a little different than this. Those can be downloaded like so:
>> https://gist.github.com/jmontleon/8899cb83138f2653f520fbbcc5b830a0
>>
>> After placing the topology file you'll see these errors and audio will
>> not work until they're also copied in place.
>> snd_soc_skl 0000:00:1f.3: Direct firmware load for
>> dsp_lib_dsm_core_spt_release.bin failed with error -2
>> snd_soc_skl 0000:00:1f.3: Direct firmware load for
>> intel/dsp_fw_C75061F3-F2B2-4DCC-8F9F-82ABB4131E66.bin failed with
>> error -2
>>
>> Once those were in place, up to 6.0.18 audio worked.
>>
>> Is there a better source for the topology file?
>>
>>> The reasoning for these asks is fact that problem stopped reproducing on
>>> our end once we started playing with kernel versions (moved away from
>>> status quo with Fedora). Neither on Lukasz EVE nor on my SKL RVP.
>>> However, we might be using newer configuration files when compared to
>>> equivalent of yours.
>>>
>>> Recent v6.2-rc5 broonie/sound/for-next - no repro
>>> Our internal tree based on Mark's for-next - no repro
>>> 6.1.7 stable [1] - no repro
>>>
>>> Of course we will continue with our attempts. Will notify about the
>>> progress.
>>>
>>>
>>> [1]:
>>> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?h=v...
>>>
>>>
>>> Kind regards,
>>> Czarek
>>>
>>
>>
>
> Hi Jason,
>
> as I understand you've tried to do bisect, can you instead try building
> kernels checking out following tags:
> v6.1 v6.1.1 v6.1.2 v6.1.3 v6.1.4 v6.1.5 v6.1.6
> v6.1.7 v6.1.8
> and report when it stops working, so it narrows scope of what we look
> at? I assume that kernel builds are done using upstream stable kernel
> (from https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/).
>
> Thanks,
> Amadeusz
>

Hi Amadeusz,

Yes, I did the bisects using https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/
The only thing I did to these was add 392cc13c5ec72ccd6bbfb1bc2339502cc59dd285, otherwise audio breaks with the dai not registered error message in dmesg from the rt5514 bug from 6.0 and up. It wasn't added to 6.1 until rc6, I believe. If there's a better way to work around the multiple bugs I can try again, otherwise I will start working on builds from tags and see if I learn anything.
FWIW, I've seen two people complain that Arch isn't working either since it moved to 6.1. For the one who tried, patching out the commit I came to with the first bisect did not restore sound the way it did for me. And yet Sasa reports Slackware is mostly working for him with 6.1.8. I don't know what to make of it, but thought I'd share in case it helps point someone else to something.
https://github.com/jmontleon/pixelbook-fedora/issues/51#issuecomment-1410222...
https://github.com/jmontleon/pixelbook-fedora/issues/51#issuecomment-1410673...
https://github.com/jmontleon/pixelbook-fedora/issues/53#issuecomment-1408699...
Probably less relevant since they aren't from upstream and I know they don't mean as much, but I have tried 6.1.5-6.1.8 Fedora packages for certain, and went back trying several others from koji back into rc builds, although using prebuilt kernels, anything before 6.1-rc6 won't work, as mentioned above. Nothing worked. But as I said I'll build from tags and see if I can learn anything.
Thank you, Jason Montleon
-- Jason Montleon | email: jmontleo@redhat.com Red Hat, Inc. | gpg key: 0x069E3022 Cell: 508-496-0663 | irc: jmontleo / jmontleon