I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and noticed that the dmesg line "Incomplete global flushes, disabling PCID" had disappeared from the log.
That message comes from commit c26b9e193172f48cd0ccc64285337106fb8aa804, which disables PCID support on some broken hardware in arch/x86/mm/init.c:
#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
                              .family = 6,                \
                              .model = _model,            \
                            }
/*
 * INVLPG may not properly flush Global entries
 * on these CPUs when PCIDs are enabled.
 */
static const struct x86_cpu_id invlpg_miss_ids[] = {
        INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
        INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
        INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
        INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
        {}

...

        if (x86_match_cpu(invlpg_miss_ids)) {
                pr_info("Incomplete global flushes, disabling PCID");
                setup_clear_cpu_cap(X86_FEATURE_PCID);
                return;
        }
arch/x86/mm/init.c, which has that code, hasn't changed between 6.1.94 and 6.1.99. However, I found a commit changing how x86_match_cpu() behaves in 6.1.96:

commit 8ab1361b2eae44077fef4adea16228d44ffb860c
Author: Tony Luck <tony.luck@intel.com>
Date:   Mon May 20 15:45:33 2024 -0700

    x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL
I suspect this broke the PCID disabling code in arch/x86/mm/init.c. The commit message says:
"Add a new flags field to struct x86_cpu_id that has a bit set to indicate that this entry in the array is valid. Update X86_MATCH*() macros to set that bit. Change the end-marker check in x86_match_cpu() to just check the flags field for this bit."
But the PCID disabling code in 6.1.99 does not make use of the X86_MATCH*() macros; instead, it defines a new INTEL_MATCH() macro without the X86_CPU_ID_FLAG_ENTRY_VALID flag.
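The effect of the changed end-marker check can be illustrated with a small stand-alone model. This is only a sketch under stated assumptions, not the kernel code: struct cpu_id, FLAG_ENTRY_VALID, HAND_ROLLED(), WITH_VALID_FLAG(), match_old() and match_new() are hypothetical names made up for the illustration, the struct carries only a few of the real fields, and 0x97 is just used as an example model number.

```c
#include <stddef.h>

/*
 * Simplified stand-in for the kernel's struct x86_cpu_id. Only the fields
 * needed for the illustration are modelled; this is not the kernel layout.
 */
struct cpu_id {
	unsigned short vendor;
	unsigned short family;
	unsigned short model;
	unsigned short flags;
};

#define VENDOR_INTEL		0	/* X86_VENDOR_INTEL is 0 in the kernel */
#define FLAG_ENTRY_VALID	0x1	/* stand-in for X86_CPU_ID_FLAG_ENTRY_VALID */

/* Hand-rolled entry in the style of INTEL_MATCH(): the flags field stays 0. */
#define HAND_ROLLED(_model) \
	{ .vendor = VENDOR_INTEL, .family = 6, .model = (_model) }

/* Entry in the style of the updated X86_MATCH*() macros: valid bit set. */
#define WITH_VALID_FLAG(_model) \
	{ .vendor = VENDOR_INTEL, .family = 6, .model = (_model), \
	  .flags = FLAG_ENTRY_VALID }

/* Old end-marker check: walk until an entry whose match fields are all zero. */
const struct cpu_id *match_old(const struct cpu_id *table,
			       unsigned short family, unsigned short model)
{
	const struct cpu_id *m;

	for (m = table; m->vendor | m->family | m->model; m++)
		if (m->family == family && m->model == model)
			return m;
	return NULL;
}

/* New end-marker check: walk until an entry that lacks the valid bit. */
const struct cpu_id *match_new(const struct cpu_id *table,
			       unsigned short family, unsigned short model)
{
	const struct cpu_id *m;

	for (m = table; m->flags & FLAG_ENTRY_VALID; m++)
		if (m->family == family && m->model == model)
			return m;
	return NULL;
}

/* Two tables listing the same model, each ending in an all-zero terminator. */
const struct cpu_id hand_rolled_ids[] = { HAND_ROLLED(0x97), { 0 } };
const struct cpu_id flagged_ids[] = { WITH_VALID_FLAG(0x97), { 0 } };
```

In this model, match_old(hand_rolled_ids, 6, 0x97) finds the entry, while match_new(hand_rolled_ids, 6, 0x97) returns NULL because the very first entry already looks like the end marker. match_new(flagged_ids, 6, 0x97) finds the entry again, which is the behaviour the table in arch/x86/mm/init.c appears to be missing.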
I looked in upstream git and found an existing fix:

commit 2eda374e883ad297bd9fe575a16c1dc850346075
Author: Tony Luck <tony.luck@intel.com>
Date:   Wed Apr 24 11:15:18 2024 -0700

    x86/mm: Switch to new Intel CPU model defines

    New CPU #defines encode vendor and family as well as model.

    [ dhansen: vertically align 0's in invlpg_miss_ids[] ]

    Signed-off-by: Tony Luck <tony.luck@intel.com>
    Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
    Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
    Link: https://lore.kernel.org/all/20240424181518.41946-1-tony.luck%40intel.com

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index 679893ea5e68..6b43b6480354 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -261,21 +261,17 @@ static void __init probe_page_size_mask(void)
        }
 }

-#define INTEL_MATCH(_model) { .vendor = X86_VENDOR_INTEL, \
-                              .family = 6,                \
-                              .model = _model,            \
-                            }
 /*
  * INVLPG may not properly flush Global entries
  * on these CPUs when PCIDs are enabled.
  */
 static const struct x86_cpu_id invlpg_miss_ids[] = {
-       INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
-       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
-       INTEL_MATCH(INTEL_FAM6_ATOM_GRACEMONT ),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+       X86_MATCH_VFM(INTEL_ALDERLAKE,      0),
+       X86_MATCH_VFM(INTEL_ALDERLAKE_L,    0),
+       X86_MATCH_VFM(INTEL_ATOM_GRACEMONT, 0),
+       X86_MATCH_VFM(INTEL_RAPTORLAKE,     0),
+       X86_MATCH_VFM(INTEL_RAPTORLAKE_P,   0),
+       X86_MATCH_VFM(INTEL_RAPTORLAKE_S,   0),
        {}
 };
The fix removes the custom INTEL_MATCH macro and uses the X86_MATCH*() macros, which set X86_CPU_ID_FLAG_ENTRY_VALID. This fix was never backported to 6.1, so it looks like a stable series regression due to a missing backport.
If I apply the fix patch on 6.1.99, the PCID disabling code activates again. I had to change all the INTEL_* definitions to the old definitions to make it build:
 static const struct x86_cpu_id invlpg_miss_ids[] = {
-       INTEL_MATCH(INTEL_FAM6_ALDERLAKE   ),
-       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_L ),
-       INTEL_MATCH(INTEL_FAM6_ALDERLAKE_N ),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE  ),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_P),
-       INTEL_MATCH(INTEL_FAM6_RAPTORLAKE_S),
+       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE,    0),
+       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_L,  0),
+       X86_MATCH_VFM(INTEL_FAM6_ALDERLAKE_N,  0),
+       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE,   0),
+       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_P, 0),
+       X86_MATCH_VFM(INTEL_FAM6_RAPTORLAKE_S, 0),
        {}
 };
I only looked at the code in arch/x86/mm/init.c, so there may be other uses of x86_match_cpu() in the kernel that are also broken in 6.1.99. This email is meant as a bug report, not a pull request. Someone else should confirm the problem and submit the appropriate fix.
[CCing the x86 folks, Greg, and the regressions list]
Hi, Thorsten here, the Linux kernel's regression tracker.
On 30.07.24 18:41, Thomas Lindroth wrote:
> I upgraded from kernel 6.1.94 to 6.1.99 on one of my machines and noticed that the dmesg line "Incomplete global flushes, disabling PCID" had disappeared from the log.
Thomas, thx for the report. FWIW, mainline developers like the x86 folks or Tony are free to focus on mainline and leave stable/longterm series to other people -- some nevertheless help out regularly or occasionally. So with a bit of luck this mail will make one of them care enough to provide a 6.1 version of what you afaics called the "existing fix" in mainline (2eda374e883ad2 ("x86/mm: Switch to new Intel CPU model defines") [v6.10-rc1]) that seems to be missing in 6.1.y. But if not I suspect it might be up to you to prepare and submit a 6.1.y variant of that fix, as you seem to care and are able to test the patch.
Ciao, Thorsten
P.S.:
#regzbot ^introduced 8ab1361b2eae44
#regzbot title x86: Possible missing backport of x86_match_cpu() change
#regzbot ignore-activity
On Wed, Aug 07, 2024 at 10:15:23AM +0200, Thorsten Leemhuis wrote:
Needs to go to 6.6.y first, right? But even then, it does not apply to 6.1.y cleanly, so someone needs to send a backported (and tested) series to us at stable@vger.kernel.org and we will be glad to queue them up then.
thanks,
greg k-h
On Mon, 2024-08-12 at 14:11 +0200, Greg KH wrote:
There are three commits involved.
commit A: 4db64279bc2b ("x86/cpu: Switch to new Intel CPU model defines")
This commit replaces
    X86_MATCH_INTEL_FAM6_MODEL(ANY, 1), /* SNC */
with
    X86_MATCH_VFM(INTEL_ANY, 1), /* SNC */
This is a functional change because the family info is replaced with 0. And this exposes an x86_match_cpu() problem: it breaks when the vendor/family/model/stepping/feature fields are all zeros.

commit B: 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL")
It addresses the x86_match_cpu() problem by introducing a valid flag and setting the flag in the Intel CPU model defines. This fixes commit A, but it actually breaks the x86_cpu_id structures that are constructed without using the Intel CPU model defines, like the one in arch/x86/mm/init.c.

commit C: 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
arch/x86/mm/init.c: broken by commit B but fixed by using the new Intel CPU model defines

In 6.1.99,
    commit A is missing
    commit B is there
    commit C is missing

In 6.6.50,
    commit A is missing
    commit B is there
    commit C is missing

Now we can fix the problem in the stable kernels by converting arch/x86/mm/init.c to use the CPU model defines (even the old style ones). But before that, I'm wondering if we need to backport commit B in the 6.1 and 6.6 stable kernels, because only commit A can expose this problem.
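The problem that commit A exposes, and that commit B addresses, can be sketched with a small stand-alone model. This is an illustration under assumptions rather than the kernel implementation: struct cpu_id, struct cpu, entry_matches(), match_by_zero_terminator() and match_by_valid_flag() are hypothetical names, and the struct omits the steppings and feature fields. The sketch relies on the Intel vendor ID being 0 and on 0 being the wildcard for family and model, so an "any Intel CPU" entry has all of its match fields zero.

```c
#include <stddef.h>

#define VENDOR_INTEL		0	/* X86_VENDOR_INTEL is 0 in the kernel */
#define FAMILY_ANY		0	/* wildcard, like X86_FAMILY_ANY */
#define MODEL_ANY		0	/* wildcard, like X86_MODEL_ANY */
#define FLAG_ENTRY_VALID	0x1	/* stand-in for X86_CPU_ID_FLAG_ENTRY_VALID */

/* Simplified stand-in for struct x86_cpu_id (not the kernel layout). */
struct cpu_id {
	unsigned short vendor, family, model, flags;
	unsigned long driver_data;
};

/* The CPU we are running on, reduced to the fields used for matching. */
struct cpu {
	unsigned short vendor, family, model;
};

/* One table entry matches if vendor agrees and family/model agree or are wildcards. */
int entry_matches(const struct cpu_id *m, const struct cpu *c)
{
	if (c->vendor != m->vendor)
		return 0;
	if (m->family != FAMILY_ANY && c->family != m->family)
		return 0;
	if (m->model != MODEL_ANY && c->model != m->model)
		return 0;
	return 1;
}

/* Old loop: an entry with all match fields zero is taken as the end marker. */
const struct cpu_id *match_by_zero_terminator(const struct cpu_id *table,
					      const struct cpu *c)
{
	const struct cpu_id *m;

	for (m = table; m->vendor | m->family | m->model; m++)
		if (entry_matches(m, c))
			return m;
	return NULL;
}

/* New loop: only an entry without the valid bit ends the table. */
const struct cpu_id *match_by_valid_flag(const struct cpu_id *table,
					 const struct cpu *c)
{
	const struct cpu_id *m;

	for (m = table; m->flags & FLAG_ENTRY_VALID; m++)
		if (entry_matches(m, c))
			return m;
	return NULL;
}

/* "Any Intel CPU" entry carrying driver data, followed by the terminator. */
const struct cpu_id any_intel_ids[] = {
	{ .vendor = VENDOR_INTEL, .family = FAMILY_ANY, .model = MODEL_ANY,
	  .flags = FLAG_ENTRY_VALID, .driver_data = 1 },
	{ 0 }
};

/* Example CPU: Intel, family 6, an arbitrary model number. */
const struct cpu example_cpu = { VENDOR_INTEL, 6, 0x97 };
```

With the zero-terminator loop the "any Intel" entry is mistaken for the end of the table and never matched; with the valid-flag loop it matches and its driver_data is reachable. The flip side, as described above, is that tables built without the flag stop matching under the new loop.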
thanks, rui
On Wed, Sep 18, 2024 at 06:54:33AM +0000, Zhang, Rui wrote:
If so, can you submit the needed backports for us to apply? That's the easiest way for us to take them, thanks.
greg k-h
On Thu, Sep 19, 2024 at 01:19:27PM +0200, gregkh@linuxfoundation.org wrote:
I audited all the uses of x86_match_cpu(match). All callers that construct the `match` argument using the family of X86_MATCH_* macros from arch/x86/include/asm/cpu_device_id.h function correctly because commit B has been backported to v6.1.99 and to v6.6.50 -- 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL").

Only those callers that use their own thing to compose the `match` argument are buggy:
    * arch/x86/mm/init.c
    * drivers/powercap/intel_rapl_msr.c (only in 6.1.99)

Summarizing, v6.1.99 needs these two commits from mainline:
    * d05b5e0baf42 ("powercap: RAPL: fix invalid initialization for pl4_supported field")
    * 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")

v6.6.50 only needs the second commit.
I will submit these backports.
Thanks and BR, Ricardo
On Mon, 2024-09-23 at 19:45 -0700, Ricardo Neri wrote:
Thanks for auditing this. I overlooked the intel_rapl driver case.
> Summarizing, v6.1.99 needs these two commits from mainline
>     * d05b5e0baf42 ("powercap: RAPL: fix invalid initialization for pl4_supported field")
>     * 2eda374e883a ("x86/mm: Switch to new Intel CPU model defines")
>
> v6.6.50 only needs the second commit.
Well, commit B 93022482b294 ("x86/cpu: Fix x86_match_cpu() to match just X86_VENDOR_INTEL") is backported to all stable kernels. And the above two broken cases are also there.
So I suppose we need to backport all of them to the 5.x stable kernels as well.
thanks, rui
On Wed, Sep 25, 2024 at 05:20:41AM +0000, Zhang, Rui wrote:
Indeed, this is the case. It has been backported to v5.15.y and v5.10.y, but not to v5.4.y nor 4.19.y.
I found one more case in those two v5.x versions. I will post the backports.