The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com --- v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@
static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized;
@@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option."); + pat_force_disabled = true; return 0; } early_param("nopat", nopat); @@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); }
-void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0;
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) { + if (!pat || pat_force_disabled) { /* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable). @@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); + } else if (!pat_bp_enabled) { + /* + * In some environments, specifically Xen PV, PAT + * initialization is skipped because MTRRs are disabled even + * though PAT is available. In such environments, set PAT to + * enabled to correctly indicate to callers of pat_enabled() + * that CPU support for PAT is available. + */ + pat_bp_enabled = true; + pr_info("x86/PAT: PAT enabled by init_cache_modes\n"); }
__init_cache_modes(pat);
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Reviewed-by: Juergen Gross jgross@suse.com
Juergen
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I will agree to implement your approach of checking the new variable where you suggest and limiting the effect of "nopat" to not reporting PAT as enabled via pat_enabled() if there is a consensus about what we should log to the administrator in that case and if Juergen and/or Boris agrees with it.
Chuck
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
Jan
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist? Or perhaps, Juergen, could you accept a v3 of my patch that does not need to decide how to backport the change to "nopat" to the stable branches that are affected by the current behavior of "nopat" on Xen?
To do such a v3, I would just have to fix the style problems with my original patch and not come to an agreement with Jan about how to deal with the "nopat" problem.
Chuck
Chuck
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote:
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
By checking the new variable here ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
- } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
Jan
Or perhaps, Juergen, could you accept a v3 of my patch that does not need to decide how to backport the change to "nopat" to the stable branches that are affected by the current behavior of "nopat" on Xen?
To do such a v3, I would just have to fix the style problems with my original patch and not come to an agreement with Jan about how to deal with the "nopat" problem.
Chuck
Chuck
On 13.07.22 15:34, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote: > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) > *Add the necessary code to incorporate the "nopat" fix > *void init_cache_modes(void) -> void __init init_cache_modes(void) > *Add Jan Beulich as Co-developer (Jan has not signed off yet) > *Expand the commit message to include relevant parts of the commit > message of Jan Beulich's proposed patch for this problem > *Fix 'else if ... {' placement and indentation > *Remove indication the backport to stable branches is only back to 5.17.y > > I think these changes address all the comments on the original patch > > I added Jan Beulich as a Co-developer because Juergen Gross asked me to > include Jan's idea for fixing "nopat" that was missing from the first > version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
> @@ -292,7 +294,7 @@ void init_cache_modes(void) > rdmsrl(MSR_IA32_CR_PAT, pat); > } > > - if (!pat) { > + if (!pat || pat_force_disabled) {
By checking the new variable here ...
> /* > * No PAT. Emulate the PAT table that corresponds to the two > * cache bits, PWT (Write Through) and PCD (Cache Disable). > @@ -313,6 +315,16 @@ void init_cache_modes(void) > */ > pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | > PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
> + } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
The question is, whether "nopat" needs to be translated to pat_enabled() returning "false". I've found exactly one case where "nopat" seems to be required for a driver to work correctly, but this driver (drivers/media/pci/ivtv/ivtvfb.c) seems to rely on MTRR availability, which isn't supported in Xen PV guests.
Other than that the "nopat" option seems to be a chicken bit from the early days of PAT usage, which probably isn't really needed any more. So I wouldn't be worried to drop "nopat" having any effect on the system in Xen PV guests.
On bare metal it should still work, of course, if only for said driver.
Or perhaps, Juergen, could you accept a v3 of my patch that does not need to decide how to backport the change to "nopat" to the stable branches that are affected by the current behavior of "nopat" on Xen?
Note that it is not me to accept such a patch, this would be one of the x86 maintainers (e.g. Boris).
Juergen
On 7/13/2022 9:45 AM, Juergen Gross wrote:
On 13.07.22 15:34, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote: > On 13.07.2022 03:36, Chuck Zmudzinski wrote: >> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) >> *Add the necessary code to incorporate the "nopat" fix >> *void init_cache_modes(void) -> void __init init_cache_modes(void) >> *Add Jan Beulich as Co-developer (Jan has not signed off yet) >> *Expand the commit message to include relevant parts of the commit >> message of Jan Beulich's proposed patch for this problem >> *Fix 'else if ... {' placement and indentation >> *Remove indication the backport to stable branches is only back to 5.17.y >> >> I think these changes address all the comments on the original patch >> >> I added Jan Beulich as a Co-developer because Juergen Gross asked me to >> include Jan's idea for fixing "nopat" that was missing from the first >> version of the patch. > > You've sufficiently altered this change to clearly no longer want my > S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
>> @@ -292,7 +294,7 @@ void init_cache_modes(void) >> rdmsrl(MSR_IA32_CR_PAT, pat); >> } >> >> - if (!pat) { >> + if (!pat || pat_force_disabled) { > > By checking the new variable here ... > >> /* >> * No PAT. Emulate the PAT table that corresponds to the two >> * cache bits, PWT (Write Through) and PCD (Cache Disable). >> @@ -313,6 +315,16 @@ void init_cache_modes(void) >> */ >> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); > > ... you put in place a software view which doesn't match hardware. I > continue to think that ... > >> + } else if (!pat_bp_enabled) { > > ... the variable wants checking here instead (at which point, yes, > this comes quite close to simply being a v2 of my original patch). > > By using !pat_bp_enabled here you actually broaden where the change > would take effect. Iirc Boris had asked to narrow things (besides > voicing opposition to this approach altogether). Even without that > request I wonder whether you aren't going to far with this. > > Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
The question is, whether "nopat" needs to be translated to pat_enabled() returning "false". I've found exactly one case where "nopat" seems to be required for a driver to work correctly, but this driver (drivers/media/pci/ivtv/ivtvfb.c) seems to rely on MTRR availability, which isn't supported in Xen PV guests.
Other than that the "nopat" option seems to be a chicken bit from the early days of PAT usage, which probably isn't really needed any more. So I wouldn't be worried to drop "nopat" having any effect on the system in Xen PV guests.
On bare metal it should still work, of course, if only for said driver.
Or perhaps, Juergen, could you accept a v3 of my patch that does not need to decide how to backport the change to "nopat" to the stable branches that are affected by the current behavior of "nopat" on Xen?
Note that it is not me to accept such a patch, this would be one of the x86 maintainers (e.g. Boris).
Juergen
Thanks, I think you have given me enough information to try a v3 without a fix for "nopat" that you might be willing to give your Reviewed-by to.
Unfortunately, I am not optimistic about coming to an agreement with Jan. I do not want to sign-off on a patch that makes the kind of changes to how "nopat" behaves with the attitude, "I'm not the one to ask about logging." We have to give the administrator accurate information about what we are doing with useful logs, and I cannot sign-off on Jan's approach without giving the administrator a sensible and honest message about what effects, or lack of effects, the "nopat" option will have with his approach. Perhaps Jan will change his mind and sign-off with me on a v3 that will explain with a log message the limitations of the "nopat" option using his approach.
I am just against not being open about any issues that are present in what we always call "open source software."
Chuck
On 7/13/2022 9:45 AM, Juergen Gross wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote: And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
The question is, whether "nopat" needs to be translated to pat_enabled() returning "false".
When I started working on a re-factoring effort of the logic surrounding pat_enabled(), I noticed there are five different reasons in the current code for setting pat_disabled to true, which IMO is what should be a redundant variable that should always be equal !pat_enabled() and !pat_bp_enabled, but that unfortunately is not the case. The five reasons for setting pat_disabled to true are given as message strings:
1. "MTRRs disabled, skipping PAT initialization too." 2. "PAT support disabled because CONFIG_MTRR is disabled in the kernel." 3. "PAT support disabled via boot option." 4. "PAT not supported by the CPU." 5. "PAT support disabled by the firmware."
The only effect of setting pat_disabled to true is to inhibit the execution of pat_init(), but it does not inhibit the execution of init_cache_modes(), which is for handling all these cases when pat_init() was skipped. The Xen case is one of those cases, so in the Xen case, pat_disabled will be true yet the only way to fix the current regression and the five-year-old commit is by setting pat_bp_enabled to true so pat_enabled() will return true. So to fix the five-year-old commit, we must have
pat_enabled() != pat_disabled
Something is wrong with this logic, that is why I wanted to precede my fix with some re-factoring that will change some variable and function names and modify some comments before trying to fix the five-year-old commit, so that we will never have a situation when pat_enabled() != pat_disabled.
Chuck
On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
On 7/13/2022 9:45 AM, Juergen Gross wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote: And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
The question is, whether "nopat" needs to be translated to pat_enabled() returning "false".
When I started working on a re-factoring effort of the logic surrounding pat_enabled(), I noticed there are five different reasons in the current code for setting pat_disabled to true, which IMO is what should be a redundant variable that should always be equal !pat_enabled() and !pat_bp_enabled, but that unfortunately is not the case. The five reasons for setting pat_disabled to true are given as message strings:
- "MTRRs disabled, skipping PAT initialization too."
- "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
- "PAT support disabled via boot option."
- "PAT not supported by the CPU."
- "PAT support disabled by the firmware."
The only effect of setting pat_disabled to true is to inhibit the execution of pat_init(), but it does not inhibit the execution of init_cache_modes(), which is for handling all these cases when pat_init() was skipped. The Xen case is one of those cases, so in the Xen case, pat_disabled will be true yet the only way to fix the current regression and the five-year-old commit is by setting pat_bp_enabled to true so pat_enabled() will return true. So to fix the five-year-old commit, we must have
pat_enabled() != pat_disabled
Something is wrong with this logic, that is why I wanted to precede my fix with some re-factoring that will change some variable and function names and modify some comments before trying to fix the five-year-old commit, so that we will never have a situation when pat_enabled() != pat_disabled.
Chuck
Sorry, I meant to say,
To fix the five-year-old commit, we must have
pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,
and there is something wrong with that logic.
Chuck
On 7/13/2022 3:22 PM, Chuck Zmudzinski wrote:
On 7/13/2022 3:07 PM, Chuck Zmudzinski wrote:
On 7/13/2022 9:45 AM, Juergen Gross wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote: And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
The question is, whether "nopat" needs to be translated to pat_enabled() returning "false".
When I started working on a re-factoring effort of the logic surrounding pat_enabled(), I noticed there are five different reasons in the current code for setting pat_disabled to true, which IMO is what should be a redundant variable that should always be equal !pat_enabled() and !pat_bp_enabled, but that unfortunately is not the case. The five reasons for setting pat_disabled to true are given as message strings:
- "MTRRs disabled, skipping PAT initialization too."
- "PAT support disabled because CONFIG_MTRR is disabled in the kernel."
- "PAT support disabled via boot option."
- "PAT not supported by the CPU."
- "PAT support disabled by the firmware."
The only effect of setting pat_disabled to true is to inhibit the execution of pat_init(), but it does not inhibit the execution of init_cache_modes(), which is for handling all these cases when pat_init() was skipped. The Xen case is one of those cases, so in the Xen case, pat_disabled will be true yet the only way to fix the current regression and the five-year-old commit is by setting pat_bp_enabled to true so pat_enabled() will return true. So to fix the five-year-old commit, we must have
pat_enabled() != pat_disabled
Something is wrong with this logic, that is why I wanted to precede my fix with some re-factoring that will change some variable and function names and modify some comments before trying to fix the five-year-old commit, so that we will never have a situation when pat_enabled() != pat_disabled.
Chuck
Sorry, I meant to say,
To fix the five-year-old commit, we must have
pat_enabled() != !pat_disabled or pat_enabled() == pat_disabled,
and there is something wrong with that logic.
Chuck
So to summarize, I think this means that to be comfortable fixing the five-year-old commit and the current regression by artificially setting pat_bp_enabled and pat_enabled() to true, something which both my patch and Jan's patch does, we need to come to a new understanding of what the static boolean variable pat_disabled in arch/x86/mm/pat/memtype.c in the code really means.
The fact is, we have a regression and the only fix we can find is to try to make pat_enabled() == pat_disabled
I need to stop thinking about this for a while. It is time for those who have authority to fix this regression to make some comments about how they think this should be fixed.
Chuck
On 7/13/2022 9:45 AM, Juergen Gross wrote:
On 13.07.22 15:34, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote: > On 13.07.2022 03:36, Chuck Zmudzinski wrote: >> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) >> *Add the necessary code to incorporate the "nopat" fix >> *void init_cache_modes(void) -> void __init init_cache_modes(void) >> *Add Jan Beulich as Co-developer (Jan has not signed off yet) >> *Expand the commit message to include relevant parts of the commit >> message of Jan Beulich's proposed patch for this problem >> *Fix 'else if ... {' placement and indentation >> *Remove indication the backport to stable branches is only back to 5.17.y >> >> I think these changes address all the comments on the original patch >> >> I added Jan Beulich as a Co-developer because Juergen Gross asked me to >> include Jan's idea for fixing "nopat" that was missing from the first >> version of the patch. > > You've sufficiently altered this change to clearly no longer want my > S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
>> @@ -292,7 +294,7 @@ void init_cache_modes(void) >> rdmsrl(MSR_IA32_CR_PAT, pat); >> } >> >> - if (!pat) { >> + if (!pat || pat_force_disabled) { > > By checking the new variable here ... > >> /* >> * No PAT. Emulate the PAT table that corresponds to the two >> * cache bits, PWT (Write Through) and PCD (Cache Disable). >> @@ -313,6 +315,16 @@ void init_cache_modes(void) >> */ >> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); > > ... you put in place a software view which doesn't match hardware. I > continue to think that ... > >> + } else if (!pat_bp_enabled) { > > ... the variable wants checking here instead (at which point, yes, > this comes quite close to simply being a v2 of my original patch). > > By using !pat_bp_enabled here you actually broaden where the change > would take effect. Iirc Boris had asked to narrow things (besides > voicing opposition to this approach altogether). Even without that > request I wonder whether you aren't going to far with this. > > Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
There is Documentation/x86/pat.rst which rather clearly states, how "nopat" is meant to work. It should not change the contents of the PAT MSR and keep it just as it was set at boot time (the doc talks about the "BIOS" setting of the MSR, and I guess in the Xen case the hypervisor is kind of acting as the BIOS).
If that is the true meaning of "nopat", then the pat_enabled() test we currently have in the i915 driver is the wrong test for the capability of the CPU to use the fast WC type pages for video frames access because it is possible for pat_enabled() to be false and "nopat" set with its official meaning, and still have a CPU with WC cache mode capability.
If we accept pat_enabled() as implied WC cache mode support, why not also accept (!pat_enabled && boot_cpu_has(X86_FEATURE_HYPERVISOR)) also as implied WC cache mode support? That is what Jan's patch effectively does. He just possibly places his patch in the wrong portion of the Linux tree to be consistent with the official meaning of "nopat" and pat_enabled().
We could implement Jan's fix instead in the i915 driver instead if we need to be consistent with the official meaning of "nopat" and pat_enabled().
I could make that a v3 of my patch - and try the i915 maintainers instead of the x86 maintainers to provide the fix. But before I do that, can someone on this list of 20 recipients tell me why none of you have fixed this nasty regression? I am new to trying to contribute to Linux and the whole experience is frustrating when all you get is stonewalling from the official maintainers. So why not just someone step up and do this fix?
In the meantime, Juergen can start working on cleaning up the x86/PAT code so it can provide the i915 driver with a test not for PAT, but for the WC page caching mode support that works in all supported environments, including Xen. Currently there is no such test available. Juergen proposed one but it failed to accurately test for WC cache mode capability on my Xen workstation. Until the x86 subsystem developers can provide the rest of Linux with an accurate test for the WC caching mode, we have to settle for less than a pure and perfect solution if we are serious about following Linus' regression rule and accept a quick fix to a nasty regression while we wait for a better solution that will hopefully come later.
Chuck
On 7/13/2022 9:34 AM, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote:
On 13.07.2022 03:36, Chuck Zmudzinski wrote: > v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) > *Add the necessary code to incorporate the "nopat" fix > *void init_cache_modes(void) -> void __init init_cache_modes(void) > *Add Jan Beulich as Co-developer (Jan has not signed off yet) > *Expand the commit message to include relevant parts of the commit > message of Jan Beulich's proposed patch for this problem > *Fix 'else if ... {' placement and indentation > *Remove indication the backport to stable branches is only back to 5.17.y > > I think these changes address all the comments on the original patch > > I added Jan Beulich as a Co-developer because Juergen Gross asked me to > include Jan's idea for fixing "nopat" that was missing from the first > version of the patch.
You've sufficiently altered this change to clearly no longer want my S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
> @@ -292,7 +294,7 @@ void init_cache_modes(void) > rdmsrl(MSR_IA32_CR_PAT, pat); > } > > - if (!pat) { > + if (!pat || pat_force_disabled) {
By checking the new variable here ...
> /* > * No PAT. Emulate the PAT table that corresponds to the two > * cache bits, PWT (Write Through) and PCD (Cache Disable). > @@ -313,6 +315,16 @@ void init_cache_modes(void) > */ > pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | > PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
... you put in place a software view which doesn't match hardware. I continue to think that ...
> + } else if (!pat_bp_enabled) {
... the variable wants checking here instead (at which point, yes, this comes quite close to simply being a v2 of my original patch).
By using !pat_bp_enabled here you actually broaden where the change would take effect. Iirc Boris had asked to narrow things (besides voicing opposition to this approach altogether). Even without that request I wonder whether you aren't going to far with this.
Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
Jan
IOW, you are saying PAT has been broken on Xen for a long time, and it is necessary to fix it now not only on master, but also on all the stable branches.
Why is it necessary to do it on all the stable branches?
The only valid reason I can think of is a zero-day exploit that can only be mitigated by really disabling PAT on Xen.
Chuck
On 13.07.2022 15:49, Chuck Zmudzinski wrote:
On 7/13/2022 9:34 AM, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote:
On 7/13/22 2:18 AM, Jan Beulich wrote: > On 13.07.2022 03:36, Chuck Zmudzinski wrote: >> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) >> *Add the necessary code to incorporate the "nopat" fix >> *void init_cache_modes(void) -> void __init init_cache_modes(void) >> *Add Jan Beulich as Co-developer (Jan has not signed off yet) >> *Expand the commit message to include relevant parts of the commit >> message of Jan Beulich's proposed patch for this problem >> *Fix 'else if ... {' placement and indentation >> *Remove indication the backport to stable branches is only back to 5.17.y >> >> I think these changes address all the comments on the original patch >> >> I added Jan Beulich as a Co-developer because Juergen Gross asked me to >> include Jan's idea for fixing "nopat" that was missing from the first >> version of the patch. > > You've sufficiently altered this change to clearly no longer want my > S-o-b; unfortunately in fact I think you broke things:
Well, I hope we can come to an agreement so I have your S-o-b. But that would probably require me to remove Juergen's R-b.
>> @@ -292,7 +294,7 @@ void init_cache_modes(void) >> rdmsrl(MSR_IA32_CR_PAT, pat); >> } >> >> - if (!pat) { >> + if (!pat || pat_force_disabled) { > > By checking the new variable here ... > >> /* >> * No PAT. Emulate the PAT table that corresponds to the two >> * cache bits, PWT (Write Through) and PCD (Cache Disable). >> @@ -313,6 +315,16 @@ void init_cache_modes(void) >> */ >> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); > > ... you put in place a software view which doesn't match hardware. I > continue to think that ... > >> + } else if (!pat_bp_enabled) { > > ... the variable wants checking here instead (at which point, yes, > this comes quite close to simply being a v2 of my original patch). > > By using !pat_bp_enabled here you actually broaden where the change > would take effect. Iirc Boris had asked to narrow things (besides > voicing opposition to this approach altogether). Even without that > request I wonder whether you aren't going to far with this. > > Jan
I thought about checking for the administrator's "nopat" setting where you suggest which would limit the effect of "nopat" to not reporting PAT as enabled to device drivers who query for PAT availability using pat_enabled(). The main reason I did not do that is that due to the fact that we cannot write to the PAT MSR, we cannot really disable PAT. But we come closer to respecting the wishes of the administrator by configuring the caching modes as if PAT is actually disabled by the hardware or firmware when in fact it is not.
What would you propose logging as a message when we report PAT as disabled via pat_enabled()? The main reason I did not choose to check the new variable in the new 'else if' block is that I could not figure out what to tell the administrator in that case. I think we would have to log something like, "nopat is set, but we cannot disable PAT, doing our best to disable PAT by not reporting PAT as enabled via pat_enabled(), but that does not guarantee that kernel drivers and components cannot use PAT if they query for PAT support using boot_cpu_has(X86_FEATURE_PAT) instead of pat_enabled()." However, I acknowledge WC mappings would still be disabled because arch_can_pci_mmap_wc() will be false if pat_enabled() is false.
Perhaps we also need to log something if we keep the check for "nopat" where I placed it. We could say something like: "nopat is set, but we cannot disable hardware/firmware PAT support, so we are emulating as if there is no PAT support which puts in place a software view that does not match hardware."
No matter what, because we cannot write to PAT MSR in the Xen PV case, we probably need to log something to explain the problems associated with trying to honor the administrator's request. Also, what log level should it be. Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
Jan
IOW, you are saying PAT has been broken on Xen for a long time, and it is necessary to fix it now not only on master, but also on all the stable branches.
Why is it necessary to do it on all the stable branches?
I'm not saying that's _necessary_ (but I think it would make sense), and I'm not the one to decide whether or how far to backport this.
Jan
On 7/13/2022 9:52 AM, Jan Beulich wrote:
On 13.07.2022 15:49, Chuck Zmudzinski wrote:
On 7/13/2022 9:34 AM, Jan Beulich wrote:
On 13.07.2022 13:10, Chuck Zmudzinski wrote:
On 7/13/2022 6:36 AM, Chuck Zmudzinski wrote:
On 7/13/2022 5:09 AM, Jan Beulich wrote:
On 13.07.2022 10:51, Chuck Zmudzinski wrote: > On 7/13/22 2:18 AM, Jan Beulich wrote: >> On 13.07.2022 03:36, Chuck Zmudzinski wrote: >>> v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) >>> *Add the necessary code to incorporate the "nopat" fix >>> *void init_cache_modes(void) -> void __init init_cache_modes(void) >>> *Add Jan Beulich as Co-developer (Jan has not signed off yet) >>> *Expand the commit message to include relevant parts of the commit >>> message of Jan Beulich's proposed patch for this problem >>> *Fix 'else if ... {' placement and indentation >>> *Remove indication the backport to stable branches is only back to 5.17.y >>> >>> I think these changes address all the comments on the original patch >>> >>> I added Jan Beulich as a Co-developer because Juergen Gross asked me to >>> include Jan's idea for fixing "nopat" that was missing from the first >>> version of the patch. >> >> You've sufficiently altered this change to clearly no longer want my >> S-o-b; unfortunately in fact I think you broke things: > > Well, I hope we can come to an agreement so I have > your S-o-b. But that would probably require me to remove > Juergen's R-b. > >>> @@ -292,7 +294,7 @@ void init_cache_modes(void) >>> rdmsrl(MSR_IA32_CR_PAT, pat); >>> } >>> >>> - if (!pat) { >>> + if (!pat || pat_force_disabled) { >> >> By checking the new variable here ... >> >>> /* >>> * No PAT. Emulate the PAT table that corresponds to the two >>> * cache bits, PWT (Write Through) and PCD (Cache Disable). >>> @@ -313,6 +315,16 @@ void init_cache_modes(void) >>> */ >>> pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | >>> PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC); >> >> ... you put in place a software view which doesn't match hardware. I >> continue to think that ... >> >>> + } else if (!pat_bp_enabled) { >> >> ... the variable wants checking here instead (at which point, yes, >> this comes quite close to simply being a v2 of my original patch). >> >> By using !pat_bp_enabled here you actually broaden where the change >> would take effect. Iirc Boris had asked to narrow things (besides >> voicing opposition to this approach altogether). Even without that >> request I wonder whether you aren't going to far with this. >> >> Jan > > I thought about checking for the administrator's "nopat" > setting where you suggest which would limit the effect > of "nopat" to not reporting PAT as enabled to device > drivers who query for PAT availability using pat_enabled(). > The main reason I did not do that is that due to the fact > that we cannot write to the PAT MSR, we cannot really > disable PAT. But we come closer to respecting the wishes > of the administrator by configuring the caching modes as > if PAT is actually disabled by the hardware or firmware > when in fact it is not. > > What would you propose logging as a message when > we report PAT as disabled via pat_enabled()? The main > reason I did not choose to check the new variable in the > new 'else if' block is that I could not figure out what to > tell the administrator in that case. I think we would have > to log something like, "nopat is set, but we cannot disable > PAT, doing our best to disable PAT by not reporting PAT > as enabled via pat_enabled(), but that does not guarantee > that kernel drivers and components cannot use PAT if they > query for PAT support using boot_cpu_has(X86_FEATURE_PAT) > instead of pat_enabled()." However, I acknowledge WC mappings > would still be disabled because arch_can_pci_mmap_wc() will > be false if pat_enabled() is false. > > Perhaps we also need to log something if we keep the > check for "nopat" where I placed it. We could say something > like: "nopat is set, but we cannot disable hardware/firmware > PAT support, so we are emulating as if there is no PAT support > which puts in place a software view that does not match > hardware." > > No matter what, because we cannot write to PAT MSR in > the Xen PV case, we probably need to log something to > explain the problems associated with trying to honor the > administrator's request. Also, what log level should it be. > Should it be a pr_warn instead of a pr_info?
I'm afraid I'm the wrong one to answer logging questions. As you can see from my original patch, I didn't add any new logging (and no addition was requested in the comments that I have got). I also don't think "nopat" has ever meant "disable PAT", as the feature is either there or not. Instead I think it was always seen as "disable fiddling with PAT", which by implication means using whatever is there (if the feature / MSR itself is available).
IIRC, I do think I mentioned in the comments on your patch that it would be preferable to mention in the commit message that your patch would change the current behavior of "nopat" on Xen. The question is, how much do we want to change the current behavior of "nopat" on Xen. I think if we have to change the current behavior of "nopat" on Xen and if we are going to propagate that change to all current stable branches all the way back to 4.9.y,, we better make a lot of noise about what we are doing here.
Chuck
And in addition, if we are going to backport this patch to all current stable branches, we better have a really, really, good reason for changing the behavior of "nopat" on Xen.
Does such a reason exist?
Well, the simple reason is: It doesn't work the same way under Xen and non-Xen (in turn because, before my patch or whatever equivalent work, things don't work properly anyway, PAT-wise). Yet it definitely ought to behave the same everywhere, imo.
Jan
IOW, you are saying PAT has been broken on Xen for a long time, and it is necessary to fix it now not only on master, but also on all the stable branches.
Why is it necessary to do it on all the stable branches?
I'm not saying that's _necessary_ (but I think it would make sense), and I'm not the one to decide whether or how far to backport this.
Jan
What conclusion do you draw from these facts?
1. Linus' regression rule is a high priority in Linux 2. Security concerns are even a higher priority in Linux 3. You and I have been trying to fix a regression for the past two months 4. The ones who can fix the regression have not accepted our patches. 5. I have been asked to help backport my fix to all stable branches.
Chuck
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
If yes, please include Link: and Reported-by: tags, as explained in submitting-patches.rst (I only care about the link tag, as I'm tacking that regression).
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Actually it says "at least", so more that 12 is It is probably safest to put the entire SHA-1 ID in because of the possibility of a collision. At least that's how I understand what submitting-patches.rst.
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue. So if I submit a v3 I will include that. But my patch does not have a sign-off from the Co-developer as I mentioned in a message earlier today, and the discussion that has ensued leads me to think a better solution is to just revert the commit in the i915 driver instead, and leave the bigger questions for Juergen Gross and his plans to re-work the x86/PAT code in September, as he said on this thread in the last couple of days. This patch is dead now, as far as I can tell, because the Co-developer is not cooperating.
Chuck
If yes, please include Link: and Reported-by: tags, as explained in submitting-patches.rst (I only care about the link tag, as I'm tacking that regression).
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On 15.07.22 04:07, Chuck Zmudzinski wrote:
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Actually it says "at least",
/me looks a bit confused, as he quoted above directly from that file
Ohh, that "at least" is in a different section of the document. :-D For the fixes tag is explicitly "12 characters":
``` If your patch fixes a bug in a specific commit, e.g. you found an issue using git bisect, please use the ‘Fixes:’ tag with the first 12 characters of the SHA-1 ID, and the one line summary. ```
so more that 12 is It is probably safest to put the entire SHA-1 ID in because of the possibility of a collision.
Yes, sure, but it's a little mess if everybody uses something different and 12 characters is what was agreed on as "good enough" for now.
At least that's how I understand what submitting-patches.rst.
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue.
Thx for confirming.
So if I submit a v3 I will include that.
Thx.
But my patch does not have a sign-off from the Co-developer as I mentioned in a message earlier today, and the discussion that has ensued leads me to think a better solution is to just revert the commit in the i915 driver instead, and leave the bigger questions for Juergen Gross and his plans to re-work the x86/PAT code in September, as he said on this thread in the last couple of days. This patch is dead now, as far as I can tell, because the Co-developer is not cooperating.
k, thx for the quick summary, this whole thing is a bit hard to follow for me: I can only briefly look into most regressions, as there is only so much time in a day...
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
P.S.: As the Linux kernel's regression tracker I deal with a lot of reports and sometimes miss something important when writing mails like this. If that's the case here, don't hesitate to tell me in a public reply, it's in everyone's interest to set the public record straight.
On 15.07.2022 04:07, Chuck Zmudzinski wrote:
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Actually it says "at least", so more that 12 is It is probably safest to put the entire SHA-1 ID in because of the possibility of a collision. At least that's how I understand what submitting-patches.rst.
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue. So if I submit a v3 I will include that. But my patch does not have a sign-off from the Co-developer as I mentioned in a message earlier today, and the discussion that has ensued leads me to think a better solution is to just revert the commit in the i915 driver instead, and leave the bigger questions for Juergen Gross and his plans to re-work the x86/PAT code in September, as he said on this thread in the last couple of days. This patch is dead now, as far as I can tell, because the Co-developer is not cooperating.
???
Jan
On 7/15/2022 6:05 AM, Jan Beulich wrote:
On 15.07.2022 04:07, Chuck Zmudzinski wrote:
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Actually it says "at least", so more that 12 is It is probably safest to put the entire SHA-1 ID in because of the possibility of a collision. At least that's how I understand what submitting-patches.rst.
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue. So if I submit a v3 I will include that. But my patch does not have a sign-off from the Co-developer as I mentioned in a message earlier today, and the discussion that has ensued leads me to think a better solution is to just revert the commit in the i915 driver instead, and leave the bigger questions for Juergen Gross and his plans to re-work the x86/PAT code in September, as he said on this thread in the last couple of days. This patch is dead now, as far as I can tell, because the Co-developer is not cooperating.
???
Jan
I think I recall you said my patch proves I don't want your S-o-b. I also want to add some useful logs with your approach, probably a pr_warn, which you are not interested in doing. I think it is probably necessary, under the current situation, to warn all users of Xen/Linux that Linux on Xen is not guaranteed to be secure and full-featured anymore. That is also why I think the Linux maintainers are ignoring your patch. They are basically saying Xen is just some weird one-off thing, I can't remember who said it, but I did read it here in some of the comments on your patch, and you do not seem to be listening to and responding to what the Linux developers are asking you to do.
Two things I see here in my efforts to get a patch to fix this regression:
1. Does Xen have plans to give Linux running in Dom0 write-access to the PAT MSR?
2. Does Xen have plans to expose MTRRs to Linux running in Dom0?
These don't have to be the defaults for Dom0. But can Xen at least make these as supported configurations for Linux? Then, the problem of Xen being some weird one-off thing goes away. At least that's how I see the situation. Maybe Xen can provide these for Linux in Dom0, but from what I've read here, it seems Xen is not willing to do that for Linux. Do I understand that correctly?
Chuck
On 7/15/2022 3:53 PM, Chuck Zmudzinski wrote:
On 7/15/2022 6:05 AM, Jan Beulich wrote:
On 15.07.2022 04:07, Chuck Zmudzinski wrote:
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it")
BTW, "submitting-patches.rst" says it should just be "the first 12 characters of the SHA-1 ID"
Actually it says "at least", so more that 12 is It is probably safest to put the entire SHA-1 ID in because of the possibility of a collision. At least that's how I understand what submitting-patches.rst.
Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue. So if I submit a v3 I will include that. But my patch does not have a sign-off from the Co-developer as I mentioned in a message earlier today, and the discussion that has ensued leads me to think a better solution is to just revert the commit in the i915 driver instead, and leave the bigger questions for Juergen Gross and his plans to re-work the x86/PAT code in September, as he said on this thread in the last couple of days. This patch is dead now, as far as I can tell, because the Co-developer is not cooperating.
???
Jan
I think I recall you said my patch proves I don't want your S-o-b. I also want to add some useful logs with your approach, probably a pr_warn, which you are not interested in doing. I think it is probably necessary, under the current situation, to warn all users of Xen/Linux that Linux on Xen is not guaranteed to be secure and full-featured anymore. That is also why I think the Linux maintainers are ignoring your patch. They are basically saying Xen is just some weird one-off thing, I can't remember who said it, but I did read it here in some of the comments on your patch, and you do not seem to be listening to and responding to what the Linux developers are asking you to do.
Two things I see here in my efforts to get a patch to fix this regression:
- Does Xen have plans to give Linux running in Dom0 write-access to the
PAT MSR?
- Does Xen have plans to expose MTRRs to Linux running in Dom0?
These don't have to be the defaults for Dom0. But can Xen at least make these as supported configurations for Linux? Then, the problem of Xen being some weird one-off thing goes away. At least that's how I see the situation. Maybe Xen can provide these for Linux in Dom0, but from what I've read here, it seems Xen is not willing to do that for Linux. Do I understand that correctly?
Chuck
???
Chuck
On 15.07.2022 21:53, Chuck Zmudzinski wrote:
Two things I see here in my efforts to get a patch to fix this regression:
- Does Xen have plans to give Linux running in Dom0 write-access to the
PAT MSR?
No, as this is not technically feasible (all physical CPUs should run with the same value in the MSR, or else other issues arise).
- Does Xen have plans to expose MTRRs to Linux running in Dom0?
Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I don't think there are plans on the Xen side to support the MSR interface (and hence to expose the CPUID bit), and iirc there are no plans on the Linux side to use the MTRR interface. This also wouldn't really make sense anymore now that it has become quite clear that Linux wants to have PAT working without depending on MTRR.
Jan
On 7/18/2022 2:07 AM, Jan Beulich wrote:
On 15.07.2022 21:53, Chuck Zmudzinski wrote:
Two things I see here in my efforts to get a patch to fix this regression:
- Does Xen have plans to give Linux running in Dom0 write-access to the
PAT MSR?
No, as this is not technically feasible (all physical CPUs should run with the same value in the MSR, or else other issues arise).
- Does Xen have plans to expose MTRRs to Linux running in Dom0?
Yen does expose MTRRs to PV Dom0, but via a hypercall mechanism. I don't think there are plans on the Xen side to support the MSR interface (and hence to expose the CPUID bit), and iirc there are no plans on the Linux side to use the MTRR interface. This also wouldn't really make sense anymore now that it has become quite clear that Linux wants to have PAT working without depending on MTRR.
I am not so sure about that, given what Borislav Petkov said when commenting on your patch here:
https://lore.kernel.org/lkml/YsRjX%2FU1XN8rq+8u@zn.tnic/
Specifically, Borislav Petkov wrote on Tue, 5 Jul 2022 18:14:23 +0200:
Actually, the current goal is to adjust Xen dom0 because:
1. it uses the PAT code
2. but then it does something special and hides the MTRRs
which is not something real hardware does.
So this one-off thing should be prominent, visible and not get in the way.
--------------end of Borislav Petkov quote-----------
Jan, can you explain this comment by Borislav Petkov about Xen being a "one-off thing" that hides MTRRs and needs to be "adjusted" so it does "not get in the way"?
Borislav, are you willing to retract the comments you made on Tue, 5 Jul 2022 18:14:23 +0200 about Xen?
Jan
Jan, thanks for answering my questions.
Chuck
On 7/14/2022 10:07 PM, Chuck Zmudzinski wrote:
On 7/14/2022 1:30 AM, Thorsten Leemhuis wrote:
Sorry, have to ask: is this supposed to finally fix this regression? https://lore.kernel.org/regressions/YnHK1Z3o99eMXsVK@mail-itl/
Yes that's the first report I saw to lkml about this isssue.
Hi Thorsten,
Actually, now I realize that was not the first report to lkml about this issue, although it *was* the first report to the regressions mailing list.
Actually, the first report to lkml about this issue was Jan's patch that will hopefully soon make it to linux-next and mainline. So the proper Link: tag for this issue in the actual patch to be committed to the mainline kernel is to Jan's patch that was originally posted to lkml before any user reported it to the regressions mailing list. To know there was a regression from Jan's original patch, one would need to have read his commit message since he did not actually report it as a regression to the regressions mailing list at that time, nor did he identify a culprit commit at that time.
Bottom line: everything seems OK right now because the patch moving towards mainline does have the correct Link: tag. Thanks for all the work you have done on this regression.
Best regards,
Chuck
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
+ } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
/*
* In some environments, specifically Xen PV, PAT
* initialization is skipped because MTRRs are disabled even
* though PAT is available. In such environments, set PAT to
* enabled to correctly indicate to callers of pat_enabled()
* that CPU support for PAT is available.
*/
pat_bp_enabled = true;
}pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
__init_cache_modes(pat);
Juergen
On 14.07.2022 07:40, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
@@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
Just to mention it explicitly: If the value _read_ from the MSR is zero, we're hosed anyway, as then we can only express a single memory type (UC) in all PTEs. The zero case we mean to deal with is when reading the MSR wasn't valid to try.
Jan
On 7/14/2022 1:40 AM, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
With that, you can also completely remove the new Boolean - it will be a meaningless variable wasting memory. This will also make my patch more or less do what Jan's patch does - the "nopat" option will not cause the situation when the PAT MSR does not match the software view. So you are basically proposing just going back to my original patch, after fixing the style problems, of course. That also would solve the problem of needing Jan's S-o-b. I am inclined, however, to wait for a maintainer who has power to actually do the commit, to make a comment. Your R-b to my v2 did not have much clout with the actual maintainers, as far as I can tell. I am somewhat annoyed that it was at your suggestion that my v2 ended up confusing the main issue, the regression, with the red herring of the "nopat" option.
Chuck
/*
* In some environments, specifically Xen PV, PAT
* initialization is skipped because MTRRs are disabled even
* though PAT is available. In such environments, set PAT to
* enabled to correctly indicate to callers of pat_enabled()
* that CPU support for PAT is available.
*/
pat_bp_enabled = true;
}pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
__init_cache_modes(pat);
Juergen
On 7/14/2022 10:19 PM, Chuck Zmudzinski wrote:
On 7/14/2022 1:40 AM, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
With that, you can also completely remove the new Boolean - it will be a meaningless variable wasting memory. This will also make my patch more or less do what Jan's patch does - the "nopat" option will not cause the situation when the PAT MSR does not match the software view. So you are basically proposing just going back to my original patch, after fixing the style problems, of course. That also would solve the problem of needing Jan's S-o-b. I am inclined, however, to wait for a maintainer who has power to actually do the commit, to make a comment. Your R-b to my v2 did not have much clout with the actual maintainers, as far as I can tell. I am somewhat annoyed that it was at your suggestion that my v2 ended up confusing the main issue, the regression, with the red herring of the "nopat" option.
Chuck
Actually, what your change does depend on keeping pat_force_disable, but after all the discussion and further thinking about this, I would prefer that you give a R-b to v3 as simply my original patch with the style fixed. I think it is wrong to confuse the regression with the "nopat" issue. If you and Jan want to do a patch for the "nopat" issue, that is your decision. I am not interested in that. I am interested in fixing the regression. Also, I am not included to formally submit v3 until Dave, Andy, Boris, or someone else with more clout here on Linux expresses interest in giving this idea an R-b.
Chuck
/*
* In some environments, specifically Xen PV, PAT
* initialization is skipped because MTRRs are disabled even
* though PAT is available. In such environments, set PAT to
* enabled to correctly indicate to callers of pat_enabled()
* that CPU support for PAT is available.
*/
pat_bp_enabled = true;
}pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
__init_cache_modes(pat);
Juergen
On 7/14/2022 10:53 PM, Chuck Zmudzinski wrote:
On 7/14/2022 10:19 PM, Chuck Zmudzinski wrote:
On 7/14/2022 1:40 AM, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
With that, you can also completely remove the new Boolean - it will be a meaningless variable wasting memory. This will also make my patch more or less do what Jan's patch does - the "nopat" option will not cause the situation when the PAT MSR does not match the software view. So you are basically proposing just going back to my original patch, after fixing the style problems, of course. That also would solve the problem of needing Jan's S-o-b. I am inclined, however, to wait for a maintainer who has power to actually do the commit, to make a comment. Your R-b to my v2 did not have much clout with the actual maintainers, as far as I can tell. I am somewhat annoyed that it was at your suggestion that my v2 ended up confusing the main issue, the regression, with the red herring of the "nopat" option.
Chuck
Actually, what your change does depend on keeping pat_force_disable, but after all the discussion and further thinking about this, I would prefer that you give a R-b to v3 as simply my original patch with the style fixed. I think it is wrong to confuse the regression with the "nopat" issue. If you and Jan want to do a patch for the "nopat" issue, that is your decision. I am not interested in that. I am interested in fixing the regression. Also, I am not included to formally submit v3 until Dave, Andy, Boris, or someone else with more clout here on Linux expresses interest in giving this idea an R-b.
I meant to say "i am not inclined..." not I am not included..."
Sorry for the confusion,
Chuck
/*
* In some environments, specifically Xen PV, PAT
* initialization is skipped because MTRRs are disabled even
* though PAT is available. In such environments, set PAT to
* enabled to correctly indicate to callers of pat_enabled()
* that CPU support for PAT is available.
*/
pat_bp_enabled = true;
}pr_info("x86/PAT: PAT enabled by init_cache_modes\n");
__init_cache_modes(pat);
Juergen
On 15.07.22 04:19, Chuck Zmudzinski wrote:
On 7/14/2022 1:40 AM, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
With that, you can also completely remove the new Boolean - it will be a meaningless variable wasting memory. This will also make
No, it is making a difference with "nopat" having been specified.
In the Xen PV case we will have pat_bp_enabled == false due to the lack of MTRR. We don't want to set it to true if "nopat" has been specified on the command line, so pat_force_disabled should not be true when we are setting pat_bp_enabled to true again.
my patch more or less do what Jan's patch does - the "nopat" option will not cause the situation when the PAT MSR does not match the software view. So you are basically proposing just going back to my original patch, after fixing the style problems, of course. That also would solve the problem of needing Jan's S-o-b. I am inclined, however, to wait for a maintainer who has power to actually do the commit, to make a comment. Your R-b to my v2 did not have much clout with the actual maintainers, as far as I can tell. I am somewhat annoyed that it was at your suggestion that my v2 ended up confusing the main issue, the regression, with the red herring of the "nopat" option.
I'm sorry for that.
Juergen
On 7/15/2022 12:22 AM, Juergen Gross wrote:
On 15.07.22 04:19, Chuck Zmudzinski wrote:
On 7/14/2022 1:40 AM, Juergen Gross wrote:
On 13.07.22 03:36, Chuck Zmudzinski wrote:
The commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") incorrectly failed to account for the case in init_cache_modes() when CPUs do support PAT and falsely reported PAT to be disabled when in fact PAT is enabled. In some environments, notably in Xen PV domains, MTRR is disabled but PAT is still enabled, and that is the case that the aforementioned commit failed to account for.
As an unfortunate consequnce, the pat_enabled() function currently does not correctly report that PAT is enabled in such environments. The fix is implemented in init_cache_modes() by setting pat_bp_enabled to true in init_cache_modes() for the case that commit 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") failed to account for.
This approach arranges for pat_enabled() to return true in the Xen PV environment without undermining the rest of PAT MSR management logic that considers PAT to be disabled: Specifically, no writes to the PAT MSR should occur.
This patch fixes a regression that some users are experiencing with Linux as a Xen Dom0 driving particular Intel graphics devices by correctly reporting to the Intel i915 driver that PAT is enabled where previously it was falsely reporting that PAT is disabled. Some users are experiencing system hangs in Xen PV Dom0 and all users on Xen PV Dom0 are experiencing reduced graphics performance because the keying of the use of WC mappings to pat_enabled() (see arch_can_pci_mmap_wc()) means that in particular graphics frame buffer accesses are quite a bit less performant than possible without this patch.
Also, with the current code, in the Xen PV environment, PAT will not be disabled if the administrator sets the "nopat" boot option. Introduce a new boolean variable, pat_force_disable, to forcibly disable PAT when the administrator sets the "nopat" option to override the default behavior of using the PAT configuration that Xen has provided.
For the new boolean to live in .init.data, init_cache_modes() also needs moving to .init.text (where it could/should have lived already before).
Fixes: 99c13b8c8896d7bcb92753bf ("x86/mm/pat: Don't report PAT on CPUs that don't support it") Co-developed-by: Jan Beulich jbeulich@suse.com Cc: stable@vger.kernel.org Signed-off-by: Chuck Zmudzinski brchuckz@aol.com
v2: *Add force_pat_disabled variable to fix "nopat" on Xen PV (Jan Beulich) *Add the necessary code to incorporate the "nopat" fix *void init_cache_modes(void) -> void __init init_cache_modes(void) *Add Jan Beulich as Co-developer (Jan has not signed off yet) *Expand the commit message to include relevant parts of the commit message of Jan Beulich's proposed patch for this problem *Fix 'else if ... {' placement and indentation *Remove indication the backport to stable branches is only back to 5.17.y
I think these changes address all the comments on the original patch
I added Jan Beulich as a Co-developer because Juergen Gross asked me to include Jan's idea for fixing "nopat" that was missing from the first version of the patch.
The patch has been tested, it works as expected with and without nopat in the Xen PV Dom0 environment. That is, "nopat" causes the system to exhibit the effects and problems that lack of PAT support causes.
arch/x86/mm/pat/memtype.c | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-)
diff --git a/arch/x86/mm/pat/memtype.c b/arch/x86/mm/pat/memtype.c index d5ef64ddd35e..10a37d309d23 100644 --- a/arch/x86/mm/pat/memtype.c +++ b/arch/x86/mm/pat/memtype.c @@ -62,6 +62,7 @@ static bool __read_mostly pat_bp_initialized; static bool __read_mostly pat_disabled = !IS_ENABLED(CONFIG_X86_PAT); +static bool __initdata pat_force_disabled = !IS_ENABLED(CONFIG_X86_PAT); static bool __read_mostly pat_bp_enabled; static bool __read_mostly pat_cm_initialized; @@ -86,6 +87,7 @@ void pat_disable(const char *msg_reason) static int __init nopat(char *str) { pat_disable("PAT support disabled via boot option.");
- pat_force_disabled = true; return 0; } early_param("nopat", nopat);
@@ -272,7 +274,7 @@ static void pat_ap_init(u64 pat) wrmsrl(MSR_IA32_CR_PAT, pat); } -void init_cache_modes(void) +void __init init_cache_modes(void) { u64 pat = 0; @@ -292,7 +294,7 @@ void init_cache_modes(void) rdmsrl(MSR_IA32_CR_PAT, pat); }
- if (!pat) {
- if (!pat || pat_force_disabled) {
Can we just remove this modification and ...
/* * No PAT. Emulate the PAT table that corresponds to the two * cache bits, PWT (Write Through) and PCD (Cache Disable).
@@ -313,6 +315,16 @@ void init_cache_modes(void) */ pat = PAT(0, WB) | PAT(1, WT) | PAT(2, UC_MINUS) | PAT(3, UC) | PAT(4, WB) | PAT(5, WT) | PAT(6, UC_MINUS) | PAT(7, UC);
- } else if (!pat_bp_enabled) {
... use
- } else if (!pat_bp_enabled && !pat_force_disabled) {
here?
This will result in the desired outcome in all cases IMO: If PAT wasn't disabled via "nopat" and the PAT MSR has a non-zero value (from BIOS or Hypervisor) and PAT has been disabled implicitly (e.g. due to lack of MTRR), then PAT will be set to "enabled" again.
With that, you can also completely remove the new Boolean - it will be a meaningless variable wasting memory. This will also make
No, it is making a difference with "nopat" having been specified.
In the Xen PV case we will have pat_bp_enabled == false due to the lack of MTRR. We don't want to set it to true if "nopat" has been specified on the command line, so pat_force_disabled should not be true when we are setting pat_bp_enabled to true again.
my patch more or less do what Jan's patch does - the "nopat" option will not cause the situation when the PAT MSR does not match the software view. So you are basically proposing just going back to my original patch, after fixing the style problems, of course. That also would solve the problem of needing Jan's S-o-b. I am inclined, however, to wait for a maintainer who has power to actually do the commit, to make a comment. Your R-b to my v2 did not have much clout with the actual maintainers, as far as I can tell. I am somewhat annoyed that it was at your suggestion that my v2 ended up confusing the main issue, the regression, with the red herring of the "nopat" option.
I'm sorry for that.
I accept your apology. A few days back you indicated Boris was willing to go along with the approach I was suggesting. Do you have any idea why he didn't give me an R-b for either my original patch or v2 or at least tell me what I would have to do to get his R-b?
Chuck
Juergen
linux-stable-mirror@lists.linaro.org