The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x f0362a253606e2031f8d61c74195d4d6556e12a4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082642-catfight-gallantly-8b84@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
f0362a253606 ("mm,ima,kexec,of: use memblock_free_late from ima_free_kexec_buffer")
3ecc68349bba ("memblock: rename memblock_free to memblock_phys_free")
fa27717110ae ("memblock: drop memblock_free_early_nid() and memblock_free_early()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f0362a253606e2031f8d61c74195d4d6556e12a4 Mon Sep 17 00:00:00 2001
From: Rik van Riel <riel(a)surriel.com>
Date: Thu, 17 Aug 2023 13:57:59 -0400
Subject: [PATCH] mm,ima,kexec,of: use memblock_free_late from
ima_free_kexec_buffer
The code calling ima_free_kexec_buffer runs long after the memblock
allocator has already been torn down, potentially resulting in a use
after free in memblock_isolate_range.
With KASAN or KFENCE, this use after free will result in a BUG
from the idle task, and a subsequent kernel panic.
Switch ima_free_kexec_buffer over to memblock_free_late to avoid
that issue.
Fixes: fee3ff99bc67 ("powerpc: Move arch independent ima kexec functions to drivers/of/kexec.c")
Cc: stable(a)kernel.org
Signed-off-by: Rik van Riel <riel(a)surriel.com>
Suggested-by: Mike Rappoport <rppt(a)kernel.org>
Link: https://lore.kernel.org/r/20230817135759.0888e5ef@imladris.surriel.com
Signed-off-by: Rob Herring <robh(a)kernel.org>
diff --git a/drivers/of/kexec.c b/drivers/of/kexec.c
index f26d2ba8a371..68278340cecf 100644
--- a/drivers/of/kexec.c
+++ b/drivers/of/kexec.c
@@ -184,7 +184,8 @@ int __init ima_free_kexec_buffer(void)
if (ret)
return ret;
- return memblock_phys_free(addr, size);
+ memblock_free_late(addr, size);
+ return 0;
}
#endif
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 10f84c2cfb5045e37d78cb5d4c8e8321e06ae18f ]
Currently, the various torture tests sometimes react to an early-boot
bug by rebooting. This is almost always counterproductive, needlessly
consuming CPU time and bloating the console log. This commit therefore
adds the "-no-reboot" argument to qemu so that reboot requests will
cause qemu to exit.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 9 +++++----
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index f4c8055dbf7a..c57be9563214 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -9,9 +9,10 @@
#
# Usage: kvm-test-1-run.sh config resdir seconds qemu-args boot_args_in
#
-# qemu-args defaults to "-enable-kvm -nographic", along with arguments
-# specifying the number of CPUs and other options
-# generated from the underlying CPU architecture.
+# qemu-args defaults to "-enable-kvm -nographic -no-reboot", along with
+# arguments specifying the number of CPUs and
+# other options generated from the underlying
+# CPU architecture.
# boot_args_in defaults to value returned by the per_version_boot_params
# shell function.
#
@@ -141,7 +142,7 @@ then
fi
# Generate -smp qemu argument.
-qemu_args="-enable-kvm -nographic $qemu_args"
+qemu_args="-enable-kvm -nographic -no-reboot $qemu_args"
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
cpu_count=`configfrag_boot_cpus "$boot_args_in" "$config_template" "$cpu_count"`
if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
--
2.42.0.rc1.204.g551eb34607-goog
--
My name is Dr.Law Kok Chung, I have very important information that I
would like to pass to you but I was wondering if you are still using
this email ID or not. Please if I reach you on this email as I am
hopeful, Please endeavor to confirm to me so I can pass the detailed
information to you.
I will be waiting for your feedback.
Thanks
Dr.Law Kok Chung
Research Assistant
CV Industrial laboratory Ltd
Dzień dobry,
jakiś czas temu zgłosiła się do nas firma, której strona internetowa nie pozycjonowała się wysoko w wyszukiwarce Google.
Na podstawie wykonanego przez nas audytu SEO zoptymalizowaliśmy treści na stronie pod kątem wcześniej opracowanych słów kluczowych. Nasz wewnętrzny system codziennie analizuje prawidłowe działanie witryny. Dzięki indywidualnej strategii, firma zdobywa coraz więcej Klientów.
Czy chcieliby Państwo zwiększyć liczbę osób odwiedzających stronę internetową firmy? Mógłbym przedstawić ofertę?
Pozdrawiam serdecznie,
Dominik Perkowski
This is the start of the stable review cycle for the 6.1.49 release.
There are 4 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Mon, 28 Aug 2023 15:46:14 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.49-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.49-rc1
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: fix to set flush_merge opt and show noflush_merge"
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fix SRSO mess
-------------
Diffstat:
Makefile | 4 ++--
fs/f2fs/f2fs.h | 1 +
fs/f2fs/file.c | 5 +++++
fs/f2fs/node.c | 14 ++----------
fs/f2fs/super.c | 43 ++++++++++++------------------------
include/linux/f2fs_fs.h | 1 -
tools/objtool/arch/x86/decode.c | 11 +++++----
tools/objtool/check.c | 22 +++++++++++++++++-
tools/objtool/include/objtool/arch.h | 1 +
tools/objtool/include/objtool/elf.h | 1 +
10 files changed, 54 insertions(+), 49 deletions(-)
Hi Greg,
please revert the following two patches as 5.10.192 fails to build with
them:
asoc-intel-sof_sdw-add-quirk-for-lnl-rvp.patch
asoc-intel-sof_sdw-add-quirk-for-mtl-rvp.patch
Error message: error: ‘RT711_JD2_100K’ undeclared here (not in a function)
2023-08-26T17:46:51.3733116Z sound/soc/intel/boards/sof_sdw.c:208:41:
error: ‘RT711_JD2_100K’ undeclared here (not in a function)
2023-08-26T17:46:51.3744338Z 208 | .driver_data =
(void *)(RT711_JD2_100K),
2023-08-26T17:46:51.3745547Z |
^~~~~~~~~~~~~~
2023-08-26T17:46:51.4620173Z make[4]: *** [scripts/Makefile.build:286:
sound/soc/intel/boards/sof_sdw.o] Error 1
2023-08-26T17:46:51.4625055Z make[3]: *** [scripts/Makefile.build:503:
sound/soc/intel/boards] Error 2
2023-08-26T17:46:51.4626370Z make[2]: *** [scripts/Makefile.build:503:
sound/soc/intel] Error 2
This happened before already:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/com…
--
Best, Philip
This is a backport of the series that fixes the way deadline bandwidth
restoration is done which is causing noticeable delay on resume path. It also
converts the cpuset lock back into a mutex which some users on Android too.
I lack the details but AFAIU the read/write semaphore was slower on high
contention.
Compile tested against some randconfig for different archs. Only boot tested on
x86 qemu.
Based on v6.4.11
Original series:
https://lore.kernel.org/lkml/20230508075854.17215-1-juri.lelli@redhat.com/
Thanks!
--
Qais Yousef
Dietmar Eggemann (2):
sched/deadline: Create DL BW alloc, free & check overflow interface
cgroup/cpuset: Free DL BW in case can_attach() fails
Juri Lelli (4):
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
sched/cpuset: Bring back cpuset_mutex
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
cgroup/cpuset: Iterate only if DEADLINE tasks are present
include/linux/cpuset.h | 12 +-
include/linux/sched.h | 4 +-
kernel/cgroup/cgroup.c | 4 +
kernel/cgroup/cpuset.c | 244 ++++++++++++++++++++++++++--------------
kernel/sched/core.c | 41 +++----
kernel/sched/deadline.c | 67 ++++++++---
kernel/sched/sched.h | 2 +-
7 files changed, 246 insertions(+), 128 deletions(-)
--
2.34.1
From: "Paul E. McKenney" <paulmck(a)kernel.org>
[ Upstream commit 10f84c2cfb5045e37d78cb5d4c8e8321e06ae18f ]
Currently, the various torture tests sometimes react to an early-boot
bug by rebooting. This is almost always counterproductive, needlessly
consuming CPU time and bloating the console log. This commit therefore
adds the "-no-reboot" argument to qemu so that reboot requests will
cause qemu to exit.
Signed-off-by: Paul E. McKenney <paulmck(a)kernel.org>
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
---
tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
index 6dc2b49b85ea..bdd747dc61f2 100755
--- a/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
+++ b/tools/testing/selftests/rcutorture/bin/kvm-test-1-run.sh
@@ -9,7 +9,7 @@
#
# Usage: kvm-test-1-run.sh config builddir resdir seconds qemu-args boot_args
#
-# qemu-args defaults to "-enable-kvm -nographic", along with arguments
+# qemu-args defaults to "-enable-kvm -nographic -no-reboot", along with arguments
# specifying the number of CPUs and other options
# generated from the underlying CPU architecture.
# boot_args defaults to value returned by the per_version_boot_params
@@ -132,7 +132,7 @@ then
fi
# Generate -smp qemu argument.
-qemu_args="-enable-kvm -nographic $qemu_args"
+qemu_args="-enable-kvm -nographic -no-reboot $qemu_args"
cpu_count=`configNR_CPUS.sh $resdir/ConfigFragment`
cpu_count=`configfrag_boot_cpus "$boot_args" "$config_template" "$cpu_count"`
if test "$cpu_count" -gt "$TORTURE_ALLOTED_CPUS"
--
2.42.0.rc1.204.g551eb34607-goog
I'm announcing the release of the 6.1.49 kernel.
This upgrade is only for all users of the 6.1 series that use the x86
platform OR the F2FS file system. If that's not you, feel free to
ignore this release.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 -
fs/f2fs/f2fs.h | 1
fs/f2fs/file.c | 5 ++++
fs/f2fs/node.c | 14 +----------
fs/f2fs/super.c | 43 +++++++++++------------------------
include/linux/f2fs_fs.h | 1
tools/objtool/arch/x86/decode.c | 11 +++++---
tools/objtool/check.c | 22 +++++++++++++++++
tools/objtool/include/objtool/arch.h | 1
tools/objtool/include/objtool/elf.h | 1
10 files changed, 53 insertions(+), 48 deletions(-)
Greg Kroah-Hartman (4):
Revert "f2fs: don't reset unchangable mount option in f2fs_remount()"
Revert "f2fs: fix to set flush_merge opt and show noflush_merge"
Revert "f2fs: fix to do sanity check on direct node in truncate_dnode()"
Linux 6.1.49
Peter Zijlstra (1):
objtool/x86: Fix SRSO mess
Cc: stable(a)vger.kernel.org # v5.11+
Fixes: e7e0545299d8 ("x86/sgx: Initialize metadata for Enclave Page Cache (EPC) sections")
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308221542.11UpkVfp-lkp@intel.com/
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
arch/x86/kernel/cpu/sgx/main.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 166692f2d501..388350b8f5e3 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -732,6 +732,10 @@ int arch_memory_failure(unsigned long pfn, int flags)
}
/**
+ * sgx_calc_section_metric() - Calculate an EPC section metric
+ * @low: low 32-bit word from CPUID:0x12:{2, ...}
+ * @high: high 32-bit word from CPUID:0x12:{2, ...}
+ *
* A section metric is concatenated in a way that @low bits 12-31 define the
* bits 12-31 of the metric and @high bits 0-19 define the bits 32-51 of the
* metric.
--
2.39.2
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
(cherry picked from commit 2f406263e3e954aa24c1248edcfa9be0c1bb30fa)
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/mm/madvise.c b/mm/madvise.c
index b5ffbaf616f5..6adee363a9fa 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -375,7 +375,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -447,7 +447,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
* are sure it's worth. Split it if we are only owner.
*/
if (folio_test_large(folio)) {
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
--
2.39.2
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 0e0e9bd5f7b9d40fd03b70092367247d52da1db0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082614-choice-mongoose-0731@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:17 +0800
Subject: [PATCH] madvise:madvise_free_pte_range(): don't use mapcount()
against large folio for sharing check
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index 46802b4cf65a..ec30f48f8f2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 0e0e9bd5f7b9d40fd03b70092367247d52da1db0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082616-velocity-mocha-97c0@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
0e0e9bd5f7b9 ("madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check")
f3cd4ab0aabf ("mm/madvise: clean up pte_offset_map_lock() scans")
07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
fd3b1bc3c86e ("mm/madvise: fix madvise_pageout for private file mappings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 0e0e9bd5f7b9d40fd03b70092367247d52da1db0 Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:17 +0800
Subject: [PATCH] madvise:madvise_free_pte_range(): don't use mapcount()
against large folio for sharing check
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index 46802b4cf65a..ec30f48f8f2e 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 56b930dcd88c2adc261410501c402c790980bdb5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023081222-chummy-aqueduct-85c2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
56b930dcd88c ("hwmon: (aquacomputer_d5next) Add selective 200ms delay after sending ctrl report")
19692f17cd13 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream XT")
866e630a3b8b ("hwmon: (aquacomputer_d5next) Add temperature offset control for Aquaero")
6c83ccb10c49 ("hwmon: (aquacomputer_d5next) Add infrastructure for Aquaero control reports")
b29090bac935 ("hwmon: (aquacomputer_d5next) Device dependent control report settings")
7505dab78f58 ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Aquastream Ultimate")
e0f6c370f0ad ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Poweradjust 3")
3d2e9f582a8e ("hwmon: (aquacomputer_d5next) Add support for reading calculated Aquaero sensors")
2c55211104b4 ("hwmon: (aquacomputer_d5next) Support sensors for Aquacomputer Aquaero")
ad2f0811fbeb ("hwmon: (aquacomputer_d5next) Device dependent serial number and firmware offsets")
249c752110a5 ("hwmon: (aquacomputer_d5next) Add structure for fan layout")
8bcb02bdc638 ("hwmon: (aquacomputer_d5next) Rename AQC_TEMP_SENSOR_SIZE to AQC_SENSOR_SIZE")
6ff838f2877d ("hwmon: (aquacomputer_d5next) Add support for Quadro flow sensor pulses")
d5d896b83822 ("hwmon: (aquacomputer_d5next) Clear up macros and comments")
662d20b3a5af ("hwmon: (aquacomputer_d5next) Add support for temperature sensor offsets")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 56b930dcd88c2adc261410501c402c790980bdb5 Mon Sep 17 00:00:00 2001
From: Aleksa Savic <savicaleksa83(a)gmail.com>
Date: Mon, 7 Aug 2023 19:20:03 +0200
Subject: [PATCH] hwmon: (aquacomputer_d5next) Add selective 200ms delay after
sending ctrl report
Add a 200ms delay after sending a ctrl report to Quadro,
Octo, D5 Next and Aquaero to give them enough time to
process the request and save the data to memory. Otherwise,
under heavier userspace loads where multiple sysfs entries
are usually set in quick succession, a new ctrl report could
be requested from the device while it's still processing the
previous one and fail with -EPIPE. The delay is only applied
if two ctrl report operations are near each other in time.
Reported by a user on Github [1] and tested by both of us.
[1] https://github.com/aleksamagicka/aquacomputer_d5next-hwmon/issues/82
Fixes: 752b927951ea ("hwmon: (aquacomputer_d5next) Add support for Aquacomputer Octo")
Signed-off-by: Aleksa Savic <savicaleksa83(a)gmail.com>
Link: https://lore.kernel.org/r/20230807172004.456968-1-savicaleksa83@gmail.com
Signed-off-by: Guenter Roeck <linux(a)roeck-us.net>
diff --git a/drivers/hwmon/aquacomputer_d5next.c b/drivers/hwmon/aquacomputer_d5next.c
index a997dbcb563f..023807859be7 100644
--- a/drivers/hwmon/aquacomputer_d5next.c
+++ b/drivers/hwmon/aquacomputer_d5next.c
@@ -13,9 +13,11 @@
#include <linux/crc16.h>
#include <linux/debugfs.h>
+#include <linux/delay.h>
#include <linux/hid.h>
#include <linux/hwmon.h>
#include <linux/jiffies.h>
+#include <linux/ktime.h>
#include <linux/module.h>
#include <linux/mutex.h>
#include <linux/seq_file.h>
@@ -63,6 +65,8 @@ static const char *const aqc_device_names[] = {
#define CTRL_REPORT_ID 0x03
#define AQUAERO_CTRL_REPORT_ID 0x0b
+#define CTRL_REPORT_DELAY 200 /* ms */
+
/* The HID report that the official software always sends
* after writing values, currently same for all devices
*/
@@ -527,6 +531,9 @@ struct aqc_data {
int secondary_ctrl_report_size;
u8 *secondary_ctrl_report;
+ ktime_t last_ctrl_report_op;
+ int ctrl_report_delay; /* Delay between two ctrl report operations, in ms */
+
int buffer_size;
u8 *buffer;
int checksum_start;
@@ -611,17 +618,35 @@ static int aqc_aquastreamxt_convert_fan_rpm(u16 val)
return 0;
}
+static void aqc_delay_ctrl_report(struct aqc_data *priv)
+{
+ /*
+ * If previous read or write is too close to this one, delay the current operation
+ * to give the device enough time to process the previous one.
+ */
+ if (priv->ctrl_report_delay) {
+ s64 delta = ktime_ms_delta(ktime_get(), priv->last_ctrl_report_op);
+
+ if (delta < priv->ctrl_report_delay)
+ msleep(priv->ctrl_report_delay - delta);
+ }
+}
+
/* Expects the mutex to be locked */
static int aqc_get_ctrl_data(struct aqc_data *priv)
{
int ret;
+ aqc_delay_ctrl_report(priv);
+
memset(priv->buffer, 0x00, priv->buffer_size);
ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size,
HID_FEATURE_REPORT, HID_REQ_GET_REPORT);
if (ret < 0)
ret = -ENODATA;
+ priv->last_ctrl_report_op = ktime_get();
+
return ret;
}
@@ -631,6 +656,8 @@ static int aqc_send_ctrl_data(struct aqc_data *priv)
int ret;
u16 checksum;
+ aqc_delay_ctrl_report(priv);
+
/* Checksum is not needed for Aquaero */
if (priv->kind != aquaero) {
/* Init and xorout value for CRC-16/USB is 0xffff */
@@ -646,12 +673,16 @@ static int aqc_send_ctrl_data(struct aqc_data *priv)
ret = hid_hw_raw_request(priv->hdev, priv->ctrl_report_id, priv->buffer, priv->buffer_size,
HID_FEATURE_REPORT, HID_REQ_SET_REPORT);
if (ret < 0)
- return ret;
+ goto record_access_and_ret;
/* The official software sends this report after every change, so do it here as well */
ret = hid_hw_raw_request(priv->hdev, priv->secondary_ctrl_report_id,
priv->secondary_ctrl_report, priv->secondary_ctrl_report_size,
HID_FEATURE_REPORT, HID_REQ_SET_REPORT);
+
+record_access_and_ret:
+ priv->last_ctrl_report_op = ktime_get();
+
return ret;
}
@@ -1524,6 +1555,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->buffer_size = AQUAERO_CTRL_REPORT_SIZE;
priv->temp_ctrl_offset = AQUAERO_TEMP_CTRL_OFFSET;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->temp_label = label_temp_sensors;
priv->virtual_temp_label = label_virtual_temp_sensors;
@@ -1547,6 +1579,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = D5NEXT_TEMP_CTRL_OFFSET;
priv->buffer_size = D5NEXT_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = D5NEXT_POWER_CYCLES;
@@ -1597,6 +1630,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = OCTO_TEMP_CTRL_OFFSET;
priv->buffer_size = OCTO_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->power_cycle_count_offset = OCTO_POWER_CYCLES;
@@ -1624,6 +1658,7 @@ static int aqc_probe(struct hid_device *hdev, const struct hid_device_id *id)
priv->temp_ctrl_offset = QUADRO_TEMP_CTRL_OFFSET;
priv->buffer_size = QUADRO_CTRL_REPORT_SIZE;
+ priv->ctrl_report_delay = CTRL_REPORT_DELAY;
priv->flow_pulses_ctrl_offset = QUADRO_FLOW_PULSES_CTRL_OFFSET;
priv->power_cycle_count_offset = QUADRO_POWER_CYCLES;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c275a176e4b69868576e543409927ae75e3a3288
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082737-cavity-bloating-1779@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c275a176e4b6 ("can: raw: add missing refcount for memory leak fix")
ee8b94c8510c ("can: raw: fix receiver memory leak")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c275a176e4b69868576e543409927ae75e3a3288 Mon Sep 17 00:00:00 2001
From: Oliver Hartkopp <socketcan(a)hartkopp.net>
Date: Mon, 21 Aug 2023 16:45:47 +0200
Subject: [PATCH] can: raw: add missing refcount for memory leak fix
Commit ee8b94c8510c ("can: raw: fix receiver memory leak") introduced
a new reference to the CAN netdevice that has assigned CAN filters.
But this new ro->dev reference did not maintain its own refcount which
lead to another KASAN use-after-free splat found by Eric Dumazet.
This patch ensures a proper refcount for the CAN nedevice.
Fixes: ee8b94c8510c ("can: raw: fix receiver memory leak")
Reported-by: Eric Dumazet <edumazet(a)google.com>
Cc: Ziyang Xuan <william.xuanziyang(a)huawei.com>
Signed-off-by: Oliver Hartkopp <socketcan(a)hartkopp.net>
Link: https://lore.kernel.org/r/20230821144547.6658-3-socketcan@hartkopp.net
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/can/raw.c b/net/can/raw.c
index e10f59375659..d50c3f3d892f 100644
--- a/net/can/raw.c
+++ b/net/can/raw.c
@@ -85,6 +85,7 @@ struct raw_sock {
int bound;
int ifindex;
struct net_device *dev;
+ netdevice_tracker dev_tracker;
struct list_head notifier;
int loopback;
int recv_own_msgs;
@@ -285,8 +286,10 @@ static void raw_notify(struct raw_sock *ro, unsigned long msg,
case NETDEV_UNREGISTER:
lock_sock(sk);
/* remove current filters & unregister */
- if (ro->bound)
+ if (ro->bound) {
raw_disable_allfilters(dev_net(dev), dev, sk);
+ netdev_put(dev, &ro->dev_tracker);
+ }
if (ro->count > 1)
kfree(ro->filter);
@@ -391,10 +394,12 @@ static int raw_release(struct socket *sock)
/* remove current filters & unregister */
if (ro->bound) {
- if (ro->dev)
+ if (ro->dev) {
raw_disable_allfilters(dev_net(ro->dev), ro->dev, sk);
- else
+ netdev_put(ro->dev, &ro->dev_tracker);
+ } else {
raw_disable_allfilters(sock_net(sk), NULL, sk);
+ }
}
if (ro->count > 1)
@@ -445,10 +450,10 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
goto out;
}
if (dev->type != ARPHRD_CAN) {
- dev_put(dev);
err = -ENODEV;
- goto out;
+ goto out_put_dev;
}
+
if (!(dev->flags & IFF_UP))
notify_enetdown = 1;
@@ -456,7 +461,9 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
/* filters set by default/setsockopt */
err = raw_enable_allfilters(sock_net(sk), dev, sk);
- dev_put(dev);
+ if (err)
+ goto out_put_dev;
+
} else {
ifindex = 0;
@@ -467,18 +474,28 @@ static int raw_bind(struct socket *sock, struct sockaddr *uaddr, int len)
if (!err) {
if (ro->bound) {
/* unregister old filters */
- if (ro->dev)
+ if (ro->dev) {
raw_disable_allfilters(dev_net(ro->dev),
ro->dev, sk);
- else
+ /* drop reference to old ro->dev */
+ netdev_put(ro->dev, &ro->dev_tracker);
+ } else {
raw_disable_allfilters(sock_net(sk), NULL, sk);
+ }
}
ro->ifindex = ifindex;
ro->bound = 1;
+ /* bind() ok -> hold a reference for new ro->dev */
ro->dev = dev;
+ if (ro->dev)
+ netdev_hold(ro->dev, &ro->dev_tracker, GFP_KERNEL);
}
- out:
+out_put_dev:
+ /* remove potential reference from dev_get_by_index() */
+ if (dev)
+ dev_put(dev);
+out:
release_sock(sk);
rtnl_unlock();
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082724-deflate-drinkable-54a1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082722-rocking-unbaked-1baf@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x dbf46008775516f7f25c95b7760041c286299783
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082723-bribe-sporty-3c8c@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
dbf460087755 ("objtool/x86: Fixup frame-pointer vs rethunk")
c6f5dc28fb3d ("objtool: Union instruction::{call_dest,jump_table}")
0932dbe1f568 ("objtool: Remove instruction::reloc")
8b2de412158e ("objtool: Shrink instruction::{type,visited}")
d54066546121 ("objtool: Make instruction::alts a single-linked list")
3ee88df1b063 ("objtool: Make instruction::stack_ops a single-linked list")
20a554638dd2 ("objtool: Change arch_decode_instruction() signature")
5f6e430f931d ("Merge tag 'powerpc-6.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From dbf46008775516f7f25c95b7760041c286299783 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz(a)infradead.org>
Date: Wed, 16 Aug 2023 13:59:21 +0200
Subject: [PATCH] objtool/x86: Fixup frame-pointer vs rethunk
For stack-validation of a frame-pointer build, objtool validates that
every CALL instruction is preceded by a frame-setup. The new SRSO
return thunks violate this with their RSB stuffing trickery.
Extend the __fentry__ exception to also cover the embedded_insn case
used for this. This cures:
vmlinux.o: warning: objtool: srso_untrain_ret+0xd: call without frame pointer save/setup
Fixes: 4ae68b26c3ab ("objtool/x86: Fix SRSO mess")
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Acked-by: Josh Poimboeuf <jpoimboe(a)kernel.org>
Link: https://lore.kernel.org/r/20230816115921.GH980931@hirez.programming.kicks-a…
diff --git a/tools/objtool/check.c b/tools/objtool/check.c
index 7a9aaf400873..1384090530db 100644
--- a/tools/objtool/check.c
+++ b/tools/objtool/check.c
@@ -2650,12 +2650,17 @@ static int decode_sections(struct objtool_file *file)
return 0;
}
-static bool is_fentry_call(struct instruction *insn)
+static bool is_special_call(struct instruction *insn)
{
- if (insn->type == INSN_CALL &&
- insn_call_dest(insn) &&
- insn_call_dest(insn)->fentry)
- return true;
+ if (insn->type == INSN_CALL) {
+ struct symbol *dest = insn_call_dest(insn);
+
+ if (!dest)
+ return false;
+
+ if (dest->fentry || dest->embedded_insn)
+ return true;
+ }
return false;
}
@@ -3656,7 +3661,7 @@ static int validate_branch(struct objtool_file *file, struct symbol *func,
if (ret)
return ret;
- if (opts.stackval && func && !is_fentry_call(insn) &&
+ if (opts.stackval && func && !is_special_call(insn) &&
!has_valid_stack_frame(&state)) {
WARN_INSN(insn, "call without frame pointer save/setup");
return 1;
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x c4d6b5438116c184027b2e911c0f2c7c406fb47c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082711-crown-acuteness-453a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
c4d6b5438116 ("tracing/synthetic: Allocate one additional element for size")
fc1a9dc10129 ("tracing/histogram: Don't use strlen to find length of stacktrace variables")
288709c9f3b0 ("tracing: Allow stacktraces to be saved as histogram variables")
5f2e094ed259 ("tracing: Allow multiple hitcount values in histograms")
0934ae9977c2 ("tracing: Fix reading strings from synthetic events")
86087383ec0a ("tracing/hist: Call hist functions directly via a switch statement")
05770dd0ad11 ("tracing: Support __rel_loc relative dynamic data location attribute")
938aa33f1465 ("tracing: Add length protection to histogram string copies")
63f84ae6b82b ("tracing/histogram: Do not copy the fixed-size char array field over the field size")
8b5d46fd7a38 ("tracing/histogram: Optimize division by constants")
f47716b7a955 ("tracing/histogram: Covert expr to const if both operands are constants")
c5eac6ee8bc5 ("tracing/histogram: Simplify handling of .sym-offset in expressions")
9710b2f341a0 ("tracing: Fix operator precedence for hist triggers expression")
bcef04415032 ("tracing: Add division and multiplication support for hist triggers")
52cfb373536a ("tracing: Add support for creating hist trigger variables from literal")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d6b5438116c184027b2e911c0f2c7c406fb47c Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:28 +0200
Subject: [PATCH] tracing/synthetic: Allocate one additional element for size
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 80a2a832f857..9897d0bfcab7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data,
str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) {
- len = *((unsigned long *)str_val);
+ /* reserve one extra element for size */
+ len = *((unsigned long *)str_val) + 1;
len *= sizeof(unsigned long);
} else {
len = fetch_store_strlen((unsigned long)str_val);
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c4d6b5438116c184027b2e911c0f2c7c406fb47c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082710-twisty-automatic-966e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c4d6b5438116 ("tracing/synthetic: Allocate one additional element for size")
fc1a9dc10129 ("tracing/histogram: Don't use strlen to find length of stacktrace variables")
288709c9f3b0 ("tracing: Allow stacktraces to be saved as histogram variables")
5f2e094ed259 ("tracing: Allow multiple hitcount values in histograms")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d6b5438116c184027b2e911c0f2c7c406fb47c Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:28 +0200
Subject: [PATCH] tracing/synthetic: Allocate one additional element for size
While debugging another issue I noticed that the stack trace contains one
invalid entry at the end:
<idle>-0 [008] d..4. 26.484201: wake_lat: pid=0 delta=2629976084 000000009cc24024 stack=STACK:
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x150/0x2c0
=> kcompactd+0x9ca/0xc20
=> kthread+0x2f6/0x3d8
=> __ret_from_fork+0x8a/0xe8
=> 0x6b6b6b6b6b6b6b6b
This is because the code failed to add the one element containing the
number of entries to field_size.
Link: https://lkml.kernel.org/r/20230816154928.4171614-4-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 80a2a832f857..9897d0bfcab7 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -528,7 +528,8 @@ static notrace void trace_event_raw_event_synth(void *__data,
str_val = (char *)(long)var_ref_vals[val_idx];
if (event->dynamic_fields[i]->is_stack) {
- len = *((unsigned long *)str_val);
+ /* reserve one extra element for size */
+ len = *((unsigned long *)str_val) + 1;
len *= sizeof(unsigned long);
} else {
len = fetch_store_strlen((unsigned long)str_val);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 887f92e09ef34a949745ad26ce82be69e2dabcf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082700-clamp-coerce-0153@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
887f92e09ef3 ("tracing/synthetic: Skip first entry for stack traces")
ddeea494a16f ("tracing/synthetic: Use union instead of casts")
116b41162f8b ("Merge tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 887f92e09ef34a949745ad26ce82be69e2dabcf6 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:27 +0200
Subject: [PATCH] tracing/synthetic: Skip first entry for stack traces
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..80a2a832f857 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
- unsigned int i, n_u64;
+ unsigned int i, j, n_u64;
char print_fmt[32];
const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- unsigned long *p, *end;
union trace_synth_field *data = &entry->fields[n_u64];
-
- p = (void *)entry + data->as_dynamic.offset;
- end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
+ unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
-
- for (; *p && p < end; p++)
- trace_seq_printf(s, "=> %pS\n", (void *)*p);
+ for (j = 1; j < data->as_dynamic.len / sizeof(long); j++)
+ trace_seq_printf(s, "=> %pS\n", (void *)p[j]);
n_u64++;
-
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
break;
}
- /* Include the zero'd element if it fits */
- if (len < HIST_STACKTRACE_DEPTH)
- len++;
-
len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 887f92e09ef34a949745ad26ce82be69e2dabcf6
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082759-esquire-online-0814@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
887f92e09ef3 ("tracing/synthetic: Skip first entry for stack traces")
ddeea494a16f ("tracing/synthetic: Use union instead of casts")
116b41162f8b ("Merge tag 'probes-v6.3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 887f92e09ef34a949745ad26ce82be69e2dabcf6 Mon Sep 17 00:00:00 2001
From: Sven Schnelle <svens(a)linux.ibm.com>
Date: Wed, 16 Aug 2023 17:49:27 +0200
Subject: [PATCH] tracing/synthetic: Skip first entry for stack traces
While debugging another issue I noticed that the stack trace output
contains the number of entries on top:
<idle>-0 [000] d..4. 203.322502: wake_lat: pid=0 delta=2268270616 stack=STACK:
=> 0x10
=> __schedule+0xac6/0x1a98
=> schedule+0x126/0x2c0
=> schedule_timeout+0x242/0x2c0
=> __wait_for_common+0x434/0x680
=> __wait_rcu_gp+0x198/0x3e0
=> synchronize_rcu+0x112/0x138
=> ring_buffer_reset_online_cpus+0x140/0x2e0
=> tracing_reset_online_cpus+0x15c/0x1d0
=> tracing_set_clock+0x180/0x1d8
=> hist_register_trigger+0x486/0x670
=> event_hist_trigger_parse+0x494/0x1318
=> trigger_process_regex+0x1d4/0x258
=> event_trigger_write+0xb4/0x170
=> vfs_write+0x210/0xad0
=> ksys_write+0x122/0x208
Fix this by skipping the first element. Also replace the pointer
logic with an index variable which is easier to read.
Link: https://lkml.kernel.org/r/20230816154928.4171614-3-svens@linux.ibm.com
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Signed-off-by: Sven Schnelle <svens(a)linux.ibm.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_synth.c b/kernel/trace/trace_events_synth.c
index 7fff8235075f..80a2a832f857 100644
--- a/kernel/trace/trace_events_synth.c
+++ b/kernel/trace/trace_events_synth.c
@@ -350,7 +350,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
struct trace_seq *s = &iter->seq;
struct synth_trace_event *entry;
struct synth_event *se;
- unsigned int i, n_u64;
+ unsigned int i, j, n_u64;
char print_fmt[32];
const char *fmt;
@@ -389,18 +389,13 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
n_u64 += STR_VAR_LEN_MAX / sizeof(u64);
}
} else if (se->fields[i]->is_stack) {
- unsigned long *p, *end;
union trace_synth_field *data = &entry->fields[n_u64];
-
- p = (void *)entry + data->as_dynamic.offset;
- end = (void *)p + data->as_dynamic.len - (sizeof(long) - 1);
+ unsigned long *p = (void *)entry + data->as_dynamic.offset;
trace_seq_printf(s, "%s=STACK:\n", se->fields[i]->name);
-
- for (; *p && p < end; p++)
- trace_seq_printf(s, "=> %pS\n", (void *)*p);
+ for (j = 1; j < data->as_dynamic.len / sizeof(long); j++)
+ trace_seq_printf(s, "=> %pS\n", (void *)p[j]);
n_u64++;
-
} else {
struct trace_print_flags __flags[] = {
__def_gfpflag_names, {-1, NULL} };
@@ -490,10 +485,6 @@ static unsigned int trace_stack(struct synth_trace_event *entry,
break;
}
- /* Include the zero'd element if it fits */
- if (len < HIST_STACKTRACE_DEPTH)
- len++;
-
len *= sizeof(long);
/* Find the dynamic section to copy the stack into. */
From: Pietro Borrello <borrello(a)diag.uniroma1.it>
commit 7c4a5b89a0b5a57a64b601775b296abf77a9fe97 upstream.
Commit 326587b84078 ("sched: fix goto retry in pick_next_task_rt()")
removed any path which could make pick_next_rt_entity() return NULL.
However, BUG_ON(!rt_se) in _pick_next_task_rt() (the only caller of
pick_next_rt_entity()) still checks the error condition, which can
never happen, since list_entry() never returns NULL.
Remove the BUG_ON check, and instead emit a warning in the only
possible error condition here: the queue being empty which should
never happen.
Fixes: 326587b84078 ("sched: fix goto retry in pick_next_task_rt()")
Signed-off-by: Pietro Borrello <borrello(a)diag.uniroma1.it>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Reviewed-by: Phil Auld <pauld(a)redhat.com>
Reviewed-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
Link: https://lore.kernel.org/r/20230128-list-entry-null-check-sched-v3-1-b1a71bd…
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Srish: Fixes CVE-2023-1077: sched/rt: pick_next_rt_entity(): check list_entry
An insufficient list empty checking in pick_next_rt_entity().
The _pick_next_task_rt() checks pick_next_rt_entity() returns
NULL or not but pick_next_rt_entity() never returns NULL.
So, even if the list is empty, _pick_next_task_rt() continues
its process.]
Signed-off-by: Srish Srinivasan <ssrish(a)vmware.com>
---
kernel/sched/rt.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c
index 9c6c3572b..394c66442 100644
--- a/kernel/sched/rt.c
+++ b/kernel/sched/rt.c
@@ -1522,6 +1522,8 @@ static struct sched_rt_entity *pick_next_rt_entity(struct rq *rq,
BUG_ON(idx >= MAX_RT_PRIO);
queue = array->queue + idx;
+ if (SCHED_WARN_ON(list_empty(queue)))
+ return NULL;
next = list_entry(queue->next, struct sched_rt_entity, run_list);
return next;
@@ -1535,7 +1537,8 @@ static struct task_struct *_pick_next_task_rt(struct rq *rq)
do {
rt_se = pick_next_rt_entity(rq, rt_rq);
- BUG_ON(!rt_se);
+ if (unlikely(!rt_se))
+ return NULL;
rt_rq = group_rt_rq(rt_se);
} while (rt_rq);
--
2.35.6
commit 9c7c4bc986932218fd0df9d2a100509772028fb1 upstream
sizeof(struct ublksrv_io_cmd) is 16bytes, which can be held in 64byte SQE,
so not necessary to check IO_URING_F_SQE128.
With this change, we get chance to save half SQ ring memory.
Fixed: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver")
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Link: https://lore.kernel.org/r/20230220041413.1524335-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
---
drivers/block/ublk_drv.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/block/ublk_drv.c b/drivers/block/ublk_drv.c
index f48d213fb65e..09d29fa53939 100644
--- a/drivers/block/ublk_drv.c
+++ b/drivers/block/ublk_drv.c
@@ -1271,9 +1271,6 @@ static int ublk_ch_uring_cmd(struct io_uring_cmd *cmd, unsigned int issue_flags)
__func__, cmd->cmd_op, ub_cmd->q_id, tag,
ub_cmd->result);
- if (!(issue_flags & IO_URING_F_SQE128))
- goto out;
-
if (ub_cmd->q_id >= ub->dev_info.nr_hw_queues)
goto out;
--
2.40.1
From: Sanjay R Mehta <sanju.mehta(a)amd.com>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
(cherry picked from commit 583893a66d731f5da010a3fa38a0460e05f0149b)
USB4v2 introduced support for uni-directional TMU mode as part of
d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode")
This is not a stable candidate commit, so adjust the code for backport.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/tmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/thunderbolt/tmu.c b/drivers/thunderbolt/tmu.c
index 626aca3124b1..d9544600b386 100644
--- a/drivers/thunderbolt/tmu.c
+++ b/drivers/thunderbolt/tmu.c
@@ -415,7 +415,8 @@ int tb_switch_tmu_disable(struct tb_switch *sw)
* uni-directional mode and we don't want to change it's TMU
* mode.
*/
- tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ return ret;
tb_port_tmu_time_sync_disable(up);
ret = tb_port_tmu_time_sync_disable(down);
--
2.34.1
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 9d3de7ee192a6a253f475197fe4d2e2af10a731f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072413-glamorous-unjustly-bb12@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 9d3de7ee192a6a253f475197fe4d2e2af10a731f Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Sat, 22 Jul 2023 22:45:24 +0530
Subject: [PATCH] ext4: fix rbtree traversal bug in ext4_mb_use_preallocated
During allocations, while looking for preallocations(PA) in the per
inode rbtree, we can't do a direct traversal of the tree because
ext4_mb_discard_group_preallocation() can paralelly mark the pa deleted
and that can cause direct traversal to skip some entries. This was
leading to a BUG_ON() being hit [1] when we missed a PA that could satisfy
our request and ultimately tried to create a new PA that would overlap
with the missed one.
To makes sure we handle that case while still keeping the performance of
the rbtree, we make use of the fact that the only pa that could possibly
overlap the original goal start is the one that satisfies the below
conditions:
1. It must have it's logical start immediately to the left of
(ie less than) original logical start.
2. It must not be deleted
To find this pa we use the following traversal method:
1. Descend into the rbtree normally to find the immediate neighboring
PA. Here we keep descending irrespective of if the PA is deleted or if
it overlaps with our request etc. The goal is to find an immediately
adjacent PA.
2. If the found PA is on right of original goal, use rb_prev() to find
the left adjacent PA.
3. Check if this PA is deleted and keep moving left with rb_prev() until
a non deleted PA is found.
4. This is the PA we are looking for. Now we can check if it can satisfy
the original request and proceed accordingly.
This approach also takes care of having deleted PAs in the tree.
(While we are at it, also fix a possible overflow bug in calculating the
end of a PA)
[1] https://lore.kernel.org/linux-ext4/CA+G9fYv2FRpLqBZf34ZinR8bU2_ZRAUOjKAD3+t…
Cc: stable(a)kernel.org # 6.4
Fixes: 3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Reported-by: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Reviewed-by: Ritesh Harjani (IBM) ritesh.list(a)gmail.com
Tested-by: Ritesh Harjani (IBM) ritesh.list(a)gmail.com
Link: https://lore.kernel.org/r/edd2efda6a83e6343c5ace9deea44813e71dbe20.16900459…
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index 456150ef6111..21b903fe546e 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -4765,8 +4765,8 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
int order, i;
struct ext4_inode_info *ei = EXT4_I(ac->ac_inode);
struct ext4_locality_group *lg;
- struct ext4_prealloc_space *tmp_pa, *cpa = NULL;
- ext4_lblk_t tmp_pa_start, tmp_pa_end;
+ struct ext4_prealloc_space *tmp_pa = NULL, *cpa = NULL;
+ loff_t tmp_pa_end;
struct rb_node *iter;
ext4_fsblk_t goal_block;
@@ -4774,47 +4774,151 @@ ext4_mb_use_preallocated(struct ext4_allocation_context *ac)
if (!(ac->ac_flags & EXT4_MB_HINT_DATA))
return false;
- /* first, try per-file preallocation */
+ /*
+ * first, try per-file preallocation by searching the inode pa rbtree.
+ *
+ * Here, we can't do a direct traversal of the tree because
+ * ext4_mb_discard_group_preallocation() can paralelly mark the pa
+ * deleted and that can cause direct traversal to skip some entries.
+ */
read_lock(&ei->i_prealloc_lock);
+
+ if (RB_EMPTY_ROOT(&ei->i_prealloc_node)) {
+ goto try_group_pa;
+ }
+
+ /*
+ * Step 1: Find a pa with logical start immediately adjacent to the
+ * original logical start. This could be on the left or right.
+ *
+ * (tmp_pa->pa_lstart never changes so we can skip locking for it).
+ */
for (iter = ei->i_prealloc_node.rb_node; iter;
iter = ext4_mb_pa_rb_next_iter(ac->ac_o_ex.fe_logical,
- tmp_pa_start, iter)) {
+ tmp_pa->pa_lstart, iter)) {
tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
pa_node.inode_node);
+ }
- /* all fields in this condition don't change,
- * so we can skip locking for them */
- tmp_pa_start = tmp_pa->pa_lstart;
- tmp_pa_end = tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+ /*
+ * Step 2: The adjacent pa might be to the right of logical start, find
+ * the left adjacent pa. After this step we'd have a valid tmp_pa whose
+ * logical start is towards the left of original request's logical start
+ */
+ if (tmp_pa->pa_lstart > ac->ac_o_ex.fe_logical) {
+ struct rb_node *tmp;
+ tmp = rb_prev(&tmp_pa->pa_node.inode_node);
- /* original request start doesn't lie in this PA */
- if (ac->ac_o_ex.fe_logical < tmp_pa_start ||
- ac->ac_o_ex.fe_logical >= tmp_pa_end)
- continue;
-
- /* non-extent files can't have physical blocks past 2^32 */
- if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
- (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
- EXT4_MAX_BLOCK_FILE_PHYS)) {
+ if (tmp) {
+ tmp_pa = rb_entry(tmp, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ } else {
/*
- * Since PAs don't overlap, we won't find any
- * other PA to satisfy this.
+ * If there is no adjacent pa to the left then finding
+ * an overlapping pa is not possible hence stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ }
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+
+ /*
+ * Step 3: If the left adjacent pa is deleted, keep moving left to find
+ * the first non deleted adjacent pa. After this step we should have a
+ * valid tmp_pa which is guaranteed to be non deleted.
+ */
+ for (iter = &tmp_pa->pa_node.inode_node;; iter = rb_prev(iter)) {
+ if (!iter) {
+ /*
+ * no non deleted left adjacent pa, so stop searching
+ * inode pa tree
+ */
+ goto try_group_pa;
+ }
+ tmp_pa = rb_entry(iter, struct ext4_prealloc_space,
+ pa_node.inode_node);
+ spin_lock(&tmp_pa->pa_lock);
+ if (tmp_pa->pa_deleted == 0) {
+ /*
+ * We will keep holding the pa_lock from
+ * this point on because we don't want group discard
+ * to delete this pa underneath us. Since group
+ * discard is anyways an ENOSPC operation it
+ * should be okay for it to wait a few more cycles.
*/
break;
- }
-
- /* found preallocated blocks, use them */
- spin_lock(&tmp_pa->pa_lock);
- if (tmp_pa->pa_deleted == 0 && tmp_pa->pa_free &&
- likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
- atomic_inc(&tmp_pa->pa_count);
- ext4_mb_use_inode_pa(ac, tmp_pa);
+ } else {
spin_unlock(&tmp_pa->pa_lock);
- read_unlock(&ei->i_prealloc_lock);
- return true;
}
- spin_unlock(&tmp_pa->pa_lock);
}
+
+ BUG_ON(!(tmp_pa && tmp_pa->pa_lstart <= ac->ac_o_ex.fe_logical));
+ BUG_ON(tmp_pa->pa_deleted == 1);
+
+ /*
+ * Step 4: We now have the non deleted left adjacent pa. Only this
+ * pa can possibly satisfy the request hence check if it overlaps
+ * original logical start and stop searching if it doesn't.
+ */
+ tmp_pa_end = (loff_t)tmp_pa->pa_lstart + EXT4_C2B(sbi, tmp_pa->pa_len);
+
+ if (ac->ac_o_ex.fe_logical >= tmp_pa_end) {
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ /* non-extent files can't have physical blocks past 2^32 */
+ if (!(ext4_test_inode_flag(ac->ac_inode, EXT4_INODE_EXTENTS)) &&
+ (tmp_pa->pa_pstart + EXT4_C2B(sbi, tmp_pa->pa_len) >
+ EXT4_MAX_BLOCK_FILE_PHYS)) {
+ /*
+ * Since PAs don't overlap, we won't find any other PA to
+ * satisfy this.
+ */
+ spin_unlock(&tmp_pa->pa_lock);
+ goto try_group_pa;
+ }
+
+ if (tmp_pa->pa_free && likely(ext4_mb_pa_goal_check(ac, tmp_pa))) {
+ atomic_inc(&tmp_pa->pa_count);
+ ext4_mb_use_inode_pa(ac, tmp_pa);
+ spin_unlock(&tmp_pa->pa_lock);
+ read_unlock(&ei->i_prealloc_lock);
+ return true;
+ } else {
+ /*
+ * We found a valid overlapping pa but couldn't use it because
+ * it had no free blocks. This should ideally never happen
+ * because:
+ *
+ * 1. When a new inode pa is added to rbtree it must have
+ * pa_free > 0 since otherwise we won't actually need
+ * preallocation.
+ *
+ * 2. An inode pa that is in the rbtree can only have it's
+ * pa_free become zero when another thread calls:
+ * ext4_mb_new_blocks
+ * ext4_mb_use_preallocated
+ * ext4_mb_use_inode_pa
+ *
+ * 3. Further, after the above calls make pa_free == 0, we will
+ * immediately remove it from the rbtree in:
+ * ext4_mb_new_blocks
+ * ext4_mb_release_context
+ * ext4_mb_put_pa
+ *
+ * 4. Since the pa_free becoming 0 and pa_free getting removed
+ * from tree both happen in ext4_mb_new_blocks, which is always
+ * called with i_data_sem held for data allocations, we can be
+ * sure that another process will never see a pa in rbtree with
+ * pa_free == 0.
+ */
+ WARN_ON_ONCE(tmp_pa->pa_free == 0);
+ }
+ spin_unlock(&tmp_pa->pa_lock);
+try_group_pa:
read_unlock(&ei->i_prealloc_lock);
/* can we use group allocation? */
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072456-starting-gauging-768c@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082118-donation-clench-604d@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
4b430d4ac997 ("mmc: block: Fix in_flight[issue_type] value error")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4b430d4ac99750ee2ae2f893f1055c7af1ec3dc5 Mon Sep 17 00:00:00 2001
From: Yibin Ding <yibin.ding(a)unisoc.com>
Date: Wed, 2 Aug 2023 10:30:23 +0800
Subject: [PATCH] mmc: block: Fix in_flight[issue_type] value error
For a completed request, after the mmc_blk_mq_complete_rq(mq, req)
function is executed, the bitmap_tags corresponding to the
request will be cleared, that is, the request will be regarded as
idle. If the request is acquired by a different type of process at
this time, the issue_type of the request may change. It further
caused the value of mq->in_flight[issue_type] to be abnormal,
and a large number of requests could not be sent.
p1: p2:
mmc_blk_mq_complete_rq
blk_mq_free_request
blk_mq_get_request
blk_mq_rq_ctx_init
mmc_blk_mq_dec_in_flight
mmc_issue_type(mq, req)
This strategy can ensure the consistency of issue_type
before and after executing mmc_blk_mq_complete_rq.
Fixes: 81196976ed94 ("mmc: block: Add blk-mq support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Yibin Ding <yibin.ding(a)unisoc.com>
Acked-by: Adrian Hunter <adrian.hunter(a)intel.com>
Link: https://lore.kernel.org/r/20230802023023.1318134-1-yunlong.xing@unisoc.com
Signed-off-by: Ulf Hansson <ulf.hansson(a)linaro.org>
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index f701efb1fa78..b6f4be25b31b 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -2097,14 +2097,14 @@ static void mmc_blk_mq_poll_completion(struct mmc_queue *mq,
mmc_blk_urgent_bkops(mq, mqrq);
}
-static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
+static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, enum mmc_issue_type issue_type)
{
unsigned long flags;
bool put_card;
spin_lock_irqsave(&mq->lock, flags);
- mq->in_flight[mmc_issue_type(mq, req)] -= 1;
+ mq->in_flight[issue_type] -= 1;
put_card = (mmc_tot_in_flight(mq) == 0);
@@ -2117,6 +2117,7 @@ static void mmc_blk_mq_dec_in_flight(struct mmc_queue *mq, struct request *req)
static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
bool can_sleep)
{
+ enum mmc_issue_type issue_type = mmc_issue_type(mq, req);
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_request *mrq = &mqrq->brq.mrq;
struct mmc_host *host = mq->card->host;
@@ -2136,7 +2137,7 @@ static void mmc_blk_mq_post_req(struct mmc_queue *mq, struct request *req,
blk_mq_complete_request(req);
}
- mmc_blk_mq_dec_in_flight(mq, req);
+ mmc_blk_mq_dec_in_flight(mq, issue_type);
}
void mmc_blk_mq_recovery(struct mmc_queue *mq)
We observed a 35% of regression running phoronix pts/ramspeed and also 16%
with unixbench. Regression is caused by the following commit:
dd0f194cfeb5 | mm: rewrite wait_on_page_bit_common() logic
Backporting this fixes the regression (this is already in 5.9+):
- 5ef64cc8987a mm: allow a controlled amount of unfairness in the page lock
Linus Torvalds (1):
mm: allow a controlled amount of unfairness in the page lock
include/linux/mm.h | 2 +
include/linux/wait.h | 2 +
kernel/sysctl.c | 8 +++
mm/filemap.c | 160 ++++++++++++++++++++++++++++++++++---------
4 files changed, 141 insertions(+), 31 deletions(-)
--
2.41.0
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080742-ion-implement-ceb1@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading do infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 656f9aec07dba7c61d469727494a5d1b18d0bef4
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082705-predator-enjoyable-15fb@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 656f9aec07dba7c61d469727494a5d1b18d0bef4 Mon Sep 17 00:00:00 2001
From: Huacai Chen <chenhuacai(a)kernel.org>
Date: Sat, 26 Aug 2023 22:21:57 +0800
Subject: [PATCH] LoongArch: Ensure FP/SIMD registers in the core dump file is
up to date
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So it may cause their values in the core
dump file incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index b541f6248837..c2d8962fda00 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -173,16 +173,30 @@ static inline void restore_fp(struct task_struct *tsk)
_restore_fp(&tsk->thread.fpu);
}
-static inline union fpureg *get_fpu_regs(struct task_struct *tsk)
+static inline void save_fpu_regs(struct task_struct *tsk)
{
+ unsigned int euen;
+
if (tsk == current) {
preempt_disable();
- if (is_fpu_owner())
+
+ euen = csr_read32(LOONGARCH_CSR_EUEN);
+
+#ifdef CONFIG_CPU_HAS_LASX
+ if (euen & CSR_EUEN_LASXEN)
+ _save_lasx(¤t->thread.fpu);
+ else
+#endif
+#ifdef CONFIG_CPU_HAS_LSX
+ if (euen & CSR_EUEN_LSXEN)
+ _save_lsx(¤t->thread.fpu);
+ else
+#endif
+ if (euen & CSR_EUEN_FPEN)
_save_fp(¤t->thread.fpu);
+
preempt_enable();
}
-
- return tsk->thread.fpu.fpr;
}
static inline int is_simd_owner(void)
diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index a0767c3a0f0a..f72adbf530c6 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -147,6 +147,8 @@ static int fpr_get(struct task_struct *target,
{
int r;
+ save_fpu_regs(target);
+
if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
r = gfpr_get(target, &to);
else
@@ -278,6 +280,8 @@ static int simd_get(struct task_struct *target,
{
const unsigned int wr_size = NUM_FPU_REGS * regset->size;
+ save_fpu_regs(target);
+
if (!tsk_used_math(target)) {
/* The task hasn't used FP or LSX, fill with 0xff */
copy_pad_fprs(target, regset, &to, 0);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a337b64f0d5717248a0c894e2618e658e6a9de9f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023080739-trinity-overfed-f523@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a337b64f0d57 ("drm/i915: Fix premature release of request's reusable memory")
506006055769 ("drm/i915/active: Fix misuse of non-idle barriers as fence trackers")
ad5c99e02047 ("drm/i915: Remove unused bits of i915_vma/active api")
f6c466b84cfa ("drm/i915: Add support for moving fence waiting")
544460c33821 ("drm/i915: Multi-BB execbuf")
5851387a422c ("drm/i915/guc: Implement no mid batch preemption for multi-lrc")
e5e32171a2cf ("drm/i915/guc: Connect UAPI to GuC multi-lrc interface")
d38a9294491d ("drm/i915/guc: Update debugfs for GuC multi-lrc")
bc955204919e ("drm/i915/guc: Insert submit fences between requests in parent-child relationship")
6b540bf6f143 ("drm/i915/guc: Implement multi-lrc submission")
99b47aaddfa9 ("drm/i915/guc: Implement parallel context pin / unpin functions")
c2aa552ff09d ("drm/i915/guc: Add multi-lrc context registration")
3897df4c0187 ("drm/i915/guc: Introduce context parent-child relationship")
4f3059dc2dbb ("drm/i915: Add logical engine mapping")
1a52faed3131 ("drm/i915/guc: Take GT PM ref when deregistering context")
0ea92ace8b95 ("drm/i915/guc: Move GuC guc_id allocation under submission state sub-struct")
0d8ee5ba8db4 ("drm/i915: Don't back up pinned LMEM context images and rings during suspend")
c56ce9565374 ("drm/i915 Implement LMEM backup and restore for suspend / resume")
0d9388635a22 ("drm/i915/ttm: Implement a function to copy the contents of two TTM-based objects")
68c03c0e985e ("drm/i915/debugfs: Do not report currently active engine when describing objects")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a337b64f0d5717248a0c894e2618e658e6a9de9f Mon Sep 17 00:00:00 2001
From: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Date: Thu, 20 Jul 2023 11:35:44 +0200
Subject: [PATCH] drm/i915: Fix premature release of request's reusable memory
Infinite waits for completion of GPU activity have been observed in CI,
mostly inside __i915_active_wait(), triggered by igt@gem_barrier_race or
igt@perf@stress-open-close. Root cause analysis, based of ftrace dumps
generated with a lot of extra trace_printk() calls added to the code,
revealed loops of request dependencies being accidentally built,
preventing the requests from being processed, each waiting for completion
of another one's activity.
After we substitute a new request for a last active one tracked on a
timeline, we set up a dependency of our new request to wait on completion
of current activity of that previous one. While doing that, we must take
care of keeping the old request still in memory until we use its
attributes for setting up that await dependency, or we can happen to set
up the await dependency on an unrelated request that already reuses the
memory previously allocated to the old one, already released. Combined
with perf adding consecutive kernel context remote requests to different
user context timelines, unresolvable loops of await dependencies can be
built, leading do infinite waits.
We obtain a pointer to the previous request to wait upon when we
substitute it with a pointer to our new request in an active tracker,
e.g. in intel_timeline.last_request. In some processing paths we protect
that old request from being freed before we use it by getting a reference
to it under RCU protection, but in others, e.g. __i915_request_commit()
-> __i915_request_add_to_timeline() -> __i915_request_ensure_ordering(),
we don't. But anyway, since the requests' memory is SLAB_FAILSAFE_BY_RCU,
that RCU protection is not sufficient against reuse of memory.
We could protect i915_request's memory from being prematurely reused by
calling its release function via call_rcu() and using rcu_read_lock()
consequently, as proposed in v1. However, that approach leads to
significant (up to 10 times) increase of SLAB utilization by i915_request
SLAB cache. Another potential approach is to take a reference to the
previous active fence.
When updating an active fence tracker, we first lock the new fence,
substitute a pointer of the current active fence with the new one, then we
lock the substituted fence. With this approach, there is a time window
after the substitution and before the lock when the request can be
concurrently released by an interrupt handler and its memory reused, then
we may happen to lock and return a new, unrelated request.
Always get a reference to the current active fence first, before
replacing it with a new one. Having it protected from premature release
and reuse, lock it and then replace with the new one but only if not
yet signalled via a potential concurrent interrupt nor replaced with
another one by a potential concurrent thread, otherwise retry, starting
from getting a reference to the new current one. Adjust users to not
get a reference to the previous active fence themselves and always put the
reference got by __i915_active_fence_set() when no longer needed.
v3: Fix lockdep splat reports and other issues caused by incorrect use of
try_cmpxchg() (use (cmpxchg() != prev) instead)
v2: Protect request's memory by getting a reference to it in favor of
delegating its release to call_rcu() (Chris)
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/8211
Fixes: df9f85d8582e ("drm/i915: Serialise i915_active_fence_set() with itself")
Suggested-by: Chris Wilson <chris(a)chris-wilson.co.uk>
Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> # v5.6+
Reviewed-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Signed-off-by: Andi Shyti <andi.shyti(a)linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230720093543.832147-2-janus…
(cherry picked from commit 946e047a3d88d46d15b5c5af0414098e12b243f7)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 8ef93889061a..5ec293011d99 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -449,8 +449,11 @@ int i915_active_add_request(struct i915_active *ref, struct i915_request *rq)
}
} while (unlikely(is_barrier(active)));
- if (!__i915_active_fence_set(active, fence))
+ fence = __i915_active_fence_set(active, fence);
+ if (!fence)
__i915_active_acquire(ref);
+ else
+ dma_fence_put(fence);
out:
i915_active_release(ref);
@@ -469,13 +472,9 @@ __i915_active_set_fence(struct i915_active *ref,
return NULL;
}
- rcu_read_lock();
prev = __i915_active_fence_set(active, fence);
- if (prev)
- prev = dma_fence_get_rcu(prev);
- else
+ if (!prev)
__i915_active_acquire(ref);
- rcu_read_unlock();
return prev;
}
@@ -1019,10 +1018,11 @@ void i915_request_add_active_barriers(struct i915_request *rq)
*
* Records the new @fence as the last active fence along its timeline in
* this active tracker, moving the tracking callbacks from the previous
- * fence onto this one. Returns the previous fence (if not already completed),
- * which the caller must ensure is executed before the new fence. To ensure
- * that the order of fences within the timeline of the i915_active_fence is
- * understood, it should be locked by the caller.
+ * fence onto this one. Gets and returns a reference to the previous fence
+ * (if not already completed), which the caller must put after making sure
+ * that it is executed before the new fence. To ensure that the order of
+ * fences within the timeline of the i915_active_fence is understood, it
+ * should be locked by the caller.
*/
struct dma_fence *
__i915_active_fence_set(struct i915_active_fence *active,
@@ -1031,7 +1031,23 @@ __i915_active_fence_set(struct i915_active_fence *active,
struct dma_fence *prev;
unsigned long flags;
- if (fence == rcu_access_pointer(active->fence))
+ /*
+ * In case of fences embedded in i915_requests, their memory is
+ * SLAB_FAILSAFE_BY_RCU, then it can be reused right after release
+ * by new requests. Then, there is a risk of passing back a pointer
+ * to a new, completely unrelated fence that reuses the same memory
+ * while tracked under a different active tracker. Combined with i915
+ * perf open/close operations that build await dependencies between
+ * engine kernel context requests and user requests from different
+ * timelines, this can lead to dependency loops and infinite waits.
+ *
+ * As a countermeasure, we try to get a reference to the active->fence
+ * first, so if we succeed and pass it back to our user then it is not
+ * released and potentially reused by an unrelated request before the
+ * user has a chance to set up an await dependency on it.
+ */
+ prev = i915_active_fence_get(active);
+ if (fence == prev)
return fence;
GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags));
@@ -1040,27 +1056,56 @@ __i915_active_fence_set(struct i915_active_fence *active,
* Consider that we have two threads arriving (A and B), with
* C already resident as the active->fence.
*
- * A does the xchg first, and so it sees C or NULL depending
- * on the timing of the interrupt handler. If it is NULL, the
- * previous fence must have been signaled and we know that
- * we are first on the timeline. If it is still present,
- * we acquire the lock on that fence and serialise with the interrupt
- * handler, in the process removing it from any future interrupt
- * callback. A will then wait on C before executing (if present).
- *
- * As B is second, it sees A as the previous fence and so waits for
- * it to complete its transition and takes over the occupancy for
- * itself -- remembering that it needs to wait on A before executing.
+ * Both A and B have got a reference to C or NULL, depending on the
+ * timing of the interrupt handler. Let's assume that if A has got C
+ * then it has locked C first (before B).
*
* Note the strong ordering of the timeline also provides consistent
* nesting rules for the fence->lock; the inner lock is always the
* older lock.
*/
spin_lock_irqsave(fence->lock, flags);
- prev = xchg(__active_fence_slot(active), fence);
- if (prev) {
- GEM_BUG_ON(prev == fence);
+ if (prev)
spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+
+ /*
+ * A does the cmpxchg first, and so it sees C or NULL, as before, or
+ * something else, depending on the timing of other threads and/or
+ * interrupt handler. If not the same as before then A unlocks C if
+ * applicable and retries, starting from an attempt to get a new
+ * active->fence. Meanwhile, B follows the same path as A.
+ * Once A succeeds with cmpxch, B fails again, retires, gets A from
+ * active->fence, locks it as soon as A completes, and possibly
+ * succeeds with cmpxchg.
+ */
+ while (cmpxchg(__active_fence_slot(active), prev, fence) != prev) {
+ if (prev) {
+ spin_unlock(prev->lock);
+ dma_fence_put(prev);
+ }
+ spin_unlock_irqrestore(fence->lock, flags);
+
+ prev = i915_active_fence_get(active);
+ GEM_BUG_ON(prev == fence);
+
+ spin_lock_irqsave(fence->lock, flags);
+ if (prev)
+ spin_lock_nested(prev->lock, SINGLE_DEPTH_NESTING);
+ }
+
+ /*
+ * If prev is NULL then the previous fence must have been signaled
+ * and we know that we are first on the timeline. If it is still
+ * present then, having the lock on that fence already acquired, we
+ * serialise with the interrupt handler, in the process of removing it
+ * from any future interrupt callback. A will then wait on C before
+ * executing (if present).
+ *
+ * As B is second, it sees A as the previous fence and so waits for
+ * it to complete its transition and takes over the occupancy for
+ * itself -- remembering that it needs to wait on A before executing.
+ */
+ if (prev) {
__list_del_entry(&active->cb.node);
spin_unlock(prev->lock); /* serialise with prev->cb_list */
}
@@ -1077,11 +1122,7 @@ int i915_active_fence_set(struct i915_active_fence *active,
int err = 0;
/* Must maintain timeline ordering wrt previous active requests */
- rcu_read_lock();
fence = __i915_active_fence_set(active, &rq->fence);
- if (fence) /* but the previous fence may not belong to that timeline! */
- fence = dma_fence_get_rcu(fence);
- rcu_read_unlock();
if (fence) {
err = i915_request_await_dma_fence(rq, fence);
dma_fence_put(fence);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 894068bb37b6..833b73edefdb 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1661,6 +1661,11 @@ __i915_request_ensure_parallel_ordering(struct i915_request *rq,
request_to_parent(rq)->parallel.last_rq = i915_request_get(rq);
+ /*
+ * Users have to put a reference potentially got by
+ * __i915_active_fence_set() to the returned request
+ * when no longer needed
+ */
return to_request(__i915_active_fence_set(&timeline->last_request,
&rq->fence));
}
@@ -1707,6 +1712,10 @@ __i915_request_ensure_ordering(struct i915_request *rq,
0);
}
+ /*
+ * Users have to put the reference to prev potentially got
+ * by __i915_active_fence_set() when no longer needed
+ */
return prev;
}
@@ -1760,6 +1769,8 @@ __i915_request_add_to_timeline(struct i915_request *rq)
prev = __i915_request_ensure_ordering(rq, timeline);
else
prev = __i915_request_ensure_parallel_ordering(rq, timeline);
+ if (prev)
+ i915_request_put(prev);
/*
* Make sure that no request gazumped us - if it was allocated after
From: Joel Fernandes <joel(a)joelfernandes.org>
[ Upstream commit d52d3a2bf408ff86f3a79560b5cce80efb340239 ]
During shutdown of rcutorture, the shutdown thread in
rcu_torture_cleanup() calls torture_cleanup_begin() which sets fullstop
to FULLSTOP_RMMOD. This is enough to cause the rcutorture threads for
readers and fakewriters to breakout of their main while loop and start
shutting down.
Once out of their main loop, they then call torture_kthread_stopping()
which in turn waits for kthread_stop() to be called, however
rcu_torture_cleanup() has not even called kthread_stop() on those
threads yet, it does that a bit later. However, before it gets a chance
to do so, torture_kthread_stopping() calls
schedule_timeout_interruptible(1) in a tight loop. Tracing confirmed
this makes the timer softirq constantly execute timer callbacks, while
never returning back to the softirq exit path and is essentially "locked
up" because of that. If the softirq preempts the shutdown thread,
kthread_stop() may never be called.
This commit improves the situation dramatically, by increasing timeout
passed to schedule_timeout_interruptible() 1/20th of a second. This
causes the timer softirq to not lock up a CPU and everything works fine.
Testing has shown 100 runs of TREE07 passing reliably, which was not the
case before because of RCU stalls.
Cc: Paul McKenney <paulmck(a)kernel.org>
Cc: Frederic Weisbecker <fweisbec(a)gmail.com>
Cc: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # 6.0.x
Signed-off-by: Joel Fernandes (Google) <joel(a)joelfernandes.org>
Reviewed-by: Davidlohr Bueso <dave(a)stgolabs.net>
Tested-by: Zhouyi Zhou <zhouzhouyi(a)gmail.com>
---
kernel/torture.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/torture.c b/kernel/torture.c
index 1061492f14bd..477d9b601438 100644
--- a/kernel/torture.c
+++ b/kernel/torture.c
@@ -788,7 +788,7 @@ void torture_kthread_stopping(char *title)
VERBOSE_TOROUT_STRING(buf);
while (!kthread_should_stop()) {
torture_shutdown_absorb(title);
- schedule_timeout_uninterruptible(1);
+ schedule_timeout_uninterruptible(HZ/20);
}
}
EXPORT_SYMBOL_GPL(torture_kthread_stopping);
--
2.41.0.640.ga95def55d0-goog
These two are backports for 6.1.y. Conflict resolution in done in
both patches.
I have tested LTP-nfs fchown02 and chown02 on 6.1.y with below patches
applied. The tests passed.
I would like to have a review as I am not familiar with this code.
Thanks to Vegard for helping me with this.
Thanks,
Harshit
Christian Brauner (2):
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
fs/attr.c | 1 +
fs/internal.h | 2 --
fs/nfs/inode.c | 4 +---
fs/nfsd/vfs.c | 4 +++-
include/linux/fs.h | 2 ++
5 files changed, 7 insertions(+), 6 deletions(-)
--
2.34.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x f9e96bf1905479f18e83a3a4c314a8dfa56ede2c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082736-canal-swimsuit-6b7b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
f9e96bf19054 ("drm/vmwgfx: Fix possible invalid drm gem put calls")
9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
668b206601c5 ("drm/vmwgfx: Stop using raw ttm_buffer_object's")
39985eea5a6d ("drm/vmwgfx: Abstract placement selection")
e0029da927fa ("drm/vmwgfx: Rename dummy to is_iomem")
cb8097a45da1 ("drm/vmwgfx: Cleanup the vmw bo usage in the cursor paths")
6703e28f976d ("drm/vmwgfx: Simplify fb pinning")
09881d2940bb ("drm/vmwgfx: Rename vmw_buffer_object to vmw_bo")
6b2e8aa45126 ("drm/vmwgfx: Remove the duplicate bo_free function")
f87c1f0b7b79 ("drm/ttm: prevent moving of pinned BOs")
aebd8f0c6f82 ("Merge v6.2-rc6 into drm-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From f9e96bf1905479f18e83a3a4c314a8dfa56ede2c Mon Sep 17 00:00:00 2001
From: Zack Rusin <zackr(a)vmware.com>
Date: Fri, 18 Aug 2023 00:13:01 -0400
Subject: [PATCH] drm/vmwgfx: Fix possible invalid drm gem put calls
vmw_bo_unreference sets the input buffer to null on exit, resulting in
null ptr deref's on the subsequent drm gem put calls.
This went unnoticed because only very old userspace would be exercising
those paths but it wouldn't be hard to hit on old distros with brand
new kernels.
Introduce a new function that abstracts unrefing of user bo's to make
the code cleaner and more explicit.
Signed-off-by: Zack Rusin <zackr(a)vmware.com>
Reported-by: Ian Forbes <iforbes(a)vmware.com>
Fixes: 9ef8d83e8e25 ("drm/vmwgfx: Do not drop the reference to the handle too soon")
Cc: <stable(a)vger.kernel.org> # v6.4+
Reviewed-by: Maaz Mombasawala<mombasawalam(a)vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230818041301.407636-1-zack@…
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
index 82094c137855..c43853597776 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.c
@@ -497,10 +497,9 @@ static int vmw_user_bo_synccpu_release(struct drm_file *filp,
if (!(flags & drm_vmw_synccpu_allow_cs)) {
atomic_dec(&vmw_bo->cpu_writers);
}
- ttm_bo_put(&vmw_bo->tbo);
+ vmw_user_bo_unref(vmw_bo);
}
- drm_gem_object_put(&vmw_bo->tbo.base);
return ret;
}
@@ -540,8 +539,7 @@ int vmw_user_bo_synccpu_ioctl(struct drm_device *dev, void *data,
return ret;
ret = vmw_user_bo_synccpu_grab(vbo, arg->flags);
- vmw_bo_unreference(&vbo);
- drm_gem_object_put(&vbo->tbo.base);
+ vmw_user_bo_unref(vbo);
if (unlikely(ret != 0)) {
if (ret == -ERESTARTSYS || ret == -EBUSY)
return -EBUSY;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
index 50a836e70994..1d433fceed3d 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_bo.h
@@ -195,6 +195,14 @@ static inline struct vmw_bo *vmw_bo_reference(struct vmw_bo *buf)
return buf;
}
+static inline void vmw_user_bo_unref(struct vmw_bo *vbo)
+{
+ if (vbo) {
+ ttm_bo_put(&vbo->tbo);
+ drm_gem_object_put(&vbo->tbo.base);
+ }
+}
+
static inline struct vmw_bo *to_vmw_bo(struct drm_gem_object *gobj)
{
return container_of((gobj), struct vmw_bo, tbo.base);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
index d30c0e3d3ab7..98e0723ca6f5 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c
@@ -1164,8 +1164,7 @@ static int vmw_translate_mob_ptr(struct vmw_private *dev_priv,
}
vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_MOB, VMW_BO_DOMAIN_MOB);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
- ttm_bo_put(&vmw_bo->tbo);
- drm_gem_object_put(&vmw_bo->tbo.base);
+ vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
@@ -1221,8 +1220,7 @@ static int vmw_translate_guest_ptr(struct vmw_private *dev_priv,
vmw_bo_placement_set(vmw_bo, VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM,
VMW_BO_DOMAIN_GMR | VMW_BO_DOMAIN_VRAM);
ret = vmw_validation_add_bo(sw_context->ctx, vmw_bo);
- ttm_bo_put(&vmw_bo->tbo);
- drm_gem_object_put(&vmw_bo->tbo.base);
+ vmw_user_bo_unref(vmw_bo);
if (unlikely(ret != 0))
return ret;
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
index b62207be3363..1489ad73c103 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c
@@ -1665,10 +1665,8 @@ static struct drm_framebuffer *vmw_kms_fb_create(struct drm_device *dev,
err_out:
/* vmw_user_lookup_handle takes one ref so does new_fb */
- if (bo) {
- vmw_bo_unreference(&bo);
- drm_gem_object_put(&bo->tbo.base);
- }
+ if (bo)
+ vmw_user_bo_unref(bo);
if (surface)
vmw_surface_unreference(&surface);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
index 7e112319a23c..fb85f244c3d0 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_overlay.c
@@ -451,8 +451,7 @@ int vmw_overlay_ioctl(struct drm_device *dev, void *data,
ret = vmw_overlay_update_stream(dev_priv, buf, arg, true);
- vmw_bo_unreference(&buf);
- drm_gem_object_put(&buf->tbo.base);
+ vmw_user_bo_unref(buf);
out_unlock:
mutex_unlock(&overlay->mutex);
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
index e7226db8b242..1e81ff2422cf 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_shader.c
@@ -809,8 +809,7 @@ static int vmw_shader_define(struct drm_device *dev, struct drm_file *file_priv,
shader_type, num_input_sig,
num_output_sig, tfile, shader_handle);
out_bad_arg:
- vmw_bo_unreference(&buffer);
- drm_gem_object_put(&buffer->tbo.base);
+ vmw_user_bo_unref(buffer);
return ret;
}
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, all of device base address, byte access offset
and register shift is included in the platform specific
ISA access translation (in asm/mm_io.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
Changes from v4:
Geert Uytterhoeven:
- use %px for ap->ioaddr.data_addr
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..616064b02de6 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %px",
+ base, ctl_base, ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
since the commit
a855724dc08b pinctrl: amd: Fix mistake in handling clearing pins at startup
after boot, pressing power button is detected just once and ignored
afterwards.
product: IdeaPad 5 14ALC05
cpu: AMD Ryzen 5 5500U with Radeon Graphics
bios version: G5CN16WW(V1.04)
distro: Arch Linux
desktop environment: KDE Plasma 5.27.7
steps to reproduce:
boot the computer
log in
run sudo evtest
select event2
(on my computer the power button is always represented by
/dev/input/event2,
i don't know if it's the same on others)
press the power button multiple times
(might have to close the log out dialog depending on the DE)
expected behavior:
all the power button presses are recorded
observed behavior:
only the first power button press is recorded
i also have a desktop computer with a ryzen 5 2600x processor, but that
isn't affected
#regzbot introduced: a855724dc08b
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 2f406263e3e954aa24c1248edcfa9be0c1bb30fa
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082620-overact-feisty-c309@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 2f406263e3e954aa24c1248edcfa9be0c1bb30fa Mon Sep 17 00:00:00 2001
From: Yin Fengwei <fengwei.yin(a)intel.com>
Date: Tue, 8 Aug 2023 10:09:15 +0800
Subject: [PATCH] madvise:madvise_cold_or_pageout_pte_range(): don't use
mapcount() against large folio for sharing check
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/madvise.c b/mm/madvise.c
index bfe0e06427bd..46802b4cf65a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -384,7 +384,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -458,7 +458,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082644-dimmed-purse-07c2@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
cfeb6ae8bcb9 ("maple_tree: disable mas_wr_append() when other readers are possible")
2e1da329b424 ("maple_tree: add comments and some minor cleanups to mas_wr_append()")
c6fc9e4a5c50 ("maple_tree: add mas_wr_new_end() to calculate new_end accurately")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Fri, 18 Aug 2023 20:43:55 -0400
Subject: [PATCH] maple_tree: disable mas_wr_append() when other readers are
possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4dd73cf936a6..f723024e1426 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082642-energetic-playpen-0bea@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cfeb6ae8bcb96ccf674724f223661bbcef7b0d0b Mon Sep 17 00:00:00 2001
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Date: Fri, 18 Aug 2023 20:43:55 -0400
Subject: [PATCH] maple_tree: disable mas_wr_append() when other readers are
possible
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 4dd73cf936a6..f723024e1426 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_end(struct ma_wr_state *wr_mas)
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 987aae75fc1041072941ffb622b45ce2359a99b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082645-unicycle-diaphragm-272c@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
987aae75fc10 ("batman-adv: Hold rtnl lock during MTU update via netlink")
3e15b06eb7e4 ("batman-adv: Add fragmentation mesh genl configuration")
a1c8de803296 ("batman-adv: Add distributed_arp_table mesh genl configuration")
43ff6105a527 ("batman-adv: Add bridge_loop_avoidance mesh genl configuration")
d7e52506b680 ("batman-adv: Add bonding mesh genl configuration")
e43d16b87dc2 ("batman-adv: Add ap_isolation mesh/vlan genl configuration")
9ab4cee5ced9 ("batman-adv: Add aggregated_ogms mesh genl configuration")
49e7e37cd981 ("batman-adv: Prepare framework for vlan genl config")
5c55a40fa801 ("batman-adv: Prepare framework for hardif genl config")
600405135360 ("batman-adv: Prepare framework for mesh genl config")
c4a7a8d9bb8f ("batman-adv: Move common genl doit code pre/post hooks")
fb69be697916 ("batman-adv: Add inconsistent hardif netlink dump detection")
53dd9a68ba68 ("batman-adv: add multicast flags netlink support")
41aeefcc38a2 ("batman-adv: add DAT cache netlink support")
fec149f5d323 ("batman-adv: Convert packet.h to uapi header")
7e9a8c2ce7c5 ("batman-adv: Use parentheses in function kernel-doc")
7db7d9f369a4 ("batman-adv: Add SPDX license identifier above copyright header")
40b16b9be577 ("batman-adv: use inline kernel-doc for uapi constants")
706cc9f51d9a ("batman-adv: Add argument names for function ptr definitions")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 987aae75fc1041072941ffb622b45ce2359a99b9 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Mon, 21 Aug 2023 21:48:48 +0200
Subject: [PATCH] batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7b…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 987aae75fc1041072941ffb622b45ce2359a99b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082644-placard-bullfight-dbc3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
987aae75fc10 ("batman-adv: Hold rtnl lock during MTU update via netlink")
3e15b06eb7e4 ("batman-adv: Add fragmentation mesh genl configuration")
a1c8de803296 ("batman-adv: Add distributed_arp_table mesh genl configuration")
43ff6105a527 ("batman-adv: Add bridge_loop_avoidance mesh genl configuration")
d7e52506b680 ("batman-adv: Add bonding mesh genl configuration")
e43d16b87dc2 ("batman-adv: Add ap_isolation mesh/vlan genl configuration")
9ab4cee5ced9 ("batman-adv: Add aggregated_ogms mesh genl configuration")
49e7e37cd981 ("batman-adv: Prepare framework for vlan genl config")
5c55a40fa801 ("batman-adv: Prepare framework for hardif genl config")
600405135360 ("batman-adv: Prepare framework for mesh genl config")
c4a7a8d9bb8f ("batman-adv: Move common genl doit code pre/post hooks")
fb69be697916 ("batman-adv: Add inconsistent hardif netlink dump detection")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 987aae75fc1041072941ffb622b45ce2359a99b9 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Mon, 21 Aug 2023 21:48:48 +0200
Subject: [PATCH] batman-adv: Hold rtnl lock during MTU update via netlink
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Reviewed-by: Simon Horman <horms(a)kernel.org>
Link: https://lore.kernel.org/r/20230821-batadv-missing-mtu-rtnl-lock-v1-1-1c5a7b…
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x d8e42a2b0addf238be8b3b37dcd9795a5c1be459
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082656-sadly-harness-6601@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
d8e42a2b0add ("batman-adv: Don't increase MTU when set by user")
c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
8b84cc4fb556 ("batman-adv: Use inline kernel-doc for enum/struct")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d8e42a2b0addf238be8b3b37dcd9795a5c1be459 Mon Sep 17 00:00:00 2001
From: Sven Eckelmann <sven(a)narfation.org>
Date: Wed, 19 Jul 2023 10:01:15 +0200
Subject: [PATCH] batman-adv: Don't increase MTU when set by user
If the user set an MTU value, it usually means that there are special
requirements for the MTU. But if an interface gots activated, the MTU was
always recalculated and then the user set value was overwritten.
The only reason why this user set value has to be overwritten, is when the
MTU has to be decreased because batman-adv is not able to transfer packets
with the user specified size.
Fixes: c6c8fea29769 ("net: Add batman-adv meshing protocol")
Cc: stable(a)vger.kernel.org
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
Signed-off-by: Simon Wunderlich <sw(a)simonwunderlich.de>
diff --git a/net/batman-adv/hard-interface.c b/net/batman-adv/hard-interface.c
index ae5762af0146..24c9c0c3f316 100644
--- a/net/batman-adv/hard-interface.c
+++ b/net/batman-adv/hard-interface.c
@@ -630,7 +630,19 @@ int batadv_hardif_min_mtu(struct net_device *soft_iface)
*/
void batadv_update_min_mtu(struct net_device *soft_iface)
{
- dev_set_mtu(soft_iface, batadv_hardif_min_mtu(soft_iface));
+ struct batadv_priv *bat_priv = netdev_priv(soft_iface);
+ int limit_mtu;
+ int mtu;
+
+ mtu = batadv_hardif_min_mtu(soft_iface);
+
+ if (bat_priv->mtu_set_by_user)
+ limit_mtu = bat_priv->mtu_set_by_user;
+ else
+ limit_mtu = ETH_DATA_LEN;
+
+ mtu = min(mtu, limit_mtu);
+ dev_set_mtu(soft_iface, mtu);
/* Check if the local translate table should be cleaned up to match a
* new (and smaller) MTU.
diff --git a/net/batman-adv/soft-interface.c b/net/batman-adv/soft-interface.c
index d3fdf82282af..85d00dc9ce32 100644
--- a/net/batman-adv/soft-interface.c
+++ b/net/batman-adv/soft-interface.c
@@ -153,11 +153,14 @@ static int batadv_interface_set_mac_addr(struct net_device *dev, void *p)
static int batadv_interface_change_mtu(struct net_device *dev, int new_mtu)
{
+ struct batadv_priv *bat_priv = netdev_priv(dev);
+
/* check ranges */
if (new_mtu < 68 || new_mtu > batadv_hardif_min_mtu(dev))
return -EINVAL;
dev->mtu = new_mtu;
+ bat_priv->mtu_set_by_user = new_mtu;
return 0;
}
diff --git a/net/batman-adv/types.h b/net/batman-adv/types.h
index ca9449ec9836..cf1a0eafe3ab 100644
--- a/net/batman-adv/types.h
+++ b/net/batman-adv/types.h
@@ -1546,6 +1546,12 @@ struct batadv_priv {
/** @soft_iface: net device which holds this struct as private data */
struct net_device *soft_iface;
+ /**
+ * @mtu_set_by_user: MTU was set once by user
+ * protected by rtnl_lock
+ */
+ int mtu_set_by_user;
+
/**
* @bat_counters: mesh internal traffic statistic counters (see
* batadv_counters)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x e2c1ab070fdc81010ec44634838d24fce9ff9e53
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082631-preacher-spousal-e0e5@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()")
7adb45887c8a ("mm: memory-failure: kill soft_offline_free_page()")
2a57d83c78f8 ("mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e2c1ab070fdc81010ec44634838d24fce9ff9e53 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Tue, 27 Jun 2023 19:28:08 +0800
Subject: [PATCH] mm: memory-failure: fix unexpected return value in
soft_offline_page()
When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page succeeds while it in
fact failed. So the user will not try again later in this case.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 139b31fdb678..fe121fdb05f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ int soft_offline_page(unsigned long pfn, int flags)
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e2c1ab070fdc81010ec44634838d24fce9ff9e53
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082619-rocky-strobe-5ad9@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e2c1ab070fdc ("mm: memory-failure: fix unexpected return value in soft_offline_page()")
7adb45887c8a ("mm: memory-failure: kill soft_offline_free_page()")
2a57d83c78f8 ("mm/hwpoison: clear MF_COUNT_INCREASED before retrying get_any_page()")
dad4e5b39086 ("mm: fix page reference leak in soft_offline_page()")
8295d535e2aa ("mm,hwpoison: refactor get_any_page")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e2c1ab070fdc81010ec44634838d24fce9ff9e53 Mon Sep 17 00:00:00 2001
From: Miaohe Lin <linmiaohe(a)huawei.com>
Date: Tue, 27 Jun 2023 19:28:08 +0800
Subject: [PATCH] mm: memory-failure: fix unexpected return value in
soft_offline_page()
When page_handle_poison() fails to handle the hugepage or free page in
retry path, soft_offline_page() will return 0 while -EBUSY is expected in
this case.
Consequently the user will think soft_offline_page succeeds while it in
fact failed. So the user will not try again later in this case.
Link: https://lkml.kernel.org/r/20230627112808.1275241-1-linmiaohe@huawei.com
Fixes: b94e02822deb ("mm,hwpoison: try to narrow window race for free pages")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Acked-by: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 139b31fdb678..fe121fdb05f7 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -2741,10 +2741,13 @@ int soft_offline_page(unsigned long pfn, int flags)
if (ret > 0) {
ret = soft_offline_in_use_page(page);
} else if (ret == 0) {
- if (!page_handle_poison(page, true, false) && try_again) {
- try_again = false;
- flags &= ~MF_COUNT_INCREASED;
- goto retry;
+ if (!page_handle_poison(page, true, false)) {
+ if (try_again) {
+ try_again = false;
+ flags &= ~MF_COUNT_INCREASED;
+ goto retry;
+ }
+ ret = -EBUSY;
}
}
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x d74943a2f3cdade34e471b36f55f7979be656867
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082639-crushed-impart-a42d@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
d74943a2f3cd ("mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT")
2c2241081f7d ("mm/gup: move private gup FOLL_ flags to internal.h")
63b605128655 ("mm/gup: move gup_must_unshare() to mm/internal.h")
f04740f54594 ("mm/gup: add FOLL_UNLOCKABLE")
d64e2dbc33a1 ("mm/gup: simplify the external interface functions and consolidate invariants")
afa3c33e2684 ("mm/gup: don't call __gup_longterm_locked() if FOLL_LONGTERM cannot be set")
b2a72dff85fa ("mm/gup: have internal functions get the mmap_read_lock()")
b5054174ac7c ("mm: move FOLL_* defs to mm_types.h")
8fa590bf3448 ("Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From d74943a2f3cdade34e471b36f55f7979be656867 Mon Sep 17 00:00:00 2001
From: David Hildenbrand <david(a)redhat.com>
Date: Thu, 3 Aug 2023 16:32:02 +0200
Subject: [PATCH] mm/gup: reintroduce FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
Unfortunately commit 474098edac26 ("mm/gup: replace FOLL_NUMA by
gup_can_follow_protnone()") missed that follow_page() and
follow_trans_huge_pmd() never implicitly set FOLL_NUMA because they really
don't want to fail on PROT_NONE-mapped pages -- either due to NUMA hinting
or due to inaccessible (PROT_NONE) VMAs.
As spelled out in commit 0b9d705297b2 ("mm: numa: Support NUMA hinting
page faults from gup/gup_fast"): "Other follow_page callers like KSM
should not use FOLL_NUMA, or they would fail to get the pages if they use
follow_page instead of get_user_pages."
liubo reported [1] that smaps_rollup results are imprecise, because they
miss accounting of pages that are mapped PROT_NONE. Further, it's easy to
reproduce that KSM no longer works on inaccessible VMAs on x86-64, because
pte_protnone()/pmd_protnone() also indictaes "true" in inaccessible VMAs,
and follow_page() refuses to return such pages right now.
As KVM really depends on these NUMA hinting faults, removing the
pte_protnone()/pmd_protnone() handling in GUP code completely is not
really an option.
To fix the issues at hand, let's revive FOLL_NUMA as FOLL_HONOR_NUMA_FAULT
to restore the original behavior for now and add better comments.
Set FOLL_HONOR_NUMA_FAULT independent of FOLL_FORCE in
is_valid_gup_args(), to add that flag for all external GUP users.
Note that there are three GUP-internal __get_user_pages() users that don't
end up calling is_valid_gup_args() and consequently won't get
FOLL_HONOR_NUMA_FAULT set.
1) get_dump_page(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE and wouldn't have honored NUMA
hinting faults already.
2) populate_vma_page_range(): we really don't want to handle NUMA hinting
faults. It specifies FOLL_FORCE on accessible VMAs, so it wouldn't have
honored NUMA hinting faults already.
3) faultin_vma_page_range(): we similarly don't want to handle NUMA
hinting faults.
To make the combination of FOLL_FORCE and FOLL_HONOR_NUMA_FAULT work in
inaccessible VMAs properly, we have to perform VMA accessibility checks in
gup_can_follow_protnone().
As GUP-fast should reject such pages either way in
pte_access_permitted()/pmd_access_permitted() -- for example on x86-64 and
arm64 that both implement pte_protnone() -- let's just always fallback to
ordinary GUP when stumbling over pte_protnone()/pmd_protnone().
As Linus notes [2], honoring NUMA faults might only make sense for
selected GUP users.
So we should really see if we can instead let relevant GUP callers specify
it manually, and not trigger NUMA hinting faults from GUP as default.
Prepare for that by making FOLL_HONOR_NUMA_FAULT an external GUP flag and
adding appropriate documenation.
While at it, remove a stale comment from follow_trans_huge_pmd(): That
comment for pmd_protnone() was added in commit 2b4847e73004 ("mm: numa:
serialise parallel get_user_page against THP migration"), which noted:
THP does not unmap pages due to a lack of support for migration
entries at a PMD level. This allows races with get_user_pages
Nowadays, we do have PMD migration entries, so the comment no longer
applies. Let's drop it.
[1] https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
[2] https://lore.kernel.org/r/CAHk-=wgRiP_9X0rRdZKT8nhemZGNateMtb366t37d8-x7VRs…
Link: https://lkml.kernel.org/r/20230803143208.383663-2-david@redhat.com
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: liubo <liubo254(a)huawei.com>
Closes: https://lore.kernel.org/r/20230726073409.631838-1-liubo254@huawei.com
Reported-by: Peter Xu <peterx(a)redhat.com>
Closes: https://lore.kernel.org/all/ZMKJjDaqZ7FW0jfe@x1n/
Acked-by: Mel Gorman <mgorman(a)techsingularity.net>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: John Hubbard <jhubbard(a)nvidia.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Mel Gorman <mgorman(a)suse.de>
Cc: Paolo Bonzini <pbonzini(a)redhat.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 406ab9ea818f..34f9dba17c1a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3421,15 +3421,24 @@ static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
* Indicates whether GUP can follow a PROT_NONE mapped page, or whether
* a (NUMA hinting) fault is required.
*/
-static inline bool gup_can_follow_protnone(unsigned int flags)
+static inline bool gup_can_follow_protnone(struct vm_area_struct *vma,
+ unsigned int flags)
{
/*
- * FOLL_FORCE has to be able to make progress even if the VMA is
- * inaccessible. Further, FOLL_FORCE access usually does not represent
- * application behaviour and we should avoid triggering NUMA hinting
- * faults.
+ * If callers don't want to honor NUMA hinting faults, no need to
+ * determine if we would actually have to trigger a NUMA hinting fault.
*/
- return flags & FOLL_FORCE;
+ if (!(flags & FOLL_HONOR_NUMA_FAULT))
+ return true;
+
+ /*
+ * NUMA hinting faults don't apply in inaccessible (PROT_NONE) VMAs.
+ *
+ * Requiring a fault here even for inaccessible VMAs would mean that
+ * FOLL_FORCE cannot make any progress, because handle_mm_fault()
+ * refuses to process NUMA hinting faults in inaccessible VMAs.
+ */
+ return !vma_is_accessible(vma);
}
typedef int (*pte_fn_t)(pte_t *pte, unsigned long addr, void *data);
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e74ce4a28cd..7d30dc4ff0ff 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1286,6 +1286,15 @@ enum {
FOLL_PCI_P2PDMA = 1 << 10,
/* allow interrupts from generic signals */
FOLL_INTERRUPTIBLE = 1 << 11,
+ /*
+ * Always honor (trigger) NUMA hinting faults.
+ *
+ * FOLL_WRITE implicitly honors NUMA hinting faults because a
+ * PROT_NONE-mapped page is not writable (exceptions with FOLL_FORCE
+ * apply). get_user_pages_fast_only() always implicitly honors NUMA
+ * hinting faults.
+ */
+ FOLL_HONOR_NUMA_FAULT = 1 << 12,
/* See also internal only FOLL flags in mm/internal.h */
};
diff --git a/mm/gup.c b/mm/gup.c
index 76d222ccc3ff..6e2f9e9d6537 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -597,7 +597,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
pte = ptep_get(ptep);
if (!pte_present(pte))
goto no_page;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ if (pte_protnone(pte) && !gup_can_follow_protnone(vma, flags))
goto no_page;
page = vm_normal_page(vma, address, pte);
@@ -714,7 +714,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma,
if (likely(!pmd_trans_huge(pmdval)))
return follow_page_pte(vma, address, pmd, flags, &ctx->pgmap);
- if (pmd_protnone(pmdval) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(pmdval) && !gup_can_follow_protnone(vma, flags))
return no_page_table(vma, flags);
ptl = pmd_lock(mm, pmd);
@@ -851,6 +851,10 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
+ /*
+ * We never set FOLL_HONOR_NUMA_FAULT because callers don't expect
+ * to fail on PROT_NONE-mapped pages.
+ */
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
@@ -2227,6 +2231,13 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
gup_flags |= FOLL_UNLOCKABLE;
}
+ /*
+ * For now, always trigger NUMA hinting faults. Some GUP users like
+ * KVM require the hint to be as the calling context of GUP is
+ * functionally similar to a memory reference from task context.
+ */
+ gup_flags |= FOLL_HONOR_NUMA_FAULT;
+
/* FOLL_GET and FOLL_PIN are mutually exclusive. */
if (WARN_ON_ONCE((gup_flags & (FOLL_PIN | FOLL_GET)) ==
(FOLL_PIN | FOLL_GET)))
@@ -2551,7 +2562,14 @@ static int gup_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
struct page *page;
struct folio *folio;
- if (pte_protnone(pte) && !gup_can_follow_protnone(flags))
+ /*
+ * Always fallback to ordinary GUP on PROT_NONE-mapped pages:
+ * pte_access_permitted() better should reject these pages
+ * either way: otherwise, GUP-fast might succeed in
+ * cases where ordinary GUP would fail due to VMA access
+ * permissions.
+ */
+ if (pte_protnone(pte))
goto pte_unmap;
if (!pte_access_permitted(pte, flags & FOLL_WRITE))
@@ -2970,8 +2988,8 @@ static int gup_pmd_range(pud_t *pudp, pud_t pud, unsigned long addr, unsigned lo
if (unlikely(pmd_trans_huge(pmd) || pmd_huge(pmd) ||
pmd_devmap(pmd))) {
- if (pmd_protnone(pmd) &&
- !gup_can_follow_protnone(flags))
+ /* See gup_pte_range() */
+ if (pmd_protnone(pmd))
return 0;
if (!gup_huge_pmd(pmd, pmdp, addr, next, flags,
@@ -3151,7 +3169,7 @@ static int internal_get_user_pages_fast(unsigned long start,
if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
FOLL_FORCE | FOLL_PIN | FOLL_GET |
FOLL_FAST_ONLY | FOLL_NOFAULT |
- FOLL_PCI_P2PDMA)))
+ FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
return -EINVAL;
if (gup_flags & FOLL_PIN)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..f15d557e5708 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1467,8 +1467,7 @@ struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
return ERR_PTR(-EFAULT);
- /* Full NUMA hinting faults to serialise migration in fault paths */
- if (pmd_protnone(*pmd) && !gup_can_follow_protnone(flags))
+ if (pmd_protnone(*pmd) && !gup_can_follow_protnone(vma, flags))
return NULL;
if (!pmd_write(*pmd) && gup_must_unshare(vma, flags, page))
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 1cbc11aaa01f80577b67ae02c73ee781112125fd
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082613-lilac-overpower-64a2@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
1cbc11aaa01f ("NFSv4: Fix dropped lock for racing OPEN and delegation return")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 1cbc11aaa01f80577b67ae02c73ee781112125fd Mon Sep 17 00:00:00 2001
From: Benjamin Coddington <bcodding(a)redhat.com>
Date: Fri, 30 Jun 2023 09:18:13 -0400
Subject: [PATCH] NFSv4: Fix dropped lock for racing OPEN and delegation return
Commmit f5ea16137a3f ("NFSv4: Retry LOCK on OLD_STATEID during delegation
return") attempted to solve this problem by using nfs4's generic async error
handling, but introduced a regression where v4.0 lock recovery would hang.
The additional complexity introduced by overloading that error handling is
not necessary for this case. This patch expects that commit to be
reverted.
The problem as originally explained in the above commit is:
There's a small window where a LOCK sent during a delegation return can
race with another OPEN on client, but the open stateid has not yet been
updated. In this case, the client doesn't handle the OLD_STATEID error
from the server and will lose this lock, emitting:
"NFS: nfs4_handle_delegation_recall_error: unhandled error -10024".
Fix this by using the old_stateid refresh helpers if the server replies
with OLD_STATEID.
Suggested-by: Trond Myklebust <trondmy(a)hammerspace.com>
Signed-off-by: Benjamin Coddington <bcodding(a)redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust(a)hammerspace.com>
diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index e1a886b58354..4604e9f3d1b0 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -7181,8 +7181,15 @@ static void nfs4_lock_done(struct rpc_task *task, void *calldata)
} else if (!nfs4_update_lock_stateid(lsp, &data->res.stateid))
goto out_restart;
break;
- case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_OLD_STATEID:
+ if (data->arg.new_lock_owner != 0 &&
+ nfs4_refresh_open_old_stateid(&data->arg.open_stateid,
+ lsp->ls_state))
+ goto out_restart;
+ if (nfs4_refresh_lock_old_stateid(&data->arg.lock_stateid, lsp))
+ goto out_restart;
+ fallthrough;
+ case -NFS4ERR_BAD_STATEID:
case -NFS4ERR_STALE_STATEID:
case -NFS4ERR_EXPIRED:
if (data->arg.new_lock_owner != 0) {
Hi Sasha
I just saw that you queued up mm-disable-config_per_vma_lock-until-its-fixed.patch for 6.4.
The problems that this patch tried to prevent were fixed before it actually made it into a
release, and Linus un-did the commit in his merge (at the bottom):
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/m…
Since the fixes for PER_VMA_LOCK have been in 6.4 releases for a while, this patch
should not go in.
thanks
Holger
Take the vCPU's mmu_seq snapshot as an "unsigned long" instead of an "int"
when checking to see if a page fault is stale, as the sequence count is
stored as an "unsigned long" everywhere else in KVM. This fixes a bug
where KVM will effectively hang vCPUs due to always thinking page faults
are stale, which results in KVM refusing to "fix" faults.
mmu_invalidate_seq (née mmu_notifier_seq) is a sequence counter used when
KVM is handling page faults to detect if userspace mappings relevant to
the guest were invalidated between snapshotting the counter and acquiring
mmu_lock, i.e. to ensure that the userspace mapping KVM is using to
resolve the page fault is fresh. If KVM sees that the counter has
changed, KVM simply resumes the guest without fixing the fault.
What _should_ happen is that the source of the mmu_notifier invalidations
eventually goes away, mmu_invalidate_seq becomes stable, and KVM can once
again fix guest page fault(s).
But for a long-lived VM and/or a VM that the host just doesn't particularly
like, it's possible for a VM to be on the receiving end of 2 billion (with
a B) mmu_notifier invalidations. When that happens, bit 31 will be set in
mmu_invalidate_seq. This causes the value to be turned into a 32-bit
negative value when implicitly cast to an "int" by is_page_fault_stale(),
and then sign-extended into a 64-bit unsigned when the signed "int" is
implicitly cast back to an "unsigned long" on the call to
mmu_invalidate_retry_hva().
As a result of the casting and sign-extension, given a sequence counter of
e.g. 0x8002dc25, mmu_invalidate_retry_hva() ends up doing
if (0x8002dc25 != 0xffffffff8002dc25)
and signals that the page fault is stale and needs to be retried even
though the sequence counter is stable, and KVM effectively hangs any vCPU
that takes a page fault (EPT violation or #NPF when TDP is enabled).
Note, upstream commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq
in kvm_faultin_pfn()") unknowingly fixed the bug in v6.3 when refactoring
how KVM tracks the sequence counter snapshot.
Reported-by: Brian Rak <brak(a)vultr.com>
Reported-by: Amaan Cheval <amaan.cheval(a)gmail.com>
Reported-by: Eric Wheeler <kvm(a)lists.ewheeler.net>
Closes: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameserver…
Fixes: a955cad84cda ("KVM: x86/mmu: Retry page fault if root is invalidated by memslot update")
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/mmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 230108a90cf3..beca03556379 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4212,7 +4212,8 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
* root was invalidated by a memslot update or a relevant mmu_notifier fired.
*/
static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
- struct kvm_page_fault *fault, int mmu_seq)
+ struct kvm_page_fault *fault,
+ unsigned long mmu_seq)
{
struct kvm_mmu_page *sp = to_shadow_page(vcpu->arch.mmu->root.hpa);
base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Upstream commit edbdb43fc96b11b3bfa531be306a1993d9fe89ec.
Preserve TDP MMU roots until they are explicitly invalidated by gifting
the TDP MMU itself a reference to a root when it is allocated. Keeping a
reference in the TDP MMU fixes a flaw where the TDP MMU exhibits terrible
performance, and can potentially even soft-hang a vCPU, if a vCPU
frequently unloads its roots, e.g. when KVM is emulating SMI+RSM.
When KVM emulates something that invalidates _all_ TLB entries, e.g. SMI
and RSM, KVM unloads all of the vCPUs roots (KVM keeps a small per-vCPU
cache of previous roots). Unloading roots is a simple way to ensure KVM
flushes and synchronizes all roots for the vCPU, as KVM flushes and syncs
when allocating a "new" root (from the vCPU's perspective).
In the shadow MMU, KVM keeps track of all shadow pages, roots included, in
a per-VM hash table. Unloading a shadow MMU root just wipes it from the
per-vCPU cache; the root is still tracked in the per-VM hash table. When
KVM loads a "new" root for the vCPU, KVM will find the old, unloaded root
in the per-VM hash table.
Unlike the shadow MMU, the TDP MMU doesn't track "inactive" roots in a
per-VM structure, where "active" in this case means a root is either
in-use or cached as a previous root by at least one vCPU. When a TDP MMU
root becomes inactive, i.e. the last vCPU reference to the root is put,
KVM immediately frees the root (asterisk on "immediately" as the actual
freeing may be done by a worker, but for all intents and purposes the root
is gone).
The TDP MMU behavior is especially problematic for 1-vCPU setups, as
unloading all roots effectively frees all roots. The issue is mitigated
to some degree in multi-vCPU setups as a different vCPU usually holds a
reference to an unloaded root and thus keeps the root alive, allowing the
vCPU to reuse its old root after unloading (with a flush+sync).
The TDP MMU flaw has been known for some time, as until very recently,
KVM's handling of CR0.WP also triggered unloading of all roots. The
CR0.WP toggling scenario was eventually addressed by not unloading roots
when _only_ CR0.WP is toggled, but such an approach doesn't Just Work
for emulating SMM as KVM must emulate a full TLB flush on entry and exit
to/from SMM. Given that the shadow MMU plays nice with unloading roots
at will, teaching the TDP MMU to do the same is far less complex than
modifying KVM to track which roots need to be flushed before reuse.
Note, preserving all possible TDP MMU roots is not a concern with respect
to memory consumption. Now that the role for direct MMUs doesn't include
information about the guest, e.g. CR0.PG, CR0.WP, CR4.SMEP, etc., there
are _at most_ six possible roots (where "guest_mode" here means L2):
1. 4-level !SMM !guest_mode
2. 4-level SMM !guest_mode
3. 5-level !SMM !guest_mode
4. 5-level SMM !guest_mode
5. 4-level !SMM guest_mode
6. 5-level !SMM guest_mode
And because each vCPU can track 4 valid roots, a VM can already have all
6 root combinations live at any given time. Not to mention that, in
practice, no sane VMM will advertise different guest.MAXPHYADDR values
across vCPUs, i.e. KVM won't ever use both 4-level and 5-level roots for
a single VM. Furthermore, the vast majority of modern hypervisors will
utilize EPT/NPT when available, thus the guest_mode=%true cases are also
unlikely to be utilized.
[6.1 backport notes: conflicts with
09732d2b4dc5 ("KVM: x86/mmu: Move TDP MMU VM init/uninit behind tdp_mmu_enabled")
1f98f2bd8ec4 ("KVM: x86/mmu: Change tdp_mmu to a read-only parameter")
de0322f575be ("KVM: x86/mmu: Replace open coded usage of tdp_mmu_page with is_tdp_mmu_page()")
prevented a clean cherry-pick. First two resolved by keeping 6.1's check
on kvm->arch.tdp_mmu_enabled, last one resolved by taking the upstream
change, i.e. by opportunistically switching to is_tdp_mmu_page()]
Reported-by: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Link: https://lore.kernel.org/all/959c5bce-beb5-b463-7158-33fc4a4f910c@linux.micr…
Link: https://lkml.kernel.org/r/20220209170020.1775368-1-pbonzini%40redhat.com
Link: https://lore.kernel.org/all/20230322013731.102955-1-minipli@grsecurity.net
Link: https://lore.kernel.org/all/000000000000a0bc2b05f9dd7fab@google.com
Link: https://lore.kernel.org/all/000000000000eca0b905fa0f7756@google.com
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230426220323.3079789-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 121 +++++++++++++++++--------------------
1 file changed, 56 insertions(+), 65 deletions(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 672f0432d777..70945f00ec41 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -51,7 +51,17 @@ void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm)
if (!kvm->arch.tdp_mmu_enabled)
return;
- /* Also waits for any queued work items. */
+ /*
+ * Invalidate all roots, which besides the obvious, schedules all roots
+ * for zapping and thus puts the TDP MMU's reference to each root, i.e.
+ * ultimately frees all roots.
+ */
+ kvm_tdp_mmu_invalidate_all_roots(kvm);
+
+ /*
+ * Destroying a workqueue also first flushes the workqueue, i.e. no
+ * need to invoke kvm_tdp_mmu_zap_invalidated_roots().
+ */
destroy_workqueue(kvm->arch.tdp_mmu_zap_wq);
WARN_ON(!list_empty(&kvm->arch.tdp_mmu_pages));
@@ -127,16 +137,6 @@ static void tdp_mmu_schedule_zap_root(struct kvm *kvm, struct kvm_mmu_page *root
queue_work(kvm->arch.tdp_mmu_zap_wq, &root->tdp_mmu_async_work);
}
-static inline bool kvm_tdp_root_mark_invalid(struct kvm_mmu_page *page)
-{
- union kvm_mmu_page_role role = page->role;
- role.invalid = true;
-
- /* No need to use cmpxchg, only the invalid bit can change. */
- role.word = xchg(&page->role.word, role.word);
- return role.invalid;
-}
-
void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
bool shared)
{
@@ -145,45 +145,12 @@ void kvm_tdp_mmu_put_root(struct kvm *kvm, struct kvm_mmu_page *root,
if (!refcount_dec_and_test(&root->tdp_mmu_root_count))
return;
- WARN_ON(!root->tdp_mmu_page);
-
/*
- * The root now has refcount=0. It is valid, but readers already
- * cannot acquire a reference to it because kvm_tdp_mmu_get_root()
- * rejects it. This remains true for the rest of the execution
- * of this function, because readers visit valid roots only
- * (except for tdp_mmu_zap_root_work(), which however
- * does not acquire any reference itself).
- *
- * Even though there are flows that need to visit all roots for
- * correctness, they all take mmu_lock for write, so they cannot yet
- * run concurrently. The same is true after kvm_tdp_root_mark_invalid,
- * since the root still has refcount=0.
- *
- * However, tdp_mmu_zap_root can yield, and writers do not expect to
- * see refcount=0 (see for example kvm_tdp_mmu_invalidate_all_roots()).
- * So the root temporarily gets an extra reference, going to refcount=1
- * while staying invalid. Readers still cannot acquire any reference;
- * but writers are now allowed to run if tdp_mmu_zap_root yields and
- * they might take an extra reference if they themselves yield.
- * Therefore, when the reference is given back by the worker,
- * there is no guarantee that the refcount is still 1. If not, whoever
- * puts the last reference will free the page, but they will not have to
- * zap the root because a root cannot go from invalid to valid.
+ * The TDP MMU itself holds a reference to each root until the root is
+ * explicitly invalidated, i.e. the final reference should be never be
+ * put for a valid root.
*/
- if (!kvm_tdp_root_mark_invalid(root)) {
- refcount_set(&root->tdp_mmu_root_count, 1);
-
- /*
- * Zapping the root in a worker is not just "nice to have";
- * it is required because kvm_tdp_mmu_invalidate_all_roots()
- * skips already-invalid roots. If kvm_tdp_mmu_put_root() did
- * not add the root to the workqueue, kvm_tdp_mmu_zap_all_fast()
- * might return with some roots not zapped yet.
- */
- tdp_mmu_schedule_zap_root(kvm, root);
- return;
- }
+ KVM_BUG_ON(!is_tdp_mmu_page(root) || !root->role.invalid, kvm);
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_del_rcu(&root->link);
@@ -329,7 +296,14 @@ hpa_t kvm_tdp_mmu_get_vcpu_root_hpa(struct kvm_vcpu *vcpu)
root = tdp_mmu_alloc_sp(vcpu);
tdp_mmu_init_sp(root, NULL, 0, role);
- refcount_set(&root->tdp_mmu_root_count, 1);
+ /*
+ * TDP MMU roots are kept until they are explicitly invalidated, either
+ * by a memslot update or by the destruction of the VM. Initialize the
+ * refcount to two; one reference for the vCPU, and one reference for
+ * the TDP MMU itself, which is held until the root is invalidated and
+ * is ultimately put by tdp_mmu_zap_root_work().
+ */
+ refcount_set(&root->tdp_mmu_root_count, 2);
spin_lock(&kvm->arch.tdp_mmu_pages_lock);
list_add_rcu(&root->link, &kvm->arch.tdp_mmu_roots);
@@ -1027,32 +1001,49 @@ void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm)
/*
* Mark each TDP MMU root as invalid to prevent vCPUs from reusing a root that
* is about to be zapped, e.g. in response to a memslots update. The actual
- * zapping is performed asynchronously, so a reference is taken on all roots.
- * Using a separate workqueue makes it easy to ensure that the destruction is
- * performed before the "fast zap" completes, without keeping a separate list
- * of invalidated roots; the list is effectively the list of work items in
- * the workqueue.
+ * zapping is performed asynchronously. Using a separate workqueue makes it
+ * easy to ensure that the destruction is performed before the "fast zap"
+ * completes, without keeping a separate list of invalidated roots; the list is
+ * effectively the list of work items in the workqueue.
*
- * Get a reference even if the root is already invalid, the asynchronous worker
- * assumes it was gifted a reference to the root it processes. Because mmu_lock
- * is held for write, it should be impossible to observe a root with zero refcount,
- * i.e. the list of roots cannot be stale.
- *
- * This has essentially the same effect for the TDP MMU
- * as updating mmu_valid_gen does for the shadow MMU.
+ * Note, the asynchronous worker is gifted the TDP MMU's reference.
+ * See kvm_tdp_mmu_get_vcpu_root_hpa().
*/
void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm)
{
struct kvm_mmu_page *root;
- lockdep_assert_held_write(&kvm->mmu_lock);
- list_for_each_entry(root, &kvm->arch.tdp_mmu_roots, link) {
- if (!root->role.invalid &&
- !WARN_ON_ONCE(!kvm_tdp_mmu_get_root(root))) {
+ /*
+ * mmu_lock must be held for write to ensure that a root doesn't become
+ * invalid while there are active readers (invalidating a root while
+ * there are active readers may or may not be problematic in practice,
+ * but it's uncharted territory and not supported).
+ *
+ * Waive the assertion if there are no users of @kvm, i.e. the VM is
+ * being destroyed after all references have been put, or if no vCPUs
+ * have been created (which means there are no roots), i.e. the VM is
+ * being destroyed in an error path of KVM_CREATE_VM.
+ */
+ if (IS_ENABLED(CONFIG_PROVE_LOCKING) &&
+ refcount_read(&kvm->users_count) && kvm->created_vcpus)
+ lockdep_assert_held_write(&kvm->mmu_lock);
+
+ /*
+ * As above, mmu_lock isn't held when destroying the VM! There can't
+ * be other references to @kvm, i.e. nothing else can invalidate roots
+ * or be consuming roots, but walking the list of roots does need to be
+ * guarded against roots being deleted by the asynchronous zap worker.
+ */
+ rcu_read_lock();
+
+ list_for_each_entry_rcu(root, &kvm->arch.tdp_mmu_roots, link) {
+ if (!root->role.invalid) {
root->role.invalid = true;
tdp_mmu_schedule_zap_root(kvm, root);
}
}
+
+ rcu_read_unlock();
}
/*
base-commit: 802aacbbffe2512dce9f8f33ad99d01cfec435de
--
2.42.0.rc2.253.gd59a3bf2b4-goog
Disable the TDP MMU by default in v5.15 kernels to "fix" several severe
performance bugs that have since been found and fixed in the TDP MMU, but
are unsuitable for backporting to v5.15.
The problematic bugs are fixed by upstream commit edbdb43fc96b ("KVM:
x86: Preserve TDP MMU roots until they are explicitly invalidated") and
commit 01b31714bd90 ("KVM: x86: Do not unload MMU roots when only toggling
CR0.WP with TDP enabled"). Both commits fix scenarios where KVM will
rebuild all TDP MMU page tables in paths that are frequently hit by
certain guest workloads. While not exactly common, the guest workloads
are far from rare. The fallout of rebuilding TDP MMU page tables can be
so severe in some cases that it induces soft lockups in the guest.
Commit edbdb43fc96b would require _significant_ effort and churn to
backport due it depending on a major rework that was done in v5.18.
Commit 01b31714bd90 has far fewer direct conflicts, but has several subtle
_known_ dependencies, and it's unclear whether or not there are more
unknown dependencies that have been missed.
Lastly, disabling the TDP MMU in v5.15 kernels also fixes a lurking train
wreck started by upstream commit a955cad84cda ("KVM: x86/mmu: Retry page
fault if root is invalidated by memslot update"). That commit was tagged
for stable to fix a memory leak, but didn't cherry-pick cleanly and was
never backported to v5.15. Which is extremely fortunate, as it introduced
not one but two bugs, one of which was fixed by upstream commit
18c841e1f411 ("KVM: x86: Retry page fault if MMU reload is pending and
root has no sp"), while the other was unknowingly fixed by upstream
commit ba6e3fe25543 ("KVM: x86/mmu: Grab mmu_invalidate_seq in
kvm_faultin_pfn()") in v6.3 (a one-off fix will be made for v6.1 kernels,
which did receive a backport for a955cad84cda). Disabling the TDP MMU
by default reduces the probability of breaking v5.15 kernels by
backporting only a subset of the fixes.
As far as what is lost by disabling the TDP MMU, the main selling point of
the TDP MMU is its ability to service page fault VM-Exits in parallel,
i.e. the main benefactors of the TDP MMU are deployments of large VMs
(hundreds of vCPUs), and in particular delployments that live-migrate such
VMs and thus need to fault-in huge amounts of memory on many vCPUs after
restarting the VM after migration.
Smaller VMs can see performance improvements, but nowhere enough to make
up for the TDP MMU (in v5.15) absolutely cratering performance for some
workloads. And practically speaking, anyone that is deploying and
migrating VMs with hundreds of vCPUs is likely rolling their own kernel,
not using a stock v5.15 series kernel.
This reverts commit 71ba3f3189c78f756a659568fb473600fd78f207.
Link: https://lore.kernel.org/all/ZDmEGM+CgYpvDLh6@google.com
Link: https://lore.kernel.org/all/f023d927-52aa-7e08-2ee5-59a2fbc65953@gameserver…
Cc: Jeremi Piotrowski <jpiotrowski(a)linux.microsoft.com>
Cc: Mathias Krause <minipli(a)grsecurity.net>
Signed-off-by: Sean Christopherson <seanjc(a)google.com>
---
arch/x86/kvm/mmu/tdp_mmu.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 6c2bb60ccd88..7a64fb238044 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -10,7 +10,7 @@
#include <asm/cmpxchg.h>
#include <trace/events/kvm.h>
-static bool __read_mostly tdp_mmu_enabled = true;
+static bool __read_mostly tdp_mmu_enabled = false;
module_param_named(tdp_mmu, tdp_mmu_enabled, bool, 0644);
/* Initializes the TDP MMU for the VM, if enabled. */
base-commit: f6f7927ac664ba23447f8dd3c3dfe2f4ee39272f
--
2.42.0.rc2.253.gd59a3bf2b4-goog
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5310760af1d4fbea1452bfc77db5f9a680f7ae47
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082114-remix-cable-0852@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5310760af1d4 ("ipvs: fix racy memcpy in proc_do_sync_threshold")
1b90af292e71 ("ipvs: Improve robustness to the ipvs sysctl")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5310760af1d4fbea1452bfc77db5f9a680f7ae47 Mon Sep 17 00:00:00 2001
From: Sishuai Gong <sishuai.system(a)gmail.com>
Date: Thu, 10 Aug 2023 15:12:42 -0400
Subject: [PATCH] ipvs: fix racy memcpy in proc_do_sync_threshold
When two threads run proc_do_sync_threshold() in parallel,
data races could happen between the two memcpy():
Thread-1 Thread-2
memcpy(val, valp, sizeof(val));
memcpy(valp, val, sizeof(val));
This race might mess up the (struct ctl_table *) table->data,
so we add a mutex lock to serialize them.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Link: https://lore.kernel.org/netdev/B6988E90-0A1E-4B85-BF26-2DAF6D482433@gmail.c…
Signed-off-by: Sishuai Gong <sishuai.system(a)gmail.com>
Acked-by: Simon Horman <horms(a)kernel.org>
Acked-by: Julian Anastasov <ja(a)ssi.bg>
Signed-off-by: Florian Westphal <fw(a)strlen.de>
diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 62606fb44d02..4bb0d90eca1c 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -1876,6 +1876,7 @@ static int
proc_do_sync_threshold(struct ctl_table *table, int write,
void *buffer, size_t *lenp, loff_t *ppos)
{
+ struct netns_ipvs *ipvs = table->extra2;
int *valp = table->data;
int val[2];
int rc;
@@ -1885,6 +1886,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
.mode = table->mode,
};
+ mutex_lock(&ipvs->sync_mutex);
memcpy(val, valp, sizeof(val));
rc = proc_dointvec(&tmp, write, buffer, lenp, ppos);
if (write) {
@@ -1894,6 +1896,7 @@ proc_do_sync_threshold(struct ctl_table *table, int write,
else
memcpy(valp, val, sizeof(val));
}
+ mutex_unlock(&ipvs->sync_mutex);
return rc;
}
@@ -4321,6 +4324,7 @@ static int __net_init ip_vs_control_net_init_sysctl(struct netns_ipvs *ipvs)
ipvs->sysctl_sync_threshold[0] = DEFAULT_SYNC_THRESHOLD;
ipvs->sysctl_sync_threshold[1] = DEFAULT_SYNC_PERIOD;
tbl[idx].data = &ipvs->sysctl_sync_threshold;
+ tbl[idx].extra2 = ipvs;
tbl[idx++].maxlen = sizeof(ipvs->sysctl_sync_threshold);
ipvs->sysctl_sync_refresh_period = DEFAULT_SYNC_REFRESH_PERIOD;
tbl[idx++].data = &ipvs->sysctl_sync_refresh_period;
This is a port of commit 379eb01c21795edb4c ("riscv: Ensure the value
of FP registers in the core dump file is up to date").
The values of FP/SIMD registers in the core dump file come from the
thread.fpu. However, kernel saves the FP/SIMD registers only before
scheduling out the process. If no process switch happens during the
exception handling, kernel will not have a chance to save the latest
values of FP/SIMD registers. So it may cause their values in the core
dump file incorrect. To solve this problem, force fpr_get()/simd_get()
to save the FP/SIMD registers into the thread.fpu if the target task
equals the current task.
Cc: stable(a)vger.kernel.org
Signed-off-by: Huacai Chen <chenhuacai(a)loongson.cn>
---
V2: Rename get_fpu_regs() to save_fpu_regs().
arch/loongarch/include/asm/fpu.h | 22 ++++++++++++++++++----
arch/loongarch/kernel/ptrace.c | 4 ++++
2 files changed, 22 insertions(+), 4 deletions(-)
diff --git a/arch/loongarch/include/asm/fpu.h b/arch/loongarch/include/asm/fpu.h
index b541f6248837..08a45e9fd15c 100644
--- a/arch/loongarch/include/asm/fpu.h
+++ b/arch/loongarch/include/asm/fpu.h
@@ -173,16 +173,30 @@ static inline void restore_fp(struct task_struct *tsk)
_restore_fp(&tsk->thread.fpu);
}
-static inline union fpureg *get_fpu_regs(struct task_struct *tsk)
+static inline void save_fpu_regs(struct task_struct *tsk)
{
+ unsigned int euen;
+
if (tsk == current) {
preempt_disable();
- if (is_fpu_owner())
+
+ euen = csr_read32(LOONGARCH_CSR_EUEN);
+
+#ifdef CONFIG_CPU_HAS_LASX
+ if (euen & CSR_EUEN_LASXEN)
+ _save_lasx(¤t->thread.fpu);
+ else
+#endif
+#ifdef CONFIG_CPU_HAS_LSX
+ if (euen & CSR_EUEN_LSXEN)
+ _save_lsx(¤t->thread.fpu);
+ else
+#endif
+ if (euen & CSR_EUEN_FPEN)
_save_fp(¤t->thread.fpu);
+
preempt_enable();
}
-
- return tsk->thread.fpu.fpr;
}
static inline int is_simd_owner(void)
diff --git a/arch/loongarch/kernel/ptrace.c b/arch/loongarch/kernel/ptrace.c
index a0767c3a0f0a..9a75dc43eb29 100644
--- a/arch/loongarch/kernel/ptrace.c
+++ b/arch/loongarch/kernel/ptrace.c
@@ -147,6 +147,8 @@ static int fpr_get(struct task_struct *target,
{
int r;
+ save_fpu_regs(target);
+
if (sizeof(target->thread.fpu.fpr[0]) == sizeof(elf_fpreg_t))
r = gfpr_get(target, &to);
else
@@ -278,6 +280,8 @@ static int simd_get(struct task_struct *target,
{
const unsigned int wr_size = NUM_FPU_REGS * regset->size;
+ save_fpu_regs(target);
+
if (!tsk_used_math(target)) {
/* The task hasn't used FP or LSX, fill with 0xff */
copy_pad_fprs(target, regset, &to, 0);
--
2.39.3
I'm announcing the release of the 6.1.48 kernel.
All users of the 6.1 kernel series must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Documentation/admin-guide/hw-vuln/srso.rst | 4
Makefile | 2
arch/x86/include/asm/entry-common.h | 1
arch/x86/include/asm/nospec-branch.h | 28 +++---
arch/x86/kernel/cpu/amd.c | 1
arch/x86/kernel/cpu/bugs.c | 28 ++++--
arch/x86/kernel/static_call.c | 13 ++
arch/x86/kernel/traps.c | 2
arch/x86/kernel/vmlinux.lds.S | 20 ++--
arch/x86/kvm/svm/svm.c | 2
arch/x86/lib/retpoline.S | 135 ++++++++++++++++++++---------
tools/objtool/arch/x86/decode.c | 2
tools/objtool/check.c | 21 ++--
13 files changed, 178 insertions(+), 81 deletions(-)
Borislav Petkov (AMD) (4):
x86/srso: Explain the untraining sequences a bit more
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
x86/srso: Disable the mitigation on unaffected configurations
x86/srso: Correct the mitigation status when SMT is disabled
Greg Kroah-Hartman (1):
Linux 6.1.48
Peter Zijlstra (9):
x86/cpu: Fix __x86_return_thunk symbol type
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
x86/alternative: Make custom return thunk unconditional
x86/cpu: Clean up SRSO return thunk mess
x86/cpu: Rename original retbleed methods
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
x86/cpu: Cleanup the untrain mess
x86/static_call: Fix __static_call_fixup()
objtool/x86: Fixup frame-pointer vs rethunk
Petr Pavlu (1):
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
Sean Christopherson (1):
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
This is the start of the stable review cycle for the 6.1.48 release.
There are 15 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Sat, 26 Aug 2023 14:14:28 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.1.48-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.1.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.1.48-rc1
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Correct the mitigation status when SMT is disabled
Peter Zijlstra <peterz(a)infradead.org>
objtool/x86: Fixup frame-pointer vs rethunk
Petr Pavlu <petr.pavlu(a)suse.com>
x86/retpoline,kprobes: Fix position of thunk sections with CONFIG_LTO_CLANG
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Disable the mitigation on unaffected configurations
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/CPU/AMD: Fix the DIV(0) initial fix attempt
Sean Christopherson <seanjc(a)google.com>
x86/retpoline: Don't clobber RFLAGS during srso_safe_ret()
Peter Zijlstra <peterz(a)infradead.org>
x86/static_call: Fix __static_call_fixup()
Borislav Petkov (AMD) <bp(a)alien8.de>
x86/srso: Explain the untraining sequences a bit more
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Cleanup the untrain mess
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Rename srso_(.*)_alias to srso_alias_\1
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Rename original retbleed methods
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Clean up SRSO return thunk mess
Peter Zijlstra <peterz(a)infradead.org>
x86/alternative: Make custom return thunk unconditional
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Fix up srso_safe_ret() and __x86_return_thunk()
Peter Zijlstra <peterz(a)infradead.org>
x86/cpu: Fix __x86_return_thunk symbol type
-------------
Diffstat:
Documentation/admin-guide/hw-vuln/srso.rst | 4 +-
Makefile | 4 +-
arch/x86/include/asm/entry-common.h | 1 +
arch/x86/include/asm/nospec-branch.h | 28 +++---
arch/x86/kernel/cpu/amd.c | 1 +
arch/x86/kernel/cpu/bugs.c | 28 +++++-
arch/x86/kernel/static_call.c | 13 +++
arch/x86/kernel/traps.c | 2 -
arch/x86/kernel/vmlinux.lds.S | 20 ++--
arch/x86/kvm/svm/svm.c | 2 +
arch/x86/lib/retpoline.S | 141 ++++++++++++++++++++---------
tools/objtool/arch/x86/decode.c | 2 +-
tools/objtool/check.c | 21 +++--
13 files changed, 182 insertions(+), 85 deletions(-)
From: Helge Deller <deller(a)gmx.de>
Older PA-RISC machines have LEDs which show the disk- and LAN-activity.
The computation is done in software and takes quite some time, e.g. on a
J6500 this may take up to 60% time of one CPU if the machine is loaded
via network traffic.
Since most people don't care about the LEDs, start with LEDs disabled and
just show a CPU heartbeat LED. The disk and LAN LEDs can be turned on
manually via /proc/pdc/led.
Signed-off-by: Helge Deller <deller(a)gmx.de>
Cc: <stable(a)vger.kernel.org>
---
drivers/parisc/led.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/parisc/led.c b/drivers/parisc/led.c
index 8bdc5e043831..765f19608f60 100644
--- a/drivers/parisc/led.c
+++ b/drivers/parisc/led.c
@@ -56,8 +56,8 @@
static int led_type __read_mostly = -1;
static unsigned char lastleds; /* LED state from most recent update */
static unsigned int led_heartbeat __read_mostly = 1;
-static unsigned int led_diskio __read_mostly = 1;
-static unsigned int led_lanrxtx __read_mostly = 1;
+static unsigned int led_diskio __read_mostly;
+static unsigned int led_lanrxtx __read_mostly;
static char lcd_text[32] __read_mostly;
static char lcd_text_default[32] __read_mostly;
static int lcd_no_led_support __read_mostly = 0; /* KittyHawk doesn't support LED on its LCD */
@@ -589,6 +589,9 @@ int __init register_led_driver(int model, unsigned long cmd_reg, unsigned long d
return 1;
}
+ pr_info("LED: Enable disk and LAN activity LEDs "
+ "via /proc/pdc/led\n");
+
/* mark the LCD/LED driver now as initialized and
* register to the reboot notifier chain */
initialized++;
--
2.41.0
As of now, bpf counters (bperf) don't support event groups. But the
default perf stat includes topdown metrics if supported (on recent Intel
machines) which require groups. That makes perf stat exiting.
$ sudo perf stat --bpf-counter true
bpf managed perf events do not yet support groups.
Actually the test explicitly uses cycles event only, but it missed to
pass the option when it checks the availability of the command.
Fixes: 2c0cb9f56020d ("perf test: Add a shell test for 'perf stat --bpf-counters' new option")
Cc: stable(a)vger.kernel.org
Cc: Song Liu <song(a)kernel.org>
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
tools/perf/tests/shell/stat_bpf_counters.sh | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/tools/perf/tests/shell/stat_bpf_counters.sh b/tools/perf/tests/shell/stat_bpf_counters.sh
index 513cd1e58e0e..a87bb2814b4c 100755
--- a/tools/perf/tests/shell/stat_bpf_counters.sh
+++ b/tools/perf/tests/shell/stat_bpf_counters.sh
@@ -22,10 +22,10 @@ compare_number()
}
# skip if --bpf-counters is not supported
-if ! perf stat --bpf-counters true > /dev/null 2>&1; then
+if ! perf stat -e cycles --bpf-counters true > /dev/null 2>&1; then
if [ "$1" = "-v" ]; then
echo "Skipping: --bpf-counters not supported"
- perf --no-pager stat --bpf-counters true || true
+ perf --no-pager stat -e cycles --bpf-counters true || true
fi
exit 2
fi
--
2.42.0.rc1.204.g551eb34607-goog
These two are backports for 5.15.y. Conflict resolution in done in
both patches.
I have tested LTP-nfs fchown02 and chown02 on 5.15.y with below patches
applied. The tests passed.
I would like to have a review as I am not familiar with this code.
Thanks to Vegard for helping me with this.
Regards,
Harshit
Christian Brauner (2):
nfs: use vfs setgid helper
nfsd: use vfs setgid helper
fs/attr.c | 1 +
fs/internal.h | 2 --
fs/nfs/inode.c | 4 +---
fs/nfsd/vfs.c | 4 +++-
include/linux/fs.h | 2 ++
5 files changed, 7 insertions(+), 6 deletions(-)
--
2.34.1
The PERF_RECORD_ATTR is used for a pipe mode to describe an event with
attribute and IDs. The ID table comes after the attr and it calculate
size of the table using the total record size and the attr size.
n_ids = (total_record_size - end_of_the_attr_field) / sizeof(u64)
This is fine for most use cases, but sometimes it saves the pipe output
in a file and then process it later. And it becomes a problem if there
is a change in attr size between the record and report.
$ perf record -o- > perf-pipe.data # old version
$ perf report -i- < perf-pipe.data # new version
For example, if the attr size is 128 and it has 4 IDs, then it would
save them in 168 byte like below:
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
128 byte: perf event attr { .size = 128, ... },
32 byte: event IDs [] = { 1234, 1235, 1236, 1237 },
But when report later, it thinks the attr size is 136 then it only read
the last 3 entries as ID.
8 byte: perf event header { .type = PERF_RECORD_ATTR, .size = 168 },
136 byte: perf event attr { .size = 136, ... },
24 byte: event IDs [] = { 1235, 1236, 1237 }, // 1234 is missing
So it should use the recorded version of the attr. The attr has the
size field already then it should honor the size when reading data.
Fixes: 2c46dbb517a10 ("perf: Convert perf header attrs into attr events")
Cc: stable(a)vger.kernel.org
Cc: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Namhyung Kim <namhyung(a)kernel.org>
---
Keep this version before the libperf change so that it can go through
the stable versions.
tools/perf/util/header.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 52fbf526fe74..f89321cbfdee 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -4381,7 +4381,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
union perf_event *event,
struct evlist **pevlist)
{
- u32 i, ids, n_ids;
+ u32 i, n_ids;
+ u64 *ids;
struct evsel *evsel;
struct evlist *evlist = *pevlist;
@@ -4397,9 +4398,8 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
evlist__add(evlist, evsel);
- ids = event->header.size;
- ids -= (void *)&event->attr.id - (void *)event;
- n_ids = ids / sizeof(u64);
+ n_ids = event->header.size - sizeof(event->header) - event->attr.attr.size;
+ n_ids = n_ids / sizeof(u64);
/*
* We don't have the cpu and thread maps on the header, so
* for allocating the perf_sample_id table we fake 1 cpu and
@@ -4408,8 +4408,9 @@ int perf_event__process_attr(struct perf_tool *tool __maybe_unused,
if (perf_evsel__alloc_id(&evsel->core, 1, n_ids))
return -ENOMEM;
+ ids = (void *)&event->attr.attr + event->attr.attr.size;
for (i = 0; i < n_ids; i++) {
- perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, event->attr.id[i]);
+ perf_evlist__id_add(&evlist->core, &evsel->core, 0, i, ids[i]);
}
return 0;
--
2.42.0.rc1.204.g551eb34607-goog
This patch fixes an issues when concurrent fcntl() syscalls are
executing on two different gfs2 filesystems. Each gfs2 filesystem
creates an DLM lockspace, it seems that VFS only allows fcntl() syscalls
at one time on a per filesystem basis. However if there are two
filesystems and we executing fcntl() syscalls our lookup mechanism on the
global plock op list does not work anymore.
It can be reproduced with two mounted gfs2 filesystems using DLM
locking. Then call stress-ng --fcntl 32 on each mount point. The kernel
log will show several:
WARNING: CPU: 4 PID: 943 at fs/dlm/plock.c:574 dev_write+0x15c/0x590
because we have a sanity check if it's was really the meant original
plock op when dev_write() does a lookup. This patch adds just a
additional check for fsid to find the right plock op which is an
indicator that the recv_list should be on a per lockspace basis and not
globally defined. After this patch the sanity check never warned again
that the wrong plock op was being looked up.
Cc: stable(a)vger.kernel.org
Reported-by: Barry Marson <bmarson(a)redhat.com>
Fixes: 57e2c2f2d94c ("fs: dlm: fix mismatch of plock results from userspace")
Signed-off-by: Alexander Aring <aahringo(a)redhat.com>
---
fs/dlm/plock.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/fs/dlm/plock.c b/fs/dlm/plock.c
index 00e1d802a81c..e6b4c1a21446 100644
--- a/fs/dlm/plock.c
+++ b/fs/dlm/plock.c
@@ -556,7 +556,8 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
op = plock_lookup_waiter(&info);
} else {
list_for_each_entry(iter, &recv_list, list) {
- if (!iter->info.wait) {
+ if (!iter->info.wait &&
+ iter->info.fsid == info.fsid) {
op = iter;
break;
}
@@ -568,8 +569,7 @@ static ssize_t dev_write(struct file *file, const char __user *u, size_t count,
if (info.wait)
WARN_ON(op->info.optype != DLM_PLOCK_OP_LOCK);
else
- WARN_ON(op->info.fsid != info.fsid ||
- op->info.number != info.number ||
+ WARN_ON(op->info.number != info.number ||
op->info.owner != info.owner ||
op->info.optype != info.optype);
--
2.31.1
From: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
Extract the internal code inside a helper function, fix the
initialization of the parameters used in the helper function
(`hidpp->answer_available` was not reset and `*response` wasn't either),
and use a `do {...} while();` loop.
Fixes: 586e8fede795 ("HID: logitech-hidpp: Retry commands when device is busy")
Cc: stable(a)vger.kernel.org
Reviewed-by: Bastien Nocera <hadess(a)hadess.net>
Signed-off-by: Benjamin Tissoires <benjamin.tissoires(a)redhat.com>
---
as requested by https://lore.kernel.org/all/CAHk-=wiMbF38KCNhPFiargenpSBoecSXTLQACKS2UMyo_V…
This is a rewrite of that particular piece of code.
---
Changes in v2:
- added __must_hold() for KASAN
- Reworked the comment describing the functions and their return values
- Link to v1: https://lore.kernel.org/r/20230621-logitech-fixes-v1-1-32e70933c0b0@redhat.…
---
drivers/hid/hid-logitech-hidpp.c | 115 +++++++++++++++++++++++++--------------
1 file changed, 75 insertions(+), 40 deletions(-)
diff --git a/drivers/hid/hid-logitech-hidpp.c b/drivers/hid/hid-logitech-hidpp.c
index 129b01be488d..09ba2086c95c 100644
--- a/drivers/hid/hid-logitech-hidpp.c
+++ b/drivers/hid/hid-logitech-hidpp.c
@@ -275,21 +275,22 @@ static int __hidpp_send_report(struct hid_device *hdev,
}
/*
- * hidpp_send_message_sync() returns 0 in case of success, and something else
- * in case of a failure.
- * - If ' something else' is positive, that means that an error has been raised
- * by the protocol itself.
- * - If ' something else' is negative, that means that we had a classic error
- * (-ENOMEM, -EPIPE, etc...)
+ * Effectively send the message to the device, waiting for its answer.
+ *
+ * Must be called with hidpp->send_mutex locked
+ *
+ * Same return protocol than hidpp_send_message_sync():
+ * - success on 0
+ * - negative error means transport error
+ * - positive value means protocol error
*/
-static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+static int __do_hidpp_send_message_sync(struct hidpp_device *hidpp,
struct hidpp_report *message,
struct hidpp_report *response)
{
- int ret = -1;
- int max_retries = 3;
+ int ret;
- mutex_lock(&hidpp->send_mutex);
+ __must_hold(&hidpp->send_mutex);
hidpp->send_receive_buf = response;
hidpp->answer_available = false;
@@ -300,47 +301,74 @@ static int hidpp_send_message_sync(struct hidpp_device *hidpp,
*/
*response = *message;
- for (; max_retries != 0 && ret; max_retries--) {
- ret = __hidpp_send_report(hidpp->hid_dev, message);
+ ret = __hidpp_send_report(hidpp->hid_dev, message);
+ if (ret) {
+ dbg_hid("__hidpp_send_report returned err: %d\n", ret);
+ memset(response, 0, sizeof(struct hidpp_report));
+ return ret;
+ }
- if (ret) {
- dbg_hid("__hidpp_send_report returned err: %d\n", ret);
- memset(response, 0, sizeof(struct hidpp_report));
- break;
- }
+ if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
+ 5*HZ)) {
+ dbg_hid("%s:timeout waiting for response\n", __func__);
+ memset(response, 0, sizeof(struct hidpp_report));
+ return -ETIMEDOUT;
+ }
- if (!wait_event_timeout(hidpp->wait, hidpp->answer_available,
- 5*HZ)) {
- dbg_hid("%s:timeout waiting for response\n", __func__);
- memset(response, 0, sizeof(struct hidpp_report));
- ret = -ETIMEDOUT;
- break;
- }
+ if (response->report_id == REPORT_ID_HIDPP_SHORT &&
+ response->rap.sub_id == HIDPP_ERROR) {
+ ret = response->rap.params[1];
+ dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+ return ret;
+ }
- if (response->report_id == REPORT_ID_HIDPP_SHORT &&
- response->rap.sub_id == HIDPP_ERROR) {
- ret = response->rap.params[1];
- dbg_hid("%s:got hidpp error %02X\n", __func__, ret);
+ if ((response->report_id == REPORT_ID_HIDPP_LONG ||
+ response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
+ response->fap.feature_index == HIDPP20_ERROR) {
+ ret = response->fap.params[1];
+ dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
+ return ret;
+ }
+
+ return 0;
+}
+
+/*
+ * hidpp_send_message_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
+static int hidpp_send_message_sync(struct hidpp_device *hidpp,
+ struct hidpp_report *message,
+ struct hidpp_report *response)
+{
+ int ret;
+ int max_retries = 3;
+
+ mutex_lock(&hidpp->send_mutex);
+
+ do {
+ ret = __do_hidpp_send_message_sync(hidpp, message, response);
+ if (ret != HIDPP20_ERROR_BUSY)
break;
- }
- if ((response->report_id == REPORT_ID_HIDPP_LONG ||
- response->report_id == REPORT_ID_HIDPP_VERY_LONG) &&
- response->fap.feature_index == HIDPP20_ERROR) {
- ret = response->fap.params[1];
- if (ret != HIDPP20_ERROR_BUSY) {
- dbg_hid("%s:got hidpp 2.0 error %02X\n", __func__, ret);
- break;
- }
- dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
- }
- }
+ dbg_hid("%s:got busy hidpp 2.0 error %02X, retrying\n", __func__, ret);
+ } while (--max_retries);
mutex_unlock(&hidpp->send_mutex);
return ret;
}
+/*
+ * hidpp_send_fap_command_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
static int hidpp_send_fap_command_sync(struct hidpp_device *hidpp,
u8 feat_index, u8 funcindex_clientid, u8 *params, int param_count,
struct hidpp_report *response)
@@ -373,6 +401,13 @@ static int hidpp_send_fap_command_sync(struct hidpp_device *hidpp,
return ret;
}
+/*
+ * hidpp_send_rap_command_sync() returns 0 in case of success, and something else
+ * in case of a failure.
+ *
+ * See __do_hidpp_send_message_sync() for a detailed explanation of the returned
+ * value.
+ */
static int hidpp_send_rap_command_sync(struct hidpp_device *hidpp_dev,
u8 report_id, u8 sub_id, u8 reg_address, u8 *params, int param_count,
struct hidpp_report *response)
---
base-commit: 87854366176403438d01f368b09de3ec2234e0f5
change-id: 20230621-logitech-fixes-a4c0e66ea2ad
Best regards,
--
Benjamin Tissoires <bentiss(a)kernel.org>
The goal is to support a bpf_redirect() from an ethernet device (ingress)
to a ppp device (egress).
The l2 header is added automatically by the ppp driver, thus the ethernet
header should be removed.
CC: stable(a)vger.kernel.org
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Tested-by: Siwar Zitouni <siwar.zitouni(a)6wind.com>
---
v2 -> v3:
- add a comment in the code
- rework the commit log
v1 -> v2:
- I forgot the 'Tested-by' tag in the v1 :/
include/linux/if_arp.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h
index 1ed52441972f..10a1e81434cb 100644
--- a/include/linux/if_arp.h
+++ b/include/linux/if_arp.h
@@ -53,6 +53,10 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev)
case ARPHRD_NONE:
case ARPHRD_RAWIP:
case ARPHRD_PIMREG:
+ /* PPP adds its l2 header automatically in ppp_start_xmit().
+ * This makes it look like an l3 device to __bpf_redirect() and tcf_mirred_init().
+ */
+ case ARPHRD_PPP:
return false;
default:
return true;
--
2.39.2
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, all of device base address, byte access offset
and register shift is included in the platform specific
ISA access translation (in asm/mm_io.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
Reviewed-by: Geert Uytterhoeven <geert(a)linux-m68k.org>
---
Changes from v4:
Geert Uytterhoeven:
- use %px for ap->ioaddr.data_addr
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..616064b02de6 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %px",
+ base, ctl_base, ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
Gadget ACM while unloading module try to dequeue not queued usb
request which causes the kernel to crash.
Patch adds extra condition to check whether usb request is processed
by CDNSP driver.
cc: <stable(a)vger.kernel.org>
Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD Driver")
Signed-off-by: Pawel Laszczak <pawell(a)cadence.com>
---
drivers/usb/cdns3/cdnsp-gadget.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index fff9ec9c391f..3a30c2af0c00 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -1125,6 +1125,9 @@ static int cdnsp_gadget_ep_dequeue(struct usb_ep *ep,
unsigned long flags;
int ret;
+ if (request->status != -EINPROGRESS)
+ return 0;
+
if (!pep->endpoint.desc) {
dev_err(pdev->dev,
"%s: can't dequeue to disabled endpoint\n",
--
2.37.2
With commit 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver
with pata_falcon and falconide"), the Q40 IDE driver was
replaced by pata_falcon.c.
Both IO and memory resources were defined for the Q40 IDE
platform device, but definition of the IDE register addresses
was modeled after the Falcon case, both in use of the memory
resources and in including register shift and byte vs. word
offset in the address.
This was correct for the Falcon case, which does not apply
any address translation to the register addresses. In the
Q40 case, all of device base address, byte access offset
and register shift is included in the platform specific
ISA access translation (in asm/mm_io.h).
As a consequence, such address translation gets applied
twice, and register addresses are mangled.
Use the device base address from the platform IO resource
for Q40 (the IO address translation will then add the correct
ISA window base address and byte access offset), with register
shift 1. Use MMIO base address and register shift 2 as before
for Falcon.
Encode PIO_OFFSET into IO port addresses for all registers
for Q40 except the data transfer register. Encode the MMIO
offset there (pata_falcon_data_xfer() directly uses raw IO
with no address translation).
Reported-by: William R Sowerbutts <will(a)sowerbutts.com>
Closes: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Link: https://lore.kernel.org/r/CAMuHMdUU62jjunJh9cqSqHT87B0H0A4udOOPs=WN7WZKpcag…
Fixes: 44b1fbc0f5f3 ("m68k/q40: Replace q40ide driver with pata_falcon and falconide")
Cc: stable(a)vger.kernel.org
Cc: Finn Thain <fthain(a)linux-m68k.org>
Cc: Geert Uytterhoeven <geert(a)linux-m68k.org>
Tested-by: William R Sowerbutts <will(a)sowerbutts.com>
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Reviewed-by: Sergey Shtylyov <s.shtylyov(a)omp.ru>
---
Changes from v3:
Sergey Shtylyov:
- change use of reg_scale to reg_shift
Geert Uytterhoeven:
- factor out ata_port_desc() from platform specific code
Changes from v2:
Finn Thain:
- add back stable Cc:
Changes from v1:
Damien Le Moal:
- change patch title
- drop stable backport tag
Changes from RFC v3:
- split off byte swap option into separate patch
Geert Uytterhoeven:
- review comments
Changes from RFC v2:
- add driver parameter 'data_swap' as bit mask for drives to swap
Changes from RFC v1:
Finn Thain:
- take care to supply IO address suitable for ioread8/iowrite8
- use MMIO address for data transfer
---
drivers/ata/pata_falcon.c | 50 +++++++++++++++++++++++----------------
1 file changed, 29 insertions(+), 21 deletions(-)
diff --git a/drivers/ata/pata_falcon.c b/drivers/ata/pata_falcon.c
index 996516e64f13..3841ea200bcb 100644
--- a/drivers/ata/pata_falcon.c
+++ b/drivers/ata/pata_falcon.c
@@ -123,8 +123,8 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
struct resource *base_res, *ctl_res, *irq_res;
struct ata_host *host;
struct ata_port *ap;
- void __iomem *base;
- int irq = 0;
+ void __iomem *base, *ctl_base;
+ int irq = 0, io_offset = 1, reg_shift = 2; /* Falcon defaults */
dev_info(&pdev->dev, "Atari Falcon and Q40/Q60 PATA controller\n");
@@ -165,26 +165,34 @@ static int __init pata_falcon_init_one(struct platform_device *pdev)
ap->pio_mask = ATA_PIO4;
ap->flags |= ATA_FLAG_SLAVE_POSS | ATA_FLAG_NO_IORDY;
- base = (void __iomem *)base_mem_res->start;
/* N.B. this assumes data_addr will be used for word-sized I/O only */
- ap->ioaddr.data_addr = base + 0 + 0 * 4;
- ap->ioaddr.error_addr = base + 1 + 1 * 4;
- ap->ioaddr.feature_addr = base + 1 + 1 * 4;
- ap->ioaddr.nsect_addr = base + 1 + 2 * 4;
- ap->ioaddr.lbal_addr = base + 1 + 3 * 4;
- ap->ioaddr.lbam_addr = base + 1 + 4 * 4;
- ap->ioaddr.lbah_addr = base + 1 + 5 * 4;
- ap->ioaddr.device_addr = base + 1 + 6 * 4;
- ap->ioaddr.status_addr = base + 1 + 7 * 4;
- ap->ioaddr.command_addr = base + 1 + 7 * 4;
-
- base = (void __iomem *)ctl_mem_res->start;
- ap->ioaddr.altstatus_addr = base + 1;
- ap->ioaddr.ctl_addr = base + 1;
-
- ata_port_desc(ap, "cmd 0x%lx ctl 0x%lx",
- (unsigned long)base_mem_res->start,
- (unsigned long)ctl_mem_res->start);
+ ap->ioaddr.data_addr = (void __iomem *)base_mem_res->start;
+
+ if (base_res) { /* only Q40 has IO resources */
+ io_offset = 0x10000;
+ reg_shift = 0;
+ base = (void __iomem *)base_res->start;
+ ctl_base = (void __iomem *)ctl_res->start;
+ } else {
+ base = (void __iomem *)base_mem_res->start;
+ ctl_base = (void __iomem *)ctl_mem_res->start;
+ }
+
+ ap->ioaddr.error_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.feature_addr = base + io_offset + (1 << reg_shift);
+ ap->ioaddr.nsect_addr = base + io_offset + (2 << reg_shift);
+ ap->ioaddr.lbal_addr = base + io_offset + (3 << reg_shift);
+ ap->ioaddr.lbam_addr = base + io_offset + (4 << reg_shift);
+ ap->ioaddr.lbah_addr = base + io_offset + (5 << reg_shift);
+ ap->ioaddr.device_addr = base + io_offset + (6 << reg_shift);
+ ap->ioaddr.status_addr = base + io_offset + (7 << reg_shift);
+ ap->ioaddr.command_addr = base + io_offset + (7 << reg_shift);
+
+ ap->ioaddr.altstatus_addr = ctl_base + io_offset;
+ ap->ioaddr.ctl_addr = ctl_base + io_offset;
+
+ ata_port_desc(ap, "cmd %px ctl %px data %pa",
+ base, ctl_base, &ap->ioaddr.data_addr);
irq_res = platform_get_resource(pdev, IORESOURCE_IRQ, 0);
if (irq_res && irq_res->start > 0) {
--
2.17.1
The quilt patch titled
Subject: shmem: fix smaps BUG sleeping while atomic
has been removed from the -mm tree. Its filename was
shmem-fix-smaps-bug-sleeping-while-atomic.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: shmem: fix smaps BUG sleeping while atomic
Date: Tue, 22 Aug 2023 22:14:47 -0700 (PDT)
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c~shmem-fix-smaps-bug-sleeping-while-atomic
+++ a/mm/shmem.c
@@ -806,14 +806,16 @@ unsigned long shmem_partial_swap_usage(s
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
unsigned long swapped = 0;
+ unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, end - 1) {
+ xas_for_each(&xas, page, max) {
if (xas_retry(&xas, page))
continue;
if (xa_is_value(page))
swapped++;
-
+ if (xas.xa_index == max)
+ break;
if (need_resched()) {
xas_pause(&xas);
cond_resched_rcu();
_
Patches currently in -mm which might be from hughd(a)google.com are
mm-khugepaged-fix-collapse_pte_mapped_thp-versus-uffd.patch
The quilt patch titled
Subject: maple_tree: disable mas_wr_append() when other readers are possible
has been removed from the -mm tree. Its filename was
maple_tree-disable-mas_wr_append-when-other-readers-are-possible.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: "Liam R. Howlett" <Liam.Howlett(a)oracle.com>
Subject: maple_tree: disable mas_wr_append() when other readers are possible
Date: Fri, 18 Aug 2023 20:43:55 -0400
The current implementation of append may cause duplicate data and/or
incorrect ranges to be returned to a reader during an update. Although
this has not been reported or seen, disable the append write operation
while the tree is in rcu mode out of an abundance of caution.
During the analysis of the mas_next_slot() the following was
artificially created by separating the writer and reader code:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last slot write is to start of slot
store current contents in slot
overwrite old end pivot
mas_next_slot():
read end metadata
read old end pivot
return with incorrect range
store new value
Alternatively:
Writer: reader:
mas_wr_append
set end pivot
updates end metata
Detects write to last slot
last lost write to end of slot
store value
mas_next_slot():
read end metadata
read old end pivot
read new end pivot
return with incorrect range
set old end pivot
There may be other accesses that are not safe since we are now updating
both metadata and pointers, so disabling append if there could be rcu
readers is the safest action.
Link: https://lkml.kernel.org/r/20230819004356.1454718-2-Liam.Howlett@oracle.com
Fixes: 54a611b60590 ("Maple Tree: add new data structure")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/maple_tree.c | 7 +++++++
1 file changed, 7 insertions(+)
--- a/lib/maple_tree.c~maple_tree-disable-mas_wr_append-when-other-readers-are-possible
+++ a/lib/maple_tree.c
@@ -4265,6 +4265,10 @@ static inline unsigned char mas_wr_new_e
* mas_wr_append: Attempt to append
* @wr_mas: the maple write state
*
+ * This is currently unsafe in rcu mode since the end of the node may be cached
+ * by readers while the node contents may be updated which could result in
+ * inaccurate information.
+ *
* Return: True if appended, false otherwise
*/
static inline bool mas_wr_append(struct ma_wr_state *wr_mas)
@@ -4274,6 +4278,9 @@ static inline bool mas_wr_append(struct
struct ma_state *mas = wr_mas->mas;
unsigned char node_pivots = mt_pivots[wr_mas->type];
+ if (mt_in_rcu(mas->tree))
+ return false;
+
if (mas->offset != wr_mas->node_end)
return false;
_
Patches currently in -mm which might be from Liam.Howlett(a)oracle.com are
maple_tree-clean-up-mas_wr_append.patch
The quilt patch titled
Subject: madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
has been removed from the -mm tree. Its filename was
madvise-madvise_free_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yin Fengwei <fengwei.yin(a)intel.com>
Subject: madvise:madvise_free_pte_range(): don't use mapcount() against large folio for sharing check
Date: Tue, 8 Aug 2023 10:09:17 +0800
Commit 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a
folio") replaced the page_mapcount() with folio_mapcount() to check
whether the folio is shared by other mapping.
It's not correct for large folios. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-4-fengwei.yin@intel.com
Fixes: 98b211d6415f ("madvise: convert madvise_free_pte_range() to use a folio")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/mm/madvise.c~madvise-madvise_free_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check
+++ a/mm/madvise.c
@@ -680,7 +680,7 @@ static int madvise_free_pte_range(pmd_t
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (!folio_trylock(folio))
break;
_
Patches currently in -mm which might be from fengwei.yin(a)intel.com are
filemap-add-filemap_map_folio_range.patch
rmap-add-folio_add_file_rmap_range.patch
mm-convert-do_set_pte-to-set_pte_range.patch
filemap-batch-pte-mappings.patch
The quilt patch titled
Subject: madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
has been removed from the -mm tree. Its filename was
madvise-madvise_cold_or_pageout_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check.patch
This patch was dropped because it was merged into the mm-hotfixes-stable branch
of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
------------------------------------------------------
From: Yin Fengwei <fengwei.yin(a)intel.com>
Subject: madvise:madvise_cold_or_pageout_pte_range(): don't use mapcount() against large folio for sharing check
Date: Tue, 8 Aug 2023 10:09:15 +0800
Patch series "don't use mapcount() to check large folio sharing", v2.
In madvise_cold_or_pageout_pte_range() and madvise_free_pte_range(),
folio_mapcount() is used to check whether the folio is shared. But it's
not correct as folio_mapcount() returns total mapcount of large folio.
Use folio_estimated_sharers() here as the estimated number is enough.
This patchset will fix the cases:
User space application call madvise() with MADV_FREE, MADV_COLD and
MADV_PAGEOUT for specific address range. There are THP mapped to the
range. Without the patchset, the THP is skipped. With the patch, the
THP will be split and handled accordingly.
David reported the cow self test skip some cases because of MADV_PAGEOUT
skip THP:
https://lore.kernel.org/linux-mm/9e92e42d-488f-47db-ac9d-75b24cd0d037@intel…
and I confirmed this patchset make it work again.
This patch (of 3):
Commit 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range()
to use folios") replaced the page_mapcount() with folio_mapcount() to
check whether the folio is shared by other mapping.
It's not correct for large folio. folio_mapcount() returns the total
mapcount of large folio which is not suitable to detect whether the folio
is shared.
Use folio_estimated_sharers() which returns a estimated number of shares.
That means it's not 100% correct. It should be OK for madvise case here.
User-visible effects is that the THP is skipped when user call madvise.
But the correct behavior is THP should be split and processed then.
NOTE: this change is a temporary fix to reduce the user-visible effects
before the long term fix from David is ready.
Link: https://lkml.kernel.org/r/20230808020917.2230692-1-fengwei.yin@intel.com
Link: https://lkml.kernel.org/r/20230808020917.2230692-2-fengwei.yin@intel.com
Fixes: 07e8c82b5eff ("madvise: convert madvise_cold_or_pageout_pte_range() to use folios")
Signed-off-by: Yin Fengwei <fengwei.yin(a)intel.com>
Reviewed-by: Yu Zhao <yuzhao(a)google.com>
Reviewed-by: Ryan Roberts <ryan.roberts(a)arm.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Minchan Kim <minchan(a)kernel.org>
Cc: Vishal Moola (Oracle) <vishal.moola(a)gmail.com>
Cc: Yang Shi <shy828301(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/madvise.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/madvise.c~madvise-madvise_cold_or_pageout_pte_range-dont-use-mapcount-against-large-folio-for-sharing-check
+++ a/mm/madvise.c
@@ -384,7 +384,7 @@ static int madvise_cold_or_pageout_pte_r
folio = pfn_folio(pmd_pfn(orig_pmd));
/* Do not interfere with other mappings of this folio */
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
goto huge_unlock;
if (pageout_anon_only_filter && !folio_test_anon(folio))
@@ -458,7 +458,7 @@ regular_folio:
if (folio_test_large(folio)) {
int err;
- if (folio_mapcount(folio) != 1)
+ if (folio_estimated_sharers(folio) != 1)
break;
if (pageout_anon_only_filter && !folio_test_anon(folio))
break;
_
Patches currently in -mm which might be from fengwei.yin(a)intel.com are
filemap-add-filemap_map_folio_range.patch
rmap-add-folio_add_file_rmap_range.patch
mm-convert-do_set_pte-to-set_pte_range.patch
filemap-batch-pte-mappings.patch
When building the kernel with binutils 2.37 and GCC-11.1.0/GCC-11.2.0,
the following error occurs:
Assembler messages:
Error: cannot find default versions of the ISA extension `zicsr'
Error: cannot find default versions of the ISA extension `zifencei'
The above error originated from this commit of binutils[0], which has been
resolved and backported by GCC-12.1.0[1] and GCC-11.3.0[2].
So fix this by change the GCC version in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC to GCC-11.3.0.
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1… [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=ca2bbb88f999f4d3cc40e89bc1aba… [1]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e… [2]
Fixes: ca09f772ccca ("riscv: Handle zicsr/zifencei issue between gcc and binutils")
Reported-by: Conor Dooley <conor.dooley(a)microchip.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng(a)iscas.ac.cn>
---
Then below are my test results after this fix:
gcc binutils
10.5.0 2.35 ok
10.5.0 2.36 ok
10.5.0 2.37 ok
10.5.0 2.38 ok
11.1.0 2.35 ok
11.1.0 2.36 ok
11.1.0 2.37 ok
11.1.0 2.38 ok
11.2.0 2.35 ok
11.2.0 2.36 ok
11.2.0 2.37 ok
11.2.0 2.38 ok
11.3.0 2.35 ok
11.3.0 2.36 ok
11.3.0 2.37 ok
11.3.0 2.38 ok
11.4.0 2.35 ok
11.4.0 2.36 ok
11.4.0 2.37 ok
11.4.0 2.38 ok
12.1.0 2.35 ok
12.1.0 2.36 ok
12.1.0 2.37 ok
12.1.0 2.38 ok
12.2.0 2.35 ok
12.2.0 2.36 ok
12.2.0 2.37 ok
12.2.0 2.38 ok
arch/riscv/Kconfig | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 10e7a7ad175a..bea7b73e895d 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -580,15 +580,15 @@ config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
and Zifencei are supported in binutils from version 2.36 onwards.
To make life easier, and avoid forcing toolchains that default to a
newer ISA spec to version 2.2, relax the check to binutils >= 2.36.
- For clang < 17 or GCC < 11.1.0, for which this is not possible, this is
- dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
+ For clang < 17 or GCC < 11.3.0, for which this is not possible or need
+ special treatment, this is dealt with in TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
def_bool y
depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
# https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694…
- # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7…
- depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d29f5d6ab513c52fd872f532c492e…
+ depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110300)
help
Certain versions of clang and GCC do not support zicsr and zifencei via
-march. This option causes an older ISA spec compatible with these older
--
2.34.1
On Thu, 24 Aug 2023 11:05:13 PDT (-0700), Conor Dooley wrote:
> On Fri, Aug 25, 2023 at 01:46:59AM +0800, Mingzheng Xing wrote:
>
>> Just a question, I see that the previous patch has been merged
>> into 6.5-rc7, and now a new fix patch should be sent out based
>> on that, right?
>
> yes
Ideally ASAP, it's very late in the cycle. I have something for
tomorrow morning, but this will need time to test...
The patch titled
Subject: memcontrol: ensure memcg acquired by id is properly set up
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Johannes Weiner <hannes(a)cmpxchg.org>
Subject: memcontrol: ensure memcg acquired by id is properly set up
Date: Wed, 23 Aug 2023 15:54:30 -0700
In the eviction recency check, we attempt to retrieve the memcg to which
the folio belonged when it was evicted, by the memcg id stored in the
shadow entry. However, there is a chance that the retrieved memcg is not
the original memcg that has been killed, but a new one which happens to
have the same id.
This is a somewhat unfortunate, but acceptable and rare inaccuracy in the
heuristics. However, if we retrieve this new memcg between its allocation
and when it is properly attached to the memcg hierarchy, we could run into
the following NULL pointer exception during the memcg hierarchy traversal
done in mem_cgroup_get_nr_swap_pages():
[ 155757.793456] BUG: kernel NULL pointer dereference, address: 00000000000000c0
[ 155757.807568] #PF: supervisor read access in kernel mode
[ 155757.818024] #PF: error_code(0x0000) - not-present page
[ 155757.828482] PGD 401f77067 P4D 401f77067 PUD 401f76067 PMD 0
[ 155757.839985] Oops: 0000 [#1] SMP
[ 155757.887870] RIP: 0010:mem_cgroup_get_nr_swap_pages+0x3d/0xb0
[ 155757.899377] Code: 29 19 4a 02 48 39 f9 74 63 48 8b 97 c0 00 00 00 48 8b b7 58 02 00 00 48 2b b7 c0 01 00 00 48 39 f0 48 0f 4d c6 48 39 d1 74 42 <48> 8b b2 c0 00 00 00 48 8b ba 58 02 00 00 48 2b ba c0 01 00 00 48
[ 155757.937125] RSP: 0018:ffffc9002ecdfbc8 EFLAGS: 00010286
[ 155757.947755] RAX: 00000000003a3b1c RBX: 000007ffffffffff RCX: ffff888280183000
[ 155757.962202] RDX: 0000000000000000 RSI: 0007ffffffffffff RDI: ffff888bbc2d1000
[ 155757.976648] RBP: 0000000000000001 R08: 000000000000000b R09: ffff888ad9cedba0
[ 155757.991094] R10: ffffea0039c07900 R11: 0000000000000010 R12: ffff888b23a7b000
[ 155758.005540] R13: 0000000000000000 R14: ffff888bbc2d1000 R15: 000007ffffc71354
[ 155758.019991] FS: 00007f6234c68640(0000) GS:ffff88903f9c0000(0000) knlGS:0000000000000000
[ 155758.036356] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 155758.048023] CR2: 00000000000000c0 CR3: 0000000a83eb8004 CR4: 00000000007706e0
[ 155758.062473] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 155758.076924] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 155758.091376] PKRU: 55555554
[ 155758.096957] Call Trace:
[ 155758.102016] <TASK>
[ 155758.106502] ? __die+0x78/0xc0
[ 155758.112793] ? page_fault_oops+0x286/0x380
[ 155758.121175] ? exc_page_fault+0x5d/0x110
[ 155758.129209] ? asm_exc_page_fault+0x22/0x30
[ 155758.137763] ? mem_cgroup_get_nr_swap_pages+0x3d/0xb0
[ 155758.148060] workingset_test_recent+0xda/0x1b0
[ 155758.157133] workingset_refault+0xca/0x1e0
[ 155758.165508] filemap_add_folio+0x4d/0x70
[ 155758.173538] page_cache_ra_unbounded+0xed/0x190
[ 155758.182919] page_cache_sync_ra+0xd6/0x1e0
[ 155758.191738] filemap_read+0x68d/0xdf0
[ 155758.199495] ? mlx5e_napi_poll+0x123/0x940
[ 155758.207981] ? __napi_schedule+0x55/0x90
[ 155758.216095] __x64_sys_pread64+0x1d6/0x2c0
[ 155758.224601] do_syscall_64+0x3d/0x80
[ 155758.232058] entry_SYSCALL_64_after_hwframe+0x46/0xb0
[ 155758.242473] RIP: 0033:0x7f62c29153b5
[ 155758.249938] Code: e8 48 89 75 f0 89 7d f8 48 89 4d e0 e8 b4 e6 f7 ff 41 89 c0 4c 8b 55 e0 48 8b 55 e8 48 8b 75 f0 8b 7d f8 b8 11 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 33 44 89 c7 48 89 45 f8 e8 e7 e6 f7 ff 48 8b
[ 155758.288005] RSP: 002b:00007f6234c5ffd0 EFLAGS: 00000293 ORIG_RAX: 0000000000000011
[ 155758.303474] RAX: ffffffffffffffda RBX: 00007f628c4e70c0 RCX: 00007f62c29153b5
[ 155758.318075] RDX: 000000000003c041 RSI: 00007f61d2986000 RDI: 0000000000000076
[ 155758.332678] RBP: 00007f6234c5fff0 R08: 0000000000000000 R09: 0000000064d5230c
[ 155758.347452] R10: 000000000027d450 R11: 0000000000000293 R12: 000000000003c041
[ 155758.362044] R13: 00007f61d2986000 R14: 00007f629e11b060 R15: 000000000027d450
[ 155758.376661] </TASK>
This patch fixes the issue by moving the memcg's id publication from the
alloc stage to online stage, ensuring that any memcg acquired via id must
be connected to the memcg tree.
Link: https://lkml.kernel.org/r/20230823225430.166925-1-nphamcs@gmail.com
Fixes: f78dfc7b77d5 ("workingset: fix confusion around eviction vs refault container")
Signed-off-by: Johannes Weiner <hannes(a)cmpxchg.org>
Co-developed-by: Nhat Pham <nphamcs(a)gmail.com>
Signed-off-by: Nhat Pham <nphamcs(a)gmail.com>
Cc: Yosry Ahmed <yosryahmed(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memcontrol.c | 22 +++++++++++++++++-----
1 file changed, 17 insertions(+), 5 deletions(-)
--- a/mm/memcontrol.c~memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up
+++ a/mm/memcontrol.c
@@ -5339,7 +5339,6 @@ static struct mem_cgroup *mem_cgroup_all
INIT_LIST_HEAD(&memcg->deferred_split_queue.split_queue);
memcg->deferred_split_queue.split_queue_len = 0;
#endif
- idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
lru_gen_init_memcg(memcg);
return memcg;
fail:
@@ -5411,14 +5410,27 @@ static int mem_cgroup_css_online(struct
if (alloc_shrinker_info(memcg))
goto offline_kmem;
- /* Online state pins memcg ID, memcg ID pins CSS */
- refcount_set(&memcg->id.ref, 1);
- css_get(css);
-
if (unlikely(mem_cgroup_is_root(memcg)))
queue_delayed_work(system_unbound_wq, &stats_flush_dwork,
FLUSH_TIME);
lru_gen_online_memcg(memcg);
+
+ /* Online state pins memcg ID, memcg ID pins CSS */
+ refcount_set(&memcg->id.ref, 1);
+ css_get(css);
+
+ /*
+ * Ensure mem_cgroup_from_id() works once we're fully online.
+ *
+ * We could do this earlier and require callers to filter with
+ * css_tryget_online(). But right now there are no users that
+ * need earlier access, and the workingset code relies on the
+ * cgroup tree linkage (mem_cgroup_get_nr_swap_pages()). So
+ * publish it here at the end of onlining. This matches the
+ * regular ID destruction during offlining.
+ */
+ idr_replace(&mem_cgroup_idr, memcg, memcg->id.id);
+
return 0;
offline_kmem:
memcg_offline_kmem(memcg);
_
Patches currently in -mm which might be from hannes(a)cmpxchg.org are
memcontrol-ensure-memcg-acquired-by-id-is-properly-set-up.patch
mm-page_alloc-remove-stale-cma-guard-code.patch
On 8/25/23 01:33, Conor Dooley wrote:
> On Fri, Aug 25, 2023 at 01:30:36AM +0800, Mingzheng Xing wrote:
>> On 8/24/23 19:32, Mingzheng Xing wrote:
>>> On 8/23/23 21:31, Conor Dooley wrote:
>>>> On Wed, Aug 23, 2023 at 12:51:13PM +0100, Conor Dooley wrote:
>>>>> On Thu, Aug 17, 2023 at 03:20:24PM +0000,
>>>>> patchwork-bot+linux-riscv(a)kernel.org wrote:
>>>>>> Hello:
>>>>>>
>>>>>> This patch was applied to riscv/linux.git (fixes)
>>>>>> by Palmer Dabbelt <palmer(a)rivosinc.com>:
>>>>>>
>>>>>> On Thu, 10 Aug 2023 00:56:48 +0800 you wrote:
>>>>>>> Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default
>>>>>>> ISA spec to the newer
>>>>>>> 20191213 version which moves some instructions from the
>>>>>>> I extension to the
>>>>>>> Zicsr and Zifencei extensions. So if one of the binutils
>>>>>>> and GCC exceeds
>>>>>>> that version, we should explicitly specifying Zicsr and
>>>>>>> Zifencei via -march
>>>>>>> to cope with the new changes. but this only occurs when
>>>>>>> binutils >= 2.36
>>>>>>> and GCC >= 11.1.0. It's a different story when binutils < 2.36.
>>>>>>>
>>>>>>> [...]
>>>>>> Here is the summary with links:
>>>>>> - [v5] riscv: Handle zicsr/zifencei issue between gcc and binutils
>>>>>> https://git.kernel.org/riscv/c/ca09f772ccca
>>>>> *sigh* so this breaks the build for gcc-11 & binutils 2.37 w/
>>>>> Assembler messages:
>>>>> Error: cannot find default versions of the ISA extension `zicsr'
>>>>> Error: cannot find default versions of the ISA extension `zifencei'
>>>>>
>>>>> I'll have a poke later.
>>>> So uh, are we sure that this should not be:
>>>> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) ||
>>>> (CC_IS_GCC && GCC_VERSION < 110100)
>>>> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) ||
>>>> (CC_IS_GCC && GCC_VERSION <= 110100)
>>>>
>>>> My gcc-11.1 + binutils 2.37 toolchain built from riscv-gnu-toolchain
>>>> doesn't have the default versions & the above diff fixes the build.
>>> I reproduced the error, the combination of gcc-11.1 and
>>> binutils 2.37 does cause errors. What a surprise, since binutils
>>> 2.36 and 2.38 are fine.
>>>
>>> I used git bisect to locate this commit[1] for binutils.
>>> I'll test this diff in more detail later. Thanks!
>>>
>>> [1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1…
>>>
>> Hi, Conor.
>> The above error does originate from link[1] mentioned above, which was
>> resolved in gcc-12.1.0[2], and gcc-11.3.0 made the backport[3].
>> So gcc-11.2.0 combined with binutils 2.37 produces the same error.
>> I think we should do the following diff to fix it:
>> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC &&
>> GCC_VERSION < 110100)
>> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC &&
>> GCC_VERSION < 110300)
> That sounds good to me. Can you make that a real patch please?
Okey.
Just a question, I see that the previous patch has been merged
into 6.5-rc7, and now a new fix patch should be sent out based
on that, right?
Best Regards,
Mingzheng.
>
> Thanks for working on it :)
On 8/23/23 21:31, Conor Dooley wrote:
> On Wed, Aug 23, 2023 at 12:51:13PM +0100, Conor Dooley wrote:
>> On Thu, Aug 17, 2023 at 03:20:24PM +0000, patchwork-bot+linux-riscv(a)kernel.org wrote:
>>> Hello:
>>>
>>> This patch was applied to riscv/linux.git (fixes)
>>> by Palmer Dabbelt <palmer(a)rivosinc.com>:
>>>
>>> On Thu, 10 Aug 2023 00:56:48 +0800 you wrote:
>>>> Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer
>>>> 20191213 version which moves some instructions from the I extension to the
>>>> Zicsr and Zifencei extensions. So if one of the binutils and GCC exceeds
>>>> that version, we should explicitly specifying Zicsr and Zifencei via -march
>>>> to cope with the new changes. but this only occurs when binutils >= 2.36
>>>> and GCC >= 11.1.0. It's a different story when binutils < 2.36.
>>>>
>>>> [...]
>>> Here is the summary with links:
>>> - [v5] riscv: Handle zicsr/zifencei issue between gcc and binutils
>>> https://git.kernel.org/riscv/c/ca09f772ccca
>> *sigh* so this breaks the build for gcc-11 & binutils 2.37 w/
>> Assembler messages:
>> Error: cannot find default versions of the ISA extension `zicsr'
>> Error: cannot find default versions of the ISA extension `zifencei'
>>
>> I'll have a poke later.
> So uh, are we sure that this should not be:
> - depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
> + depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION <= 110100)
>
> My gcc-11.1 + binutils 2.37 toolchain built from riscv-gnu-toolchain
> doesn't have the default versions & the above diff fixes the build.
I reproduced the error, the combination of gcc-11.1 and
binutils 2.37 does cause errors. What a surprise, since binutils
2.36 and 2.38 are fine.
I used git bisect to locate this commit[1] for binutils.
I'll test this diff in more detail later. Thanks!
[1]
https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=f0bae2552db1dd4f1…
Best Regards,
Mingzheng.
>
> Thanks,
> Conor.
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 1f69383b203e28cf8a4ca9570e572da1699f76cd
Gitweb: https://git.kernel.org/tip/1f69383b203e28cf8a4ca9570e572da1699f76cd
Author: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
AuthorDate: Fri, 18 Aug 2023 10:03:05 -07:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 24 Aug 2023 11:01:45 +02:00
x86/fpu: Invalidate FPU state correctly on exec()
The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:
1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.
This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.
Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:
1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0
2. The task exec()’s
3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out
4. Task is migrated to CPU1
5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task
6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.
7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.
So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().
Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.
Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang(a)intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Tested-by: Lijun Pan <lijun.pan(a)intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta(a)intel.com>
Acked-by: Lijun Pan <lijun.pan(a)intel.com>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com
---
arch/x86/kernel/fpu/context.h | 3 +--
arch/x86/kernel/fpu/core.c | 2 +-
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/fpu/context.h b/arch/x86/kernel/fpu/context.h
index af5cbdd..f6d856b 100644
--- a/arch/x86/kernel/fpu/context.h
+++ b/arch/x86/kernel/fpu/context.h
@@ -19,8 +19,7 @@
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
- * Either one of these invalidation functions is enough. Invalidate
- * a resource you control: CPU if using the CPU for something else
+ * Invalidate a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/
diff --git a/arch/x86/kernel/fpu/core.c b/arch/x86/kernel/fpu/core.c
index 1015af1..98e507c 100644
--- a/arch/x86/kernel/fpu/core.c
+++ b/arch/x86/kernel/fpu/core.c
@@ -679,7 +679,7 @@ static void fpu_reset_fpregs(void)
struct fpu *fpu = ¤t->thread.fpu;
fpregs_lock();
- fpu__drop(fpu);
+ __fpu_invalidate_fpregs_state(fpu);
/*
* This does not change the actual hardware registers. It just
* resets the memory image and sets TIF_NEED_FPU_LOAD so a
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that it
won't be built to cause below compiling error if PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index f5f422f9b8507..b6221b4432fd3 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -211,6 +211,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -280,6 +281,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
Hi there,
Reach your target audience and maximize your sales/revenue with our clean and reliable contact database. We have compiled a list of 45158 contacts who are attending Hy Fcell International Expo And Conference 2023.
If interested, I can share the pricing and guarantees.
Looking for your response.
Kind Regards,
Sarah Perez - B2B Marketing & Tradeshow Specialist
If you do not wish to receive these messages, reply back with remove and we will make sure you don't receive any more emails from our end.
iommu_device_register always returns 0 in 4.11-5.12, so
we need to remove redundant comparison with 0
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b16c0170b53c ("iommu/mediatek: Make use of iommu_device_register interface")
Signed-off-by: Alexandra Diupina <adiupina(a)astralinux.ru>
---
drivers/iommu/mtk_iommu.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c
index 051815c9d2bb..208c47218b75 100644
--- a/drivers/iommu/mtk_iommu.c
+++ b/drivers/iommu/mtk_iommu.c
@@ -748,9 +748,7 @@ static int mtk_iommu_probe(struct platform_device *pdev)
iommu_device_set_ops(&data->iommu, &mtk_iommu_ops);
iommu_device_set_fwnode(&data->iommu, &pdev->dev.of_node->fwnode);
- ret = iommu_device_register(&data->iommu);
- if (ret)
- return ret;
+ iommu_device_register(&data->iommu);
spin_lock_init(&data->tlb_lock);
list_add_tail(&data->list, &m4ulist);
--
2.30.2
The following commit has been merged into the x86/urgent branch of tip:
Commit-ID: 2c66ca3949dc701da7f4c9407f2140ae425683a5
Gitweb: https://git.kernel.org/tip/2c66ca3949dc701da7f4c9407f2140ae425683a5
Author: Feng Tang <feng.tang(a)intel.com>
AuthorDate: Wed, 23 Aug 2023 14:57:47 +08:00
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Thu, 24 Aug 2023 11:01:45 +02:00
x86/fpu: Set X86_FEATURE_OSXSAVE feature after enabling OSXSAVE in CR4
0-Day found a 34.6% regression in stress-ng's 'af-alg' test case, and
bisected it to commit b81fac906a8f ("x86/fpu: Move FPU initialization into
arch_cpu_finalize_init()"), which optimizes the FPU init order, and moves
the CR4_OSXSAVE enabling into a later place:
arch_cpu_finalize_init
identify_boot_cpu
identify_cpu
generic_identify
get_cpu_cap --> setup cpu capability
...
fpu__init_cpu
fpu__init_cpu_xstate
cr4_set_bits(X86_CR4_OSXSAVE);
As the FPU is not yet initialized the CPU capability setup fails to set
X86_FEATURE_OSXSAVE. Many security module like 'camellia_aesni_avx_x86_64'
depend on this feature and therefore fail to load, causing the regression.
Cure this by setting X86_FEATURE_OSXSAVE feature right after OSXSAVE
enabling.
[ tglx: Moved it into the actual BSP FPU initialization code and added a comment ]
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Signed-off-by: Feng Tang <feng.tang(a)intel.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/lkml/202307192135.203ac24e-oliver.sang@intel.com
Link: https://lore.kernel.org/lkml/20230823065747.92257-1-feng.tang@intel.com
---
arch/x86/kernel/fpu/xstate.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/arch/x86/kernel/fpu/xstate.c b/arch/x86/kernel/fpu/xstate.c
index 0bab497..1afbc48 100644
--- a/arch/x86/kernel/fpu/xstate.c
+++ b/arch/x86/kernel/fpu/xstate.c
@@ -882,6 +882,13 @@ void __init fpu__init_system_xstate(unsigned int legacy_size)
goto out_disable;
}
+ /*
+ * CPU capabilities initialization runs before FPU init. So
+ * X86_FEATURE_OSXSAVE is not set. Now that XSAVE is completely
+ * functional, set the feature bit so depending code works.
+ */
+ setup_force_cpu_cap(X86_FEATURE_OSXSAVE);
+
print_xstate_offset_size();
pr_info("x86/fpu: Enabled xstate features 0x%llx, context size is %d bytes, using '%s' format.\n",
fpu_kernel_cfg.max_features,
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x c164c7bc9775be7bcc68754bb3431fce5823822e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023082019-brink-buddhist-4d1b@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
c164c7bc9775 ("blk-cgroup: hold queue_lock when removing blkg->q_node")
a06377c5d01e ("Revert "blk-cgroup: pin the gendisk in struct blkcg_gq"")
9a9c261e6b55 ("Revert "blk-cgroup: pass a gendisk to blkg_lookup"")
1231039db31c ("Revert "blk-cgroup: move the cgroup information to struct gendisk"")
dcb522014351 ("Revert "blk-cgroup: simplify blkg freeing from initialization failure paths"")
3f13ab7c80fd ("blk-cgroup: move the cgroup information to struct gendisk")
479664cee14d ("blk-cgroup: pass a gendisk to blkg_lookup")
ba91c849fa50 ("blk-rq-qos: store a gendisk instead of request_queue in struct rq_qos")
3963d84df797 ("blk-rq-qos: constify rq_qos_ops")
ce57b558604e ("blk-rq-qos: make rq_qos_add and rq_qos_del more useful")
b494f9c566ba ("blk-rq-qos: move rq_qos_add and rq_qos_del out of line")
f05837ed73d0 ("blk-cgroup: store a gendisk to throttle in struct task_struct")
84d7d462b16d ("blk-cgroup: pin the gendisk in struct blkcg_gq")
180b04d450a7 ("blk-cgroup: remove the !bdi->dev check in blkg_dev_name")
27b642b07a4a ("blk-cgroup: simplify blkg freeing from initialization failure paths")
0b6f93bdf07e ("blk-cgroup: improve error unwinding in blkg_alloc")
f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
dfd6200a0954 ("blk-cgroup: support to track if policy is online")
c7241babf085 ("blk-cgroup: dropping parent refcount after pd_free_fn() is done")
e3ff8887e7db ("blk-cgroup: fix missing pd_online_fn() while activating policy")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c164c7bc9775be7bcc68754bb3431fce5823822e Mon Sep 17 00:00:00 2001
From: Ming Lei <ming.lei(a)redhat.com>
Date: Thu, 17 Aug 2023 22:17:51 +0800
Subject: [PATCH] blk-cgroup: hold queue_lock when removing blkg->q_node
When blkg is removed from q->blkg_list from blkg_free_workfn(), queue_lock
has to be held, otherwise, all kinds of bugs(list corruption, hard lockup,
..) can be triggered from blkg_destroy_all().
Fixes: f1c006f1c685 ("blk-cgroup: synchronize pd_free_fn() from blkg_free_workfn() and blkcg_deactivate_policy()")
Cc: Yu Kuai <yukuai3(a)huawei.com>
Cc: xiaoli feng <xifeng(a)redhat.com>
Cc: Chunyu Hu <chuhu(a)redhat.com>
Cc: Mike Snitzer <snitzer(a)kernel.org>
Cc: Tejun Heo <tj(a)kernel.org>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
Acked-by: Tejun Heo <tj(a)kernel.org>
Link: https://lore.kernel.org/r/20230817141751.1128970-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe(a)kernel.dk>
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index fc49be622e05..9faafcd10e17 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -136,7 +136,9 @@ static void blkg_free_workfn(struct work_struct *work)
blkcg_policy[i]->pd_free_fn(blkg->pd[i]);
if (blkg->parent)
blkg_put(blkg->parent);
+ spin_lock_irq(&q->queue_lock);
list_del_init(&blkg->q_node);
+ spin_unlock_irq(&q->queue_lock);
mutex_unlock(&q->blkcg_mutex);
blk_put_queue(q);
Re; Interest,
I am interested in discussing the Investment proposal as I explained
in my previous mail. May you let me know your interest and the
possibility of a cooperation aimed for mutual interest.
Looking forward to your mail for further discussion.
Regards
------
Chen Yun - Chairman of CREC
China Railway Engineering Corporation - CRECG
China Railway Plaza, No.69 Fuxing Road, Haidian District, Beijing, P.R.
China
fbcon requires that we implement &drm_framebuffer_funcs.dirty.
Otherwise, the framebuffer might take a while to flush (which would
manifest as noticeable lag). However, we can't enable this callback for
non-fbcon cases since it may cause too many atomic commits to be made at
once. So, implement amdgpu_dirtyfb() and only enable it for fbcon
framebuffers (we can use the "struct drm_file file" parameter in the
callback to check for this since it is only NULL when called by fbcon,
at least in the mainline kernel) on devices that support atomic KMS.
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Mario Limonciello <mario.limonciello(a)amd.com>
Cc: stable(a)vger.kernel.org # 6.1+
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2519
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
--
v3: check if file is NULL instead of doing a strcmp() and make note of
it in the commit message
---
drivers/gpu/drm/amd/amdgpu/amdgpu_display.c | 26 ++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
index d20dd3f852fc..363e6a2cad8c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_display.c
@@ -38,6 +38,8 @@
#include <linux/pci.h>
#include <linux/pm_runtime.h>
#include <drm/drm_crtc_helper.h>
+#include <drm/drm_damage_helper.h>
+#include <drm/drm_drv.h>
#include <drm/drm_edid.h>
#include <drm/drm_fb_helper.h>
#include <drm/drm_gem_framebuffer_helper.h>
@@ -532,11 +534,29 @@ bool amdgpu_display_ddc_probe(struct amdgpu_connector *amdgpu_connector,
return true;
}
+static int amdgpu_dirtyfb(struct drm_framebuffer *fb, struct drm_file *file,
+ unsigned int flags, unsigned int color,
+ struct drm_clip_rect *clips, unsigned int num_clips)
+{
+
+ if (file)
+ return -ENOSYS;
+
+ return drm_atomic_helper_dirtyfb(fb, file, flags, color, clips,
+ num_clips);
+}
+
static const struct drm_framebuffer_funcs amdgpu_fb_funcs = {
.destroy = drm_gem_fb_destroy,
.create_handle = drm_gem_fb_create_handle,
};
+static const struct drm_framebuffer_funcs amdgpu_fb_funcs_atomic = {
+ .destroy = drm_gem_fb_destroy,
+ .create_handle = drm_gem_fb_create_handle,
+ .dirty = amdgpu_dirtyfb
+};
+
uint32_t amdgpu_display_supported_domains(struct amdgpu_device *adev,
uint64_t bo_flags)
{
@@ -1139,7 +1159,11 @@ static int amdgpu_display_gem_fb_verify_and_init(struct drm_device *dev,
if (ret)
goto err;
- ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
+ if (drm_drv_uses_atomic_modeset(dev))
+ ret = drm_framebuffer_init(dev, &rfb->base,
+ &amdgpu_fb_funcs_atomic);
+ else
+ ret = drm_framebuffer_init(dev, &rfb->base, &amdgpu_fb_funcs);
if (ret)
goto err;
--
2.41.0
The patch titled
Subject: shmem: fix smaps BUG sleeping while atomic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
shmem-fix-smaps-bug-sleeping-while-atomic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Hugh Dickins <hughd(a)google.com>
Subject: shmem: fix smaps BUG sleeping while atomic
Date: Tue, 22 Aug 2023 22:14:47 -0700 (PDT)
smaps_pte_hole_lookup() is calling shmem_partial_swap_usage() with page
table lock held: but shmem_partial_swap_usage() does cond_resched_rcu() if
need_resched(): "BUG: sleeping function called from invalid context".
Since shmem_partial_swap_usage() is designed to count across a range, but
smaps_pte_hole_lookup() only calls it for a single page slot, just break
out of the loop on the last or only page, before checking need_resched().
Link: https://lkml.kernel.org/r/6fe3b3ec-abdf-332f-5c23-6a3b3a3b11a9@google.com
Fixes: 230100321518 ("mm/smaps: simplify shmem handling of pte holes")
Signed-off-by: Hugh Dickins <hughd(a)google.com>
Acked-by: Peter Xu <peterx(a)redhat.com>
Cc: <stable(a)vger.kernel.org> [5.16+]
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/shmem.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
--- a/mm/shmem.c~shmem-fix-smaps-bug-sleeping-while-atomic
+++ a/mm/shmem.c
@@ -806,14 +806,16 @@ unsigned long shmem_partial_swap_usage(s
XA_STATE(xas, &mapping->i_pages, start);
struct page *page;
unsigned long swapped = 0;
+ unsigned long max = end - 1;
rcu_read_lock();
- xas_for_each(&xas, page, end - 1) {
+ xas_for_each(&xas, page, max) {
if (xas_retry(&xas, page))
continue;
if (xa_is_value(page))
swapped++;
-
+ if (xas.xa_index == max)
+ break;
if (need_resched()) {
xas_pause(&xas);
cond_resched_rcu();
_
Patches currently in -mm which might be from hughd(a)google.com are
shmem-fix-smaps-bug-sleeping-while-atomic.patch
Official Name:United States of America
Capitol:Washington
Population:318,814,000
Languages:English, Spanish, numerous others
Geographic Region:Americas Northern America
Geographic Size (km sq):9,526,468
Year of UN Membership:1945
Year of Present State Formation:1787
Current UN Representative:Mr.Dennis Francis
Greetings
This message is converting to you from united nation Headquarter from
New-York America to know what is exactly the reason of being
ungrateful to the received compensation fund, meanwhile you have to
explain to us how the fund was divided to each and every needful one
in your country because united nation compersated you with (€
2,500,000.00 Million EUR ) to use part of the money and help orphan
and widowers including the people covid19 affected in your country for
our proper documentary.
It had been officially known that out of the (150) lucky winners that
has received their compensation fund out there worldwide sum of (€
2,500,000.00 Million EUR ) per each of the lucky winner as it was
listed in our list files and individuals, that was offered by United
Nations compensation in last year 2022,(149) has all returned back
with appreciation letter to united nation office remainder
you.Woodforest National Bank reported to united nation that they has
paid all the lucky winners,after we checked our file we saw that
(149)has come and thanked united nation and explained how they used
there money remaining you to complete the total number(150).we need
your urgent response for our proper documentry.
You are adviced to explain in details how the fund was divided to the
needful as the purpose on your reply mail.
Thank you in advance
Mr.Dennis Francis
PRESIDENT OF THE UNITED NATIONS GENERAL ASSEMBLY
From: Sanjay R Mehta <sanju.mehta(a)amd.com>
Previously, on unplug events, the TMU mode was disabled first
followed by the Time Synchronization Handshake, irrespective of
whether the tb_switch_tmu_rate_write() API was successful or not.
However, this caused a problem with Thunderbolt 3 (TBT3)
devices, as the TSPacketInterval bits were always enabled by default,
leading the host router to assume that the device router's TMU was
already enabled and preventing it from initiating the Time
Synchronization Handshake. As a result, TBT3 monitors experienced
display flickering from the second hot plug onwards.
To address this issue, we have modified the code to only disable the
Time Synchronization Handshake during TMU disable if the
tb_switch_tmu_rate_write() function is successful. This ensures that
the TBT3 devices function correctly and eliminates the display
flickering issue.
Co-developed-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanath S <Sanath.S(a)amd.com>
Signed-off-by: Sanjay R Mehta <sanju.mehta(a)amd.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Mika Westerberg <mika.westerberg(a)linux.intel.com>
(cherry picked from commit 583893a66d731f5da010a3fa38a0460e05f0149b)
USB4v2 introduced support for uni-directional TMU mode as part of
d49b4f043d63 ("thunderbolt: Add support for enhanced uni-directional TMU mode")
This is not a stable candidate commit, so adjust the code for backport.
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
drivers/thunderbolt/tmu.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/thunderbolt/tmu.c b/drivers/thunderbolt/tmu.c
index 626aca3124b1..d9544600b386 100644
--- a/drivers/thunderbolt/tmu.c
+++ b/drivers/thunderbolt/tmu.c
@@ -415,7 +415,8 @@ int tb_switch_tmu_disable(struct tb_switch *sw)
* uni-directional mode and we don't want to change it's TMU
* mode.
*/
- tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ ret = tb_switch_tmu_rate_write(sw, TB_SWITCH_TMU_RATE_OFF);
+ return ret;
tb_port_tmu_time_sync_disable(up);
ret = tb_port_tmu_time_sync_disable(down);
--
2.34.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit b1e213a9e31c20206f111ec664afcf31cbfe0dbb ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let FSL_EDMA and INTEL_IDMA64 depend on HAS_IOMEM so that it
won't be built to cause below compiling error if PCI is unset.
--------
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" [drivers/dma/idma64.ko] undefined!
--------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Vinod Koul <vkoul(a)kernel.org>
Cc: dmaengine(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-2-bhe@redhat.com
Signed-off-by: Vinod Koul <vkoul(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/dma/Kconfig | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/dma/Kconfig b/drivers/dma/Kconfig
index 95344ae49e532..e1beddcc8c84a 100644
--- a/drivers/dma/Kconfig
+++ b/drivers/dma/Kconfig
@@ -202,6 +202,7 @@ config FSL_DMA
config FSL_EDMA
tristate "Freescale eDMA engine support"
depends on OF
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
@@ -271,6 +272,7 @@ config IMX_SDMA
config INTEL_IDMA64
tristate "Intel integrated DMA 64-bit support"
+ depends on HAS_IOMEM
select DMA_ENGINE
select DMA_VIRTUAL_CHANNELS
help
--
2.40.1
The IOMMU list has moved and emails to the old list bounce. Bring stable
in alignment with mainline.
Joerg Roedel (1):
MAINTAINERS: Remove iommu(a)lists.linux-foundation.org
Xiang Chen (1):
MAINTAINERS: update maintainer list of DMA MAPPING BENCHMARK
MAINTAINERS | 13 +------------
1 file changed, 1 insertion(+), 12 deletions(-)
--
2.25.1
From: Baoquan He <bhe(a)redhat.com>
[ Upstream commit e7dd44f4f3166db45248414f5df8f615392de47a ]
On s390 systems (aka mainframes), it has classic channel devices for
networking and permanent storage that are currently even more common
than PCI devices. Hence it could have a fully functional s390 kernel
with CONFIG_PCI=n, then the relevant iomem mapping functions
[including ioremap(), devm_ioremap(), etc.] are not available.
Here let COMMON_CLK_FIXED_MMIO depend on HAS_IOMEM so that it won't
be built to cause below compiling error if PCI is unset:
------
ld: drivers/clk/clk-fixed-mmio.o: in function `fixed_mmio_clk_setup':
clk-fixed-mmio.c:(.text+0x5e): undefined reference to `of_iomap'
ld: clk-fixed-mmio.c:(.text+0xba): undefined reference to `iounmap'
------
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202306211329.ticOJCSv-lkp@intel.com/
Signed-off-by: Baoquan He <bhe(a)redhat.com>
Cc: Michael Turquette <mturquette(a)baylibre.com>
Cc: Stephen Boyd <sboyd(a)kernel.org>
Cc: linux-clk(a)vger.kernel.org
Link: https://lore.kernel.org/r/20230707135852.24292-8-bhe@redhat.com
Signed-off-by: Stephen Boyd <sboyd(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/clk/Kconfig | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig
index 016814e15536a..52dfbae4f361c 100644
--- a/drivers/clk/Kconfig
+++ b/drivers/clk/Kconfig
@@ -444,6 +444,7 @@ config COMMON_CLK_BD718XX
config COMMON_CLK_FIXED_MMIO
bool "Clock driver for Memory Mapped Fixed values"
depends on COMMON_CLK && OF
+ depends on HAS_IOMEM
help
Support for Memory Mapped IO Fixed clocks
--
2.40.1
From: Peter Wang <peter.wang(a)mediatek.com>
If clock scale up and suspend clock scaling, ufs will keep high
performance/power mode but no read/write requests on going.
It is logic wrong and have power concern.
Fixes: 401f1e4490ee ("scsi: ufs: don't suspend clock scaling during clock gating")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index 129446775796..e3672e55efae 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -1458,7 +1458,7 @@ static int ufshcd_devfreq_target(struct device *dev,
ktime_to_us(ktime_sub(ktime_get(), start)), ret);
out:
- if (sched_clk_scaling_suspend_work)
+ if (sched_clk_scaling_suspend_work && !scale_up)
queue_work(hba->clk_scaling.workq,
&hba->clk_scaling.suspend_work);
--
2.18.0
From: Gabe Teeger <gabe.teeger(a)amd.com>
[Why]
We wait for mpc idle while in a locked state, leading to potential
deadlock.
[What]
Move the wait_for_idle call to outside of HW lock. This and a
call to wait_drr_doublebuffer_pending_clear are moved added to a new
static helper function called wait_for_outstanding_hw_updates, to make
the interface clearer.
Cc: stable(a)vger.kernel.org
Fixes: 8f0d304d21b3 ("drm/amd/display: Do not commit pipe when updating DRR")
Reviewed-by: Jun Lei <jun.lei(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Gabe Teeger <gabe.teeger(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/Makefile | 1 +
drivers/gpu/drm/amd/display/dc/core/dc.c | 58 +++++++++++++------
.../drm/amd/display/dc/dcn20/dcn20_hwseq.c | 11 ----
3 files changed, 42 insertions(+), 28 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/Makefile b/drivers/gpu/drm/amd/display/dc/Makefile
index 69ffd4424dc7..1b8c2aef4633 100644
--- a/drivers/gpu/drm/amd/display/dc/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/Makefile
@@ -78,3 +78,4 @@ DC_EDID += dc_edid_parser.o
AMD_DISPLAY_DMUB = $(addprefix $(AMDDALPATH)/dc/,$(DC_DMUB))
AMD_DISPLAY_EDID = $(addprefix $(AMDDALPATH)/dc/,$(DC_EDID))
AMD_DISPLAY_FILES += $(AMD_DISPLAY_DMUB) $(AMD_DISPLAY_EDID)
+
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index c6f6dc972c2a..5aab67868cb6 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -3501,6 +3501,45 @@ static void commit_planes_for_stream_fast(struct dc *dc,
top_pipe_to_program->stream->update_flags.raw = 0;
}
+static void wait_for_outstanding_hw_updates(struct dc *dc, const struct dc_state *dc_context)
+{
+/*
+ * This function calls HWSS to wait for any potentially double buffered
+ * operations to complete. It should be invoked as a pre-amble prior
+ * to full update programming before asserting any HW locks.
+ */
+ int pipe_idx;
+ int opp_inst;
+ int opp_count = dc->res_pool->pipe_count;
+ struct hubp *hubp;
+ int mpcc_inst;
+ const struct pipe_ctx *pipe_ctx;
+
+ for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) {
+ pipe_ctx = &dc_context->res_ctx.pipe_ctx[pipe_idx];
+
+ if (!pipe_ctx->stream)
+ continue;
+
+ if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear)
+ pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg);
+
+ hubp = pipe_ctx->plane_res.hubp;
+ if (!hubp)
+ continue;
+
+ mpcc_inst = hubp->inst;
+ // MPCC inst is equal to pipe index in practice
+ for (opp_inst = 0; opp_inst < opp_count; opp_inst++) {
+ if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) {
+ dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst);
+ dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false;
+ break;
+ }
+ }
+ }
+}
+
static void commit_planes_for_stream(struct dc *dc,
struct dc_surface_update *srf_updates,
int surface_count,
@@ -3519,24 +3558,9 @@ static void commit_planes_for_stream(struct dc *dc,
// dc->current_state anymore, so we have to cache it before we apply
// the new SubVP context
subvp_prev_use = false;
-
-
dc_z10_restore(dc);
-
- if (update_type == UPDATE_TYPE_FULL) {
- /* wait for all double-buffer activity to clear on all pipes */
- int pipe_idx;
-
- for (pipe_idx = 0; pipe_idx < dc->res_pool->pipe_count; pipe_idx++) {
- struct pipe_ctx *pipe_ctx = &context->res_ctx.pipe_ctx[pipe_idx];
-
- if (!pipe_ctx->stream)
- continue;
-
- if (pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear)
- pipe_ctx->stream_res.tg->funcs->wait_drr_doublebuffer_pending_clear(pipe_ctx->stream_res.tg);
- }
- }
+ if (update_type == UPDATE_TYPE_FULL)
+ wait_for_outstanding_hw_updates(dc, context);
if (update_type == UPDATE_TYPE_FULL) {
dc_allow_idle_optimizations(dc, false);
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index dd4c7a7faf28..971fa8bf6d1f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1563,17 +1563,6 @@ static void dcn20_update_dchubp_dpp(
|| plane_state->update_flags.bits.global_alpha_change
|| plane_state->update_flags.bits.per_pixel_alpha_change) {
// MPCC inst is equal to pipe index in practice
- int mpcc_inst = hubp->inst;
- int opp_inst;
- int opp_count = dc->res_pool->pipe_count;
-
- for (opp_inst = 0; opp_inst < opp_count; opp_inst++) {
- if (dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst]) {
- dc->res_pool->mpc->funcs->wait_for_idle(dc->res_pool->mpc, mpcc_inst);
- dc->res_pool->opps[opp_inst]->mpcc_disconnect_pending[mpcc_inst] = false;
- break;
- }
- }
hws->funcs.update_mpcc(dc, pipe_ctx);
}
--
2.41.0
From: Wenjing Liu <wenjing.liu(a)amd.com>
ODM power optimization is only supported with single stream. When ODM
power optimization is enabled, we might not have enough free pipes for
enabling other stream. So when we are committing more than 1 stream we
should first switch off ODM power optimization to make room for new
stream and then allocating pipe resource for the new stream.
Cc: stable(a)vger.kernel.org
Fixes: 4fbcb04a2ff5 ("drm/amd/display: add ODM case when looking for first split pipe")
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c
index 025e0fdf486d..c6f6dc972c2a 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc.c
@@ -2073,12 +2073,12 @@ enum dc_status dc_commit_streams(struct dc *dc,
}
}
- /* Check for case where we are going from odm 2:1 to max
- * pipe scenario. For these cases, we will call
- * commit_minimal_transition_state() to exit out of odm 2:1
- * first before processing new streams
+ /* ODM Combine 2:1 power optimization is only applied for single stream
+ * scenario, it uses extra pipes than needed to reduce power consumption
+ * We need to switch off this feature to make room for new streams.
*/
- if (stream_count == dc->res_pool->pipe_count) {
+ if (stream_count > dc->current_state->stream_count &&
+ dc->current_state->stream_count == 1) {
for (i = 0; i < dc->res_pool->pipe_count; i++) {
pipe = &dc->current_state->res_ctx.pipe_ctx[i];
if (pipe->next_odm_pipe)
--
2.41.0
From: Wenjing Liu <wenjing.liu(a)amd.com>
When we are dynamically adding new ODM slices, we didn't update
blank state, if the pipe used by new ODM slice is previously blanked,
we will continue outputting blank pixel data on that slice causing
right half of the screen showing blank image.
The previous fix was a temporary hack to directly update current state
when committing new state. This could potentially cause hw and sw
state synchronization issues and it is not permitted by dc commit
design.
Cc: stable(a)vger.kernel.org
Fixes: 7fbf451e7639 ("drm/amd/display: Reinit DPG when exiting dynamic ODM")
Reviewed-by: Dillon Varone <dillon.varone(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu(a)amd.com>
---
.../drm/amd/display/dc/dcn20/dcn20_hwseq.c | 36 +++++--------------
1 file changed, 9 insertions(+), 27 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index d3caba52d2fc..f3db16cd10db 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -1106,29 +1106,6 @@ void dcn20_blank_pixel_data(
v_active,
offset);
- if (!blank && dc->debug.enable_single_display_2to1_odm_policy) {
- /* when exiting dynamic ODM need to reinit DPG state for unused pipes */
- struct pipe_ctx *old_odm_pipe = dc->current_state->res_ctx.pipe_ctx[pipe_ctx->pipe_idx].next_odm_pipe;
-
- odm_pipe = pipe_ctx->next_odm_pipe;
-
- while (old_odm_pipe) {
- if (!odm_pipe || old_odm_pipe->pipe_idx != odm_pipe->pipe_idx)
- dc->hwss.set_disp_pattern_generator(dc,
- old_odm_pipe,
- CONTROLLER_DP_TEST_PATTERN_VIDEOMODE,
- CONTROLLER_DP_COLOR_SPACE_UDEFINED,
- COLOR_DEPTH_888,
- NULL,
- 0,
- 0,
- 0);
- old_odm_pipe = old_odm_pipe->next_odm_pipe;
- if (odm_pipe)
- odm_pipe = odm_pipe->next_odm_pipe;
- }
- }
-
if (!blank)
if (stream_res->abm) {
dc->hwss.set_pipe(pipe_ctx);
@@ -1732,11 +1709,16 @@ static void dcn20_program_pipe(
struct dc_state *context)
{
struct dce_hwseq *hws = dc->hwseq;
- /* Only need to unblank on top pipe */
- if ((pipe_ctx->update_flags.bits.enable || pipe_ctx->stream->update_flags.bits.abm_level)
- && !pipe_ctx->top_pipe && !pipe_ctx->prev_odm_pipe)
- hws->funcs.blank_pixel_data(dc, pipe_ctx, !pipe_ctx->plane_state->visible);
+ /* Only need to unblank on top pipe */
+ if (resource_is_pipe_type(pipe_ctx, OTG_MASTER)) {
+ if (pipe_ctx->update_flags.bits.enable ||
+ pipe_ctx->update_flags.bits.odm ||
+ pipe_ctx->stream->update_flags.bits.abm_level)
+ hws->funcs.blank_pixel_data(dc, pipe_ctx,
+ !pipe_ctx->plane_state ||
+ !pipe_ctx->plane_state->visible);
+ }
/* Only update TG on top pipe */
if (pipe_ctx->update_flags.bits.global_sync && !pipe_ctx->top_pipe
--
2.41.0
From: Fudong Wang <fudong.wang(a)amd.com>
A benchmark stress test (12-40 machines x 48hours) found that DCN315 has
cases where DC writes to an indirect register to set the smu clock msg
id, but when we go to read the same indirect register the returned msg
id doesn't match with what we just set it to. So, to fix this retry the
write until the register's value matches with the requested value.
Cc: stable(a)vger.kernel.org # 6.1+
Fixes: f94903996140 ("drm/amd/display: Add DCN315 CLK_MGR")
Reviewed-by: Charlene Liu <charlene.liu(a)amd.com>
Acked-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Fudong Wang <fudong.wang(a)amd.com>
---
.../display/dc/clk_mgr/dcn315/dcn315_smu.c | 20 +++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
index 3e0da873cf4c..1042cf1a3ab0 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn315/dcn315_smu.c
@@ -32,6 +32,7 @@
#define MAX_INSTANCE 6
#define MAX_SEGMENT 6
+#define SMU_REGISTER_WRITE_RETRY_COUNT 5
struct IP_BASE_INSTANCE {
unsigned int segment[MAX_SEGMENT];
@@ -132,6 +133,8 @@ static int dcn315_smu_send_msg_with_param(
unsigned int msg_id, unsigned int param)
{
uint32_t result;
+ uint32_t i = 0;
+ uint32_t read_back_data;
result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000);
@@ -148,10 +151,19 @@ static int dcn315_smu_send_msg_with_param(
/* Set the parameter register for the SMU message, unit is Mhz */
REG_WRITE(MP1_SMN_C2PMSG_37, param);
- /* Trigger the message transaction by writing the message ID */
- generic_write_indirect_reg(CTX,
- REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
- mmMP1_C2PMSG_3, msg_id);
+ for (i = 0; i < SMU_REGISTER_WRITE_RETRY_COUNT; i++) {
+ /* Trigger the message transaction by writing the message ID */
+ generic_write_indirect_reg(CTX,
+ REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
+ mmMP1_C2PMSG_3, msg_id);
+ read_back_data = generic_read_indirect_reg(CTX,
+ REG_NBIO(RSMU_INDEX), REG_NBIO(RSMU_DATA),
+ mmMP1_C2PMSG_3);
+ if (read_back_data == msg_id)
+ break;
+ udelay(2);
+ smu_print("SMU msg id write fail %x times. \n", i + 1);
+ }
result = dcn315_smu_wait_for_response(clk_mgr, 10, 200000);
--
2.41.0
It was noticed that APs stopped to accept clients after a while. With QCA's
ath11k fork, it even printed some additional information:
attach ack fail -28
when new clients tried to connect. hostapd was then usually showing a
message like "deauthenticated due to inactivity (timer DEAUTH/REMOVE)".
While debugging this, it was noticed that this happened when a peer was no
longer known by ath11k but an NL80211_CMD_PROBE_CLIENT triggered TX was
just "finished" for it. In that case, ath11k was just throwing the skb away
and left some information in various data structures in mac80211.
And after Felix pointed out ieee80211_free_txskb(), it is also clear that
dev_kfree_skb_any() in these functions should also be calls to
ieee80211_free_txskb() - but for these, I have nothing to trigger this
error case. Still, a patch is provided as part of this patch series.
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
Changes in v2:
- Simply switch to ieee80211_free_txskb() as recommended by Felix Fietkau
+ ieee80211_free_txskb calls ieee80211_report_used_skb
+ ieee80211_report_used_skb calls ieee80211_report_ack_skb (when
ack_frame_id is set and !IEEE80211_TX_INTFL_MLME_CONN_TX)
+ ieee80211_report_ack_skb will remove skb from ack_status_frames
- Add second patch which handles similar situations in the previously
patched functions
- Link to v1: https://lore.kernel.org/r/20230801-ath11k-ack_status_leak-v1-1-539cb72c55bc…
---
Sven Eckelmann (2):
ath11k: Don't drop tx_status when peer cannot be found
ath11k: Cleanup mac80211 references on failure during tx_complete
drivers/net/wireless/ath/ath11k/dp_tx.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
---
base-commit: 1d7dd5aa35474e553b8671b58579e0749b560779
change-id: 20230801-ath11k-ack_status_leak-70a7a30e5d9f
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
This kind of interface doesn't have a mac header. This patch fixes
bpf_redirect() to a ppp interface.
CC: stable(a)vger.kernel.org
Fixes: 27b29f63058d ("bpf: add bpf_redirect() helper")
Signed-off-by: Nicolas Dichtel <nicolas.dichtel(a)6wind.com>
Tested-by: Siwar Zitouni <siwar.zitouni(a)6wind.com>
---
v1 -> v2:
- I forgot the 'Tested-by' tag in the v1 :/
include/linux/if_arp.h | 1 +
1 file changed, 1 insertion(+)
diff --git a/include/linux/if_arp.h b/include/linux/if_arp.h
index 1ed52441972f..8efbe29a6f0c 100644
--- a/include/linux/if_arp.h
+++ b/include/linux/if_arp.h
@@ -53,6 +53,7 @@ static inline bool dev_is_mac_header_xmit(const struct net_device *dev)
case ARPHRD_NONE:
case ARPHRD_RAWIP:
case ARPHRD_PIMREG:
+ case ARPHRD_PPP:
return false;
default:
return true;
--
2.39.2
Binutils-2.38 and GCC-12.1.0 bumped[0][1] the default ISA spec to the newer
20191213 version which moves some instructions from the I extension to the
Zicsr and Zifencei extensions. So if one of the binutils and GCC exceeds
that version, we should explicitly specifying Zicsr and Zifencei via -march
to cope with the new changes. but this only occurs when binutils >= 2.36
and GCC >= 11.1.0. It's a different story when binutils < 2.36.
binutils-2.36 supports the Zifencei extension[2] and splits Zifencei and
Zicsr from I[3]. GCC-11.1.0 is particular[4] because it add support Zicsr
and Zifencei extension for -march. binutils-2.35 does not support the
Zifencei extension, and does not need to specify Zicsr and Zifencei when
working with GCC >= 12.1.0.
To make our lives easier, let's relax the check to binutils >= 2.36 in
CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. For the other two cases,
where clang < 17 or GCC < 11.1.0, we will deal with them in
CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
For more information, please refer to:
commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae871… [0]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51… [1]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=5a1b31e1e1cee6e9f… [2]
Link: https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=729a53530e86972d1… [3]
Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7… [4]
Link: https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
Link: https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
Reviewed-by: Conor Dooley <conor.dooley(a)microchip.com>
Acked-by: Guo Ren <guoren(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mingzheng Xing <xingmingzheng(a)iscas.ac.cn>
---
Changelog and test results:
v4 -> v5:
- Add Reviewed-by and Acked-by to commit message, no other code
changes.
v3 -> v4:
- Update the Kconfig help text and commit message.
Link: https://lore.kernel.org/all/20230731150511.38140-1-xingmingzheng@iscas.ac.cn
v2 -> v3:
- Relax the check to binutils >= 2.36.
- Update the Kconfig help text and commit message.
Link: https://lore.kernel.org/all/20230731095936.23397-1-xingmingzheng@iscas.ac.cn
v1 -> v2:
- Update the Kconfig help text and commit message.
- Add considerations for low version gcc case.
Link: https://lore.kernel.org/all/20230726174524.340952-1-xingmingzheng@iscas.ac.…
v1:
Link: https://lore.kernel.org/all/20230725170405.251011-1-xingmingzheng@iscas.ac.…
Here are my test results:
* Compiling the kernel for the master branch with a combination of
multiple versions of gcc and binutils.
gcc binutils patched no patch
11.4.0 2.35 ok ok
11.4.0 2.36 ok ok
11.4.0 2.38 ok ok
12.2.0 2.35 ok error[1]
12.2.0 2.36 ok error[2]
12.2.0 2.38 ok ok
10.5.0 2.35 ok ok
10.5.0 2.36 ok ok
10.5.0 2.38 ok error[3]
11.1.0 2.35 ok ok
11.1.0 2.36 ok ok
11.1.0 2.38 ok ok
11.2.0 2.35 ok ok
11.2.0 2.36 ok ok
11.2.0 2.38 ok ok
[1]
Assembler messages:
Fatal error: -march=rv32imafd_zicsr_zifencei: Invalid or unknown z ISA extension: 'zifencei'
make[2]: *** [arch/riscv/kernel/compat_vdso/Makefile:47: arch/riscv/kernel/compat_vdso/rt_sigreturn.o] Error 1
[2]
./arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
./arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
make[2]: *** [scripts/Makefile.build:243: arch/riscv/kernel/vdso/vgettimeofday.o] Error 1
[3]
cc1: error: '-march=rv64imac_zicsr_zifencei': unsupported ISA subset 'z'
cc1: error: ABI requires '-march=rv64'
make[2]: *** [scripts/Makefile.build:243: scripts/mod/empty.o] Error 1
make[2]: *** Waiting for unfinished jobs....
cc1: error: '-march=rv64imac_zicsr_zifencei': unsupported ISA subset 'z'
cc1: error: ABI requires '-march=rv64'
arch/riscv/Kconfig | 32 +++++++++++++++-----------
arch/riscv/kernel/compat_vdso/Makefile | 8 ++++++-
2 files changed, 26 insertions(+), 14 deletions(-)
diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index f52dd125ac5e..ce3a6667cfdb 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -570,24 +570,30 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE
config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
def_bool y
# https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae871…
- depends on AS_IS_GNU && AS_VERSION >= 23800
- help
- Newer binutils versions default to ISA spec version 20191213 which
- moves some instructions from the I extension to the Zicsr and Zifencei
- extensions.
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51…
+ depends on AS_IS_GNU && AS_VERSION >= 23600
+ help
+ Binutils-2.38 and GCC-12.1.0 bumped the default ISA spec to the newer
+ 20191213 version, which moves some instructions from the I extension to
+ the Zicsr and Zifencei extensions. This requires explicitly specifying
+ Zicsr and Zifencei when binutils >= 2.38 or GCC >= 12.1.0. Zicsr
+ and Zifencei are supported in binutils from version 2.36 onwards.
+ To make life easier, and avoid forcing toolchains that default to a
+ newer ISA spec to version 2.2, relax the check to binutils >= 2.36.
+ For clang < 17 or GCC < 11.1.0, for which this is not possible, this is
+ dealt with in CONFIG_TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
def_bool y
depends on TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
# https://github.com/llvm/llvm-project/commit/22e199e6afb1263c943c0c0d4498694…
- depends on CC_IS_CLANG && CLANG_VERSION < 170000
- help
- Certain versions of clang do not support zicsr and zifencei via -march
- but newer versions of binutils require it for the reasons noted in the
- help text of CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI. This
- option causes an older ISA spec compatible with these older versions
- of clang to be passed to GAS, which has the same result as passing zicsr
- and zifencei to -march.
+ # https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7…
+ depends on (CC_IS_CLANG && CLANG_VERSION < 170000) || (CC_IS_GCC && GCC_VERSION < 110100)
+ help
+ Certain versions of clang and GCC do not support zicsr and zifencei via
+ -march. This option causes an older ISA spec compatible with these older
+ versions of clang and GCC to be passed to GAS, which has the same result
+ as passing zicsr and zifencei to -march.
config FPU
bool "FPU support"
diff --git a/arch/riscv/kernel/compat_vdso/Makefile b/arch/riscv/kernel/compat_vdso/Makefile
index 189345773e7e..b86e5e2c3aea 100644
--- a/arch/riscv/kernel/compat_vdso/Makefile
+++ b/arch/riscv/kernel/compat_vdso/Makefile
@@ -11,7 +11,13 @@ compat_vdso-syms += flush_icache
COMPAT_CC := $(CC)
COMPAT_LD := $(LD)
-COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32
+# binutils 2.35 does not support the zifencei extension, but in the ISA
+# spec 20191213, G stands for IMAFD_ZICSR_ZIFENCEI.
+ifdef CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
+ COMPAT_CC_FLAGS := -march=rv32g -mabi=ilp32
+else
+ COMPAT_CC_FLAGS := -march=rv32imafd -mabi=ilp32
+endif
COMPAT_LD_FLAGS := -melf32lriscv
# Disable attributes, as they're useless and break the build.
--
2.34.1
When sending Discover Identity messages to a Port Partner that uses Power
Delivery v2 and SVDM v1, we currently send PD v2 messages with SVDM v2.0,
expecting the port partner to respond with its highest supported SVDM
version as stated in Section 6.4.4.2.3 in the Power Delivery v3
specification. However, sending SVDM v2 to some Power Delivery v2 port
partners results in a NAK whereas sending SVDM v1 does not.
NAK messages can be handled by the initiator (PD v3 section 6.4.4.2.5.1),
and one solution could be to resend Discover Identity on a lower SVDM
version if possible. But, Section 6.4.4.3 of PD v2 states that "A NAK
response Should be taken as an indication not to retry that particular
Command."
Instead, we can set the SVDM version to the maximum one supported by the
negotiated PD revision. When operating in PD v2, this obeys Section
6.4.4.2.3, which states the SVDM field "Shall be set to zero to indicate
Version 1.0." In PD v3, the SVDM field "Shall be set to 01b to indicate
Version 2.0."
Fixes: c34e85fa69b9 ("usb: typec: tcpm: Send DISCOVER_IDENTITY from dedicated work")
Cc: stable(a)vger.kernel.org
Signed-off-by: RD Babiera <rdbabiera(a)google.com>
---
Changes since v1:
* Fixed styling errors.
---
drivers/usb/typec/tcpm/tcpm.c | 35 +++++++++++++++++++++++++++++++----
1 file changed, 31 insertions(+), 4 deletions(-)
diff --git a/drivers/usb/typec/tcpm/tcpm.c b/drivers/usb/typec/tcpm/tcpm.c
index 829d75ebab42..5024354a0fe0 100644
--- a/drivers/usb/typec/tcpm/tcpm.c
+++ b/drivers/usb/typec/tcpm/tcpm.c
@@ -3928,6 +3928,29 @@ static enum typec_cc_status tcpm_pwr_opmode_to_rp(enum typec_pwr_opmode opmode)
}
}
+static void tcpm_set_initial_svdm_version(struct tcpm_port *port)
+{
+ switch (port->negotiated_rev) {
+ case PD_REV30:
+ break;
+ /*
+ * 6.4.4.2.3 Structured VDM Version
+ * 2.0 states "At this time, there is only one version (1.0) defined.
+ * This field Shall be set to zero to indicate Version 1.0."
+ * 3.0 states "This field Shall be set to 01b to indicate Version 2.0."
+ * To ensure that we follow the Power Delivery revision we are currently
+ * operating on, downgrade the SVDM version to the highest one supported
+ * by the Power Delivery revision.
+ */
+ case PD_REV20:
+ typec_partner_set_svdm_version(port->partner, SVDM_VER_1_0);
+ break;
+ default:
+ typec_partner_set_svdm_version(port->partner, SVDM_VER_1_0);
+ break;
+ }
+}
+
static void run_state_machine(struct tcpm_port *port)
{
int ret;
@@ -4165,10 +4188,12 @@ static void run_state_machine(struct tcpm_port *port)
* For now, this driver only supports SOP for DISCOVER_IDENTITY, thus using
* port->explicit_contract to decide whether to send the command.
*/
- if (port->explicit_contract)
+ if (port->explicit_contract) {
+ tcpm_set_initial_svdm_version(port);
mod_send_discover_delayed_work(port, 0);
- else
+ } else {
port->send_discover = false;
+ }
/*
* 6.3.5
@@ -4455,10 +4480,12 @@ static void run_state_machine(struct tcpm_port *port)
* For now, this driver only supports SOP for DISCOVER_IDENTITY, thus using
* port->explicit_contract.
*/
- if (port->explicit_contract)
+ if (port->explicit_contract) {
+ tcpm_set_initial_svdm_version(port);
mod_send_discover_delayed_work(port, 0);
- else
+ } else {
port->send_discover = false;
+ }
power_supply_changed(port->psy);
break;
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
--
2.41.0.585.gd2178a4bd4-goog
Dzień dobry,
czy konsultowali Państwo swoją umowę ubezpieczeniową z niezależnym doradcą?
Większość moich Klientów, dotychczas nie była świadoma, jak bardzo przepłacają za polisy.
Jako specjalista niepowiązany z żadną organizacja ubezpieczeniową bezpłatnie przeanalizuję Państwa rozwiązania finansowe i zarekomenduję najkorzystniejsze na rynku alternatywy, które pozwolą zmniejszyć dotychczasowe koszty przy jednocześnie zwiększonej ochronie.
Byliby Państwo zainteresowani?
Pozdrawiam
Grzegorz Frycz
This is an automatic generated email to let you know that the following patch were queued:
Subject: media: vcodec: Fix potential array out-of-bounds in encoder queue_setup
Author: Wei Chen <harperchen1110(a)gmail.com>
Date: Thu Aug 10 08:23:33 2023 +0000
variable *nplanes is provided by user via system call argument. The
possible value of q_data->fmt->num_planes is 1-3, while the value
of *nplanes can be 1-8. The array access by index i can cause array
out-of-bounds.
Fix this bug by checking *nplanes against the array size.
Fixes: 4e855a6efa54 ("[media] vcodec: mediatek: Add Mediatek V4L2 Video Encoder Driver")
Signed-off-by: Wei Chen <harperchen1110(a)gmail.com>
Cc: stable(a)vger.kernel.org
Reviewed-by: Chen-Yu Tsai <wenst(a)chromium.org>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c | 2 ++
1 file changed, 2 insertions(+)
---
diff --git a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
index 9ff439a50f53..315e97a2450e 100644
--- a/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
+++ b/drivers/media/platform/mediatek/vcodec/mtk_vcodec_enc.c
@@ -821,6 +821,8 @@ static int vb2ops_venc_queue_setup(struct vb2_queue *vq,
return -EINVAL;
if (*nplanes) {
+ if (*nplanes != q_data->fmt->num_planes)
+ return -EINVAL;
for (i = 0; i < *nplanes; i++)
if (sizes[i] < q_data->sizeimage[i])
return -EINVAL;
The automatic recalculation of the maximum allowed MTU is usually triggered
by code sections which are already rtnl lock protected by callers outside
of batman-adv. But when the fragmentation setting is changed via
batman-adv's own batadv genl family, then the rtnl lock is not yet taken.
But dev_set_mtu requires that the caller holds the rtnl lock because it
uses netdevice notifiers. And this code will then fail the check for this
lock:
RTNL: assertion failed at net/core/dev.c (1953)
Cc: stable(a)vger.kernel.org
Reported-by: syzbot+f8812454d9b3ac00d282(a)syzkaller.appspotmail.com
Fixes: c6a953cce8d0 ("batman-adv: Trigger events for auto adjusted MTU")
Signed-off-by: Sven Eckelmann <sven(a)narfation.org>
---
This problem was just identified by syzbot [1]. I hope it is ok to directly
send this patch to netdev instead of creating a single-patch PR from
the batadv/net branch. If you still prefer a PR then we can also prepare
it.
[1] https://lore.kernel.org/r/0000000000009bbb4b0603717cde@google.com
---
net/batman-adv/netlink.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/batman-adv/netlink.c b/net/batman-adv/netlink.c
index ad5714f737be..6efbc9275aec 100644
--- a/net/batman-adv/netlink.c
+++ b/net/batman-adv/netlink.c
@@ -495,7 +495,10 @@ static int batadv_netlink_set_mesh(struct sk_buff *skb, struct genl_info *info)
attr = info->attrs[BATADV_ATTR_FRAGMENTATION_ENABLED];
atomic_set(&bat_priv->fragmentation, !!nla_get_u8(attr));
+
+ rtnl_lock();
batadv_update_min_mtu(bat_priv->soft_iface);
+ rtnl_unlock();
}
if (info->attrs[BATADV_ATTR_GW_BANDWIDTH_DOWN]) {
---
base-commit: 421d467dc2d483175bad4fb76a31b9e5a3d744cf
change-id: 20230821-batadv-missing-mtu-rtnl-lock-bc4cee67731d
Best regards,
--
Sven Eckelmann <sven(a)narfation.org>
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and returns back the
command code. This makes the TPM fail probe.
Since only Microsoft Pluton is the only known combination of AMD CPU and
fTPM from other vendor, disable hwrng otherwise. In order to make sysadmin
aware of this, print also info message to the klog.
Cc: stable(a)vger.kernel.org
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
v2:
* CONFIG_X86
* Removed "Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>"
* Removed "Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>"
---
drivers/char/tpm/tpm_crb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 65ff4d2fbe8d..28448bfd4062 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -463,28 +463,6 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
return (cancel & CRB_CANCEL_INVOKE) == CRB_CANCEL_INVOKE;
}
-static int crb_check_flags(struct tpm_chip *chip)
-{
- u32 val;
- int ret;
-
- ret = crb_request_locality(chip, 0);
- if (ret)
- return ret;
-
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
- if (ret)
- goto release;
-
- if (val == 0x414D4400U /* AMD */)
- chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
-
-release:
- crb_relinquish_locality(chip, 0);
-
- return ret;
-}
-
static const struct tpm_class_ops tpm_crb = {
.flags = TPM_OPS_AUTO_STARTUP,
.status = crb_status,
@@ -827,9 +805,12 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;
- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+ /* A quirk for https://www.amd.com/en/support/kb/faq/pa-410 */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ priv->sm != ACPI_TPM2_COMMAND_BUFFER_WITH_PLUTON) {
+ dev_info(dev, "Disabling hwrng\n");
+ chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
+ }
rc = tpm_chip_register(chip);
--
2.39.2
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and returns back the
command code. This makes the TPM fail probe.
As this isn't crucial for anything but AMD fTPM and AMD fTPM works, check
the chip vendor and if it's not AMD don't run the checks.
Cc: stable(a)vger.kernel.org
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Reported-by: Patrick Steinhardt <ps(a)pks.im>
Reported-by: Ronan Pigott <ronan(a)rjp.ie>
Reported-by: Raymond Jay Golo <rjgolo(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
---
v1->v2:
* Check x86 vendor for AMD
---
drivers/char/tpm/tpm_crb.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 9eb1a18590123..7faf670201ccc 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -465,8 +465,12 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
static int crb_check_flags(struct tpm_chip *chip)
{
+ int ret = 0;
+#ifdef CONFIG_X86
u32 val;
- int ret;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return ret;
ret = crb_request_locality(chip, 0);
if (ret)
@@ -481,6 +485,7 @@ static int crb_check_flags(struct tpm_chip *chip)
release:
crb_relinquish_locality(chip, 0);
+#endif
return ret;
}
--
2.34.1
The vendor check introduced by commit 554b841d4703 ("tpm: Disable RNG for
all AMD fTPMs") doesn't work properly on a number of Intel fTPMs. On the
reported systems the TPM doesn't reply at bootup and returns back the
command code. This makes the TPM fail probe.
Since only Microsoft Pluton is the only known combination of AMD CPU and
fTPM from other vendor, disable hwrng otherwise. In order to make sysadmin
aware of this, print also info message to the klog.
Cc: stable(a)vger.kernel.org # v6.5+
Fixes: 554b841d4703 ("tpm: Disable RNG for all AMD fTPMs")
Reported-by: Todd Brandt <todd.e.brandt(a)intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217804
Signed-off-by: Mario Limonciello <mario.limonciello(a)amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko(a)kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko(a)kernel.org>
---
drivers/char/tpm/tpm_crb.c | 31 ++++++-------------------------
1 file changed, 6 insertions(+), 25 deletions(-)
diff --git a/drivers/char/tpm/tpm_crb.c b/drivers/char/tpm/tpm_crb.c
index 65ff4d2fbe8d..28448bfd4062 100644
--- a/drivers/char/tpm/tpm_crb.c
+++ b/drivers/char/tpm/tpm_crb.c
@@ -463,28 +463,6 @@ static bool crb_req_canceled(struct tpm_chip *chip, u8 status)
return (cancel & CRB_CANCEL_INVOKE) == CRB_CANCEL_INVOKE;
}
-static int crb_check_flags(struct tpm_chip *chip)
-{
- u32 val;
- int ret;
-
- ret = crb_request_locality(chip, 0);
- if (ret)
- return ret;
-
- ret = tpm2_get_tpm_pt(chip, TPM2_PT_MANUFACTURER, &val, NULL);
- if (ret)
- goto release;
-
- if (val == 0x414D4400U /* AMD */)
- chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
-
-release:
- crb_relinquish_locality(chip, 0);
-
- return ret;
-}
-
static const struct tpm_class_ops tpm_crb = {
.flags = TPM_OPS_AUTO_STARTUP,
.status = crb_status,
@@ -827,9 +805,12 @@ static int crb_acpi_add(struct acpi_device *device)
if (rc)
goto out;
- rc = crb_check_flags(chip);
- if (rc)
- goto out;
+ /* A quirk for https://www.amd.com/en/support/kb/faq/pa-410 */
+ if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
+ priv->sm != ACPI_TPM2_COMMAND_BUFFER_WITH_PLUTON) {
+ dev_info(dev, "Disabling hwrng\n");
+ chip->flags |= TPM_CHIP_FLAG_HWRNG_DISABLED;
+ }
rc = tpm_chip_register(chip);
--
2.39.2
As made mention of in commit 099303e9a9bd ("drm/amd/display: eDP
intermittent black screen during PnP"), we need to turn off the
display's backlight before powering off an eDP display. Not doing so
will result in undefined behaviour according to the eDP spec. So, set
DCN301's edp_backlight_control() function pointer to
dce110_edp_backlight_control().
Cc: stable(a)vger.kernel.org
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2765
Fixes: 9c75891feef0 ("drm/amd/display: rework recent update PHY state commit")
Suggested-by: Swapnil Patel <swapnil.patel(a)amd.com>
Signed-off-by: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
---
drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
index 257df8660b4c..61205cdbe2d5 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn301/dcn301_init.c
@@ -75,6 +75,7 @@ static const struct hw_sequencer_funcs dcn301_funcs = {
.get_hw_state = dcn10_get_hw_state,
.clear_status_bits = dcn10_clear_status_bits,
.wait_for_mpcc_disconnect = dcn10_wait_for_mpcc_disconnect,
+ .edp_backlight_control = dce110_edp_backlight_control,
.edp_power_control = dce110_edp_power_control,
.edp_wait_for_hpd_ready = dce110_edp_wait_for_hpd_ready,
.set_cursor_position = dcn10_set_cursor_position,
--
2.41.0
check_clock doesn't account for vfe_lite which means that vfe_lite will
never get validated by this routine. Add the clock name to the expected set
to remediate.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
drivers/media/platform/qcom/camss/camss-vfe.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe.c b/drivers/media/platform/qcom/camss/camss-vfe.c
index 938f373bcd1fd..b021f81cef123 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe.c
@@ -535,7 +535,8 @@ static int vfe_check_clock_rates(struct vfe_device *vfe)
struct camss_clock *clock = &vfe->clock[i];
if (!strcmp(clock->name, "vfe0") ||
- !strcmp(clock->name, "vfe1")) {
+ !strcmp(clock->name, "vfe1") ||
+ !strcmp(clock->name, "vfe_lite")) {
u64 min_rate = 0;
unsigned long rate;
--
2.41.0
There are two problems with the current vfe_disable_output() routine.
Firstly we rightly use a spinlock to protect output->gen2.active_num
everywhere except for in the IDLE timeout path of vfe_disable_output().
Even if that is not racy "in practice" somehow it is by happenstance not
by design.
Secondly we do not get consistent behaviour from this routine. On
sc8280xp 50% of the time I get "VFE idle timeout - resetting". In this
case the subsequent capture will succeed. The other 50% of the time, we
don't hit the idle timeout, never do the VFE reset and subsequent
captures stall indefinitely.
Rewrite the vfe_disable_output() routine to
- Quiesce write masters with vfe_wm_stop()
- Set active_num = 0
remembering to hold the spinlock when we do so followed by
- Reset the VFE
Testing on sc8280xp and sdm845 shows this to be a valid fix.
Fixes: 7319cdf189bb ("media: camss: Add support for VFE hardware version Titan 170")
Cc: stable(a)vger.kernel.org
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
.../media/platform/qcom/camss/camss-vfe-170.c | 19 +++----------------
1 file changed, 3 insertions(+), 16 deletions(-)
diff --git a/drivers/media/platform/qcom/camss/camss-vfe-170.c b/drivers/media/platform/qcom/camss/camss-vfe-170.c
index 02494c89da91c..ae9137633c301 100644
--- a/drivers/media/platform/qcom/camss/camss-vfe-170.c
+++ b/drivers/media/platform/qcom/camss/camss-vfe-170.c
@@ -500,28 +500,15 @@ static int vfe_disable_output(struct vfe_line *line)
struct vfe_output *output = &line->output;
unsigned long flags;
unsigned int i;
- bool done;
- int timeout = 0;
-
- do {
- spin_lock_irqsave(&vfe->output_lock, flags);
- done = !output->gen2.active_num;
- spin_unlock_irqrestore(&vfe->output_lock, flags);
- usleep_range(10000, 20000);
-
- if (timeout++ == 100) {
- dev_err(vfe->camss->dev, "VFE idle timeout - resetting\n");
- vfe_reset(vfe);
- output->gen2.active_num = 0;
- return 0;
- }
- } while (!done);
spin_lock_irqsave(&vfe->output_lock, flags);
for (i = 0; i < output->wm_num; i++)
vfe_wm_stop(vfe, output->wm_idx[i]);
+ output->gen2.active_num = 0;
spin_unlock_irqrestore(&vfe->output_lock, flags);
+ vfe_reset(vfe);
+
return 0;
}
--
2.41.0