The patch titled
Subject: mm: memory-failure: fix potential unexpected return value from unpoison_memory()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-memory-failure-fix-potential-unexpected-return-value-from-unpoison_memory.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm: memory-failure: fix potential unexpected return value from unpoison_memory()
Date: Thu, 27 Jul 2023 19:56:41 +0800
If unpoison_memory() fails to clear page hwpoisoned flag, return value ret
is expected to be -EBUSY. But when get_hwpoison_page() returns 1 and
fails to clear page hwpoisoned flag due to races, return value will be
unexpected 1 leading to users being confused. And there's a code smell
that the variable "ret" is used not only to save the return value of
unpoison_memory(), but also the return value from get_hwpoison_page().
Make a further cleanup by using another auto-variable solely to save the
return value of get_hwpoison_page() as suggested by Naoya.
Link: https://lkml.kernel.org/r/20230727115643.639741-3-linmiaohe@huawei.com
Fixes: bf181c582588 ("mm/hwpoison: fix unpoison_memory()")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/memory-failure.c | 19 +++++++++----------
1 file changed, 9 insertions(+), 10 deletions(-)
--- a/mm/memory-failure.c~mm-memory-failure-fix-potential-unexpected-return-value-from-unpoison_memory
+++ a/mm/memory-failure.c
@@ -2466,7 +2466,7 @@ int unpoison_memory(unsigned long pfn)
{
struct folio *folio;
struct page *p;
- int ret = -EBUSY;
+ int ret = -EBUSY, ghp;
unsigned long count = 1;
bool huge = false;
static DEFINE_RATELIMIT_STATE(unpoison_rs, DEFAULT_RATELIMIT_INTERVAL,
@@ -2514,29 +2514,28 @@ int unpoison_memory(unsigned long pfn)
if (folio_test_slab(folio) || PageTable(&folio->page) || folio_test_reserved(folio))
goto unlock_mutex;
- ret = get_hwpoison_page(p, MF_UNPOISON);
- if (!ret) {
+ ghp = get_hwpoison_page(p, MF_UNPOISON);
+ if (!ghp) {
if (PageHuge(p)) {
huge = true;
count = folio_free_raw_hwp(folio, false);
- if (count == 0) {
- ret = -EBUSY;
+ if (count == 0)
goto unlock_mutex;
- }
}
ret = folio_test_clear_hwpoison(folio) ? 0 : -EBUSY;
- } else if (ret < 0) {
- if (ret == -EHWPOISON) {
+ } else if (ghp < 0) {
+ if (ghp == -EHWPOISON) {
ret = put_page_back_buddy(p) ? 0 : -EBUSY;
- } else
+ } else {
+ ret = ghp;
unpoison_pr_info("Unpoison: failed to grab page %#lx\n",
pfn, &unpoison_rs);
+ }
} else {
if (PageHuge(p)) {
huge = true;
count = folio_free_raw_hwp(folio, false);
if (count == 0) {
- ret = -EBUSY;
folio_put(folio);
goto unlock_mutex;
}
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-swapfile-fix-wrong-swap-entry-type-for-hwpoisoned-swapcache-page.patch
mm-memory-failure-fix-potential-unexpected-return-value-from-unpoison_memory.patch
mm-memory-failure-avoid-false-hwpoison-page-mapped-error-info.patch
mm-memory-failure-add-pageoffline-check.patch
mm-mm_initc-update-obsolete-comment-in-get_pfn_range_for_nid.patch
mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page.patch
mm-memory-failure-fix-potential-page-refcnt-leak-in-memory_failure.patch
mm-memory-failure-remove-unneeded-page-state-check-in-shake_page.patch
memory-tier-use-helper-function-destroy_memory_type.patch
mm-memory-failure-remove-unneeded-inline-annotation.patch
mm-mm_initc-remove-obsolete-macro-hash_small.patch
mm-page_alloc-avoid-false-page-outside-zone-error-info.patch
memory-tier-rename-destroy_memory_type-to-put_memory_type.patch
mm-remove-obsolete-comment-above-struct-per_cpu_pages.patch
mm-memcg-minor-cleanup-for-mem_cgroup_id_max.patch
mm-memory-failure-remove-unneeded-pagehuge-check.patch
mm-memory-failure-ensure-moving-hwpoison-flag-to-the-raw-error-pages.patch
mm-memory-failure-dont-account-hwpoison_filter-filtered-pages.patch
mm-memory-failure-use-local-variable-huge-to-check-hugetlb-page.patch
mm-memory-failure-remove-unneeded-header-files.patch
mm-memory-failure-minor-cleanup-for-comments-and-codestyle.patch
mm-memory-failure-fetch-compound-head-after-extra-page-refcnt-is-held.patch
mm-memory-failure-fix-race-window-when-trying-to-get-hugetlb-folio.patch
mm-huge_memory-use-rmap_none-when-calling-page_add_anon_rmap.patch
mm-memcg-fix-obsolete-comment-above-mem_cgroup_max_reclaim_loops.patch
mm-memcg-minor-cleanup-for-mc_handle_present_pte.patch
memory-tier-use-helper-macro-__attr_rw.patch
mm-fix-obsolete-function-name-above-debug_pagealloc_enabled_static.patch
mm-mprotect-fix-obsolete-function-name-in-change_pte_range.patch
mm-memcg-fix-obsolete-function-name-in-mem_cgroup_protection.patch
The patch titled
Subject: mm/swapfile: fix wrong swap entry type for hwpoisoned swapcache page
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-swapfile-fix-wrong-swap-entry-type-for-hwpoisoned-swapcache-page.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Miaohe Lin <linmiaohe(a)huawei.com>
Subject: mm/swapfile: fix wrong swap entry type for hwpoisoned swapcache page
Date: Thu, 27 Jul 2023 19:56:40 +0800
Patch series "A few fixup patches for mm", v2.
This series contains a few fixup patches to fix potential unexpected
return value, fix wrong swap entry type for hwpoisoned swapcache page and
so on. More details can be found in the respective changelogs.
This patch (of 4):
Hwpoisoned dirty swap cache page is kept in the swap cache and there's
simple interception code in do_swap_page() to catch it. But when trying
to swapoff, unuse_pte() will wrongly install a general sense of "future
accesses are invalid" swap entry for hwpoisoned swap cache page due to
unaware of such type of page. The user will receive SIGBUS signal without
expected BUS_MCEERR_AR payload. BTW, typo 'hwposioned' is fixed.
Link: https://lkml.kernel.org/r/20230727115643.639741-1-linmiaohe@huawei.com
Link: https://lkml.kernel.org/r/20230727115643.639741-2-linmiaohe@huawei.com
Fixes: 6b970599e807 ("mm: hwpoison: support recovery from ksm_might_need_to_copy()")
Signed-off-by: Miaohe Lin <linmiaohe(a)huawei.com>
Cc: Kefeng Wang <wangkefeng.wang(a)huawei.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi(a)nec.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/ksm.c | 2 ++
mm/swapfile.c | 8 ++++----
2 files changed, 6 insertions(+), 4 deletions(-)
--- a/mm/ksm.c~mm-swapfile-fix-wrong-swap-entry-type-for-hwpoisoned-swapcache-page
+++ a/mm/ksm.c
@@ -2784,6 +2784,8 @@ struct page *ksm_might_need_to_copy(stru
anon_vma->root == vma->anon_vma->root) {
return page; /* still no need to copy it */
}
+ if (PageHWPoison(page))
+ return ERR_PTR(-EHWPOISON);
if (!PageUptodate(page))
return page; /* let do_swap_page report the error */
--- a/mm/swapfile.c~mm-swapfile-fix-wrong-swap-entry-type-for-hwpoisoned-swapcache-page
+++ a/mm/swapfile.c
@@ -1746,7 +1746,7 @@ static int unuse_pte(struct vm_area_stru
struct page *swapcache;
spinlock_t *ptl;
pte_t *pte, new_pte, old_pte;
- bool hwposioned = false;
+ bool hwpoisoned = PageHWPoison(page);
int ret = 1;
swapcache = page;
@@ -1754,7 +1754,7 @@ static int unuse_pte(struct vm_area_stru
if (unlikely(!page))
return -ENOMEM;
else if (unlikely(PTR_ERR(page) == -EHWPOISON))
- hwposioned = true;
+ hwpoisoned = true;
pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
if (unlikely(!pte || !pte_same_as_swp(ptep_get(pte),
@@ -1765,11 +1765,11 @@ static int unuse_pte(struct vm_area_stru
old_pte = ptep_get(pte);
- if (unlikely(hwposioned || !PageUptodate(page))) {
+ if (unlikely(hwpoisoned || !PageUptodate(page))) {
swp_entry_t swp_entry;
dec_mm_counter(vma->vm_mm, MM_SWAPENTS);
- if (hwposioned) {
+ if (hwpoisoned) {
swp_entry = make_hwpoison_entry(swapcache);
page = swapcache;
} else {
_
Patches currently in -mm which might be from linmiaohe(a)huawei.com are
mm-swapfile-fix-wrong-swap-entry-type-for-hwpoisoned-swapcache-page.patch
mm-memory-failure-fix-potential-unexpected-return-value-from-unpoison_memory.patch
mm-memory-failure-avoid-false-hwpoison-page-mapped-error-info.patch
mm-memory-failure-add-pageoffline-check.patch
mm-mm_initc-update-obsolete-comment-in-get_pfn_range_for_nid.patch
mm-memory-failure-fix-unexpected-return-value-in-soft_offline_page.patch
mm-memory-failure-fix-potential-page-refcnt-leak-in-memory_failure.patch
mm-memory-failure-remove-unneeded-page-state-check-in-shake_page.patch
memory-tier-use-helper-function-destroy_memory_type.patch
mm-memory-failure-remove-unneeded-inline-annotation.patch
mm-mm_initc-remove-obsolete-macro-hash_small.patch
mm-page_alloc-avoid-false-page-outside-zone-error-info.patch
memory-tier-rename-destroy_memory_type-to-put_memory_type.patch
mm-remove-obsolete-comment-above-struct-per_cpu_pages.patch
mm-memcg-minor-cleanup-for-mem_cgroup_id_max.patch
mm-memory-failure-remove-unneeded-pagehuge-check.patch
mm-memory-failure-ensure-moving-hwpoison-flag-to-the-raw-error-pages.patch
mm-memory-failure-dont-account-hwpoison_filter-filtered-pages.patch
mm-memory-failure-use-local-variable-huge-to-check-hugetlb-page.patch
mm-memory-failure-remove-unneeded-header-files.patch
mm-memory-failure-minor-cleanup-for-comments-and-codestyle.patch
mm-memory-failure-fetch-compound-head-after-extra-page-refcnt-is-held.patch
mm-memory-failure-fix-race-window-when-trying-to-get-hugetlb-folio.patch
mm-huge_memory-use-rmap_none-when-calling-page_add_anon_rmap.patch
mm-memcg-fix-obsolete-comment-above-mem_cgroup_max_reclaim_loops.patch
mm-memcg-minor-cleanup-for-mc_handle_present_pte.patch
memory-tier-use-helper-macro-__attr_rw.patch
mm-fix-obsolete-function-name-above-debug_pagealloc_enabled_static.patch
mm-mprotect-fix-obsolete-function-name-in-change_pte_range.patch
mm-memcg-fix-obsolete-function-name-in-mem_cgroup_protection.patch
The patch titled
Subject: radix tree test suite: fix incorrect allocation size for pthreads
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
radix-tree-test-suite-fix-incorrect-allocation-size-for-pthreads.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Colin Ian King <colin.i.king(a)gmail.com>
Subject: radix tree test suite: fix incorrect allocation size for pthreads
Date: Thu, 27 Jul 2023 17:09:30 +0100
Currently the pthread allocation for each array item is based on the size
of a pthread_t pointer and should be the size of the pthread_t structure,
so the allocation is under-allocating the correct size. Fix this by using
the size of each element in the pthreads array.
Static analysis cppcheck reported:
tools/testing/radix-tree/regression1.c:180:2: warning: Size of pointer
'threads' used instead of size of its data. [pointerSize]
Link: https://lkml.kernel.org/r/20230727160930.632674-1-colin.i.king@gmail.com
Fixes: 1366c37ed84b ("radix tree test harness")
Signed-off-by: Colin Ian King <colin.i.king(a)gmail.com>
Cc: Konstantin Khlebnikov <koct9i(a)gmail.com>
Cc: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/radix-tree/regression1.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/radix-tree/regression1.c~radix-tree-test-suite-fix-incorrect-allocation-size-for-pthreads
+++ a/tools/testing/radix-tree/regression1.c
@@ -177,7 +177,7 @@ void regression1_test(void)
nr_threads = 2;
pthread_barrier_init(&worker_barrier, NULL, nr_threads);
- threads = malloc(nr_threads * sizeof(pthread_t *));
+ threads = malloc(nr_threads * sizeof(*threads));
for (i = 0; i < nr_threads; i++) {
arg = i;
_
Patches currently in -mm which might be from colin.i.king(a)gmail.com are
radix-tree-test-suite-fix-incorrect-allocation-size-for-pthreads.patch
fs-hfsplus-make-extend-error-rate-limited.patch
This series fixes an issue which David Spickett found where if we change
the SVE VL while SME is in use we can end up attempting to save state to
an unallocated buffer and adds testing coverage for that plus a bit more
coverage of VL changes, just for paranioa.
Signed-off-by: Mark Brown <broonie(a)kernel.org>
---
Changes in v2:
- Always reallocate the SVE state.
- Rebase onto v6.5-rc2.
- Link to v1: https://lore.kernel.org/r/20230713-arm64-fix-sve-sme-vl-change-v1-0-129dd86…
---
Mark Brown (3):
arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
kselftest/arm64: Add a test case for SVE VL changes with SME active
kselftest/arm64: Validate that changing one VL type does not affect another
arch/arm64/kernel/fpsimd.c | 33 +++++--
tools/testing/selftests/arm64/fp/vec-syscfg.c | 127 +++++++++++++++++++++++++-
2 files changed, 148 insertions(+), 12 deletions(-)
---
base-commit: 06785562d1b99ff6dc1cd0af54be5e3ff999dc02
change-id: 20230713-arm64-fix-sve-sme-vl-change-60eb1fa6a707
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x a5a5990c099dd354e05e89ee77cd2dbf6655d4a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072146-grit-winking-6e88@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
a5a5990c099d ("selftests: mptcp: sockopt: use 'iptables-legacy' if available")
5f17f8e315ad ("selftests: mptcp: declare var as local")
de2392028a19 ("selftests: mptcp: clearly declare global ns vars")
787eb1e4df93 ("selftests: mptcp: uniform 'rndh' variable")
b71dd705179c ("selftests: mptcp: removed defined but unused vars")
b4e0df4cafe1 ("selftests: mptcp: run mptcp_inq from a clean netns")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a5990c099dd354e05e89ee77cd2dbf6655d4a1 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Tue, 4 Jul 2023 22:44:36 +0200
Subject: [PATCH] selftests: mptcp: sockopt: use 'iptables-legacy' if available
IPTables commands using 'iptables-nft' fail on old kernels, at least
on v5.15 because it doesn't see the default IPTables chains:
$ iptables -L
iptables/1.8.2 Failed to initialize nft: Protocol not supported
As a first step before switching to NFTables, we can use iptables-legacy
if available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: dc65fe82fb07 ("selftests: mptcp: add packet mark test case")
Cc: stable(a)vger.kernel.org
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
index f295a371ff14..c21bfd7f0c01 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -12,6 +12,8 @@ ksft_skip=4
timeout_poll=30
timeout_test=$((timeout_poll * 2 + 1))
mptcp_connect=""
+iptables="iptables"
+ip6tables="ip6tables"
sec=$(date +%s)
rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
@@ -25,7 +27,7 @@ add_mark_rules()
local m=$2
local t
- for t in iptables ip6tables; do
+ for t in ${iptables} ${ip6tables}; do
# just to debug: check we have multiple subflows connection requests
ip netns exec $ns $t -A OUTPUT -p tcp --syn -m mark --mark $m -j ACCEPT
@@ -95,14 +97,14 @@ if [ $? -ne 0 ];then
exit $ksft_skip
fi
-iptables -V > /dev/null 2>&1
-if [ $? -ne 0 ];then
+# Use the legacy version if available to support old kernel versions
+if iptables-legacy -V &> /dev/null; then
+ iptables="iptables-legacy"
+ ip6tables="ip6tables-legacy"
+elif ! iptables -V &> /dev/null; then
echo "SKIP: Could not run all tests without iptables tool"
exit $ksft_skip
-fi
-
-ip6tables -V > /dev/null 2>&1
-if [ $? -ne 0 ];then
+elif ! ip6tables -V &> /dev/null; then
echo "SKIP: Could not run all tests without ip6tables tool"
exit $ksft_skip
fi
@@ -112,10 +114,10 @@ check_mark()
local ns=$1
local af=$2
- local tables=iptables
+ local tables=${iptables}
if [ $af -eq 6 ];then
- tables=ip6tables
+ tables=${ip6tables}
fi
local counters values
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 3fffa15bfef48b0ad6424779c03e68ae8ace5acb
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072102-finicky-everyday-cce4@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
3fffa15bfef4 ("mptcp: ensure subflow is unhashed before cleaning the backlog")
57fc0f1ceaa4 ("mptcp: ensure listener is unhashed before updating the sk status")
2a6a870e44dd ("mptcp: stops worker on unaccepted sockets at listener close")
0a3f4f1f9c27 ("mptcp: fix UaF in listener shutdown")
b6985b9b8295 ("mptcp: use the workqueue to destroy unaccepted sockets")
c558246ee73e ("mptcp: add statistics for mptcp socket in use")
cfdcfeed6449 ("mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen()")
fec3adfd754c ("mptcp: fix lockdep false positive")
7d803344fdc3 ("mptcp: fix deadlock in fastopen error path")
f8c9dfbd875b ("mptcp: add pm listener events")
f2bb566f5c97 ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 3fffa15bfef48b0ad6424779c03e68ae8ace5acb Mon Sep 17 00:00:00 2001
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Tue, 4 Jul 2023 22:44:33 +0200
Subject: [PATCH] mptcp: ensure subflow is unhashed before cleaning the backlog
While tacking care of the mptcp-level listener I unintentionally
moved the subflow level unhash after the subflow listener backlog
cleanup.
That could cause some nasty race and makes the code harder to read.
Address the issue restoring the proper order of operations.
Fixes: 57fc0f1ceaa4 ("mptcp: ensure listener is unhashed before updating the sk status")
Cc: stable(a)vger.kernel.org
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Reviewed-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index e892673deb73..489a3defdde5 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2909,10 +2909,10 @@ static void mptcp_check_listen_stop(struct sock *sk)
return;
lock_sock_nested(ssk, SINGLE_DEPTH_NESTING);
+ tcp_set_state(ssk, TCP_CLOSE);
mptcp_subflow_queue_clean(sk, ssk);
inet_csk_listen_stop(ssk);
mptcp_event_pm_listener(ssk, MPTCP_EVENT_LISTENER_CLOSED);
- tcp_set_state(ssk, TCP_CLOSE);
release_sock(ssk);
}
Syzkaller found another small issue, a race with
iommufd_object_destroy_user() and IOMMUFD_DESTROY.
While researching that I noticed an error unwind mistake by inspection.
Yi and Nicolin stumbled on the last one running the test suite sometimes.
Jason Gunthorpe (3):
iommufd/selftest: Do not try to destroy an access once it is attached
iommufd: IOMMUFD_DESTROY should not increase the refcount
iommufd: Set end correctly when doing batch carry
drivers/iommu/iommufd/device.c | 12 +---
drivers/iommu/iommufd/iommufd_private.h | 15 ++++-
drivers/iommu/iommufd/main.c | 78 +++++++++++++++++++------
drivers/iommu/iommufd/pages.c | 2 +-
drivers/iommu/iommufd/selftest.c | 9 +--
5 files changed, 81 insertions(+), 35 deletions(-)
base-commit: 6eaae198076080886b9e7d57f4ae06fa782f90ef
--
2.41.0
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x e51df4f81b02bcdd828a04de7c1eb6a92988b61e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072306-polyester-reproach-45e3@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
e51df4f81b02 ("ASoC: cs42l51: fix driver to properly autoload with automatic module loading")
4a4043456cb8 ("ASoC: cs*: use simple i2c probe function")
f8593e885400 ("ASoC: cs42l42: Handle system suspend")
5982b5a8ec7d ("ASoC: cs42l42: Change jack_detect_mutex to a lock of all IRQ handling")
8d06f797f844 ("ASoC: cs42l42: Report full jack status when plug is detected")
a319cb32e7cf ("ASoC: cs4265: Add a remove() function")
fdd535283779 ("ASoC: cs42l42: Report initial jack state")
c778c01d3e66 ("ASoC: cs42l42: Remove unused runtime_suspend/runtime_resume callbacks")
bfceb9c21601 ("Merge branch 'asoc-5.15' into asoc-5.16")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e51df4f81b02bcdd828a04de7c1eb6a92988b61e Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Thu, 13 Jul 2023 13:21:12 +0200
Subject: [PATCH] ASoC: cs42l51: fix driver to properly autoload with automatic
module loading
In commit 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20230713112112.778576-1-thomas.petazzoni@bootlin.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/codecs/cs42l51-i2c.c b/sound/soc/codecs/cs42l51-i2c.c
index b2106ff6a7cb..e7db7bcd0296 100644
--- a/sound/soc/codecs/cs42l51-i2c.c
+++ b/sound/soc/codecs/cs42l51-i2c.c
@@ -19,6 +19,12 @@ static struct i2c_device_id cs42l51_i2c_id[] = {
};
MODULE_DEVICE_TABLE(i2c, cs42l51_i2c_id);
+const struct of_device_id cs42l51_of_match[] = {
+ { .compatible = "cirrus,cs42l51", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, cs42l51_of_match);
+
static int cs42l51_i2c_probe(struct i2c_client *i2c)
{
struct regmap_config config;
diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c
index a67cd3ee84e0..a7079ae0ca09 100644
--- a/sound/soc/codecs/cs42l51.c
+++ b/sound/soc/codecs/cs42l51.c
@@ -823,13 +823,6 @@ int __maybe_unused cs42l51_resume(struct device *dev)
}
EXPORT_SYMBOL_GPL(cs42l51_resume);
-const struct of_device_id cs42l51_of_match[] = {
- { .compatible = "cirrus,cs42l51", },
- { }
-};
-MODULE_DEVICE_TABLE(of, cs42l51_of_match);
-EXPORT_SYMBOL_GPL(cs42l51_of_match);
-
MODULE_AUTHOR("Arnaud Patard <arnaud.patard(a)rtp-net.org>");
MODULE_DESCRIPTION("Cirrus Logic CS42L51 ALSA SoC Codec Driver");
MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs42l51.h b/sound/soc/codecs/cs42l51.h
index a79343e8a54e..125703ede113 100644
--- a/sound/soc/codecs/cs42l51.h
+++ b/sound/soc/codecs/cs42l51.h
@@ -16,7 +16,6 @@ int cs42l51_probe(struct device *dev, struct regmap *regmap);
void cs42l51_remove(struct device *dev);
int __maybe_unused cs42l51_suspend(struct device *dev);
int __maybe_unused cs42l51_resume(struct device *dev);
-extern const struct of_device_id cs42l51_of_match[];
#define CS42L51_CHIP_ID 0x1B
#define CS42L51_CHIP_REV_A 0x00
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x e51df4f81b02bcdd828a04de7c1eb6a92988b61e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072304-tripping-clean-8cd5@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
e51df4f81b02 ("ASoC: cs42l51: fix driver to properly autoload with automatic module loading")
4a4043456cb8 ("ASoC: cs*: use simple i2c probe function")
f8593e885400 ("ASoC: cs42l42: Handle system suspend")
5982b5a8ec7d ("ASoC: cs42l42: Change jack_detect_mutex to a lock of all IRQ handling")
8d06f797f844 ("ASoC: cs42l42: Report full jack status when plug is detected")
a319cb32e7cf ("ASoC: cs4265: Add a remove() function")
fdd535283779 ("ASoC: cs42l42: Report initial jack state")
c778c01d3e66 ("ASoC: cs42l42: Remove unused runtime_suspend/runtime_resume callbacks")
bfceb9c21601 ("Merge branch 'asoc-5.15' into asoc-5.16")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e51df4f81b02bcdd828a04de7c1eb6a92988b61e Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Thu, 13 Jul 2023 13:21:12 +0200
Subject: [PATCH] ASoC: cs42l51: fix driver to properly autoload with automatic
module loading
In commit 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20230713112112.778576-1-thomas.petazzoni@bootlin.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/codecs/cs42l51-i2c.c b/sound/soc/codecs/cs42l51-i2c.c
index b2106ff6a7cb..e7db7bcd0296 100644
--- a/sound/soc/codecs/cs42l51-i2c.c
+++ b/sound/soc/codecs/cs42l51-i2c.c
@@ -19,6 +19,12 @@ static struct i2c_device_id cs42l51_i2c_id[] = {
};
MODULE_DEVICE_TABLE(i2c, cs42l51_i2c_id);
+const struct of_device_id cs42l51_of_match[] = {
+ { .compatible = "cirrus,cs42l51", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, cs42l51_of_match);
+
static int cs42l51_i2c_probe(struct i2c_client *i2c)
{
struct regmap_config config;
diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c
index a67cd3ee84e0..a7079ae0ca09 100644
--- a/sound/soc/codecs/cs42l51.c
+++ b/sound/soc/codecs/cs42l51.c
@@ -823,13 +823,6 @@ int __maybe_unused cs42l51_resume(struct device *dev)
}
EXPORT_SYMBOL_GPL(cs42l51_resume);
-const struct of_device_id cs42l51_of_match[] = {
- { .compatible = "cirrus,cs42l51", },
- { }
-};
-MODULE_DEVICE_TABLE(of, cs42l51_of_match);
-EXPORT_SYMBOL_GPL(cs42l51_of_match);
-
MODULE_AUTHOR("Arnaud Patard <arnaud.patard(a)rtp-net.org>");
MODULE_DESCRIPTION("Cirrus Logic CS42L51 ALSA SoC Codec Driver");
MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs42l51.h b/sound/soc/codecs/cs42l51.h
index a79343e8a54e..125703ede113 100644
--- a/sound/soc/codecs/cs42l51.h
+++ b/sound/soc/codecs/cs42l51.h
@@ -16,7 +16,6 @@ int cs42l51_probe(struct device *dev, struct regmap *regmap);
void cs42l51_remove(struct device *dev);
int __maybe_unused cs42l51_suspend(struct device *dev);
int __maybe_unused cs42l51_resume(struct device *dev);
-extern const struct of_device_id cs42l51_of_match[];
#define CS42L51_CHIP_ID 0x1B
#define CS42L51_CHIP_REV_A 0x00
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x e51df4f81b02bcdd828a04de7c1eb6a92988b61e
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072302-prancing-correct-152e@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
e51df4f81b02 ("ASoC: cs42l51: fix driver to properly autoload with automatic module loading")
4a4043456cb8 ("ASoC: cs*: use simple i2c probe function")
f8593e885400 ("ASoC: cs42l42: Handle system suspend")
5982b5a8ec7d ("ASoC: cs42l42: Change jack_detect_mutex to a lock of all IRQ handling")
8d06f797f844 ("ASoC: cs42l42: Report full jack status when plug is detected")
a319cb32e7cf ("ASoC: cs4265: Add a remove() function")
fdd535283779 ("ASoC: cs42l42: Report initial jack state")
c778c01d3e66 ("ASoC: cs42l42: Remove unused runtime_suspend/runtime_resume callbacks")
bfceb9c21601 ("Merge branch 'asoc-5.15' into asoc-5.16")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From e51df4f81b02bcdd828a04de7c1eb6a92988b61e Mon Sep 17 00:00:00 2001
From: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Date: Thu, 13 Jul 2023 13:21:12 +0200
Subject: [PATCH] ASoC: cs42l51: fix driver to properly autoload with automatic
module loading
In commit 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table
pointer"), 9 years ago, some random guy fixed the cs42l51 after it was
split into a core part and an I2C part to properly match based on a
Device Tree compatible string.
However, the fix in this commit is wrong: the MODULE_DEVICE_TABLE(of,
....) is in the core part of the driver, not the I2C part. Therefore,
automatic module loading based on module.alias, based on matching with
the DT compatible string, loads the core part of the driver, but not
the I2C part. And threfore, the i2c_driver is not registered, and the
codec is not known to the system, nor matched with a DT node with the
corresponding compatible string.
In order to fix that, we move the MODULE_DEVICE_TABLE(of, ...) into
the I2C part of the driver. The cs42l51_of_match[] array is also moved
as well, as it is not possible to have this definition in one file,
and the MODULE_DEVICE_TABLE(of, ...) invocation in another file, due
to how MODULE_DEVICE_TABLE works.
Thanks to this commit, the I2C part of the driver now properly
autoloads, and thanks to its dependency on the core part, the core
part gets autoloaded as well, resulting in a functional sound card
without having to manually load kernel modules.
Fixes: 2cb1e0259f50 ("ASoC: cs42l51: re-hook of_match_table pointer")
Cc: stable(a)vger.kernel.org
Signed-off-by: Thomas Petazzoni <thomas.petazzoni(a)bootlin.com>
Link: https://lore.kernel.org/r/20230713112112.778576-1-thomas.petazzoni@bootlin.…
Signed-off-by: Mark Brown <broonie(a)kernel.org>
diff --git a/sound/soc/codecs/cs42l51-i2c.c b/sound/soc/codecs/cs42l51-i2c.c
index b2106ff6a7cb..e7db7bcd0296 100644
--- a/sound/soc/codecs/cs42l51-i2c.c
+++ b/sound/soc/codecs/cs42l51-i2c.c
@@ -19,6 +19,12 @@ static struct i2c_device_id cs42l51_i2c_id[] = {
};
MODULE_DEVICE_TABLE(i2c, cs42l51_i2c_id);
+const struct of_device_id cs42l51_of_match[] = {
+ { .compatible = "cirrus,cs42l51", },
+ { }
+};
+MODULE_DEVICE_TABLE(of, cs42l51_of_match);
+
static int cs42l51_i2c_probe(struct i2c_client *i2c)
{
struct regmap_config config;
diff --git a/sound/soc/codecs/cs42l51.c b/sound/soc/codecs/cs42l51.c
index a67cd3ee84e0..a7079ae0ca09 100644
--- a/sound/soc/codecs/cs42l51.c
+++ b/sound/soc/codecs/cs42l51.c
@@ -823,13 +823,6 @@ int __maybe_unused cs42l51_resume(struct device *dev)
}
EXPORT_SYMBOL_GPL(cs42l51_resume);
-const struct of_device_id cs42l51_of_match[] = {
- { .compatible = "cirrus,cs42l51", },
- { }
-};
-MODULE_DEVICE_TABLE(of, cs42l51_of_match);
-EXPORT_SYMBOL_GPL(cs42l51_of_match);
-
MODULE_AUTHOR("Arnaud Patard <arnaud.patard(a)rtp-net.org>");
MODULE_DESCRIPTION("Cirrus Logic CS42L51 ALSA SoC Codec Driver");
MODULE_LICENSE("GPL");
diff --git a/sound/soc/codecs/cs42l51.h b/sound/soc/codecs/cs42l51.h
index a79343e8a54e..125703ede113 100644
--- a/sound/soc/codecs/cs42l51.h
+++ b/sound/soc/codecs/cs42l51.h
@@ -16,7 +16,6 @@ int cs42l51_probe(struct device *dev, struct regmap *regmap);
void cs42l51_remove(struct device *dev);
int __maybe_unused cs42l51_suspend(struct device *dev);
int __maybe_unused cs42l51_resume(struct device *dev);
-extern const struct of_device_id cs42l51_of_match[];
#define CS42L51_CHIP_ID 0x1B
#define CS42L51_CHIP_REV_A 0x00
Please backport commit
commit 9cf42bca30e98a1c6c9e8abf876940a551eaa3d1
Author: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue Aug 2 11:00:16 2022 +0200
efi: libstub: use EFI_LOADER_CODE region when moving the kernel in memory
to all active stable trees all the way back to v5.15. I will provide a
separate backport for v5.10, and possibly a [much] larger set of
backports for v5.4 for EFI boot support.
Thanks,
Ard.
Hi,
A series landed in 6.5 that helps system hangs when AMD dGPUs are used
in Intel hosts and the host advertises ASPM.
The whole series was CC stable 6.1+, but for some reason (unknown to me)
these two commits in the series didn't apply to 6.1 and 6.4.
188623076d0f ("drm/amd: Move helper for dynamic speed switch check out
of smu13")
e701156ccc6c ("drm/amd: Align SMU11 SMU_MSG_OverridePcieParameters
implementation with SMU13")
The rest of the commits came back properly, and I double checked these
were not in the queue and work when added into 6.4.6/6.1.41 so can you
please bring them in to complete the stable backport?
Thanks!
Hi,
This commit fixes a PHY hang that happens on AMD Phoenix when DPMS is
toggled on/off. Can you please take to 6.1.y?
2b02d746c181 ("drm/amd/display: Keep PHY active for dp config")
Thanks,
Hi,
The following two patches fix the notification path to the EC (by way of
SBIOS) for the PMF driver.
33c9ab5b493a0 platform/x86/amd/pmf: Notify OS power slider update
839e90e75e695 platform/x86/amd/pmf: reduce verbosity of
apmf_get_system_params
This change allows Phoenix based laptops to use software like
power-profiles-daemon to configure the APU for power saving or for
performance and on affected systems can significantly impact battery life.
As Phoenix systems are enabled from 6.1 onwards, can these both be
brought to 6.1.y and 6.4.y?
Thanks!
Hi Greg, Sasha,
Could you please cherry-pick:
29ad9a305943 ("netfilter: nf_tables: fix underflow in chain reference counter")
b8ae60de6fd3 ("netfilter: nf_tables: fix underflow in object reference counter"
into 6.1.y?
Other -stable kernels I have just audited do not need these updates,
since this fix have been already included in my recent -stable backports.
Thanks.
P.S: Moving forward, I will add the Cc: stable(a)vger.kernel.org tag to
patches as Greg suggested.
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x a5a5990c099dd354e05e89ee77cd2dbf6655d4a1
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072145-cardigan-stinger-9c9a@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
a5a5990c099d ("selftests: mptcp: sockopt: use 'iptables-legacy' if available")
5f17f8e315ad ("selftests: mptcp: declare var as local")
de2392028a19 ("selftests: mptcp: clearly declare global ns vars")
787eb1e4df93 ("selftests: mptcp: uniform 'rndh' variable")
b71dd705179c ("selftests: mptcp: removed defined but unused vars")
b4e0df4cafe1 ("selftests: mptcp: run mptcp_inq from a clean netns")
7e68d31020f1 ("selftests: mptcp: run mptcp_sockopt from a new netns")
259a834fadda ("selftests: mptcp: functional tests for the userspace PM type")
1e777bd818bd ("selftests: mptcp: join: clarify local/global vars")
ae7bd9ccecc3 ("selftests: mptcp: join: option to execute specific tests")
e59300ce3ff8 ("selftests: mptcp: join: reset failing links")
3afd0280e7d3 ("selftests: mptcp: join: define tests groups once")
3c082695e78b ("selftests: mptcp: drop msg argument of chk_csum_nr")
69c6ce7b6eca ("selftests: mptcp: add implicit endpoint test case")
d045b9eb95a9 ("mptcp: introduce implicit endpoints")
6fa0174a7c86 ("mptcp: more careful RM_ADDR generation")
f98c2bca7b2b ("selftests: mptcp: Rename wait function")
826d7bdca833 ("selftests: mptcp: join: allow running -cCi")
7d9bf018f907 ("selftests: mptcp: update output info of chk_rm_nr")
26516e10c433 ("selftests: mptcp: add more arguments for chk_join_nr")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From a5a5990c099dd354e05e89ee77cd2dbf6655d4a1 Mon Sep 17 00:00:00 2001
From: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Date: Tue, 4 Jul 2023 22:44:36 +0200
Subject: [PATCH] selftests: mptcp: sockopt: use 'iptables-legacy' if available
IPTables commands using 'iptables-nft' fail on old kernels, at least
on v5.15 because it doesn't see the default IPTables chains:
$ iptables -L
iptables/1.8.2 Failed to initialize nft: Protocol not supported
As a first step before switching to NFTables, we can use iptables-legacy
if available.
Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368
Fixes: dc65fe82fb07 ("selftests: mptcp: add packet mark test case")
Cc: stable(a)vger.kernel.org
Acked-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
diff --git a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
index f295a371ff14..c21bfd7f0c01 100755
--- a/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
+++ b/tools/testing/selftests/net/mptcp/mptcp_sockopt.sh
@@ -12,6 +12,8 @@ ksft_skip=4
timeout_poll=30
timeout_test=$((timeout_poll * 2 + 1))
mptcp_connect=""
+iptables="iptables"
+ip6tables="ip6tables"
sec=$(date +%s)
rndh=$(printf %x $sec)-$(mktemp -u XXXXXX)
@@ -25,7 +27,7 @@ add_mark_rules()
local m=$2
local t
- for t in iptables ip6tables; do
+ for t in ${iptables} ${ip6tables}; do
# just to debug: check we have multiple subflows connection requests
ip netns exec $ns $t -A OUTPUT -p tcp --syn -m mark --mark $m -j ACCEPT
@@ -95,14 +97,14 @@ if [ $? -ne 0 ];then
exit $ksft_skip
fi
-iptables -V > /dev/null 2>&1
-if [ $? -ne 0 ];then
+# Use the legacy version if available to support old kernel versions
+if iptables-legacy -V &> /dev/null; then
+ iptables="iptables-legacy"
+ ip6tables="ip6tables-legacy"
+elif ! iptables -V &> /dev/null; then
echo "SKIP: Could not run all tests without iptables tool"
exit $ksft_skip
-fi
-
-ip6tables -V > /dev/null 2>&1
-if [ $? -ne 0 ];then
+elif ! ip6tables -V &> /dev/null; then
echo "SKIP: Could not run all tests without ip6tables tool"
exit $ksft_skip
fi
@@ -112,10 +114,10 @@ check_mark()
local ns=$1
local af=$2
- local tables=iptables
+ local tables=${iptables}
if [ $af -eq 6 ];then
- tables=ip6tables
+ tables=${ip6tables}
fi
local counters values
When a grant entry is still in use by the remote domain, Linux must put
it on a deferred list. Normally, this list is very short, because
the PV network and block protocols expect the backend to unmap the grant
first. However, Qubes OS's GUI protocol is subject to the constraints
of the X Window System, and as such winds up with the frontend unmapping
the window first. As a result, the list can grow very large, resulting
in a massive memory leak and eventual VM freeze.
To partially solve this problem, make the number of entries that the VM
will attempt to free at each iteration tunable. The default is still
10, but it can be overridden at compile-time (via Kconfig), boot-time
(via a kernel command-line option), or runtime (via sysfs).
This is Cc: stable because (when combined with appropriate userspace
changes) it fixes a severe performance and stability problem for Qubes
OS users.
Cc: stable(a)vger.kernel.org
Signed-off-by: Demi Marie Obenour <demi(a)invisiblethingslab.com>
---
drivers/xen/grant-table.c | 40 ++++++++++++++++++++++++++++-----------
1 file changed, 29 insertions(+), 11 deletions(-)
Changes since v2:
- use atomic_inc_return(x) and atomic_dec_return(x) instead of
atomic_add_return(1, x) and atomic_sub_return(1, x) respectively.
- move module_param macro closer to the definition of
free_per_iteration.
- add blank line between declarations and statements.
Changes since v1:
- drop setting default via Kconfig
diff --git a/drivers/xen/grant-table.c b/drivers/xen/grant-table.c
index e1ec725c2819d4d5dede063eb00d86a6d52944c0..f13c3b76ad1eb7110e2a2981e9fa4e504174e431 100644
--- a/drivers/xen/grant-table.c
+++ b/drivers/xen/grant-table.c
@@ -498,14 +498,21 @@ static LIST_HEAD(deferred_list);
static void gnttab_handle_deferred(struct timer_list *);
static DEFINE_TIMER(deferred_timer, gnttab_handle_deferred);
+static atomic64_t deferred_count;
+static atomic64_t leaked_count;
+static unsigned int free_per_iteration = 10;
+module_param(free_per_iteration, uint, 0600);
+
static void gnttab_handle_deferred(struct timer_list *unused)
{
- unsigned int nr = 10;
+ unsigned int nr = READ_ONCE(free_per_iteration);
+ const bool ignore_limit = nr == 0;
struct deferred_entry *first = NULL;
unsigned long flags;
+ size_t freed = 0;
spin_lock_irqsave(&gnttab_list_lock, flags);
- while (nr--) {
+ while ((ignore_limit || nr--) && !list_empty(&deferred_list)) {
struct deferred_entry *entry
= list_first_entry(&deferred_list,
struct deferred_entry, list);
@@ -515,10 +522,14 @@ static void gnttab_handle_deferred(struct timer_list *unused)
list_del(&entry->list);
spin_unlock_irqrestore(&gnttab_list_lock, flags);
if (_gnttab_end_foreign_access_ref(entry->ref)) {
+ uint64_t ret = atomic64_dec_return(&deferred_count);
+
put_free_entry(entry->ref);
- pr_debug("freeing g.e. %#x (pfn %#lx)\n",
- entry->ref, page_to_pfn(entry->page));
+ pr_debug("freeing g.e. %#x (pfn %#lx), %llu remaining\n",
+ entry->ref, page_to_pfn(entry->page),
+ (unsigned long long)ret);
put_page(entry->page);
+ freed++;
kfree(entry);
entry = NULL;
} else {
@@ -530,21 +541,22 @@ static void gnttab_handle_deferred(struct timer_list *unused)
spin_lock_irqsave(&gnttab_list_lock, flags);
if (entry)
list_add_tail(&entry->list, &deferred_list);
- else if (list_empty(&deferred_list))
- break;
}
- if (!list_empty(&deferred_list) && !timer_pending(&deferred_timer)) {
+ if (list_empty(&deferred_list))
+ WARN_ON(atomic64_read(&deferred_count));
+ else if (!timer_pending(&deferred_timer)) {
deferred_timer.expires = jiffies + HZ;
add_timer(&deferred_timer);
}
spin_unlock_irqrestore(&gnttab_list_lock, flags);
+ pr_debug("Freed %zu references", freed);
}
static void gnttab_add_deferred(grant_ref_t ref, struct page *page)
{
struct deferred_entry *entry;
gfp_t gfp = (in_atomic() || irqs_disabled()) ? GFP_ATOMIC : GFP_KERNEL;
- const char *what = KERN_WARNING "leaking";
+ uint64_t leaked, deferred;
entry = kmalloc(sizeof(*entry), gfp);
if (!page) {
@@ -567,10 +579,16 @@ static void gnttab_add_deferred(grant_ref_t ref, struct page *page)
add_timer(&deferred_timer);
}
spin_unlock_irqrestore(&gnttab_list_lock, flags);
- what = KERN_DEBUG "deferring";
+ deferred = atomic64_inc_return(&deferred_count);
+ leaked = atomic64_read(&leaked_count);
+ pr_debug("deferring g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+ ref, page ? page_to_pfn(page) : -1, deferred, leaked);
+ } else {
+ deferred = atomic64_read(&deferred_count);
+ leaked = atomic64_inc_return(&leaked_count);
+ pr_warn("leaking g.e. %#x (pfn %#lx) (total deferred %llu, total leaked %llu)\n",
+ ref, page ? page_to_pfn(page) : -1, deferred, leaked);
}
- printk("%s g.e. %#x (pfn %#lx)\n",
- what, ref, page ? page_to_pfn(page) : -1);
}
int gnttab_try_end_foreign_access(grant_ref_t ref)
--
Sincerely,
Demi Marie Obenour (she/her/hers)
Invisible Things Lab
Patch 1: Better detection of ip6tables vs ip6tables-legacy tools for
self tests. Fix for 6.4 and newer.
Patch 2: Only generate "new listener" event if listen operation
succeeds. Fix for 6.2 and newer.
Signed-off-by: Mat Martineau <martineau(a)kernel.org>
---
Matthieu Baerts (1):
selftests: mptcp: join: only check for ip6tables if needed
Paolo Abeni (1):
mptcp: more accurate NL event generation
net/mptcp/protocol.c | 3 +--
tools/testing/selftests/net/mptcp/mptcp_join.sh | 4 +---
2 files changed, 2 insertions(+), 5 deletions(-)
---
base-commit: 284779dbf4e98753458708783af8c35630674a21
change-id: 20230725-send-net-20230725-579ecd4326e8
Best regards,
--
Mat Martineau <martineau(a)kernel.org>
The patch titled
Subject: crypto, cifs: Fix error handling in extract_iter_to_sg()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
crypto-cifs-fix-error-handling-in-extract_iter_to_sg.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Howells <dhowells(a)redhat.com>
Subject: crypto, cifs: Fix error handling in extract_iter_to_sg()
Date: Wed, 26 Jul 2023 11:57:56 +0100
Fix error handling in extract_iter_to_sg(). Pages need to be unpinned, not
put in extract_user_to_sg() when handling IOVEC/UBUF sources.
The bug may result in a warning like the following:
WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline]
WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline]
WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline]
WARNING: CPU: 1 PID: 20384 at mm/gup.c:229 try_grab_page+0x108/0x160 mm/gup.c:252
...
pc : try_grab_page+0x108/0x160 mm/gup.c:229
lr : follow_page_pte+0x174/0x3e4 mm/gup.c:651
...
Call trace:
__lse_atomic_add arch/arm64/include/asm/atomic_lse.h:27 [inline]
arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
raw_atomic_add include/linux/atomic/atomic-arch-fallback.h:537 [inline]
atomic_add include/linux/atomic/atomic-instrumented.h:105 [inline]
try_grab_page+0x108/0x160 mm/gup.c:252
follow_pmd_mask mm/gup.c:734 [inline]
follow_pud_mask mm/gup.c:765 [inline]
follow_p4d_mask mm/gup.c:782 [inline]
follow_page_mask+0x12c/0x2e4 mm/gup.c:839
__get_user_pages+0x174/0x30c mm/gup.c:1217
__get_user_pages_locked mm/gup.c:1448 [inline]
__gup_longterm_locked+0x94/0x8f4 mm/gup.c:2142
internal_get_user_pages_fast+0x970/0xb60 mm/gup.c:3140
pin_user_pages_fast+0x4c/0x60 mm/gup.c:3246
iov_iter_extract_user_pages lib/iov_iter.c:1768 [inline]
iov_iter_extract_pages+0xc8/0x54c lib/iov_iter.c:1831
extract_user_to_sg lib/scatterlist.c:1123 [inline]
extract_iter_to_sg lib/scatterlist.c:1349 [inline]
extract_iter_to_sg+0x26c/0x6fc lib/scatterlist.c:1339
hash_sendmsg+0xc0/0x43c crypto/algif_hash.c:117
sock_sendmsg_nosec net/socket.c:725 [inline]
sock_sendmsg+0x54/0x60 net/socket.c:748
____sys_sendmsg+0x270/0x2ac net/socket.c:2494
___sys_sendmsg+0x80/0xdc net/socket.c:2548
__sys_sendmsg+0x68/0xc4 net/socket.c:2577
__do_sys_sendmsg net/socket.c:2586 [inline]
__se_sys_sendmsg net/socket.c:2584 [inline]
__arm64_sys_sendmsg+0x24/0x30 net/socket.c:2584
__invoke_syscall arch/arm64/kernel/syscall.c:38 [inline]
invoke_syscall+0x48/0x114 arch/arm64/kernel/syscall.c:52
el0_svc_common.constprop.0+0x44/0xe4 arch/arm64/kernel/syscall.c:142
do_el0_svc+0x38/0xa4 arch/arm64/kernel/syscall.c:191
el0_svc+0x2c/0xb0 arch/arm64/kernel/entry-common.c:647
el0t_64_sync_handler+0xc0/0xc4 arch/arm64/kernel/entry-common.c:665
el0t_64_sync+0x19c/0x1a0 arch/arm64/kernel/entry.S:591
Link: https://lkml.kernel.org/r/20571.1690369076@warthog.procyon.org.uk
Fixes: 018584697533 ("netfs: Add a function to extract an iterator into a scatterlist")
Reported-by: syzbot+9b82859567f2e50c123e(a)syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-mm/000000000000273d0105ff97bf56@google.com/
Signed-off-by: David Howells <dhowells(a)redhat.com>
Reviewed-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Steve French <stfrench(a)microsoft.com>
Cc: Sven Schnelle <svens(a)linux.ibm.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Jeff Layton <jlayton(a)kernel.org>
Cc: Shyam Prasad N <nspmangalore(a)gmail.com>
Cc: Rohith Surabattula <rohiths.msft(a)gmail.com>
Cc: Jens Axboe <axboe(a)kernel.dk>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: "David S. Miller" <davem(a)davemloft.net>
Cc: Eric Dumazet <edumazet(a)google.com>
Cc: Jakub Kicinski <kuba(a)kernel.org>
Cc: Paolo Abeni <pabeni(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/scatterlist.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/lib/scatterlist.c~crypto-cifs-fix-error-handling-in-extract_iter_to_sg
+++ a/lib/scatterlist.c
@@ -1148,7 +1148,7 @@ static ssize_t extract_user_to_sg(struct
failed:
while (sgtable->nents > sgtable->orig_nents)
- put_page(sg_page(&sgtable->sgl[--sgtable->nents]));
+ unpin_user_page(sg_page(&sgtable->sgl[--sgtable->nents]));
return res;
}
_
Patches currently in -mm which might be from dhowells(a)redhat.com are
crypto-cifs-fix-error-handling-in-extract_iter_to_sg.patch
mm-merge-folio_has_private-filemap_release_folio-call-pairs.patch
mm-netfs-fscache-stop-read-optimisation-when-folio-removed-from-pagecache.patch
The original patch from Lin Ma enables the vdpa driver to use validation
netlink ops.
The second patch simply disables the validation skip which is no longer
neccesary. Patchset started of from this discussion [0].
[0] https://lore.kernel.org/virtualization/20230726074710-mutt-send-email-mst@k…
v2: cc'ed stable
Dragos Tatulea (1):
vdpa: Enable strict validation for netlinks ops
Lin Ma (1):
vdpa: Complement vdpa_nl_policy for nlattr length check
drivers/vdpa/vdpa.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
--
2.41.0
Unlike the other data structures provided during registration the
thermal core takes a copy of the thermal_zone_params provided to it and
stores that copy in the thermal_zone_device, taking care to free it on
unregistration. This is done because the parameters will be modified at
runtime.
Unfortunately the thermal_of code assumes that the params structure it
provides will be used throughout the lifetime of the device and since
the params are dynamically allocated based on the bindings it attempts
to free it on unregistration. This results in not only leaking the
original params but also double freeing the copy the core made, leading
to memory corruption.
Fix this by instead freeing the params parsed from the DT during
registration.
This issue causing instability on systems where thermal zones are
unregistered, especially visble on those systems where some zones
provided by a device have no trip points such as Allwinner systems.
For example with current mainline an arm64 defconfig is unbootable on
Pine64 Plus and LibreTech Tritium is massively unstable. These issues
have been there for a while and have been made more prominent by recent
memory management changes.
Fixes: 3fd6d6e2b4e80 ("thermal/of: Rework the thermal device tree initialization")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: stable(a)vger.kernel.org
---
drivers/thermal/thermal_of.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/thermal/thermal_of.c b/drivers/thermal/thermal_of.c
index 6fb14e521197..0af11cdfa2c1 100644
--- a/drivers/thermal/thermal_of.c
+++ b/drivers/thermal/thermal_of.c
@@ -442,13 +442,11 @@ static int thermal_of_unbind(struct thermal_zone_device *tz,
static void thermal_of_zone_unregister(struct thermal_zone_device *tz)
{
struct thermal_trip *trips = tz->trips;
- struct thermal_zone_params *tzp = tz->tzp;
struct thermal_zone_device_ops *ops = tz->ops;
thermal_zone_device_disable(tz);
thermal_zone_device_unregister(tz);
kfree(trips);
- kfree(tzp);
kfree(ops);
}
@@ -530,6 +528,9 @@ static struct thermal_zone_device *thermal_of_zone_register(struct device_node *
goto out_kfree_tzp;
}
+ /* The core will take a copy of tzp, free our copy here. */
+ kfree(tzp);
+
ret = thermal_zone_device_enable(tz);
if (ret) {
pr_err("Failed to enabled thermal zone '%s', id=%d: %d\n",
---
base-commit: fdf0eaf11452d72945af31804e2a1048ee1b574c
change-id: 20230722-thermal-fix-of-memory-corruption-73c023f8612b
Best regards,
--
Mark Brown <broonie(a)kernel.org>
The patch titled
Subject: proc/vmcore: fix signedness bug in read_from_oldmem()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
proc-vmcore-fix-signedness-bug-in-read_from_oldmem.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Dan Carpenter <dan.carpenter(a)linaro.org>
Subject: proc/vmcore: fix signedness bug in read_from_oldmem()
Date: Tue, 25 Jul 2023 20:03:16 +0300
The bug is the error handling:
if (tmp < nr_bytes) {
"tmp" can hold negative error codes but because "nr_bytes" is type size_t
the negative error codes are treated as very high positive values
(success). Fix this by changing "nr_bytes" to type ssize_t. The
"nr_bytes" variable is used to store values between 1 and PAGE_SIZE and
they can fit in ssize_t without any issue.
Link: https://lkml.kernel.org/r/b55f7eed-1c65-4adc-95d1-6c7c65a54a6e@moroto.mount…
Fixes: 5d8de293c224 ("vmcore: convert copy_oldmem_page() to take an iov_iter")
Signed-off-by: Dan Carpenter <dan.carpenter(a)linaro.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Acked-by: Baoquan He <bhe(a)redhat.com>
Cc: Dave Young <dyoung(a)redhat.com>
Cc: Vivek Goyal <vgoyal(a)redhat.com>
Cc: Alexey Dobriyan <adobriyan(a)gmail.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/vmcore.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/proc/vmcore.c~proc-vmcore-fix-signedness-bug-in-read_from_oldmem
+++ a/fs/proc/vmcore.c
@@ -132,7 +132,7 @@ ssize_t read_from_oldmem(struct iov_iter
u64 *ppos, bool encrypted)
{
unsigned long pfn, offset;
- size_t nr_bytes;
+ ssize_t nr_bytes;
ssize_t read = 0, tmp;
int idx;
_
Patches currently in -mm which might be from dan.carpenter(a)linaro.org are
proc-vmcore-fix-signedness-bug-in-read_from_oldmem.patch
On 7/26/23 06:17, Conor Dooley wrote:
> On Tue, Jul 25, 2023 at 07:57:54PM +0100, Conor Dooley wrote:
>> On Tue, Jul 25, 2023 at 10:23:44AM -0700, Nathan Chancellor wrote:
>>> On Wed, Jul 26, 2023 at 01:04:05AM +0800, Mingzheng Xing wrote:
>>>> When compiling the kernel with the toolchain composed of GCC >= 12.1.0 and
>>>> binutils < 2.38, default ISA spec used when building binutils and GCC, the
>>>> following build failure will appear because the
>>>> CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI is not turned on.
>>>> (i.e, gcc-12.1.0 and binutils-2.36, or gcc-12.3.0 and binutils-2.37, use
>>>> default ISA spec.)
>>>>
>>>> CC arch/riscv/kernel/vdso/vgettimeofday.o
>>>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
>>>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
>>> The gift that keeps on giving :/
>>>
>>>> Binutils has updated the default ISA spec version, and the community has
>>>> responded well to this[1][2][3], but it appears that this is not over yet.
> Also, I just noticed this comment. I disagree with the wording "well",
> and more like "with weeping and gnashing of teeth" ;) This stuff is a
> huge pain in the ass, and mixing toolchains between LLVM & GNU stuff (or
> using an older binutils with a newer GCC) really makes it a lot worse.
At least in the usual way , except in my unusual case...
> Thanks for submitting a fix for this so that Nathan or I didn't have to!
Hopefully this patch will make things better.
>>>> We also need to consider the situation of binutils < 2.38 but
>>>> GCC >= 12.1.0, since the combination between different versions of GCC and
>>>> binutils is not unique, which is to some extent flexible. GCC release
>>>> 12.1.0 updated the default ISA spec version in GCC commit[4].
>>> I suspect this combination is not too common because binutils 2.38 came
>>> out before GCC 12.1.0 but as you note, it is obviously possible. What
>>> toolchain has this combination in the wild, which would be helpful for
>>> documentation purposes?
>> Yeah, that'd be great to know, at least the other niche stuff that we
>> are working around had a clear use-case (testing LLVM in debian containers)
>> whereas there's no clear user for this.
>> That's doubly interesting, as this patch seems to break things for binutils
>> < 2.35, and if we have to make a trade-off between those too, then it'd
>> be good to be able to weigh up the options.
>> Do we perhaps need the misa-spec workaround instead for this case?
>> Haven't tested that though, trying to dig myself out of email backlog.
> I don't think the misa-spec stuff is what we need actually. Instead, the
> workaround/fix that this patch implements just needs to be constrained to
> versions of GAS greater than 2.35.
I'm sorry, I didn't quite understand this, could you provide some more
information, thank you very much!
I'll change the handling in v2. After testing it myself, it seems to
work fine.
Thanks,
Mingzheng.
> Thanks,
> Conor.
>
>>>> For more information, please refer to:
>>>>
>>>> commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
>>>> commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
>>>>
>>>> [1]:https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/aE1ZeHHCYf4
>>>> [2]:https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
>>>> [3]:https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
>>>> [4]:https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76ba…
>> btw, please make these regular Link: tags (with a [N] at EOL) and drop
>> the space between them and the sign off. Also, this probably needs to be
>> CC:stable@vger.kernel.org too.
>>>> Signed-off-by: Mingzheng Xing<xingmingzheng(a)iscas.ac.cn>
>>>> ---
>>>> arch/riscv/Kconfig | 6 +++++-
>>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>>> index 4c07b9189c86..b49cea30f6cc 100644
>>>> --- a/arch/riscv/Kconfig
>>>> +++ b/arch/riscv/Kconfig
>>>> @@ -570,11 +570,15 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE
>>>> config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
>>>> def_bool y
>>>> #https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc
>>>> - depends on AS_IS_GNU && AS_VERSION >= 23800
>>>> + #https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd
>>>> + depends on CC_IS_GCC && GCC_VERSION >= 120100 || \
>>>> + AS_IS_GNU && AS_VERSION >= 23800
>>> GCC_VERSION will be 0 for clang, so you don't need the CC_IS_GCC check.
>>> With that change, this should be able to stay on one line:
>>>
>>> depends on GCC_VERSION >= 120100 || (AS_IS_GNU && AS_VERSION >= 23800)
>>>
>>>> help
>>>> Newer binutils versions default to ISA spec version 20191213 which
>>>> moves some instructions from the I extension to the Zicsr and Zifencei
>>>> extensions.
>>>> + Similarly, GCC release 12.1.0 has changed the default ISA spec version to
>>>> + 20191213, so the above situation requires this option to be enabled.
>>>>
>>>> config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
>>>> def_bool y
>>>> --
>>>> 2.34.1
>>>>
On 7/26/23 02:57, Conor Dooley wrote:
> On Tue, Jul 25, 2023 at 10:23:44AM -0700, Nathan Chancellor wrote:
>> Hi Mingzheng,
>>
>> Thanks for the patch!
>>
>> On Wed, Jul 26, 2023 at 01:04:05AM +0800, Mingzheng Xing wrote:
>>> When compiling the kernel with the toolchain composed of GCC >= 12.1.0 and
>>> binutils < 2.38, default ISA spec used when building binutils and GCC, the
>>> following build failure will appear because the
>>> CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI is not turned on.
>>> (i.e, gcc-12.1.0 and binutils-2.36, or gcc-12.3.0 and binutils-2.37, use
>>> default ISA spec.)
>>>
>>> CC arch/riscv/kernel/vdso/vgettimeofday.o
>>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
>>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
>> The gift that keeps on giving :/
>>
>>> Binutils has updated the default ISA spec version, and the community has
>>> responded well to this[1][2][3], but it appears that this is not over yet.
>>>
>>> We also need to consider the situation of binutils < 2.38 but
>>> GCC >= 12.1.0, since the combination between different versions of GCC and
>>> binutils is not unique, which is to some extent flexible. GCC release
>>> 12.1.0 updated the default ISA spec version in GCC commit[4].
>> I suspect this combination is not too common because binutils 2.38 came
>> out before GCC 12.1.0 but as you note, it is obviously possible. What
>> toolchain has this combination in the wild, which would be helpful for
>> documentation purposes?
> Yeah, that'd be great to know, at least the other niche stuff that we
> are working around had a clear use-case (testing LLVM in debian containers)
> whereas there's no clear user for this.
> That's doubly interesting, as this patch seems to break things for binutils
> < 2.35, and if we have to make a trade-off between those too, then it'd
> be good to be able to weigh up the options.
> Do we perhaps need the misa-spec workaround instead for this case?
> Haven't tested that though, trying to dig myself out of email backlog.
Well, what I encountered use-case was a temporary thing caused by the
inconsistent pace of distro package upgrades, but it really happened and
took quite a bit of time to explore why. There are sites like [1] that count
the "Successful Builds" between different GCC and binutils releases,
(Though they don't seem to be updated much...), but it seems to indicate
that all kinds of available pairings are possible.
Before replying to the e-mail, I tested some "extreme" cases and things
seemed to be a little clearer.
1. binutils-2.38 and GCC-12.1.0 each changed default ISA spec version,
updating from 2.2 to 20191213.
2. binutils>=2.38 or GCC>=12.1.0
when meet any of these it is recommended to turn on
TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI, the good outweighs
the bad. (My personal understanding.)
3. (Extreme case) binutils>=2.38 AND GCC<11.1.0
GCC-11.1.0 starts to support zicsr and zifencei extension for
-march[2].
In this case just turn on TOOLCHAIN_NEEDS_OLD_ISA_SPEC.
(Otherwise compiling the kernel will report an error whether
TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI is turned on or not.)
4. (Extreme case) GCC>=12.1.0, binutils<2.36
I tested GCC-12.1.0 + binutils-2.36 and it compiles the kernel
fine(after
hitting this patch). Also tested GCC-12.1.0 + binutils2.35 and
GCC-12.1.0 + binutils-2.34.0, but both pairings gave errors and failed
to produce a usable toolchain (Default ISA versions used). It seems
safe
to assume that GCC-12.1.0 + binutils<2.35 is almost non-existent.
I'm no expert on toolchains, so thanks for correcting me if I'm wrong
somewhere...
[1] https://wiki.osdev.org/Cross-Compiler_Successful_Builds
[2]
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b03be74bad08c382da47e048007a7…
>>> For more information, please refer to:
>>>
>>> commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
>>> commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
>>>
>>> [1]:https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/aE1ZeHHCYf4
>>> [2]:https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
>>> [3]:https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
>>> [4]:https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76ba…
> btw, please make these regular Link: tags (with a [N] at EOL) and drop
> the space between them and the sign off. Also, this probably needs to be
> CC:stable@vger.kernel.org too.
OK, I'll fix it.
Thanks,
Mingzheng.
> Cheers,
> Conor.
>
>>> Signed-off-by: Mingzheng Xing<xingmingzheng(a)iscas.ac.cn>
>>> ---
>>> arch/riscv/Kconfig | 6 +++++-
>>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>>> index 4c07b9189c86..b49cea30f6cc 100644
>>> --- a/arch/riscv/Kconfig
>>> +++ b/arch/riscv/Kconfig
>>> @@ -570,11 +570,15 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE
>>> config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
>>> def_bool y
>>> #https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc
>>> - depends on AS_IS_GNU && AS_VERSION >= 23800
>>> + #https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd
>>> + depends on CC_IS_GCC && GCC_VERSION >= 120100 || \
>>> + AS_IS_GNU && AS_VERSION >= 23800
>> GCC_VERSION will be 0 for clang, so you don't need the CC_IS_GCC check.
>> With that change, this should be able to stay on one line:
>>
>> depends on GCC_VERSION >= 120100 || (AS_IS_GNU && AS_VERSION >= 23800)
>>
>>> help
>>> Newer binutils versions default to ISA spec version 20191213 which
>>> moves some instructions from the I extension to the Zicsr and Zifencei
>>> extensions.
>>> + Similarly, GCC release 12.1.0 has changed the default ISA spec version to
>>> + 20191213, so the above situation requires this option to be enabled.
>>>
>>> config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
>>> def_bool y
>>> --
>>> 2.34.1
>>>
On 7/26/23 01:23, Nathan Chancellor wrote:
> Hi Mingzheng,
>
> Thanks for the patch!
>
> On Wed, Jul 26, 2023 at 01:04:05AM +0800, Mingzheng Xing wrote:
>> When compiling the kernel with the toolchain composed of GCC >= 12.1.0 and
>> binutils < 2.38, default ISA spec used when building binutils and GCC, the
>> following build failure will appear because the
>> CONFIG_TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI is not turned on.
>> (i.e, gcc-12.1.0 and binutils-2.36, or gcc-12.3.0 and binutils-2.37, use
>> default ISA spec.)
>>
>> CC arch/riscv/kernel/vdso/vgettimeofday.o
>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h: Assembler messages:
>> <<BUILDDIR>>/arch/riscv/include/asm/vdso/gettimeofday.h:79: Error: unrecognized opcode `csrr a5,0xc01'
> The gift that keeps on giving :/
Yeah, but things will get better :)
>> Binutils has updated the default ISA spec version, and the community has
>> responded well to this[1][2][3], but it appears that this is not over yet.
>>
>> We also need to consider the situation of binutils < 2.38 but
>> GCC >= 12.1.0, since the combination between different versions of GCC and
>> binutils is not unique, which is to some extent flexible. GCC release
>> 12.1.0 updated the default ISA spec version in GCC commit[4].
> I suspect this combination is not too common because binutils 2.38 came
> out before GCC 12.1.0 but as you note, it is obviously possible. What
> toolchain has this combination in the wild, which would be helpful for
> documentation purposes?
Actually, this issue was discovered during the upgrade of the distribution
openEuler for RISC-V. It is a temporary phenomenon caused by
inconsistent upgrade speeds of packages such as GCC and binutils.
From my limited understanding, GCC and binutils are not strictly
version-bound in some other distributions, so I can't rule out this issue
happening in other scenarios. But once it happens it can cause problems
with compiling the kernel.
>> For more information, please refer to:
>>
>> commit 6df2a016c0c8 ("riscv: fix build with binutils 2.38")
>> commit e89c2e815e76 ("riscv: Handle zicsr/zifencei issues between clang and binutils")
>>
>> [1]:https://groups.google.com/a/groups.riscv.org/g/sw-dev/c/aE1ZeHHCYf4
>> [2]:https://lore.kernel.org/all/20230308220842.1231003-1-conor@kernel.org
>> [3]:https://lore.kernel.org/all/20230223220546.52879-1-conor@kernel.org
>> [4]:https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76ba…
>>
>> Signed-off-by: Mingzheng Xing<xingmingzheng(a)iscas.ac.cn>
>> ---
>> arch/riscv/Kconfig | 6 +++++-
>> 1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> index 4c07b9189c86..b49cea30f6cc 100644
>> --- a/arch/riscv/Kconfig
>> +++ b/arch/riscv/Kconfig
>> @@ -570,11 +570,15 @@ config TOOLCHAIN_HAS_ZIHINTPAUSE
>> config TOOLCHAIN_NEEDS_EXPLICIT_ZICSR_ZIFENCEI
>> def_bool y
>> #https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=aed44286efa8ae8717a77d94b51ac3614e2ca6dc
>> - depends on AS_IS_GNU && AS_VERSION >= 23800
>> + #https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=98416dbb0a62579d4a7a4a76bab51b5b52fec2cd
>> + depends on CC_IS_GCC && GCC_VERSION >= 120100 || \
>> + AS_IS_GNU && AS_VERSION >= 23800
> GCC_VERSION will be 0 for clang, so you don't need the CC_IS_GCC check.
> With that change, this should be able to stay on one line:
>
> depends on GCC_VERSION >= 120100 || (AS_IS_GNU && AS_VERSION >= 23800)
OK, I'll change it in v2.
Thanks,
Mingzheng.
>> help
>> Newer binutils versions default to ISA spec version 20191213 which
>> moves some instructions from the I extension to the Zicsr and Zifencei
>> extensions.
>> + Similarly, GCC release 12.1.0 has changed the default ISA spec version to
>> + 20191213, so the above situation requires this option to be enabled.
>>
>> config TOOLCHAIN_NEEDS_OLD_ISA_SPEC
>> def_bool y
>> --
>> 2.34.1
>>
> Cheers,
> Nathan
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv(a)lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Adding a reserved memory region for the framebuffer memory
(the splash memory region set up by the bootloader).
It fixes a kernel panic (arm-smmu: Unhandled context fault
at this particular memory region) reported on DB845c running
v5.10.y.
Cc: stable(a)vger.kernel.org # v5.10+
Reviewed-by: Caleb Connolly <caleb.connolly(a)linaro.org>
Signed-off-by: Amit Pundir <amit.pundir(a)linaro.org>
---
v6: Collected review tag, updated commit message for the
context and marked for stable kernel versions.
v5: Re-sending with updated dt-bindings patch in mdss-common
schema.
v4: Re-sending this along with a new dt-bindings patch to
document memory-region property in qcom,sdm845-mdss
schema and keep dtbs_check happy.
v3: Point this reserved region to MDSS.
v2: Updated commit message.
There was some dicussion on v1 but it didn't go anywhere,
https://lore.kernel.org/linux-kernel/20230124182857.1524912-1-amit.pundir@l….
The general consensus is that this memory should be freed and be
made resuable but that (releasing this piece of memory) has been
tried before and it is not trivial to return the reserved memory
node to the system RAM pool in this case.
arch/arm64/boot/dts/qcom/sdm845-db845c.dts | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
index d6b464cb61d6..f546f6f57c1e 100644
--- a/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
+++ b/arch/arm64/boot/dts/qcom/sdm845-db845c.dts
@@ -101,6 +101,14 @@ hdmi_con: endpoint {
};
};
+ reserved-memory {
+ /* Cont splash region set up by the bootloader */
+ cont_splash_mem: framebuffer@9d400000 {
+ reg = <0x0 0x9d400000 0x0 0x2400000>;
+ no-map;
+ };
+ };
+
lt9611_1v8: lt9611-vdd18-regulator {
compatible = "regulator-fixed";
regulator-name = "LT9611_1V8";
@@ -506,6 +514,7 @@ &i2c14 {
};
&mdss {
+ memory-region = <&cont_splash_mem>;
status = "okay";
};
--
2.25.1
Supported codec bitmask is populated from the payload from venus firmware.
There is a possible case when all the bits in the codec bitmask is set. In
such case, core cap for decoder is filled and MAX_CODEC_NUM is utilized.
Now while filling the caps for encoder, it can lead to access the caps
array beyong 32 index. Hence leading to OOB write.
The fix counts the supported encoder and decoder. If the count is more than
max, then it skips accessing the caps.
Cc: stable(a)vger.kernel.org
Fixes: 1a73374a04e5 ("media: venus: hfi_parser: add common capability parser")
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
drivers/media/platform/qcom/venus/hfi_parser.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/drivers/media/platform/qcom/venus/hfi_parser.c b/drivers/media/platform/qcom/venus/hfi_parser.c
index ec73cac..651e215 100644
--- a/drivers/media/platform/qcom/venus/hfi_parser.c
+++ b/drivers/media/platform/qcom/venus/hfi_parser.c
@@ -14,11 +14,26 @@
typedef void (*func)(struct hfi_plat_caps *cap, const void *data,
unsigned int size);
+static int count_setbits(u32 input)
+{
+ u32 count = 0;
+
+ while (input > 0) {
+ if ((input & 1) == 1)
+ count++;
+ input >>= 1;
+ }
+ return count;
+}
+
static void init_codecs(struct venus_core *core)
{
struct hfi_plat_caps *caps = core->caps, *cap;
unsigned long bit;
+ if ((count_setbits(core->dec_codecs) + count_setbits(core->enc_codecs)) > MAX_CODEC_NUM)
+ return;
+
for_each_set_bit(bit, &core->dec_codecs, MAX_CODEC_NUM) {
cap = &caps[core->codecs_count++];
cap->codec = BIT(bit);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
The hfi parser, parses the capabilities received from venus firmware and
copies them to core capabilities. Consider below api, for example,
fill_caps - In this api, caps in core structure gets updated with the
number of capabilities received in firmware data payload. If the same api
is called multiple times, there is a possibility of copying beyond the max
allocated size in core caps.
Similar possibilities in fill_raw_fmts and fill_profile_level functions.
Cc: stable(a)vger.kernel.org
Fixes: 1a73374a04e5 ("media: venus: hfi_parser: add common capability parser")
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
drivers/media/platform/qcom/venus/hfi_parser.c | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/drivers/media/platform/qcom/venus/hfi_parser.c b/drivers/media/platform/qcom/venus/hfi_parser.c
index 6cf74b2..ec73cac 100644
--- a/drivers/media/platform/qcom/venus/hfi_parser.c
+++ b/drivers/media/platform/qcom/venus/hfi_parser.c
@@ -86,6 +86,9 @@ static void fill_profile_level(struct hfi_plat_caps *cap, const void *data,
{
const struct hfi_profile_level *pl = data;
+ if (cap->num_pl > HFI_MAX_PROFILE_COUNT)
+ return;
+
memcpy(&cap->pl[cap->num_pl], pl, num * sizeof(*pl));
cap->num_pl += num;
}
@@ -111,6 +114,9 @@ fill_caps(struct hfi_plat_caps *cap, const void *data, unsigned int num)
{
const struct hfi_capability *caps = data;
+ if (cap->num_caps > MAX_CAP_ENTRIES)
+ return;
+
memcpy(&cap->caps[cap->num_caps], caps, num * sizeof(*caps));
cap->num_caps += num;
}
@@ -137,6 +143,9 @@ static void fill_raw_fmts(struct hfi_plat_caps *cap, const void *fmts,
{
const struct raw_formats *formats = fmts;
+ if (cap->num_fmts > MAX_FMT_ENTRIES)
+ return;
+
memcpy(&cap->fmts[cap->num_fmts], formats, num_fmts * sizeof(*formats));
cap->num_fmts += num_fmts;
}
@@ -159,6 +168,9 @@ parse_raw_formats(struct venus_core *core, u32 codecs, u32 domain, void *data)
rawfmts[i].buftype = fmt->buffer_type;
i++;
+ if (i >= MAX_FMT_ENTRIES)
+ return;
+
if (pinfo->num_planes > MAX_PLANES)
break;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Buffer requirement, for different buffer type, comes from video firmware.
While copying these requirements, there is an OOB possibility when the
payload from firmware is more than expected size. Fix the check to avoid
the OOB possibility.
Cc: stable(a)vger.kernel.org
Fixes: 09c2845e8fe4 ("[media] media: venus: hfi: add Host Firmware Interface (HFI)")
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
drivers/media/platform/qcom/venus/hfi_msgs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/media/platform/qcom/venus/hfi_msgs.c b/drivers/media/platform/qcom/venus/hfi_msgs.c
index 3d5dadf..3e85bd8 100644
--- a/drivers/media/platform/qcom/venus/hfi_msgs.c
+++ b/drivers/media/platform/qcom/venus/hfi_msgs.c
@@ -398,7 +398,7 @@ session_get_prop_buf_req(struct hfi_msg_session_property_info_pkt *pkt,
memcpy(&bufreq[idx], buf_req, sizeof(*bufreq));
idx++;
- if (idx > HFI_BUFFER_TYPE_MAX)
+ if (idx >= HFI_BUFFER_TYPE_MAX)
return HFI_ERR_SESSION_INVALID_PARAMETER;
req_bytes -= sizeof(struct hfi_buffer_requirements);
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Read and write pointers are used to track the packet index in the memory
shared between video driver and firmware. There is a possibility of OOB
access if the read or write pointer goes beyond the queue memory size.
Add checks for the read and write pointer to avoid OOB access.
Cc: stable(a)vger.kernel.org
Fixes: d96d3f30c0f2 ("[media] media: venus: hfi: add Venus HFI files")
Signed-off-by: Vikash Garodia <quic_vgarodia(a)quicinc.com>
---
drivers/media/platform/qcom/venus/hfi_venus.c | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/drivers/media/platform/qcom/venus/hfi_venus.c b/drivers/media/platform/qcom/venus/hfi_venus.c
index f0b4638..dc228c4 100644
--- a/drivers/media/platform/qcom/venus/hfi_venus.c
+++ b/drivers/media/platform/qcom/venus/hfi_venus.c
@@ -206,6 +206,10 @@ static int venus_write_queue(struct venus_hfi_device *hdev,
new_wr_idx = wr_idx + dwords;
wr_ptr = (u32 *)(queue->qmem.kva + (wr_idx << 2));
+
+ if (wr_ptr < (u32 *)queue->qmem.kva || wr_ptr > (u32 *)(queue->qmem.kva + queue->qmem.size))
+ return -EINVAL;
+
if (new_wr_idx < qsize) {
memcpy(wr_ptr, packet, dwords << 2);
} else {
@@ -273,6 +277,10 @@ static int venus_read_queue(struct venus_hfi_device *hdev,
}
rd_ptr = (u32 *)(queue->qmem.kva + (rd_idx << 2));
+
+ if (rd_ptr < (u32 *)queue->qmem.kva || rd_ptr > (u32 *)(queue->qmem.kva + queue->qmem.size))
+ return -EINVAL;
+
dwords = *rd_ptr >> 2;
if (!dwords)
return -EINVAL;
--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
Freeing a hugetlb page and releasing base pages back to the underlying
allocator such as buddy or cma is performed in two steps:
- remove_hugetlb_folio() is called to remove the folio from hugetlb
lists, get a ref on the page and remove hugetlb destructor. This
all must be done under the hugetlb lock. After this call, the page
can be treated as a normal compound page or a collection of base
size pages.
- update_and_free_hugetlb_folio() is called to allocate vmemmap if
needed and the free routine of the underlying allocator is called
on the resulting page. We can not hold the hugetlb lock here.
One issue with this scheme is that a memory error could occur between
these two steps. In this case, the memory error handling code treats
the old hugetlb page as a normal compound page or collection of base
pages. It will then try to SetPageHWPoison(page) on the page with an
error. If the page with error is a tail page without vmemmap, a write
error will occur when trying to set the flag.
Address this issue by modifying remove_hugetlb_folio() and
update_and_free_hugetlb_folio() such that the hugetlb destructor is not
cleared until after allocating vmemmap. Since clearing the destructor
requires holding the hugetlb lock, the clearing is done in
remove_hugetlb_folio() if the vmemmap is present. This saves a
lock/unlock cycle. Otherwise, destructor is cleared in
update_and_free_hugetlb_folio() after allocating vmemmap.
Note that this will leave hugetlb pages in a state where they are marked
free (by hugetlb specific page flag) and have a ref count. This is not
a normal state. The only code that would notice is the memory error
code, and it is set up to retry in such a case.
A subsequent patch will create a routine to do bulk processing of
vmemmap allocation. This will eliminate a lock/unlock cycle for each
hugetlb page in the case where we are freeing a large number of pages.
Fixes: ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap pages associated with each HugeTLB page")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
---
mm/hugetlb.c | 90 ++++++++++++++++++++++++++++++++++++++--------------
1 file changed, 66 insertions(+), 24 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64a3239b6407..4a910121a647 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1579,9 +1579,37 @@ static inline void destroy_compound_gigantic_folio(struct folio *folio,
unsigned int order) { }
#endif
+static inline void __clear_hugetlb_destructor(struct hstate *h,
+ struct folio *folio)
+{
+ lockdep_assert_held(&hugetlb_lock);
+
+ /*
+ * Very subtle
+ *
+ * For non-gigantic pages set the destructor to the normal compound
+ * page dtor. This is needed in case someone takes an additional
+ * temporary ref to the page, and freeing is delayed until they drop
+ * their reference.
+ *
+ * For gigantic pages set the destructor to the null dtor. This
+ * destructor will never be called. Before freeing the gigantic
+ * page destroy_compound_gigantic_folio will turn the folio into a
+ * simple group of pages. After this the destructor does not
+ * apply.
+ *
+ */
+ if (hstate_is_gigantic(h))
+ folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
+ else
+ folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
+}
+
/*
- * Remove hugetlb folio from lists, and update dtor so that the folio appears
- * as just a compound page.
+ * Remove hugetlb folio from lists.
+ * If vmemmap exists for the folio, update dtor so that the folio appears
+ * as just a compound page. Otherwise, wait until after allocating vmemmap
+ * to update dtor.
*
* A reference is held on the folio, except in the case of demote.
*
@@ -1612,31 +1640,19 @@ static void __remove_hugetlb_folio(struct hstate *h, struct folio *folio,
}
/*
- * Very subtle
- *
- * For non-gigantic pages set the destructor to the normal compound
- * page dtor. This is needed in case someone takes an additional
- * temporary ref to the page, and freeing is delayed until they drop
- * their reference.
- *
- * For gigantic pages set the destructor to the null dtor. This
- * destructor will never be called. Before freeing the gigantic
- * page destroy_compound_gigantic_folio will turn the folio into a
- * simple group of pages. After this the destructor does not
- * apply.
- *
- * This handles the case where more than one ref is held when and
- * after update_and_free_hugetlb_folio is called.
- *
- * In the case of demote we do not ref count the page as it will soon
- * be turned into a page of smaller size.
+ * We can only clear the hugetlb destructor after allocating vmemmap
+ * pages. Otherwise, someone (memory error handling) may try to write
+ * to tail struct pages.
+ */
+ if (!folio_test_hugetlb_vmemmap_optimized(folio))
+ __clear_hugetlb_destructor(h, folio);
+
+ /*
+ * In the case of demote we do not ref count the page as it will soon
+ * be turned into a page of smaller size.
*/
if (!demote)
folio_ref_unfreeze(folio, 1);
- if (hstate_is_gigantic(h))
- folio_set_compound_dtor(folio, NULL_COMPOUND_DTOR);
- else
- folio_set_compound_dtor(folio, COMPOUND_PAGE_DTOR);
h->nr_huge_pages--;
h->nr_huge_pages_node[nid]--;
@@ -1728,6 +1744,19 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
return;
}
+ /*
+ * If needed, clear hugetlb destructor under the hugetlb lock.
+ * This must be done AFTER allocating vmemmap pages in case there is an
+ * attempt to write to tail struct pages as in memory poison.
+ * It must be done BEFORE PageHWPoison handling so that any subsequent
+ * memory errors poison individual pages instead of head.
+ */
+ if (folio_test_hugetlb(folio)) {
+ spin_lock_irq(&hugetlb_lock);
+ __clear_hugetlb_destructor(h, folio);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
/*
* Move PageHWPoison flag from head page to the raw error pages,
* which makes any healthy subpages reusable.
@@ -3604,6 +3633,19 @@ static int demote_free_hugetlb_folio(struct hstate *h, struct folio *folio)
return rc;
}
+ /*
+ * The hugetlb destructor could still be set for this folio if vmemmap
+ * was actually allocated above. The ref count on all pages is 0.
+ * Therefore, nobody should attempt access. However, before destroying
+ * compound page below, clear the destructor. Unfortunately, this
+ * requires a lock/unlock cycle.
+ */
+ if (folio_test_hugetlb(folio)) {
+ spin_lock_irq(&hugetlb_lock);
+ __clear_hugetlb_destructor(h, folio);
+ spin_unlock_irq(&hugetlb_lock);
+ }
+
/*
* Use destroy_compound_hugetlb_folio_for_demote for all huge page
* sizes as it will not ref count folios.
--
2.41.0
This is a note to let you know that I've just added the patch titled
usb: chipidea: imx: improve logic if samsung,picophy-* parameter is 0
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 36668515d56bf73f06765c71e08c8f7465f1e5c4 Mon Sep 17 00:00:00 2001
From: Xu Yang <xu.yang_2(a)nxp.com>
Date: Tue, 27 Jun 2023 19:21:24 +0800
Subject: usb: chipidea: imx: improve logic if samsung,picophy-* parameter is 0
In current driver, the value of tuning parameter will not take effect
if samsung,picophy-* is assigned as 0. Because 0 is also a valid value
acccording to the description of USB_PHY_CFG1 register, this will improve
the logic to let it work.
Fixes: 58a3cefb3840 ("usb: chipidea: imx: add two samsung picophy parameters tuning implementation")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20230627112126.1882666-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 10 ++++++----
drivers/usb/chipidea/usbmisc_imx.c | 6 ++++--
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index aa2aebed8e2d..c251a4f34879 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -176,10 +176,12 @@ static struct imx_usbmisc_data *usbmisc_get_init_data(struct device *dev)
if (of_usb_get_phy_mode(np) == USBPHY_INTERFACE_MODE_ULPI)
data->ulpi = 1;
- of_property_read_u32(np, "samsung,picophy-pre-emp-curr-control",
- &data->emp_curr_control);
- of_property_read_u32(np, "samsung,picophy-dc-vol-level-adjust",
- &data->dc_vol_level_adjust);
+ if (of_property_read_u32(np, "samsung,picophy-pre-emp-curr-control",
+ &data->emp_curr_control))
+ data->emp_curr_control = -1;
+ if (of_property_read_u32(np, "samsung,picophy-dc-vol-level-adjust",
+ &data->dc_vol_level_adjust))
+ data->dc_vol_level_adjust = -1;
return data;
}
diff --git a/drivers/usb/chipidea/usbmisc_imx.c b/drivers/usb/chipidea/usbmisc_imx.c
index e8a712e5abad..c4165f061e42 100644
--- a/drivers/usb/chipidea/usbmisc_imx.c
+++ b/drivers/usb/chipidea/usbmisc_imx.c
@@ -660,13 +660,15 @@ static int usbmisc_imx7d_init(struct imx_usbmisc_data *data)
usbmisc->base + MX7D_USBNC_USB_CTRL2);
/* PHY tuning for signal quality */
reg = readl(usbmisc->base + MX7D_USB_OTG_PHY_CFG1);
- if (data->emp_curr_control && data->emp_curr_control <=
+ if (data->emp_curr_control >= 0 &&
+ data->emp_curr_control <=
(TXPREEMPAMPTUNE0_MASK >> TXPREEMPAMPTUNE0_BIT)) {
reg &= ~TXPREEMPAMPTUNE0_MASK;
reg |= (data->emp_curr_control << TXPREEMPAMPTUNE0_BIT);
}
- if (data->dc_vol_level_adjust && data->dc_vol_level_adjust <=
+ if (data->dc_vol_level_adjust >= 0 &&
+ data->dc_vol_level_adjust <=
(TXVREFTUNE0_MASK >> TXVREFTUNE0_BIT)) {
reg &= ~TXVREFTUNE0_MASK;
reg |= (data->dc_vol_level_adjust << TXVREFTUNE0_BIT);
--
2.41.0
The qca8xxx switch supports 2 way to write reg values, a slow way using
mdio and a fast way by sending specially crafted mgmt packet to
read/write reg.
The fast way can support up to 32 bytes of data as eth packet are used
to send/receive.
This correctly works for almost the entire regmap of the switch but with
the use of some kernel selftests for dsa drivers it was found a funny
and interesting hw defect/limitation.
For some specific reg, bulk write won't work and will result in writing
only part of the requested regs resulting in half data written. This was
especially hard to track and discover due to the total strangeness of
the problem and also by the specific regs where this occurs.
This occurs in the specific regs of the ATU table, where multiple entry
needs to be written to compose the entire entry.
It was discovered that with a bulk write of 12 bytes on
QCA8K_REG_ATU_DATA0 only QCA8K_REG_ATU_DATA0 and QCA8K_REG_ATU_DATA2
were written, but QCA8K_REG_ATU_DATA1 was always zero.
Tcpdump was used to make sure the specially crafted packet was correct
and this was confirmed.
The problem was hard to track as the lack of QCA8K_REG_ATU_DATA1
resulted in an entry somehow possible as the first bytes of the mac
address are set in QCA8K_REG_ATU_DATA0 and the entry type is set in
QCA8K_REG_ATU_DATA2.
Funlly enough writing QCA8K_REG_ATU_DATA1 results in the same problem
with QCA8K_REG_ATU_DATA2 empty and QCA8K_REG_ATU_DATA1 and
QCA8K_REG_ATU_FUNC correctly written.
A speculation on the problem might be that there are some kind of
indirection internally when accessing these regs and they can't be
accessed all together, due to the fact that it's really a table mapped
somewhere in the switch SRAM.
Even more funny is the fact that every other reg was tested with all
kind of combination and they are not affected by this problem. Read
operation was also tested and always worked so it's not affected by this
problem.
The problem is not present if we limit writing a single reg at times.
To handle this hardware defect, enable use_single_write so that bulk
api can correctly split the write in multiple different operation
effectively reverting to a non-bulk write.
Cc: Mark Brown <broonie(a)kernel.org>
Fixes: c766e077d927 ("net: dsa: qca8k: convert to regmap read/write API")
Signed-off-by: Christian Marangi <ansuelsmth(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/net/dsa/qca/qca8k-8xxx.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/drivers/net/dsa/qca/qca8k-8xxx.c b/drivers/net/dsa/qca/qca8k-8xxx.c
index dee7b6579916..ae088a4df794 100644
--- a/drivers/net/dsa/qca/qca8k-8xxx.c
+++ b/drivers/net/dsa/qca/qca8k-8xxx.c
@@ -576,8 +576,11 @@ static struct regmap_config qca8k_regmap_config = {
.rd_table = &qca8k_readable_table,
.disable_locking = true, /* Locking is handled by qca8k read/write */
.cache_type = REGCACHE_NONE, /* Explicitly disable CACHE */
- .max_raw_read = 32, /* mgmt eth can read/write up to 8 registers at time */
- .max_raw_write = 32,
+ .max_raw_read = 32, /* mgmt eth can read up to 8 registers at time */
+ /* ATU regs suffer from a bug where some data are not correctly
+ * written. Disable bulk write to correctly write ATU entry.
+ */
+ .use_single_write = true,
};
static int
--
2.40.1
Dzień dobry,
chciałbym poinformować Państwa o możliwości pozyskania nowych zleceń ze strony www.
Widzimy zainteresowanie potencjalnych Klientów Państwa firmą, dlatego chętnie pomożemy Państwu dotrzeć z ofertą do większego grona odbiorców poprzez efektywne metody pozycjonowania strony w Google.
Czy mógłbym liczyć na kontakt zwrotny?
Pozdrawiam serdecznie
Wiktor Nurek
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x c4d13222afd8a64bf11bc7ec68645496ee8b54b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072654-smother-blurred-0b03@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
c4d13222afd8 ("ext4: fix to check return value of freeze_bdev() in ext4_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d13222afd8a64bf11bc7ec68645496ee8b54b9 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Tue, 6 Jun 2023 15:32:03 +0800
Subject: [PATCH] ext4: fix to check return value of freeze_bdev() in
ext4_shutdown()
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable(a)kernel.org
Signed-off-by: Chao Yu <chao(a)kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a430152063..55be1b8a6360 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -797,6 +797,7 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
__u32 flags;
+ int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -815,7 +816,9 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
switch (flags) {
case EXT4_GOING_FLAGS_DEFAULT:
- freeze_bdev(sb->s_bdev);
+ ret = freeze_bdev(sb->s_bdev);
+ if (ret)
+ return ret;
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
thaw_bdev(sb->s_bdev);
break;
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x c4d13222afd8a64bf11bc7ec68645496ee8b54b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072653-gumminess-freemason-4c63@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
c4d13222afd8 ("ext4: fix to check return value of freeze_bdev() in ext4_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d13222afd8a64bf11bc7ec68645496ee8b54b9 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Tue, 6 Jun 2023 15:32:03 +0800
Subject: [PATCH] ext4: fix to check return value of freeze_bdev() in
ext4_shutdown()
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable(a)kernel.org
Signed-off-by: Chao Yu <chao(a)kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a430152063..55be1b8a6360 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -797,6 +797,7 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
__u32 flags;
+ int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -815,7 +816,9 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
switch (flags) {
case EXT4_GOING_FLAGS_DEFAULT:
- freeze_bdev(sb->s_bdev);
+ ret = freeze_bdev(sb->s_bdev);
+ if (ret)
+ return ret;
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
thaw_bdev(sb->s_bdev);
break;
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x c4d13222afd8a64bf11bc7ec68645496ee8b54b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072652-selector-hangover-e9ed@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
c4d13222afd8 ("ext4: fix to check return value of freeze_bdev() in ext4_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d13222afd8a64bf11bc7ec68645496ee8b54b9 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Tue, 6 Jun 2023 15:32:03 +0800
Subject: [PATCH] ext4: fix to check return value of freeze_bdev() in
ext4_shutdown()
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable(a)kernel.org
Signed-off-by: Chao Yu <chao(a)kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a430152063..55be1b8a6360 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -797,6 +797,7 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
__u32 flags;
+ int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -815,7 +816,9 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
switch (flags) {
case EXT4_GOING_FLAGS_DEFAULT:
- freeze_bdev(sb->s_bdev);
+ ret = freeze_bdev(sb->s_bdev);
+ if (ret)
+ return ret;
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
thaw_bdev(sb->s_bdev);
break;
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c4d13222afd8a64bf11bc7ec68645496ee8b54b9
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072651-wreath-moonrise-7e72@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
c4d13222afd8 ("ext4: fix to check return value of freeze_bdev() in ext4_shutdown()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c4d13222afd8a64bf11bc7ec68645496ee8b54b9 Mon Sep 17 00:00:00 2001
From: Chao Yu <chao(a)kernel.org>
Date: Tue, 6 Jun 2023 15:32:03 +0800
Subject: [PATCH] ext4: fix to check return value of freeze_bdev() in
ext4_shutdown()
freeze_bdev() can fail due to a lot of reasons, it needs to check its
reason before later process.
Fixes: 783d94854499 ("ext4: add EXT4_IOC_GOINGDOWN ioctl")
Cc: stable(a)kernel.org
Signed-off-by: Chao Yu <chao(a)kernel.org>
Link: https://lore.kernel.org/r/20230606073203.1310389-1-chao@kernel.org
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/ioctl.c b/fs/ext4/ioctl.c
index f9a430152063..55be1b8a6360 100644
--- a/fs/ext4/ioctl.c
+++ b/fs/ext4/ioctl.c
@@ -797,6 +797,7 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
{
struct ext4_sb_info *sbi = EXT4_SB(sb);
__u32 flags;
+ int ret;
if (!capable(CAP_SYS_ADMIN))
return -EPERM;
@@ -815,7 +816,9 @@ static int ext4_shutdown(struct super_block *sb, unsigned long arg)
switch (flags) {
case EXT4_GOING_FLAGS_DEFAULT:
- freeze_bdev(sb->s_bdev);
+ ret = freeze_bdev(sb->s_bdev);
+ if (ret)
+ return ret;
set_bit(EXT4_FLAGS_SHUTDOWN, &sbi->s_ext4_flags);
thaw_bdev(sb->s_bdev);
break;
The "or" (|) in regular expression must be within parentheses,
otherwise it is not really an "or" and it matches supplies:
qcom-apq8060-dragonboard.dtb: regulators-1: vdd_ncp-supply: [[34]] is not of type 'object'
Fixes: fde0e25b71a9 ("dt-bindings: regulators: convert non-smd RPM Regulators bindings to dt-schema")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
---
.../devicetree/bindings/regulator/qcom,rpm-regulator.yaml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml b/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml
index 8a08698e3484..b4eb4001eb3d 100644
--- a/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml
+++ b/Documentation/devicetree/bindings/regulator/qcom,rpm-regulator.yaml
@@ -49,7 +49,7 @@ patternProperties:
".*-supply$":
description: Input supply phandle(s) for this node
- "^((s|l|lvs)[0-9]*)|(s[1-2][a-b])|(ncp)|(mvs)|(usb-switch)|(hdmi-switch)$":
+ "^((s|l|lvs)[0-9]*|s[1-2][a-b]|ncp|mvs|usb-switch|hdmi-switch)$":
description: List of regulators and its properties
$ref: regulator.yaml#
unevaluatedProperties: false
--
2.34.1
From: Mike Kravetz <mike.kravetz(a)oracle.com>
commit 16f8eb3eea9eb2a1568279d64ca4dc977e7aa538 upstream
This reverts commit 9425c591e06a9ab27a145ba655fb50532cf0bcc9
The reverted commit fixed up routines primarily used by readahead code
such that they could also be used by hugetlb. Unfortunately, this
caused a performance regression as pointed out by the Closes: tag.
The hugetlb code which uses page_cache_next_miss will be addressed in
a subsequent patch.
Link: https://lkml.kernel.org/r/20230621212403.174710-1-mike.kravetz@oracle.com
Fixes: 9425c591e06a ("page cache: fix page_cache_next/prev_miss off by one")
Signed-off-by: Mike Kravetz <mike.kravetz(a)oracle.com>
Reported-by: kernel test robot <oliver.sang(a)intel.com>
Closes: https://lore.kernel.org/oe-lkp/202306211346.1e9ff03e-oliver.sang@intel.com
Reviewed-by: Sidhartha Kumar <sidhartha.kumar(a)oracle.com>
Cc: stable(a)vger.kernel.org
Cc: Thomas Backlund <tmb(a)tmb.nu>
Cc: Ackerley Tng <ackerleytng(a)google.com>
Cc: Erdem Aktas <erdemaktas(a)google.com>
Cc: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Muchun Song <songmuchun(a)bytedance.com>
Cc: Vishal Annapurve <vannapurve(a)google.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
As pointed out in [1], this patch was missed in the 6.4 stable tree.
Resending this patch with a Cc: stable.
[1]: https://lore.kernel.org/all/20230621212403.174710-1-mike.kravetz@oracle.com…
mm/filemap.c | 26 ++++++++++----------------
1 file changed, 10 insertions(+), 16 deletions(-)
diff --git a/mm/filemap.c b/mm/filemap.c
index 8abce63b259c9..a2006936a6ae2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1760,9 +1760,7 @@ bool __folio_lock_or_retry(struct folio *folio, struct mm_struct *mm,
*
* Return: The index of the gap if found, otherwise an index outside the
* range specified (in which case 'return - index >= max_scan' will be true).
- * In the rare case of index wrap-around, 0 will be returned. 0 will also
- * be returned if index == 0 and there is a gap at the index. We can not
- * wrap-around if passed index == 0.
+ * In the rare case of index wrap-around, 0 will be returned.
*/
pgoff_t page_cache_next_miss(struct address_space *mapping,
pgoff_t index, unsigned long max_scan)
@@ -1772,13 +1770,12 @@ pgoff_t page_cache_next_miss(struct address_space *mapping,
while (max_scan--) {
void *entry = xas_next(&xas);
if (!entry || xa_is_value(entry))
- return xas.xa_index;
- if (xas.xa_index == 0 && index != 0)
- return xas.xa_index;
+ break;
+ if (xas.xa_index == 0)
+ break;
}
- /* No gaps in range and no wrap-around, return index beyond range */
- return xas.xa_index + 1;
+ return xas.xa_index;
}
EXPORT_SYMBOL(page_cache_next_miss);
@@ -1799,9 +1796,7 @@ EXPORT_SYMBOL(page_cache_next_miss);
*
* Return: The index of the gap if found, otherwise an index outside the
* range specified (in which case 'index - return >= max_scan' will be true).
- * In the rare case of wrap-around, ULONG_MAX will be returned. ULONG_MAX
- * will also be returned if index == ULONG_MAX and there is a gap at the
- * index. We can not wrap-around if passed index == ULONG_MAX.
+ * In the rare case of wrap-around, ULONG_MAX will be returned.
*/
pgoff_t page_cache_prev_miss(struct address_space *mapping,
pgoff_t index, unsigned long max_scan)
@@ -1811,13 +1806,12 @@ pgoff_t page_cache_prev_miss(struct address_space *mapping,
while (max_scan--) {
void *entry = xas_prev(&xas);
if (!entry || xa_is_value(entry))
- return xas.xa_index;
- if (xas.xa_index == ULONG_MAX && index != ULONG_MAX)
- return xas.xa_index;
+ break;
+ if (xas.xa_index == ULONG_MAX)
+ break;
}
- /* No gaps in range and no wrap-around, return index beyond range */
- return xas.xa_index - 1;
+ return xas.xa_index;
}
EXPORT_SYMBOL(page_cache_prev_miss);
--
2.41.0
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 05f933d5f7318b03ff2028c1704dc867ac16f2c7
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072154-animal-dropkick-6a92@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 05f933d5f7318b03ff2028c1704dc867ac16f2c7 Mon Sep 17 00:00:00 2001
From: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Date: Tue, 4 Jul 2023 21:50:28 +0200
Subject: [PATCH] i2c: nomadik: Remove a useless call in the remove function
Since commit 235602146ec9 ("i2c-nomadik: turn the platform driver to an amba
driver"), there is no more request_mem_region() call in this driver.
So remove the release_mem_region() call from the remove function which is
likely a left over.
Fixes: 235602146ec9 ("i2c-nomadik: turn the platform driver to an amba driver")
Cc: <stable(a)vger.kernel.org> # v3.6+
Acked-by: Linus Walleij <linus.walleij(a)linaro.org>
Reviewed-by: Andi Shyti <andi.shyti(a)kernel.org>
Signed-off-by: Christophe JAILLET <christophe.jaillet(a)wanadoo.fr>
Signed-off-by: Wolfram Sang <wsa(a)kernel.org>
diff --git a/drivers/i2c/busses/i2c-nomadik.c b/drivers/i2c/busses/i2c-nomadik.c
index 1e5fd23ef45c..212f412f1c74 100644
--- a/drivers/i2c/busses/i2c-nomadik.c
+++ b/drivers/i2c/busses/i2c-nomadik.c
@@ -1038,7 +1038,6 @@ static int nmk_i2c_probe(struct amba_device *adev, const struct amba_id *id)
static void nmk_i2c_remove(struct amba_device *adev)
{
- struct resource *res = &adev->res;
struct nmk_i2c_dev *dev = amba_get_drvdata(adev);
i2c_del_adapter(&dev->adap);
@@ -1047,7 +1046,6 @@ static void nmk_i2c_remove(struct amba_device *adev)
clear_all_interrupts(dev);
/* disable the controller */
i2c_clr_bit(dev->virtbase + I2C_CR, I2C_CR_PE);
- release_mem_region(res->start, resource_size(res));
}
static struct i2c_vendor_data vendor_stn8815 = {
This is a note to let you know that I've just added the patch titled
usb: chipidea: imx: improve logic if samsung,picophy-* parameter is 0
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
From 36668515d56bf73f06765c71e08c8f7465f1e5c4 Mon Sep 17 00:00:00 2001
From: Xu Yang <xu.yang_2(a)nxp.com>
Date: Tue, 27 Jun 2023 19:21:24 +0800
Subject: usb: chipidea: imx: improve logic if samsung,picophy-* parameter is 0
In current driver, the value of tuning parameter will not take effect
if samsung,picophy-* is assigned as 0. Because 0 is also a valid value
acccording to the description of USB_PHY_CFG1 register, this will improve
the logic to let it work.
Fixes: 58a3cefb3840 ("usb: chipidea: imx: add two samsung picophy parameters tuning implementation")
cc: <stable(a)vger.kernel.org>
Signed-off-by: Xu Yang <xu.yang_2(a)nxp.com>
Acked-by: Peter Chen <peter.chen(a)kernel.org>
Link: https://lore.kernel.org/r/20230627112126.1882666-1-xu.yang_2@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/chipidea/ci_hdrc_imx.c | 10 ++++++----
drivers/usb/chipidea/usbmisc_imx.c | 6 ++++--
2 files changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/usb/chipidea/ci_hdrc_imx.c b/drivers/usb/chipidea/ci_hdrc_imx.c
index aa2aebed8e2d..c251a4f34879 100644
--- a/drivers/usb/chipidea/ci_hdrc_imx.c
+++ b/drivers/usb/chipidea/ci_hdrc_imx.c
@@ -176,10 +176,12 @@ static struct imx_usbmisc_data *usbmisc_get_init_data(struct device *dev)
if (of_usb_get_phy_mode(np) == USBPHY_INTERFACE_MODE_ULPI)
data->ulpi = 1;
- of_property_read_u32(np, "samsung,picophy-pre-emp-curr-control",
- &data->emp_curr_control);
- of_property_read_u32(np, "samsung,picophy-dc-vol-level-adjust",
- &data->dc_vol_level_adjust);
+ if (of_property_read_u32(np, "samsung,picophy-pre-emp-curr-control",
+ &data->emp_curr_control))
+ data->emp_curr_control = -1;
+ if (of_property_read_u32(np, "samsung,picophy-dc-vol-level-adjust",
+ &data->dc_vol_level_adjust))
+ data->dc_vol_level_adjust = -1;
return data;
}
diff --git a/drivers/usb/chipidea/usbmisc_imx.c b/drivers/usb/chipidea/usbmisc_imx.c
index e8a712e5abad..c4165f061e42 100644
--- a/drivers/usb/chipidea/usbmisc_imx.c
+++ b/drivers/usb/chipidea/usbmisc_imx.c
@@ -660,13 +660,15 @@ static int usbmisc_imx7d_init(struct imx_usbmisc_data *data)
usbmisc->base + MX7D_USBNC_USB_CTRL2);
/* PHY tuning for signal quality */
reg = readl(usbmisc->base + MX7D_USB_OTG_PHY_CFG1);
- if (data->emp_curr_control && data->emp_curr_control <=
+ if (data->emp_curr_control >= 0 &&
+ data->emp_curr_control <=
(TXPREEMPAMPTUNE0_MASK >> TXPREEMPAMPTUNE0_BIT)) {
reg &= ~TXPREEMPAMPTUNE0_MASK;
reg |= (data->emp_curr_control << TXPREEMPAMPTUNE0_BIT);
}
- if (data->dc_vol_level_adjust && data->dc_vol_level_adjust <=
+ if (data->dc_vol_level_adjust >= 0 &&
+ data->dc_vol_level_adjust <=
(TXVREFTUNE0_MASK >> TXVREFTUNE0_BIT)) {
reg &= ~TXVREFTUNE0_MASK;
reg |= (data->dc_vol_level_adjust << TXVREFTUNE0_BIT);
--
2.41.0
stable-rc/linux-5.4.y build: 17 builds: 0 failed, 17 passed, 38 warnings (v5.4.250-314-g1362bbd1b27c7)
Full Build Summary: https://kernelci.org/build/stable-rc/branch/linux-5.4.y/kernel/v5.4.250-314…
Tree: stable-rc
Branch: linux-5.4.y
Git Describe: v5.4.250-314-g1362bbd1b27c7
Git Commit: 1362bbd1b27c72f90eb34039a8d1b0f8b8468a5e
Git URL: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
Built: 7 unique architectures
Warnings Detected:
arc:
arm64:
defconfig (gcc-10): 3 warnings
defconfig+arm64-chromebook (gcc-10): 3 warnings
arm:
imx_v6_v7_defconfig (gcc-10): 1 warning
multi_v5_defconfig (gcc-10): 1 warning
multi_v7_defconfig (gcc-10): 1 warning
omap2plus_defconfig (gcc-10): 1 warning
vexpress_defconfig (gcc-10): 1 warning
i386:
allnoconfig (gcc-10): 2 warnings
i386_defconfig (gcc-10): 3 warnings
tinyconfig (gcc-10): 2 warnings
mips:
32r2el_defconfig (gcc-10): 1 warning
riscv:
defconfig (gcc-10): 1 warning
x86_64:
allnoconfig (gcc-10): 4 warnings
tinyconfig (gcc-10): 4 warnings
x86_64_defconfig (gcc-10): 5 warnings
x86_64_defconfig+x86-chromebook (gcc-10): 5 warnings
Warnings summary:
12 fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
7 ld: warning: creating DT_TEXTREL in a PIE
4 ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
4 arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
3 ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
2 arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
2 arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
2 arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
2 arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
Section mismatches summary:
1 WARNING: vmlinux.o(___ksymtab_gpl+vic_init_cascaded+0x0): Section mismatch in reference from the variable __ksymtab_vic_init_cascaded to the function .init.text:vic_init_cascaded()
================================================================================
Detailed per-defconfig build reports:
--------------------------------------------------------------------------------
32r2el_defconfig (mips, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
allnoconfig (i386, gcc-10) — PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
allnoconfig (x86_64, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
defconfig (riscv, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
defconfig (arm64, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
defconfig+arm64-chromebook (arm64, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
arch/arm64/include/asm/memory.h:238:15: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
haps_hs_smp_defconfig (arc, gcc-10) — PASS, 0 errors, 0 warnings, 0 section mismatches
--------------------------------------------------------------------------------
i386_defconfig (i386, gcc-10) — PASS, 0 errors, 3 warnings, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
imx_v6_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
multi_v5_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
Section mismatches:
WARNING: vmlinux.o(___ksymtab_gpl+vic_init_cascaded+0x0): Section mismatch in reference from the variable __ksymtab_vic_init_cascaded to the function .init.text:vic_init_cascaded()
--------------------------------------------------------------------------------
multi_v7_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
omap2plus_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
tinyconfig (i386, gcc-10) — PASS, 0 errors, 2 warnings, 0 section mismatches
Warnings:
ld: arch/x86/boot/compressed/head_32.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
tinyconfig (x86_64, gcc-10) — PASS, 0 errors, 4 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.S:1756: Warning: no instruction mnemonic suffix given and no register operands; using default for `sysret'
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x151: unsupported intra-function call
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
vexpress_defconfig (arm, gcc-10) — PASS, 0 errors, 1 warning, 0 section mismatches
Warnings:
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
--------------------------------------------------------------------------------
x86_64_defconfig (x86_64, gcc-10) — PASS, 0 errors, 5 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
--------------------------------------------------------------------------------
x86_64_defconfig+x86-chromebook (x86_64, gcc-10) — PASS, 0 errors, 5 warnings, 0 section mismatches
Warnings:
arch/x86/entry/entry_64.o: warning: objtool: .entry.text+0x1c1: unsupported intra-function call
arch/x86/entry/entry_64.o: warning: objtool: If this is a retpoline, please patch it in with alternatives and annotate it with ANNOTATE_NOSPEC_ALTERNATIVE.
fs/ext4/ioctl.c:595:7: warning: assignment to ‘int’ from ‘struct super_block *’ makes integer from pointer without a cast [-Wint-conversion]
ld: arch/x86/boot/compressed/head_64.o: warning: relocation in read-only section `.head.text'
ld: warning: creating DT_TEXTREL in a PIE
---
For more info write to <info(a)kernelci.org>
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
When configuring a pin as an output pin with a value of logic 0, we
end up as having a value of logic 1 on the output pin. Setting a
logic 0 a second time (or more) after that will correctly output a
logic 0 on the output pin.
By default, all GPIO pins are configured as inputs. When we enter
sc16is7xx_gpio_direction_output() for the first time, we first set the
desired value in IOSTATE, and then we configure the pin as an output.
The datasheet states that writing to IOSTATE register will trigger a
transfer of the value to the I/O pin configured as output, so if the
pin is configured as an input, nothing will be transferred.
Therefore, set the direction first in IODIR, and then set the desired
value in IOSTATE.
This is what is done in NXP application note AN10587.
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
Tested-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
drivers/tty/serial/sc16is7xx.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index bc0a288f258d..07ae889db296 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -1342,9 +1342,18 @@ static int sc16is7xx_gpio_direction_output(struct gpio_chip *chip,
state |= BIT(offset);
else
state &= ~BIT(offset);
- sc16is7xx_port_write(port, SC16IS7XX_IOSTATE_REG, state);
+
+ /*
+ * If we write IOSTATE first, and then IODIR, the output value is not
+ * transferred to the corresponding I/O pin.
+ * The datasheet states that each register bit will be transferred to
+ * the corresponding I/O pin programmed as output when writing to
+ * IOSTATE. Therefore, configure direction first with IODIR, and then
+ * set value after with IOSTATE.
+ */
sc16is7xx_port_update(port, SC16IS7XX_IODIR_REG, BIT(offset),
BIT(offset));
+ sc16is7xx_port_write(port, SC16IS7XX_IOSTATE_REG, state);
return 0;
}
--
2.30.2
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Some variants in this series of UART controllers have GPIO pins that
are shared between GPIO and modem control lines.
The pin mux mode (GPIO or modem control lines) can be set for each
ports (channels) supported by the variant.
This adds a property to the device tree to set the GPIO pin mux to
modem control lines on selected ports if needed.
Cc: <stable(a)vger.kernel.org> # 6.1.x
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Acked-by: Conor Dooley <conor.dooley(a)microchip.com>
Reviewed-by: Lech Perczak <lech.perczak(a)camlingroup.com>
---
.../bindings/serial/nxp,sc16is7xx.txt | 46 +++++++++++++++++++
1 file changed, 46 insertions(+)
diff --git a/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt b/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
index 0fa8e3e43bf8..1a7e4bff0456 100644
--- a/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
+++ b/Documentation/devicetree/bindings/serial/nxp,sc16is7xx.txt
@@ -23,6 +23,9 @@ Optional properties:
1 = active low.
- irda-mode-ports: An array that lists the indices of the port that
should operate in IrDA mode.
+- nxp,modem-control-line-ports: An array that lists the indices of the port that
+ should have shared GPIO lines configured as
+ modem control lines.
Example:
sc16is750: sc16is750@51 {
@@ -35,6 +38,26 @@ Example:
#gpio-cells = <2>;
};
+ sc16is752: sc16is752@53 {
+ compatible = "nxp,sc16is752";
+ reg = <0x53>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <1>; /* Port 1 as modem control lines */
+ gpio-controller; /* Port 0 as GPIOs */
+ #gpio-cells = <2>;
+ };
+
+ sc16is752: sc16is752@54 {
+ compatible = "nxp,sc16is752";
+ reg = <0x54>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <0 1>; /* Ports 0 and 1 as modem control lines */
+ };
+
* spi as bus
Required properties:
@@ -59,6 +82,9 @@ Optional properties:
1 = active low.
- irda-mode-ports: An array that lists the indices of the port that
should operate in IrDA mode.
+- nxp,modem-control-line-ports: An array that lists the indices of the port that
+ should have shared GPIO lines configured as
+ modem control lines.
Example:
sc16is750: sc16is750@0 {
@@ -70,3 +96,23 @@ Example:
gpio-controller;
#gpio-cells = <2>;
};
+
+ sc16is752: sc16is752@1 {
+ compatible = "nxp,sc16is752";
+ reg = <1>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <1>; /* Port 1 as modem control lines */
+ gpio-controller; /* Port 0 as GPIOs */
+ #gpio-cells = <2>;
+ };
+
+ sc16is752: sc16is752@2 {
+ compatible = "nxp,sc16is752";
+ reg = <2>;
+ clocks = <&clk20m>;
+ interrupt-parent = <&gpio3>;
+ interrupts = <7 IRQ_TYPE_EDGE_FALLING>;
+ nxp,modem-control-line-ports = <0 1>; /* Ports 0 and 1 as modem control lines */
+ };
--
2.30.2
On Mon, Jul 24, 2023 at 07:38:24PM +0000, Deucher, Alexander wrote:
> [AMD Official Use Only - General]
>
> > -----Original Message-----
> > From: Alex G. <mr.nuke.me(a)gmail.com>
> > Sent: Monday, July 24, 2023 3:23 PM
> > To: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>;
> > stable(a)vger.kernel.org
> > Cc: patches(a)lists.linux.dev; Limonciello, Mario
> > <Mario.Limonciello(a)amd.com>; Deucher, Alexander
> > <Alexander.Deucher(a)amd.com>; Wang, Chao-kai (Stylon)
> > <Stylon.Wang(a)amd.com>; Wu, Hersen <hersenxs.wu(a)amd.com>; Li, Roman
> > <Roman.Li(a)amd.com>; Wheeler, Daniel <Daniel.Wheeler(a)amd.com>; eniac-
> > xw.zhang(a)hp.com
> > Subject: Re: [PATCH 6.1 146/223] drm/amd/display: edp do not add non-edid
> > timings
> >
> > Hi Greg,
> >
> > This patch was
> > * originally added to v6.1.35
> > * reverted in v6.1.39
> > * added back in v6.1.40
> >
> > This patch is still reverted in mainline. Was this patch re-added by mistake in
> > v6.1.y stable?
>
> Yes, this patch should stay reverted.
Where was it reverted in the 6.1.y tree? And where was it reverted in
Linus's tree?
I think the confusion here is you have the same commit in the tree with
two different commit ids. So when I see the patches flow by, I applied
just this one to the tree, and I only see it in 6.1.40 with that id,
missing any possible revert of a previous version as the ids don't match
up.
In other words, what am I supposed to do here when you duplicate
commits? What's the revert of this commit, is it also in the tree
twice?
thanks,
greg k-h
From: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Commit 4bc91dbde0da ("drm/i915/lmem: Bypass aperture when lmem is available")
added a code path which does not map via GGTT, but was still setting the
ggtt write bit, and so triggering the GGTT flushing.
Fix it by not setting that bit unless the GGTT mapping path was used, and
replace the flush with wmb() in i915_vma_flush_writes().
This also works for the i915_gem_object_pin_map path added in
d976521a995a ("drm/i915: extend i915_vma_pin_iomap()").
It is hard to say if the fix has any observable effect, given that the
write-combine buffer gets flushed from intel_gt_flush_ggtt_writes too, but
apart from code clarity, skipping the needless GGTT flushing could be
beneficial on platforms with non-coherent GGTT. (See the code flow in
intel_gt_flush_ggtt_writes().)
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin(a)intel.com>
Fixes: 4bc91dbde0da ("drm/i915/lmem: Bypass aperture when lmem is available")
References: d976521a995a ("drm/i915: extend i915_vma_pin_iomap()")
Cc: Radhakrishna Sripada <radhakrishna.sripada(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v5.14+
---
drivers/gpu/drm/i915/i915_vma.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/i915_vma.c b/drivers/gpu/drm/i915/i915_vma.c
index ffb425ba591c..f2b626cd2755 100644
--- a/drivers/gpu/drm/i915/i915_vma.c
+++ b/drivers/gpu/drm/i915/i915_vma.c
@@ -602,7 +602,9 @@ void __iomem *i915_vma_pin_iomap(struct i915_vma *vma)
if (err)
goto err_unpin;
- i915_vma_set_ggtt_write(vma);
+ if (!i915_gem_object_is_lmem(vma->obj) &&
+ i915_vma_is_map_and_fenceable(vma))
+ i915_vma_set_ggtt_write(vma);
/* NB Access through the GTT requires the device to be awake. */
return page_mask_bits(ptr);
@@ -617,6 +619,8 @@ void i915_vma_flush_writes(struct i915_vma *vma)
{
if (i915_vma_unset_ggtt_write(vma))
intel_gt_flush_ggtt_writes(vma->vm->gt);
+ else
+ wmb(); /* Just flush the write-combine buffer. */
}
void i915_vma_unpin_iomap(struct i915_vma *vma)
--
2.39.2
Last year, the code that manages GSI channel transactions switched
from using spinlock-protected linked lists to using indexes into the
ring buffer used for a channel. Recently, Google reported seeing
transaction reference count underflows occasionally during shutdown.
Doug Anderson found a way to reproduce the issue reliably, and
bisected the issue to the commit that eliminated the linked lists
and the lock. The root cause was ultimately determined to be
related to unused transactions being committed as part of the modem
shutdown cleanup activity. Unused transactions are not normally
expected (except in error cases).
The modem uses some ranges of IPA-resident memory, and whenever it
shuts down we zero those ranges. In ipa_filter_reset_table() a
transaction is allocated to zero modem filter table entries. If
hashing is not supported, hashed table memory should not be zeroed.
But currently nothing prevents that, and the result is an unused
transaction. Something similar occurs when we zero routing table
entries for the modem.
By preventing any attempt to clear hashed tables when hashing is not
supported, the reference count underflow is avoided in this case.
Note that there likely remains an issue with properly freeing unused
transactions (if they occur due to errors). This patch addresses
only the underflows that Google originally reported.
Fixes: d338ae28d8a8 ("net: ipa: kill all other transaction lists")
Cc: <stable(a)vger.kernel.org> # 6.1.x
Tested-by: Douglas Anderson <dianders(a)chromium.org>
Signed-off-by: Alex Elder <elder(a)linaro.org>
---
drivers/net/ipa/ipa_table.c | 26 ++++++++++++++------------
1 file changed, 14 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
index 510ff2dc8999a..cd81dd916c29e 100644
--- a/drivers/net/ipa/ipa_table.c
+++ b/drivers/net/ipa/ipa_table.c
@@ -311,16 +311,15 @@ static int ipa_filter_reset(struct ipa *ipa, bool modem)
if (ret)
return ret;
- ret = ipa_filter_reset_table(ipa, IPA_MEM_V4_FILTER_HASHED, modem);
- if (ret)
- return ret;
-
ret = ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER, modem);
+ if (ret || !ipa_table_hash_support(ipa))
+ return ret;
+
+ ret = ipa_filter_reset_table(ipa, IPA_MEM_V4_FILTER_HASHED, modem);
if (ret)
return ret;
- ret = ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER_HASHED, modem);
- return ret;
+ return ipa_filter_reset_table(ipa, IPA_MEM_V6_FILTER_HASHED, modem);
}
/* The AP routes and modem routes are each contiguous within the
@@ -329,11 +328,12 @@ static int ipa_filter_reset(struct ipa *ipa, bool modem)
* */
static int ipa_route_reset(struct ipa *ipa, bool modem)
{
+ bool hash_support = ipa_table_hash_support(ipa);
struct gsi_trans *trans;
u16 first;
u16 count;
- trans = ipa_cmd_trans_alloc(ipa, 4);
+ trans = ipa_cmd_trans_alloc(ipa, hash_support ? 4 : 2);
if (!trans) {
dev_err(&ipa->pdev->dev,
"no transaction for %s route reset\n",
@@ -350,12 +350,14 @@ static int ipa_route_reset(struct ipa *ipa, bool modem)
}
ipa_table_reset_add(trans, false, first, count, IPA_MEM_V4_ROUTE);
- ipa_table_reset_add(trans, false, first, count,
- IPA_MEM_V4_ROUTE_HASHED);
-
ipa_table_reset_add(trans, false, first, count, IPA_MEM_V6_ROUTE);
- ipa_table_reset_add(trans, false, first, count,
- IPA_MEM_V6_ROUTE_HASHED);
+
+ if (hash_support) {
+ ipa_table_reset_add(trans, false, first, count,
+ IPA_MEM_V4_ROUTE_HASHED);
+ ipa_table_reset_add(trans, false, first, count,
+ IPA_MEM_V6_ROUTE_HASHED);
+ }
gsi_trans_commit_wait(trans);
--
2.34.1
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x c2a88e8bdf5f6239948d75283d0ae7e0c7945b03
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072327-repost-magma-4171@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
c2a88e8bdf5f ("drm/client: Fix memory leak in drm_client_target_cloned")
cf13909aee05 ("drm/fb-helper: Move out modeset config code")
aafa9e066872 ("drm/fb-helper: Prepare to move out modeset config code")
e5852bee90d6 ("drm/fb-helper: Remove drm_fb_helper_connector")
aec3925f093d ("drm/fb-helper: Move out commit code")
eade2a17ddc5 ("drm/fb-helper: Prepare to move out commit code")
d81294afeecd ("drm/fb-helper: Remove drm_fb_helper_crtc")
4672b1d65fc9 ("Merge remote-tracking branch 'drm/drm-next' into drm-misc-next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2a88e8bdf5f6239948d75283d0ae7e0c7945b03 Mon Sep 17 00:00:00 2001
From: Jocelyn Falempe <jfalempe(a)redhat.com>
Date: Tue, 11 Jul 2023 11:20:43 +0200
Subject: [PATCH] drm/client: Fix memory leak in drm_client_target_cloned
dmt_mode is allocated and never freed in this function.
It was found with the ast driver, but most drivers using generic fbdev
setup are probably affected.
This fixes the following kmemleak report:
backtrace:
[<00000000b391296d>] drm_mode_duplicate+0x45/0x220 [drm]
[<00000000e45bb5b3>] drm_client_target_cloned.constprop.0+0x27b/0x480 [drm]
[<00000000ed2d3a37>] drm_client_modeset_probe+0x6bd/0xf50 [drm]
[<0000000010e5cc9d>] __drm_fb_helper_initial_config_and_unlock+0xb4/0x2c0 [drm_kms_helper]
[<00000000909f82ca>] drm_fbdev_client_hotplug+0x2bc/0x4d0 [drm_kms_helper]
[<00000000063a69aa>] drm_client_register+0x169/0x240 [drm]
[<00000000a8c61525>] ast_pci_probe+0x142/0x190 [ast]
[<00000000987f19bb>] local_pci_probe+0xdc/0x180
[<000000004fca231b>] work_for_cpu_fn+0x4e/0xa0
[<0000000000b85301>] process_one_work+0x8b7/0x1540
[<000000003375b17c>] worker_thread+0x70a/0xed0
[<00000000b0d43cd9>] kthread+0x29f/0x340
[<000000008d770833>] ret_from_fork+0x1f/0x30
unreferenced object 0xff11000333089a00 (size 128):
cc: <stable(a)vger.kernel.org>
Fixes: 1d42bbc8f7f9 ("drm/fbdev: fix cloning on fbcon")
Reported-by: Zhang Yi <yizhan(a)redhat.com>
Signed-off-by: Jocelyn Falempe <jfalempe(a)redhat.com>
Reviewed-by: Javier Martinez Canillas <javierm(a)redhat.com>
Reviewed-by: Thomas Zimmermann <tzimmermann(a)suse.de>
Link: https://patchwork.freedesktop.org/patch/msgid/20230711092203.68157-2-jfalem…
diff --git a/drivers/gpu/drm/drm_client_modeset.c b/drivers/gpu/drm/drm_client_modeset.c
index 1b12a3c201a3..a4a62aa99984 100644
--- a/drivers/gpu/drm/drm_client_modeset.c
+++ b/drivers/gpu/drm/drm_client_modeset.c
@@ -311,6 +311,9 @@ static bool drm_client_target_cloned(struct drm_device *dev,
can_clone = true;
dmt_mode = drm_mode_find_dmt(dev, 1024, 768, 60, false);
+ if (!dmt_mode)
+ goto fail;
+
for (i = 0; i < connector_count; i++) {
if (!enabled[i])
continue;
@@ -326,11 +329,13 @@ static bool drm_client_target_cloned(struct drm_device *dev,
if (!modes[i])
can_clone = false;
}
+ kfree(dmt_mode);
if (can_clone) {
DRM_DEBUG_KMS("can clone using 1024x768\n");
return true;
}
+fail:
DRM_INFO("kms: can't enable cloning when we probably wanted to.\n");
return false;
}
"drm/amd/display: Add polling method to handle MST reply packet"
was intended to be backported to kernel 6.1 to solve usage of some
non-spec compliant MST hubs.
This series has a variety of dependencies across amdgpu_dm though,
so the backport requires several manual fixups to avoid hitting
file and symbol renames.
Details of individual changes are in the commit message for each
patch.
Hamza Mahfooz (1):
drm/amd/display: use max_dsc_bpp in amdgpu_dm
Hersen Wu (1):
drm/amd/display: fix linux dp link lost handled only one time
Qingqing Zhuo (1):
drm/amd/display: force connector state when bpc changes during
compliance
Srinivasan Shanmugam (2):
drm/amd/display: fix some coding style issues
drm/amd/display: Clean up errors & warnings in amdgpu_dm.c
Wayne Lin (2):
drm/dp_mst: Clear MSG_RDY flag before sending new message
drm/amd/display: Add polling method to handle MST reply packet
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 407 +++++++++---------
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h | 12 +
.../amd/display/amdgpu_dm/amdgpu_dm_helpers.c | 125 ++++++
.../display/amdgpu_dm/amdgpu_dm_mst_types.c | 121 +++++-
.../display/amdgpu_dm/amdgpu_dm_mst_types.h | 11 +
.../gpu/drm/amd/display/dc/core/dc_link_dp.c | 141 +-----
drivers/gpu/drm/amd/display/dc/dm_helpers.h | 6 +
.../gpu/drm/amd/display/dc/inc/dc_link_dp.h | 4 +
drivers/gpu/drm/display/drm_dp_mst_topology.c | 54 ++-
drivers/gpu/drm/i915/display/intel_dp.c | 7 +-
drivers/gpu/drm/nouveau/dispnv50/disp.c | 12 +-
include/drm/display/drm_dp_mst_helper.h | 7 +-
12 files changed, 571 insertions(+), 336 deletions(-)
--
2.34.1
Hi,
The patch "4f6d9e38c4d2 ("drm/amd/display: Add polling method to handle
MST reply packet")" was tagged for stable, but it failed to apply to
both 6.4.y and 6.1.y.
The 6.4.y backport is missing a dependency. So to fix this can you
please apply:
87279fdf5ee0 ("drm/amd/display: Clean up errors & warnings in amdgpu_dm.c")
4f6d9e38c4d2 ("drm/amd/display: Add polling method to handle MST reply
packet")
I intend to also send out a proper 6.1.y backport, but it's a lot more
complex due to other driver changes and will come separately.
Thanks,
Hi,
Wish my email finds you well.
This is Steven from Xiamen Oready Industry & Trade Co.,Ltd. We are one of the leading manufacturer of bags in Xiamen,China.
Here is a good quality drawstring backpack I would like to recommend to you:
Material: 300D polyester + 10mm cord.
Size: 38x42cm
Side zipper pocket fits with mobile phone and keys.
Print: silkscreen print / heat transfer print / sublimation print
MOQ: 1000 pcs
We have many more designs you may like, please don't hesitate to contact me if you have any inquiry.
Best regards,
Steven Xiu
Xiamen Oready Industry & Trade Co.,Ltd.
Why, Lord Anvers asked me to marry him
The mid-day sun beat down upon the stone "Scales
2023/7/25 16:51:02
Queued for loongarch-next, thanks.
Huacai
On Tue, Jul 25, 2023 at 3:05 PM Tiezhu Yang <yangtiezhu(a)loongson.cn> wrote:
>
> As the code comment says, the initial aim is to reduce one instruction
> in some corner cases, if bit[51:31] is all 0 or all 1, no need to call
> lu32id, that is to say, it should call lu32id only if bit[51:31] is not
> all 0 and not all 1. The current code always call lu32id, the result is
> right but the logic is unexpected and wrong, fix it.
>
> Cc: stable(a)vger.kernel.org # 6.1
> Fixes: 5dc615520c4d ("LoongArch: Add BPF JIT support")
> Reported-by: Colin King (gmail) <colin.i.king(a)gmail.com>
> Closes: https://lore.kernel.org/all/bcf97046-e336-712a-ac68-7fd194f2953e@gmail.com/
> Signed-off-by: Tiezhu Yang <yangtiezhu(a)loongson.cn>
> ---
> arch/loongarch/net/bpf_jit.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/arch/loongarch/net/bpf_jit.h b/arch/loongarch/net/bpf_jit.h
> index c335dc4..6858633 100644
> --- a/arch/loongarch/net/bpf_jit.h
> +++ b/arch/loongarch/net/bpf_jit.h
> @@ -150,7 +150,7 @@ static inline void move_imm(struct jit_ctx *ctx, enum loongarch_gpr rd, long imm
> * no need to call lu32id to do a new filled operation.
> */
> imm_51_31 = (imm >> 31) & 0x1fffff;
> - if (imm_51_31 != 0 || imm_51_31 != 0x1fffff) {
> + if (imm_51_31 != 0 && imm_51_31 != 0x1fffff) {
> /* lu32id rd, imm_51_32 */
> imm_51_32 = (imm >> 32) & 0xfffff;
> emit_insn(ctx, lu32id, rd, imm_51_32);
> --
> 2.1.0
>
>
`rustc` outputs by default the temporary files (i.e. the ones saved
by `-Csave-temps`, such as `*.rcgu*` files) in the current working
directory when `-o` and `--out-dir` are not given (even if
`--emit=x=path` is given, i.e. it does not use those for temporaries).
Since out-of-tree modules are compiled from the `linux` tree,
`rustc` then tries to create them there, which may not be accessible.
Thus pass `--out-dir` explicitly, even if it is just for the temporary
files.
Similarly, do so for Rust host programs too.
Reported-by: Raphael Nestler <raphael.nestler(a)gmail.com>
Closes: https://github.com/Rust-for-Linux/linux/issues/1015
Reported-by: Andrea Righi <andrea.righi(a)canonical.com>
Tested-by: Raphael Nestler <raphael.nestler(a)gmail.com> # non-hostprogs
Tested-by: Andrea Righi <andrea.righi(a)canonical.com> # non-hostprogs
Fixes: 295d8398c67e ("kbuild: specify output names separately for each emission type from rustc")
Cc: stable(a)vger.kernel.org
Signed-off-by: Miguel Ojeda <ojeda(a)kernel.org>
---
v2:
- Add fix for host programs too (Masahiro).
scripts/Makefile.build | 5 ++++-
scripts/Makefile.host | 6 +++++-
2 files changed, 9 insertions(+), 2 deletions(-)
diff --git a/scripts/Makefile.build b/scripts/Makefile.build
index 6413342a03f4..82e3fb19fdaf 100644
--- a/scripts/Makefile.build
+++ b/scripts/Makefile.build
@@ -264,6 +264,9 @@ $(obj)/%.lst: $(src)/%.c FORCE
rust_allowed_features := new_uninit
+# `--out-dir` is required to avoid temporaries being created by `rustc` in the
+# current working directory, which may be not accessible in the out-of-tree
+# modules case.
rust_common_cmd = \
RUST_MODFILE=$(modfile) $(RUSTC_OR_CLIPPY) $(rust_flags) \
-Zallow-features=$(rust_allowed_features) \
@@ -272,7 +275,7 @@ rust_common_cmd = \
--extern alloc --extern kernel \
--crate-type rlib -L $(objtree)/rust/ \
--crate-name $(basename $(notdir $@)) \
- --emit=dep-info=$(depfile)
+ --out-dir $(dir $@) --emit=dep-info=$(depfile)
# `--emit=obj`, `--emit=asm` and `--emit=llvm-ir` imply a single codegen unit
# will be used. We explicitly request `-Ccodegen-units=1` in any case, and
diff --git a/scripts/Makefile.host b/scripts/Makefile.host
index 7aea9005e497..8f7f842b54f9 100644
--- a/scripts/Makefile.host
+++ b/scripts/Makefile.host
@@ -86,7 +86,11 @@ hostc_flags = -Wp,-MMD,$(depfile) \
hostcxx_flags = -Wp,-MMD,$(depfile) \
$(KBUILD_HOSTCXXFLAGS) $(HOST_EXTRACXXFLAGS) \
$(HOSTCXXFLAGS_$(target-stem).o)
-hostrust_flags = --emit=dep-info=$(depfile) \
+
+# `--out-dir` is required to avoid temporaries being created by `rustc` in the
+# current working directory, which may be not accessible in the out-of-tree
+# modules case.
+hostrust_flags = --out-dir $(dir $@) --emit=dep-info=$(depfile) \
$(KBUILD_HOSTRUSTFLAGS) $(HOST_EXTRARUSTFLAGS) \
$(HOSTRUSTFLAGS_$(target-stem))
base-commit: 06c2afb862f9da8dc5efa4b6076a0e48c3fbaaa5
--
2.41.0
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x cf2ffdea0839398cb0551762af7f5efb0a6e0fea
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072337-dreamlike-rewrite-a12e@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From cf2ffdea0839398cb0551762af7f5efb0a6e0fea Mon Sep 17 00:00:00 2001
From: Heiner Kallweit <hkallweit1(a)gmail.com>
Date: Tue, 18 Jul 2023 13:11:31 +0200
Subject: [PATCH] r8169: revert 2ab19de62d67 ("r8169: remove ASPM restrictions
now that ASPM is disabled during NAPI poll")
There have been reports that on a number of systems this change breaks
network connectivity. Therefore effectively revert it. Mainly affected
seem to be systems where BIOS denies ASPM access to OS.
Due to later changes we can't do a direct revert.
Fixes: 2ab19de62d67 ("r8169: remove ASPM restrictions now that ASPM is disabled during NAPI poll")
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/netdev/e47bac0d-e802-65e1-b311-6acb26d5cf10@freenet…
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217596
Signed-off-by: Heiner Kallweit <hkallweit1(a)gmail.com>
Link: https://lore.kernel.org/r/57f13ec0-b216-d5d8-363d-5b05528ec5fb@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
diff --git a/drivers/net/ethernet/realtek/r8169_main.c b/drivers/net/ethernet/realtek/r8169_main.c
index fce4a2b908c2..8a8b7d8a5c3f 100644
--- a/drivers/net/ethernet/realtek/r8169_main.c
+++ b/drivers/net/ethernet/realtek/r8169_main.c
@@ -623,6 +623,7 @@ struct rtl8169_private {
int cfg9346_usage_count;
unsigned supports_gmii:1;
+ unsigned aspm_manageable:1;
dma_addr_t counters_phys_addr;
struct rtl8169_counters *counters;
struct rtl8169_tc_offsets tc_offset;
@@ -2746,7 +2747,8 @@ static void rtl_hw_aspm_clkreq_enable(struct rtl8169_private *tp, bool enable)
if (tp->mac_version < RTL_GIGA_MAC_VER_32)
return;
- if (enable) {
+ /* Don't enable ASPM in the chip if OS can't control ASPM */
+ if (enable && tp->aspm_manageable) {
/* On these chip versions ASPM can even harm
* bus communication of other PCI devices.
*/
@@ -5165,6 +5167,16 @@ static void rtl_init_mac_address(struct rtl8169_private *tp)
rtl_rar_set(tp, mac_addr);
}
+/* register is set if system vendor successfully tested ASPM 1.2 */
+static bool rtl_aspm_is_safe(struct rtl8169_private *tp)
+{
+ if (tp->mac_version >= RTL_GIGA_MAC_VER_61 &&
+ r8168_mac_ocp_read(tp, 0xc0b2) & 0xf)
+ return true;
+
+ return false;
+}
+
static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
{
struct rtl8169_private *tp;
@@ -5234,6 +5246,19 @@ static int rtl_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
xid);
tp->mac_version = chipset;
+ /* Disable ASPM L1 as that cause random device stop working
+ * problems as well as full system hangs for some PCIe devices users.
+ * Chips from RTL8168h partially have issues with L1.2, but seem
+ * to work fine with L1 and L1.1.
+ */
+ if (rtl_aspm_is_safe(tp))
+ rc = 0;
+ else if (tp->mac_version >= RTL_GIGA_MAC_VER_46)
+ rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);
+ else
+ rc = pci_disable_link_state(pdev, PCIE_LINK_STATE_L1);
+ tp->aspm_manageable = !rc;
+
tp->dash_type = rtl_check_dash(tp);
tp->cp_cmd = RTL_R16(tp, CPlusCmd) & CPCMD_MASK;
After an initial link up the CAN device is in ERROR-ACTIVE mode. Due
to a missing CAN_STATE_STOPPED in gs_can_close() it doesn't change to
STOPPED after a link down:
| ip link set dev can0 up
| ip link set dev can0 down
| ip --details link show can0
| 13: can0: <NOARP,ECHO> mtu 16 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 10
| link/can promiscuity 0 allmulti 0 minmtu 0 maxmtu 0
| can state ERROR-ACTIVE restart-ms 1000
Add missing assignment of CAN_STATE_STOPPED in gs_can_close().
Cc: stable(a)vger.kernel.org
Fixes: d08e973a77d1 ("can: gs_usb: Added support for the GS_USB CAN devices")
Link: https://lore.kernel.org/all/20230718-gs_usb-fix-can-state-v1-1-f19738ae2c23…
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/usb/gs_usb.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/net/can/usb/gs_usb.c b/drivers/net/can/usb/gs_usb.c
index f418066569fc..bd9eb066ecf1 100644
--- a/drivers/net/can/usb/gs_usb.c
+++ b/drivers/net/can/usb/gs_usb.c
@@ -1030,6 +1030,8 @@ static int gs_can_close(struct net_device *netdev)
usb_kill_anchored_urbs(&dev->tx_submitted);
atomic_set(&dev->active_tx_urbs, 0);
+ dev->can.state = CAN_STATE_STOPPED;
+
/* reset the device */
rc = gs_cmd_reset(dev);
if (rc < 0)
base-commit: 9f9d4c1a2e82174a4e799ec405284a2b0de32b6a
--
2.40.1
Hi,
as there are new hardware directives, we need a little adaptation
for the AUX invalidation sequence.
In this version we support all the engines affected by this
change.
The stable backport has some challenges because the original
patch that this series fixes has had more changes in between.
This 9th version is simplified compared to the previous round.
Thanks a lot Nirmoy, Andrzej and Matt for your review and for the
fruitful discussions!
Thanks,
Andi
Changelog:
=========
v8 -> v9
- Removed the patch that was refactoring into
intel_emit_pipe_control_cs() the pipe control emission as it
was adding more confusion than simplification.
- Cleaned up the engine quiescing sequence for BCS, VECS and VCS
engines. After this the patch that was extending the quiescing
sequence was very little to be kept on its own and squashed
with the one adding the CCS_FLUSH in the pipe control.
- The above changes have brought some more refactoring in the
very last patch.
- Added, whenever needed, the "Required:" tag to make sure
dependencies are held.
v7 -> v8
- Removed the aux invalidation from the device info and added a
helper function, instead (patch 2).
- Use "MTL and beyond" instead of "MTL+" in comments.
- Use the "gen12_" prefix instead of "intel_".
- In patch 6 return an int error instead of an error embedded in
the pointer in the intel_emit_pipe_control_cs() function and
propagate the error to the upper layers.
v6 -> v7
- Fix correct sequence applied to the correct engine. A little
confusion promptly cought by Nirmoy when applying to the VD
engine the sequence belonging to the compute engines. Thanks a
lot, Nirmoy!
v5 -> v6
- Fixed ccs flush in the engines VE and BCS. They are sent as a
separate command instead of added in the pipe control.
- Separated the CCS flusing in the pipe control patch with the
quiescing of the memory. They were meant to be on separate
patch already in the previous verision, but apparently I
squashed them by mistake.
v4 -> v5
- The AUX CCS is added as a device property instead of checking
against FLAT CCS. This adds the new HAS_AUX_CCS check
(Patch 2, new).
- little and trivial refactoring here and there.
- extended the flags{0,1}/bit_group_{0,1} renaming to other
functions.
- Created an intel_emit_pipe_control_cs() wrapper for submitting
the pipe control.
- Quiesce memory for all the engines, not just RCS (Patch 6,
new).
- The PIPE_CONTROL_CCS_FLUSH is added to all the engines.
- Remove redundant EMIT_FLUSH_CCS mode flag.
- Remove unnecessary NOOPs from the command streamer for
invalidating the CCS table.
- Use INVALID_MMIO_REG and gen12_get_aux_inv_reg() instad of
__MMIO(0) and reg.reg.
- Remove useless wrapper and just use gen12_get_aux_inv_reg().
v3 -> v4
- A trivial patch 3 is added to rename the flags with
bit_group_{0,1} to align with the datasheet naming.
- Patch 4 fixes a confusion I made where the CCS flag was
applied to the wrong bit group.
v2 -> v3
- added r-b from Nirmoy in patch 1 and 4.
- added patch 3 which enables the ccs_flush in the control pipe
for mtl+ compute and render engines.
- added redundant checks in patch 2 for enabling the EMIT_FLUSH
flag.
v1 -> v2
- add a clean up preliminary patch for the existing registers
- add support for more engines
- add the Fixes tag
Andi Shyti (5):
drm/i915/gt: Cleanup aux invalidation registers
drm/i915: Add the gen12_needs_ccs_aux_inv helper
drm/i915/gt: Rename flags with bit_group_X according to the datasheet
drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control and in the
CS
drm/i915/gt: Support aux invalidation on all engines
Jonathan Cavitt (2):
drm/i915/gt: Ensure memory quiesced before invalidation
drm/i915/gt: Poll aux invalidation register bit on invalidation
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 140 ++++++++++++-------
drivers/gpu/drm/i915/gt/gen8_engine_cs.h | 21 +--
drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 2 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 16 +--
drivers/gpu/drm/i915/gt/intel_lrc.c | 17 +--
5 files changed, 117 insertions(+), 79 deletions(-)
--
2.40.1
Last year, the code that manages GSI channel transactions switched
from using spinlock-protected linked lists to using indexes into the
ring buffer used for a channel. Recently, Google reported seeing
transaction reference count underflows occasionally during shutdown.
Doug Anderson found a way to reproduce the issue reliably, and
bisected the issue to the commit that eliminated the linked lists
and the lock. The root cause was ultimately determined to be
related to unused transactions being committed as part of the modem
shutdown cleanup activity. Unused transactions are not normally
expected (except in error cases).
The modem uses some ranges of IPA-resident memory, and whenever it
shuts down we zero those ranges. In ipa_filter_reset_table() a
transaction is allocated to zero modem filter table entries. If
hashing is not supported, hashed table memory should not be zeroed.
But currently nothing prevents that, and the result is an unused
transaction. Something similar occurs when we zero routing table
entries for the modem.
By preventing any attempt to clear hashed tables when hashing is not
supported, the reference count underflow is avoided in this case.
Note that there likely remains an issue with properly freeing unused
transactions (if they occur due to errors). This patch addresses
only the underflows that Google originally reported.
Fixes: d338ae28d8a8 ("net: ipa: kill all other transaction lists")
Cc: <stable(a)vger.kernel.org> # 6.4.x
Tested-by: Douglas Anderson <dianders(a)chromium.org>
Signed-off-by: Alex Elder <elder(a)linaro.org>
---
drivers/net/ipa/ipa_table.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
index f0529c31d0b6e..7b637bb8b41c8 100644
--- a/drivers/net/ipa/ipa_table.c
+++ b/drivers/net/ipa/ipa_table.c
@@ -273,16 +273,15 @@ static int ipa_filter_reset(struct ipa *ipa, bool modem)
if (ret)
return ret;
- ret = ipa_filter_reset_table(ipa, true, false, modem);
- if (ret)
- return ret;
-
ret = ipa_filter_reset_table(ipa, false, true, modem);
+ if (ret || !ipa_table_hash_support(ipa))
+ return ret;
+
+ ret = ipa_filter_reset_table(ipa, true, false, modem);
if (ret)
return ret;
- ret = ipa_filter_reset_table(ipa, true, true, modem);
- return ret;
+ return ipa_filter_reset_table(ipa, true, true, modem);
}
/* The AP routes and modem routes are each contiguous within the
@@ -291,12 +290,13 @@ static int ipa_filter_reset(struct ipa *ipa, bool modem)
* */
static int ipa_route_reset(struct ipa *ipa, bool modem)
{
+ bool hash_support = ipa_table_hash_support(ipa);
u32 modem_route_count = ipa->modem_route_count;
struct gsi_trans *trans;
u16 first;
u16 count;
- trans = ipa_cmd_trans_alloc(ipa, 4);
+ trans = ipa_cmd_trans_alloc(ipa, hash_support ? 4 : 2);
if (!trans) {
dev_err(&ipa->pdev->dev,
"no transaction for %s route reset\n",
@@ -313,10 +313,12 @@ static int ipa_route_reset(struct ipa *ipa, bool modem)
}
ipa_table_reset_add(trans, false, false, false, first, count);
- ipa_table_reset_add(trans, false, true, false, first, count);
-
ipa_table_reset_add(trans, false, false, true, first, count);
- ipa_table_reset_add(trans, false, true, true, first, count);
+
+ if (hash_support) {
+ ipa_table_reset_add(trans, false, true, false, first, count);
+ ipa_table_reset_add(trans, false, true, true, first, count);
+ }
gsi_trans_commit_wait(trans);
--
2.34.1
This series attempts to fix a reported exception with mmap
on newer kernels.
Fixes: 1543b4c5071c ("fs/9p: remove writeback fid and fix per-file modes")
Link: https://lore.kernel.org/v9fs/ZK25XZ%2BGpR3KHIB%2F@pengutronix.de/
Reported-by: Robert Schwebel <r.schwebel(a)pengutronix.de>
Signed-off-by: Eric Van Hensbergen <ericvh(a)kernel.org>
---
Changes in v3:
- Clarify debug print to read-only mmap mode versus no mmap mode in
v9fs_file_mmap
- Fix suggested regression tags and propagate across series
- Link to v2: https://lore.kernel.org/r/20230716-fixes-overly-restrictive-mmap-v2-0-147d6…
Changes in v2:
- fix requested changes in commit messages
- add patch to remove unnecessary invalidate_inode_pages in mmap readonly path
- Link to v1: https://lore.kernel.org/r/20230716-fixes-overly-restrictive-mmap-v1-0-0683b…
---
Eric Van Hensbergen (4):
fs/9p: remove unnecessary and overrestrictive check
fs/9p: fix typo in comparison logic for cache mode
fs/9p: fix type mismatch in file cache mode helper
fs/9p: remove unnecessary invalidate_inode_pages2
fs/9p/fid.h | 6 +++---
fs/9p/vfs_file.c | 5 +----
2 files changed, 4 insertions(+), 7 deletions(-)
---
base-commit: 95f41d87810083d8b3dedcce46a4e356cf4a9673
change-id: 20230716-fixes-overly-restrictive-mmap-30a23501e787
Best regards,
--
Eric Van Hensbergen <ericvh(a)kernel.org>
On hardware that supports Indirect Branch Tracking (IBT), Hyper-V VMs
with ConfigVersion 9.3 or later support IBT in the guest. However,
current versions of Hyper-V have a bug in that there's not an ENDBR64
instruction at the beginning of the hypercall page. Since hypercalls are
made with an indirect call to the hypercall page, all hypercall attempts
fail with an exception and Linux panics.
A Hyper-V fix is in progress to add ENDBR64. But guard against the Linux
panic by clearing X86_FEATURE_IBT if the hypercall page doesn't start
with ENDBR. The VM will boot and run without IBT.
If future Linux 32-bit kernels were to support IBT, additional hypercall
page hackery would be needed to make IBT work for such kernels in a
Hyper-V VM.
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
Changes in v2:
* Use pr_warn() instead of pr_info() [Peter Zijlstra]
arch/x86/hyperv/hv_init.c | 21 +++++++++++++++++++++
1 file changed, 21 insertions(+)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 6c04b52..5cbee24 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -14,6 +14,7 @@
#include <asm/apic.h>
#include <asm/desc.h>
#include <asm/sev.h>
+#include <asm/ibt.h>
#include <asm/hypervisor.h>
#include <asm/hyperv-tlfs.h>
#include <asm/mshyperv.h>
@@ -472,6 +473,26 @@ void __init hyperv_init(void)
}
/*
+ * Some versions of Hyper-V that provide IBT in guest VMs have a bug
+ * in that there's no ENDBR64 instruction at the entry to the
+ * hypercall page. Because hypercalls are invoked via an indirect call
+ * to the hypercall page, all hypercall attempts fail when IBT is
+ * enabled, and Linux panics. For such buggy versions, disable IBT.
+ *
+ * Fixed versions of Hyper-V always provide ENDBR64 on the hypercall
+ * page, so if future Linux kernel versions enable IBT for 32-bit
+ * builds, additional hypercall page hackery will be required here
+ * to provide an ENDBR32.
+ */
+#ifdef CONFIG_X86_KERNEL_IBT
+ if (cpu_feature_enabled(X86_FEATURE_IBT) &&
+ *(u32 *)hv_hypercall_pg != gen_endbr()) {
+ setup_clear_cpu_cap(X86_FEATURE_IBT);
+ pr_warn("Hyper-V: Disabling IBT because of Hyper-V bug\n");
+ }
+#endif
+
+ /*
* hyperv_init() is called before LAPIC is initialized: see
* apic_intr_mode_init() -> x86_platform.apic_post_init() and
* apic_bsp_setup() -> setup_local_APIC(). The direct-mode STIMER
--
1.8.3.1
Hi Andres,
With this commit applied to the 6.1 and later kernels (others not
tested) the iowait time ("wa" field in top) in an ARM64 build running
on a 4 core CPU (a Raspberry Pi 4 B) increases to 25%, as if one core
is permanently blocked on I/O. The change can be observed after
installing mariadb-server (no configuration or use is required). After
reverting just this commit, "wa" drops to zero again.
I can believe that this change hasn't negatively affected performance,
but the result is misleading. I also think it's pushing the boundaries
of what a back-port to stable should do.
Phil
From: Robin Murphy <robin.murphy(a)arm.com>
commit 309a15cb16bb075da1c99d46fb457db6a1a2669e upstream
To work around MMU-700 erratum 2812531 we need to ensure that certain
sequences of commands cannot be issued without an intervening sync. In
practice this falls out of our current command-batching machinery
anyway - each batch only contains a single type of invalidation command,
and ends with a sync. The only exception is when a batch is sufficiently
large to need issuing across multiple command queue slots, wherein the
earlier slots will not contain a sync and thus may in theory interleave
with another batch being issued in parallel to create an affected
sequence across the slot boundary.
Since MMU-700 supports range invalidate commands and thus we will prefer
to use them (which also happens to avoid conditions for other errata),
I'm not entirely sure it's even possible for a single high-level
invalidate call to generate a batch of more than 63 commands, but for
the sake of robustness and documentation, wire up an option to enforce
that a sync is always inserted for every slot issued.
The other aspect is that the relative order of DVM commands cannot be
controlled, so DVM cannot be used. Again that is already the status quo,
but since we have at least defined ARM_SMMU_FEAT_BTM, we can explicitly
disable it for documentation purposes even if it's not wired up anywhere
yet.
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
Reviewed-by: Nicolin Chen <nicolinc(a)nvidia.com>
Link: https://lore.kernel.org/r/330221cdfd0003cd51b6c04e7ff3566741ad8374.16837312…
Signed-off-by: Will Deacon <will(a)kernel.org>
Signed-off-by: Easwar Hariharan <eahariha(a)linux.microsoft.com>
---
Documentation/arm64/silicon-errata.rst | 4 +++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c | 39 +++++++++++++++++++++
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h | 1 +
3 files changed, 44 insertions(+)
diff --git a/Documentation/arm64/silicon-errata.rst b/Documentation/arm64/silicon-errata.rst
index 55492fea4427..120784507bc0 100644
--- a/Documentation/arm64/silicon-errata.rst
+++ b/Documentation/arm64/silicon-errata.rst
@@ -138,6 +138,10 @@ stable kernels.
+----------------+-----------------+-----------------+-----------------------------+
| ARM | MMU-500 | #841119,826419 | N/A |
+----------------+-----------------+-----------------+-----------------------------+
+| ARM | MMU-600 | #1076982 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+| ARM | MMU-700 | #2812531 | N/A |
++----------------+-----------------+-----------------+-----------------------------+
+----------------+-----------------+-----------------+-----------------------------+
| Broadcom | Brahma-B53 | N/A | ARM64_ERRATUM_845719 |
+----------------+-----------------+-----------------+-----------------------------+
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
index d4d8bfee9feb..3e20c60e9c32 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -882,6 +882,12 @@ static void arm_smmu_cmdq_batch_add(struct arm_smmu_device *smmu,
{
int index;
+ if (cmds->num == CMDQ_BATCH_ENTRIES - 1 &&
+ (smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC)) {
+ arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, true);
+ cmds->num = 0;
+ }
+
if (cmds->num == CMDQ_BATCH_ENTRIES) {
arm_smmu_cmdq_issue_cmdlist(smmu, cmds->cmds, cmds->num, false);
cmds->num = 0;
@@ -3410,6 +3416,39 @@ static int arm_smmu_device_reset(struct arm_smmu_device *smmu, bool bypass)
return 0;
}
+#define IIDR_IMPLEMENTER_ARM 0x43b
+#define IIDR_PRODUCTID_ARM_MMU_600 0x483
+#define IIDR_PRODUCTID_ARM_MMU_700 0x487
+
+static void arm_smmu_device_iidr_probe(struct arm_smmu_device *smmu)
+{
+ u32 reg;
+ unsigned int implementer, productid, variant, revision;
+
+ reg = readl_relaxed(smmu->base + ARM_SMMU_IIDR);
+ implementer = FIELD_GET(IIDR_IMPLEMENTER, reg);
+ productid = FIELD_GET(IIDR_PRODUCTID, reg);
+ variant = FIELD_GET(IIDR_VARIANT, reg);
+ revision = FIELD_GET(IIDR_REVISION, reg);
+
+ switch (implementer) {
+ case IIDR_IMPLEMENTER_ARM:
+ switch (productid) {
+ case IIDR_PRODUCTID_ARM_MMU_600:
+ /* Arm erratum 1076982 */
+ if (variant == 0 && revision <= 2)
+ smmu->features &= ~ARM_SMMU_FEAT_SEV;
+ break;
+ case IIDR_PRODUCTID_ARM_MMU_700:
+ /* Arm erratum 2812531 */
+ smmu->features &= ~ARM_SMMU_FEAT_BTM;
+ smmu->options |= ARM_SMMU_OPT_CMDQ_FORCE_SYNC;
+ break;
+ }
+ break;
+ }
+}
+
static int arm_smmu_device_hw_probe(struct arm_smmu_device *smmu)
{
u32 reg;
diff --git a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
index cd48590ada30..406ac0426762 100644
--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
@@ -644,6 +644,7 @@ struct arm_smmu_device {
#define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0)
#define ARM_SMMU_OPT_PAGE0_REGS_ONLY (1 << 1)
#define ARM_SMMU_OPT_MSIPOLL (1 << 2)
+#define ARM_SMMU_OPT_CMDQ_FORCE_SYNC (1 << 3)
u32 options;
struct arm_smmu_cmdq cmdq;
--
2.25.1
The patch titled
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
Date: Fri, 21 Jul 2023 05:46:43 +0200
When VMAs are merged, dup_anon_vma() is called with `dst` pointing to the
VMA that is being expanded to cover the area previously occupied by
another VMA. This currently happens while `dst` is not write-locked.
This means that, in the `src->anon_vma && !dst->anon_vma` case, as soon as
the assignment `dst->anon_vma = src->anon_vma` has happened, concurrent
page faults can happen on `dst` under the per-VMA lock. This is already
icky in itself, since such page faults can now install pages into `dst`
that are attached to an `anon_vma` that is not yet tied back to the
`anon_vma` with an `anon_vma_chain`. But if `anon_vma_clone()` fails due
to an out-of-memory error, things get much worse: `anon_vma_clone()` then
reverts `dst->anon_vma` back to NULL, and `dst` remains completely
unconnected to the `anon_vma`, even though we can have pages in the area
covered by `dst` that point to the `anon_vma`.
This means the `anon_vma` of such pages can be freed while the pages are
still mapped into userspace, which leads to UAF when a helper like
folio_lock_anon_vma_read() tries to look up the anon_vma of such a page.
This theoretically is a security bug, but I believe it is really hard to
actually trigger as an unprivileged user because it requires that you can
make an order-0 GFP_KERNEL allocation fail, and the page allocator tries
pretty hard to prevent that.
I think doing the vma_start_write() call inside dup_anon_vma() is the most
straightforward fix for now.
For a kernel-assisted reproducer, see the notes section of the patch mail.
Link: https://lkml.kernel.org/r/20230721034643.616851-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mmap.c | 1 +
1 file changed, 1 insertion(+)
--- a/mm/mmap.c~mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma
+++ a/mm/mmap.c
@@ -615,6 +615,7 @@ static inline int dup_anon_vma(struct vm
* anon pages imported.
*/
if (src->anon_vma && !dst->anon_vma) {
+ vma_start_write(dst);
dst->anon_vma = src->anon_vma;
return anon_vma_clone(dst, src);
}
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
mm-lock-vma-in-dup_anon_vma-before-setting-anon_vma.patch
I'm announcing the release of the 4.19.289 kernel.
All AMD processor users of the 4.19 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 4.19.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-4.19.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 4.19.289
I'm announcing the release of the 5.4.250 kernel.
All AMD processor users of the 5.4 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.4.250
I'm announcing the release of the 5.10.187 kernel.
All AMD processor users of the 5.10 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.10.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.10.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
arch/x86/kernel/cpu/microcode/amd.c | 2
7 files changed, 135 insertions(+), 74 deletions(-)
Borislav Petkov (AMD) (3):
x86/microcode/AMD: Load late on both threads too
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.10.187
I'm announcing the release of the 5.15.122 kernel.
All AMD processor users of the 5.15 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 5.15.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-5.15.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 5.15.122
I'm announcing the release of the 6.1.41 kernel.
All AMD processor users of the 6.1 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 6.1.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.1.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 6.1.41
I'm announcing the release of the 6.4.6 kernel.
All AMD processor users of the 6.4 kernel series who have not updated
their microcode to the latest version, must upgrade.
The updated 6.4.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.4.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2
arch/x86/include/asm/microcode.h | 1
arch/x86/include/asm/microcode_amd.h | 2
arch/x86/include/asm/msr-index.h | 1
arch/x86/kernel/cpu/amd.c | 199 ++++++++++++++++++++++-------------
arch/x86/kernel/cpu/common.c | 2
6 files changed, 134 insertions(+), 73 deletions(-)
Borislav Petkov (AMD) (2):
x86/cpu/amd: Move the errata checking functionality up
x86/cpu/amd: Add a Zenbleed fix
Greg Kroah-Hartman (1):
Linux 6.4.6
The patch titled
Subject: mm: Fix memory ordering for mm_lock_seq and vm_lock_seq
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jann Horn <jannh(a)google.com>
Subject: mm: Fix memory ordering for mm_lock_seq and vm_lock_seq
Date: Sat, 22 Jul 2023 00:51:07 +0200
mm->mm_lock_seq effectively functions as a read/write lock; therefore it
must be used with acquire/release semantics.
A specific example is the interaction between userfaultfd_register() and
lock_vma_under_rcu().
userfaultfd_register() does the following from the point where it changes
a VMA's flags to the point where concurrent readers are permitted again
(in a simple scenario where only a single private VMA is accessed and no
merging/splitting is involved):
userfaultfd_register
userfaultfd_set_vm_flags
vm_flags_reset
vma_start_write
down_write(&vma->vm_lock->lock)
vma->vm_lock_seq = mm_lock_seq [marks VMA as busy]
up_write(&vma->vm_lock->lock)
vm_flags_init
[sets VM_UFFD_* in __vm_flags]
vma->vm_userfaultfd_ctx.ctx = ctx
mmap_write_unlock
vma_end_write_all
WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1) [unlocks VMA]
There are no memory barriers in between the __vm_flags update and the
mm->mm_lock_seq update that unlocks the VMA, so the unlock can be
reordered to above the `vm_flags_init()` call, which means from the
perspective of a concurrent reader, a VMA can be marked as a userfaultfd
VMA while it is not VMA-locked. That's bad, we definitely need a
store-release for the unlock operation.
The non-atomic write to vma->vm_lock_seq in vma_start_write() is mostly
fine because all accesses to vma->vm_lock_seq that matter are always
protected by the VMA lock. There is a racy read in vma_start_read()
though that can tolerate false-positives, so we should be using
WRITE_ONCE() to keep things tidy and data-race-free (including for KCSAN).
On the other side, lock_vma_under_rcu() works as follows in the relevant
region for locking and userfaultfd check:
lock_vma_under_rcu
vma_start_read
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [early bailout]
down_read_trylock(&vma->vm_lock->lock)
vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq) [main check]
userfaultfd_armed
checks vma->vm_flags & __VM_UFFD_FLAGS
Here, the interesting aspect is how far down the mm->mm_lock_seq read can
be reordered - if this read is reordered down below the vma->vm_flags
access, this could cause lock_vma_under_rcu() to partly operate on
information that was read while the VMA was supposed to be locked. To
prevent this kind of downwards bleeding of the mm->mm_lock_seq read, we
need to read it with a load-acquire.
Some of the comment wording is based on suggestions by Suren.
BACKPORT WARNING: One of the functions changed by this patch (which I've
written against Linus' tree) is vma_try_start_write(), but this function
no longer exists in mm/mm-everything. I don't know whether the merged
version of this patch will be ordered before or after the patch that
removes vma_try_start_write(). If you're backporting this patch to a tree
with vma_try_start_write(), make sure this patch changes that function.
Link: https://lkml.kernel.org/r/20230721225107.942336-1-jannh@google.com
Fixes: 5e31275cc997 ("mm: add per-VMA lock and helper functions to control it")
Signed-off-by: Jann Horn <jannh(a)google.com>
Reviewed-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/mm.h | 29 +++++++++++++++++++++++------
include/linux/mm_types.h | 28 ++++++++++++++++++++++++++++
include/linux/mmap_lock.h | 10 ++++++++--
3 files changed, 59 insertions(+), 8 deletions(-)
--- a/include/linux/mmap_lock.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mmap_lock.h
@@ -76,8 +76,14 @@ static inline void mmap_assert_write_loc
static inline void vma_end_write_all(struct mm_struct *mm)
{
mmap_assert_write_locked(mm);
- /* No races during update due to exclusive mmap_lock being held */
- WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+ /*
+ * Nobody can concurrently modify mm->mm_lock_seq due to exclusive
+ * mmap_lock being held.
+ * We need RELEASE semantics here to ensure that preceding stores into
+ * the VMA take effect before we unlock it with this store.
+ * Pairs with ACQUIRE semantics in vma_start_read().
+ */
+ smp_store_release(&mm->mm_lock_seq, mm->mm_lock_seq + 1);
}
#else
static inline void vma_end_write_all(struct mm_struct *mm) {}
--- a/include/linux/mm.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm.h
@@ -641,8 +641,14 @@ static inline void vma_numab_state_free(
*/
static inline bool vma_start_read(struct vm_area_struct *vma)
{
- /* Check before locking. A race might cause false locked result. */
- if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+ /*
+ * Check before locking. A race might cause false locked result.
+ * We can use READ_ONCE() for the mm_lock_seq here, and don't need
+ * ACQUIRE semantics, because this is just a lockless check whose result
+ * we don't rely on for anything - the mm_lock_seq read against which we
+ * need ordering is below.
+ */
+ if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
return false;
if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
@@ -653,8 +659,13 @@ static inline bool vma_start_read(struct
* False unlocked result is impossible because we modify and check
* vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
* modification invalidates all existing locks.
+ *
+ * We must use ACQUIRE semantics for the mm_lock_seq so that if we are
+ * racing with vma_end_write_all(), we only start reading from the VMA
+ * after it has been unlocked.
+ * This pairs with RELEASE semantics in vma_end_write_all().
*/
- if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+ if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
up_read(&vma->vm_lock->lock);
return false;
}
@@ -676,7 +687,7 @@ static bool __is_vma_write_locked(struct
* current task is holding mmap_write_lock, both vma->vm_lock_seq and
* mm->mm_lock_seq can't be concurrently modified.
*/
- *mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+ *mm_lock_seq = vma->vm_mm->mm_lock_seq;
return (vma->vm_lock_seq == *mm_lock_seq);
}
@@ -688,7 +699,13 @@ static inline void vma_start_write(struc
return;
down_write(&vma->vm_lock->lock);
- vma->vm_lock_seq = mm_lock_seq;
+ /*
+ * We should use WRITE_ONCE() here because we can have concurrent reads
+ * from the early lockless pessimistic check in vma_start_read().
+ * We don't really care about the correctness of that early check, but
+ * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
+ */
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
}
@@ -702,7 +719,7 @@ static inline bool vma_try_start_write(s
if (!down_write_trylock(&vma->vm_lock->lock))
return false;
- vma->vm_lock_seq = mm_lock_seq;
+ WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
up_write(&vma->vm_lock->lock);
return true;
}
--- a/include/linux/mm_types.h~mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq
+++ a/include/linux/mm_types.h
@@ -514,6 +514,20 @@ struct vm_area_struct {
};
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * Can only be written (using WRITE_ONCE()) while holding both:
+ * - mmap_lock (in write mode)
+ * - vm_lock->lock (in write mode)
+ * Can be read reliably while holding one of:
+ * - mmap_lock (in read or write mode)
+ * - vm_lock->lock (in read or write mode)
+ * Can be read unreliably (using READ_ONCE()) for pessimistic bailout
+ * while holding nothing (except RCU to keep the VMA struct allocated).
+ *
+ * This sequence counter is explicitly allowed to overflow; sequence
+ * counter reuse can only lead to occasional unnecessary use of the
+ * slowpath.
+ */
int vm_lock_seq;
struct vma_lock *vm_lock;
@@ -679,6 +693,20 @@ struct mm_struct {
* by mmlist_lock
*/
#ifdef CONFIG_PER_VMA_LOCK
+ /*
+ * This field has lock-like semantics, meaning it is sometimes
+ * accessed with ACQUIRE/RELEASE semantics.
+ * Roughly speaking, incrementing the sequence number is
+ * equivalent to releasing locks on VMAs; reading the sequence
+ * number can be part of taking a read lock on a VMA.
+ *
+ * Can be modified under write mmap_lock using RELEASE
+ * semantics.
+ * Can be read with no other protection when holding write
+ * mmap_lock.
+ * Can be read with ACQUIRE semantics if not holding write
+ * mmap_lock.
+ */
int mm_lock_seq;
#endif
_
Patches currently in -mm which might be from jannh(a)google.com are
mm-fix-memory-ordering-for-mm_lock_seq-and-vm_lock_seq.patch
The patch titled
Subject: Revert "um: Use swap() to make code cleaner"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-um-use-swap-to-make-code-cleaner.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Subject: Revert "um: Use swap() to make code cleaner"
Date: Mon, 24 Jul 2023 17:31:31 +0300
This reverts commit 9b0da3f22307af693be80f5d3a89dc4c7f360a85.
The sigio.c is clearly user space code which is handled by
arch/um/scripts/Makefile.rules (see USER_OBJS rule).
The above mentioned commit simply broke this agreement,
we may not use Linux kernel internal headers in them without
thorough thinking.
Hence, revert the wrong commit.
Link: https://lkml.kernel.org/r/20230724143131.30090-1-andriy.shevchenko@linux.in…
Signed-off-by: Andy Shevchenko <andriy.shevchenko(a)linux.intel.com>
Reported-by: kernel test robot <lkp(a)intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202307212304.cH79zJp1-lkp@intel.com/
Cc: Anton Ivanov <anton.ivanov(a)cambridgegreys.com>
Cc: Herve Codina <herve.codina(a)bootlin.com>
Cc: Jason A. Donenfeld <Jason(a)zx2c4.com>
Cc: Johannes Berg <johannes(a)sipsolutions.net>
Cc: Rasmus Villemoes <linux(a)rasmusvillemoes.dk>
Cc: Richard Weinberger <richard(a)nod.at>
Cc: Yang Guang <yang.guang5(a)zte.com.cn>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
arch/um/os-Linux/sigio.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
--- a/arch/um/os-Linux/sigio.c~revert-um-use-swap-to-make-code-cleaner
+++ a/arch/um/os-Linux/sigio.c
@@ -3,7 +3,6 @@
* Copyright (C) 2002 - 2008 Jeff Dike (jdike(a){addtoit,linux.intel}.com)
*/
-#include <linux/minmax.h>
#include <unistd.h>
#include <errno.h>
#include <fcntl.h>
@@ -51,7 +50,7 @@ static struct pollfds all_sigio_fds;
static int write_sigio_thread(void *unused)
{
- struct pollfds *fds;
+ struct pollfds *fds, tmp;
struct pollfd *p;
int i, n, respond_fd;
char c;
@@ -78,7 +77,9 @@ static int write_sigio_thread(void *unus
"write_sigio_thread : "
"read on socket failed, "
"err = %d\n", errno);
- swap(current_poll, next_poll);
+ tmp = current_poll;
+ current_poll = next_poll;
+ next_poll = tmp;
respond_fd = sigio_private[1];
}
else {
_
Patches currently in -mm which might be from andriy.shevchenko(a)linux.intel.com are
revert-um-use-swap-to-make-code-cleaner.patch
kernelh-split-out-count_args-and-concatenate-to-argsh.patch
x86-asm-replace-custom-count_args-concatenate-implementations.patch
arm64-smccc-replace-custom-count_args-concatenate-implementations.patch
genetlink-replace-custom-concatenate-implementation.patch
Hi,
as there are new hardware directives, we need a little adaptation
for the AUX invalidation sequence.
In this version we support all the engines affected by this
change.
The stable backport has some challenges because the original
patch that this series fixes has had more changes in between.
This patch is slowly exploding with code refactorings and
features added and fixed.
Thanks a lot Nirmoy, Andrzej and Matt for your review and for the
fruitful discussions!
Thanks,
Andi
Changelog:
=========
v7 -> v8
- Removed the aux invalidation from the device info and added a
helper function, instead (patch 2).
- Use "MTL and beyond" instead of "MTL+" in comments.
- Use the "gen12_" prefix instead of "intel_".
- In patch 6 return an int error instead of an error embedded in
the pointer in the intel_emit_pipe_control_cs() function and
propagate the error to the upper layers.
v6 -> v7
- Fix correct sequence applied to the correct engine. A little
confusion promptly cought by Nirmoy when applying to the VD
engine the sequence belonging to the compute engines. Thanks a
lot, Nirmoy!
v5 -> v6
- Fixed ccs flush in the engines VE and BCS. They are sent as a
separate command instead of added in the pipe control.
- Separated the CCS flusing in the pipe control patch with the
quiescing of the memory. They were meant to be on separate
patch already in the previous verision, but apparently I
squashed them by mistake.
v4 -> v5
- The AUX CCS is added as a device property instead of checking
against FLAT CCS. This adds the new HAS_AUX_CCS check
(Patch 2, new).
- little and trivial refactoring here and there.
- extended the flags{0,1}/bit_group_{0,1} renaming to other
functions.
- Created an intel_emit_pipe_control_cs() wrapper for submitting
the pipe control.
- Quiesce memory for all the engines, not just RCS (Patch 6,
new).
- The PIPE_CONTROL_CCS_FLUSH is added to all the engines.
- Remove redundant EMIT_FLUSH_CCS mode flag.
- Remove unnecessary NOOPs from the command streamer for
invalidating the CCS table.
- Use INVALID_MMIO_REG and gen12_get_aux_inv_reg() instad of
__MMIO(0) and reg.reg.
- Remove useless wrapper and just use gen12_get_aux_inv_reg().
v3 -> v4
- A trivial patch 3 is added to rename the flags with
bit_group_{0,1} to align with the datasheet naming.
- Patch 4 fixes a confusion I made where the CCS flag was
applied to the wrong bit group.
v2 -> v3
- added r-b from Nirmoy in patch 1 and 4.
- added patch 3 which enables the ccs_flush in the control pipe
for mtl+ compute and render engines.
- added redundant checks in patch 2 for enabling the EMIT_FLUSH
flag.
v1 -> v2
- add a clean up preliminary patch for the existing registers
- add support for more engines
- add the Fixes tag
Andi Shyti (7):
drm/i915/gt: Cleanup aux invalidation registers
drm/i915: Add the gen12_needs_ccs_aux_inv helper
drm/i915/gt: Rename flags with bit_group_X according to the datasheet
drm/i915/gt: Enable the CCS_FLUSH bit in the pipe control
drm/i915/gt: Refactor intel_emit_pipe_control_cs() in a single
function
drm/i915/gt: Ensure memory quiesced before invalidation for all
engines
drm/i915/gt: Support aux invalidation on all engines
Jonathan Cavitt (2):
drm/i915/gt: Ensure memory quiesced before invalidation
drm/i915/gt: Poll aux invalidation register bit on invalidation
drivers/gpu/drm/i915/gt/gen8_engine_cs.c | 198 ++++++++++++-------
drivers/gpu/drm/i915/gt/gen8_engine_cs.h | 21 +-
drivers/gpu/drm/i915/gt/intel_gpu_commands.h | 2 +
drivers/gpu/drm/i915/gt/intel_gt_regs.h | 16 +-
drivers/gpu/drm/i915/gt/intel_lrc.c | 17 +-
5 files changed, 155 insertions(+), 99 deletions(-)
--
2.40.1
Dpt objects that are created from internal get evicted when there is
memory pressure and do not get restored when pinned during scanout. The
pinned page table entries look corrupted and programming the display
engine with the incorrect pte's result in DE throwing pipe faults.
Create DPT objects from shmem and mark the object as dirty when pinning so
that the object is restored when shrinker evicts an unpinned buffer object.
v2: Unconditionally mark the dpt objects dirty during pinning(Chris).
Fixes: 0dc987b699ce ("drm/i915/display: Add smem fallback allocation for dpt")
Cc: <stable(a)vger.kernel.org> # v6.0+
Cc: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin(a)linux.intel.com>
Suggested-by: Chris Wilson <chris.p.wilson(a)intel.com>
Signed-off-by: Fei Yang <fei.yang(a)intel.com>
Signed-off-by: Radhakrishna Sripada <radhakrishna.sripada(a)intel.com>
---
drivers/gpu/drm/i915/display/intel_dpt.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
index 7c5fddb203ba..fbfd8f959f17 100644
--- a/drivers/gpu/drm/i915/display/intel_dpt.c
+++ b/drivers/gpu/drm/i915/display/intel_dpt.c
@@ -166,6 +166,8 @@ struct i915_vma *intel_dpt_pin(struct i915_address_space *vm)
i915_vma_get(vma);
}
+ dpt->obj->mm.dirty = true;
+
atomic_dec(&i915->gpu_error.pending_fb_pin);
intel_runtime_pm_put(&i915->runtime_pm, wakeref);
@@ -261,7 +263,7 @@ intel_dpt_create(struct intel_framebuffer *fb)
dpt_obj = i915_gem_object_create_stolen(i915, size);
if (IS_ERR(dpt_obj) && !HAS_LMEM(i915)) {
drm_dbg_kms(&i915->drm, "Allocating dpt from smem\n");
- dpt_obj = i915_gem_object_create_internal(i915, size);
+ dpt_obj = i915_gem_object_create_shmem(i915, size);
}
if (IS_ERR(dpt_obj))
return ERR_CAST(dpt_obj);
--
2.34.1
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ae31dbc61c64..5e61abb3dce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -265,7 +265,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
Hi Nirmoy,
> static int mtl_dummy_pipe_control(struct i915_request *rq)
> {
> /* Wa_14016712196 */
> if (IS_MTL_GRAPHICS_STEP(rq->engine->i915, M, STEP_A0, STEP_B0) ||
> - IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0)) {
> - u32 *cs;
> -
> - /* dummy PIPE_CONTROL + depth flush */
> - cs = intel_ring_begin(rq, 6);
> - if (IS_ERR(cs))
> - return PTR_ERR(cs);
> - cs = gen12_emit_pipe_control(cs,
> - 0,
> - PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> - LRC_PPHWSP_SCRATCH_ADDR);
> - intel_ring_advance(rq, cs);
> - }
> + IS_MTL_GRAPHICS_STEP(rq->engine->i915, P, STEP_A0, STEP_B0))
> + return gen12_emit_pipe_control_cs(rq, 0,
> + PIPE_CONTROL_DEPTH_CACHE_FLUSH,
> + LRC_PPHWSP_SCRATCH_ADDR);
>
> Don't think this will get backported to 5.8+. I think mtl introduced after
> that.
>
> If that is not an issue for rest of the series and then
to be honest I don't think I fully understood the stable
policies, as in this series only two are the patches that are
really fixing things and the rest are only support.
In this case I think this will produce a conflict that will be
eventually fixed (... I guess!).
> Reviewed-by: Nirmoy Das <nirmoy.das(a)intel.com>
Thanks,
Andi
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 26efd79c4624294e553aeaa3439c646729bad084
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072114-giblet-unzip-f1db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
26efd79c4624 ("ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()")
db42523b4f3e ("ftrace: Store the order of pages allocated in ftrace_page")
59300b36f85f ("ftrace: Check if pages were allocated before calling free_pages()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 26efd79c4624294e553aeaa3439c646729bad084 Mon Sep 17 00:00:00 2001
From: Zheng Yejian <zhengyejian1(a)huawei.com>
Date: Wed, 12 Jul 2023 14:04:52 +0800
Subject: [PATCH] ftrace: Fix possible warning on checking all pages used in
ftrace_process_locs()
As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
> Some architecture linkers will pad between
> the different mcount_loc sections of different
> object files to satisfy alignments.
> Skip any NULL pointers.
After commit 20e5227e9f55 ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit 706c81f87f84 ("ftrace: Remove extra helper
functions"), warning may occur at:
WARN_ON(pg->next);
To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.
Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengye…
Cc: stable(a)vger.kernel.org
Fixes: 706c81f87f84 ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt(a)goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1(a)huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 3740aca79fe7..05c0024815bf 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -3305,6 +3305,22 @@ static int ftrace_allocate_records(struct ftrace_page *pg, int count)
return cnt;
}
+static void ftrace_free_pages(struct ftrace_page *pages)
+{
+ struct ftrace_page *pg = pages;
+
+ while (pg) {
+ if (pg->records) {
+ free_pages((unsigned long)pg->records, pg->order);
+ ftrace_number_of_pages -= 1 << pg->order;
+ }
+ pages = pg->next;
+ kfree(pg);
+ pg = pages;
+ ftrace_number_of_groups--;
+ }
+}
+
static struct ftrace_page *
ftrace_allocate_pages(unsigned long num_to_init)
{
@@ -3343,17 +3359,7 @@ ftrace_allocate_pages(unsigned long num_to_init)
return start_pg;
free_pages:
- pg = start_pg;
- while (pg) {
- if (pg->records) {
- free_pages((unsigned long)pg->records, pg->order);
- ftrace_number_of_pages -= 1 << pg->order;
- }
- start_pg = pg->next;
- kfree(pg);
- pg = start_pg;
- ftrace_number_of_groups--;
- }
+ ftrace_free_pages(start_pg);
pr_info("ftrace: FAILED to allocate memory for functions\n");
return NULL;
}
@@ -6471,9 +6477,11 @@ static int ftrace_process_locs(struct module *mod,
unsigned long *start,
unsigned long *end)
{
+ struct ftrace_page *pg_unuse = NULL;
struct ftrace_page *start_pg;
struct ftrace_page *pg;
struct dyn_ftrace *rec;
+ unsigned long skipped = 0;
unsigned long count;
unsigned long *p;
unsigned long addr;
@@ -6536,8 +6544,10 @@ static int ftrace_process_locs(struct module *mod,
* object files to satisfy alignments.
* Skip any NULL pointers.
*/
- if (!addr)
+ if (!addr) {
+ skipped++;
continue;
+ }
end_offset = (pg->index+1) * sizeof(pg->records[0]);
if (end_offset > PAGE_SIZE << pg->order) {
@@ -6551,8 +6561,10 @@ static int ftrace_process_locs(struct module *mod,
rec->ip = addr;
}
- /* We should have used all pages */
- WARN_ON(pg->next);
+ if (pg->next) {
+ pg_unuse = pg->next;
+ pg->next = NULL;
+ }
/* Assign the last page to ftrace_pages */
ftrace_pages = pg;
@@ -6574,6 +6586,11 @@ static int ftrace_process_locs(struct module *mod,
out:
mutex_unlock(&ftrace_lock);
+ /* We should have used all pages unless we skipped some */
+ if (pg_unuse) {
+ WARN_ON(!skipped);
+ ftrace_free_pages(pg_unuse);
+ }
return ret;
}
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 4481913607e58196c48a4fef5e6f45350684ec3c
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072146-sports-deluge-22a1@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 4481913607e58196c48a4fef5e6f45350684ec3c Mon Sep 17 00:00:00 2001
From: Yunxiang Li <Yunxiang.Li(a)amd.com>
Date: Thu, 22 Jun 2023 10:18:03 -0400
Subject: [PATCH] drm/ttm: fix bulk_move corruption when adding a entry
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When the resource is the first in the bulk_move range, adding it again
(thus moving it to the tail) will corrupt the list since the first
pointer is not moved. This eventually lead to null pointer deref in
ttm_lru_bulk_move_del()
Fixes: fee2ede15542 ("drm/ttm: rework bulk move handling v5")
Signed-off-by: Yunxiang Li <Yunxiang.Li(a)amd.com>
Reviewed-by: Christian König <christian.koenig(a)amd.com>
CC: stable(a)vger.kernel.org
Link: https://patchwork.freedesktop.org/patch/msgid/20230622141902.28718-3-Yunxia…
Signed-off-by: Christian König <christian.koenig(a)amd.com>
diff --git a/drivers/gpu/drm/ttm/ttm_resource.c b/drivers/gpu/drm/ttm/ttm_resource.c
index 7333f7a87a2f..e51dbc7a2d53 100644
--- a/drivers/gpu/drm/ttm/ttm_resource.c
+++ b/drivers/gpu/drm/ttm/ttm_resource.c
@@ -86,6 +86,8 @@ static void ttm_lru_bulk_move_pos_tail(struct ttm_lru_bulk_move_pos *pos,
struct ttm_resource *res)
{
if (pos->last != res) {
+ if (pos->first == res)
+ pos->first = list_next_entry(res, lru);
list_move(&res->lru, &pos->last->lru);
pos->last = res;
}
@@ -111,7 +113,8 @@ static void ttm_lru_bulk_move_del(struct ttm_lru_bulk_move *bulk,
{
struct ttm_lru_bulk_move_pos *pos = ttm_lru_bulk_move_pos(bulk, res);
- if (unlikely(pos->first == res && pos->last == res)) {
+ if (unlikely(WARN_ON(!pos->first || !pos->last) ||
+ pos->first == res && pos->last == res)) {
pos->first = NULL;
pos->last = NULL;
} else if (pos->first == res) {
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072412-operable-craving-317a@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer")
f69120ce6c02 ("jbd2: fix sphinx kernel-doc build warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072403-virtuous-straw-2f43@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072453-tricolor-employed-7622@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072443-tackiness-pennant-36bc@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
dbf2bab7935b ("jbd2: simplify journal_clean_one_cp_list()")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072441-deprive-glimpse-b891@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072434-stylist-reflex-c474@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072433-obsessed-limeade-e4b9@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072427-curler-operate-a8d0@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072424-dejected-raisin-2dd7@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
46f881b5b175 ("jbd2: fix a race when checking checkpoint buffer busy")
b98dba273a0e ("jbd2: remove journal_clean_one_cp_list()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072419-silt-kinswoman-c896@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.4.y
git checkout FETCH_HEAD
git cherry-pick -x 46f881b5b1758dc4a35fba4a643c10717d0cf427
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072415-dyslexic-spoilage-fb58@gregkh' --subject-prefix 'PATCH 6.4.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 46f881b5b1758dc4a35fba4a643c10717d0cf427 Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:27 +0800
Subject: [PATCH] jbd2: fix a race when checking checkpoint buffer busy
Before removing checkpoint buffer from the t_checkpoint_list, we have to
check both BH_Dirty and BH_Lock bits together to distinguish buffers
have not been or were being written back. But __cp_buffer_busy() checks
them separately, it first check lock state and then check dirty, the
window between these two checks could be raced by writing back
procedure, which locks buffer and clears buffer dirty before I/O
completes. So it cannot guarantee checkpointing buffers been written
back to disk if some error happens later. Finally, it may clean
checkpoint transactions and lead to inconsistent filesystem.
jbd2_journal_forget() and __journal_try_to_free_buffer() also have the
same problem (journal_unmap_buffer() escape from this issue since it's
running under the buffer lock), so fix them through introducing a new
helper to try holding the buffer lock and remove really clean buffer.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217490
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-6-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 42b34cab64fb..9ec91017a7f3 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -376,11 +376,15 @@ static unsigned long journal_shrink_one_cp_list(struct journal_head *jh,
jh = next_jh;
next_jh = jh->b_cpnext;
- if (!destroy && __cp_buffer_busy(jh))
- continue;
+ if (destroy) {
+ ret = __jbd2_journal_remove_checkpoint(jh);
+ } else {
+ ret = jbd2_journal_try_remove_checkpoint(jh);
+ if (ret < 0)
+ continue;
+ }
nr_freed++;
- ret = __jbd2_journal_remove_checkpoint(jh);
if (ret) {
*released = true;
break;
@@ -616,6 +620,34 @@ int __jbd2_journal_remove_checkpoint(struct journal_head *jh)
return 1;
}
+/*
+ * Check the checkpoint buffer and try to remove it from the checkpoint
+ * list if it's clean. Returns -EBUSY if it is not clean, returns 1 if
+ * it frees the transaction, 0 otherwise.
+ *
+ * This function is called with j_list_lock held.
+ */
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh)
+{
+ struct buffer_head *bh = jh2bh(jh);
+
+ if (!trylock_buffer(bh))
+ return -EBUSY;
+ if (buffer_dirty(bh)) {
+ unlock_buffer(bh);
+ return -EBUSY;
+ }
+ unlock_buffer(bh);
+
+ /*
+ * Buffer is clean and the IO has finished (we held the buffer
+ * lock) so the checkpoint is done. We can safely remove the
+ * buffer from this transaction.
+ */
+ JBUFFER_TRACE(jh, "remove from checkpoint list");
+ return __jbd2_journal_remove_checkpoint(jh);
+}
+
/*
* journal_insert_checkpoint: put a committed buffer onto a checkpoint
* list so that we know when it is safe to clean the transaction out of
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 18611241f451..6ef5022949c4 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1784,8 +1784,7 @@ int jbd2_journal_forget(handle_t *handle, struct buffer_head *bh)
* Otherwise, if the buffer has been written to disk,
* it is safe to remove the checkpoint and drop it.
*/
- if (!buffer_dirty(bh)) {
- __jbd2_journal_remove_checkpoint(jh);
+ if (jbd2_journal_try_remove_checkpoint(jh) >= 0) {
spin_unlock(&journal->j_list_lock);
goto drop;
}
@@ -2112,20 +2111,14 @@ __journal_try_to_free_buffer(journal_t *journal, struct buffer_head *bh)
jh = bh2jh(bh);
- if (buffer_locked(bh) || buffer_dirty(bh))
- goto out;
-
if (jh->b_next_transaction != NULL || jh->b_transaction != NULL)
- goto out;
+ return;
spin_lock(&journal->j_list_lock);
- if (jh->b_cp_transaction != NULL) {
- /* written-back checkpointed metadata buffer */
- JBUFFER_TRACE(jh, "remove from checkpoint list");
- __jbd2_journal_remove_checkpoint(jh);
- }
+ /* Remove written-back checkpointed metadata buffer */
+ if (jh->b_cp_transaction != NULL)
+ jbd2_journal_try_remove_checkpoint(jh);
spin_unlock(&journal->j_list_lock);
-out:
return;
}
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index bd660aac8e07..44c298aa58d4 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1443,6 +1443,7 @@ extern void jbd2_journal_commit_transaction(journal_t *);
void __jbd2_journal_clean_checkpoint_list(journal_t *journal, bool destroy);
unsigned long jbd2_journal_shrink_checkpoint_list(journal_t *journal, unsigned long *nr_to_scan);
int __jbd2_journal_remove_checkpoint(struct journal_head *);
+int jbd2_journal_try_remove_checkpoint(struct journal_head *jh);
void jbd2_journal_destroy_checkpoint(journal_t *journal);
void __jbd2_journal_insert_checkpoint(struct journal_head *, transaction_t *);
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072412-raft-circle-8700@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
83e80a6e3543 ("ext4: use buckets for cr 1 block scan instead of rbtree")
4fca50d440cc ("ext4: make mballoc try target group first even with mb_optimize_scan")
cf4ff938b47f ("ext4: correct the judgment of BUG in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072404-hubcap-railroad-9c52@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
5d5460fa7932 ("ext4: fix off by one issue in ext4_mb_choose_next_group_best_avail()")
f52f3d2b9fba ("ext4: Give symbolic names to mballoc criterias")
7e170922f06b ("ext4: Add allocation criteria 1.5 (CR1_5)")
1b4200112108 ("ext4: Avoid scanning smaller extents in BG during CR1")
3ef5d2638796 ("ext4: Add counter to track successful allocation of goal length")
fdd9a00943a5 ("ext4: Add per CR extent scanned counter")
4eb7a4a1a33b ("ext4: Convert mballoc cr (criteria) to enum")
c3defd99d58c ("ext4: treat stripe in block unit")
361eb69fc99f ("ext4: Remove the logic to trim inode PAs")
3872778664e3 ("ext4: Use rbtrees to manage PAs instead of inode i_prealloc_list")
a8e38fd37cff ("ext4: Convert pa->pa_inode_list and pa->pa_obj_lock into a union")
93cdf49f6eca ("ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()")
0830344c953a ("ext4: Abstract out overlap fix/check logic in ext4_mb_normalize_request()")
7692094ac513 ("ext4: Move overlap assert logic into a separate function")
bcf434992145 ("ext4: Refactor code in ext4_mb_normalize_request() and ext4_mb_use_preallocated()")
e86a718228b6 ("ext4: Stop searching if PA doesn't satisfy non-extent file")
91a48aaf59d0 ("ext4: avoid unnecessary pointer dereference in ext4_mb_normalize_request")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 5d5460fa7932bed3a9082a6a8852cfbdb46acbe8 Mon Sep 17 00:00:00 2001
From: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Date: Fri, 9 Jun 2023 16:04:03 +0530
Subject: [PATCH] ext4: fix off by one issue in
ext4_mb_choose_next_group_best_avail()
In ext4_mb_choose_next_group_best_avail(), we want the start order to be
1 less than goal length and the min_order to be, at max, 1 more than the
original length. This commit fixes an off by one issue that arose due to
the fact that 1 << fls(n) > (n).
After all the processing:
order = 1 order below goal len
min_order = maximum of the three:-
- order - trim_order
- 1 order below B2C(s_stripe)
- 1 order above original len
Cc: stable(a)kernel.org
Fixes: 33122aa930 ("ext4: Add allocation criteria 1.5 (CR1_5)")
Signed-off-by: Ojaswin Mujoo <ojaswin(a)linux.ibm.com>
Link: https://lore.kernel.org/r/20230609103403.112807-1-ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index a2475b8c9fb5..456150ef6111 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -1006,14 +1006,11 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
* fls() instead since we need to know the actual length while modifying
* goal length.
*/
- order = fls(ac->ac_g_ex.fe_len);
+ order = fls(ac->ac_g_ex.fe_len) - 1;
min_order = order - sbi->s_mb_best_avail_max_trim_order;
if (min_order < 0)
min_order = 0;
- if (1 << min_order < ac->ac_o_ex.fe_len)
- min_order = fls(ac->ac_o_ex.fe_len) + 1;
-
if (sbi->s_stripe > 0) {
/*
* We are assuming that stripe size is always a multiple of
@@ -1021,9 +1018,16 @@ static void ext4_mb_choose_next_group_best_avail(struct ext4_allocation_context
*/
num_stripe_clusters = EXT4_NUM_B2C(sbi, sbi->s_stripe);
if (1 << min_order < num_stripe_clusters)
- min_order = fls(num_stripe_clusters);
+ /*
+ * We consider 1 order less because later we round
+ * up the goal len to num_stripe_clusters
+ */
+ min_order = fls(num_stripe_clusters) - 1;
}
+ if (1 << min_order < ac->ac_o_ex.fe_len)
+ min_order = fls(ac->ac_o_ex.fe_len);
+
for (i = order; i >= min_order; i--) {
int frag_order;
/*
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072459-unmolded-serrated-1862@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
ccd3c4373eac ("jbd2: fix use after free in jbd2_log_do_checkpoint()")
f69120ce6c02 ("jbd2: fix sphinx kernel-doc build warnings")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072457-patio-doornail-5f44@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
ccd3c4373eac ("jbd2: fix use after free in jbd2_log_do_checkpoint()")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072456-thermal-relieving-1b68@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
2bf31d94423c ("jbd2: fix kernel-doc markups")
60ed633f51d0 ("jbd2: fix incorrect code style")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x c2d6fd9d6f35079f1669f0100f05b46708c74b7f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072450-contour-slick-c3db@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
c2d6fd9d6f35 ("jbd2: recheck chechpointing non-dirty buffer")
4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers")
214eb5a4d8a2 ("jbd2: remove redundant buffer io error checks")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From c2d6fd9d6f35079f1669f0100f05b46708c74b7f Mon Sep 17 00:00:00 2001
From: Zhang Yi <yi.zhang(a)huawei.com>
Date: Tue, 6 Jun 2023 21:59:23 +0800
Subject: [PATCH] jbd2: recheck chechpointing non-dirty buffer
There is a long-standing metadata corruption issue that happens from
time to time, but it's very difficult to reproduce and analyse, benefit
from the JBD2_CYCLE_RECORD option, we found out that the problem is the
checkpointing process miss to write out some buffers which are raced by
another do_get_write_access(). Looks below for detail.
jbd2_log_do_checkpoint() //transaction X
//buffer A is dirty and not belones to any transaction
__buffer_relink_io() //move it to the IO list
__flush_batch()
write_dirty_buffer()
do_get_write_access()
clear_buffer_dirty
__jbd2_journal_file_buffer()
//add buffer A to a new transaction Y
lock_buffer(bh)
//doesn't write out
__jbd2_journal_remove_checkpoint()
//finish checkpoint except buffer A
//filesystem corrupt if the new transaction Y isn't fully write out.
Due to the t_checkpoint_list walking loop in jbd2_log_do_checkpoint()
have already handles waiting for buffers under IO and re-added new
transaction to complete commit, and it also removing cleaned buffers,
this makes sure the list will eventually get empty. So it's fine to
leave buffers on the t_checkpoint_list while flushing out and completely
stop using the t_checkpoint_io_list.
Cc: stable(a)vger.kernel.org
Suggested-by: Jan Kara <jack(a)suse.cz>
Signed-off-by: Zhang Yi <yi.zhang(a)huawei.com>
Tested-by: Zhihao Cheng <chengzhihao1(a)huawei.com>
Reviewed-by: Jan Kara <jack(a)suse.cz>
Link: https://lore.kernel.org/r/20230606135928.434610-2-yi.zhang@huaweicloud.com
Signed-off-by: Theodore Ts'o <tytso(a)mit.edu>
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 51bd38da21cd..25e3c20eb19f 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -57,28 +57,6 @@ static inline void __buffer_unlink(struct journal_head *jh)
}
}
-/*
- * Move a buffer from the checkpoint list to the checkpoint io list
- *
- * Called with j_list_lock held
- */
-static inline void __buffer_relink_io(struct journal_head *jh)
-{
- transaction_t *transaction = jh->b_cp_transaction;
-
- __buffer_unlink_first(jh);
-
- if (!transaction->t_checkpoint_io_list) {
- jh->b_cpnext = jh->b_cpprev = jh;
- } else {
- jh->b_cpnext = transaction->t_checkpoint_io_list;
- jh->b_cpprev = transaction->t_checkpoint_io_list->b_cpprev;
- jh->b_cpprev->b_cpnext = jh;
- jh->b_cpnext->b_cpprev = jh;
- }
- transaction->t_checkpoint_io_list = jh;
-}
-
/*
* Check a checkpoint buffer could be release or not.
*
@@ -183,6 +161,7 @@ __flush_batch(journal_t *journal, int *batch_count)
struct buffer_head *bh = journal->j_chkpt_bhs[i];
BUFFER_TRACE(bh, "brelse");
__brelse(bh);
+ journal->j_chkpt_bhs[i] = NULL;
}
*batch_count = 0;
}
@@ -242,6 +221,11 @@ int jbd2_log_do_checkpoint(journal_t *journal)
jh = transaction->t_checkpoint_list;
bh = jh2bh(jh);
+ /*
+ * The buffer may be writing back, or flushing out in the
+ * last couple of cycles, or re-adding into a new transaction,
+ * need to check it again until it's unlocked.
+ */
if (buffer_locked(bh)) {
get_bh(bh);
spin_unlock(&journal->j_list_lock);
@@ -287,28 +271,32 @@ int jbd2_log_do_checkpoint(journal_t *journal)
}
if (!buffer_dirty(bh)) {
BUFFER_TRACE(bh, "remove from checkpoint");
- if (__jbd2_journal_remove_checkpoint(jh))
- /* The transaction was released; we're done */
+ /*
+ * If the transaction was released or the checkpoint
+ * list was empty, we're done.
+ */
+ if (__jbd2_journal_remove_checkpoint(jh) ||
+ !transaction->t_checkpoint_list)
goto out;
- continue;
+ } else {
+ /*
+ * We are about to write the buffer, it could be
+ * raced by some other transaction shrink or buffer
+ * re-log logic once we release the j_list_lock,
+ * leave it on the checkpoint list and check status
+ * again to make sure it's clean.
+ */
+ BUFFER_TRACE(bh, "queue");
+ get_bh(bh);
+ J_ASSERT_BH(bh, !buffer_jwrite(bh));
+ journal->j_chkpt_bhs[batch_count++] = bh;
+ transaction->t_chp_stats.cs_written++;
+ transaction->t_checkpoint_list = jh->b_cpnext;
}
- /*
- * Important: we are about to write the buffer, and
- * possibly block, while still holding the journal
- * lock. We cannot afford to let the transaction
- * logic start messing around with this buffer before
- * we write it to disk, as that would break
- * recoverability.
- */
- BUFFER_TRACE(bh, "queue");
- get_bh(bh);
- J_ASSERT_BH(bh, !buffer_jwrite(bh));
- journal->j_chkpt_bhs[batch_count++] = bh;
- __buffer_relink_io(jh);
- transaction->t_chp_stats.cs_written++;
+
if ((batch_count == JBD2_NR_BATCH) ||
- need_resched() ||
- spin_needbreak(&journal->j_list_lock))
+ need_resched() || spin_needbreak(&journal->j_list_lock) ||
+ jh2bh(transaction->t_checkpoint_list) == journal->j_chkpt_bhs[0])
goto unlock_and_flush;
}
@@ -322,38 +310,6 @@ int jbd2_log_do_checkpoint(journal_t *journal)
goto restart;
}
- /*
- * Now we issued all of the transaction's buffers, let's deal
- * with the buffers that are out for I/O.
- */
-restart2:
- /* Did somebody clean up the transaction in the meanwhile? */
- if (journal->j_checkpoint_transactions != transaction ||
- transaction->t_tid != this_tid)
- goto out;
-
- while (transaction->t_checkpoint_io_list) {
- jh = transaction->t_checkpoint_io_list;
- bh = jh2bh(jh);
- if (buffer_locked(bh)) {
- get_bh(bh);
- spin_unlock(&journal->j_list_lock);
- wait_on_buffer(bh);
- /* the journal_head may have gone by now */
- BUFFER_TRACE(bh, "brelse");
- __brelse(bh);
- spin_lock(&journal->j_list_lock);
- goto restart2;
- }
-
- /*
- * Now in whatever state the buffer currently is, we
- * know that it has been written out and so we can
- * drop it from the list
- */
- if (__jbd2_journal_remove_checkpoint(jh))
- break;
- }
out:
spin_unlock(&journal->j_list_lock);
result = jbd2_cleanup_journal_tail(journal);
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit 6bd6cd29c92401a101993290051fa55078238a52 ]
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e67b ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230512173810.131447-2-u.kleine-koenig@pengutron…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/stm32-usart.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 1e38fc9b10c11..e9e11a2596211 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -1755,13 +1755,10 @@ static int stm32_usart_serial_remove(struct platform_device *pdev)
struct uart_port *port = platform_get_drvdata(pdev);
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
- int err;
u32 cr3;
pm_runtime_get_sync(&pdev->dev);
- err = uart_remove_one_port(&stm32_usart_driver, port);
- if (err)
- return(err);
+ uart_remove_one_port(&stm32_usart_driver, port);
pm_runtime_disable(&pdev->dev);
pm_runtime_set_suspended(&pdev->dev);
--
2.39.2
I hope this message finds you well. I am currently downsizing my Late Husband Steinway Piano and have come to the decision of finding a new home for my late husband's piano. It holds sentimental value to me, and I want to ensure it goes to someone who will appreciate and cherish it as much as he did.
If you or anyone you know is interested in acquiring a Steinway piano, I would be delighted to hear from you. I believe that finding a suitable owner who will give it the care and attention it deserves is of utmost importance. The piano has been well-maintained over the years and is in excellent condition.
Please feel free to reach out to me at your earliest convenience. I send you more pictures of the pictures in my next email.
Cheers,
Linda Ramey
Email: rsharon1759(a)gmail.com
From: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
[ Upstream commit 6bd6cd29c92401a101993290051fa55078238a52 ]
Returning early from stm32_usart_serial_remove() results in a resource
leak as several cleanup functions are not called. The driver core ignores
the return value and there is no possibility to clean up later.
uart_remove_one_port() only returns non-zero if there is some
inconsistency (i.e. stm32_usart_driver.state[port->line].uart_port == NULL).
This should never happen, and even if it does it's a bad idea to exit
early in the remove callback without cleaning up.
This prepares changing the prototype of struct platform_driver::remove to
return void. See commit 5c5a7680e67b ("platform: Provide a remove callback
that returns no value") for further details about this quest.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig(a)pengutronix.de>
Link: https://lore.kernel.org/r/20230512173810.131447-2-u.kleine-koenig@pengutron…
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/tty/serial/stm32-usart.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)
diff --git a/drivers/tty/serial/stm32-usart.c b/drivers/tty/serial/stm32-usart.c
index 28edbaf7bb329..2a9c4058824a8 100644
--- a/drivers/tty/serial/stm32-usart.c
+++ b/drivers/tty/serial/stm32-usart.c
@@ -1753,13 +1753,10 @@ static int stm32_usart_serial_remove(struct platform_device *pdev)
struct uart_port *port = platform_get_drvdata(pdev);
struct stm32_port *stm32_port = to_stm32_port(port);
const struct stm32_usart_offsets *ofs = &stm32_port->info->ofs;
- int err;
u32 cr3;
pm_runtime_get_sync(&pdev->dev);
- err = uart_remove_one_port(&stm32_usart_driver, port);
- if (err)
- return(err);
+ uart_remove_one_port(&stm32_usart_driver, port);
pm_runtime_disable(&pdev->dev);
pm_runtime_set_suspended(&pdev->dev);
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 1ae31dbc61c64..5e61abb3dce5c 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -265,7 +265,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 7b54606783821..ba64dad1d7c9e 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -271,7 +271,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
From: hackyzh002 <hackyzh002(a)gmail.com>
[ Upstream commit f828b681d0cd566f86351c0b913e6cb6ed8c7b9c ]
The type of size is unsigned, if size is 0x40000000, there will be an
integer overflow, size will be zero after size *= sizeof(uint32_t),
will cause uninitialized memory to be referenced later
Reviewed-by: Christian König <christian.koenig(a)amd.com>
Signed-off-by: hackyzh002 <hackyzh002(a)gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/radeon/radeon_cs.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 9ed2b2700e0a5..e5fbe851ed930 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -270,7 +270,8 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
{
struct drm_radeon_cs *cs = data;
uint64_t *chunk_array_ptr;
- unsigned size, i;
+ u64 size;
+ unsigned i;
u32 ring = RADEON_CS_RING_GFX;
s32 priority = 0;
--
2.39.2
The Hyper-V host is queried to get the max transfer size that it
supports, and this value is used to set max_sectors for the synthetic
SCSI controller. However, this max transfer size may be too large
for virtual Fibre Channel devices, which are limited to 512 Kbytes.
If a larger transfer size is used with a vFC device, Hyper-V always
returns an error, and storvsc logs a message like this where the SRB
status and SCSI status are both zero:
hv_storvsc <GUID>: tag#197 cmd 0x8a status: scsi 0x0 srb 0x0 hv 0xc0000001
Add logic to limit the max transfer size to 512 Kbytes for vFC devices.
Fixes: 1d3e0980782f ("scsi: storvsc: Correct reporting of Hyper-V I/O size limits")
Cc: stable(a)vger.kernel.org
Signed-off-by: Michael Kelley <mikelley(a)microsoft.com>
---
drivers/scsi/storvsc_drv.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/scsi/storvsc_drv.c b/drivers/scsi/storvsc_drv.c
index 7f12d93..f282321 100644
--- a/drivers/scsi/storvsc_drv.c
+++ b/drivers/scsi/storvsc_drv.c
@@ -366,6 +366,7 @@ enum storvsc_request_type {
#define STORVSC_FC_MAX_LUNS_PER_TARGET 255
#define STORVSC_FC_MAX_TARGETS 128
#define STORVSC_FC_MAX_CHANNELS 8
+#define STORVSC_FC_MAX_XFER_SIZE ((u32)(512 * 1024))
#define STORVSC_IDE_MAX_LUNS_PER_TARGET 64
#define STORVSC_IDE_MAX_TARGETS 1
@@ -2006,6 +2007,9 @@ static int storvsc_probe(struct hv_device *device,
* protecting it from any weird value.
*/
max_xfer_bytes = round_down(stor_device->max_transfer_bytes, HV_HYP_PAGE_SIZE);
+ if (is_fc)
+ max_xfer_bytes = min(max_xfer_bytes, STORVSC_FC_MAX_XFER_SIZE);
+
/* max_hw_sectors_kb */
host->max_sectors = max_xfer_bytes >> 9;
/*
--
1.8.3.1
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x b321c31c9b7b309dcde5e8854b741c8e6a9a05f0
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072322-posing-dispatch-863e@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
b321c31c9b7b ("KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t preemption")
0c2f9acf6ae7 ("KVM: arm64: PMU: Don't overwrite PMUSERENR with vcpu loaded")
8681f7175901 ("KVM: arm64: PMU: Restore the host's PMUSERENR_EL0")
009d6dc87a56 ("ARM: perf: Allow the use of the PMUv3 driver on 32bit ARM")
711432770f78 ("perf: pmuv3: Abstract PMU version checks")
df29ddf4f04b ("arm64: perf: Abstract system register accesses away")
7755cec63ade ("arm64: perf: Move PMUv3 driver to drivers/perf")
cc91b9481605 ("arm64/perf: Replace PMU version number '0' with ID_AA64DFR0_EL1_PMUVer_NI")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From b321c31c9b7b309dcde5e8854b741c8e6a9a05f0 Mon Sep 17 00:00:00 2001
From: Marc Zyngier <maz(a)kernel.org>
Date: Thu, 13 Jul 2023 08:06:57 +0100
Subject: [PATCH] KVM: arm64: vgic-v4: Make the doorbell request robust w.r.t
preemption
Xiang reports that VMs occasionally fail to boot on GICv4.1 systems when
running a preemptible kernel, as it is possible that a vCPU is blocked
without requesting a doorbell interrupt.
The issue is that any preemption that occurs between vgic_v4_put() and
schedule() on the block path will mark the vPE as nonresident and *not*
request a doorbell irq. This occurs because when the vcpu thread is
resumed on its way to block, vcpu_load() will make the vPE resident
again. Once the vcpu actually blocks, we don't request a doorbell
anymore, and the vcpu won't be woken up on interrupt delivery.
Fix it by tracking that we're entering WFI, and key the doorbell
request on that flag. This allows us not to make the vPE resident
when going through a preempt/schedule cycle, meaning we don't lose
any state.
Cc: stable(a)vger.kernel.org
Fixes: 8e01d9a396e6 ("KVM: arm64: vgic-v4: Move the GICv4 residency flow to be driven by vcpu_load/put")
Reported-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Suggested-by: Zenghui Yu <yuzenghui(a)huawei.com>
Tested-by: Xiang Chen <chenxiang66(a)hisilicon.com>
Co-developed-by: Oliver Upton <oliver.upton(a)linux.dev>
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Acked-by: Zenghui Yu <yuzenghui(a)huawei.com>
Link: https://lore.kernel.org/r/20230713070657.3873244-1-maz@kernel.org
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 8b6096753740..d3dd05bbfe23 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -727,6 +727,8 @@ struct kvm_vcpu_arch {
#define DBG_SS_ACTIVE_PENDING __vcpu_single_flag(sflags, BIT(5))
/* PMUSERENR for the guest EL0 is on physical CPU */
#define PMUSERENR_ON_CPU __vcpu_single_flag(sflags, BIT(6))
+/* WFI instruction trapped */
+#define IN_WFI __vcpu_single_flag(sflags, BIT(7))
/* Pointer to the vcpu's SVE FFR for sve_{save,load}_state() */
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index a402ea5511f3..72dc53a75d1c 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -718,13 +718,15 @@ void kvm_vcpu_wfi(struct kvm_vcpu *vcpu)
*/
preempt_disable();
kvm_vgic_vmcr_sync(vcpu);
- vgic_v4_put(vcpu, true);
+ vcpu_set_flag(vcpu, IN_WFI);
+ vgic_v4_put(vcpu);
preempt_enable();
kvm_vcpu_halt(vcpu);
vcpu_clear_flag(vcpu, IN_WFIT);
preempt_disable();
+ vcpu_clear_flag(vcpu, IN_WFI);
vgic_v4_load(vcpu);
preempt_enable();
}
@@ -792,7 +794,7 @@ static int check_vcpu_requests(struct kvm_vcpu *vcpu)
if (kvm_check_request(KVM_REQ_RELOAD_GICv4, vcpu)) {
/* The distributor enable bits were changed */
preempt_disable();
- vgic_v4_put(vcpu, false);
+ vgic_v4_put(vcpu);
vgic_v4_load(vcpu);
preempt_enable();
}
diff --git a/arch/arm64/kvm/vgic/vgic-v3.c b/arch/arm64/kvm/vgic/vgic-v3.c
index c3b8e132d599..3dfc8b84e03e 100644
--- a/arch/arm64/kvm/vgic/vgic-v3.c
+++ b/arch/arm64/kvm/vgic/vgic-v3.c
@@ -749,7 +749,7 @@ void vgic_v3_put(struct kvm_vcpu *vcpu)
{
struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3;
- WARN_ON(vgic_v4_put(vcpu, false));
+ WARN_ON(vgic_v4_put(vcpu));
vgic_v3_vmcr_sync(vcpu);
diff --git a/arch/arm64/kvm/vgic/vgic-v4.c b/arch/arm64/kvm/vgic/vgic-v4.c
index c1c28fe680ba..339a55194b2c 100644
--- a/arch/arm64/kvm/vgic/vgic-v4.c
+++ b/arch/arm64/kvm/vgic/vgic-v4.c
@@ -336,14 +336,14 @@ void vgic_v4_teardown(struct kvm *kvm)
its_vm->vpes = NULL;
}
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db)
+int vgic_v4_put(struct kvm_vcpu *vcpu)
{
struct its_vpe *vpe = &vcpu->arch.vgic_cpu.vgic_v3.its_vpe;
if (!vgic_supports_direct_msis(vcpu->kvm) || !vpe->resident)
return 0;
- return its_make_vpe_non_resident(vpe, need_db);
+ return its_make_vpe_non_resident(vpe, !!vcpu_get_flag(vcpu, IN_WFI));
}
int vgic_v4_load(struct kvm_vcpu *vcpu)
@@ -354,6 +354,9 @@ int vgic_v4_load(struct kvm_vcpu *vcpu)
if (!vgic_supports_direct_msis(vcpu->kvm) || vpe->resident)
return 0;
+ if (vcpu_get_flag(vcpu, IN_WFI))
+ return 0;
+
/*
* Before making the VPE resident, make sure the redistributor
* corresponding to our current CPU expects us here. See the
diff --git a/include/kvm/arm_vgic.h b/include/kvm/arm_vgic.h
index 402b545959af..5b27f94d4fad 100644
--- a/include/kvm/arm_vgic.h
+++ b/include/kvm/arm_vgic.h
@@ -431,7 +431,7 @@ int kvm_vgic_v4_unset_forwarding(struct kvm *kvm, int irq,
int vgic_v4_load(struct kvm_vcpu *vcpu);
void vgic_v4_commit(struct kvm_vcpu *vcpu);
-int vgic_v4_put(struct kvm_vcpu *vcpu, bool need_db);
+int vgic_v4_put(struct kvm_vcpu *vcpu);
/* CPU HP callbacks */
void kvm_vgic_cpu_up(void);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.14.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072359-overfull-ascent-520d@gregkh' --subject-prefix 'PATCH 4.14.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-4.19.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072357-trilogy-average-5fd9@gregkh' --subject-prefix 'PATCH 4.19.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.4.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072356-margin-freckled-cde8@gregkh' --subject-prefix 'PATCH 5.4.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072354-sublet-observing-2ff8@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
cf0c7125d578 ("Merge branch kvm-arm64/mmu/el2-tracking into kvmarm-master/next")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072352-ooze-childcare-fd84@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
094d00f8ca58 ("KVM: arm64: pkvm: Use the mm_ops indirection for cache maintenance")
e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
82bb02445de5 ("KVM: arm64: Implement kvm_pgtable_hyp_unmap() at EL2")
d6b4bd3f4897 ("KVM: arm64: Fixup hyp stage-1 refcount")
2ea2ff91e822 ("KVM: arm64: Refcount hyp stage-1 pgtable pages")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x df6556adf27b7372cfcd97e1c0afb0d516c8279f
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2023072351-slackness-unlearned-6f45@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
df6556adf27b ("KVM: arm64: Correctly handle page aging notifiers for unaligned memslot")
ca5de2448c3b ("KVM: arm64: Atomically update stage 2 leaf attributes in parallel walks")
6b91b8f95cad ("KVM: arm64: Use an opaque type for pteps")
fa002e8e79b3 ("KVM: arm64: Don't pass kvm_pgtable through kvm_pgtable_walk_data")
2a611c7f87f2 ("KVM: arm64: Pass mm_ops through the visitor context")
83844a2317ec ("KVM: arm64: Stash observed pte value in visitor context")
dfc7a7769ab7 ("KVM: arm64: Combine visitor arguments into a context structure")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From df6556adf27b7372cfcd97e1c0afb0d516c8279f Mon Sep 17 00:00:00 2001
From: Oliver Upton <oliver.upton(a)linux.dev>
Date: Tue, 27 Jun 2023 23:54:05 +0000
Subject: [PATCH] KVM: arm64: Correctly handle page aging notifiers for
unaligned memslot
Userspace is allowed to select any PAGE_SIZE aligned hva to back guest
memory. This is even the case with hugepages, although it is a rather
suboptimal configuration as PTE level mappings are used at stage-2.
The arm64 page aging handlers have an assumption that the specified
range is exactly one page/block of memory, which in the aforementioned
case is not necessarily true. All together this leads to the WARN() in
kvm_age_gfn() firing.
However, the WARN is only part of the issue as the table walkers visit
at most a single leaf PTE. For hugepage-backed memory in a memslot that
isn't hugepage-aligned, page aging entirely misses accesses to the
hugepage beyond the first page in the memslot.
Add a new walker dedicated to handling page aging MMU notifiers capable
of walking a range of PTEs. Convert kvm(_test)_age_gfn() over to the new
walker and drop the WARN that caught the issue in the first place. The
implementation of this walker was inspired by the test_clear_young()
implementation by Yu Zhao [*], but repurposed to address a bug in the
existing aging implementation.
Cc: stable(a)vger.kernel.org # v5.15
Fixes: 056aad67f836 ("kvm: arm/arm64: Rework gpa callback handlers")
Link: https://lore.kernel.org/kvmarm/20230526234435.662652-6-yuzhao@google.com/
Co-developed-by: Yu Zhao <yuzhao(a)google.com>
Signed-off-by: Yu Zhao <yuzhao(a)google.com>
Reported-by: Reiji Watanabe <reijiw(a)google.com>
Reviewed-by: Marc Zyngier <maz(a)kernel.org>
Reviewed-by: Shaoqin Huang <shahuang(a)redhat.com>
Link: https://lore.kernel.org/r/20230627235405.4069823-1-oliver.upton@linux.dev
Signed-off-by: Oliver Upton <oliver.upton(a)linux.dev>
diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 8294a9a7e566..929d355eae0a 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -608,22 +608,26 @@ int kvm_pgtable_stage2_wrprotect(struct kvm_pgtable *pgt, u64 addr, u64 size);
kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr);
/**
- * kvm_pgtable_stage2_mkold() - Clear the access flag in a page-table entry.
+ * kvm_pgtable_stage2_test_clear_young() - Test and optionally clear the access
+ * flag in a page-table entry.
* @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
* @addr: Intermediate physical address to identify the page-table entry.
+ * @size: Size of the address range to visit.
+ * @mkold: True if the access flag should be cleared.
*
* The offset of @addr within a page is ignored.
*
- * If there is a valid, leaf page-table entry used to translate @addr, then
- * clear the access flag in that entry.
+ * Tests and conditionally clears the access flag for every valid, leaf
+ * page-table entry used to translate the range [@addr, @addr + @size).
*
* Note that it is the caller's responsibility to invalidate the TLB after
* calling this function to ensure that the updated permissions are visible
* to the CPUs.
*
- * Return: The old page-table entry prior to clearing the flag, 0 on failure.
+ * Return: True if any of the visited PTEs had the access flag set.
*/
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold);
/**
* kvm_pgtable_stage2_relax_perms() - Relax the permissions enforced by a
@@ -645,18 +649,6 @@ kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr);
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
enum kvm_pgtable_prot prot);
-/**
- * kvm_pgtable_stage2_is_young() - Test whether a page-table entry has the
- * access flag set.
- * @pgt: Page-table structure initialised by kvm_pgtable_stage2_init*().
- * @addr: Intermediate physical address to identify the page-table entry.
- *
- * The offset of @addr within a page is ignored.
- *
- * Return: True if the page-table entry has the access flag set, false otherwise.
- */
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr);
-
/**
* kvm_pgtable_stage2_flush_range() - Clean and invalidate data cache to Point
* of Coherency for guest stage-2 address
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index aa740a974e02..f7a93ef29250 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1195,25 +1195,54 @@ kvm_pte_t kvm_pgtable_stage2_mkyoung(struct kvm_pgtable *pgt, u64 addr)
return pte;
}
-kvm_pte_t kvm_pgtable_stage2_mkold(struct kvm_pgtable *pgt, u64 addr)
+struct stage2_age_data {
+ bool mkold;
+ bool young;
+};
+
+static int stage2_age_walker(const struct kvm_pgtable_visit_ctx *ctx,
+ enum kvm_pgtable_walk_flags visit)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, KVM_PTE_LEAF_ATTR_LO_S2_AF,
- &pte, NULL, 0);
+ kvm_pte_t new = ctx->old & ~KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data *data = ctx->arg;
+
+ if (!kvm_pte_valid(ctx->old) || new == ctx->old)
+ return 0;
+
+ data->young = true;
+
+ /*
+ * stage2_age_walker() is always called while holding the MMU lock for
+ * write, so this will always succeed. Nonetheless, this deliberately
+ * follows the race detection pattern of the other stage-2 walkers in
+ * case the locking mechanics of the MMU notifiers is ever changed.
+ */
+ if (data->mkold && !stage2_try_set_pte(ctx, new))
+ return -EAGAIN;
+
/*
* "But where's the TLBI?!", you scream.
* "Over in the core code", I sigh.
*
* See the '->clear_flush_young()' callback on the KVM mmu notifier.
*/
- return pte;
+ return 0;
}
-bool kvm_pgtable_stage2_is_young(struct kvm_pgtable *pgt, u64 addr)
+bool kvm_pgtable_stage2_test_clear_young(struct kvm_pgtable *pgt, u64 addr,
+ u64 size, bool mkold)
{
- kvm_pte_t pte = 0;
- stage2_update_leaf_attrs(pgt, addr, 1, 0, 0, &pte, NULL, 0);
- return pte & KVM_PTE_LEAF_ATTR_LO_S2_AF;
+ struct stage2_age_data data = {
+ .mkold = mkold,
+ };
+ struct kvm_pgtable_walker walker = {
+ .cb = stage2_age_walker,
+ .arg = &data,
+ .flags = KVM_PGTABLE_WALK_LEAF,
+ };
+
+ WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+ return data.young;
}
int kvm_pgtable_stage2_relax_perms(struct kvm_pgtable *pgt, u64 addr,
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 6db9ef288ec3..d3b4feed460c 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -1756,27 +1756,25 @@ bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
u64 size = (range->end - range->start) << PAGE_SHIFT;
- kvm_pte_t kpte;
- pte_t pte;
if (!kvm->arch.mmu.pgt)
return false;
- WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PUD_SIZE);
-
- kpte = kvm_pgtable_stage2_mkold(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
- pte = __pte(kpte);
- return pte_valid(pte) && pte_young(pte);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, true);
}
bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
+ u64 size = (range->end - range->start) << PAGE_SHIFT;
+
if (!kvm->arch.mmu.pgt)
return false;
- return kvm_pgtable_stage2_is_young(kvm->arch.mmu.pgt,
- range->start << PAGE_SHIFT);
+ return kvm_pgtable_stage2_test_clear_young(kvm->arch.mmu.pgt,
+ range->start << PAGE_SHIFT,
+ size, false);
}
phys_addr_t kvm_mmu_get_httbr(void)
Recently Luiz Capitulino reported BPF test failure for kernel version
6.1.36 (see [7]). The following test_verifier test failed:
"precise: ST insn causing spi > allocated_stack".
After back-port of the following upstream commit:
ecdf985d7615 ("bpf: track immediate values written to stack by BPF_ST instruction")
Investigation in [8] shows that test failure is not a bug, but a
difference in BPF verifier behavior between upstream, where commits
[1,2,3] by Andrii Nakryiko are present, and 6.1.36, where these
commits are absent. Both Luiz and Greg suggested back-porting [1,2,3]
from upstream to avoid divergences.
Commits [1,2,3] break test_progs selftest "align/packet variable offset",
commit [4] fixes this selftest.
I did some additional testing using the following compiler versions:
- Kernel compilation
- gcc version 11.3.0
- BPF tests compilation
- clang version 16.0.6
- clang version 17.0.0 (fa46feb31481)
And identified a few more failing BPF selftests:
- Tests failing with LLVM 16:
- test_verifier:
- precise: ST insn causing spi > allocated_stack FAIL (fixed by [1,2,3])
- test_progs:
- sk_assign (fixed by [6])
- Tests failing with LLVM 17:
- test_verifier:
- precise: ST insn causing spi > allocated_stack FAIL (fixed by [1,2,3])
- test_progs:
- fexit_bpf2bpf/func_replace_verify (fixed by [5])
- fexit_bpf2bpf/func_replace_return_code (fixed by [5])
- sk_assign (fixed by [6])
Commits [4,5,6] only apply to BPF selftests and don't change verifier
behavior.
After applying all of the listed commits I have test_verifier,
test_progs, test_progs-no_alu32 and test_maps passing on my x86 setup,
both for LLVM 16 and LLVM 17.
Upstream commits in chronological order:
[1] be2ef8161572 ("bpf: allow precision tracking for programs with subprogs")
[2] f63181b6ae79 ("bpf: stop setting precise in current state")
[3] 7a830b53c17b ("bpf: aggressively forget precise markings during state checkpointing")
[4] 4f999b767769 ("selftests/bpf: make test_align selftest more robust")
[5] 63d78b7e8ca2 ("selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code")
[6] 7ce878ca81bc ("selftests/bpf: Fix sk_assign on s390x")
Links:
[7] https://lore.kernel.org/stable/935c4751-d368-df29-33a6-9f4fcae720fa@amazon.…
[8] https://lore.kernel.org/stable/c9b10a8a551edafdfec855fbd35757c6238ad258.cam…
Reported-by: Luiz Capitulino <luizcap(a)amazon.com>
Andrii Nakryiko (4):
bpf: allow precision tracking for programs with subprogs
bpf: stop setting precise in current state
bpf: aggressively forget precise markings during state checkpointing
selftests/bpf: make test_align selftest more robust
Ilya Leoshkevich (1):
selftests/bpf: Fix sk_assign on s390x
Yonghong Song (1):
selftests/bpf: Workaround verification failure for
fexit_bpf2bpf/func_replace_return_code
kernel/bpf/verifier.c | 202 ++++++++++++++++--
.../testing/selftests/bpf/prog_tests/align.c | 38 ++--
.../selftests/bpf/prog_tests/sk_assign.c | 25 ++-
.../selftests/bpf/progs/connect4_prog.c | 2 +-
.../selftests/bpf/progs/test_sk_assign.c | 11 +
.../bpf/progs/test_sk_assign_libbpf.c | 3 +
6 files changed, 247 insertions(+), 34 deletions(-)
create mode 100644 tools/testing/selftests/bpf/progs/test_sk_assign_libbpf.c
--
2.41.0
Seen in arm:allmodconfig and arm64:allmodconfig builds with v5.4.249-278-g78f9a3d1c959.
sound/soc/mediatek/mt8173/mt8173-afe-pcm.c: In function 'mt8173_afe_pcm_dev_probe':
sound/soc/mediatek/mt8173/mt8173-afe-pcm.c:1159:17: error: label 'err_cleanup_components' used but not defined
Guenter