The patch titled
Subject: mm: open-code page_folio() in dump_page()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-open-code-page_folio-in-dump_page.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: mm: open-code page_folio() in dump_page()
Date: Mon, 25 Nov 2024 20:17:19 +0000
page_folio() calls page_fixed_fake_head() which will misidentify this page
as being a fake head and load off the end of 'precise'. We may have a
pointer to a fake head, but that's OK because it contains the right
information for dump_page().
gcc-15 is smart enough to catch this with -Warray-bounds:
In function 'page_fixed_fake_head',
inlined from '_compound_head' at ../include/linux/page-flags.h:251:24,
inlined from '__dump_page' at ../mm/debug.c:123:11:
../include/asm-generic/rwonce.h:44:26: warning: array subscript 9 is outside array bounds of 'struct page[1]' [-Warray-bounds=]
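For illustration, a minimal user-space sketch of the tagged-pointer convention
the open-coded check relies on: bit 0 of compound_head is set on tail pages
and, with that bit masked off, points at the head page.  page_stub below is a
simplified stand-in, not the kernel's struct page.

#include <stdio.h>

/* Simplified stand-in for struct page: only the compound_head word. */
struct page_stub {
	unsigned long compound_head;
};

int main(void)
{
	struct page_stub head = { .compound_head = 0 };	/* bit 0 clear */
	struct page_stub tail = { .compound_head = (unsigned long)&head | 1 };
	unsigned long h = tail.compound_head;

	if (h & 1)	/* tail page: strip the tag to find the head */
		printf("tail page, head at %p\n", (void *)(h - 1));

	if ((head.compound_head & 1) == 0)	/* head or order-0 page */
		printf("head page, usable as-is\n");

	return 0;
}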
Link: https://lkml.kernel.org/r/20241125201721.2963278-2-willy@infradead.org
Fixes: fae7d834c43c ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Reported-by: Kees Cook <kees(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/debug.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
--- a/mm/debug.c~mm-open-code-page_folio-in-dump_page
+++ a/mm/debug.c
@@ -124,19 +124,22 @@ static void __dump_page(const struct pag
{
struct folio *foliop, folio;
struct page precise;
+ unsigned long head;
unsigned long pfn = page_to_pfn(page);
unsigned long idx, nr_pages = 1;
int loops = 5;
again:
memcpy(&precise, page, sizeof(*page));
- foliop = page_folio(&precise);
- if (foliop == (struct folio *)&precise) {
+ head = precise.compound_head;
+ if ((head & 1) == 0) {
+ foliop = (struct folio *)&precise;
idx = 0;
if (!folio_test_large(foliop))
goto dump;
foliop = (struct folio *)page;
} else {
+ foliop = (struct folio *)(head - 1);
idx = folio_page_idx(foliop, page);
}
_
Patches currently in -mm which might be from willy(a)infradead.org are
mm-open-code-pagetail-in-folio_flags-and-const_folio_flags.patch
mm-open-code-page_folio-in-dump_page.patch
mm-page_alloc-cache-page_zone-result-in-free_unref_page.patch
mm-make-alloc_pages_mpol-static.patch
mm-page_alloc-export-free_frozen_pages-instead-of-free_unref_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-post_alloc_hook.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-prep_new_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-get_page_from_freelist.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_cpuset_fallback.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_may_oom.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_compact.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_reclaim.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_slowpath.patch
mm-page_alloc-move-set_page_refcounted-to-end-of-__alloc_pages.patch
mm-page_alloc-add-__alloc_frozen_pages.patch
mm-mempolicy-add-alloc_frozen_pages.patch
slab-allocate-frozen-pages.patch
The patch titled
Subject: mm: open-code PageTail in folio_flags() and const_folio_flags()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-open-code-pagetail-in-folio_flags-and-const_folio_flags.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: "Matthew Wilcox (Oracle)" <willy(a)infradead.org>
Subject: mm: open-code PageTail in folio_flags() and const_folio_flags()
Date: Mon, 25 Nov 2024 20:17:18 +0000
It is unsafe to call PageTail() in dump_page() as page_is_fake_head() will
almost certainly return true when called on a head page that is copied to
the stack. That will cause the VM_BUG_ON_PGFLAGS() in const_folio_flags()
to trigger when it shouldn't. Fortunately, we don't need to call
PageTail() here; it's fine to have a pointer to a virtual alias of the
page's flag word rather than the real page's flag word.
Link: https://lkml.kernel.org/r/20241125201721.2963278-1-willy@infradead.org
Fixes: fae7d834c43c ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Cc: Kees Cook <kees(a)kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
include/linux/page-flags.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/include/linux/page-flags.h~mm-open-code-pagetail-in-folio_flags-and-const_folio_flags
+++ a/include/linux/page-flags.h
@@ -306,7 +306,7 @@ static const unsigned long *const_folio_
{
const struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(PageTail(page), page);
+ VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
return &page[n].flags;
}
@@ -315,7 +315,7 @@ static unsigned long *folio_flags(struct
{
struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(PageTail(page), page);
+ VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
return &page[n].flags;
}
_
Patches currently in -mm which might be from willy(a)infradead.org are
mm-open-code-pagetail-in-folio_flags-and-const_folio_flags.patch
mm-open-code-page_folio-in-dump_page.patch
mm-page_alloc-cache-page_zone-result-in-free_unref_page.patch
mm-make-alloc_pages_mpol-static.patch
mm-page_alloc-export-free_frozen_pages-instead-of-free_unref_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-post_alloc_hook.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-prep_new_page.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-get_page_from_freelist.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_cpuset_fallback.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_may_oom.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_compact.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_direct_reclaim.patch
mm-page_alloc-move-set_page_refcounted-to-callers-of-__alloc_pages_slowpath.patch
mm-page_alloc-move-set_page_refcounted-to-end-of-__alloc_pages.patch
mm-page_alloc-add-__alloc_frozen_pages.patch
mm-mempolicy-add-alloc_frozen_pages.patch
slab-allocate-frozen-pages.patch
The patch titled
Subject: mm: fix vrealloc()'s KASAN poisoning logic
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-fix-vreallocs-kasan-poisoning-logic.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrii Nakryiko <andrii(a)kernel.org>
Subject: mm: fix vrealloc()'s KASAN poisoning logic
Date: Mon, 25 Nov 2024 16:52:06 -0800
When vrealloc() reuses an already allocated vmap_area, we need to re-annotate
the poisoned and unpoisoned portions of the underlying memory according to the new
size.
Note, hard-coding KASAN_VMALLOC_PROT_NORMAL might not be exactly correct,
but KASAN flag logic is pretty involved and spread out throughout
__vmalloc_node_range_noprof(), so I'm using the bare minimum flag here and
leaving the rest to mm people to refactor this logic and reuse it here.
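As a rough illustration only (not the kernel's KASAN implementation; real
KASAN keeps one shadow byte per 8 bytes of memory and uses dedicated
encodings), the re-annotation amounts to poisoning the now-unused tail and
unpoisoning the in-use range:

#include <stdio.h>
#include <string.h>

#define POISONED	0xff	/* illustrative "inaccessible" shadow value */
#define UNPOISONED	0x00	/* illustrative "accessible" shadow value */

static unsigned char shadow[64];	/* toy 1:1 shadow for a 64-byte buffer */

/* Re-annotate after reusing the same allocation at a smaller size. */
static void reannotate(size_t size, size_t old_size)
{
	if (old_size > size)
		memset(shadow + size, POISONED, old_size - size);
	memset(shadow, UNPOISONED, size);
}

int main(void)
{
	memset(shadow, UNPOISONED, 48);	/* buffer previously used 48 bytes */
	reannotate(16, 48);		/* the "vrealloc" shrinks it to 16 */
	printf("shadow[8]=%#x shadow[32]=%#x\n", shadow[8], shadow[32]);
	return 0;
}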
Link: https://lkml.kernel.org/r/20241126005206.3457974-1-andrii@kernel.org
Fixes: 3ddc2fefe6f3 ("mm: vmalloc: implement vrealloc()")
Signed-off-by: Andrii Nakryiko <andrii(a)kernel.org>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Uladzislau Rezki (Sony) <urezki(a)gmail.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmalloc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/vmalloc.c~mm-fix-vreallocs-kasan-poisoning-logic
+++ a/mm/vmalloc.c
@@ -4093,7 +4093,8 @@ void *vrealloc_noprof(const void *p, siz
/* Zero out spare memory. */
if (want_init_on_alloc(flags))
memset((void *)p + size, 0, old_size - size);
-
+ kasan_poison_vmalloc(p + size, old_size - size);
+ kasan_unpoison_vmalloc(p, size, KASAN_VMALLOC_PROT_NORMAL);
return (void *)p;
}
_
Patches currently in -mm which might be from andrii(a)kernel.org are
mm-fix-vreallocs-kasan-poisoning-logic.patch
The patch titled
Subject: Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
revert-readahead-properly-shorten-readahead-when-falling-back-to-do_page_cache_ra.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jan Kara <jack(a)suse.cz>
Subject: Revert "readahead: properly shorten readahead when falling back to do_page_cache_ra()"
Date: Tue, 26 Nov 2024 15:52:08 +0100
This reverts commit 7c877586da3178974a8a94577b6045a48377ff25.
Anders and Philippe have reported that recent kernels occasionally hang
when used with NFS in readahead code. The problem has been bisected to
7c877586da3 ("readahead: properly shorten readahead when falling back to
do_page_cache_ra()"). The cause of the problem is that ra->size can be
shrunk by the read_pages() call and subsequently we end up calling
do_page_cache_ra() with a negative (read: huge positive) number of pages.
Let's revert 7c877586da3 for now until we can find a proper way for the
logic in read_pages() and page_cache_ra_order() to coexist. This can
lead to reduced readahead throughput due to readahead window confusion but
that's better than outright hangs.
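A stand-alone sketch of the arithmetic that goes wrong (the numbers are
invented for illustration; in the kernel these are page counts):

#include <stdio.h>

int main(void)
{
	unsigned long start = 100;		/* readahead window starts here */
	unsigned long index = start + 16;	/* 16 pages already submitted */
	unsigned long size = 8;			/* ra->size shrunk by read_pages() */

	/* The reverted fallback computed the remaining page count as: */
	unsigned long remaining = size - (index - start);

	/* 8 - 16 wraps around to a huge positive number of pages. */
	printf("remaining = %lu\n", remaining);
	return 0;
}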
Link: https://lkml.kernel.org/r/20241126145208.985-1-jack@suse.cz
Fixes: 7c877586da31 ("readahead: properly shorten readahead when falling back to do_page_cache_ra()")
Reported-by: Anders Blomdell <anders.blomdell(a)gmail.com>
Reported-by: Philippe Troin <phil(a)fifi.org>
Signed-off-by: Jan Kara <jack(a)suse.cz>
Tested-by: Philippe Troin <phil(a)fifi.org>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/readahead.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
--- a/mm/readahead.c~revert-readahead-properly-shorten-readahead-when-falling-back-to-do_page_cache_ra
+++ a/mm/readahead.c
@@ -460,8 +460,7 @@ void page_cache_ra_order(struct readahea
struct file_ra_state *ra, unsigned int new_order)
{
struct address_space *mapping = ractl->mapping;
- pgoff_t start = readahead_index(ractl);
- pgoff_t index = start;
+ pgoff_t index = readahead_index(ractl);
unsigned int min_order = mapping_min_folio_order(mapping);
pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
pgoff_t mark = index + ra->size - ra->async_size;
@@ -524,7 +523,7 @@ void page_cache_ra_order(struct readahea
if (!err)
return;
fallback:
- do_page_cache_ra(ractl, ra->size - (index - start), ra->async_size);
+ do_page_cache_ra(ractl, ra->size, ra->async_size);
}
static unsigned long ractl_max_pages(struct readahead_control *ractl,
_
Patches currently in -mm which might be from jack(a)suse.cz are
revert-readahead-properly-shorten-readahead-when-falling-back-to-do_page_cache_ra.patch
The patch titled
Subject: mm: vmscan: ensure kswapd is woken up if the wait queue is active
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-vmscan-ensure-kswapd-is-woken-up-if-the-wait-queue-is-active.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Seiji Nishikawa <snishika(a)redhat.com>
Subject: mm: vmscan: ensure kswapd is woken up if the wait queue is active
Date: Wed, 27 Nov 2024 00:06:12 +0900
Even after commit 501b26510ae3 ("vmstat: allow_direct_reclaim should use
zone_page_state_snapshot"), a task may remain indefinitely stuck in
throttle_direct_reclaim() while holding mm->rwsem.
__alloc_pages_nodemask
try_to_free_pages
throttle_direct_reclaim
This can cause numerous other tasks to wait on the same rwsem, leading
to severe system hangups:
[1088963.358712] INFO: task python3:1670971 blocked for more than 120 seconds.
[1088963.365653] Tainted: G OE -------- - - 4.18.0-553.el8_10.aarch64 #1
[1088963.373887] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[1088963.381862] task:python3 state:D stack:0 pid:1670971 ppid:1667117 flags:0x00800080
[1088963.381869] Call trace:
[1088963.381872] __switch_to+0xd0/0x120
[1088963.381877] __schedule+0x340/0xac8
[1088963.381881] schedule+0x68/0x118
[1088963.381886] rwsem_down_read_slowpath+0x2d4/0x4b8
The issue arises when allow_direct_reclaim(pgdat) returns false,
preventing progress even when the pgdat->pfmemalloc_wait wait queue is
empty. Despite the wait queue being empty, allow_direct_reclaim(pgdat) may
still return false, causing the task to keep looping.
In some cases, reclaimable pages exist (zone_reclaimable_pages() returns
> 0), but calculations of pfmemalloc_reserve and free_pages result in
wmark_ok being false.
And then, despite the pgdat->kswapd_wait queue being non-empty, kswapd
is not woken up, further exacerbating the problem:
crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_highest_zoneidx
$775 = __MAX_NR_ZONES
This patch modifies allow_direct_reclaim() to wake kswapd if the
pgdat->kswapd_wait queue is active, regardless of whether wmark_ok is true
or false. This change ensures kswapd does not miss wake-ups under high
memory pressure, reducing the risk of task stalls in the throttled reclaim
path.
Link: https://lkml.kernel.org/r/20241126150612.114561-1-snishika@redhat.com
Signed-off-by: Seiji Nishikawa <snishika(a)redhat.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/vmscan.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
--- a/mm/vmscan.c~mm-vmscan-ensure-kswapd-is-woken-up-if-the-wait-queue-is-active
+++ a/mm/vmscan.c
@@ -6389,8 +6389,8 @@ static bool allow_direct_reclaim(pg_data
wmark_ok = free_pages > pfmemalloc_reserve / 2;
- /* kswapd must be awake if processes are being throttled */
- if (!wmark_ok && waitqueue_active(&pgdat->kswapd_wait)) {
+ /* Always wake up kswapd if the wait queue is not empty */
+ if (waitqueue_active(&pgdat->kswapd_wait)) {
if (READ_ONCE(pgdat->kswapd_highest_zoneidx) > ZONE_NORMAL)
WRITE_ONCE(pgdat->kswapd_highest_zoneidx, ZONE_NORMAL);
_
Patches currently in -mm which might be from snishika(a)redhat.com are
mm-vmscan-ensure-kswapd-is-woken-up-if-the-wait-queue-is-active.patch
The patch titled
Subject: selftests/damon: add _damon_sysfs.py to TEST_FILES
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftests-damon-add-_damon_sysfspy-to-test_files.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Maximilian Heyne <mheyne(a)amazon.de>
Subject: selftests/damon: add _damon_sysfs.py to TEST_FILES
Date: Wed, 27 Nov 2024 12:08:53 +0000
When running selftests I encountered the following error message with
some damon tests:
# Traceback (most recent call last):
# File "[...]/damon/./damos_quota.py", line 7, in <module>
# import _damon_sysfs
# ModuleNotFoundError: No module named '_damon_sysfs'
Fix this by adding the _damon_sysfs.py file to TEST_FILES so that it
will be available when running the respective damon selftests.
Link: https://lkml.kernel.org/r/20241127-picks-visitor-7416685b-mheyne@amazon.de
Fixes: 306abb63a8ca ("selftests/damon: implement a python module for test-purpose DAMON sysfs controls")
Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de>
Reviewed-by: SeongJae Park <sj(a)kernel.org>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/damon/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/tools/testing/selftests/damon/Makefile~selftests-damon-add-_damon_sysfspy-to-test_files
+++ a/tools/testing/selftests/damon/Makefile
@@ -6,7 +6,7 @@ TEST_GEN_FILES += debugfs_target_ids_rea
TEST_GEN_FILES += debugfs_target_ids_pid_leak
TEST_GEN_FILES += access_memory access_memory_even
-TEST_FILES = _chk_dependency.sh _debugfs_common.sh
+TEST_FILES = _chk_dependency.sh _debugfs_common.sh _damon_sysfs.py
# functionality tests
TEST_PROGS = debugfs_attrs.sh debugfs_schemes.sh debugfs_target_ids.sh
_
Patches currently in -mm which might be from mheyne(a)amazon.de are
selftests-damon-add-_damon_sysfspy-to-test_files.patch
The patch titled
Subject: selftest: hugetlb_dio: fix test naming
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
selftest-hugetlb_dio-fix-test-naming.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Mark Brown <broonie(a)kernel.org>
Subject: selftest: hugetlb_dio: fix test naming
Date: Wed, 27 Nov 2024 16:14:22 +0000
The string logged when a test passes or fails is used by the selftest
framework to identify which test is being reported. The hugetlb_dio test
not only uses the same strings for every test that is run but it also uses
different strings for test passes and failures which means that test
automation is unable to follow what the test is doing at all.
Pull the existing duplicated logging of the number of free huge pages
before and after the test out of the conditional and replace that and the
logging of the result with a single ksft_print_result() which incorporates
the parameters passed into the test into the output.
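To sketch the reporting pattern the test is switched to (ksft_test_result()
itself is real and lives in tools/testing/selftests/kselftest.h; the macro
below is only a toy stand-in so the example is self-contained):

#include <stdio.h>

/* Toy stand-in for ksft_test_result(): emit a TAP-style line whose
 * description uniquely names the test case, for both pass and fail. */
#define ksft_test_result(cond, fmt, ...) \
	printf("%s 1 " fmt, (cond) ? "ok" : "not ok", ##__VA_ARGS__)

int main(void)
{
	unsigned int start_off = 0, end_off = 4096;
	int free_before = 10, free_after = 10;

	/* One call, one stable parameterised name, pass or fail. */
	ksft_test_result(free_after == free_before,
			 "free huge pages from %u-%u\n", start_off, end_off);
	return 0;
}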
Link: https://lkml.kernel.org/r/20241127-kselftest-mm-hugetlb-dio-names-v1-1-22aa…
Fixes: fae1980347bf ("selftests: hugetlb_dio: fixup check for initial conditions to skip in the start")
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Cc: Donet Tom <donettom(a)linux.ibm.com>
Cc: Muhammad Usama Anjum <usama.anjum(a)collabora.com>
Cc: Ritesh Harjani (IBM) <ritesh.list(a)gmail.com>
Cc: Shuah Khan <shuah(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
tools/testing/selftests/mm/hugetlb_dio.c | 14 +++++---------
1 file changed, 5 insertions(+), 9 deletions(-)
--- a/tools/testing/selftests/mm/hugetlb_dio.c~selftest-hugetlb_dio-fix-test-naming
+++ a/tools/testing/selftests/mm/hugetlb_dio.c
@@ -76,19 +76,15 @@ void run_dio_using_hugetlb(unsigned int
/* Get the free huge pages after unmap*/
free_hpage_a = get_free_hugepages();
+ ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
+ ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
+
/*
* If the no. of free hugepages before allocation and after unmap does
* not match - that means there could still be a page which is pinned.
*/
- if (free_hpage_a != free_hpage_b) {
- ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
- ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
- ksft_test_result_fail(": Huge pages not freed!\n");
- } else {
- ksft_print_msg("No. Free pages before allocation : %d\n", free_hpage_b);
- ksft_print_msg("No. Free pages after munmap : %d\n", free_hpage_a);
- ksft_test_result_pass(": Huge pages freed successfully !\n");
- }
+ ksft_test_result(free_hpage_a == free_hpage_b,
+ "free huge pages from %u-%u\n", start_off, end_off);
}
int main(void)
_
Patches currently in -mm which might be from broonie(a)kernel.org are
selftest-hugetlb_dio-fix-test-naming.patch
The patch titled
Subject: arch_numa: restore nid checks before registering a memblock with a node
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
arch_numa-restore-nid-checks-before-registering-a-memblock-with-a-node.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Marc Zyngier <maz(a)kernel.org>
Subject: arch_numa: restore nid checks before registering a memblock with a node
Date: Wed, 27 Nov 2024 19:30:00 +0000
Commit 767507654c22 ("arch_numa: switch over to numa_memblks")
significantly cleaned up the NUMA registration code, but also dropped an
important check that refused to configure a memblock with an invalid nid.
On "quality hardware" such as my ThunderX machine, this results
in a kernel that dies immediately:
[ 0.000000] Booting Linux on physical CPU 0x0000000000 [0x431f0a10]
[ 0.000000] Linux version 6.12.0-00013-g8920d74cf8db (maz@valley-girl) (gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #3872 SMP PREEMPT Wed Nov 27 15:25:49 GMT 2024
[ 0.000000] KASLR disabled due to lack of seed
[ 0.000000] Machine model: Cavium ThunderX CN88XX board
[ 0.000000] efi: EFI v2.4 by American Megatrends
[ 0.000000] efi: ESRT=0xffce0ff18 SMBIOS 3.0=0xfffb0000 ACPI 2.0=0xffec60000 MEMRESERVE=0xffc905d98
[ 0.000000] esrt: Reserving ESRT space from 0x0000000ffce0ff18 to 0x0000000ffce0ff50.
[ 0.000000] earlycon: pl11 at MMIO 0x000087e024000000 (options '115200n8')
[ 0.000000] printk: legacy bootconsole [pl11] enabled
[ 0.000000] NODE_DATA(0) allocated [mem 0xff6754580-0xff67566bf]
[ 0.000000] Unable to handle kernel paging request at virtual address 0000000000001d40
[ 0.000000] Mem abort info:
[ 0.000000] ESR = 0x0000000096000004
[ 0.000000] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.000000] SET = 0, FnV = 0
[ 0.000000] EA = 0, S1PTW = 0
[ 0.000000] FSC = 0x04: level 0 translation fault
[ 0.000000] Data abort info:
[ 0.000000] ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 0.000000] CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 0.000000] GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 0.000000] [0000000000001d40] user address but active_mm is swapper
[ 0.000000] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 0.000000] Modules linked in:
[ 0.000000] CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.12.0-00013-g8920d74cf8db #3872
[ 0.000000] Hardware name: Cavium ThunderX CN88XX board (DT)
[ 0.000000] pstate: a00000c5 (NzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.000000] pc : sparse_init_nid+0x54/0x428
[ 0.000000] lr : sparse_init+0x118/0x240
[ 0.000000] sp : ffff800081da3cb0
[ 0.000000] x29: ffff800081da3cb0 x28: 0000000fedbab10c x27: 0000000000000001
[ 0.000000] x26: 0000000ffee250f8 x25: 0000000000000001 x24: ffff800082102cd0
[ 0.000000] x23: 0000000000000001 x22: 0000000000000000 x21: 00000000001fffff
[ 0.000000] x20: 0000000000000001 x19: 0000000000000000 x18: ffffffffffffffff
[ 0.000000] x17: 0000000001b00000 x16: 0000000ffd130000 x15: 0000000000000000
[ 0.000000] x14: 00000000003e0000 x13: 00000000000001c8 x12: 0000000000000014
[ 0.000000] x11: ffff800081e82860 x10: ffff8000820fb2c8 x9 : ffff8000820fb490
[ 0.000000] x8 : 0000000000ffed20 x7 : 0000000000000014 x6 : 00000000001fffff
[ 0.000000] x5 : 00000000ffffffff x4 : 0000000000000000 x3 : 0000000000000000
[ 0.000000] x2 : 0000000000000000 x1 : 0000000000000040 x0 : 0000000000000007
[ 0.000000] Call trace:
[ 0.000000] sparse_init_nid+0x54/0x428
[ 0.000000] sparse_init+0x118/0x240
[ 0.000000] bootmem_init+0x70/0x1c8
[ 0.000000] setup_arch+0x184/0x270
[ 0.000000] start_kernel+0x74/0x670
[ 0.000000] __primary_switched+0x80/0x90
[ 0.000000] Code: f865d804 d37df060 cb030000 d2800003 (b95d4084)
[ 0.000000] ---[ end trace 0000000000000000 ]---
[ 0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[ 0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---
while previous kernel versions were able to recognise how brain-damaged
the machine is, and only build a fake node.
Restoring the check brings back some sanity and a "working" system.
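The restored logic boils down to rejecting any memory region whose node id is
unset or out of range before node registration proceeds.  A user-space sketch
with made-up regions and limits (region_stub is not the kernel's struct
memblock_region):

#include <stdio.h>

#define NUMA_NO_NODE	(-1)
#define MAX_NUMNODES	8	/* illustrative limit */

struct region_stub {
	unsigned long long base, size;
	int nid;
};

int main(void)
{
	struct region_stub regions[] = {
		{ 0x000000000ULL, 0x40000000ULL, 0 },
		{ 0x040000000ULL, 0x40000000ULL, NUMA_NO_NODE },	/* left unset */
	};
	unsigned int i;

	for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
		int nid = regions[i].nid;

		if (nid == NUMA_NO_NODE || nid >= MAX_NUMNODES) {
			fprintf(stderr, "invalid memblk node %d [mem %#llx-%#llx]\n",
				nid, regions[i].base,
				regions[i].base + regions[i].size - 1);
			return 1;
		}
	}
	printf("all regions carry a valid node id\n");
	return 0;
}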
Link: https://lkml.kernel.org/r/20241127193000.3702637-1-maz@kernel.org
Fixes: 767507654c22 ("arch_numa: switch over to numa_memblks")
Signed-off-by: Marc Zyngier <maz(a)kernel.org>
Cc: Mike Rapoport <rppt(a)kernel.org>
Cc: Catalin Marinas <catalin.marinas(a)arm.com>
Cc: Will Deacon <will(a)kernel.org>
Cc: Zi Yan <ziy(a)nvidia.com>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: David Hildenbrand <david(a)redhat.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
drivers/base/arch_numa.c | 14 ++++++++++++++
1 file changed, 14 insertions(+)
--- a/drivers/base/arch_numa.c~arch_numa-restore-nid-checks-before-registering-a-memblock-with-a-node
+++ a/drivers/base/arch_numa.c
@@ -207,7 +207,21 @@ static void __init setup_node_data(int n
static int __init numa_register_nodes(void)
{
int nid;
+ struct memblock_region *mblk;
+ /* Check that valid nid is set to memblks */
+ for_each_mem_region(mblk) {
+ int mblk_nid = memblock_get_region_node(mblk);
+ phys_addr_t start = mblk->base;
+ phys_addr_t end = mblk->base + mblk->size - 1;
+
+ if (mblk_nid == NUMA_NO_NODE || mblk_nid >= MAX_NUMNODES) {
+ pr_warn("Warning: invalid memblk node %d [mem %pap-%pap]\n",
+ mblk_nid, &start, &end);
+ return -EINVAL;
+ }
+ }
+
/* Finally register nodes. */
for_each_node_mask(nid, numa_nodes_parsed) {
unsigned long start_pfn, end_pfn;
_
Patches currently in -mm which might be from maz(a)kernel.org are
arch_numa-restore-nid-checks-before-registering-a-memblock-with-a-node.patch
The quilt patch titled
Subject: fs/proc/kcore.c: clear ret value in read_kcore_iter after successful iov_iter_zero
has been removed from the -mm tree. Its filename was
fs-proc-kcorec-clear-ret-value-in-read_kcore_iter-after-successful-iov_iter_zero.patch
This patch was dropped because it was merged into mainline or a subsystem tree
------------------------------------------------------
From: Jiri Olsa <jolsa(a)kernel.org>
Subject: fs/proc/kcore.c: clear ret value in read_kcore_iter after successful iov_iter_zero
Date: Fri, 22 Nov 2024 00:11:18 +0100
If iov_iter_zero() succeeds after a failed copy_from_kernel_nofault(), we need
to reset the ret value to zero, otherwise it will be returned as the final
return value of read_kcore_iter().
This fixes objdump -d dump over /proc/kcore for me.
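The pattern being fixed, reduced to a user-space sketch (the two helpers are
toys standing in for copy_from_kernel_nofault() and iov_iter_zero()):

#include <stdio.h>
#include <errno.h>

static int primary_copy(void)  { return -EFAULT; }	/* pretend the copy faults */
static int fallback_zero(void) { return 0; }		/* the fallback succeeds */

int main(void)
{
	int ret = 0;

	ret = primary_copy();
	if (ret) {
		if (fallback_zero() != 0)
			return 1;
		ret = 0;	/* the one-line fix: clear the stale error */
	}

	printf("final ret = %d\n", ret);	/* 0 instead of -EFAULT */
	return 0;
}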
Link: https://lkml.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org
Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses")
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <hca(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/kcore.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/proc/kcore.c~fs-proc-kcorec-clear-ret-value-in-read_kcore_iter-after-successful-iov_iter_zero
+++ a/fs/proc/kcore.c
@@ -600,6 +600,7 @@ static ssize_t read_kcore_iter(struct ki
ret = -EFAULT;
goto out;
}
+ ret = 0;
/*
* We know the bounce buffer is safe to copy from, so
* use _copy_to_iter() directly.
_
Patches currently in -mm which might be from jolsa(a)kernel.org are
Despite CM_IDLEST1_CORE and CM_FCLKEN1_CORE behaving normally,
disabling SPI leads to messages like:
Powerdomain (core_pwrdm) didn't enter target state 0
and according to /sys/kernel/debug/pm_debug/count the off state is not
entered. This was not linked to SPI during the earlier discussion
about disabling SPI. See:
https://lore.kernel.org/linux-omap/20230122100852.32ae082c@aktux/
The reason is that the SPI controller is in slave mode by default, and the
Linux driver turns it into a master by default. In slave mode, the power
domain seems to be kept active if an active chip-select input is sensed.
Fix that by explicitly disabling the SPI3 pins, which are muxed by the
bootloader, since they are available on an optionally fitted header
which would require dtb overlays anyway.
Fixes: a622310f7f01 ("ARM: dts: gta04: fix excess dma channel usage")
CC: stable(a)vger.kernel.org
Signed-off-by: Andreas Kemnade <andreas(a)kemnade.info>
---
arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi | 10 ++++++++++
1 file changed, 10 insertions(+)
diff --git a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
index 3661340009e7a..3940909a5aac7 100644
--- a/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
+++ b/arch/arm/boot/dts/ti/omap/omap3-gta04.dtsi
@@ -446,6 +446,7 @@ &omap3_pmx_core2 {
pinctrl-names = "default";
pinctrl-0 = <
&hsusb2_2_pins
+ &mcspi3hog_pins
>;
hsusb2_2_pins: hsusb2-2-pins {
@@ -459,6 +460,15 @@ OMAP3630_CORE2_IOPAD(0x25fa, PIN_INPUT_PULLDOWN | MUX_MODE3) /* etk_d15.hsusb2_d
>;
};
+ mcspi3hog_pins: mcspi3hog-pins {
+ pinctrl-single,pins = <
+ OMAP3630_CORE2_IOPAD(0x25dc, PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* etk_d0 */
+ OMAP3630_CORE2_IOPAD(0x25de, PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* etk_d1 */
+ OMAP3630_CORE2_IOPAD(0x25e0, PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* etk_d2 */
+ OMAP3630_CORE2_IOPAD(0x25e2, PIN_OUTPUT_PULLDOWN | MUX_MODE7) /* etk_d3 */
+ >;
+ };
+
spi_gpio_pins: spi-gpio-pinmux-pins {
pinctrl-single,pins = <
OMAP3630_CORE2_IOPAD(0x25d8, PIN_OUTPUT | MUX_MODE4) /* clk */
--
2.39.2
From: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
DSB LUT register writes vs. palette anti-collision logic
appear to interact in interesting ways:
- posted DSB writes simply vanish into thin air while
anti-collision is active
- non-posted DSB writes actually get blocked by the anti-collision
logic, but unfortunately this ends up hogging the bus for
long enough that unrelated parallel CPU MMIO accesses start
to disappear instead
Even though we are updating the LUT during vblank we aren't
immune to the anti-collision logic because it kicks in briefly
for pipe prefill (initiated at frame start). The safe time
window for performing the LUT update is thus between the
undelayed vblank and frame start. Turns out that with low
enough CDCLK frequency (DSB execution speed depends on CDCLK)
we can exceed that.
We are currently using non-posted writes for the legacy LUT
updates, in which case we can hit the far more severe failure
mode. The problem is exacerbated by the fact that non-posted
writes are much slower than posted writes (~4x it seems).
To mitigate the problem let's switch to using posted DSB
writes for legacy LUT updates (which will involve using the
double write approach to avoid other problems with DSB
vs. legacy LUT writes). Despite writing each register twice
this will in fact make the legacy LUT update faster when
compared to the non-posted write approach, making the
problem less likely to appear. The failure mode is also
less severe.
This isn't the 100% solution we need though. That will involve
estimating how long the LUT update will take, and pushing
frame start and/or delayed vblank forward to guarantee that
the update will have finished by the time the pipe prefill
starts...
Cc: stable(a)vger.kernel.org
Fixes: 34d8311f4a1c ("drm/i915/dsb: Re-instate DSB for LUT updates")
Fixes: 25ea3411bd23 ("drm/i915/dsb: Use non-posted register writes for legacy LUT")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12494
Signed-off-by: Ville Syrjälä <ville.syrjala(a)linux.intel.com>
---
drivers/gpu/drm/i915/display/intel_color.c | 30 ++++++++++++++--------
1 file changed, 20 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/i915/display/intel_color.c b/drivers/gpu/drm/i915/display/intel_color.c
index 6ea3d5c58cb1..7cd902bbd244 100644
--- a/drivers/gpu/drm/i915/display/intel_color.c
+++ b/drivers/gpu/drm/i915/display/intel_color.c
@@ -1368,19 +1368,29 @@ static void ilk_load_lut_8(const struct intel_crtc_state *crtc_state,
lut = blob->data;
/*
- * DSB fails to correctly load the legacy LUT
- * unless we either write each entry twice,
- * or use non-posted writes
+ * DSB fails to correctly load the legacy LUT unless
+ * we either write each entry twice when using posted
+ * writes, or we use non-posted writes.
+ *
+ * If palette anti-collision is active during LUT
+ * register writes:
+ * - posted writes simply get dropped and thus the LUT
+ * contents may not be correctly updated
+ * - non-posted writes are blocked and thus the LUT
+ * contents are always correct, but simultaneous CPU
+ * MMIO access will start to fail
+ *
+ * Choose the lesser of two evils and use posted writes.
+ * Using posted writes is also faster, even when having
+ * to write each register twice.
*/
- if (crtc_state->dsb_color_vblank)
- intel_dsb_nonpost_start(crtc_state->dsb_color_vblank);
-
- for (i = 0; i < 256; i++)
+ for (i = 0; i < 256; i++) {
ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
i9xx_lut_8(&lut[i]));
-
- if (crtc_state->dsb_color_vblank)
- intel_dsb_nonpost_end(crtc_state->dsb_color_vblank);
+ if (crtc_state->dsb_color_vblank)
+ ilk_lut_write(crtc_state, LGC_PALETTE(pipe, i),
+ i9xx_lut_8(&lut[i]));
+ }
}
static void ilk_load_lut_10(const struct intel_crtc_state *crtc_state,
--
2.45.2
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
The early_console_setup() function initializes the sci_ports[0].port with
an object of type struct uart_port obtained from the object of type
struct earlycon_device received as argument by the early_console_setup().
It may happen that later, when the rest of the serial ports are probed,
the serial port that was used as earlycon (e.g., port A) is mapped to a
different position in sci_ports[] and slot 0 is used by a different
serial port (e.g., port B), as follows:
sci_ports[0] = port A
sci_ports[X] = port B
In this case, the new port mapped at index zero will have associated data
that was used for earlycon.
In case this happens, after Linux boot, any access to the serial port that
maps on sci_ports[0] (port A) will block the serial port that was used as
earlycon (port B).
To fix this, add early_console_exit() to clean sci_ports[0] at
earlycon exit time.
Fixes: 0b0cced19ab1 ("serial: sh-sci: Add CONFIG_SERIAL_EARLYCON support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
drivers/tty/serial/sh-sci.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 8e2d534401fa..2f8188bdb251 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -3546,6 +3546,32 @@ sh_early_platform_init_buffer("earlyprintk", &sci_driver,
#ifdef CONFIG_SERIAL_SH_SCI_EARLYCON
static struct plat_sci_port port_cfg __initdata;
+static int early_console_exit(struct console *co)
+{
+ struct sci_port *sci_port = &sci_ports[0];
+ struct uart_port *port = &sci_port->port;
+ unsigned long flags;
+ int locked = 1;
+
+ if (port->sysrq)
+ locked = 0;
+ else if (oops_in_progress)
+ locked = uart_port_trylock_irqsave(port, &flags);
+ else
+ uart_port_lock_irqsave(port, &flags);
+
+ /*
+ * Clean the slot used by earlycon. A new SCI device might
+ * map to this slot.
+ */
+ memset(sci_ports, 0, sizeof(*sci_port));
+
+ if (locked)
+ uart_port_unlock_irqrestore(port, flags);
+
+ return 0;
+}
+
static int __init early_console_setup(struct earlycon_device *device,
int type)
{
@@ -3562,6 +3588,8 @@ static int __init early_console_setup(struct earlycon_device *device,
SCSCR_RE | SCSCR_TE | port_cfg.scscr);
device->con->write = serial_console_write;
+ device->con->exit = early_console_exit;
+
return 0;
}
static int __init sci_early_console_setup(struct earlycon_device *device,
--
2.39.2
From: lei lu <llfamsec(a)gmail.com>
[ Upstream commit fb63435b7c7dc112b1ae1baea5486e0a6e27b196 ]
There is a lack of verification of the space occupied by fixed members
of xlog_op_header in the xlog_recover_process_data.
We can create a crafted image to trigger an out of bounds read by
following these steps:
1) Mount an image of xfs, and do some file operations to leave records
2) Before umounting, copy the image for subsequent steps to simulate
abnormal exit. Because umount will ensure that tail_blk and
head_blk are the same, which will result in the inability to enter
xlog_recover_process_data
3) Write a tool to parse and modify the copied image in step 2
4) Make the end of the xlog_op_header entries only 1 byte away from
xlog_rec_header->h_size
5) xlog_rec_header->h_num_logops++
6) Modify xlog_rec_header->h_crc
Fix:
Add a check to make sure there is sufficient space to access fixed members
of xlog_op_header.
Signed-off-by: lei lu <llfamsec(a)gmail.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu(a)kernel.org>
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
fs/xfs/xfs_log_recover.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 9f9d3abad2cf..d11de0fa5c5f 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2456,7 +2456,10 @@ xlog_recover_process_data(
ohead = (struct xlog_op_header *)dp;
dp += sizeof(*ohead);
- ASSERT(dp <= end);
+ if (dp > end) {
+ xfs_warn(log->l_mp, "%s: op header overrun", __func__);
+ return -EFSCORRUPTED;
+ }
/* errors will abort recovery */
error = xlog_recover_process_ophdr(log, rhash, rhead, ohead,
--
2.34.1
From: Alex Hung <alex.hung(a)amd.com>
[ Upstream commit 3718a619a8c0a53152e76bb6769b6c414e1e83f4 ]
dcn32_enable_phantom_stream can return null, so the returned value
must be checked before use.
This fixes 1 NULL_RETURNS issue reported by Coverity.
Reviewed-by: Rodrigo Siqueira <rodrigo.siqueira(a)amd.com>
Signed-off-by: Jerry Zuo <jerry.zuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Xiangyu: BP to fix CVE: CVE-2024-49897, modified the source path]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
index 2b8700b291a4..ef47fb2f6905 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_resource.c
@@ -1796,6 +1796,9 @@ void dcn32_add_phantom_pipes(struct dc *dc, struct dc_state *context,
// be a valid candidate for SubVP (i.e. has a plane, stream, doesn't
// already have phantom pipe assigned, etc.) by previous checks.
phantom_stream = dcn32_enable_phantom_stream(dc, context, pipes, pipe_cnt, index);
+ if (!phantom_stream)
+ return;
+
dcn32_enable_phantom_plane(dc, context, phantom_stream, index);
for (i = 0; i < dc->res_pool->pipe_count; i++) {
--
2.25.1
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
[ Upstream commit 62ed6f0f198da04e884062264df308277628004f ]
This commit adds a null check for the set_output_gamma function pointer
in the dcn20_set_output_transfer_func function. Previously,
set_output_gamma was being checked for null at line 1030, but then it
was being dereferenced without any null check at line 1048. This could
potentially lead to a null pointer dereference error if set_output_gamma
is null.
To fix this, we now ensure that set_output_gamma is not null before
dereferencing it. We do this by adding a null check for set_output_gamma
before the call to set_output_gamma at line 1048.
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
index 9bd6a5716cdc..81b1ab55338a 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hwseq.c
@@ -856,7 +856,8 @@ bool dcn20_set_output_transfer_func(struct dc *dc, struct pipe_ctx *pipe_ctx,
/*
* if above if is not executed then 'params' equal to 0 and set in bypass
*/
- mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
+ if (mpc->funcs->set_output_gamma)
+ mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
return true;
}
--
2.43.0
From: lei lu <llfamsec(a)gmail.com>
[ Upstream commit fb63435b7c7dc112b1ae1baea5486e0a6e27b196 ]
There is a lack of verification of the space occupied by fixed members
of xlog_op_header in the xlog_recover_process_data.
We can create a crafted image to trigger an out of bounds read by
following these steps:
1) Mount an image of xfs, and do some file operations to leave records
2) Before umounting, copy the image for subsequent steps to simulate
abnormal exit. Because umount will ensure that tail_blk and
head_blk are the same, which will result in the inability to enter
xlog_recover_process_data
3) Write a tool to parse and modify the copied image in step 2
4) Make the end of the xlog_op_header entries only 1 byte away from
xlog_rec_header->h_size
5) xlog_rec_header->h_num_logops++
6) Modify xlog_rec_header->h_crc
Fix:
Add a check to make sure there is sufficient space to access fixed members
of xlog_op_header.
Signed-off-by: lei lu <llfamsec(a)gmail.com>
Reviewed-by: Dave Chinner <dchinner(a)redhat.com>
Reviewed-by: Darrick J. Wong <djwong(a)kernel.org>
Signed-off-by: Chandan Babu R <chandanbabu(a)kernel.org>
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
fs/xfs/xfs_log_recover.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index affe94356ed1..006a376c34b2 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -2439,7 +2439,10 @@ xlog_recover_process_data(
ohead = (struct xlog_op_header *)dp;
dp += sizeof(*ohead);
- ASSERT(dp <= end);
+ if (dp > end) {
+ xfs_warn(log->l_mp, "%s: op header overrun", __func__);
+ return -EFSCORRUPTED;
+ }
/* errors will abort recovery */
error = xlog_recover_process_ophdr(log, rhash, rhead, ohead,
--
2.34.1
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
[ Upstream commit c395fd47d1565bd67671f45cca281b3acc2c31ef ]
This commit addresses a potential null pointer dereference issue in the
`dcn32_init_hw` function. The issue could occur when `dc->clk_mgr` is
null.
The fix adds a check to ensure `dc->clk_mgr` is not null before
accessing its functions. This prevents a potential null pointer
dereference.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn32/dcn32_hwseq.c:961 dcn32_init_hw() error: we previously assumed 'dc->clk_mgr' could be null (see line 782)
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Xiangyu: BP to fix CVE: CVE-2024-49915, modified the source path]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index d3ad13bf35c8..55a24d9f5b14 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -811,7 +811,7 @@ void dcn32_init_hw(struct dc *dc)
int edp_num;
uint32_t backlight = MAX_BACKLIGHT_LEVEL;
- if (dc->clk_mgr && dc->clk_mgr->funcs->init_clocks)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->init_clocks)
dc->clk_mgr->funcs->init_clocks(dc->clk_mgr);
// Initialize the dccg
@@ -970,10 +970,11 @@ void dcn32_init_hw(struct dc *dc)
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
- if (dc->clk_mgr->funcs->notify_wm_ranges)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->notify_wm_ranges)
dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr);
- if (dc->clk_mgr->funcs->set_hard_max_memclk && !dc->clk_mgr->dc_mode_softmax_enabled)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->set_hard_max_memclk &&
+ !dc->clk_mgr->dc_mode_softmax_enabled)
dc->clk_mgr->funcs->set_hard_max_memclk(dc->clk_mgr);
if (dc->res_pool->hubbub->funcs->force_pstate_change_control)
--
2.25.1
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
[ Upstream commit cba7fec864172dadd953daefdd26e01742b71a6a ]
This commit addresses a potential null pointer dereference issue in the
`dcn30_init_hw` function. The issue could occur when `dc->clk_mgr` or
`dc->clk_mgr->funcs` is null.
The fix adds a check to ensure `dc->clk_mgr` and `dc->clk_mgr->funcs` is
not null before accessing its functions. This prevents a potential null
pointer dereference.
Reported by smatch:
drivers/gpu/drm/amd/amdgpu/../display/dc/hwss/dcn30/dcn30_hwseq.c:789 dcn30_init_hw() error: we previously assumed 'dc->clk_mgr' could be null (see line 628)
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Alex Hung <alex.hung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
[Xiangyu: BP to fix CVE: CVE-2024-49917, modified the source path]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
index ba4a1e7f196d..b8653bdfc40f 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn30/dcn30_hwseq.c
@@ -440,7 +440,7 @@ void dcn30_init_hw(struct dc *dc)
int edp_num;
uint32_t backlight = MAX_BACKLIGHT_LEVEL;
- if (dc->clk_mgr && dc->clk_mgr->funcs->init_clocks)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->init_clocks)
dc->clk_mgr->funcs->init_clocks(dc->clk_mgr);
// Initialize the dccg
@@ -599,11 +599,12 @@ void dcn30_init_hw(struct dc *dc)
if (!dcb->funcs->is_accelerated_mode(dcb) && dc->res_pool->hubbub->funcs->init_watermarks)
dc->res_pool->hubbub->funcs->init_watermarks(dc->res_pool->hubbub);
- if (dc->clk_mgr->funcs->notify_wm_ranges)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->notify_wm_ranges)
dc->clk_mgr->funcs->notify_wm_ranges(dc->clk_mgr);
//if softmax is enabled then hardmax will be set by a different call
- if (dc->clk_mgr->funcs->set_hard_max_memclk && !dc->clk_mgr->dc_mode_softmax_enabled)
+ if (dc->clk_mgr && dc->clk_mgr->funcs && dc->clk_mgr->funcs->set_hard_max_memclk &&
+ !dc->clk_mgr->dc_mode_softmax_enabled)
dc->clk_mgr->funcs->set_hard_max_memclk(dc->clk_mgr);
if (dc->res_pool->hubbub->funcs->force_pstate_change_control)
--
2.25.1
v4:
- Adds Bjorn's RB to first patch - Bjorn
- Drops the 'd' in "and int" - Bjorn
- Amends commit log of patch 3 to capture a number of open questions -
Bjorn
- Link to v3: https://lore.kernel.org/r/20241126-b4-linux-next-24-11-18-clock-multiple-po…
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point I also verified my test branch with lockdep on which
was a concern with the pm_domain version of this solution but I wanted to
cover it anyway with the new approach for completeness sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses demv_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and its SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds the support to attach a power-domain list to the
clock-controllers and the GDSCs those controllers provide so that in the
case of the above example gdsc_toggle_logic() will trigger the power-domain
list with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (3):
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
clk: qcom: Support attaching GDSCs to multiple parents
drivers/clk/qcom/common.c | 21 +++++++++++++++++++++
drivers/clk/qcom/gdsc.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 61 insertions(+), 2 deletions(-)
---
base-commit: 744cf71b8bdfcdd77aaf58395e068b7457634b2c
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
From: Chuck Lever <chuck.lever(a)oracle.com>
Testing shows that the EBUSY error return from mtree_alloc_cyclic()
leaks into user space. The ERRORS section of "man creat(2)" says:
> EBUSY O_EXCL was specified in flags and pathname refers
> to a block device that is in use by the system
> (e.g., it is mounted).
ENOSPC is closer to what applications expect in this situation.
Note that the normal range of simple directory offset values is
2..2^63, so hitting this error is going to be rare to impossible.
Fixes: 6faddda69f62 ("libfs: Add directory operations for stable offsets")
Cc: <stable(a)vger.kernel.org> # v6.9+
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Reviewed-by: Yang Erkun <yangerkun(a)huawei.com>
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/libfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 46966fd8bcf9..bf67954b525b 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -288,7 +288,9 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
LONG_MAX, &octx->next_offset, GFP_KERNEL);
- if (ret < 0)
+ if (unlikely(ret == -EBUSY))
+ return -ENOSPC;
+ if (unlikely(ret < 0))
return ret;
offset_set(dentry, offset);
--
2.47.0
The following commit has been merged into the timers/urgent branch of tip:
Commit-ID: 299130166e70124956c865a66a3669a61db1c212
Gitweb: https://git.kernel.org/tip/299130166e70124956c865a66a3669a61db1c212
Author: Marcelo Dalmas <marcelo.dalmas(a)ge.com>
AuthorDate: Mon, 25 Nov 2024 12:16:09
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Wed, 27 Nov 2024 15:18:45 +01:00
ntp: Remove invalid cast in time offset math
Due to an unsigned cast, adjtimex() returns the wrong offset when using
ADJ_MICRO and the offset is negative. In this case a small negative offset
returns approximately 4.29 seconds (~ 2^32/1000 milliseconds) due to the
unsigned cast of the negative offset.
Remove the cast and restore the signed division to cure that issue.
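For illustration only (not the kernel code itself), the small userspace
snippet below reproduces the arithmetic: a small negative nanosecond offset,
once truncated to an unsigned 32-bit value and divided by 1000, turns into
roughly 4.29 seconds worth of microseconds instead of a small negative value.
#include <stdio.h>
#include <stdint.h>
int main(void)
{
        int64_t offset = -2000000;              /* -2 ms, in nanoseconds */
        /* Buggy conversion: -2000000 truncates to 4292967296 as u32,
         * and 4292967296 / 1000 = 4292967 us, i.e. ~4.29 seconds.
         */
        int64_t wrong = (uint32_t)offset / 1000;
        /* Correct conversion: plain signed division keeps the sign. */
        int64_t right = offset / 1000;
        printf("wrong=%lld right=%lld\n", (long long)wrong, (long long)right);
        return 0;
}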
Fixes: ead25417f82e ("timex: use __kernel_timex internally")
Signed-off-by: Marcelo Dalmas <marcelo.dalmas(a)ge.com>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/SJ0P101MB03687BF7D5A10FD3C49C51E5F42E2@SJ0P101M…
---
kernel/time/ntp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index b550ebe..02e7fe6 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -798,7 +798,7 @@ int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts,
txc->offset = shift_right(ntpdata->time_offset * NTP_INTERVAL_FREQ, NTP_SCALE_SHIFT);
if (!(ntpdata->time_status & STA_NANO))
- txc->offset = (u32)txc->offset / NSEC_PER_USEC;
+ txc->offset /= NSEC_PER_USEC;
}
result = ntpdata->time_state;
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
On the Renesas RZ/G3S, when doing suspend to RAM, uart_suspend_port()
is called. uart_suspend_port() calls struct uart_port::ops::tx_empty()
three times before shutting down the port.
According to the documentation, the struct uart_port::ops::tx_empty()
API tests whether the transmitter FIFO and shifter for the port is
empty.
The Renesas RZ/G3S SCIFA IP reports the number of data units stored in the
transmit FIFO through the FDR (FIFO Data Count Register). The data units
in the FIFOs are written in the shift register and transmitted from there.
The TEND bit in the Serial Status Register reports if the data was
transmitted from the shift register.
In the previous code, in the tx_empty() API implemented by the sh-sci
driver, it is considered that the TX is empty if the hardware reports the
TEND bit set and the number of data units in the FIFO is zero.
According to the HW manual, the TEND bit has the following meaning:
0: Transmission is in the waiting state or in progress.
1: Transmission is completed.
It has been noticed that when opening the serial device without using it and
then switching to a power saving mode, the tx_empty() call in the
uart_suspend_port() function fails, leading to the "Unable to drain
transmitter" message being printed on the console. This is because
TEND=0 if nothing has been transmitted and the FIFOs are empty. As
TEND=0 has a double meaning (waiting state, in progress) we can't
determine the scenario described above.
Add a software workaround for this. It sets a variable if any data has
been sent on the serial console (when using PIO) or if the DMA callback has
been called (meaning something has been transmitted). In the tx_empty()
API the status of the DMA transaction is also checked, and if it is
completed or in progress the code falls back to checking the hardware
registers instead of relying on the software variable.
Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Changes in v3:
- s/first_time_tx/tx_occurred/g
- checked the DMA status in sci_tx_empty() through sci_dma_check_tx_occurred()
function; added this new function as the DMA support is conditioned by
the CONFIG_SERIAL_SH_SCI_DMA flag
- dropped the tx_occurred initialization in sci_shutdown() as it is already
initialized in sci_startup()
- adjusted the commit message to reflect latest changes
Changes in v2:
- use bool type instead of atomic_t
drivers/tty/serial/sh-sci.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 136e0c257af1..ade151ff39d2 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -157,6 +157,7 @@ struct sci_port {
bool has_rtscts;
bool autorts;
+ bool tx_occurred;
};
#define SCI_NPORTS CONFIG_SERIAL_SH_SCI_NR_UARTS
@@ -850,6 +851,7 @@ static void sci_transmit_chars(struct uart_port *port)
{
struct tty_port *tport = &port->state->port;
unsigned int stopped = uart_tx_stopped(port);
+ struct sci_port *s = to_sci_port(port);
unsigned short status;
unsigned short ctrl;
int count;
@@ -885,6 +887,7 @@ static void sci_transmit_chars(struct uart_port *port)
}
sci_serial_out(port, SCxTDR, c);
+ s->tx_occurred = true;
port->icount.tx++;
} while (--count > 0);
@@ -1241,6 +1244,8 @@ static void sci_dma_tx_complete(void *arg)
if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS)
uart_write_wakeup(port);
+ s->tx_occurred = true;
+
if (!kfifo_is_empty(&tport->xmit_fifo)) {
s->cookie_tx = 0;
schedule_work(&s->work_tx);
@@ -1731,6 +1736,16 @@ static void sci_flush_buffer(struct uart_port *port)
s->cookie_tx = -EINVAL;
}
}
+
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+ struct dma_tx_state state;
+ enum dma_status status;
+
+ status = dmaengine_tx_status(s->chan_tx, s->cookie_tx, &state);
+ if (status == DMA_COMPLETE || status == DMA_IN_PROGRESS)
+ s->tx_occurred = true;
+}
#else /* !CONFIG_SERIAL_SH_SCI_DMA */
static inline void sci_request_dma(struct uart_port *port)
{
@@ -1740,6 +1755,10 @@ static inline void sci_free_dma(struct uart_port *port)
{
}
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+}
+
#define sci_flush_buffer NULL
#endif /* !CONFIG_SERIAL_SH_SCI_DMA */
@@ -2076,6 +2095,12 @@ static unsigned int sci_tx_empty(struct uart_port *port)
{
unsigned short status = sci_serial_in(port, SCxSR);
unsigned short in_tx_fifo = sci_txfill(port);
+ struct sci_port *s = to_sci_port(port);
+
+ sci_dma_check_tx_occurred(s);
+
+ if (!s->tx_occurred)
+ return TIOCSER_TEMT;
return (status & SCxSR_TEND(port)) && !in_tx_fifo ? TIOCSER_TEMT : 0;
}
@@ -2247,6 +2272,7 @@ static int sci_startup(struct uart_port *port)
dev_dbg(port->dev, "%s(%d)\n", __func__, port->line);
+ s->tx_occurred = false;
sci_request_dma(port);
ret = sci_request_irq(s);
--
2.39.2
From: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
The problem apparently has been known since the conversion to
raw_spin_lock() (commit 4dbada2be460
("gpio: omap: use raw locks for locking")).
Symptom:
[ BUG: Invalid wait context ]
5.10.214
-----------------------------
swapper/1 is trying to lock:
(enable_lock){....}-{3:3}, at: clk_enable_lock
other info that might help us debug this:
context-{5:5}
2 locks held by swapper/1:
#0: (&dev->mutex){....}-{4:4}, at: device_driver_attach
#1: (&bank->lock){....}-{2:2}, at: omap_gpio_set_config
stack backtrace:
CPU: 0 PID: 1 Comm: swapper Not tainted 5.10.214
Hardware name: Generic AM33XX (Flattened Device Tree)
unwind_backtrace
show_stack
__lock_acquire
lock_acquire.part.0
_raw_spin_lock_irqsave
clk_enable_lock
clk_enable
omap_gpio_set_config
gpio_keys_setup_key
gpio_keys_probe
platform_drv_probe
really_probe
driver_probe_device
device_driver_attach
__driver_attach
bus_for_each_dev
bus_add_driver
driver_register
do_one_initcall
do_initcalls
kernel_init_freeable
kernel_init
Problematic spin_lock_irqsave(&enable_lock, ...) is being called by
clk_enable()/clk_disable() in omap2_set_gpio_debounce() and
omap_clear_gpio_debounce().
For omap2_set_gpio_debounce() it's possible to move
raw_spin_lock_irqsave(&bank->lock, ...) inside omap2_set_gpio_debounce()
so that the locks nest as follows:
clk_enable(bank->dbck)
raw_spin_lock_irqsave(&bank->lock, ...)
raw_spin_unlock_irqrestore()
clk_disable()
Two call-sites of omap_clear_gpio_debounce() are more convoluted, but one
can take advantage of the nesting nature of clk_enable()/clk_disable(),
so that the inner clk_disable() becomes lockless:
clk_enable(bank->dbck) <-- only to clk_enable_lock()
raw_spin_lock_irqsave(&bank->lock, ...)
omap_clear_gpio_debounce()
clk_disable() <-- becomes lockless
raw_spin_unlock_irqrestore()
clk_disable()
Cc: stable(a)vger.kernel.org
Fixes: 4dbada2be460 ("gpio: omap: use raw locks for locking")
Signed-off-by: Alexander Sverdlin <alexander.sverdlin(a)siemens.com>
---
drivers/gpio/gpio-omap.c | 35 ++++++++++++++++++++++++++++++-----
1 file changed, 30 insertions(+), 5 deletions(-)
diff --git a/drivers/gpio/gpio-omap.c b/drivers/gpio/gpio-omap.c
index 7ad4534054962..f9e502aa57753 100644
--- a/drivers/gpio/gpio-omap.c
+++ b/drivers/gpio/gpio-omap.c
@@ -181,6 +181,7 @@ static inline void omap_gpio_dbck_disable(struct gpio_bank *bank)
static int omap2_set_gpio_debounce(struct gpio_bank *bank, unsigned offset,
unsigned debounce)
{
+ unsigned long flags;
u32 val;
u32 l;
bool enable = !!debounce;
@@ -196,13 +197,18 @@ static int omap2_set_gpio_debounce(struct gpio_bank *bank, unsigned offset,
l = BIT(offset);
+ /*
+ * Ordering is important here: clk_enable() calls spin_lock_irqsave(),
+ * therefore it must be outside of the following raw_spin_lock_irqsave()
+ */
clk_enable(bank->dbck);
+ raw_spin_lock_irqsave(&bank->lock, flags);
+
writel_relaxed(debounce, bank->base + bank->regs->debounce);
val = omap_gpio_rmw(bank->base + bank->regs->debounce_en, l, enable);
bank->dbck_enable_mask = val;
- clk_disable(bank->dbck);
/*
* Enable debounce clock per module.
* This call is mandatory because in omap_gpio_request() when
@@ -217,6 +223,9 @@ static int omap2_set_gpio_debounce(struct gpio_bank *bank, unsigned offset,
bank->context.debounce_en = val;
}
+ raw_spin_unlock_irqrestore(&bank->lock, flags);
+ clk_disable(bank->dbck);
+
return 0;
}
@@ -647,6 +656,13 @@ static void omap_gpio_irq_shutdown(struct irq_data *d)
unsigned long flags;
unsigned offset = d->hwirq;
+ /*
+ * Enable the clock here so that the nested clk_disable() in the
+ * following omap_clear_gpio_debounce() is lockless
+ */
+ if (bank->dbck_flag)
+ clk_enable(bank->dbck);
+
raw_spin_lock_irqsave(&bank->lock, flags);
bank->irq_usage &= ~(BIT(offset));
omap_set_gpio_triggering(bank, offset, IRQ_TYPE_NONE);
@@ -656,6 +672,9 @@ static void omap_gpio_irq_shutdown(struct irq_data *d)
omap_clear_gpio_debounce(bank, offset);
omap_disable_gpio_module(bank, offset);
raw_spin_unlock_irqrestore(&bank->lock, flags);
+
+ if (bank->dbck_flag)
+ clk_disable(bank->dbck);
}
static void omap_gpio_irq_bus_lock(struct irq_data *data)
@@ -827,6 +846,13 @@ static void omap_gpio_free(struct gpio_chip *chip, unsigned offset)
struct gpio_bank *bank = gpiochip_get_data(chip);
unsigned long flags;
+ /*
+ * Enable the clock here so that the nested clk_disable() in the
+ * following omap_clear_gpio_debounce() is lockless
+ */
+ if (bank->dbck_flag)
+ clk_enable(bank->dbck);
+
raw_spin_lock_irqsave(&bank->lock, flags);
bank->mod_usage &= ~(BIT(offset));
if (!LINE_USED(bank->irq_usage, offset)) {
@@ -836,6 +862,9 @@ static void omap_gpio_free(struct gpio_chip *chip, unsigned offset)
omap_disable_gpio_module(bank, offset);
raw_spin_unlock_irqrestore(&bank->lock, flags);
+ if (bank->dbck_flag)
+ clk_disable(bank->dbck);
+
pm_runtime_put(chip->parent);
}
@@ -913,15 +942,11 @@ static int omap_gpio_debounce(struct gpio_chip *chip, unsigned offset,
unsigned debounce)
{
struct gpio_bank *bank;
- unsigned long flags;
int ret;
bank = gpiochip_get_data(chip);
- raw_spin_lock_irqsave(&bank->lock, flags);
ret = omap2_set_gpio_debounce(bank, offset, debounce);
- raw_spin_unlock_irqrestore(&bank->lock, flags);
-
if (ret)
dev_info(chip->parent,
"Could not set line %u debounce to %u microseconds (%d)",
--
2.47.0
[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector
size < page size cases (e.g. 4K sector size and 64K page size), then
we can hit a double ordered extent accounting error.
This should be very rare, as submit_one_sector() only fails when we
fail to grab the extent map, and such an extent map should already exist
in memory and be pinned.
[CAUSE]
For example we have the following folio layout:
0 4K 32K 48K 60K 64K
|//| |//////| |///|
Where |///| is the dirty range we need to writeback. The 3 different
dirty ranges are submitted for regular COW.
Now we hit the following sequence:
- submit_one_sector() returned 0 for [0, 4K)
- submit_one_sector() returned 0 for [32K, 48K)
- submit_one_sector() returned error for [60K, 64K)
- btrfs_mark_ordered_io_finished() called for the whole folio
This will mark the following ranges as finished:
* [0, 4K)
* [32K, 48K)
Both ranges already have their IO submitted, so this cleanup will
lead to double accounting.
* [60K, 64K)
That's the correct cleanup.
The only good news is that this error is only theoretical, as the target
extent map is always pinned; thus we should grab it directly from
memory rather than reading it from disk.
[FIX]
Instead of calling btrfs_mark_ordered_io_finished() for the whole folio
range, which can touch ranges we should not touch, move the error
handling inside extent_writepage_io().
That way we can clean up exactly the sectors that ought to have been
submitted but failed.
This provides a much more accurate cleanup, avoiding the double accounting.
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index d619c4e148be..b74298c2c24f 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1418,6 +1418,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_fs_info *fs_info = inode->root->fs_info;
unsigned long range_bitmap = 0;
bool submitted_io = false;
+ bool error = false;
const u64 folio_start = folio_pos(folio);
u64 cur;
int bit;
@@ -1460,11 +1461,21 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
break;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
- if (ret < 0)
- goto out;
+ if (unlikely(ret < 0)) {
+ submit_one_bio(bio_ctrl);
+ /*
+ * Failed to grab the extent map which should be very rare.
+ * Since there is no bio submitted to finish the ordered
+ * extent, we have to manually finish this sector.
+ */
+ btrfs_mark_ordered_io_finished(inode, folio, cur,
+ fs_info->sectorsize, false);
+ error = true;
+ continue;
+ }
submitted_io = true;
}
-out:
+
/*
* If we didn't submitted any sector (>= i_size), folio dirty get
* cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared
@@ -1472,8 +1483,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
*
* Here we set writeback and clear for the range. If the full folio
* is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
+ *
+ * If we hit any error, the corresponding sector will still be dirty
+ * thus no need to clear PAGECACHE_TAG_DIRTY.
*/
- if (!submitted_io) {
+ if (!submitted_io && !error) {
btrfs_folio_set_writeback(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1493,7 +1507,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
{
struct inode *inode = folio->mapping->host;
struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
- const u64 page_start = folio_pos(folio);
int ret;
size_t pg_offset;
loff_t i_size = i_size_read(inode);
@@ -1536,10 +1549,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
- if (ret)
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- page_start, PAGE_SIZE, !ret);
-
done:
if (ret < 0)
mapping_set_error(folio->mapping, ret);
@@ -2320,11 +2329,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
if (ret == 1)
goto next_page;
- if (ret) {
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- cur, cur_len, !ret);
+ if (ret)
mapping_set_error(mapping, ret);
- }
btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
if (ret < 0)
found_error = true;
--
2.47.0
[BUG]
There are several double accounting cases where the WARN_ON_ONCE() is
triggered inside can_finish_ordered_extent().
All such cases point back to the btrfs_mark_ordered_io_finished()
call inside extent_writepage() when it hits some error.
[CAUSE]
With extra debug patches to show where the error comes from, it turns out
that btrfs_run_delalloc_range() can fail with -ENOSPC.
Such a failure is itself already a symptom of bad data/metadata space
reservation, but here we need to focus on the error handling part.
For example, we have the following dirty page layout (4K sector size and
4K page size):
0 16K 32K
|/////|/////|/////|/////|/////|/////|/////|/////|
Where the range [0, 32K) is dirty and we need to write all the 8 pages
back.
When handling the first page 0, we go through the following sequence:
- btrfs_run_delalloc_range() for range [0, 32k)
We enter cow_file_range() for [0, 32K)
- btrfs_reserve_extent() only returned a 16K data extent.
This can be caused by fragmentation, and it's already an indication
we're almost running out of space.
Now we have the following layout:
0 16K 32K
|<----- Reserved ------>|/////|/////|/////|/////|
The range [0, 16K) has ordered extent allocated.
- btrfs_reserve_extent() returned -ENOSPC
We have really run out of space. But since we have reserved space
for the range [0, 16K) we need to clean it up.
But that cleanup for ordered extent only happens inside
btrfs_run_delalloc_range().
- btrfs_run_delalloc_range() cleans up the reserved ordered extent
by calling btrfs_mark_ordered_io_finished() for the range [0, 32K).
It will locate the ordered extent [0, 16K) and mark it as IOERR.
Also, since the ordered extent is only 16K, we're finishing the whole
ordered extent.
Thus we call btrfs_queue_ordered_fn() to queue the ordered extent to
be finished.
But the ordered extent [0, 16K) is still in the
btrfs_inode::ordered_tree.
- extent_writepage() cleans up the ordered extent inside the folio
We call btrfs_mark_ordered_io_finished() for the range [0, 4K).
Since the finished ordered extent [0, 16K) is not yet removed (racy,
depends on when btrfs_finish_one_ordered() is called), if
btrfs_mark_ordered_io_finished() is called before
btrfs_finish_one_ordered(), we will double account and trigger the
warning inside can_finish_ordered_extent().
So the root cause is that we're relying on btrfs_mark_ordered_io_finished()
to handle ranges which are already cleaned up.
Unfortunately the bug dates back to the early days when
btrfs_mark_ordered_io_finished() was introduced as a no-brainer choice for
error paths, but such a solution just hides all the races and makes
us less cautious when handling errors.
[FIX]
Instead of relying on the btrfs_mark_ordered_io_finished() call to
clean up the whole folio range, record the last successfully run delalloc
range, and combine that with bio_ctrl->submit_bitmap to properly clean up
any newly created ordered extents.
Since we have cleaned up the ordered extents in that range, we should no
longer rely on the btrfs_mark_ordered_io_finished() call inside
extent_writepage().
This ensures btrfs_mark_ordered_io_finished() is only called once
when writepage_delalloc() fails.
Cc: stable(a)vger.kernel.org # 5.15+
Fixes: e65f152e4348 ("btrfs: refactor how we finish ordered extent io for endio functions")
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 37 ++++++++++++++++++++++++++++++++-----
1 file changed, 32 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 438974d4def4..d619c4e148be 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1167,6 +1167,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
* last delalloc end.
*/
u64 last_delalloc_end = 0;
+ /*
+ * Save the last successfully ran delalloc range end (exclusive).
+ * This is for error handling to avoid ranges with ordered extent created
+ * but no IO will be submitted due to error.
+ */
+ u64 last_finished = page_start;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
@@ -1235,11 +1241,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_len = last_delalloc_end + 1 - found_start;
if (ret >= 0) {
+ /*
+ * Some delalloc range may be created by previous folios.
+ * Thus we still need to clean those range up during error
+ * handling.
+ */
+ last_finished = found_start;
/* No errors hit so far, run the current delalloc range. */
ret = btrfs_run_delalloc_range(inode, folio,
found_start,
found_start + found_len - 1,
wbc);
+ if (ret >= 0)
+ last_finished = found_start + found_len;
} else {
/*
* We've hit an error during previous delalloc range,
@@ -1274,8 +1288,21 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = found_start + found_len;
}
- if (ret < 0)
+ /*
+ * It's possible we have some ordered extents created before we hit
+ * an error, cleanup non-async successfully created delalloc ranges.
+ */
+ if (unlikely(ret < 0)) {
+ unsigned int bitmap_size = min(
+ (last_finished - page_start) >> fs_info->sectorsize_bits,
+ fs_info->sectors_per_page);
+
+ for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
+ btrfs_mark_ordered_io_finished(inode, folio,
+ page_start + (bit << fs_info->sectorsize_bits),
+ fs_info->sectorsize, false);
return ret;
+ }
out:
if (last_delalloc_end)
delalloc_end = last_delalloc_end;
@@ -1509,13 +1536,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
-done:
- if (ret) {
+ if (ret)
btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
page_start, PAGE_SIZE, !ret);
- mapping_set_error(folio->mapping, ret);
- }
+done:
+ if (ret < 0)
+ mapping_set_error(folio->mapping, ret);
/*
* Only unlock ranges that are submitted. As there can be some async
* submitted ranges inside the folio.
--
2.47.0
Hi,
This series fixes the UFS resume from suspend issue by marking the UFS PHY GDSCs
as ALWAYS_ON. Starting from SM8550, UFS PHY GDSCs don't support retention
state, so we should keep them always on so that they don't lose their state
during suspend.
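For reference, a hedged sketch of the pattern (not the actual hunks from
gcc-sm8550.c/gcc-sm8650.c): keeping a GDSC permanently on in a qcom clock
driver is typically expressed through its power state and flags, roughly as
in the illustrative definition below. The register offset, name and the
extra flags here are placeholders; whether the real patches use PWRSTS_ON,
ALWAYS_ON, or both is not shown in this cover letter.
#include "gdsc.h"        /* drivers/clk/qcom/gdsc.h */
static struct gdsc ufs_phy_gdsc = {
        .gdscr = 0x77004,                       /* placeholder offset */
        .pd = {
                .name = "ufs_phy_gdsc",
        },
        .pwrsts = PWRSTS_ON,                    /* never power the GDSC down */
        .flags = RETAIN_FF_ENABLE | ALWAYS_ON,  /* keep state across suspend */
};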
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
Manivannan Sadhasivam (2):
clk: qcom: gcc-sm8550: Keep UFS PHY GDSCs ALWAYS_ON
clk: qcom: gcc-sm8650: Keep UFS PHY GDSCs ALWAYS_ON
drivers/clk/qcom/gcc-sm8550.c | 4 ++--
drivers/clk/qcom/gcc-sm8650.c | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241107-ufs-clk-fix-e49ee2097594
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Patches 1 and 2 of this series fix the issue reported by Hsin-Te Yuan
[1] where MT8192-based Chromebooks are not able to suspend/resume 10
times in a row. Either one of those patches on its own is enough to fix
the issue, but I believe both are desirable, so I've included them both
here.
Patches 3-5 fix unrelated issues that I've noticed while debugging.
Patch 3 fixes IRQ storms when the temperature sensors drop to 20
Celsius. Patches 4 and 5 are cleanups to prevent future issues.
To test this series, I've run 'rtcwake -m mem -d 60' 10 times in a row
on a MT8192-Asurada-Spherion-rev3 Chromebook and checked that the wakeup
happened 60 seconds later (+-5 seconds). I've repeated that test on 10
separate runs. Not once did the chromebook wake up early with the series
applied.
I've also checked that during those runs, the LVTS interrupt didn't
trigger even once, while before the series it would trigger a few times
per run, generally during boot or resume.
Finally, as a sanity check I've verified that the interrupts still work
by lowering the thermal trip point to 45 Celsius and running 'stress -c
8'. Indeed they still do, and the temperature shown by the
thermal_temperature ftrace event matched the expected value.
[1] https://lore.kernel.org/all/20241108-lvts-v1-1-eee339c6ca20@chromium.org/
Signed-off-by: Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
---
Nícolas F. R. A. Prado (5):
thermal/drivers/mediatek/lvts: Disable monitor mode during suspend
thermal/drivers/mediatek/lvts: Disable Stage 3 thermal threshold
thermal/drivers/mediatek/lvts: Disable low offset IRQ for minimum threshold
thermal/drivers/mediatek/lvts: Start sensor interrupts disabled
thermal/drivers/mediatek/lvts: Only update IRQ enable for valid sensors
drivers/thermal/mediatek/lvts_thermal.c | 103 ++++++++++++++++++++++----------
1 file changed, 72 insertions(+), 31 deletions(-)
---
base-commit: b852e1e7a0389ed6168ef1d38eb0bad71a6b11e8
change-id: 20241121-mt8192-lvts-filtered-suspend-fix-a5032ca8eceb
Best regards,
--
Nícolas F. R. A. Prado <nfraprado(a)collabora.com>
v3:
- Fixes commit log "per which" - Bryan
- Link to v2: https://lore.kernel.org/r/20241125-b4-linux-next-24-11-18-clock-multiple-po…
v2:
The main change in this version addresses Bjorn's point that calling
pm_runtime_* inside the gdsc_enable()/gdsc_disable() path would be recursive
and cause a lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point, I also verified my test branch with lockdep enabled,
which was a concern with the pm_domain version of this solution, but I wanted
to cover it anyway with the new approach for completeness' sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses devm_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and its SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds the support to attach a power-domain list to the
clock-controllers and the GDSCs those controllers provide so that in the
case of the above example gdsc_toggle_logic() will trigger the power-domain
list with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (3):
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
driver: clk: qcom: Support attaching subdomain list to multiple parents
drivers/clk/qcom/common.c | 21 +++++++++++++++++++++
drivers/clk/qcom/gdsc.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 61 insertions(+), 2 deletions(-)
---
base-commit: 744cf71b8bdfcdd77aaf58395e068b7457634b2c
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
From: Chuck Lever <chuck.lever(a)oracle.com>
Testing shows that the EBUSY error return from mtree_alloc_cyclic()
leaks into user space. The ERRORS section of "man creat(2)" says:
> EBUSY O_EXCL was specified in flags and pathname refers
> to a block device that is in use by the system
> (e.g., it is mounted).
ENOSPC is closer to what applications expect in this situation.
Note that the normal range of simple directory offset values is
2..2^63, so hitting this error is going to be rare to impossible.
Fixes: 6faddda69f62 ("libfs: Add directory operations for stable offsets")
Cc: <stable(a)vger.kernel.org> # v6.9+
Signed-off-by: Chuck Lever <chuck.lever(a)oracle.com>
---
fs/libfs.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 46966fd8bcf9..bf67954b525b 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -288,7 +288,9 @@ int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, DIR_OFFSET_MIN,
LONG_MAX, &octx->next_offset, GFP_KERNEL);
- if (ret < 0)
+ if (unlikely(ret == -EBUSY))
+ return -ENOSPC;
+ if (unlikely(ret < 0))
return ret;
offset_set(dentry, offset);
--
2.47.0
XE_CACHE_WB must be converted into the per-platform pat index for that
particular caching mode, otherwise we are just encoding whatever happens
to be the value of that enum.
Fixes: e8babb280b5e ("drm/xe: Convert multiple bind ops into single job")
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: Nirmoy Das <nirmoy.das(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.12+
---
drivers/gpu/drm/xe/xe_migrate.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/xe/xe_migrate.c b/drivers/gpu/drm/xe/xe_migrate.c
index cfd31ae49cc1..48e205a40fd2 100644
--- a/drivers/gpu/drm/xe/xe_migrate.c
+++ b/drivers/gpu/drm/xe/xe_migrate.c
@@ -1350,6 +1350,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
/* For sysmem PTE's, need to map them in our hole.. */
if (!IS_DGFX(xe)) {
+ u16 pat_index = xe->pat.idx[XE_CACHE_WB];
u32 ptes, ofs;
ppgtt_ofs = NUM_KERNEL_PDE - 1;
@@ -1409,7 +1410,7 @@ __xe_migrate_update_pgtables(struct xe_migrate *m,
pt_bo->update_index = current_update;
addr = vm->pt_ops->pte_encode_bo(pt_bo, 0,
- XE_CACHE_WB, 0);
+ pat_index, 0);
bb->cs[bb->len++] = lower_32_bits(addr);
bb->cs[bb->len++] = upper_32_bits(addr);
}
--
2.47.0
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 12aaf67584cf19dc84615b7aba272fe642c35b8b
Gitweb: https://git.kernel.org/tip/12aaf67584cf19dc84615b7aba272fe642c35b8b
Author: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
AuthorDate: Thu, 21 Nov 2024 12:48:25
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 26 Nov 2024 19:58:27 +01:00
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
Commit fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")
introduced in v6.11-rc1 broke Marvell Armada platforms (and possibly others)
by incorrectly switching irq-mvebu-sei to MSI parent.
In the above commit, msi_parent_ops is set for the sei->cp_domain, but
rather than adding a .select method to mvebu_sei_cp_domain_ops (which is
associated with sei->cp_domain), it was added to mvebu_sei_domain_ops which
is associated with sei->sei_domain, which doesn't have any
msi_parent_ops. This makes the call to msi_lib_irq_domain_select() always
fail.
This bug manifests itself with the following kernel messages on Armada 8040
based systems:
platform f21e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)
platform f41e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)
Move the select callback to mvebu_sei_cp_domain_ops to cure it.
Fixes: fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
Link: https://lore.kernel.org/all/E1tE6bh-004CmX-QU@rmk-PC.armlinux.org.uk
---
drivers/irqchip/irq-mvebu-sei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-mvebu-sei.c b/drivers/irqchip/irq-mvebu-sei.c
index f8c70f2..065166a 100644
--- a/drivers/irqchip/irq-mvebu-sei.c
+++ b/drivers/irqchip/irq-mvebu-sei.c
@@ -192,7 +192,6 @@ static void mvebu_sei_domain_free(struct irq_domain *domain, unsigned int virq,
}
static const struct irq_domain_ops mvebu_sei_domain_ops = {
- .select = msi_lib_irq_domain_select,
.alloc = mvebu_sei_domain_alloc,
.free = mvebu_sei_domain_free,
};
@@ -306,6 +305,7 @@ static void mvebu_sei_cp_domain_free(struct irq_domain *domain,
}
static const struct irq_domain_ops mvebu_sei_cp_domain_ops = {
+ .select = msi_lib_irq_domain_select,
.alloc = mvebu_sei_cp_domain_alloc,
.free = mvebu_sei_cp_domain_free,
};
The following commit has been merged into the irq/urgent branch of tip:
Commit-ID: 81b9e4c6910fd779b679d7674ec7d3730c7f0e2c
Gitweb: https://git.kernel.org/tip/81b9e4c6910fd779b679d7674ec7d3730c7f0e2c
Author: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
AuthorDate: Thu, 21 Nov 2024 12:48:25
Committer: Thomas Gleixner <tglx(a)linutronix.de>
CommitterDate: Tue, 26 Nov 2024 19:50:42 +01:00
irqchip/irq-mvebu-sei: Move misplaced select() callback to SEI CP domain
Commit fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")
introduced in v6.11-rc1 broke Marvell Armada platforms (and possibly others)
by incorrectly switching irq-mvebu-sei to MSI parent.
In the above commit, msi_parent_ops is set for the sei->cp_domain, but
rather than adding a .select method to mvebu_sei_cp_domain_ops (which is
associated with sei->cp_domain), it was added to mvebu_sei_domain_ops which
is associated with sei->sei_domain, which doesn't have any
msi_parent_ops. This makes the call to msi_lib_irq_domain_select() always
fail.
This bug manifests itself with the following kernel messages on Armada 8040
based systems:
platform f21e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)
platform f41e0000.interrupt-controller:interrupt-controller@50: deferred probe pending: (reason unknown)
Move the select callback to mvebu_sei_cp_domain_ops to cure it.
Fixes: fbdf14e90ce4 ("irqchip/irq-mvebu-sei: Switch to MSI parent")
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Thomas Gleixner <tglx(a)linutronix.de>
Cc: stable(a)vger.kernel.org
---
drivers/irqchip/irq-mvebu-sei.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/irqchip/irq-mvebu-sei.c b/drivers/irqchip/irq-mvebu-sei.c
index f8c70f2..065166a 100644
--- a/drivers/irqchip/irq-mvebu-sei.c
+++ b/drivers/irqchip/irq-mvebu-sei.c
@@ -192,7 +192,6 @@ static void mvebu_sei_domain_free(struct irq_domain *domain, unsigned int virq,
}
static const struct irq_domain_ops mvebu_sei_domain_ops = {
- .select = msi_lib_irq_domain_select,
.alloc = mvebu_sei_domain_alloc,
.free = mvebu_sei_domain_free,
};
@@ -306,6 +305,7 @@ static void mvebu_sei_cp_domain_free(struct irq_domain *domain,
}
static const struct irq_domain_ops mvebu_sei_cp_domain_ops = {
+ .select = msi_lib_irq_domain_select,
.alloc = mvebu_sei_cp_domain_alloc,
.free = mvebu_sei_cp_domain_free,
};
From: Alex Hung <alex.hung(a)amd.com>
[ Upstream commit b995c0a6de6c74656a0c39cd57a0626351b13e3c ]
[WHAT & HOW]
Variables used as denominators and maybe not assigned to other values,
should not be 0. Change their default to 1 so they are never 0.
This fixes 10 DIVIDE_BY_ZERO issues reported by Coverity.
Reviewed-by: Harry Wentland <harry.wentland(a)amd.com>
Signed-off-by: Jerry Zuo <jerry.zuo(a)amd.com>
Signed-off-by: Alex Hung <alex.hung(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[Xiangyu: Backported to fix CVE-2024-49899.
Discarded the dml2_core/dml2_core_shared.c change as this file does not exist]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
.../gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c | 2 +-
drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
index 548cdef8a8ad..543ce9a08cfd 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn20/display_rq_dlg_calc_20.c
@@ -78,7 +78,7 @@ static void calculate_ttu_cursor(struct display_mode_lib *mode_lib,
static unsigned int get_bytes_per_element(enum source_format_class source_format, bool is_chroma)
{
- unsigned int ret_val = 0;
+ unsigned int ret_val = 1;
if (source_format == dm_444_16) {
if (!is_chroma)
diff --git a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c
index 3df559c591f8..70df992f859d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dml1_display_rq_dlg_calc.c
@@ -39,7 +39,7 @@
static unsigned int get_bytes_per_element(enum source_format_class source_format, bool is_chroma)
{
- unsigned int ret_val = 0;
+ unsigned int ret_val = 1;
if (source_format == dm_444_16) {
if (!is_chroma)
--
2.43.0
From: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
[ Upstream commit 28574b08c70e56d34d6f6379326a860b96749051 ]
This commit adds a null check for the set_output_gamma function pointer
in the dcn32_set_output_transfer_func function. Previously,
set_output_gamma was being checked for null, but then it was being
dereferenced without any null check. This could lead to a null pointer
dereference if set_output_gamma is null.
To fix this, we now ensure that set_output_gamma is not null before
dereferencing it. We do this by adding a null check for set_output_gamma
before the call to set_output_gamma.
Cc: Tom Chung <chiahsuan.chung(a)amd.com>
Cc: Rodrigo Siqueira <Rodrigo.Siqueira(a)amd.com>
Cc: Roman Li <roman.li(a)amd.com>
Cc: Alex Hung <alex.hung(a)amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai(a)amd.com>
Cc: Harry Wentland <harry.wentland(a)amd.com>
Cc: Hamza Mahfooz <hamza.mahfooz(a)amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam(a)amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
---
drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
index bd75d3cba098..d3ad13bf35c8 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn32/dcn32_hwseq.c
@@ -667,7 +667,9 @@ bool dcn32_set_output_transfer_func(struct dc *dc,
}
}
- mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
+ if (mpc->funcs->set_output_gamma)
+ mpc->funcs->set_output_gamma(mpc, mpcc_id, params);
+
return ret;
}
--
2.43.0
From: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Backport to fix CVE-2024-49951, the main fix is
Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
This required 1 extra commit to make sure the picks are clean:
Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue
Luiz Augusto von Dentz (2):
Bluetooth: hci_sync: Add helper functions to manipulate cmd_sync queue
Bluetooth: MGMT: Fix possible crash on mgmt_index_removed
include/net/bluetooth/hci_sync.h | 12 +++
net/bluetooth/hci_sync.c | 132 +++++++++++++++++++++++++++++--
net/bluetooth/mgmt.c | 23 +++---
3 files changed, 150 insertions(+), 17 deletions(-)
--
2.43.0
Hi
I submit this upstream patch for the stable branches 6.6 and 6.11.
Matthew Sakai from the DM-VDO team found out that there is a very narrow
race condition in the slub sysfs code and it could cause crashes if caches
with the same name are rapidly created and deleted. In order to work
around these crashes, we need to have unique slab cache names.
Mikulas
commit 42964e4b5e3ac95090bdd23ed7da2a941ccd902c
Author: Mikulas Patocka <mpatocka(a)redhat.com>
Date: Mon Nov 11 16:48:18 2024 +0100
dm-bufio: fix warnings about duplicate slab caches
The commit 4c39529663b9 adds a warning about duplicate cache names if
CONFIG_DEBUG_VM is selected. These warnings are triggered by the dm-bufio
code. The dm-bufio code allocates a slab cache with each client. It is
not possible to preallocate the caches in the module init function
because the size of auxiliary per-buffer data is not known at this point.
So, this commit changes dm-bufio so that it appends a unique atomic value
to the cache name, to avoid the warnings.
Signed-off-by: Mikulas Patocka <mpatocka(a)redhat.com>
Fixes: 4c39529663b9 ("slab: Warn on duplicate cache names when DEBUG_VM=y")
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index d478aafa02c9..23e0b71b991e 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -2471,7 +2471,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
int r;
unsigned int num_locks;
struct dm_bufio_client *c;
- char slab_name[27];
+ char slab_name[64];
+ static atomic_t seqno = ATOMIC_INIT(0);
if (!block_size || block_size & ((1 << SECTOR_SHIFT) - 1)) {
DMERR("%s: block size not specified or is not multiple of 512b", __func__);
@@ -2522,7 +2523,8 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
(block_size < PAGE_SIZE || !is_power_of_2(block_size))) {
unsigned int align = min(1U << __ffs(block_size), (unsigned int)PAGE_SIZE);
- snprintf(slab_name, sizeof(slab_name), "dm_bufio_cache-%u", block_size);
+ snprintf(slab_name, sizeof(slab_name), "dm_bufio_cache-%u-%u",
+ block_size, atomic_inc_return(&seqno));
c->slab_cache = kmem_cache_create(slab_name, block_size, align,
SLAB_RECLAIM_ACCOUNT, NULL);
if (!c->slab_cache) {
@@ -2531,9 +2533,11 @@ struct dm_bufio_client *dm_bufio_client_create(struct block_device *bdev, unsign
}
}
if (aux_size)
- snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u", aux_size);
+ snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u-%u",
+ aux_size, atomic_inc_return(&seqno));
else
- snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer");
+ snprintf(slab_name, sizeof(slab_name), "dm_bufio_buffer-%u",
+ atomic_inc_return(&seqno));
c->slab_buffer = kmem_cache_create(slab_name, sizeof(struct dm_buffer) + aux_size,
0, SLAB_RECLAIM_ACCOUNT, NULL);
if (!c->slab_buffer) {
This reverts commit 7c877586da3178974a8a94577b6045a48377ff25.
Anders and Philippe have reported that recent kernels occasionally hang
in the readahead code when used with NFS. The problem has been bisected to
7c877586da3 ("readahead: properly shorten readahead when falling back to
do_page_cache_ra()"). The cause of the problem is that ra->size can be
shrunk by the read_pages() call and subsequently we end up calling
do_page_cache_ra() with a negative (read: huge positive) number of pages.
Let's revert 7c877586da3 for now until we can find a proper way for the
logic in read_pages() and page_cache_ra_order() to coexist. This can
lead to reduced readahead throughput due to readahead window confusion,
but that's better than outright hangs.
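As a purely illustrative userspace analogue of that underflow (not the
kernel code): once read_pages() shrinks ra->size, the expression
ra->size - (index - start) from the reverted hunk wraps around to a huge
unsigned value, which is what ends up being passed to do_page_cache_ra()
as the number of pages to read.
#include <stdio.h>
int main(void)
{
        unsigned long size = 4;                 /* ra->size after being shrunk */
        unsigned long index = 16, start = 8;    /* 8 pages already consumed */
        /* 4 - 8 cannot go negative in unsigned arithmetic; it wraps to a
         * value close to 2^64, so the fallback readahead is asked to read
         * an absurdly large number of pages.
         */
        unsigned long remaining = size - (index - start);
        printf("remaining = %lu\n", remaining);
        return 0;
}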
Reported-by: Anders Blomdell <anders.blomdell(a)gmail.com>
Reported-by: Philippe Troin <phil(a)fifi.org>
CC: stable(a)vger.kernel.org
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
mm/readahead.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --git a/mm/readahead.c b/mm/readahead.c
index 8f1cf599b572..ea650b8b02fb 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -458,8 +458,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
struct file_ra_state *ra, unsigned int new_order)
{
struct address_space *mapping = ractl->mapping;
- pgoff_t start = readahead_index(ractl);
- pgoff_t index = start;
+ pgoff_t index = readahead_index(ractl);
unsigned int min_order = mapping_min_folio_order(mapping);
pgoff_t limit = (i_size_read(mapping->host) - 1) >> PAGE_SHIFT;
pgoff_t mark = index + ra->size - ra->async_size;
@@ -522,7 +521,7 @@ void page_cache_ra_order(struct readahead_control *ractl,
if (!err)
return;
fallback:
- do_page_cache_ra(ractl, ra->size - (index - start), ra->async_size);
+ do_page_cache_ra(ractl, ra->size, ra->async_size);
}
static unsigned long ractl_max_pages(struct readahead_control *ractl,
--
2.35.3
Add a check rejecting negative ubi_num values.
If the variable ubi_num takes a negative value then we get:
qemu-system-arm ... -append "ubi.mtd=0,0,0,-22222345" ...
[ 0.745065] ubi_attach_mtd_dev from ubi_init+0x178/0x218
[ 0.745230] ubi_init from do_one_initcall+0x70/0x1ac
[ 0.745344] do_one_initcall from kernel_init_freeable+0x198/0x224
[ 0.745474] kernel_init_freeable from kernel_init+0x18/0x134
[ 0.745600] kernel_init from ret_from_fork+0x14/0x28
[ 0.745727] Exception stack(0x90015fb0 to 0x90015ff8)
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 83ff59a06663 ("UBI: support ubi_num on mtd.ubi command line")
Cc: stable(a)vger.kernel.org
Signed-off-by: Denis Arefev <arefev(a)swemel.ru>
---
V1 -> V2: changed the tag Fixes and moved the check to ubi_mtd_param_parse()
drivers/mtd/ubi/build.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/mtd/ubi/build.c b/drivers/mtd/ubi/build.c
index 30be4ed68fad..ef6a22f372f9 100644
--- a/drivers/mtd/ubi/build.c
+++ b/drivers/mtd/ubi/build.c
@@ -1537,7 +1537,7 @@ static int ubi_mtd_param_parse(const char *val, const struct kernel_param *kp)
if (token) {
int err = kstrtoint(token, 10, &p->ubi_num);
- if (err) {
+ if (err || p->ubi_num < UBI_DEV_NUM_AUTO) {
pr_err("UBI error: bad value for ubi_num parameter: %s\n",
token);
return -EINVAL;
--
2.25.1
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
[ Upstream commit ec8b6f55b98146c41dcf15e8189eb43291e35e89 ]
If we remove a GPIO chip that is also an interrupt controller with users
not having freed some interrupts, we'll end up leaking resources as
indicated by the following warning:
remove_proc_entry: removing non-empty directory 'irq/30', leaking at least 'gpio'
As there's no way of notifying interrupt users about the irqchip going
away, and the interrupt subsystem is not plugged into the driver model and
so not all cases can be handled by devlinks, we need to make sure to free
all interrupts before completing the removal of the provider.
Reviewed-by: Herve Codina <herve.codina(a)bootlin.com>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
Link: https://lore.kernel.org/r/20240919135104.3583-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpio/gpiolib.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 2b02655abb56e..44372f8647d51 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -14,6 +14,7 @@
#include <linux/idr.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
+#include <linux/irqdesc.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/lockdep.h>
@@ -713,6 +714,45 @@ bool gpiochip_line_is_valid(const struct gpio_chip *gc,
}
EXPORT_SYMBOL_GPL(gpiochip_line_is_valid);
+static void gpiod_free_irqs(struct gpio_desc *desc)
+{
+ int irq = gpiod_to_irq(desc);
+ struct irq_desc *irqd = irq_to_desc(irq);
+ void *cookie;
+
+ for (;;) {
+ /*
+ * Make sure the action doesn't go away while we're
+ * dereferencing it. Retrieve and store the cookie value.
+ * If the irq is freed after we release the lock, that's
+ * alright - the underlying maple tree lookup will return NULL
+ * and nothing will happen in free_irq().
+ */
+ scoped_guard(mutex, &irqd->request_mutex) {
+ if (!irq_desc_has_action(irqd))
+ return;
+
+ cookie = irqd->action->dev_id;
+ }
+
+ free_irq(irq, cookie);
+ }
+}
+
+/*
+ * The chip is going away but there may be users who had requested interrupts
+ * on its GPIO lines who have no idea about its removal and have no way of
+ * being notified about it. We need to free any interrupts still in use here or
+ * we'll leak memory and resources (like procfs files).
+ */
+static void gpiochip_free_remaining_irqs(struct gpio_chip *gc)
+{
+ struct gpio_desc *desc;
+
+ for_each_gpio_desc_with_flag(gc, desc, FLAG_USED_AS_IRQ)
+ gpiod_free_irqs(desc);
+}
+
static void gpiodev_release(struct device *dev)
{
struct gpio_device *gdev = to_gpio_device(dev);
@@ -1125,6 +1165,7 @@ void gpiochip_remove(struct gpio_chip *gc)
/* FIXME: should the legacy sysfs handling be moved to gpio_device? */
gpiochip_sysfs_unregister(gdev);
gpiochip_free_hogs(gc);
+ gpiochip_free_remaining_irqs(gc);
scoped_guard(mutex, &gpio_devices_lock)
list_del_rcu(&gdev->list);
--
2.43.0
From: Darrick J. Wong <djwong(a)kernel.org>
I'm appointing myself to be responsible for getting after people to
submit their upstream bug fixes with the appropriate Fixes tags and to
cc stable; to find whatever slips through the cracks; and to keep an eye
on the automatic QA of all that stuff.
Cc: <stable(a)vger.kernel.org> # v6.12
Signed-off-by: "Darrick J. Wong" <djwong(a)kernel.org>
---
MAINTAINERS | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/MAINTAINERS b/MAINTAINERS
index b878ddc99f94e7..23d89f2a3008e2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -25358,8 +25358,7 @@ F: include/xen/arm/swiotlb-xen.h
F: include/xen/swiotlb-xen.h
XFS FILESYSTEM
-M: Carlos Maiolino <cem(a)kernel.org>
-R: Darrick J. Wong <djwong(a)kernel.org>
+M: Darrick J. Wong <djwong(a)kernel.org>
L: linux-xfs(a)vger.kernel.org
S: Supported
W: http://xfs.org/
Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround
broken TEF FIFO tail index erratum") introduced
mcp251xfd_get_tef_len() to get the number of unhandled transmit events
from the Transmit Event FIFO (TEF).
As the TEF has no head index, the driver uses the TX-FIFO's tail index
instead, assuming that sent frames have completed.
When calculating the number of unhandled TEF events, that commit
didn't take mcp2518fd erratum DS80000789E 6. into account. According
to that erratum, the FIFOCI bits of a FIFOSTA register, here the
TX-FIFO tail index, might be corrupted.
However here it seems the bit indicating that the TX-FIFO is
empty (MCP251XFD_REG_FIFOSTA_TFERFFIF) is not correct while the
TX-FIFO tail index is.
Assume that the TX-FIFO is indeed empty if:
- Chip's head and tail index are equal (len == 0).
- The TX-FIFO is less than half full.
(The TX-FIFO empty case has already been checked at the
beginning of this function.)
- No free buffers in the TX ring.
If the TX-FIFO is assumed to be empty, assume that the TEF is full and
return the number of elements in the TX-FIFO (which equals the number
of TEF elements).
If these assumptions are false, the driver might read too many objects
from the TEF. mcp251xfd_handle_tefif_one() checks the sequence numbers
and will refuse to process old events.
Reported-by: Renjaya Raga Zenta <renjaya.zenta(a)formulatrix.com>
Closes: https://patch.msgid.link/CAJ7t6HgaeQ3a_OtfszezU=zB-FqiZXqrnATJ3UujNoQJJf7Gg…
Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum")
Tested-by: Renjaya Raga Zenta <renjaya.zenta(a)formulatrix.com>
Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
Changes in v2:
- adjusted patch subject
- added stable on Cc
- added Renjaya Raga Zenta's Tested-by
- Link to RFC: https://patch.msgid.link/20241125-mcp251xfd-fix-length-calculation-v1-1-974…
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 29 ++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
index d3ac865933fdf6c4ecdd80ad4d7accbff51eb0f8..e94321849fd7e69ed045eaeac3efec52fe077d96 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
@@ -21,6 +21,11 @@ static inline bool mcp251xfd_tx_fifo_sta_empty(u32 fifo_sta)
return fifo_sta & MCP251XFD_REG_FIFOSTA_TFERFFIF;
}
+static inline bool mcp251xfd_tx_fifo_sta_less_than_half_full(u32 fifo_sta)
+{
+ return fifo_sta & MCP251XFD_REG_FIFOSTA_TFHRFHIF;
+}
+
static inline int
mcp251xfd_tef_tail_get_from_chip(const struct mcp251xfd_priv *priv,
u8 *tef_tail)
@@ -147,7 +152,29 @@ mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p)
BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(len));
len = (chip_tx_tail << shift) - (tail << shift);
- *len_p = len >> shift;
+ len >>= shift;
+
+ /* According to mcp2518fd erratum DS80000789E 6. the FIFOCI
+ * bits of a FIFOSTA register, here the TX-FIFO tail index
+ * might be corrupted.
+ *
+ * However here it seems the bit indicating that the TX-FIFO
+ * is empty (MCP251XFD_REG_FIFOSTA_TFERFFIF) is not correct
+ * while the TX-FIFO tail index is.
+ *
+ * We assume the TX-FIFO is empty, i.e. all pending CAN frames
+ * haven been send, if:
+ * - Chip's head and tail index are equal (len == 0).
+ * - The TX-FIFO is less than half full.
+ * (The TX-FIFO empty case has already been checked at the
+ * beginning of this function.)
+ * - No free buffers in the TX ring.
+ */
+ if (len == 0 && mcp251xfd_tx_fifo_sta_less_than_half_full(fifo_sta) &&
+ mcp251xfd_get_tx_free(tx_ring) == 0)
+ len = tx_ring->obj_num;
+
+ *len_p = len;
return 0;
}
---
base-commit: 9bb88c659673003453fd42e0ddf95c9628409094
change-id: 20241115-mcp251xfd-fix-length-calculation-96d4a0ed11fe
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
[BUG]
If submit_one_sector() failed inside extent_writepage_io() for sector
size < page size cases (e.g. 4K sector size and 64K page size), then
we can hit a double ordered extent accounting error.
This should be very rare, as submit_one_sector() only fails when we
fail to grab the extent map, and such an extent map should already exist
in memory and be pinned.
[CAUSE]
For example we have the following folio layout:
0 4K 32K 48K 60K 64K
|//| |//////| |///|
Where |///| is the dirty range we need to writeback. The 3 different
dirty ranges are submitted for regular COW.
Now we hit the following sequence:
- submit_one_sector() returned 0 for [0, 4K)
- submit_one_sector() returned 0 for [32K, 48K)
- submit_one_sector() returned error for [60K, 64K)
- btrfs_mark_ordered_io_finished() called for the whole folio
This will mark the following ranges as finished:
* [0, 4K)
* [32K, 48K)
Both ranges already have their IO submitted, so this cleanup will
lead to double accounting.
* [60K, 64K)
That's the correct cleanup.
Unfortunately the behavior dates back to the old days when there was no
subpage support.
[FIX]
Instead of calling btrfs_mark_ordered_io_finished() unconditionally in
extent_writepage(), which can touch ranges we should not touch, move
the error handling into extent_writepage_io(), so that we can clean up
exactly the sectors that ought to be submitted but failed.
This provides a much more accurate cleanup and avoids the double
accounting.
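As an illustration of the idea (not the actual btrfs code), a minimal
sketch of per-sector error handling where only the sector that failed to
submit is finished; sector/folio sizes, the injected error and the helper
names are made up, and dirty-range filtering is omitted:

#include <stdbool.h>
#include <stdio.h>

#define SECTORSIZE	4096u
#define FOLIOSIZE	65536u

/* Pretend the submission of the sector at 60K fails, as in the example
 * above; every other sector submits fine. */
static int submit_one_sector(unsigned int off)
{
	return off == 61440u ? -5 /* -EIO */ : 0;
}

static void mark_ordered_io_finished(unsigned int off, unsigned int len)
{
	printf("finish ordered extent [%u, %u)\n", off, off + len);
}

int main(void)
{
	bool error = false;

	for (unsigned int off = 0; off < FOLIOSIZE; off += SECTORSIZE) {
		if (submit_one_sector(off) < 0) {
			/* Clean up only the failed sector; sectors already
			 * submitted are finished by their bio completion. */
			mark_ordered_io_finished(off, SECTORSIZE);
			error = true;
		}
	}
	return error ? 1 : 0;
}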
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 32 +++++++++++++++++++-------------
1 file changed, 19 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 0132c2b84d99..a3d4f698fd25 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1420,6 +1420,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
struct btrfs_fs_info *fs_info = inode->root->fs_info;
unsigned long range_bitmap = 0;
bool submitted_io = false;
+ bool error = false;
const u64 folio_start = folio_pos(folio);
u64 cur;
int bit;
@@ -1462,11 +1463,21 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
break;
}
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
- if (ret < 0)
- goto out;
+ if (unlikely(ret < 0)) {
+ submit_one_bio(bio_ctrl);
+ /*
+ * Failed to grab the extent map which should be very rare.
+ * Since there is no bio submitted to finish the ordered
+ * extent, we have to manually finish this sector.
+ */
+ btrfs_mark_ordered_io_finished(inode, folio, cur,
+ fs_info->sectorsize, false);
+ error = true;
+ continue;
+ }
submitted_io = true;
}
-out:
+
/*
* If we didn't submitted any sector (>= i_size), folio dirty get
* cleared but PAGECACHE_TAG_DIRTY is not cleared (only cleared
@@ -1474,8 +1485,11 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
*
* Here we set writeback and clear for the range. If the full folio
* is no longer dirty then we clear the PAGECACHE_TAG_DIRTY tag.
+ *
+ * If we hit any error, the corresponding sector will still be dirty
+ * thus no need to clear PAGECACHE_TAG_DIRTY.
*/
- if (!submitted_io) {
+ if (!submitted_io && !error) {
btrfs_folio_set_writeback(fs_info, folio, start, len);
btrfs_folio_clear_writeback(fs_info, folio, start, len);
}
@@ -1495,7 +1509,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
{
struct inode *inode = folio->mapping->host;
struct btrfs_fs_info *fs_info = inode_to_fs_info(inode);
- const u64 page_start = folio_pos(folio);
int ret;
size_t pg_offset;
loff_t i_size = i_size_read(inode);
@@ -1538,10 +1551,6 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
- if (ret)
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- page_start, PAGE_SIZE, !ret);
-
done:
if (ret < 0)
mapping_set_error(folio->mapping, ret);
@@ -2322,11 +2331,8 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
if (ret == 1)
goto next_page;
- if (ret) {
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- cur, cur_len, !ret);
+ if (ret)
mapping_set_error(mapping, ret);
- }
btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
if (ret < 0)
found_error = true;
--
2.47.0
[BUG]
There are several crashes and hangs during fstests runs with a
sectorsize < page size setup.
It turns out that most of those hangs happen after a
btrfs_run_delalloc_range() failure (caused by -ENOSPC).
The most common one is generic/750.
The symptoms are all related to ordered extent finishing, where we double
account the target ordered extent.
[CAUSE]
Inside writepage_delalloc(), if we hit an error from
btrfs_run_delalloc_range() we still need to unlock all the locked
ranges, but that is the only error handling done.
If we have the following page layout with a 64K page size and 4K sector
size:
0 4K 32K 40K 60K 64K
|////| |////| |/////|
Where |//| is the dirtied blocks inside the folio.
Then we hit the following sequence:
- Enter writepage_delalloc() for folio 0
- btrfs_run_delalloc_range() returned 0 for [0, 4K)
And created regular COW ordered extent for range [0, 4K)
- btrfs_run_delalloc_range() returned 0 for [32K, 40K)
And created async extent for range [32K, 40K).
This means the error handling will be done in another thread, so we
should not touch the range anymore.
- btrfs_run_delalloc_range() failed with -ENOSPC for range [60K, 64K)
In theory we should not fail since we should have reserved enough
space at buffered write time, but let's ignore that rabbit hole and
focus on the error handling.
- Error handling in extent_writepage()
Now we go to the done: tag, calling btrfs_mark_ordered_io_finished()
for the whole folio range.
This will find ranges [0, 4K) and [32K, 40K) to cleanup, for [0, 4K)
it should be cleaned up, but for range [32K, 40K) it's asynchronously
handled, the OE may have already been submitted.
This will lead to the double account for range [32K, 40K) and crash
the kernel.
Unfortunately this bad error handling has been there since the very
beginning of sector size < page size support.
[FIX]
Instead of relying on the btrfs_mark_ordered_io_finished() call to
clean up the whole folio range, record the last successfully run
delalloc range and, combined with bio_ctrl->submit_bitmap, properly
clean up any newly created ordered extents.
Since the ordered extents in range are cleaned up there, we should no
longer rely on the btrfs_mark_ordered_io_finished() call inside
extent_writepage().
This avoids double accounting during error handling.
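A rough stand-alone sketch of that cleanup strategy, using made-up sizes
and helper names rather than the btrfs API: walk the submit bitmap only up
to the last successfully run delalloc end and finish the ordered extent of
each sector behind it:

#include <stdio.h>

#define SECTORSIZE_BITS		12u	/* 4K sectors */
#define SECTORS_PER_PAGE	16u	/* 64K folio  */

static void mark_ordered_io_finished(unsigned long start)
{
	printf("finish ordered extent at %lu\n", start);
}

/* Cleanup after a failed delalloc run: only sectors that were marked for
 * submission *and* lie before last_finished had an ordered extent created
 * by this folio that nobody else will finish, so only those are finished
 * here. */
static void cleanup_delalloc_error(unsigned long page_start,
				   unsigned long last_finished,
				   unsigned long submit_bitmap)
{
	unsigned long bitmap_size = (last_finished - page_start) >> SECTORSIZE_BITS;

	if (bitmap_size > SECTORS_PER_PAGE)
		bitmap_size = SECTORS_PER_PAGE;

	for (unsigned long bit = 0; bit < bitmap_size; bit++)
		if (submit_bitmap & (1UL << bit))
			mark_ordered_io_finished(page_start + (bit << SECTORSIZE_BITS));
}

int main(void)
{
	/* Sectors 0 and 8 were marked for submission; the delalloc run
	 * failed after the range ending at 4K, so only sector 0 is
	 * finished here. */
	cleanup_delalloc_error(0, 4096, (1UL << 0) | (1UL << 8));
	return 0;
}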
Cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 45 ++++++++++++++++++++++++++++++++++++--------
1 file changed, 37 insertions(+), 8 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 438974d4def4..0132c2b84d99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -1145,11 +1145,13 @@ static bool find_next_delalloc_bitmap(struct folio *folio,
* helper for extent_writepage(), doing all of the delayed allocation setup.
*
* This returns 1 if btrfs_run_delalloc_range function did all the work required
- * to write the page (copy into inline extent). In this case the IO has
- * been started and the page is already unlocked.
+ * to write the page (copy into inline extent or compression). In this case
+ * the IO has been started and we should no longer touch the page (may have
+ * already been unlocked).
*
* This returns 0 if all went well (page still locked)
- * This returns < 0 if there were errors (page still locked)
+ * This returns < 0 if there were errors (page still locked), in this case any
+ * newly created delalloc range will be marked as error and finished.
*/
static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
struct folio *folio,
@@ -1167,6 +1169,12 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
* last delalloc end.
*/
u64 last_delalloc_end = 0;
+ /*
+ * Save the last successfully ran delalloc range end (exclusive).
+ * This is for error handling to avoid ranges with ordered extent created
+ * but no IO will be submitted due to error.
+ */
+ u64 last_finished = page_start;
u64 delalloc_start = page_start;
u64 delalloc_end = page_end;
u64 delalloc_to_write = 0;
@@ -1235,11 +1243,19 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
found_len = last_delalloc_end + 1 - found_start;
if (ret >= 0) {
+ /*
+ * Some delalloc range may be created by previous folios.
+ * Thus we still need to clean those range up during error
+ * handling.
+ */
+ last_finished = found_start;
/* No errors hit so far, run the current delalloc range. */
ret = btrfs_run_delalloc_range(inode, folio,
found_start,
found_start + found_len - 1,
wbc);
+ if (ret >= 0)
+ last_finished = found_start + found_len;
} else {
/*
* We've hit an error during previous delalloc range,
@@ -1274,8 +1290,21 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
delalloc_start = found_start + found_len;
}
- if (ret < 0)
+ /*
+ * It's possible we have some ordered extents created before we hit
+ * an error, cleanup non-async successfully created delalloc ranges.
+ */
+ if (unlikely(ret < 0)) {
+ unsigned int bitmap_size = min(
+ (last_finished - page_start) >> fs_info->sectorsize_bits,
+ fs_info->sectors_per_page);
+
+ for_each_set_bit(bit, &bio_ctrl->submit_bitmap, bitmap_size)
+ btrfs_mark_ordered_io_finished(inode, folio,
+ page_start + (bit << fs_info->sectorsize_bits),
+ fs_info->sectorsize, false);
return ret;
+ }
out:
if (last_delalloc_end)
delalloc_end = last_delalloc_end;
@@ -1509,13 +1538,13 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
bio_ctrl->wbc->nr_to_write--;
-done:
- if (ret) {
+ if (ret)
btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
page_start, PAGE_SIZE, !ret);
- mapping_set_error(folio->mapping, ret);
- }
+done:
+ if (ret < 0)
+ mapping_set_error(folio->mapping, ret);
/*
* Only unlock ranges that are submitted. As there can be some async
* submitted ranges inside the folio.
--
2.47.0
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
[ Upstream commit 7d3b793faaab1305994ce568b59d61927235f57b ]
When enabling access to the special register set, Receiver time-out and
RHR interrupts can happen. In this case, the IRQ handler will try to read
from the FIFO through the RHR register at address 0x00, but address 0x00
is mapped to the DLL register, resulting in erroneous FIFO reads.
Call graph example:
sc16is7xx_startup(): entry
sc16is7xx_ms_proc(): entry
sc16is7xx_set_termios(): entry
sc16is7xx_set_baud(): DLH/DLL = $009C --> access special register set
sc16is7xx_port_irq() entry --> IIR is 0x0C
sc16is7xx_handle_rx() entry
sc16is7xx_fifo_read(): --> unable to access FIFO (RHR) because it is
mapped to DLL (LCR=LCR_CONF_MODE_A)
sc16is7xx_set_baud(): exit --> Restore access to general register set
Fix the problem by claiming the efr_lock mutex when accessing the Special
register set.
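The pattern is a mutex guard around the temporarily remapped register
window; a generic user-space sketch (pthread based, the register addresses
and helpers are simplified placeholders, not the driver's API):

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

static pthread_mutex_t efr_lock = PTHREAD_MUTEX_INITIALIZER;

static void port_write(uint8_t reg, uint8_t val)
{
	printf("write reg 0x%02x = 0x%02x\n", reg, val);
}

/* While LCR selects the special register set, address 0x00 means DLL, not
 * RHR, so the IRQ path must not read the FIFO until we are done. Holding
 * the lock for the whole window keeps the two paths from interleaving. */
static void set_divisor(uint8_t lcr, uint8_t dll, uint8_t dlh)
{
	pthread_mutex_lock(&efr_lock);

	port_write(0x03, 0x80);		/* LCR: open the divisor latch */
	port_write(0x00, dll);		/* DLL */
	port_write(0x01, dlh);		/* DLH */
	port_write(0x03, lcr);		/* LCR: back to normal mode */

	pthread_mutex_unlock(&efr_lock);
}

int main(void)
{
	set_divisor(0x03, 0x9c, 0x00);
	return 0;
}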
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Link: https://lore.kernel.org/r/20240723125302.1305372-3-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Resolve minor conflicts ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/tty/serial/sc16is7xx.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 7a9924d9b294..d7728920853e 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -545,6 +545,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
SC16IS7XX_MCR_CLKSEL_BIT,
prescaler == 1 ? 0 : SC16IS7XX_MCR_CLKSEL_BIT);
+ mutex_lock(&one->efr_lock);
+
/* Open the LCR divisors for configuration */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
SC16IS7XX_LCR_CONF_MODE_A);
@@ -558,6 +560,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
/* Put LCR back to the normal mode */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);
+ mutex_unlock(&one->efr_lock);
+
return DIV_ROUND_CLOSEST((clk / prescaler) / 16, div);
}
--
2.34.1
Set link_startup_again to false after a successful
ufshcd_dme_link_startup operation and confirmation of device presence.
This prevents unnecessary link startup attempts when the previous
operation has already succeeded.
Signed-off-by: Vamshi Gajjela <vamshigajjela(a)google.com>
Fixes: 7caf489b99a4 ("scsi: ufs: issue link starup 2 times if device isn't active")
Cc: stable(a)vger.kernel.org
---
drivers/ufs/core/ufshcd.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index abbe7135a977..cc1d15002ab5 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4994,6 +4994,10 @@ static int ufshcd_link_startup(struct ufs_hba *hba)
goto out;
}
+ /* link_startup success and device is present */
+ if (!ret && ufshcd_is_device_present(hba))
+ link_startup_again = false;
+
/*
* DME link lost indication is only received when link is up,
* but we can't be sure if the link is up until link startup
--
2.47.0.371.ga323438b13-goog
From: Wayne Lin <wayne.lin(a)amd.com>
[ Upstream commit fcf6a49d79923a234844b8efe830a61f3f0584e4 ]
[Why]
When unplugging one of the monitors connected behind an MST hub, we encounter a null
pointer dereference. It's due to dc_sink getting released immediately in
early_unregister() or detect_ctx(). Committing a new state that directly refers to info
stored in dc_sink will then cause a null pointer dereference.
[How]
Remove the redundant checking condition. The relevant condition should already be covered
by checking whether dsc_aux is null. Also reset dsc_aux to NULL when the connector is
disconnected.
Reviewed-by: Jerry Zuo <jerry.zuo(a)amd.com>
Acked-by: Zaeem Mohamed <zaeem.mohamed(a)amd.com>
Signed-off-by: Wayne Lin <wayne.lin(a)amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler(a)amd.com>
Signed-off-by: Alex Deucher <alexander.deucher(a)amd.com>
[ Resolve minor conflicts ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
index d390e3d62e56..9ec9792f115a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_mst_types.c
@@ -179,6 +179,8 @@ amdgpu_dm_mst_connector_early_unregister(struct drm_connector *connector)
dc_sink_release(dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
}
aconnector->mst_status = MST_STATUS_DEFAULT;
@@ -487,6 +489,8 @@ dm_dp_mst_detect(struct drm_connector *connector,
dc_sink_release(aconnector->dc_sink);
aconnector->dc_sink = NULL;
aconnector->edid = NULL;
+ aconnector->dsc_aux = NULL;
+ port->passthrough_aux = NULL;
amdgpu_dm_set_mst_status(&aconnector->mst_status,
MST_REMOTE_EDID | MST_ALLOCATE_NEW_PAYLOAD | MST_CLEAR_ALLOCATED_PAYLOAD,
--
2.43.0
Currently in some testcases we can trigger:
xe 0000:03:00.0: [drm] Assertion `exec_queue_destroyed(q)` failed!
....
WARNING: CPU: 18 PID: 2640 at drivers/gpu/drm/xe/xe_guc_submit.c:1826 xe_guc_sched_done_handler+0xa54/0xef0 [xe]
xe 0000:03:00.0: [drm] *ERROR* GT1: DEREGISTER_DONE: Unexpected engine state 0x00a1, guc_id=57
Looking at a snippet of corresponding ftrace for this GuC id we can see:
162.673311: xe_sched_msg_add: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673317: xe_sched_msg_recv: dev=0000:03:00.0, gt=1 guc_id=57, opcode=3
162.673319: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674089: xe_exec_queue_kill: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0x29, flags=0x0
162.674108: xe_exec_queue_close: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.674488: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa9, flags=0x0
162.678452: xe_exec_queue_deregister: dev=0000:03:00.0, 1:0x2, gt=1, width=1, guc_id=57, guc_state=0xa1, flags=0x0
It looks like we try to suspend the queue (opcode=3), setting
suspend_pending and triggering a disable_scheduling. The user then
closes the queue. However, closing the queue forcefully signals the
fence after killing the queue, and by the time the G2H response for
disable_scheduling comes back we have already cleared suspend_pending
when signalling the suspend fence. The disable_scheduling handling then
incorrectly tries to also deregister the queue, leading to warnings
since the queue has not even been marked for destruction yet. We also
seem to trigger errors later when trying to double unregister the same
queue.
To fix this, tweak the ordering when handling the response to ensure we
don't race with a disable_scheduling that doesn't actually intend to
unregister. The destruction path should now also correctly wait for any
pending_disable before marking the queue as destroyed.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3371
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_guc_submit.c | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f3c22b101916..f82f286fd431 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -1867,16 +1867,30 @@ static void handle_sched_done(struct xe_guc *guc, struct xe_exec_queue *q,
xe_gt_assert(guc_to_gt(guc), runnable_state == 0);
xe_gt_assert(guc_to_gt(guc), exec_queue_pending_disable(q));
- clear_exec_queue_pending_disable(q);
if (q->guc->suspend_pending) {
suspend_fence_signal(q);
+ clear_exec_queue_pending_disable(q);
} else {
if (exec_queue_banned(q) || check_timeout) {
smp_wmb();
wake_up_all(&guc->ct.wq);
}
- if (!check_timeout)
+ if (!check_timeout && exec_queue_destroyed(q)) {
+ /*
+ * Make sure we clear the pending_disable only
+ * after sampling the destroyed state. We
+ * want to ensure we don't trigger the
+ * unregister too early with something only
+ * intending to disable scheduling. The
+ * caller doing the destroy must wait for an
+ * ongoing pending_destroy before marking as
+ * destroyed.
+ */
+ clear_exec_queue_pending_disable(q);
deregister_exec_queue(guc, q);
+ } else {
+ clear_exec_queue_pending_disable(q);
+ }
}
}
}
--
2.47.0
Fix the MST sideband message body length check, which must be at least 1
byte accounting for the message body CRC (aka message data CRC) at the
end of the message.
This fixes a case where an MST branch device returns a header with a
correct header CRC (indicating a correctly received body length), with
the body length being incorrectly set to 0. This will later lead to a
memory corruption in drm_dp_sideband_append_payload() and the following
errors in dmesg:
UBSAN: array-index-out-of-bounds in drivers/gpu/drm/display/drm_dp_mst_topology.c:786:25
index -1 is out of range for type 'u8 [48]'
Call Trace:
drm_dp_sideband_append_payload+0x33d/0x350 [drm_display_helper]
drm_dp_get_one_sb_msg+0x3ce/0x5f0 [drm_display_helper]
drm_dp_mst_hpd_irq_handle_event+0xc8/0x1580 [drm_display_helper]
memcpy: detected field-spanning write (size 18446744073709551615) of single field "&msg->msg[msg->curlen]" at drivers/gpu/drm/display/drm_dp_mst_topology.c:791 (size 256)
Call Trace:
drm_dp_sideband_append_payload+0x324/0x350 [drm_display_helper]
drm_dp_get_one_sb_msg+0x3ce/0x5f0 [drm_display_helper]
drm_dp_mst_hpd_irq_handle_event+0xc8/0x1580 [drm_display_helper]
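To illustrate why a zero body length is fatal, a small stand-alone sketch
with a simplified header layout: the 1-byte body CRC is subtracted from
msg_len before the payload copy, so msg_len == 0 turns into the negative
index / huge size seen in the splats above:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct sb_hdr {
	uint8_t msg_len;	/* body length, includes the 1-byte body CRC */
};

static bool decode_hdr(const uint8_t *buf, struct sb_hdr *hdr)
{
	hdr->msg_len = buf[0] & 0x3f;
	if (hdr->msg_len < 1)	/* must at least hold the body CRC */
		return false;
	return true;
}

int main(void)
{
	const uint8_t bad[] = { 0x00 };	/* branch device reports body len 0 */
	struct sb_hdr hdr;

	if (!decode_hdr(bad, &hdr)) {
		puts("rejected: no room for body CRC");
		return 0;
	}
	/* Without the check, the payload copy would use msg_len - 1 ... */
	printf("payload bytes to copy: %d\n", hdr.msg_len - 1);
	return 0;
}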
Cc: <stable(a)vger.kernel.org>
Cc: Lyude Paul <lyude(a)redhat.com>
Signed-off-by: Imre Deak <imre.deak(a)intel.com>
---
drivers/gpu/drm/display/drm_dp_mst_topology.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index ac90118b9e7a8..e6ee180815b20 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -320,6 +320,9 @@ static bool drm_dp_decode_sideband_msg_hdr(const struct drm_dp_mst_topology_mgr
hdr->broadcast = (buf[idx] >> 7) & 0x1;
hdr->path_msg = (buf[idx] >> 6) & 0x1;
hdr->msg_len = buf[idx] & 0x3f;
+ if (hdr->msg_len < 1) /* min space for body CRC */
+ return false;
+
idx++;
hdr->somt = (buf[idx] >> 7) & 0x1;
hdr->eomt = (buf[idx] >> 6) & 0x1;
--
2.44.2
It is unsafe to call PageTail() in dump_page() as page_is_fake_head()
will almost certainly return true when called on a head page that
is copied to the stack. That will cause the VM_BUG_ON_PGFLAGS() in
const_folio_flags() to trigger when it shouldn't. Fortunately, we don't
need to call PageTail() here; it's fine to have a pointer to a virtual
alias of the page's flag word rather than the real page's flag word.
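For context, a tiny user-space model of the convention relied on here
(layout assumed for illustration): a tail page stores the address of its
head page with bit 0 set in compound_head, so bit 0 alone identifies a
tail page without dereferencing anything:

#include <stdint.h>
#include <stdio.h>

struct fake_page {
	unsigned long flags;
	unsigned long compound_head;	/* head | 1 for tail pages, else unused */
};

static int is_tail(const struct fake_page *page)
{
	return page->compound_head & 1;
}

int main(void)
{
	struct fake_page head = { .flags = 0, .compound_head = 0 };
	struct fake_page tail = { .flags = 0,
				  .compound_head = (uintptr_t)&head | 1 };

	printf("head: %d, tail: %d\n", is_tail(&head), is_tail(&tail));
	return 0;
}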
Fixes: fae7d834c43c ("mm: add __dump_folio()")
Signed-off-by: Matthew Wilcox (Oracle) <willy(a)infradead.org>
Cc: stable(a)vger.kernel.org
---
include/linux/page-flags.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 2220bfec278e..cf46ac720802 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -306,7 +306,7 @@ static const unsigned long *const_folio_flags(const struct folio *folio,
{
const struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(PageTail(page), page);
+ VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
return &page[n].flags;
}
@@ -315,7 +315,7 @@ static unsigned long *folio_flags(struct folio *folio, unsigned n)
{
struct page *page = &folio->page;
- VM_BUG_ON_PGFLAGS(PageTail(page), page);
+ VM_BUG_ON_PGFLAGS(page->compound_head & 1, page);
VM_BUG_ON_PGFLAGS(n > 0 && !test_bit(PG_head, &page->flags), page);
return &page[n].flags;
}
--
2.45.2
Some notebooks have a button to disable the camera (not to be confused
with the mechanical cover). This is a standard GPIO linked to the
camera via the ACPI table.
4 years ago we added support for this button in UVC via the Privacy control.
This has three issues:
- If the camera has its own privacy control, it will be masked.
- We need to power-up the camera to read the privacy control gpio.
- Other drivers have not followed this approach and have used evdev.
We tried to fix the power-up issues implementing "granular power
saving" but it has been more complicated than anticipated...
This patchset implements the Privacy GPIO as an evdev input device.
The first patch of this set is already in Laurent's tree... but I
include it to get some CI coverage.
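For anyone who wants to test from user space, a sketch of how the
resulting input device could be consumed through evdev; the device node
path and the EV_SW/SW_CAMERA_LENS_COVER event code are assumptions here
and depend on the final implementation:

#include <fcntl.h>
#include <linux/input.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical event node created for the privacy GPIO. */
	int fd = open("/dev/input/event5", O_RDONLY);
	struct input_event ev;

	if (fd < 0) {
		perror("open");
		return 1;
	}
	while (read(fd, &ev, sizeof(ev)) == sizeof(ev)) {
		if (ev.type == EV_SW && ev.code == SW_CAMERA_LENS_COVER)
			printf("camera privacy %s\n", ev.value ? "on" : "off");
	}
	close(fd);
	return 0;
}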
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v4:
- Remove gpio entity, it is not needed.
- Use unit->gpio.irq in free_irq to make smatch happy.
- Link to v3: https://lore.kernel.org/r/20241112-uvc-subdev-v3-0-0ea573d41a18@chromium.org
Changes in v3:
- CodeStyle (Thanks Sakari)
- Re-implement as input device
- Make the code depend on UVC_INPUT_EVDEV
- Link to v2: https://lore.kernel.org/r/20241108-uvc-subdev-v2-0-85d8a051a3d3@chromium.org
Changes in v2:
- Rebase on top of https://patchwork.linuxtv.org/project/linux-media/patch/20241106-uvc-crashr…
- Create uvc_gpio_cleanup and uvc_gpio_deinit
- Refactor quirk: do not disable irq
- Change define number for MEDIA_ENT_F_GPIO
- Link to v1: https://lore.kernel.org/r/20241031-uvc-subdev-v1-0-a68331cedd72@chromium.org
---
Ricardo Ribalda (7):
media: uvcvideo: Fix crash during unbind if gpio unit is in use
media: uvcvideo: Factor out gpio functions to its own file
media: uvcvideo: Re-implement privacy GPIO as an input device
Revert "media: uvcvideo: Allow entity-defined get_info and get_cur"
media: uvcvideo: Introduce UVC_QUIRK_PRIVACY_DURING_STREAM
media: uvcvideo: Make gpio_unit entity-less
media: uvcvideo: Remove UVC_EXT_GPIO entity
drivers/media/usb/uvc/Kconfig | 2 +-
drivers/media/usb/uvc/Makefile | 3 +
drivers/media/usb/uvc/uvc_ctrl.c | 40 ++---------
drivers/media/usb/uvc/uvc_driver.c | 123 ++--------------------------------
drivers/media/usb/uvc/uvc_entity.c | 7 +-
drivers/media/usb/uvc/uvc_gpio.c | 134 +++++++++++++++++++++++++++++++++++++
drivers/media/usb/uvc/uvc_status.c | 13 +++-
drivers/media/usb/uvc/uvc_video.c | 4 ++
drivers/media/usb/uvc/uvcvideo.h | 43 +++++++-----
9 files changed, 195 insertions(+), 174 deletions(-)
---
base-commit: 72ad4ff638047bbbdf3232178fea4bec1f429319
change-id: 20241030-uvc-subdev-89f4467a00b5
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Dmitry Kandybka <d.kandybka(a)gmail.com>
commit b169e76ebad22cbd055101ee5aa1a7bed0e66606 upstream.
In 'mptcp_reset_tout_timer', promote 'probe_timestamp' to unsigned long
to avoid possible integer overflow. Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
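A compile-and-run sketch of the arithmetic being fixed (arbitrary values,
assumes a 64-bit unsigned long): without the cast the 32-bit subtraction
wraps before being widened, adding roughly 2^32 jiffies to the timeout:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint32_t probe_timestamp = 1000;	/* older 32-bit timestamp */
	uint32_t tcp_jiffies32 = 5000;		/* current 32-bit jiffies  */
	unsigned long jiffies = 1000000UL;	/* current 64-bit jiffies  */

	/* 32-bit subtraction wraps to ~4 billion before being widened: */
	unsigned long broken = probe_timestamp - tcp_jiffies32 + jiffies;

	/* Widening first keeps the wrap-around in 64-bit arithmetic, so
	 * the end result is simply jiffies - 4000, as intended: */
	unsigned long fixed = (unsigned long)probe_timestamp -
			      tcp_jiffies32 + jiffies;

	printf("broken=%lu fixed=%lu\n", broken, fixed);
	return 0;
}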
Signed-off-by: Dmitry Kandybka <d.kandybka(a)gmail.com>
Link: https://patch.msgid.link/20241107103657.1560536-1-d.kandybka@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ Conflict in this version because commit d866ae9aaa43 ("mptcp: add a
new sysctl for make after break timeout") is not in this version, and
replaced TCP_TIMEWAIT_LEN in the expression. The fix can still be
applied the same way: by forcing a cast to unsigned long for the first
item. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
net/mptcp/protocol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index 1acd4e37a0ea..370afcac2623 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2708,8 +2708,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
return;
- close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
- TCP_TIMEWAIT_LEN;
+ close_timeout = (unsigned long)inet_csk(sk)->icsk_mtup.probe_timestamp -
+ tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
/* the close timeout takes precedence on the fail one, and here at least one of
* them is active
--
2.45.2
From: Puranjay Mohan <pjy(a)amazon.com>
[ Upstream commit 7c2fd76048e95dd267055b5f5e0a48e6e7c81fd9 ]
On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata to
be processed as it ignores metadata when bdev == NULL and may report
success.
Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.
[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/
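A hedged sketch of the check itself, with simplified types rather than the
real driver structures; the admin-command case is covered because a
missing disk/bdev means no integrity profile:

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-ins for "the namespace/disk has an integrity profile" and the
 * user-supplied metadata buffer/length from the passthru ioctl. */
static int check_meta(bool ns_has_integrity, void *meta_buffer, size_t meta_len)
{
	bool supports_metadata = ns_has_integrity;
	bool has_metadata = meta_buffer && meta_len;

	if (has_metadata && !supports_metadata)
		return -22;	/* -EINVAL: reject before building the request */
	return 0;
}

int main(void)
{
	char buf[16];

	printf("%d %d\n", check_meta(false, buf, sizeof(buf)),
			  check_meta(true, buf, sizeof(buf)));
	return 0;
}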
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Puranjay Mohan <pjy(a)amazon.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[ Move the changes from nvme_map_user_request() to nvme_submit_user_cmd()
to make it work on 5.4 ]
Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com>
---
drivers/nvme/host/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 0676637e1eab..a841fd4929ad 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -921,11 +921,16 @@ static int nvme_submit_user_cmd(struct request_queue *q,
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
struct gendisk *disk = ns ? ns->disk : NULL;
+ bool supports_metadata = disk && blk_get_integrity(disk);
+ bool has_metadata = meta_buffer && meta_len;
struct request *req;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
+ if (has_metadata && !supports_metadata)
+ return -EINVAL;
+
req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -940,7 +945,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
goto out;
bio = req->bio;
bio->bi_disk = disk;
- if (disk && meta_buffer && meta_len) {
+ if (has_metadata) {
meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
meta_seed, write);
if (IS_ERR(meta)) {
--
2.40.1
From: Puranjay Mohan <pjy(a)amazon.com>
[ Upstream commit 7c2fd76048e95dd267055b5f5e0a48e6e7c81fd9 ]
On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata to
be processed as it ignores metadata when bdev == NULL and may report
success.
Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.
[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Puranjay Mohan <pjy(a)amazon.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[ Move the changes from nvme_map_user_request() to nvme_submit_user_cmd()
to make it work on 4.19 ]
Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com>
---
drivers/nvme/host/core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6adff541282b..fcf062f3b507 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -802,11 +802,16 @@ static int nvme_submit_user_cmd(struct request_queue *q,
bool write = nvme_is_write(cmd);
struct nvme_ns *ns = q->queuedata;
struct gendisk *disk = ns ? ns->disk : NULL;
+ bool supports_metadata = disk && blk_get_integrity(disk);
+ bool has_metadata = meta_buffer && meta_len;
struct request *req;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
+ if (has_metadata && !supports_metadata)
+ return -EINVAL;
+
req = nvme_alloc_request(q, cmd, 0, NVME_QID_ANY);
if (IS_ERR(req))
return PTR_ERR(req);
@@ -821,7 +826,7 @@ static int nvme_submit_user_cmd(struct request_queue *q,
goto out;
bio = req->bio;
bio->bi_disk = disk;
- if (disk && meta_buffer && meta_len) {
+ if (has_metadata) {
meta = nvme_add_user_metadata(bio, meta_buffer, meta_len,
meta_seed, write);
if (IS_ERR(meta)) {
--
2.40.1
From: Dmitry Kandybka <d.kandybka(a)gmail.com>
commit b169e76ebad22cbd055101ee5aa1a7bed0e66606 upstream.
In 'mptcp_reset_tout_timer', promote 'probe_timestamp' to unsigned long
to avoid possible integer overflow. Compile tested only.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Dmitry Kandybka <d.kandybka(a)gmail.com>
Link: https://patch.msgid.link/20241107103657.1560536-1-d.kandybka@gmail.com
Signed-off-by: Jakub Kicinski <kuba(a)kernel.org>
[ Conflict in this version because commit d866ae9aaa43 ("mptcp: add a
new sysctl for make after break timeout") is not in this version, and
replaced TCP_TIMEWAIT_LEN in the expression. The fix can still be
applied the same way: by forcing a cast to unsigned long for the first
item. ]
Signed-off-by: Matthieu Baerts (NGI0) <matttbe(a)kernel.org>
---
net/mptcp/protocol.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/net/mptcp/protocol.c b/net/mptcp/protocol.c
index b8357d7c6b3a..01f6ce970918 100644
--- a/net/mptcp/protocol.c
+++ b/net/mptcp/protocol.c
@@ -2691,8 +2691,8 @@ void mptcp_reset_tout_timer(struct mptcp_sock *msk, unsigned long fail_tout)
if (!fail_tout && !inet_csk(sk)->icsk_mtup.probe_timestamp)
return;
- close_timeout = inet_csk(sk)->icsk_mtup.probe_timestamp - tcp_jiffies32 + jiffies +
- TCP_TIMEWAIT_LEN;
+ close_timeout = (unsigned long)inet_csk(sk)->icsk_mtup.probe_timestamp -
+ tcp_jiffies32 + jiffies + TCP_TIMEWAIT_LEN;
/* the close timeout takes precedence on the fail one, and here at least one of
* them is active
--
2.45.2
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid,
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflows are fine, and it works if slot_addr < area->vaddr
(see the sketch after this list).
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check, slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
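A stand-alone demonstration of the single-comparison range check from
item 3: unsigned wrap-around makes addresses below area->vaddr fail the
offset < PAGE_SIZE test as well:

#include <stdio.h>

#define PAGE_SIZE 4096UL

static int slot_addr_valid(unsigned long area_vaddr, unsigned long slot_addr)
{
	/* Underflows for slot_addr < area_vaddr, producing a huge value
	 * that is guaranteed to be >= PAGE_SIZE, so one compare suffices. */
	unsigned long offset = slot_addr - area_vaddr;

	return offset < PAGE_SIZE;
}

int main(void)
{
	unsigned long vaddr = 0x700000000000UL;

	printf("%d %d %d\n",
	       slot_addr_valid(vaddr, vaddr + 128),		/* 1: in range */
	       slot_addr_valid(vaddr, vaddr + PAGE_SIZE),	/* 0: one past */
	       slot_addr_valid(vaddr, vaddr - 8));		/* 0: below    */
	return 0;
}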
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6dac0b5798213..5ce3d189e33c2 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1629,8 +1629,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1639,24 +1639,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid,
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE,
unsigned underflows are fine, it should work if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check, slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 56cd0c7f516d3..5df99a1223c22 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1639,8 +1639,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1649,24 +1649,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
Commit b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround
broken TEF FIFO tail index erratum") introduced
mcp251xfd_get_tef_len() to get the number of unhandled transmit events
from the Transmit Event FIFO (TEF).
As the TEF has no head index, the driver uses the TX-FIFO's tail index
instead, assuming that sent frames have been completed.
When calculating the number of unhandled TEF events, that commit
didn't take mcp2518fd erratum DS80000789E 6. into account. According
to that erratum, the FIFOCI bits of a FIFOSTA register (here the
TX-FIFO tail index) might be corrupted.
However here it seems the bit indicating that the TX-FIFO is
empty (MCP251XFD_REG_FIFOSTA_TFERFFIF) is not correct while the
TX-FIFO tail index is.
Assume that the TX-FIFO is indeed empty if:
- Chip's head and tail index are equal (len == 0).
- The TX-FIFO is less than half full.
(The TX-FIFO empty case has already been checked at the
beginning of this function.)
- No free buffers in the TX ring.
If the TX-FIFO is assumed to be empty, assume that the TEF is full and
return the number of elements in the TX-FIFO (which equals the number
of TEF elements).
If these assumptions are false, the driver might read too many objects
from the TEF. mcp251xfd_handle_tefif_one() checks the sequence numbers
and will refuse to process old events.
Reported-by: Renjaya Raga Zenta <renjaya.zenta(a)formulatrix.com>
Closes: https://patch.msgid.link/CAJ7t6HgaeQ3a_OtfszezU=zB-FqiZXqrnATJ3UujNoQJJf7Gg…
Fixes: b8e0ddd36ce9 ("can: mcp251xfd: tef: prepare to workaround broken TEF FIFO tail index erratum")
Not-yet-Cc: stable(a)vger.kernel.org
Signed-off-by: Marc Kleine-Budde <mkl(a)pengutronix.de>
---
drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c | 29 ++++++++++++++++++++++++++-
1 file changed, 28 insertions(+), 1 deletion(-)
diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
index d3ac865933fdf6c4ecdd80ad4d7accbff51eb0f8..e94321849fd7e69ed045eaeac3efec52fe077d96 100644
--- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
+++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-tef.c
@@ -21,6 +21,11 @@ static inline bool mcp251xfd_tx_fifo_sta_empty(u32 fifo_sta)
return fifo_sta & MCP251XFD_REG_FIFOSTA_TFERFFIF;
}
+static inline bool mcp251xfd_tx_fifo_sta_less_than_half_full(u32 fifo_sta)
+{
+ return fifo_sta & MCP251XFD_REG_FIFOSTA_TFHRFHIF;
+}
+
static inline int
mcp251xfd_tef_tail_get_from_chip(const struct mcp251xfd_priv *priv,
u8 *tef_tail)
@@ -147,7 +152,29 @@ mcp251xfd_get_tef_len(struct mcp251xfd_priv *priv, u8 *len_p)
BUILD_BUG_ON(sizeof(tx_ring->obj_num) != sizeof(len));
len = (chip_tx_tail << shift) - (tail << shift);
- *len_p = len >> shift;
+ len >>= shift;
+
+ /* According to mcp2518fd erratum DS80000789E 6. the FIFOCI
+ * bits of a FIFOSTA register, here the TX-FIFO tail index
+ * might be corrupted.
+ *
+ * However here it seems the bit indicating that the TX-FIFO
+ * is empty (MCP251XFD_REG_FIFOSTA_TFERFFIF) is not correct
+ * while the TX-FIFO tail index is.
+ *
+ * We assume the TX-FIFO is empty, i.e. all pending CAN frames
+ * have been sent, if:
+ * - Chip's head and tail index are equal (len == 0).
+ * - The TX-FIFO is less than half full.
+ * (The TX-FIFO empty case has already been checked at the
+ * beginning of this function.)
+ * - No free buffers in the TX ring.
+ */
+ if (len == 0 && mcp251xfd_tx_fifo_sta_less_than_half_full(fifo_sta) &&
+ mcp251xfd_get_tx_free(tx_ring) == 0)
+ len = tx_ring->obj_num;
+
+ *len_p = len;
return 0;
}
---
base-commit: fcc79e1714e8c2b8e216dc3149812edd37884eef
change-id: 20241115-mcp251xfd-fix-length-calculation-96d4a0ed11fe
Best regards,
--
Marc Kleine-Budde <mkl(a)pengutronix.de>
Critical fixes for mmap_region(), backported to 5.10.y.
Some notes on differences from upstream:
* We do NOT take commit 0fb4a7ad270b ("mm: refactor
map_deny_write_exec()"), as this refactors code only introduced in 6.2.
* We make reference in "mm: refactor arch_calc_vm_flag_bits() and arm64 MTE
handling" to parisc, but the referenced functionality does not exist in
this kernel.
* In this kernel is_shared_maywrite() does not exist and the code uses
VM_SHARED to determine whether mapping_map_writable() /
mapping_unmap_writable() should be invoked. This backport therefore
follows suit.
* The vma_dummy_vm_ops static global doesn't exist in this kernel, so we
use a local static variable in mmap_file() and vma_close().
* Each version of these series is confronted by a slightly different
mmap_region(), so we must adapt the change for each stable version. The
approach remains the same throughout, however, and we correctly avoid
closing the VMA part way through any __mmap_region() operation.
* In 5.10 we must handle VM_DENYWRITE. Since this is done at the top of the
file-backed VMA handling logic, and importantly before mmap_file() invocation,
this does not imply any additional difficult error handling on partial
completion of mapping so has no significant impact.
Lorenzo Stoakes (4):
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm: unconditionally close VMAs on error
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: resolve faulty mmap_region() error path behaviour
arch/arm64/include/asm/mman.h | 10 +++--
include/linux/mman.h | 7 +--
mm/internal.h | 19 ++++++++
mm/mmap.c | 82 +++++++++++++++++++++--------------
mm/nommu.c | 9 ++--
mm/shmem.c | 3 --
mm/util.c | 33 ++++++++++++++
7 files changed, 117 insertions(+), 46 deletions(-)
--
2.47.0
From: Puranjay Mohan <pjy(a)amazon.com>
[ Upstream commit 7c2fd76048e95dd267055b5f5e0a48e6e7c81fd9 ]
On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata to
be processed as it ignores metadata when bdev == NULL and may report
success.
Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.
[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/
Suggested-by: Christoph Hellwig <hch(a)lst.de>
Signed-off-by: Puranjay Mohan <pjy(a)amazon.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Sagi Grimberg <sagi(a)grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g(a)samsung.com>
Signed-off-by: Keith Busch <kbusch(a)kernel.org>
[ Minor changes to make it work on 6.6 ]
Signed-off-by: Hagar Hemdan <hagarhem(a)amazon.com>
---
drivers/nvme/host/ioctl.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/nvme/host/ioctl.c b/drivers/nvme/host/ioctl.c
index 875dee6ecd40..19a7f0160618 100644
--- a/drivers/nvme/host/ioctl.c
+++ b/drivers/nvme/host/ioctl.c
@@ -3,6 +3,7 @@
* Copyright (c) 2011-2014, Intel Corporation.
* Copyright (c) 2017-2021 Christoph Hellwig.
*/
+#include <linux/blk-integrity.h>
#include <linux/ptrace.h> /* for force_successful_syscall_return */
#include <linux/nvme_ioctl.h>
#include <linux/io_uring.h>
@@ -171,10 +172,15 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
struct request_queue *q = req->q;
struct nvme_ns *ns = q->queuedata;
struct block_device *bdev = ns ? ns->disk->part0 : NULL;
+ bool supports_metadata = bdev && blk_get_integrity(bdev->bd_disk);
+ bool has_metadata = meta_buffer && meta_len;
struct bio *bio = NULL;
void *meta = NULL;
int ret;
+ if (has_metadata && !supports_metadata)
+ return -EINVAL;
+
if (ioucmd && (ioucmd->flags & IORING_URING_CMD_FIXED)) {
struct iov_iter iter;
@@ -198,7 +204,7 @@ static int nvme_map_user_request(struct request *req, u64 ubuffer,
if (bdev)
bio_set_dev(bio, bdev);
- if (bdev && meta_buffer && meta_len) {
+ if (has_metadata) {
meta = nvme_add_user_metadata(req, meta_buffer, meta_len,
meta_seed);
if (IS_ERR(meta)) {
--
2.40.1
From: Hans de Goede <hdegoede(a)redhat.com>
[ Upstream commit 3de0f2627ef849735f155c1818247f58404dddfe ]
Not all subsystems support a device getting removed while there are
still consumers of the device with a reference to the device.
One example of this is the regulator subsystem. If a regulator gets
unregistered while there are still drivers holding a reference
a WARN() at drivers/regulator/core.c:5829 triggers, e.g.:
WARNING: CPU: 1 PID: 1587 at drivers/regulator/core.c:5829 regulator_unregister
Hardware name: Intel Corp. VALLEYVIEW C0 PLATFORM/BYT-T FFD8, BIOS BLADE_21.X64.0005.R00.1504101516 FFD8_X64_R_2015_04_10_1516 04/10/2015
RIP: 0010:regulator_unregister
Call Trace:
<TASK>
regulator_unregister
devres_release_group
i2c_device_remove
device_release_driver_internal
bus_remove_device
device_del
device_unregister
x86_android_tablet_remove
On the Lenovo Yoga Tablet 2 series the bq24190 charger chip also provides
a 5V boost converter output for powering USB devices connected to the micro
USB port, the bq24190-charger driver exports this as a Vbus regulator.
On the 830 (8") and 1050 ("10") models this regulator is controlled by
a platform_device and x86_android_tablet_remove() removes platform_device-s
before i2c_clients so the consumer gets removed first.
But on the 1380 (13") model there is a lc824206xa micro-USB switch
connected over I2C and the extcon driver for that controls the regulator.
The bq24190 i2c-client *must* be registered first, because that creates
the regulator with the lc824206xa listed as its consumer. If the regulator
has not been registered yet the lc824206xa driver will end up getting
a dummy regulator.
Since in this case both the regulator provider and consumer are I2C
devices, the only way to ensure that the consumer is unregistered first
is to unregister the I2C devices in reverse order of in which they were
created.
For consistency and to avoid similar problems in the future change
x86_android_tablet_remove() to unregister all device types in reverse
order.
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
Link: https://lore.kernel.org/r/20240406125058.13624-1-hdegoede@redhat.com
[ Resolve minor conflicts ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/platform/x86/x86-android-tablets/core.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/platform/x86/x86-android-tablets/core.c b/drivers/platform/x86/x86-android-tablets/core.c
index a0fa0b6859c9..63a348af83db 100644
--- a/drivers/platform/x86/x86-android-tablets/core.c
+++ b/drivers/platform/x86/x86-android-tablets/core.c
@@ -230,20 +230,20 @@ static void x86_android_tablet_remove(struct platform_device *pdev)
{
int i;
- for (i = 0; i < serdev_count; i++) {
+ for (i = serdev_count - 1; i >= 0; i--) {
if (serdevs[i])
serdev_device_remove(serdevs[i]);
}
kfree(serdevs);
- for (i = 0; i < pdev_count; i++)
+ for (i = pdev_count - 1; i >= 0; i--)
platform_device_unregister(pdevs[i]);
kfree(pdevs);
kfree(buttons);
- for (i = 0; i < i2c_client_count; i++)
+ for (i = i2c_client_count - 1; i >= 0; i--)
i2c_unregister_device(i2c_clients[i]);
kfree(i2c_clients);
--
2.43.0
From: Pali Rohár <pali(a)kernel.org>
commit e2a8910af01653c1c268984855629d71fb81f404 upstream.
ReparseDataLength is the sum of the InodeType size and the DataBuffer
size, so to get the DataBuffer size the InodeType size needs to be
subtracted from ReparseDataLength.
Function cifs_strndup_from_utf16() is currently accessing buf->DataBuffer
at a position past the end of the buffer because it does not subtract
the InodeType size from the length. Fix this problem by correctly
subtracting it from the variable len.
Member InodeType is present only when reparse buffer is large enough. Check
for ReparseDataLength before accessing InodeType to prevent another invalid
memory access.
Major and minor rdev values are present also only when reparse buffer is
large enough. Check for reparse buffer size before calling reparse_mkdev().
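A minimal sketch of the length bookkeeping being fixed, with a simplified
structure layout (the real buffer is little-endian on the wire): InodeType
is counted in ReparseDataLength, so it must be checked for and subtracted
before DataBuffer is consumed:

#include <stdint.h>
#include <stdio.h>

struct nfs_reparse {
	uint16_t reparse_data_length;	/* covers InodeType + DataBuffer */
	uint64_t inode_type;
	/* DataBuffer bytes follow on the wire */
};

static int parse_reparse(const struct nfs_reparse *buf)
{
	uint16_t len = buf->reparse_data_length;

	if (len < sizeof(buf->inode_type))	/* malformed: no InodeType */
		return -5;			/* -EIO */
	len -= sizeof(buf->inode_type);		/* len is now DataBuffer size */

	printf("DataBuffer is %u bytes\n", (unsigned int)len);
	return 0;
}

int main(void)
{
	struct nfs_reparse hdr = {
		.reparse_data_length = sizeof(uint64_t) + 16,
		.inode_type = 0,
	};

	return parse_reparse(&hdr);
}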
Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points")
Reviewed-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[use variable name symlink_buf, the other buf->InodeType accesses are
not used in current version so skip]
Signed-off-by: Mahmoud Adam <mngyadam(a)amazon.com>
---
v2: fix upstream format.
https://lore.kernel.org/stable/20241122152943.76044-1-mngyadam@amazon.com/
fs/cifs/smb2ops.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 9ec67b76bc062..4f7639afa7627 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2807,6 +2807,12 @@ parse_reparse_posix(struct reparse_posix_data *symlink_buf,
/* See MS-FSCC 2.1.2.6 for the 'NFS' style reparse tags */
len = le16_to_cpu(symlink_buf->ReparseDataLength);
+ if (len < sizeof(symlink_buf->InodeType)) {
+ cifs_dbg(VFS, "srv returned malformed nfs buffer\n");
+ return -EIO;
+ }
+
+ len -= sizeof(symlink_buf->InodeType);
if (le64_to_cpu(symlink_buf->InodeType) != NFS_SPECFILE_LNK) {
cifs_dbg(VFS, "%lld not a supported symlink type\n",
--
2.40.1
From: Pali Rohár <pali(a)kernel.org>
commit e2a8910af01653c1c268984855629d71fb81f404 upstream.
ReparseDataLength is the sum of the InodeType size and the DataBuffer
size, so to get the DataBuffer size the InodeType size needs to be
subtracted from ReparseDataLength.
Function cifs_strndup_from_utf16() is currently accessing buf->DataBuffer
at a position past the end of the buffer because it does not subtract
the InodeType size from the length. Fix this problem by correctly
subtracting it from the variable len.
Member InodeType is present only when reparse buffer is large enough. Check
for ReparseDataLength before accessing InodeType to prevent another invalid
memory access.
Major and minor rdev values are present also only when reparse buffer is
large enough. Check for reparse buffer size before calling reparse_mkdev().
Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points")
Reviewed-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[use variable name symlink_buf, the other buf->InodeType accesses are
not used in current version so skip]
Signed-off-by: Mahmoud Adam <mngyadam(a)amazon.com>
---
v2: fix upstream format.
https://lore.kernel.org/stable/Z0Pd9slDKJNM0n3T@ca93ea81d97d/T/#m8cdb746a25…
fs/smb/client/smb2ops.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/smb/client/smb2ops.c b/fs/smb/client/smb2ops.c
index d1e5ff9a3cd39..fcfbc096924a8 100644
--- a/fs/smb/client/smb2ops.c
+++ b/fs/smb/client/smb2ops.c
@@ -2897,6 +2897,12 @@ parse_reparse_posix(struct reparse_posix_data *symlink_buf,
/* See MS-FSCC 2.1.2.6 for the 'NFS' style reparse tags */
len = le16_to_cpu(symlink_buf->ReparseDataLength);
+ if (len < sizeof(symlink_buf->InodeType)) {
+ cifs_dbg(VFS, "srv returned malformed nfs buffer\n");
+ return -EIO;
+ }
+
+ len -= sizeof(symlink_buf->InodeType);
if (le64_to_cpu(symlink_buf->InodeType) != NFS_SPECFILE_LNK) {
cifs_dbg(VFS, "%lld not a supported symlink type\n",
--
2.40.1
Critical fixes for mmap_region(), backported to 5.15.y.
Some notes on differences from upstream:
* We do NOT take commit 0fb4a7ad270b ("mm: refactor
map_deny_write_exec()"), as this refactors code only introduced in 6.2.
* We make reference in "mm: refactor arch_calc_vm_flag_bits() and arm64 MTE
handling" to parisc, but the referenced functionality does not exist in
this kernel.
* In this kernel is_shared_maywrite() does not exist and the code uses
VM_SHARED to determine whether mapping_map_writable() /
mapping_unmap_writable() should be invoked. This backport therefore
follows suit.
* The vma_dummy_vm_ops static global doesn't exist in this kernel, so we
use a local static variable in mmap_file() and vma_close().
* Each version of these series is confronted by a slightly different
mmap_region(), so we must adapt the change for each stable version. The
approach remains the same throughout, however, and we correctly avoid
closing the VMA part way through any __mmap_region() operation.
Lorenzo Stoakes (4):
mm: avoid unsafe VMA hook invocation when error arises on mmap hook
mm: unconditionally close VMAs on error
mm: refactor arch_calc_vm_flag_bits() and arm64 MTE handling
mm: resolve faulty mmap_region() error path behaviour
arch/arm64/include/asm/mman.h | 10 ++--
include/linux/mman.h | 7 +--
mm/internal.h | 19 ++++++++
mm/mmap.c | 86 +++++++++++++++++++++--------------
mm/nommu.c | 9 ++--
mm/shmem.c | 3 --
mm/util.c | 33 ++++++++++++++
7 files changed, 119 insertions(+), 48 deletions(-)
--
2.47.0
From userspace, spawning a new process with, for example,
posix_spawn(), only allows the user to work with
the scheduling priority value defined by POSIX
in the sched_param struct.
However, sched_setparam() and similar syscalls lead to
__sched_setscheduler() which rejects any new value
for the priority other than 0 for non-RT schedule classes,
a behavior that existed since Linux 2.6 or earlier.
Linux translates the usage of the sched_param struct
into its own internal sched_attr struct during the syscall,
but the user currently has no way to manage the other values
within the sched_attr struct using only POSIX functions.
The only other way to adjust niceness when using posix_spawn()
would be to set the value after the process has started,
but this introduces the risk of the process being dead
before the syscall can set the priority afterward.
To resolve this, allow the use of the priority value
originally from the POSIX sched_param struct in order to
set the niceness value instead of rejecting the priority value.
Edit the sched_get_priority_*() POSIX syscalls
in order to reflect the range of values accepted.
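A user-space sketch of what the change would permit, assuming the usual
kernel constants (MAX_RT_PRIO = 100, DEFAULT_PRIO = 120, so
PRIO_TO_NICE(p) = p - 120); on current kernels the sched_setparam() call
below fails with EINVAL for a SCHED_NORMAL task:

#include <sched.h>
#include <stdio.h>

/* Kernel-internal constants, assumed here for illustration only. */
#define MAX_RT_PRIO	100
#define DEFAULT_PRIO	120
#define PRIO_TO_NICE(p)	((p) - DEFAULT_PRIO)

int main(void)
{
	struct sched_param param = { .sched_priority = 130 };

	printf("priority %d would map to nice %d\n",
	       param.sched_priority, PRIO_TO_NICE(param.sched_priority));

	/* With this patch, a SCHED_NORMAL task could request niceness this
	 * way; today the call is rejected for non-RT policies. */
	if (sched_setparam(0, &param))
		perror("sched_setparam");
	return 0;
}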
Cc: stable(a)vger.kernel.org # Apply to kernel/sched/core.c
Signed-off-by: Michael C. Pratt <mcpratt(a)pm.me>
---
kernel/sched/syscalls.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/kernel/sched/syscalls.c b/kernel/sched/syscalls.c
index 24f9f90b6574..43eb283e6281 100644
--- a/kernel/sched/syscalls.c
+++ b/kernel/sched/syscalls.c
@@ -785,6 +785,19 @@ static int _sched_setscheduler(struct task_struct *p, int policy,
attr.sched_policy = policy;
}
+ if (attr.sched_priority > MAX_PRIO-1)
+ return -EINVAL;
+
+ /*
+ * If priority is set for SCHED_NORMAL or SCHED_BATCH,
+ * set the niceness instead, but only for user calls.
+ */
+ if (check && attr.sched_priority > MAX_RT_PRIO-1 &&
+ ((policy != SETPARAM_POLICY && fair_policy(policy)) || fair_policy(p->policy))) {
+ attr.sched_nice = PRIO_TO_NICE(attr.sched_priority);
+ attr.sched_priority = 0;
+ }
+
return __sched_setscheduler(p, &attr, check, true);
}
/**
@@ -1532,9 +1545,11 @@ SYSCALL_DEFINE1(sched_get_priority_max, int, policy)
case SCHED_RR:
ret = MAX_RT_PRIO-1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_PRIO-1;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
case SCHED_EXT:
ret = 0;
@@ -1560,9 +1575,11 @@ SYSCALL_DEFINE1(sched_get_priority_min, int, policy)
case SCHED_RR:
ret = 1;
break;
- case SCHED_DEADLINE:
case SCHED_NORMAL:
case SCHED_BATCH:
+ ret = MAX_RT_PRIO;
+ break;
+ case SCHED_DEADLINE:
case SCHED_IDLE:
case SCHED_EXT:
ret = 0;
base-commit: 2d5404caa8c7bb5c4e0435f94b28834ae5456623
--
2.30.2
When the current_uuid attribute is set to active policy UUID, reading
back the same attribute is displaying uuid as "INVALID" instead of active
policy UUID on some platforms before Ice Lake.
In platforms before Ice Lake, firmware provides list of supported thermal
policies. In this case user space can select any of the supported thermal
policy via a write to attribute "current_uuid".
Commit c7ff29763989 ("thermal: int340x: Update OS policy capability
handshake") updated the OS policy handshake to support Ice Lake and
later platforms. But it treated priv->current_uuid_index=0 as invalid,
even though priv->current_uuid_index=0 is the active policy; only
priv->current_uuid_index=-1 is invalid.
Fix this issue by treating priv->current_uuid_index=0 as valid.
Fixes: c7ff29763989 ("thermal: int340x: Update OS policy capability handshake")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada(a)linux.intel.com>
CC: stable(a)vger.kernel.org # 5.18+
---
drivers/thermal/intel/int340x_thermal/int3400_thermal.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
index b0c0f0ffdcb0..f547d386ae80 100644
--- a/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
+++ b/drivers/thermal/intel/int340x_thermal/int3400_thermal.c
@@ -137,7 +137,7 @@ static ssize_t current_uuid_show(struct device *dev,
struct int3400_thermal_priv *priv = dev_get_drvdata(dev);
int i, length = 0;
- if (priv->current_uuid_index > 0)
+ if (priv->current_uuid_index >= 0)
return sprintf(buf, "%s\n",
int3400_thermal_uuids[priv->current_uuid_index]);
--
2.47.0
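A toy illustration of the index convention the one-line fix restores (ordinary userspace C with made-up UUID strings, not driver code): index 0 is a valid entry, and only -1 means no policy is selected.

#include <stdio.h>

static const char *uuids[] = { "POLICY-UUID-0", "POLICY-UUID-1" };

static void show_current_uuid(int current_uuid_index)
{
    if (current_uuid_index >= 0)    /* was "> 0", wrongly rejecting index 0 */
        printf("%s\n", uuids[current_uuid_index]);
    else
        printf("INVALID\n");
}

int main(void)
{
    show_current_uuid(0);   /* active policy: now printed correctly */
    show_current_uuid(-1);  /* genuinely invalid: INVALID */
    return 0;
}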
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4b52cb2ae6d62..cc605df73d72f 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1683,8 +1683,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1693,24 +1693,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
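A standalone sketch of why the single unsigned comparison in the patch above covers both out-of-range directions (plain C arithmetic with made-up addresses, not kernel code):

#include <stdio.h>

int main(void)
{
    unsigned long vaddr = 0x7f0000001000UL;  /* stand-in for area->vaddr */
    unsigned long page  = 4096UL;

    unsigned long below  = vaddr - 0x10;     /* slot_addr below the area */
    unsigned long above  = vaddr + page;     /* slot_addr past the page  */
    unsigned long inside = vaddr + 128;      /* a real slot address      */

    /* below - vaddr underflows to a huge value, so the check still fails */
    printf("%d %d %d\n",
           (below - vaddr) < page,           /* 0 */
           (above - vaddr) < page,           /* 0 */
           (inside - vaddr) < page);         /* 1 */
    return 0;
}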
From: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
On the Renesas RZ/G3S, when doing suspend to RAM, uart_suspend_port()
is called. uart_suspend_port() calls struct uart_port::ops::tx_empty()
three times before shutting down the port.
According to the documentation, the struct uart_port::ops::tx_empty()
API tests whether the transmitter FIFO and shifter for the port are
empty.
The Renesas RZ/G3S SCIFA IP reports the number of data units stored in the
transmit FIFO through the FDR (FIFO Data Count Register). The data units
in the FIFOs are written in the shift register and transmitted from there.
The TEND bit in the Serial Status Register reports if the data was
transmitted from the shift register.
Previously, the tx_empty() implementation in the sh-sci driver considered
the TX path empty if the hardware reported the TEND bit set and the number
of data units in the FIFO was zero.
According to the HW manual, the TEND bit has the following meaning:
0: Transmission is in the waiting state or in progress.
1: Transmission is completed.
It has been noticed that when opening the serial device without using it
and then switching to a power saving mode, the tx_empty() call in the
uart_port_suspend() function fails, leading to the "Unable to drain
transmitter" message being printed on the console. This is because
TEND=0 if nothing has been transmitted and the FIFOs are empty. As TEND=0
has a double meaning (waiting state or in progress), we cannot detect the
scenario described above.
Add a software workaround for this. It sets a variable if any data has
been sent on the serial console (when using PIO) or if the DMA callback has
been called (meaning something has been transmitted). In the tx_empty()
API the status of the DMA transaction is also checked, and if it is
completed or in progress the code falls back to checking the hardware
registers instead of relying on the software variable.
Fixes: 73a19e4c0301 ("serial: sh-sci: Add DMA support.")
Cc: stable(a)vger.kernel.org
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj(a)bp.renesas.com>
---
Patch was initially part of series at [1].
Changes since [1]:
- checked the s->chan_tx validity in sci_dma_check_tx_occurred()
[1] https://lore.kernel.org/all/20241115134401.3893008-1-claudiu.beznea.uj@bp.r…
drivers/tty/serial/sh-sci.c | 29 +++++++++++++++++++++++++++++
1 file changed, 29 insertions(+)
diff --git a/drivers/tty/serial/sh-sci.c b/drivers/tty/serial/sh-sci.c
index 136e0c257af1..680f0203fda4 100644
--- a/drivers/tty/serial/sh-sci.c
+++ b/drivers/tty/serial/sh-sci.c
@@ -157,6 +157,7 @@ struct sci_port {
bool has_rtscts;
bool autorts;
+ bool tx_occurred;
};
#define SCI_NPORTS CONFIG_SERIAL_SH_SCI_NR_UARTS
@@ -850,6 +851,7 @@ static void sci_transmit_chars(struct uart_port *port)
{
struct tty_port *tport = &port->state->port;
unsigned int stopped = uart_tx_stopped(port);
+ struct sci_port *s = to_sci_port(port);
unsigned short status;
unsigned short ctrl;
int count;
@@ -885,6 +887,7 @@ static void sci_transmit_chars(struct uart_port *port)
}
sci_serial_out(port, SCxTDR, c);
+ s->tx_occurred = true;
port->icount.tx++;
} while (--count > 0);
@@ -1241,6 +1244,8 @@ static void sci_dma_tx_complete(void *arg)
if (kfifo_len(&tport->xmit_fifo) < WAKEUP_CHARS)
uart_write_wakeup(port);
+ s->tx_occurred = true;
+
if (!kfifo_is_empty(&tport->xmit_fifo)) {
s->cookie_tx = 0;
schedule_work(&s->work_tx);
@@ -1731,6 +1736,19 @@ static void sci_flush_buffer(struct uart_port *port)
s->cookie_tx = -EINVAL;
}
}
+
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+ struct dma_tx_state state;
+ enum dma_status status;
+
+ if (!s->chan_tx)
+ return;
+
+ status = dmaengine_tx_status(s->chan_tx, s->cookie_tx, &state);
+ if (status == DMA_COMPLETE || status == DMA_IN_PROGRESS)
+ s->tx_occurred = true;
+}
#else /* !CONFIG_SERIAL_SH_SCI_DMA */
static inline void sci_request_dma(struct uart_port *port)
{
@@ -1740,6 +1758,10 @@ static inline void sci_free_dma(struct uart_port *port)
{
}
+static void sci_dma_check_tx_occurred(struct sci_port *s)
+{
+}
+
#define sci_flush_buffer NULL
#endif /* !CONFIG_SERIAL_SH_SCI_DMA */
@@ -2076,6 +2098,12 @@ static unsigned int sci_tx_empty(struct uart_port *port)
{
unsigned short status = sci_serial_in(port, SCxSR);
unsigned short in_tx_fifo = sci_txfill(port);
+ struct sci_port *s = to_sci_port(port);
+
+ sci_dma_check_tx_occurred(s);
+
+ if (!s->tx_occurred)
+ return TIOCSER_TEMT;
return (status & SCxSR_TEND(port)) && !in_tx_fifo ? TIOCSER_TEMT : 0;
}
@@ -2247,6 +2275,7 @@ static int sci_startup(struct uart_port *port)
dev_dbg(port->dev, "%s(%d)\n", __func__, port->line);
+ s->tx_occurred = false;
sci_request_dma(port);
ret = sci_request_irq(s);
--
2.39.2
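A toy model of the tx_empty() decision the workaround above implements (plain userspace C with invented field names, not the driver logic verbatim): report "empty" when nothing was ever transmitted, otherwise trust the hardware status.

#include <stdbool.h>
#include <stdio.h>

struct port {
    bool tx_occurred;   /* set on PIO write or DMA completion */
    bool hw_tend;       /* hardware "transmission ended" bit  */
    int  fifo_count;    /* data units still in the TX FIFO    */
};

static bool tx_empty(const struct port *p)
{
    if (!p->tx_occurred)    /* nothing sent yet: TEND=0 is ambiguous */
        return true;
    return p->hw_tend && p->fifo_count == 0;
}

int main(void)
{
    struct port idle = { .tx_occurred = false, .hw_tend = false, .fifo_count = 0 };
    struct port busy = { .tx_occurred = true,  .hw_tend = false, .fifo_count = 3 };

    printf("%d %d\n", tx_empty(&idle), tx_empty(&busy)); /* 1 0 */
    return 0;
}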
A collection of interrupt-related fixes and cleanups. A few patches are
from Devarsh and have been posted to dri-devel; I've included them here
with permission from Devarsh, so that we have all interrupt patches
together. I have modified both of those patches compared to the posted
versions.
Tomi
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
---
Devarsh Thakkar (2):
drm/tidss: Clear the interrupt status for interrupts being disabled
drm/tidss: Fix race condition while handling interrupt registers
Tomi Valkeinen (5):
drm/tidss: Fix issue in irq handling causing irq-flood issue
drm/tidss: Remove unused OCP error flag
drm/tidss: Remove extra K2G check
drm/tidss: Add printing of underflows
drm/tidss: Rename 'wait_lock' to 'irq_lock'
drivers/gpu/drm/tidss/tidss_dispc.c | 28 ++++++++++++++++------------
drivers/gpu/drm/tidss/tidss_drv.c | 2 +-
drivers/gpu/drm/tidss/tidss_drv.h | 5 +++--
drivers/gpu/drm/tidss/tidss_irq.c | 34 +++++++++++++++++++++++-----------
drivers/gpu/drm/tidss/tidss_irq.h | 4 +---
drivers/gpu/drm/tidss/tidss_plane.c | 8 ++++++++
drivers/gpu/drm/tidss/tidss_plane.h | 2 ++
7 files changed, 54 insertions(+), 29 deletions(-)
---
base-commit: 98f7e32f20d28ec452afb208f9cffc08448a2652
change-id: 20240918-tidss-irq-fix-f687b149a42c
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
From: Pali Rohár <pali(a)kernel.org>
upstream e2a8910af01653c1c268984855629d71fb81f404 commit.
ReparseDataLength is the sum of the InodeType size and the DataBuffer
size, so to get the DataBuffer size the InodeType size needs to be
subtracted from ReparseDataLength.
Function cifs_strndup_from_utf16() is currently accessing buf->DataBuffer
at a position past the end of the buffer because it does not subtract the
InodeType size from the length. Fix this problem by correctly subtracting
it from the variable len.
Member InodeType is present only when reparse buffer is large enough. Check
for ReparseDataLength before accessing InodeType to prevent another invalid
memory access.
Major and minor rdev values are present also only when reparse buffer is
large enough. Check for reparse buffer size before calling reparse_mkdev().
Fixes: d5ecebc4900d ("smb3: Allow query of symlinks stored as reparse points")
Reviewed-by: Paulo Alcantara (Red Hat) <pc(a)manguebit.com>
Signed-off-by: Pali Rohár <pali(a)kernel.org>
Signed-off-by: Steve French <stfrench(a)microsoft.com>
[use variable name symlink_buf, the other buf->InodeType accesses are
not used in current version so skip]
Signed-off-by: Mahmoud Adam <mngyadam(a)amazon.com>
---
This fixes CVE-2024-49996.
fs/cifs/smb2ops.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/fs/cifs/smb2ops.c b/fs/cifs/smb2ops.c
index 6c30fff8a029e..ee9a1e6550e3c 100644
--- a/fs/cifs/smb2ops.c
+++ b/fs/cifs/smb2ops.c
@@ -2971,6 +2971,12 @@ parse_reparse_posix(struct reparse_posix_data *symlink_buf,
/* See MS-FSCC 2.1.2.6 for the 'NFS' style reparse tags */
len = le16_to_cpu(symlink_buf->ReparseDataLength);
+ if (len < sizeof(symlink_buf->InodeType)) {
+ cifs_dbg(VFS, "srv returned malformed nfs buffer\n");
+ return -EIO;
+ }
+
+ len -= sizeof(symlink_buf->InodeType);
if (le64_to_cpu(symlink_buf->InodeType) != NFS_SPECFILE_LNK) {
cifs_dbg(VFS, "%lld not a supported symlink type\n",
--
2.40.1
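A small self-contained sketch of the length accounting the fix above performs before touching DataBuffer (plain C with simplified field sizes and an invented helper, not the cifs code):

#include <stdint.h>
#include <stdio.h>

/* reparse_data_length is the wire ReparseDataLength; InodeType is a fixed 8-byte field. */
static int databuffer_len(uint16_t reparse_data_length, size_t *out)
{
    if (reparse_data_length < sizeof(uint64_t))
        return -1;                              /* malformed: no room for InodeType */
    *out = reparse_data_length - sizeof(uint64_t);  /* bytes actually in DataBuffer */
    return 0;
}

int main(void)
{
    size_t n;

    printf("%d\n", databuffer_len(4, &n));      /* -1: malformed buffer */
    if (!databuffer_len(16, &n))
        printf("%zu\n", n);                     /* 8 DataBuffer bytes */
    return 0;
}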
A race condition exists between SMB request handling in
`ksmbd_conn_handler_loop()` and the freeing of `ksmbd_conn` in the
workqueue handler `handle_ksmbd_work()`. This leads to a UAF.
- KASAN: slab-use-after-free Read in handle_ksmbd_work
- KASAN: slab-use-after-free in rtlock_slowlock_locked
This race condition arises as follows:
- `ksmbd_conn_handler_loop()` waits for `conn->r_count` to reach zero:
`wait_event(conn->r_count_q, atomic_read(&conn->r_count) == 0);`
- Meanwhile, `handle_ksmbd_work()` decrements `conn->r_count` using
`atomic_dec_return(&conn->r_count)`, and if it reaches zero, calls
`ksmbd_conn_free()`, which frees `conn`.
- However, after `handle_ksmbd_work()` decrements `conn->r_count`,
it may still access `conn->r_count_q` in the following line:
`waitqueue_active(&conn->r_count_q)` or `wake_up(&conn->r_count_q)`
This results in a UAF, as `conn` has already been freed.
The discovery of this UAF can be referenced in the following PR for
syzkaller's support for SMB requests.
Link: https://github.com/google/syzkaller/pull/5524
Fixes: ee426bfb9d09 ("ksmbd: add refcnt to ksmbd_conn struct")
Cc: linux-cifs(a)vger.kernel.org
Cc: stable(a)vger.kernel.org # v6.6.55+, v6.10.14+, v6.11.3+
Cc: syzkaller(a)googlegroups.com
Signed-off-by: Yunseong Kim <yskelg(a)gmail.com>
---
fs/smb/server/server.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/smb/server/server.c b/fs/smb/server/server.c
index e6cfedba9992..c8cc6fa6fc3e 100644
--- a/fs/smb/server/server.c
+++ b/fs/smb/server/server.c
@@ -276,8 +276,12 @@ static void handle_ksmbd_work(struct work_struct *wk)
* disconnection. waitqueue_active is safe because it
* uses atomic operation for condition.
*/
+ atomic_inc(&conn->refcnt);
if (!atomic_dec_return(&conn->r_count) && waitqueue_active(&conn->r_count_q))
wake_up(&conn->r_count_q);
+
+ if (atomic_dec_and_test(&conn->refcnt))
+ kfree(conn);
}
/**
--
2.43.0
From: Dom Cobley <popcornmix(a)gmail.com>
[ Upstream commit b4e5646178e86665f5caef2894578600f597098a ]
We regularly get dmesg error reports of:
[ 18.184066] hdmi-audio-codec hdmi-audio-codec.3.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[ 18.184098] MAI: soc_pcm_open() failed (-19)
These are generated for any disconnected hdmi interface when pulseaudio
attempts to open the associated ALSA device (numerous times). Each open
generates a kernel error message, generating general log spam.
The error messages all come from _soc_pcm_ret in sound/soc/soc-pcm.c#L39,
which suggests that returning ENOTSUPP, rather than ENODEV, will be quiet.
And indeed it is.
Signed-off-by: Dom Cobley <popcornmix(a)gmail.com>
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240621152055.4180873-5-dave…
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index 971801acbde60..51971035d8cbb 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -2156,7 +2156,7 @@ static int vc4_hdmi_audio_startup(struct device *dev, void *data)
}
if (!vc4_hdmi_audio_can_stream(vc4_hdmi)) {
- ret = -ENODEV;
+ ret = -ENOTSUPP;
goto out_dev_exit;
}
--
2.43.0
v2:
The main change in this version is Bjorn's pointing out that pm_runtime_*
inside of the gdsc_enable/gdsc_disable path would be recursive and cause a
lockdep splat. Dmitry alluded to this too.
Bjorn pointed to stuff being done lower in the gdsc_register() routine that
might be a starting point.
I iterated around that idea and came up with patch #3. When a gdsc has no
parent and the pd_list is non-NULL then attach that orphan GDSC to the
clock controller power-domain list.
Existing subdomain code in gdsc_register() will connect the parent GDSCs in
the clock-controller to the clock-controller subdomain, the new code here
does that same job for a list of power-domains the clock controller depends
on.
To Dmitry's point about MMCX and MCX dependencies for the registers inside
of the clock controller, I have switched off all references in a test dtsi
and confirmed that accessing the clock-controller regs themselves isn't
required.
On the second point I also verified my test branch with lockdep on which
was a concern with the pm_domain version of this solution but I wanted to
cover it anyway with the new approach for completeness sake.
Here's the item-by-item list of changes:
- Adds a patch to capture pm_genpd_add_subdomain() result code - Bryan
- Changes changelog of second patch to remove singleton and generally
to make the commit log easier to understand - Bjorn
- Uses devm_pm_domain_attach_list - Vlad
- Changes error check to if (ret < 0 && ret != -EEXIST) - Vlad
- Retains passing &pd_data instead of NULL - because NULL doesn't do
the same thing - Bryan/Vlad
- Retains standalone function qcom_cc_pds_attach() because the pd_data
enumeration looks neater in a standalone function - Bryan/Vlad
- Drops pm_runtime in favour of gdsc_add_subdomain_list() for each
power-domain in the pd_list.
The pd_list will be whatever is pointed to by power-domains = <>
in the dtsi - Bjorn
- Link to v1: https://lore.kernel.org/r/20241118-b4-linux-next-24-11-18-clock-multiple-po…
v1:
On x1e80100 and its SKUs the Camera Clock Controller - CAMCC has
multiple power-domains which power it. Usually with a single power-domain
the core platform code will automatically switch on the singleton
power-domain for you. If you have multiple power-domains for a device, in
this case the clock controller, you need to switch those power-domains
on/off yourself.
The clock controllers can also contain Global Distributed
Switch Controllers - GDSCs which themselves can be referenced from dtsi
nodes ultimately triggering a gdsc_en() in drivers/clk/qcom/gdsc.c.
As an example:
cci0: cci@ac4a000 {
power-domains = <&camcc TITAN_TOP_GDSC>;
};
This series adds support for attaching a power-domain list to the
clock controllers and the GDSCs those controllers provide, so that in the
case of the above example gdsc_toggle_logic() will switch the power-domain
list on and off with pm_runtime_resume_and_get() and pm_runtime_put_sync()
respectively.
Signed-off-by: Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
---
Bryan O'Donoghue (3):
clk: qcom: gdsc: Capture pm_genpd_add_subdomain result code
clk: qcom: common: Add support for power-domain attachment
driver: clk: qcom: Support attaching subdomain list to multiple parents
drivers/clk/qcom/common.c | 21 +++++++++++++++++++++
drivers/clk/qcom/gdsc.c | 41 +++++++++++++++++++++++++++++++++++++++--
drivers/clk/qcom/gdsc.h | 1 +
3 files changed, 61 insertions(+), 2 deletions(-)
---
base-commit: 744cf71b8bdfcdd77aaf58395e068b7457634b2c
change-id: 20241118-b4-linux-next-24-11-18-clock-multiple-power-domains-a5f994dc452a
Best regards,
--
Bryan O'Donoghue <bryan.odonoghue(a)linaro.org>
[BUG]
Btrfs will fail generic/750 randomly if its sector size is smaller than
page size.
One of the warnings looks like this:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 90263 at fs/btrfs/ordered-data.c:360 can_finish_ordered_extent+0x33c/0x390 [btrfs]
CPU: 1 UID: 0 PID: 90263 Comm: kworker/u18:1 Tainted: G OE 6.12.0-rc3-custom+ #79
Workqueue: events_unbound btrfs_async_reclaim_metadata_space [btrfs]
pc : can_finish_ordered_extent+0x33c/0x390 [btrfs]
lr : can_finish_ordered_extent+0xdc/0x390 [btrfs]
Call trace:
can_finish_ordered_extent+0x33c/0x390 [btrfs]
btrfs_mark_ordered_io_finished+0x130/0x2b8 [btrfs]
extent_writepage+0xfc/0x338 [btrfs]
extent_write_cache_pages+0x1d4/0x4b8 [btrfs]
btrfs_writepages+0x94/0x158 [btrfs]
do_writepages+0x74/0x190
filemap_fdatawrite_wbc+0x88/0xc8
start_delalloc_inodes+0x180/0x3b0 [btrfs]
btrfs_start_delalloc_roots+0x17c/0x288 [btrfs]
shrink_delalloc+0x11c/0x280 [btrfs]
flush_space+0x27c/0x310 [btrfs]
btrfs_async_reclaim_metadata_space+0xcc/0x208 [btrfs]
process_one_work+0x228/0x670
worker_thread+0x1bc/0x360
kthread+0x100/0x118
ret_from_fork+0x10/0x20
irq event stamp: 9784200
hardirqs last enabled at (9784199): [<ffffd21ec54dc01c>] _raw_spin_unlock_irqrestore+0x74/0x80
hardirqs last disabled at (9784200): [<ffffd21ec54db374>] _raw_spin_lock_irqsave+0x8c/0xa0
softirqs last enabled at (9784148): [<ffffd21ec472ff44>] handle_softirqs+0x45c/0x4b0
softirqs last disabled at (9784141): [<ffffd21ec46d01e4>] __do_softirq+0x1c/0x28
---[ end trace 0000000000000000 ]---
BTRFS critical (device dm-2): bad ordered extent accounting, root=5 ino=1492 OE offset=1654784 OE len=57344 to_dec=49152 left=0
[CAUSE]
There are several error paths not properly handling during folio
writeback:
1) Partially submitted folio
During extent_writepage_io(), if some error happens (the only
possible case is submit_one_sector() failing to grab an extent map),
we can end up with a partially submitted folio.
Since extent_writepage_io() failed, we need to call
btrfs_mark_ordered_io_finished() to clean up the range that was not
submitted. But currently we call btrfs_mark_ordered_io_finished() for
the submitted range too, causing double accounting.
2) Partially created ordered extents
We can also fail at writepage_delalloc(), which will stop creating
new ordered extents if it hits any error from
btrfs_run_delalloc_range().
In that case, we will call btrfs_mark_ordered_io_finished() for
ranges where there is no ordered extent at all.
Both bugs are only affecting sector size < page size cases.
[FIX]
- Introduce a new member btrfs_bio_ctrl::last_submitted
This tracks the last sector submitted through
extent_writepage_io().
So for the above extent_writepage() case, we know exactly which
sectors were submitted and should not get the ordered extent accounting.
- Clear the submit_bitmap for ranges where no ordered extent is created
So if btrfs_run_delalloc_range() failed for a range, it will not be
cleaned up.
- Introduce a helper cleanup_ordered_extents()
This does a sector-by-sector cleanup, taking
btrfs_bio_ctrl::last_submitted and btrfs_bio_ctrl::submit_bitmap into
consideration.
Using @last_submitted avoids double accounting on the submitted
ranges.
Meanwhile, using @submit_bitmap avoids touching ranges going
through compression.
cc: stable(a)vger.kernel.org # 5.15+
Signed-off-by: Qu Wenruo <wqu(a)suse.com>
---
fs/btrfs/extent_io.c | 54 ++++++++++++++++++++++++++++++++++++++------
1 file changed, 47 insertions(+), 7 deletions(-)
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index e629d2ee152a..1c2246d36672 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -108,6 +108,14 @@ struct btrfs_bio_ctrl {
* This is to avoid touching ranges covered by compression/inline.
*/
unsigned long submit_bitmap;
+
+ /*
+ * The end (exclusive) of the last submitted range in the folio.
+ *
+ * This is for sector size < page size case where we may hit error
+ * half way.
+ */
+ u64 last_submitted;
};
static void submit_one_bio(struct btrfs_bio_ctrl *bio_ctrl)
@@ -1254,11 +1262,18 @@ static noinline_for_stack int writepage_delalloc(struct btrfs_inode *inode,
/*
* We have some ranges that's going to be submitted asynchronously
- * (compression or inline). These range have their own control
+ * (compression or inline, ret > 0). These range have their own control
* on when to unlock the pages. We should not touch them
- * anymore, so clear the range from the submission bitmap.
+ * anymore.
+ *
+ * We can also have some ranges where we didn't even call
+ * btrfs_run_delalloc_range() (as previous run failed, ret < 0).
+ * These error ranges should not be submitted nor cleaned up as
+ * there is no ordered extent allocated for them.
+ *
+ * For either cases, we should clear the submit_bitmap.
*/
- if (ret > 0) {
+ if (ret) {
unsigned int start_bit = (found_start - page_start) >>
fs_info->sectorsize_bits;
unsigned int end_bit = (min(page_end + 1, found_start + found_len) -
@@ -1435,6 +1450,7 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
ret = submit_one_sector(inode, folio, cur, bio_ctrl, i_size);
if (ret < 0)
goto out;
+ bio_ctrl->last_submitted = cur + fs_info->sectorsize;
submitted_io = true;
}
out:
@@ -1453,6 +1469,24 @@ static noinline_for_stack int extent_writepage_io(struct btrfs_inode *inode,
return ret;
}
+static void cleanup_ordered_extents(struct btrfs_inode *inode,
+ struct folio *folio, u64 file_pos,
+ u64 num_bytes, unsigned long *bitmap)
+{
+ struct btrfs_fs_info *fs_info = inode->root->fs_info;
+ unsigned int cur_bit = (file_pos - folio_pos(folio)) >> fs_info->sectorsize_bits;
+
+ for_each_set_bit_from(cur_bit, bitmap, fs_info->sectors_per_page) {
+ u64 cur_pos = folio_pos(folio) + (cur_bit << fs_info->sectorsize_bits);
+
+ if (cur_pos >= file_pos + num_bytes)
+ break;
+
+ btrfs_mark_ordered_io_finished(inode, folio, cur_pos,
+ fs_info->sectorsize, false);
+ }
+}
+
/*
* the writepage semantics are similar to regular writepage. extent
* records are inserted to lock ranges in the tree, and as dirty areas
@@ -1492,6 +1526,7 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
* The proper bitmap can only be initialized until writepage_delalloc().
*/
bio_ctrl->submit_bitmap = (unsigned long)-1;
+ bio_ctrl->last_submitted = page_start;
ret = set_folio_extent_mapped(folio);
if (ret < 0)
goto done;
@@ -1511,8 +1546,10 @@ static int extent_writepage(struct folio *folio, struct btrfs_bio_ctrl *bio_ctrl
done:
if (ret) {
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- page_start, PAGE_SIZE, !ret);
+ cleanup_ordered_extents(BTRFS_I(inode), folio,
+ bio_ctrl->last_submitted,
+ page_start + PAGE_SIZE - bio_ctrl->last_submitted,
+ &bio_ctrl->submit_bitmap);
mapping_set_error(folio->mapping, ret);
}
@@ -2288,14 +2325,17 @@ void extent_write_locked_range(struct inode *inode, const struct folio *locked_f
* extent_writepage_io() will do the truncation correctly.
*/
bio_ctrl.submit_bitmap = (unsigned long)-1;
+ bio_ctrl.last_submitted = cur;
ret = extent_writepage_io(BTRFS_I(inode), folio, cur, cur_len,
&bio_ctrl, i_size);
if (ret == 1)
goto next_page;
if (ret) {
- btrfs_mark_ordered_io_finished(BTRFS_I(inode), folio,
- cur, cur_len, !ret);
+ cleanup_ordered_extents(BTRFS_I(inode), folio,
+ bio_ctrl.last_submitted,
+ cur_end + 1 - bio_ctrl.last_submitted,
+ &bio_ctrl.submit_bitmap);
mapping_set_error(mapping, ret);
}
btrfs_folio_end_lock(fs_info, folio, cur, cur_len);
--
2.47.0
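A userspace sketch of the sector walk cleanup_ordered_extents() in the patch above performs (invented numbers and a plain bitmask instead of the kernel bitmap helpers): start from the first sector at or after last_submitted and skip sectors whose submit bit has been cleared.

#include <stdio.h>

#define SECTORS_PER_PAGE 16     /* e.g. 64K page with 4K sectors */

int main(void)
{
    unsigned long submit_bitmap = 0xFFF0;   /* bits 4..15 still owned by us */
    unsigned int last_submitted = 6;        /* sectors 4 and 5 already submitted */

    for (unsigned int bit = last_submitted; bit < SECTORS_PER_PAGE; bit++) {
        if (!(submit_bitmap & (1UL << bit)))
            continue;       /* compressed/inline or failed delalloc range */
        printf("finish ordered extent for sector %u\n", bit);
    }
    return 0;
}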
If:
1) the user requested USO, but
2) there is not enough payload for GSO to kick in, and
3) the egress device doesn't offer checksum offload, then
we want to compute the L4 checksum in software early on.
When we are not taking the GSO path even though it has been requested,
the software checksum fallback in skb_segment() doesn't get a chance to
compute the full checksum if the egress device can't do it. As a result we
end up sending UDP datagrams with only a partial checksum filled in, which
the peer will discard.
Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: Ivan Babrou <ivan(a)cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub(a)cloudflare.com>
Acked-by: Willem de Bruijn <willemdebruijn.kernel(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
Changes in v2:
- Fix typo in patch description
- Link to v1: https://lore.kernel.org/r/20241010-uso-swcsum-fixup-v1-1-a63fbd0a414c@cloud…
---
net/ipv4/udp.c | 4 +++-
net/ipv6/udp.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8accbf4cb295..2849b273b131 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -951,8 +951,10 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
cork->gso_size);
+
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
- goto csum_partial;
}
if (is_udplite) /* UDP-Lite */
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 52dfbb2ff1a8..0cef8ae5d1ea 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1266,8 +1266,10 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
cork->gso_size);
+
+ /* Don't checksum the payload, skb will get segmented */
+ goto csum_partial;
}
- goto csum_partial;
}
if (is_udplite)
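A self-contained sketch of the decision the two hunks above restore (ordinary C with an invented helper need_sw_csum(); not the kernel logic verbatim): only leave the checksum partial when segmentation will actually happen, otherwise fall back to a full software checksum if the device offers no offload.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

static bool need_sw_csum(size_t datalen, size_t gso_size, bool hw_csum)
{
    bool uso_requested = gso_size > 0;
    bool will_segment  = uso_requested && datalen > gso_size;

    if (will_segment)
        return false;   /* segmentation fills in the checksum later */
    return !hw_csum;    /* single datagram, no offload: do it now */
}

int main(void)
{
    printf("%d\n", need_sw_csum(1000, 1400, false));    /* 1: the fixed case */
    printf("%d\n", need_sw_csum(3000, 1400, false));    /* 0: GSO path */
    return 0;
}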
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
[ Upstream commit 014eccc9da7bfc76a3107fceea37dd60f1d63630 ]
The HVS can change AXI request mode based on how full the COB
FIFOs are.
Until now the vc4 driver has been relying on the firmware to
have set these to sensible values.
With HVS channel 2 now being used for live video, change the
panic mode to be explicitly set by the driver, and to the same
value for all channels.
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240621152055.4180873-7-dave…
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/vc4/vc4_hvs.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c
index f8f2fc3d15f73..64a02e29b7cb1 100644
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -688,6 +688,17 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data)
dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC1);
dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC2);
+ /* Set AXI panic mode.
+ * VC4 panics when < 2 lines in FIFO.
+ * VC5 panics when less than 1 line in the FIFO.
+ */
+ dispctrl &= ~(SCALER_DISPCTRL_PANIC0_MASK |
+ SCALER_DISPCTRL_PANIC1_MASK |
+ SCALER_DISPCTRL_PANIC2_MASK);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC0);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC1);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC2);
+
HVS_WRITE(SCALER_DISPCTRL, dispctrl);
ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
--
2.43.0
From: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
[ Upstream commit 014eccc9da7bfc76a3107fceea37dd60f1d63630 ]
The HVS can change AXI request mode based on how full the COB
FIFOs are.
Until now the vc4 driver has been relying on the firmware to
have set these to sensible values.
With HVS channel 2 now being used for live video, change the
panic mode to be explicitly set by the driver, and to the same
value for all channels.
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240621152055.4180873-7-dave…
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/vc4/vc4_hvs.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/gpu/drm/vc4/vc4_hvs.c b/drivers/gpu/drm/vc4/vc4_hvs.c
index 3856ac289d380..69b2936a5f4ad 100644
--- a/drivers/gpu/drm/vc4/vc4_hvs.c
+++ b/drivers/gpu/drm/vc4/vc4_hvs.c
@@ -729,6 +729,17 @@ static int vc4_hvs_bind(struct device *dev, struct device *master, void *data)
dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC1);
dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC2);
+ /* Set AXI panic mode.
+ * VC4 panics when < 2 lines in FIFO.
+ * VC5 panics when less than 1 line in the FIFO.
+ */
+ dispctrl &= ~(SCALER_DISPCTRL_PANIC0_MASK |
+ SCALER_DISPCTRL_PANIC1_MASK |
+ SCALER_DISPCTRL_PANIC2_MASK);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC0);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC1);
+ dispctrl |= VC4_SET_FIELD(2, SCALER_DISPCTRL_PANIC2);
+
HVS_WRITE(SCALER_DISPCTRL, dispctrl);
ret = devm_request_irq(dev, platform_get_irq(pdev, 0),
--
2.43.0
From: Dom Cobley <popcornmix(a)gmail.com>
[ Upstream commit b4e5646178e86665f5caef2894578600f597098a ]
We regularly get dmesg error reports of:
[ 18.184066] hdmi-audio-codec hdmi-audio-codec.3.auto: ASoC: error at snd_soc_dai_startup on i2s-hifi: -19
[ 18.184098] MAI: soc_pcm_open() failed (-19)
These are generated for any disconnected hdmi interface when pulseaudio
attempts to open the associated ALSA device (numerous times). Each open
generates a kernel error message, generating general log spam.
The error messages all come from _soc_pcm_ret in sound/soc/soc-pcm.c#L39,
which suggests that returning ENOTSUPP, rather than ENODEV, will be quiet.
And indeed it is.
Signed-off-by: Dom Cobley <popcornmix(a)gmail.com>
Reviewed-by: Maxime Ripard <mripard(a)kernel.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20240621152055.4180873-5-dave…
Signed-off-by: Dave Stevenson <dave.stevenson(a)raspberrypi.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpu/drm/vc4/vc4_hdmi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/vc4/vc4_hdmi.c b/drivers/gpu/drm/vc4/vc4_hdmi.c
index c6e986f71a26f..7cabfa1b883d2 100644
--- a/drivers/gpu/drm/vc4/vc4_hdmi.c
+++ b/drivers/gpu/drm/vc4/vc4_hdmi.c
@@ -2392,7 +2392,7 @@ static int vc4_hdmi_audio_startup(struct device *dev, void *data)
}
if (!vc4_hdmi_audio_can_stream(vc4_hdmi)) {
- ret = -ENODEV;
+ ret = -ENOTSUPP;
goto out_dev_exit;
}
--
2.43.0
From: Stefan Wahren <wahrenst(a)gmx.net>
[ Upstream commit fa8ecda9876ac1e7b29257aa82af1fd0695496e2 ]
The target value of scldiv is just a byte, but its calculation in
fsl_lpspi_set_bitrate() can go negative. So use an adequate type to store
the result and avoid overflows. This then needs range-check
adjustments, but it should make the code less opaque.
Signed-off-by: Stefan Wahren <wahrenst(a)gmx.net>
Reviewed-by: Frank Li <Frank.Li(a)nxp.com>
Link: https://patch.msgid.link/20240930093056.93418-2-wahrenst@gmx.net
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/spi/spi-fsl-lpspi.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c
index 13313f07839b6..84881983d5d9f 100644
--- a/drivers/spi/spi-fsl-lpspi.c
+++ b/drivers/spi/spi-fsl-lpspi.c
@@ -315,9 +315,10 @@ static void fsl_lpspi_set_watermark(struct fsl_lpspi_data *fsl_lpspi)
static int fsl_lpspi_set_bitrate(struct fsl_lpspi_data *fsl_lpspi)
{
struct lpspi_config config = fsl_lpspi->config;
- unsigned int perclk_rate, scldiv, div;
+ unsigned int perclk_rate, div;
u8 prescale_max;
u8 prescale;
+ int scldiv;
perclk_rate = clk_get_rate(fsl_lpspi->clk_per);
prescale_max = fsl_lpspi->devtype_data->prescale_max;
@@ -338,13 +339,13 @@ static int fsl_lpspi_set_bitrate(struct fsl_lpspi_data *fsl_lpspi)
for (prescale = 0; prescale <= prescale_max; prescale++) {
scldiv = div / (1 << prescale) - 2;
- if (scldiv < 256) {
+ if (scldiv >= 0 && scldiv < 256) {
fsl_lpspi->config.prescale = prescale;
break;
}
}
- if (scldiv >= 256)
+ if (scldiv < 0 || scldiv >= 256)
return -EINVAL;
writel(scldiv | (scldiv << 8) | ((scldiv >> 1) << 16),
--
2.43.0
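A standalone sketch of why the intermediate result is stored in a signed variable (plain C with an arbitrary divider value, not the driver code): for small dividers the expression div / (1 << prescale) - 2 goes negative, and a signed scldiv lets the range check reject it cleanly.

#include <stdio.h>

int main(void)
{
    unsigned int div = 1;   /* e.g. requested speed close to the per-clock rate */
    unsigned int prescale, prescale_max = 7;
    int scldiv = -1;

    for (prescale = 0; prescale <= prescale_max; prescale++) {
        scldiv = (int)(div / (1u << prescale)) - 2;
        if (scldiv >= 0 && scldiv < 256) {
            printf("prescale %u -> scldiv %d\n", prescale, scldiv);
            break;
        }
    }
    if (scldiv < 0 || scldiv >= 256)
        printf("no usable divider, would return -EINVAL\n");
    return 0;
}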
From: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
[ Upstream commit ec8b6f55b98146c41dcf15e8189eb43291e35e89 ]
If we remove a GPIO chip that is also an interrupt controller with users
not having freed some interrupts, we'll end up leaking resources as
indicated by the following warning:
remove_proc_entry: removing non-empty directory 'irq/30', leaking at least 'gpio'
As there's no way of notifying interrupt users about the irqchip going
away, and the interrupt subsystem is not plugged into the driver model
(so not all cases can be handled by devlinks), we need to make sure to
free all interrupts before completing the removal of the provider.
Reviewed-by: Herve Codina <herve.codina(a)bootlin.com>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
Link: https://lore.kernel.org/r/20240919135104.3583-1-brgl@bgdev.pl
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski(a)linaro.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
drivers/gpio/gpiolib.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index 337971080dfde..206757d155ef9 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -14,6 +14,7 @@
#include <linux/idr.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
+#include <linux/irqdesc.h>
#include <linux/kernel.h>
#include <linux/list.h>
#include <linux/lockdep.h>
@@ -710,6 +711,45 @@ bool gpiochip_line_is_valid(const struct gpio_chip *gc,
}
EXPORT_SYMBOL_GPL(gpiochip_line_is_valid);
+static void gpiod_free_irqs(struct gpio_desc *desc)
+{
+ int irq = gpiod_to_irq(desc);
+ struct irq_desc *irqd = irq_to_desc(irq);
+ void *cookie;
+
+ for (;;) {
+ /*
+ * Make sure the action doesn't go away while we're
+ * dereferencing it. Retrieve and store the cookie value.
+ * If the irq is freed after we release the lock, that's
+ * alright - the underlying maple tree lookup will return NULL
+ * and nothing will happen in free_irq().
+ */
+ scoped_guard(mutex, &irqd->request_mutex) {
+ if (!irq_desc_has_action(irqd))
+ return;
+
+ cookie = irqd->action->dev_id;
+ }
+
+ free_irq(irq, cookie);
+ }
+}
+
+/*
+ * The chip is going away but there may be users who had requested interrupts
+ * on its GPIO lines who have no idea about its removal and have no way of
+ * being notified about it. We need to free any interrupts still in use here or
+ * we'll leak memory and resources (like procfs files).
+ */
+static void gpiochip_free_remaining_irqs(struct gpio_chip *gc)
+{
+ struct gpio_desc *desc;
+
+ for_each_gpio_desc_with_flag(gc, desc, FLAG_USED_AS_IRQ)
+ gpiod_free_irqs(desc);
+}
+
static void gpiodev_release(struct device *dev)
{
struct gpio_device *gdev = to_gpio_device(dev);
@@ -1122,6 +1162,7 @@ void gpiochip_remove(struct gpio_chip *gc)
/* FIXME: should the legacy sysfs handling be moved to gpio_device? */
gpiochip_sysfs_unregister(gdev);
gpiochip_free_hogs(gc);
+ gpiochip_free_remaining_irqs(gc);
scoped_guard(mutex, &gpio_devices_lock)
list_del_rcu(&gdev->list);
--
2.43.0
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 3ca91daddc9f5..e4156acc004d3 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1332,8 +1332,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1342,24 +1342,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 6285674412f25..d338eeb30145b 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1640,8 +1640,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1650,24 +1650,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
From: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
Currently, suspending while sensors are on results in timestamping
continuing without a gap at resume. That can work with the monotonic
clock but not with other clocks. Fix it by resetting the timestamping.
Fixes: ec74ae9fd37c ("iio: imu: inv_icm42600: add accurate timestamping")
Cc: stable(a)vger.kernel.org
Signed-off-by: Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
---
drivers/iio/imu/inv_icm42600/inv_icm42600_core.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c b/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
index 93b5d7a3339ccff16b21bf6c40ed7b2311317cf4..03139e2e4eddbf37e154de2eb486549bc3bdb284 100644
--- a/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
+++ b/drivers/iio/imu/inv_icm42600/inv_icm42600_core.c
@@ -814,6 +814,8 @@ static int inv_icm42600_suspend(struct device *dev)
static int inv_icm42600_resume(struct device *dev)
{
struct inv_icm42600_state *st = dev_get_drvdata(dev);
+ struct inv_icm42600_sensor_state *gyro_st = iio_priv(st->indio_gyro);
+ struct inv_icm42600_sensor_state *accel_st = iio_priv(st->indio_accel);
int ret;
mutex_lock(&st->lock);
@@ -834,9 +836,12 @@ static int inv_icm42600_resume(struct device *dev)
goto out_unlock;
/* restore FIFO data streaming */
- if (st->fifo.on)
+ if (st->fifo.on) {
+ inv_sensors_timestamp_reset(&gyro_st->ts);
+ inv_sensors_timestamp_reset(&accel_st->ts);
ret = regmap_write(st->map, INV_ICM42600_REG_FIFO_CONFIG,
INV_ICM42600_FIFO_CONFIG_STREAM);
+ }
out_unlock:
mutex_unlock(&st->lock);
---
base-commit: 9dd2270ca0b38ee16094817f4a53e7ba78e31567
change-id: 20241113-inv_icm42600-fix-timestamps-after-suspend-b7e421eaa40c
Best regards,
--
Jean-Baptiste Maneyrol <jean-baptiste.maneyrol(a)tdk.com>
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index b37a6bde8a915..a0f846b2ac1ea 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1633,8 +1633,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1643,24 +1643,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
From: Oleg Nesterov <oleg(a)redhat.com>
[ Upstream commit c7b4133c48445dde789ed30b19ccb0448c7593f7 ]
1. Clear utask->xol_vaddr unconditionally, even if this addr is not valid;
xol_free_insn_slot() should never return with utask->xol_vaddr != NULL.
2. Add a comment to explain why we need to validate slot_addr.
3. Simplify the validation above. We can simply check offset < PAGE_SIZE;
unsigned underflow is fine, and the check still works if slot_addr < area->vaddr.
4. Kill the unnecessary "slot_nr >= UINSNS_PER_PAGE" check; slot_nr must
be valid if offset < PAGE_SIZE.
The next patches will clean up this function even more.
Signed-off-by: Oleg Nesterov <oleg(a)redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Link: https://lore.kernel.org/r/20240929144235.GA9471@redhat.com
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
kernel/events/uprobes.c | 21 +++++++++------------
1 file changed, 9 insertions(+), 12 deletions(-)
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 9ee25351cecac..e3c616085137e 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -1632,8 +1632,8 @@ static unsigned long xol_get_insn_slot(struct uprobe *uprobe)
static void xol_free_insn_slot(struct task_struct *tsk)
{
struct xol_area *area;
- unsigned long vma_end;
unsigned long slot_addr;
+ unsigned long offset;
if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask)
return;
@@ -1642,24 +1642,21 @@ static void xol_free_insn_slot(struct task_struct *tsk)
if (unlikely(!slot_addr))
return;
+ tsk->utask->xol_vaddr = 0;
area = tsk->mm->uprobes_state.xol_area;
- vma_end = area->vaddr + PAGE_SIZE;
- if (area->vaddr <= slot_addr && slot_addr < vma_end) {
- unsigned long offset;
- int slot_nr;
-
- offset = slot_addr - area->vaddr;
- slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
- if (slot_nr >= UINSNS_PER_PAGE)
- return;
+ offset = slot_addr - area->vaddr;
+ /*
+ * slot_addr must fit into [area->vaddr, area->vaddr + PAGE_SIZE).
+ * This check can only fail if the "[uprobes]" vma was mremap'ed.
+ */
+ if (offset < PAGE_SIZE) {
+ int slot_nr = offset / UPROBE_XOL_SLOT_BYTES;
clear_bit(slot_nr, area->bitmap);
atomic_dec(&area->slot_count);
smp_mb__after_atomic(); /* pairs with prepare_to_wait() */
if (waitqueue_active(&area->wq))
wake_up(&area->wq);
-
- tsk->utask->xol_vaddr = 0;
}
}
--
2.43.0
From: Lukas Wunner <lukas(a)wunner.de>
[ Upstream commit 3b0565c703503f832d6cd7ba805aafa3b330cb9d ]
When extracting a signature component r or s from an ASN.1-encoded
integer, ecdsa_get_signature_rs() subtracts the expected length
"bufsize" from the ASN.1 length "vlen" (both of unsigned type size_t)
and stores the result in "diff" (of signed type ssize_t).
This results in a signed integer overflow if vlen > SSIZE_MAX + bufsize.
The kernel is compiled with -fno-strict-overflow, which implies -fwrapv,
meaning signed integer overflow is not undefined behavior. And the
function does check for overflow:
if (-diff >= bufsize)
return -EINVAL;
So the code is fine in principle but not very obvious. In the future it
might trigger a false-positive with CONFIG_UBSAN_SIGNED_WRAP=y.
Avoid by comparing the two unsigned variables directly and erroring out
if "vlen" is too large.
Signed-off-by: Lukas Wunner <lukas(a)wunner.de>
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron(a)huawei.com>
Signed-off-by: Herbert Xu <herbert(a)gondor.apana.org.au>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
crypto/ecdsa.c | 19 +++++++------------
1 file changed, 7 insertions(+), 12 deletions(-)
diff --git a/crypto/ecdsa.c b/crypto/ecdsa.c
index d5a10959ec281..80ef16ae6a40b 100644
--- a/crypto/ecdsa.c
+++ b/crypto/ecdsa.c
@@ -36,29 +36,24 @@ static int ecdsa_get_signature_rs(u64 *dest, size_t hdrlen, unsigned char tag,
const void *value, size_t vlen, unsigned int ndigits)
{
size_t bufsize = ndigits * sizeof(u64);
- ssize_t diff = vlen - bufsize;
const char *d = value;
- if (!value || !vlen)
+ if (!value || !vlen || vlen > bufsize + 1)
return -EINVAL;
- /* diff = 0: 'value' has exacly the right size
- * diff > 0: 'value' has too many bytes; one leading zero is allowed that
- * makes the value a positive integer; error on more
- * diff < 0: 'value' is missing leading zeros
+ /*
+ * vlen may be 1 byte larger than bufsize due to a leading zero byte
+ * (necessary if the most significant bit of the integer is set).
*/
- if (diff > 0) {
+ if (vlen > bufsize) {
/* skip over leading zeros that make 'value' a positive int */
if (*d == 0) {
vlen -= 1;
- diff--;
d++;
- }
- if (diff)
+ } else {
return -EINVAL;
+ }
}
- if (-diff >= bufsize)
- return -EINVAL;
ecc_digits_from_bytes(d, vlen, dest, ndigits);
--
2.43.0
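A self-contained sketch of the length rule the rewritten check above enforces (plain C with an invented helper check_rs_len(); not the crypto code): exactly bufsize bytes are accepted, or bufsize + 1 bytes when the extra byte is a single leading zero.

#include <stddef.h>
#include <stdio.h>

static int check_rs_len(const unsigned char *d, size_t vlen, size_t bufsize)
{
    if (!d || !vlen || vlen > bufsize + 1)
        return -1;
    if (vlen > bufsize) {       /* one leading zero byte is allowed */
        if (d[0] != 0)
            return -1;
        d++;
        vlen--;
    }
    return (int)vlen;           /* number of significant bytes */
}

int main(void)
{
    unsigned char ok[33]  = { 0x00, 0x80 };     /* leading zero, then 32 bytes */
    unsigned char bad[34] = { 0x01 };           /* two bytes too many */

    printf("%d\n", check_rs_len(ok, 33, 32));   /* 32 */
    printf("%d\n", check_rs_len(bad, 34, 32));  /* -1 */
    return 0;
}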
From: Thomas Richter <tmricht(a)linux.ibm.com>
[ Upstream commit a0bd7dacbd51c632b8e2c0500b479af564afadf3 ]
CPU hotplug remove handling triggers the following function
call sequence:
CPUHP_AP_PERF_S390_SF_ONLINE --> s390_pmu_sf_offline_cpu()
...
CPUHP_AP_PERF_ONLINE --> perf_event_exit_cpu()
The s390 CPUMF sampling CPU hotplug handler invokes:
s390_pmu_sf_offline_cpu()
+--> cpusf_pmu_setup()
+--> setup_pmc_cpu()
+--> deallocate_buffers()
This function de-allocates all sampling data buffers (SDBs) allocated
for that CPU at event initialization. It also clears the
PMU_F_RESERVED bit. The CPU is gone and can not be sampled.
With the event still being active on the removed CPU, the CPU event
hotplug support in kernel performance subsystem triggers the
following function calls on the removed CPU:
perf_event_exit_cpu()
+--> perf_event_exit_cpu_context()
+--> __perf_event_exit_context()
+--> __perf_remove_from_context()
+--> event_sched_out()
+--> cpumsf_pmu_del()
+--> cpumsf_pmu_stop()
+--> hw_perf_event_update()
to stop and remove the event. During removal of the event, the
sampling device driver tries to read out the remaining samples from
the sample data buffers (SDBs). But they have already been freed
(and may have been re-assigned). This may lead to a use after free
situation in which case the samples are most likely invalid. In the
best case the memory has not been reassigned and still contains
valid data.
Remedy this situation by checking whether the CPU is still in the
reserved state (bit PMU_F_RESERVED set). In that case the SDBs have not
been released and contain valid data. This is always the case when
the event is removed (and no CPU hotplug off has occurred).
If the PMU_F_RESERVED bit is not set, the SDB buffers are gone.
Signed-off-by: Thomas Richter <tmricht(a)linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/s390/kernel/perf_cpum_sf.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/arch/s390/kernel/perf_cpum_sf.c b/arch/s390/kernel/perf_cpum_sf.c
index 4f251cd624d7e..6047ccb6f8e26 100644
--- a/arch/s390/kernel/perf_cpum_sf.c
+++ b/arch/s390/kernel/perf_cpum_sf.c
@@ -1862,7 +1862,9 @@ static void cpumsf_pmu_stop(struct perf_event *event, int flags)
event->hw.state |= PERF_HES_STOPPED;
if ((flags & PERF_EF_UPDATE) && !(event->hw.state & PERF_HES_UPTODATE)) {
- hw_perf_event_update(event, 1);
+ /* CPU hotplug off removes SDBs. No samples to extract. */
+ if (cpuhw->flags & PMU_F_RESERVED)
+ hw_perf_event_update(event, 1);
event->hw.state |= PERF_HES_UPTODATE;
}
perf_pmu_enable(event->pmu);
--
2.43.0
From: Christian Brauner <brauner(a)kernel.org>
[ Upstream commit 6474353a5e3d0b2cf610153cea0c61f576a36d0a ]
Epoll relies on a racy fastpath check during __fput() in
eventpoll_release() to avoid the hit of pointlessly acquiring a
semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE().
Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com
Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reported-by: syzbot+3b6b32dc50537a49bb4a(a)syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/eventpoll.c | 6 ++++--
include/linux/eventpoll.h | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index b60edddf17870..7413b4a6ba282 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -696,7 +696,8 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
to_free = NULL;
head = file->f_ep;
if (head->first == &epi->fllink && !epi->fllink.next) {
- file->f_ep = NULL;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, NULL);
if (!is_file_epoll(file)) {
struct epitems_head *v;
v = container_of(head, struct epitems_head, epitems);
@@ -1460,7 +1461,8 @@ static int attach_epitem(struct file *file, struct epitem *epi)
spin_unlock(&file->f_lock);
goto allocate;
}
- file->f_ep = head;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, head);
to_free = NULL;
}
hlist_add_head_rcu(&epi->fllink, file->f_ep);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd6..0c0d00fcd131f 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -42,7 +42,7 @@ static inline void eventpoll_release(struct file *file)
* because the file in on the way to be removed and nobody ( but
* eventpoll ) has still a reference to this file.
*/
- if (likely(!file->f_ep))
+ if (likely(!READ_ONCE(file->f_ep)))
return;
/*
--
2.43.0
From: Christian Brauner <brauner(a)kernel.org>
[ Upstream commit 6474353a5e3d0b2cf610153cea0c61f576a36d0a ]
Epoll relies on a racy fastpath check during __fput() in
eventpoll_release() to avoid the hit of pointlessly acquiring a
semaphore. Annotate that race by using WRITE_ONCE() and READ_ONCE().
Link: https://lore.kernel.org/r/66edfb3c.050a0220.3195df.001a.GAE@google.com
Link: https://lore.kernel.org/r/20240925-fungieren-anbauen-79b334b00542@brauner
Reviewed-by: Jan Kara <jack(a)suse.cz>
Reported-by: syzbot+3b6b32dc50537a49bb4a(a)syzkaller.appspotmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
fs/eventpoll.c | 6 ++++--
include/linux/eventpoll.h | 2 +-
2 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 7221072f39fad..f296ffb57d052 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -703,7 +703,8 @@ static int ep_remove(struct eventpoll *ep, struct epitem *epi)
to_free = NULL;
head = file->f_ep;
if (head->first == &epi->fllink && !epi->fllink.next) {
- file->f_ep = NULL;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, NULL);
if (!is_file_epoll(file)) {
struct epitems_head *v;
v = container_of(head, struct epitems_head, epitems);
@@ -1467,7 +1468,8 @@ static int attach_epitem(struct file *file, struct epitem *epi)
spin_unlock(&file->f_lock);
goto allocate;
}
- file->f_ep = head;
+ /* See eventpoll_release() for details. */
+ WRITE_ONCE(file->f_ep, head);
to_free = NULL;
}
hlist_add_head_rcu(&epi->fllink, file->f_ep);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index 3337745d81bd6..0c0d00fcd131f 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -42,7 +42,7 @@ static inline void eventpoll_release(struct file *file)
* because the file in on the way to be removed and nobody ( but
* eventpoll ) has still a reference to this file.
*/
- if (likely(!file->f_ep))
+ if (likely(!READ_ONCE(file->f_ep)))
return;
/*
--
2.43.0
Hi!
The following upstream commits have not been queued for stable due to
missing `Fixes:` tags, but are strongly recommended for correct PTP
operation on the i.MX 6 SoC family, please pick them. Our first priority
would be 6.6, the last LTS, but obviously all stable versions would benefit.
* 4374a1fe580a ("net: fec: Move `fec_ptp_read()` to the top of the file")
* 713ebaed68d8 ("net: fec: Remove duplicated code")
* bf8ca67e2167 [tree: net-next] ("net: fec: refactor PPS channel
configuration")
Bence
A missing or empty dma-ranges in a DT node implies a 1:1 mapping for dma
translations. In this specific case, the current behaviour is to zero out
the entire specifier so that the translation can be carried out as an
offset from zero. This also zeroes out address specifiers that carry flags
(e.g. PCI ranges).
Once the flags portion has been zeroed, the translation chain is broken:
the mapping functions check the following address specifier against the
now-mismatching flags, so the 1:1 mapping always fails, defeating its
entire purpose of always succeeding.
Set to zero only the address portion while passing the flags through.
Fixes: dbbdee94734b ("of/address: Merge all of the bus translation code")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrea della Porta <andrea.porta(a)suse.com>
Tested-by: Herve Codina <herve.codina(a)bootlin.com>
---
drivers/of/address.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/of/address.c b/drivers/of/address.c
index 286f0c161e33..b3479586bd4d 100644
--- a/drivers/of/address.c
+++ b/drivers/of/address.c
@@ -455,7 +455,8 @@ static int of_translate_one(struct device_node *parent, struct of_bus *bus,
}
if (ranges == NULL || rlen == 0) {
offset = of_read_number(addr, na);
- memset(addr, 0, pna * 4);
+ /* set address to zero, pass flags through */
+ memset(addr + pbus->flag_cells, 0, (pna - pbus->flag_cells) * 4);
pr_debug("empty ranges; 1:1 translation\n");
goto finish;
}
--
2.35.3
This series switches from the device_for_each_child_node() macro to its
scoped variant, which in general makes the code more robust if new early
exits are added to the loops, because there is no need for explicit
calls to fwnode_handle_put(). Depending on the complexity of the loop
and its error handling, the code gets simplified and it gets easier to
follow.
The non-scoped variant of the macro is error-prone, and it has been the
source of multiple bugs where the child node refcount was not
decremented accordingly in error paths within the loops. The first patch
of this series is a good example, which fixes that kind of bug that is
regularly found in node iterators.
The uses of device_for_each_child_node() with no early exits have been
left untouched because their simplicity justifies the non-scoped
variant.
Note that the child node is now declared in the macro, and therefore the
explicit declaration is no longer required.
The general functionality should not be affected by this modification.
If functional changes are found, please report them back as errors.
Signed-off-by: Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
---
Javier Carrasco (18):
leds: flash: mt6360: fix device_for_each_child_node() refcounting in error paths
leds: flash: mt6370: switch to device_for_each_child_node_scoped()
leds: flash: leds-qcom-flash: switch to device_for_each_child_node_scoped()
leds: aw200xx: switch to device_for_each_child_node_scoped()
leds: cr0014114: switch to device_for_each_child_node_scoped()
leds: el15203000: switch to device_for_each_child_node_scoped()
leds: gpio: switch to device_for_each_child_node_scoped()
leds: lm3532: switch to device_for_each_child_node_scoped()
leds: lm3697: switch to device_for_each_child_node_scoped()
leds: lp50xx: switch to device_for_each_child_node_scoped()
leds: max77650: switch to device_for_each_child_node_scoped()
leds: ns2: switch to device_for_each_child_node_scoped()
leds: pca963x: switch to device_for_each_child_node_scoped()
leds: pwm: switch to device_for_each_child_node_scoped()
leds: sun50i-a100: switch to device_for_each_child_node_scoped()
leds: tca6507: switch to device_for_each_child_node_scoped()
leds: rgb: ktd202x: switch to device_for_each_child_node_scoped()
leds: rgb: mt6370: switch to device_for_each_child_node_scoped()
drivers/leds/flash/leds-mt6360.c | 3 +--
drivers/leds/flash/leds-mt6370-flash.c | 11 +++-------
drivers/leds/flash/leds-qcom-flash.c | 4 +---
drivers/leds/leds-aw200xx.c | 7 ++-----
drivers/leds/leds-cr0014114.c | 4 +---
drivers/leds/leds-el15203000.c | 14 ++++---------
drivers/leds/leds-gpio.c | 9 +++------
drivers/leds/leds-lm3532.c | 18 +++++++----------
drivers/leds/leds-lm3697.c | 18 ++++++-----------
drivers/leds/leds-lp50xx.c | 21 +++++++------------
drivers/leds/leds-max77650.c | 18 ++++++-----------
drivers/leds/leds-ns2.c | 7 ++-----
drivers/leds/leds-pca963x.c | 11 +++-------
drivers/leds/leds-pwm.c | 15 ++++----------
drivers/leds/leds-sun50i-a100.c | 27 +++++++++----------------
drivers/leds/leds-tca6507.c | 7 ++-----
drivers/leds/rgb/leds-ktd202x.c | 8 +++-----
drivers/leds/rgb/leds-mt6370-rgb.c | 37 ++++++++++------------------------
18 files changed, 75 insertions(+), 164 deletions(-)
---
base-commit: 92fc9636d1471b7f68bfee70c776f7f77e747b97
change-id: 20240926-leds_device_for_each_child_node_scoped-5a95255413fa
Best regards,
--
Javier Carrasco <javier.carrasco.cruz(a)gmail.com>
Piergiorgio reported a bug in bugzilla as below:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330
RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs]
Call Trace:
__issue_discard_cmd+0x1ca/0x350 [f2fs]
issue_discard_thread+0x191/0x480 [f2fs]
kthread+0xcf/0x100
ret_from_fork+0x31/0x50
ret_from_fork_asm+0x1a/0x30
w/ below testcase, it can reproduce this bug quickly:
- pvcreate /dev/vdb
- vgcreate myvg1 /dev/vdb
- lvcreate -L 1024m -n mylv1 myvg1
- mount /dev/myvg1/mylv1 /mnt/f2fs
- dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20
- sync
- rm /mnt/f2fs/file
- sync
- lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1
- umount /mnt/f2fs
The root cause is that creating a snapshot on the mounted lvm device
updates its discard_max_bytes to zero; __submit_discard_cmd() then
passes parameter @nr_sects w/ a zero value to __blkdev_issue_discard(),
which returns a NULL bio pointer, resulting in the panic.
This patch makes the following changes to fix the issue:
1. Drop all remaining discards in f2fs_unfreeze() if a snapshot
of the lvm device has been created.
2. Check discard_max_bytes before submitting a discard in
__submit_discard_cmd().
Cc: stable(a)vger.kernel.org
Fixes: 35ec7d574884 ("f2fs: split discard command in prior to block layer")
Reported-by: Piergiorgio Sartor <piergiorgio.sartor(a)nexgo.de>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484
Signed-off-by: Chao Yu <chao(a)kernel.org>
---
fs/f2fs/segment.c | 16 +++++++++-------
fs/f2fs/super.c | 12 ++++++++++++
2 files changed, 21 insertions(+), 7 deletions(-)
diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index 7bdfe08ce9ea..af3fb3f6d9b5 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -1290,16 +1290,18 @@ static int __submit_discard_cmd(struct f2fs_sb_info *sbi,
wait_list, issued);
return 0;
}
-
- /*
- * Issue discard for conventional zones only if the device
- * supports discard.
- */
- if (!bdev_max_discard_sectors(bdev))
- return -EOPNOTSUPP;
}
#endif
+ /*
+ * stop issuing discard for any of below cases:
+ * 1. device is conventional zone, but it doesn't support discard.
+ * 2. device is a regular device; after snapshot it doesn't support
+ * discard.
+ */
+ if (!bdev_max_discard_sectors(bdev))
+ return -EOPNOTSUPP;
+
trace_f2fs_issue_discard(bdev, dc->di.start, dc->di.len);
lstart = dc->di.lstart;
diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
index c0670cd61956..fc7d463dee15 100644
--- a/fs/f2fs/super.c
+++ b/fs/f2fs/super.c
@@ -1760,6 +1760,18 @@ static int f2fs_freeze(struct super_block *sb)
static int f2fs_unfreeze(struct super_block *sb)
{
+ struct f2fs_sb_info *sbi = F2FS_SB(sb);
+
+ /*
+ * It will update discard_max_bytes of mounted lvm device to zero
+ * after creating snapshot on this lvm device, let's drop all
+ * remained discards.
+ * We don't need to disable real-time discard because discard_max_bytes
+ * will recover after removal of snapshot.
+ */
+ if (test_opt(sbi, DISCARD) && !f2fs_hw_support_discard(sbi))
+ f2fs_issue_discard_timeout(sbi);
+
clear_sbi_flag(F2FS_SB(sb), SBI_IS_FREEZING);
return 0;
}
--
2.40.1
the Hide wrote...
> Who should I contact regarding the following error
>
>
> E: Malformed entry 5 in list file
> /etc/apt/sources.list.d/additional-repositories.list (Component)
> E: The list of sources could not be read.
> E: _cache->open() failed, please report.
Assuming you're using Debian and not some derivative: Some Debian users
mailing list, like <https://lists.debian.org/debian-user/>
From the above error message I assume there's a format error in
/etc/apt/sources.list.d/additional-repositories.list - so it would be wise
to include the content of that file in a message to that list.
If it's actually a bug in apt, the Debian bug tracker would be the place
to go. This list here, however, is about development of the Linux kernel's
stable releases, so it's not quite the right place.
Christoph
Currently in some testcases we can trigger:
[drm] *ERROR* GT0: SCHED_DONE: Unexpected engine state 0x02b1, guc_id=8, runnable_state=0
[drm] *ERROR* GT0: G2H action 0x1002 failed (-EPROTO) len 3 msg 02 10 00 90 08 00 00 00 00 00 00 00
Looking at a snippet of corresponding ftrace for this GuC id we can see:
498.852891: xe_sched_msg_add: dev=0000:03:00.0, gt=0 guc_id=8, opcode=3
498.854083: xe_sched_msg_recv: dev=0000:03:00.0, gt=0 guc_id=8, opcode=3
498.855389: xe_exec_queue_kill: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x3, flags=0x0
498.855436: xe_exec_queue_lr_cleanup: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x83, flags=0x0
498.856767: xe_exec_queue_close: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x83, flags=0x0
498.862889: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0xa9, flags=0x0
498.863032: xe_exec_queue_scheduling_disable: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b9, flags=0x0
498.875596: xe_exec_queue_scheduling_done: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b9, flags=0x0
498.875604: xe_exec_queue_deregister: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b1, flags=0x0
499.074483: xe_exec_queue_deregister_done: dev=0000:03:00.0, 5:0x1, gt=0, width=1, guc_id=8, guc_state=0x2b1, flags=0x0
This looks to be two scheduling_disable operations racing with each other,
one from the suspend (opcode=3) and one from the lr cleanup. While those
two operations are serialized, the G2H portion is not: after marking the
queue as pending_disabled and firing off the first request, we proceed to
do the same again, but the first disable response only arrives after that,
clearing pending_disabled. When the second response comes back and is
processed, pending_disabled is no longer set, hence the warning.
To fix this, wait for pending_disabled when doing the lr cleanup and
calling disable_scheduling_deregister(). Also do the same for all other
disable_scheduling callers.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/3515
Signed-off-by: Matthew Auld <matthew.auld(a)intel.com>
Cc: Matthew Brost <matthew.brost(a)intel.com>
Cc: <stable(a)vger.kernel.org> # v6.8+
---
drivers/gpu/drm/xe/xe_guc_submit.c | 16 ++++++++++------
1 file changed, 10 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index f9ecee5364d8..f3c22b101916 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -767,12 +767,15 @@ static void disable_scheduling_deregister(struct xe_guc *guc,
set_min_preemption_timeout(guc, q);
smp_rmb();
- ret = wait_event_timeout(guc->ct.wq, !exec_queue_pending_enable(q) ||
- xe_guc_read_stopped(guc), HZ * 5);
+ ret = wait_event_timeout(guc->ct.wq,
+ (!exec_queue_pending_enable(q) &&
+ !exec_queue_pending_disable(q)) ||
+ xe_guc_read_stopped(guc),
+ HZ * 5);
if (!ret) {
struct xe_gpu_scheduler *sched = &q->guc->sched;
- xe_gt_warn(q->gt, "Pending enable failed to respond\n");
+ xe_gt_warn(q->gt, "Pending enable/disable failed to respond\n");
xe_sched_submission_start(sched);
xe_gt_reset_async(q->gt);
xe_sched_tdr_queue_imm(sched);
@@ -1099,7 +1102,8 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
* modifying state
*/
ret = wait_event_timeout(guc->ct.wq,
- !exec_queue_pending_enable(q) ||
+ (!exec_queue_pending_enable(q) &&
+ !exec_queue_pending_disable(q)) ||
xe_guc_read_stopped(guc), HZ * 5);
if (!ret || xe_guc_read_stopped(guc))
goto trigger_reset;
@@ -1329,8 +1333,8 @@ static void __guc_exec_queue_process_msg_suspend(struct xe_sched_msg *msg)
if (guc_exec_queue_allowed_to_change_state(q) && !exec_queue_suspended(q) &&
exec_queue_enabled(q)) {
- wait_event(guc->ct.wq, q->guc->resume_time != RESUME_PENDING ||
- xe_guc_read_stopped(guc));
+ wait_event(guc->ct.wq, (q->guc->resume_time != RESUME_PENDING ||
+ xe_guc_read_stopped(guc)) && !exec_queue_pending_disable(q));
if (!xe_guc_read_stopped(guc)) {
s64 since_resume_ms =
--
2.47.0
The patch titled "scsi: core: Fix scsi_mode_sense() buffer length handling"
addresses CVE-2021-47182, fixing the following issues in `scsi_mode_sense()`
buffer length handling:
1. Incorrect handling of the allocation length field in the MODE SENSE(10)
command, causing truncation of buffer lengths larger than 255 bytes.
2. Memory corruption when handling small buffer lengths due to lack of proper
validation.
Original patch submission:
https://lore.kernel.org/all/20210820070255.682775-2-damien.lemoal@wdc.com/
CVE announcement in linux-cve-announce:
https://lore.kernel.org/linux-cve-announce/2024041032-CVE-2021-47182-377e@g…
Fixed versions:
- Fixed in 5.15.5 with commit e15de347faf4
- Fixed in 5.16 with commit 17b49bcbf835
Official CVE entry:
https://cve.org/CVERecord/?id=CVE-2021-47182
[PATCH 5.10.y] scsi: core: Fix scsi_mode_sense() buffer length handling
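To make the length-handling issue concrete: MODE SENSE(6) carries an 8-bit
allocation length in CDB byte 4, while MODE SENSE(10) carries a 16-bit
allocation length in bytes 7-8, so any length above 255 bytes has to go
through the 10-byte variant. A minimal, hypothetical sketch follows (the
helper name is made up; this is not the upstream diff):

#include <linux/string.h>
#include <linux/types.h>
#include <asm/unaligned.h>      /* put_unaligned_be16() */
#include <scsi/scsi_proto.h>    /* MODE_SENSE, MODE_SENSE_10 opcodes */

/* Hypothetical helper, for illustration only -- not the upstream fix. */
static void fill_mode_sense_cdb(unsigned char *cmd, bool use_10_byte,
				unsigned int len)
{
	memset(cmd, 0, use_10_byte ? 10 : 6);
	if (use_10_byte) {
		cmd[0] = MODE_SENSE_10;
		put_unaligned_be16(len, &cmd[7]);  /* 16-bit allocation length */
	} else {
		cmd[0] = MODE_SENSE;
		cmd[4] = len;           /* 8-bit field: len > 255 is truncated */
	}
}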
From: John Watts <contact(a)jookia.org>
[ Upstream commit f8da001ae7af0abd9f6250c02c01a1121074ca60 ]
The audio graph card doesn't mark its subnodes such as multi {}, dpcm {}
and c2c {} as not requiring any suppliers. This causes a hang as Linux
waits for these phantom suppliers to show up on boot.
Make it clear these nodes have no suppliers.
Example error message:
[ 15.208558] platform 2034000.i2s: deferred probe pending: platform: wait for supplier /sound/multi
[ 15.208584] platform sound: deferred probe pending: asoc-audio-graph-card2: parse error
Signed-off-by: John Watts <contact(a)jookia.org>
Acked-by: Kuninori Morimoto <kuninori.morimoto.gx(a)renesas.com>
Link: https://patch.msgid.link/20241108-graph_dt_fix-v1-1-173e2f9603d6@jookia.org
Signed-off-by: Mark Brown <broonie(a)kernel.org>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
sound/soc/generic/audio-graph-card2.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/sound/soc/generic/audio-graph-card2.c b/sound/soc/generic/audio-graph-card2.c
index b1c675c6b6db6..686e0dea2bc75 100644
--- a/sound/soc/generic/audio-graph-card2.c
+++ b/sound/soc/generic/audio-graph-card2.c
@@ -261,16 +261,19 @@ static enum graph_type __graph_get_type(struct device_node *lnk)
if (of_node_name_eq(np, GRAPH_NODENAME_MULTI)) {
ret = GRAPH_MULTI;
+ fw_devlink_purge_absent_suppliers(&np->fwnode);
goto out_put;
}
if (of_node_name_eq(np, GRAPH_NODENAME_DPCM)) {
ret = GRAPH_DPCM;
+ fw_devlink_purge_absent_suppliers(&np->fwnode);
goto out_put;
}
if (of_node_name_eq(np, GRAPH_NODENAME_C2C)) {
ret = GRAPH_C2C;
+ fw_devlink_purge_absent_suppliers(&np->fwnode);
goto out_put;
}
--
2.43.0
I'm announcing the release of the 6.12.1 kernel.
All users of the 6.12 kernel series must upgrade.
The updated 6.12.y git tree can be found at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git linux-6.12.y
and can be browsed at the normal kernel.org git web browser:
https://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=summary
thanks,
greg k-h
------------
Makefile | 2 +-
drivers/media/usb/uvc/uvc_driver.c | 2 +-
mm/mmap.c | 13 ++++++++++++-
net/vmw_vsock/hyperv_transport.c | 1 +
4 files changed, 15 insertions(+), 3 deletions(-)
Benoit Sevens (1):
media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format
Greg Kroah-Hartman (1):
Linux 6.12.1
Hyunwoo Kim (1):
hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointer
Liam R. Howlett (1):
mm/mmap: fix __mmap_region() error handling in rare merge failure case
Paul, Neeraj, and Stable Team:
I've run into a case with rcu_tasks_postscan where the warning introduced as
part of 46aa886c4 ("rcu-tasks: Fix IPI failure handling in
trc_wait_for_one_reader") is getting triggered when trc_wait_for_one_reader
sends an IPI to a CPU that is offline. This is occurring on a platform that has
hotplug slots available but not populated. I don't believe the bug is caused by
this change, but I do think that Paul's commit that confines the postscan
operation to just the active CPUs would help prevent this from happening.
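For anyone skimming, here is a minimal sketch of that general idea
(confining the cross-CPU work to online CPUs); the function names are
hypothetical and this is not the attached commit:

#include <linux/cpu.h>
#include <linux/cpumask.h>
#include <linux/smp.h>

/* Hypothetical IPI handler, only to make the sketch self-contained. */
static void example_handler(void *info)
{
}

/*
 * Sketch of "confine the scan to active CPUs": iterate online CPUs only,
 * with the hotplug lock held so the online mask cannot change under us.
 */
static void example_postscan(void)
{
	int cpu;

	cpus_read_lock();
	for_each_online_cpu(cpu)
		smp_call_function_single(cpu, example_handler, NULL, 1);
	cpus_read_unlock();
}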
Would the RCU maintainers be amenable to having this patch backported to the
5.10 and 5.15 branches as well? I've attached cherry-picks of the relevant
commits to minimize the additional work needed.
Thanks,
-K
Paul E. McKenney (1):
rcu-tasks: Idle tasks on offline CPUs are in quiescent states
kernel/rcu/tasks.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.25.1
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
[ Upstream commit 7d3b793faaab1305994ce568b59d61927235f57b ]
When enabling access to the special register set, Receiver time-out and
RHR interrupts can happen. In this case, the IRQ handler will try to read
from the FIFO thru the RHR register at address 0x00, but address 0x00 is
mapped to DLL register, resulting in erroneous FIFO reading.
Call graph example:
sc16is7xx_startup(): entry
sc16is7xx_ms_proc(): entry
sc16is7xx_set_termios(): entry
sc16is7xx_set_baud(): DLH/DLL = $009C --> access special register set
sc16is7xx_port_irq() entry --> IIR is 0x0C
sc16is7xx_handle_rx() entry
sc16is7xx_fifo_read(): --> unable to access FIFO (RHR) because it is
mapped to DLL (LCR=LCR_CONF_MODE_A)
sc16is7xx_set_baud(): exit --> Restore access to general register set
Fix the problem by claiming the efr_lock mutex when accessing the Special
register set.
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Link: https://lore.kernel.org/r/20240723125302.1305372-3-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Resolve minor conflicts ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/tty/serial/sc16is7xx.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index a723df9b37dd..c07baf5d5a9c 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -545,6 +545,9 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
SC16IS7XX_MCR_CLKSEL_BIT,
prescaler == 1 ? 0 : SC16IS7XX_MCR_CLKSEL_BIT);
+
+ mutex_lock(&one->efr_lock);
+
/* Open the LCR divisors for configuration */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
SC16IS7XX_LCR_CONF_MODE_A);
@@ -558,6 +561,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
/* Put LCR back to the normal mode */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);
+ mutex_unlock(&one->efr_lock);
+
return DIV_ROUND_CLOSEST((clk / prescaler) / 16, div);
}
--
2.43.0
From: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
[ Upstream commit 7d3b793faaab1305994ce568b59d61927235f57b ]
When enabling access to the special register set, Receiver time-out and
RHR interrupts can happen. In this case, the IRQ handler will try to read
from the FIFO thru the RHR register at address 0x00, but address 0x00 is
mapped to DLL register, resulting in erroneous FIFO reading.
Call graph example:
sc16is7xx_startup(): entry
sc16is7xx_ms_proc(): entry
sc16is7xx_set_termios(): entry
sc16is7xx_set_baud(): DLH/DLL = $009C --> access special register set
sc16is7xx_port_irq() entry --> IIR is 0x0C
sc16is7xx_handle_rx() entry
sc16is7xx_fifo_read(): --> unable to access FIFO (RHR) because it is
mapped to DLL (LCR=LCR_CONF_MODE_A)
sc16is7xx_set_baud(): exit --> Restore access to general register set
Fix the problem by claiming the efr_lock mutex when accessing the Special
register set.
Fixes: dfeae619d781 ("serial: sc16is7xx")
Cc: stable(a)vger.kernel.org
Signed-off-by: Hugo Villeneuve <hvilleneuve(a)dimonoff.com>
Link: https://lore.kernel.org/r/20240723125302.1305372-3-hugo@hugovil.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
[ Resolve minor conflicts ]
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/tty/serial/sc16is7xx.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/tty/serial/sc16is7xx.c b/drivers/tty/serial/sc16is7xx.c
index 7a9924d9b294..f290fbe21d63 100644
--- a/drivers/tty/serial/sc16is7xx.c
+++ b/drivers/tty/serial/sc16is7xx.c
@@ -545,6 +545,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
SC16IS7XX_MCR_CLKSEL_BIT,
prescaler == 1 ? 0 : SC16IS7XX_MCR_CLKSEL_BIT);
+ mutex_lock(&one->efr_lock);
+
/* Open the LCR divisors for configuration */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG,
SC16IS7XX_LCR_CONF_MODE_A);
@@ -558,6 +560,8 @@ static int sc16is7xx_set_baud(struct uart_port *port, int baud)
/* Put LCR back to the normal mode */
sc16is7xx_port_write(port, SC16IS7XX_LCR_REG, lcr);
+ mutex_unlock(&one->efr_lock);
+
return DIV_ROUND_CLOSEST((clk / prescaler) / 16, div);
}
--
2.43.0
From: Christian Gmeiner <cgmeiner(a)igalia.com>
If the active performance monitor (v3d->active_perfmon) is being
destroyed, stop it first. Currently, the active perfmon is not
stopped during destruction, leaving the v3d->active_perfmon pointer
stale. This can lead to undefined behavior and instability.
This patch ensures that the active perfmon is stopped before being
destroyed, aligning with the behavior introduced in commit
7d1fd3638ee3 ("drm/v3d: Stop the active perfmon before being destroyed").
Cc: stable(a)vger.kernel.org # v5.15+
Fixes: 26a4dc29b74a ("drm/v3d: Expose performance counters to userspace")
Signed-off-by: Christian Gmeiner <cgmeiner(a)igalia.com>
---
drivers/gpu/drm/v3d/v3d_perfmon.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/drivers/gpu/drm/v3d/v3d_perfmon.c b/drivers/gpu/drm/v3d/v3d_perfmon.c
index 00cd081d7873..909288d43f2f 100644
--- a/drivers/gpu/drm/v3d/v3d_perfmon.c
+++ b/drivers/gpu/drm/v3d/v3d_perfmon.c
@@ -383,6 +383,7 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data,
{
struct v3d_file_priv *v3d_priv = file_priv->driver_priv;
struct drm_v3d_perfmon_destroy *req = data;
+ struct v3d_dev *v3d = v3d_priv->v3d;
struct v3d_perfmon *perfmon;
mutex_lock(&v3d_priv->perfmon.lock);
@@ -392,6 +393,10 @@ int v3d_perfmon_destroy_ioctl(struct drm_device *dev, void *data,
if (!perfmon)
return -EINVAL;
+ /* If the active perfmon is being destroyed, stop it first */
+ if (perfmon == v3d->active_perfmon)
+ v3d_perfmon_stop(v3d, perfmon, false);
+
v3d_perfmon_put(perfmon);
return 0;
--
2.47.0
If iov_iter_zero() succeeds after a failed copy_from_kernel_nofault(),
we need to reset the ret value to zero; otherwise it will be returned
as the final return value of read_kcore_iter.
This fixes objdump -d dump over /proc/kcore for me.
Cc: stable(a)vger.kernel.org
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses")
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
---
fs/proc/kcore.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 51446c59388f..c82c408e573e 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -600,6 +600,7 @@ static ssize_t read_kcore_iter(struct kiocb *iocb, struct iov_iter *iter)
ret = -EFAULT;
goto out;
}
+ ret = 0;
/*
* We know the bounce buffer is safe to copy from, so
* use _copy_to_iter() directly.
--
2.47.0
The first two are perhaps not strictly fixes, as they essentially add
support for nested ATRs. The third one is a fix, thus stable is in cc.
Tomi
Signed-off-by: Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
---
Cosmin Tanislav (1):
i2c: atr: Allow unmapped addresses from nested ATRs
Tomi Valkeinen (2):
i2c: atr: Fix lockdep for nested ATRs
i2c: atr: Fix client detach
drivers/i2c/i2c-atr.c | 25 ++++++++++++++-----------
1 file changed, 14 insertions(+), 11 deletions(-)
---
base-commit: adc218676eef25575469234709c2d87185ca223a
change-id: 20241121-i2c-atr-fixes-6d52b9f5f0c1
Best regards,
--
Tomi Valkeinen <tomi.valkeinen(a)ideasonboard.com>
This is the start of the stable review cycle for the 6.12.1 release.
There are 3 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.
Responses should be made by Fri, 22 Nov 2024 12:40:53 +0000.
Anything received after that time might be too late.
The whole patch series can be found in one patch at:
https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.12.1-rc1…
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.12.y
and the diffstat can be found below.
thanks,
greg k-h
-------------
Pseudo-Shortlog of commits:
Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Linux 6.12.1-rc1
Liam R. Howlett <Liam.Howlett(a)Oracle.com>
mm/mmap: fix __mmap_region() error handling in rare merge failure case
Benoit Sevens <bsevens(a)google.com>
media: uvcvideo: Skip parsing frames of type UVC_VS_UNDEFINED in uvc_parse_format
Hyunwoo Kim <v4bel(a)theori.io>
hv_sock: Initializing vsk->trans to NULL to prevent a dangling pointer
-------------
Diffstat:
Makefile | 4 ++--
drivers/media/usb/uvc/uvc_driver.c | 2 +-
mm/mmap.c | 13 ++++++++++++-
net/vmw_vsock/hyperv_transport.c | 1 +
4 files changed, 16 insertions(+), 4 deletions(-)
The patch titled
Subject: nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
nilfs2-fix-potential-out-of-bounds-memory-access-in-nilfs_find_entry.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Subject: nilfs2: fix potential out-of-bounds memory access in nilfs_find_entry()
Date: Wed, 20 Nov 2024 02:23:37 +0900
Syzbot reported that when searching for records in a directory where the
inode's i_size is corrupted and has a large value, memory access outside
the folio/page range may occur, or a use-after-free bug may be detected if
KASAN is enabled.
This is because nilfs_last_byte(), which is called by nilfs_find_entry()
and others to calculate the number of valid bytes of directory data in a
page from i_size and the page index, loses the upper 32 bits of the 64-bit
size information due to an inappropriate type of local variable to which
the i_size value is assigned.
This caused a large byte offset value due to underflow in the end address
calculation in the calling nilfs_find_entry(), resulting in memory access
that exceeds the folio/page size.
Fix this issue by changing the type of the local variable causing the bit
loss from "unsigned int" to "u64". The return value of nilfs_last_byte()
is also of type "unsigned int", but it is truncated so as not to exceed
PAGE_SIZE and no bit loss occurs, so no change is required.
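As a quick standalone illustration of the bit loss (a throwaway userspace
snippet, not part of the patch):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t i_size = 0x100000800ULL;   /* corrupted size, > 4 GiB */
	unsigned int truncated = i_size;    /* old code: upper 32 bits lost */
	uint64_t full = i_size;             /* fixed code keeps all 64 bits */

	/* Prints 0x800 vs 0x100000800: the truncated value makes the later
	 * "last_byte -= page_nr << PAGE_SHIFT" underflow for large page_nr. */
	printf("truncated=%#x full=%#llx\n", truncated,
	       (unsigned long long)full);
	return 0;
}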
Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com
Fixes: 2ba466d74ed7 ("nilfs2: directory entry operations")
Signed-off-by: Ryusuke Konishi <konishi.ryusuke(a)gmail.com>
Reported-by: syzbot+96d5d14c47d97015c624(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624
Tested-by: syzbot+96d5d14c47d97015c624(a)syzkaller.appspotmail.com
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/nilfs2/dir.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/nilfs2/dir.c~nilfs2-fix-potential-out-of-bounds-memory-access-in-nilfs_find_entry
+++ a/fs/nilfs2/dir.c
@@ -70,7 +70,7 @@ static inline unsigned int nilfs_chunk_s
*/
static unsigned int nilfs_last_byte(struct inode *inode, unsigned long page_nr)
{
- unsigned int last_byte = inode->i_size;
+ u64 last_byte = inode->i_size;
last_byte -= page_nr << PAGE_SHIFT;
if (last_byte > PAGE_SIZE)
_
Patches currently in -mm which might be from konishi.ryusuke(a)gmail.com are
nilfs2-fix-potential-out-of-bounds-memory-access-in-nilfs_find_entry.patch
The patch titled
Subject: kasan: make report_lock a raw spinlock
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-make-report_lock-a-raw-spinlock.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jared Kangas <jkangas(a)redhat.com>
Subject: kasan: make report_lock a raw spinlock
Date: Tue, 19 Nov 2024 13:02:34 -0800
If PREEMPT_RT is enabled, report_lock is a sleeping spinlock and must not
be locked when IRQs are disabled. However, KASAN reports may be triggered
in such contexts. For example:
char *s = kzalloc(1, GFP_KERNEL);
kfree(s);
local_irq_disable();
char c = *s; /* KASAN report here leads to spin_lock() */
local_irq_enable();
Make report_lock a raw spinlock to prevent rescheduling when
PREEMPT_RT is enabled.
Link: https://lkml.kernel.org/r/20241119210234.1602529-1-jkangas@redhat.com
Signed-off-by: Jared Kangas <jkangas(a)redhat.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Konovalov <andreyknvl(a)gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Vincenzo Frascino <vincenzo.frascino(a)arm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/report.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
--- a/mm/kasan/report.c~kasan-make-report_lock-a-raw-spinlock
+++ a/mm/kasan/report.c
@@ -200,7 +200,7 @@ static inline void fail_non_kasan_kunit_
#endif /* CONFIG_KUNIT */
-static DEFINE_SPINLOCK(report_lock);
+static DEFINE_RAW_SPINLOCK(report_lock);
static void start_report(unsigned long *flags, bool sync)
{
@@ -211,7 +211,7 @@ static void start_report(unsigned long *
lockdep_off();
/* Make sure we don't end up in loop. */
report_suppress_start();
- spin_lock_irqsave(&report_lock, *flags);
+ raw_spin_lock_irqsave(&report_lock, *flags);
pr_err("==================================================================\n");
}
@@ -221,7 +221,7 @@ static void end_report(unsigned long *fl
trace_error_report_end(ERROR_DETECTOR_KASAN,
(unsigned long)addr);
pr_err("==================================================================\n");
- spin_unlock_irqrestore(&report_lock, *flags);
+ raw_spin_unlock_irqrestore(&report_lock, *flags);
if (!test_bit(KASAN_BIT_MULTI_SHOT, &kasan_flags))
check_panic_on_warn("KASAN");
switch (kasan_arg_fault) {
_
Patches currently in -mm which might be from jkangas(a)redhat.com are
kasan-make-report_lock-a-raw-spinlock.patch
The patch titled
Subject: mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-mempolicy-fix-migrate_to_node-assuming-there-is-at-least-one-vma-in-a-mm.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: David Hildenbrand <david(a)redhat.com>
Subject: mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
Date: Wed, 20 Nov 2024 21:11:51 +0100
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up with find_vma() returning NULL and then dereferencing
NULL. Properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Link: https://lkml.kernel.org/r/20241120201151.9518-1-david@redhat.com
Fixes: 39743889aaf7 ("[PATCH] Swap Migration V5: sys_migrate_pages interface")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Reported-by: syzbot+3511625422f7aa637f0d(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/lkml/673d2696.050a0220.3c9d61.012f.GAE@google.com/T/
Reviewed-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Reviewed-by: Christoph Lameter <cl(a)linux.com>
Cc: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/mempolicy.c | 4 ++++
1 file changed, 4 insertions(+)
--- a/mm/mempolicy.c~mm-mempolicy-fix-migrate_to_node-assuming-there-is-at-least-one-vma-in-a-mm
+++ a/mm/mempolicy.c
@@ -1080,6 +1080,10 @@ static long migrate_to_node(struct mm_st
mmap_read_lock(mm);
vma = find_vma(mm, 0);
+ if (!vma) {
+ mmap_read_unlock(mm);
+ return 0;
+ }
/*
* This does not migrate the range, but isolates all pages that
_
Patches currently in -mm which might be from david(a)redhat.com are
mm-mempolicy-fix-migrate_to_node-assuming-there-is-at-least-one-vma-in-a-mm.patch
The patch titled
Subject: mm/gup: handle NULL pages in unpin_user_pages()
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
mm-gup-handle-null-pages-in-unpin_user_pages.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: John Hubbard <jhubbard(a)nvidia.com>
Subject: mm/gup: handle NULL pages in unpin_user_pages()
Date: Wed, 20 Nov 2024 19:49:33 -0800
The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array. That's not the case, as I discovered when I ran on a new
configuration on my test machine.
Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.
Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:
tools/testing/selftests/mm/gup_longterm
...I get the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
<TASK>
? __die_body+0x66/0xb0
? page_fault_oops+0x30c/0x3b0
? do_user_addr_fault+0x6c3/0x720
? irqentry_enter+0x34/0x60
? exc_page_fault+0x68/0x100
? asm_exc_page_fault+0x22/0x30
? sanity_check_pinned_pages+0x3a/0x2d0
unpin_user_pages+0x24/0xe0
check_and_migrate_movable_pages_or_folios+0x455/0x4b0
__gup_longterm_locked+0x3bf/0x820
? mmap_read_lock_killable+0x12/0x50
? __pfx_mmap_read_lock_killable+0x10/0x10
pin_user_pages+0x66/0xa0
gup_test_ioctl+0x358/0xb20
__se_sys_ioctl+0x6b/0xc0
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Link: https://lkml.kernel.org/r/20241121034933.77502-1-jhubbard@nvidia.com
Fixes: 94efde1d1539 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Junxiao Chang <junxiao.chang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/gup.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
--- a/mm/gup.c~mm-gup-handle-null-pages-in-unpin_user_pages
+++ a/mm/gup.c
@@ -52,7 +52,12 @@ static inline void sanity_check_pinned_p
*/
for (; npages; npages--, pages++) {
struct page *page = *pages;
- struct folio *folio = page_folio(page);
+ struct folio *folio;
+
+ if (!page)
+ continue;
+
+ folio = page_folio(page);
if (is_zero_page(page) ||
!folio_test_anon(folio))
@@ -409,6 +414,10 @@ void unpin_user_pages(struct page **page
sanity_check_pinned_pages(pages, npages);
for (i = 0; i < npages; i += nr) {
+ if (!pages[i]) {
+ nr = 1;
+ continue;
+ }
folio = gup_folio_next(pages, npages, i, &nr);
gup_put_folio(folio, nr, FOLL_PIN);
}
_
Patches currently in -mm which might be from jhubbard(a)nvidia.com are
mm-gup-handle-null-pages-in-unpin_user_pages.patch
The patch titled
Subject: fs/proc/kcore.c: clear ret value in read_kcore_iter after successful iov_iter_zero
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
fs-proc-kcorec-clear-ret-value-in-read_kcore_iter-after-successful-iov_iter_zero.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Jiri Olsa <jolsa(a)kernel.org>
Subject: fs/proc/kcore.c: clear ret value in read_kcore_iter after successful iov_iter_zero
Date: Fri, 22 Nov 2024 00:11:18 +0100
If iov_iter_zero() succeeds after a failed copy_from_kernel_nofault(), we
need to reset the ret value to zero; otherwise it will be returned as the
final return value of read_kcore_iter.
This fixes objdump -d dump over /proc/kcore for me.
Link: https://lkml.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org
Fixes: 3d5854d75e31 ("fs/proc/kcore.c: allow translation of physical memory addresses")
Signed-off-by: Jiri Olsa <jolsa(a)kernel.org>
Cc: Alexander Gordeev <agordeev(a)linux.ibm.com>
Cc: Christian Brauner <brauner(a)kernel.org>
Cc: <hca(a)linux.ibm.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
fs/proc/kcore.c | 1 +
1 file changed, 1 insertion(+)
--- a/fs/proc/kcore.c~fs-proc-kcorec-clear-ret-value-in-read_kcore_iter-after-successful-iov_iter_zero
+++ a/fs/proc/kcore.c
@@ -600,6 +600,7 @@ static ssize_t read_kcore_iter(struct ki
ret = -EFAULT;
goto out;
}
+ ret = 0;
/*
* We know the bounce buffer is safe to copy from, so
* use _copy_to_iter() directly.
_
Patches currently in -mm which might be from jolsa(a)kernel.org are
fs-proc-kcorec-clear-ret-value-in-read_kcore_iter-after-successful-iov_iter_zero.patch
commit 7f00be96f125 ("of: property: Add device link support for
interrupt-parent, dmas and -gpio(s)") started adding device links for
the interrupt-parent property. commit 4104ca776ba3 ("of: property: Add
fw_devlink support for interrupts") and commit f265f06af194 ("of:
property: Fix fw_devlink handling of interrupts/interrupts-extended")
later added full support for parsing the interrupts and
interrupts-extended properties, which includes looking up the node of
the parent domain. This made the handler for the interrupt-parent
property redundant.
In fact, creating device links based solely on interrupt-parent is
problematic, because it can create spurious cycles. A node may have
this property without itself being an interrupt controller or consumer.
For example, this property is often present in the root node or a /soc
bus node to set the default interrupt parent for child nodes. However,
it is incorrect for the bus to depend on the interrupt controller, as
some of the bus's children may not be interrupt consumers at all or may
have a different interrupt parent.
Resolving these spurious dependency cycles can cause an incorrect probe
order for interrupt controller drivers. This was observed on a RISC-V
system with both an APLIC and IMSIC under /soc, where interrupt-parent
in /soc points to the APLIC, and the APLIC msi-parent points to the
IMSIC. fw_devlink found three dependency cycles and attempted to probe
the APLIC before the IMSIC. After applying this patch, there were no
dependency cycles and the probe order was correct.
Acked-by: Marc Zyngier <maz(a)kernel.org>
Cc: stable(a)vger.kernel.org
Fixes: 4104ca776ba3 ("of: property: Add fw_devlink support for interrupts")
Signed-off-by: Samuel Holland <samuel.holland(a)sifive.com>
---
Changes in v2:
- Fix typo in commit message
- Add Fixes: tag and CC stable
drivers/of/property.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/of/property.c b/drivers/of/property.c
index 11b922fde7af..7bd8390f2fba 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -1213,7 +1213,6 @@ DEFINE_SIMPLE_PROP(iommus, "iommus", "#iommu-cells")
DEFINE_SIMPLE_PROP(mboxes, "mboxes", "#mbox-cells")
DEFINE_SIMPLE_PROP(io_channels, "io-channels", "#io-channel-cells")
DEFINE_SIMPLE_PROP(io_backends, "io-backends", "#io-backend-cells")
-DEFINE_SIMPLE_PROP(interrupt_parent, "interrupt-parent", NULL)
DEFINE_SIMPLE_PROP(dmas, "dmas", "#dma-cells")
DEFINE_SIMPLE_PROP(power_domains, "power-domains", "#power-domain-cells")
DEFINE_SIMPLE_PROP(hwlocks, "hwlocks", "#hwlock-cells")
@@ -1359,7 +1358,6 @@ static const struct supplier_bindings of_supplier_bindings[] = {
{ .parse_prop = parse_mboxes, },
{ .parse_prop = parse_io_channels, },
{ .parse_prop = parse_io_backends, },
- { .parse_prop = parse_interrupt_parent, },
{ .parse_prop = parse_dmas, .optional = true, },
{ .parse_prop = parse_power_domains, },
{ .parse_prop = parse_hwlocks, },
--
2.45.1
A null pointer dereference occurs due to a race between the smmu
driver probe and a client driver probe, when of_dma_configure()
for the client is called after iommu_device_register() in the smmu
driver probe has executed but before driver_bound() for the smmu
driver has been called.
Following is how the race occurs:
T1: Smmu device probe                   T2: Client device probe

really_probe()
arm_smmu_device_probe()
iommu_device_register()
                                        really_probe()
                                        platform_dma_configure()
                                        of_dma_configure()
                                        of_dma_configure_id()
                                        of_iommu_configure()
                                        iommu_probe_device()
                                        iommu_init_device()
                                        arm_smmu_probe_device()
                                        arm_smmu_get_by_fwnode()
                                        driver_find_device_by_fwnode()
                                        driver_find_device()
                                        next_device()
                                        klist_next()
                                        /* null ptr
                                           assigned to smmu */
                                        /* null ptr dereference
                                           while smmu->streamid_mask */
driver_bound()
klist_add_tail()
When this null smmu pointer is dereferenced later in
arm_smmu_probe_device, the device crashes.
Fix this by deferring the probe of the client device
until the smmu device has bound to the arm smmu driver.
Fixes: 021bb8420d44 ("iommu/arm-smmu: Wire up generic configuration support")
Cc: stable(a)vger.kernel.org
Co-developed-by: Prakash Gupta <quic_guptap(a)quicinc.com>
Signed-off-by: Prakash Gupta <quic_guptap(a)quicinc.com>
Signed-off-by: Pratyush Brahma <quic_pbrahma(a)quicinc.com>
---
Changes in v2:
Fix kernel test robot warning
Add stable kernel list in cc
Link to v1: https://lore.kernel.org/all/20241001055633.21062-1-quic_pbrahma@quicinc.com/
drivers/iommu/arm/arm-smmu/arm-smmu.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/iommu/arm/arm-smmu/arm-smmu.c b/drivers/iommu/arm/arm-smmu/arm-smmu.c
index 723273440c21..7c778b7eb8c8 100644
--- a/drivers/iommu/arm/arm-smmu/arm-smmu.c
+++ b/drivers/iommu/arm/arm-smmu/arm-smmu.c
@@ -1437,6 +1437,9 @@ static struct iommu_device *arm_smmu_probe_device(struct device *dev)
goto out_free;
} else {
smmu = arm_smmu_get_by_fwnode(fwspec->iommu_fwnode);
+ if (!smmu)
+ return ERR_PTR(dev_err_probe(dev, -EPROBE_DEFER,
+ "smmu dev has not bound yet\n"));
}
ret = -EINVAL;
--
2.17.1
The patch below does not apply to the 6.6-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.6.y
git checkout FETCH_HEAD
git cherry-pick -x 577c134d311b9b94598d7a0c86be1f431f823003
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111735-paging-quintet-7ce1@gregkh' --subject-prefix 'PATCH 6.6.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 577c134d311b9b94598d7a0c86be1f431f823003 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue, 5 Nov 2024 10:57:46 -0500
Subject: [PATCH] x86/stackprotector: Work around strict Clang TLS symbol
requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.
However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.
The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.
Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5f221ea5688..f43bb974fc66 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2089,8 +2089,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index b8c5741d2fb4..feb8102a9ca7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -491,6 +491,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.10.y
git checkout FETCH_HEAD
git cherry-pick -x 577c134d311b9b94598d7a0c86be1f431f823003
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111739-dimmed-aspirate-171d@gregkh' --subject-prefix 'PATCH 5.10.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 577c134d311b9b94598d7a0c86be1f431f823003 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue, 5 Nov 2024 10:57:46 -0500
Subject: [PATCH] x86/stackprotector: Work around strict Clang TLS symbol
requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.
However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.
The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.
Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5f221ea5688..f43bb974fc66 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2089,8 +2089,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index b8c5741d2fb4..feb8102a9ca7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -491,6 +491,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
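For reference, a minimal sketch (not part of the patch) of the Clang behaviour
described in the commit message above; the symbol 'my_guard' and the function
are hypothetical, and the assumed flags are -m32
-mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=my_guard:
#include <stdio.h>
/* Case 1: a TLS (or entirely undeclared) guard symbol. Clang emits the
 * expected segment-relative reference, e.g. mov %fs:my_guard, %eax.
 */
__thread unsigned long my_guard;
/* Case 2: if instead an ordinary (non-TLS) definition of the same symbol is
 * visible in the compilation unit (the whole of vmlinux with LTO), Clang
 * drops the %fs prefix and loads from a bogus absolute address. The patch
 * avoids this by referencing __ref_stack_chk_guard only from assembly and
 * the linker script, never from C.
 */
/* unsigned long my_guard; */
int protected(void)
{
	char buf[64];	/* large enough for the stack protector to kick in */
	return snprintf(buf, sizeof(buf), "demo") > 0;
}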
The patch below does not apply to the 6.1-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-6.1.y
git checkout FETCH_HEAD
git cherry-pick -x 577c134d311b9b94598d7a0c86be1f431f823003
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111736-handshake-thesaurus-43e6@gregkh' --subject-prefix 'PATCH 6.1.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 577c134d311b9b94598d7a0c86be1f431f823003 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue, 5 Nov 2024 10:57:46 -0500
Subject: [PATCH] x86/stackprotector: Work around strict Clang TLS symbol
requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.
However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.
The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.
Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5f221ea5688..f43bb974fc66 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2089,8 +2089,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index b8c5741d2fb4..feb8102a9ca7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -491,6 +491,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
The patch below does not apply to the 5.15-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
To reproduce the conflict and resubmit, you may use the following commands:
git fetch https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/ linux-5.15.y
git checkout FETCH_HEAD
git cherry-pick -x 577c134d311b9b94598d7a0c86be1f431f823003
# <resolve conflicts, build, test, etc.>
git commit -s
git send-email --to '<stable(a)vger.kernel.org>' --in-reply-to '2024111737-undead-acutely-d4b1@gregkh' --subject-prefix 'PATCH 5.15.y' HEAD^..
Possible dependencies:
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 577c134d311b9b94598d7a0c86be1f431f823003 Mon Sep 17 00:00:00 2001
From: Ard Biesheuvel <ardb(a)kernel.org>
Date: Tue, 5 Nov 2024 10:57:46 -0500
Subject: [PATCH] x86/stackprotector: Work around strict Clang TLS symbol
requirements
GCC and Clang both implement stack protector support based on Thread Local
Storage (TLS) variables, and this is used in the kernel to implement per-task
stack cookies, by copying a task's stack cookie into a per-CPU variable every
time it is scheduled in.
Both now also implement -mstack-protector-guard-symbol=, which permits the TLS
variable to be specified directly. This is useful because it will allow to
move away from using a fixed offset of 40 bytes into the per-CPU area on
x86_64, which requires a lot of special handling in the per-CPU code and the
runtime relocation code.
However, while GCC is rather lax in its implementation of this command line
option, Clang actually requires that the provided symbol name refers to a TLS
variable (i.e., one declared with __thread), although it also permits the
variable to be undeclared entirely, in which case it will use an implicit
declaration of the right type.
The upshot of this is that Clang will emit the correct references to the stack
cookie variable in most cases, e.g.,
10d: 64 a1 00 00 00 00 mov %fs:0x0,%eax
10f: R_386_32 __stack_chk_guard
However, if a non-TLS definition of the symbol in question is visible in the
same compilation unit (which amounts to the whole of vmlinux if LTO is
enabled), it will drop the per-CPU prefix and emit a load from a bogus
address.
Work around this by using a symbol name that never occurs in C code, and emit
it as an alias in the linker script.
Fixes: 3fb0fdb3bbe7 ("x86/stackprotector/32: Make the canary into a regular percpu variable")
Signed-off-by: Ard Biesheuvel <ardb(a)kernel.org>
Signed-off-by: Brian Gerst <brgerst(a)gmail.com>
Signed-off-by: Borislav Petkov (AMD) <bp(a)alien8.de>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Tested-by: Nathan Chancellor <nathan(a)kernel.org>
Cc: stable(a)vger.kernel.org
Link: https://github.com/ClangBuiltLinux/linux/issues/1854
Link: https://lore.kernel.org/r/20241105155801.1779119-2-brgerst@gmail.com
diff --git a/arch/x86/Makefile b/arch/x86/Makefile
index cd75e78a06c1..5b773b34768d 100644
--- a/arch/x86/Makefile
+++ b/arch/x86/Makefile
@@ -142,9 +142,10 @@ ifeq ($(CONFIG_X86_32),y)
ifeq ($(CONFIG_STACKPROTECTOR),y)
ifeq ($(CONFIG_SMP),y)
- KBUILD_CFLAGS += -mstack-protector-guard-reg=fs -mstack-protector-guard-symbol=__stack_chk_guard
+ KBUILD_CFLAGS += -mstack-protector-guard-reg=fs \
+ -mstack-protector-guard-symbol=__ref_stack_chk_guard
else
- KBUILD_CFLAGS += -mstack-protector-guard=global
+ KBUILD_CFLAGS += -mstack-protector-guard=global
endif
endif
else
diff --git a/arch/x86/entry/entry.S b/arch/x86/entry/entry.S
index 324686bca368..b7ea3e8e9ecc 100644
--- a/arch/x86/entry/entry.S
+++ b/arch/x86/entry/entry.S
@@ -51,3 +51,19 @@ EXPORT_SYMBOL_GPL(mds_verw_sel);
.popsection
THUNK warn_thunk_thunk, __warn_thunk
+
+#ifndef CONFIG_X86_64
+/*
+ * Clang's implementation of TLS stack cookies requires the variable in
+ * question to be a TLS variable. If the variable happens to be defined as an
+ * ordinary variable with external linkage in the same compilation unit (which
+ * amounts to the whole of vmlinux with LTO enabled), Clang will drop the
+ * segment register prefix from the references, resulting in broken code. Work
+ * around this by avoiding the symbol used in -mstack-protector-guard-symbol=
+ * entirely in the C code, and use an alias emitted by the linker script
+ * instead.
+ */
+#ifdef CONFIG_STACKPROTECTOR
+EXPORT_SYMBOL(__ref_stack_chk_guard);
+#endif
+#endif
diff --git a/arch/x86/include/asm/asm-prototypes.h b/arch/x86/include/asm/asm-prototypes.h
index 25466c4d2134..3674006e3974 100644
--- a/arch/x86/include/asm/asm-prototypes.h
+++ b/arch/x86/include/asm/asm-prototypes.h
@@ -20,3 +20,6 @@
extern void cmpxchg8b_emu(void);
#endif
+#if defined(__GENKSYMS__) && defined(CONFIG_STACKPROTECTOR)
+extern unsigned long __ref_stack_chk_guard;
+#endif
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index a5f221ea5688..f43bb974fc66 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -2089,8 +2089,10 @@ void syscall_init(void)
#ifdef CONFIG_STACKPROTECTOR
DEFINE_PER_CPU(unsigned long, __stack_chk_guard);
+#ifndef CONFIG_SMP
EXPORT_PER_CPU_SYMBOL(__stack_chk_guard);
#endif
+#endif
#endif /* CONFIG_X86_64 */
diff --git a/arch/x86/kernel/vmlinux.lds.S b/arch/x86/kernel/vmlinux.lds.S
index b8c5741d2fb4..feb8102a9ca7 100644
--- a/arch/x86/kernel/vmlinux.lds.S
+++ b/arch/x86/kernel/vmlinux.lds.S
@@ -491,6 +491,9 @@ SECTIONS
. = ASSERT((_end - LOAD_OFFSET <= KERNEL_IMAGE_SIZE),
"kernel image bigger than KERNEL_IMAGE_SIZE");
+/* needed for Clang - see arch/x86/entry/entry.S */
+PROVIDE(__ref_stack_chk_guard = __stack_chk_guard);
+
#ifdef CONFIG_X86_64
/*
* Per-cpu symbols which need to be offset from __per_cpu_load
vma_adjust_trans_huge() uses find_vma() to look up the next VMA, but it then
dereferences the returned pointer without any verification, even though
find_vma() may return NULL. This can lead to a NULL pointer dereference, so
fix vma_adjust_trans_huge() to verify the return value of find_vma().
Cc: <stable(a)vger.kernel.org>
Fixes: 685405020b9f ("mm/khugepaged: stop using vma linked list")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
---
mm/huge_memory.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 5734d5d5060f..db55b8abae2e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2941,9 +2941,12 @@ void vma_adjust_trans_huge(struct vm_area_struct *vma,
*/
if (adjust_next > 0) {
struct vm_area_struct *next = find_vma(vma->vm_mm, vma->vm_end);
- unsigned long nstart = next->vm_start;
- nstart += adjust_next;
- split_huge_pmd_if_needed(next, nstart);
+
+ if (likely(next)) {
+ unsigned long nstart = next->vm_start;
+ nstart += adjust_next;
+ split_huge_pmd_if_needed(next, nstart);
+ }
}
}
--
The current implementation of can_set_termination() sets the GPIO with
gpiod_set_value(), which must not sleep. This is an issue if the GPIO
controller can sleep (e.g. because the GPIO expander in question is connected
via SPI or I2C). Thus, if the termination resistor is set (e.g. with ip link),
a warning splat is issued in the kernel log.
Fix this by setting the termination resistor with gpiod_set_value_cansleep(),
which, unlike gpiod_set_value(), is allowed to sleep.
Cc: stable(a)vger.kernel.org
Signed-off-by: Nicolai Buchwitz <nb(a)tipi-net.de>
---
drivers/net/can/dev/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/can/dev/dev.c b/drivers/net/can/dev/dev.c
index 6792c14fd7eb..681643ab3780 100644
--- a/drivers/net/can/dev/dev.c
+++ b/drivers/net/can/dev/dev.c
@@ -468,7 +468,7 @@ static int can_set_termination(struct net_device *ndev, u16 term)
else
set = 0;
- gpiod_set_value(priv->termination_gpio, set);
+ gpiod_set_value_cansleep(priv->termination_gpio, set);
return 0;
}
--
2.39.5
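For context, a brief sketch of the GPIO API rule this fix relies on (the
descriptor 'desc' is a hypothetical placeholder):
/*
 * gpiod_set_value() is meant for atomic context and warns when the backing
 * controller has to sleep (e.g. an I2C/SPI GPIO expander).
 * gpiod_set_value_cansleep() may sleep, so it must only be called from
 * process context, which can_set_termination() provides.
 */
gpiod_set_value_cansleep(desc, 1);	/* safe for sleeping controllers */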
I've found that NFS access to an exported tmpfs file system hangs
indefinitely when the client first performs a GETATTR. The hanging
nfsd thread is waiting for the inode lock in shmem_getattr():
task:nfsd state:D stack:0 pid:1775 tgid:1775 ppid:2 flags:0x00004000
Call Trace:
<TASK>
__schedule+0x770/0x7b0
schedule+0x33/0x50
schedule_preempt_disabled+0x19/0x30
rwsem_down_read_slowpath+0x206/0x230
down_read+0x3f/0x60
shmem_getattr+0x84/0xf0
vfs_getattr_nosec+0x9e/0xc0
vfs_getattr+0x49/0x50
fh_getattr+0x43/0x50 [nfsd]
fh_fill_pre_attrs+0x4e/0xd0 [nfsd]
nfsd4_open+0x51f/0x910 [nfsd]
nfsd4_proc_compound+0x492/0x5d0 [nfsd]
nfsd_dispatch+0x117/0x1f0 [nfsd]
svc_process_common+0x3b2/0x5e0 [sunrpc]
? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
svc_process+0xcf/0x130 [sunrpc]
svc_recv+0x64e/0x750 [sunrpc]
? __wake_up_bit+0x4b/0x60
? __pfx_nfsd+0x10/0x10 [nfsd]
nfsd+0xc6/0xf0 [nfsd]
kthread+0xed/0x100
? __pfx_kthread+0x10/0x10
ret_from_fork+0x2e/0x50
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
I bisected the problem to:
d949d1d14fa281ace388b1de978e8f2cd52875cf is the first bad commit
commit d949d1d14fa281ace388b1de978e8f2cd52875cf
Author: Jeongjun Park <aha310510(a)gmail.com>
AuthorDate: Mon Sep 9 21:35:58 2024 +0900
Commit: Andrew Morton <akpm(a)linux-foundation.org>
CommitDate: Mon Oct 28 21:40:39 2024 -0700
mm: shmem: fix data-race in shmem_getattr()
...
Link: https://lkml.kernel.org/r/20240909123558.70229-1-aha310510@gmail.com
Fixes: 44a30220bc0a ("shmem: recalculate file inode when fstat")
Signed-off-by: Jeongjun Park <aha310510(a)gmail.com>
Reported-by: syzbot <syzkaller(a)googlegroup.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Yu Zhao <yuzhao(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
which first appeared in v6.12-rc6, and adds the line that is waiting
on the inode lock when my NFS server hangs.
I haven't yet found the process that is holding the inode lock for
this inode.
Because this commit addresses only a KCSAN splat that has been
present since v4.3, and does not address a reported behavioral
issue, I respectfully request that this commit be reverted
immediately so that it does not appear in v6.12 final.
Troubleshooting and testing should continue until a fix to the KCSAN
issue can be found that does not deadlock NFS exports of tmpfs.
--
Chuck Lever
The supported resolutions were misrepresented in earlier versions of
hardware manuals.
Fixes: 768e9e61b3b9 ("drm: renesas: Add RZ/G2L DU Support")
Cc: stable(a)vger.kernel.org
Signed-off-by: Chris Brandt <chris.brandt(a)renesas.com>
---
drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
index b99217b4e05d..90c6269ccd29 100644
--- a/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
+++ b/drivers/gpu/drm/renesas/rz-du/rzg2l_du_kms.c
@@ -311,11 +311,11 @@ int rzg2l_du_modeset_init(struct rzg2l_du_device *rcdu)
dev->mode_config.helper_private = &rzg2l_du_mode_config_helper;
/*
- * The RZ DU uses the VSP1 for memory access, and is limited
- * to frame sizes of 1920x1080.
+ * The RZ DU was designed to support a frame size of 1920x1200 (landscape)
+ * or 1200x1920 (portrait).
*/
dev->mode_config.max_width = 1920;
- dev->mode_config.max_height = 1080;
+ dev->mode_config.max_height = 1920;
rcdu->num_crtcs = hweight8(rcdu->info->channels_mask);
--
2.34.1
The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the
pages** array. That's not the case, as I discovered when I ran on a new
configuration on my test machine.
Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.
Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:
tools/testing/selftests/mm/gup_longterm
...I get the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
<TASK>
? __die_body+0x66/0xb0
? page_fault_oops+0x30c/0x3b0
? do_user_addr_fault+0x6c3/0x720
? irqentry_enter+0x34/0x60
? exc_page_fault+0x68/0x100
? asm_exc_page_fault+0x22/0x30
? sanity_check_pinned_pages+0x3a/0x2d0
unpin_user_pages+0x24/0xe0
check_and_migrate_movable_pages_or_folios+0x455/0x4b0
__gup_longterm_locked+0x3bf/0x820
? mmap_read_lock_killable+0x12/0x50
? __pfx_mmap_read_lock_killable+0x10/0x10
pin_user_pages+0x66/0xa0
gup_test_ioctl+0x358/0xb20
__se_sys_ioctl+0x6b/0xc0
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 94efde1d1539 ("mm/gup: avoid an unnecessary allocation call for FOLL_LONGTERM cases")
Cc: David Hildenbrand <david(a)redhat.com>
Cc: Oscar Salvador <osalvador(a)suse.de>
Cc: Vivek Kasireddy <vivek.kasireddy(a)intel.com>
Cc: Dave Airlie <airlied(a)redhat.com>
Cc: Gerd Hoffmann <kraxel(a)redhat.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Jason Gunthorpe <jgg(a)nvidia.com>
Cc: Peter Xu <peterx(a)redhat.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Daniel Vetter <daniel.vetter(a)ffwll.ch>
Cc: Dongwon Kim <dongwon.kim(a)intel.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Junxiao Chang <junxiao.chang(a)intel.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: John Hubbard <jhubbard(a)nvidia.com>
---
Hi,
This is based on v6.12.
Changes since v1:
Simplified by limiting the change to unpin_user_pages() and the
associated sanity_check_pinned_pages(), thanks to David Hildenbrand
for pointing out that issue in v1 [1].
[1] https://lore.kernel.org/20241119044923.194853-1-jhubbard@nvidia.com
thanks,
John Hubbard
mm/gup.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/gup.c b/mm/gup.c
index ad0c8922dac3..7053f8114e01 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -52,7 +52,12 @@ static inline void sanity_check_pinned_pages(struct page **pages,
*/
for (; npages; npages--, pages++) {
struct page *page = *pages;
- struct folio *folio = page_folio(page);
+ struct folio *folio;
+
+ if (!page)
+ continue;
+
+ folio = page_folio(page);
if (is_zero_page(page) ||
!folio_test_anon(folio))
@@ -409,6 +414,10 @@ void unpin_user_pages(struct page **pages, unsigned long npages)
sanity_check_pinned_pages(pages, npages);
for (i = 0; i < npages; i += nr) {
+ if (!pages[i]) {
+ nr = 1;
+ continue;
+ }
folio = gup_folio_next(pages, npages, i, &nr);
gup_put_folio(folio, nr, FOLL_PIN);
}
base-commit: adc218676eef25575469234709c2d87185ca223a
--
2.47.0
Setting GPIO direction = high sometimes results in GPIO value = 0.
If a GPIO is pulled high, the following construction results in the
value being 0 when the desired value is 1:
$ echo "high" > /sys/class/gpio/gpio336/direction
$ cat /sys/class/gpio/gpio336/value
0
Before the GPIO direction is changed from an input to an output,
exar_set_value() is called with value = 1, but since the GPIO is an
input when exar_set_value() is called, _regmap_update_bits() reads a 1
due to an external pull-up. regmap_set_bits() sets force_write =
false, so the value (1) is not written. When the direction is then
changed, the GPIO becomes an output with the value of 0 (the hardware
default).
regmap_write_bits() sets force_write = true, so the value is always
written by exar_set_value() and an external pull-up doesn't affect the
outcome of setting direction = high.
The same can happen when a GPIO is pulled low, but the scenario is a
little more complicated.
$ echo high > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
1
$ echo in > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
0
$ echo low > /sys/class/gpio/gpio351/direction
$ cat /sys/class/gpio/gpio351/value
1
Fixes: 36fb7218e878 ("gpio: exar: switch to using regmap")
Signed-off-by: Sai Kumar Cholleti <skmr537(a)gmail.com>
Signed-off-by: Matthew McClain <mmcclain(a)noprivs.com>
Cc: <stable(a)vger.kernel.org>
---
drivers/gpio/gpio-exar.c | 10 ++++++----
1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/drivers/gpio/gpio-exar.c b/drivers/gpio/gpio-exar.c
index 5170fe7599cd..dfc7a9ca3e62 100644
--- a/drivers/gpio/gpio-exar.c
+++ b/drivers/gpio/gpio-exar.c
@@ -99,11 +99,13 @@ static void exar_set_value(struct gpio_chip *chip, unsigned int offset,
struct exar_gpio_chip *exar_gpio = gpiochip_get_data(chip);
unsigned int addr = exar_offset_to_lvl_addr(exar_gpio, offset);
unsigned int bit = exar_offset_to_bit(exar_gpio, offset);
+ unsigned int bit_value = value ? BIT(bit) : 0;
- if (value)
- regmap_set_bits(exar_gpio->regmap, addr, BIT(bit));
- else
- regmap_clear_bits(exar_gpio->regmap, addr, BIT(bit));
+ /*
+ * regmap_write_bits forces value to be written when an external
+ * pull up/down might otherwise indicate value was already set
+ */
+ regmap_write_bits(exar_gpio->regmap, addr, BIT(bit), bit_value);
}
static int exar_direction_output(struct gpio_chip *chip, unsigned int offset,
--
2.34.1
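To illustrate the regmap difference being relied on here (a sketch, not
driver code; 'map', 'addr' and 'bit' are placeholders):
/*
 * regmap_set_bits()/regmap_clear_bits() are read-modify-write helpers and
 * skip the bus write when the register already reads back the requested
 * value - here the external pull-up makes the register read 1, so nothing
 * is written.
 */
regmap_set_bits(map, addr, BIT(bit));
/*
 * regmap_write_bits() sets force_write, so the masked value is written
 * unconditionally and survives the later input-to-output switch.
 */
regmap_write_bits(map, addr, BIT(bit), BIT(bit));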
Hi,
This series has several bug fixes that I encountered when the ufs-qcom driver
was removed and inserted back. But the fixes are applicable to other platform
glue drivers as well.
This series is tested on Qcom RB5 development board based on SM8250 SoC.
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
---
Manivannan Sadhasivam (5):
scsi: ufs: core: Cancel RTC work during ufshcd_remove()
scsi: ufs: qcom: Only free platform MSIs when ESI is enabled
scsi: ufs: pltfrm: Disable runtime PM during removal of glue drivers
scsi: ufs: pltfrm: Drop PM runtime reference count after ufshcd_remove()
scsi: ufs: pltfrm: Dellocate HBA during ufshcd_pltfrm_remove()
drivers/ufs/core/ufshcd.c | 1 +
drivers/ufs/host/cdns-pltfrm.c | 4 +---
drivers/ufs/host/tc-dwc-g210-pltfrm.c | 5 +----
drivers/ufs/host/ufs-exynos.c | 3 +--
drivers/ufs/host/ufs-hisi.c | 4 +---
drivers/ufs/host/ufs-mediatek.c | 5 +----
drivers/ufs/host/ufs-qcom.c | 7 ++++---
drivers/ufs/host/ufs-renesas.c | 4 +---
drivers/ufs/host/ufs-sprd.c | 5 +----
drivers/ufs/host/ufshcd-pltfrm.c | 16 ++++++++++++++++
drivers/ufs/host/ufshcd-pltfrm.h | 1 +
11 files changed, 29 insertions(+), 26 deletions(-)
---
base-commit: 59b723cd2adbac2a34fc8e12c74ae26ae45bf230
change-id: 20241111-ufs_bug_fix-6d17f39afaa4
Best regards,
--
Manivannan Sadhasivam <manivannan.sadhasivam(a)linaro.org>
Hi,
Can we please push commit b1a28f2eb9ea ("NFS:
nfs_async_write_reschedule_io must not recurse into the writeback
code") into stable kernel 5.15.x?
The bug addressed by this patch is being seen to cause lockups on some
production systems running kernels based on Linux 5.15.167.
Thanks!
Trond
--
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust(a)hammerspace.com
From: Jammy Huang <jammy_huang(a)aspeedtech.com>
[ Upstream commit c281355068bc258fd619c5aefd978595bede7bfe ]
When capturing 1600x900, system could crash when system memory usage is
tight.
The way to reproduce this issue:
1. Use 1600x900 to display on host
2. Mount ISO through 'Virtual media' on OpenBMC's web
3. Run script as below on host to do sha continuously
#!/bin/bash
while [ [1] ];
do
find /media -type f -printf '"%h/%f"\n' | xargs sha256sum
done
4. Open KVM on OpenBMC's web
The captured macro blocks are 8x8 in size. Therefore, we should make sure
the height of the source buffer is aligned to 8 to fix this issue.
Signed-off-by: Jammy Huang <jammy_huang(a)aspeedtech.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco(a)xs4all.nl>
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/media/platform/aspeed/aspeed-video.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/media/platform/aspeed/aspeed-video.c b/drivers/media/platform/aspeed/aspeed-video.c
index 20f795ccc11b..c5af28bf0e96 100644
--- a/drivers/media/platform/aspeed/aspeed-video.c
+++ b/drivers/media/platform/aspeed/aspeed-video.c
@@ -1047,7 +1047,7 @@ static void aspeed_video_get_resolution(struct aspeed_video *video)
static void aspeed_video_set_resolution(struct aspeed_video *video)
{
struct v4l2_bt_timings *act = &video->active_timings;
- unsigned int size = act->width * act->height;
+ unsigned int size = act->width * ALIGN(act->height, 8);
/* Set capture/compression frame sizes */
aspeed_video_calc_compressed_size(video, size);
@@ -1064,7 +1064,7 @@ static void aspeed_video_set_resolution(struct aspeed_video *video)
u32 width = ALIGN(act->width, 64);
aspeed_video_write(video, VE_CAP_WINDOW, width << 16 | act->height);
- size = width * act->height;
+ size = width * ALIGN(act->height, 8);
} else {
aspeed_video_write(video, VE_CAP_WINDOW,
act->width << 16 | act->height);
--
2.43.0
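As a worked example for the 1600x900 mode mentioned above (numbers only):
/*
 *   ALIGN(900, 8) = 904              -> height rounded up to whole 8-line macro blocks
 *   1600 * 900    = 1440000 pixels   -> old size, short by one partial macro-block row
 *   1600 * 904    = 1446400 pixels   -> aligned size used after this patch
 */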
From: tuqiang <tu.qiang35(a)zte.com.cn>
The MR/QP restrack also needs to be released when deleting it, otherwise it
causes a memory leak as the task struct won't be released.
This problem has been fixed upstream by commit dac153f2802d
("RDMA/restrack: Release MR restrack when delete"), but it still exists in the
linux-5.10.y branch.
Fixes: 13ef5539def7 ("RDMA/restrack: Count references to the verbs objects")
Signed-off-by: tuqiang <tu.qiang35(a)zte.com.cn>
Signed-off-by: Jiang Kun <jiang.kun2(a)zte.com.cn>
Cc: stable(a)vger.kernel.org
Cc: xu xin <xu.xin16(a)zte.com.cn>
Cc: Doug Ledford <dledford(a)redhat.com>
Cc: Jason Gunthorpe <jgg(a)ziepe.ca>
Cc: Leon Romanovsky <leon(a)kernel.org>
---
drivers/infiniband/core/restrack.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/infiniband/core/restrack.c b/drivers/infiniband/core/restrack.c
index bbbbec5b1593..d5a69c4a1891 100644
--- a/drivers/infiniband/core/restrack.c
+++ b/drivers/infiniband/core/restrack.c
@@ -326,8 +326,6 @@ void rdma_restrack_del(struct rdma_restrack_entry *res)
rt = &dev->res[res->type];
old = xa_erase(&rt->xa, res->id);
- if (res->type == RDMA_RESTRACK_MR || res->type == RDMA_RESTRACK_QP)
- return;
WARN_ON(old != res);
res->valid = false;
--
2.18.4
`MFD_NOEXEC_SEAL` should remove the executable bits and set `F_SEAL_EXEC`
to prevent further modifications to the executable bits as per the comment
in the uapi header file:
not executable and sealed to prevent changing to executable
However, commit 105ff5339f498a ("mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC")
that introduced this feature made it so that `MFD_NOEXEC_SEAL` unsets
`F_SEAL_SEAL`, essentially acting as a superset of `MFD_ALLOW_SEALING`.
Nothing implies that it should be so, and indeed up until the second version
of the patchset[0] that introduced `MFD_EXEC` and `MFD_NOEXEC_SEAL`,
`F_SEAL_SEAL` was not removed; however, it was changed in the third revision
of the patchset[1] without a clear explanation.
This behaviour is surprising for application developers, there is no
documentation that would reveal that `MFD_NOEXEC_SEAL` has the additional
effect of `MFD_ALLOW_SEALING`. Additionally, combined with `vm.memfd_noexec=2`
it has the effect of making all memfds initially sealable.
So do not remove `F_SEAL_SEAL` when `MFD_NOEXEC_SEAL` is requested,
thereby returning to the pre-Linux 6.3 behaviour of only allowing
sealing when `MFD_ALLOW_SEALING` is specified.
Now, this is technically a uapi break. However, the damage is expected
to be minimal. To trigger user visible change, a program has to do the
following steps:
- create memfd:
- with `MFD_NOEXEC_SEAL`,
- without `MFD_ALLOW_SEALING`;
- try to add seals / check the seals.
But that seems unlikely to happen intentionally since this change
essentially reverts the kernel's behaviour to that of Linux <6.3,
so if a program worked correctly on those older kernels, it will
likely work correctly after this change.
I have used Debian Code Search and GitHub to try to find potential
breakages, and I could only find a single one. dbus-broker's
memfd_create() wrapper is aware of this implicit `MFD_ALLOW_SEALING`
behaviour, and tries to work around it[2]. This workaround will
break. Luckily, this only affects the test suite, it does not affect
the normal operations of dbus-broker. There is a PR with a fix[3].
I also carried out a smoke test by building a kernel with this change
and booting an Arch Linux system into GNOME and Plasma sessions.
There was also a previous attempt to address this peculiarity by
introducing a new flag[4].
[0]: https://lore.kernel.org/lkml/20220805222126.142525-3-jeffxu@google.com/
[1]: https://lore.kernel.org/lkml/20221202013404.163143-3-jeffxu@google.com/
[2]: https://github.com/bus1/dbus-broker/blob/9eb0b7e5826fc76cad7b025bc46f267d4a…
[3]: https://github.com/bus1/dbus-broker/pull/366
[4]: https://lore.kernel.org/lkml/20230714114753.170814-1-david@readahead.eu/
Cc: stable(a)vger.kernel.org
Signed-off-by: Barnabás Pőcze <pobrn(a)protonmail.com>
---
* v3: https://lore.kernel.org/linux-mm/20240611231409.3899809-1-jeffxu@chromium.o…
* v2: https://lore.kernel.org/linux-mm/20240524033933.135049-1-jeffxu@google.com/
* v1: https://lore.kernel.org/linux-mm/20240513191544.94754-1-pobrn@protonmail.co…
This fourth version returns to removing the inconsistency as opposed to documenting
its existence, with the same code change as v1 but with a somewhat extended commit
message. This is sent because I believe it is worth at least a try; it can be easily
reverted if bigger application breakages are discovered than initially imagined.
---
mm/memfd.c | 9 ++++-----
tools/testing/selftests/memfd/memfd_test.c | 2 +-
2 files changed, 5 insertions(+), 6 deletions(-)
diff --git a/mm/memfd.c b/mm/memfd.c
index 7d8d3ab3fa37..8b7f6afee21d 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -356,12 +356,11 @@ SYSCALL_DEFINE2(memfd_create,
inode->i_mode &= ~0111;
file_seals = memfd_file_seals_ptr(file);
- if (file_seals) {
- *file_seals &= ~F_SEAL_SEAL;
+ if (file_seals)
*file_seals |= F_SEAL_EXEC;
- }
- } else if (flags & MFD_ALLOW_SEALING) {
- /* MFD_EXEC and MFD_ALLOW_SEALING are set */
+ }
+
+ if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
if (file_seals)
*file_seals &= ~F_SEAL_SEAL;
diff --git a/tools/testing/selftests/memfd/memfd_test.c b/tools/testing/selftests/memfd/memfd_test.c
index 95af2d78fd31..7b78329f65b6 100644
--- a/tools/testing/selftests/memfd/memfd_test.c
+++ b/tools/testing/selftests/memfd/memfd_test.c
@@ -1151,7 +1151,7 @@ static void test_noexec_seal(void)
mfd_def_size,
MFD_CLOEXEC | MFD_NOEXEC_SEAL);
mfd_assert_mode(fd, 0666);
- mfd_assert_has_seals(fd, F_SEAL_EXEC);
+ mfd_assert_has_seals(fd, F_SEAL_SEAL | F_SEAL_EXEC);
mfd_fail_chmod(fd, 0777);
close(fd);
}
--
2.45.2
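A hedged userspace sketch of the visible difference (illustrative only; the
MFD_NOEXEC_SEAL fallback define is for older libc headers):
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#ifndef MFD_NOEXEC_SEAL
#define MFD_NOEXEC_SEAL 0x0008U
#endif
int main(void)
{
	/* Note: no MFD_ALLOW_SEALING here. */
	int fd = memfd_create("demo", MFD_CLOEXEC | MFD_NOEXEC_SEAL);
	if (fd < 0) {
		perror("memfd_create");
		return 1;
	}
	/*
	 * Unpatched kernels since 6.3: F_SEAL_EXEC only, so the F_ADD_SEALS
	 * below succeeds. With this patch: F_SEAL_SEAL | F_SEAL_EXEC, so
	 * F_ADD_SEALS fails with EPERM, as it did before Linux 6.3.
	 */
	int seals = fcntl(fd, F_GET_SEALS);
	printf("seals: %#x\n", (unsigned int)seals);
	if (fcntl(fd, F_ADD_SEALS, F_SEAL_GROW) < 0)
		perror("F_ADD_SEALS");
	return 0;
}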
The SCM driver looks messy in terms of handling probe concurrency. The
driver exports an interface which is guarded by the global '__scm' variable,
but:
1. Lacks a proper read barrier (the commit adding write barriers mixed up
READ_ONCE() with a read barrier).
2. Lacks barriers or checks for '__scm' in multiple places.
3. Lacks probe error cleanup.
I fixed a few visible things here, but this was not tested extensively; I
tried only SM8450.
ARM32 and SC8280xp/X1E platforms would be useful for testing as well.
All the issues here are non-urgent, IOW, they were here for some time
(v6.10-rc1 and earlier).
Best regards,
Krzysztof
---
Krzysztof Kozlowski (6):
firmware: qcom: scm: Fix missing read barrier in qcom_scm_is_available()
firmware: qcom: scm: Fix missing read barrier in qcom_scm_get_tzmem_pool()
firmware: qcom: scm: Handle various probe ordering for qcom_scm_assign_mem()
[RFC/RFT] firmware: qcom: scm: Cleanup global '__scm' on probe failures
firmware: qcom: scm: smc: Handle missing SCM device
firmware: qcom: scm: smc: Narrow 'mempool' variable scope
drivers/firmware/qcom/qcom_scm-smc.c | 6 +++-
drivers/firmware/qcom/qcom_scm.c | 55 +++++++++++++++++++++++++-----------
2 files changed, 44 insertions(+), 17 deletions(-)
---
base-commit: 414c97c966b69e4a6ea7b32970fa166b2f9b9ef0
change-id: 20241119-qcom-scm-missing-barriers-and-all-sort-of-srap-a25d59074882
Best regards,
--
Krzysztof Kozlowski <krzysztof.kozlowski(a)linaro.org>
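For reference, a minimal sketch (not the actual driver code) of the
publish/consume pattern the read-barrier fixes in this series are about,
assuming a global '__scm' pointer set at probe time:
/* probe: publish the pointer only after the object is fully initialised */
smp_store_release(&__scm, scm);
/*
 * consumers: pair that release with an acquire load - a bare READ_ONCE()
 * does not order later accesses to *scm against its initialisation
 */
struct qcom_scm *scm = smp_load_acquire(&__scm);
if (!scm)
	return -EPROBE_DEFER;	/* not probed yet */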
Some cameras do not return all the bytes requested from a control
if the value fits in fewer bytes. E.g.: returning 0xab instead of 0x00ab.
Support these devices.
Also, now that we are at it, improve uvc_query_ctrl() logging.
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v3:
- Improve documentation.
- Do not change return sequence.
- Use dev_ratelimit and dev_warn_once
- Link to v2: https://lore.kernel.org/r/20241008-uvc-readless-v2-0-04d9d51aee56@chromium.…
Changes in v2:
- Rewrite error handling (Thanks Sakari)
- Discard 2/3. It is not needed after rewriting the error handling.
- Link to v1: https://lore.kernel.org/r/20241008-uvc-readless-v1-0-042ac4581f44@chromium.…
---
Ricardo Ribalda (2):
media: uvcvideo: Support partial control reads
media: uvcvideo: Add more logging to uvc_query_ctrl()
drivers/media/usb/uvc/uvc_video.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241008-uvc-readless-23f9b8cad0b3
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit 4aea16b7cfb76bd3361858ceee6893ef5c9b5570 ]
When enabling expert mode CONFIG_EXPERT and using that power
user mode to disable the branch prediction hardening
!CONFIG_HARDEN_BRANCH_PREDICTOR, the assembly linker
in CLANG notices that some assembly in proc-v7.S does
not have corresponding C call sites, i.e. the prototypes
in proc-v7-bugs.c are enclosed in ifdef
CONFIG_HARDEN_BRANCH_PREDICTOR so this assembly:
SYM_TYPED_FUNC_START(cpu_v7_smc_switch_mm)
SYM_TYPED_FUNC_START(cpu_v7_hvc_switch_mm)
Results in:
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_smc_switch_mm
>>> referenced by proc-v7.S:94 (.../arch/arm/mm/proc-v7.S:94)
>>> arch/arm/mm/proc-v7.o:(.text+0x108) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_hvc_switch_mm
>>> referenced by proc-v7.S:105 (.../arch/arm/mm/proc-v7.S:105)
>>> arch/arm/mm/proc-v7.o:(.text+0x124) in archive vmlinux.a
Fix this by adding an additional requirement that
CONFIG_HARDEN_BRANCH_PREDICTOR has to be enabled to compile
these assembly calls.
Closes: https://lore.kernel.org/oe-kbuild-all/202411041456.ZsoEiD7T-lkp@intel.com/
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/proc-v7.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index e351d682c2e36..8bde80b31a861 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -94,7 +94,7 @@ ENTRY(cpu_v7_dcache_clean_area)
ret lr
ENDPROC(cpu_v7_dcache_clean_area)
-#ifdef CONFIG_ARM_PSCI
+#if defined(CONFIG_ARM_PSCI) && defined(CONFIG_HARDEN_BRANCH_PREDICTOR)
.arch_extension sec
ENTRY(cpu_v7_smc_switch_mm)
stmfd sp!, {r0 - r3}
--
2.43.0
From: Linus Walleij <linus.walleij(a)linaro.org>
[ Upstream commit 4aea16b7cfb76bd3361858ceee6893ef5c9b5570 ]
When enabling expert mode CONFIG_EXPERT and using that power
user mode to disable the branch prediction hardening
!CONFIG_HARDEN_BRANCH_PREDICTOR, the assembly linker
in CLANG notices that some assembly in proc-v7.S does
not have corresponding C call sites, i.e. the prototypes
in proc-v7-bugs.c are enclosed in ifdef
CONFIG_HARDEN_BRANCH_PREDICTOR so this assembly:
SYM_TYPED_FUNC_START(cpu_v7_smc_switch_mm)
SYM_TYPED_FUNC_START(cpu_v7_hvc_switch_mm)
Results in:
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_smc_switch_mm
>>> referenced by proc-v7.S:94 (.../arch/arm/mm/proc-v7.S:94)
>>> arch/arm/mm/proc-v7.o:(.text+0x108) in archive vmlinux.a
ld.lld: error: undefined symbol: __kcfi_typeid_cpu_v7_hvc_switch_mm
>>> referenced by proc-v7.S:105 (.../arch/arm/mm/proc-v7.S:105)
>>> arch/arm/mm/proc-v7.o:(.text+0x124) in archive vmlinux.a
Fix this by adding an additional requirement that
CONFIG_HARDEN_BRANCH_PREDICTOR has to be enabled to compile
these assembly calls.
Closes: https://lore.kernel.org/oe-kbuild-all/202411041456.ZsoEiD7T-lkp@intel.com/
Reported-by: kernel test robot <lkp(a)intel.com>
Reviewed-by: Nathan Chancellor <nathan(a)kernel.org>
Signed-off-by: Linus Walleij <linus.walleij(a)linaro.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel(a)armlinux.org.uk>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
arch/arm/mm/proc-v7.S | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/arch/arm/mm/proc-v7.S b/arch/arm/mm/proc-v7.S
index 48e0ef6f0dccf..aee85fd261f5d 100644
--- a/arch/arm/mm/proc-v7.S
+++ b/arch/arm/mm/proc-v7.S
@@ -91,7 +91,7 @@ ENTRY(cpu_v7_dcache_clean_area)
ret lr
ENDPROC(cpu_v7_dcache_clean_area)
-#ifdef CONFIG_ARM_PSCI
+#if defined(CONFIG_ARM_PSCI) && defined(CONFIG_HARDEN_BRANCH_PREDICTOR)
.arch_extension sec
ENTRY(cpu_v7_smc_switch_mm)
stmfd sp!, {r0 - r3}
--
2.43.0
From: "Gustavo A. R. Silva" <gustavoars(a)kernel.org>
[ Upstream commit 08ae3e5f5fc8edb9bd0c7ef9696ff29ef18b26ef ]
Commit 38aa3f5ac6d2 ("integrity: Avoid -Wflex-array-member-not-at-end
warnings") introduced tagged `struct evm_ima_xattr_data_hdr` and
`struct ima_digest_data_hdr`. We want to ensure that when new members
need to be added to the flexible structures, they are always included
within these tagged structs.
So, we use `static_assert()` to ensure that the memory layout for
both the flexible structure and the tagged struct is the same after
any changes.
Signed-off-by: Gustavo A. R. Silva <gustavoars(a)kernel.org>
Tested-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu(a)huawei.com>
Signed-off-by: Mimi Zohar <zohar(a)linux.ibm.com>
Signed-off-by: Sasha Levin <sashal(a)kernel.org>
---
security/integrity/integrity.h | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/security/integrity/integrity.h b/security/integrity/integrity.h
index 660f76cb69d37..c2c2da6911233 100644
--- a/security/integrity/integrity.h
+++ b/security/integrity/integrity.h
@@ -37,6 +37,8 @@ struct evm_ima_xattr_data {
);
u8 data[];
} __packed;
+static_assert(offsetof(struct evm_ima_xattr_data, data) == sizeof(struct evm_ima_xattr_data_hdr),
+ "struct member likely outside of __struct_group()");
/* Only used in the EVM HMAC code. */
struct evm_xattr {
@@ -65,6 +67,8 @@ struct ima_digest_data {
);
u8 digest[];
} __packed;
+static_assert(offsetof(struct ima_digest_data, digest) == sizeof(struct ima_digest_data_hdr),
+ "struct member likely outside of __struct_group()");
/*
* Instead of wrapping the ima_digest_data struct inside a local structure
--
2.43.0
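The pattern being enforced, as a generic sketch (the names are made up; the
__struct_group() helper comes from include/uapi/linux/stddef.h):
struct example {
	/* The tagged header that fixed-size users embed directly. */
	__struct_group(example_hdr, hdr, __packed,
		u8 type;
		u8 version;
	);
	u8 payload[];			/* flexible array must stay last */
} __packed;
/* Fails to build if a member is added between the group and payload[]. */
static_assert(offsetof(struct example, payload) == sizeof(struct example_hdr),
	      "struct member likely outside of __struct_group()");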
Some cameras do not return all the bytes requested from a control
if the value fits in fewer bytes. E.g.: returning 0xab instead of 0x00ab.
Support these devices.
Also, now that we are at it, improve uvc_query_ctrl() logging.
Signed-off-by: Ricardo Ribalda <ribalda(a)chromium.org>
---
Changes in v2:
- Rewrite error handling (Thanks Sakari)
- Discard 2/3. It is not needed after rewriting the error handling.
- Link to v1: https://lore.kernel.org/r/20241008-uvc-readless-v1-0-042ac4581f44@chromium.…
---
Ricardo Ribalda (2):
media: uvcvideo: Support partial control reads
media: uvcvideo: Add more logging to uvc_query_ctrl()
drivers/media/usb/uvc/uvc_video.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
---
base-commit: 9852d85ec9d492ebef56dc5f229416c925758edc
change-id: 20241008-uvc-readless-23f9b8cad0b3
Best regards,
--
Ricardo Ribalda <ribalda(a)chromium.org>
From: Eli Billauer <eli.billauer(a)gmail.com>
[ Upstream commit 282a4b71816b6076029017a7bab3a9dcee12a920 ]
The driver for XillyUSB devices maintains a kref reference count on each
xillyusb_dev structure, which represents a physical device. This reference
count reaches zero when the device has been disconnected and there are no
open file descriptors that are related to the device. When this occurs,
kref_put() calls cleanup_dev(), which clears up the device's data,
including the structure itself.
However, when xillyusb_open() is called, this reference count becomes
tricky: This function needs to obtain the xillyusb_dev structure that
relates to the inode's major and minor (as there can be several such).
xillybus_find_inode() (which is defined in xillybus_class.c) is called
for this purpose. xillybus_find_inode() holds a mutex that is global in
xillybus_class.c to protect the list of devices, and releases this
mutex before returning. As a result, nothing protects the xillyusb_dev's
reference counter from being decremented to zero before xillyusb_open()
increments it on its own behalf. Hence the structure can be freed
due to a rare race condition.
To solve this, a mutex is added. It is locked by xillyusb_open() before
the call to xillybus_find_inode() and is released only after the kref
counter has been incremented on behalf of the newly opened inode. This
protects the kref reference counters of all xillyusb_dev structs from
being decremented by xillyusb_disconnect() during this time segment, as
the call to kref_put() in this function is done with the same lock held.
There is no need to hold the lock on other calls to kref_put(), because
if xillybus_find_inode() finds a struct, xillyusb_disconnect() has not
made the call to remove it, and hence not made its call to kref_put(),
which takes place afterwards. Hence preventing xillyusb_disconnect's
call to kref_put() is enough to ensure that the reference doesn't reach
zero before it's incremented by xillyusb_open().
It would have been more natural to increment the reference count in
xillybus_find_inode() of course, however this function is also called by
Xillybus' driver for PCIe / OF, which registers a completely different
structure. Therefore, xillybus_find_inode() treats these structures as
void pointers, and accordingly can't make any changes.
Reported-by: Hyunwoo Kim <imv4bel(a)gmail.com>
Suggested-by: Alan Stern <stern(a)rowland.harvard.edu>
Signed-off-by: Eli Billauer <eli.billauer(a)gmail.com>
Link: https://lore.kernel.org/r/20221030094209.65916-1-eli.billauer@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
Signed-off-by: Bin Lan <bin.lan.cn(a)windriver.com>
---
drivers/char/xillybus/xillyusb.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/char/xillybus/xillyusb.c b/drivers/char/xillybus/xillyusb.c
index 3a2a0fb3d928..45771b1a3716 100644
--- a/drivers/char/xillybus/xillyusb.c
+++ b/drivers/char/xillybus/xillyusb.c
@@ -185,6 +185,14 @@ struct xillyusb_dev {
struct mutex process_in_mutex; /* synchronize wakeup_all() */
};
+/*
+ * kref_mutex is used in xillyusb_open() to prevent the xillyusb_dev
+ * struct from being freed during the gap between being found by
+ * xillybus_find_inode() and having its reference count incremented.
+ */
+
+static DEFINE_MUTEX(kref_mutex);
+
/* FPGA to host opcodes */
enum {
OPCODE_DATA = 0,
@@ -1234,9 +1242,16 @@ static int xillyusb_open(struct inode *inode, struct file *filp)
int rc;
int index;
+ mutex_lock(&kref_mutex);
+
rc = xillybus_find_inode(inode, (void **)&xdev, &index);
- if (rc)
+ if (rc) {
+ mutex_unlock(&kref_mutex);
return rc;
+ }
+
+ kref_get(&xdev->kref);
+ mutex_unlock(&kref_mutex);
chan = &xdev->channels[index];
filp->private_data = chan;
@@ -1272,8 +1287,6 @@ static int xillyusb_open(struct inode *inode, struct file *filp)
((filp->f_mode & FMODE_WRITE) && chan->open_for_write))
goto unmutex_fail;
- kref_get(&xdev->kref);
-
if (filp->f_mode & FMODE_READ)
chan->open_for_read = 1;
@@ -1410,6 +1423,7 @@ static int xillyusb_open(struct inode *inode, struct file *filp)
return rc;
unmutex_fail:
+ kref_put(&xdev->kref, cleanup_dev);
mutex_unlock(&chan->lock);
return rc;
}
@@ -2244,7 +2258,9 @@ static void xillyusb_disconnect(struct usb_interface *interface)
xdev->dev = NULL;
+ mutex_lock(&kref_mutex);
kref_put(&xdev->kref, cleanup_dev);
+ mutex_unlock(&kref_mutex);
}
static struct usb_driver xillyusb_driver = {
--
2.43.0
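In generic terms, the race and its fix look like this (a sketch only;
find_object() and cleanup_obj() are hypothetical stand-ins for
xillybus_find_inode() and cleanup_dev()):
static DEFINE_MUTEX(lookup_mutex);
/* open(): the reference must be taken while teardown is excluded */
mutex_lock(&lookup_mutex);
rc = find_object(inode, &obj);
if (!rc)
	kref_get(&obj->kref);
mutex_unlock(&lookup_mutex);
/*
 * disconnect(): drop the registration reference under the same mutex, so
 * the refcount cannot reach zero between the lookup and kref_get() above
 */
mutex_lock(&lookup_mutex);
kref_put(&obj->kref, cleanup_obj);
mutex_unlock(&lookup_mutex);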
From: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Backport to fix CVE-2024-36478
https://lore.kernel.org/linux-cve-announce/2024062136-CVE-2024-36478-d249@g…
The CVE fix is "null_blk: fix null-ptr-dereference while configuring 'power'
and 'submit_queues'"
This required 2 extra commits to make sure the picks are clean:
null_blk: Remove usage of the deprecated ida_simple_xx() API
null_blk: Fix return value of nullb_device_power_store()
Changes:
V1 -> V2
Added the extra commit Fix return value of nullb_device_power_store()
Christophe JAILLET (1):
null_blk: Remove usage of the deprecated ida_simple_xx() API
Damien Le Moal (1):
null_blk: Fix return value of nullb_device_power_store()
Yu Kuai (1):
null_blk: fix null-ptr-dereference while configuring 'power' and
'submit_queues'
drivers/block/null_blk/main.c | 45 ++++++++++++++++++++++-------------
1 file changed, 29 insertions(+), 16 deletions(-)
--
2.43.0
From: "Liam R. Howlett" <Liam.Howlett(a)Oracle.com>
The mmap_region() function tries to install a new vma, which requires a
pre-allocation for the maple tree write due to the complex locking
scenarios involved.
Recent efforts to simplify the error recovery required the relocation of
the preallocation of the maple tree nodes (via vma_iter_prealloc()
calling mas_preallocate()) higher in the function.
The relocation of the preallocation meant that, if there was a file
associated with the vma and the driver call (mmap_file()) modified the
vma flags, then a new merge of the new vma with existing vmas is
attempted.
During the attempt to merge the existing vma with the new vma, the vma
iterator is used - the same iterator that would be used for the next
write attempt to the tree. In the event of needing a further allocation
and if the new allocations fails, the vma iterator (and contained maple
state) will cleaned up, including freeing all previous allocations and
will be reset internally.
Upon returning to the __mmap_region() function, the error is available
in the vma_merge_struct and can be used to detect the -ENOMEM status.
Hitting an -ENOMEM scenario after the driver callback leaves the system
in a state that undoing the mapping is worse than continuing by dipping
into the reserve.
A preallocation should be performed in the case of an -ENOMEM and the
allocations were lost during the failure scenario. The __GFP_NOFAIL
flag is used in the allocation to ensure the allocation succeeds after
implicitly telling the driver that the mapping was happening.
The range is already set in the vma_iter_store() call below, so it is
not necessary and is dropped.
Reported-by: syzbot+bc6bfc25a68b7a020ee1(a)syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/x/log.txt?x=17b0ace8580000
Fixes: 5de195060b2e2 ("mm: resolve faulty mmap_region() error path behaviour")
Signed-off-by: Liam R. Howlett <Liam.Howlett(a)Oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes(a)oracle.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Jann Horn <jannh(a)google.com>
Cc: <stable(a)vger.kernel.org>
---
mm/mmap.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
Changes since v1:
- Don't bail out and force the allocation when the merge failure is
-ENOMEM - Thanks Lorenzo
diff --git a/mm/mmap.c b/mm/mmap.c
index 79d541f1502b2..4f6e566d52faa 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1491,7 +1491,18 @@ static unsigned long __mmap_region(struct file *file, unsigned long addr,
vm_flags = vma->vm_flags;
goto file_expanded;
}
- vma_iter_config(&vmi, addr, end);
+
+ /*
+ * In the unlikely event that more memory was needed, but
+ * not available for the vma merge, the vma iterator
+ * will have no memory reserved for the write we told
+ * the driver was happening. To keep up the ruse,
+ * ensure the allocation for the store succeeds.
+ */
+ if (vmg_nomem(&vmg)) {
+ mas_preallocate(&vmi.mas, vma,
+ GFP_KERNEL|__GFP_NOFAIL);
+ }
}
vm_flags = vma->vm_flags;
--
2.43.0
From: Peter Wang <peter.wang(a)mediatek.com>
When the power mode change is successful but the power mode
hasn't actually changed, the POST_CHANGE notification is missed.
Similar to the approach taken for hibernate/clock scaling/HCE enable,
having the pre/post notifications in the same function will
make the code easier to maintain.
Fixes: 7eb584db73be ("ufs: refactor configuring power mode")
Cc: stable(a)vger.kernel.org #6.11.x
Signed-off-by: Peter Wang <peter.wang(a)mediatek.com>
---
drivers/ufs/core/ufshcd.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index abbe7135a977..814402e93a1e 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -4651,9 +4651,6 @@ static int ufshcd_change_power_mode(struct ufs_hba *hba,
dev_err(hba->dev,
"%s: power mode change failed %d\n", __func__, ret);
} else {
- ufshcd_vops_pwr_change_notify(hba, POST_CHANGE, NULL,
- pwr_mode);
-
memcpy(&hba->pwr_info, pwr_mode,
sizeof(struct ufs_pa_layer_attr));
}
@@ -4682,6 +4679,10 @@ int ufshcd_config_pwr_mode(struct ufs_hba *hba,
ret = ufshcd_change_power_mode(hba, &final_params);
+ if (!ret)
+ ufshcd_vops_pwr_change_notify(hba, POST_CHANGE, NULL,
+ &final_params);
+
return ret;
}
EXPORT_SYMBOL_GPL(ufshcd_config_pwr_mode);
--
2.18.0