Hi Greg, Sasha,
The following list shows the backported patches, I am using original
commit IDs for reference:
1) 0854db2aaef3 ("netfilter: nf_tables: use net_generic infra for transaction data")
2) 81ea01066741 ("netfilter: nf_tables: add rescheduling points during loop detection walks")
3) 1240eb93f061 ("netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE")
4) 4bedf9eee016 ("netfilter: nf_tables: fix chain binding transaction logic")
5) 26b5a5712eb8 ("netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain")
6) 938154b93be8 ("netfilter: nf_tables: reject unbound anonymous set before commit phase")
7) 62e1e94b246e ("netfilter: nf_tables: reject unbound chain set before commit phase")
8) f8bb7889af58 ("netfilter: nftables: rename set element data activation/deactivation functions")
9) 628bd3e49cba ("netfilter: nf_tables: drop map element references from preparation phase")
10) 3e70489721b6 ("netfilter: nf_tables: unbind non-anonymous set if rule construction fails")
Note that Patch #1 is a backported dependency patch required by these fixes.
Please, apply,
Thanks.
Florian Westphal (2):
netfilter: nf_tables: use net_generic infra for transaction data
netfilter: nf_tables: add rescheduling points during loop detection walks
Pablo Neira Ayuso (8):
netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
netfilter: nf_tables: fix chain binding transaction logic
netfilter: nf_tables: add NFT_TRANS_PREPARE_ERROR to deal with bound set/chain
netfilter: nf_tables: reject unbound anonymous set before commit phase
netfilter: nf_tables: reject unbound chain set before commit phase
netfilter: nftables: rename set element data activation/deactivation functions
netfilter: nf_tables: drop map element references from preparation phase
netfilter: nf_tables: unbind non-anonymous set if rule construction fails
include/net/netfilter/nf_tables.h | 41 +-
include/net/netns/nftables.h | 7 -
net/netfilter/nf_tables_api.c | 696 +++++++++++++++++++++---------
net/netfilter/nf_tables_offload.c | 30 +-
net/netfilter/nft_chain_filter.c | 11 +-
net/netfilter/nft_dynset.c | 6 +-
net/netfilter/nft_immediate.c | 90 +++-
net/netfilter/nft_set_bitmap.c | 5 +-
net/netfilter/nft_set_hash.c | 23 +-
net/netfilter/nft_set_pipapo.c | 14 +-
net/netfilter/nft_set_rbtree.c | 5 +-
11 files changed, 682 insertions(+), 246 deletions(-)
--
2.30.2
Here is a first batch of fixes for v6.5 and older.
The fixes are not linked to each others.
Patch 1 ensures subflows are unhashed before cleaning the backlog to
avoid races. This fixes another recent fix from v6.4.
Patch 2 does not rely on implicit state check in mptcp_listen() to avoid
races when receiving an MP_FASTCLOSE. A regression from v5.17.
The rest fixes issues in the selftests.
Patch 3 makes sure errors when setting up the environment are no longer
ignored. For v5.17+.
Patch 4 uses 'iptables-legacy' if available to be able to run on older
kernels. A fix for v5.13 and newer.
Patch 5 catches errors when issues are detected with packet marks. Also
for v5.13+.
Patch 6 uses the correct variable instead of an undefined one. Even if
there was no visible impact, it can help to find regressions later. An
issue visible in v5.19+.
Patch 7 makes sure errors with some sub-tests are reported to have the
selftest marked as failed as expected. Also for v5.19+.
Patch 8 adds a kernel config that is required to execute MPTCP
selftests. It is valid for v5.9+.
Patch 9 fixes issues when validating the userspace path-manager with
32-bit arch, an issue affecting v5.19+.
Signed-off-by: Matthieu Baerts <matthieu.baerts(a)tessares.net>
---
Matthieu Baerts (7):
selftests: mptcp: connect: fail if nft supposed to work
selftests: mptcp: sockopt: use 'iptables-legacy' if available
selftests: mptcp: sockopt: return error if wrong mark
selftests: mptcp: userspace_pm: use correct server port
selftests: mptcp: userspace_pm: report errors with 'remove' tests
selftests: mptcp: depend on SYN_COOKIES
selftests: mptcp: pm_nl_ctl: fix 32-bit support
Paolo Abeni (2):
mptcp: ensure subflow is unhashed before cleaning the backlog
mptcp: do not rely on implicit state check in mptcp_listen()
net/mptcp/protocol.c | 7 +++++-
tools/testing/selftests/net/mptcp/config | 1 +
tools/testing/selftests/net/mptcp/mptcp_connect.sh | 3 +++
tools/testing/selftests/net/mptcp/mptcp_sockopt.sh | 29 ++++++++++++----------
tools/testing/selftests/net/mptcp/pm_nl_ctl.c | 10 ++++----
tools/testing/selftests/net/mptcp/userspace_pm.sh | 4 ++-
6 files changed, 34 insertions(+), 20 deletions(-)
---
base-commit: 14bb236b29922c4f57d8c05bfdbcb82677f917c9
change-id: 20230704-upstream-net-20230704-misc-fixes-6-5-rc1-c52608649559
Best regards,
--
Matthieu Baerts <matthieu.baerts(a)tessares.net>
Making 'blk' sector_t (i.e. 64 bit if LBD support is active)
fails the 'blk>0' test in the partition block loop if a
value of (signed int) -1 is used to mark the end of the
partition block list.
This bug was introduced in patch 3 of my prior Amiga partition
support fixes series, and spotted by Christian Zigotzky when
testing the latest block updates.
Explicitly cast 'blk' to signed int to allow use of -1 to
terminate the partition block linked list.
Testing by Christian also exposed another aspect of the old
bug fixed in commits fc3d092c6b ("block: fix signed int
overflow in Amiga partition support") and b6f3f28f60
("block: add overflow checks for Amiga partition support"):
Partitions that did overflow the disk size (due to 32 bit int
overflow) were not skipped but truncated to the end of the
disk. Users who missed the warning message during boot would
go on to create a filesystem with a size exceeding the
actual partition size. Now that the 32 bit overflow has been
corrected, such filesystems may refuse to mount with a
'filesystem exceeds partition size' error. Users should
either correct the partition size, or resize the filesystem
before attempting to boot a kernel with the RDB fixes in
place.
Reported-by: Christian Zigotzky <chzigotzky(a)xenosoft.de>
Fixes: b6f3f28f60 ("block: add overflow checks for Amiga partition support")
Message-ID: 024ce4fa-cc6d-50a2-9aae-3701d0ebf668(a)xenosoft.de
Cc: <stable(a)vger.kernel.org> # 6.4
Link: https://lore.kernel.org/r/024ce4fa-cc6d-50a2-9aae-3701d0ebf668@xenosoft.de
Signed-off-by: Michael Schmitz <schmitzmic(a)gmail.com>
Tested-by: Christian Zigotzky <chzigotzky(a)xenosoft.de>
--
Changes since v2:
Adrian Glaubitz:
- fix typo in commit message
Changes since v1:
- corrected Fixes: tag
- added Tested-by:
- reworded commit message to describe filesystem partition
size mismatch problem
---
block/partitions/amiga.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/partitions/amiga.c b/block/partitions/amiga.c
index ed222b9c901b..506921095412 100644
--- a/block/partitions/amiga.c
+++ b/block/partitions/amiga.c
@@ -90,7 +90,7 @@ int amiga_partition(struct parsed_partitions *state)
}
blk = be32_to_cpu(rdb->rdb_PartitionList);
put_dev_sector(sect);
- for (part = 1; blk>0 && part<=16; part++, put_dev_sector(sect)) {
+ for (part = 1; (s32) blk>0 && part<=16; part++, put_dev_sector(sect)) {
/* Read in terms partition table understands */
if (check_mul_overflow(blk, (sector_t) blksize, &blk)) {
pr_err("Dev %s: overflow calculating partition block %llu! Skipping partitions %u and beyond\n",
--
2.17.1
We get the following crash caused by a null pointer access:
BUG: kernel NULL pointer dereference, address: 0000000000000000
...
RIP: 0010:resume_execution+0x35/0x190
...
Call Trace:
<#DB>
kprobe_debug_handler+0x41/0xd0
exc_debug+0xe5/0x1b0
asm_exc_debug+0x19/0x30
RIP: 0010:copy_from_kernel_nofault.part.0+0x55/0xc0
...
</#DB>
process_fetch_insn+0xfb/0x720
kprobe_trace_func+0x199/0x2c0
? kernel_clone+0x5/0x2f0
kprobe_dispatcher+0x3d/0x60
aggr_pre_handler+0x40/0x80
? kernel_clone+0x1/0x2f0
kprobe_ftrace_handler+0x82/0xf0
? __se_sys_clone+0x65/0x90
ftrace_ops_assist_func+0x86/0x110
? rcu_nocb_try_bypass+0x1f3/0x370
0xffffffffc07e60c8
? kernel_clone+0x1/0x2f0
kernel_clone+0x5/0x2f0
The analysis reveals that kprobe and hardware breakpoints conflict in
the use of debug exceptions.
If we set a hardware breakpoint on a memory address and also have a
kprobe event to fetch the memory at this address. Then when kprobe
triggers, it goes to read the memory and triggers hardware breakpoint
monitoring. This time, since kprobe handles debug exceptions earlier
than hardware breakpoints, it will cause kprobe to incorrectly assume
that the exception is a kprobe trigger.
Notice that after the mainline commit 6256e668b7af ("x86/kprobes: Use
int3 instead of debug trap for single-step"), kprobe no longer uses
debug trap, avoiding the conflict with hardware breakpoints here. This
commit is to remove the IRET that returns to kernel, not to fix the
problem we have here. Also there are a bunch of merge conflicts when
trying to apply this commit to older kernels, so fixing it directly in
older kernels is probably a better option.
If the debug exception is triggered by kprobe, then regs->ip should be
located in the kprobe instruction slot. Add this check to
kprobe_debug_handler() to properly determine if a debug exception should
be handled by kprobe.
The stable kernels affected are 5.10, 5.4, 4.19, and 4.14. I made the
fix in 5.10, and we should probably apply this fix to other stable
kernels.
Signed-off-by: Li Huafei <lihuafei1(a)huawei.com>
---
arch/x86/kernel/kprobes/core.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/kprobes/core.c b/arch/x86/kernel/kprobes/core.c
index 5de757099186..fd8d7d128807 100644
--- a/arch/x86/kernel/kprobes/core.c
+++ b/arch/x86/kernel/kprobes/core.c
@@ -900,7 +900,14 @@ int kprobe_debug_handler(struct pt_regs *regs)
struct kprobe *cur = kprobe_running();
struct kprobe_ctlblk *kcb = get_kprobe_ctlblk();
- if (!cur)
+ if (!cur || !cur->ainsn.insn)
+ return 0;
+
+ /* regs->ip should be the address of next instruction to
+ * cur->ainsn.insn.
+ */
+ if (regs->ip < (unsigned long)cur->ainsn.insn ||
+ regs->ip - (unsigned long)cur->ainsn.insn > MAX_INSN_SIZE)
return 0;
resume_execution(cur, regs, kcb);
--
2.17.1
When forking a child process, parent write-protects an anonymous page
and COW-shares it with the child being forked using copy_present_pte().
Parent's TLB is flushed right before we drop the parent's mmap_lock in
dup_mmap(). If we get a write-fault before that TLB flush in the parent,
and we end up replacing that anonymous page in the parent process in
do_wp_page() (because, COW-shared with the child), this might lead to
some stale writable TLB entries targeting the wrong (old) page.
Similar issue happened in the past with userfaultfd (see flush_tlb_page()
call inside do_wp_page()).
Lock VMAs of the parent process when forking a child, which prevents
concurrent page faults during fork operation and avoids this issue.
This fix can potentially regress some fork-heavy workloads. Kernel build
time did not show noticeable regression on a 56-core machine while a
stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression. If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance. Further
optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david(a)redhat.com>
Reported-by: Jiri Slaby <jirislaby(a)kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger(a)applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-as…
Reported-by: Jacob Young <jacobly.alt(a)gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/fork.c b/kernel/fork.c
index b85814e614a5..d2e12b6d2b18 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -686,6 +686,7 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
for_each_vma(old_vmi, mpnt) {
struct file *file;
+ vma_start_write(mpnt);
if (mpnt->vm_flags & VM_DONTCOPY) {
vm_stat_account(mm, mpnt->vm_flags, -vma_pages(mpnt));
continue;
--
2.41.0.255.g8b1d071c50-goog
The patch titled
Subject: kasan: fix type cast in memory_is_poisoned_n
has been added to the -mm mm-hotfixes-unstable branch. Its filename is
kasan-fix-type-cast-in-memory_is_poisoned_n.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-hotfixes-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Andrey Konovalov <andreyknvl(a)google.com>
Subject: kasan: fix type cast in memory_is_poisoned_n
Date: Tue, 4 Jul 2023 02:52:05 +0200
Commit bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13
builtins") introduced a bug into the memory_is_poisoned_n implementation:
it effectively removed the cast to a signed integer type after applying
KASAN_GRANULE_MASK.
As a result, KASAN started failing to properly check memset, memcpy, and
other similar functions.
Fix the bug by adding the cast back (through an additional signed integer
variable to make the code more readable).
Link: https://lkml.kernel.org/r/8c9e0251c2b8b81016255709d4ec42942dcaf018.16884318…
Fixes: bb6e04a173f0 ("kasan: use internal prototypes matching gcc-13 builtins")
Signed-off-by: Andrey Konovalov <andreyknvl(a)google.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Andrey Ryabinin <ryabinin.a.a(a)gmail.com>
Cc: Arnd Bergmann <arnd(a)arndb.de>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Marco Elver <elver(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/generic.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
--- a/mm/kasan/generic.c~kasan-fix-type-cast-in-memory_is_poisoned_n
+++ a/mm/kasan/generic.c
@@ -130,9 +130,10 @@ static __always_inline bool memory_is_po
if (unlikely(ret)) {
const void *last_byte = addr + size - 1;
s8 *last_shadow = (s8 *)kasan_mem_to_shadow(last_byte);
+ s8 last_accessible_byte = (unsigned long)last_byte & KASAN_GRANULE_MASK;
if (unlikely(ret != (unsigned long)last_shadow ||
- (((long)last_byte & KASAN_GRANULE_MASK) >= *last_shadow)))
+ last_accessible_byte >= *last_shadow))
return true;
}
return false;
_
Patches currently in -mm which might be from andreyknvl(a)google.com are
kasan-fix-type-cast-in-memory_is_poisoned_n.patch
Hallow und wie geht es dir heute?
Ich möchte, dass Ihre Partnerschaft Sie als Subunternehmer
präsentiert, damit Sie in meinem Namen 8,6 Mio.
Überrechnungsvertragsfonds erhalten können, die wir zu 65 % und 35 %
aufteilen können.
Diese Transaktion ist 100 % risikofrei; Du brauchst keine Angst zu haben.
Bitte senden Sie mir eine E-Mail an (kmrs41786(a)gmail.com), um
ausführliche Informationen zu erhalten und bei Interesse zu erfahren,
wie wir dies gemeinsam bewältigen können.
Sie müssen es mir also weiterleiten
Ihr vollständiger Name.........................
Telefon.............
Mit freundlichen Grüße,
Herr Jimoh Oyebisi.
Hallow and how are you today?
I seek for your partnership to present you as a sub-contractor so
that you can receive 8.6M Over-Invoice contract fund on my behalf and
we can split it 65% 35%.
This transaction is 100% risk -free; you need not to be afraid.
Please email me at ( kmrs41786(a)gmail.com ) for comprehensive details
and how we can handle this together if interested.
So I need you to forward it to me
your full name.........................
Telephone.............
Kind Regards,
Mr. Jimoh Oyebisi.
The patch titled
Subject: memcg: drop kmem.limit_in_bytes
has been added to the -mm mm-unstable branch. Its filename is
memcg-drop-kmemlimit_in_bytes.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche…
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Michal Hocko <mhocko(a)suse.com>
Subject: memcg: drop kmem.limit_in_bytes
Date: Tue, 4 Jul 2023 13:52:40 +0200
kmem.limit_in_bytes (v1 way to limit kernel memory usage) has been
deprecated since 58056f77502f ("memcg, kmem: further deprecate
kmem.limit_in_bytes") merged in 5.16. We haven't heard about any serious
users since then but it seems that the mere presence of the file is
causing more harm than good. We (SUSE) have had several bug reports from
customers where Docker based containers started to fail because a write to
kmem.limit_in_bytes has failed.
This was unexpected because runc code only expects ENOENT (kmem disabled)
or EBUSY (tasks already running within cgroup). So a new error code was
unexpected and the whole container startup failed. This has been later
addressed by
https://github.com/opencontainers/runc/commit/52390d68040637dfc77f9fda6bbe7…
so current Docker runtimes do not suffer from the problem anymore. There
are still older version of Docker in use and likely hard to get rid of
completely.
Address this by wiping out the file completely and effectively get back to
pre 4.5 era and CONFIG_MEMCG_KMEM=n configuration.
I would recommend backporting to stable trees which have picked up
58056f77502f ("memcg, kmem: further deprecate kmem.limit_in_bytes").
Link: https://lkml.kernel.org/r/20230704115240.14672-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko(a)suse.com>
Cc: Johannes Weiner <hannes(a)cmpxchg.org>
Cc: Muchun Song <muchun.song(a)linux.dev>
Cc: Roman Gushchin <roman.gushchin(a)linux.dev>
Cc: Shakeel Butt <shakeelb(a)google.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/admin-guide/cgroup-v1/memory.rst | 2 --
mm/memcontrol.c | 13 -------------
2 files changed, 15 deletions(-)
--- a/Documentation/admin-guide/cgroup-v1/memory.rst~memcg-drop-kmemlimit_in_bytes
+++ a/Documentation/admin-guide/cgroup-v1/memory.rst
@@ -92,8 +92,6 @@ Brief summary of control files.
memory.oom_control set/show oom controls.
memory.numa_stat show the number of memory usage per numa
node
- memory.kmem.limit_in_bytes This knob is deprecated and writing to
- it will return -ENOTSUPP.
memory.kmem.usage_in_bytes show current kernel memory allocation
memory.kmem.failcnt show the number of kernel memory usage
hits limits
--- a/mm/memcontrol.c~memcg-drop-kmemlimit_in_bytes
+++ a/mm/memcontrol.c
@@ -3708,9 +3708,6 @@ static u64 mem_cgroup_read_u64(struct cg
case _MEMSWAP:
counter = &memcg->memsw;
break;
- case _KMEM:
- counter = &memcg->kmem;
- break;
case _TCP:
counter = &memcg->tcpmem;
break;
@@ -3871,10 +3868,6 @@ static ssize_t mem_cgroup_write(struct k
case _MEMSWAP:
ret = mem_cgroup_resize_max(memcg, nr_pages, true);
break;
- case _KMEM:
- /* kmem.limit_in_bytes is deprecated. */
- ret = -EOPNOTSUPP;
- break;
case _TCP:
ret = memcg_update_tcp_max(memcg, nr_pages);
break;
@@ -5086,12 +5079,6 @@ static struct cftype mem_cgroup_legacy_f
},
#endif
{
- .name = "kmem.limit_in_bytes",
- .private = MEMFILE_PRIVATE(_KMEM, RES_LIMIT),
- .write = mem_cgroup_write,
- .read_u64 = mem_cgroup_read_u64,
- },
- {
.name = "kmem.usage_in_bytes",
.private = MEMFILE_PRIVATE(_KMEM, RES_USAGE),
.read_u64 = mem_cgroup_read_u64,
_
Patches currently in -mm which might be from mhocko(a)suse.com are
memcg-drop-kmemlimit_in_bytes.patch