Portions of upstream commit a4055888629b ("mm/memcg: warning on !memcg
after readahead page charged") were backported as commit cfe575954ddd
("mm: add VM_WARN_ON_ONCE_PAGE() macro"). Unfortunately, the backport
did not account for the lack of commit 33def8498fdd ("treewide: Convert
macro and uses of __section(foo) to __section("foo")") in kernels prior
to 5.10, resulting in the following orphan section warnings on PowerPC
clang builds with CONFIG_DEBUG_VM=y:
powerpc64le-linux-gnu-ld: warning: orphan section `".data.once"' from `mm/huge_memory.o' being placed in section `".data.once"'
This is a difference between how clang and gcc handle macro
stringification, which was resolved for the kernel by not stringifying
the argument to the __section() macro. Since commit 59f89518f510 ("once:
fix section mismatch on clang builds") deemed that change unsuitable for
the stable kernels, do the same thing that commit did and remove the
quotes from the argument to __section().
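For reference, a rough sketch of the mechanics, assuming the pre-5.10 helper
in include/linux/compiler_attributes.h (paraphrased, not a verbatim copy):
/* Pre-5.10 kernels stringify the argument themselves: */
#define __section(S) __attribute__((__section__(#S)))
/*
 * With a quoted argument, clang keeps the quote characters as part of the
 * section name, which the linker then reports as an orphan (see the warning
 * above):
 */
static bool __section(".data.once") __warned_quoted;	/* section `".data.once"' */
/* With the bare name, as this patch uses, the section is `.data.once': */
static bool __section(.data.once) __warned_bare;
Kernels from 5.10 onward stop stringifying inside __section(), which is why
the upstream commit can quote the argument safely.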
Fixes: cfe575954ddd ("mm: add VM_WARN_ON_ONCE_PAGE() macro")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
---
As far as I can tell, this should be applied to 5.4 and earlier. It
should apply cleanly but let me know if not.
include/linux/mmdebug.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/mmdebug.h b/include/linux/mmdebug.h
index 5d0767cb424a..4ed52879ce55 100644
--- a/include/linux/mmdebug.h
+++ b/include/linux/mmdebug.h
@@ -38,7 +38,7 @@ void dump_mm(const struct mm_struct *mm);
} \
} while (0)
#define VM_WARN_ON_ONCE_PAGE(cond, page) ({ \
- static bool __section(".data.once") __warned; \
+ static bool __section(.data.once) __warned; \
int __ret_warn_once = !!(cond); \
\
if (unlikely(__ret_warn_once && !__warned)) { \
base-commit: 4d2a309b5c28a2edc0900542d22fec3a5a22243b
--
2.38.1
From: Keith Busch <kbusch@kernel.org>
commit 23e085b2dead13b51fe86d27069895b740f749c0 upstream.
The passthrough commands already have this restriction, but the other
operations do not. Require the same capabilities for all users as all of
these operations, which include resets and rescans, can be disruptive.
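For illustration, a minimal userspace sketch of the change from the caller's
side, assuming the caller can open the controller node (the path is only an
example) but lacks CAP_SYS_ADMIN:
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>
int main(void)
{
	int fd = open("/dev/nvme0", O_RDWR);	/* example controller node */
	if (fd < 0) {
		perror("open");
		return 1;
	}
	/* Previously any caller able to open the node could reset the
	 * controller; with this patch the ioctl fails with EACCES unless
	 * the caller has CAP_SYS_ADMIN. */
	if (ioctl(fd, NVME_IOCTL_RESET) < 0)
		printf("NVME_IOCTL_RESET: %s\n", strerror(errno));
	close(fd);
	return 0;
}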
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ovidiu Panait <ovidiu.panait@windriver.com>
---
These backports are for CVE-2022-3169.
drivers/nvme/host/core.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6627fb531f33..3b5e5fb158be 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -3027,11 +3027,17 @@ static long nvme_dev_ioctl(struct file *file, unsigned int cmd,
case NVME_IOCTL_IO_CMD:
return nvme_dev_user_cmd(ctrl, argp);
case NVME_IOCTL_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
dev_warn(ctrl->device, "resetting controller\n");
return nvme_reset_ctrl_sync(ctrl);
case NVME_IOCTL_SUBSYS_RESET:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
return nvme_reset_subsystem(ctrl);
case NVME_IOCTL_RESCAN:
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
nvme_queue_scan(ctrl);
return 0;
default:
--
2.38.1
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable@vger.kernel.org>.
Possible dependencies:
42fb0a1e84ff ("tracing/ring-buffer: Have polling block on watermark")
7e9fbbb1b776 ("ring-buffer: Add ring_buffer_wake_waiters()")
13292494379f ("tracing: Make struct ring_buffer less ambiguous")
1c5eb4481e01 ("tracing: Rename trace_buffer to array_buffer")
2d6425af6116 ("tracing: Declare newly exported APIs in include/linux/trace.h")
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 42fb0a1e84ff525ebe560e2baf9451ab69127e2b Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (Google)" <rostedt(a)goodmis.org>
Date: Thu, 20 Oct 2022 23:14:27 -0400
Subject: [PATCH] tracing/ring-buffer: Have polling block on watermark
Currently the way polling works on the ring buffer is broken. It will
return immediately if there's any data in the ring buffer whereas a read
will block until the watermark (defined by the tracefs buffer_percent file)
is hit.
That is, a select() or poll() will return as if there's data available,
but then the following read will block. This is broken for the way
select()s and poll()s are supposed to work.
Have the polling on the ring buffer also block the same way reads and
splice do on the ring buffer.
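A minimal userspace sketch of the mismatch, assuming tracefs is mounted at
/sys/kernel/tracing and buffer_percent is set above zero; before this change
poll() reports readiness as soon as any data is present, yet the read() that
follows still blocks until the watermark is reached:
#include <stdio.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
int main(void)
{
	/* example per-cpu raw ring-buffer file under tracefs */
	int fd = open("/sys/kernel/tracing/per_cpu/cpu0/trace_pipe_raw", O_RDONLY);
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char page[4096];
	if (fd < 0)
		return 1;
	if (poll(&pfd, 1, -1) > 0 && (pfd.revents & POLLIN)) {
		/* Pre-fix: POLLIN can be reported with only a little data
		 * buffered, but this read() still waits for buffer_percent
		 * of the ring buffer to be dirty.  Post-fix, poll() itself
		 * waits for the same watermark. */
		ssize_t n = read(fd, page, sizeof(page));
		printf("read %zd bytes\n", n);
	}
	close(fd);
	return 0;
}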
Link: https://lkml.kernel.org/r/20221020231427.41be3f26@gandalf.local.home
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Primiano Tucci <primiano@google.com>
Cc: stable@vger.kernel.org
Fixes: 1e0d6714aceb7 ("ring-buffer: Do not wake up a splice waiter when page is not full")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
diff --git a/include/linux/ring_buffer.h b/include/linux/ring_buffer.h
index 2504df9a0453..3c7d295746f6 100644
--- a/include/linux/ring_buffer.h
+++ b/include/linux/ring_buffer.h
@@ -100,7 +100,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table);
+ struct file *filp, poll_table *poll_table, int full);
void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
#define RING_BUFFER_ALL_CPUS -1
diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index 9712083832f4..089b1ec9cb3b 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -907,6 +907,21 @@ size_t ring_buffer_nr_dirty_pages(struct trace_buffer *buffer, int cpu)
return cnt - read;
}
+static __always_inline bool full_hit(struct trace_buffer *buffer, int cpu, int full)
+{
+ struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ size_t nr_pages;
+ size_t dirty;
+
+ nr_pages = cpu_buffer->nr_pages;
+ if (!nr_pages || !full)
+ return true;
+
+ dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+
+ return (dirty * 100) > (full * nr_pages);
+}
+
/*
* rb_wake_up_waiters - wake up tasks waiting for ring buffer input
*
@@ -1046,22 +1061,20 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
!ring_buffer_empty_cpu(buffer, cpu)) {
unsigned long flags;
bool pagebusy;
- size_t nr_pages;
- size_t dirty;
+ bool done;
if (!full)
break;
raw_spin_lock_irqsave(&cpu_buffer->reader_lock, flags);
pagebusy = cpu_buffer->reader_page == cpu_buffer->commit_page;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu);
+ done = !pagebusy && full_hit(buffer, cpu, full);
+
if (!cpu_buffer->shortest_full ||
cpu_buffer->shortest_full > full)
cpu_buffer->shortest_full = full;
raw_spin_unlock_irqrestore(&cpu_buffer->reader_lock, flags);
- if (!pagebusy &&
- (!nr_pages || (dirty * 100) > full * nr_pages))
+ if (done)
break;
}
@@ -1087,6 +1100,7 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* @cpu: the cpu buffer to wait on
* @filp: the file descriptor
* @poll_table: The poll descriptor
+ * @full: wait until the percentage of pages are available, if @cpu != RING_BUFFER_ALL_CPUS
*
* If @cpu == RING_BUFFER_ALL_CPUS then the task will wake up as soon
* as data is added to any of the @buffer's cpu buffers. Otherwise
@@ -1096,14 +1110,15 @@ int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full)
* zero otherwise.
*/
__poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
- struct file *filp, poll_table *poll_table)
+ struct file *filp, poll_table *poll_table, int full)
{
struct ring_buffer_per_cpu *cpu_buffer;
struct rb_irq_work *work;
- if (cpu == RING_BUFFER_ALL_CPUS)
+ if (cpu == RING_BUFFER_ALL_CPUS) {
work = &buffer->irq_work;
- else {
+ full = 0;
+ } else {
if (!cpumask_test_cpu(cpu, buffer->cpumask))
return -EINVAL;
@@ -1111,8 +1126,14 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
work = &cpu_buffer->irq_work;
}
- poll_wait(filp, &work->waiters, poll_table);
- work->waiters_pending = true;
+ if (full) {
+ poll_wait(filp, &work->full_waiters, poll_table);
+ work->full_waiters_pending = true;
+ } else {
+ poll_wait(filp, &work->waiters, poll_table);
+ work->waiters_pending = true;
+ }
+
/*
* There's a tight race between setting the waiters_pending and
* checking if the ring buffer is empty. Once the waiters_pending bit
@@ -1128,6 +1149,9 @@ __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
*/
smp_mb();
+ if (full)
+ return full_hit(buffer, cpu, full) ? EPOLLIN | EPOLLRDNORM : 0;
+
if ((cpu == RING_BUFFER_ALL_CPUS && !ring_buffer_empty(buffer)) ||
(cpu != RING_BUFFER_ALL_CPUS && !ring_buffer_empty_cpu(buffer, cpu)))
return EPOLLIN | EPOLLRDNORM;
@@ -3155,10 +3179,6 @@ static void rb_commit(struct ring_buffer_per_cpu *cpu_buffer,
static __always_inline void
rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
{
- size_t nr_pages;
- size_t dirty;
- size_t full;
-
if (buffer->irq_work.waiters_pending) {
buffer->irq_work.waiters_pending = false;
/* irq_work_queue() supplies it's own memory barriers */
@@ -3182,10 +3202,7 @@ rb_wakeups(struct trace_buffer *buffer, struct ring_buffer_per_cpu *cpu_buffer)
cpu_buffer->last_pages_touch = local_read(&cpu_buffer->pages_touched);
- full = cpu_buffer->shortest_full;
- nr_pages = cpu_buffer->nr_pages;
- dirty = ring_buffer_nr_dirty_pages(buffer, cpu_buffer->cpu);
- if (full && nr_pages && (dirty * 100) <= full * nr_pages)
+ if (!full_hit(buffer, cpu_buffer->cpu, cpu_buffer->shortest_full))
return;
cpu_buffer->irq_work.wakeup_full = true;
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 47a44b055a1d..c6c7a0af3ed2 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6681,7 +6681,7 @@ trace_poll(struct trace_iterator *iter, struct file *filp, poll_table *poll_tabl
return EPOLLIN | EPOLLRDNORM;
else
return ring_buffer_poll_wait(iter->array_buffer->buffer, iter->cpu_file,
- filp, poll_table);
+ filp, poll_table, iter->tr->buffer_percent);
}
static __poll_t
Commit be36f9e7517e ("efi: READ_ONCE rng seed size before munmap")
added a READ_ONCE() and also changed the call to
add_bootloader_randomness() to use the local size variable. Neither
of these changes was actually needed and this was not backported to
the 4.14 stable branch.
Commit 161a438d730d ("efi: random: reduce seed size to 32 bytes")
reverted the addition of READ_ONCE() and added a limit to the value of
size. This depends on the earlier commit, because size can now differ
from seed->size, but it was wrongly backported to the 4.14 stable
branch by itself.
Apply the missing change to the add_bootloader_randomness() parameter
(except that here we are still using add_device_randomness()).
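For context, a rough sketch of the data involved; the struct is paraphrased
from include/linux/efi.h, and the helper name and SEED_SIZE_LIMIT are
invented for illustration (the real clamp constant differs):
struct linux_efi_random_seed {
	u32	size;	/* written by the EFI stub/firmware; not trusted as-is */
	u8	bits[];
};
static void __init efi_seed_entropy_sketch(unsigned long rng_seed_phys, u32 size)
{
	/* "size" was read from the table header earlier and clamped to
	 * SEED_SIZE_LIMIT, so it can now be smaller than seed->size. */
	struct linux_efi_random_seed *seed =
		early_memremap(rng_seed_phys, sizeof(*seed) + size);
	if (seed != NULL) {
		/* Must use the clamped local: seed->size may exceed the
		 * number of bytes actually mapped just above. */
		add_device_randomness(seed->bits, size);
		early_memunmap(seed, sizeof(*seed) + size);
	}
}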
Fixes: 700485f70e50 ("efi: random: reduce seed size to 32 bytes")
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
drivers/firmware/efi/efi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/firmware/efi/efi.c b/drivers/firmware/efi/efi.c
index ed981f5e29ae..cc64869d8420 100644
--- a/drivers/firmware/efi/efi.c
+++ b/drivers/firmware/efi/efi.c
@@ -541,7 +541,7 @@ int __init efi_config_parse_tables(void *config_tables, int count, int sz,
seed = early_memremap(efi.rng_seed,
sizeof(*seed) + size);
if (seed != NULL) {
- add_device_randomness(seed->bits, seed->size);
+ add_device_randomness(seed->bits, size);
early_memunmap(seed, sizeof(*seed) + size);
pr_notice("seeding entropy pool\n");
} else {
Hello!
Would the stable maintainers please consider backporting the following
series of patches?:
https://lore.kernel.org/lkml/20220801013834.156015-1-andres@anarazel.de/
Branches where this is needed are:
* 5.4
* 5.10
* 5.15
Branch 6.0.y is fine, as this series is present there.
Failure is seen in this form:
-----8<----------8<----------8<-----
util/annotate.c: In function 'symbol__disassemble_bpf':
util/annotate.c:1739:9: error: too few arguments to function 'init_disassemble_info'
 1739 |         init_disassemble_info(&info, s,
      |         ^~~~~~~~~~~~~~~~~~~~~
In file included from util/annotate.c:1692:
/usr/include/dis-asm.h:472:13: note: declared here
  472 | extern void init_disassemble_info (struct disassemble_info *dinfo, void *stream,
      |             ^~~~~~~~~~~~~~~~~~~~~
make[4]: *** [/builds/linux/tools/build/Makefile.build:96: /home/tuxbuild/.cache/tuxmake/builds/1/build/util/annotate.o] Error 1
----->8---------->8---------->8-----
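The underlying cause is a libopcodes API change in recent binutils (2.39
era): dis-asm.h gained an extra styled-printing callback on
init_disassemble_info(), so perf's old three-argument call no longer
compiles. Roughly, paraphrased rather than quoted from dis-asm.h:
/* older binutils */
extern void init_disassemble_info(struct disassemble_info *dinfo, void *stream,
				  fprintf_ftype fprintf_func);
/* newer binutils -- the declaration the error above points at */
extern void init_disassemble_info(struct disassemble_info *dinfo, void *stream,
				  fprintf_ftype fprintf_func,
				  fprintf_styled_ftype fprintf_styled_func);
The requested series adds compatibility handling so the tools build against
either variant.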
The 5.15 backport is almost straight-forward, with patches 7 and 8
requiring some small modifications. I could not get the 5.10 backport
to build, and for 5.4 it was even more difficult to adapt.
Thanks and greetings!
Daniel Díaz
daniel.diaz@linaro.org
Hello!
The first patch fixes an issue reported by Sami, where Linux panic()s
when bringing secondary CPUs online. The problem was the Spectre
workarounds trying to allocate a new slot for mitigating KVM when
those pages are no longer writeable.
While debugging that issue, I spotted that the Spectre-BHB KVM mitigation
was overriding the Spectre-v2 KVM mitigation. It's supposed to happen the
other way round.
The backports aren't the same as mainline because the spectre mitigation code
was totally rewritten for v5.10, and prior to that the KVM infrastructure
is very different.
Thanks,
James Morse (2):
arm64: Fix panic() when Spectre-v2 causes Spectre-BHB to re-allocate
KVM vectors
arm64: errata: Fix KVM Spectre-v2 mitigation selection for
Cortex-A57/A72
arch/arm64/kernel/cpu_errata.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
--
2.30.2
Ives van Hoorne from codesandbox.io reported an issue regarding possible
data loss of uffd-wp when applied to memfds on heavily loaded systems. The
symptom is that some pages read back from the snapshot child VMs contain
mismatched data. I can also reproduce it with a Rust reproducer provided by
Ives that keeps taking snapshots of a 256MB VM; on a 32G system, starting 80
instances triggers the issue within ten minutes.
It turns out that writes to some pages went through even though uffd-wp was
applied to the pte.
The problem is that, when removing migration entries, we didn't really worry
about the write bit as long as we knew it was not a write migration entry.
That assumption may not hold: for some memory types (e.g. writable shmem),
mk_pte can return a pte with the write bit set, so to restore the migration
entry to its original state we need to explicitly wr-protect the pte, or a
read migration entry will come back with the write bit set. For uffd this
means writes can go through despite the write protection.
The relevant uffd code was introduced with the anon support in commit
f45ec5ff16a7 ("userfaultfd: wp: support swap and page migration",
2020-04-07). However, anon shouldn't suffer from this problem because anon
ptes should always have the write bit cleared already, so that commit may
not be a proper Fixes target; instead, the Fixes tag points at the uffd
shmem support.
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: stable@vger.kernel.org
Fixes: b1f9e876862d ("mm/uffd: enable write protection for shmem & hugetlbfs")
Reported-by: Ives van Hoorne <ives@codesandbox.io>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Tested-by: Ives van Hoorne <ives@codesandbox.io>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/migrate.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..8b6351c08c78 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -213,8 +213,14 @@ static bool remove_migration_pte(struct folio *folio,
pte = pte_mkdirty(pte);
if (is_writable_migration_entry(entry))
pte = maybe_mkwrite(pte, vma);
- else if (pte_swp_uffd_wp(*pvmw.pte))
+ else
+ /* NOTE: mk_pte can have write bit set */
+ pte = pte_wrprotect(pte);
+
+ if (pte_swp_uffd_wp(*pvmw.pte)) {
+ WARN_ON_ONCE(pte_write(pte));
pte = pte_mkuffd_wp(pte);
+ }
if (folio_test_anon(folio) && !is_readable_migration_entry(entry))
rmap_flags |= RMAP_EXCLUSIVE;
--
2.37.3