From: Stefan Hajnoczi <stefanha@redhat.com>
[ Upstream commit b8e0792449928943c15d1af9f63816911d139267 ]
Commit 4e0400525691 ("virtio-blk: support polling I/O") triggers the following gcc 13 W=1 warnings:
drivers/block/virtio_blk.c: In function ‘init_vq’:
drivers/block/virtio_blk.c:1077:68: warning: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size 7 [-Wformat-truncation=]
 1077 |                 snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
      |                                                                    ^~
drivers/block/virtio_blk.c:1077:58: note: directive argument in the range [-2147483648, 65534]
 1077 |                 snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
      |                                                          ^~~~~~~~~~~~~
drivers/block/virtio_blk.c:1077:17: note: ‘snprintf’ output between 11 and 21 bytes into a destination of size 16
 1077 |                 snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
      |                 ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is a false positive because the lower bound -2147483648 is incorrect. The true range of i is [0, num_vqs - 1] where 0 < num_vqs < 65536.
The code mixes int, unsigned short, and unsigned int types in addition to using "%d" for an unsigned value. Use unsigned short and "%u" consistently to solve the compiler warning.
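For illustration, here is a minimal user-space sketch of why the int/"%d" combination trips the warning while unsigned short/"%u" does not. This is not part of the patch: the helper names and the stand-alone program are hypothetical, and it assumes a gcc build with format-truncation warnings enabled (as in W=1 builds). The arithmetic is: "req_poll." is 9 bytes, an int can need up to 11 bytes for "%d", and with the terminating NUL that is up to 21 bytes for a 16-byte buffer, whereas an unsigned short printed with "%u" needs at most 5 digits (15 bytes total).

#include <stdio.h>

#define VQ_NAME_LEN 16

static char name[VQ_NAME_LEN];

/* int argument: gcc assumes "%d" may produce up to 11 characters, so
 * 9 + 11 + NUL = 21 bytes may not fit in 16 -> -Wformat-truncation fires. */
static void name_vq_int(int i)
{
	snprintf(name, VQ_NAME_LEN, "req_poll.%d", i);
}

/* unsigned short argument printed with "%u": at most 5 digits, so
 * 9 + 5 + NUL = 15 bytes always fit in the 16-byte buffer -> no warning. */
static void name_vq_ushort(unsigned short i)
{
	snprintf(name, VQ_NAME_LEN, "req_poll.%u", i);
}

int main(void)
{
	name_vq_int(3);
	name_vq_ushort(3);
	printf("%s\n", name);
	return 0;
}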
Cc: Suwan Kim <suwan.kim027@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312041509.DIyvEt9h-lkp@intel.com/
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20231204140743.1487843-1-stefanha@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/block/virtio_blk.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c
index 4a4b9bad551e8..225c86c74d4e9 100644
--- a/drivers/block/virtio_blk.c
+++ b/drivers/block/virtio_blk.c
@@ -1021,12 +1021,12 @@ static void virtblk_config_changed(struct virtio_device *vdev)
 static int init_vq(struct virtio_blk *vblk)
 {
 	int err;
-	int i;
+	unsigned short i;
 	vq_callback_t **callbacks;
 	const char **names;
 	struct virtqueue **vqs;
 	unsigned short num_vqs;
-	unsigned int num_poll_vqs;
+	unsigned short num_poll_vqs;
 	struct virtio_device *vdev = vblk->vdev;
 	struct irq_affinity desc = { 0, };
@@ -1070,13 +1070,13 @@ static int init_vq(struct virtio_blk *vblk)
 	for (i = 0; i < num_vqs - num_poll_vqs; i++) {
 		callbacks[i] = virtblk_done;
-		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%d", i);
+		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req.%u", i);
 		names[i] = vblk->vqs[i].name;
 	}
 	for (; i < num_vqs; i++) {
 		callbacks[i] = NULL;
-		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%d", i);
+		snprintf(vblk->vqs[i].name, VQ_NAME_LEN, "req_poll.%u", i);
 		names[i] = vblk->vqs[i].name;
 	}
From: Siddh Raman Pant <code@siddh.me>
[ Upstream commit 6ec0d7527c4287369b52df3bcefd21a0c4fb2b7c ]
Since the datagram cannot be sent once the socket is no longer bound (the state can be set to LLCP_CLOSED by nfc_llcp_socket_release()), there is no need to proceed further.
Thus, bail out early from llcp_sock_sendmsg().
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Suman Ghosh <sumang@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/nfc/llcp_sock.c | 5 +++++
 1 file changed, 5 insertions(+)
diff --git a/net/nfc/llcp_sock.c b/net/nfc/llcp_sock.c
index 645677f84dba2..819157bbb5a2c 100644
--- a/net/nfc/llcp_sock.c
+++ b/net/nfc/llcp_sock.c
@@ -796,6 +796,11 @@ static int llcp_sock_sendmsg(struct socket *sock, struct msghdr *msg,
 	}
 	if (sk->sk_type == SOCK_DGRAM) {
+		if (sk->sk_state != LLCP_BOUND) {
+			release_sock(sk);
+			return -ENOTCONN;
+		}
+
 		DECLARE_SOCKADDR(struct sockaddr_nfc_llcp *, addr, msg->msg_name);
From: Sarannya S <quic_sarannya@quicinc.com>
[ Upstream commit 9bf2e9165f90dc9f416af53c902be7e33930f728 ]
When a 'DEL_CLIENT' message is received from the remote, the corresponding server port gets deleted. A DEL_SERVER message is then announced for this server. As part of handling the subsequent DEL_SERVER message, the name server attempts to delete the server port, which results in a '-ENOENT' error. The return value from server_del() is then propagated back to qrtr_ns_worker, causing excessive error prints. To address this, return 0 from ctrl_cmd_del_server() without checking the return value of server_del(), since the above scenario is not an error case and server_del() has no other error return value.
Signed-off-by: Sarannya Sasikumar <quic_sarannya@quicinc.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 net/qrtr/ns.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/net/qrtr/ns.c b/net/qrtr/ns.c
index b1db0b519179b..abb0c70ffc8b0 100644
--- a/net/qrtr/ns.c
+++ b/net/qrtr/ns.c
@@ -512,7 +512,9 @@ static int ctrl_cmd_del_server(struct sockaddr_qrtr *from,
 	if (!node)
 		return -ENOENT;
-	return server_del(node, port, true);
+	server_del(node, port, true);
+
+	return 0;
 }
static int ctrl_cmd_new_lookup(struct sockaddr_qrtr *from,
From: wangkeqi <wangkeqiwang@didiglobal.com>
[ Upstream commit c46bfba1337d301661dbb23cfd905d4cb51f27ca ]
When we register a cn_proc listening event, the proc_event_num_listeners variable is incremented by one, but if PROC_CN_MCAST_IGNORE is never sent, the count never decreases. This causes the proc_*_connector functions to take the wrong path. The issue can be reproduced by exiting the forkstat tool via Ctrl-C. Solve this by checking whether there are still any listeners: if cn_netlink_send_mult() reports that none remain (-ESRCH), clear proc_event_num_listeners.
Signed-off-by: wangkeqi <wangkeqiwang@didiglobal.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/connector/cn_proc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/connector/cn_proc.c b/drivers/connector/cn_proc.c
index 44b19e6961763..3d5e6d705fc6e 100644
--- a/drivers/connector/cn_proc.c
+++ b/drivers/connector/cn_proc.c
@@ -108,8 +108,9 @@ static inline void send_msg(struct cn_msg *msg)
 		filter_data[1] = 0;
 	}
-	cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
-			     cn_filter, (void *)filter_data);
+	if (cn_netlink_send_mult(msg, msg->len, 0, CN_IDX_PROC, GFP_NOWAIT,
+				 cn_filter, (void *)filter_data) == -ESRCH)
+		atomic_set(&proc_event_num_listeners, 0);
 	local_unlock(&local_event.lock);
 }
From: Stefan Wahren <wahrenst@gmx.net>
[ Upstream commit 643fe70e7bcdcc9e2d96952f7fc2bab56385cce5 ]
of_property_match_string() returns an int: an index of 0 or greater on success, or a negative value on failure. Even though it is very unlikely that the DT CPU node contains multiple enable-methods, these checks should be fixed.
This patch was inspired by the work of Nick Desaulniers.
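For illustration, a minimal sketch of the corrected check pattern follows. The helper function and the particular enable-method strings are only illustrative, not code from this patch; the point is that 0 is a valid match index, so only negative values mean failure.

#include <linux/errno.h>
#include <linux/kernel.h>
#include <linux/of.h>

/* Illustrative helper: does the node list one of the supported methods? */
static bool has_supported_enable_method(const struct device_node *node)
{
	static const char *const methods[] = {
		"allwinner,sun9i-a80-smp",
		"allwinner,sun8i-a83t-smp",
	};
	int i, ret = -ENODEV;

	for (i = 0; i < ARRAY_SIZE(methods); i++) {
		ret = of_property_match_string(node, "enable-method",
					       methods[i]);
		if (ret >= 0)	/* ret is a match index; 0 is a valid hit */
			break;
	}

	/* A negative errno means the property is missing or nothing matched.
	 * Testing "if (!ret)" instead would misread index 0 as failure and
	 * any other index as success. */
	return ret >= 0;
}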
Link: https://lore.kernel.org/lkml/20230516-sunxi-v1-1-ac4b9651a8c1@google.com/T/
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Stefan Wahren <wahrenst@gmx.net>
Link: https://lore.kernel.org/r/20231228193903.9078-2-wahrenst@gmx.net
Reviewed-by: Chen-Yu Tsai <wens@csie.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/arm/mach-sunxi/mc_smp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/arch/arm/mach-sunxi/mc_smp.c b/arch/arm/mach-sunxi/mc_smp.c
index cb63921232a6f..a0e38050e7fad 100644
--- a/arch/arm/mach-sunxi/mc_smp.c
+++ b/arch/arm/mach-sunxi/mc_smp.c
@@ -803,14 +803,14 @@ static int __init sunxi_mc_smp_init(void)
 	for (i = 0; i < ARRAY_SIZE(sunxi_mc_smp_data); i++) {
 		ret = of_property_match_string(node, "enable-method",
 					       sunxi_mc_smp_data[i].enable_method);
-		if (!ret)
+		if (ret >= 0)
 			break;
 	}
is_a83t = sunxi_mc_smp_data[i].is_a83t;
 	of_node_put(node);
-	if (ret)
+	if (ret < 0)
 		return -ENODEV;
if (!sunxi_mc_smp_cpu_table_init())
From: Noah Goldstein <goldstein.w.n@gmail.com>
[ Upstream commit 5d4acb62853abac1da2deebcb1c1c5b79219bf3b ]
The special case for odd-aligned buffers is unnecessary and mostly just adds overhead. Aligned buffers are the expectation, and even for unaligned buffers, the only case that was helped is when the buffer is 1 byte away from word alignment, which is ~1/7 of the cases. Overall it seems highly unlikely to be worth the extra branch.
It was left in place in the previous perf improvement patch because I was erroneously comparing the exact output of `csum_partial(...)`, but really we only need `csum_fold(csum_partial(...))` to match, so it is safe to remove.
All csum kunit tests pass.
Signed-off-by: Noah Goldstein <goldstein.w.n@gmail.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Laight <david.laight@aculab.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/lib/csum-partial_64.c | 36 ++++------------------------------
 1 file changed, 4 insertions(+), 32 deletions(-)
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index cea25ca8b8cf6..557e42ede68ec 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -11,26 +11,9 @@
 #include <asm/checksum.h>
 #include <asm/word-at-a-time.h>
-static inline unsigned short from32to16(unsigned a)
+static inline __wsum csum_finalize_sum(u64 temp64)
 {
-	unsigned short b = a >> 16;
-	asm("addw %w2,%w0\n\t"
-	    "adcw $0,%w0\n"
-	    : "=r" (b)
-	    : "0" (b), "r" (a));
-	return b;
-}
-
-static inline __wsum csum_tail(u64 temp64, int odd)
-{
-	unsigned int result;
-
-	result = add32_with_carry(temp64 >> 32, temp64 & 0xffffffff);
-	if (unlikely(odd)) {
-		result = from32to16(result);
-		result = ((result >> 8) & 0xff) | ((result & 0xff) << 8);
-	}
-	return (__force __wsum)result;
+	return (__force __wsum)((temp64 + ror64(temp64, 32)) >> 32);
 }
 /*
@@ -47,17 +30,6 @@ static inline __wsum csum_tail(u64 temp64, int odd)
 __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	u64 temp64 = (__force u64)sum;
-	unsigned odd;
-
-	odd = 1 & (unsigned long) buff;
-	if (unlikely(odd)) {
-		if (unlikely(len == 0))
-			return sum;
-		temp64 = ror32((__force u32)sum, 8);
-		temp64 += (*(unsigned char *)buff << 8);
-		len--;
-		buff++;
-	}
 	/*
 	 * len == 40 is the hot case due to IPv6 headers, but annotating it likely()
@@ -73,7 +45,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 		    "adcq $0,%[res]"
 		    : [res] "+r"(temp64)
 		    : [src] "r"(buff), "m"(*(const char(*)[40])buff));
-		return csum_tail(temp64, odd);
+		return csum_finalize_sum(temp64);
 	}
 	if (unlikely(len >= 64)) {
 		/*
@@ -143,7 +115,7 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 		    : [res] "+r"(temp64)
 		    : [trail] "r"(trail));
 	}
-	return csum_tail(temp64, odd);
+	return csum_finalize_sum(temp64);
 }
 EXPORT_SYMBOL(csum_partial);
From: Linus Torvalds <torvalds@linux-foundation.org>
[ Upstream commit a476aae3f1dc78a162a0d2e7945feea7d2b29401 ]
Commit 688eb8191b47 ("x86/csum: Improve performance of `csum_partial`") ended up improving the code generation for the IP csum calculations, and in particular special-casing the 40-byte case that is a hot case for IPv6 headers.
It then had _another_ special case for the 64-byte unrolled loop, which did two chains of 32-byte blocks, allowing modern CPUs to improve performance by doing the chains in parallel thanks to renaming the carry flag.
This just unifies the special cases and combines them into a single helper for the 40-byte csum case, replacing the 64-byte case with an 80-byte case that simply calls that helper twice. It avoids having all these different versions of inline assembly, and actually improved performance further in my tests.
There was never anything magical about the 64-byte unrolled case, even though it happens to be a common size (and typically is the cacheline size).
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 arch/x86/lib/csum-partial_64.c | 81 ++++++++++++++++------------------
 1 file changed, 37 insertions(+), 44 deletions(-)
diff --git a/arch/x86/lib/csum-partial_64.c b/arch/x86/lib/csum-partial_64.c
index 557e42ede68ec..c9dae65ac01b5 100644
--- a/arch/x86/lib/csum-partial_64.c
+++ b/arch/x86/lib/csum-partial_64.c
@@ -16,6 +16,20 @@ static inline __wsum csum_finalize_sum(u64 temp64)
 	return (__force __wsum)((temp64 + ror64(temp64, 32)) >> 32);
 }
+static inline unsigned long update_csum_40b(unsigned long sum, const unsigned long m[5])
+{
+	asm("addq %1,%0\n\t"
+	    "adcq %2,%0\n\t"
+	    "adcq %3,%0\n\t"
+	    "adcq %4,%0\n\t"
+	    "adcq %5,%0\n\t"
+	    "adcq $0,%0"
+		:"+r" (sum)
+		:"m" (m[0]), "m" (m[1]), "m" (m[2]),
+		 "m" (m[3]), "m" (m[4]));
+	return sum;
+}
+
 /*
  * Do a checksum on an arbitrary memory area.
  * Returns a 32bit checksum.
@@ -31,52 +45,31 @@ __wsum csum_partial(const void *buff, int len, __wsum sum)
 {
 	u64 temp64 = (__force u64)sum;
-	/*
-	 * len == 40 is the hot case due to IPv6 headers, but annotating it likely()
-	 * has noticeable negative affect on codegen for all other cases with
-	 * minimal performance benefit here.
-	 */
-	if (len == 40) {
-		asm("addq 0*8(%[src]),%[res]\n\t"
-		    "adcq 1*8(%[src]),%[res]\n\t"
-		    "adcq 2*8(%[src]),%[res]\n\t"
-		    "adcq 3*8(%[src]),%[res]\n\t"
-		    "adcq 4*8(%[src]),%[res]\n\t"
-		    "adcq $0,%[res]"
-		    : [res] "+r"(temp64)
-		    : [src] "r"(buff), "m"(*(const char(*)[40])buff));
-		return csum_finalize_sum(temp64);
+	/* Do two 40-byte chunks in parallel to get better ILP */
+	if (likely(len >= 80)) {
+		u64 temp64_2 = 0;
+		do {
+			temp64 = update_csum_40b(temp64, buff);
+			temp64_2 = update_csum_40b(temp64_2, buff + 40);
+			buff += 80;
+			len -= 80;
+		} while (len >= 80);
+
+		asm("addq %1,%0\n\t"
+		    "adcq $0,%0"
+			:"+r" (temp64): "r" (temp64_2));
 	}
-	if (unlikely(len >= 64)) {
-		/*
-		 * Extra accumulators for better ILP in the loop.
-		 */
-		u64 tmp_accum, tmp_carries;
- asm("xorl %k[tmp_accum],%k[tmp_accum]\n\t" - "xorl %k[tmp_carries],%k[tmp_carries]\n\t" - "subl $64, %[len]\n\t" - "1:\n\t" - "addq 0*8(%[src]),%[res]\n\t" - "adcq 1*8(%[src]),%[res]\n\t" - "adcq 2*8(%[src]),%[res]\n\t" - "adcq 3*8(%[src]),%[res]\n\t" - "adcl $0,%k[tmp_carries]\n\t" - "addq 4*8(%[src]),%[tmp_accum]\n\t" - "adcq 5*8(%[src]),%[tmp_accum]\n\t" - "adcq 6*8(%[src]),%[tmp_accum]\n\t" - "adcq 7*8(%[src]),%[tmp_accum]\n\t" - "adcl $0,%k[tmp_carries]\n\t" - "addq $64, %[src]\n\t" - "subl $64, %[len]\n\t" - "jge 1b\n\t" - "addq %[tmp_accum],%[res]\n\t" - "adcq %[tmp_carries],%[res]\n\t" - "adcq $0,%[res]" - : [tmp_accum] "=&r"(tmp_accum), - [tmp_carries] "=&r"(tmp_carries), [res] "+r"(temp64), - [len] "+r"(len), [src] "+r"(buff) - : "m"(*(const char *)buff)); + /* + * len == 40 is the hot case due to IPv6 headers, so return + * early for that exact case without checking the tail bytes. + */ + if (len >= 40) { + temp64 = update_csum_40b(temp64, buff); + len -= 40; + if (!len) + return csum_finalize_sum(temp64); + buff += 40; }
if (len & 32) {
From: Dave Airlie <airlied@gmail.com>
[ Upstream commit 7854ea0e408d7f2e8faaada1773f3ddf9cb538f5 ]
This func pointer is normally statically allocated, but GSP r535 uses a dynamically allocated one, so we need to handle that better.
This fixes a crash with GSP when you use config=disp=0 to avoid disp problems.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231222043308.3090089-4-airli...
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c b/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
index 73104b59f97fe..54a3017f4756a 100644
--- a/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
+++ b/drivers/gpu/drm/nouveau/nvkm/engine/disp/base.c
@@ -276,7 +276,7 @@ nvkm_disp_oneinit(struct nvkm_engine *engine)
 		list_add_tail(&outp->conn->head, &disp->conns);
 	}
-	if (disp->func->oneinit) {
+	if (disp->func && disp->func->oneinit) {
 		ret = disp->func->oneinit(disp);
 		if (ret)
 			return ret;
@@ -377,8 +377,10 @@ nvkm_disp_new_(const struct nvkm_disp_func *func, struct nvkm_device *device,
 	spin_lock_init(&disp->client.lock);
 	ret = nvkm_engine_ctor(&nvkm_disp, device, type, inst, true, &disp->engine);
-	if (ret)
+	if (ret) {
+		disp->func = NULL;
 		return ret;
+	}
 	if (func->super) {
 		disp->super.wq = create_singlethread_workqueue("nvkm-disp");