Currently, when device mtu is updated, vmxnet3 updates netdev mtu, quiesces
the device and then reactivates it for the ESXi to know about the new mtu.
So, technically the OS stack can start using the new mtu before ESXi knows
about the new mtu.
This can lead to issues for TSO packets which use mss as per the new mtu
configured. This patch fixes this issue by moving the mtu write after
device quiesce.
Cc: stable(a)vger.kernel.org
Fixes: d1a890fa37f2 ("net: VMware virtual Ethernet NIC driver: vmxnet3")
Signed-off-by: Ronak Doshi <ronak.doshi(a)broadcom.com>
Acked-by: Guolin Yang <guolin.yang(a)broadcom.com>
Changes v1-> v2:
Moved MTU write after destroy of rx rings
---
drivers/net/vmxnet3/vmxnet3_drv.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c
index 3df6aabc7e33..c676979c7ab9 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -3607,8 +3607,6 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
struct vmxnet3_adapter *adapter = netdev_priv(netdev);
int err = 0;
- WRITE_ONCE(netdev->mtu, new_mtu);
-
/*
* Reset_work may be in the middle of resetting the device, wait for its
* completion.
@@ -3622,6 +3620,7 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
/* we need to re-create the rx queue based on the new mtu */
vmxnet3_rq_destroy_all(adapter);
+ WRITE_ONCE(netdev->mtu, new_mtu);
vmxnet3_adjust_rx_ring_size(adapter);
err = vmxnet3_rq_create_all(adapter);
if (err) {
@@ -3638,6 +3637,8 @@ vmxnet3_change_mtu(struct net_device *netdev, int new_mtu)
"Closing it\n", err);
goto out;
}
+ } else {
+ WRITE_ONCE(netdev->mtu, new_mtu);
}
out:
--
2.45.2
Add the missing memory barrier to make sure that the REO dest ring
descriptor is read after the head pointer to avoid using stale data on
weakly ordered architectures like aarch64.
This may fix the ring-buffer corruption worked around by commit
f9fff67d2d7c ("wifi: ath11k: Fix SKB corruption in REO destination
ring") by silently discarding data, and may possibly also address user
reported errors like:
ath11k_pci 0006:01:00.0: msdu_done bit in attention is not set
Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Cc: stable(a)vger.kernel.org # 5.6
Link: https://bugzilla.kernel.org/show_bug.cgi?id=218005
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
As I reported here:
https://lore.kernel.org/lkml/Z9G5zEOcTdGKm7Ei@hovoldconsulting.com/
the ath11k and ath12k appear to be missing a number of memory barriers
that are required on weakly ordered architectures like aarch64 to avoid
memory corruption issues.
Here's a fix for one more such case which people already seem to be
hitting.
Note that I've seen one "msdu_done" bit not set warning also with this
patch so whether it helps with that at all remains to be seen. I'm CCing
Jens and Steev that see these warnings frequently and that may be able
to help out with testing.
Johan
drivers/net/wireless/ath/ath11k/dp_rx.c | 25 ++++++++++++++++---------
1 file changed, 16 insertions(+), 9 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/dp_rx.c b/drivers/net/wireless/ath/ath11k/dp_rx.c
index 029ecf51c9ef..0a57b337e4c6 100644
--- a/drivers/net/wireless/ath/ath11k/dp_rx.c
+++ b/drivers/net/wireless/ath/ath11k/dp_rx.c
@@ -2646,7 +2646,7 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
struct ath11k *ar;
struct hal_reo_dest_ring *desc;
enum hal_reo_dest_ring_push_reason push_reason;
- u32 cookie;
+ u32 cookie, info0, rx_msdu_info0, rx_mpdu_info0;
int i;
for (i = 0; i < MAX_RADIOS; i++)
@@ -2659,11 +2659,14 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
try_again:
ath11k_hal_srng_access_begin(ab, srng);
+ /* Make sure descriptor is read after the head pointer. */
+ dma_rmb();
+
while (likely(desc =
(struct hal_reo_dest_ring *)ath11k_hal_srng_dst_get_next_entry(ab,
srng))) {
cookie = FIELD_GET(BUFFER_ADDR_INFO1_SW_COOKIE,
- desc->buf_addr_info.info1);
+ READ_ONCE(desc->buf_addr_info.info1));
buf_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_BUF_ID,
cookie);
mac_id = FIELD_GET(DP_RXDMA_BUF_COOKIE_PDEV_ID, cookie);
@@ -2692,8 +2695,9 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
num_buffs_reaped[mac_id]++;
+ info0 = READ_ONCE(desc->info0);
push_reason = FIELD_GET(HAL_REO_DEST_RING_INFO0_PUSH_REASON,
- desc->info0);
+ info0);
if (unlikely(push_reason !=
HAL_REO_DEST_RING_PUSH_REASON_ROUTING_INSTRUCTION)) {
dev_kfree_skb_any(msdu);
@@ -2701,18 +2705,21 @@ int ath11k_dp_process_rx(struct ath11k_base *ab, int ring_id,
continue;
}
- rxcb->is_first_msdu = !!(desc->rx_msdu_info.info0 &
+ rx_msdu_info0 = READ_ONCE(desc->rx_msdu_info.info0);
+ rx_mpdu_info0 = READ_ONCE(desc->rx_mpdu_info.info0);
+
+ rxcb->is_first_msdu = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_FIRST_MSDU_IN_MPDU);
- rxcb->is_last_msdu = !!(desc->rx_msdu_info.info0 &
+ rxcb->is_last_msdu = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_LAST_MSDU_IN_MPDU);
- rxcb->is_continuation = !!(desc->rx_msdu_info.info0 &
+ rxcb->is_continuation = !!(rx_msdu_info0 &
RX_MSDU_DESC_INFO0_MSDU_CONTINUATION);
rxcb->peer_id = FIELD_GET(RX_MPDU_DESC_META_DATA_PEER_ID,
- desc->rx_mpdu_info.meta_data);
+ READ_ONCE(desc->rx_mpdu_info.meta_data));
rxcb->seq_no = FIELD_GET(RX_MPDU_DESC_INFO0_SEQ_NUM,
- desc->rx_mpdu_info.info0);
+ rx_mpdu_info0);
rxcb->tid = FIELD_GET(HAL_REO_DEST_RING_INFO0_RX_QUEUE_NUM,
- desc->info0);
+ info0);
rxcb->mac_id = mac_id;
__skb_queue_tail(&msdu_list[mac_id], msdu);
--
2.48.1
Users of the Lenovo ThinkPad X13s have reported that Wi-Fi sometimes
breaks and the log fills up with errors like:
ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1484, expected 1492
ath11k_pci 0006:01:00.0: HTC Rx: insufficient length, got 1460, expected 1484
which based on a quick look at the driver seemed to indicate some kind
of ring-buffer corruption.
Miaoqing Pan tracked it down to the host seeing the updated destination
ring head pointer before the updated descriptor, and the error handling
for that in turn leaves the ring buffer in an inconsistent state.
Add the missing memory barrier to make sure that the descriptor is read
after the head pointer to address the root cause of the corruption while
fixing up the error handling in case there are ever any (ordering) bugs
on the device side.
Note that the READ_ONCE() are only needed to avoid compiler mischief in
case the ring-buffer helpers are ever inlined.
Tested-on: WCN6855 hw2.1 WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Fixes: d5c65159f289 ("ath11k: driver for Qualcomm IEEE 802.11ax devices")
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218623
Link: https://lore.kernel.org/20250310010217.3845141-3-quic_miaoqing@quicinc.com
Cc: Miaoqing Pan <quic_miaoqing(a)quicinc.com>
Cc: stable(a)vger.kernel.org # 5.6
Signed-off-by: Johan Hovold <johan+linaro(a)kernel.org>
---
drivers/net/wireless/ath/ath11k/ce.c | 11 +++++------
drivers/net/wireless/ath/ath11k/hal.c | 4 ++--
2 files changed, 7 insertions(+), 8 deletions(-)
diff --git a/drivers/net/wireless/ath/ath11k/ce.c b/drivers/net/wireless/ath/ath11k/ce.c
index e66e86bdec20..9d8efec46508 100644
--- a/drivers/net/wireless/ath/ath11k/ce.c
+++ b/drivers/net/wireless/ath/ath11k/ce.c
@@ -393,11 +393,10 @@ static int ath11k_ce_completed_recv_next(struct ath11k_ce_pipe *pipe,
goto err;
}
+ /* Make sure descriptor is read after the head pointer. */
+ dma_rmb();
+
*nbytes = ath11k_hal_ce_dst_status_get_length(desc);
- if (*nbytes == 0) {
- ret = -EIO;
- goto err;
- }
*skb = pipe->dest_ring->skb[sw_index];
pipe->dest_ring->skb[sw_index] = NULL;
@@ -430,8 +429,8 @@ static void ath11k_ce_recv_process_cb(struct ath11k_ce_pipe *pipe)
dma_unmap_single(ab->dev, ATH11K_SKB_RXCB(skb)->paddr,
max_nbytes, DMA_FROM_DEVICE);
- if (unlikely(max_nbytes < nbytes)) {
- ath11k_warn(ab, "rxed more than expected (nbytes %d, max %d)",
+ if (unlikely(max_nbytes < nbytes || nbytes == 0)) {
+ ath11k_warn(ab, "unexpected rx length (nbytes %d, max %d)",
nbytes, max_nbytes);
dev_kfree_skb_any(skb);
continue;
diff --git a/drivers/net/wireless/ath/ath11k/hal.c b/drivers/net/wireless/ath/ath11k/hal.c
index 61f4b6dd5380..8cb1505a5a0c 100644
--- a/drivers/net/wireless/ath/ath11k/hal.c
+++ b/drivers/net/wireless/ath/ath11k/hal.c
@@ -599,7 +599,7 @@ u32 ath11k_hal_ce_dst_status_get_length(void *buf)
struct hal_ce_srng_dst_status_desc *desc = buf;
u32 len;
- len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, desc->flags);
+ len = FIELD_GET(HAL_CE_DST_STATUS_DESC_FLAGS_LEN, READ_ONCE(desc->flags));
desc->flags &= ~HAL_CE_DST_STATUS_DESC_FLAGS_LEN;
return len;
@@ -829,7 +829,7 @@ void ath11k_hal_srng_access_begin(struct ath11k_base *ab, struct hal_srng *srng)
srng->u.src_ring.cached_tp =
*(volatile u32 *)srng->u.src_ring.tp_addr;
} else {
- srng->u.dst_ring.cached_hp = *srng->u.dst_ring.hp_addr;
+ srng->u.dst_ring.cached_hp = READ_ONCE(*srng->u.dst_ring.hp_addr);
/* Try to prefetch the next descriptor in the ring */
if (srng->flags & HAL_SRNG_FLAGS_CACHED)
--
2.48.1
Embryo socket is not queued in gc_candidates, so we can't drop
a reference held by its oob_skb.
Let's say we create listener and embryo sockets, send the
listener's fd to the embryo as OOB data, and close() them
without recv()ing the OOB data.
There is a self-reference cycle like
listener -> embryo.oob_skb -> listener
, so this must be cleaned up by GC. Otherwise, the listener's
refcnt is not released and sockets are leaked:
# unshare -n
# cat /proc/net/protocols | grep UNIX-STREAM
UNIX-STREAM 1024 0 -1 NI 0 yes kernel ...
# python3
>>> from array import array
>>> from socket import *
>>>
>>> s = socket(AF_UNIX, SOCK_STREAM)
>>> s.bind('\0test\0')
>>> s.listen()
>>>
>>> c = socket(AF_UNIX, SOCK_STREAM)
>>> c.connect(s.getsockname())
>>> c.sendmsg([b'x'], [(SOL_SOCKET, SCM_RIGHTS, array('i', [s.fileno()]))], MSG_OOB)
1
>>> quit()
# cat /proc/net/protocols | grep UNIX-STREAM
UNIX-STREAM 1024 3 -1 NI 0 yes kernel ...
^^^
3 sockets still in use after FDs are close()d
Let's drop the embryo socket's oob_skb ref in scan_inflight().
This also fixes a racy access to oob_skb that commit 9841991a446c
("af_unix: Update unix_sk(sk)->oob_skb under sk_receive_queue
lock.") fixed for the new Tarjan's algo-based GC.
Fixes: 314001f0bf92 ("af_unix: Add OOB support")
Reported-by: Lei Lu <llfamsec(a)gmail.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu(a)amazon.com>
---
This has no upstream commit because I replaced the entire GC in
6.10 and the new GC does not have this bug, and this fix is only
applicable to the old GC (<= 6.9), thus for 5.15/6.1/6.6.
---
---
net/unix/garbage.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/net/unix/garbage.c b/net/unix/garbage.c
index 2a758531e102..b3fbdf129944 100644
--- a/net/unix/garbage.c
+++ b/net/unix/garbage.c
@@ -102,13 +102,14 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
/* Process the descriptors of this socket */
int nfd = UNIXCB(skb).fp->count;
struct file **fp = UNIXCB(skb).fp->fp;
+ struct unix_sock *u;
while (nfd--) {
/* Get the socket the fd matches if it indeed does so */
struct sock *sk = unix_get_socket(*fp++);
if (sk) {
- struct unix_sock *u = unix_sk(sk);
+ u = unix_sk(sk);
/* Ignore non-candidates, they could
* have been added to the queues after
@@ -122,6 +123,13 @@ static void scan_inflight(struct sock *x, void (*func)(struct unix_sock *),
}
}
if (hit && hitlist != NULL) {
+#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
+ u = unix_sk(x);
+ if (u->oob_skb) {
+ WARN_ON_ONCE(skb_unref(u->oob_skb));
+ u->oob_skb = NULL;
+ }
+#endif
__skb_unlink(skb, &x->sk_receive_queue);
__skb_queue_tail(hitlist, skb);
}
@@ -299,17 +307,9 @@ void unix_gc(void)
* which are creating the cycle(s).
*/
skb_queue_head_init(&hitlist);
- list_for_each_entry(u, &gc_candidates, link) {
+ list_for_each_entry(u, &gc_candidates, link)
scan_children(&u->sk, inc_inflight, &hitlist);
-#if IS_ENABLED(CONFIG_AF_UNIX_OOB)
- if (u->oob_skb) {
- kfree_skb(u->oob_skb);
- u->oob_skb = NULL;
- }
-#endif
- }
-
/* not_cycle_list contains those sockets which do not make up a
* cycle. Restore these to the inflight list.
*/
--
2.39.5 (Apple Git-154)
From: Amir Goldstein <amir73il(a)gmail.com>
commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream.
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
fs/notify/fdinfo.c | 4 +---
fs/overlayfs/copy_up.c | 5 ++---
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 55081ae3a6ec..dd5bc6ffae85 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f.handle.handle_bytes >> 2;
ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f.handle.handle_type = ret;
f.handle.handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 65ac504595ba..8557180bd7e1 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -302,9 +302,8 @@ struct ovl_fh *ovl_encode_real_fh(struct dentry *real, bool is_upper)
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
--
2.34.1
From: Amir Goldstein <amir73il(a)gmail.com>
commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream.
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
fs/notify/fdinfo.c | 4 +---
fs/overlayfs/copy_up.c | 5 ++---
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 55081ae3a6ec..dd5bc6ffae85 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f.handle.handle_bytes >> 2;
ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f.handle.handle_type = ret;
f.handle.handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 203b88293f6b..ced56696beeb 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -361,9 +361,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
--
2.34.1
From: Amir Goldstein <amir73il(a)gmail.com>
commit 974e3fe0ac61de85015bbe5a4990cf4127b304b2 upstream.
Encoding file handles is usually performed by a filesystem >encode_fh()
method that may fail for various reasons.
The legacy users of exportfs_encode_fh(), namely, nfsd and
name_to_handle_at(2) syscall are ready to cope with the possibility
of failure to encode a file handle.
There are a few other users of exportfs_encode_{fh,fid}() that
currently have a WARN_ON() assertion when ->encode_fh() fails.
Relax those assertions because they are wrong.
The second linked bug report states commit 16aac5ad1fa9 ("ovl: support
encoding non-decodable file handles") in v6.6 as the regressing commit,
but this is not accurate.
The aforementioned commit only increases the chances of the assertion
and allows triggering the assertion with the reproducer using overlayfs,
inotify and drop_caches.
Triggering this assertion was always possible with other filesystems and
other reasons of ->encode_fh() failures and more particularly, it was
also possible with the exact same reproducer using overlayfs that is
mounted with options index=on,nfs_export=on also on kernels < v6.6.
Therefore, I am not listing the aforementioned commit as a Fixes commit.
Backport hint: this patch will have a trivial conflict applying to
v6.6.y, and other trivial conflicts applying to stable kernels < v6.6.
Reported-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Tested-by: syzbot+ec07f6f5ce62b858579f(a)syzkaller.appspotmail.com
Closes: https://lore.kernel.org/linux-unionfs/671fd40c.050a0220.4735a.024f.GAE@goog…
Reported-by: Dmitry Safonov <dima(a)arista.com>
Closes: https://lore.kernel.org/linux-fsdevel/CAGrbwDTLt6drB9eaUagnQVgdPBmhLfqqxAf3…
Cc: stable(a)vger.kernel.org
Signed-off-by: Amir Goldstein <amir73il(a)gmail.com>
Link: https://lore.kernel.org/r/20241219115301.465396-1-amir73il@gmail.com
Signed-off-by: Christian Brauner <brauner(a)kernel.org>
[Minor conflict resolved due to code context change.]
Signed-off-by: Jianqi Ren <jianqi.ren.cn(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified the build test
---
fs/notify/fdinfo.c | 4 +---
fs/overlayfs/copy_up.c | 5 ++---
2 files changed, 3 insertions(+), 6 deletions(-)
diff --git a/fs/notify/fdinfo.c b/fs/notify/fdinfo.c
index 55081ae3a6ec..dd5bc6ffae85 100644
--- a/fs/notify/fdinfo.c
+++ b/fs/notify/fdinfo.c
@@ -51,10 +51,8 @@ static void show_mark_fhandle(struct seq_file *m, struct inode *inode)
size = f.handle.handle_bytes >> 2;
ret = exportfs_encode_inode_fh(inode, (struct fid *)f.handle.f_handle, &size, NULL);
- if ((ret == FILEID_INVALID) || (ret < 0)) {
- WARN_ONCE(1, "Can't encode file handler for inotify: %d\n", ret);
+ if ((ret == FILEID_INVALID) || (ret < 0))
return;
- }
f.handle.handle_type = ret;
f.handle.handle_bytes = size * sizeof(u32);
diff --git a/fs/overlayfs/copy_up.c b/fs/overlayfs/copy_up.c
index 5fc32483afed..c5d8ad610a37 100644
--- a/fs/overlayfs/copy_up.c
+++ b/fs/overlayfs/copy_up.c
@@ -366,9 +366,8 @@ struct ovl_fh *ovl_encode_real_fh(struct ovl_fs *ofs, struct dentry *real,
buflen = (dwords << 2);
err = -EIO;
- if (WARN_ON(fh_type < 0) ||
- WARN_ON(buflen > MAX_HANDLE_SZ) ||
- WARN_ON(fh_type == FILEID_INVALID))
+ if (fh_type < 0 || fh_type == FILEID_INVALID ||
+ WARN_ON(buflen > MAX_HANDLE_SZ))
goto out_err;
fh->fb.version = OVL_FH_VERSION;
--
2.34.1
Hi ,
I'm offering a resource that connects aviation outreach with data-backed direction.
We provide a current and verified list of contacts, tailored specifically for your industry.
(i) High-Net-Worth Individuals (HNWI) seeking seamless luxury travel and private charters
(ii) MRO Professionals focused on enhancing maintenance and procurement strategies
(iii) Executive Assistants managing the travel needs of high-ranking executives
For businesses offering aviation products, maintenance services, or charter flights, these contacts are ideal for your campaign.
Please let me know if you'd like to explore the lead counts and their pricing structure.
Regards,
Jessica
Marketing Manager
Campaign Data Leads.,
Please respond with an Remove if you don't wish to receive further emails.
A warning is emitted in set_return_thunk() when the return thunk is
overwritten since this is likely a bug and will result in mitigations not
functioning and the mitigation information displayed in sysfs being
incorrect.
There is a special case when the return thunk is overwritten from
retbleed_return_thunk to srso_return_thunk since srso_return_thunk provides
a superset of the functionality of retbleed_return_thunk, and this is
handled correctly in entry_untrain_ret(). Avoid emitting the warning in
this scenario to clarify that this is not an issue.
This situation occurs on certain AMD processors (e.g. Zen2) which are
affected by both retbleed and srso.
Fixes: f4818881c47fd ("x86/its: Enable Indirect Target Selection mitigation")
Cc: stable(a)vger.kernel.org # 5.15.x-
Signed-off-by: Suraj Jitindar Singh <surajjs(a)amazon.com>
---
arch/x86/kernel/cpu/bugs.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index 8596ce85026c..b7797636140f 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -69,7 +69,16 @@ void (*x86_return_thunk)(void) __ro_after_init = __x86_return_thunk;
static void __init set_return_thunk(void *thunk)
{
- if (x86_return_thunk != __x86_return_thunk)
+ /*
+ * There can only be one return thunk enabled at a time, so issue a
+ * warning when overwriting it. retbleed_return_thunk is a special case
+ * which is safe to be overwritten with srso_return_thunk since it
+ * provides a superset of the functionality and is handled correctly in
+ * entry_untrain_ret().
+ */
+ if ((x86_return_thunk != __x86_return_thunk) &&
+ (thunk != srso_return_thunk ||
+ x86_return_thunk != retbleed_return_thunk))
pr_warn("x86/bugs: return thunk changed\n");
x86_return_thunk = thunk;
--
2.34.1
For all the complexity of handling affinity for CPU hotplug, what we've
apparently managed to overlook is that arm_cmn_init_irqs() has in fact
always been setting the *initial* affinity of all IRQs to CPU 0, not the
CPU we subsequently choose for event scheduling. Oh dear.
Cc: stable(a)vger.kernel.org
Fixes: 0ba64770a2f2 ("perf: Add Arm CMN-600 PMU driver")
Signed-off-by: Robin Murphy <robin.murphy(a)arm.com>
---
drivers/perf/arm-cmn.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c
index aa2908313558..e385f187a084 100644
--- a/drivers/perf/arm-cmn.c
+++ b/drivers/perf/arm-cmn.c
@@ -2558,6 +2558,7 @@ static int arm_cmn_probe(struct platform_device *pdev)
cmn->dev = &pdev->dev;
cmn->part = (unsigned long)device_get_match_data(cmn->dev);
+ cmn->cpu = cpumask_local_spread(0, dev_to_node(cmn->dev));
platform_set_drvdata(pdev, cmn);
if (cmn->part == PART_CMN600 && has_acpi_companion(cmn->dev)) {
@@ -2585,7 +2586,6 @@ static int arm_cmn_probe(struct platform_device *pdev)
if (err)
return err;
- cmn->cpu = cpumask_local_spread(0, dev_to_node(cmn->dev));
cmn->pmu = (struct pmu) {
.module = THIS_MODULE,
.parent = cmn->dev,
--
2.39.2.101.g768bb238c484.dirty
From: Shigeru Yoshida <syoshida(a)redhat.com>
[ Upstream commit fc1092f51567277509563800a3c56732070b6aa4 ]
KMSAN reported uninit-value access in __ip_make_skb() [1]. __ip_make_skb()
tests HDRINCL to know if the skb has icmphdr. However, HDRINCL can cause a
race condition. If calling setsockopt(2) with IP_HDRINCL changes HDRINCL
while __ip_make_skb() is running, the function will access icmphdr in the
skb even if it is not included. This causes the issue reported by KMSAN.
Check FLOWI_FLAG_KNOWN_NH on fl4->flowi4_flags instead of testing HDRINCL
on the socket.
Also, fl4->fl4_icmp_type and fl4->fl4_icmp_code are not initialized. These
are union in struct flowi4 and are implicitly initialized by
flowi4_init_output(), but we should not rely on specific union layout.
Initialize these explicitly in raw_sendmsg().
[1]
BUG: KMSAN: uninit-value in __ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
__ip_make_skb+0x2b74/0x2d20 net/ipv4/ip_output.c:1481
ip_finish_skb include/net/ip.h:243 [inline]
ip_push_pending_frames+0x4c/0x5c0 net/ipv4/ip_output.c:1508
raw_sendmsg+0x2381/0x2690 net/ipv4/raw.c:654
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
Uninit was created at:
slab_post_alloc_hook mm/slub.c:3804 [inline]
slab_alloc_node mm/slub.c:3845 [inline]
kmem_cache_alloc_node+0x5f6/0xc50 mm/slub.c:3888
kmalloc_reserve+0x13c/0x4a0 net/core/skbuff.c:577
__alloc_skb+0x35a/0x7c0 net/core/skbuff.c:668
alloc_skb include/linux/skbuff.h:1318 [inline]
__ip_append_data+0x49ab/0x68c0 net/ipv4/ip_output.c:1128
ip_append_data+0x1e7/0x260 net/ipv4/ip_output.c:1365
raw_sendmsg+0x22b1/0x2690 net/ipv4/raw.c:648
inet_sendmsg+0x27b/0x2a0 net/ipv4/af_inet.c:851
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg+0x274/0x3c0 net/socket.c:745
__sys_sendto+0x62c/0x7b0 net/socket.c:2191
__do_sys_sendto net/socket.c:2203 [inline]
__se_sys_sendto net/socket.c:2199 [inline]
__x64_sys_sendto+0x130/0x200 net/socket.c:2199
do_syscall_64+0xd8/0x1f0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x6d/0x75
CPU: 1 PID: 15709 Comm: syz-executor.7 Not tainted 6.8.0-11567-gb3603fcb79b1 #25
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-1.fc39 04/01/2014
Fixes: 99e5acae193e ("ipv4: Fix potential uninit variable access bug in __ip_make_skb()")
Reported-by: syzkaller <syzkaller(a)googlegroups.com>
Signed-off-by: Shigeru Yoshida <syoshida(a)redhat.com>
Link: https://lore.kernel.org/r/20240430123945.2057348-1-syoshida@redhat.com
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Signed-off-by: Zhaoyang Li <lizy04(a)hust.edu.cn>
---
net/ipv4/ip_output.c | 3 ++-
net/ipv4/raw.c | 3 +++
2 files changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index c82107bbd981..543d029102cf 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1580,7 +1580,8 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
* so icmphdr does not in skb linear region and can not get icmp_type
* by icmp_hdr(skb)->type.
*/
- if (sk->sk_type == SOCK_RAW && !inet_sk(sk)->hdrincl)
+ if (sk->sk_type == SOCK_RAW &&
+ !(fl4->flowi4_flags & FLOWI_FLAG_KNOWN_NH))
icmp_type = fl4->fl4_icmp_type;
else
icmp_type = icmp_hdr(skb)->type;
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ee0efd0efec4..c109bc376cc5 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -608,6 +608,9 @@ static int raw_sendmsg(struct sock *sk, struct msghdr *msg, size_t len)
(hdrincl ? FLOWI_FLAG_KNOWN_NH : 0),
daddr, saddr, 0, 0, sk->sk_uid);
+ fl4.fl4_icmp_type = 0;
+ fl4.fl4_icmp_code = 0;
+
if (!hdrincl) {
rfv.msg = msg;
rfv.hlen = 0;
--
2.25.1
From: Jann Horn <jannh(a)google.com>
commit 71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8 upstream.
Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
The pmd_trans_huge() code in mfill_atomic() is wrong in three different
ways depending on kernel version:
1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
the right two race windows) - I've tested this in a kernel build with
some extra mdelay() calls. See the commit message for a description
of the race scenario.
On older kernels (before 6.5), I think the same bug can even
theoretically lead to accessing transhuge page contents as a page table
if you hit the right 5 narrow race windows (I haven't tested this case).
2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
detecting PMDs that don't point to page tables.
On older kernels (before 6.5), you'd just have to win a single fairly
wide race to hit this.
I've tested this on 6.1 stable by racing migration (with a mdelay()
patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
VM, that causes a kernel oops in ptlock_ptr().
3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
to yank page tables out from under us (though I haven't tested that),
so I think the BUG_ON() checks in mfill_atomic() are just wrong.
I decided to write two separate fixes for these (one fix for bugs 1+2, one
fix for bug 3), so that the first fix can be backported to kernels
affected by bugs 1+2.
This patch (of 2):
This fixes two issues.
I discovered that the following race can occur:
mfill_atomic other thread
============ ============
<zap PMD>
pmdp_get_lockless() [reads none pmd]
<bail if trans_huge>
<if none:>
<pagefault creates transhuge zeropage>
__pte_alloc [no-op]
<zap PMD>
<bail if pmd_trans_huge(*dst_pmd)>
BUG_ON(pmd_none(*dst_pmd))
I have experimentally verified this in a kernel with extra mdelay() calls;
the BUG_ON(pmd_none(*dst_pmd)) triggers.
On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
a BUG_ON(), since the page table access helpers are actually designed to
deal with page tables concurrently disappearing; but on older kernels
(<=6.4), I think we could probably theoretically race past the two
BUG_ON() checks and end up treating a hugepage as a page table.
The second issue is that, as Qi Zheng pointed out, there are other types
of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
(in particular, migration PMDs).
On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
PMD that contains a migration entry (which just requires winning a single,
fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
assumes that the PMD points to a page table.
Breakage follows: First, the kernel tries to take the PTE lock (which will
crash or maybe worse if there is no "struct page" for the address bits in
the migration entry PMD - I think at least on X86 there usually is no
corresponding "struct page" thanks to the PTE inversion mitigation, amd64
looks different).
If that didn't crash, the kernel would next try to write a PTE into what
it wrongly thinks is a page table.
As part of fixing these issues, get rid of the check for pmd_trans_huge()
before __pte_alloc() - that's redundant, we're going to have to check for
that after the __pte_alloc() anyway.
Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@goog…
Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@goog…
Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
Signed-off-by: Jann Horn <jannh(a)google.com>
Acked-by: David Hildenbrand <david(a)redhat.com>
Cc: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Hugh Dickins <hughd(a)google.com>
Cc: Jann Horn <jannh(a)google.com>
Cc: Pavel Emelyanov <xemul(a)virtuozzo.com>
Cc: Qi Zheng <zhengqi.arch(a)bytedance.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
[ According to backport note in git comment message, using pmd_read_atomic()
instead of pmdp_get_lockless() in older kernels ]
Signed-off-by: Xiangyu Chen <xiangyu.chen(a)windriver.com>
Signed-off-by: He Zhe <zhe.he(a)windriver.com>
---
Verified on build test
---
mm/userfaultfd.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 992a0a16846f..998c7075c62a 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -642,21 +642,23 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
}
dst_pmdval = pmd_read_atomic(dst_pmd);
- /*
- * If the dst_pmd is mapped as THP don't
- * override it and just be strict.
- */
- if (unlikely(pmd_trans_huge(dst_pmdval))) {
- err = -EEXIST;
- break;
- }
if (unlikely(pmd_none(dst_pmdval)) &&
unlikely(__pte_alloc(dst_mm, dst_pmd))) {
err = -ENOMEM;
break;
}
- /* If an huge pmd materialized from under us fail */
- if (unlikely(pmd_trans_huge(*dst_pmd))) {
+ dst_pmdval = pmd_read_atomic(dst_pmd);
+ /*
+ * If the dst_pmd is THP don't override it and just be strict.
+ * (This includes the case where the PMD used to be THP and
+ * changed back to none after __pte_alloc().)
+ */
+ if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
+ pmd_devmap(dst_pmdval))) {
+ err = -EEXIST;
+ break;
+ }
+ if (unlikely(pmd_bad(dst_pmdval))) {
err = -EFAULT;
break;
}
--
2.34.1
If the directory is corrupted and the number of nlinks is less than 2
(valid nlinks have at least 2), then when the directory is deleted, the
minix_rmdir will try to reduce the nlinks(unsigned int) to a negative
value.
Make nlinks validity check for directories.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: stable(a)vger.kernel.org
Signed-off-by: Andrey Kriulin <kitotavrik.s(a)gmail.com>
---
v3: Move nlinks validaty check to minix_rmdir and minix_rename per Jan
Kara <jack(a)suse.cz> request.
v2: Move nlinks validaty check to V[12]_minix_iget() per Jan Kara
<jack(a)suse.cz> request. Change return error code to EUCLEAN. Don't block
directory in r/o mode per Al Viro <viro(a)zeniv.linux.org.uk> request.
fs/minix/namei.c | 10 +++++++++-
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/fs/minix/namei.c b/fs/minix/namei.c
index 8938536d8d3c..5a1e5f8ef443 100644
--- a/fs/minix/namei.c
+++ b/fs/minix/namei.c
@@ -161,8 +161,12 @@ static int minix_unlink(struct inode * dir, struct dentry *dentry)
static int minix_rmdir(struct inode * dir, struct dentry *dentry)
{
struct inode * inode = d_inode(dentry);
- int err = -ENOTEMPTY;
+ int err = -EUCLEAN;
+ if (inode->i_nlink < 2)
+ return err;
+
+ err = -ENOTEMPTY;
if (minix_empty_dir(inode)) {
err = minix_unlink(dir, dentry);
if (!err) {
@@ -235,6 +239,10 @@ static int minix_rename(struct mnt_idmap *idmap,
mark_inode_dirty(old_inode);
if (dir_de) {
+ if (old_dir->i_nlink <= 2) {
+ err = -EUCLEAN;
+ goto out_dir;
+ }
err = minix_set_link(dir_de, dir_folio, new_dir);
if (!err)
inode_dec_link_count(old_dir);
--
2.47.2
This fixes a couple of different problems, that can cause RTC (alarm)
irqs to be missing when generating UIE interrupts.
The first commit fixes a long-standing problem, which has been
documented in a comment since 2010. This fixes a race that could cause
UIE irqs to stop being generated, which was easily reproduced by
timing the use of RTC_UIE_ON ioctl with the seconds tick in the RTC.
The last commit ensures that RTC (alarm) irqs are enabled whenever
RTC_UIE_ON ioctl is used.
The driver specific commits avoids kernel warnings about unbalanced
enable_irq/disable_irq, which gets triggered on first RTC_UIE_ON with
the last commit. Before this series, the same warning should be seen
on initial RTC_AIE_ON with those drivers.
Signed-off-by: Esben Haabendal <esben(a)geanix.com>
---
Changes in v2:
- Dropped patch for rtc-st-lpc driver.
- Link to v1: https://lore.kernel.org/r/20241203-rtc-uie-irq-fixes-v1-0-01286ecd9f3f@gean…
---
Esben Haabendal (5):
rtc: interface: Fix long-standing race when setting alarm
rtc: isl12022: Fix initial enable_irq/disable_irq balance
rtc: cpcap: Fix initial enable_irq/disable_irq balance
rtc: tps6586x: Fix initial enable_irq/disable_irq balance
rtc: interface: Ensure alarm irq is enabled when UIE is enabled
drivers/rtc/interface.c | 27 +++++++++++++++++++++++++++
drivers/rtc/rtc-cpcap.c | 1 +
drivers/rtc/rtc-isl12022.c | 1 +
drivers/rtc/rtc-tps6586x.c | 1 +
4 files changed, 30 insertions(+)
---
base-commit: 82f2b0b97b36ee3fcddf0f0780a9a0825d52fec3
change-id: 20241203-rtc-uie-irq-fixes-f2838782d0f8
Best regards,
--
Esben Haabendal <esben(a)geanix.com>