Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
1) Makes a copy of the bind address in kernel_bind() to insulate callers. 2) Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com --- v3->v4: Remove precondition check for addrlen. Pass address copy to bind() instead of original address. v2->v3: Add "Fixes" tag. Check for positivity in addrlen sanity check. v1->v2: Split up original patch into patch series. Insulate sock->ops->bind() calls with kernel_bind().
drivers/block/drbd/drbd_receiver.c | 4 ++-- drivers/char/agp/alpha-agp.c | 2 +- drivers/infiniband/hw/erdma/erdma_cm.c | 6 +++--- drivers/infiniband/sw/siw/siw_cm.c | 10 +++++----- drivers/isdn/mISDN/l1oip_core.c | 4 ++-- fs/dlm/lowcomms.c | 7 +++---- fs/ocfs2/cluster/tcp.c | 6 +++--- fs/smb/client/connect.c | 6 +++--- net/netfilter/ipvs/ip_vs_sync.c | 4 ++-- net/rds/tcp_connect.c | 2 +- net/rds/tcp_listen.c | 2 +- net/socket.c | 7 ++++++- 12 files changed, 32 insertions(+), 28 deletions(-)
diff --git a/drivers/block/drbd/drbd_receiver.c b/drivers/block/drbd/drbd_receiver.c index 9b2660e990a98..752759ed22b8c 100644 --- a/drivers/block/drbd/drbd_receiver.c +++ b/drivers/block/drbd/drbd_receiver.c @@ -638,7 +638,7 @@ static struct socket *drbd_try_connect(struct drbd_connection *connection) * a free one dynamically. */ what = "bind before connect"; - err = sock->ops->bind(sock, (struct sockaddr *) &src_in6, my_addr_len); + err = kernel_bind(sock, (struct sockaddr *)&src_in6, my_addr_len); if (err < 0) goto out;
@@ -725,7 +725,7 @@ static int prepare_listen_socket(struct drbd_connection *connection, struct acce drbd_setbufsize(s_listen, sndbuf_size, rcvbuf_size);
what = "bind before listen"; - err = s_listen->ops->bind(s_listen, (struct sockaddr *)&my_addr, my_addr_len); + err = kernel_bind(s_listen, (struct sockaddr *)&my_addr, my_addr_len); if (err < 0) goto out;
diff --git a/drivers/char/agp/alpha-agp.c b/drivers/char/agp/alpha-agp.c index c9bf2c2198418..f251fedfb4840 100644 --- a/drivers/char/agp/alpha-agp.c +++ b/drivers/char/agp/alpha-agp.c @@ -96,7 +96,7 @@ static int alpha_core_agp_insert_memory(struct agp_memory *mem, off_t pg_start, if ((pg_start + mem->page_count) > num_entries) return -EINVAL;
- status = agp->ops->bind(agp, pg_start, mem); + status = kernel_bind(agp, pg_start, mem); mb(); alpha_core_agp_tlbflush(mem);
diff --git a/drivers/infiniband/hw/erdma/erdma_cm.c b/drivers/infiniband/hw/erdma/erdma_cm.c index e2b89e7bbe2b8..674702d159c29 100644 --- a/drivers/infiniband/hw/erdma/erdma_cm.c +++ b/drivers/infiniband/hw/erdma/erdma_cm.c @@ -990,7 +990,7 @@ static int kernel_bindconnect(struct socket *s, struct sockaddr *laddr, int ret;
sock_set_reuseaddr(s->sk); - ret = s->ops->bind(s, laddr, laddrlen); + ret = kernel_bind(s, laddr, laddrlen); if (ret) return ret; ret = kernel_connect(s, raddr, raddrlen, flags); @@ -1309,8 +1309,8 @@ int erdma_create_listen(struct iw_cm_id *id, int backlog) if (ipv4_is_zeronet(laddr->sin_addr.s_addr)) s->sk->sk_bound_dev_if = dev->netdev->ifindex;
- ret = s->ops->bind(s, (struct sockaddr *)laddr, - sizeof(struct sockaddr_in)); + ret = kernel_bind(s, (struct sockaddr *)laddr, + sizeof(struct sockaddr_in)); if (ret) goto error;
diff --git a/drivers/infiniband/sw/siw/siw_cm.c b/drivers/infiniband/sw/siw/siw_cm.c index 05624f424153e..d05e0eeee9244 100644 --- a/drivers/infiniband/sw/siw/siw_cm.c +++ b/drivers/infiniband/sw/siw/siw_cm.c @@ -1324,7 +1324,7 @@ static int kernel_bindconnect(struct socket *s, struct sockaddr *laddr, return rv; }
- rv = s->ops->bind(s, laddr, size); + rv = kernel_bind(s, laddr, size); if (rv < 0) return rv;
@@ -1793,8 +1793,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog) if (ipv4_is_zeronet(laddr->sin_addr.s_addr)) s->sk->sk_bound_dev_if = sdev->netdev->ifindex;
- rv = s->ops->bind(s, (struct sockaddr *)laddr, - sizeof(struct sockaddr_in)); + rv = kernel_bind(s, (struct sockaddr *)laddr, + sizeof(struct sockaddr_in)); } else { struct sockaddr_in6 *laddr = &to_sockaddr_in6(id->local_addr);
@@ -1811,8 +1811,8 @@ int siw_create_listen(struct iw_cm_id *id, int backlog) if (ipv6_addr_any(&laddr->sin6_addr)) s->sk->sk_bound_dev_if = sdev->netdev->ifindex;
- rv = s->ops->bind(s, (struct sockaddr *)laddr, - sizeof(struct sockaddr_in6)); + rv = kernel_bind(s, (struct sockaddr *)laddr, + sizeof(struct sockaddr_in6)); } if (rv) { siw_dbg(id->device, "socket bind error: %d\n", rv); diff --git a/drivers/isdn/mISDN/l1oip_core.c b/drivers/isdn/mISDN/l1oip_core.c index f010b35a05313..681147e1fc843 100644 --- a/drivers/isdn/mISDN/l1oip_core.c +++ b/drivers/isdn/mISDN/l1oip_core.c @@ -675,8 +675,8 @@ l1oip_socket_thread(void *data) hc->sin_remote.sin_port = htons((unsigned short)hc->remoteport);
/* bind to incoming port */ - if (socket->ops->bind(socket, (struct sockaddr *)&hc->sin_local, - sizeof(hc->sin_local))) { + if (kernel_bind(socket, (struct sockaddr *)&hc->sin_local, + sizeof(hc->sin_local))) { printk(KERN_ERR "%s: Failed to bind socket to port %d.\n", __func__, hc->localport); ret = -EINVAL; diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c index 1cf796b97eb65..73ab179833fbd 100644 --- a/fs/dlm/lowcomms.c +++ b/fs/dlm/lowcomms.c @@ -1805,8 +1805,7 @@ static int dlm_tcp_bind(struct socket *sock) memcpy(&src_addr, &dlm_local_addr[0], sizeof(src_addr)); make_sockaddr(&src_addr, 0, &addr_len);
- result = sock->ops->bind(sock, (struct sockaddr *)&src_addr, - addr_len); + result = kernel_bind(sock, (struct sockaddr *)&src_addr, addr_len); if (result < 0) { /* This *may* not indicate a critical error */ log_print("could not bind for connect: %d", result); @@ -1850,8 +1849,8 @@ static int dlm_tcp_listen_bind(struct socket *sock)
/* Bind to our port */ make_sockaddr(&dlm_local_addr[0], dlm_config.ci_tcp_port, &addr_len); - return sock->ops->bind(sock, (struct sockaddr *)&dlm_local_addr[0], - addr_len); + return kernel_bind(sock, (struct sockaddr *)&dlm_local_addr[0], + addr_len); }
static const struct dlm_proto_ops dlm_tcp_ops = { diff --git a/fs/ocfs2/cluster/tcp.c b/fs/ocfs2/cluster/tcp.c index ead7c287ff373..3a4a7a521476d 100644 --- a/fs/ocfs2/cluster/tcp.c +++ b/fs/ocfs2/cluster/tcp.c @@ -1614,8 +1614,8 @@ static void o2net_start_connect(struct work_struct *work) myaddr.sin_addr.s_addr = mynode->nd_ipv4_address; myaddr.sin_port = htons(0); /* any port */
- ret = sock->ops->bind(sock, (struct sockaddr *)&myaddr, - sizeof(myaddr)); + ret = kernel_bind(sock, (struct sockaddr *)&myaddr, + sizeof(myaddr)); if (ret) { mlog(ML_ERROR, "bind failed with %d at address %pI4\n", ret, &mynode->nd_ipv4_address); @@ -1998,7 +1998,7 @@ static int o2net_open_listening_sock(__be32 addr, __be16 port) INIT_WORK(&o2net_listen_work, o2net_accept_many);
sock->sk->sk_reuse = SK_CAN_REUSE; - ret = sock->ops->bind(sock, (struct sockaddr *)&sin, sizeof(sin)); + ret = kernel_bind(sock, (struct sockaddr *)&sin, sizeof(sin)); if (ret < 0) { printk(KERN_ERR "o2net: Error %d while binding socket at " "%pI4:%u\n", ret, &addr, ntohs(port)); diff --git a/fs/smb/client/connect.c b/fs/smb/client/connect.c index b7764cd57e035..6dcc1cd41b8c5 100644 --- a/fs/smb/client/connect.c +++ b/fs/smb/client/connect.c @@ -2891,9 +2891,9 @@ bind_socket(struct TCP_Server_Info *server) if (server->srcaddr.ss_family != AF_UNSPEC) { /* Bind to the specified local IP address */ struct socket *socket = server->ssocket; - rc = socket->ops->bind(socket, - (struct sockaddr *) &server->srcaddr, - sizeof(server->srcaddr)); + rc = kernel_bind(socket, + (struct sockaddr *)&server->srcaddr, + sizeof(server->srcaddr)); if (rc < 0) { struct sockaddr_in *saddr4; struct sockaddr_in6 *saddr6; diff --git a/net/netfilter/ipvs/ip_vs_sync.c b/net/netfilter/ipvs/ip_vs_sync.c index 6e4ed1e11a3b7..4174076c66fa7 100644 --- a/net/netfilter/ipvs/ip_vs_sync.c +++ b/net/netfilter/ipvs/ip_vs_sync.c @@ -1439,7 +1439,7 @@ static int bind_mcastif_addr(struct socket *sock, struct net_device *dev) sin.sin_addr.s_addr = addr; sin.sin_port = 0;
- return sock->ops->bind(sock, (struct sockaddr*)&sin, sizeof(sin)); + return kernel_bind(sock, (struct sockaddr *)&sin, sizeof(sin)); }
static void get_mcast_sockaddr(union ipvs_sockaddr *sa, int *salen, @@ -1546,7 +1546,7 @@ static int make_receive_sock(struct netns_ipvs *ipvs, int id,
get_mcast_sockaddr(&mcast_addr, &salen, &ipvs->bcfg, id); sock->sk->sk_bound_dev_if = dev->ifindex; - result = sock->ops->bind(sock, (struct sockaddr *)&mcast_addr, salen); + result = kernel_bind(sock, (struct sockaddr *)&mcast_addr, salen); if (result < 0) { pr_err("Error binding to the multicast addr\n"); goto error; diff --git a/net/rds/tcp_connect.c b/net/rds/tcp_connect.c index d788c6d28986f..a0046e99d6df7 100644 --- a/net/rds/tcp_connect.c +++ b/net/rds/tcp_connect.c @@ -145,7 +145,7 @@ int rds_tcp_conn_path_connect(struct rds_conn_path *cp) addrlen = sizeof(sin); }
- ret = sock->ops->bind(sock, addr, addrlen); + ret = kernel_bind(sock, addr, addrlen); if (ret) { rdsdebug("bind failed with %d at address %pI6c\n", ret, &conn->c_laddr); diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index 014fa24418c12..53b3535a1e4a8 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -306,7 +306,7 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6) addr_len = sizeof(*sin); }
- ret = sock->ops->bind(sock, (struct sockaddr *)&ss, addr_len); + ret = kernel_bind(sock, (struct sockaddr *)&ss, addr_len); if (ret < 0) { rdsdebug("could not bind %s listener socket: %d\n", isv6 ? "IPv6" : "IPv4", ret); diff --git a/net/socket.c b/net/socket.c index a39ec136f5cff..c4a6f55329552 100644 --- a/net/socket.c +++ b/net/socket.c @@ -3516,7 +3516,12 @@ static long compat_sock_ioctl(struct file *file, unsigned int cmd,
int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen) { - return READ_ONCE(sock->ops)->bind(sock, addr, addrlen); + struct sockaddr_storage address; + + memcpy(&address, addr, addrlen); + + return READ_ONCE(sock->ops)->bind(sock, (struct sockaddr *)&address, + addrlen); } EXPORT_SYMBOL(kernel_bind);
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
On Wed, 2023-09-20 at 09:30 -0400, Willem de Bruijn wrote:
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
I fear this is going to cause a few conflicts with other trees. We can still take it, but at very least we will need some acks from the relevant maintainers.
I *think* it would be easier split this and patch 1/3 in individual patches targeting the different trees, hopefully not many additional patches will be required. What do you think?
Cheers,
Paolo
On Thu, Sep 21, 2023 at 4:35 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2023-09-20 at 09:30 -0400, Willem de Bruijn wrote:
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
I fear this is going to cause a few conflicts with other trees. We can still take it, but at very least we will need some acks from the relevant maintainers.
I *think* it would be easier split this and patch 1/3 in individual patches targeting the different trees, hopefully not many additional patches will be required. What do you think?
Roughly how many patches would result from this one patch. From the stat line I count { block/drbd, char/agp, infiniband, isdn, fs/dlm, fs/ocfs2, fs/smb, netfilter, rds }. That's worst case nine callers plus the core patch to net/socket.c?
If logistically simpler and you prefer the approach, we can also revisit Jordan's original approach, which embedded the memcpy inside the BPF branches.
That has the slight benefit to in-kernel callers that it limits the cost of the memcpy to cgroup_bpf_enabled. But adds a superfluous second copy to the more common userspace callers, again at least only if cgroup_bpf_enabled.
If so, it should at least move the whole logic around those BPF hooks into helper functions.
On Thu, 2023-09-21 at 09:30 -0400, Willem de Bruijn wrote:
On Thu, Sep 21, 2023 at 4:35 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2023-09-20 at 09:30 -0400, Willem de Bruijn wrote:
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
I fear this is going to cause a few conflicts with other trees. We can still take it, but at very least we will need some acks from the relevant maintainers.
I *think* it would be easier split this and patch 1/3 in individual patches targeting the different trees, hopefully not many additional patches will be required. What do you think?
Roughly how many patches would result from this one patch. From the stat line I count { block/drbd, char/agp, infiniband, isdn, fs/dlm, fs/ocfs2, fs/smb, netfilter, rds }. That's worst case nine callers plus the core patch to net/socket.c?
I think there should not be problems taking directly changes for rds and nf/ipvs.
Additionally, I think the non network changes could consolidate the bind and connect changes in a single patch.
It should be 7 not-network patches overall.
If logistically simpler and you prefer the approach, we can also revisit Jordan's original approach, which embedded the memcpy inside the BPF branches.
That has the slight benefit to in-kernel callers that it limits the cost of the memcpy to cgroup_bpf_enabled. But adds a superfluous second copy to the more common userspace callers, again at least only if cgroup_bpf_enabled.
If so, it should at least move the whole logic around those BPF hooks into helper functions.
IMHO the approach implemented here is preferable, I suggest going forward with it.
Thanks,
Paolo
On Thu, Sep 21, 2023 at 8:26 AM Paolo Abeni pabeni@redhat.com wrote:
On Thu, 2023-09-21 at 09:30 -0400, Willem de Bruijn wrote:
On Thu, Sep 21, 2023 at 4:35 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2023-09-20 at 09:30 -0400, Willem de Bruijn wrote:
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
I fear this is going to cause a few conflicts with other trees. We can still take it, but at very least we will need some acks from the relevant maintainers.
I *think* it would be easier split this and patch 1/3 in individual patches targeting the different trees, hopefully not many additional patches will be required. What do you think?
Roughly how many patches would result from this one patch. From the stat line I count { block/drbd, char/agp, infiniband, isdn, fs/dlm, fs/ocfs2, fs/smb, netfilter, rds }. That's worst case nine callers plus the core patch to net/socket.c?
I think there should not be problems taking directly changes for rds and nf/ipvs.
Additionally, I think the non network changes could consolidate the bind and connect changes in a single patch.
It should be 7 not-network patches overall.
If logistically simpler and you prefer the approach, we can also revisit Jordan's original approach, which embedded the memcpy inside the BPF branches.
That has the slight benefit to in-kernel callers that it limits the cost of the memcpy to cgroup_bpf_enabled. But adds a superfluous second copy to the more common userspace callers, again at least only if cgroup_bpf_enabled.
If so, it should at least move the whole logic around those BPF hooks into helper functions.
IMHO the approach implemented here is preferable, I suggest going forward with it.
Thanks,
Paolo
Additionally, I think the non network changes could consolidate the bind and connect changes in a single patch.
It should be 7 not-network patches overall.
I'm fine with this. If there are no objections, I can drop the non-net changes in this patch series and send out several kernel_connect/kernel_bind patches to the appropriate trees as a follow up. Shall we wait to hear back from the maintainers or just go ahead with this plan?
-Jordan
On Thu, 2023-09-21 at 10:01 -0700, Jordan Rife wrote:
On Thu, Sep 21, 2023 at 8:26 AM Paolo Abeni pabeni@redhat.com wrote:
On Thu, 2023-09-21 at 09:30 -0400, Willem de Bruijn wrote:
On Thu, Sep 21, 2023 at 4:35 AM Paolo Abeni pabeni@redhat.com wrote:
On Wed, 2023-09-20 at 09:30 -0400, Willem de Bruijn wrote:
Jordan Rife wrote:
Similar to the change in commit 0bdf399342c5("net: Avoid address overwrite in kernel_connect"), BPF hooks run on bind may rewrite the address passed to kernel_bind(). This change
- Makes a copy of the bind address in kernel_bind() to insulate callers.
- Replaces direct calls to sock->ops->bind() with kernel_bind()
Link: https://lore.kernel.org/netdev/20230912013332.2048422-1-jrife@google.com/ Fixes: 4fbac77d2d09 ("bpf: Hooks for sys_bind") Cc: stable@vger.kernel.org Signed-off-by: Jordan Rife jrife@google.com
Reviewed-by: Willem de Bruijn willemb@google.com
I fear this is going to cause a few conflicts with other trees. We can still take it, but at very least we will need some acks from the relevant maintainers.
I *think* it would be easier split this and patch 1/3 in individual patches targeting the different trees, hopefully not many additional patches will be required. What do you think?
Roughly how many patches would result from this one patch. From the stat line I count { block/drbd, char/agp, infiniband, isdn, fs/dlm, fs/ocfs2, fs/smb, netfilter, rds }. That's worst case nine callers plus the core patch to net/socket.c?
I think there should not be problems taking directly changes for rds and nf/ipvs.
Additionally, I think the non network changes could consolidate the bind and connect changes in a single patch.
It should be 7 not-network patches overall.
If logistically simpler and you prefer the approach, we can also revisit Jordan's original approach, which embedded the memcpy inside the BPF branches.
That has the slight benefit to in-kernel callers that it limits the cost of the memcpy to cgroup_bpf_enabled. But adds a superfluous second copy to the more common userspace callers, again at least only if cgroup_bpf_enabled.
If so, it should at least move the whole logic around those BPF hooks into helper functions.
IMHO the approach implemented here is preferable, I suggest going forward with it.
Thanks,
Paolo
Additionally, I think the non network changes could consolidate the bind and connect changes in a single patch.
It should be 7 not-network patches overall.
I'm fine with this. If there are no objections, I can drop the non-net changes in this patch series and send out several kernel_connect/kernel_bind patches to the appropriate trees as a follow up. Shall we wait to hear back from the maintainers or just go ahead with this plan?
I'm guessing you can go ahead with that: it should better fit anybody.
Thanks
Paolo
p.s. also using this reply to check if finally vger accepts my message again...
linux-stable-mirror@lists.linaro.org