This is a note to let you know that I've just added the patch titled
cls_u32: fix use after free in u32_destroy_key()
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
cls_u32-fix-use-after-free-in-u32_destroy_key.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Paolo Abeni <pabeni(a)redhat.com>
Date: Mon, 5 Feb 2018 22:23:01 +0100
Subject: cls_u32: fix use after free in u32_destroy_key()
From: Paolo Abeni <pabeni(a)redhat.com>
[ Upstream commit d7cdee5ea8d28ae1b6922deb0c1badaa3aa0ef8c ]
Li Shuang reported an Oops with cls_u32 due to an use-after-free
in u32_destroy_key(). The use-after-free can be triggered with:
dev=lo
tc qdisc add dev $dev root handle 1: htb default 10
tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
tc qdisc del dev $dev root
Which causes the following kasan splat:
==================================================================
BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571
CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
Call Trace:
dump_stack+0xd6/0x182
? dma_virt_map_sg+0x22e/0x22e
print_address_description+0x73/0x290
kasan_report+0x277/0x360
? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
process_one_work+0xae0/0x1c80
? sched_clock+0x5/0x10
? pwq_dec_nr_in_flight+0x3c0/0x3c0
? _raw_spin_unlock_irq+0x29/0x40
? trace_hardirqs_on_caller+0x381/0x570
? _raw_spin_unlock_irq+0x29/0x40
? finish_task_switch+0x1e5/0x760
? finish_task_switch+0x208/0x760
? preempt_notifier_dec+0x20/0x20
? __schedule+0x839/0x1ee0
? check_noncircular+0x20/0x20
? firmware_map_remove+0x73/0x73
? find_held_lock+0x39/0x1c0
? worker_thread+0x434/0x1820
? lock_contended+0xee0/0xee0
? lock_release+0x1100/0x1100
? init_rescuer.part.16+0x150/0x150
? retint_kernel+0x10/0x10
worker_thread+0x216/0x1820
? process_one_work+0x1c80/0x1c80
? lock_acquire+0x1a5/0x540
? lock_downgrade+0x6b0/0x6b0
? sched_clock+0x5/0x10
? lock_release+0x1100/0x1100
? compat_start_thread+0x80/0x80
? do_raw_spin_trylock+0x190/0x190
? _raw_spin_unlock_irq+0x29/0x40
? trace_hardirqs_on_caller+0x381/0x570
? _raw_spin_unlock_irq+0x29/0x40
? finish_task_switch+0x1e5/0x760
? finish_task_switch+0x208/0x760
? preempt_notifier_dec+0x20/0x20
? __schedule+0x839/0x1ee0
? kmem_cache_alloc_trace+0x143/0x320
? firmware_map_remove+0x73/0x73
? sched_clock+0x5/0x10
? sched_clock_cpu+0x18/0x170
? find_held_lock+0x39/0x1c0
? schedule+0xf3/0x3b0
? lock_downgrade+0x6b0/0x6b0
? __schedule+0x1ee0/0x1ee0
? do_wait_intr_irq+0x340/0x340
? do_raw_spin_trylock+0x190/0x190
? _raw_spin_unlock_irqrestore+0x32/0x60
? process_one_work+0x1c80/0x1c80
? process_one_work+0x1c80/0x1c80
kthread+0x312/0x3d0
? kthread_create_worker_on_cpu+0xc0/0xc0
ret_from_fork+0x3a/0x50
Allocated by task 1688:
kasan_kmalloc+0xa0/0xd0
__kmalloc+0x162/0x380
u32_change+0x1220/0x3c9e [cls_u32]
tc_ctl_tfilter+0x1ba6/0x2f80
rtnetlink_rcv_msg+0x4f0/0x9d0
netlink_rcv_skb+0x124/0x320
netlink_unicast+0x430/0x600
netlink_sendmsg+0x8fa/0xd60
sock_sendmsg+0xb1/0xe0
___sys_sendmsg+0x678/0x980
__sys_sendmsg+0xc4/0x210
do_syscall_64+0x232/0x7f0
return_from_SYSCALL_64+0x0/0x75
Freed by task 112:
kasan_slab_free+0x71/0xc0
kfree+0x114/0x320
rcu_process_callbacks+0xc3f/0x1600
__do_softirq+0x2bf/0xc06
The buggy address belongs to the object at ffff881b83dae600
which belongs to the cache kmalloc-4096 of size 4096
The buggy address is located 24 bytes inside of
4096-byte region [ffff881b83dae600, ffff881b83daf600)
The buggy address belongs to the page:
page:ffffea006e0f6a00 count:1 mapcount:0 mapping: (null) index:0x0 compound_mapcount: 0
flags: 0x17ffffc0008100(slab|head)
raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The problem is that the htnode is freed before the linked knodes and the
latter will try to access the first at u32_destroy_key() time.
This change addresses the issue using the htnode refcnt to guarantee
the correct free order. While at it also add a RCU annotation,
to keep sparse happy.
v1 -> v2: use rtnl_derefence() instead of RCU read locks
v2 -> v3:
- don't check refcnt in u32_destroy_hnode()
- cleaned-up u32_destroy() implementation
- cleaned-up code comment
v3 -> v4:
- dropped unneeded comment
Reported-by: Li Shuang <shuali(a)redhat.com>
Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter")
Signed-off-by: Paolo Abeni <pabeni(a)redhat.com>
Acked-by: Cong Wang <xiyou.wangcong(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/sched/cls_u32.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)
--- a/net/sched/cls_u32.c
+++ b/net/sched/cls_u32.c
@@ -397,10 +397,12 @@ static int u32_init(struct tcf_proto *tp
static int u32_destroy_key(struct tcf_proto *tp, struct tc_u_knode *n,
bool free_pf)
{
+ struct tc_u_hnode *ht = rtnl_dereference(n->ht_down);
+
tcf_exts_destroy(&n->exts);
tcf_exts_put_net(&n->exts);
- if (n->ht_down)
- n->ht_down->refcnt--;
+ if (ht && --ht->refcnt == 0)
+ kfree(ht);
#ifdef CONFIG_CLS_U32_PERF
if (free_pf)
free_percpu(n->pf);
@@ -653,16 +655,15 @@ static void u32_destroy(struct tcf_proto
hlist_del(&tp_c->hnode);
- for (ht = rtnl_dereference(tp_c->hlist);
- ht;
- ht = rtnl_dereference(ht->next)) {
- ht->refcnt--;
- u32_clear_hnode(tp, ht);
- }
-
while ((ht = rtnl_dereference(tp_c->hlist)) != NULL) {
+ u32_clear_hnode(tp, ht);
RCU_INIT_POINTER(tp_c->hlist, ht->next);
- kfree_rcu(ht, rcu);
+
+ /* u32_destroy_key() will later free ht for us, if it's
+ * still referenced by some knode
+ */
+ if (--ht->refcnt == 0)
+ kfree_rcu(ht, rcu);
}
idr_destroy(&tp_c->handle_idr);
Patches currently in stable-queue which might be from pabeni(a)redhat.com are
queue-4.15/cls_u32-fix-use-after-free-in-u32_destroy_key.patch
This is a note to let you know that I've just added the patch titled
cxgb4: fix trailing zero in CIM LA dump
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
cxgb4-fix-trailing-zero-in-cim-la-dump.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Rahul Lakkireddy <rahul.lakkireddy(a)chelsio.com>
Date: Thu, 15 Feb 2018 18:20:01 +0530
Subject: cxgb4: fix trailing zero in CIM LA dump
From: Rahul Lakkireddy <rahul.lakkireddy(a)chelsio.com>
[ Upstream commit e6f02a4d57cc438099bc8abfba43ba1400d77b38 ]
Set correct size of the CIM LA dump for T6.
Fixes: 27887bc7cb7f ("cxgb4: collect hardware LA dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy(a)chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr(a)chelsio.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c | 2 +-
drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
--- a/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cudbg_lib.c
@@ -156,7 +156,7 @@ int cudbg_collect_cim_la(struct cudbg_in
if (is_t6(padap->params.chip)) {
size = padap->params.cim_la_size / 10 + 1;
- size *= 11 * sizeof(u32);
+ size *= 10 * sizeof(u32);
} else {
size = padap->params.cim_la_size / 8;
size *= 8 * sizeof(u32);
--- a/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
+++ b/drivers/net/ethernet/chelsio/cxgb4/cxgb4_cudbg.c
@@ -97,7 +97,7 @@ static u32 cxgb4_get_entity_length(struc
case CUDBG_CIM_LA:
if (is_t6(adap->params.chip)) {
len = adap->params.cim_la_size / 10 + 1;
- len *= 11 * sizeof(u32);
+ len *= 10 * sizeof(u32);
} else {
len = adap->params.cim_la_size / 8;
len *= 8 * sizeof(u32);
Patches currently in stable-queue which might be from rahul.lakkireddy(a)chelsio.com are
queue-4.15/cxgb4-fix-trailing-zero-in-cim-la-dump.patch
This is a note to let you know that I've just added the patch titled
bridge: Fix VLAN reference count problem
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bridge-fix-vlan-reference-count-problem.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Ido Schimmel <idosch(a)mellanox.com>
Date: Sun, 25 Feb 2018 21:59:06 +0200
Subject: bridge: Fix VLAN reference count problem
From: Ido Schimmel <idosch(a)mellanox.com>
[ Upstream commit 0e5a82efda872c2469c210957d7d4161ef8f4391 ]
When a VLAN is added on a port, a reference is taken on the
corresponding master VLAN entry. If it does not already exist, then it
is created and a reference taken.
However, in the second case a reference is not really taken when
CONFIG_REFCOUNT_FULL is enabled as refcount_inc() is replaced by
refcount_inc_not_zero().
Fix this by using refcount_set() on a newly created master VLAN entry.
Fixes: 251277598596 ("net, bridge: convert net_bridge_vlan.refcnt from atomic_t to refcount_t")
Signed-off-by: Ido Schimmel <idosch(a)mellanox.com>
Acked-by: Nikolay Aleksandrov <nikolay(a)cumulusnetworks.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/bridge/br_vlan.c | 2 ++
1 file changed, 2 insertions(+)
--- a/net/bridge/br_vlan.c
+++ b/net/bridge/br_vlan.c
@@ -168,6 +168,8 @@ static struct net_bridge_vlan *br_vlan_g
masterv = br_vlan_find(vg, vid);
if (WARN_ON(!masterv))
return NULL;
+ refcount_set(&masterv->refcnt, 1);
+ return masterv;
}
refcount_inc(&masterv->refcnt);
Patches currently in stable-queue which might be from idosch(a)mellanox.com are
queue-4.15/mlxsw-spectrum_router-fix-error-path-in-mlxsw_sp_vr_create.patch
queue-4.15/mlxsw-spectrum_router-do-not-unconditionally-clear-route-offload-indication.patch
queue-4.15/bridge-fix-vlan-reference-count-problem.patch
queue-4.15/net-ipv4-set-addr_type-in-hash_keys-for-forwarded-case.patch
queue-4.15/mlxsw-spectrum_switchdev-check-success-of-fdb-add-operation.patch
This is a note to let you know that I've just added the patch titled
bridge: check brport attr show in brport_show
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
bridge-check-brport-attr-show-in-brport_show.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Xin Long <lucien.xin(a)gmail.com>
Date: Mon, 12 Feb 2018 17:15:40 +0800
Subject: bridge: check brport attr show in brport_show
From: Xin Long <lucien.xin(a)gmail.com>
[ Upstream commit 1b12580af1d0677c3c3a19e35bfe5d59b03f737f ]
Now br_sysfs_if file flush doesn't have attr show. To read it will
cause kernel panic after users chmod u+r this file.
Xiong found this issue when running the commands:
ip link add br0 type bridge
ip link add type veth
ip link set veth0 master br0
chmod u+r /sys/devices/virtual/net/veth0/brport/flush
timeout 3 cat /sys/devices/virtual/net/veth0/brport/flush
kernel crashed with NULL a pointer dereference call trace.
This patch is to fix it by return -EINVAL when brport_attr->show
is null, just the same as the check for brport_attr->store in
brport_store().
Fixes: 9cf637473c85 ("bridge: add sysfs hook to flush forwarding table")
Reported-by: Xiong Zhou <xzhou(a)redhat.com>
Signed-off-by: Xin Long <lucien.xin(a)gmail.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/bridge/br_sysfs_if.c | 3 +++
1 file changed, 3 insertions(+)
--- a/net/bridge/br_sysfs_if.c
+++ b/net/bridge/br_sysfs_if.c
@@ -255,6 +255,9 @@ static ssize_t brport_show(struct kobjec
struct brport_attribute *brport_attr = to_brport_attr(attr);
struct net_bridge_port *p = to_brport(kobj);
+ if (!brport_attr->show)
+ return -EINVAL;
+
return brport_attr->show(p, buf);
}
Patches currently in stable-queue which might be from lucien.xin(a)gmail.com are
queue-4.15/bridge-check-brport-attr-show-in-brport_show.patch
queue-4.15/sctp-do-not-pr_err-for-the-duplicated-node-in-transport-rhlist.patch
This is a note to let you know that I've just added the patch titled
amd-xgbe: Restore PCI interrupt enablement setting on resume
to the 4.15-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
amd-xgbe-restore-pci-interrupt-enablement-setting-on-resume.patch
and it can be found in the queue-4.15 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:56 PST 2018
From: Tom Lendacky <thomas.lendacky(a)amd.com>
Date: Tue, 20 Feb 2018 15:22:05 -0600
Subject: amd-xgbe: Restore PCI interrupt enablement setting on resume
From: Tom Lendacky <thomas.lendacky(a)amd.com>
[ Upstream commit cfd092f2db8b4b6727e1c03ef68a7842e1023573 ]
After resuming from suspend, the PCI device support must re-enable the
interrupt setting so that interrupts are actually delivered.
Signed-off-by: Tom Lendacky <thomas.lendacky(a)amd.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/ethernet/amd/xgbe/xgbe-pci.c | 2 ++
1 file changed, 2 insertions(+)
--- a/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
+++ b/drivers/net/ethernet/amd/xgbe/xgbe-pci.c
@@ -426,6 +426,8 @@ static int xgbe_pci_resume(struct pci_de
struct net_device *netdev = pdata->netdev;
int ret = 0;
+ XP_IOWRITE(pdata, XP_INT_EN, 0x1fffff);
+
pdata->lpm_ctrl &= ~MDIO_CTRL1_LPOWER;
XMDIO_WRITE(pdata, MDIO_MMD_PCS, MDIO_CTRL1, pdata->lpm_ctrl);
Patches currently in stable-queue which might be from thomas.lendacky(a)amd.com are
queue-4.15/net-amd-xgbe-fix-comparison-to-bitshift-when-dealing-with-a-mask.patch
queue-4.15/amd-xgbe-restore-pci-interrupt-enablement-setting-on-resume.patch
This is a note to let you know that I've just added the patch titled
virtio-net: disable NAPI only when enabled during XDP set
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:12 PST 2018
From: Jason Wang <jasowang(a)redhat.com>
Date: Wed, 28 Feb 2018 18:20:04 +0800
Subject: virtio-net: disable NAPI only when enabled during XDP set
From: Jason Wang <jasowang(a)redhat.com>
[ Upstream commit 4e09ff5362843dff3accfa84c805c7f3a99de9cd ]
We try to disable NAPI to prevent a single XDP TX queue being used by
multiple cpus. But we don't check if device is up (NAPI is enabled),
this could result stall because of infinite wait in
napi_disable(). Fixing this by checking device state through
netif_running() before.
Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set")
Signed-off-by: Jason Wang <jasowang(a)redhat.com>
Acked-by: Michael S. Tsirkin <mst(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/virtio_net.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1995,8 +1995,9 @@ static int virtnet_xdp_set(struct net_de
}
/* Make sure NAPI is not using any XDP TX queues for RX. */
- for (i = 0; i < vi->max_queue_pairs; i++)
- napi_disable(&vi->rq[i].napi);
+ if (netif_running(dev))
+ for (i = 0; i < vi->max_queue_pairs; i++)
+ napi_disable(&vi->rq[i].napi);
netif_set_real_num_rx_queues(dev, curr_qp + xdp_qp);
err = _virtnet_set_queues(vi, curr_qp + xdp_qp);
@@ -2015,7 +2016,8 @@ static int virtnet_xdp_set(struct net_de
}
if (old_prog)
bpf_prog_put(old_prog);
- virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
+ if (netif_running(dev))
+ virtnet_napi_enable(vi->rq[i].vq, &vi->rq[i].napi);
}
return 0;
Patches currently in stable-queue which might be from jasowang(a)redhat.com are
queue-4.14/tuntap-correctly-add-the-missing-xdp-flush.patch
queue-4.14/virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
queue-4.14/tuntap-disable-preemption-during-xdp-processing.patch
This is a note to let you know that I've just added the patch titled
udplite: fix partial checksum initialization
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
udplite-fix-partial-checksum-initialization.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:12 PST 2018
From: Alexey Kodanev <alexey.kodanev(a)oracle.com>
Date: Thu, 15 Feb 2018 20:18:43 +0300
Subject: udplite: fix partial checksum initialization
From: Alexey Kodanev <alexey.kodanev(a)oracle.com>
[ Upstream commit 15f35d49c93f4fa9875235e7bf3e3783d2dd7a1b ]
Since UDP-Lite is always using checksum, the following path is
triggered when calculating pseudo header for it:
udp4_csum_init() or udp6_csum_init()
skb_checksum_init_zero_check()
__skb_checksum_validate_complete()
The problem can appear if skb->len is less than CHECKSUM_BREAK. In
this particular case __skb_checksum_validate_complete() also invokes
__skb_checksum_complete(skb). If UDP-Lite is using partial checksum
that covers only part of a packet, the function will return bad
checksum and the packet will be dropped.
It can be fixed if we skip skb_checksum_init_zero_check() and only
set the required pseudo header checksum for UDP-Lite with partial
checksum before udp4_csum_init()/udp6_csum_init() functions return.
Fixes: ed70fcfcee95 ("net: Call skb_checksum_init in IPv4")
Fixes: e4f45b7f40bd ("net: Call skb_checksum_init in IPv6")
Signed-off-by: Alexey Kodanev <alexey.kodanev(a)oracle.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
include/net/udplite.h | 1 +
net/ipv4/udp.c | 5 +++++
net/ipv6/ip6_checksum.c | 5 +++++
3 files changed, 11 insertions(+)
--- a/include/net/udplite.h
+++ b/include/net/udplite.h
@@ -64,6 +64,7 @@ static inline int udplite_checksum_init(
UDP_SKB_CB(skb)->cscov = cscov;
if (skb->ip_summed == CHECKSUM_COMPLETE)
skb->ip_summed = CHECKSUM_NONE;
+ skb->csum_valid = 0;
}
return 0;
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2032,6 +2032,11 @@ static inline int udp4_csum_init(struct
err = udplite_checksum_init(skb, uh);
if (err)
return err;
+
+ if (UDP_SKB_CB(skb)->partial_cov) {
+ skb->csum = inet_compute_pseudo(skb, proto);
+ return 0;
+ }
}
/* Note, we are only interested in != 0 or == 0, thus the
--- a/net/ipv6/ip6_checksum.c
+++ b/net/ipv6/ip6_checksum.c
@@ -73,6 +73,11 @@ int udp6_csum_init(struct sk_buff *skb,
err = udplite_checksum_init(skb, uh);
if (err)
return err;
+
+ if (UDP_SKB_CB(skb)->partial_cov) {
+ skb->csum = ip6_compute_pseudo(skb, proto);
+ return 0;
+ }
}
/* To support RFC 6936 (allow zero checksum in UDP/IPV6 for tunnels)
Patches currently in stable-queue which might be from alexey.kodanev(a)oracle.com are
queue-4.14/sctp-fix-dst-refcnt-leak-in-sctp_v6_get_dst.patch
queue-4.14/udplite-fix-partial-checksum-initialization.patch
queue-4.14/sctp-verify-size-of-a-new-chunk-in-_sctp_make_chunk.patch
This is a note to let you know that I've just added the patch titled
tuntap: correctly add the missing XDP flush
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tuntap-correctly-add-the-missing-xdp-flush.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:12 PST 2018
From: Jason Wang <jasowang(a)redhat.com>
Date: Sat, 24 Feb 2018 11:32:26 +0800
Subject: tuntap: correctly add the missing XDP flush
From: Jason Wang <jasowang(a)redhat.com>
[ Upstream commit 1bb4f2e868a2891ab8bc668b8173d6ccb8c4ce6f ]
We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().
Note, this in fact won't try to batch packets through devmap, we could
address in the future.
Reported-by: Christoffer Dall <christoffer.dall(a)linaro.org>
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/tun.c | 1 +
1 file changed, 1 insertion(+)
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1333,6 +1333,7 @@ static struct sk_buff *tun_build_skb(str
get_page(alloc_frag->page);
alloc_frag->offset += buflen;
err = xdp_do_redirect(tun->dev, &xdp, xdp_prog);
+ xdp_do_flush_map();
if (err)
goto err_redirect;
rcu_read_unlock();
Patches currently in stable-queue which might be from jasowang(a)redhat.com are
queue-4.14/tuntap-correctly-add-the-missing-xdp-flush.patch
queue-4.14/virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
queue-4.14/tuntap-disable-preemption-during-xdp-processing.patch
This is a note to let you know that I've just added the patch titled
tuntap: disable preemption during XDP processing
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tuntap-disable-preemption-during-xdp-processing.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:12 PST 2018
From: Jason Wang <jasowang(a)redhat.com>
Date: Sat, 24 Feb 2018 11:32:25 +0800
Subject: tuntap: disable preemption during XDP processing
From: Jason Wang <jasowang(a)redhat.com>
[ Upstream commit 23e43f07f896f8578318cfcc9466f1e8b8ab21b6 ]
Except for tuntap, all other drivers' XDP was implemented at NAPI
poll() routine in a bh. This guarantees all XDP operation were done at
the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and we try to protect XDP
processing by RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
breaks the assumption that all XDP were processed in the same CPU.
Fixing this by simply disabling preemption during XDP processing.
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang(a)redhat.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/net/tun.c | 6 ++++++
1 file changed, 6 insertions(+)
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1315,6 +1315,7 @@ static struct sk_buff *tun_build_skb(str
else
*skb_xdp = 0;
+ preempt_disable();
rcu_read_lock();
xdp_prog = rcu_dereference(tun->xdp_prog);
if (xdp_prog && !*skb_xdp) {
@@ -1337,6 +1338,7 @@ static struct sk_buff *tun_build_skb(str
if (err)
goto err_redirect;
rcu_read_unlock();
+ preempt_enable();
return NULL;
case XDP_TX:
xdp_xmit = true;
@@ -1358,6 +1360,7 @@ static struct sk_buff *tun_build_skb(str
skb = build_skb(buf, buflen);
if (!skb) {
rcu_read_unlock();
+ preempt_enable();
return ERR_PTR(-ENOMEM);
}
@@ -1370,10 +1373,12 @@ static struct sk_buff *tun_build_skb(str
skb->dev = tun->dev;
generic_xdp_tx(skb, xdp_prog);
rcu_read_unlock();
+ preempt_enable();
return NULL;
}
rcu_read_unlock();
+ preempt_enable();
return skb;
@@ -1381,6 +1386,7 @@ err_redirect:
put_page(alloc_frag->page);
err_xdp:
rcu_read_unlock();
+ preempt_enable();
this_cpu_inc(tun->pcpu_stats->rx_dropped);
return NULL;
}
Patches currently in stable-queue which might be from jasowang(a)redhat.com are
queue-4.14/tuntap-correctly-add-the-missing-xdp-flush.patch
queue-4.14/virtio-net-disable-napi-only-when-enabled-during-xdp-set.patch
queue-4.14/tuntap-disable-preemption-during-xdp-processing.patch
This is a note to let you know that I've just added the patch titled
tcp_bbr: better deal with suboptimal GSO
to the 4.14-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=sum…
The filename of the patch is:
tcp_bbr-better-deal-with-suboptimal-gso.patch
and it can be found in the queue-4.14 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable(a)vger.kernel.org> know about it.
>From foo@baz Tue Mar 6 19:02:12 PST 2018
From: Eric Dumazet <edumazet(a)google.com>
Date: Wed, 21 Feb 2018 06:43:03 -0800
Subject: tcp_bbr: better deal with suboptimal GSO
From: Eric Dumazet <edumazet(a)google.com>
[ Upstream commit 350c9f484bde93ef229682eedd98cd5f74350f7f ]
BBR uses tcp_tso_autosize() in an attempt to probe what would be the
burst sizes and to adjust cwnd in bbr_target_cwnd() with following
gold formula :
/* Allow enough full-sized skbs in flight to utilize end systems. */
cwnd += 3 * bbr->tso_segs_goal;
But GSO can be lacking or be constrained to very small
units (ip link set dev ... gso_max_segs 2)
What we really want is to have enough packets in flight so that both
GSO and GRO are efficient.
So in the case GSO is off or downgraded, we still want to have the same
number of packets in flight as if GSO/TSO was fully operational, so
that GRO can hopefully be working efficiently.
To fix this issue, we make tcp_tso_autosize() unaware of
sk->sk_gso_max_segs
Only tcp_tso_segs() has to enforce the gso_max_segs limit.
Tested:
ethtool -K eth0 tso off gso off
tc qd replace dev eth0 root pfifo_fast
Before patch:
for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
691 (ss -temoi shows cwnd is stuck around 6 )
667
651
631
517
After patch :
# for f in {1..5}; do ./super_netperf 1 -H lpaa24 -- -K bbr; done
1733 (ss -temoi shows cwnd is around 386 )
1778
1746
1781
1718
Fixes: 0f8782ea1497 ("tcp_bbr: add BBR congestion control")
Signed-off-by: Eric Dumazet <edumazet(a)google.com>
Reported-by: Oleksandr Natalenko <oleksandr(a)natalenko.name>
Acked-by: Neal Cardwell <ncardwell(a)google.com>
Acked-by: Soheil Hassas Yeganeh <soheil(a)google.com>
Signed-off-by: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
net/ipv4/tcp_output.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1681,7 +1681,7 @@ u32 tcp_tso_autosize(const struct sock *
*/
segs = max_t(u32, bytes / mss_now, min_tso_segs);
- return min_t(u32, segs, sk->sk_gso_max_segs);
+ return segs;
}
EXPORT_SYMBOL(tcp_tso_autosize);
@@ -1693,8 +1693,10 @@ static u32 tcp_tso_segs(struct sock *sk,
const struct tcp_congestion_ops *ca_ops = inet_csk(sk)->icsk_ca_ops;
u32 tso_segs = ca_ops->tso_segs_goal ? ca_ops->tso_segs_goal(sk) : 0;
- return tso_segs ? :
- tcp_tso_autosize(sk, mss_now, sysctl_tcp_min_tso_segs);
+ if (!tso_segs)
+ tso_segs = tcp_tso_autosize(sk, mss_now,
+ sysctl_tcp_min_tso_segs);
+ return min_t(u32, tso_segs, sk->sk_gso_max_segs);
}
/* Returns the portion of skb which can be sent right away */
Patches currently in stable-queue which might be from edumazet(a)google.com are
queue-4.14/doc-change-the-min-default-value-of-tcp_wmem-tcp_rmem.patch
queue-4.14/tcp-purge-write-queue-upon-rst.patch
queue-4.14/net_sched-gen_estimator-fix-broken-estimators-based-on-percpu-stats.patch
queue-4.14/tcp_bbr-better-deal-with-suboptimal-gso.patch