This is a backport of a 5.1-rc patchset:
https://patchwork.ozlabs.org/cover/1029418/
which was backported to 4.19:
https://patchwork.ozlabs.org/cover/1081619/
and to 4.14:
https://patchwork.ozlabs.org/cover/1089651/
This 4.9 patchset is very close to the 4.14 patchset above
(the cherry-picks from 4.14 were almost clean).
Eric Dumazet (1):
ipv6: frags: fix a lockdep false positive
Florian Westphal (1):
ipv6: remove dependency of nf_defrag_ipv6 on ipv6 module
Peter Oskolkov (3):
net: IP defrag: encapsulate rbtree defrag code into callable functions
net: IP6 defrag: use rbtrees for IPv6 defrag
net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c
include/net/inet_frag.h | 16 +-
include/net/ipv6.h | 29 --
include/net/ipv6_frag.h | 111 +++++++
net/ieee802154/6lowpan/reassembly.c | 2 +-
net/ipv4/inet_fragment.c | 293 ++++++++++++++++++
net/ipv4/ip_fragment.c | 295 +++---------------
net/ipv6/netfilter/nf_conntrack_reasm.c | 273 +++++-----------
net/ipv6/netfilter/nf_defrag_ipv6_hooks.c | 3 +-
net/ipv6/reassembly.c | 361 ++++++----------------
net/openvswitch/conntrack.c | 1 +
10 files changed, 631 insertions(+), 753 deletions(-)
create mode 100644 include/net/ipv6_frag.h
--
2.21.0.593.g511ec345e18-goog
This barrier only applies to the read-modify-write operations; in
particular, it does not apply to the atomic64_set() primitive.
Replace the barrier with an smp_mb().
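For illustration only, here is a rough userspace analogue of the fixed
helper using C11 atomics. It is a sketch, not kernel code:
atomic_thread_fence(memory_order_seq_cst) stands in for smp_mb(), the
relaxed stores stand in for atomic64_set(), and the names struct dir_seq,
dir_set_complete() and dir_get_seq() are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for ci->i_complete_seq[]. */
struct dir_seq {
	_Atomic int64_t complete_seq[2];
};

/*
 * The relaxed stores below model atomic64_set(): a plain store, not a
 * read-modify-write operation. A barrier that only pairs with RMW
 * operations gives no ordering here, so a full fence is issued instead.
 */
static inline void dir_set_complete(struct dir_seq *d,
				    int64_t release_count,
				    int64_t ordered_count)
{
	atomic_thread_fence(memory_order_seq_cst);	/* models smp_mb() */
	atomic_store_explicit(&d->complete_seq[0], release_count,
			      memory_order_relaxed);
	atomic_store_explicit(&d->complete_seq[1], ordered_count,
			      memory_order_relaxed);
}

/* Read back one of the two sequence counters (idx must be 0 or 1). */
static inline int64_t dir_get_seq(struct dir_seq *d, int idx)
{
	assert(idx == 0 || idx == 1);
	return atomic_load_explicit(&d->complete_seq[idx],
				    memory_order_relaxed);
}
```

The C11 fence is only an approximation of the kernel barrier; the point
is that the ordering comes from the fence, not from the stores.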
Fixes: fdd4e15838e59 ("ceph: rework dcache readdir")
Cc: stable(a)vger.kernel.org
Reported-by: "Paul E. McKenney" <paulmck(a)linux.ibm.com>
Reported-by: Peter Zijlstra <peterz(a)infradead.org>
Signed-off-by: Andrea Parri <andrea.parri(a)amarulasolutions.com>
Cc: "Yan, Zheng" <zyan(a)redhat.com>
Cc: Sage Weil <sage(a)redhat.com>
Cc: Ilya Dryomov <idryomov(a)gmail.com>
Cc: ceph-devel(a)vger.kernel.org
---
fs/ceph/super.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ceph/super.h b/fs/ceph/super.h
index 16c03188578ea..b5c782e6d62f1 100644
--- a/fs/ceph/super.h
+++ b/fs/ceph/super.h
@@ -541,7 +541,7 @@ static inline void __ceph_dir_set_complete(struct ceph_inode_info *ci,
long long release_count,
long long ordered_count)
{
- smp_mb__before_atomic();
+ smp_mb();
atomic64_set(&ci->i_complete_seq[0], release_count);
atomic64_set(&ci->i_complete_seq[1], ordered_count);
}
--
2.7.4
Hi all,
After a glibc update to 2.29, my 4.14 builds started failing like so:
$ make -j$(nproc) defconfig bzImage
HOSTCC scripts/basic/fixdep
HOSTCC scripts/kconfig/conf.o
SHIPPED scripts/kconfig/zconf.tab.c
SHIPPED scripts/kconfig/zconf.lex.c
HOSTCC scripts/kconfig/zconf.tab.o
HOSTLD scripts/kconfig/conf
*** Default configuration is based on 'x86_64_defconfig'
#
# configuration written to .config
#
scripts/kconfig/conf --silentoldconfig Kconfig
SYSTBL arch/x86/include/generated/asm/syscalls_32.h
SYSHDR arch/x86/include/generated/asm/unistd_32_ia32.h
SYSHDR arch/x86/include/generated/asm/unistd_64_x32.h
SYSHDR arch/x86/include/generated/uapi/asm/unistd_32.h
SYSHDR arch/x86/include/generated/uapi/asm/unistd_64.h
SYSTBL arch/x86/include/generated/asm/syscalls_64.h
SYSHDR arch/x86/include/generated/uapi/asm/unistd_x32.h
CHK include/config/kernel.release
CHK include/generated/uapi/linux/version.h
UPD include/generated/uapi/linux/version.h
DESCEND objtool
HOSTCC /home/nathan/cbl/linux-stable/tools/objtool/fixdep.o
HOSTLD /home/nathan/cbl/linux-stable/tools/objtool/fixdep-in.o
LINK /home/nathan/cbl/linux-stable/tools/objtool/fixdep
CC /home/nathan/cbl/linux-stable/tools/objtool/builtin-check.o
CC /home/nathan/cbl/linux-stable/tools/objtool/builtin-orc.o
CC /home/nathan/cbl/linux-stable/tools/objtool/check.o
CC /home/nathan/cbl/linux-stable/tools/objtool/orc_gen.o
CC /home/nathan/cbl/linux-stable/tools/objtool/orc_dump.o
CC /home/nathan/cbl/linux-stable/tools/objtool/elf.o
CC /home/nathan/cbl/linux-stable/tools/objtool/special.o
GEN /home/nathan/cbl/linux-stable/tools/objtool/arch/x86/lib/inat-tables.c
CC /home/nathan/cbl/linux-stable/tools/objtool/objtool.o
CC /home/nathan/cbl/linux-stable/tools/objtool/libstring.o
CC /home/nathan/cbl/linux-stable/tools/objtool/str_error_r.o
CC /home/nathan/cbl/linux-stable/tools/objtool/exec-cmd.o
CC /home/nathan/cbl/linux-stable/tools/objtool/pager.o
CC /home/nathan/cbl/linux-stable/tools/objtool/help.o
CC /home/nathan/cbl/linux-stable/tools/objtool/parse-options.o
CC /home/nathan/cbl/linux-stable/tools/objtool/run-command.o
CC /home/nathan/cbl/linux-stable/tools/objtool/sigchain.o
CC /home/nathan/cbl/linux-stable/tools/objtool/subcmd-config.o
CC /home/nathan/cbl/linux-stable/tools/objtool/arch/x86/decode.o
LD /home/nathan/cbl/linux-stable/tools/objtool/libsubcmd-in.o
LD /home/nathan/cbl/linux-stable/tools/objtool/arch/x86/objtool-in.o
LD /home/nathan/cbl/linux-stable/tools/objtool/objtool-in.o
AR /home/nathan/cbl/linux-stable/tools/objtool/libsubcmd.a
LINK /home/nathan/cbl/linux-stable/tools/objtool/objtool
UPD include/config/kernel.release
HOSTCC arch/x86/tools/relocs_32.o
HOSTCC arch/x86/tools/relocs_common.o
HOSTCC arch/x86/tools/relocs_64.o
WRAP arch/x86/include/generated/asm/dma-contiguous.h
WRAP arch/x86/include/generated/asm/mcs_spinlock.h
WRAP arch/x86/include/generated/asm/clkdev.h
WRAP arch/x86/include/generated/asm/mm-arch-hooks.h
WRAP arch/x86/include/generated/asm/early_ioremap.h
CHK include/generated/utsrelease.h
UPD include/generated/utsrelease.h
HOSTCC scripts/kallsyms
HOSTCC scripts/pnmtologo
HOSTCC scripts/conmakehash
HOSTCC scripts/sortextable
CC scripts/mod/empty.o
CC scripts/mod/devicetable-offsets.s
HOSTCC scripts/mod/mk_elfconfig
HOSTCC scripts/selinux/mdp/mdp
HOSTCC scripts/selinux/genheaders/genheaders
In file included from scripts/selinux/genheaders/genheaders.c:19:
./security/selinux/include/classmap.h:245:2: error: #error New address family defined, please update secclass_map.
#error New address family defined, please update secclass_map.
^~~~~
make[4]: *** [scripts/Makefile.host:102: scripts/selinux/genheaders/genheaders] Error 1
make[3]: *** [scripts/Makefile.build:585: scripts/selinux/genheaders] Error 2
make[3]: *** Waiting for unfinished jobs....
In file included from scripts/selinux/mdp/mdp.c:49:
./security/selinux/include/classmap.h:245:2: error: #error New address family defined, please update secclass_map.
#error New address family defined, please update secclass_map.
^~~~~
make[4]: *** [scripts/Makefile.host:102: scripts/selinux/mdp/mdp] Error 1
make[3]: *** [scripts/Makefile.build:585: scripts/selinux/mdp] Error 2
make[2]: *** [scripts/Makefile.build:585: scripts/selinux] Error 2
make[2]: *** Waiting for unfinished jobs....
HOSTLD arch/x86/tools/relocs
CHK scripts/mod/devicetable-offsets.h
UPD scripts/mod/devicetable-offsets.h
MKELF scripts/mod/elfconfig.h
HOSTCC scripts/mod/modpost.o
HOSTCC scripts/mod/file2alias.o
HOSTCC scripts/mod/sumversion.o
CHK include/generated/timeconst.h
CC kernel/bounds.s
UPD include/generated/timeconst.h
CHK include/generated/bounds.h
UPD include/generated/bounds.h
CC arch/x86/kernel/asm-offsets.s
HOSTLD scripts/mod/modpost
make[1]: *** [Makefile:572: scripts] Error 2
make[1]: *** Waiting for unfinished jobs....
CHK include/generated/asm-offsets.h
UPD include/generated/asm-offsets.h
CALL scripts/checksyscalls.sh
make: *** [Makefile:264: __build_one_by_one] Error 2
This is due to commit c017c71ce09f ("selinux: include sys/socket.h in
host programs to have PF_MAX") [1] in the kernel interacting poorly
with glibc's commit 38b0593e9a ("Add PF_XDP, AF_XDP and SOL_XDP from
Linux 4.18 to bits/socket.h.") [2]
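The interaction can be sketched as a small host program. This assumes,
without having confirmed it in this thread, that the 4.14 classmap.h
guard fires when PF_MAX is greater than 44; KERNEL_EXPECTED_PF_MAX,
pf_max_mismatch() and report_pf_max() are hypothetical names used only
for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/socket.h>	/* with glibc, the host PF_MAX comes from here */

/* Assumption for illustration: the limit the 4.14 class map expects. */
#define KERNEL_EXPECTED_PF_MAX 44

/*
 * Nonzero when the host libc headers know about more address families
 * than the kernel tree's class map was written for. This is the
 * condition the #error in classmap.h guards against.
 */
static int pf_max_mismatch(int host_pf_max, int kernel_pf_max)
{
	assert(host_pf_max > 0 && kernel_pf_max > 0);
	return host_pf_max > kernel_pf_max;
}

/* Print what the host headers provide next to the assumed limit. */
static void report_pf_max(void)
{
#ifdef PF_MAX
	printf("host PF_MAX = %d, tree expects <= %d: %s\n",
	       PF_MAX, KERNEL_EXPECTED_PF_MAX,
	       pf_max_mismatch(PF_MAX, KERNEL_EXPECTED_PF_MAX) ?
	       "mismatch" : "ok");
#else
	puts("host headers do not define PF_MAX");
#endif
}
```

Calling report_pf_max() on the build host shows whether the libc headers
are ahead of the tree; the host programs pick up the same value once
they include sys/socket.h.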
I am not really sure how this should be fixed or who is at fault (I
assume the kernel), but I didn't see it reported anywhere yet, and I
feel more comfortable on the kernel mailing list than on other bug
trackers, so here we are.
[1]: https://git.kernel.org/linus/c017c71ce09f4c7a5378fccbec6a3d7e96b0c5c2
[2]: https://sourceware.org/git/?p=glibc.git;a=commit;h=38b0593e9a862c3b35392a0f…
Thanks,
Nathan
This is a note to let you know that I've just added the patch titled
USB: cdc-acm: fix unthrottle races
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
From 764478f41130f1b8d8057575b89e69980a0f600d Mon Sep 17 00:00:00 2001
From: Johan Hovold <johan(a)kernel.org>
Date: Thu, 25 Apr 2019 18:05:39 +0200
Subject: USB: cdc-acm: fix unthrottle races
Fix two long-standing bugs which could potentially lead to memory
corruption or leave the port throttled until it is reopened (on weakly
ordered systems), respectively, when read-URB completion races with
unthrottle().
First, the URB must not be marked as free before processing is complete
to prevent it from being submitted by unthrottle() on another CPU.
CPU 1 CPU 2
================ ================
complete() unthrottle()
process_urb();
smp_mb__before_atomic();
set_bit(i, free); if (test_and_clear_bit(i, free))
submit_urb();
Second, the URB must be marked as free before checking the throttled
flag to prevent unthrottle() on another CPU from failing to observe that
the URB needs to be submitted if complete() sees that the throttled flag
is set.
CPU 1 CPU 2
================ ================
complete() unthrottle()
set_bit(i, free); throttled = 0;
smp_mb__after_atomic(); smp_mb();
if (throttled) if (test_and_clear_bit(i, free))
return; submit_urb();
Note that test_and_clear_bit() only implies barriers when the test is
successful. To handle the case where the URB is still in use an explicit
barrier needs to be added to unthrottle() for the second race condition.
Also note that the first race was fixed by 36e59e0d70d6 ("cdc-acm: fix
race between callback and unthrottle") back in 2015, but the bug was
reintroduced a year later.
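The two orderings above can be sketched in userspace with C11 atomics.
This is a simplified model under stated assumptions, not the driver
code: sequentially consistent fences stand in for the kernel barriers,
atomic_fetch_and_explicit() stands in for test_and_clear_bit(), and
struct port, try_submit(), complete_urb() and unthrottle() are
hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct port {
	atomic_ulong urbs_free;	/* bit i set: URB i may be submitted */
	atomic_bool throttled;
	atomic_int submitted;	/* counts submissions, for illustration */
};

/* Models test_and_clear_bit() followed by submit_urb(). */
static bool try_submit(struct port *p, unsigned int i)
{
	unsigned long mask = 1UL << i;
	unsigned long old = atomic_fetch_and_explicit(&p->urbs_free, ~mask,
						      memory_order_seq_cst);

	if (!(old & mask))
		return false;		/* URB still in use */
	atomic_fetch_add(&p->submitted, 1);
	return true;
}

/* Completion side (CPU 1 in the diagrams). */
static void complete_urb(struct port *p, unsigned int i)
{
	/* process_urb() would run here, before the URB is marked free. */
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__before_atomic() */
	atomic_fetch_or_explicit(&p->urbs_free, 1UL << i,
				 memory_order_relaxed);	/* set_bit() */
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb__after_atomic() */

	if (atomic_load_explicit(&p->throttled, memory_order_relaxed))
		return;			/* unthrottle() will submit it */
	try_submit(p, i);
}

/* Unthrottle side (CPU 2 in the diagrams). */
static void unthrottle(struct port *p, unsigned int nr_urbs)
{
	atomic_store_explicit(&p->throttled, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* smp_mb() */

	for (unsigned int i = 0; i < nr_urbs; i++)
		try_submit(p, i);
}
```

With both fences in place, at least one side observes the other's store:
either complete_urb() sees throttled cleared and submits, or
unthrottle() sees the free bit and submits.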
Fixes: 1aba579f3cf5 ("cdc-acm: handle read pipe errors")
Fixes: 088c64f81284 ("USB: cdc-acm: re-write read processing")
Signed-off-by: Johan Hovold <johan(a)kernel.org>
Acked-by: Oliver Neukum <oneukum(a)suse.com>
Cc: stable <stable(a)vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh(a)linuxfoundation.org>
---
drivers/usb/class/cdc-acm.c | 32 +++++++++++++++++++++++++-------
1 file changed, 25 insertions(+), 7 deletions(-)
diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index ec666eb4b7b4..c03aa8550980 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -470,12 +470,12 @@ static void acm_read_bulk_callback(struct urb *urb)
struct acm *acm = rb->instance;
unsigned long flags;
int status = urb->status;
+ bool stopped = false;
+ bool stalled = false;
dev_vdbg(&acm->data->dev, "got urb %d, len %d, status %d\n",
rb->index, urb->actual_length, status);
- set_bit(rb->index, &acm->read_urbs_free);
-
if (!acm->dev) {
dev_dbg(&acm->data->dev, "%s - disconnected\n", __func__);
return;
@@ -488,15 +488,16 @@ static void acm_read_bulk_callback(struct urb *urb)
break;
case -EPIPE:
set_bit(EVENT_RX_STALL, &acm->flags);
- schedule_work(&acm->work);
- return;
+ stalled = true;
+ break;
case -ENOENT:
case -ECONNRESET:
case -ESHUTDOWN:
dev_dbg(&acm->data->dev,
"%s - urb shutting down with status: %d\n",
__func__, status);
- return;
+ stopped = true;
+ break;
default:
dev_dbg(&acm->data->dev,
"%s - nonzero urb status received: %d\n",
@@ -505,10 +506,24 @@ static void acm_read_bulk_callback(struct urb *urb)
}
/*
- * Unthrottle may run on another CPU which needs to see events
- * in the same order. Submission has an implict barrier
+ * Make sure URB processing is done before marking as free to avoid
+ * racing with unthrottle() on another CPU. Matches the barriers
+ * implied by the test_and_clear_bit() in acm_submit_read_urb().
*/
smp_mb__before_atomic();
+ set_bit(rb->index, &acm->read_urbs_free);
+ /*
+ * Make sure URB is marked as free before checking the throttled flag
+ * to avoid racing with unthrottle() on another CPU. Matches the
+ * smp_mb() in unthrottle().
+ */
+ smp_mb__after_atomic();
+
+ if (stopped || stalled) {
+ if (stalled)
+ schedule_work(&acm->work);
+ return;
+ }
/* throttle device if requested by tty */
spin_lock_irqsave(&acm->read_lock, flags);
@@ -842,6 +857,9 @@ static void acm_tty_unthrottle(struct tty_struct *tty)
acm->throttle_req = 0;
spin_unlock_irq(&acm->read_lock);
+ /* Matches the smp_mb__after_atomic() in acm_read_bulk_callback(). */
+ smp_mb();
+
if (was_throttled)
acm_submit_read_urbs(acm, GFP_KERNEL);
}
--
2.21.0
Once blk_cleanup_queue() returns, tags shouldn't be used any more,
because blk_mq_free_tag_set() may be called. Commit 45a9c9d909b2
("blk-mq: Fix a use-after-free") fixes this issue exactly.
However, that commit introduces another issue. Before 45a9c9d909b2,
we were allowed to run the queue during queue cleanup as long as the
queue's kobj refcount was held. After that commit, the queue can't be
run during queue cleanup; otherwise an oops can be triggered easily,
because some fields of hctx are freed by blk_mq_free_queue() in
blk_cleanup_queue().
We have invented ways of addressing this kind of issue before, such as:
8dc765d438f1 ("SCSI: fix queue cleanup race before queue initialization is done")
c2856ae2f315 ("blk-mq: quiesce queue before freeing queue")
But these still can't cover all cases; recently James reported another
issue of this kind:
https://marc.info/?l=linux-scsi&m=155389088124782&w=2
This issue can be quite hard to address in the previous way, given that
scsi_run_queue() may run requeues for other LUNs.
Fix the above issue by freeing hctx's resources in its release handler.
This is safe because tags aren't needed for freeing such hctx resources.
This approach follows the typical design pattern for a kobject's release
handler.
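As a minimal single-threaded sketch of that pattern (userspace C, not
the blk-mq code): teardown that must happen early is done in an exit
step, while memory is only freed by the release callback when the last
reference is dropped. All names here (struct obj, obj_get(), obj_put(),
struct hw_ctx, hw_ctx_alloc(), hw_ctx_exit(), hw_ctx_release()) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Tiny stand-in for a kobject: a refcount plus a release callback. */
struct obj {
	int refcount;			/* plain int: single-threaded sketch */
	void (*release)(struct obj *);
};

static void obj_get(struct obj *o) { o->refcount++; }

static void obj_put(struct obj *o)
{
	if (--o->refcount == 0)
		o->release(o);
}

struct hw_ctx {
	struct obj kobj;
	int *ctx_map;			/* resource late users may still touch */
	int exited;
	int *released;			/* set to 1 on release, for illustration */
};

/* Release handler: the only place where the resources are freed. */
static void hw_ctx_release(struct obj *o)
{
	struct hw_ctx *h = (struct hw_ctx *)((char *)o -
					     offsetof(struct hw_ctx, kobj));

	*h->released = 1;
	free(h->ctx_map);
	free(h);
}

static struct hw_ctx *hw_ctx_alloc(int *released)
{
	struct hw_ctx *h = calloc(1, sizeof(*h));

	if (!h)
		return NULL;
	h->ctx_map = calloc(8, sizeof(*h->ctx_map));
	if (!h->ctx_map) {
		free(h);
		return NULL;
	}
	h->kobj.refcount = 1;
	h->kobj.release = hw_ctx_release;
	h->released = released;
	return h;
}

/* Exit step: early teardown only, nothing is freed here. */
static void hw_ctx_exit(struct hw_ctx *h)
{
	h->exited = 1;
}
```

A late user that took a reference with obj_get() before hw_ctx_exit()
can still touch ctx_map safely; the memory goes away only on the final
obj_put().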
Cc: Dongli Zhang <dongli.zhang(a)oracle.com>
Cc: James Smart <james.smart(a)broadcom.com>
Cc: Bart Van Assche <bart.vanassche(a)wdc.com>
Cc: linux-scsi(a)vger.kernel.org
Cc: Martin K . Petersen <martin.petersen(a)oracle.com>
Cc: Christoph Hellwig <hch(a)lst.de>
Cc: James E . J . Bottomley <jejb(a)linux.vnet.ibm.com>
Reported-by: James Smart <james.smart(a)broadcom.com>
Fixes: 45a9c9d909b2 ("blk-mq: Fix a use-after-free")
Cc: stable(a)vger.kernel.org
Reviewed-by: Hannes Reinecke <hare(a)suse.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Tested-by: James Smart <james.smart(a)broadcom.com>
Signed-off-by: Ming Lei <ming.lei(a)redhat.com>
---
block/blk-core.c | 2 +-
block/blk-mq-sysfs.c | 6 ++++++
block/blk-mq.c | 8 ++------
block/blk-mq.h | 2 +-
4 files changed, 10 insertions(+), 8 deletions(-)
diff --git a/block/blk-core.c b/block/blk-core.c
index 93dc588fabe2..2dd94b3e9ece 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -374,7 +374,7 @@ void blk_cleanup_queue(struct request_queue *q)
blk_exit_queue(q);
if (queue_is_mq(q))
- blk_mq_free_queue(q);
+ blk_mq_exit_queue(q);
percpu_ref_exit(&q->q_usage_counter);
diff --git a/block/blk-mq-sysfs.c b/block/blk-mq-sysfs.c
index 3f9c3f4ac44c..4040e62c3737 100644
--- a/block/blk-mq-sysfs.c
+++ b/block/blk-mq-sysfs.c
@@ -10,6 +10,7 @@
#include <linux/smp.h>
#include <linux/blk-mq.h>
+#include "blk.h"
#include "blk-mq.h"
#include "blk-mq-tag.h"
@@ -33,6 +34,11 @@ static void blk_mq_hw_sysfs_release(struct kobject *kobj)
{
struct blk_mq_hw_ctx *hctx = container_of(kobj, struct blk_mq_hw_ctx,
kobj);
+
+ if (hctx->flags & BLK_MQ_F_BLOCKING)
+ cleanup_srcu_struct(hctx->srcu);
+ blk_free_flush_queue(hctx->fq);
+ sbitmap_free(&hctx->ctx_map);
free_cpumask_var(hctx->cpumask);
kfree(hctx->ctxs);
kfree(hctx);
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 89781309a108..d98cb9614dfa 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2267,12 +2267,7 @@ static void blk_mq_exit_hctx(struct request_queue *q,
if (set->ops->exit_hctx)
set->ops->exit_hctx(hctx, hctx_idx);
- if (hctx->flags & BLK_MQ_F_BLOCKING)
- cleanup_srcu_struct(hctx->srcu);
-
blk_mq_remove_cpuhp(hctx);
- blk_free_flush_queue(hctx->fq);
- sbitmap_free(&hctx->ctx_map);
}
static void blk_mq_exit_hw_queues(struct request_queue *q,
@@ -2907,7 +2902,8 @@ struct request_queue *blk_mq_init_allocated_queue(struct blk_mq_tag_set *set,
}
EXPORT_SYMBOL(blk_mq_init_allocated_queue);
-void blk_mq_free_queue(struct request_queue *q)
+/* tags can _not_ be used after returning from blk_mq_exit_queue */
+void blk_mq_exit_queue(struct request_queue *q)
{
struct blk_mq_tag_set *set = q->tag_set;
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 423ea88ab6fb..633a5a77ee8b 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -37,7 +37,7 @@ struct blk_mq_ctx {
struct kobject kobj;
} ____cacheline_aligned_in_smp;
-void blk_mq_free_queue(struct request_queue *q);
+void blk_mq_exit_queue(struct request_queue *q);
int blk_mq_update_nr_requests(struct request_queue *q, unsigned int nr);
void blk_mq_wake_waiters(struct request_queue *q);
bool blk_mq_dispatch_rq_list(struct request_queue *, struct list_head *, bool);
--
2.9.5
Commit abbbdf12497d ("replace kill_bdev() with __invalidate_device()")
once did this, but 29eaadc03649 ("nbd: stop using the bdev everywhere")
resurrected kill_bdev(), and it has been there ever since. So buffer_head
mappings still get killed on a server disconnection, and we can still
hit the BUG_ON on a filesystem on top of the nbd device.
EXT4-fs (nbd0): mounted filesystem with ordered data mode. Opts: (null)
block nbd0: Receive control failed (result -32)
block nbd0: shutting down sockets
print_req_error: I/O error, dev nbd0, sector 66264 flags 3000
EXT4-fs warning (device nbd0): htree_dirblock_to_tree:979: inode #2: lblock 0: comm ls: error -5 reading directory block
print_req_error: I/O error, dev nbd0, sector 2264 flags 3000
EXT4-fs error (device nbd0): __ext4_get_inode_loc:4690: inode #2: block 283: comm ls: unable to read itable block
EXT4-fs error (device nbd0) in ext4_reserve_inode_write:5894: IO failure
------------[ cut here ]------------
kernel BUG at fs/buffer.c:3057!
invalid opcode: 0000 [#1] SMP PTI
CPU: 7 PID: 40045 Comm: jbd2/nbd0-8 Not tainted 5.1.0-rc3+ #4
Hardware name: Amazon EC2 m5.12xlarge/, BIOS 1.0 10/16/2017
RIP: 0010:submit_bh_wbc+0x18b/0x190
...
Call Trace:
jbd2_write_superblock+0xf1/0x230 [jbd2]
? account_entity_enqueue+0xc5/0xf0
jbd2_journal_update_sb_log_tail+0x94/0xe0 [jbd2]
jbd2_journal_commit_transaction+0x12f/0x1d20 [jbd2]
? __switch_to_asm+0x40/0x70
...
? lock_timer_base+0x67/0x80
kjournald2+0x121/0x360 [jbd2]
? remove_wait_queue+0x60/0x60
kthread+0xf8/0x130
? commit_timeout+0x10/0x10 [jbd2]
? kthread_bind+0x10/0x10
ret_from_fork+0x35/0x40
With __invalidate_device(), I no longer hit the BUG_ON with sync or
unmount on the disconnected device.
Fixes: 29eaadc03649 ("nbd: stop using the bdev everywhere")
Cc: linux-block(a)vger.kernel.org
Cc: Ratna Manoj Bolla <manoj.br(a)gmail.com>
Cc: nbd(a)other.debian.org
Cc: stable(a)vger.kernel.org
Cc: David Woodhouse <dwmw(a)amazon.com>
Signed-off-by: Munehisa Kamata <kamatam(a)amazon.com>
CR: https://code.amazon.com/reviews/CR-7629288
---
drivers/block/nbd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
index 90ba9f4..6d6eedd 100644
--- a/drivers/block/nbd.c
+++ b/drivers/block/nbd.c
@@ -1217,7 +1217,7 @@ static void nbd_clear_sock_ioctl(struct nbd_device *nbd,
struct block_device *bdev)
{
sock_shutdown(nbd);
- kill_bdev(bdev);
+ __invalidate_device(bdev, true);
nbd_bdev_reset(bdev);
if (test_and_clear_bit(NBD_HAS_CONFIG_REF,
&nbd->config->runtime_flags))
--
2.7.4