This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
------------- Pseudo-Shortlog of commits:
Greg Kroah-Hartman gregkh@linuxfoundation.org Linux 6.0.17-rc1
Marco Elver elver@google.com kcsan: Instrument memcpy/memset/memmove with newer Clang
Chuck Lever chuck.lever@oracle.com SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails
Hanjun Guo guohanjun@huawei.com tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak
Hanjun Guo guohanjun@huawei.com tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak
Hanjun Guo guohanjun@huawei.com tpm: acpi: Call acpi_put_table() to fix memory leak
Deren Wu deren.wu@mediatek.com mmc: vub300: fix warning - do not call blocking ops when !TASK_RUNNING
Jan Kara jack@suse.cz block: Do not reread partition table on exclusively open device
Jaegeuk Kim jaegeuk@kernel.org f2fs: allow to read node block after shutdown
Pavel Machek pavel@denx.de f2fs: should put a page when checking the summary info
NARIBAYASHI Akira a.naribayashi@fujitsu.com mm, compaction: fix fast_isolate_around() to stay within boundaries
Mikulas Patocka mpatocka@redhat.com md: fix a crash in mempool_free
ChiYuan Huang cy_huang@richtek.com mfd: mt6360: Add bounds checking in Regmap read/write call-backs
Christian Brauner brauner@kernel.org pnode: terminate at peers of source
Takashi Iwai tiwai@suse.de ALSA: hda/hdmi: Static PCM mapping again with AMD HDMI codecs
Artem Egorkine arteme@gmail.com ALSA: line6: fix stack overflow in line6_midi_transmit
Artem Egorkine arteme@gmail.com ALSA: line6: correct midi status byte when receiving data from podxt
Al Viro viro@zeniv.linux.org.uk ovl: update ->f_iocb_flags when ovl_change_flags() modifies ->f_flags
Zhang Tianci zhangtianci.1997@bytedance.com ovl: Use ovl mounter's fsuid and fsgid in ovl_link()
Wang Yufen wangyufen@huawei.com binfmt: Fix error return code in load_elf_fdpic_binary()
Pavel Begunkov asml.silence@gmail.com io_uring: dont remove file from msg_ring reqs
Jens Axboe axboe@kernel.dk eventfd: provide a eventfd_signal_mask() helper
Jens Axboe axboe@kernel.dk eventpoll: add EPOLL_URING_WAKE poll wakeup flag
Aditya Garg gargaditya08@live.com hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount
Qiujun Huang hqjagain@gmail.com pstore/zone: Use GFP_ATOMIC to allocate zone buffer
Luca Stefani luca@osomprivacy.com pstore: Properly assign mem_type property
Mathieu Desnoyers mathieu.desnoyers@efficios.com mm/mempolicy: fix memory leak in set_mempolicy_home_node system call
Mel Gorman mgorman@techsingularity.net rtmutex: Add acquire semantics for rtmutex lock acquisition slow path
Mathieu Desnoyers mathieu.desnoyers@efficios.com futex: Fix futex_waitv() hrtimer debug object leak on kcalloc error
Terry Junge linuxhid@cosmicgizmosystems.com HID: plantronics: Additional PIDs for double volume key presses quirk
José Expósito jose.exposito89@gmail.com HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint
wuqiang wuqiang.matt@bytedance.com kprobes: kretprobe events missing on 2-core KVM guest
Kees Cook keescook@chromium.org rtc: msc313: Fix function prototype mismatch in msc313_rtc_probe()
Nathan Lynch nathanl@linux.ibm.com powerpc/rtas: avoid scheduling in rtas_os_term()
Nathan Lynch nathanl@linux.ibm.com powerpc/rtas: avoid device tree lookups in rtas_os_term()
Ricardo Ribalda ribalda@chromium.org iommu/mediatek: Fix crash on isr after kexec()
Christophe Leroy christophe.leroy@csgroup.eu objtool: Fix SEGFAULT
Yin Xiujiang yinxiujiang@kylinos.cn fs/ntfs3: Fix slab-out-of-bounds in r_page
Dan Carpenter dan.carpenter@oracle.com fs/ntfs3: Delete duplicate condition in ntfs_read_mft()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: Use __GFP_NOWARN allocation at ntfs_fill_super()
Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp fs/ntfs3: Use __GFP_NOWARN allocation at wnd_init()
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate index root when initialize NTFS security
Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com soundwire: dmi-quirks: add quirk variant for LAPBC710 NUC15
Hawkins Jiawei yin31149@gmail.com fs/ntfs3: Fix slab-out-of-bounds read in run_unpack
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate resident attribute name
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate buffer length while parsing index
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate attribute name offset
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Add null pointer check for inode operations
Shigeru Yoshida syoshida@redhat.com fs/ntfs3: Fix memory leak on ntfs_fill_super() error path
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Add null pointer check to attr_load_runs_vcn
Edward Lo edward.lo@ambergroup.io fs/ntfs3: Validate data run offset
edward lo edward.lo@ambergroup.io fs/ntfs3: Add overflow check for attribute size
edward lo edward.lo@ambergroup.io fs/ntfs3: Validate BOOT record_size
Christoph Hellwig hch@lst.de nvmet: don't defer passthrough commands with trivial effects to the workqueue
Christoph Hellwig hch@lst.de nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition
Adam Vodopjan grozzly@protonmail.com ata: ahci: Fix PCS quirk application for suspend
Yu Kuai yukuai3@huawei.com block, bfq: fix uaf for bfqq in bfq_exit_icq_bfqq
Adrian Freund adrian@freund.io ACPI: resource: do IRQ override on Lenovo 14ALC7
Erik Schumacher ofenfisch@googlemail.com ACPI: resource: do IRQ override on XMG Core 15
Jiri Slaby (SUSE) jirislaby@kernel.org ACPI: resource: do IRQ override on LENOVO IdeaPad
Tamim Khan tamim@fusetak.com ACPI: resource: Skip IRQ override on Asus Vivobook K3402ZA/K3502ZA
Keith Busch kbusch@kernel.org nvme-pci: fix page size checks
Keith Busch kbusch@kernel.org nvme-pci: fix mempool alloc size
Klaus Jensen k.jensen@samsung.com nvme-pci: fix doorbell buffer value endianness
Paulo Alcantara pc@cjr.nz cifs: don't leak -ENOMEM in smb2_open_file()
Paulo Alcantara pc@cjr.nz cifs: fix static checker warning
Tejun Heo tj@kernel.org blk-iolatency: Fix memory leak on add_disk() failures
Christoph Hellwig hch@lst.de blk-cgroup: pass a gendisk to blkg_destroy_all
Christoph Hellwig hch@lst.de blk-throttle: pass a gendisk to blk_throtl_init and blk_throtl_exit
Christoph Hellwig hch@lst.de blk-cgroup: pass a gendisk to blkcg_init_queue and blkcg_exit_queue
Christoph Hellwig hch@lst.de blk-cgroup: cleanup the blkg_lookup family of functions
Christoph Hellwig hch@lst.de blk-cgroup: remove open coded blkg_lookup instances
Christoph Hellwig hch@lst.de blk-cgroup: remove blk_queue_root_blkg
Christoph Hellwig hch@lst.de blk-cgroup: fix error unwinding in blkcg_init_queue
Miaoqian Lin linmq006@gmail.com usb: dwc3: qcom: Fix memory leak in dwc3_qcom_interconnect_init
-------------
Diffstat:
Documentation/trace/kprobes.rst | 3 +- Makefile | 4 +- arch/powerpc/kernel/rtas.c | 20 ++++++-- block/bfq-iosched.c | 2 +- block/blk-cgroup.c | 100 ++++++++++++++------------------------ block/blk-cgroup.h | 68 ++++++++------------------ block/blk-throttle.c | 7 ++- block/blk-throttle.h | 8 +-- block/blk.h | 2 +- block/genhd.c | 12 +++-- block/ioctl.c | 12 +++-- drivers/acpi/resource.c | 78 ++++++++++++++++++++++++----- drivers/ata/ahci.c | 32 ++++++++---- drivers/char/tpm/eventlog/acpi.c | 12 +++-- drivers/char/tpm/tpm_crb.c | 29 +++++++---- drivers/char/tpm/tpm_tis.c | 9 ++-- drivers/hid/hid-ids.h | 3 ++ drivers/hid/hid-multitouch.c | 4 ++ drivers/hid/hid-plantronics.c | 9 ++++ drivers/iommu/mtk_iommu.c | 2 +- drivers/md/md.c | 9 ++-- drivers/mfd/mt6360-core.c | 14 +++++- drivers/mmc/host/vub300.c | 2 + drivers/nvme/host/pci.c | 37 +++++++------- drivers/nvme/target/passthru.c | 11 ++--- drivers/rtc/rtc-msc313.c | 12 +---- drivers/soundwire/dmi-quirks.c | 8 +++ drivers/usb/dwc3/dwc3-qcom.c | 13 +++-- fs/binfmt_elf_fdpic.c | 5 +- fs/cifs/smb2file.c | 4 +- fs/eventfd.c | 37 ++++++++------ fs/eventpoll.c | 18 ++++--- fs/f2fs/gc.c | 1 + fs/f2fs/node.c | 3 +- fs/hfsplus/hfsplus_fs.h | 2 + fs/hfsplus/inode.c | 4 +- fs/hfsplus/options.c | 4 ++ fs/ntfs3/attrib.c | 18 +++++++ fs/ntfs3/attrlist.c | 5 ++ fs/ntfs3/bitmap.c | 2 +- fs/ntfs3/frecord.c | 14 ++++++ fs/ntfs3/fslog.c | 35 +++++-------- fs/ntfs3/fsntfs.c | 10 ++-- fs/ntfs3/index.c | 6 +++ fs/ntfs3/inode.c | 9 ++++ fs/ntfs3/record.c | 10 ++++ fs/ntfs3/super.c | 9 ++-- fs/overlayfs/dir.c | 46 ++++++++++++------ fs/overlayfs/file.c | 1 + fs/pnode.c | 2 +- fs/pstore/ram.c | 2 +- fs/pstore/zone.c | 2 +- include/linux/eventfd.h | 7 +++ include/linux/nvme.h | 3 +- include/uapi/linux/eventpoll.h | 6 +++ io_uring/io_uring.c | 2 +- io_uring/msg_ring.c | 4 -- io_uring/opdef.c | 7 +++ io_uring/opdef.h | 2 + kernel/futex/syscalls.c | 11 +++-- kernel/kcsan/core.c | 50 +++++++++++++++++++ kernel/kprobes.c | 8 +-- kernel/locking/rtmutex.c | 55 +++++++++++++++++---- kernel/locking/rtmutex_api.c | 6 +-- mm/compaction.c | 18 ++----- mm/mempolicy.c | 1 + net/sunrpc/auth_gss/svcauth_gss.c | 9 +++- sound/pci/hda/patch_hdmi.c | 27 +++++++--- sound/usb/line6/driver.c | 3 +- sound/usb/line6/midi.c | 6 ++- sound/usb/line6/midibuf.c | 25 +++++++--- sound/usb/line6/midibuf.h | 5 +- sound/usb/line6/pod.c | 3 +- tools/objtool/check.c | 2 +- 74 files changed, 664 insertions(+), 367 deletions(-)
From: Miaoqian Lin linmq006@gmail.com
[ Upstream commit 97a48da1619ba6bd42a0e5da0a03aa490a9496b1 ]
of_icc_get() alloc resources for path handle, we should release it when not need anymore. Like the release in dwc3_qcom_interconnect_exit() function. Add icc_put() in error handling to fix this.
Fixes: bea46b981515 ("usb: dwc3: qcom: Add interconnect support in dwc3 driver") Cc: stable stable@kernel.org Acked-by: Thinh Nguyen Thinh.Nguyen@synopsys.com Signed-off-by: Miaoqian Lin linmq006@gmail.com Link: https://lore.kernel.org/r/20221206081731.818107-1-linmq006@gmail.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/usb/dwc3/dwc3-qcom.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c index d3f3937d7005..e7d37b6000ad 100644 --- a/drivers/usb/dwc3/dwc3-qcom.c +++ b/drivers/usb/dwc3/dwc3-qcom.c @@ -260,7 +260,8 @@ static int dwc3_qcom_interconnect_init(struct dwc3_qcom *qcom) if (IS_ERR(qcom->icc_path_apps)) { dev_err(dev, "failed to get apps-usb path: %ld\n", PTR_ERR(qcom->icc_path_apps)); - return PTR_ERR(qcom->icc_path_apps); + ret = PTR_ERR(qcom->icc_path_apps); + goto put_path_ddr; }
if (usb_get_maximum_speed(&qcom->dwc3->dev) >= USB_SPEED_SUPER || @@ -273,17 +274,23 @@ static int dwc3_qcom_interconnect_init(struct dwc3_qcom *qcom)
if (ret) { dev_err(dev, "failed to set bandwidth for usb-ddr path: %d\n", ret); - return ret; + goto put_path_apps; }
ret = icc_set_bw(qcom->icc_path_apps, APPS_USB_AVG_BW, APPS_USB_PEAK_BW); if (ret) { dev_err(dev, "failed to set bandwidth for apps-usb path: %d\n", ret); - return ret; + goto put_path_apps; }
return 0; + +put_path_apps: + icc_put(qcom->icc_path_apps); +put_path_ddr: + icc_put(qcom->icc_path_ddr); + return ret; }
/**
From: Christoph Hellwig hch@lst.de
[ Upstream commit 33dc62796cb657a633050138a86253fb2a553713 ]
When blk_throtl_init fails, we need to call blk_ioprio_exit. Switch to proper goto based unwinding to fix this.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-2-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 13 +++++++------ 1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index c8f0c865bf4e..bcd3873ac5ff 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1297,17 +1297,18 @@ int blkcg_init_queue(struct request_queue *q)
ret = blk_throtl_init(q); if (ret) - goto err_destroy_all; + goto err_ioprio_exit;
ret = blk_iolatency_init(q); - if (ret) { - blk_throtl_exit(q); - blk_ioprio_exit(q); - goto err_destroy_all; - } + if (ret) + goto err_throtl_exit;
return 0;
+err_throtl_exit: + blk_throtl_exit(q); +err_ioprio_exit: + blk_ioprio_exit(q); err_destroy_all: blkg_destroy_all(q); return ret;
From: Christoph Hellwig hch@lst.de
[ Upstream commit 928f6f00a91ecbef6cb1fe59474831ceaf205290 ]
Just open code it in the only caller and drop the unused !BLK_CGROUP stub.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-3-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 3 +-- block/blk-cgroup.h | 13 ------------- 2 files changed, 1 insertion(+), 15 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index bcd3873ac5ff..11f256214928 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -915,8 +915,7 @@ static void blkcg_fill_root_iostats(void) class_dev_iter_init(&iter, &block_class, NULL, &disk_type); while ((dev = class_dev_iter_next(&iter))) { struct block_device *bdev = dev_to_bdev(dev); - struct blkcg_gq *blkg = - blk_queue_root_blkg(bdev_get_queue(bdev)); + struct blkcg_gq *blkg = bdev->bd_disk->queue->root_blkg; struct blkg_iostat tmp; int cpu; unsigned long flags; diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index d2724d1dd7c9..c1fb00a1dfc0 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -268,17 +268,6 @@ static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, return __blkg_lookup(blkcg, q, false); }
-/** - * blk_queue_root_blkg - return blkg for the (blkcg_root, @q) pair - * @q: request_queue of interest - * - * Lookup blkg for @q at the root level. See also blkg_lookup(). - */ -static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) -{ - return q->root_blkg; -} - /** * blkg_to_pdata - get policy private data * @blkg: blkg of interest @@ -507,8 +496,6 @@ struct blkcg { };
static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } -static inline struct blkcg_gq *blk_queue_root_blkg(struct request_queue *q) -{ return NULL; } static inline int blkcg_init_queue(struct request_queue *q) { return 0; } static inline void blkcg_exit_queue(struct request_queue *q) { } static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; }
From: Christoph Hellwig hch@lst.de
[ Upstream commit 79fcc5be93e5b17a2a5d36553f7a5c1ad9e953b6 ]
Use blkg_lookup instead of open coding it.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-4-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 6 +++--- block/blk-cgroup.h | 8 ++++---- 2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 11f256214928..ee48b6f4d5d4 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -324,7 +324,7 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg,
/* link parent */ if (blkcg_parent(blkcg)) { - blkg->parent = __blkg_lookup(blkcg_parent(blkcg), q, false); + blkg->parent = blkg_lookup(blkcg_parent(blkcg), q); if (WARN_ON_ONCE(!blkg->parent)) { ret = -ENODEV; goto err_put_css; @@ -412,7 +412,7 @@ static struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, struct blkcg_gq *ret_blkg = q->root_blkg;
while (parent) { - blkg = __blkg_lookup(parent, q, false); + blkg = blkg_lookup(parent, q); if (blkg) { /* remember closest blkg */ ret_blkg = blkg; @@ -724,7 +724,7 @@ int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, struct blkcg_gq *new_blkg;
parent = blkcg_parent(blkcg); - while (parent && !__blkg_lookup(parent, q, false)) { + while (parent && !blkg_lookup(parent, q)) { pos = parent; parent = blkcg_parent(parent); } diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index c1fb00a1dfc0..30396cad50e9 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -362,8 +362,8 @@ static inline void blkg_put(struct blkcg_gq *blkg) */ #define blkg_for_each_descendant_pre(d_blkg, pos_css, p_blkg) \ css_for_each_descendant_pre((pos_css), &(p_blkg)->blkcg->css) \ - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ - (p_blkg)->q, false))) + if (((d_blkg) = blkg_lookup(css_to_blkcg(pos_css), \ + (p_blkg)->q)))
/** * blkg_for_each_descendant_post - post-order walk of a blkg's descendants @@ -377,8 +377,8 @@ static inline void blkg_put(struct blkcg_gq *blkg) */ #define blkg_for_each_descendant_post(d_blkg, pos_css, p_blkg) \ css_for_each_descendant_post((pos_css), &(p_blkg)->blkcg->css) \ - if (((d_blkg) = __blkg_lookup(css_to_blkcg(pos_css), \ - (p_blkg)->q, false))) + if (((d_blkg) = blkg_lookup(css_to_blkcg(pos_css), \ + (p_blkg)->q)))
bool __blkcg_punt_bio_submit(struct bio *bio);
From: Christoph Hellwig hch@lst.de
[ Upstream commit 4a69f325aa43847e0827fbfe4b3623307b0c9baa ]
Add a fully inlined blkg_lookup as the extra two checks aren't going to generated a lot more code vs the call to the slowpath routine, and open code the hint update in the two callers that care.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-5-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 38 +++++++++++++++----------------------- block/blk-cgroup.h | 39 ++++++++++++--------------------------- 2 files changed, 27 insertions(+), 50 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index ee48b6f4d5d4..f66cf1734e84 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -263,29 +263,13 @@ static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q, return NULL; }
-struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg, - struct request_queue *q, bool update_hint) +static void blkg_update_hint(struct blkcg *blkcg, struct blkcg_gq *blkg) { - struct blkcg_gq *blkg; - - /* - * Hint didn't match. Look up from the radix tree. Note that the - * hint can only be updated under queue_lock as otherwise @blkg - * could have already been removed from blkg_tree. The caller is - * responsible for grabbing queue_lock if @update_hint. - */ - blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id); - if (blkg && blkg->q == q) { - if (update_hint) { - lockdep_assert_held(&q->queue_lock); - rcu_assign_pointer(blkcg->blkg_hint, blkg); - } - return blkg; - } + lockdep_assert_held(&blkg->q->queue_lock);
- return NULL; + if (blkcg != &blkcg_root && blkg != rcu_dereference(blkcg->blkg_hint)) + rcu_assign_pointer(blkcg->blkg_hint, blkg); } -EXPORT_SYMBOL_GPL(blkg_lookup_slowpath);
/* * If @new_blkg is %NULL, this function tries to allocate a new one as @@ -397,9 +381,11 @@ static struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, return blkg;
spin_lock_irqsave(&q->queue_lock, flags); - blkg = __blkg_lookup(blkcg, q, true); - if (blkg) + blkg = blkg_lookup(blkcg, q); + if (blkg) { + blkg_update_hint(blkcg, blkg); goto found; + }
/* * Create blkgs walking down from blkcg_root to @blkcg, so that all @@ -621,12 +607,18 @@ static struct blkcg_gq *blkg_lookup_check(struct blkcg *blkcg, const struct blkcg_policy *pol, struct request_queue *q) { + struct blkcg_gq *blkg; + WARN_ON_ONCE(!rcu_read_lock_held()); lockdep_assert_held(&q->queue_lock);
if (!blkcg_policy_enabled(q, pol)) return ERR_PTR(-EOPNOTSUPP); - return __blkg_lookup(blkcg, q, true /* update_hint */); + + blkg = blkg_lookup(blkcg, q); + if (blkg) + blkg_update_hint(blkcg, blkg); + return blkg; }
/** diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index 30396cad50e9..91b7ae0773be 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -178,8 +178,6 @@ struct blkcg_policy { extern struct blkcg blkcg_root; extern bool blkcg_debug_stats;
-struct blkcg_gq *blkg_lookup_slowpath(struct blkcg *blkcg, - struct request_queue *q, bool update_hint); int blkcg_init_queue(struct request_queue *q); void blkcg_exit_queue(struct request_queue *q);
@@ -227,22 +225,21 @@ static inline bool bio_issue_as_root_blkg(struct bio *bio) }
/** - * __blkg_lookup - internal version of blkg_lookup() + * blkg_lookup - lookup blkg for the specified blkcg - q pair * @blkcg: blkcg of interest * @q: request_queue of interest - * @update_hint: whether to update lookup hint with the result or not * - * This is internal version and shouldn't be used by policy - * implementations. Looks up blkgs for the @blkcg - @q pair regardless of - * @q's bypass state. If @update_hint is %true, the caller should be - * holding @q->queue_lock and lookup hint is updated on success. + * Lookup blkg for the @blkcg - @q pair. + + * Must be called in a RCU critical section. */ -static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, - struct request_queue *q, - bool update_hint) +static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, + struct request_queue *q) { struct blkcg_gq *blkg;
+ WARN_ON_ONCE(!rcu_read_lock_held()); + if (blkcg == &blkcg_root) return q->root_blkg;
@@ -250,22 +247,10 @@ static inline struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, if (blkg && blkg->q == q) return blkg;
- return blkg_lookup_slowpath(blkcg, q, update_hint); -} - -/** - * blkg_lookup - lookup blkg for the specified blkcg - q pair - * @blkcg: blkcg of interest - * @q: request_queue of interest - * - * Lookup blkg for the @blkcg - @q pair. This function should be called - * under RCU read lock. - */ -static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, - struct request_queue *q) -{ - WARN_ON_ONCE(!rcu_read_lock_held()); - return __blkg_lookup(blkcg, q, false); + blkg = radix_tree_lookup(&blkcg->blkg_tree, q->id); + if (blkg && blkg->q != q) + blkg = NULL; + return blkg; }
/**
From: Christoph Hellwig hch@lst.de
[ Upstream commit 9823538fb7efe66ce987a1e4c0e0f3dc882623c4 ]
Pass the gendisk to blkcg_init_disk and blkcg_exit_disk as part of moving the blk-cgroup infrastructure to be gendisk based. Also remove the rather pointless kerneldoc comments for these internal functions with a single caller each.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-7-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 25 +++++-------------------- block/blk-cgroup.h | 8 ++++---- block/genhd.c | 5 +++-- 3 files changed, 12 insertions(+), 26 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index f66cf1734e84..4943f36d8a84 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1246,18 +1246,9 @@ static int blkcg_css_online(struct cgroup_subsys_state *css) return 0; }
-/** - * blkcg_init_queue - initialize blkcg part of request queue - * @q: request_queue to initialize - * - * Called from blk_alloc_queue(). Responsible for initializing blkcg - * part of new request_queue @q. - * - * RETURNS: - * 0 on success, -errno on failure. - */ -int blkcg_init_queue(struct request_queue *q) +int blkcg_init_disk(struct gendisk *disk) { + struct request_queue *q = disk->queue; struct blkcg_gq *new_blkg, *blkg; bool preloaded; int ret; @@ -1310,16 +1301,10 @@ int blkcg_init_queue(struct request_queue *q) return PTR_ERR(blkg); }
-/** - * blkcg_exit_queue - exit and release blkcg part of request_queue - * @q: request_queue being released - * - * Called from blk_exit_queue(). Responsible for exiting blkcg part. - */ -void blkcg_exit_queue(struct request_queue *q) +void blkcg_exit_disk(struct gendisk *disk) { - blkg_destroy_all(q); - blk_throtl_exit(q); + blkg_destroy_all(disk->queue); + blk_throtl_exit(disk->queue); }
static void blkcg_bind(struct cgroup_subsys_state *root_css) diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index 91b7ae0773be..aa2b286bc825 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -178,8 +178,8 @@ struct blkcg_policy { extern struct blkcg blkcg_root; extern bool blkcg_debug_stats;
-int blkcg_init_queue(struct request_queue *q); -void blkcg_exit_queue(struct request_queue *q); +int blkcg_init_disk(struct gendisk *disk); +void blkcg_exit_disk(struct gendisk *disk);
/* Blkio controller policy registration */ int blkcg_policy_register(struct blkcg_policy *pol); @@ -481,8 +481,8 @@ struct blkcg { };
static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } -static inline int blkcg_init_queue(struct request_queue *q) { return 0; } -static inline void blkcg_exit_queue(struct request_queue *q) { } +static inline int blkcg_init_disk(struct gendisk *disk) { return 0; } +static inline void blkcg_exit_disk(struct gendisk *disk) { } static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } static inline int blkcg_activate_policy(struct request_queue *q, diff --git a/block/genhd.c b/block/genhd.c index 28654723bc2b..0568da4c0a2e 100644 --- a/block/genhd.c +++ b/block/genhd.c @@ -1154,7 +1154,8 @@ static void disk_release(struct device *dev) !test_bit(GD_ADDED, &disk->state)) blk_mq_exit_queue(disk->queue);
- blkcg_exit_queue(disk->queue); + blkcg_exit_disk(disk); + bioset_exit(&disk->bio_split);
disk_release_events(disk); @@ -1367,7 +1368,7 @@ struct gendisk *__alloc_disk_node(struct request_queue *q, int node_id, if (xa_insert(&disk->part_tbl, 0, disk->part0, GFP_KERNEL)) goto out_destroy_part_tbl;
- if (blkcg_init_queue(q)) + if (blkcg_init_disk(disk)) goto out_erase_part0;
rand_initialize_disk(disk);
From: Christoph Hellwig hch@lst.de
[ Upstream commit e13793bae65919cd3e6a7827f8d30f4dbb8584ee ]
Pass the gendisk to blk_throtl_init and blk_throtl_exit as part of moving the blk-cgroup infrastructure to be gendisk based.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-13-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 6 +++--- block/blk-throttle.c | 7 +++++-- block/blk-throttle.h | 8 ++++---- 3 files changed, 12 insertions(+), 9 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 4943f36d8a84..afe802e1180f 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -1277,7 +1277,7 @@ int blkcg_init_disk(struct gendisk *disk) if (ret) goto err_destroy_all;
- ret = blk_throtl_init(q); + ret = blk_throtl_init(disk); if (ret) goto err_ioprio_exit;
@@ -1288,7 +1288,7 @@ int blkcg_init_disk(struct gendisk *disk) return 0;
err_throtl_exit: - blk_throtl_exit(q); + blk_throtl_exit(disk); err_ioprio_exit: blk_ioprio_exit(q); err_destroy_all: @@ -1304,7 +1304,7 @@ int blkcg_init_disk(struct gendisk *disk) void blkcg_exit_disk(struct gendisk *disk) { blkg_destroy_all(disk->queue); - blk_throtl_exit(disk->queue); + blk_throtl_exit(disk); }
static void blkcg_bind(struct cgroup_subsys_state *root_css) diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 35cf744ea9d1..f84a6ed440c9 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -2276,8 +2276,9 @@ void blk_throtl_bio_endio(struct bio *bio) } #endif
-int blk_throtl_init(struct request_queue *q) +int blk_throtl_init(struct gendisk *disk) { + struct request_queue *q = disk->queue; struct throtl_data *td; int ret;
@@ -2319,8 +2320,10 @@ int blk_throtl_init(struct request_queue *q) return ret; }
-void blk_throtl_exit(struct request_queue *q) +void blk_throtl_exit(struct gendisk *disk) { + struct request_queue *q = disk->queue; + BUG_ON(!q->td); del_timer_sync(&q->td->service_queue.pending_timer); throtl_shutdown_wq(q); diff --git a/block/blk-throttle.h b/block/blk-throttle.h index ee7299e6dea9..e8c2b3d4a18b 100644 --- a/block/blk-throttle.h +++ b/block/blk-throttle.h @@ -159,14 +159,14 @@ static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg) * Internal throttling interface */ #ifndef CONFIG_BLK_DEV_THROTTLING -static inline int blk_throtl_init(struct request_queue *q) { return 0; } -static inline void blk_throtl_exit(struct request_queue *q) { } +static inline int blk_throtl_init(struct gendisk *disk) { return 0; } +static inline void blk_throtl_exit(struct gendisk *disk) { } static inline void blk_throtl_register_queue(struct request_queue *q) { } static inline bool blk_throtl_bio(struct bio *bio) { return false; } static inline void blk_throtl_cancel_bios(struct request_queue *q) { } #else /* CONFIG_BLK_DEV_THROTTLING */ -int blk_throtl_init(struct request_queue *q); -void blk_throtl_exit(struct request_queue *q); +int blk_throtl_init(struct gendisk *disk); +void blk_throtl_exit(struct gendisk *disk); void blk_throtl_register_queue(struct request_queue *q); bool __blk_throtl_bio(struct bio *bio); void blk_throtl_cancel_bios(struct request_queue *q);
From: Christoph Hellwig hch@lst.de
[ Upstream commit 00ad6991bbae116b7c83f68754edd6f4d5e65e01 ]
Pass the gendisk to blkg_destroy_all as part of moving the blk-cgroup infrastructure to be gendisk based.
Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Andreas Herrmann aherrmann@suse.de Acked-by: Tejun Heo tj@kernel.org Link: https://lore.kernel.org/r/20220921180501.1539876-16-hch@lst.de Signed-off-by: Jens Axboe axboe@kernel.dk Stable-dep-of: 813e693023ba ("blk-iolatency: Fix memory leak on add_disk() failures") Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 13 ++++--------- 1 file changed, 4 insertions(+), 9 deletions(-)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index afe802e1180f..cd682fe46d2f 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -462,14 +462,9 @@ static void blkg_destroy(struct blkcg_gq *blkg) percpu_ref_kill(&blkg->refcnt); }
-/** - * blkg_destroy_all - destroy all blkgs associated with a request_queue - * @q: request_queue of interest - * - * Destroy all blkgs associated with @q. - */ -static void blkg_destroy_all(struct request_queue *q) +static void blkg_destroy_all(struct gendisk *disk) { + struct request_queue *q = disk->queue; struct blkcg_gq *blkg, *n; int count = BLKG_DESTROY_BATCH_SIZE;
@@ -1292,7 +1287,7 @@ int blkcg_init_disk(struct gendisk *disk) err_ioprio_exit: blk_ioprio_exit(q); err_destroy_all: - blkg_destroy_all(q); + blkg_destroy_all(disk); return ret; err_unlock: spin_unlock_irq(&q->queue_lock); @@ -1303,7 +1298,7 @@ int blkcg_init_disk(struct gendisk *disk)
void blkcg_exit_disk(struct gendisk *disk) { - blkg_destroy_all(disk->queue); + blkg_destroy_all(disk); blk_throtl_exit(disk); }
From: Tejun Heo tj@kernel.org
[ Upstream commit 813e693023ba10da9e75067780f8378465bf27cc ]
When a gendisk is successfully initialized but add_disk() fails such as when a loop device has invalid number of minor device numbers specified, blkcg_init_disk() is called during init and then blkcg_exit_disk() during error handling. Unfortunately, iolatency gets initialized in the former but doesn't get cleaned up in the latter.
This is because, in non-error cases, the cleanup is performed by del_gendisk() calling rq_qos_exit(), the assumption being that rq_qos policies, iolatency being one of them, can only be activated once the disk is fully registered and visible. That assumption is true for wbt and iocost, but not so for iolatency as it gets initialized before add_disk() is called.
It is desirable to lazy-init rq_qos policies because they are optional features and add to hot path overhead once initialized - each IO has to walk all the registered rq_qos policies. So, we want to switch iolatency to lazy init too. However, that's a bigger change. As a fix for the immediate problem, let's just add an extra call to rq_qos_exit() in blkcg_exit_disk(). This is safe because duplicate calls to rq_qos_exit() become noop's.
Signed-off-by: Tejun Heo tj@kernel.org Reported-by: darklight2357@icloud.com Cc: Josef Bacik josef@toxicpanda.com Cc: Linus Torvalds torvalds@linux-foundation.org Fixes: d70675121546 ("block: introduce blk-iolatency io controller") Cc: stable@vger.kernel.org # v4.19+ Reviewed-by: Christoph Hellwig hch@lst.de Link: https://lore.kernel.org/r/Y5TQ5gm3O4HXrXR3@slm.duckdns.org Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/blk-cgroup.c | 2 ++ 1 file changed, 2 insertions(+)
diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index cd682fe46d2f..ee517fb06aa6 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -33,6 +33,7 @@ #include "blk-cgroup.h" #include "blk-ioprio.h" #include "blk-throttle.h" +#include "blk-rq-qos.h"
/* * blkcg_pol_mutex protects blkcg_policy[] and policy [de]activation. @@ -1299,6 +1300,7 @@ int blkcg_init_disk(struct gendisk *disk) void blkcg_exit_disk(struct gendisk *disk) { blkg_destroy_all(disk); + rq_qos_exit(disk->queue); blk_throtl_exit(disk); }
From: Paulo Alcantara pc@cjr.nz
[ Upstream commit a9e17d3d74d14e5fd10d54f0a07e0fce4e5f80dd ]
Remove unnecessary NULL check of oparam->cifs_sb when parsing symlink error response as it's already set by all smb2_open_file() callers and deferenced earlier.
This fixes below report:
fs/cifs/smb2file.c:126 smb2_open_file() warn: variable dereferenced before check 'oparms->cifs_sb' (see line 112)
Link: https://lore.kernel.org/r/Y0kt42j2tdpYakRu@kili Reported-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Steve French stfrench@microsoft.com Stable-dep-of: f60ffa662d14 ("cifs: don't leak -ENOMEM in smb2_open_file()") Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/cifs/smb2file.c b/fs/cifs/smb2file.c index 4992b43616a7..ffbd9a99fc12 100644 --- a/fs/cifs/smb2file.c +++ b/fs/cifs/smb2file.c @@ -123,7 +123,7 @@ int smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms, __u32
if (unlikely(!err_iov.iov_base || err_buftype == CIFS_NO_BUFFER)) rc = -ENOMEM; - else if (hdr->Status == STATUS_STOPPED_ON_SYMLINK && oparms->cifs_sb) { + else if (hdr->Status == STATUS_STOPPED_ON_SYMLINK) { rc = smb2_parse_symlink_response(oparms->cifs_sb, &err_iov, &data->symlink_target); if (!rc) {
From: Paulo Alcantara pc@cjr.nz
[ Upstream commit f60ffa662d1427cfd31fe9d895c3566ac50bfe52 ]
A NULL error response might be a valid case where smb2_reconnect() failed to reconnect the session and tcon due to a disconnected server prior to issuing the I/O operation, so don't leak -ENOMEM to userspace on such occasions.
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+") Signed-off-by: Paulo Alcantara (SUSE) pc@cjr.nz Signed-off-by: Steve French stfrench@microsoft.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/cifs/smb2file.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/cifs/smb2file.c b/fs/cifs/smb2file.c index ffbd9a99fc12..ba6cc50af390 100644 --- a/fs/cifs/smb2file.c +++ b/fs/cifs/smb2file.c @@ -122,8 +122,8 @@ int smb2_open_file(const unsigned int xid, struct cifs_open_parms *oparms, __u32 struct smb2_hdr *hdr = err_iov.iov_base;
if (unlikely(!err_iov.iov_base || err_buftype == CIFS_NO_BUFFER)) - rc = -ENOMEM; - else if (hdr->Status == STATUS_STOPPED_ON_SYMLINK) { + goto out; + if (hdr->Status == STATUS_STOPPED_ON_SYMLINK) { rc = smb2_parse_symlink_response(oparms->cifs_sb, &err_iov, &data->symlink_target); if (!rc) {
From: Klaus Jensen k.jensen@samsung.com
[ Upstream commit b5f96cb719d8ba220b565ddd3ba4ac0d8bcfb130 ]
When using shadow doorbells, the event index and the doorbell values are written to host memory. Prior to this patch, the values written would erroneously be written in host endianness. This causes trouble on big-endian platforms. Fix this by adding missing endian conversions.
This issue was noticed by Guenter while testing various big-endian platforms under QEMU[1]. A similar fix required for hw/nvme in QEMU is up for review as well[2].
[1]: https://lore.kernel.org/qemu-devel/20221209110022.GA3396194@roeck-us.net/ [2]: https://lore.kernel.org/qemu-devel/20221212114409.34972-4-its@irrelevant.dk/
Fixes: f9f38e33389c ("nvme: improve performance for virtual NVMe devices") Reported-by: Guenter Roeck linux@roeck-us.net Signed-off-by: Klaus Jensen k.jensen@samsung.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 25 +++++++++++++------------ 1 file changed, 13 insertions(+), 12 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 6867620bcc98..0b97e8ce13e4 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -144,9 +144,9 @@ struct nvme_dev { mempool_t *iod_mempool;
/* shadow doorbell buffer support: */ - u32 *dbbuf_dbs; + __le32 *dbbuf_dbs; dma_addr_t dbbuf_dbs_dma_addr; - u32 *dbbuf_eis; + __le32 *dbbuf_eis; dma_addr_t dbbuf_eis_dma_addr;
/* host memory buffer support: */ @@ -210,10 +210,10 @@ struct nvme_queue { #define NVMEQ_SQ_CMB 1 #define NVMEQ_DELETE_ERROR 2 #define NVMEQ_POLLED 3 - u32 *dbbuf_sq_db; - u32 *dbbuf_cq_db; - u32 *dbbuf_sq_ei; - u32 *dbbuf_cq_ei; + __le32 *dbbuf_sq_db; + __le32 *dbbuf_cq_db; + __le32 *dbbuf_sq_ei; + __le32 *dbbuf_cq_ei; struct completion delete_done; };
@@ -340,11 +340,11 @@ static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) }
/* Update dbbuf and return true if an MMIO is required */ -static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, - volatile u32 *dbbuf_ei) +static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db, + volatile __le32 *dbbuf_ei) { if (dbbuf_db) { - u16 old_value; + u16 old_value, event_idx;
/* * Ensure that the queue is written before updating @@ -352,8 +352,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ wmb();
- old_value = *dbbuf_db; - *dbbuf_db = value; + old_value = le32_to_cpu(*dbbuf_db); + *dbbuf_db = cpu_to_le32(value);
/* * Ensure that the doorbell is updated before reading the event @@ -363,7 +363,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, */ mb();
- if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value)) + event_idx = le32_to_cpu(*dbbuf_ei); + if (!nvme_dbbuf_need_event(event_idx, value, old_value)) return false; }
From: Keith Busch kbusch@kernel.org
[ Upstream commit c89a529e823d51dd23c7ec0c047c7a454a428541 ]
Convert the max size to bytes to match the units of the divisor that calculates the worst-case number of PRP entries.
The result is used to determine how many PRP Lists are required. The code was previously rounding this to 1 list, but we can require 2 in the worst case. In that scenario, the driver would corrupt memory beyond the size provided by the mempool.
While unlikely to occur (you'd need a 4MB in exactly 127 phys segments on a queue that doesn't support SGLs), this memory corruption has been observed by kfence.
Cc: Jens Axboe axboe@kernel.dk Fixes: 943e942e6266f ("nvme-pci: limit max IO size and segments to avoid high order allocations") Signed-off-by: Keith Busch kbusch@kernel.org Reviewed-by: Jens Axboe axboe@kernel.dk Reviewed-by: Kanchan Joshi joshi.k@samsung.com Reviewed-by: Chaitanya Kulkarni kch@nvidia.com Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 0b97e8ce13e4..b128e2e36b68 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -378,8 +378,8 @@ static bool nvme_dbbuf_update_and_check_event(u16 value, __le32 *dbbuf_db, */ static int nvme_pci_npages_prp(void) { - unsigned nprps = DIV_ROUND_UP(NVME_MAX_KB_SZ + NVME_CTRL_PAGE_SIZE, - NVME_CTRL_PAGE_SIZE); + unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; + unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE); return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); }
From: Keith Busch kbusch@kernel.org
[ Upstream commit 841734234a28fd5cd0889b84bd4d93a0988fa11e ]
The size allocated out of the dma pool is at most NVME_CTRL_PAGE_SIZE, which may be smaller than the PAGE_SIZE.
Fixes: c61b82c7b7134 ("nvme-pci: fix PRP pool size") Signed-off-by: Keith Busch kbusch@kernel.org Signed-off-by: Christoph Hellwig hch@lst.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/host/pci.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index b128e2e36b68..529b424ef9b2 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -35,7 +35,7 @@ #define SQ_SIZE(q) ((q)->q_depth << (q)->sqes) #define CQ_SIZE(q) ((q)->q_depth * sizeof(struct nvme_completion))
-#define SGES_PER_PAGE (PAGE_SIZE / sizeof(struct nvme_sgl_desc)) +#define SGES_PER_PAGE (NVME_CTRL_PAGE_SIZE / sizeof(struct nvme_sgl_desc))
/* * These can be higher, but we need to ensure that any command doesn't @@ -380,7 +380,7 @@ static int nvme_pci_npages_prp(void) { unsigned max_bytes = (NVME_MAX_KB_SZ * 1024) + NVME_CTRL_PAGE_SIZE; unsigned nprps = DIV_ROUND_UP(max_bytes, NVME_CTRL_PAGE_SIZE); - return DIV_ROUND_UP(8 * nprps, PAGE_SIZE - 8); + return DIV_ROUND_UP(8 * nprps, NVME_CTRL_PAGE_SIZE - 8); }
/* @@ -390,7 +390,7 @@ static int nvme_pci_npages_prp(void) static int nvme_pci_npages_sgl(void) { return DIV_ROUND_UP(NVME_MAX_SEGS * sizeof(struct nvme_sgl_desc), - PAGE_SIZE); + NVME_CTRL_PAGE_SIZE); }
static size_t nvme_pci_iod_alloc_size(void) @@ -721,7 +721,7 @@ static void nvme_pci_sgl_set_seg(struct nvme_sgl_desc *sge, sge->length = cpu_to_le32(entries * sizeof(*sge)); sge->type = NVME_SGL_FMT_LAST_SEG_DESC << 4; } else { - sge->length = cpu_to_le32(PAGE_SIZE); + sge->length = cpu_to_le32(NVME_CTRL_PAGE_SIZE); sge->type = NVME_SGL_FMT_SEG_DESC << 4; } }
From: Tamim Khan tamim@fusetak.com
[ Upstream commit e12dee3736731e24b1e7367f87d66ac0fcd73ce7 ]
In the ACPI DSDT table for Asus VivoBook K3402ZA/K3502ZA IRQ 1 is described as ActiveLow; however, the kernel overrides it to Edge_High. This prevents the internal keyboard from working on these laptops. In order to fix this add these laptops to the skip_override_table so that the kernel does not override IRQ 1 to Edge_High.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=216158 Reviewed-by: Hui Wang hui.wang@canonical.com Tested-by: Tamim Khan tamim@fusetak.com Tested-by: Sunand sunandchakradhar@gmail.com Signed-off-by: Tamim Khan tamim@fusetak.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Stable-dep-of: f3cb9b740869 ("ACPI: resource: do IRQ override on Lenovo 14ALC7") Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 510cdec375c4..2ebc85233bac 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -399,6 +399,24 @@ static const struct dmi_system_id medion_laptop[] = { { } };
+static const struct dmi_system_id asus_laptop[] = { + { + .ident = "Asus Vivobook K3402ZA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BOARD_NAME, "K3402ZA"), + }, + }, + { + .ident = "Asus Vivobook K3502ZA", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "ASUSTeK COMPUTER INC."), + DMI_MATCH(DMI_BOARD_NAME, "K3502ZA"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; @@ -409,6 +427,7 @@ struct irq_override_cmp {
static const struct irq_override_cmp skip_override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, + { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
From: Jiri Slaby (SUSE) jirislaby@kernel.org
[ Upstream commit bfcdf58380b1d9be564a78a9370da722ed1a9965 ]
LENOVO IdeaPad Flex 5 is ryzen-5 based and the commit below removed IRQ overriding for those. This broke touchscreen and trackpad: i2c_designware AMDI0010:00: controller timed out i2c_designware AMDI0010:03: controller timed out i2c_hid_acpi i2c-MSFT0001:00: failed to reset device: -61 i2c_designware AMDI0010:03: controller timed out ... i2c_hid_acpi i2c-MSFT0001:00: can't add hid device: -61 i2c_hid_acpi: probe of i2c-MSFT0001:00 failed with error -61
White-list this specific model in the override_table.
For this to work, the ZEN test needs to be put below the table walk.
Fixes: 37c81d9f1d1b (ACPI: resource: skip IRQ override on AMD Zen platforms) Link: https://bugzilla.suse.com/show_bug.cgi?id=1203794 Signed-off-by: Jiri Slaby (SUSE) jirislaby@kernel.org Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 42 +++++++++++++++++++++++++++-------------- 1 file changed, 28 insertions(+), 14 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 2ebc85233bac..30811a9521d4 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -417,17 +417,31 @@ static const struct dmi_system_id asus_laptop[] = { { } };
+static const struct dmi_system_id lenovo_82ra[] = { + { + .ident = "LENOVO IdeaPad Flex 5 16ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82RA"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; unsigned char triggering; unsigned char polarity; unsigned char shareable; + bool override; };
-static const struct irq_override_cmp skip_override_table[] = { - { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, - { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0 }, +static const struct irq_override_cmp override_table[] = { + { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, + { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, @@ -435,6 +449,17 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, { int i;
+ for (i = 0; i < ARRAY_SIZE(override_table); i++) { + const struct irq_override_cmp *entry = &override_table[i]; + + if (dmi_check_system(entry->system) && + entry->irq == gsi && + entry->triggering == triggering && + entry->polarity == polarity && + entry->shareable == shareable) + return entry->override; + } + #ifdef CONFIG_X86 /* * IRQ override isn't needed on modern AMD Zen systems and @@ -445,17 +470,6 @@ static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity, return false; #endif
- for (i = 0; i < ARRAY_SIZE(skip_override_table); i++) { - const struct irq_override_cmp *entry = &skip_override_table[i]; - - if (dmi_check_system(entry->system) && - entry->irq == gsi && - entry->triggering == triggering && - entry->polarity == polarity && - entry->shareable == shareable) - return false; - } - return true; }
From: Erik Schumacher ofenfisch@googlemail.com
[ Upstream commit 7592b79ba4a91350b38469e05238308bcfe1019b ]
The Schenker XMG CORE 15 (M22) is Ryzen-6 based and needs IRQ overriding for the keyboard to work. Adding an entry for this laptop to the override_table makes the internal keyboard functional again.
Signed-off-by: Erik Schumacher ofenfisch@googlemail.com Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 12 ++++++++++++ 1 file changed, 12 insertions(+)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 30811a9521d4..504378cc79a2 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -428,6 +428,17 @@ static const struct dmi_system_id lenovo_82ra[] = { { } };
+static const struct dmi_system_id schenker_gm_rg[] = { + { + .ident = "XMG CORE 15 (M22)", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "SchenkerTechnologiesGmbH"), + DMI_MATCH(DMI_BOARD_NAME, "GMxRGxx"), + }, + }, + { } +}; + struct irq_override_cmp { const struct dmi_system_id *system; unsigned char irq; @@ -442,6 +453,7 @@ static const struct irq_override_cmp override_table[] = { { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { schenker_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, };
static bool acpi_dev_irq_override(u32 gsi, u8 triggering, u8 polarity,
From: Adrian Freund adrian@freund.io
[ Upstream commit f3cb9b740869712d448edf3b9ef5952b847caf8b ]
Commit bfcdf58380b1 ("ACPI: resource: do IRQ override on LENOVO IdeaPad") added an override for Lenovo IdeaPad 5 16ALC7. The 14ALC7 variant also suffers from a broken touchscreen and trackpad.
Fixes: 9946e39fe8d0 ("ACPI: resource: skip IRQ override on AMD Zen platforms") Link: https://bugzilla.kernel.org/show_bug.cgi?id=216804 Signed-off-by: Adrian Freund adrian@freund.io Signed-off-by: Rafael J. Wysocki rafael.j.wysocki@intel.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/acpi/resource.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/drivers/acpi/resource.c b/drivers/acpi/resource.c index 504378cc79a2..98daea0db979 100644 --- a/drivers/acpi/resource.c +++ b/drivers/acpi/resource.c @@ -417,7 +417,14 @@ static const struct dmi_system_id asus_laptop[] = { { } };
-static const struct dmi_system_id lenovo_82ra[] = { +static const struct dmi_system_id lenovo_laptop[] = { + { + .ident = "LENOVO IdeaPad Flex 5 14ALC7", + .matches = { + DMI_MATCH(DMI_SYS_VENDOR, "LENOVO"), + DMI_MATCH(DMI_PRODUCT_NAME, "82R9"), + }, + }, { .ident = "LENOVO IdeaPad Flex 5 16ALC7", .matches = { @@ -451,8 +458,8 @@ struct irq_override_cmp { static const struct irq_override_cmp override_table[] = { { medion_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, { asus_laptop, 1, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, false }, - { lenovo_82ra, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, - { lenovo_82ra, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 6, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, + { lenovo_laptop, 10, ACPI_LEVEL_SENSITIVE, ACPI_ACTIVE_LOW, 0, true }, { schenker_gm_rg, 1, ACPI_EDGE_SENSITIVE, ACPI_ACTIVE_LOW, 1, true }, };
From: Yu Kuai yukuai3@huawei.com
[ Upstream commit 246cf66e300b76099b5dbd3fdd39e9a5dbc53f02 ]
Commit 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") will access 'bic->bfqq' in bic_set_bfqq(), however, bfq_exit_icq_bfqq() can free bfqq first, and then call bic_set_bfqq(), which will cause uaf.
Fix the problem by moving bfq_exit_bfqq() behind bic_set_bfqq().
Fixes: 64dc8c732f5c ("block, bfq: fix possible uaf for 'bfqq->bic'") Reported-by: Yi Zhang yi.zhang@redhat.com Signed-off-by: Yu Kuai yukuai3@huawei.com Link: https://lore.kernel.org/r/20221226030605.1437081-1-yukuai1@huaweicloud.com Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Sasha Levin sashal@kernel.org --- block/bfq-iosched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/bfq-iosched.c b/block/bfq-iosched.c index 528ca21044a5..7d2ca122362f 100644 --- a/block/bfq-iosched.c +++ b/block/bfq-iosched.c @@ -5385,8 +5385,8 @@ static void bfq_exit_icq_bfqq(struct bfq_io_cq *bic, bool is_sync) unsigned long flags;
spin_lock_irqsave(&bfqd->lock, flags); - bfq_exit_bfqq(bfqd, bfqq); bic_set_bfqq(bic, NULL, is_sync); + bfq_exit_bfqq(bfqd, bfqq); spin_unlock_irqrestore(&bfqd->lock, flags); } }
From: Adam Vodopjan grozzly@protonmail.com
[ Upstream commit 37e14e4f3715428b809e4df9a9958baa64c77d51 ]
Since kernel 5.3.4 my laptop (ICH8M controller) does not see Kingston SV300S37A60G SSD disk connected into a SATA connector on wake from suspend. The problem was introduced in c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond"): the quirk is not applied on wake from suspend as it originally was.
It is worth to mention the commit contained another bug: the quirk is not applied at all to controllers which require it. The fix commit 09d6ac8dc51a ("libata/ahci: Fix PCS quirk application") landed in 5.3.8. So testing my patch anywhere between commits c312ef176399 and 09d6ac8dc51a is pointless.
Not all disks trigger the problem. For example nothing bad happens with Western Digital WD5000LPCX HDD.
Test hardware: - Acer 5920G with ICH8M SATA controller - sda: some SATA HDD connnected into the DVD drive IDE port with a SATA-IDE caddy. It is a boot disk - sdb: Kingston SV300S37A60G SSD connected into the only SATA port
Sample "dmesg --notime | grep -E '^(sd |ata)'" output on wake:
sd 0:0:0:0: [sda] Starting disk sd 2:0:0:0: [sdb] Starting disk ata4: SATA link down (SStatus 4 SControl 300) ata3: SATA link down (SStatus 4 SControl 300) ata1.00: ACPI cmd ef/03:0c:00:00:00:a0 (SET FEATURES) filtered out ata1.00: ACPI cmd ef/03:42:00:00:00:a0 (SET FEATURES) filtered out ata1: FORCE: cable set to 80c ata5: SATA link down (SStatus 0 SControl 300) ata3: SATA link down (SStatus 4 SControl 300) ata3: SATA link down (SStatus 4 SControl 300) ata3.00: disabled sd 2:0:0:0: rejecting I/O to offline device ata3.00: detaching (SCSI 2:0:0:0) sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK sd 2:0:0:0: [sdb] Synchronizing SCSI cache sd 2:0:0:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK sd 2:0:0:0: [sdb] Stopping disk sd 2:0:0:0: [sdb] Start/Stop Unit failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
Commit c312ef176399 dropped ahci_pci_reset_controller() which internally calls ahci_reset_controller() and applies the PCS quirk if needed after that. It was called each time a reset was required instead of just ahci_reset_controller(). This patch puts the function back in place.
Fixes: c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond") Signed-off-by: Adam Vodopjan grozzly@protonmail.com Signed-off-by: Damien Le Moal damien.lemoal@opensource.wdc.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/ata/ahci.c | 32 +++++++++++++++++++++++--------- 1 file changed, 23 insertions(+), 9 deletions(-)
diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c index c1eca72b4575..28d8c56cb4dd 100644 --- a/drivers/ata/ahci.c +++ b/drivers/ata/ahci.c @@ -84,6 +84,7 @@ enum board_ids { static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent); static void ahci_remove_one(struct pci_dev *dev); static void ahci_shutdown_one(struct pci_dev *dev); +static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv); static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class, unsigned long deadline); static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class, @@ -677,6 +678,25 @@ static void ahci_pci_save_initial_config(struct pci_dev *pdev, ahci_save_initial_config(&pdev->dev, hpriv); }
+static int ahci_pci_reset_controller(struct ata_host *host) +{ + struct pci_dev *pdev = to_pci_dev(host->dev); + struct ahci_host_priv *hpriv = host->private_data; + int rc; + + rc = ahci_reset_controller(host); + if (rc) + return rc; + + /* + * If platform firmware failed to enable ports, try to enable + * them here. + */ + ahci_intel_pcs_quirk(pdev, hpriv); + + return 0; +} + static void ahci_pci_init_controller(struct ata_host *host) { struct ahci_host_priv *hpriv = host->private_data; @@ -871,7 +891,7 @@ static int ahci_pci_device_runtime_resume(struct device *dev) struct ata_host *host = pci_get_drvdata(pdev); int rc;
- rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc; ahci_pci_init_controller(host); @@ -907,7 +927,7 @@ static int ahci_pci_device_resume(struct device *dev) ahci_mcp89_apple_enable(pdev);
if (pdev->dev.power.power_state.event == PM_EVENT_SUSPEND) { - rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc;
@@ -1788,12 +1808,6 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) /* save initial config */ ahci_pci_save_initial_config(pdev, hpriv);
- /* - * If platform firmware failed to enable ports, try to enable - * them here. - */ - ahci_intel_pcs_quirk(pdev, hpriv); - /* prepare host */ if (hpriv->cap & HOST_CAP_NCQ) { pi.flags |= ATA_FLAG_NCQ; @@ -1903,7 +1917,7 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent) if (rc) return rc;
- rc = ahci_reset_controller(host); + rc = ahci_pci_reset_controller(host); if (rc) return rc;
From: Christoph Hellwig hch@lst.de
[ Upstream commit 685e6311637e46f3212439ce2789f8a300e5050f ]
3 << 16 does not generate the correct mask for bits 16, 17 and 18. Use the GENMASK macro to generate the correct mask instead.
Fixes: 84fef62d135b ("nvme: check admin passthru command effects") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Kanchan Joshi joshi.k@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- include/linux/nvme.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/include/linux/nvme.h b/include/linux/nvme.h index ae53d74f3696..e2dbb9755cca 100644 --- a/include/linux/nvme.h +++ b/include/linux/nvme.h @@ -7,6 +7,7 @@ #ifndef _LINUX_NVME_H #define _LINUX_NVME_H
+#include <linux/bits.h> #include <linux/types.h> #include <linux/uuid.h>
@@ -639,7 +640,7 @@ enum { NVME_CMD_EFFECTS_NCC = 1 << 2, NVME_CMD_EFFECTS_NIC = 1 << 3, NVME_CMD_EFFECTS_CCC = 1 << 4, - NVME_CMD_EFFECTS_CSE_MASK = 3 << 16, + NVME_CMD_EFFECTS_CSE_MASK = GENMASK(18, 16), NVME_CMD_EFFECTS_UUID_SEL = 1 << 19, };
From: Christoph Hellwig hch@lst.de
[ Upstream commit 2a459f6933e1c459bffb7cc73fd6c900edc714bd ]
Mask out the "Command Supported" and "Logical Block Content Change" bits and only defer execution of commands that have non-trivial effects to the workqueue for synchronous execution. This allows to execute admin commands asynchronously on controllers that provide a Command Supported and Effects log page, and will keep allowing to execute Write commands asynchronously once command effects on I/O commands are taken into account.
Fixes: c1fef73f793b ("nvmet: add passthru code to process commands") Signed-off-by: Christoph Hellwig hch@lst.de Reviewed-by: Keith Busch kbusch@kernel.org Reviewed-by: Sagi Grimberg sagi@grimberg.me Reviewed-by: Kanchan Joshi joshi.k@samsung.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/nvme/target/passthru.c | 11 +++++------ 1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/drivers/nvme/target/passthru.c b/drivers/nvme/target/passthru.c index 94d3153bae54..09537000d817 100644 --- a/drivers/nvme/target/passthru.c +++ b/drivers/nvme/target/passthru.c @@ -333,14 +333,13 @@ static void nvmet_passthru_execute_cmd(struct nvmet_req *req) }
/* - * If there are effects for the command we are about to execute, or - * an end_req function we need to use nvme_execute_passthru_rq() - * synchronously in a work item seeing the end_req function and - * nvme_passthru_end() can't be called in the request done callback - * which is typically in interrupt context. + * If a command needs post-execution fixups, or there are any + * non-trivial effects, make sure to execute the command synchronously + * in a workqueue so that nvme_passthru_end gets called. */ effects = nvme_command_effects(ctrl, ns, req->cmd->common.opcode); - if (req->p.use_workqueue || effects) { + if (req->p.use_workqueue || + (effects & ~(NVME_CMD_EFFECTS_CSUPP | NVME_CMD_EFFECTS_LBCC))) { INIT_WORK(&req->p.work, nvmet_passthru_execute_cmd_work); req->p.rq = rq; queue_work(nvmet_wq, &req->p.work);
From: edward lo edward.lo@ambergroup.io
[ Upstream commit 0b66046266690454dc04e6307bcff4a5605b42a1 ]
When the NTFS BOOT record_size field < 0, it represents a shift value. However, there is no sanity check on the shift result and the sbi->record_bits calculation through blksize_bits() assumes the size always > 256, which could lead to NPD while mounting a malformed NTFS image.
[ 318.675159] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 318.675682] #PF: supervisor read access in kernel mode [ 318.675869] #PF: error_code(0x0000) - not-present page [ 318.676246] PGD 0 P4D 0 [ 318.676502] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 318.676934] CPU: 0 PID: 259 Comm: mount Not tainted 5.19.0 #5 [ 318.677289] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 318.678136] RIP: 0010:ni_find_attr+0x2d/0x1c0 [ 318.678656] Code: 89 ca 4d 89 c7 41 56 41 55 41 54 41 89 cc 55 48 89 fd 53 48 89 d3 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 180 [ 318.679848] RSP: 0018:ffffa6c8c0297bd8 EFLAGS: 00000246 [ 318.680104] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080 [ 318.680790] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 318.681679] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 318.682577] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000080 [ 318.683015] R13: ffff8d5582e68400 R14: 0000000000000100 R15: 0000000000000000 [ 318.683618] FS: 00007fd9e1c81e40(0000) GS:ffff8d55fdc00000(0000) knlGS:0000000000000000 [ 318.684280] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 318.684651] CR2: 0000000000000158 CR3: 0000000002e1a000 CR4: 00000000000006f0 [ 318.685623] Call Trace: [ 318.686607] <TASK> [ 318.686872] ? ntfs_alloc_inode+0x1a/0x60 [ 318.687235] attr_load_runs_vcn+0x2b/0xa0 [ 318.687468] mi_read+0xbb/0x250 [ 318.687576] ntfs_iget5+0x114/0xd90 [ 318.687750] ntfs_fill_super+0x588/0x11b0 [ 318.687953] ? put_ntfs+0x130/0x130 [ 318.688065] ? snprintf+0x49/0x70 [ 318.688164] ? put_ntfs+0x130/0x130 [ 318.688256] get_tree_bdev+0x16a/0x260 [ 318.688407] vfs_get_tree+0x20/0xb0 [ 318.688519] path_mount+0x2dc/0x9b0 [ 318.688877] do_mount+0x74/0x90 [ 318.689142] __x64_sys_mount+0x89/0xd0 [ 318.689636] do_syscall_64+0x3b/0x90 [ 318.689998] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 318.690318] RIP: 0033:0x7fd9e133c48a [ 318.690687] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 318.691357] RSP: 002b:00007ffd374406c8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 318.691632] RAX: ffffffffffffffda RBX: 0000564d0b051080 RCX: 00007fd9e133c48a [ 318.691920] RDX: 0000564d0b051280 RSI: 0000564d0b051300 RDI: 0000564d0b0596a0 [ 318.692123] RBP: 0000000000000000 R08: 0000564d0b0512a0 R09: 0000000000000020 [ 318.692349] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564d0b0596a0 [ 318.692673] R13: 0000564d0b051280 R14: 0000000000000000 R15: 00000000ffffffff [ 318.693007] </TASK> [ 318.693271] Modules linked in: [ 318.693614] CR2: 0000000000000158 [ 318.694446] ---[ end trace 0000000000000000 ]--- [ 318.694779] RIP: 0010:ni_find_attr+0x2d/0x1c0 [ 318.694952] Code: 89 ca 4d 89 c7 41 56 41 55 41 54 41 89 cc 55 48 89 fd 53 48 89 d3 48 83 ec 20 65 48 8b 04 25 28 00 00 00 48 89 44 24 180 [ 318.696042] RSP: 0018:ffffa6c8c0297bd8 EFLAGS: 00000246 [ 318.696531] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000080 [ 318.698114] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 318.699286] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 [ 318.699795] R10: 0000000000000000 R11: 0000000000000005 R12: 0000000000000080 [ 318.700236] R13: ffff8d5582e68400 R14: 0000000000000100 R15: 0000000000000000 [ 318.700973] FS: 00007fd9e1c81e40(0000) GS:ffff8d55fdc00000(0000) knlGS:0000000000000000 [ 318.701688] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 318.702190] CR2: 0000000000000158 CR3: 0000000002e1a000 CR4: 00000000000006f0 [ 318.726510] mount (259) used greatest stack depth: 13320 bytes left
This patch adds a sanity check.
Signed-off-by: edward lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index adc4f73722b7..a0cea3b7526e 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -789,7 +789,7 @@ static int ntfs_init_from_boot(struct super_block *sb, u32 sector_size, : (u32)boot->record_size << sbi->cluster_bits;
- if (record_size > MAXIMUM_BYTES_PER_MFT) + if (record_size > MAXIMUM_BYTES_PER_MFT || record_size < SECTOR_SIZE) goto out;
sbi->record_bits = blksize_bits(record_size);
From: edward lo edward.lo@ambergroup.io
[ Upstream commit e19c6277652efba203af4ecd8eed4bd30a0054c9 ]
The offset addition could overflow and pass the used size check given an attribute with very large size (e.g., 0xffffff7f) while parsing MFT attributes. This could lead to out-of-bound memory R/W if we try to access the next attribute derived by Add2Ptr(attr, asize)
[ 32.963847] BUG: unable to handle page fault for address: ffff956a83c76067 [ 32.964301] #PF: supervisor read access in kernel mode [ 32.964526] #PF: error_code(0x0000) - not-present page [ 32.964893] PGD 4dc01067 P4D 4dc01067 PUD 0 [ 32.965316] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 32.965727] CPU: 0 PID: 243 Comm: mount Not tainted 5.19.0+ #6 [ 32.966050] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 32.966628] RIP: 0010:mi_enum_attr+0x44/0x110 [ 32.967239] Code: 89 f0 48 29 c8 48 89 c1 39 c7 0f 86 94 00 00 00 8b 56 04 83 fa 17 0f 86 88 00 00 00 89 d0 01 ca 48 01 f0 8d 4a 08 39 f9a [ 32.968101] RSP: 0018:ffffba15c06a7c38 EFLAGS: 00000283 [ 32.968364] RAX: ffff956a83c76067 RBX: ffff956983c76050 RCX: 000000000000006f [ 32.968651] RDX: 0000000000000067 RSI: ffff956983c760e8 RDI: 00000000000001c8 [ 32.968963] RBP: ffffba15c06a7c38 R08: 0000000000000064 R09: 00000000ffffff7f [ 32.969249] R10: 0000000000000007 R11: ffff956983c760e8 R12: ffff95698225e000 [ 32.969870] R13: 0000000000000000 R14: ffffba15c06a7cd8 R15: ffff95698225e170 [ 32.970655] FS: 00007fdab8189e40(0000) GS:ffff9569fdc00000(0000) knlGS:0000000000000000 [ 32.971098] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.971378] CR2: ffff956a83c76067 CR3: 0000000002c58000 CR4: 00000000000006f0 [ 32.972098] Call Trace: [ 32.972842] <TASK> [ 32.973341] ni_enum_attr_ex+0xda/0xf0 [ 32.974087] ntfs_iget5+0x1db/0xde0 [ 32.974386] ? slab_post_alloc_hook+0x53/0x270 [ 32.974778] ? ntfs_fill_super+0x4c7/0x12a0 [ 32.975115] ntfs_fill_super+0x5d6/0x12a0 [ 32.975336] get_tree_bdev+0x175/0x270 [ 32.975709] ? put_ntfs+0x150/0x150 [ 32.975956] ntfs_fs_get_tree+0x15/0x20 [ 32.976191] vfs_get_tree+0x2a/0xc0 [ 32.976374] ? capable+0x19/0x20 [ 32.976572] path_mount+0x484/0xaa0 [ 32.977025] ? putname+0x57/0x70 [ 32.977380] do_mount+0x80/0xa0 [ 32.977555] __x64_sys_mount+0x8b/0xe0 [ 32.978105] do_syscall_64+0x3b/0x90 [ 32.978830] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 32.979311] RIP: 0033:0x7fdab72e948a [ 32.980015] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 32.981251] RSP: 002b:00007ffd15b87588 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 32.981832] RAX: ffffffffffffffda RBX: 0000557de0aaf060 RCX: 00007fdab72e948a [ 32.982234] RDX: 0000557de0aaf260 RSI: 0000557de0aaf2e0 RDI: 0000557de0ab7ce0 [ 32.982714] RBP: 0000000000000000 R08: 0000557de0aaf280 R09: 0000000000000020 [ 32.983046] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000557de0ab7ce0 [ 32.983494] R13: 0000557de0aaf260 R14: 0000000000000000 R15: 00000000ffffffff [ 32.984094] </TASK> [ 32.984352] Modules linked in: [ 32.984753] CR2: ffff956a83c76067 [ 32.985911] ---[ end trace 0000000000000000 ]--- [ 32.986555] RIP: 0010:mi_enum_attr+0x44/0x110 [ 32.987217] Code: 89 f0 48 29 c8 48 89 c1 39 c7 0f 86 94 00 00 00 8b 56 04 83 fa 17 0f 86 88 00 00 00 89 d0 01 ca 48 01 f0 8d 4a 08 39 f9a [ 32.988232] RSP: 0018:ffffba15c06a7c38 EFLAGS: 00000283 [ 32.988532] RAX: ffff956a83c76067 RBX: ffff956983c76050 RCX: 000000000000006f [ 32.988916] RDX: 0000000000000067 RSI: ffff956983c760e8 RDI: 00000000000001c8 [ 32.989356] RBP: ffffba15c06a7c38 R08: 0000000000000064 R09: 00000000ffffff7f [ 32.989994] R10: 0000000000000007 R11: ffff956983c760e8 R12: ffff95698225e000 [ 32.990415] R13: 0000000000000000 R14: ffffba15c06a7cd8 R15: ffff95698225e170 [ 32.991011] FS: 00007fdab8189e40(0000) GS:ffff9569fdc00000(0000) knlGS:0000000000000000 [ 32.991524] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 32.991936] CR2: ffff956a83c76067 CR3: 0000000002c58000 CR4: 00000000000006f0
This patch adds an overflow check
Signed-off-by: edward lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/record.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index 7d2fac5ee215..9f81944441ae 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -220,6 +220,11 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) return NULL; }
+ if (off + asize < off) { + /* overflow check */ + return NULL; + } + attr = Add2Ptr(attr, asize); off += asize; }
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 6db620863f8528ed9a9aa5ad323b26554a17881d ]
This adds sanity checks for data run offset. We should make sure data run offset is legit before trying to unpack them, otherwise we may encounter use-after-free or some unexpected memory access behaviors.
[ 82.940342] BUG: KASAN: use-after-free in run_unpack+0x2e3/0x570 [ 82.941180] Read of size 1 at addr ffff888008a8487f by task mount/240 [ 82.941670] [ 82.942069] CPU: 0 PID: 240 Comm: mount Not tainted 5.19.0+ #15 [ 82.942482] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 82.943720] Call Trace: [ 82.944204] <TASK> [ 82.944471] dump_stack_lvl+0x49/0x63 [ 82.944908] print_report.cold+0xf5/0x67b [ 82.945141] ? __wait_on_bit+0x106/0x120 [ 82.945750] ? run_unpack+0x2e3/0x570 [ 82.946626] kasan_report+0xa7/0x120 [ 82.947046] ? run_unpack+0x2e3/0x570 [ 82.947280] __asan_load1+0x51/0x60 [ 82.947483] run_unpack+0x2e3/0x570 [ 82.947709] ? memcpy+0x4e/0x70 [ 82.947927] ? run_pack+0x7a0/0x7a0 [ 82.948158] run_unpack_ex+0xad/0x3f0 [ 82.948399] ? mi_enum_attr+0x14a/0x200 [ 82.948717] ? run_unpack+0x570/0x570 [ 82.949072] ? ni_enum_attr_ex+0x1b2/0x1c0 [ 82.949332] ? ni_fname_type.part.0+0xd0/0xd0 [ 82.949611] ? mi_read+0x262/0x2c0 [ 82.949970] ? ntfs_cmp_names_cpu+0x125/0x180 [ 82.950249] ntfs_iget5+0x632/0x1870 [ 82.950621] ? ntfs_get_block_bmap+0x70/0x70 [ 82.951192] ? evict+0x223/0x280 [ 82.951525] ? iput.part.0+0x286/0x320 [ 82.951969] ntfs_fill_super+0x1321/0x1e20 [ 82.952436] ? put_ntfs+0x1d0/0x1d0 [ 82.952822] ? vsprintf+0x20/0x20 [ 82.953188] ? mutex_unlock+0x81/0xd0 [ 82.953379] ? set_blocksize+0x95/0x150 [ 82.954001] get_tree_bdev+0x232/0x370 [ 82.954438] ? put_ntfs+0x1d0/0x1d0 [ 82.954700] ntfs_fs_get_tree+0x15/0x20 [ 82.955049] vfs_get_tree+0x4c/0x130 [ 82.955292] path_mount+0x645/0xfd0 [ 82.955615] ? putname+0x80/0xa0 [ 82.955955] ? finish_automount+0x2e0/0x2e0 [ 82.956310] ? kmem_cache_free+0x110/0x390 [ 82.956723] ? putname+0x80/0xa0 [ 82.957023] do_mount+0xd6/0xf0 [ 82.957411] ? path_mount+0xfd0/0xfd0 [ 82.957638] ? __kasan_check_write+0x14/0x20 [ 82.957948] __x64_sys_mount+0xca/0x110 [ 82.958310] do_syscall_64+0x3b/0x90 [ 82.958719] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 82.959341] RIP: 0033:0x7fd0d1ce948a [ 82.960193] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 82.961532] RSP: 002b:00007ffe59ff69a8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 82.962527] RAX: ffffffffffffffda RBX: 0000564dcc107060 RCX: 00007fd0d1ce948a [ 82.963266] RDX: 0000564dcc107260 RSI: 0000564dcc1072e0 RDI: 0000564dcc10fce0 [ 82.963686] RBP: 0000000000000000 R08: 0000564dcc107280 R09: 0000000000000020 [ 82.964272] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564dcc10fce0 [ 82.964785] R13: 0000564dcc107260 R14: 0000000000000000 R15: 00000000ffffffff
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/attrib.c | 13 +++++++++++++ fs/ntfs3/attrlist.c | 5 +++++ fs/ntfs3/frecord.c | 14 ++++++++++++++ fs/ntfs3/fslog.c | 9 +++++++++ fs/ntfs3/inode.c | 5 +++++ 5 files changed, 46 insertions(+)
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c index 71f870d497ae..0d354560d323 100644 --- a/fs/ntfs3/attrib.c +++ b/fs/ntfs3/attrib.c @@ -101,6 +101,10 @@ static int attr_load_runs(struct ATTRIB *attr, struct ntfs_inode *ni,
asize = le32_to_cpu(attr->size); run_off = le16_to_cpu(attr->nres.run_off); + + if (run_off > asize) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, vcn ? *vcn : svcn, Add2Ptr(attr, run_off), asize - run_off); @@ -1232,6 +1236,10 @@ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, }
ro = le16_to_cpu(attr->nres.run_off); + + if (ro > le32_to_cpu(attr->size)) + return -EINVAL; + err = run_unpack_ex(run, ni->mi.sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, ro), le32_to_cpu(attr->size) - ro); if (err < 0) @@ -1901,6 +1909,11 @@ int attr_collapse_range(struct ntfs_inode *ni, u64 vbo, u64 bytes) u16 le_sz; u16 roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn1 - 1, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); diff --git a/fs/ntfs3/attrlist.c b/fs/ntfs3/attrlist.c index bad6d8a849a2..c0c6bcbc8c05 100644 --- a/fs/ntfs3/attrlist.c +++ b/fs/ntfs3/attrlist.c @@ -68,6 +68,11 @@ int ntfs_load_attr_list(struct ntfs_inode *ni, struct ATTRIB *attr)
run_init(&ni->attr_list.run);
+ if (run_off > le32_to_cpu(attr->size)) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(&ni->attr_list.run, ni->mi.sbi, ni->mi.rno, 0, le64_to_cpu(attr->nres.evcn), 0, Add2Ptr(attr, run_off), diff --git a/fs/ntfs3/frecord.c b/fs/ntfs3/frecord.c index 381a38a06ec2..b1b476fb7229 100644 --- a/fs/ntfs3/frecord.c +++ b/fs/ntfs3/frecord.c @@ -568,6 +568,12 @@ static int ni_repack(struct ntfs_inode *ni) }
roff = le16_to_cpu(attr->nres.run_off); + + if (roff > le32_to_cpu(attr->size)) { + err = -EINVAL; + break; + } + err = run_unpack(&run, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), le32_to_cpu(attr->size) - roff); @@ -1589,6 +1595,9 @@ int ni_delete_all(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) + return -EINVAL; + /* run==1 means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); @@ -2291,6 +2300,11 @@ int ni_decompress_file(struct ntfs_inode *ni) asize = le32_to_cpu(attr->size); roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + /*run==1 Means unpack and deallocate. */ run_unpack_ex(RUN_DEALLOCATE, sbi, ni->mi.rno, svcn, evcn, svcn, Add2Ptr(attr, roff), asize - roff); diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c index e7c494005122..138050b1aa93 100644 --- a/fs/ntfs3/fslog.c +++ b/fs/ntfs3/fslog.c @@ -2727,6 +2727,9 @@ static inline bool check_attr(const struct MFT_REC *rec, return false; }
+ if (run_off > asize) + return false; + if (run_unpack(NULL, sbi, 0, svcn, evcn, svcn, Add2Ptr(attr, run_off), asize - run_off) < 0) { return false; @@ -4771,6 +4774,12 @@ int log_replay(struct ntfs_inode *ni, bool *initialized) u16 roff = le16_to_cpu(attr->nres.run_off); CLST svcn = le64_to_cpu(attr->nres.svcn);
+ if (roff > t32) { + kfree(oa->attr); + oa->attr = NULL; + goto fake_attr; + } + err = run_unpack(&oa->run0, sbi, inode->i_ino, svcn, le64_to_cpu(attr->nres.evcn), svcn, Add2Ptr(attr, roff), t32 - roff); diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 26a76ebfe58f..3f5e3ca099c7 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -364,6 +364,11 @@ static struct inode *ntfs_read_mft(struct inode *inode, attr_unpack_run: roff = le16_to_cpu(attr->nres.run_off);
+ if (roff > asize) { + err = -EINVAL; + goto out; + } + t64 = le64_to_cpu(attr->nres.svcn); err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff);
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 2681631c29739509eec59cc0b34e977bb04c6cf1 ]
Some metadata files are handled before MFT. This adds a null pointer check for some corner cases that could lead to NPD while reading these metadata files for a malformed NTFS image.
[ 240.190827] BUG: kernel NULL pointer dereference, address: 0000000000000158 [ 240.191583] #PF: supervisor read access in kernel mode [ 240.191956] #PF: error_code(0x0000) - not-present page [ 240.192391] PGD 0 P4D 0 [ 240.192897] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 240.193805] CPU: 0 PID: 242 Comm: mount Tainted: G B 5.19.0+ #17 [ 240.194477] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 240.195152] RIP: 0010:ni_find_attr+0xae/0x300 [ 240.195679] Code: c8 48 c7 45 88 c0 4e 5e 86 c7 00 f1 f1 f1 f1 c7 40 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 e2 d9f [ 240.196642] RSP: 0018:ffff88800812f690 EFLAGS: 00000286 [ 240.197019] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff85ef037a [ 240.197523] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff88e95f60 [ 240.197877] RBP: ffff88800812f738 R08: 0000000000000001 R09: fffffbfff11d2bed [ 240.198292] R10: ffffffff88e95f67 R11: fffffbfff11d2bec R12: 0000000000000000 [ 240.198647] R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [ 240.199410] FS: 00007f233c33be40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 240.199895] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.200314] CR2: 0000000000000158 CR3: 0000000004d32000 CR4: 00000000000006f0 [ 240.200839] Call Trace: [ 240.201104] <TASK> [ 240.201502] ? ni_load_mi+0x80/0x80 [ 240.202297] ? ___slab_alloc+0x465/0x830 [ 240.202614] attr_load_runs_vcn+0x8c/0x1a0 [ 240.202886] ? __kasan_slab_alloc+0x32/0x90 [ 240.203157] ? attr_data_write_resident+0x250/0x250 [ 240.203543] mi_read+0x133/0x2c0 [ 240.203785] mi_get+0x70/0x140 [ 240.204012] ni_load_mi_ex+0xfa/0x190 [ 240.204346] ? ni_std5+0x90/0x90 [ 240.204588] ? __kasan_kmalloc+0x88/0xb0 [ 240.204859] ni_enum_attr_ex+0xf1/0x1c0 [ 240.205107] ? ni_fname_type.part.0+0xd0/0xd0 [ 240.205600] ? ntfs_load_attr_list+0xbe/0x300 [ 240.205864] ? ntfs_cmp_names_cpu+0x125/0x180 [ 240.206157] ntfs_iget5+0x56c/0x1870 [ 240.206510] ? ntfs_get_block_bmap+0x70/0x70 [ 240.206776] ? __kasan_kmalloc+0x88/0xb0 [ 240.207030] ? set_blocksize+0x95/0x150 [ 240.207545] ntfs_fill_super+0xb8f/0x1e20 [ 240.207839] ? put_ntfs+0x1d0/0x1d0 [ 240.208069] ? vsprintf+0x20/0x20 [ 240.208467] ? mutex_unlock+0x81/0xd0 [ 240.208846] ? set_blocksize+0x95/0x150 [ 240.209221] get_tree_bdev+0x232/0x370 [ 240.209804] ? put_ntfs+0x1d0/0x1d0 [ 240.210519] ntfs_fs_get_tree+0x15/0x20 [ 240.210991] vfs_get_tree+0x4c/0x130 [ 240.211455] path_mount+0x645/0xfd0 [ 240.211806] ? putname+0x80/0xa0 [ 240.212112] ? finish_automount+0x2e0/0x2e0 [ 240.212559] ? kmem_cache_free+0x110/0x390 [ 240.212906] ? putname+0x80/0xa0 [ 240.213329] do_mount+0xd6/0xf0 [ 240.213829] ? path_mount+0xfd0/0xfd0 [ 240.214246] ? __kasan_check_write+0x14/0x20 [ 240.214774] __x64_sys_mount+0xca/0x110 [ 240.215080] do_syscall_64+0x3b/0x90 [ 240.215442] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 240.215811] RIP: 0033:0x7f233b4e948a [ 240.216104] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 240.217615] RSP: 002b:00007fff02211ec8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 240.218718] RAX: ffffffffffffffda RBX: 0000561cdc35b060 RCX: 00007f233b4e948a [ 240.219556] RDX: 0000561cdc35b260 RSI: 0000561cdc35b2e0 RDI: 0000561cdc363af0 [ 240.219975] RBP: 0000000000000000 R08: 0000561cdc35b280 R09: 0000000000000020 [ 240.220403] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000561cdc363af0 [ 240.220803] R13: 0000561cdc35b260 R14: 0000000000000000 R15: 00000000ffffffff [ 240.221256] </TASK> [ 240.221567] Modules linked in: [ 240.222028] CR2: 0000000000000158 [ 240.223291] ---[ end trace 0000000000000000 ]--- [ 240.223669] RIP: 0010:ni_find_attr+0xae/0x300 [ 240.224058] Code: c8 48 c7 45 88 c0 4e 5e 86 c7 00 f1 f1 f1 f1 c7 40 04 00 f3 f3 f3 65 48 8b 04 25 28 00 00 00 48 89 45 d0 31 c0 e8 e2 d9f [ 240.225033] RSP: 0018:ffff88800812f690 EFLAGS: 00000286 [ 240.225968] RAX: 0000000000000001 RBX: 0000000000000000 RCX: ffffffff85ef037a [ 240.226624] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff88e95f60 [ 240.227307] RBP: ffff88800812f738 R08: 0000000000000001 R09: fffffbfff11d2bed [ 240.227816] R10: ffffffff88e95f67 R11: fffffbfff11d2bec R12: 0000000000000000 [ 240.228330] R13: 0000000000000080 R14: 0000000000000000 R15: 0000000000000000 [ 240.228729] FS: 00007f233c33be40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 240.229281] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 240.230298] CR2: 0000000000000158 CR3: 0000000004d32000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/attrib.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/attrib.c b/fs/ntfs3/attrib.c index 0d354560d323..578c2bcfb1d9 100644 --- a/fs/ntfs3/attrib.c +++ b/fs/ntfs3/attrib.c @@ -1221,6 +1221,11 @@ int attr_load_runs_vcn(struct ntfs_inode *ni, enum ATTR_TYPE type, CLST svcn, evcn; u16 ro;
+ if (!ni) { + /* Is record corrupted? */ + return -ENOENT; + } + attr = ni_find_attr(ni, NULL, NULL, type, name, name_len, &vcn, NULL); if (!attr) { /* Is record corrupted? */
From: Shigeru Yoshida syoshida@redhat.com
[ Upstream commit 51e76a232f8c037f1d9e9922edc25b003d5f3414 ]
syzbot reported kmemleak as below:
BUG: memory leak unreferenced object 0xffff8880122f1540 (size 32): comm "a.out", pid 6664, jiffies 4294939771 (age 25.500s) hex dump (first 32 bytes): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 ed ff ed ff 00 00 00 00 ................ backtrace: [<ffffffff81b16052>] ntfs_init_fs_context+0x22/0x1c0 [<ffffffff8164aaa7>] alloc_fs_context+0x217/0x430 [<ffffffff81626dd4>] path_mount+0x704/0x1080 [<ffffffff81627e7c>] __x64_sys_mount+0x18c/0x1d0 [<ffffffff84593e14>] do_syscall_64+0x34/0xb0 [<ffffffff84600087>] entry_SYSCALL_64_after_hwframe+0x63/0xcd
This patch fixes this issue by freeing mount options on error path of ntfs_fill_super().
Reported-by: syzbot+9d67170b20e8f94351c8@syzkaller.appspotmail.com Signed-off-by: Shigeru Yoshida syoshida@redhat.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index a0cea3b7526e..170682c2bf67 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1281,6 +1281,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) * Free resources here. * ntfs_fs_free will be called with fc->s_fs_info = NULL */ + put_mount_options(sbi->options); put_ntfs(sbi); sb->s_fs_info = NULL;
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit c1ca8ef0262b25493631ecbd9cb8c9893e1481a1 ]
This adds a sanity check for the i_op pointer of the inode which is returned after reading Root directory MFT record. We should check the i_op is valid before trying to create the root dentry, otherwise we may encounter a NPD while mounting a image with a funny Root directory MFT record.
[ 114.484325] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 114.484811] #PF: supervisor read access in kernel mode [ 114.485084] #PF: error_code(0x0000) - not-present page [ 114.485606] PGD 0 P4D 0 [ 114.485975] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 114.486570] CPU: 0 PID: 237 Comm: mount Tainted: G B 6.0.0-rc4 #28 [ 114.486977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 114.488169] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.488816] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.490326] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.490695] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.490986] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.491364] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.491675] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.491954] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.492397] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.492797] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.493150] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0 [ 114.493671] Call Trace: [ 114.493890] <TASK> [ 114.494075] __d_instantiate+0x24/0x1c0 [ 114.494505] d_instantiate.part.0+0x35/0x50 [ 114.494754] d_make_root+0x53/0x80 [ 114.494998] ntfs_fill_super+0x1232/0x1b50 [ 114.495260] ? put_ntfs+0x1d0/0x1d0 [ 114.495499] ? vsprintf+0x20/0x20 [ 114.495723] ? set_blocksize+0x95/0x150 [ 114.495964] get_tree_bdev+0x232/0x370 [ 114.496272] ? put_ntfs+0x1d0/0x1d0 [ 114.496502] ntfs_fs_get_tree+0x15/0x20 [ 114.496859] vfs_get_tree+0x4c/0x130 [ 114.497099] path_mount+0x654/0xfe0 [ 114.497507] ? putname+0x80/0xa0 [ 114.497933] ? finish_automount+0x2e0/0x2e0 [ 114.498362] ? putname+0x80/0xa0 [ 114.498571] ? kmem_cache_free+0x1c4/0x440 [ 114.498819] ? putname+0x80/0xa0 [ 114.499069] do_mount+0xd6/0xf0 [ 114.499343] ? path_mount+0xfe0/0xfe0 [ 114.499683] ? __kasan_check_write+0x14/0x20 [ 114.500133] __x64_sys_mount+0xca/0x110 [ 114.500592] do_syscall_64+0x3b/0x90 [ 114.500930] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 114.501294] RIP: 0033:0x7fdc898e948a [ 114.501542] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 114.502716] RSP: 002b:00007ffd793e58f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 114.503175] RAX: ffffffffffffffda RBX: 0000564b2228f060 RCX: 00007fdc898e948a [ 114.503588] RDX: 0000564b2228f260 RSI: 0000564b2228f2e0 RDI: 0000564b22297ce0 [ 114.504925] RBP: 0000000000000000 R08: 0000564b2228f280 R09: 0000000000000020 [ 114.505484] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 0000564b22297ce0 [ 114.505823] R13: 0000564b2228f260 R14: 0000000000000000 R15: 00000000ffffffff [ 114.506562] </TASK> [ 114.506887] Modules linked in: [ 114.507648] CR2: 0000000000000008 [ 114.508884] ---[ end trace 0000000000000000 ]--- [ 114.509675] RIP: 0010:d_flags_for_inode+0xe0/0x110 [ 114.510140] Code: 24 f7 ff 49 83 3e 00 74 41 41 83 cd 02 66 44 89 6b 02 eb 92 48 8d 7b 20 e8 6d 24 f7 ff 4c 8b 73 20 49 8d 7e 08 e8 60 241 [ 114.511762] RSP: 0018:ffff8880065e7aa8 EFLAGS: 00000296 [ 114.512401] RAX: 0000000000000001 RBX: ffff888008ccd750 RCX: ffffffff84af2aea [ 114.513103] RDX: 0000000000000001 RSI: 0000000000000008 RDI: ffffffff87abd020 [ 114.513512] RBP: ffff8880065e7ac8 R08: 0000000000000001 R09: fffffbfff0f57a05 [ 114.513831] R10: ffffffff87abd027 R11: fffffbfff0f57a04 R12: 0000000000000000 [ 114.514757] R13: 0000000000000008 R14: 0000000000000000 R15: ffff888008ccd750 [ 114.515411] FS: 00007fdc8a627e40(0000) GS:ffff888058200000(0000) knlGS:0000000000000000 [ 114.515794] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 114.516208] CR2: 0000000000000008 CR3: 00000000013ba000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index 170682c2bf67..94f9e4b775a7 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1260,9 +1260,9 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) ref.low = cpu_to_le32(MFT_REC_ROOT); ref.seq = cpu_to_le16(MFT_REC_ROOT); inode = ntfs_iget5(sb, &ref, &NAME_ROOT); - if (IS_ERR(inode)) { + if (IS_ERR(inode) || !inode->i_op) { ntfs_err(sb, "Failed to load root."); - err = PTR_ERR(inode); + err = IS_ERR(inode) ? PTR_ERR(inode) : -EINVAL; goto out; }
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 4f1dc7d9756e66f3f876839ea174df2e656b7f79 ]
Although the attribute name length is checked before comparing it to some common names (e.g., $I30), the offset isn't. This adds a sanity check for the attribute name offset, guarantee the validity and prevent possible out-of-bound memory accesses.
[ 191.720056] BUG: unable to handle page fault for address: ffffebde00000008 [ 191.721060] #PF: supervisor read access in kernel mode [ 191.721586] #PF: error_code(0x0000) - not-present page [ 191.722079] PGD 0 P4D 0 [ 191.722571] Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI [ 191.723179] CPU: 0 PID: 244 Comm: mount Not tainted 6.0.0-rc4 #28 [ 191.723749] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 191.724832] RIP: 0010:kfree+0x56/0x3b0 [ 191.725870] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.727375] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.727897] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.728531] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.729183] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.729628] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.730158] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.730645] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.731328] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.731667] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0 [ 191.732568] Call Trace: [ 191.733231] <TASK> [ 191.733860] kvfree+0x2c/0x40 [ 191.734632] ni_clear+0x180/0x290 [ 191.735085] ntfs_evict_inode+0x45/0x70 [ 191.735495] evict+0x199/0x280 [ 191.735996] iput.part.0+0x286/0x320 [ 191.736438] iput+0x32/0x50 [ 191.736811] iget_failed+0x23/0x30 [ 191.737270] ntfs_iget5+0x337/0x1890 [ 191.737629] ? ntfs_clear_mft_tail+0x20/0x260 [ 191.738201] ? ntfs_get_block_bmap+0x70/0x70 [ 191.738482] ? ntfs_objid_init+0xf6/0x140 [ 191.738779] ? ntfs_reparse_init+0x140/0x140 [ 191.739266] ntfs_fill_super+0x121b/0x1b50 [ 191.739623] ? put_ntfs+0x1d0/0x1d0 [ 191.739984] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 191.740466] ? put_ntfs+0x1d0/0x1d0 [ 191.740787] ? sb_set_blocksize+0x6a/0x80 [ 191.741272] get_tree_bdev+0x232/0x370 [ 191.741829] ? put_ntfs+0x1d0/0x1d0 [ 191.742669] ntfs_fs_get_tree+0x15/0x20 [ 191.743132] vfs_get_tree+0x4c/0x130 [ 191.743457] path_mount+0x654/0xfe0 [ 191.743938] ? putname+0x80/0xa0 [ 191.744271] ? finish_automount+0x2e0/0x2e0 [ 191.744582] ? putname+0x80/0xa0 [ 191.745053] ? kmem_cache_free+0x1c4/0x440 [ 191.745403] ? putname+0x80/0xa0 [ 191.745616] do_mount+0xd6/0xf0 [ 191.745887] ? path_mount+0xfe0/0xfe0 [ 191.746287] ? __kasan_check_write+0x14/0x20 [ 191.746582] __x64_sys_mount+0xca/0x110 [ 191.746850] do_syscall_64+0x3b/0x90 [ 191.747122] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 191.747517] RIP: 0033:0x7f351fee948a [ 191.748332] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 191.749341] RSP: 002b:00007ffd51cf3af8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 191.749960] RAX: ffffffffffffffda RBX: 000055b903733060 RCX: 00007f351fee948a [ 191.750589] RDX: 000055b903733260 RSI: 000055b9037332e0 RDI: 000055b90373bce0 [ 191.751115] RBP: 0000000000000000 R08: 000055b903733280 R09: 0000000000000020 [ 191.751537] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055b90373bce0 [ 191.751946] R13: 000055b903733260 R14: 0000000000000000 R15: 00000000ffffffff [ 191.752519] </TASK> [ 191.752782] Modules linked in: [ 191.753785] CR2: ffffebde00000008 [ 191.754937] ---[ end trace 0000000000000000 ]--- [ 191.755429] RIP: 0010:kfree+0x56/0x3b0 [ 191.755725] Code: 80 48 01 d8 0f 82 65 03 00 00 48 c7 c2 00 00 00 80 48 2b 15 2c 06 dd 01 48 01 d0 48 c1 e8 0c 48 c1 e0 06 48 03 05 0a 069 [ 191.756744] RSP: 0018:ffff8880076f7878 EFLAGS: 00000286 [ 191.757218] RAX: ffffebde00000000 RBX: 0000000000000040 RCX: ffffffff8528d5b9 [ 191.757580] RDX: 0000777f80000000 RSI: ffffffff8522d49c RDI: 0000000000000040 [ 191.758016] RBP: ffff8880076f78a0 R08: 0000000000000000 R09: 0000000000000000 [ 191.758570] R10: ffff888008949fd8 R11: ffffed10011293fd R12: 0000000000000040 [ 191.758957] R13: ffff888008949f98 R14: ffff888008949ec0 R15: ffff888008949fb0 [ 191.759317] FS: 00007f3520cd7e40(0000) GS:ffff88805ba00000(0000) knlGS:0000000000000000 [ 191.759711] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 191.760118] CR2: ffffebde00000008 CR3: 0000000009704000 CR4: 00000000000006f0
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 3f5e3ca099c7..791d049a9ad7 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -129,6 +129,9 @@ static struct inode *ntfs_read_mft(struct inode *inode, rsize = attr->non_res ? 0 : le32_to_cpu(attr->res.data_size); asize = le32_to_cpu(attr->size);
+ if (le16_to_cpu(attr->name_off) + attr->name_len > asize) + goto out; + switch (attr->type) { case ATTR_STD: if (attr->non_res ||
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 4d42ecda239cc13738d6fd84d098a32e67b368b9 ]
indx_read is called when we have some NTFS directory operations that need more information from the index buffers. This adds a sanity check to make sure the returned index buffer length is legit, or we may have some out-of-bound memory accesses.
[ 560.897595] BUG: KASAN: slab-out-of-bounds in hdr_find_e.isra.0+0x10c/0x320 [ 560.898321] Read of size 2 at addr ffff888009497238 by task exp/245 [ 560.898760] [ 560.899129] CPU: 0 PID: 245 Comm: exp Not tainted 6.0.0-rc6 #37 [ 560.899505] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 560.900170] Call Trace: [ 560.900407] <TASK> [ 560.900732] dump_stack_lvl+0x49/0x63 [ 560.901108] print_report.cold+0xf5/0x689 [ 560.901395] ? hdr_find_e.isra.0+0x10c/0x320 [ 560.901716] kasan_report+0xa7/0x130 [ 560.901950] ? hdr_find_e.isra.0+0x10c/0x320 [ 560.902208] __asan_load2+0x68/0x90 [ 560.902427] hdr_find_e.isra.0+0x10c/0x320 [ 560.902846] ? cmp_uints+0xe0/0xe0 [ 560.903363] ? cmp_sdh+0x90/0x90 [ 560.903883] ? ntfs_bread_run+0x190/0x190 [ 560.904196] ? rwsem_down_read_slowpath+0x750/0x750 [ 560.904969] ? ntfs_fix_post_read+0xe0/0x130 [ 560.905259] ? __kasan_check_write+0x14/0x20 [ 560.905599] ? up_read+0x1a/0x90 [ 560.905853] ? indx_read+0x22c/0x380 [ 560.906096] indx_find+0x2ef/0x470 [ 560.906352] ? indx_find_buffer+0x2d0/0x2d0 [ 560.906692] ? __kasan_kmalloc+0x88/0xb0 [ 560.906977] dir_search_u+0x196/0x2f0 [ 560.907220] ? ntfs_nls_to_utf16+0x450/0x450 [ 560.907464] ? __kasan_check_write+0x14/0x20 [ 560.907747] ? mutex_lock+0x8f/0xe0 [ 560.907970] ? __mutex_lock_slowpath+0x20/0x20 [ 560.908214] ? kmem_cache_alloc+0x143/0x4b0 [ 560.908459] ntfs_lookup+0xe0/0x100 [ 560.908788] __lookup_slow+0x116/0x220 [ 560.909050] ? lookup_fast+0x1b0/0x1b0 [ 560.909309] ? lookup_fast+0x13f/0x1b0 [ 560.909601] walk_component+0x187/0x230 [ 560.909944] link_path_walk.part.0+0x3f0/0x660 [ 560.910285] ? handle_lookup_down+0x90/0x90 [ 560.910618] ? path_init+0x642/0x6e0 [ 560.911084] ? percpu_counter_add_batch+0x6e/0xf0 [ 560.912559] ? __alloc_file+0x114/0x170 [ 560.913008] path_openat+0x19c/0x1d10 [ 560.913419] ? getname_flags+0x73/0x2b0 [ 560.913815] ? kasan_save_stack+0x3a/0x50 [ 560.914125] ? kasan_save_stack+0x26/0x50 [ 560.914542] ? __kasan_slab_alloc+0x6d/0x90 [ 560.914924] ? kmem_cache_alloc+0x143/0x4b0 [ 560.915339] ? getname_flags+0x73/0x2b0 [ 560.915647] ? getname+0x12/0x20 [ 560.916114] ? __x64_sys_open+0x4c/0x60 [ 560.916460] ? path_lookupat.isra.0+0x230/0x230 [ 560.916867] ? __isolate_free_page+0x2e0/0x2e0 [ 560.917194] do_filp_open+0x15c/0x1f0 [ 560.917448] ? may_open_dev+0x60/0x60 [ 560.917696] ? expand_files+0xa4/0x3a0 [ 560.917923] ? __kasan_check_write+0x14/0x20 [ 560.918185] ? _raw_spin_lock+0x88/0xdb [ 560.918409] ? _raw_spin_lock_irqsave+0x100/0x100 [ 560.918783] ? _find_next_bit+0x4a/0x130 [ 560.919026] ? _raw_spin_unlock+0x19/0x40 [ 560.919276] ? alloc_fd+0x14b/0x2d0 [ 560.919635] do_sys_openat2+0x32a/0x4b0 [ 560.920035] ? file_open_root+0x230/0x230 [ 560.920336] ? __rcu_read_unlock+0x5b/0x280 [ 560.920813] do_sys_open+0x99/0xf0 [ 560.921208] ? filp_open+0x60/0x60 [ 560.921482] ? exit_to_user_mode_prepare+0x49/0x180 [ 560.921867] __x64_sys_open+0x4c/0x60 [ 560.922128] do_syscall_64+0x3b/0x90 [ 560.922369] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 560.923030] RIP: 0033:0x7f7dff2e4469 [ 560.923681] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 560.924451] RSP: 002b:00007ffd41a210b8 EFLAGS: 00000206 ORIG_RAX: 0000000000000002 [ 560.925168] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f7dff2e4469 [ 560.925655] RDX: 0000000000000000 RSI: 0000000000000002 RDI: 00007ffd41a211f0 [ 560.926085] RBP: 00007ffd41a252a0 R08: 00007f7dff60fba0 R09: 00007ffd41a25388 [ 560.926405] R10: 0000000000400b80 R11: 0000000000000206 R12: 00000000004004e0 [ 560.926867] R13: 00007ffd41a25380 R14: 0000000000000000 R15: 0000000000000000 [ 560.927241] </TASK> [ 560.927491] [ 560.927755] Allocated by task 245: [ 560.928409] kasan_save_stack+0x26/0x50 [ 560.929271] __kasan_kmalloc+0x88/0xb0 [ 560.929778] __kmalloc+0x192/0x320 [ 560.930023] indx_read+0x249/0x380 [ 560.930224] indx_find+0x2a2/0x470 [ 560.930695] dir_search_u+0x196/0x2f0 [ 560.930892] ntfs_lookup+0xe0/0x100 [ 560.931115] __lookup_slow+0x116/0x220 [ 560.931323] walk_component+0x187/0x230 [ 560.931570] link_path_walk.part.0+0x3f0/0x660 [ 560.931791] path_openat+0x19c/0x1d10 [ 560.932008] do_filp_open+0x15c/0x1f0 [ 560.932226] do_sys_openat2+0x32a/0x4b0 [ 560.932413] do_sys_open+0x99/0xf0 [ 560.932709] __x64_sys_open+0x4c/0x60 [ 560.933417] do_syscall_64+0x3b/0x90 [ 560.933776] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 560.934235] [ 560.934486] The buggy address belongs to the object at ffff888009497000 [ 560.934486] which belongs to the cache kmalloc-512 of size 512 [ 560.935239] The buggy address is located 56 bytes to the right of [ 560.935239] 512-byte region [ffff888009497000, ffff888009497200) [ 560.936153] [ 560.937326] The buggy address belongs to the physical page: [ 560.938228] page:0000000062a3dfae refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x9496 [ 560.939616] head:0000000062a3dfae order:1 compound_mapcount:0 compound_pincount:0 [ 560.940219] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 560.942702] raw: 000fffffc0010200 ffffea0000164f80 dead000000000005 ffff888001041c80 [ 560.943932] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000 [ 560.944568] page dumped because: kasan: bad access detected [ 560.945735] [ 560.946112] Memory state around the buggy address: [ 560.946870] ffff888009497100: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 560.947242] ffff888009497180: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 560.947611] >ffff888009497200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 560.947915] ^ [ 560.948249] ffff888009497280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 560.948687] ffff888009497300: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/index.c | 6 ++++++ 1 file changed, 6 insertions(+)
diff --git a/fs/ntfs3/index.c b/fs/ntfs3/index.c index 440328147e7e..c27b4fe57513 100644 --- a/fs/ntfs3/index.c +++ b/fs/ntfs3/index.c @@ -1017,6 +1017,12 @@ int indx_read(struct ntfs_index *indx, struct ntfs_inode *ni, CLST vbn, err = 0; }
+ /* check for index header length */ + if (offsetof(struct INDEX_BUFFER, ihdr) + ib->ihdr.used > bytes) { + err = -EINVAL; + goto out; + } + in->index = ib; *node = in;
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit 54e45702b648b7c0000e90b3e9b890e367e16ea8 ]
Though we already have some sanity checks while enumerating attributes, resident attribute names aren't included. This patch checks the resident attribute names are in the valid ranges.
[ 259.209031] BUG: KASAN: slab-out-of-bounds in ni_create_attr_list+0x1e1/0x850 [ 259.210770] Write of size 426 at addr ffff88800632f2b2 by task exp/255 [ 259.211551] [ 259.212035] CPU: 0 PID: 255 Comm: exp Not tainted 6.0.0-rc6 #37 [ 259.212955] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 259.214387] Call Trace: [ 259.214640] <TASK> [ 259.214895] dump_stack_lvl+0x49/0x63 [ 259.215284] print_report.cold+0xf5/0x689 [ 259.215565] ? kasan_poison+0x3c/0x50 [ 259.215778] ? kasan_unpoison+0x28/0x60 [ 259.215991] ? ni_create_attr_list+0x1e1/0x850 [ 259.216270] kasan_report+0xa7/0x130 [ 259.216481] ? ni_create_attr_list+0x1e1/0x850 [ 259.216719] kasan_check_range+0x15a/0x1d0 [ 259.216939] memcpy+0x3c/0x70 [ 259.217136] ni_create_attr_list+0x1e1/0x850 [ 259.217945] ? __rcu_read_unlock+0x5b/0x280 [ 259.218384] ? ni_remove_attr+0x2e0/0x2e0 [ 259.218712] ? kernel_text_address+0xcf/0xe0 [ 259.219064] ? __kernel_text_address+0x12/0x40 [ 259.219434] ? arch_stack_walk+0x9e/0xf0 [ 259.219668] ? __this_cpu_preempt_check+0x13/0x20 [ 259.219904] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 259.220140] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20 [ 259.220561] ni_ins_attr_ext+0x52c/0x5c0 [ 259.220984] ? ni_create_attr_list+0x850/0x850 [ 259.221532] ? run_deallocate+0x120/0x120 [ 259.221972] ? vfs_setxattr+0x128/0x300 [ 259.222688] ? setxattr+0x126/0x140 [ 259.222921] ? path_setxattr+0x164/0x180 [ 259.223431] ? __x64_sys_setxattr+0x6d/0x80 [ 259.223828] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.224417] ? mi_find_attr+0x3c/0xf0 [ 259.224772] ni_insert_attr+0x1ba/0x420 [ 259.225216] ? ni_ins_attr_ext+0x5c0/0x5c0 [ 259.225504] ? ntfs_read_ea+0x119/0x450 [ 259.225775] ni_insert_resident+0xc0/0x1c0 [ 259.226316] ? ni_insert_nonresident+0x400/0x400 [ 259.227001] ? __kasan_kmalloc+0x88/0xb0 [ 259.227468] ? __kmalloc+0x192/0x320 [ 259.227773] ntfs_set_ea+0x6bf/0xb30 [ 259.228216] ? ftrace_graph_ret_addr+0x2a/0xb0 [ 259.228494] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.228838] ? ntfs_read_ea+0x450/0x450 [ 259.229098] ? is_bpf_text_address+0x24/0x40 [ 259.229418] ? kernel_text_address+0xcf/0xe0 [ 259.229681] ? __kernel_text_address+0x12/0x40 [ 259.229948] ? unwind_get_return_address+0x3a/0x60 [ 259.230271] ? write_profile+0x270/0x270 [ 259.230537] ? arch_stack_walk+0x9e/0xf0 [ 259.230836] ntfs_setxattr+0x114/0x5c0 [ 259.231099] ? ntfs_set_acl_ex+0x2e0/0x2e0 [ 259.231529] ? evm_protected_xattr_common+0x6d/0x100 [ 259.231817] ? posix_xattr_acl+0x13/0x80 [ 259.232073] ? evm_protect_xattr+0x1f7/0x440 [ 259.232351] __vfs_setxattr+0xda/0x120 [ 259.232635] ? xattr_resolve_name+0x180/0x180 [ 259.232912] __vfs_setxattr_noperm+0x93/0x300 [ 259.233219] __vfs_setxattr_locked+0x141/0x160 [ 259.233492] ? kasan_poison+0x3c/0x50 [ 259.233744] vfs_setxattr+0x128/0x300 [ 259.234002] ? __vfs_setxattr_locked+0x160/0x160 [ 259.234837] do_setxattr+0xb8/0x170 [ 259.235567] ? vmemdup_user+0x53/0x90 [ 259.236212] setxattr+0x126/0x140 [ 259.236491] ? do_setxattr+0x170/0x170 [ 259.236791] ? debug_smp_processor_id+0x17/0x20 [ 259.237232] ? kasan_quarantine_put+0x57/0x180 [ 259.237605] ? putname+0x80/0xa0 [ 259.237870] ? __kasan_slab_free+0x11c/0x1b0 [ 259.238234] ? putname+0x80/0xa0 [ 259.238500] ? preempt_count_sub+0x18/0xc0 [ 259.238775] ? __mnt_want_write+0xaa/0x100 [ 259.238990] ? mnt_want_write+0x8b/0x150 [ 259.239290] path_setxattr+0x164/0x180 [ 259.239605] ? setxattr+0x140/0x140 [ 259.239849] ? debug_smp_processor_id+0x17/0x20 [ 259.240174] ? fpregs_assert_state_consistent+0x67/0x80 [ 259.240411] __x64_sys_setxattr+0x6d/0x80 [ 259.240715] do_syscall_64+0x3b/0x90 [ 259.240934] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.241697] RIP: 0033:0x7fc6b26e4469 [ 259.242647] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 088 [ 259.244512] RSP: 002b:00007ffc3c7841f8 EFLAGS: 00000217 ORIG_RAX: 00000000000000bc [ 259.245086] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fc6b26e4469 [ 259.246025] RDX: 00007ffc3c784380 RSI: 00007ffc3c7842e0 RDI: 00007ffc3c784238 [ 259.246961] RBP: 00007ffc3c788410 R08: 0000000000000001 R09: 00007ffc3c7884f8 [ 259.247775] R10: 000000000000007f R11: 0000000000000217 R12: 00000000004004e0 [ 259.248534] R13: 00007ffc3c7884f0 R14: 0000000000000000 R15: 0000000000000000 [ 259.249368] </TASK> [ 259.249644] [ 259.249888] Allocated by task 255: [ 259.250283] kasan_save_stack+0x26/0x50 [ 259.250957] __kasan_kmalloc+0x88/0xb0 [ 259.251826] __kmalloc+0x192/0x320 [ 259.252745] ni_create_attr_list+0x11e/0x850 [ 259.253298] ni_ins_attr_ext+0x52c/0x5c0 [ 259.253685] ni_insert_attr+0x1ba/0x420 [ 259.253974] ni_insert_resident+0xc0/0x1c0 [ 259.254311] ntfs_set_ea+0x6bf/0xb30 [ 259.254629] ntfs_setxattr+0x114/0x5c0 [ 259.254859] __vfs_setxattr+0xda/0x120 [ 259.255155] __vfs_setxattr_noperm+0x93/0x300 [ 259.255445] __vfs_setxattr_locked+0x141/0x160 [ 259.255862] vfs_setxattr+0x128/0x300 [ 259.256251] do_setxattr+0xb8/0x170 [ 259.256522] setxattr+0x126/0x140 [ 259.256911] path_setxattr+0x164/0x180 [ 259.257308] __x64_sys_setxattr+0x6d/0x80 [ 259.257637] do_syscall_64+0x3b/0x90 [ 259.257970] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 259.258550] [ 259.258772] The buggy address belongs to the object at ffff88800632f000 [ 259.258772] which belongs to the cache kmalloc-1k of size 1024 [ 259.260190] The buggy address is located 690 bytes inside of [ 259.260190] 1024-byte region [ffff88800632f000, ffff88800632f400) [ 259.261412] [ 259.261743] The buggy address belongs to the physical page: [ 259.262354] page:0000000081e8cac9 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x632c [ 259.263722] head:0000000081e8cac9 order:2 compound_mapcount:0 compound_pincount:0 [ 259.264284] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 259.265312] raw: 000fffffc0010200 ffffea0000060d00 dead000000000004 ffff888001041dc0 [ 259.265772] raw: 0000000000000000 0000000080080008 00000001ffffffff 0000000000000000 [ 259.266305] page dumped because: kasan: bad access detected [ 259.266588] [ 259.266728] Memory state around the buggy address: [ 259.267225] ffff88800632f300: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.267841] ffff88800632f380: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 [ 259.269111] >ffff88800632f400: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.269626] ^ [ 259.270162] ffff88800632f480: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 259.270810] ffff88800632f500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/record.c | 5 +++++ 1 file changed, 5 insertions(+)
diff --git a/fs/ntfs3/record.c b/fs/ntfs3/record.c index 9f81944441ae..af1e4b364ea8 100644 --- a/fs/ntfs3/record.c +++ b/fs/ntfs3/record.c @@ -265,6 +265,11 @@ struct ATTRIB *mi_enum_attr(struct mft_inode *mi, struct ATTRIB *attr) if (t16 + t32 > asize) return NULL;
+ if (attr->name_len && + le16_to_cpu(attr->name_off) + sizeof(short) * attr->name_len > t16) { + return NULL; + } + return attr; }
From: Hawkins Jiawei yin31149@gmail.com
[ Upstream commit 887bfc546097fbe8071dac13b2fef73b77920899 ]
Syzkaller reports slab-out-of-bounds bug as follows: ================================================================== BUG: KASAN: slab-out-of-bounds in run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 Read of size 1 at addr ffff88801bbdff02 by task syz-executor131/3611
[...] Call Trace: <TASK> __dump_stack lib/dump_stack.c:88 [inline] dump_stack_lvl+0xcd/0x134 lib/dump_stack.c:106 print_address_description mm/kasan/report.c:317 [inline] print_report.cold+0x2ba/0x719 mm/kasan/report.c:433 kasan_report+0xb1/0x1e0 mm/kasan/report.c:495 run_unpack+0x8b7/0x970 fs/ntfs3/run.c:944 run_unpack_ex+0xb0/0x7c0 fs/ntfs3/run.c:1057 ntfs_read_mft fs/ntfs3/inode.c:368 [inline] ntfs_iget5+0xc20/0x3280 fs/ntfs3/inode.c:501 ntfs_loadlog_and_replay+0x124/0x5d0 fs/ntfs3/fsntfs.c:272 ntfs_fill_super+0x1eff/0x37f0 fs/ntfs3/super.c:1018 get_tree_bdev+0x440/0x760 fs/super.c:1323 vfs_get_tree+0x89/0x2f0 fs/super.c:1530 do_new_mount fs/namespace.c:3040 [inline] path_mount+0x1326/0x1e20 fs/namespace.c:3370 do_mount fs/namespace.c:3383 [inline] __do_sys_mount fs/namespace.c:3591 [inline] __se_sys_mount fs/namespace.c:3568 [inline] __x64_sys_mount+0x27f/0x300 fs/namespace.c:3568 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] </TASK>
The buggy address belongs to the physical page: page:ffffea00006ef600 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x1bbd8 head:ffffea00006ef600 order:3 compound_mapcount:0 compound_pincount:0 flags: 0xfff00000010200(slab|head|node=0|zone=1|lastcpupid=0x7ff) page dumped because: kasan: bad access detected
Memory state around the buggy address: ffff88801bbdfe00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbdfe80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88801bbdff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^ ffff88801bbdff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88801bbe0000: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ==================================================================
Kernel will tries to read record and parse MFT from disk in ntfs_read_mft().
Yet the problem is that during enumerating attributes in record, kernel doesn't check whether run_off field loading from the disk is a valid value.
To be more specific, if attr->nres.run_off is larger than attr->size, kernel will passes an invalid argument run_buf_size in run_unpack_ex(), which having an integer overflow. Then this invalid argument will triggers the slab-out-of-bounds Read bug as above.
This patch solves it by adding the sanity check between the offset to packed runs and attribute size.
link: https://lore.kernel.org/all/0000000000009145fc05e94bd5c3@google.com/#t Reported-and-tested-by: syzbot+8d6fbb27a6aded64b25b@syzkaller.appspotmail.com Signed-off-by: Hawkins Jiawei yin31149@gmail.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 7 +++++++ 1 file changed, 7 insertions(+)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index 791d049a9ad7..ba2005c12ee3 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -373,6 +373,13 @@ static struct inode *ntfs_read_mft(struct inode *inode, }
t64 = le64_to_cpu(attr->nres.svcn); + + /* offset to packed runs is out-of-bounds */ + if (roff > asize) { + err = -EINVAL; + goto out; + } + err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com
[ Upstream commit f74495761df10c25a98256d16ea7465191b6e2cd ]
Some NUC15 LAPBC710 devices don't expose the same DMI information as the Intel reference, add additional entry in the match table.
BugLink: https://github.com/thesofproject/linux/issues/3885 Signed-off-by: Pierre-Louis Bossart pierre-louis.bossart@linux.intel.com Reviewed-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com Signed-off-by: Bard Liao yung-chuan.liao@linux.intel.com Link: https://lore.kernel.org/r/20221018012500.1592994-1-yung-chuan.liao@linux.int... Signed-off-by: Vinod Koul vkoul@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/soundwire/dmi-quirks.c | 8 ++++++++ 1 file changed, 8 insertions(+)
diff --git a/drivers/soundwire/dmi-quirks.c b/drivers/soundwire/dmi-quirks.c index f81cdd83ec26..7969881f126d 100644 --- a/drivers/soundwire/dmi-quirks.c +++ b/drivers/soundwire/dmi-quirks.c @@ -90,6 +90,14 @@ static const struct dmi_system_id adr_remap_quirk_table[] = { }, .driver_data = (void *)intel_tgl_bios, }, + { + /* quirk used for NUC15 LAPBC710 skew */ + .matches = { + DMI_MATCH(DMI_BOARD_VENDOR, "Intel Corporation"), + DMI_MATCH(DMI_BOARD_NAME, "LAPBC710"), + }, + .driver_data = (void *)intel_tgl_bios, + }, { .matches = { DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc"),
From: Edward Lo edward.lo@ambergroup.io
[ Upstream commit bfcdbae0523bd95eb75a739ffb6221a37109881e ]
This enhances the sanity check for $SDH and $SII while initializing NTFS security, guarantees these index root are legit.
[ 162.459513] BUG: KASAN: use-after-free in hdr_find_e.isra.0+0x10c/0x320 [ 162.460176] Read of size 2 at addr ffff8880037bca99 by task mount/243 [ 162.460851] [ 162.461252] CPU: 0 PID: 243 Comm: mount Not tainted 6.0.0-rc7 #42 [ 162.461744] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 162.462609] Call Trace: [ 162.462954] <TASK> [ 162.463276] dump_stack_lvl+0x49/0x63 [ 162.463822] print_report.cold+0xf5/0x689 [ 162.464608] ? unwind_get_return_address+0x3a/0x60 [ 162.465766] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.466975] kasan_report+0xa7/0x130 [ 162.467506] ? _raw_spin_lock_irq+0xc0/0xf0 [ 162.467998] ? hdr_find_e.isra.0+0x10c/0x320 [ 162.468536] __asan_load2+0x68/0x90 [ 162.468923] hdr_find_e.isra.0+0x10c/0x320 [ 162.469282] ? cmp_uints+0xe0/0xe0 [ 162.469557] ? cmp_sdh+0x90/0x90 [ 162.469864] ? ni_find_attr+0x214/0x300 [ 162.470217] ? ni_load_mi+0x80/0x80 [ 162.470479] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.470931] ? ntfs_bread_run+0x190/0x190 [ 162.471307] ? indx_get_root+0xe4/0x190 [ 162.471556] ? indx_get_root+0x140/0x190 [ 162.471833] ? indx_init+0x1e0/0x1e0 [ 162.472069] ? fnd_clear+0x115/0x140 [ 162.472363] ? _raw_spin_lock_irqsave+0x100/0x100 [ 162.472731] indx_find+0x184/0x470 [ 162.473461] ? sysvec_apic_timer_interrupt+0x57/0xc0 [ 162.474429] ? indx_find_buffer+0x2d0/0x2d0 [ 162.474704] ? do_syscall_64+0x3b/0x90 [ 162.474962] dir_search_u+0x196/0x2f0 [ 162.475381] ? ntfs_nls_to_utf16+0x450/0x450 [ 162.475661] ? ntfs_security_init+0x3d6/0x440 [ 162.475906] ? is_sd_valid+0x180/0x180 [ 162.476191] ntfs_extend_init+0x13f/0x2c0 [ 162.476496] ? ntfs_fix_post_read+0x130/0x130 [ 162.476861] ? iput.part.0+0x286/0x320 [ 162.477325] ntfs_fill_super+0x11e0/0x1b50 [ 162.477709] ? put_ntfs+0x1d0/0x1d0 [ 162.477970] ? vsprintf+0x20/0x20 [ 162.478258] ? set_blocksize+0x95/0x150 [ 162.478538] get_tree_bdev+0x232/0x370 [ 162.478789] ? put_ntfs+0x1d0/0x1d0 [ 162.479038] ntfs_fs_get_tree+0x15/0x20 [ 162.479374] vfs_get_tree+0x4c/0x130 [ 162.479729] path_mount+0x654/0xfe0 [ 162.480124] ? putname+0x80/0xa0 [ 162.480484] ? finish_automount+0x2e0/0x2e0 [ 162.480894] ? putname+0x80/0xa0 [ 162.481467] ? kmem_cache_free+0x1c4/0x440 [ 162.482280] ? putname+0x80/0xa0 [ 162.482714] do_mount+0xd6/0xf0 [ 162.483264] ? path_mount+0xfe0/0xfe0 [ 162.484782] ? __kasan_check_write+0x14/0x20 [ 162.485593] __x64_sys_mount+0xca/0x110 [ 162.486024] do_syscall_64+0x3b/0x90 [ 162.486543] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 162.487141] RIP: 0033:0x7f9d374e948a [ 162.488324] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 162.489728] RSP: 002b:00007ffe30e73d18 EFLAGS: 00000206 ORIG_RAX: 00000000000000a5 [ 162.490971] RAX: ffffffffffffffda RBX: 0000561cdb43a060 RCX: 00007f9d374e948a [ 162.491669] RDX: 0000561cdb43a260 RSI: 0000561cdb43a2e0 RDI: 0000561cdb442af0 [ 162.492050] RBP: 0000000000000000 R08: 0000561cdb43a280 R09: 0000000000000020 [ 162.492459] R10: 00000000c0ed0000 R11: 0000000000000206 R12: 0000561cdb442af0 [ 162.493183] R13: 0000561cdb43a260 R14: 0000000000000000 R15: 00000000ffffffff [ 162.493644] </TASK> [ 162.493908] [ 162.494214] The buggy address belongs to the physical page: [ 162.494761] page:000000003e38a3d5 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x37bc [ 162.496064] flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) [ 162.497278] raw: 000fffffc0000000 ffffea00000df1c8 ffffea00000df008 0000000000000000 [ 162.498928] raw: 0000000000000000 0000000000240000 00000000ffffffff 0000000000000000 [ 162.500542] page dumped because: kasan: bad access detected [ 162.501057] [ 162.501242] Memory state around the buggy address: [ 162.502230] ffff8880037bc980: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.502977] ffff8880037bca00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503522] >ffff8880037bca80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.503963] ^ [ 162.504370] ffff8880037bcb00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff [ 162.504766] ffff8880037bcb80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
Signed-off-by: Edward Lo edward.lo@ambergroup.io Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/fsntfs.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-)
diff --git a/fs/ntfs3/fsntfs.c b/fs/ntfs3/fsntfs.c index 4ed15f64b17f..b6e22bcb929b 100644 --- a/fs/ntfs3/fsntfs.c +++ b/fs/ntfs3/fsntfs.c @@ -1849,9 +1849,10 @@ int ntfs_security_init(struct ntfs_sb_info *sbi) goto out; }
- root_sdh = resident_data(attr); + root_sdh = resident_data_ex(attr, sizeof(struct INDEX_ROOT)); if (root_sdh->type != ATTR_ZERO || - root_sdh->rule != NTFS_COLLATION_TYPE_SECURITY_HASH) { + root_sdh->rule != NTFS_COLLATION_TYPE_SECURITY_HASH || + offsetof(struct INDEX_ROOT, ihdr) + root_sdh->ihdr.used > attr->res.data_size) { err = -EINVAL; goto out; } @@ -1867,9 +1868,10 @@ int ntfs_security_init(struct ntfs_sb_info *sbi) goto out; }
- root_sii = resident_data(attr); + root_sii = resident_data_ex(attr, sizeof(struct INDEX_ROOT)); if (root_sii->type != ATTR_ZERO || - root_sii->rule != NTFS_COLLATION_TYPE_UINT) { + root_sii->rule != NTFS_COLLATION_TYPE_UINT || + offsetof(struct INDEX_ROOT, ihdr) + root_sii->ihdr.used > attr->res.data_size) { err = -EINVAL; goto out; }
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 0d0f659bf713662fabed973f9996b8f23c59ca51 ]
syzbot is reporting too large allocation at wnd_init() [1], for a crafted filesystem can become wnd->nwnd close to UINT_MAX. Add __GFP_NOWARN in order to avoid too large allocation warning, than exhausting memory by using kvcalloc().
Link: https://syzkaller.appspot.com/bug?extid=fa4648a5446460b7b963 [1] Reported-by: syzot syzbot+fa4648a5446460b7b963@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/bitmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/bitmap.c b/fs/ntfs3/bitmap.c index 087282cb130b..bb29bc1782fb 100644 --- a/fs/ntfs3/bitmap.c +++ b/fs/ntfs3/bitmap.c @@ -661,7 +661,7 @@ int wnd_init(struct wnd_bitmap *wnd, struct super_block *sb, size_t nbits) if (!wnd->bits_last) wnd->bits_last = wbits;
- wnd->free_bits = kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS); + wnd->free_bits = kcalloc(wnd->nwnd, sizeof(u16), GFP_NOFS | __GFP_NOWARN); if (!wnd->free_bits) return -ENOMEM;
From: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp
[ Upstream commit 59bfd7a483da36bd202532a3d9ea1f14f3bf3aaf ]
syzbot is reporting too large allocation at ntfs_fill_super() [1], for a crafted filesystem can contain bogus inode->i_size. Add __GFP_NOWARN in order to avoid too large allocation warning, than exhausting memory by using kvmalloc().
Link: https://syzkaller.appspot.com/bug?extid=33f3faaa0c08744f7d40 [1] Reported-by: syzot syzbot+33f3faaa0c08744f7d40@syzkaller.appspotmail.com Signed-off-by: Tetsuo Handa penguin-kernel@I-love.SAKURA.ne.jp Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/super.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ntfs3/super.c b/fs/ntfs3/super.c index 94f9e4b775a7..8e2fe0f69203 100644 --- a/fs/ntfs3/super.c +++ b/fs/ntfs3/super.c @@ -1141,7 +1141,7 @@ static int ntfs_fill_super(struct super_block *sb, struct fs_context *fc) goto put_inode_out; } bytes = inode->i_size; - sbi->def_table = t = kmalloc(bytes, GFP_NOFS); + sbi->def_table = t = kmalloc(bytes, GFP_NOFS | __GFP_NOWARN); if (!t) { err = -ENOMEM; goto put_inode_out;
From: Dan Carpenter dan.carpenter@oracle.com
[ Upstream commit 658015167a8432b88f5d032e9d85d8fd50e5bf2c ]
There were two patches which addressed the same bug and added the same condition:
commit 6db620863f85 ("fs/ntfs3: Validate data run offset") commit 887bfc546097 ("fs/ntfs3: Fix slab-out-of-bounds read in run_unpack")
Delete one condition.
Signed-off-by: Dan Carpenter dan.carpenter@oracle.com Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/inode.c | 6 ------ 1 file changed, 6 deletions(-)
diff --git a/fs/ntfs3/inode.c b/fs/ntfs3/inode.c index ba2005c12ee3..471ea4d813ad 100644 --- a/fs/ntfs3/inode.c +++ b/fs/ntfs3/inode.c @@ -374,12 +374,6 @@ static struct inode *ntfs_read_mft(struct inode *inode,
t64 = le64_to_cpu(attr->nres.svcn);
- /* offset to packed runs is out-of-bounds */ - if (roff > asize) { - err = -EINVAL; - goto out; - } - err = run_unpack_ex(run, sbi, ino, t64, le64_to_cpu(attr->nres.evcn), t64, Add2Ptr(attr, roff), asize - roff); if (err < 0)
From: Yin Xiujiang yinxiujiang@kylinos.cn
[ Upstream commit ecfbd57cf9c5ca225184ae266ce44ae473792132 ]
When PAGE_SIZE is 64K, if read_log_page is called by log_read_rst for the first time, the size of *buffer would be equal to DefaultLogPageSize(4K).But for *buffer operations like memcpy, if the memory area size(n) which being assigned to buffer is larger than 4K (log->page_size(64K) or bytes(64K-page_off)), it will cause an out of boundary error. Call trace: [...] kasan_report+0x44/0x130 check_memory_region+0xf8/0x1a0 memcpy+0xc8/0x100 ntfs_read_run_nb+0x20c/0x460 read_log_page+0xd0/0x1f4 log_read_rst+0x110/0x75c log_replay+0x1e8/0x4aa0 ntfs_loadlog_and_replay+0x290/0x2d0 ntfs_fill_super+0x508/0xec0 get_tree_bdev+0x1fc/0x34c [...]
Fix this by setting variable r_page to NULL in log_read_rst.
Signed-off-by: Yin Xiujiang yinxiujiang@kylinos.cn Signed-off-by: Konstantin Komarov almaz.alexandrovich@paragon-software.com Signed-off-by: Sasha Levin sashal@kernel.org --- fs/ntfs3/fslog.c | 26 ++------------------------ 1 file changed, 2 insertions(+), 24 deletions(-)
diff --git a/fs/ntfs3/fslog.c b/fs/ntfs3/fslog.c index 138050b1aa93..4236194af35e 100644 --- a/fs/ntfs3/fslog.c +++ b/fs/ntfs3/fslog.c @@ -1132,7 +1132,7 @@ static int read_log_page(struct ntfs_log *log, u32 vbo, return -EINVAL;
if (!*buffer) { - to_free = kmalloc(bytes, GFP_NOFS); + to_free = kmalloc(log->page_size, GFP_NOFS); if (!to_free) return -ENOMEM; *buffer = to_free; @@ -1180,10 +1180,7 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, struct restart_info *info) { u32 skip, vbo; - struct RESTART_HDR *r_page = kmalloc(DefaultLogPageSize, GFP_NOFS); - - if (!r_page) - return -ENOMEM; + struct RESTART_HDR *r_page = NULL;
/* Determine which restart area we are looking for. */ if (first) { @@ -1197,7 +1194,6 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, /* Loop continuously until we succeed. */ for (; vbo < l_size; vbo = 2 * vbo + skip, skip = 0) { bool usa_error; - u32 sys_page_size; bool brst, bchk; struct RESTART_AREA *ra;
@@ -1251,24 +1247,6 @@ static int log_read_rst(struct ntfs_log *log, u32 l_size, bool first, goto check_result; }
- /* Read the entire restart area. */ - sys_page_size = le32_to_cpu(r_page->sys_page_size); - if (DefaultLogPageSize != sys_page_size) { - kfree(r_page); - r_page = kzalloc(sys_page_size, GFP_NOFS); - if (!r_page) - return -ENOMEM; - - if (read_log_page(log, vbo, - (struct RECORD_PAGE_HDR **)&r_page, - &usa_error)) { - /* Ignore any errors. */ - kfree(r_page); - r_page = NULL; - continue; - } - } - if (is_client_area_valid(r_page, usa_error)) { info->valid_page = true; ra = Add2Ptr(r_page, le16_to_cpu(r_page->ra_off));
From: Christophe Leroy christophe.leroy@csgroup.eu
[ Upstream commit efb11fdb3e1a9f694fa12b70b21e69e55ec59c36 ]
find_insn() will return NULL in case of failure. Check insn in order to avoid a kernel Oops for NULL pointer dereference.
Tested-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Reviewed-by: Naveen N. Rao naveen.n.rao@linux.vnet.ibm.com Acked-by: Josh Poimboeuf jpoimboe@kernel.org Acked-by: Peter Zijlstra (Intel) peterz@infradead.org Signed-off-by: Christophe Leroy christophe.leroy@csgroup.eu Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221114175754.1131267-9-sv@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- tools/objtool/check.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tools/objtool/check.c b/tools/objtool/check.c index 67afdce3421f..274bc3d4bc3f 100644 --- a/tools/objtool/check.c +++ b/tools/objtool/check.c @@ -207,7 +207,7 @@ static bool __dead_end_function(struct objtool_file *file, struct symbol *func, return false;
insn = find_insn(file, func->sec, func->offset); - if (!insn->func) + if (!insn || !insn->func) return false;
func_for_each_insn(file, func, insn) {
From: Ricardo Ribalda ribalda@chromium.org
[ Upstream commit 00ef8885a945c37551547d8ac8361cacd20c4e42 ]
If the system is rebooted via isr(), the IRQ handler might be triggered before the domain is initialized. Resulting on an invalid memory access error.
Fix: [ 0.500930] Unable to handle kernel read from unreadable memory at virtual address 0000000000000070 [ 0.501166] Call trace: [ 0.501174] report_iommu_fault+0x28/0xfc [ 0.501180] mtk_iommu_isr+0x10c/0x1c0
Signed-off-by: Ricardo Ribalda ribalda@chromium.org Reviewed-by: AngeloGioacchino Del Regno angelogioacchino.delregno@collabora.com Reviewed-by: Robin Murphy robin.murphy@arm.com Link: https://lore.kernel.org/r/20221125-mtk-iommu-v2-0-e168dff7d43e@chromium.org [ joro: Fixed spelling in commit message ] Signed-off-by: Joerg Roedel jroedel@suse.de Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/iommu/mtk_iommu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/iommu/mtk_iommu.c b/drivers/iommu/mtk_iommu.c index ec73720e239b..15fa380b1f84 100644 --- a/drivers/iommu/mtk_iommu.c +++ b/drivers/iommu/mtk_iommu.c @@ -452,7 +452,7 @@ static irqreturn_t mtk_iommu_isr(int irq, void *dev_id) fault_larb = data->plat_data->larbid_remap[fault_larb][sub_comm]; }
- if (report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova, + if (!dom || report_iommu_fault(&dom->domain, bank->parent_dev, fault_iova, write ? IOMMU_FAULT_WRITE : IOMMU_FAULT_READ)) { dev_err_ratelimited( bank->parent_dev,
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit ed2213bfb192ab51f09f12e9b49b5d482c6493f3 ]
rtas_os_term() is called during panic. Its behavior depends on a couple of conditions in the /rtas node of the device tree, the traversal of which entails locking and local IRQ state changes. If the kernel panics while devtree_lock is held, rtas_os_term() as currently written could hang.
Instead of discovering the relevant characteristics at panic time, cache them in file-static variables at boot. Note the lookup for "ibm,extended-os-term" is converted to of_property_read_bool() since it is a boolean property, not an RTAS function token.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Andrew Donnellan ajd@linux.ibm.com [mpe: Incorporate suggested change from Nick] Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221118150751.469393-4-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 0b8a858aa847..082b0c9e0c8a 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -875,6 +875,7 @@ void __noreturn rtas_halt(void)
/* Must be in the RMO region, so we place it here */ static char rtas_os_term_buf[2048]; +static s32 ibm_os_term_token = RTAS_UNKNOWN_SERVICE;
void rtas_os_term(char *str) { @@ -886,14 +887,13 @@ void rtas_os_term(char *str) * this property may terminate the partition which we want to avoid * since it interferes with panic_timeout. */ - if (RTAS_UNKNOWN_SERVICE == rtas_token("ibm,os-term") || - RTAS_UNKNOWN_SERVICE == rtas_token("ibm,extended-os-term")) + if (ibm_os_term_token == RTAS_UNKNOWN_SERVICE) return;
snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
do { - status = rtas_call(rtas_token("ibm,os-term"), 1, 1, NULL, + status = rtas_call(ibm_os_term_token, 1, 1, NULL, __pa(rtas_os_term_buf)); } while (rtas_busy_delay(status));
@@ -1255,6 +1255,13 @@ void __init rtas_initialize(void) no_entry = of_property_read_u32(rtas.dev, "linux,rtas-entry", &entry); rtas.entry = no_entry ? rtas.base : entry;
+ /* + * Discover these now to avoid device tree lookups in the + * panic path. + */ + if (of_property_read_bool(rtas.dev, "ibm,extended-os-term")) + ibm_os_term_token = rtas_token("ibm,os-term"); + /* If RTAS was found, allocate the RMO buffer for it and look for * the stop-self token if any */
From: Nathan Lynch nathanl@linux.ibm.com
[ Upstream commit 6c606e57eecc37d6b36d732b1ff7e55b7dc32dd4 ]
It's unsafe to use rtas_busy_delay() to handle a busy status from the ibm,os-term RTAS function in rtas_os_term():
Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b BUG: sleeping function called from invalid context at arch/powerpc/kernel/rtas.c:618 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 1, name: swapper/0 preempt_count: 2, expected: 0 CPU: 7 PID: 1 Comm: swapper/0 Tainted: G D 6.0.0-rc5-02182-gf8553a572277-dirty #9 Call Trace: [c000000007b8f000] [c000000001337110] dump_stack_lvl+0xb4/0x110 (unreliable) [c000000007b8f040] [c0000000002440e4] __might_resched+0x394/0x3c0 [c000000007b8f0e0] [c00000000004f680] rtas_busy_delay+0x120/0x1b0 [c000000007b8f100] [c000000000052d04] rtas_os_term+0xb8/0xf4 [c000000007b8f180] [c0000000001150fc] pseries_panic+0x50/0x68 [c000000007b8f1f0] [c000000000036354] ppc_panic_platform_handler+0x34/0x50 [c000000007b8f210] [c0000000002303c4] notifier_call_chain+0xd4/0x1c0 [c000000007b8f2b0] [c0000000002306cc] atomic_notifier_call_chain+0xac/0x1c0 [c000000007b8f2f0] [c0000000001d62b8] panic+0x228/0x4d0 [c000000007b8f390] [c0000000001e573c] do_exit+0x140c/0x1420 [c000000007b8f480] [c0000000001e586c] make_task_dead+0xdc/0x200
Use rtas_busy_delay_time() instead, which signals without side effects whether to attempt the ibm,os-term RTAS call again.
Signed-off-by: Nathan Lynch nathanl@linux.ibm.com Reviewed-by: Nicholas Piggin npiggin@gmail.com Reviewed-by: Andrew Donnellan ajd@linux.ibm.com Signed-off-by: Michael Ellerman mpe@ellerman.id.au Link: https://lore.kernel.org/r/20221118150751.469393-5-nathanl@linux.ibm.com Signed-off-by: Sasha Levin sashal@kernel.org --- arch/powerpc/kernel/rtas.c | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-)
diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c index 082b0c9e0c8a..5a873d4fbd7b 100644 --- a/arch/powerpc/kernel/rtas.c +++ b/arch/powerpc/kernel/rtas.c @@ -892,10 +892,15 @@ void rtas_os_term(char *str)
snprintf(rtas_os_term_buf, 2048, "OS panic: %s", str);
+ /* + * Keep calling as long as RTAS returns a "try again" status, + * but don't use rtas_busy_delay(), which potentially + * schedules. + */ do { status = rtas_call(ibm_os_term_token, 1, 1, NULL, __pa(rtas_os_term_buf)); - } while (rtas_busy_delay(status)); + } while (rtas_busy_delay_time(status));
if (status != 0) printk(KERN_EMERG "ibm,os-term call failed %d\n", status);
From: Kees Cook keescook@chromium.org
[ Upstream commit 21b8a1dd56a163825e5749b303858fb902ebf198 ]
With clang's kernel control flow integrity (kCFI, CONFIG_CFI_CLANG), indirect call targets are validated against the expected function pointer prototype to make sure the call target is valid to help mitigate ROP attacks. If they are not identical, there is a failure at run time, which manifests as either a kernel panic or thread getting killed.
msc313_rtc_probe() was passing clk_disable_unprepare() directly, which did not have matching prototypes for devm_add_action_or_reset()'s callback argument. Refactor to use devm_clk_get_enabled() instead.
This was found as a result of Clang's new -Wcast-function-type-strict flag, which is more sensitive than the simpler -Wcast-function-type, which only checks for type width mismatches.
Reported-by: kernel test robot lkp@intel.com Link: https://lore.kernel.org/lkml/202211041527.HD8TLSE1-lkp@intel.com Suggested-by: Christophe JAILLET christophe.jaillet@wanadoo.fr Cc: Daniel Palmer daniel@thingy.jp Cc: Romain Perier romain.perier@gmail.com Cc: Alessandro Zummo a.zummo@towertech.it Cc: Alexandre Belloni alexandre.belloni@bootlin.com Cc: linux-arm-kernel@lists.infradead.org Cc: linux-rtc@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Reviewed-by: Daniel Palmer daniel@thingy.jp Tested-by: Daniel Palmer daniel@thingy.jp Link: https://lore.kernel.org/r/20221202184525.gonna.423-kees@kernel.org Signed-off-by: Alexandre Belloni alexandre.belloni@bootlin.com Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/rtc/rtc-msc313.c | 12 +----------- 1 file changed, 1 insertion(+), 11 deletions(-)
diff --git a/drivers/rtc/rtc-msc313.c b/drivers/rtc/rtc-msc313.c index f3fde013c4b8..8d7737e0e2e0 100644 --- a/drivers/rtc/rtc-msc313.c +++ b/drivers/rtc/rtc-msc313.c @@ -212,22 +212,12 @@ static int msc313_rtc_probe(struct platform_device *pdev) return ret; }
- clk = devm_clk_get(dev, NULL); + clk = devm_clk_get_enabled(dev, NULL); if (IS_ERR(clk)) { dev_err(dev, "No input reference clock\n"); return PTR_ERR(clk); }
- ret = clk_prepare_enable(clk); - if (ret) { - dev_err(dev, "Failed to enable the reference clock, %d\n", ret); - return ret; - } - - ret = devm_add_action_or_reset(dev, (void (*) (void *))clk_disable_unprepare, clk); - if (ret) - return ret; - rate = clk_get_rate(clk); writew(rate & 0xFFFF, priv->rtc_base + REG_RTC_FREQ_CW_L); writew((rate >> 16) & 0xFFFF, priv->rtc_base + REG_RTC_FREQ_CW_H);
From: wuqiang wuqiang.matt@bytedance.com
[ Upstream commit 3b7ddab8a19aefc768f345fd3782af35b4a68d9b ]
Default value of maxactive is set as num_possible_cpus() for nonpreemptable systems. For a 2-core system, only 2 kretprobe instances would be allocated in default, then these 2 instances for execve kretprobe are very likely to be used up with a pipelined command.
Here's the testcase: a shell script was added to crontab, and the content of the script is:
#!/bin/sh do_something_magic `tr -dc a-z < /dev/urandom | head -c 10`
cron will trigger a series of program executions (4 times every hour). Then events loss would be noticed normally after 3-4 hours of testings.
The issue is caused by a burst of series of execve requests. The best number of kretprobe instances could be different case by case, and should be user's duty to determine, but num_possible_cpus() as the default value is inadequate especially for systems with small number of cpus.
This patch enables the logic for preemption as default, thus increases the minimum of maxactive to 10 for nonpreemptable systems.
Link: https://lore.kernel.org/all/20221110081502.492289-1-wuqiang.matt@bytedance.c...
Signed-off-by: wuqiang wuqiang.matt@bytedance.com Reviewed-by: Solar Designer solar@openwall.com Acked-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Masami Hiramatsu (Google) mhiramat@kernel.org Signed-off-by: Sasha Levin sashal@kernel.org --- Documentation/trace/kprobes.rst | 3 +-- kernel/kprobes.c | 8 ++------ 2 files changed, 3 insertions(+), 8 deletions(-)
diff --git a/Documentation/trace/kprobes.rst b/Documentation/trace/kprobes.rst index f318bceda1e6..97d086b23ce8 100644 --- a/Documentation/trace/kprobes.rst +++ b/Documentation/trace/kprobes.rst @@ -131,8 +131,7 @@ For example, if the function is non-recursive and is called with a spinlock held, maxactive = 1 should be enough. If the function is non-recursive and can never relinquish the CPU (e.g., via a semaphore or preemption), NR_CPUS should be enough. If maxactive <= 0, it is -set to a default value. If CONFIG_PREEMPT is enabled, the default -is max(10, 2*NR_CPUS). Otherwise, the default is NR_CPUS. +set to a default value: max(10, 2*NR_CPUS).
It's not a disaster if you set maxactive too low; you'll just miss some probes. In the kretprobe struct, the nmissed field is set to diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 771fcce54fac..fb88278978fe 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -2209,13 +2209,9 @@ int register_kretprobe(struct kretprobe *rp) rp->kp.post_handler = NULL;
/* Pre-allocate memory for max kretprobe instances */ - if (rp->maxactive <= 0) { -#ifdef CONFIG_PREEMPTION + if (rp->maxactive <= 0) rp->maxactive = max_t(unsigned int, 10, 2*num_possible_cpus()); -#else - rp->maxactive = num_possible_cpus(); -#endif - } + #ifdef CONFIG_KRETPROBE_ON_RETHOOK rp->rh = rethook_alloc((void *)rp, kretprobe_rethook_handler); if (!rp->rh)
From: José Expósito jose.exposito89@gmail.com
[ Upstream commit 4eab1c2fe06c98a4dff258dd64800b6986c101e9 ]
The HID descriptor of this device contains two mouse collections, one for mouse emulation and the other for the trackpoint.
Both collections get merged and, because the first one defines X and Y, the movemenent events reported by the trackpoint collection are ignored.
Set the MT_CLS_WIN_8_FORCE_MULTI_INPUT class for this device to be able to receive its reports.
This fix is similar to/based on commit 40d5bb87377a ("HID: multitouch: enable multi-input as a quirk for some devices").
Link: https://gitlab.freedesktop.org/libinput/libinput/-/issues/825 Reported-by: Akito the@akito.ooo Tested-by: Akito the@akito.ooo Signed-off-by: José Expósito jose.exposito89@gmail.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-multitouch.c | 4 ++++ 1 file changed, 4 insertions(+)
diff --git a/drivers/hid/hid-multitouch.c b/drivers/hid/hid-multitouch.c index 91a4d3fc30e0..372cbdd223e0 100644 --- a/drivers/hid/hid-multitouch.c +++ b/drivers/hid/hid-multitouch.c @@ -1967,6 +1967,10 @@ static const struct hid_device_id mt_devices[] = { HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, USB_VENDOR_ID_ELAN, 0x313a) },
+ { .driver_data = MT_CLS_WIN_8_FORCE_MULTI_INPUT, + HID_DEVICE(BUS_I2C, HID_GROUP_MULTITOUCH_WIN_8, + USB_VENDOR_ID_ELAN, 0x3148) }, + /* Elitegroup panel */ { .driver_data = MT_CLS_SERIAL, MT_USB_DEVICE(USB_VENDOR_ID_ELITEGROUP,
From: Terry Junge linuxhid@cosmicgizmosystems.com
[ Upstream commit 3d57f36c89d8ba32b2c312f397a37fd1a2dc7cfc ]
I no longer work for Plantronics (aka Poly, aka HP) and do not have access to the headsets in order to test. However, as noted by Maxim, the other 32xx models that share the same base code set as the 3220 would need the same quirk. This patch adds the PIDs for the rest of the Blackwire 32XX product family that require the quirk.
Plantronics Blackwire 3210 Series (047f:c055) Plantronics Blackwire 3215 Series (047f:c057) Plantronics Blackwire 3225 Series (047f:c058)
Quote from previous patch by Maxim Mikityanskiy Plantronics Blackwire 3220 Series (047f:c056) sends HID reports twice for each volume key press. This patch adds a quirk to hid-plantronics for this product ID, which will ignore the second volume key press if it happens within 5 ms from the last one that was handled.
The patch was tested on the mentioned model only, it shouldn't affect other models, however, this quirk might be needed for them too. Auto-repeat (when a key is held pressed) is not affected, because the rate is about 3 times per second, which is far less frequent than once in 5 ms. End quote
Signed-off-by: Terry Junge linuxhid@cosmicgizmosystems.com Signed-off-by: Jiri Kosina jkosina@suse.cz Signed-off-by: Sasha Levin sashal@kernel.org --- drivers/hid/hid-ids.h | 3 +++ drivers/hid/hid-plantronics.c | 9 +++++++++ 2 files changed, 12 insertions(+)
diff --git a/drivers/hid/hid-ids.h b/drivers/hid/hid-ids.h index 86e754b9400f..5680543e97fd 100644 --- a/drivers/hid/hid-ids.h +++ b/drivers/hid/hid-ids.h @@ -995,7 +995,10 @@ #define USB_DEVICE_ID_ORTEK_IHOME_IMAC_A210S 0x8003
#define USB_VENDOR_ID_PLANTRONICS 0x047f +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3210_SERIES 0xc055 #define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3220_SERIES 0xc056 +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3215_SERIES 0xc057 +#define USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3225_SERIES 0xc058
#define USB_VENDOR_ID_PANASONIC 0x04da #define USB_DEVICE_ID_PANABOARD_UBT780 0x1044 diff --git a/drivers/hid/hid-plantronics.c b/drivers/hid/hid-plantronics.c index e81b7cec2d12..3d414ae194ac 100644 --- a/drivers/hid/hid-plantronics.c +++ b/drivers/hid/hid-plantronics.c @@ -198,9 +198,18 @@ static int plantronics_probe(struct hid_device *hdev, }
static const struct hid_device_id plantronics_devices[] = { + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3210_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3220_SERIES), .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3215_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, + { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, + USB_DEVICE_ID_PLANTRONICS_BLACKWIRE_3225_SERIES), + .driver_data = PLT_QUIRK_DOUBLE_VOLUME_KEYS }, { HID_USB_DEVICE(USB_VENDOR_ID_PLANTRONICS, HID_ANY_ID) }, { } };
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
commit 94cd8fa09f5f1ebdd4e90964b08b7f2cc4b36c43 upstream.
In a scenario where kcalloc() fails to allocate memory, the futex_waitv system call immediately returns -ENOMEM without invoking destroy_hrtimer_on_stack(). When CONFIG_DEBUG_OBJECTS_TIMERS=y, this results in leaking a timer debug object.
Fixes: bf69bad38cf6 ("futex: Implement sys_futex_waitv()") Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Signed-off-by: Peter Zijlstra (Intel) peterz@infradead.org Reviewed-by: Davidlohr Bueso dave@stgolabs.net Cc: stable@vger.kernel.org Cc: stable@vger.kernel.org # v5.16+ Link: https://lore.kernel.org/r/20221214222008.200393-1-mathieu.desnoyers@efficios... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/futex/syscalls.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-)
--- a/kernel/futex/syscalls.c +++ b/kernel/futex/syscalls.c @@ -286,19 +286,22 @@ SYSCALL_DEFINE5(futex_waitv, struct fute }
futexv = kcalloc(nr_futexes, sizeof(*futexv), GFP_KERNEL); - if (!futexv) - return -ENOMEM; + if (!futexv) { + ret = -ENOMEM; + goto destroy_timer; + }
ret = futex_parse_waitv(futexv, waiters, nr_futexes); if (!ret) ret = futex_wait_multiple(futexv, nr_futexes, timeout ? &to : NULL);
+ kfree(futexv); + +destroy_timer: if (timeout) { hrtimer_cancel(&to.timer); destroy_hrtimer_on_stack(&to.timer); } - - kfree(futexv); return ret; }
From: Mel Gorman mgorman@techsingularity.net
commit 1c0908d8e441631f5b8ba433523cf39339ee2ba0 upstream.
Jan Kara reported the following bug triggering on 6.0.5-rt14 running dbench on XFS on arm64.
kernel BUG at fs/inode.c:625! Internal error: Oops - BUG: 0 [#1] PREEMPT_RT SMP CPU: 11 PID: 6611 Comm: dbench Tainted: G E 6.0.0-rt14-rt+ #1 pc : clear_inode+0xa0/0xc0 lr : clear_inode+0x38/0xc0 Call trace: clear_inode+0xa0/0xc0 evict+0x160/0x180 iput+0x154/0x240 do_unlinkat+0x184/0x300 __arm64_sys_unlinkat+0x48/0xc0 el0_svc_common.constprop.4+0xe4/0x2c0 do_el0_svc+0xac/0x100 el0_svc+0x78/0x200 el0t_64_sync_handler+0x9c/0xc0 el0t_64_sync+0x19c/0x1a0
It also affects 6.1-rc7-rt5 and affects a preempt-rt fork of 5.14 so this is likely a bug that existed forever and only became visible when ARM support was added to preempt-rt. The same problem does not occur on x86-64 and he also reported that converting sb->s_inode_wblist_lock to raw_spinlock_t makes the problem disappear indicating that the RT spinlock variant is the problem.
Which in turn means that RT mutexes on ARM64 and any other weakly ordered architecture are affected by this independent of RT.
Will Deacon observed:
"I'd be more inclined to be suspicious of the slowpath tbh, as we need to make sure that we have acquire semantics on all paths where the lock can be taken. Looking at the rtmutex code, this really isn't obvious to me -- for example, try_to_take_rt_mutex() appears to be able to return via the 'takeit' label without acquire semantics and it looks like we might be relying on the caller's subsequent _unlock_ of the wait_lock for ordering, but that will give us release semantics which aren't correct."
Sebastian Andrzej Siewior prototyped a fix that does work based on that comment but it was a little bit overkill and added some fences that should not be necessary.
The lock owner is updated with an IRQ-safe raw spinlock held, but the spin_unlock does not provide acquire semantics which are needed when acquiring a mutex.
Adds the necessary acquire semantics for lock owner updates in the slow path acquisition and the waiter bit logic.
It successfully completed 10 iterations of the dbench workload while the vanilla kernel fails on the first iteration.
[ bigeasy@linutronix.de: Initial prototype fix ]
Fixes: 700318d1d7b38 ("locking/rtmutex: Use acquire/release semantics") Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core") Reported-by: Jan Kara jack@suse.cz Signed-off-by: Mel Gorman mgorman@techsingularity.net Signed-off-by: Thomas Gleixner tglx@linutronix.de Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221202100223.6mevpbl7i6x5udfd@techsingularity.ne... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/locking/rtmutex.c | 55 +++++++++++++++++++++++++++++++++++-------- kernel/locking/rtmutex_api.c | 6 ++-- 2 files changed, 49 insertions(+), 12 deletions(-)
--- a/kernel/locking/rtmutex.c +++ b/kernel/locking/rtmutex.c @@ -89,15 +89,31 @@ static inline int __ww_mutex_check_kill( * set this bit before looking at the lock. */
-static __always_inline void -rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner) +static __always_inline struct task_struct * +rt_mutex_owner_encode(struct rt_mutex_base *lock, struct task_struct *owner) { unsigned long val = (unsigned long)owner;
if (rt_mutex_has_waiters(lock)) val |= RT_MUTEX_HAS_WAITERS;
- WRITE_ONCE(lock->owner, (struct task_struct *)val); + return (struct task_struct *)val; +} + +static __always_inline void +rt_mutex_set_owner(struct rt_mutex_base *lock, struct task_struct *owner) +{ + /* + * lock->wait_lock is held but explicit acquire semantics are needed + * for a new lock owner so WRITE_ONCE is insufficient. + */ + xchg_acquire(&lock->owner, rt_mutex_owner_encode(lock, owner)); +} + +static __always_inline void rt_mutex_clear_owner(struct rt_mutex_base *lock) +{ + /* lock->wait_lock is held so the unlock provides release semantics. */ + WRITE_ONCE(lock->owner, rt_mutex_owner_encode(lock, NULL)); }
static __always_inline void clear_rt_mutex_waiters(struct rt_mutex_base *lock) @@ -106,7 +122,8 @@ static __always_inline void clear_rt_mut ((unsigned long)lock->owner & ~RT_MUTEX_HAS_WAITERS); }
-static __always_inline void fixup_rt_mutex_waiters(struct rt_mutex_base *lock) +static __always_inline void +fixup_rt_mutex_waiters(struct rt_mutex_base *lock, bool acquire_lock) { unsigned long owner, *p = (unsigned long *) &lock->owner;
@@ -172,8 +189,21 @@ static __always_inline void fixup_rt_mut * still set. */ owner = READ_ONCE(*p); - if (owner & RT_MUTEX_HAS_WAITERS) - WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS); + if (owner & RT_MUTEX_HAS_WAITERS) { + /* + * See rt_mutex_set_owner() and rt_mutex_clear_owner() on + * why xchg_acquire() is used for updating owner for + * locking and WRITE_ONCE() for unlocking. + * + * WRITE_ONCE() would work for the acquire case too, but + * in case that the lock acquisition failed it might + * force other lockers into the slow path unnecessarily. + */ + if (acquire_lock) + xchg_acquire(p, owner & ~RT_MUTEX_HAS_WAITERS); + else + WRITE_ONCE(*p, owner & ~RT_MUTEX_HAS_WAITERS); + } }
/* @@ -208,6 +238,13 @@ static __always_inline void mark_rt_mute owner = *p; } while (cmpxchg_relaxed(p, owner, owner | RT_MUTEX_HAS_WAITERS) != owner); + + /* + * The cmpxchg loop above is relaxed to avoid back-to-back ACQUIRE + * operations in the event of contention. Ensure the successful + * cmpxchg is visible. + */ + smp_mb__after_atomic(); }
/* @@ -1243,7 +1280,7 @@ static int __sched __rt_mutex_slowtryloc * try_to_take_rt_mutex() sets the lock waiters bit * unconditionally. Clean this up. */ - fixup_rt_mutex_waiters(lock); + fixup_rt_mutex_waiters(lock, true);
return ret; } @@ -1604,7 +1641,7 @@ static int __sched __rt_mutex_slowlock(s * try_to_take_rt_mutex() sets the waiter bit * unconditionally. We might have to fix that up. */ - fixup_rt_mutex_waiters(lock); + fixup_rt_mutex_waiters(lock, true);
trace_contention_end(lock, ret);
@@ -1719,7 +1756,7 @@ static void __sched rtlock_slowlock_lock * try_to_take_rt_mutex() sets the waiter bit unconditionally. * We might have to fix that up: */ - fixup_rt_mutex_waiters(lock); + fixup_rt_mutex_waiters(lock, true); debug_rt_mutex_free_waiter(&waiter);
trace_contention_end(lock, 0); --- a/kernel/locking/rtmutex_api.c +++ b/kernel/locking/rtmutex_api.c @@ -267,7 +267,7 @@ void __sched rt_mutex_init_proxy_locked( void __sched rt_mutex_proxy_unlock(struct rt_mutex_base *lock) { debug_rt_mutex_proxy_unlock(lock); - rt_mutex_set_owner(lock, NULL); + rt_mutex_clear_owner(lock); }
/** @@ -382,7 +382,7 @@ int __sched rt_mutex_wait_proxy_lock(str * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might * have to fix that up. */ - fixup_rt_mutex_waiters(lock); + fixup_rt_mutex_waiters(lock, true); raw_spin_unlock_irq(&lock->wait_lock);
return ret; @@ -438,7 +438,7 @@ bool __sched rt_mutex_cleanup_proxy_lock * try_to_take_rt_mutex() sets the waiter bit unconditionally. We might * have to fix that up. */ - fixup_rt_mutex_waiters(lock); + fixup_rt_mutex_waiters(lock, false);
raw_spin_unlock_irq(&lock->wait_lock);
From: Mathieu Desnoyers mathieu.desnoyers@efficios.com
commit 38ce7c9bdfc228c14d7621ba36d3eebedd9d4f76 upstream.
When encountering any vma in the range with policy other than MPOL_BIND or MPOL_PREFERRED_MANY, an error is returned without issuing a mpol_put on the policy just allocated with mpol_dup().
This allows arbitrary users to leak kernel memory.
Link: https://lkml.kernel.org/r/20221215194621.202816-1-mathieu.desnoyers@efficios... Fixes: c6018b4b2549 ("mm/mempolicy: add set_mempolicy_home_node syscall") Signed-off-by: Mathieu Desnoyers mathieu.desnoyers@efficios.com Reviewed-by: Randy Dunlap rdunlap@infradead.org Reviewed-by: "Huang, Ying" ying.huang@intel.com Reviewed-by: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Acked-by: Michal Hocko mhocko@suse.com Cc: Aneesh Kumar K.V aneesh.kumar@linux.ibm.com Cc: Dave Hansen dave.hansen@linux.intel.com Cc: Feng Tang feng.tang@intel.com Cc: Michal Hocko mhocko@kernel.org Cc: Andrea Arcangeli aarcange@redhat.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Mike Kravetz mike.kravetz@oracle.com Cc: Randy Dunlap rdunlap@infradead.org Cc: Vlastimil Babka vbabka@suse.cz Cc: Andi Kleen ak@linux.intel.com Cc: Dan Williams dan.j.williams@intel.com Cc: Huang Ying ying.huang@intel.com Cc: stable@vger.kernel.org [5.17+] Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/mempolicy.c | 1 + 1 file changed, 1 insertion(+)
--- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1525,6 +1525,7 @@ SYSCALL_DEFINE4(set_mempolicy_home_node, * the home node for vmas we already updated before. */ if (new->mode != MPOL_BIND && new->mode != MPOL_PREFERRED_MANY) { + mpol_put(new); err = -EOPNOTSUPP; break; }
From: Luca Stefani luca@osomprivacy.com
commit beca3e311a49cd3c55a056096531737d7afa4361 upstream.
If mem-type is specified in the device tree it would end up overriding the record_size field instead of populating mem_type.
As record_size is currently parsed after the improper assignment with default size 0 it continued to work as expected regardless of the value found in the device tree.
Simply changing the target field of the struct is enough to get mem-type working as expected.
Fixes: 9d843e8fafc7 ("pstore: Add mem_type property DT parsing support") Cc: stable@vger.kernel.org Signed-off-by: Luca Stefani luca@osomprivacy.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/20221222131049.286288-1-luca@osomprivacy.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pstore/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -670,7 +670,7 @@ static int ramoops_parse_dt(struct platf field = value; \ }
- parse_u32("mem-type", pdata->record_size, pdata->mem_type); + parse_u32("mem-type", pdata->mem_type, pdata->mem_type); parse_u32("record-size", pdata->record_size, 0); parse_u32("console-size", pdata->console_size, 0); parse_u32("ftrace-size", pdata->ftrace_size, 0);
From: Qiujun Huang hqjagain@gmail.com
commit 99b3b837855b987563bcfb397cf9ddd88262814b upstream.
There is a case found when triggering a panic_on_oom, pstore fails to dump kmsg. Because psz_kmsg_write_record can't get the new buffer.
Handle this by using GFP_ATOMIC to allocate a buffer at lower watermark.
Signed-off-by: Qiujun Huang hqjagain@gmail.com Fixes: 335426c6dcdd ("pstore/zone: Provide way to skip "broken" zone for MTD devices") Cc: WeiXiong Liao gmpy.liaowx@gmail.com Cc: stable@vger.kernel.org Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/CAJRQjofRCF7wjrYmw3D7zd5QZnwHQq+F8U-mJDJ6NZ4bddYdL... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/pstore/zone.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pstore/zone.c +++ b/fs/pstore/zone.c @@ -761,7 +761,7 @@ static inline int notrace psz_kmsg_write /* avoid destroying old data, allocate a new one */ len = zone->buffer_size + sizeof(*zone->buffer); zone->oldbuf = zone->buffer; - zone->buffer = kzalloc(len, GFP_KERNEL); + zone->buffer = kzalloc(len, GFP_ATOMIC); if (!zone->buffer) { zone->buffer = zone->oldbuf; return -ENOMEM;
From: Aditya Garg gargaditya08@live.com
commit 9f2b5debc07073e6dfdd774e3594d0224b991927 upstream.
Despite specifying UID and GID in mount command, the specified UID and GID were not being assigned. This patch fixes this issue.
Link: https://lkml.kernel.org/r/C0264BF5-059C-45CF-B8DA-3A3BD2C803A2@live.com Signed-off-by: Aditya Garg gargaditya08@live.com Reviewed-by: Viacheslav Dubeyko slava@dubeyko.com Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/hfsplus/hfsplus_fs.h | 2 ++ fs/hfsplus/inode.c | 4 ++-- fs/hfsplus/options.c | 4 ++++ 3 files changed, 8 insertions(+), 2 deletions(-)
--- a/fs/hfsplus/hfsplus_fs.h +++ b/fs/hfsplus/hfsplus_fs.h @@ -198,6 +198,8 @@ struct hfsplus_sb_info { #define HFSPLUS_SB_HFSX 3 #define HFSPLUS_SB_CASEFOLD 4 #define HFSPLUS_SB_NOBARRIER 5 +#define HFSPLUS_SB_UID 6 +#define HFSPLUS_SB_GID 7
static inline struct hfsplus_sb_info *HFSPLUS_SB(struct super_block *sb) { --- a/fs/hfsplus/inode.c +++ b/fs/hfsplus/inode.c @@ -192,11 +192,11 @@ static void hfsplus_get_perms(struct ino mode = be16_to_cpu(perms->mode);
i_uid_write(inode, be32_to_cpu(perms->owner)); - if (!i_uid_read(inode) && !mode) + if ((test_bit(HFSPLUS_SB_UID, &sbi->flags)) || (!i_uid_read(inode) && !mode)) inode->i_uid = sbi->uid;
i_gid_write(inode, be32_to_cpu(perms->group)); - if (!i_gid_read(inode) && !mode) + if ((test_bit(HFSPLUS_SB_GID, &sbi->flags)) || (!i_gid_read(inode) && !mode)) inode->i_gid = sbi->gid;
if (dir) { --- a/fs/hfsplus/options.c +++ b/fs/hfsplus/options.c @@ -140,6 +140,8 @@ int hfsplus_parse_options(char *input, s if (!uid_valid(sbi->uid)) { pr_err("invalid uid specified\n"); return 0; + } else { + set_bit(HFSPLUS_SB_UID, &sbi->flags); } break; case opt_gid: @@ -151,6 +153,8 @@ int hfsplus_parse_options(char *input, s if (!gid_valid(sbi->gid)) { pr_err("invalid gid specified\n"); return 0; + } else { + set_bit(HFSPLUS_SB_GID, &sbi->flags); } break; case opt_part:
From: Jens Axboe axboe@kernel.dk
commit caf1aeaffc3b09649a56769e559333ae2c4f1802 upstream.
We can have dependencies between epoll and io_uring. Consider an epoll context, identified by the epfd file descriptor, and an io_uring file descriptor identified by iofd. If we add iofd to the epfd context, and arm a multishot poll request for epfd with iofd, then the multishot poll request will repeatedly trigger and generate events until terminated by CQ ring overflow. This isn't a desired behavior.
Add EPOLL_URING so that io_uring can pass it in as part of the poll wakeup key, and io_uring can check for that to detect a potential recursive invocation.
Cc: stable@vger.kernel.org # 6.0 Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/eventpoll.c | 18 ++++++++++-------- include/uapi/linux/eventpoll.h | 6 ++++++ 2 files changed, 16 insertions(+), 8 deletions(-)
diff --git a/fs/eventpoll.c b/fs/eventpoll.c index 52954d4637b5..64659b110973 100644 --- a/fs/eventpoll.c +++ b/fs/eventpoll.c @@ -491,7 +491,8 @@ static inline void ep_set_busy_poll_napi_id(struct epitem *epi) */ #ifdef CONFIG_DEBUG_LOCK_ALLOC
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi) +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi, + unsigned pollflags) { struct eventpoll *ep_src; unsigned long flags; @@ -522,16 +523,17 @@ static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi) } spin_lock_irqsave_nested(&ep->poll_wait.lock, flags, nests); ep->nests = nests + 1; - wake_up_locked_poll(&ep->poll_wait, EPOLLIN); + wake_up_locked_poll(&ep->poll_wait, EPOLLIN | pollflags); ep->nests = 0; spin_unlock_irqrestore(&ep->poll_wait.lock, flags); }
#else
-static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi) +static void ep_poll_safewake(struct eventpoll *ep, struct epitem *epi, + unsigned pollflags) { - wake_up_poll(&ep->poll_wait, EPOLLIN); + wake_up_poll(&ep->poll_wait, EPOLLIN | pollflags); }
#endif @@ -742,7 +744,7 @@ static void ep_free(struct eventpoll *ep)
/* We need to release all tasks waiting for these file */ if (waitqueue_active(&ep->poll_wait)) - ep_poll_safewake(ep, NULL); + ep_poll_safewake(ep, NULL, 0);
/* * We need to lock this because we could be hit by @@ -1208,7 +1210,7 @@ out_unlock:
/* We have to call this outside the lock */ if (pwake) - ep_poll_safewake(ep, epi); + ep_poll_safewake(ep, epi, pollflags & EPOLL_URING_WAKE);
if (!(epi->event.events & EPOLLEXCLUSIVE)) ewake = 1; @@ -1553,7 +1555,7 @@ static int ep_insert(struct eventpoll *ep, const struct epoll_event *event,
/* We have to call this outside the lock */ if (pwake) - ep_poll_safewake(ep, NULL); + ep_poll_safewake(ep, NULL, 0);
return 0; } @@ -1629,7 +1631,7 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi,
/* We have to call this outside the lock */ if (pwake) - ep_poll_safewake(ep, NULL); + ep_poll_safewake(ep, NULL, 0);
return 0; } diff --git a/include/uapi/linux/eventpoll.h b/include/uapi/linux/eventpoll.h index 8a3432d0f0dc..e687658843b1 100644 --- a/include/uapi/linux/eventpoll.h +++ b/include/uapi/linux/eventpoll.h @@ -41,6 +41,12 @@ #define EPOLLMSG (__force __poll_t)0x00000400 #define EPOLLRDHUP (__force __poll_t)0x00002000
+/* + * Internal flag - wakeup generated by io_uring, used to detect recursion back + * into the io_uring poll handler. + */ +#define EPOLL_URING_WAKE ((__force __poll_t)(1U << 27)) + /* Set exclusive wakeup mode for the target file descriptor */ #define EPOLLEXCLUSIVE ((__force __poll_t)(1U << 28))
From: Jens Axboe axboe@kernel.dk
commit 03e02acda8e267a8183e1e0ed289ff1ef9cd7ed8 upstream.
This is identical to eventfd_signal(), but it allows the caller to pass in a mask to be used for the poll wakeup key. The use case is avoiding repeated multishot triggers if we have a dependency between eventfd and io_uring.
If we setup an eventfd context and register that as the io_uring eventfd, and at the same time queue a multishot poll request for the eventfd context, then any CQE posted will repeatedly trigger the multishot request until it terminates when the CQ ring overflows.
In preparation for io_uring detecting this circular dependency, add the mentioned helper so that io_uring can pass in EPOLL_URING as part of the poll wakeup key.
Cc: stable@vger.kernel.org # 6.0 [axboe: fold in !CONFIG_EVENTFD fix from Zhang Qilong] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/eventfd.c | 37 +++++++++++++++++++++---------------- include/linux/eventfd.h | 7 +++++++ 2 files changed, 28 insertions(+), 16 deletions(-)
diff --git a/fs/eventfd.c b/fs/eventfd.c index c0ffee99ad23..249ca6c0b784 100644 --- a/fs/eventfd.c +++ b/fs/eventfd.c @@ -43,21 +43,7 @@ struct eventfd_ctx { int id; };
-/** - * eventfd_signal - Adds @n to the eventfd counter. - * @ctx: [in] Pointer to the eventfd context. - * @n: [in] Value of the counter to be added to the eventfd internal counter. - * The value cannot be negative. - * - * This function is supposed to be called by the kernel in paths that do not - * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX - * value, and we signal this as overflow condition by returning a EPOLLERR - * to poll(2). - * - * Returns the amount by which the counter was incremented. This will be less - * than @n if the counter has overflowed. - */ -__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) +__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask) { unsigned long flags;
@@ -78,12 +64,31 @@ __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) n = ULLONG_MAX - ctx->count; ctx->count += n; if (waitqueue_active(&ctx->wqh)) - wake_up_locked_poll(&ctx->wqh, EPOLLIN); + wake_up_locked_poll(&ctx->wqh, EPOLLIN | mask); current->in_eventfd = 0; spin_unlock_irqrestore(&ctx->wqh.lock, flags);
return n; } + +/** + * eventfd_signal - Adds @n to the eventfd counter. + * @ctx: [in] Pointer to the eventfd context. + * @n: [in] Value of the counter to be added to the eventfd internal counter. + * The value cannot be negative. + * + * This function is supposed to be called by the kernel in paths that do not + * allow sleeping. In this function we allow the counter to reach the ULLONG_MAX + * value, and we signal this as overflow condition by returning a EPOLLERR + * to poll(2). + * + * Returns the amount by which the counter was incremented. This will be less + * than @n if the counter has overflowed. + */ +__u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n) +{ + return eventfd_signal_mask(ctx, n, 0); +} EXPORT_SYMBOL_GPL(eventfd_signal);
static void eventfd_free_ctx(struct eventfd_ctx *ctx) diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index 30eb30d6909b..786824f58d3d 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -40,6 +40,7 @@ struct file *eventfd_fget(int fd); struct eventfd_ctx *eventfd_ctx_fdget(int fd); struct eventfd_ctx *eventfd_ctx_fileget(struct file *file); __u64 eventfd_signal(struct eventfd_ctx *ctx, __u64 n); +__u64 eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, unsigned mask); int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait, __u64 *cnt); void eventfd_ctx_do_read(struct eventfd_ctx *ctx, __u64 *cnt); @@ -66,6 +67,12 @@ static inline int eventfd_signal(struct eventfd_ctx *ctx, int n) return -ENOSYS; }
+static inline int eventfd_signal_mask(struct eventfd_ctx *ctx, __u64 n, + unsigned mask) +{ + return -ENOSYS; +} + static inline void eventfd_ctx_put(struct eventfd_ctx *ctx) {
From: Pavel Begunkov asml.silence@gmail.com
commit ef0ec1ad03119b8b46b035dad42bca7d6da7c2e5 upstream.
We should not be messing with req->file outside of core paths. Clearing it makes msg_ring non reentrant, i.e. luckily io_msg_send_fd() fails the request on failed io_double_lock_ctx() but clearly was originally intended to do retries instead.
Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov asml.silence@gmail.com Link: https://lore.kernel.org/r/e5ac9edadb574fe33f6d727cb8f14ce68262a684.167038489... Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- io_uring/io_uring.c | 2 +- io_uring/msg_ring.c | 4 ---- io_uring/opdef.c | 7 +++++++ io_uring/opdef.h | 2 ++ 4 files changed, 10 insertions(+), 5 deletions(-)
--- a/io_uring/io_uring.c +++ b/io_uring/io_uring.c @@ -1607,7 +1607,7 @@ static int io_issue_sqe(struct io_kiocb return ret;
/* If the op doesn't have a file, we're not polling for it */ - if ((req->ctx->flags & IORING_SETUP_IOPOLL) && req->file) + if ((req->ctx->flags & IORING_SETUP_IOPOLL) && def->iopoll_queue) io_iopoll_req_issued(req, issue_flags);
return 0; --- a/io_uring/msg_ring.c +++ b/io_uring/msg_ring.c @@ -169,9 +169,5 @@ done: if (ret < 0) req_set_fail(req); io_req_set_res(req, ret, 0); - /* put file to avoid an attempt to IOPOLL the req */ - if (!(req->flags & REQ_F_FIXED_FILE)) - io_put_file(req->file); - req->file = NULL; return IOU_OK; } --- a/io_uring/opdef.c +++ b/io_uring/opdef.c @@ -63,6 +63,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "READV", .prep = io_prep_rw, @@ -80,6 +81,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "WRITEV", .prep = io_prep_rw, @@ -103,6 +105,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "READ_FIXED", .prep = io_prep_rw, @@ -118,6 +121,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "WRITE_FIXED", .prep = io_prep_rw, @@ -275,6 +279,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "READ", .prep = io_prep_rw, @@ -290,6 +295,7 @@ const struct io_op_def io_op_defs[] = { .audit_skip = 1, .ioprio = 1, .iopoll = 1, + .iopoll_queue = 1, .async_size = sizeof(struct io_async_rw), .name = "WRITE", .prep = io_prep_rw, @@ -475,6 +481,7 @@ const struct io_op_def io_op_defs[] = { .needs_file = 1, .plug = 1, .name = "URING_CMD", + .iopoll_queue = 1, .async_size = uring_cmd_pdu_size(1), .prep = io_uring_cmd_prep, .issue = io_uring_cmd, --- a/io_uring/opdef.h +++ b/io_uring/opdef.h @@ -25,6 +25,8 @@ struct io_op_def { unsigned ioprio : 1; /* supports iopoll */ unsigned iopoll : 1; + /* have to be put into the iopoll list */ + unsigned iopoll_queue : 1; /* opcode specific path will handle ->async_data allocation if needed */ unsigned manual_alloc : 1; /* size of async data needed, if any */
From: Wang Yufen wangyufen@huawei.com
commit e7f703ff2507f4e9f496da96cd4b78fd3026120c upstream.
Fix to return a negative error code from create_elf_fdpic_tables() instead of 0.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Wang Yufen wangyufen@huawei.com Signed-off-by: Kees Cook keescook@chromium.org Link: https://lore.kernel.org/r/1669945261-30271-1-git-send-email-wangyufen@huawei... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/binfmt_elf_fdpic.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)
--- a/fs/binfmt_elf_fdpic.c +++ b/fs/binfmt_elf_fdpic.c @@ -434,8 +434,9 @@ static int load_elf_fdpic_binary(struct current->mm->start_stack = current->mm->start_brk + stack_size; #endif
- if (create_elf_fdpic_tables(bprm, current->mm, - &exec_params, &interp_params) < 0) + retval = create_elf_fdpic_tables(bprm, current->mm, &exec_params, + &interp_params); + if (retval < 0) goto error;
kdebug("- start_code %lx", current->mm->start_code);
From: Zhang Tianci zhangtianci.1997@bytedance.com
commit 5b0db51215e895a361bc63132caa7cca36a53d6a upstream.
There is a wrong case of link() on overlay: $ mkdir /lower /fuse /merge $ mount -t fuse /fuse $ mkdir /fuse/upper /fuse/work $ mount -t overlay /merge -o lowerdir=/lower,upperdir=/fuse/upper,\ workdir=work $ touch /merge/file $ chown bin.bin /merge/file // the file's caller becomes "bin" $ ln /merge/file /merge/lnkfile
Then we will get an error(EACCES) because fuse daemon checks the link()'s caller is "bin", it denied this request.
In the changing history of ovl_link(), there are two key commits:
The first is commit bb0d2b8ad296 ("ovl: fix sgid on directory") which overrides the cred's fsuid/fsgid using the new inode. The new inode's owner is initialized by inode_init_owner(), and inode->fsuid is assigned to the current user. So the override fsuid becomes the current user. We know link() is actually modifying the directory, so the caller must have the MAY_WRITE permission on the directory. The current caller may should have this permission. This is acceptable to use the caller's fsuid.
The second is commit 51f7e52dc943 ("ovl: share inode for hard link") which removed the inode creation in ovl_link(). This commit move inode_init_owner() into ovl_create_object(), so the ovl_link() just give the old inode to ovl_create_or_link(). Then the override fsuid becomes the old inode's fsuid, neither the caller nor the overlay's mounter! So this is incorrect.
Fix this bug by using ovl mounter's fsuid/fsgid to do underlying fs's link().
Link: https://lore.kernel.org/all/20220817102952.xnvesg3a7rbv576x@wittgenstein/T Link: https://lore.kernel.org/lkml/20220825130552.29587-1-zhangtianci.1997@bytedan... Signed-off-by: Zhang Tianci zhangtianci.1997@bytedance.com Signed-off-by: Jiachen Zhang zhangjiachen.jaycee@bytedance.com Reviewed-by: Christian Brauner (Microsoft) brauner@kernel.org Fixes: 51f7e52dc943 ("ovl: share inode for hard link") Cc: stable@vger.kernel.org # v4.8 Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/overlayfs/dir.c | 46 ++++++++++++++++++++++++++++++---------------- 1 file changed, 30 insertions(+), 16 deletions(-)
--- a/fs/overlayfs/dir.c +++ b/fs/overlayfs/dir.c @@ -592,28 +592,42 @@ static int ovl_create_or_link(struct den goto out_revert_creds; }
- err = -ENOMEM; - override_cred = prepare_creds(); - if (override_cred) { + if (!attr->hardlink) { + err = -ENOMEM; + override_cred = prepare_creds(); + if (!override_cred) + goto out_revert_creds; + /* + * In the creation cases(create, mkdir, mknod, symlink), + * ovl should transfer current's fs{u,g}id to underlying + * fs. Because underlying fs want to initialize its new + * inode owner using current's fs{u,g}id. And in this + * case, the @inode is a new inode that is initialized + * in inode_init_owner() to current's fs{u,g}id. So use + * the inode's i_{u,g}id to override the cred's fs{u,g}id. + * + * But in the other hardlink case, ovl_link() does not + * create a new inode, so just use the ovl mounter's + * fs{u,g}id. + */ override_cred->fsuid = inode->i_uid; override_cred->fsgid = inode->i_gid; - if (!attr->hardlink) { - err = security_dentry_create_files_as(dentry, - attr->mode, &dentry->d_name, old_cred, - override_cred); - if (err) { - put_cred(override_cred); - goto out_revert_creds; - } + err = security_dentry_create_files_as(dentry, + attr->mode, &dentry->d_name, old_cred, + override_cred); + if (err) { + put_cred(override_cred); + goto out_revert_creds; } put_cred(override_creds(override_cred)); put_cred(override_cred); - - if (!ovl_dentry_is_whiteout(dentry)) - err = ovl_create_upper(dentry, inode, attr); - else - err = ovl_create_over_whiteout(dentry, inode, attr); } + + if (!ovl_dentry_is_whiteout(dentry)) + err = ovl_create_upper(dentry, inode, attr); + else + err = ovl_create_over_whiteout(dentry, inode, attr); + out_revert_creds: revert_creds(old_cred); return err;
From: Al Viro viro@zeniv.linux.org.uk
commit 456b59e757b0c558df550764a4fd5ae6877e93f8 upstream.
ovl_change_flags() is an open-coded variant of fs/fcntl.c:setfl() and it got missed by commit 164f4064ca81 ("keep iocb_flags() result cached in struct file"); the same change applies there.
Reported-by: Pierre Labastie pierre.labastie@neuf.fr Fixes: 164f4064ca81 ("keep iocb_flags() result cached in struct file") Cc: stable@vger.kernel.org # v6.0 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216738 Signed-off-by: Al Viro viro@zeniv.linux.org.uk Signed-off-by: Miklos Szeredi mszeredi@redhat.com Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/overlayfs/file.c | 1 + 1 file changed, 1 insertion(+)
diff --git a/fs/overlayfs/file.c b/fs/overlayfs/file.c index a1a22f58ba18..dd688a842b0b 100644 --- a/fs/overlayfs/file.c +++ b/fs/overlayfs/file.c @@ -96,6 +96,7 @@ static int ovl_change_flags(struct file *file, unsigned int flags)
spin_lock(&file->f_lock); file->f_flags = (file->f_flags & ~OVL_SETFL_MASK) | flags; + file->f_iocb_flags = iocb_flags(file); spin_unlock(&file->f_lock);
return 0;
From: Artem Egorkine arteme@gmail.com
commit 8508fa2e7472f673edbeedf1b1d2b7a6bb898ecc upstream.
A PODxt device sends 0xb2, 0xc2 or 0xf2 as a status byte for MIDI messages over USB that should otherwise have a 0xb0, 0xc0 or 0xf0 status byte. This is usually corrected by the driver on other OSes.
This fixes MIDI sysex messages sent by PODxt.
[ tiwai: fixed white spaces ]
Signed-off-by: Artem Egorkine arteme@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221225105728.1153989-1-arteme@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/line6/driver.c | 3 ++- sound/usb/line6/midi.c | 3 ++- sound/usb/line6/midibuf.c | 25 +++++++++++++++++-------- sound/usb/line6/midibuf.h | 5 ++++- sound/usb/line6/pod.c | 3 ++- 5 files changed, 27 insertions(+), 12 deletions(-)
--- a/sound/usb/line6/driver.c +++ b/sound/usb/line6/driver.c @@ -304,7 +304,8 @@ static void line6_data_received(struct u for (;;) { done = line6_midibuf_read(mb, line6->buffer_message, - LINE6_MIDI_MESSAGE_MAXLEN); + LINE6_MIDI_MESSAGE_MAXLEN, + LINE6_MIDIBUF_READ_RX);
if (done <= 0) break; --- a/sound/usb/line6/midi.c +++ b/sound/usb/line6/midi.c @@ -56,7 +56,8 @@ static void line6_midi_transmit(struct s
for (;;) { done = line6_midibuf_read(mb, chunk, - LINE6_FALLBACK_MAXPACKETSIZE); + LINE6_FALLBACK_MAXPACKETSIZE, + LINE6_MIDIBUF_READ_TX);
if (done == 0) break; --- a/sound/usb/line6/midibuf.c +++ b/sound/usb/line6/midibuf.c @@ -9,6 +9,7 @@
#include "midibuf.h"
+ static int midibuf_message_length(unsigned char code) { int message_length; @@ -20,12 +21,7 @@ static int midibuf_message_length(unsign
message_length = length[(code >> 4) - 8]; } else { - /* - Note that according to the MIDI specification 0xf2 is - the "Song Position Pointer", but this is used by Line 6 - to send sysex messages to the host. - */ - static const int length[] = { -1, 2, -1, 2, -1, -1, 1, 1, 1, 1, + static const int length[] = { -1, 2, 2, 2, -1, -1, 1, 1, 1, -1, 1, 1, 1, -1, 1, 1 }; message_length = length[code & 0x0f]; @@ -125,7 +121,7 @@ int line6_midibuf_write(struct midi_buff }
int line6_midibuf_read(struct midi_buffer *this, unsigned char *data, - int length) + int length, int read_type) { int bytes_used; int length1, length2; @@ -148,9 +144,22 @@ int line6_midibuf_read(struct midi_buffe
length1 = this->size - this->pos_read;
- /* check MIDI command length */ command = this->buf[this->pos_read]; + /* + PODxt always has status byte lower nibble set to 0010, + when it means to send 0000, so we correct if here so + that control/program changes come on channel 1 and + sysex message status byte is correct + */ + if (read_type == LINE6_MIDIBUF_READ_RX) { + if (command == 0xb2 || command == 0xc2 || command == 0xf2) { + unsigned char fixed = command & 0xf0; + this->buf[this->pos_read] = fixed; + command = fixed; + } + }
+ /* check MIDI command length */ if (command & 0x80) { midi_length = midibuf_message_length(command); this->command_prev = command; --- a/sound/usb/line6/midibuf.h +++ b/sound/usb/line6/midibuf.h @@ -8,6 +8,9 @@ #ifndef MIDIBUF_H #define MIDIBUF_H
+#define LINE6_MIDIBUF_READ_TX 0 +#define LINE6_MIDIBUF_READ_RX 1 + struct midi_buffer { unsigned char *buf; int size; @@ -23,7 +26,7 @@ extern void line6_midibuf_destroy(struct extern int line6_midibuf_ignore(struct midi_buffer *mb, int length); extern int line6_midibuf_init(struct midi_buffer *mb, int size, int split); extern int line6_midibuf_read(struct midi_buffer *mb, unsigned char *data, - int length); + int length, int read_type); extern void line6_midibuf_reset(struct midi_buffer *mb); extern int line6_midibuf_write(struct midi_buffer *mb, unsigned char *data, int length); --- a/sound/usb/line6/pod.c +++ b/sound/usb/line6/pod.c @@ -159,8 +159,9 @@ static struct line6_pcm_properties pod_p .bytes_per_channel = 3 /* SNDRV_PCM_FMTBIT_S24_3LE */ };
+ static const char pod_version_header[] = { - 0xf2, 0x7e, 0x7f, 0x06, 0x02 + 0xf0, 0x7e, 0x7f, 0x06, 0x02 };
static char *pod_alloc_sysex_buffer(struct usb_line6_pod *pod, int code,
From: Artem Egorkine arteme@gmail.com
commit b8800d324abb50160560c636bfafe2c81001b66c upstream.
Correctly calculate available space including the size of the chunk buffer. This fixes a buffer overflow when multiple MIDI sysex messages are sent to a PODxt device.
Signed-off-by: Artem Egorkine arteme@gmail.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20221225105728.1153989-2-arteme@gmail.com Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/usb/line6/midi.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)
--- a/sound/usb/line6/midi.c +++ b/sound/usb/line6/midi.c @@ -44,7 +44,8 @@ static void line6_midi_transmit(struct s int req, done;
for (;;) { - req = min(line6_midibuf_bytes_free(mb), line6->max_packet_size); + req = min3(line6_midibuf_bytes_free(mb), line6->max_packet_size, + LINE6_FALLBACK_MAXPACKETSIZE); done = snd_rawmidi_transmit_peek(substream, chunk, req);
if (done == 0)
From: Takashi Iwai tiwai@suse.de
commit 090ddad4c7a9fefd647c762093a555870a19c8b2 upstream.
The recent code refactoring for HD-audio HDMI codec driver caused a regression on AMD/ATI HDMI codecs; namely, PulseAudioand pipewire don't recognize HDMI outputs any longer while the direct output via ALSA raw access still works.
The problem turned out that, after the code refactoring, the driver assumes only the dynamic PCM assignment, and when a PCM stream that still isn't assigned to any pin gets opened, the driver tries to assign any free converter to the PCM stream. This behavior is OK for Intel and other codecs, as they have arbitrary connections between pins and converters. OTOH, on AMD chips that have a 1:1 mapping between pins and converters, this may end up with blocking the open of the next PCM stream for the pin that is tied with the formerly taken converter.
Also, with the code refactoring, more PCM streams are exposed than necessary as we assume all converters can be used, while this isn't true for AMD case. This may change the PCM stream assignment and confuse users as well.
This patch fixes those problems by:
- Introducing a flag spec->static_pcm_mapping, and if it's set, the driver applies the static mapping between pins and converters at the probe time - Limiting the number of PCM streams per pins, too; this avoids the superfluous PCM streams
Fixes: ef6f5494faf6 ("ALSA: hda/hdmi: Use only dynamic PCM device allocation") Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=216836 Co-developed-by: Jaroslav Kysela perex@perex.cz Signed-off-by: Jaroslav Kysela perex@perex.cz Link: https://lore.kernel.org/r/20221228125714.16329-1-tiwai@suse.de Signed-off-by: Takashi Iwai tiwai@suse.de Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- sound/pci/hda/patch_hdmi.c | 27 +++++++++++++++++++-------- 1 file changed, 19 insertions(+), 8 deletions(-)
--- a/sound/pci/hda/patch_hdmi.c +++ b/sound/pci/hda/patch_hdmi.c @@ -167,6 +167,7 @@ struct hdmi_spec { struct hdmi_ops ops;
bool dyn_pin_out; + bool static_pcm_mapping; /* hdmi interrupt trigger control flag for Nvidia codec */ bool hdmi_intr_trig_ctrl; bool nv_dp_workaround; /* workaround DP audio infoframe for Nvidia */ @@ -1525,13 +1526,16 @@ static void update_eld(struct hda_codec */ pcm_jack = pin_idx_to_pcm_jack(codec, per_pin);
- if (eld->eld_valid) { - hdmi_attach_hda_pcm(spec, per_pin); - hdmi_pcm_setup_pin(spec, per_pin); - } else { - hdmi_pcm_reset_pin(spec, per_pin); - hdmi_detach_hda_pcm(spec, per_pin); + if (!spec->static_pcm_mapping) { + if (eld->eld_valid) { + hdmi_attach_hda_pcm(spec, per_pin); + hdmi_pcm_setup_pin(spec, per_pin); + } else { + hdmi_pcm_reset_pin(spec, per_pin); + hdmi_detach_hda_pcm(spec, per_pin); + } } + /* if pcm_idx == -1, it means this is in monitor connection event * we can get the correct pcm_idx now. */ @@ -2281,8 +2285,8 @@ static int generic_hdmi_build_pcms(struc struct hdmi_spec *spec = codec->spec; int idx, pcm_num;
- /* limit the PCM devices to the codec converters */ - pcm_num = spec->num_cvts; + /* limit the PCM devices to the codec converters or available PINs */ + pcm_num = min(spec->num_cvts, spec->num_pins); codec_dbg(codec, "hdmi: pcm_num set to %d\n", pcm_num);
for (idx = 0; idx < pcm_num; idx++) { @@ -2379,6 +2383,11 @@ static int generic_hdmi_build_controls(s struct hdmi_spec_per_pin *per_pin = get_pin(spec, pin_idx); struct hdmi_eld *pin_eld = &per_pin->sink_eld;
+ if (spec->static_pcm_mapping) { + hdmi_attach_hda_pcm(spec, per_pin); + hdmi_pcm_setup_pin(spec, per_pin); + } + pin_eld->eld_valid = false; hdmi_present_sense(per_pin, 0); } @@ -4419,6 +4428,8 @@ static int patch_atihdmi(struct hda_code
spec = codec->spec;
+ spec->static_pcm_mapping = true; + spec->ops.pin_get_eld = atihdmi_pin_get_eld; spec->ops.pin_setup_infoframe = atihdmi_pin_setup_infoframe; spec->ops.pin_hbr_setup = atihdmi_pin_hbr_setup;
From: Christian Brauner brauner@kernel.org
commit 11933cf1d91d57da9e5c53822a540bbdc2656c16 upstream.
The propagate_mnt() function handles mount propagation when creating mounts and propagates the source mount tree @source_mnt to all applicable nodes of the destination propagation mount tree headed by @dest_mnt.
Unfortunately it contains a bug where it fails to terminate at peers of @source_mnt when looking up copies of the source mount that become masters for copies of the source mount tree mounted on top of slaves in the destination propagation tree causing a NULL dereference.
Once the mechanics of the bug are understood it's easy to trigger. Because of unprivileged user namespaces it is available to unprivileged users.
While fixing this bug we've gotten confused multiple times due to unclear terminology or missing concepts. So let's start this with some clarifications:
* The terms "master" or "peer" denote a shared mount. A shared mount belongs to a peer group.
* A peer group is a set of shared mounts that propagate to each other. They are identified by a peer group id. The peer group id is available in @shared_mnt->mnt_group_id. Shared mounts within the same peer group have the same peer group id. The peers in a peer group can be reached via @shared_mnt->mnt_share.
* The terms "slave mount" or "dependent mount" denote a mount that receives propagation from a peer in a peer group. IOW, shared mounts may have slave mounts and slave mounts have shared mounts as their master. Slave mounts of a given peer in a peer group are listed on that peers slave list available at @shared_mnt->mnt_slave_list.
* The term "master mount" denotes a mount in a peer group. IOW, it denotes a shared mount or a peer mount in a peer group. The term "master mount" - or "master" for short - is mostly used when talking in the context of slave mounts that receive propagation from a master mount. A master mount of a slave identifies the closest peer group a slave mount receives propagation from. The master mount of a slave can be identified via @slave_mount->mnt_master. Different slaves may point to different masters in the same peer group.
* Multiple peers in a peer group can have non-empty ->mnt_slave_lists. Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to ensure all slave mounts of a peer group are visited the ->mnt_slave_lists of all peers in a peer group have to be walked.
* Slave mounts point to a peer in the closest peer group they receive propagation from via @slave_mnt->mnt_master (see above). Together with these peers they form a propagation group (see below). The closest peer group can thus be identified through the peer group id @slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave mount receives propagation from.
* A shared-slave mount is a slave mount to a peer group pg1 while also a peer in another peer group pg2. IOW, a peer group may receive propagation from another peer group.
If a peer group pg1 is a slave to another peer group pg2 then all peers in peer group pg1 point to the same peer in peer group pg2 via ->mnt_master. IOW, all peers in peer group pg1 appear on the same ->mnt_slave_list. IOW, they cannot be slaves to different peer groups.
* A pure slave mount is a slave mount that is a slave to a peer group but is not a peer in another peer group.
* A propagation group denotes the set of mounts consisting of a single peer group pg1 and all slave mounts and shared-slave mounts that point to a peer in that peer group via ->mnt_master. IOW, all slave mounts such that @slave_mnt->mnt_master->mnt_group_id is equal to @shared_mnt->mnt_group_id.
The concept of a propagation group makes it easier to talk about a single propagation level in a propagation tree.
For example, in propagate_mnt() the immediate peers of @dest_mnt and all slaves of @dest_mnt's peer group form a propagation group propg1. So a shared-slave mount that is a slave in propg1 and that is a peer in another peer group pg2 forms another propagation group propg2 together with all slaves that point to that shared-slave mount in their ->mnt_master.
* A propagation tree refers to all mounts that receive propagation starting from a specific shared mount.
For example, for propagate_mnt() @dest_mnt is the start of a propagation tree. The propagation tree ecompasses all mounts that receive propagation from @dest_mnt's peer group down to the leafs.
With that out of the way let's get to the actual algorithm.
We know that @dest_mnt is guaranteed to be a pure shared mount or a shared-slave mount. This is guaranteed by a check in attach_recursive_mnt(). So propagate_mnt() will first propagate the source mount tree to all peers in @dest_mnt's peer group:
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; }
Notice, that the peer propagation loop of propagate_mnt() doesn't propagate @dest_mnt itself. @dest_mnt is mounted directly in attach_recursive_mnt() after we propagated to the destination propagation tree.
The mount that will be mounted on top of @dest_mnt is @source_mnt. This copy was created earlier even before we entered attach_recursive_mnt() and doesn't concern us a lot here.
It's just important to notice that when propagate_mnt() is called @source_mnt will not yet have been mounted on top of @dest_mnt. Thus, @source_mnt->mnt_parent will either still point to @source_mnt or - in the case @source_mnt is moved and thus already attached - still to its former parent.
For each peer @m in @dest_mnt's peer group propagate_one() will create a new copy of the source mount tree and mount that copy @child on @m such that @child->mnt_parent points to @m after propagate_one() returns.
propagate_one() will stash the last destination propagation node @m in @last_dest and the last copy it created for the source mount tree in @last_source.
Hence, if we call into propagate_one() again for the next destination propagation node @m, @last_dest will point to the previous destination propagation node and @last_source will point to the previous copy of the source mount tree and mounted on @last_dest.
Each new copy of the source mount tree is created from the previous copy of the source mount tree. This will become important later.
The peer loop in propagate_mnt() is straightforward. We iterate through the peers copying and updating @last_source and @last_dest as we go through them and mount each copy of the source mount tree @child on a peer @m in @dest_mnt's peer group.
After propagate_mnt() handled the peers in @dest_mnt's peer group propagate_mnt() will propagate the source mount tree down the propagation tree that @dest_mnt's peer group propagates to:
for (m = next_group(dest_mnt, dest_mnt); m; m = next_group(m, dest_mnt)) { /* everything in that slave group */ n = m; do { ret = propagate_one(n); if (ret) goto out; n = next_peer(n); } while (n != m); }
The next_group() helper will recursively walk the destination propagation tree, descending into each propagation group of the propagation tree.
The important part is that it takes care to propagate the source mount tree to all peers in the peer group of a propagation group before it propagates to the slaves to those peers in the propagation group. IOW, it creates and mounts copies of the source mount tree that become masters before it creates and mounts copies of the source mount tree that become slaves to these masters.
It is important to remember that propagating the source mount tree to each mount @m in the destination propagation tree simply means that we create and mount new copies @child of the source mount tree on @m such that @child->mnt_parent points to @m.
Since we know that each node @m in the destination propagation tree headed by @dest_mnt's peer group will be overmounted with a copy of the source mount tree and since we know that the propagation properties of each copy of the source mount tree we create and mount at @m will mostly mirror the propagation properties of @m. We can use that information to create and mount the copies of the source mount tree that become masters before their slaves.
The easy case is always when @m and @last_dest are peers in a peer group of a given propagation group. In that case we know that we can simply copy @last_source without having to figure out what the master for the new copy @child of the source mount tree needs to be as we've done that in a previous call to propagate_one().
The hard case is when we're dealing with a slave mount or a shared-slave mount @m in a destination propagation group that we need to create and mount a copy of the source mount tree on.
For each propagation group in the destination propagation tree we propagate the source mount tree to we want to make sure that the copies @child of the source mount tree we create and mount on slaves @m pick an ealier copy of the source mount tree that we mounted on a master @m of the destination propagation group as their master. This is a mouthful but as far as we can tell that's the core of it all.
But, if we keep track of the masters in the destination propagation tree @m we can use the information to find the correct master for each copy of the source mount tree we create and mount at the slaves in the destination propagation tree @m.
Let's walk through the base case as that's still fairly easy to grasp.
If we're dealing with the first slave in the propagation group that @dest_mnt is in then we don't yet have marked any masters in the destination propagation tree.
We know the master for the first slave to @dest_mnt's peer group is simple @dest_mnt. So we expect this algorithm to yield a copy of the source mount tree that was mounted on a peer in @dest_mnt's peer group as the master for the copy of the source mount tree we want to mount at the first slave @m:
for (n = m; ; n = p) { p = n->mnt_master; if (p == dest_master || IS_MNT_MARKED(p)) break; }
For the first slave we walk the destination propagation tree all the way up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy can be walked by walking up the @mnt->mnt_master hierarchy of the destination propagation tree @m. We will ultimately find a peer in @dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master.
Btw, here the assumption we listed at the beginning becomes important. Namely, that peers in a peer group pg1 that are slaves in another peer group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are peers in peer group pg1 point to the same peer in peer group pg2 via their ->mnt_master. Otherwise the termination condition in the code above would be wrong and next_group() would be broken too.
So the first iteration sets:
n = m; p = n->mnt_master;
such that @p now points to a peer or @dest_mnt itself. We walk up one more level since we don't have any marked mounts. So we end up with:
n = dest_mnt; p = dest_mnt->mnt_master;
If @dest_mnt's peer group is not slave to another peer group then @p is now NULL. If @dest_mnt's peer group is a slave to another peer group then @p now points to @dest_mnt->mnt_master points which is a master outside the propagation tree we're dealing with.
Now we need to figure out the master for the copy of the source mount tree we're about to create and mount on the first slave of @dest_mnt's peer group:
do { struct mount *parent = last_source->mnt_parent; if (last_source == first_source) break; done = parent->mnt_master == p; if (done && peers(n, parent)) break; last_source = last_source->mnt_master; } while (!done);
We know that @last_source->mnt_parent points to @last_dest and @last_dest is the last peer in @dest_mnt's peer group we propagated to in the peer loop in propagate_mnt().
Consequently, @last_source is the last copy we created and mount on that last peer in @dest_mnt's peer group. So @last_source is the master we want to pick.
We know that @last_source->mnt_parent->mnt_master points to @last_dest->mnt_master. We also know that @last_dest->mnt_master is either NULL or points to a master outside of the destination propagation tree and so does @p. Hence:
done = parent->mnt_master == p;
is trivially true in the base condition.
We also know that for the first slave mount of @dest_mnt's peer group that @last_dest either points @dest_mnt itself because it was initialized to:
last_dest = dest_mnt;
at the beginning of propagate_mnt() or it will point to a peer of @dest_mnt in its peer group. In both cases it is guaranteed that on the first iteration @n and @parent are peers (Please note the check for peers here as that's important.):
if (done && peers(n, parent)) break;
So, as we expected, we select @last_source, which referes to the last copy of the source mount tree we mounted on the last peer in @dest_mnt's peer group, as the master of the first slave in @dest_mnt's peer group. The rest is taken care of by clone_mnt(last_source, ...). We'll skip over that part otherwise this becomes a blogpost.
At the end of propagate_mnt() we now mark @m->mnt_master as the first master in the destination propagation tree that is distinct from @dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master.
By marking @dest_mnt or one of it's peers we are able to easily find it again when we later lookup masters for other copies of the source mount tree we mount copies of the source mount tree on slaves @m to @dest_mnt's peer group. This, in turn allows us to find the master we selected for the copies of the source mount tree we mounted on master in the destination propagation tree again.
The important part is to realize that the code makes use of the fact that the last copy of the source mount tree stashed in @last_source was mounted on top of the previous destination propagation node @last_dest. What this means is that @last_source allows us to walk the destination propagation hierarchy the same way each destination propagation node @m does.
If we take @last_source, which is the copy of @source_mnt we have mounted on @last_dest in the previous iteration of propagate_one(), then we know @last_source->mnt_parent points to @last_dest but we also know that as we walk through the destination propagation tree that @last_source->mnt_master will point to an earlier copy of the source mount tree we mounted one an earlier destination propagation node @m.
IOW, @last_source->mnt_parent will be our hook into the destination propagation tree and each consecutive @last_source->mnt_master will lead us to an earlier propagation node @m via @last_source->mnt_master->mnt_parent.
Hence, by walking up @last_source->mnt_master, each of which is mounted on a node that is a master @m in the destination propagation tree we can also walk up the destination propagation hierarchy.
So, for each new destination propagation node @m we use the previous copy of @last_source and the fact it's mounted on the previous propagation node @last_dest via @last_source->mnt_master->mnt_parent to determine what the master of the new copy of @last_source needs to be.
The goal is to find the _closest_ master that the new copy of the source mount tree we are about to create and mount on a slave @m in the destination propagation tree needs to pick. IOW, we want to find a suitable master in the propagation group.
As the propagation structure of the source mount propagation tree we create mirrors the propagation structure of the destination propagation tree we can find @m's closest master - i.e., a marked master - which is a peer in the closest peer group that @m receives propagation from. We store that closest master of @m in @p as before and record the slave to that master in @n
We then search for this master @p via @last_source by walking up the master hierarchy starting from the last copy of the source mount tree stored in @last_source that we created and mounted on the previous destination propagation node @m.
We will try to find the master by walking @last_source->mnt_master and by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If we find @p then we can figure out what earlier copy of the source mount tree needs to be the master for the new copy of the source mount tree we're about to create and mount at the current destination propagation node @m.
If @last_source->mnt_master->mnt_parent and @n are peers then we know that the closest master they receive propagation from is @last_source->mnt_master->mnt_parent->mnt_master. If not then the closest immediate peer group that they receive propagation from must be one level higher up.
This builds on the earlier clarification at the beginning that all peers in a peer group which are slaves of other peer groups all point to the same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the closest peer group that they receive propagation from.
However, terminating the walk has corner cases.
If the closest marked master for a given destination node @m cannot be found by walking up the master hierarchy via @last_source->mnt_master then we need to terminate the walk when we encounter @source_mnt again.
This isn't an arbitrary termination. It simply means that the new copy of the source mount tree we're about to create has a copy of the source mount tree we created and mounted on a peer in @dest_mnt's peer group as its master. IOW, @source_mnt is the peer in the closest peer group that the new copy of the source mount tree receives propagation from.
We absolutely have to stop @source_mnt because @last_source->mnt_master either points outside the propagation hierarchy we're dealing with or it is NULL because @source_mnt isn't a shared-slave.
So continuing the walk past @source_mnt would cause a NULL dereference via @last_source->mnt_master->mnt_parent. And so we have to stop the walk when we encounter @source_mnt again.
One scenario where this can happen is when we first handled a series of slaves of @dest_mnt's peer group and then encounter peers in a new peer group that is a slave to @dest_mnt's peer group. We handle them and then we encounter another slave mount to @dest_mnt that is a pure slave to @dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's peer group as its master. Consequently, the new copy of the source mount tree will need to have @source_mnt as it's master. So we walk the propagation hierarchy all the way up to @source_mnt based on @last_source->mnt_master.
So terminate on @source_mnt, easy peasy. Except, that the check misses something that the rest of the algorithm already handles.
If @dest_mnt has peers in it's peer group the peer loop in propagate_mnt():
for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; }
will consecutively update @last_source with each previous copy of the source mount tree we created and mounted at the previous peer in @dest_mnt's peer group. So after that loop terminates @last_source will point to whatever copy of the source mount tree was created and mounted on the last peer in @dest_mnt's peer group.
Furthermore, if there is even a single additional peer in @dest_mnt's peer group then @last_source will __not__ point to @source_mnt anymore. Because, as we mentioned above, @dest_mnt isn't even handled in this loop but directly in attach_recursive_mnt(). So it can't even accidently come last in that peer loop.
So the first time we handle a slave mount @m of @dest_mnt's peer group the copy of the source mount tree we create will make the __last copy of the source mount tree we created and mounted on the last peer in @dest_mnt's peer group the master of the new copy of the source mount tree we create and mount on the first slave of @dest_mnt's peer group__.
But this means that the termination condition that checks for @source_mnt is wrong. The @source_mnt cannot be found anymore by propagate_one(). Instead it will find the last copy of the source mount tree we created and mounted for the last peer of @dest_mnt's peer group again. And that is a peer of @source_mnt not @source_mnt itself.
IOW, we fail to terminate the loop correctly and ultimately dereference @last_source->mnt_master->mnt_parent. When @source_mnt's peer group isn't slave to another peer group then @last_source->mnt_master is NULL causing the splat below.
For example, assume @dest_mnt is a pure shared mount and has three peers in its peer group:
=================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== (@dest_mnt) mnt_master[216] 309 297 shared:216 \ (@source_mnt) mnt_master[218]: 609 609 shared:218
(1) mnt_master[216]: 607 605 shared:216 \ (P1) mnt_master[218]: 624 607 shared:218
(2) mnt_master[216]: 576 574 shared:216 \ (P2) mnt_master[218]: 625 576 shared:218
(3) mnt_master[216]: 545 543 shared:216 \ (P3) mnt_master[218]: 626 545 shared:218
After this sequence has been processed @last_source will point to (P3), the copy generated for the third peer in @dest_mnt's peer group we handled. So the copy of the source mount tree (P4) we create and mount on the first slave of @dest_mnt's peer group:
=================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== mnt_master[216] 309 297 shared:216 / / (S0) mnt_slave 483 481 master:216 \ \ (P3) mnt_master[218] 626 545 shared:218 \ / / (P4) mnt_slave 627 483 master:218
will pick the last copy of the source mount tree (P3) as master, not (S0).
When walking the propagation hierarchy via @last_source's master hierarchy we encounter (P3) but not (S0), i.e., @source_mnt.
We can fix this in multiple ways:
(1) By setting @last_source to @source_mnt after we processed the peers in @dest_mnt's peer group right after the peer loop in propagate_mnt().
(2) By changing the termination condition that relies on finding exactly @source_mnt to finding a peer of @source_mnt.
(3) By only moving @last_source when we actually venture into a new peer group or some clever variant thereof.
The first two options are minimally invasive and what we want as a fix. The third option is more intrusive but something we'd like to explore in the near future.
This passes all LTP tests and specifically the mount propagation testsuite part of it. It also holds up against all known reproducers of this issues.
Final words. First, this is a clever but __worringly__ underdocumented algorithm. There isn't a single detailed comment to be found in next_group(), propagate_one() or anywhere else in that file for that matter. This has been a giant pain to understand and work through and a bug like this is insanely difficult to fix without a detailed understanding of what's happening. Let's not talk about the amount of time that was sunk into fixing this.
Second, all the cool kids with access to unshare --mount --user --map-root --propagation=unchanged are going to have a lot of fun. IOW, triggerable by unprivileged users while namespace_lock() lock is held.
[ 115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 115.848967] #PF: supervisor read access in kernel mode [ 115.849386] #PF: error_code(0x0000) - not-present page [ 115.849803] PGD 0 P4D 0 [ 115.850012] Oops: 0000 [#1] PREEMPT SMP PTI [ 115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3 [ 115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.856859] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.857531] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 115.860099] Call Trace: [ 115.860358] <TASK> [ 115.860535] propagate_mnt+0x14d/0x190 [ 115.860848] attach_recursive_mnt+0x274/0x3e0 [ 115.861212] path_mount+0x8c8/0xa60 [ 115.861503] __x64_sys_mount+0xf6/0x140 [ 115.861819] do_syscall_64+0x5b/0x80 [ 115.862117] ? do_faccessat+0x123/0x250 [ 115.862435] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.862826] ? do_syscall_64+0x67/0x80 [ 115.863133] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.863527] ? do_syscall_64+0x67/0x80 [ 115.863835] ? do_syscall_64+0x67/0x80 [ 115.864144] ? do_syscall_64+0x67/0x80 [ 115.864452] ? exc_page_fault+0x70/0x170 [ 115.864775] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 115.865187] RIP: 0033:0x7f92c92b0ebe [ 115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89 01 48 [ 115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe [ 115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620 [ 115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [ 115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620 [ 115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076 [ 115.870581] </TASK> [ 115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0 sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath [ 115.875288] CR2: 0000000000000010 [ 115.875641] ---[ end trace 0000000000000000 ]--- [ 115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.881548] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.882234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Fixes: f2ebb3a921c1 ("smarter propagate_mnt()") Fixes: 5ec0811d3037 ("propogate_mnt: Handle the first propogated copy being a slave") Cc: stable@vger.kernel.org Reported-by: Ditang Chen ditang.c@gmail.com Signed-off-by: Seth Forshee (Digital Ocean) sforshee@kernel.org Signed-off-by: Christian Brauner (Microsoft) brauner@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- If there are no big objections I'll get this to Linus rather sooner than later. --- fs/pnode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
--- a/fs/pnode.c +++ b/fs/pnode.c @@ -244,7 +244,7 @@ static int propagate_one(struct mount *m } do { struct mount *parent = last_source->mnt_parent; - if (last_source == first_source) + if (peers(last_source, first_source)) break; done = parent->mnt_master == p; if (done && peers(n, parent))
From: ChiYuan Huang cy_huang@richtek.com
commit 5f4f94e9f26cca6514474b307b59348b8485e711 upstream.
Fix the potential risk of OOB read if bank index is over the maximum.
Refer to the discussion list for the experiment result on mt6370. https://lore.kernel.org/all/20220914013345.GA5802@cyhuang-hp-elitebook-840-g... If not to check the bound, there is the same issue on mt6360.
Cc: stable@vger.kernel.org Fixes: 3b0850440a06c (mfd: mt6360: Merge different sub-devices I2C read/write) Signed-off-by: ChiYuan Huang cy_huang@richtek.com Signed-off-by: Lee Jones lee@kernel.org Link: https://lore.kernel.org/r/1664416817-31590-1-git-send-email-u0084500@gmail.c... Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mfd/mt6360-core.c | 14 ++++++++++++-- 1 file changed, 12 insertions(+), 2 deletions(-)
--- a/drivers/mfd/mt6360-core.c +++ b/drivers/mfd/mt6360-core.c @@ -402,7 +402,7 @@ static int mt6360_regmap_read(void *cont struct mt6360_ddata *ddata = context; u8 bank = *(u8 *)reg; u8 reg_addr = *(u8 *)(reg + 1); - struct i2c_client *i2c = ddata->i2c[bank]; + struct i2c_client *i2c; bool crc_needed = false; u8 *buf; int buf_len = MT6360_ALLOC_READ_SIZE(val_size); @@ -410,6 +410,11 @@ static int mt6360_regmap_read(void *cont u8 crc; int ret;
+ if (bank >= MT6360_SLAVE_MAX) + return -EINVAL; + + i2c = ddata->i2c[bank]; + if (bank == MT6360_SLAVE_PMIC || bank == MT6360_SLAVE_LDO) { crc_needed = true; ret = mt6360_xlate_pmicldo_addr(®_addr, val_size); @@ -453,13 +458,18 @@ static int mt6360_regmap_write(void *con struct mt6360_ddata *ddata = context; u8 bank = *(u8 *)val; u8 reg_addr = *(u8 *)(val + 1); - struct i2c_client *i2c = ddata->i2c[bank]; + struct i2c_client *i2c; bool crc_needed = false; u8 *buf; int buf_len = MT6360_ALLOC_WRITE_SIZE(val_size); int write_size = val_size - MT6360_REGMAP_REG_BYTE_SIZE; int ret;
+ if (bank >= MT6360_SLAVE_MAX) + return -EINVAL; + + i2c = ddata->i2c[bank]; + if (bank == MT6360_SLAVE_PMIC || bank == MT6360_SLAVE_LDO) { crc_needed = true; ret = mt6360_xlate_pmicldo_addr(®_addr, val_size - MT6360_REGMAP_REG_BYTE_SIZE);
From: Mikulas Patocka mpatocka@redhat.com
commit 341097ee53573e06ab9fc675d96a052385b851fa upstream.
There's a crash in mempool_free when running the lvm test shell/lvchange-rebuild-raid.sh.
The reason for the crash is this: * super_written calls atomic_dec_and_test(&mddev->pending_writes) and wake_up(&mddev->sb_wait). Then it calls rdev_dec_pending(rdev, mddev) and bio_put(bio). * so, the process that waited on sb_wait and that is woken up is racing with bio_put(bio). * if the process wins the race, it calls bioset_exit before bio_put(bio) is executed. * bio_put(bio) attempts to free a bio into a destroyed bio set - causing a crash in mempool_free.
We fix this bug by moving bio_put before atomic_dec_and_test.
We also move rdev_dec_pending before atomic_dec_and_test as suggested by Neil Brown.
The function md_end_flush has a similar bug - we must call bio_put before we decrement the number of in-progress bios.
BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 11557f0067 P4D 11557f0067 PUD 0 Oops: 0002 [#1] PREEMPT SMP CPU: 0 PID: 73 Comm: kworker/0:1 Not tainted 6.1.0-rc3 #5 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014 Workqueue: kdelayd flush_expired_bios [dm_delay] RIP: 0010:mempool_free+0x47/0x80 Code: 48 89 ef 5b 5d ff e0 f3 c3 48 89 f7 e8 32 45 3f 00 48 63 53 08 48 89 c6 3b 53 04 7d 2d 48 8b 43 10 8d 4a 01 48 89 df 89 4b 08 <48> 89 2c d0 e8 b0 45 3f 00 48 8d 7b 30 5b 5d 31 c9 ba 01 00 00 00 RSP: 0018:ffff88910036bda8 EFLAGS: 00010093 RAX: 0000000000000000 RBX: ffff8891037b65d8 RCX: 0000000000000001 RDX: 0000000000000000 RSI: 0000000000000202 RDI: ffff8891037b65d8 RBP: ffff8891447ba240 R08: 0000000000012908 R09: 00000000003d0900 R10: 0000000000000000 R11: 0000000000173544 R12: ffff889101a14000 R13: ffff8891562ac300 R14: ffff889102b41440 R15: ffffe8ffffa00d05 FS: 0000000000000000(0000) GS:ffff88942fa00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000000 CR3: 0000001102e99000 CR4: 00000000000006b0 Call Trace: <TASK> clone_endio+0xf4/0x1c0 [dm_mod] clone_endio+0xf4/0x1c0 [dm_mod] __submit_bio+0x76/0x120 submit_bio_noacct_nocheck+0xb6/0x2a0 flush_expired_bios+0x28/0x2f [dm_delay] process_one_work+0x1b4/0x300 worker_thread+0x45/0x3e0 ? rescuer_thread+0x380/0x380 kthread+0xc2/0x100 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Modules linked in: brd dm_delay dm_raid dm_mod af_packet uvesafb cfbfillrect cfbimgblt cn cfbcopyarea fb font fbdev tun autofs4 binfmt_misc configfs ipv6 virtio_rng virtio_balloon rng_core virtio_net pcspkr net_failover failover qemu_fw_cfg button mousedev raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq raid6_pq async_xor xor async_tx raid1 raid0 md_mod sd_mod t10_pi crc64_rocksoft crc64 virtio_scsi scsi_mod evdev psmouse bsg scsi_common [last unloaded: brd] CR2: 0000000000000000 ---[ end trace 0000000000000000 ]---
Signed-off-by: Mikulas Patocka mpatocka@redhat.com Cc: stable@vger.kernel.org Signed-off-by: Song Liu song@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/md/md.c | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-)
--- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -509,13 +509,14 @@ static void md_end_flush(struct bio *bio struct md_rdev *rdev = bio->bi_private; struct mddev *mddev = rdev->mddev;
+ bio_put(bio); + rdev_dec_pending(rdev, mddev);
if (atomic_dec_and_test(&mddev->flush_pending)) { /* The pre-request flush has finished */ queue_work(md_wq, &mddev->flush_work); } - bio_put(bio); }
static void md_submit_flush_data(struct work_struct *ws); @@ -913,10 +914,12 @@ static void super_written(struct bio *bi } else clear_bit(LastDev, &rdev->flags);
+ bio_put(bio); + + rdev_dec_pending(rdev, mddev); + if (atomic_dec_and_test(&mddev->pending_writes)) wake_up(&mddev->sb_wait); - rdev_dec_pending(rdev, mddev); - bio_put(bio); }
void md_super_write(struct mddev *mddev, struct md_rdev *rdev,
From: NARIBAYASHI Akira a.naribayashi@fujitsu.com
commit be21b32afe470c5ae98e27e49201158a47032942 upstream.
Depending on the memory configuration, isolate_freepages_block() may scan pages out of the target range and causes panic.
Panic can occur on systems with multiple zones in a single pageblock.
The reason it is rare is that it only happens in special configurations. Depending on how many similar systems there are, it may be a good idea to fix this problem for older kernels as well.
The problem is that pfn as argument of fast_isolate_around() could be out of the target range. Therefore we should consider the case where pfn < start_pfn, and also the case where end_pfn < pfn.
This problem should have been addressd by the commit 6e2b7044c199 ("mm, compaction: make fast_isolate_freepages() stay within zone") but there was an oversight.
Case1: pfn < start_pfn
<at memory compaction for node Y> | node X's zone | node Y's zone +-----------------+------------------------------... pageblock ^ ^ ^ +-----------+-----------+-----------+-----------+... ^ ^ ^ ^ ^ end_pfn ^ start_pfn = cc->zone->zone_start_pfn pfn <---------> scanned range by "Scan After"
Case2: end_pfn < pfn
<at memory compaction for node X> | node X's zone | node Y's zone +-----------------+------------------------------... pageblock ^ ^ ^ +-----------+-----------+-----------+-----------+... ^ ^ ^ ^ ^ pfn ^ end_pfn start_pfn <---------> scanned range by "Scan Before"
It seems that there is no good reason to skip nr_isolated pages just after given pfn. So let perform simple scan from start to end instead of dividing the scan into "Before" and "After".
Link: https://lkml.kernel.org/r/20221026112438.236336-1-a.naribayashi@fujitsu.com Fixes: 6e2b7044c199 ("mm, compaction: make fast_isolate_freepages() stay within zone"). Signed-off-by: NARIBAYASHI Akira a.naribayashi@fujitsu.com Cc: David Rientjes rientjes@google.com Cc: Mel Gorman mgorman@techsingularity.net Cc: Vlastimil Babka vbabka@suse.cz Cc: stable@vger.kernel.org Signed-off-by: Andrew Morton akpm@linux-foundation.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- mm/compaction.c | 18 +++++------------- 1 file changed, 5 insertions(+), 13 deletions(-)
--- a/mm/compaction.c +++ b/mm/compaction.c @@ -1346,7 +1346,7 @@ move_freelist_tail(struct list_head *fre }
static void -fast_isolate_around(struct compact_control *cc, unsigned long pfn, unsigned long nr_isolated) +fast_isolate_around(struct compact_control *cc, unsigned long pfn) { unsigned long start_pfn, end_pfn; struct page *page; @@ -1367,21 +1367,13 @@ fast_isolate_around(struct compact_contr if (!page) return;
- /* Scan before */ - if (start_pfn != pfn) { - isolate_freepages_block(cc, &start_pfn, pfn, &cc->freepages, 1, false); - if (cc->nr_freepages >= cc->nr_migratepages) - return; - } - - /* Scan after */ - start_pfn = pfn + nr_isolated; - if (start_pfn < end_pfn) - isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false); + isolate_freepages_block(cc, &start_pfn, end_pfn, &cc->freepages, 1, false);
/* Skip this pageblock in the future as it's full or nearly full */ if (cc->nr_freepages < cc->nr_migratepages) set_pageblock_skip(page); + + return; }
/* Search orders in round-robin fashion */ @@ -1558,7 +1550,7 @@ fast_isolate_freepages(struct compact_co return cc->free_pfn;
low_pfn = page_to_pfn(page); - fast_isolate_around(cc, low_pfn, nr_isolated); + fast_isolate_around(cc, low_pfn); return low_pfn; }
From: Pavel Machek pavel@denx.de
commit c3db3c2fd9992c08f49aa93752d3c103c3a4f6aa upstream.
The commit introduces another bug.
Cc: stable@vger.kernel.org Fixes: c6ad7fd16657e ("f2fs: fix to do sanity check on summary info") Signed-off-by: Pavel Machek pavel@denx.de Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/gc.c | 1 + 1 file changed, 1 insertion(+)
--- a/fs/f2fs/gc.c +++ b/fs/f2fs/gc.c @@ -1109,6 +1109,7 @@ static bool is_alive(struct f2fs_sb_info if (ofs_in_node >= max_addrs) { f2fs_err(sbi, "Inconsistent ofs_in_node:%u in summary, ino:%u, nid:%u, max:%u", ofs_in_node, dni->ino, dni->nid, max_addrs); + f2fs_put_page(node_page, 1); return false; }
From: Jaegeuk Kim jaegeuk@kernel.org
commit e6ecb142429183cef4835f31d4134050ae660032 upstream.
If block address is still alive, we should give a valid node block even after shutdown. Otherwise, we can see zero data when reading out a file.
Cc: stable@vger.kernel.org Fixes: 83a3bfdb5a8a ("f2fs: indicate shutdown f2fs to allow unmount successfully") Reviewed-by: Chao Yu chao@kernel.org Signed-off-by: Jaegeuk Kim jaegeuk@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- fs/f2fs/node.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)
--- a/fs/f2fs/node.c +++ b/fs/f2fs/node.c @@ -1358,8 +1358,7 @@ static int read_node_page(struct page *p return err;
/* NEW_ADDR can be seen, after cp_error drops some dirty node pages */ - if (unlikely(ni.blk_addr == NULL_ADDR || ni.blk_addr == NEW_ADDR) || - is_sbi_flag_set(sbi, SBI_IS_SHUTDOWN)) { + if (unlikely(ni.blk_addr == NULL_ADDR || ni.blk_addr == NEW_ADDR)) { ClearPageUptodate(page); return -ENOENT; }
From: Jan Kara jack@suse.cz
commit 36369f46e91785688a5f39d7a5590e3f07981316 upstream.
Since commit 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") we allow rereading of partition table although there are users of the block device. This has an undesirable consequence that e.g. if sda and sdb are assembled to a RAID1 device md0 with partitions, BLKRRPART ioctl on sda will rescan partition table and create sda1 device. This partition device under a raid device confuses some programs (such as libstorage-ng used for initial partitioning for distribution installation) leading to failures.
Fix the problem refusing to rescan partitions if there is another user that has the block device exclusively open.
Cc: stable@vger.kernel.org Link: https://lore.kernel.org/all/20221130135344.2ul4cyfstfs3znxg@quack3 Fixes: 10c70d95c0f2 ("block: remove the bd_openers checks in blk_drop_partitions") Signed-off-by: Jan Kara jack@suse.cz Link: https://lore.kernel.org/r/20221130175653.24299-1-jack@suse.cz [axboe: fold in followup fix] Signed-off-by: Jens Axboe axboe@kernel.dk Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- block/blk.h | 2 +- block/genhd.c | 7 +++++-- block/ioctl.c | 12 +++++++----- 3 files changed, 13 insertions(+), 8 deletions(-)
--- a/block/blk.h +++ b/block/blk.h @@ -429,7 +429,7 @@ static inline struct kmem_cache *blk_get } struct request_queue *blk_alloc_queue(int node_id, bool alloc_srcu);
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode); +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner);
int disk_alloc_events(struct gendisk *disk); void disk_add_events(struct gendisk *disk); --- a/block/genhd.c +++ b/block/genhd.c @@ -356,7 +356,7 @@ void disk_uevent(struct gendisk *disk, e } EXPORT_SYMBOL_GPL(disk_uevent);
-int disk_scan_partitions(struct gendisk *disk, fmode_t mode) +int disk_scan_partitions(struct gendisk *disk, fmode_t mode, void *owner) { struct block_device *bdev;
@@ -366,6 +366,9 @@ int disk_scan_partitions(struct gendisk return -EINVAL; if (disk->open_partitions) return -EBUSY; + /* Someone else has bdev exclusively open? */ + if (disk->part0->bd_holder && disk->part0->bd_holder != owner) + return -EBUSY;
set_bit(GD_NEED_PART_SCAN, &disk->state); bdev = blkdev_get_by_dev(disk_devt(disk), mode, NULL); @@ -499,7 +502,7 @@ int __must_check device_add_disk(struct
bdev_add(disk->part0, ddev->devt); if (get_capacity(disk)) - disk_scan_partitions(disk, FMODE_READ); + disk_scan_partitions(disk, FMODE_READ, NULL);
/* * Announce the disk and partitions after all partitions are --- a/block/ioctl.c +++ b/block/ioctl.c @@ -467,9 +467,10 @@ static int blkdev_bszset(struct block_de * user space. Note the separate arg/argp parameters that are needed * to deal with the compat_ptr() conversion. */ -static int blkdev_common_ioctl(struct block_device *bdev, fmode_t mode, - unsigned cmd, unsigned long arg, void __user *argp) +static int blkdev_common_ioctl(struct file *file, fmode_t mode, unsigned cmd, + unsigned long arg, void __user *argp) { + struct block_device *bdev = I_BDEV(file->f_mapping->host); unsigned int max_sectors;
switch (cmd) { @@ -527,7 +528,8 @@ static int blkdev_common_ioctl(struct bl return -EACCES; if (bdev_is_partition(bdev)) return -EINVAL; - return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL); + return disk_scan_partitions(bdev->bd_disk, mode & ~FMODE_EXCL, + file); case BLKTRACESTART: case BLKTRACESTOP: case BLKTRACETEARDOWN: @@ -605,7 +607,7 @@ long blkdev_ioctl(struct file *file, uns break; }
- ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret != -ENOIOCTLCMD) return ret;
@@ -674,7 +676,7 @@ long compat_blkdev_ioctl(struct file *fi break; }
- ret = blkdev_common_ioctl(bdev, mode, cmd, arg, argp); + ret = blkdev_common_ioctl(file, mode, cmd, arg, argp); if (ret == -ENOIOCTLCMD && disk->fops->compat_ioctl) ret = disk->fops->compat_ioctl(bdev, mode, cmd, arg);
From: Deren Wu deren.wu@mediatek.com
commit 4a44cd249604e29e7b90ae796d7692f5773dd348 upstream.
vub300_enable_sdio_irq() works with mutex and need TASK_RUNNING here. Ensure that we mark current as TASK_RUNNING for sleepable context.
[ 77.554641] do not call blocking ops when !TASK_RUNNING; state=1 set at [<ffffffff92a72c1d>] sdio_irq_thread+0x17d/0x5b0 [ 77.554652] WARNING: CPU: 2 PID: 1983 at kernel/sched/core.c:9813 __might_sleep+0x116/0x160 [ 77.554905] CPU: 2 PID: 1983 Comm: ksdioirqd/mmc1 Tainted: G OE 6.1.0-rc5 #1 [ 77.554910] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, BIOS BECFL357.86A.0081.2020.0504.1834 05/04/2020 [ 77.554912] RIP: 0010:__might_sleep+0x116/0x160 [ 77.554920] RSP: 0018:ffff888107b7fdb8 EFLAGS: 00010282 [ 77.554923] RAX: 0000000000000000 RBX: ffff888118c1b740 RCX: 0000000000000000 [ 77.554926] RDX: 0000000000000001 RSI: 0000000000000004 RDI: ffffed1020f6ffa9 [ 77.554928] RBP: ffff888107b7fde0 R08: 0000000000000001 R09: ffffed1043ea60ba [ 77.554930] R10: ffff88821f5305cb R11: ffffed1043ea60b9 R12: ffffffff93aa3a60 [ 77.554932] R13: 000000000000011b R14: 7fffffffffffffff R15: ffffffffc0558660 [ 77.554934] FS: 0000000000000000(0000) GS:ffff88821f500000(0000) knlGS:0000000000000000 [ 77.554937] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 77.554939] CR2: 00007f8a44010d68 CR3: 000000024421a003 CR4: 00000000003706e0 [ 77.554942] Call Trace: [ 77.554944] <TASK> [ 77.554952] mutex_lock+0x78/0xf0 [ 77.554973] vub300_enable_sdio_irq+0x103/0x3c0 [vub300] [ 77.554981] sdio_irq_thread+0x25c/0x5b0 [ 77.555006] kthread+0x2b8/0x370 [ 77.555017] ret_from_fork+0x1f/0x30 [ 77.555023] </TASK> [ 77.555025] ---[ end trace 0000000000000000 ]---
Fixes: 88095e7b473a ("mmc: Add new VUB300 USB-to-SD/SDIO/MMC driver") Signed-off-by: Deren Wu deren.wu@mediatek.com Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/87dc45b122d26d63c80532976813c9365d7160b3.167014088... Signed-off-by: Ulf Hansson ulf.hansson@linaro.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/mmc/host/vub300.c | 2 ++ 1 file changed, 2 insertions(+)
--- a/drivers/mmc/host/vub300.c +++ b/drivers/mmc/host/vub300.c @@ -2049,6 +2049,7 @@ static void vub300_enable_sdio_irq(struc return; kref_get(&vub300->kref); if (enable) { + set_current_state(TASK_RUNNING); mutex_lock(&vub300->irq_mutex); if (vub300->irqs_queued) { vub300->irqs_queued -= 1; @@ -2064,6 +2065,7 @@ static void vub300_enable_sdio_irq(struc vub300_queue_poll_work(vub300, 0); } mutex_unlock(&vub300->irq_mutex); + set_current_state(TASK_INTERRUPTIBLE); } else { vub300->irq_enabled = 0; }
From: Hanjun Guo guohanjun@huawei.com
commit 8740a12ca2e2959531ad253bac99ada338b33d80 upstream.
The start and length of the event log area are obtained from TPM2 or TCPA table, so we call acpi_get_table() to get the ACPI information, but the acpi_get_table() should be coupled with acpi_put_table() to release the ACPI memory, add the acpi_put_table() properly to fix the memory leak.
While we are at it, remove the redundant empty line at the end of the tpm_read_log_acpi().
Fixes: 0bfb23746052 ("tpm: Move eventlog files to a subdirectory") Fixes: 85467f63a05c ("tpm: Add support for event log pointer found in TPM2 ACPI table") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/eventlog/acpi.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)
--- a/drivers/char/tpm/eventlog/acpi.c +++ b/drivers/char/tpm/eventlog/acpi.c @@ -90,16 +90,21 @@ int tpm_read_log_acpi(struct tpm_chip *c return -ENODEV;
if (tbl->header.length < - sizeof(*tbl) + sizeof(struct acpi_tpm2_phy)) + sizeof(*tbl) + sizeof(struct acpi_tpm2_phy)) { + acpi_put_table((struct acpi_table_header *)tbl); return -ENODEV; + }
tpm2_phy = (void *)tbl + sizeof(*tbl); len = tpm2_phy->log_area_minimum_length;
start = tpm2_phy->log_area_start_address; - if (!start || !len) + if (!start || !len) { + acpi_put_table((struct acpi_table_header *)tbl); return -ENODEV; + }
+ acpi_put_table((struct acpi_table_header *)tbl); format = EFI_TCG2_EVENT_LOG_FORMAT_TCG_2; } else { /* Find TCPA entry in RSDT (ACPI_LOGICAL_ADDRESSING) */ @@ -120,8 +125,10 @@ int tpm_read_log_acpi(struct tpm_chip *c break; }
+ acpi_put_table((struct acpi_table_header *)buff); format = EFI_TCG2_EVENT_LOG_FORMAT_TCG_1_2; } + if (!len) { dev_warn(&chip->dev, "%s: TCPA log area empty\n", __func__); return -EIO; @@ -156,5 +163,4 @@ err: kfree(log->bios_event_log); log->bios_event_log = NULL; return ret; - }
From: Hanjun Guo guohanjun@huawei.com
commit 37e90c374dd11cf4919c51e847c6d6ced0abc555 upstream.
In crb_acpi_add(), we get the TPM2 table to retrieve information like start method, and then assign them to the priv data, so the TPM2 table is not used after the init, should be freed, call acpi_put_table() to fix the memory leak.
Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Reviewed-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_crb.c | 29 ++++++++++++++++++++--------- 1 file changed, 20 insertions(+), 9 deletions(-)
--- a/drivers/char/tpm/tpm_crb.c +++ b/drivers/char/tpm/tpm_crb.c @@ -676,12 +676,16 @@ static int crb_acpi_add(struct acpi_devi
/* Should the FIFO driver handle this? */ sm = buf->start_method; - if (sm == ACPI_TPM2_MEMORY_MAPPED) - return -ENODEV; + if (sm == ACPI_TPM2_MEMORY_MAPPED) { + rc = -ENODEV; + goto out; + }
priv = devm_kzalloc(dev, sizeof(struct crb_priv), GFP_KERNEL); - if (!priv) - return -ENOMEM; + if (!priv) { + rc = -ENOMEM; + goto out; + }
if (sm == ACPI_TPM2_COMMAND_BUFFER_WITH_ARM_SMC) { if (buf->header.length < (sizeof(*buf) + sizeof(*crb_smc))) { @@ -689,7 +693,8 @@ static int crb_acpi_add(struct acpi_devi FW_BUG "TPM2 ACPI table has wrong size %u for start method type %d\n", buf->header.length, ACPI_TPM2_COMMAND_BUFFER_WITH_ARM_SMC); - return -EINVAL; + rc = -EINVAL; + goto out; } crb_smc = ACPI_ADD_PTR(struct tpm2_crb_smc, buf, sizeof(*buf)); priv->smc_func_id = crb_smc->smc_func_id; @@ -700,17 +705,23 @@ static int crb_acpi_add(struct acpi_devi
rc = crb_map_io(device, priv, buf); if (rc) - return rc; + goto out;
chip = tpmm_chip_alloc(dev, &tpm_crb); - if (IS_ERR(chip)) - return PTR_ERR(chip); + if (IS_ERR(chip)) { + rc = PTR_ERR(chip); + goto out; + }
dev_set_drvdata(&chip->dev, priv); chip->acpi_dev_handle = device->handle; chip->flags = TPM_CHIP_FLAG_TPM2;
- return tpm_chip_register(chip); + rc = tpm_chip_register(chip); + +out: + acpi_put_table((struct acpi_table_header *)buf); + return rc; }
static int crb_acpi_remove(struct acpi_device *device)
From: Hanjun Guo guohanjun@huawei.com
commit db9622f762104459ff87ecdf885cc42c18053fd9 upstream.
In check_acpi_tpm2(), we get the TPM2 table just to make sure the table is there, not used after the init, so the acpi_put_table() should be added to release the ACPI memory.
Fixes: 4cb586a188d4 ("tpm_tis: Consolidate the platform and acpi probe flow") Cc: stable@vger.kernel.org Signed-off-by: Hanjun Guo guohanjun@huawei.com Signed-off-by: Jarkko Sakkinen jarkko@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- drivers/char/tpm/tpm_tis.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)
--- a/drivers/char/tpm/tpm_tis.c +++ b/drivers/char/tpm/tpm_tis.c @@ -125,6 +125,7 @@ static int check_acpi_tpm2(struct device const struct acpi_device_id *aid = acpi_match_device(tpm_acpi_tbl, dev); struct acpi_table_tpm2 *tbl; acpi_status st; + int ret = 0;
if (!aid || aid->driver_data != DEVICE_IS_TPM2) return 0; @@ -132,8 +133,7 @@ static int check_acpi_tpm2(struct device /* If the ACPI TPM2 signature is matched then a global ACPI_SIG_TPM2 * table is mandatory */ - st = - acpi_get_table(ACPI_SIG_TPM2, 1, (struct acpi_table_header **)&tbl); + st = acpi_get_table(ACPI_SIG_TPM2, 1, (struct acpi_table_header **)&tbl); if (ACPI_FAILURE(st) || tbl->header.length < sizeof(*tbl)) { dev_err(dev, FW_BUG "failed to get TPM2 ACPI table\n"); return -EINVAL; @@ -141,9 +141,10 @@ static int check_acpi_tpm2(struct device
/* The tpm2_crb driver handles this device */ if (tbl->start_method != ACPI_TPM2_MEMORY_MAPPED) - return -ENODEV; + ret = -ENODEV;
- return 0; + acpi_put_table((struct acpi_table_header *)tbl); + return ret; } #else static int check_acpi_tpm2(struct device *dev)
From: Chuck Lever chuck.lever@oracle.com
commit da522b5fe1a5f8b7c20a0023e87b52a150e53bf5 upstream.
Fixes: 030d794bf498 ("SUNRPC: Use gssproxy upcall for server RPCGSS authentication.") Signed-off-by: Chuck Lever chuck.lever@oracle.com Cc: stable@vger.kernel.org Reviewed-by: Jeff Layton jlayton@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- net/sunrpc/auth_gss/svcauth_gss.c | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-)
--- a/net/sunrpc/auth_gss/svcauth_gss.c +++ b/net/sunrpc/auth_gss/svcauth_gss.c @@ -1162,18 +1162,23 @@ static int gss_read_proxy_verf(struct sv return res;
inlen = svc_getnl(argv); - if (inlen > (argv->iov_len + rqstp->rq_arg.page_len)) + if (inlen > (argv->iov_len + rqstp->rq_arg.page_len)) { + kfree(in_handle->data); return SVC_DENIED; + }
pages = DIV_ROUND_UP(inlen, PAGE_SIZE); in_token->pages = kcalloc(pages, sizeof(struct page *), GFP_KERNEL); - if (!in_token->pages) + if (!in_token->pages) { + kfree(in_handle->data); return SVC_DENIED; + } in_token->page_base = 0; in_token->page_len = inlen; for (i = 0; i < pages; i++) { in_token->pages[i] = alloc_page(GFP_KERNEL); if (!in_token->pages[i]) { + kfree(in_handle->data); gss_free_in_token_pages(in_token); return SVC_DENIED; }
From: Marco Elver elver@google.com
commit 7c201739beef1a586d806463f1465429cdce34c5 upstream.
With Clang version 16+, -fsanitize=thread will turn memcpy/memset/memmove calls in instrumented functions into __tsan_memcpy/__tsan_memset/__tsan_memmove calls respectively.
Add these functions to the core KCSAN runtime, so that we (a) catch data races with mem* functions, and (b) won't run into linker errors with such newer compilers.
Cc: stable@vger.kernel.org # v5.10+ Signed-off-by: Marco Elver elver@google.com Signed-off-by: Paul E. McKenney paulmck@kernel.org Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org --- kernel/kcsan/core.c | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+)
--- a/kernel/kcsan/core.c +++ b/kernel/kcsan/core.c @@ -14,10 +14,12 @@ #include <linux/init.h> #include <linux/kernel.h> #include <linux/list.h> +#include <linux/minmax.h> #include <linux/moduleparam.h> #include <linux/percpu.h> #include <linux/preempt.h> #include <linux/sched.h> +#include <linux/string.h> #include <linux/uaccess.h>
#include "encoding.h" @@ -1308,3 +1310,51 @@ noinline void __tsan_atomic_signal_fence } } EXPORT_SYMBOL(__tsan_atomic_signal_fence); + +#ifdef __HAVE_ARCH_MEMSET +void *__tsan_memset(void *s, int c, size_t count); +noinline void *__tsan_memset(void *s, int c, size_t count) +{ + /* + * Instead of not setting up watchpoints where accessed size is greater + * than MAX_ENCODABLE_SIZE, truncate checked size to MAX_ENCODABLE_SIZE. + */ + size_t check_len = min_t(size_t, count, MAX_ENCODABLE_SIZE); + + check_access(s, check_len, KCSAN_ACCESS_WRITE, _RET_IP_); + return memset(s, c, count); +} +#else +void *__tsan_memset(void *s, int c, size_t count) __alias(memset); +#endif +EXPORT_SYMBOL(__tsan_memset); + +#ifdef __HAVE_ARCH_MEMMOVE +void *__tsan_memmove(void *dst, const void *src, size_t len); +noinline void *__tsan_memmove(void *dst, const void *src, size_t len) +{ + size_t check_len = min_t(size_t, len, MAX_ENCODABLE_SIZE); + + check_access(dst, check_len, KCSAN_ACCESS_WRITE, _RET_IP_); + check_access(src, check_len, 0, _RET_IP_); + return memmove(dst, src, len); +} +#else +void *__tsan_memmove(void *dst, const void *src, size_t len) __alias(memmove); +#endif +EXPORT_SYMBOL(__tsan_memmove); + +#ifdef __HAVE_ARCH_MEMCPY +void *__tsan_memcpy(void *dst, const void *src, size_t len); +noinline void *__tsan_memcpy(void *dst, const void *src, size_t len) +{ + size_t check_len = min_t(size_t, len, MAX_ENCODABLE_SIZE); + + check_access(dst, check_len, KCSAN_ACCESS_WRITE, _RET_IP_); + check_access(src, check_len, 0, _RET_IP_); + return memcpy(dst, src, len); +} +#else +void *__tsan_memcpy(void *dst, const void *src, size_t len) __alias(memcpy); +#endif +EXPORT_SYMBOL(__tsan_memcpy);
On 1/2/23 04:21, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my test system. No dmesg regressions.
Tested-by: Shuah Khan skhan@linuxfoundation.org
thanks, -- Shuah
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
Build results: total: 155 pass: 155 fail: 0 Qemu test results: total: 500 pass: 500 fail: 0
Tested-by: Guenter Roeck linux@roeck-us.net
Guenter
On Mon, 2 Jan 2023 at 16:58, Greg Kroah-Hartman gregkh@linuxfoundation.org wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386.
Tested-by: Linux Kernel Functional Testing lkft@linaro.org
## Build * kernel: 6.0.17-rc1 * git: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git * git branch: linux-6.0.y * git commit: 9c0ac88985a8f62726941213c92ff8eddf500f72 * git describe: v6.0.16-75-g9c0ac88985a8 * test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-linux-6.0.y/build/v6.0.16...
## Test Regressions (compared to v6.0.15-1067-gf54b936f8ec7)
## Metric Regressions (compared to v6.0.15-1067-gf54b936f8ec7)
## Test Fixes (compared to v6.0.15-1067-gf54b936f8ec7)
## Metric Fixes (compared to v6.0.15-1067-gf54b936f8ec7)
## Test result summary total: 143234, pass: 127347, fail: 2737, skip: 12854, xfail: 296
## Build Summary * arc: 5 total, 5 passed, 0 failed * arm: 151 total, 146 passed, 5 failed * arm64: 49 total, 49 passed, 0 failed * i386: 39 total, 36 passed, 3 failed * mips: 30 total, 28 passed, 2 failed * parisc: 8 total, 8 passed, 0 failed * powerpc: 38 total, 32 passed, 6 failed * riscv: 16 total, 16 passed, 0 failed * s390: 16 total, 16 passed, 0 failed * sh: 14 total, 12 passed, 2 failed * sparc: 8 total, 8 passed, 0 failed * x86_64: 42 total, 41 passed, 1 failed
## Test suites summary * boot * fwts * igt-gpu-tools * kselftest-android * kselftest-arm64 * kselftest-breakpoints * kselftest-capabilities * kselftest-cgroup * kselftest-clone3 * kselftest-core * kselftest-cpu-hotplug * kselftest-cpufreq * kselftest-drivers-dma-buf * kselftest-efivarfs * kselftest-filesystems * kselftest-filesystems-binderfs * kselftest-firmware * kselftest-fpu * kselftest-futex * kselftest-gpio * kselftest-intel_pstate * kselftest-ipc * kselftest-ir * kselftest-kcmp * kselftest-kvm * kselftest-lib * kselftest-livepatch * kselftest-membarrier * kselftest-memfd * kselftest-memory-hotplug * kselftest-mincore * kselftest-mount * kselftest-mqueue * kselftest-net * kselftest-net-forwarding * kselftest-net-mptcp * kselftest-netfilter * kselftest-nsfs * kselftest-openat2 * kselftest-pid_namespace * kselftest-pidfd * kselftest-proc * kselftest-ptrace * kselftest-rseq * kselftest-rtc * kselftest-seccomp * kselftest-sigaltstack * kselftest-size * kselftest-splice * kselftest-static_keys * kselftest-sync * kselftest-sysctl * kselftest-tc-testing * kselftest-timens * kselftest-timers * kselftest-tmpfs * kselftest-tpm2 * kselftest-user * kselftest-vm * kselftest-x86 * kselftest-zram * kunit * kvm-unit-tests * libgpiod * libhugetlbfs * log-parser-boot * log-parser-test * ltp-cap_bounds * ltp-commands * ltp-containers * ltp-controllers * ltp-cpuhotplug * ltp-crypto * ltp-cve * ltp-dio * ltp-fcntl-locktests * ltp-filecaps * ltp-fs * ltp-fs_bind * ltp-fs_perms_simple * ltp-fsx * ltp-hugetlb * ltp-io * ltp-ipc * ltp-math * ltp-mm * ltp-nptl * ltp-open-posix-tests * ltp-pty * ltp-sched * ltp-securebits * ltp-smoke * ltp-syscalls * ltp-tracing * network-basic-tests * packetdrill * perf * rcutorture * v4l2-compliance * vdso
-- Linaro LKFT https://lkft.linaro.org
Hi Greg,
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
Build test (gcc version 12.2.1 20221127): mips: 52 configs -> no failure arm: 100 configs -> no failure arm64: 3 configs -> no failure x86_64: 4 configs -> no failure alpha allmodconfig -> no failure csky allmodconfig -> no failure powerpc allmodconfig -> no failure riscv allmodconfig -> no failure s390 allmodconfig -> no failure xtensa allmodconfig -> no failure
Boot test: x86_64: Booted on my test laptop. No regression. x86_64: Booted on qemu. No regression. [1] arm64: Booted on rpi4b (4GB model). No regression. [2]
mips: Booted on ci20 board. VNC server did not start, will try a bisect later tonight.
[1]. https://openqa.qa.codethink.co.uk/tests/2541 [2]. https://openqa.qa.codethink.co.uk/tests/2545
Tested-by: Sudip Mukherjee sudip.mukherjee@codethink.co.uk
On Tue, 3 Jan 2023 at 10:37, Sudip Mukherjee (Codethink) sudipm.mukherjee@gmail.com wrote:
Hi Greg,
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
<snip>
mips: Booted on ci20 board. VNC server did not start, will try a bisect later tonight.
bisect pointed to 2575eebf1bd2 ("net: Return errno in sk->sk_prot->get_port().") introduced in v6.0.16 Reverting it on top of v6.0.16 and 6.0.17-rc1 fixed the problem.
This is also in v6.1.y but the issue is not seen there, and I am trying to figure out why.
On Tue, Jan 03, 2023 at 10:59:58PM +0000, Sudip Mukherjee wrote:
On Tue, 3 Jan 2023 at 10:37, Sudip Mukherjee (Codethink) sudipm.mukherjee@gmail.com wrote:
Hi Greg,
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
<snip>
mips: Booted on ci20 board. VNC server did not start, will try a bisect later tonight.
bisect pointed to 2575eebf1bd2 ("net: Return errno in sk->sk_prot->get_port().") introduced in v6.0.16 Reverting it on top of v6.0.16 and 6.0.17-rc1 fixed the problem.
This is also in v6.1.y but the issue is not seen there, and I am trying to figure out why.
As there's probably only going to be one more 6.0.y release before it is end-of-life, if you can't figure it out, not that big of a deal :)
thanks,
greg k-h
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Successfully cross-compiled for arm64 (bcm2711_defconfig, GCC 10.2.0) and powerpc (ps3_defconfig, GCC 12.2.0).
Tested-by: Bagas Sanjaya bagasdotme@gmail.com
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Compiled and booted on my x86_64 and ARM64 test systems. No errors or regressions.
Tested-by: Allen Pais apais@linux.microsoft.com
Thanks.
On 1/2/23 03:21, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
On ARCH_BRCMSTB using 32-bit and 64-bit ARM kernels, build tested on BMIPS_GENERIC:
Tested-by: Florian Fainelli f.fainelli@gmail.com
On 1/2/23 3:21 AM, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Built and booted successfully on RISC-V RV64 (HiFive Unmatched).
Tested-by: Ron Economos re@w6rz.net
On Mon, Jan 02, 2023 at 12:21:33PM +0100, Greg Kroah-Hartman wrote:
This is the start of the stable review cycle for the 6.0.17 release. There are 74 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know.
Responses should be made by Wed, 04 Jan 2023 11:05:34 +0000. Anything received after that time might be too late.
The whole patch series can be found in one patch at: https://www.kernel.org/pub/linux/kernel/v6.x/stable-review/patch-6.0.17-rc1.... or in the git tree and branch at: git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-6.0.y and the diffstat can be found below.
thanks,
greg k-h
Tested rc1 against the Fedora build system (aarch64, armv7, ppc64le, s390x, x86_64), and boot tested x86_64. No regressions noted.
Tested-by: Justin M. Forbes jforbes@fedoraproject.org
linux-stable-mirror@lists.linaro.org