February 2024 - Linux-stable-mirror

[PATCH v3 1/2] mmc: xenon: fix PHY init clock stability

by Elad Nachman

From: Elad Nachman <enachman(a)marvell.com> Each time SD/mmc phy is initialized, at times, in some of the attempts, phy fails to completes its initialization which results into timeout error. Per the HW spec, it is a pre-requisite to ensure a stable SD clock before a phy initialization is attempted. Fixes: 06c8b667ff5b ("mmc: sdhci-xenon: Add support to PHYs of Marvell Xenon SDHC") Acked-by: Adrian Hunter <adrian.hunter(a)intel.com> Cc: stable(a)vger.kernel.org Signed-off-by: Elad Nachman <enachman(a)marvell.com> --- drivers/mmc/host/sdhci-xenon-phy.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/drivers/mmc/host/sdhci-xenon-phy.c b/drivers/mmc/host/sdhci-xenon-phy.c index 8cf3a375de65..c3096230a969 100644 --- a/drivers/mmc/host/sdhci-xenon-phy.c +++ b/drivers/mmc/host/sdhci-xenon-phy.c @@ -11,6 +11,7 @@ #include <linux/slab.h> #include <linux/delay.h> #include <linux/ktime.h> +#include <linux/iopoll.h> #include <linux/of_address.h> #include "sdhci-pltfm.h" @@ -216,6 +217,19 @@ static int xenon_alloc_emmc_phy(struct sdhci_host *host) return 0; } +static int xenon_check_stability_internal_clk(struct sdhci_host *host) +{ + u32 reg; + int err; + + err = read_poll_timeout(sdhci_readw, reg, reg & SDHCI_CLOCK_INT_STABLE, + 1100, 20000, false, host, SDHCI_CLOCK_CONTROL); + if (err) + dev_err(mmc_dev(host->mmc), "phy_init: Internal clock never stabilized.\n"); + + return err; +} + /* * eMMC 5.0/5.1 PHY init/re-init. * eMMC PHY init should be executed after: @@ -232,6 +246,11 @@ static int xenon_emmc_phy_init(struct sdhci_host *host) struct xenon_priv *priv = sdhci_pltfm_priv(pltfm_host); struct xenon_emmc_phy_regs *phy_regs = priv->emmc_phy_regs; + int ret = xenon_check_stability_internal_clk(host); + + if (ret) + return ret; + reg = sdhci_readl(host, phy_regs->timing_adj); reg |= XENON_PHY_INITIALIZAION; sdhci_writel(host, reg, phy_regs->timing_adj); -- 2.25.1

1 year, 4 months

1
0
0 0

+ mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch added to mm-hotfixes-unstable branch

by Andrew Morton

The patch titled Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close has been added to the -mm mm-hotfixes-unstable branch. Its filename is mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patche… This patch will later appear in the mm-hotfixes-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Vlastimil Babka <vbabka(a)suse.cz> Subject: mm, mmap: fix vma_merge() case 7 with vma_ops->close Date: Thu, 22 Feb 2024 17:55:50 +0100 When debugging issues with a workload using SysV shmem, Michal Hocko has come up with a reproducer that shows how a series of mprotect() operations can result in an elevated shm_nattch and thus leak of the resource. The problem is caused by wrong assumptions in vma_merge() commit 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test"). The shmem vmas have a vma_ops->close callback that decrements shm_nattch, and we remove the vma without calling it. vma_merge() has thus historically avoided merging vma's with vma_ops->close and commit 714965ca8252 was supposed to keep it that way. It relaxed the checks for vma_ops->close in can_vma_merge_after() assuming that it is never called on a vma that would be a candidate for removal. However, the vma_merge() code does also use the result of this check in the decision to remove a different vma in the merge case 7. A robust solution would be to refactor vma_merge() code in a way that the vma_ops->close check is only done for vma's that are actually going to be removed, and not as part of the preliminary checks. That would both solve the existing bug, and also allow additional merges that the checks currently prevent unnecessarily in some cases. However to fix the existing bug first with a minimized risk, and for easier stable backports, this patch only adds a vma_ops->close check to the buggy case 7 specifically. All other cases of vma removal are covered by the can_vma_merge_before() check that includes the test for vma_ops->close. The reproducer code, adapted from Michal Hocko's code: int main(int argc, char *argv[]) { int segment_id; size_t segment_size = 20 * PAGE_SIZE; char * sh_mem; struct shmid_ds shmid_ds; key_t key = 0x1234; segment_id = shmget(key, segment_size, IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR); sh_mem = (char *)shmat(segment_id, NULL, 0); mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_NONE); mprotect(sh_mem + PAGE_SIZE, PAGE_SIZE, PROT_WRITE); mprotect(sh_mem + 2*PAGE_SIZE, PAGE_SIZE, PROT_WRITE); shmdt(sh_mem); shmctl(segment_id, IPC_STAT, &shmid_ds); printf("nattch after shmdt(): %lu (expected: 0)\n", shmid_ds.shm_nattch); if (shmctl(segment_id, IPC_RMID, 0)) printf("IPCRM failed %d\n", errno); return (shmid_ds.shm_nattch) ? 1 : 0; } Link: https://lkml.kernel.org/r/20240222165549.32753-2-vbabka@suse.cz Fixes: 714965ca8252 ("mm/mmap: start distinguishing if vma can be removed in mergeability test") Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz> Reported-by: Michal Hocko <mhocko(a)suse.com> Cc: Liam R. Howlett <Liam.Howlett(a)oracle.com> Cc: Lorenzo Stoakes <lstoakes(a)gmail.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org> --- mm/mmap.c | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) --- a/mm/mmap.c~mm-mmap-fix-vma_merge-case-7-with-vma_ops-close +++ a/mm/mmap.c @@ -954,10 +954,19 @@ static struct vm_area_struct } else if (merge_prev) { /* case 2 */ if (curr) { vma_start_write(curr); - err = dup_anon_vma(prev, curr, &anon_dup); if (end == curr->vm_end) { /* case 7 */ + /* + * can_vma_merge_after() assumed we would not be + * removing prev vma, so it skipped the check + * for vm_ops->close, but we are removing curr + */ + if (curr->vm_ops && curr->vm_ops->close) + err = -EINVAL; + else + err = dup_anon_vma(prev, curr, &anon_dup); remove = curr; } else { /* case 5 */ + err = dup_anon_vma(prev, curr, &anon_dup); adjust = curr; adj_start = (end - curr->vm_start); } _ Patches currently in -mm which might be from vbabka(a)suse.cz are mm-vmscan-prevent-infinite-loop-for-costly-gfp_noio-__gfp_retry_mayfail-allocations.patch mm-mmap-fix-vma_merge-case-7-with-vma_ops-close.patch

1 year, 4 months

1
0
0 0

[PATCH 0/3] sched/rt fixes for 4.19

by Petr Vorel

Hi, maybe you will not like introducing 'static int int_max = INT_MAX;' for this old kernel which EOL in 10 months. Cyril Hrubis (3): sched/rt: Fix sysctl_sched_rr_timeslice intial value sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset sched/rt: Disallow writing invalid values to sched_rt_period_us kernel/sched/rt.c | 10 +++++----- kernel/sysctl.c | 5 +++++ 2 files changed, 10 insertions(+), 5 deletions(-) -- 2.35.3

1 year, 4 months

2
16
0 0

[PATCH 6.6 v2 1/1] sched/rt: Disallow writing invalid values to sched_rt_period_us

by Petr Vorel

From: Cyril Hrubis <chrubis(a)suse.cz> [ Upstream commit 079be8fc630943d9fc70a97807feb73d169ee3fc ] The validation of the value written to sched_rt_period_us was broken because: - the sysclt_sched_rt_period is declared as unsigned int - parsed by proc_do_intvec() - the range is asserted after the value parsed by proc_do_intvec() Because of this negative values written to the file were written into a unsigned integer that were later on interpreted as large positive integers which did passed the check: if (sysclt_sched_rt_period <= 0) return EINVAL; This commit fixes the parsing by setting explicit range for both perid_us and runtime_us into the sched_rt_sysctls table and processes the values with proc_dointvec_minmax() instead. Alternatively if we wanted to use full range of unsigned int for the period value we would have to split the proc_handler and use proc_douintvec() for it however even the Documentation/scheduller/sched-rt-group.rst describes the range as 1 to INT_MAX. As far as I can tell the only problem this causes is that the sysctl file allows writing negative values which when read back may confuse userspace. There is also a LTP test being submitted for these sysctl files at: http://patchwork.ozlabs.org/project/ltp/patch/20230901144433.2526-1-chrubis… Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz> Signed-off-by: Ingo Molnar <mingo(a)kernel.org> Link: https://lore.kernel.org/r/20231002115553.3007-2-chrubis@suse.cz Signed-off-by: Petr Vorel <pvorel(a)suse.cz> --- kernel/sched/rt.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 904dd8534597..4ac36eb4cdee 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -37,6 +37,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(unsigned int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rt_runtime_us", @@ -44,6 +46,8 @@ static struct ctl_table sched_rt_sysctls[] = { .maxlen = sizeof(int), .mode = 0644, .proc_handler = sched_rt_handler, + .extra1 = SYSCTL_NEG_ONE, + .extra2 = SYSCTL_INT_MAX, }, { .procname = "sched_rr_timeslice_ms", @@ -2989,9 +2993,6 @@ static int sched_rt_global_constraints(void) #ifdef CONFIG_SYSCTL static int sched_rt_global_validate(void) { - if (sysctl_sched_rt_period <= 0) - return -EINVAL; - if ((sysctl_sched_rt_runtime != RUNTIME_INF) && ((sysctl_sched_rt_runtime > sysctl_sched_rt_period) || ((u64)sysctl_sched_rt_runtime * @@ -3022,7 +3023,7 @@ static int sched_rt_handler(struct ctl_table *table, int write, void *buffer, old_period = sysctl_sched_rt_period; old_runtime = sysctl_sched_rt_runtime; - ret = proc_dointvec(table, write, buffer, lenp, ppos); + ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos); if (!ret && write) { ret = sched_rt_global_validate(); -- 2.35.3

1 year, 4 months

1
0
0 0

[PATCH 6.1 v2 1/2] sched/rt: sysctl_sched_rr_timeslice show default timeslice after reset

by Petr Vorel

From: Cyril Hrubis <chrubis(a)suse.cz> [ Upstream commit c1fc6484e1fb7cc2481d169bfef129a1b0676abe ] The sched_rr_timeslice can be reset to default by writing value that is <= 0. However after reading from this file we always got the last value written, which is not useful at all. $ echo -1 > /proc/sys/kernel/sched_rr_timeslice_ms $ cat /proc/sys/kernel/sched_rr_timeslice_ms -1 Fix this by setting the variable that holds the sysctl file value to the jiffies_to_msecs(RR_TIMESLICE) in case that <= 0 value was written. Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Reviewed-by: Petr Vorel <pvorel(a)suse.cz> Acked-by: Mel Gorman <mgorman(a)suse.de> Tested-by: Petr Vorel <pvorel(a)suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-3-chrubis@suse.cz Signed-off-by: Petr Vorel <pvorel(a)suse.cz> --- kernel/sched/rt.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 76bafa8d331a..f79a6f36777a 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -3047,6 +3047,9 @@ static int sched_rr_handler(struct ctl_table *table, int write, void *buffer, sched_rr_timeslice = sysctl_sched_rr_timeslice <= 0 ? RR_TIMESLICE : msecs_to_jiffies(sysctl_sched_rr_timeslice); + + if (sysctl_sched_rr_timeslice <= 0) + sysctl_sched_rr_timeslice = jiffies_to_msecs(RR_TIMESLICE); } mutex_unlock(&mutex); -- 2.35.3

1 year, 4 months

1
1
0 0

[PATCH 5.4 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value

by Petr Vorel

From: Cyril Hrubis <chrubis(a)suse.cz> [ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ] There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y. This was found with LTP test sched_rr_get_interval01: sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match. The problem it found is the intial sysctl file value which was computed as: static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300: (MSEC_PER_SEC / HZ) * (100 * HZ / 1000) (1000 / 300) * (100 * 300 / 1000) 3 * 30 = 90 This can be easily fixed by reversing the order of the multiplication and division. After this fix we get: (MSEC_PER_SEC * (100 * HZ / 1000)) / HZ (1000 * (100 * 300 / 1000)) / 300 (1000 * 30) / 300 = 100 Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Reviewed-by: Petr Vorel <pvorel(a)suse.cz> Acked-by: Mel Gorman <mgorman(a)suse.de> Tested-by: Petr Vorel <pvorel(a)suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.4 ] Signed-off-by: Petr Vorel <pvorel(a)suse.cz> --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 186e7d78ded5..e40f32e3ab06 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h" int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW; -- 2.35.3

1 year, 4 months

1
2
0 0

[PATCH 5.15, 5.10 v2 1/3] sched/rt: Fix sysctl_sched_rr_timeslice intial value

by Petr Vorel

From: Cyril Hrubis <chrubis(a)suse.cz> [ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ] There is a 10% rounding error in the intial value of the sysctl_sched_rr_timeslice with CONFIG_HZ_300=y. This was found with LTP test sched_rr_get_interval01: sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90 What this test does is to compare the return value from the sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and fails if they do not match. The problem it found is the intial sysctl file value which was computed as: static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; which works fine as long as MSEC_PER_SEC is multiple of HZ, however it introduces 10% rounding error for CONFIG_HZ_300: (MSEC_PER_SEC / HZ) * (100 * HZ / 1000) (1000 / 300) * (100 * 300 / 1000) 3 * 30 = 90 This can be easily fixed by reversing the order of the multiplication and division. After this fix we get: (MSEC_PER_SEC * (100 * HZ / 1000)) / HZ (1000 * (100 * 300 / 1000)) / 300 (1000 * 30) / 300 = 100 Fixes: 975e155ed873 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds") Signed-off-by: Cyril Hrubis <chrubis(a)suse.cz> Signed-off-by: Peter Zijlstra (Intel) <peterz(a)infradead.org> Reviewed-by: Petr Vorel <pvorel(a)suse.cz> Acked-by: Mel Gorman <mgorman(a)suse.de> Tested-by: Petr Vorel <pvorel(a)suse.cz> Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz [ pvorel: rebased for 5.15, 5.10 ] Signed-off-by: Petr Vorel <pvorel(a)suse.cz> --- kernel/sched/rt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/kernel/sched/rt.c b/kernel/sched/rt.c index 7045595aacac..3394b7f923a0 100644 --- a/kernel/sched/rt.c +++ b/kernel/sched/rt.c @@ -8,7 +8,7 @@ #include "pelt.h" int sched_rr_timeslice = RR_TIMESLICE; -int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE; +int sysctl_sched_rr_timeslice = (MSEC_PER_SEC * RR_TIMESLICE) / HZ; /* More than 4 hours if BW_SHIFT equals 20. */ static const u64 max_rt_runtime = MAX_BW; -- 2.35.3

1 year, 4 months

1
2
0 0

[PATCH v2] ext4: fix corruption during on-line resize

by Maximilian Heyne

We observed a corruption during on-line resize of a file system that is larger than 16 TiB with 4k block size. With having more then 2^32 blocks resize_inode is turned off by default by mke2fs. The issue can be reproduced on a smaller file system for convenience by explicitly turning off resize_inode. An on-line resize across an 8 GiB boundary (the size of a meta block group in this setup) then leads to a corruption: dev=/dev/<some_dev> # should be >= 16 GiB mkdir -p /corruption /sbin/mke2fs -t ext4 -b 4096 -O ^resize_inode $dev $((2 * 2**21 - 2**15)) mount -t ext4 $dev /corruption dd if=/dev/zero bs=4096 of=/corruption/test count=$((2*2**21 - 4*2**15)) sha1sum /corruption/test # 79d2658b39dcfd77274e435b0934028adafaab11 /corruption/test /sbin/resize2fs $dev $((2*2**21)) # drop page cache to force reload the block from disk echo 1 > /proc/sys/vm/drop_caches sha1sum /corruption/test # 3c2abc63cbf1a94c9e6977e0fbd72cd832c4d5c3 /corruption/test 2^21 = 2^15*2^6 equals 8 GiB whereof 2^15 is the number of blocks per block group and 2^6 are the number of block groups that make a meta block group. The last checksum might be different depending on how the file is laid out across the physical blocks. The actual corruption occurs at physical block 63*2^15 = 2064384 which would be the location of the backup of the meta block group's block descriptor. During the on-line resize the file system will be converted to meta_bg starting at s_first_meta_bg which is 2 in the example - meaning all block groups after 16 GiB. However, in ext4_flex_group_add we might add block groups that are not part of the first meta block group yet. In the reproducer we achieved this by substracting the size of a whole block group from the point where the meta block group would start. This must be considered when updating the backup block group descriptors to follow the non-meta_bg layout. The fix is to add a test whether the group to add is already part of the meta block group or not. Fixes: 01f795f9e0d67 ("ext4: add online resizing support for meta_bg and 64-bit file systems") Cc: stable(a)vger.kernel.org Signed-off-by: Maximilian Heyne <mheyne(a)amazon.de> --- fs/ext4/resize.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c index 4d4a5a32e310..3c0d12382e06 100644 --- a/fs/ext4/resize.c +++ b/fs/ext4/resize.c @@ -1602,7 +1602,8 @@ static int ext4_flex_group_add(struct super_block *sb, int gdb_num = group / EXT4_DESC_PER_BLOCK(sb); int gdb_num_end = ((group + flex_gd->count - 1) / EXT4_DESC_PER_BLOCK(sb)); - int meta_bg = ext4_has_feature_meta_bg(sb); + int meta_bg = ext4_has_feature_meta_bg(sb) && + gdb_num >= le32_to_cpu(es->s_first_meta_bg); sector_t padding_blocks = meta_bg ? 0 : sbi->s_sbh->b_blocknr - ext4_group_first_block_no(sb, 0); -- 2.40.1 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879

1 year, 4 months

3
2
0 0

[PATCH 2/2] usb: port: Don't try to peer unused USB ports based on location

by Mathias Nyman

Unused USB ports may have bogus location data in ACPI PLD tables. This causes port peering failures as these unused USB2 and USB3 ports location may match. This is seen on DELL systems where all unused ports return zeroed location data. Don't try to peer or match ports that have connect type set to USB_PORT_NOT_USED. Tested-by: Paul Menzel <pmenzel(a)molgen.mpg.de> Cc: stable(a)vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman(a)linux.intel.com> --- drivers/usb/core/port.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/usb/core/port.c b/drivers/usb/core/port.c index c628c1abc907..4d63496f98b6 100644 --- a/drivers/usb/core/port.c +++ b/drivers/usb/core/port.c @@ -573,7 +573,7 @@ static int match_location(struct usb_device *peer_hdev, void *p) struct usb_hub *peer_hub = usb_hub_to_struct_hub(peer_hdev); struct usb_device *hdev = to_usb_device(port_dev->dev.parent->parent); - if (!peer_hub) + if (!peer_hub || port_dev->connect_type == USB_PORT_NOT_USED) return 0; hcd = bus_to_hcd(hdev->bus); @@ -584,7 +584,8 @@ static int match_location(struct usb_device *peer_hdev, void *p) for (port1 = 1; port1 <= peer_hdev->maxchild; port1++) { peer = peer_hub->ports[port1 - 1]; - if (peer && peer->location == port_dev->location) { + if (peer && peer->connect_type != USB_PORT_NOT_USED && + peer->location == port_dev->location) { link_peers_report(port_dev, peer); return 1; /* done */ } -- 2.25.1

1 year, 4 months

3
3
0 0

[PATCH] virtio_blk: Fix device surprise removal

by Parav Pandit

When the PCI device is surprise removed, requests won't complete from the device. These IOs are never completed and disk deletion hangs indefinitely. Fix it by aborting the IOs which the device will never complete when the VQ is broken. With this fix now fio completes swiftly. An alternative of IO timeout has been considered, however when the driver knows about unresponsive block device, swiftly clearing them enables users and upper layers to react quickly. Verified with multiple device unplug cycles with pending IOs in virtio used ring and some pending with device. In future instead of VQ broken, a more elegant method can be used. At the moment the patch is kept to its minimal changes given its urgency to fix broken kernels. Fixes: 43bb40c5b926 ("virtio_pci: Support surprise removal of virtio pci device") Cc: stable(a)vger.kernel.org Reported-by: lirongqing(a)baidu.com Closes: https://lore.kernel.org/virtualization/c45dd68698cd47238c55fb73ca9b4741@bai… Co-developed-by: Chaitanya Kulkarni <kch(a)nvidia.com> Signed-off-by: Chaitanya Kulkarni <kch(a)nvidia.com> Signed-off-by: Parav Pandit <parav(a)nvidia.com> --- drivers/block/virtio_blk.c | 54 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 2bf14a0e2815..59b49899b229 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -1562,10 +1562,64 @@ static int virtblk_probe(struct virtio_device *vdev) return err; } +static bool virtblk_cancel_request(struct request *rq, void *data) +{ + struct virtblk_req *vbr = blk_mq_rq_to_pdu(rq); + + vbr->in_hdr.status = VIRTIO_BLK_S_IOERR; + if (blk_mq_request_started(rq) && !blk_mq_request_completed(rq)) + blk_mq_complete_request(rq); + + return true; +} + +static void virtblk_cleanup_reqs(struct virtio_blk *vblk) +{ + struct virtio_blk_vq *blk_vq; + struct request_queue *q; + struct virtqueue *vq; + unsigned long flags; + int i; + + vq = vblk->vqs[0].vq; + if (!virtqueue_is_broken(vq)) + return; + + q = vblk->disk->queue; + /* Block upper layer to not get any new requests */ + blk_mq_quiesce_queue(q); + + for (i = 0; i < vblk->num_vqs; i++) { + blk_vq = &vblk->vqs[i]; + + /* Synchronize with any ongoing virtblk_poll() which may be + * completing the requests to uppper layer which has already + * crossed the broken vq check. + */ + spin_lock_irqsave(&blk_vq->lock, flags); + spin_unlock_irqrestore(&blk_vq->lock, flags); + } + + blk_sync_queue(q); + + /* Complete remaining pending requests with error */ + blk_mq_tagset_busy_iter(&vblk->tag_set, virtblk_cancel_request, vblk); + blk_mq_tagset_wait_completed_request(&vblk->tag_set); + + /* + * Unblock any pending dispatch I/Os before we destroy device. From + * del_gendisk() -> __blk_mark_disk_dead(disk) will set GD_DEAD flag, + * that will make sure any new I/O from bio_queue_enter() to fail. + */ + blk_mq_unquiesce_queue(q); +} + static void virtblk_remove(struct virtio_device *vdev) { struct virtio_blk *vblk = vdev->priv; + virtblk_cleanup_reqs(vblk); + /* Make sure no work handler is accessing the device. */ flush_work(&vblk->config_work); -- 2.34.1

1 year, 4 months

4
12
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror February 2024