Before adding this quirk, this (mechanical keyboard) device would not be
recognized, logging:
new full-speed USB device number 56 using xhci_hcd
unable to read config index 0 descriptor/start: -32
chopping to 0 config(s)
It would take dozens of plugging/unplugging cycles before the keyboard
was recognized. The keyboard simply works after applying this quirk.
This issue had already been reported by users in two places ([1], [2]),
but nobody had tried upstreaming a patch yet. After testing, I believe
their suggested fix (DELAY_INIT + NO_LPM + DEVICE_QUALIFIER) was probably
a little overkill. I assume this particular combination was tested
because it had previously been suggested in [3], but NO_LPM alone seems
sufficient for this device.
[1]: https://qiita.com/float168/items/fed43d540c8e2201b543
[2]: https://blog.kostic.dev/posts/making-the-realforce-87ub-work-with-usb30-on-…
[3]: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678477
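As a side note, anyone wanting to confirm this on their own unit without
rebuilding the kernel should, if I read the quirks parameter handling in
drivers/usb/core/quirks.c correctly, be able to apply the same quirk at
boot time via the usbcore.quirks parameter, where the 'k' flag maps to
USB_QUIRK_NO_LPM:

    usbcore.quirks=0853:011b:k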
---
Changes in v2:
- add the entry to the right location (sorting entries by
vendor/device id).
Cc: stable@vger.kernel.org
Signed-off-by: Nicolas Dumazet <ndumazet@google.com>
---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 0722d2131305..079e183cf3bf 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -362,6 +362,9 @@ static const struct usb_device_id usb_quirk_list[] = {
{ USB_DEVICE(0x0781, 0x5583), .driver_info = USB_QUIRK_NO_LPM },
{ USB_DEVICE(0x0781, 0x5591), .driver_info = USB_QUIRK_NO_LPM },
+ /* Realforce 87U Keyboard */
+ { USB_DEVICE(0x0853, 0x011b), .driver_info = USB_QUIRK_NO_LPM },
+
/* M-Systems Flash Disk Pioneers */
{ USB_DEVICE(0x08ec, 0x1000), .driver_info = USB_QUIRK_RESET_RESUME },
--
2.38.0.135.g90850a2211-goog
We have a dump in which the current CPU is in pinctrl_commit_state with
old_state != p->state, while its stack shows it is still in the middle of
pinmux_disable_setting. This means that even though p->state has already
been changed to the new state, the new state's settings are not yet
completely applied.
pinctrl_commit_state() relies on the distinct values of p->state to
synchronize its behavior. p->state goes through the transition
old_state -> NULL -> new_state: while in old_state it disables all of the
old state's settings, and only after the new state's settings have been
enabled is p->state changed to the new state. So use smp_mb() to order
the updates to the p->state variable with respect to the settings.
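To illustrate the intended ordering, here is a minimal sketch (not the
actual kernel code; the reader side is an assumption about how p->state
is consumed, and would normally need a pairing read barrier):

    /* writer side, as in pinctrl_commit_state(): */
    disable_old_settings();   /* tear down the old state's settings */
    smp_mb();                 /* order teardown before the state update */
    p->state = NULL;

    enable_new_settings();    /* apply the new state's settings */
    smp_mb();                 /* order the settings before publishing */
    p->state = new_state;

    /* reader side (assumed): */
    state = p->state;
    smp_rmb();                /* pairs with the writer's smp_mb() */
    /* only if state == new_state may the settings be assumed complete */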
---
drivers/pinctrl/core.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/pinctrl/core.c b/drivers/pinctrl/core.c
index 9e57f4c62e60..cd917a5b1a0a 100644
--- a/drivers/pinctrl/core.c
+++ b/drivers/pinctrl/core.c
@@ -1256,6 +1256,7 @@ static int pinctrl_commit_state(struct pinctrl *p, struct pinctrl_state *state)
}
}
+ smp_mb();
p->state = NULL;
/* Apply all the settings for the new state - pinmux first */
@@ -1305,6 +1306,7 @@ static int pinctrl_commit_state(struct pinctrl *p, struct pinctrl_state *state)
pinctrl_link_add(setting->pctldev, p->dev);
}
+ smp_mb();
p->state = state;
return 0;
--
2.17.1
Hello,
I was profiling the 5.10 kernel and comparing it to 4.14. On a system with 64 virtual CPUs and 256 GiB of RAM, I am observing a significant drop in IO performance. Using the attached FIO script ("sudo ftest_write.sh <dev_name>"), I saw the FIO IOPS result drop from 22K to less than 1K.
The script simply mounts a 16 GiB EXT4 volume with max IOPS 64000K, using the mount options "-o noatime,nodiratime,data=ordered", then runs fio with 2048 writing threads and a 28800000-byte file size, with { --name=16kb_rand_write_only_2048_jobs --directory=/rdsdbdata1 --rw=randwrite --ioengine=sync --buffered=1 --bs=16k --max-jobs=2048 --numjobs=2048 --runtime=60 --time_based --thread --filesize=28800000 --fsync=1 --group_reporting }.
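For readability, the same workload expressed as a fio job file
(reconstructed from the command-line options above; the redundant
--max-jobs option is omitted):

    [16kb_rand_write_only_2048_jobs]
    directory=/rdsdbdata1
    rw=randwrite
    ioengine=sync
    buffered=1
    bs=16k
    numjobs=2048
    runtime=60
    time_based
    thread
    filesize=28800000
    fsync=1
    group_reporting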
My analysis is that the degradation is introduced by commit 244adf6426ee31a83f397b700d964cff12a247d3, and the issue is contention on rsv_conversion_wq. The simplest option is to increase the journal size, but that introduces more operational complexity. Another option is the change in the attachment "allow more ext4-rsv-conversion workqueue.patch":
From 27e1b0e14275a281b3529f6a60c7b23a81356751 Mon Sep 17 00:00:00 2001
From: davinalu <davinalu@amazon.com>
Date: Fri, 23 Sep 2022 00:43:53 +0000
Subject: [PATCH] allow more ext4-rsv-conversion workqueue to speedup fio writing
---
fs/ext4/super.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index a0af833f7da7..6b34298cdc3b 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -4963,7 +4963,7 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
* concurrency isn't really necessary. Limit it to 1.
*/
EXT4_SB(sb)->rsv_conversion_wq =
- alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM | WQ_UNBOUND, 1);
+ alloc_workqueue("ext4-rsv-conversion", WQ_MEM_RECLAIM |
+ WQ_UNBOUND | __WQ_ORDERED, 0);
if (!EXT4_SB(sb)->rsv_conversion_wq) {
printk(KERN_ERR "EXT4-fs: failed to create workqueue\n");
ret = -ENOMEM;
My thought is: based on alloc_workqueue(), max_active == 1 combined with the WQ_UNBOUND setting already implies __WQ_ORDERED, so I added the flag explicitly. I am not sure whether we actually need __WQ_ORDERED or not; without it the change also works on my testbed, but I kept it since there was not much difference in fio throughput on my testbed with or without __WQ_ORDERED.
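For comparison, the kernel's own ordered-workqueue helper is essentially
this flag combination with max_active fixed at 1 (quoting
include/linux/workqueue.h from memory, so treat the exact text as an
assumption):

    #define alloc_ordered_workqueue(fmt, flags, args...)            \
            alloc_workqueue(fmt, WQ_UNBOUND | __WQ_ORDERED |        \
                            __WQ_ORDERED_EXPLICIT | (flags), 1, ##args)

The attached patch instead passes max_active == 0 (i.e. the default
limit), which is what allows more than one conversion work item in
flight.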
From my understanding and observation: with dioread_nolock and delalloc both enabled, bio_endio() and ext4_writepages() will queue work on this workqueue that ends up in ext4_do_flush_completed_IO(). It looks like the workqueue processes updates one by one: in ext4's extents code, the io_end->list_vec list only has one io_end_vec each time. So if the BIO layer is delivering high performance and we have only one thread doing the ext4 flush, that thread becomes a bottleneck. If I understand correctly, the "ext4-rsv-conversion" workqueue mainly converts EXT4_IO_END_UNWRITTEN extent blocks (which only exist when the dioread_nolock and delalloc options are set) and updates the extent status. Am I correct?
This works on my test system and passes xfstests, but could this change cause any corruption in ext4 extent block updates? I am not sure about the journal transaction updates either.
Can you tell me what I would break if this change were made?
Thanks
Davina
commit 573ae4f13f630d6660008f1974c0a8a29c30e18a upstream.
With special lengths supplied by user space, tee_shm_register() has
an integer overflow when calculating the number of pages covered by a
supplied user space memory region.
This may cause pin_user_pages_fast() to do a NULL pointer dereference.
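For illustration, the failing arithmetic is roughly the following (a
simplified sketch, not the exact driver code):

    /*
     * Sketch: if user space passes a length close to ULONG_MAX, the
     * roundup() below wraps past the end of the address space, so
     * num_pages ends up nonsensically small (even 0), and
     * pin_user_pages_fast() is later handed inconsistent values.
     */
    unsigned long start, num_pages;

    start = rounddown(addr, PAGE_SIZE);
    num_pages = (roundup(addr + length, PAGE_SIZE) - start) / PAGE_SIZE;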
Fix this by adding an explicit call to access_ok() in
tee_ioctl_shm_register() to catch an invalid user space address early.
Fixes: 033ddf12bcf5 ("tee: add register user memory")
Cc: stable@vger.kernel.org # 5.4
Cc: stable@vger.kernel.org # 5.10
Reported-by: Nimish Mishra <neelam.nimish@gmail.com>
Reported-by: Anirban Chakraborty <ch.anirban00727@gmail.com>
Reported-by: Debdeep Mukhopadhyay <debdeep.mukhopadhyay@gmail.com>
Suggested-by: Jerome Forissier <jerome.forissier@linaro.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
[JW: backport to stable 5.4 and 5.10 + update commit message]
Signed-off-by: Jens Wiklander <jens.wiklander@linaro.org>
---
drivers/tee/tee_core.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/tee/tee_core.c b/drivers/tee/tee_core.c
index a7ccd4d2bd10..2db144d2d26f 100644
--- a/drivers/tee/tee_core.c
+++ b/drivers/tee/tee_core.c
@@ -182,6 +182,9 @@ tee_ioctl_shm_register(struct tee_context *ctx,
if (data.flags)
return -EINVAL;
+ if (!access_ok((void __user *)(unsigned long)data.addr, data.length))
+ return -EFAULT;
+
shm = tee_shm_register(ctx, data.addr, data.length,
TEE_SHM_DMA_BUF | TEE_SHM_USER_MAPPED);
if (IS_ERR(shm))
--
2.31.1
madvise(MADV_DONTNEED) ends up calling zap_page_range() to clear the
page tables associated with the address range. For hugetlb vmas,
zap_page_range will call __unmap_hugepage_range_final. However,
__unmap_hugepage_range_final assumes the passed vma is about to be
removed and deletes the vma_lock to prevent pmd sharing as the vma is
on the way out. In the case of madvise(MADV_DONTNEED) the vma remains,
but the missing vma_lock prevents pmd sharing and could potentially
lead to issues with truncation/fault races.
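For context, the affected path can be reached from user space with a call
pattern like the following (a hypothetical sketch assuming 2 MiB huge
pages are available, not the reporter's actual reproducer):

    #include <string.h>
    #include <sys/mman.h>

    #define LEN (2UL * 1024 * 1024)	/* one 2 MiB huge page */

    int main(void)
    {
    	char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
    		       MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    	if (p == MAP_FAILED)
    		return 1;
    	memset(p, 0, LEN);		/* populate the mapping */
    	madvise(p, LEN, MADV_DONTNEED);	/* zaps, but the vma remains */
    	p[0] = 1;			/* fault again on the same vma */
    	return 0;
    }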
This issue was originally reported here [1] as a BUG triggered in
page_try_dup_anon_rmap. Prior to the introduction of the hugetlb
vma_lock, __unmap_hugepage_range_final cleared the VM_MAYSHARE flag to
prevent pmd sharing. Subsequent faults on this vma were confused:
VM_MAYSHARE indicates a sharable vma, but it was no longer set, so
page_mapping was not set in new pages added to the page table. This
resulted in pages that appeared anonymous in a VM_SHARED vma and
triggered the BUG.
Create a new routine clear_hugetlb_page_range() that can be called from
madvise(MADV_DONTNEED) for hugetlb vmas. It has the same setup as
zap_page_range, but does not delete the vma_lock.
[1] https://lore.kernel.org/lkml/CAO4mrfdLMXsao9RF4fUE8-Wfde8xmjsKrTNMNC9wjUb6J…
Fixes: 90e7e7f5ef3f ("mm: enable MADV_DONTNEED for hugetlb mappings")
Reported-by: Wei Chen <harperchen1110@gmail.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
---
v2 - Fix build issues associated with !CONFIG_ADVISE_SYSCALLS
include/linux/hugetlb.h | 7 +++++
mm/hugetlb.c | 62 +++++++++++++++++++++++++++++++++--------
mm/madvise.c | 5 +++-
3 files changed, 61 insertions(+), 13 deletions(-)
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..0246e77be3a3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -158,6 +158,8 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *,
zap_flags_t);
+void clear_hugetlb_page_range(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
struct vm_area_struct *vma,
unsigned long start, unsigned long end,
@@ -426,6 +428,11 @@ static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
BUG();
}
+static void __maybe_unused clear_hugetlb_page_range(struct vm_area_struct *vma,
+ unsigned long start, unsigned long end)
+{
+}
+
static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
struct vm_area_struct *vma, unsigned long address,
unsigned int flags)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 931789a8f734..807cfd2884fa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5202,29 +5202,67 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
tlb_flush_mmu_tlbonly(tlb);
}
-void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+static void __unmap_hugepage_range_locking(struct mmu_gather *tlb,
struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page,
- zap_flags_t zap_flags)
+ zap_flags_t zap_flags, bool final)
{
hugetlb_vma_lock_write(vma);
i_mmap_lock_write(vma->vm_file->f_mapping);
__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
- /*
- * Unlock and free the vma lock before releasing i_mmap_rwsem. When
- * the vma_lock is freed, this makes the vma ineligible for pmd
- * sharing. And, i_mmap_rwsem is required to set up pmd sharing.
- * This is important as page tables for this unmapped range will
- * be asynchrously deleted. If the page tables are shared, there
- * will be issues when accessed by someone else.
- */
- __hugetlb_vma_unlock_write_free(vma);
+ if (final) {
+ /*
+ * Unlock and free the vma lock before releasing i_mmap_rwsem.
+ * When the vma_lock is freed, this makes the vma ineligible
+ * for pmd sharing. And, i_mmap_rwsem is required to set up
+ * pmd sharing. This is important as page tables for this
+ * unmapped range will be asynchrously deleted. If the page
+ * tables are shared, there will be issues when accessed by
+ * someone else.
+ */
+ __hugetlb_vma_unlock_write_free(vma);
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ } else {
+ i_mmap_unlock_write(vma->vm_file->f_mapping);
+ hugetlb_vma_unlock_write(vma);
+ }
+}
- i_mmap_unlock_write(vma->vm_file->f_mapping);
+void __unmap_hugepage_range_final(struct mmu_gather *tlb,
+ struct vm_area_struct *vma, unsigned long start,
+ unsigned long end, struct page *ref_page,
+ zap_flags_t zap_flags)
+{
+ __unmap_hugepage_range_locking(tlb, vma, start, end, ref_page,
+ zap_flags, true);
}
+#ifdef CONFIG_ADVISE_SYSCALLS
+/*
+ * Similar setup as in zap_page_range(). madvise(MADV_DONTNEED) can not call
+ * zap_page_range for hugetlb vmas as __unmap_hugepage_range_final will delete
+ * the associated vma_lock.
+ */
+void clear_hugetlb_page_range(struct vm_area_struct *vma, unsigned long start,
+ unsigned long end)
+{
+ struct mmu_notifier_range range;
+ struct mmu_gather tlb;
+
+ mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, vma->vm_mm,
+ start, end);
+ tlb_gather_mmu(&tlb, vma->vm_mm);
+ update_hiwater_rss(vma->vm_mm);
+
+ __unmap_hugepage_range_locking(&tlb, vma, start, end, NULL, 0, false);
+
+ mmu_notifier_invalidate_range_end(&range);
+ tlb_finish_mmu(&tlb);
+}
+#endif
+
void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
unsigned long end, struct page *ref_page,
zap_flags_t zap_flags)
diff --git a/mm/madvise.c b/mm/madvise.c
index 2baa93ca2310..90577a669635 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -790,7 +790,10 @@ static int madvise_free_single_vma(struct vm_area_struct *vma,
static long madvise_dontneed_single_vma(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
- zap_page_range(vma, start, end - start);
+ if (!is_vm_hugetlb_page(vma))
+ zap_page_range(vma, start, end - start);
+ else
+ clear_hugetlb_page_range(vma, start, end);
return 0;
}
--
2.37.3
With use_codeword_fixup enabled, any return value from
mtd_device_parse_register gets overwritten. Aside from this clear bug, it
is also problematic because a parser can return EPROBE_DEFER; since this
is not correctly handled, the NAND is never rescanned later in the boot
process.
An example of this problem is when smem requires additional time to be
probed and nandc uses qcomsmempart as its parser. The parser will return
EPROBE_DEFER, but in the current code this return value gets overwritten
by qcom_nand_host_parse_boot_partitions, and
qcom_nand_host_init_and_register returns 0.
Correctly handle the return code from mtd_device_parse_register so that
any error from this function is not ignored.
Fixes: 862bdedd7f4b ("mtd: nand: raw: qcom_nandc: add support for unprotected spare data pages")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
---
drivers/mtd/nand/raw/qcom_nandc.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/drivers/mtd/nand/raw/qcom_nandc.c b/drivers/mtd/nand/raw/qcom_nandc.c
index 8f80019a9f01..198a44794d2d 100644
--- a/drivers/mtd/nand/raw/qcom_nandc.c
+++ b/drivers/mtd/nand/raw/qcom_nandc.c
@@ -3167,16 +3167,18 @@ static int qcom_nand_host_init_and_register(struct qcom_nand_controller *nandc,
ret = mtd_device_parse_register(mtd, probes, NULL, NULL, 0);
if (ret)
- nand_cleanup(chip);
+ goto err;
if (nandc->props->use_codeword_fixup) {
ret = qcom_nand_host_parse_boot_partitions(nandc, host, dn);
- if (ret) {
- nand_cleanup(chip);
- return ret;
- }
+ if (ret)
+ goto err;
}
+ return 0;
+
+err:
+ nand_cleanup(chip);
return ret;
}
--
2.37.2
Hi,
Can you please backport the following commit to 5.15 stable?
Title: "af_unix: Fix memory leaks of the whole sk due to OOB skb."
Commit SHA: 7a62ed61367b8fd01bae1e18e30602c25060d824
This commit fixes "314001f0bf927015e459c9d387d62a231fe93af3" which
landed on "tags/v5.15-rc1~157^2~305".
Best,
Anil
If the starting position of our insert range happens to fall in the hole
between two ext4_extent_idx nodes, then the lblk of every ext4_extent in
the previous ext4_extent_idx is less than start. This leads to the
"extent" variable being accessed out of bounds, and the following UAF is
triggered:
==================================================================
BUG: KASAN: use-after-free in ext4_ext_shift_extents+0x257/0x790
Read of size 4 at addr ffff88819807a008 by task fallocate/8010
CPU: 3 PID: 8010 Comm: fallocate Tainted: G E 5.10.0+ #492
Call Trace:
dump_stack+0x7d/0xa3
print_address_description.constprop.0+0x1e/0x220
kasan_report.cold+0x67/0x7f
ext4_ext_shift_extents+0x257/0x790
ext4_insert_range+0x5b6/0x700
ext4_fallocate+0x39e/0x3d0
vfs_fallocate+0x26f/0x470
ksys_fallocate+0x3a/0x70
__x64_sys_fallocate+0x4f/0x60
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x44/0xa9
==================================================================
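In essence, the pre-patch logic starts at the first extent of the leaf
and advances until it finds ee_block >= start (simplified sketch of the
old code, condensed from the hunk below):

    extent = EXT_FIRST_EXTENT(path[depth].p_hdr);
    while (le32_to_cpu(extent->ee_block) < start)
    	extent++;	/* walks past EXT_LAST_EXTENT() when every
    			 * ee_block in this leaf is < start, i.e. when
    			 * start sits in the hole after the leaf */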
For right shifts, we can divide the cases into the following situations:
1. When the first ee_block of the ext4_extent_idx is greater than or equal
   to start, shift right directly from the first ee_block.
   1) If it is greater than start, we need to continue searching in the
      previous ext4_extent_idx.
   2) If it is equal to start, we can exit the loop (iterator = NULL).
2. When the first ee_block of the ext4_extent_idx is less than start,
   traverse from the last extent to find the first extent whose ee_block
   is less than start.
   1) If the extent is still the last extent after the traversal, the last
      ee_block of the ext4_extent_idx is less than start, i.e. start is
      located in the hole between idx and (idx+1), so we can exit the
      loop directly (break) without any right shift.
   2) Otherwise, shift right at the position of the found extent, and
      then exit the loop (iterator = NULL).
Fixes: 331573febb6a ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate")
Cc: stable@vger.kernel.org # v4.2+
Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
Signed-off-by: Baokun Li <libaokun1@huawei.com>
---
V1->V2:
Initialize "ret" after the "again:" label to avoid return value mismatch.
Refactoring reduces cycles and makes code more readable.
fs/ext4/extents.c | 18 +++++++++++++-----
1 file changed, 13 insertions(+), 5 deletions(-)
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index c148bb97b527..39c9f87de0be 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -5179,6 +5179,7 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
* and it is decreased till we reach start.
*/
again:
+ ret = 0;
if (SHIFT == SHIFT_LEFT)
iterator = &start;
else
@@ -5222,14 +5223,21 @@ ext4_ext_shift_extents(struct inode *inode, handle_t *handle,
ext4_ext_get_actual_len(extent);
} else {
extent = EXT_FIRST_EXTENT(path[depth].p_hdr);
- if (le32_to_cpu(extent->ee_block) > 0)
+ if (le32_to_cpu(extent->ee_block) > start)
*iterator = le32_to_cpu(extent->ee_block) - 1;
- else
- /* Beginning is reached, end of the loop */
+ else if (le32_to_cpu(extent->ee_block) == start)
iterator = NULL;
- /* Update path extent in case we need to stop */
- while (le32_to_cpu(extent->ee_block) < start)
+ else {
+ extent = EXT_LAST_EXTENT(path[depth].p_hdr);
+ while (le32_to_cpu(extent->ee_block) >= start)
+ extent--;
+
+ if (extent == EXT_LAST_EXTENT(path[depth].p_hdr))
+ break;
+
extent++;
+ iterator = NULL;
+ }
path[depth].p_ext = extent;
}
ret = ext4_ext_shift_path_extents(path, shift, inode,
--
2.31.1
On 10/16/22 19:21, Bagas Sanjaya wrote:
> On 10/16/22 03:59, Phillip Lougher wrote:
>>
>> Which identified the "squashfs: support reading fragments in readahead call"
>> patch.
>>
>> There is a race condition introduced in that patch, which involves cache
>> releasing and reuse.
>>
>> The following diff will fix that race condition. It would be great if
>> someone could test and verify before sending it out as a patch.
>>
>> Thanks
>>
>> Phillip
>>
>> diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c
>> index e56510964b22..6cc23178e9ad 100644
>> --- a/fs/squashfs/file.c
>> +++ b/fs/squashfs/file.c
>> @@ -506,8 +506,9 @@ static int squashfs_readahead_fragment(struct page **page,
>> squashfs_i(inode)->fragment_size);
>> struct squashfs_sb_info *msblk = inode->i_sb->s_fs_info;
>> unsigned int n, mask = (1 << (msblk->block_log - PAGE_SHIFT)) - 1;
>> + int error = buffer->error;
>>
>> - if (buffer->error)
>> + if (error)
>> goto out;
>>
>> expected += squashfs_i(inode)->fragment_offset;
>> @@ -529,7 +530,7 @@ static int squashfs_readahead_fragment(struct page **page,
>>
>> out:
>> squashfs_cache_put(buffer);
>> - return buffer->error;
>> + return error;
>> }
>>
>> static void squashfs_readahead(struct readahead_control *ractl)
>>
>
> No Verneed warnings so far. However, I need to test for a longer time
> (a day) to check if any warnings are reported.
>
> Thanks.
>
Also, since this regression is also found on the linux-6.0.y stable branch,
don't forget to Cc the stable list.
Thanks.
--
An old man doll... just what I always wanted! - Clara