July 2021 - Linux-stable-mirror

[PATCH] xhci: add quirk for host controllers that don't update endpoint DCS

by Bjørn Mork

From: Jonathan Bell <jonathan(a)raspberrypi.org> Seen on a VLI VL805 PCIe to USB controller. For non-stream endpoints at least, if the xHC halts on a particular TRB due to an error then the DCS field in the Out Endpoint Context maintained by the hardware is not updated with the current cycle state. Using the quirk XHCI_EP_CTX_BROKEN_DCS and instead fetch the DCS bit from the TRB that the xHC stopped on. Cc: stable(a)vger.kernel.org Link: https://github.com/raspberrypi/linux/issues/3060 Signed-off-by: Jonathan Bell <jonathan(a)raspberrypi.org> Signed-off-by: Bjørn Mork <bjorn(a)mork.no> --- Ran into this issue on an RPi4 running Debian bullseye, having mostly a plain v5.10.40 kernel. Using an RTL2838 (0bda:2838) with rtl-sdr just did not work, showing all the issues described on the above link. This quirk found in https://github.com/raspberrypi/linux.git solves the problem for me. I don't see why it shouldn't be in mainline. And I propose adding it to stable as well, since it solves a real problem. Mostly for my own convenience as I'd prefer just using a Debian built kernel ;-) Did not check this submission with Jonathan - hoping it is OK... Bjørn drivers/usb/host/xhci-pci.c | 4 +++- drivers/usb/host/xhci-ring.c | 26 ++++++++++++++++++++++++++ drivers/usb/host/xhci.h | 1 + 3 files changed, 30 insertions(+), 1 deletion(-) diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c index 18c2bbddf080..6f3bed09028c 100644 --- a/drivers/usb/host/xhci-pci.c +++ b/drivers/usb/host/xhci-pci.c @@ -279,8 +279,10 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci) pdev->device == 0x3432) xhci->quirks |= XHCI_BROKEN_STREAMS; - if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483) + if (pdev->vendor == PCI_VENDOR_ID_VIA && pdev->device == 0x3483) { xhci->quirks |= XHCI_LPM_SUPPORT; + xhci->quirks |= XHCI_EP_CTX_BROKEN_DCS; + } if (pdev->vendor == PCI_VENDOR_ID_ASMEDIA && pdev->device == PCI_DEVICE_ID_ASMEDIA_1042_XHCI) diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c index 8fea44bbc266..a9c860ff5177 100644 --- a/drivers/usb/host/xhci-ring.c +++ b/drivers/usb/host/xhci-ring.c @@ -559,8 +559,11 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci, struct xhci_ring *ep_ring; struct xhci_command *cmd; struct xhci_segment *new_seg; + struct xhci_segment *halted_seg = NULL; union xhci_trb *new_deq; int new_cycle; + union xhci_trb *halted_trb; + int index = 0; dma_addr_t addr; u64 hw_dequeue; bool cycle_found = false; @@ -600,6 +603,29 @@ static int xhci_move_dequeue_past_td(struct xhci_hcd *xhci, new_deq = ep_ring->dequeue; new_cycle = hw_dequeue & 0x1; + /* + * Quirk: xHC write-back of the DCS field in the hardware dequeue + * pointer is wrong - use the cycle state of the TRB pointed to by + * the dequeue pointer. + */ + if (xhci->quirks & XHCI_EP_CTX_BROKEN_DCS && + !(ep->ep_state & EP_HAS_STREAMS)) + halted_seg = trb_in_td(xhci, cur_td->start_seg, + cur_td->first_trb, cur_td->last_trb, + hw_dequeue & ~0xf, false); + if (halted_seg) { + index = ((dma_addr_t)(hw_dequeue & ~0xf) - halted_seg->dma) / + sizeof(*halted_trb); + halted_trb = &halted_seg->trbs[index]; + state->new_cycle_state = halted_trb->generic.field[3] & 0x1; + xhci_dbg(xhci, "Endpoint DCS = %d TRB index = %d cycle = %d\n", + (u8)(hw_dequeue & 0x1), index, + state->new_cycle_state); + } else { + state->new_cycle_state = hw_dequeue & 0x1; + } + state->stream_id = stream_id; + /* * We want to find the pointer, segment and cycle state of the new trb * (the one after current TD's last_trb). We know the cycle state at diff --git a/drivers/usb/host/xhci.h b/drivers/usb/host/xhci.h index 3c7d281672ae..911aeb7d8a19 100644 --- a/drivers/usb/host/xhci.h +++ b/drivers/usb/host/xhci.h @@ -1896,6 +1896,7 @@ struct xhci_hcd { #define XHCI_SG_TRB_CACHE_SIZE_QUIRK BIT_ULL(39) #define XHCI_NO_SOFT_RETRY BIT_ULL(40) #define XHCI_BROKEN_D3COLD BIT_ULL(41) +#define XHCI_EP_CTX_BROKEN_DCS BIT_ULL(42) unsigned int num_active_eps; unsigned int limit_active_eps; -- 2.30.2

4 years

2
2
0 0

mm/thp commits: please wait a few days

by Hugh Dickins

Hi Greg, Linus has taken in a group of mm/thp commits Cc stable today: 504e070dc08f mm: thp: replace DEBUG_VM BUG with VM_WARN when unmap fails for split 22061a1ffabd mm/thp: unmap_mapping_page() to fix THP truncate_cleanup_page() 31657170deaf mm/thp: fix page_address_in_vma() on file THP tails 494334e43c16 mm/thp: fix vma_address() if virtual address below file offset 732ed55823fc mm/thp: try_to_unmap() use TTU_SYNC for safe splitting 3b77e8c8cde5 mm/thp: make is_huge_zero_pmd() safe and quicker 99fa8a48203d mm/thp: fix __split_huge_pmd_locked() on shmem migration entry ffc90cbb2970 mm, thp: use head page in __migration_entry_wait() and I expect some more to follow in a few days time (thanks Andrew). No problem with the commits themselves, but I'm aware that some of them have dependencies on other commits not yet in stable, which I have to sort out for you now. I'd prefer to avoid a deluge of "does not apply" messages, so ask you please to hold off trying to merge these into stable trees for a few days: I'll get back to you with what's needed for them to apply. Thanks, Hugh

4 years

4
14
0 0

[PATCH v2] userfaultfd: preserve user-supplied address tag in struct uffd_msg

by Peter Collingbourne

If a user program uses userfaultfd on ranges of heap memory, it may end up passing a tagged pointer to the kernel in the range.start field of the UFFDIO_REGISTER ioctl. This can happen when using an MTE-capable allocator, or on Android if using the Tagged Pointers feature for MTE readiness [1]. When a fault subsequently occurs, the tag is stripped from the fault address returned to the application in the fault.address field of struct uffd_msg. However, from the application's perspective, the tagged address *is* the memory address, so if the application is unaware of memory tags, it may get confused by receiving an address that is, from its point of view, outside of the bounds of the allocation. We observed this behavior in the kselftest for userfaultfd [2] but other applications could have the same problem. Fix this by remembering which tag was used to originally register the userfaultfd and passing that tag back in fault.address. In a future enhancement, we may want to pass back the original fault address, but like SA_EXPOSE_TAGBITS, this should be guarded by a flag. [1] https://source.android.com/devices/tech/debug/tagged-pointers [2] tools/testing/selftests/vm/userfaultfd.c Signed-off-by: Peter Collingbourne <pcc(a)google.com> Link: https://linux-review.googlesource.com/id/I761aa9f0344454c482b83fcfcce547db0… Fixes: 63f0c6037965 ("arm64: Introduce prctl() options to control the tagged user addresses ABI") Cc: <stable(a)vger.kernel.org> # 5.4 --- Documentation/arm64/tagged-pointers.rst | 5 +++++ fs/userfaultfd.c | 17 +++++++++++------ include/linux/mm_types.h | 3 ++- 3 files changed, 18 insertions(+), 7 deletions(-) diff --git a/Documentation/arm64/tagged-pointers.rst b/Documentation/arm64/tagged-pointers.rst index 19d284b70384..ec8e1f90b744 100644 --- a/Documentation/arm64/tagged-pointers.rst +++ b/Documentation/arm64/tagged-pointers.rst @@ -73,6 +73,11 @@ flag setting. Non-zero tags are never preserved in sigcontext.fault_address regardless of the SA_EXPOSE_TAGBITS flag setting. +When using userfaultfd the address tag supplied in the range.start +field of the UFFDIO_REGISTER ioctl is preserved and returned to +userspace via the fault.address field of struct uffd_msg, and the +tag of the original fault address is discarded. + The architecture prevents the use of a tagged PC, so the upper byte will be set to a sign-extension of bit 55 on exception return. diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index dd7a6c62b56f..adb0f7d0638a 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -110,15 +110,15 @@ static int userfaultfd_wake_function(wait_queue_entry_t *wq, unsigned mode, struct userfaultfd_wake_range *range = key; int ret; struct userfaultfd_wait_queue *uwq; - unsigned long start, len; + unsigned long start, len, addr; uwq = container_of(wq, struct userfaultfd_wait_queue, wq); ret = 0; /* len == 0 means wake all */ start = range->start; len = range->len; - if (len && (start > uwq->msg.arg.pagefault.address || - start + len <= uwq->msg.arg.pagefault.address)) + addr = untagged_addr(uwq->msg.arg.pagefault.address); + if (len && (start > addr || start + len <= addr)) goto out; WRITE_ONCE(uwq->waken, true); /* @@ -480,8 +480,9 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) init_waitqueue_func_entry(&uwq.wq, userfaultfd_wake_function); uwq.wq.private = current; - uwq.msg = userfault_msg(vmf->address, vmf->flags, reason, - ctx->features); + uwq.msg = userfault_msg( + vmf->address + vmf->vma->vm_userfaultfd_ctx.address_tag, + vmf->flags, reason, ctx->features); uwq.ctx = ctx; uwq.waken = false; @@ -1287,7 +1288,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, unsigned long vm_flags, new_flags; bool found; bool basic_ioctls; - unsigned long start, end, vma_end; + unsigned long address_tag, start, end, vma_end; user_uffdio_register = (struct uffdio_register __user *) arg; @@ -1313,6 +1314,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vm_flags |= VM_UFFD_MINOR; } + address_tag = uffdio_register.range.start - + untagged_addr(uffdio_register.range.start); + ret = validate_range(mm, &uffdio_register.range.start, uffdio_register.range.len); if (ret) @@ -1462,6 +1466,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, */ vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx.ctx = ctx; + vma->vm_userfaultfd_ctx.address_tag = address_tag; if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) hugetlb_unshare_all_pmds(vma); diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 8f0fb62e8975..cb93e5b17896 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -286,9 +286,10 @@ struct vm_region { }; #ifdef CONFIG_USERFAULTFD -#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, }) +#define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) { NULL, 0, }) struct vm_userfaultfd_ctx { struct userfaultfd_ctx *ctx; + unsigned long address_tag; }; #else /* CONFIG_USERFAULTFD */ #define NULL_VM_UFFD_CTX ((struct vm_userfaultfd_ctx) {}) -- 2.32.0.93.g670b81a890-goog

4 years

3
4
0 0

[PATCH v2] perf: Require CAP_KILL if sigtrap is requested

by Marco Elver

If perf_event_open() is called with another task as target and perf_event_attr::sigtrap is set, and the target task's user does not match the calling user, also require the CAP_KILL capability. Otherwise, with the CAP_PERFMON capability alone it would be possible for a user to send SIGTRAP signals via perf events to another user's tasks. This could potentially result in those tasks being terminated if they cannot handle SIGTRAP signals. Note: The check complements the existing capability check, but is not supposed to supersede the ptrace_may_access() check. At a high level we now have: capable of CAP_PERFMON and (CAP_KILL if sigtrap) OR ptrace_may_access() // also checks for same thread-group and uid Fixes: 97ba62b27867 ("perf: Add support for SIGTRAP on perf events") Cc: <stable(a)vger.kernel.org> # 5.13+ Reported-by: Dmitry Vyukov <dvyukov(a)google.com> Signed-off-by: Marco Elver <elver(a)google.com> --- v2: * Drop kill_capable() and just check CAP_KILL (reported by Ondrej Mosnacek). * Use ns_capable(__task_cred(task)->user_ns, CAP_KILL) to check for capability in target task's ns (reported by Ondrej Mosnacek). --- kernel/events/core.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/kernel/events/core.c b/kernel/events/core.c index fe88d6eea3c2..43c99695dc3f 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -12152,10 +12152,23 @@ SYSCALL_DEFINE5(perf_event_open, } if (task) { + bool is_capable; + err = down_read_interruptible(&task->signal->exec_update_lock); if (err) goto err_file; + is_capable = perfmon_capable(); + if (attr.sigtrap) { + /* + * perf_event_attr::sigtrap sends signals to the other + * task. Require the current task to have CAP_KILL. + */ + rcu_read_lock(); + is_capable &= ns_capable(__task_cred(task)->user_ns, CAP_KILL); + rcu_read_unlock(); + } + /* * Preserve ptrace permission check for backwards compatibility. * @@ -12165,7 +12178,7 @@ SYSCALL_DEFINE5(perf_event_open, * perf_event_exit_task() that could imply). */ err = -EACCES; - if (!perfmon_capable() && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) + if (!is_capable && !ptrace_may_access(task, PTRACE_MODE_READ_REALCREDS)) goto err_cred; } -- 2.32.0.93.g670b81a890-goog

4 years

2
2
0 0

[PATCH 4/5] ubifs: report correct st_size for encrypted symlinks

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> The stat() family of syscalls report the wrong size for encrypted symlinks, which has caused breakage in several userspace programs. Fix this by calling fscrypt_symlink_getattr() after ubifs_getattr() for encrypted symlinks. This function computes the correct size by reading and decrypting the symlink target (if it's not already cached). For more details, see the commit which added fscrypt_symlink_getattr(). Fixes: ca7f85be8d6c ("ubifs: Add support for encrypted symlinks") Cc: stable(a)vger.kernel.org Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- fs/ubifs/file.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/fs/ubifs/file.c b/fs/ubifs/file.c index 2e4e1d159969..5cfa28cd00cd 100644 --- a/fs/ubifs/file.c +++ b/fs/ubifs/file.c @@ -1630,6 +1630,17 @@ static const char *ubifs_get_link(struct dentry *dentry, return fscrypt_get_symlink(inode, ui->data, ui->data_len, done); } +static int ubifs_symlink_getattr(struct user_namespace *mnt_userns, + const struct path *path, struct kstat *stat, + u32 request_mask, unsigned int query_flags) +{ + ubifs_getattr(mnt_userns, path, stat, request_mask, query_flags); + + if (IS_ENCRYPTED(d_inode(path->dentry))) + return fscrypt_symlink_getattr(path, stat); + return 0; +} + const struct address_space_operations ubifs_file_address_operations = { .readpage = ubifs_readpage, .writepage = ubifs_writepage, @@ -1655,7 +1666,7 @@ const struct inode_operations ubifs_file_inode_operations = { const struct inode_operations ubifs_symlink_inode_operations = { .get_link = ubifs_get_link, .setattr = ubifs_setattr, - .getattr = ubifs_getattr, + .getattr = ubifs_symlink_getattr, .listxattr = ubifs_listxattr, .update_time = ubifs_update_time, }; -- 2.32.0

4 years

1
0
0 0

[PATCH 3/5] f2fs: report correct st_size for encrypted symlinks

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> The stat() family of syscalls report the wrong size for encrypted symlinks, which has caused breakage in several userspace programs. Fix this by calling fscrypt_symlink_getattr() after f2fs_getattr() for encrypted symlinks. This function computes the correct size by reading and decrypting the symlink target (if it's not already cached). For more details, see the commit which added fscrypt_symlink_getattr(). Fixes: cbaf042a3cc6 ("f2fs crypto: add symlink encryption") Cc: stable(a)vger.kernel.org Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- fs/f2fs/namei.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/f2fs/namei.c b/fs/f2fs/namei.c index a9cd9cf97229..e2d540ae2293 100644 --- a/fs/f2fs/namei.c +++ b/fs/f2fs/namei.c @@ -1305,9 +1305,19 @@ static const char *f2fs_encrypted_get_link(struct dentry *dentry, return target; } +static int f2fs_encrypted_symlink_getattr(struct user_namespace *mnt_userns, + const struct path *path, + struct kstat *stat, u32 request_mask, + unsigned int query_flags) +{ + f2fs_getattr(mnt_userns, path, stat, request_mask, query_flags); + + return fscrypt_symlink_getattr(path, stat); +} + const struct inode_operations f2fs_encrypted_symlink_inode_operations = { .get_link = f2fs_encrypted_get_link, - .getattr = f2fs_getattr, + .getattr = f2fs_encrypted_symlink_getattr, .setattr = f2fs_setattr, .listxattr = f2fs_listxattr, }; -- 2.32.0

4 years

1
0
0 0

[PATCH 2/5] ext4: report correct st_size for encrypted symlinks

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> The stat() family of syscalls report the wrong size for encrypted symlinks, which has caused breakage in several userspace programs. Fix this by calling fscrypt_symlink_getattr() after ext4_getattr() for encrypted symlinks. This function computes the correct size by reading and decrypting the symlink target (if it's not already cached). For more details, see the commit which added fscrypt_symlink_getattr(). Fixes: f348c252320b ("ext4 crypto: add symlink encryption") Cc: stable(a)vger.kernel.org Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- fs/ext4/symlink.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/ext4/symlink.c b/fs/ext4/symlink.c index dd05af983092..69109746e6e2 100644 --- a/fs/ext4/symlink.c +++ b/fs/ext4/symlink.c @@ -52,10 +52,20 @@ static const char *ext4_encrypted_get_link(struct dentry *dentry, return paddr; } +static int ext4_encrypted_symlink_getattr(struct user_namespace *mnt_userns, + const struct path *path, + struct kstat *stat, u32 request_mask, + unsigned int query_flags) +{ + ext4_getattr(mnt_userns, path, stat, request_mask, query_flags); + + return fscrypt_symlink_getattr(path, stat); +} + const struct inode_operations ext4_encrypted_symlink_inode_operations = { .get_link = ext4_encrypted_get_link, .setattr = ext4_setattr, - .getattr = ext4_getattr, + .getattr = ext4_encrypted_symlink_getattr, .listxattr = ext4_listxattr, }; -- 2.32.0

4 years

1
0
0 0

[PATCH 1/5] fscrypt: add fscrypt_symlink_getattr() for computing st_size

by Eric Biggers

From: Eric Biggers <ebiggers(a)google.com> Add a helper function fscrypt_symlink_getattr() which will be called from the various filesystems' ->getattr() methods to read and decrypt the target of encrypted symlinks in order to report the correct st_size. Detailed explanation: As required by POSIX and as documented in various man pages, st_size for a symlink is supposed to be the length of the symlink target. Unfortunately, st_size has always been wrong for encrypted symlinks because st_size is populated from i_size from disk, which intentionally contains the length of the encrypted symlink target. That's slightly greater than the length of the decrypted symlink target (which is the symlink target that userspace usually sees), and usually won't match the length of the no-key encoded symlink target either. This hadn't been fixed yet because reporting the correct st_size would require reading the symlink target from disk and decrypting or encoding it, which historically has been considered too heavyweight to do in ->getattr(). Also historically, the wrong st_size had only broken a test (LTP lstat03) and there were no known complaints from real users. (This is probably because the st_size of symlinks isn't used too often, and when it is, typically it's for a hint for what buffer size to pass to readlink() -- which a slightly-too-large size still works for.) However, a couple things have changed now. First, there have recently been complaints about the current behavior from real users: - Breakage in rpmbuild: https://github.com/rpm-software-management/rpm/issues/1682 https://github.com/google/fscrypt/issues/305 - Breakage in toybox cpio: https://www.mail-archive.com/toybox@lists.landley.net/msg07193.html - Breakage in libgit2: https://issuetracker.google.com/issues/189629152 (on Android public issue tracker, requires login) Second, we now cache decrypted symlink targets in ->i_link. Therefore, taking the performance hit of reading and decrypting the symlink target in ->getattr() wouldn't be as big a deal as it used to be, since usually it will just save having to do the same thing later. Also note that eCryptfs ended up having to read and decrypt symlink targets in ->getattr() as well, to fix this same issue; see commit 3a60a1686f0d ("eCryptfs: Decrypt symlink target for stat size"). So, let's just bite the bullet, and read and decrypt the symlink target in ->getattr() in order to report the correct st_size. Add a function fscrypt_symlink_getattr() which the filesystems will call to do this. (Alternatively, we could store the decrypted size of symlinks on-disk. But there isn't a great place to do so, and encryption is meant to hide the original size to some extent; that property would be lost.) Cc: stable(a)vger.kernel.org Signed-off-by: Eric Biggers <ebiggers(a)google.com> --- fs/crypto/hooks.c | 44 +++++++++++++++++++++++++++++++++++++++++ include/linux/fscrypt.h | 7 +++++++ 2 files changed, 51 insertions(+) diff --git a/fs/crypto/hooks.c b/fs/crypto/hooks.c index a73b0376e6f3..af74599ae1cf 100644 --- a/fs/crypto/hooks.c +++ b/fs/crypto/hooks.c @@ -384,3 +384,47 @@ const char *fscrypt_get_symlink(struct inode *inode, const void *caddr, return ERR_PTR(err); } EXPORT_SYMBOL_GPL(fscrypt_get_symlink); + +/** + * fscrypt_symlink_getattr() - set the correct st_size for encrypted symlinks + * @path: the path for the encrypted symlink being queried + * @stat: the struct being filled with the symlink's attributes + * + * Override st_size of encrypted symlinks to be the length of the decrypted + * symlink target (or the no-key encoded symlink target, if the key is + * unavailable) rather than the length of the encrypted symlink target. This is + * necessary for st_size to match the symlink target that userspace actually + * sees. POSIX requires this, and some userspace programs depend on it. + * + * This requires reading the symlink target from disk if needed, setting up the + * inode's encryption key if possible, and then decrypting or encoding the + * symlink target. This makes lstat() more heavyweight than is normally the + * case. However, decrypted symlink targets will be cached in ->i_link, so + * usually the symlink won't have to be read and decrypted again later if/when + * it is actually followed, readlink() is called, or lstat() is called again. + * + * Return: 0 on success, -errno on failure + */ +int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat) +{ + struct dentry *dentry = path->dentry; + struct inode *inode = d_inode(dentry); + const char *link; + DEFINE_DELAYED_CALL(done); + + /* + * To get the symlink target that userspace will see (whether it's the + * decrypted target or the no-key encoded target), we can just get it in + * the same way the VFS does during path resolution and readlink(). + */ + link = READ_ONCE(inode->i_link); + if (!link) { + link = inode->i_op->get_link(dentry, inode, &done); + if (IS_ERR(link)) + return PTR_ERR(link); + } + stat->size = strlen(link); + do_delayed_call(&done); + return 0; +} +EXPORT_SYMBOL_GPL(fscrypt_symlink_getattr); diff --git a/include/linux/fscrypt.h b/include/linux/fscrypt.h index 2ea1387bb497..b7bfd0cd4f3e 100644 --- a/include/linux/fscrypt.h +++ b/include/linux/fscrypt.h @@ -253,6 +253,7 @@ int __fscrypt_encrypt_symlink(struct inode *inode, const char *target, const char *fscrypt_get_symlink(struct inode *inode, const void *caddr, unsigned int max_size, struct delayed_call *done); +int fscrypt_symlink_getattr(const struct path *path, struct kstat *stat); static inline void fscrypt_set_ops(struct super_block *sb, const struct fscrypt_operations *s_cop) { @@ -583,6 +584,12 @@ static inline const char *fscrypt_get_symlink(struct inode *inode, return ERR_PTR(-EOPNOTSUPP); } +static inline int fscrypt_symlink_getattr(const struct path *path, + struct kstat *stat) +{ + return -EOPNOTSUPP; +} + static inline void fscrypt_set_ops(struct super_block *sb, const struct fscrypt_operations *s_cop) { -- 2.32.0

4 years

1
0
0 0

[for-next][PATCH 2/2] tracing: Resize tgid_map to pid_max, not PID_MAX_DEFAULT

by Steven Rostedt

From: Paul Burton <paulburton(a)google.com> Currently tgid_map is sized at PID_MAX_DEFAULT entries, which means that on systems where pid_max is configured higher than PID_MAX_DEFAULT the ftrace record-tgid option doesn't work so well. Any tasks with PIDs higher than PID_MAX_DEFAULT are simply not recorded in tgid_map, and don't show up in the saved_tgids file. In particular since systemd v243 & above configure pid_max to its highest possible 1<<22 value by default on 64 bit systems this renders the record-tgids option of little use. Increase the size of tgid_map to the configured pid_max instead, allowing it to cover the full range of PIDs up to the maximum value of PID_MAX_LIMIT if the system is configured that way. On 64 bit systems with pid_max == PID_MAX_LIMIT this will increase the size of tgid_map from 256KiB to 16MiB. Whilst this 64x increase in memory overhead sounds significant 64 bit systems are presumably best placed to accommodate it, and since tgid_map is only allocated when the record-tgid option is actually used presumably the user would rather it spends sufficient memory to actually record the tgids they expect. The size of tgid_map could also increase for CONFIG_BASE_SMALL=y configurations, but these seem unlikely to be systems upon which people are both configuring a large pid_max and running ftrace with record-tgid anyway. Of note is that we only allocate tgid_map once, the first time that the record-tgid option is enabled. Therefore its size is only set once, to the value of pid_max at the time the record-tgid option is first enabled. If a user increases pid_max after that point, the saved_tgids file will not contain entries for any tasks with pids beyond the earlier value of pid_max. Link: https://lkml.kernel.org/r/20210701172407.889626-2-paulburton@google.com Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Joel Fernandes <joelaf(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Paul Burton <paulburton(a)google.com> [ Fixed comment coding style ] Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> --- kernel/trace/trace.c | 63 +++++++++++++++++++++++++++++++++----------- 1 file changed, 47 insertions(+), 16 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 4843076d67d3..14f56e9fa001 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -2191,8 +2191,15 @@ void tracing_reset_all_online_cpus(void) } } +/* + * The tgid_map array maps from pid to tgid; i.e. the value stored at index i + * is the tgid last observed corresponding to pid=i. + */ static int *tgid_map; +/* The maximum valid index into tgid_map. */ +static size_t tgid_map_max; + #define SAVED_CMDLINES_DEFAULT 128 #define NO_CMDLINE_MAP UINT_MAX static arch_spinlock_t trace_cmdline_lock = __ARCH_SPIN_LOCK_UNLOCKED; @@ -2468,24 +2475,41 @@ void trace_find_cmdline(int pid, char comm[]) preempt_enable(); } +static int *trace_find_tgid_ptr(int pid) +{ + /* + * Pairs with the smp_store_release in set_tracer_flag() to ensure that + * if we observe a non-NULL tgid_map then we also observe the correct + * tgid_map_max. + */ + int *map = smp_load_acquire(&tgid_map); + + if (unlikely(!map || pid > tgid_map_max)) + return NULL; + + return &map[pid]; +} + int trace_find_tgid(int pid) { - if (unlikely(!tgid_map || !pid || pid > PID_MAX_DEFAULT)) - return 0; + int *ptr = trace_find_tgid_ptr(pid); - return tgid_map[pid]; + return ptr ? *ptr : 0; } static int trace_save_tgid(struct task_struct *tsk) { + int *ptr; + /* treat recording of idle task as a success */ if (!tsk->pid) return 1; - if (unlikely(!tgid_map || tsk->pid > PID_MAX_DEFAULT)) + ptr = trace_find_tgid_ptr(tsk->pid); + if (!ptr) return 0; - tgid_map[tsk->pid] = tsk->tgid; + *ptr = tsk->tgid; return 1; } @@ -5225,6 +5249,8 @@ int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set) int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) { + int *map; + if ((mask == TRACE_ITER_RECORD_TGID) || (mask == TRACE_ITER_RECORD_CMD)) lockdep_assert_held(&event_mutex); @@ -5247,10 +5273,19 @@ int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled) trace_event_enable_cmd_record(enabled); if (mask == TRACE_ITER_RECORD_TGID) { - if (!tgid_map) - tgid_map = kvcalloc(PID_MAX_DEFAULT + 1, - sizeof(*tgid_map), - GFP_KERNEL); + if (!tgid_map) { + tgid_map_max = pid_max; + map = kvcalloc(tgid_map_max + 1, sizeof(*tgid_map), + GFP_KERNEL); + + /* + * Pairs with smp_load_acquire() in + * trace_find_tgid_ptr() to ensure that if it observes + * the tgid_map we just allocated then it also observes + * the corresponding tgid_map_max value. + */ + smp_store_release(&tgid_map, map); + } if (!tgid_map) { tr->trace_flags &= ~TRACE_ITER_RECORD_TGID; return -ENOMEM; @@ -5664,18 +5699,14 @@ static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos) { int pid = ++(*pos); - if (pid > PID_MAX_DEFAULT) - return NULL; - - return &tgid_map[pid]; + return trace_find_tgid_ptr(pid); } static void *saved_tgids_start(struct seq_file *m, loff_t *pos) { - if (!tgid_map || *pos > PID_MAX_DEFAULT) - return NULL; + int pid = *pos; - return &tgid_map[*pos]; + return trace_find_tgid_ptr(pid); } static void saved_tgids_stop(struct seq_file *m, void *v) -- 2.30.2

4 years

1
0
0 0

[for-next][PATCH 1/2] tracing: Simplify & fix saved_tgids logic

by Steven Rostedt

From: Paul Burton <paulburton(a)google.com> The tgid_map array records a mapping from pid to tgid, where the index of an entry within the array is the pid & the value stored at that index is the tgid. The saved_tgids_next() function iterates over pointers into the tgid_map array & dereferences the pointers which results in the tgid, but then it passes that dereferenced value to trace_find_tgid() which treats it as a pid & does a further lookup within the tgid_map array. It seems likely that the intent here was to skip over entries in tgid_map for which the recorded tgid is zero, but instead we end up skipping over entries for which the thread group leader hasn't yet had its own tgid recorded in tgid_map. A minimal fix would be to remove the call to trace_find_tgid, turning: if (trace_find_tgid(*ptr)) into: if (*ptr) ..but it seems like this logic can be much simpler if we simply let seq_read() iterate over the whole tgid_map array & filter out empty entries by returning SEQ_SKIP from saved_tgids_show(). Here we take that approach, removing the incorrect logic here entirely. Link: https://lkml.kernel.org/r/20210630003406.4013668-1-paulburton@google.com Fixes: d914ba37d714 ("tracing: Add support for recording tgid of tasks") Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Joel Fernandes <joelaf(a)google.com> Cc: <stable(a)vger.kernel.org> Signed-off-by: Paul Burton <paulburton(a)google.com> Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> --- kernel/trace/trace.c | 38 +++++++++++++------------------------- 1 file changed, 13 insertions(+), 25 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 60492464281e..4843076d67d3 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -5662,37 +5662,20 @@ static const struct file_operations tracing_readme_fops = { static void *saved_tgids_next(struct seq_file *m, void *v, loff_t *pos) { - int *ptr = v; + int pid = ++(*pos); - if (*pos || m->count) - ptr++; - - (*pos)++; - - for (; ptr <= &tgid_map[PID_MAX_DEFAULT]; ptr++) { - if (trace_find_tgid(*ptr)) - return ptr; - } + if (pid > PID_MAX_DEFAULT) + return NULL; - return NULL; + return &tgid_map[pid]; } static void *saved_tgids_start(struct seq_file *m, loff_t *pos) { - void *v; - loff_t l = 0; - - if (!tgid_map) + if (!tgid_map || *pos > PID_MAX_DEFAULT) return NULL; - v = &tgid_map[0]; - while (l <= *pos) { - v = saved_tgids_next(m, v, &l); - if (!v) - return NULL; - } - - return v; + return &tgid_map[*pos]; } static void saved_tgids_stop(struct seq_file *m, void *v) @@ -5701,9 +5684,14 @@ static void saved_tgids_stop(struct seq_file *m, void *v) static int saved_tgids_show(struct seq_file *m, void *v) { - int pid = (int *)v - tgid_map; + int *entry = (int *)v; + int pid = entry - tgid_map; + int tgid = *entry; + + if (tgid == 0) + return SEQ_SKIP; - seq_printf(m, "%d %d\n", pid, trace_find_tgid(pid)); + seq_printf(m, "%d %d\n", pid, tgid); return 0; } -- 2.30.2

4 years

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror July 2021