November 2024 - Linux-kselftest-mirror

[PATCH 1/2] selftest/mm: Fix typo in virtual_address_range

by Chunyan Zhang

The function name should be *hint* address, so correct it. Signed-off-by: Chunyan Zhang <zhangchunyan(a)iscas.ac.cn> --- tools/testing/selftests/mm/virtual_address_range.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mm/virtual_address_range.c b/tools/testing/selftests/mm/virtual_address_range.c index 4e4c1e311247..2a2b69e91950 100644 --- a/tools/testing/selftests/mm/virtual_address_range.c +++ b/tools/testing/selftests/mm/virtual_address_range.c @@ -64,7 +64,7 @@ #define NR_CHUNKS_HIGH NR_CHUNKS_384TB #endif -static char *hind_addr(void) +static char *hint_addr(void) { int bits = HIGH_ADDR_SHIFT + rand() % (63 - HIGH_ADDR_SHIFT); @@ -185,7 +185,7 @@ int main(int argc, char *argv[]) } for (i = 0; i < NR_CHUNKS_HIGH; i++) { - hint = hind_addr(); + hint = hint_addr(); hptr[i] = mmap(hint, MAP_CHUNK_SIZE, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); -- 2.34.1

7 months

4
5
0 0

[PATCH 00/15] KVM: x86: Introduce new ioctl KVM_TRANSLATE2

by Nikolas Wipper

This series introduces a new ioctl KVM_TRANSLATE2, which expands on KVM_TRANSLATE. It is required to implement Hyper-V's HvTranslateVirtualAddress hyper-call as part of the ongoing effort to emulate HyperV's Virtual Secure Mode (VSM) within KVM and QEMU. The hyper- call requires several new KVM APIs, one of which is KVM_TRANSLATE2, which implements the core functionality of the hyper-call. The rest of the required functionality will be implemented in subsequent series. Other than translating guest virtual addresses, the ioctl allows the caller to control whether the access and dirty bits are set during the page walk. It also allows specifying an access mode instead of returning viable access modes, which enables setting the bits up to the level that caused a failure. Additionally, the ioctl provides more information about why the page walk failed, and which page table is responsible. This functionality is not available within KVM_TRANSLATE, and can't be added without breaking backwards compatiblity, thus a new ioctl is required. The ioctl was designed to facilitate as many other use cases as possible apart from VSM. The error codes were intentionally chosen to be broad enough to avoid exposing architecture specific details. Even though HvTranslateVirtualAddress only really needs one flag to set the accessed and dirty bits whenever possible, that was split into several flags so that future users can chose more gradually when these bits should be set. Furthermore, as much information as possible is provided to the caller. The patch series includes selftests for the ioctl, as well as fuzzy testing on random garbage guest page table entries. All previously passing KVM selftests and KVM unit tests still pass. Series overview: - 1: Document the new ioctl - 2-11: Update the page walker in preparation - 12-14: Implement the ioctl - 15: Implement testing This series, alongside the series by Nicolas Saenz Julienne [1] introducing the core building blocks for VSM and the accompanying QEMU implementation [2], is capable of booting Windows Server 2019. Both series are also available on GitHub [3]. [1] https://lore.kernel.org/linux-hyperv/20240609154945.55332-1-nsaenz@amazon.c… [2] https://github.com/vianpl/qemu/tree/vsm/next [3] https://github.com/vianpl/linux/tree/vsm/next Best, Nikolas Nikolas Wipper (15): KVM: Add API documentation for KVM_TRANSLATE2 KVM: x86/mmu: Abort page walk if permission checks fail KVM: x86/mmu: Introduce exception flag for unmapped GPAs KVM: x86/mmu: Store GPA in exception if applicable KVM: x86/mmu: Introduce flags parameter to page walker KVM: x86/mmu: Implement PWALK_SET_ACCESSED in page walker KVM: x86/mmu: Implement PWALK_SET_DIRTY in page walker KVM: x86/mmu: Implement PWALK_FORCE_SET_ACCESSED in page walker KVM: x86/mmu: Introduce status parameter to page walker KVM: x86/mmu: Implement PWALK_STATUS_READ_ONLY_PTE_GPA in page walker KVM: x86: Introduce generic gva to gpa translation function KVM: Introduce KVM_TRANSLATE2 KVM: Add KVM_TRANSLATE2 stub KVM: x86: Implement KVM_TRANSLATE2 KVM: selftests: Add test for KVM_TRANSLATE2 Documentation/virt/kvm/api.rst | 131 ++++++++ arch/x86/include/asm/kvm_host.h | 18 +- arch/x86/kvm/hyperv.c | 3 +- arch/x86/kvm/kvm_emulate.h | 8 + arch/x86/kvm/mmu.h | 10 +- arch/x86/kvm/mmu/mmu.c | 7 +- arch/x86/kvm/mmu/paging_tmpl.h | 80 +++-- arch/x86/kvm/x86.c | 123 ++++++- include/linux/kvm_host.h | 6 + include/uapi/linux/kvm.h | 33 ++ tools/testing/selftests/kvm/Makefile | 1 + .../selftests/kvm/x86_64/kvm_translate2.c | 310 ++++++++++++++++++ virt/kvm/kvm_main.c | 41 +++ 13 files changed, 724 insertions(+), 47 deletions(-) create mode 100644 tools/testing/selftests/kvm/x86_64/kvm_translate2.c -- 2.40.1 Amazon Web Services Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 257764 B Sitz: Berlin Ust-ID: DE 365 538 597

7 months

2
18
0 0

[PATCH v2 0/4] Migrate PCI Endpoint Subsystem tests to Kselftest

by Manivannan Sadhasivam

Hi, This series carries forward the effort to add Kselftest for PCI Endpoint Subsystem started by Aman Gupta [1] a while ago. I reworked the initial version based on another patch that fixes the return values of IOCTLs in pci_endpoint_test driver and did many cleanups. Since the resulting work modified the initial version substantially, I took over the authorship. This series also incorporates the review comment by Shuah Khan [2] to move the existing tests from 'tools/pci' to 'tools/testing/kselftest/pci_endpoint' before migrating to Kselftest framework. I made sure that the tests are executable in each commit and updated documentation accordingly. NOTE: Patch 1 is strictly not related to this series, but necessary to execute Kselftests with Qualcomm Endpoint devices. So this can be merged separately. - Mani [1] https://lore.kernel.org/linux-pci/20221007053934.5188-1-aman1.gupta@samsung… [2] https://lore.kernel.org/linux-pci/b2a5db97-dc59-33ab-71cd-f591e0b1b34d@linu… Changes in v2: * Added a patch that fixes return values of IOCTL in pci_endpoint_test driver * Moved the existing tests to new location before migrating * Added a fix for BARs on Qcom devices * Updated documentation and also added fixture variants for memcpy & DMA modes Manivannan Sadhasivam (4): PCI: qcom-ep: Mark BAR0/BAR2 as 64bit BARs and BAR1/BAR3 as RESERVED misc: pci_endpoint_test: Fix the return value of IOCTL selftests: Move PCI Endpoint tests from tools/pci to Kselftests selftests: pci_endpoint: Migrate to Kselftest framework Documentation/PCI/endpoint/pci-test-howto.rst | 144 +++------- MAINTAINERS | 2 +- drivers/misc/pci_endpoint_test.c | 236 ++++++++--------- drivers/pci/controller/dwc/pcie-qcom-ep.c | 4 + tools/pci/Build | 1 - tools/pci/Makefile | 58 ---- tools/pci/pcitest.c | 250 ------------------ tools/pci/pcitest.sh | 72 ----- tools/testing/selftests/Makefile | 1 + .../testing/selftests/pci_endpoint/.gitignore | 2 + tools/testing/selftests/pci_endpoint/Makefile | 7 + tools/testing/selftests/pci_endpoint/config | 4 + .../pci_endpoint/pci_endpoint_test.c | 186 +++++++++++++ 13 files changed, 365 insertions(+), 602 deletions(-) delete mode 100644 tools/pci/Build delete mode 100644 tools/pci/Makefile delete mode 100644 tools/pci/pcitest.c delete mode 100644 tools/pci/pcitest.sh create mode 100644 tools/testing/selftests/pci_endpoint/.gitignore create mode 100644 tools/testing/selftests/pci_endpoint/Makefile create mode 100644 tools/testing/selftests/pci_endpoint/config create mode 100644 tools/testing/selftests/pci_endpoint/pci_endpoint_test.c -- 2.25.1

7 months

6
18
0 0

[PATCH for-next v3] selftests/filesystems: Add missing gitignore file

by Li Zhijian

Compiled binary files should be added to .gitignore 'git status' complains: Untracked files: (use "git add <file>..." to include in what will be committed) filesystems/statmount/statmount_test_ns Cc: Shuah Khan <shuah(a)kernel.org> Cc: Christian Brauner <brauner(a)kernel.org> Cc: Miklos Szeredi <mszeredi(a)redhat.com> Cc: Josef Bacik <josef(a)toxicpanda.com> Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- Hello, Cover letter is here. This patch set aims to make 'git status' clear after 'make' and 'make run_tests' for kselftests. --- V3: sorted the ignored files V2: split as a separate patch from a small one [0] [0] https://lore.kernel.org/linux-kselftest/20241015010817.453539-1-lizhijian@f… Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- tools/testing/selftests/filesystems/statmount/.gitignore | 1 + 1 file changed, 1 insertion(+) diff --git a/tools/testing/selftests/filesystems/statmount/.gitignore b/tools/testing/selftests/filesystems/statmount/.gitignore index 82a4846cbc4b..973363ad66a2 100644 --- a/tools/testing/selftests/filesystems/statmount/.gitignore +++ b/tools/testing/selftests/filesystems/statmount/.gitignore @@ -1,2 +1,3 @@ # SPDX-License-Identifier: GPL-2.0-only +statmount_test_ns /*_test -- 2.44.0

7 months

4
3
0 0

[PATCH for-next v3] selftests/zram: gitignore output file

by Li Zhijian

After `make run_tests`, the git status complains: Untracked files: (use "git add <file>..." to include in what will be committed) zram/err.log This file will be cleaned up when execute 'make clean' Cc: Shuah Khan <shuah(a)kernel.org> Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- Hello, Cover letter is here. This patch set aims to make 'git status' clear after 'make' and 'make run_tests' for kselftests. --- V3: Add Copyright description V2: split as a separate patch from a small one [0] [0] https://lore.kernel.org/linux-kselftest/20241015010817.453539-1-lizhijian@f… Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- tools/testing/selftests/zram/.gitignore | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 tools/testing/selftests/zram/.gitignore diff --git a/tools/testing/selftests/zram/.gitignore b/tools/testing/selftests/zram/.gitignore new file mode 100644 index 000000000000..088cd9bad87a --- /dev/null +++ b/tools/testing/selftests/zram/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +err.log -- 2.44.0

7 months

3
4
0 0

[PATCH net-next v4 0/6] tls: implement key updates for TLS1.3

by Sabrina Dubroca

This adds support for receiving KeyUpdate messages (RFC 8446, 4.6.3 [1]). A sender transmits a KeyUpdate message and then changes its TX key. The receiver should react by updating its RX key before processing the next message. This patchset implements key updates by: 1. pausing decryption when a KeyUpdate message is received, to avoid attempting to use the old key to decrypt a record encrypted with the new key 2. returning -EKEYEXPIRED to syscalls that cannot receive the KeyUpdate message, until the rekey has been performed by userspace 3. passing the KeyUpdate message to userspace as a control message 4. allowing updates of the crypto_info via the TLS_TX/TLS_RX setsockopts This API has been tested with gnutls to make sure that it allows userspace libraries to implement key updates [2]. Thanks to Frantisek Krenzelok <fkrenzel(a)redhat.com> for providing the implementation in gnutls and testing the kernel patches. ======================================================================= Discussions around v2 of this patchset focused on how HW offload would interact with rekey. RX - The existing SW path will handle all records between the KeyUpdate message signaling the change of key and the new key becoming known to the kernel -- those will be queued encrypted, and decrypted in SW as they are read by userspace (once the key is provided, ie same as this patchset) - Call ->tls_dev_del + ->tls_dev_add immediately during setsockopt(TLS_RX) TX - After setsockopt(TLS_TX), switch to the existing SW path (not the current device_fallback) until we're able to re-enable HW offload - tls_device_sendmsg will call into tls_sw_sendmsg under lock_sock to avoid changing socket ops during the rekey while another thread might be waiting on the lock - We only re-enable HW offload (call ->tls_dev_add to install the new key in HW) once all records sent with the old key have been ACKed. At this point, all unacked records are SW-encrypted with the new key, and the old key is unused by both HW and retransmissions. - If there are no unacked records when userspace does setsockopt(TLS_TX), we can (try to) install the new key in HW immediately. - If yet another key has been provided via setsockopt(TLS_TX), we don't install intermediate keys, only the latest. - TCP notifies ktls of ACKs via the icsk_clean_acked callback. In case of a rekey, tls_icsk_clean_acked will record when all data sent with the most recent past key has been sent. The next call to sendmsg will install the new key in HW. - We close and push the current SW record before reenabling offload. If ->tls_dev_add fails to install the new key in HW, we stay in SW mode. We can add a counter to keep track of this. In addition: Because we can't change socket ops during a rekey, we'll also have to modify do_tls_setsockopt_conf to check ctx->tx_conf and only call either tls_set_device_offload or tls_set_sw_offload. RX already uses the same ops for both TLS_HW and TLS_SW, so we could switch between HW and SW mode on rekey. An alternative would be to have a common sendmsg which locks the socket and then calls the correct implementation. We'll need that anyway for the offload under rekey case, so that would only add a test to the SW path's ops (compared to the current code). That should allow us to simplify build_protos a bit, but might have a performance impact - we'll need to check it if we want to go that route. ======================================================================= Changes since v3: - rebase on top of net-next - rework tls_check_pending_rekey according to Jakub's feedback - add statistics for rekey: {RX,TX}REKEY{OK,ERROR} - some coding style clean ups Link: https://lore.kernel.org/netdev/cover.1691584074.git.sd@queasysnail.net/ [v3] Link: https://lore.kernel.org/netdev/cover.1676052788.git.sd@queasysnail.net/ [v2] Link: https://lore.kernel.org/netdev/cover.1673952268.git.sd@queasysnail.net/ [v1] Link: https://www.rfc-editor.org/rfc/rfc8446#section-4.6.3 [1] Link: https://gitlab.com/gnutls/gnutls/-/merge_requests/1625 [2] Sabrina Dubroca (6): tls: block decryption when a rekey is pending tls: implement rekey for TLS1.3 tls: add counters for rekey docs: tls: document TLS1.3 key updates selftests: tls: add key_generation argument to tls_crypto_info_init selftests: tls: add rekey tests Documentation/networking/tls.rst | 31 ++ include/net/tls.h | 3 + include/uapi/linux/snmp.h | 4 + net/tls/tls.h | 3 +- net/tls/tls_device.c | 2 +- net/tls/tls_main.c | 71 ++++- net/tls/tls_proc.c | 4 + net/tls/tls_sw.c | 138 +++++++-- tools/testing/selftests/net/tls.c | 480 +++++++++++++++++++++++++++++- 9 files changed, 676 insertions(+), 60 deletions(-) -- 2.47.0

7 months

3
19
0 0

[PATCH for-next v3] selftests/cpufreq: gitignore output files and clean them in make clean

by Li Zhijian

After `make run_tests`, the git status complains: Untracked files: (use "git add <file>..." to include in what will be committed) cpufreq/cpufreq_selftest.dmesg_cpufreq.txt cpufreq/cpufreq_selftest.dmesg_full.txt cpufreq/cpufreq_selftest.txt Cc: "Rafael J. Wysocki" <rafael(a)kernel.org> Cc: Viresh Kumar <viresh.kumar(a)linaro.org> Cc: Shuah Khan <shuah(a)kernel.org> Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- Cc: linux-pm(a)vger.kernel.org --- Hello, Cover letter is here. This patch set aims to make 'git status' clear after 'make' and 'make run_tests' for kselftests. --- V3: add Copyright descirption V2: split as a separate patch from a small one [0] [0] https://lore.kernel.org/linux-kselftest/20241015010817.453539-1-lizhijian@f… Signed-off-by: Li Zhijian <lizhijian(a)fujitsu.com> --- tools/testing/selftests/cpufreq/.gitignore | 2 ++ tools/testing/selftests/cpufreq/Makefile | 1 + 2 files changed, 3 insertions(+) create mode 100644 tools/testing/selftests/cpufreq/.gitignore diff --git a/tools/testing/selftests/cpufreq/.gitignore b/tools/testing/selftests/cpufreq/.gitignore new file mode 100644 index 000000000000..67604e91e068 --- /dev/null +++ b/tools/testing/selftests/cpufreq/.gitignore @@ -0,0 +1,2 @@ +# SPDX-License-Identifier: GPL-2.0-only +cpufreq_selftest.* diff --git a/tools/testing/selftests/cpufreq/Makefile b/tools/testing/selftests/cpufreq/Makefile index c86ca8342222..9b2ccb10b0cf 100644 --- a/tools/testing/selftests/cpufreq/Makefile +++ b/tools/testing/selftests/cpufreq/Makefile @@ -3,6 +3,7 @@ all: TEST_PROGS := main.sh TEST_FILES := cpu.sh cpufreq.sh governor.sh module.sh special-tests.sh +EXTRA_CLEAN := cpufreq_selftest.dmesg_cpufreq.txt cpufreq_selftest.dmesg_full.txt cpufreq_selftest.txt include ../lib.mk -- 2.44.0

7 months

3
2
0 0

[RFC PATCH v3 00/15] context_tracking,x86: Defer some IPIs until a user->kernel transition

by Valentin Schneider

Context ======= We've observed within Red Hat that isolated, NOHZ_FULL CPUs running a pure-userspace application get regularly interrupted by IPIs sent from housekeeping CPUs. Those IPIs are caused by activity on the housekeeping CPUs leading to various on_each_cpu() calls, e.g.: 64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush) smp_call_function_many_cond+0x1 smp_call_function+0x39 on_each_cpu+0x2a flush_tlb_kernel_range+0x7b __purge_vmap_area_lazy+0x70 _vm_unmap_aliases.part.42+0xdf change_page_attr_set_clr+0x16a set_memory_ro+0x26 bpf_int_jit_compile+0x2f9 bpf_prog_select_runtime+0xc6 bpf_prepare_filter+0x523 sk_attach_filter+0x13 sock_setsockopt+0x92c __sys_setsockopt+0x16a __x64_sys_setsockopt+0x20 do_syscall_64+0x87 entry_SYSCALL_64_after_hwframe+0x65 The heart of this series is the thought that while we cannot remove NOHZ_FULL CPUs from the list of CPUs targeted by these IPIs, they may not have to execute the callbacks immediately. Anything that only affects kernelspace can wait until the next user->kernel transition, providing it can be executed "early enough" in the entry code. The original implementation is from Peter [1]. Nicolas then added kernel TLB invalidation deferral to that [2], and I picked it up from there. Deferral approach ================= Storing each and every callback, like a secondary call_single_queue turned out to be a no-go: the whole point of deferral is to keep NOHZ_FULL CPUs in userspace for as long as possible - no signal of any form would be sent when deferring an IPI. This means that any form of queuing for deferred callbacks would end up as a convoluted memory leak. Deferred IPIs must thus be coalesced, which this series achieves by assigning IPIs a "type" and having a mapping of IPI type to callback, leveraged upon kernel entry. What about IPIs whose callback take a parameter, you may ask? Peter suggested during OSPM23 [3] that since on_each_cpu() targets housekeeping CPUs *and* isolated CPUs, isolated CPUs can access either global or housekeeping-CPU-local state to "reconstruct" the data that would have been sent via the IPI. This series does not affect any IPI callback that requires an argument, but the approach would remain the same (one coalescable callback executed on kernel entry). Kernel entry vs execution of the deferred operation =================================================== This is what I've referred to as the "Danger Zone" during my LPC24 talk [4]. There is a non-zero length of code that is executed upon kernel entry before the deferred operation can be itself executed (i.e. before we start getting into context_tracking.c proper), i.e.: idtentry_func_foo() <--- we're in the kernel irqentry_enter() enter_from_user_mode() __ct_user_exit() ct_kernel_enter_state() ct_work_flush() <--- deferred operation is executed here This means one must take extra care to what can happen in the early entry code, and that <bad things> cannot happen. For instance, we really don't want to hit instructions that have been modified by a remote text_poke() while we're on our way to execute a deferred sync_core(). Patches doing the actual deferral have more detail on this. Patches ======= o Patches 1-3 are standalone cleanups. o Patches 4-5 add an RCU testing feature. o Patches 6-8 add a new type of jump label for static keys that will not have their IPI be deferred. o Patch 9 adds objtool verification of static keys vs their text_poke IPI deferral o Patches 10-14 add the actual IPI deferrals o Patch 15 is a freebie to enable the deferral feature for NO_HZ_IDLE Patches are also available at: https://gitlab.com/vschneid/linux.git -b redhat/isolirq/defer/v3 RFC status ========== Things I'd like to get comments on and/or that are a bit WIPish; they're called out in the individual changelogs: o "forceful" jump label naming which I don't particularly like o objtool usage of 'offset_of(static_key.type)' and JUMP_TYPE_FORCEFUL. I've hardcoded them but it could do with being shoved in a kernel header objtool can include directly o The noinstr variant of __flush_tlb_all() doesn't have a paravirt variant, does it need one? Testing ======= Xeon E5-2699 system with SMToff, NOHZ_FULL, isolated CPUs. RHEL9 userspace. Workload is using rteval (kernel compilation + hackbench) on housekeeping CPUs and a dummy stay-in-userspace loop on the isolated CPUs. The main invocation is: $ trace-cmd record -e "csd_queue_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \ -e "ipi_send_cpumask" -f "cpumask & CPUS{$ISOL_CPUS}" \ -e "ipi_send_cpu" -f "cpu & CPUS{$ISOL_CPUS}" \ rteval --onlyload --loads-cpulist=$HK_CPUS \ --hackbench-runlowmem=True --duration=$DURATION This only records IPIs sent to isolated CPUs, so any event there is interference (with a bit of fuzz at the start/end of the workload when spawning the processes). All tests were done with a duration of 1hr. v6.12-rc4 # This is the actual IPI count $ trace-cmd report trace-base.dat | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr 1782 callback=generic_smp_call_function_single_interrupt+0x0 73 callback=0x0 # These are the different CSD's that caused IPIs $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr 22048 func=tlb_remove_table_smp_sync 16536 func=do_sync_core 2262 func=do_flush_tlb_all 182 func=do_kernel_range_flush 144 func=rcu_exp_handler 60 func=sched_ttwu_pending v6.12-rc4 + patches: # This is the actual IPI count $ trace-cmd report | grep callback | awk '{ print $(NF) }' | sort | uniq -c | sort -nr 1168 callback=generic_smp_call_function_single_interrupt+0x0 74 callback=0x0 # These are the different CSD's that caused IPIs $ trace-cmd report | grep csd_queue | awk '{ print $(NF-1) }' | sort | uniq -c | sort -nr 23686 func=tlb_remove_table_smp_sync 192 func=rcu_exp_handler 65 func=sched_ttwu_pending Interestingly tlb_remove_table_smp_sync() started showing up on this machine, while it didn't during testing for v2 and it's the same machine. Yair had a series adressing this [5] which per these results would be worth revisiting. Acknowledgements ================ Special thanks to: o Clark Williams for listening to my ramblings about this and throwing ideas my way o Josh Poimboeuf for his guidance regarding objtool and hinting at the .data..ro_after_init section. o All of the folks who attended various talks about this and provided precious feedback. Links ===== [1]: https://lore.kernel.org/all/20210929151723.162004989@infradead.org/ [2]: https://github.com/vianpl/linux.git -b ct-work-defer-wip [3]: https://youtu.be/0vjE6fjoVVE [4]: https://lpc.events/event/18/contributions/1889/ [5]: https://lore.kernel.org/lkml/20230620144618.125703-1-ypodemsk@redhat.com/ Revisions ========= RFCv2 -> RFCv3 +++++++++++ o Rebased onto v6.12-rc7 o Added objtool documentation for the new warning (Josh) o Added low-size RCU watching counter to TREE04 torture scenario (Paul) o Added FORCEFUL jump label and static key types o Added noinstr-compliant helpers for tlb flush deferral o Overall changelog & comments cleanup RFCv1 -> RFCv2 ++++++++++++++ o Rebased onto v6.5-rc1 o Updated the trace filter patches (Steven) o Fixed __ro_after_init keys used in modules (Peter) o Dropped the extra context_tracking atomic, squashed the new bits in the existing .state field (Peter, Frederic) o Added an RCU_EXPERT config for the RCU dynticks counter size, and added an rcutorture case for a low-size counter (Paul) o Fixed flush_tlb_kernel_range_deferrable() definition Valentin Schneider (15): objtool: Make validate_call() recognize indirect calls to pv_ops[] objtool: Flesh out warning related to pv_ops[] calls sched/clock: Make sched_clock_running __ro_after_init rcu: Add a small-width RCU watching counter debug option rcutorture: Make TREE04 use CONFIG_RCU_DYNTICKS_TORTURE jump_label: Add forceful jump label type x86/speculation/mds: Make mds_idle_clear forceful sched/clock, x86: Make __sched_clock_stable forceful objtool: Warn about non __ro_after_init static key usage in .noinstr x86/alternatives: Record text_poke's of JUMP_TYPE_FORCEFUL labels context-tracking: Introduce work deferral infrastructure context_tracking,x86: Defer kernel text patching IPIs context_tracking,x86: Add infrastructure to defer kernel TLBI x86/mm, mm/vmalloc: Defer flush_tlb_kernel_range() targeting NOHZ_FULL CPUs context-tracking: Add a Kconfig to enable IPI deferral for NO_HZ_IDLE arch/Kconfig | 9 +++ arch/x86/Kconfig | 1 + arch/x86/include/asm/context_tracking_work.h | 20 +++++++ arch/x86/include/asm/special_insns.h | 1 + arch/x86/include/asm/text-patching.h | 13 ++++- arch/x86/include/asm/tlbflush.h | 17 +++++- arch/x86/kernel/alternative.c | 49 ++++++++++++---- arch/x86/kernel/cpu/bugs.c | 2 +- arch/x86/kernel/cpu/common.c | 6 +- arch/x86/kernel/jump_label.c | 7 ++- arch/x86/kernel/kprobes/core.c | 4 +- arch/x86/kernel/kprobes/opt.c | 4 +- arch/x86/kernel/module.c | 2 +- arch/x86/mm/tlb.c | 49 ++++++++++++++-- include/linux/context_tracking.h | 21 +++++++ include/linux/context_tracking_state.h | 54 ++++++++++++++--- include/linux/context_tracking_work.h | 28 +++++++++ include/linux/jump_label.h | 26 ++++++--- kernel/context_tracking.c | 46 ++++++++++++++- kernel/rcu/Kconfig.debug | 14 +++++ kernel/sched/clock.c | 4 +- kernel/time/Kconfig | 19 ++++++ mm/vmalloc.c | 35 +++++++++-- tools/objtool/Documentation/objtool.txt | 13 +++++ tools/objtool/check.c | 58 ++++++++++++++++--- tools/objtool/include/objtool/check.h | 1 + tools/objtool/include/objtool/special.h | 2 + tools/objtool/special.c | 3 + .../selftests/rcutorture/configs/rcu/TREE04 | 1 + 29 files changed, 450 insertions(+), 59 deletions(-) create mode 100644 arch/x86/include/asm/context_tracking_work.h create mode 100644 include/linux/context_tracking_work.h -- 2.43.0

7 months

9
63
0 0

[PATCH v2] selftests/vDSO: support DT_GNU_HASH

by Fangrui Song

glibc added support for DT_GNU_HASH in 2006 and DT_HASH has been obsoleted for more than one decade in many Linux distributions. Many vDSOs support DT_GNU_HASH. This patch adds selftests support. Signed-off-by: Fangrui Song <maskray(a)google.com> Tested-by: Xi Ruoyao <xry111(a)xry111.site> -- Changes from v1: * fix style of a multi-line comment. ignore false positive suggestions from checkpath.pl: `ELF(Word) *` --- tools/testing/selftests/vDSO/parse_vdso.c | 105 ++++++++++++++++------ 1 file changed, 79 insertions(+), 26 deletions(-) diff --git a/tools/testing/selftests/vDSO/parse_vdso.c b/tools/testing/selftests/vDSO/parse_vdso.c index 4ae417372e9e..dbc946dee4b1 100644 --- a/tools/testing/selftests/vDSO/parse_vdso.c +++ b/tools/testing/selftests/vDSO/parse_vdso.c @@ -47,6 +47,7 @@ static struct vdso_info /* Symbol table */ ELF(Sym) *symtab; const char *symstrings; + ELF(Word) *gnu_hash; ELF(Word) *bucket, *chain; ELF(Word) nbucket, nchain; @@ -75,6 +76,16 @@ static unsigned long elf_hash(const char *name) return h; } +static uint32_t gnu_hash(const char *name) +{ + const unsigned char *s = (void *)name; + uint32_t h = 5381; + + for (; *s; s++) + h += h * 32 + *s; + return h; +} + void vdso_init_from_sysinfo_ehdr(uintptr_t base) { size_t i; @@ -117,6 +128,7 @@ void vdso_init_from_sysinfo_ehdr(uintptr_t base) */ ELF(Word) *hash = 0; vdso_info.symstrings = 0; + vdso_info.gnu_hash = 0; vdso_info.symtab = 0; vdso_info.versym = 0; vdso_info.verdef = 0; @@ -137,6 +149,11 @@ void vdso_init_from_sysinfo_ehdr(uintptr_t base) ((uintptr_t)dyn[i].d_un.d_ptr + vdso_info.load_offset); break; + case DT_GNU_HASH: + vdso_info.gnu_hash = + (ELF(Word) *)((uintptr_t)dyn[i].d_un.d_ptr + + vdso_info.load_offset); + break; case DT_VERSYM: vdso_info.versym = (ELF(Versym) *) ((uintptr_t)dyn[i].d_un.d_ptr @@ -149,17 +166,26 @@ void vdso_init_from_sysinfo_ehdr(uintptr_t base) break; } } - if (!vdso_info.symstrings || !vdso_info.symtab || !hash) + if (!vdso_info.symstrings || !vdso_info.symtab || + (!hash && !vdso_info.gnu_hash)) return; /* Failed */ if (!vdso_info.verdef) vdso_info.versym = 0; /* Parse the hash table header. */ - vdso_info.nbucket = hash[0]; - vdso_info.nchain = hash[1]; - vdso_info.bucket = &hash[2]; - vdso_info.chain = &hash[vdso_info.nbucket + 2]; + if (vdso_info.gnu_hash) { + vdso_info.nbucket = vdso_info.gnu_hash[0]; + /* The bucket array is located after the header (4 uint32) and the bloom + * filter (size_t array of gnu_hash[2] elements). */ + vdso_info.bucket = vdso_info.gnu_hash + 4 + + sizeof(size_t) / 4 * vdso_info.gnu_hash[2]; + } else { + vdso_info.nbucket = hash[0]; + vdso_info.nchain = hash[1]; + vdso_info.bucket = &hash[2]; + vdso_info.chain = &hash[vdso_info.nbucket + 2]; + } /* That's all we need. */ vdso_info.valid = true; @@ -203,6 +229,26 @@ static bool vdso_match_version(ELF(Versym) ver, && !strcmp(name, vdso_info.symstrings + aux->vda_name); } +static bool check_sym(ELF(Sym) *sym, ELF(Word) i, const char *name, + const char *version, unsigned long ver_hash) +{ + /* Check for a defined global or weak function w/ right name. */ + if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC) + return false; + if (ELF64_ST_BIND(sym->st_info) != STB_GLOBAL && + ELF64_ST_BIND(sym->st_info) != STB_WEAK) + return false; + if (strcmp(name, vdso_info.symstrings + sym->st_name)) + return false; + + /* Check symbol version. */ + if (vdso_info.versym && + !vdso_match_version(vdso_info.versym[i], version, ver_hash)) + return false; + + return true; +} + void *vdso_sym(const char *version, const char *name) { unsigned long ver_hash; @@ -210,29 +256,36 @@ void *vdso_sym(const char *version, const char *name) return 0; ver_hash = elf_hash(version); - ELF(Word) chain = vdso_info.bucket[elf_hash(name) % vdso_info.nbucket]; + ELF(Word) i; - for (; chain != STN_UNDEF; chain = vdso_info.chain[chain]) { - ELF(Sym) *sym = &vdso_info.symtab[chain]; + if (vdso_info.gnu_hash) { + uint32_t h1 = gnu_hash(name), h2, *hashval; - /* Check for a defined global or weak function w/ right name. */ - if (ELF64_ST_TYPE(sym->st_info) != STT_FUNC) - continue; - if (ELF64_ST_BIND(sym->st_info) != STB_GLOBAL && - ELF64_ST_BIND(sym->st_info) != STB_WEAK) - continue; - if (sym->st_shndx == SHN_UNDEF) - continue; - if (strcmp(name, vdso_info.symstrings + sym->st_name)) - continue; - - /* Check symbol version. */ - if (vdso_info.versym - && !vdso_match_version(vdso_info.versym[chain], - version, ver_hash)) - continue; - - return (void *)(vdso_info.load_offset + sym->st_value); + i = vdso_info.bucket[h1 % vdso_info.nbucket]; + if (i == 0) + return 0; + h1 |= 1; + hashval = vdso_info.bucket + vdso_info.nbucket + + (i - vdso_info.gnu_hash[1]); + for (;; i++) { + ELF(Sym) *sym = &vdso_info.symtab[i]; + h2 = *hashval++; + if (h1 == (h2 | 1) && + check_sym(sym, i, name, version, ver_hash)) + return (void *)(vdso_info.load_offset + + sym->st_value); + if (h2 & 1) + break; + } + } else { + i = vdso_info.bucket[elf_hash(name) % vdso_info.nbucket]; + for (; i; i = vdso_info.chain[i]) { + ELF(Sym) *sym = &vdso_info.symtab[i]; + if (sym->st_shndx != SHN_UNDEF && + check_sym(sym, i, name, version, ver_hash)) + return (void *)(vdso_info.load_offset + + sym->st_value); + } } return 0; -- 2.46.0.662.g92d0881bb0-goog

7 months

4
6
0 0

[PATCH livepatch/master v1 1/2] selftests/livepatch: Replace hardcoded path with variable in test-syscall.sh

by George Guo

From: George Guo <guodongtai(a)kylinos.cn> Updated test-syscall.sh to replace the path /sys/kernel/test_klp_syscall/npids with a variable $MOD_SYSCALL. Signed-off-by: George Guo <guodongtai(a)kylinos.cn> --- tools/testing/selftests/livepatch/test-syscall.sh | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/livepatch/test-syscall.sh b/tools/testing/selftests/livepatch/test-syscall.sh index b76a881d4..9cfa17b6b 100755 --- a/tools/testing/selftests/livepatch/test-syscall.sh +++ b/tools/testing/selftests/livepatch/test-syscall.sh @@ -24,9 +24,9 @@ pid_list=$(echo ${pids[@]} | tr ' ' ',') load_lp $MOD_SYSCALL klp_pids=$pid_list # wait for all tasks to transition to patched state -loop_until 'grep -q '^0$' /sys/kernel/test_klp_syscall/npids' +loop_until 'grep -q '^0$' /sys/kernel/$MOD_SYSCALL/npids' -pending_pids=$(cat /sys/kernel/test_klp_syscall/npids) +pending_pids=$(cat /sys/kernel/$MOD_SYSCALL/npids) log "$MOD_SYSCALL: Remaining not livepatched processes: $pending_pids" for pid in ${pids[@]}; do -- 2.43.0

7 months

2
5
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-kselftest-mirror November 2024