December 2022 - Linux-stable-mirror

[PATCH 2/3] acpi/processor: sanitize _PDC buffer bits when running as Xen dom0

by Roger Pau Monne

The Processor _PDC buffer bits notify ACPI of the OS capabilities, and so ACPI can adjust the return of other Processor methods taking the OS capabilities into account. When Linux is running as a Xen dom0, it's the hypervisor the entity in charge of processor power management, and hence Xen needs to make sure the capabilities reported in the _PDC buffer match the capabilities of the driver in Xen. Introduce a small helper to sanitize the buffer when running as Xen dom0. Signed-off-by: Roger Pau Monné <roger.pau(a)citrix.com> Cc: stable(a)vger.kernel.org --- arch/x86/include/asm/xen/hypervisor.h | 2 ++ arch/x86/xen/enlighten.c | 17 +++++++++++++++++ drivers/acpi/processor_pdc.c | 8 ++++++++ 3 files changed, 27 insertions(+) diff --git a/arch/x86/include/asm/xen/hypervisor.h b/arch/x86/include/asm/xen/hypervisor.h index b9f512138043..b4ed90ef5e68 100644 --- a/arch/x86/include/asm/xen/hypervisor.h +++ b/arch/x86/include/asm/xen/hypervisor.h @@ -63,12 +63,14 @@ void __init mem_map_via_hcall(struct boot_params *boot_params_p); #ifdef CONFIG_XEN_DOM0 bool __init xen_processor_present(uint32_t acpi_id); +void xen_sanitize_pdc(uint32_t *buf); #else static inline bool xen_processor_present(uint32_t acpi_id) { BUG(); return false; } +static inline void xen_sanitize_pdc(uint32_t *buf) { BUG(); } #endif #endif /* _ASM_X86_XEN_HYPERVISOR_H */ diff --git a/arch/x86/xen/enlighten.c b/arch/x86/xen/enlighten.c index d4c44361a26c..394dd6675113 100644 --- a/arch/x86/xen/enlighten.c +++ b/arch/x86/xen/enlighten.c @@ -372,4 +372,21 @@ bool __init xen_processor_present(uint32_t acpi_id) return false; } + +void xen_sanitize_pdc(uint32_t *buf) +{ + struct xen_platform_op op = { + .cmd = XENPF_set_processor_pminfo, + .interface_version = XENPF_INTERFACE_VERSION, + .u.set_pminfo.id = -1, + .u.set_pminfo.type = XEN_PM_PDC, + }; + int ret; + + set_xen_guest_handle(op.u.set_pminfo.pdc, buf); + ret = HYPERVISOR_platform_op(&op); + if (ret) + pr_info("sanitize of _PDC buffer bits from Xen failed: %d\n", + ret); +} #endif diff --git a/drivers/acpi/processor_pdc.c b/drivers/acpi/processor_pdc.c index 18fb04523f93..58f4c208517a 100644 --- a/drivers/acpi/processor_pdc.c +++ b/drivers/acpi/processor_pdc.c @@ -137,6 +137,14 @@ acpi_processor_eval_pdc(acpi_handle handle, struct acpi_object_list *pdc_in) buffer[2] &= ~(ACPI_PDC_C_C2C3_FFH | ACPI_PDC_C_C1_FFH); } + if (xen_initial_domain()) + /* + * When Linux is running as Xen dom0 it's the hypervisor the + * entity in charge of the processor power management, and so + * Xen needs to check the OS capabilities reported in the _PDC + * buffer matches what the hypervisor driver supports. + */ + xen_sanitize_pdc((uint32_t *)pdc_in->pointer->buffer.pointer); status = acpi_evaluate_object(handle, "_PDC", pdc_in, NULL); if (ACPI_FAILURE(status)) -- 2.37.3

2 years, 6 months

4
5
0 0

[PATCH v5 0/2] KEYS: asymmetric: Copy sig and digest in public_key_verify_signature()

by Roberto Sassu

From: Roberto Sassu <roberto.sassu(a)huawei.com> Changelog: v4: - Replace sg_init_table()/sg_set_buf() with sg_init_one() (suggested by Eric) v3: v2: - Add patch by Herbert to take only the needed bytes for a MPI from the scatterlist - Use only one scatterlist for signature and digest (suggested by Eric) - Rename key variable to buf (suggested by Eric) - Rename key_max_len variable to buf_len - Use size_t for the buf_len variable instead of u32 v1: - Unconditionally copy the signature and digest to the buffer to keep the code simple (suggested by Eric) Herbert Xu (1): lib/mpi: Fix buffer overrun when SG is too long Roberto Sassu (1): KEYS: asymmetric: Copy sig and digest in public_key_verify_signature() crypto/asymmetric_keys/public_key.c | 38 ++++++++++++++++------------- lib/mpi/mpicoder.c | 3 ++- 2 files changed, 23 insertions(+), 18 deletions(-) -- 2.25.1

2 years, 6 months

8
24
0 0

[PATCH 1/1] net: cdc_ncm: Allow for dwNtbOutMaxSize to be unset or zero

by Lee Jones

Currently, due to the sequential use of min_t() and clamp_t() macros, in cdc_ncm_check_tx_max(), if dwNtbOutMaxSize is not set, the logic sets tx_max to 0. This is then used to allocate the data area of the SKB requested later in cdc_ncm_fill_tx_frame(). This does not cause an issue presently because when memory is allocated during initialisation phase of SKB creation, more memory (512b) is allocated than is required for the SKB headers alone (320b), leaving some space (512b - 320b = 192b) for CDC data (172b). However, if more elements (for example 3 x u64 = [24b]) were added to one of the SKB header structs, say 'struct skb_shared_info', increasing its original size (320b [320b aligned]) to something larger (344b [384b aligned]), then suddenly the CDC data (172b) no longer fits in the spare SKB data area (512b - 384b = 128b). Consequently the SKB bounds checking semantics fails and panics: skbuff: skb_over_panic: text:ffffffff830a5b5f len:184 put:172 \ head:ffff888119227c00 data:ffff888119227c00 tail:0xb8 end:0x80 dev:<NULL> ------------[ cut here ]------------ kernel BUG at net/core/skbuff.c:110! RIP: 0010:skb_panic+0x14f/0x160 net/core/skbuff.c:106 <snip> Call Trace: <IRQ> skb_over_panic+0x2c/0x30 net/core/skbuff.c:115 skb_put+0x205/0x210 net/core/skbuff.c:1877 skb_put_zero include/linux/skbuff.h:2270 [inline] cdc_ncm_ndp16 drivers/net/usb/cdc_ncm.c:1116 [inline] cdc_ncm_fill_tx_frame+0x127f/0x3d50 drivers/net/usb/cdc_ncm.c:1293 cdc_ncm_tx_fixup+0x98/0xf0 drivers/net/usb/cdc_ncm.c:1514 By overriding the max value with the default CDC_NCM_NTB_MAX_SIZE_TX when not offered through the system provided params, we ensure enough data space is allocated to handle the CDC data, meaning no crash will occur. Cc: stable(a)vger.kernel.org Cc: Oliver Neukum <oliver(a)neukum.org> Cc: "David S. Miller" <davem(a)davemloft.net> Cc: Jakub Kicinski <kuba(a)kernel.org> Cc: linux-usb(a)vger.kernel.org Cc: netdev(a)vger.kernel.org Cc: linux-kernel(a)vger.kernel.org Fixes: 289507d3364f9 ("net: cdc_ncm: use sysfs for rx/tx aggregation tuning") Signed-off-by: Lee Jones <lee.jones(a)linaro.org> --- drivers/net/usb/cdc_ncm.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/net/usb/cdc_ncm.c b/drivers/net/usb/cdc_ncm.c index 24753a4da7e60..e303b522efb50 100644 --- a/drivers/net/usb/cdc_ncm.c +++ b/drivers/net/usb/cdc_ncm.c @@ -181,6 +181,8 @@ static u32 cdc_ncm_check_tx_max(struct usbnet *dev, u32 new_tx) min = ctx->max_datagram_size + ctx->max_ndp_size + sizeof(struct usb_cdc_ncm_nth32); max = min_t(u32, CDC_NCM_NTB_MAX_SIZE_TX, le32_to_cpu(ctx->ncm_parm.dwNtbOutMaxSize)); + if (max == 0) + max = CDC_NCM_NTB_MAX_SIZE_TX; /* dwNtbOutMaxSize not set */ /* some devices set dwNtbOutMaxSize too low for the above default */ min = min(min, max); -- 2.34.0.384.gca35af8252-goog

2 years, 7 months

6
13
0 0

[PATCH 0/9] KVM backports to 5.10

by Rishabh Bhatnagar

This patch series backports a few VM preemption_status, steal_time and PV TLB flushing fixes to 5.10 stable kernel. Most of the changes backport cleanly except i had to work around a few becauseof missing support/APIs in 5.10 kernel. I have captured those in the changelog as well in the individual patches. Changelog - Use mark_page_dirty_in_slot api without kvm argument (KVM: x86: Fix recording of guest steal time / preempted status) - Avoid checking for xen_msr and SEV-ES conditions (KVM: x86: do not set st->preempted when going back to user space) - Use VCPU_STAT macro to expose preemption_reported and preemption_other fields (KVM: x86: do not report a vCPU as preempted outside instruction boundaries) David Woodhouse (2): KVM: x86: Fix recording of guest steal time / preempted status KVM: Fix steal time asm constraints Lai Jiangshan (1): KVM: x86: Ensure PV TLB flush tracepoint reflects KVM behavior Paolo Bonzini (5): KVM: x86: do not set st->preempted when going back to user space KVM: x86: do not report a vCPU as preempted outside instruction boundaries KVM: x86: revalidate steal time cache if MSR value changes KVM: x86: do not report preemption if the steal time cache is stale KVM: x86: move guest_pv_has out of user_access section Sean Christopherson (1): KVM: x86: Remove obsolete disabling of page faults in kvm_arch_vcpu_put() arch/x86/include/asm/kvm_host.h | 5 +- arch/x86/kvm/svm/svm.c | 2 + arch/x86/kvm/vmx/vmx.c | 1 + arch/x86/kvm/x86.c | 164 ++++++++++++++++++++++---------- 4 files changed, 122 insertions(+), 50 deletions(-) -- 2.37.1

2 years, 7 months

6
19
0 0

[PATCH v2 1/8] drm: Disable the cursor plane on atomic contexts with virtualized drivers

by Zack Rusin

From: Zack Rusin <zackr(a)vmware.com> Cursor planes on virtualized drivers have special meaning and require that the clients handle them in specific ways, e.g. the cursor plane should react to the mouse movement the way a mouse cursor would be expected to and the client is required to set hotspot properties on it in order for the mouse events to be routed correctly. This breaks the contract as specified by the "universal planes". Fix it by disabling the cursor planes on virtualized drivers while adding a foundation on top of which it's possible to special case mouse cursor planes for clients that want it. Disabling the cursor planes makes some kms compositors which were broken, e.g. Weston, fallback to software cursor which works fine or at least better than currently while having no effect on others, e.g. gnome-shell or kwin, which put virtualized drivers on a deny-list when running in atomic context to make them fallback to legacy kms and avoid this issue. Signed-off-by: Zack Rusin <zackr(a)vmware.com> Fixes: 681e7ec73044 ("drm: Allow userspace to ask for universal plane list (v2)") Cc: <stable(a)vger.kernel.org> # v5.4+ Cc: Maarten Lankhorst <maarten.lankhorst(a)linux.intel.com> Cc: Maxime Ripard <mripard(a)kernel.org> Cc: Thomas Zimmermann <tzimmermann(a)suse.de> Cc: David Airlie <airlied(a)linux.ie> Cc: Daniel Vetter <daniel(a)ffwll.ch> Cc: Dave Airlie <airlied(a)redhat.com> Cc: Gerd Hoffmann <kraxel(a)redhat.com> Cc: Hans de Goede <hdegoede(a)redhat.com> Cc: Gurchetan Singh <gurchetansingh(a)chromium.org> Cc: Chia-I Wu <olvaffe(a)gmail.com> Cc: dri-devel(a)lists.freedesktop.org Cc: virtualization(a)lists.linux-foundation.org Cc: spice-devel(a)lists.freedesktop.org --- drivers/gpu/drm/drm_plane.c | 11 +++++++++++ drivers/gpu/drm/qxl/qxl_drv.c | 2 +- drivers/gpu/drm/vboxvideo/vbox_drv.c | 2 +- drivers/gpu/drm/virtio/virtgpu_drv.c | 3 ++- drivers/gpu/drm/vmwgfx/vmwgfx_drv.c | 2 +- include/drm/drm_drv.h | 10 ++++++++++ include/drm/drm_file.h | 12 ++++++++++++ 7 files changed, 38 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c index 726f2f163c26..e1e2a65c7119 100644 --- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c @@ -667,6 +667,17 @@ int drm_mode_getplane_res(struct drm_device *dev, void *data, !file_priv->universal_planes) continue; + /* + * Unless userspace supports virtual cursor plane + * then if we're running on virtual driver do not + * advertise cursor planes because they'll be broken + */ + if (plane->type == DRM_PLANE_TYPE_CURSOR && + drm_core_check_feature(dev, DRIVER_VIRTUAL) && + file_priv->atomic && + !file_priv->supports_virtual_cursor_plane) + continue; + if (drm_lease_held(file_priv, plane->base.id)) { if (count < plane_resp->count_planes && put_user(plane->base.id, plane_ptr + count)) diff --git a/drivers/gpu/drm/qxl/qxl_drv.c b/drivers/gpu/drm/qxl/qxl_drv.c index 1cb6f0c224bb..0e4212e05caa 100644 --- a/drivers/gpu/drm/qxl/qxl_drv.c +++ b/drivers/gpu/drm/qxl/qxl_drv.c @@ -281,7 +281,7 @@ static const struct drm_ioctl_desc qxl_ioctls[] = { }; static struct drm_driver qxl_driver = { - .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC, + .driver_features = DRIVER_GEM | DRIVER_MODESET | DRIVER_ATOMIC | DRIVER_VIRTUAL, .dumb_create = qxl_mode_dumb_create, .dumb_map_offset = drm_gem_ttm_dumb_map_offset, diff --git a/drivers/gpu/drm/vboxvideo/vbox_drv.c b/drivers/gpu/drm/vboxvideo/vbox_drv.c index f4f2bd79a7cb..84e75bcc3384 100644 --- a/drivers/gpu/drm/vboxvideo/vbox_drv.c +++ b/drivers/gpu/drm/vboxvideo/vbox_drv.c @@ -176,7 +176,7 @@ DEFINE_DRM_GEM_FOPS(vbox_fops); static const struct drm_driver driver = { .driver_features = - DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC, + DRIVER_MODESET | DRIVER_GEM | DRIVER_ATOMIC | DRIVER_VIRTUAL, .lastclose = drm_fb_helper_lastclose, diff --git a/drivers/gpu/drm/virtio/virtgpu_drv.c b/drivers/gpu/drm/virtio/virtgpu_drv.c index 5f25a8d15464..3c5bb006159a 100644 --- a/drivers/gpu/drm/virtio/virtgpu_drv.c +++ b/drivers/gpu/drm/virtio/virtgpu_drv.c @@ -198,7 +198,8 @@ MODULE_AUTHOR("Alon Levy"); DEFINE_DRM_GEM_FOPS(virtio_gpu_driver_fops); static const struct drm_driver driver = { - .driver_features = DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC, + .driver_features = + DRIVER_MODESET | DRIVER_GEM | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_VIRTUAL, .open = virtio_gpu_driver_open, .postclose = virtio_gpu_driver_postclose, diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c index 01a5b47e95f9..712f6ad0b014 100644 --- a/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_drv.c @@ -1581,7 +1581,7 @@ static const struct file_operations vmwgfx_driver_fops = { static const struct drm_driver driver = { .driver_features = - DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM, + DRIVER_MODESET | DRIVER_RENDER | DRIVER_ATOMIC | DRIVER_GEM | DRIVER_VIRTUAL, .ioctls = vmw_ioctls, .num_ioctls = ARRAY_SIZE(vmw_ioctls), .master_set = vmw_master_set, diff --git a/include/drm/drm_drv.h b/include/drm/drm_drv.h index f6159acb8856..c4cd7fc350d9 100644 --- a/include/drm/drm_drv.h +++ b/include/drm/drm_drv.h @@ -94,6 +94,16 @@ enum drm_driver_feature { * synchronization of command submission. */ DRIVER_SYNCOBJ_TIMELINE = BIT(6), + /** + * @DRIVER_VIRTUAL: + * + * Driver is running on top of virtual hardware. The most significant + * implication of this is a requirement of special handling of the + * cursor plane (e.g. cursor plane has to actually track the mouse + * cursor and the clients are required to set hotspot in order for + * the cursor planes to work correctly). + */ + DRIVER_VIRTUAL = BIT(7), /* IMPORTANT: Below are all the legacy flags, add new ones above. */ diff --git a/include/drm/drm_file.h b/include/drm/drm_file.h index e0a73a1e2df7..3e5c36891161 100644 --- a/include/drm/drm_file.h +++ b/include/drm/drm_file.h @@ -223,6 +223,18 @@ struct drm_file { */ bool is_master; + /** + * @supports_virtual_cursor_plane: + * + * This client is capable of handling the cursor plane with the + * restrictions imposed on it by the virtualized drivers. + * + * The implies that the cursor plane has to behave like a cursor + * i.e. track cursor movement. It also requires setting of the + * hotspot properties by the client on the cursor plane. + */ + bool supports_virtual_cursor_plane; + /** * @master: * -- 2.34.1

2 years, 7 months

6
12
0 0

[v4 PATCH] fs/proc: task_mmu.c: don't read mapcount for migration entry

by Yang Shi

The syzbot reported the below BUG: kernel BUG at include/linux/page-flags.h:785! invalid opcode: 0000 [#1] PREEMPT SMP KASAN CPU: 1 PID: 4392 Comm: syz-executor560 Not tainted 5.16.0-rc6-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline] RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744 Code: e8 d3 16 d1 ff 48 c7 c6 c0 00 b6 89 48 89 ef e8 94 4e 04 00 0f 0b e8 bd 16 d1 ff 48 c7 c6 60 01 b6 89 48 89 ef e8 7e 4e 04 00 <0f> 0b e8 a7 16 d1 ff 48 c7 c6 a0 01 b6 89 4c 89 f7 e8 68 4e 04 00 RSP: 0018:ffffc90002b6f7b8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888019619d00 RSI: ffffffff81a68c12 RDI: 0000000000000003 RBP: ffffea0001bdc2c0 R08: 0000000000000029 R09: 00000000ffffffff R10: ffffffff8903e29f R11: 00000000ffffffff R12: 00000000ffffffff R13: 00000000ffffea00 R14: ffffc90002b6fb30 R15: ffffea0001bd8001 FS: 00007faa2aefd700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff7e663318 CR3: 0000000018c6e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <TASK> page_mapcount include/linux/mm.h:837 [inline] smaps_account+0x470/0xb10 fs/proc/task_mmu.c:466 smaps_pte_entry fs/proc/task_mmu.c:538 [inline] smaps_pte_range+0x611/0x1250 fs/proc/task_mmu.c:601 walk_pmd_range mm/pagewalk.c:128 [inline] walk_pud_range mm/pagewalk.c:205 [inline] walk_p4d_range mm/pagewalk.c:240 [inline] walk_pgd_range mm/pagewalk.c:277 [inline] __walk_page_range+0xe23/0x1ea0 mm/pagewalk.c:379 walk_page_vma+0x277/0x350 mm/pagewalk.c:530 smap_gather_stats.part.0+0x148/0x260 fs/proc/task_mmu.c:768 smap_gather_stats fs/proc/task_mmu.c:741 [inline] show_smap+0xc6/0x440 fs/proc/task_mmu.c:822 seq_read_iter+0xbb0/0x1240 fs/seq_file.c:272 seq_read+0x3e0/0x5b0 fs/seq_file.c:162 vfs_read+0x1b5/0x600 fs/read_write.c:479 ksys_read+0x12d/0x250 fs/read_write.c:619 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7faa2af6c969 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 11 15 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007faa2aefd288 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 00007faa2aff4418 RCX: 00007faa2af6c969 RDX: 0000000000002025 RSI: 0000000020000100 RDI: 0000000000000003 RBP: 00007faa2aff4410 R08: 00007faa2aefd700 R09: 0000000000000000 R10: 00007faa2aefd700 R11: 0000000000000246 R12: 00007faa2afc20ac R13: 00007fff7e6632bf R14: 00007faa2aefd400 R15: 0000000000022000 </TASK> Modules linked in: ---[ end trace 24ec93ff95e4ac3d ]--- RIP: 0010:PageDoubleMap include/linux/page-flags.h:785 [inline] RIP: 0010:__page_mapcount+0x2d2/0x350 mm/util.c:744 Code: e8 d3 16 d1 ff 48 c7 c6 c0 00 b6 89 48 89 ef e8 94 4e 04 00 0f 0b e8 bd 16 d1 ff 48 c7 c6 60 01 b6 89 48 89 ef e8 7e 4e 04 00 <0f> 0b e8 a7 16 d1 ff 48 c7 c6 a0 01 b6 89 4c 89 f7 e8 68 4e 04 00 RSP: 0018:ffffc90002b6f7b8 EFLAGS: 00010293 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000 RDX: ffff888019619d00 RSI: ffffffff81a68c12 RDI: 0000000000000003 RBP: ffffea0001bdc2c0 R08: 0000000000000029 R09: 00000000ffffffff R10: ffffffff8903e29f R11: 00000000ffffffff R12: 00000000ffffffff R13: 00000000ffffea00 R14: ffffc90002b6fb30 R15: ffffea0001bd8001 FS: 00007faa2aefd700(0000) GS:ffff8880b9d00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fff7e663318 CR3: 0000000018c6e000 CR4: 00000000003506e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 The reproducer was trying to reading /proc/$PID/smaps when calling MADV_FREE at the mean time. MADV_FREE may split THPs if it is called for partial THP. It may trigger the below race: CPU A CPU B ----- ----- smaps walk: MADV_FREE: page_mapcount() PageCompound() split_huge_page() page = compound_head(page) PageDoubleMap(page) When calling PageDoubleMap() this page is not a tail page of THP anymore so the BUG is triggered. This could be fixed by elevated refcount of the page before calling mapcount, but it prevents from counting migration entries, and it seems overkilling because the race just could happen when PMD is split so all PTE entries of tail pages are actually migration entries, and smaps_account() does treat migration entries as mapcount == 1 as Kirill pointed out. Add a new parameter for smaps_account() to tell this entry is migration entry then skip calling page_mapcount(). Don't skip getting mapcount for device private entries since they do track references with mapcount. Pagemap also has the similar issue although it was not reported. Fixed it as well. Fixes: e9b61f19858a ("thp: reintroduce split_huge_page()") Reported-by: syzbot+1f52b3a18d5633fa7f82(a)syzkaller.appspotmail.com Acked-by: David Hildenbrand <david(a)redhat.com> Cc: "Kirill A. Shutemov" <kirill.shutemov(a)linux.intel.com> Cc: Jann Horn <jannh(a)google.com> Cc: Matthew Wilcox <willy(a)infradead.org> Cc: <stable(a)vger.kernel.org> Signed-off-by: Yang Shi <shy828301(a)gmail.com> --- v4: * s/Treated/Treat per David * Collected acked-by tag from David v3: * Fixed the fix tag, the one used by v2 was not accurate * Added comment about the risk calling page_mapcount() per David * Fix pagemap v2: * Added proper fix tag per Jann Horn * Rebased to the latest linus's tree fs/proc/task_mmu.c | 38 ++++++++++++++++++++++++++++++-------- 1 file changed, 30 insertions(+), 8 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 18f8c3acbb85..bc2f46033231 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -440,7 +440,8 @@ static void smaps_page_accumulate(struct mem_size_stats *mss, } static void smaps_account(struct mem_size_stats *mss, struct page *page, - bool compound, bool young, bool dirty, bool locked) + bool compound, bool young, bool dirty, bool locked, + bool migration) { int i, nr = compound ? compound_nr(page) : 1; unsigned long size = nr * PAGE_SIZE; @@ -467,8 +468,15 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page, * page_count(page) == 1 guarantees the page is mapped exactly once. * If any subpage of the compound page mapped with PTE it would elevate * page_count(). + * + * The page_mapcount() is called to get a snapshot of the mapcount. + * Without holding the page lock this snapshot can be slightly wrong as + * we cannot always read the mapcount atomically. It is not safe to + * call page_mapcount() even with PTL held if the page is not mapped, + * especially for migration entries. Treat regular migration entries + * as mapcount == 1. */ - if (page_count(page) == 1) { + if ((page_count(page) == 1) || migration) { smaps_page_accumulate(mss, page, size, size << PSS_SHIFT, dirty, locked, true); return; @@ -517,6 +525,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, struct vm_area_struct *vma = walk->vma; bool locked = !!(vma->vm_flags & VM_LOCKED); struct page *page = NULL; + bool migration = false; if (pte_present(*pte)) { page = vm_normal_page(vma, addr, *pte); @@ -536,8 +545,11 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, } else { mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT; } - } else if (is_pfn_swap_entry(swpent)) + } else if (is_pfn_swap_entry(swpent)) { + if (is_migration_entry(swpent)) + migration = true; page = pfn_swap_entry_to_page(swpent); + } } else { smaps_pte_hole_lookup(addr, walk); return; @@ -546,7 +558,8 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (!page) return; - smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte), locked); + smaps_account(mss, page, false, pte_young(*pte), pte_dirty(*pte), + locked, migration); } #ifdef CONFIG_TRANSPARENT_HUGEPAGE @@ -557,6 +570,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, struct vm_area_struct *vma = walk->vma; bool locked = !!(vma->vm_flags & VM_LOCKED); struct page *page = NULL; + bool migration = false; if (pmd_present(*pmd)) { /* FOLL_DUMP will return -EFAULT on huge zero page */ @@ -564,8 +578,10 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, } else if (unlikely(thp_migration_supported() && is_swap_pmd(*pmd))) { swp_entry_t entry = pmd_to_swp_entry(*pmd); - if (is_migration_entry(entry)) + if (is_migration_entry(entry)) { + migration = true; page = pfn_swap_entry_to_page(entry); + } } if (IS_ERR_OR_NULL(page)) return; @@ -577,7 +593,9 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, /* pass */; else mss->file_thp += HPAGE_PMD_SIZE; - smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd), locked); + + smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd), + locked, migration); } #else static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr, @@ -1378,6 +1396,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, { u64 frame = 0, flags = 0; struct page *page = NULL; + bool migration = false; if (pte_present(pte)) { if (pm->show_pfn) @@ -1399,13 +1418,14 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, frame = swp_type(entry) | (swp_offset(entry) << MAX_SWAPFILES_SHIFT); flags |= PM_SWAP; + migration = is_migration_entry(entry); if (is_pfn_swap_entry(entry)) page = pfn_swap_entry_to_page(entry); } if (page && !PageAnon(page)) flags |= PM_FILE; - if (page && page_mapcount(page) == 1) + if (page && !migration && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; if (vma->vm_flags & VM_SOFTDIRTY) flags |= PM_SOFT_DIRTY; @@ -1421,6 +1441,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, spinlock_t *ptl; pte_t *pte, *orig_pte; int err = 0; + bool migration = false; #ifdef CONFIG_TRANSPARENT_HUGEPAGE ptl = pmd_trans_huge_lock(pmdp, vma); @@ -1461,11 +1482,12 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, if (pmd_swp_uffd_wp(pmd)) flags |= PM_UFFD_WP; VM_BUG_ON(!is_pmd_migration_entry(pmd)); + migration = is_migration_entry(entry); page = pfn_swap_entry_to_page(entry); } #endif - if (page && page_mapcount(page) == 1) + if (page && !migration && page_mapcount(page) == 1) flags |= PM_MMAP_EXCLUSIVE; for (; addr != end; addr += PAGE_SIZE) { -- 2.26.3

2 years, 8 months

5
12
0 0

Re: [PATCH 5.4 182/389] PCI/portdrv: Dont disable AER reporting in get_port_device_capability()

by Greg Kroah-Hartman

On Tue, Aug 23, 2022 at 07:20:14AM -0500, Bjorn Helgaas wrote: > On Tue, Aug 23, 2022, 6:35 AM Greg Kroah-Hartman <gregkh(a)linuxfoundation.org> > wrote: > > > From: Stefan Roese <sr(a)denx.de> > > > > [ Upstream commit 8795e182b02dc87e343c79e73af6b8b7f9c5e635 ] > > > > There's an open regression related to this commit: > > https://bugzilla.kernel.org/show_bug.cgi?id=216373 This is already in the following released stable kernels: 5.10.137 5.15.61 5.18.18 5.19.2 I'll go drop it from the 4.19 and 5.4 queues, but when this gets resolved in Linus's tree, make sure there's a cc: stable on the fix so that we know to backport it to the above branches as well. Or at the least, a "Fixes:" tag. thanks, greg k-h

2 years, 8 months

5
14
0 0

[PATCH] drm/panfrost: Fix the panfrost_mmu_map_fault_addr() error path

by Boris Brezillon

Make sure all bo->base.pages entries are either NULL or pointing to a valid page before calling drm_gem_shmem_put_pages(). Reported-by: Tomeu Vizoso <tomeu.vizoso(a)collabora.com> Cc: <stable(a)vger.kernel.org> Fixes: 187d2929206e ("drm/panfrost: Add support for GPU heap allocations") Signed-off-by: Boris Brezillon <boris.brezillon(a)collabora.com> --- drivers/gpu/drm/panfrost/panfrost_mmu.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/panfrost/panfrost_mmu.c b/drivers/gpu/drm/panfrost/panfrost_mmu.c index 569509c2ba27..d76dff201ea6 100644 --- a/drivers/gpu/drm/panfrost/panfrost_mmu.c +++ b/drivers/gpu/drm/panfrost/panfrost_mmu.c @@ -460,6 +460,7 @@ static int panfrost_mmu_map_fault_addr(struct panfrost_device *pfdev, int as, if (IS_ERR(pages[i])) { mutex_unlock(&bo->base.pages_lock); ret = PTR_ERR(pages[i]); + pages[i] = NULL; goto err_pages; } } -- 2.31.1

2 years, 8 months

2
2
0 0

[PATCH 6.0.y / 6.1.y] x86/split_lock: Add sysctl to control the misery mode

by Guilherme G. Piccoli

commit 727209376f4998bc84db1d5d8af15afea846a92b upstream. Commit b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") changed the way the split lock detector works when in "warn" mode; basically, it not only shows the warn message, but also intentionally introduces a slowdown through sleeping plus serialization mechanism on such task. Based on discussions in [0], seems the warning alone wasn't enough motivation for userspace developers to fix their applications. This slowdown is enough to totally break some proprietary (aka. unfixable) userspace[1]. Happens that originally the proposal in [0] was to add a new mode which would warns + slowdown the "split locking" task, keeping the old warn mode untouched. In the end, that idea was discarded and the regular/default "warn" mode now slows down the applications. This is quite aggressive with regards proprietary/legacy programs that basically are unable to properly run in kernel with this change. While it is understandable that a malicious application could DoS by split locking, it seems unacceptable to regress old/proprietary userspace programs through a default configuration that previously worked. An example of such breakage was reported in [1]. Add a sysctl to allow controlling the "misery mode" behavior, as per Thomas suggestion on [2]. This way, users running legacy and/or proprietary software are allowed to still execute them with a decent performance while still observing the warning messages on kernel log. [0] https://lore.kernel.org/lkml/20220217012721.9694-1-tony.luck@intel.com/ [1] https://github.com/doitsujin/dxvk/issues/2938 [2] https://lore.kernel.org/lkml/87pmf4bter.ffs@tglx/ [ dhansen: minor changelog tweaks, including clarifying the actual problem ] Fixes: b041b525dab9 ("x86/split_lock: Make life miserable for split lockers") Suggested-by: Thomas Gleixner <tglx(a)linutronix.de> Signed-off-by: Guilherme G. Piccoli <gpiccoli(a)igalia.com> Signed-off-by: Dave Hansen <dave.hansen(a)linux.intel.com> Reviewed-by: Tony Luck <tony.luck(a)intel.com> Tested-by: Andre Almeida <andrealmeid(a)igalia.com> Link: https://lore.kernel.org/all/20221024200254.635256-1-gpiccoli%40igalia.com --- Hi folks, I've build tested this on both 6.0.13 and 6.1, worked fine. The split lock detector code changed almost nothing since 6.0, so that makes sense... I think this is important to have in stable, some gaming community members seems excited with that, it'll help with general proprietary software (that is basically unfixable), making them run smoothly on 6.0.y and 6.1.y. I've CCed some folks more than just the stable list, to gather more opinions on that, so apologies if you received this email but think that you shouldn't have. Thanks in advance, Guilherme Documentation/admin-guide/sysctl/kernel.rst | 23 ++++++++ arch/x86/kernel/cpu/intel.c | 63 +++++++++++++++++---- 2 files changed, 76 insertions(+), 10 deletions(-) diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 98d1b198b2b4..c2c64c1b706f 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -1314,6 +1314,29 @@ watchdog work to be queued by the watchdog timer function, otherwise the NMI watchdog — if enabled — can detect a hard lockup condition. +split_lock_mitigate (x86 only) +============================== + +On x86, each "split lock" imposes a system-wide performance penalty. On larger +systems, large numbers of split locks from unprivileged users can result in +denials of service to well-behaved and potentially more important users. + +The kernel mitigates these bad users by detecting split locks and imposing +penalties: forcing them to wait and only allowing one core to execute split +locks at a time. + +These mitigations can make those bad applications unbearably slow. Setting +split_lock_mitigate=0 may restore some application performance, but will also +increase system exposure to denial of service attacks from split lock users. + += =================================================================== +0 Disable the mitigation mode - just warns the split lock on kernel log + and exposes the system to denials of service from the split lockers. +1 Enable the mitigation mode (this is the default) - penalizes the split + lockers with intentional performance degradation. += =================================================================== + + stack_erasing ============= diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c index 2d7ea5480ec3..427899650483 100644 --- a/arch/x86/kernel/cpu/intel.c +++ b/arch/x86/kernel/cpu/intel.c @@ -1034,8 +1034,32 @@ static const struct { static struct ratelimit_state bld_ratelimit; +static unsigned int sysctl_sld_mitigate = 1; static DEFINE_SEMAPHORE(buslock_sem); +#ifdef CONFIG_PROC_SYSCTL +static struct ctl_table sld_sysctls[] = { + { + .procname = "split_lock_mitigate", + .data = &sysctl_sld_mitigate, + .maxlen = sizeof(unsigned int), + .mode = 0644, + .proc_handler = proc_douintvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, + }, + {} +}; + +static int __init sld_mitigate_sysctl_init(void) +{ + register_sysctl_init("kernel", sld_sysctls); + return 0; +} + +late_initcall(sld_mitigate_sysctl_init); +#endif + static inline bool match_option(const char *arg, int arglen, const char *opt) { int len = strlen(opt), ratelimit; @@ -1146,12 +1170,20 @@ static void split_lock_init(void) split_lock_verify_msr(sld_state != sld_off); } -static void __split_lock_reenable(struct work_struct *work) +static void __split_lock_reenable_unlock(struct work_struct *work) { sld_update_msr(true); up(&buslock_sem); } +static DECLARE_DELAYED_WORK(sl_reenable_unlock, __split_lock_reenable_unlock); + +static void __split_lock_reenable(struct work_struct *work) +{ + sld_update_msr(true); +} +static DECLARE_DELAYED_WORK(sl_reenable, __split_lock_reenable); + /* * If a CPU goes offline with pending delayed work to re-enable split lock * detection then the delayed work will be executed on some other CPU. That @@ -1169,10 +1201,9 @@ static int splitlock_cpu_offline(unsigned int cpu) return 0; } -static DECLARE_DELAYED_WORK(split_lock_reenable, __split_lock_reenable); - static void split_lock_warn(unsigned long ip) { + struct delayed_work *work; int cpu; if (!current->reported_split_lock) @@ -1180,14 +1211,26 @@ static void split_lock_warn(unsigned long ip) current->comm, current->pid, ip); current->reported_split_lock = 1; - /* misery factor #1, sleep 10ms before trying to execute split lock */ - if (msleep_interruptible(10) > 0) - return; - /* Misery factor #2, only allow one buslocked disabled core at a time */ - if (down_interruptible(&buslock_sem) == -EINTR) - return; + if (sysctl_sld_mitigate) { + /* + * misery factor #1: + * sleep 10ms before trying to execute split lock. + */ + if (msleep_interruptible(10) > 0) + return; + /* + * Misery factor #2: + * only allow one buslocked disabled core at a time. + */ + if (down_interruptible(&buslock_sem) == -EINTR) + return; + work = &sl_reenable_unlock; + } else { + work = &sl_reenable; + } + cpu = get_cpu(); - schedule_delayed_work_on(cpu, &split_lock_reenable, 2); + schedule_delayed_work_on(cpu, work, 2); /* Disable split lock detection on this CPU to make progress */ sld_update_msr(false); -- 2.38.1

2 years, 8 months

5
5
0 0

[PATCH v2] module: Don't wait for GOING modules

by Petr Pavlu

During a system boot, it can happen that the kernel receives a burst of requests to insert the same module but loading it eventually fails during its init call. For instance, udev can make a request to insert a frequency module for each individual CPU when another frequency module is already loaded which causes the init function of the new module to return an error. Since commit 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading"), the kernel waits for modules in MODULE_STATE_GOING state to finish unloading before making another attempt to load the same module. This creates unnecessary work in the described scenario and delays the boot. In the worst case, it can prevent udev from loading drivers for other devices and might cause timeouts of services waiting on them and subsequently a failed boot. This patch attempts a different solution for the problem 6e6de3dee51a was trying to solve. Rather than waiting for the unloading to complete, it returns a different error code (-EBUSY) for modules in the GOING state. This should avoid the error situation that was described in 6e6de3dee51a (user space attempting to load a dependent module because the -EEXIST error code would suggest to user space that the first module had been loaded successfully), while avoiding the delay situation too. Fixes: 6e6de3dee51a ("kernel/module.c: Only return -EEXIST for modules that have finished loading") Co-developed-by: Martin Wilck <mwilck(a)suse.com> Signed-off-by: Martin Wilck <mwilck(a)suse.com> Signed-off-by: Petr Pavlu <petr.pavlu(a)suse.com> Cc: stable(a)vger.kernel.org --- Changes since v1 [1]: - Don't attempt a new module initialization when a same-name module completely disappeared while waiting on it, which means it went through the GOING state implicitly already. [1] https://lore.kernel.org/linux-modules/20221123131226.24359-1-petr.pavlu@sus… kernel/module/main.c | 26 +++++++++++++++++++++----- 1 file changed, 21 insertions(+), 5 deletions(-) diff --git a/kernel/module/main.c b/kernel/module/main.c index d02d39c7174e..7a627345d4fd 100644 --- a/kernel/module/main.c +++ b/kernel/module/main.c @@ -2386,7 +2386,8 @@ static bool finished_loading(const char *name) sched_annotate_sleep(); mutex_lock(&module_mutex); mod = find_module_all(name, strlen(name), true); - ret = !mod || mod->state == MODULE_STATE_LIVE; + ret = !mod || mod->state == MODULE_STATE_LIVE + || mod->state == MODULE_STATE_GOING; mutex_unlock(&module_mutex); return ret; @@ -2562,20 +2563,35 @@ static int add_unformed_module(struct module *mod) mod->state = MODULE_STATE_UNFORMED; -again: mutex_lock(&module_mutex); old = find_module_all(mod->name, strlen(mod->name), true); if (old != NULL) { - if (old->state != MODULE_STATE_LIVE) { + if (old->state == MODULE_STATE_COMING + || old->state == MODULE_STATE_UNFORMED) { /* Wait in case it fails to load. */ mutex_unlock(&module_mutex); err = wait_event_interruptible(module_wq, finished_loading(mod->name)); if (err) goto out_unlocked; - goto again; + + /* The module might have gone in the meantime. */ + mutex_lock(&module_mutex); + old = find_module_all(mod->name, strlen(mod->name), + true); } - err = -EEXIST; + + /* + * We are here only when the same module was being loaded. Do + * not try to load it again right now. It prevents long delays + * caused by serialized module load failures. It might happen + * when more devices of the same type trigger load of + * a particular module. + */ + if (old && old->state == MODULE_STATE_LIVE) + err = -EEXIST; + else + err = -EBUSY; goto out; } mod_update_bounds(mod); -- 2.35.3

2 years, 9 months

5
30
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror December 2022