The Tegra serial driver always prints an error message when enabling
the FIFO for devices that support checking the FIFO enable status.
Fix this by displaying the error message only when an error actually
occurs, and update the message to make it clear that enabling the FIFO
failed, including the error code.
Fixes: 222dcdff3405 ("serial: tegra: check for FIFO mode enabled status")
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Jon Hunter <jonathanh(a)nvidia.com>
---
Changes since V1:
- Updated the error message to make it more meaningful.
drivers/tty/serial/serial-tegra.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/tty/serial/serial-tegra.c b/drivers/tty/serial/serial-tegra.c
index 222032792d6c..eba5b9ecba34 100644
--- a/drivers/tty/serial/serial-tegra.c
+++ b/drivers/tty/serial/serial-tegra.c
@@ -1045,9 +1045,11 @@ static int tegra_uart_hw_init(struct tegra_uart_port *tup)
if (tup->cdata->fifo_mode_enable_status) {
ret = tegra_uart_wait_fifo_mode_enabled(tup);
- dev_err(tup->uport.dev, "FIFO mode not enabled\n");
- if (ret < 0)
+ if (ret < 0) {
+ dev_err(tup->uport.dev,
+ "Failed to enable FIFO mode: %d\n", ret);
return ret;
+ }
} else {
/*
* For all tegra devices (up to t210), there is a hardware
--
2.25.1
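For readers skimming the hunk above, the corrected error path in
tegra_uart_hw_init() reduces to the pattern below (a minimal sketch
reconstructed from the diff, not a complete function; it assumes
tegra_uart_wait_fifo_mode_enabled() returns 0 on success or a negative
errno on failure, as the hunk implies):

	if (tup->cdata->fifo_mode_enable_status) {
		/* Wait for the controller to report FIFO mode enabled. */
		ret = tegra_uart_wait_fifo_mode_enabled(tup);
		if (ret < 0) {
			/* Only complain when enabling actually failed. */
			dev_err(tup->uport.dev,
				"Failed to enable FIFO mode: %d\n", ret);
			return ret;
		}
	}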
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send the SEV_CMD_DECOMMISSION command to the PSP firmware if ASID
binding fails. If a failure happens after a successful LAUNCH_START
command, a decommission command should be executed. Otherwise, the
guest context is never freed inside the AMD SP. Once the firmware no
longer has memory to allocate more SEV guest contexts, LAUNCH_START
commands will begin to fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid(), but that
path is not reached if a failure happens before guest activation
succeeds: if sev_bind_asid() fails, decommission is never called. The
PSP firmware has a limit on the number of guests, so if sev_bind_asid()
fails many times, the firmware will run out of resources to create
another guest context.
Cc: stable(a)vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda(a)google.com>
Signed-off-by: Alper Gun <alpergun(a)google.com>
Reviewed-by: Marc Orr <marcorr(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun(a)google.com>
---
arch/x86/kvm/svm.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index ad24e6777277..ae94ec036137 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1791,9 +1791,25 @@ static void sev_asid_free(struct kvm *kvm)
__sev_asid_free(sev->asid);
}
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission *decommission;
+
+ if (!handle)
+ return;
+
+ decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
+ if (!decommission)
+ return;
+
+ decommission->handle = handle;
+ sev_guest_decommission(decommission, NULL);
+
+ kfree(decommission);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate *data;
if (!handle)
@@ -1811,15 +1827,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_guest_df_flush(NULL);
kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
- if (!decommission)
- return;
-
- /* decommission handle */
- decommission->handle = handle;
- sev_guest_decommission(decommission, NULL);
-
- kfree(decommission);
+ sev_decommission(handle);
}
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
@@ -6468,8 +6476,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start->handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start->handle);
goto e_free_session;
+ }
/* return handle to userspace */
params.handle = start->handle;
--
2.32.0.93.g670b81a890-goog
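Put differently, the post-patch error handling in sev_launch_start()
follows the outline below (a simplified sketch based on the hunk above,
not the verbatim function; the allocations and error labels are
omitted):

	/* LAUNCH_START succeeded: the PSP holds a context for start->handle. */
	ret = sev_bind_asid(kvm, start->handle, error);
	if (ret) {
		/*
		 * Activation failed after LAUNCH_START, so tell the PSP to
		 * drop the guest context; otherwise it stays allocated in
		 * firmware until the resource limit is hit.
		 */
		sev_decommission(start->handle);
		goto e_free_session;
	}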
commit 934002cd660b035b926438244b4294e647507e13 upstream.
Send the SEV_CMD_DECOMMISSION command to the PSP firmware if ASID
binding fails. If a failure happens after a successful LAUNCH_START
command, a decommission command should be executed. Otherwise, the
guest context is never freed inside the AMD SP. Once the firmware no
longer has memory to allocate more SEV guest contexts, LAUNCH_START
commands will begin to fail with the SEV_RET_RESOURCE_LIMIT error.
The existing code calls decommission inside sev_unbind_asid(), but that
path is not reached if a failure happens before guest activation
succeeds: if sev_bind_asid() fails, decommission is never called. The
PSP firmware has a limit on the number of guests, so if sev_bind_asid()
fails many times, the firmware will run out of resources to create
another guest context.
Cc: stable(a)vger.kernel.org
Fixes: 59414c989220 ("KVM: SVM: Add support for KVM_SEV_LAUNCH_START command")
Reported-by: Peter Gonda <pgonda(a)google.com>
Signed-off-by: Alper Gun <alpergun(a)google.com>
Reviewed-by: Marc Orr <marcorr(a)google.com>
Signed-off-by: Paolo Bonzini <pbonzini(a)redhat.com>
Message-Id: <20210610174604.2554090-1-alpergun(a)google.com>
---
arch/x86/kvm/svm.c | 32 +++++++++++++++++++++-----------
1 file changed, 21 insertions(+), 11 deletions(-)
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 074cd170912a..aa2da922ca99 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1794,9 +1794,25 @@ static void sev_asid_free(struct kvm *kvm)
__sev_asid_free(sev->asid);
}
-static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+static void sev_decommission(unsigned int handle)
{
struct sev_data_decommission *decommission;
+
+ if (!handle)
+ return;
+
+ decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
+ if (!decommission)
+ return;
+
+ decommission->handle = handle;
+ sev_guest_decommission(decommission, NULL);
+
+ kfree(decommission);
+}
+
+static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
+{
struct sev_data_deactivate *data;
if (!handle)
@@ -1814,15 +1830,7 @@ static void sev_unbind_asid(struct kvm *kvm, unsigned int handle)
sev_guest_df_flush(NULL);
kfree(data);
- decommission = kzalloc(sizeof(*decommission), GFP_KERNEL);
- if (!decommission)
- return;
-
- /* decommission handle */
- decommission->handle = handle;
- sev_guest_decommission(decommission, NULL);
-
- kfree(decommission);
+ sev_decommission(handle);
}
static struct page **sev_pin_memory(struct kvm *kvm, unsigned long uaddr,
@@ -6475,8 +6483,10 @@ static int sev_launch_start(struct kvm *kvm, struct kvm_sev_cmd *argp)
/* Bind ASID to this guest */
ret = sev_bind_asid(kvm, start->handle, error);
- if (ret)
+ if (ret) {
+ sev_decommission(start->handle);
goto e_free_session;
+ }
/* return handle to userspace */
params.handle = start->handle;
--
2.32.0.93.g670b81a890-goog
The patch below does not apply to the 5.4-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 67147e96a332b56c7206238162771d82467f86c0 Mon Sep 17 00:00:00 2001
From: Heiko Carstens <hca(a)linux.ibm.com>
Date: Fri, 18 Jun 2021 16:58:47 +0200
Subject: [PATCH] s390/stack: fix possible register corruption with stack
switch helper
The CALL_ON_STACK macro is used to call a C function from inline
assembly, and therefore must consider the C ABI, which says that only
registers 6-13, and 15 are non-volatile (restored by the called
function).
The inline assembly incorrectly marks all registers used to pass
parameters to the called function as read-only input operands, instead
of operands that are read and written to. This might result in
register corruption depending on usage, compiler, and compile options.
Fix this by marking all operands used to pass parameters as read/write
operands. To keep the code simple, even register 6, if used, is marked
as a read/write operand.
Fixes: ff340d2472ec ("s390: add stack switch helper")
Cc: <stable(a)kernel.org> # 4.20
Reviewed-by: Vasily Gorbik <gor(a)linux.ibm.com>
Signed-off-by: Heiko Carstens <hca(a)linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor(a)linux.ibm.com>
diff --git a/arch/s390/include/asm/stacktrace.h b/arch/s390/include/asm/stacktrace.h
index 2b543163d90a..76c6034428be 100644
--- a/arch/s390/include/asm/stacktrace.h
+++ b/arch/s390/include/asm/stacktrace.h
@@ -91,12 +91,16 @@ struct stack_frame {
CALL_ARGS_4(arg1, arg2, arg3, arg4); \
register unsigned long r4 asm("6") = (unsigned long)(arg5)
-#define CALL_FMT_0 "=&d" (r2) :
-#define CALL_FMT_1 "+&d" (r2) :
-#define CALL_FMT_2 CALL_FMT_1 "d" (r3),
-#define CALL_FMT_3 CALL_FMT_2 "d" (r4),
-#define CALL_FMT_4 CALL_FMT_3 "d" (r5),
-#define CALL_FMT_5 CALL_FMT_4 "d" (r6),
+/*
+ * To keep this simple mark register 2-6 as being changed (volatile)
+ * by the called function, even though register 6 is saved/nonvolatile.
+ */
+#define CALL_FMT_0 "=&d" (r2)
+#define CALL_FMT_1 "+&d" (r2)
+#define CALL_FMT_2 CALL_FMT_1, "+&d" (r3)
+#define CALL_FMT_3 CALL_FMT_2, "+&d" (r4)
+#define CALL_FMT_4 CALL_FMT_3, "+&d" (r5)
+#define CALL_FMT_5 CALL_FMT_4, "+&d" (r6)
#define CALL_CLOBBER_5 "0", "1", "14", "cc", "memory"
#define CALL_CLOBBER_4 CALL_CLOBBER_5
@@ -118,7 +122,7 @@ struct stack_frame {
" brasl 14,%[_fn]\n" \
" la 15,0(%[_prev])\n" \
: [_prev] "=&a" (prev), CALL_FMT_##nr \
- [_stack] "R" (stack), \
+ : [_stack] "R" (stack), \
[_bc] "i" (offsetof(struct stack_frame, back_chain)), \
[_frame] "d" (frame), \
[_fn] "X" (fn) : CALL_CLOBBER_##nr); \
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 34b3d5344719d14fd2185b2d9459b3abcb8cf9d8 Mon Sep 17 00:00:00 2001
From: Petr Mladek <pmladek(a)suse.com>
Date: Thu, 24 Jun 2021 18:39:45 -0700
Subject: [PATCH] kthread_worker: split code for canceling the delayed work
timer
Patch series "kthread_worker: Fix race between kthread_mod_delayed_work()
and kthread_cancel_delayed_work_sync()".
This patchset fixes the race between kthread_mod_delayed_work() and
kthread_cancel_delayed_work_sync(), including proper return value
handling.
This patch (of 2):
Simple code refactoring as a preparation step for fixing a race between
kthread_mod_delayed_work() and kthread_cancel_delayed_work_sync().
It does not modify the existing behavior.
Link: https://lkml.kernel.org/r/20210610133051.15337-2-pmladek@suse.com
Signed-off-by: Petr Mladek <pmladek(a)suse.com>
Cc: <jenhaochen(a)google.com>
Cc: Martin Liu <liumartin(a)google.com>
Cc: Minchan Kim <minchan(a)google.com>
Cc: Nathan Chancellor <nathan(a)kernel.org>
Cc: Nick Desaulniers <ndesaulniers(a)google.com>
Cc: Oleg Nesterov <oleg(a)redhat.com>
Cc: Tejun Heo <tj(a)kernel.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds(a)linux-foundation.org>
diff --git a/kernel/kthread.c b/kernel/kthread.c
index fe3f2a40d61e..121a0e1fc659 100644
--- a/kernel/kthread.c
+++ b/kernel/kthread.c
@@ -1092,6 +1092,33 @@ void kthread_flush_work(struct kthread_work *work)
}
EXPORT_SYMBOL_GPL(kthread_flush_work);
+/*
+ * Make sure that the timer is neither set nor running and could
+ * not manipulate the work list_head any longer.
+ *
+ * The function is called under worker->lock. The lock is temporary
+ * released but the timer can't be set again in the meantime.
+ */
+static void kthread_cancel_delayed_work_timer(struct kthread_work *work,
+ unsigned long *flags)
+{
+ struct kthread_delayed_work *dwork =
+ container_of(work, struct kthread_delayed_work, work);
+ struct kthread_worker *worker = work->worker;
+
+ /*
+ * del_timer_sync() must be called to make sure that the timer
+ * callback is not running. The lock must be temporary released
+ * to avoid a deadlock with the callback. In the meantime,
+ * any queuing is blocked by setting the canceling counter.
+ */
+ work->canceling++;
+ raw_spin_unlock_irqrestore(&worker->lock, *flags);
+ del_timer_sync(&dwork->timer);
+ raw_spin_lock_irqsave(&worker->lock, *flags);
+ work->canceling--;
+}
+
/*
* This function removes the work from the worker queue. Also it makes sure
* that it won't get queued later via the delayed work's timer.
@@ -1106,23 +1133,8 @@ static bool __kthread_cancel_work(struct kthread_work *work, bool is_dwork,
unsigned long *flags)
{
/* Try to cancel the timer if exists. */
- if (is_dwork) {
- struct kthread_delayed_work *dwork =
- container_of(work, struct kthread_delayed_work, work);
- struct kthread_worker *worker = work->worker;
-
- /*
- * del_timer_sync() must be called to make sure that the timer
- * callback is not running. The lock must be temporary released
- * to avoid a deadlock with the callback. In the meantime,
- * any queuing is blocked by setting the canceling counter.
- */
- work->canceling++;
- raw_spin_unlock_irqrestore(&worker->lock, *flags);
- del_timer_sync(&dwork->timer);
- raw_spin_lock_irqsave(&worker->lock, *flags);
- work->canceling--;
- }
+ if (is_dwork)
+ kthread_cancel_delayed_work_timer(work, flags);
/*
* Try to remove the work from a worker list. It might either
We use the async_delalloc_pages mechanism to make sure that we've
completed our async work before trying to continue our delalloc
flushing. The reason for this is we need to see any ordered extents
that were created by our delalloc flushing. However, we wake up waiters
before we do the submit work, which is before we create the ordered
extents. This is a pretty wide race window where we could potentially
think there are no ordered extents and thus exit shrink_delalloc
prematurely. Fix this by waking us up after we've done the work to
create ordered extents.
cc: stable(a)vger.kernel.org
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/inode.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b1f02e3fea5d..e388153c4ae4 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1290,11 +1290,6 @@ static noinline void async_cow_submit(struct btrfs_work *work)
nr_pages = (async_chunk->end - async_chunk->start + PAGE_SIZE) >>
PAGE_SHIFT;
- /* atomic_sub_return implies a barrier */
- if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
- 5 * SZ_1M)
- cond_wake_up_nomb(&fs_info->async_submit_wait);
-
/*
* ->inode could be NULL if async_chunk_start has failed to compress,
* in which case we don't have anything to submit, yet we need to
@@ -1303,6 +1298,11 @@ static noinline void async_cow_submit(struct btrfs_work *work)
*/
if (async_chunk->inode)
submit_compressed_extents(async_chunk);
+
+ /* atomic_sub_return implies a barrier */
+ if (atomic_sub_return(nr_pages, &fs_info->async_delalloc_pages) <
+ 5 * SZ_1M)
+ cond_wake_up_nomb(&fs_info->async_submit_wait);
}
static noinline void async_cow_free(struct btrfs_work *work)
--
2.26.3
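The race boils down to a wake-before-publish ordering bug. A condensed
sketch of the worker side (the names come from the hunk above, but this
is a simplified illustration, not the actual btrfs code; the waiter is
the delalloc flusher, which blocks until the async page count drains
and then expects to find the ordered extents that work created):

	/* Before the fix: the waiter can run before the ordered extents exist. */
	atomic_sub(nr_pages, &fs_info->async_delalloc_pages);
	cond_wake_up_nomb(&fs_info->async_submit_wait);
	submit_compressed_extents(async_chunk);

	/* After the fix: create the ordered extents first, then drop the
	 * page count and wake any waiters. */
	submit_compressed_extents(async_chunk);
	atomic_sub(nr_pages, &fs_info->async_delalloc_pages);
	cond_wake_up_nomb(&fs_info->async_submit_wait);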
We have been hitting some early ENOSPC issues in production with more
recent kernels, and I tracked it down to us simply not flushing delalloc
as aggressively as we should be. With tracing I was seeing us failing
all tickets with all of the block rsvs at or around 0, with very little
pinned space, but still around 120MiB of outstanding bytes_may_use.
Upon further investigation I saw that we were flushing around 14 pages
per shrink call for delalloc, despite having around 2GiB of delalloc
outstanding.
Consider the example of an 8-way machine, with all CPUs trying to create a
file in parallel, which at the time of this commit requires 5 items to
do. Assuming a 16k leaf size, we have 10MiB of total metadata reclaim
size waiting on reservations. Now assume we have 128MiB of delalloc
outstanding. With our current math we would set items to 20, and then
set to_reclaim to 20 * 256k, or 5MiB.
Assuming that we went through this loop all 3 times, for both
FLUSH_DELALLOC and FLUSH_DELALLOC_WAIT, and then did the full loop
twice, we'd only flush 60MiB of the 128MiB delalloc space. This could
leave a fair bit of delalloc reservations still hanging around by the
time we go to ENOSPC out all the remaining tickets.
Fix this in two ways. First, change the calculation to be a fraction
of the total delalloc bytes on the system. Prior to this change we were
calculating based on dirty inodes, so the math made more sense; now it
is simply unrelated to what we're actually doing.
Second, add a FLUSH_DELALLOC_FULL state that we hold off on until we've
gone through the flush states at least once. This will empty the system
of all delalloc, so we're sure to be truly out of space when we start
failing tickets.
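Plugging the numbers from the example above into the old and new
formulas shows the difference (a back-of-the-envelope sketch, not exact
kernel code; the 256K per-item value is taken from the text above and
the >> 3 comes from the shrink_delalloc() hunk below):

	u64 to_reclaim     = 10 * SZ_1M;   /* metadata waiting on reservations */
	u64 delalloc_bytes = 128 * SZ_1M;  /* outstanding delalloc             */

	/* Old math: items works out to ~20, so each flush pass targets only: */
	u64 old_target = 20 * EXTENT_SIZE_PER_ITEM;             /* 5MiB  */

	/* New math: base the target on the outstanding delalloc itself: */
	u64 new_target = max(to_reclaim, delalloc_bytes >> 3);  /* 16MiB */

Even before FLUSH_DELALLOC_FULL kicks in, each pass now asks for 16MiB
instead of 5MiB, so the same number of passes flushes far more of the
128MiB backlog.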
I'm tagging stable 5.10 and forward, because this is where we started
using the page stuff heavily again. This affects earlier kernel
versions as well, but would be a pain to backport to them as the
flushing mechanisms aren't the same.
CC: stable(a)vger.kernel.org # 5.10+
Signed-off-by: Josef Bacik <josef(a)toxicpanda.com>
---
fs/btrfs/ctree.h | 9 +++++----
fs/btrfs/space-info.c | 35 ++++++++++++++++++++++++++---------
include/trace/events/btrfs.h | 1 +
3 files changed, 32 insertions(+), 13 deletions(-)
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index d7ef4d7d2c1a..232ff1a49ca6 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -2783,10 +2783,11 @@ enum btrfs_flush_state {
FLUSH_DELAYED_REFS = 4,
FLUSH_DELALLOC = 5,
FLUSH_DELALLOC_WAIT = 6,
- ALLOC_CHUNK = 7,
- ALLOC_CHUNK_FORCE = 8,
- RUN_DELAYED_IPUTS = 9,
- COMMIT_TRANS = 10,
+ FLUSH_DELALLOC_FULL = 7,
+ ALLOC_CHUNK = 8,
+ ALLOC_CHUNK_FORCE = 9,
+ RUN_DELAYED_IPUTS = 10,
+ COMMIT_TRANS = 11,
};
int btrfs_subvolume_reserve_metadata(struct btrfs_root *root,
diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
index af161eb808a2..0c539a94c6d9 100644
--- a/fs/btrfs/space-info.c
+++ b/fs/btrfs/space-info.c
@@ -494,6 +494,9 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
long time_left;
int loops;
+ delalloc_bytes = percpu_counter_sum_positive(&fs_info->delalloc_bytes);
+ ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
+
/* Calc the number of the pages we need flush for space reservation */
if (to_reclaim == U64_MAX) {
items = U64_MAX;
@@ -501,19 +504,21 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
/*
* to_reclaim is set to however much metadata we need to
* reclaim, but reclaiming that much data doesn't really track
- * exactly, so increase the amount to reclaim by 2x in order to
- * make sure we're flushing enough delalloc to hopefully reclaim
- * some metadata reservations.
+ * exactly. What we really want to do is reclaim full inode's
+ * worth of reservations, however that's not available to us
+ * here. We will take a fraction of the delalloc bytes for our
+ * flushing loops and hope for the best. Delalloc will expand
+ * the amount we write to cover an entire dirty extent, which
+ * will reclaim the metadata reservation for that range. If
+ * it's not enough subsequent flush stages will be more
+ * aggressive.
*/
+ to_reclaim = max(to_reclaim, delalloc_bytes >> 3);
items = calc_reclaim_items_nr(fs_info, to_reclaim) * 2;
- to_reclaim = items * EXTENT_SIZE_PER_ITEM;
}
trans = (struct btrfs_trans_handle *)current->journal_info;
- delalloc_bytes = percpu_counter_sum_positive(
- &fs_info->delalloc_bytes);
- ordered_bytes = percpu_counter_sum_positive(&fs_info->ordered_bytes);
if (delalloc_bytes == 0 && ordered_bytes == 0)
return;
@@ -596,8 +601,11 @@ static void flush_space(struct btrfs_fs_info *fs_info,
break;
case FLUSH_DELALLOC:
case FLUSH_DELALLOC_WAIT:
+ case FLUSH_DELALLOC_FULL:
+ if (state == FLUSH_DELALLOC_FULL)
+ num_bytes = U64_MAX;
shrink_delalloc(fs_info, space_info, num_bytes,
- state == FLUSH_DELALLOC_WAIT, for_preempt);
+ state != FLUSH_DELALLOC, for_preempt);
break;
case FLUSH_DELAYED_REFS_NR:
case FLUSH_DELAYED_REFS:
@@ -907,6 +915,14 @@ static void btrfs_async_reclaim_metadata_space(struct work_struct *work)
commit_cycles--;
}
+ /*
+ * We do not want to empty the system of delalloc unless we're
+ * under heavy pressure, so allow one trip through the flushing
+ * logic before we start doing a FLUSH_DELALLOC_FULL.
+ */
+ if (flush_state == FLUSH_DELALLOC_FULL && !commit_cycles)
+ flush_state++;
+
/*
* We don't want to force a chunk allocation until we've tried
* pretty hard to reclaim space. Think of the case where we
@@ -1070,7 +1086,7 @@ static void btrfs_preempt_reclaim_metadata_space(struct work_struct *work)
* so if we now have space to allocate do the force chunk allocation.
*/
static const enum btrfs_flush_state data_flush_states[] = {
- FLUSH_DELALLOC_WAIT,
+ FLUSH_DELALLOC_FULL,
RUN_DELAYED_IPUTS,
COMMIT_TRANS,
ALLOC_CHUNK_FORCE,
@@ -1159,6 +1175,7 @@ static const enum btrfs_flush_state evict_flush_states[] = {
FLUSH_DELAYED_REFS,
FLUSH_DELALLOC,
FLUSH_DELALLOC_WAIT,
+ FLUSH_DELALLOC_FULL,
ALLOC_CHUNK,
COMMIT_TRANS,
};
diff --git a/include/trace/events/btrfs.h b/include/trace/events/btrfs.h
index 3d81ba8c37b9..ddf5c250726c 100644
--- a/include/trace/events/btrfs.h
+++ b/include/trace/events/btrfs.h
@@ -94,6 +94,7 @@ struct btrfs_space_info;
EM( FLUSH_DELAYED_ITEMS, "FLUSH_DELAYED_ITEMS") \
EM( FLUSH_DELALLOC, "FLUSH_DELALLOC") \
EM( FLUSH_DELALLOC_WAIT, "FLUSH_DELALLOC_WAIT") \
+ EM( FLUSH_DELALLOC_FULL, "FLUSH_DELALLOC_FULL") \
EM( FLUSH_DELAYED_REFS_NR, "FLUSH_DELAYED_REFS_NR") \
EM( FLUSH_DELAYED_REFS, "FLUSH_ELAYED_REFS") \
EM( ALLOC_CHUNK, "ALLOC_CHUNK") \
--
2.26.3
xfrm_bydst_resize() calls synchronize_rcu() while holding
hash_resize_mutex. But then on PREEMPT_RT configurations,
xfrm_policy_lookup_bytype() may acquire that mutex while running in an
RCU read side critical section. This results in a deadlock.
In fact, the scope of hash_resize_mutex goes way beyond what
xfrm_policy_lookup_bytype() needs, which is just to fetch a coherent
and stable policy for a given destination/direction, along with a few
other details.
The lower level net->xfrm.xfrm_policy_lock, which among other things
protects the per destination/direction references to policy entries, is
enough to serialize against the write side and to benefit from priority
inheritance. As a bonus, it officially makes the synchronization a per
network namespace matter, so a policy table resize in namespace A no
longer blocks a policy lookup in namespace B.
Fixes: 77cc278f7b20 ("xfrm: policy: Use sequence counters with associated lock")
Cc: stable(a)vger.kernel.org
Cc: Ahmed S. Darwish <a.darwish(a)linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz(a)infradead.org>
Cc: Varad Gautam <varad.gautam(a)suse.com>
Cc: Steffen Klassert <steffen.klassert(a)secunet.com>
Cc: Herbert Xu <herbert(a)gondor.apana.org.au>
Cc: David S. Miller <davem(a)davemloft.net>
Signed-off-by: Frederic Weisbecker <frederic(a)kernel.org>
---
include/net/netns/xfrm.h | 1 +
net/xfrm/xfrm_policy.c | 17 ++++++++---------
2 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/include/net/netns/xfrm.h b/include/net/netns/xfrm.h
index e816b6a3ef2b..9b376b87bd54 100644
--- a/include/net/netns/xfrm.h
+++ b/include/net/netns/xfrm.h
@@ -74,6 +74,7 @@ struct netns_xfrm {
#endif
spinlock_t xfrm_state_lock;
seqcount_spinlock_t xfrm_state_hash_generation;
+ seqcount_spinlock_t xfrm_policy_hash_generation;
spinlock_t xfrm_policy_lock;
struct mutex xfrm_cfg_mutex;
diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index ce500f847b99..46a6d15b66d6 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -155,7 +155,6 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
__read_mostly;
static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
static struct rhashtable xfrm_policy_inexact_table;
static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -585,7 +584,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
return;
spin_lock_bh(&net->xfrm.xfrm_policy_lock);
- write_seqcount_begin(&xfrm_policy_hash_generation);
+ write_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation);
odst = rcu_dereference_protected(net->xfrm.policy_bydst[dir].table,
lockdep_is_held(&net->xfrm.xfrm_policy_lock));
@@ -596,7 +595,7 @@ static void xfrm_bydst_resize(struct net *net, int dir)
rcu_assign_pointer(net->xfrm.policy_bydst[dir].table, ndst);
net->xfrm.policy_bydst[dir].hmask = nhashmask;
- write_seqcount_end(&xfrm_policy_hash_generation);
+ write_seqcount_end(&net->xfrm.xfrm_policy_hash_generation);
spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
synchronize_rcu();
@@ -1245,7 +1244,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
} while (read_seqretry(&net->xfrm.policy_hthresh.lock, seq));
spin_lock_bh(&net->xfrm.xfrm_policy_lock);
- write_seqcount_begin(&xfrm_policy_hash_generation);
+ write_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation);
/* make sure that we can insert the indirect policies again before
* we start with destructive action.
@@ -1354,7 +1353,7 @@ static void xfrm_hash_rebuild(struct work_struct *work)
out_unlock:
__xfrm_policy_inexact_flush(net);
- write_seqcount_end(&xfrm_policy_hash_generation);
+ write_seqcount_end(&net->xfrm.xfrm_policy_hash_generation);
spin_unlock_bh(&net->xfrm.xfrm_policy_lock);
mutex_unlock(&hash_resize_mutex);
@@ -2095,9 +2094,9 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
rcu_read_lock();
retry:
do {
- sequence = read_seqcount_begin(&xfrm_policy_hash_generation);
+ sequence = read_seqcount_begin(&net->xfrm.xfrm_policy_hash_generation);
chain = policy_hash_direct(net, daddr, saddr, family, dir);
- } while (read_seqcount_retry(&xfrm_policy_hash_generation, sequence));
+ } while (read_seqcount_retry(&net->xfrm.xfrm_policy_hash_generation, sequence));
ret = NULL;
hlist_for_each_entry_rcu(pol, chain, bydst) {
@@ -2128,7 +2127,7 @@ static struct xfrm_policy *xfrm_policy_lookup_bytype(struct net *net, u8 type,
}
skip_inexact:
- if (read_seqcount_retry(&xfrm_policy_hash_generation, sequence))
+ if (read_seqcount_retry(&net->xfrm.xfrm_policy_hash_generation, sequence))
goto retry;
if (ret && !xfrm_pol_hold_rcu(ret))
@@ -4084,6 +4083,7 @@ static int __net_init xfrm_net_init(struct net *net)
/* Initialize the per-net locks here */
spin_lock_init(&net->xfrm.xfrm_state_lock);
spin_lock_init(&net->xfrm.xfrm_policy_lock);
+ seqcount_spinlock_init(&net->xfrm.xfrm_policy_hash_generation, &net->xfrm.xfrm_policy_lock);
mutex_init(&net->xfrm.xfrm_cfg_mutex);
rv = xfrm_statistics_init(net);
@@ -4128,7 +4128,6 @@ void __init xfrm_init(void)
{
register_pernet_subsys(&xfrm_net_ops);
xfrm_dev_init();
- seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
xfrm_input_init();
#ifdef CONFIG_XFRM_ESPINTCP
--
2.25.1
The patch below does not apply to the 5.10-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From edc0b0bccc9c80d9a44d3002dcca94984b25e7cf Mon Sep 17 00:00:00 2001
From: Mark Bloch <mbloch(a)nvidia.com>
Date: Mon, 7 Jun 2021 11:03:12 +0300
Subject: [PATCH] RDMA/mlx5: Block FDB rules when not in switchdev mode
Allow creating FDB steering rules only when in switchdev mode.
The only software model where a userspace application can manipulate
FDB entries is when it manages the eswitch. This is only possible in
switchdev mode where we expose a single RDMA device with representors
for all the vports that are connected to the eswitch.
Fixes: 52438be44112 ("RDMA/mlx5: Allow inserting a steering rule to the FDB")
Link: https://lore.kernel.org/r/e928ae7c58d07f104716a2a8d730963d1bd01204.16230529…
Reviewed-by: Maor Gottlieb <maorg(a)nvidia.com>
Signed-off-by: Mark Bloch <mbloch(a)nvidia.com>
Signed-off-by: Leon Romanovsky <leonro(a)nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg(a)nvidia.com>
diff --git a/drivers/infiniband/hw/mlx5/fs.c b/drivers/infiniband/hw/mlx5/fs.c
index 2fc6a60c4e77..f84441ff0c81 100644
--- a/drivers/infiniband/hw/mlx5/fs.c
+++ b/drivers/infiniband/hw/mlx5/fs.c
@@ -2134,6 +2134,12 @@ static int UVERBS_HANDLER(MLX5_IB_METHOD_FLOW_MATCHER_CREATE)(
if (err)
goto end;
+ if (obj->ns_type == MLX5_FLOW_NAMESPACE_FDB &&
+ mlx5_eswitch_mode(dev->mdev) != MLX5_ESWITCH_OFFLOADS) {
+ err = -EINVAL;
+ goto end;
+ }
+
uobj->object = obj;
obj->mdev = dev->mdev;
atomic_set(&obj->usecnt, 0);