In commit d8bb6718c4db ("arm64: Make debug exception handlers visible
from RCU") debug_exception_enter and exit were added to deal with better
tracking of RCU state - however, in addition to that, but not mentioned
in the commit log, a preempt_disable/enable pair were also added.
Based on the comment (being removed here) it would seem that the pair
were not added to address a specific problem, but just as a proactive,
preventative measure - as in "seemed like a good idea at the time".
The problem is that do_debug_exception() eventually calls out to
generic kernel code like do_force_sig_info() which takes non-raw locks
and on -rt enabled kernels, results in things looking like the following,
since on -rt kernels, it is noticed that preemption is still disabled.
BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:975
in_atomic(): 1, irqs_disabled(): 0, pid: 35658, name: gdbtest
Preemption disabled at:
[<ffff000010081578>] do_debug_exception+0x38/0x1a4
Call trace:
dump_backtrace+0x0/0x138
show_stack+0x24/0x30
dump_stack+0x94/0xbc
___might_sleep+0x13c/0x168
rt_spin_lock+0x40/0x80
do_force_sig_info+0x30/0xe0
force_sig_fault+0x64/0x90
arm64_force_sig_fault+0x50/0x80
send_user_sigtrap+0x50/0x80
brk_handler+0x98/0xc8
do_debug_exception+0x70/0x1a4
el0_dbg+0x18/0x20
The reproducer was basically an automated gdb test that set a breakpoint
on a simple "hello world" program and then quit gdb once the breakpoint
was hit - i.e. "(gdb) A debugging session is active. Quit anyway? "
Fixes: d8bb6718c4db ("arm64: Make debug exception handlers visible from RCU")
Cc: stable(a)vger.kernel.org
Cc: Naresh Kamboju <naresh.kamboju(a)linaro.org>
Cc: Paul E. McKenney <paulmck(a)linux.ibm.com>
Cc: Masami Hiramatsu <mhiramat(a)kernel.org>
Cc: Will Deacon <will(a)kernel.org>
Signed-off-by: Paul Gortmaker <paul.gortmaker(a)windriver.com>
---
arch/arm64/mm/fault.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 8afb238ff335..4d83ec991b33 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -788,13 +788,6 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name = name;
}
-/*
- * In debug exception context, we explicitly disable preemption despite
- * having interrupts disabled.
- * This serves two purposes: it makes it much less likely that we would
- * accidentally schedule in exception context and it will force a warning
- * if we somehow manage to schedule by accident.
- */
static void debug_exception_enter(struct pt_regs *regs)
{
/*
@@ -816,8 +809,6 @@ static void debug_exception_enter(struct pt_regs *regs)
rcu_nmi_enter();
}
- preempt_disable();
-
/* This code is a bit fragile. Test it. */
RCU_LOCKDEP_WARN(!rcu_is_watching(), "exception_enter didn't work");
}
@@ -825,8 +816,6 @@ NOKPROBE_SYMBOL(debug_exception_enter);
static void debug_exception_exit(struct pt_regs *regs)
{
- preempt_enable_no_resched();
-
if (!user_mode(regs))
rcu_nmi_exit();
--
2.7.4
DIR_INDEX has been introduced as a compat ext4 feature. That means that
even kernels / tools that don't understand the feature may modify the
filesystem. This works because for kernels not understanding indexed dir
format, internal htree nodes appear just as empty directory entries.
Index dir aware kernels then check the htree structure is still
consistent before using the data. This all worked reasonably well until
metadata checksums were introduced. The problem is that these
effectively made DIR_INDEX only ro-compatible because internal htree
nodes store checksums in a different place than normal directory blocks.
Thus any modification ignorant to DIR_INDEX (or just clearing
EXT4_INDEX_FL from the inode) will effectively cause checksum mismatch
and trigger kernel errors. So we have to be more careful when dealing
with indexed directories on filesystems with checksumming enabled.
1) We just disallow loading and directory inodes with EXT4_INDEX_FL when
DIR_INDEX is not enabled. This is harsh but it should be very rare (it
means someone disabled DIR_INDEX on existing filesystem and didn't run
e2fsck), e2fsck can fix the problem, and we don't want to answer the
difficult question: "Should we rather corrupt the directory more or
should we ignore that DIR_INDEX feature is not set?"
2) When we find out htree structure is corrupted (but the filesystem and
the directory should in support htrees), we continue just ignoring htree
information for reading but we refuse to add new entries to the
directory to avoid corrupting it more.
CC: stable(a)vger.kernel.org
Fixes: dbe89444042a ("ext4: Calculate and verify checksums for htree nodes")
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/dir.c | 14 ++++++++------
fs/ext4/ext4.h | 5 ++++-
fs/ext4/inode.c | 13 +++++++++++++
fs/ext4/namei.c | 7 +++++++
4 files changed, 32 insertions(+), 7 deletions(-)
diff --git a/fs/ext4/dir.c b/fs/ext4/dir.c
index 9f00fc0bf21d..cb9ea593b544 100644
--- a/fs/ext4/dir.c
+++ b/fs/ext4/dir.c
@@ -129,12 +129,14 @@ static int ext4_readdir(struct file *file, struct dir_context *ctx)
if (err != ERR_BAD_DX_DIR) {
return err;
}
- /*
- * We don't set the inode dirty flag since it's not
- * critical that it get flushed back to the disk.
- */
- ext4_clear_inode_flag(file_inode(file),
- EXT4_INODE_INDEX);
+ /* Can we just clear INDEX flag to ignore htree information? */
+ if (!ext4_has_metadata_csum(sb)) {
+ /*
+ * We don't set the inode dirty flag since it's not
+ * critical that it gets flushed back to the disk.
+ */
+ ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
+ }
}
if (ext4_has_inline_data(inode)) {
diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index f8578caba40d..1fd6c1e2ce2a 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -2482,8 +2482,11 @@ void ext4_insert_dentry(struct inode *inode,
struct ext4_filename *fname);
static inline void ext4_update_dx_flag(struct inode *inode)
{
- if (!ext4_has_feature_dir_index(inode->i_sb))
+ if (!ext4_has_feature_dir_index(inode->i_sb)) {
+ /* ext4_iget() should have caught this... */
+ WARN_ON_ONCE(ext4_has_feature_metadata_csum(inode->i_sb));
ext4_clear_inode_flag(inode, EXT4_INODE_INDEX);
+ }
}
static const unsigned char ext4_filetype_table[] = {
DT_UNKNOWN, DT_REG, DT_DIR, DT_CHR, DT_BLK, DT_FIFO, DT_SOCK, DT_LNK
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 629a25d999f0..d33135308c1b 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4615,6 +4615,19 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino,
ret = -EFSCORRUPTED;
goto bad_inode;
}
+ /*
+ * If dir_index is not enabled but there's dir with INDEX flag set,
+ * we'd normally treat htree data as empty space. But with metadata
+ * checksumming that corrupts checksums so forbid that.
+ */
+ if (!ext4_has_feature_dir_index(sb) && ext4_has_metadata_csum(sb) &&
+ ext4_test_inode_flag(inode, EXT4_INODE_INDEX)) {
+ ext4_error_inode(inode, function, line, 0,
+ "iget: Dir with htree data on filesystem "
+ "without dir_index feature.");
+ ret = -EFSCORRUPTED;
+ goto bad_inode;
+ }
ei->i_disksize = inode->i_size;
#ifdef CONFIG_QUOTA
ei->i_reserved_quota = 0;
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 1cb42d940784..deb9f7a02976 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2207,6 +2207,13 @@ static int ext4_add_entry(handle_t *handle, struct dentry *dentry,
retval = ext4_dx_add_entry(handle, &fname, dir, inode);
if (!retval || (retval != ERR_BAD_DX_DIR))
goto out;
+ /* Can we just ignore htree data? */
+ if (ext4_has_metadata_csum(sb)) {
+ EXT4_ERROR_INODE(dir,
+ "Directory has corrupted htree index.");
+ retval = -EFSCORRUPTED;
+ goto out;
+ }
ext4_clear_inode_flag(dir, EXT4_INODE_INDEX);
dx_fallback++;
ext4_mark_inode_dirty(handle, dir);
--
2.16.4
This is a bit of a mess, to put it mildly. But, it's a bug
that seems to have gone unticked up to now, probably because
nobody uses MPX. The other alternative to this fix is to just
deprecate MPX, even in -stable kernels.
MPX has the arch_unmap() hook inside of munmap() because MPX
uses bounds tables that protect other areas of memory. When
memory is unmapped, there is also a need to unmap the MPX
bounds tables. Barring this, unused bounds tables can eat 80%
of the address space.
But, the recursive do_munmap() that gets called vi arch_unmap()
wreaks havoc with __do_munmap()'s state. It can result in
freeing populated page tables, accessing bogus VMA state,
double-freed VMAs and more.
To fix this, call arch_unmap() before __do_unmap() has a chance
to do anything meaningful. Also, remove the 'vma' argument
and force the MPX code to do its own, independent VMA lookup.
For the common success case this is functionally identical to
what was there before. For the munmap() failure case, it's
possible that some MPX tables will be zapped for memory that
continues to be in use. But, this is an extraordinarily
unlikely scenario and the harm would be that MPX provides no
protection since the bounds table got reset (zeroed).
I can't imagine anyone doing this:
ptr = mmap();
// use ptr
ret = munmap(ptr);
if (ret)
// oh, there was an error, I'll
// keep using ptr.
Because if you're doing munmap(), you are *done* with the
memory. There's probably no good data in there _anyway_.
This passes the original reproducer from Richard Biener as
well as the existing mpx selftests/.
====
The long story:
munmap() has a couple of pieces:
1. Find the affected VMA(s)
2. Split the start/end one(s) if neceesary
3. Pull the VMAs out of the rbtree
4. Actually zap the memory via unmap_region(), including
freeing page tables (or queueing them to be freed).
5. Fixup some of the accounting (like fput()) and actually
free the VMA itself.
I decided to put the arch_unmap() call right afer #3. This
was *just* before mmap_sem looked like it might get downgraded
(it won't in this context), but it looked right. It wasn't.
Richard Biener reported a test that shows this in dmesg:
[1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
[1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
What triggered this was the recursive do_munmap() called via
arch_unmap(). It was freeing page tables that has not been
properly zapped.
But, the problem was bigger than this. For one, arch_unmap()
can free VMAs. But, the calling __do_munmap() has variables
that *point* to VMAs and obviously can't handle them just
getting freed while the pointer is still valid.
I tried a couple of things here. First, I tried to fix the page
table freeing problem in isolation, but I then found the VMA
issue. I also tried having the MPX code return a flag if it
modified the rbtree which would force __do_munmap() to re-walk
to restart. That spiralled out of control in complexity pretty
fast.
Just moving arch_unmap() and accepting that the bonkers failure
case might eat some bounds tables seems like the simplest viable
fix.
Reported-by: Richard Biener <rguenther(a)suse.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Andy Lutomirski <luto(a)amacapital.net>
Cc: x86(a)kernel.org
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: linux-kernel(a)vger.kernel.org
Cc: linux-mm(a)kvack.org
Cc: stable(a)vger.kernel.org
---
b/arch/x86/include/asm/mmu_context.h | 6 +++---
b/arch/x86/include/asm/mpx.h | 5 ++---
b/arch/x86/mm/mpx.c | 10 ++++++----
b/include/asm-generic/mm_hooks.h | 1 -
b/mm/mmap.c | 15 ++++++++-------
5 files changed, 19 insertions(+), 18 deletions(-)
diff -puN mm/mmap.c~mpx-rss-pass-no-vma mm/mmap.c
--- a/mm/mmap.c~mpx-rss-pass-no-vma 2019-04-01 06:56:53.409411123 -0700
+++ b/mm/mmap.c 2019-04-01 06:56:53.423411123 -0700
@@ -2731,9 +2731,17 @@ int __do_munmap(struct mm_struct *mm, un
return -EINVAL;
len = PAGE_ALIGN(len);
+ end = start + len;
if (len == 0)
return -EINVAL;
+ /*
+ * arch_unmap() might do unmaps itself. It must be called
+ * and finish any rbtree manipulation before this code
+ * runs and also starts to manipulate the rbtree.
+ */
+ arch_unmap(mm, start, end);
+
/* Find the first overlapping VMA */
vma = find_vma(mm, start);
if (!vma)
@@ -2742,7 +2750,6 @@ int __do_munmap(struct mm_struct *mm, un
/* we have start < vma->vm_end */
/* if it doesn't overlap, we have nothing.. */
- end = start + len;
if (vma->vm_start >= end)
return 0;
@@ -2812,12 +2819,6 @@ int __do_munmap(struct mm_struct *mm, un
/* Detach vmas from rbtree */
detach_vmas_to_be_unmapped(mm, vma, prev, end);
- /*
- * mpx unmap needs to be called with mmap_sem held for write.
- * It is safe to call it before unmap_region().
- */
- arch_unmap(mm, vma, start, end);
-
if (downgrade)
downgrade_write(&mm->mmap_sem);
diff -puN arch/x86/include/asm/mmu_context.h~mpx-rss-pass-no-vma arch/x86/include/asm/mmu_context.h
--- a/arch/x86/include/asm/mmu_context.h~mpx-rss-pass-no-vma 2019-04-01 06:56:53.412411123 -0700
+++ b/arch/x86/include/asm/mmu_context.h 2019-04-01 06:56:53.423411123 -0700
@@ -277,8 +277,8 @@ static inline void arch_bprm_mm_init(str
mpx_mm_init(mm);
}
-static inline void arch_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+static inline void arch_unmap(struct mm_struct *mm, unsigned long start,
+ unsigned long end)
{
/*
* mpx_notify_unmap() goes and reads a rarely-hot
@@ -298,7 +298,7 @@ static inline void arch_unmap(struct mm_
* consistently wrong.
*/
if (unlikely(cpu_feature_enabled(X86_FEATURE_MPX)))
- mpx_notify_unmap(mm, vma, start, end);
+ mpx_notify_unmap(mm, start, end);
}
/*
diff -puN include/asm-generic/mm_hooks.h~mpx-rss-pass-no-vma include/asm-generic/mm_hooks.h
--- a/include/asm-generic/mm_hooks.h~mpx-rss-pass-no-vma 2019-04-01 06:56:53.414411123 -0700
+++ b/include/asm-generic/mm_hooks.h 2019-04-01 06:56:53.423411123 -0700
@@ -18,7 +18,6 @@ static inline void arch_exit_mmap(struct
}
static inline void arch_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
}
diff -puN arch/x86/mm/mpx.c~mpx-rss-pass-no-vma arch/x86/mm/mpx.c
--- a/arch/x86/mm/mpx.c~mpx-rss-pass-no-vma 2019-04-01 06:56:53.416411123 -0700
+++ b/arch/x86/mm/mpx.c 2019-04-01 06:56:53.423411123 -0700
@@ -881,9 +881,10 @@ static int mpx_unmap_tables(struct mm_st
* the virtual address region start...end have already been split if
* necessary, and the 'vma' is the first vma in this range (start -> end).
*/
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end)
+void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
+ unsigned long end)
{
+ struct vm_area_struct *vma;
int ret;
/*
@@ -902,11 +903,12 @@ void mpx_notify_unmap(struct mm_struct *
* which should not occur normally. Being strict about it here
* helps ensure that we do not have an exploitable stack overflow.
*/
- do {
+ vma = find_vma(mm, start);
+ while (vma && vma->vm_start < end) {
if (vma->vm_flags & VM_MPX)
return;
vma = vma->vm_next;
- } while (vma && vma->vm_start < end);
+ }
ret = mpx_unmap_tables(mm, start, end);
if (ret)
diff -puN arch/x86/include/asm/mpx.h~mpx-rss-pass-no-vma arch/x86/include/asm/mpx.h
--- a/arch/x86/include/asm/mpx.h~mpx-rss-pass-no-vma 2019-04-01 06:56:53.418411123 -0700
+++ b/arch/x86/include/asm/mpx.h 2019-04-01 06:56:53.424411123 -0700
@@ -78,8 +78,8 @@ static inline void mpx_mm_init(struct mm
*/
mm->context.bd_addr = MPX_INVALID_BOUNDS_DIR;
}
-void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
- unsigned long start, unsigned long end);
+void mpx_notify_unmap(struct mm_struct *mm, unsigned long start,
+ unsigned long end);
unsigned long mpx_unmapped_area_check(unsigned long addr, unsigned long len,
unsigned long flags);
@@ -100,7 +100,6 @@ static inline void mpx_mm_init(struct mm
{
}
static inline void mpx_notify_unmap(struct mm_struct *mm,
- struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
}
_
On Cherry Trail devices there are 2 possible ACPI OpRegions for
accessing GPIOs. The standard GeneralPurposeIo OpRegion and the Cherry
Trail specific UserDefined 0x9X OpRegions.
Having 2 different types of OpRegions leads to potential issues with
checks for OpRegion availability, or in other words checks if _REG has
been called for the OpRegion which the ACPI code wants to use.
The ACPICA core does not call _REG on an ACPI node which does not
define an OpRegion matching the type being registered; and the reference
design DSDT, from which most Cherry Trail DSDTs are derived, does not
define GeneralPurposeIo, nor UserDefined(0x93) OpRegions for the GPO2
(UID 3) device, because no pins were assigned ACPI controlled functions
in the reference design.
Together this leads to the perfect storm, at least on the Cherry Trail
based Medion Akayo E1239T. This design does use a GPO2 pin from its ACPI
code and has added the Cherry Trail specific UserDefined(0x93) opregion
to its GPO2 ACPI node to access this pin.
But it uses a has _REG been called availability check for the standard
GeneralPurposeIo OpRegion. This clearly is a bug in the DSDT, but this
does work under Windows. This issue leads to the intel_vbtn driver
reporting the device always being in tablet-mode at boot, even if it
is in laptop mode. Which in turn causes userspace to ignore touchpad
events. So iow this issues causes the touchpad to not work at boot.
Since the bug in the DSDT stems from the confusion of having 2 different
OpRegion types for accessing GPIOs on Cherry Trail devices, I believe
that this is best fixed inside the Cherryview pinctrl driver.
This commit adds a workaround to the Cherryview pinctrl driver so
that the DSDT's expectations of _REG always getting called for the
GeneralPurposeIo OpRegion are met.
Cc: stable(a)vger.kernel.org
Signed-off-by: Hans de Goede <hdegoede(a)redhat.com>
---
Changes in v2:
- Drop unnecessary if (acpi_has_method(adev->handle, "_REG")) check
- Fix Cherryview spelling in the commit message
---
drivers/pinctrl/intel/pinctrl-cherryview.c | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/drivers/pinctrl/intel/pinctrl-cherryview.c b/drivers/pinctrl/intel/pinctrl-cherryview.c
index 4c74fdde576d..4817aec114d6 100644
--- a/drivers/pinctrl/intel/pinctrl-cherryview.c
+++ b/drivers/pinctrl/intel/pinctrl-cherryview.c
@@ -1693,6 +1693,8 @@ static acpi_status chv_pinctrl_mmio_access_handler(u32 function,
static int chv_pinctrl_probe(struct platform_device *pdev)
{
+ struct acpi_object_list input;
+ union acpi_object params[2];
struct chv_pinctrl *pctrl;
struct acpi_device *adev;
acpi_status status;
@@ -1755,6 +1757,22 @@ static int chv_pinctrl_probe(struct platform_device *pdev)
if (ACPI_FAILURE(status))
dev_err(&pdev->dev, "failed to install ACPI addr space handler\n");
+ /*
+ * Some DSDT-s use the chv_pinctrl_mmio_access_handler while checking
+ * for the regular GeneralPurposeIo OpRegion availability, mixed with
+ * the DSDT not defining a GeneralPurposeIo OpRegion at all. In this
+ * case the ACPICA code will not call _REG to signal availability of
+ * the GeneralPurposeIo OpRegion. Manually call _REG here so that
+ * the DSDT-s GeneralPurposeIo availability checks will succeed.
+ */
+ params[0].type = ACPI_TYPE_INTEGER;
+ params[0].integer.value = ACPI_ADR_SPACE_GPIO;
+ params[1].type = ACPI_TYPE_INTEGER;
+ params[1].integer.value = 1;
+ input.count = 2;
+ input.pointer = params;
+ acpi_evaluate_object(adev->handle, "_REG", &input, NULL);
+
platform_set_drvdata(pdev, pctrl);
return 0;
--
2.26.0
--Andy
> On Apr 18, 2020, at 12:42 PM, Linus Torvalds <torvalds(a)linux-foundation.org> wrote:
>
>>> On Fri, Apr 17, 2020 at 5:12 PM Dan Williams <dan.j.williams(a)intel.com> wrote:
>>>
>>> @@ -106,12 +108,10 @@ static __always_inline __must_check unsigned long
>>> memcpy_mcsafe(void *dst, const void *src, size_t cnt)
>>> {
>>> #ifdef CONFIG_X86_MCE
>>> - i(static_branch_unlikely(&mcsafe_key))
>>> - return __memcpy_mcsafe(dst, src, cnt);
>>> - else
>>> + if (static_branch_unlikely(&mcsafe_slow_key))
>>> + return memcpy_mcsafe_slow(dst, src, cnt);
>>> #endif
>>> - memcpy(dst, src, cnt);
>>> - return 0;
>>> + return memcpy_mcsafe_fast(dst, src, cnt);
>>> }
>
> It strikes me that I see no advantages to making this an inline function at all.
>
> Even for the good case - where it turns into just a memcpy because MCE
> is entirely disabled - it doesn't seem to matter.
>
> The only case that really helps is when the memcpy can be turned into
> a single access. Which - and I checked - does exist, with people doing
>
> r = memcpy_mcsafe(&sb_seq_count, &sb(wc)->seq_count, sizeof(uint64_t));
>
> to read a single 64-bit field which looks aligned to me.
>
> But that code is incredible garbage anyway, since even on a broken
> machine, there's no actual reason to use the slow variant for that
> whole access that I can tell. The macs-safe copy routines do not do
> anything worthwhile for a single access.
Maybe I’m missing something obvious, but what’s the alternative? The _mcsafe variants don’t just avoid the REP mess — they also tell the kernel that this particular access is recoverable via extable. With a regular memory access, the CPU may not explode, but do_machine_check() will, at very best, OOPS, and even that requires a certain degree of optimism. A panic is more likely.
When doing an atomic modeset with ALLOW_MODESET drivers are allowed to
pull in arbitrary other resources, including CRTCs (e.g. when
reconfiguring global resources).
But in nonblocking mode userspace has then no idea this happened,
which can lead to spurious EBUSY calls, both:
- when that other CRTC is currently busy doing a page_flip the
ALLOW_MODESET commit can fail with an EBUSY
- on the other CRTC a normal atomic flip can fail with EBUSY because
of the additional commit inserted by the kernel without userspace's
knowledge
For blocking commits this isn't a problem, because everyone else will
just block until all the CRTC are reconfigured. Only thing userspace
can notice is the dropped frames without any reason for why frames got
dropped.
Consensus is that we need new uapi to handle this properly, but no one
has any idea what exactly the new uapi should look like. As a stop-gap
plug this problem by demoting nonblocking commits which might cause
issues by including CRTCs not in the original request to blocking
commits.
v2: Add comments and a WARN_ON to enforce this only when allowed - we
don't want to silently convert page flips into blocking plane updates
just because the driver is buggy.
References: https://lists.freedesktop.org/archives/dri-devel/2018-July/182281.html
Bugzilla: https://gitlab.freedesktop.org/wayland/weston/issues/24#note_9568
Cc: Daniel Stone <daniel(a)fooishbar.org>
Cc: Pekka Paalanen <pekka.paalanen(a)collabora.co.uk>
Cc: stable(a)vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter(a)intel.com>
---
drivers/gpu/drm/drm_atomic.c | 34 +++++++++++++++++++++++++++++++---
1 file changed, 31 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/drm_atomic.c b/drivers/gpu/drm/drm_atomic.c
index d5cefb1cb2a2..058512f14772 100644
--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -2018,15 +2018,43 @@ EXPORT_SYMBOL(drm_atomic_commit);
int drm_atomic_nonblocking_commit(struct drm_atomic_state *state)
{
struct drm_mode_config *config = &state->dev->mode_config;
- int ret;
+ unsigned requested_crtc = 0;
+ unsigned affected_crtc = 0;
+ struct drm_crtc *crtc;
+ struct drm_crtc_state *crtc_state;
+ bool nonblocking = true;
+ int ret, i;
+
+ /*
+ * For commits that allow modesets drivers can add other CRTCs to the
+ * atomic commit, e.g. when they need to reallocate global resources.
+ *
+ * But when userspace also requests a nonblocking commit then userspace
+ * cannot know that the commit affects other CRTCs, which can result in
+ * spurious EBUSY failures. Until we have better uapi plug this by
+ * demoting such commits to blocking mode.
+ */
+ for_each_new_crtc_in_state(state, crtc, crtc_state, i)
+ requested_crtc |= drm_crtc_mask(crtc);
ret = drm_atomic_check_only(state);
if (ret)
return ret;
- DRM_DEBUG_ATOMIC("committing %p nonblocking\n", state);
+ for_each_new_crtc_in_state(state, crtc, crtc_state, i)
+ affected_crtc |= drm_crtc_mask(crtc);
+
+ if (affected_crtc != requested_crtc) {
+ /* adding other CRTC is only allowed for modeset commits */
+ WARN_ON(state->allow_modeset);
+
+ DRM_DEBUG_ATOMIC("demoting %p to blocking mode to avoid EBUSY\n", state);
+ nonblocking = false;
+ } else {
+ DRM_DEBUG_ATOMIC("committing %p nonblocking\n", state);
+ }
- return config->funcs->atomic_commit(state->dev, state, true);
+ return config->funcs->atomic_commit(state->dev, state, nonblocking);
}
EXPORT_SYMBOL(drm_atomic_nonblocking_commit);
--
2.18.0
The premature free in the error path is blocked by V4L
refcounting, not USB refcounting. Thanks to
Ben Hutchings for review.
[v2] corrected attributions
Signed-off-by: Oliver Neukum <oneukum(a)suse.com>
Fixes: 50e704453553 ("media: usbtv: prevent double free in error case")
CC: stable(a)vger.kernel.org
Reported-by: Ben Hutchings <ben.hutchings(a)codethink.co.uk>
---
drivers/media/usb/usbtv/usbtv-core.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/media/usb/usbtv/usbtv-core.c b/drivers/media/usb/usbtv/usbtv-core.c
index 5095c380b2c1..4a03c4d66314 100644
--- a/drivers/media/usb/usbtv/usbtv-core.c
+++ b/drivers/media/usb/usbtv/usbtv-core.c
@@ -113,7 +113,8 @@ static int usbtv_probe(struct usb_interface *intf,
usbtv_audio_fail:
/* we must not free at this point */
- usb_get_dev(usbtv->udev);
+ v4l2_device_get(&usbtv->v4l2_dev);
+ /* this will undo the v4l2_device_get() */
usbtv_video_free(usbtv);
usbtv_video_fail:
--
2.13.6