Hello,
commit 310ca162d77 "block/loop: Use global lock for ioctl() operation." has
been pushed to multiple stable trees. This patch is a part of larger series
that overhauls the locking inside loopback device upstream and for 4.4,
4.9, and 4.14 stable trees only this patch from the series is applied. Our
testing now has shown [1] that the patch alone makes present deadlocks
inside loopback driver more likely (the openqa test in our infrastructure
didn't hit the deadlock before whereas with the new kernel it hits it
reliably every time). So I would suggest we revert 310ca162d77 from 4.4,
4.9, and 4.14 kernels.
Another option would be to backport other locking fixes for the loop
device but honestly I don't think that's a stable material - never heard
of real users hitting problems, only syzkaller could, and we are still
fixing up some small glitches resulting from that rework...
Honza
[1] https://bugzilla.suse.com/show_bug.cgi?id=1129739
--
Jan Kara <jack(a)suse.com>
SUSE Labs, CR
The frequency calculation was based on the current(max) frequency of the
CPU. However for low frequency, the value used was already the parent
frequency divided by a factor of 2.
Instead of using this frequency, this fix directly get the frequency from
the parent clock.
Fixes: 92ce45fb875d ("cpufreq: Add DVFS support for Armada 37xx")
Cc: <stable(a)vger.kernel.org>
Reported-by: Christian Neubert <christian.neubert.86(a)gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement(a)bootlin.com>
---
drivers/cpufreq/armada-37xx-cpufreq.c | 22 +++++++++++++++++++---
1 file changed, 19 insertions(+), 3 deletions(-)
diff --git a/drivers/cpufreq/armada-37xx-cpufreq.c b/drivers/cpufreq/armada-37xx-cpufreq.c
index ad4463e4266e..a0962463805e 100644
--- a/drivers/cpufreq/armada-37xx-cpufreq.c
+++ b/drivers/cpufreq/armada-37xx-cpufreq.c
@@ -373,11 +373,11 @@ static int __init armada37xx_cpufreq_driver_init(void)
struct armada_37xx_dvfs *dvfs;
struct platform_device *pdev;
unsigned long freq;
- unsigned int cur_frequency;
+ unsigned int cur_frequency, base_frequency;
struct regmap *nb_pm_base, *avs_base;
struct device *cpu_dev;
int load_lvl, ret;
- struct clk *clk;
+ struct clk *clk, *parent;
nb_pm_base =
syscon_regmap_lookup_by_compatible("marvell,armada-3700-nb-pm");
@@ -413,6 +413,22 @@ static int __init armada37xx_cpufreq_driver_init(void)
return PTR_ERR(clk);
}
+ parent = clk_get_parent(clk);
+ if (IS_ERR(parent)) {
+ dev_err(cpu_dev, "Cannot get parent clock for CPU0\n");
+ clk_put(clk);
+ return PTR_ERR(parent);
+ }
+
+ /* Get parent CPU frequency */
+ base_frequency = clk_get_rate(parent);
+
+ if (!base_frequency) {
+ dev_err(cpu_dev, "Failed to get parent clock rate for CPU\n");
+ clk_put(clk);
+ return -EINVAL;
+ }
+
/* Get nominal (current) CPU frequency */
cur_frequency = clk_get_rate(clk);
if (!cur_frequency) {
@@ -445,7 +461,7 @@ static int __init armada37xx_cpufreq_driver_init(void)
for (load_lvl = ARMADA_37XX_DVFS_LOAD_0; load_lvl < LOAD_LEVEL_NR;
load_lvl++) {
unsigned long u_volt = avs_map[dvfs->avs[load_lvl]] * 1000;
- freq = cur_frequency / dvfs->divider[load_lvl];
+ freq = base_frequency / dvfs->divider[load_lvl];
ret = dev_pm_opp_add(cpu_dev, freq, u_volt);
if (ret)
goto remove_opp;
--
2.20.1
The patch titled
Subject: mm: add /sys/kernel/slab/cache/cache_dma32
has been added to the -mm tree. Its filename is
mm-add-sys-kernel-slab-cache-cache_dma32.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-add-sys-kernel-slab-cache-cache…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-add-sys-kernel-slab-cache-cache…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Nicolas Boichat <drinkcat(a)chromium.org>
Subject: mm: add /sys/kernel/slab/cache/cache_dma32
A previous patch in this series adds support for SLAB_CACHE_DMA32 kmem
caches. This adds the corresponding /sys/kernel/slab/cache/cache_dma32
entries, and fixes slabinfo tool.
Link: http://lkml.kernel.org/r/20181210011504.122604-4-drinkcat@chromium.org
Signed-off-by: Nicolas Boichat <drinkcat(a)chromium.org>
Cc: Christoph Hellwig <hch(a)infradead.org>
Cc: Christoph Lameter <cl(a)linux.com>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Hsin-Yi Wang <hsinyi(a)chromium.org>
Cc: Huaisheng Ye <yehs1(a)lenovo.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: Matthias Brugger <matthias.bgg(a)gmail.com>
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Mike Rapoport <rppt(a)linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg(a)kernel.org>
Cc: Robin Murphy <robin.murphy(a)arm.com>
Cc: Sasha Levin <Alexander.Levin(a)microsoft.com>
Cc: Tomasz Figa <tfiga(a)google.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Yingjoe Chen <yingjoe.chen(a)mediatek.com>
Cc: Yong Wu <yong.wu(a)mediatek.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
Documentation/ABI/testing/sysfs-kernel-slab | 9 +++++++++
mm/slub.c | 11 +++++++++++
tools/vm/slabinfo.c | 7 ++++++-
3 files changed, 26 insertions(+), 1 deletion(-)
--- a/Documentation/ABI/testing/sysfs-kernel-slab~mm-add-sys-kernel-slab-cache-cache_dma32
+++ a/Documentation/ABI/testing/sysfs-kernel-slab
@@ -106,6 +106,15 @@ Description:
are from ZONE_DMA.
Available when CONFIG_ZONE_DMA is enabled.
+What: /sys/kernel/slab/cache/cache_dma32
+Date: December 2018
+KernelVersion: 4.21
+Contact: Nicolas Boichat <drinkcat(a)chromium.org>
+Description:
+ The cache_dma32 file is read-only and specifies whether objects
+ are from ZONE_DMA32.
+ Available when CONFIG_ZONE_DMA32 is enabled.
+
What: /sys/kernel/slab/cache/cpu_slabs
Date: May 2007
KernelVersion: 2.6.22
--- a/mm/slub.c~mm-add-sys-kernel-slab-cache-cache_dma32
+++ a/mm/slub.c
@@ -5112,6 +5112,14 @@ static ssize_t cache_dma_show(struct kme
SLAB_ATTR_RO(cache_dma);
#endif
+#ifdef CONFIG_ZONE_DMA32
+static ssize_t cache_dma32_show(struct kmem_cache *s, char *buf)
+{
+ return sprintf(buf, "%d\n", !!(s->flags & SLAB_CACHE_DMA32));
+}
+SLAB_ATTR_RO(cache_dma32);
+#endif
+
static ssize_t usersize_show(struct kmem_cache *s, char *buf)
{
return sprintf(buf, "%u\n", s->usersize);
@@ -5452,6 +5460,9 @@ static struct attribute *slab_attrs[] =
#ifdef CONFIG_ZONE_DMA
&cache_dma_attr.attr,
#endif
+#ifdef CONFIG_ZONE_DMA32
+ &cache_dma32_attr.attr,
+#endif
#ifdef CONFIG_NUMA
&remote_node_defrag_ratio_attr.attr,
#endif
--- a/tools/vm/slabinfo.c~mm-add-sys-kernel-slab-cache-cache_dma32
+++ a/tools/vm/slabinfo.c
@@ -29,7 +29,7 @@ struct slabinfo {
char *name;
int alias;
int refs;
- int aliases, align, cache_dma, cpu_slabs, destroy_by_rcu;
+ int aliases, align, cache_dma, cache_dma32, cpu_slabs, destroy_by_rcu;
unsigned int hwcache_align, object_size, objs_per_slab;
unsigned int sanity_checks, slab_size, store_user, trace;
int order, poison, reclaim_account, red_zone;
@@ -534,6 +534,8 @@ static void report(struct slabinfo *s)
printf("** Hardware cacheline aligned\n");
if (s->cache_dma)
printf("** Memory is allocated in a special DMA zone\n");
+ if (s->cache_dma32)
+ printf("** Memory is allocated in a special DMA32 zone\n");
if (s->destroy_by_rcu)
printf("** Slabs are destroyed via RCU\n");
if (s->reclaim_account)
@@ -602,6 +604,8 @@ static void slabcache(struct slabinfo *s
*p++ = '*';
if (s->cache_dma)
*p++ = 'd';
+ if (s->cache_dma32)
+ *p++ = 'D';
if (s->hwcache_align)
*p++ = 'A';
if (s->poison)
@@ -1208,6 +1212,7 @@ static void read_slab_dir(void)
slab->aliases = get_obj("aliases");
slab->align = get_obj("align");
slab->cache_dma = get_obj("cache_dma");
+ slab->cache_dma32 = get_obj("cache_dma32");
slab->cpu_slabs = get_obj("cpu_slabs");
slab->destroy_by_rcu = get_obj("destroy_by_rcu");
slab->hwcache_align = get_obj("hwcache_align");
_
Patches currently in -mm which might be from drinkcat(a)chromium.org are
mm-add-support-for-kmem-caches-in-dma32-zone.patch
iommu-io-pgtable-arm-v7s-request-dma32-memory-and-improve-debugging.patch
mm-add-sys-kernel-slab-cache-cache_dma32.patch
On a very specific subset of ThinkPad P50 SKUs, particularly ones that
come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
seems to have a very nasty habit of not always resetting the secondary
Nvidia GPU between full reboots if the laptop is configured in Hybrid
Graphics mode. The reason for this happening is unknown, but the
following steps and possibly a good bit of patience will reproduce the
issue:
1. Boot up the laptop normally in Hybrid graphics mode
2. Make sure nouveau is loaded and that the GPU is awake
2. Allow the nvidia GPU to runtime suspend itself after being idle
3. Reboot the machine, the more sudden the better (e.g sysrq-b may help)
4. If nouveau loads up properly, reboot the machine again and go back to
step 2 until you reproduce the issue
This results in some very strange behavior: the GPU will
quite literally be left in exactly the same state it was in when the
previously booted kernel started the reboot. This has all sorts of bad
sideaffects: for starters, this completely breaks nouveau starting with a
mysterious EVO channel failure that happens well before we've actually
used the EVO channel for anything:
nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000
00000002
Later on, this causes us to timeout trying to bring up the GR ctx:
------------[ cut here ]------------
nouveau 0000:01:00.0: timeout
WARNING: CPU: 0 PID: 12 at
drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547
gf100_grctx_generate+0x7b2/0x850 [nouveau]
Modules linked in: nouveau mxm_wmi i915 crc32c_intel ttm i2c_algo_bit
serio_raw drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
xhci_pci drm xhci_hcd i2c_core wmi video
CPU: 0 PID: 12 Comm: kworker/0:1 Not tainted 5.0.0-rc5Lyude-Test+ #29
Hardware name: LENOVO 20EQS64N0B/20EQS64N0B, BIOS N1EET82W (1.55 )
12/18/2018
Workqueue: events_long drm_dp_mst_link_probe_work [drm_kms_helper]
RIP: 0010:gf100_grctx_generate+0x7b2/0x850 [nouveau]
Code: 85 d2 75 04 48 8b 57 10 48 89 95 28 ff ff ff e8 b4 37 0e e1 48 8b
95 28 ff ff ff 48 c7 c7 b1 97 57 a0 48 89 c6 e8 5a 38 c0 e0 <0f> 0b e9
b9 fd ff ff 48 8b 85 60 ff ff ff 48 8b 40 10 48 8b 78 10
RSP: 0018:ffffc900000b77f0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff888871af8000 RCX: 0000000000000000
RDX: ffff88887f41dfe0 RSI: ffff88887f415698 RDI: ffff88887f415698
RBP: ffffc900000b78c8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff888872118000
R13: 0000000000000000 R14: ffffffffa0551420 R15: ffffc900000b7818
FS: 0000000000000000(0000) GS:ffff88887f400000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005644d0556ca8 CR3: 0000000002214006 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
gf100_gr_init_ctxctl+0x27b/0x2d0 [nouveau]
gf100_gr_init+0x5bd/0x5e0 [nouveau]
gf100_gr_init_+0x61/0x70 [nouveau]
nvkm_gr_init+0x1d/0x20 [nouveau]
nvkm_engine_init+0xcb/0x210 [nouveau]
nvkm_subdev_init+0xd6/0x230 [nouveau]
nvkm_engine_ref.part.0+0x52/0x70 [nouveau]
nvkm_engine_ref+0x13/0x20 [nouveau]
nvkm_ioctl_new+0x12c/0x260 [nouveau]
? nvkm_fifo_chan_child_del+0xa0/0xa0 [nouveau]
? gf100_gr_dtor+0xe0/0xe0 [nouveau]
nvkm_ioctl+0xe2/0x180 [nouveau]
nvkm_client_ioctl+0x12/0x20 [nouveau]
nvif_object_ioctl+0x47/0x50 [nouveau]
nvif_object_init+0xc8/0x120 [nouveau]
nvc0_fbcon_accel_init+0x5c/0x960 [nouveau]
nouveau_fbcon_create+0x5a5/0x5d0 [nouveau]
? drm_setup_crtcs+0x27b/0xcb0 [drm_kms_helper]
? __lock_is_held+0x5e/0xa0
__drm_fb_helper_initial_config_and_unlock+0x27c/0x520 [drm_kms_helper]
drm_fb_helper_hotplug_event.part.29+0xae/0xc0 [drm_kms_helper]
drm_fb_helper_hotplug_event+0x1c/0x30 [drm_kms_helper]
nouveau_fbcon_output_poll_changed+0xb8/0x110 [nouveau]
drm_kms_helper_hotplug_event+0x2a/0x40 [drm_kms_helper]
drm_dp_send_link_address+0x176/0x1c0 [drm_kms_helper]
drm_dp_check_and_send_link_address+0xa0/0xb0 [drm_kms_helper]
drm_dp_mst_link_probe_work+0xa4/0xc0 [drm_kms_helper]
process_one_work+0x22f/0x5c0
worker_thread+0x44/0x3a0
kthread+0x12b/0x150
? wq_pool_ids_show+0x140/0x140
? kthread_create_on_node+0x60/0x60
ret_from_fork+0x3a/0x50
irq event stamp: 22490
hardirqs last enabled at (22489): [<ffffffff8113281d>]
console_unlock+0x44d/0x5f0
hardirqs last disabled at (22490): [<ffffffff81001c03>]
trace_hardirqs_off_thunk+0x1a/0x1c
softirqs last enabled at (22486): [<ffffffff81c00330>]
__do_softirq+0x330/0x44d
softirqs last disabled at (22479): [<ffffffff810c3105>]
irq_exit+0xe5/0xf0
WARNING: CPU: 0 PID: 12 at
drivers/gpu/drm/nouveau/nvkm/engine/gr/ctxgf100.c:1547
gf100_grctx_generate+0x7b2/0x850 [nouveau]
---[ end trace bf0976ed88b122a8 ]---
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: gr: wait for idle timeout (en: 1, ctxsw: 0, busy: 1)
nouveau 0000:01:00.0: fifo: fault 01 [WRITE] at 0000000000008000 engine
00 [GR] client 15 [HUB/SCC_NB] reason c4 [] on channel -1 [0000000000
unknown]
>From which the GPU never manages to recover. Booting without nouveau
loading causes issues as well, since the GPU starts sending spurious
interrupts that cause other device's IRQs to get disabled by the kernel:
irq 16: nobody cared (try booting with the "irqpoll" option)
…
handlers:
[<000000007faa9e99>] i801_isr [i2c_i801]
Disabling IRQ #16
…
serio: RMI4 PS/2 pass-through port at rmi4-00.fn03
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_f03 rmi4-00.fn03: rmi_f03_pt_write: Failed to write to F03 TX
register (-110).
i801_smbus 0000:00:1f.4: Timeout waiting for interrupt!
i801_smbus 0000:00:1f.4: Transaction timeout
rmi4_physical rmi4-00: rmi_driver_set_irq_bits: Failed to change enabled
interrupts!
Which in turn causes the touchpad and sometimes even other things to get
disabled.
Since the GPU staying on causes problems even without nouveau's
intervention, we can't fix this problem from nouveau itself. We have to
fix it as early as possible in the boot sequence in order to make sure
that the GPU is in a clean state before it has a chance to spam us with
interrupts and break things.
So to do this, we add a new pci quirk using
DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
at boot finishes. From there, we check to make sure that this is indeed
the specific P50 variant of this GPU. We also make sure that the GPU PCI
device is advertising NoReset- in order to prevent us from trying to
reset the GPU when the machine is in Dedicated graphics mode (where the
GPU being initialized by the BIOS is normal and expected). Finally, we
try mapping the MMIO space for the GPU which should only work if the GPU
is actually active in D0 mode. We can then read the magic 0x2240c
register on the GPU, which will have bit 1 set if the GPU's firmware has
already been posted during a previous boot. Once we've confirmed all of
this, we reset the PCI device and re-disable it - bringing the GPU back
into a healthy state.
Signed-off-by: Lyude Paul <lyude(a)redhat.com>
Cc: nouveau(a)lists.freedesktop.org
Cc: dri-devel(a)lists.freedesktop.org
Cc: Karol Herbst <kherbst(a)redhat.com>
Cc: Ben Skeggs <skeggsb(a)gmail.com>
Cc: stable(a)vger.kernel.org
---
drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 65 insertions(+)
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b0a413f3f7ca..948492fda8bf 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573); /* PFXI 48XG3 */
SWITCHTEC_QUIRK(0x8574); /* PFXI 64XG3 */
SWITCHTEC_QUIRK(0x8575); /* PFXI 80XG3 */
SWITCHTEC_QUIRK(0x8576); /* PFXI 96XG3 */
+
+/*
+ * On certain Lenovo Thinkpad P50 SKUs, specifically those with a Nvidia
+ * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
+ * the nvidia GPU between reboots if the system is configured to use hybrid
+ * graphics mode. This results in the GPU being left in whatever state it was
+ * in during the previous boot which causes spurious interrupts from the GPU,
+ * which in turn cause us to disable the wrong IRQs and end up breaking the
+ * touchpad. Unsurprisingly, this also completely breaks nouveau.
+ *
+ * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
+ * manages to bring the GPU back into a clean state and fix all of these
+ * issues. Additionally since the GPU will report NoReset+ when the machine is
+ * configured in Dedicated display mode, we don't need to worry about
+ * accidentally resetting the GPU when it's supposed to already be
+ * initialized.
+ */
+static void
+quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
+{
+ void __iomem *map;
+ int ret;
+
+ if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
+ pdev->subsystem_device != 0x222e ||
+ !pdev->reset_fn)
+ return;
+
+ /*
+ * If we can't enable the device's mmio space, it's probably not even
+ * initialized. This is fine, and means we can just skip the quirk
+ * entirely.
+ */
+ if (pci_enable_device_mem(pdev)) {
+ pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
+ return;
+ }
+
+ /* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
+ map = ioremap(pci_resource_start(pdev, 0), 0x102000);
+ if (!map) {
+ pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
+ goto out_disable;
+ }
+
+ /*
+ * Be extra careful, and make sure that the GPU firmware is posted
+ * before trying a reset
+ */
+ if (ioread32(map + 0x2240c) & 0x2) {
+ pci_info(pdev,
+ FW_BUG "GPU left initialized by EFI, resetting\n");
+ ret = pci_reset_function(pdev);
+ if (ret < 0)
+ pci_err(pdev, "Failed to reset GPU: %d\n", ret);
+ }
+
+ iounmap(map);
+out_disable:
+ pci_disable_device(pdev);
+}
+
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
+ PCI_CLASS_DISPLAY_VGA, 8,
+ quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
--
2.20.1
FUSE filesystem server and kernel client negotiate during initialization
phase, what should be the maximum write size the client will ever issue.
Correspondingly the filesystem server then queues sys_read calls to read
requests with buffer capacity large enough to carry request header
+ that max_write bytes. A filesystem server is free to set its max_write
in anywhere in the range between [1·page, fc->max_pages·page]. In
particular go-fuse[2] sets max_write by default as 64K, wheres default
fc->max_pages corresponds to 128K. Libfuse also allows users to
configure max_write, but by default presets it to possible maximum.
If max_write is < fc->max_pages·page, and in NOTIFY_RETRIEVE handler we
allow to retrieve more than max_write bytes, corresponding prepared
NOTIFY_REPLY will be thrown away by fuse_dev_do_read, because the
filesystem server, in full correspondence with server/client contract,
will be only queuing sys_read with ~max_write buffer capacity, and
fuse_dev_do_read throws away requests that cannot fit into server
request buffer. In turn the filesystem server could get stuck waiting
indefinitely for NOTIFY_REPLY since NOTIFY_RETRIEVE handler returned OK
which is understood by clients as that NOTIFY_REPLY was queued and will
be sent back.
-> Cap requested size to negotiate max_write to avoid the problem.
This aligns with the way NOTIFY_RETRIEVE handler works, which already
unconditionally caps requested retrieve size to fuse_conn->max_pages.
This way it should not hurt NOTIFY_RETRIEVE semantic if we return less
data than was originally requested.
Please see [1] for context where the problem of stuck filesystem was hit
for real, how the situation was traced and for more involving patch that
did not make it into the tree.
[1] https://marc.info/?l=linux-fsdevel&m=155057023600853&w=2
[2] https://github.com/hanwen/go-fuse
Signed-off-by: Kirill Smelkov <kirr(a)nexedi.com>
Cc: Han-Wen Nienhuys <hanwen(a)google.com>
Cc: Jakob Unterwurzacher <jakobunt(a)gmail.com>
Cc: <stable(a)vger.kernel.org> # v2.6.36+
---
fs/fuse/dev.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/fuse/dev.c b/fs/fuse/dev.c
index 8a63e52785e9..38e94bc43053 100644
--- a/fs/fuse/dev.c
+++ b/fs/fuse/dev.c
@@ -1749,7 +1749,7 @@ static int fuse_retrieve(struct fuse_conn *fc, struct inode *inode,
offset = outarg->offset & ~PAGE_MASK;
file_size = i_size_read(inode);
- num = outarg->size;
+ num = min(outarg->size, fc->max_write);
if (outarg->offset > file_size)
num = 0;
else if (outarg->offset + num > file_size)
--
2.21.0.392.gf8f6787159
This is a partial backport of original Darrick's series "xfs: logging fixes" to kernel 4.14.
It fixes the in-memory metadata corruption error, which happens
when a partially initialized attribute buffer is attemped to be written to disk.
This issue is reproducible with kernel 4.14, when adding a 1-sec sleep in xfs_attr_set(),
between the call to xfs_attr_shortform_to_leaf() and the call to xfs_attr_leaf_addname().
Darrick J. Wong (2):
xfs: add the ability to join a held buffer to a defer_ops
xfs: hold xfs_buf locked between shortform->leaf conversion and the
addition of an attribute
fs/xfs/libxfs/xfs_attr.c | 20 +++++++++++++++-----
fs/xfs/libxfs/xfs_attr_leaf.c | 9 ++++++---
fs/xfs/libxfs/xfs_attr_leaf.h | 3 ++-
fs/xfs/libxfs/xfs_defer.c | 39 ++++++++++++++++++++++++++++++++++++---
fs/xfs/libxfs/xfs_defer.h | 5 ++++-
5 files changed, 63 insertions(+), 13 deletions(-)
--
1.9.1
As the comment notes, the return codes for TSYNC and NEW_LISTENER conflict,
because they both return positive values, one in the case of success and
one in the case of error. So, let's disallow both of these flags together.
While this is technically a userspace break, all the users I know of are
still waiting on me to land this feature in libseccomp, so I think it'll be
safe. Also, at present my use case doesn't require TSYNC at all, so this
isn't a big deal to disallow. If someone wanted to support this, a path
forward would be to add a new flag like
TSYNC_AND_LISTENER_YES_I_UNDERSTAND_THAT_TSYNC_WILL_JUST_RETURN_EAGAIN, but
the use cases are so different I don't see it really happening.
Finally, it's worth noting that this does actually fix a UAF issue: at the end
of seccomp_set_mode_filter(), we have:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
if (ret < 0) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
} else {
fd_install(listener, listener_f);
ret = listener;
}
}
out_free:
seccomp_filter_free(prepared);
But if ret > 0 because TSYNC raced, we'll install the listener fd and then free
the filter out from underneath it, causing a UAF when the task closes it or
dies. This patch also switches the condition to be simply if (ret), so that
if someone does add the flag mentioned above, they won't have to remember
to fix this too.
Signed-off-by: Tycho Andersen <tycho(a)tycho.ws>
Fixes: 6a21cc50f0c7 ("seccomp: add a return code to trap to userspace")
CC: stable(a)vger.kernel.org # v5.0+
---
kernel/seccomp.c | 17 +++++++++++++++--
1 file changed, 15 insertions(+), 2 deletions(-)
diff --git a/kernel/seccomp.c b/kernel/seccomp.c
index d0d355ded2f4..79bada51091b 100644
--- a/kernel/seccomp.c
+++ b/kernel/seccomp.c
@@ -500,7 +500,10 @@ seccomp_prepare_user_filter(const char __user *user_filter)
*
* Caller must be holding current->sighand->siglock lock.
*
- * Returns 0 on success, -ve on error.
+ * Returns 0 on success, -ve on error, or
+ * - in TSYNC mode: the pid of a thread which was either not in the correct
+ * seccomp mode or did not have an ancestral seccomp filter
+ * - in NEW_LISTENER mode: the fd of the new listener
*/
static long seccomp_attach_filter(unsigned int flags,
struct seccomp_filter *filter)
@@ -1256,6 +1259,16 @@ static long seccomp_set_mode_filter(unsigned int flags,
if (flags & ~SECCOMP_FILTER_FLAG_MASK)
return -EINVAL;
+ /*
+ * In the successful case, NEW_LISTENER returns the new listener fd.
+ * But in the failure case, TSYNC returns the thread that died. If you
+ * combine these two flags, there's no way to tell whether something
+ * succeded or failed. So, let's disallow this combination.
+ */
+ if ((flags & SECCOMP_FILTER_FLAG_TSYNC) &&
+ (flags && SECCOMP_FILTER_FLAG_NEW_LISTENER))
+ return -EINVAL;
+
/* Prepare the new filter before holding any locks. */
prepared = seccomp_prepare_user_filter(filter);
if (IS_ERR(prepared))
@@ -1302,7 +1315,7 @@ static long seccomp_set_mode_filter(unsigned int flags,
mutex_unlock(¤t->signal->cred_guard_mutex);
out_put_fd:
if (flags & SECCOMP_FILTER_FLAG_NEW_LISTENER) {
- if (ret < 0) {
+ if (ret) {
listener_f->private_data = NULL;
fput(listener_f);
put_unused_fd(listener);
--
2.19.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 442601e87a4769a8daba4976ec3afa5222ca211d Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Date: Fri, 8 Feb 2019 18:30:59 +0200
Subject: [PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is
incomplete
Return -E2BIG when the transfer is incomplete. The upper layer does
not retry, so not doing that is incorrect behaviour.
Cc: stable(a)vger.kernel.org
Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c
index 32a8e27c5382..cc4e642d3180 100644
--- a/drivers/char/tpm/tpm_i2c_atmel.c
+++ b/drivers/char/tpm/tpm_i2c_atmel.c
@@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len)
if (status < 0)
return status;
+ /* The upper layer does not support incomplete sends. */
+ if (status != len)
+ return -E2BIG;
+
return 0;
}
The patch below does not apply to the 5.0-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 442601e87a4769a8daba4976ec3afa5222ca211d Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Date: Fri, 8 Feb 2019 18:30:59 +0200
Subject: [PATCH] tpm/tpm_i2c_atmel: Return -E2BIG when the transfer is
incomplete
Return -E2BIG when the transfer is incomplete. The upper layer does
not retry, so not doing that is incorrect behaviour.
Cc: stable(a)vger.kernel.org
Fixes: a2871c62e186 ("tpm: Add support for Atmel I2C TPMs")
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen(a)linux.intel.com>
Reviewed-by: Stefan Berger <stefanb(a)linux.ibm.com>
Reviewed-by: Jerry Snitselaar <jsnitsel(a)redhat.com>
diff --git a/drivers/char/tpm/tpm_i2c_atmel.c b/drivers/char/tpm/tpm_i2c_atmel.c
index 32a8e27c5382..cc4e642d3180 100644
--- a/drivers/char/tpm/tpm_i2c_atmel.c
+++ b/drivers/char/tpm/tpm_i2c_atmel.c
@@ -69,6 +69,10 @@ static int i2c_atmel_send(struct tpm_chip *chip, u8 *buf, size_t len)
if (status < 0)
return status;
+ /* The upper layer does not support incomplete sends. */
+ if (status != len)
+ return -E2BIG;
+
return 0;
}