- Linux-stable-mirror - lists.linaro.org

[v4.14.y PATCH] errseq: Always report a writeback error once

by Jeff Layton

From: Matthew Wilcox <willy(a)infradead.org> The errseq_t infrastructure assumes that errors which occurred before the file descriptor was opened are of no interest to the application. This turns out to be a regression for some applications, notably Postgres. Before errseq_t, a writeback error would be reported exactly once (as long as the inode remained in memory), so Postgres could open a file, call fsync() and find out whether there had been a writeback error on that file from another process. This patch changes the errseq infrastructure to report errors to all file descriptors which are opened after the error occurred, but before it was reported to any file descriptor. This restores the user-visible behaviour. [ jlayton: fix up conflicts in comments ] Cc: stable(a)vger.kernel.org Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting") Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com> Reviewed-by: Jeff Layton <jlayton(a)kernel.org> Signed-off-by: Jeff Layton <jlayton(a)redhat.com> (cherry picked from commit b4678df184b314a2bd47d2329feca2c2534aa12b) --- lib/errseq.c | 25 +++++++++++-------------- 1 file changed, 11 insertions(+), 14 deletions(-) This is a backport to the v4.14 stable series. The only merge conflict was due to an earlier patch by Willy to flesh out the comments. There were no code changes necessary. diff --git a/lib/errseq.c b/lib/errseq.c index 79cc66897db4..b6ed81ec788d 100644 --- a/lib/errseq.c +++ b/lib/errseq.c @@ -111,25 +111,22 @@ EXPORT_SYMBOL(errseq_set); * errseq_sample - grab current errseq_t value * @eseq: pointer to errseq_t to be sampled * - * This function allows callers to sample an errseq_t value, marking it as - * "seen" if required. + * This function allows callers to initialise their errseq_t variable. + * If the error has been "seen", new callers will not see an old error. + * If there is an unseen error in @eseq, the caller of this function will + * see it the next time it checks for an error. + * + * Context: Any context. + * Return: The current errseq value. */ errseq_t errseq_sample(errseq_t *eseq) { errseq_t old = READ_ONCE(*eseq); - errseq_t new = old; - /* - * For the common case of no errors ever having been set, we can skip - * marking the SEEN bit. Once an error has been set, the value will - * never go back to zero. - */ - if (old != 0) { - new |= ERRSEQ_SEEN; - if (old != new) - cmpxchg(eseq, old, new); - } - return new; + /* If nobody has seen this error yet, then we can be the first. */ + if (!(old & ERRSEQ_SEEN)) + old = 0; + return old; } EXPORT_SYMBOL(errseq_sample); -- 2.17.0

7 years, 2 months

3
2
0 0

[PATCH 0/6] virtio-console: spec compliance fixes

by Michael S. Tsirkin

Turns out virtio console tries to take a buffer out of an active vq. Works by sheer luck, and is explicitly forbidden by spec. And while going over it I saw that error handling is also broken - failure is easy to trigger if I force allocations to fail. Lightly tested. Michael S. Tsirkin (6): virtio_console: don't tie bufs to a vq virtio: add ability to iterate over vqs virtio_console: free buffers after reset virtio_console: drop custom control queue cleanup virtio_console: move removal code virtio_console: reset on out of memory drivers/char/virtio_console.c | 155 ++++++++++++++++++++---------------------- include/linux/virtio.h | 3 + 2 files changed, 75 insertions(+), 83 deletions(-) -- MST

7 years, 2 months

5
18
0 0

[PATCH 2/4] iio: imu: inv_mpu6050: fix possible deadlock between mutex and iio

by Jean-Baptiste Maneyrol

Detected by kernel circular locking dependency checker. We are locking iio mutex (iio_device_claim_direct_mode) after locking our internal mutex. But when the buffer starts, iio first locks its mutex and then we lock our internal one. To avoid possible deadlock, we need to use the same order everwhere. So we change the ordering by locking first iio mutex, then our internal mutex. Fixes: 68cd6e5b206b ("iio: imu: inv_mpu6050: fix lock issues by using our own mutex") Cc: stable(a)vger.kernel.org Signed-off-by: Jean-Baptiste Maneyrol <jmaneyrol(a)invensense.com> --- drivers/iio/imu/inv_mpu6050/inv_mpu_core.c | 34 ++++++++++++++---------------- 1 file changed, 16 insertions(+), 18 deletions(-) diff --git a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c index aafa777..7358935 100644 --- a/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c +++ b/drivers/iio/imu/inv_mpu6050/inv_mpu_core.c @@ -340,12 +340,9 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev, int result; int ret; - result = iio_device_claim_direct_mode(indio_dev); - if (result) - return result; result = inv_mpu6050_set_power_itg(st, true); if (result) - goto error_release; + return result; switch (chan->type) { case IIO_ANGL_VEL: @@ -386,14 +383,11 @@ static int inv_mpu6050_read_channel_data(struct iio_dev *indio_dev, result = inv_mpu6050_set_power_itg(st, false); if (result) goto error_power_off; - iio_device_release_direct_mode(indio_dev); return ret; error_power_off: inv_mpu6050_set_power_itg(st, false); -error_release: - iio_device_release_direct_mode(indio_dev); return result; } @@ -407,9 +401,13 @@ inv_mpu6050_read_raw(struct iio_dev *indio_dev, switch (mask) { case IIO_CHAN_INFO_RAW: + ret = iio_device_claim_direct_mode(indio_dev); + if (ret) + return ret; mutex_lock(&st->lock); ret = inv_mpu6050_read_channel_data(indio_dev, chan, val); mutex_unlock(&st->lock); + iio_device_release_direct_mode(indio_dev); return ret; case IIO_CHAN_INFO_SCALE: switch (chan->type) { @@ -532,17 +530,18 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev, struct inv_mpu6050_state *st = iio_priv(indio_dev); int result; - mutex_lock(&st->lock); /* * we should only update scale when the chip is disabled, i.e. * not running */ result = iio_device_claim_direct_mode(indio_dev); if (result) - goto error_write_raw_unlock; + return result; + + mutex_lock(&st->lock); result = inv_mpu6050_set_power_itg(st, true); if (result) - goto error_write_raw_release; + goto error_write_raw_unlock; switch (mask) { case IIO_CHAN_INFO_SCALE: @@ -581,10 +580,9 @@ static int inv_mpu6050_write_raw(struct iio_dev *indio_dev, } result |= inv_mpu6050_set_power_itg(st, false); -error_write_raw_release: - iio_device_release_direct_mode(indio_dev); error_write_raw_unlock: mutex_unlock(&st->lock); + iio_device_release_direct_mode(indio_dev); return result; } @@ -643,17 +641,18 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr, fifo_rate > INV_MPU6050_MAX_FIFO_RATE) return -EINVAL; + result = iio_device_claim_direct_mode(indio_dev); + if (result) + return result; + mutex_lock(&st->lock); if (fifo_rate == st->chip_config.fifo_rate) { result = 0; goto fifo_rate_fail_unlock; } - result = iio_device_claim_direct_mode(indio_dev); - if (result) - goto fifo_rate_fail_unlock; result = inv_mpu6050_set_power_itg(st, true); if (result) - goto fifo_rate_fail_release; + goto fifo_rate_fail_unlock; d = INV_MPU6050_ONE_K_HZ / fifo_rate - 1; result = regmap_write(st->map, st->reg->sample_rate_div, d); @@ -667,10 +666,9 @@ inv_mpu6050_fifo_rate_store(struct device *dev, struct device_attribute *attr, fifo_rate_fail_power_off: result |= inv_mpu6050_set_power_itg(st, false); -fifo_rate_fail_release: - iio_device_release_direct_mode(indio_dev); fifo_rate_fail_unlock: mutex_unlock(&st->lock); + iio_device_release_direct_mode(indio_dev); if (result) return result; -- 2.7.4

7 years, 2 months

2
1
0 0

FAILED: patch "[PATCH] cpufreq / CPPC: Set platform specific transition_delay_us" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From d4f3388afd488ed15368fa7413b8bd6d1f98bb1d Mon Sep 17 00:00:00 2001 From: Prashanth Prakash <pprakash(a)codeaurora.org> Date: Fri, 27 Apr 2018 11:35:27 -0600 Subject: [PATCH] cpufreq / CPPC: Set platform specific transition_delay_us Add support to specify platform specific transition_delay_us instead of using the transition delay derived from PCC. With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) we are setting transition_delay_us directly and not applying the LATENCY_MULTIPLIER. Because of that, on Qualcomm Centriq we can end up with a very high rate of frequency change requests when using the schedutil governor (default rate_limit_us=10 compared to an earlier value of 10000). The PCC subspace describes the rate at which the platform can accept commands on the CPPC's PCC channel. This includes read and write command on the PCC channel that can be used for reasons other than frequency transitions. Moreover the same PCC subspace can be used by multiple freq domains and deriving transition_delay_us from it as we do now can be sub-optimal. Moreover if a platform does not use PCC for desired_perf register then there is no way to compute the transition latency or the delay_us. CPPC does not have a standard defined mechanism to get the transition rate or the latency at the moment. Given the above limitations, it is simpler to have a platform specific transition_delay_us and rely on PCC derived value only if a platform specific value is not available. Signed-off-by: Prashanth Prakash <pprakash(a)codeaurora.org> Cc: 4.14+ <stable(a)vger.kernel.org> # 4.14+ Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index bc5fc1630876..b15115a48775 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -126,6 +126,49 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy) cpu->perf_caps.lowest_perf, cpu_num, ret); } +/* + * The PCC subspace describes the rate at which platform can accept commands + * on the shared PCC channel (including READs which do not count towards freq + * trasition requests), so ideally we need to use the PCC values as a fallback + * if we don't have a platform specific transition_delay_us + */ +#ifdef CONFIG_ARM64 +#include <asm/cputype.h> + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + unsigned long implementor = read_cpuid_implementor(); + unsigned long part_num = read_cpuid_part_number(); + unsigned int delay_us = 0; + + switch (implementor) { + case ARM_CPU_IMP_QCOM: + switch (part_num) { + case QCOM_CPU_PART_FALKOR_V1: + case QCOM_CPU_PART_FALKOR: + delay_us = 10000; + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + + return delay_us; +} + +#else + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; +} +#endif + static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) { struct cppc_cpudata *cpu; @@ -162,8 +205,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) cpu->perf_caps.highest_perf; policy->cpuinfo.max_freq = cppc_dmi_max_khz; - policy->transition_delay_us = cppc_get_transition_latency(cpu_num) / - NSEC_PER_USEC; + policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num); policy->shared_type = cpu->shared_type; if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] cpufreq / CPPC: Set platform specific transition_delay_us" failed to apply to 4.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From d4f3388afd488ed15368fa7413b8bd6d1f98bb1d Mon Sep 17 00:00:00 2001 From: Prashanth Prakash <pprakash(a)codeaurora.org> Date: Fri, 27 Apr 2018 11:35:27 -0600 Subject: [PATCH] cpufreq / CPPC: Set platform specific transition_delay_us Add support to specify platform specific transition_delay_us instead of using the transition delay derived from PCC. With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) we are setting transition_delay_us directly and not applying the LATENCY_MULTIPLIER. Because of that, on Qualcomm Centriq we can end up with a very high rate of frequency change requests when using the schedutil governor (default rate_limit_us=10 compared to an earlier value of 10000). The PCC subspace describes the rate at which the platform can accept commands on the CPPC's PCC channel. This includes read and write command on the PCC channel that can be used for reasons other than frequency transitions. Moreover the same PCC subspace can be used by multiple freq domains and deriving transition_delay_us from it as we do now can be sub-optimal. Moreover if a platform does not use PCC for desired_perf register then there is no way to compute the transition latency or the delay_us. CPPC does not have a standard defined mechanism to get the transition rate or the latency at the moment. Given the above limitations, it is simpler to have a platform specific transition_delay_us and rely on PCC derived value only if a platform specific value is not available. Signed-off-by: Prashanth Prakash <pprakash(a)codeaurora.org> Cc: 4.14+ <stable(a)vger.kernel.org> # 4.14+ Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency) Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com> diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c index bc5fc1630876..b15115a48775 100644 --- a/drivers/cpufreq/cppc_cpufreq.c +++ b/drivers/cpufreq/cppc_cpufreq.c @@ -126,6 +126,49 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy) cpu->perf_caps.lowest_perf, cpu_num, ret); } +/* + * The PCC subspace describes the rate at which platform can accept commands + * on the shared PCC channel (including READs which do not count towards freq + * trasition requests), so ideally we need to use the PCC values as a fallback + * if we don't have a platform specific transition_delay_us + */ +#ifdef CONFIG_ARM64 +#include <asm/cputype.h> + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + unsigned long implementor = read_cpuid_implementor(); + unsigned long part_num = read_cpuid_part_number(); + unsigned int delay_us = 0; + + switch (implementor) { + case ARM_CPU_IMP_QCOM: + switch (part_num) { + case QCOM_CPU_PART_FALKOR_V1: + case QCOM_CPU_PART_FALKOR: + delay_us = 10000; + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + break; + default: + delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC; + break; + } + + return delay_us; +} + +#else + +static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu) +{ + return cppc_get_transition_latency(cpu) / NSEC_PER_USEC; +} +#endif + static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) { struct cppc_cpudata *cpu; @@ -162,8 +205,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy) cpu->perf_caps.highest_perf; policy->cpuinfo.max_freq = cppc_dmi_max_khz; - policy->transition_delay_us = cppc_get_transition_latency(cpu_num) / - NSEC_PER_USEC; + policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num); policy->shared_type = cpu->shared_type; if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] IB/mlx5: Fix represent correct netdevice in dual port RoCE" failed to apply to 4.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 84a6a7a99c0ac2f67366288c0625c9fba176b264 Mon Sep 17 00:00:00 2001 From: Parav Pandit <parav(a)mellanox.com> Date: Mon, 23 Apr 2018 17:01:55 +0300 Subject: [PATCH] IB/mlx5: Fix represent correct netdevice in dual port RoCE In commit bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode") incorrectly mapped primary device's netdevice to 2nd port netdevice. It always represented primary port's netdevice for 2nd port netdevice when ib representors were not used. This results into failing to process CM request arriving on 2nd port due to incorrect mapping of netdevice. This fix corrects it by considering the right mdev. Cc: <stable(a)vger.kernel.org> # 4.16 Fixes: bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode") Reviewed-by: Mark Bloch <markb(a)mellanox.com> Signed-off-by: Parav Pandit <parav(a)mellanox.com> Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com> Signed-off-by: Doug Ledford <dledford(a)redhat.com> diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c index 6a749c02b14c..78a4b2797057 100644 --- a/drivers/infiniband/hw/mlx5/main.c +++ b/drivers/infiniband/hw/mlx5/main.c @@ -179,7 +179,7 @@ static int mlx5_netdev_event(struct notifier_block *this, if (rep_ndev == ndev) roce->netdev = (event == NETDEV_UNREGISTER) ? NULL : ndev; - } else if (ndev->dev.parent == &ibdev->mdev->pdev->dev) { + } else if (ndev->dev.parent == &mdev->pdev->dev) { roce->netdev = (event == NETDEV_UNREGISTER) ? NULL : ndev; }

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr" failed to apply to 4.9-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.9-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b4bd701ac469075d94ed9699a28755f2862252b9 Mon Sep 17 00:00:00 2001 From: Leon Romanovsky <leonro(a)mellanox.com> Date: Mon, 23 Apr 2018 17:01:52 +0300 Subject: [PATCH] RDMA/mlx5: Fix multiple NULL-ptr deref errors in rereg_mr flow Failure in rereg MR releases UMEM but leaves the MR to be destroyed by the user. As a result the following scenario may happen: "create MR -> rereg MR with failure -> call to rereg MR again" and hit "NULL-ptr deref or user memory access" errors. Ensure that rereg MR is only performed on a non-dead MR. Cc: syzkaller <syzkaller(a)googlegroups.com> Cc: <stable(a)vger.kernel.org> # 4.5 Fixes: 395a8e4c32ea ("IB/mlx5: Refactoring register MR code") Reported-by: Noa Osherovich <noaos(a)mellanox.com> Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com> Signed-off-by: Doug Ledford <dledford(a)redhat.com> diff --git a/drivers/infiniband/hw/mlx5/mr.c b/drivers/infiniband/hw/mlx5/mr.c index 1520a2f20f98..90a9c461cedc 100644 --- a/drivers/infiniband/hw/mlx5/mr.c +++ b/drivers/infiniband/hw/mlx5/mr.c @@ -866,25 +866,28 @@ static int mr_umem_get(struct ib_pd *pd, u64 start, u64 length, int *order) { struct mlx5_ib_dev *dev = to_mdev(pd->device); + struct ib_umem *u; int err; - *umem = ib_umem_get(pd->uobject->context, start, length, - access_flags, 0); - err = PTR_ERR_OR_ZERO(*umem); + *umem = NULL; + + u = ib_umem_get(pd->uobject->context, start, length, access_flags, 0); + err = PTR_ERR_OR_ZERO(u); if (err) { - *umem = NULL; - mlx5_ib_err(dev, "umem get failed (%d)\n", err); + mlx5_ib_dbg(dev, "umem get failed (%d)\n", err); return err; } - mlx5_ib_cont_pages(*umem, start, MLX5_MKEY_PAGE_SHIFT_MASK, npages, + mlx5_ib_cont_pages(u, start, MLX5_MKEY_PAGE_SHIFT_MASK, npages, page_shift, ncont, order); if (!*npages) { mlx5_ib_warn(dev, "avoid zero region\n"); - ib_umem_release(*umem); + ib_umem_release(u); return -EINVAL; } + *umem = u; + mlx5_ib_dbg(dev, "npages %d, ncont %d, order %d, page_shift %d\n", *npages, *ncont, *order, *page_shift); @@ -1458,13 +1461,12 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, int access_flags = flags & IB_MR_REREG_ACCESS ? new_access_flags : mr->access_flags; - u64 addr = (flags & IB_MR_REREG_TRANS) ? virt_addr : mr->umem->address; - u64 len = (flags & IB_MR_REREG_TRANS) ? length : mr->umem->length; int page_shift = 0; int upd_flags = 0; int npages = 0; int ncont = 0; int order = 0; + u64 addr, len; int err; mlx5_ib_dbg(dev, "start 0x%llx, virt_addr 0x%llx, length 0x%llx, access_flags 0x%x\n", @@ -1472,6 +1474,17 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, atomic_sub(mr->npages, &dev->mdev->priv.reg_pages); + if (!mr->umem) + return -EINVAL; + + if (flags & IB_MR_REREG_TRANS) { + addr = virt_addr; + len = length; + } else { + addr = mr->umem->address; + len = mr->umem->length; + } + if (flags != IB_MR_REREG_PD) { /* * Replace umem. This needs to be done whether or not UMR is @@ -1479,6 +1492,7 @@ int mlx5_ib_rereg_user_mr(struct ib_mr *ib_mr, int flags, u64 start, */ flags |= IB_MR_REREG_TRANS; ib_umem_release(mr->umem); + mr->umem = NULL; err = mr_umem_get(pd, addr, len, access_flags, &mr->umem, &npages, &page_shift, &ncont, &order); if (err)

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] tracing: Fix bad use of igrab in trace_uprobe.c" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 0c92c7a3c5d416f47b32c5f20a611dfeca5d5f2e Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving(a)fb.com> Date: Mon, 23 Apr 2018 10:21:34 -0700 Subject: [PATCH] tracing: Fix bad use of igrab in trace_uprobe.c As Miklos reported and suggested: This pattern repeats two times in trace_uprobe.c and in kernel/events/core.c as well: ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) goto fail_address_parse; inode = igrab(d_inode(path.dentry)); path_put(&path); And it's wrong. You can only hold a reference to the inode if you have an active ref to the superblock as well (which is normally through path.mnt) or holding s_umount. This way unmounting the containing filesystem while the tracepoint is active will give you the "VFS: Busy inodes after unmount..." message and a crash when the inode is finally put. Solution: store path instead of inode. This patch fixes two instances in trace_uprobe.c. struct path is added to struct trace_uprobe to keep the inode and containing mount point referenced. Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes") Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU") Cc: stable(a)vger.kernel.org Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Howard McLauchlan <hmclauchlan(a)fb.com> Cc: Josef Bacik <jbacik(a)fb.com> Cc: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com> Acked-by: Miklos Szeredi <mszeredi(a)redhat.com> Reported-by: Miklos Szeredi <miklos(a)szeredi.hu> Signed-off-by: Song Liu <songliubraving(a)fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 34fd0e0ec51d..ac892878dbe6 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -55,6 +55,7 @@ struct trace_uprobe { struct list_head list; struct trace_uprobe_filter filter; struct uprobe_consumer consumer; + struct path path; struct inode *inode; char *filename; unsigned long offset; @@ -289,7 +290,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu) for (i = 0; i < tu->tp.nr_args; i++) traceprobe_free_probe_arg(&tu->tp.args[i]); - iput(tu->inode); + path_put(&tu->path); kfree(tu->tp.call.class->system); kfree(tu->tp.call.name); kfree(tu->filename); @@ -363,7 +364,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu) static int create_trace_uprobe(int argc, char **argv) { struct trace_uprobe *tu; - struct inode *inode; char *arg, *event, *group, *filename; char buf[MAX_EVENT_NAME_LEN]; struct path path; @@ -371,7 +371,6 @@ static int create_trace_uprobe(int argc, char **argv) bool is_delete, is_return; int i, ret; - inode = NULL; ret = 0; is_delete = false; is_return = false; @@ -437,21 +436,16 @@ static int create_trace_uprobe(int argc, char **argv) } /* Find the last occurrence, in case the path contains ':' too. */ arg = strrchr(argv[1], ':'); - if (!arg) { - ret = -EINVAL; - goto fail_address_parse; - } + if (!arg) + return -EINVAL; *arg++ = '\0'; filename = argv[1]; ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) - goto fail_address_parse; - - inode = igrab(d_real_inode(path.dentry)); - path_put(&path); + return ret; - if (!inode || !S_ISREG(inode->i_mode)) { + if (!d_is_reg(path.dentry)) { ret = -EINVAL; goto fail_address_parse; } @@ -490,7 +484,7 @@ static int create_trace_uprobe(int argc, char **argv) goto fail_address_parse; } tu->offset = offset; - tu->inode = inode; + tu->path = path; tu->filename = kstrdup(filename, GFP_KERNEL); if (!tu->filename) { @@ -558,7 +552,7 @@ static int create_trace_uprobe(int argc, char **argv) return ret; fail_address_parse: - iput(inode); + path_put(&path); pr_info("Failed to parse address or file.\n"); @@ -922,6 +916,7 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file, goto err_flags; tu->consumer.filter = filter; + tu->inode = d_real_inode(tu->path.dentry); ret = uprobe_register(tu->inode, tu->offset, &tu->consumer); if (ret) goto err_buffer; @@ -967,6 +962,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file) WARN_ON(!uprobe_filter_is_empty(&tu->filter)); uprobe_unregister(tu->inode, tu->offset, &tu->consumer); + tu->inode = NULL; tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE; uprobe_buffer_disable(); @@ -1337,7 +1333,6 @@ struct trace_event_call * create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) { struct trace_uprobe *tu; - struct inode *inode; struct path path; int ret; @@ -1345,11 +1340,8 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) if (ret) return ERR_PTR(ret); - inode = igrab(d_inode(path.dentry)); - path_put(&path); - - if (!inode || !S_ISREG(inode->i_mode)) { - iput(inode); + if (!d_is_reg(path.dentry)) { + path_put(&path); return ERR_PTR(-EINVAL); } @@ -1364,11 +1356,12 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) if (IS_ERR(tu)) { pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu)); + path_put(&path); return ERR_CAST(tu); } tu->offset = offs; - tu->inode = inode; + tu->path = path; tu->filename = kstrdup(name, GFP_KERNEL); init_trace_event_call(tu, &tu->tp.call);

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] tracing: Fix bad use of igrab in trace_uprobe.c" failed to apply to 4.16-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.16-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From 0c92c7a3c5d416f47b32c5f20a611dfeca5d5f2e Mon Sep 17 00:00:00 2001 From: Song Liu <songliubraving(a)fb.com> Date: Mon, 23 Apr 2018 10:21:34 -0700 Subject: [PATCH] tracing: Fix bad use of igrab in trace_uprobe.c As Miklos reported and suggested: This pattern repeats two times in trace_uprobe.c and in kernel/events/core.c as well: ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) goto fail_address_parse; inode = igrab(d_inode(path.dentry)); path_put(&path); And it's wrong. You can only hold a reference to the inode if you have an active ref to the superblock as well (which is normally through path.mnt) or holding s_umount. This way unmounting the containing filesystem while the tracepoint is active will give you the "VFS: Busy inodes after unmount..." message and a crash when the inode is finally put. Solution: store path instead of inode. This patch fixes two instances in trace_uprobe.c. struct path is added to struct trace_uprobe to keep the inode and containing mount point referenced. Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes") Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU") Cc: stable(a)vger.kernel.org Cc: Ingo Molnar <mingo(a)redhat.com> Cc: Howard McLauchlan <hmclauchlan(a)fb.com> Cc: Josef Bacik <jbacik(a)fb.com> Cc: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com> Acked-by: Miklos Szeredi <mszeredi(a)redhat.com> Reported-by: Miklos Szeredi <miklos(a)szeredi.hu> Signed-off-by: Song Liu <songliubraving(a)fb.com> Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c index 34fd0e0ec51d..ac892878dbe6 100644 --- a/kernel/trace/trace_uprobe.c +++ b/kernel/trace/trace_uprobe.c @@ -55,6 +55,7 @@ struct trace_uprobe { struct list_head list; struct trace_uprobe_filter filter; struct uprobe_consumer consumer; + struct path path; struct inode *inode; char *filename; unsigned long offset; @@ -289,7 +290,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu) for (i = 0; i < tu->tp.nr_args; i++) traceprobe_free_probe_arg(&tu->tp.args[i]); - iput(tu->inode); + path_put(&tu->path); kfree(tu->tp.call.class->system); kfree(tu->tp.call.name); kfree(tu->filename); @@ -363,7 +364,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu) static int create_trace_uprobe(int argc, char **argv) { struct trace_uprobe *tu; - struct inode *inode; char *arg, *event, *group, *filename; char buf[MAX_EVENT_NAME_LEN]; struct path path; @@ -371,7 +371,6 @@ static int create_trace_uprobe(int argc, char **argv) bool is_delete, is_return; int i, ret; - inode = NULL; ret = 0; is_delete = false; is_return = false; @@ -437,21 +436,16 @@ static int create_trace_uprobe(int argc, char **argv) } /* Find the last occurrence, in case the path contains ':' too. */ arg = strrchr(argv[1], ':'); - if (!arg) { - ret = -EINVAL; - goto fail_address_parse; - } + if (!arg) + return -EINVAL; *arg++ = '\0'; filename = argv[1]; ret = kern_path(filename, LOOKUP_FOLLOW, &path); if (ret) - goto fail_address_parse; - - inode = igrab(d_real_inode(path.dentry)); - path_put(&path); + return ret; - if (!inode || !S_ISREG(inode->i_mode)) { + if (!d_is_reg(path.dentry)) { ret = -EINVAL; goto fail_address_parse; } @@ -490,7 +484,7 @@ static int create_trace_uprobe(int argc, char **argv) goto fail_address_parse; } tu->offset = offset; - tu->inode = inode; + tu->path = path; tu->filename = kstrdup(filename, GFP_KERNEL); if (!tu->filename) { @@ -558,7 +552,7 @@ static int create_trace_uprobe(int argc, char **argv) return ret; fail_address_parse: - iput(inode); + path_put(&path); pr_info("Failed to parse address or file.\n"); @@ -922,6 +916,7 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file, goto err_flags; tu->consumer.filter = filter; + tu->inode = d_real_inode(tu->path.dentry); ret = uprobe_register(tu->inode, tu->offset, &tu->consumer); if (ret) goto err_buffer; @@ -967,6 +962,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file) WARN_ON(!uprobe_filter_is_empty(&tu->filter)); uprobe_unregister(tu->inode, tu->offset, &tu->consumer); + tu->inode = NULL; tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE; uprobe_buffer_disable(); @@ -1337,7 +1333,6 @@ struct trace_event_call * create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) { struct trace_uprobe *tu; - struct inode *inode; struct path path; int ret; @@ -1345,11 +1340,8 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) if (ret) return ERR_PTR(ret); - inode = igrab(d_inode(path.dentry)); - path_put(&path); - - if (!inode || !S_ISREG(inode->i_mode)) { - iput(inode); + if (!d_is_reg(path.dentry)) { + path_put(&path); return ERR_PTR(-EINVAL); } @@ -1364,11 +1356,12 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return) if (IS_ERR(tu)) { pr_info("Failed to allocate trace_uprobe.(%d)\n", (int)PTR_ERR(tu)); + path_put(&path); return ERR_CAST(tu); } tu->offset = offs; - tu->inode = inode; + tu->path = path; tu->filename = kstrdup(name, GFP_KERNEL); init_trace_event_call(tu, &tu->tp.call);

7 years, 2 months

1
0
0 0

FAILED: patch "[PATCH] errseq: Always report a writeback error once" failed to apply to 4.14-stable tree

by gregkh＠linuxfoundation.org

The patch below does not apply to the 4.14-stable tree. If someone wants it applied there, or to any other stable or longterm tree, then please email the backport, including the original git commit id to <stable(a)vger.kernel.org>. thanks, greg k-h ------------------ original commit in Linus's tree ------------------ >From b4678df184b314a2bd47d2329feca2c2534aa12b Mon Sep 17 00:00:00 2001 From: Matthew Wilcox <willy(a)infradead.org> Date: Tue, 24 Apr 2018 14:02:57 -0700 Subject: [PATCH] errseq: Always report a writeback error once The errseq_t infrastructure assumes that errors which occurred before the file descriptor was opened are of no interest to the application. This turns out to be a regression for some applications, notably Postgres. Before errseq_t, a writeback error would be reported exactly once (as long as the inode remained in memory), so Postgres could open a file, call fsync() and find out whether there had been a writeback error on that file from another process. This patch changes the errseq infrastructure to report errors to all file descriptors which are opened after the error occurred, but before it was reported to any file descriptor. This restores the user-visible behaviour. Cc: stable(a)vger.kernel.org Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting") Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com> Reviewed-by: Jeff Layton <jlayton(a)kernel.org> Signed-off-by: Jeff Layton <jlayton(a)redhat.com> diff --git a/lib/errseq.c b/lib/errseq.c index df782418b333..81f9e33aa7e7 100644 --- a/lib/errseq.c +++ b/lib/errseq.c @@ -111,27 +111,22 @@ EXPORT_SYMBOL(errseq_set); * errseq_sample() - Grab current errseq_t value. * @eseq: Pointer to errseq_t to be sampled. * - * This function allows callers to sample an errseq_t value, marking it as - * "seen" if required. + * This function allows callers to initialise their errseq_t variable. + * If the error has been "seen", new callers will not see an old error. + * If there is an unseen error in @eseq, the caller of this function will + * see it the next time it checks for an error. * + * Context: Any context. * Return: The current errseq value. */ errseq_t errseq_sample(errseq_t *eseq) { errseq_t old = READ_ONCE(*eseq); - errseq_t new = old; - /* - * For the common case of no errors ever having been set, we can skip - * marking the SEEN bit. Once an error has been set, the value will - * never go back to zero. - */ - if (old != 0) { - new |= ERRSEQ_SEEN; - if (old != new) - cmpxchg(eseq, old, new); - } - return new; + /* If nobody has seen this error yet, then we can be the first. */ + if (!(old & ERRSEQ_SEEN)) + old = 0; + return old; } EXPORT_SYMBOL(errseq_sample);

7 years, 2 months

1
0
0 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

Linux-stable-mirror