From: Matthew Wilcox <willy(a)infradead.org>
The errseq_t infrastructure assumes that errors which occurred before
the file descriptor was opened are of no interest to the application.
This turns out to be a regression for some applications, notably Postgres.
Before errseq_t, a writeback error would be reported exactly once (as
long as the inode remained in memory), so Postgres could open a file,
call fsync() and find out whether there had been a writeback error on
that file from another process.
This patch changes the errseq infrastructure to report errors to all
file descriptors which are opened after the error occurred, but before
it was reported to any file descriptor. This restores the user-visible
behaviour.
[ jlayton: fix up conflicts in comments ]
Cc: stable(a)vger.kernel.org
Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting")
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
(cherry picked from commit b4678df184b314a2bd47d2329feca2c2534aa12b)
---
lib/errseq.c | 25 +++++++++++--------------
1 file changed, 11 insertions(+), 14 deletions(-)
This is a backport to the v4.14 stable series. The only merge conflict
was due to an earlier patch by Willy to flesh out the comments. There
were no code changes necessary.
diff --git a/lib/errseq.c b/lib/errseq.c
index 79cc66897db4..b6ed81ec788d 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -111,25 +111,22 @@ EXPORT_SYMBOL(errseq_set);
* errseq_sample - grab current errseq_t value
* @eseq: pointer to errseq_t to be sampled
*
- * This function allows callers to sample an errseq_t value, marking it as
- * "seen" if required.
+ * This function allows callers to initialise their errseq_t variable.
+ * If the error has been "seen", new callers will not see an old error.
+ * If there is an unseen error in @eseq, the caller of this function will
+ * see it the next time it checks for an error.
+ *
+ * Context: Any context.
+ * Return: The current errseq value.
*/
errseq_t errseq_sample(errseq_t *eseq)
{
errseq_t old = READ_ONCE(*eseq);
- errseq_t new = old;
- /*
- * For the common case of no errors ever having been set, we can skip
- * marking the SEEN bit. Once an error has been set, the value will
- * never go back to zero.
- */
- if (old != 0) {
- new |= ERRSEQ_SEEN;
- if (old != new)
- cmpxchg(eseq, old, new);
- }
- return new;
+ /* If nobody has seen this error yet, then we can be the first. */
+ if (!(old & ERRSEQ_SEEN))
+ old = 0;
+ return old;
}
EXPORT_SYMBOL(errseq_sample);
--
2.17.0
Turns out virtio console tries to take a buffer out of an active vq.
Works by sheer luck, and is explicitly forbidden by spec. And while
going over it I saw that error handling is also broken -
failure is easy to trigger if I force allocations to fail.
Lightly tested.
Michael S. Tsirkin (6):
virtio_console: don't tie bufs to a vq
virtio: add ability to iterate over vqs
virtio_console: free buffers after reset
virtio_console: drop custom control queue cleanup
virtio_console: move removal code
virtio_console: reset on out of memory
drivers/char/virtio_console.c | 155 ++++++++++++++++++++----------------------
include/linux/virtio.h | 3 +
2 files changed, 75 insertions(+), 83 deletions(-)
--
MST
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d4f3388afd488ed15368fa7413b8bd6d1f98bb1d Mon Sep 17 00:00:00 2001
From: Prashanth Prakash <pprakash(a)codeaurora.org>
Date: Fri, 27 Apr 2018 11:35:27 -0600
Subject: [PATCH] cpufreq / CPPC: Set platform specific transition_delay_us
Add support to specify platform specific transition_delay_us instead
of using the transition delay derived from PCC.
With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us
depending transition_latency) we are setting transition_delay_us
directly and not applying the LATENCY_MULTIPLIER. Because of that,
on Qualcomm Centriq we can end up with a very high rate of frequency
change requests when using the schedutil governor (default
rate_limit_us=10 compared to an earlier value of 10000).
The PCC subspace describes the rate at which the platform can accept
commands on the CPPC's PCC channel. This includes read and write
command on the PCC channel that can be used for reasons other than
frequency transitions. Moreover the same PCC subspace can be used by
multiple freq domains and deriving transition_delay_us from it as we
do now can be sub-optimal.
Moreover if a platform does not use PCC for desired_perf register then
there is no way to compute the transition latency or the delay_us.
CPPC does not have a standard defined mechanism to get the transition
rate or the latency at the moment.
Given the above limitations, it is simpler to have a platform specific
transition_delay_us and rely on PCC derived value only if a platform
specific value is not available.
Signed-off-by: Prashanth Prakash <pprakash(a)codeaurora.org>
Cc: 4.14+ <stable(a)vger.kernel.org> # 4.14+
Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index bc5fc1630876..b15115a48775 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -126,6 +126,49 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy)
cpu->perf_caps.lowest_perf, cpu_num, ret);
}
+/*
+ * The PCC subspace describes the rate at which platform can accept commands
+ * on the shared PCC channel (including READs which do not count towards freq
+ * trasition requests), so ideally we need to use the PCC values as a fallback
+ * if we don't have a platform specific transition_delay_us
+ */
+#ifdef CONFIG_ARM64
+#include <asm/cputype.h>
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ unsigned long implementor = read_cpuid_implementor();
+ unsigned long part_num = read_cpuid_part_number();
+ unsigned int delay_us = 0;
+
+ switch (implementor) {
+ case ARM_CPU_IMP_QCOM:
+ switch (part_num) {
+ case QCOM_CPU_PART_FALKOR_V1:
+ case QCOM_CPU_PART_FALKOR:
+ delay_us = 10000;
+ break;
+ default:
+ delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+ break;
+ }
+ break;
+ default:
+ delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+ break;
+ }
+
+ return delay_us;
+}
+
+#else
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ return cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+}
+#endif
+
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu;
@@ -162,8 +205,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu->perf_caps.highest_perf;
policy->cpuinfo.max_freq = cppc_dmi_max_khz;
- policy->transition_delay_us = cppc_get_transition_latency(cpu_num) /
- NSEC_PER_USEC;
+ policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num);
policy->shared_type = cpu->shared_type;
if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From d4f3388afd488ed15368fa7413b8bd6d1f98bb1d Mon Sep 17 00:00:00 2001
From: Prashanth Prakash <pprakash(a)codeaurora.org>
Date: Fri, 27 Apr 2018 11:35:27 -0600
Subject: [PATCH] cpufreq / CPPC: Set platform specific transition_delay_us
Add support to specify platform specific transition_delay_us instead
of using the transition delay derived from PCC.
With commit 3d41386d556d (cpufreq: CPPC: Use transition_delay_us
depending transition_latency) we are setting transition_delay_us
directly and not applying the LATENCY_MULTIPLIER. Because of that,
on Qualcomm Centriq we can end up with a very high rate of frequency
change requests when using the schedutil governor (default
rate_limit_us=10 compared to an earlier value of 10000).
The PCC subspace describes the rate at which the platform can accept
commands on the CPPC's PCC channel. This includes read and write
command on the PCC channel that can be used for reasons other than
frequency transitions. Moreover the same PCC subspace can be used by
multiple freq domains and deriving transition_delay_us from it as we
do now can be sub-optimal.
Moreover if a platform does not use PCC for desired_perf register then
there is no way to compute the transition latency or the delay_us.
CPPC does not have a standard defined mechanism to get the transition
rate or the latency at the moment.
Given the above limitations, it is simpler to have a platform specific
transition_delay_us and rely on PCC derived value only if a platform
specific value is not available.
Signed-off-by: Prashanth Prakash <pprakash(a)codeaurora.org>
Cc: 4.14+ <stable(a)vger.kernel.org> # 4.14+
Fixes: 3d41386d556d (cpufreq: CPPC: Use transition_delay_us depending transition_latency)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki(a)intel.com>
diff --git a/drivers/cpufreq/cppc_cpufreq.c b/drivers/cpufreq/cppc_cpufreq.c
index bc5fc1630876..b15115a48775 100644
--- a/drivers/cpufreq/cppc_cpufreq.c
+++ b/drivers/cpufreq/cppc_cpufreq.c
@@ -126,6 +126,49 @@ static void cppc_cpufreq_stop_cpu(struct cpufreq_policy *policy)
cpu->perf_caps.lowest_perf, cpu_num, ret);
}
+/*
+ * The PCC subspace describes the rate at which platform can accept commands
+ * on the shared PCC channel (including READs which do not count towards freq
+ * trasition requests), so ideally we need to use the PCC values as a fallback
+ * if we don't have a platform specific transition_delay_us
+ */
+#ifdef CONFIG_ARM64
+#include <asm/cputype.h>
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ unsigned long implementor = read_cpuid_implementor();
+ unsigned long part_num = read_cpuid_part_number();
+ unsigned int delay_us = 0;
+
+ switch (implementor) {
+ case ARM_CPU_IMP_QCOM:
+ switch (part_num) {
+ case QCOM_CPU_PART_FALKOR_V1:
+ case QCOM_CPU_PART_FALKOR:
+ delay_us = 10000;
+ break;
+ default:
+ delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+ break;
+ }
+ break;
+ default:
+ delay_us = cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+ break;
+ }
+
+ return delay_us;
+}
+
+#else
+
+static unsigned int cppc_cpufreq_get_transition_delay_us(int cpu)
+{
+ return cppc_get_transition_latency(cpu) / NSEC_PER_USEC;
+}
+#endif
+
static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
{
struct cppc_cpudata *cpu;
@@ -162,8 +205,7 @@ static int cppc_cpufreq_cpu_init(struct cpufreq_policy *policy)
cpu->perf_caps.highest_perf;
policy->cpuinfo.max_freq = cppc_dmi_max_khz;
- policy->transition_delay_us = cppc_get_transition_latency(cpu_num) /
- NSEC_PER_USEC;
+ policy->transition_delay_us = cppc_cpufreq_get_transition_delay_us(cpu_num);
policy->shared_type = cpu->shared_type;
if (policy->shared_type == CPUFREQ_SHARED_TYPE_ANY) {
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 84a6a7a99c0ac2f67366288c0625c9fba176b264 Mon Sep 17 00:00:00 2001
From: Parav Pandit <parav(a)mellanox.com>
Date: Mon, 23 Apr 2018 17:01:55 +0300
Subject: [PATCH] IB/mlx5: Fix represent correct netdevice in dual port RoCE
In commit bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
incorrectly mapped primary device's netdevice to 2nd port netdevice.
It always represented primary port's netdevice for 2nd port netdevice
when ib representors were not used.
This results into failing to process CM request arriving on 2nd port due
to incorrect mapping of netdevice.
This fix corrects it by considering the right mdev.
Cc: <stable(a)vger.kernel.org> # 4.16
Fixes: bcf87f1dbbec ("IB/mlx5: Listen to netdev register/unresiter events in switchdev mode")
Reviewed-by: Mark Bloch <markb(a)mellanox.com>
Signed-off-by: Parav Pandit <parav(a)mellanox.com>
Signed-off-by: Leon Romanovsky <leonro(a)mellanox.com>
Signed-off-by: Doug Ledford <dledford(a)redhat.com>
diff --git a/drivers/infiniband/hw/mlx5/main.c b/drivers/infiniband/hw/mlx5/main.c
index 6a749c02b14c..78a4b2797057 100644
--- a/drivers/infiniband/hw/mlx5/main.c
+++ b/drivers/infiniband/hw/mlx5/main.c
@@ -179,7 +179,7 @@ static int mlx5_netdev_event(struct notifier_block *this,
if (rep_ndev == ndev)
roce->netdev = (event == NETDEV_UNREGISTER) ?
NULL : ndev;
- } else if (ndev->dev.parent == &ibdev->mdev->pdev->dev) {
+ } else if (ndev->dev.parent == &mdev->pdev->dev) {
roce->netdev = (event == NETDEV_UNREGISTER) ?
NULL : ndev;
}
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0c92c7a3c5d416f47b32c5f20a611dfeca5d5f2e Mon Sep 17 00:00:00 2001
From: Song Liu <songliubraving(a)fb.com>
Date: Mon, 23 Apr 2018 10:21:34 -0700
Subject: [PATCH] tracing: Fix bad use of igrab in trace_uprobe.c
As Miklos reported and suggested:
This pattern repeats two times in trace_uprobe.c and in
kernel/events/core.c as well:
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
goto fail_address_parse;
inode = igrab(d_inode(path.dentry));
path_put(&path);
And it's wrong. You can only hold a reference to the inode if you
have an active ref to the superblock as well (which is normally
through path.mnt) or holding s_umount.
This way unmounting the containing filesystem while the tracepoint is
active will give you the "VFS: Busy inodes after unmount..." message
and a crash when the inode is finally put.
Solution: store path instead of inode.
This patch fixes two instances in trace_uprobe.c. struct path is added to
struct trace_uprobe to keep the inode and containing mount point
referenced.
Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com
Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes")
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Howard McLauchlan <hmclauchlan(a)fb.com>
Cc: Josef Bacik <jbacik(a)fb.com>
Cc: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
Acked-by: Miklos Szeredi <mszeredi(a)redhat.com>
Reported-by: Miklos Szeredi <miklos(a)szeredi.hu>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 34fd0e0ec51d..ac892878dbe6 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -55,6 +55,7 @@ struct trace_uprobe {
struct list_head list;
struct trace_uprobe_filter filter;
struct uprobe_consumer consumer;
+ struct path path;
struct inode *inode;
char *filename;
unsigned long offset;
@@ -289,7 +290,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
for (i = 0; i < tu->tp.nr_args; i++)
traceprobe_free_probe_arg(&tu->tp.args[i]);
- iput(tu->inode);
+ path_put(&tu->path);
kfree(tu->tp.call.class->system);
kfree(tu->tp.call.name);
kfree(tu->filename);
@@ -363,7 +364,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
static int create_trace_uprobe(int argc, char **argv)
{
struct trace_uprobe *tu;
- struct inode *inode;
char *arg, *event, *group, *filename;
char buf[MAX_EVENT_NAME_LEN];
struct path path;
@@ -371,7 +371,6 @@ static int create_trace_uprobe(int argc, char **argv)
bool is_delete, is_return;
int i, ret;
- inode = NULL;
ret = 0;
is_delete = false;
is_return = false;
@@ -437,21 +436,16 @@ static int create_trace_uprobe(int argc, char **argv)
}
/* Find the last occurrence, in case the path contains ':' too. */
arg = strrchr(argv[1], ':');
- if (!arg) {
- ret = -EINVAL;
- goto fail_address_parse;
- }
+ if (!arg)
+ return -EINVAL;
*arg++ = '\0';
filename = argv[1];
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
- goto fail_address_parse;
-
- inode = igrab(d_real_inode(path.dentry));
- path_put(&path);
+ return ret;
- if (!inode || !S_ISREG(inode->i_mode)) {
+ if (!d_is_reg(path.dentry)) {
ret = -EINVAL;
goto fail_address_parse;
}
@@ -490,7 +484,7 @@ static int create_trace_uprobe(int argc, char **argv)
goto fail_address_parse;
}
tu->offset = offset;
- tu->inode = inode;
+ tu->path = path;
tu->filename = kstrdup(filename, GFP_KERNEL);
if (!tu->filename) {
@@ -558,7 +552,7 @@ static int create_trace_uprobe(int argc, char **argv)
return ret;
fail_address_parse:
- iput(inode);
+ path_put(&path);
pr_info("Failed to parse address or file.\n");
@@ -922,6 +916,7 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
goto err_flags;
tu->consumer.filter = filter;
+ tu->inode = d_real_inode(tu->path.dentry);
ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
if (ret)
goto err_buffer;
@@ -967,6 +962,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
WARN_ON(!uprobe_filter_is_empty(&tu->filter));
uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
+ tu->inode = NULL;
tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
uprobe_buffer_disable();
@@ -1337,7 +1333,6 @@ struct trace_event_call *
create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
{
struct trace_uprobe *tu;
- struct inode *inode;
struct path path;
int ret;
@@ -1345,11 +1340,8 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
if (ret)
return ERR_PTR(ret);
- inode = igrab(d_inode(path.dentry));
- path_put(&path);
-
- if (!inode || !S_ISREG(inode->i_mode)) {
- iput(inode);
+ if (!d_is_reg(path.dentry)) {
+ path_put(&path);
return ERR_PTR(-EINVAL);
}
@@ -1364,11 +1356,12 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n",
(int)PTR_ERR(tu));
+ path_put(&path);
return ERR_CAST(tu);
}
tu->offset = offs;
- tu->inode = inode;
+ tu->path = path;
tu->filename = kstrdup(name, GFP_KERNEL);
init_trace_event_call(tu, &tu->tp.call);
The patch below does not apply to the 4.16-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From 0c92c7a3c5d416f47b32c5f20a611dfeca5d5f2e Mon Sep 17 00:00:00 2001
From: Song Liu <songliubraving(a)fb.com>
Date: Mon, 23 Apr 2018 10:21:34 -0700
Subject: [PATCH] tracing: Fix bad use of igrab in trace_uprobe.c
As Miklos reported and suggested:
This pattern repeats two times in trace_uprobe.c and in
kernel/events/core.c as well:
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
goto fail_address_parse;
inode = igrab(d_inode(path.dentry));
path_put(&path);
And it's wrong. You can only hold a reference to the inode if you
have an active ref to the superblock as well (which is normally
through path.mnt) or holding s_umount.
This way unmounting the containing filesystem while the tracepoint is
active will give you the "VFS: Busy inodes after unmount..." message
and a crash when the inode is finally put.
Solution: store path instead of inode.
This patch fixes two instances in trace_uprobe.c. struct path is added to
struct trace_uprobe to keep the inode and containing mount point
referenced.
Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com
Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes")
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Cc: stable(a)vger.kernel.org
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: Howard McLauchlan <hmclauchlan(a)fb.com>
Cc: Josef Bacik <jbacik(a)fb.com>
Cc: Srikar Dronamraju <srikar(a)linux.vnet.ibm.com>
Acked-by: Miklos Szeredi <mszeredi(a)redhat.com>
Reported-by: Miklos Szeredi <miklos(a)szeredi.hu>
Signed-off-by: Song Liu <songliubraving(a)fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index 34fd0e0ec51d..ac892878dbe6 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -55,6 +55,7 @@ struct trace_uprobe {
struct list_head list;
struct trace_uprobe_filter filter;
struct uprobe_consumer consumer;
+ struct path path;
struct inode *inode;
char *filename;
unsigned long offset;
@@ -289,7 +290,7 @@ static void free_trace_uprobe(struct trace_uprobe *tu)
for (i = 0; i < tu->tp.nr_args; i++)
traceprobe_free_probe_arg(&tu->tp.args[i]);
- iput(tu->inode);
+ path_put(&tu->path);
kfree(tu->tp.call.class->system);
kfree(tu->tp.call.name);
kfree(tu->filename);
@@ -363,7 +364,6 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
static int create_trace_uprobe(int argc, char **argv)
{
struct trace_uprobe *tu;
- struct inode *inode;
char *arg, *event, *group, *filename;
char buf[MAX_EVENT_NAME_LEN];
struct path path;
@@ -371,7 +371,6 @@ static int create_trace_uprobe(int argc, char **argv)
bool is_delete, is_return;
int i, ret;
- inode = NULL;
ret = 0;
is_delete = false;
is_return = false;
@@ -437,21 +436,16 @@ static int create_trace_uprobe(int argc, char **argv)
}
/* Find the last occurrence, in case the path contains ':' too. */
arg = strrchr(argv[1], ':');
- if (!arg) {
- ret = -EINVAL;
- goto fail_address_parse;
- }
+ if (!arg)
+ return -EINVAL;
*arg++ = '\0';
filename = argv[1];
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
- goto fail_address_parse;
-
- inode = igrab(d_real_inode(path.dentry));
- path_put(&path);
+ return ret;
- if (!inode || !S_ISREG(inode->i_mode)) {
+ if (!d_is_reg(path.dentry)) {
ret = -EINVAL;
goto fail_address_parse;
}
@@ -490,7 +484,7 @@ static int create_trace_uprobe(int argc, char **argv)
goto fail_address_parse;
}
tu->offset = offset;
- tu->inode = inode;
+ tu->path = path;
tu->filename = kstrdup(filename, GFP_KERNEL);
if (!tu->filename) {
@@ -558,7 +552,7 @@ static int create_trace_uprobe(int argc, char **argv)
return ret;
fail_address_parse:
- iput(inode);
+ path_put(&path);
pr_info("Failed to parse address or file.\n");
@@ -922,6 +916,7 @@ probe_event_enable(struct trace_uprobe *tu, struct trace_event_file *file,
goto err_flags;
tu->consumer.filter = filter;
+ tu->inode = d_real_inode(tu->path.dentry);
ret = uprobe_register(tu->inode, tu->offset, &tu->consumer);
if (ret)
goto err_buffer;
@@ -967,6 +962,7 @@ probe_event_disable(struct trace_uprobe *tu, struct trace_event_file *file)
WARN_ON(!uprobe_filter_is_empty(&tu->filter));
uprobe_unregister(tu->inode, tu->offset, &tu->consumer);
+ tu->inode = NULL;
tu->tp.flags &= file ? ~TP_FLAG_TRACE : ~TP_FLAG_PROFILE;
uprobe_buffer_disable();
@@ -1337,7 +1333,6 @@ struct trace_event_call *
create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
{
struct trace_uprobe *tu;
- struct inode *inode;
struct path path;
int ret;
@@ -1345,11 +1340,8 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
if (ret)
return ERR_PTR(ret);
- inode = igrab(d_inode(path.dentry));
- path_put(&path);
-
- if (!inode || !S_ISREG(inode->i_mode)) {
- iput(inode);
+ if (!d_is_reg(path.dentry)) {
+ path_put(&path);
return ERR_PTR(-EINVAL);
}
@@ -1364,11 +1356,12 @@ create_local_trace_uprobe(char *name, unsigned long offs, bool is_return)
if (IS_ERR(tu)) {
pr_info("Failed to allocate trace_uprobe.(%d)\n",
(int)PTR_ERR(tu));
+ path_put(&path);
return ERR_CAST(tu);
}
tu->offset = offs;
- tu->inode = inode;
+ tu->path = path;
tu->filename = kstrdup(name, GFP_KERNEL);
init_trace_event_call(tu, &tu->tp.call);
The patch below does not apply to the 4.14-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
>From b4678df184b314a2bd47d2329feca2c2534aa12b Mon Sep 17 00:00:00 2001
From: Matthew Wilcox <willy(a)infradead.org>
Date: Tue, 24 Apr 2018 14:02:57 -0700
Subject: [PATCH] errseq: Always report a writeback error once
The errseq_t infrastructure assumes that errors which occurred before
the file descriptor was opened are of no interest to the application.
This turns out to be a regression for some applications, notably Postgres.
Before errseq_t, a writeback error would be reported exactly once (as
long as the inode remained in memory), so Postgres could open a file,
call fsync() and find out whether there had been a writeback error on
that file from another process.
This patch changes the errseq infrastructure to report errors to all
file descriptors which are opened after the error occurred, but before
it was reported to any file descriptor. This restores the user-visible
behaviour.
Cc: stable(a)vger.kernel.org
Fixes: 5660e13d2fd6 ("fs: new infrastructure for writeback error handling and reporting")
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Reviewed-by: Jeff Layton <jlayton(a)kernel.org>
Signed-off-by: Jeff Layton <jlayton(a)redhat.com>
diff --git a/lib/errseq.c b/lib/errseq.c
index df782418b333..81f9e33aa7e7 100644
--- a/lib/errseq.c
+++ b/lib/errseq.c
@@ -111,27 +111,22 @@ EXPORT_SYMBOL(errseq_set);
* errseq_sample() - Grab current errseq_t value.
* @eseq: Pointer to errseq_t to be sampled.
*
- * This function allows callers to sample an errseq_t value, marking it as
- * "seen" if required.
+ * This function allows callers to initialise their errseq_t variable.
+ * If the error has been "seen", new callers will not see an old error.
+ * If there is an unseen error in @eseq, the caller of this function will
+ * see it the next time it checks for an error.
*
+ * Context: Any context.
* Return: The current errseq value.
*/
errseq_t errseq_sample(errseq_t *eseq)
{
errseq_t old = READ_ONCE(*eseq);
- errseq_t new = old;
- /*
- * For the common case of no errors ever having been set, we can skip
- * marking the SEEN bit. Once an error has been set, the value will
- * never go back to zero.
- */
- if (old != 0) {
- new |= ERRSEQ_SEEN;
- if (old != new)
- cmpxchg(eseq, old, new);
- }
- return new;
+ /* If nobody has seen this error yet, then we can be the first. */
+ if (!(old & ERRSEQ_SEEN))
+ old = 0;
+ return old;
}
EXPORT_SYMBOL(errseq_sample);