From: Will Deacon <will.deacon(a)arm.com>
commit 2a355ec25729053bb9a1a89b6c1d1cdd6c3b3fb1 upstream.
While the CSV3 field of the ID_AA64_PFR0 CPU ID register can be checked
to see if a CPU is susceptible to Meltdown and therefore requires kpti
to be enabled, existing CPUs do not implement this field.
We therefore whitelist all unaffected Cortex-A CPUs that do not implement
the CSV3 field.
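For readers unfamiliar with MIDR-based model matching, a minimal user-space
sketch of the kind of check this patch extends follows. The constants mirror
the conventions of arch/arm64/include/asm/cputype.h but are restated here as
assumptions (the kernel's real mask also covers the architecture field), and
only the Cortex-A53 entry is shown:

#include <stdbool.h>
#include <stdint.h>

/* Assumed MIDR_EL1 layout: implementer in bits [31:24], part number
 * in bits [15:4] (simplified for this sketch). */
#define MIDR_IMPLEMENTOR_SHIFT  24
#define MIDR_PARTNUM_SHIFT      4
#define MIDR_CPU_MODEL(imp, part) \
        (((uint32_t)(imp) << MIDR_IMPLEMENTOR_SHIFT) | \
         ((uint32_t)(part) << MIDR_PARTNUM_SHIFT))
#define MIDR_CPU_MODEL_MASK     MIDR_CPU_MODEL(0xff, 0xfff)

#define ARM_CPU_IMP_ARM         0x41
#define ARM_CPU_PART_CORTEX_A53 0xd03

/* Whitelist check in the same shape as unmap_kernel_at_el0():
 * match on implementer + part number, ignoring variant/revision. */
static bool cpu_is_meltdown_safe(uint32_t midr)
{
        switch (midr & MIDR_CPU_MODEL_MASK) {
        case MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_A53):
                return true;    /* kpti not required */
        }
        return false;
}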
Signed-off-by: Will Deacon <will.deacon(a)arm.com>
[florian: adjust whitelist location and table to stable-4.9.y]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
arch/arm64/kernel/cpufeature.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 9a8e45dc36bd..8cf001baee21 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -789,6 +789,11 @@ static bool unmap_kernel_at_el0(const struct arm64_cpu_capabilities *entry,
switch (read_cpuid_id() & MIDR_CPU_MODEL_MASK) {
case MIDR_CAVIUM_THUNDERX2:
case MIDR_BRCM_VULCAN:
+ case MIDR_CORTEX_A53:
+ case MIDR_CORTEX_A55:
+ case MIDR_CORTEX_A57:
+ case MIDR_CORTEX_A72:
+ case MIDR_CORTEX_A73:
return false;
}
--
2.17.1
From: Jeremy Linton <jeremy.linton(a)arm.com>
commit de19055564c8f8f9d366f8db3395836da0b2176c upstream
For a while, Arm64 has been capable of force enabling
or disabling the kpti mitigations. Let's make sure the
documentation reflects that.
Signed-off-by: Jeremy Linton <jeremy.linton(a)arm.com>
Reviewed-by: Andre Przywara <andre.przywara(a)arm.com>
Signed-off-by: Jonathan Corbet <corbet(a)lwn.net>
[florian: patch the correct file]
Signed-off-by: Florian Fainelli <f.fainelli(a)gmail.com>
---
Documentation/kernel-parameters.txt | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 1bc12619bedd..b2d2f4539a3f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1965,6 +1965,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
kmemcheck=2 (one-shot mode)
Default: 2 (one-shot mode)
+ kpti= [ARM64] Control page table isolation of user
+ and kernel address spaces.
+ Default: enabled on cores which need mitigation.
+ 0: force disabled
+ 1: force enabled
+
kstack=N [X86] Print N words from the kernel stack
in oops dumps.
--
2.17.1
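As a usage note: the parameter documented above is consumed from the kernel
command line like any other boot option. A hypothetical GRUB-style menu entry
that force-disables the mitigation might carry (kernel image path and root
device are placeholders):

linux /boot/vmlinuz-4.9.y root=/dev/sda1 ro kpti=0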
The ZBC/ZAC REPORT ZONES command may return fewer bytes than requested if
the number of zones matching the report request is small. However, unlike
read or write commands, the remainder of an incomplete report zones command
cannot be automatically requested by the block layer: the start sector of
the next report cannot be known, and the report reply may not be 512B
aligned for SAS drives (a report zone reply size is always a multiple of
64B). The regular request completion code, by executing bio_advance() and
restarting the remainder of the command, currently causes invalid zone
descriptor data to be reported to the caller if the report size is smaller
than 512B (a case that can easily happen for a report of the last zones of
a SAS drive, for example).
Since blkdev_report_zones() handles report zones command processing in a
loop until completion (no more zones are being reported), we can safely
prevent the block layer from performing an incorrect bio_advance() call and
restarting the remainder of incomplete report zone BIOs. To do so, always
indicate a full completion of REQ_OP_ZONE_REPORT by setting good_bytes to
the request buffer size and by setting the command resid to 0. This does
not affect the post processing of the report zone reply done by
sd_zbc_complete() since the reply header indicates the number of zones
reported.
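To illustrate why the reply header, rather than the transferred byte count,
is the authoritative source once the resid is forced to 0, here is a rough
user-space sketch of sizing a report zones reply. It assumes the SCSI ZBC
layout (a 64-byte header whose first four bytes hold the big-endian zone
list length in bytes, followed by 64-byte zone descriptors) and is an
illustration, not the sd_zbc.c code:

#include <stddef.h>
#include <stdint.h>

/* Number of zone descriptors actually present in a report zones
 * reply buffer, trusting the header rather than the byte count
 * transferred by the command. */
static size_t report_nr_zones(const uint8_t *buf, size_t buflen)
{
        uint32_t list_len;

        if (buflen < 64)
                return 0;

        /* Assumed: zone list length, big-endian, in header bytes 0..3 */
        list_len = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                   ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];

        /* Never walk past what the buffer can actually hold */
        if (list_len > buflen - 64)
                list_len = buflen - 64;

        return list_len / 64;   /* one 64B descriptor per zone */
}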
Fixes: 89d947561077 ("sd: Implement support for ZBC devices")
Cc: <stable(a)vger.kernel.org> # 4.19
Cc: <stable(a)vger.kernel.org> # 4.14
Signed-off-by: Masato Suzuki <masato.suzuki(a)wdc.com>
---
drivers/scsi/sd.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 2955b856e9ec..e8c2afbb82e9 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1981,9 +1981,13 @@ static int sd_done(struct scsi_cmnd *SCpnt)
}
break;
case REQ_OP_ZONE_REPORT:
+ /* To prevent the block layer from performing an incorrect
+ * bio_advance() call and restarting the remainder of
+ * incomplete report zone BIOs, always indicate a full
+ * completion of REQ_OP_ZONE_REPORT.
+ */
if (!result) {
- good_bytes = scsi_bufflen(SCpnt)
- - scsi_get_resid(SCpnt);
+ good_bytes = scsi_bufflen(SCpnt);
scsi_set_resid(SCpnt, 0);
} else {
good_bytes = 0;
--
2.24.1
The patch below does not apply to the 4.19-stable tree.
If someone wants it applied there, or to any other stable or longterm
tree, then please email the backport, including the original git commit
id to <stable(a)vger.kernel.org>.
thanks,
greg k-h
------------------ original commit in Linus's tree ------------------
From 8bcebc77e85f3d7536f96845a0fe94b1dddb6af0 Mon Sep 17 00:00:00 2001
From: "Steven Rostedt (VMware)" <rostedt(a)goodmis.org>
Date: Mon, 20 Jan 2020 13:07:31 -0500
Subject: [PATCH] tracing: Fix histogram code when expression has same var as
value
While working on a tool to convert SQL syntax into the histogram language of
the kernel, I discovered the following bug:
# echo 'first u64 start_time u64 end_time pid_t pid u64 delta' >> synthetic_events
# echo 'hist:keys=pid:start=common_timestamp' > events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp-$start,start2=$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
Would not display any histograms in the sched_switch histogram side.
But if I were to swap the location of
"delta=common_timestamp-$start" with "start2=$start"
Such that the last line had:
# echo 'hist:keys=next_pid:start2=$start,delta=common_timestamp-$start:onmatch(sched.sched_waking).trace(first,$start2,common_timestamp,next_pid,$delta)' > events/sched/sched_switch/trigger
The histogram works as expected.
What I found out is that the expressions clear out the value once it is
resolved. As the variables are resolved in the order listed, when
processing:
delta=common_timestamp-$start
The $start is cleared. When it gets to "start2=$start", it errors out with
"unresolved symbol" (which is silent as this happens at the location of the
trace), and the histogram is dropped.
When processing the histogram for variable references, instead of adding a
new reference for a variable used twice, use the same reference. That way,
not only is it more efficient, but the order will no longer matter in
processing of the variables.
From Tom Zanussi:
"Just to clarify some more about what the problem was is that without
your patch, we would have two separate references to the same variable,
and during resolve_var_refs(), they'd both want to be resolved
separately, so in this case, since the first reference to start wasn't
part of an expression, it wouldn't get the read-once flag set, so would
be read normally, and then the second reference would do the read-once
read and also be read but using read-once. So everything worked and
you didn't see a problem:
from: start2=$start,delta=common_timestamp-$start
In the second case, when you switched them around, the first reference
would be resolved by doing the read-once, and following that the second
reference would try to resolve and see that the variable had already
been read, so failed as unset, which caused it to short-circuit out and
not do the trigger action to generate the synthetic event:
to: delta=common_timestamp-$start,start2=$start
With your patch, we only have the single resolution which happens
correctly the one time it's resolved, so this can't happen."
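The fix below is an instance of the classic look-up-before-create refcount
pattern. A minimal, generic sketch of that pattern (hypothetical names in
plain C, not the kernel's hist_field code):

#include <stdlib.h>

struct var_ref {
        int refcount;
        int var_idx;            /* identity of the referenced variable */
};

struct ref_table {
        struct var_ref *refs[16];
        int n_refs;
};

/* Reuse an existing reference to the same variable instead of
 * creating a duplicate; each caller takes one refcount. */
static struct var_ref *get_or_create_ref(struct ref_table *t, int var_idx)
{
        struct var_ref *ref;
        int i;

        for (i = 0; i < t->n_refs; i++) {
                if (t->refs[i]->var_idx == var_idx) {
                        t->refs[i]->refcount++;
                        return t->refs[i];
                }
        }

        if (t->n_refs >= 16)
                return NULL;
        ref = calloc(1, sizeof(*ref));
        if (!ref)
                return NULL;
        ref->refcount = 1;
        ref->var_idx = var_idx;
        t->refs[t->n_refs++] = ref;
        return ref;
}

/* Only the last put actually frees the shared reference. */
static void put_ref(struct var_ref *ref)
{
        if (--ref->refcount == 0)
                free(ref);
}

With a single shared reference, the resolve order of the variables no
longer matters, which is exactly what the patch achieves for $start above.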
Link: https://lore.kernel.org/r/20200116154216.58ca08eb@gandalf.local.home
Cc: stable(a)vger.kernel.org
Fixes: 067fe038e70f6 ("tracing: Add variable reference handling to hist triggers")
Reviewed-by: Tom Zanussi <zanussi(a)kernel.org>
Tested-by: Tom Zanussi <zanussi(a)kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt(a)goodmis.org>
diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
index d33b046f985a..6ac35b9e195d 100644
--- a/kernel/trace/trace_events_hist.c
+++ b/kernel/trace/trace_events_hist.c
@@ -116,6 +116,7 @@ struct hist_field {
struct ftrace_event_field *field;
unsigned long flags;
hist_field_fn_t fn;
+ unsigned int ref;
unsigned int size;
unsigned int offset;
unsigned int is_signed;
@@ -2427,8 +2428,16 @@ static int contains_operator(char *str)
return field_op;
}
+static void get_hist_field(struct hist_field *hist_field)
+{
+ hist_field->ref++;
+}
+
static void __destroy_hist_field(struct hist_field *hist_field)
{
+ if (--hist_field->ref > 1)
+ return;
+
kfree(hist_field->var.name);
kfree(hist_field->name);
kfree(hist_field->type);
@@ -2470,6 +2479,8 @@ static struct hist_field *create_hist_field(struct hist_trigger_data *hist_data,
if (!hist_field)
return NULL;
+ hist_field->ref = 1;
+
hist_field->hist_data = hist_data;
if (flags & HIST_FIELD_FL_EXPR || flags & HIST_FIELD_FL_ALIAS)
@@ -2665,6 +2676,17 @@ static struct hist_field *create_var_ref(struct hist_trigger_data *hist_data,
{
unsigned long flags = HIST_FIELD_FL_VAR_REF;
struct hist_field *ref_field;
+ int i;
+
+ /* Check if the variable already exists */
+ for (i = 0; i < hist_data->n_var_refs; i++) {
+ ref_field = hist_data->var_refs[i];
+ if (ref_field->var.idx == var_field->var.idx &&
+ ref_field->var.hist_data == var_field->hist_data) {
+ get_hist_field(ref_field);
+ return ref_field;
+ }
+ }
ref_field = create_hist_field(var_field->hist_data, NULL, flags, NULL);
if (ref_field) {
Since commit a49bd4d71637 ("mm, numa: rework do_pages_move"),
the semantics of move_pages() have changed to return the number of
non-migrated pages when the failures have non-fatal causes (usually a
busy page). This was an unintentional change that went unnoticed except
by LTP tests, which check for the documented behavior.
There are two ways to handle this change. We could go back to the
original behavior and return -EAGAIN whenever migrate_pages is not able
to migrate pages due to non-fatal reasons. Alternatively, we could keep
the changed semantics and extend the move_pages documentation to clarify
that -errno is returned on an invalid input or when migration simply
cannot succeed (e.g. -ENOMEM, -EBUSY), while a positive value is the
number of pages that could not be migrated due to ephemeral reasons
(e.g. the page is pinned or locked for other reasons).
This patch implements the second option, because this behavior has been
in place for some time without anybody complaining, and new users may
already depend on it. It also allows for slightly easier error handling,
as the caller knows that it is worth retrying when err > 0.
But since do_pages_move() now aborts immediately when migration fails
for ephemeral reasons, the number of non-attempted pages needs to be
included in the return value as well.
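For callers, the settled semantics look like this. A minimal sketch using
libnuma's move_pages(3) wrapper (link with -lnuma; the target node and the
single-page setup are illustrative assumptions):

#include <numaif.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(void)
{
        long page_size = sysconf(_SC_PAGESIZE);
        void *page;
        void *pages[1];
        int nodes[1] = { 0 };           /* assumed target: node 0 */
        int status[1];
        long ret;

        if (posix_memalign(&page, page_size, page_size))
                return 1;
        *(volatile char *)page = 1;     /* fault the page in */
        pages[0] = page;

        ret = move_pages(0 /* self */, 1, pages, nodes, status,
                         MPOL_MF_MOVE);
        if (ret < 0)
                perror("move_pages");   /* invalid input or fatal error */
        else if (ret > 0)               /* ephemeral failures: retryable */
                fprintf(stderr, "%ld page(s) not migrated\n", ret);
        else
                printf("page is now on node %d\n", status[0]);

        free(page);
        return 0;
}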
Fixes: a49bd4d71637 ("mm, numa: rework do_pages_move")
Suggested-by: Michal Hocko <mhocko(a)suse.com>
Acked-by: Michal Hocko <mhocko(a)suse.com>
Cc: Wei Yang <richardw.yang(a)linux.intel.com>
Cc: <stable(a)vger.kernel.org> [4.17+]
Signed-off-by: Yang Shi <yang.shi(a)linux.alibaba.com>
---
v4: Fixed some typos and grammar errors caught by Willy
v3: Rephrased the commit log per Michal and added Michal's Acked-by
v2: Rebased on top of the latest mainline kernel per Andrew
mm/migrate.c | 25 +++++++++++++++++++++++--
1 file changed, 23 insertions(+), 2 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index 86873b6..2530860 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1627,8 +1627,19 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
start = i;
} else if (node != current_node) {
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ /*
+ * Positive err means the number of failed
+ * pages to migrate. Since we are going to
+ * abort and return the number of non-migrated
+ * pages, we need to include the rest of the
+ * nr_pages that have not been attempted as
+ * well.
+ */
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
err = store_status(status, start, current_node, i - start);
if (err)
goto out;
@@ -1659,8 +1670,11 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
goto out_flush;
err = do_move_pages_to_node(mm, &pagelist, current_node);
- if (err)
+ if (err) {
+ if (err > 0)
+ err += nr_pages - i - 1;
goto out;
+ }
if (i > start) {
err = store_status(status, start, current_node, i - start);
if (err)
@@ -1674,6 +1688,13 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
/* Make sure we do not overwrite the existing error */
err1 = do_move_pages_to_node(mm, &pagelist, current_node);
+ /*
+ * Don't have to report non-attempted pages here since:
+ * - If the above loop is done gracefully all pages have been
+ * attempted.
+ * - If the above loop is aborted it means a fatal error
+ * happened, and we should return err.
+ */
if (!err1)
err1 = store_status(status, start, current_node, i - start);
if (!err)
--
1.8.3.1
From: Suren Baghdasaryan <surenb(a)google.com>
When an ashmem file is mmapped, the resulting vma->vm_file points to the
backing shmem file with the generic fops that do not check ashmem
permissions like the fops of ashmem do. If an mremap is done on the ashmem
region, then the permission checks will be skipped. Fix that by disallowing
mapping operations on the backing shmem file.
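The hunks below implement this with a "clone the fops, override selected
handlers" pattern. A stripped-down sketch of the idea (hypothetical names,
not the ashmem code itself):

#include <linux/fs.h>

/* Refuse direct mmap of the backing file so that every mapping has
 * to go through the wrapper that enforces the driver's checks. */
static int guarded_mmap(struct file *file, struct vm_area_struct *vma)
{
        return -EPERM;
}

static void install_guarded_fops(struct file *backing)
{
        static struct file_operations guarded_fops;

        if (!guarded_fops.mmap) {
                /* First use: inherit everything, then override mmap */
                guarded_fops = *backing->f_op;
                guarded_fops.mmap = guarded_mmap;
        }
        backing->f_op = &guarded_fops;
}

Keeping a single static copy avoids allocating a file_operations table per
file; it is safe here only on the assumption that every backing shmem file
starts out with the same f_op.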
Reported-by: Jann Horn <jannh(a)google.com>
Signed-off-by: Suren Baghdasaryan <surenb(a)google.com>
Cc: stable <stable(a)vger.kernel.org> # 4.4,4.9,4.14,4.18,5.4
Signed-off-by: Todd Kjos <tkjos(a)google.com>
---
drivers/staging/android/ashmem.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
v2: update commit message as suggested by joelaf(a)google.com.
diff --git a/drivers/staging/android/ashmem.c b/drivers/staging/android/ashmem.c
index 74d497d39c5a..c6695354b123 100644
--- a/drivers/staging/android/ashmem.c
+++ b/drivers/staging/android/ashmem.c
@@ -351,8 +351,23 @@ static inline vm_flags_t calc_vm_may_flags(unsigned long prot)
_calc_vm_trans(prot, PROT_EXEC, VM_MAYEXEC);
}
+static int ashmem_vmfile_mmap(struct file *file, struct vm_area_struct *vma)
+{
+ /* do not allow to mmap ashmem backing shmem file directly */
+ return -EPERM;
+}
+
+static unsigned long
+ashmem_vmfile_get_unmapped_area(struct file *file, unsigned long addr,
+ unsigned long len, unsigned long pgoff,
+ unsigned long flags)
+{
+ return current->mm->get_unmapped_area(file, addr, len, pgoff, flags);
+}
+
static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
{
+ static struct file_operations vmfile_fops;
struct ashmem_area *asma = file->private_data;
int ret = 0;
@@ -393,6 +408,19 @@ static int ashmem_mmap(struct file *file, struct vm_area_struct *vma)
}
vmfile->f_mode |= FMODE_LSEEK;
asma->file = vmfile;
+ /*
+ * Override the mmap operation of the vmfile so that it can't be
+ * remapped, which would lead to creation of a new vma with no
+ * asma permission checks. We have to override get_unmapped_area
+ * as well to prevent a VM_BUG_ON check for f_ops modification.
+ */
+ if (!vmfile_fops.mmap) {
+ vmfile_fops = *vmfile->f_op;
+ vmfile_fops.mmap = ashmem_vmfile_mmap;
+ vmfile_fops.get_unmapped_area =
+ ashmem_vmfile_get_unmapped_area;
+ }
+ vmfile->f_op = &vmfile_fops;
}
get_file(asma->file);
--
2.25.0.341.g760bfbb309-goog