From: David Hildenbrand <david(a)redhat.com>
Subject: kasan: fix memory hotplug during boot
Using module_init() is wrong. E.g. ACPI adds and onlines memory before
our memory notifier gets registered.
This makes sure that ACPI memory detected during boot up will not result
in a kernel crash.
Easily reproducible with QEMU, just specify a DIMM when starting up.
Link: http://lkml.kernel.org/r/20180522100756.18478-3-david@redhat.com
Fixes: 786a8959912e ("kasan: disable memory hotplug")
Signed-off-by: David Hildenbrand <david(a)redhat.com>
Acked-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff -puN mm/kasan/kasan.c~kasan-fix-memory-hotplug-during-boot mm/kasan/kasan.c
--- a/mm/kasan/kasan.c~kasan-fix-memory-hotplug-during-boot
+++ a/mm/kasan/kasan.c
@@ -898,5 +898,5 @@ static int __init kasan_memhotplug_init(
return 0;
}
-module_init(kasan_memhotplug_init);
+core_initcall(kasan_memhotplug_init);
#endif
_
From: "Gustavo A. R. Silva" <gustavo(a)embeddedor.com>
Subject: kernel/sys.c: fix potential Spectre v1 issue
`resource' can be controlled by user-space, hence leading to a potential
exploitation of the Spectre variant 1 vulnerability.
This issue was detected with the help of Smatch:
kernel/sys.c:1474 __do_compat_sys_old_getrlimit() warn: potential
spectre issue 'get_current()->signal->rlim' (local cap)
kernel/sys.c:1455 __do_sys_old_getrlimit() warn: potential spectre issue
'get_current()->signal->rlim' (local cap)
Fix this by sanitizing *resource* before using it to index
current->signal->rlim
Notice that given that speculation windows are large, the policy is to
kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
[1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2
Link: http://lkml.kernel.org/r/20180515030038.GA11822@embeddedor.com
Signed-off-by: Gustavo A. R. Silva <gustavo(a)embeddedor.com>
Reviewed-by: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Alexei Starovoitov <ast(a)kernel.org>
Cc: Dan Williams <dan.j.williams(a)intel.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Peter Zijlstra <peterz(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
kernel/sys.c | 5 +++++
1 file changed, 5 insertions(+)
diff -puN kernel/sys.c~kernel-sys-fix-potential-spectre-v1 kernel/sys.c
--- a/kernel/sys.c~kernel-sys-fix-potential-spectre-v1
+++ a/kernel/sys.c
@@ -71,6 +71,9 @@
#include <asm/io.h>
#include <asm/unistd.h>
+/* Hardening for Spectre-v1 */
+#include <linux/nospec.h>
+
#include "uid16.h"
#ifndef SET_UNALIGN_CTL
@@ -1453,6 +1456,7 @@ SYSCALL_DEFINE2(old_getrlimit, unsigned
if (resource >= RLIM_NLIMITS)
return -EINVAL;
+ resource = array_index_nospec(resource, RLIM_NLIMITS);
task_lock(current->group_leader);
x = current->signal->rlim[resource];
task_unlock(current->group_leader);
@@ -1472,6 +1476,7 @@ COMPAT_SYSCALL_DEFINE2(old_getrlimit, un
if (resource >= RLIM_NLIMITS)
return -EINVAL;
+ resource = array_index_nospec(resource, RLIM_NLIMITS);
task_lock(current->group_leader);
r = current->signal->rlim[resource];
task_unlock(current->group_leader);
_
From: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Subject: mm/kasan: don't vfree() nonexistent vm_area
KASAN uses different routines to map shadow for hot added memory and
memory obtained in boot process. Attempt to offline memory onlined by
normal boot process leads to this:
Trying to vfree() nonexistent vm area (000000005d3b34b9)
WARNING: CPU: 2 PID: 13215 at mm/vmalloc.c:1525 __vunmap+0x147/0x190
Call Trace:
kasan_mem_notifier+0xad/0xb9
notifier_call_chain+0x166/0x260
__blocking_notifier_call_chain+0xdb/0x140
__offline_pages+0x96a/0xb10
memory_subsys_offline+0x76/0xc0
device_offline+0xb8/0x120
store_mem_state+0xfa/0x120
kernfs_fop_write+0x1d5/0x320
__vfs_write+0xd4/0x530
vfs_write+0x105/0x340
SyS_write+0xb0/0x140
Obviously we can't call vfree() to free memory that wasn't allocated via
vmalloc(). Use find_vm_area() to see if we can call vfree().
Unfortunately it's a bit tricky to properly unmap and free shadow
allocated during boot, so we'll have to keep it. If memory will come
online again that shadow will be reused.
Matthew asked: how can you call vfree() on something that isn't a
vmalloc address?
vfree() is able to free any address returned by
__vmalloc_node_range(). And __vmalloc_node_range() gives you any
address you ask. It doesn't have to be an address in [VMALLOC_START,
VMALLOC_END] range.
That's also how the module_alloc()/module_memfree() works on
architectures that have designated area for modules.
[aryabinin(a)virtuozzo.com: improve comments]
Link: http://lkml.kernel.org/r/dabee6ab-3a7a-51cd-3b86-5468718e0390@virtuozzo.com
[akpm(a)linux-foundation.org: fix typos, reflow comment]
Link: http://lkml.kernel.org/r/20180201163349.8700-1-aryabinin@virtuozzo.com
Fixes: fa69b5989bb0 ("mm/kasan: add support for memory hotplug")
Signed-off-by: Andrey Ryabinin <aryabinin(a)virtuozzo.com>
Reported-by: Paul Menzel <pmenzel+linux-kasan-dev(a)molgen.mpg.de>
Cc: Alexander Potapenko <glider(a)google.com>
Cc: Dmitry Vyukov <dvyukov(a)google.com>
Cc: Matthew Wilcox <willy(a)infradead.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/kasan/kasan.c | 63 +++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 61 insertions(+), 2 deletions(-)
diff -puN mm/kasan/kasan.c~mm-kasan-dont-vfree-nonexistent-vm_area mm/kasan/kasan.c
--- a/mm/kasan/kasan.c~mm-kasan-dont-vfree-nonexistent-vm_area
+++ a/mm/kasan/kasan.c
@@ -792,6 +792,40 @@ DEFINE_ASAN_SET_SHADOW(f5);
DEFINE_ASAN_SET_SHADOW(f8);
#ifdef CONFIG_MEMORY_HOTPLUG
+static bool shadow_mapped(unsigned long addr)
+{
+ pgd_t *pgd = pgd_offset_k(addr);
+ p4d_t *p4d;
+ pud_t *pud;
+ pmd_t *pmd;
+ pte_t *pte;
+
+ if (pgd_none(*pgd))
+ return false;
+ p4d = p4d_offset(pgd, addr);
+ if (p4d_none(*p4d))
+ return false;
+ pud = pud_offset(p4d, addr);
+ if (pud_none(*pud))
+ return false;
+
+ /*
+ * We can't use pud_large() or pud_huge(), the first one is
+ * arch-specific, the last one depends on HUGETLB_PAGE. So let's abuse
+ * pud_bad(), if pud is bad then it's bad because it's huge.
+ */
+ if (pud_bad(*pud))
+ return true;
+ pmd = pmd_offset(pud, addr);
+ if (pmd_none(*pmd))
+ return false;
+
+ if (pmd_bad(*pmd))
+ return true;
+ pte = pte_offset_kernel(pmd, addr);
+ return !pte_none(*pte);
+}
+
static int __meminit kasan_mem_notifier(struct notifier_block *nb,
unsigned long action, void *data)
{
@@ -813,6 +847,14 @@ static int __meminit kasan_mem_notifier(
case MEM_GOING_ONLINE: {
void *ret;
+ /*
+ * If shadow is mapped already than it must have been mapped
+ * during the boot. This could happen if we onlining previously
+ * offlined memory.
+ */
+ if (shadow_mapped(shadow_start))
+ return NOTIFY_OK;
+
ret = __vmalloc_node_range(shadow_size, PAGE_SIZE, shadow_start,
shadow_end, GFP_KERNEL,
PAGE_KERNEL, VM_NO_GUARD,
@@ -824,8 +866,25 @@ static int __meminit kasan_mem_notifier(
kmemleak_ignore(ret);
return NOTIFY_OK;
}
- case MEM_OFFLINE:
- vfree((void *)shadow_start);
+ case MEM_OFFLINE: {
+ struct vm_struct *vm;
+
+ /*
+ * shadow_start was either mapped during boot by kasan_init()
+ * or during memory online by __vmalloc_node_range().
+ * In the latter case we can use vfree() to free shadow.
+ * Non-NULL result of the find_vm_area() will tell us if
+ * that was the second case.
+ *
+ * Currently it's not possible to free shadow mapped
+ * during boot by kasan_init(). It's because the code
+ * to do that hasn't been written yet. So we'll just
+ * leak the memory.
+ */
+ vm = find_vm_area((void *)shadow_start);
+ if (vm)
+ vfree((void *)shadow_start);
+ }
}
return NOTIFY_OK;
_
From: Davidlohr Bueso <dave(a)stgolabs.net>
Subject: ipc/shm: fix shmat() nil address after round-down when remapping
shmat()'s SHM_REMAP option forbids passing a nil address for; this is in
fact the very first thing we check for. Andrea reported that for
SHM_RND|SHM_REMAP cases we can end up bypassing the initial addr check,
but we need to check again if the address was rounded down to nil. As of
this patch, such cases will return -EINVAL.
Link: http://lkml.kernel.org/r/20180503204934.kk63josdu6u53fbd@linux-n805
Signed-off-by: Davidlohr Bueso <dbueso(a)suse.de>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Joe Lawrence <joe.lawrence(a)redhat.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff -puN ipc/shm.c~ipc-shm-fix-shmat-nil-address-after-round-down-when-remapping ipc/shm.c
--- a/ipc/shm.c~ipc-shm-fix-shmat-nil-address-after-round-down-when-remapping
+++ a/ipc/shm.c
@@ -1363,9 +1363,17 @@ long do_shmat(int shmid, char __user *sh
if (addr) {
if (addr & (shmlba - 1)) {
- if (shmflg & SHM_RND)
+ if (shmflg & SHM_RND) {
addr &= ~(shmlba - 1); /* round down */
- else
+
+ /*
+ * Ensure that the round-down is non-nil
+ * when remapping. This can happen for
+ * cases when addr < shmlba.
+ */
+ if (!addr && (shmflg & SHM_REMAP))
+ goto out;
+ } else
#ifndef __ARCH_FORCE_SHMLBA
if (addr & ~PAGE_MASK)
#endif
_
From: Davidlohr Bueso <dave(a)stgolabs.net>
Subject: Revert "ipc/shm: Fix shmat mmap nil-page protection"
Patch series "ipc/shm: shmat() fixes around nil-page".
These patches fix two issues reported[1] a while back by Joe and Andrea
around how shmat(2) behaves with nil-page.
The first reverts a commit that it was incorrectly thought that mapping
nil-page (address=0) was a no no with MAP_FIXED. This is not the case,
with the exception of SHM_REMAP; which is address in the second patch.
I chose two patches because it is easier to backport and it explicitly
reverts bogus behaviour. Both patches ought to be in -stable and ltp
testcases need updated (the added testcase around the cve can be modified
to just test for SHM_RND|SHM_REMAP).
[1] lkml.kernel.org/r/20180430172152.nfa564pvgpk3ut7p@linux-n805
This patch (of 2):
95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection") worked on the
idea that we should not be mapping as root addr=0 and MAP_FIXED. However,
it was reported that this scenario is in fact valid, thus making the patch
both bogus and breaks userspace as well. For example X11's libint10.so
relies on shmat(1, SHM_RND) for lowmem initialization[1].
[1] https://cgit.freedesktop.org/xorg/xserver/tree/hw/xfree86/os-support/linux/…
Link: http://lkml.kernel.org/r/20180503203243.15045-2-dave@stgolabs.net
Fixes: 95e91b831f87 ("ipc/shm: Fix shmat mmap nil-page protection")
Signed-off-by: Davidlohr Bueso <dbueso(a)suse.de>
Reported-by: Joe Lawrence <joe.lawrence(a)redhat.com>
Reported-by: Andrea Arcangeli <aarcange(a)redhat.com>
Cc: Manfred Spraul <manfred(a)colorfullife.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
ipc/shm.c | 9 ++-------
1 file changed, 2 insertions(+), 7 deletions(-)
diff -puN ipc/shm.c~revert-ipc-shm-fix-shmat-mmap-nil-page-protection ipc/shm.c
--- a/ipc/shm.c~revert-ipc-shm-fix-shmat-mmap-nil-page-protection
+++ a/ipc/shm.c
@@ -1363,13 +1363,8 @@ long do_shmat(int shmid, char __user *sh
if (addr) {
if (addr & (shmlba - 1)) {
- /*
- * Round down to the nearest multiple of shmlba.
- * For sane do_mmap_pgoff() parameters, avoid
- * round downs that trigger nil-page and MAP_FIXED.
- */
- if ((shmflg & SHM_RND) && addr >= shmlba)
- addr &= ~(shmlba - 1);
+ if (shmflg & SHM_RND)
+ addr &= ~(shmlba - 1); /* round down */
else
#ifndef __ARCH_FORCE_SHMLBA
if (addr & ~PAGE_MASK)
_
From: Matthew Wilcox <mawilcox(a)microsoft.com>
Subject: idr: fix invalid ptr dereference on item delete
If the radix tree underlying the IDR happens to be full and we attempt to
remove an id which is larger than any id in the IDR, we will call
__radix_tree_delete() with an uninitialised 'slot' pointer, at which point
anything could happen. This was easiest to hit with a single entry at id
0 and attempting to remove a non-0 id, but it could have happened with 64
entries and attempting to remove an id >= 64.
Roman said:
The syzcaller test boils down to opening /dev/kvm, creating an
eventfd, and calling a couple of KVM ioctls. None of this requires
superuser. And the result is dereferencing an uninitialized pointer
which is likely a crash. The specific path caught by syzbot is via
KVM_HYPERV_EVENTD ioctl which is new in 4.17. But I guess there are
other user-triggerable paths, so cc:stable is probably justified.
Matthew added:
We have around 250 calls to idr_remove() in the kernel today. Many
of them pass an ID which is embedded in the object they're removing,
so they're safe. Picking a few likely candidates:
drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
drivers/atm/nicstar.c could be taken down by a handcrafted packet
Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Reported-by: <syzbot+35666cba7f0a337e2e79(a)syzkaller.appspotmail.com>
Debugged-by: Roman Kagan <rkagan(a)virtuozzo.com>
Signed-off-by: Matthew Wilcox <mawilcox(a)microsoft.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
lib/radix-tree.c | 4 +++-
tools/testing/radix-tree/idr-test.c | 7 +++++++
2 files changed, 10 insertions(+), 1 deletion(-)
diff -puN lib/radix-tree.c~idr-fix-invalid-ptr-dereference-on-item-delete lib/radix-tree.c
--- a/lib/radix-tree.c~idr-fix-invalid-ptr-dereference-on-item-delete
+++ a/lib/radix-tree.c
@@ -2034,10 +2034,12 @@ void *radix_tree_delete_item(struct radi
unsigned long index, void *item)
{
struct radix_tree_node *node = NULL;
- void __rcu **slot;
+ void __rcu **slot = NULL;
void *entry;
entry = __radix_tree_lookup(root, index, &node, &slot);
+ if (!slot)
+ return NULL;
if (!entry && (!is_idr(root) || node_tag_get(root, node, IDR_FREE,
get_slot_offset(node, slot))))
return NULL;
diff -puN tools/testing/radix-tree/idr-test.c~idr-fix-invalid-ptr-dereference-on-item-delete tools/testing/radix-tree/idr-test.c
--- a/tools/testing/radix-tree/idr-test.c~idr-fix-invalid-ptr-dereference-on-item-delete
+++ a/tools/testing/radix-tree/idr-test.c
@@ -252,6 +252,13 @@ void idr_checks(void)
idr_remove(&idr, 3);
idr_remove(&idr, 0);
+ assert(idr_alloc(&idr, DUMMY_PTR, 0, 0, GFP_KERNEL) == 0);
+ idr_remove(&idr, 1);
+ for (i = 1; i < RADIX_TREE_MAP_SIZE; i++)
+ assert(idr_alloc(&idr, DUMMY_PTR, 0, 0, GFP_KERNEL) == i);
+ idr_remove(&idr, 1 << 30);
+ idr_destroy(&idr);
+
for (i = INT_MAX - 3UL; i < INT_MAX + 1UL; i++) {
struct item *item = item_create(i, 0);
assert(idr_alloc(&idr, item, i, i + 10, GFP_KERNEL) == i);
_
The patch titled
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
has been added to the -mm tree. Its filename is
mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-page_alloc-do-not-break-__gfp_t…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-page_alloc-do-not-break-__gfp_t…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Vlastimil Babka <vbabka(a)suse.cz>
Subject: mm, page_alloc: do not break __GFP_THISNODE by zonelist reset
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE allocations
that want to allocate on a different node, e.g. because the allocating
thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page is
put on wrong node's list, then further list manipulations may corrupt the
list because page_to_nid() is used to determine which node's list_lock
should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit
511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There is
actually no reason to reset it, because memory policies and cpusets don't
affect the zonelist choice in the first place. This was different when
commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's anything
currently broken. Stable backports should be more important, but will
have to be reviewed carefully, as the code went through many changes. BTW
I think that also the ac->preferred_zoneref reset is currently useless if
we don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz
Signed-off-by: Vlastimil Babka <vbabka(a)suse.cz>
Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK")
Cc: Mel Gorman <mgorman(a)techsingularity.net>
Cc: Michal Hocko <mhocko(a)kernel.org>
Cc: David Rientjes <rientjes(a)google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim(a)lge.com>
Cc: Vlastimil Babka <vbabka(a)suse.cz>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
mm/page_alloc.c | 1 -
1 file changed, 1 deletion(-)
diff -puN mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset mm/page_alloc.c
--- a/mm/page_alloc.c~mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset
+++ a/mm/page_alloc.c
@@ -4169,7 +4169,6 @@ retry:
* orientated.
*/
if (!(alloc_flags & ALLOC_CPUSET) || reserve_flags) {
- ac->zonelist = node_zonelist(numa_node_id(), gfp_mask);
ac->preferred_zoneref = first_zones_zonelist(ac->zonelist,
ac->high_zoneidx, ac->nodemask);
}
_
Patches currently in -mm which might be from vbabka(a)suse.cz are
mm-page_alloc-do-not-break-__gfp_thisnode-by-zonelist-reset.patch
This is a note to let you know that I've just added the patch titled
phy: qcom-qusb2: Fix crash if nvmem cell not specified
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-next branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will also be merged in the next major kernel release
during the merge window.
If you have any questions about this process, please let me know.
>From 0b4555e776ba0712c6fafb98b226b21fd05d2427 Mon Sep 17 00:00:00 2001
From: Manu Gautam <mgautam(a)codeaurora.org>
Date: Thu, 3 May 2018 02:36:10 +0530
Subject: phy: qcom-qusb2: Fix crash if nvmem cell not specified
Driver currently crashes due to NULL pointer deference
while updating PHY tune register if nvmem cell is NULL.
Since, fused value for Tune1/2 register is optional,
we'd rather bail out.
Fixes: ca04d9d3e1b1 ("phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips")
Reviewed-by: Vivek Gautam <vivek.gautam(a)codeaurora.org>
Reviewed-by: Evan Green <evgreen(a)chromium.org>
Cc: stable <stable(a)vger.kernel.org> # 4.14+
Signed-off-by: Manu Gautam <mgautam(a)codeaurora.org>
Signed-off-by: Kishon Vijay Abraham I <kishon(a)ti.com>
---
drivers/phy/qualcomm/phy-qcom-qusb2.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/phy/qualcomm/phy-qcom-qusb2.c b/drivers/phy/qualcomm/phy-qcom-qusb2.c
index 94afeac1a19e..40fdef8b5b75 100644
--- a/drivers/phy/qualcomm/phy-qcom-qusb2.c
+++ b/drivers/phy/qualcomm/phy-qcom-qusb2.c
@@ -315,6 +315,10 @@ static void qusb2_phy_set_tune2_param(struct qusb2_phy *qphy)
const struct qusb2_phy_cfg *cfg = qphy->cfg;
u8 *val;
+ /* efuse register is optional */
+ if (!qphy->cell)
+ return;
+
/*
* Read efuse register having TUNE2/1 parameter's high nibble.
* If efuse register shows value as 0x0, or if we fail to find
--
2.17.0
ext4_resize_fs() has an off-by-one bug when checking whether growing of
a filesystem will not overflow inode count. As a result it allows a
filesystem with 8192 inodes per group to grow to 64TB which overflows
inode count to 0 and makes filesystem unusable. Fix it.
CC: stable(a)vger.kernel.org
Fixes: 3f8a6411fbada1fa482276591e037f3b1adcf55b
Reported-by: Jaco Kroon <jaco(a)uls.co.za>
Signed-off-by: Jan Kara <jack(a)suse.cz>
---
fs/ext4/resize.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index b6bec270a8e4..d792b7689d92 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -1933,7 +1933,7 @@ int ext4_resize_fs(struct super_block *sb, ext4_fsblk_t n_blocks_count)
return 0;
n_group = ext4_get_group_number(sb, n_blocks_count - 1);
- if (n_group > (0xFFFFFFFFUL / EXT4_INODES_PER_GROUP(sb))) {
+ if (n_group >= (0xFFFFFFFFUL / EXT4_INODES_PER_GROUP(sb))) {
ext4_warning(sb, "resize would cause inodes_count overflow");
return -EINVAL;
}
--
2.13.6
This is a note to let you know that I've just added the patch titled
phy: qcom-qusb2: Fix crash if nvmem cell not specified
to my usb git tree which can be found at
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git
in the usb-testing branch.
The patch will show up in the next release of the linux-next tree
(usually sometime within the next 24 hours during the week.)
The patch will be merged to the usb-next branch sometime soon,
after it passes testing, and the merge window is open.
If you have any questions about this process, please let me know.
>From 0b4555e776ba0712c6fafb98b226b21fd05d2427 Mon Sep 17 00:00:00 2001
From: Manu Gautam <mgautam(a)codeaurora.org>
Date: Thu, 3 May 2018 02:36:10 +0530
Subject: phy: qcom-qusb2: Fix crash if nvmem cell not specified
Driver currently crashes due to NULL pointer deference
while updating PHY tune register if nvmem cell is NULL.
Since, fused value for Tune1/2 register is optional,
we'd rather bail out.
Fixes: ca04d9d3e1b1 ("phy: qcom-qusb2: New driver for QUSB2 PHY on Qcom chips")
Reviewed-by: Vivek Gautam <vivek.gautam(a)codeaurora.org>
Reviewed-by: Evan Green <evgreen(a)chromium.org>
Cc: stable <stable(a)vger.kernel.org> # 4.14+
Signed-off-by: Manu Gautam <mgautam(a)codeaurora.org>
Signed-off-by: Kishon Vijay Abraham I <kishon(a)ti.com>
---
drivers/phy/qualcomm/phy-qcom-qusb2.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/drivers/phy/qualcomm/phy-qcom-qusb2.c b/drivers/phy/qualcomm/phy-qcom-qusb2.c
index 94afeac1a19e..40fdef8b5b75 100644
--- a/drivers/phy/qualcomm/phy-qcom-qusb2.c
+++ b/drivers/phy/qualcomm/phy-qcom-qusb2.c
@@ -315,6 +315,10 @@ static void qusb2_phy_set_tune2_param(struct qusb2_phy *qphy)
const struct qusb2_phy_cfg *cfg = qphy->cfg;
u8 *val;
+ /* efuse register is optional */
+ if (!qphy->cell)
+ return;
+
/*
* Read efuse register having TUNE2/1 parameter's high nibble.
* If efuse register shows value as 0x0, or if we fail to find
--
2.17.0
From: Chintan Pandya <cpandya(a)codeaurora.org>
The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.
1. ioremap with 4K size, a valid pte page table is set.
2. iounmap it, its pte entry is set to 0.
3. ioremap the same address with 2M size, update its pmd entry with
a new value.
4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling to pte mappings in the above
case on ARM64.
To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.
Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that TLB purge can be added later in seprate patches.
[toshi(a)hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpandya(a)codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: <stable(a)vger.kernel.org>
---
arch/arm64/mm/mmu.c | 4 ++--
arch/x86/mm/pgtable.c | 8 +++++---
include/asm-generic/pgtable.h | 8 ++++----
lib/ioremap.c | 4 ++--
4 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9..da98828 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
}
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return pud_none(*pud);
}
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return pmd_none(*pmd);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13..37e3cba 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -718,11 +718,12 @@ int pmd_clear_huge(pmd_t *pmd)
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
*
* Context: The pud range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
pmd_t *pmd;
int i;
@@ -733,7 +734,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i]))
+ if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
pud_clear(pud);
@@ -745,11 +746,12 @@ int pud_free_pmd_page(pud_t *pud)
/**
* pmd_free_pte_page - Clear pmd entry and free pte page.
* @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
*
* Context: The pmd range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
pte_t *pte;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639a..b081794 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ static inline int p4d_clear_huge(p4d_t *p4d)
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
{
return 0;
}
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return 0;
}
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return 0;
}
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bba..517f585 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
- pmd_free_pte_page(pmd)) {
+ pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
- pud_free_pmd_page(pud)) {
+ pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project
From: Chintan Pandya <cpandya(a)codeaurora.org>
The following kernel panic was observed on ARM64 platform due to a stale
TLB entry.
1. ioremap with 4K size, a valid pte page table is set.
2. iounmap it, its pte entry is set to 0.
3. ioremap the same address with 2M size, update its pmd entry with
a new value.
4. CPU may hit an exception because the old pmd entry is still in TLB,
which leads to a kernel panic.
Commit b6bdb7517c3d ("mm/vmalloc: add interfaces to free unmapped page
table") has addressed this panic by falling to pte mappings in the above
case on ARM64.
To support pmd mappings in all cases, TLB purge needs to be performed
in this case on ARM64.
Add a new arg, 'addr', to pud_free_pmd_page() and pmd_free_pte_page()
so that TLB purge can be added later in seprate patches.
[toshi(a)hpe.com: merge changes, rewrite patch description]
Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Chintan Pandya <cpandya(a)codeaurora.org>
Signed-off-by: Toshi Kani <toshi.kani(a)hpe.com>
Cc: Andrew Morton <akpm(a)linux-foundation.org>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Thomas Gleixner <tglx(a)linutronix.de>
Cc: Ingo Molnar <mingo(a)redhat.com>
Cc: "H. Peter Anvin" <hpa(a)zytor.com>
Cc: Will Deacon <will.deacon(a)arm.com>
Cc: Joerg Roedel <joro(a)8bytes.org>
Cc: <stable(a)vger.kernel.org>
---
arch/arm64/mm/mmu.c | 4 ++--
arch/x86/mm/pgtable.c | 8 +++++---
include/asm-generic/pgtable.h | 8 ++++----
lib/ioremap.c | 4 ++--
4 files changed, 13 insertions(+), 11 deletions(-)
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 2dbb2c9..da98828 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -973,12 +973,12 @@ int pmd_clear_huge(pmd_t *pmdp)
return 1;
}
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return pud_none(*pud);
}
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return pmd_none(*pmd);
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ffc8c13..37e3cba 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -718,11 +718,12 @@ int pmd_clear_huge(pmd_t *pmd)
/**
* pud_free_pmd_page - Clear pud entry and free pmd page.
* @pud: Pointer to a PUD.
+ * @addr: Virtual address associated with pud.
*
* Context: The pud range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pud_free_pmd_page(pud_t *pud)
+int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
pmd_t *pmd;
int i;
@@ -733,7 +734,7 @@ int pud_free_pmd_page(pud_t *pud)
pmd = (pmd_t *)pud_page_vaddr(*pud);
for (i = 0; i < PTRS_PER_PMD; i++)
- if (!pmd_free_pte_page(&pmd[i]))
+ if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
return 0;
pud_clear(pud);
@@ -745,11 +746,12 @@ int pud_free_pmd_page(pud_t *pud)
/**
* pmd_free_pte_page - Clear pmd entry and free pte page.
* @pmd: Pointer to a PMD.
+ * @addr: Virtual address associated with pmd.
*
* Context: The pmd range has been unmaped and TLB purged.
* Return: 1 if clearing the entry succeeded. 0 otherwise.
*/
-int pmd_free_pte_page(pmd_t *pmd)
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
pte_t *pte;
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index f59639a..b081794 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,8 +1019,8 @@ static inline int p4d_clear_huge(p4d_t *p4d)
int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
int pud_clear_huge(pud_t *pud);
int pmd_clear_huge(pmd_t *pmd);
-int pud_free_pmd_page(pud_t *pud);
-int pmd_free_pte_page(pmd_t *pmd);
+int pud_free_pmd_page(pud_t *pud, unsigned long addr);
+int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
#else /* !CONFIG_HAVE_ARCH_HUGE_VMAP */
static inline int p4d_set_huge(p4d_t *p4d, phys_addr_t addr, pgprot_t prot)
{
@@ -1046,11 +1046,11 @@ static inline int pmd_clear_huge(pmd_t *pmd)
{
return 0;
}
-static inline int pud_free_pmd_page(pud_t *pud)
+static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
return 0;
}
-static inline int pmd_free_pte_page(pmd_t *pmd)
+static inline int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
{
return 0;
}
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 54e5bba..517f585 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -92,7 +92,7 @@ static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
if (ioremap_pmd_enabled() &&
((next - addr) == PMD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
- pmd_free_pte_page(pmd)) {
+ pmd_free_pte_page(pmd, addr)) {
if (pmd_set_huge(pmd, phys_addr + addr, prot))
continue;
}
@@ -119,7 +119,7 @@ static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
if (ioremap_pud_enabled() &&
((next - addr) == PUD_SIZE) &&
IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
- pud_free_pmd_page(pud)) {
+ pud_free_pmd_page(pud, addr)) {
if (pud_set_huge(pud, phys_addr + addr, prot))
continue;
}
--
Qualcomm India Private Limited, on behalf of Qualcomm Innovation
Center, Inc., is a member of Code Aurora Forum, a Linux Foundation
Collaborative Project
As it stands, memory_failure() gets thoroughly confused by dev_pagemap
backed mappings. The recovery code has specific enabling for several
possible page states and needs new enabling to handle poison in dax
mappings.
In order to support reliable reverse mapping of user space addresses add
new locking in the fsdax implementation to prevent races between
page-address_space disassociation events and the rmap performed in the
memory_failure() path. Additionally, since dev_pagemap pages are hidden
from the page allocator, add a mechanism to determine the size of the
mapping that encompasses a given poisoned pfn. Lastly, since pmem errors
can be repaired, change the speculatively accessed poison protection,
mce_unmap_kpfn(), to be reversible and otherwise allow ongoing access
from the kernel.
---
Dan Williams (11):
device-dax: convert to vmf_insert_mixed and vm_fault_t
device-dax: cleanup vm_fault de-reference chains
device-dax: enable page_mapping()
device-dax: set page->index
filesystem-dax: set page->index
filesystem-dax: perform __dax_invalidate_mapping_entry() under the page lock
mm, madvise_inject_error: fix page count leak
x86, memory_failure: introduce {set,clear}_mce_nospec()
mm, memory_failure: pass page size to kill_proc()
mm, memory_failure: teach memory_failure() about dev_pagemap pages
libnvdimm, pmem: restore page attributes when clearing errors
arch/x86/include/asm/set_memory.h | 29 ++++++
arch/x86/kernel/cpu/mcheck/mce-internal.h | 15 ---
arch/x86/kernel/cpu/mcheck/mce.c | 38 +-------
drivers/dax/device.c | 91 ++++++++++++--------
drivers/nvdimm/pmem.c | 26 ++++++
drivers/nvdimm/pmem.h | 13 +++
fs/dax.c | 102 ++++++++++++++++++++--
include/linux/huge_mm.h | 5 +
include/linux/set_memory.h | 14 +++
mm/huge_memory.c | 4 -
mm/madvise.c | 11 ++
mm/memory-failure.c | 133 +++++++++++++++++++++++++++--
12 files changed, 370 insertions(+), 111 deletions(-)
Add a kernel parameter that allows setting UV memory block size. This
is to provide an adjustment for new forms of PMEM and other DIMM memory
that might require alignment restrictions other than scanning the global
address table for the required minimum alignment. The value set will be
further adjusted by both the GAM range table scan as well as restrictions
imposed by set_memory_block_size_order().
Signed-off-by: Mike Travis <mike.travis(a)hpe.com>
Reviewed-by: Andrew Banman <andrew.banman(a)hpe.com>
Cc: stable(a)vger.kernel.org
---
arch/x86/kernel/apic/x2apic_uv_x.c | 11 +++++++++++
1 file changed, 11 insertions(+)
--- linux.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -396,6 +396,17 @@ EXPORT_SYMBOL(uv_hub_info_version);
/* Default UV memory block size is 2GB */
static unsigned long mem_block_size = (2UL << 30);
+/* Kernel parameter to specify UV mem block size */
+static int parse_mem_block_size(char *ptr)
+{
+ unsigned long size = memparse(ptr, NULL);
+
+ /* Size will be rounded down by set_block_size() below */
+ mem_block_size = size;
+ return 0;
+}
+early_param("uv_memblksize", parse_mem_block_size);
+
static __init int adj_blksize(u32 lgre)
{
unsigned long base = (unsigned long)lgre << UV_GAM_RANGE_SHFT;
--
Add a call to the new function to "adjust" the current fixed UV memory
block size of 2GB so it can be changed to a different physical boundary.
This accommodates changes in the Intel BIOS, and therefore UV BIOS,
which now can align boundaries different than the previous UV standard
of 2GB. It also flags any UV Global Address boundaries from BIOS that
cause a change in the mem block size (boundary).
The current boundary of 2GB has been used on UV since the first system
release in 2009 with Linux 2.6 and has worked fine. But the new NVDIMM
persistent memory modules (PMEM), along with the Intel BIOS changes to
support these modules caused the memory block size boundary to be set
to a lower limit. Intel only guarantees that this minimum boundary at
64MB though the current Linux limit is 128MB.
Note that the default remains 2GB if no changes occur.
Signed-off-by: Mike Travis <mike.travis(a)hpe.com>
Reviewed-by: Andrew Banman <andrew.banman(a)hpe.com>
Cc: stable(a)vger.kernel.org
---
v2: Update description
---
arch/x86/kernel/apic/x2apic_uv_x.c | 49 ++++++++++++++++++++++++++++++++++---
1 file changed, 46 insertions(+), 3 deletions(-)
--- linux.orig/arch/x86/kernel/apic/x2apic_uv_x.c
+++ linux/arch/x86/kernel/apic/x2apic_uv_x.c
@@ -26,6 +26,7 @@
#include <linux/delay.h>
#include <linux/crash_dump.h>
#include <linux/reboot.h>
+#include <linux/memory.h>
#include <asm/uv/uv_mmrs.h>
#include <asm/uv/uv_hub.h>
@@ -392,6 +393,40 @@ extern int uv_hub_info_version(void)
}
EXPORT_SYMBOL(uv_hub_info_version);
+/* Default UV memory block size is 2GB */
+static unsigned long mem_block_size = (2UL << 30);
+
+static __init int adj_blksize(u32 lgre)
+{
+ unsigned long base = (unsigned long)lgre << UV_GAM_RANGE_SHFT;
+ unsigned long size;
+
+ for (size = mem_block_size; size > MIN_MEMORY_BLOCK_SIZE; size >>= 1)
+ if (IS_ALIGNED(base, size))
+ break;
+
+ if (size >= mem_block_size)
+ return 0;
+
+ mem_block_size = size;
+ return 1;
+}
+
+static __init void set_block_size(void)
+{
+ unsigned int order = ffs(mem_block_size);
+
+ if (order) {
+ /* adjust for ffs return of 1..64 */
+ set_memory_block_size_order(order - 1);
+ pr_info("UV: mem_block_size set to 0x%lx\n", mem_block_size);
+ } else {
+ /* bad or zero value, default to 1UL << 31 (2GB) */
+ pr_err("UV: mem_block_size error with 0x%lx\n", mem_block_size);
+ set_memory_block_size_order(31);
+ }
+}
+
/* Build GAM range lookup table: */
static __init void build_uv_gr_table(void)
{
@@ -1180,23 +1215,30 @@ static void __init decode_gam_rng_tbl(un
<< UV_GAM_RANGE_SHFT);
int order = 0;
char suffix[] = " KMGTPE";
+ int flag = ' ';
while (size > 9999 && order < sizeof(suffix)) {
size /= 1024;
order++;
}
+ /* adjust max block size to current range start */
+ if (gre->type == 1 || gre->type == 2)
+ if (adj_blksize(lgre))
+ flag = '*';
+
if (!index) {
pr_info("UV: GAM Range Table...\n");
- pr_info("UV: # %20s %14s %5s %4s %5s %3s %2s\n", "Range", "", "Size", "Type", "NASID", "SID", "PN");
+ pr_info("UV: # %20s %14s %6s %4s %5s %3s %2s\n", "Range", "", "Size", "Type", "NASID", "SID", "PN");
}
- pr_info("UV: %2d: 0x%014lx-0x%014lx %5lu%c %3d %04x %02x %02x\n",
+ pr_info("UV: %2d: 0x%014lx-0x%014lx%c %5lu%c %3d %04x %02x %02x\n",
index++,
(unsigned long)lgre << UV_GAM_RANGE_SHFT,
(unsigned long)gre->limit << UV_GAM_RANGE_SHFT,
- size, suffix[order],
+ flag, size, suffix[order],
gre->type, gre->nasid, gre->sockid, gre->pnode);
+ /* update to next range start */
lgre = gre->limit;
if (sock_min > gre->sockid)
sock_min = gre->sockid;
@@ -1427,6 +1469,7 @@ static void __init uv_system_init_hub(vo
build_socket_tables();
build_uv_gr_table();
+ set_block_size();
uv_init_hub_info(&hub_info);
uv_possible_blades = num_possible_nodes();
if (!_node_to_pnode)
--
Add a new function to "adjust" the current fixed UV memory block size
of 2GB so it can be changed to a different physical boundary. This is
out of necessity so arch dependent code can accommodate specific BIOS
requirements which can align these new PMEM modules at less than the
default boundaries.
A "set order" type of function was used to insure that the memory block
size will be a power of two value without requiring a validity check.
64GB was chosen as the upper limit for memory block size values to
accommodate upcoming 4PB systems which have 6 more bits of physical
address space (46 becoming 52).
Signed-off-by: Mike Travis <mike.travis(a)hpe.com>
Reviewed-by: Andrew Banman <andrew.banman(a)hpe.com>
Cc: stable(a)vger.kernel.org
---
v2: Update description
---
arch/x86/mm/init_64.c | 20 ++++++++++++++++----
include/linux/memory.h | 1 +
2 files changed, 17 insertions(+), 4 deletions(-)
--- linux.orig/arch/x86/mm/init_64.c
+++ linux/arch/x86/mm/init_64.c
@@ -1350,16 +1350,28 @@ int kern_addr_valid(unsigned long addr)
/* Amount of ram needed to start using large blocks */
#define MEM_SIZE_FOR_LARGE_BLOCK (64UL << 30)
+/* Adjustable memory block size */
+static unsigned long set_memory_block_size;
+int __init set_memory_block_size_order(unsigned int order)
+{
+ unsigned long size = 1UL << order;
+
+ if (size > MEM_SIZE_FOR_LARGE_BLOCK || size < MIN_MEMORY_BLOCK_SIZE)
+ return -EINVAL;
+
+ set_memory_block_size = size;
+ return 0;
+}
+
static unsigned long probe_memory_block_size(void)
{
unsigned long boot_mem_end = max_pfn << PAGE_SHIFT;
unsigned long bz;
- /* If this is UV system, always set 2G block size */
- if (is_uv_system()) {
- bz = MAX_BLOCK_SIZE;
+ /* If memory block size has been set, then use it */
+ bz = set_memory_block_size;
+ if (bz)
goto done;
- }
/* Use regular block if RAM is smaller than MEM_SIZE_FOR_LARGE_BLOCK */
if (boot_mem_end < MEM_SIZE_FOR_LARGE_BLOCK) {
--- linux.orig/include/linux/memory.h
+++ linux/include/linux/memory.h
@@ -38,6 +38,7 @@ struct memory_block {
int arch_get_memory_phys_device(unsigned long start_pfn);
unsigned long memory_block_size_bytes(void);
+int set_memory_block_size_order(unsigned int order);
/* These states are exposed to userspace as text strings in sysfs */
#define MEM_ONLINE (1<<0) /* exposed to userspace */
--