The patch titled
Subject: mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
has been added to the -mm tree. Its filename is
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-hmm-mark-hmm_devmem_add-add_res…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-hmm-mark-hmm_devmem_add-add_res…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, hmm: mark hmm_devmem_{add, add_resource} EXPORT_SYMBOL_GPL
At Maintainer Summit, Greg brought up a topic I proposed around
EXPORT_SYMBOL_GPL usage. The motivation was considerations for when
EXPORT_SYMBOL_GPL is warranted and the criteria for taking the exceptional
step of reclassifying an existing export. Specifically, I wanted to make
the case that although the line is fuzzy and hard to specify in abstract
terms, it is nonetheless clear that devm_memremap_pages() and HMM
(Heterogeneous Memory Management) have crossed it. The
devm_memremap_pages() facility should have been EXPORT_SYMBOL_GPL from the
beginning, and HMM as a derivative of that functionality should have
naturally picked up that designation as well.
Contrary to typical rules, the HMM infrastructure was merged upstream with
zero in-tree consumers. There was a promise at the time that those users
would be merged "soon", but it has been over a year with no drivers
arriving. While the Nouveau driver is about to belatedly make good on
that promise it is clear that HMM was targeted first and foremost at an
out-of-tree consumer.
HMM is derived from devm_memremap_pages(), a facility Christoph and I
spearheaded to support persistent memory. It combines a device lifetime
model with a dynamically created 'struct page' / memmap array for any
physical address range. It enables coordination and control of the many
code paths in the kernel built to interact with memory via 'struct page'
objects. With HMM the integration goes even deeper by allowing device
drivers to hook and manipulate page fault and page free events.
One interpretation of when EXPORT_SYMBOL is suitable is when it is
exporting stable and generic leaf functionality. The
devm_memremap_pages() facility continues to see expanding use cases,
peer-to-peer DMA being the most recent, with no clear end date when it
will stop attracting reworks and semantic changes. It is not suitable to
export devm_memremap_pages() as a stable 3rd party driver API due to the
fact that it is still changing and manipulates core behavior. Moreover,
it is not in the best interest of the long term development of the core
memory management subsystem to permit any external driver to effectively
define its own system-wide memory management policies with no
encouragement to engage with upstream.
I am also concerned that HMM was designed in a way to minimize further
engagement with the core-MM. That, with these hooks in place,
device-drivers are free to implement their own policies without much
consideration for whether and how the core-MM could grow to meet that
need. Going forward not only should HMM be EXPORT_SYMBOL_GPL, but the
core-MM should be allowed the opportunity and stimulus to change and
address these new use cases as first class functionality.
Original changelog:
hmm_devmem_add(), and hmm_devmem_add_resource() duplicated
devm_memremap_pages() and are now simple now wrappers around the core
facility to inject a dev_pagemap instance into the global pgmap_radix and
hook page-idle events. The devm_memremap_pages() interface is base
infrastructure for HMM. HMM has more and deeper ties into the kernel
memory management implementation than base ZONE_DEVICE which is itself a
EXPORT_SYMBOL_GPL facility.
Originally, the HMM page structure creation routines copied the
devm_memremap_pages() code and reused ZONE_DEVICE. A cleanup to unify the
implementations was discussed during the initial review:
http://lkml.iu.edu/hypermail/linux/kernel/1701.2/00812.html Recent work to
extend devm_memremap_pages() for the peer-to-peer-DMA facility enabled
this cleanup to move forward.
In addition to the integration with devm_memremap_pages() HMM depends on
other GPL-only symbols:
mmu_notifier_unregister_no_release
percpu_ref
region_intersects
__class_create
It goes further to consume / indirectly expose functionality that is not
exported to any other driver:
alloc_pages_vma
walk_page_range
HMM is derived from devm_memremap_pages(), and extends deep core-kernel
fundamentals. Similar to devm_memremap_pages(), mark its entry points
EXPORT_SYMBOL_GPL().
Link: http://lkml.kernel.org/r/154275560565.76910.15919297436557795278.stgit@dwil…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>,
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: Linus Torvalds <torvalds(a)linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh(a)kernel.crashing.org>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/mm/hmm.c~mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl
+++ a/mm/hmm.c
@@ -1110,7 +1110,7 @@ struct hmm_devmem *hmm_devmem_add(const
return result;
return devmem;
}
-EXPORT_SYMBOL(hmm_devmem_add);
+EXPORT_SYMBOL_GPL(hmm_devmem_add);
struct hmm_devmem *hmm_devmem_add_resource(const struct hmm_devmem_ops *ops,
struct device *device,
@@ -1164,7 +1164,7 @@ struct hmm_devmem *hmm_devmem_add_resour
return result;
return devmem;
}
-EXPORT_SYMBOL(hmm_devmem_add_resource);
+EXPORT_SYMBOL_GPL(hmm_devmem_add_resource);
/*
* A device driver that wants to handle multiple devices memory through a
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
The patch titled
Subject: mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-add-memory_device_private-support.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-add-memory_…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-add-memory_…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: add MEMORY_DEVICE_PRIVATE support
In preparation for consolidating all ZONE_DEVICE enabling via
devm_memremap_pages(), teach it how to handle the constraints of
MEMORY_DEVICE_PRIVATE ranges.
[jglisse(a)redhat.com: call move_pfn_range_to_zone for MEMORY_DEVICE_PRIVATE]
Link: http://lkml.kernel.org/r/154275559036.76910.12434636179931292607.stgit@dwil…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Jérôme Glisse <jglisse(a)redhat.com>
Acked-by: Christoph Hellwig <hch(a)lst.de>
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reviewed-by: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/kernel/memremap.c~mm-devm_memremap_pages-add-memory_device_private-support
+++ a/kernel/memremap.c
@@ -98,9 +98,15 @@ static void devm_memremap_pages_release(
- align_start;
mem_hotplug_begin();
- arch_remove_memory(align_start, align_size, pgmap->altmap_valid ?
- &pgmap->altmap : NULL);
- kasan_remove_zero_shadow(__va(align_start), align_size);
+ if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+ pfn = align_start >> PAGE_SHIFT;
+ __remove_pages(page_zone(pfn_to_page(pfn)), pfn,
+ align_size >> PAGE_SHIFT, NULL);
+ } else {
+ arch_remove_memory(align_start, align_size,
+ pgmap->altmap_valid ? &pgmap->altmap : NULL);
+ kasan_remove_zero_shadow(__va(align_start), align_size);
+ }
mem_hotplug_done();
untrack_pfn(NULL, PHYS_PFN(align_start), align_size);
@@ -187,17 +193,40 @@ void *devm_memremap_pages(struct device
goto err_pfn_remap;
mem_hotplug_begin();
- error = kasan_add_zero_shadow(__va(align_start), align_size);
- if (error) {
- mem_hotplug_done();
- goto err_kasan;
+
+ /*
+ * For device private memory we call add_pages() as we only need to
+ * allocate and initialize struct page for the device memory. More-
+ * over the device memory is un-accessible thus we do not want to
+ * create a linear mapping for the memory like arch_add_memory()
+ * would do.
+ *
+ * For all other device memory types, which are accessible by
+ * the CPU, we do want the linear mapping and thus use
+ * arch_add_memory().
+ */
+ if (pgmap->type == MEMORY_DEVICE_PRIVATE) {
+ error = add_pages(nid, align_start >> PAGE_SHIFT,
+ align_size >> PAGE_SHIFT, NULL, false);
+ } else {
+ error = kasan_add_zero_shadow(__va(align_start), align_size);
+ if (error) {
+ mem_hotplug_done();
+ goto err_kasan;
+ }
+
+ error = arch_add_memory(nid, align_start, align_size, altmap,
+ false);
+ }
+
+ if (!error) {
+ struct zone *zone;
+
+ zone = &NODE_DATA(nid)->node_zones[ZONE_DEVICE];
+ move_pfn_range_to_zone(zone, align_start >> PAGE_SHIFT,
+ align_size >> PAGE_SHIFT, altmap);
}
- error = arch_add_memory(nid, align_start, align_size, altmap, false);
- if (!error)
- move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
- align_start >> PAGE_SHIFT,
- align_size >> PAGE_SHIFT, altmap);
mem_hotplug_done();
if (error)
goto err_add_memory;
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
The patch titled
Subject: mm, devm_memremap_pages: fix shutdown handling
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-fix-shutdown-handling.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-fix-shutdow…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-fix-shutdow…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: fix shutdown handling
The last step before devm_memremap_pages() returns success is to allocate
a release action, devm_memremap_pages_release(), to tear the entire setup
down. However, the result from devm_add_action() is not checked.
Checking the error from devm_add_action() is not enough. The api
currently relies on the fact that the percpu_ref it is using is killed by
the time the devm_memremap_pages_release() is run. Rather than continue
this awkward situation, offload the responsibility of killing the
percpu_ref to devm_memremap_pages_release() directly. This allows
devm_memremap_pages() to do the right thing relative to init failures and
shutdown.
Without this change we could fail to register the teardown of
devm_memremap_pages(). The likelihood of hitting this failure is tiny as
small memory allocations almost always succeed. However, the impact of
the failure is large given any future reconfiguration, or disable/enable,
of an nvdimm namespace will fail forever as subsequent calls to
devm_memremap_pages() will fail to setup the pgmap_radix since there will
be stale entries for the physical address range.
An argument could be made to require that the ->kill() operation be set in
the @pgmap arg rather than passed in separately. However, it helps code
readability, tracking the lifetime of a given instance, to be able to grep
the kill routine directly at the devm_memremap_pages() call site.
Link: http://lkml.kernel.org/r/154275558526.76910.7535251937849268605.stgit@dwill…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Fixes: e8d513483300 ("memremap: change devm_memremap_pages interface...")
Reviewed-by: "Jérôme Glisse" <jglisse(a)redhat.com>
Reported-by: Logan Gunthorpe <logang(a)deltatee.com>
Reviewed-by: Logan Gunthorpe <logang(a)deltatee.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: <stable(a)vger.kernel.org>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/drivers/dax/pmem.c~mm-devm_memremap_pages-fix-shutdown-handling
+++ a/drivers/dax/pmem.c
@@ -48,9 +48,8 @@ static void dax_pmem_percpu_exit(void *d
percpu_ref_exit(ref);
}
-static void dax_pmem_percpu_kill(void *data)
+static void dax_pmem_percpu_kill(struct percpu_ref *ref)
{
- struct percpu_ref *ref = data;
struct dax_pmem *dax_pmem = to_dax_pmem(ref);
dev_dbg(dax_pmem->dev, "trace\n");
@@ -112,17 +111,10 @@ static int dax_pmem_probe(struct device
}
dax_pmem->pgmap.ref = &dax_pmem->ref;
+ dax_pmem->pgmap.kill = dax_pmem_percpu_kill;
addr = devm_memremap_pages(dev, &dax_pmem->pgmap);
- if (IS_ERR(addr)) {
- devm_remove_action(dev, dax_pmem_percpu_exit, &dax_pmem->ref);
- percpu_ref_exit(&dax_pmem->ref);
+ if (IS_ERR(addr))
return PTR_ERR(addr);
- }
-
- rc = devm_add_action_or_reset(dev, dax_pmem_percpu_kill,
- &dax_pmem->ref);
- if (rc)
- return rc;
/* adjust the dax_region resource to the start of data */
memcpy(&res, &dax_pmem->pgmap.res, sizeof(res));
--- a/drivers/nvdimm/pmem.c~mm-devm_memremap_pages-fix-shutdown-handling
+++ a/drivers/nvdimm/pmem.c
@@ -309,8 +309,11 @@ static void pmem_release_queue(void *q)
blk_cleanup_queue(q);
}
-static void pmem_freeze_queue(void *q)
+static void pmem_freeze_queue(struct percpu_ref *ref)
{
+ struct request_queue *q;
+
+ q = container_of(ref, typeof(*q), q_usage_counter);
blk_freeze_queue_start(q);
}
@@ -402,6 +405,7 @@ static int pmem_attach_disk(struct devic
pmem->pfn_flags = PFN_DEV;
pmem->pgmap.ref = &q->q_usage_counter;
+ pmem->pgmap.kill = pmem_freeze_queue;
if (is_nd_pfn(dev)) {
if (setup_pagemap_fsdax(dev, &pmem->pgmap))
return -ENOMEM;
@@ -427,13 +431,6 @@ static int pmem_attach_disk(struct devic
memcpy(&bb_res, &nsio->res, sizeof(bb_res));
}
- /*
- * At release time the queue must be frozen before
- * devm_memremap_pages is unwound
- */
- if (devm_add_action_or_reset(dev, pmem_freeze_queue, q))
- return -ENOMEM;
-
if (IS_ERR(addr))
return PTR_ERR(addr);
pmem->virt_addr = addr;
--- a/include/linux/memremap.h~mm-devm_memremap_pages-fix-shutdown-handling
+++ a/include/linux/memremap.h
@@ -111,6 +111,7 @@ typedef void (*dev_page_free_t)(struct p
* @altmap: pre-allocated/reserved memory for vmemmap allocations
* @res: physical address range covered by @ref
* @ref: reference count that pins the devm_memremap_pages() mapping
+ * @kill: callback to transition @ref to the dead state
* @dev: host device of the mapping for debug
* @data: private data pointer for page_free()
* @type: memory type: see MEMORY_* in memory_hotplug.h
@@ -122,6 +123,7 @@ struct dev_pagemap {
bool altmap_valid;
struct resource res;
struct percpu_ref *ref;
+ void (*kill)(struct percpu_ref *ref);
struct device *dev;
void *data;
enum memory_type type;
--- a/kernel/memremap.c~mm-devm_memremap_pages-fix-shutdown-handling
+++ a/kernel/memremap.c
@@ -88,14 +88,10 @@ static void devm_memremap_pages_release(
resource_size_t align_start, align_size;
unsigned long pfn;
+ pgmap->kill(pgmap->ref);
for_each_device_pfn(pfn, pgmap)
put_page(pfn_to_page(pfn));
- if (percpu_ref_tryget_live(pgmap->ref)) {
- dev_WARN(dev, "%s: page mapping is still live!\n", __func__);
- percpu_ref_put(pgmap->ref);
- }
-
/* pages are dead and unused, undo the arch mapping */
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
@@ -116,7 +112,7 @@ static void devm_memremap_pages_release(
/**
* devm_memremap_pages - remap and provide memmap backing for the given resource
* @dev: hosting device for @res
- * @pgmap: pointer to a struct dev_pgmap
+ * @pgmap: pointer to a struct dev_pagemap
*
* Notes:
* 1/ At a minimum the res, ref and type members of @pgmap must be initialized
@@ -125,11 +121,8 @@ static void devm_memremap_pages_release(
* 2/ The altmap field may optionally be initialized, in which case altmap_valid
* must be set to true
*
- * 3/ pgmap.ref must be 'live' on entry and 'dead' before devm_memunmap_pages()
- * time (or devm release event). The expected order of events is that ref has
- * been through percpu_ref_kill() before devm_memremap_pages_release(). The
- * wait for the completion of all references being dropped and
- * percpu_ref_exit() must occur after devm_memremap_pages_release().
+ * 3/ pgmap->ref must be 'live' on entry and will be killed at
+ * devm_memremap_pages_release() time, or if this routine fails.
*
* 4/ res is expected to be a host memory range that could feasibly be
* treated as a "System RAM" range, i.e. not a device mmio range, but
@@ -145,6 +138,9 @@ void *devm_memremap_pages(struct device
pgprot_t pgprot = PAGE_KERNEL;
int error, nid, is_ram;
+ if (!pgmap->ref || !pgmap->kill)
+ return ERR_PTR(-EINVAL);
+
align_start = res->start & ~(SECTION_SIZE - 1);
align_size = ALIGN(res->start + resource_size(res), SECTION_SIZE)
- align_start;
@@ -170,12 +166,10 @@ void *devm_memremap_pages(struct device
if (is_ram != REGION_DISJOINT) {
WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
is_ram == REGION_MIXED ? "mixed" : "ram", res);
- return ERR_PTR(-ENXIO);
+ error = -ENXIO;
+ goto err_array;
}
- if (!pgmap->ref)
- return ERR_PTR(-EINVAL);
-
pgmap->dev = dev;
error = xa_err(xa_store_range(&pgmap_array, PHYS_PFN(res->start),
@@ -217,7 +211,10 @@ void *devm_memremap_pages(struct device
align_size >> PAGE_SHIFT, pgmap);
percpu_ref_get_many(pgmap->ref, pfn_end(pgmap) - pfn_first(pgmap));
- devm_add_action(dev, devm_memremap_pages_release, pgmap);
+ error = devm_add_action_or_reset(dev, devm_memremap_pages_release,
+ pgmap);
+ if (error)
+ return ERR_PTR(error);
return __va(res->start);
@@ -228,6 +225,7 @@ void *devm_memremap_pages(struct device
err_pfn_remap:
pgmap_array_delete(res);
err_array:
+ pgmap->kill(pgmap->ref);
return ERR_PTR(error);
}
EXPORT_SYMBOL_GPL(devm_memremap_pages);
--- a/tools/testing/nvdimm/test/iomap.c~mm-devm_memremap_pages-fix-shutdown-handling
+++ a/tools/testing/nvdimm/test/iomap.c
@@ -104,13 +104,26 @@ void *__wrap_devm_memremap(struct device
}
EXPORT_SYMBOL(__wrap_devm_memremap);
+static void nfit_test_kill(void *_pgmap)
+{
+ struct dev_pagemap *pgmap = _pgmap;
+
+ pgmap->kill(pgmap->ref);
+}
+
void *__wrap_devm_memremap_pages(struct device *dev, struct dev_pagemap *pgmap)
{
resource_size_t offset = pgmap->res.start;
struct nfit_test_resource *nfit_res = get_nfit_res(offset);
- if (nfit_res)
+ if (nfit_res) {
+ int rc;
+
+ rc = devm_add_action_or_reset(dev, nfit_test_kill, pgmap);
+ if (rc)
+ return ERR_PTR(rc);
return nfit_res->buf + offset - nfit_res->res.start;
+ }
return devm_memremap_pages(dev, pgmap);
}
EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
The patch titled
Subject: mm, devm_memremap_pages: kill mapping "System RAM" support
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-kill-mappin…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-kill-mappin…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: kill mapping "System RAM" support
Given the fact that devm_memremap_pages() requires a percpu_ref that is
torn down by devm_memremap_pages_release() the current support for mapping
RAM is broken.
Support for remapping "System RAM" has been broken since the beginning and
there is no existing user of this this code path, so just kill the support
and make it an explicit error.
This cleanup also simplifies a follow-on patch to fix the error path when
setting a devm release action for devm_memremap_pages_release() fails.
Link: http://lkml.kernel.org/r/154275557997.76910.14689813630968180480.stgit@dwil…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: "Jérôme Glisse" <jglisse(a)redhat.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Reviewed-by: Logan Gunthorpe <logang(a)deltatee.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/kernel/memremap.c~mm-devm_memremap_pages-kill-mapping-system-ram-support
+++ a/kernel/memremap.c
@@ -167,15 +167,12 @@ void *devm_memremap_pages(struct device
is_ram = region_intersects(align_start, align_size,
IORESOURCE_SYSTEM_RAM, IORES_DESC_NONE);
- if (is_ram == REGION_MIXED) {
- WARN_ONCE(1, "%s attempted on mixed region %pr\n",
- __func__, res);
+ if (is_ram != REGION_DISJOINT) {
+ WARN_ONCE(1, "%s attempted on %s region %pr\n", __func__,
+ is_ram == REGION_MIXED ? "mixed" : "ram", res);
return ERR_PTR(-ENXIO);
}
- if (is_ram == REGION_INTERSECTS)
- return __va(res->start);
-
if (!pgmap->ref)
return ERR_PTR(-EINVAL);
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
The patch titled
Subject: mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
has been added to the -mm tree. Its filename is
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
This patch should soon appear at
http://ozlabs.org/~akpm/mmots/broken-out/mm-devm_memremap_pages-mark-devm_m…
and later at
http://ozlabs.org/~akpm/mmotm/broken-out/mm-devm_memremap_pages-mark-devm_m…
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next and is updated
there every 3-4 working days
------------------------------------------------------
From: Dan Williams <dan.j.williams(a)intel.com>
Subject: mm, devm_memremap_pages: mark devm_memremap_pages() EXPORT_SYMBOL_GPL
devm_memremap_pages() is a facility that can create struct page entries
for any arbitrary range and give drivers the ability to subvert core
aspects of page management.
Specifically the facility is tightly integrated with the kernel's memory
hotplug functionality. It injects an altmap argument deep into the
architecture specific vmemmap implementation to allow allocating from
specific reserved pages, and it has Linux specific assumptions about page
structure reference counting relative to get_user_pages() and
get_user_pages_fast(). It was an oversight and a mistake that this was
not marked EXPORT_SYMBOL_GPL from the outset.
Again, devm_memremap_pagex() exposes and relies upon core kernel internal
assumptions and will continue to evolve along with 'struct page', memory
hotplug, and support for new memory types / topologies. Only an in-kernel
GPL-only driver is expected to keep up with this ongoing evolution. This
interface, and functionality derived from this interface, is not suitable
for kernel-external drivers.
Link: http://lkml.kernel.org/r/154275557457.76910.16923571232582744134.stgit@dwil…
Signed-off-by: Dan Williams <dan.j.williams(a)intel.com>
Reviewed-by: Christoph Hellwig <hch(a)lst.de>
Cc: Michal Hocko <mhocko(a)suse.com>
Cc: "Jérôme Glisse" <jglisse(a)redhat.com>
Cc: Balbir Singh <bsingharora(a)gmail.com>
Cc: Logan Gunthorpe <logang(a)deltatee.com>
Cc: <stable(a)vger.kernel.org>
Signed-off-by: Andrew Morton <akpm(a)linux-foundation.org>
---
--- a/kernel/memremap.c~mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl
+++ a/kernel/memremap.c
@@ -233,7 +233,7 @@ void *devm_memremap_pages(struct device
err_array:
return ERR_PTR(error);
}
-EXPORT_SYMBOL(devm_memremap_pages);
+EXPORT_SYMBOL_GPL(devm_memremap_pages);
unsigned long vmem_altmap_offset(struct vmem_altmap *altmap)
{
--- a/tools/testing/nvdimm/test/iomap.c~mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl
+++ a/tools/testing/nvdimm/test/iomap.c
@@ -113,7 +113,7 @@ void *__wrap_devm_memremap_pages(struct
return nfit_res->buf + offset - nfit_res->res.start;
return devm_memremap_pages(dev, pgmap);
}
-EXPORT_SYMBOL(__wrap_devm_memremap_pages);
+EXPORT_SYMBOL_GPL(__wrap_devm_memremap_pages);
pfn_t __wrap_phys_to_pfn_t(phys_addr_t addr, unsigned long flags)
{
_
Patches currently in -mm which might be from dan.j.williams(a)intel.com are
mm-devm_memremap_pages-mark-devm_memremap_pages-export_symbol_gpl.patch
mm-devm_memremap_pages-kill-mapping-system-ram-support.patch
mm-devm_memremap_pages-fix-shutdown-handling.patch
mm-devm_memremap_pages-add-memory_device_private-support.patch
mm-hmm-use-devm-semantics-for-hmm_devmem_add-remove.patch
mm-hmm-replace-hmm_devmem_pages_create-with-devm_memremap_pages.patch
mm-hmm-mark-hmm_devmem_add-add_resource-export_symbol_gpl.patch
My apology that the v6 patches are missing the first two patch in
the series. Resending the patch series as v7.
Fix in this version bugs causing build problems for UP configuration.
Also merged in Jiri's change to extend STIBP for SECCOMP processes and
renaming TIF_STIBP to TIF_SPEC_INDIR_BRANCH.
I've updated the boot options spectre_v2_app2app to
on, off, auto, prctl and seccomp. This aligns with
the options for other speculation related mitigations.
I tried to incorporate sched_smt_present to detect when we have all SMT
going offline and we can disable the SMT path that Peter suggested.
this is an optimization that can be easily left out of the patch
series. I've put these two patches at the end and they can be considered
separately.
I've dropped the TIF flags re-organization patches
as they are orthogonal to this patch series.
To do: Create a dedicated document on the mitigation options for Spectre V2.
Since Jiri's patchset to always turn on STIBP
has big performance impact, I think that it should
be reverted from 4.20 and stable kernels for now, till this
patchset to mitigate its performance impact can be merged
with it.
Thanks.
Tim
Patch 1 to 3 are clean up patches.
Patch 4 and 5 disable STIBP for enhacned IBRS.
Patch 6 to 9 reorganize and clean up the code without affecting
functionality for easier modification later.
Patch 10 introduces the STIBP flag on a task to dynamically
enable STIBP for that task.
Patch 11 introduces different modes to protect a
task against Spectre v2 user space attack.
Patch 12 adds prctl interface to turn on Spectre v2 user mode defenses on a task.
Patch 13 Put IBPB usage under the mode chosen for app2app mitigation.
Patch 14 Add STIBP protection for SECCOMP tasks.
Patch 15-16 add Spectre v2 defenses for non-dumpable tasks.
Patch 15-16 reorganizes the TIF flags, and can be dropped without affecting this series
Patch 17-18 When there are no paired SMT left, disable SMT specific code
Changes:
v6:
1. Fix bugs for UP build configuration.
2. Add protection for SECCOMP tasks.
3. Rename TIF_STIBP to TIF_SPEC_INDIR_BRANCH.
4. Update boot options to align with other speculation mitigations.
5. Separate out IBPB change that makes it depend on TIF_SPEC_INDIR_BRANCH.
6. Move some checks for SPEC_CTRL updates to spec_ctrl_update_msr to avoid
unnecesseary MSR writes.
7. Drop TIF reorg patches.
8. Incorporate optimization to disable SMT code paths when no paired SMT is present.
v5:
1. Drop patch to extend TIF_STIBP changes to all related threads on
a task's dumpabibility change.
2. Drop patch to replace sched_smt_present with cpu_smt_enabled.
3. Drop export of cpu_smt_control in kernel/cpu.c and replace external
usages of cpu_smt_control with cpu_smt_enabled.
4. Rebase patch series on 4.20-rc2.
v4:
1. Extend STIBP update to all threads of a process changing
it dumpability.
2. Add logic to update SPEC_CTRL MSR on a remote CPU when TIF flags
affecting speculation changes for task running on the remote CPU.
3. Regroup x86 TIF_* flags according to their functions.
4. Various code clean up.
v3:
1. Add logic to skip STIBP when Enhanced IBRS is used.
2. Break up v2 patches into smaller logical patches.
3. Fix bug in arch_set_dumpable that did not update SPEC_CTRL
MSR right away when according to task's STIBP flag clearing which
caused SITBP to be left on.
4. Various code clean up.
v2:
1. Extend per process STIBP to AMD cpus
2. Add prctl option to control per process indirect branch speculation
3. Bug fixes and cleanups
Jiri's patchset to harden Spectre v2 user space mitigation makes IBPB
and STIBP in use for Spectre v2 mitigation on all processes. IBPB will
be issued for switching to an application that's not ptraceable by the
previous application and STIBP will be always turned on.
However, leaving STIBP on all the time is expensive for certain
applications that have frequent indirect branches. One such application
is perlbench in the SpecInt Rate 2006 test suite which shows a
21% reduction in throughput.
There're also reports of drop in performance on Python and PHP benchmarks:
https://www.phoronix.com/scan.php?page=article&item=linux-420-bisect&num=2
Other applications like bzip2 with minimal indirct branches have
only a 0.7% reduction in throughput. IBPB will also impose
overhead during context switches.
Users may not wish to incur performance overhead from IBPB and STIBP for
general non security sensitive processes and use these mitigations only
for security sensitive processes.
This patchset provides a process property based lite protection mode.
In this mode, IBPB and STIBP mitigation are applied only to security
sensitive non-dumpable processes and processes that users want to protect
by having indirect branch speculation disabled via PRCTL. So the overhead
from IBPB and STIBP are avoided for low security processes that don't
require extra protection.
Jiri Kosina (1):
x86/speculation: Add 'seccomp' Spectre v2 app to app protection mode
Peter Zijlstra (1):
sched/smt: Make sched_smt_present track topology
Tim Chen (16):
x86/speculation: Clean up spectre_v2_parse_cmdline()
x86/speculation: Remove unnecessary ret variable in cpu_show_common()
x86/speculation: Reorganize cpu_show_common()
x86/speculation: Add X86_FEATURE_USE_IBRS_ENHANCED
x86/speculation: Disable STIBP when enhanced IBRS is in use
x86/speculation: Rename SSBD update functions
x86/speculation: Reorganize speculation control MSRs update
smt: Create cpu_smt_enabled static key for SMT specific code
x86/smt: Convert cpu_smt_control check to cpu_smt_enabled static key
x86/speculation: Turn on or off STIBP according to a task's TIF_STIBP
x86/speculation: Add Spectre v2 app to app protection modes
x86/speculation: Create PRCTL interface to restrict indirect branch
speculation
x86/speculation: Enable IBPB for tasks with
TIF_SPEC_BRANCH_SPECULATION
security: Update speculation restriction of a process when modifying
its dumpability
x86/speculation: Use STIBP to restrict speculation on non-dumpable
task
x86/smt: Allow disabling of SMT when last SMT is offlined
Documentation/admin-guide/kernel-parameters.txt | 34 +++
Documentation/userspace-api/spec_ctrl.rst | 9 +
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 6 +-
arch/x86/include/asm/nospec-branch.h | 10 +
arch/x86/include/asm/spec-ctrl.h | 18 +-
arch/x86/include/asm/thread_info.h | 5 +-
arch/x86/kernel/cpu/bugs.c | 336 +++++++++++++++++++++---
arch/x86/kernel/process.c | 58 +++-
arch/x86/kvm/vmx.c | 2 +-
arch/x86/mm/tlb.c | 23 +-
fs/exec.c | 3 +
include/linux/cpu.h | 31 ++-
include/linux/sched.h | 9 +
include/uapi/linux/prctl.h | 1 +
kernel/cpu.c | 28 +-
kernel/cred.c | 5 +-
kernel/sched/core.c | 19 +-
kernel/sched/sched.h | 2 -
kernel/sys.c | 7 +
tools/include/uapi/linux/prctl.h | 1 +
21 files changed, 526 insertions(+), 82 deletions(-)
--
2.9.4