Vince reported perf_fuzzer quickly locks up on 4.15-rc7 with PTI; Robert reported Bad RIP with KPTI and Intel BTS also on 4.15-rc7:
honggfuzz -f /tmp/somedirectorywithatleastonefile \
          --linux_perf_bts_edge -s -- /bin/true
(honggfuzz from https://github.com/google/honggfuzz) crashed with
BUG: unable to handle kernel paging request at ffff9d3215100000
(then narrowed it down to
perf record --per-thread -e intel_bts//u -- /bin/ls).
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through kaiser_add_mapping(), but instead uses the memory allocated for the perf AUX buffer.
This obviously comes apart when using PTI, because then the kernel mapping, which includes that AUX buffer memory, disappears while switched to user page tables.
Easily fixed in old-Kaiser backports, by applying kaiser_add_mapping() to those pages; perhaps not so easy for upstream, where 4.15-rc8 commit 99a9dc98ba52 ("x86,perf: Disable intel_bts when PTI") disables for now.
Slightly reorganized surrounding code in bts_buffer_setup_aux(), so it can better match bts_buffer_free_aux(): free_aux with an #ifdef to avoid the loop when PTI is off, but setup_aux needs to loop anyway (and kaiser_add_mapping() is cheap when PTI config is off or "pti=off").
Reported-by: Vince Weaver vincent.weaver@maine.edu
Reported-by: Robert Święcki robert@swiecki.net
Analyzed-by: Peter Zijlstra peterz@infradead.org
Analyzed-by: Stephane Eranian eranian@google.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@kernel.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Vince Weaver vince@deater.net
Cc: stable@vger.kernel.org
Cc: Jiri Kosina jkosina@suze.cz
Signed-off-by: Hugh Dickins hughd@google.com
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 44 ++++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_bts.c b/arch/x86/kernel/cpu/perf_event_intel_bts.c
index 2cad71d1b14c..5af11c46d0b9 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -22,6 +22,7 @@
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/coredump.h>
+#include <linux/kaiser.h>
 
 #include <asm-generic/sizes.h>
 #include <asm/perf_event.h>
@@ -67,6 +68,23 @@ static size_t buf_size(struct page *page)
 	return 1 << (PAGE_SHIFT + page_private(page));
 }
 
+static void bts_buffer_free_aux(void *data)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	struct bts_buffer *buf = data;
+	int nbuf;
+
+	for (nbuf = 0; nbuf < buf->nr_bufs; nbuf++) {
+		struct page *page = buf->buf[nbuf].page;
+		void *kaddr = page_address(page);
+		size_t page_size = buf_size(page);
+
+		kaiser_remove_mapping((unsigned long)kaddr, page_size);
+	}
+#endif
+	kfree(data);
+}
+
 static void *
 bts_buffer_setup_aux(int cpu, void **pages, int nr_pages, bool overwrite)
 {
@@ -103,29 +121,33 @@ bts_buffer_setup_aux(int cpu, void **pages, int nr_pages, bool overwrite)
 	buf->real_size = size - size % BTS_RECORD_SIZE;
 
 	for (pg = 0, nbuf = 0, offset = 0, pad = 0; nbuf < buf->nr_bufs; nbuf++) {
-		unsigned int __nr_pages;
+		void *kaddr = pages[pg];
+		size_t page_size;
+
+		page = virt_to_page(kaddr);
+		page_size = buf_size(page);
+
+		if (kaiser_add_mapping((unsigned long)kaddr,
+				       page_size, __PAGE_KERNEL) < 0) {
+			buf->nr_bufs = nbuf;
+			bts_buffer_free_aux(buf);
+			return NULL;
+		}
 
-		page = virt_to_page(pages[pg]);
-		__nr_pages = PagePrivate(page) ? 1 << page_private(page) : 1;
 		buf->buf[nbuf].page = page;
 		buf->buf[nbuf].offset = offset;
 		buf->buf[nbuf].displacement = (pad ? BTS_RECORD_SIZE - pad : 0);
-		buf->buf[nbuf].size = buf_size(page) - buf->buf[nbuf].displacement;
+		buf->buf[nbuf].size = page_size - buf->buf[nbuf].displacement;
 		pad = buf->buf[nbuf].size % BTS_RECORD_SIZE;
 		buf->buf[nbuf].size -= pad;
 
-		pg += __nr_pages;
-		offset += __nr_pages << PAGE_SHIFT;
+		pg += page_size >> PAGE_SHIFT;
+		offset += page_size;
 	}
 
 	return buf;
 }
 
-static void bts_buffer_free_aux(void *data)
-{
-	kfree(data);
-}
-
 static unsigned long bts_buffer_offset(struct bts_buffer *buf, unsigned int idx)
 {
 	return buf->buf[idx].offset + buf->buf[idx].displacement;
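The error path in bts_buffer_setup_aux() relies on a small trick: on a failed kaiser_add_mapping() it shrinks buf->nr_bufs to the number of chunks already mapped, so that bts_buffer_free_aux() unmaps exactly those and nothing more. A minimal userspace sketch of that unwind pattern follows. Everything in it is hypothetical and for illustration only: mock_add_mapping() and mock_remove_mapping() stand in for the kaiser calls, and struct buffer, buffer_setup() and buffer_free() are simplified stand-ins, not the kernel's types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for kaiser_add_mapping()/kaiser_remove_mapping(). */
static int mapped_count;   /* chunks currently "mapped" */
static int add_calls;      /* number of add attempts so far */
static int fail_at = -1;   /* fail the add attempt with this index; -1 = never */

static int mock_add_mapping(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	if (add_calls++ == fail_at)
		return -1;
	mapped_count++;
	return 0;
}

static void mock_remove_mapping(void *addr, size_t size)
{
	(void)addr;
	(void)size;
	mapped_count--;
}

struct chunk {
	void *addr;
	size_t size;
};

struct buffer {
	int nr_chunks;
	struct chunk chunk[];
};

/* Unmap every chunk recorded in the buffer, then free the bookkeeping. */
static void buffer_free(struct buffer *buf)
{
	assert(buf != NULL);
	for (int i = 0; i < buf->nr_chunks; i++)
		mock_remove_mapping(buf->chunk[i].addr, buf->chunk[i].size);
	free(buf);
}

/* Map each chunk in turn; on failure, unwind only what was mapped. */
static struct buffer *buffer_setup(void **addrs, const size_t *sizes, int n)
{
	struct buffer *buf;

	buf = malloc(sizeof(*buf) + (size_t)n * sizeof(buf->chunk[0]));
	if (!buf)
		return NULL;
	buf->nr_chunks = n;

	for (int i = 0; i < buf->nr_chunks; i++) {
		if (mock_add_mapping(addrs[i], sizes[i]) < 0) {
			buf->nr_chunks = i;	/* only chunks [0, i) were mapped */
			buffer_free(buf);
			return NULL;
		}
		buf->chunk[i].addr = addrs[i];
		buf->chunk[i].size = sizes[i];
	}
	return buf;
}
```

Because the chunk is recorded only after its mapping succeeds, the free loop never touches an entry that was not fully set up, which is why a single free routine can serve both the normal teardown and the error path.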
This is a note to let you know that I've just added the patch titled
kaiser: fix intel_bts perf crashes
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: kaiser-fix-intel_bts-perf-crashes.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
From hughd@google.com Thu Feb 1 09:09:20 2018
From: Hugh Dickins hughd@google.com
Date: Mon, 29 Jan 2018 18:15:33 -0800
Subject: kaiser: fix intel_bts perf crashes
To: Greg Kroah-Hartman gregkh@linuxfoundation.org
Cc: Hugh Dickins hughd@google.com, Thomas Gleixner tglx@linutronix.de, Ingo Molnar mingo@kernel.org, Andy Lutomirski luto@amacapital.net, Alexander Shishkin alexander.shishkin@linux.intel.com, Linus Torvalds torvalds@linux-foundation.org, Vince Weaver vince@deater.net, stable@vger.kernel.org, Jiri Kosina jkosina@suze.cz
Message-ID: 20180130021533.228782-1-hughd@google.com
From: Hugh Dickins hughd@google.com
Vince reported perf_fuzzer quickly locks up on 4.15-rc7 with PTI; Robert reported Bad RIP with KPTI and Intel BTS also on 4.15-rc7:
honggfuzz -f /tmp/somedirectorywithatleastonefile \
          --linux_perf_bts_edge -s -- /bin/true
(honggfuzz from https://github.com/google/honggfuzz) crashed with
BUG: unable to handle kernel paging request at ffff9d3215100000
(then narrowed it down to
perf record --per-thread -e intel_bts//u -- /bin/ls).
The intel_bts driver does not use the 'normal' BTS buffer which is exposed through kaiser_add_mapping(), but instead uses the memory allocated for the perf AUX buffer.
This obviously comes apart when using PTI, because then the kernel mapping, which includes that AUX buffer memory, disappears while switched to user page tables.
Easily fixed in old-Kaiser backports, by applying kaiser_add_mapping() to those pages; perhaps not so easy for upstream, where 4.15-rc8 commit 99a9dc98ba52 ("x86,perf: Disable intel_bts when PTI") disables for now.
Slightly reorganized surrounding code in bts_buffer_setup_aux(), so it can better match bts_buffer_free_aux(): free_aux with an #ifdef to avoid the loop when PTI is off, but setup_aux needs to loop anyway (and kaiser_add_mapping() is cheap when PTI config is off or "pti=off").
Reported-by: Vince Weaver vincent.weaver@maine.edu
Reported-by: Robert Święcki robert@swiecki.net
Analyzed-by: Peter Zijlstra peterz@infradead.org
Analyzed-by: Stephane Eranian eranian@google.com
Cc: Thomas Gleixner tglx@linutronix.de
Cc: Ingo Molnar mingo@kernel.org
Cc: Andy Lutomirski luto@amacapital.net
Cc: Alexander Shishkin alexander.shishkin@linux.intel.com
Cc: Linus Torvalds torvalds@linux-foundation.org
Cc: Vince Weaver vince@deater.net
Cc: stable@vger.kernel.org
Cc: Jiri Kosina jkosina@suze.cz
Signed-off-by: Hugh Dickins hughd@google.com
Signed-off-by: Greg Kroah-Hartman gregkh@linuxfoundation.org
---
 arch/x86/kernel/cpu/perf_event_intel_bts.c | 44 +++++++++++++++++++++--------
 1 file changed, 33 insertions(+), 11 deletions(-)

--- a/arch/x86/kernel/cpu/perf_event_intel_bts.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_bts.c
@@ -22,6 +22,7 @@
 #include <linux/debugfs.h>
 #include <linux/device.h>
 #include <linux/coredump.h>
+#include <linux/kaiser.h>
 
 #include <asm-generic/sizes.h>
 #include <asm/perf_event.h>
@@ -67,6 +68,23 @@ static size_t buf_size(struct page *page
 	return 1 << (PAGE_SHIFT + page_private(page));
 }
 
+static void bts_buffer_free_aux(void *data)
+{
+#ifdef CONFIG_PAGE_TABLE_ISOLATION
+	struct bts_buffer *buf = data;
+	int nbuf;
+
+	for (nbuf = 0; nbuf < buf->nr_bufs; nbuf++) {
+		struct page *page = buf->buf[nbuf].page;
+		void *kaddr = page_address(page);
+		size_t page_size = buf_size(page);
+
+		kaiser_remove_mapping((unsigned long)kaddr, page_size);
+	}
+#endif
+	kfree(data);
+}
+
 static void *
 bts_buffer_setup_aux(int cpu, void **pages, int nr_pages, bool overwrite)
 {
@@ -103,29 +121,33 @@ bts_buffer_setup_aux(int cpu, void **pag
 	buf->real_size = size - size % BTS_RECORD_SIZE;
 
 	for (pg = 0, nbuf = 0, offset = 0, pad = 0; nbuf < buf->nr_bufs; nbuf++) {
-		unsigned int __nr_pages;
+		void *kaddr = pages[pg];
+		size_t page_size;
+
+		page = virt_to_page(kaddr);
+		page_size = buf_size(page);
+
+		if (kaiser_add_mapping((unsigned long)kaddr,
+				       page_size, __PAGE_KERNEL) < 0) {
+			buf->nr_bufs = nbuf;
+			bts_buffer_free_aux(buf);
+			return NULL;
+		}
 
-		page = virt_to_page(pages[pg]);
-		__nr_pages = PagePrivate(page) ? 1 << page_private(page) : 1;
 		buf->buf[nbuf].page = page;
 		buf->buf[nbuf].offset = offset;
 		buf->buf[nbuf].displacement = (pad ? BTS_RECORD_SIZE - pad : 0);
-		buf->buf[nbuf].size = buf_size(page) - buf->buf[nbuf].displacement;
+		buf->buf[nbuf].size = page_size - buf->buf[nbuf].displacement;
 		pad = buf->buf[nbuf].size % BTS_RECORD_SIZE;
 		buf->buf[nbuf].size -= pad;
 
-		pg += __nr_pages;
-		offset += __nr_pages << PAGE_SHIFT;
+		pg += page_size >> PAGE_SHIFT;
+		offset += page_size;
 	}
 
 	return buf;
 }
 
-static void bts_buffer_free_aux(void *data)
-{
-	kfree(data);
-}
-
 static unsigned long bts_buffer_offset(struct bts_buffer *buf, unsigned int idx)
 {
 	return buf->buf[idx].offset + buf->buf[idx].displacement;
Patches currently in stable-queue which might be from hughd@google.com are
queue-4.4/x86-pti-make-unpoison-of-pgd-for-trusted-boot-work-for-real.patch
queue-4.4/kaiser-fix-intel_bts-perf-crashes.patch
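The per-chunk bookkeeping that the patch rewrites in terms of page_size (offset, displacement, size, pad) is easier to follow with concrete numbers. The sketch below repeats the same arithmetic in userspace. It assumes a record size of 24 bytes for BTS_RECORD_SIZE and three 4096-byte chunks; both are assumptions made for illustration, and struct chunk_layout and layout_chunks() are hypothetical names, not kernel code.

```c
#include <assert.h>
#include <stddef.h>

#define RECORD_SIZE 24	/* assumed BTS record size in bytes */

struct chunk_layout {
	size_t offset;		/* byte offset of the chunk in the whole buffer */
	size_t displacement;	/* bytes skipped at the start of the chunk */
	size_t size;		/* usable bytes, a multiple of RECORD_SIZE */
};

/*
 * Same arithmetic as the loop in bts_buffer_setup_aux(): each chunk starts
 * where the previous one left off record-wise, and is trimmed so that it
 * holds a whole number of records.
 */
static void layout_chunks(const size_t *chunk_bytes, int n,
			  struct chunk_layout *out)
{
	size_t offset = 0, pad = 0;

	for (int i = 0; i < n; i++) {
		assert(chunk_bytes[i] >= RECORD_SIZE);

		out[i].offset = offset;
		out[i].displacement = pad ? RECORD_SIZE - pad : 0;
		out[i].size = chunk_bytes[i] - out[i].displacement;
		pad = out[i].size % RECORD_SIZE;
		out[i].size -= pad;

		offset += chunk_bytes[i];
	}
}
```

With three 4096-byte chunks this gives displacements of 0, 8 and 16 bytes and a usable size of 4080 bytes (170 records) in each chunk: the 16 bytes left over at the end of the first chunk plus the 8 bytes skipped at the start of the second add up to one full record, so record boundaries stay aligned across the whole buffer.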
On Thu, 1 Feb 2018, gregkh@linuxfoundation.org wrote:
This is a note to let you know that I've just added the patch titled
kaiser: fix intel_bts perf crashes
to the 4.4-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git%3Ba=su...
The filename of the patch is: kaiser-fix-intel_bts-perf-crashes.patch and it can be found in the queue-4.4 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree, please let stable@vger.kernel.org know about it.
I thought that to get into stable the patch had to be in upstream first.
As far as I know, there is no fix for this particular problem (BTS disabled due to KPTI-caused crashes) upstream.
Vince
On Thu, Feb 01, 2018 at 09:18:43AM -0500, Vince Weaver wrote:
I thought that to get into stable the patch had to be in upstream first.
As far as I know, there is no fix for this particular problem (BTS disabled due to KPTI-caused crashes) upstream.
Really? This is a reported issue in 4.15? I haven't seen that report anywhere, do you have a pointer to it?
The 4.4 and 4.9 backports do have odd one-off patches that are not in 4.14 and newer due to the way the backports were done, so they will diverge. But they should not diverge in bugfixes :)
thanks,
greg k-h
Refer to https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h... where this is disabled, not fixed.
On Thu, Feb 1, 2018 at 6:18 AM, Vince Weaver vincent.weaver@maine.edu wrote:
I thought that to get into stable the patch had to be in upstream first.
Upstream does have the fix, it's just very very different in this case.
The upstream fix is commit 99a9dc98ba52 ("x86,perf: Disable intel_bts when PTI"), which just disables intel_bts entirely, because upstream doesn't do those "kaiser_add/remove_mapping()" things.
So this "backport" is ok.
Linus
On Thu, 1 Feb 2018, Linus Torvalds wrote:
On Thu, Feb 1, 2018 at 6:18 AM, Vince Weaver vincent.weaver@maine.edu wrote:
I thought that to get into stable the patch had to be in upstream first.
Upstream does have the fix, it's just very very different in this case.
The upstream fix is commit 99a9dc98ba52 ("x86,perf: Disable intel_bts when PTI"), which just disables intel_bts entirely, because upstream doesn't do those "kaiser_add/remove_mapping()" things.
So this "backport" is ok.
Linus
Unless I'm misunderstanding, the fix going into 4.4 actually fixes the issue (BTS trying to access a VM buffer that is unmapped) using the KAISER infrastructure.
The "fix" in 4.15 just disables BTS totally (until someone figures out how to fix things properly).
So while in theory both are equivalent fixes ("the kernel no longer crashes when using perf/bts") the mechanism is completely different.
I'm only mildly complaining because I do a lot of testing/fuzzing on linux-git to catch perf-related issues (I'm the one who first reported this issue). But I typically don't test the stable trees, and now we more or less have a fork where the perf/bts code is doing something very different in stable vs head.
Vince
On Thu, Feb 1, 2018 at 9:22 AM, Vince Weaver vincent.weaver@maine.edu wrote:
Unless I'm misunderstanding, the fix going into 4.4 actually fixes the issue (BTS trying to access a VM buffer that is unmapped) using the KAISER infrastructure.
The "fix" in 4.15 just disables BTS totally (until someone figures out how to fix things properly).
So while in theory both are equivalent fixes ("the kernel no longer crashes when using perf/bts") the mechanism is completely different.
Absolutely. That's not uncommon for some "backports" where the code has changed.
What the eventual intel_bts evolution will be, I have no idea. Maybe people won't care, and it's all simply a "dead code until fixed chips" issue.
Happily, the PTI-fixed parts *will* come fairly soon, and developers (who are the main target of things like profiling) are more likely to get new machines. Telling your management "I as a developer need a newer system that supports XYZ" is easier than "we need to upgrade all our machines".
Linus
On Thu, Feb 01, 2018 at 12:22:40PM -0500, Vince Weaver wrote:
Unless I'm misunderstanding, the fix going into 4.4 actually fixes the issue (BTS trying to access a VM buffer that is unmapped) using the KAISER infrastructure.
The "fix" in 4.15 just disables BTS totally (until someone figures out how to fix things properly).
We could produce a better fix, I was kind of waiting to see if anyone actually cares about intel_bts, especially since one can still get BTS the "old way" where the data is processed on the kernel side. And most CPUs these days should have at least some form of PT.
Regards,
--
Alex
linux-stable-mirror@lists.linaro.org