On 01/16/2018 08:32 AM, Shankar, Ravi V wrote:
Vikas is on vacation until the end of the month. Fenghua will look into this issue.
On Jan 16, 2018, at 5:09 AM, Thomas Gleixner <tglx@linutronix.de> wrote:
Vikas, Fenghua, can you please look at that ASAP?
On Sun, 14 Jan 2018, Thomas Gleixner wrote:
On Fri, 12 Jan 2018, Joseph Salisbury wrote:
Hi Vikas,
A kernel bug report was opened against Ubuntu [0]. After a kernel bisect, it was found that reverting the following commit resolved this bug:
commit 24247aeeabe99eab13b798ccccc2dec066dd6f07
Author: Vikas Shivappa <vikas.shivappa@linux.intel.com>
Date:   Tue Aug 15 18:00:43 2017 -0700
    x86/intel_rdt/cqm: Improve limbo list processing
The regression was introduced in v4.14-rc1 and still exists in current mainline. The trace with v4.15-rc7 is in comment #44 [1].
I was hoping to get your feedback, since you are the patch author. Do you think gathering any additional data will help diagnose this issue, or would it be best to submit a revert request?
That stinks like a use after free. Can you run with KASAN enabled?
Thanks,
tglx
Here is some data with KASAN enabled: https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
Are there any specific logs you would like to see, or specific actions executed?
Thanks,
Joe
On Tue, 16 Jan 2018, Joseph Salisbury wrote:
Here is some data with KASAN enabled: https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
Are there any specific logs you would like to see, or specific actions executed?
No, the KASAN output is pretty clear where the issue is.
Thanks,
tglx
From: Thomas Gleixner [mailto:tglx@linutronix.de]
On Tue, 16 Jan 2018, Joseph Salisbury wrote:
Here is some data with KASAN enabled: https://bugs.launchpad.net/ubuntu/+source/linux-hwe/+bug/1733662/comments/51
Are there any specific logs you would like to see, or specific actions executed?
No, the KASAN output is pretty clear where the issue is.
Thanks,
tglx
Is this a Haswell-specific issue?
I have been running the following test in an endless loop on Broadwell with 4.15.0-rc6 and rdt mounted, without any issue:
for ((;;)) do
	for ((i=1;i<88;i++)) do
		echo 0 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
	for ((i=1;i<88;i++)) do
		echo 1 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
done
I'm looking for a Haswell machine to reproduce the issue.
Thanks.
-Fenghua
On Tue, 16 Jan 2018, Yu, Fenghua wrote:
From: Thomas Gleixner [mailto:tglx@linutronix.de]
Is this a Haswell-specific issue?
I have been running the following test in an endless loop on Broadwell with 4.15.0-rc6 and rdt mounted, without any issue:
for ((;;)) do
	for ((i=1;i<88;i++)) do
		echo 0 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
	for ((i=1;i<88;i++)) do
		echo 1 >/sys/devices/system/cpu/cpu$i/online
	done
	echo "online cpus:"
	grep processor /proc/cpuinfo | wc
done
I'm looking for a Haswell machine to reproduce the issue.
Come on. This is crystal clear from the KASAN trace. And the fix is simple enough.
You simply do not run into it because on your machine is_llc_occupancy_enabled() is false...
Thanks,
tglx
8<--------------------
diff --git a/arch/x86/kernel/cpu/intel_rdt.c b/arch/x86/kernel/cpu/intel_rdt.c
index 88dcf8479013..99442370de40 100644
--- a/arch/x86/kernel/cpu/intel_rdt.c
+++ b/arch/x86/kernel/cpu/intel_rdt.c
@@ -525,10 +525,6 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 		 */
 		if (static_branch_unlikely(&rdt_mon_enable_key))
 			rmdir_mondata_subdir_allrdtgrp(r, d->id);
-		kfree(d->ctrl_val);
-		kfree(d->rmid_busy_llc);
-		kfree(d->mbm_total);
-		kfree(d->mbm_local);
 		list_del(&d->list);
 		if (is_mbm_enabled())
 			cancel_delayed_work(&d->mbm_over);
@@ -545,6 +541,10 @@ static void domain_remove_cpu(int cpu, struct rdt_resource *r)
 			cancel_delayed_work(&d->cqm_limbo);
 		}
 
+		kfree(d->ctrl_val);
+		kfree(d->rmid_busy_llc);
+		kfree(d->mbm_total);
+		kfree(d->mbm_local);
 		kfree(d);
 		return;
 	}
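To illustrate why the ordering in the patch above matters, here is a minimal userspace C sketch, not kernel code. Every name in it is hypothetical: toy_domain stands in for struct rdt_domain, toy_check_limbo() stands in for the limbo handling that ends in cancel_delayed_work(&d->cqm_limbo), and a "freed" flag replaces KASAN's real detection so that a stale access is counted instead of triggering undefined behaviour. The llc_occupancy_enabled parameter mimics is_llc_occupancy_enabled(): with the pre-fix order, the freed buffer is only touched when it is true, which is consistent with tglx's explanation of why the Broadwell test never hit the bug.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical allocation that remembers whether it was freed, so a
 * use-after-free can be counted rather than actually performed. */
struct tracked_buf {
	unsigned long *data;
	bool freed;
};

/* Hypothetical, heavily simplified stand-in for struct rdt_domain. */
struct toy_domain {
	struct tracked_buf rmid_busy_llc;
};

/* Number of accesses to already-freed memory (what KASAN would flag). */
static int uaf_reports;

static void tracked_free(struct tracked_buf *b)
{
	free(b->data);
	b->data = NULL;
	b->freed = true;
}

/* Hypothetical stand-in for the limbo processing, which reads rmid_busy_llc. */
static void toy_check_limbo(struct toy_domain *d)
{
	if (d->rmid_busy_llc.freed) {
		uaf_reports++;	/* stale access detected */
		return;
	}
	d->rmid_busy_llc.data[0] = 0;	/* legitimate access to live memory */
}

/*
 * free_first == true models the pre-fix order (free, then limbo handling);
 * free_first == false models the fixed order (limbo handling, then free).
 */
static void toy_domain_remove(struct toy_domain *d,
			      bool llc_occupancy_enabled, bool free_first)
{
	if (free_first)
		tracked_free(&d->rmid_busy_llc);
	if (llc_occupancy_enabled)
		toy_check_limbo(d);
	if (!free_first)
		tracked_free(&d->rmid_busy_llc);
}

/* Runs one teardown; returns the number of use-after-free reports,
 * or -1 if the allocation failed. */
int run_teardown(bool llc_occupancy_enabled, bool free_first)
{
	struct toy_domain d = { { NULL, false } };

	d.rmid_busy_llc.data = calloc(4, sizeof(*d.rmid_busy_llc.data));
	if (!d.rmid_busy_llc.data)
		return -1;

	uaf_reports = 0;
	toy_domain_remove(&d, llc_occupancy_enabled, free_first);
	assert(d.rmid_busy_llc.freed);	/* both orders release the buffer */
	return uaf_reports;
}
```

In this model, run_teardown(true, true) returns 1 (one stale access), run_teardown(false, true) returns 0, and the fixed order (free_first set to false) returns 0 whether or not occupancy monitoring is enabled. The sketch mirrors the idea of the patch: release the per-domain arrays only after the last code path in the teardown that can read them has run.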
On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
Thanks, Thomas. I'll build some test kernels and have your patch tested out.
Thanks,
Joe
On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
Hi Thomas,
Testing shows that your patch resolves the bug. Thanks for the assistance! Is this something you could submit to mainline?
Thanks,
Joe
On Wed, 17 Jan 2018, Joseph Salisbury wrote:
On 01/16/2018 01:59 PM, Thomas Gleixner wrote:
Testing shows that your patch resolves the bug. Thanks for the assistance! Is this something you could submit to mainline?
Already there :)
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?i...
Tagged for stable.
Thanks,
tglx
On 01/17/2018 05:55 PM, Thomas Gleixner wrote:
Thanks so much!
linux-stable-mirror@lists.linaro.org