Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group

10 Dec 2020

Hi James,
On 12/9/2020 8:51 AM, James Morse wrote:
...
Hi Reinette, Fenghua,
Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the
guts of this change more quickly!
Sure. Thank you. A small change is that I plan to write "PQR_ASSOC MSR" 
instead to closer match the name.
...
On 03/12/2020 23:25, Reinette Chatre wrote:
...
From: Fenghua Yu fenghua.yu@intel.com
Currently when moving a task to a resource group the PQR_ASSOC MSR
is updated with the new closid and rmid in an added task callback.
If the task is running the work is run as soon as possible. If the
task is not running the work is executed later
...
in the kernel exit path when the kernel returns to the task again.
kernel exit makes me thing of user-space... is it enough to just say:
"by __switch_to() when task is next run"?
I do not think that would be accurate. The paragraph to which you are 
proposing the change states the current context, before the fix, when 
task_work_add() is still used to update PQR_ASSOC. The "kernel exit" 
text you refer to is quite close to task_work_add()'s comments and 
indeed refers to the work that is run before returning to user space. If 
a function name would make things clearer perhaps adding 
exit_to_user_mode_loop() instead? Changing the text to __switch_to() 
does not reflect the context described here since __switch_to() is what 
is called when task is context switched back from where 
resctrl_sched_in() is called, not the queued work being described here.
...
...
Updating the PQR_ASSOC MSR as soon as possible on the CPU a moved task
is running is the right thing to do. Queueing work for a task that is
not running is unnecessary (the PQR_ASSOC MSR is already updated when the
task is scheduled in) and causing system resource waste with the way in
which it is implemented: Work to update the PQR_ASSOC register is queued
every time the user writes a task id to the "tasks" file, even if the task
already belongs to the resource group. This could result in multiple pending
work items associated with a single task even if they are all identical and
even though only a single update with most recent values is needed.
Specifically, even if a task is moved between different resource groups
while it is sleeping then it is only the last move that is relevant but
yet a work item is queued during each move.
This unnecessary queueing of work items could result in significant system
resource waste, especially on tasks sleeping for a long time. For example,
as demonstrated by Shakeel Butt in [1] writing the same task id to the
"tasks" file can quickly consume significant memory. The same problem
(wasted system resources) occurs when moving a task between different
resource groups.
As pointed out by Valentin Schneider in [2] there is an additional issue with
the way in which the queueing of work is done in that the task_struct update
is currently done after the work is queued, resulting in a race with the
register update possibly done before the data needed by the update is available.
To solve these issues, the PQR_ASSOC MSR is updated in a synchronous way
right after the new closid and rmid are ready during the task movement,
only if the task is running. If a moved task is not running nothing is
done since the PQR_ASSOC MSR will be updated next time the task is scheduled.
This is the same way used to update the register when tasks are moved as
part of resource group removal.
(as t->on_cpu is already used...)
Reviewed-by: James Morse james.morse@arm.com
Thank you very much. I do plan to follow your suggestion below though.
...
...

diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 68db7d2dec8f..9d62f1fadcc3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
...
+static void update_task_closid_rmid(struct task_struct *t)
  {

int cpu;


if (task_on_cpu(t, &cpu))
smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);



I think:
|	if (task_curr(t))
|		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);
here would make for an easier backport as it doesn't depend on the previous patch.
Will do, thank you very much.
The previous patch has now become a bugfix in its own right after you 
pointed out the issue with using t->on_cpu. In addressing that I plan to 
remove the helpers found in patch #1 so backporting should continue to 
be easier.
...
...
+}
[...]
...
static int __rdtgroup_move_task(struct task_struct *tsk,
   			struct rdtgroup *rdtgrp)
  {
...

if (rdtgrp->type == RDTCTRL_GROUP) {
tsk->closid = rdtgrp->closid;


tsk->rmid = rdtgrp->mon.rmid;


} else if (rdtgrp->type == RDTMON_GROUP) {

[...]
...

} else {

...

rdt_last_cmd_puts("Invalid resource group type\n");


return -EINVAL;



Wouldn't this be a kernel bug?
I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't
user-space's fault!
You are right, this would be a kernel bug. I'll add a WARN_ON_ONCE().
...
...
}

return ret;



/*
* By now, the task's closid and rmid are set. If the task is current


* on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource


* group go into effect. If the task is not current, the MSR will be


* updated when the task is scheduled in.


*/


update_task_closid_rmid(tsk);

return 0;
}

Thank you very much.
Reinette

    

2025

2024

2023

2022

2021

2020

2019

2018

2017

Re: [PATCH 2/3] x86/resctrl: Update PQR_ASSOC MSR synchronously when moving task to resource group