From: Fenghua Yu fenghua.yu@intel.com
Currently, when moving a task to a resource group, the PQR_ASSOC MSR is updated with the new closid and rmid in a callback added to the task via task_work_add(). If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later, in the kernel exit path, when the kernel returns to the task again.
Updating the PQR_ASSOC MSR as soon as possible on the CPU where a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and wastes system resources because of the way it is implemented: work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group. This can result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with the most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping, only the last move is relevant, yet a work item is queued during each move. This unnecessary queueing of work items can waste significant system resources, especially for tasks that sleep for a long time. For example, as demonstrated by Shakeel Butt in [1], repeatedly writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups.
As pointed out by Valentin Schneider in [2], there is an additional issue with the way the work is queued: the task_struct update is currently done after the work is queued, resulting in a race in which the register update may run before the data it needs is available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task move, but only if the task is running. If a moved task is not running, nothing needs to be done since the PQR_ASSOC MSR will be updated the next time the task is scheduled in. This is the same approach already used to update the register when tasks are moved as part of resource group removal.
[1] https://lore.kernel.org/lkml/CALvZod7E9zzHwenzf7objzGKsdBmVwTgEJ0nPgs0LUFU3S...
[2] https://lore.kernel.org/lkml/20201123022433.17905-1-valentin.schneider@arm.c...
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt shakeelb@google.com
Reported-by: Valentin Schneider valentin.schneider@arm.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Tony Luck tony.luck@intel.com
Cc: stable@vger.kernel.org
---
 arch/x86/kernel/cpu/resctrl/rdtgroup.c | 123 ++++++++++---------------
 1 file changed, 50 insertions(+), 73 deletions(-)
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 68db7d2dec8f..9d62f1fadcc3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
 	kfree(rdtgrp);
 }
 
+static void _update_task_closid_rmid(void *task)
+{
+	/*
+	 * If the task is still current on this CPU, update PQR_ASSOC MSR.
+	 * Otherwise, the MSR is updated when the task is scheduled in.
+	 */
+	if (task == current)
+		resctrl_sched_in();
+}
+
 #ifdef CONFIG_SMP
 /* Get the CPU if the task is on it. */
 static bool task_on_cpu(struct task_struct *t, int *cpu)
@@ -552,94 +562,61 @@ static void set_task_cpumask(struct task_struct *t, struct cpumask *mask)
 	if (mask && task_on_cpu(t, &cpu))
 		cpumask_set_cpu(cpu, mask);
 }
-#else
-static inline void
-set_task_cpumask(struct task_struct *t, struct cpumask *mask) { }
-#endif
-
-struct task_move_callback {
-	struct callback_head	work;
-	struct rdtgroup		*rdtgrp;
-};
 
-static void move_myself(struct callback_head *head)
+static void update_task_closid_rmid(struct task_struct *t)
 {
-	struct task_move_callback *callback;
-	struct rdtgroup *rdtgrp;
-
-	callback = container_of(head, struct task_move_callback, work);
-	rdtgrp = callback->rdtgrp;
-
-	/*
-	 * If resource group was deleted before this task work callback
-	 * was invoked, then assign the task to root group and free the
-	 * resource group.
-	 */
-	if (atomic_dec_and_test(&rdtgrp->waitcount) &&
-	    (rdtgrp->flags & RDT_DELETED)) {
-		current->closid = 0;
-		current->rmid = 0;
-		rdtgroup_remove(rdtgrp);
-	}
+	int cpu;
 
-	if (unlikely(current->flags & PF_EXITING))
-		goto out;
+	if (task_on_cpu(t, &cpu))
+		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
+}
 
-	preempt_disable();
-	/* update PQR_ASSOC MSR to make resource group go into effect */
-	resctrl_sched_in();
-	preempt_enable();
+#else
+static inline void
+set_task_cpumask(struct task_struct *t, struct cpumask *mask) { }
 
-out:
-	kfree(callback);
+static void update_task_closid_rmid(struct task_struct *t)
+{
+	_update_task_closid_rmid(t);
 }
+#endif
 
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
-	struct task_move_callback *callback;
-	int ret;
-
-	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
-	if (!callback)
-		return -ENOMEM;
-	callback->work.func = move_myself;
-	callback->rdtgrp = rdtgrp;
-
 	/*
-	 * Take a refcount, so rdtgrp cannot be freed before the
-	 * callback has been invoked.
+	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+	 * updated by them.
+	 *
+	 * For ctrl_mon groups, move both closid and rmid.
+	 * For monitor groups, can move the tasks only from
+	 * their parent CTRL group.
 	 */
-	atomic_inc(&rdtgrp->waitcount);
-	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
-	if (ret) {
-		/*
-		 * Task is exiting. Drop the refcount and free the callback.
-		 * No need to check the refcount as the group cannot be
-		 * deleted before the write function unlocks rdtgroup_mutex.
-		 */
-		atomic_dec(&rdtgrp->waitcount);
-		kfree(callback);
-		rdt_last_cmd_puts("Task exited\n");
-	} else {
-		/*
-		 * For ctrl_mon groups move both closid and rmid.
-		 * For monitor groups, can move the tasks only from
-		 * their parent CTRL group.
-		 */
-		if (rdtgrp->type == RDTCTRL_GROUP) {
-			tsk->closid = rdtgrp->closid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		if (rdtgrp->mon.parent->closid == tsk->closid) {
 			tsk->rmid = rdtgrp->mon.rmid;
-		} else if (rdtgrp->type == RDTMON_GROUP) {
-			if (rdtgrp->mon.parent->closid == tsk->closid) {
-				tsk->rmid = rdtgrp->mon.rmid;
-			} else {
-				rdt_last_cmd_puts("Can't move task to different control group\n");
-				ret = -EINVAL;
-			}
+		} else {
+			rdt_last_cmd_puts("Can't move task to different control group\n");
+			return -EINVAL;
 		}
+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
 	}
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
+
+	return 0;
 }
static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
Hi Reinette, Fenghua,
Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the guts of this change more quickly!
On 03/12/2020 23:25, Reinette Chatre wrote:
From: Fenghua Yu fenghua.yu@intel.com
Currently, when moving a task to a resource group, the PQR_ASSOC MSR is updated with the new closid and rmid in a callback added to the task via task_work_add(). If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later
in the kernel exit path when the kernel returns to the task again.
kernel exit makes me think of user-space... is it enough to just say: "by __switch_to() when the task is next run"?
Updating the PQR_ASSOC MSR as soon as possible on the CPU where a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and wastes system resources because of the way it is implemented: work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group. This can result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with the most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping, only the last move is relevant, yet a work item is queued during each move. This unnecessary queueing of work items can waste significant system resources, especially for tasks that sleep for a long time. For example, as demonstrated by Shakeel Butt in [1], repeatedly writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups.
As pointed out by Valentin Schneider in [2], there is an additional issue with the way the work is queued: the task_struct update is currently done after the work is queued, resulting in a race in which the register update may run before the data it needs is available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task move, but only if the task is running. If a moved task is not running, nothing needs to be done since the PQR_ASSOC MSR will be updated the next time the task is scheduled in. This is the same approach already used to update the register when tasks are moved as part of resource group removal.
(as t->on_cpu is already used...)
Reviewed-by: James Morse james.morse@arm.com
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 68db7d2dec8f..9d62f1fadcc3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
+static void update_task_closid_rmid(struct task_struct *t)
+{
+	int cpu;
+
+	if (task_on_cpu(t, &cpu))
+		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
I think:

|	if (task_curr(t))
|		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);

here would make for an easier backport as it doesn't depend on the previous patch.
+}
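Folding that suggestion in, the helper could collapse to something like this (a sketch only; whether to keep the #ifdef CONFIG_SMP split or use IS_ENABLED() as below is an open choice, and task_curr()/task_cpu() are the existing scheduler helpers):

	static void update_task_closid_rmid(struct task_struct *t)
	{
		/* Interrupt the task only if it is currently running. */
		if (IS_ENABLED(CONFIG_SMP) && task_curr(t))
			smp_call_function_single(task_cpu(t),
						 _update_task_closid_rmid, t, 1);
		else
			_update_task_closid_rmid(t);
	}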
[...]
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {

[...]

+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
Wouldn't this be a kernel bug? I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't user-space's fault!
 	}

-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
+
+	return 0;
 }
Thanks,
James
Hi James,
On 12/9/2020 8:51 AM, James Morse wrote:
Hi Reinette, Fenghua,
Subject nit: I think 'use IPI instead of task_work() to update PQR_ASSOC_MSR' conveys the guts of this change more quickly!
Sure. Thank you. One small change: I plan to write "PQR_ASSOC MSR" instead, to match the register's name more closely.
On 03/12/2020 23:25, Reinette Chatre wrote:
From: Fenghua Yu fenghua.yu@intel.com
Currently, when moving a task to a resource group, the PQR_ASSOC MSR is updated with the new closid and rmid in a callback added to the task via task_work_add(). If the task is running, the work is run as soon as possible. If the task is not running, the work is executed later
in the kernel exit path when the kernel returns to the task again.
kernel exit makes me think of user-space... is it enough to just say: "by __switch_to() when the task is next run"?
I do not think that would be accurate. The paragraph to which you are proposing the change describes the current context, before the fix, when task_work_add() is still used to update PQR_ASSOC. The "kernel exit" text you refer to is quite close to task_work_add()'s comments and indeed refers to the work that is run before returning to user space. If a function name would make things clearer, perhaps exit_to_user_mode_loop() could be referenced instead? Changing the text to __switch_to() would not reflect the context described here: __switch_to() is called when the task is context switched back in, which is from where resctrl_sched_in() is called, not where the queued work described here runs.
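For reference, the rough call chain for TWA_RESUME work at the time of this patch (paraphrased from kernel/entry/common.c and include/linux/tracehook.h, not verbatim):

	exit_to_user_mode_prepare()
	  exit_to_user_mode_loop()		/* TIF_NOTIFY_RESUME is set */
	    tracehook_notify_resume()
	      task_work_run()			/* queued move_myself() runs here */

so the queued work runs on the return-to-user path, not in __switch_to().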
Updating the PQR_ASSOC MSR as soon as possible on the CPU where a moved task is running is the right thing to do. Queueing work for a task that is not running is unnecessary (the PQR_ASSOC MSR is already updated when the task is scheduled in) and wastes system resources because of the way it is implemented: work to update the PQR_ASSOC register is queued every time the user writes a task id to the "tasks" file, even if the task already belongs to the resource group. This can result in multiple pending work items associated with a single task even if they are all identical and even though only a single update with the most recent values is needed. Specifically, even if a task is moved between different resource groups while it is sleeping, only the last move is relevant, yet a work item is queued during each move. This unnecessary queueing of work items can waste significant system resources, especially for tasks that sleep for a long time. For example, as demonstrated by Shakeel Butt in [1], repeatedly writing the same task id to the "tasks" file can quickly consume significant memory. The same problem (wasted system resources) occurs when moving a task between different resource groups.
As pointed out by Valentin Schneider in [2], there is an additional issue with the way the work is queued: the task_struct update is currently done after the work is queued, resulting in a race in which the register update may run before the data it needs is available.
To solve these issues, update the PQR_ASSOC MSR in a synchronous way right after the new closid and rmid are ready during the task move, but only if the task is running. If a moved task is not running, nothing needs to be done since the PQR_ASSOC MSR will be updated the next time the task is scheduled in. This is the same approach already used to update the register when tasks are moved as part of resource group removal.
(as t->on_cpu is already used...)
Reviewed-by: James Morse james.morse@arm.com
Thank you very much. I do plan to follow your suggestion below though.
diff --git a/arch/x86/kernel/cpu/resctrl/rdtgroup.c b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
index 68db7d2dec8f..9d62f1fadcc3 100644
--- a/arch/x86/kernel/cpu/resctrl/rdtgroup.c
+++ b/arch/x86/kernel/cpu/resctrl/rdtgroup.c
@@ -525,6 +525,16 @@ static void rdtgroup_remove(struct rdtgroup *rdtgrp)
+static void update_task_closid_rmid(struct task_struct *t)
+{
+	int cpu;
+
+	if (task_on_cpu(t, &cpu))
+		smp_call_function_single(cpu, _update_task_closid_rmid, t, 1);
I think:

|	if (task_curr(t))
|		smp_call_function_single(task_cpu(t), _update_task_closid_rmid, t, 1);

here would make for an easier backport as it doesn't depend on the previous patch.
Will do, thank you very much.
The previous patch has now become a bugfix in its own right after you pointed out the issue with using t->on_cpu. In addressing that, I plan to remove the helpers found in patch #1, so backporting should remain easy.
+}
[...]
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {

[...]

+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
Wouldn't this be a kernel bug? I'd have thought there would be a WARN_ON_ONCE() here to make it clear this isn't user-space's fault!
You are right, this would be a kernel bug. I'll add a WARN_ON_ONCE().
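Something along these lines, sketched ahead of the next revision (hypothetical until posted):

	} else {
		WARN_ON_ONCE(1);
		rdt_last_cmd_puts("Invalid resource group type\n");
		return -EINVAL;
	}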
 	}

-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
+
+	return 0;
 }
Thank you very much.
Reinette
On 03/12/20 23:25, Reinette Chatre wrote:
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files")
Reported-by: Shakeel Butt shakeelb@google.com
Reported-by: Valentin Schneider valentin.schneider@arm.com
Signed-off-by: Fenghua Yu fenghua.yu@intel.com
Signed-off-by: Reinette Chatre reinette.chatre@intel.com
Reviewed-by: Tony Luck tony.luck@intel.com
Cc: stable@vger.kernel.org
Some pedantic comments below; with James' task_curr() + task_cpu() suggestion:
Reviewed-by: Valentin Schneider valentin.schneider@arm.com
 static int __rdtgroup_move_task(struct task_struct *tsk,
 				struct rdtgroup *rdtgrp)
 {
-	struct task_move_callback *callback;
-	int ret;
-
-	callback = kzalloc(sizeof(*callback), GFP_KERNEL);
-	if (!callback)
-		return -ENOMEM;
-	callback->work.func = move_myself;
-	callback->rdtgrp = rdtgrp;
-
 	/*
-	 * Take a refcount, so rdtgrp cannot be freed before the
-	 * callback has been invoked.
+	 * Set the task's closid/rmid before the PQR_ASSOC MSR can be
+	 * updated by them.
+	 *
+	 * For ctrl_mon groups, move both closid and rmid.
+	 * For monitor groups, can move the tasks only from
+	 * their parent CTRL group.
 	 */
-	atomic_inc(&rdtgrp->waitcount);
-	ret = task_work_add(tsk, &callback->work, TWA_RESUME);
-	if (ret) {
-		/*
-		 * Task is exiting. Drop the refcount and free the callback.
-		 * No need to check the refcount as the group cannot be
-		 * deleted before the write function unlocks rdtgroup_mutex.
-		 */
-		atomic_dec(&rdtgrp->waitcount);
-		kfree(callback);
-		rdt_last_cmd_puts("Task exited\n");
-	} else {
-		/*
-		 * For ctrl_mon groups move both closid and rmid.
-		 * For monitor groups, can move the tasks only from
-		 * their parent CTRL group.
-		 */
-		if (rdtgrp->type == RDTCTRL_GROUP) {
-			tsk->closid = rdtgrp->closid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		if (rdtgrp->mon.parent->closid == tsk->closid) {
 			tsk->rmid = rdtgrp->mon.rmid;
-		} else if (rdtgrp->type == RDTMON_GROUP) {
-			if (rdtgrp->mon.parent->closid == tsk->closid) {
-				tsk->rmid = rdtgrp->mon.rmid;
-			} else {
-				rdt_last_cmd_puts("Can't move task to different control group\n");
-				ret = -EINVAL;
-			}
+		} else {
+			rdt_last_cmd_puts("Can't move task to different control group\n");
+			return -EINVAL;
 		}
+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
 	}
James already pointed out this should be a WARN_ON_ONCE(), but is that the right place to assert rdtgrp->type validity?
I see only a single assignment to rdtgrp->type in mkdir_rdt_prepare(); could we fail the group creation there instead if the passed rtype is invalid?
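Hypothetically, something like this in mkdir_rdt_prepare() (paraphrased placement, not a verbatim patch):

	if (WARN_ON_ONCE(rtype != RDTCTRL_GROUP && rtype != RDTMON_GROUP))
		return -EINVAL;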
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
We need the above writes to be compile-ordered before the IPI is sent. There *is* a preempt_disable() down in smp_call_function_single() that gives us the required barrier(); can we deem that sufficient, or would we want one before update_task_closid_rmid() for the sake of clarity?
+
+	return 0;
 }

 static bool is_closid_match(struct task_struct *t, struct rdtgroup *r)
Hi Valentin,
On 12/11/2020 12:46 PM, Valentin Schneider wrote:
On 03/12/20 23:25, Reinette Chatre wrote:
Fixes: e02737d5b826 ("x86/intel_rdt: Add tasks files") Reported-by: Shakeel Butt shakeelb@google.com Reported-by: Valentin Schneider valentin.schneider@arm.com Signed-off-by: Fenghua Yu fenghua.yu@intel.com Signed-off-by: Reinette Chatre reinette.chatre@intel.com Reviewed-by: Tony Luck tony.luck@intel.com Cc: stable@vger.kernel.org
Some pedantic comments below; with James' task_curr() + task_cpu() suggestion:
Reviewed-by: Valentin Schneider valentin.schneider@arm.com
Thank you very much.
...
-		if (rdtgrp->type == RDTCTRL_GROUP) {
-			tsk->closid = rdtgrp->closid;
+
+	if (rdtgrp->type == RDTCTRL_GROUP) {
+		tsk->closid = rdtgrp->closid;
+		tsk->rmid = rdtgrp->mon.rmid;
+	} else if (rdtgrp->type == RDTMON_GROUP) {
+		if (rdtgrp->mon.parent->closid == tsk->closid) {
 			tsk->rmid = rdtgrp->mon.rmid;
-		} else if (rdtgrp->type == RDTMON_GROUP) {
-			if (rdtgrp->mon.parent->closid == tsk->closid) {
-				tsk->rmid = rdtgrp->mon.rmid;
-			} else {
-				rdt_last_cmd_puts("Can't move task to different control group\n");
-				ret = -EINVAL;
-			}
+		} else {
+			rdt_last_cmd_puts("Can't move task to different control group\n");
+			return -EINVAL;
 		}
+	} else {
+		rdt_last_cmd_puts("Invalid resource group type\n");
+		return -EINVAL;
 	}
James already pointed out this should be a WARN_ON_ONCE(), but is that the right place to assert rdtgrp->type validity?
I see only a single assignment to rdtgrp->type in mkdir_rdt_prepare(); could we fail the group creation there instead if the passed rtype is invalid?
Yes, there is that single assignment in mkdir_rdt_prepare(), and looking at how mkdir_rdt_prepare() is called, it is only ever called with RDTMON_GROUP or RDTCTRL_GROUP. This additional error checking was added as part of this fix but is unrelated to the fix itself. Since you and James both pointed out flaws with it and it is unnecessary, I will remove it here and keep the original checking, which continues to be sufficient.
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
We need the above writes to be compile-ordered before the IPI is sent. There *is* a preempt_disable() down in smp_call_function_single() that gives us the required barrier(); can we deem that sufficient, or would we want one before update_task_closid_rmid() for the sake of clarity?
Apologies, it is not clear to me why the preempt_disable() would be insufficient. If it is not, then there may be a few other areas (where resctrl calls smp_call_function_xxx()) that need to be re-evaluated.
Thank you very much
Reinette
On 14/12/20 18:41, Reinette Chatre wrote:
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
We need the above writes to be compile-ordered before the IPI is sent. There *is* a preempt_disable() down in smp_call_function_single() that gives us the required barrier(); can we deem that sufficient, or would we want one before update_task_closid_rmid() for the sake of clarity?
Apologies, it is not clear to me why the preempt_disable() would be insufficient. If it is not, then there may be a few other areas (where resctrl calls smp_call_function_xxx()) that need to be re-evaluated.
So that's part paranoia and part nonsense from my end - the contents of smp_call() shouldn't matter here.
If we distill the code to:
	tsk->closid = x;

	if (task_curr(tsk))
		smp_call(...);

It is somewhat far-fetched, but AFAICT this can be compiled as:

	if (task_curr(tsk)) {
		tsk->closid = x;
		smp_call(...);
	} else {
		tsk->closid = x;
	}
IOW, there could be a sequence where the closid write is ordered *after* the task_curr() read. With
	tsk->closid = x;

	barrier();

	if (task_curr(tsk))
		smp_call(...);

that explicitly cannot happen.
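Mapped onto the actual function (e.g. the RDTCTRL_GROUP case), the placement would hypothetically be:

	tsk->closid = rdtgrp->closid;
	tsk->rmid = rdtgrp->mon.rmid;

	barrier();	/* order the stores before the task_curr() check */

	update_task_closid_rmid(tsk);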
Thank you very much
Reinette
Hi Valentin,
On 12/16/2020 9:41 AM, Valentin Schneider wrote:
On 14/12/20 18:41, Reinette Chatre wrote:
-	return ret;
+
+	/*
+	 * By now, the task's closid and rmid are set. If the task is current
+	 * on a CPU, the PQR_ASSOC MSR needs to be updated to make the resource
+	 * group go into effect. If the task is not current, the MSR will be
+	 * updated when the task is scheduled in.
+	 */
+	update_task_closid_rmid(tsk);
We need the above writes to be compile-ordered before the IPI is sent. There *is* a preempt_disable() down in smp_call_function_single() that gives us the required barrier(); can we deem that sufficient, or would we want one before update_task_closid_rmid() for the sake of clarity?
Apologies, it is not clear to me why the preempt_disable() would be insufficient. If it is not, then there may be a few other areas (where resctrl calls smp_call_function_xxx()) that need to be re-evaluated.
So that's part paranoia and part nonsense from my end - the contents of smp_call() shouldn't matter here.
If we distill the code to:
	tsk->closid = x;

	if (task_curr(tsk))
		smp_call(...);

It is somewhat far-fetched, but AFAICT this can be compiled as:

	if (task_curr(tsk)) {
		tsk->closid = x;
		smp_call(...);
	} else {
		tsk->closid = x;
	}
IOW, there could be a sequence where the closid write is ordered *after* the task_curr() read.
Could you please elaborate why it would be an issue if the closid write is ordered after the task_curr() read? task_curr() does not depend on the closid.
With
	tsk->closid = x;

	barrier();

	if (task_curr(tsk))
		smp_call(...);

that explicitly cannot happen.
Reinette
On 16/12/20 18:26, Reinette Chatre wrote:
Hi Valentin,
So that's part paranoia and part nonsense from my end - the contents of smp_call() shouldn't matter here.
If we distill the code to:
	tsk->closid = x;

	if (task_curr(tsk))
		smp_call(...);

It is somewhat far-fetched, but AFAICT this can be compiled as:

	if (task_curr(tsk)) {
		tsk->closid = x;
		smp_call(...);
	} else {
		tsk->closid = x;
	}
IOW, there could be a sequence where the closid write is ordered *after* the task_curr() read.
Could you please elaborate why it would be an issue if the closid write is ordered after the task_curr() read? task_curr() does not depend on the closid.
IMO the 'task_curr()' check only makes sense if it happens *after* the write, the logic being: 'closid/rmid has been written to, does the task now need interrupting?'
In the above 'else' clause, task_curr() would need to be re-evaluated after the write: the task wasn't current *before* the write, but nothing guarantees this still holds *after* the write.
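A hypothetical interleaving to illustrate (with the closid write sunk into the 'else' clause by the compiler):

	CPU0 (moving the task)		CPU1
	task_curr(tsk) -> false
					tsk is scheduled in;
					resctrl_sched_in() reads the old closid
	tsk->closid = x;		/* no IPI is sent */

leaving tsk running with a stale PQR_ASSOC until its next switch-in.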
With
	tsk->closid = x;

	barrier();

	if (task_curr(tsk))
		smp_call(...);

that explicitly cannot happen.
Reinette