Hi Patrick,
[ + eas-dev ]
I have a general question about how to define the schedTune threshold array for payoff. Basically I want to check the questions below:
- Every CGroup has its own perf_boost_idx for the PB region and perf_constrain_idx for the PC region. Do you have a suggestion or guideline for defining these indexes?
And for different CGroups like "background", "foreground" or "performance", should every CGroup have its dedicated index, or can the platform share the same index value?
- How should the values in the "threshold_gains" array be defined?
IIUC this array is platform dependent, but what is a reasonable method to generate this table? Is there some suggested testing for generating it?
Or is my understanding wrong and this array is fixed, so that adjusting perf_boost_idx/perf_constrain_idx per platform is enough?
- So far we cannot set these payoff parameters (including perf_boost_idx/perf_constrain_idx and threshold_gains) dynamically from sysfs, so how can we initialize these values for a specific platform? I suppose we can only set them in the kernel's init flow, right?
Thanks, Leo Yan
On 14-Jun 21:42, Leo Yan wrote:
Hi Patrick,
Hi Leo,
[ + eas-dev ]
I have a general question about how to define the schedTune threshold array for payoff. Basically I want to check the questions below:
Every CGroup has its own perf_boost_idx for the PB region and perf_constrain_idx for the PC region. Do you have a suggestion or guideline for defining these indexes?
And for different CGroups like "background", "foreground" or "performance", should every CGroup have its dedicated index, or can the platform share the same index value?
How should the values in the "threshold_gains" array be defined?
IIUC this array is platform dependent, but what is a reasonable method to generate this table? Is there some suggested testing for generating it?
Or is my understanding wrong and this array is fixed, so that adjusting perf_boost_idx/perf_constrain_idx per platform is enough?
So far we cannot set these payoff parameters (including perf_boost_idx/perf_constrain_idx and threshold_gains) dynamically from sysfs, so how can we initialize these values for a specific platform? I suppose we can only set them in the kernel's init flow, right?
I think all these questions in the end boil down to a single one: is threshold_params a platform dependent or independent array?
Well, my original view was for this array to be NOT platform dependent at all. It is actually defined just as an "implementation detail" to speed up the __schedtune_accept_deltas function.
Let's assume for a moment that you do not have this array. In this case the PE space is still defined, as well as the PB and PC cuts. These cuts are just in a continuous space, but the conceptual standpoint still holds: the more you boost a task, the more energy you accept to pay for a smaller performance gain.
That's the basic idea, which then translates into some implementation considerations: does it make sense to distinguish between tasks boosted 61% or 62%? Probably not, thus the PB and PC spaces can be discretized to get a faster check for whether a scheduling candidate is above or below the cut. An easy way to define a finite and simple set of cuts was to consider points which are just 1 unit apart on the Perf and Energy axes. That's how that table has been defined. Again, there are no platform related considerations in building that table.
However, you can argue that the optimal boost value for a task is something which is somehow platform dependent. I would say that it is more use-case dependent. Thus, boosting all the foreground tasks with the same value is probably not the best way to go. Right, but that's the reason why we support the possibility to define multiple CGroups.
Each cgroup can be used to define the boost value which has been found to be optimal for certain use-cases running on a specific platform.
These are the ideas at the base of the original design, but if you have a different view let's talk about it. Maybe some more specific examples/use-cases can help describe the need for a different approach.
Thanks, Leo Yan
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi
Hi Patrick,
On Thu, Jun 23, 2016 at 09:54:13AM +0100, Patrick Bellasi wrote:
On 14-Jun 21:42, Leo Yan wrote:
Hi Patrick,
Hi Leo,
[ + eas-dev ]
I have a general question about how to define the schedTune threshold array for payoff. Basically I want to check the questions below:
Every CGroup has its own perf_boost_idx for the PB region and perf_constrain_idx for the PC region. Do you have a suggestion or guideline for defining these indexes?
And for different CGroups like "background", "foreground" or "performance", should every CGroup have its dedicated index, or can the platform share the same index value?
How should the values in the "threshold_gains" array be defined?
IIUC this array is platform dependent, but what is a reasonable method to generate this table? Is there some suggested testing for generating it?
Or is my understanding wrong and this array is fixed, so that adjusting perf_boost_idx/perf_constrain_idx per platform is enough?
So far we cannot set these payoff parameters (including perf_boost_idx/perf_constrain_idx and threshold_gains) dynamically from sysfs, so how can we initialize these values for a specific platform? I suppose we can only set them in the kernel's init flow, right?
I think all these questions in the end boil down to a single one: is threshold_params a platform dependent or independent array?
Well, my original view was for this array to be NOT platform dependent at all. It is actually defined just as an "implementation detail" to speed up the __schedtune_accept_deltas function.
Let's assume for a moment that you do not have this array. In this case the PE space is still defined, as well as the PB and PC cuts. These cuts are just in a continuous space, but the conceptual standpoint still holds: the more you boost a task, the more energy you accept to pay for a smaller performance gain.
To be honest, the sentence above is a very good explanation for understanding the PB cuts, but I still struggle to understand the PC cuts :) I usually think the (PC) region is used to make sure performance will not degrade too much if we cannot see a significant power saving.
That's the basic idea, which then translates into some implementation considerations: does it make sense to distinguish between tasks boosted 61% or 62%? Probably not, thus the PB and PC spaces can be discretized to get a faster check for whether a scheduling candidate is above or below the cut. An easy way to define a finite and simple set of cuts was to consider points which are just 1 unit apart on the Perf and Energy axes. That's how that table has been defined. Again, there are no platform related considerations in building that table.
Thanks for the clear explanation. I have no more questions from the design perspective, but I still have concerns about how to use it on a SoC. Let's look at the implementation in more detail:
If boost = 0 then the PE cut will be vertical, so only the (O) and (PC) regions are kept. If boost = 5% or 10%, with the current threshold_params it will cut both the (PB) and (PC) regions; this is hard to understand.
I think a more reasonable method is to shift the cut gradient from vertical slightly to the right, so we keep almost all of the (PC) region and also give some chance to the (PB) region. If so, then threshold_params should be:
static struct threshold_params threshold_gains[] = {
	{ 1, 5 }, /* >= 0% */
	{ 2, 5 }, /* >= 10% */
	{ 3, 5 }, /* >= 20% */
	{ 4, 5 }, /* >= 30% */
	{ 5, 5 }, /* >= 40% */
	{ 5, 4 }, /* >= 50% */
	{ 5, 3 }, /* >= 60% */
	{ 5, 2 }, /* >= 70% */
	{ 5, 1 }, /* >= 80% */
	{ 5, 0 }  /* >= 90% */
};
And perf_boost_idx and perf_constrain_idx would have the same value, and thus the same gradient, so the cut would shift step by step from a high gradient to a low gradient. What do you think about this?
However, you can argue that the optimal boost value for a task is something which is somehow platform dependent. I would say that it is more use-case dependent. Thus, boosting all the foreground tasks with the same value is probably not the best way to go. Right, but that's the reason why we support the possibility to define multiple CGroups.
Each cgroup can be used to define the boost value which has been found to be optimal for certain use-cases running on a specific platform.
These are the ideas at the base of the original design, but if you have a different view let's talk about it. Maybe some more specific examples/use-cases can help describe the need for a different approach.
Agree. payoff is not to resolve optimal issue but this is done by CGroup. Case is important and I will try to gather related info if possible.
Thanks, Leo Yan
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi
On 24-Jun 15:37, Leo Yan wrote:
Hi Patrick,
On Thu, Jun 23, 2016 at 09:54:13AM +0100, Patrick Bellasi wrote:
On 14-Jun 21:42, Leo Yan wrote:
Hi Patrick,
Hi Leo,
[ + eas-dev ]
I have a general question about how to define the schedTune threshold array for payoff. Basically I want to check the questions below:
Every CGroup has its own perf_boost_idx for the PB region and perf_constrain_idx for the PC region. Do you have a suggestion or guideline for defining these indexes?
And for different CGroups like "background", "foreground" or "performance", should every CGroup have its dedicated index, or can the platform share the same index value?
How should the values in the "threshold_gains" array be defined?
IIUC this array is platform dependent, but what is a reasonable method to generate this table? Is there some suggested testing for generating it?
Or is my understanding wrong and this array is fixed, so that adjusting perf_boost_idx/perf_constrain_idx per platform is enough?
So far we cannot set these payoff parameters (including perf_boost_idx/perf_constrain_idx and threshold_gains) dynamically from sysfs, so how can we initialize these values for a specific platform? I suppose we can only set them in the kernel's init flow, right?
I think all these questions in the end boil down to a single one: is threshold_params a platform dependent or independent array?
Well, my original view was for this array to be NOT platform dependent at all. It is actually defined just as an "implementation detail" to speed up the __schedtune_accept_deltas function.
Let's assume for a moment that you do not have this array. In this case the PE space is still defined, as well as the PB and PC cuts. These cuts are just in a continuous space, but the conceptual standpoint still holds: the more you boost a task, the more energy you accept to pay for a smaller performance gain.
To be honest, the sentence above is a very good explanation for understanding the PB cuts, but I still struggle to understand the PC cuts :) I usually think the (PC) region is used to make sure performance will not degrade too much if we cannot see a significant power saving.
You are right. Thus, in the previous example: the more you boost a task, the more energy you have to save to justify an impact on performance. Can you see that as a description of the PC cut?
That's the basic idea, which then translates into some implementation considerations: does it make sense to distinguish between tasks boosted 61% or 62%? Probably not, thus the PB and PC spaces can be discretized to get a faster check for whether a scheduling candidate is above or below the cut. An easy way to define a finite and simple set of cuts was to consider points which are just 1 unit apart on the Perf and Energy axes. That's how that table has been defined. Again, there are no platform related considerations in building that table.
Thanks for the clear explanation. I have no more questions from the design perspective, but I still have concerns about how to use it on a SoC. Let's look at the implementation in more detail:
If boost = 0 then the PE cut will be vertical, so only the (O) and (PC) regions are kept. If boost = 5% or 10%, with the current threshold_params it will cut both the (PB) and (PC) regions; this is hard to understand.
The most recent version of the cuts we are using is: http://www.linux-arm.org/git?p=linux-pb.git%3Ba=blob%3Bf=kernel/sched/tune.c...
static struct threshold_params threshold_gains[] = {
	{ 0, 4 }, /* >= 0% */
	{ 0, 4 }, /* >= 10% */
	{ 1, 4 }, /* >= 20% */
	{ 2, 4 }, /* >= 30% */
	{ 3, 4 }, /* >= 40% */
	{ 4, 3 }, /* >= 50% */
	{ 4, 2 }, /* >= 60% */
	{ 4, 1 }, /* >= 70% */
	{ 4, 0 }, /* >= 80% */
	{ 4, 0 }  /* >= 90% */
};
This table defines a vertical cut up to 19% boost values. In other words, up to a 19% boost value we are in a boost "dead zone" where we bias only OPP selections, without allowing any increase in energy consumption. The same table defines that above a 79% boost value we are in a sort of "accept all" zone, where we accept every scheduling candidate which provides a capacity increase, without caring about energy variations. All the boost values between 20% and 79% define different performance-energy trade-offs, with the PB and PC regions cut with the same gradient.
I think a more reasonable method is to shift the cut gradient from vertical slightly to the right, so we keep almost all of the (PC) region and also give some chance to the (PB) region. If so, then
I cannot really get this point. What's the goal?
threshold_params should be:
static struct threshold_params threshold_gains[] = {
	{ 1, 5 }, /* >= 0% */
	{ 2, 5 }, /* >= 10% */
	{ 3, 5 }, /* >= 20% */
	{ 4, 5 }, /* >= 30% */
	{ 5, 5 }, /* >= 40% */
	{ 5, 4 }, /* >= 50% */
	{ 5, 3 }, /* >= 60% */
	{ 5, 2 }, /* >= 70% */
	{ 5, 1 }, /* >= 80% */
	{ 5, 0 }  /* >= 90% */
};
And perf_boost_idx and perf_constrain_idx would have the same value, and thus the same gradient, so the cut would shift step by step from a high gradient to a low gradient. What do you think about this?
AFAIU this new table is basically: a) removing the "dead zone" up to 19%; b) reducing the "accept all" region to boost values >90%.
It sounds like we should run some experiments and benchmarks to check how different this setup is from the previous one. At first glance it seems to be a little more aggressive at low boost values and more conservative at high boost values.
However, you can argue that the optimal boost value for a task is something which is somehow platform dependent. I would say that it is more use-case dependent. Thus, boosting all the foreground tasks with the same value is probably not the best way to go. Right, but that's the reason why we support the possibility to define multiple CGroups.
Each cgroup can be used to define the boost value which has been found to be optimal for certain use-cases running on a specific platform.
These are the ideas at the base of the original design, but if you have a different view let's talk about it. Maybe some more specific examples/use-cases can help describe the need for a different approach.
Agreed. The payoff is not meant to resolve the optimality issue; that is done via CGroups. Use cases are important and I will try to gather related info if possible.
We need to set up an evaluation exercise with reproducible benchmarks to properly evaluate these variations.
-- #include <best/regards.h>
Patrick Bellasi
On Fri, Jun 24, 2016 at 04:21:03PM +0100, Patrick Bellasi wrote:
[...]
I think all these questions in the end boil down to a single one: is threshold_params a platform dependent or independent array?
Well, my original view was for this array to be NOT platform dependent at all. It is actually defined just as an "implementation detail" to speed up the __schedtune_accept_deltas function.
Let's assume for a moment that you do not have this array. In this case the PE space is still defined, as well as the PB and PC cuts. These cuts are just in a continuous space, but the conceptual standpoint still holds: the more you boost a task, the more energy you accept to pay for a smaller performance gain.
To be honest, the sentence above is a very good explanation for understanding the PB cuts, but I still struggle to understand the PC cuts :) I usually think the (PC) region is used to make sure performance will not degrade too much if we cannot see a significant power saving.
You are right. Thus, in the previous example: the more you boost a task, the more energy you have to save to justify an impact on performance. Can you see that as a description of the PC cut?
Yes. There are three factors for understanding the payoff: boost margin, performance, and energy.
That's the basic idea, which then translates into some implementation considerations: does it make sense to distinguish between tasks boosted 61% or 62%? Probably not, thus the PB and PC spaces can be discretized to get a faster check for whether a scheduling candidate is above or below the cut. An easy way to define a finite and simple set of cuts was to consider points which are just 1 unit apart on the Perf and Energy axes. That's how that table has been defined. Again, there are no platform related considerations in building that table.
Thanks for the clear explanation. I have no more questions from the design perspective, but I still have concerns about how to use it on a SoC. Let's look at the implementation in more detail:
If boost = 0 then the PE cut will be vertical, so only the (O) and (PC) regions are kept. If boost = 5% or 10%, with the current threshold_params it will cut both the (PB) and (PC) regions; this is hard to understand.
The most recent version of the cuts we are using is: http://www.linux-arm.org/git?p=linux-pb.git%3Ba=blob%3Bf=kernel/sched/tune.c...
Why doesn't this code base apply the patch for the PE filter issues we found [1]? I just want to confirm we are discussing the same code base.
[1] https://lists.linaro.org/pipermail/eas-dev/2016-May/000428.html
static struct threshold_params threshold_gains[] = {
	{ 0, 4 }, /* >= 0% */
	{ 0, 4 }, /* >= 10% */
	{ 1, 4 }, /* >= 20% */
	{ 2, 4 }, /* >= 30% */
	{ 3, 4 }, /* >= 40% */
	{ 4, 3 }, /* >= 50% */
	{ 4, 2 }, /* >= 60% */
	{ 4, 1 }, /* >= 70% */
	{ 4, 0 }, /* >= 80% */
	{ 4, 0 }  /* >= 90% */
};
This table defines a vertical cut up to 19% boost values. In other words, up to a 19% boost value we are in a boost "dead zone" where we bias only OPP selections, without allowing any increase in energy consumption. The same table defines that above a 79% boost value we are in a sort of "accept all" zone, where we accept every scheduling candidate which provides a capacity increase, without caring about energy variations. All the boost values between 20% and 79% define different performance-energy trade-offs, with the PB and PC regions cut with the same gradient.
Just a reminder: with the code below, I think the PB and PC gradients are _NOT_ the same.
int
sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
			       void __user *buffer, size_t *lenp,
			       loff_t *ppos)
{
	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);

	if (ret || !write)
		return ret;

	/* Performance Boost (B) region threshold params */
	perf_boost_idx = sysctl_sched_cfs_boost;
	perf_boost_idx /= 10;

	/* Performance Constraint (C) region threshold params */
	perf_constrain_idx = 100 - sysctl_sched_cfs_boost;
	perf_constrain_idx /= 10;

	return 0;
}
I think a more reasonable method is to shift the cut gradient from vertical slightly to the right, so we keep almost all of the (PC) region and also give some chance to the (PB) region. If so, then
I cannot really get this point. What's the goal?
To demonstrate my idea more easily, please see the plot I drew for the PE filter regions:
https://people.linaro.org/~leo.yan/PE_Filter_Regions.png
For boost=0, the PE filter cut is vertical at the left of the X axis. If boost=5, the cut gradient rotates to the right, which enables part of the (PB) region and removes some of the (PC) region. If boost=100, the cut gradient finally rotates to horizontal, which fully enables the (PB) region and removes the whole (PC) region.
Essentially the table below is meant to implement this idea properly. Loosening or restricting a specific margin is not my purpose; I want to figure out whether we can define a more regular trend for the PE filter regions.
threshold_params should be:
static struct threshold_params threshold_gains[] = {
	{ 1, 5 }, /* >= 0% */
	{ 2, 5 }, /* >= 10% */
	{ 3, 5 }, /* >= 20% */
	{ 4, 5 }, /* >= 30% */
	{ 5, 5 }, /* >= 40% */
	{ 5, 4 }, /* >= 50% */
	{ 5, 3 }, /* >= 60% */
	{ 5, 2 }, /* >= 70% */
	{ 5, 1 }, /* >= 80% */
	{ 5, 0 }  /* >= 90% */
};
And perf_boost_idx and perf_constrain_idx would have the same value, and thus the same gradient, so the cut would shift step by step from a high gradient to a low gradient. What do you think about this?
AFAIU this new table is basically: a) removing the "dead zone" up to 19%; b) reducing the "accept all" region to boost values >90%.
It sounds like we should run some experiments and benchmarks to check how different this setup is from the previous one. At first glance it seems to be a little more aggressive at low boost values and more conservative at high boost values.
However, you can argue that the optimal boost value for a task is something which is somehow platform dependent. I would say that it is more use-case dependent. Thus, boosting all the foreground tasks with the same value is probably not the best way to go. Right, but that's the reason why we support the possibility to define multiple CGroups.
Each cgroup can be used to define the boost value which has been found to be optimal for certain use-cases running on a specific platform.
These are the ideas at the base of the original design, but if you have a different view let's talk about it. Maybe some more specific examples/use-cases can help describe the need for a different approach.
Agreed. The payoff is not meant to resolve the optimality issue; that is done via CGroups. Use cases are important and I will try to gather related info if possible.
We need to set up an evaluation exercise with reproducible benchmarks to properly evaluate these variations.
-- #include <best/regards.h>
Patrick Bellasi
On 27-Jun 15:34, Leo Yan wrote:
On Fri, Jun 24, 2016 at 04:21:03PM +0100, Patrick Bellasi wrote:
[...]
I think all these questions in the end boil down to a single one: is threshold_params a platform dependent or independent array?
Well, my original view was for this array to be NOT platform dependent at all. It is actually defined just as an "implementation detail" to speed up the __schedtune_accept_deltas function.
Let's assume for a moment that you do not have this array. In this case the PE space is still defined, as well as the PB and PC cuts. These cuts are just in a continuous space, but the conceptual standpoint still holds: the more you boost a task, the more energy you accept to pay for a smaller performance gain.
To be honest, the sentence above is a very good explanation for understanding the PB cuts, but I still struggle to understand the PC cuts :) I usually think the (PC) region is used to make sure performance will not degrade too much if we cannot see a significant power saving.
You are right. Thus, in the previous example: the more you boost a task, the more energy you have to save to justify an impact on performance. Can you see that as a description of the PC cut?
Yes. There are three factors for understanding the payoff: boost margin, performance, and energy.
That's the basic idea, which then translates into some implementation considerations: does it make sense to distinguish between tasks boosted 61% or 62%? Probably not, thus the PB and PC spaces can be discretized to get a faster check for whether a scheduling candidate is above or below the cut. An easy way to define a finite and simple set of cuts was to consider points which are just 1 unit apart on the Perf and Energy axes. That's how that table has been defined. Again, there are no platform related considerations in building that table.
Thanks for the clear explanation. I have no more questions from the design perspective, but I still have concerns about how to use it on a SoC. Let's look at the implementation in more detail:
If boost = 0 then the PE cut will be vertical, so only the (O) and (PC) regions are kept. If boost = 5% or 10%, with the current threshold_params it will cut both the (PB) and (PC) regions; this is hard to understand.
The most recent version of the cuts we are using is: http://www.linux-arm.org/git?p=linux-pb.git%3Ba=blob%3Bf=kernel/sched/tune.c...
Why doesn't this code base apply the patch for the PE filter issues we found [1]? I just want to confirm we are discussing the same code base.
[1] https://lists.linaro.org/pipermail/eas-dev/2016-May/000428.html
You're right, internally we use a code base which includes the patch from that discussion. The link I provided before was just what we released as v5.2, which at that time did not include the PE filter patch. The link was just to point you to the version of the threshold_gains table we are using.
static struct threshold_params threshold_gains[] = {
	{ 0, 4 }, /* >= 0% */
	{ 0, 4 }, /* >= 10% */
	{ 1, 4 }, /* >= 20% */
	{ 2, 4 }, /* >= 30% */
	{ 3, 4 }, /* >= 40% */
	{ 4, 3 }, /* >= 50% */
	{ 4, 2 }, /* >= 60% */
	{ 4, 1 }, /* >= 70% */
	{ 4, 0 }, /* >= 80% */
	{ 4, 0 }  /* >= 90% */
};
This table defines a vertical cut up to 19% boost values. In other words, up to a 19% boost value we are in a boost "dead zone" where we bias only OPP selections, without allowing any increase in energy consumption. The same table defines that above a 79% boost value we are in a sort of "accept all" zone, where we accept every scheduling candidate which provides a capacity increase, without caring about energy variations. All the boost values between 20% and 79% define different performance-energy trade-offs, with the PB and PC regions cut with the same gradient.
Just a reminder: with the code below, I think the PB and PC gradients are _NOT_ the same.
int
sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
			       void __user *buffer, size_t *lenp,
			       loff_t *ppos)
{
	int ret = proc_dointvec_minmax(table, write, buffer, lenp, ppos);

	if (ret || !write)
		return ret;

	/* Performance Boost (B) region threshold params */
	perf_boost_idx = sysctl_sched_cfs_boost;
	perf_boost_idx /= 10;

	/* Performance Constraint (C) region threshold params */
	perf_constrain_idx = 100 - sysctl_sched_cfs_boost;
	                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
I don't remember why this ended up like this... but it seems completely broken! :-(
There are two main issues:
1) the constraints do not have the same gradient, e.g. for a boost of 20%:
   perf_boost_idx = 2     ==> (nrg_gain: 1, cap_gain: 4)
   perf_constrain_idx = 8 ==> (nrg_gain: 4, cap_gain: 0)
2) we overflow the threshold_gains array for boost=100
The second issue is due to a check I forgot to port in one of the many rewrites...
The first issue is much worse; it's quite likely an implementation error. I've always considered the two margins to have the same gradient, with the exact behavior you described in the plot you shared.
Right now, for boost=0, we basically accept only candidates in the O region. Which is NOT what we would like: for boost=0 we would like to behave as a "standard" EAS, which optimizes just for energy reduction, without constraints on performance impacts.
	perf_constrain_idx /= 10;

	return 0;
}
I think a more reasonable method is to shift the cut gradient from vertical slightly to the right, so we keep almost all of the (PC) region and also give some chance to the (PB) region. If so, then
I cannot really get this point. What's the goal?
To demonstrate my idea more easily, please see the plot I drew for the PE filter regions:
https://people.linaro.org/~leo.yan/PE_Filter_Regions.png
For boost=0, the PE filter cut is vertical at the left of the X axis. If boost=5, the cut gradient rotates to the right, which enables part of the (PB) region and removes some of the (PC) region. If boost=100, the cut gradient finally rotates to horizontal, which fully enables the (PB) region and removes the whole (PC) region.
That's exactly the original idea of how the PE region cuts should work. Thanks for sharing these plots, they are quite useful. I would like to add something similar to LISA...
Essentially the table below is meant to implement this idea properly. Loosening or restricting a specific margin is not my purpose; I want to figure out whether we can define a more regular trend for the PE filter regions.
I'm not sure updating the table is enough; it's basically "just" increasing the granularity of the cuts near the 0% and 100% boost values...
threshold_params should be:
static struct threshold_params threshold_gains[] = {
	{ 1, 5 }, /* >= 0% */
For boost=0 we would have:
/* Performance Boost (B) region threshold params */
perf_boost_idx = sysctl_sched_cfs_boost;
perf_boost_idx /= 10;
==> perf_boost_idx = 0
  ====> nrg_gain: 1
  ====> cap_gain: 5
/* Performance Constraint (C) region threshold params */
perf_constrain_idx = 100 - sysctl_sched_cfs_boost;
perf_constrain_idx /= 10;
==> perf_constrain_idx = 10 (overflow)
==> perf_constrain_idx = 9 (once fixed for boundary checks)
  ====> nrg_gain: 5
  ====> cap_gain: 0
Thus, for example, a scheduling candidate which corresponds to a 50% decrease in both energy and capacity would return:
__schedtune_accept_deltas(int nrg_delta, int cap_delta, int perf_boost_idx, int perf_constrain_idx)
gain_idx = perf_constrain_idx ==> 9
payoff = cap_delta * threshold_gains[gain_idx].nrg_gain; ==> -50 * 5
payoff -= nrg_delta * threshold_gains[gain_idx].cap_gain; ==> -250 - (-50 * 0)
==> payoff: -250 ==> REJECT
And that is wrong, because we would expect to accept a candidate which reduces energy by 50%, regardless of the 50% impact on performance.
I think the current solution can have bad impacts both at lower boost values, by forbidding tasks from being spread, and at higher boost values, by allowing small energy savings while heavily impacting performance.
	{ 2, 5 }, /* >= 10% */
	{ 3, 5 }, /* >= 20% */
	{ 4, 5 }, /* >= 30% */
	{ 5, 5 }, /* >= 40% */
	{ 5, 4 }, /* >= 50% */
	{ 5, 3 }, /* >= 60% */
	{ 5, 2 }, /* >= 70% */
	{ 5, 1 }, /* >= 80% */
	{ 5, 0 }  /* >= 90% */
};
And perf_boost_idx and perf_constrain_idx would have the same value, and thus the same gradient, so the cut would shift step by step from a high gradient to a low gradient. What do you think about this?
AFAIU this new table is basically: a) removing the "dead zone" up to 19%; b) reducing the "accept all" region to boost values >90%.
It sounds like we should run some experiments and benchmarks to check how different this setup is from the previous one. At first glance it seems to be a little more aggressive at low boost values and more conservative at high boost values.
However, you can argue that the optimal boost value for a task is something which is somehow platform dependent. I would say that it is more use-case dependent. Thus, boosting all the foreground tasks with the same value is probably not the best way to go. Right, but that's the reason why we support the possibility to define multiple CGroups.
Each cgroup can be used to define the boost value which has been found to be optimal for certain use-cases running on a specific platform.
These are the ideas at the base of the original design, but if you have a different view let's talk about it. Maybe some more specific examples/use-cases can help describe the need for a different approach.
Agreed. The payoff is not meant to resolve the optimality issue; that is done via CGroups. Use cases are important and I will try to gather related info if possible.
We need to set up an evaluation exercise with reproducible benchmarks to properly evaluate these variations.
-- #include <best/regards.h>
Patrick Bellasi
-- #include <best/regards.h>
Patrick Bellasi
On Mon, Jun 27, 2016 at 01:42:48PM +0100, Patrick Bellasi wrote:
[...]
To demonstrate my idea more easily, please see the plot I drew for the PE filter regions:
https://people.linaro.org/~leo.yan/PE_Filter_Regions.png
For boost=0, means PE filter region is left cut of X axis. If boost=5, then that means the cut gradient will rotate to right then will enable part of PB region and remove some of PC region. If boost=100, that means the cut gradient finally rotate to horizontal level and will totally enable PB region and remove whole PC region.
That's exactly the original idea based on which the PE regions cuts should work. Thanks for sharing these plots, they are quite useful. I would like to add something similar in LISA...
Glad we are aligned. I have shared the ipynb file at: https://people.linaro.org/~leo.yan/Payoff_PE_Filter_Regions.ipynb
The code is not clean or generic enough, but I hope it's helpful.
Essentially, the table below tries to implement this idea properly. Loosening or restricting a specific margin is not my purpose; rather, I want to figure out whether we can define a more regular trend for the PE filter region.
Not sure updating the table is enough, it's basically "just" increasing the granularity of cuts near the 0% and 100% boost values...
threshold_params should be:
static struct threshold_params threshold_gains[] = {
	{ 1, 5 }, /* >= 0% */
For boost=0 we would have:
/* Performance Boost (B) region threshold params */
perf_boost_idx = sysctl_sched_cfs_boost;
perf_boost_idx /= 10;

==> perf_boost_idx = 0
====> nrg_gain: 1
====> cap_gain: 5

/* Performance Constraint (C) region threshold params */
perf_constrain_idx = 100 - sysctl_sched_cfs_boost;
perf_constrain_idx /= 10;

==> perf_constrain_idx = 10 (overflow)
==> perf_constrain_idx = 9 (once fixed by the boundary checks)
====> nrg_gain: 5
====> cap_gain: 0
Thus, for example, a scheduling candidate which corresponds to a 50% decrease in both energy and capacity would return:
__schedtune_accept_deltas(int nrg_delta, int cap_delta, int perf_boost_idx, int perf_constrain_idx)
gain_idx = perf_constrain_idx                              ==> 9
payoff  = cap_delta * threshold_gains[gain_idx].nrg_gain;  ==> -50 * 5
payoff -= nrg_delta * threshold_gains[gain_idx].cap_gain;  ==> -250 - (-50 * 0)

==> payoff: -250
==> REJECT
And that is wrong, because we would expect to accept a candidate which reduces energy by 50%, regardless of the 50% impact on performance.
I think the current solution can have bad impacts both at lower boost values, by forbidding tasks to spread, and at higher boost values, by allowing a small amount of energy to be saved while heavily impacting performance.
How about the code below:
diff --git a/kernel/sched/tune.c b/kernel/sched/tune.c
index ced8ba0..e2303c4 100644
--- a/kernel/sched/tune.c
+++ b/kernel/sched/tune.c
@@ -41,16 +41,17 @@ struct threshold_params {
  */
 static struct threshold_params threshold_gains[] = {
-	{ 0, 4 }, /* >= 0% */
-	{ 0, 4 }, /* >= 10% */
-	{ 1, 4 }, /* >= 20% */
-	{ 2, 4 }, /* >= 30% */
-	{ 3, 4 }, /* >= 40% */
-	{ 4, 3 }, /* >= 50% */
-	{ 4, 2 }, /* >= 60% */
-	{ 4, 1 }, /* >= 70% */
-	{ 4, 0 }, /* >= 80% */
-	{ 4, 0 }  /* >= 90% */
+	{ 0, 5 }, /*  0% */
+	{ 1, 5 }, /* 01% .. 10% */
+	{ 2, 5 }, /* 11% .. 20% */
+	{ 3, 5 }, /* 21% .. 30% */
+	{ 4, 5 }, /* 31% .. 40% */
+	{ 5, 5 }, /* 41% .. 50% */
+	{ 5, 4 }, /* 51% .. 60% */
+	{ 5, 3 }, /* 61% .. 70% */
+	{ 5, 2 }, /* 71% .. 80% */
+	{ 5, 1 }, /* 81% .. 90% */
+	{ 5, 0 }  /* 91% .. 100% */
 };

 static int
@@ -571,12 +572,13 @@ sysctl_sched_cfs_boost_handler(struct ctl_table *table, int write,
 		return ret;

 	/* Performance Boost (B) region threshold params */
-	perf_boost_idx  = sysctl_sched_cfs_boost;
+	perf_boost_idx  = sysctl_sched_cfs_boost;
+	perf_boost_idx += 9;
 	perf_boost_idx /= 10;

 	/* Performance Constraint (C) region threshold params */
-	perf_constrain_idx  = 100 - sysctl_sched_cfs_boost;
-	perf_constrain_idx /= 10;
+	perf_constrain_idx = perf_boost_idx;

 	return 0;
 }
Thanks, Leo Yan
On 27-Jun 21:46, Leo Yan wrote:
On Mon, Jun 27, 2016 at 01:42:48PM +0100, Patrick Bellasi wrote:
[...]
How about below code:
I'm working on something quite similar but different... reviewing that code patch, I think there are other issues as well.
For example, I think we are not updating the threshold indexes in the case of CGroup support. Please find attached a series which fixes these issues; if you can review and test it, that would be appreciated ;-)
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi
Hi Patrick,
With your new fix patch, you define
perf_constrain_idx = perf_boost_idx,
As I understand it, this means the PC region will permit more performance reduction for the same energy reduction. For example:

boost = 15, nrg_delta = -10.

Previous solution: perf_boost_idx = 1, perf_constrain_idx = 9.
For the PC region, nrg_gain = 5, cap_gain = 1, then:
  cap_delta * 5 - (-10 * 1) > 0  ==>  cap_delta > -2

New solution: perf_boost_idx = 1, perf_constrain_idx = 1.
For the PC region, nrg_gain = 1, cap_gain = 5, then:
  cap_delta * 1 - (-10 * 5) > 0  ==>  cap_delta > -50

So we can see that, with the new solution, the accepted capacity delta range is much bigger.
I don't know whether my understanding is correct; if it is, will this cause more performance downgrade? Am I right?
On 2016-06-28 02:47, Patrick Bellasi wrote:
On 27-Jun 21:46, Leo Yan wrote:
On Mon, Jun 27, 2016 at 01:42:48PM +0100, Patrick Bellasi wrote:
[...]
How about below code:
I'm working on something quite similar but different... reviewing that code patch I think there are other issues as well.
For example, I think we are not updating the threshold indexes in case of CGroup support. Please find in attachment a series which fixes these issue, if you can give it a review and test it will be appreciated ;-)
Cheers Patrick
On 01-Jul 14:35, toby huang wrote:
Hi Patrick,
With your new fix patch, you define
perf_constrain_idx = perf_boost_idx,
[...]
I don't know whether my understanding is correct; if it is, will this cause more performance downgrade? Am I right?
You are right, but that is the intended design of SchedTune.
If you use low boost values (e.g. 10%) you are more biased towards energy saving, and thus you are willing to accept a bigger impact on performance for a certain amount of energy saving.
Whereas, when you boost more (e.g. 80%), you are more biased towards performance boosting, and thus you will reject a big impact on performance for the same amount of energy saving.
Unfortunately, the previous implementation was breaking this assumption.
Cheers Patrick
-- #include <best/regards.h>
Patrick Bellasi