On Tue, Oct 26, 2021 at 12:14 PM Steven Rostedt rostedt@goodmis.org wrote:
On Mon, 25 Oct 2021 13:08:38 -0700 Kalesh Singh kaleshsingh@google.com wrote:
== Results ==
Divisor is a power of 2 (divisor == 32):
test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles
If the divisor is a power of 2, the optimized version is ~5.3x faster.
Divisor is not a power of 2 (divisor == 33):
test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles
To optimize this even more, if the divisor is constant, we could make a separate function to not do the branch, and just shift or divide.
Ack. I can update to use separate functions for the constant divisors.
And even if it is not a power of 2, for constants, we could implement a multiplication and shift, and guarantee an accuracy up to a defined max.
If div is a constant, then we can calculate the mult and shift, and max dividend. Let's use 20 for shift.
// This works best for small divisors if (div > max_div) { // only do a real division return; } shift = 20; mult = ((1 << shift) + div - 1) / div; delta = mult * div - (1 << shift); if (!delta) { /* div is a power of 2 */ max = -1; return; } max = (1 << shift) / delta;
I'm still trying to digest the above algorithm. But doesn't this add 2 extra divisions? What am I missing here?
Thanks, Kalesh
We would of course need to use 64 bit operations (maybe only do this for 64 bit machines). And perhaps even use bigger shift values to get a bigger max.
Then we could do:
if (val1 < max) return (val1 * mult) >> shift;
-- Steve