On Mon, 25 Oct 2021 13:08:38 -0700 Kalesh Singh kaleshsingh@google.com wrote:
== Results ==
Divisor is a power of 2 (divisor == 32):
test_hist_field_div_not_optimized | 8,717,091 cpu-cycles test_hist_field_div_optimized | 1,643,137 cpu-cycles
If the divisor is a power of 2, the optimized version is ~5.3x faster.
Divisor is not a power of 2 (divisor == 33):
test_hist_field_div_not_optimized | 4,444,324 cpu-cycles test_hist_field_div_optimized | 5,497,958 cpu-cycles
To optimize this even more, if the divisor is constant, we could make a separate function to not do the branch, and just shift or divide.
And even if it is not a power of 2, for constants, we could implement a multiplication and shift, and guarantee an accuracy up to a defined max.
If div is a constant, then we can calculate the mult and shift, and max dividend. Let's use 20 for shift.
// This works best for small divisors if (div > max_div) { // only do a real division return; } shift = 20; mult = ((1 << shift) + div - 1) / div; delta = mult * div - (1 << shift); if (!delta) { /* div is a power of 2 */ max = -1; return; } max = (1 << shift) / delta;
We would of course need to use 64 bit operations (maybe only do this for 64 bit machines). And perhaps even use bigger shift values to get a bigger max.
Then we could do:
if (val1 < max) return (val1 * mult) >> shift;
-- Steve