abs()
. I thought to myself, `Gee, this looks easily exploitable with a bit hack. I wonder if the C math implementation uses branching. Well, there's only one way to find out!'I wrote a program that would compare four different implementations of the absolute value function:
- inlined bithack
- inlined branching
<math.h>
implementation- macro bithack
i
is i + (i < 0) * (-2*i)
. Clever, huh? :)The program was compiled with the following command:
gcc -o abs fast_abs.c -O0
and run with the following command:
pnqsub time ./abs i
where
i
was an integer from 0 to 3 indicating which algorithm should be tested.Runtimes
The testing methodology was not the best; I timed the programs with the UNIX time command instead of using 6.172's
ktiming
framework (because I couldn't get the darn framework to cooperate with the linker). In any case I present (possibly inaccurate, but interesting) data:Algorithm | Time (s) |
---|---|
1 | 5.20 |
2 | 4.88 |
3 | 2.61 |
4 | 2.62 |
At first, I was like, woah, how does inlined branching go 5% faster than inlined branching? I should probably delve into the assembly and check it out. The second thing of note is that the
abs()
in <math.h>
does probably the same thing as macro-ed bithack. Branch Misses
The next set of data I collected was branch miss ratios from
perf
, a Linux performance counter utility.The command used to collect
perf
data was:pnqsub perf stat -e branches -e branch-misses ./abs i
where
i
was an integer from 0 to 3 indicating which algorithm should be tested.Algorithm | Branches | Branch Misses |
---|---|---|
1 | 3000949371 | 8268 |
2 | 4500899247 | 7919 |
3 | 1000500826 | 7316 |
4 | 1000511479 | 6681 |
All four algorithms have very little branch misses! As expected, #2 has more branches than #1, but the branch predictor is too good to incur any branch misses. Darn.
Cache Misses
The last set of data I collected was cache miss ratios (again from
perf
purely for my amusement. The command used to collect
perf
data was:pnqsub perf stat -e L1-dcache-loads -e L1-dcache-load-misses ./abs i
where
i
was an integer from 0 to 3 indicating which algorithm should be tested.Algorithm | Loads | Misses |
---|---|---|
1 | 8001552904 | 39584 |
2 | 8001477478 | 37913 |
3 | 2000831604 | 25153 |
4 | 2000839860 | 24283 |
All four algorithms have a very low percentage of cache misses! But oh my, #1 and #2 have a massively higher number of cache loads and more misses and #3 and #4 have very similar numbers!
Conclusion
Overall, the [possibly inaccurate] data suggest that the math library's absolute value function is more or less just the bit hack macro. The processor's branch prediction is complex enough to correctly predict the branches in #2, thus having a less of a cost than computing the bithack. The experiment also says loudly and clearly that calling user-defined functions (even when inlined) incurs a substantial overhead; maybe declaring them static would speed up the program. Finally, perhaps writing four different programs, each with its own algorithm, would make a difference.
#include <stdio.h> #include <math.h> #include <assert.h> #include <stdlib.h> #include <unistd.h> #include "ktiming.h" #define FAST_ABS(i) ((i) + ((i) < 0) * (-2*(i))) #define MAXN 10000000 #define REP 100 inline int fast_abs(int i) { return i + (i < 0) * (-2*i); } inline int slow_abs(int i) { if(i < 0) { return -i; } return i; } int main(int argc, char **argv) { printf("argc = %d\n", argc); if(argc < 1) { fprintf(stderr, "Usage: ./abs i\n"); return 1; } int cmd = atoi(argv[1]); int i, rep; #ifdef KTIMING clockmark_t time1, time2; #endif switch(cmd) { case 0: // fast abs #ifdef KTIMING time1 = ktiming_getmark(); #endif for(rep = 0; rep < REP; ++rep) { for(i = -MAXN/2; i < MAXN/2; ++i) { fast_abs(i); } } #ifdef KTIMING time2 = ktiming_getmark(); #endif break; case 1: // slow abs #ifdef KTIMING time1 = ktiming_getmark(); #endif for(rep = 0; rep < REP; ++rep) { for(i = -MAXN/2; i < MAXN/2; ++i) { slow_abs(i); } } #ifdef KTIMING time2 = ktiming_getmark(); #endif break; case 2: // math.h:abs #ifdef KTIMING time1 = ktiming_getmark(); #endif for(rep = 0; rep < REP; ++rep) { for(i = -MAXN/2; i < MAXN/2; ++i) { abs(i); } } #ifdef KTIMING time2 = ktiming_getmark(); #endif break; case 3: // macro abs #ifdef KTIMING time1 = ktiming_getmark(); #endif for(rep = 0; rep < REP; ++rep) { for(i = -MAXN/2; i < MAXN/2; ++i) { FAST_ABS(i); } } #ifdef KTIMING time2 = ktiming_getmark(); #endif break; default: fprintf(stderr, "This isn't supposed to happen.\n"); break; } #ifdef KTIMING float elapsedf = ktiming_diff_sec(&time1, &time2); printf("Elapsed execution time: %f sec\n", elapsedf); #endif return 0; }
No comments:
Post a Comment