Here are a couple that I've found to be useful:
Take the following code:
Code:
for (i = 0; i < maxvalue; i++)
    p += arr[i];
We can help the compiler unroll an inner loop by giving that loop a fixed trip count, like so:
Code:
const int stride = 16;
index = 0;
for (i = 0; i < maxvalue/stride; i++)
    for (j = 0; j < stride; j++) //fixed trip count, so the compiler can unroll this loop
        p += arr[index++];
for (i = 0; i < maxvalue % stride; i++) //pick up the leftover elements
    p += arr[index++];
This resulted in about a factor of two speedup when I tested it on my system.
Another optimization that I made once, when performing one-dimensional integration, was to make use of recursion. Here's the code:
Code:
double trapint(function *f, double a, double b, double acc)
{
    double f1, f2, f3, I1, I2, h;
    h = b - a;
    f1 = f->eval(a); //value at beginning of integration
    f2 = f->eval(b); //value at end
    f3 = f->eval((a+b)/2.0); //half-way in between
    I1 = 0.5*h*(f1 + f2); //Integrate using trapezoid rule approximation
    I2 = h/6.0*(f1 + 4.0*f3 + f2); //Integrate using Simpson's rule approximation
    if (fabs(I1 - I2) < acc) //Check for convergence
        return I2;
    else //Hasn't converged: make the result of this integration equal to the sum of two integrations
        return trapint(f, a, a+h/2.0, acc, f1, f3) + trapint(f, a+h/2.0, b, acc, f3, f2);
    //The calls above go to a six-argument version of this function (not shown). It is the same,
    //except that the function values at the beginning and end of the integration region
    //are passed in instead of being recalculated
}
Using the above recursive algorithm sped up my integration by a factor of 100, with better accuracy, compared to my previous "divide the integral into N steps and sum" setup. Since 1-D integration is the limiting factor in my code at the moment, I'm really hoping that one day I'll be able to implement a recursive function like this on a GPU... but that's beside the point.
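Since the six-argument version of trapint isn't shown above, here is a rough sketch of how the two pieces might fit together. It is only a guess at the shape, not the original code: `Func` (a `std::function`) stands in for the post's `function` class, and the `depth` argument and `max_depth` limit are my own additions to stop runaway recursion.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical stand-in for the post's `function` class: any callable double -> double.
using Func = std::function<double(double)>;

// Version that receives the endpoint values fa = f(a) and fb = f(b),
// so each level only has to evaluate the midpoint.
double trapint(const Func& f, double a, double b, double acc,
               double fa, double fb, int depth = 0)
{
    const int max_depth = 50;                     // assumption: recursion guard, not in the post
    double h  = b - a;
    double m  = 0.5 * (a + b);
    double fm = f(m);                             // the only new function evaluation
    double I1 = 0.5 * h * (fa + fb);              // trapezoid rule
    double I2 = h / 6.0 * (fa + 4.0 * fm + fb);   // Simpson's rule
    if (std::fabs(I1 - I2) < acc || depth >= max_depth)
        return I2;
    // Not converged: split the interval and reuse fa, fm and fb.
    return trapint(f, a, m, acc, fa, fm, depth + 1)
         + trapint(f, m, b, acc, fm, fb, depth + 1);
}

// Entry point: evaluates the endpoints once, then hands off to the version above.
double trapint(const Func& f, double a, double b, double acc)
{
    assert(acc > 0.0);
    return trapint(f, a, b, acc, f(a), f(b));
}
```

Calling it looks like `trapint([](double x) { return x * x; }, 0.0, 1.0, 1e-8)`. Note that `acc` is passed unchanged to both halves, as in the post, so it acts as a per-interval tolerance rather than a bound on the total error.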
Anyway, here's a link on optimization that epicstruggle posted in the Coding style thread. It's got the loop unrolling tip, plus a number of others:
http://www.cse.msu.edu/~cse471/lecture14_files/v3_document.htm
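On the loop unrolling tip, here is a self-contained sketch of the two summing loops as complete functions, in case anyone wants to time them on their own system. The function names and the use of `double` are my own choices for illustration. Whether the inner loop actually gets unrolled, and how much it helps, depends on the compiler and the optimization flags, so treat the factor of two as one measurement rather than a rule.

```cpp
#include <cassert>
#include <cstddef>

// Plain loop, one element per iteration.
double sum_simple(const double* arr, std::size_t n)
{
    double p = 0.0;
    for (std::size_t i = 0; i < n; i++)
        p += arr[i];
    return p;
}

// Blocked loop: the inner loop has a compile-time constant trip count,
// which gives the compiler an easy target for unrolling.
double sum_blocked(const double* arr, std::size_t n)
{
    assert(arr != nullptr || n == 0);
    constexpr std::size_t stride = 16;
    double p = 0.0;
    std::size_t index = 0;
    for (std::size_t i = 0; i < n / stride; i++)
        for (std::size_t j = 0; j < stride; j++)
            p += arr[index++];
    // Leftover elements when n is not a multiple of stride.
    for (std::size_t i = 0; i < n % stride; i++)
        p += arr[index++];
    return p;
}
```

Both functions add the elements in the same order, so they should return the same value. Any difference you see will be in timing only.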