I'm not familiar with Crays, but in my Uni days I was asked if I'd be interested in rewriting FORTRAN code** for, IIRC, a Cyber 200 machine which (again, IIRC) could do over 200 MFLOPS! (This was in the early 80s.)
To make use of the vector hardware, it assumed you could rewrite the code so that you had, for example, sets of long arrays of floats, with the program's operations on them expressed in a form equivalent to, say:
Code:
for (i = 0; i < ARRAY_LENGTH; i++)
{
    c[i] = a[i] + b[i];
}
The compiler then had 'libraries' of functions that performed operations like the one above using, say, a single instruction for the whole array.
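As a rough sketch of the idea, the loop above could be replaced by a call to one whole-array routine. The name `vadd` and its signature are hypothetical, not a real Cyber 200 library call, and the body is ordinary scalar C so it runs anywhere; on the vector hardware the entire routine would correspond to one vector operation over `n` elements.

```c
#include <stddef.h>

/* Hypothetical whole-array library routine (illustrative name/signature).
 * Computes c[i] = a[i] + b[i] for i in [0, n). On vector hardware this
 * would be issued as a single vector add; here it is a plain scalar loop. */
void vadd(size_t n, const float *a, const float *b, float *c)
{
    for (size_t i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}
```

The rewritten program would then call `vadd(ARRAY_LENGTH, a, b, c)` instead of looping element by element, which is why the data had to be arranged as long, contiguous arrays in the first place.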
That long-array approach is unlike the shader model of graphics chips, which assumes you operate on fixed 4-element structures.
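For contrast, a shader-style operation always works on exactly four components, however much data there is. The `float4` type and `add4` function below are an illustrative plain-C imitation (the name is borrowed loosely from shader languages), not any particular GPU API.

```c
/* Illustrative fixed 4-element structure, shader style. */
typedef struct { float x, y, z, w; } float4;

/* Each call adds exactly four components; the width never varies. */
float4 add4(float4 p, float4 q)
{
    float4 r = { p.x + q.x, p.y + q.y, p.z + q.z, p.w + q.w };
    return r;
}
```

Processing a long array this way means many small 4-wide operations, whereas the vector machine's instruction took the array length itself as a parameter.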
**It was a holiday programming job. In the end, I was very glad I didn't get it: I spoke to the guy who did, and he said it was not much fun.