OK, just an outline of my 1.0000001 pass shader.
I found that if I unpacked the knot vectors into red, green, and blue knots and then multiplied by the basis matrix per pixel (per fragment in OpenGL terminology), I wasted a whole bunch of moves.
Also, it seemed a waste to select the four nearest knots per pixel (fragment).
Instead, I stored the knot vectors in 3 textures: red, green, and blue (or, as an optimization, a single texture).
I'll outline the knot vectors in three textures first.
The first texel of the red texture will have the RED component of the 1st, 2nd, 3rd and 4th knots. The second texel will have the RED component of the 2nd, 3rd, 4th, and 5th knots. And so on.
Same thing for the green texture and blue texture.
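A rough CPU-side sketch of that packing, in Python rather than shader code. The function name `pack_knot_windows` and the sample knot values are made up for illustration; each 4-tuple stands in for one RGBA texel.

```python
def pack_knot_windows(knots):
    """knots is a list of (r, g, b) tuples.

    Returns three lists (red, green, blue). Entry i of each list holds that
    channel's value for knots i, i+1, i+2 and i+3, i.e. one 4-component texel.
    """
    n_texels = len(knots) - 3  # one texel per run of four consecutive knots
    channels = []
    for c in range(3):
        channels.append(
            [tuple(knots[i + j][c] for j in range(4)) for i in range(n_texels)]
        )
    return channels


# Hypothetical knots: red = 1..6, green = 10..60, blue = 100..600.
knots = [(i, 10 * i, 100 * i) for i in range(1, 7)]
red, green, blue = pack_knot_windows(knots)
```

With six knots you get three texels per channel: `red[0]` holds knots 1 to 4, `red[1]` holds knots 2 to 5, and so on.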
Optimization - instead of 3 1D textures, pack the knots into 1 2D texture.
(Grumble grumble, pack them into power of 2 textures as well.)
Instead of a simple scale and bias of 0.5 * altitude + 0.5, I set the scale and bias to address the appropriate texel. In this way, I always automatically fetch the four nearest knots.
To find where I am in the span, it's just a simple frac of (scaled and biased altitude * the width of the texture).
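Here is one way the addressing could work out, again sketched in Python. The exact scale and bias are my assumption: altitude is taken to lie in [-1, 1], `n_spans` is the number of texels actually used, and `width` is the (power of 2) texture width. The clamp at the top end is also an addition for the sketch, not something the shader necessarily does.

```python
import math


def span_lookup(altitude, n_spans, width):
    """Return (texel index, position t within the span) for an altitude in [-1, 1]."""
    # Assumed scale and bias: map [-1, 1] onto the used part of the texture.
    scale = 0.5 * n_spans / width
    bias = 0.5 * n_spans / width
    u = scale * altitude + bias   # texture coordinate in [0, n_spans / width]
    x = u * width                 # same coordinate measured in texels
    texel = int(math.floor(x))    # the texel a point-sampled fetch would return
    t = x - texel                 # frac(): how far we are into the span
    if texel >= n_spans:          # altitude == 1 lands exactly on the far edge
        texel, t = n_spans - 1, 1.0
    return texel, t
```

For example, with 5 used texels in an 8-wide texture, an altitude of 0.0 lands halfway through texel 2.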
Now, in a single POINT SAMPLE fetch of a texel we can get the RED component of the four knots. Two more POINT SAMPLE fetches get the green and blue components.
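To show why having four knots of one channel in a single texel is handy: the whole spline evaluation for that channel collapses to one 4-component dot product. The sketch below assumes a uniform cubic B-spline basis, which is only a guess on my part since the basis isn't named above; swap in the weights for whatever basis you use. `basis_weights` and `eval_channel` are made-up names.

```python
def basis_weights(t):
    """Uniform cubic B-spline weights for the four knots (assumed basis)."""
    t2 = t * t
    t3 = t2 * t
    return (
        (1.0 - t) ** 3 / 6.0,
        (3.0 * t3 - 6.0 * t2 + 4.0) / 6.0,
        (-3.0 * t3 + 3.0 * t2 + 3.0 * t + 1.0) / 6.0,
        t3 / 6.0,
    )


def eval_channel(texel, t):
    """texel is one fetched 4-tuple, e.g. the red component of four knots."""
    w = basis_weights(t)
    # In a shader this would be a single dot product of two 4-vectors.
    return sum(wi * ki for wi, ki in zip(w, texel))


# Hypothetical texel: one channel of four consecutive knots.
value = eval_channel((0.0, 6.0, 12.0, 18.0), 0.5)
```

The weights depend only on t, so they are computed once per fragment and reused for the red, green, and blue fetches.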
Another thing to think about: INSTEAD of transforming the reflection vector back to world space per pixel (fragment) and taking the z component, transform the world up vector to eye space (once per frame or, if you are feeling wasteful, once per vertex) and dot it with the reflection vector. Same result, but much more efficiently.
Why is it a 1.0000001 pass shader? I used a simple image editing app to store my knots as knot1.rgb, knot2.rgb, knot3.rgb, knot4.rgb. I wrote a simple shader to source that texture and render to texture as knot1234.r, knot2345.r, knot3456.r and so on. That shader only needs to run when the knot texture is updated.
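A quick numeric check of that equivalence, in plain Python. It assumes the world-to-eye transform is a pure rotation (so its inverse is its transpose) and that world up is +z; the 30 degree rotation and the reflection vector are arbitrary sample values.

```python
import math


def mat_vec(m, v):
    return tuple(sum(m[i][j] * v[j] for j in range(3)) for i in range(3))


def transpose(m):
    return tuple(tuple(m[j][i] for j in range(3)) for i in range(3))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Sample world-to-eye rotation: 30 degrees about the x axis.
a = math.radians(30.0)
world_to_eye = (
    (1.0, 0.0, 0.0),
    (0.0, math.cos(a), -math.sin(a)),
    (0.0, math.sin(a), math.cos(a)),
)
reflection_eye = (0.2, -0.5, 0.84)  # sample eye-space reflection vector

# Per fragment, the slow way: rotate back to world space, take z.
altitude_slow = mat_vec(transpose(world_to_eye), reflection_eye)[2]

# Once per frame: move world up into eye space, then one dot per fragment.
up_eye = mat_vec(world_to_eye, (0.0, 0.0, 1.0))
altitude_fast = dot(up_eye, reflection_eye)
```

Both paths give the same altitude; the second one replaces a matrix multiply per fragment with a single dot product.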
-mr. bill