Unified Shader Architecture: Point sampling in addition to Bilinear Texturing

Chalnoth said:
It's still linear because nowhere do you take a nonlinear function of any sample color. It's just linear with different weighting.

Chalnoth, bilinear is called that because it uses linear-function-based weights. the application of the weights is just a convolution operation, which by definition is linear. the weights you're using are not linear, they're radial. your sample surely does 4-tap filtering, but it's not bilinear by anybody's definition.
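
For reference, a minimal sketch of the linear weighting being argued for here: each weight is the product of a linear function of dx and a linear function of dy, and the four always sum to 1. The function name bilinearWeights and the texel ordering are assumptions made for illustration, not code from the thread.

```c
#include <assert.h>

/* Bilinear weights for the four texels around a sample at fractional
   offset (dx, dy). Order: (x, y), (x+1, y), (x, y+1), (x+1, y+1). */
void bilinearWeights(float dx, float dy, float w[4])
{
  assert(dx >= 0.0f && dx <= 1.0f && dy >= 0.0f && dy <= 1.0f);
  w[0] = (1.0f - dx) * (1.0f - dy);
  w[1] = dx * (1.0f - dy);
  w[2] = (1.0f - dx) * dy;
  w[3] = dx * dy;
}
```

With dx = 0 the two texels in column x+1 get weight exactly zero, which is what lets the filter hand over to the next set of four texels without a jump.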
 
Chalnoth said:
Code:
float filterBilinear(float tex[][], float xCoord, float yCoord, int xSize, int ySize)
{
  //Get location in texture to sample:
  int xIndex = ( (int)(xCoord*xSize) ) % xSize;
  int yIndex = ( (int)(yCoord*ySize) ) % ySize;
  //Get sample weights:
  float dx = fracpart(xCoord*xSize);  //Assume we have an optimized function to find
  float dy = fracpart(yCoord*ySize);  //the fractional part of a floating point number quickly
  float sweight1 = sqrt(dx*dx + dy*dy);
  float sweight2 = sqrt((1-dx)*(1-dx) + dy*dy);
  float sweight3 = sqrt(dx*dx + (1-dy)*(1-dy));
  float sweight4 = sqrt((1-dx)*(1-dx) + (1-dy)*(1-dy));
  float sum = sweight1 + sweight2 + sweight3 + sweight4;

  return (sweight1*tex[xIndex][yIndex]
          + sweight2*tex[xIndex+1][yIndex]
          + sweight3*tex[xIndex][yIndex+1]
          + sweight4*tex[xIndex+1][yIndex+1])/sum;
}
This is not bilinear at all. If you insert e.g. dx = 0 and dy = 0.5, this method gives all 4 texels nonzero weights (with the code as written, the two leftmost texels get about 15% each and the two rightmost about 35% each). This means that you will get a color discontinuity wherever xCoord or yCoord crosses an integer boundary.

This is distance-based interpolation; this method produces a discontinuity whenever you change the set of points you are distance-interpolating between.
 
arjan de lumens said:
This is not bilinear at all. If you insert e.g. dx = 0 and dy = 0.5, this method gives all 4 texels nonzero weights (with the code as written, the two leftmost texels get about 15% each and the two rightmost about 35% each). This means that you will get a color discontinuity wherever xCoord or yCoord crosses an integer boundary.
Ah, right, didn't think about that. Obviously you have to use the linear weighting scheme. Distance-weighted sampling might look decent within the pixel, but would make each pixel in a magnified texture rather obvious.
 
So, I guess I'll revise the code, and insert a fraction-part finding algorithm:
Code:
float fracpart(float x)
{
  volatile float intpart;
  intpart = x + MAGIC;       //MAGIC is specially-defined based on the precision to
  intpart = intpart - MAGIC;//truncate the FP number so that we get the integer part remaining
  return x - intpart;
}

float filterBilinear(float tex[][], float xCoord, float yCoord, int xSize, int ySize)
{
  //Get location in texture to sample:
  int xIndex = ( (int)(xCoord*xSize) ) % xSize;
  int yIndex = ( (int)(yCoord*ySize) ) % ySize;
  //Get sample weights:
  float dx = fracpart(xCoord*xSize);  //Assume we have an optimized function to find
  float dy = fracpart(yCoord*ySize);  //the fractional part of a floating point number quickly
  return (1-dx)*((1-dy)*tex[xIndex][yIndex] + dy*tex[xIndex][yIndex+1])
              + dx*((1-dy)*tex[xIndex+1][yIndex]+dy*tex[xIndex+1][yIndex+1]);
}
I figure that'd take on the order of ~20-30 clock cycles, assuming perfect cache hits and that the use of "volatile" in the fracpart algorithm doesn't slow you down, and one assembly instruction per clock cycle.
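
The post leaves MAGIC undefined. A possible concrete version, assuming 32-bit IEEE floats, the default round-to-nearest mode, and non-negative inputs below 2^23, is sketched here. Under those assumptions the add/subtract pair rounds to the nearest integer rather than truncating, so this sketch adds a correction step that is not in the original.

```c
#include <assert.h>

/* Assumption: 32-bit IEEE float, default round-to-nearest mode. */
#define MAGIC 8388608.0f  /* 2^23: floats at or above this have no fraction bits */

float fracpart(float x)
{
  volatile float intpart;  /* volatile keeps the compiler from folding the sums */
  float frac;
  assert(x >= 0.0f && x < MAGIC);
  intpart = x + MAGIC;        /* fraction bits of x are rounded away here */
  intpart = intpart - MAGIC;  /* x rounded to the nearest integer */
  frac = x - intpart;         /* lies in [-0.5, 0.5] */
  if (frac < 0.0f)
    frac += 1.0f;             /* rounding went up: shift into [0, 1) */
  return frac;
}
```

Whether this is actually faster than a cast or floorf depends on the compiler and CPU; it is shown only to make the MAGIC idea concrete.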
 
Firstly, um, you can't use [][] for an array in the function arg list. That function won't work as is. Better to pass a pointer to the 'array'. There is also an error with overstepping the bounds of the array. Doing a bilinear from a texture in C isn't that simple.

Secondly, I assume that OpenGL guy's question was mostly to do with reading from a texture, probably RGBA32. I would have done it more like this using fixed point:

Code:
// Integer Bilinear for Power of 2 32 bit RGBA textures

float size256[2]; // Size of the texture * 256 as a float
int stride;       // size of one line in the texture
int xMask;        // xSize-1
int yMask;        // (ySize-1)*stride
 
colour32 filterBilinear(colour32 *tex, float xCoord, float yCoord)
{
  int xFixed = (int)(xCoord*size256[0]);
  int yFixed = (int)(yCoord*size256[1]);
  int xFactor = xFixed&0xFF;
  int yFactor = yFixed&0xFF;
  int xIndex = xFixed>>8;
  int yIndex = (yFixed>>8)*stride;

  colour32 ul = tex[(xIndex&xMask) + (yIndex&yMask)];
  colour32 bl = tex[(xIndex&xMask) + ((yIndex+stride)&yMask)];
  colour32 ur = tex[((xIndex+1)&xMask) + (yIndex&yMask)];
  colour32 br = tex[((xIndex+1)&xMask) + ((yIndex+stride)&yMask)];

  // Using bilinear = s1 + (s2-s1)*fac;
  colour32 upper = ((ul&0x00FF00FF) + ((((ur&0x00FF00FF)-(ul&0x00FF00FF))*xFactor)>>8)&0x00FF00FF)
                + ((ul&0xFF00FF00) + ((((ur&0xFF00FF00)-(ul&0xFF00FF00))>>8)*xFactor)&0xFF00FF00);

  colour32 bottom = ((bl&0x00FF00FF) + ((((br&0x00FF00FF)-(bl&0x00FF00FF))*xFactor)>>8)&0x00FF00FF)
                 + ((bl&0xFF00FF00) + ((((br&0xFF00FF00)-(bl&0xFF00FF00))>>8)*xFactor)&0xFF00FF00);

  return ((upper&0x00FF00FF) + ((((bottom&0x00FF00FF)-(upper&0x00FF00FF))*yFactor)>>8)&0x00FF00FF)
       + ((upper&0xFF00FF00) + ((((bottom&0xFF00FF00)-(upper&0xFF00FF00))>>8)*yFactor)&0xFF00FF00);
}
I'm ignoring any possible accuracy errors in that and MMX makes code like that substantially smaller. :)
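
As a point of comparison, here is a variant of the two-components-at-a-time blend written with unsigned arithmetic, so that no intermediate difference can go negative. It is a sketch of the same idea, not the code above: the names lerpPacked and blendPacked are hypothetical, colour32 is assumed to be a 32-bit unsigned integer, and the factors are assumed to lie in 0..256.

```c
#include <assert.h>
#include <stdint.h>

/* Blend two packed 8:8:8:8 colours, f in 0..256 (0 gives a, 256 gives b).
   Each multiply handles two 8-bit channels held in 16-bit lanes. */
uint32_t lerpPacked(uint32_t a, uint32_t b, uint32_t f)
{
  uint32_t lo, hi;
  assert(f <= 256u);
  /* Channels in bits 0-7 and 16-23. */
  lo = ((a & 0x00FF00FFu) * (256u - f) + (b & 0x00FF00FFu) * f) >> 8;
  /* Channels in bits 8-15 and 24-31, shifted down before multiplying. */
  hi = ((a >> 8) & 0x00FF00FFu) * (256u - f) + ((b >> 8) & 0x00FF00FFu) * f;
  return (lo & 0x00FF00FFu) | (hi & 0xFF00FF00u);
}

/* Bilinear blend of four already-fetched texels. */
uint32_t blendPacked(uint32_t ul, uint32_t ur, uint32_t bl, uint32_t br,
                     uint32_t xFactor, uint32_t yFactor)
{
  uint32_t upper = lerpPacked(ul, ur, xFactor);
  uint32_t bottom = lerpPacked(bl, br, xFactor);
  return lerpPacked(upper, bottom, yFactor);
}
```

Each 16-bit lane holds at most 255*256, so the two channels in a word never carry into each other.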
 
Last edited by a moderator:
Colourless said:
Firstly, um, you can't use [][] for an array in the function arg list.
Oh, you're right, it wants to have bounds for all array arguments but the first. Oh, well, doesn't really change the code any at all, other than the definition of the function. Just define the texture as a pointer to a pointer.

That function won't work as is. Better to pass a pointer to the 'array'. There is also an error with overstepping the bounds of the array. Doing a bilinear from a texture in C isn't that simple.
Ah, you're right. But simply adding the xMask/yMask as you did would do the trick. And yes, with a 4-component fixed point texture, you could indeed cut the work in half by doing two components at a time.
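
Putting both corrections together, a single-component float version might look like the sketch below. It assumes a row-major texture passed as a flat pointer, non-negative coordinates, and wrap-around addressing; it uses % so that sizes need not be powers of two, where a mask would do the same job for power-of-two sizes. The name filterBilinearWrap is hypothetical.

```c
#include <assert.h>

/* Bilinear sample of a single-channel float texture with wrap addressing.
   tex is row-major: texel (x, y) lives at tex[y*xSize + x]. */
float filterBilinearWrap(const float *tex, float xCoord, float yCoord,
                         int xSize, int ySize)
{
  float x, y, dx, dy, top, bottom;
  int x0, y0, x1, y1;

  assert(tex && xSize > 0 && ySize > 0);
  assert(xCoord >= 0.0f && yCoord >= 0.0f);

  x = xCoord * (float)xSize;
  y = yCoord * (float)ySize;
  x0 = (int)x;                 /* truncation equals floor for x >= 0 */
  y0 = (int)y;
  dx = x - (float)x0;          /* fractional parts give the weights */
  dy = y - (float)y0;

  x0 %= xSize;                 /* wrap all four indices into range */
  y0 %= ySize;
  x1 = (x0 + 1) % xSize;
  y1 = (y0 + 1) % ySize;

  top    = (1.0f - dx)*tex[y0*xSize + x0] + dx*tex[y0*xSize + x1];
  bottom = (1.0f - dx)*tex[y1*xSize + x0] + dx*tex[y1*xSize + x1];
  return (1.0f - dy)*top + dy*bottom;
}
```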

But with a floating point texture on a CPU, I think you'd basically have to duplicate the single-component averaging four times.

P.S. You used xIndex and yIndex before you defined them, and have && where there should be just & :)
 
Chalnoth said:
And yes, with a 4-component fixed point texture, you could indeed cut the work in half by doing two components at a time.

But with a floating point texture on a CPU, I think you'd basically have to duplicate the single-component averaging four times.

Sort of depends on if you want to do things more like how a GPU does things or use a more general approach. If we think about GPUs and > 32 Bit filtering, chances are they use fixed point for integer formats, and have a separate block for filtering of floating point textures (if it exists at all!), hence various limitations with them.

Regardless, filtering a 32-bit-per-component texture on a CPU in C would require duplicated code. Which kind of proves OpenGL guy's point. 1 instruction in a shader to access a texture requires how many lines of C code....
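
To give a rough idea of what that duplication looks like, here is a sketch that blends four already-fetched float texels, repeating the same single-component arithmetic once per channel. The TexelRGBA type and the blendRGBA name are assumptions for illustration.

```c
/* Hypothetical four-component float texel. */
typedef struct { float c[4]; } TexelRGBA;

/* Bilinear blend of four texels; dx and dy are the fractional offsets.
   The per-channel work is identical, so it is simply repeated four times. */
TexelRGBA blendRGBA(TexelRGBA ul, TexelRGBA ur, TexelRGBA bl, TexelRGBA br,
                    float dx, float dy)
{
  TexelRGBA out;
  int i;
  for (i = 0; i < 4; ++i) {
    float top    = ul.c[i] + (ur.c[i] - ul.c[i]) * dx;
    float bottom = bl.c[i] + (br.c[i] - bl.c[i]) * dx;
    out.c[i] = top + (bottom - top) * dy;
  }
  return out;
}
```

A compiler may vectorise the loop, but in plain C each channel is still a separate pass through the same expression.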
 
Yup, whichever way you slice it, it's going to take a hell of a lot longer to do on a CPU than one instruction. Especially when that one instruction might represent trilinear + anisotropic filtering as well, and if textures aren't used all that often compared to math ops, these things won't even impact performance significantly.
 
Colourless - Couldn't you do fp-filtering largely in fixed point by converting the 4 input samples to shared exponent format before performing much arithmetic on them?
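
One way to read this suggestion, sketched under assumptions: take the largest exponent among the four samples, rescale all four to integer mantissas against that exponent, blend in integer arithmetic with 8-bit factors, and convert back once at the end. The function name and the 0..255 factor range are hypothetical, inputs are assumed finite, and samples much smaller than the largest one lose their low bits in the conversion.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Bilinear blend of four finite floats via one shared exponent.
   xFactor and yFactor are 8-bit fractions in 0..255. */
float blendSharedExponent(float ul, float ur, float bl, float br,
                          int xFactor, int yFactor)
{
  float in[4];
  int64_t m[4], top, bottom, result;
  int i, e, eMax = 0, any = 0;

  assert(xFactor >= 0 && xFactor <= 255 && yFactor >= 0 && yFactor <= 255);
  in[0] = ul; in[1] = ur; in[2] = bl; in[3] = br;

  /* Find the largest exponent among the nonzero inputs. */
  for (i = 0; i < 4; ++i) {
    if (in[i] != 0.0f) {
      (void)frexpf(in[i], &e);  /* |in[i]| = f * 2^e with 0.5 <= f < 1 */
      if (!any || e > eMax)
        eMax = e;
      any = 1;
    }
  }
  if (!any)
    return 0.0f;

  /* Scale every input to an integer mantissa below 2^23 in magnitude;
     smaller inputs are truncated here. */
  for (i = 0; i < 4; ++i)
    m[i] = (int64_t)ldexpf(in[i], 23 - eMax);

  top    = m[0]*(256 - xFactor) + m[1]*xFactor;      /* scaled by 2^8  */
  bottom = m[2]*(256 - xFactor) + m[3]*xFactor;
  result = top*(256 - yFactor) + bottom*yFactor;     /* scaled by 2^16 */

  /* Undo the mantissa scaling and the two 8-bit factor scalings. */
  return ldexpf((float)result, eMax - 23 - 16);
}
```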
 
OpenGL guy said:
Except that texture units are designed to handle filtering in a very efficient manner. How many lines of code does it take to do a bilinear filter from a 2D array in C?

Any other GPU features you wanted to see written in C? :LOL:
 
rwolf said:
Any other GPU features you wanted to see written in C?

not to mention that the 'lines of code' measure is so cute.. reminds me of the '80s ; )

(sorry, ogl guy, not meaning to downplay your parallel, just a sincere reaction ; )
 