Unified Shader Architecture: Point sampling in addition to Bilinear Texturing

Dave Baumann said:
Yes, as the article states, they were described to me as independent units, both available simultaneously if needed (it's perfectly feasible to be doing vertex texture lookups on one shader array while doing filtered texturing for the PS in another). It's described as "Vertex Fetch Units" for want of a better description, really; I think this covers any type of fixed-point or float texture sampling.
I would guess it's described as "Vertex Fetch Units" because that's exactly the main purpose: getting vertex data into the shader.
It would effectively be a 2D access (vertex index, offset inside vertex data structure). So assuming both values can be passed from the shader, and the vertex stride register has enough bits, you could use that very same memory access method to point sample from a shadow map.
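To make that concrete, here's a minimal C sketch of the addressing involved (my own illustration; the function and names are hypothetical, not from any hardware docs). A vertex fetch resolves (vertex index, offset within the vertex) against a base pointer and a stride register; point-sampling an untiled shadow map is the same arithmetic with (row, column) and the map's row pitch:
Code:
/* Hypothetical "2D access": one fetch path, two uses. */
float fetch2D(const float *base, int stride, int index, int offset)
{
    return base[index * stride + offset];
}

/* Vertex fetch: element 'offset' of vertex 'vertexIndex':          */
/*   float nx = fetch2D(vertexBuffer, vertexStride, vertexIndex, 3); */
/* Shadow map point sample: texel (x, y) of a linearly stored map:   */
/*   float z  = fetch2D(shadowMap, mapWidth, y, x);                  */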


Mintmaster said:
One more question I have about Xenos is regarding multisampling. Can you control the sampling positions? And can you get access to the unresolved MSAA buffer? If you could revert to a square grid, you'd get pseudo-high-res rendering for free. Great for shadow maps.
AFAIK the answer is yes in both cases.
 
It's my understanding that there's generally a fair amount of point-sampled texturing used in pixel shaders, to perform "look-ups". At the same time (not being a dev) I don't know the degree to which point sampling is typically used.
I can't say with any substantial authority, simply because it varies immensely depending on what you're trying to achieve.

But from what I've noticed, the "look-up textures" are becoming a bit less common. Now that we have more powerful hardware as well as better shader models (no more obscure SM1 limits :D) there's less need to store complex equations as look-ups. It is now perfectly viable to compute many expressions directly as arithmetic... But on the other hand, if you want to squeeze every drop out of the hardware then you may well want to drop back to this optimization. I've also seen examples where people expose distributions as 1D look-up textures so that artists can "do their thing" without messing with the shader code.
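To illustrate the trade-off (my own sketch, with a made-up example function):
Code:
#include <math.h>

#define LUT_SIZE 256
static float g_specLut[LUT_SIZE];   /* the 1D "look-up texture" stand-in */

/* Bake pow(x, 16) into the table once -- the kind of thing an artist
   could then repaint without touching shader code: */
void buildSpecLut(void)
{
    for (int i = 0; i < LUT_SIZE; ++i)
        g_specLut[i] = powf(i / (float)(LUT_SIZE - 1), 16.0f);
}

/* Old style: point-sample the table. */
float specFromLut(float x)          /* x in [0,1] */
{
    return g_specLut[(int)(x * (LUT_SIZE - 1))];
}

/* New style: just evaluate the expression directly. */
float specDirect(float x)
{
    return powf(x, 16.0f);
}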

My recent work has been heavy on the post-processing side of things (particularly HDRI post-processing), and that really benefits from filtering. The performance and quality difference between my Radeon 9800 and GeForce 6800 is amazing. When you're jumping around between various-resolution render targets it's really easy to get aliasing, and bilinear filtering is a cheap and easy way of masking it ;)

Mintmaster said:
I'm not too familiar with DX10 (I think JHoxley is the resident expert on that)
:D Thanks!

Although, as of late I've not had much time with D3D10 - Vista has taken more and more of a dislike to my PC and refuses to even install now :cry:

How much of this stuff is coming to D3D10? Perhaps you'll get a chance to play soon.
It's difficult/risky to try and draw too many parallels between the PC-based APIs and the console-based APIs.

Sure, the hardware tends to share an awful lot of technology, but it's not so true of the software side of things. The consoles have always been capable of doing things that no PC parts can (which is why XBox isn't really D3D 8 or 9 and XB360 isn't really D3D 9 or 10...) and the low-level/thin APIs for the respective consoles are much more highly customized.

One of the new things in D3D10 (not in previous versions of D3D, but in the Xbox afaik) is type-casting of various formats. You create your resource, but you tend to bind a "view" as a shader input - allowing you to (re)interpret the same data in different ways. Consequently it's not quite so clear-cut what you can/can't do with different formats. But there are some (optional) restrictions - for example, FP32 filtering is still optional in D3D10.
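As a bit-level illustration of what such a format reinterpretation means (not the D3D10 view API itself, just the idea, in plain C):
Code:
#include <stdint.h>
#include <string.h>

/* The same 32 bits of a resource, "viewed" under two formats: */
float viewAsR32Float(uint32_t word)            /* like R32_FLOAT */
{
    float f;
    memcpy(&f, &word, sizeof f);               /* reinterpret, don't convert */
    return f;
}

uint8_t viewAsR8Channel(uint32_t word, int c)  /* like R8G8B8A8_UINT, channel c */
{
    return (uint8_t)(word >> (c * 8));
}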

Hope that's useful!

Jack
 
I thought, hey, maybe ATI might soon go to point sampling only, seeing as that's all they have in HDR at the moment, and with the shaders being unified there's all the more reason to just have point sampling.
 
That won't ever happen, IMO.

Because filtering needs adjacent texels, you can make a sampler capable of filtering far more compact than four point samplers. Transferring all the raw texel data to the shader would also require wider internal buses. It makes more sense to do filtering as close to the texture cache as possible.

Remember that ATI does I16 filtering and NVidia doesn't. That has some notable advantages also, though maybe not specifically for HDR.
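As a rough back-of-the-envelope illustration of the bus-width point (my numbers, assuming an 8-byte FP16 RGBA texel; nothing here is from the posts above):
Code:
#include <stdio.h>

int main(void)
{
    const int texelBytes = 8;   /* assume an FP16 RGBA texel, 8 bytes */
    /* Four raw point samples shipped to the shader core per pixel: */
    printf("point-sampled: %d bytes/pixel\n", 4 * texelBytes);   /* 32 */
    /* One already-filtered result shipped instead: */
    printf("filtered:      %d bytes/pixel\n", texelBytes);       /*  8 */
    return 0;
}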
 
Xmas said:
I would guess it's described as "Vertex Fetch Units" because that's exactly the main purpose: getting vertex data into the shader.
It would effectively be a 2D access (vertex index, offset inside vertex data structure). So assuming both values can be passed from the shader, and the vertex stride register has enough bits, you could use that very same memory access method to point sample from a shadow map.
Excellent point. However, offset might be stored as an index offset rather than a byte offset, which would ruin the 2D-ish nature. I'm not sure that you'd make the offset changeable mid-batch either. Then again, I didn't expect the filtering to be changeable on the fly either.

Regardless, even 1D access is good enough provided the texture can be stored without any tiling schemes. Figure out the index of the centre tap with a FRAC and MADD, and then add a precomputed constant for the other offsets between lookups. If the constant registers are indexable, as they should be, then it should be easy to have dynamically changing sample positions also.
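A sketch of that 1D-access idea in C (my illustration; names and layout are hypothetical, and it assumes a square, linearly stored map):
Code:
#include <math.h>

/* For brevity the caller keeps u and v away from the far edges
   so that all taps stay in range. */
float pointSampleLinear(const float *map, int width,
                        float u, float v)   /* u, v in [0,1) */
{
    /* Centre tap index: conceptually the FRAC + MADD step. */
    int x = (int)floorf(u * width);
    int y = (int)floorf(v * width);
    int centre = y * width + x;

    /* The other taps are just precomputed constant offsets from it: */
    float right = map[centre + 1];          /* +1 texel in x */
    float down  = map[centre + width];      /* +1 texel in y */

    /* e.g. a crude multi-tap (PCF-style) average: */
    return (map[centre] + right + down) / 3.0f;
}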
 
Jawed said:
his original post
I really don't foresee any ISV using any point-sampled texturing of texture maps.

Using that feature to index into arbitrary array-like data structures (e.g. textures containing arrays of bone matrices for skinning, etc.) could make more sense. But what a bizarre programming model that is -- to have to store something as simple as an array as a texture map in order to be able to read it from a GPU :!:

IMO, the IHVs are failing (miserably) to look forward to reasonable general purpose solutions to basic programming problems here.
 
Shadow map sampling has already come up in this thread as a practical use-case for point sampling. I'm no expert, though, which is why I created the thread...

Jawed
 
Just out of interest, have you done any graphics programming work yet with your new hardware? I thought that was the aim :D
 
Nom De Guerre said:
I really don't foresee any ISV using any point-sampled texturing of texture maps.

Using that feature to index into arbitrary array-like data structures (e.g. textures containing arrays of bone matrices for skinning, etc.) could make more sense. But what a bizarre programming model that is -- to have to store something as simple as an array as a texture map in order to be able to read it from a GPU :!:

IMO, the IHVs are failing (miserably) to look forward to reasonable general purpose solutions to basic programming problems here.
I don't see what's bizarre about it. Textures are nothing more than data in a block of memory with base address, dimensions, and format. Pretty much exactly what an array is.
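In struct form, the claim is roughly this (my paraphrase, not any API's actual definition):
Code:
/* A texture, reduced to its essentials: */
struct Texture2D {
    void *base;     /* base address  */
    int   width;    /* dimensions    */
    int   height;
    int   format;   /* some format enum: R8G8B8A8, FP16, ... */
};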
 
Rys said:
Just out of interest, have you done any graphics programming work yet with your new hardware? I thought that was the aim :D
The aim? No - at some point I might get around to it. I think it's a "winter" thing.

Jawed
 
Xmas said:
I don't see what's bizarre about it. Textures are nothing more than data in a block of memory with base address, dimensions, and format. Pretty much exactly what an array is.
Textures are isomorphic to 2D arrays, but in C you can declare an array in one line of code, allocate variable-sized arrays, have nested arrays, arrays of any data type, pointers to arrays, etc. Allocating and managing a texture takes tens of lines of code where an array takes one; calling this bizarre (and quite ridiculous, I might add now) isn't too extreme IMO.
 
Nom De Guerre said:
Textures are isomorphic to 2D arrays, but in C you can declare an array in one line of code, allocate variable-sized arrays, have nested arrays, arrays of any data type, pointers to arrays, etc. Allocating and managing a texture takes tens of lines of code where an array takes one; calling this bizarre (and quite ridiculous, I might add now) isn't too extreme IMO.
Except that texture units are designed to handle filtering in a very efficient manner. How many lines of code does it take to do a bilinear filter from a 2D array in C?
 
Mintmaster said:
One more question I have about Xenos is regarding multisampling. Can you control the sampling positions? And can you get access to the unresolved MSAA buffer? If you could revert to a square grid, you'd get pseudo-high-res rendering for free. Great for shadow maps.

I'm guessing no because there would have to be some synchronization with the eDRAM for it to do the Z interpolation. Not a deal breaker, but the eDRAM logic is pretty basic.


You're right, the Xenos article says the sampling positions are fixed:

Xenos Article Page 4 said:
As all the sampling units for frame buffer operations are multiplied to work optimally with 4x FSAA this is actually the maximum mode available. Although the developer can choose to use 2x or no FSAA, there are no FSAA levels available higher than 4x. The sampling pattern is not programmable but fixed, although it does use a sample pattern that doesn't have any of the sample points intersecting one or another on either the vertical or horizontal axis. Although we don't know the exact sample pattern shape, we suspect it will be similar to that seen on other sparse sampled / jittered / rotated grid FSAA mechanisms we've seen over the past few years, such as this.

(I've got zero other knowledge on your other points though. :oops: )
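For illustration only (the actual Xenos pattern isn't public), a 4x sparse pattern on a 4x4 subgrid where no two samples share a row or a column might look like:
Code:
. . s .
s . . .
. . . s
. s . .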
 
OpenGL guy said:
Except that texture units are designed to handle filtering in a very effcient manner. How many lines of code does it take to do a bilinear filter from a 2D array in C?
Well, let's see if I can do it (assuming texture coordinates range from 0 to 1 over the texture, this is a repeated texture, and we use distance-weighted sampling):
Code:
#include <math.h>   //for sqrt

float fracpart(float x);  //assumed fast helper; see the sketch further down

float filterBilinear(int xSize, int ySize, float tex[xSize][ySize],
                     float xCoord, float yCoord)
{
  //Get location in texture to sample (repeat wrap):
  int xIndex  = ( (int)(xCoord*xSize) ) % xSize;
  int yIndex  = ( (int)(yCoord*ySize) ) % ySize;
  int xIndex1 = (xIndex + 1) % xSize;  //right/lower neighbours, wrapped so the
  int yIndex1 = (yIndex + 1) % ySize;  //repeat mode can't index out of bounds
  //Get sample weights:
  float dx = fracpart(xCoord*xSize);  //Assume we have an optimized function to find
  float dy = fracpart(yCoord*ySize);  //the fractional part of a floating point number quickly
  //Weight each texel by its distance to the opposite corner of the 2x2
  //footprint, so that nearer texels get the larger weights:
  float sweight1 = sqrt((1-dx)*(1-dx) + (1-dy)*(1-dy));
  float sweight2 = sqrt(dx*dx + (1-dy)*(1-dy));
  float sweight3 = sqrt((1-dx)*(1-dx) + dy*dy);
  float sweight4 = sqrt(dx*dx + dy*dy);
  float sum = sweight1 + sweight2 + sweight3 + sweight4;

  return (sweight1*tex[xIndex][yIndex]
          + sweight2*tex[xIndex1][yIndex]
          + sweight3*tex[xIndex][yIndex1]
          + sweight4*tex[xIndex1][yIndex1])/sum;
}
Some notes:
We only need to use a rough approximation for the square root, so that doesn't need to take a significant amount of time. It is possible to find the fractional part of a floating-point number in just a couple of assembly instructions. But we still have to contend with a large number of operations to do bilinear filtering.

Try to compare the above to what graphics hardware can do in a single clock cycle, sometimes without impacting other processing that is going on!
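For instance, a plain-C version of that fracpart helper might be (my sketch; a real implementation would use SSE or bit tricks):
Code:
/* Fractional part for non-negative x, as used in the code above:
   a truncating float->int convert and a subtract. */
static inline float fracpart(float x)
{
    return x - (float)(int)x;
}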
 
Hm. I thought bilinear filtering was somewhat simpler than that. You are doing distance-based averaging, but that is not linear, is it?

If the adjacent texels were arranged this way (A - D being the colors)

Code:
AB
CD

then

Code:
filteredColor = dy * (A * dx + B * (1 - dx)) + (1 - dy)*(C * dx + D * (1 - dx));

should be all the maths to it, shouldn't it? (except for getting the fractional parts dx and dy; note that as written, dx and dy are the weights toward A's column and row, so dx = dy = 1 lands exactly on A)

Of course your filter would work too, but it's somewhat more luxurious :)
Does anyone know what actual maths are being done in the sampler unit?
 
hWnd said:
Hm. I thought bilinear filtering was somewhat simpler than that. You are doing distance-based averaging, but that is not linear, is it?
It's still linear because nowhere do you take a nonlinear function of any sample color. It's just linear with different weighting.
 
Chalnoth said:
It's still linear because nowhere do you take a nonlinear function of any sample color.

That's true, but you don't do so either when using biquadratic or bicubic interpolation. With respect to the color samples it's just as linear.

When you take

Code:
dy * (A * dx + B * (1 - dx)) + (1 - dy)*(C * dx + D * (1 - dx))

and expand it you get

Code:
dy * dx * (A + D - C - B) + dy * (B - D) + dx * (C - D) + D

As you can see, it is linear in dx and in dy separately (thus a bilinear filter), as well as in the color values.

Your filter is using squares and square roots on dx and dy, thus it's not linear. (My opinion)
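As a quick numerical sanity check that the factored and expanded forms above really do agree (arbitrary values, my own check):
Code:
#include <stdio.h>

int main(void)
{
    float A = 0.1f, B = 0.7f, C = 0.3f, D = 0.9f;
    float dx = 0.25f, dy = 0.6f;

    float factored = dy * (A * dx + B * (1 - dx))
                   + (1 - dy) * (C * dx + D * (1 - dx));
    float expanded = dy * dx * (A + D - C - B)
                   + dy * (B - D) + dx * (C - D) + D;

    printf("%f %f\n", factored, expanded);   /* both print 0.630000 */
    return 0;
}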
 