Unified Shader Architecture: Point sampling in addition to Bilinear Texturing

Dave Baumann said:
Yes, as the article states, they were described to me as independent units, both available simultaneously if needed (it's perfectly feasible to be doing vertex texture lookups on one shader array while doing filtered texturing for the PS in another). It's described as "Vertex Fetch Units" for want of a better description, really; I think this covers any type of fixed-point or float texture sampling.
I would guess it's described as "Vertex Fetch Units" because that's exactly the main purpose: getting vertex data into the shader.
It would effectively be a 2D access (vertex index, offset inside vertex data structure). So assuming both values can be passed from the shader, and the vertex stride register has enough bits, you could use that very same memory access method to point sample from a shadow map.
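To make that concrete, here's a minimal C sketch of the addressing involved (my own illustration; the function and names are hypothetical, not from any hardware docs). A vertex fetch resolves (vertex index, offset within the vertex) against a base pointer and a stride register; point-sampling an untiled shadow map is the same arithmetic with (row, column) and the map's row pitch:
Code:
/* Hypothetical "2D access": one fetch path, two uses. */
float fetch2D(const float *base, int stride, int index, int offset)
{
    return base[index * stride + offset];
}

/* Vertex fetch: element 'offset' of vertex 'vertexIndex':          */
/*   float nx = fetch2D(vertexBuffer, vertexStride, vertexIndex, 3); */
/* Shadow map point sample: texel (x, y) of a linearly stored map:   */
/*   float z  = fetch2D(shadowMap, mapWidth, y, x);                  */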


Mintmaster said:
One more question I have about Xenos is regarding multisampling. Can you control the sampling positions? And can you get access to the unresolved MSAA buffer? If you could revert to a square grid, you'd get pseudo-high-res rendering for free. Great for shadow maps.
AFAIK the answer is yes in both cases.
 
It's my understanding that there's generally a fair amount of point-sampled texturing used in pixel shaders, to perform "look-ups". At the same time (not being a dev) I don't know the degree to which point sampling is typically used.
I can't say with any substantial authority, simply because it varies immensely depending on what you're trying to achieve.

But from what I've noticed, the "look-up textures" are becoming a bit less common. Now that we have more powerful hardware as well as better shader models (no more obscure SM1 limits :D) there's less need to store complex equations as look-ups. It is now perfectly viable to compute many expressions directly as arithmetic... But on the other hand, if you want to squeeze every drop out of the hardware then you may well want to drop back to this optimization. I've also seen examples where people expose distributions as 1D look-up textures so that artists can "do their thing" without messing with the shader code.
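To illustrate the trade-off (my own sketch, with a made-up example function):
Code:
#include <math.h>

#define LUT_SIZE 256
static float g_specLut[LUT_SIZE];   /* the 1D "look-up texture" stand-in */

/* Bake pow(x, 16) into the table once -- the kind of thing an artist
   could then repaint without touching shader code: */
void buildSpecLut(void)
{
    for (int i = 0; i < LUT_SIZE; ++i)
        g_specLut[i] = powf(i / (float)(LUT_SIZE - 1), 16.0f);
}

/* Old style: point-sample the table. */
float specFromLut(float x)          /* x in [0,1] */
{
    return g_specLut[(int)(x * (LUT_SIZE - 1))];
}

/* New style: just evaluate the expression directly. */
float specDirect(float x)
{
    return powf(x, 16.0f);
}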

My recent work has been heavy on the post-processing side of things (particularly HDRI post-processing), and that really benefits from filtering. The performance and quality difference between my Radeon 9800 and GeForce 6800 is amazing. When you're jumping around between various-resolution render targets it's really easy to get aliasing, and bilinear filtering is a cheap and easy way of masking it ;)

Mintmaster said:
I'm not too familiar with DX10 (I think JHoxley is the resident expert on that)
:D Thanks!

Although, as of late I've not had much time with D3D10 - Vista has taken more and more of a dislike to my PC and refuses to even install now :cry:

How much of this stuff is coming to D3D10? Perhaps you'll get a chance to play soon.
It's difficult/risky to try and draw too many parallels between the PC-based APIs and the console-based APIs.

Sure, the hardware tends to share an awful lot of technology, but it's not so true of the software side of things. The consoles have always been capable of doing things that no PC parts can (which is why XBox isn't really D3D 8 or 9 and XB360 isn't really D3D 9 or 10...) and the low-level/thin APIs for the respective consoles are much more highly customized.

One of the new things in D3D10 (not in previous versions of D3D, but in the Xbox afaik) is type-casting of various formats. You create your resource, but you tend to bind a "view" as a shader input - allowing you to (re)interpret the same data in different ways. Consequently it's not quite so clear-cut what you can/can't do with different formats. But there are some (optional) restrictions - for example, FP32 filtering is still optional in D3D10.
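As a bit-level illustration of what such a format reinterpretation means (not the D3D10 view API itself, just the idea, in plain C):
Code:
#include <stdint.h>
#include <string.h>

/* The same 32 bits of a resource, "viewed" under two formats: */
float viewAsR32Float(uint32_t word)            /* like R32_FLOAT */
{
    float f;
    memcpy(&f, &word, sizeof f);               /* reinterpret, don't convert */
    return f;
}

uint8_t viewAsR8Channel(uint32_t word, int c)  /* like R8G8B8A8_UINT, channel c */
{
    return (uint8_t)(word >> (c * 8));
}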

Hope that's useful!

Jack
 
I thought, hey, maybe ATI might soon go to point sampling only, seeing as that's all they have in HDR at the moment, and with the shaders being unified there's all the more reason to just have point sampling.
 
That won't ever happen, IMO.

Because filtering needs adjacent texels, you can make a sampler capable of filtering far more compact than four point samplers. Transferring all the raw texel data to the shader would also require wider internal buses. It makes more sense to do filtering as close to the texture cache as possible.

Remember that ATI does I16 filtering and NVidia doesn't. That has some notable advantages also, though maybe not specifically for HDR.
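As a rough back-of-the-envelope illustration of the bus-width point (my numbers, assuming an 8-byte FP16 RGBA texel; nothing here is from the posts above):
Code:
#include <stdio.h>

int main(void)
{
    const int texelBytes = 8;   /* assume an FP16 RGBA texel, 8 bytes */
    /* Four raw point samples shipped to the shader core per pixel: */
    printf("point-sampled: %d bytes/pixel\n", 4 * texelBytes);   /* 32 */
    /* One already-filtered result shipped instead: */
    printf("filtered:      %d bytes/pixel\n", texelBytes);       /*  8 */
    return 0;
}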
 
Xmas said:
I would guess it's described as "Vertex Fetch Units" because that's exactly the main purpose: getting vertex data into the shader.
It would effectively be a 2D access (vertex index, offset inside vertex data structure). So assuming both values can be passed from the shader, and the vertex stride register has enough bits, you could use that very same memory access method to point sample from a shadow map.
Excellent point. However, offset might be stored as an index offset rather than a byte offset, which would ruin the 2D-ish nature. I'm not sure that you'd make the offset changeable mid-batch either. Then again, I didn't expect the filtering to be changeable on the fly either.

Regardless, even 1D access is good enough provided the texture can be stored without any tiling schemes. Figure out the index of the centre tap with a FRAC and MADD, and then add a precomputed constant for the other offsets between lookups. If the constant registers are indexable, as they should be, then it should be easy to have dynamically changing sample positions also.
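A sketch of that 1D-access idea in C (my illustration; names and layout are hypothetical, and it assumes a square, linearly stored map):
Code:
#include <math.h>

/* For brevity the caller keeps u and v away from the far edges
   so that all taps stay in range. */
float pointSampleLinear(const float *map, int width,
                        float u, float v)   /* u, v in [0,1) */
{
    /* Centre tap index: conceptually the FRAC + MADD step. */
    int x = (int)floorf(u * width);
    int y = (int)floorf(v * width);
    int centre = y * width + x;

    /* The other taps are just precomputed constant offsets from it: */
    float right = map[centre + 1];          /* +1 texel in x */
    float down  = map[centre + width];      /* +1 texel in y */

    /* e.g. a crude multi-tap (PCF-style) average: */
    return (map[centre] + right + down) / 3.0f;
}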
 
Jawed said:
his original post
I really don't foresee any ISV using any point-sampled texturing of texture maps.

Using that feature to index into arbitrary array-like data structures (e.g. textures containing arrays of bone matrices for skinning, etc.) could make more sense. But what a bizarre programming model that is -- to have to store something as simple as an array as a texture map in order to be able to read it from a GPU :!:

IMO, the IHVs are failing (miserably) to look forward to reasonable general purpose solutions to basic programming problems here.
 
Shadow map sampling has already come up in this thread as a practical use-case for point sampling. I'm no expert, though, which is why I created the thread...

Jawed
 
Just out of interest, have you done any graphics programming work yet with your new hardware? I thought that was the aim :D
 
Nom De Guerre said:
I really don't foresee any ISV using any point-sampled texturing of texture maps.

Using that feature to index into arbitrary array-like data structures (e.g. textures containing arrays of bone matrices for skinning, etc.) could make more sense. But what a bizarre programming model that is -- to have to store something as simple as an array as a texture map in order to be able to read it from a GPU :!:

IMO, the IHVs are failing (miserably) to look forward to reasonable general purpose solutions to basic programming problems here.
I don't see what's bizarre about it. Textures are nothing more than data in a block of memory with base address, dimensions, and format. Pretty much exactly what an array is.
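In struct form, the claim is roughly this (my paraphrase, not any API's actual definition):
Code:
/* A texture, reduced to its essentials: */
struct Texture2D {
    void *base;     /* base address  */
    int   width;    /* dimensions    */
    int   height;
    int   format;   /* some format enum: R8G8B8A8, FP16, ... */
};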
 
Rys said:
Just out of interest, have you done any graphics programming work yet with your new hardware? I thought that was the aim :D
The aim? No - at some point I might get around to it. I think it's a "winter" thing.

Jawed
 
Xmas said:
I don't see what's bizarre about it. Textures are nothing more than data in a block of memory with base address, dimensions, and format. Pretty much exactly what an array is.
Textures are isomorphic to 2D arrays, but in C you can declare an array in one line of code, allocate variable-sized arrays, have nested arrays, arrays of any data type, pointers to arrays, etc. Allocating and managing a texture takes tens of lines of code where an array takes one; calling this bizarre (and quite ridiculous, I might add now) isn't too extreme IMO.
 
Nom De Guerre said:
Textures are isomorphic to 2D arrays, but in C you can declare an array in one line of code, allocate variable-sized arrays, have nested arrays, arrays of any data type, pointers to arrays, etc. Allocating and managing a texture takes tens of lines of code where an array takes one; calling this bizarre (and quite ridiculous, I might add now) isn't too extreme IMO.
Except that texture units are designed to handle filtering in a very efficient manner. How many lines of code does it take to do a bilinear filter from a 2D array in C?
 
Mintmaster said:
One more question I have about Xenos is regarding multisampling. Can you control the sampling positions? And can you get access to the unresolved MSAA buffer? If you could revert to a square grid, you'd get pseudo-high-res rendering for free. Great for shadow maps.

I'm guessing no because there would have to be some synchronization with the eDRAM for it to do the Z interpolation. Not a deal breaker, but the eDRAM logic is pretty basic.


You're right, the Xenos article says the sampling positions are fixed:

Xenos Article Page 4 said:
As all the sampling units for frame buffer operations are multiplied to work optimally with 4x FSAA this is actually the maximum mode available. Although the developer can choose to use 2x or no FSAA, there are no FSAA levels available higher than 4x. The sampling pattern is not programmable but fixed, although it does use a sample pattern that doesn't have any of the sample points intersecting one or another on either the vertical or horizontal axis. Although we don't know the exact sample pattern shape, we suspect it will be similar to that seen on other sparse sampled / jittered / rotated grid FSAA mechanisms we've seen over the past few years, such as this.

(I've got zero other knowledge on your other points though. :oops: )
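For illustration only (the actual Xenos pattern isn't public), a 4x sparse pattern on a 4x4 subgrid where no two samples share a row or a column might look like:
Code:
. . s .
s . . .
. . . s
. s . .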
 
OpenGL guy said:
Except that texture units are designed to handle filtering in a very effcient manner. How many lines of code does it take to do a bilinear filter from a 2D array in C?
Well, let's see if I can do it (assuming texture coordinates range from 0 to 1 over the texture, this is a repeated texture, and we use distance-weighted sampling):
Code:
#include <math.h>   //for sqrt

float fracpart(float x);  //assumed fast helper; see the sketch further down

float filterBilinear(int xSize, int ySize, float tex[xSize][ySize],
                     float xCoord, float yCoord)
{
  //Get location in texture to sample (repeat wrap):
  int xIndex  = ( (int)(xCoord*xSize) ) % xSize;
  int yIndex  = ( (int)(yCoord*ySize) ) % ySize;
  int xIndex1 = (xIndex + 1) % xSize;  //right/lower neighbours, wrapped so the
  int yIndex1 = (yIndex + 1) % ySize;  //repeat mode can't index out of bounds
  //Get sample weights:
  float dx = fracpart(xCoord*xSize);  //Assume we have an optimized function to find
  float dy = fracpart(yCoord*ySize);  //the fractional part of a floating point number quickly
  //Weight each texel by its distance to the opposite corner of the 2x2
  //footprint, so that nearer texels get the larger weights:
  float sweight1 = sqrt((1-dx)*(1-dx) + (1-dy)*(1-dy));
  float sweight2 = sqrt(dx*dx + (1-dy)*(1-dy));
  float sweight3 = sqrt((1-dx)*(1-dx) + dy*dy);
  float sweight4 = sqrt(dx*dx + dy*dy);
  float sum = sweight1 + sweight2 + sweight3 + sweight4;

  return (sweight1*tex[xIndex][yIndex]
          + sweight2*tex[xIndex1][yIndex]
          + sweight3*tex[xIndex][yIndex1]
          + sweight4*tex[xIndex1][yIndex1])/sum;
}
Some notes:
We only need to use a rough approximation for the square root, so that doesn't need to take a significant amount of time. It is possible to find the fractional part of a floating-point number in just a couple of assembly instructions. But we still have to contend with a large number of operations to do bilinear filtering.

Try to compare the above to what graphics hardware can do in a single clock cycle, sometimes without impacting other processing that is going on!
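For instance, a plain-C version of that fracpart helper might be (my sketch; a real implementation would use SSE or bit tricks):
Code:
/* Fractional part for non-negative x, as used in the code above:
   a truncating float->int convert and a subtract. */
static inline float fracpart(float x)
{
    return x - (float)(int)x;
}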
 
Hm. I thought bilinear filtering was somewhat simpler than that. You are doing distance-based averaging, but that is not linear, is it?

If the adjacent texels were arranged this way (A - D being the colors)

Code:
AB
CD

then

Code:
filteredColor = dy * (A * dx + B * (1 - dx)) + (1 - dy)*(C * dx + D * (1 - dx));

should be all the maths to it, shouldn't it? (except for getting the fractional parts dx and dy; note that as written, dx and dy are the weights toward A's column and row, so dx = dy = 1 lands exactly on A)

Of course your filter would work too, but it's somewhat more luxurious :)
Does anyone know what actual maths are being done in the sampler unit?
 
hWnd said:
Hm. I thought bilinear filtering was somewhat simpler than that. You are doing distance-based averaging, but that is not linear, is it?
It's still linear because nowhere do you take a nonlinear function of any sample color. It's just linear with different weighting.
 
Chalnoth said:
It's still linear because nowhere do you take a nonlinear function of any sample color.

That's true, but you don't do so either when using biquadratic or bicubic interpolation. With respect to the color samples it's just as linear.

When you take

Code:
dy * (A * dx + B * (1 - dx)) + (1 - dy)*(C * dx + D * (1 - dx))

and expand it you get

Code:
dy * dx * (A + D - C - B) + dy * (B - D) + dx * (C - D) + D

As you can see, it is linear in dx and in dy separately (thus a bilinear filter), as well as in the color values.

Your filter is using squares and square roots on dx and dy, thus it's not linear. (My opinion)
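As a quick numerical sanity check that the factored and expanded forms above really do agree (arbitrary values, my own check):
Code:
#include <stdio.h>

int main(void)
{
    float A = 0.1f, B = 0.7f, C = 0.3f, D = 0.9f;
    float dx = 0.25f, dy = 0.6f;

    float factored = dy * (A * dx + B * (1 - dx))
                   + (1 - dy) * (C * dx + D * (1 - dx));
    float expanded = dy * dx * (A + D - C - B)
                   + dy * (B - D) + dx * (C - D) + D;

    printf("%f %f\n", factored, expanded);   /* both print 0.630000 */
    return 0;
}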
 