nV40 info w/ benchmarks

Fred da Roza said:
Let me get this straight. You are saying even if 12 bit texture addressing, on the GeForce 3/4 series, causes very noticeable artifacts it doesn't matter because FX16, used on the R200 series, isn't used in modern cards. However FP24, on the R300 series, is a problem even though it hasn't caused any noticeable artifacts.
The GeForce3/4 use two separate precisions, and neither is FX12.

The shader is separated into two parts (how it works in hardware can best be seen by looking at the OpenGL extensions, which I will outline here):
1. Texture shader. The texture shader governs all operations that lead to a texture read. All operations in the texture shader are done at FP32. The operations available are very limited in scope.

2. Register combiner. This is essentially a general-purpose processing unit that is designed to work on color data in order to calculate the final output color. Precision is FX9.

Anyway, fortunately, as far as I know, no video card in existence processes any texture information at less than FX16.
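The two-stage split described above can be caricatured in a few lines of Python: coordinate math at full floating-point precision feeding a texture read, then color math quantized at each step. This is purely illustrative, not hardware-accurate; `sample` is a stand-in fetch, and `to_fx9` assumes FX9 means a 9-bit signed fixed-point value in [-1, 1], which is an assumption here.

```python
def to_fx9(x):
    """Quantize to a 9-bit signed fixed-point value in [-1, 1] (assumed format)."""
    clamped = max(-1.0, min(1.0, x))
    return round(clamped * 256) / 256

def sample(u, v):
    """Stand-in for a texture fetch: just returns a deterministic 'color'."""
    return (u + v) % 1.0

def texture_shader(u, v):
    # Stage 1: coordinate math at full precision (Python floats here,
    # FP32 on the hardware), ending in a texture read.
    return sample(u * 2.0, v * 2.0)

def register_combiner(color_a, color_b):
    # Stage 2: color math, with every operand and result quantized to FX9.
    return to_fx9(to_fx9(color_a) * to_fx9(color_b))
```

The point of the split is visible in the types: the texture-shader path never quantizes, while everything in the combiner path does.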
 
Chalnoth said:
All operations in the texture shader are done at FP32.
I thought that all pre-DX9 cards used integer interpolators, although I could be wrong (even about ATI's cards).
 
Chalnoth said:
The GeForce3/4 use two separate precisions, and neither is FX12.

The shader is separated into two parts (how it works in hardware can best be seen by looking at the OpenGL extensions, which I will outline here):
1. Texture shader. The texture shader governs all operations that lead to a texture read. All operations in the texture shader are done at FP32. The operations available are very limited in scope.

2. Register combiner. This is essentially a general-purpose processing unit that is designed to work on color data in order to calculate the final output color. Precision is FX9.

Anyway, fortunately, as far as I know, no video card in existence processes any texture information at less than FX16.


Let me correct that then. You are saying even if 12 bit color precision, on the GeForce 3/4 series, causes very noticeable artifacts it doesn't matter because FX16, used on the R200 series, isn't used in modern cards. However FP24 texture addressing, on the R300 series, is a problem even though it hasn't caused any noticeable artifacts.
 
Fred da Roza said:
Let me correct that then. You are saying even if 12 bit color precision, on the GeForce 3/4 series, causes very noticeable artifacts it doesn't matter because FX16, used on the R200 series, isn't used in modern cards. However FP24 texture addressing, on the R300 series, is a problem even though it hasn't caused any noticeable artifacts.
No, I'm saying that the GeForce3/4 series doesn't have 12-bit color precision.

I'm also saying that color precision has absolutely nothing to do with texture addressing. I'm not talking about these processors' ability to compute color values. I'm talking about their ability to properly calculate the location on the texture to obtain for proper texture filtering.
 
Chalnoth said:
Doomtrooper said:
I also thought the extra precision shown here for the diffuse bump mapping example was due to FX16 :
Nope. Just FX12 with ATI electing to use 8 bits of color accuracy, such that the value is clamped to [-8,8].

nVidia instead elected to use 10 bits of color accuracy with the value clamped to [-2,2].

So, in essence, the R200 just uses a different FX12 format, but it's still FX12.

Then what precision is being discussed here?
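The two FX12 variants described in the quote (same 12 bits, different split between integer range and fractional accuracy) can be sketched as fixed-point quantizers. A minimal Python sketch; `quantize_fx` is a hypothetical helper, not how either chip actually rounds.

```python
def quantize_fx(value, total_bits, frac_bits):
    """Clamp and round a value to a signed fixed-point format."""
    step = 1.0 / (2 ** frac_bits)
    lo = -(2 ** (total_bits - 1)) * step        # most negative representable value
    hi = (2 ** (total_bits - 1) - 1) * step     # most positive representable value
    clamped = max(lo, min(hi, value))
    return round(clamped / step) * step

# ATI-style FX12: 8 fractional bits, range roughly [-8, 8)
# nVidia-style FX12: 10 fractional bits, range roughly [-2, 2)
print(quantize_fx(0.123, 12, 8))    # coarser step (1/256)
print(quantize_fx(0.123, 12, 10))   # finer step (1/1024)
print(quantize_fx(5.0, 12, 10))     # out of range: clamped near 2.0
```

The trade-off is exactly the one described: more fractional bits buy accuracy, fewer buy range, with the total fixed at 12 bits either way.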
 
Dio said:
Chalnoth said:
All operations in the texture shader are done at FP32.
I thought that all pre-DX9 cards used integer interpolators, although I could be wrong (even about ATI's cards).
Well, it's hard to figure out exactly what precision many cards are doing various things at, but I have seen from a number of sources that the NV2x uses FP32 for texture address calculations. This would be the essential reason why texture calculations and arithmetic calculations are separate in the NV2x shaders.

Now, we have seen on this message board that there is a difference on the R3xx in texture addressing between dependent texture reads (where the texture coordinate must, at some point, be truncated to FP24) and independent texture reads (where the texture coordinate can remain in its original FP32 form as calculated by the vertex shader and interpolated during triangle setup).

So the R300 must indeed be calculating texture addresses, by default, at some higher precision than FP24. I remember specifically an interview over at firingsquad.com where an ATI employee stated something to the tune of, "everything but the pixel shader is done at FP32," which, while it isn't strictly accurate, would paint a picture that the vertex shader (and all interpolated per-vertex attributes) are calculated and stored at FP32.

Where will this difference be seen? Well, as with many things, the most obvious would be with a large, high-contrast shader. Some possible problems that would crop up: skinned models that require the same texture to wrap across multiple triangles may have seams or cracks along triangle edges where the calculations don't properly line up. There may be problems with, for example, a dependent texture not properly aligning with the base texture, causing the surface to look just a little bit "off." Anisotropic filtering, which, I believe, requires higher subpixel accuracy, may exacerbate these issues.

Anyway, all texture calculations should be moved to FP32 in the near future, if for no better reason than to make texture calculations performed in the vertex shader and in the pixel shader invariant. FP24 really is a compromise. It is high enough precision that there shouldn't be many really obvious issues with texture address calculation, but it still isn't as high as we're used to for fixed-function texturing.
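To put a rough number on that compromise: assuming FP24 carries a 16-bit mantissa and FP32 the usual 23 bits (the figures commonly quoted for R3xx and IEEE single precision), the worst-case drift of a texture coordinate can be expressed in texels. A minimal sketch:

```python
def quantize_mantissa(x, mantissa_bits):
    """Round x (assumed in [1, 2)) to the given number of mantissa bits."""
    scale = 2 ** mantissa_bits
    return round(x * scale) / scale

texture_size = 2048            # texels along one axis
coord = 4.0 / 3.0              # a coordinate with no exact binary representation

err_fp24 = abs(quantize_mantissa(coord, 16) - coord) * texture_size
err_fp32 = abs(quantize_mantissa(coord, 23) - coord) * texture_size

print("FP24 drift:", err_fp24, "texels")   # on the order of 1/100 texel
print("FP32 drift:", err_fp32, "texels")   # orders of magnitude smaller
```

A hundredth of a texel is invisible for plain texturing, which matches the "no obvious issues" claim, but it is the kind of discrepancy that can show up when two paths at different precisions are expected to line up exactly.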
 
Fred da Roza said:
Chalnoth said:
Doomtrooper said:
I also thought the extra precision shown here for the diffuse bump mapping example was due to FX16 :
Nope. Just FX12 with ATI electing to use 8 bits of color accuracy, such that the value is clamped to [-8,8].

nVidia instead elected to use 10 bits of color accuracy with the value clamped to [-2,2].

So, in essence, the R200 just uses a different FX12 format, but it's still FX12.
Then what precision is being discussed here?
Read a bit later in the thread. The statement above turned out not to be accurate. The R200 does indeed use FX16.
 
Chalnoth said:
Fred da Roza said:
Chalnoth said:
Doomtrooper said:
I also thought the extra precision shown here for the diffuse bump mapping example was due to FX16 :
Nope. Just FX12 with ATI electing to use 8 bits of color accuracy, such that the value is clamped to [-8,8].

nVidia instead elected to use 10 bits of color accuracy with the value clamped to [-2,2].

So, in essence, the R200 just uses a different FX12 format, but it's still FX12.
Then what precision is being discussed here?
Read a bit later in the thread. The statement above turned out not to be accurate. The R200 does indeed use FX16.

Well if it is color precision that is being discussed there, then you basically said even if 12 bit color precision, on the GeForce 3/4 series, causes very noticeable artifacts it doesn't matter because FX16, used on the R200 series, isn't used in modern cards. However FP24 texture addressing, on the R300 series, is a problem even though it hasn't caused any noticeable artifacts.
 
Chalnoth said:
Where will this difference be seen? Well, as with many things, the most obvious would be with a large, high-contrast shader. Some possible problems that would crop up: skinned models that require the same texture to wrap across multiple triangles may have seams or cracks along triangle edges where the calculations don't properly line up. There may be problems with, for example, a dependent texture not properly aligning with the base texture, causing the surface to look just a little bit "off." Anisotropic filtering, which, I believe, requires higher subpixel accuracy, may exacerbate these issues.
So write such a shader and show us the problems. Your statement above starts with "obviously" then every sentence thereafter says "may". "May be" it's not so "obvious" after all, hmm?
 
Chalnoth said:
Heh. Finally read some of the later posts on this thread. Looks like it's FX16 after all, but it's not like it matters. FX16 still isn't used in modern cards (in the shaders).

Is the 9200 not a viable modern card then? I thought it was still being produced and is having quite a nice life in laptops as well.
 
Chalnoth said:
Now, we have seen on this message board that there is a difference on the R3xx in texture addressing between dependent texture reads (where the texture coordinate must, at some point, be truncated to FP24) and independent texture reads (where the texture coordinate can remain in its original FP32 form as calculated by the vertex shader and interpolated during triangle setup).

I would like you to explain a single case where the texture coordinate would be truncated to FP24. What is the possible range of values for texture coordinates?


Chalnoth said:
So the R300 must indeed be calculating texture addresses, by default, at some higher precision than FP24. I remember specifically an interview over at firingsquad.com where an ATI employee stated something to the tune of, "everything but the pixel shader is done at FP32," which, while it isn't strictly accurate, would paint a picture that the vertex shader (and all interpolated per-vertex attributes) are calculated and stored at FP32.

Where will this difference be seen? Well, as with many things, the most obvious would be with a large, high-contrast shader. Some possible problems that would crop up: skinned models that require the same texture to wrap across multiple triangles may have seams or cracks along triangle edges where the calculations don't properly line up. There may be problems with, for example, a dependent texture not properly aligning with the base texture, causing the surface to look just a little bit "off." Anisotropic filtering, which, I believe, requires higher subpixel accuracy, may exacerbate these issues.

Anyway, all texture calculations should be moved to FP32 in the near future, if for no better reason than to make texture calculations performed in the vertex shader and in the pixel shader invariant. FP24 really is a compromise. It is high enough precision that there shouldn't be many really obvious issues with texture address calculation, but it still isn't as high as we're used to for fixed-function texturing.

Your argument doesn't hold water. I would bet that Nvidia does most calculations at FP16 and you seem to think their quality is ok. The vertex shaders have to deal with cases where multiple polygons fall in the same pixel (i.e. seams or cracks), but the smallest unit of measure for a pixel shader is the pixel. If the pixel falls in a seam you would run the shader the same way for fp32, fp24, or fp16.

The only thing you lose is color precision with fp32 over fp24.
 
rwolf said:
Chalnoth said:
Now, we have seen on this message board that there is a difference on the R3xx in texture addressing between dependent texture reads (where the texture coordinate must, at some point, be truncated to FP24) and independent texture reads (where the texture coordinate can remain in its original FP32 form as calculated by the vertex shader and interpolated during triangle setup).

I would like you to explain a single case where the texture coordinate would be truncated to FP24. What is the possible range of values for texture coordinates?
The definition of a dependent texture read is that the texture coordinate that is input into the texture addressing unit is not data that has come straight from the floating point interpolators (which work at FP32 on the R3xx). It is a value that has undergone manipulation in the fragment program. Since the R3xx has FP24 precision registers, any time a register is used as the source for the texture address, that address is an FP24 value. That is the case where a texture coordinate is truncated to FP24. "Normal" texture addressing, where the input address is taken unmodified from the interpolators, is performed at FP32. The precision of the sampled fragment, of course, is another matter.

Speaking of which. . . does anyone know if the textures are sampled at FP32 or FP24? Obviously if the result is dumped into a general purpose register it will be truncated to FP24, but what if it is dumped into the output register? Basically, is it only the general purpose registers and the ALUs on the R3xx that are FP24?
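The distinction drawn above can be modeled in a few lines: an independent read hands the interpolated FP32 coordinate straight to the texture unit, while a dependent read's coordinate passes through an FP24 temporary register first. A toy Python model; the `to_fp24` helper assumes a 16-bit mantissa and is hypothetical, not how the hardware actually rounds.

```python
import math

def to_fp24(x):
    """Round a value to roughly FP24 precision (16-bit mantissa assumed)."""
    if x == 0.0:
        return 0.0
    m, e = math.frexp(x)                          # m in [0.5, 1)
    return math.ldexp(round(m * 2**17) / 2**17, e)

def independent_read(interpolated_coord):
    # Coordinate goes straight from the FP32 interpolators to the texture unit.
    return interpolated_coord

def dependent_read(interpolated_coord):
    # Shader math lands in an FP24 register before the texture unit sees it.
    reg = to_fp24(interpolated_coord * 2.0 + 0.1)
    return reg
```

Comparing `dependent_read(c)` against the exact value of `c * 2.0 + 0.1` shows a small discrepancy that the independent path never introduces.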
 
rwolf said:
Chalnoth said:
Now, we have seen on this message board that there is a difference on the R3xx in texture addressing between dependent texture reads (where the texture coordinate must, at some point, be truncated to FP24) and independent texture reads (where the texture coordinate can remain in its original FP32 form as calculated by the vertex shader and interpolated during triangle setup).
I would like you to explain a single case where the texture coordinate would be truncated to FP24. What is the possible range of values for texture coordinates?
Um, a dependent texture read, perhaps? Basically any time you change the texture coordinate within the pixel shader you'll be doing that at FP24.

Your argument doesn't hold water. I would bet that Nvidia does most calculations at FP16 and you seem to think their quality is ok.
Not texture calculations.

The vertex shaders have to deal with cases where multiple polygons fall in the same pixel (i.e. seams or cracks), but the smallest unit of measure for a pixel shader is the pixel. If the pixel falls in a seam you would run the shader the same way for fp32, fp24, or fp16.
Yes, but it's not usually the pixel shader's job to calculate the texture coordinates. That's the job of the vertex shader. If you have one texture that has its coordinate calculated via FP24 and another via FP32, they may not align properly.

The only thing you lose is color precision with fp32 over fp24.
Haha. No.

You'd have to take a seriously pathological case to have color precision problems with FP24. If your shader is designed for a realtime game on an R3xx, there's pretty much no way it's going to be long enough for precision problems to creep in.
 
Ostsol said:
Speaking of which. . . does anyone know if the textures are sampled at FP32 or FP24? Obviously if the result is dumped into a general purpose register it will be truncated to FP24, but what if it is dumped into the output register? Basically, is it only the general purpose registers and the ALUs on the R3xx that are FP24?
What would be the point?
 
Chalnoth said:
Ostsol said:
Speaking of which. . . does anyone know if the textures are sampled at FP32 or FP24? Obviously if the result is dumped into a general purpose register it will be truncated to FP24, but what if it is dumped into the output register? Basically, is it only the general purpose registers and the ALUs on the R3xx that are FP24?
What would be the point?
If texture sampling was at FP32, then ATI might not have far to go to supporting texture reads in the vertex pipeline. The existing texture samplers would have all the precision requirements.
 
Ostsol said:
If texture sampling was at FP32, then ATI might not have far to go to supporting texture reads in the vertex pipeline. The existing texture samplers would have all the precision requirements.
I don't think it'd be a challenge to change regardless.
 
Chalnoth said:
rwolf said:
Chalnoth said:
Now, we have seen on this message board that there is a difference on the R3xx in texture addressing between dependent texture reads (where the texture coordinate must, at some point, be truncated to FP24) and independent texture reads (where the texture coordinate can remain in its original FP32 form as calculated by the vertex shader and interpolated during triangle setup).
I would like you to explain a single case where the texture coordinate would be truncated to FP24. What is the possible range of values for texture coordinates?
Um, a dependent texture read, perhaps? Basically any time you change the texture coordinate within the pixel shader you'll be doing that at FP24.

Your argument doesn't hold water. I would bet that Nvidia does most calculations at FP16 and you seem to think their quality is ok.
Not texture calculations.

The vertex shaders have to deal with cases where multiple polygons fall in the same pixel (i.e. seams or cracks), but the smallest unit of measure for a pixel shader is the pixel. If the pixel falls in a seam you would run the shader the same way for fp32, fp24, or fp16.
Yes, but it's not usually the pixel shader's job to calculate the texture coordinates. That's the job of the vertex shader. If you have one texture that has its coordinate calculated via FP24 and another via FP32, they may not align properly.

The only thing you lose is color precision with fp32 over fp24.
Haha. No.

You'd have to take a seriously pathological case to have color precision problems with FP24. If your shader is designed for a realtime game on an R3xx, there's pretty much no way it's going to be long enough for precision problems to creep in.

So you are saying that 24 bits is not enough to calculate texture coordinates? How big are these textures? I don't see it.
 
OpenGL Guy said:
What's your point? Is anyone interested in staring at four texels stretched across the whole screen?

This is a completely unrealistic test.

If you're rendering "Toy Story" you might have textures this large, but for real time graphics I think not. A 256 color 4096x4096 uncompressed texture would be 16MB. I highly doubt that even at 1600x1200 there would be any benefit to using a texture that detailed.
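The memory figure above checks out; a 256-color palettized texture stores one byte per texel:

```python
# 4096x4096 texels at one byte per texel (256-color palettized).
width = height = 4096
bytes_per_texel = 1

size_bytes = width * height * bytes_per_texel
print(size_bytes // (1024 * 1024), "MB")   # 16 MB
```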
 