FP16 and market support

I might be light-years out of my intellectual league here, but don't Ostsol's shots measure the bilinear blend precision rather than the texture addressing precision? One thing I noticed in the shot was the 32 intermediate color gradients in between the texels, which corresponds well with the fact that ATi has 5-bit precision bilinear blending (according to that German article a while back).

My (horrible) logic proceeds as follows: for a 2kx2k texture you need 11 bits to address each texel. You have 17 bits of mantissa precision in a normalized FP24 value, which leaves you with six fractional bits for blending - one more fractional bit than ATi uses for blending.

One problem would be if you have the texture set on repeat wrap mode (like a tile repeating on a floor). But I doubt you would use a 2k texture for that given the aliasing that would generate.
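
Quick back-of-the-envelope sketch of that bit budget in Python (assuming FP24 means 16 stored mantissa bits plus the implicit one, and that the coordinate is treated as a texel index in [0, N)):

```python
# Back-of-envelope: fractional (sub-texel) bits left over after addressing
# an NxN texture with a given mantissa width. Assumes the coordinate is in
# texel space, so the integer texel index eats ceil(log2(N)) mantissa bits.
import math

def subtexel_bits(texture_size, mantissa_bits):
    integer_bits = math.ceil(math.log2(texture_size))
    return mantissa_bits - integer_bits

print(subtexel_bits(2048, 17))  # FP24 (16 stored + 1 implicit): 6 bits left for blending
print(subtexel_bits(2048, 24))  # FP32 (23 stored + 1 implicit): 13 bits left
```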
 
OpenGL guy said:
What's your point? Is anyone interested in staring at four texels stretched across the whole screen?
The point is to see if the same image can be produced, which would show that the same precision (FP32) is being used in texture lookups. Given that the NV25 supports textures up to 4096x4096, I think it quite likely that it does look up textures with FP32 texcoords.
 
OpenGL guy: Well, according to all the stuff you ATI guys said here, it doesn't seem like there's much of a need for FP32 in the NV3x, unless you get out of realtime stuff and use 1024-instruction shaders on Quadro FX boards, maybe.

I must admit NVIDIA's obsession with FP32 is extremely strange, and it's bad for them too, but it's not the first time I've seen people mention FP32 in the NV25...
Anyway, if it wasn't in FP32, in what format would it be? FP24? NVIDIA never used FP24 AFAIK. Integer or FP16? Maybe, but I don't think so :)

BTW, gotta agree with ya that those people care wayyy too much about invisible texturing problems - overall, the R300 is just fine when it comes to texture filtering. Sure, if you were to make some areas better, others would become lacking in comparison; for example, if you increased AF quality a lot, some other precision values might have to be increased.
But I think the design has proven itself, and for this generation, and probably even the next, it seems perfectly sufficient.


Uttar
 
akira888 said:
I might be light-years out of my intellectual league here, but don't Ostsol's shots measure the bilinear blend precision rather than the texture addressing precision? One thing I noticed in the shot was the 32 intermediate color gradients in between the texels, which corresponds well with the fact that ATi has 5-bit precision bilinear blending (according to that German article a while back).

I think the 5-bit precision thing is about MIP LOD selection, not bilinear blending. Bilinear blending should always be done in at least 8 bits per component, or you'll always get very bad quality.

My (horrible) logic proceeds as follows: for a 2kx2k texture you need 11 bits to address each texel. You have 17 bits of mantissa precision in a normalized FP24 value, which leaves you with six fractional bits for blending - one more fractional bit than ATi uses for blending.

It seems to be correct. You can see 64 shades in Ostsol's screenshot, which suggests 6 bits of sub-texel addressing accuracy. On the NV34, Ostsol's program gives a smooth result (256 shades) at the 1st range. At the 4th range it gives 128 shades.
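
FWIW, the shade counts fit if you just take 2^(sub-texel bits), capped by the 8-bit framebuffer output - a toy illustration, not a measurement:

```python
# Toy check: distinct blend levels visible between two texels, given N
# sub-texel bits, capped by 8-bit output per component.
def visible_shades(subtexel_bits, output_bits=8):
    return min(2 ** subtexel_bits, 2 ** output_bits)

print(visible_shades(6))   # 64  -> the R300 dependent-read (FP24) shot at the 2k range
print(visible_shades(13))  # 256 -> FP32 is limited only by the 8-bit framebuffer here
print(visible_shades(7))   # 128 -> 7 remaining bits would explain the NV34's 4th range
```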
 
I would not put too much weight on what OpenGL guy has to say. Remember he is/was responsible for OpenGL on the S3 Savage series and for ATi cards. We all know how the OpenGL quality of those products stacks up vs nVidia's OpenGL.
 
Greg - he's actually responsible for DirectX on the R300 series...

I think he'll have an infinite amount more knowledge than you on any subject related to 3D.
 
DaveBaumann said:
Greg - he's actually responsible for DirectX on the R300 series...

I think he'll have an infinite amount more knowledge than you on any subject related to 3D.

I don't doubt it for a second.

I do doubt his knowledge (and that of many other speculators) of the inner workings of nVidia's GPUs, though.
 
I don't doubt it for a second.

That's not what you said. Saying:

Remember he is/was responsible for OpenGL on the S3 Savage series and for ATi cards. We all know how the OpenGL quality of those products stacks up vs nVidia's OpenGL.

Says something completely different and speaks volumes about your character.
 
radar1200gs said:
DaveBaumann said:
Greg - he's actually responsible for DirectX on the R300 series...

I think he'll have an infinite amount more knowledge than you on any subject related to 3D.

I don't doubt it for a second.

I do doubt his knowledge (and that of many other speculators) of the inner workings of nVidia's GPUs, though.

While you guys are speculating about how Nvidia chips work, I bet these guys know exactly how they function. ATI has electron microscopes and can directly look at their chip layout and design. They can also hack the drivers and see how they interface with the hardware.

Haven't you heard Nvidia comment on how many cycles a sin operation takes on ATI hardware? Nvidia knows ATI's hardware in detail, too.
 
OpenGL guy said:
I don't buy it. There's absolutely no need for NV2x to have FP32 texture lookups.

I'm not talking about texture lookups. I'm talking about the unit that calculates the 4 values for the register combiners. nVidia calls this unit the texture shader. The texture lookup is only a part of the whole unit.

Let's take a look at precision:

Texture Shader Precision
• Interpolated texture coordinates are IEEE 32-bit floating-point values
• Texture projections, dot products, texture offset, post-texture offset scaling, reflection vector, and depth replace division computations are performed in IEEE 32-bit floating-point
• HILO texture components are filtered as 16-bit values
• DSDT, MAG, intensity, and color components are filtered as 8-bit values

http://developer.nvidia.com/object/texture_shaders.html (PDF, page 130)
 
pcchen said:
I think the 5-bit precision thing is about MIP LOD selection, not bilinear blending. Bilinear blending should always be done in at least 8 bits per component, or you'll always get very bad quality.
It is at least 8 bits per component, but with two 6-bit (not 5-bit) blending factors. Mipmap interpolation is done with a 5-bit factor.
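
A minimal sketch of a bilinear blend with the weights snapped to 6 bits, to show where the 64 visible levels come from (illustrative only; real hardware rounding differs):

```python
# Bilinear filter of four 8-bit texels with the two blend factors quantized
# to 6 bits, as described above. Illustrative only; rounding details vary.
def quantize(frac, bits=6):
    steps = (1 << bits) - 1
    return round(frac * steps) / steps

def bilinear(c00, c10, c01, c11, u_frac, v_frac):
    u, v = quantize(u_frac), quantize(v_frac)     # 6-bit weights
    top    = c00 * (1 - u) + c10 * u
    bottom = c01 * (1 - u) + c11 * u
    return round(top * (1 - v) + bottom * v)      # result stays 8 bits per component

# Blending black -> white across one texel can only produce 64 distinct levels:
print(len({bilinear(0, 255, 0, 255, u / 256, 0.0) for u in range(256)}))  # 64
```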
 
OpenGL guy said:
Ostsol said:
Here's normal rendering (FP32) on a Radeon 9700 Pro at the first texture address range:
http://members.shaw.ca/dwkjo/screenshots/normal.png

Here's dependant texture reads (FP24) on the same card at the same range:
http://members.shaw.ca/dwkjo/screenshots/dependant.png
What's your point? Is anyone interested in staring at four texels stretched across the whole screen?
If you look, those texels are shifted slightly. The top right of the image is relatively stationary, with the bottom and left portions moving away from it in the 24-bit precision mode.

What does this mean?

There is a danger of textures not lining up properly any time the developer depends on textures lining up at the edge of a triangle, which would, at best, produce extra edge aliasing. At worst it would cause additional lines to appear where none should be. It is further conceivable that effects requiring many textures may no longer line up properly (especially since it is likely that only some of the textures will require dependent reads).
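
To picture the shift: quantize the same texel-space coordinate to a 17-bit mantissa versus a 24-bit one, and the dependent-read path lands slightly off from the direct path. A rough Python illustration, not a model of the actual chip:

```python
# Rough illustration of the precision-induced shift: quantize a texel-space
# coordinate to a reduced mantissa and compare with the original value.
import math

def round_mantissa(x, bits):
    if x == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(x)))
    ulp = 2.0 ** (exp - (bits - 1))   # spacing of representable values at this magnitude
    return round(x / ulp) * ulp

coord = 1234.567                          # texel-space coordinate inside a 2k texture
print(coord - round_mantissa(coord, 17))  # ~0.0045 texels off with a 17-bit (FP24-like) mantissa
print(coord - round_mantissa(coord, 24))  # on the order of 1e-5 with 24 bits (FP32): effectively no shift
```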
 
pcchen said:
I think the 5-bit precision thing is about MIP LOD selection, not bilinear blending. Bilinear blending should always be done in at least 8 bits per component, or you'll always get very bad quality.
...
It seems to be correct. You can see 64 shades in Ostsol's screenshot, which suggests 6 bits of sub-texel addressing accuracy.
Yes, ATI chips use a 5-bit coefficient for MIP LOD blending and a 6-bit coefficient for bilinear filtering.
The DirectX REFRAST uses 5-bit precision for both.
 
OK, since we have all of the Nviditrolls around here constantly telling us why ATi sucks...

why don't you guys ask Nvidia why

1. FP32 is not used and they have to use FP16 in order to compete with ATi's FP24

2. why they claim the 5200 is DX compatible and yet it lacks the required precision. It is my understanding that it has no FP32 units and may not even have FP16 ones, and relies on integer units

3. Why they are slowing down the advancement of IQ and cinematic computing that they were hyping. Do any of you remember Nvidia telling us how FP32 was faster than ATi's FP24? They released the Dawn demo showing off their card, and all the while it turned out that Dawn was running in FP16 and INT12 for the most part. Not to mention that it looks better and runs faster on ATi's hardware, and that's through a wrapper...

4. What did nVidia say about .15u? That ATi would be stuck in the low-300MHz range at best, and look, here we are running 9800 XTs on .15u at 415+ MHz, all while producing less heat than nVidia's .13u parts in the same price range.

5. Whatever happened to the 5800 Ultra? Can you find it on nVidia's webpage?

6. Why their FSAA sucks, IQ- and FPS-wise, in comparison to ATi's. Why is it that ATi's 6x FSAA looks better and runs faster than nVidia's 8x?

7. why nVidia has been quoted as saying Intel is their main competitor?

8. why ATi can pretty much run DX9 titles right out of the box while nVidia has to hand-code for each game and, even worse, pressure developers to remove benchmark modes from their games since they make nVidia look bad (TRAOD)

9. why they don't do full trilinear in UT2K3 while ATi does

10. why S3's new card is faster on some of the DX9 benchmarks than their 5600 and 5600 Ultra cards and will be about the price of a 5200?
 
Uttar said:
So, could it be that with texturing, VS will also have register usage penalties?

Definitely.
Except that the penalty is not necessarily paid in slowdowns but in transistors.
It might require a 10x larger register array to avoid the slowdowns - it's definitely not cheap.

As for branching, it was definitely cheaper to implement in VS than in PS.
Branching in PS has to break quads, while vertices are already independent.

I think at least the first generation of PS3.0 hw will execute the pixel program in quads until the first dynamic branch, and independently from that point.
And it will probably have penalties after that point.
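
A toy way to picture that quad penalty (conceptual sketch only, not how any real scheduler works): a lockstep 2x2 quad that diverges has to issue both sides of the branch for all four pixels, while truly independent pixels - like vertices - would each pay only for their own path.

```python
# Conceptual cost of a dynamic branch for a lockstep 2x2 quad vs. fully
# independent execution. Costs are in made-up "instruction slots".
def quad_slots(takes_branch, then_cost, else_cost):
    if all(takes_branch):
        issued = then_cost
    elif not any(takes_branch):
        issued = else_cost
    else:
        issued = then_cost + else_cost   # divergent quad: run both sides, mask results
    return issued * len(takes_branch)    # every pixel sits through every issued instruction

def independent_slots(takes_branch, then_cost, else_cost):
    return sum(then_cost if t else else_cost for t in takes_branch)

quad = [True, True, False, True]         # one pixel of the quad takes the other path
print(quad_slots(quad, 20, 4))           # 96: both paths issued for the whole quad
print(independent_slots(quad, 20, 4))    # 64: each pixel pays only for its own path
```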
 
YeuEmMaiMai said:
1. FP32 is not used and they have to use FP16 in order to compete with ATi's FP24
FP32 is used for texture ops.

2. why they claim the 5200 is DX compatible and yet it lacks the required precision. It is my understanding that it has no FP32 units and may not even have FP16 ones, and relies on integer units
It has the same basic architecture as the 5600 and 5800. With no FP32 units, proper texturing could not be achieved.

What you are speaking of, mistakenly, is of course a result of nVidia's choice to retain integer units in the original NV3x lineup (NV30-34), in conjunction with Microsoft's refusal to support integer types. Since the NV30-34 suffer a huge performance hit when going all FP (they lose about 2/3rds of their available power), nVidia has been forced to use integer precision anyway under DX9. It would have been vastly better for everybody involved if Microsoft had just supported integer types.

5. Whatever happened to the 5800 Ultra? Can you find it on nVidia's webpage?
5900 Ultra.

6. Why their FSAA sucks, IQ- and FPS-wise, in comparison to ATi's. Why is it that ATi's 6x FSAA looks better and runs faster than nVidia's 8x?
Unfortunate. Hopefully fixed in the NV40.

7. why nVidia has been quoted as saying Intel is their main competitor?
Integrated chipsets.

8. why ATi can pretty much run DX9 titles right out of the box while nVidia has to hand-code for each game and, even worse, pressure developers to remove benchmark modes from their games since they make nVidia look bad (TRAOD)
False. ATI had DX9 hardware first, so developers started developing DX9 titles on ATI hardware. ATI got a head start.

9. why they don't do full trilinear in UT2K3 while ATi does
ATI doesn't do any trilinear on any texture stage but the first (when aniso is selected from the control panel).

10. why S3's new card is faster on some of the DX9 benchmarks than their 5600 and 5600 Ultra cards and will be about the price of a 5200?
Heh. S3's card is going to suck in so many different ways, its performance will be a secondary consideration. Specifically: what has S3 sacrificed to get that performance?

Regardless, the 5600 line is being replaced by the much better 5700 line.
 
YeuEmMaiMai said:
2. why they claim the 5200 is DX compatible and yet it lacks the required precision. It is my understanding that it has no FP32 units and may not even have FP16 ones, and relies on integer units

Why? Because it is a DX9 chip.

An NV34 can execute 2 vector4 FP32 operations per clock.
 