Dravern

flipcode said...
The 9700’s floating point pipeline allows 24-bits (floating point) per channel throughout the pipeline, making these per-pixel effects more realistic for sharper reflections and smoother shading.

And also said..

NVIDIA raises the bar with 32-bit floats (vs. 24-bit) in the pipeline and blows away ATI’s vertex and pixel shaders.

And this is from the ATI site (FireGL X1 spec).

24-bit for each color component (RGBA) enables true-life images to be displayed beyond 16.7M colors

24-bit per channel in the pipeline? Is this correct?
I thought 32-bit per channel was right, but now I'm a little confused again.
 
OTOH I read somewhere on this forum that the precision of calculations is 24bits floating point per component, (i.e. 96 bits total for a pixel), but that it is written to memory as 128bits (1 component per 32bit word) for better memory alignment.

Serge
 
psurge said:
OTOH I read somewhere on this forum that the precision of calculations is 24bits floating point per component, (i.e. 96 bits total for a pixel), but that it is written to memory as 128bits (1 component per 32bit word) for better memory alignment.

Serge

Actually, the little data I've seen could just as well be interpreted as meaning that the floating point format of the R300 is such that the mantissa is represented with 24 bits. That is, you would get 24-bit accuracy, but with a dynamic range defined by the exponent.
After some digging I also found that the R300 uses the IEEE fp format for 32-bit floating point, which could support the above interpretation.
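For what it's worth, the ambiguity is easy to see in any standard C environment: an ordinary "32-bit" IEEE single already reports 24 bits of mantissa precision (23 stored bits plus the implicit leading 1), so "24 bits" could describe either a whole 24-bit format or just the significand of a 32-bit one. A minimal illustration:

```c
/* Minimal illustration: an IEEE 754 "32-bit" float carries 24 bits of
 * significand precision (23 stored + implicit leading 1), which is why
 * "24 bits" is ambiguous without knowing how the bits are allocated. */
#include <float.h>
#include <stdio.h>

int main(void)
{
    printf("bytes per float         : %zu\n", sizeof(float)); /* 4 -> 32 bits */
    printf("mantissa digits (base 2): %d\n", FLT_MANT_DIG);   /* 24 on IEEE systems */
    return 0;
}
```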

Even so, I would prefer better data to say anything for sure. But the nVidia hype machine might not be the best source of info on the R300...

Entropy
 
Entropy said:
psurge said:
OTOH I read somewhere on this forum that the precision of calculations is 24bits floating point per component, (i.e. 96 bits total for a pixel), but that it is written to memory as 128bits (1 component per 32bit word) for better memory alignment.

Serge

Actually, the little data I've seen could just as well be interpreted as meaning that the floating point format of the R300 is such that the mantissa is represented with 24 bits. That is, you would get 24-bit accuracy, but with a dynamic range defined by the exponent.
After some digging I also found that the R300 uses the IEEE fp format for 32-bit floating point, which could support the above interpretation.

Even so, I would prefer better data to say anything for sure. But the nVidia hype machine might not be the best source of info on the R300...

Entropy

If so, what is this supposed to mean?
24-bit for each color component (RGBA) enables true-life images to be displayed beyond 16.7M colors
24-bit per RGBA displays beyond 16.7M colors
All are from the FireGL X1 spec on the ATI site.
Do these also mean a 24-bit mantissa (plus an 8-bit exponent)?
I want some clarification.
 
Dravern said:
I want some clarification.

That makes two of us. :D
I've given as much as I know.

One thing's for sure though - it's not meaningful to get too wound up about this from a practical standpoint. The major step was from 8 bits per colour component to a higher precision floating point format.
There is no way in hell you could find 400MHz RAMDACs accurate to 24 bits, for instance, nor any output device known to man that could represent it even if you could. These numbers for precision are only relevant for ensuring sufficient dynamic range, and for keeping cumulative calculation errors well below detection thresholds.
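As a toy illustration (nothing specific to either chip, and the numbers below are made up purely for the example), repeatedly adding a contribution smaller than one 8-bit step shows why per-step precision matters long before any output device could display the difference:

```c
/* Toy example: accumulate a value smaller than one 8-bit LSB.
 * The 8-bit accumulator never moves because each pass rounds back to
 * where it started; the float accumulator picks up the contribution. */
#include <stdio.h>

int main(void)
{
    const double step = 0.3 / 255.0;   /* less than one 8-bit step */
    unsigned char acc8 = 0;            /* 8-bit "framebuffer" value */
    float accf = 0.0f;                 /* floating point accumulator */

    for (int i = 0; i < 200; ++i) {
        /* quantize after every pass, as an 8-bit framebuffer would */
        double v = acc8 / 255.0 + step;
        acc8 = (unsigned char)(v * 255.0 + 0.5);
        accf += (float)step;
    }
    printf("8-bit result: %f   float result: %f\n", acc8 / 255.0, accf);
    return 0;
}
```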

My guess: both the R300 and the NV30 use a 32-bit format internally. (The R300 is reported to be somewhat less flexible as far as external representation goes.)

But this is only based on my understanding of publicly available material. If this is truly important for your work, you should contact ATI directly.

Entropy
 
Like the ATI Radeon 9700, the NVIDIA CineFX cards offer a 128-bit frame buffer and floating point pipeline. NVIDIA raises the bar with 32-bit floats (vs. 24-bit) in the pipeline and blows away ATI’s vertex and pixel shaders. Instead of offering a few hundred instructions, they went for 1k (that’s a thousand!) instructions in pixel shaders and 64k instructions in vertex shaders. These include branches and loop instructions. As has been the case in the past, the NVIDIA card promises higher raw performance and longer pipelines but gives less texture support. These vertex and pixel shaders may eclipse the ATI Radeon 9700’s limits when the CineFX cards ship, but the ATI card has twice the texture access rate. However, the cards are so difficult to program that it may be the tools and not the hardware that makes the difference in the end.

I have some serious issues with this *view* of the two technologies... First, Nvidia sent out a clarification that the NV30 only supports 256 static instructions, not 1k. It just irritates the crap out of me that articles and opinions are being formed *again* based on deceptive information that Nvidia has slipped out.

The R300 ran Q3 with ray traced shadows with approx 80,000 instructions and managed 10 FPS.. So clearly the R300 is no slouch in this department. Nvidia's pixel and vertex shaders *based on reality* do NOT blow the R300 away.. Why does Nvidia CONSTANTLY get away with misinformation???

but the ATI card has twice the texture access rate. However, the cards are so difficult to program

Huh????? Texture access is *difficult*???

Then they make a comment that ATi needs to support NV_DEPTH_CLAMP.. in order to do shadow volumes *correctly*...

Currently, there is no alpha blending when using a 128-bit frame buffer. This requires developers to rewrite their rendering pipelines in order to make effects like shadows, translucency, and particle systems work with high precision color. Adding high precision alpha blending, even longer vertex and pixel programs, and support for the NV_DEPTH_CLAMP OpenGL extension (important for shadow volume rendering) would really make this a next generation card with features to match its raw performance

They actually imply in the end that the 9700 is *not* a true next gen card... Even though it is the ONLY fully compliant DX9 card in existence. Even though DX9 is not even released yet.. it does not have enough features.. even though it is fully DX9 compliant....????? Again, the R300 just ran Q3 with ray traced shadows.. and is the ONLY card that can do it.. doing 80,000 instructions.. yet it *needs* more instruction power???

The real issue here is that Nvidia just released a statement saying they only support 256 static instructions, NOT.. 1024... so yet again, what do we have??? We have an article that will be read by many.. that makes the 9700 look inferior to a product that has been delayed to Christmas at the earliest.. based on rumored stats that have been PROVEN false...

Where the writers even go as far as to suggest that it is *not* a true next gen product..

It makes me ill....
 
the card will ship with as many texture units as they can cram onto it, support for two-sided stencil testing and

It has been shown that more texture units are not necessarily better in an 8-pipeline design.. also.. the R300 HAS twin stencil registers per pipeline... yet no mention of it.. only that the NV30 might have it...

Doesn't this strike anyone else here as being a little one-sided?? Especially looking at all the available info???
 
Hellbinder[CE] said:

I have some serious issues with this *view* of the two technologies... First, Nvidia sent out a clarification that the NV30 only supports 256 static instructions, not 1k. It just irritates the crap out of me that articles and opinions are being formed *again* based on deceptive information that Nvidia has slipped out.

Notice how the article states PIXEL shaders, not vertex shaders. It is still 1024 static instructions for pixel shaders; it is 256 static instructions for vertex shaders. The information you quoted is correct.

Not saying that the article isn't riddled with errors, but in that particular quote it was correct (well.. in the instruction area at least) ;p
 
Two-sided stencil shadow acceleration is part of the DX9 spec so I'm pretty sure NV30 will have it.
 
If you are talking about the two-sided stencil test (currently exposed on NV2x chips via OpenGL extensions), then there isn't actually a very big speed difference when using it. At first glance you might think "half the passes = twice the speed", but if you do the math and/or try it, that isn't how it comes out. It is the same number of stencil buffer updates, so in a fillrate-limited render it won't make much of a difference.

(though there is a slight CPU usage decrease, and repeatedly rendering all front- or back-facing shadow volumes can cause inefficiencies)

The speed difference is usually a small fraction of a percent. Whether or not that is the case with DX9's version of the functionality remains to be seen, but I wouldn't count on anything big coming from it (but I am not on the DX9 SDK beta).
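For reference, here is a rough sketch of the conventional two-pass (depth-pass) stencil update in OpenGL; draw_shadow_volume() is just a placeholder for submitting the volume geometry. The two-sided stencil extension folds the two draws into a single pass, but every volume fragment still performs exactly one stencil update, which is why the fillrate cost barely changes:

```c
/* Classic two-pass depth-pass stencil volume update (sketch).
 * Assumes the depth buffer was already filled by an ambient/depth pass
 * and that GL_DEPTH_TEST is enabled. draw_shadow_volume() is a
 * hypothetical helper that submits the shadow volume geometry. */
#include <GL/gl.h>

extern void draw_shadow_volume(void);

void update_stencil_two_pass(void)
{
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE); /* no colour writes */
    glDepthMask(GL_FALSE);                               /* no depth writes  */
    glEnable(GL_STENCIL_TEST);
    glStencilFunc(GL_ALWAYS, 0, ~0u);
    glEnable(GL_CULL_FACE);

    /* pass 1: front faces increment where the depth test passes */
    glCullFace(GL_BACK);
    glStencilOp(GL_KEEP, GL_KEEP, GL_INCR);
    draw_shadow_volume();

    /* pass 2: back faces decrement where the depth test passes */
    glCullFace(GL_FRONT);
    glStencilOp(GL_KEEP, GL_KEEP, GL_DECR);
    draw_shadow_volume();
}
```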
 
Basic said:
OpenGL guy was quite clear, it's a 24bit floating point format, not 32bit.

Well, ATI's own educational material on the 9700 specifies 32 bits for the vertex engine. They call the rendering pipes "128-bit", but when they get down to the nitty-gritty of it they specify 96 bits per pixel, which I would have taken as 32 bits per RGB component.
Furthermore, there is that IEEE reference, although I admit that it could have referred to the vertex engine alone. My mind hasn't kept a hammerlock on that.

And of course there is always that ambiguity when talking about the precision of a floating point number - do you specify the size of the full floating point format, or of the mantissa alone, since that is the only part that matters as far as accuracy and error propagation go? I myself and at least one other (I fail to remember who) have specifically asked for information on how the bits are actually allocated - there was never any answer.

So if an anonymous source on a bbs such as this says "it's 24 bits" without going further into it, how should that be interpreted? Is it even necessarily accurate, regardless of good intentions? For my own sake, I don't depend on it either way, nor can I see that it makes any practical difference, although it's always possible to construct examples to the contrary. But if I actually _needed_ to know, anecdotal evidence just doesn't cut it either way.

And yes, I'm a bit miffed that the actual floating point format was never given. :D But the reason could simply have been that nobody actually knew exactly. And if that was the case...

Entropy
 
And an "anonymous guy" from a bbs stated that the 1024 static VS instructions was an error, and the real number was 256. The documents were corrected only after it was brought up on a bbs.
And nvidia has taken quite a bashing for it from some people.

The "anonymous guy" = OpenGL guy was quite explicit that it was 24-bit floating point as opposed to normal 32-bit floating point. There's no doubt that he didn't mean a 24-bit mantissa. So the only argument would be that OpenGL guy is faking it and really doesn't know what he's talking about. I'd say that he's given enough good information here that he should be trusted. Especially since the number 24 bits per component is mentioned in ATI's official doc.

So we have a pretty similar situation to the 1024/256 VS instruction thing.
IMO the most probable thing is that it's 24-bit floating point, but I can of course not be sure. I wouldn't change my mind just because of one more document saying "we use 32 bit fp". Some official comment like "the 24 bit fp rumours are false, we do 32 bit fp internally" would be needed.
And I think that's reasonable.
I would of course want an official comment on this to put it to rest, and it should be a comment that explicitly addresses the 24-bit fp.

Btw, (one of) the guy(s) that requested a more explicit format for the pixel shader variables was me.
 
BTW, nVidia finally got around to updating their NV3X paper (at developer.nvidia.com) earlier today.

256 static instructions (was 1024 before)
256 loops (was 64 before)

PR:
65536 instructions.. 256 instructions can be looped 256 times. That's PR for you ;p
 
Entropy said:
Basic said:
OpenGL guy was quite clear, it's a 24bit floating point format, not 32bit.

Well, ATI's own educational material on the 9700 specifies 32 bits for the vertex engine. They call the rendering pipes "128-bit", but when they get down to the nitty-gritty of it they specify 96 bits per pixel, which I would have taken as 32 bits per RGB component.
Furthermore, there is that IEEE reference, although I admit that it could have referred to the vertex engine alone. My mind hasn't kept a hammerlock on that.

And of course there is always that ambiguity when talking about the precision of a floating point number - do you specify the size of the full floating point format, or of the mantissa alone, since that is the only part that matters as far as accuracy and error propagation go? I myself and at least one other (I fail to remember who) have specifically asked for information on how the bits are actually allocated - there was never any answer.

So if an anonymous source on a bbs such as this says "it's 24 bits" without going further into it, how should that be interpreted? Is it even necessarily accurate, regardless of good intentions? For my own sake, I don't depend on it either way, nor can I see that it makes any practical difference, although it's always possible to construct examples to the contrary. But if I actually _needed_ to know, anecdotal evidence just doesn't cut it either way.

And yes, I'm a bit miffed that the actual floating point format was never given. :D But the reason could simply have been that nobody actually knew exactly. And if that was the case...

Entropy

It's sometimes hard to mix all the nuances of hardware with a straightforward & clear marketing message.

The 9700 supports a mixture of both 32b and 24b floating point precision, per RGBA component, in the pixel pipes (vertex shaders and the like are all at least 32b). Some of the internal texture address logic uses 32b SPFP (single precision, IEEE floating point), while the core pixel shader operations use a smaller format, which is 24b total and has sign, exponent and mantissa subfields.

The output of the pixel shaders can then be cut down to a lower precision mode (e.g. 32b per pixel, 64b per pixel, etc.), or can be expanded up to 128b per pixel (4x32b). When we expand up, we convert our internal format to a 32b per component IEEE single precision floating point number.

In the same way, when reading a full 128b (4x32b SPFP) per texel texture, we convert the external format to our internal format. Writing out our internal format to 128b and reading it back in as a texture causes no precision loss.
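To make the round-trip point concrete, here is a small sketch in C. The exact field split of the internal 24b format isn't given above, so the 1-bit sign / 7-bit exponent / 16-bit mantissa layout below is purely an assumption for illustration; the point is only that a wider IEEE single can hold every such value exactly, so widening on write and narrowing on read loses nothing:

```c
/* Sketch: a hypothetical 24-bit float (1 sign, 7 exponent, 16 mantissa
 * bits - an assumed layout, not a documented one) widened to IEEE 754
 * single and narrowed back. Because the 32-bit format has a wider
 * exponent range and a longer mantissa, the round trip is exact.
 * Zero, denormals and Inf/NaN handling are omitted for brevity. */
#include <stdint.h>
#include <stdio.h>

#define NARROW_EXP_BITS 7
#define NARROW_MAN_BITS 16
#define NARROW_BIAS     ((1 << (NARROW_EXP_BITS - 1)) - 1)  /* 63 */
#define IEEE_BIAS       127

/* 24-bit packed value -> 32-bit IEEE single bit pattern */
static uint32_t widen24(uint32_t n)
{
    uint32_t sign = (n >> 23) & 1u;
    uint32_t exp  = (n >> NARROW_MAN_BITS) & ((1u << NARROW_EXP_BITS) - 1);
    uint32_t man  = n & ((1u << NARROW_MAN_BITS) - 1);

    uint32_t exp32 = exp + (IEEE_BIAS - NARROW_BIAS);   /* rebias exponent   */
    uint32_t man32 = man << (23 - NARROW_MAN_BITS);     /* left-justify bits */
    return (sign << 31) | (exp32 << 23) | man32;
}

/* 32-bit IEEE single bit pattern -> 24-bit packed value (exact values only) */
static uint32_t narrow32(uint32_t f)
{
    uint32_t sign = (f >> 31) & 1u;
    uint32_t exp  = ((f >> 23) & 0xFFu) - (IEEE_BIAS - NARROW_BIAS);
    uint32_t man  = (f >> (23 - NARROW_MAN_BITS)) & ((1u << NARROW_MAN_BITS) - 1);
    return (sign << 23) | (exp << NARROW_MAN_BITS) | man;
}

int main(void)
{
    /* sign 0, biased exponent 63 (i.e. 2^0), mantissa 0xABCD */
    uint32_t v = (63u << NARROW_MAN_BITS) | 0xABCD;
    uint32_t roundtrip = narrow32(widen24(v));
    printf("round trip exact: %s\n", v == roundtrip ? "yes" : "no");
    return 0;
}
```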

Hope that clears it up a little.
 
Thanks

sireric - thanks.

I still have questions .. but .. WILL .. restrain .. myself! Arggh!

Entropy

PS. No disrespect to OpenGL guy intended in that earlier post. He is a major contributor indeed to these forums.
 