Carmack's comments on NV30 vs R300, DOOM developments

Doomtrooper said:
I don't know; there was a thread complaining about the R300 only using 24 bits, was there not... IEEE standard??

What's your opinion, pcchen?
Well, I'll tell you my opinion. There probably won't be any noticeable difference in 99.9% of all situations. I currently don't know of any specific algorithms that would really benefit from greater than 24 bits of floating-point color per channel.

Update:
Oh, there is one thing. There may be some problems with z values calculated in the pixel shader, but that shouldn't be done often anyway, as it defeats the memory-bandwidth-saving techniques, and there may be ways to prevent z-buffer problems here through clever algorithms.
 
Well, old buddy, I thought you'd say something like that, since Carmack needs 16-bit FP to compete in Doom 3... I'm looking for other, let's say 'unbiased', opinions... anyone?
 
Yes, simple logic.

A > B and C > B. What does that tell you about A and C? Nothing.

The visual difference is another question. Will 24-bit FP be much better than 16-bit FP for most pixel shading? On the other hand, will 32-bit FP be much better than 24-bit FP?

Give me a break dude... :rolleyes:

At the least it tells you that A will help close the gap on C, even if you don't know the exact final result. We also don't have any idea what the visual difference between the R200 path and the NV30 path is.
 
Hellbinder[CE] said:
At the least it tells you that A will help close the gap on C, even if you don't know the exact final result. We also don't have any idea what the visual difference between the R200 path and the NV30 path is.
Or whether there will be an option to run the R200 path on the R300.

Whoops, just realized something. He did say something about this:
Light and view vectors normalized with math, rather than a cube map. On future hardware this will likely be a performance improvement due to the decrease in bandwidth, but current hardware has the computation and bandwidth balanced such that it is pretty much a wash. What it does (in conjunction with floating point math) give you is a perfectly smooth specular highlight, instead of the pixelish blob that we get on older generations of cards.
Go back to the .plan for more detailed info...
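
To get a rough feel for the cube-map point, here's a toy NumPy sketch (mine, nothing to do with Doom 3's actual shaders; the 16-texels-per-face resolution is a made-up stand-in) contrasting normalizing a half-angle vector with math against snapping it to a coarse normalization cube map:

[code]
import numpy as np

def normalize(v):
    # "With math": one reciprocal square root per pixel, exact direction.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def cube_map_normalize(v, face_res=16):
    # Crude stand-in for a normalization cube map: project the vector onto
    # a cube face, snap it to a low-resolution texel grid, and renormalize
    # the texel centre. (Real lookups also filter between texels; this just
    # shows where the quantization comes from.)
    face = v / np.abs(v).max(axis=-1, keepdims=True)
    snapped = np.round(face * (face_res / 2)) / (face_res / 2)
    return normalize(snapped)

# A half-angle vector somewhere between the light and view directions.
h = np.array([0.3, 0.5, 0.81])
print(normalize(h))            # changes smoothly from pixel to pixel
print(cube_map_normalize(h))   # snapped to the nearest texel -- neighbouring
                               # pixels share it, hence the blocky highlight
[/code]

Neighbouring pixels that fall on the same texel all get the identical "normalized" vector, which is what shows up as the pixelish blob in the specular highlight.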
 
My Interpretation

Speed wise:
GFFX:NV30 >> GFFX:ARB2
GFFX:NV30 > 9700:ARB2
9700:ARB2 > GFFX:ARB2
9700:R200 > 9700:ARB2

Quality Wise
GFFX:ARB2 > GFFX:NV30
GFFX:ARB2 > 9700:ARB2
9700:ARB2 > GFFX:NV30
9700:ARB2 > 9700:R200

Of course, out of that, I am unable to determine what the comparison between GFFX:NV30 and 9700:R200 would be. My guess would be that the GFFX would be faster and look better than the 9700 under these circumstances.
 
Hellbinder[CE] said:
Yes, simple logic.

A > B and C > B. What does that tell you about A and C? Nothing.

The visual difference is another question. Will 24-bit FP be much better than 16-bit FP for most pixel shading? On the other hand, will 32-bit FP be much better than 24-bit FP?

Give me a break dude... :rolleyes:

At the least it tells you that A will help close the gap on C, even if you don't know the exact final result. We also don't have any idea what the visual difference between the R200 path and the NV30 path is.

Wow, I came back from dinner expecting to read intelligent discussions; instead, thanks to Hellbinder, I find a recap of SAT-type math questions. :devilish:

J/k dude. ;)

So, umm, any speculation as to why the ARB path is so slow on the GFFX? The higher precision alone doesn't seem to account for the GFFX's performance, given that it runs at a much higher clock speed and 96-bit vs. 128-bit is only a 33% difference.

Perhaps there are fundamental differences in design between Nvidia's fragment processor and ATi's? Or maybe the current GFFX driver is doing something suboptimal?
 
Colourless said:
My Interpretation

...
Quality Wise
GFFX:ARB2 > GFFX:NV30
This is highly unlikely.

Anyway, here's a quick and dirty explanation as to why 16-bit floating point is enough for the vast majority of calculations.

The argument is simple: the mantissa is 10 bits for nVidia's half float format. It appears that on any video card, the framebuffer will be in the 8888 integer format (due to the lack of blending modes in the floating-point formats, according to what JC stated). This means that for any color-only data, the 16-bit floating point format will offer more than enough accuracy for anything that will ever reach the framebuffer.
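
For a rough sense of the numbers (just NumPy's float16 standing in for NV30's half format, nothing vendor-specific):

[code]
import numpy as np

# fp16 ("half") carries a 10-bit mantissa, so the gap between adjacent
# representable values just above 1.0 is 2**-10.
fp16_step = float(np.finfo(np.float16).eps)   # ~0.000977

# An 8-bit-per-channel (8888) framebuffer quantizes each colour channel
# to steps of 1/255.
int8_step = 1.0 / 255.0                       # ~0.00392

print(f"fp16 step near 1.0 : {fp16_step:.6f}")
print(f"8-bit colour step  : {int8_step:.6f}")
print(f"ratio              : {int8_step / fp16_step:.1f}x")   # ~4x finer
[/code]

So a single half-precision colour value resolves roughly four times finer than anything an 8888 framebuffer can store; the open question is how much of that headroom gets eaten per operation.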

32-bit floats may be useful for things like normal maps. Whether or not these are used is up to JC, but it seems apparent to me that he is very concerned about offering an equivalent experience on every video card.

Update:
One additional thing. Even if a 16-bit framebuffer is used, the DAC appears to be only 10-bit on the NV30 (same as the R300, I believe). From what I've read, high-end visualization systems generally output at around 12 bits (The "subpixel accuracy" of the Quadro FX, for what it's worth), so there still is room for improvement, but for the FX, 16-bit is all that will ever be needed for color calculations (except, perhaps, if a huge number of instructions degrades quality too much...but I doubt such a thing will be rendered in realtime).
 
Just something to think about.

By the time Doom III goes gold, we will most likely be playing on NV35s and R350s. If Doom III is released around Christmas, we might even be playing on an R400.
 
Fuz said:
Just something to think about.

By the time Doom III goes gold, we will most likely be playing on NV35s and R350s. If Doom III is released around Christmas, we might even be playing on an R400.
The NV35 will likely be out much later than the R350.
 
Hellbinder[CE] said:
Furthermore, JC didn't say R300 runs faster than NV30 with R200 path.

It's simple logic.

-NV30 is slightly faster than the R300 running ARB2

=

What did you smoke, Hell?

At the moment, the NV30 is slightly faster on most scenes in Doom than the R300, but I can still find some scenes where the R300 pulls a little bit ahead.
(...)
The NV30 runs the ARB2 path MUCH slower than the NV30 path.
Half the speed at the moment. This is unfortunate, because when you do an exact, apples-to-apples comparison using exactly the same API, the R300 looks twice as fast, but when you use the vendor-specific paths, the NV30 wins.


FYI: this means R300 runs MUCH FASTER w/ ARB2 path.

Edit: closed bolding
 
I think in the final Doom 3 version everyone will be able to choose his preferred (quality or speed) rendering path, just like in the beta version...

So, no time for panic.

Thomas
 
Chalnoth said:
Colourless said:
My Interpretation

...
Quality Wise
GFFX:ARB2 > GFFX:NV30
This is highly unlikely.
More like highly likely, but you probably won't notice the difference. If GFFX:ARB2 is using 32-bit floats and GFFX:NV30 is using 16-bit floats, there's no doubt that GFFX:ARB2 > GFFX:NV30.
Anyway, here's a quick and dirty explanation as to why 16-bit floating point is enough for the vast majority of calculations.

The argument is simple: the mantissa is 10 bits for nVidia's half float format. It appears that on any video card, the framebuffer will be in the 8888 integer format (due to the lack of blending modes in the floating-point formats, according to what JC stated). This means that for any color-only data, the 16-bit floating point format will offer more than enough accuracy for anything that will ever reach the framebuffer.
Is that so? How many operations can you do with such a 16-bit float before the difference affects the LSB of an 8-bit integer format? Not too many.

Let me give a contrived example:
X = 2.6
Y = 3.8
X + Y = 6.4

However, if you only had 1 digit of precision, you'd be reduced to:
X = 3
Y = 4
X + Y = 7

when the closer value would be 6.

Very artificial, sure, but if you do enough computations, errors like this become commonplace.
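
To put the same idea in code (NumPy's float16 and float32 standing in for the hardware formats, with a deliberately extreme toy accumulation, not real shader output):

[code]
import numpy as np

step = 0.0001            # a small per-operation contribution
acc16 = np.float16(0.0)
acc32 = np.float32(0.0)
for _ in range(10000):
    acc16 = acc16 + np.float16(step)   # rounded to half precision each add
    acc32 = acc32 + np.float32(step)   # rounded to single precision each add

print(acc16)   # 0.25  -- adds stop registering once the gap between
               #          representable fp16 values exceeds the step
print(acc32)   # ~1.0  -- single precision stays close to the true sum
[/code]

Ten thousand instructions is obviously absurd for a fragment program; the point is just that a 10-bit mantissa leaves only a couple of bits of headroom over an 8-bit framebuffer step, so it doesn't take that many dependent operations before the LSB can flip.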
32-bit floats may be useful for things like normal maps. Whether or not these are used is up to JC, but it seems apparent to me that he is very concerned about offering an equivalent experience on every video card.
Carmack may not need 32-bit floats, but he might need more than 16-bit.
 
I think a more balanced and thorough comparison, trying to list the biggest differences from top to bottom, is...:

Speed:

GFFX:NV30>>GFFX:ARB2
9700:ARB2>>GFFX:ARB2
GFFX:NV30>9700:ARB2
9700:R200>9700:ARB2

Quality:

GFFX:ARB2>GFFX:NV30
9700:ARB2>GFFX:NV30
GFFX:ARB2>9700:ARB2
9700:ARB2>9700:R200

The larger the number of bits you start with, the smaller the quality difference additional bits make. I have no idea currently how noticeable 64-bit float is, but if Carmack doesn't mention it as a quality concern for the NV30 path, I think it is a safe bet that it isn't a concern for Doom 3. This also seems to indicate that the quality advantage of the ARB2 path's current precision is pointless. I think comparing 64-bit on the NV30 (NV30 code path) to 96-bit on the R300 is likely to be fair based on this info.
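
Rough numbers behind that hand-waving (the 16-bit mantissa for ATI's 24-bit format is my assumption; the fp16 and fp32 widths are the standard ones):

[code]
# Worst-case relative rounding error of a single operation is half a ulp,
# i.e. 2**-(mantissa_bits + 1).
# NOTE: the 16-bit mantissa assumed for R300's fp24 is a guess on my part.
for name, mantissa_bits in [("fp16", 10), ("fp24", 16), ("fp32", 23)]:
    rel_err = 2.0 ** -(mantissa_bits + 1)
    print(f"{name}: ~1 part in {int(round(1 / rel_err))}")

# fp16: ~1 part in 2048      (already finer than an 8-bit output step)
# fp24: ~1 part in 131072
# fp32: ~1 part in 16777216
[/code]

Each step up buys a lot of headroom mathematically, but since everything funnels into an 8-bit-per-channel framebuffer, the visible payoff shrinks each time.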

Which leaves us with a comparison of Speed roughly equal (slight edge to GFFX:NV30, though the R300 can lead sometimes), and Quality roughly equal (slight edge to R300:ARB2, though the difference may only be mathematical).

Can we live with that for now?

May I suggest that quips about how Carmack's praise of the drivers is as unflattering as an elementary-school progress award, or LONG rants about how Carmack is rigging his code against a card, can be done without?

As for the GFFX getting a special path... well, 1) the R200 was advanced enough that its code path is almost comparable to the most advanced path (and still offers enhanced speed and good quality), and 2) the R300 doesn't offer multiple precisions, so it doesn't seem like it especially needs one.

The R300 runs at 325 MHz and is competing neck and neck with a card at 500 MHz...what the heck more would you want?
 
I think it is quite likely that (mathematically, if not visibly) the ARB2 path is more accurate than the NV30 path. The NV30 path is described (as are the R200, NV20, and NV10 paths) as full-featured. The ARB2 path is described as full-featured with minor rendering improvements.

It sounds like people with R300s will be running the ARB2 path, and get slightly lower performance and perhaps imperceptibly better image quality than people running the NV30 path on NV30s.

Masochists with NV30s or people with faster successor chips will be able to run the ARB2 path on those to get even better FP image quality because of the higher precision FP pipeline.

People with R300s will be able to run the R200 path for a minor image-quality/speed tradeoff.
 
I wanna use 4xAA + 8xAF at 1280 at least... so, what do you think NOW about a neck-and-neck situation? :p :D

C'mon: switch this on on the NV30 and the race is finished... :devilish:
 
I'd like my 9700 to show up, that's what... running my Radeon 64 right now, as I gave my 8500 to the wife... BF 1942 just isn't the same.

What's the DX9 spec... 32-bit FP??
 
antlers4 said:
I think it is quite likely that (mathematically, if not visibly) the ARB2 path is more accurate than the NV30 path. The NV30 path is described (as are the R200, NV20, and NV10 paths) as full-featured. The ARB2 path is described as full-featured with minor rendering improvements.

It sounds like people with R300s will be running the ARB2 path, and get slightly lower performance and perhaps imperceptibly better image quality than people running the NV30 path on NV30s.

Masochists with NV30s or people with faster successor chips will be able to run the ARB2 path on those to get even better FP image quality because of the higher precision FP pipeline.

People with R300s will be able to run the R200 path for a minor image-quality/speed tradeoff.

Agreed.
 
Chalnoth said:
Anyway, here's a quick and dirty explanation as to why 16-bit floating point is enough for the vast majority of calculations.
When it looked like the GFFX would be faster than the 9700 with 32-bit floating point (per component), you launched a tirade on these boards about how 24-bit floating point (per component) was insufficient.
 
The best part of the whole .plan file is that JC is pushing ARB extensions, which is a big win for the many devs disgruntled with the current vendor-specific extension mess. Also interesting is that he finds the GFFX noisy. And no mention of Cg at all.
 