Tony Tamasi Interview

Ailuros said:
The lack of gamma correction in nVidia's early drivers won't cause any program to fail to run, which is more than can be said of R300 and NOLF2.

For the record, I can live w/o gamma correction, and so far I haven't seen anything in the NV40 previews that I can't excuse. The results were fine for me given how early a stage it was tested at.

However, that last part of the sentence was entirely unnecessary too; if you really want me to start splitting hairs, there's no perfection in NV40's premature drivers either. Understandably so, but do I really need to start listing things that don't yet work as they should?
Do they bring entire games to a screaming halt?

Volt: Yes, nVidia did initially claim gamma-correct AA for NV30. In a couple of PDFs, IIRC, probably the ones pertaining to Intellisample.
 
There isn't much point in answering anager's questions here. To put things into perspective, I consider both the R300 and NV40 premature previews to be on a very high level. As I said, it's hairsplitting, and it's really a topic that will just result in another senseless, pages-long debate over nothing.

Depends what you're really aiming for, radar.
 
I loved the way he fudged between criticising fp24 as 'insufficient precision' (hmm, saw all those back at the nv30 launch) and meanwhile saying that fp16 is OK for nv3x :LOL:

So much bollocks in that interview.

Is he admitting that the 'sm3.0 mod' is actually an sm2.0 extension to the existing sm2.0 functionality in the game???
Certainly, having seen Humus' demos, Far Cry itself & those nv40 presentation videos, I can't see what is actually sm3.0 in that mod.
[edit]nevermind, discussed here: http://www.beyond3d.com/forum/viewtopic.php?t=11873 [/edit]
 
arrrse said:
I loved the way he fudged between criticising fp24 as 'insufficient precision' (hmm, saw all those back at the nv30 launch) and meanwhile saying that fp16 is OK for nv3x :LOL:

That's not necessarily BS, though, since perhaps the NV3X wasn't really capable of running shaders long enough to create problems with FP16 :)
 
arrrse said:
I loved the way he fudged between criticising fp24 as 'insufficient precision' (hmm, saw all those back at the nv30 launch) and meanwhile saying that fp16 is OK for nv3x :LOL:

He did? Where did he say anything about NV30 and fp16? I must have missed it.
 
He sure did.
and of course for Shader Model 3, the required precision is FP32, so you don't get any artifacts that might have been due to partial precision. You can still get access to partial precision, but now anything less than FP32 becomes partial precision. Essentially, the required precision for Shader Model 3 is FP32. What do gamers get out of this? Well, they're going to get titles or content that either looks better or runs faster or both.
Decoded:
Partial precision gives you artifacts & fp24 is partial precision in sm3 --> fp24 = artifacts.
But if the hardware can't run the shader fast enough in fp32, you can still use fp16 which, as we've been saying for the last year, is actually good enough for anything that you might need to do with sm2, in fact, who needs sm2 when you've got ps1.4, I mean doom3 only does ps1.4 effects & both you and I know that The Carmack is the man.

Meanwhile in reality: fp16, as used almost 100% of the time by nv3x (&, due to nv40's support of fp16 & NV's instruction to developers for the last year to use -pp everywhere, currently used lots by nv40 too), is totally not good enough, & there is bugger-all difference in quality between fp24 & fp32.

and
TR: What about some examples of shaders where FP32 precision produces correct results and FP24 produces visible artifacts?

Tamasi: You don't have to listen to me, you can listen to the statements by Tim Sweeney. They've got a number of lighting algorithms that produce artifacts with FP24. In general, what you're going to find is that the more complex the shader gets, the more complex the lighting model gets, the more likely you are to see precision issues with FP24. Typically, if you do shaders that actually manipulate depth values, then again you might see issues with FP24.
Decoded: There are all manner of different possible algorithms that can be written which will be affected by fp precision, & Tim Sweeney, our TWIMTBP lead spokesman, says you gotta have fp32, so you gotta have fp32.

Meanwhile in reality: Any developer with half a brain actually uses algorithms that don't lose precision between passes, & even if they did use precision-affected algorithms, fp32 really isn't that much better than fp24 on even those types of algorithm.

But generally, the more complex the lighting algorithm, or they actually manipulate depth, the more likely you are to run into precision issues with FP24.
Decoded: Fp24 is not good enough for complex lighting or depth algorithms.

Meanwhile in reality: The fp16 used by nv3x really is not good enough for complex lighting & depth algorithms, while fp24 is good enough so far & there are few cases where fp32 is a substantial improvement.
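
Since we're hairsplitting anyway, here's a rough C sketch of my own (nothing to do with the interview; it models the three formats as roughly 10/16/23 mantissa bits & ignores the implicit leading bit, denormals & rounding-mode details) showing what those mantissa widths actually buy you, both for depth-style values & for accumulating lots of small lighting contributions:

[code]
#include <math.h>
#include <stdio.h>

/* Round x to roughly 'bits' significant mantissa bits. */
static double quantize(double x, int bits)
{
    int e;
    double m = frexp(x, &e);        /* x = m * 2^e, 0.5 <= |m| < 1 */
    double s = ldexp(1.0, bits);    /* 2^bits */
    return ldexp(floor(m * s + 0.5) / s, e);
}

int main(void)
{
    const int   bits[] = { 10, 16, 23 };  /* fp16, ATI fp24, IEEE fp32 */
    const char *name[] = { "fp16", "fp24", "fp32" };

    for (int f = 0; f < 3; f++) {
        /* Representable spacing near a depth-style value of 100.0:
           surfaces closer together than this z-fight. */
        int e;
        frexp(100.0, &e);
        double gap = ldexp(1.0, e - bits[f]);

        /* Accumulate 1000 small lighting contributions of 0.001
           (exact answer: 1.0), rounding after every add. */
        double sum = 0.0;
        for (int i = 0; i < 1000; i++)
            sum = quantize(sum + 0.001, bits[f]);

        printf("%s: spacing near 100.0 = %g, sum of 1000 x 0.001 = %.6f\n",
               name[f], gap, sum);
    }
    return 0;
}
[/code]

Crude, but it shows why the jump from fp16 to fp24 bites at ordinary scene values in a way the jump from fp24 to fp32 mostly doesn't.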

And I think lastly, the big issue is that there is no standard for FP24, quite honestly. There is a standard for FP32. It's been around for about 20 years. It's IEEE 754.
The fp format standard any coder should be coding to is the fp format used by the hardware you are coding on/for (which should be as broad a variety as possible in order not to lose substantial market share).
Who gives a toss if you're using IEEE-standard fp32 when ATI's fp24 or nv3x's entirely non-standard fp16 is adequate for what a graphics coder needs?
Do nv really have full IEEE-standard fp32?
Or just the bits that are adequate for what a graphics coder needs?
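
For reference (& assuming the usual published layouts): fp16 is 1 sign / 5 exponent / 10 mantissa bits, ATI's fp24 is 1 / 7 / 16, & IEEE 754 fp32 is 1 / 8 / 23. That puts the formats' relative precision at roughly 2^-10 ≈ 1e-3, 2^-16 ≈ 1.5e-5 & 2^-23 ≈ 1.2e-7 respectively, standards or no standards.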

I do more or less agree with him about how to count transistors.
Though, in terms of a chip with 12 active pipes on a die designed for up to 16, it is quite fair not to count the inactive transistors (160mil), but wasn't the story that ATI was only counting logic & not any cache or ancillary non-core-functionality transistors?
Anyway, I would be surprised if the count of all the transistors on a 16-pipe r420 came out anything other than pretty close to the nv40's 220-odd million.
 
That's not necessarily BS, though, since perhaps the NV3X wasn't really capable of running shaders long enough to create problems with FP16
Yes it is.
If ps2.0a is to be believed, nv3x is capable of running 65000-instruction shaders ;)
 
arrrse said:
He sure did.
[...]


Thanks! I'm too pre-coffee to pull off a post half as good as yours about this, but you said everything I wanted to. :)
 
I probably shouldn't even have brought up the fp issue anyway :oops: I mean, it's been done to death here how many times already???
[edit]& by people who know craploads more about such things than me[/edit]
 
Well, apart from the gibberish about FP24, which I see arrrse has explained better than I could be bothered to, it wasn't really that bad an interview.
 
Geeforcer said:
Ardrid said:
Since when?

MuFu said (http://www.beyond3d.com/forum/viewtopic.php?t=11836&postdays=0&postorder=asc&start=15):
Still much faster than nV when running full precision shaders in the vast majority of cases; sometimes by ~100% in synthetic benchmarks (and that's just the X800 Pro, heh). How's that? :)

Is that accurate or is MuFu just guesstimating?
 
digitalwanderer said:
Ardrid said:
Is that accurate or is MuFu just guesstimating?
It's a 1,000%, totally accurate guestimate. 8)

Apart from the fact he was talking about the wrong part, and had misunderstood how frequently it would happen? Yeah :D

Fixed now though.
 
Geeforcer said:
Althornin said:
radar1200gs said:
BRiT said:
Interesting, if the 6800 truly has gamma-adjusted/corrected AA, the screenshots certainly don't show it.
Early drivers.
Huh.
You crucify ATI for problems in early drivers; where are your harsh words for nVidia?

Why would there be any harsh words regarding such problems in NV40 drivers when neither the drivers nor the hardware is actually available? I don't think anyone is bothered by any flaws in the early drivers for a card they don't have. If the drivers included with shipping cards have said problems, then the parallel would actually hold water.

Exactly. Before the Catalyst series, ATI drivers were harshly criticized b/c their official release and shipping drivers were very flawed. In this case, you can't criticize the 60.72 drivers b/c they're BETA and not official ones.
 
FUDie said:
DemoCoder said:
http://www.beyond3d.com/forum/viewtopic.php?p=253502&highlight=#253502
It's not implemented yet, but it can be, quite easily actually on the NV40.
NVIDIA also claimed gamma corrected AA for the NV30 and we all know how that worked out...

-FUDie

One of my pet peeves is when IHVs talk up a hardware feature and then lamely add that "we haven't exposed it in our drivers yet." (This is not merely a criticism of nVidia, either; I include all of the IHVs who do it.) If a feature is unexposed in the drivers, then from the standpoint of the user it might as well not exist within the chip at all, since it provides no benefit. An end user also has no way of knowing whether the support is actually present but unexposed, or whether it isn't supported in the hardware at all and the IHV simply doesn't want to admit it. Practically speaking, such unexposed features are the equivalent of not being supported in the hardware to begin with (until such time as they are "exposed," of course, if ever).

I recall an instance years ago where the TNT was contrasted with the 3dfx V3 (or it might have been the V2; it's been a long time), and one of the features 3dfx supported that the TNT did not was 8-bit palettized textures. nVidia kept telling everyone that TNT supported them, too, but that the feature was "unexposed" in the drivers. Several months after the TNT2 began shipping, it was finally revealed that "Surprise! TNT doesn't really support 8-bit palettized textures after all," and after that came out nVidia embarked on a campaign of telling everyone that they'd never said it was "present but unexposed" in the first place, and that everyone who thought otherwise must've hallucinated the whole thing...;) If anyone else can remember this and I've remembered incorrectly, by all means chime in and let me know...;)

But generally, until it is "exposed in the drivers," I really don't want to hear about it.

Edit: I'd also add that I chimed in with my comments on the Tamasi interview on the TR comment forums, and can think of nothing further to add here...;)
 
DemoCoder said:
Chalnoth said:
Ardrid said:
Hey guys, TR's got an interview up with Tony Tamasi and it seems that the 6800 Ultra has gamma-corrected AA or at the very least "gamma adjusted"

http://www.techreport.com/etc/2004q2/tamasi/index.x?pg=1
Notice that it did state that a shader pass would be required. Therefore don't expect the ability to force gamma correction in the driver, and if it is forceable, expect some performance hit.

At least it would be configurable, though.

That's a pass on a screen-sized quad, and the performance hit should be negligible (back in the old days, chips would do another pass in order to do the final downsample; minor fillrate usage, mostly a bandwidth saver). I'm not sure the NV40 requires an extra pass tho. :)
Didn't this come up in an earlier discussion? To do this they would have to execute a pixel shader on the multisample buffer, which is not in a displayable format since it's compressed and each pixel could have multiple color values associated with it. The uncompressed version of this buffer would be larger than the screen size by an arbitrary amount. Even if they can manage that, using a pixel shader to do the gamma correction guarantees that an extra pass will be required, because all the polygons in the scene would have to be rendered before running it. In fact, wouldn't you have to de-gamma correct the shader output first, then do the AA resolve, then re-gamma correct for the final output?

If they are planning to implement this, I can see it causing a lot of issues with existing apps, which might explain why it's not enabled in their drivers yet.
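
For concreteness, here's roughly what the resolve being described amounts to, per colour channel; a sketch of my own, assuming a plain pow-2.2 gamma curve rather than whatever transfer curve the hardware actually implements:

[code]
#include <math.h>
#include <stdio.h>

#define GAMMA 2.2f

/* Gamma-correct resolve: de-gamma each sub-sample to linear light,
   average in linear space, then re-gamma for the display. */
static float resolve_gamma_correct(const float *samples, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += powf(samples[i], GAMMA);
    return powf(sum / n, 1.0f / GAMMA);
}

/* Naive resolve: average the gamma-encoded values directly, which
   makes antialiased edges come out too dark on a real display. */
static float resolve_naive(const float *samples, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}

int main(void)
{
    /* A 4x pixel straddling a white/black edge, two sub-samples each. */
    const float s[4] = { 1.0f, 1.0f, 0.0f, 0.0f };
    printf("naive resolve:         %.3f\n", resolve_naive(s, 4));         /* 0.500 */
    printf("gamma-correct resolve: %.3f\n", resolve_gamma_correct(s, 4)); /* ~0.730 */
    return 0;
}
[/code]

Whether that maths runs as a shader pass over a screen-sized quad, at scan-out, or folded into the downsample hardware is exactly the question being argued here.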
 
Nope, nope, and nope. No, re-rendering the geometry is not required. No, the MSAA "format" of the buffer isn't an issue on the NV40. And no, you don't necessarily have to "degamma" first.
 