WinHEC slides give info on early spec of DX in Longhorn

Zeross said:
I asked Cass Everitt about this issue because I always thought that the GF3 had both a fixed function T&L unit and a VS unit, and that with the GF4 the T&L was replaced with another VS. Seems I was wrong, because he answered me this:
Zeross,
It essentially uses the same computational resources, but fixed-function does not use a stored program model on GeForce hardware.

Thanks -
Cass

Basically that means that there is no dedicated hardware for fixed function T&L, but on the other hand it's not an "emulation" like the R300 is doing.

Yeah, this is pretty much what I figured.
 
trinibwoy said:
Huh? I thought the software was backwards compatible with the hardware not the other way around :?:
It's both. From the nVidia driver download page (emphasis mine):
Part of the NVIDIA Forceware unified software environment (USE). The NVIDIA UDA guarantees forward and backward compatibility with software drivers. Simplifies upgrading to a new NVIDIA product because all NVIDIA products work with the same driver software.
 
anaqer said:
Isn't the "unification" done backwards only? :?: Last time I checked (GF4 era) you DID need driver support to run a given card.
While I didn't test it myself, I seem to remember people running the NV2x with NV1x drivers in order to use supersampling FSAA.

Anyway, it's definitely true that you're much more likely to run into bugs by using older drivers, since they were obviously never tested with the newer hardware, but the capability is there.
 
If T&L is achieved via "emulation" on nv boards then why doesn't T&L performance scale like the radeons do? Or at least, according to the data here:

X800
http://www.beyond3d.com/reviews/ati/r420_x800/index.php?p=15

6800
http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=19

Not sure there is going to be much difference from the different shader units from FX to 6800, since T&L is hardly likely to use the new capabilities of the 6800 shaders - plus, the vs 1.1 / 2.x performance of the 6800 is scaling in comparison to the FX, so if this was emulation then T&L would get the same boost purely from being an "optimized" shader program. Looks to me like it's not all emulated, if these numbers are correct.
 
whql said:
If T&L is achieved via "emulation" on nv boards then why doesn't T&L performance scale like the radeons do? Or at least, according to the data here:

X800
http://www.beyond3d.com/reviews/ati/r420_x800/index.php?p=15

6800
http://www.beyond3d.com/previews/nvidia/nv40/index.php?p=19

Not sure there is going to be much difference from the different shader units from FX to 6800, since T&L is hardly likely to use the new capabilities of the 6800 shaders - plus, the vs 1.1 / 2.x performance of the 6800 is scaling in comparison to the FX, so if this was emulation then T&L would get the same boost purely from being an "optimized" shader program. Looks to me like it's not all emulated, if these numbers are correct.

Up until the NV40, it's my understanding that there was special fixed function hardware on NV boards, but it was used to speed up only part of the process.
Basically, they had special hardware that would accelerate some of the fixed function lighting calculations.

It's also really hard to compare NV vertex shaders to ATI ones on short shaders: NVIDIA has (or at least had) some extra overhead per vertex. It's a fixed cost, so it skews the results significantly on shorter shaders.
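To make ERP's point about a fixed per-vertex cost concrete, here's a toy throughput model (Python, with invented cycle counts - purely illustrative, not measured hardware behaviour):

Code:
# Toy model: vertex rate = units * clock / (shader cycles + fixed overhead).
# All cycle counts below are invented for illustration only.
def vertex_rate(units, clock_hz, shader_cycles, overhead_cycles):
    return units * clock_hz / (shader_cycles + overhead_cycles)

clock = 475e6  # Hz, illustrative

# Short, T&L-style shader (4 instructions): 2 cycles of overhead costs ~33%.
short_hit = vertex_rate(3, clock, 4, 0) / vertex_rate(3, clock, 4, 2)
# Long shader (40 instructions): the same 2 cycles costs only ~5%.
long_hit = vertex_rate(3, clock, 40, 0) / vertex_rate(3, clock, 40, 2)
print(f"{short_hit:.2f}x penalty vs {long_hit:.2f}x penalty")  # 1.50x vs 1.05x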
 
whql said:
If T&L is achieved via "emulation" on nv boards then why doesn't T&L performance scale like the radeons do?
Once again:
Optimized shader program. Programs compiled on the fly aren't able to manage resources as efficiently, as many assumptions cannot be made. Fixed function is known, and thus optimized for.
 
Chalnoth said:
Optimized shader program. Programs compiled on the fly aren't able to manage resources as efficiently, as many assumptions cannot be made. Fixed function is known, and thus optimized for.

Errr, yes - I'm not saying it is or isn't an optimized shader program. An optimised shader program is just as much "T&L emulation" as a program compiled on the fly.

But nv40 clearly isn't scaling with its shader increase over nv30 in T&L performance - fixed function is known, so if both nv30 and nv40 were only using an optimized shader program, then there should be a similar increase in performance (as shown by the radeons).

Seems that ERP's already explained it: in some cases T&L is or was still using fixed function hardware.
 
Why must the NV40 have as much gain over the NV30 as the R420 has over the R300? They're different cores with different clock speeds, and the different tests will have different bottlenecks given the different performance characteristics. I see nothing here that requires the interpretation that there's fixed-function hardware.
 
Are you a bit dense?

Look at the previous pages of those articles - nv40 has 68% higher vertex rates than nv38, and this is demonstrated (and exceeded) in the vertex shader tests, but not in the T&L test, where it's only 26% faster than nv38. The comparison with ati was to show that T&L performance via "emulation" can scale with shader performance.

Seeing as the nv40 isn't scaling over nv3x in T&L performance, this tends to suggest that either nv3x alone, or both nv3x and nv40, had some fixed function elements.
 
I thought it was said NV3x had some fixed TnL hardware, which was removed for NV40 (in favor of ATi's fixed-on-programmable approach). This may account for the relatively smaller gain in TnL vs. VS performance.
 
Pete said:
I thought it was said NV3x had some fixed TnL hardware, which was removed for NV40 (in favor of ATi's fixed-on-programmable approach). This may account for the relatively smaller gain in TnL vs. VS performance.
The numbers support that, but Chalnoth was arguing against it for the GeForce FX as well as the NV40.
 
Seeing as the nv40 isn't scaling over nv3x in T&L performance, this tends to suggest that either nv3x alone, or both nv3x and nv40, had some fixed function elements.

I must be slow today, because I can't figure out where that conclusion comes from. He has a point when he mentions clock speeds.

NV30=500........R300=325
NV35=450........R350=380
NV38=475........R360=412
NV40=400........R420=475 (520 for XT)

What am I missing?

***edit: NV3x=3 VS, R3xx= 4 VS, NV4x/R4xx= 6 VS
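For what it's worth, here's a quick back-of-the-envelope check of those clock and VS-unit figures, under the simplifying assumption that peak vertex rate just scales with unit count times core clock (Python, illustrative only):

Code:
# Assumption: peak vertex rate ~ (number of VS units) x (core clock, MHz).
# Unit counts and clocks are the ones listed in the edit above.
def scaling(units_new, clk_new, units_old, clk_old):
    return units_new * clk_new / (units_old * clk_old) - 1.0

print(f"NV38 -> NV40 (3 VS @ 475 -> 6 VS @ 400):      {scaling(6, 400, 3, 475):.1%}")
print(f"9800XT -> X800 XT (4 VS @ 412 -> 6 VS @ 520): {scaling(6, 520, 4, 412):.1%}")
# Prints ~68.4% and ~89.3%, matching the theoretical Mtri/s deltas quoted in the
# next post - so the question is why the measured T&L delta doesn't follow suit.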
 
whql said:
Are you a bit dense?

Look at the previous pages of those articles - nv40 has 68% higher vertex rates than nv38, and this is demonstrated (and exceeded) in the vertex shader tests, but not in the T&L test, where it's only 26% faster than nv38. The comparison with ati was to show that T&L performance via "emulation" can scale with shader performance.

Seeing as the nv40 isn't scaling over nv3x in T&L performance, this tends to suggest that either nv3x alone, or both nv3x and nv40, had some fixed function elements.
No.

It suggests that due to the higher performance of the fixed-function test, some other factor is limiting performance.
 
Ailuros said:
What am I missing?
Look. From the beyond3d reviews -

Theoretical Rates:
5950: 190 Mtris/s, 6800U: 320 Mtris/s - the 6800U is 68.4% higher than the 5950

9800XT: 412 Mtris/s, X800 XT: 780 Mtris/s - the X800 XT is 89.3% higher than the 9800XT

Performance differences between respective boards

Code:
         nv       ATI
T&L      26.1%    87.0%
VS1.1    69.6%    87.0%
VS2.0    68.8%    82.6%
VS2.0    112.3%   86.4%
VS2.x    120.7%   n/a

Look, the vertex shader tests are in line with the theoretical rates (the branching ones for nvidia are higher, but nv40 is supposed to have better branching capabilities); however, the T&L test isn't scaling. This is unlike ati, who everyone seems to believe have always been "emulating" T&L, and whose T&L performance via emulation clearly does scale with improved vs performance.
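Spelling that comparison out (nothing new, just the arithmetic behind the numbers above):

Code:
# Theoretical peak deltas from the quoted Mtris/s figures.
print(f"nv theoretical (6800U vs 5950):      {320/190 - 1:.1%}")  # ~68.4%
print(f"ati theoretical (X800 XT vs 9800XT): {780/412 - 1:.1%}")  # ~89.3%

# From the measured table: NVIDIA's VS tests gain 68.8-120.7%, at or above the
# 68.4% theoretical figure, while the fixed-function T&L test gains only 26.1%.
# ATI's tests (82.6-87.0%) all sit close to the 89.3% theoretical figure.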
 
Chalnoth said:
It suggests that due to the higher performance of the fixed-function test, some other factor is limiting performance.
Like what?

Ever considered why it has higher performance in the first place, given that all the nv boards have lower theoretical rates and lower general vertex shader performance compared to ati? It suggests to me that, as ERP explains, one or the other had some fixed function hardware in there, which is boosting its performance relative to any other vertex shader usage.
 
Chalnoth said:
Optimized shader program. Programs compiled on the fly aren't able to manage resources as efficiently, as many assumptions cannot be made. Fixed function is known, and thus optimized for.

you're not planning on correcting that faulty statement of yours any day soon? this is wrong, at least for the gf3 stuff you stated above. very wrong.

you believe too much in optimized shaders. the normally used fixed function is very simple (transform only, lighting doesn't bother most, most games have lightmaps instead). for this, you can't optimise much, except go to a dedicated hw path.

they had the full t&l part split from vs in gf3/gf4 days. they possibly partially unified the resources on gfFX but i doubt it (i would have gotten informed, i guess).

they have now dropped fixed function, i think, but i'm not sure here.
 
davepermen said:
you believe too much in optimized shaders. the normally used fixed function is very simple (transform only, lighting doesn't bother most, most games have lightmaps instead). for this, you can't optimise much, except go to a dedicated hw path.
That's kind of the point. The simplicity means that many assumptions can be made, and it's easier to schedule instructions, to keep data flowing. Those same assumptions cannot be made when running a vertex shader, and thus the hardware has to be a bit more careful, and doesn't always know how to allocate resources.

Now, exactly how these "shaders" were run I don't know. They could well have been embedded into the hardware, but I seriously doubt any computational resources that were not made available in the shaders were ever used for fixed-function vertex processing.
 
radar1200gs said:
Also, given that the average consumer purchases a machine and uses it for 3 years before disposing of it or upgrading it significantly, would you say that a GF6800 based card or a Radeon series card would be a better choice for such a consumer?


IMHO, I don't think many peeps buy a high end card with a view to keeping it for three years - certainly not enthusiasts, because they know that the card will be totally outdated within 2. Ironically, it's low end cards that get used longer, but then that's often in machines for which intensive 3D performance is irrelevant and which, if bought today, are extremely unlikely to be upgraded to Longhorn anyway.

And like someone said, neither architecture is dx(10) compliant anyway, so it's really moot.
 
Chalnoth said:
davepermen said:
you believe too much in optimized shaders. the normally used fixed function is very simple (transform only, lighting doesn't bother most, most games have lightmaps instead). for this, you can't optimise much, except go to a dedicated hw path.
That's kind of the point. The simplicity means that many assumptions can be made, and it's easier to schedule instructions, to keep data flowing. Those same assumptions cannot be made when running a vertex shader, and thus the hardware has to be a bit more careful, and doesn't always know how to allocate resources.

Now, exactly how these "shaders" were run I don't know. They could well have been embedded into the hardware, but I seriously doubt any computational resources that were not made available in the shaders were ever used for fixed-function vertex processing.

the simplicity means simply one thing: it's just 4 dot products for the whole thing, plus some passthrough, most of the time. there cannot be any additional software-side optimisation. the ONLY thing that can be optimized is switching to some dedicated hw path. and you state they don't have one.

but btw, they at least did have one, for a long time. that's why they needed the position-invariant option in opengl shaders, to get identical output from the fixed function and shader vertex paths - because they have individual hw for both modes.
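As an aside, the "4 dot products" being referred to are just the classic clip-space transform; a minimal sketch in Python (illustrative only, nothing vendor-specific):

Code:
# Fixed-function vertex transform: clip-space position is four dot products of
# the rows of the combined modelview-projection (MVP) matrix with the vertex.
def dot4(a, b):
    return sum(x * y for x, y in zip(a, b))

def fixed_function_transform(mvp_rows, position):
    """mvp_rows: 4 rows of the MVP matrix; position: (x, y, z, w)."""
    return tuple(dot4(row, position) for row in mvp_rows)

# Example: an identity MVP passes the vertex straight through.
identity = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
print(fixed_function_transform(identity, (1.0, 2.0, 3.0, 1.0)))  # (1.0, 2.0, 3.0, 1.0)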
 