Futuremark's technical response

Why wouldn't FX be able to provide acceptable speed with ps_1_4 with WHQL drivers?

I don't know...ask nVidia?

Because so far, the only benchmarks on WHQL drivers show poor performance (relative to expectations) in anything but Ps 1.1.

I think it's quite obvious that NVidia optimised the OpenGL NV30 path first (since they are running all their demos on this path), while all the other paths were nothing more than working.

That is certainly a possibility. Good thing ATI's customers don't have to worry as much about optimizing for custom paths.....
 
Thanks Joe and Hyp-X. :)

Does the R300 have separate integer and floating-point hardware for shaders 1.4 and 2.0 too? If that is the case, then my previous conception is wrong - that PS2.0 simply runs PS1.4 shaders through hardware emulation. Why can't the integer values be converted to floats and handled in the PS2.0 hardware?

Or why can't the NV30 convert PS1.x to floats and carry them out in its floating-point pipeline?

One last thing from before I'm not too clear on: NV30 supports 32-bit IEEE through its entire pipeline but R300 does not. I remember the R300 has some parts which support 32-bit while others are capped at 24-bit. Which parts are which? How much of an advantage is it to have IEEE throughout the entire pipe? Kirk made a point that because 32-bit is everywhere, you can mix and match ops on CPU / GPU without losing any precision. Hence the NV30 is a really powerful floating point processor. Your opinion?

Thanks again.
-AP.
 
Hyp-X probably knows more of the specifics than I do, but here's what I know, in "layman's" terms....because "in layman's terms" is really the only way I know it. ;) :

Unlike NV30, R300 does NOT have "separate" hardware for integer and floating point paths. Simplistically, everything (PS 1.1, PS 1.4, PS2.0) goes through the floating point paths "at the same speed." You can look at this one of two ways: there's no "performance penalty" for going to higher precision....or there's no "performance benefit" for sticking with a lower precision.

Or why can't the NV30 convert PS1.x to floats and carry them out in its floating-point pipeline?

It might physically be able to, but according to nVidia, in the FX architecture the floating point pipeline is inherently slower than the integer pipeline. (Furthermore, the FP32 pipeline is slower than the FP16 pipeline.) So unless you "need" floating point precision, you want to stick with the integer pipeline.
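
To put some code to that: in DX9 HLSL you can ask for the cheaper path by using half instead of float, which (as I understand it) the compiler turns into partial-precision (_pp) instructions. A rough sketch, with the sampler and function names being my own placeholders:

sampler baseMap;                        // whatever texture you're modulating

half4 PixelMain(float2 uv : TEXCOORD0, half4 diffuse : COLOR0) : COLOR
{
    half4 base = tex2D(baseMap, uv);    // FP16 is plenty for a colour fetch
    return base * diffuse;              // stored at partial precision, so the
                                        // NV30 can use its faster FP16 path
}

On the R300 the _pp hint is simply ignored (everything goes through the one floating point pipeline anyway), which is the "no penalty / no benefit" situation described above.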
 
Hyp-X said:
demalion said:
I thought I understood this, but others seem to say otherwise and I could do with some confirmation or correction: is floating point output a specification of the shader version, or the DirectX version?

Neither.

I.e., is there such a thing as a "pixel shader 1.1 program working on floating point values", or does producing/accepting floating point values automatically make it "pixel shader 2.0", regardless of the instructions used?

There is such a thing as a "PS1.1 program working on FP values", but it's neither specified nor denied by DirectX. The R300 does this (there are no integer shaders in R300).

Ah, OK, thanks...I thought DX 8.1 would have an upper limit, but I guess that wouldn't be future looking, would it?

DirectX requires that PS1.1 handle at least 8-bit precision and a range of [-1, 1].
The driver can indicate the highest absolute value it supports in 1.x shaders through the devcaps:

GF3/4: 1.0
R8500/9000: 8.0
GFFX: 2.0
R9500/9700: 340282346638528860000000000000000000000.0

So even DX 8.1 could expose pixel shaders with floating point capability.
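
For reference, reading that cap is about as simple as it gets - a rough DX8 sketch from memory (I believe the field is MaxPixelShaderValue in D3DCAPS8, and pD3D is your IDirect3D8 interface; treat the details as approximate):

D3DCAPS8 caps;
pD3D->GetDeviceCaps(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, &caps);

// Largest absolute value the hardware guarantees for PS 1.x registers:
// 1.0 on GF3/4, 8.0 on R8500, ~3.4e38 (full float range) on R9500/9700.
float maxPsValue = caps.MaxPixelShaderValue;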

The language used in, for example, Futuremark's discussion of GT 4, led me to believe that "pixel shader version" is determined by the instruction set and length, and it simply makes sense to me that it would be this way.

The pixel shader version has to be explicitly specified in the shader source.
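
In the assembly case it's literally the first token of the shader - e.g. a trivial made-up PS 1.1 shader:

ps.1.1              // version declaration: fixes which instructions and
                    // register limits the assembler will accept
tex t0              // sample texture stage 0
mul r0, t0, v0      // modulate by the interpolated diffuse colour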

OK, so the instruction set and length constraints have to be checked when authoring. I guess we'll have to wait until there's automatic multipass fallback for anything else to make sense?

For HLSL the version has to be passed to the compiler.
I understand this, but...
The HLSL compiler has no knowledge of the hardware apart from the PS version passed to it!
I had thought the compiler would use devcaps as well for checking whether values were legal, and whether it could compile for "extended" 2.0 functionality? I had thought 2.0 "extended" was a devcap bit that was set...are you saying it would have to be something specified in the source?

Oh, and for confirmation, precompiling shaders by the application when it is run, and storing the results for later execution, is a practical method of usage for DX 9 HLSL, correct?

Your comment about statically linking the compiler is worrying...you have to patch the application to update the compiler, you can't just patch DX runtime and have the application benefit? Why is this necessitated?
 
demalion said:
I had thought the compiler would use devcaps as well for checking whether values were legal, and whether it could compile for "extended" 2.0 functionality? I had thought 2.0 "extended" was a devcap bit that was set...are you saying it would have to be something specified in the source?
DX 9 HLSL does not yet support extended ps_2_0. Cg takes extended shader model 2.0 caps as input.
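
For what it's worth, the "extended 2.0" information does exist at the API level - D3DCAPS9 has a PS20Caps block with per-feature flags and instruction counts rather than a single version bit. A rough sketch, names from memory so double-check them:

D3DCAPS9 caps;
pDevice->GetDeviceCaps(&caps);

// ps_2_x features are reported individually, not as one "extended" bit:
BOOL predication = (caps.PS20Caps.Caps & D3DPS20CAPS_PREDICATION) != 0;
int  slots       = caps.PS20Caps.NumInstructionSlots;   // 96 on a plain ps_2_0 part

The DX9 HLSL compiler just doesn't consume any of it yet.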

demalion said:
Oh, and for confirmation, precompiling shaders by the application when it is run, and storing the results for later execution, is a practical method of usage for DX 9 HLSL, correct?
As there aren't many (any?) benefits to compiling at load time (runtime compilation) - yes.
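
In practice that just means keeping the token stream the compiler hands back - something along these lines (a rough sketch with my own file and entry point names, error handling omitted):

ID3DXBuffer* pCode = NULL;
D3DXCompileShaderFromFile("water.psh", NULL, NULL, "main", "ps_2_0",
                          0, &pCode, NULL, NULL);

// The buffer is just DWORD tokens - write it out once and reload it on
// later runs instead of recompiling.
FILE* f = fopen("water.pso", "wb");
fwrite(pCode->GetBufferPointer(), 1, pCode->GetBufferSize(), f);
fclose(f);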

demalion said:
Your comment about statically linking the compiler is worrying...you have to patch the application to update the compiler, you can't just patch DX runtime and have the application benefit? Why is this necessitated?
The compiler is placed in D3DX, which is placed in a lib. This means that when you build your exe, the compiler gets embedded in your exe file. The compiler isn't part of the runtime...
 
Joe DeFuria said:
Because so far, the only benchmarks on WHQL drivers show poor performance (relative to expectations) in anything but Ps 1.1.
There are 42.xx series WHQL drivers out there?
 
There are 42.xx series WHQL drivers out there?

Hmmm...I assumed that someone had at least tested the FX with WHQL candidate drivers? (Or any drivers available to the public from nVidia, for that matter?)
 
Every driver NVidia makes is a WHQL candidate... The last driver available to the public from NVidia doesn't even support the GeForce FX. I'm not aware of any 42.xx WHQL driver; if any of you are, please let me know.
 
MDolenc said:
demalion said:
I had thought the compiler would use devcaps as well for checking whether values were legal, and whether it could compile for "extended" 2.0 functionality? I had thought 2.0 "extended" was a devcap bit that was set...are you saying it would have to be something specified in the source?
DX 9 HLSL does not yet support extended ps_2_0. Cg takes extended shader model 2.0 caps as input.

Yes, this was on my mind; I should have said "was a devcap bit that would be set". Hence my concern about...

demalion said:
Your comment about statically linking the compiler is worrying...you have to patch the application to update the compiler, you can't just patch DX runtime and have the application benefit? Why is this necessitated?
The compiler is placed in D3DX, which is placed in a lib. This means that when you build your exe, the compiler gets embedded in your exe file. The compiler isn't part of the runtime...

Ack! Well, game developers can update their SDKs easily enough, I guess, but I still am not clear on why this has to be the case...does something in the communication model of DX or the OS make using a dynamic library undesirable for this? As it stands, it sounds to me like the "bloatware concept of software design", but if you could offer some guesses (or definitive answers if you have them, of course), I'd appreciate it.
 
demalion said:
Ack! Well, game developers can update their SDKs easily enough, I guess, but I still am not clear on why this has to be the case...does something in the communication model of DX or the OS make using a dynamic library undesirable for this? As it stands, it sounds to me like the "bloatware concept of software design", but if you could offer some guesses (or definitive answers if you have them, of course), I'd appreciate it.
If it were placed in D3D immediate mode, drivers would have to take care of compilation, not Microsoft. Both the HLSL and ASM compilers are part of D3DX (which gets statically linked to your application); D3D immediate mode only accepts DWORD tokens.
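
So the flow ends up looking something like this (my own rough sketch, DX9 names from memory; hlslSource and "main" are just placeholders): the statically linked D3DX code does the compiling, and the runtime/driver only ever sees the tokens:

// Step 1: D3DX, compiled into your exe, turns HLSL source into DWORD tokens.
ID3DXBuffer* pTokens = NULL;
D3DXCompileShader(hlslSource, strlen(hlslSource), NULL, NULL,
                  "main", "ps_2_0", 0, &pTokens, NULL, NULL);

// Step 2: only the token stream crosses into D3D immediate mode / the driver.
IDirect3DPixelShader9* pShader = NULL;
pDevice->CreatePixelShader((const DWORD*)pTokens->GetBufferPointer(), &pShader);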
 
Looks like this got overlooked amongst the arguing, but here's S3's opinion on 3DMark (from http://www.savage2k.com/):

Since its release, many have been complaining that the widespread use of synthetic benchmarks such as 3Dmark is hurting the industry by making the graphics companies focus on high benchmark numbers and not focusing on the games. What is S3’s stance on this?

Young: I was first in line for complaining about our whole net worth as a company coming down to one benchmark number. Unfortunately, this is the way human beings are. They tend to want to apply a linearly variable number that associates value to things. i.e. horsepower, carat weight, square footage and so on.

Of course we want other measures, games being one of them. Unfortunately, the bulk of our OEM customers do not have the time to really understand all the games out there and they tend to want to focus on a minimal linear measurement such as benchmarks. This is where the design wins are so we have to play the game. (no pun!)

This is really a bad predicament these companies have got themselves into. It's like they're caught in a loop: people complain that they need to focus more time on games, yet buyers use benchmarks, mostly 3DMark, to determine whether they want to buy the product. So in order to keep people buying their products, they're forced to focus time on maximizing benchmarks.
 