Chalnoth said:
demalion said:
It seems simple: fp16 and fp32 was a good decision; fx12, fp16, and fp32 was not. To me, it seems illogical to simultaneously propose that "FX12 was necessary and not wasteful" and that "the NV35 is able to improve a similar design significantly", and that position isn't even consistent with what nVidia themselves have recognized.
Well, going for higher-speed FP16 is obviously better, but that does not mean FX12 was necessarily bad, either.
I didn't say FX12 was bad, I said it was wasteful. Hence my use of the NV35, and its relatively small transistor count increase, as an illustration of this.
Still, Microsoft is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX.
I'm sorry, but that seems indistinguishable from rampant and nonsensical bias. Feel free to provide some reasoning for the statement that will give me some reason to think otherwise.
Let's try this statement...can you say it, mean it, and have it sink in before the NV30-NV34 fade from your memory?:
"nVidia is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX".
Since nVidia designed the NV30-34, and made their performance depend on falling below the DX 9 PS spec, why is Microsoft to blame and not nVidia? They can perform well for PS 1.1 through PS 1.4 within their limitations, except when you compare them to their competitors.
That comparison is nVidia's fault; Microsoft just lets that weakness be exposed. Well, Microsoft and every other cross-vendor standard.
The NV31-34 are just worse cases of the NV30.
Since I really don't have much information on exactly what kinds of shaders, and how many of the shaders used in real games, will need what sorts of precision, I can't give an accurate picture as to whether FX12 was a good decision or not.
That's because you're good at turning a blind eye to what you don't want to see. FX12 dependency is never better in and of itself; it is only worse...the only benefit is from the tradeoffs it might allow you to avoid, and that is irrelevant until those tradeoffs are actually avoided and you gain something significant. If you can implement floating point in approximately the same space, or less, your FX12 implementation was wasteful. This has long been demonstrated to be the case for the NV30.
FP16 is just clearly better (for performance...it has a larger transistor count, which may not have been possible in the NV30's timeframe, esp. given the other development problems).
It was possible to do better than even fp16 before the NV30's timeframe. It was also possible, shortly afterwards, to implement floating point processing (fp32, AFAIK, though with the severe register limitations) with the same functionality in just slightly more space. To me, this clearly shows that the NV30 itself was wasteful. Considering both factors, and not selectively ignoring one at a time, how can you say this is not demonstrated?
All that I do know is general information on what precision is needed where, and common sense from this knowledge tells me that it will be rare to need FP32 throughout most shaders.
You do realize that this in no way validates FX12, independently of your having a preference for it and labelling it "common sense", and that it still leaves fp24 better than fp16?
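For rough context on what these formats actually buy you, here's a quick sketch, assuming FX12 is the s1.10 fixed-point layout and the float formats use the usual sign/exponent/mantissa splits (rounding and denormal behaviour are not modelled):

```python
# Rough precision/range comparison of the shader formats under discussion.
# Assumes FX12 is s1.10 fixed point (range ~[-2, 2)) and the float formats
# use the standard sign/exponent/mantissa splits.

formats = {
    "FX12": ("fixed", {"int_bits": 1, "frac_bits": 10}),  # s1.10 fixed point
    "FP16": ("float", {"exp_bits": 5,  "man_bits": 10}),  # s10e5 half
    "FP24": ("float", {"exp_bits": 7,  "man_bits": 16}),  # R3xx s16e7
    "FP32": ("float", {"exp_bits": 8,  "man_bits": 23}),  # IEEE single
}

for name, (kind, p) in formats.items():
    if kind == "fixed":
        step = 2.0 ** -p["frac_bits"]           # smallest representable step
        max_val = 2.0 ** p["int_bits"] - step   # largest value below +2
        print(f"{name}: range [-2, {max_val}], step {step}")
    else:
        bias = 2 ** (p["exp_bits"] - 1) - 1
        max_val = (2 - 2.0 ** -p["man_bits"]) * 2.0 ** bias  # largest finite value
        eps = 2.0 ** -p["man_bits"]                           # relative precision near 1.0
        print(f"{name}: max ~{max_val:.3g}, relative precision ~{eps:.3g}")
```

FX12 can't even represent values outside roughly [-2, 2), which is exactly why depending on it for performance is a limitation and not a feature.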
Why do you still persist in concentrating on the peak performance of the NV3x (NV35 in this case), ignoring the limitations affecting its ability to reach its peak, ignoring the peak performance of the R3xx (which, btw, is 16 ops per clock if you want to ignore limitations, since each of its 8 pipes can co-issue a vec3 op and a scalar op), and then concluding that "shader performance will still be higher than an 8 PS per clock architecture"?
The implication is that with enough optimization (hopefully available through an HLSL compiler eventually, if it's not there yet), performance close to that peak can be realized.
Yeah, but then you should either leave the consistent repetition of fallacious comparison to the R3xx out of your "implications", or give some basic recognition that the same factors would allow it to reach its peak as well.
The FX architecture is hard to write assembly for. Hopefully these compilers can help (DX9 HLSL and Cg now, GL2 HLSL later).
And the R3xx seems easier to write assembly for.
I think their respective designers are to blame for that. Why do you seem completely unable to accept that possibility and make it part of your working thought process?
As for the coincidence of vec3/scalar and texture/32FP ops, these will prevent the architecture from reaching peak performance,
Not really, it just can't take advantage of them to increase performance, except maybe by conservation of register usage. You just seem dedicated to avoiding, at every turn, any recognition that the R3xx can take advantage of them, by pretending they don't exist in your discussion.
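To put a rough number on what co-issue buys, here's a toy sketch (not real R3xx scheduling; the instruction mix is made up, and dependencies, which limit pairing in real shaders, are ignored):

```python
# Toy illustration of vec3/scalar co-issue (not real R3xx scheduling).
# With co-issue, a 3-component (vec3) ALU op and a scalar ALU op can share
# one issue slot, so a stream of N vec3 ops and M scalar ops needs roughly
# max(N, M) slots instead of N + M. The instruction mix below is made up.

vec3_ops   = ["mul rgb", "add rgb", "mad rgb"]   # 3-component colour math
scalar_ops = ["rcp a", "mul a"]                  # single-component math

without_coissue = len(vec3_ops) + len(scalar_ops)      # every op takes its own slot
with_coissue    = max(len(vec3_ops), len(scalar_ops))  # vec3/scalar pairs share a slot

print("issue slots without co-issue:", without_coissue)  # 5
print("issue slots with co-issue:   ", with_coissue)     # 3
```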
but if other DX9-level games are anything like DOOM3, they won't be enough to drop the optimized shader performance of the FX architecture below R3xx levels.
Doom 3 is a DX 8.1 featureset level game, not DX 9. It was designed with DX 7 in mind, and requires DX 8.1 functionality to do the least work to implement its full effect set. How things fall out after that can benefit speed and quality depending on what the rest of the hardware delivers, and the top of the food chain is occupied by the cards with good DX 9 level feature support...that limits NV3x discussion to the NV35, presumably using the ARB2 Doom 3 path.
Note the favorable Doom 3 "full" featureset speed/transistor ratio of the RV250. Note the nearest transistor count competitor of similar functional level, the NV34, performing poorly in comparison for all shader execution, with more transistors. Whose fault is that?
All these questions are very important and directly relevant for comparison, and they are questions you consistently ignore when you state "12 versus 8" in what seems to me to be a useless fashion...
All I can state is what I know.
Where was that statement of what you "know" in your post? That uninformative commentary around the "common sense" reference?
Real, solid info on the PS2-level shader performance capabilities of the NV3x in real games just isn't yet available.
Ah, the "real game" stipulation, because...none of the abundant "real PS 2.0 benchmark" performance comparisons are at all relevant to what "real
game PS 2.0" performance will look like?
How about the "real
game" performance comparisons of Doom 3 using the ARB2 path (which exposes DX 9 level functionality, as the NV30 path does not for the NV30-NV34) that John Carmack provided? How about all the "real PS 2.0 demos"? Is there a criteria for exclusion of these PS 2.0 factors, and other VS 2.0 factors for that matter, besides convenience?
All we have is conjecture.
Once you've finished turning your eyes from the inconvenient, sure.