WaltC said:
Right, it's not just DX9 that's slow at full precision on nV3x, it's ARB2, as well.
"ARB2" is not a specification; it's John Carmack's name for a particular rendering path in the Doom3 engine. Unless you thought we were talking about doing DCC preview and production quality offline rendering using the Doom3 engine--replacing Maya with machinima, if you will--"ARB2" has no place in this discussion.
Perhaps you meant to say "ARB_fragment_program," the ARB's non-proprietary extension that serves as the OpenGL counterpart to DX9's PS 2.0. Of course, since digital content creation--whether hardware acceleration is used in the preview or the final render stage--is output as still, film, or video frames rather than distributed as code to run on the end-user's GPU, vendor neutrality hardly matters there; the more relevant specification would be NV_fragment_program, Nvidia's proprietary extension, which exposes the full capabilities of the NV3x fragment pipeline rather than just the common subset.
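In case it helps make the distinction concrete: "ARB_fragment_program" is just an assembly-level interface you hand to the driver yourself, not a Doom3 code path. Here's a minimal sketch of loading one--GLEW and GLUT are my own assumptions, purely to get a context and the extension entry points, and the one-instruction modulate shader is only for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <GL/glew.h>
    #include <GL/glut.h>

    /* A trivial fragment program: sample texture unit 0 and modulate it by
       the interpolated vertex color -- roughly a one-instruction PS 2.0
       shader. */
    static const char *fp_src =
        "!!ARBfp1.0\n"
        "TEMP texel;\n"
        "TEX texel, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, texel, fragment.color;\n"
        "END\n";

    int main(int argc, char **argv)
    {
        GLuint prog;

        /* GLUT + GLEW are assumptions: any GL context and extension loader
           would do just as well. */
        glutInit(&argc, argv);
        glutCreateWindow("ARB_fragment_program sketch");
        glewInit();

        if (!GLEW_ARB_fragment_program) {
            fprintf(stderr, "ARB_fragment_program not supported\n");
            return 1;
        }

        /* Create, bind, and assemble the program string. */
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(fp_src), fp_src);

        if (glGetError() != GL_NO_ERROR) {
            fprintf(stderr, "program failed to assemble: %s\n",
                    (const char *)glGetString(GL_PROGRAM_ERROR_STRING_ARB));
            return 1;
        }

        glEnable(GL_FRAGMENT_PROGRAM_ARB);  /* subsequent draws run the program */
        printf("fragment program loaded\n");
        return 0;
    }

NV_fragment_program gives you the same sort of program text, but with NV3x's longer limits and per-instruction precision control--which is exactly why it's the more relevant target when the output is frames rather than redistributable code.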
Yes, we all know it's still slow in full precision.
Here's a tip: perhaps what you meant to say was "NV3x is not just slow at full precision in DX9, but also in OpenGL." Of course this was not in dispute, so long as you measure "slow" in reference to realtime framerates--which, by the way, we're not doing here.
The point for me is that the idea that nV3x was designed for "workstations" and not "3d gaming" is simply void, as it suffers from the same problems in workstation usage--it's slow at full precision there, too.
Compared to R3x0 at FP24, yes. Compared to offline rendering costing a couple orders of magnitude more, it is extremely fast (at the subset of functionality it can provide). Compared to R3x0 at FP32...oh yeah...
What you don't seem to understand is that graphics performance is only "slow" or "fast" in reference to a target framerate that depends crucially on the task being done. Just as it's utterly irrelevant whether a graphics card can push 200 or 400 fps in Q3, it's close to irrelevant whether a graphics card can push 2 or 4 fps when used to preview a shader effect in Maya, or to accelerate offline rendering that would take many seconds per frame in software.
Let's invent two hypothetical cards, A and B, and two shader workloads--one simple (simple for offline rendering, that is; still far too complex for realtime), one less so--that Bob the Special Effects Guy wants to preview in his effects-editing package before sending the frame to the render farm for a full render that will take minutes or hours. Let's say card A can render the first effect at 2 fps, but can't accelerate the second effect at all--because it has a limit on the number of instructions in a shader, or because the shader needs some functionality, like branching, that card A doesn't support--meaning Bob has to fall back to a software preview that takes, say, two minutes per frame. Now let's say card B can render the first effect at 0.5 fps, and the second effect at 0.1 fps. Which card would Bob rather have?
Oh, almost forgot to mention: card A renders at a different internal precision from the final render on the render farm, so there's a greater chance the effect as previewed won't look quite like it does as rendered. While the preview render from card B won't match the final render exactly--that's why it's a preview, after all--at least it won't have the precision issue. Now, which card would Bob rather have?
Obviously card B, even though on a common workload card A is four times faster. That 4x performance isn't as important as the other stuff.
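To put rough numbers on it (purely back-of-the-envelope arithmetic using the made-up figures above; nothing here is a real benchmark):

    #include <stdio.h>

    int main(void)
    {
        /* Made-up figures from the hypothetical above. */
        double card_a_fx1_fps = 2.0;          /* card A, simple effect            */
        double card_b_fx1_fps = 0.5;          /* card B, simple effect            */
        double card_b_fx2_fps = 0.1;          /* card B, complex effect           */
        double software_fx2_seconds = 120.0;  /* card A's fallback: software preview */

        printf("Effect 1 preview: A = %.1f s/frame, B = %.1f s/frame\n",
               1.0 / card_a_fx1_fps, 1.0 / card_b_fx1_fps);
        printf("Effect 2 preview: A = %.0f s/frame (software), B = %.0f s/frame\n",
               software_fx2_seconds, 1.0 / card_b_fx2_fps);
        return 0;
    }

Half a second versus two seconds per preview frame is noise in Bob's day; two minutes versus ten seconds is not--and that's before the precision mismatch enters the picture.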
I should point out that this hypothetical doesn't necessarily capture the pros and cons of, say, NV35 vs. R350 for this market. Among other things, ATI seems to have better toolkit support: Ashli integrates with the major rendering packages, and it neutralizes most of the cases where NV35's more permissive shader-program limits might be expected to let it run a wider class of shaders, because Ashli can automatically split long shaders across multiple passes.
But it isn't meant to capture today's market dynamics; it's meant to capture the design considerations Nvidia may have had in mind when designing NV3x. In particular, it shows why there's a point to building a very flexible fragment-shader pipeline even when it so lacks in performance that it could never come close to its limits while maintaining realtime framerates.
OK, I went back a second time and reread it, and still didn't see one word in it about nV3x and DX9 (or the same functionality exposed in OpenGL).
Howbout the words "dependent texture reads and floating point pixels," which (along with longer shader program lengths--not explicitly mentioned, but an obvious third factor in what he's talking about) make for a pretty darn good definition of DX9 functionality?
In fact, this sentence:
John Carmack said:
...The current generation of cards do not have the necessary flexibility, but cards released before the end of the year will be able to do floating point calculations, which is the last gating factor....
...leads me to believe that this was June 27, 2002
Nice piece of sleuthing, Einstein. Whereas for me it was this sentence that led me to the same conclusion:
John Carmack said:
by John Carmack (101025) on 09:51 PM June 27th, 2002
Different strokes for different folks, I guess.
In fact, Carmack simply seems to be discussing, in general, the trend of 3d-chips overtaking software renderers, which has been in progress ever since the V1 rolled out.
No, he's quite explicitly discussing how VS/PS 2.0(+) functionality would, in the near future, lead to cards based on consumer GPUs replacing the low end of offline software rendering for production frames, particularly for TV. And he notes that he's recently been briefed on how "some companies have done some very smart things" by emphasizing this possibility in their upcoming generation of products rather than further down the road, as JC had previously assumed.
So re-reading Carmack's very general statement here doesn't provide me with how you reached your ideas about nV3x being "special" in this regard, in comparison with R3x0--which is headed in the same direction.
First off, your original snide-ass comment wasn't that R3x0 was just as well-positioned as NV3x for offline rendering of production frames, but rather that anyone who thought taking over some of the low-end offline rendering market was a design goal for current-generation DX9 cards was worthy of ridicule. Carmack's post indicates either that you're wrong, or that he's not only just such an idiot but also managed to misconstrue what ATI, Nvidia, or both had told him, so as to mistakenly assert that one or both of them was actually pursuing that goal when obviously they weren't.
Having said that, while it's true the post doesn't specifically identify Nvidia as hastening this push any more than ATI, some context makes it clear that Nvidia is the more likely referent. In particular, while the R300 hadn't been officially launched, it had already had its debut about a month prior--courtesy of Carmack himself, running Doom3 at id's E3 booth. Meanwhile, this was only a month or so before SIGGRAPH 2002, where Nvidia showily launched "Cinematic Computing" and Cg; it seems very likely that Carmack would have recently received his preview of what they were set to announce there.
Of course ATI also launched RenderMonkey at the same conference, so perhaps Carmack really was referring to both of them in this comment. Still, I don't think it's really debatable which of the two focused more strongly on pushing its new-generation cards for previewing shaders in offline content creation, and which came much closer to pushing for its cards to be used to take production frames away from software renderers.