WaltC said:
Right, it's not just DX9 that's slow at full precision on nV3x, it's ARB2, as well.
"ARB2" is not a specification; it's John Carmack's name for a particular rendering path in the Doom3 engine. Unless you thought we were talking about doing DCC preview and production quality offline rendering using the Doom3 engine--replacing Maya with machinima, if you will--"ARB2" has no place in this discussion.
Perhaps you meant to say "ARB_fragment_program," the ARB's non-proprietary extension that serves as the OpenGL counterpart to DX9's PS 2.0. Of course, since digital content creation--whether hardware acceleration is used in the preview or the final render stage--is output as still, film, or video frames rather than distributed as code to run on the end-user's GPU, vendor neutrality hardly matters there; the more relevant specification would be NV_fragment_program, Nvidia's proprietary extension, which exposes the full capabilities of the NV3x fragment pipeline rather than just the common subset.
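In case it helps make the distinction concrete: "ARB_fragment_program" is just an assembly-level interface you hand to the driver yourself, not a Doom3 code path. Here's a minimal sketch of loading one--GLEW and GLUT are my own assumptions, purely to get a context and the extension entry points, and the one-instruction modulate shader is only for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <GL/glew.h>
    #include <GL/glut.h>

    /* A trivial fragment program: sample texture unit 0 and modulate it by
       the interpolated vertex color -- roughly a one-instruction PS 2.0
       shader. */
    static const char *fp_src =
        "!!ARBfp1.0\n"
        "TEMP texel;\n"
        "TEX texel, fragment.texcoord[0], texture[0], 2D;\n"
        "MUL result.color, texel, fragment.color;\n"
        "END\n";

    int main(int argc, char **argv)
    {
        GLuint prog;

        /* GLUT + GLEW are assumptions: any GL context and extension loader
           would do just as well. */
        glutInit(&argc, argv);
        glutCreateWindow("ARB_fragment_program sketch");
        glewInit();

        if (!GLEW_ARB_fragment_program) {
            fprintf(stderr, "ARB_fragment_program not supported\n");
            return 1;
        }

        /* Create, bind, and assemble the program string. */
        glGenProgramsARB(1, &prog);
        glBindProgramARB(GL_FRAGMENT_PROGRAM_ARB, prog);
        glProgramStringARB(GL_FRAGMENT_PROGRAM_ARB, GL_PROGRAM_FORMAT_ASCII_ARB,
                           (GLsizei)strlen(fp_src), fp_src);

        if (glGetError() != GL_NO_ERROR) {
            fprintf(stderr, "program failed to assemble: %s\n",
                    (const char *)glGetString(GL_PROGRAM_ERROR_STRING_ARB));
            return 1;
        }

        glEnable(GL_FRAGMENT_PROGRAM_ARB);  /* subsequent draws run the program */
        printf("fragment program loaded\n");
        return 0;
    }

NV_fragment_program gives you the same sort of program text, but with NV3x's longer limits and per-instruction precision control--which is exactly why it's the more relevant target when the output is frames rather than redistributable code.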
Yes, we all know it's still slow in full precision.
Here's a tip: perhaps what you meant to say was "NV3x is not just slow at full precision in DX9, but also in OpenGL." Of course this was not in dispute, so long as you measure "slow" in reference to realtime framerates--which, by the way, we're not doing here.
The point for me is that the idea that nV3x was designed for "workstations" and not "3d gaming" is simply void, as it suffers from the same problems in workstation usage--it's slow at full precision there, too.
Compared to R3x0 at FP24, yes. Compared to offline rendering costing a couple orders of magnitude more, it is extremely fast (at the subset of functionality it can provide). Compared to R3x0 at FP32...oh yeah...
What you don't seem to understand is that graphics performance is only "slow" or "fast" in reference to a target framerate that depends crucially on the task being done. Just as it's utterly irrelevant whether a graphics card can push 200 or 400 fps in Q3, it's close to irrelevant whether a graphics card can push 2 or 4 fps when used to preview a shader effect in Maya, or to accelerate offline rendering that would take many seconds per frame in software.
Let's invent two hypothetical cards, A and B, and two shader workloads--one simple (simple for offline rendering, that is; still far too complex for realtime), one less so--that Bob the Special Effects Guy wants to preview in his effects-editing package before sending the frame to the render farm for a full render that will take minutes or hours. Let's say card A can render the first effect at 2 fps, but can't accelerate the second effect at all--because it has a limit on the number of instructions in a shader, or because the shader needs some functionality, like branching, that card A doesn't support--meaning Bob has to fall back to a software preview that takes, say, two minutes per frame. Now let's say card B can render the first effect at 0.5 fps, and the second effect at 0.1 fps. Which card would Bob rather have?
Oh, almost forgot to mention: card A renders at a different internal precision from the final render on the render farm, so there's a greater chance the effect as previewed won't look quite like it does as rendered. While the preview render from card B won't match the final render exactly--that's why it's a preview, after all--at least it won't have the precision issue. Now, which card would Bob rather have?
Obviously card B, even though on a common workload card A is four times faster. That 4x performance isn't as important as the other stuff.
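To put rough numbers on it (purely back-of-the-envelope arithmetic using the made-up figures above; nothing here is a real benchmark):

    #include <stdio.h>

    int main(void)
    {
        /* Made-up figures from the hypothetical above. */
        double card_a_fx1_fps = 2.0;          /* card A, simple effect            */
        double card_b_fx1_fps = 0.5;          /* card B, simple effect            */
        double card_b_fx2_fps = 0.1;          /* card B, complex effect           */
        double software_fx2_seconds = 120.0;  /* card A's fallback: software preview */

        printf("Effect 1 preview: A = %.1f s/frame, B = %.1f s/frame\n",
               1.0 / card_a_fx1_fps, 1.0 / card_b_fx1_fps);
        printf("Effect 2 preview: A = %.0f s/frame (software), B = %.0f s/frame\n",
               software_fx2_seconds, 1.0 / card_b_fx2_fps);
        return 0;
    }

Half a second versus two seconds per preview frame is noise in Bob's day; two minutes versus ten seconds is not--and that's before the precision mismatch enters the picture.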
I should point out that this hypothetical doesn't necessarily capture the pros and cons of, say, NV35 vs. R350 for this market. Among other things, ATI seems to have better toolkit support: Ashli integrates with the major rendering packages, and it neutralizes most of the cases where NV35's more permissive shader-program limits might be expected to let it run a wider class of shaders, because Ashli can automatically split long shaders across multiple passes.
But it isn't meant to capture today's market dynamics; it's meant to capture the design considerations Nvidia may have had in mind when designing NV3x. In particular, it shows why there's a point to building a very flexible fragment-shader pipeline even when it so lacks in performance that it could never come close to its limits while maintaining realtime framerates.
OK, I went back a second time and reread it, and still didn't see one word in it about nV3x and DX9 (or the same functionality exposed in OpenGL).
Howbout the words "dependent texture reads and floating point pixels," which (along with longer shader program lengths--not explicitly mentioned, but an obvious third factor in what he's talking about) make for a pretty darn good definition of DX9 functionality?
In fact, this sentence:
John Carmack said:
...The current generation of cards do not have the necessary flexibility, but cards released before the end of the year will be able to do floating point calculations, which is the last gating factor....
...leads me to believe that this was June 27, 2002
Nice piece of sleuthing, Einstein. Whereas for me it was this sentence that led me to the same conclusion:
John Carmack said:
by John Carmack (101025) on 09:51 PM June 27th, 2002
Different strokes for different folks, I guess.
In fact, Carmack simply seems to be discussing, in general, the trend of 3d-chips overtaking software renderers, which has been in progress ever since the V1 rolled out.
No, he's quite explicitly discussing how VS/PS 2.0(+) functionality would, in the near future, lead to cards based on consumer GPUs replacing the low end of offline software rendering for production frames, particularly for TV. And he notes that he's recently been briefed on how "some companies have done some very smart things" by emphasizing this possibility in their upcoming generation of products rather than further down the road, as JC had previously assumed.
So re-reading Carmack's very general statement here doesn't provide me with how you reached your ideas about nV3x being "special" in this regard, in comparison with R3x0--which is headed in the same direction.
First off, your original snide-ass comment wasn't that R3x0 was just as well-positioned as NV3x for offline rendering of production frames, but rather that anyone who thought taking over some of the low-end offline rendering market was a design goal for current-generation DX9 cards was worthy of ridicule. Carmack's post indicates either that you're wrong, or that he's not only just such an idiot but also managed to misconstrue what ATI, Nvidia, or both had told him, so as to mistakenly assert that one or both of them was actually pursuing that goal when obviously they weren't.
Having said that, while it's true the post doesn't specifically identify Nvidia as hastening this push any more than ATI, some context makes it clear that Nvidia is the more likely referent. In particular, while the R300 hadn't been officially launched, it had already had its debut about a month prior--courtesy of Carmack himself, running Doom3 at id's E3 booth. Meanwhile, this was only a month or so before SIGGRAPH 2002, where Nvidia showily launched "Cinematic Computing" and Cg; it seems very likely that Carmack would have recently received his preview of what they were set to announce there.
Of course ATI also launched RenderMonkey at the same conference, so perhaps Carmack really was referring to both of them in this comment. Still, I don't think it's really debatable which of the two focused more strongly on pushing its new-generation cards for previewing shaders in offline content creation, and which came much closer to pushing for its cards to be used to take production frames away from software renderers.