Nvidia and ARB2

Natoma said:
Question. Where exactly is the F-Buffer exposed? I read somewhere that it is not exposed in DX9 but could be in a 9.1 revision, and that you can use it in OGL, but only if the app is specifically coded for it. Can you clear up this confusion, please?
I don't know what the plans are for supporting the F-buffer. The best way to think of the F-buffer is as just another surface format. The application can allocate an F-buffer then use it just like any other render target.
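For illustration only, here is roughly what "just another surface format" means from the application's side. None of these names exist in any shipping API--the F-buffer was never exposed at the time of this thread, so the enum value and helper functions below are entirely hypothetical--but the programming model is simply: allocate a surface with a special format, then bind it like any other render target.

[code]
#include <cstdio>

// Hypothetical sketch -- no shipping API exposed the F-buffer, so every
// name here (FMT_FBUFFER, createRenderTarget, setRenderTarget) is invented
// for illustration. The point is only the programming model: allocate a
// surface of a special format, then bind it like any other render target.

enum SurfaceFormat {
    FMT_A8R8G8B8,       // ordinary 32-bit integer surface
    FMT_A32B32G32R32F,  // 128-bit floating-point surface
    FMT_FBUFFER         // the F-buffer, treated as just another format
};

struct Surface { SurfaceFormat fmt; int w, h; };  // stand-in for a driver handle

Surface createRenderTarget(int w, int h, SurfaceFormat fmt) { return {fmt, w, h}; }
void setRenderTarget(const Surface& s) { std::printf("bound %dx%d fmt %d\n", s.w, s.h, s.fmt); }

int main() {
    Surface back = createRenderTarget(1024, 768, FMT_A8R8G8B8);
    // Allocate the F-buffer exactly like any other render target...
    Surface fbuf = createRenderTarget(1024, 768, FMT_FBUFFER);
    // ...bind it while a long shader is multipassed through it...
    setRenderTarget(fbuf);
    // ...then resolve to the ordinary back buffer.
    setRenderTarget(back);
}
[/code]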
 
demalion said:
OK, I don't understand how this makes sense at all. To process "32-bit" graphics in the sense of the "128-bit" graphics we are discussing, CPUs and the rendering software would have to be doing calculations at 8-bit precision. Which software does that? "128-bit" is 32 bits per component, and there's no clear reason CPU calculation precision would multiply by 4 the way it does for GPUs.

Also, as I understand, there are 128-bit and 96-bit intermediate storage formats and implementations for "multipass like" steps as required for off-line rendering.

Correct me where I'm mistaken.

Actually, my old Amiga/Toaster renderers circa 1993...:)...rendered to 24 bits of precision with Lightwave--in software--and it was quite good for TV and cable. It could actually do even better, depending on the quality of the output media (Betamax, film, S-VHS single-frame recorders, etc.) and resolutions used--for instance the TV series "Babylon 5" began using nothing but the same kind of Amiga/Toaster render farms I used at the time--and the effects were pretty good, generally. The main advantage was the cost of a system like that was about 1/20 the cost of comparable "commercial" gear at the time--and you couldn't do nearly as much with the far more expensive commercial systems. Of course, like everything done in software--the higher the precision, the more time required to calculate and render the result. But it was 8-bits, RGB...:) (24-bits.)
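To make the "bitness" arithmetic in this thread concrete, here's a trivial sketch (plain C++, nothing from any graphics API): the headline figure is just bits per component times the number of components.

[code]
#include <cstdio>

// The "bitness" arithmetic: the headline number is just bits per component
// multiplied by the component count.
int main() {
    const int rgb  = 3;  // R, G, B
    const int rgba = 4;  // R, G, B, Alpha

    std::printf("8-bit integer x RGB  = %d-bit (Toaster-era output)\n", 8 * rgb);
    std::printf("fp16 x RGBA          = %d-bit\n", 16 * rgba);
    std::printf("fp24 x RGBA          = %d-bit (R3x0 full precision)\n", 24 * rgba);
    std::printf("fp32 x RGBA          = %d-bit (nV3x full precision)\n", 32 * rgba);
}
[/code]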

What doesn't make any sense to me is to discuss nV3x as an off-line renderer for this kind of work--as though its slow full-precision performance has to do with a particular design decision by nVidia in targeting the offline rendering market. It's fine for pre-production preview work, just like R3x0, but not for the kinds of special effects they do these days (as has been pointed out.)
 
Laa-Yosh said:
32 bit integer precision in an offline renderer?? You guys must be kidding... or else name this renderer ;)

I seriously doubt that any of the big 4 (Max, Maya, XSI, LW) would be using less than 64 bits per color - in fact, AFAIK LW uses 128 bits per color... Mental Ray and PRMan should be at least as good. Dammit, MR can be 100% physically accurate, which doesn't sound like integer precision to me.

Also, please note that apart from movie VFX studios, PRMan is quite rare in the industry because of its very high price (USD 5000 / CPU, AFAIK). Most of the 3D you see in game FMVs, commercials, documentaries, etc. is made using the built-in renderers of the "big 4".

But if you want to use Lightwave to calculate to 128-bit color accuracy in a ray-traced frame--how's 128-bit fp in a 3d chip going to help you do that? (It might be OK in a preview window--if you wanted to rotate an object while you create it--but why not just use flat shading or wireframe, which is much faster? I think most scene creators would use wireframe or flat shading when creating objects and doing pathing in a scene. I see zero advantage to nV3x over R3x0 in this regard.)

More to the point, distinction needs to be made between 3d and 2d. I don't think that's being done here...:)
 
Daliden said:
Well "excuse me very many" for not being a native speaker and therefore not knowing the correct terminology ;). With elaborate programs (not "elaborate shader instructions") I simply meant, for example, that nVidia supports branching in its shader programs, and that's not found from ATi, nor any other hardware that has shader support of some sort, now is it? That's where the "usual" came from. If by "sub-par" you mean "slow as hell", yeah, then it is sub-par. But you are also saying that it is actually not the case that you can write more elaborate (yes dammit, feel free to offer me a better word to use there) shader programs on nV3x? Of course they cannot be used in realtime games, and that should have been obvious to nVidia engineers from simulations. So, that really does beg the question "was there more behind choosing CineFX as the name than just empty marketing rhetoric?".

There are as many theories being circulated about the nV3x architecture (e.g., "non-traditional," or "zixels," or--you name it) as most of us have fingers and toes...:) What I'd like to know is the purpose for which you might like to write a "more elaborate" shader program of the type being hypothesized?...:) Is elaboration for elaboration's sake a good thing? I can see how it might be a slow thing, but as to being a "good" thing, I think that is YTD...:) (Yet to be determined.) I guess I'm just weary of hearing fanciful tales spun about an architecture no one really knows anything about simply in order to justify it (I don't think you are doing that, btw.)

And what's with the condescension? As if the "Pixar-like rendering" meme hasn't been around for years. Haven't read nVidia's PR about Dawn; watched it a couple of times once the ATi wrapper came out. Pretty enough, I guess, but it's basically just textures. I want light and shadows, dammit! :)

Actually, it was 3dfx which started the "Cinematic, Toy-story" brand of PR associated with 3d gaming back in 1999--and you are exactly right, it's as old as the hills. I guess I'm weary of hearing it...:)

I must admit that I don't see your logic here. I specifically mentioned "2-in-1" so that it could be said that nVidia designed nV3x to be both a 3D gaming chip and a 3D workstation chip. I mean, to me this seems the only rational explanation for the performance we're seeing. OK, so they guessed wrong, and nobody wanted to adopt Cg instead of RenderMan . . .

My logic is simple and factual--most of nVidia's chips in the last few years have been sold into the "professional" segment and into the "gaming" segment. nV3x is no departure. The only difference between them has been the driver and software packages included, as well as the price. ATi for instance sells R3x0 as FireGL (I believe)...and it is just as capable for the "professional" as is the nV3x--but it's much faster at full-precision 3d *games* than is nV3x. Ergo, it does not follow that a capable 3d "professional" chip *must be* slow at full-precision 3d gaming.

Some are, however, slow at 3d gaming--such as 3DLabs' offerings--but 3DLabs products are not sold into the 3d gaming markets, either. They are squarely aimed at the professional markets, as the company's pricing and marketing clearly indicate.

The rational explanation for nV3x's slow gaming full precision performance is abundantly clear--it's a poorly designed DX9/ARB2 full precision chip, in *comparison to* R3x0. We need not invent fanciful tales about "off-line" rendering to explain it.

Perhaps I should have used "2-on-1" instead, 2 chips on one board? ATi uses the same circuitry for everything, be it games or workstation use. But that would not exactly be the case with nV3x, would it? In gaming, the FP32 units would lie dormant, and much of the shader features remain unused. But in workstation use, it would be the FX12 and FP16 units that would be left aside (all this speculation still relates to the rosy world of nVidia's dream from a couple years back).

In professional 3d rendering work, most especially in ray tracing, the nV3x fp32 units would be useless for final output. You need the cpu for that. The best you could say is that, for software that supports it, you might use fp32 for *preview* work--but not for final output.

Uh, of course they aren't marketing it as a DX8 chip :) That would be nVidia from some parallel universe. But what has the current marketing to do with what the design intentions were back then?

Back when...as in back when they found out about fp24 in DX9? The problem is that nobody put a gun to nVidia's head and said "You must include fp32--or else," did they? It was an elective choice they made, just as ATi made its choices. Nothing prevented nVidia from designing an fp24 chip, except nVidia.

Secondly, if nVidia had to depend on the "professional workstation" market for its high-end 3d chips, they'd be out of business rather quickly, or wind up being bought by Creative Labs...:) Obviously, their intent was to design a chip for the 3d gaming markets which they could also market in a higher-priced package to the professional markets--just like they've always done with Quadro. There is nothing I can see that would possibly make me think anything different.

If we want to talk about marketing, let's talk about the marketing when the first FX cards were released. At least to me it seemed an awful lot like "the few cards we can get available are selling like hot cakes to Hollywood studios, where they are used to make movies," or something like that. It had this image of an all-powerful card that would be almost ridiculous to use for mere gaming. Of course, when that didn't pan out at all, now they market it for gaming as much as possible.

OK, I see your problem...:) You were blinded by the PR blitz about the "dawn of cinematic computing" that preceded the first nV30 product availability by several months (before it was cancelled)...:) Man, and they say PR doesn't work...Honestly, I never, ever got out of their PR the message that you did...:) What I got out of it was nV3x was supposed to bring "Hollywood" to 3d gaming. Heh...:) I was very skeptical then, and with much justification, it turns out.

One more thing: I'm not here advocating anything, I'm just standing on a soap box and trying to advance my own pet theory about what happened to nVidia :)

That's fine--except it's simply not the most obvious, simplest explanation. What's the reason nV3x looks so bad at DX9/ARB2 precision? It's R3x0, of course, and nothing else whatsoever. If not for R3x0 no one would be discussing why it was "so slow" in those areas, because it would be the "fastest" thing going. In short, if not for R3x0, this discussion would not be taking place...:)

If the R3x0 is not considered by some to be as good on a "professional" basis as nV3x, the reason would have nothing to do with fp32--at all--since you can't use it to do ray-tracing--Heh...:)--or any other of the incredibly resource-intensive things off-line special effect production work entails. The reason would have to do with specific nVidia-architecture optimizations code designers have built into their rendering software for nVidia OpenGL drivers. It certainly would not be because of the hardware.
 
WaltC,

AFAICS, you keep talking about the final output, and its 4 (or 3) component total "bitness", when CPU "bitness" used to process that output doesn't refer to the same thing at all. As has been said.

Consider the [url=http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MC68882]68882[/url] that was used with the 68020 and 68030 chips. That is the 32-bit floating-point processing (along with some 32-bit integer processing options for simpler rendering solutions than are currently used) that renderers use. But when processing a "shader" effect with color components, they apply it to each component individually. This is why "multimedia extensions" that let processors process multiple components simultaneously are a boon to "software renderers".

The significance of a GPU, even one that suffers from real-time performance issues, is that it excels even further at this specific task than the extensions offered as an addendum to CPUs.
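A small sketch of that last point, using SSE intrinsics (my example, not anything from the thread): one SIMD multiply applies the same operation to all four color components at once, where a plain FPU would grind through them one at a time. The brightness scale here is just a toy stand-in for a real shader step.

[code]
#include <xmmintrin.h>  // SSE intrinsics
#include <cstdio>

int main() {
    alignas(16) float color[4] = {0.2f, 0.4f, 0.6f, 1.0f};  // R, G, B, A

    __m128 c = _mm_load_ps(color);          // load all 4 components at once
    __m128 s = _mm_set1_ps(1.5f);           // broadcast a brightness factor
    _mm_store_ps(color, _mm_mul_ps(c, s));  // 4 multiplies in one instruction

    std::printf("R=%.2f G=%.2f B=%.2f A=%.2f\n",
                color[0], color[1], color[2], color[3]);
}
[/code]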
 
Dave H said:
"ARB2" is not a specification; it's John Carmack's name for a particular rendering path in the Doom3 engine. Unless you thought we were talking about doing DCC preview and production quality offline rendering using the Doom3 engine--replacing Maya with machinima, if you will--"ARB2" has no place in this discussion.

....
Yes, we all know it's still slow in full precision.

Here's a tip: perhaps what you meant to say was "NV3x is not just slow at full precision in DX9, but also in OpenGL." Of course this was not in dispute, so long as you measure "slow" in reference to realtime framerates--which, by the way, we're not doing here.

Heh...:) Kind of funny for you to point me to a Carmack post, and then tell me Carmack's definitions aren't relevant to the discussion. When Carmack says "ARB2" and points out that R3x0 is 2x as fast when rendering it, do we really have to have it translated that he's talking about "full precision OpenGL"?...*chuckle* I hope not...

Compared to R3x0 at FP24, yes. Compared to offline rendering costing a couple orders of magnitude more, it is extremely fast (at the subset of functionality it can provide). Compared to R3x0 at FP32...oh yeah...

What you don't understand is that in off-line rendering "costing a couple orders of magnitude more," fp32 is as much a subset of that as fp24 is. These things are fine for pre-production preview work--not for final output in which you want ray-tracing and/or lots of other things not possible with either R3x0 or nV3x.

What you don't seem to understand is that graphics performance is only "slow" or "fast" in reference to a target framerate that depends crucially on the task being done. Just as it's utterly irrelevant whether a graphics card can push 200 or 400fps in Q3, it's close to irrelevant whether a graphics card can push 2 or 4 fps when used to preview a shader effect in Maya, or to accelerate offline rendering that would take many seconds per frame in software....

What you don't seem to understand is that 3d chips designed for the gaming market are not meant to do ray-tracing--or many other things multi-million $ off-line special effects renderers are designed to do. It just doesn't matter at all that you could use "fp24" or "fp32" in off-line rendering, since neither is capable of doing all of the other things off-line renderers for movies/commercials/etc. are designed to do. Again, they are good for pre-production work "professionally" if your software supports them in that capacity--but essentially worthless for production-quality rendering. They aren't nearly what's needed for that--"fp32/24" being irrelevant.

...I should point out that this hypothetical doesn't necessarily capture the pros and cons of, say, NV35 vs. R350 for this market. Among other things, ATI seems to have better toolkit support, as Ashli integrates with the major rendering packages, and avoids most cases where NV35's more flexible shader program rules might be expected to allow it to run a wider class of shaders, because Ashli can automatically generate multipassed shaders.

What you seem to be doing, Dave, is taking an old Carmack post about how the current state of 3d chips are gaining ground in areas once the exclusive province of off-line renderers--a very general post about a general trend--and hypothesizing an industry which doesn't exist, an industry where 3d cards are used to do final, production-quality off-line rendering for TV and films. We're a long way from being there at this point.

But it isn't meant to capture today's market dynamics, but rather the design considerations Nvidia may have had when designing NV3x. It certainly demonstrates the point of having a very flexible fragment shader pipeline that is so lacking in performance that it could never come close to its limits while maintaining realtime framerates.

This is what I find bizarre about your reasoning--that it's *necessary* for a 3d card to be comparatively lousy in full-precision 3d gaming so that it can be a "great" off-line rendering product for the professional market. Again, current 3d cards cannot replace expensive off-line rendering solutions for a wide variety of reasons--fp32 or fp24 does not help overcome these "deficiencies" in any way whatsoever since you need a whole lot more than either for production-quality off-line rendering.

What is the chief value for fp32/24 as regards the market for nV3x/R3x0? Why, it's the 3d-gaming market! Gee...:D Who'd of thunk it? What a novel idea...!

...Howbout the words "dependent texture reads and floating point pixels," which (in addition to longer shader program lengths, which, while not explicitly mentioned are a third obvious factor for what he's talking about) is a pretty darn good definition of DX9 functionality?

Oh, golly--no other 3d chips support these things so it just *had* to be nV3x and DX9 that he was talking about June 27, 2002...! I think it's obvious that you read far more into his comments than he intended to ever say.


John Carmack said:
by John Carmack (101025) on 09:51 PM June 27th, 2002
Different strokes for different folks, I guess.

I didn't see the year of the post--but I'm glad I was right--as I knew I had to be since the only fp-capable 3d chip shipped in '02 was R3x0...:) Has it occurred to you that he could just as easily have been talking about R3x0...? I mean, he did choose to run D3 on it initially...In fact, it's clear from Carmack's post that he was discussing *general* things with regard to *general features* in all of the upcoming 3d chips.

It was a general post not talking about specific 3d architectures--at all. The interpretation you place on it seems to be of your own construction.


No, he's quite explicitly discussing how VS/PS 2.0(+) functionality would lead in the near future to cards based on consumer GPUs being used to replace low-end uses of offline software renderers in generating production frames, particularly for TV. And he notes that he's recently been briefed on how "some companies have done some very smart things" in terms of an emphasis on achieving this possibility with their upcoming generation of products instead of further down the road as JC had previously assumed.

How are you going to use a 3d-card to raytrace? I never did get that straight. Sigh...again...he says *nothing* about nV3x in particular. And remember as well, he was speaking one month ahead of R3x0 shipping and *several months* ahead of the nV30--which bombed and was withdrawn.

His comments are all very general and speculative...surely you note that his comments on "some companies have done some very smart things" would have had *nothing* to do with nV3x at the time, right? (since it was the better part of a year away at that time.) So, I mean I think we can conclude that there are several reasons why he wasn't talking about nV3x apart from the fact that he didn't mention it...:)

First off, your original snide-ass comment wasn't that R3x0 was equally well-positioned for offline rendering of production frames as NV3x, but rather that anyone who thought taking over some of the low-end offline rendering market was a design goal for current-generation DX9 cards was worthy of ridicule.

No, you only got part of my statement correct. I stated that anyone who might think such a "design goal" was the *reason* nV3x was such a comparatively poor full precision game performer was being ridiculous. The reason nV3x is a comparatively poor full precision game performer is R3x0--it has nothing whatever to do with "low-end, off-line" 3d rendering being a primary design goal for nV3x. That makes no business sense whatever. Secondly, it also does not explain why nVidia couldn't have designed an fp24/32 chip instead of the fx12/fp16/fp32 chip it designed.

Look--run the numbers yourself--what percentage of nVidia's chips are sold into the general markets targeted at 3d gaming, and what percentage are sold to the "professional" markets? I'd guess 95% of nVidia's chips are sold to the 3d gaming market--or the general consumer market. At least. That should tip you immediately on what markets nVidia was primarily designing for.

And...look at what nVidia's been saying all year about the "direction of 3d games for the future" and how they "disagreed" with FutureMark and everybody else about ps2.0, full fp precision, and other things...nVidia has never once stated through all of this that "We designed nV3x for 5% of the market and would like the other 95% to accept its shortcomings as necessary for the 'professional' off-line rendering market." Come on, that's ridiculous on several levels.

The thing that makes nV3x look so bad running full precision games is in fact something nVidia did not anticipate and had no control over--R3x0.
 
demalion said:
WaltC,

AFAICS, you keep talking about the final output, and its 4 (or 3) component total "bitness", when CPU "bitness" used to process that output doesn't refer to the same thing at all. As has been said.

Consider the [url=http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MC68882]68882[/url] that was used with the 68020 and 68030 chips. That is the 32-bit floating-point processing (along with some 32-bit integer processing options for simpler rendering solutions than are currently used) that renderers use. But when processing a "shader" effect with color components, they apply it to each component individually. This is why "multimedia extensions" that let processors process multiple components simultaneously are a boon to "software renderers".

The significance of a GPU, even one that suffers from real-time performance issues, is that it excels even further at this specific task than the extensions offered as an addendum to CPUs.

I understand that...:) What would make you think I didn't? The simple fact is that fp32 assumes 32 bits of precision for each of R, G, B & Alpha, which might be interpreted as "128-bit" precision color. The "24-bits" concerning the Amiga/Toaster refers to the Toaster's own 24-bit 2D hardware graphics engine for output, which did 8 bits each of R, G & B in final output and only relied on the Amiga's native graphics for preview work with the device. This was at a time years before "24-bit" support made it into 3d chips. Years before 3d chips, in fact.

I'm not sure how you might have thought I was talking about "32-bit cpus," or something else. ?.... I agree with your point if it's being made to describe how silly it is to talk about fp32 as if it alone might be fundamental to current 3d off-line rendering. With software, provided one has the hardware resources, any upper-end limit is possible.
 
Chalnoth said:
The point is that it's capable of full precision. While I'm not sure how it stacks up against 3DLabs' own solution right now, it is a definite step ahead of the R3xx in precision, which may make it the only current viable solution for low-end offline rendering.

And as far as speed is concerned, remember that we're talking about pitting the NV3x vs. software rendering.

This is really the point I should have addressed initially. What is "full precision"...? Is it fp24? It is in R3x0. Is it fp32? It is in nV3x.

But is either fp24 or fp32 "full precision" in terms of an absolute limit? Of course not...any more than 32-bit integer precision was an absolute limit (although many people speculated it might as well have been.)

So, to say that nV3x is "full precision" but that R3x0 is not, is erroneous, as the sky's literally the limit on what you can mean by "full precision." The fact is that--obviously--in R3x0 fp24 is the full-precision limit today, based on state-of-the-art manufacturing and design capabilities. In nV3x, the *current* limit is fp32. However, since fp24 on R3x0 performs much better than fp32 on nV3x, fp24 is considerably more useful for running 3d games--the goal of which is running 3d games in as close to "real time" as possible.

Who's to say that in 5 years the upper limit won't be fp64? How about in 10 years? I think that discussing either fp24 or fp32 in terms of "full-precision limits" is about as constructive as the discussions a few years ago concerning many people's idea that "We'll never need more than 32-bit integer precision"....and so on...
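For what "full precision" buys in practice, the commonly published component layouts are s10e5 for fp16, s16e7 for R3x0's fp24, and s23e8 (IEEE single) for nV3x's fp32--those layouts come from outside this thread, so treat them as my assumption. The relative error per operation is then roughly one part in 2^(mantissa bits):

[code]
#include <cmath>
#include <cstdio>

// Assuming the commonly published layouts: fp16 = s10e5, fp24 = s16e7
// (R3x0), fp32 = s23e8 (IEEE single, nV3x). One ulp at 1.0 is 2^-mantissa.
int main() {
    struct Fmt { const char* name; int mantissaBits; };
    const Fmt fmts[] = { {"fp16", 10}, {"fp24", 16}, {"fp32", 23} };

    for (const Fmt& f : fmts)
        std::printf("%s: ~%.2g relative error per operation\n",
                    f.name, std::pow(2.0, -f.mantissaBits));
}
[/code]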
 
WaltC said:
demalion said:
WaltC,

AFAICS, you keep talking about the final output, and its 4 (or 3) component total "bitness", when CPU "bitness" used to process that output doesn't refer to the same thing at all. As has been said.

Consider the [url=http://e-www.motorola.com/webapp/sps/site/prod_summary.jsp?code=MC68882]68882[/url] that was used with the 68020 and 68030 chips. That is the 32-bit floating-point processing (along with some 32-bit integer processing options for simpler rendering solutions than are currently used) that renderers use. But when processing a "shader" effect with color components, they apply it to each component individually. This is why "multimedia extensions" that let processors process multiple components simultaneously are a boon to "software renderers".

The significance of a GPU, even one that suffers from real-time performance issues, is that it excels even further at this specific task than the extensions offered as an addendum to CPUs.

I understand that...:) What would make you think I didn't?

WaltC said:
Trust me...many offline rendering "farms" today do not need 128-bit color precision, nor 96-bit color precision--many have been operating for years at essentially 32-bit integer precision.

Eh? :oops: 128-bit equates to the 32-bit processing offline rendering has been using...measured 4 components at a time. Which offline renderers haven't been taking advantage of 68882 processors or other floating point processing as it became more prevalent, such that they "do not need 128-bit color precision, nor 96-bit color precision--many have been operating for years at essentially 32-bit integer precision."? :oops: :-?

The simple fact is that fp32 assumes 32 bits of precision for each of R, G, B & Alpha, which might be interpreted as "128-bit" precision color.

This is a conversation I think you need to go and have with yourself, I know I've already done so earlier. :-?

"24-bits" concerning the Amiga/Toaster (Toaster had its own 24-bit 2D hardware graphics engine for output, which did 8-bits of R,G,B & Alpha in final output, and only relied on the Amiga's native graphics for preview work with the device.)

It calculated the 3D images using the CPU--using, as one example, its 32-bit floating-point processing, or implementations that more slowly obtained the same quality with integer implementations of floating-point calculations, or that calculated results more quickly(?) but were limited by not being floating point. It then applied post-processing to such output images for transitions and special effects, but that was primarily the additional hardware, not directly related to what Lightwave was doing.

Vertex processing relates to spatial calculations and surface lighting characteristics; fragment processing relates to more detailed (for multipixel surfaces) material and lighting characteristics, from the perspective of a pixel/fragment. These are the sorts of things the CPU would otherwise be doing, but not as quickly.
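A toy illustration of that split (my sketch, not any real shader API): the vertex stage does the spatial work, the fragment stage evaluates lighting from the pixel's point of view--exactly the per-component math a CPU software renderer would otherwise do serially.

[code]
#include <cstdio>

struct Vec3 { float x, y, z; };

// Per-vertex: a spatial calculation (here, a trivial translation).
Vec3 vertexStage(Vec3 v, Vec3 offset) {
    return { v.x + offset.x, v.y + offset.y, v.z + offset.z };
}

// Per-fragment: lighting from the pixel's perspective (a Lambert term).
float fragmentStage(Vec3 n, Vec3 lightDir) {
    float d = n.x * lightDir.x + n.y * lightDir.y + n.z * lightDir.z;
    return d > 0.0f ? d : 0.0f;  // surfaces facing away get no light
}

int main() {
    Vec3 v = vertexStage({0, 0, 0}, {1, 2, 3});
    float lit = fragmentStage({0, 0, 1}, {0, 0, 1});
    std::printf("v=(%g,%g,%g) lit=%g\n", v.x, v.y, v.z, lit);
}
[/code]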

AFAICS, the one who seems to be confusing these things is yourself.

This was at a time years before "24-bit" support made it into 3d chips. Years before 3d chips, in fact.

Why do you bring up 24-bit output support again? I don't think your point is necessarily true at the time of Lightwave's adoption and success, even if it was relevant.

I'm not sure how you might have thought I was talking about "32-bit cpus," or something else.

Because "32-bit cpus" were used for offline 3D rendering, and you mentioned 32-bit in your comment about offline 3D rendering....
Eh?

?.... I agree with your point if it's being made to describe how silly it is to talk about fp32 as if it alone might be fundamental to current 3d off-line rendering.

It was concerning how silly your comments were. What this has to do with "it alone might be fundamental to current 3d off-line rendering" other than a blatant contradiction to your stating that offline renderers only need 32-bit, I do not know.

With software, provided one has the hardware resources, any upper-end limit is possible.

Thanks for that insight(?), but I'm not sure what part of the replies you've received indicates that it profits me or lends it relevance to my responses to you.

But concerning the problematic statement you introduced into the discussion...?
 
I wanted to add here...I don't want it thought that I'm saying the nV3x isn't "suitable" or something for the kind of low-end, pre-production preview work being discussed here. I'm not saying that at all. I'm only saying that, IMO, fp32 doesn't make it any more suitable for this kind of work than does fp24 in R3x0. And I'm saying that nV3x is "slow" in full-precision DX9/ARB2 gaming only when *compared to* R3x0. IE, if there were no R3x0, nV3x wouldn't seem so slow...

And, prior to nV3x, what was the capability of the Quadro cards nVidia's been selling *for years* into the "professional" low-end 3d rendering market being discussed here? What architecture has much of the current software been optimized around--i.e., 3ds Max, Lightwave, etc.? It's no secret that prior to nV3x nVidia's best-selling professional Quadro cards were all 32-bit integer display cards. They were considered useful even so, were they not, for use with these applications?

And that's because the best use of these products in that environment is as preview devices--they worked out fine. And it's not as if nV3x won't support the integer precision written into this software to support nVidia 3d chips that had no fp capability at all, is it?
 
demalion said:
It was concerning how silly your comments were. What this has to do with "it alone might be fundamental to current 3d off-line rendering" other than a blatant contradiction to your stating that offline renderers only need 32-bit, I do not know.

With software, provided one has the hardware resources, any upper-end limit is possible.

Thanks for that insight(?), but I'm not sure what part of the replies you've received indicates that it profits me or lends it relevance to my responses to you.

But concerning the problematic statement you introduced into the discussion...?

I'm quite sure that neither of us thinks the other knows of what he speaks...:)

I was referring--albeit obliquely--to the fact that nVidia's been selling 32-bit integer Quadros into this same market for years. Now please explain to me how an nv25-based Quadro is going to render in > 32-bit integer precision. Not only has the basic work not changed, but also the software optimized to nVidia's pre-nv3x hardware hasn't changed, either, for this kind of work as concerns the Quadro series--as of this date. Yet, clearly, many of the rendering packages discussed here had a capability far in excess of nv25 Quadro. Seems very simple to me...

Also, if you think you can challenge me on the info concerning the Amiga/toaster I've referenced...please do. I've got one sitting 10 feet from me.

Last, I believe the comment above about no limit on precision was made to one of Chalnoth's comments--and not to you--so no wonder it has no "relevance" to your responses to me.

:?:

Edit: OK, I see what you're talking about above--heck, I've made too many responses here to keep track of them all. Sorry for the confusion, as that was my basic point to Chalnoth.
 
WaltC said:
demalion said:
It was concerning how silly your comments were. What this has to do with "it alone might be fundamental to current 3d off-line rendering" other than a blatant contradiction to your stating that offline renderers only need 32-bit, I do not know.

With software, provided one has the hardware resources, any upper-end limit is possible.

Thanks for that insight(?), but I'm not sure what part of the replies you've received indicates that it profits me or lends it relevance to my responses to you.

But concerning the problematic statement you introduced into the discussion...?

I'm quite sure that neither of us thinks the other knows of what he speaks...:)

I was referring--albeit obliquely--to the fact that nVidia's been selling 32-bit integer Quadros into this same market for years.

"32-bit integer Quadros"? Oh, yeah, in 2D.

How is this relevant to any of the rest of my discussion, which was the point of my post?

How does this establish a relevance to that discussion, where I complained about your confusing 2D output with 3D processing in this comment? Establishing that was the only purpose of this particular, isolated, tail-end part of my response that you chose to quote.

Now please explain to me how an nv25-based Quadro is going to render in > 32-bit integer precision.

:LOL: Why would I, since I never talked about NV25 based Quadros?

Wait, when I typed "WaltC", I meant this other guy I used to know who came to mind. Now that I've said that, "please explain to me what made you think I was talking to you?"
:oops:

Not only has the basic work not changed, but also the software optimized to nVidia's pre-nv3x hardware hasn't changed, either, for this kind of work as concerns the Quadro series--as of this date.

You're talking about wireframe and basic solid modelling. What does this have to do with 128-bit color precision which was being discussed, or using such capabilities in recent cards for 3D render processing?

Yet, clearly, many of the rendering packages discussed here had a capability far in excess of nv25 Quadro.

Sure, by using the CPU. The entire discussion was about the applicability of using the NV3x in place of a CPU, hence "off line rendering" and not "real time modeling to 32-bit color output like an NV25 could do for use in off line rendering", clues being discussion of the NV3x capabilities that the NV25 did not have.

Seems very simple to me...

Sure, it is perfectly obvious that people are talking about the NV25 when talking about 128-bit precision, and it is foolish of us to not realize that this is obviously what you were referring to in your discussion of it.

Also, if you think you can challenge me on the info concerning the Amiga/toaster I've referenced...please do. I've got one sitting 10 feet from me.

Well, I was more interested in what I was saying being true, which, since you simply said this instead of correcting it, I presume it is. What this Amiga/Toaster "showdown" has to do with that, I do not know.

Last, I believe the comment above about no limit on precision was made to one of Chalnoth's comments--and not to you--so no wonder it has no "relevance" to your responses to me.

Well, the comment about "any upper-end limit" came from your reply to my post, complete with my name at the top, so I'm presuming you're not being silly enough to talk about that.

I must therefore ask you to clarify what you are talking about, since I don't see discussion of "limits" anywhere else.

You could possibly mean the only quote of you that didn't come from your reply to me, but that was not presented on the basis of being directed at me--just on the basis of your comment having to do with offline rendering processing being done at 32-bit, with 96-bit or 128-bit processing not being necessary. And that statement is silly, as "32-bit" processing at all comparable to the "128-bit or 96-bit" nomenclature would be calculated at 8-bit precision by a CPU or GPU.

With that text of yours mentioning, directly, "many offline rendering 'farms' today do not need 128-bit color precision, nor 96-bit color precision--many have been operating for years at essentially 32-bit integer precision", my prior discussion, and your deciding to completely skip over dealing with it directly, I think this has already been successfully addressed.

:?: :?: :!:
EDIT:
WaltC said:
Edit: OK, I see what you're talking about above--heck, I've made too many responses here to keep track of them all. Sorry for the confusion, as that was my basic point to Chalnoth.

That's fine, but it leaves your entire reply without a point that I can see.
 
OpenGL guy said:
Natoma said:
Question. Where exactly is the F-Buffer exposed? I read somewhere that it is not exposed in DX9 but could be in a 9.1 revision, and that you can use it in OGL, but only if the app is specifically coded for it. Can you clear up this confusion, please?
I don't know what the plans are for supporting the F-buffer. The best way to think of the F-buffer is as just another surface format. The application can allocate an F-buffer then use it just like any other render target.

So then the F-Buffer can be used in DX9 as long as the app codes for it? Or is DX9 different than OGL in this regard? Would this alleviate JC's issues with the R300 instruction limits he commented on back in February?
 
WaltC, you constantly mention how 3D cards cannot be used for raytracing, and for that reason cannot be used in either TV or movie production.

But what percentage of TV effects work is actually raytraced? High-end commercials most certainly most of the time. But a regular weekly effects show? I'd be pretty surprised if they'd have raytraced effects. I'd even go as far as to claim that Enterprise doesn't use raytracing (if someone now says that they use models, well, I'll just go shoot myself . . . ;) )

Another point . . . it's not the 32-bit precision why some of us say that maybe the FX was designed with, well, FX work in mind. It's the other features: some built-in math functions, branching, long shader programs, and some other stuff (that's a bit too esoteric for me :) ) that seem quite unnecessary for gaming purposes.
 
Walt-

Your ignorance of current practice in the professional 3d graphics creation industry is breathtaking. Rather than respond to your impossible volume of wrongheadedness point by point, I'd like to correct a few of your most egregious and/or most repeated misunderstandings, and then ignore you from now on so long as I don't think you're misleading anyone else with your drivel (and I don't think you've got anyone fooled at the moment).

  1. Most production frames are scanline-rendered. Not ray-traced. Even for feature film work, ray-tracing is typically only used to achieve a particular effect--not for most frames, and not necessarily for the entire frame when it is used. Ray-tracing for TV work is rare.
  2. Most production frames are rendered at FP32-per-component precision. Yes, some programs offer the ability to render in FP64, but again this is not the norm for film and decidedly rare for TV. If FP64 is used at all, it is far more likely to show up on the geometry side than the pixel side.
  3. FP24 is not "a subset" of offline rendering work. Unless you think someone is still stuck with your old Video Toaster you keep using as the basis for your remarkably uninformed comments. FP24 is not used at all (unless someone is using R3x0 boards to accelerate final renders), because the hardware used to render doesn't do calculations at FP24.
  4. The primary technical reason NV3x is not suitable for final renders of many production frames is that its shader pipeline doesn't support shaders of arbitrary length or dynamic branching. Lack of FP64 or ray-tracing ability is a less important concern, particularly for the portion of the market likely to first start doing production renders on a consumer card. Compared to this, R3x0 is substantially less suitable because it lacks FP32 precision, has substantially more restrictive limits on shader length, and doesn't even do static branching. (The static vs. dynamic distinction is sketched in the code just after this list.)

    In addition, there are logistical issues holding back production rendering on GPUs, like toolkit support and a means for writing output to disk. Also, R3x0 is helped out by the fact that the Ashli plugin can automatically compile a single software shader into multiple R3x0 shaders, which will alleviate the length issue in many cases. (The lack of even static branching is still a huge hole, of course.)
  5. The "workstation" graphics market is not nearly as monolithic as you appear to assume. In particular, while Nvidia has made inroads into certain aspects of the market with the NV25 (and earlier) Quadros, the functionality they provide--primarily hardware acceleration (at realtime interactive framerates) of wireframe and flat shaded views for CAD/CAM applications--is completely different from the new functionality prospectively offered by the somewhat flexible FP shader pipelines in the DX9-class cards (namely single-frame previews, or possibly final renders, of procedurally shaded scenes at non-realtime framerates). Yes, earlier Quadros were used to accelerate wireframe and fixed-function views in programs like Maya etc. in the course of creating procedurally shaded content, but the actual shader programs were always run (slowly) in software. And the primary market for older Quadros is more CAD, which doesn't use programmable shaders at all.

    So the fact that Nvidia already had products for "the workstation market" is completely irrelevant to this discussion.
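On the branching distinction in point 4, a minimal sketch in plain C++ (not any real shader language, and only illustrating this thread's claims): a static branch depends only on a constant fixed for the whole draw call, so the driver can resolve it once per pass; a dynamic branch depends on per-fragment data and must be decided pixel by pixel.

[code]
struct Fragment { float lightIntensity; };

// Static branch: 'useSpecular' is uniform across the entire draw call, so
// hardware without real branching can simply bake out two shader variants.
// Per point 4 above, NV3x can do this and R3x0 cannot.
float shadeStatic(const Fragment& f, bool useSpecular) {
    if (useSpecular)
        return f.lightIntensity * 1.2f;
    return f.lightIntensity;
}

// Dynamic branch: the condition varies per fragment, so it genuinely must
// be evaluated for every pixel. Per point 4, neither chip's fragment
// pipeline provides this.
float shadeDynamic(const Fragment& f) {
    if (f.lightIntensity > 0.5f)
        return f.lightIntensity * 1.2f;
    return f.lightIntensity;
}

int main() {
    Fragment f = {0.7f};
    return shadeStatic(f, true) > shadeDynamic(f) ? 0 : 1;  // exercise both paths
}
[/code]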
Hope that cleared some things up for you; apparently things have changed a bit since your Video Toasting days. Perhaps now you'll be able to look back at that Carmack posting or Nvidia's early NV3x marketing and realize that the intent was exactly as I represented it. (BTW, even if, as you claim (with no evidence), professional applications only represent ~5% of NV3x unit sales, that would likely translate into ~35% of NV3x revenue and the vast majority of NV3x profits. Myself, I'd bet all categories are a bit lower, but I could easily see the Quadro side responsible for around half of Nvidia's NV3x profits.)

And a tip: read more and post less. You might learn something.
 
Perhaps... But in my experience, his 5 points are spot on... with the possible exception of #2: in the TV/commercial work done with the "Big 4" apps' built-in renderers, you see a lot more doubles sprinkled through the renderers... although this is more often than not simple CYOA, just to keep the renderer "clean" and not have to worry about precision overflows. Film/production add-on renderers tend to get a much more extensive workout, where performance and memory footprint can play a bigger role, and the vendors of said products pay more attention to numerical analysis of what can be trimmed out of the renderer to improve those facets...
 
OpenGL guy said:
Except that if you have to multipass on the NV3x hardware, you are in trouble because there's no F-buffer or even floating point intermediate buffer format available.
That second part is a current software limitation (in DirectX; I believe the FP buffers are working in OpenGL).

Whenever using F-buffer or multipass, the relative performance hit will always be smaller if the hardware supports more instructions (or whatever the limit is, such as texture reads, which the NV3x can also do more of). Since there undoubtedly will be some performance hit associated with using the F-buffer (as opposed to just being able to run a longer program, everything else the same), I don't think you can state unequivocally that the R300 using the F-buffer with its 96 instruction limit will automatically have less of a performance hit than the NV3x with its 1024 instruction limit using multipass (or 2048 instruction limit with the high-end Quadro version, which is more applicable to the situations we're talking about).
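The instruction-limit arithmetic behind that argument, sketched below (the shader lengths are made-up examples; only the 96 and 1024 per-pass figures come from this discussion): the number of passes needed is just the ceiling of program length over the per-pass limit, and each pass beyond the first costs a spill/reload round trip.

[code]
#include <cstdio>

// Passes needed = ceil(shader length / per-pass instruction limit).
int passesNeeded(int shaderLength, int perPassLimit) {
    return (shaderLength + perPassLimit - 1) / perPassLimit;  // ceiling division
}

int main() {
    const int lengths[] = {96, 300, 1024, 4000};  // example shader lengths
    for (int n : lengths)
        std::printf("%4d instructions: R300 (96/pass) -> %2d passes, "
                    "NV3x (1024/pass) -> %d passes\n",
                    n, passesNeeded(n, 96), passesNeeded(n, 1024));
}
[/code]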
 
Dave H said:
The primary technical reason NV3x is not suitable for final renders of many production frames is that its shader pipeline doesn't support shaders of arbitrary length or dynamic branching. Lack of FP64 or ray-tracing ability are less important concerns, particularly for the portion of the market likely to first start doing production renders on a consumer card. Compared to this, R3x0 is substantially less suitable because it lacks FP32 precision, has substantially more restrictive limits on shader length, and doesn't even do static branching.
Well, I think that arbitrary branching and shader length are primarily software limitations. That is, these things can be simulated with multipass. The only question is, if you have to multipass to simulate these things, will the performance still be good enough to compete on a price/performance basis? (I'm attempting to imply that going multipass for arbitrary branching, say, for the termination of a loop, may lead to, for example, hundreds of executions of a short loop for no reason, or of hundreds of multipasses of a short shader, either one of which would waste processing time)
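To picture the loop-termination example concretely, here's a conceptual sketch (plain C++, no real API) of emulating a data-dependent loop by multipass: the short loop body is re-run over the whole surface until no pixel wants another iteration, so every pixel pays for every pass even after it has converged.

[code]
#include <cstdio>
#include <vector>

struct Pixel { float value; bool done; };

// One "pass": the loop body runs over every pixel, converged or not.
bool runPass(std::vector<Pixel>& surface) {
    bool anyActive = false;
    for (Pixel& p : surface) {
        if (p.done) continue;        // masked out, but still traversed
        p.value *= 0.5f;             // stand-in loop body
        p.done = (p.value < 0.01f);  // stand-in termination test
        anyActive = anyActive || !p.done;
    }
    return anyActive;
}

int main() {
    std::vector<Pixel> surface(640 * 480, Pixel{1.0f, false});
    int passes = 0;
    do { ++passes; } while (runPass(surface));  // CPU decides whether to reissue
    std::printf("%d full-surface passes for one short data-dependent loop\n", passes);
}
[/code]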

Of course, the software needs to get there. If nVidia is seriously considering an attempt at penetrating this market, then they need to be working on such software (or contracting another company to be working on the software...something I would consider typically less desirable).

Why isn't this software out by now? I don't know. Perhaps the hardware design decisions were made first, and the decision whether or not to produce the software for this application was put off to a later date. Perhaps they did some market research, and decided the market would be unprofitable for this product. That's one possibility. There could be other reasons, such as logistic reasons, that the software has yet to be released (note: I'm not trying to imply that it's even being worked on with that last sentence. I don't know.).
 
Chalnoth said:
OpenGL guy said:
Except that if you have to multipass on the NV3x hardware, you are in trouble because there's no F-buffer or even floating point intermediate buffer format available.
That second part is a current software limitation (in DirectX; I believe the FP buffers are working in OpenGL).

Whenever using F-buffer or multipass, the relative performance hit will always be smaller if the hardware supports more instructions (or whatever the limit is, such as texture reads, which the NV3x can also do more of). Since there undoubtedly will be some performance hit associated with using the F-buffer (as opposed to just being able to run a longer program, everything else the same), I don't think you can state unequivocally that the R300 using the F-buffer with its 96 instruction limit will automatically have less of a performance hit than the NV3x with its 1024 instruction limit using multipass (or 2048 instruction limit with the high-end Quadro version, which is more applicable to the situations we're talking about).
When did I say anything about performance? But since you've brought it up, I've seen samples from Ashli that do several passes on the R300, yet are still much faster than single pass on NV3x parts. I believe it would take a long time before bandwidth considerations would offset that (large) performance difference.
 