Carmack's comments on NV30 vs R300, DOOM developments

antlers · Jan 30, 2003

Grall said:
Oh well... *sighs deeply*

When a discussion has momentum, it's hard to make it change direction.

I do agree that it is a little surprising that the NV30-specific codepath on the NV30 is only beating the ARB2 codepath on the R300 by a little bit. Everyone thought that Doom3 with an NV30-specific codepath would be kind of a best-case scenario for the NV30. It looks like instead the relative performance is going to be more typical of the results seen for other games.

RussSchultz · Jan 30, 2003

I agree, I was confusing the vertex vs. fragment shaders about whether for sure it was gone or not.

Regardless, distilling a persons motivations and beliefs from several disconnected actions amounts to reading tea leaves. Which is why I said you were stretching to come to your conclusion (that being him keeping the NV30 fragment path was because he didn't believe the ARB path would be up to snuff).

Thelacky · Jan 30, 2003

The ATI R300 is just one brilliant piece of engineering. ATI truely surpassed themselves.

Joe DeFuria · Jan 30, 2003

Regardless, distilling a persons motivations and beliefs from several disconnected actions amounts to reading tea leaves. Which is why I said you were stretching to come to your conclusion...

All speculation is reading tea leaves. You keep on trying to make the point that basically "I'm specualting." And I keep saying..."Yes I am. Obviously."

No need to make the point again. It has been implicitly conceded from the start.

All I ask (yet again), is that if my speculation is so "stretched", then it should be very easy to provide a separate scenario / reason for his "maintaining" (generic term) the NV30 path, that has nothing to do with anticipated ARB2 drivers getting up to snuff.

Or is it that you just term "all speculation" as "stretched?"

Sxotty · Jan 30, 2003

Do you fellas, remember analyzing Shakespeare and other literary works, well that is kinda what this seems like to me, everyone is reading all kinds of crazy things into a statement, every ambiguity is twisted to support the idea that an individual personally subscribes to. Usually in this case people are divided on company lines, that personally I see no reason to really care about.

Do youguys have a lot of stock in ATI, or Nvidia? I mean what is the point of getting huffy, the truth is both companies make good stuff, hopefully neither will get as far ahead as Nvidia once was, and ati was before. As long as competition remains strong then the products we end up with, whoever designs them will just get better. Everyone is entitled to their opinion, maybe they like red and ATI, or green and Nvidia, but whatever the reason that does not mean you should get angry. It is too much like a football game and the fans starting to fight instead of just enjoying the game.

(I personally still respect Carmack and do not think that he is paid off by Nvidia to say what they want as some have suggested, he seems to say what he thinks and whether it is correct youguys can argue all you want about)

RussSchultz · Jan 30, 2003

Speculating about motivations and beliefs when there's nothing that addresses motivations and beliefs in anything he wrote is where I call it stretching.

Speculating that I'm a vegetarian because you've never seen me eat meat is fine and dandy. Saying I'm doing it for ethical reasons based on those same facts is simply stretching it.

Along those lines, he says it (the NV30 arb2 path) performs worse than the NV30 path. He also says NVIDIA has assured him it resolved shortly. No where is there any shred of anything that suggests he doesn't believe them or that he doubts the NV30 performance will be similar between the two paths, so why do you attribute those beliefs to him as the "only reasonable explanation "?

Why the code path is still there (assuming it is truly there and being actively developed) is something only he can answer with any certainty.

And Joe, I already agreed with you that its a possibility that what you say is the reasoning behind it. I just take particular exception that "its the only reasonable explanation". (Mostly because I don't really find it complelling at all.)

Saem · Jan 30, 2003

I don't think he thinks FP24 is overkill as long as the performance is there (and it is). I think his major problem with the R300 is limited shader length.

I believe that was in context of him trying to experiment, this is not connected to what's going on in Doom III, in the realm of supported features.

As for complaints of the NV30 path, he had this done before, the performance of the NV30 with the ARB2 path is obviously lacking, roughly half that of the R300. The optimization is necessary otherwise it'll run like crap. In the case of the R300 it has high precision rendering which is just shy of the NV30, optimizing by creating a R300 code path wouldn't get you much for the time invested.

Simply put, the R300 can run everything on and fast with the ARB2 path. The NV30 cannot run everything on and fast enough with the ARB2 path. Which would likely be the best ONE to optimize for?

Further more, the NV30 path might lend to Carmack's research for his next creation/project(s).

OpenGL guy · Jan 31, 2003

Chalnoth said:
OpenGL guy said:

Is that so? How many operations can you do with such a 16-bit float before the difference affects the LSB of an 8-bit integer format? Not too many.

Click to expand...

Keep in mind that most internal calculations are done at higher precision than the storage formats. The NV30 appears to do at least 12-bit precision on its 16-bit mantissa for the internal calculation. While this is probably not stored each time around, it will significantly reduce the errors from what you've stated.

Actually, it won't. If you are losing important data just loading the values as 16-bit floats, no amount of internal precision can bring that back.

But I suppose you are right, there will be some errors.

Thanks for conceding a point.

The thing is, the floating-point format ensures that they will be much less noticeable. That is, the human visual subsystem is much more highly-attuned to intensity differences at very low brightness levels. For each halving of the number value, the floating-point format will effectively gain a bit of "buffer zone" before the errors become apparent. This for the reason that, for example, the number 1.1E-03 will become 0.0011 when sent to the framebuffer. If that second digit had been in error, it wouldn't make any difference in the end since the DAC can only deal with a granularity of 1/1024.

Then a better example would be 9.6E-04. If the second digit were in error, then you *might* notice a difference.

I still think 16-bit floats being enough for pretty much any color-only data (at least until 12-bit DACs are available...).

DAC precision is a factor, but, if you do enough computations, DAC precision becomes a moot point.

Humus · Jan 31, 2003

Joe DeFuria said:
if Carmack has developed a NV30 path, maybe just for plain curiousity, there's no reason for him to get rid of it if that path has an advantage. He may drop it in the end if it turns out the ARB2 is just as fast or close, but he cannot at this time be certain.

Click to expand...

Agree, 100%. This does not contradict my opinion!

I don't get why Russ is so upset over this. The key is as you said: at this time he cannot be certain. In other words, he does not have enough confidence that it will be practical to drop NV30 path.

I see nothing wrong there,

Click to expand...

Neither do I!

If the NV30 path has a high performance advantage today I'd continue developing it if I were in his position. It's not any more fun developing with low performance than gaming with low performance.

Click to expand...

Thank you...that's ONE reason so far. Something Russ has yet to present to me. However, IMO, that is a bit weak for a few reasons:

1) Even if it's currently faster, why continue to develop / maintain if in the end, you believe it won't provide any real benefit?

2) Is performance of ARB2 THAT bad? I don't see how rendered FPS performance during development is a big issue. CPU speed (compiling), absolutely. I don't really see how prototyping stuff, and generating either 60 or 30 or 15 FPS makes much of a difference.

3) Isn't he still going to have to develop the ARB2 on GeForecFX regardless of whether he develops NV30 path or not?

Either way it slows your productivity down, at least for me, I'm certainly more productive at 60fps than 30fps.

Click to expand...

See number 2. Could you explain how? 60 FPS vs. 60 seconds per frame I could understand. Not 60 FPS vs. 30.

"He cannot be certain" was supposed to be read as "he may have confidence, but no garantuees are given". He cannot drop it until fast drivers for ARB2 are delivered, but he may still have confidence that this will happend eventually.

Yes, 60fps vs. 30fps during prototyping matters. You have to navigate your little world, and in the case of a full-blown game the navigating time can be much higher than for simple techdemos. If you're stuttering your way through the world it'll take more time, especially if you're going to study details as your spatial precision goes down. It also has the same psycological effect as when you play games at low framerate. More annoyance and sometimes even some kind of odd distrust of your code.

Mintmaster · Jan 31, 2003

LeStoffer said:
I agree that something seems 'off' given nVidia otherwise excellent driver reputation.

I have this odd feeling that the 3 levels of precision (fixed, FP16, FP32) in the pixel pipeline is the issue here: I would guess that they have the fixed function pretty under control by now (it's the old beloved register combiners from GF3/4 after all), but they still need to work on the two other levels - and I don't think that speed at FP 32 is were the focus is right now.

I just don't see how a driver and optimizations can have such huge repercussions in pixel shader speed. I suppose some out of order scheduling may increase shader speed by reducing stalls, but would it really make that much of a difference?

Also, how is NVidia performing so poorly in things like the PS 1.4 benchmarks, both in 3DMark and ShaderMark? Surely PS 1.4 is a subset of this so called "NV30 path".

Nvidia has had SO much time to develop the NV30. I always thought of them as kings of optimization. More so on the software side, but their GF3/GF4 hardware was very efficient (well, after recycling so much old tech, you'd expect it to be, I guess). I think I'm going to just ignore NV30's pixel shader performance for a month.

antlers4 said:
I don't think that has to be the case. I just think NVidia designed the chip thinking that a certain FP shader performance was adequate, and compared to the R300 it is looking less adequate (in some circumstances; I'm not forgetting that the NV30-specific path runs fine).

As I argued a couple of pages back, the R300 runs everything through its FP24 path, so it has to be fast. NVidia chose to maintain a 32-bit int path for speed, so it didn't have to make its FP path as fast.

I suppose you could be right, but it still seems very unlikely to me. Would NVidia really be that cocky? They're talking about 4 times GF4's pixel shader performance, a brand new architecture that blows the competition away, etc, etc.

However, your argument matches well with that interview we heard a while ago about keeping the register combiners.

Mintmaster · Jan 31, 2003

Don't worry Grall, I was rather engaged by your post. I was conjuring up a reply, and then your next paragraph had exactly what I was thinking!

Grall said:
What I find so fascinating about all this isn't wether the NV30-specific codepath uses this or that shader precision mode or wether ATI or Nvidia has the more powerful shader implementation... No! I was expecting the NV30 to eat the R300 alive on mere basis of its vast fillrate advantage and Doom3's heavy use of stencil volumes! (And also, Nvidia's claim of a 40% advantage on their own proprietary map played some part in that also I might add.)

By the time the game actually lays down any pixel shader stuff at all, it's already done a Z-only pass so there'll be no overdraw (except for transparencies of course, but those are handled in a different manner from opaque surfaces it seems that probably is less costly), so I didn't expect pixel shaders to be much of a limit at all as far as performance is concerned, and anyway, Doom3s pixel shaders aren't very complicated. PS1.4 can do them in one pass after all, so with a 2.0 implementation I figured it would be a breeze.

On the other hand, we've seen what an enormous performance-hit stencil shadows brings just in a game like Q3A where only characters have such shadows (and they don't have many polys at all comparatively). Doom3 uses stencils *everywhere*! NV30 with its high clock speed should burn through those stencils like no tomorrow I assumed. Maybe it's true, except obviously they're not as big a factor in overall performance as pixel shader speed.

Guess that's because the game redraws the entire screen with pixel shaders at least once per light source (more using pre-PS1.4 shaders or god forbid, no shaders at all), and with many sources that'll be quite a bit of shading there. (Probably the reason why the game's so bloody dark, heh heh.) On the other hand, the stencils seems to be rendered just once (well, front and back side of models, but DX9 cards can do both at once I think I read somewhere).

Well discussed. I thought the same thing, and figured that was true after seeing NVidia's PR graphs.

However, when you think about it, the texturing passes are very important. The Z only and stencil passes only need 1 clock per pixel, and this will be very optimal since there are no bandwidth limitations (or there shouldn't be in a well designed GPU). However, he's using 7 textures per pass when lighting everything, so that's at least 7 clocks per pixel. Throw in trilinear filtering, and it could potentially be 10+ wherever minification is happening (although NV30 is supposed to do single cycle trilinear, I think).

So in the end, it turns out the game is limited in performance in completely different ways than I had anticipated...! Who coulda thunkit! So is it basically true then, what XBit labs wrote a while back, that NV30 issues just one PS fp op per clock (or two texture reads)? Or how else can this apparent deficiency in performance be explained?

*G*

I still think the deficiency goes beyond this, but I'm doubtful we'll ever know the real truth. R300 is also one fp op per pipe per clock, but is clocked way lower. :?

Crusher · Jan 31, 2003

Basic said:
Btw, the comment on "Carmacks' reverse" was about the stencil filling passes. "Carmacks' reverse" mean that you do a stencil operation for every hidden pixel, and an architecture that is built to effectively throw away hidden pixels doesn't help much there.
HierZ should work in the other passes.

No, neither normal volumetric shadowing or Carmack's Reverse method touch hidden pixels. Both methods make comparisons on the Z-Buffer, which contains no hidden pixels at all. The way Carmack's Reverse differs is that he renders the back face of the shadow volume, incrementing the stencil buffer when there's a z-fail, and then render the front side of the volume and decrement the stencil buffer if there's a z-pass. Normal method does front side first, incrementing on z-pass, then the back side, decrementing on z-fail. Culling is disabled for both methods, which is part of the reason the performance is eaten up, and is probably where you got confused about hidden pixels.

Grall said:
On the other hand, we've seen what an enormous performance-hit stencil shadows brings just in a game like Q3A where only characters have such shadows (and they don't have many polys at all comparatively). Doom3 uses stencils *everywhere*! NV30 with its high clock speed should burn through those stencils like no tomorrow I assumed. Maybe it's true, except obviously they're not as big a factor in overall performance as pixel shader speed.

...

On the other hand, the stencils seems to be rendered just once (well, front and back side of models, but DX9 cards can do both at once I think I read somewhere).

You don't render stencil shadows on the back side of models. It's also difficult to "burn through" them, since the calculations necessary to build the shadow volumes are time consuming, and the workload increases along with the polygon count of the occluders (might be one reason the Doom 3 models seem slightly lower in polygon count than people expected them to be).

As for the NV30 codepath, the two reasons I could think of are:

a) once he tried the ARB2 path on the NV30 and found it wasn't performing very well, he added an NV30 path to see if the proprietary extensions were any faster, and when he discovered they were, he kept it

b) there was an NV30 emulator for Cg developers long before the NV30 was available to work on, perhaps he added the NV30 path to play around on the emulator with.

Either way I'm sure it doesn't take a lot of time and effort for Carmack to convert ARB OpenGL paths to proprietary extensions, so I guess I don't see why anyone feels the need to worry about their existence.

Mintmaster · Jan 31, 2003

Crusher, what Basic means by "hidden pixels" is "z-fail". If they fail, they are in effect hidden. He's just talking about how ATI's Heirarchical Z doesn't work when drawing the stencil volumes because of this. Trust me, Basic is a smart, knowledgeable guy that doesn't easily get "confused".

And I don't know what you mean by "backside of models", but you do render the backside of the stencil volumes, which is what Grall is talking about. He said "stencils", not models.

Finally, we are obviously talking about situations that aren't CPU limited. What good does a fast GPU do you then? While the graphics card is handling the intense texturing for one frame, the CPU is doing the stencil volumes for the next frame. NV30 should very well be able to burn through them.

Crusher · Jan 31, 2003

Mintmaster said:
Crusher, what Basic means by "hidden pixels" is "z-fail". If they fail, they are in effect hidden. He's just talking about how ATI's Heirarchical Z doesn't work when drawing the stencil volumes because of this. Trust me, Basic is a smart, knowledgeable guy that doesn't easily get "confused".

That's not what it sounds like he is talking about to me. Nor does your description of what Hierarchial Z does sound correct. You say HZ doens't work when drawing stencil volumes because it throws away hidden pixels, and you claim the hidden pixels are the parts of the volume face that fail the z-test (i.e. is this pixel in front of or behind the pixel stored in the same location in the z-buffer). Throwing away pixels that fail the z-test is PERFECTLY FINE. It should never be keeping those pixels anyway, since you explicitly disable z-buffer writes before you render the volume. All you care about is the result of the test--the fact that it did fail--so that you can alter the entry in the stencil buffer accordingly. As long as the driver accurately reports the result of the depth test, everything should be peachy. And I don't see any possible reason why HZ would affect the depth tests themselves.

Mintmaster said:
And I don't know what you mean by "backside of models", but you do render the backside of the stencil volumes, which is what Grall is talking about. He said "stencils", not models.

He said models, but perhaps he was talking about the stencil volumes instead of models, in which case I misunderstood him.

Mintmaster said:
Finally, we are obviously talking about situations that aren't CPU limited. What good does a fast GPU do you then?

Nowhere in Carmack's .plan update did he even suggest that Doom 3 was not being CPU limited, and since the comments I'm responding to are referring to the FX's performance in Doom 3, I don't see how it could be obvious that we're talking about situations that aren't CPU limited.

Mintmaster said:
While the graphics card is handling the intense texturing for one frame, the CPU is doing the stencil volumes for the next frame. NV30 should very well be able to burn through them.

NV30 might be able to burn through the z tests, and the rendering pass to add the shadow from the stencil mask might not take too long, that I could agree with. My point is, while the NV30 might be able to handle it's share of the workload for the stencil volumes, the end performance probably isn't going to be "blazing fast" like it sounded he was expecting it to be, since there are still lots of things that have to be done to calculate them. And your comment implies that the NV30 has other things to do while the CPU is computing the volumes, which isn't normally the case.

The rendering process is usually that you build the volume for one occluder, do the z tests and update the stencil mask, then build the volume for the next occluder. In this situation, if the GPU can do the transform and z tests faster than the CPU can compute the volume for the next occluder, the GPU will be sitting idle waiting for that information. And if you have this all being done in the same function (or even the same thread), the transformation and z testing won't be done concurrently with the volume production anyway, so they'll both be waiting for the other. You could generate all the shadow volumes before you begin transforming and doing z tests, but I don't think that would be any faster, and you'd have to have to store a lot more vertices in each frame.

Nagorak · Jan 31, 2003

Sxotty said:
(I personally still respect Carmack and do not think that he is paid off by Nvidia to say what they want as some have suggested, he seems to say what he thinks and whether it is correct youguys can argue all you want about)

Let's see he 1) creates a specific code path for the NV30, while 2) not creating one for the R300.

Then 3) he pats Nvidia on the back for making such great drivers, when the fact is he is tailoring his code to run on their HW to begin with! How could things not work exactly how he wants, when he goes out of his way to program for Nvidia's hardware?

4) Nvidia's graphics cards were too slow to demo Doom3, so he ran with an R9700. But then at the first available opportunity he throws the R300 out of his comp and goes back to Nvidia. It's just a load of bullshit, seeing as he already has an NV30 optimized path, he should keep the R9700 and work on making one for it. Frankly he should just keep his mouth shut about driver quality, because he obviously doesn't have an ATi card in his computer enough to judge.

And saying that the R200 path is fine to use with the R300 is just ridiculous...why not just run the NV30 with the NV10 codepath too?

Maybe I'm overreacting, but it just annoys me that everyone thinks Carmack is some sort of god, when the truth is he hasn't made a single good game yet! When Doom3 comes out everyone here is going to be wetting themselves over how great it looks, when the truth is you'd probably find more atmosphere in the now-anemic graphics of System Shock 2.

gokickrocks · Jan 31, 2003

well IMO, doom3 (alpha) doesnt look all that great, i think resident evil on my gamecube looks better

Mulciber · Jan 31, 2003

Nagorak said:
Sxotty said:

(I personally still respect Carmack and do not think that he is paid off by Nvidia to say what they want as some have suggested, he seems to say what he thinks and whether it is correct youguys can argue all you want about)

Click to expand...

Let's see he 1) creates a specific code path for the NV30, while 2) not creating one for the R300.

Then 3) he pats Nvidia on the back for making such great drivers, when the fact is he is tailoring his code to run on their HW to begin with! How could things not work exactly how he wants, when he goes out of his way to program for Nvidia's hardware?

4) Nvidia's graphics cards were too slow to demo Doom3, so he ran with an R9700. But then at the first available opportunity he throws the R300 out of his comp and goes back to Nvidia. It's just a load of bullshit, seeing as he already has an NV30 optimized path, he should keep the R9700 and work on making one for it. Frankly he should just keep his mouth shut about driver quality, because he obviously doesn't have an ATi card in his computer enough to judge.

And saying that the R200 path is fine to use with the R300 is just ridiculous...why not just run the NV30 with the NV10 codepath too?

Maybe I'm overreacting, but it just annoys me that everyone thinks Carmack is some sort of god, when the truth is he hasn't made a single good game yet! When Doom3 comes out everyone here is going to be wetting themselves over how great it looks, when the truth is you'd probably find more atmosphere in the now-anemic graphics of System Shock 2.

The way I read it the ARB2 path is already optimized for ATI, so that point is moot.

And Carmack doesnt make games, he makes engines...so moot I say.

Mintmaster · Jan 31, 2003

I don't want to turn this into a big back-and-forth argument, so I'm going to try and clarify things.

Crusher said:
Mintmaster said:

Crusher, what Basic means by "hidden pixels" is "z-fail". If they fail, they are in effect hidden. He's just talking about how ATI's Heirarchical Z doesn't work when drawing the stencil volumes because of this. Trust me, Basic is a smart, knowledgeable guy that doesn't easily get "confused".

Click to expand...

That's not what it sounds like he is talking about to me. Nor does your description of what Hierarchial Z does sound correct. You say HZ doens't work when drawing stencil volumes because it throws away hidden pixels, and you claim the hidden pixels are the parts of the volume face that fail the z-test (i.e. is this pixel in front of or behind the pixel stored in the same location in the z-buffer). Throwing away pixels that fail the z-test is PERFECTLY FINE. It should never be keeping those pixels anyway, since you explicitly disable z-buffer writes before you render the volume. All you care about is the result of the test--the fact that it did fail--so that you can alter the entry in the stencil buffer accordingly. As long as the driver accurately reports the result of the depth test, everything should be peachy. And I don't see any possible reason why HZ would affect the depth tests themselves.

HZ on R300 keeps the max of all Z values in each tile (assuming the convention where higher Z is further away). If a current polygon's closest z (i.e. lowest z) is bigger than the value in the corresponding HZ tile, the polygon is discarded, as it entirely fails the test. This is ordinary depth pass rendering.

If you now want to do depth fail rendering, you have no HZ acceleration. HZ holds the max, so you can't tell when a polygon entirely passes and thus should not be rendered in depth fail mode. In other words, pixels from the stencil volumes that pass the depth test will not change the stencil buffer, but cannot be discarded rapidly by HZ. Polygons that fail can be discarded, but you don't want to discard them since they need to update the stencil buffer one pixel at a time.

NOTE: when I say polygons wrt HZ, I mean tiles or blocks of pixels within polygons.

This is what was written in ATI's HZ performance guidelines: If you change the Z function from depth pass to depth fail, HZ can't work. I just explained to you why, and this is what Basic was talking about. ATI then falls back to ordinary Z-buffering.

Mintmaster said:
Mintmaster said:

And I don't know what you mean by "backside of models", but you do render the backside of the stencil volumes, which is what Grall is talking about. He said "stencils", not models.

Click to expand...

He said models, but perhaps he was talking about the stencil volumes instead of models, in which case I misunderstood him.

Grall said stencils, then said models in a statement enclosed in parentheses immediately thereafter. There is no need for you to nitpick.

Mintmaster said:
Mintmaster said:

Finally, we are obviously talking about situations that aren't CPU limited. What good does a fast GPU do you then?

Click to expand...

Nowhere in Carmack's .plan update did he even suggest that Doom 3 was not being CPU limited, and since the comments I'm responding to are referring to the FX's performance in Doom 3, I don't see how it could be obvious that we're talking about situations that aren't CPU limited.

If we are CPU limited, why the hell would he be talking about video card performance? All you have to do to test the video card is raise the resolution so that framerates are significantly faster than when disabling rendering altogether.

Whenever you talk about video card performance, you mean not CPU limited. Otherwise you are either talking about driver overhead or have no idea what you're talking about, neither of which apply to John Carmack's statements.

Mintmaster said:
Mintmaster said:

While the graphics card is handling the intense texturing for one frame, the CPU is doing the stencil volumes for the next frame. NV30 should very well be able to burn through them.

Click to expand...

NV30 might be able to burn through the z tests, and the rendering pass to add the shadow from the stencil mask might not take too long, that I could agree with. My point is, while the NV30 might be able to handle it's share of the workload for the stencil volumes, the end performance probably isn't going to be "blazing fast" like it sounded he was expecting it to be, since there are still lots of things that have to be done to calculate them. And your comment implies that the NV30 has other things to do while the CPU is computing the volumes, which isn't normally the case.

The rendering process is usually that you build the volume for one occluder, do the z tests and update the stencil mask, then build the volume for the next occluder. In this situation, if the GPU can do the transform and z tests faster than the CPU can compute the volume for the next occluder, the GPU will be sitting idle waiting for that information. And if you have this all being done in the same function (or even the same thread), the transformation and z testing won't be done concurrently with the volume production anyway, so they'll both be waiting for the other. You could generate all the shadow volumes before you begin transforming and doing z tests, but I don't think that would be any faster, and you'd have to have to store a lot more vertices in each frame.

The CPU does not wait for the GPU to finish the stencil drawing, nor the other way around. Things get queued up, with the GPU finishing rendering one frame while the driver caches the draw commands for the next. The drawing calls in the function do not wait for the GPU to finish before returning. This is probably the most fundamental of driver enhancements to reduce CPU usage.

The only time it fails is when you change rendering resources like textures or vertex buffers in the middle of a frame, or if you need to get a result back, like doing a framebuffer read or using occlusion query, in which case you empty the queue. Even the latter has mechanisms for issuing the query and retrieving results later. If working with dynamic vertex buffers, then the driver can make a copy of the vertex data, via CPU (or AGP, I think), and queue it.

Carmack knows very well how to optimize a program. He will not let both GPU and CPU have any significant idle time in the same frame. If the driver doesn't do what I said, then he will ping-pong between vertex buffers from frame to frame.

So even if you generate your shadow volumes and send them to the GPU one at a time, the driver will effectively wind up drawing them all together some time later.

elchuppa · Jan 31, 2003

you're right about analyzing shakespear.. I was actually thinking more the bible hehe. Like a bunch of priests arguing over the meanings in some obscure passage of the old testament.

It's interesting how some people seem to hate Carmack, to me he has never been particularly abrasive nor judgemental in style or manner. I suppose that he hasn't ever really said anything I didn't want to hear. Still, to be so angry at the guy seems peculiar. He seems to know more about what he's doing than most...

I always like reading his .plan updates that's for sure.

no_way · Jan 31, 2003

elchuppa said:
It's interesting how some people seem to hate Carmack, to me he has never been particularly abrasive nor judgemental in style or manner. ...

I always like reading his .plan updates that's for sure.

Plus, he's a great rocket scientist. er no, make that engineer.
Still, i wonder what code path will the original Radeon use ?

Carmack's comments on NV30 vs R300, DOOM developments

antlers

RussSchultz

Professional Malcontent

Thelacky

Joe DeFuria

Sxotty

RussSchultz

Professional Malcontent

Saem

OpenGL guy

Humus

Crazy coder

Mintmaster

Mintmaster

Crusher

Aptitudinal Constituent

Mintmaster

Crusher

Aptitudinal Constituent

Nagorak

gokickrocks

Mulciber

Mintmaster

elchuppa

no_way

Similar threads