If Doom 3 had been written in D3D....

Scali said:
This article shows exactly what I mean:

http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2149&p=7

Look at the XP2000+ (almost as slow as my 1800+): it gets 46.1 fps, and that's it... Changing the resolution makes no difference at all. Only on 3+ GHz systems does the resolution have any impact on the framerate; below that, they are completely CPU-limited.
And there is more than a factor-2 speed difference between the slowest and the fastest CPUs in the test, which is even more than AMD's processor ratings would indicate.

So these figures justify my suspicions: Doom3 is way too CPU-heavy.
Well, of course the game is CPU-limited in that test; the Athlon XP2000+ is combined with a GeForce 6800 Ultra! Had it been using a Radeon 9600, I can promise you it would be GPU-limited too...
 
Scali said:
I don't think it's reasonable to believe pairing a high-end GPU with a low-end CPU will give you good results. You have to pair equal classes.

Erm, I believe that about two years ago, both the 1800+ and the 9700 were near the high end. There was the 9700 Pro of course, and with CPUs... maybe a 2200+ or so?
So as far as I recall, they are equal classes.
AFAIK the Athlon XP 2600+ was released at the same time as the 9700PRO
 
Well, of course the game is CPU-limited in that test; the Athlon XP2000+ is combined with a GeForce 6800 Ultra! Had it been using a Radeon 9600, I can promise you it would be GPU-limited too...

What's your point? Use a GPU slow enough, and the game is not CPU-limited?

The point is that given the CPU (a 2000+ is not that slow; it does fine in pretty much all games) and the GPU, 46 fps is a ridiculously low average framerate.
Since the game runs on the fastest GPU available, and at only 800x600, the conclusion is pretty much that Doom3's game logic itself can only be processed at about 46 fps on a 2000+. That is ridiculously heavy, and on my 1800+ it means the framerates during combat are too low. I doubt that the 2000+ does much better (and that is already way above the minimum requirement of a 1500+).

So the issue is not simply that the game is CPU-limited. If games are CPU-limited yet average 100+ fps, nobody would care. The point is that the game is CPU-limited at such a low framerate that it is barely playable on a 2000+, even if you have the fastest GPU in the world.
I bet that there is no other game that will drop below 100 fps on average in 800x600 on the same 2000+/6800U machine.
Or take the Battle of Proxycon test from 3dmark03 for example.
 
AFAIK the Athlon XP 2600+ was released at the same time as the 9700PRO

Okay, so then high-end would be 2600+ and 9700Pro...
What would be a nice match for 9700 non-pro then? Somewhere in the range of 1800+ - 2200+ I suppose.
 
Scali said:
The point is that the game is CPU-limited at such a low framerate that it is barely playable on a 2000+, even if you have the fastest GPU in the world.

Using an XP 2000+ with a 6800U is obviously a gross mismatch, and not realistic. Like I tried to explain to you earlier, as graphics cards become faster, they tend to require faster CPUs to really push them, and this is especially true with the current generation of GPU hardware.

If you bothered to read the conclusion of the article, you would understand why a CPU such as the Athlon XP 2000+ has limited performance in a game like Doom 3:

Anandtech said:
Doom 3 sees system memory as one big cache and drives performance up considerably. It is also the on-die memory controller that makes cache size less of an issue on the Athlon 64, while too small of a cache seems to make or break performance with the Pentium 4.

The Athlon XP is much less impressive under Doom 3 thanks to its lack of an on-die memory controller

Here is a final conclusion that they make:

Anandtech said:
If you are lucky enough to own any of the GeForce 6 series cards and play at resolutions lower than 1280x1024 rest assured that money spent on a faster CPU is money well spent. If you happen to have a slower card, something along the lines of a Radeon 9800 Pro or even a regular X800, your system is far less CPU bound and you may want to go with a more middle-of-the-road CPU in order to maximize performance without spending needlessly.
 
Scali said:
Okay, so then high-end would be 2600+ and 9700Pro...
What would be a nice match for 9700 non-pro then? Somewhere in the range of 1800+ - 2200+ I suppose.

At least a 2200+, but more like an (imaginary) 2300+ if you calculate it from the clockspeed or rating difference in comparison to the speed difference between the 9700np and the Pro (=15%).
 
Scali said:
Well of course the game is cpu limited in that test, the Athlon XP2000+ is combined with a Geforce 6800 Ultra! Had it been using a Radeon 9600, I can promise you it would be GPU limited too...

What's your point? Use a GPU slow enough, and the game is not CPU-limited?
That would be a good point. It's true for almost any game in existence. Of course, there are some benchmarks that go to great lengths to avoid being affected by CPU performance. Such as, conveniently...
Scali said:
the Battle of Proxycon test from 3dmark03
Read this.

As you might know, that test extrudes the stencil volumes on the GPU, by inserting degenerates into the meshes. This causes massive vertex shader and setup load, but it frees the CPU from performing that task. More GPU load, less CPU load. It is the right balance for a graphics card benchmark. It isn't necessarily the right balance for a game.
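A minimal sketch of the per-vertex logic behind that trick, done in plain Python purely for illustration (hypothetical names, not Futuremark's actual shader code): every edge carries a degenerate quad, and the vertex program pushes vertices that face away from the light out along the light ray, so the quads stretch open into the sides of the shadow volume.

```python
# Sketch of the per-vertex logic behind GPU shadow-volume extrusion with
# degenerate quads (illustrative, not Futuremark's actual shader code).
# Vertices whose normal faces away from the light are pushed out along the
# light ray; front-facing vertices stay put, so the degenerate quads that
# were inserted along each edge stretch open into the volume's sides.

def extrude_vertex(position, normal, light_pos, extrude_dist=1000.0):
    """Emulate the extrusion test a vs1.1 shader would run per vertex."""
    to_vertex = [p - l for p, l in zip(position, light_pos)]
    # dot(normal, light-to-vertex) > 0 means the vertex faces away from the light
    if sum(n * d for n, d in zip(normal, to_vertex)) > 0.0:
        length = sum(c * c for c in to_vertex) ** 0.5
        return [p + extrude_dist * c / length for p, c in zip(position, to_vertex)]
    return list(position)

# A vertex facing the light stays in place; a back-facing one is extruded:
print(extrude_vertex([1.0, 0.0, 0.0], [-1.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
print(extrude_vertex([1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]))
```

Since the same test runs for every vertex independently, the whole job maps cleanly onto the vertex pipeline, and the CPU never touches the mesh.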

Does an Athlon XP2200+ coupled with a Geforce 6800Ultra deliver 46.1 fps in Battle of Proxycon?

edit: found the answer. Slightly above 100 fps seems to be a normal result. I don't know how much "tweaking" is required for that, but it's so far above 46.1 that it doesn't matter anyway.
 
OK, let me get this right. You're expecting a CPU that is almost three years old (the Athlon XP 1500+ up to 1800+ was originally released in October 2001) to give you 75+ fps (you stated 46 wasn't enough) in a game that has per-polygon hit detection and ragdoll physics?
 
Doom 3 compares pretty well to older games such as Serious Sam and Unreal Tournament for the high-end CPUs of their day.

In early 2000, the fastest CPUs available (P3 800 and Athlon 850) were getting around 40 fps in Unreal Tournament, and around 90 fps in Expendable. And with those games, performance dropped only about 2-3 fps when moving from 640x480x16 to 1024x768x32, so it would seem those two weren't CPU-limited.

Serious Sam was getting around 90 fps on the fastest CPU available around its release. The same goes for Mercedes-Benz Truck Racing at the same date, regardless of when it was released.

Serious Sam: The Second Encounter was getting around 110 fps on the fastest CPUs about 6 months after it was released.

So it seems reasonable to assume games based on a new engine could get around 90-100 fps on some of the fastest CPUs available, and much less on CPUs that were 2-3 years old.
 
As you might know, that test extrudes the stencil volumes on the GPU, by inserting degenerates into the meshes. This causes massive vertex shader and setup load, but it frees the CPU from performing that task. More GPU load, less CPU load. It is the right balance for a graphics card benchmark. It isn't necessarily the right balance for a game.

Exactly. As FM states, a 9700 card is about 5 times as fast at skinning as a 3 GHz P4 (http://www.futuremark.com/companyinfo/Response_to_3DMark03_discussion.pdf), so why the hell is Carmack skinning on the CPU on 9700+ cards?
That is my entire point: if there was more GPU load and less CPU load, the graphics card would actually be doing something.

Also, as you can see in the graphs here, even on low-end cards the FM approach is not vertex-bound but fillrate-bound, so apparently you can increase the GPU load like this and still be fillrate-bound.

Does an Athlon XP2200+ coupled with a Geforce 6800Ultra deliver 46.1 fps in Battle of Proxycon?

Closer to 100 fps actually.. and that is with more detailed characters than in Doom3.
 
OK, let me get this right. You're expecting a CPU that is almost three years old (the Athlon XP 1500+ up to 1800+ was originally released in October 2001) to give you 75+ fps (you stated 46 wasn't enough) in a game that has per-polygon hit detection and ragdoll physics?

If Max Payne 2 can do it, why can't Doom3?
Max Payne 2 seems to have a whole lot more objects going on at a time than Doom3, yet it doesn't bother the CPU at all. Max Payne 2 is playable at all times; the number of enemies or moving objects seems to have virtually no impact on framerate... not enough to affect gameplay anyway.

Also, the 46 is the AVERAGE framerate in the timedemo, which is the result of the game running very fast in simple halls and very slow in combat with a few enemies. The combat is the problem: it drops so low (< 10 fps) that you can barely see what's going on. I would definitely settle for 46 fps at all times.
 
How do we know what Doom3 is doing on the CPU vs. the GPU? Can someone give me a list of what is being done on the CPU, and what is capable of being done on the GPU?

I'd also like to hear what people think the reasons are for doing things on the CPU... Is it likely that only the latest hardware is flexible enough to do these things on the GPU and maintain the interactivity in the game?
 
How do we know what Doom3 is doing on the CPU vs. the GPU? Can someone give me a list of what is being done on the CPU, and what is capable of being done on the GPU?

In short, I suppose the CPU does mesh skinning, shadow-volume extrusion, AI, physics and sound.
Of these, any GPU with vertex shader support (the only one without it that Doom3 supports is the GF4MX) can do mesh skinning and shadow-volume extrusion. And the sound could be offloaded to chips with 3D DSP features, I suppose. That leaves the CPU doing pretty much only physics and AI.
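To give a feel for the per-vertex cost of the first item, here is an illustrative matrix-palette skinning sketch (plain Python, hypothetical names, not id's actual code): each animated vertex is blended from several bone matrices every frame, which is exactly the arithmetic a vertex shader could absorb instead.

```python
# Illustrative matrix-palette skinning on the CPU (hypothetical names, not
# id's actual code): every animated vertex is blended from a few 3x4 bone
# matrices, a few dozen multiply-adds per vertex, every frame, per mesh.

def transform(matrix, v):
    """Apply a 3x4 affine matrix (rotation + translation) to a point."""
    return [sum(matrix[r][c] * v[c] for c in range(3)) + matrix[r][3]
            for r in range(3)]

def skin_vertex(rest_pos, influences, bone_matrices):
    """Blend bone transforms; influences is a list of (bone_index, weight)."""
    out = [0.0, 0.0, 0.0]
    for bone, weight in influences:
        p = transform(bone_matrices[bone], rest_pos)
        out = [o + weight * c for o, c in zip(out, p)]
    return out

identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
shifted  = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]  # +2 on x
# A vertex weighted half-and-half between the two bones lands halfway between
# the untransformed and the shifted position:
print(skin_vertex([1.0, 0.0, 0.0], [(0, 0.5), (1, 0.5)], [identity, shifted]))
```

Multiply this by thousands of vertices and several characters on screen, and it is easy to see why the work shows up on the CPU side of the profile.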

I'd also like to hear what people think the reasons are for doing things on the CPU..... Is is likely that only the latest hardware is flexible enough to do these things on GPU and maintain the interactivity in the game?

Well, as said above, only one GPU supported by Doom3 doesn't support it at all. The older GPUs (GF3/R8500) may or may not be fast enough... But the GF4+ and R9500+ are well capable of handling it faster than the CPU, as 3dmark03 also shows.
So perhaps this decision was made at a time when the CPU was the better option... But that would mean they weren't looking forward, because obviously GPUs would be capable of it soon. And to boot, GPUs would be able to handle higher polycounts easily. Doom3 is now very low-poly, even on the fastest videocards. This probably has to do with the fact that there is a much larger spread in GPU speed than in CPU speed, so even the fastest CPU can't really handle much more geometry than the minimum required CPU.

I think there should either have been two paths, one CPU and one GPU, or only a GPU path, dropping support for the GF4MX (or letting NVIDIA's driver emulation handle it).
The way it is now doesn't benefit anyone. People with slow CPUs can't play the game well at all, regardless of the GPU they have, because combat is very slow. And people with fast GPUs don't get more detailed geometry (only texture maps), so a can of soda is still 6-sided.
Instead, a large part of the GPU's processing power is simply wasted.
It's a lose-lose situation.
 
My guess would be triangle collision needs the vertices to be skinned.

Yes, but if he uses some less-than-naive approach with bounding boxes or so, he doesn't have to skin every mesh entirely every frame.
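One way such a less-than-naive scheme could look (a sketch under my own assumptions, with hypothetical names, nothing from the Doom3 source): group the triangles into clusters with precomputed bounds, and only CPU-skin the clusters a hit test actually comes near.

```python
# Sketch of a hierarchical scheme (my own assumptions, hypothetical names,
# not from the Doom3 source): triangles are grouped into clusters with
# bounding boxes, and only the clusters a hit query touches get skinned
# on the CPU that frame; everything else can stay on the GPU path.

def aabb_overlap(a_min, a_max, b_min, b_max):
    """Axis-aligned box overlap test, checked per axis."""
    return all(amin <= bmax and bmin <= amax
               for amin, amax, bmin, bmax in zip(a_min, a_max, b_min, b_max))

def clusters_to_skin(clusters, query_min, query_max):
    """Return only the clusters whose bounds touch the query region; the
    rest are never skinned on the CPU this frame."""
    return [c for c in clusters
            if aabb_overlap(c["min"], c["max"], query_min, query_max)]

clusters = [
    {"name": "torso",    "min": [0, 0, 0],  "max": [1, 2, 1]},
    {"name": "left_arm", "min": [-1, 1, 0], "max": [0, 2, 1]},
    {"name": "legs",     "min": [0, -2, 0], "max": [1, 0, 1]},
]
# A bullet near the torso only forces the torso cluster to be skinned:
print([c["name"] for c in clusters_to_skin(clusters, [0.4, 0.5, 0.4], [0.6, 0.7, 0.6])])
```

The point of the design is that per-poly hit detection is kept, but the full-mesh skinning cost is only paid for the small part of the mesh the hit test actually needs.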
 
Infinisearch said:
Scali said:
Exactly. As FM states, a 9700 card is about 5 times as fast at skinning as a 3 GHz P4 (http://www.futuremark.com/companyinfo/Response_to_3DMark03_discussion.pdf), so why the hell is Carmack skinning on the CPU on 9700+ cards?

My guess would be triangle collision needs the vertices to be skinned.

I'm going to disagree with Futuremark on this one and claim it depends on how you implement it. If you use a naive approach, then yes, the GPU is faster; however, if you play to the strengths of the CPU, that's not necessarily true.

In Carmack's case, the decision is most likely to do with edge extrusion for the shadows. He needs the post-transform positions and normals to create the shadow volumes.
 
I'm going to disagree with Futuremark on this one and claim it depends on how you implement it. If you use a naive approach, then yes, the GPU is faster; however, if you play to the strengths of the CPU, that's not necessarily true.

Apparently it is. Carmack's approach is so CPU-intensive that you pretty much need 3+ GHz to not be completely CPU-limited.
And that combines with less-than-impressive framerates.
FutureMark's approach, however, gets high framerates regardless of the CPU. The approach may be naive, but it's the result that counts. If modern GPUs are faster at a naive approach than CPUs are at a smart approach, use the naive approach.
It's hard to say if the Doom3 timedemo is comparable to the Battle of Proxycon gametest, but at any rate, on high-end systems (with a 6800U) they both score ~100 fps... The difference is that Doom3 performs like crap with a CPU of about half the speed, while Proxycon still gets ~100 fps.

In theory you are right, but in practice you aren't. The CPU way simply isn't faster, because GPUs are that much faster at vertex processing anyway. And this gap is only going to grow in the near future, so even if Carmack's decision was right at this time, it would not be future-proof, which is a rather silly decision for an engine that has to power games for the coming 5 years or so.
If Doom3 is going to limit polycount because of its CPU processing while Half-Life 2 or UE3 use the GPU and have much looser limits on content, I don't think many developers will choose to license the Doom3 engine.
 
dropping support for the GF4MX is basically cutting out 50+% of the market....

Yes, but if he uses some less-than-naive approach with bounding boxes or so, he doesn't have to skin every mesh entirely every frame.

It really wouldn't be in the best interest of id to do away with per-poly hit detection considering it's one of the big "features" of the engine. ;)

If Doom3 is going to limit polycount because of its CPU processing while Half-Life 2 or UE3 use the GPU and have much looser limits on content, I don't think many developers will choose to license the Doom3 engine.

Well, the engine certainly isn't ideal for many types of games. Carmack even said that the engine was built to serve the purpose of Doom 3, i.e. a moody, dark, scary game.

It might be worthwhile to mention The Chronicles of Riddick on the Xbox, which does run very well even on its cut-down P3 733...
 
Scali said:
Yes, but if he uses some less-than-naive approach with bounding boxes or so, he doesn't have to skin every mesh entirely every frame.

If you mean do far-field collision with bounding boxes and then do triangle-based near-field collision, then yes, I'd agree with you in terms of collision. You wouldn't have to skin the entire mesh every frame for every model needing triangle-level collision on the CPU; that is to say, you skin on the GPU instead of the CPU if no triangle-level collision is needed for that model.

However, when you take into account shadow silhouette generation, which as mentioned by ERP needs "the post transform positions and normals to create the shadow volumes": extrusion can be done on the GPU rather easily, but silhouette generation on the GPU requires:
1. DX9 hardware with support for floating-point textures
2. Two or three passes
3. A lot of memory for the buffers
4. Serialization with the z-only pass, since the GPU can't do both the z-only pass and the silhouette generation in parallel

All of a sudden you:
1. Lose support for not only DX7 hardware but DX8.x-class hardware as well.
2. Require a really, really good GPU, because you are tripling your vertex load and serializing rendering and silhouette determination.
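For reference, the CPU-side silhouette determination being discussed can be sketched like this (illustrative Python, not the actual Doom3 code): classify each triangle as facing the light or not, then keep the edges shared by one front-facing and one back-facing triangle.

```python
# Sketch of CPU silhouette determination (illustrative, not the actual Doom3
# code): classify each triangle as light-facing or not, then keep the edges
# shared by exactly one front-facing and one back-facing triangle.

def facing_light(tri, light):
    """Plane test: does a CCW-wound triangle (3 points) face the light?"""
    a, b, c = tri
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    # Face normal via cross product, then a dot against the light direction
    normal = [u[1]*v[2] - u[2]*v[1], u[2]*v[0] - u[0]*v[2], u[0]*v[1] - u[1]*v[0]]
    return sum(n * (light[i] - a[i]) for i, n in enumerate(normal)) > 0.0

def silhouette_edges(vertices, triangles, light):
    """triangles: index triples. Returns the edges on the light silhouette."""
    edge_faces = {}
    for tri in triangles:
        front = facing_light([vertices[i] for i in tri], light)
        for e in ((tri[0], tri[1]), (tri[1], tri[2]), (tri[2], tri[0])):
            edge_faces.setdefault(tuple(sorted(e)), []).append(front)
    return [e for e, faces in edge_faces.items()
            if len(faces) == 2 and faces[0] != faces[1]]

# Two coplanar triangles wound in opposite directions stand in for a
# front-facing / back-facing pair; their shared edge is the silhouette:
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
triangles = [(0, 1, 2), (2, 3, 1)]
print(silhouette_edges(vertices, triangles, (0.0, 0.0, 5.0)))
```

Because this walks every triangle of every skinned, post-transform mesh for every shadow-casting light, it is easy to see why it eats CPU time, and why doing it on the GPU instead needs the multi-pass machinery listed above.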
 
You wouldn't have to skin the entire mesh every frame for every model needing triangle-level collision on the CPU; that is to say, you skin on the GPU instead of the CPU if no triangle-level collision is needed for that model.

Since you can process the mesh hierarchically for collision, you still don't need to skin the entire mesh.
Besides, why would you only skin on the GPU when no triangle collision is needed? You can do the collision on the CPU and still do the complete skinning on the GPU.

However, when you take into account shadow silhouette generation, which as mentioned by ERP needs "the post transform positions and normals to create the shadow volumes": extrusion can be done on the GPU rather easily, but silhouette generation on the GPU requires:
1. DX9 hardware with support for floating-point textures
2. Two or three passes
3. A lot of memory for the buffers
4. Serialization with the z-only pass, since the GPU can't do both the z-only pass and the silhouette generation in parallel

Yes, but why would you want to do this in the first place?
You can simply do brute-force extrusion like Battle of Proxycon, which works on vs1.1 and is very fast anyway.
As I said before: if the naive way is faster, do the naive way.
 