The Technology of a 3D Engine - Part One

Thanks :)

That's really just the introduction article; the others dive into details rather quickly. There are already a couple of them down the pipe, they should come "soon" ;)
 
Nice read, I don't like programming but it's very interesting to learn more about how an engine works.

Something I didn't totally understand was this:

A more interesting approach is to choose neither of them, and to write an abstraction layer which will hide all API-specific code inside a module, making the engine API agnostic. With such a layer, the engine will be able to use the best API for a given system, to ensure high performance. The drawback of having an abstract renderer interface is that it must target the least common denominator of the APIs it'll be hiding, or the engine will need some tweaks to target some platforms. Still, since the code is nicely encapsulated, changes, even engine-wide, will be much easier to deal with.

In what language is your engine written? Do you write in language X and code the layer to translate everything the engine does into either OpenGL or DX?
 
Nice read, I don't like programming but it's very interesting to learn more about how an engine works.

Something I didn't totally understand was this:

In what language is your engine written? Do you write in language X and code the layer to translate everything the engine does into either OpenGL or DX?

OpenGL and DirectX are not programming languages. Both are libraries you can use from different programming languages (C, C++, C#, Java, etc.). The interfaces of OpenGL and DirectX differ a bit, but both provide more or less the same functionality. The idea is to create a common interface that provides the engine with all the graphics hardware functionality it needs. Behind this interface is the renderer module that communicates with the graphics API (or the hardware directly). No other parts of the engine have any access to the graphics API or the hardware. This way all the other engine parts (scene management/culling, texture/resource management, animation, object/scene loaders, etc.) are completely graphics API and platform independent. This is the way most cross-platform graphics engines access the API/hardware.
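
To make that concrete, here's a minimal C++ sketch of what such an interface could look like (my own illustration, not code from the article; the class and method names are made up):

Code:
// Minimal sketch of an API-agnostic renderer interface.  The rest of the
// engine only ever talks to IRenderer; the concrete implementation is
// chosen once, at startup, depending on the platform.
#include <memory>

class Mesh;   // engine-side, API-neutral geometry description

class IRenderer
{
public:
    virtual ~IRenderer() = default;
    virtual void BeginFrame() = 0;
    virtual void DrawMesh(const Mesh& mesh) = 0;
    virtual void EndFrame() = 0;
};

// Direct3D calls live only inside this class...
class D3DRenderer : public IRenderer { /* ... */ };
// ...and OpenGL calls only inside this one.
class OpenGLRenderer : public IRenderer { /* ... */ };

// Factory: the only place that knows which backends exist.
std::unique_ptr<IRenderer> CreateRenderer(bool preferD3D);

Everything above the interface (scene management, culling, resource loading) passes around Mesh objects and IRenderer pointers and never sees a single OpenGL or Direct3D type.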
 
Has anyone sent this article to 3DRealms and the Duke Nukem Forever team, yet? ;)
 
Thanks for sharing your knowledge. I really don't know anything about 3D engines, but it's a very interesting subject to explore. I'm looking forward to reading the next part!

Great site! This is my first post!
 
On one hand you have Direct3D, pushed by Microsoft, with a rather nice interface (in its 9th and 10th versions), but suffering from a severe draw call issue. Draw calls on certain Direct3D platforms force a kernel context switch, which ultimately carries a performance cost.
No longer true for Vista (and therefore not true for D3D10 at all). Please fix.

And it's an urban legend that the kernel context switch is what makes it that slow. It's the encoding of API commands into an intermediate form and their subsequent decoding in the driver (that's what makes a Direct3D 9 driver understand the Direct3D 1 API without any extra code, btw).
 
No longer true for Vista (and therefore not true for D3D10 at all). Please fix.
So draw calls are completely free on Vista now? Methinks not.

As for it not being the context switch that's the issue with 9, it certainly is on XP. There's (almost) no expensive format conversion done in D3D9 titles whatsoever, and you hit the call limit precisely because it's an in-kernel operation under XP (and 2K/9x).
 
Has anyone measured batch count limits on XP vs. Vista using the same DX9 code? Both the ring transition and the DP2 buffering are gone in Vista (which unfortunately means we can't isolate them), so I'd imagine draw overhead would be lower on Vista. But I haven't seen any data.
 
So draw calls are completely free on Vista now? Methinks not.

Not free, but cheaper. Then again, OpenGL draw calls were never free on XP either.

As for it not being the context switch that's the issue with 9, it certainly is on XP. There's (almost) no expensive format conversion done in D3D9 titles whatsoever, and you hit the call limit precisely because it's an in-kernel operation under XP (and 2K/9x).

I am not sure what you mean by "format conversion" here. The DP2 encoding/decoding is certainly part of this story. But this was done on 9x too, and the only difference I know of compared with 2K is that 2K requires the context switch and 9x does not. I have the feeling that later runtimes increased the size of the command buffer to reduce the number of context switches needed. Additionally, game developers became more sensitive to the issue.

Has anyone measured batch count limits on XP vs. Vista using the same DX9 code? Both the ring transition and the DP2 buffering are gone in Vista (which unfortunately means we can't isolate them), so I'd imagine draw overhead would be lower on Vista. But I haven't seen any data.

The ring transition is not gone. That may happen with WDDM 2.x. With WDDM 1.0 it is still required, as the kernel-mode graphics subsystem and the driver need to replace memory handles with real memory addresses. VRAM swapping is done there too if necessary. But the transitions probably occur less often, as the buffers are larger and the internal GPU command format may need less room than the DP2 tokens.

I am not sure DP2 encoding is gone either. If you have a look at the WDK samples, you can see that the sample 8500 driver uses the XP DP2 decoder and a custom DP2 encoder. Maybe this technique is used in some released drivers too; at least it makes it possible to reuse large parts of the XP drivers that are still being developed.

I have seen some big overhead differences between 9 and 10 on Vista on my 8800 development rig, but I am not sure whether this is down to different command handling. Maybe there are some differences in the handling of SM3 and SM4.
 
I am not as knowledgeable as you guys are in this forum, but I had a question... all these so-called DirectX 10 games... how come they are not showing the performance that was touted by Microsoft? They were saying the games would run a lot faster because the driver overhead would be non-existent, or something like that. Doesn't it seem that in a lot of benchmarks the DX9 version is faster than the DX10 version? Is it because drivers and hardware are not optimized for DX10? Or are programmers not yet used to designing games for DX10? Just trying to understand...

Also, another question... when they say that an engine scales really well... I guess a better way to put it is COD4 vs Crysis. I have seen screenshots and accounts from people I know who have played both Crysis and COD4, and they say COD4 was brilliant-looking and ran on their systems just fine, whereas Crysis brought their systems to their knees. Just trying to understand what makes Crysis's engine so heavy when COD4 breezes through while looking just as good on lower-end systems?
 
Suryad, a lot of it is down to apples-and-oranges comparisons. The DX10-capable games are not rendering the different modes with exactly the same image settings. None of the current DX10-capable games were developed with DX10 in mind; it was more of an afterthought or bolt-on, and as such any performance improvements will be limited.
 
DX10 does have lower CPU overhead than DX9 for an equivalent set of rendering commands. I've measured this on Nvidia hardware and I believe it's true for AMD as well. But that only matters if the game is CPU-limited; otherwise the reduced overhead won't affect the framerate at all.

Some of the new DX10 features, like texture/rendertarget arrays and geometry shaders, also make it possible to do the same thing as in DX9 except more efficiently (for the GPU). This helps games that are GPU-limited, but only if they actually use those features. The problem here is that most of the "DX10 games" so far are really DX9 games with a DX10 path thrown in as an afterthought (the others are Xbox360 ports) -- the developers haven't invested in taking advantage of the new DX10 features for the effects that are also in the DX9 version; they only use DX10-specific features for the additional DX10-only effects. So the DX10 version does all the same stuff as the DX9 version in the same way as DX9 (and therefore with the same performance), and then does extra DX10-specific stuff. That gets you extra eye candy but not better performance...

The final problem is that trying to do things the DX9 way in DX10 isn't always efficient. Constant buffers are the biggest problem here (look at all of the MS, AMD, and NV developer presentations about DX10 -- they all harp about this) -- in DX9 you only have one constant buffer but you can update individual constants efficiently. In DX10 you have lots of constant buffers, but you can't change individual elements of a buffer -- you have to rewrite the entire thing each time. The efficient way to manage this in DX9 is very inefficient if translated directly to DX10, and the efficient way to do it in DX10 is not possible in DX9. This makes it very hard for games that use both DX9 and DX10 to optimize for both -- and since they're mostly developed on DX9, that's what they optimize for.
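
To make the constant-buffer difference concrete, here's a rough C++ sketch (my own illustration; the API calls are real D3D9/D3D10 ones, but the buffer layout, register number and function names are invented):

Code:
#include <d3d9.h>
#include <d3d10.h>
#include <cstring>

// Hypothetical per-object constants, mirrored in a cbuffer on the D3D10 side.
struct PerObjectConstants
{
    float world[16];   // 4x4 world matrix
    float tint[4];
};

// D3D9 style: poke just the registers that changed.
void SetTintD3D9(IDirect3DDevice9* dev, const float tint[4])
{
    dev->SetVertexShaderConstantF(16, tint, 1);   // register c16 is an arbitrary choice
}

// D3D10 style: even if only 'tint' changed, the whole buffer gets rewritten.
void SetTintD3D10(ID3D10Device* dev, ID3D10Buffer* cb,
                  PerObjectConstants& shadow, const float tint[4])
{
    std::memcpy(shadow.tint, tint, sizeof(shadow.tint));
    dev->UpdateSubresource(cb, 0, NULL, &shadow, 0, 0);   // full buffer update
    dev->VSSetConstantBuffers(0, 1, &cb);
}

Splitting constants into several smaller buffers grouped by update frequency (per-frame, per-material, per-object) is the usual D3D10 answer, but as noted above, that restructuring is exactly what a DX9-first engine rarely gets around to.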
 
The final problem is that trying to do things the DX9 way in DX10 isn't always efficient. Constant buffers are the biggest problem here (look at all of the MS, AMD, and NV developer presentations about DX10 -- they all harp about this) -- in DX9 you only have one constant buffer but you can update individual constants efficiently. In DX10 you have lots of constant buffers, but you can't change individual elements of a buffer -- you have to rewrite the entire thing each time. The efficient way to manage this in DX9 is very inefficient if translated directly to DX10, and the efficient way to do it in DX10 is not possible in DX9. This makes it very hard for games that use both DX9 and DX10 to optimize for both -- and since they're mostly developed on DX9, that's what they optimize for.

Yes, the constant buffer transfer limit is a nasty thing. You can very easily stall your whole GPU with processing constant buffer updates. The SM4 compiler makes your life here even harder, as there is no option to remove unused constants like the SM2/3 compiler has. This can be a big problem if you are trying to use the same shader base for 9 and 10.

But D3D9 has its own problems here too. Updating too many constants individually will cost you CPU time. If you can do it as a block operation, it is normally better.

Another thing that can give you headaches when going from 9 to 10 is render state changes. While 9 allows changing each state individually, 10 only supports state objects that bundle multiple states together. Unfortunately, most engines are designed around single state changes. If you want to add Direct3D 10 without changing too much, you are forced to implement state object lookups, and that will eat up most of the CPU cycles the new Direct3D 10 state system could save you.
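
For anyone curious what such a lookup looks like, here's a rough C++ sketch (my own illustration, not code from any engine discussed here; only rasterizer state is shown). The engine keeps accepting D3D9-style single state changes into a shadow description, and right before a draw it maps the accumulated description to a cached D3D10 state object, creating one on a miss.

Code:
#include <d3d10.h>
#include <cstring>
#include <map>

class RasterStateCache
{
public:
    RasterStateCache()
    {
        std::memset(&desc_, 0, sizeof(desc_));
        desc_.FillMode = D3D10_FILL_SOLID;
        desc_.CullMode = D3D10_CULL_BACK;
        desc_.DepthClipEnable = TRUE;
    }

    // D3D9-style entry points: just record the change in the shadow desc.
    void SetFillMode(D3D10_FILL_MODE mode) { desc_.FillMode = mode; dirty_ = true; }
    void SetCullMode(D3D10_CULL_MODE mode) { desc_.CullMode = mode; dirty_ = true; }

    // Called right before a draw: look up (or create) the matching state object.
    void Apply(ID3D10Device* dev)
    {
        if (!dirty_) return;
        Key key;
        std::memcpy(key.bytes, &desc_, sizeof(desc_));
        ID3D10RasterizerState*& state = cache_[key];
        if (!state)
            dev->CreateRasterizerState(&desc_, &state);   // cache miss: create once, reuse forever
        dev->RSSetState(state);
        dirty_ = false;
    }

private:
    // Simplified key: raw bytes of the desc (a real engine would hash the fields explicitly).
    struct Key
    {
        unsigned char bytes[sizeof(D3D10_RASTERIZER_DESC)];
        bool operator<(const Key& o) const
        { return std::memcmp(bytes, o.bytes, sizeof(bytes)) < 0; }
    };

    D3D10_RASTERIZER_DESC desc_;
    bool dirty_ = true;
    std::map<Key, ID3D10RasterizerState*> cache_;
};

The lookup and compare work on every draw is exactly the overhead being described: the cache itself costs CPU, so much of what D3D10's precompiled state objects would have saved gets spent keeping the old single-state interface alive.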
 
I am not as knowledgeable as you guys are in this forum, but I had a question... all these so-called DirectX 10 games... how come they are not showing the performance that was touted by Microsoft? They were saying the games would run a lot faster because the driver overhead would be non-existent, or something like that. Doesn't it seem that in a lot of benchmarks the DX9 version is faster than the DX10 version? Is it because drivers and hardware are not optimized for DX10? Or are programmers not yet used to designing games for DX10? Just trying to understand...

Also, another question... when they say that an engine scales really well... I guess a better way to put it is COD4 vs Crysis. I have seen screenshots and accounts from people I know who have played both Crysis and COD4, and they say COD4 was brilliant-looking and ran on their systems just fine, whereas Crysis brought their systems to their knees. Just trying to understand what makes Crysis's engine so heavy when COD4 breezes through while looking just as good on lower-end systems?

I understand where you're coming from here, and I absolutely agree that the messaging was confusing on this point for much of 2006 and earlier. I think it turns out that the answers to both questions above are roughly the same: the overhead of DX9 limited developers to a certain number of objects on screen before it became performance-prohibitive, but below that number it was manageable. So to get the performance advantage touted for DX10, you first need a game that goes beyond those old limits. So the performance point and the scalability point are actually the same point...
 