Old 21-Dec-2007, 17:10   #1
B3D News
Beyond3D News
 
Join Date: May 2007
Posts: 440
Default The Technology of a 3D Engine - Part One

Beyond3D is very proud to present the first part of an on-going series featuring one man's thoughts and experience on modern engine development. Read the full news item
B3D News is offline   Reply With Quote
Old 21-Dec-2007, 22:43   #2
suryad
Senior Member
 
Join Date: Aug 2004
Posts: 2,454
Default

Awesome read! I am looking forward to the next part!
suryad is offline   Reply With Quote
Old 22-Dec-2007, 04:00   #3
JeGX
Junior Member
 
Join Date: Jan 2007
Posts: 11
Default

Yep, nice initiative! I hope we'll see some cool screenshots of FlExtEngine in the next part.
JeGX is offline   Reply With Quote
Old 25-Dec-2007, 11:53   #4
Rodéric
a.k.a. Ingenu
 
Join Date: Feb 2002
Location: Apsley, U.K.
Posts: 2,729
Default

Thanks

That's really just the introductory article; the others dive into details rather quickly. A couple of them are already in the pipeline and should come "soon".
__________________
So many things to do, and yet so little time to spend...
Rodéric is online now   Reply With Quote
Old 27-Dec-2007, 01:54   #5
MrGaribaldi
Member
 
Join Date: Nov 2002
Location: In transit
Posts: 604
Default

Interesting. I'm looking forward to the next installments, as I'm somewhat interested in certain engine aspects myself.
__________________
"Artificial Intelligence can never replace Human Stupidity"
MrGaribaldi is offline   Reply With Quote
Old 27-Dec-2007, 15:02   #6
tongue_of_colicab
Senior Member
 
Join Date: Oct 2004
Location: The Netherlands
Posts: 2,231
Default

Nice read. I don't like programming, but it's very interesting to learn more about how an engine works.

Something I didn't totally understand was this:

Quote:
A more interesting approach is to choose neither of them, and to write an abstraction layer which will hide all API-specific code inside a module, making the engine API-agnostic. With such a layer, the engine will be able to use the best API for a given system, to ensure high performance. The drawback of having an abstract renderer interface is that it must target the least common denominator of the APIs it'll be hiding, or the engine will need some tweaks to target some platforms. Still, since the code is nicely encapsulated, changes, even engine-wide ones, will be much easier to deal with.
In what language is your engine written? Do you write in language X and code the layer to translate all the calls in the engine to either OpenGL or DirectX?
__________________
I cut an elderly woman off and she spun out and crashed... but its alright... cause I've got a Jaaaaag
tongue_of_colicab is offline   Reply With Quote
Old 27-Dec-2007, 19:23   #7
sebbbi
Member
 
Join Date: Nov 2007
Posts: 939
Default

Quote:
Originally Posted by tongue_of_colicab View Post
Nice read. I don't like programming, but it's very interesting to learn more about how an engine works.

Something I didn't totally understand was this:

In what language is your engine written? Do you write in language X and code the layer to translate all the calls in the engine to either OpenGL or DirectX?
OpenGL and DirectX are not programming languages. Both are libraries you can use from different programming languages (C, C++, C#, Java, etc.). The interfaces of OpenGL and DirectX differ a bit, but both provide more or less the same functionality. The idea is to create a common interface that gives the engine all the graphics hardware functionality it needs. Behind this interface is the renderer module, which communicates with the graphics API (or with the hardware directly). No other parts of the engine have any access to the graphics API or the hardware. This way all other engine parts (scene management/culling, texture/resource management, animation, object/scene loaders, etc.) are completely independent of the graphics API and the platform. This is the way most cross-platform graphics engines access the API/hardware.
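
A minimal C++ sketch of the idea described above may help. Every name in it (`Renderer`, `D3DRenderer`, `GLRenderer`, `createRenderer`, `renderScene`) is hypothetical and not taken from the article or from any real engine, and the two backends only count calls where a real one would talk to Direct3D or OpenGL.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical interface: the only view of the graphics API the engine gets.
class Renderer {
public:
    virtual ~Renderer() = default;
    virtual std::string backendName() const = 0;
    virtual void drawMesh(int meshId) = 0;
    virtual int drawCallCount() const = 0;
};

// Stand-in for a Direct3D backend; a real one would issue D3D calls here.
class D3DRenderer : public Renderer {
public:
    std::string backendName() const override { return "Direct3D"; }
    void drawMesh(int meshId) override { assert(meshId >= 0); ++calls_; }
    int drawCallCount() const override { return calls_; }
private:
    int calls_ = 0;
};

// Stand-in for an OpenGL backend; a real one would issue GL calls here.
class GLRenderer : public Renderer {
public:
    std::string backendName() const override { return "OpenGL"; }
    void drawMesh(int meshId) override { assert(meshId >= 0); ++calls_; }
    int drawCallCount() const override { return calls_; }
private:
    int calls_ = 0;
};

enum class Backend { Direct3D, OpenGL };

// The one place that knows which concrete backends exist.
std::unique_ptr<Renderer> createRenderer(Backend backend) {
    if (backend == Backend::Direct3D) {
        return std::make_unique<D3DRenderer>();
    }
    return std::make_unique<GLRenderer>();
}

// Engine-side code: it sees only the Renderer interface, never the API behind it.
void renderScene(Renderer& renderer, const std::vector<int>& visibleMeshes) {
    for (int id : visibleMeshes) {
        renderer.drawMesh(id);
    }
}
```

Calling `renderScene(*createRenderer(Backend::OpenGL), meshes)` and `renderScene(*createRenderer(Backend::Direct3D), meshes)` runs the same engine code against either backend; only `createRenderer` changes when a new platform is added.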
sebbbi is online now   Reply With Quote
Old 27-Dec-2007, 23:20   #8
BRiT
...
 
Join Date: Feb 2002
Location: Cleveland
Posts: 4,282
Beyond3D

Has anyone sent this article to 3DRealms and the Duke Nukem Forever team, yet?
__________________
IBSL: 2835, 6541, 8531, 9299, 20484, 86985, 87130
FBSL: 7221, 9255, 15892, 20484
BRiT is online now   Reply With Quote
Old 28-Dec-2007, 01:08   #9
Vincent
Member
 
Join Date: May 2007
Location: London
Posts: 235
Default

Quote:
Originally Posted by BRiT View Post
Has anyone sent this article to 3DRealms and the Duke Nukem Forever team, yet?
Vincent is offline   Reply With Quote
Old 28-Dec-2007, 14:01   #10
Pantagruel's Friend
Junior Member
 
Join Date: Jun 2007
Location: Budapest, Hungary
Posts: 59
Default

Nice appetizer, let's see the first course

I like the way the article is structured - and also the approach to building software.
Pantagruel's Friend is offline   Reply With Quote
Old 29-Dec-2007, 06:37   #11
Gumbovariations
Registered
 
Join Date: Dec 2007
Location: Buenos Aires, ARG
Posts: 1
Default

Thanks for sharing your knowledge. I really don't know anything about 3D engines, but it's a very interesting subject to explore. I'm looking forward to reading the next part!

Great site! This is my first post!
Gumbovariations is offline   Reply With Quote
Old 31-Dec-2007, 03:42   #12
Novum
Member
 
Join Date: Jun 2006
Location: Germany
Posts: 284
Default

Quote:
On one hand you have Direct3D, pushed by Microsoft, with a rather nice interface (in its 9th and 10th versions), but suffering from a severe draw call issue. Draw calls on certain Direct3D platforms force a kernel context switch, which has an ultimate performance cost.
No longer true for Vista (and therefore not true for D3D10 at all). Please fix.

And it's an urban legend that the kernel context switch is what makes it that slow. It's the encoding of API commands into an intermediate form and their subsequent decoding in the driver (which, by the way, is what lets a Direct3D 9 driver understand the Direct3D 1 API without any extra code).

Last edited by Novum; 31-Dec-2007 at 04:55.
Novum is offline   Reply With Quote
Old 31-Dec-2007, 10:39   #13
Rys
Tiled
 
Join Date: Oct 2003
Location: Kings Langley, UK
Posts: 2,675
Default

Quote:
Originally Posted by Novum View Post
No longer true for Vista (and therefore not true for D3D10 at all). Please fix.
So draw calls are completely free on Vista now? Methinks not.

As for it not being the context switch that's the issue with 9, it certainly is on XP. There's (almost) no expensive format conversion done in D3D9 titles whatsoever, and you hit the call limit precisely because it's an in-kernel operation under XP (and 2K/9x).
__________________
A major redesign of the core ALU pineapple boomerang fortress.
Rys is offline   Reply With Quote
Old 31-Dec-2007, 12:14   #14
armchair_architect
Member
 
Join Date: Nov 2006
Posts: 128
Default

Has anyone measured batch count limits on XP vs. Vista using the same DX9 code? Both the ring transition and the DP2 buffering are gone in Vista (which unfortunately means we can't isolate them), so I'd imagine draw overhead would be lower on Vista. But I haven't seen any data.
armchair_architect is offline   Reply With Quote
Old 31-Dec-2007, 14:58   #15
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,326
Default

Quote:
Originally Posted by Rys View Post
So draw calls are completely free on Vista now? Methinks not.
Not free, but cheaper. But OpenGL draw calls were never free on XP either.

Quote:
Originally Posted by Rys View Post
As for it not being the context switch that's the issue with 9, it certainly is on XP. There's (almost) no expensive format conversion done in D3D9 titles whatsoever, and you hit the call limit precisely because it's an in-kernel operation under XP (and 2K/9x).
I am not sure what you mean by "format conversion" here. The DP2 encoding/decoding is certainly part of this story. But this was done on 9x too, and the only difference I know of is that 2K requires the context switch and 9x does not. I have the feeling that later runtimes increased the size of the command buffer to reduce the number of context switches needed. Additionally, game developers have become more sensitive to the issue.
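
To illustrate why a larger command buffer means fewer context switches, here is a toy C++ model. It assumes one kernel transition per buffer hand-off; `CommandBuffer` and `transitionsFor` are hypothetical names, and nothing here reflects how the actual Direct3D runtime is implemented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical user-mode command buffer: commands are queued and handed to
// the kernel in a single transition when the buffer fills or is flushed.
class CommandBuffer {
public:
    explicit CommandBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Queue one command; hand the buffer over once it is full.
    void submit() {
        if (++pending_ == capacity_) {
            flush();
        }
    }

    // Hand over whatever is queued; an empty flush costs nothing.
    void flush() {
        if (pending_ > 0) {
            ++transitions_;
            pending_ = 0;
        }
    }

    std::size_t transitions() const { return transitions_; }

private:
    std::size_t capacity_;
    std::size_t pending_ = 0;
    std::size_t transitions_ = 0;
};

// Number of kernel transitions needed for one frame of `drawCalls` commands.
std::size_t transitionsFor(std::size_t drawCalls, std::size_t capacity) {
    CommandBuffer buffer(capacity);
    for (std::size_t i = 0; i < drawCalls; ++i) {
        buffer.submit();
    }
    buffer.flush();  // end of frame
    return buffer.transitions();
}
```

In this model 1000 draw calls cost 1000 transitions with a one-entry buffer but only 16 with a 64-entry buffer, which is the effect a larger command buffer would have.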

Quote:
Originally Posted by armchair_architect View Post
Has anyone measured batch count limits on XP vs. Vista using the same DX9 code? Both the ring transition and the DP2 buffering are gone in Vista (which unfortunately means we can't isolate them), so I'd imagine draw overhead would be lower on Vista. But I haven't seen any data.
The ring transition is not gone. That may happen with WDDM 2.x. With WDDM 1.0 it is still required, as the kernel-mode graphics subsystem and the driver need to replace memory handles with real memory addresses. VRAM swapping is done there too if necessary. But transitions probably occur less often, as the buffers are larger and the internal GPU command format may need less room than the DP2 tokens.

I am not sure whether DP2 encoding is gone. If you have a look at the WDK samples, you can see that the sample 8500 driver uses the XP DP2 decoder and a custom DP2 encoder. Maybe this technique is used in some released drivers too. At least it makes it possible to keep using big parts of the XP drivers, which are still being developed.

I have seen some big overhead differences between 9 and 10 on Vista on my 8800 development rig. But I am not sure if this is based on different command handling. Maybe there are some differences in the handling of SM3 and SM4.
__________________
GPU blog
Demirug is offline   Reply With Quote
Old 01-Jan-2008, 04:41   #16
suryad
Senior Member
 
Join Date: Aug 2004
Posts: 2,454
Default

I am not as knowledgeable as you guys are in this forum, but I had a question... all these so-called DirectX 10 games... how come they are not reflecting the performance that was being touted by Microsoft? They were saying that games would run a lot faster because the driver overhead would be non-existent, or something like that. Does it not seem that in a lot of benchmarks the DX 9 version is faster than the DX 10 version? Is it because drivers and the hardware are not optimized for DX 10? Or are programmers designing games not used to DX 10? Just trying to understand...

Also another question... when they say that an engine scales really well... well, I guess a better way to put it is COD 4 vs Crysis. I have seen screenies and accounts from people I know who have played both Crysis and COD4, and they say COD4 was brilliant looking and ran on their system just fine, whereas Crysis brought their system to its knees. Just trying to understand what makes the Crysis engine so heavy when COD4 breezes through while looking just as good on lower-end systems?
suryad is offline   Reply With Quote
Old 01-Jan-2008, 05:51   #17
BRiT
...
 
Join Date: Feb 2002
Location: Cleveland
Posts: 4,282
Default

Suryad, a lot of it is down to apples-and-oranges comparisons. The DX10-capable games are not rendering the different modes with the exact same image settings. None of the current DX10-capable games were developed with DX10 in mind. It was more of an afterthought or bolt-on, and as such any performance improvements will be limited.
__________________
IBSL: 2835, 6541, 8531, 9299, 20484, 86985, 87130
FBSL: 7221, 9255, 15892, 20484
BRiT is online now   Reply With Quote
Old 01-Jan-2008, 07:53   #18
armchair_architect
Member
 
Join Date: Nov 2006
Posts: 128
Default

DX10 does have lower CPU overhead than DX9 for an equivalent set of rendering commands. I've measured this on Nvidia hardware and I believe it's true for AMD as well. But that only matters if the game is CPU-limited; otherwise the reduced overhead won't affect the framerate at all.

Some of the new DX10 features, like texture/rendertarget arrays and geometry shaders, also make it possible to do the same thing as in DX9 except more efficiently (for the GPU). This helps games that are GPU-limited, but only if they actually use those features. The problem here is that most of the "DX10 games" so far are really DX9 games with a DX10 path thrown in as an afterthought (the others are Xbox360 ports) -- the developers haven't invested in taking advantage of the new DX10 features for the effects that are also in the DX9 version; they only use DX10-specific features for the additional DX10-only effects. So the DX10 version does all the same stuff as the DX9 version in the same way as DX9 (and therefore with the same performance), and then does extra DX10-specific stuff. That gets you extra eye candy but not better performance...

The final problem is that trying to do things the DX9 way in DX10 isn't always efficient. Constant buffers are the biggest problem here (look at all of the MS, AMD, and NV developer presentations about DX10 -- they all harp about this) -- in DX9 you only have one constant buffer but you can update individual constants efficiently. In DX10 you have lots of constant buffers, but you can't change individual elements of a buffer -- you have to rewrite the entire thing each time. The efficient way to manage this in DX9 is very inefficient if translated directly to DX10, and the efficient way to do it in DX10 is not possible in DX9. This makes it very hard for games that use both DX9 and DX10 to optimize for both -- and since they're mostly developed on DX9, that's what they optimize for.
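
The constant-update mismatch can be put into rough numbers with a toy C++ model. It only counts bytes written, under the assumption that each constant is one 16-byte float4 and that a DX10-style buffer is always rewritten whole; it calls no real Direct3D functions. Splitting constants into separate buffers by update frequency is the usual DX10 advice rather than something spelled out in this post, and all names and sizes are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kBytesPerConstant = 16;  // one float4 register

// DX9-style model: constants can be written one at a time.
struct RegisterFile {
    std::size_t bytesUploaded = 0;
    void setConstant(std::size_t /*index*/) { bytesUploaded += kBytesPerConstant; }
};

// DX10-style model: any change rewrites the whole buffer.
struct ConstantBuffer {
    std::size_t constants;
    std::size_t bytesUploaded = 0;
    explicit ConstantBuffer(std::size_t n) : constants(n) {}
    void update() { bytesUploaded += constants * kBytesPerConstant; }
};

// DX9 way: each object writes only the constants that changed.
std::size_t dx9Style(std::size_t objects, std::size_t perObject) {
    RegisterFile regs;
    for (std::size_t o = 0; o < objects; ++o) {
        for (std::size_t c = 0; c < perObject; ++c) {
            regs.setConstant(c);
        }
    }
    return regs.bytesUploaded;
}

// Naive port: all constants live in one buffer that is rewritten per object.
std::size_t dx10Naive(std::size_t objects, std::size_t total) {
    ConstantBuffer all(total);
    for (std::size_t o = 0; o < objects; ++o) {
        all.update();
    }
    return all.bytesUploaded;
}

// Split by update frequency: the shared buffer is written once per frame,
// and a small per-object buffer is rewritten for every object.
std::size_t dx10Split(std::size_t objects, std::size_t perObject, std::size_t total) {
    assert(perObject <= total);
    ConstantBuffer shared(total - perObject);
    ConstantBuffer object(perObject);
    shared.update();
    for (std::size_t o = 0; o < objects; ++o) {
        object.update();
    }
    return shared.bytesUploaded + object.bytesUploaded;
}
```

With 100 objects, 64 constants in total and 4 changing per object, the model gives 6,400 bytes for the DX9 pattern, 102,400 bytes for the naive DX10 port and 7,360 bytes for the split layout. Real costs depend on the driver and hardware, so treat these only as an illustration of the shape of the problem.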
armchair_architect is offline   Reply With Quote
Old 01-Jan-2008, 11:34   #19
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,326
Default

Quote:
Originally Posted by armchair_architect View Post
The final problem is that trying to do things the DX9 way in DX10 isn't always efficient. Constant buffers are the biggest problem here (look at all of the MS, AMD, and NV developer presentations about DX10 -- they all harp about this) -- in DX9 you only have one constant buffer but you can update individual constants efficiently. In DX10 you have lots of constant buffers, but you can't change individual elements of a buffer -- you have to rewrite the entire thing each time. The efficient way to manage this in DX9 is very inefficient if translated directly to DX10, and the efficient way to do it in DX10 is not possible in DX9. This makes it very hard for games that use both DX9 and DX10 to optimize for both -- and since they're mostly developed on DX9, that's what they optimize for.
Yes, the constant buffer transfer limit is a nasty thing. You can very easily stall your whole GPU on processing constant buffer updates. The SM4 compiler makes your life even harder here, as there is no option to remove unused constants the way the SM2/3 compiler does. This can be a big problem if you are trying to use the same shader base for 9 and 10.

But D3D9 has its own problems here too. Updating too many constants individually will cost you CPU time. If you can do it as a block operation, that is normally better.

Another thing that can give you headaches when going from 9 to 10 is render state changes. While 9 allows changing each state individually, 10 supports only state objects that bundle multiple states together. Unfortunately, most engines are designed for single state changes. If you want to add Direct3D 10 without changing too much, you are forced to implement state object lookups. This will eat up most of the CPU cycles that the new Direct3D 10 state system can save you.
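
A state object lookup of the kind described above could look roughly like this C++ sketch. `BlendStates`, `StateObject` and `StateObjectCache` are hypothetical stand-ins, and the cache hands out plain ids where a real engine would create and store Direct3D 10 state objects.

```cpp
#include <cassert>
#include <map>
#include <tuple>

// Hypothetical subset of the render states an engine sets one by one.
struct BlendStates {
    bool blendEnable = false;
    int srcBlend = 0;
    int dstBlend = 0;
};

// Stand-in for an immutable API state object.
struct StateObject {
    int id;
};

class StateObjectCache {
public:
    // Returns the cached object for this combination of states,
    // creating it the first time the combination is seen.
    const StateObject& lookup(const BlendStates& s) {
        auto key = std::make_tuple(s.blendEnable, s.srcBlend, s.dstBlend);
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, StateObject{nextId_++}).first;
            ++created_;
        }
        assert(it != cache_.end());
        ++lookups_;
        return it->second;
    }

    int created() const { return created_; }
    int lookups() const { return lookups_; }

private:
    std::map<std::tuple<bool, int, int>, StateObject> cache_;
    int nextId_ = 0;
    int created_ = 0;
    int lookups_ = 0;
};
```

The engine keeps changing individual fields of `BlendStates` as before and calls `lookup` at draw time. The map search on every draw is exactly the extra CPU work mentioned above: it is paid per draw call even when the object already exists.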
__________________
GPU blog
Demirug is offline   Reply With Quote
Old 01-Jan-2008, 16:17   #20
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156
Default

Quote:
Originally Posted by suryad View Post
I am not as knowledgeable as you guys are in this forum, but I had a question... all these so-called DirectX 10 games... how come they are not reflecting the performance that was being touted by Microsoft? They were saying that games would run a lot faster because the driver overhead would be non-existent, or something like that. Does it not seem that in a lot of benchmarks the DX 9 version is faster than the DX 10 version? Is it because drivers and the hardware are not optimized for DX 10? Or are programmers designing games not used to DX 10? Just trying to understand...

Also another question... when they say that an engine scales really well... well, I guess a better way to put it is COD 4 vs Crysis. I have seen screenies and accounts from people I know who have played both Crysis and COD4, and they say COD4 was brilliant looking and ran on their system just fine, whereas Crysis brought their system to its knees. Just trying to understand what makes the Crysis engine so heavy when COD4 breezes through while looking just as good on lower-end systems?
I understand where you're coming from here, and I absolutely agree that the messaging was confusing on this point for much of 2006 and earlier. I think it turns out that the answer to both questions above is roughly the same: the overhead of DX9 limited developers to a certain number of objects on the screen before the overhead became performance-prohibitive, but below that number it was manageable. So to get the performance advantage touted for DX10, you first need a game that goes beyond those old limits. So the performance point and the scalability point are actually the same point...
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Geo is offline   Reply With Quote
Old 01-Jan-2008, 22:10   #21
Davros
Darlek ******
 
Join Date: Jun 2004
Posts: 9,497
Default

So the only way we will get these massive speedups promised by DX10 is if the dev says "sod DX9"?
Davros is offline   Reply With Quote
Old 02-Jan-2008, 00:06   #22
Demirug
Senior Member
 
Join Date: Dec 2002
Posts: 1,326
Default

Quote:
Originally Posted by Davros View Post
So the only way we will get these massive speedups promised by DX10 is if the dev says "sod DX9"?
That depends on what you define as "massive speedups". It is possible to write a D3D9/10 engine that can take advantage of the lower call overhead and make use of other advanced Direct3D 10 features. But to do this, the engine needs to be written with both 9 and 10 in mind. The current engines are based on Direct3D 9, and the Direct3D 10 support was added late.
__________________
GPU blog
Demirug is offline   Reply With Quote
Old 02-Jan-2008, 00:50   #23
suryad
Senior Member
 
Join Date: Aug 2004
Posts: 2,454
Default

Thanks for the explanations. I am beginning to understand now. I did not look at it that way before.

I think what Davros meant when saying "massive speedups" is this: if game A is implemented purely in DX 9, and game A has another version written purely in DX 10, then the DX 10 version with all the new features enabled will look better but use fewer resources and generate higher fps than the DX 9 version. Is that correct?
suryad is offline   Reply With Quote
Old 02-Jan-2008, 07:49   #24
silent_guy
Senior Member
 
Join Date: Mar 2006
Posts: 1,687
Default

Quote:
Originally Posted by suryad View Post
I think what Davros meant when saying "massive speedups" is this: if game A is implemented purely in DX 9, and game A has another version written purely in DX 10, then the DX 10 version with all the new features enabled will look better but use fewer resources and generate higher fps than the DX 9 version. Is that correct?
Sometimes... It really depends on where your performance bottleneck is located.

If you're completely maxing out on, say, the multipliers in the pixel shaders, then, on the same HW, you'll render that part at the same speed no matter which API you're using.

In practice, a scene has different parts with different properties. Some will be floating point limited, others will be texture unit limited etc.
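
A back-of-the-envelope C++ model makes the bottleneck argument concrete. It assumes the CPU submits work while the GPU renders, with perfect overlap, so the frame time is whichever side takes longer; `FrameCost`, `cpuSubmitMs` and `frameTimeMs` are hypothetical names, and any numbers fed into it are illustrative, not measurements.

```cpp
#include <algorithm>
#include <cassert>

// Per-frame cost on each side, in milliseconds.
struct FrameCost {
    double cpuMs;
    double gpuMs;
};

// CPU time spent submitting draw calls, given a per-call overhead in microseconds.
double cpuSubmitMs(int drawCalls, double perCallUs) {
    assert(drawCalls >= 0 && perCallUs >= 0.0);
    return drawCalls * perCallUs / 1000.0;
}

// Assumes CPU and GPU work overlap perfectly, so the slower side sets the frame time.
double frameTimeMs(const FrameCost& frame) {
    return std::max(frame.cpuMs, frame.gpuMs);
}
```

If the GPU needs 20 ms and 1,000 draw calls cost 10 µs each, the CPU needs only 10 ms, and halving the per-call overhead leaves the frame at 20 ms. With 4,000 draw calls the CPU needs 40 ms, and halving the overhead brings the frame down to 20 ms. A cheaper API helps only in the second, CPU-limited case.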
silent_guy is offline   Reply With Quote
Old 02-Jan-2008, 16:05   #25
pthiben
Registered
 
Join Date: Apr 2007
Posts: 7
Icon Arrow

Quote:
Originally Posted by silent_guy View Post
Sometimes... It really depends on where your performance bottleneck is located.

If you're completely maxing out on, say, the multipliers in the pixel shaders, then, on the same HW, you'll render that part at the same speed no matter which API you're using.

In practice, a scene has different parts with different properties. Some will be floating point limited, others will be texture unit limited etc.
A pretty interesting read, where profiling was actually done, is here:
http://developer.nvidia.com/docs/IO/...BatchBatch.pdf

The hardware mentioned is a bit old now, and it lacks statistics for D3D10 (and Vista), but it's still a good read.
pthiben is offline   Reply With Quote
