Is UE4 indicative of the sacrifices devs will have to make on consoles next gen?

I am praying to the gaming gods that not every AAA game this gen is some shooter with a love affair with grey/brown scenery.
 
Why? The 360 had unified memory plus shedloads of bandwidth to the eDRAM, and a GPU architecture that was not only well beyond the common DX9 spec when it launched but in some ways beyond DX10, to say nothing of it sporting bleeding-edge performance.
That was a split-memory pool, as Cerny said. Main memory plus eDRAM meant copying still had to be done.

PS4 has what? Unified memory, a standard DX11 GPU, a performance target around the PC upper mid-range, and some GPGPU enhancements for which there are several possible answers in the PC space (e.g. far more powerful CPUs like Haswell or utilising APUs like Kaveri).
We all know pushing hardware is not a PC strong suit. If it was, this wouldn't even be a conversation worth having. Just because the PC has the hardware doesn't mean it will get used. This has been demonstrated generation after generation. Are you expecting that to change all of a sudden? That's what would have to happen in order for your statement about the PC having several possible answers to matter.

Add to that the general PC-like architecture of the new consoles and your expectations of greater console effectiveness for a given performance level over last generation seem a little far-fetched.
How can it seem far-fetched, when DirectX 11 doesn't even see all the functionality of GPUs? You seem to believe that DirectX 11 is not extremely wasteful and that it pushes GPUs to their max. Neither of those beliefs is true.

As AndyTX said, no, this is not the case. Aside from anything else, we've already been told by Exophase (and other developers here) that GCN and possibly even older architectures can already switch between graphics and compute on the fly and do not need to stop one to start the other.
Are you saying DirectX 11 sees GCN functionality to exploit it, or that this is automatically recognized? Also, the GCN in the PS4 is much more functional than what's currently in PCs. That would be an architecture advantage. Please don't try to dismiss it.

The GPGPU enhancements of PS4 and Durango will really come into play where those jobs are highly sensitive to latency, but for the types of GPGPU stuff we see today affecting graphics only - which includes TressFX - there's no reason to expect PS4 to be significantly better than a regular discrete GCN-based system.
2 ACEs are almost as good as 8 ACEs with 64 total queues? That's what you're saying?

Evidence of what? DX11 having less overhead than DX9? It's common knowledge, a quick google will reveal dozens of different sources for that information. Your very own link posted earlier in this thread is one of them.
Evidence that the article I presented is not pertaining to DirectX 11, even though it was written almost 2 years after DirectX 11 emerged? And, evidence that the graph, done by a forum member's study, was not accurate (DirectX 11 vs OpenGL)? That graph looked pretty telling to me. Plus, libGCM is even lower level than OpenGL.
 
2 years ago how many games were coded for dx11?

8 ACEs does not mean anything like 4 times as good as 2 ACEs. They allow more in-flight threads, which will increase efficiency, which will increase performance, but having more CUs like a PC reaches that goal just in a different way. Consoles don't have the power budget, so they target efficiency; this is less necessary on the PC. However, if there is a significant advantage for PC parts with more ACEs we may well see that happen on future parts.

You can continue to use out of context quotes to reinforce your own opinion, but I don't doubt it will become quite apparent in short order that consoles will no longer be launching as machines at the pinnacle of performance. Power budgets and economics have caught up with them.
 
But in photography of the real world, which is what most people are comparing screen visuals to, blur and the like make it more realistic.
Most amateur and journalism photography simply uses the smallest stop you can get away with for the speed of exposure ... we expect clarity, we tolerate technical limitations.

Lack of DoF, grain, etc ... is not seen by the general population as a sign of quality in the same sense as juddery motion in film IMO.
 
That was a split-memory pool, as Cerny said. Main memory plus eDRAM meant copying still had to be done.

Are you suggesting that EDRAM was a disadvantage for the 360 compared to PCs of the same time period?
You always have to copy your data back to main memory. Ideally you do it in as large batches as possible. EDRAM facilitated this quite well on the 360.
Yes, you are effectively doing two copies, one to EDRAM, one to main RAM - but then you could consider the same thing when using the caches. No redundant work is being done, though you seem to be suggesting there is (?).

We all know pushing hardware is not a PC strong suit. If it was, this wouldn't even be a conversation worth having. Just because the PC has the hardware doesn't mean it will get used. This has been demonstrated generation after generation. Are you expecting that to change all of a sudden? That's what would have to happen in order for your statement about the PC having several possible answers to matter.

While it is obviously true that the PC platform has certain overheads (especially related to CPU usage for the driver layers, etc), in practice it doesn't really have a big impact on actual performance when GPU limited. The layers of abstraction typically have the greatest impact on a subset of the rendering pipeline (API calls) - and if those layers aren't the bottleneck (which they usually aren't, unless you are being silly and trying to perform tens of thousands of draw calls per frame), the abstraction cost largely disappears into the noise.
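(Just to illustrate the "API calls" part, here is a minimal and purely hypothetical D3D11 sketch of the pattern that hits that limit - one constant-buffer update and one draw per object. The struct and resource names are made up; the point is only that each Map/Unmap/DrawIndexed crosses the runtime and driver layers, which is where the per-call CPU cost comes from.)

```cpp
#include <d3d11.h>
#include <cstring>
#include <vector>

// Hypothetical per-object data; the names here are illustrative only.
struct ObjectConstants { float world[16]; };
struct Renderable { ObjectConstants constants; UINT indexCount; };

// Naive loop: one constant-buffer update and one draw per object.
// Every Map/Unmap/DrawIndexed crosses the D3D11 runtime and driver
// layers, so with tens of thousands of objects the per-call CPU cost,
// not the GPU, becomes the bottleneck.
void DrawSceneNaive(ID3D11DeviceContext* ctx,
                    ID3D11Buffer* perObjectCB,
                    const std::vector<Renderable>& objects)
{
    for (const Renderable& obj : objects)
    {
        D3D11_MAPPED_SUBRESOURCE mapped = {};
        if (SUCCEEDED(ctx->Map(perObjectCB, 0, D3D11_MAP_WRITE_DISCARD, 0, &mapped)))
        {
            std::memcpy(mapped.pData, &obj.constants, sizeof(ObjectConstants));
            ctx->Unmap(perObjectCB, 0);
        }
        ctx->VSSetConstantBuffers(0, 1, &perObjectCB);
        ctx->DrawIndexed(obj.indexCount, 0, 0); // one API call per object
    }
}
```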

Otherwise, the reason console games are more heavily optimised is pretty simple (and in my opinion has nothing to do with the layers of abstraction). Simply that it *has* to run well on the target hardware, and with fixed hardware you can devote more time to it - you can't, and don't have to, let the user dial down settings. With that said, DX10 and DX11 alleviate a lot of the compatibility work required when building PC games of old - time that can be put into optimisation.

How can it seem far-fetched, when DirectX 11 doesn't even see all the functionality of GPUs? You seem to believe that DirectX 11 is not extremely wasteful and that it pushes GPUs to their max. Neither of those beliefs is true.

I'm not sure what your point is. Granted, there will be a subset of GPU features that are not directly exposed via DX (or GL) - however, this doesn't mean the driver doesn't use them (why would they exist otherwise? Hardware is built to the DX11 spec; it would be wasteful to do otherwise).

What evidence are you basing the second statement on?
There is a good deal of anecdotal evidence to already suggest your claim is bogus; performance throttling being a simple example, and synthetic tests commonly hitting theoretical peak numbers. I'm not a DX11 expert so I'm not going to say you are wrong, but I'd like some evidence.

Are you saying DirectX 11 sees GCN functionality to exploit it, or that this is automatically recognized? Also, the GCN in the PS4 is much more functional than what's currently in PCs. That would be an architecture advantage. Please don't try to dismiss it.

Well, one would hope the graphics driver does (!).
We don't know much about the customizations to GCN in the PS4, other than the additional ACEs and cache bypass - which I wouldn't class as new functionality (more minor optimisations / extensions for a specific use case).

2 ACEs are almost as good as 8 ACEs with 64 total queues? That's what you're saying?

Anyone who knows the answer to that question would break their NDA if they answered.
The statement you are questioning (to me) implies a comparison of latency due to driver and abstraction overhead (not to mention potential latencies due to PCI-e data transfers, etc), which I think we can be pretty confident is an area where the PS4 can have a clear advantage (cache bypass, etc).
How this translates to real-world performance has yet to be seen - but I'd agree with pjb that it will likely only help with very low latency jobs.

Evidence that the article I presented is not pertaining to DirectX 11, even though it was written almost 2 years after DirectX 11 emerged. And, evidence that the graph, done by a forum member's study, was not accurate (DirectX 11 vs OpenGL). That graph looked pretty telling to me. Plus, libGCM is even lower level than OpenGL.

Not sure what you mean here? It's not accurate but telling?
:?:
 
Most photography seen on TV, the same medium as computer games, is via professional photographers and that's the target for games designers, although they are keen to emphasise technical limits of cameras with exaggerated flares and colour aberrations and the like. When Laa-Yosh creates a photorealistic cutscene, he's adding photographic limits that our eyes don't have, and the results are damned impressive. So I do think that the target should be TV/film visuals and not real-life visuals. DOF is great at adding beauty to otherwise dull scenes, for example, although kinda lousy in computer games where you're trying to see into the distance and the game has decided to blur that out!
 
We need Move/Kinect to track eye movement and feed it into the DoF shaders :LOL:

It's funny that you mention that... there was actually a recent paper from MS Research that used eye-tracking to determine where the viewer was looking, and rendered the rest of the screen at lower resolution to save performance. I guess it would suck if anybody else was trying to watch you play. :p
 
We all know pushing hardware is not a PC strong suit. If it was, this wouldn't even be a conversation worth having. Just because the PC has the hardware doesn't mean it will get used. This has been demonstrated generation after generation. Are you expecting that to change all of a sudden? That's what would have to happen in order for your statement about the PC having several possible answers to matter.

Since most of your post has already been answered by other members I'll just address the following few points.

I'm not sure how you justify the above statement. Clearly the power of current high end PC hardware goes either unused or used inefficiently because the (console derived) software doesn't require that power. Should the software change and start requiring higher performance, that performance will be used if it is available, whether or not it's in a PC. It's not difficult to max out a dual core CPU or entry level GPU in modern games and that will be no different next generation with higher specification hardware.

If CPU compute requirements increase next generation then there's no reason that more powerful PC CPUs will not be fully utilised in meeting those requirements. I'll grant you that using the IGPs of APUs as dedicated GPGPU processors is more questionable given that it would require some level of developer and/or vendor support, but NV has certainly achieved this type of support with PhysX so there's no reason AMD couldn't achieve the same - at least in some flagship titles where it most matters. Their HSA presentations certainly show that they want to push this as a usage scenario.

2 ACEs are almost as good as 8 ACEs with 64 total queues? That's what you're saying?

No, but there is certainly an element of diminishing returns to take account of. 2 ACEs with 16(?) queues is still a lot of scheduling capability for compute work on the GPU. How much benefit do you derive from scheduling more? Surely beyond this point you are getting to smaller tasks which would be just as well, or better, run on a CPU with greater SIMD capability. Let's not lose sight of the fact that for every task you move from the CPU to the GPU via the ACEs, you're using up CU FLOPs that could be spent on graphics. With only 18 CUs it might be worth considering just how much of the GPU you want to spend on CPU work.
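(For context, "moving a task from the CPU to the GPU" on the PC side today typically means a DirectCompute dispatch like the hypothetical sketch below; the shader, view and thread-group numbers are placeholders. The relevant point is that whatever the dispatch occupies runs on the same CUs the graphics workload uses.)

```cpp
#include <d3d11.h>

// Hypothetical DirectCompute dispatch (e.g. a physics/simulation pass).
// The compute shader and UAV are assumed to be created elsewhere; names
// are placeholders. Whatever this dispatch occupies on the GPU is
// ALU/bandwidth the graphics workload cannot use at the same time.
void RunSimulationPass(ID3D11DeviceContext* ctx,
                       ID3D11ComputeShader* simulationCS,
                       ID3D11UnorderedAccessView* particleUAV,
                       UINT particleCount)
{
    ctx->CSSetShader(simulationCS, nullptr, 0);
    ctx->CSSetUnorderedAccessViews(0, 1, &particleUAV, nullptr);

    // One thread per particle, 256 threads per group (assumed to match
    // [numthreads(256,1,1)] in the shader).
    const UINT groupCount = (particleCount + 255) / 256;
    ctx->Dispatch(groupCount, 1, 1);

    // Unbind the UAV so the resource can be consumed by the graphics pipeline.
    ID3D11UnorderedAccessView* nullUAV = nullptr;
    ctx->CSSetUnorderedAccessViews(0, 1, &nullUAV, nullptr);
}
```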

Evidence that the article I presented is not pertaining to DirectX 11, even though it was written almost 2 years after DirectX 11 emerged. And, evidence that the graph, done by a forum member's study, was not accurate (DirectX 11 vs OpenGL). That graph looked pretty telling to me. Plus, libGCM is even lower level than OpenGL.

There was never any claim that the article wasn't pertaining to DX11. Nor was there any claim that the graphs posted were inaccurate. The argument was and always has been (and I'm not sure how you missed this given how many times it has been repeated) that JC's comment of a 2x console performance advantage at a given performance level was no longer completely relevant due to it being made in respect to DX9, which is considerably less efficient in terms of overhead compared with DX11.

Obviously, most longstanding B3D members understand JC's comment to be a highly simplified statement for the benefit of the less technically minded and that in fact the actual advantage varies greatly from situation to situation depending on workload requirements, system bottlenecks and general levels of optimisation for each platform. But whatever part of his statement related to API overhead is certainly less relevant when using DX11 compared to using DX9 which is the API the statement was made in respect to. And to go further, whatever element of JC's statement was based on the ability to optimise for 1 specific platform / architecture is likely also made less relevant this generation by the fact that the new consoles share a considerable amount in common with the PC in both their CPU and GPU architectures.
 
2 years ago how many games were coded for dx11?
www.gamertechtv.com/2010/directx11-games-for-2010-and-2011/

8 ACEs does not mean anything like 4 times as good as 2 ACEs. They allow more in-flight threads, which will increase efficiency, which will increase performance, but having more CUs like a PC reaches that goal just in a different way. Consoles don't have the power budget, so they target efficiency; this is less necessary on the PC. However, if there is a significant advantage for PC parts with more ACEs we may well see that happen on future parts.
Does it have to be 4x as good for you and others to recognize it as "significant" or an architectural advantage?

You can continue to use out of context quotes to reinforce your own opinion, but I don't doubt it will become quite apparent in short order that consoles will no longer be launching as machines at the pinnacle of performance. Power budgets and economics have caught up with them.

I want fact and not opinion. Is it an architectural advantage to have 8 ACEs instead of 2 ACEs? Is it an architectural advantage to have 64 queues instead of 16 queues? These answers should be fact and not opinion.

This has never been about the PS4 setup being equal to a Titan in hardware. It's about the PS4 being a lot more capable than similar PC hardware (due to software and architectural differences). Why are people trying to make it more than that?
 
Are you suggesting that EDRAM was a disadvantage for the 360 compared to PCs of the same time period?

You always have to copy your data back to main memory. Ideally you do it in as large batches as possible. EDRAM facilitated this quite well on the 360.
Yes, you are effectively doing two copies, one to EDRAM, one to main ram - but then you could consider the same thing when using the caches. Redundant work isn't being done, which you seem to be suggesting (?).
I'm suggesting exactly what I said. It's split-pool memory, like Durango is rumored to be. It sounds like you agree. Does the X360 not have cache, too? Would that not be an extra step? If it is, then it sounds like you agree with me.


While it is obviously true that the PC platform has certain overheads (especially related to CPU usage for the driver layers, etc), in practice it doesn't really have a big impact on actual performance when GPU limited. The layers of abstraction typically have the greatest impact on a subset of the rendering pipeline (API calls) - and if those layers aren't the bottleneck (which they usually aren't, unless you are being silly and trying to perform tens of thousands of draw calls per frame), the abstraction cost largely disappears into the noise.
Why would you say the number of draw calls you mentioned is silly for PC, when that would be a very low number for consoles? Are you saying PCs can't benefit greatly from 10K to 30K draw calls? Does this not give much greater artistic freedom for environments (as the article and Timothy Lottes said)?

Otherwise, the reason console games are more heavily optimised is pretty simple (and in my opinion has nothing to do with the layers of abstraction). Simply that it *has* to run well on the target hardware, and with fixed hardware you can devote more time to it - you can't, and don't have to, let the user dial down settings. With that said, DX10 and DX11 alleviate a lot of the compatibility work required when building PC games of old - time that can be put into optimisation.
It still seems bad according to the article and Timothy Lottes. I think even Sebbi had something nasty to say about the overhead. That member comparison chart between DX11 and OpenGL looked pretty horrible, too. LibGCM is supposed to be that much better than OpenGL. I'm not talking about the difference between DX9 and DX11 draw call overhead. However, it WOULD be nice to see that data. I just don't think it will show up.

I'm not sure what your point is. Granted, there will be a subset of GPU features that are not directly exposed via DX (or GL) - however, this doesn't mean the driver doesn't use them (why would they exist otherwise? Hardware is built to the DX11 spec; it would be wasteful to do otherwise).
Are you saying there is nothing that can be done with libGCM/PSSL that can't be done with DX11? If something is wasteful on one platform, it's a chance for ground to be gained on the other. That would be my point. If you say that's not true, then I'll let it be.

What evidence are you basing the second statement on?
There is a good deal of anecdotal evidence to already suggest your claim is bogus; performance throttling being a simple example, and synthetic tests commonly hitting theoretical peak numbers. I'm not a DX11 expert so I'm not going to say you are wrong, but I'd like some evidence.
I'm basing the second statement on what I've already posted. If a PC can't do 30K draw calls on DX11, that's obviously wasteful. Timothy Lottes, the article and the B3D forum member's graph all say about the same thing. I put it all in my original post.

Well, one would hope the graphics driver does (!).
We don't know much about the customizations to GCN in the PS4, other than the additional ACEs and cache bypass - which I wouldn't class as new functionality (more minor optimisations / extensions for a specific use case).

Anyone who knows the answer to that question would break their NDA if they answered.
The statement you are questioning (to me) implies a comparison of latency due to driver and abstraction overhead (not to mention potential latencies due to PCI-e data transfers, etc), which I think we can be pretty confident is an area where the PS4 can have a clear advantage (cache bypass, etc).
How this translates to real-world performance has yet to be seen - but I'd agree with pjb that it will likely only help with very low latency jobs.
Does something have to be new to be efficient or functional to some people ("there is nothing new under the Sun")? What I don't understand is: if we don't know what the real-world implications of such a drastic increase in queues will be, why assume it's not very helpful? That seems worse than assuming a direct correlation with the difference in the number of queues. If it were based on a 4x increase, at least it would be based on math. I'm just saying it's an architectural improvement over what's currently available in PC hardware.

Not sure what you mean here? It's not accurate but telling?
:?:
It should've been a question mark instead of a period. I have corrected it. Thank you for pointing it out.
 
So can we say the GPU in the PS4 can perform like a 7870 or greater, once console architecture efficiencies etc. are taken into account?
 
I want fact and not opinion. Is it an architectural advantage to have 8 ACEs instead of 2 ACEs? Is it an architectural advantage to have 64 queues instead of 16 queues? These answers should be fact and not opinion.

Those are the same question. Yes - they wouldn't have added them if they were not - but I doubt they are worth more than a couple of CUs in the best case. The advantage is that peak power stays down compared with a brute-force approach. So adding ACEs on a PC might not be as significant.

This has never been about the PS4 setup being equal to a Titan in hardware. It's about the PS4 being a lot more capable than similar PC hardware (due to software and architectural differences). Why are people trying to make it more than that?
If you only mean to suggest that a console will outperform a 2TF PC, I don't think you will find much argument. The problem is when you try to suggest console hardware will punch twice its weight. Those days are gone. My year old PC will be running next gen titles with higher fps, resolution and AA.
 
I'm not sure how you justify the above statement. Clearly the power of current high end PC hardware goes either unused or used inefficiently because the (console derived) software doesn't require that power. Should the software change and start requiring higher performance, that performance will be used if it is available, whether or not it's in a PC. It's not difficult to max out a dual core CPU or entry level GPU in modern games and that will be no different next generation with higher specification hardware.
I think Windows PCs could use the efficient software in consoles just as much or more than consoles could use the highest end hardware PCs could provide. We know that level of software performance won't/can't happen in the Windows PC environment, though. Developers must target lower GPU set-ups for profits. And, developers can't focus on one particular hardware set.


If CPU compute requirements increase next genreation then there's no reason that more powerful PC CPU's will not be fully utilised in meeting those requirements. I'll grant you that using the IGP's of APU's as dedicated GPGPU processors is more questionable given that it would require some level of developer and/or vendor support but NV has certainly achieved this type of support with PhysX so there's no reason AMD couldn't achieve the same - at least in some flag ship titles where it most matters. Their HSA presentations certainly show that they want to push this as a usage scenario.
I agree; however, aren't CUDA cores less flexible (at switching between physics tasks and graphics)?


No, but there is certainly an element of diminishing returns to take account of. 2 ACEs with 16(?) queues is still a lot of scheduling capability for compute work on the GPU. How much benefit do you derive from scheduling more? Surely beyond this point you are getting to smaller tasks which would be just as well, or better, run on a CPU with greater SIMD capability. Let's not lose sight of the fact that for every task you move from the CPU to the GPU via the ACEs, you're using up CU FLOPs that could be spent on graphics. With only 18 CUs it might be worth considering just how much of the GPU you want to spend on CPU work.
I would say the true diminishing returns scenario is in the graphics department and not with the ACEs. That, much like your statement, is just a guess.


There was never any claim that the article wasn't pertaining to DX11. Nor was there any claim that the graphs posted were inaccurate. The argument was and always has been (and I'm not sure how you missed this given how many times it has been repeated) that JC's comment of a 2x console performance advantage at a given performance level was no longer completely relevant due to it being made in respect to DX9, which is considerably less efficient in terms of overhead compared with DX11.

Obviously, most longstanding B3D members understand JC's comment to be a highly simplified statement for the benefit of the less technically minded and that in fact the actual advantage varies greatly from situation to situation depending on workload requirements, system bottlenecks and general levels of optimisation for each platform. But whatever part of his statement related to API overhead is certainly less relevant when using DX11 compared to using DX9 which is the API the statement was made in respect to. And to go further, whatever element of JC's statement was based on the ability to optimise for 1 specific platform / architecture is likely also made less relevant this generation by the fact that the new consoles share a considerable amount in common with the PC in both their CPU and GPU architectures.
Where is this DX9 and JC stuff coming from? I never mentioned DX9 or JC in my original post. Of course, when overhead can be up to 100x more (mixed with other software/hardware advantages), it's not hard to see around a 2x advantage (with similar hardware).
 
@Lucid_Dreamer and others
You guys really need to get a reality check on the claims you make, like the claim that you can get 2x more performance out of a console with a similar configuration to a PC. It's just unrealistic, because if that were true it would mean that games/applications run at 50% average efficiency on PC; that would have been unacceptable 10 years ago and it's even more unacceptable now.
We can all agree that some algorithms can work at 50% efficiency on PC, but to think that a whole frame is rendered at 50% of hardware efficiency is crazy, to say the least.

So, assuming an unoptimized game uses only 70% of the hardware on PC and the best-optimized console game uses 95% [you won't ever get 100%], you can get about 35% more power out of the console at best (0.95 / 0.70 ≈ 1.36); that's the most extreme example.

--
And I won't believe in a 100x difference in draw calls. You can search for Repi's and sebbbi's posts about this, because I'm pretty sure they've talked about it.
 
I think Windows PCs could use the efficient software in consoles just as much or more than consoles could use the highest end hardware PCs could provide. We know that level of software performance won't/can't happen in the Windows PC environment, though. Developers must target lower GPU set-ups for profits. And, developers can't focus on one particular hardware set.

What is it you're trying to say with this statement other than an extremely generic "PCs could benefit from console-type optimisations"? That's clearly not in dispute. The argument here is the level of performance gains possible on next-generation consoles relative to current-generation PCs.

I see two elements to that argument, both of which I disagree with. 1. Some are claiming the gains will be on average 2x or more based on JC's Twitter statement - I disagree because the situation has changed from the conditions on which JC based that statement (DX9 and unique console architectures). 2. You seem to be claiming that the performance gains will actually be larger than last generation, for which I see no logical basis aside from the obvious wish to make up for the lack of theoretical performance compared with last generation. We have more similarity in architecture across all platforms, we have a thinner, more efficient API on the PC, and we have larger game development costs than ever before, making reliance on very low-level coding (where a lot of the console performance gains come from) likely to be less prevalent than last generation.

I agree, however aren't CUDA cores less flexible (switching between physics tasks and graphics)?

I have no idea about the relative flexibility of CUDA and GCN cores for compute. However, I'm not sure how it relates to my point about performing compute work on CPUs or APUs in the PC space next generation. There are no NV APUs. And by the time there are, they won't be using Kepler cores.

I would say the true diminishing returns scenario is in the graphics department and not with the ACEs. That, much like your statement, is just a guess.

No, when you add more graphics hardware the performance scales linearly. If you don't see that as improved visual impact then that's a different kind of diminishing returns.

I'm talking pure hardware-level diminishing returns. You are not going to get 4x the compute performance from 18 CUs using 8 ACEs as you will with 2 ACEs. But the fact still remains that for all your focus on GPGPU as an advantage for the PS4, you are actually talking about reducing the power available for graphics for every additional piece of compute work you run on the GPU as opposed to the CPU.

Where is this DX9 and JC stuff coming from? I never mentioned DX9 or JC in my original post. Of course, when overhead can be up to 100x more (mixed with other software/hardware advantages), it's not hard to see around a 2x advantage (with similar hardware).

Perhaps arguments have got mixed up between different people. See my first comment in this post.

It still seems bad according to the article and Timothy Lottes. I think even Sebbi had something nasty to say about the overhead. That member comparison chart between DX11 and OpenGL looked pretty horrible, too. LibGCM is supposed to be that much better than OpenGL. I'm not talking about the difference between DX9 and DX11 draw call overhead. However, it WOULD be nice to see that data. I just don't think it will show up.

Since you seem to be so focussed on these two sources, let me ask you this: if draw call limitations are such a huge problem for the PC and, per your own source, OpenGL is such a massive improvement over DX11, why aren't more (any) games using OpenGL as opposed to DX9/11? Surely this would be an easy performance win for developers, which would mean less time needs to be spent on draw call optimisation and the game would be available to a wide audience (at lower performance levels).

Also, if you're looking for numbers around the DX9-DX11 comparison, the bit-tech/Timothy Lottes article mentions an up to 2x advantage of DX11 over DX9 for draw calls via multi-threaded display lists, which is in addition to other workarounds that can be employed, such as instancing.

"There are the multi-threaded display lists, which come up in DirectX 11 – that helps, but unsurprisingly it only gives you a factor of two at the very best, from what we've seen. And we also support instancing, which means that if you're going to draw a crate, you can actually draw ten crates just as fast as far as DirectX is concerned."
 
@Lucid_Dreamer and others
You guys really need to get a reality check on the claims you make

In reality, when it comes to perceived graphics quality, God of War 3, The Last of Us, Gears of War 3 and Uncharted 3 are more or less in the same ballpark as Crysis 3 on a high-end PC. Maybe it's all about art, not performance, but the point still stands.
 
In reality, when it comes to perceived graphics quality, God of War 3, The Last of Us, Gears of War 3 and Uncharted 3 are more or less in the same ballpark as Crysis 3 on a high-end PC. Maybe it's all about art, not performance, but the point still stands.

Ok ...
 
...I think even Sebbi had something nasty to say about the overhead...
Draw call overhead on PC is considerably higher compared to consoles. On a console you can just write a few bytes (command header + data) directly to the ring buffer (which lives in the same unified memory system). That's just a few CPU cycles. You can just ignore all the CPU and GPU synchronization if you know what you are doing (you manage the lifetime of your data properly, and make sure you are not modifying the data while the GPU is accessing it - for example by manually double buffering the resources).

DirectX/OpenGL doesn't know enough about your data usage pattern to skip unneeded synchronization. And on PC there are always other applications (and the OS) using the GPU as well. So the API calls (draw, map, state change) always need to be properly synchronized. A simple mutex lock costs over 1000 cycles. There are multiple driver layers (user, kernel) and some parts are shared between all applications (and the OS). So you should expect DirectX/the driver to have multiple synchronization points (locks) for each draw call. And then of course the data must be transferred over the PCI Express bus to the GPU's memory, since there's no shared memory that both the CPU and GPU can access. So the 10x overhead discussed in this thread is likely a conservative guess. I would expect the CPU cycle overhead to be much higher.
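(To make the pattern concrete: the sketch below is NOT libGCM or any real console API, just a toy C++ illustration of "write a command header + data straight into a ring buffer in unified memory", with synchronization left to the application - which is what makes it a few cycles rather than thousands.)

```cpp
#include <atomic>
#include <cstdint>
#include <cstring>

// Toy ring-buffer command writer - purely illustrative, NOT libGCM or
// any real console API. It only shows why the cost is "a few CPU
// cycles": a header + payload copy into memory the GPU already sees,
// with no driver layers, kernel transitions or mutexes in the path.
struct CommandHeader { uint16_t opcode; uint16_t sizeInDwords; };

struct CommandRing
{
    uint32_t* buffer = nullptr;        // placed in GPU-visible (unified) memory
    size_t    sizeInDwords = 0;
    std::atomic<size_t> writeCursor{0};

    void Write(uint16_t opcode, const uint32_t* payload, uint16_t payloadDwords)
    {
        size_t pos = writeCursor.load(std::memory_order_relaxed);
        CommandHeader header{ opcode, static_cast<uint16_t>(payloadDwords + 1u) };

        std::memcpy(&buffer[pos], &header, sizeof(header));          // 1-dword header
        std::memcpy(&buffer[pos + 1], payload, payloadDwords * 4u);  // payload

        // Wrap-around and GPU read-pointer checks omitted for brevity.
        writeCursor.store((pos + 1 + payloadDwords) % sizeInDwords,
                          std::memory_order_release);
    }
};

// Lifetime is the application's problem, exactly as described above:
// e.g. keep two copies of any dynamic buffer and alternate per frame
// (manual double buffering) so the GPU never reads data the CPU is
// still writing.
```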

Earlier in this thread, there was some discussion about DX11 multithreaded rendering. It doesn't actually reduce the driver overhead at all. It allows you to use several CPU cores to record the draw calls to separate command buffers. This is kind of awkward, since one modern (Sandy/Ivy Bridge) PC CPU core is powerful enough to run all the processing that six X360 threads would do in a single frame. Usually X360 games use just one of those six threads to submit the draw calls. With DX11 you can now use two high-performance PC cores in tandem to send as many draw calls as a single X360 thread... Brute force at its finest. Not something to brag about :)
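(And the DX11 side of that, as a hypothetical sketch with made-up function names: a worker thread records draws on a deferred context into an ID3D11CommandList, and the main thread plays it back on the immediate context. The recording cost is spread across cores, but every recorded call still pays the runtime/driver overhead.)

```cpp
#include <d3d11.h>

// Sketch of DX11 "multithreaded rendering" via deferred contexts.
// Function names and parameters are illustrative.

// Runs on a worker thread (one deferred context per recording thread;
// in a real engine it would be created once and reused per frame).
ID3D11CommandList* RecordDrawChunk(ID3D11Device* device,
                                   UINT drawCount, UINT indexCountPerDraw)
{
    ID3D11DeviceContext* deferred = nullptr;
    if (FAILED(device->CreateDeferredContext(0, &deferred)))
        return nullptr;

    // ... bind pipeline state, buffers, etc. on 'deferred' here ...
    for (UINT i = 0; i < drawCount; ++i)
        deferred->DrawIndexed(indexCountPerDraw, 0, 0); // recorded, not executed

    ID3D11CommandList* commandList = nullptr;
    deferred->FinishCommandList(FALSE, &commandList);
    deferred->Release();
    return commandList;
}

// Runs on the main thread once the worker has finished recording.
void SubmitDrawChunk(ID3D11DeviceContext* immediate, ID3D11CommandList* commandList)
{
    if (commandList)
    {
        immediate->ExecuteCommandList(commandList, FALSE);
        commandList->Release();
    }
}
```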
 
^^ Enlightening, thanks. Still, I guess it's hard to put an exact number on the efficiency potential in the end result over PC... which is what everyone is trying to pin down.
 