Can you even do proper dynamic resolution under DirectX (i.e. really responding to load, not just pre-set)? I'm suddenly wondering if we've ever seen it on anything other than the Playstation.
I really do not see why not.
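For what it's worth, nothing in the API forbids it. A minimal sketch of the usual approach, assuming you allocate the render target at the maximum resolution up front and feed a scale factor from GPU timestamp queries (all names here are illustrative, not from any shipped engine):

```cpp
// Hypothetical load-driven resolution picker; gpuFrameMs would come from
// D3D11_QUERY_TIMESTAMP readbacks of the previous frame.
#include <d3d11.h>
#include <algorithm>
#include <cmath>

float ChooseScale(float gpuFrameMs, float prevScale)
{
    const float budgetMs = 33.3f;                   // 30 fps frame budget
    // Pixel count scales with scale^2, so take the square root of the
    // time ratio to aim the next frame at the budget.
    float next = prevScale * std::sqrt(budgetMs / gpuFrameMs);
    return std::min(1.0f, std::max(0.5f, next));    // e.g. clamp to 540p..1080p
}

void SetScaledViewport(ID3D11DeviceContext* ctx, float scale, UINT maxW, UINT maxH)
{
    // Render into the top-left sub-rect of a fixed, max-size render target.
    D3D11_VIEWPORT vp = {};
    vp.Width    = maxW * scale;
    vp.Height   = maxH * scale;
    vp.MaxDepth = 1.0f;
    ctx->RSSetViewports(1, &vp);
    // The final full-screen pass then samples UVs in [0, scale] and
    // stretches the result across the back buffer.
}
```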
I guess on PC it is just not common at all because you are basically expected to just tweak your settings. But I am wondering why we only have two half-hearted attempts on Xbox One that aren't really dynamic at all, just changing resolution for specific scenarios instead of based on load balancing. Too difficult to predict budget issues and then switch?
In general I would have expected to see the technique used more often across the board, because it just makes so much sense.
ESRAM is not infinite. A resolution increase means using more of the slow DDR3 for the framebuffer.
I'm not sure that is accurate WRT whether ESRAM can hold it. As you and many others have noted, the main issue with ESRAM is its small size. But this can be engineered around, either by getting away from multiple large frame buffers or by copying the FB out of ESRAM to DDR3 at proper timing intervals to free up space. Forward+ rendering is a solution that comes to mind for sidestepping large frame buffers, but the second doesn't seem strong as a solution.
Dynamic resolution would work better on PS4 actually.
Mainly because DX11 as an API is built to be serial. Look over the Xbox SDK: copying to and from ESRAM is, at the end of the day, going to be submitted serially by that single core. You are really holding up the rendering pipeline if that one core needs to handle all the rendering elements in order. Furthermore, I'm not sure if DX11.X supports an async copy engine yet either; I do know it supports async compute though.
So with the move to DX12, Xbox will for sure have async copy as well as multithreaded command buffers. It should IMO give engineers the flexibility to create patterns of cycling data in and out of ESRAM while the GPU is simultaneously rendering into it; bandwidth shouldn't be much of an issue since ESRAM reads and writes for the most part do not interfere with each other.
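To make that pattern concrete, here is a minimal sketch in stock D3D12 terms (the XB1 ESRAM interface isn't public, so this only shows the generic async-copy shape; all helper names are mine):

```cpp
// Sketch: a dedicated copy queue runs DMA work alongside the graphics queue,
// and a fence marks only the point where rendering actually needs the data.
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& copyQueue)
{
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // normal rendering work
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    D3D12_COMMAND_QUEUE_DESC copyDesc = {};
    copyDesc.Type = D3D12_COMMAND_LIST_TYPE_COPY;    // maps to the DMA engines
    device->CreateCommandQueue(&copyDesc, IID_PPV_ARGS(&copyQueue));
}

void KickAsyncCopy(ID3D12CommandQueue* copyQueue, ID3D12CommandQueue* gfxQueue,
                   ID3D12CommandList* copyCmds, ID3D12Fence* fence, UINT64 value)
{
    copyQueue->ExecuteCommandLists(1, &copyCmds);
    copyQueue->Signal(fence, value);   // fires when the DMA work completes
    gfxQueue->Wait(fence, value);      // GPU-side wait; the CPU never blocks,
                                       // and draws submitted earlier keep going
}
```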
Multithreaded command submission doesn't necessarily have anything to do with how the GPU consumes or executes those commands. Even if XB1 titles are restricted to building command buffers on a single core with D3D11 (and I don't think that I would assume that to be the case, considering MS likely has all kinds of XB1-specific extensions for D3D11 and can also customize the driver for their specific hardware), that doesn't affect how the GPU consumes those commands. With D3D12 and other APIs that allow multithreading, you generally have multiple threads each building separate command buffers independently. However, once this is finished, the separate buffers still need to be "combined" into 1 serial list for the GPU to execute. This is generally done by submitting a list of command buffers in order of how you want the GPU to execute them. So really the GPU doesn't have to care much about how the command buffers were created: they could have been created by 1 core, 4 cores, 10 cores, or some of them might have even been pre-computed. Ultimately it's just going to consume all of the commands serially*.
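As a rough D3D12-flavored sketch of that flow (recording details elided, names made up):

```cpp
// N threads each record their own command list; the GPU still gets handed
// one ordered array at submission time.
#include <d3d12.h>
#include <thread>
#include <vector>

void RecordChunk(ID3D12GraphicsCommandList* cl)
{
    // ... record draws for this thread's slice of the scene ...
    cl->Close();   // each thread finishes its own list independently
}

void SubmitFrame(ID3D12CommandQueue* queue,
                 std::vector<ID3D12GraphicsCommandList*>& lists)
{
    std::vector<std::thread> workers;
    for (auto* cl : lists)
        workers.emplace_back(RecordChunk, cl);   // parallel recording
    for (auto& w : workers)
        w.join();

    // One submission, in the order we want the GPU to consume them; the GPU
    // doesn't care that several cores built them.
    std::vector<ID3D12CommandList*> raw(lists.begin(), lists.end());
    queue->ExecuteCommandLists((UINT)raw.size(), raw.data());
}
```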
What we really care about here is whether the GPU can execute certain commands in parallel. GPUs do this all of the time: if you give the GPU 3 draw commands, it's very likely that the actual processing associated with those commands (vertex shading, rasterization, pixel shading, write/blend) will overlap due to the parallel execution resources available on the GPU. What prevents overlapping command execution is sync points. On the PC version of D3D11 you have no direct control over when sync points occur. Instead the sync points are implicit based on which commands you issue. So for instance if you issue Draw A to a render target, and then issue Draw B which reads from that render target as a texture, it's the driver's responsibility to recognize that it needs to insert a sync point before Draw B, so that the GPU will wait for Draw A to completely finish before starting Draw B. With D3D12 you instead have manual control over sync points, which means it's your responsibility (not the driver's) to insert sync points where necessary in order to avoid data races. This potentially reduces overhead, since the driver no longer has to keep track of a million things in order to determine whether it needs to insert a sync point. It also potentially lets you avoid synchronization in situations where the D3D11 rules are too conservative, or where the driver is too conservative due to its limited ability to track your resource usage.
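In D3D12 that render-then-sample case looks roughly like this (a sketch; the actual draws are elided):

```cpp
// The app, not the driver, marks the spot where Draw B must see Draw A's
// finished output.
#include <d3d12.h>

void RenderThenSample(ID3D12GraphicsCommandList* cl, ID3D12Resource* rt)
{
    // Draw A writes into rt while it is in the RENDER_TARGET state.
    // cl->DrawInstanced(...);

    D3D12_RESOURCE_BARRIER b = {};
    b.Type = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    b.Transition.pResource   = rt;
    b.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    b.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    b.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;
    cl->ResourceBarrier(1, &b);   // the manual sync point: without it, Draw B
                                  // may race Draw A's writes

    // Draw B samples rt as a texture.
    // cl->DrawInstanced(...);
}
```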
Now to finally get back to copying to ESRAM while simultaneously rendering. Even if we assume that D3D11 on XB1 has the same implicit synchronization as on PC (which we probably shouldn't), it still doesn't mean that the driver (or equivalent) on XB1 can't track resource hazards and determine that it doesn't need to insert a sync point after kicking off a DMA into or out of ESRAM. TL;DR: I wouldn't assume that XB1 titles can't asynchronously copy to ESRAM just because the platform uses a variant of D3D11.
*Unless you use async compute, which lets you submit command buffers that get consumed in parallel with your "main" rendering command buffers.
As time goes on, it becomes more and more clear to me that the most beneficial resolution for the X1 would be 900p, especially if it can be combined with dynamic scaling. Some of the best games I've played on the console run at a very high framerate and 900p, and they look very nice.
Need a pre-patch analysis to compare tbh. The swamps are supposedly very taxing?
edit: the other comparison vids don't seem as bad. Oddly, still showing slightly higher than 30fps at times.
Given that Durante seemed to think their frame rate limiter wasn't good on PC, it's still probably not well frame-paced when the game can do 30fps.
Yes. The swamps are very GPU heavy, like many other areas in this game. DF only showed us the CPU heavy moments in their video.
Maybe with all the bloody patches they can progress further into the story for the two consoles for subsequent articles.
GTX 970 and W3 are a good combo for sure