Digital Foundry Article Technical Discussion Archive [2015]

Three-way face-off please, in the interests of fairness. Even though it's pulled, the Master Race needs to know their (I suppose I should say our, as I exclusively game on the PC now) version sucks balls.

NX has a first contact: there's a resolution difference and a slight performance disparity for Xbox One, and the PC version is missing lots of effects and not running very well on his AMD 7870 system.
 
Console ports are increasingly kicking the PC in the PCI-E bridge.

Maybe DX12 and the better control of transfers into video memory that it's supposed to allow can save us...
 
They are taking an age to do the Batman face-off, what gives?

Struggling to spin it in their preferred direction? ;)

They've already published two articles highlighting the mess that is the PC version, so I'm not sure that's a fair assessment. If they were to do a face-off right now, they would be right to exclude the PC version altogether, especially since it's been withdrawn from sale.
 
Console ports are increasingly kicking the PC in the PCI-E bridge.

Maybe DX12 and the better control of transfers into video memory that it's supposed to allow can save us...

What other games have suffered in this way recently? And what makes you think it has anything to do with PCI-E?
 
They miss AF, they use D3D registers. I would say they probably have experienced people, but don't have people who are dedicated to the PS4 version.
Every cross-platform studio has some dedicated PS4 and Xbox One developers, especially in the rendering team. Using D3D-style register bindings can reduce the code maintenance cost if your code base is influenced by DirectX. Some studios might prefer this (if they are not CPU bound and need the time to optimize the GPU side better). We have our own rendering API and use code generation instead. The generator emits optimal (branchless) resource binding code for each platform. We always use the closest-to-metal API on every platform. I know other cross platform developers who do the same.

Now that both consoles have GCN based GPUs and Jaguar CPUs, cross platform developers no longer need to split their optimization efforts between two completely different platforms. Most optimizations help both platforms similarly.
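To illustrate the generated-binding idea in the post above, here's a hypothetical C++ sketch (the engine types and function names are my own stand-ins, not the actual API or generator output being described): the generator replaces a generic, per-resource branching loop with straight-line code for each shader's fixed resource layout on each platform.

[CODE]
// Hypothetical illustration only: made-up engine types, not a real API.
struct CommandBuffer {                        // stand-in for a platform backend
    void SetTexture(int slot, unsigned handle) { /* platform-specific register write */ }
    void SetBuffer (int slot, unsigned handle) { /* platform-specific register write */ }
};

// Hand-written generic path: data-driven table, one branch per resource, every draw.
enum class BindType { Texture, Buffer };
struct Binding { BindType type; int slot; unsigned handle; };

void BindGeneric(CommandBuffer& cb, const Binding* bindings, int count)
{
    for (int i = 0; i < count; ++i) {
        switch (bindings[i].type) {
            case BindType::Texture: cb.SetTexture(bindings[i].slot, bindings[i].handle); break;
            case BindType::Buffer:  cb.SetBuffer (bindings[i].slot, bindings[i].handle); break;
        }
    }
}

// Generated path: the resource layout of (say) a lighting pass is known at build
// time, so the emitted code is branchless and touches exactly the slots it needs.
struct LightingPassResources { unsigned gbufferAlbedo, gbufferNormal, lightList; };

void Bind_LightingPass_Generated(CommandBuffer& cb, const LightingPassResources& r)
{
    cb.SetTexture(0, r.gbufferAlbedo);
    cb.SetTexture(1, r.gbufferNormal);
    cb.SetBuffer (0, r.lightList);
}
[/CODE]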
 
What other games have suffered in this way recently? And what makes you think it has anything to do with PCI-E?

Far Cry 4 springs to mind, just from reading about it (don't own it). Texture streaming seems to be the culprit there.

I'm pretty sure I've read among the DX12 preview articles that explicitly moving textures in and out of GPU memory using DX11 / 10 / 9 is high overhead, works best with large blocks (whole mipmap levels perhaps?), and can cause performance issues.

With DX12, controlling movement of data into GPU memory is finer grained and lower overhead, and less likely to interfere with other data being sent to the GPU.

PCI-E bandwidth should be enough for texture streaming even at high resolutions and frame rates if the transfers could be handled well enough. Just look at what virtual texturing can achieve with a paltry cache of a few MB and mechanical HDD transfer rates!

Suffice to say, I have high hopes for DX12.
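A rough sanity check on the PCI-E bandwidth point (the numbers and assumptions are mine, purely illustrative: PCIe 3.0 x16 at its ~15.75 GB/s theoretical one-way rate, and BC7-compressed textures at 1 byte per texel):

[CODE]
#include <cstdio>

int main()
{
    const double pcieBytesPerSec = 15.75e9;                 // PCIe 3.0 x16, theoretical one-way
    const double bytesPerFrame   = pcieBytesPerSec / 60.0;  // upload budget at 60 fps
    const double mip2048Bytes    = 2048.0 * 2048.0;         // one 2048x2048 BC7 mip level (~4 MB)

    std::printf("Per-frame upload budget at 60 fps: ~%.0f MB\n", bytesPerFrame / 1e6);        // ~260 MB
    std::printf("2048x2048 BC7 mip levels per frame: ~%.0f\n", bytesPerFrame / mip2048Bytes); // ~60
    return 0;
}
[/CODE]

Even a fraction of that budget would cover aggressive mip streaming, which is why the overhead and scheduling of the transfers, rather than the raw bandwidth, look like the real problem.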
 
Far Cry 4 springs to mind, just from reading about it (don't own it). Texture streaming seems to be the culprit there.

It runs fine now (I've got it and it's completely stutter-free on my 2GB GTX 670, pretty much maxed out at higher-than-console settings). I understand it stuttered when first released, but that was resolved after a few patches. No doubt PCs are harder to optimise for than consoles given the limitations of DX11 and the varied hardware configurations, but I'm not seeing a fundamental limitation of PCI-E unless we're talking about latency-sensitive GPGPU operations - which with DX12 would be possible on integrated GPUs.
 
if your code base is influenced by DirectX

What "code base"? It's not an enterprise with legacy code.

use code generation instead

Usually debugging generated code is a real nightmare.

if they are not CPU bound

If they are not CPU bound, are they by any chance GPU bound? I want to see these heroes with my own eyes!

Most optimizations help both platforms similarly.

That was also the case in the previous generation, but slightly different: people who wrote good PS3 code also wrote good X360 or PC code.
 
Texture streaming seems to be the culprit there.
Texture (and other data) streaming to GPU memory is often a cause of stuttering in PC games. This is mostly because DirectX abstracts the resource management (Java / garbage collection syndrome). With abstract resource management the GPU driver has no clue what textures you need in a certain level area. When you bind a resource to the GPU, and it is not resident in the GPU memory (you can't even ask for this), the driver notices that it is missing, starts the texture upload and with high likelihood stalls the frame rendering (if a big texture is missing or multiple small ones). Good manual texture management uses engine knowledge about level design and moves textures to GPU memory just ahead of time, avoiding these stalls.
With DX12, controlling movement of data into GPU memory is finer grained and lower overhead, and less likely to interfere with other data being sent to the GPU.
Copy queues make data movement cheaper and lower latency. But most importantly, the game engine can tell the GPU what data is needed instead of employing driver-side black magic to guess it.
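A minimal D3D12 sketch of that idea (illustrative only; all handles are assumed to be created by the engine's streaming system, and error handling, resource states and residency bookkeeping are omitted):

[CODE]
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Upload texture data on a dedicated copy queue, ahead of when it is needed,
// and make the graphics queue wait on a fence only when the texture is about
// to be used - instead of the driver guessing and stalling mid-frame.
void StreamTextureAheadOfTime(ID3D12Device* device,
                              ID3D12CommandQueue* graphicsQueue,
                              ID3D12GraphicsCommandList* copyList,     // recorded on a COPY-type allocator
                              const D3D12_TEXTURE_COPY_LOCATION& dst,  // texture in a default heap
                              const D3D12_TEXTURE_COPY_LOCATION& src,  // staging data in an upload heap
                              ID3D12Fence* fence, UINT64& fenceValue)
{
    // A copy queue runs transfers on the GPU's DMA engines, so the upload
    // does not steal time from the graphics queue.
    D3D12_COMMAND_QUEUE_DESC desc = {};
    desc.Type = D3D12_COMMAND_LIST_TYPE_COPY;
    ComPtr<ID3D12CommandQueue> copyQueue;       // in practice created once, not per upload
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&copyQueue));

    copyList->CopyTextureRegion(&dst, 0, 0, 0, &src, nullptr);
    copyList->Close();
    ID3D12CommandList* lists[] = { copyList };
    copyQueue->ExecuteCommandLists(1, lists);

    // The engine, not the driver, decides when the data must be resident.
    copyQueue->Signal(fence, ++fenceValue);
    graphicsQueue->Wait(fence, fenceValue);
}
[/CODE]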
What "code base"? It's not an enterprise with legacy code.
Big game engines have a huge amount of legacy code. Even the first-party console studios don't rewrite their whole code base for every project. We are talking about code bases of several million lines here. It would not be commercially viable to rewrite it all during a single project.
Usually debugging generated code is a real nightmare.
Not if you just generate the small platform-specific command creation part, and if you employ techniques to make debugging easier. Some studios even employ code generation to make runtime code editing faster and easier, improving the iteration time. Code generation can be used to make debugging easier instead of harder when used properly.
That was also the case in the previous generation, but slightly different: people who wrote good PS3 code also wrote good X360 or PC code.
SPUs certainly forced people to think about data access patterns and optimize the crap out of the data movement between memory and local store. This of course helps all cache-based architectures, especially ones like the Xbox 360 that require manual cache prefetching to perform well. However, Xbox 360 VMX128 code needs a lot of special care to work well. SPUs do not LHS-stall, for example, and SPUs have lower instruction latency. The compiler needs lots of parallelism for VMX128 inside each loop body to generate good code (and utilize that huge pool of 128 vector registers). SPU code doesn't require that much unrolling and other tricks to perform as expected.

The current generation allows you to use exactly the same optimized CPU code on both platforms. This has never been possible before. When you optimize a loop with AVX intrinsics, that code can be used on both consoles. When you optimize some data set to fit the L1 and L2 caches better, it helps both platforms identically, since they both have Jaguar CPUs with the same caches (same size and associativity). When you optimize around CPU bottlenecks and quirks, both consoles can use the same code. This is a big improvement for cross-platform developers.
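For example, a single AVX code path like this minimal sketch (my own, not from any shipped engine) compiles unchanged for PS4, Xbox One and PC; the Jaguar cores in both consoles support AVX but not FMA, so it sticks to a separate multiply and add:

[CODE]
#include <immintrin.h>
#include <cstddef>

// out[i] = a[i] * b[i] + c[i], eight floats per AVX iteration, scalar tail for the rest.
void MulAdd(float* out, const float* a, const float* b, const float* c, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(_mm256_mul_ps(va, vb), vc));
    }
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
[/CODE]

The same applies to cache tuning: blocking a data set for Jaguar's 32 KB L1 and 2 MB shared L2 helps both consoles identically.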

On the GPU side, you can also optimize the shader code once (for GCN) and expect minimal extra modifications based on platform. On PS3 you had to be extra careful about 32-bit ALUs and branching, interpolants, etc. The Xbox 360 GPU allowed more advanced techniques, but only if you had the time to fully rewrite your lighting (etc.) code for PS3. Some devs did lighting and post-processing on SPUs (very different code indeed compared to Xbox 360 shader code).
 
I've played around a bit with Batman these days (got it for free with my GPU)... it's an interesting beast, really.
So, I should easily be getting an average of about 60Hz at 1080P or thereabouts. Or something along the lines of 30Hz at 4K with reduced options (which there aren't many of in Batman).

Either way... the "PC Performance Test". For comparison's sake, I ran it twice. With VSync on!

4K
First run: 19fps lowest, 28fps average
Second run: 11fps lowest, 21fps average

That's a disparity I can't really believe. I know VSync is partly to blame here, as I might have only just managed to go above 30Hz in the first test and was barely running below 30Hz in the second, but... the lowest one is... impressive. Also, from the get-go it used north of 8GB of RAM, and the second run topped out at north of 9GB. Not to mention the heavy swapping within my GPU (a 970 with its 3.5GB of fast RAM).

Looking at The Witcher 3, I can manage 4K at 30Hz with medium-to-ultra details (and disabling all the IQ-destroying post-processes). And that game has such a long view distance... it really boggles my mind how Batman can't reach that, even in static scenes where there's no real texture streaming happening.
 
It would not be commercially viable to rewrite it all during a single project.

If you employ scene management and other "change management" code in your "engine" then I can see why it is millions of lines long (you do not "draw a mesh", you need to "place it in the scene", set up "modes" and other dependencies, etc.). But if you just immediately "draw triangles" (that's what the hardware is optimized for) I don't see why the code should be so complex.
Legacy code for current consoles is "being bound by D3D-thinking".

Code generation can be used to make debugging easier instead of harder when used properly.

Depends on what you call "code generation". If it's some clever component-based templates, I totally see why it can be easier and faster, but if it's real code generation...
Hmm, on the other hand I do remember using codegen to get some aspect-like behavior useful for debugging, maybe you're right...

Xbox 360 VMX128 code

Yeah, "backward compatibility" MSFT initiative will show just how many games used "manual optimizations" for that (I mean there is no way to emulate 3GHz CPU on 1.2GHz one if threads are that optimized).
But judging from MSFT even coming with this initiative I would say that they believe X360 CPU was heavily underused...

When you optimize a loop with AVX intrinsics

Then you'd better move it to GPU compute and forget about it.

When you optimize around CPU bottlenecks and quirks

Then you're doing it wrong. There is no performance to find there. Just use compute.
And, to pre-empt "it's GPU bound": to this day I have never seen a GPU-bound game in real life (GPU-bound = uses 100% of all GPU ALUs all the time).
 
Batman Arkham Face-Off

The visual return is easy to see; on PS4 the game runs at a full native 1920x1080 resolution many hoped would be the standard this generation. It's perhaps not the crispest example of a full-HD game, owing to its heavy post-process anti-aliasing, and a film grain filter - but the visibility of Gotham's city-line is all the clearer for running at this pixel count. Pop-in is something that flares up from time-to-time, but overall the game looks gorgeous in motion and rarely shows its rough edges.

On Xbox One, every single effect and detail carries directly across from the Sony release. Texture mapping is identical, and in terms of asset streaming there are only minor variances between the two versions when it comes to texture pop-in. However, it's a familiar scenario in the resolution stakes, and we get an upscaled 1600x900 on Xbox One that causes more pixel-crawl on distant buildings than we see on PS4. This is accentuated by Arkham Knight's post effects - the same gamut of filters as seen on PS4 - where a chromatic aberration pass heightens the effect of pixel-crawl in brightly-lit areas.

On balance, Xbox One shows a trend of more hiccups and tears overall as we glide through the city, though the difference isn't stark. It's not enough to detract from the playing experience on either platform, but it's fair to say PS4 is a smoother performer as an overarching rule. Impressively, as of patch 1.02 playback on Sony's platform is even more polished than results we experienced in our review code - and the fully patched Xbox One release surpasses this older Sony build too. It's a very solid 30fps turnout for both, and in light of the frame-pacing issues experienced in recent games, a real breath of fresh air.
 
Then you're doing it wrong. There is no performance to find there. Just use compute.
And, to pre-empt "it's GPU bound": to this day I have never seen a GPU-bound game in real life (GPU-bound = uses 100% of all GPU ALUs all the time).
No, GPU-bound means that the bottleneck is in at least one part of the GPU. You will never utilize a GPU 100%, if you really mean 100%. You can't even use the ROPs 100%, because of memory bandwidth.
 
No, GPU-bound means that the bottleneck is in at least one part of the GPU. You will never utilize a GPU 100%, if you really mean 100%. You can't even use the ROPs 100%, because of memory bandwidth.

Eh, now that we are definitely heading toward a spin-off thread, I wanted to know: while the ROPs are fully using the main memory bandwidth, is it still possible to make the ALUs process something only within the GPU caches?
 
Batman: Arkham Knight receives a patch on PC
Changelog:
- Fixed a crash that was happening for some users when exiting the game
- Fixed a bug which disabled rain effects and ambient occlusion. We are actively looking into fixing other bugs to improve this further
- Corrected an issue that was causing Steam to re-download the game when verifying the integrity of the game cache through the Steam client
- Fixed a bug that caused the game to crash when turning off Motion Blur in BmSystemSettings.ini. A future patch will enable this in the graphics settings menu
 
An open-world game with a relatively large amount of physics: where is the CPU advantage of the One?
What advantage would you expect to see? If the CPU requirements are capped at what PS4 is capable of (enforced parity), no advantage would be visible. From the sounds of it, the game is well balanced to not put moments of considerable framerate-trashing stress on the CPU, and it's only the occasional GPU spike that hampers the 30 fps.
 
Maybe part of the physics runs on the GPU. Raycasting for visibility or other tasks, cloth physics like Ubi's, and other parts are good tasks for GPGPU.
 
What advantage would you expect to see? If the CPU requirements are capped at what PS4 is capable of (enforced parity), no advantage would be visible. From the sounds of it, the game is well balanced to not put moments of considerable framerate-trashing stress on the CPU, and it's only the occasional GPU spike that hampers the 30 fps.
If they choose parity for the CPU, why not parity for the GPU then?

What advantages would I expect? Well, lots of NPCs (AI) in an open-world game, lots of physics... shouldn't this put stress on the CPU?

Didn't we hear the NPC argument in the case of AC Unity?

Can't sudden physics interactions spike CPU usage?
 