DirectX 12: The future of it within the console gaming space (specifically the XB1)

Yah, there was that "Four stages of ESRAM adoption" thing that went around a while ago, and I still don't really see how it would make sense, in practical terms, to make better use of ESRAM. It seemed to suggest that stuffing a full g-buffer into ESRAM is not optimal, and that future games will move to steps 3 and 4, whereas games now are at stages 1 and 2.

1. Statically allocate a small number of render targets into ESRAM (stuffing the g-buffer into ESRAM)
2. Alias the same memory for reuse later (don't know what this means)
3. Partial residency - put the top strip of a render target (sky) into DRAM, not ESRAM (how practical is this for any kind of game with a user-controlled camera?)
4. Asynchronously DMA resources in/out of ESRAM (tiling; is this the whole thing about moving different buffers into ESRAM at each stage of the rendering pipeline? Depth, stencil, colour, others?)
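
On stage 4, here's a purely illustrative sketch of what the idea amounts to (none of the real Xbox One ESRAM/DMA APIs are public, and all the pass names and sizes below are invented): the sum of all render targets can exceed 32 MB as long as the working set of each individual pass fits, with asynchronous copies shuffling buffers between ESRAM and DRAM in between.

```cpp
#include <cstdio>
#include <vector>

// Illustrative only: models which targets a frame would want resident in the
// 32 MB at each pass. Names and sizes are made up for the example.
struct Pass {
    const char* name;
    const char* esramResidents;  // what we'd keep in ESRAM during this pass
    int         workingSetMB;    // rough footprint of those targets
};

int main() {
    const int kEsramMB = 32;
    const std::vector<Pass> frame = {
        { "depth pre-pass", "depth",                       8 },
        { "g-buffer",       "depth + albedo + normals",   24 },
        { "lighting",       "depth + HDR light accum",    20 },
        { "post / resolve", "HDR light accum + LDR back", 16 },
    };
    int total = 0;
    for (const Pass& p : frame) {
        total += p.workingSetMB;
        std::printf("%-14s keeps {%s} resident: %2d MB (%s); DMA the rest out to DRAM\n",
                    p.name, p.esramResidents, p.workingSetMB,
                    p.workingSetMB <= kEsramMB ? "fits" : "too big");
    }
    std::printf("Sum over the frame: %d MB, yet no single pass needs more than %d MB\n",
                total, kEsramMB);
}
```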

PowerPoint presentation from Martin Fuller (relevant slides 7-13):
http://www.google.fr/url?sa=t&rct=j...=5OPUG-PcfQ6l-QG42tkmnQ&bvm=bv.72676100,d.bGE
 
Man, it sure seems like Microsoft pulled a Sony on devs this generation. XBone seems like it will be a pain in the ass for devs to get performance out of it.
 
Man, it sure seems like Microsoft pulled a Sony on devs this generation. XBone seems like it will be a pain in the ass for devs to get performance out of it.

Yah, the big question is how much does doing all of that gain you? If you can go 900p down to 792p without doing all that work, is that a better option than doing all the extra work if the end result on the screen is not noticeably better? It'll be interesting to see how it plays out.

Edit: I'm also not sure how compute is going to affect asynchronous transfer of buffers in and out of ESRAM. For example, how many shaders need depth information at different stages of your pipeline? Is it really going to be easy, in practice, to keep depth, stencil, colour and other buffers in DDR3 at different stages?
 
Let me fix that:

But even so that allows only 900p on those 32 MB at the level of complexity and quality Crytek was aiming for.

They'd probably be able to hit 1080p as well if they were willing to sacrifice on various kinds of detail, post processing effects and such.
 
2. Alias the same memory for reuse later (don't know what this means)

It basically means telling the GPU that multiple different resources overlap in memory. Provided you manage read and write timing, you can reuse the same memory for different things at different stages of the render pipeline.
It's a very common thing to do. We saved over 1/3rd of our render target memory doing this in KZM.
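
The console-side APIs for this aren't public, but in stock D3D12 terms the same idea maps onto placed resources: create two targets at the same offset of one heap and issue an aliasing barrier when the memory changes owner. A minimal sketch, assuming a heap that was created large enough and with the render-target heap flag; the target names, sizes and formats are just placeholders:

```cpp
#include <windows.h>
#include <d3d12.h>

// Helper to describe a simple 2D render target.
static D3D12_RESOURCE_DESC MakeTargetDesc(UINT64 width, UINT height, DXGI_FORMAT fmt)
{
    D3D12_RESOURCE_DESC desc = {};
    desc.Dimension        = D3D12_RESOURCE_DIMENSION_TEXTURE2D;
    desc.Width            = width;
    desc.Height           = height;
    desc.DepthOrArraySize = 1;
    desc.MipLevels        = 1;
    desc.Format           = fmt;
    desc.SampleDesc.Count = 1;
    desc.Layout           = D3D12_TEXTURE_LAYOUT_UNKNOWN;
    desc.Flags            = D3D12_RESOURCE_FLAG_ALLOW_RENDER_TARGET;
    return desc;
}

// Two render targets placed at the same heap offset, so they share memory.
// Only one of them holds valid data at any point in the frame.
void CreateAliasedTargets(ID3D12Device* device, ID3D12Heap* heap,
                          ID3D12Resource** gbufferAlbedo, ID3D12Resource** bloomTarget)
{
    D3D12_RESOURCE_DESC albedoDesc = MakeTargetDesc(1920, 1080, DXGI_FORMAT_R8G8B8A8_UNORM);
    D3D12_RESOURCE_DESC bloomDesc  = MakeTargetDesc(1920, 1080, DXGI_FORMAT_R11G11B10_FLOAT);

    device->CreatePlacedResource(heap, 0, &albedoDesc,
                                 D3D12_RESOURCE_STATE_RENDER_TARGET, nullptr,
                                 IID_PPV_ARGS(gbufferAlbedo));
    device->CreatePlacedResource(heap, 0, &bloomDesc,
                                 D3D12_RESOURCE_STATE_RENDER_TARGET, nullptr,
                                 IID_PPV_ARGS(bloomTarget));
}

// Once lighting has consumed the albedo target, hand its memory to the bloom
// target for the post-processing passes.
void ReuseAlbedoMemoryForBloom(ID3D12GraphicsCommandList* cmdList,
                               ID3D12Resource* gbufferAlbedo, ID3D12Resource* bloomTarget)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                     = D3D12_RESOURCE_BARRIER_TYPE_ALIASING;
    barrier.Aliasing.pResourceBefore = gbufferAlbedo;
    barrier.Aliasing.pResourceAfter  = bloomTarget;
    cmdList->ResourceBarrier(1, &barrier);
    // After this, reads from gbufferAlbedo are invalid; bloomTarget should be
    // cleared or fully overwritten before being read.
}
```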
 
Yah, the big question is how much does doing all of that gain you? If you can go 900p down to 792p without doing all that work, is that a better option than doing all the extra work if the end result on the screen is not noticeably better? It'll be interesting to see how it plays out.

Edit: I'm also not sure how compute is going to affect asynchronous transfer of buffers in and out of ESRAM. For example, how many shaders need depth information at different stages of your pipeline? Is it really going to be easy, in practice, to keep depth, stencil, colour and other buffers in DDR3 at different stages?

Well, on the plus side, all the signs point to the Xbox doing better, not worse, over time in the resolution gap.

2013
Ghosts 720P (1080 PS4)
BF4 720P (900 PS4)
AC IV 900P (1080 PS4)

2014

COD:AW at least 882P (won't be >1080 PS4 obviously)
BF Hardline (now 2015) targeting 1080/60?
AC Unity targeting 1080/60?

Besides other examples like Destiny and Diablo 3 making 1080, The Witcher 3 shooting for 1080, Metro Redux at 900P, and resolution parity at 900P on UFC 15.

Now, granted, all these games may not make their targets, and we will see. But at least 900P seems to be the worst we've got on the One since MGS: GZ.
 
Man, it sure seems like Microsoft pulled a Sony on devs this generation. XBone seems like it will be a pain in the ass for devs to get performance out of it.

No way. These systems are too similar to PCs. Nothing as far outside the box as the EE/GS and Cell. It just seems some cannot wrap their heads around the fact that the GPUs in these two machines are worlds apart. People are hoping for parity that is never going to happen. The X1 can get the job done... just at lower render quality than its competition.
 
Let me fix that:



They'd probably be able to hit 1080p as well if they were willing to sacrifice on various kinds of detail, post processing effects and such.

I don't think that's physically possible when Deferred Shading is used.

It's all in the maths.

32 MB = 32*1024*1024 bytes = 33554432 bytes

At 1080p that's 33554432/(1920*1080) = 16.18 bytes per pixel.

Deferred Shading uses 28 bytes per pixel, and at 28 bytes per pixel those 32 MB only cover 792p (1408*792).

If we manage to somehow "aggressively compress" this (in the words of Crytek), we can reduce it to 20 to 24 bytes per pixel. 900p requires 23 bytes per pixel, so that is possible with compression, but going even further and reaching 16 (down from 28) would be a feat.
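
To make the arithmetic above easy to check, here is a trivial sketch of it (my own throwaway code; the only inputs are the 32 MB figure and the 28/23-bytes-per-pixel numbers quoted in this post):

```cpp
#include <cstdio>

int main() {
    const double esram = 32.0 * 1024 * 1024;  // 33,554,432 bytes of ESRAM
    const struct { const char* name; int w, h; } modes[] = {
        { "1080p", 1920, 1080 },
        { "900p",  1600,  900 },
        { "792p",  1408,  792 },
    };
    for (const auto& m : modes) {
        // Byte budget per pixel if the whole G-buffer has to live in ESRAM.
        std::printf("%-5s: %.2f bytes per pixel\n", m.name,
                    esram / (double(m.w) * m.h));
    }
    // A 28 bytes-per-pixel layout fits about 1.2 Mpixels: 792p but not 900p.
    std::printf("28 B/px fits %.0f pixels (1408*792 = %d)\n",
                esram / 28.0, 1408 * 792);
}
```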

Even other expert teams, like those from Ubisoft (Watch Dogs) and EA (Titanfall), that also used Deferred Shading (it is used because it greatly increases performance), were not capable of reducing memory usage enough to allow for 900p, let alone 1080p.

Honestly, I cannot see it resolved, not even with DX 12.
 
I don't think that's physically possible when Deferred Shading is used.

It's all in the maths.

32 MB = 32*1024*1024 bytes = 33554432 bytes

At 1080p that's 33554432/(1920*1080) = 16.18 bytes per pixel.

Deferred Shading uses 28 bytes per pixel, and at 28 bytes per pixel those 32 MB only cover 792p (1408*792).

If we manage to somehow "aggressively compress" this (in the words of Crytek), we can reduce it to 20 to 24 bytes per pixel. 900p requires 23 bytes per pixel, so that is possible with compression, but going even further and reaching 16 (down from 28) would be a feat.

Even other expert teams, like those from Ubisoft (Watch Dogs) and EA (Titanfall), that also used Deferred Shading (it is used because it greatly increases performance), were not capable of reducing memory usage enough to allow for 900p, let alone 1080p.

Honestly, I cannot see it resolved, not even with DX 12.

Why can't they just split the render targets between ESRAM and DDR3? I thought the XB1 was capable of writing out to render targets in both pools of RAM at the same time, or is that not exposed in the SDK as of yet?
 
Why can't they just split the render targets between ESRAM and DDR3? I thought the XB1 was capable of writing out to render targets in both pools of RAM at the same time, or is that not exposed in the SDK as of yet?

Memory can be allocated across both pools, transparently to the code.
 
Memory can be allocated across both pools, transparently to the code.

As Crytek said:
"We put our most accessed render targets like the G-Buffer targets into ESRAM. Writing to ESRAM yields a considerable speed-up."

Since the G-buffer is one of the most accessed render targets, I really doubt any performance could be gained by putting it in DDR3. Especially if you reduce the 68 GB/s theoretical bandwidth to 54.4 GB/s (80% efficiency, as estimated by the Xbox engineers in the Digital Foundry interview), and subtract the 30 GB/s that the CPU can access. And that leaves a guaranteed 25 GB/s (only) to the GPU!
 
There must be some parts of the g-buffer that aren't needed until the post-process phase; stuff like that can be moved to the slower DDR3, leaving the other stuff in ESRAM.
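
As a sketch of that kind of split (every target name, size and traffic estimate below is invented, not from any shipped title): rank the targets by how heavily they're hit per frame and greedily keep the hottest ones in the 32 MB, spilling whatever doesn't fit to DDR3.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical render targets at 1080p. Access counts are made-up estimates of
// how often each target is read/written per pixel over a frame.
struct RenderTarget {
    const char* name;
    uint32_t    sizeBytes;
    uint32_t    accessesPerFrame;
};

int main() {
    const uint32_t kEsramBudget = 32u * 1024 * 1024;
    std::vector<RenderTarget> targets = {
        { "depth/stencil",   1920u * 1080 * 4, 10 },
        { "gbuffer.albedo",  1920u * 1080 * 4,  6 },
        { "gbuffer.normals", 1920u * 1080 * 4,  6 },
        { "hdr.light.accum", 1920u * 1080 * 8,  8 },
        { "ssao",            1920u * 1080 * 1,  3 },
        { "bloom.half.res",   960u *  540 * 8,  2 },
    };
    // Most heavily accessed targets first.
    std::sort(targets.begin(), targets.end(),
              [](const RenderTarget& a, const RenderTarget& b) {
                  return a.accessesPerFrame > b.accessesPerFrame;
              });
    uint32_t used = 0;
    for (const RenderTarget& rt : targets) {
        const bool fits = used + rt.sizeBytes <= kEsramBudget;
        if (fits) used += rt.sizeBytes;
        std::printf("%-16s %8u KB -> %s\n", rt.name, rt.sizeBytes / 1024,
                    fits ? "ESRAM" : "DDR3");
    }
    std::printf("ESRAM used: %u KB of %u KB\n", used / 1024, kEsramBudget / 1024);
}
```

In a real engine you'd also factor in when each target is alive during the frame, which is where the aliasing and DMA ideas from earlier in the thread come back in.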
 
As Crytek said:
"We put our most accessed render targets like the G-Buffer targets into ESRAM. Writing to ESRAM yields a considerable speed-up."

Since the G-buffer is one of the most accessed render targets, I really doubt any performance could be gained by putting it in DDR3. Especially if you reduce the 68 GB/s theoretical bandwidth to 54.4 GB/s (80% efficiency, as estimated by the Xbox engineers in the Digital Foundry interview), and subtract the 30 GB/s that the CPU can access. And that leaves a guaranteed 25 GB/s (only) to the GPU!

You can hypothesize 1,000,000 ways that this wouldn't work, but in reality people just need 1 way that it would work.

It's just software; there's no reason they can't go for higher resolution while sacrificing pixel quality.
It's hard to believe Crytek never attempted to go for higher resolution at some point.
 
As Crytek said:
"We put our most accessed render targets like the G-Buffer targets into ESRAM. Writing to ESRAM yields a considerable speed-up."

Since the G-buffer is one of the most accessed render targets, I really doubt any performance could be gained by putting it in DDR3. Especially if you reduce the 68 GB/s theoretical bandwidth to 54.4 GB/s (80% efficiency, as estimated by the Xbox engineers in the Digital Foundry interview), and subtract the 30 GB/s that the CPU can access. And that leaves a guaranteed 25 GB/s (only) to the GPU!

I doubt very much that the CPU(s) use 30 GB/s, and if they ever did it'd be in short bursts over a small proportion of frame time.

Even a Core i7 doesn't consistently use that kind of bandwidth in game (not even remotely), and two of those cores will knock all six of the CPU cores in Xbone senseless.
 
I've recently read an interesting article about chroma subsampling in the framebuffer. For what it's worth, Blu-ray uses it as well, and nobody really complains about "half color resolution" there, either (well, you're downsampling the input...).

For the X1, this could potentially lead to interesting uses for ESRAM as well, no? It's not just the memory savings, either, as you are able to render just 2/3 of the information for essentially the same result (at least in PSNR).

You'd need to refactor your whole engine for the full gains, though... even the art pipeline could begin with it (the paper also talked about chroma subsampled textures, as well as using additional compression before DXT), leading to substantial savings across the board (texture memory, ALU, framebuffer, bandwidth). I'd love to see a comparison between a chroma subsampled rendering, "full resolution" and a 2/3 resolution (which in terms of framebuffer bandwidth and ALU should be comparable)... and to see which one compares more favorably to the native image.
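
For reference, a minimal sketch of the packing half of that idea (my own toy code, assuming the checkerboard scheme such papers describe, not any particular engine's implementation): keep luma for every pixel and alternate the two chroma terms per pixel, so three channels fit in two; reconstruction would then borrow the missing chroma term from neighbouring pixels.

```cpp
#include <vector>

struct RGB   { float r, g, b; };
struct YCoCg { float y, co, cg; };

// Exact, linear RGB -> YCoCg transform (invertible without loss).
static YCoCg ToYCoCg(RGB c)
{
    return {  0.25f * c.r + 0.5f * c.g + 0.25f * c.b,
              0.5f  * c.r              - 0.5f  * c.b,
             -0.25f * c.r + 0.5f * c.g - 0.25f * c.b };
}

// Pack a full-colour image into two planes: Y everywhere, plus Co or Cg in a
// checkerboard. Two values per pixel instead of three, i.e. a 1/3 saving.
void PackCompactYCoCg(const std::vector<RGB>& src, int width, int height,
                      std::vector<float>& lumaPlane, std::vector<float>& chromaPlane)
{
    lumaPlane.resize(src.size());
    chromaPlane.resize(src.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const YCoCg c = ToYCoCg(src[y * width + x]);
            lumaPlane[y * width + x]   = c.y;
            // Even checkerboard cells keep Co, odd cells keep Cg; the missing
            // term is reconstructed from horizontal neighbours when unpacking.
            chromaPlane[y * width + x] = ((x ^ y) & 1) ? c.cg : c.co;
        }
    }
}
```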
 
Using YCbCr encoding as 8:4:4 (2 bytes) is actually a pretty common thing to do when packing data for a g-buffer. Crysis 3 did this, for example (afaik).
Storing lower resolution chroma in the rendering pipe isn't practical though. How do you encode the result, for example, when each pixel is processed independently?
 
You'd need to refactor your whole engine for the full gains, though... even the art pipeline could begin with it (the paper also talked about chroma subsampled textures, as well as using additional compression before DXT), leading to substantial savings across the board (texture memory, ALU, framebuffer, bandwidth).

Crytek did chroma subsampling of the G-Buffer for Crysis 3, and I'm pretty sure for Ryse too.
It's tempting to try to extend this to the entire pipeline (from textures, to G-Buffer, to lighting, to the final frame buffer), but unfortunately that won't work as well as one might hope.
It is true that YCoCg can be linearly transformed from/to RGB space; however, common operations such as multiplying by a diffuse or specular color need to happen in RGB space. The equivalent transformation in YCoCg space is a 3x3 matrix, which makes it a lot less practical, not to mention that if you chroma subsample by dropping Co or Cg, you'll need to reconstruct the missing part every time you do these operations.
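
To illustrate that cost with a toy example (my own code, not Crytek's, using the standard YCoCg transform): a per-channel multiply by an albedo colour is a single component-wise multiply in RGB, but on a YCoCg-encoded value you either apply the equivalent 3x3 matrix or round-trip through RGB, and with checkerboard-subsampled chroma you'd have to reconstruct the dropped term first.

```cpp
struct RGB   { float r, g, b; };
struct YCoCg { float y, co, cg; };

// Standard YCoCg transform and its exact inverse.
static YCoCg ToYCoCg(RGB c)
{
    return {  0.25f * c.r + 0.5f * c.g + 0.25f * c.b,
              0.5f  * c.r              - 0.5f  * c.b,
             -0.25f * c.r + 0.5f * c.g - 0.25f * c.b };
}

static RGB ToRGB(YCoCg c)
{
    return { c.y + c.co - c.cg,
             c.y        + c.cg,
             c.y - c.co - c.cg };
}

// In RGB this is one component-wise multiply. In YCoCg we round-trip through
// RGB (or apply an equivalent 3x3 matrix) to get the same result.
YCoCg ModulateByAlbedo(YCoCg lit, RGB albedo)
{
    const RGB rgb       = ToRGB(lit);
    const RGB modulated = { rgb.r * albedo.r, rgb.g * albedo.g, rgb.b * albedo.b };
    return ToYCoCg(modulated);
}
```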
 
So I fail to see a connection there; is this written by someone from Crytek?

I'm sorry for the bad structure of that sentence.
What I meant to say was only that Ryse uses that technique, not that Ryse uses the 28 bytes per pixel referred to in that document as the technique's average.
If it did, it could not be 900p, only 792p.
 