DirectX 12: its future in the console gaming space (specifically the XB1)

e) *OT Tinhat* I have a strong feeling Nvidia pushed a lot of their proprietary features into DX12 to become standard. I might even go as far as to say that those features in Maxwell MK2 made it into Xbox to help push [Nvidia's] cards; perhaps a way to make up for an all-AMD generation.

Nvidia may very well have pushed to have its favored features put into DX12, but Microsoft as the platform holder has an interest in including features from all major stakeholders. It's not helpful for the platform if it starts ignoring the advances in the state of the art of one of the most significant players in that space. It's not all just AMD, and if the goal is to make progress as an industry it can't be. There's now a beta API that can be all just AMD, if that's one's fetish.

One feature that will require hardware support in DX12 for sure is the implementation of PixelSync. And AMD already has it covered!
Is there a source link for the claim that AMD has hardware support for PixelSync, or which hardware has it?
 
I am wondering: why is DX12 potentially more beneficial for the XB1? Considering that the hardware architecture of both consoles is almost identical, shouldn't one be just as compatible with DX12 features as the other?

Nick Baker: I'm Nick Baker, I manage the hardware architecture team. We've worked on pretty much all instances of the Xbox. My team is really responsible for looking at all the available technologies. We're constantly looking to see where graphics are going - we work a lot with Andrew and the DirectX team in terms of understanding that. We have a good relationship with a lot of other companies in the hardware industry and really the organisation looks to us to formulate the hardware, what technology are going to be appropriate for any given point in time. When we start looking at what's the next console going to look like, we're always on top of the roadmap, understanding where that is and how appropriate to combine with game developers and software technology and get that all together. I manage the team. You may have seen John Sell who presented at Hot Chips, he's one of my organisation. Going back even further I presented at Hot Chips with Jeff Andrews in 2005 on the architecture of the Xbox 360. We've been doing this for a little while - as has Andrew. Andrew said it pretty well: we really wanted to build a high-performance, power-efficient box. We really wanted to make it relevant to the modern living room. Talking about AV, we're the only ones to put in an AV in and out to make it media hardware that's the centre of your entertainment.

http://www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview

Both the DirectX and Xbox teams were/are working with each other. According to Spencer, they knew what DX12 was doing when they built the XB1. They also said in a 2013 interview that they want to be ahead of other companies' schedules:

"In the consumer space, to control your destiny, you can't just rely on commodity components. You have to be able to make your own silicon. It helps with performance; it helps with the cost; it helps make your product smaller; it helps you create your own IP (always a good thing). I'll argue you're a lot more flexible -- you're not relying on somebody else's schedule; you make your own. So we're obviously heading that way. The stuff we've done over the last 13, 14 years is one example of that within Microsoft. And you're gonna see more and more of that, is my guess, as you go forward."

http://www.engadget.com/2013/05/21/building-xbox-one-an-inside-look/

I prefer to wait and see what they have to say about the XB1 and its DX12-only features. Spencer once said that there are some DX12 features in the XB1 that he didn't want to talk about, but he nonetheless confirmed their existence.

(listen from 28:00)
 
I am wondering: why is DX12 potentially more beneficial for the XB1?
From my understanding of things, it shouldn't be, unless MS deliberately altered their GPU for additional DX12 future-proofing. If no additional changes were made, then in terms of benefits they should be the same.

Considering that the hardware architecture of both consoles is almost identical, shouldn't one be just as compatible with DX12 features as the other?

The ideal case for everyone involved is that both support exactly the same features, so that games can use the latest features and graphical development isn't excessively held back as the consoles age.

Unfortunately, that ideal case for all gamers may not hold, so the next best thing is for at least one console to have future-proofed support; the worst case is that neither does. If neither does, then we're just playing out the way 2009 should have gone, only in 2014+.

If it's just one, it's likely the Xbox, due to their close collaboration with the vendors, their ownership of DirectX, and the fact that last generation the Xbox 360 was a DirectX 9 GPU yet contained some of the DX10 feature set. Quite a pattern. I wouldn't be entirely shocked if they added more here as well.

If it's both consoles, then AMD has been really busy future-proofing their IP without letting anyone know. You'd figure that if GCN 1.0 or 1.1 had all of these features, we'd be hearing about it now, since Nvidia is making a big hoopla about it.
 
I am wondering: why is DX12 potentially more beneficial for the XB1? Considering that the hardware architecture of both consoles is almost identical, shouldn't one be just as compatible with DX12 features as the other?
It just depends on what MS and AMD did to the Xbox One GPU. MS says they highly customized many parts of the GPU. It could also be that there were some experimental features already present in older GCN, and in the GCN chip of the PS4, that were simply non-functional or never used by any software/firmware. That's not unusual (CPUs do that, GPUs do that, ...). Maybe they just fixed some of those features, or made them ready for real use in DX12/OpenGL.
There are also some differences in the GPU, e.g. the Move Engines. Then there is SHAPE, etc. DX is not only for GPUs; it is for the whole system. And as DX12/Win 10 is available for every connected device (e.g. ARM-based computers), there may be some things in the API that let you use parts of the system (outside of the GPU) that weren't available/ready (or just not usable that dynamically) before, simply because the API was not ready. No, I'm not writing about any 'secret sauce', just some things that make the life of a game developer a bit easier (or even more complicated).

E.g. we still don't know anything about the extra SRAM pool on the chip to the left of the GPU, and the 8 GB of flash memory is still not mentioned anywhere ...
And no, this need not have anything to do with GPU performance; it's just a few more small tweaks a console could make (e.g. for better loading times). The Xbox was designed as a media center ... well, there may be some things in DX/Win 10 that are needed for MS' definition of a media center in the future.
 
Nvidia may very well have pushed to have its favored features put into DX12, but Microsoft as the platform holder has an interest in including features from all major stakeholders. It's not helpful for the platform if it starts ignoring the advances in the state of the art of one of the most significant players in that space. It's not all just AMD, and if the goal is to make progress as an industry it can't be. There's now a beta API that can be all just AMD, if that's one's fetish.
Very true. Collaboration is key to progress for the industry.
 
Is there a source link for the claim that AMD has hardware support for PixelSync, or which hardware has it?

I didn't say AMD currently has hardware support for PixelSync. I said AMD has it covered! Quite a big difference there!

http://www.chiploco.com/amd-iceland-tonga-new-technologies-35155/

For a bit more info, this website claims UAV Ordering (the equivalent of Intel's PixelSync tech) is just a software algorithm.

http://videocardz.com/51021/amd-gcn-update-iceland-tonga-hawaii-xtx
 
It just depends on what MS and AMD did to the Xbox One GPU. MS says they highly customized many parts of the GPU. It could also be that there were some experimental features already present in older GCN, and in the GCN chip of the PS4, that were simply non-functional or never used by any software/firmware. That's not unusual (CPUs do that, GPUs do that, ...). Maybe they just fixed some of those features, or made them ready for real use in DX12/OpenGL.
There are also some differences in the GPU, e.g. the Move Engines. Then there is SHAPE, etc. DX is not only for GPUs; it is for the whole system. And as DX12/Win 10 is available for every connected device (e.g. ARM-based computers), there may be some things in the API that let you use parts of the system (outside of the GPU) that weren't available/ready (or just not usable that dynamically) before, simply because the API was not ready. No, I'm not writing about any 'secret sauce', just some things that make the life of a game developer a bit easier (or even more complicated).

E.g. we still don't know anything about the extra SRAM pool on the chip to the left of the GPU, and the 8 GB of flash memory is still not mentioned anywhere ...
And no, this need not have anything to do with GPU performance; it's just a few more small tweaks a console could make (e.g. for better loading times). The Xbox was designed as a media center ... well, there may be some things in DX/Win 10 that are needed for MS' definition of a media center in the future.

Uhh...
Well, the PS4 would have move engines if it needed them. It has a completely unified memory pool, so there it's pointless to DMA information around. Because the Xbox does not have a completely unified memory pool, DMA from the DDR3 to the ESRAM is the method they chose. Otherwise you'd be asking the GPU to waste cycles fetching from DDR3 just to dump data into ESRAM, if I understand that correctly.

As for SHAPE: from all my reading of it, it's for audio. It's very good at handling audio features.

The SRAM is four pools of 8 MB, not one massive pool of 32 MB; it's simply divided that way.

The 8 GB NAND flash module, if I understand correctly, is what's used for the OS's suspend and resume process. The whole OS can reside in there in low-power states, or something like that. I'm probably wrong on specifics, but I think I'm generally in the right area.
 
I am wondering: why is DX12 potentially more beneficial for the XB1? Considering that the hardware architecture of both consoles is almost identical, shouldn't one be just as compatible with DX12 features as the other?

The Xbox One DX11 API appears to have higher overhead and is less flexible than the API on PS4. DX12 shouldn't be an advantage in terms of performance so much as it levels the playing field on the API side. That's just my opinion.
 
The Xbox One DX11 API appears to have higher overhead and is less flexible than the API on PS4. DX12 shouldn't be an advantage in terms of performance so much as it levels the playing field on the API side. That's just my opinion.

Well, I assumed the DX11 Fast Semantics covered a lot of the driver overhead, but I suppose that remains to be seen. Parallel rendering should offer the Xbox a worthwhile advantage in certain scenarios if the PS4 does not have it; I'm referring to multiple CPU cores all submitting their own graphics contexts in parallel to the GPU - unless of course the PS4 has always had this available, which honestly wouldn't surprise me.

There was mention that GNM was very close to Mantle. I forget where I read that, though. I assume it could mean more than just low driver overhead.
 
I didn't say AMD currently has hardware support for PixelSync. I said AMD has it covered! Quite a big difference there!

http://www.chiploco.com/amd-iceland-tonga-new-technologies-35155/

For a bit more info, this website claims UAV Ordering (the equivalent of Intel's PixelSync tech) is just a software algorithm.

http://videocardz.com/51021/amd-gcn-update-iceland-tonga-hawaii-xtx
If the hardware element of PixelSync is not present, I would say the jury is out on whether AMD has it covered.
A lot of features can be correctly implemented using software, but the costs of emulating the hardware fast path have not been established.
It would be "covered" as long as the software solution is usable in a practical sense, and the majority of the GPUs discussed in the same articles as this feature are far beyond the resource scope of the Xbox One.
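For reference on the PC side, the DX12-era feature closest to Intel's PixelSync is Rasterizer Ordered Views, and the API exposes it only as an optional capability. Below is a minimal sketch of how an application could check whether a given GPU reports it; note this only tells you availability, not whether the driver has a fast hardware path or a slower fallback, which is exactly the open question here.

```cpp
// Sketch: query whether a D3D12 device reports Rasterizer Ordered Views
// (ROVs), the DX12 feature closest to Intel's PixelSync. Availability says
// nothing about how fast the implementation is. Link against d3d12.lib.
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    // nullptr = default adapter; feature level 11_0 is the DX12 minimum.
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
    {
        std::printf("No D3D12 device available.\n");
        return 1;
    }

    D3D12_FEATURE_DATA_D3D12_OPTIONS options = {};
    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS,
                                              &options, sizeof(options))))
    {
        std::printf("ROVs (PixelSync-style ordered UAV access): %s\n",
                    options.ROVsSupported ? "supported" : "not supported");
    }
    return 0;
}
```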
 
Well, I assumed the DX11 Fast Semantics covered a lot of the driver overhead, but I suppose that remains to be seen. Parallel rendering should offer the Xbox a worthwhile advantage in certain scenarios if the PS4 does not have it; I'm referring to multiple CPU cores all submitting their own graphics contexts in parallel to the GPU - unless of course the PS4 has always had this available, which honestly wouldn't surprise me.

Just for a public reference, parallel rendering on PS4 is on Epic's public UE4 task board, which heavily implies that the PS4 can do it. That's great, and expected, since the PS3 and X360 both could.

There was mention that PSSL was very close to Mantle. I forget where I read that, though. I assume it could mean more than just low driver overhead.

PSSL is the shading language; you probably mean GCM.
 
I am wondering: why is DX12 potentially more beneficial for the XB1? Considering that the hardware architecture of both consoles is almost identical, shouldn't one be just as compatible with DX12 features as the other?
We had someone important say that Sony would roll the same concepts into their libs/drivers even if they aren't doing so already. It boils down to whether the XB1 has significant DX12 feature enhancements or not. I for one don't believe it does, and I believe the core aspect of DX12, as it's being talked about, is the software layer and CPU interaction. That said, I don't follow DX12 closely and it's only what gets discussed here that shapes my view, so I could be missing a few important aspects regarding hardware features.
 
Just for a public reference, parallel rendering on PS4 is on Epic's public UE4 task board, which heavily implies that the PS4 can do it. That's great, and expected, since the PS3 and X360 both could.



PSSL is the shading language; you probably mean GCM.
GNM ;) Yes correct whoops.

To note, however: parallel rendering is available on Xbox One today, but the balance between the threads essentially makes it useless. I don't know if they had great parallel rendering on the 360 and PS3, so maybe someone else can comment, but from my understanding of things today, and from the Titanfall interviews I've read, the XBO acts like any other DX11 GPU: the immediate context is completely full and the other cores make minor submissions with their deferred contexts, so the overall speed is bound by the weakest link, which is the immediate context. DX12 will provide the serious load balancing that the XBO needs.

I would be a little shocked if the 360 and PS3 had load-balanced command buffer threads; the DX9/DX10 APIs were built mainly around single-threaded games, IIRC. I wouldn't be shocked either way with the PS4 - whether it does, or doesn't and is currently in a similar position to the X1.
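For anyone who hasn't used it, here is roughly what the stock desktop DX11 deferred-context pattern described above looks like. This is a sketch, not the Xbox-specific "fast semantics" variant: worker threads record command lists on deferred contexts, but playback is still serialised through the single immediate context, which is why thread 0 ends up carrying the load. The device and immediate context are assumed to exist, and error handling is trimmed.

```cpp
// Sketch of the desktop D3D11 deferred-context pattern: parallel recording,
// serial playback through the one immediate context.
#include <d3d11.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordWork(ID3D11DeviceContext* deferredCtx)
{
    // ... set state and issue draws on the deferred context ...
}

void SubmitFrame(ID3D11Device* device, ID3D11DeviceContext* immediateCtx)
{
    const int workerCount = 3;
    std::vector<ComPtr<ID3D11DeviceContext>> deferred(workerCount);
    std::vector<ComPtr<ID3D11CommandList>>   lists(workerCount);
    std::vector<std::thread>                 workers;

    for (int i = 0; i < workerCount; ++i)
    {
        device->CreateDeferredContext(0, &deferred[i]);
        workers.emplace_back([&, i]
        {
            RecordWork(deferred[i].Get());
            // FALSE: don't restore the deferred context's state afterwards.
            deferred[i]->FinishCommandList(FALSE, &lists[i]);
        });
    }
    for (auto& w : workers) w.join();

    // The serial bottleneck: every command list is played back through the
    // single immediate context, one after another.
    for (int i = 0; i < workerCount; ++i)
        immediateCtx->ExecuteCommandList(lists[i].Get(), FALSE);
}
```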
 
Uhh...
Well, the PS4 would have move engines if it needed them. It has a completely unified memory pool, so there it's pointless to DMA information around. Because the Xbox does not have a completely unified memory pool, DMA from the DDR3 to the ESRAM is the method they chose. Otherwise you'd be asking the GPU to waste cycles fetching from DDR3 just to dump data into ESRAM, if I understand that correctly.
The move engines can do a little bit more than just move: they can decode and encode while moving, so they can be used to save memory bandwidth and memory space at the same time.
The memory system is unified - virtually unified. It is one big virtual address space; it is only the software that prevents you from directly accessing the SRAM via the CPU (to prevent memory contention on the SRAM). This is also somewhere a lower-level API can provide better access methods etc. to maximize SRAM bandwidth usage.
Also, you don't have to copy something into the SRAM, or into DDR3 from the SRAM; the GPU can read from both. E.g. read from DDR3, process it, and write the result back to the SRAM if it is frequently needed/written. You can move something into DDR3 if the bandwidth of the SRAM is needed for something else, but it is not a must. Right now most developers seem to just use the SRAM for the render target, but that may change in the future (not every part of the render target is needed in the fast memory all the time).

Well ... did I just start an ESRAM debate again ... sorry for that. With my last post I meant the SRAM block that is outside of the 32 MB, which is ~2-3 MB when you measure its size. I still want to know what this is used for. But maybe it is just used for video encoding or something like that ^^ some multimedia thing nobody needs because "it is an all-in-one device".

As for SHAPE: from all my reading of it, it's for audio. It's very good at handling audio features.
Yeah, SHAPE is for audio processing, but it isn't very flexible right now and is almost unused in games today. It may just need some SDK improvements. It seems it was not yet ready for launch, like many things. Maybe in the future it will offload some CPU usage, or just increase the sound quality in some way.

The SRAM is four pools of 8 MB, not one massive pool of 32 MB; it's simply divided that way.
The SRAM is even divided into 512 KB chunks.

The 8 GB NAND flash module, if I understand correctly, is what's used for the OS's suspend and resume process. The whole OS can reside in there in low-power states, or something like that. I'm probably wrong on specifics, but I think I'm generally in the right area.
The flash memory is still a mystery. Well, that's not it: the OS has its own memory somewhere in the 3 GB. PC-style suspend is not used, because the Xbox must stay powered for instant-on; that wouldn't be needed if suspend-to-flash were implemented. But maybe a real suspend mode is coming with future OS updates.


We had someone important say that Sony would roll the same concepts into their libs/drivers even if they aren't doing so already. It boils down to whether the XB1 has significant DX12 feature enhancements or not. I for one don't believe it does, and I believe the core aspect of DX12, as it's being talked about, is the software layer and CPU interaction. That said, I don't follow DX12 closely and it's only what gets discussed here that shapes my view, so I could be missing a few important aspects regarding hardware features.

Well, they told us in the interview that they customized some things; the example was the command processor. So there may be some hardware requirements - not to enable a new feature, but to make the new feature fast enough that it can be used effectively. There is nothing preventing Sony from also getting this feature into their libs, but it may come with e.g. a higher CPU overhead.
 
The move engines can do a little bit more than just move: they can decode and encode while moving, so they can be used to save memory bandwidth and memory space at the same time.
The memory system is unified - virtually unified. It is one big virtual address space; it is only the software that prevents you from directly accessing the SRAM via the CPU (to prevent memory contention on the SRAM). This is also somewhere a lower-level API can provide better access methods etc. to maximize SRAM bandwidth usage.
Also, you don't have to copy something into the SRAM, or into DDR3 from the SRAM; the GPU can read from both. E.g. read from DDR3, process it, and write the result back to the SRAM if it is frequently needed/written. You can move something into DDR3 if the bandwidth of the SRAM is needed for something else, but it is not a must. Right now most developers seem to just use the SRAM for the render target, but that may change in the future (not every part of the render target is needed in the fast memory all the time).
Yes, you wouldn't do a straight copy; you'd do it through a shader. It'll afford you a little more bandwidth than the move engines, which (combined or individually) max the bus out at 32 GB/s. I believe a shader pulling from DDR3 can achieve a higher figure than that.

Well ... did I just start an ESRAM debate again ... sorry for that. With my last post I meant the SRAM block that is outside of the 32 MB, which is ~2-3 MB when you measure its size. I still want to know what this is used for. But maybe it is just used for video encoding or something like that ^^ some multimedia thing nobody needs because "it is an all-in-one device".
The SRAM is even divided into 512 KB chunks.
Still pooled into 4x8 MB. And also, are you sure that exists as 'ESRAM'? What if it is something else entirely?
Digital Foundry: If we look at the ESRAM, the Hot Chips presentation revealed for the first time that you've got four blocks of 8MB areas. How does that work?

Yeah, SHAPE is for audio processing, but it isn't very flexible right now and is almost unused in games today. It may just need some SDK improvements. It seems it was not yet ready for launch, like many things. Maybe in the future it will offload some CPU usage, or just increase the sound quality in some way.
Sadly, the audio hardware never really gets used, but with respect to the topic of DX12, I don't think it'll add any additional feature set on the audio side of things.

The flash memory is still a mystery. Well, that's not it: the OS has its own memory somewhere in the 3 GB. PC-style suspend is not used, because the Xbox must stay powered for instant-on; that wouldn't be needed if suspend-to-flash were implemented. But maybe a real suspend mode is coming with future OS updates.
Interesting that it's still unknown/unconfirmed. Regardless, though, I don't see a strong correlation between the 8 GB NAND flash and DX12.

Well, they told us in the interview that they customized some things; the example was the command processor. So there may be some hardware requirements - not to enable a new feature, but to make the new feature fast enough that it can be used effectively. There is nothing preventing Sony from also getting this feature into their libs, but it may come with e.g. a higher CPU overhead.
Right, like I said earlier, I think the X1 has more stuff than standard GCN 1.1. My reasons are stated above: they spent a lot of money to build this device, so there was probably some thought of future-proofing. I mean, ultimately they knew what was coming, there's no doubt; MS wasn't predicting the future when they were collaborating on DX12, they were shaping it. It's not like they didn't do that for the Xbox 360: they knew DX10 was coming and they put some of it into the 360. So the question really becomes: what is the focus of DX12? Is it more than just a low-level driver and usable parallel rendering across multiple cores?

I'm excited at the prospect of seeing more VXGI in games in the future; hopefully there are more surprises to come.
 
I would be a little shocked if the 360 and PS3 had load-balanced command buffer threads; the DX9/DX10 APIs were built mainly around single-threaded games, IIRC. I wouldn't be shocked either way with the PS4 - whether it does, or doesn't and is currently in a similar position to the X1.

What do you mean by "load balanced"?

GCM on PS3 was a pretty light API. You construct command buffers on any thread (or SPE) and submit them serially. Load balancing was up to the developer. If I want to generate the command buffer for my entire scene in one thread and for a guy's finger in another, that's my fault. I don't think much of this has changed between generations.
 
Right, like I said earlier, I think the X1 has more stuff than standard GCN 1.1. My reasons are stated above: they spent a lot of money to build this device, so there was probably some thought of future-proofing. I mean, ultimately they knew what was coming, there's no doubt; MS wasn't predicting the future when they were collaborating on DX12, they were shaping it. It's not like they didn't do that for the Xbox 360: they knew DX10 was coming and they put some of it into the 360.
What use did that have, though? How much action did the XB360's tessellator see, for example? I don't recall anything game-changing in the end results. Any DX10-like features were premature because the tech wasn't ready (as is always the case for first-generation features of a new DX version; it takes a second wave of GPUs to refine the ideas into something practical). AMD offered a DX11 part. MS could tweak it, but not redesign it (AMD would need a year or three to create a new GPU on a new spec that's radically different from the old one), so they're left with a DX11 part with maybe a couple of DX12 niceties regarding memory addressing or some such. What they won't have is a higher shader model than PS4, or extra functional blocks, or anything significantly different - similar to the PS4 not having anything significantly different save a prod on the GPU<>CPU communication lines and a poke at the ACEs. The difference in real terms between what DX12 enables on XB1 and on PS4 will be lost among the many other variables affecting what's on screen. It'll just be the driver overhead and the ability to feed the GPU that matter. The only area that's perhaps a little grey, which I consider low probability, is whether the second GCP makes a significant difference. Perhaps the second GCP gets better utilisation of the GPU than compute does in real workloads, and the X1's efficiency and maximum utilisation end up higher than the PS4's?
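As a PC-side illustration of that "DX11 part running a DX12 API" point: a D3D12 device can be created on feature level 11_0 hardware, and the API simply reports the highest feature level the part supports, which is separate from the API version itself. A rough sketch, assuming a device created elsewhere:

```cpp
// Sketch: ask an existing D3D12 device which feature level it tops out at.
// The DX12 API itself runs on feature level 11_x (DX11-class) hardware.
#include <d3d12.h>
#include <cstdio>

D3D_FEATURE_LEVEL QueryMaxFeatureLevel(ID3D12Device* device)
{
    const D3D_FEATURE_LEVEL requested[] = {
        D3D_FEATURE_LEVEL_11_0, D3D_FEATURE_LEVEL_11_1,
        D3D_FEATURE_LEVEL_12_0, D3D_FEATURE_LEVEL_12_1,
    };

    D3D12_FEATURE_DATA_FEATURE_LEVELS levels = {};
    levels.NumFeatureLevels =
        static_cast<UINT>(sizeof(requested) / sizeof(requested[0]));
    levels.pFeatureLevelsRequested = requested;

    if (SUCCEEDED(device->CheckFeatureSupport(D3D12_FEATURE_FEATURE_LEVELS,
                                              &levels, sizeof(levels))))
    {
        std::printf("Max supported feature level: 0x%x\n",
                    levels.MaxSupportedFeatureLevel);
        return levels.MaxSupportedFeatureLevel;
    }
    return D3D_FEATURE_LEVEL_11_0;  // the DX12 minimum
}
```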
 
What do you mean by "load balanced"?

GCM on PS3 was a pretty light API. You construct command buffers on any thread (or SPE) and submit them serially. Load balancing was up to the developer. If I want to generate the command buffer for my entire scene in one thread and for a guy's finger in another, that's my fault. I don't think much of this has changed between generations.
I unfortunately never had the chance to program at that level :( Lua for PSN@home was as far as I got.

I'm not sure if this applied to the PS3, but it certainly applies to DX11:

[Image: 2806.cpucompare.png - per-thread CPU workload comparison chart]


You can see here that the immediate context (thread 0) is full while the other contexts are light. It's not really balanced. This is what I was referring to; I'm not sure how the older generations handled this. I imagine serially, as you just said.

DX11 allowed for some parallel submission, but from what I've read the overhead of setting it up was seldom worthwhile. DX12 will allow all threads to submit in parallel properly.
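For illustration, here is roughly what that proper parallel submission looks like under the shipped D3D12 API: each worker thread owns its own command allocator and command list, records independently, and everything is handed to the queue in a single ExecuteCommandLists call. The device, queue and RecordScenePortion() below are placeholders assumed to exist; fences, PSO setup and error handling are omitted.

```cpp
// Sketch of multi-threaded D3D12 command list recording with one submission.
#include <d3d12.h>
#include <wrl/client.h>
#include <thread>
#include <vector>

using Microsoft::WRL::ComPtr;

void RecordScenePortion(ID3D12GraphicsCommandList* cl, int portion)
{
    // ... set root signature/PSO and issue draws for this slice of the scene ...
}

void BuildAndSubmitFrame(ID3D12Device* device, ID3D12CommandQueue* queue)
{
    const int workerCount = 4;
    std::vector<ComPtr<ID3D12CommandAllocator>>    allocators(workerCount);
    std::vector<ComPtr<ID3D12GraphicsCommandList>> lists(workerCount);
    std::vector<std::thread>                       workers;

    for (int i = 0; i < workerCount; ++i)
    {
        device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT,
                                       IID_PPV_ARGS(&allocators[i]));
        device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT,
                                  allocators[i].Get(), nullptr,
                                  IID_PPV_ARGS(&lists[i]));
        workers.emplace_back([&, i]
        {
            RecordScenePortion(lists[i].Get(), i);
            lists[i]->Close();  // finish recording on this worker thread
        });
    }
    for (auto& w : workers) w.join();

    // One submission of all the independently recorded command lists.
    // (A fence is needed before reusing the allocators; omitted here.)
    std::vector<ID3D12CommandList*> raw;
    for (auto& l : lists) raw.push_back(l.Get());
    queue->ExecuteCommandLists(static_cast<UINT>(raw.size()), raw.data());
}
```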
 