Digital Foundry Microsoft Xbox Scorpio Reveal [2017: 04-06, 04-11, 04-15, 04-16]

What's the consensus on them integrating DX12 at the hardware level? Is it really a noticeable jump or more of a talking point?
I've spoken about this a couple of times in the past; I guess it never really amounted to anything until now. DICE revealed that the XBO already has the beginnings of this custom command processor hardware.

The DX12 feature in question is ExecuteIndirect, I believe.
Instead of issuing the draw calls itself (which takes up the majority of CPU time), the CPU asks the GPU to generate its own draw calls.

Discovered first here:
#1803

and we discussed it to death afterward, but not many people seemed to care ;)
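Here is a toy Python model of the ExecuteIndirect idea described above. A GPU pass (simulated here) writes draw arguments and a draw count into a buffer. The command processor then consumes that buffer, so the CPU records one call per frame instead of one per object. The fields of `DrawArguments` mirror D3D12's `D3D12_DRAW_ARGUMENTS`. Like the real call, the number of draws executed is the smaller of the count-buffer value and the maximum command count. Everything else is hypothetical: `SceneObject`, `gpu_cull_pass`, `execute_indirect`, and the `visible` flag standing in for real culling. This sketch does not claim to describe how Scorpio's command processor actually works.

```python
from dataclasses import dataclass

@dataclass
class DrawArguments:
    # Field layout mirrors D3D12_DRAW_ARGUMENTS; the Python class itself is illustrative.
    vertex_count_per_instance: int
    instance_count: int
    start_vertex_location: int
    start_instance_location: int

@dataclass
class SceneObject:  # hypothetical per-object record
    vertex_count: int
    first_vertex: int
    visible: bool  # stand-in for a real frustum/occlusion test

def gpu_cull_pass(objects, max_draws):
    """Simulates a compute pass that writes draw arguments plus a draw count
    into GPU memory, so the CPU never walks the object list itself."""
    arg_buffer = []
    for obj in objects:
        if obj.visible and len(arg_buffer) < max_draws:
            arg_buffer.append(DrawArguments(obj.vertex_count, 1, obj.first_vertex, 0))
    return arg_buffer, len(arg_buffer)  # (argument buffer, count buffer)

def execute_indirect(arg_buffer, count, max_command_count):
    """Simulates the command processor consuming the argument buffer:
    it executes min(count, max_command_count) draws."""
    issued = []
    for args in arg_buffer[:min(count, max_command_count)]:
        issued.append((args.start_vertex_location, args.vertex_count_per_instance))
    return issued

# CPU side: a single indirect call per frame, however many objects survive culling.
scene = [SceneObject(36, 0, True), SceneObject(300, 36, False), SceneObject(120, 336, True)]
args, count = gpu_cull_pass(scene, max_draws=1024)
draws = execute_indirect(args, count, max_command_count=1024)
print(draws)  # [(0, 36), (336, 120)]
```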
 
IMO, it's just a bullet point for comparing against the PS4 Pro. In practice, I think devs will actually use checkerboard rendering.
It's easier to implement. Aside from the clarity advantage, it's simply less work overall.
 
It's easier to implement. Aside from the clarity advantage, it's simply less work overall.

Easier how? 900p on X1, 1080p on PS4, 1800c/2160c on PS4 Pro and native 4K on Scorpio? Let's assume that every multiplat will run at checkerboard 2160p on PS4 Pro (which isn't the case for multiplats at the moment). Native 2160p renders precisely twice as many pixels per frame as that.
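The pixel arithmetic behind "precisely double" can be checked directly. This is a minimal sketch. It assumes checkerboard rendering shades half of the target resolution's pixels each frame. Real CBR implementations differ in what they keep at full resolution, so treat this as a simplification. `pixels_shaded` is a hypothetical helper, and "1800c" is taken to mean checkerboarded 3200×1800.

```python
def pixels_shaded(width, height, checkerboard=False):
    """Pixels shaded per frame; checkerboard is modelled as shading exactly half of them."""
    total = width * height
    return total // 2 if checkerboard else total

targets = {
    "900p (X1)": pixels_shaded(1600, 900),
    "1080p (PS4)": pixels_shaded(1920, 1080),
    "1800c (PS4 Pro)": pixels_shaded(3200, 1800, checkerboard=True),
    "2160c (PS4 Pro)": pixels_shaded(3840, 2160, checkerboard=True),
    "2160p native (Scorpio)": pixels_shaded(3840, 2160),
}
for name, count in targets.items():
    print(f"{name:>24}: {count:>9,}")

ratio = targets["2160p native (Scorpio)"] / targets["2160c (PS4 Pro)"]
print(ratio)  # 2.0
```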
 
I've spoken about this a couple of times in the past; I guess it never really amounted to anything until now. DICE revealed that the XBO already has the beginnings of this custom command processor hardware.

The DX12 feature in question is ExecuteIndirect, I believe.
Instead of issuing the draw calls itself (which takes up the majority of CPU time), the CPU asks the GPU to generate its own draw calls.

Yes, the large amount of work to customise the command processor and dramatically reduce CPU load is particularly interesting. Could Scorpio hit 60 fps in games that run at 30 fps on X1/PS4? Could the 60 Hz dream still be alive?

So much customisation of the "dead" cat cores is interesting too. Not just the improved clocks, but "extensive customisation to reduce latency" (an area where the X1 implementation was already ahead of the PS4's). They should have higher IPC as well as higher clocks, plus a lower workload in DX12 games.

4GB reserved for the dash, though... damn.
 
I'm going to guess that it's actually 48 CUs on the chip, with 8 disabled for yields. That would match well with a Polaris-based design: 16 CUs for Polaris 11 on a 128-bit bus, 36 CUs for Polaris 10 on a 256-bit bus, and 48 CUs on Scorpio with a 384-bit bus, with 8 CUs disabled.
 
Yes, the large amount of work to customise the command processor and dramatically reduce CPU load is particularly interesting.
Aren't the command processors programmable cores in the end? All these custom modifications sound really software-based...
 
So they explicitly mention a Polaris GPU. I expected that, but never knew they could reach 40 CUs with it. I guess they heavily customized it.
And no Zen? Never expected that, to be honest!
 
This bodes very well, IMO.
Hopefully we get the audio we wanted at the beginning of the generation, and SHAPE will now do something useful instead of serving Kinect.

Yeah, that's pretty awesome. And at least all that silicon isn't being wasted now...
 
4GB reserved by the OS... good god, it keeps getting worse and worse lol.

Say what you want about Nintendo hardware, but at least their OSes aren't maddeningly hoggish.

The rest of the console is pretty impressive.
 
I'm going to guess that it's actually 48 CUs on the chip, with 8 disabled for yields. That would match well with a Polaris-based design: 16 CUs for Polaris 11 on a 128-bit bus, 36 CUs for Polaris 10 on a 256-bit bus, and 48 CUs on Scorpio with a 384-bit bus, with 8 CUs disabled.

The article says it's 4 disabled.
 
Easier how? 900p on X1, 1080p on PS4, 1800c/2160c on PS4 Pro and native 4K on Scorpio? Let's assume that every multiplat will run at checkerboard 2160p on PS4 Pro (which isn't the case for multiplats at the moment). Native 2160p renders precisely twice as many pixels per frame as that.
Easier to implement because checkerboard rendering actually takes time to develop, and each checkerboard rendering approach is different: it's not universal, and it's not hardware-based (there are ways to assist it, but it's not like there's a hardware checkerboarder in there). Each game that uses CBR undergoes heavy testing, and the game needs to be designed with it in mind for CBR to be trivial. Look at the older titles that didn't have CBR in mind: how many of the techniques they used would ultimately not work with CBR?

Asking a game to move to a 4K framebuffer seems trivial in comparison.
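Here is a deliberately naive checkerboard sketch in Python that makes the "designed with it in mind" point concrete. Each frame shades one colour of the checkerboard and fills the other half from the previous frame. It illustrates the idea under simplifying assumptions and is not any shipping technique. There are no motion vectors, so anything that moves would ghost. Effects that read neighbouring pixels would also see stale data. Those are the kind of engine-side issues a real implementation has to deal with. `render_checkerboard` and `shade` are hypothetical names.

```python
def in_this_frames_half(x, y, frame):
    """Checkerboard parity: even frames shade one half of the pixels, odd frames the other."""
    return (x + y + frame) % 2 == 0

def render_checkerboard(shade, width, height, frame, previous=None):
    """Returns (image, pixels_shaded). Assumes width >= 2 for the no-history fallback."""
    image = [[None] * width for _ in range(height)]
    shaded = 0
    # Pass 1: shade only this frame's half of the checkerboard.
    for y in range(height):
        for x in range(width):
            if in_this_frames_half(x, y, frame):
                image[y][x] = shade(x, y)
                shaded += 1
    # Pass 2: fill the holes. Naively reuse last frame's pixel (no motion vectors);
    # with no history, copy the horizontal neighbour, which was shaded this frame.
    for y in range(height):
        for x in range(width):
            if image[y][x] is None:
                if previous is not None:
                    image[y][x] = previous[y][x]
                else:
                    nx = x - 1 if x > 0 else x + 1
                    image[y][x] = image[y][nx]
    return image, shaded

def shade(x, y):
    return 10 * y + x  # static scene: colour depends only on position

w, h = 4, 2
f0, n0 = render_checkerboard(shade, w, h, frame=0)
f1, n1 = render_checkerboard(shade, w, h, frame=1, previous=f0)
full = [[shade(x, y) for x in range(w)] for y in range(h)]
print(n0, n1, f1 == full)  # 4 4 True
```

For a static scene, two consecutive frames together cover every pixel. That is why the reconstruction is exact here, and why motion is where the real engineering effort goes.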
 
Aren't the command processors programmable cores in the end? All these custom modifications sound really software-based...

Says in the article:

"We essentially moved Direct3D 12," says Goossen. "We built that into the command processor of the GPU and what that means is that, for all the high frequency API invocations that the games do, they'll all be natively implemented in the logic of the command processor - and what this means is that our communication from the game to the GPU is super-efficient."

Processing draw calls - effectively telling the graphics hardware what to draw - is one of the most important tasks the CPU carries out. It can suck up a lot of processor resources, a pipeline that traditionally takes thousands - perhaps hundreds of thousands - of CPU instructions. With Scorpio's hardware offload, any draw call can be executed with just 11 instructions, and just nine for a state change.
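Here is some back-of-the-envelope arithmetic for the instruction counts quoted above. The 11 and 9 figures come from the article. The frame workload (5,000 draws and 5,000 state changes) is an assumption for illustration only. So is the "traditional" cost of 2,000 instructions per call, the low end of "thousands". Instruction counts are also not time, since cache misses and memory stalls are ignored.

```python
def frame_instructions(draws, state_changes, per_draw, per_state_change):
    """CPU instructions spent per frame on draw submission and state changes."""
    return draws * per_draw + state_changes * per_state_change

DRAWS, STATE_CHANGES = 5_000, 5_000  # hypothetical frame workload

# Figures from the article: 11 instructions per draw, 9 per state change.
scorpio = frame_instructions(DRAWS, STATE_CHANGES, per_draw=11, per_state_change=9)
# Assumed "traditional" cost: 2,000 instructions per call (low end of "thousands").
legacy = frame_instructions(DRAWS, STATE_CHANGES, per_draw=2_000, per_state_change=2_000)

print(scorpio, legacy, legacy // scorpio)  # 100000 20000000 200
```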

"It's a massive win for us and for the developers who've adopted D3D12 on Xbox, they've told us they've been able to cut their CPU rendering overhead by half, which is pretty amazing because now the driver portion of that is such a tiny fraction," adds Goossen.

But you need to be using DX12 to benefit, and some games are stuck on DX11. Anyway, MS have done a lot of work with the GPU.
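A simple CPU frame-time model helps frame the 60 fps question. All of its inputs are hypothetical. It assumes a 30 fps game whose CPU frame takes 30 ms, 12 ms of which is rendering/driver overhead. It also assumes D3D12 halves that overhead, as Goossen describes, and that all CPU work then scales linearly with clock speed, which is optimistic. GPU time is ignored entirely.

```python
def cpu_frame_ms(game_ms, render_ms, render_scale=0.5, clock_speedup=1.0):
    """Estimated CPU frame time: the rendering/driver part is scaled by render_scale
    (0.5 = "cut by half"), then all CPU work is assumed to speed up linearly with clock."""
    return (game_ms + render_ms * render_scale) / clock_speedup

def required_clock_speedup(game_ms, render_ms, target_fps, render_scale=0.5):
    """Clock speedup needed for the scaled CPU frame to fit the target frame budget."""
    budget_ms = 1000.0 / target_fps
    return (game_ms + render_ms * render_scale) / budget_ms

# Hypothetical 30 fps title: 30 ms of CPU time per frame, 12 ms of it rendering overhead.
GAME_MS, RENDER_MS = 18.0, 12.0
print(cpu_frame_ms(GAME_MS, RENDER_MS))  # 24.0 (still over the ~16.7 ms budget for 60 fps)
print(round(required_clock_speedup(GAME_MS, RENDER_MS, 60), 2))  # 1.44
```

In this made-up example, halving the rendering overhead alone gets the frame from 30 ms to 24 ms. Reaching 60 fps would also need roughly a 1.44x CPU speedup. How much DX12 helps depends on how large the rendering share was to begin with.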
 
Aren't the command processors programmable cores in the end? All these custom modifications sound really software-based...
No, they aren't, or rather not explicitly. They aren't working cores; they're closer to schedulers, responsible for scheduling the work the GPU does. Commands come in, and once all the commands are present, the CP assigns the work depending on what type of command it is: async, compute, copy, etc.
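Here is a minimal Python sketch of the scheduler behaviour described above. It assumes a submission is only dispatched once it is complete, marked here by a hypothetical `end_of_batch` command. The commands are then routed to per-engine queues by type. The engine names and command format are made up for illustration. Real command processors parse hardware-specific packet formats.

```python
from collections import deque

ENGINES = ("graphics", "compute", "copy")  # hypothetical engine names

def dispatch(command_stream):
    """Route commands to per-engine queues, but only once their batch is complete.
    Returns (queues, commands still waiting for an end_of_batch marker)."""
    queues = {engine: deque() for engine in ENGINES}
    pending = []
    for kind, payload in command_stream:
        if kind == "end_of_batch":
            # Batch complete: hand every buffered command to its engine, in order.
            for engine, item in pending:
                queues[engine].append(item)
            pending.clear()
        elif kind in queues:
            pending.append((kind, payload))
        else:
            raise ValueError(f"unknown command type: {kind}")
    return queues, list(pending)

stream = [
    ("graphics", "draw A"),
    ("copy", "upload texture"),
    ("compute", "cull"),
    ("end_of_batch", None),
    ("graphics", "draw B"),  # its batch never completes, so it is not dispatched
]
queues, waiting = dispatch(stream)
print(list(queues["graphics"]), list(queues["copy"]), waiting)
```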
 
Yes, the large amount of work to customise the command processor and dramatically reduce CPU load is particularly interesting. Could Scorpio hit 60 fps in games that run at 30 fps on X1/PS4? Could the 60 Hz dream still be alive?
In just about every GPU out there (mobile included), CPs are fairly general processors with fixed-function helper hardware; the hard part has more to do with pipelining, cache and memory.

A CP is in many ways similar to a Network Processing Unit: it has very little time to do something and very little space to do it in. That's why they are usually threaded. If one thread has to take a little longer, another can still push work to the various hardware units further down the pipe. Keeping a modern GPU busy isn't easy...
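Here is a toy cycle-count model of why threading helps. Each CP thread must wait a fixed number of cycles (for example, for a memory fetch) before it can issue its next packet. Only one packet can be issued per cycle overall. With one thread, every stall is paid in series; with two, one thread's stall overlaps the other's. The latencies and the one-issue-per-cycle rule are assumptions, not figures for any real GPU. `cp_cycles` is a hypothetical name.

```python
def cp_cycles(threads):
    """threads: one list per CP thread of per-packet stall latencies (cycles).
    Each cycle, at most one packet is issued, picked round-robin from threads whose
    next packet is ready. Returns the cycle count until every packet has issued."""
    queues = [list(t) for t in threads]
    ready_at = [q[0] if q else 0 for q in queues]  # first fetch starts at cycle 0
    total = sum(len(q) for q in queues)
    n = len(queues)
    cycle, rr, issued = 0, 0, 0
    while issued < total:
        for i in range(n):
            t = (rr + i) % n
            if queues[t] and ready_at[t] <= cycle:
                queues[t].pop(0)  # issue this packet to the GPU back end
                issued += 1
                if queues[t]:
                    # The next packet's fetch starts after this issue cycle.
                    ready_at[t] = cycle + 1 + queues[t][0]
                rr = t + 1  # round-robin: start with the next thread next cycle
                break
        cycle += 1
    return cycle

print(cp_cycles([[4, 4, 4, 4]]), cp_cycles([[4, 4], [4, 4]]))  # 20 11
```

The same four packets take 20 cycles on one thread but 11 when split across two threads, because each thread's waiting overlaps the other's.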

The software cost of changing the CP a lot is a big problem as well. I know of at least one GPU that kept both its old and new CP in the new chip, so that the old one could be used while they developed and debugged the software. On release, the old CP simply goes unused, becoming dark silicon...
 