Digital Foundry Microsoft Xbox Scorpio Reveal [2017: 04-06, 04-11, 04-15, 04-16]

Which part cites Polaris being the design basis?
I interpret the article's statement that the Xbox One was used as the base as a sign that it might still be similar to GCN2 at a CU feature level, except where something is specifically cited as updated, like DCC and some of the geometry changes.

Here's what I get from the article.

In much the same way that Polaris is just specific targeted changes to the previous GPU, and Vega is just specific targeted changes to Polaris, the Project Scorpio GPU is specific targeted changes to the XBO base GPU.

So, inasmuch as Polaris and Vega define one path of GPU improvement and thus differ from the base GPU they started from, the same is true of the GPU in Project Scorpio (and obviously of the PS4 Pro GPU as well).

So for Project Scorpio they didn't want to start with a base of Polaris because it contained changes they did not want (IE - changed features they wanted to keep) or did not change certain aspects of the GPU as much as they wanted. They mentioned for example changes to various memory sizes, so I'd imagine things like Register space may have been increased which would alleviate some potential stalls common to GCN.

So while it isn't Polaris or Vega, it may still represent a quite significant change in architecture, similar in scope to Polaris or Vega even if the specific features differ.

So going from the base feature set of the XBO GPU, they mention taking features of Polaris and Vega. And they mentioned other changes not contained in either Polaris or Vega.

Considering their focus on keeping the base features as similar to the base XBO as possible, I'd imagine that leaving out double rate FP16 was a conscious choice on their end. IE - they were already incorporating features from Vega, but opted not to include double rate FP16 as it didn't fit their design goals for the GPU for Project Scorpio. Or to put it another way, they thought the die space for double rate FP16 could be better used for something else that would be of more general benefit for titles currently in the market or entering the market. Not to mention that in focusing on improving how graphics are rendered for current titles, double rate FP16 wouldn't help with that.

Regards,
SB
 
And MS have explicitly told DF that "the GPU supports extensions that allow depth and ID buffers to be efficiently rendered at full native resolution, while colour buffers can be rendered at half resolution with full pixel shader efficiency". This may be worse than Sony's implementation, or it may be better, but treating an ID buffer as secret sauce is silly at this point, because we know Scorpio can support something similar too. What is being described for Scorpio is remarkably similar to the implementation DICE used for their CB renderer.

For all we know that's just another DX12 feature and not an actual change to the hardware. MS has a recent track record of conflating the two and DF has a bad history of not even being able to parse information about how CB rendering works at all.

Frankly, I'd bet good money the vast majority of the customizations MS is counting are entirely related to the switch to GDDR5 and the removal of ESRAM. That being the case it's disingenuous to consider them anything like a competitive advantage when it's probable the result just makes the Scorpio APU work like the PS4 APU always has.
 
For all we know that's just another DX12 feature and not an actual change to the hardware. MS has a recent track record of conflating the two and DF has a bad history of not even being able to parse information about how CB rendering works at all.

What evidence do we have that DF don't know how CB rendering works?

Frankly, I'd bet good money the vast majority of the customizations MS is counting are entirely related to the switch to GDDR5 and the removal of ESRAM. That being the case it's disingenuous to consider them anything like a competitive advantage when it's probable the result just makes the Scorpio APU work like the PS4 APU always has.

It's always exciting to bet good money on people with far more insight being liars.

Having basically no meaningful modifications would certainly be an interesting end result from having run tens of thousands of simulated runs through hardware variations and making several tens of unique modification requests to available IP.
 
I'm sure the fact that there would be roughly 80 million consoles already installed that do not support double-rate FP16 led MS to decide that it could only ever be a SKU-specific optimization with limited support. They also had no idea the Pro existed or would support it, so why use the die space on a feature that might only ever be on Scorpio?
Not only consoles, PCs as well.

And the mantra for MS is to carry over the workflow from XBO to Scorpio and W10 PCs, all of them DX12.
 
Given the areas of modification, would it be possible to estimate a ballpark for how much performance might have changed? Just how significant is the performance overhead of VM and hypervisor likely to be?
The article's quote is pretty general:
"On the latency, a couple of the areas we tackled, one was all the queues coming back from the memory interface, we sped those up as well. Specifically, within the core, because we're running a virtualised OS environment, we wanted to optimise how memory translation operations happen so there are some key changes inside the core to speed those things up. The end result is that not only does the CPU run faster, it also runs more efficiently meaning more power for you at the end."

The memory interface queues may have been sped up due to the change to a wider GDDR5 interface, or, if they're queues in the uncore, they could have sped up because everything else is clocked significantly higher as well. The prior consoles had a lot clocked to match the GPU speed, which Scorpio increased significantly. I think that might serve more to make the CPU and GPU upclocks suffer less from diminishing returns, rather than standing out on its own.

Optimizing how memory translation operations happen is a fuzzy thing to interpret.
Some possibilities might be more capacity to cache translations between host and guest, or tweaks to the TLB or page table walker that reflect what Microsoft knows as the owner of the hypervisor and system memory arrangement.
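
To put a rough shape on why translation might matter under a hypervisor: on x86-64 with hardware nested paging in general, a TLB miss has to walk the guest page table, and every guest-physical pointer found along the way needs its own walk of the host table. The sketch below just counts those memory references. Whether Scorpio's hypervisor works this way is not stated in the article, and the function names plus the miss-rate and latency values are made up purely for illustration.

```python
def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for one TLB miss under nested paging.

    Each of the guest's table pointers (and the final guest-physical
    address) must itself be translated through the host table, which
    gives (g + 1) * (h + 1) - 1 references in total.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


def avg_translation_cost_ns(tlb_miss_rate, refs_per_miss, ns_per_ref):
    """Average translation overhead per memory access (illustrative model)."""
    return tlb_miss_rate * refs_per_miss * ns_per_ref


# Native 4-level x86-64 paging: one reference per level.
native_refs = 4
# Guest and host both using 4-level tables.
virtualized_refs = nested_walk_refs(4, 4)

# Hypothetical numbers, NOT measurements of any console.
ASSUMED_MISS_RATE = 0.01
ASSUMED_NS_PER_REF = 5.0

native_cost = avg_translation_cost_ns(ASSUMED_MISS_RATE, native_refs, ASSUMED_NS_PER_REF)
virtualized_cost = avg_translation_cost_ns(ASSUMED_MISS_RATE, virtualized_refs, ASSUMED_NS_PER_REF)

print(f"refs per miss: native={native_refs}, nested={virtualized_refs}")
print(f"avg cost per access: native={native_cost:.2f} ns, nested={virtualized_cost:.2f} ns")
```

The point is only that a miss gets several times more expensive when virtualized, so caching more host/guest translations or tweaking the walker are plausible places to look for gains.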

In theory, the VM's overheads would have been reduced already, since Microsoft mentioned that it did a lot of work like this for the Xbox One, and over time there seem to have been enough examples where Microsoft's CPU-bound performance managed to edge out the PS4. Maybe there are specific functions that Microsoft wanted improved, but given that the Xbox One didn't clearly suffer in CPU terms versus its non-virtualized competition, the gain might not be as obvious, because a lot of other work has already been done to avoid the overhead.

So for Project Scorpio they didn't want to start with a base of Polaris because it contained changes they did not want (IE - changed features they wanted to keep) or did not change certain aspects of the GPU as much as they wanted. They mentioned for example changes to various memory sizes, so I'd imagine things like Register space may have been increased which would alleviate some potential stalls common to GCN.
I think changing the vector register file's capacity would be a notable change. It's something that has been constant even as other elements of the architecture have changed.
 
Not only consoles, PCs as well.

And the mantra for MS is to carry over the workflow from XBO to Scorpio and W10 PCs, all of them DX12.
I just wish Scorpio were fully PC compatible. If it is, I will get it day one; if not, dunno. Since my predictions aren't always right, we shall wait and see.
And what if Sony made thousands of customisations?
It's a PR talk.
The benefits of double-rate FP16 and the ID buffer were explained; these 60 customisations read entirely like marketing bullet points.
 
For all we know that's just another DX12 feature and not an actual change to the hardware. MS has a recent track record of conflating the two and DF has a bad history of not even being able to parse information about how CB rendering works at all.

Frankly, I'd bet good money the vast majority of the customizations MS is counting are entirely related to the switch to GDDR5 and the removal of ESRAM. That being the case it's disingenuous to consider them anything like a competitive advantage when it's probable the result just makes the Scorpio APU work like the PS4 APU always has.

While what you suggest is possible, I think you are really underestimating the gains achievable with proper simulation. There are a number of techniques to analyze and optimize in this situation, but I personally prefer a forest-of-trees approach here as, to me, it provides greater visibility into options that yield unexpected and/or outsized gains. They could easily have found 60 places where making adjustments yielded strong ROI.
 
I'm sure the fact that there would be roughly 80 million consoles already installed that do not support double-rate FP16 led MS to decide that it could only ever be a SKU-specific optimization with limited support. They also had no idea the Pro existed or would support it, so why use the die space on a feature that might only ever be on Scorpio?

Well, it's a feature on PC, so it's not like Scorpio would be the only place it was available. Nvidia and AMD both support it, Nvidia earlier. I would guess there are other reasons why they chose not to implement a newer CU that includes double-rate FP16.
 
AFAIK, double rate FP16 is only enabled for CUDA on NVidia hardware in Windows and not for general game usage through either Vulkan or DirectX.

Regards,
SB
 
What about with SM 6.0?

The hardware is out there. Microsoft should be fully aware of that working with Nvidia, AMD on Directx.

I'm very curious to see what kind of performance improvements AMD's rapid packed math will bring, but for PS4 Pro to match Scorpio in ALU performance it would have to bring a 50% performance improvement, on average, which is asking a lot. The Frostbite GDC slides talk about FP16 bringing a 30% performance improvement to their checkerboard resolve shader, but checkerboard resolve plus temporal AA is only 1.15ms of their frame time. How much performance improvement does FP16 bring to the rest of their shaders? Is 30% an upper-limit case or an average?
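
For anyone who wants to play with the numbers, here is a back-of-envelope sketch. The CU counts and clocks are the publicly quoted specs (36 CUs at 911 MHz for PS4 Pro, 40 CUs at 1172 MHz for Scorpio), which on peak figures puts the gap a little under the 50% mentioned above. The 33.3ms frame budget and the reading of "30% improvement" as a 1.3x local speedup are my own assumptions, not something from the Frostbite slides.

```python
def gcn_peak_tflops(cus, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    """Peak FP32 TFLOPS for a GCN GPU: CUs x 64 lanes x 2 (FMA) x clock."""
    return cus * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12


def overall_speedup(fraction_accelerated, local_speedup):
    """Amdahl's law: whole-frame speedup when only part of the frame gets faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / local_speedup)


pro_tflops = gcn_peak_tflops(36, 911)
scorpio_tflops = gcn_peak_tflops(40, 1172)
# Average gain FP16 would need to deliver across the frame to close the gap.
required_gain = scorpio_tflops / pro_tflops - 1.0

# Assumptions: 30 fps frame budget, "30% faster" treated as a 1.3x speedup.
ASSUMED_FRAME_MS = 33.3
RESOLVE_PLUS_TAA_MS = 1.15  # from the post above
resolve_fraction = RESOLVE_PLUS_TAA_MS / ASSUMED_FRAME_MS
frame_speedup = overall_speedup(resolve_fraction, 1.3)

print(f"PS4 Pro peak: {pro_tflops:.2f} TFLOPS, Scorpio peak: {scorpio_tflops:.2f} TFLOPS")
print(f"gain needed to match: {required_gain:.0%}")
print(f"whole-frame speedup from a 1.3x faster resolve+TAA pass: {frame_speedup:.4f}x")
```

Under those assumptions a 30% win confined to the resolve pass moves the whole frame by well under 1%, so the open question really is how much of the rest of the frame can run in FP16.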

Again, I think there must have been other factors in determining which features they decided to add. Hard to know what they would be. We just know they had a performance target, and they chose clock speed as the way to get there.
 
What about with SM 6.0?

The hardware is out there. Microsoft should be fully aware of that working with Nvidia, AMD on Directx.

There's nothing that prevents its use in Vulkan or DirectX. Just NVidia have chosen not to expose the feature outside of CUDA. It remains to be seen whether AMD exposing it in Vega will prompt NVidia to change their stance on it.

I don't believe they even expose it in OpenGL or OpenCL either (but I could be wrong). They could easily expose the feature via extensions in Vulkan, OGL, and OCL if they wanted.

Actually this seems to imply that FP16 rate on 1080 is only 1/64 rate (probably artificially limited?).

https://devtalk.nvidia.com/default/...scal-geforce-gtx-1080-gtx-1070-amp-gtx-1060/6

Basically they view it as a professional and not a consumer facing feature.
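
For scale, a quick sketch of what a 1/64 rate would mean in peak numbers. The core count and boost clock are the published GTX 1080 reference specs (2560 CUDA cores, 1733 MHz boost); real clocks vary by card, and the "double rate" line is purely hypothetical, since that is exactly what the consumer part does not offer.

```python
GTX1080_CUDA_CORES = 2560
GTX1080_BOOST_MHZ = 1733  # reference boost clock; an assumption for any given card


def peak_tflops(cores, clock_mhz, flops_per_core_per_clock=2.0):
    """Peak TFLOPS assuming one FMA (2 FLOPs) per core per clock."""
    return cores * clock_mhz * 1e6 * flops_per_core_per_clock / 1e12


fp32_tflops = peak_tflops(GTX1080_CUDA_CORES, GTX1080_BOOST_MHZ)
fp16_at_1_64_tflops = fp32_tflops / 64.0          # the rate implied by the linked thread
fp16_if_double_rate_tflops = fp32_tflops * 2.0    # hypothetical, not what the card does

print(f"FP32 peak:               {fp32_tflops:.2f} TFLOPS")
print(f"FP16 at 1/64 rate:       {fp16_at_1_64_tflops:.3f} TFLOPS")
print(f"FP16 if it were 2x rate: {fp16_if_double_rate_tflops:.2f} TFLOPS")
```

At 1/64 rate there is no reason for a game to touch native FP16 on that card, which fits the "professional feature" reading.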

Regards,
SB
 
Just adding to this one:

SM6.0 has hardware requirements as well, similar to DX Feature levels. There are flags for SM6.0 hardware support, and one of those items is FP16 rapid math.
FP16 comes to DX12 in June. I'm unsure whether the 10xx series of Nvidia GPUs is artificially locked to ensure that data scientists buy professional cards for their purposes while gamers buy consumer cards for theirs.

The addition of this feature to SM6.0 is probably indicative of a future direction for modern rendering.

Ideally this would force Nvidia to unlock the 10xx series. We'll know as soon as June rolls around.
 
So Scorpio won't support all/any SM6.0 requirements?
 
I'm sure the fact that there would be roughly 80 million consoles already installed that do not support double-rate FP16 led MS to decide that it could only ever be a SKU-specific optimization with limited support.
Isn't that an argument for not embracing any new technology? :runaway:

SM6.0 has hardware requirements as well, similar to DX Feature levels. There are flags for SM6.0 hardware support, and one of those items is FP16 rapid math.
Flags are not the same as requirements. :nope: It's more likely that SM6.0 can take advantage of RPM but that hardware support for it is not required.
 
While what you suggest is possible, I think you are really underestimating the gains achievable with proper simulation. There are a number of techniques to analyze and optimize in this situation, but I personally prefer a forest-of-trees approach here as, to me, it provides greater visibility into options that yield unexpected and/or outsized gains. They could easily have found 60 places where making adjustments yielded strong ROI.

What I don't get about this simulation approach to improving their design is this: shouldn't that be AMD's job during the design stages of their APUs and GPUs? It's as if MS were the "only" one with any idea how games use their hardware in the first place, able to locate critical areas and improve on them.
 
I'm sure the fact that there would be roughly 80 million consoles already installed that do not support double-rate FP16 led MS to decide that it could only ever be a SKU-specific optimization with limited support.
It may become common optimization in the PC space if NVIDIA ports it to their next mainstream GPU from GP100 and Tegra X1 now that AMD touts it as one of the biggest changes in Vega.

http://www.anandtech.com/show/11002/the-amd-vega-gpu-architecture-teaser/2

Things like normal maps don't require 32-bit precision, so it can accelerate pixel shader performance significantly. The other Vega feature present in the PS4 Pro is the Intelligent Workgroup Distributor. Does the Scorpio GPU have any Vega features?
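
On the normal map point, a small sketch of why FP16 is plenty there. It round-trips normal components in [-1, 1] through IEEE half precision using Python's standard `struct` module and compares the worst-case error with the quantization error of a typical 8-bit-per-channel normal map. The sample grid and the 8-bit comparison are my own choices for illustration, not anything taken from the AnandTech article.

```python
import struct


def to_fp16(x):
    """Round a Python float to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Sample normal components across the full [-1, 1] range.
samples = [i / 1000.0 for i in range(-1000, 1001)]
max_fp16_err = max(abs(to_fp16(x) - x) for x in samples)

# An 8-bit channel covering [-1, 1] has steps of 2/255, so the
# worst-case rounding error is half a step.
max_unorm8_err = (2.0 / 255.0) / 2.0

print(f"max FP16 rounding error:    {max_fp16_err:.6f}")
print(f"max 8-bit quantization err: {max_unorm8_err:.6f}")
```

FP16 error in that range stays at or below 2^-12, an order of magnitude tighter than the 8-bit source data, so doing the math at half precision loses nothing that the texture had to begin with.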
 
What I don't get about this simulation approach to improving their design is this: shouldn't that be AMD's job during the design stages of their APUs and GPUs? It's as if MS were the "only" one with any idea how games use their hardware in the first place, able to locate critical areas and improve on them.

I think it would be unreasonable for AMD to be expected to profile and design their APU based on wider system architecture features like ESRAM, VMs, differing memory types, custom APIs (GNM, GNMX), CPU/GPU OS/hypervisor reservations that will also impact performance. AMD could do this but surely this is something that better sits with Microsoft and Sony who are also talking directly with their developers.
 
MMU load with an MS-specific hypervisor structure is understandable, but ESRAM is not an issue with Scorpio, and this is about the Scorpio design being seriously improved through simulation. I'm sure you're not talking about bus width and GDDR5 clocks here. Tuning for a specific memory setup should be part of the basics of AMD's GPU design work.

Scorpio is an APU based on AMD Jaguar and GPU assets. That they may need some adjustments for their specific GPU setup makes sense, but for the general design? That should be AMD's responsibility, because it affects their own PC products.
 