Middle Generation Console Upgrade Discussion [Scorpio, 4Pro]

Has it been shown that there has been very little value added from HUMA and heterogeneous computing in general for these consoles?
Sorry, but the burden of proof is on the original poster. Compute is useful whether you have a discrete or an integrated GPU. Now, if you know of a game that can't run on a PC unless the CPU is an APU, I will start to reconsider my point of view.
Edit
Badly worded: Intel CPUs are APUs, yet the iGPU is not used when paired with a potent discrete GPU.
 
Second attempt, and an image:

scorpiomeasure7jjms.png


Non-rounded numbers:

Height: 15.298245614035087719298245614035
Width: 23.719298245614035087719298245614
Die size: 362.86365035395506309633733456448

Actually pretty similar to the X1 in terms of size if this is correct, which I'm not sure about as I'm hardly an expert in measuring dies :p

Math logic behind the measurement
  • Draw a line over the memory module width (which, according to Micron, is 12mm)
  • Measure how many pixels are on that line -> 12mm line = 171 pixels
  • Draw lines for width and height on the die
  • Measure the pixels of each line on the actual die -> (#pixels/171)x12 = mm of each respective line on the die
  • Multiply -> Profit (a code sketch of this follows below)
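A minimal sketch of that scaling in Python; the pixel counts below are back-solved from the quoted millimetre figures (218/171 x 12 and 338/171 x 12 reproduce them exactly), not measured by me:

```python
REF_MM = 12.0   # known memory-module width, per Micron
REF_PX = 171    # pixels measured across that module in the render

def px_to_mm(pixels: float) -> float:
    """Convert a pixel count to millimetres using the reference scale."""
    return pixels / REF_PX * REF_MM

height_mm = px_to_mm(218)   # pixel count back-solved from ~15.298 mm
width_mm = px_to_mm(338)    # pixel count back-solved from ~23.719 mm
print(f"Die: {width_mm:.3f} x {height_mm:.3f} mm = {width_mm * height_mm:.1f} mm^2")
# Die: 23.719 x 15.298 mm = 362.9 mm^2
```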

Just to test, I used the exact same method on this (GM200): https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_980_Ti/images/front_full.jpg

And the total die size came out to 600.49690695619239994950132559021; the reported NVIDIA die size is 601mm², so I guess that's pretty close :)
You could also do it with the 6-pin ATX power connector visible on the board in the uncropped image. I don't have a measuring tape with me, but in case those memory modules are a different size for whatever reason, you could also base the measurements off that.
 
Server MCMs communicate pretty well off-chip; perhaps there's room for a dual-die module that carries the benefits of HUMA but with a slight increase in communication time. It's not like CPU buses are terribly high-BW.

If you go past a certain size, you might get more bang for your money with separate dies for CPU and GPU. While more expensive in the long run, it might cut down time to market, cut down R&D needed for a more customised part, and then you just dump it and release a new system after a few years.

I'm not reading too much into that render for the moment. As well as die size, it shows a main memory bus width that doesn't sit too comfortably with the BW figures given out, unless MS are planning on using a 384-bit bus with GDDR5 (unlikely) or using GDDR5X chips in x16 mode at some 13~14 GHz (effective).
 
Base clock of the 480 is 1080 MHz, IIRC. Boost clocks aren't so useful in looking at what might translate to console land, at least traditionally.

Edit: if you want to take both the 12 memory chips and the 320 GB/s BW seriously, you might get there with (arithmetic sketched below):

- 256 GB/s from a single stack of HBM2 (giving 2* to 8 GB)
- 64 GB/s from 2.66 GHz DDR4 on a 192-bit bus (giving 6 or 12 GB)

This seems rather less likely than simply having a single pool of GDDR5X on a 256-bit bus (8 or 16 chips).

*yarp. HBM2 is going to be available in a cost- and APU-friendly 2-Hi stack:

http://hexus.net/tech/news/ram/91100-sk-hynix-schedules-4gb-hbm2-mass-production-q3-year/
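For what it's worth, the arithmetic behind those figures is just bus width times data rate. A minimal sketch, with the per-part data rates (in GT/s) being my assumptions:

```python
def bandwidth_gb_s(bus_bits: int, gt_s: float) -> float:
    """Peak bandwidth in GB/s: bus width in bytes times transfer rate in GT/s."""
    return bus_bits / 8 * gt_s

print(bandwidth_gb_s(1024, 2.0))   # one HBM2 stack (1024-bit @ 2 GT/s): 256.0
print(bandwidth_gb_s(192, 2.66))   # 192-bit DDR4-2666: ~63.8
print(bandwidth_gb_s(256, 10.0))   # 256-bit GDDR5X @ 10 GT/s: 320.0
```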
 
Base clock of the 480 is 1080 MHz, IIRC. Boost clocks aren't so useful in looking at what might translate to console land, at least traditionally.
Polaris 10 is supposed to run at its boost clocks in everything but FurMark & co.
 
Polaris 10 is supposed to run at its boost clocks in everything but FurMark & co.

The 360 S could run a power virus on both CPU and GPU simultaneously, with no option to throttle, and without ever overheating or becoming unstable.

(360 launch machines, on the other hand ....)

In light of the 480's base clock of 1080 MHz, the Neo's 911 MHz in a console form factor isn't looking so unreasonable.
 
In the bottom-left corner I count 14 balanced lines (8 on the left, 6 more on the bottom). Smells like a high-bitrate interface like PCIe; maybe there are 2 more lines hidden, for a total of 16.

I'm not rumor-mongering, I swear!
 
Oy, rumour monger, any idea how many layers the X1 and PS4 PCBs have? Assuming a similar number of traces per layer (I dunno, just assuming), we've got 23 traces per layer.

DDR4 is 78 used pins per chip; GDDR5 is, I think, 170.

So assuming a similar number of traces to the memory chips per layer, and assuming the Scorpio PCB render is in any way representative of anything, that's a minimum of 4 layers for DDR4 vs a minimum of 8 for GDDR5.

(Edit: no idea about GDDR5X in X16 mode)
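Spelled out, that back-of-envelope calculation is just pins divided by the assumed traces per layer, rounded up:

```python
import math

TRACES_PER_LAYER = 23  # assumed routable memory traces per PCB layer, from above

for part, pins in [("DDR4", 78), ("GDDR5", 170)]:
    layers = math.ceil(pins / TRACES_PER_LAYER)
    print(f"{part}: {pins} pins -> at least {layers} layers")
# DDR4: 78 pins -> at least 4 layers
# GDDR5: 170 pins -> at least 8 layers
```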
 
In the bottom-left corner I count 14 balanced lines (8 on the left, 6 more on the bottom). Smells like a high-bitrate interface like PCIe; maybe there are 2 more lines hidden, for a total of 16.

I'm not rumor-mongering, I swear!

The Xbox One has a PCIe interface as a way to link to the southbridge (controllers/USB/HDD/Kinect) and also has a PCIe link to an Ethernet controller. HDMI and other video outputs come from the main SoC as well.
 
If you go past a certain size, you might get more bang for your money with separate dies for CPU and GPU
I'm not sure it has anything to do with size; it has more to do with costs and added value. As CPUs have gotten tinier, manufacturers have started to integrate iGPs into their CPUs. The performance of those GPUs has never been much more than a marketing bullet point (especially for AMD), but it helped clean up the mobo and made sure any computer has some level of graphics acceleration. Recently they have proven competent enough to drive some non-demanding games, but no gamer (with reasonable financial means) is fine with just that. On mobile SOCs it is pretty much the same: 3D performance is an afterthought compared to optimizing price (and footprint).

CPUs and GPUs are different in many ways. Until the recent huge steps in resolution, the difference between the size of the RAM and the VRAM was pretty massive; it is reduced now. GPUs behave pretty consistently: power usage varies, though I would think less than a CPU's, and when the turbo is lesser the GPU simply runs slower. The CPU is more of the "bursty" type; depending on the architecture, the jump in turbo frequency can be a lot more significant. I see it as a tough marriage if you put them on the same chip: the GPU is likely to eat the thermal/power room that the CPU might want during its bursts.
Ultimately what we saw with both Sony and MSFT is that trying to reconcile an approach rooted in cost optimization with adding value to the CPU (through extra convenience) left them facing complicated choices, which cost money. Sony could have been stuck with 4 GB, and they had to use a 256-bit bus (whether it fitted their performance requirements or not); MSFT had to spend a significant area (pretty much the same size as the GPU) on eSRAM to make up for the bandwidth the main memory could not deliver to the GPU, and it did not even save them a 256-bit bus. I'll pass on the case where the CPU and the GPU compete for access to memory and effective memory bandwidth collapses.

In the PC world, if I look at the raw compute and pixel throughput of Bonaire derivatives, what I see is that a 128-bit bus with GDDR5 providing 90 to 100 GB/s doesn't seem to be a bottleneck, and those parts are competitive with this generation of consoles.

While more expensive in the long run, it might cut down time to market, cut down R&D needed for a more customized part, and then you just dump it and release a new system after a few years.
I'm not sure about that; the overhead in the cost of the whole memory set-up is also quite significant and might spice up the bill. Jaguar seems to be sucking bandwidth through a straw; cheap DDR3 would have done the job. Further down the road they could have mapped the system to a SOC with UMA and optimized completely for cost, once technology had caught up with the original system.
Think a 14nm SOC: 4 Zen cores, fewer CUs than 12 but running at higher speed, 16 ROPs with gen-2 bandwidth compression, and a cleaner, more efficient memory sub-system connecting all that to a dual memory controller paired with reasonably fast DDR4.
Nobody sees the future; it may also be a better guideline to optimize what you launch and then adapt to how things evolve (while having, say, a plan A and a plan B). Memory prices are whimsical at best, lithography stalled for a little while, etc. It was all "unplannable".

As for performance, the system would have been straightforward to program for, and would have faced no limitations w.r.t. the size of the various render targets, etc.
The funnier thing is that Bonaire was actually right for both Sony's and Microsoft's performance goals; Sony just needed the fully enabled version running at slightly higher clock speeds (chip and memory). The gap would not have been the same as it is now, because MSFT would not have been constrained in ways that extra computational power and pixel rate can't help.
 
Something just struck me as I was sitting here thinking about the potential for the XBO-T to have a 384 bit memory bus.

If we compare it to Polaris 11 (RX 460), there are some interesting things that happen. Polaris 11 in base configuration is over 2 TFLOPs (under 1 GHz for 2.0 TFLOPs and over 1.2 GHz for 2.5 TFLOPs), with a 128-bit memory interface and ~112 GB/s bandwidth. It has 16 CUs compared to Polaris 10's 36 CUs (for the RX 480).

If you triple Polaris 11's configuration and target <= 1 GHz, you end up with a 6 TFLOP configuration with a 384-bit memory interface and bandwidth that's probably right at 320 GB/s (basically everything clocked lower than the RX 460).

So, if we assume Vega is just a slight evolution of Polaris, I can totally see a 48 CU Polaris/Vega configuration with a 384-bit memory interface clocked at <= 1 GHz.

Hmmm, another thought: would it be possible to have a 192-bit interface with faster memory to hit 320 GB/s? For instance, imagine if the lower-end Vega card is going to be using GDDR5X with a 256-bit memory bus. Instead of XBO-T using something akin to 3x Polaris 11, perhaps it's 3/4 of the lower Vega variant?
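A quick sketch of that scaling using the usual GCN arithmetic (FLOPs = CUs x 64 lanes x 2 ops x clock); the boost clock I plug in for Polaris 11 is an assumption:

```python
def gcn_tflops(cus: int, ghz: float) -> float:
    """Peak FP32 throughput for a GCN part: CUs * 64 lanes * 2 ops * clock."""
    return cus * 64 * 2 * ghz / 1000

print(gcn_tflops(16, 1.00))   # Polaris 11 near base: ~2.0 TFLOPs
print(gcn_tflops(16, 1.22))   # Polaris 11 boosted (assumed clock): ~2.5 TFLOPs
print(gcn_tflops(48, 1.00))   # tripled configuration: ~6.1 TFLOPs
# The bus scales the same way: 3 x 128-bit = 384-bit, and slightly under
# 3 x 112 GB/s lands right around the rumoured 320 GB/s.
```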

Regards,
SB
 
Microsoft could be working with Valve on Project Scorpio, according to TIME magazine.

http://time.com/4369910/xbox-one-s-project-scorpio-microsoft-e3-2016/

That's not what Phil said actually.

I’m not probably supposed to say this, but I sat down before this show with the leaders of Valve, and I showed them Scorpio and what we’re doing . . . Valve and Steam specifically is a massive part of the PC gaming community today, it’s growing like crazy, it will continue to grow. I think it’s got a long, great future ahead of it. I think as the platform holder of Windows, I think there are things we should be doing to make Windows 10 a great gaming experience. Valve applauds us. As [Valve co-founder] Gabe [Newell] says, there are going to be areas where we compete and areas where we cooperate and in the end of both of those are good for gamers.

And it's good that he's very aware of the reality of the situation.

I want to build a store, and people can look at that and say, you focusing on Windows Store is competition for Steam, I kind of laugh right now when I look at the numbers and people say I’m competitive with Steam at all, but we’re selling our first-party content.

He's also mentioned in other interviews that Microsoft eventually plans to go back to releasing their first-party games on Steam again, as they have in the past. They just need to figure out the details and how it'll work, etc.

I also liked this very much. It was in response to the reviewer calling his Razer laptop a portable Xbox because he plays Xbox games on it.

Sure, but in a way I’d prefer if you didn’t call it a portable Xbox, because it’s a Razer. I think of Xbox as the service, and your content library, and you’re bringing Xbox to Razer. I’m not trying to turn PCs into consoles either.

Thank you, Phil. I very much do not want my PC experience to turn into a console experience any more than it already has due to lazy ports.

Regards,
SB
 
Something just struck me as I was sitting here thinking about the potential for the XBO-T to have a 384 bit memory bus.

If we compare it to Polaris 11 (RX 460), there are some interesting things that happen. Polaris 11 in base configuration is over 2 TFLOPs (under 1 GHz for 2.0 TFLOPs and over 1.2 GHz for 2.5 TFLOPs), with a 128-bit memory interface and ~112 GB/s bandwidth. It has 16 CUs compared to Polaris 10's 36 CUs (for the RX 480).

If you triple Polaris 11's configuration and target <= 1 GHz, you end up with a 6 TFLOP configuration with a 384-bit memory interface and bandwidth that's probably right at 320 GB/s (basically everything clocked lower than the RX 460).

So, if we assume Vega is just a slight evolution of Polaris, I can totally see a 48 CU Polaris/Vega configuration with a 384-bit memory interface clocked at <= 1 GHz.

Hmmm, another thought: would it be possible to have a 192-bit interface with faster memory to hit 320 GB/s? For instance, imagine if the lower-end Vega card is going to be using GDDR5X with a 256-bit memory bus. Instead of XBO-T using something akin to 3x Polaris 11, perhaps it's 3/4 of the lower Vega variant?

Regards,
SB
The PS4 Neo GPU is supposedly clocked at 911 MHz. So I'm thinking 52 CUs @ 900 MHz for Scorpio.
 
The PS4 Neo GPU is supposedly clocked at 911 MHz. So I'm thinking 52 CUs @ 900 MHz for Scorpio.

I'm going to say 56 CUs @ 850 at least, because MS will want to comfortably reach their performance metrics without pushing the APU to its limits. I very seriously doubt we'll see 900 MHz; Microsoft probably still has RROD nightmares. They want a quiet, cool experience, so they're not going to risk it.
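For what it's worth, both guesses land right around the same 6 TFLOP mark under the standard GCN arithmetic:

```python
# Peak FP32 = CUs * 64 lanes * 2 ops * clock
for cus, mhz in [(52, 900), (56, 850)]:
    tflops = cus * 64 * 2 * mhz / 1e6
    print(f"{cus} CUs @ {mhz} MHz -> {tflops:.2f} TFLOPs")
# 52 CUs @ 900 MHz -> 5.99 TFLOPs
# 56 CUs @ 850 MHz -> 6.09 TFLOPs
```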
 
I'm going to say 56 CUs @ 850 at least, because MS will want to comfortably reach their performance metrics without pushing the APU to its limits. I very seriously doubt we'll see 900 MHz; Microsoft probably still has RROD nightmares. They want a quiet, cool experience, so they're not going to risk it.

The RROD nightmares were caused by the industry not understanding what was required when switching to lead-free solder. That was the cause of many Sony PS3 deaths too. MS won't be fearing that today, let alone 18 months from now.
 
Again, for a 56 CU part (which means a 64 CU part with 8 CUs disabled), the die area dedicated to CUs would have to be significantly larger (at least > 180mm²). And let's say, generously, that 40% of the whole die is used for CUs only (it's 31% for the PS4 die, 20% for the X1 die). That would mean a die size > 450mm², which, first, doesn't align with the render they provided and, second, would increase cost significantly. Of course there's a chance that Polaris and FF+ bring significant architecture changes that enable such a design, but we'll have to wait and see.
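A minimal sketch of that area logic, dividing the assumed CU area by the fraction of the die the CUs occupy (fractions quoted above):

```python
CU_AREA_MM2 = 180.0  # minimum CU area assumed above for a 64 CU part

for label, fraction in [("generous", 0.40), ("PS4-like", 0.31), ("X1-like", 0.20)]:
    print(f"{label}: CUs at {fraction:.0%} of die -> {CU_AREA_MM2 / fraction:.0f} mm^2 total")
# generous: CUs at 40% of die -> 450 mm^2
# PS4-like: CUs at 31% of die -> 581 mm^2
# X1-like: CUs at 20% of die -> 900 mm^2
```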
 