Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

Status
Not open for further replies.
Wasn't it Raven Ridge APUs in 2017 on 14nm with 4-core Ryzen and Vega graphics, then Gray Hawk APUs in 2019 on 7nm using Zen+ cores and Navi-based graphics?
 
But A12 is not Jaguar, don't forget that ;)
Jaguar has more performance per clock than the Bulldozer architectures.

Later iterations steadily improved IPC and/or reduced die area; by Excavator, IPC was probably above Jaguar.

That said, for Scorpio MS have done a lot to increase Jaguar performance (latency reduction, tripled memory channels to reduce contention, improved cache performance) for the kind of workloads they've profiled running on Jaguar. On top of the 32% clock speed increase there should be a worthwhile improvement in throughput.

It would be very interesting to see developers show CPU benchmarks for X1 and X1X, with results normalised for clockspeed.

I still see a refined Jaguar as a candidate for another console revision, designed to target the mass market. I'm particularly thinking of a "slim" X1X, some time around 2019. MS have done work far in excess of that needed to just run X1 games at 4K where a more modest clockspeed jump and no tinkering around with caches would have been more than enough. I think the odds of Sony seeing Jaguar again are far slimmer though.
 
I think we're certainly going to see slim versions of both mid-gen consoles once they can be manufactured on the 7nm node.

The thing that I wonder about though, is whether we'll see a super slim PS4 and XBoxOne. If so, how cheap could they get? Final PS3 cheap?
 
It may not be worth it for MS to do another Durango iteration as the sooner developers can ditch eSRAM the better (stop prolonging it).

It would be pretty cute if both Sony and MS did do it and take advantage of the process node for even higher clocks - boost mode without needing to buy the Pro, although that'd probably overlap too much into 4Pro's & Scorpio's non-patched advantages. A CPU clock bump would still be useful for general OS updates later down the road for the low-price-late-comers should the console companies decide to prolong the generation with 2013 APU specs.
 
Later iterations steadily improved IPC and/or reduced die area; by Excavator, IPC was probably above Jaguar.

That said, for Scorpio MS have done a lot to increase Jaguar performance (latency reduction, tripled memory channels to reduce contention, improved cache performance) for the kind of workloads they've profiled running on Jaguar. On top of the 32% clock speed increase there should be a worthwhile improvement in throughput.
Of note is that the leaked description of the first PS4 idea was that it was based on Steamroller, which means the two architectures were close enough and that Sony almost went with it.

Carrizo's design focused more on the lower power envelope and density, and a single Excavator module was actually much smaller than a Jaguar one. At 14.5 mm^2, even increasing its L2 back up to 2MB would leave it far below the 26.2 mm^2 Jaguar. I would think there would be little doubt Excavator would be vastly superior to the Jaguar modules that were used in the consoles, although it may not be fair since Excavator had more advanced implementation methods that if back-ported could have helped.

A fair amount of what goes into improving the CPU for Scorpio is likely bound up in the uncore being faster and wider, with bandwidth to match memory more closely and the northbridge clock paired with the upclocked GPU.
The one specific change noted for the CPU architecture appears to have been an increase in its ability to manage or cache guest to host translations under the always-virtualized OS setup. They gave a decent figure of improvement for operations related to that subset, though I haven't tracked down where I saw that. That specific corner of the design is something I would wonder if the more server-oriented Bulldozer line might have been better with from the beginning.
 
14nm Jaguar cores. Otherwise those benchmarks are useless.
Best you'll get is 28nm Puma+ cores that actually clock very close to the mid-gens' 2.1-2.3GHz. Which should be okay because shrinkage per se doesn't bring better IPC.
https://browser.geekbench.com/geekbench3/4597455

Single Core: 1366
Multi-Core: 4002

15W Raven Ridge single core is over 2.5x faster than the Jaguar cores at the XBoneX's / PS4Pro's clock speeds.
It doesn't look like Jaguar is getting better IPC than Excavator either. The A12-9800B clocks at 2.7GHz, so it's 25% higher clocked than the 2.2GHz A8-7410 while getting 70% better performance in the single core benchmark.
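
As a rough sanity check on the clock-normalised comparison, here is a minimal sketch using only the figures quoted above. It assumes "70% better" means a single-core score ratio of 1.70 and that both chips ran at the stated 2.7GHz and 2.2GHz during the test; turbo behaviour is ignored, so treat the result as indicative only.

```python
# Clock-normalised comparison from the figures quoted in the post.
# Assumptions: score ratio of 1.70 ("70% better"), and both chips
# held the quoted clocks for the whole run (no turbo accounted for).

def per_clock_ratio(score_ratio: float, clock_a_ghz: float, clock_b_ghz: float) -> float:
    """Return how much more work chip A does per clock than chip B."""
    clock_ratio = clock_a_ghz / clock_b_ghz
    return score_ratio / clock_ratio

clock_ratio = 2.7 / 2.2                       # A12-9800B vs A8-7410
ipc_ratio = per_clock_ratio(1.70, 2.7, 2.2)   # score ratio divided by clock ratio

print(f"clock advantage:     {clock_ratio - 1:.0%}")
print(f"per-clock advantage: {ipc_ratio - 1:.0%}")
```

On those assumptions the clock gap is nearer 23% than 25%, and the per-clock gap comes out at a bit under 40% in Excavator's favour.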



Honestly, IMO the only reason for Microsoft not going with Excavator v2 in the XboneX is because the console's CPU performance is mighty dependent on floating point performance, so they might have needed either 4 modules at ~3.3GHz or 8 modules at 1.7GHz just to achieve complete parity.
 
The one specific change noted for the CPU architecture appears to have been an increase in its ability to manage or cache guest to host translations under the always-virtualized OS setup. They gave a decent figure of improvement for operations related to that subset, though I haven't tracked down where I saw that. That specific corner of the design is something I would wonder if the more server-oriented Bulldozer line might have been better with from the beginning.
This year's Hot Chips presentation of the Scorpio Engine.
 
I think we're certainly going to see slim versions of both mid-gen consoles once they can be manufactured on the 7nm node.

The thing that I wonder about though, is whether we'll see a super slim PS4 and XBoxOne. If so, how cheap could they get? Final PS3 cheap?

For X1X, once you've slimmed down to a ~200 mm^2 chip and dropped to six memory chips, you have to wonder what benefit there'd be to making an even slimmer X1S with climbing prices for the 16 DDR3 chips, an IO limited minimum die area and legacy issues like the HDMI pass through, same cost 4K BR drive, almost same cost HDD ... Hopefully they'll sweep away everything else when they shrink, ditch the HDMI pass through and go for a lean, mass market device.

If Sony are planning a new system around 2019 / 2020, I wonder if it's wise to flood the market with both a shrunk PS4 and PS4Pro. I think they'd be better off picking one - hopefully the Pro - and making it their base machine and then go with a new high end BC device. The larger PS4 Pro chip might be a better shrinking candidate for 7nm than the PS4 Amateur, and 7 GHz GDDR5 should be as cheap as chips by 2019.

It may not be worth it for MS to do another Durango iteration as the sooner developers can ditch eSRAM the better (stop prolonging it).

It would be pretty cute if both Sony and MS did do it and take advantage of the process node for even higher clocks - boost mode without needing to buy the Pro, although that'd probably overlap too much into 4Pro's & Scorpio's non-patched advantages. A CPU clock bump would still be useful for general OS updates later down the road for the low-price-late-comers should the console companies decide to prolong the generation with 2013 APU specs.

With X1S, MS pushed the GPU clocks but kept the CPU where it was. I think a shrunk X1X would be wise to follow the same pattern, where Scorpio / X1X can be the baseline for all software going forward. And, as you say, start to move past X1 and its development quirks.

A cost reduced PS4Pro might make a good entry level machine, with its faster CPU - as you say - and its 4K output.

Of note is that the leaked description of the first PS4 idea was that it was based on Steamroller, which means the two architectures were close enough and that Sony almost went with it.

Carrizo's design focused more on the lower power envelope and density, and a single Excavator module was actually much smaller than a Jaguar one. At 14.5 mm^2, even increasing its L2 back up to 2MB would leave it far below the 26.2 mm^2 Jaguar. I would think there would be little doubt Excavator would be vastly superior to the Jaguar modules that were used in the consoles, although it may not be fair since Excavator had more advanced implementation methods that if back-ported could have helped.

A fair amount of what goes into improving the CPU for Scorpio is likely bound up in the uncore being faster and wider, with bandwidth to match memory more closely and the northbridge clock paired with the upclocked GPU.
The one specific change noted for the CPU architecture appears to have been an increase in its ability to manage or cache guest to host translations under the always-virtualized OS setup. They gave a decent figure of improvement for operations related to that subset, though I haven't tracked down where I saw that. That specific corner of the design is something I would wonder if the more server-oriented Bulldozer line might have been better with from the beginning.

Jay has beaten me to it, but it was Hot Chips. AnandTech have a report on the presentation:

https://www.anandtech.com/show/1174...x-scorpio-engine-live-blog-930am-pt-430pm-utc

In addition to the new page descriptor cache, they also mention a 4X increase in the number of L2 TLB entries.

The impact of trebling the number of memory channels over the X1 would be interesting to know in terms of running actual game applications. Come to think of it, aren't PS4/PS4Pro only four channels too?
 
Best you'll get is 28nm Puma+ cores that actually clock very close to the mid-gens' 2.1-2.3GHz. Which should be okay because shrinkage per se doesn't bring better IPC.
https://browser.geekbench.com/geekbench3/4597455

Single Core: 1366
Multi-Core: 4002

15W Raven Ridge single core is over 2.5x faster than the Jaguar cores at the XBoneX's / PS4Pro's clock speeds.
It doesn't look like Jaguar is getting better IPC than Excavator either. The A12-9800B clocks at 2.7GHz, so it's 25% higher clocked than the 2.2GHz A8-7410 while getting 70% better performance in the single core benchmark.

That Geekbench test has a terrible score for memory performance. With a single 64-bit channel of DDR3 1866 perhaps that's not surprising.

X1X has 12 times the channels, and each channel has nearly twice the BW.
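
For anyone who wants to check the channel arithmetic, a minimal sketch follows. The X1X figures used here (twelve 32-bit GDDR5 channels at 6.8 Gbps per pin) are my assumptions from the published specs, not something stated in the Geekbench result; the laptop side is the single 64-bit DDR3-1866 channel mentioned above.

```python
# Peak theoretical bandwidth = bus width (bits) * data rate (Gbps per pin) / 8.
# Assumed X1X configuration: 12 channels x 32 bits, GDDR5 at 6.8 Gbps per pin.

def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and per-pin rate."""
    return bus_bits * gbps_per_pin / 8

ddr3_single = bandwidth_gb_s(64, 1.866)   # one 64-bit DDR3-1866 channel
x1x_channel = bandwidth_gb_s(32, 6.8)     # one assumed 32-bit GDDR5 channel
x1x_total = 12 * x1x_channel              # all twelve channels together

print(f"DDR3-1866 single channel: {ddr3_single:.1f} GB/s")
print(f"X1X per channel:          {x1x_channel:.1f} GB/s")
print(f"X1X total:                {x1x_total:.1f} GB/s")
print(f"per-channel ratio:        {x1x_channel / ddr3_single:.2f}x")
```

These are peak numbers only; what a CPU benchmark actually sees depends on latency and contention as much as on raw bandwidth.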

Honestly, IMO the only reason for Microsoft not going with Excavator v2 in the XboneX is because the console's CPU performance is mighty dependent on floating point performance, so they might have needed either 4 modules at ~3.3GHz or 8 modules at 1.7GHz just to achieve complete parity.

Don't Jaguar and Bulldozer support slightly different instruction sets too?

Depending on the speed boost brought about by MS's customisations, and factoring in the 2.3 GHz of the X1X CPU, you might have been looking at rather more than 4 GHz for two modules or rather more than 2 GHz for four to reach the same level of performance. 4+ GHz would have been power prohibitive, and 2+ GHz for 8 modules would - looking at the die shot - have been awkward to house and bloated the die.

Looking at what MS wanted from the X1X CPU they chose, I don't think Excavator was ever going to be a good fit for X1X.
 
With X1S, MS pushed the GPU clocks but kept the CPU where it was. I think a shrunk X1X would be wise to follow the same pattern, where Scorpio / X1X can be the baseline for all software going forward. And, as you say, start to move past X1 and its development quirks.

Right, I meant boosting clocks for a hypothetical Slim 2.0 just to get the original 2013 hardware up to speed on that front.
 
Right, I meant boosting clocks for a hypothetical Slim 2.0 just to get the original 2013 hardware up to speed on that front.
Yeah, I could see this, I think. Increasing that memory bandwidth will be critical though; eventually the hard bottleneck will be that 68GB/s from DDR3.
 
For X1X, once you've slimmed down to a ~200 mm^2 chip and dropped to six memory chips, you have to wonder what benefit there'd be to making an even slimmer X1S with climbing prices for the 16 DDR3 chips, an IO limited minimum die area and legacy issues like the HDMI pass through, same cost 4K BR drive, almost same cost HDD ... Hopefully they'll sweep away everything else when they shrink, ditch the HDMI pass through and go for a lean, mass market device.

If Sony are planning a new system around 2019 / 2020, I wonder if it's wise to flood the market with both a shrunk PS4 and PS4Pro. I think they'd be better off picking one - hopefully the Pro - and making it their base machine and then go with a new high end BC device. The larger PS4 Pro chip might be a better shrinking candidate for 7nm than the PS4 Amateur, and 7 GHz GDDR5 should be as cheap as chips by 2019.

I seem to recall reading on here that the base PS4's bus would prohibit it from moving to the 7nm node - is there any truth to that do you reckon?

The matter of flooding the market is certainly important, but Sony seem to be fine with that: PS4 Slim, PS4Pro, PSVR all released at the end of one year. I'm interested to know whether Sony were satisfied with the results of that flooding, or if they'd rather avoid it as long as circumstance doesn't force their hand again.

That said, if they released the PS4 Super Slim, PS4Pro Slim, and PS5 in the same year, I reckon that would cover the high, mid, and low cost points of entry to the ecosystem. That might mean a higher launch price is viable for the PS5, especially if it releases in 2019 and can reasonably expect a year to itself.
 
I seem to recall reading on here that the base PS4's bus would prohibit it from moving to the 7nm node - is there any truth to that do you reckon?
As I pointed out earlier, GDDR6 solves this. Both PS4 Slim and Pro can probably be on 7nm if they use 14Gbps GDDR6.

It's not as simple for the original XB1. DDR4-4266 won't happen any time soon, let alone at reasonable prices. So I expect MS to focus on XB1X, which also has the GDDR6 option. Or go GDDR6 for XB1. eSRAM plus DDR has to disappear ASAP.
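
The reasoning here is that doubling the per-pin data rate lets you halve the bus width and so get under the pad limit on a small die. A minimal sketch of that arithmetic follows; the existing console figures (256-bit buses, 5.5 Gbps GDDR5 on PS4, 6.8 Gbps on Pro, DDR3-2133 on the original Xbox One) are my assumptions from the public specs, and the 128-bit replacements are hypothetical.

```python
# Does a half-width bus on faster memory match the existing consoles?
# Peak bandwidth = bus width (bits) * data rate (Gbps per pin) / 8.
# All console configurations below are assumed from public specs.

def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and per-pin rate."""
    return bus_bits * gbps_per_pin / 8

ps4 = bandwidth_gb_s(256, 5.5)           # base PS4, GDDR5
pro = bandwidth_gb_s(256, 6.8)           # PS4 Pro, GDDR5
gddr6_128 = bandwidth_gb_s(128, 14.0)    # hypothetical 128-bit GDDR6 at 14 Gbps

xb1_ddr3 = bandwidth_gb_s(256, 2.133)    # original Xbox One, DDR3-2133
ddr4_128 = bandwidth_gb_s(128, 4.266)    # hypothetical 128-bit DDR4-4266

print(f"PS4 {ps4:.1f}, Pro {pro:.1f}, 128-bit GDDR6 {gddr6_128:.1f} GB/s")
print(f"XB1 DDR3 {xb1_ddr3:.1f}, 128-bit DDR4-4266 {ddr4_128:.1f} GB/s")
```

So on paper a 128-bit GDDR6 bus covers both PlayStation models, while the original Xbox One would need DDR4-4266 just to break even at half width.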
 
I seem to recall reading on here that the base PS4's bus would prohibit it from moving to the 7nm node - is there any truth to that do you reckon?

Below 200mm^2, it gets pretty difficult to fit a 256-bit bus.


------------

sidenote TSMC's marketing guidelines for 10nmFF & 7nmFF

10nmFF vs 16nmFF
  • 2x density
  • +20% speed
  • -40% power consumption

7nmFF vs 10nmFF
  • 1.6x density
  • +20% alacrity
  • -40% power consumption

hm....
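
Compounding the two steps gives a rough 16nmFF to 7nmFF picture. This is only a sketch of the marketing numbers above: it assumes the multipliers simply multiply across nodes, reads "alacrity" as speed, and ignores that foundries usually quote the speed gain and the power saving as alternatives rather than something you get together.

```python
# Compound the quoted per-node multipliers from 16nmFF to 7nmFF.
# Assumption: the claims multiply cleanly across the two node steps.

steps = {
    "10nmFF vs 16nmFF": {"density": 2.0, "speed": 1.20, "power": 0.60},
    "7nmFF vs 10nmFF": {"density": 1.6, "speed": 1.20, "power": 0.60},
}

combined = {"density": 1.0, "speed": 1.0, "power": 1.0}
for step in steps.values():
    for metric in combined:
        combined[metric] *= step[metric]

print(f"density: {combined['density']:.1f}x")
print(f"speed:   +{combined['speed'] - 1:.0%}")
print(f"power:   -{1 - combined['power']:.0%}")
```

Under those assumptions that is about 3.2x the density, which is why a die that is comfortable at 16nm can end up too small for its memory bus at 7nm.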
 
Below 200mm^2, it gets pretty difficult to fit a 256-bit bus.


------------

sidenote TSMC's marketing guidelines for 10nmFF & 7nmFF

10nmFF vs 16nmFF
  • 2x density
  • +20% speed
  • -40% power consumption

7nmFF vs 10nmFF
  • 1.6x density
  • +20% alacrity
  • -40% power consumption

hm....

Alacrity? I am not a native English speaker, so I am used to having to look up some words, but even after looking it up, it makes no sense...

alacrity
əˈlakrɪti/
noun
  1. brisk and cheerful readiness.
    "she accepted the invitation with alacrity"
 