Predict: Next gen console tech (9th iteration and 10th iteration edition) [2014 - 2017]

The work MS have done on the CPU for Scorpio seems beyond what they would have needed to simply run X1 games at 4K. They've achieved a significant drop in main memory latency (up to 20%) - which was already below that of their competitor - and mucked about with the cache to increase IPC from within virtualised environments (up to 4.3% faster).

For a niche product intended to live and die with the X1 and PS4, they've done a lot of seemingly unnecessary work. Unless, of course, they do see a possible life for the X1X beyond the X1, where every few percent of extra CPU performance you can get will count.

I can still see another potential outing for Jaguar, perhaps in a cost-reduced, upclocked X1X follow-up designed to supersede the X1 and X1X and target the mass market. GDDR6 and 10 nm could allow for a much smaller, cooler system based on the X1X architecture to lead the low end charge while they design a new high end system - where the additional die area of Zen would not be prohibitive.
 
Why not both?

Something Scorpio level is still going to be fast in 2019, and would overlap with a new, expensive bruiser of a system with a common development environment.

MS said they were moving away from traditional generations, and they could have two products on the go just as they will with Scorpio and X1S.
 
With the GDDR6 specs, there's something else I find really interesting about the decision to go dual channel. Each x16 channel has basically the same commands and timings as a full x32 GDDR5 chip.

It means a PS4 Pro Slim can be done with 4 chips @ 14Gbps and the XB1X with 6 chips. It would behave incredibly similarly, which is definitely not the case with GDDR5X. Could it be as simple as practically the same controller logic but a different data PHY and lower voltage?

It's an interesting new upgrade path; previous discussions here pointed to it being a dead end in terms of die size because of the I/O area, etc.
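
A quick back-of-the-envelope check of the chip-count idea (a minimal Python sketch; the GDDR5 figures are the known retail configs, the GDDR6 ones are the hypothetical replacements):

def bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    # Peak bandwidth: bus width in bytes times per-pin data rate.
    return bus_width_bits / 8 * data_rate_gbps

# Known GDDR5 retail configs (32-bit chips):
print(bandwidth_gb_s(8 * 32, 6.8))   # PS4 Pro, 8 chips:  217.6 GB/s
print(bandwidth_gb_s(12 * 32, 6.8))  # XB1X,   12 chips:  326.4 GB/s

# Hypothetical GDDR6 @ 14Gbps (two x16 channels per chip, still 32 data pins):
print(bandwidth_gb_s(4 * 32, 14))    # "Pro Slim", 4 chips: 224.0 GB/s
print(bandwidth_gb_s(6 * 32, 14))    # XB1X-class, 6 chips: 336.0 GB/s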
 
MS said they were moving away from traditional generations, and they could have two products on the go just as they will with Scorpio and X1S.
I also think the xboX is here to stay as the next-gen "entry". At some point MS will no longer support the base XBO. At that point the xboX will be the new base console and the new-gen machine will be the high-end console.
Then they don't start the new generation with an empty userbase; they already have the xboX userbase and can release the xboX2.
 
MS said they were moving away from traditional generations, and they could have two products on the go just as they will with Scorpio and X1S.
Will there be multiplatform games held back by the PS4 and X1 at the beginning of next-gen? Sure, I don't see why not.
It's just the way it works once there is a big install base. The good news is that, unlike with the PS4 Pro and XB1X, modes not possible on the previous system (or in this case, generation) will be allowed on these consoles. Additional or improved modes should exist.

I would find it dumb if MS decided to support the XB1X for the entire next gen, let alone enable cross-play between it and their next system.
 
PC is 100% compatible on the CPU side. As long as no instructions present in the old architecture are missing in the new one (without being emulated in microcode), CPUs are interchangeable, which is why you can take a 5 year old PC on an i5 and swap out the mobo+CPU for Ryzen and run all the same software. By sticking with x86, CPU compatibility should be a given.

Good thing Sony's rumored Steamroller version of the PS4 was dropped, because they would have been screwed given Bulldozer's extensions were almost completely wiped out with Zen.
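
A minimal sketch of what that check looks like in practice, assuming a Linux box (flag names as they appear in /proc/cpuinfo); XOP, FMA4, and TBM are the Bulldozer-family extensions Zen dropped:

def cpu_flags():
    # Parse the feature-flag list the kernel exposes per CPU.
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                return set(line.split(":", 1)[1].split())
    return set()

dropped_in_zen = {"xop", "fma4", "tbm"}
missing = dropped_in_zen - cpu_flags()
# A Steamroller-era binary relying on these would break on Zen.
print("Unsupported here:", missing or "none")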

Aside from that, there could be some low-level differences that the firmware would need to know about. What is visible at the game level may depend on just how low-level they are permitted to go. Something like thread/core IDs might not behave as expected, and some of the earlier coreID values indicate that naively running from 0-7 for cores will actually fill up 4 physical cores and their threads. That could be worked around if the OS had a compatibility mode, although it seems Sony and Microsoft didn't want to upend the OS/hypervisor mid-generation.
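
An illustration of that pitfall (treat the numbering as an assumption; real APIC ID layouts vary by vendor and firmware, but Zen numbers SMT siblings contiguously):

def physical_core(logical_id, threads_per_core):
    # With contiguous sibling numbering, consecutive logical IDs
    # land on the same physical core.
    return logical_id // threads_per_core

print([physical_core(i, 1) for i in range(8)])  # 8-core Jaguar: [0..7]
print([physical_core(i, 2) for i in range(8)])  # 4C/8T SMT: [0,0,1,1,2,2,3,3]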

There could be some subtle differences that most games probably wouldn't be affected by, as CPU vendors can tweak things like latencies or not have results for instructions be bit-accurate for FP or complex operations like transcendentals--not that a game should be counting on something that specific. There are some very subtle changes such as prefetches and items like the memory consistency model and load/store ordering. Intel changed, or perhaps just explicitly recognized what was already assumed, its store and load ordering within a core. Between AMD and Intel, they don't fully agree on what some streaming memory operations do, or which memory sequences need the heaviest synchronization.
A game might be able to accidentally hit one of those synchronization differences, at least between vendors.
 
Can we expect something like this?
[Image: configbcomplete-crsdim.png]
 
Can we expect something like this?
It's close to a generic GCN and CPU combo, but without more context it's hard to say whether there's an explanation for the deviations.
Compared to known CPU and GPU architectures.
The CPU block is rather undefined, although if it's just a generic block it might excuse having the L2s separated from the L3s in the middle. If it were a Zen derivative in particular, the cores would be mirrored on both sides of their local L3.

The GPU's ACE count looks excessive (one is missing an arrow), and there's no HWS. The consoles have two command processors, although this may be a quirk of the current gen. The dispatch unit is potentially replaced/renamed with AMD's workload balancer.
GDDR6 modules have two channels, which doesn't mesh with the diagram.
Following the Vega ISA doc, the memory crossbar is between the CUs and L2, not outside of the L2. The diagram seems to be drawing the L2s as private to a local SE, and then those are interfacing directly with one of two layers of memory controller.
Which L1 is the diagram saying is shared between the CUs?
Vega appears to have infinity fabric between the L2 and the HBCC/memory controller section.
The idea of the infinity fabric is that it is more generic, so having a CPU fabric and GPU fabric is of uncertain utility.

I'm not saying that there couldn't be a formulation with features like that in some future architecture, just that the diagram isn't much to go on to distinguish purposeful change versus vagueness. The glut of ACEs and the way the cache hierarchy is shifted may not do well if evaluated in the same way as existing architectures.
 
GDDR6 modules have two channels, which doesn't mesh with the diagram.
It would fit with GDDR6 pseudo-channel mode, which links the address pins into one bus. Basically a 32-bit mode with double the prefetch, making it practically a GDDR5X mode.
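
The access-granularity arithmetic behind that (my reading of the JEDEC specs, so treat the mode details as an assumption):

def burst_bytes(prefetch_n, channel_bits):
    # Bytes delivered per access = prefetch depth * channel width / 8.
    return prefetch_n * channel_bits // 8

print(burst_bytes(8, 32))       # GDDR5  x32, 8n:            32 bytes
print(burst_bytes(16, 32))      # GDDR5X x32, 16n:           64 bytes
print(burst_bytes(16, 16))      # GDDR6 single x16 channel:  32 bytes (GDDR5-like)
print(2 * burst_bytes(16, 16))  # both x16 channels in lockstep on one CA bus:
                                # 64 bytes (the GDDR5X-like pseudo-channel mode)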
 
It's close to a generic GCN and CPU combo, but without more context it's hard to say whether there's an explanation for the deviations.

Thanks.
 
It would fit with GDDR6 pseudo-channel mode, which links the address pins into one bus. Basically a 32-bit mode with double the prefetch, making it practically a GDDR5X mode.
I wondered if GDDR5X was basically what would have been GDDR6, but they didn't wait for the JEDEC ratification.
 
I made a new one incorporating 3dilettante's suggestions.

[Image: nextgen.png]

I know that the HBCC seems to be more related to the HBM memory, but at the same time my understanding is that it's an important part of giving the GPU fully coherent access to memory.

The idea is a GPU with 72 CUs/NCUs. Reasons? The PS3 was designed with 720p rendering in mind and has 24 texture units; the PS4 targeted 1080p (2.25x the pixels), but that was rounded up to 3x, and in the PS4 we have 72 TMUs across 18 CUs. The jump from 1080p to 4K is 4x, and that is the reason for this configuration. The other possibility is an 8*9 configuration instead of a 6*12 configuration, but I preferred the 6*12 configuration for space reasons.
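
The scaling arithmetic spelled out (Python; assumes GCN's 4 TMUs per CU):

res_720p, res_1080p, res_4k = 1280 * 720, 1920 * 1080, 3840 * 2160

print(res_1080p / res_720p)  # 2.25x pixels over 720p
print(24 * 3)                # PS3's 24 TMUs * 3 (rounded up) = 72 TMUs = 18 CUs
print(res_4k / res_1080p)    # 4.0x pixels over 1080p
print(18 * 4)                # 72 CUs (= 288 TMUs) for the 4K target
print(6 * 12, 8 * 9)         # either SE layout yields 72 CUs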

And well... as you have realized, the diagram is for an MCM where the CPU and GPU are on different dies but share a common interposer, like the Wii U.
 
I wondered if Neo's configuration was a step in that direction, but I suppose it also just mirrors Polaris 10's config (3+3+3 per SE).
 
And well... as you have realized, the diagram is for an MCM where the CPU and GPU are on different dies but share a common interposer, like the Wii U.

If this is just moving generic boxes around in a diagram, that's fine. It's such a high-level look that it's effectively saying "Zen+Vega" in picture form. I don't know if we can get much more out of it.

To clarify, Infinity Fabric is the data fabric. The data fabric box and the extra line between the CPU and GPU that are not for Infinity are redundant.
There is a control variant of the fabric, but items related to that are not represented in the diagram.
There would still be memory controllers between the HBCC and the GDDR6 in some fashion, since the HBCC itself doesn't control the memory channels in Vega.

Now that you've indicated these are not the same chip, I'm not sure if this arrangement matches what AMD would intend in a system with HBCC. Its proposals usually give the CPU its own DRAM, and give the GPU HBM. A CPU hanging off a separate GPU chip for memory is unexplored territory, since Zen has not shown itself capable of not having local memory.

Outside of Raven Ridge, which is one chip, AMD has only shown a form of "APU" where the CPU and GPU do not share an interposer, and the GPU gets HBM. AMD hasn't promised any significant level of connectivity between the CPU and GPU using an interposer, and hasn't shown that there's really a need since its existing non-interposer links seem more than sufficient for the CPU.

As noted, if this is Vega-based, the number of CUs per grouping stops at 3. It is true there is a shared instruction cache in that region labeled "L1", although that is just part of the shared hardware.
I'm not sure how much of the HBCC is needed for a console, or if the GPU is the sole owner of all the memory in this case.
I have a feeling the CPU and system overall may not align with that concept without some significant work, since the memory needed to process errors would be owned by a slave device that cannot itself handle certain system/link errors, errors that could interfere with communication with the CPU.

I'm not sure using TMU count or target resolution is the best metric for improvement, and it misses the massive growth in other resources and the fact that the texture units were higher-clocked and more capable with the current generation. Also, if this is 7nm or the like, a chip that is not significantly more endowed than Vega may be small enough that a 512-bit GDDR6 bus may be difficult to fit (roughly the size of 256-bit Polaris 10), and an interposer doesn't help GDDR6 or the CPU. It makes having an interposer rather questionable since this setup is not offering new integration capability.

If clocked like Vega at ~1.5 GHz (may be a bad idea), it could get something like the 8x FLOP boost over the prior generation that the PS4 got, but less of a gain versus the Pro (~3x) or Scorpio (~2.4x). It may not be able to cleanly differentiate itself as a new generation. (edit: This may be unavoidable and not a problem with this specific concept in particular.)
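
The FLOP arithmetic behind those multipliers (a sketch; GCN does 64 lanes * 2 FLOPs per clock per CU, and the PS4/Pro/Scorpio figures are the usual 1.84/4.2/6.0 TF):

def tflops(cus, ghz):
    # Peak FP32: CUs * 64 lanes * 2 FLOPs (FMA) * clock.
    return cus * 64 * 2 * ghz / 1000

proposal = tflops(72, 1.5)  # ~13.8 TF for the 72 CU diagram at Vega clocks
print(proposal / 1.84)      # vs PS4:     ~7.5x ("something like 8x")
print(proposal / 4.2)       # vs Pro:     ~3.3x
print(proposal / 6.0)       # vs Scorpio: ~2.3x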
 

mm... was thinking something like the following could be an OK size @ "7nm" TSMC/GF/SS (I think ~0.4x scaling vs the 14/16FF gen, but I'm a little fuzzy; numbers sanity-checked below):

CPU
  • Ryzen - 2 CCX
GPU
  • 6 SE x 15 CUs
  • 6 CUs disabled
  • 1.40GHz = 15TF FP32
  • 4RBEs per SE = 96 ROPs, 6-12MB L2
Mem
  • 384-bit GDDR6 @ 14Gbit/s = 672GB/s
  • (16Gbit) *12x (32-bit) chips = 24GB
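
A quick sanity check of those numbers (assuming GCN-style 64 lanes and 2 FLOPs per CU per clock, and 4 ROPs per RBE):

active_cus = 6 * 15 - 6                   # 90 physical CUs, 6 disabled -> 84
print(active_cus * 64 * 2 * 1.40 / 1000)  # ~15.05 TF FP32
print(6 * 4 * 4)                          # 96 ROPs (6 SEs * 4 RBEs * 4 ROPs)
print(384 // 8 * 14)                      # 672 GB/s
print(12 * 16 // 8)                       # 24 GB from twelve 16Gbit chips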
 
I'd say 32GB of memory is the minimum we can expect. Scorpio may only have 12GB, but it's designed for the same games.

Hell, 64GB isn't out of the question, given that every new PlayStation console sees exactly a 16x increase in memory amount. Memory prices would have to stabilize for that, though, and we'd probably be stuck with the PS4 until 2020-21.

Which I would be fine with, honestly. The more time the new gen takes, the more likely we are to get something resembling a real generational leap.
 