Next Generation Hardware Speculation with a Technical Spin [pre E3 2019]

Jay · May 15, 2019

BRiT said:
Depends. Did AMD already integrate MS's X1X Jaguar tweaks into Zen 2?

From what I remember from the Hot Chips presentation, the only changes were really internal cache/memory sizes, nothing major.
Mainly helped with running VMs.

If it's a monolithic SoC then they may make minor tweaks again.
Otherwise there'd be little reason to.

bgroovy · May 15, 2019

bbot said:
Another possibility is to use Navi GPU chiplets, each with 36 CUs, 4 of them redundant. Lockhart would use one chiplet and Anaconda would use two.

I don't think there is any interconnect with sufficient bandwidth for GPU chiplets. You'd have to put memory controllers on each and that would just recreate all the same multi-GPU issues with memory not being shared, data duplication and various inefficiencies.

iroboto · May 15, 2019

Adonisds said:
For the X1X CPU, Microsoft made some changes to the architecture. Do you think they will do the same for the next gen, or will they use stock Zen 2?

Depends on what they are and why they were implemented.

iroboto · May 15, 2019

Jay said:
From what I remember from the Hot Chips presentation, the only changes were really internal cache/memory sizes, nothing major.
Mainly helped with running VMs.

If it's a monolithic SoC then they may make minor tweaks again.
Otherwise there'd be little reason to.

Hot Chips slides don't outline any customizations on the GPU or CPU, so I wouldn't use them as a fully exhaustive source.

AlNom · May 16, 2019

They increased the number of L2 instruction and data TLB entries (4KB pages) over standard Jaguar.

----

GPU slide: https://images.anandtech.com/doci/11740/img_20170821_093653.jpg
Nothing particularly out of the ordinary except for whatever bits they added to help facilitate backward compatibility (e.g. texture formats).

Not sure what they're getting at with Conservative Occlusion Query or some of the other features listed.

Jay · May 16, 2019

iroboto said:
Hot Chips slides don't outline any customizations on the GPU or CPU, so I wouldn't use them as a fully exhaustive source.

Could have sworn I listened to a presentation by MS at Hot Chips too.
But they did cover changes, and for the CPU it was as @AlBran pointed out: they basically profiled it and realised that upping those caches would improve VM performance.
They are just bog-standard Jaguar cores otherwise.
So you wouldn't carry those specific changes across to Zen, but that doesn't mean they couldn't make similar ones.

AlNom · May 16, 2019

The increased entries for 4KB pages are kind of curious, since the number of entries is otherwise identical to standard Jaguar in the other scenarios (including the larger page sizes), so I'm not sure what's going on there or what the context is.

At any rate, maybe the Azure folks will simply push for keeping the 16MB L3 per CCX as opposed to the cut-down variant à la Raven Ridge. There's probably not a whole lot they can reasonably improve upon for performance that AMD's engineers haven't already tried. The rest would just have to be BC-related.

bbot · May 16, 2019

bgroovy said:
I don't think there is any interconnect with sufficient bandwidth for GPU chiplets. You'd have to put memory controllers on each and that would just recreate all the same multi-GPU issues with memory not being shared, data duplication and various inefficiencies.

Yes, but not because of bandwidth.

David Wang:

But realistically it's more of a software problem than a hardware one. The Infinity Fabric interconnect should be able to provide an interface that is wide enough, and high-speed enough to deal with the communication to make it look and feel like one chip, but getting the OS and the applications to see it that way is a lot tougher.

https://www.pcgamesn.com/amd-navi-monolithic-gpu-design

HBRU · May 16, 2019

AlBran said:
Save the clock boost for mid gen.

There is some merit to going wide in the sense that they can just boost things later in the generation. At the start they’ll be somewhat more concerned with yields, and both a modest clock and wide design can serve that (obviously needs balancing so we aren’t making a small pizza sized chip).

Especially true for Sony, because it is less capable of providing BC at the OS level...
so I think PS5 will be wider and clocked lower...

Jay · May 16, 2019

After going around the houses I've settled on what I think MS should do.
MCM design with:

Off-the-shelf Zen chiplet CPUs (I don't think there is much to tinker with for blade usage)
GPU (not chiplet, just module)
IO & memory controller module (allowing a different one to be used in blades, which would also allow board stacking via high-speed interconnects for Azure users); it might also be better to use HBM on blades?
Accelerator module (AI or RT, for example)

If the cost of MCM construction were so prohibitive, we wouldn't see it being moved down the stack.
I think off-the-shelf Zen and the ability to swap IO dies would be a big benefit, as that's where the big difference between console and server occurs, so overall it would be a reasonable and cost-effective choice compared to single huge dies.
It also helps yields: if any component is defective you don't lose the whole chip. The CPU wouldn't need worrying about, as it's off the shelf.

Still think PS5 is fine with monolithic though
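A rough way to see the yield argument is the textbook Poisson die-yield model, where the chance that a die has zero defects is exp(-area × defect density). The sketch below compares the wafer area spent per working part for one large die versus the same area split across separately tested dies. The defect density and die areas are hypothetical placeholders, not real 7nm figures, and the model ignores packaging yield, interconnect area and redundant CUs, all of which would shift the result.

```python
import math

# HYPOTHETICAL defect density in defects per cm^2 (placeholder, not real 7nm data).
DEFECT_DENSITY_PER_CM2 = 0.2


def poisson_yield(area_mm2: float, d0: float = DEFECT_DENSITY_PER_CM2) -> float:
    """Fraction of dies with zero defects under the simple Poisson yield model."""
    area_cm2 = area_mm2 / 100.0  # 100 mm^2 per cm^2
    return math.exp(-area_cm2 * d0)


def silicon_per_good_unit(die_areas_mm2: list[float]) -> float:
    """Wafer area (mm^2) spent per working product, assuming every die is tested
    before assembly so a defect only scraps the die it lands on."""
    return sum(area / poisson_yield(area) for area in die_areas_mm2)


# HYPOTHETICAL split of a 350 mm^2 SoC into CPU, GPU and IO/memory dies.
monolithic = silicon_per_good_unit([350.0])
chiplets = silicon_per_good_unit([75.0, 200.0, 75.0])

print(f"monolithic: {monolithic:.0f} mm^2 per good unit")
print(f"chiplets:   {chiplets:.0f} mm^2 per good unit")
```

With these placeholder numbers the split design wastes noticeably less silicon per good part. A higher real-world defect density widens that gap, while packaging cost and die-to-die overhead narrow it.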

Metal_Spirit · May 16, 2019

Problems with multi-GPU

—-

Splitting GPUs into chiplets isn’t a new idea in the realm of ideas, however it is a concept that is difficult to conceive. One of the key areas of shuffling data around a GPU is bandwidth – the other is latency. In a graphics scenario, the race is on to get a low frame rendering time, preferably below 16.67 milliseconds, which allows for a refresh rate of 60 Hz to have a full display frame inserted on every refresh cycle. With the advent of variable refresh displays this has somewhat changed, however the main market for graphics cards, gamers, is heavily reliant on quick refresh rates and high frame rates from their graphics. With a multi-chip module, the manufacturer has to consider how many hops between dies the data has to perform from start to finish – is the data required found directly connected to the compute chip, or does it have to cross from the other side of the design? Is the memory directly stacked, or is there an intrapackage connection? With different memory domains, can the data retain its concurrency through the mathematical operations? Is there a central management die, or do each of the compute chiplets manage their own timing schema? How much of the per-chiplet design comes from connectivity units compared to compute units?

Ultimately this sort of design will only win out if it can compete on at least two fronts of the triad of performance, cost, or power. We already know that multi-die environments typically require a higher power budget than a monolithic design due to the extra connectivity, as seen with multi-die CPU options in the market, so the chiplets will have to take advantage of smaller process nodes in order to eliminate that deficit. Luckily, small chiplets are easier to manufacture on small process nodes, making it a potential cost saving over big monolithic designs. Performance will depend on the architecture, both for raw compute, as well as the interconnect between the chips.

https://www.anandtech.com/show/14211/intels-interconnected-future-chipslets-emib-foveros

So power usage is a problem, and since the idea is to go above 64 CUs, cost would be too.
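The 16.67 ms target in the quoted article is just 1000 ms divided by the refresh rate. The sketch below computes that budget and, using purely hypothetical numbers for the count and cost of dependent cross-die transfers, shows how inter-chiplet latency would eat into it.

```python
def frame_budget_ms(refresh_hz: float) -> float:
    """Milliseconds available per frame if every refresh shows a new frame."""
    return 1000.0 / refresh_hz


def sync_overhead_fraction(syncs_per_frame: int, us_per_sync: float, refresh_hz: float) -> float:
    """Share of the frame budget spent waiting on dependent cross-die transfers."""
    overhead_ms = syncs_per_frame * us_per_sync / 1000.0  # microseconds -> ms
    return overhead_ms / frame_budget_ms(refresh_hz)


for hz in (30, 60, 120):
    print(f"{hz:>3} Hz -> {frame_budget_ms(hz):.2f} ms per frame")

# HYPOTHETICAL: 200 serialized cross-die hops per frame at 2 microseconds each.
print(f"overhead at 60 Hz: {sync_overhead_fraction(200, 2.0, 60):.1%}")
```

The same fixed per-hop cost takes a bigger share of the budget as the refresh rate rises, which is why latency matters as much as bandwidth for gaming GPUs.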

Gubbi · May 16, 2019

People here are out of their goddamn minds.

Next gen console will launch at $499 and like the past generation it is highly unlikely they will be sold at a loss.

The console vendors will need to produce a system with CPU/GPU, DRAM, Storage, PSU and a controller and sell it at the same price as an RTX 2070 8GB.

There is zero, ZERO!!!, chance that we will see a 64 CU GPU in next-gen consoles: 1.) the cost of the die is too large, 2.) the power consumption is too large, 3.) the bandwidth demand on the memory subsystem, and consequently price, is too large.

Both MS and Sony are going to compete against console-as-a-service providers next gen, and that puts tremendous downward pressure on the purchasing price of physical consoles.

I would expect a 48CU GPU die, with only 40 active to ensure as many usable dies as possible. I would expect MS to pair hot GPU dies with cool CPU dies to maximize the power/yield point. If they can hit 1.7GHz, then that's 8.7TFlops with FP32 and 17.4TFlops using packed FP16. I would expect it to be paired with 16GB GDDR6 on a 256bit bus running at either 13 or 14Gbps (~400GB/s bandwidth).
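These figures follow from the usual peak-throughput arithmetic for GCN: 64 FP32 lanes per CU, a fused multiply-add counted as two FLOPs, and packed FP16 doubling that. Peak memory bandwidth is the bus width in bytes times the per-pin data rate. A small sketch, which also reproduces the X1X's 6 TF from its 40 CUs at 1.172 GHz as a sanity check (these are theoretical peaks, not achieved throughput):

```python
def gcn_tflops(active_cus: int, clock_ghz: float, packed_fp16: bool = False) -> float:
    # A GCN CU has 64 FP32 lanes; a fused multiply-add counts as 2 FLOPs.
    tflops = active_cus * 64 * 2 * clock_ghz / 1000.0
    # Packed FP16 issues two FP16 operations per lane per clock.
    return tflops * 2 if packed_fp16 else tflops


def gddr_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    # Each pin moves data_rate_gbps gigabits per second; divide by 8 for bytes.
    return bus_width_bits / 8 * data_rate_gbps


print(round(gcn_tflops(40, 1.7), 2))                    # 8.7
print(round(gcn_tflops(40, 1.7, packed_fp16=True), 2))  # 17.41
print(gddr_bandwidth_gbs(256, 13), gddr_bandwidth_gbs(256, 14))  # 416.0 448.0
print(round(gcn_tflops(40, 1.172), 2))                  # X1X sanity check: 6.0
```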

What Lockhart is/isn't is just speculation at this point; everything I've read originates from a Reddit post in February AFAICT. If it isn't just a SKU with gimped storage (no optical, half the SSD), it might be a client for MS' console-in-the-cloud service. It could be an APU with limited capability: enough to play existing XB1 titles, but everything more demanding would be streamed from a server.

Cheers

Globalisateur · May 16, 2019

@Gubbi
4.) GCN doesn't scale well above 56 CUs

Jay · May 16, 2019

Gubbi said:
What Lockhart is/isn't is just speculation

See title of thread.

There'd be nothing at all to discuss otherwise, and I've personally enjoyed the discussions of the pros, cons and feasibility of people's ideas.

So $499 8.7 TF 16GB consoles?
Sounds a bit expensive to me, but that's your opinion/speculation.

function · May 16, 2019

Gubbi said:
There is zero, ZERO!!!, chance , that we will see a 64 CU GPU in next gen consoles; 1.) the cost of the die is too large, 2.) the power consumption is too large, 3.) the bandwidth demand on the memory subsystem, and consequently price, is too large.

With MS looking to use the silicon in Azure racks for "non-entertainment purposes", you potentially have other considerations that could affect the chip. E.g. your good dies go in the server blades and your salvage dies go in the consoles. IMO this could lead to a wider chip than you'd otherwise see if it were only used for a console. The savings of 'cheap' console parts vs buying enterprise parts at several $k a pop could affect the calculation on die area.

I won't say 64 CUs, but I'm betting we'll see more than 40 active in Anaconda, and probably more for the cloud units than retail. X1X is already at 40 on 16nm.

Power is also relative to clocks. Wider but slower can give better performance for less power.
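To illustrate why wider-but-slower can win on power: dynamic power scales roughly with switched capacitance × frequency × voltage², and voltage has to rise with clock speed. The voltage/frequency curve below is a made-up linear placeholder, not AMD silicon data, and leakage (which grows with die area) and the extra die cost are ignored, so treat the output as directional only.

```python
def core_voltage(clock_ghz: float) -> float:
    # HYPOTHETICAL linear V/f curve: 0.80 V at 1.2 GHz rising to 1.10 V at 1.8 GHz.
    return 0.80 + 0.5 * (clock_ghz - 1.2)


def relative_dynamic_power(cus: int, clock_ghz: float) -> float:
    # Dynamic power ~ switched capacitance (taken as CU count) * f * V^2.
    v = core_voltage(clock_ghz)
    return cus * clock_ghz * v * v


narrow = relative_dynamic_power(40, 1.7)
wide_clock = 40 * 1.7 / 48  # clock that gives 48 CUs the same CU*GHz throughput
wide = relative_dynamic_power(48, wide_clock)

print(f"48 CUs at {wide_clock:.2f} GHz vs 40 CUs at 1.70 GHz")
print(f"wide design uses {wide / narrow:.0%} of the narrow design's dynamic power")
```

Under this toy curve the wider chip reaches the same peak TFLOPS for roughly three quarters of the dynamic power, at the price of a larger (more expensive) die.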

Picao84 · May 16, 2019

Gubbi said:
People here are out of their goddamn minds.

Next gen console will launch at $499 and like the past generation it is highly unlikely they will be sold at a loss.

The console vendors will need to produce a system with CPU/GPU, DRAM, Storage, PSU and a controller and sell it at the same price as an RTX 2070 8GB.

There is zero, ZERO!!!, chance that we will see a 64 CU GPU in next-gen consoles: 1.) the cost of the die is too large, 2.) the power consumption is too large, 3.) the bandwidth demand on the memory subsystem, and consequently price, is too large.

Both MS and Sony are going to compete against console-as-a-service providers next gen, and that puts tremendous downward pressure on the purchasing price of physical consoles.

I would expect a 48CU GPU die, with only 40 active to ensure as many usable dies as possible. I would expect MS to pair hot GPU dies with cool CPU dies to maximize the power/yield point. If they can hit 1.7GHz, then that's 8.7TFlops with FP32 and 17.4TFlops using packed FP16. I would expect it to be paired with 16GB GDDR6 on a 256bit bus running at either 13 or 14Gbps (~400GB/s bandwidth).

What Lockhart is/isn't is just speculation at this point; everything I've read originates from a Reddit post in February AFAICT. If it isn't just a SKU with gimped storage (no optical, half the SSD), it might be a client for MS' console-in-the-cloud service. It could be an APU with limited capability: enough to play existing XB1 titles, but everything more demanding would be streamed from a server.

Cheers

You seem to forget that Xbox One X already does >6TFlops on GPU. What would be the point in launching a console with only 33% more GPU power? Sure, the new beefier CPU probably occupies more die space and uses more power, but if they are not going to push the GPU to at least 10 TF, what's the point? Go from a CPU-starved console to a GPU-starved one?

Regarding memory, 16GB is not as forward-thinking as 8GB was on the PS4, plus the Xbox One X is already at 12GB. Again, I fail to see the point of releasing a new console if the difference in hardware would be so small.

About the supposed competition from streaming services (whose success really remains to be seen, and which I'm very sceptical of), you only talk about price, but what about experience? What would be the point of selling a cheaper console that gives you an experience barely improved over the previous generation, while streaming services can easily upgrade their hardware and give a better experience over the life of your standalone console? If game streaming really is a threat to the traditional console business, they need to go all in on hardware to show they can provide the better experience in the long run.

Shifty Geezer · May 16, 2019

TFs don't matter - it's what's on screen that matters. Firstly, XB1X is hampered by running current-gen games that don't target it. Secondly, advanced features could make those 8 TFs do significantly more than 33% more than 8 TFs of last-gen GCN.

Discussing TFs is interesting, but it doesn't tell us what the end results will look like and shouldn't be used for business considerations or design logistics.

Picao84 · May 16, 2019

Shifty Geezer said:
TFs don't matter - it's what's on screen that matters. Firstly, XB1X is hampered by running current-gen games that don't target it. Secondly, advanced features could make those 8 TFs do significantly more than 33% more than 8 TFs of last-gen GCN.

Discussing TFs is interesting, but it doesn't tell us what the end results will look like and shouldn't be used for business considerations or design logistics.

If this were a new non-GCN architecture, sure, TFlops would be as meaningless as they are for comparing AMD and NVIDIA GPUs.
Since Navi is still GCN, I really doubt it can do significantly more than 33%, unless RPM is heavily used. Each iteration of GCN was hyped as hell as bringing huge uplifts and all of them were wimps, with uplifts coming mostly from process and clock speeds.
Vega 64 has 50% more TFlops than Fiji and yet it only delivered an average 30% uplift. Performance per TFlop has been going down, not up, so I highly doubt Navi reverses the trend.
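The Vega-vs-Fiji argument can be put in numbers: divide the performance ratio by the TFLOP ratio to get per-TFLOP efficiency. The 1.5x and 1.3x figures come from the post; carrying that same efficiency drop over to an 8 TF vs 6 TF comparison is a hypothetical extrapolation to show what the claimed trend would imply, not a prediction for Navi.

```python
def perf_per_tflop_ratio(tflop_ratio: float, perf_ratio: float) -> float:
    """How much work each TFLOP delivers on the new GPU relative to the old one."""
    return perf_ratio / tflop_ratio


def expected_uplift(tflop_ratio: float, efficiency: float) -> float:
    """Performance gain implied by a raw TFLOP ratio and a per-TFLOP efficiency."""
    return tflop_ratio * efficiency - 1.0


# Figures from the post: ~1.5x Fiji's TFLOPs, but only ~1.3x the performance.
eff = perf_per_tflop_ratio(1.5, 1.3)
print(f"Vega 64 per-TFLOP efficiency vs Fiji: {eff:.2f}")

# HYPOTHETICAL: apply the same efficiency change to an 8 TF vs 6 TF comparison.
print(f"8 TF vs 6 TF uplift at that efficiency: {expected_uplift(8 / 6, eff):+.0%}")
print(f"8 TF vs 6 TF uplift at equal efficiency: {expected_uplift(8 / 6, 1.0):+.0%}")
```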

Gubbi · May 16, 2019

Picao84 said:
You seem to forget that XBox One X already does >6TFlops on GPU.

You seem to forget that MS and Sony need to make money off of these consoles.

Picao84 said:
What would be the point in launching a console with only 33% more GPU power? Sure, the new beefier CPU probably occupies more die space and uses more power, but if they are not going to push GPU to at least 10 TF, what's the point? Go from a CPU starved console to a GPU starved one?

Packed fp16 doubles the computational oomph where it is applicable.

Picao84 said:
Regarding the memory, 16GB is not as forward thinking as 8GB was on the PS4, plus the Xbox one X is already at 12GB? Again I fail to see the point of releasing a new console if the difference in hardware would be so small.

There are two reasons for the 12GB in Scorpio: 1.) The extra computational power of the GPU demanded more bandwidth, which resulted in a 384 bit bus. 2.) Assets required for 4K rendering. On both the original Xbox One and on Scorpio, 3GB is reserved for the system/other apps. That leaves 5 and 9 GB respectively for the active game.

On a 16 GB machine you would have 13GB for a game, 2½ x the XB1 and almost 50% more than Scorpio. Because of the expected high-performance mass storage, assets can be fetched on demand to a much larger degree without degrading the gaming experience.
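The game-memory figures are simple subtraction from the total. A minimal sketch, assuming (as the post does) that a 16 GB next-gen console keeps the same 3 GB system reservation:

```python
# Per the post: 3 GB is reserved for the system on both Xbox One and Scorpio.
# ASSUMPTION (as in the post): a 16 GB next-gen console keeps the same reserve.
SYSTEM_RESERVE_GB = 3


def game_memory_gb(total_gb: int, reserve_gb: int = SYSTEM_RESERVE_GB) -> int:
    """Memory left for the running game after the OS/app reservation."""
    return total_gb - reserve_gb


xb1, scorpio, next_gen = (game_memory_gb(total) for total in (8, 12, 16))
print(xb1, scorpio, next_gen)  # 5 9 13
print(f"{next_gen / xb1:.1f}x Xbox One, {next_gen / scorpio - 1:.0%} more than Scorpio")
```

The exact ratios come out at 2.6x and about 44%, which the post rounds to "2½ x" and "almost 50%".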

Picao84 said:
Vega 64 has 50% more TFlops than Fiji and yet it only delivered an average 30% uplift. Performance per TFlop has been going down, not up, so I highly doubt Navi reverses the trend.

Fiji has 6% more bandwidth than Vega 64.
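The point is that Vega 64 gained compute without gaining bandwidth. Dividing bandwidth by compute gives bytes of DRAM traffic available per FLOP. The spec figures below are the commonly quoted reference-card numbers (Fury X: 8.6 TF, 512 GB/s; Vega 64: ~12.7 TF at boost, ~484 GB/s), included as assumptions rather than taken from the thread.

```python
# ASSUMED reference-card specs: (FP32 TFLOPS at boost, memory bandwidth in GB/s).
GPUS = {
    "Fiji (Fury X)": (8.6, 512.0),
    "Vega 64": (12.7, 484.0),
}


def bytes_per_flop(tflops: float, bandwidth_gbs: float) -> float:
    """GB/s divided by GFLOP/s: bytes of DRAM traffic available per FLOP."""
    return bandwidth_gbs / (tflops * 1000.0)


for name, (tf, bw) in GPUS.items():
    print(f"{name}: {bytes_per_flop(tf, bw):.4f} bytes/FLOP")

fiji_bw, vega_bw = GPUS["Fiji (Fury X)"][1], GPUS["Vega 64"][1]
print(f"Fiji bandwidth advantage: {fiji_bw / vega_bw - 1:.0%}")
```

Per FLOP, Vega 64 ends up with roughly a third less bandwidth to work with, which is one plausible contributor to its lower per-TFLOP performance.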

Cheers

Picao84 · May 16, 2019

Gubbi said:
You seem to forget that MS and Sony need to make money off of these consoles.

Historically most consoles were sold at a loss in the beginning.

Gubbi said:
Packed fp16 doubles the computational oomph where it is applicable.

Exactly, and there have already been discussions on this forum about that topic; I remember someone (Sebbi?) saying its use is very limited. In the real world you might use it for a third of the rendering or less, depending on what you are rendering. It's not something you can put a lot of faith in.
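How much packed FP16 helps overall is a direct application of Amdahl's law: only the FP16-friendly share of the work runs twice as fast. A minimal sketch, assuming that share of the frame is ALU-bound (if it is bandwidth-bound instead, the gain is smaller still); the shares tried below are illustrative, with one third matching the estimate above:

```python
def amdahl_speedup(fraction_accelerated: float, factor: float) -> float:
    """Overall speedup when only part of the work runs `factor` times faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / factor)


# Packed FP16 doubles throughput (factor 2) only for work that tolerates FP16.
for share in (1 / 3, 1 / 2, 1.0):
    print(f"FP16-friendly share {share:.0%}: {amdahl_speedup(share, 2.0):.2f}x")
```

At a one-third share, the "17.4 TFlops" headline works out to only about a 1.2x effective gain over the FP32 figure.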

Gubbi said:
There are two reasons for the 12GB in Scorpio: 1.) The extra computational power of the GPU demanded more bandwidth, which resulted in a 384 bit bus. 2.) Assets required for 4K rendering. On both the original Xbox One and on Scorpio, 3GB is reserved for the system/other apps. That leaves 5 and 9 GB respectively for the active game.

On a 16 GB machine you would have 13GB for a game, 2½ x the XB1 and almost 50% more than Scorpio. Because of the expected high-performance mass storage, assets can be fetched on demand to a much larger degree without degrading the gaming experience.

Cheers

Fair enough, but memory isn't only used for rendering. You will have beefier CPUs for something (e.g. more believable worlds, better AI); are you going to have them compete with the GPU for an increase of only 50%?
