Next Generation Hardware Speculation with a Technical Spin [2018]

You have to consider what's out today: $499 for an Xbox One X, and likely taking a loss due to today's RAM prices.
In two years' time, what are the realistic odds that Sony can produce a significantly more powerful console than the One X for $100 less?
We'd need to consider what exactly is making the Xbone X so much more expensive to make vs. the PS4 Pro.
Is it the SoC?



Hardly, since Neo is 325mm^2 and Scorpio is 360mm^2 (a 10%/35mm^2 difference which practically corresponds to 4*GDDR5 32bit PHYs at 7.3mm^2 each).
RAM prices could play a part, but 4GB of the slowest GDDR5 shouldn't account for that much of the difference.
Hard drives are also pretty much the same. The UHD Blu-ray drive could also be making a dent, but Xbone S prices don't really suggest that.

I think the main culprit here is the so-called Hovis method, where every motherboard must go through per-component fine-tuning matched to its individual SoC in order to minimize power consumption and heat output. This extra step (or set of steps) in the production line might be costing them an arm and a leg.

The iGPU area in the PS4 Neo actually seems to be a bit larger than Scorpio's. Each proto-Vega CU in Neo is reportedly 15% larger than each Southern Islands CU in Scorpio, and the latter only has 11% more CUs. At the same time, the Neo has twice the ROPs. So the performance delta between Scorpio and Neo boils down to only three factors: 29% higher GPU clocks, 50% higher memory bandwidth, and significantly more memory available to games.
And those higher clocks may be coming at a large cost.
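For reference, a quick back-of-the-envelope check of those ratios from the public specs (40 CUs at 1172 MHz and 326 GB/s for Scorpio, 36 CUs at 911 MHz and 218 GB/s for Neo):

```python
# Theoretical FP32 throughput for GCN-style CUs (64 lanes, FMA = 2 FLOPs/clock).
def fp32_tflops(cus, clock_mhz, lanes_per_cu=64, flops_per_lane=2):
    return cus * lanes_per_cu * flops_per_lane * clock_mhz * 1e6 / 1e12

print(fp32_tflops(40, 1172))   # Scorpio: ~6.0 TF
print(fp32_tflops(36, 911))    # Neo:     ~4.2 TF
print(1172 / 911)              # ~1.29 -> the ~29% clock advantage
print(326 / 218)               # ~1.5  -> the ~50% bandwidth advantage
```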


Microsoft could simply make a larger console with a larger fan and a larger PSU, but they were apparently adamant about selling the premium-console idea, so they spent all this money on the most powerful Xbox that is also the smallest and quietest.
I get the value proposition, but I wonder if it will serve them well in the long term.

So could Sony produce a significantly cheaper and much more powerful console than the Xbone X in 2019, if they get their hands on 7nm chips and foundries adopt EUV in the meantime? Yes, I definitely think they could.

They need to cover a new CPU license (a flagship CPU license)
AMD will ask whatever they want for Zen. And what AMD wants could be to sell Zen really cheap, so they can lock in the semi-custom wins without their customers even being tempted to look elsewhere (e.g. Qualcomm, nVidia, or going full custom with ARM cores + Mali).
It's not like AMD is charging gazillions for Zen IP from 3rd-party SoC makers, which would make it harder for them to charge Microsoft and Sony significantly less. Only AMD sells Zen CPUs, no one else.

And as the underdog in the PC GPU market, they really need the console design wins.

a new process node that they have to wait on to come down in price per transistor just to match current 16nm prices, more transistors if you want it in the 8TF-10TF range, and at least the same amount of memory, if not more.
EUV has the potential to revolutionize the industry in terms of price-per-transistor and fab output. The latest numbers I saw pointed to up to 3x faster wafer output, and that's huge.
 
Each proto-Vega CU in Neo is reportedly 15% larger than each Southern Islands CU in Scorpio, and the latter only has 11% more CUs. At the same time, the Neo has twice the ROPs. So the performance delta between Scorpio and Neo boils down to only three factors: 29% higher GPU clocks, 50% higher memory bandwidth, and significantly more memory available to games.
And those higher clocks may be coming at a large cost.


Microsoft could simply make a larger console with a larger fan and a larger PSU, but they were apparently adamant about selling the premium-console idea, so they spent all this money on the most powerful Xbox that is also the smallest and quietest.
I get the value proposition, but I wonder if it will serve them well in the long term.
Agreed. Some good counterpoints in here.

From another thread, if we ignore the size of the CU entirely:
That would be something of an architectural change, with a ceiling of 4 SEs per GPU, 16 CUs per SE, and 4 RBEs (4 ROPs per RBE) per SE.
So if they don't switch the architecture, that's 4 SEs with a maximum of 16 CUs each, i.e. 64 CUs maximum; but we know there needs to be some redundancy, so we're looking at 60 CUs max.

Xbox One X is 40 CUs after redundancy; its performance gains have largely come from increasing the clock speed. I don't see why MS would take this route if going wide with lower clock speeds would result in a cheaper device with the same performance profile. Surely they didn't invest in all of this just for the smallest and quietest machine. A $399 Xbox One X would have been a significantly larger win even if it came in a slightly larger box.

Clocks improve the entire pipeline. Going wide doesn't, it only alleviates one type of bottleneck.
So you'd have to go really wide on compute, and then go really wide everywhere else to get the same performance as a higher clocked machine. I assume if you go wide we need to start talking about transistor count and thus costs, and die size again.

Clocks need to be a discussion point going into next gen if you're using the same architecture.
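To make that concrete, here's a rough sketch with made-up configurations: two hypothetical GPUs with the same theoretical compute, where the wide-and-slow one falls behind on everything that scales with clock rather than CU count (pixel fill, geometry, and so on). The per-clock rates below are simplifying assumptions, not real parts.

```python
# Hypothetical configs, purely to illustrate the wide-vs-fast trade-off.
def gpu_metrics(cus, rops, geometry_engines, clock_mhz):
    clock_hz = clock_mhz * 1e6
    return {
        "tflops":        cus * 64 * 2 * clock_hz / 1e12,     # FP32, FMA = 2 FLOPs/clock
        "gpixels_per_s": rops * clock_hz / 1e9,              # pixel fill rate
        "gtris_per_s":   geometry_engines * clock_hz / 1e9,  # assume 1 tri/clock/engine
    }

fast = gpu_metrics(cus=40, rops=32, geometry_engines=4, clock_mhz=1500)
wide = gpu_metrics(cus=60, rops=32, geometry_engines=4, clock_mhz=1000)
print(fast)  # ~7.7 TF, 48 Gpix/s, 6.0 Gtri/s
print(wide)  # ~7.7 TF, 32 Gpix/s, 4.0 Gtri/s -> same compute, weaker everywhere else
```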
 
Yield is probably an important chunk of the cost difference because of the clocking, and the wider bus must impact yield too. There are also multiple $10 items here and there which quickly add up: high-efficiency PSU, cooling, UHD drive... Then add maybe $20 for the memory.
 
Yield is probably an important chunk of the cost difference because of the clocking, and the wider bus must impact yield too. There are also multiple $10 items here and there which quickly add up: high-efficiency PSU, cooling, UHD drive... Then add maybe $20 for the memory.
I like where the discussion is headed.

All these attributes can be related to cost increases or decreases, and performance increases or decreases.

The only challenge I see is how to weigh them in terms of cost/benefit impact to the overall cost of the console.

Can we build some sort of Google Sheets calculator?
One that factors in die size, CUs with redundancy, clock speed, and total FLOP output?
And then just keep messing with the numbers until we get something we like? Stage 2 would be to incorporate the costs afterwards?
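Something like this, maybe; a minimal Python sketch of that calculator, where the per-CU area, the fixed-function area and the example inputs are all placeholder numbers to be argued over:

```python
# Toy console-GPU calculator: die size, CUs with redundancy, clock, FLOP output.
def console_gpu_estimate(physical_cus, disabled_cus, clock_mhz,
                         mm2_per_cu=5.0, fixed_mm2=120.0):
    active_cus = physical_cus - disabled_cus                 # spares kept for yield
    die_mm2    = physical_cus * mm2_per_cu + fixed_mm2       # crude area model
    tflops     = active_cus * 64 * 2 * clock_mhz * 1e6 / 1e12
    return {"active_cus": active_cus, "die_mm2": die_mm2, "tflops": tflops}

# Keep messing with the numbers until we get something we like:
print(console_gpu_estimate(physical_cus=44, disabled_cus=4, clock_mhz=1172))  # ~6.0 TF, ~340 mm^2
print(console_gpu_estimate(physical_cus=64, disabled_cus=4, clock_mhz=1000))  # ~7.7 TF, ~440 mm^2
```

Stage 2 (cost) could then hang per-mm^2 and per-GB prices off the same numbers.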
 
EUV has the potential to revolutionize the industry in terms of price-per-transistor and fab output. The latest numbers I saw pointed to up to 3x faster wafer output, and that's huge.
Lots of journalists started exaggerating EUV's impact and now I'm not sure anymore. Digging a bit more, supposedly the first use of EUV is only going to help yield, and only for a few steps; the big claims are just long-term predictions that assume all the current issues get solved, which is still far off. The nice thing is that it will allow 5nm and 3nm very soon after, which are plainly impossible with current methods.
 
Removing the multi-patterning steps speeds things up, but the foundries are going to want to bill people big dollars early on anyway.
 
You have to consider what's out today: $499 for an Xbox One X, and likely taking a loss due to today's RAM prices.
In two years' time, what are the realistic odds that Sony can produce a significantly more powerful console than the One X for $100 less?
They need to cover a new CPU license (a flagship CPU license), a new process node that they have to wait on to come down in price per transistor just to match current 16nm prices, more transistors if you want it in the 8TF-10TF range, and at least the same amount of memory, if not more.
We haven't even gotten to the increased hard drive speeds and capacity needed to support the higher fidelity, or the optical drives which Sony still hasn't factored into the price of their current systems.
Not to mention, with a new GPU architecture right around the corner for 2020, does it even make sense to still run on the current GCN architecture?

Is 2019 too close to figure out backwards compatibility? It took years of R&D for MS to get from the 360 to the Xbox One.
And even between Xbox One and Xbox One X, or PS4 and PS4 Pro, to make compatibility easier the newer GPUs are essentially just mirrors of the originals.

Tall order here. I hope they can deliver, but the realistic odds of a PS5 having all of these features in 2019 at $399 are not in your favour. If 2019 is forced, I don't think we're going to see something much better than X1X-level performance at $399, in which case MS's response may be to do nothing at all except enable exclusives on the X1X platform to move the baseline upwards.

I doubt either Sony or MS are affected by DRAM's current pricing. Large volume buyers with long term needs aren't usually affected by short term volatility.
 
I doubt either Sony or MS are affected by DRAM's current pricing. Large volume buyers with long term needs aren't usually affected by short term volatility.
Hasn't this been going on for over a year, though?
 
Hardly, since Neo is 325mm^2 and Scorpio is 360mm^2 (a 10%/35mm^2 difference which practically corresponds to 4*GDDR5 32bit PHYs at 7.3mm^2 each).
I imagine that if this is still accurate, the larger die size and the higher clock speeds are probably adding up to a higher cost here.

edit: btw, asked around. got no hits.
[attached image: wafer/die yield illustration]
 
I imagine that if this is still accurate, the larger die size and the higher clock speeds are probably adding up to a higher cost here.

edit: btw, asked around. got no hits.
[attached image: wafer/die yield illustration]
And this shows how important it is to have some CUs disabled for yield: each die with a defect still works as long as the defect falls on a CU (high chances, considering CUs make up the majority of the die). Only 5% more area for redundancy can mean 50% more working parts.

Another fun trick is Threadripper, with 4 small dies which have great individual yields but operate like a massive chip, without the associated exponential drop in working parts. Four times the die area for four times the cost, versus the gigantic Xeons that cost many thousands.
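If anyone wants to play with the yield math, here's a toy defect model: simple Poisson yield, with the simplifying assumption that a defect landing inside a CU only kills that CU. The defect density and areas below are made up for illustration, not real foundry numbers.

```python
import math

def yield_no_redundancy(die_mm2, d0_per_mm2):
    """Classic Poisson model: the die is only good if it has zero defects."""
    return math.exp(-d0_per_mm2 * die_mm2)

def yield_with_cu_redundancy(cu_mm2, n_cus, spare_cus, other_mm2, d0_per_mm2):
    """Die is good if the non-CU area is defect-free and at most `spare_cus`
    CUs are hit; assumes a defect inside a CU only disables that CU."""
    p_other_ok = math.exp(-d0_per_mm2 * other_mm2)
    p_cu_bad   = 1.0 - math.exp(-d0_per_mm2 * cu_mm2)
    p_cus_ok   = sum(math.comb(n_cus, k) * p_cu_bad**k * (1 - p_cu_bad)**(n_cus - k)
                     for k in range(spare_cus + 1))
    return p_other_ok * p_cus_ok

# Made-up 360 mm^2 die: 44 CUs of 5 mm^2 each, 140 mm^2 of everything else,
# 0.5 defects per cm^2 (= 0.005 per mm^2).
print(yield_no_redundancy(360, 0.005))                 # ~0.17 if everything must be perfect
print(yield_with_cu_redundancy(5, 44, 4, 140, 0.005))  # ~0.49 with 4 spare CUs
```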
 
And this shows how important it is to have some CUs disabled for yield: each die with a defect still works as long as the defect falls on a CU (high chances, considering CUs make up the majority of the die). Only 5% more area for redundancy can mean 50% more working parts.

Another fun trick is Threadripper, with 4 small dies which have great individual yields but operate like a massive chip, without the associated exponential drop in working parts. Four times the die area for four times the cost, versus the gigantic Xeons that cost many thousands.
Is Threadripper AMD's response to increasing die sizes, or is it universal?
nvm. Ryzen specific.
From what I can see it is actually 2 dies + 2 dummy dies on one Threadripper.
The TDP is very high, however, coming in at 180W. But damn.
 
Is Threadripper AMD's response to increasing die sizes, or is it universal?
I don't know; so far we haven't seen this from Intel, Nvidia or others. The inter-die communication is probably a complicated compromise of power, bandwidth and latency.

I imagine a GPU able to pull this off would be quite competitive at the high end. Say they have 6 or 8 small sections, each basically a complete small GPU including a memory controller (one HBM stack or 64 bits of GDDR6 each). If they were all connected orthogonally, one of them could serve as redundancy. Practically no critical area. Amazing yield, including assembly, interposer, TSVs, etc.
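Toy numbers again, extending the yield sketch from a few posts up to chiplets: compare a package that needs every placed section to work against one that carries a spare, treating each section (die plus its bonding step) as independent. The 85% per-chiplet figure is invented purely for illustration.

```python
import math

def package_yield(n_placed, per_chiplet_yield, spares=0):
    """Probability that at least (n_placed - spares) chiplets survive,
    with each chiplet (die + assembly) treated as independent."""
    return sum(math.comb(n_placed, k)
               * per_chiplet_yield**k * (1 - per_chiplet_yield)**(n_placed - k)
               for k in range(n_placed - spares, n_placed + 1))

print(package_yield(8, 0.85, spares=0))  # ~0.27 if all 8 sections must work
print(package_yield(8, 0.85, spares=1))  # ~0.66 if one section is redundant
```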
 
I don't know; so far we haven't seen this from Intel, Nvidia or others. The inter-die communication is probably a complicated compromise of power, bandwidth and latency.

I imagine a GPU able to pull this off would be quite competitive at the high end. Say they have 6 or 8 small sections, each basically a complete small GPU including a memory controller (one HBM stack or 64 bits of GDDR6 each). If they were all connected orthogonally, one of them could serve as redundancy. Practically no critical area. Amazing yield, including assembly, interposer, TSVs, etc.
Yeah, I was thinking along the same lines, hoping you'd have a clue whether this is leveraged elsewhere.

Very cool though.
 
About the recently updated backward-compatibility patent (Cerny)

Backward compatibility testing of software in a mode that disrupts timing

If I'm not mistaken, Jaguar has only L1 and L2 caches; well, here comes an L3 cache for the CPU, shared by all clusters of cores:

FIG. 2 depicts an example of a possible multi-core CPU 200 that may be used in conjunction with aspects of the present disclosure...Each cluster may also include a cluster-level cache 203-1 . . . 203-M that may be shared between the cores in the corresponding cluster...Furthermore, the CPU 200 may include one or more higher-level caches 204, which may be shared between the clusters.

[patent drawing: FIG. 2 from US 9,892,024]
 
And this shows how important it is to have some CUs disabled for yield: each die with a defect still works as long as the defect falls on a CU (high chances, considering CUs make up the majority of the die). Only 5% more area for redundancy can mean 50% more working parts.

Another fun trick is Threadripper, with 4 small dies which have great individual yields but operate like a massive chip, without the associated exponential drop in working parts. Four times the die area for four times the cost, versus the gigantic Xeons that cost many thousands.
Aren't there some associated performance penalties with using Infinity Fabric? Maybe the tradeoff is more than worth it versus single dies if programs take advantage of all the cores, though.
 
I doubt either Sony or MS are affected by DRAM's current pricing. Large volume buyers with long term needs aren't usually affected by short term volatility.
I thought similar things; however, Apple had to make a statement to their shareholders regarding DRAM supply issues impacting the amount of memory in upcoming phones, and LG had to tell its shareholders that it had to take proactive measures to secure its DRAM supply. Meaning existing contracts/deals were not enough to insulate either of these massive players, and they had to take steps beyond the initial deals/contracts.
 
They magically forgot to put GDDR in that chart. Because GDDR6 is going to be 864GB/s and 5 pJ/bit this year.
Is the 5 pJ/bit figure from Micron's whitepaper on GDDR6? It would seem that number is not capturing the same elements as the figures given for HBM/HBM2 SoCs. The HBM2 figures give 6-7 pJ/bit, compared with GDDR5's ~20 pJ/bit. However, Micron's GDDR6 paper gives GDDR5 6.5 pJ/bit, which may mean they are leaving off some major consumers like the controllers or on-chip hardware.

In theory, some of those elements should also be improved, but we have enough product examples to show GDDR5 is several times less efficient than HBM, which wouldn't make sense if the marketing for GDDR6 gave the whole picture.

edit: Of note from other presentations, the drop in interface power has made DRAM array consumption a limiting factor, which may put a more problematic ceiling on 3D integration since the DRAM's own power consumption may cap the effectiveness of the stack regardless of what could be delivered as bandwidth.
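As a rough illustration of why the pJ/bit accounting matters, here's the implied interface power at full sustained bandwidth, plugging in the figures quoted above (marketing/whitepaper numbers, not measurements; the HBM2 case assumes a multi-stack setup):

```python
def interface_power_watts(bandwidth_gb_per_s, pj_per_bit):
    """DRAM interface power implied by an energy-per-bit figure
    at full sustained bandwidth (GB/s = gigabytes per second)."""
    bits_per_second = bandwidth_gb_per_s * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

print(interface_power_watts(864, 5))    # GDDR6 at the quoted 5 pJ/bit      -> ~35 W
print(interface_power_watts(326, 20))   # Scorpio-class GDDR5 at ~20 pJ/bit -> ~52 W
print(interface_power_watts(900, 7))    # 4-stack HBM2 at ~7 pJ/bit         -> ~50 W
```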

The cited patents are...irrelevant? And not actually cited anywhere? Still, at least we know Cerny is involved at Sony.
To some value of "is", since the patent was filed in 2015 and projects sometimes have significant lead times before they filter out into a filing.

Is Threadripper AMD's response to increasing die sizes, or is it universal?
EPYC is, or rather was, AMD's response to the fact that it could not design and support as broad a range of chips, or engineer a big chip to the extremes Intel can.
Threadripper was an incremental side project that took what EPYC could do and cut the number of chips down.
EPYC was engineered to cater to the 80-90% of the server market that generally wouldn't need such a high level of die integration or vector throughput, and to provide an uncommonly large ratio of IO and memory per package/socket for the price.
The remaining 10-20% can really feel where AMD's FPU or interconnect falters, and they are generally able and willing to pay handsomely for the performance it doesn't provide. Perhaps more awkwardly, workloads in that 80% do occasionally wander into the corner cases felt by the 10-20%, and the choices made for scalability have some high latency floors in the current first generation.


About the recently updated backward-compatibility patent (Cerny)

Backward compatibility testing of software in a mode that disrupts timing

If I'm not mistaken, Jaguar has only L1 and L2 caches; well, here comes an L3 cache for the CPU, shared by all clusters of cores:
As indicated by the many "may or may not" interjections, the patent is being very broad on purpose. Near the end, it pretty much says something to the effect that if a chip can have X, it can be put into test mode, but whether a specific unit or block is present or not is in no way limiting.
Unless people expect the PS5 to have a CDROM and a tape drive while running an off-die memory controller, they're not committing to any specific embodiment. None of AMD's x86 chips have that cache layout, but then again few other architectures have that mix either. The patent's language is that any of those specific features could be added or removed without affecting the concept.

Aren't there some associated performance penalties with using Infinity Fabric? Maybe the tradeoff is more than worth it versus single dies if programs take advantage of all the cores, though.
Summit Ridge's memory and inter-CCX latencies are in the same range as the console chips' Jaguar latencies, as an example. Microsoft mentioned latency improvements for both of its APUs, and the Xbox One X may have a higher northbridge clock to match the GPU and GDDR5 controllers. The PS4 Pro might have bumped that clock as well. With even modest improvements to that on-die latency, the newer Jaguar chips might oddly enough have some advantages over Ryzen's first iteration of its fabric, at least pending analysis of the newer Raven and Pinnacle Ridge variants.
 
Another fun trick is Threadripper, with 4 small dies which have great individual yields but operate like a massive chip, without the associated exponential drop in working parts. Four times the die area for four times the cost, versus the gigantic Xeons that cost many thousands.
And another possible trick is to focus fully on a 2.5D approach, extracting more and more parts from inside the CPU/GPU into standalone chips. The advantage is that instead of making really big chips on 7nm, they can push through bad yields by placing only very small chips on the wafer. They will get more out of every wafer, but will have to spend a bit to assemble those smaller chips back together on an interposer.
 

This shouldn't even be considered a rumor. No sources. Nothing.

p.s. I don't believe a PS4 portable would be feasible until 5nm in 2022+

The guy leaked a Nintendo Direct and has connections in the industry. I think the PS5 will be available as soon as the technology is ready and they can use a 7nm process to build a console at 400 dollars/euros with the same economic model as the PS4 (2019 or 2020).

I think we will never see a PlayStation portable again...
 