Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

Status
Not open for further replies.
- Very happy with how Zen2 is shaping up
- Navi looks good, but perhaps not great considering it's 7nm
- 251mm2 Navi10 + Zen 2 is standard console die size (330-350mm2)
- They expect 40CU with 4 deactivated for consoles, with lower clock speeds so around 8.5TFLOPs
- Hardware raytracing is a generation away for AMD, but glad MS announced they are doing hardware raytracing
- A bit suspicious on Sony saying they are doing "audio ray tracing"
- Devs said they are quite happy with PS5
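
For anyone who wants to check the "40CU with 4 deactivated, around 8.5TFLOPs" bullet, here is a minimal sketch. It assumes the usual peak-FP32 convention for GCN/RDNA-style parts (64 shader lanes per CU, one FMA counted as 2 FLOPs per lane per clock); the function names are made up for illustration.

```python
# Peak FP32 throughput sketch for a GCN/RDNA-style GPU.
# Assumption: 64 lanes per CU, 2 FLOPs (one FMA) per lane per clock.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2


def peak_tflops(active_cus: int, clock_mhz: float) -> float:
    """Theoretical peak FP32 TFLOPs for a given CU count and clock."""
    flops = active_cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK * clock_mhz * 1e6
    return flops / 1e12


def clock_mhz_for(target_tflops: float, active_cus: int) -> float:
    """Clock (MHz) needed to hit a target peak TFLOPs with a given CU count."""
    flops_per_clock = active_cus * LANES_PER_CU * FLOPS_PER_LANE_PER_CLOCK
    return target_tflops * 1e12 / flops_per_clock / 1e6


active_cus = 40 - 4  # 40 CUs on die, 4 deactivated, per the bullet above
print(round(clock_mhz_for(8.5, active_cus)))  # clock implied by ~8.5 TFLOPs
```

Under those assumptions, 36 active CUs need roughly 1845 MHz to reach 8.5 TFLOPs.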

I think it's a bit silly to assume the PS5 will only have the same number of CUs as the PS4 Pro did 4 years and a process node earlier. It's also kinda obnoxious for them to repeat the "Cerny only talked about audio ray tracing" crap that was debunked by the Wired author like the same day the article was published.
 
Yes... I guess the PS5 will also have very similar (if not the same) numbers and GPU structure. Why make an extra powerful, extra noisy and hot, extra expensive console? To leave PS4 owners behind? Crazy. To let people buy the cheap and silent Stadia? Crazy. Better to keep people on PS4 as long as possible by giving them new ways to play, including online services that give PS4 players as many TF as Stadia, and to get the PS5 ASAP to those who don't have a proper online connection. A well-dimensioned PS5. Not too expensive. Not too powerful. Around 10.5 TF... ;)
Disgusting, absolutely disgusting, I'm horrified why would you be content with 10.5 TF. It might be manageable at the start of the gen, but a few years later you'll see framerate drops, dialed-down settings, lowered res, blurry IQ and so on. That extra 2 TF you would have gotten if they had pushed it a bit more would be sorely missed. A 12-13 TF console is a 4K console, while a 10 TF one is not. Ain't no forum dweller gonna take my TFs away from me :devilish:
 
I think it's a bit silly to assume the PS5 will only have the same number of CUs as the PS4 Pro did 4 years and a process node earlier. It's also kinda obnoxious for them to repeat the "Cerny only talked about audio ray tracing" crap that was debunked by the Wired author like the same day the article was published.

A lot of this nonsense is based on rumors or conjecture and then regurgitated all across gaming forums. As I have stated many times, it's cool to speculate and post others' speculation, but don't take speculation (especially your own brand of speculation) as being the truth. If Sony or Microsoft haven't confirmed anything, then everything should be considered speculation at this point, including rumored developers stating which system is performing better.
 
Disgusting, absolutely disgusting, I'm horrified why would you be content with 10.5 TF.....

Ha ha ha... Costs... Eventually you have to compete against a 129 dollar (or even 69 dollar) Stadia. It cannot be expensive this time. Sorry...
 
I think the first 12.9 tflops rumor didn't come from devs and their tests with dev-kits but from the croissant leak, and that was supposed to be the final tflops of the PS5 console, not the dev-kits.
It was also reported by Benji ("near 13TF"). Knowing the TDP and die size of the chip, I am absolutely sure you are not gonna see Navi at 13TF in consoles.
 
DF's heyday was last gen. With the 360 and PS3 being complete opposites in terms of arch, yet so close on performance, there was fun to be had each time a new face-off was posted.

These days, with both MS and Sony going to the same vendor for SoCs, with the same requirements and the same budget, they have to stretch their analysis, so many people feel they fell off. It's just the nature of the technology we have in this day and age, and they have to put bread on the table somehow.
Not to mention that the times when consoles were squeezed to the max to get a little extra performance are gone for good. Current consoles are so powerful, have so much memory and share such similar unified architectures that, while they underperform compared to PC, developers just have to tweak a few settings here and there and call it a day when they port a game. Hence you can hardly distinguish a PS4 capture from an Xbox screengrab, save for a shadow and a solitary leaf on a tree that is present in one version but not the other.

As for DF, what still holds me spellbound is DF Retro, 'cos many details that remained a mystery about games I discussed a lot with fellow forum members, like the admired Gran Turismo 4, could come to light someday. Like the eternal debate: which was better, Forza or GT4 and its 60fps? Was GT4 actually running at 60fps, or was it wishful thinking?
 
I think it's a bit silly to assume the PS5 will only have the same number of CUs as the PS4 Pro did 4 years and a process node earlier. It's also kinda obnoxious for them to repeat the "Cerny only talked about audio ray tracing" crap that was debunked by the Wired author like the same day the article was published.
Nothing silly in that. Mind you, the node reduction from 16nm to 7nm was not as big as the numbers suggest (28nm to 16nm was the bigger one), and the RDNA arch has bigger CUs than GCN had, so the comparison is really not a simple one.

Let's work with what we know. The 40CU 5700XT GPU has 9.8TF at 1905MHz and a die size of 251mm2. This means that, with an 8-core Zen 2, your die size will already be ~325mm2. If RT hardware is included in this (and there seems to be), you can add another 20-30mm2. In that case we have a die that is bigger than the PS4 and PS4 Pro ones, as well as a more expensive one (in $/mm2, the 7nm process is more expensive than the 28nm and 16nm nodes).
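
The area budget in this post can be sketched as below. All inputs are the post's own rough figures (the ~325mm2 GPU + CPU base and the guessed 20-30mm2 for RT hardware), plus the "standard console die size" range quoted in the first post of the thread; nothing here is a measured value, and the names are only illustrative.

```python
# Rough APU area budget using the figures quoted in the thread (all in mm^2).
BASE_APU_MM2 = 325                # Navi 10-class GPU + 8-core Zen 2, per the post
RT_EXTRA_MM2 = (20, 30)           # guessed range for RT hardware, per the post
TYPICAL_CONSOLE_MM2 = (330, 350)  # "standard console die size" from earlier in the thread


def apu_range(base: float, extra: tuple) -> tuple:
    """Return the (low, high) total die area once the extra block is added."""
    return (base + extra[0], base + extra[1])


low, high = apu_range(BASE_APU_MM2, RT_EXTRA_MM2)
print(f"Estimated APU: {low}-{high} mm^2")
print(f"Above typical upper bound: {high > TYPICAL_CONSOLE_MM2[1]}")
```

With these guesses the estimate is 345-355mm2, which sits at or slightly above the top of the quoted typical range.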

People said TF don't matter because Nvidia cards with considerably fewer TF perform better than GCN cards with more TF. Now we get Navi cards that follow the same philosophy, with a 9.75TF Navi being faster than a 12.5TF GCN, which means consoles will be packing more performance than Vega 56 cards, something that was said to be very hard to achieve a year ago. So I think clinging to CU count is the wrong way to go. If these GPUs will indeed be clocked that high (1.8GHz), then 40 CUs is more than enough, and more than was expected, actually.
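
To put a number on the "9.75TF Navi faster than 12.5TF GCN" point: if the Navi card is at least as fast as the GCN card, its delivered performance per theoretical TFLOP must be higher by at least the ratio of the two figures. A minimal sketch, assuming only the two TF numbers from the post and equal-or-better real-world performance (the function name is made up):

```python
def min_per_tflop_advantage(faster_card_tflops: float, slower_card_tflops: float) -> float:
    """Lower bound on the per-TFLOP efficiency advantage of the faster card.

    Assumes the card with fewer theoretical TFLOPs performs at least as
    well as the card with more.
    """
    return slower_card_tflops / faster_card_tflops


advantage = min_per_tflop_advantage(9.75, 12.5)
print(f"Navi does at least {advantage:.2f}x the work per theoretical TFLOP")
```

That works out to at least about 1.28x per TFLOP, which is why raw TF or CU counts alone are a weak basis for comparing across architectures.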
 
One possible explanation of how project scarlett can be 4x the power of xb1x:

fp16 performance of xb1x : 6tflops
fp16 performance of scarlett : 24tflops

Which means the single precision performance of scarlett is 12tflops.
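
The arithmetic behind this guess, as a small sketch. It takes the post's premise as given: the One X runs FP16 at the same rate as FP32 (6 TF), while Scarlett is assumed to run FP16 at double rate. The double-rate factor is an assumption, not a confirmed spec.

```python
def fp32_from_fp16(fp16_tflops: float, fp16_rate: float = 2.0) -> float:
    """FP32 TFLOPs implied by an FP16 figure, given the FP16:FP32 rate ratio."""
    return fp16_tflops / fp16_rate


xb1x_fp16 = 6.0                 # post's figure: FP16 runs at FP32 rate on One X
scarlett_fp16 = 4 * xb1x_fp16   # "4x the power", read as FP16 throughput
print(fp32_from_fp16(scarlett_fp16))  # implied single precision TFLOPs
```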
 
Further comments from Phil Spencer have stated this is CPU speedup, which makes sense.

1.6 GHz Jaguar:
1.5 * 1.05 * 1.15 IPC improvements
2.0 * clock improvement.

Roughly 4x.
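
Multiplying those factors out, as a quick sketch. The three IPC factors and the 2.0x clock factor are the post's own estimates, not measured numbers, and the baseline clock is the 1.6 GHz the post starts from.

```python
import math

# Per-step IPC uplift estimates from the post, and the clock multiplier.
ipc_factors = [1.5, 1.05, 1.15]
clock_factor = 2.0
baseline_clock_ghz = 1.6  # Jaguar baseline used in the post

speedup = math.prod(ipc_factors) * clock_factor
print(f"Estimated CPU speedup: {speedup:.2f}x")
print(f"Implied clock: {baseline_clock_ghz * clock_factor:.1f} GHz")
```

The product is about 3.6x at an implied 3.2 GHz, so "roughly 4x" is a round-up of these estimates.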

https://variety.com/2019/gaming/features/breaking-down-what-we-know-about-the-next-xbox-1203239065/

Microsoft is calling Scarlett “a bigger leap than any generation we’ve done before,” specifying that from a “pure processing perspective, [Scarlett is] four times more powerful than Xbox One X.” Today, Variety was able to confirm with Microsoft that this number was specifically in reference to Scarlett’s CPU, and not its graphical horsepower, which we still know very little about.

Some good content here on TSMC's 7nm process.


This talks about identifying yield issues and fixing them. This is important because Sony/MS can apply things learned on Navi and Zen 2 by virtue of debuting next year.


https://fuse.wikichip.org/news/2408...ls-2nd-gen-7nm-and-the-snapdragon-855-dtco/2/


One interesting aspect of the presentation that Chi delved into was yield. With the original batch of SDM855 processors, they had quite a lot of bad parts rejected by their partners due to high power consumption. A more in-depth look revealed that there was quite a large spread in their Vmin distribution. Naturally, there are two main approaches to handling this specific situation. You can lower the operating voltage or you can tighten the spread distribution. For Vmin higher than the Vdd, the operating voltage has to be raised to pass. Dynamic laser stimulation (DLS) was employed in order to determine the location of sensitive areas. The analysis located flip-flop devices located at cell boundaries. Further analysis revealed physical defects that cause systematic transistor Vt shift, impacting the operating voltage of the critical path. In collaboration with TSMC, design and process changes had to be made to improve the timing margins and reduce the physical defect. Multiple such problems showed up on the 7-nanometer process. A number of key modules that were particularly prone to generating low-voltage defects were isolated including the polycut and RMG clean. Through the DTCO collaboration, the yield loss due to low Vt operations was reduced by 9x. With the help of TSMC, the spread of variation in Vmin was tightened up using device tuning, optimizations across the fin, epi, and the metal gate. All in all, the result is much better uniformity across wafers with power consumption spread being reduced by around 60%. All of this effort has gone into ensuring that the share of parts that are rejected is significantly lowered.


More importantly, there is a 2nd generation 7nm process that has a 5% performance gain with the exact same design rules and toolset (no EUV). This is not 7nm+ EUV or 6nm. I see no reason consoles wouldn't benefit from these enhancements.


TSMC also developed a 2nd generation of their 7nm process. This is an optimized process which uses the same design rules and DUV and is unrelated to 7nm+ which is EUV-based. This process is entirely design-compatible with the first generation but enjoys additional power and performance enhancements. For their second generation process, TSMC made some additional optimizations.


All in all, the 2nd-generation 7nm process is said to deliver over 5% improvement in performance. Additionally, at the same leakage, at high frequencies, the second-generation 7nm process has improved the Vmin by 50 mV.


[Image: vlsi-2019-2nd-gen-perf.png]



What's important here is that the y-delta is bigger than 5% once you get past the mid-point on the x-axis. That is, if you maintain the same speed, you'll get back more than 5% in power. 5% on a 180W GPU would be a total of 9W. David Schor confirmed this is available now. The question is whether Navi and/or Zen 2 already use it.
 
Still, even the 2nd gen 7nm seems to be far from the node's specs announced by TSMC.
 
Is it true that there has never been more than a 4x CPU increase?
 
Regarding Mark Cerny's wording in the Wired interview: "The GPU, a custom variant of Radeon’s Navi family, will support ray tracing."
He didn't use words like integrated or hardware ray tracing. That has been discussed here. He later said “we are cloud-gaming pioneers, and our vision should become clear as we head toward launch”. Then there is this slide and text from AMD under cloud. Could it be that Sony have chosen ray tracing support only via cloud? He also talked about using ray tracing for sound localisation in games, so I don't know, just speculating.
[Image: amd-ray-tracing-580x326.jpg]
 
I believe Matt from ResetEra confirmed that it was hardware ray tracing, to calm 60 pages of back and forth.
 
If Sony's machine is going to be more powerful, I wonder at what point it would make more sense to go chiplet instead of APU? Zen 2 chiplet yields are supposed to be around 70%, but that's for only a 74mm² die, and the 251mm² size of Navi will grow with any serious RT hardware in addition to more CUs. Also, I do not see either MS or Sony going to ~1900 MHz on the GPU unless AMD makes some real power consumption breakthroughs with 2nd Gen Navi. The CU count is going to grow substantially if 10 to 12 theoretical TFLOPS is the goal.
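
To see how much the CU count would have to grow, here is a rough sketch. It assumes the usual peak-FP32 convention of 64 lanes per CU and 2 FLOPs per lane per clock; the clock speeds tried are arbitrary illustrative picks below ~1900 MHz, not leaked or confirmed figures.

```python
import math


def cus_needed(target_tflops: float, clock_mhz: float) -> int:
    """Smallest whole number of active CUs reaching the target peak FP32 TFLOPs.

    Assumes 64 lanes per CU and 2 FLOPs (one FMA) per lane per clock.
    """
    tflops_per_cu = 64 * 2 * clock_mhz * 1e6 / 1e12
    return math.ceil(target_tflops / tflops_per_cu)


for clock_mhz in (1500, 1700, 1900):  # hypothetical console clocks
    for target in (10, 12):           # TFLOPS goals mentioned in the post
        print(f"{target} TF @ {clock_mhz} MHz -> {cus_needed(target, clock_mhz)} CUs")
```

At a hypothetical 1500 MHz, for example, this gives 53 active CUs for 10 TF and 63 for 12 TF, before counting any CUs disabled for yield.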

With that in mind, the 331mm² Radeon VII die is probably at the current upper practical limit for 7nm as far as yields go, and what if 2nd Gen Navi creeps up to such a number? The addition of 2x Zen 2 CCXs, and IO die functions, will bring such an APU well above 450mm². My bet is Sony (and MS) go wide (but slower) with the CUs. Until yields pan out to make a full APU viable, going chiplet will benefit yields, while still making it possible to use harvested CPU & GPU dies for PC products. Harvested console Navi dies could be the basis for the downgraded model MS is supposedly working on, while still having access to full speed CPU dies needed for runtime parity between top end and budget console models.
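
As a very rough illustration of why die size matters so much here, the sketch below extrapolates the quoted ~70% yield at 74mm² to the larger dies mentioned in this post. It assumes a simple Poisson defect model (yield = exp(-area × defect density)), which is a textbook simplification and not TSMC data. It also counts only fully working dies, so harvesting dies with a few bad CUs would make the real picture better than this.

```python
import math


def defect_density(yield_frac: float, area_mm2: float) -> float:
    """Defects per mm^2 implied by a known yield, under a Poisson model."""
    return -math.log(yield_frac) / area_mm2


def poisson_yield(area_mm2: float, d0: float) -> float:
    """Fraction of defect-free dies for a given area and defect density."""
    return math.exp(-area_mm2 * d0)


d0 = defect_density(0.70, 74)  # calibrated on the Zen 2 chiplet figure above
for area in (74, 251, 331, 450):  # die sizes mentioned in the post (mm^2)
    print(f"{area} mm^2 -> {poisson_yield(area, d0):.0%} defect-free")
```

Under this simplified model the defect-free yield falls off quickly with area, which is the intuition behind splitting a large APU into chiplets.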

The interposer holding the chiplets can have separate memory pathways for DDR4 coming from the IO chip, while allowing GPU memory pathways to pass through the interposer to GDDR6.
 