Baseless Next Generation Rumors with no Technical Merits [post E3 2019, pre GDC 2020] [XBSX, PS5]

Even if PS5 really needs exactly 36 CUs in BC mode, why can't they design a 56 CU GPU and turn off 20 CUs in BC mode?

Why didn't they do likewise on 4Pro, but keep more CUs active in "Boost mode" so the software gets a bigger boost even when not patched? That's what Microsoft did on the One X, so all older unpatched games run at 3TF from the start. Games built with the newer SDK then get to use the full 6TF even without explicitly targeting the One X.
 
Why didn't they do likewise on 4Pro, but keep more CUs active in "Boost mode" so the software gets a bigger boost even when not patched?
Because it'd cost more and yet likely sell no more units. The 4Pro's design was shaped by economic constraints very different from a next-gen platform's.
 
Which is why Cell should be included in PS5 as a custom 3D audio processor! What would Cell be at 7nm? It was 230 mm² at 90nm, and that's seven node shrinks away, each roughly halving the area: 230 / 2^7 ≈ 2 mm²...

Two!
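
For what the joke's arithmetic is worth, here's that shrink as a quick sketch. It assumes ideal 2x area scaling per full node, which is exactly what real silicon (SRAM, I/O, analog) fails to deliver:

```python
# Back-of-the-envelope die shrink, assuming ideal 2x area scaling per
# full node -- real silicon (SRAM, I/O, analog) scales far worse.
def ideal_shrink(area_mm2: float, shrinks: int) -> float:
    """Halve the die area once per full node shrink."""
    return area_mm2 / (2 ** shrinks)

# 90 -> 65 -> 45 -> 32 -> 22 -> 14 -> 10 -> 7 nm: seven full shrinks
print(f"{ideal_shrink(230.0, 7):.1f} mm^2")  # ~1.8 mm^2
```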
 
Even if PS5 really needs exactly 36 CUs in BC mode, why can't they design a 56 CU GPU and turn off 20 CUs in BC mode?

1. First, let's get rid of the notion that Sony ever had any intention of going with a 400mm² SoC like the XSX. 400mm² is an outlier in terms of silicon die size in consoles, and is proverbially a game changer. MS also has a smaller SoC in Lockhart to offset any pricing difficulties. Just because MS did A doesn't mean Sony wanted to, or could, do A. And vice versa.

2. MS had three VMs for the OS on Xb1 and Xb1X. The title VM had a HAL that allowed their low-level SDK to be further removed from the hardware than Sony's. This is why Xb1 games can use 20 CUs and 3TF in unpatched mode by default. Sony didn't have a similar HAL, so their games in BC mode need to run in an environment that more closely matches the one they were developed for.
 
Why would it need a HAL? The GPU could operate as a 36 CU part and deal with the instructions as such in 'compatibility mode'.
 
Which is why Cell should be included in PS5 as a custom 3D audio processor! What would Cell be at 7nm? It was 230 mm² at 90nm, and that's seven node shrinks away, each roughly halving the area: 230 / 2^7 ≈ 2 mm²...

Two!

I know you are jesting, but I wonder if there is a minimum die size due to tooling and things like that?
 
If we ignore N7P for now, 36 CU RDNA with 8x Zen 2 is just a bit smaller than the launch PS4 die (maybe 320mm²?), and that's before adding whatever their solution is for RT and audio. So it looks like 36 is already barely reaching a reasonable BOM. We also have to consider the much higher cost of the SSD this gen ($100 vs $35). It's not a bad target for a single-SKU launch.
28nm was probably a lot cheaper per mm² than 7nm anyway, so I expect that even with 36 CUs the console would have a much higher BOM than the PS4 at launch.
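
As a rough sanity check on that ~320mm² figure, here's a sketch; every per-block area below is an assumption (ballpark guesses in the spirit of the estimate above), not a confirmed PS5 number:

```python
# Rough die-area sanity check. Every block size is an assumption
# (ballpark guesses in the spirit of the ~320mm^2 estimate above),
# not a confirmed PS5 figure.
blocks_mm2 = {
    "8x Zen 2 cores + caches (console variant)": 50.0,
    "36 CU RDNA shader engines":                150.0,
    "256-bit GDDR6 PHY + memory controllers":    60.0,
    "media / display / I/O / misc":              60.0,
}
total = sum(blocks_mm2.values())
print(f"estimated total: {total:.0f} mm^2 (launch PS4: 348 mm^2)")
```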
 
Why would it need a HAL? The GPU could operate as a 36 CU part and deal with the instructions as such in 'compatibility mode'.

I read somewhere that this opens up race conditions and timing issues. Can't seem to find the post...

A 20 active WGP part was there in Navi 10 for PS5 to use, but they chose 18 instead. The only reason, imo, for such a move is BC.

Disclaimer: If anyone believes that Sony added or is adding extra CUs since the GitHub tests were run, or that the GPU has extra shader arrays that were hidden during the testing, disregard my posts.
 
I know you are jesting, but I wonder if there is a minimum die size due to tooling and things like that?
The smallest BGA package size is 1.5mm x 1.5mm; today they can make dies of 1 mm². But he's joking: the Cell would not shrink like that, it would need a complete redesign, so they might as well design a purpose-built DSP with fixed audio blocks.

With clock scaling every node, the Cell would be running at 12GHz. And they could put hundreds of them on a gigantic mother-of-all ring buses. Just like Kutaragi envisioned.
 
I read somewhere that this opens up race conditions and timing issues. Can't seem to find the post...

A 20 active WGP part was there in Navi 10 for PS5 to use, but they chose 18 instead. The only reason, imo, for such a move is BC.

Disclaimer: If anyone believes that Sony added or is adding extra CUs since the GitHub tests were run, or that the GPU has extra shader arrays that were hidden during the testing, disregard my posts.
It's from Cerny and his BC patents. The explanation is that no matter how much hardware abstraction you put in, the rest of the code is still multithreaded code from the studio's own engine, which may or may not have the proper checks to avoid race conditions. Most of his patents are about cycle-exact operation, which has nothing to do with a HAL. They enabled boost mode on the Pro and I haven't seen any game glitching; it was to get perfect 100% BC, which was extremely important for a mid-gen refresh, but not for next gen.

20 WGP is the total on die; they need to drop two for redundancy on a console.
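
To make the timing hazard concrete, here's a toy sketch in the same spirit (not Sony's actual mechanism, and the names are made up): a producer thread updates a shared buffer assuming the consumer is "slow enough", instead of waiting on a fence. Change the relative speeds and the unsynchronized version starts observing half-written data, which is exactly the class of bug cycle-exact operation sidesteps:

```python
import threading

# Toy illustration of a latent timing bug, not any console's real
# mechanism: the consumer reads a shared buffer that the producer
# updates in two non-atomic steps.
buffer = [0, 0]
fence = threading.Event()

def gpu_consumer(use_fence: bool, results: list):
    if use_fence:
        fence.wait()               # correct: wait for the producer's signal
    results.append(tuple(buffer))  # without the fence, timing decides what we see

def run(use_fence: bool):
    buffer[0] = buffer[1] = 0      # reset shared state for this run
    fence.clear()
    results = []
    t = threading.Thread(target=gpu_consumer, args=(use_fence, results))
    t.start()
    buffer[0] = 1                  # producer updates the buffer...
    buffer[1] = 1                  # ...in two steps, no synchronization
    fence.set()
    t.join()
    return results[0]              # (1, 1) is the only correct outcome

print(run(use_fence=True))         # always (1, 1)
print(run(use_fence=False))        # may be (0, 0), (1, 0) or (1, 1)
```

Hardware that happens to be slow in the right places can mask the unfenced version for years; change the timing (faster clocks, more CUs) and the bug surfaces with no code change at all.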
 
The smallest BGA package size is 1.5mm x 1.5mm; today they can make dies of 1 mm². But he's joking: the Cell would not shrink like that, it would need a complete redesign, so they might as well design a purpose-built DSP with fixed audio blocks.

With clock scaling every node, the Cell would be running at 12GHz. And they could put hundreds of them on a gigantic mother-of-all ring buses. Just like Kutaragi envisioned.

Indeed. The issue isn't that Cell couldn't do the job; it's that a DSP could do the job better, cheaper, and with better power efficiency, and is already designed to be integrated into an SoC.
 
20 WGP is the total on die; they need to drop two for redundancy on a console.

I was thinking 22 WGP total, 20 active.

On second thought it might need to be 24 WGP total for balanced shader arrays. 6 each.

24 WGP total, 22 active.
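
A quick sketch of the arithmetic behind that guess; the four shader arrays are an assumption based on Navi 10's 2 shader engine x 2 array layout:

```python
# Arithmetic behind the 24-total / 22-active guess. Four shader
# arrays is an assumption based on Navi 10's 2 SE x 2 array layout.
def wgp_config(total_wgp: int, disabled: int, arrays: int = 4):
    active = total_wgp - disabled
    per_array = total_wgp / arrays  # balanced only if this is an integer
    cus = active * 2                # 1 RDNA WGP = 2 CUs
    return active, per_array, cus

print(wgp_config(24, 2))  # (22, 6.0, 44): 6 WGP per array, 44 active CUs
print(wgp_config(20, 2))  # (18, 5.0, 36): Navi 10's layout, the 36 CU figure
```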
 
If we ignore N7P for now, 36 CU RDNA with 8x Zen 2 is just a bit smaller than the launch PS4 die (maybe 320mm²?), and that's before adding whatever their solution is for RT and audio. So it looks like 36 is already barely reaching a reasonable BOM. We also have to consider the much higher cost of the SSD this gen ($100 vs $35). It's not a bad target for a single-SKU launch.
In 2012/2013, a 500GB 2.5" HDD, the exact one in the PS4, was ~$100. Today, a 1TB SSD is ~$120-130, and Sony's proposed solution suggests it's a cost-saving approach vs off-the-shelf parts. Even assuming that last part is BS, how does it end up as a $100 cost to Sony?
 
The 5700 has the same butterfly design, which proves Navi was made by Sony for BC (/s), or maybe it was just the most efficient way to route the data paths. The Pro was launched quite early; Cerny said it was a simple solution, but that doesn't mean they couldn't do something else with more effort.


Aren't all AMD GPUs in the 36+ CU range "butterfly" like the 5700, with shader engines and CU arrays distributed symmetrically to either side of the command processor and front end logic?
The maximum CU count with two shader engines is 32, and once a design needs additional shader engines they're placed on the other side.

Even if it was somewhat related, there is nothing that logically prevents them doing a three-panel design (18, 36, 54 CU) which powers down sections the same way; the whole point was to switch between 18 and 36 on the Pro.
With GCN, the GPU is structured to load balance by dividing screen space into tiles that an SE and its RBEs have exclusive responsibility for. The logic for how the SEs and RBEs discard or route geometry seems to count on a straightforward set of rules for determining the coverage of a triangle.
The Vega ISA doc actually has an instruction that explicitly recognizes the differing behavior of GPUs with 1, 2, and 4 SEs, based on the bounding box of a given triangle (possibly for the primitive shaders that were never released).

1 SE is trivial, 2 SEs can be handled in software, and 4 SEs had a lookup table and a few arithmetic limits to the instruction.
Is the math as straightforward, and distributed evenly across screen space, for higher SE counts or for odd numbers?
AMD's method skipped over 3.
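
To illustrate why the power-of-two counts are the easy cases, here's an illustrative tile-to-SE mapping (my own toy hash, not AMD's documented scheme): for 1, 2, or 4 SEs the assignment reduces to an XOR and a mask, and every SE owns an identical share of screen space, while 3 forces a genuine modulo and a less regular partition:

```python
# Illustrative tile-to-shader-engine mapping -- a toy hash, NOT AMD's
# documented scheme. Power-of-two SE counts reduce to bit operations;
# an odd count like 3 needs a real modulo.
def se_for_tile(tile_x: int, tile_y: int, num_se: int) -> int:
    if num_se & (num_se - 1) == 0:               # 1, 2 or 4: power of two
        return (tile_x ^ tile_y) & (num_se - 1)  # cheap XOR/mask checkerboard
    return (tile_x + tile_y) % num_se            # 3 SEs: division required

for num_se in (2, 4, 3):
    print(f"{num_se} SEs:")
    for y in range(4):
        print(" ", [se_for_tile(x, y, num_se) for x in range(8)])
```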


Why didn't they do likewise on 4Pro, but keep more CUs active in "Boost mode" so the software gets a bigger boost even when not patched? That's what Microsoft did on the One X, so all older unpatched games run at 3TF from the start. Games built with the newer SDK then get to use the full 6TF even without explicitly targeting the One X.

In some ways, I think the PS4 Pro technically could have, since if we think of it in terms of what the hardware can do, there are inactive CUs in addition to what is active.
From the point of view of the hardware, it could have happened were it not for outside considerations such as yield recovery.
Die size and limited bandwidth might have capped the Pro from having enough CUs to bother with 36+.

From that standpoint, the CU count was artificially lowered even on the original PS4: the hardware or platform decided what the software could see. I think there were examples of AMD GPUs reflashed from non-XT to XT BIOS versions that enabled CUs, albeit at the risk of bringing back whatever issue disabled those CUs in the first place.
 
We shouldn't take Cerny's design philosophy for PS4Pro and assume he'd follow the same principles for a whole new PS generation.

People have read way too much into that comment from the beginning. I wouldn't even go so far as to call it a design philosophy. It was mostly a simple and convenient way to illustrate how the new chip design was built out for a lay audience. Symmetry has natural advantages in chip design. It's not like he's obsessed with the visual arrangement of the chip and its relationship to the previous design to the point where he'd require the choice to be between 36CUs or 72CUs, screaming "ONLY BUTTERFLIES!!!"
 
So let's summarize: if the PS5 doesn't necessarily require a 36 CU design to have BC, would that make the GitHub leak even more irrelevant? Would the Oberon steppings just be testing units for PS Now server blades? Has the true PS5 APU driver yet to surface after all? One thing is for sure though: running 1.8GHz - 2GHz at 36 CUs inside a retail console is madman town.
 
That's GCN and PS4Pro. That's not a great starting point for discussing BC on RDNA.
AMD has published a white paper on RDNA; in terms of running legacy code, it has been established that RDNA does the same things in a different way. On a fundamental level, relative to GCN, the functionality of RDNA is more similar than different, but in implementation it is more different than similar. Why wouldn't this be a good starting point for comparison?

Theoretically, that could be masked with some hardware mappings, IMO.
What hardware mapping techniques are you referring to? Because, as I posted, we know that GNM API calls can be little more than shoving data into GPU registers, which means explicit referencing of known GPU functionality. That approach does not lend itself to retro-abstraction when code relies on those registers and those functions behaving in a very specific way.

More info on this hardware mapping would be welcome. Where else does this work, and if it works well, why is the traditional application - API - driver - hardware model still so prevalent? Removing layers of abstraction is the simplest way to increase performance while reducing code maintenance effort.
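
As a toy model of the problem being described (the register offsets and names below are hypothetical; nothing here is GNM's actual ABI): once shipped games poke known register offsets directly, new silicon has to preserve those exact offsets and semantics, whereas a driver or HAL gets a single place to translate:

```python
# Toy model of "API call = poke a known register", with hypothetical
# offsets -- nothing here is GNM's actual ABI.
GFX7_WAVE_LIMIT = 0x2140  # hypothetical offset baked into old game binaries

class OldGPU:
    def __init__(self):
        self.regs = {}
    def write(self, offset: int, value: int):
        self.regs[offset] = value  # the game writes the register directly

class NewGPU(OldGPU):
    GFX10_WAVE_LIMIT = 0x3A00      # same feature, relocated register

    def write(self, offset: int, value: int):
        # Without a translation layer the old offset hits dead (or worse,
        # repurposed) hardware. A HAL/driver can remap it in one place:
        if offset == GFX7_WAVE_LIMIT:
            offset = self.GFX10_WAVE_LIMIT
        super().write(offset, value)
```

A game built against a thicker API never sees the offset at all, so the remap lives in one driver update rather than in every shipped title; the cost is exactly the extra layer the register-poking model was avoiding.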
 
So let's summarize: if the PS5 doesn't necessarily require a 36 CU design to have BC, would that make the GitHub leak even more irrelevant? Would the Oberon steppings just be testing units for PS Now server blades? Has the true PS5 APU driver yet to surface after all?
It's almost like the PS4 Pro didn't require an 18 CU design to have BC with PS4 games.
Or like Sony hasn't been releasing patents that mention disabling parts of the GPU after the PS4 Pro released.

20 WGP is the total on die; they need to drop two for redundancy on a console.
They most probably don't need to, but they might choose to.
Assuming 20 WGP is in fact the total on die.
 
'Necessary' is a high bar. It implies there was no other economical solution to achieve the same goal. We do have DF reporting Mark Cerny saying the following, which is messaging Mr Cerny conveyed to other sites.

What isn't known, and so is an assumption by default, is whether this was the only solution for backwards compatibility. Other statements support the lack of the critical abstraction layers in PS4's APIs required for compatibility across incompatible hardware architectures, like 4A Games' interviews with Digital Foundry about their experience porting the Metro games to PS4 and XBO. When the hardware isn't 100% backwards compatible, 99.99% compatible doesn't cut it.
That's it.
Of course Cerny framed in words what was evidently more a necessity... as a choice. He is also a PR man.

Even if PS5 really needs exactly 36 CUs in BC mode, why can't they design a 56 CU GPU and turn off 20 CUs in BC mode?
Because they're already thinking about a PS5 Pro / PS6... Doubling.
 