Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

Definitely warmer than I'd like to have in a PC build. On the other hand, I don't think those temps are 'dangerously high'. RAM does mostly run hotter than other parts.
Right. Warmer than I'd like. I like my chips below 73-78°C generally under hard load (when I was mining). But safe, since I know it can't get any hotter unless you have poor ventilation.

Memory being hot is a bit more problematic; there's no good fix for people there to solve this one. I guess we can only hope the silicon can hold out through a lot of abuse at these temps for a long time.
 
Basically, in this case PS5 would be 52 mm² smaller but with 32-64 MB of additional memory on die, with the same number of SEs, Zen 2 cores and literally everything else bar a 64-bit PHY and 16 CUs (which are ~2 mm² each). Does not fit at all.
Does it need to be 32-64 MB? On a CPU intended for a desktop OS you need an overabundance of cache to accommodate the larger number of applications polluting the cache with random memory accesses, but on a CPU intended for a console? If your aim is only to mitigate cross-cluster accesses, then the cache need not be much larger than the L2. Obviously the larger the better, but is there an architectural reason it needs to be 32 MB to 64 MB? Why not 16 or 24 MB?
 
Right. Warmer than I'd like. I like my chips below 73-78°C generally under hard load (when I was mining). But safe, since I know it can't get any hotter unless you have poor ventilation.

Memory being hot is a bit more problematic; there's no good fix for people there to solve this one. I guess we can only hope the silicon can hold out through a lot of abuse at these temps for a long time.

My 3900X doesn't get even near those temps, same for the GPU (cores): anywhere between 45 and 50°C. VRAM tends to run warmer; on my OC'd 2080 Ti around 55°C (MEM1), but MEM2 can get warmer than that. This is while playing BFV 64-player matches with everything ultra/DXR on high.
So yes, VRAM can get hot(ter).
 
As I said previously, I'm not saying I'm right; I'm speculating on a possibility which, preliminarily at least, seems plausible, and trying to bring a different perspective.
Full disclaimer: I tend to gravitate towards the optimistic outcomes.

What's the latest count now - 333 sq mm with 64MB IC?

See below for IC SRAM density. If you remove 43 sq mm for 64MB IC, PS5 die is around 290 sq mm.

You've assumed IO Complex logic is already incorporated. Target of 305 sq mm means you have 15 sq mm free for:

- IO Complex SRAM
- What about Multimedia logic?
- Finer adjustments for adding back 2MB L2 cache (4 PHYs + 4 MCs don't cover)
- Finer adjustments for halved Command processor, Geometry Processor and ACEs

https://twitter.com/i/web/status/1323290890874449920
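Spelling out the area bookkeeping above (a back-of-envelope sketch; the 43 mm² for 64 MB of IC, the 333 mm² figure and the 305 mm² target are working numbers from this discussion, not confirmed specs):

```python
# Back-of-envelope die-area budget using the figures quoted above.
# All values are speculative working numbers from the discussion, not confirmed specs.

IC_AREA_MM2 = 43.0        # assumed area of 64 MB of Infinity Cache SRAM
DIE_WITH_IC_MM2 = 333.0   # hypothetical PS5 die if it carried 64 MB of IC
TARGET_DIE_MM2 = 305.0    # the die-size target being argued for

die_without_ic = DIE_WITH_IC_MM2 - IC_AREA_MM2   # ~290 mm^2
headroom = TARGET_DIE_MM2 - die_without_ic       # ~15 mm^2 left over

print(f"Die without 64 MB IC: {die_without_ic:.0f} mm^2")
print(f"Headroom under a {TARGET_DIE_MM2:.0f} mm^2 target: {headroom:.0f} mm^2")
print("That ~15 mm^2 has to cover IO Complex SRAM, multimedia logic,")
print("the extra 2 MB of GPU L2, and the un-halved front-end blocks.")
```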
 
@3dilettante Do you know how double Z-rate for depth-only passes works? Do they just rejig the colour block to output a depth value (since it's just greyscale) in conjunction with the depth block?

I was wondering if RB+ would mean 3x depth-only rate (doubled colour block + depth block). i.e. 32 RB+ = 96 depth-only writes

Or I could be way off. :p
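Putting numbers on that guess (this just encodes the hypothesis above, i.e. a doubled colour block plus the depth block; it is not a claim about how RDNA 2 ROPs actually work):

```python
# Hypothetical depth-only throughput if a doubled colour block could also
# emit depth alongside the dedicated depth block (the speculation above).
RB_PLUS_UNITS = 32          # RB+ count assumed in the example above
COLOUR_PIPES_PER_RB = 2     # "doubled colour block"
DEPTH_PIPES_PER_RB = 1      # dedicated depth block

depth_only_writes = RB_PLUS_UNITS * (COLOUR_PIPES_PER_RB + DEPTH_PIPES_PER_RB)
print(depth_only_writes)    # 96 depth-only writes per clock under this guess
```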
From other AMD GPUs, it looks like there have been separate blocks for depth. GCN die shots show that, in the regions where the ROPs are placed, half of the blocks are somewhat longer and larger than the others. That would be consistent with the smaller blocks being associated with the simpler math and the separate caches for depth.

As a layman, what I don't understand is that people are saying software issues mean the face-offs are a wash (or rather, XSX not showing its paper advantage) - yet the BC is working very well?
There's presumably a larger cushion of performance over weaker last-gen hardware. There are also fewer ways the old code can exercise any new features that might be expensive for performance, not that backwards compatibility is always perfect. If there are issues, the types of problems and their solutions would likely be different than a title that intends to stress the modern hardware.
 

And even in addition to that, there are the Rocket League devs saying that updating on Series is pretty much as simple as a patch, but on PS5 it requires a native port. So that has to be playing a big role in the BC title performance and enhancements we've seen between the two platforms; full recompiles cost a lot of time and money for smaller teams.

Then there's the other stuff in the mix, like how for BC titles Series X disables SMT on the CPU, so there's a slight clock boost there (3.8 GHz vs. 3.6 GHz); games that are mainly CPU-bound benefit that much more from such things.

What's the latest count now - 333 sq mm with 64MB IC?

See below for IC SRAM density. If you remove 43 sq mm for 64MB IC, PS5 die is around 290 sq mm.

You've assumed IO Complex logic is already incorporated. Target of 305 sq mm means you have 15 sq mm free for:

- IO Complex SRAM
- What about Multimedia logic?
- Finer adjustments for adding back 2MB L2 cache (4 PHYs + 4 MCs don't cover)
- Finer adjustments for halved Command processor, Geometry Processor and ACEs

https://twitter.com/i/web/status/1323290890874449920

Quick question: are there any design-centric reasons why if Sony decided to use IC, MS seemingly decided not to? Considering the benefits it seems to bring on the PC GPUs, and it's not like servers or rendering farms won't be using those CPUs as well (though I guess in certain visual production fields, Nvidia's stuff, particularly the 3090, will reign supreme due to superior RT features and possibly DLSS 2.0)?

Either MS saw some kinks with IC they felt weren't worth implementing, or they really missed the boat on a relatively easy add-in? Or maybe this is an area where their push for dual-use of the SOC in consoles and servers has a more obvious drawback? Azure would of course benefit from more RAM, and a wider bus supporting up to 40 GB does well for that purpose, arguably much more than any embedded IC. And the GPU is already large enough that they'd want to save costs by cutting something like IC, I understand that. And they practically need 40 GB of RAM if they want to run 4x One X instances on a Series X...though I've only seen them mention One S instances running on Azure at that number, and they wouldn't need more than 32 GB of RAM for that if a Series S (minus OS reserves) is 5-6 GB.

Maybe IC doesn't scale well for virtualized multiple system instances on the same SoC; maybe it's something to do with that, I'm guessing.

But maybe it wasn't the best decision in retrospect (for Series outside of server/Azure use); I guess that depends on how we see the performance delta play out between Series X and PS5 over the longer term. Then again, we don't honestly know how Infinity Cache will perform longer-term on these GPUs, so for all we know MS might've dodged a bullet.
 
Quick question: are there any design-centric reasons why if Sony decided to use IC, MS seemingly decided not to? Considering the benefits it seems to bring on the PC GPUs, and it's not like servers or rendering farms won't be using those CPUs as well (though I guess in certain visual production fields, Nvidia's stuff, particularly the 3090, will reign supreme due to superior RT features and possibly DLSS 2.0)?
I don't think Cerny cared for IC (a marketing term), rather he designed the PS5 for low latency and data throughput. The IO Complex is the heart of PS5. You don't need IC for it, although the design is heavily focused around managing cache.

MS already made the transition away from eSRAM going from the X1 to the X1X, and carried that through to the XSX. I think they were burned by it taking up too much die space, replacing logic with SRAM. However, they messed up relative to what they had with the X360's eDRAM, which kept the large chunk of eDRAM off the main GPU die, so logic space wasn't wasted. They could've done something similar with the X1 and used an off-die module, but probably didn't due to cost and latency.

They needed more bandwidth for 52 CUs, and the traditional way to get it is just a wider bus. They also would've needed to wait for AMD's IC, with the risks involved.
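For reference, the wider-bus math is just standard GDDR6 bandwidth arithmetic; the figures below are the launch consoles' announced memory configurations (14 Gbps GDDR6, 320-bit on XSX's fast pool vs 256-bit on PS5):

```python
def gddr6_bandwidth_gbps(bus_width_bits: int, data_rate_gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a GDDR6 interface."""
    return bus_width_bits * data_rate_gbps_per_pin / 8

# XSX: 320-bit @ 14 Gbps (on the 10 GB fast pool) vs PS5: 256-bit @ 14 Gbps
print(gddr6_bandwidth_gbps(320, 14))  # 560.0 GB/s
print(gddr6_bandwidth_gbps(256, 14))  # 448.0 GB/s
```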

Either MS saw some kinks with IC they felt weren't worth implementing, or they really missed the boat on a relatively easy add-in? Or maybe this is an area where their push for dual-use of the SOC in consoles and servers has a more obvious drawback? Azure would of course benefit from more RAM, and a wider bus supporting up to 40 GB does well for that purpose, arguably much more than any embedded IC. And the GPU is already large enough that they'd want to save costs by cutting something like IC, I understand that. And they practically need 40 GB of RAM if they want to run 4x One X instances on a Series X...though I've only seen them mention One S instances running on Azure at that number, and they wouldn't need more than 32 GB of RAM for that if a Series S (minus OS reserves) is 5-6 GB.

Much like CPU cores and scaling them up, you can look at Shader Arrays or Shader Engines in GPUs as analogous to cores, and scale them up in server environments. I think MS took the simpler, lower-risk approach.

Maybe IC doesn't scale well for virtualized multiple system instances on the same SoC; maybe it's something to do with that, I'm guessing.
IC is new and a risk compared to traditional wider buses, and as they were burned by eSRAM, I don't think MS wanted to take that risk for scalability.

But maybe it wasn't the best decision in retrospect (for Series outside of server/Azure use); I guess that depends on how we see the performance delta play out between Series X and PS5 over the longer term. Then again, we don't honestly know how Infinity Cache will perform longer-term on these GPUs, so for all we know MS might've dodged a bullet.
Tech flip-flops back and forth. Until you run benchmarks and games, you won't really know. And something that was old becomes new again. GPU fashion catwalk.
 
And even in addition to that, there are the Rocket League devs saying that updating on Series is pretty much as simple as a patch, but on PS5 it requires a native port. So that has to be playing a big role in the BC title performance and enhancements we've seen between the two platforms; full recompiles cost a lot of time and money for smaller teams.
I'm not sure I follow. Any change to game code - including a patch - is going to require a recompile. What's the difference if the recompile is targeting modes BC1-BC3 or full PS5 on PlayStation platforms? A bunch of PS4 games, including PSVR games like Blood and Truth, have already had PS4 patches that unlock higher performance on PS5.

If you want to address all of that hardware, though, that's when you have to re-target for PS5, because there simply aren't PS4 APIs for PS5 hardware, but I don't see how that's any more work. Bear in mind Mark Cerny said in his talk that devs can just ignore all the new stuff, and the time-to-triangle for PS5 is ~1 month, down from 1-2 months for PS4.
 
And even in addition to that, there are the Rocket League devs saying that updating on Series is pretty much as simple as a patch, but on PS5 it requires a native port. So that has to be playing a big role in the BC title performance and enhancements we've seen between the two platforms; full recompiles cost a lot of time and money for smaller teams.

I believe the context of this statement was strictly related to enabling 120 Hz support on XSX and PS5.
 
Continuing the Infinity Cache speculation... could the latest DF face-off (COD) results perhaps be explained by IC?
PS5 locks to 60 fps 95% of the time in the 4K/RT mode (dynamic 2160p-1800p) and chugs in specific set pieces, momentarily dropping to 45-50 fps for a couple of seconds, while the XSX mostly retains a locked 60.
In the 120 mode (1080p-1220p) things flip and PS5 holds a small performance advantage.

There are now two 4K modes where PS5 falls behind XSX: DMC5 & COD.
Using IC as an explanation for these results: at native 4K or close to it there are more cache misses, resulting in lower performance for PS5 when the misses are more abundant, similar to what we are seeing.
Though PS5 doesn't have these odd dips in the 4K mode with RT disabled in COD, it's possible that the use of high resolution in combination with RT is too much for a small-pool IC, incurring drops past the 16 ms frame time when those cache misses occur; something that optimization and tinkering may improve in the future.
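One crude way to model that speculation: treat effective bandwidth as a hit-rate-weighted blend of cache and DRAM bandwidth, with the hit rate falling as the working set grows at higher resolutions. The cache bandwidth and hit rates below are invented illustrative numbers, not measurements:

```python
def effective_bandwidth(dram_gbps: float, cache_gbps: float, hit_rate: float) -> float:
    """Blend of cache and DRAM bandwidth weighted by a (hypothetical) hit rate."""
    return hit_rate * cache_gbps + (1.0 - hit_rate) * dram_gbps

DRAM = 448.0     # PS5-class GDDR6 bandwidth, GB/s
CACHE = 1500.0   # illustrative on-die cache bandwidth, GB/s (assumed)

# Illustrative only: a small cache holds up better at 1080p than at native-4K+RT,
# where the larger working set pushes the hit rate down.
for label, hit_rate in [("1080p/120 mode", 0.6), ("4K + RT mode", 0.3)]:
    print(label, round(effective_bandwidth(DRAM, CACHE, hit_rate)), "GB/s effective")
```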
Generally, console SoCs are designed around the most efficient use of die space by default; I don't think you will see Sony or MS having even a low single-digit advantage in the layout and packaging of components on the SoC. Tbh, AMD designs all their chips for maximum die-space efficiency, as costs per mm² on the 7nm node are so high compared to 16nm and 28nm.
Die space is the number one consideration for any design, especially a console, as the SoC is the biggest part of the BOM, especially at 7nm.

In your example Sony would have a big advantage in CUs/cores per mm² (we are talking about ~20-30 mm²), which seems very unrealistic.
You might be right, and you certainly make me question the possibility.
I'm torn; depending on how you look at it, it seems either very possible or very unlikely. I will say, however, that your argument also applies to Navi 21: especially if they have to dedicate a sizeable portion of the die to IC, they'll make sure to be as efficient as possible with the rest of the chip. It is in AMD's best interests to keep die sizes as small as possible, irrespective of whether it is an SoC or a discrete GPU chip.
I also don't understand what you mean about Navi 21's layout being closer to PS5.
I'm speculating that the PS5 GPU would match a halved Navi 21 chip, whereas the XSX went with a different SE/SA layout to accommodate more DCUs and also made some changes to the front end.
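For what it's worth, simply halving Navi 21's announced specs (RX 6800/6900 figures) lines up with PS5 in some places and not others; the memory bus notably does not halve, and the PS5 SE/SA arrangement is part of the speculation here rather than anything confirmed:

```python
# Announced Navi 21 (RX 6800/6900) figures vs. PS5, to sanity-check the
# "PS5 ~ half a Navi 21" idea.
navi21 = {"CUs": 80, "infinity_cache_MB": 128, "bus_bits": 256}
navi21_halved = {k: v // 2 for k, v in navi21.items()}
ps5 = {"CUs": 36, "infinity_cache_MB": None, "bus_bits": 256}  # 36 of 40 CUs enabled; no confirmed IC

for key in navi21:
    print(f"{key:20s} half-Navi21: {navi21_halved[key]!s:>6}   PS5: {ps5[key]!s:>6}")
```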
 
I don't think Cerny cared for IC (a marketing term), rather he designed the PS5 for low latency and data throughput. The IO Complex is the heart of PS5. You don't need IC for it, although the design is heavily focused around managing cache.

I think a 64 or 128 MB Infinity Cache certainly would have helped even the PS5.
 
What's the matter with the Xbox Series S?

No 60 fps in ACV. No 120 fps or ray tracing in COD. 570p resolution in the 120 fps mode in Dirt 5, which is too low even for 720p TVs.

Generally, it can't deliver a true next-gen experience just with resolution scaling.
 

Well, specifically relating to ACV and COD Cold War, the developers didn't really do the appropriate resolution scaling imo.

As a fan of the Series S, I'm less concerned about its capabilities and more concerned about developers putting in the effort to optimize.
 
You've assumed IO Complex logic is already incorporated. Target of 305 sq mm means you have 15 sq mm free for:

- IO Complex SRAM
- What about Multimedia logic?
- Finer adjustments for adding back 2MB L2 cache (4 PHYs + 4 MCs don't cover)
- Finer adjustments for halved Command processor, Geometry Processor and ACEs
This is the part where knowing the PS5's & 6800/6900's IO sizes would help determine how much IC, if any, would be possible.
Using the 5700's IO (~37 mm²) as a baseline accounts for ~18 mm² that can be used for PS5 IO after we halve Navi 21.

Btw, isn't the L2 included with the PHY/MC in RDNA?
Quick question: are there any design-centric reasons why if Sony decided to use IC, MS seemingly decided not to?
To use IC on a console SoC they would have had to settle for 40 CUs (36 enabled) to keep the die size within reasonable parameters for a console.
They didn't think 36 CUs would be enough to reach their 12 TF target back in 2017/18 when they were working on the XSX design. It's also possible that back then they didn't know how high the final RDNA2 design would clock, and weren't willing to risk an 8-9 TF console, while Sony was probably OK with that as a minimum expectation.

Also, they either weren't aware of the load-dependent variable-frequency paradigm shift, or considered that it would complicate the GDK's unified development environment.
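On the 12 TF point above: the usual FP32 throughput formula (CUs × 64 lanes × 2 ops per clock × clock) shows why 36 CUs looked marginal for that target, and where the shipped clocks land:

```python
def fp32_tflops(cus: int, clock_ghz: float) -> float:
    """FP32 TFLOPS = CUs * 64 shader lanes * 2 ops (FMA) per clock * clock (GHz)."""
    return cus * 64 * 2 * clock_ghz / 1000

print(fp32_tflops(52, 1.825))   # ~12.1 TF (XSX as shipped)
print(fp32_tflops(36, 2.23))    # ~10.3 TF (PS5 at its peak clock)

# Clock a 36 CU part would have needed for a 12 TF target:
print(round(12_000 / (36 * 64 * 2), 2), "GHz")   # ~2.6 GHz
```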
 
Or variable frequency makes no sense when your chip has 52 CUs and already operates in the sweet spot. The only reason PS5 has variable frequency is that a 36 CU RDNA2 part is well, well below the 200 W limit when clocked to its sweet spot.

For MS to have meaningful variable frequency with a 52 CU chip they would have to cap wattage at around 230-240 W, which obviously brings far too high a cooling requirement given that the current cap is around 200 W.
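As a rough illustration of the narrow-and-fast versus wide-and-slow tradeoff: dynamic power scales roughly with active units × frequency × voltage squared. The clocks are the announced ones, but the voltages below are invented purely to show the shape of the relationship, not real silicon data:

```python
def relative_gpu_power(cus: int, clock_ghz: float, voltage: float) -> float:
    """Relative dynamic power, proportional to active units * f * V^2 (arbitrary units)."""
    return cus * clock_ghz * voltage ** 2

# Invented voltages purely for illustration: higher clocks generally need higher voltage.
narrow_fast = relative_gpu_power(36, 2.23, 1.10)   # PS5-style: fewer CUs, high clock
wide_slow   = relative_gpu_power(52, 1.825, 0.95)  # XSX-style: more CUs, lower clock

print(round(narrow_fast, 1), round(wide_slow, 1))
# A wide, lower-clocked part already sits near its efficiency sweet spot, so a
# variable-frequency scheme buys it much less than it buys the narrow, fast part.
```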
 
The other thing about the variable frequency that wouldn't be a problem for Sony but would be for Microsoft is that, even though I'm sure it won't affect the games in any way, and to a game each PS5 is identical, the variable frequency of the CPU/GPU obviously produces variable amounts of compute - not a problem in games.

Microsoft, on the other hand, are going to be using these chips for non-gaming purposes in Azure datacentres. They are going to be running any random virtual machine that someone spins up on them, so a set level of performance is incredibly important. Also, since the Series X SoC is bigger than the PS5's SoC, it will be better at transferring heat to the heat sink because it has a larger surface area. This makes possible the small, passive heat sinks that are used in servers, where the airflow is provided by fans mounted onto the server case rather than the CPU cooler itself.

If you have seen a teardown of the Series X, I think it's important to see how the components could very easily slot into a high-density server. For instance, with the dual-motherboard design, just take the card with the SoC on it and plug it into a backplane, use the same heat sink that the Series X uses, slap a bunch together, and you have an ultra-dense blade server that has parts commonality with the console you were making millions of anyway. This dramatically reduces the build cost of the xCloud servers.

I bet that the Series X is the best bang for the GPU-compute buck that Microsoft could possibly get for Azure, with the added bonus of a decent CPU for running VMs at a lower power target. Also, there have been some rumblings about how Microsoft is producing substantially more silicon than Sony is, by a lot. Since Microsoft evidently has fewer consoles available for purchase than Sony, that silicon has to be going somewhere, so I think they want a full xCloud Azure rollout by the middle of next year. That's my prediction.
 
Or variable frequency makes no sense when your chip has 52 CUs and already operates in the sweet spot. The only reason PS5 has variable frequency is that a 36 CU RDNA2 part is well, well below the 200 W limit when clocked to its sweet spot.

For MS to have meaningful variable frequency with a 52 CU chip they would have to cap wattage at around 230-240 W, which obviously brings far too high a cooling requirement given that the current cap is around 200 W.
You don't think their current cooling setup has enough wiggle room to handle an extra 40W? Besides, I was referring to the scenario of MS going narrow and fast with some form of IC in a 36 to 44 CU configuration.
The XSS doesn't use variable frequencies either... I think this would complicate their unified development goals. Having it scale to 1/3 the compute performance likely makes development for the two platforms more straightforward.

Perhaps they need to guarantee 12 TF of throughput for their server blades.
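The "1/3 the compute" scaling checks out against the announced configurations (same FP32 formula as earlier in the thread):

```python
# Announced GPU configs: Series X 52 CUs @ 1.825 GHz, Series S 20 CUs @ 1.565 GHz.
xsx_tf = 52 * 64 * 2 * 1.825 / 1000   # ~12.15 TF
xss_tf = 20 * 64 * 2 * 1.565 / 1000   # ~4.01 TF
print(round(xsx_tf, 2), round(xss_tf, 2), round(xss_tf / xsx_tf, 2))  # ratio ~0.33
```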
 
What's the matter with the Xbox Series S?

No 60 fps in ACV. No 120 fps or ray tracing in COD. 570p resolution in the 120 fps mode in Dirt 5, which is too low even for 720p TVs.

Generally, it can't deliver a true next-gen experience just with resolution scaling.

That was expected.
I stated quite a while back that if the bigger brothers are having trouble hitting 1440p, then the Series S is going to land below 900p.
Devs might not want to go as low as 900p-720p, so they in turn sacrificed the 60/120 Hz modes.
People were drinking Kool-Aid thinking PS5 and XBSX could hit 4K without issues. I thought it was apparent from the UE5 demo ¯\_(ツ)_/¯
 
You don't think their current cooling setup has enough wiggle room to handle an extra 40W?

For one SoC? Yeah, no problem. But when you have 100 in a server rack (and that's on the low side; I would imagine a 40U rack would have nearer 200), you're talking about 4 kW of uncertainty just for a single rack in a data centre, and that's way too much variability when sizing things like cooling requirements. It takes more energy to cool the chips on top of what it takes to run them; typically up to 40% of a datacentre's energy usage is just cooling. So you're talking about up to an extra 5.6 kW per rack of electrical consumption that you have to overbuild all your systems to handle.

Also, 40W is ~20% of the SoC's power budget, which means ~20% higher running costs, etc.
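Spelling out the rack-level arithmetic from the post above (the 100-per-rack density and the ~40% cooling overhead are that post's assumptions):

```python
# Rack-level impact of a +40 W swing per SoC (assumptions from the post above).
SOCS_PER_RACK = 100        # assumed low-end density for a 40U rack
EXTRA_W_PER_SOC = 40       # the hypothetical extra power headroom per SoC
COOLING_OVERHEAD = 0.40    # cooling assumed to add ~40% on top of the IT load

extra_it_load_kw = SOCS_PER_RACK * EXTRA_W_PER_SOC / 1000    # 4.0 kW
extra_total_kw = extra_it_load_kw * (1 + COOLING_OVERHEAD)   # 5.6 kW
print(extra_it_load_kw, extra_total_kw)
```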


With the fixed clocks on the Series S, I remain convinced that the Series S will get a mobile successor around mid-gen, in the format of a chunky Switch with the same specs as the current Series S, something like the Razer UFO prototype (link below to a video). I think this for a couple of reasons. Firstly, the performance seems sandbagged: the Series S is the slowest-running RDNA 2 part we have seen so far, and is almost certainly well below the point where the SoC's efficiency curve has any dramatic effect on power consumption. And if the RDNA architecture keeps increasing power efficiency as it has been, then by the time we have RDNA 5, a Series S that at the moment draws 100 watts max could have a power draw of 12.5 watts, assuming the 50% power-efficiency increase gen-on-gen of RDNA holds (admittedly a bit of a leap).

Razer UFO
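Compounding the gen-on-gen assumption above gives the 12.5 W figure; note the endpoint depends on whether "50% more efficient" is read as halving power each generation (the post's arithmetic) or as a literal +50% perf/W, which lands nearer 30 W. A quick sketch:

```python
# 100 W today, halving power for the same performance each generation,
# over three generational steps (RDNA 2 -> 3 -> 4 -> 5). Pure speculation.
power_w = 100.0
for gen in ["RDNA 3", "RDNA 4", "RDNA 5"]:
    power_w *= 0.5
print(power_w)                      # 12.5 W

# Read literally, "+50% perf-per-watt per generation" compounds less aggressively:
print(round(100.0 / 1.5 ** 3, 1))   # ~29.6 W for the same performance
```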

On an unrelated note, I wonder if we will see the SoCs from the Series S and X in Surface devices? The Series X SoC with a downclock could be perfect for a new Surface Studio, and a Series S SoC would be perfect for a Surface Book. Also, with the security features inherent in the Xbox SoC, they could offer higher-security versions of Surface devices.

Thinking about it, an Xbox SoC-powered Surface device could be the first Microsoft Pluton-certified computer...
Microsoft Pluton
https://www.microsoft.com/security/...-chip-designed-for-the-future-of-windows-pcs/
 