General Next Generation Rumors and Discussions [Post GDC 2020]

...

Can you expand on where the Series X falls behind on GPU bandwidth? Is that DRAM bandwidth, or bandwidth elsewhere?
The PS5's clock isn't going to make it win in total bandwidth for any per-CU caches.
Perhaps the L1, assuming the Series X GPU didn't adjust its size/bandwidth. One reason it might need to do so depends on whether the L2's slice count increased to mirror the wider memory bus. In RDNA, the L1 is subdivided to match the number of L2 groups, since there are 4 slices per 64-bit controller, and the L1's subdivisions match how many requests it can respond to per clock.
The Series X may have 5 L2 groups, in which case the L1 might increase to have 5 sections, and thus 5 requests per clock, which would keep it above the PS5.
However, if the Series X doesn't create a 5th L2 group, it might mean that the L1 and L2 are only as wide per clock as the probable PS5 arrangement, and then clock speed could have an effect.
One possible complication to adding another cache division like that is that the ROP caches are aligned in a specific manner, and some of the no-flush benefits that Vega touted for making them L2 clients didn't hold if there was some kind of misalignment (maybe for an APU?).
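
To make the slice-count argument concrete, here's a rough back-of-the-envelope sketch in Python. The one-response-per-L2-group-per-clock model and the group counts (4 groups for a 256-bit bus, possibly 5 for 320-bit) are the speculation from this post, not confirmed specs:

```python
# Back-of-the-envelope sketch of the L1 request-rate argument. The model
# (one L1 response per L2 group per clock) and the group counts are the
# speculation from the post, not confirmed specs.

def l1_requests_per_sec(l2_groups: int, gpu_clock_ghz: float) -> float:
    """Requests the L1 could answer per second under the assumed model."""
    return l2_groups * gpu_clock_ghz  # in billions of requests per second

ps5   = l1_requests_per_sec(4, 2.23)   # 256-bit bus -> 4 L2 groups
xsx_4 = l1_requests_per_sec(4, 1.825)  # Series X without a 5th group
xsx_5 = l1_requests_per_sec(5, 1.825)  # Series X with a 5th group

print(f"PS5, 4 groups @ 2.23GHz:       {ps5:.2f}G req/s")    # 8.92
print(f"Series X, 4 groups @ 1.825GHz: {xsx_4:.2f}G req/s")  # 7.30
print(f"Series X, 5 groups @ 1.825GHz: {xsx_5:.2f}G req/s")  # 9.13
```

Which lines up with the argument: a fifth group would keep the Series X ahead despite the lower clock, while four groups would hand the per-clock advantage to whichever chip clocks higher.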

...

The 6GB has an effective bandwidth of 336 GB/s. Just as a hypothetical, if the GPU is accessing this memory more than Microsoft would have expected, I guess there is potential to be bandwidth limited. Ideally you want accesses spread across all memory channels for full bandwidth, but if you hit that particular range too much you'll lower your effective bandwidth. This goes back to how the memory interleaving is set up, and I'm by no means an expert.
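
As a quick sanity check on those numbers, here's a sketch assuming the commonly reported Series X layout of ten 32-bit GDDR6 channels at 14Gbps, with six 2GB chips and four 1GB chips, so the upper 6GB can only interleave across six channels:

```python
# Quick sanity check on the split pools, assuming the commonly reported
# Series X layout: ten 32-bit GDDR6 channels at 14Gbps (six 2GB chips,
# four 1GB chips). The upper half of the 2GB chips can only interleave
# across those six channels, hence the lower effective bandwidth.

GBPS_PER_PIN = 14                          # GDDR6 per-pin data rate
GBS_PER_CHANNEL = 32 * GBPS_PER_PIN / 8    # 56 GB/s per 32-bit channel

full_pool  = 10 * GBS_PER_CHANNEL  # interleaved over all 10 channels
upper_pool = 6 * GBS_PER_CHANNEL   # interleaved over only 6 channels

print(f"GPU-optimal 10GB pool: {full_pool:.0f} GB/s")   # 560 GB/s
print(f"Standard 6GB pool:     {upper_pool:.0f} GB/s")  # 336 GB/s
```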
 
Considering the difficult part of cooling is the thermal density, and the 3.5GHz Zen 2 was said to have the same density as the 2.23GHz GPU, it logically means cooling a 2.23GHz GPU is no more difficult than cooling the 3.5GHz Zen 2, just more material for the total watts (one more heatpipe, maybe an inch more height). MS has a more difficult thermal density to deal with despite a lower GPU clock, because they have a 3.6GHz 16-thread Zen 2 without the possibility of downclocking under worst-case or unpredictable circumstances. It depends how high Sony set the individual power limit, but again there are indications they are aiming for something reasonable.

For parametric yield, the layout of the WGPs on a smaller chip means potentially shorter data paths. That could allow a slightly higher clock if the worst-case length of critical high-bandwidth data paths is shorter than in a layout requiring an additional row of WGPs further away.
 
Can the GPU reach that slower portion? I thought it was 10 GBs addressable by GPU at full speed, 16 GBs addressed by everything at slower speed.
 
Can the GPU reach that slower portion? I thought it was 10 GBs addressable by GPU at full speed, 16 GBs addressed by everything at slower speed.

Yes, it's still a unified memory system. The GPU can access the full range. At least that's how I understand it.

From Digital Foundry:
"Memory performance is asymmetrical - it's not something we could have done with the PC," explains Andrew Goossen "10 gigabytes of physical memory [runs at] 560GB/s. We call this GPU optimal memory. Six gigabytes [runs at] 336GB/s. We call this standard memory. GPU optimal and standard offer identical performance for CPU audio and file IO. The only hardware component that sees a difference in the GPU."

In terms of how the memory is allocated, games get 13.5GB in total, which encompasses all 10GB of GPU optimal memory and 3.5GB of standard memory. This leaves 2.5GB of GDDR6 memory from the slower pool for the operating system and the front-end shell. From Microsoft's perspective, it is still a unified memory system, even if performance can vary. "In conversations with developers, it's typically easy for games to more than fill up their standard memory quota with CPU, audio data, stack data, and executable data, script data, and developers like such a trade-off when it gives them more potential bandwidth," says Goossen.
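
Laid out as a simple breakdown, the split from the article looks like this (the pool labels follow Goossen's naming; this is purely a restatement of the quoted numbers):

```python
# The 16GB split as described in the article; pool labels follow
# Goossen's naming. Purely a restatement of the quoted numbers.

memory_map_gb = {
    "GPU optimal (560 GB/s), game":  10.0,
    "standard (336 GB/s), game":      3.5,
    "standard (336 GB/s), OS/shell":  2.5,
}

game_visible = memory_map_gb["GPU optimal (560 GB/s), game"] \
             + memory_map_gb["standard (336 GB/s), game"]

assert sum(memory_map_gb.values()) == 16.0
print(f"Game-visible memory: {game_visible}GB of 16GB total")  # 13.5GB
```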
 
I understand that argument, but if that's doable, why hasn't Sony done it? In the GitHub Controversy, the idea was floated that Sony would do just that, with the 36 CUs showing only for BC testing. Turns out Sony did go narrow and then clock really high. It seems an odd choice.

I think a second part might be the SSD. I think Sony invested very heavily in that and thought that, coupled with the ~9 TF that 36 CUs would provide, it would be okay, and then they pushed that 36 CU unit harder. If they had decided wider and more TFs was the target, perhaps the SSD solution would have been simpler and cheaper.
As per the DF article, they tried to match the CPU and GPU thermals on the chip so that the heat would be even. Perhaps it was too difficult to approach this with a wider GPU. Seems like tomfoolery when I write it this way, though.
 
What other reason is there for choosing 36 CUs and then having to engineer some complex, expensive cooling, when more CUs get you more performance at the same cost?
For the sake of argument, I'll assume the jump to 7nm gave 2x the transistor budget, although it seems like it might have scaled better.
The CPUs are larger. A very rough estimate from pixel-counting a die shot of the PS4 Pro gives 40mm² or less for two Jaguar modules, while a Zen 2 CCX on its own is about the same--which may explain why the console version may have pared back the amount of L3 cache.
Per Mark Cerny's presentation, an RDNA 2 CU's transistor count is 60% larger than that of a PS4 CU, and the other parts of the shader engines and front-end logic have grown significantly. Assuming a 36-CU GPU had its area cut in half by the shrink, growing it by 1.6x brings it back to 80% of the Pro's GPU area.
If the Zen 2 area isn't significantly pared back, the area of one CCX alone puts the CPU+GPU area back to where the Pro was, not including the uncore or IO--which it sounds like the PS5 has more of.
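
Running the rough numbers from that paragraph, as a sketch: the 2x density gain and 1.6x CU growth are the stated assumptions, so treat the result as illustrative rather than measured.

```python
# The area arithmetic from the previous paragraph, normalized to the
# PS4 Pro's GPU. The 2x density gain and 1.6x CU growth are the stated
# assumptions, so treat the result as illustrative.

pro_gpu_area = 1.0   # normalize the Pro's 36-CU GPU area to 1.0
shrink       = 0.5   # assumed area halving from the jump to 7nm
cu_growth    = 1.6   # Cerny: RDNA 2 CU ~60% more transistors than a PS4 CU

ps5_gpu_area = pro_gpu_area * shrink * cu_growth
print(f"36-CU RDNA 2 GPU at 7nm: ~{ps5_gpu_area:.0%} of the Pro's GPU area")
# -> ~80%; adding a full Zen 2 CCX (roughly the size of both Jaguar
#    modules), uncore, IO, and the GDDR6 PHY pushes the total back up.
```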

If Sony committed to a die area like the Pro's, but also including Zen 2, RDNA 2, a larger integrated controller block, and the bulkier GDDR6 interface, it might have had limited room for growth.
I don't know if the decision for a 256-bit interface could have driven the CU count decision, or the other way around. Microsoft's larger CU array came with a larger DRAM layout that Sony would have needed as well.

Many design targets could have left Sony with the PS5's footprint. Until recently, 7nm costs would have been projected to be higher than they've turned out to be, which could have reduced the target area. A desire to have similar chip manufacturing volume to the PS4 in the face of those projections and likely wafer start competition could have encouraged a smaller chip.
If Sony's decisions 3-4 years ago were a bit off, it could have left out the room for additional CUs.
BC might have created a floor to the CU count that the PS5 couldn't drop below, since it's harder to satisfactorily emulate missing units versus not reporting the existence of extra hardware.
 
? He is only talking about the SMT on/off toggle on the XSX here.
So then why bother asking that question in that way? It makes no sense.

Could the Hyperthreading feature included in the X series be Microsoft's winning ace at the end of the generation?
^^^
I was so confused by this question.
 
It's interesting that the explanation from Cerny about BC was to "add the differences" to the GPU so it can operate in that mode. So it seems BC only cost a small amount of die area, and we also saw this on the PS4 Pro, where one half of the CUs is a smidge bigger.
 
Of course the PC solution is to buy more memory, endure a huge load time, and then play from memory.

Maybe, maybe not. There are and will be faster drives than the one the PS5 will have; it depends on how DirectStorage/Velocity Architecture will work out there.

Yep, I understand that.

Don't worry. I see Doom Eternal on PC basically having no load times at all with a fast SSD, while on consoles it loads... rather slowly. What is it, 20x faster on PC, at least? That's one of the finest-looking games out there now, with rather huge levels too, massive assets and geometry. We don't even have the fastest drives that will be available yet, without even minding DirectStorage/Velocity Architecture features. Yes, it is a current-gen game, but it goes to show that a swan-song game of the generation can load instantly.
 
It's interesting that the explanation from Cerny about BC was to "add the differences" to the GPU so it can operate in that mode. So it seems BC only cost a small amount of die area, and we also saw this on the PS4 Pro, where one half of the CUs is a smidge bigger.
The two halves of the Pro's GPU have different lengths and widths for what looks to be layout purposes. However, small differences in area might also be related to structures or cells that didn't perfectly scale in one dimension to fit the new aspect ratio.
The Pro's CUs were already a custom version with significant amounts of backwards compatibility built-in, since they would be running the same ISA in Pro mode, and that ISA's structure would have BC with the original.
Perhaps it would be worth the small area savings to make an architectural break inside the GPU, though I wouldn't know how to see it from die shots.
 
So then why bother asking that question in that way? It makes no sense.

Could the Hyperthreading feature included in the X series be Microsoft's winning ace at the end of the generation?
^^^
I was so confused by this question.
You only read what you wanted to read (him not talking about SMT on the PS5 proving SMT is not on the PS5). The PS5 has SMT; it's in the official specs from Sony. Even DF doesn't doubt that fact, and that should be enough for you.
 
Who are these people? Respected news sources or just randos?
I've heard an explanation from them that they weren't endorsing the full statement and that it was a mistake to tweet the way they did.
It was one single aspect they had heard about, and everything else, i.e. it being delayed or heat being a problem, isn't what they heard.
They heard it runs very hot, not that that's even a problem.
 
You only read what you wanted to read (him not talking about SMT on the PS5 proving SMT is not on the PS5). The PS5 has SMT; it's in the official specs from Sony. Even DF doesn't doubt that fact, and that should be enough for you.

The question is not if SMT will be there; it is when and under what conditions it will be enabled/disabled, or if devs can't decide and it will always be enabled at that variable clock speed.

@Jay

Running very hot is already a sign of a 'problem'; very hot isn't ideal in any case.
 
You only read what you wanted to read (him not talking about SMT on the PS5 proving SMT is not on the PS5). The PS5 has SMT; it's in the official specs from Sony. Even DF doesn't doubt that fact, and that should be enough for you.
That's not my issue. I know it's supposed to be there. That's why I don't think this interview makes a lot of sense.
 