Current Generation Hardware Speculation with a Technical Spin [post GDC 2020] [XBSX, PS5]

To use the exact quote, he said that the Kraken decompression block could accept "over 5GB (likely 5.5GB) per second of compressed raw data", and that would typically equate to 8-9GB/s of output, "but the unit itself is capable of outputting as much as 22GB/s if the data compressed particularly well".

So it depends on how well your data compresses with Kraken. The input limit is "over 5" GB/s.

The Road to PS5 - 17:43

I took the 8-9GB/s figure to be a simple calculation of the raw SSD speed multiplied by the zlib/Kraken compression ratio. Given that he stated Kraken to be about 10% better than zlib, that gives you a ratio of roughly 150% - 165%, or exactly 8.25GB/s - 9.1GB/s on the 5.5GB/s raw figure.
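A quick sketch of that arithmetic (the 5.5GB/s raw input figure and the ratios are taken from the talk as quoted above; the 22GB/s case simply back-solves to a ~4x ratio):

```python
# Back-of-the-envelope check of the PS5 decompression figures quoted above.
# Input: ~5.5 GB/s of compressed data off the SSD; output depends on how
# well the data compressed with Kraken.

raw_input_gbps = 5.5  # GB/s of compressed data read from the SSD

for ratio in (1.50, 1.65, 4.00):  # typical low, typical high, best case quoted
    print(f"{ratio:.2f}x compression -> ~{raw_input_gbps * ratio:.2f} GB/s output")

# -> about 8.25, 9.08 and 22.00 GB/s respectively
```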
 
I'm a bit curious as to what MS has done with the on-chip SRAM, as they felt the need to declare how much there was in total (76MB). So far, both MS and Sony have talked up what they feel to be their strong points, so it makes me wonder if MS has done something out of the ordinary with the amount of SRAM. Anyone care to hazard a guess at what could be possible?

Maybe having large L1 and L2 GPU caches? They are determined per shader engine and per memory channel respectively, going by RDNA 1.

With MS having quite a lot more CUs per shader engine, they may have double the L1 compared to the PS5. And with them having a lot more CUs and 5 memory channels (at a high level e.g. 5 x 2 x 32 bit) maybe they have something crazy like 250% of the GPU L2 (RDNA cache slices only seem to go up by doubling).
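To put rough numbers on that kind of scaling, here's a minimal sketch. The RDNA 1 baselines (128KB of graphics L1 per shader array, 4MB of L2 on Navi 10, which I'm assuming is organized as 16 x 256KB slices) come from AMD's whitepaper; the helper function and every console figure below are purely hypothetical, just to illustrate the doubling speculation.

```python
# Speculative GPU cache arithmetic. Baseline figures are RDNA 1 / Navi 10;
# the "console" configuration is hypothetical, only illustrating the
# doubling idea in the post above.

def total_caches(shader_arrays, l1_per_array_kb, l2_slices, l2_slice_kb):
    """Total graphics L1 and total L2 (both in KB) for a given config."""
    return shader_arrays * l1_per_array_kb, l2_slices * l2_slice_kb

# Navi 10 reference: 4 shader arrays x 128 KB L1, 16 x 256 KB L2 slices
print(total_caches(4, 128, 16, 256))   # (512, 4096)  -> 0.5 MB L1, 4 MB L2

# Hypothetical "double everything" console: 256 KB L1 per array and
# 512 KB slices across a 320-bit bus (20 x 16-bit GDDR6 channels)
print(total_caches(4, 256, 20, 512))   # (1024, 10240) -> 1 MB L1, 10 MB L2 (250% of 4 MB)
```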

Their "high" sram quantity might be a stealth brag about huge caches, which might also accelerate RT by keeping more levels of the acceleration structure resident in cache.

Clearly, they want people to know about their TF, sram and VRS. They picked a few points to focus on, and there has to be a reason! Same with Sony and their audio and ludicrous speed SSD.
 
With MS having quite a lot more CUs per shader engine, they may have double the L1 compared to the PS5. [...]
We don't know how many shader engines MS has, and thus how many CUs per shader engine there are.
 
I still think it makes sense on console to do it this way. It saves costs, and you're right that there is bandwidth loss during contention, but when there isn't any, the gains are there. And when contention is present you lose bandwidth, but not so much that it chokes the system.

It seems a fair trade-off. It's lower, but not too low, and it can reach some good highs.
I agree it works best in consoles to save costs, but as you scale things up, the cost savings quickly evaporate. On PC I don't believe a shared pool would work out, not with the kinds of massively powerful CPUs and GPUs we have there; a Ryzen 3950X with a 2080 Ti would need close to 1TB/s of shared memory pool bandwidth (which is terribly expensive) to even stand a chance.
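As a rough sanity check of that ~1TB/s figure: the component peak bandwidths below are the published numbers as I recall them, and the contention headroom factor is purely an assumption for illustration.

```python
# Rough sanity check of the ~1 TB/s claim for a hypothetical PC with a
# unified memory pool. Peak bandwidths are the published figures as I
# recall them; the headroom factor is a guess.

gpu_bw = 616.0   # GB/s, RTX 2080 Ti (352-bit GDDR6 @ 14 Gbps)
cpu_bw = 51.2    # GB/s, Ryzen 3950X with dual-channel DDR4-3200

combined = gpu_bw + cpu_bw
# Shared-bus contention isn't free: assume ~25-40% extra headroom so neither
# client starves the other (purely illustrative).
for headroom in (1.25, 1.40):
    print(f"{headroom:.2f}x headroom -> ~{combined * headroom:.0f} GB/s unified pool needed")

# -> roughly 834-934 GB/s, i.e. approaching the ~1 TB/s ballpark above
```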

It's not about flexibility, but capacity vs BW. If you have 8 GB of VRAM and 16 GB of RAM, your rendering either limits itself to the 8 GB of VRAM to get its full BW, with only 8 GB of assets (less framebuffers), or it stores assets in system RAM and can use more than 8 GB of assets, but has to copy them across the slow bus into VRAM to draw. If you have 16 GB of unified RAM, you have all 16 GB available for assets, but you impact maximum BW.
Most games don't need more than 8GB of VRAM anyway, and those that do use the VRAM as a large texture cache, with most of it just sitting there doing nothing. In fact, a PC GPU with 8GB of VRAM will always play games at much higher texture quality levels than the Xbox One X with its 12GB of shared RAM. Look at the recent release of Gears 5, where the PC version has access to significantly higher-resolution texture packs not available on the Xbox One X; the same goes for other multi-platform games too.

If we factor in next-gen features like Sampler Feedback and smart texture streaming, VRAM utilization will be much better in the future.
More assets, or faster reads and writes? Pick your poison - you can't have both.
On PC, I believe we already have both, as shown above. Of course, it comes at the cost of higher RAM capacity for the split pool.
 
Their "high" sram quantity might be a stealth brag about huge caches, which might also accelerate RT by keeping more levels of the acceleration structure resident in cache. [...]
I seem to recall the Scorpio Engine having larger caches than its PC-equivalent GPUs. I read it in one of these threads. Maybe they increased them in Scarlett compared to standard RDNA 2 GPUs.
 
Why do you need compression at all? If the typical peak throughput of the XSX SSD with compression is 4.8GB/s, then that's already possible with current SSDs without compression. And by the time the new consoles launch, speeds near the PS5's will be possible without it. Obviously compression has other advantages, though, like install size, which is why I was interested in understanding whether it can be done on the GPU (where a dedicated decompression block is entirely feasible) rather than the CPU/APU, where it's never going to happen.

I do not know of any current SSD capable of 4.8 GB/s. The Samsung 970 EVO and Intel Optane 905P cannot reach that value (3.2 GB/s is the best max on the 970 EVO).
Which others do you know of?
 
We know it's 4 shader engines, like PS5. So we know how many CUs per shader engine.

Github was a pretty baller leak.
Oh, it was in one of the confirmed leaks? Then I have a follow-up question: how exactly is having more CUs per shader engine supposed to be worse? And it's not an especially high count either: Navi 10 has 20 CUs per shader engine, the XSX 13 (14) if there are 4 shader engines.
 
Less L1, ROPs and GE per CU.
Did the same leak confirm those numbers too?
Assuming it didn't, or didn't confirm all of them:
L1 is, according to the RDNA whitepaper, tied to a "group of Dual Compute Units"; it doesn't specify that this needs to be some specific number of them, or that it needs to connect to a specific number of ROPs either, so how can we know there's less of it?
A Shader Engine can have varying amounts of ROPs.
The Geometry Engine isn't part of the Shader Engines.
 
L1 is, according to the RDNA whitepaper, tied to a "group of Dual Compute Units"

"The new graphics L1 cache serves most requests in each shader array, simplifying the design of the L2 cache and boosting the available bandwidth."
L1 is per shader array.

Shader Engine can have varying amounts of ROPs

Who said that? In the RDNA whitepaper, each SA (shader array) has 4 RBs (4 x 4 = 16 ROPs), 1 rasterizer and 1 primitive unit.

Geometry Engine isn't part of Shader Engines

But it doesn't scale with CU count either.
 
"The new graphics L1 cache serves most requests in each shader array, simplifying the design of the L2 cache and boosting the available bandwidth."
L1 is per shader array.
And as shader arrays can be of varying size, who's to say how many DCUs each has in the XSX or PS5 or any other RDNA 2 product, and thus which of them has the most L1 per CU?
To add on top of that, who's to say they even have the same amount of L1 per shader array? MS was boasting about how much SRAM they have in there, and it all needs to go somewhere; possibly some extra in L1 too.
Who said that? In the RDNA whitepaper, each SA (shader array) has 4 RBs (4 x 4 = 16 ROPs), 1 rasterizer and 1 primitive unit.
Well, according to the RDNA whitepaper each SE has two SAs, and if the XSX and PS5 have 4 SEs they'd have 8 SAs, which would mean 128 ROPs. I just find that quite unlikely, especially for a console.
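For reference, here's the arithmetic behind that 128-ROP figure, using the per-shader-array numbers quoted from the whitepaper above. The 4-shader-engine console configuration is the assumption being questioned, not a confirmed spec.

```python
# ROP count implied by the RDNA whitepaper numbers quoted above, if the
# consoles really did use 4 shader engines of 2 shader arrays each.
# The console configuration itself is the speculative part.

shader_engines = 4   # assumed for XSX / PS5 in the discussion above
arrays_per_se  = 2   # shader arrays per shader engine (RDNA whitepaper)
rbs_per_array  = 4   # render backends per shader array (RDNA whitepaper)
rops_per_rb    = 4   # pixels per clock handled by each RB

total_rops = shader_engines * arrays_per_se * rbs_per_array * rops_per_rb
print(total_rops)  # 128 - which is why the 4-SE assumption looks questionable
```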
 
Scorpio has a 2MB GPU L2, whereas the nearest-sized desktop GPU would be Hawaii (44 CUs), which sported only 1MB of L2.
What about the RX 480/580? Any idea how much L2 they have? I want to know if the amount of L2 in Scorpio was something common to Polaris. I know Scorpio more closely resembles the X1 in its makeup, but they did include some features found in Polaris. Curious whether the L2 amount more closely conforms to Polaris or Hawaii.
 