Digital Foundry Article Technical Discussion [2020]

3dilettante · Mar 19, 2020

I think they'd have to do more to a CU than just remove the caches to get something useful, and there are some nice elements to caches that the SPEs might have benefited from.
What would a CU have for data access without caches? Is this all going directly into the LDS, and is that LDS larger?
The latency for the LDS is likely in the range of 30 cycles, versus 7 in Cell.
The LDS doesn't serve instructions to the CU, and an SPE-like solution would have the same storage for instructions and data.
That shared local store was also in some ways a disadvantage for the SPEs, since linking the instruction stream to the same memory meant slightly longer latency for all LS access, and meant that changes to the instructions in a program could affect the data portion if the total memory consumption rose.
I wonder if there are hardware units on the side for CODEC work, and presumably the DMA feeding whatever the CU is using for memory.

dobwal · Mar 19, 2020

bgroovy said:
I'm not convinced any of the Xbox Velocity Architecture is anything more than a rebranding of HBCC.

Does HBCC have a decompression block, or allow a PC to quick resume? Sure, some of it looks like HBCC, because better memory management between VRAM and the lower-bandwidth memory servicing it is always going to be an ongoing issue looking for a better solution.

jlippo · Mar 19, 2020

Scott_Arm said:
So I'm watching this DF vid about the PS5 specs and I'm really curious about latency to the SSD when they say things like, "it seems like it can read from the disk almost as if it were just RAM". If you look at all of the work that is being done in optimizing game engines, keeping the GPU and CPU working by avoiding cache misses is paramount. The reason is for each step in the cache hierarchy away from the registers and towards HDD/SSD your processor waits longer for data retrieval and your CPU or GPU sits idle. The kind of ballpark numbers on the CPU side is registers are 0/1 clock cycle, L1 is around 5 cycles, L2 is around 10 cycles and RAM is like 200+ cycles. I know the SSD is light years better than an HDD, and looks massively better than any SSD on the market, but they say seek time is "instantaneous" ... but no. Game devs work in nanoseconds as a unit of measure. It's not 0 nanoseconds. Unless it's very close to the same access time reaching RAM, it won't be able to be used like RAM. You really have to look at the latency in terms of clock cycles, and not throughput in seconds. Maybe they do have access times in line with RAM.

The throughput looks very good.

So absolute best case of:
22 GB/s = 22 MB/ms = 352 MB/16ms frame

Average case of:
9 GB/s = 9 MB/ms = 144 MB/16ms frame

PS4 HDD (best case, unrealistic):
100 MB/s = 100 KB/ms = 1.6 MB/16ms frame

PS5 RAM:
440 GB/s = 440 MB/ms = 7 GB/16ms frame
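The per-frame figures above can be reproduced with a short sketch. It assumes decimal units (so 1 GB/s equals 1 MB/ms) and the same rounding of a 60 fps frame down to 16 ms that the post uses; the function name and labels are only illustrative.

```python
# Per-frame data budget for a given sustained throughput.
# Assumptions: decimal units (1 GB/s == 1 MB/ms) and a 16 ms frame,
# matching the rounding used in the post (a true 60 fps frame is ~16.67 ms).
FRAME_MS = 16


def mb_per_frame(gb_per_s, frame_ms=FRAME_MS):
    """Return how many MB can be moved in one frame at gb_per_s."""
    mb_per_ms = gb_per_s  # 1 GB/s == 1000 MB / 1000 ms == 1 MB/ms
    return mb_per_ms * frame_ms


# Throughput figures as quoted in the post, in GB/s.
sources = {
    "SSD best case": 22,
    "SSD average case": 9,
    "PS4 HDD best case": 0.1,
    "RAM": 440,
}

for name, gb_per_s in sources.items():
    print(f"{name}: {mb_per_frame(gb_per_s):.1f} MB per {FRAME_MS} ms frame")
```

This only covers throughput; it says nothing about the access latency the post is asking about.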

It's been a long time since I've read up on virtual texturing to see how much data you'd need to read on the fly for a 4k framebuffer to make sure you don't have texture pop-in like Rage.

Raw 3840*2160 at 16 bytes per pixel is ~130 MB. (Unless I fail at math again.)
I think we will be good, especially if basis or similar texture compression is used.
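A quick check of that number, as a sketch. It assumes decimal megabytes and compares against the 144 MB per 16 ms "average case" figure from the quoted post; the helper name is made up for illustration.

```python
# Size of one uncompressed 4K frame's worth of unique texels,
# assuming one texel per pixel at 16 bytes each (the figure used in the post).
def raw_frame_bytes(width, height, bytes_per_pixel):
    return width * height * bytes_per_pixel


size_bytes = raw_frame_bytes(3840, 2160, 16)
size_mb = size_bytes / 1e6  # decimal MB

# Average-case SSD budget per 16 ms frame, taken from the quoted post.
average_budget_mb = 144

print(f"raw frame: {size_bytes} bytes (~{size_mb:.1f} MB)")
print(f"fits in average per-frame budget: {size_mb <= average_budget_mb}")
```

So the ~130 MB estimate holds (about 132.7 MB), and it sits just under the quoted average-case budget before any texture compression is applied.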

3dilettante said:
I think they'd have to do more to a CU than just remove the caches to get something useful, and there are some nice elements to caches that the SPEs might have benefited from.
What would a CU have for data access without caches? Is this all going directly into the LDS, and is that LDS larger?
The latency for the LDS is likely in the range of 30 cycles, versus 7 in Cell.
The LDS doesn't serve instructions to the CU, and an SPE-like solution would have the same storage for instructions and data.
That shared local store was also in some ways a disadvantage for the SPEs, since linking the instruction stream to the same memory meant slightly longer latency for all LS access, and meant that changes to the instructions in a program could affect the data portion if the total memory consumption rose.
I wonder if there are hardware units on the side for CODEC work, and presumably the DMA feeding whatever the CU is using for memory.

In Cell, 256 kB was chosen as the local store size because physical distance limited how much memory could be reached within a 7-cycle latency.
How much memory would fit into that area on a 7nm process?

Deleted member 11852 · Mar 19, 2020

BRiT said:
Don't forget possibly revisiting Google Stadia too.

The position of troll mod is already filled! :yep2:

Deleted member 11852 · Mar 19, 2020

BRiT said:

Allow me to interpret some of the commentary.

On the variable frequency: "it's a bold move" = Sony are crazy.
On boost mode: "Sony have a very specific implementation" = Sony are stupid to call it boost mode.
On the clock speeds: "They're pushing the clock speeds to some pretty crazy levels" = Sony really are crazy.
On teraflops: "Teraflops as a metric is not equivalent to performance" = Sony are tricksy.
On the clock speeds: "not going wider but faster is innovative" = Sony are crazy.
On the SSD: "This Sony SSD, they've pushed things so hard here" = You'd be stupid to play Cyberpunk 2077 on a non-nextgen console. Suck-it, PCMR!*
On the SSD controller: "For me this is the most exciting point of the whole presentation" = Sony are crazy.
On expandable SSD storage: "They are allowing you to use off-the-shelf components" = Sony are crazy.
On the 'Tempest'** audio: "This is so ambitious. I hope they can pull this off" = Sony are crazy.

These are all QFT statements. :yep2:

*not my words - QFT from DigitalFoundry.
**obviously a stupid name and it should have been called 'Normandy' audio.

PSman1700 · Mar 19, 2020

The Road to PS5 video (YouTube) got a lot of hate, with about as many dislikes as likes. The comments there aren't positive either; comments are being deleted, but the admins can't keep up.

eloic · Mar 19, 2020

DSoup said:
Allow me to interpret some of the commentary.

On the variable frequency: "it's a bold move" = Sony are crazy.

On boost mode: "Sony have a very specific implementation" = Sony are stupid to call it boost mode.

On the clock speeds: "They're pushing the clock speeds to some pretty crazy levels" = Sony really are crazy.

On teraflops: "Teraflops as a metric is not equivalent to performance" = Sony are tricksy.

On the clock speeds: "not going wider but faster is innovative" = Sony are crazy.

On the SSD: "This Sony SSD, they've pushed things so hard here" = You'd be stupid to play Cyberpunk 2077 on a non-nextgen console. Suck-it, PCMR!*

On the SSD controller: "For me this is the most exciting point of the whole presentation" = Sony are crazy.

On expandable SSD storage: "They are allowing you to use off-the-shelf components" = Sony are crazy.

On the 'Tempest'** audio: "This is so ambitious. I hope they can pull this off" = Sony are crazy.

These are all QFT statements.

*not my words - QFT from DigitalFoundry.
**obviously a stupid name and it should have been called 'Normandy' audio.

Like, because rainbow.

(I didn't really interpret those comments like you, BTW, but yeah, I can understand why Sony can seem a bit crazy)

Deleted member 11852 · Mar 19, 2020

eloyc said:
Like, because rainbow. (I didn't really interpret those comments like you, BTW, but yeah, I can understand why Sony can seem a bit crazy)

I genuinely think PS5 is a good mix of "bold" (crazy) balance minus the Ken Kutaragi 'fuck developers - they should learn my batshit crazy-albeit-powerful hardware' position. PS4 was way too 'normal'; PS5 redresses the weird, excepting the XSX mixed-bandwidth RAM pool, which actually makes sense. Not all RAM needs to be high-bandwidth, so why make all RAM equal?

PSman1700 · Mar 19, 2020

Gears 5 enhanced graphics: on XSX it looks better than most other games out there.

3dilettante · Mar 20, 2020

jlippo said:
In Cell, 256 kB was chosen as the local store size because physical distance limited how much memory could be reached within a 7-cycle latency.
How much memory would fit into that area on a 7nm process?

A naive doubling for every node would be 256 KB * 2 ("65nm") * 2 ("45/40nm") * 2 ("32/28nm") * 2 ("20nm") * 2 ("16nm") * 2 ("10nm") * 2 ("7nm").
That would be 32MB, but scaling has been less than ideal, and some nodes like the 20nm/16nm transition for TSMC didn't scale density significantly.
I haven't found an equivalent processor storage element to the local store to know if the latency would still be as favorable.
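The naive doubling can be written out as a small sketch. The node list is the one from the chain above; the second calculation, where one step gives no density gain, is only an illustrative what-if for the 20nm/16nm remark and not a measured figure.

```python
# Naive density scaling: capacity in the same area doubles at every node step.
NODES = ["65nm", "45/40nm", "32/28nm", "20nm", "16nm", "10nm", "7nm"]
BASE_KB = 256  # Cell SPE local store size


def naive_capacity_kb(base_kb, steps, factor=2):
    """Capacity after `steps` node transitions at `factor` density gain each."""
    return base_kb * factor ** steps


ideal_kb = naive_capacity_kb(BASE_KB, len(NODES))
print(f"ideal: {ideal_kb} KB = {ideal_kb // 1024} MB")

# What-if (assumption): one transition, e.g. 20nm -> 16nm, adds no density,
# so only six of the seven steps actually double capacity.
flat_step_kb = naive_capacity_kb(BASE_KB, len(NODES) - 1)
print(f"one flat step: {flat_step_kb} KB = {flat_step_kb // 1024} MB")
```

Seven ideal doublings give the 32MB figure; losing a single doubling already halves it, which is why the naive number should be read as an upper bound.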

Working from a 14.5mm2 SPE, rough attempts at getting the area of the LS gave me around 4.6-4.65 mm2.
https://www.slideshare.net/Slide_N/cell-technology-for-graphics-and-visualization
Implementation choices for the SRAM may vary cell size and the amount of surrounding logic, so I don't have a specific target density.

Going by rough pixel counting of a Zen 2 L3 in a CCX of area 31.3mm2, there's ~1MB /mm2, which if applied to the same area as an LS would give up to 5.5 MB in that area.
My rough estimate of the L2 of a Zen 2 core gives ~1.6MB per mm2, or ~7.5MB in the LS area.
Latency-wise, the L2 latency is 12 cycles, although that is additive to the 4 cycles of the L1.
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
On one hand, there is a lot of other logic involved in the cache tags and supporting hardware, but it's less optimistic than going with the PR numbers given by foundries for their SRAM test cells.
TSMC had one with a cell area of 0.027um2, which would have been over 21MB if scaled without regard to the need for interface logic and other implementation choices.
I have not found a comparable example of a large storage memory with the latency range of an LS to know where the cut-off is for array latency versus arbitration or cache pipeline latency.
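For reference, the three area-based estimates can be reproduced with a rough sketch. It assumes the upper LS area estimate of 4.65 mm2, decimal megabytes, and one bit per SRAM cell with no overhead for the foundry-cell case; the function names are made up for illustration.

```python
# Rough capacity estimates for an SRAM array occupying the SPE local store area.
LS_AREA_MM2 = 4.65  # upper end of the 4.6-4.65 mm2 estimate above


def capacity_from_density(area_mm2, mb_per_mm2):
    """Capacity in MB given an observed cache density in MB per mm2."""
    return area_mm2 * mb_per_mm2


def capacity_from_cell(area_mm2, cell_um2):
    """Capacity in MB if the whole area were raw bit cells (no tags or logic)."""
    area_um2 = area_mm2 * 1e6      # 1 mm2 == 1e6 um2
    bits = area_um2 / cell_um2     # one bit per SRAM cell
    return bits / 8 / 1e6          # bits -> bytes -> decimal MB


print(f"Zen 2 L3 density (~1.0 MB/mm2): {capacity_from_density(LS_AREA_MM2, 1.0):.2f} MB")
print(f"Zen 2 L2 density (~1.6 MB/mm2): {capacity_from_density(LS_AREA_MM2, 1.6):.2f} MB")
print(f"TSMC 0.027um2 bit cell:         {capacity_from_cell(LS_AREA_MM2, 0.027):.2f} MB")
```

The L2 and bit-cell results line up with the ~7.5MB and "over 21MB" figures. The rounded ~1MB/mm2 L3 density gives closer to 4.65 MB, so the 5.5 MB figure presumably reflects an unrounded density nearer 1.2 MB/mm2.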

PSman1700 · Mar 21, 2020

Incredible indeed!

Svensk Viking · Mar 21, 2020

PSman1700 said:
Incredible indeed!

Seems to be a great showcase for consoles and GCN in general, but the PC version is apparently horrible on the Maxwell and Kepler GPUs.
https://www.pcgameshardware.de/Doom...ra-Nightmare-Tuning-Tipps-Benchmarks-1345721/

I'm not really interested in the game itself though, I disliked the overreliance on glory kills and the constant arena setup in Doom 2016, and it's supposedly even more encouraged in this game.

Scott_Arm · Mar 21, 2020

Svensk Viking said:
Seems to be a great showcase for consoles and GCN in general, but the PC version is apparently horrible on the Maxwell and Kepler GPUs.
https://www.pcgameshardware.de/Doom...ra-Nightmare-Tuning-Tipps-Benchmarks-1345721/

I'm not really interested in the game itself though, I disliked the overreliance on glory kills and the constant arena setup in Doom 2016, and it's supposedly even more encouraged in this game.

Old hardware eventually hits its limits. It makes little sense to be optimizing a game around a 2014 and older feature set, at this point.

PSman1700 · Mar 21, 2020

Doom 2016 runs wonderfully on even a GTX 670; patches could come.

Svensk Viking · Mar 21, 2020

Scott_Arm said:
Old hardware eventually hits its limits. It makes little sense to be optimizing a game around a 2014 and older feature set, at this point.

Sure, but after seeing a GTX 770 being enough to almost always beat the consoles in performance and at slightly better graphics, and the more powerful Maxwell models doing the same, for all these years, seeing them fall so much still feels lame.
Digital Foundry's test said the base consoles mostly correspond to medium, and PCGamesHardware tested at Ultra, so it might still be good at console-equivalent settings.

I know you can't expect driver optimizations or engine optimizations for the older architectures forever, but that's also why PC gaming is a bit sad. Low-end for PC gaming has never been better than this generation IMO. People were super happy when I told them their aging GTX 770 and FX 8350 should still be superior to the base PS4 in RDR2.

techuse · Mar 21, 2020

Svensk Viking said:
Seems to be a great showcase for consoles and GCN in general, but the PC version is apparently horrible on the Maxwell and Kepler GPUs.
https://www.pcgameshardware.de/Doom...ra-Nightmare-Tuning-Tipps-Benchmarks-1345721/

I'm not really interested in the game itself though, I disliked the overreliance on glory kills and the constant arena setup in Doom 2016, and it's supposedly even more encouraged in this game.

Pascal owner here. It's not at all horrible, the game is still very good. It's certainly below its GCN competitors, but I chalk that up to GCN just being better.

PSman1700 · Mar 21, 2020

Svensk Viking said:
Sure, but after seeing a GTX 770 being enough to almost beat the consoles in performance and at slightly better graphics, and the more powerful Maxwell models, for all these years, seeing them fall so much still feels lame.
Digital Foundry's test said the base consoles mostly correspond to medium, and PCGamesHardware tested at Ultra, so it might still be good at console-equivalent settings.

I know you can't expect driver optimizations or engine optimizations for the older architectures forever, but that's also why PC gaming is a bit sad. Low-end for PC gaming has never been better than this generation IMO. People were super happy when I told them their aging GTX 770 and FX 8350 should still be superior to the base PS4 in RDR2.

One of my PCs is still on a 670 (yes, I know, 8 years old), and it runs just about every game better than my base PS4 does. I consider that GPU lower than low end anno 2020. I tell you, things have been in a much worse state in other generations; people had to upgrade constantly to keep up. My Ti500 had to be upgraded rather fast to a 9700 Pro, and that 9700 had to be upgraded far too quickly as well.
A family member has a 7970 GHz Edition, and yes, its performance has held up better than my Kepler's.

Cyan · Mar 22, 2020

Svensk Viking said:
Seems to be a great showcase for consoles and GCN in general, but the PC version is apparently horrible on the Maxwell and Kepler GPUs.
https://www.pcgameshardware.de/Doom...ra-Nightmare-Tuning-Tipps-Benchmarks-1345721/

I'm not really interested in the game itself though, I disliked the overreliance on glory kills and the constant arena setup in Doom 2016, and it's supposedly even more encouraged in this game.

Same here. The game has a lot of technical merits, especially performance-wise, but I found Doom 2016 to be really boring. I completed it, but I wanted it to end. Is it Doom? Well, it might be, but it doesn't feel like Doom. The glory kills are a meh for me; it's like teleporting with a cheat. I still miss certain design ideas of the original, but I've got to admit they did a good job by somewhat keeping the colored keys feature of the original... somewhat.

Also, the idea of a super invincible guy that even demons should fear is... meh. You are the artifact, it seems. In Doom 3, while they didn't keep the spirit of the original, they still managed to make certain things interesting, like the soul thing for the 2nd run, which made the game more fun than in the first run. But this Doom I never wanted to replay. I tried it recently with the excellent mod that adds dynamic resolution, but I didn't get hooked on the game again.

Silent_Buddha · Mar 22, 2020

DSoup said:
Allow me to interpret some of the commentary.

On the variable frequency: "it's a bold move" = Sony are crazy.

On boost mode: "Sony have a very specific implementation" = Sony are stupid to call it boost mode.

On the clock speeds: "They're pushing the clock speeds to some pretty crazy levels" = Sony really are crazy.

On teraflops: "Teraflops as a metric is not equivalent to performance" = Sony are tricksy.

On the clock speeds: "not going wider but faster is innovative" = Sony are crazy.

On the SSD: "This Sony SSD, they've pushed things so hard here" = You'd be stupid to play Cyberpunk 2077 on a non-nextgen console. Suck-it, PCMR!*

On the SSD controller: "For me this is the most exciting point of the whole presentation" = Sony are crazy.

On expandable SSD storage: "They are allowing you to use off-the-shelf components" = Sony are crazy.

On the 'Tempest'** audio: "This is so ambitious. I hope they can pull this off" = Sony are crazy.

These are all QFT statements.

*not my words - QFT from DigitalFoundry.
**obviously a stupid name and it should have been called 'Normandy' audio.

So, my takeaway from this is that you would rate the PS5 as 6 crazy's out of 9?

You dirty old man.

Regards,
SB

ultragpu · Mar 22, 2020

PSman1700 said:
Incredible indeed!

Wait, they're rendering 80-90 million polygons per scene? Is that way above the average for current gen, or does the engine render only a fraction of that per frame and cull the rest when it's outside the player's viewport?
