Digital Foundry Article Technical Discussion [2020]

I think they'd have to do more to a CU than just remove the caches to get something useful, and there are some nice elements to caches that the SPEs might have benefited from.
What would a CU have for data access without caches? Is this all going directly into the LDS, and is that LDS larger?
The latency for the LDS is likely in the range of 30 cycles, versus 7 in Cell.
The LDS doesn't serve instructions to the CU, and an SPE-like solution would have the same storage for instructions and data.
That shared local store was also in some ways a disadvantage for the SPEs, since linking the instruction stream to the same memory meant slightly longer latency for all LS access, and meant that changes to the instructions in a program could affect the data portion if the total memory consumption rose.
I wonder if there are hardware units on the side for CODEC work, and presumably the DMA feeding whatever the CU is using for memory.
 
I'm not convinced any of the Xbox Velocity Architecture is anything more than a rebranding of HBCC.

Does HBCC have a decompression block, or let a PC do Quick Resume? Sure, some of it looks like HBCC, because better memory management between VRAM and the lower-bandwidth memory servicing it is always going to be an ongoing issue looking for a better solution.
 
So I'm watching this DF vid about the PS5 specs and I'm really curious about latency to the SSD when they say things like, "it seems like it can read from the disk almost as if it were just RAM". If you look at all the work that goes into optimizing game engines, keeping the GPU and CPU fed by avoiding cache misses is paramount. The reason is that with each step in the cache hierarchy away from the registers and towards the HDD/SSD, your processor waits longer for data and the CPU or GPU sits idle. Ballpark numbers on the CPU side: registers are 0-1 clock cycles, L1 is around 5 cycles, L2 is around 10 cycles, and RAM is 200+ cycles.

I know the SSD is light years better than an HDD, and it looks massively better than any SSD on the market, but they say seek time is "instantaneous"... no. Game devs work in nanoseconds as a unit of measure; it's not 0 nanoseconds. Unless its access time is very close to RAM's, it won't be usable like RAM. You really have to look at the latency in terms of clock cycles, not throughput per second. Maybe they do have access times in line with RAM.
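To put that in perspective, here's a back-of-the-envelope sketch (my own illustrative numbers, not anything DF or Sony quoted): assuming a ~3.5 GHz CPU and, hypothetically, NVMe-class flash read latency in the tens of microseconds, the gap to RAM measured in clock cycles is still enormous.

Code:
# Rough latency-to-cycles comparison. The cache/RAM figures are the usual
# ballpark numbers from the post above; the SSD figure is an assumed
# NVMe-class read latency, not a measured PS5 number.
CPU_GHZ = 3.5  # assumed clock, i.e. cycles per nanosecond

latencies_ns = {
    "register":  0.3,       # ~1 cycle
    "L1 cache":  1.4,       # ~5 cycles
    "L2 cache":  2.9,       # ~10 cycles
    "RAM":       60.0,      # ~200+ cycles
    "NVMe read": 50_000.0,  # tens of microseconds, assumed
}

for name, ns in latencies_ns.items():
    cycles = ns * CPU_GHZ
    print(f"{name:>9}: ~{ns:>10,.1f} ns  = ~{cycles:>11,.0f} cycles")

Even if a custom controller and I/O stack cut that flash latency by an order of magnitude, it would still be thousands of cycles away from RAM, which is exactly the point about cycles versus throughput.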

The throughput looks very good.

So absolute best case of:
22 GB/s = 22 MB/ms = 352 MB/16ms frame

Average case of:
9 GB/s = 9 MB/ms = 144 MB/16ms frame

PS4 HDD (best case, unrealistic):
100 MB/s = 100 KB/ms = 1.6 MB/16ms frame

PS5 RAM:
448 GB/s = 448 MB/ms = ~7 GB/16ms frame

It's been a long time since I've read up on virtual texturing to see how much data you'd need to read on the fly for a 4k framebuffer to make sure you don't have texture pop-in like Rage.
Raw 3840*2160 at 16 bytes per pixel is ~130 MB... (Unless I fail again at math.)
I think we will be good, especially if basis or similar texture compression is used. ;)
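For what it's worth, the per-frame figures above are easy to sanity-check. A quick sketch of the same arithmetic (16 ms frame, decimal units, and the publicly quoted bandwidth numbers):

Code:
# Per-frame read budget from sustained bandwidth.
# 1 GB/s = 1 MB/ms, so MB per frame = (GB/s) * frame time in ms.
FRAME_MS = 16  # the ~16 ms used above for a 60 fps frame

def mb_per_frame(gb_per_s: float) -> float:
    return gb_per_s * FRAME_MS

for label, bw_gb_s in [
    ("PS5 SSD best case (compressed)", 22),
    ("PS5 SSD typical (compressed)", 9),
    ("PS4 HDD best case", 0.1),
    ("PS5 RAM", 448),
]:
    print(f"{label:32}: ~{mb_per_frame(bw_gb_s):>7,.1f} MB per frame")

# Raw 4K target at 16 bytes per pixel:
print(f"3840x2160 @ 16 B/px: ~{3840 * 2160 * 16 / 1e6:.0f} MB")

So even the average compressed SSD figure covers the raw 4K target roughly once per frame, which is why the texture-streaming case looks comfortable.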
In Cell, 256 KB was chosen as the local store size because the physical distance limited latency to 7 cycles.
How much memory would fit into that same area on a 7nm process?
 

Allow me to interpret some of the commentary.
  • On the variable frequency: "it's a bold move" = Sony are crazy.
  • On boost mode: "Sony have a very specific implementation" = Sony are stupid to call it boost mode.
  • On the clock speeds: "They're pushing the clock speeds to some pretty crazy levels" = Sony really are crazy.
  • On teraflops: "Teraflops as a metric is not equivalent to performance" = Sony are tricksy.
  • On the clock speeds: "not going wider but faster is innovative" = Sony are crazy.
  • On the SSD: "This Sony SSD, they've pushed things so hard here" = You'd be stupid to play Cyberpunk 2077 on a non-nextgen console. Suck-it, PCMR!*
  • On the SSD controller: "For me this is the most exciting point of the whole presentation" = Sony are crazy.
  • On expandable SSD storage: "They are allowing you to use off-the-shelf components" = Sony are crazy.
  • On the 'Tempest'** audio: "This is so ambitious. I hope they can pull this off" = Sony are crazy.
These are all QFT statements. :yep2:

*not my words - QFT from DigitalFoundry.
**obviously a stupid name and it should have been called 'Normandy' audio.
 
The Road to PS5 video (YouTube) got a lot of hate, with about as many dislikes as likes. The commentary there isn't positive either; comments are being deleted, but the admins can't keep up.
 
Like, because rainbow.

(I didn't really interpret those comments like you, BTW, but yeah, I can understand why Sony can seem a bit crazy)
 

I genuinely think PS5 is a good mix of "bold" (crazy) balance, minus the Ken Kutaragi 'fuck developers - they should learn my batshit crazy-albeit-powerful hardware' position. PS4 was way too 'normal'; PS5 redresses the weird, except for the XSX mixed-bandwidth RAM pool, which actually makes sense. Not all RAM needs to be high bandwidth, so why make all RAM equal?
 
In Cell, 256 KB was chosen as the local store size because the physical distance limited latency to 7 cycles.
How much memory would fit into that same area on a 7nm process?
A naive doubling for every node would be 256 KB * 2 ("65nm") * 2 ("45/40nm") * 2 ("32/28nm") * 2 ("20nm") * 2 ("16nm") * 2 ("10nm") * 2 ("7nm").
That would be 32 MB, but scaling has been less than ideal, and some nodes, like the 20nm/16nm transition for TSMC, didn't scale density significantly.
I haven't found an equivalent processor storage element to the local store to know if the latency would still be as favorable.



Working from a 14.5mm2 SPE, rough attempts at getting the area of the LS gave me around 4.6-4.65 mm2.
https://www.slideshare.net/Slide_N/cell-technology-for-graphics-and-visualization
Implementation choices for the SRAM may vary cell size and the amount of surrounding logic, so I don't have a specific target density.

Going by rough pixel counting of a Zen 2 L3 in a CCX of area 31.3 mm2, there's ~1 MB/mm2, which if applied to the same area as an LS would give up to 5.5 MB in that area.
My rough estimate of the L2 of a Zen 2 core gives ~1.6 MB per mm2, or ~7.5 MB in the LS area.
Latency-wise, the L2 latency is 12 cycles, although that is additive to the 4 cycles of the L1.
https://en.wikichip.org/wiki/amd/microarchitectures/zen_2
On one hand, those figures include a lot of other logic for the cache tags and supporting hardware; on the other, they're less optimistic than going with the PR numbers given by foundries for their SRAM test cells.
TSMC had one with a cell area of .027um2, which would give over 21 MB if scaled without regard to the need for interface logic and other implementation choices.
I have not found a comparable example of a large storage memory with the latency range of an LS to know where the cut-off is for array latency versus arbitration or cache pipeline latency.
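To make those estimates concrete, here's the same arithmetic in one place, using the rough figures from this post (pixel-counted densities and an LS area of ~4.6 mm2), so treat the outputs as ballpark only:

Code:
# Ballpark SRAM capacity in an SPE-local-store-sized area, using the rough
# figures from the post above (not measured die data).
LS_AREA_MM2 = 4.6   # estimated Cell SPE local store area at 90nm
LS_SIZE_KB = 256

# Naive capacity doubling per full node, 90nm -> 7nm (7 doublings):
naive_mb = LS_SIZE_KB * 2 ** 7 / 1024
print(f"Naive node-doubling: {naive_mb:.0f} MB")          # -> 32 MB

# Density-based estimates applied to the same area:
densities_mb_per_mm2 = {
    "Zen 2 L3 (pixel-counted)": 1.0,   # ~1 MB/mm2
    "Zen 2 L2 (pixel-counted)": 1.6,   # ~1.6 MB/mm2
}
for label, density in densities_mb_per_mm2.items():
    print(f"{label}: ~{density * LS_AREA_MM2:.1f} MB")

# Foundry PR bitcell, ignoring interface/periphery logic entirely:
BITCELL_UM2 = 0.027
bits = LS_AREA_MM2 * 1e6 / BITCELL_UM2   # mm2 -> um2, one bit per cell
print(f"Ideal 0.027 um2 bitcells: ~{bits / 8 / 1e6:.0f} MB")  # -> ~21 MB

Depending on how the ~1 MB/mm2 figure is rounded, the L3-density case lands anywhere between ~4.6 MB and the ~5.5 MB quoted above; the point either way is that a modern process could fit an order of magnitude more than 256 KB in the same footprint, latency permitting.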
 
Seems to be a great showcase for consoles and GCN in general, but the PC version is apparently horrible on the Maxwell and Kepler GPUs.
https://www.pcgameshardware.de/Doom...ra-Nightmare-Tuning-Tipps-Benchmarks-1345721/

I'm not really interested in the game itself though, I disliked the overreliance on glory kills and the constant arena setup in Doom 2016, and it's supposedly even more encouraged in this game.

Old hardware eventually hits its limits. It makes little sense to be optimizing a game around a 2014 and older feature set, at this point.
 

Sure, but after seeing a GTX 770 being enough to almost always beat the consoles in performance, at slightly better graphics, and the more powerful Maxwell models doing the same for all these years, seeing them fall so far still feels lame.
Digital Foundry's test said the base consoles mostly correspond to medium, and PCGamesHardware tested at Ultra, so it might still be fine at console-equivalent settings.

I know you can't expect driver or engine optimizations for the older architectures forever, but that's also the slightly sad part of PC gaming. That said, low-end PC gaming has never been better than this generation IMO. People were super happy when I told them their aging GTX 770 and FX 8350 should still be superior to the base PS4 in RDR2.
 

Pascal owner here. It's not at all horrible; the game is still very good. It's certainly below its GCN competitors, but I chalk that up to GCN just being better.
 

One of my PCs is still on a 670 (yes, I know, 8 years old :p), and it does just about every game better than my base PS4 does. I consider that GPU lower than low end anno 2020. I tell you, things have been in a much worse state in other generations; people had to upgrade constantly to keep up. My Ti 500 had to be upgraded rather quickly to a 9700 Pro, and that 9700 had to be upgraded far too quickly as well.
A family member has a 7970 GHz Edition, and yes, its performance has held up better than my Kepler.
 
Same here on not being interested in the game itself. It has a lot of technical merits, especially performance-wise, but I found Doom 2016 to be really boring. I completed it, but I wanted it to end. Is it Doom? Well, it might be, but it doesn't feel like Doom. The glory kills are a meh for me; it's like teleporting with a cheat. I still miss certain design ideas of the original, but I gotta admit they did a good job by somewhat keeping the colored-keys feature of the original... somewhat.

Also, the idea of a super-invincible guy that even demons should fear is... meh. You are the artifact, it seems. In Doom 3, while they didn't keep the spirit of the original, they still managed to make certain things interesting, like the soul thing for the second run, which made the game more fun than the first run. But this Doom, I never wanted to replay it; I tried it recently with the excellent mod that adds dynamic resolution, but I didn't get hooked on the game again.
 

So, my takeaway from your interpretation list is that you would rate the PS5 as 6 crazy's out of 9? :p You dirty old man.

Regards,
SB
 