PlayStation 5 [PS5] [Release: November 12, 2020]

Right, just turning on more CUs doesn't get you much if the rest of the system doesn't change to support them. You'd need to increase available power and bandwidth as well. And even then you only gain 11%.
That was just hypothetical. Something they could use in a devkit, like a PSVR2 devkit somehow needing more power. But obviously they won't do that in a retail console.

Also, in a PSVR2 devkit they wouldn't necessarily need BC.
 
40 CUs at 2.23 GHz would give them 11.4 TFLOPS (max). But they would have to use faster GDDR6 chips.
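For reference, that's just the standard RDNA FP32 arithmetic (64 ALUs per CU, 2 ops per clock via FMA); a quick sanity check:

# Peak FP32 throughput: CUs x 64 ALUs x 2 ops/clock (FMA) x clock (GHz)
def peak_tflops(cus, ghz):
    return cus * 64 * 2 * ghz / 1000

print(peak_tflops(36, 2.23))  # ~10.28 TF, the retail PS5 figure
print(peak_tflops(40, 2.23))  # ~11.42 TF at the same clock, i.e. the ~11% gain (40/36)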

They'd probably have to drop clocks somewhat more often too. Their cooling system and power delivery have been designed to run at a constant level, and using more CUs would lead to more frequent and deeper throttling / un-boosting.

Chances are that you wouldn't even get the full benefit of those extra CUs even if you increased BW.

Actually, increased BW on its own may be a bigger benefit than more CUs...
 
That depends on how much power leakage Sony's hardware is experiencing at its current frequency.

Widening a chip is more costly but can turn power into performance more readily as the current is spread across more transistors operating at lower frequencies.
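A back-of-the-envelope way to see it, assuming dynamic power P ~ C*V^2*f with voltage scaling roughly with frequency, so P ~ f^3 (a common approximation, not a measured figure):

# Sketch: at a fixed power budget, doubling the width (transistor count)
# forces the clock down to f / 2^(1/3), but net throughput still rises.
f_narrow = 1.0                    # normalized clock of the narrow chip
f_wide = f_narrow / 2 ** (1 / 3)  # clock a doubled-width chip could hold
print(f_wide)                     # ~0.79: each unit clocks ~21% lower
print(2 * f_wide)                 # ~1.59x throughput for the same power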
 
This I don't understand. More BW would help 36 CUs, and 40 even more? Care to elaborate?

Yeah, the point was that since Sony are trying to run at a constant power and heat level, more CUs mean you have to lower clocks to stay within that envelope. So you'd be more likely to have to lower boost levels, meaning more frequent drops below 2.23 / 3.5 GHz, and deeper drops than you'd currently experience.

So it may still be a net win, but you wouldn't be getting the "full" 11% gain in performance that you'd typically associate with a CU-bottlenecked system maintaining the same clocks.
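As a rough sketch, assuming (purely for illustration, these are not measured PS5 figures) that GPU power scales linearly with active CUs and with roughly the cube of clock speed:

# Hypothetical: 40 CUs inside the same power budget as 36 CUs at 2.23 GHz.
f36 = 2.23
f40 = f36 * (36 / 40) ** (1 / 3)   # clock that keeps total power constant
tf36 = 36 * 64 * 2 * f36 / 1000    # ~10.28 TF
tf40 = 40 * 64 * 2 * f40 / 1000    # ~11.0 TF
print(f40)                         # ~2.15 GHz
print(tf40 / tf36 - 1)             # ~7% net gain, not the full 11%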

Yeah, but the point was that with PS5 you'd naturally get those lower frequencies due to the way they focused on a fixed power and heat level. So the actual gain of enabling those 4 extra CUs would be lower than you might expect.

Plus, with PS5, Sony have made a suspiciously big deal (i.e. marketing) about how narrow and fast is "better" than wide and slow because it makes everything on the GPU faster (kinda like MS did with the X1 up-clock). So they would presumably see this as a worse way to run the hardware.
 
Would an increased form factor and perhaps a better cooler help to manage the heat from 4 extra CUs? Maybe this is the point where things get a bit too costly, I guess.
 
I watched this recently and it is striking how the PS5 was designed around the philosophy that made Crash Bandicoot look a lot better than anything else on the PlayStation 1. Something that made developers at the time wonder if Naughty Dog had special treatment from Sony, with better tools.

He clearly said that they could take advantage of the CD's space to break the barrier of the tiny RAM. That sounds an awful lot like the SSD idea in the PS5.
While others were trying to fit the whole level plus all other game data into RAM (the PlayStation had only 2 MB of RAM and 1 MB of VRAM, according to him), Naughty Dog optimized the game to work with chunks, so a level could be, for example, 30 MB of data rather than 1 MB or so. The game streamed the level data in and out from the CD, which let them keep in memory only what was "nearest" to the player rather than everything, allowing a lot more detail to be shown.
This idea found its way to Jak and Daxter and later became a standard for pretty much every game with the help of hard drives, but obviously they eventually hit a wall with data seek times.
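In pseudocode the idea is something like this (a minimal sketch; the chunk layout and names are illustrative, not Naughty Dog's actual code):

# Keep only the level chunks nearest the player resident in RAM,
# streaming the rest in and out from the disc as the player moves.
RAM_BUDGET = 3                    # how many chunks fit in memory at once
resident = {}                     # chunk index -> loaded data

def load_chunk(index):
    return b"..."                 # stand-in for a seek + read from CD/SSD

def update_streaming(player_chunk):
    wanted = {player_chunk - 1, player_chunk, player_chunk + 1}
    for idx in list(resident):
        if idx not in wanted:     # evict chunks the player left behind
            del resident[idx]
    for idx in wanted:
        if idx not in resident and len(resident) < RAM_BUDGET:
            resident[idx] = load_chunk(idx)   # stream the new chunk in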

So, building on what they achieved with Crash (and later with Jak) to bump up the detail, Mark Cerny clearly wanted to make a next-gen version of this idea and optimize it to its maximum, so that you've got as much as possible (ideally all 16 GB) of unique, directly accessed data at any given time. The cache scrubbers are probably also optimized to clean out memory as fast as possible to make room for the new data coming in.

If this is significantly more optimized than what's on Xbox (with no large diminishing returns between the two solutions), we might actually see a lot more detailed worlds on the PS5.

Still very interesting that it is based on an idea that was born and used on the PS1 and was carried over 4 generations later.

 
It's been with us all along. It's why MGS4 had installs on PS3 and why there were eventually hard-drive-required games on 360. It's why this gen had mandatory hard drive installs. This is just a natural progression.
 
For years I argued that Crash on PS1 was really just a Taz Mania clone. Behind-the-back camera view, spin to attack, cartoony character... I was glad that Gavin basically said as much when he said he wanted to make a game inspired by Looney Tunes-style animation.
 
Nesh, I watched Cerny's GDC presentation. I suspect that because the audience was developers, he did not explain the implications of his creations much. That leaves curious enthusiasts puzzling and guessing over "where's the beef?" and "what's the big deal?"

Describing his creation as "much faster SSD streaming" may be underselling the concept. It sounds like what they have (or are working on) is a foundation that can "teleport" any asset on demand. The feature is built into the system, so any game can use it with ease. Hardware features like six priority levels (perhaps some reserved exclusively for the OS) and explicit DMA for the SPU-like audio kernel (which avoids audio chewing up bandwidth) may help ensure an upper bound on delivery time in "any" situation.
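For illustration only, a prioritized read queue might look like the sketch below; the API and names are invented, and the only sourced detail is the six priority levels:

import heapq

# Lower number = higher priority, so an urgent audio refill can jump
# ahead of a bulk texture prefetch. (Levels and names are hypothetical.)
class ReadQueue:
    def __init__(self):
        self._heap = []
        self._seq = 0                   # tie-breaker keeps FIFO order

    def submit(self, priority, path):
        assert 0 <= priority <= 5       # six levels, per Cerny
        heapq.heappush(self._heap, (priority, self._seq, path))
        self._seq += 1

    def next_request(self):
        return heapq.heappop(self._heap)[2]

q = ReadQueue()
q.submit(4, "textures/level3.bulk")     # background prefetch
q.submit(0, "audio/ambience.chunk")     # urgent, must not starve
print(q.next_request())                 # audio/ambience.chunk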

In that sense, people might want to explore at least these 2 areas:

1. Saving developers' time designing and building complex game worlds, so they can devote more time to optimizing the game (visuals?). The other way to see this is that it lowers the barrier significantly, potentially allowing junior developers or even gamers themselves to create complex game worlds. This gels better with the Create idea Cerny mentioned briefly.

2. Game worlds can be vast and without bottlenecks. But that would still be "evolutionary". Beyond a persistent, destructible world, would it be possible to let users in any game create/modify their own game world? Not with the current-gen brick-by-brick building model, which is too time-consuming for busy people, but with everyday tools like your cellphone and specialized tools like the LiDAR camera in the iPad Pro.

Clearly this is a big ask. All the parts and people need to be aligned. It won't work if it's merely a "faster GPU" or a "super fast SSD". They have to fit together in the right way to _guarantee_ some service level, and also to make it _general_ at the game-world level (instead of the semiconductor level). That is the hard part.

I got too busy and left gaming years ago. Forgot all my PSN and forum passwords. Dropped back into GAF last week because of the COVID-19 lockdown (I suddenly remembered my password and said "what the heck"). I am unlikely to play more games if it's the same old "my picture is clearer than yours" battle.

However I’m still interested in technologies. So perhaps I might buy a console to check it out later.

EDIT:
You won't get the same effect if every part is "straightforwardly" powerful and balanced. The system needs to be well planned, _efficient_, "segregated", and to allow fine-grained control over common issues/bottlenecks to help tackle or alleviate edge cases. The designers need to be able to see around corners to build this thing, which may be partly why it's tricky to understand and appreciate at this moment.
 
I watched this recently and it is striking how the PS5 was designed around the philosophy that made Crash Bandicoot look a lot better than anything else on the PlayStation 1. Something that made developers at the time wonder if Naughty Dog had special treatment from Sony, with better tools.
Besides cost, making life easier for Sony's first-party devs is the reason the PS5 is the way it is. Multiplatform devs are important as well, but they're not the reason the PS5 exists the way it does.
 
For years I argued that Crash on PS1 was really just a Taz Mania clone. Behind-the-back camera view, spin to attack, cartoony character... I was glad that Gavin basically said as much when he said he wanted to make a game inspired by Looney Tunes-style animation.
Well, there is a similarity, but this one looks like a racing game rather than a platform game. I'd say there is a lot more depth and variety in Crash, which combined a lot more elements.
The spinning attack in Taz acted a lot like Sonic's dash: it was both an attack and a movement accelerator. Crash's spinning was purely an attack.
They were probably inspired by Taz a bit, but I wouldn't call it a clone. He mentioned Looney Tunes, but he also mentioned another important old animator.
In general, making an expressive cartoon character in the style of the old animations was the perfect decision for any animal cartoon character. Crash is not the crazy, lunatic Tasmanian Devil that wants to destroy and eat everything; apart from the spinning, he is a unique character in his own right.
Crash's camera angles appear to be a conscious decision that made it easier to optimize the game around the streaming idea. For example, he said that letting Crash move in more directions would have meant branching the data access.
 
Well, you know it's wrong. He talked about SPUs too. So Cell is double confirmed to be in there. Doing audio, and, of course, all the RT calculations.

Nah... SPU-like.

I think he meant the custom CU uses a similar computing model as the Cell's SPUs (async DMA batch operation). It sounds more like a regular GPU CU with some tweaks.
 
Not with only two wavefronts at a time; I doubt it is like a CU. This is more like an SPU with some hyperthreading, but the GPU is doing the ray tracing part.

https://www.eurogamer.net/articles/digitalfoundry-2020-playstation-5-the-mark-cerny-tech-deep-dive

"GPUs process hundreds or even thousands of wavefronts; the Tempest engine supports two," explains Mark Cerny. "One wavefront is for the 3D audio and other system functionality, and one is for the game. Bandwidth-wise, the Tempest engine can use over 20GB/s, but we have to be a little careful because we don't want the audio to take a notch out of the graphics processing. If the audio processing uses too much bandwidth, that can have a deleterious effect if the graphics processing happens to want to saturate the system bandwidth at the same time."

Essentially, the GPU is based on the principle of parallelism - the idea of running many tasks (or waves) simultaneously. The Tempest engine is much more serial-like in nature, meaning that there's no need for attached memory caches. "When using the Tempest engine, we DMA in the data, we process it, and we DMA it back out again; this is exactly what happens on the SPUs on PlayStation 3," Cerny adds. "It's a very different model from what the GPU does; the GPU has caches, which are wonderful in some ways but also can result in stalling when it is waiting for the cache line to get filled. GPUs also have stalls for other reasons, there are many stages in a GPU pipeline and each stage needs to supply the next. As a result, with the GPU if you're getting 40 per cent VALU utilisation, you're doing pretty damn well. By contrast, with the Tempest engine and its asynchronous DMA model, the target is to achieve 100 percent VALU utilisation in key pieces of code."
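In code terms, what Cerny describes is a batch pipeline rather than a cached memory hierarchy. A rough sketch of that DMA-in / process / DMA-out model (function names are invented, not an actual Tempest API):

# Explicitly DMA a batch into local scratch memory, process it, then DMA
# the results back out. With no caches there are no cache-line stalls;
# the aim is to keep the vector ALUs busy close to 100% of the time.
def dma_in(src_addr, size):
    return [0.0] * size               # stand-in for the inbound transfer

def dma_out(dst_addr, data):
    pass                              # stand-in for the write-back

def process_audio_batch(src, dst, size):
    scratch = dma_in(src, size)               # 1. pull the batch in
    scratch = [s * 0.5 for s in scratch]      # 2. crunch it (e.g. mixing/HRTF)
    dma_out(dst, scratch)                     # 3. push the results back out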
 
Even more curious, we have a performance estimate for it of 100-200 GFLOPS (whether going by CPU-equivalent or CU performance). Cell would have been tiny and effective and more powerful, would have allowed PS3 BC, and could even have been usable for other things. I wonder if it was considered? Did they see the value but find it too difficult/costly to adapt Cell to an AMD SoC? Or was it not even on the table, and if not, why not?
 