PlayStation 5 [PS5] [Release November 12, 2020]

I wouldn't think that the CPU would take well to massively increased latency, and I expect that the SSD will already be quite busy using all of its available bandwidth for moving data into RAM (the thing it's actually designed for). This seems like a bad idea, even if it were technically possible.
I specifically said the CPU or GPU could avoid making access requests to the RAM by using the SSD directly on code that is less latency sensitive.
For example, a small 100KB script that determines enemy behavior runs on the CPU and fits in a small portion of the L3. Does it ever need to go through the main system RAM?
On the GPU side, with 5.5GB/s of mass storage I/O, do you really need to put the ~2MB/s (H.265 4K + Dolby Digital) video for sharing through the main system RAM after it comes out of the encoder? And we know that live recording used to gobble away RAM like hell on these last gens.



I get the need to try to address the older memory issues that have cropped up in the past. But those may have been issues with the memory controller and not just the fact that they were trying to request data at the same time. From a software development perspective I have huge reservations about using the SSD like a RAM drive. From a hardware perspective I would have huge reservations too, because we don't know its random access speeds or its latency, and the more you use it like a RAM drive the more concerned I would be about heat.

We don't know what the characteristics of this SSD are in terms of heat, but I can only imagine it's going to be running hotter than its PC counterparts at this moment, just looking at the raw throughput numbers. So with that combined with a highly clocked SoC running at maximum power, I have some reservations about how that SSD can be used.

Whatever the heat output of the SSD is, regardless of its load, I'm pretty sure Sony will have adjusted their cooling capacity accordingly.
If the PS5's SoC is consuming around 250W (as it might be), I don't think they'd let the I/O performance be hampered by needing to cool another 15W from the storage subsystem.
Especially not after Sony made so much noise about their storage solution.
 
After seeing the additional DF video, it's fascinating that SSD latency is only a couple of milliseconds. If I understand that right, it means that even at 60fps you can load roughly 92MB-200MB of fresh data with only one frame of latency, depending on compression. I imagine this has a big impact on how efficiently the available RAM is used, and, as was also discussed, you don't need to keep nearly as many assets in memory as you used to. That should more than make up for the relatively minor boost in RAM the PS5 got vs the PS4.

I also understand that older games do not benefit as much from these improvements, but even they will fill up a PS4's worth of RAM in about 1 second, so load times should be fantastic even on older games.
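Back-of-the-envelope math on that range: at 60fps a frame is ~16.7ms, so the per-frame budget is just throughput divided by 60. The 5.5GB/s figure is the quoted raw rate; the 9 and 12GB/s values below are my own assumptions for compressed throughput (Sony only quoted a "typical" 8-9GB/s), picked to show where the 150-200MB end of the range would come from.

```python
# Per-frame streaming budget at 60fps for a few assumed SSD throughputs.
# 5.5 GB/s is the quoted raw figure; the higher numbers are assumed compressed rates.
frame_time_s = 1 / 60

for gb_per_s in (5.5, 9.0, 12.0):
    mb_per_frame = gb_per_s * 1000 * frame_time_s  # treating 1 GB as 1000 MB
    print(f"{gb_per_s:4.1f} GB/s -> {mb_per_frame:5.1f} MB per frame")

# Prints roughly 91.7, 150.0 and 200.0 MB per frame.
```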
 
After seeing the additional DF video, it's fascinating that SSD latency is only a couple of milliseconds. If I understand that right, it means that even at 60fps you can load roughly 92MB-200MB of fresh data with only one frame of latency, depending on compression. I imagine this has a big impact on how efficiently the available RAM is used, and, as was also discussed, you don't need to keep nearly as many assets in memory as you used to. That should more than make up for the relatively minor boost in RAM the PS5 got vs the PS4.

I also understand that older games do not benefit as much from these improvements, but even they will fill up a PS4's worth of RAM in about 1 second, so load times should be fantastic even on older games.
PS4 games are LZ compressed, while the big decompressor on the PS5 is Kraken. They certainly didn't make an LZ decompressor in hardware just for BC, so I wonder what the solution is here. The CPU? Recompression on install? And what happened to the JPEG blocks?
 
After seeing the additional DF video, it's fascinating that SSD latency is only a couple of milliseconds. If I understand that right, it means that even at 60fps you can load roughly 92MB-200MB of fresh data with only one frame of latency, depending on compression. I imagine this has a big impact on how efficiently the available RAM is used, and, as was also discussed, you don't need to keep nearly as many assets in memory as you used to. That should more than make up for the relatively minor boost in RAM the PS5 got vs the PS4.

I also understand that older games do not benefit as much from these improvements, but even they will fill up a PS4's worth of RAM in about 1 second, so load times should be fantastic even on older games.

Milliseconds? That’s HDD level latency
 
PS4 games are LZ compressed, while the big decompressor on the PS5 is Kraken. They certainly didn't make an LZ decompressor in hardware just for BC, so I wonder what the solution is here. The CPU? Recompression on install? And what happened to the JPEG blocks?

Use a Zen core? It's not like a BC game needs all the performance of the PS5 CPU.

Plus, for just basic BC you only need to handle ~100 MB/s of compressed data off the SSD, not 5.5 GB/s.

JPEG/LZ decompression may still be part of the DMA engines in the APU, if the PS4 setup was similar to the XB1.
 
Milliseconds? That’s HDD level latency

"so your 1gb fibre have milliseconds of latency? that is 56k modem level latency"

If you meant that both use the same unit, then it doesn't make any sense.

Even if the SSD is 1ms and the HDD is 5-10ms, that's 5-10x faster. Is that nothing to you?

https://en.m.wikipedia.org/wiki/IOPS And SSDs destroy HDDs on the number of I/O operations per second.

Your comment just sounds like a "milliseconds? that is shit!" kind of comment, while it looks like most SSDs have a latency of one to a few ms.

Would it have to be nanoseconds to satisfy you, or what is your point? I'm not sure if you are trying to say that SSDs are not good enough vs HDDs, if you are trying to talk down Sony's SSD solution, or if you just don't know what you are talking about?
 
On the GPU side, with 5.5GB/s of mass storage I/O, do you really need to put the ~2MB/s (H.265 4K + Dolby Digital) video for sharing through the main system RAM after it comes out of the encoder? And we know that live recording used to gobble away RAM like hell.

Constantly streaming data to the SSD doesn’t seem like a great way to ensure a long life of that SSD.
 
"so your 1gb fibre have milliseconds of latency? that is 56k modem level latency"

If you meant that both use the same unit, then it doesn't make any sense.

Even if the SSD is 1ms and the HDD is 5-10ms, that's 5-10x faster. Is that nothing to you?

https://en.m.wikipedia.org/wiki/IOPS And SSDs destroy HDDs on the number of I/O operations per second.

Your comment just sounds like a "milliseconds? that is shit!" kind of comment, while it looks like most SSDs have a latency of one to a few ms.

Would it have to be nanoseconds to satisfy you, or what is your point? I'm not sure if you are trying to say that SSDs are not good enough vs HDDs, if you are trying to talk down Sony's SSD solution, or if you just don't know what you are talking about?

No. My understanding of NVMe SSDs' latency is that it is measured in microseconds, not milliseconds.

Unless these decompression schemes are greatly increasing latency rates, you are doing a disservice to SSD performance.

LOL
 
No. My understanding of NVMe SSDs' latency is that it is measured in microseconds, not milliseconds.

Unless these decompression schemes are greatly increasing latency rates, you are doing a disservice to SSD performance.

LOL
You had a surprising amount of restraint here. I would have blasted off after being called out like that. haha

Since we're on the topic:
The NVMe latency is approximately 60µs; a standard SSD is about 175µs.
Standard system memory is about 8ns to 20ns.

So: 60µs is 60,000ns.
Over 1,000x slower than system RAM.
I specifically said the CPU or GPU could avoid making access requests to the RAM by using the SSD directly on code that is less latency sensitive.
For example, a small 100KB script that determines enemy behavior runs on the CPU and fits in a small portion of the L3. Does it ever need to go through the main system RAM?
Even if we forget about the heat, the latency of waiting on the SSD will stall the whole process if you just look at the numbers above.
So everything is running full tilt on the CPU and GPU, working with already-slow memory (8ns-24ns latency), and then you've got to wait 60,000ns for a piece of data to come in. You're going to stall. Using your earlier analogies, that's like getting all your furniture pieces from an Ikea in another country, but waiting for the screws and pegs to arrive from the Moon.
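For what it's worth, here's the ratio implied by those numbers; the 60µs and 8-20ns figures are my rough assumptions for typical parts, not anything measured on a PS5.

```python
# Ratio of an assumed ~60µs NVMe read latency to an assumed 8-20ns DRAM latency.
nvme_ns = 60_000
dram_ns = (8, 20)

print([nvme_ns // d for d in dram_ns])  # [7500, 3000] -> thousands of times slower
```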

I'm not sure why that picture is there, somehow I managed to copy and paste that into this message without... seeing any BB code. But I guess since I can't remove it, hurray for VRS tier 2 improvements.
 

I specifically said the CPU or GPU could avoid making access requests to the RAM by using the SSD directly on code that is less latency sensitive.
For example, a small 100KB script that determines enemy behavior runs on the CPU and fits in a small portion of the L3. Does it ever need to go through the main system RAM?
On the GPU side, with 5.5GB/s of mass storage I/O, do you really need to put the ~2MB/s (H.265 4K + Dolby Digital) video for sharing through the main system RAM after it comes out of the encoder? And we know that live recording used to gobble away RAM like hell on these last gens.

Why would you want to do this? It's what RAM is for and if you're talking about small amounts of bandwidth it's just a drop in the bucket anyway. I don't see the advantage.
 
This is a case where the rasterizer is at maximum efficiency. Simple geometry equals maximum rasterizer efficiency. This is the reason Mark Cerny said simple geometry pushes the GPU further than complex geometry. It's as simple as that, but many people seem to forget how rasterizing works.

[Slide from "Optimizing the Graphics Pipeline with Compute" (GDC 2016)]
It's a great visual example to explain these edge cases where staring at a wall blows your PC up.
 
Was the thermal density of the CPU/GPU discussed before?
It's come up before when discussing the downsides of node shrinks and their effect on performance scaling and cooling solutions. I'm not sure what to make of the claim about creating a whole set of clock points that lead to equal thermal density. I can think of some advantages to it for the purposes of reliability, but I'm not sure what quirk of the chip or the cooler would make equal density a specific target like that.

These "cache scrubbers" on the PS5 I keep hearing about... can someone please explain to me what they are? I keep hearing that they are there to partially mitigate bandwidth issues, but how does that work, if that's indeed what they do?
Cerny discussed the scenario where the SSD loads new data into memory, and the GPU's caches have old versions of the data. The standard way of dealing with that is to clear the GPU's caches, while the PS5's coherence engines send the affected addresses to the GPU, and the cache scrubbers are able to scan through and clear those specific addresses from all the caches.

While there could be a bandwidth element, I think the big motivator is performance, like the volatile flag was for the PS4. Cache flushes involve clearing caches of most or all of their contents, even if only a little of it was problematic. That hurts the performance of everything else running on the GPU by forcing extra writes to memory and then forcing data to be read back in. The operation itself is also long because there are many cache lines, and the process involves stalling the whole GPU while it is going on.

Presumably, the PS5 minimizes the amount of stalling or avoids stalling in many cases. However, if the goal were only reducing bandwidth, stalling the GPU would reduce it more.
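This is not how the hardware actually does it, but here's a toy sketch of the difference between a full flush and targeted scrubbing. The ToyCache class, the 64-byte line size, and the numbers in the example are all made up for illustration:

```python
LINE_SIZE = 64  # assumed cache line size in bytes

class ToyCache:
    """Toy model of a GPU cache: maps line-aligned addresses to cached data."""

    def __init__(self):
        self.lines = {}

    def full_flush(self):
        # Conventional approach: drop everything, even lines the SSD
        # transfer never touched. All of it has to be re-read from RAM later.
        evicted = len(self.lines)
        self.lines.clear()
        return evicted

    def scrub(self, overwritten_ranges):
        # Scrubber-style approach: invalidate only the lines that overlap the
        # address ranges flagged as freshly overwritten by the SSD transfer.
        evicted = 0
        for start, length in overwritten_ranges:
            first = start - (start % LINE_SIZE)
            for addr in range(first, start + length, LINE_SIZE):
                if self.lines.pop(addr, None) is not None:
                    evicted += 1
        return evicted

cache = ToyCache()
cache.lines = {addr: b"data" for addr in range(0, 1 << 20, LINE_SIZE)}  # 1MB of hot lines
print(cache.scrub([(0x40000, 128 * 1024)]))  # 2048 lines invalidated by the scrub
print(cache.full_flush())                    # vs 14336 lines a full flush still dumps
```

The point is just that the scrubber path throws away only the lines that actually went stale, so the rest of the GPU's working set stays hot.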

I am curious if the 36 CU decision is related to BC, though. It's exactly the same as the PS4 Pro, and Sony's backwards compatibility has often been achieved by downclocking and handling workloads in as close to the original hardware configurations as possible.
I dunno. The GPU has cache scrubbers, dynamic boost clocks, and built-in custom BC hardware. The CPU is a server-derived core that in other situations could turbo to just short of 5 GHz, and the PS5 included timing modifications (firmware or hardware?) to make it act more like a netbook processor from 7 years ago on the fly. All this in a many-core SoC with many custom processors and a transistor budget likely greater than 10 billion.
The GPU and CPU are capable of virtualizing their resources or changing modes to pretend to be something else, or to have different clocks on a whim.
I'd worry if the designers were stumped by the number 36.

I wouldn't think that the CPU would take well to massively increased latency, and I expect that the SSD will already be quite busy using all of its available bandwidth for moving data into RAM (the thing it's actually designed for). This seems like a bad idea, even if it were technically possible.
Intel has Data Direct I/O, which allows for network interfaces to load directly to the L3. I haven't seen an AMD version of it, or they gave it a different name and I failed to find it. There might be more benefit with the network since there's more to packet processing versus a local disk read.

The difference between a typical game and Furmark is very significant, to the point where it is recommended not to even RUN Furmark because it might damage your GPU.

It is not unrealistic to have some parts of code that will run outside the TDP of any hardware, and devs should absolutely make sure their code isn't doing this for long periods.
I think it's a mark of flawed hardware if Furmark can damage it. Modern transistor budgets are too large at this point for a design to just hope something doesn't accidentally use an extra fraction of many billions of transistors, particularly with devices that strive for extra circuit performance. I also don't know about the vendors' habit of declaring any application that makes them look bad thermally a "power virus". There are enough examples of accidental power-virus applications, or ones that just happen to be a power virus for one product configuration out of millions, or because of interactions with one driver revision. Nor is it a guarantee that code that's safe today won't cause problems in future hardware, if it removes some bottleneck that was holding things back.


So GPU manufacturers like Nvidia and AMD don't actually know the max power draw for their parts? What they're giving out is some kind of estimate of power draw?
For marketing purposes, and because silicon performance has not been scaling well since the Pentium 4, vendors have been cutting back on guard-bands for voltage and clock, and they've added billions of transistors. There are electrical specifications that outline normal operation, and then theoretical limits and peak transients, with the caveat that those have error bars as well. Regular operation can temporarily lead to spikes considered too short to be relevant, but they happen. Covering for transients means the power delivery system is capable of amounts that would be dangerous if sustained, but to discard that would be to accept far fewer transistors and much of the clock scaling over the last 7 years.

But wouldn't this just go back to hardware design? At least do synthetic tests that just blitz the worst-case functions of the GPU, and either have dynamic clocking that can handle it, or pick a fixed clock that can sustain it?
Analysis like that becomes intractable at the level of complexity of modern SoCs, plus the extra turbo and boost measures, DVFS, and power gating. It's much easier to get caught out in a scenario where some combination of hundreds of independent blocks hits a peak, particularly if it's in one of hundreds of clock/voltage points running who knows what sequence of operations that can have physical side effects. Power gating is an example of something that can be brutal in terms of electrical behavior when units are re-connected to power, so it's also a combination of what state the hardware was in before a worst-case function, or maybe what it goes to after.
The gap between average and worst-case is vast, and it only takes a fraction of the chip to breach limits. Due to less than optimal power scaling, the number of transistors needed to breach a given power budget grows slowly, while each new node adds up to 2x the number of potential candidates.

That may have been inspired by the overheating troubles of the original and going with an over-engineered solution. The most interesting data point is power draw by games. Too Human is what, >85%? Does that point to something like 85% being the expected power draw for a PS5 game? :???:
It's 85% of the "power virus" limit. This also assumes that the power virus was the best one they had, or that someone didn't find something even better later. It was easier to craft a power virus when the number of units or buses was in the single digits. The Xbox Series X has over 40x the transistor count of that earlier console.

I specifically said the CPU or GPU could avoid making access requests to the RAM by using the SSD directly on code that is less latency sensitive.
For example, a small 100KB script that determines enemy behavior runs on the CPU and fits in a small portion of the L3. Does it ever need to go through the main system RAM?
If going by the definition of a cache, strictly speaking it does need to go to RAM at least initially. There are specific features for specific products, like Intel Xeons, that try to change that, but it's not simple and there can be potential issues that aren't worth the trouble for a console.
 
You had a surprising amount of restraint here. I would have blasted off after being called out like that. haha

Since we're on the topic:
The NVMe latency is approximately 60µs; a standard SSD is about 175µs.
Standard system memory is about 8ns to 20ns.

So: 60µs is 60,000ns.
Over 1,000x slower than system RAM.

Even if we forget about the heat, the latency of waiting on the SSD will stall the whole process if you just look at the numbers above.
So everything is running full tilt on the CPU and GPU, working with already-slow memory (8ns-24ns latency), and then you've got to wait 60,000ns for a piece of data to come in. You're going to stall. Using your earlier analogies, that's like getting all your furniture pieces from an Ikea in another country, but waiting for the screws and pegs to arrive from the Moon.

I'm not sure why that picture is there, somehow I managed to copy and paste that into this message without... seeing any BB code. But I guess since I can't remove it, hurray for VRS tier 2 improvements.
Latency reference
https://colin-scott.github.io/personal_website/research/interactive_latency.html
 
So the solution for better code is making it way harder to port?

No. Going exclusive.
And I'm not talking about harder, just not similar.
If you have two very similar platforms, the most viable business solution would be to make a game that targets the lowest common denominator of both.

Standard system memory is about 8ns to 20ns.

Nope. Real GDDR is about 100-200ns.

The NVMe latency is approximately 60µs; a standard SSD is about 175µs.

Flash arrays with custom FTLs (what we have in the PS5) get to ~1000ns of latency (with peaks below 2000ns for garbage collection).
The difference is only about 10x.
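Quick sanity check on that, taking the ~1000ns FTL figure and the 100-200ns GDDR figure above at face value:

```python
flash_ns = 1000        # claimed latency of a flash array with a custom FTL
gddr_ns = (100, 200)   # claimed real-world GDDR latency range

print([flash_ns / g for g in gddr_ns])  # [10.0, 5.0] -> a 5-10x gap
```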
 
I'm curious if there's a specific scenario or removal of other overheads to get an SSD with 16µs latency. I've seen typical marketing numbers that start at several times that, and benchmarks that get around 100µs while still being considered good.
99th percentile times for many SSDs often get into the ms range, which isn't as bad as a hard disk but non-trivial if thinking in terms of a frame budget for a high-FPS game.
The cost-sensitive QLC drives seem to be in a race to see how close they can get to HDD-like timings in the worst case while still being able to say they're better than an HDD.

Sony's 6 levels of access priority may have something to do with avoiding conflicts and delays due to garbage collection, but even if the main SSD has better performance consistency, can the PS5 somehow make the third-party M.2 upgrade behave consistently as well?
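To put the tail-latency point in frame-budget terms, here's a quick sketch with hypothetical 99th-percentile numbers (the 1ms and 3ms values are just examples, not measurements):

```python
# Fraction of a frame consumed by one worst-case (tail latency) read.
for fps in (60, 120):
    frame_ms = 1000 / fps
    for tail_ms in (1.0, 3.0):  # hypothetical 99th-percentile read latencies
        print(f"{tail_ms}ms tail at {fps}fps = {tail_ms / frame_ms:.0%} of the frame")

# e.g. 1ms is ~6% of a 60fps frame, while 3ms is ~36% of a 120fps frame.
```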
 
Milliseconds? That’s HDD level latency

I think Digital Foundry gave about 250ms for HDD vs ‘a couple’ of ms for SSD. If the SSD can be faster, that doesn't really change my point though.
 