PlayStation 5 [PS5] [Release: November 12, 2020]

Interesting that they use aluminium for the back shield in order to get better thermal conductivity, and a different kind of thermal paste for the GDDR6 chips.

We didn't miss it.
Ohhh, I took it from the wording of the article (or the translation) that they were implying there was an additional heatsink that wasn't shown in the teardown video.
 
That's because the PS5's SoC "basically runs at almost full power during gaming" (he says). As a result, TDP (Thermal Design Power) values and the amount of heat generated during gaming are "about the same". On the other hand, it is rare for a PS4 SoC to operate at the very edge of TDP, and even when gaming, it generates only a few percent of its TDP.
Perhaps something was lost in translation, a misquote, or he misspoke about the PS4 SOC only generating a few percent of its TDP. Having something like a 90%+ power margin would have had significant implications for how far Sony could have pushed clocks or how quiet the PS4 could have been. That text doesn't mesh with the PS4 being shown to pull 100-150W in games.


Interview with Otori VP in Japanese (2/2)

 To cool both sides of the main board, the PS5's cooling fan is 45 mm thick, which is thicker than the current PS4 and PS4 Pro. If we divide the SoC-mounted side of the PS5 into "Side A" and the back of the PS5 into "Side B," then the heat emitted from Side B is "equivalent to that of the PS4's SoC," according to Mr. Otori. Therefore, the air is sucked in from both sides of the cooling fan to cool the A and B sides of the main board.
I'm not sure about this figure for the side opposite the SOC pulling as much as the PS4 SOC. It may depend on what figure he's using for the PS4, but since the PS4's been measured pulling 100-150W, the upper range could leave 100W+ for the SOC, while the lower could leave 50-60W. The elements with the most obvious thermal compound are the GDDR6 modules, the DC converters, and a few other components (3 NAND modules, and a few ICs near the IO ports with silver "towers" that likely put them in contact with the shield).
From the following, we could probably assume ~2.5W per GDDR6 module, or ~20W for the 8 modules.
https://www.eeweb.com/high-bandwidth-memory-ready-for-ai-prime-time-hbm2e-vs-gddr6/

That leaves 30-80W for everything else on that side, which I cannot account for. The global power conversion losses for the whole console should probably not be worse than roughly 30W at 10% inefficiency, and that's including the power supply itself and the more substantial number of conversion ICs on the SOC side.

The big glob of thermal compound over much of the converter ICs also makes me think that something on the upper range of that estimate would have closer contact with the heatpipe.
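To make the arithmetic above explicit, here's a minimal sketch of the budget, assuming the 100-150W measured PS4 draw, the ~2.5W-per-module GDDR6 figure from the EEWeb link, and a ~30W conversion-loss ceiling; all of these are the rough estimates from this post, not measured values.

```python
# Rough back-of-the-envelope budget for the "Side B dissipates as much as a
# PS4 SoC" claim. Inputs are the estimates discussed above, not measurements:
# measured PS4 wall draw is 100-150 W, so its SOC plausibly takes 50-100+ W.

GDDR6_PER_MODULE_W = 2.5        # from the EEWeb GDDR6 article linked above
GDDR6_MODULES = 8               # GDDR6 modules on Side B
CONVERSION_LOSS_W = 30          # ~10% inefficiency ceiling for the whole console

gddr6_total_w = GDDR6_PER_MODULE_W * GDDR6_MODULES   # ~20 W

for side_b_claim_w in (50, 100):  # low/high ends of the assumed PS4-SOC-like draw
    unaccounted_w = side_b_claim_w - gddr6_total_w
    print(f"Side B at {side_b_claim_w} W: GDDR6 covers ~{gddr6_total_w:.0f} W, "
          f"leaving ~{unaccounted_w:.0f} W for converters/NAND/misc ICs "
          f"(whole-console conversion losses should be under ~{CONVERSION_LOSS_W} W).")
```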

There were other structural features as well. One example is the thermal conduction between the GDDR6 memory mounted on the B side of the main board and the shield plate. Instead of the so-called 'stick-on' sheet-type thermal conductor, it is coated with a liquid material that hardens like rubber after a short time. This is a measure to increase productivity in response to automation.

In the case of the sheet-type heat-conductive materials, it is difficult for an automated machine to peel them off the backing sheet, so they have to be applied by hand. The PS5 uses the liquid type for almost all of its thermal conductive materials, whereas the PS4 series used it for only some of them.
The thick material over the GDDR6 also hints that it's likely not dissipating that much heat. The reference to this choice allowing for more automation may also be a data point for my perception that the PS5's physical design placed a higher emphasis on mass production and compatibility with tooling.

When Sony said in the teardown video that work on a metal TIM began about 2 years ago, it did occur to me that about two years ago would have been when they were having frequency issues with fixed clocks and moving to take advantage of boosting and AMD SmartShift. It's interesting how one decision may drive innovation in another area.
2 years could point to when they had to commit to a physical SOC and node. Prior to 2018, perhaps there was some uncertainty about the 7nm node variant, since it seems AMD has at this point used the same N7P node for all its chips, despite all the speculation at the time about 7nm variants that could have provided additional performance.
The smaller-die strategy would have been a candidate the whole time, and I think the characterization of the processes at the time would have given Sony a decent idea of the risks in terms of power density and whatnot.
Perhaps it came down to cost in the end for any alternatives, be it the Oberon SOC on a different node variant or a larger SOC with lower clocks.

I think it's still worth lending some credence to Cerny's claim that they historically had trouble predicting power consumption and that power demands were spiky. At least it would have been more expensive to get the SOC to similar levels of performance without falling back to the silicon's self-management.


Official numbers are 560 GB/s vs 448 GB/s.
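For reference, those official figures fall straight out of bus width times GDDR6 data rate (14 Gbps modules on both consoles, 320-bit vs 256-bit buses); a quick check:

```python
# Peak bandwidth = bus width (bits) x data rate (Gbps) / 8 bits per byte.
# Both consoles use 14 Gbps GDDR6; XSX has a 320-bit bus, PS5 a 256-bit bus.

def peak_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print("XSX:", peak_gb_per_s(320, 14), "GB/s")  # 560.0
print("PS5:", peak_gb_per_s(256, 14), "GB/s")  # 448.0
```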

The whole debate around split pools I would generally ignore; it's largely speculation and not representative of what Scarlett is doing, which is asymmetrical memory sizes that allow the CPU and GPU to access at different speeds.
Memory size doesn't really govern the access speed of either the GPU or CPU. The CPU likely can only physically consume the same amount of bandwidth regardless of which slice of the memory space it is reading from, whereas the GPU has a substantially broader interconnect, so it can be constrained if its accesses fall outside of the GPU-optimized zone.

Don't forget the specific memory configuration on XSX. When the CPU is used (using the slower pool of memory) then it will reduce the total available bandwidth, on top of the regular memory contention.
Memory is memory. If the CPU consumes X amount of bandwidth, it's X amount from the total available for either console.
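A minimal sketch of that "memory is memory" point, using the official peak figures and a couple of hypothetical CPU bandwidth demands; real contention is worse than plain subtraction because of DRAM efficiency losses and arbitration overhead, so treat this as the simplest possible model.

```python
# Minimal "memory is memory" contention model, per the post above.
# Assumes published peak figures (560 GB/s XSX, 448 GB/s PS5) and ignores
# DRAM efficiency, page conflicts, and arbitration, all of which make real
# contention costlier than simple subtraction.

def gpu_bandwidth_left(peak_gb_s: float, cpu_demand_gb_s: float) -> float:
    """Bandwidth remaining for the GPU once the CPU takes its share."""
    return max(peak_gb_s - cpu_demand_gb_s, 0.0)

for console, peak in (("XSX", 560.0), ("PS5", 448.0)):
    for cpu_demand in (20.0, 40.0):  # hypothetical CPU bandwidth demands
        print(f"{console}: CPU using {cpu_demand:.0f} GB/s leaves "
              f"~{gpu_bandwidth_left(peak, cpu_demand):.0f} GB/s for the GPU")
```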

I think you mean L1, since that's what's shared among the CUs in a shader array.
L2 is matched to bus width on RDNA. So there's going to be much more L2 available. L2 is 5MB on XSX.
The number of L2 slices is generally matched to the bus width, but the capacity per slice is not. In theory, a narrower bus could have more L2 if the design opted to increase slice capacity. That could offset some of the downsides of running higher clocks relative to memory speed and save some bandwidth, but the cost argument would go against it.
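As a concrete illustration of that slice arithmetic, here's a tiny sketch assuming one L2 slice per 16-bit GDDR6 channel (as on Navi 10); the per-slice capacities other than 256 KB are hypothetical for the PS5, whose L2 size hasn't been confirmed.

```python
# Sketch of the "slices track bus width, capacity per slice doesn't" point.
# Assumes one L2 slice per 16-bit GDDR6 channel, as on Navi 10; the PS5's
# actual L2 capacity is unconfirmed, so the non-256 KB cases are hypothetical.

CHANNEL_WIDTH_BITS = 16

def l2_capacity_mb(bus_width_bits: int, kb_per_slice: int) -> float:
    slices = bus_width_bits // CHANNEL_WIDTH_BITS
    return slices * kb_per_slice / 1024

print("XSX (320-bit, 256 KB/slice):", l2_capacity_mb(320, 256), "MB")  # 5 MB, as stated
print("PS5 (256-bit, 256 KB/slice):", l2_capacity_mb(256, 256), "MB")  # 4 MB if Navi 10-like
print("PS5 (256-bit, 512 KB/slice):", l2_capacity_mb(256, 512), "MB")  # 8 MB if slices were doubled
```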
 
Why is this an either/or statement? Why wouldn't Sony be trying to reserve as many PS5 consoles as possible for launch day?
I don't think Sony is in the business of keeping warehouses filled with brand new PS5 consoles just because.



Do we have confirmed L2 amounts for the PS5?


Cache scrubbers should be a means of requiring fewer trips out to GDDR6 by leaving more cache available, resulting in fewer cache misses.
Whether or not they are effective is something we don't know. We do know AMD decided not to use them for RDNA2 PC graphics cards.

Okay, nevermind the cache scrubbers for a moment.

Is it likely that PS5, XSX and RDNA 2 Radeon GPUs for PC all have a large cache to reduce the need for a massive amount of GDDR6 bandwidth and wider external memory buses?
 
I'm fairly positive that Sony is selling pre-orders well beyond launch date and I believe MS is trying to reserve as much as possible for launch day.

So I don't know if you can buy a PS5 on launch day if you don't have a pre-order. But you can definitely have a chance for Xbox Series. I was not guaranteed a launch day order for PS5, and I'm second block. I can only assume any orders taken after mine are progressively less likely to make launch day.

Not only that but there's been a few (admittedly anecdotal) cases of people having their PS5 preorders cancelled, or delayed into Q1 2021. Hopefully those are just fringe cases but I wouldn't be surprised if more end up either being cancelled or delayed. Also I'm guessing this would mainly affect the Disc model since it's the more popular of the two.
 
Why is this an either/or statement? Why wouldn't Sony be trying to reserve as many PS5 consoles as possible for launch day?
I don't think Sony is in the business of keeping warehouses filled with brand new PS5 consoles just because.
I believe I wrote: "I don't know". I just do know that the pre-orders taken are likely now sliding well beyond launch date. It doesn't necessarily mean they won't have some quantities for sale available on launch day.
 
Do you have any proof?
It's in another thread that I can't find. I think it was the one where we were posting our replies about securing pre-orders or not.
But in that thread, MS tweeted they are done with pre-order stock, and if you didn't manage to get one, your next opportunity would be on launch day.

I have no clue what Sony is doing.
 
Why is this an either/or statement? Why wouldn't Sony be trying to reserve as many PS5 consoles as possible for launch day?
I don't think Sony is in the business of keeping warehouses filled with brand new PS5 consoles just because.



Do we have confirmed L2 amounts for the PS5?


Cache scrubbers should be a means of requiring fewer trips out to GDDR6 by leaving more cache available, resulting in fewer cache misses.
Whether or not they are effective is something we don't know. We do know AMD decided not to use them for RDNA2 PC graphics cards.

Is it confirmed AMD aren't using them in RDNA2?
 
I meant higher than steal.

Don't forget the specific memory configuration on XSX. When the CPU is used (using the slower pool of memory) then it will reduce the total available bandwidth, on top of the regular memory contention. And actually XSX has less L2 cache per CU, which means overall there will be more L2 misses on XSX (good thing it has more bandwidth).

In the end, only the games will show us the outcome of those constraints.

I think some people are overblowing the memory situation on XSX. Some seem to be assuming the CPU and GPU will split memory access time evenly, when that won't be the case; it will vary on a game-by-game basis, so the average share of time the CPU or GPU spends accessing the memory pool will vary just as much depending on what the game requires. Also, I really can't see how overall bandwidth would be reduced to the degree some people on blogs early in the year (or even in other places like Era) tried to imply. Did they think the CPU and GPU would be sharing GDDR6 channels? Hot Chips confirmed there are 20 channels.

Series X's GPU can also snoop the cache of the CPU. I don't know if that's just the L3$ (should probably call it L2$ going by AMD's nomenclature) or the other two caches too (I'd assume snooping L0$ is impossible), but I'd figure that's in place to provide some type of benefit for the GPU if it needs certain data while the CPU is the one currently accessing the memory bus.

L2$ (I guess you mean L1$ as AMD would call it, since AMD goes by L0$, L1$, L2$): that should be interesting to see how it shakes out. Indeed it does have less per CU, unless they've increased the amount, which to my knowledge wasn't mentioned at Hot Chips (and since they did explicitly mention the L3$ size being increased, it'd only make sense for them to mention the same for the L2$ if that were also done). But I think the whole thing about having less per CU only comes into play with tasks that aren't saturating the wider net of CUs. In totality it still has more L2$ (L1$) by virtue of having more physical CUs, and while I get that saturating more CUs logically requires a bit more than saturating fewer, in the grand scheme of things it's not really that many more CUs to saturate, especially considering AMD has at least one GPU coming soon that's over 50% larger in CU count than Series X (if only counting active CUs).

At the very least I'm guessing their saturation of a wider net of CUs has improved a ton from RDNA1, otherwise there's not much point in pushing wider except for bragging rights, which isn't something AMD needs right now without the performance to back it up (thankfully they seem to have capable performance, so IF they can outdo Nvidia's Ampere cards, the bragging rights come along for the ride ;))
 
Talking about caches etc., we don't actually know how much cache is in the PS5, and remember it's got those GPU cache scrubbers, so I imagine those help with available cache. For all we know the PS5 could be utilising that leaked AMD Infinity Cache.
I think cache capacity isn't particularly enhanced. Before cache scrubbers, a line with stale data would be a line you didn't want to use. After scrubbers, that line with stale data is invalidated, but you'd still need to read in the new one separately.
The old style of flushing the cache entirely isn't capacity-dependent.
Going by the infinity cache rumor, I think the PS5 is missing 100-150 mm2 of area.
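To make the flush-versus-scrub distinction above concrete, here's a purely illustrative sketch (not based on any documented PS5 or RDNA interface): scrubbing invalidates only the lines covering an overwritten address range, a full flush drops everything, and neither adds any capacity.

```python
# Illustrative cache model contrasting a full flush with targeted scrubbing.
# Purely conceptual; not based on any documented PS5/RDNA interface.

class Cache:
    def __init__(self):
        self.lines = {}  # address -> cached data

    def full_flush(self):
        # Old approach: drop everything, including lines that were still valid.
        dropped = len(self.lines)
        self.lines.clear()
        return dropped

    def scrub(self, start, end):
        # Scrubber approach: invalidate only lines covering the overwritten range.
        stale = [addr for addr in self.lines if start <= addr < end]
        for addr in stale:
            del self.lines[addr]
        return len(stale)

cache = Cache()
cache.lines = {0x1000: "old asset", 0x2000: "unrelated asset", 0x3000: "unrelated asset"}

print("scrubbed:", cache.scrub(0x1000, 0x1040))   # only the overwritten line goes
print("still cached:", len(cache.lines))          # unrelated lines survive
print("full flush drops:", cache.full_flush())    # old approach drops everything left
```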

My understanding was that the cache scrubbers are there to avoid a complete flushing of the caches.
This is how it's characterized, although I'm curious about how the scrubbers change the process. Do they avoid flushes, or just make them have a lighter impact?

Cache scrubbers should be a means of requiring fewer trips out to GDDR6 by leaving more cache available, resulting in fewer cache misses.
Whether or not they are effective is something we don't know. We do know AMD decided not to use them for RDNA2 PC graphics cards.
Pinpoint flushes would leave unrelated data cached, which would save bandwidth in terms of not needing to write out dirty lines and read everything back in. I'm still not sure the bandwidth argument isn't something of a wash, depending on the PS5's implementation of the flush process.
A big issue with cache flushes is that the GPU needs to stall for a very long time while this is going on, and only some of that time would be spent generating writes to flush dirty cache lines. Lines with data that had only been read would just be marked invalid with no further action. During the process, the command processors are stalled, wavefronts don't launch, and the existing wavefronts are either allowed to drain prior to the flush or they are paused.
In that scenario, there's not a bandwidth constraint, since very little bandwidth is being used.

Depending on what the scrubbers change for this process, a GPU with scrubbers could wind up consuming more bandwidth than one without. However, that also goes to a question I have about what the usage model is for these scrubbers. Because of the penalties for this sort of activity, games would generally not try to overwrite resources from disk until after some significant event, like a new frame, where much of the context is reset anyway. If the scrubbers make stalls cheaper but don't eliminate them, then only a few may be advisable, versus the current policy of avoiding them as much as possible. This seems to imply that Sony wants to more aggressively pull data into the GPU from the SSD within the same frame, but the SSD's delivery timing may not be reliable enough depending on the frame time budget or performance glass jaws (either for the built-in SSD or a NAND drive added as an expansion).
So if games generally avoid reading data from disk on top of actively used buffers, what or how much would scrubbing change?
 
So if games generally avoid reading data from disk on top of actively used buffers, what or how much would scrubbing change?

It's to make things like Ratchet & Clank jumping from one location to another more feasible. In essence, to not bottleneck/overload the CPU with cache invalidation/IO calls when those gigabytes of data per second come in from the SSD. Or something like Unreal Engine 5 streaming things in on demand using unique assets. The world is changing, as the PS5 architecture is very heavily streaming-optimized.
 
Did they think the CPU and GPU would be sharing GDDR6 channels? Hot Chips confirmed there are 20 channels.
They are sharing channels; unless you meant on the same cycle, there's no module walled off from either client.

Series X's GPU can also snoop the cache of the CPU. I don't know if that's just the L3$ (should probably call it L2$ going by AMD's nomenclature) or the other two caches too (I'd assume snooping L0$ is impossible), but I'd figure that's in place to provide some type of benefit for the GPU if it needs certain data while the CPU is the one currently accessing the memory bus.
The GPU being able to snoop CPU caches dates back to the current gen of consoles, so that's not a change. However, that's the GPU reading from the CPU caches, and they call their caches L1, L2, and L3. RDNA alone renamed its L1 caches L0, for some reason.


It's to make things like Ratchet & Clank jumping from one location to another more feasible. In essence, to not bottleneck/overload the CPU with cache invalidation/IO calls when those gigabytes of data per second come in from the SSD. Or something like Unreal Engine 5 streaming things in on demand using unique assets. The world is changing, as the PS5 architecture is very heavily streaming-optimized.
What would loading a new region require? If it's loading new objects, they're different objects and assets from those in the region being exited. They wouldn't load on top of existing objects, because those would still be actively rendering.
Scrubbers wouldn't correct for the reverse problem, where there's data that hasn't been invalidated yet by the SSD but should be considered discarded. That would require explicit disposal of existing objects before overwriting them, which is either going to add some form of stall to the GPU or to the SSD.
 
They are sharing channels; unless you meant on the same cycle, there's no module walled off from either client.

Yeah, that's probably more in line with what it is. Thanks for the clarifications. I was thinking of some of the interleaving talk others brought up a while back in other places. They seemed to think there would be 10 channels (since there are 10 modules) or something to that effect. It's not something I'm too versed in, admittedly.

Aside from that though, yes, they still share the channels, since the CPU or GPU accessing the same bus is just how a hUMA design works.

The GPU being able to snoop CPU caches dates back to the current gen of consoles, so that's not a change. However, that's the GPU reading from the CPU caches, and they call their caches L1, L2, and L3. RDNA alone renamed its L1 caches L0, for some reason.

Ah, okay then. That's news to me; I never did look too deeply into PS4 or XBO system technical specs beyond the broad strokes, and at the time I wasn't as interested in the technical features, standards, techniques, specifications, etc. as I've become over the past few years.

I'd hope AMD standardize their cache naming schemes across the board if they've only changed it for the GPUs but not the CPUs. Somewhat annoying quirk when companies fracture nomenclature across their product lines, it adds unnecessary confusion :S
 
Scrubbers wouldn't correct for the reverse problem, where there's data that hasn't been invalidated yet by the SSD but should be considered discarded. That would require explicit disposal of existing objects before overwriting them, which is either going to add some form of stall to the GPU or to the SSD.

As Cerny explained, the developer would initiate a load from the SSD to some memory address. Decompression plus the cache scrubbers make sure that by the time the data is in memory, the caches are in the correct state and the developer/OS does minimal work. It's just a convenience (ease of programming) and a CPU-cycle saver, not a magic bullet that makes the cost of streaming/memory bandwidth zero. In a more traditional case, the OS would do some cache-invalidate calls using the CPU, whereas in the PS5's case the cache scrubbers/IO chip take care of this duty. The PS5 can pull 11GB/s of decompressed textures. In the worst case, doing all those invalidate calls by traditional means could become a bottleneck, and even in the best case the CPU would be doing work that is best left to dedicated hardware like the PS5's IO chip and cache scrubbers.
 
Is it likely that PS5, XSX and RDNA 2 Radeon GPUs for PC all have a large cache to reduce the need for a massive amount of GDDR6 bandwidth and wider external memory buses
At least the PC RDNA2 is heavily rumored to be using "Infinity Cache" to make up for a narrower and slower GDDR6 bus than the competition.

Is it confirmed AMD aren't using them in RDNA2?
I'm sure Cerny stated in the Road to PS5 presentation that AMD had decided not to use their cache scrubbers.
 
At least the PC RDNA2 is heavily rumored to be using "Infinity Cache" to make up for a narrower and slower GDDR6 bus than the competition.


I'm sure Cerny stated in the Road to PS5 presentation that AMD had decided not to use their cache scrubbers.

I don't recall him saying that at all
 
I'd hope AMD standardize their cache naming schemes across the board if they've only changed it for the GPUs but not the CPUs. Somewhat annoying quirk when companies fracture nomenclature across their product lines, it adds unnecessary confusion :S
The odd part is that AMD's scheme was consistent until RDNA. I'm not sure why they changed the nomenclature. Perhaps they didn't want to risk confusion by renaming the L2, and didn't want to call the graphics L1 an L1.5 cache.

As Cerny explained, the developer would initiate a load from the SSD to some memory address. Decompression plus the cache scrubbers make sure that by the time the data is in memory, the caches are in the correct state and the developer/OS does minimal work. It's just a convenience (ease of programming) and a CPU-cycle saver, not a magic bullet that makes the cost of streaming/memory bandwidth zero.
I'm asking in what specific scenarios that makes sense.
Somehow the SSD is being asked to load data, and then the IO system or game is loading that into the same memory addresses as an existing asset of some kind.
What sorts of objects or assets are amenable to that?

One thing is that at least so far, GPU caches are virtually tagged. If the SSD loads a page into RAM, there's the question of what virtual memory page it gets and what physical addresses in RAM it receives.
The GPU is generally only aware of the virtual address, with the exception being CPU-coherent accesses. As a fellow IO device, the SSD's traffic is not in the latter category.
However, virtual address space in a 64-bit system is effectively unconstrained versus the physical space of RAM.
If a range of assets belonging to different surfaces and objects is loaded into RAM, why wouldn't it get separate virtual addresses? In that case, any access by the GPU would cache those lines separately. The replaced object would be evicted without issue.
If the replaced object was legitimately still in use, the scrubbers would not help: ongoing shaders would load the new data rather than the old object's data and be wrong, which is a different problem from future shaders loading stale data.

That write-after-read hazard is not helped by the scrubbers, and it's an argument against overwriting locations unless there's a barrier of some kind and an explicit discard. That could stall the SSD or IO system if an access were initiated and the destination was flagged as still being needed.
The alternative is that objects could be given separate virtual memory addresses, since they are separate objects. But then the scrubbers wouldn't do anything.

If there is a scenario, can the GPU rely on the timeliness of the SSD, or what is the fallback?
 
hmm, that doesn't exactly fill me with confidence here.

Without generalising too much, Japanese are usually very conservative in their statements. And if you add the fact that he’s a tech guy, the PR levels of this is zero. Any CEO or marketing adjacent person would go “PS5 is the quietest console ever in the history of humankind”
 
Without generalising too much, Japanese are usually very conservative in their statements. And if you add the fact that he’s a tech guy, the PR levels of this is zero. Any CEO or marketing adjacent person would go “PS5 is the quietest console ever in the history of humankind”
Yeah, I mean, I guess the answer ultimately depends on what they are referring to.
I know there are quiet PS4s and loud PS4s. If it's generally quieter than the quiet PS4s, that's ideal. If they mean it's generally quieter than the loud PS4s, then I'm not sure how great that is.

So while I did write the post somewhat in jest, it also conflicts with what I understood about PS5 cooling. The expectations were largely set by Cerny, who explained how, by fixing the power budget, the cooling would be paired with the power going into the chip, so it would never get hotter or cooler, etc. I guess I interpreted that as a single fan speed which should cool the chip at all temps; perhaps that was a bad take, as it would appear the fan speed is indeed variable and likely tied to the heat of the chip, whereas the power draw is fixed and the variable clocks, which are based on game code, have nothing to do with how fast that fan is going to run.
 
I'm asking in what specific scenarios that makes sense.

The extreme case would be the new Ratchet & Clank, where the gameplay almost seamlessly switches between levels. Another example could be something like Gran Turismo, where jumping inside a car could unload some data from RAM and load the car interior on demand. If the car interior had 1GB of assets, the PS5 would pull that in about 0.1s. Or the car LODs/track in Gran Turismo: stream them in as needed instead of trying to fit everything into RAM. Or something like entering a building in any game: unload the outside assets and load the building's interior assets into RAM. Sony is trying to move away from "load the whole level into RAM, then play" towards a streaming solution.

A simpler way of saying the same thing: Sony's idea is to cache in RAM what is needed in the next 1 second, instead of the next 30 seconds as in PS4 games.
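Putting rough numbers on that, using only the figures quoted in this thread (the ~11 GB/s decompressed rate mentioned earlier and the 1 GB car-interior example); this is just illustrative arithmetic, not a claim about any actual game.

```python
# Illustrative streaming arithmetic using only figures quoted in this thread:
# the ~11 GB/s decompressed rate and the 1 GB car-interior example above.

DECOMPRESSED_RATE_GB_S = 11.0

car_interior_gb = 1.0
print(f"1 GB car interior: ~{car_interior_gb / DECOMPRESSED_RATE_GB_S * 1000:.0f} ms")

# Data deliverable within a single frame at common frame rates:
for fps in (30, 60):
    frame_time_s = 1.0 / fps
    per_frame_gb = DECOMPRESSED_RATE_GB_S * frame_time_s
    print(f"{fps} fps frame ({frame_time_s * 1000:.1f} ms): ~{per_frame_gb * 1024:.0f} MB streamable")
```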

 