PlayStation 5 [PS5] [Release November 12, 2020]

Yeah, it's a good way to look at it. It's like asking a Zen 2 motherboard to support a Zen 3 CPU; Zen 3 support would have to be present in the microcode for it to work.

Sorry, I phrased matters poorly. I meant that I'm completely unfamiliar with the term "no microcode reference" in the context of game development. I'm vaguely familiar with it in the context of general computing.

I've dabbled in some C++ and HTML in my time, and I always assumed that dev kits were somewhat similar insomuch as they're a significant number of steps away from microcode. Am I mistaken on that front?

And being able to make your own versions of these features doesn't necessarily imply you can outperform the hardware versions of these functions.

Oh, absolutely. I recently finished Death Stranding, and its checkerboard solution is substantially worse than that in God of War or Horizon Zero Dawn. As mentioned in this Digital Foundry video, Kojima's studio decided to checkerboard purely in software. And it's to the detriment of its presentation.

IMO, I don't think there is a mid-gen refresh coming. The node shrink would not be significant enough to warrant a refresh while keeping the price points where they are today. The next generation after this one will be interesting, however. Curious to see how they intend to tackle it.

I disagree there. I don't think we'll see a PS4 to Pro power increase, and certainly not an XB1 to X1X increase, but I do anticipate a tentative step into chiplet territory.

I initially thought the PS5's 36 CU design would lend itself well to doubling the GPU to 72 CUs, but I now think the mid-gen refreshes are going to be the likes of RDNA1>RDNA2 IPC improvements (that's more of a Microsoft move, IMO), tensor cores, more bandwidth, and a clockspeed increase, with the core/CU counts staying the same.

In essence, shrink the current SoCs to 5nm versions, use the lower performing ones on their own in slim versions of the PS5/XSX, and stick some chiplets onto the higher performing ones for "Pro/X" versions.
 
I'm completely unfamiliar with the term "no microcode reference."

With no API, I could still imagine the hardware is there but has yet to be exposed. Might the absence of a microcode reference be a relatively strong indication that the hardware simply isn't there to expose?
This may be in relation to there being no mention in low-level system code, or a lack of provision in the command packets submitted to the GPU. Enabling various fixed-function features means sending commands to the command processor that set things like rasterizer modes, things not observable at the compute ISA level. If there's no reference in the system code, or the command formats lack a place to enable the feature, then the feature cannot be toggled. The lack of a reference isn't a guarantee of absence, since GPU microcode can be changed and the command processors can enact new behaviors after the change, but the fact that nothing mentions it isn't an encouraging sign either.
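To make that concrete with a toy example: the command processor consumes small packets that write fixed-function state registers, and if neither the packet format nor the microcode that parses it has a field for a feature, the driver has no way to switch that feature on. Everything below (opcode, field names) is invented for illustration, not AMD's or Sony's actual packet format.

```c
#include <stdint.h>

// Hypothetical "write a fixed-function state register" packet.
typedef struct {
    uint32_t header;      // made-up "set context register" opcode + count
    uint32_t reg_offset;  // which fixed-function register block to touch
    uint32_t value;       // bits selecting e.g. a rasterizer or shading mode
} gpu_cmd_packet;

// If no reg_offset/value encoding exists for a feature, nothing the driver
// builds here can ever enable it; only new microcode plus a new packet
// definition could.
static gpu_cmd_packet make_state_write(uint32_t reg_offset, uint32_t value)
{
    gpu_cmd_packet p = { 0xC0001000u /* invented opcode */, reg_offset, value };
    return p;
}
```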

Very interesting...
Does the PS5 have primitive shaders (as per AMD patent https://patentimages.storage.googleapis.com/66/54/00/c86d30f0c1c61e/US20200193703A1.pdf) ?
It at least has a form similar to what AMD called primitive shaders for Vega and later, which was more focused on the culling aspect while not delving into the deferred attribute portion so much.
Some of Sony's wording about what the PS5's primitive shaders can do indicates capabilities that AMD only mentioned as future possibilities.
One thing I noted is how little AMD has tried to talk about primitive shaders since the fanfare around the failed attempt with Vega. We know the culling-focused primitive shaders are used much more substantially in the most recent generation, but there's barely a blip in the marketing. Meanwhile, Sony's more feature-laden description gets called primitive shaders, so did they sort of take over the marketing name?

MS want commonality with the PC - that's the reason for the move to the GDK. Games that do use AVX256 need to be able to run just as fast on Xbox with minimal work. Might also increase the flexibility of cloud units - they won't always be full up with games.

The downside is that MS have to be able to deal with the high thermal density of AVX256 no matter what, while staying almost silent. A tiny bit more die, too.

Sony probably have a bit more leeway to cut back on this particular aspect of the CPU. It's all about dem tradeoffs, and they won't be the same for everyone.
Microsoft's solution was able to handle standard Zen 2 FPU thermal density with a silicone thermal compound. Sony's solution has a liquid metal interface and substantially increased GPU thermal density, so it seems like it has some notably better measures for handling heat without cutting the silicon.
I'm still not sure about the wins from reducing the FPU. In the grand scheme of things, was Sony that desperate to limit that dimension of the chip? There's non-CPU silicon on either end of the Zen 2 section, and it looks like at least half of the saved area didn't go into silicon with any known use, so potentially hobbling performance for a handful of mm²?
 
Microsoft's solution was able to handle standard Zen 2 FPU thermal density with a silicone thermal compound. Sony's solution has a liquid metal interface and substantially increased GPU thermal density, so it seems like it has some notably better measures for handling heat without cutting the silicon.
I'm still not sure about the wins from reducing the FPU. In the grand scheme of things, was Sony that desperate to limit that dimension of the chip? There's non-CPU silicon on either end of the Zen 2 section, and it looks like at least half of the saved area didn't go into silicon with any known use, so potentially hobbling performance for a handful of mm²?

That's a good point! I wonder if the seemingly shrunk FPU could be down to Sony's power management and turbo implementation rather than simply heat or die area? I'll try and explain what I mean.

AVX2 can cause massive swings in power draw. The PS5 uses an activity-based system to determine clocks and power balance, and maybe simplifying the FPU allowed a simpler, more responsive system that was easier to add to Zen 2's existing power management.

I guess you'd need a simpler rule table, fewer counters, and fewer steps per balancing calculation? That might allow you to be less conservative and so push boost clocks higher too (particularly on the GPU)...?
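To sketch what I mean (every weight, budget and counter name below is invented for illustration; this is not Sony's or AMD's actual governor, just the general shape of an activity-driven one):

```c
#include <stdio.h>

typedef struct {
    double fpu_activity;  // 0..1, fraction of cycles issuing wide vector ops
    double int_activity;  // 0..1, integer / load-store activity
} activity_t;

// Crude linear power model: a narrower FPU shrinks the worst-case fpu term,
// i.e. the size of the swing the governor has to react to.
static double model_power_w(const activity_t *a, double freq_ghz)
{
    const double base_w = 8.0, fpu_w = 12.0, int_w = 6.0;  // assumed, per GHz
    return (base_w + fpu_w * a->fpu_activity + int_w * a->int_activity) * freq_ghz;
}

// Shave clocks until the modelled power fits the budget (floor at 3.0 GHz).
static double pick_clock_ghz(const activity_t *a, double budget_w)
{
    double f = 3.5;
    while (f > 3.0 && model_power_w(a, f) > budget_w)
        f -= 0.05;
    return f;
}

int main(void)
{
    activity_t heavy_avx = { 0.9, 0.4 };
    activity_t light     = { 0.1, 0.5 };
    printf("heavy AVX workload: %.2f GHz\n", pick_clock_ghz(&heavy_avx, 45.0));
    printf("light workload:     %.2f GHz\n", pick_clock_ghz(&light, 45.0));
    return 0;
}
```

Fewer and simpler inputs to a model like that would presumably make it cheaper to evaluate and less in need of conservative guard-banding.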
 
That's a good point! I wonder if the seemingly shrunk FPU could be down to Sony's power management and turbo implementation rather than simply heat or die area? I'll try and explain what I mean.

AVX2 can cause massive swings in power draw. The PS5 uses an activity-based system to determine clocks and power balance, and maybe simplifying the FPU allowed a simpler, more responsive system that was easier to add to Zen 2's existing power management.

I guess you'd need a simpler rule table, fewer counters, and fewer steps per balancing calculation? That might allow you to be less conservative and so push boost clocks higher too (particularly on the GPU)...?
Reducing the width of the FPU would cut the peaks down, which may possibly indicate a limitation in how well AMD's method can track the most extreme swings in utilization. That might in part explain why AMD's vector width lags Intel's by a generation or more, although there are other reasons for not following Intel's direction closely.
One odd part of this is that Cerny's justification for the DVFS implementation directly mentions using 256-bit instructions, which would render that part moot if the FPU were cut to 128-bit.

However, it's hard to say what else could cause the FPU area loss, particularly for what seems to be a significantly narrower register file.
There are some things that could happen without losing native 256-bit execution, or at least not all of it.
Replacing high-speed multipliers with more compact versions can happen. Bobcat or Jaguar did something like this.
Maybe reverting some of the other functionality, such as not having native FMA? That would make Sony's FPU more like a widened Jaguar MUL+ADD solution.
Losing an execution port also has precedent, and maybe reducing some of the extra throughput for things like vector integer arithmetic or only having one port be 256-bit could help.
The downside for a fair amount of that is that it can add latency, which among other things would increase the utility of having more rename registers in what looks to be a skinnier register file.

Reducing port count or the total number of operands would lead to fewer ports on the register file, which at 12 read/6 write would have SRAM cells 3x larger than single-porting and with a much bigger operand network.
I didn't see any note of something as significant as Carrizo's HDL for 7nm, although I don't think Zen uses the lowest-track libraries for its cells. I'm also not sure such changes could readily be applied solely to the FPU.
Rearchitecting the FPU for density would also cost performance, although perhaps a Zen 2 FPU that explicitly gives up on clocking to 4.5 GHz could be made smaller without going entirely 128-bit.
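As an aside, if the FPU really were cut to 128 bits, existing 256-bit instructions wouldn't stop working; they'd just be cracked into two halves internally, the way Jaguar and Zen 1 handled AVX. A purely illustrative sketch of the same arithmetic both ways, using intrinsics (not a claim about what the PS5 die actually does):

```c
#include <immintrin.h>

// One native 256-bit FMA, as full-width Zen 2 executes it.
__m256 fma256_full_width(__m256 a, __m256 b, __m256 c)
{
    return _mm256_fmadd_ps(a, b, c);
}

// The same operation as two 128-bit halves, roughly what a narrower FPU's
// decoder would do internally (at the cost of extra uops and latency).
__m256 fma256_as_two_halves(__m256 a, __m256 b, __m256 c)
{
    __m128 lo = _mm_fmadd_ps(_mm256_castps256_ps128(a),
                             _mm256_castps256_ps128(b),
                             _mm256_castps256_ps128(c));
    __m128 hi = _mm_fmadd_ps(_mm256_extractf128_ps(a, 1),
                             _mm256_extractf128_ps(b, 1),
                             _mm256_extractf128_ps(c, 1));
    return _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1);
}
```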
 
I disagree there. I don't think we'll see a PS4 to Pro power increase, and certainly not an XB1 to X1X increase, but I do anticipate a tentative step into chiplet territory.

I initially thought the PS5's 36 CU design would lend itself well to doubling the GPU to 72 CUs, but I now think the mid-gen refreshes are going to be the likes of RDNA1>RDNA2 IPC improvements (that's more of a Microsoft move, IMO), tensor cores, more bandwidth, and a clockspeed increase, with the core/CU counts staying the same.

In essence, shrink the current SoCs to 5nm versions, use the lower performing ones on their own in slim versions of the PS5/XSX, and stick some chiplets onto the higher performing ones for "Pro/X" versions.
I was thinking that chiplets would be in play as a possibility for next-generation consoles. But as a mid-gen refresh variant it seemed too early/expensive to be ready in 4 years' time while supporting backwards compatibility.
I was thinking they would still be monolithic dies, which pretty much also means it's very difficult to improve on what exists today.
 
I was thinking that chiplets would be in play as a possibility for next-generation consoles. But as a mid-gen refresh variant it seemed too early/expensive to be ready in 4 years' time while supporting backwards compatibility.
I was thinking they would still be monolithic dies, which pretty much also means it's very difficult to improve on what exists today.

I think you could be right, but it's mostly due to the noise surrounding RDNA3 possibly using chiplets that I think it may not be as far off as the next generation.

If we see AMD release a GPU featuring chiplets within the next 12-18 months, I would be surprised if we didn't see that within a mid-generation refresh, albeit in a limited capacity.

5nm at the same performance profile uses ~30% less power, which wouldn't be nearly enough for a PS4-to-Pro-style butterfly design. But it would still leave ~60 watts of headroom available for mid-generation refreshes. Clockspeed increases are an obvious contender and would certainly be present to some extent, but some ML coprocessors for DLSS-style upscaling seem like a pretty easy win that needn't necessarily impact the existing architecture.
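(Rough scale check on that figure: if the launch PS5's SoC budget is somewhere around 200 W, which is my estimate rather than a confirmed number, a 30% saving at the same performance works out to roughly 0.3 x 200 = 60 W freed up for clocks or extra silicon.)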

Allowing higher-clocked versions of the current PS5/XSX SoCs to render at a lower internal resolution than the base versions but output the same or higher strikes me as a relatively cheap way of getting visibly greater performance with minimal investment from developers.

Are Nvidia's tensor cores used for ray tracing at all, or are they pretty much just for DLSS? Denoising the ray traced image perhaps? It's been at least a year since I last read up on them.
 
I think you could be right, but it's mostly due to the noise surrounding RDNA3 possibly using chiplets that I think it may not be as far off as the next generation.

If we see AMD release a GPU featuring chiplets within the next 12-18 months, I would be surprised if we didn't see that within a mid-generation refresh, albeit in a limited capacity.

5nm at the same performance profile uses ~30% less power, which wouldn't be nearly enough for a PS4-to-Pro-style butterfly design. But it would still leave ~60 watts of headroom available for mid-generation refreshes. Clockspeed increases are an obvious contender and would certainly be present to some extent, but some ML coprocessors for DLSS-style upscaling seem like a pretty easy win that needn't necessarily impact the existing architecture.

Allowing higher-clocked versions of the current PS5/XSX SoCs to render at a lower internal resolution than the base versions but output the same or higher strikes me as a relatively cheap way of getting visibly greater performance with minimal investment from developers.

Are Nvidia's tensor cores used for ray tracing at all, or are they pretty much just for DLSS? Denoising the ray traced image perhaps? It's been at least a year since I last read up on them.
Tensor Cores for gaming applications are currently just used for DLSS. No real-time denoising via tensor cores in games, yet.
 
I see the Moore's Law is Dead guy and RTG are having fun pointing fingers at each other on Twitter, sharing DMs and arguing over who told what to whom first... Admittedly, it seems like MLID started it yesterday by throwing RTG and another guy under the bus for wrong PS5 info, saying he was misquoted and had called it a rumor, and others ran with it.

I did say here multiple times that these two were fishing for info via DMs, B3D and Reddit and then leading on fanboys to pump up their views (which were nowhere near as high before they started speculating on NG consoles), and that they had zero idea about either console, let alone inside scoops, but we needed die shots to put it to rest...

This is how I would describe these two

 
I think you could be right, but it's mostly due to the noise surrounding RDNA3 possibly using chiplets that I think it may not be as far off as the next generation.

If we see AMD release a GPU featuring chiplets within the next 12-18 months, I would be surprised if we didn't see that within a mid-generation refresh, albeit in a limited capacity.

5nm at the same performance profile uses ~30% less power, which wouldn't be nearly enough for a PS4-to-Pro-style butterfly design. But it would still leave ~60 watts of headroom available for mid-generation refreshes. Clockspeed increases are an obvious contender and would certainly be present to some extent, but some ML coprocessors for DLSS-style upscaling seem like a pretty easy win that needn't necessarily impact the existing architecture.

Allowing higher-clocked versions of the current PS5/XSX SoCs to render at a lower internal resolution than the base versions but output the same or higher strikes me as a relatively cheap way of getting visibly greater performance with minimal investment from developers.

Are Nvidia's tensor cores used for ray tracing at all, or are they pretty much just for DLSS? Denoising the ray traced image perhaps? It's been at least a year since I last read up on them.
I know next to nil about chip design costs. But my intuition says that moving from a monolithic die to a chiplet design for a mid-gen refresh would be very costly, and something like this would be more in line with a new generation of console. There would need to be at the very least 3 chips, possibly 4. Those have to be packaged, and the package needs to work. So you've added a lot of cost to build out, say, 3 chips, all of which need to meet spec and yield. Then you need to package them together, and once again check yield on the final package, I assume. It just seems extremely costly. For backwards compatibility you'd need 2 GPU dies both running at 2230 MHz, which makes the yield problem worse.

While I understand that chiplets are cheaper than monolithic dies for large chips, consoles have gravitated towards other cost-saving methods: small packages, fixed clocks and redundancy on compute units to ensure that ultimately they are getting the most yield out of their chips. I'm not entirely sure that chiplets would necessarily help bring down the costs of the relatively small 300 mm² and 360 mm² SoCs we have today.

When looking at monolithic dies for a mid-gen refresh, assuming we manage to get down to a node size that is worth making a refresh on, the PS5's butterfly design does not lend itself well to this as I understand it; it is very difficult to double a butterfly design, so some heavy workarounds would likely be required, i.e. chiplets.

Then we get into the real challenge: obtaining the bandwidth required to feed this amount of compute cheaply, which is probably the larger challenge of the two, I suspect. You'd be looking at something like a bare minimum of 16 GB of HBM-type memory, or you'd have to bring Infinity Cache on board. Either way, moving up from 16 GB will be painful. Once again, I'm not sure what options would be available in terms of increasing the memory capacity and bandwidth.
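To put rough numbers on the bandwidth side (the 256-bit / 14 Gbps line is the launch PS5; the other rows are hypothetical options, not anything announced):

```c
#include <stdio.h>

// Back-of-envelope memory bandwidth: GB/s = (bus width in bits / 8) * per-pin Gbps.
static double bw_gb_s(int bus_bits, double pin_gbps)
{
    return bus_bits / 8.0 * pin_gbps;
}

int main(void)
{
    printf("PS5 today, 256-bit @ 14 Gbps:  %.0f GB/s\n", bw_gb_s(256, 14.0));  // 448
    printf("256-bit @ 18 Gbps GDDR6:       %.0f GB/s\n", bw_gb_s(256, 18.0));  // 576
    printf("512-bit @ 14 Gbps GDDR6:       %.0f GB/s\n", bw_gb_s(512, 14.0));  // 896
    printf("One 16 GB HBM2 stack @ 2 Gbps: %.0f GB/s\n", bw_gb_s(1024, 2.0));  // 256
    return 0;
}
```

Roughly doubling compute wants something near double the bandwidth, and none of the cheap options get there without a wider bus, much faster memory, or a large cache.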
 
I know next to nil about chip design costs. But my intuition says that moving from a monolithic die to a chiplet design for a mid-gen refresh would be very costly, and something like this would be more in line with a new generation of console.

From the perspective of getting chips from wafers, the chip's architecture generally isn't any kind of factor - the wafer substrate has no concept of processor architectures, but the materials comprising the wafer have different physical, chemical and electrical behavioural properties which need to be considered. It's often about square footage ("millimetre-age"? :???:), the actual micro-layouts of the individual elements of the chips, their micro-power draw and micro-signal complexity, and how they are laid out; coupled with the required yields, that dictates the type (and expense) of the wafer.

You end up with choices like cheaper wafers with lower yields, or going the other way: more expensive wafers may result in higher tangible yields but more complex testing, which may slow production or cost more. There is so much complexity across the many stages of producing a viable IC and, like the design compromises of processors themselves, there is no one right answer, just a bunch of decisions and compromises.
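For what it's worth, the usual first-order way to see why smaller dies yield better is a simple defect-density model; the defect density and die sizes below are illustrative assumptions, not foundry data:

```c
#include <math.h>
#include <stdio.h>

// First-order Poisson yield model: yield ~ exp(-die_area * defect_density).
static double poisson_yield(double area_mm2, double defects_per_cm2)
{
    return exp(-(area_mm2 / 100.0) * defects_per_cm2);
}

int main(void)
{
    const double d0 = 0.1;  // defects per cm^2, assumed
    printf("300 mm^2 monolithic die: %.0f%%\n", 100.0 * poisson_yield(300.0, d0));  // ~74%
    printf("150 mm^2 chiplet:        %.0f%%\n", 100.0 * poisson_yield(150.0, d0));  // ~86%
    return 0;
}
```

Which is also why the answer changes with wafer cost, test cost and packaging yield, not die yield alone.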
 
I see the Moore's Law is Dead guy and RTG are having fun pointing fingers at each other on Twitter, sharing DMs and arguing over who told what to whom first... Admittedly, it seems like MLID started it yesterday by throwing RTG and another guy under the bus for wrong PS5 info, saying he was misquoted and had called it a rumor, and others ran with it.

I did say here multiple times that these two were fishing for info via DMs, B3D and Reddit and then leading on fanboys to pump up their views (which were nowhere near as high before they started speculating on NG consoles), and that they had zero idea about either console, let alone inside scoops, but we needed die shots to put it to rest...

This is how I would describe these two


It's been a sh*tshow, to put it mildly xD

5nm at the same performance profile uses ~30% less power, which wouldn't be nearly enough for a PS4-to-Pro-style butterfly design. But it would still leave ~60 watts of headroom available for mid-generation refreshes. Clockspeed increases are an obvious contender and would certainly be present to some extent, but some ML coprocessors for DLSS-style upscaling seem like a pretty easy win that needn't necessarily impact the existing architecture.

The other benefit of those power consumption savings would be shrinking down the package size, something the PS5 in particular really could benefit from. But I agree with @iroboto on mid-gen refreshes remaining monolithic chips; chiplets likely won't come until 10th-gen. Interestingly, there's some analysis over on Twitter regarding the PS5 possibly using a Type-C USB port in Alt Mode for wired PSVR2.

I'm hoping they can also use this as an interface for a wireless module to enable wireless VR; just one step closer to making VR/AR standard in mainstream console gaming (the other big step would be having a VR/AR headset standard in default SKUs; in another six to seven years I think that should definitely be doable).

Then we get into the real challenge: obtaining the bandwidth required to feed this amount of compute cheaply, which is probably the larger challenge of the two, I suspect. You'd be looking at something like a bare minimum of 16 GB of HBM-type memory, or you'd have to bring Infinity Cache on board. Either way, moving up from 16 GB will be painful. Once again, I'm not sure what options would be available in terms of increasing the memory capacity and bandwidth.

Thankfully, HBM2 prices (vanilla HBM2, i.e. not Samsung's Flashbolt or SK Hynix's HBM2E) have come down over the past few years; quotes put 16 GB @ $120. That's probably "only" $30 more than the GDDR6 both systems are actually using. However, IIRC that bandwidth would only be around 256 GB/s. The variants providing the higher bandwidth MS and/or Sony would need for hypothetical Pro/One X-level mid-gen refreshes cost more, prohibitively so. And that's not even considering increasing capacity from 16 GB to something like 24 GB or 32 GB.

It's a big reason I'm not putting a lot of stock into Pro/One X-level mid-gen refreshes, tbh, and it's not like Sony or Microsoft need them. The current systems are very capable and only just getting started. Any mid-gen refresh, at least from Sony, will be focused on size reduction, power consumption reduction and increased drive capacity, MAYBE also redesigning some part of the flash controller for PCIe 5.0 / NVMe Gen 5 drives and moving everything to a single, swappable M.2 form-factor SSD while keeping much of the rest of the flash controller and I/O block the same? I guess they could also increase the bandwidth along with it, to something like 7-8 GB/s, while supporting 3P drives even faster than that (NVMe Gen 5 should support drives up to roughly 16 GB/s, but I wouldn't expect the PS5's decompression bandwidth to scale with that; there's probably no reason to).
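(For reference on that ceiling: PCIe 5.0 runs at 32 GT/s per lane, so a Gen 5 x4 drive tops out at roughly 4 x 32 / 8 = 16 GB/s before encoding and protocol overhead.)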

Microsoft would be more of a curiosity there, but this isn't a Microsoft thread so I'll shut up about that :p
 
It's been a sh*tshow, to put it mildly xD



The other benefit of those power consumption savings would be shrinking down the package size, something the PS5 in particular really could benefit from. But I agree with @iroboto on mid-gen refreshes remaining monolithic chips; chiplets likely won't come until 10th-gen. Interestingly, there's some analysis over on Twitter regarding the PS5 possibly using a Type-C USB port in Alt Mode for wired PSVR2.

I'm hoping they can also use this as an interface for a wireless module to enable wireless VR; just one step closer to making VR/AR standard in mainstream console gaming (the other big step would be having a VR/AR headset standard in default SKUs; in another six to seven years I think that should definitely be doable).



Thankfully, HBM2 prices (vanilla HBM2, i.e. not Samsung's Flashbolt or SK Hynix's HBM2E) have come down over the past few years; quotes put 16 GB @ $120. That's probably "only" $30 more than the GDDR6 both systems are actually using. However, IIRC that bandwidth would only be around 256 GB/s. The variants providing the higher bandwidth MS and/or Sony would need for hypothetical Pro/One X-level mid-gen refreshes cost more, prohibitively so. And that's not even considering increasing capacity from 16 GB to something like 24 GB or 32 GB.

It's a big reason I'm not putting a lot of stock into Pro/One X-level mid-gen refreshes, tbh, and it's not like Sony or Microsoft need them. The current systems are very capable and only just getting started. Any mid-gen refresh, at least from Sony, will be focused on size reduction, power consumption reduction and increased drive capacity, MAYBE also redesigning some part of the flash controller for PCIe 5.0 / NVMe Gen 5 drives and moving everything to a single, swappable M.2 form-factor SSD while keeping much of the rest of the flash controller and I/O block the same? I guess they could also increase the bandwidth along with it, to something like 7-8 GB/s, while supporting 3P drives even faster than that (NVMe Gen 5 should support drives up to roughly 16 GB/s, but I wouldn't expect the PS5's decompression bandwidth to scale with that; there's probably no reason to).

Microsoft would be more of a curiosity there, but this isn't a Microsoft thread so I'll shut up about that :p

I'm not sure how long initial console droughts usually last, but $499 seems to be easily lapped up; then again, maybe it's unusual because of the whole pandemic.
These consoles seem to be selling even at scalped prices below $700. Wouldn't that be another reason for a mid-gen refresh, even at a higher price point?

The bigger reason would probably be the PS5 possibly getting very big negative PR across the web, even if in reality most won't notice the difference while gaming.
 
Cheaper to cool a bigger volume while keeping noise down? I.e., building it into a smaller box would have been more expensive?

Reminds me of someone posting online complaining that 15.6-inch laptops being more expensive is ridiculous, because the 14.0/13.3-inch models are the more premium option; tech is harder to miniaturize when the specs are roughly similar.
 
Sorry, I phrased matters poorly. I meant that I'm completely unfamiliar with the term "no microcode reference" in the context of game development. I'm vaguely familiar with it in the context of general computing.

I've dabbled in some C++ and HTML in my time, and I always assumed that dev kits were somewhat similar insomuch as they're a significant number of steps away from microcode. Am I mistaken on that front?



Oh, absolutely. I recently finished Death Stranding, and its checkerboard solution is substantially worse than that in God of War or Horizon Zero Dawn. As mentioned in this Digital Foundry video, Kojima's studio decided to checkerboard purely in software. And it's to the detriment of its presentation.
HZD and Death Stranding have the same checkerboarding presentation - same engine, same technique, fully software.
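For anyone curious what "fully software" checkerboarding boils down to, here's a deliberately naive sketch of the general idea; real implementations add temporal reprojection and much smarter resolve heuristics, and bounds checks are omitted here. This is the skeleton of the technique, not Decima's actual code.

```c
// Each frame shades only half the pixels in a checker pattern; the missing
// half is filled from the previous frame (opposite phase) or from neighbours.
static int shaded_this_frame(int x, int y, int frame)
{
    return ((x + y + frame) & 1) == 0;  // checker phase alternates per frame
}

static float resolve_pixel(const float *cur, const float *prev,
                           int x, int y, int width, int frame)
{
    if (shaded_this_frame(x, y, frame))
        return cur[y * width + x];       // freshly shaded this frame
    if (prev)
        return prev[y * width + x];      // shaded last frame, reuse as-is
    return 0.5f * (cur[y * width + x - 1] + cur[y * width + x + 1]);  // fallback
}
```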
 
Part of the demand problem is that, because of the chip shortages, Sony hasn't been able to ramp up production as much as it probably wanted to. Combined with a global launch and the pandemic, that creates higher demand than ever. I mean, my preorder, placed just a day after preorders started, was only fulfilled ten days ago.
 