Playstation 5 [PS5] [Release November 12 2020]

Discussion in 'Console Technology' started by BRiT, Mar 17, 2020.

  1. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,717
    Likes Received:
    237
    I forgot, is it known if the Zen 2 floating point pipelines in Xbox Series X are full 256-bit or are they too cut down to 128-bit?


    Overall, I think for any future PS6 home hardware, Cerny and team probably already realize that everything will need to go significantly "wider" next time: a lot more GPU Compute Units (or the equivalent, around 2025-2027), more bandwidth (north of 1 TB/sec), and possibly even dividing the PS5's silicon into 2 or 3 high-volume 3nm/2nm GAAFET/EUV chiplets (with only 1 GPU rasterizer chiplet, assuming putting in 2 would still cause huge developer headaches, plus whatever AMD will have in 2025+).

    Assuming 10th-gen consoles happen (~2027+), they would inherently have, one way or another, better dedicated HW for RT that's more ambitious and capable than Nvidia Turing and Ampere. Not to mention more dedicated and less "bolted on" than it seems with both PC RDNA 2 and PS5 RDNA 2 (and Xbox RDNA 2, which is more like full PC RDNA 2).

    I mean, Microsoft will have to make similar choices, although Xbox Series X is already a somewhat "wider" design than PS5, which is narrow but with a very high GPU clock.
     
    #8061 Megadrive1988, Feb 15, 2021
    Last edited: Feb 15, 2021
  2. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    6,105
    Likes Received:
    6,378
    Location:
    Barcelona Spain
    First, it's not certain what they have done. It looked like this isn't halving the FPU throughput, but after a second and better shot, they have no idea. Reading what Locuza did for the Xbox Series X and will do for the PS5, I suppose these are customized CPUs and GPUs. Different choices, different tradeoffs. I like the transparency of MS, less how Sony chose to do things, but it looks like the designs are good on both sides. We will wait for two years of exclusive and multiplatform games and some GDC presentations, and maybe in 2022/2023 that will help give a better understanding of what Sony has done.

    EDIT:


    The better PS5 Zen 2 shot.
     
    #8062 chris1515, Feb 15, 2021
    Last edited: Feb 15, 2021
    Pete, Nesh, thicc_gaf and 1 other person like this.
  3. scently

    Veteran Regular

    Joined:
    Jun 12, 2008
    Messages:
    1,083
    Likes Received:
    420
    Full 256-bit, going by the Hot Chips document.
     
    #8063 scently, Feb 15, 2021
    Last edited: Feb 15, 2021
    Silenti, thicc_gaf and BRiT like this.
  4. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,532
    Likes Received:
    485
    Location:
    Varna, Bulgaria
    As far as I can see, the FP reg file is cut in half and the FMA area is also reduced somewhat, so this might be indeed a 128-bit implementation of the original Zen 2 SIMD design.
     
  5. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,717
    Likes Received:
    237
    Neat, thanks!
     
  6. PSman1700

    Veteran Newcomer

    Joined:
    Mar 22, 2019
    Messages:
    4,520
    Likes Received:
    2,074
    Depends how you see it; if one expected full-fat Zen 2 CPU chips, then yes. Even worse if the expectation was Zen 3 and full RDNA 2 features.

    Cerny himself said so, but also that it does impact CPU performance quite a bit.

    Probably. The 36 CUs most likely had to do with BC. There's no other reason I can think of for going narrow.

    Ok thanks.

    Ouch.
     
    mr magoo and thicc_gaf like this.
  7. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    Clarifies a lot. I think I can relate some of this to a DSP and get the general idea behind the FADD, so it sounds like a pretty necessary element to keep, especially for more complex logic/physics/AI calculations (an area games could definitely benefit from more hardware support for, outside of offloading to the GPU).

    Supposing the FADD has been cut, would it be possible to run that intended logic in software on the GPU, and is that a potential reason (outside of simply higher pixel fillrates and culling/rasterization rates) Sony could've gone with higher GPU clocks (to have some spare cycles for asynchronous compute such as 256-bit math)?

    The only thing I'm still unsure about for a 10th-gen Sony console is the actual need to go wider. Personally I still think they will favor a narrower design and rely on a lot of hardware accelerators, and maybe other changes (more TMUs per CU, more shader cores per CU, an increase of ROPs from 64 to 128), to reach acceptable performance while keeping the chip small and thus keeping production costs down, because shifting to smaller and smaller nodes is bringing higher prices, not lower.

    Of course some of that is my own wishing; I'm also hoping some tighter FPGA integration makes it into 10th-gen consoles from Sony & MS. But for Sony in particular, I think they'll try and stick with narrower designs and offload other tasks to hardware accelerators while making the CUs themselves bigger (PS5's are 62% larger than PS4's, for example).

    There is one other reason, actually: cost. Nodes keep getting smaller, but the cost of die real estate is going up, so a narrower design saves money. To offset the reduced silicon, Sony chose to increase the clocks. That's a tradeoff in and of itself, with its own pluses and minuses, as we're seeing.

    And I think cost is still going to be the reason they stay with a narrower design, though what counts as "narrow" could change over the years. 40 CUs used to be considered big, then 60, now it's 80. RDNA 3 rumors point to 120-CU dual-chiplet designs on the flagship, so maybe 60 CUs could end up being the new "narrow" by the time of 10th gen, who knows.
     
    #8067 thicc_gaf, Feb 15, 2021
    Last edited: Feb 15, 2021
    mr magoo and PSman1700 like this.
  8. liams

    Regular Newcomer

    Joined:
    Jul 1, 2020
    Messages:
    313
    Likes Received:
    265
    Just curious, what sorts of things did you envision an integrated FPGA being used for? I can't see game developers designing FPGA configurations on a per-game basis for rendering acceleration.

    I know people probably say this every gen, but how much higher do console teraflops need to go? Realistically I don't think anyone will be using 8K TVs, ever, and as we go forward, reconstruction/upscaling tech like DLSS will only improve. So maybe the next gen of consoles is something like 20 TFLOPs but with much better ray tracing and the like. I think the "add-ons" like ray tracing, better physics, and more interactable environments will be the sorts of things that define future generations, not necessarily an increase in pixel count.
     
    thicc_gaf likes this.
  9. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,107
    Likes Received:
    3,027
    Location:
    France
    Mark Cerny at 34:33



    Then he explains they had to invent a new variable clocks system.
     
    Pete and RagnarokFF like this.
  10. thicc_gaf

    Regular Newcomer

    Joined:
    Oct 9, 2020
    Messages:
    324
    Likes Received:
    246
    Well that's the thing; the game devs wouldn't be programming the FPGA components directly (I say components because I'm thinking more along the lines of the logic cells, LUT/BRAM etc. blocks, and a frontend/backend to handle the programming and targeted output, not a literal FPGA block just grafted onto the GPU). Rather, they'd pick from presets of configurations that could be swapped in with a short cycle time and loaded from some type of small, fast, and preferably updatable block of storage on the GPU, perhaps some type of MRAM cache.

    I think with some FPGA logic integration and hardware acceleration, you can basically leverage more versus just piling on the TFLOPs. Personally I don't think 10th-gen systems are gonna go over 35-40 TFs; with the type of memory that'll likely be around in decent quantities (I think they'll definitely need to go HBM-based by then, at least one of them will), you wouldn't want to push TFLOPs much higher than that if you still want decent bandwidth-per-TF numbers.

    10th-gen systems will need more than just power increases, or even just faster storage, to justify them, though; I think we're on the same page with that. So I'm hoping VR & AR are standardized with that generation, instead of treated as peripheral bonuses in the ecosystem like they are currently (at least on PlayStation; VR/AR isn't even supported on Xbox at this time).

    I mean, this can be (and is) true, but it doesn't refute the x-ray scan, if that's why it's being brought up. I think iroboto, function, or tunafish mentioned that there's "standard" hardware in the CPU that can handle FADD instructions with a 5-cycle latency, and specialized units in the CPU that can handle FADD with a 2-cycle latency.

    The point of interest seems to be that PS5 has removed the specialized units that offer the lower latency, but that doesn't mean it lacks native hardware support for 256-bit AVX instructions.
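    The two FADD latencies mentioned above can be turned into a toy throughput comparison. A minimal Python sketch, assuming a fully dependent chain of adds (the function name and op counts are invented for illustration, not measured PS5/Zen 2 figures):

```python
def dependent_chain_cycles(n_ops: int, latency: int) -> int:
    """Each FADD consumes the previous result, so the chain serializes:
    total cycles ~= n_ops * latency (no overlap possible)."""
    return n_ops * latency

# Specialized 2-cycle FADD unit vs. a generic 5-cycle FP pipe,
# for a chain of 1000 dependent adds.
fast = dependent_chain_cycles(1000, 2)   # 2000 cycles
slow = dependent_chain_cycles(1000, 5)   # 5000 cycles
print(fast, slow, slow / fast)           # the slow path is 2.5x longer
```

Independent adds would hide most of this difference; it only bites on latency-bound dependency chains, which is why losing the fast unit matters for some workloads and not others.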

    EDIT: A bit of an aside, but I think it's safe to say Sony's work on the variable frequency system is the "result of fruitful collaboration" that's now being seen in RDNA 2 GPUs and Zen 3 CPUs, which can shift power budgets between each other, plus SAM.
     
  11. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    12,406
    Likes Received:
    3,354
    Didn't one of the AMD CPUs use two 128-bit units to do a 256-bit instruction, back with Bulldozer?

    Could this be how Sony is supporting it?
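    For illustration, the "double pumping" idea described above can be sketched in plain Python: one 256-bit vector add issued as two passes through a 128-bit-wide unit, which is how Bulldozer-era AMD FPUs handled AVX. This is a toy model with invented function names, not actual SIMD code:

```python
def add_128(a, b):
    # One 128-bit lane: four 32-bit floats added elementwise.
    assert len(a) == len(b) == 4
    return [x + y for x, y in zip(a, b)]

def add_256_double_pumped(a, b):
    # Split each 256-bit operand (eight floats) into low/high 128-bit
    # halves and issue the same 128-bit unit twice in succession.
    assert len(a) == len(b) == 8
    return add_128(a[:4], b[:4]) + add_128(a[4:], b[4:])

a = [1.0] * 8
b = [2.0] * 8
print(add_256_double_pumped(a, b))  # identical result to a full-width add
```

The result is bit-identical to a native 256-bit add; the cost is throughput (two issue slots instead of one), which is why "supports AVX" says nothing by itself about datapath width.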
     
  12. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    270
    Likes Received:
    320
    Am I the only one noticing the "butterfly" design of PS5's GPU (like the PS4 Pro) compared with the more monolithic style of the SX ?
     
  13. Nesh

    Nesh Double Agent
    Legend

    Joined:
    Oct 2, 2005
    Messages:
    12,906
    Likes Received:
    3,072
    So what does that mean in terms of performance?
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,552
    Likes Received:
    4,713
    Location:
    Well within 3d
    The area savings for shaving off that portion of the FPU seem minor in the grand scheme of things. Would the bright areas on either side of the CPU section be test silicon/pads, or could those areas be blank? There are some sort of visible striations, but I don't recognize the patterns from other AMD silicon.
    Would Sony have been that desperate for die area to pay for a rearchitecting of the FPU and new layout, or maybe this is something AMD had on offer, like a scrapped alternate version of the mobile core?
    Another consideration is thermal density, since Microsoft cites the 256-bit FPU as being the thermal limiter of the Series X.
    https://www.anandtech.com/show/16489/xbox-series-x-soc-power-thermal-and-yield-tradeoffs
    "For Scarlett, it is actually the CPU that becomes the limiting factor. Using AMD’s high-performance x86 Zen 2 cores, rather than the low power Jaguar cores from the previous generation, combined with how gaming workloads have evolved in the 7 years since, means that when a gaming workload starts to ramp up, the dual 256-bit floating point units on the CPU is where the highest thermal density point happens."

    Granted, the PS5 GPU probably ramps thermal density significantly more, and then there's the liquid metal TIM.

    Individuals expecting Zen3 were setting themselves up for disappointment. I don't consider that a fair standard to measure the downgrade.

    From the die shot, the GPU really dominates the die area already. The ratio of GPU to overall die area may need to be checked against the PS4 and PS4 Pro; this might be somewhere in the same range as the original PS4, while the GPU area for the PS4 Pro was even more lopsided.
    36 would make sense as a minimum that they couldn't go below.


    The clocking method isn't particularly new, as far as AMD is concerned. The PS5 implements a less aggressive version of AMD's DVFS.
    The claim that the CPU supports native 256-bit instructions leads to questions about what was done to the FPU.
    The register file is split like the original 256-bit Zen 2 FPU, but the area and layout don't match very well. If the FPU were treated like two 64-bit halves, that might explain why the alleged register file section is also narrower.
    The Bulldozer line did have a series of changes to the FPU, first by dropping one FP pipe, and then the Steamroller to Excavator transition included high-density libraries that saved quite a bit of area at the expense of top-line clocks. The area savings were notable for the FPU, but I don't think they were limited to just the FP portion and the register file didn't benefit that much.
    The PS5's CPU cores look pretty standard outside of the FPU.



    RDNA GPUs have gone with either layout, depending on unit counts and possibly considerations like making room for other silicon.
    Using a two-sided arrangement like the PS4 Pro means that particular way of growing the GPU in a mid-gen refresh is ruled out.
     
    Pete, thicc_gaf, iroboto and 5 others like this.
  15. Ronaldo8

    Regular Newcomer

    Joined:
    May 18, 2020
    Messages:
    270
    Likes Received:
    320
    My post was more in the general context of BC with PS4/PRO.
     
  16. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,107
    Likes Received:
    3,027
    Location:
    France
    Did you already know of an AMD system with dynamic clocks based on a total instruction budget?
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,552
    Likes Received:
    4,713
    Location:
    Well within 3d
    The earliest version of AMD's method for DVFS I can recall was patented around the Bulldozer introduction, and various iterations go through later CPUs and GPUs.

    Compared to Bulldozer's contemporary from Intel, Sandy Bridge, AMD didn't go with turbo and frequency control using thermal sensor input as the primary signal. The space trade-offs and longer latency of placing a temperature sensor next to a logic block weren't acceptable to AMD.

    Sandy Bridge, for its part, implemented temperature sensors designed to measure only within a limited range near the top end of the operating specs, which meant they could be smaller and more responsive.
    AMD's initial claims were the use of activity counters in the hardware, whose results would depend on the instruction mix going through the core. The counters would be paired with a table of values for the approximate thermal impact of an event in that region of the chip at the present conditions of the silicon.
    This allowed for more rapid detection of thermal spikes at lower die area cost, although it would depend on other factors like the accuracy of the silicon characterization to determine how close it was to calculating temperatures.
    The characterization is necessarily conservative, but a conservative calculation that can accumulate dynamic data at the cycle level can potentially approximate better than thermal diodes that might not register change for multiple milliseconds.
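    The counter-plus-table scheme described above can be sketched roughly as follows. All event names and weights here are invented for illustration; the real tables come from AMD's silicon characterization, not from anything this simple:

```python
# Approximate energy cost per event, in arbitrary units. In the real
# design this table is derived from characterizing the silicon.
CHARACTERIZATION = {
    "int_op":   1.0,
    "fp256_op": 6.0,   # wide FP ops are the hottest events
    "l2_miss":  3.0,
}

def estimated_power(counters: dict, interval_cycles: int) -> float:
    """Weight each event count by its characterized cost and normalize
    per cycle -- power is inferred, never directly measured."""
    energy = sum(CHARACTERIZATION[event] * count
                 for event, count in counters.items())
    return energy / interval_cycles

busy = estimated_power({"int_op": 500, "fp256_op": 400, "l2_miss": 50}, 1000)
idle = estimated_power({"int_op": 100, "fp256_op": 0, "l2_miss": 10}, 1000)
print(busy, idle)  # AVX-heavy intervals dominate the estimate
```

Because the estimate updates every counting interval, a spike of wide FP instructions shows up within cycles, whereas a thermal diode would lag by milliseconds, which is the responsiveness advantage described above.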

    Bugs or weak silicon characterization have dogged AMD at times. For example, Jaguar should have had turbo (one SKU had a very limited upclock when not on battery power), but its hop to TSMC for the prior gen may not have been part of the original plan. It wasn't until the Globalfoundries variants came out that the turbo AMD had announced for Jaguar was actually offered. AMD chips have a history of getting weaker clocking or less effective turbo in the first generation of a chip, with the later refresh usually having more effective clocks despite being generally the same silicon. (Jaguar, Bulldozer APUs, Ryzen 2, 7970, multiple Hawaii generations, etc.).

    Later versions of the DVFS would include more dynamic monitoring of current and voltage behavior, with blocks of dummy ALUs and other units running representative operations to better approximate what the actual logic is doing.

    If you take that foundation and make some adjustments for consistency over the whole family, it acts like what Sony claims. The lookup table for characterizing a chip's silicon doesn't need to be unique to a chip. It would be best for efficiency and top clocks if the values did match the chip, but as long as they aren't too aggressive the chip will function fine. As Sony indicated, there's an ideal SOC model that is given to all PS5 chips, which means their DVFS algorithms produce the same output based on the shared set of values. What's lost is the upper performance range.
    For the CPU, it's a matter of dropping the boost clocks of an architecture able to go +4.5 GHz. For the GPU, it's dropping the upper turbo clocks first shown with Big Navi and going significantly narrower. There's some indication that many RDNA2 chips could go faster, but the overclocking settings max out before the chips do.
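    The "ideal SOC model" idea above can also be sketched as a toy: if every chip consults the same conservative table, the chosen clock is a pure function of the workload estimate, so all consoles behave identically. The power budget, activity numbers, and lower frequency steps are invented; only the 2233 MHz peak matches PS5's advertised GPU clock:

```python
POWER_BUDGET = 100.0                    # arbitrary units
FREQ_STEPS = [2233, 2100, 2000, 1900]   # MHz, highest first (illustrative)

def pick_frequency(estimated_activity: float) -> int:
    """Walk down the frequency ladder until the modeled power fits the
    budget. In this toy model, power scales with activity times the
    frequency relative to the base step."""
    for f in FREQ_STEPS:
        if estimated_activity * (f / FREQ_STEPS[-1]) <= POWER_BUDGET:
            return f
    return FREQ_STEPS[-1]

# Same workload estimate -> same clock on every console, because the
# table and budget are shared rather than characterized per chip.
print(pick_frequency(40.0))  # light load holds the peak clock
print(pick_frequency(90.0))  # heavy load steps down deterministically
```

A per-chip table would let good silicon hold higher clocks, which is exactly the upper performance range Sony gave up in exchange for consistency across the install base.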
     
    Pete, PSman1700, thicc_gaf and 5 others like this.
  18. Globalisateur

    Globalisateur Globby
    Veteran Regular Subscriber

    Joined:
    Nov 6, 2013
    Messages:
    4,107
    Likes Received:
    3,027
    Location:
    France
    Everything is an evolution of something already existing. Cars are nothing new as they evolved from carts pulled by horses.
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,552
    Likes Received:
    4,713
    Location:
    Well within 3d
    The specific link is the use of event counters paired with silicon characterization data to calculate power and temperature, rather than directly measuring them.
    To get what the PS5 does, a universal table that all PS5 APUs follow would yield consistent behavior, as long as other variation-inducing measures like the upper boost clocks are removed.

    The PS5's method can be done by AMD's existing DVFS, by dropping clocks, not using the upper clock range, and not using per-chip characteristic data--doing less than what AMD's standard solution can do.
     
    egoless, tinokun, Pete and 6 others like this.
  20. Vega86

    Newcomer

    Joined:
    Sep 25, 2018
    Messages:
    182
    Likes Received:
    123
    Is there a different chip where the SSD controller/Kraken decompression hardware resides? Will they x-ray that too?
     