Next Generation Hardware Speculation with a Technical Spin [post E3 2019, pre GDC 2020] [XBSX, PS5]

Discussion in 'Console Technology' started by DavidGraham, Jun 9, 2019.

Thread Status:
Not open for further replies.
  1. TheAlSpark

    TheAlSpark Moderator
    Moderator Legend

    Joined:
    Feb 29, 2004
    Messages:
    22,146
    Likes Received:
    8,533
    Location:
    ಠ_ಠ
    Just fall back on DX11. :twisted:

    /cheapshot
     
  2. Xbat

    Veteran

    Joined:
    Jan 31, 2013
    Messages:
    1,650
    Likes Received:
    1,315
    Location:
    A farm in the middle of nowhere
    Aren't RDNA 2 cards coming out next year? They need drivers too, so I don't see that being an issue really.
    I'm not saying next gen is going to be using RDNA 2, but the fact that the consoles are supposedly going to have some form of hardware-accelerated ray tracing surely points to RDNA 2.
     
  3. McHuj

    Veteran Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,613
    Likes Received:
    869
    Location:
    Texas
    If Navi 22/23 are RDNA2 based, then drivers are being actively worked on.
     
    HBRU and BRiT like this.
  4. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,785
    Likes Received:
    12,697
    Location:
    London, UK
    Don't worry about drivers, worry about each console's optimisation tools. Making launch games is hard enough when trying to balance a bunch of things on low-cost hardware, and suddenly having one part of the system behave differently/faster may sound good in theory but not always in practice. Graphics tech getting a boost sounds cool, but if the GPU is sharing bandwidth to RAM with the CPU, is the CPU now disadvantaged? Does code that used to fit in 16ms now take 17ms? :runaway:

    Without knowing anything about system architecture, it's hard to predict what any theoretical 'boost' to the graphics architecture means for the system as a whole.
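    DSoup's 16 ms figure is the per-frame budget at 60 fps. A tiny sketch (illustrative numbers only, not tied to any real console) shows why a 1 ms overrun is worse than it sounds on a vsync-locked display:

    ```python
    import math

    # Frame budget at a given target frame rate: 1000 ms / fps.
    def frame_budget_ms(target_fps: float) -> float:
        return 1000.0 / target_fps

    # On a vsync-locked display, a frame that misses its window waits for
    # the next refresh, so the effective rate drops in whole divisors.
    def effective_fps(frame_time_ms: float, refresh_hz: float = 60.0) -> float:
        interval_ms = 1000.0 / refresh_hz
        intervals_needed = math.ceil(frame_time_ms / interval_ms)
        return refresh_hz / intervals_needed

    print(round(frame_budget_ms(60), 2))  # 16.67
    print(effective_fps(16.0))            # 60.0
    print(effective_fps(17.0))            # 30.0 -- a 1 ms overrun halves the rate
    ```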
     
    PSman1700, BRiT and function like this.
  5. function

    function None functional
    Legend

    Joined:
    Mar 27, 2003
    Messages:
    5,854
    Likes Received:
    4,411
    Location:
    Wrong thread
    I could see consoles possibly going for half the L3 of desktop. Consoles could probably get away without Epyc amounts of LLC. Save die area for CUs, lower L3 latency, and save a little power to spend elsewhere.
     
  6. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    I’m just speaking off the top of my head; @3dilettante might be able to provide a better perspective on this. It’s been my understanding that even after your design and architecture are done and taped out, you submit batches for silicon production, see how well the chips are performing in terms of yield and clocking, make adjustments, then retool and do it again, until you reach an acceptable quality level at which you begin massive production.

    I could be wrong on this front, but that process sounds long.

    And considering consoles require a lot of assembly as well, everything needs to come together, be packaged, and be distributed within 12 months from now. That probably leaves 3-4 months before mass production must begin in order to have 1-2 million launch consoles ready.

    I’m not saying it’s not RDNA 2.0; I’m just saying the longer it takes RDNA 2.0 to release, the less likely it’s RDNA 2.0.
     
    PSman1700 likes this.
  7. HBRU

    Regular

    Joined:
    Apr 6, 2017
    Messages:
    837
    Likes Received:
    180
    Linux RDNA2 drivers being worked on NOW is, IMHO, real news, and it makes an RDNA2-based GPU for both consoles a much closer reality... So it seems I was wrong. Does anyone know the TF conversion factor of RDNA2 in comparison to RDNA1?
     
  8. iroboto

    iroboto Daft Funk
    Legend Subscriber

    Joined:
    Mar 6, 2014
    Messages:
    14,834
    Likes Received:
    18,634
    Location:
    The North
    Lol. Launch like XBO? Hells yea!
     
  9. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Nevermind RDNA 1.0 and 2.0 for a minute.

    Do we 100 percent believe that both PS5 and Scarlett will end up with 36/40/44 CUs, and not more?
     
    HBRU likes this.
  10. PSman1700

    Legend

    Joined:
    Mar 22, 2019
    Messages:
    7,119
    Likes Received:
    3,093
    Could be a custom solution, resulting in a hybrid, RDNA1 with RT, or something.
     
  11. McHuj

    Veteran Subscriber

    Joined:
    Jul 1, 2005
    Messages:
    1,613
    Likes Received:
    869
    Location:
    Texas
    No, just guesses based on Navi 10's die size and power consumption.
     
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    There was a GDC 2018 optimization hot lap from AMD that gave ~114, ~190, ~350 cycles respectively for L1 hit, L2 hit, and L2 miss.
    AMD indicated elsewhere 10% improvement with Navi, although I didn't see it giving specific values versus an overall improvement because of the additional capacity throughout. The most recent Navi architecture slides indicated there's a lower-latency path for loads that bypass the sampling hardware, but no concrete figures.

    Perhaps a BVH block would see latency closer to the direct load path, whatever value that is. If it's not at least an order of magnitude better, then it might explain why AMD's method has the BVH hardware defer to SIMD after each node evaluation. The CU's register file and LDS might be necessary to buffer sufficient context for a traversal method with such a long-latency critical loop. Perhaps it's counting on the CU's larger context storage to support more rays concurrently, or its register file and LDS to have the more reasonable latency figures for frequently hit node data.

    Reverse engineering of Nvidia's L1 in Turing shows it's on the order of 32 cycles, which while vastly better than GCN is sloth-like compared to CPUs. Less clear is which level Nvidia's RT cores interface with. It seems probable they're closely linked to the SM's memory pipeline, but some of Nvidia's patents might have it hooked up outside the L1 and reliant on the L2. The L2 is about as slow as GCN's, which might be problematic if that's how it's implemented. However, there were indications that the RT hardware would have its own storage at presumably reasonable latency, and there were some hints that there is storage management done by the RT core for memory not clearly associated with the L1 or L2.

    links:
    GCN memory: https://gpuopen.com/gdc-2018-presentation-links/, optimization hot lap
    Turing's memory: https://arxiv.org/pdf/1903.07486.pdf
    edit: Navi reference https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Architecture_public.pdf
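    To get a feel for why those cycle counts matter for traversal, here is a back-of-the-envelope model using the GCN figures quoted above. The cache hit-rate mix and nodes-per-ray count are invented for illustration, not measured values:

    ```python
    # Expected memory latency per BVH node visit, using the GCN cycle counts
    # quoted above (~114 L1 hit, ~190 L2 hit, ~350 L2 miss).
    GCN_LATENCY = {"l1_hit": 114, "l2_hit": 190, "l2_miss": 350}

    def expected_node_latency(latencies, hit_rates):
        return sum(latencies[k] * p for k, p in hit_rates.items())

    # Suppose (purely for illustration) 60% of node fetches hit L1,
    # 30% hit L2, and 10% go all the way to memory:
    mix = {"l1_hit": 0.6, "l2_hit": 0.3, "l2_miss": 0.1}
    per_node = expected_node_latency(GCN_LATENCY, mix)  # ~160 cycles

    # If a serial traversal touches ~30 nodes per ray, the critical path is
    # thousands of cycles per ray -- hiding that means keeping many rays in
    # flight, which is where a CU's large register file and LDS would help.
    cycles_per_ray = 30 * per_node  # ~4800 cycles
    print(per_node, cycles_per_ray)
    ```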

    I'm unclear on the use of the term hybrid ray tracing. That's often used to describe a rendering engine that combines rasterization and ray tracing, but that's usually still on the GPU.

    When involving a read/write penalty in the context of CPU and GPU cooperation, it sounds like you might mean the heavier synchronization barriers between them. Those generally exist because the GPU's memory model is much weaker than the CPU's and the GPU's overall execution loop is vastly longer latency and unpredictable in length.
    How closely you think the CPU and GPU are cooperating might make a difference. Directly reading and writing to the same cache lines would either be brutally slow or error-prone. Trading intermediate buffers between certain synchronization points seems possible, but I'm not sure how much that would be a change from some of the mechanisms available already.
    A more integrated approach likely means the GPU's memory and cache pipeline is very different from what we've seen already described, and I am not sure a critical area like CPU memory handling is worth the risk of disrupting.

    We're throwing around version numbers like they mean something. We don't have a good way to define how much a given change increments a counter, or whether AMD would care if we did. AMD resisted applying a number to GCN generations for quite some time, with sites like Anandtech going with terms like GCN 1.2 for the generation after Sea Islands to try to describe changes in an architecture AMD treated as an amorphous blob--until it relented and labelled things GCN1 (Southern Islands), GCN2(Sea Islands and consoles), GCN3 ("GCN 1.2", Fiji/Tonga/Polaris), etc.--then reverted to calling the next version Vega ISA.

    In that context, the consoles were modifications from a GCN 1.x baseline already, one which AMD decided to give a whole number increment above the original hardware.
    As for what is considered significant enough, hardware outside the CU array has often been updated relatively flexibly. The mid-gen consoles took on the delta memory compression found in the Polaris and Vega products, and instructions for packed FP16 math from Vega showed up in the PS4 Pro. However, I think there's evidence that significant architectural changes like scalar memory writes from GCN3 did not show up, so outside of the additional hardware they were very close to the GCN2 baseline.

    The transition from GCN to GCN2 may be a comparison point for RDNA1 to whatever follows it. One significant change to that ISA was the addition of a new instruction group for flat addressing, with a modest number of new instructions added and some deprecated. Whether a new instruction or instructions for BVH node evaluation rises to the level of a whole new addressing mode may be up to the observer.

    I may have missed confirmation about details on the earliest dev kits for the current gen. I remember the rumor was PCs using GPUs like Tahiti.
    The earliest Sea Islands GPU to be released was Bonaire in spring of 2013. Hawaii was launched a little before the consoles launched.
    Early silicon for those GPUs would seem to be the absolute limit for when developers could have done anything with non-console hardware with Sea Islands features.
     
    #1472 3dilettante, Nov 2, 2019
    Last edited: Nov 2, 2019
    MBTP, Shortbread, function and 9 others like this.
  13. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Okay good. Personally, I'd be really happy with 48 or 52 active CUs in either or both console GPUs.

    Let's say the retail PS5 had 48 active CUs at 1.8~1.9 GHz (2.0 GHz for dev kits, same 48 CUs)
    and the retail Xbox Scarlett had 52 active CUs at 1.7~1.8 GHz (56 in dev kits, same clock).

    The raw performance difference is in the single digits, and it's really the "secret sauce" in each
    (i.e. specific RDNA xx features and HW ray tracing implementations) that distinguishes one from the other.
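    The single-digit claim checks out with the standard FLOPS formula for AMD GPUs (TFLOPS = CUs × 64 lanes × 2 ops per FMA × clock in GHz); the CU counts and clocks below are just the hypothetical numbers from the post:

    ```python
    # FP32 throughput for a GCN/RDNA-style GPU: CUs * 64 lanes * 2 ops (FMA) * clock.
    def tflops(cus: int, clock_ghz: float) -> float:
        return cus * 64 * 2 * clock_ghz / 1000.0

    ps5_guess      = tflops(48, 1.85)  # ~11.4 TF
    scarlett_guess = tflops(52, 1.75)  # ~11.6 TF

    diff_pct = 100 * abs(scarlett_guess - ps5_guess) / min(ps5_guess, scarlett_guess)
    print(ps5_guess, scarlett_guess, diff_pct)  # the gap is indeed single-digit %
    ```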
     
  14. HBRU

    Regular

    Joined:
    Apr 6, 2017
    Messages:
    837
    Likes Received:
    180
    Yes
     
  15. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,516
    Likes Received:
    24,424
  16. DSoup

    DSoup Series Soup
    Legend Subscriber

    Joined:
    Nov 23, 2007
    Messages:
    16,785
    Likes Received:
    12,697
    Location:
    London, UK
    A couple of days back Sony's Jim Ryan said to gamesindustry.biz: "One thing that makes me particularly optimistic that what we're hearing from developers and publishers, is the ease in which they are able to get code running on PlayStation 5 is way beyond any experience they've had on any other PlayStation platform."

    Sony made a big deal about the ease of time-to-triangle on PS4, so I'd chalk up most of this to nextgen being a technological evolution rather than a revolution. Xbox went from 80x86/Nvidia to PowerPC/AMD to 80x86/AMD, and PlayStation went from MIPS/WeirdArse3D to MIPS/EmotionEngine to PowerPC/Nvidia to 80x86/AMD, with a few weird bus/RAM setups from both manufacturers along the way. This is probably the first console transition in years - outside of Nintendo - where the technology is immediately familiar and the toolchain isn't entirely new.
     
    #1476 DSoup, Nov 8, 2019
    Last edited: Nov 8, 2019
  17. anexanhume

    Veteran

    Joined:
    Dec 5, 2011
    Messages:
    2,078
    Likes Received:
    1,535
    Hello all. A large number of Microsoft ray tracing and GPU patents for your consumption:

    Changing how textures are handled to reduce memory bandwidth needs:

    https://patents.justia.com/patent/10388058

    Pre-passing to determine whether to ray trace or use SSR for pixels:

    http://www.freepatentsonline.com/y2019/0311521.html

    Hardware assisted GPU performance profiling:

    http://www.freepatentsonline.com/WO2019173115A1.html

    This seems to be the big one. Custom methods and hardware for intercepting GPU instructions for the purposes of accelerating ray-tracing:

    http://www.freepatentsonline.com/WO2019168727A1.html

    More GPU modifications to facilitate ray-tracing:

    http://www.freepatentsonline.com/WO2019168726A1.html

    Better GPU fault detection:

    http://www.freepatentsonline.com/y2019/0272206.html

    Managing GPU memory allocation:

    http://www.freepatentsonline.com/y2019/0236749.html

    Improving ray intersection testing:

    http://www.freepatentsonline.com/WO2019099283A1.html

    BVH traversal:

    http://www.freepatentsonline.com/WO2019036098A1.html
     
    Pete, Newguy, mahtel and 12 others like this.
  18. bbot

    Regular

    Joined:
    Apr 20, 2002
    Messages:
    750
    Likes Received:
    13
    I tried my hand at estimating the size of Project Scarlett's SoC. First, I had to find a ruler to measure the dimensions of the SoC and GDDR6 module in the reveal trailer. Then, I had to find the actual dimensions of the GDDR6 module. The short side is facing the SoC.

    350mm^2, my foot. It's more like 600mm^2. It's, like Donald Trump likes to say, "YUUUUUGE".
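    bbot's method can be written down explicitly. Common GDDR6 parts ship in a 14 mm × 12 mm FBGA package, so the module's known side provides the scale; the pixel figures below are placeholders, not the actual trailer measurements:

    ```python
    # Estimating die area from a video frame using a known-size object as scale.
    # GDDR6 chips typically come in a 14 mm x 12 mm FBGA package; assume the
    # short 12 mm side faces the SoC, as bbot notes. Pixel values below are
    # placeholders, not measurements taken from the trailer.
    GDDR6_SHORT_SIDE_MM = 12.0

    def estimate_die_mm2(gddr6_short_px: float, die_w_px: float, die_h_px: float) -> float:
        mm_per_px = GDDR6_SHORT_SIDE_MM / gddr6_short_px
        return (die_w_px * mm_per_px) * (die_h_px * mm_per_px)

    # e.g. if the GDDR6 short side spans 60 px and the die 120 x 100 px:
    print(estimate_die_mm2(60, 120, 100))  # (120*0.2) * (100*0.2) = 480 mm^2
    ```

    The method is only as good as the pixel measurements, which is presumably why Shifty Geezer asks for them and Proelite disputes what counts as the die versus the substrate.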
     
  19. Shifty Geezer

    Shifty Geezer uber-Troll!
    Moderator Legend

    Joined:
    Dec 7, 2004
    Messages:
    44,106
    Likes Received:
    16,898
    Location:
    Under my bridge
    Please post your measurements.
     
  20. Proelite

    Veteran Subscriber

    Joined:
    Jul 3, 2006
    Messages:
    1,620
    Likes Received:
    1,107
    Location:
    Redmond
    I am confident in my measurements of 24 mm x 15-16 mm. The silver part in the middle is the die! You must be measuring the substrate too. :lol:

     
    psolord and Shifty Geezer like this.