AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. PSman1700

    PSman1700 Legend

    Cuthalu likes this.
  2. pTmdfx

    pTmdfx Regular

    `num_rb_per_se` is halved from 8 to 4 in the info scraped from firmware binaries for all Navi 2X GPUs. So the RB:SA ratio is implicitly halved given that SE:WGP is unchanged at 1:10 (40 WGPs for Navi 21), unless SE:SA is changed from 1:2 (unlikely?).

    This means Navi 21 and 22 will get 4 RBs * 4 SEs and 4 RBs * 2 SEs respectively by that metric, and 64/32 colour ROPs respectively, assuming each RBE still does four 32b pixels/clk.
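
    The arithmetic above can be written out as a quick check. This is only a sketch: the function name `colour_rops` is made up for illustration, and the default of four 32b pixels/clk per RB is the same assumption stated in the post, not a confirmed figure.

```python
def colour_rops(num_se: int, num_rb_per_se: int, pixels_per_rb: int = 4) -> int:
    """Colour pixels per clock = SEs x RBs per SE x pixels per RB per clock.

    pixels_per_rb=4 is the post's assumption (each RBE does four 32b pixels/clk).
    """
    return num_se * num_rb_per_se * pixels_per_rb


# num_rb_per_se = 4 comes from the firmware info; SE counts are 4 and 2.
navi21 = colour_rops(num_se=4, num_rb_per_se=4)
navi22 = colour_rops(num_se=2, num_rb_per_se=4)
print(navi21, navi22)
```

    Changing `pixels_per_rb` to 8 gives the alternative reading where each RB is twice as wide.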
     
    Last edited: Sep 27, 2020
    Lightman, BRiT and NightAntilli like this.
  3. iroboto

    iroboto Daft Funk Legend Subscriber

    That's base clock speed, not sure how high boost goes, or what the marketed game clock will be
     
    PSman1700 likes this.
  4. PSman1700

    PSman1700 Legend

    Yes, it's the minimum clock you get; it can only clock up from there if conditions allow.
     
  5. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■) Moderator Legend Alpha

    Or how often it will run above base clock.
     
    PSman1700 likes this.
  6. PSman1700

    PSman1700 Legend

    Most of the time in ideal operations.
     
  7. SimBy

    SimBy Regular

    There's literally a firmware dump with clocks and other info a few pages back. I think I'll trust those more than some newegg rando.
     
    yuri, Picao84, Cuthalu and 4 others like this.
  8. Cat Merc

    Cat Merc Newcomer

    Ah, I missed that
     
  9. 3dilettante

    3dilettante Legend Alpha

    AMD's rasterizers divide screen space into a checkerboard pattern, where each rasterizer and associated RBEs is solely responsible for handling geometry in a given tile. A bounding box based on the min and max xy coordinates can be used to look up responsibility with 4 rasterizers, and it's significantly simpler or trivial with 2 or 1. That's what made me question rumors positing a change that didn't fit this scheme.

    Three isn't so clean, unless striping screen space into vertical regions, but that could more realistically lead to unbalanced distribution of work in scenes with a lot of vertical geometry.
    It wouldn't be impossible for AMD to alter things, but there's a lot of power of two assumptions built into things like ROP tiles, rasterizer regions, caches, and exports as well.
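
    To illustrate why power-of-two rasterizer counts make the lookup cheap, here is a rough sketch. Everything in it is hypothetical: the 32-pixel tile size, the bit layout in `owner`, and the function names are invented for the example and are not AMD's actual mapping.

```python
def owner(tile_x: int, tile_y: int, num_rasterizers: int) -> int:
    """Hypothetical tile-to-rasterizer mapping for 1, 2 or 4 rasterizers."""
    if num_rasterizers == 4:
        # 2x2 repeating block: low bit of x and low bit of y pick the owner.
        return (tile_x & 1) | ((tile_y & 1) << 1)
    if num_rasterizers == 2:
        # Classic checkerboard: neighbouring tiles alternate.
        return (tile_x ^ tile_y) & 1
    if num_rasterizers == 1:
        return 0
    # Three (or any non-power-of-two) does not reduce to a bit test like this.
    raise ValueError("sketch only covers 1, 2 or 4 rasterizers")


def rasterizers_for_bbox(min_x, min_y, max_x, max_y, num_rasterizers, tile_size=32):
    """Return the set of rasterizers whose tiles the bounding box touches."""
    tx0, ty0 = min_x // tile_size, min_y // tile_size
    tx1, ty1 = max_x // tile_size, max_y // tile_size
    # The pattern repeats every 2 tiles, so at most 2 columns and 2 rows
    # need to be inspected no matter how large the box is.
    xs = range(tx0, min(tx1, tx0 + 1) + 1)
    ys = range(ty0, min(ty1, ty0 + 1) + 1)
    return {owner(x, y, num_rasterizers) for x in xs for y in ys}


print(rasterizers_for_bbox(0, 0, 40, 10, 4))
```

    With three rasterizers there is no equivalent two-bit test, which is the awkwardness described above.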
     
  10. Cyan

    Cyan orange Legend

    Lightman likes this.
  11. Leoneazzurro5

    Leoneazzurro5 Regular

    Well, one is quite probably a summary of possible rumors (6 GB on a 6700 XT? No, not unless they are aiming at a sub-$300 selling price: with 6 GB you start to be frame-buffer limited in various games, not to mention the regression in clock when AMD explicitly stated it would increase); the other comes from a dump of an OS. Which one has the better chance of being right?
     
    Cat Merc, Lightman and chris1515 like this.
  12. pjbliverpool

    pjbliverpool B3D Scallywag Legend

    XSX does 8, so it's safe to assume Navi2x does too.
     
    PSman1700 likes this.
  13. pTmdfx

    pTmdfx Regular

    Where is this number coming from? The HC31 presentation says 116 Gpixel/sec, which is ~64 pixel/clk at 1.825 GHz.

    Its block diagram does draw only one RB per shader array, but I would ehh on interpreting that as “1 new RB is the new 2 old RBs”. The diagram is not drawn for precision, especially if you look at L0$ and L2$.
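
    The pixel/clk figure is just the quoted fill rate divided by the clock. A small sketch of that conversion, using only the two numbers from the HC31 slide (the function name is made up for the example):

```python
def pixels_per_clock(fill_rate_gpixel_s: float, clock_ghz: float) -> float:
    """Gpixel/s divided by GHz gives pixels per clock."""
    return fill_rate_gpixel_s / clock_ghz


xsx_px_per_clk = pixels_per_clock(116.0, 1.825)
# The slide value is rounded, so the result lands just under 64.
print(round(xsx_px_per_clk, 2))
```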
     
    Pete and trinibwoy like this.
  14. Jawed

    Jawed Legend

    Die area question is the elephant in the room, still.

    24 dual-CUs (WGPs) in XSX die shot use 95.78mm², about 2mm² per CU.
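
    For reference, the per-CU figure follows from the measured area like this (a sketch; the function name is invented and the 95.78mm² value is the die-shot measurement quoted above):

```python
def area_per_cu(total_area_mm2: float, num_wgps: int, cus_per_wgp: int = 2) -> float:
    """Area per CU, given the area covered by a number of dual-CU WGPs."""
    return total_area_mm2 / (num_wgps * cus_per_wgp)


# 24 WGPs = 48 CUs in 95.78 mm^2
print(round(area_per_cu(95.78, 24), 2))
```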
     
  15. pjbliverpool

    pjbliverpool B3D Scallywag Legend

    The Hot Chips presentation shows us that the XSX has 2 Shader Engines and 64 ROPs, so that's 32 ROPs per SE. This leak shows us that Navi2x appears to have 4 RBEs per SE, which means that if the same holds true for XSX, it must have 8 ROPs per RBE for a total of 64.

    So unless Navi21 has a different configuration, it stands to reason that its 4 Shader Engines will come with 128 ROPs.
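
    Spelled out as a sketch, the chain of reasoning looks like this. It rests on the assumption made in this post that XSX also has 4 RBEs per SE, which is not confirmed; the function name is made up for the example.

```python
def rops_per_rbe(total_rops: int, num_se: int, rbes_per_se: int) -> int:
    """ROPs per RBE implied by a total ROP count and an RBE layout."""
    return total_rops // (num_se * rbes_per_se)


# XSX: 64 ROPs, 2 SEs, and (assumed, from the Navi 2x leak) 4 RBEs per SE.
xsx_rops_per_rbe = rops_per_rbe(total_rops=64, num_se=2, rbes_per_se=4)

# Navi 21: 4 SEs x 4 RBEs per SE, at the same per-RBE width.
navi21_rops = 4 * 4 * xsx_rops_per_rbe
print(xsx_rops_per_rbe, navi21_rops)
```

    If XSX instead has 8 narrower RBEs per SE, the same division gives 4 ROPs per RBE and Navi 21 comes out at 64.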
     
  16. Jawed

    Jawed Legend

    This is fun (I can't work out how to link an image posted in a tweet):

    https://pbs.twimg.com/media/EisSihKXgAEw4ls?format=jpg&name=large

    from:



    I've not heard of InFO_MS before:

    https://www.cieonline.co.uk/cadence...tsmc-info_ms-advanced-packaging-technologies/

    I'm struggling to understand what this really is. It seems to be a "non-interposer" based chip stacking technology.

    from: https://www.cadence.com/en_US/home/solutions/3dic-design-solutions.html
     
    Last edited: Sep 28, 2020
    Pete and Deleted member 13524 like this.
  17. Ext3h

    Ext3h Regular

    https://www.anandtech.com/show/16051/3dfabric-the-home-for-tsmc-2-5d-and-3d-stacking-roadmap

    Explanation for most of the terms.

    And for InFO_MS:
    https://fuse.wikichip.org/wp-content/uploads/2019/07/semicon-2019-tsmc-info_ms.png

    Both memory and logic die have fan-out integrated (InFO), and the memory sits directly on the substrate with a "back end" (side by side) connection between the logic and the memory die. No interposer, but the dies are stretched to work with the spacing of the substrate or edge to edge bonding.
     
    Last edited: Sep 28, 2020
    Kej, Pete and Jawed like this.
  18. Jawed

    Jawed Legend

    Wow, so much out of the loop on this stuff. New respect for TSMC, too.

    So, the rumoured ~500mm² Navi 21, if using InFO_MS to work with HBM, could have substantially less than ~500mm² active GPU area...
     
  19. Ethatron

    Ethatron Regular Subscriber

    It is, thanks. I had a big laugh looking up if Rambo Cache is something serious, and then finding Raja in front of a flat line performance graph ... ROTFL

    Edit: Rambo Cache surfaces in the tweet's responses.
     
    Jawed likes this.
  20. Leoneazzurro5

    Leoneazzurro5 Regular

    InFO_MS does not count RAM area as logic area, and by the way, the (few) pictures we have and the card renders show a package that is quite a bit bigger than Radeon VII's (which had 4 HBM stacks) and on par with the Vega 10 package (which was a 495 mm^2 chip with 2 HBM stacks).
     