AMD: Navi Speculation, Rumours and Discussion [2019-2020]

Discussion in 'Architecture and Products' started by Kaotik, Jan 2, 2019.

Thread Status:
Not open for further replies.
  1. andermans

    Newcomer

    Joined:
    Sep 11, 2020
    Messages:
    28
    Likes Received:
    43
    It should be 8 per RBE now for the most common 32-bit and lower formats, just as in most of the APUs since Stoney Ridge.
     
  2. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    That makes the most sense given each rasterizer spits out 16 pixels per clock for a total of 128 on Navi21. Unless of course the rasterizers were scaled back to 8 pixels per clock but that seems unlikely.
     
  3. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    Indeed, why would they halve the number of RBEs per shader engine from Navi10 to Navi20 without compensating elsewhere?

    Also the XSX has 64 ROPs with only 2 Shader Engines = 8 ROPs / RBE.

    So Navi 21 = 128 ROPs.
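
A rough sketch of that arithmetic, for anyone who wants to plug in their own guesses. The RBE counts per shader engine (8 on Navi 10, 4 on XSX and Navi 21) and the 4 ROPs per RBE on Navi 10 are assumptions inferred from the figures in this thread, not confirmed specs.

```python
# Back-of-envelope ROP count: shader engines x RBEs per engine x ROPs per RBE.
# Every per-chip figure below is an assumption inferred from this thread.

def rop_count(shader_engines: int, rbes_per_se: int, rops_per_rbe: int) -> int:
    """Total ROPs for a chip with the given layout."""
    return shader_engines * rbes_per_se * rops_per_rbe

navi10 = rop_count(shader_engines=2, rbes_per_se=8, rops_per_rbe=4)
xsx = rop_count(shader_engines=2, rbes_per_se=4, rops_per_rbe=8)
navi21 = rop_count(shader_engines=4, rbes_per_se=4, rops_per_rbe=8)

print(navi10, xsx, navi21)
```

Halving the RBEs per shader engine while doubling the ROPs per RBE keeps the per-engine total unchanged, so doubling the shader engines is what would give 128.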
     
    BRiT and Lightman like this.
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    AMD may double zixel (z buffer) rate per RBE - like it did with RV770.

    The rate of render target colour operations is theoretically falling, because of shader complexity (this is why NVidia doubled FP32 ALU rate). The time you need unreal fillrate is for depth pre-pass and shadow buffer rendering, which is zixel rate.

    Also, game engines are moving away from deferred rendering, which is a frequently encountered use case for unreal colour fillrate (short shaders writing lots of colour bytes per pixel in the G-buffer pass).

    5700XT appears to have far too high colour fillrate for its actual performance in games.

    Anyway, these are just my theories.
     
    no-X, NightAntilli and BRiT like this.
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    That’s true but there’s also the 4K hype to account for. That’s a pretty significant increase in fillrate requirements.
     
    Lightman likes this.
  6. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    Has there ever been an AMD GPU with more than 64 ROPs?

    I'd like to believe Navi 21 has 128 ROPs, but my hopes for 96 or more ROPs from AMD have been dashed, time and time again.
     
  7. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,235
    Likes Received:
    4,259
    Location:
    Guess...
    It's certainly a fill rate monster. 2.2GHz puts it at 72% more fill rate than the 3080. Now we just have to see if the memory bandwidth is enough to back it up. Given the XSX is rocking a 320-bit bus, though, I find it virtually inconceivable that Navi21 will be stuck on 256-bit. My money's on 384-bit with 18Gbps for 864GB/s.
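
The numbers above can be reproduced with a quick sketch. The Navi 21 inputs (128 ROPs, 2.2 GHz, 384-bit at 18 Gbps) are the speculation from this thread; the 3080 inputs (96 ROPs, 1.71 GHz rated boost) are assumed here and are not stated in the post.

```python
# Pixel fillrate = ROPs x core clock.
# Memory bandwidth = bus width in bytes x per-pin data rate.

def fillrate_gpix(rops: int, clock_ghz: float) -> float:
    """Peak pixel fillrate in Gpixels/s."""
    return rops * clock_ghz

def bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s."""
    return bus_bits / 8 * gbps_per_pin

navi21 = fillrate_gpix(128, 2.2)    # speculated configuration
rtx3080 = fillrate_gpix(96, 1.71)   # assumed: 96 ROPs at rated boost clock
advantage_pct = (navi21 / rtx3080 - 1) * 100

print(f"{navi21:.1f} vs {rtx3080:.1f} Gpix/s, +{advantage_pct:.0f}%")
print(f"{bandwidth_gbs(384, 18):.0f} GB/s")
```

These are peak theoretical rates only; real cards rarely hold their rated boost, so the gap in practice could be smaller or larger.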
     
  8. NightAntilli

    Newcomer

    Joined:
    Oct 8, 2015
    Messages:
    104
    Likes Received:
    131
    All the leaks are suggesting 256-bit with 128MB of some sort of cache to compensate for the lower bandwidth. Sounds like how the X360 used its eDRAM or the XBO used its eSRAM. Hopefully, if it is like this, immature drivers/firmware don't gimp the performance of the cards.
     
  9. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    Sounds fine to me. An affordable GPU with decent RT performance would be ideal. Another more-of-the-same GPU makes no sense when you have the 5600XT at that price. In some countries a potential cheap GPU featuring RT would be a winner.

    Navi 21 or Sienna Cichlid seems to have 80 CUs or 5,120 SPs, assuming that each CU still carries 64 SPs on RDNA 2. Navi21A silicon shows a boost clock up to 2,050 MHz. The Navi 21B silicon seems to have a 2,200 MHz boost clock. Most interestingly, the power limit varies from 220W to 238W?

    Quite promising if you ask me.
     
  10. NightAntilli

    Newcomer

    Joined:
    Oct 8, 2015
    Messages:
    104
    Likes Received:
    131
    That would be the GPU only, not including the RAM, fans and whatever else needs power on the PCB, like RGB etc.
     
    T2098 and Cyan like this.
  11. Cyan

    Cyan orange
    Legend

    Joined:
    Apr 24, 2007
    Messages:
    9,734
    Likes Received:
    3,460
    mmmmm that changes things. A lot. It looks like it is a 22.5 teraflops gpu btw
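
For reference, a minimal sketch of where the 22.5 figure comes from, assuming 80 CUs, 64 SPs per CU as above, and the usual 2 FLOPs per SP per clock (one fused multiply-add). The function name is just illustrative.

```python
# Peak FP32 throughput = CUs x SPs per CU x 2 FLOPs per clock x clock.

def fp32_tflops(cus: int, sps_per_cu: int, clock_mhz: int) -> float:
    """Peak FP32 TFLOPS, counting one FMA (2 FLOPs) per SP per clock."""
    return cus * sps_per_cu * 2 * clock_mhz / 1e6

for label, mhz in (("Navi 21A", 2050), ("Navi 21B", 2200)):
    print(label, round(fp32_tflops(80, 64, mhz), 2), "TFLOPS")
```

At the rumoured 2,200 MHz boost this gives about 22.5 TFLOPS, and about 21 TFLOPS at 2,050 MHz.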
     
  12. Love_In_Rio

    Veteran

    Joined:
    Apr 21, 2004
    Messages:
    1,627
    Likes Received:
    226
    If they defeat the Ampere 3080 with 230 watts, AMD will indeed have got its new 9800 Pro after so many years.

    By the way, Cerny redeemed? 2.23 GHz not a last-minute overclock? ;)

    I wonder if PS5's IO SRAM is the rumored big new cache.
     
    #3332 Love_In_Rio, Sep 27, 2020
    Last edited: Sep 27, 2020
  13. yuri

    Regular

    Joined:
    Jun 2, 2010
    Messages:
    283
    Likes Received:
    296
    It seems there hasn't. Hawaii brought a whopping 64 ROPs with 4 SEs back in 2013. After that AMD was stuck at 64 ROPs with all following gens, counting Fiji, Vega 10, Vega 20 and Navi 10.

    There has been only a single "leak" mentioning the 128MB "cache" so far. The 256b interface was derived indirectly from the drivers.

    Using Occam's razor one would say the notation in drivers has changed and doesn't reflect Navi 2. So all the alien tech super-caches go away.
     
  14. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    Ah shit, better go tell Naughty Dog and Unity they're out of date :razz:

    Really though, while newer weird stuff like deferred texturing and whatever UE5 does will probably show up more and more, I don't see deferred being ditched entirely. Heck now you've got all the layer mixing and deferred decals you want.

    Also I read that "FRC" paper you posted. Why, why are theses written this way, why are they made to be the least understandable and most overwritten papers you can make, whyyy. But anyway, trying to skip through it, while it's a neat idea for reducing fetch latency and L2 misses at the same time, ultimately it just results in more IPC (or OPC, as the author calls it for some reason) if their modelling is correct. Which means, as their own numbers way far down in the paper suggest, that instructions are done faster on average and so even more traffic to main memory is generated, despite the increased hit rate. Absolute opposite of a magic cache that makes main memory access go away as a problem.
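
A toy illustration of that last point. Every number here is hypothetical and chosen only to show the effect; none of them come from the paper, and the 64-byte line size is likewise an assumption.

```python
# DRAM traffic = instruction rate x L2 accesses per instruction
#                x L2 miss rate x bytes fetched per miss.
# If the instruction rate rises by more than the miss rate falls,
# traffic to main memory goes up even though the hit rate improved.

def dram_traffic_gbs(instr_per_ns: float, l2_accesses_per_instr: float,
                     l2_hit_rate: float, line_bytes: int = 64) -> float:
    """DRAM traffic in GB/s (bytes per nanosecond)."""
    misses_per_ns = instr_per_ns * l2_accesses_per_instr * (1 - l2_hit_rate)
    return misses_per_ns * line_bytes

# Hypothetical baseline: 50% L2 hit rate.
baseline = dram_traffic_gbs(instr_per_ns=10.0, l2_accesses_per_instr=0.1,
                            l2_hit_rate=0.50)
# Hypothetical improved cache: hit rate up to 55%, throughput up 20%.
improved = dram_traffic_gbs(instr_per_ns=12.0, l2_accesses_per_instr=0.1,
                            l2_hit_rate=0.55)

print(f"baseline {baseline:.2f} GB/s, improved {improved:.2f} GB/s")
```

With these made-up inputs the miss rate drops by a tenth while throughput rises by a fifth, so total traffic still increases.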
     
    #3334 Frenetic Pony, Sep 27, 2020
    Last edited: Sep 27, 2020
    BRiT likes this.
  15. pTmdfx

    Regular

    Joined:
    May 27, 2014
    Messages:
    415
    Likes Received:
    379
    L2 cache bumping up to 128MB is not that unlikely now, having thought about it. This is considering rumours suggesting Navi 21 being twice as large (~505 mm^2) as Navi 10, while on the other hand still sticking with a 256-bit GDDR6 bus.

    It explains why there are very few clues in the drivers, and it could be a motivation for AMD reducing/eliminating L2 flushes and adding GL1 cache in the renewed RDNA memory hierarchy. GL1 cache can replace L2 for the bandwidth amplification role in texture reads, and it does seem a more natural cache level for doing this, especially with its fixed tie to a binning rasterizer and its screen space tiles (forces that contribute to locality).

    Although if this were to happen, I would expect clues emerging in the shader compilers by now, with e.g. new optimization passes altering L2 cache policies of memory requests, based on shader type and resource type. Otherwise, commonality of streaming-like accesses in the conventional graphics pipeline is going to work against a way larger L2 cache, which has been small with a write combining focus throughout its GCN lineage, rather than having a decent read hit rate.

    Another possible implementation of "128MB cache" is embedded SRAM banks like Xbox One. Combined with the HBCC resurrected from Vega 10, we will have 64KB (?) pages being hot migrated between the eSRAM and the GDDR6 pool. This could also explain the lack of clues in the OSS drivers, because in theory the GPU can function solely with the GDDR6 pool, and so the OSS code drop can be delayed.

    Edit: More on-chip SRAM could also explain why Navi 22 was rumoured to be 320+ mm^2, despite apparently having the same number of CUs and a 25% narrower GDDR6 bus. (assuming that the intersection module in the TMU has a slim transistor budget)
     
    #3335 pTmdfx, Sep 27, 2020
    Last edited: Sep 27, 2020
    Pete, NightAntilli, Lightman and 2 others like this.
  16. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    161
    Likes Received:
    179
    The very same firmware tables show Navi 10 as having a clockspeed of 1400, which is 300-ish MHz under what it actually does. I wouldn't count on the Navi 2x number being an absolute limit. The power of the GPU actually going down between N22 and N10 despite having the same CU count tells me they're not even pushing it as hard.

    That said, N14 is 1900MHz in those tables, which is actually a reasonable number, so unless we know exactly how the driver uses these values we can't say what they mean. There's also the likely possibility of RDNA2 itself behaving differently with the same input configuration.
     
  17. Wesker

    Regular

    Joined:
    May 3, 2008
    Messages:
    299
    Likes Received:
    186
    Location:
    Oxford, UK
    Those Navi 10 clocks could coincide with the Radeon Pro GPUs:
    https://www.techpowerup.com/gpu-specs/?generation=Radeon+Pro+Mac&sort=generation

    Just looking at Navi 10, Apple may then use some kind of multiplier algorithm to control max/base clocks in the 5600M (MacBook Pro), 5700 and 5700 XT (iMac), and W5700X (Mac Pro).

    I presume Apple are going in on Navi 21 for a Mac Pro MPX module. The plot thickens... Navi 21, if it's above 2.00GHz, is going to have some very impressive performance.
     
  18. Megadrive1988

    Veteran

    Joined:
    May 30, 2002
    Messages:
    4,723
    Likes Received:
    242
    For that reason, I find it very hard to believe Navi 21 has 128 ROPs.

    It was widely reported that Radeon VII had 128 ROPs, but of course that turned out not to be the case; it was just 64.

    For those that are saying Navi 21 has 128, I'd like to know the reasons why, some evidence, etc. Not just rumor.
     
    no-X and chris1515 like this.
  19. pharma

    Veteran

    Joined:
    Mar 29, 2004
    Messages:
    4,887
    Likes Received:
    4,534
    Newegg is listing Radeon RX 6700 XT, 6800 XT and 6900 XT specs in its blog
    https://www.guru3d.com/news-story/n...-xt6800-xt-and-6900-xt-specs-in-its-blog.html
     
    Cyan and PSman1700 like this.
  20. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    161
    Likes Received:
    179
    The ROPs in Navi are tied to SAs. According to the drivers the ratio of WGPs to SAs is identical in Navi 21, which means the number of SAs is doubled from Navi 10. Unless they cut the ROP count per SA for ??? reason, it will have 128 ROPs.
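
The same chain of reasoning as a short sketch. The Navi 10 baseline (20 WGPs, 4 SAs, 64 ROPs) and the 2 CUs per WGP are assumptions used for illustration; the 80 CU figure for Navi 21 is the rumour quoted earlier in the thread.

```python
# Scale ROPs from Navi 10 to Navi 21 by holding the WGP:SA ratio
# and the ROPs-per-SA count constant. Baseline figures are assumed.

NAVI10_WGPS, NAVI10_SAS, NAVI10_ROPS = 20, 4, 64
CUS_PER_WGP = 2

wgps_per_sa = NAVI10_WGPS // NAVI10_SAS    # WGP:SA ratio carried over
rops_per_sa = NAVI10_ROPS // NAVI10_SAS    # ROPs tied to each SA

navi21_wgps = 80 // CUS_PER_WGP            # rumoured 80 CUs
navi21_sas = navi21_wgps // wgps_per_sa
navi21_rops = navi21_sas * rops_per_sa

print(navi21_sas, "SAs ->", navi21_rops, "ROPs")
```

If the ROP count per SA were halved instead, the same script would land back at 64.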
     
    pjbliverpool and Lightman like this.