AMD Radeon RDNA2 Navi (RX 6500, 6600, 6700, 6800, 6900 XT)

Discussion in 'Architecture and Products' started by BRiT, Oct 28, 2020.

  1. yuri

    Regular

    Joined:
    Jun 2, 2010
    Messages:
    283
    Likes Received:
    296
    Those roadmaps are a surprisingly clever marketing product - they allow for a two-year-long launch window.

    Zen 3 was launched in 2020, although it was a very small launch compared to the 2021 schedule. In 2021 AMD is launching the following Zen 3 products: Milan server processors, the non-X and remaining X desktop processors, Cezanne APUs, Threadripper HEDT parts, and *maybe* the Warhol DDR5 refresh. The pattern is similar for RDNA2. This year AMD should launch only the 6800/6900 series. The 6700 and lower dGPUs, the PRO line, and the APUs are clearly 2021.
     
    Lightman, Jawed and chris1515 like this.
  2. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Samsung is using RDNA2, not RDNA3 (of course this could change by the release, but that's the official story so far).
    RDNA2 wasn't developed with "Sony's money" any more than it was developed with "Microsoft's money"; it would have been developed regardless of either console manufacturer, and most of what Sony and MS pay is only now starting to roll in, as they're actually buying the chips from AMD.
    Considering that each architecture is really a multi-year project, of course they have overlapping development by different teams.
     
  3. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    That's not to say the goalposts haven't shifted in the meantime - it's G6 now, not G5 anymore. But regarding the 6800: AMD is using the fastest GDDR6 available.
     
    Erinyes, PSman1700, Lightman and 2 others like this.
  4. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    I can run a few tests - just let me know the settings and which benchmark you want to see scaling in.

    My memory starts artifacting at 2150MHz with fast timings; I haven't spent much time finding the real limit since it runs fine at 2100MHz, and I was spending my free time tweaking the core first. Undervolting boosted my Port Royal score from 9760 to 9970, give or take a few points. The core was set to 2727MHz in both cases and all other parameters were kept equal.
     
    Pete and PSman1700 like this.
  5. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    Looks like SAM is up and running on Intel + RDNA 2.

    https://wccftech.com/amd-smart-acce...ble-performance-gains-with-radeon-rx-6800-xt/

    Also, there is something about Zen 2 and earlier possibly not supporting SAM due to a hardware limitation.

    https://www.techpowerup.com/275565/...mitation-intel-chips-since-haswell-support-it

    Quite embarrassing if true.
     
  6. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    Isn't Zen3's IO die (incorporating PCIe controller) identical to Zen2's?
     
  7. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,055
    Likes Received:
    3,111
    Location:
    New York
    Presumably it’s a CPU instruction or the IO controller isn’t 100% the same. Or maybe it’s just a bullshit rumor.
     
  8. manux

    Veteran

    Joined:
    Sep 7, 2002
    Messages:
    3,034
    Likes Received:
    2,276
    Location:
    Self Imposed Exhile
    Isn't it the case that the GPU connects straight to the CPU via PCIe, and the IO die is mainly for the CPU-chipset connection? Or does the IO die act as a pass-through without much logic?
     
  9. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,451
    Likes Received:
    471
    CPU chiplet has no PCIe interface nor PCIe controller. Both are located in the IO chiplet, which is connected to the CPU chiplet via GMI2 / IFL.
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,708
    Likes Received:
    2,132
    Location:
    London
    It seems like it would be a waste of time doing these tests since the potential gain is so low and may not even exist.

    I was curious about what might be seen with a refresh in June, say, if it had GDDR6X.
     
    Lightman likes this.
  11. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    My memory overclock is only 5%, and with the Infinity Cache in place it's really difficult to notice its impact.
    Just out of curiosity, I ran Unigine Heaven at QHD, stopped the camera, and manipulated the memory clock within the same scene. There was absolutely no difference in FPS across the 3 scenes I tested, which ranged from 160 to 223 FPS. I think either this engine and level of complexity is very well suited to the Infinity Cache, or the lack of movement doesn't put any extra load on the memory controller, hence no difference.

    For my 2nd test I switched to Unigine Superposition and went straight to the 4K and 8K tests. Both sets ran with my standard overclock of 2727MHz on the GPU, a custom fan profile to avoid any thermal throttling, and an undervolt to 1050mV (which in reality just shifts the voltage curve, as the chip still hits 1150mV at higher clocks - the threshold before it has to go there just comes later in the scale).

    Here are results:
    [screenshots] 4K - the only difference is RAM: 2000MHz (top) vs 2100MHz (bottom)

    [screenshots] 8K - 2000MHz (top) vs 2100MHz (bottom)

    In this test, at 4K Optimized my card ranks around 220th in the world, among similarly overclocked RTX 3090s. On the other hand, at 8K Optimized settings I still rank around 220th, but there the performance only reaches RTX 2080 Ti level.

    Clearly, nVidia's brute-force approach to memory bandwidth has big advantages at very high resolutions. At the same time, in this test at least, a 5% memory overclock gains less than 1% performance at 4K, while at 8K the gain jumps to over 1.5%.
    What clouds an accurate measurement of the memory clock's influence on overall performance in my case is its influence on the GPU's dynamic clock. Simply put, the memory overclock eats into the same 300W power limit the GPU has, lowering the average GPU frequency by about 20-30MHz. I can imagine situations where having less power and frequency eaten by the memory subsystem results in higher FPS, as the GPU will clock higher. This should be most likely at lower resolutions, as the Infinity Cache will have a higher hit rate, saving even more energy in the memory subsystem and leaving it to boost GPU clocks.
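The <1% gain at 4K from a 5% memory overclock is roughly what a simple two-level bandwidth model would predict. A small sketch, where the hit rates are my illustrative assumptions (not AMD's published figures) and the result is only an upper bound for a fully bandwidth-bound workload:

```python
# Sketch: with an Infinity Cache hit rate h, only the (1 - h) miss fraction
# of memory traffic touches DRAM, so a memory overclock can move FPS by at
# most mem_oc * (1 - h) even if the workload were fully bandwidth-bound.
# The hit rates below are assumptions for illustration.

def pct_gain(before, after):
    """Relative gain in percent."""
    return (after - before) / before * 100.0

mem_oc = pct_gain(2000, 2100)        # the +5% memory clock bump from the post
print(f"memory overclock: +{mem_oc:.1f}%")

for hit_rate in (0.75, 0.58):        # assumed hit rates at lower res vs 4K
    ceiling = mem_oc * (1.0 - hit_rate)
    print(f"hit rate {hit_rate:.0%}: at most ~{ceiling:.2f}% FPS gain")
```

With an assumed ~58% hit rate at 4K, the ceiling is only about 2%, so the measured sub-1% figure is unsurprising once GPU-bound frame time is factored in.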
     
  12. tsa1

    Newcomer

    Joined:
    Oct 8, 2020
    Messages:
    89
    Likes Received:
    97
    Or it's that higher GPU utilization actually turns into higher power consumption and lower clocks. It's the same with Vega: in poorly optimized games like Assassin's Creed Odyssey or Final Fantasy XV (the one with the inside-out TWIMTBP fluffy cows) it can boost upwards of 1700MHz, while in Witcher 3 or Doom Eternal it goes down to 1640-1650MHz with the same OC settings. If you increase the power limit (or if it can be raised beyond the standard 15% so that neither the package power nor the TBC limitation really matters anymore), you'd probably see higher GPU clocks and higher scores with heavily OC'd VRAM compared to stock / lesser OC settings.
     
  13. LordEC911

    Regular

    Joined:
    Nov 25, 2007
    Messages:
    877
    Likes Received:
    208
    Location:
    'Zona
    I remember it being more about the size of the memory controllers.
    They were able to fit a 512-bit bus into 440mm². Hawaii's interface took 20% less space than Tahiti's 384-bit one, earning them ~50% more bandwidth per mm².

    It scales more easily, but needs extra memory ICs. Increasing from 256-bit to 384-bit is a minimum 50% increase in your DRAM costs. GDDR6 isn't exactly cheap, and I don't even want to think what Nvidia will be paying for 2GB GDDR6X.
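The arithmetic behind those two claims can be checked on the back of an envelope. A sketch, assuming launch data rates (5.5 Gbps for Tahiti's HD 7970, 5.0 Gbps for Hawaii's R9 290X) and taking the "20% less space" figure from the post as given:

```python
# Back-of-the-envelope check on bandwidth per unit of interface area and
# on the DRAM-count cost of a wider bus. Data rates are launch values and
# the area ratio comes from the post; both are assumptions, not die
# measurements of my own.

def peak_bw(bus_bits, gbps):
    """Peak bandwidth in GB/s: bus width in bits / 8, times per-pin data rate."""
    return bus_bits / 8 * gbps

tahiti_bw = peak_bw(384, 5.5)   # HD 7970 at launch: 264 GB/s
hawaii_bw = peak_bw(512, 5.0)   # R9 290X at launch: 320 GB/s

tahiti_area = 1.0               # normalized interface area
hawaii_area = 0.8               # "20% less space" per the post

ratio = (hawaii_bw / hawaii_area) / (tahiti_bw / tahiti_area)
print(f"bandwidth per unit of interface area: {ratio:.2f}x")   # ~1.52x

# The DRAM-cost side: 256-bit -> 384-bit means 8 -> 12 memory ICs.
print(f"extra DRAM chips: +{(12 / 8 - 1):.0%}")                # +50%
```

The ~1.5x result matches the "~50% more bandwidth per mm²" claim, and the chip count confirms the minimum 50% DRAM cost increase for the wider bus.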


    Edit- They also wanted to scale out their ROPs which were tied to L2/MC at the time.
     
    #1653 LordEC911, Dec 5, 2020
    Last edited: Dec 5, 2020
    Jawed, Putas, Silent_Buddha and 2 others like this.
  14. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,493
    Likes Received:
    474
    Not everything shrinks the same with each new process node, so that could have something to do with it. For example, wires don't shrink at the same rate as combinational (standard cell) logic.
     
  15. chris1515

    Legend

    Joined:
    Jul 24, 2005
    Messages:
    7,157
    Likes Received:
    7,965
    Location:
    Barcelona Spain
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Going that wide again may require pushing memory clocks near where Hawaii's bus topped out. Initial speeds were 5.0 Gbps, with the 390X hitting 6.0.
    I've seen commentary about the routing and spacing rules for the higher-speed GDDR6/GDDR6X chips being a possible limiter for bus width on the high-end. Perhaps the gap is too wide to justify reducing clocks enough to get to 512 bits?
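That tradeoff between bus width and per-pin speed is easy to put in numbers. A sketch, where the 12 Gbps figure for a derated wide bus is purely my hypothetical (the 256-bit/16 Gbps and 384-bit/19.5 Gbps points are the shipping RX 6800 XT and RTX 3090 configurations):

```python
# Width-vs-speed tradeoff: total bandwidth is bus width (bytes) times the
# per-pin data rate, so a wider bus at derated clocks can still come out
# ahead. The 12 Gbps "derated" rate is a hypothetical for illustration.

def peak_bw(bus_bits, gbps):
    return bus_bits / 8 * gbps   # GB/s

configs = {
    "256-bit @ 16.0 Gbps (RX 6800 XT)":           peak_bw(256, 16.0),
    "384-bit @ 19.5 Gbps (RTX 3090, GDDR6X)":     peak_bw(384, 19.5),
    "512-bit @ 12.0 Gbps (hypothetical derated)": peak_bw(512, 12.0),
}
for name, bw in configs.items():
    print(f"{name}: {bw:.0f} GB/s")
```

Even derated to 12 Gbps, a 512-bit bus would beat the 6800 XT's raw DRAM bandwidth by 50%; the question raised above is whether routing and board cost make that worthwhile against a big on-die cache.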
     
    no-X and CarstenS like this.
  17. Frenetic Pony

    Regular

    Joined:
    Nov 12, 2011
    Messages:
    807
    Likes Received:
    478
    Could be why Nvidia stuck with 384bit and pushed speeds?

    Then if a giant cache loses efficiency at a 4k resolution, either from cache overflow despite being 128mb or from some pass needing a lot of access to main memory (has it been figured out which one it is?) then what's the solution to bandwidth scaling? Hoping HBM becomes cheaper doesn't seem the most likely. How cheap is Intel's EMIB style connect supposed to make it, I've seen claims that it's supposed to be cheaper, but not numbers.
     
  18. glow

    Newcomer

    Joined:
    May 6, 2019
    Messages:
    40
    Likes Received:
    31
    It is hard to compare Intel foundry/packaging costs to others. I suppose we'll have to watch how TSMC's version of EMIB compares to their current interposer. I don't think it has shipped in anything yet?
     
  19. tsa1

    Newcomer

    Joined:
    Oct 8, 2020
    Messages:
    89
    Likes Received:
    97
    Not sure about modern GPUs (as I can't get one for weeks, sad sigh), but overclocking the memory on an R9 290 was a basically pointless endeavour: going from the standard 1250MHz to something like 1500MHz improved bandwidth a lot (judging from oclmembench), but scores in benchmarks and games only went up by a measly 3-5% at best. Of course, it might have been higher at 4K and in modern games, but at the time it was just a matter of academic interest / hardware-benching kind of stuff.
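Those R9 290 numbers fit a simple Amdahl-style reading: if +20% bandwidth only buys 3-5%, the bandwidth-bound fraction of frame time must have been small. A sketch of that model (the model and the candidate fractions are my own, not measured):

```python
# Amdahl-style model: if a fraction f of frame time is bandwidth-bound and
# bandwidth improves by factor s, the overall speedup is 1 / ((1-f) + f/s).
# The candidate fractions below are assumptions for illustration.

def overall_speedup(f, s):
    """Overall speedup when a fraction f of the work is sped up by factor s."""
    return 1.0 / ((1.0 - f) + f / s)

s = 1500 / 1250                  # +20% memory clock, as in the post
for f in (0.2, 0.3, 0.5):        # assumed bandwidth-bound fractions
    gain = overall_speedup(f, s) - 1.0
    print(f"bandwidth-bound fraction {f:.0%}: {gain:+.1%} overall")
```

Working backwards, the observed 3-5% gain implies only roughly 20-30% of frame time was bandwidth-bound at the settings tested, consistent with the "pointless endeavour" conclusion.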
     
  20. fellix

    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,552
    Likes Received:
    514
    Location:
    Varna, Bulgaria
    The Hawaii GPU doesn't support framebuffer compression; that could be one of the reasons for the poor performance scaling in this case.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.