AMD: RDNA 3 Speculation, Rumours and Discussion

Discussion in 'Architecture and Products' started by Jawed, Oct 28, 2020.

  1. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    Chiplets for gaming workloads require explicit developer support to group work in a way that minimizes cross-chip traffic. It's the same reason SLI is dead. Moving data between chips eats a ton of power if it's done at the same speed the chip operates at internally. Chiplets will happen for gaming once developers are on board. One could claim NVIDIA's server solutions are "kind of chiplets", as the GPUs are connected together via NVLink and share the same cache-coherent memory space. But even in that space the requirement is to be aware of the penalty of chip-to-chip communication. People there, however, are willing to take that into account and optimize in order to solve bigger problems.

    Chiplets for CPUs are easier, as there are independent workloads. Though even there we have seen communication between chips cause performance degradation, requiring optimization to minimize hopping between chips.
     
  2. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,130
    Likes Received:
    510
    Or you just build an overkill d2d interconnect that uses fancy adv packaging.
    It does, but NVLink is slow by GPU standards.
    A100's total NVLink bandwidth is around 600GB/s, while A100's main DRAM bandwidth is over twice that number.
    And that's DRAM.
    Fuck no, the sheer amount of CC magick going into stuff like Rome or SPR is amazing.
    CPU CC is a nasty, bulky and hard abstraction we all deal with cuz we have to.
     
  3. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    If I don't remember it all wrong, the NVLink bandwidth in the latest NVIDIA servers is something like 300GB/s down and 300GB/s up, for 600GB/s combined. That's a pretty decent NVLink speed in and out of a single chip. A single link isn't that fast, but there are multiple NVLinks per chip and flexibility in how to route the traffic between chips using one or more of the links.
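    As a sanity check on those figures, the aggregate can be reconstructed from per-link numbers. The 12-link, 25GB/s-per-direction breakdown below is my assumption of the commonly cited A100 NVLink 3.0 configuration, not something stated in this thread:

    ```python
    def aggregate_nvlink_gbs(links: int, gbs_each_direction: float) -> float:
        """Total bidirectional bandwidth: links x per-direction rate x 2 directions."""
        return links * gbs_each_direction * 2

    # Assumed A100 configuration: 12 third-gen NVLink links, 25 GB/s each way.
    print(aggregate_nvlink_gbs(12, 25))  # -> 600 GB/s combined, matching the figure above
    ```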
     
  4. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,130
    Likes Received:
    510
    Yeah, but A100 alone has over twice the b/w to main DRAM.
    And its L2 bandwidth per NUCA segment is something unholy.
     
  5. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,244
    Likes Received:
    1,681
    Location:
    msk.ru/spb.ru
    Yeah "mostly" as in "not at all".
    As I've said, let's wait and see. It's not a given that a chiplet design on N5 would be better than a traditional one on N7 for example.
     
  6. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    I remember why I don't participate in these threads. I'll take my coat and go wait to see what Frontier is all about. This could easily be another GDDR6X moment for some people.
     
  7. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,130
    Likes Received:
    510
    Man that's a lotta seething from you.
    Of course.
    It's always funny with you.
    Of course it would be, even a single tile would be quite a lot more mean.
    N5(p) is also some more speed, and everyone loves speed.

    Lower end N3x's are single dies for mobile anyway.
    So ugh I dunno what's even your point.
    It's very simple, Trento and 4 MI200s doing a thing.
    Add some Slingshot juice and ta-da!
     
  8. DegustatoR

    Veteran

    Joined:
    Mar 12, 2002
    Messages:
    2,244
    Likes Received:
    1,681
    Location:
    msk.ru/spb.ru
    That's pretty obvious tbh
     
  9. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,130
    Likes Received:
    510
    Yeah the packaging volumes can be a bit too soulcrushing given the target volume.
     
  10. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,095
    Likes Received:
    1,536
    Location:
    France

    Huh, the big thing with RDNA3 & co. is that devs don't have to do a thing about it anymore?
     
  11. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    It's unlikely that would be true. The bandwidth needed to communicate between chips is huge (terabytes/s), and that huge bandwidth translates into extra power consumption. Developer interaction is needed to shape the workload so that each core can work mostly independently, avoiding the bandwidth (power) cost. It's different with CPUs, as it's much easier to find threads with independent workloads. Despite this, even CPUs see issues when threads jump between chips, and hence we need an OS that is aware of chiplets and keeps work on the same chip, game engines need to be optimized, etc.

    GPU chiplets in a gaming context only make sense once the biggest possible standalone chip is no longer enough. And that implies developers are then on board to optimize. GPU chiplets in AI/HPC, on the other hand, make a lot of sense. Those workloads are already distributed and optimized for multi-GPU/multi-chip use cases.
     
    DavidGraham and DegustatoR like this.
  12. tunafish

    Regular

    Joined:
    Aug 19, 2011
    Messages:
    619
    Likes Received:
    397
    I do not think you are fully appreciating the things that new, exotic packaging can get us. Chiplet GPUs using Zen-style packaging (separate dies on an organic substrate) are complete non-starters for the reasons you have outlined. (You state that it's not possible without getting developers onboard; frankly, I do not think getting them onboard is possible.)

    The reason people keep proposing them is that the entire industry is right now falling over itself trying to produce better ways to connect chiplets to each other, some of which have energy per transferred bit comparable to moving data between two structures on the same chip. So yes, we are literally proposing GPU chiplets with terabytes/s of interconnect bandwidth, using some of the new packaging/integration technologies that are just maturing.
     
    Silent_Buddha and Lightman like this.
  13. HLJ

    HLJ
    Regular Newcomer

    Joined:
    Aug 26, 2020
    Messages:
    393
    Likes Received:
    643
    You always run into the physical universe:
    Compute: Cheap
    Moving data: EXPENSIVE!!!
     
    DavidGraham, Krteq, PSman1700 and 2 others like this.
  14. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    I would be curious to see numbers for the power consumption that constantly transferring something like 1TB/s between chips would take. Maybe you have numbers to share? The last real numbers I saw were high enough to make this approach not work. If the chip-to-chip bandwidth doesn't match the internal bandwidth, that will create issues unless the workload is optimized to take the lower bandwidth and potentially higher latency into account. In essence, the chiplets would not work as a single big chip; rather, there would be a communication bottleneck in between.

    edit. Honestly, 1TB/s is probably not enough. One would probably want chip-to-chip communication to be at least as fast as the Infinity Cache. Closer to 2TB/s read and 2TB/s write would be a realistic target for RDNA3 to make the interconnect fast enough not to be a major bottleneck.
     
    #174 manux, Feb 11, 2021
    Last edited: Feb 11, 2021
    DegustatoR likes this.
  15. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    2,095
    Likes Received:
    1,536
    Location:
    France
    So what's the point of the chiplet buzz if it's another Fury MAXX?

    That's why it's, on paper, a big deal. Same with what Imagination Tech is doing, for example (on paper). If it's not, all the patents and buzz would have been BS, and I don't see that happening...
     
  16. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,130
    Likes Received:
    510
    YES.
    Now imagine the sheer amount of cache traffic a 96-core Genoa part would generate internally running AVX-512 on a pair of 512b FMAs per core.
    It's not buzz; the entire industry is gearing up for it.
    The x86 duo, TSMC, EDA vendors, you name them.
    Big and mean d2d is here to stay.

    https://www.servethehome.com/wp-con...l-Architecture-Day-2020-Packaging-AIB-2.0.jpg
    Stuff like this.
     
  17. w0lfram

    Regular Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    252
    Likes Received:
    48
    AMD has been working on heterogeneous compute and unifying L3 cache.

    A newer HBCC using a new Infinity Fabric is all that is needed.
     
  18. HLJ

    HLJ
    Regular Newcomer

    Joined:
    Aug 26, 2020
    Messages:
    393
    Likes Received:
    643
    People always want to break the laws of the universe... but the universe doesn't give a F....
    Some reading, because I can see that people with more brand bias than physics knowledge have started being silly:
    Toward Standardized Near-Data Processing with Unrestricted Data Placement for GPUs (nvidia.com)
    hpca18-xor.pdf (niladrish.org)
    Kestor.pdf (pnl.gov)
    Data movement is overtaking computation as the most dominant cost of a... | Download Scientific Diagram (researchgate.net)

    Gaming is a piss-poor application for chiplets (due to frametimes being so vital)... other workloads are better suited for chiplets, but the fact remains:

    You want to move your data as LITTLE as possible, both on-chip and off-chip.
     
  19. Nebuchadnezzar

    Legend Veteran

    Joined:
    Feb 10, 2002
    Messages:
    1,040
    Likes Received:
    287
    Location:
    Luxembourg
    Those papers are completely irrelevant. Within the next year we'll have packaging and d2d interconnect methods that use 1/10th* of the energy per bit of what's been used traditionally.

    I don't remember where, but AMD had given an interview in which they said the new technologies are allowing them unprecedented bandwidth that wasn't possible before.

    * AMD last quoted 2pJ/bit for their IFOP SerDes: https://www.slideshare.net/AMD/isscc-2018-zeppelin-an-soc-for-multichip-architectures

    http://www.guc-asic.com/en-global/news/pressDetail/glink

    Those are the kind of numbers we're at today.

    If Imagination says they can do segregated GPUs with just one wire between them, scaling perfectly well in graphics, I don't see why AMD couldn't.
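    A back-of-envelope sketch of what these energy-per-bit figures imply for manux's suggested 2TB/s read + 2TB/s write target. The 2 pJ/bit number is AMD's quoted IFOP figure linked above; the 0.5 pJ/bit value is purely an illustrative assumption for advanced packaging:

    ```python
    def link_power_watts(bytes_per_s: float, pj_per_bit: float) -> float:
        """Interconnect power: bandwidth in bits/s times energy per bit in joules."""
        return bytes_per_s * 8 * pj_per_bit * 1e-12

    bw = 4e12  # 2 TB/s read + 2 TB/s write, combined

    print(link_power_watts(bw, 2.0))  # -> 64.0 W at the quoted 2 pJ/bit IFOP figure
    print(link_power_watts(bw, 0.5))  # -> 16.0 W at an assumed 0.5 pJ/bit with adv. packaging
    ```

    So even at the old organic-substrate figure, the interconnect power budget is substantial but not obviously a non-starter, and it scales linearly down with the newer packaging numbers.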
     
    #179 Nebuchadnezzar, Feb 11, 2021
    Last edited: Feb 11, 2021
    Silent_Buddha, Kej, ethernity and 4 others like this.
  20. manux

    Veteran Regular

    Joined:
    Sep 7, 2002
    Messages:
    2,857
    Likes Received:
    2,059
    Location:
    Earth
    AI/HPC workloads in datacenter/professional use. There is a huge incentive there to crunch data that either doesn't fit in one GPU or whose reasonable compute time requires multiple GPUs. Datacenters are full of these types of things: https://www.nvidia.com/en-us/data-center/hgx/ John Carmack, for example, got one of these for the AI work he is pursuing: https://www.nvidia.com/en-us/data-center/dgx-station-a100/

    CPUs as chiplets make sense in the consumer world, as the nature of CPU tasks is such that cores can often operate independently of each other. Though it's not always the case, and we did see perf issues initially (an OS scheduler update, a better implementation in Zen 3, CP2077 issues requiring chiplet-specific tuning, etc.).

    AMD won multiple big supercomputer deals. My guess is chiplets go there first, and perhaps also into prosumer devices, as they could be tempting in various non-gaming use cases of the kind research universities/companies have. Perhaps a chiplet design goes into Frontier or El Capitan: https://www.amd.com/en/products/exascale-era
     