Software transparent Multi-GPU using an interposer

Discussion in 'Beginners Zone' started by Cat Merc, May 27, 2017.

  1. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    111
    Likes Received:
    96
    Would such a setup be at all possible? My gut feeling says no, but my understanding of GPU rendering is fairly limited.
     
  2. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    AMD's upcoming Vega GPUs are built using their 'Infinity Fabric' technology, which gives them modularity like the CCXs in Ryzen. So it's possible that in the future they come out with a software-transparent multi-chip product. The only question would be whether they use classic MCM assembly or a bigger interposer.
     
  3. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    246
    Likes Received:
    109
    Infinity Fabric by itself doesn't solve everything, when the problem at hand is that GPUs have interconnect bandwidth needs on a far higher scale. "Modularity" doesn't help either if you consider that GPU caches serve first and foremost to amplify bandwidth and to exploit spatial locality for the GPU's predominantly data-streaming access patterns, plus the increasing reliance on device-scope atomics and coherency.

    Imagine building 2x Vega 10 out of four smaller chips. For each chip, in addition to its x1024 HBM2 interface, you would also need roughly a triple of that (in SerDes or whatever) just to maintain the full channel-interleaving bandwidth expected of a monolithic GPU. Now take into account the L2 cache and ROP access needs as well. And let's not forget the GPU control flow (wavefront dispatchers, CPs, and especially the GPU fixed functions).
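
    To put rough numbers on that (the HBM2 pin speed here is an illustrative figure of mine, not an official spec), following the "one full-rate link per peer chip" reasoning above:

    Code:
    #include <cstdio>

    int main() {
        // Illustrative assumptions: a 1024-bit HBM2 interface per chip
        // at roughly 2.0 Gb/s per pin.
        const double hbm2_pins    = 1024.0;
        const double gbps_per_pin = 2.0;                             // Gb/s per pin
        const double local_bw     = hbm2_pins * gbps_per_pin / 8.0;  // = 256 GB/s per chip

        // With memory channel-interleaved across four chips, each chip wants a
        // full-rate link to each of its three peers so remote channels can be
        // accessed as fast as local ones -- i.e. roughly "a triple" of the
        // HBM2 interface, before counting L2/ROP and control traffic.
        const int    peers        = 4 - 1;
        const double crosschip_bw = peers * local_bw;                // = 768 GB/s per chip

        std::printf("Local HBM2 bandwidth per chip     : %.0f GB/s\n", local_bw);
        std::printf("Cross-chip link bandwidth per chip: %.0f GB/s (%dx local)\n",
                    crosschip_bw, peers);
        return 0;
    }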

    As always, I am not saying it is impossible. But apparently that "only question" is not the only one.
     
    Cat Merc likes this.
  4. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    837
    Likes Received:
    235
    I think it's not realistic to expect the hardware to look like one GPU to the _driver_, but you can certainly just publish two graphics queues to the runtimes (the DX12 and Vulkan runtimes, not the games) and require no other adjustments at all (shared memory address space and so on, maybe a shared MMU instead of multi-MMU coherency, for example).
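
    To illustrate, this is roughly what the runtime-facing side already looks like in Vulkan; a minimal sketch with error handling omitted, the point being that nothing in this enumeration cares whether the graphics queues the driver publishes live on one die or two:

    Code:
    #include <vulkan/vulkan.h>
    #include <cstdio>
    #include <vector>

    int main() {
        VkApplicationInfo app{};
        app.sType = VK_STRUCTURE_TYPE_APPLICATION_INFO;
        app.apiVersion = VK_API_VERSION_1_0;

        VkInstanceCreateInfo ici{};
        ici.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        ici.pApplicationInfo = &app;

        VkInstance instance;
        if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) return 1;

        uint32_t gpuCount = 0;
        vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
        std::vector<VkPhysicalDevice> gpus(gpuCount);
        vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

        for (VkPhysicalDevice gpu : gpus) {
            uint32_t familyCount = 0;
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, nullptr);
            std::vector<VkQueueFamilyProperties> families(familyCount);
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, families.data());

            // The driver decides how many graphics-capable queues it publishes
            // here; a multi-chip part could simply report two (or more) and the
            // runtime/application would schedule onto them as it already does.
            for (uint32_t i = 0; i < familyCount; ++i)
                if (families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT)
                    std::printf("queue family %u: %u graphics queue(s)\n",
                                i, families[i].queueCount);
        }

        vkDestroyInstance(instance, nullptr);
        return 0;
    }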
     
  5. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    I agree with you, but I remember someone from AMD saying something along the lines of multi-chip being the way to go in the future. Without much thought, I assumed that Vega having Infinity Fabric was at least the first step towards such a future. Why else would AMD choose to build Vega with Infinity Fabric...? But there's no guarantee it would be software transparent.
     
  6. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    246
    Likes Received:
    109
    That makes little difference from explicit multi-GPU though. A shared memory address space is already a thing IIRC (via the host).
     
  7. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    246
    Likes Received:
    109
    Multi-chip is the way, but the tone was set for heterogeneous SoC integration in the first place. Their exascale proposal uses multiple GPUs per package, but that's because of the model they pursue for that particular project (in-memory computing).

    Cache coherency between multiple GPUs, Zen hosts and perhaps OpenCAPI appears to be an incentive though.
     
  8. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    837
    Likes Received:
    235
    If the memory pool itself isn't shared, it's very problematic currently. Transfers between pools are only pushable, not pullable, etc. (see the DX12 docs).

    If you compare it to CPUs - a different memory pool per socket vs. a shared memory pool for all sockets vs. more cores on the same SoC in the same socket - I suspect the trend will be the same.
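
    For context, this is roughly how non-shared pools surface in D3D12 linked-node code today; a minimal sketch only, with arbitrary node indices, showing a buffer created so the producing node can push data into memory that the consuming node will read:

    Code:
    #include <d3d12.h>
    #include <wrl/client.h>
    using Microsoft::WRL::ComPtr;

    // Create a buffer that physically resides in node 0's local memory but is
    // also visible to node 1. The producing GPU (node 0) writes/pushes into it;
    // the consumer does not reach across and pull from the other pool.
    ComPtr<ID3D12Resource> CreateCrossNodeBuffer(ID3D12Device* device, UINT64 sizeBytes)
    {
        D3D12_HEAP_PROPERTIES heapProps{};
        heapProps.Type             = D3D12_HEAP_TYPE_DEFAULT;
        heapProps.CreationNodeMask = 0x1;        // resides on node 0
        heapProps.VisibleNodeMask  = 0x1 | 0x2;  // mapped on node 0 and node 1

        D3D12_RESOURCE_DESC desc{};
        desc.Dimension        = D3D12_RESOURCE_DIMENSION_BUFFER;
        desc.Width            = sizeBytes;
        desc.Height           = 1;
        desc.DepthOrArraySize = 1;
        desc.MipLevels        = 1;
        desc.Format           = DXGI_FORMAT_UNKNOWN;
        desc.SampleDesc.Count = 1;
        desc.Layout           = D3D12_TEXTURE_LAYOUT_ROW_MAJOR;

        ComPtr<ID3D12Resource> buffer;
        device->CreateCommittedResource(&heapProps, D3D12_HEAP_FLAG_NONE, &desc,
                                        D3D12_RESOURCE_STATE_COMMON, nullptr,
                                        IID_PPV_ARGS(&buffer));
        return buffer;
    }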
     
  9. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,126
    Likes Received:
    3,762
    I think it's relevant to mention here that while Vega does indeed use Infinity Fabric to connect to the CPU in Raven Ridge and other APUs, according to the slides Navi is the architecture that focuses on scalability:

    [AMD GPU architecture roadmap slide]


    That said, Navi might be an architecture that takes inspiration from PowerVR's and Mali's "MP" models: design a base core unit and then group larger or smaller numbers of them according to the performance and power segment.

    This seems to be what they're doing right now with Zen. Ryzen 5 and 7 have 2 CCXs, Threadripper has 4, Epyc goes up to 8, and all of them use Infinity Fabric to interconnect the CCXs. Raven Ridge is 1 CCX + Vega through Infinity Fabric.
    Navi could be the GPU version of a CCX (let's imagine a "GCX"). Future AMD chips could be just a mix of different numbers of CCX and/or GCX modules, according to the product they want to ship.
    If the interconnect fabric is robust, performant and future-proof enough, AMD's hardware R&D teams could focus on iterating upon the CCX and GCX.
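
    As a toy sketch of that composition idea (the "GCX" name, the cores-per-CCX and CUs-per-GCX figures are purely illustrative here, taken from the speculation above rather than anything AMD has announced):

    Code:
    #include <cstdio>

    // Hypothetical building blocks: a Zen CCX (4 cores) and an imagined
    // "GCX" GPU module (say, 16 CUs) stitched together over a common fabric.
    struct Product { const char* name; int ccx; int gcx; };

    int main() {
        const int coresPerCcx = 4;    // Zen CCX
        const int cusPerGcx   = 16;   // purely illustrative

        const Product lineup[] = {
            { "Ryzen 7 (2 CCX)",     2, 0 },
            { "Threadripper (4 CCX)",4, 0 },
            { "Epyc (8 CCX)",        8, 0 },
            { "APU-style part",      1, 1 },   // hypothetical mix
            { "Big discrete GPU",    0, 4 },   // hypothetical mix
        };

        for (const Product& p : lineup)
            std::printf("%-24s -> %2d CPU cores, %2d CUs\n",
                        p.name, p.ccx * coresPerCcx, p.gcx * cusPerGcx);
        return 0;
    }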
     
  10. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
