AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,169
    Likes Received:
    576
    Location:
    France
    Yeah, I hope WattMan will be present in the new drivers. If so, I'll be very happy. It will basically be a 16 GB RX Vega 64. Some games are already eating more than 8 GB, even at 1440p, so, eh, that's nice :eek:
     
    CarstenS likes this.
  2. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Some of Pascal's architectural features like the 49-bit address space and an automatic facility for page migration seem to be leading features where Vega is playing catch-up.
    Volta's memory management includes among other things access counters to better determine optimal migration patterns.
    This seems like a ripe area for watering the features down for a consumer product in the future. The actual address space and fault-handling are base-level system infrastructure, which HBCC also relies on regardless of API.

    AMD's HBCC adopts 4-level page tables like x86 (may lag Intel's 5-level extensions in future), although the direct access of x86 page tables via the Address Translation Cache was noted as being inactive when EPYC launched.
    Volta's Address Translation Services offer something similar, but only for Power. Perhaps that is a specific platform limit that could give AMD an advantage, or a matter for a future change/implementation. Or perhaps something a hypothetical future console could take in.
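    On the 4-level page table point, a quick sketch of how such a walk slices a 48-bit virtual address may help; the 9/9/9/9-bit indices plus a 12-bit page offset are the standard x86-64 layout, and the field names here are just illustrative:

    // Illustrative decomposition of a 48-bit virtual address into the four
    // 9-bit table indices plus a 12-bit page offset used by an x86-style
    // 4-level walk (2^(9+9+9+9+12) = 2^48 bytes of addressable space).
    #include <cstdint>
    #include <cstdio>

    struct WalkIndices {
        uint32_t l4, l3, l2, l1;  // indices into the four table levels
        uint32_t offset;          // offset within the 4 KiB page
    };

    static WalkIndices decompose(uint64_t va) {
        return {
            uint32_t((va >> 39) & 0x1FF),  // level-4 index (bits 47..39)
            uint32_t((va >> 30) & 0x1FF),  // level-3 index (bits 38..30)
            uint32_t((va >> 21) & 0x1FF),  // level-2 index (bits 29..21)
            uint32_t((va >> 12) & 0x1FF),  // level-1 index (bits 20..12)
            uint32_t(va & 0xFFF)           // page offset   (bits 11..0)
        };
    }

    int main() {
        WalkIndices w = decompose(0x00007F123456789AULL);
        printf("L4=%u L3=%u L2=%u L1=%u offset=0x%X\n",
               w.l4, w.l3, w.l2, w.l1, w.offset);
        return 0;
    }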

    It would be interesting to see the two platforms benchmarked head to head. I've seen some intimation that HBCC, for all its features, currently doesn't have the raw handling throughput. Even if Nvidia's methods are not quite as diverse in their options, games do not appear to really need all that HBCC offers, and for the sake of broad compatibility it's possible that even partially software-based methods may be good enough for client use.
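    For reference, this is roughly what the Pascal/Volta automatic migration path looks like from client code; a minimal CUDA sketch using the public managed-memory API, with the kernel and sizes as placeholders:

    // Minimal sketch of demand-paged unified memory in CUDA: one allocation
    // is visible to both CPU and GPU, pages migrate on fault, and an
    // optional prefetch lets the runtime move them ahead of time.
    #include <cuda_runtime.h>
    #include <cstdio>

    __global__ void scale(float* data, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= 2.0f;                  // GPU touch faults pages in
    }

    int main() {
        const int n = 1 << 20;
        float* data = nullptr;
        cudaMallocManaged(&data, n * sizeof(float)); // single pointer, both sides

        for (int i = 0; i < n; ++i) data[i] = 1.0f;  // CPU touch: pages on host

        cudaMemPrefetchAsync(data, n * sizeof(float), 0, 0); // hint: move to GPU 0
        scale<<<(n + 255) / 256, 256>>>(data, n);
        cudaDeviceSynchronize();

        printf("data[0] = %f\n", data[0]);           // CPU access migrates back
        cudaFree(data);
        return 0;
    }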
     
    pharma likes this.
  3. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    Looking at the difference between Pascal/Volta and Vega, is the following statement correct?

    Pascal/Volta need OS support for each page swap into the GPU and Vega does not.
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Nvidia's migration engine has controllers dedicated to managing migrations, and at least for Power a translation method that allows access to CPU page tables. That may depend on the interfacing hardware in Power's NVLink/CAPI blocks, and possibly some OS support is needed.

    When EPYC launched, Vega's corresponding method was inoperative.
    Vega's ATC memory management needs an IOMMU present (or so I interpret the mention of the IOMMU and ATC bit settings in https://llvm.org/docs/AMDGPUUsage.html), and support for the newer IOMMU versions still needed to be implemented, at least in the Linux kernel.

    For base-level functionality, it looks as if some level of driver and/or OS awareness is needed, even if it's supposedly transparent to client software. Both vendors have methods that appear to be overkill for what a game needs, so there could be slack even if one product isn't fully hardware-managed.
     
    Kej, pharma, Lightman and 1 other person like this.
  5. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    859
    Likes Received:
    262
    Polaris (edit: I think) added a 49-bit address space:

    https://www.pcper.com/reviews/Graph...ecture-Preview-Redesigned-Memory-Architecture
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
  7. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    859
    Likes Received:
    262
    Actually, it's from Fiji.

    GCN3 buffer descriptor: 48-bit base + 1-bit ATC (Address Translation Cache), Table 8-9
    GCN5 buffer descriptor: 48-bit base + 1-bit HBCC (mapped tile pool / heap), page 56 / Table 33
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The ATC bit goes back further, as it's mentioned for Sea Islands in Table 8.5.
    http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf

    I'm not clear if the 48-bit base and ATC bit correspond to the same situation as the 48-bit base and mapped tile bit. Potentially, the HBCC doesn't need a bit in the shader's context to manage the resource's placement in the overall memory space.
    Having an explicit ATC bit as part of a resource's descriptor seems to be related to the inability of prior generations to autonomously manage memory straddling CPU and GPU pools, since the resource needs to know which side it's on. Rather than a 49-bit space managed in a common handler, it's two disparate 48-bit spaces.
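    A rough illustration of that reading (the field names and exact packing below are assumptions for the sketch, not the ISA document's layout): a 48-bit base plus a 1-bit pool select gives 2^49 distinguishable bytes, but as two disparate 48-bit spaces rather than one flat 49-bit one.

    // Hypothetical descriptor sketch: a 48-bit base address plus a 1-bit
    // ATC/pool flag. The same base value with a different pool bit names a
    // different location, i.e. two 48-bit spaces rather than a flat 49-bit one.
    #include <cstdint>
    #include <cstdio>

    struct DescriptorSketch {
        uint64_t base_address : 48;  // byte address within one pool
        uint64_t reserved     : 15;  // stand-in for the other descriptor fields
        uint64_t atc          : 1;   // 0 = GPU-local pool, 1 = ATC/system pool
    };

    int main() {
        DescriptorSketch local  = { 0x0000123456780000ULL, 0, 0 };
        DescriptorSketch system = { 0x0000123456780000ULL, 0, 1 };

        printf("local pool:  base=0x%012llx atc=%llu\n",
               (unsigned long long)local.base_address,
               (unsigned long long)local.atc);
        printf("system pool: base=0x%012llx atc=%llu\n",
               (unsigned long long)system.base_address,
               (unsigned long long)system.atc);
        return 0;
    }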

    As far as the hardware's capability goes, what the shader sees may not be representative of the virtualized resource.
     
  9. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    OS support and a process or service to handle the transaction. Vega would only need minimal OS support or awareness for certain capabilities where both interact: security, virtualization, sharing pointers, configuration/layout, etc., which are usually handled by the driver. Arbitrary reads of system memory aren't necessarily desirable and are more a programming issue in any case, but they would be possible.
     
  10. Rasterizer

    Newcomer

    Joined:
    Aug 4, 2017
    Messages:
    29
    Likes Received:
    9
    So this might sound like a bit of a strange question, but I just want to make sure my understanding is correct, as I'm having a discussion about Vega on another forum where this has come up. With respect to the next-generation geometry engines in Vega: my understanding is that the shaders within the geometry engines themselves (which were all fixed-function shaders in Fiji) have been (mostly) replaced with programmable non-compute shaders that can be reconfigured to act as primitive shaders rather than their default behavior, and that it is not the case that primitive shaders work by bypassing the geometry engines entirely and using the compute units to process geometry instead. Is that right?
     
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,962
    Likes Received:
    4,553
    AFAIK primitive shaders only discard triangles that don't appear in the final frame, and the end result is sent to the 4 geometry engines that still exist in Vega. Primitive shaders aren't working in Vega yet, so what you get is the same performance/clock as Fiji.

    EDIT because I don't know if this was the actual question:
    - I'm not sure if the primitive shaders (a shader is a program, not a physical hardware component) are running in the ALUs / shader processors that reside in the NCUs or if they're completely new units that go into the front-end, but I think they're using the NCUs. That being the case, what's new in Vega compared to older GCN are the bridges (bus and caches) created between the shader processors and the front-end (geometry processors).
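    For illustration, the "discarding triangles that don't appear in the final frame" part boils down to tests like the one below, run on the general-purpose ALUs; this is not AMD's actual primitive-shader code path, just a hedged sketch of a back-face / zero-area cull written as a compute-style kernel:

    // Rough sketch of early triangle culling on general-purpose ALUs:
    // flag back-facing or degenerate triangles so they never reach the
    // fixed-function geometry hardware. Not AMD's actual primitive shaders.
    #include <cuda_runtime.h>

    __global__ void cullTriangles(const float2* screenPos, const int* indices,
                                  int triCount, int* keepFlags) {
        int t = blockIdx.x * blockDim.x + threadIdx.x;
        if (t >= triCount) return;

        float2 a = screenPos[indices[3 * t + 0]];
        float2 b = screenPos[indices[3 * t + 1]];
        float2 c = screenPos[indices[3 * t + 2]];

        // Signed area of the screen-space triangle: <= 0 means back-facing
        // or zero-area (given the assumed winding convention), so the
        // triangle can never contribute a visible pixel.
        float area = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        keepFlags[t] = (area > 0.0f) ? 1 : 0;
    }

    In a real pipeline the surviving indices would then be compacted and handed on to the fixed-function front end; the point is only that the cull test itself is plain ALU math.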
     
    #4051 ToTTenTranz, Sep 6, 2017
    Last edited: Sep 6, 2017
  12. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    My understanding is that the primary processing grunt of the shader engines comes from the CUs. Primitive shaders as they're presented do not replace the fixed-function elements of the geometry pipeline. They exist in addition to the standard pipeline, which already has some capability to be fed via compute.

    The GPU's geometry path has a number of internal shader types, beneath the smaller set of API shader types, that handle various combinations of vertex position, parameter, and amplification/decimation.
    Each of these types, outside of fixed-function blocks like the still fixed-function tessellation unit, runs on the CUs.

    The reasons for these shaders being separate at all seem to vary, with some being related to which shader is allowed certain sources and destinations, the format and packaging of data, or changes in the amount of data by a stage that could amplify or remove primitives. What Vega seems to have done is combine certain programmable stages that might have been separate for source/format reasons, but not necessarily where there are fixed-function transitions or amplification. Combining the producer and consumer portions of various pairings and removing the more esoteric divisions between programmable sections seems to underpin the new functionality.

    So it seems as if the programmable portions have become more generic in what they can source/send, but the difficulties in getting the feature activated and the description of how hard it might be to expose primitive shaders to developers make it seem like this was not a clean change in this first-gen implementation.
     
  13. Ethatron

    Regular Subscriber

    Joined:
    Jan 24, 2010
    Messages:
    859
    Likes Received:
    262
    Yeah, I'm also uncertain about what exactly the 49 bits in the media refers to. The descriptor clearly creates two address pools, which is about what I would expect as a client programmer, and which is fine by me. The descriptor is basically a typed/tagged address, and it can address 49 bits of distinguishable memory AFAICS; simply raising the bit while keeping the address field the same makes little sense most of the time (aliasing the same address in two pools for maybe the same data is exotic, I would say). I would not expect heterogeneous UMA. I also don't expect the HBCC controller to be invoked on each and every address or descriptor presented to the GPU. There is no contiguous 49-bit pointer type in the ISA manual, at least. The CPU address space is 48 bits anyway, so having a true unified 49-bit address pool is useless if the GPU can't potentially address more than 48 bits itself, which I don't think is the case.

    There is the possibility that ATC is/was for Hyper-V or GPU virtualization, and that it's in reality N virtual address spaces (sequentially accessible, not simultaneously), N being the number of virtualized instances. How virtualization works with HBCC is also an open question. Maybe the HBCC controller is the XDMA controller and a TLB for the system memory pool fused into one unit. There is still the GPU's native TLB for the on-board pool.

    Seems really interesting, but not too relevant for normal operation. :)
     
  14. Rasterizer

    Newcomer

    Joined:
    Aug 4, 2017
    Messages:
    29
    Likes Received:
    9
    Okay, now I'm definitely confused. The Vega whitepaper talks about "Next Generation Geometry Engines" and says:
    They aren't talking about processing being done within the geometry engines as shown on the Vega block diagram? I guess that means I should ask which stages of the rendering pipeline are normally done within the geometry engines?
     
  15. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The HBCC wouldn't generally be involved without a page fault. If it's operating in a shared mode, it would need to track what ranges might fault. Hardware-generated offsets like some of the wave-level base pointers may have implicit restrictions where the GPU will use its known-local address range.

    The motivation stated for 49-bit addressing by the vendors is for unified memory addressing where the GPU can access the full CPU range in addition to what it can address independently. If it wants to be generally capable of accessing from the host's 48-bit range, it would exhaust the address space of the GPU's paging system without additional space.
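    The back-of-the-envelope numbers behind that: 2^48 bytes is 256 TiB, so covering the host's full range plus a local range of the same order needs one extra address bit.

    // Quick arithmetic behind the 49-bit figure: the host's 48-bit range
    // plus a GPU-local range of the same size needs one more address bit.
    #include <cstdint>
    #include <cstdio>

    int main() {
        const uint64_t hostSpace  = 1ULL << 48;               // 256 TiB host range
        const uint64_t localSpace = 1ULL << 48;               // GPU's own 48-bit range
        const uint64_t total      = hostSpace + localSpace;   // 2^49 bytes

        printf("host  = %llu TiB\n", (unsigned long long)(hostSpace  >> 40));
        printf("local = %llu TiB\n", (unsigned long long)(localSpace >> 40));
        printf("total = %llu TiB (a 49-bit space)\n", (unsigned long long)(total >> 40));
        return 0;
    }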

    ATC is the implementation of an IOMMUv2 feature for heterogeneous memory access. It allows the GPU to interface with the host's page tables and cache translations. For protection, everything the GPU does with unified memory treats it as a virtual guest.


    The only truly fixed-function elements in the diagram are in the solid dark gray blocks. The various VS/DS and GS elements are actually running on the CUs. Internally, some of those are actually decomposed into different variants depending on whether tessellation and geometry shaders were invoked.
     
    pharma and ieldra like this.
  16. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,273
    Likes Received:
    3,722
  17. BoMbY

    Newcomer

    Joined:
    Aug 31, 2017
    Messages:
    68
    Likes Received:
    31
    Very unlikely; they won't use more processes than they absolutely have to (because of cost, you know), and their main producer will always be GloFo. Vega 20 will most likely be their first 7LP product from GloFo, somewhere around Q4 2018. Also, it could be mostly (only?) a Pro product, because it may have 4 stacks of HBM2.
     
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,962
    Likes Received:
    4,553
    Well, if GF's processes keep widening the efficiency gap to TSMC's and Samsung's equivalents, then AMD had better find a way to avoid being dragged down by them. At least for their halo products (which Vega 20 should be).
     
  19. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    7LP, despite the name sounding like it, isn't a low-power process like the 14LPx nodes are; the LP stands for "Leading Performance", and it's aimed at high-performance products (read: GPUs, big x86 CPUs).
     
    Grall likes this.
  20. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,962
    Likes Received:
    4,553
    Yes, but it needs to reach the point of mass production on time...
     