AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,185
    Likes Received:
    1,841
    Location:
    Finland
    Not much:
    https://www.phoronix.com/scan.php?page=news_item&px=Arcturus-Linux-Driver-Patches
     
  2. JoeJ

    Regular Newcomer

    Joined:
    Apr 1, 2018
    Messages:
    467
    Likes Received:
    561
    I'm excited! <3 :)

    Too bad nobody except me will believe this is the right direction also for gaming :|
     
  3. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,041
    Likes Received:
    3,110
    Location:
    Pennsylvania
    Maybe in a couple of decades :)
     
  4. bridgman

    Newcomer Subscriber

    Joined:
    Dec 1, 2007
    Messages:
    58
    Likes Received:
    102
    Location:
    Toronto-ish
    I guess if we actually wanted attention we could drop a couple of patches into llvmpipe or softpipe to support thousands of rendering threads.
     
    Lightman likes this.
  5. PizzaKoma

    Newcomer

    Joined:
    Apr 29, 2019
    Messages:
    39
    Likes Received:
    64
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,884
    Location:
    Well within 3d
The code changes that cover some of this discuss no longer needing to enumerate graphics queues and load the microcode for several of the processors that exist within the command processor block (PFP, ME). However, there's a brief change elsewhere that discusses enumerating certain things like Hi-Z and primitive FIFO sizes. These could be placeholders that might be rewritten at some point to remove the stubs, or these elements might still be present to some extent because they exist at the SE level (geometry engine, RBEs) instead of being attached to command processors. Perhaps the higher CU count is enabled in part because there were limitations to the GFX command processor's ability to manage that many CUs. Having multiple compute processors might allow for a larger pool to be subdivided internally, since they don't pretend to host a massive single context.

    The microcode engines that handle compute still interact with the SE hardware, and perhaps adjustments can be made for that part of the context. Other graphics functionality remains, since there are display controllers enumerated (default to gated-off unless specifically needed) and a streamlined context for surface parameters.
    The graphics command processor block specializes in the backwards-compatible support of a large API-defined context at speed, but something not compatible or lower-spec could be useful for visualization or other scenarios.

    Mixed in the changes are some mentions of integer-scaling of content, but I didn't see a statement that this was specific to Arcturus.
     
  7. PizzaKoma

    Newcomer

    Joined:
    Apr 29, 2019
    Messages:
    39
    Likes Received:
    64
Regarding Arcturus: it has VCN 2.5 with no display IP block, and below you can see how much of a compute card it is:
[IMG]
     
    Lightman likes this.
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,884
    Location:
    Well within 3d
    A recent change meant to reduce the impact of integrating Arcturus' larger CU count mentioned a term that was briefly discussed a few times in this forum in the past.
    https://lists.freedesktop.org/archives/amd-gfx/2019-August/037800.html

The SE/SH layout seems to address a sub-division that is possible within the CUs in a shader engine.
    Going back to documentation in the Southern Islands ISA doc, there is a hardware register called HW_ID that references the SE number and another 1-bit identifier for the shader array within the SE.
What impact it has whether the CUs within an SE are part of one single array or split into two is unclear. Perhaps it has some impact on how the CUs can be signaled or how they can arbitrate for shared resources like the memory crossbar or export.

    In theory, the combination of SE, SH, and CU identifiers could have given enough space to differentiate 128 CUs all the way back in Southern Islands, at least in terms of that element of the architecture.
Since the SI ISA doc, AMD hasn't kept documenting this hardware register, though even in the recent RDNA ISA doc you can see the jump in register numbering over the spot where it likely still is.

This change sheds some light on what Vega might look like in terms of that register, and how Arcturus didn't follow the previously mentioned way of getting to 128 CUs.
    Vega apparently had 4 SEs with 2 SH each, or some products with 4 SEs and 1 SH.
    Arcturus is apparently going for 8 SEs and 1 SH each, but in order to reduce the impact of having to rewrite a commonly-used layout table that assumed 4 SEs max, the SH count is being repurposed for Arcturus to serve as an additional bit for differentiating between the first and second halves of the set of 8 shader engines.
What it means to have 1 SH isn't clear, although if it deals with how the shader engines can interface with the rest of the chip, it might prevent excess complexity in linking them to their infrastructure (or there's some other barrier to having 16 shader arrays).
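A hypothetical sketch of the repurposing described above: a layout table written for at most 4 shader engines can still address 8 by folding the now-unused SH bit back in as an extra SE bit. All names here are illustrative, not from the actual driver code.

```python
MAX_SE_LEGACY = 4  # what the existing layout table assumed

def logical_se(se_id: int, sh_id: int, is_arcturus: bool) -> int:
    """Return the logical shader-engine index.

    On Vega-class parts, se_id (0-3) and sh_id (0-1) are independent:
    up to 4 SEs with up to 2 shader arrays each.
    On a hypothetical Arcturus-style part (8 SEs, 1 SH each), the SH
    field is repurposed as a third SE bit, selecting the first or
    second group of 4 engines.
    """
    if is_arcturus:
        return se_id + sh_id * MAX_SE_LEGACY  # sh_id acts as SE bit 2
    return se_id  # sh_id stays a real shader-array selector
```

Under this reading, the 8 engines map onto the old 4-wide table twice, e.g. (se_id=2, sh_id=1) would be logical SE 6.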

    Edit:
    The table from the SI ISA doc, for reference:
    Code:
Table 5.8 HW_ID
Field     Bits   Description
WAVE_ID   3:0    Wave buffer slot number (0-9).
SIMD_ID   5:4    SIMD to which the wave is assigned within the CU.
          7:6    Reserved.
CU_ID     11:8   Compute unit to which the wave is assigned.
SH_ID     12     Shader array (within an SE) to which the wave is assigned.
SE_ID     14:13  Shader engine the wave is assigned to.
TG_ID     19:16  Thread-group ID.
VM_ID     23:20  Virtual memory ID.
RING_ID   26:24  Compute ring ID.
STATE_ID  29:27  State ID (graphics only, not compute).
    
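For illustration, a minimal sketch of decoding the SI-era HW_ID register per the bit positions in the table above; the function names and the sample register value are made up.

```python
def bits(value: int, hi: int, lo: int) -> int:
    """Extract bits hi:lo (inclusive) from value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def decode_hw_id(hw_id: int) -> dict:
    """Split a HW_ID value into its fields, per the SI ISA table."""
    return {
        "wave_id":  bits(hw_id, 3, 0),    # wave buffer slot (0-9)
        "simd_id":  bits(hw_id, 5, 4),    # SIMD within the CU
        "cu_id":    bits(hw_id, 11, 8),   # CU within the shader array
        "sh_id":    bits(hw_id, 12, 12),  # shader array within the SE
        "se_id":    bits(hw_id, 14, 13),  # shader engine
        "tg_id":    bits(hw_id, 19, 16),  # thread-group ID
        "vm_id":    bits(hw_id, 23, 20),  # virtual memory ID
        "ring_id":  bits(hw_id, 26, 24),  # compute ring ID
        "state_id": bits(hw_id, 29, 27),  # graphics-only state ID
    }
```

Note how 2 SE bits, 1 SH bit, and 4 CU bits give 2 × 2 × 16 = 128 addressable CUs, matching the point above about the identifier space back in Southern Islands.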
     
    #5928 3dilettante, Aug 2, 2019
    Last edited: Aug 2, 2019
    AlBran likes this.
  9. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,001
    Likes Received:
    4,572
  10. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    That looks fairly solid, but I really can't figure out why AMD would keep a Vega GPU on Renoir. By then, Navi will have been out for about a year. I can't make sense of this.
     
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    10,001
    Likes Received:
    4,572
Unless Renoir comes out later than expected, the difference should be half a year.
    Regardless, I can't think of any reason not to adopt RDNA other than development time, or maybe AMD being concerned with the iGPU's compute capabilities.
     
  12. no-X

    Veteran

    Joined:
    May 28, 2005
    Messages:
    2,298
    Likes Received:
    247
Or maybe AMD offered a semi-custom Navi-based APU exclusively to a bountiful customer, e.g. Apple.
     
    Lightman likes this.
  13. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,041
    Likes Received:
    3,110
    Location:
    Pennsylvania
Aren't they separate designs for separate uses now? Compute-focused Vega, and RDNA for rasterization gaming?
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,884
    Location:
    Well within 3d
    Prior APUs lagged the discrete architectures as well. There seems to be an additional hurdle for integrating the output of CPU and GPU development pipelines. There are some possible reasons, like the teams responsible for implementing the APU need to wait on two different milestone dates from the CPU and GPU groups, potentially more involved physical design and engineering trade-offs between the CPU and GPU silicon, possibly more physical work if there's a mobile focus, and risk-management on top of all else.

    Zen 2's chiplet strategy may be a hindrance here, assuming this is some kind of APU. A selling point of the dis-integration was using process technology and implementation techniques that best matched the type of processor, and an APU would revert things to the less-optimal mixed-use chips of before. Additional time or risk would require locking design elements in earlier, perhaps before Navi was considered suitable.
     
  15. Tkumpathenurpahl

    Veteran Newcomer

    Joined:
    Apr 3, 2016
    Messages:
    1,077
    Likes Received:
    799
Isn't AMD using GF's 16nm process for the I/O chip of their chiplet CPUs?

    If so, that might explain the use of Vega. Get the GPU+I/O chip from GF's 16nm, and pair it with Zen 2 chiplets from TSMC's 7nm.
     
  16. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,496
    Likes Received:
    910
    If it's just a six-month gap, then yes, I can see how timing might prevent the integration of a Navi GPU. This still feels like a shame, though. But I guess it's better to have Zen2+Vega in January than Zen2+Navi in July.
     
  17. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    16,157
    Likes Received:
    5,095
    Would it be possible to just have the GPU be part of the "uncore" chip rather than being included on the CPU core chips? Granted that would make for a rather large "uncore" chip, but would provide consistent GPU performance as well as retaining the flexibility in how to use the CPU cores.

    Regards,
    SB
     
  18. Bondrewd

    Regular Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    520
    Likes Received:
    239
    Yes, see Arrandale.
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,884
    Location:
    Well within 3d
    It would be similar to how almost every northbridge with integrated graphics worked back when the chipset was on the motherboard, so I think it would be possible.
    The size might be a counter-argument to using it, and unlike most of the IO die a GPU is very dense and very active silicon.
If the product is expected to make its way into laptops, having the GPU die on a node with significant power savings would be a major benefit.

    If the CPU and GPU are on a 7nm node and this is going into laptops, an APU might make more sense since it can discard IFOP links and maybe shave off some of the IO capability of a Matisse IO die.
    If this is on 7nm, GF is excluded and AMD's cost structure with two separate 7nm die could be unfavorable.
A custom uncore could also revisit the low-power and idle behaviors of Zen 2, which seem to be somewhat high on the desktop and could really use optimization in portable systems.

    Details are still sparse on what differences Renoir has versus Raven. Some of the code shares blocks with Raven Ridge for IDs and settings, although some things like video encoding and the PSP version are more recent.
    Navi 10 does have more die area devoted to elements that might be less acceptable at this range. The L1 cache may not make as much sense in a system where a shader array is alone on the chip and not fighting for the L2, and the WGP organization splits a front end between 2 CUs versus 1:3. The geometry engine setup and other elements also seem like they take up a fair chunk of area in the center versus what was there in Vega.
    In a system with very limited memory bandwidth, die area constraints, and contention with a CPU, perhaps Navi wasn't worth waiting for versus a more compact Vega GPU running at low clocks on 7nm.

    I don't remember whether Raven was considered "late", and whether Vega's growing pains might have figured into its release schedule.
     
    Silent_Buddha likes this.
  20. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,185
    Likes Received:
    1,841
    Location:
    Finland
Epyc's I/O chip is 14nm, Ryzen's is 12nm (and the X570 chipset 14nm); GloFo doesn't have a 16nm process.
     
    Tkumpathenurpahl likes this.