AMD Vega Hardware Reviews

Discussion in 'Architecture and Products' started by ArkeoTP, Jun 30, 2017.

  1. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    I'm pretty sure HWS were introduced before Polaris, but you're right that initially (or at least in the initial slides) Tonga/Fiji didn't have them. I think it was at Hot Chips (or some other such more professionally oriented event) that they first published Fiji slides with HWS.
     
  2. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    Tonga and Fiji had HWS, but were not yet ready to be enabled. At some point they were enabled.
     
    Lightman likes this.
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    pharma likes this.
  4. mpg1

    Veteran Newcomer

    Joined:
    Mar 5, 2015
    Messages:
    1,526
    Likes Received:
    1,112
    wrong thread..
     
    #564 mpg1, Jul 30, 2017
    Last edited: Jul 30, 2017
  5. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    From scattered forum posts about this, HWS is a multithreaded processor that handles run lists consisting of kernels to be launched, and can dynamically assign queues in memory to the ACEs. The two HWS blocks in the diagrams are actually one block that can run two threads. The HWS does not handle dispatch itself, and so it is one step removed from the dispatch pipeline that tries to arbitrate for wavefronts. This is in turn one or more levels removed from having visibility or communication with the CU's instruction fetch arbitration or execution progress. The CUs can readily function without it, since HWS can be disabled, and there is no obvious channel between the two domains or reason for them to notice. Since HWS is handing things off to ACEs, it doesn't seem to need awareness of the CUs either.
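To make the layering concrete, here is a minimal toy sketch of that split, assuming nothing about the real firmware: a scheduler that only maps queues from a run list onto free ACE slots and never launches anything itself. The class names, the `slots` count, and the first-free-slot policy are all hypothetical and chosen for illustration only.

```python
from collections import deque


class ACE:
    """Toy ACE: owns a fixed number of queue slots (slot count is an assumption)."""

    def __init__(self, name, slots):
        self.name = name
        self.slots = slots
        self.mapped = []  # queue ids currently mapped to this ACE

    def has_free_slot(self):
        return len(self.mapped) < self.slots


class HWS:
    """Toy scheduler: maps waiting queues to ACEs. It never dispatches work."""

    def __init__(self, aces):
        self.aces = aces
        self.run_list = deque()  # queues in memory waiting for a hardware slot

    def submit(self, queue_id):
        self.run_list.append(queue_id)

    def schedule(self):
        # Hand each waiting queue to the first ACE with a free slot.
        # Queues that do not fit stay on the run list for a later pass.
        while self.run_list:
            ace = next((a for a in self.aces if a.has_free_slot()), None)
            if ace is None:
                break
            ace.mapped.append(self.run_list.popleft())


aces = [ACE("ACE0", slots=2), ACE("ACE1", slots=2)]
hws = HWS(aces)
for queue_id in range(5):
    hws.submit(queue_id)
hws.schedule()
```

In this toy model the CUs never appear at all, which is the point being made: the scheduler only talks to the ACEs.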
     
  6. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    While not very well documented, my take was ACEs handling dispatch and dependencies of assigned queues. I thought I read a while ago 8 pointers per ACE and probably some registers for tracking progress. HWS instructing the ACEs where to dispatch a wave as needed and load balancing. ACEs and HWS being the same hardware, including other functions for HSA, etc. They should all have the same exposure as HWS used to be ACEs. They seem like PLDs sitting on a data bus polling register values and making decisions based on whatever program was loaded. Somewhere in there they handle task graphs and dependencies.

    ACEs shouldn't be involved in instruction dispatch beyond a pointer to the kernel for fetching. The documentation I recall was old, but they had 8 pointers and would step through lists of kernel pointers potentially in the tens of thousands. The only concern should be kernel progress regarding waves dispatched for determining indices.

    I could actually see similar hardware for scheduling in each CU, but again nothing there would ever require public documentation as even the ACE/HWS is signed firmware.
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    From some descriptions on the Phoronix forums by AMD staff, they are actually not the same hardware. It was AMD's artistic license to try to draw ACE-labelled squares for the externally visible queues, just as it continued to be artistic license to place two HWS blocks in the diagrams when there is in actuality only one.

    The HWS does not have the same dispatch capability of an ACE, and an ACE is one end of a pipeline used to arbitrate, construct, and launch workgroups. My impression is that there's further hardware that does much of the allocation and arbitration work, so what the ACEs are actually aware of is unclear to me.
    The HWS hardware is further away from any change in the behavior of wavefront instruction issue and caching than the ACEs are, so it seems even less likely to have changed the realities of CU execution.
     
  8. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    I'm not disputing HWS's lack of dispatch, but that the hardware performing that function is interchangeable. Four blocks as I understand, each existing as 1 HWS, 2 ACEs, or something else entirely.

    I don't believe either unit has any involvement with instruction caches beyond a single pointer to the kernel for reference. Instruction fetching left to the CUs or other hardware block.

    These blocks are very simple as I understand. A handful of registers and some basic math capabilities. Simple microcontrollers that may be working with each other. No direct wiring to CUs or anything. The CUs on the other hand will need some direction on how to communicate with the units as a firmware update could relocate registers. My thinking is when a CU needs work it pings an ACE, HWS evaluates metrics from that CU and in-progress kernels on the ACE, then the ACE dispatches a wave based on that result. Accounting for priority, age, dependencies, etc. Could be completely wrong on that, but it seems likely they are working together. HWS being added late to extend some capability that didn't exist in early GCN iterations. For that reason it shouldn't be critical, but have added better async handling. GCN1 was simple round robin across queues as I understand. Some of the console guys may have a better grasp on that original behavior as it should be the XB1/PS4 method.
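For reference, plain round robin across queues looks roughly like the sketch below. This is only the generic textbook policy, not a description of what GCN1 hardware actually does; the function name and the sample queue contents are made up for the example.

```python
from collections import deque


def round_robin(queues):
    """Visit queues in a fixed rotating order, taking one item per non-empty queue."""
    pending = [deque(q) for q in queues]
    order = []
    while any(pending):
        for index, queue in enumerate(pending):
            if queue:
                order.append((index, queue.popleft()))
    return order


order = round_robin([["a0", "a1", "a2"], ["b0"], ["c0", "c1"]])
```

Note that there is no notion of priority, age, or dependencies here: a queue with one item gets the same turn as a queue with thousands, which is the kind of limitation a smarter scheduler would address.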
     
  9. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    There is an intermediate layer called SPI, that sits between front-ends (ACEs & all graphics front-ends) and CUs, for the purpose of work distribution. It receives dispatches from the front-ends, and assigns workgroups to CUs with idle resources.

    This is not a thing that is well-documented. But you can find relevant clues in the GPU hardware register docs, ISA docs, and open source drivers. It was also illustrated in the diagrams of the VGLeaks PS4/XB1 GPU architecture leaks.
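A rough sketch of that work-distribution role, under loud assumptions: the resource names (`free_waves`, `free_lds`), their sizes, and the first-fit policy are all hypothetical, picked only to show a workgroup being placed on whichever CU has enough idle resources.

```python
def assign_workgroup(cus, waves_needed, lds_needed):
    """Place one workgroup on the first CU with enough free resources.

    Returns the chosen CU id, or None if no CU can take the workgroup yet.
    First-fit is an illustrative choice, not the documented hardware policy.
    """
    for cu in cus:
        if cu["free_waves"] >= waves_needed and cu["free_lds"] >= lds_needed:
            # The whole workgroup lands on one CU, so reserve everything at once.
            cu["free_waves"] -= waves_needed
            cu["free_lds"] -= lds_needed
            return cu["id"]
    return None


# Hypothetical CU state: wave slots and LDS bytes currently idle.
cus = [
    {"id": 0, "free_waves": 2, "free_lds": 16384},
    {"id": 1, "free_waves": 8, "free_lds": 65536},
]

first = assign_workgroup(cus, waves_needed=4, lds_needed=32768)
second = assign_workgroup(cus, waves_needed=2, lds_needed=0)
third = assign_workgroup(cus, waves_needed=8, lds_needed=0)
```

The front-end that produced the workgroup does not need to know which CU was picked, which matches the idea of an intermediate layer.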
     
    #569 pTmdfx, Aug 1, 2017
    Last edited: Aug 1, 2017
    Putas, Anarchist4000 and BRiT like this.
  10. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    Just a minor correction: ACEs spawn workgroups (not dispatches), according to https://www.slideshare.net/DevCentralAMD/gs4152-michael-mantor.

    So the assumption is that ACEs get pinged back only after an outstanding workgroup is completed. Since the commands of ACEs work at the granularity of dispatches (grids/cubes), these ping backs should not touch the software queues directly, but only the ACE's internal grid-to-group dispatcher, which in turn wakes queues as appropriate.
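The grid-to-group step can be illustrated with a small sketch. It assumes the dispatch is given as a grid size in work-items per dimension plus a workgroup size, and that partial groups at the edge are rounded up; both the function names and those conventions are assumptions for the example, not something taken from the slides.

```python
from itertools import product


def ceil_div(a, b):
    """Integer division rounded up."""
    return -(-a // b)


def expand_dispatch(grid_size, group_size):
    """Split one dispatch (grid) into the list of workgroup ids it contains."""
    counts = [ceil_div(g, s) for g, s in zip(grid_size, group_size)]
    return list(product(*(range(c) for c in counts)))


# A 128x64x1 grid with 64x16x1 workgroups yields 2 * 4 * 1 workgroups.
groups = expand_dispatch((128, 64, 1), (64, 16, 1))
```

One command in the software queue therefore fans out into many workgroups, which is why completion pings would land on this dispatcher state rather than on the queue itself.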
     
    #570 pTmdfx, Aug 1, 2017
    Last edited: Aug 1, 2017
    BRiT likes this.
  11. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    https://www.slideshare.net/mobile/D...-gcn-architecture-a-crash-course-by-layla-mah

    Slight correction: "dispatch one wavefront" per cycle, creating workgroups as needed. At the bottom of one of the slides from another presentation. So 4 waves in a workgroup take 4 cycles to schedule. All of this subject to change as HWS don't exist and ACEs are programmable.
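The arithmetic behind that, as a quick sketch: it uses GCN's 64 work-item wavefront and takes the one-wave-per-cycle launch rate from the slide at face value. The function names and the `waves_per_cycle` parameter are hypothetical.

```python
WAVE_SIZE = 64  # work-items per wavefront on GCN


def waves_per_workgroup(work_items):
    """Number of wavefronts needed to cover a workgroup (rounded up)."""
    return -(-work_items // WAVE_SIZE)


def launch_cycles(work_items, waves_per_cycle=1):
    """Cycles to launch every wave of one workgroup at the given rate."""
    return -(-waves_per_workgroup(work_items) // waves_per_cycle)


cycles_256 = launch_cycles(256)  # 256 work-items -> 4 waves
cycles_100 = launch_cycles(100)  # 100 work-items -> 2 waves (one partly filled)
```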
     
  12. pTmdfx

    Newcomer

    Joined:
    May 27, 2014
    Messages:
    249
    Likes Received:
    129
    The one wavefront per cycle figure, as far as I know, belongs to the publicly invisible SPI. SPI schedules resources at the granularity of workgroups, because of the presence of group-level semantics and resources (LDS & barriers).
     
  13. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,928
    Likes Received:
    1,626
  14. Scott_Arm

    Legend

    Joined:
    Jun 16, 2004
    Messages:
    13,276
    Likes Received:
    3,725
    Seems kind of dumb to release a review this late when the conclusion includes this statement: "After a year of teasing us with tales of the Vega architecture, it's just days away from releasing three desktop-oriented gaming cards and a driver that'll enable the much-discussed Draw Stream Binning Rasterizer. If the impact of that software falls flat, this could be much ado about very little."

    So why not wait until the new driver is out? The review is already late.
     
    digitalwanderer likes this.
  15. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    It contains comparisons to P6000 in professional workloads. This is where Vega FE is supposed to shine.

    Also, the review used to be in German near Vega FE's release, it just took them this long to translate it.
     
    T1beriu, Kej, homerdog and 4 others like this.
  16. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    I know it's an old review just refreshed with English, but why would they only test a $1000 GPU against the $5500+ Quadro version and not the $2000ish P5000 or $1200 Titan XP?

    It's just odd to me to leave that kind of information off the graphs, or at least off the "feature comparison" on page 1. They say "Included in the mix is Nvidia's Quadro P6000, which is around three times more expensive than AMD's Frontier Edition board.", which doesn't make any sense, because it's not 3x, it's 5x+.

    https://www.bhphotovideo.com/c/product/1319335-REG/hp_z0b12at_nvidia_quadro_p6000.html is the cheapest I can find @ $5,600

    There is one cheaper on Amazon, but it's from a 3rd party with a total of 37 reviews in their lifetime... so I'd be hard-pressed to send them $5k. HP sells it in one of their systems for $5500 as well, so that seems to be the going rate.

    Maybe it's just me, but it seems wrong to compare a product against one that costs 5x as much without making it very clear that is the case. I mean how often do we compare a 1050 vs a 1080 Ti to see what delivers better FPS?
     
  17. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    I think they said that the SSG will cost $7k
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    In Germany, you can order a P6000 for about 3,200 EUR at amazon.de (sold by them as well). Currently, though, it's out of stock - maybe it was in stock when the review was initially published?
     
    pharma likes this.
  19. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,966
    Likes Received:
    4,561
    Truthfully, if I was filthy rich this would be the card I'd want. Or a couple of those even. How cool would it be to virtually end all loading times in games and streaming hiccups in free-roaming games?
     
  20. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,183
    Likes Received:
    1,840
    Location:
    Finland
    Does the SSG-storage actually show as HDD? At least they have some sort of API for it, currently supported (in beta) by Adobe Premiere & After Effects only
     