AMD Vega Hardware Reviews

Discussion in 'Architecture and Products' started by ArkeoTP, Jun 30, 2017.

  1. roybotnik

    Newcomer

    Joined:
    Jul 12, 2017
    Messages:
    18
    Likes Received:
    14
    Yeah I know, that's what I did during my own testing. I'm just thinking along the lines of users who might not be willing to mess around in those settings, since many folks are talking about the "out of the box" configuration.
     
  2. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    It's still packed and occurring in the same clock cycle for scheduling. Only difference is the execution unit serializes it thanks to lower propagation delay. Consider a simple integer adder where the MSB depends on the LSB. Half the size, half the delay, so double the clockspeed. Bit more complex in reality, but keeping it simple. Same amount of bits are entering and exiting from the registers. Like I mentioned above, 4x should be doable with a little work, but that packed part breaks down as it would require twice the bandwidth. At which point a lot of changes are being made.
     
  3. roybotnik

    Newcomer

    Joined:
    Jul 12, 2017
    Messages:
    18
    Likes Received:
    14
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    Segmentation and complexity savings could be reasons. One claim I did see, although it's been some time and I forget if it was a thread here or elsewhere, is that some of the products' development timelines did not match up for the readiness of hardware supporting certain precisions.
    GV100 is the first product to have DP, FP32, FP16, and INT8 in the 1/2, 1, 2x, 4x (ed: fixed 4x) hierarchy that one would expect in a top product, while the Pascal generation's hardware is less uniform.


    I have many concerns raised by this and the elaborations that follow.
    They range from how to provide a deterministic breakdown into indeterminate behavior, to where and when this happens, how it can be accomplished simply, circuit behavior, knock-on effects in the pipeline, pipeline balance, clocking complexity, "simple" circuits that somehow map to situations with nebulous bounds on complexity, inserting layers for decision making, gating, wires in, wires throughout, wires going out, and so on.


    The EXE stage is a serial chain of single-bit adders and a sequencer for instruction type propagation delay adjustment?

    Generally, operand size has not had a linear effect on stage delay. The number of logic layers isn't linear, and there are many elements and layers outside of the ALU like pipeline latches and settling time, or the delay of other stages that moderate the effects. The switch from 32-bit to 64-bit ALUs for x86 CPUs didn't halve all clock speeds, and I think I've seen estimates of the internal delays for the execution stage maybe increasing by a modest amount (years ago I vaguely recall estimates in perhaps the tens of percent from the tens of picoseconds of intra-stage delay), but the overall pipeline was able to mostly absorb the incremental delay since more pressing limits constrained the clock period before the ALU circuit itself could limit speeds.
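
To make the two positions concrete, here is a toy gate-depth model. It is only a sketch under stated assumptions: delays are counted in abstract unit gate delays, the `+ 2` levels in the prefix adder and the `OVERHEAD` value are hypothetical, and nothing here reflects Vega's actual ALU design. It contrasts a ripple-carry adder (the "MSB depends on the LSB" case, linear in width) with a parallel-prefix adder (logarithmic in width), and adds a fixed per-stage cost for latches and settling.

```python
import math

def ripple_carry_depth(bits: int) -> int:
    """Carry ripples through every bit position, so depth is linear in width."""
    return bits  # one carry stage per bit, in unit gate delays

def prefix_adder_depth(bits: int) -> int:
    """Parallel-prefix carry tree: depth grows with log2 of the width."""
    # +2 is an assumed cost for the propagate/generate setup and the final sum XOR.
    return math.ceil(math.log2(bits)) + 2

def stage_delay(logic_depth: int, overhead: int) -> int:
    """Stage time = ALU logic depth plus fixed overhead (latches, setup, skew)."""
    return logic_depth + overhead

OVERHEAD = 6  # hypothetical fixed per-stage cost, same unit gate delays

for name, depth in (("ripple", ripple_carry_depth), ("prefix", prefix_adder_depth)):
    d16 = stage_delay(depth(16), OVERHEAD)
    d32 = stage_delay(depth(32), OVERHEAD)
    print(f"{name}: 16-bit={d16} 32-bit={d32} ratio={d32 / d16:.2f}")
```

In this model, halving the width only halves the delay for the bare ripple-carry chain. Once the adder is logarithmic and a fixed overhead is included, the 32-bit stage is only slightly slower than the 16-bit one.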
     
    #284 3dilettante, Jul 18, 2017
    Last edited: Jul 18, 2017
  5. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,930
    Likes Received:
    1,626
    AMD Radeon RX Vega Lands in Budapest – Shows Performance Close To a GTX 1080, Launching in 2 Weeks
    http://wccftech.com/amd-radeon-rx-vega-gtx-1080-battlefield-1-comparison/

    Edit: Why all the smoke and mirrors?
     
  6. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    If the AMD representative said the Radeon setup was $300 cheaper, I think it is prudent to not assume that they are using the supposed $200 average difference between Freesync and G-Sync unless they said so explicitly.
    Covering up the monitor model information means they are free to find some pair of monitors that are outliers or are part of the sample that are necessarily above average in price differential.

    Assuming that the rest of the system specs or their pricing are equivalent or not misleading would be nice, but the rest of the spectacle makes me wary.

    edit:
    The Tomshardware temp numbers for Vega and the HBM2 stacks seem strange to me.
    95C should be what AMD's silicon can readily maintain, and not generally the HBM2.
    How the DRAM can hit that if it's even somewhat touching a cooler that can keep the GPU at 85 seems counter to my intuition.

    Could the layers really be that insulating versus a GPU at full tilt?
     
    #286 3dilettante, Jul 19, 2017
    Last edited: Jul 19, 2017
    Cat Merc likes this.
  7. mpg1

    Veteran Newcomer

    Joined:
    Mar 5, 2015
    Messages:
    1,526
    Likes Received:
    1,112
    assuming the translation of this video is correct:



    then they were explicitly using the price difference between FreeSync and G-Sync as the selling point..

    edit: actually he says G-Sync system. so that could be interpreted as the price difference in monitor + card if they are using the $200 average monitor difference..
     
    #287 mpg1, Jul 19, 2017
    Last edited: Jul 19, 2017
  8. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    In this case I'd have inserted a few gates to break the circuit and yes for FP it would be a little more involved. That said, my understanding was the packed math only accounts for relatively simple math operations. It could very well be its own parallel logic, but they also could have reused portions to conserve space.

    The bounds are always going to be the 32 bits in and out in x amount of time. I'm not proposing any changes beyond extending time slightly if it proved necessary.

    I'm not sure it would even be that complex. It could be a matter of doubling the clockspeed and planning on the circuit stabilizing in a single clock for faster double rate math. Mask off portions to likely save a little power on indeterminate states and comply with standards. Other instructions would now take two cycles to maintain the status quo. Ultimately the timing would need to be analyzed throughout the pipeline to determine clocks, but I'm operating on the basis that the ALU is the limiting factor, as slower is the only possibility to affect other parts of the pipeline.

    FP32 to FP64 didn't halve the clocks, but it did require doubling operand bandwidth to sustain throughput. In the case of P100 we definitely aren't seeing the 2GHz clocks of the consumer parts, so FP64 likely became the bottleneck. Only the ALUs should have that serial dependency. Everything else is a per-bit operation, so most of the timing is fixed regardless of operand size.
     
  9. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    "Technical" reasons for not having 8-Hi HBM1. I'd imagine AMD would have liked those for Fiji. Not to mention Nvidia wanting the same for P100. I doubt cost was that harmful to margins. Only other possibility was poor yields from the assembly process. I always assumed the thermal issues were the reason for Fiji's AIO as temps were rather low for a typical GPU.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The proposal was to force the timings fast enough to cause half of the logic to fail, not specifying how this would be determined or why this failure mode should occur in a consistent pattern. It's not clear why this method should be linked to doubling clock rate. If the pipeline is balanced and tight enough, there may not be that much slack--before adding intra-stage state monitoring, one or two arbitrary crossbars, clock doubling hardware, and gates reactive to a sub-cycle granularity.
    Is there a step or design feature that was omitted that makes the analog behavior of these circuits reach indeterminate states in a deterministic manner? The total solution should be simple and not inject more delays than it saves in the normal and doubled case.

    It's not that complex, rather it's simple enough to create the asserted linear relationship between execution cycle time and the width of the data unit.
    This is inserting a doubled clock or clock doubling circuitry. Is this applying the faster clock to the whole unit and register pipeline then selectively down-clocking one half-speed stage, or is the whole pipeline at one clock and one stage has a doubled clock and the buffering necessary to bring signals up and down out of the domain?

    I don't see the relevance that this has to whether half the ALU logic can settle within a given clock period.
     
    #290 3dilettante, Jul 19, 2017
    Last edited: Jul 19, 2017
    pharma likes this.
  11. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    If you have to promote the price advantage of the full system incl. the monitor (and most of it comes from the monitor), your performance cannot look too awesome.
     
    homerdog and Mize like this.
  12. Leier

    Newcomer

    Joined:
    Jun 30, 2017
    Messages:
    31
    Likes Received:
    22
    I don't think those are reference cards with DHE.

    Both are two-chip cards, which is a different thing. And then check the reviews and look at the "noise" section ;)
     
  13. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    During further testing we found, quite by accident, something that Vega really excels at: +72% compared to Fury X, which itself is faster than a Titan X here. So it is maybe not a good indicator of performance compared to Nvidia, but of the performance jump from prior Radeon generations when everything falls into place.
    A pure compute 4k entry from a demoscene competition, called 2nd Stage Boss:
    http://www.pcgameshardware.de/Vega-...ase-AMD-Radeon-Frontier-Edition-1232684/3/#a5

    From the demo's readme:
    Description:
     
    Cyan, Lightman, Mize and 4 others like this.
  14. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    Interesting thanks :), can you run it @ Fury clocks to see how much is clocks and how much is arch changes?
     
  15. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    4K demos like this tend to be pure ALU (sphere tracing of analytical scene). No memory accesses at all (HBM2 and memory controllers are idling). Also ROPs, TMUs and geometry units are idling. It's entirely possible that Vega reaches max clocks in code like this. That gets us to 52% increase. The shader is pretty big (whole demo in one shader), so it should benefit from the instruction prefetch introduced in GCN4 (Polaris). Tiled rasterizer and improved DCC aren't used at all because this demo is compute shader based.
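
As a rough sanity check on how much of the +72% is clocks and how much is architecture, the arithmetic can be sketched as below. The clock values are assumptions that are not stated in the thread (about 1050 MHz for Fury X and about 1600 MHz peak for Vega FE), as is the premise that both chips have the same ALU count, so that the clock ratio alone sets the peak ALU throughput ratio.

```python
# Assumed peak clocks in MHz (not given in the thread).
FURY_X_MHZ = 1050
VEGA_FE_MHZ = 1600
MEASURED_SPEEDUP = 1.72  # "+72% compared to Fury X" from the result quoted above

def split_speedup(total: float, clk_old: float, clk_new: float) -> tuple[float, float]:
    """Split a total speedup into a clock factor and a residual per-clock factor."""
    clock_gain = clk_new / clk_old
    per_clock_gain = total / clock_gain
    return clock_gain, per_clock_gain

clock_gain, per_clock_gain = split_speedup(MEASURED_SPEEDUP, FURY_X_MHZ, VEGA_FE_MHZ)
print(f"from clocks:    +{(clock_gain - 1) * 100:.0f}%")
print(f"per-clock rest: +{(per_clock_gain - 1) * 100:.0f}%")
```

Under those assumptions the clock ratio accounts for roughly 52%, leaving a per-clock gain of around 13% that would have to come from architectural changes such as the instruction prefetch.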
     
    Kej, Silent_Buddha, homerdog and 7 others like this.
  16. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    This sounds as if you had a background in that kind of programming, too. :) Much appreciated insights. And yes, it runs at max clocks. As does Fury X. Compared to Polaris, Fury X scores 21-23% higher while having 31% higher TFLOPS throughput, so maybe that's the prefetch effect showing.

    TFLOPS per Fps (less is better)
    Code:
    Fury X              221,7
    Polaris 20          209,1
    Vega 10             192,4
    Hawaii XT           223,6
    Titan X (Pascal)    355,4
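
For easier comparison, the table can be normalized against the most efficient entry. This is only a small sketch: the values are copied from the table above with decimal commas converted to points, and `relative_cost` is a hypothetical helper name, not part of any tool used for the measurements.

```python
# Throughput needed per frame, copied from the table above (lower is better).
flops_per_frame = {
    "Fury X": 221.7,
    "Polaris 20": 209.1,
    "Vega 10": 192.4,
    "Hawaii XT": 223.6,
    "Titan X (Pascal)": 355.4,
}

def relative_cost(table: dict[str, float], baseline: str) -> dict[str, float]:
    """Express every entry as a multiple of the baseline entry."""
    base = table[baseline]
    return {gpu: value / base for gpu, value in table.items()}

rel = relative_cost(flops_per_frame, "Vega 10")
for gpu, ratio in sorted(rel.items(), key=lambda kv: kv[1]):
    print(f"{gpu:<18} {ratio:.2f}x")
```

Read this way, each ratio says how much more theoretical throughput a card spends per frame in this demo than Vega 10 does.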
    
     
    #296 CarstenS, Jul 19, 2017
    Last edited: Jul 19, 2017
    Lightman and T1beriu like this.
  17. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,490
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Could you run this shader heavy demo on Vega: http://www.geeks3d.com/20151231/impressive-pixel-shader-of-a-snail-glsl/

    The setup is a bit complicated, just follow the instructions.
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Complicated is not good atm, we have our mag's deadline to hit. :) But I'll try.

    edit:
    Ah, not too complicated.

    Titan X (Pascal) 228-234 Fps with 1x AA, 12 Fps with 4x AA at default 800x480 window
    Did not run on Vega FE. GeeXLab just closed down.
     
    #298 CarstenS, Jul 19, 2017
    Last edited: Jul 19, 2017
    Lightman, fellix and T1beriu like this.
  19. Rootax

    Veteran Newcomer

    Joined:
    Jan 2, 2006
    Messages:
    1,179
    Likes Received:
    581
    Location:
    France
    homerdog and Cat Merc like this.
  20. tongue_of_colicab

    Veteran

    Joined:
    Oct 7, 2004
    Messages:
    3,445
    Likes Received:
    655
    Location:
    Japan
    That's what I've been wondering for the past one or two weeks following this thread. If Vega was in any way competitive either on pure speed or price performance, surely by now AMD would have made some noise about that? Power usage while gaming is not really that important imo, but given how coy AMD is being about everything it seems to me Vega is slower, hotter and not really (if at all) cheaper than the competition.

    Hopefully that isn't the case. Apart from a GF4MX and 560TI I always had AMD cards and I'm perfectly happy with my R9 290, but I'm not sure if I'll buy a power hungry card again. Not because it's power hungry but because of the noise. Even the Gigabyte R9 I have can get pretty loud in summer.
     