AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by Deleted member 13524, Sep 20, 2016.

  1. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
From a (VERY) quick glance with some basic fillrate tests (yes, I know, hence VERY quick), there does not appear to be anything unusual with Vega - unlike certain past architectures where enabling MSAA would greatly reduce pixel or texture rate, which it was designed not to do in the first place.
     
  2. I read somewhere that HBCC is disabled in current drivers. Or at least in the drivers that were sent to reviewers.
     
  3. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Yes and no. It's disabled by default, but you can enable it from Radeon Software and it should just work then.
     
    CarstenS likes this.
  4. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    696
    Likes Received:
    446
    Location:
    Slovenia
Who has all the swizzles... :twisted:
There are 3 completely different topics that involve swizzling:
- standard swizzle (in this thread): the layout of texels within a texture as defined by D3D (as opposed to IHV-defined swizzles, which may differ from one IHV to another or from one GPU generation to another).
- indexed swizzle: which sebbbi mentioned here; it refers to the ability of threads to exchange data within a warp/wavefront. Specifically, one lane may directly index a value in a register from another lane.
- viewport swizzle: which you mentioned; it allows outputting a triangle from a pass-through geometry shader to multiple viewports and reorienting it properly (flipping some coordinates around), for example when rendering a cube map in a single pass.
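To make the indexed swizzle concrete, here's a toy sketch (mine, not any vendor's code) that simulates cross-lane register indexing on a plain Python list standing in for a 64-wide wavefront register, in the spirit of GCN's ds_bpermute or CUDA's __shfl_sync:

```python
# Hypothetical illustration of "indexed swizzle" (cross-lane data exchange):
# each lane in a wavefront fetches a register value held by another lane,
# selected by a per-lane index. Simulated here with a Python list standing
# in for a 64-wide wavefront register; names are mine, not a real API.

WAVEFRONT_SIZE = 64

def indexed_swizzle(register, lane_indices):
    """Lane i reads register[lane_indices[i]], i.e. a value held by another lane."""
    assert len(register) == len(lane_indices) == WAVEFRONT_SIZE
    return [register[idx % WAVEFRONT_SIZE] for idx in lane_indices]

# Example patterns: a broadcast (every lane reads lane 0) and a rotate-by-one.
reg = list(range(WAVEFRONT_SIZE))  # lane i holds value i
broadcast = indexed_swizzle(reg, [0] * WAVEFRONT_SIZE)
rotated = indexed_swizzle(reg, [(i + 1) % WAVEFRONT_SIZE for i in range(WAVEFRONT_SIZE)])
```

The point is just that the index is data, so each lane can pick a different source lane, which is what distinguishes this from a fixed shuffle pattern.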
     
    tinokun, Kej, Entropy and 5 others like this.
  5. Alexko

    Veteran Subscriber

    Joined:
    Aug 31, 2009
    Messages:
    4,541
    Likes Received:
    964
    @sebbbi, now that you've played with Vega quite a bit, do you have an educated guess as to why it performs so much worse than what you'd expect from its specifications?
     
  6. Alessio1989

    Regular

    Joined:
    Jun 6, 2015
    Messages:
    614
    Likes Received:
    321
Has anyone played with two Vega cards under LDA and MDA modes? D:

I am curious to see how HBC and standard swizzle impact things vs. previous GCN generations.
     
  7. Locuza

    Newcomer

    Joined:
    Mar 28, 2015
    Messages:
    45
    Likes Received:
    101
As far as I know, three German sites tested the HBCC (High-Bandwidth Cache Controller).
The results are mixed: some games benefit from the HBCC, some run worse.
I uploaded two images with some results below; there are more results if you click on the links.

    Computerbase (had mixed results):
    https://www.computerbase.de/2017-08..._hbcc_hat_im_schnitt_weder_vor_noch_nachteile

    PCGH (the HBCC was never worse but in Metro the HBCC improved the performance):
    http://www.pcgameshardware.de/Radeo...66623/Specials/HBCC-Gaming-Benchmark-1236099/

    Gamestar (the HBCC was never better, on average a few % loss):
    http://www.gamestar.de/artikel/rade...t-high-bandwith-cache-controller,3318564.html


The games each site tested were different, and so were the test systems.

PCGH used a 6800K @ 4.4 GHz with 32 GB (DDR4-3200) in quad-channel mode; ~4 GB of system memory was reserved, for ~12 GB of unified memory.

CB also used 32 GB (probably quad-channel, DDR4-3000) with a 6850K @ 4.3 GHz; they reserved 8 GB, for 16 GB of unified memory.

Gamestar, in contrast, used a 7700K at stock with 16 GB (DDR4-2400) of dual-channel memory and reserved 4 GB, for 12 GB of unified memory.
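The unified pool sizes quoted above are consistent with the HBCC segment simply being Vega's 8 GB of HBM2 plus the reserved slice of system RAM. A trivial check (figures from the post; the function name is mine):

```python
# Vega 64/56 reference cards ship with 8 GB of HBM2; the HBCC extends that
# with a reserved slice of system memory to form one larger addressable pool.
HBM2_GB = 8

def unified_pool_gb(reserved_system_gb):
    """Unified HBCC pool = on-card HBM2 plus the reserved system RAM slice."""
    return HBM2_GB + reserved_system_gb

# PCGH: ~4 GB reserved -> ~12 GB; CB: 8 GB -> 16 GB; Gamestar: 4 GB -> 12 GB.
```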
     
    tinokun, T1beriu, sebbbi and 5 others like this.
  8. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
That's a good point. AMD stated that they reached the 17 polygons per clock in their internal benchmarks. I was also surprised that they lifted the value from 11 to 17, and that the primitive shader is not really activated in the driver and will be delivered later. It's also interesting that they know which driver it should arrive in.

To me it's strange: first they make jokes like "Poor Volta", then they compare Vega to a 1080 (non-Ti). It looks to me like they have serious problems getting their geometry engine to run, but once it runs we should see a large performance increase, because the utilisation of the hardware will be higher.
     
    Rasterizer likes this.
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
The exact wording was 11 polygons per clock in the earlier slide versus 17 or more in the Vega whitepaper. I'm not sure if there's some subtly different scenario being applied in each, or if AMD was being conservative about what it could apply its primitive shaders to.

I think it's possible that it was some marketing intern's idea of a little joke that took on a life of its own.
I don't think anybody engineering Vega would have taken it seriously.


    In the other thread, I was speculating on whether some of the apparent lack of development in advance of the chip being taped out could have stemmed from some kind of re-shuffling in Vega's design process. The leaked Greenland GPU had certain features that may not be in Vega 10, like 1/2 rate DP and 4 GMI links.

    If there was something like AMD having multiple candidate designs that weren't settled on until late, it may have inhibited preliminary development or led to some IP blocks being adjusted to different levels than early modeling was done for.
     
  10. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
Maybe AMD was not sure about the primitive shader. It's interesting that the primitive shader coexists with the old geometry pipeline.
     
    #3731 Digidi, Aug 21, 2017
    Last edited: Aug 21, 2017
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
The formulation of the primitive shader given so far is that it contributes to efficiency by conservatively culling geometry that the current geometry pipeline would wind up rejecting anyway. That frees up traffic and storage pertaining to attribute calculation, as well as cycles the fixed-function engines must spend rejecting even the most obviously unused primitives.

Because it's conservative, sometimes a little or a lot of the primitive shader's input must get through, and since it only culls, it doesn't do much of the processing the main pipeline must perform.

Some of the items that haven't been fully explored are what portion of the culling can be handled by the primitive shader versus the primitive discard accelerator, versus the DSBR, versus the rest of the pipeline.
The other hardware elements somewhat overlap with the primitive shader and are themselves conservative. The main pipeline and rasterizer also eventually drill down to the sub-pixel level, which could incur a fair amount of math for the primitive shader to do on its own, only to have the rasterizer do it all over again.
Perhaps if there were a way to pass along a guarantee that certain culling actions are known to be fully accurate, something could be skipped.

I'm not sure how many situations require the primitive shader's calculations for frustum and triangle facing to be more conservative than the main pipeline's. Sample coverage at the level the rasterizer can drill down to might incur a fair amount of math, and might affect the per-primitive cost of a primitive shader based on the complexity of the sampling, if it were coded to calculate things that finely for every triangle and then had to pass it all through anyway.
     
    Rasterizer and Digidi like this.
  12. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
    #3733 Digidi, Aug 21, 2017
    Last edited: Aug 21, 2017
  13. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
It may be better to think of primitive shaders as a means to control the fixed-function pipeline. The whole point is to take a primitive, do something to it, and hand it off for rasterization. That can be culling, transforming, sorting, generating, etc., to make the pipeline more efficient. If presented with triangle strips consisting of only front-facing, visible triangles, it's unnecessary. Hence it coexists.
     
    Rasterizer likes this.
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    I think that slide is more about illustrating that the primitive shader combines the capabilities of the formerly separate stages. The culling portion is what's been mostly talked about in the most recent disclosures, and it was discussed earlier that the system or software could opt to not use its capabilities, possibly in cases where it would be more overhead than help.

    Using that illustration, the fixed-function portion of the setup pipeline would come after, and that's where the other culling checks and rasterizer would be.
     
    Digidi likes this.
  15. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
    I find this sentence interesting:
    http://www.gamersnexus.net/guides/3010-primitive-discarding-in-vega-with-mike-mantor

Looks like AMD found a way to do back-face culling with vertex data?

    From the white paper:
     
    #3736 Digidi, Aug 21, 2017
    Last edited: Aug 21, 2017
    Rasterizer likes this.
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
The later portion of the quote discusses having two edges, or creating two edges from three vertices, which should be available in this instance. Then it discusses taking the cross product of those edges and dotting it with the eye ray, the sign indicating whether the triangle is back-facing or not.
That goes back to where I wasn't sure how often the primitive shader would diverge from the primitive pipeline in terms of frustum and back-face culling.
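A minimal sketch of the back-face test the quote describes (mine, not AMD's actual shader code): build two edges from the triangle's three vertices, cross them to get a facet normal, then dot the normal with the eye ray; for counter-clockwise winding, a non-negative result means the triangle faces away and could be culled.

```python
# Back-face test from vertex positions alone, as described in the quote.
# Plain-tuple 3D vectors; winding convention and names are my assumptions.

def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def is_back_facing(v0, v1, v2, eye):
    """True if the CCW-wound triangle (v0, v1, v2) faces away from the eye
    position, i.e. it could be discarded before rasterization."""
    normal = cross(sub(v1, v0), sub(v2, v0))  # facet normal from two edges
    eye_ray = sub(v0, eye)                    # ray from the eye toward the triangle
    return dot(normal, eye_ray) >= 0.0
```

The appeal is that this needs only positions, no attributes, which fits the idea of a cheap culling pass running ahead of full vertex processing.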

I read that text, and I watched the video for any mention of the overhead of the zero-coverage checks, which might be influenced by the sampling mode and by elements that are typically handled at higher precision by the rasterizer. I don't recall that portion being mentioned, and drilling down to the level the DSBR can go would potentially scale overhead with part of the complexity of the pixel output.

    The primitive shader is conservative, in part to be safe, and I think in part because it should be coarse enough that the work to cull plus the non-culled work is supposed to be less than just using what's already there.
     
  17. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,714
    Likes Received:
    2,135
    Location:
    London
    It's always been possible to do this in hardware, it's just a small part of the acceleration that you get from GPUs! Using shader code to do it, instead, is not a major engineering feat. Larrabee was doing this.

It's analogous to when GPUs changed from fixed counts of vertex and pixel shader pipelines to a unified architecture where all shader pipes could do both vertex and pixel shading. It was done because it allowed the GPU to adjust to the workload, using programmability and load-balancing metrics, and it led to better performance.

So the primitive shader is similar: it allows the hardware to cover a large range of situations, particularly very high geometry loads, and be less bottlenecked than a fixed configuration of buffers/hardware.
     
    Cat Merc, tinokun, Lightman and 5 others like this.
  18. Digidi

    Regular

    Joined:
    Sep 1, 2015
    Messages:
    428
    Likes Received:
    239
Found an interesting link. Is this the birth of the primitive shader?

     
    Rasterizer likes this.
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    At least for my browser right now, I cannot see the actual tweet.
    However, would the following be an earlier precursor, given what it does and that it was done on GCN?
    http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=3

At least up to the point where it runs two separate kernels communicating through a buffer, it's using a subset of the vertex shader's functionality to do similar position and visibility culling before feeding the remaining vertices into the main vertex processing phase. Possibly, Vega's architecture goes a step further in generalizing and combining shader stages, so that it can make one stage out of the formerly separate sieve and vertex pair?
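As a toy model of that two-kernel "sieve" idea (mine, not the actual PS4 pipeline): a first pass runs only cheap position checks and writes surviving triangle indices to a buffer, then a second pass runs full vertex work on survivors only. The culling rule here is a deliberately trivial stand-in.

```python
# Two-pass "sieve" culling sketch. The first kernel does position-only
# checks and emits an index buffer; the second shades only the survivors.
# The behind-the-eye-plane test is a placeholder, not a real frustum check.

def sieve_pass(triangles, eye_z=0.0):
    """First kernel: cheap position-only culling; emit indices of survivors."""
    survivors = []
    for i, (v0, v1, v2) in enumerate(triangles):
        # Trivial reject: every vertex is behind the eye plane (z < eye_z).
        if v0[2] < eye_z and v1[2] < eye_z and v2[2] < eye_z:
            continue
        survivors.append(i)
    return survivors

def shade(tri):
    # Stand-in for real vertex shading: pass the triangle through unchanged.
    return tri

def vertex_pass(triangles, survivors):
    """Second kernel: full vertex shading, but only on surviving triangles."""
    return [shade(triangles[i]) for i in survivors]

tris = [
    ((0, 0, 1), (1, 0, 1), (0, 1, 1)),     # in front of the eye plane: kept
    ((0, 0, -1), (1, 0, -1), (0, 1, -1)),  # fully behind the plane: sieved out
]
kept = sieve_pass(tris)
```

The structural point is the buffer of surviving indices between the two passes, which is exactly what a merged primitive-shader stage could fold away.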
     
    Malo, Digidi, Grall and 1 other person like this.