AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
  2. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,719
    Likes Received:
    2,456
    I'd wager none. They basically used the new node to increase clock speeds at roughly the same power consumption as the Fury X. Apparently GCN required a lot of juice and area for that. Ryzen turned out fine despite being fabbed at GF.
     
  3. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Sorry, per thread group, not SIMD. Regardless, that seems a rather interesting change in the scheme of things. However, I thought 1024 was an established limit for most APIs.

    AMD has had a bunch of patents recently that were all quickly filed and published. For the most part they seem to be software techniques for ambiguous hardware, like most patents. Anyway:
    MEMORY MANAGEMENT IN GRAPHICS AND COMPUTE APPLICATION PROGRAMMING INTERFACES
    METHOD AND APPARATUS TO ACCELERATE RENDERING OF GRAPHICS IMAGES (Perhaps worst patent title ever?)

    In a more compressed form, yes. While certainly possible, I'm guessing the paged memory is a subset of the overall pool, leaving the HBCC to track only active pages. Some resources (framebuffer, meshes, stacks) simply won't lend themselves to paging very well and will likely be kept in a separate pool.
     
  4. entity279

    Veteran Regular Subscriber

    Joined:
    May 12, 2008
    Messages:
    1,227
    Likes Received:
    421
    Location:
    Romania
    I'd say you're oversimplifying things. Ryzen is too different from Vega / Fury to compare (and who's to say it doesn't perform as it does in spite of a process that hampers it, rather than because of it?).

    Further, for Fury vs Vega, both the architecture and the process are different. Which of the two is the main cause of Vega's underperformance is anyone's guess.
     
  5. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    There are some references to "Texture Caches" in the drivers, so there could be multiple generic L2s. That might actually make sense for async, to avoid thrashing.

    Instruction caches would be significant. 48KB (16+32KB) per CU on an old GCN iteration, and they've been growing. Those could be critical with higher clocks, and would already total over 3MB. If INT and FP are running concurrently, as suggested in one slide, they would need to be much larger. There could be 10MB or more in various instruction caches. Certainly not everything, but that could be half the unaccounted SRAM.
     
  6. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    Texture Caches are L1. 16 KiB per CU.
    Where did you get the instruction caches sizes from? Very curious, since I've either completely forgotten about them being discussed or have never seen it.
    edit: Ah, the very first GCN presentation. 4 CUs sharing a 16 KiB scalar read-only cache (constants?) and a 32 KiB instruction L1!
    Which slide suggests that INT and FP are running concurrently (on the vec16-SIMDs)?
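
    The two readings of those cache sizes give quite different totals. Below is a rough sketch of the arithmetic, assuming a 64-CU part (the CU count is an assumption added here, not a figure from the posts) and the 16 KiB scalar plus 32 KiB instruction caches from the first GCN presentation. The function name is hypothetical.

```python
# Rough estimate of scalar + instruction cache SRAM on a GCN-style GPU.
# Cache sizes are the ones quoted from the first GCN presentation.
SCALAR_KIB = 16        # scalar read-only cache
INSTRUCTION_KIB = 32   # instruction L1


def front_end_cache_kib(num_cus: int, cus_per_group: int) -> int:
    """Total KiB when every `cus_per_group` CUs share one scalar/instruction pair."""
    if cus_per_group <= 0 or num_cus % cus_per_group != 0:
        raise ValueError("num_cus must be a positive multiple of cus_per_group")
    groups = num_cus // cus_per_group
    return groups * (SCALAR_KIB + INSTRUCTION_KIB)


# Assumption: 64 CUs. Not stated in the thread.
NUM_CUS = 64
private_per_cu = front_end_cache_kib(NUM_CUS, 1)   # one pair per CU
shared_by_four = front_end_cache_kib(NUM_CUS, 4)   # one pair per 4 CUs

print(f"per-CU caches:    {private_per_cu} KiB")
print(f"shared by 4 CUs:  {shared_by_four} KiB")
```

    If the caches are shared by four CUs, as in that original presentation, the total is a quarter of the per-CU estimate, so this alone would account for much less of the unexplained SRAM.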
     
  7. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,120
    Likes Received:
    2,866
    Location:
    Well within 3d
    The GCN3 ISA indicates the maximum workgroup size is 16 wavefronts (1024 work items). Whatever limit is being set here has some other confounding issue if they got away with 2048 before.
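
    The limit works out as follows. This is a minimal sketch using the 16-wavefront figure above and GCN's 64-wide wavefronts; the helper names are made up for illustration.

```python
# Workgroup size check against the limit quoted from the GCN3 ISA.
WAVEFRONT_SIZE = 64            # work items per GCN wavefront
MAX_WAVES_PER_WORKGROUP = 16   # limit quoted in the post


def wavefronts_needed(work_items: int) -> int:
    """Number of wavefronts required to hold `work_items` (rounded up)."""
    if work_items <= 0:
        raise ValueError("work_items must be positive")
    return -(-work_items // WAVEFRONT_SIZE)  # ceiling division


def fits_isa_limit(work_items: int) -> bool:
    """True if the workgroup fits within 16 wavefronts."""
    return wavefronts_needed(work_items) <= MAX_WAVES_PER_WORKGROUP


for size in (1024, 2048):
    print(size, wavefronts_needed(size), fits_isa_limit(size))
```

    A 2048-item group needs 32 wavefronts, twice the documented maximum, which is why the earlier behaviour looks odd.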
     
  8. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    8,142
    Likes Received:
    1,830
    Location:
    Finland
    Just to interrupt your usual broadcast, @ToTTenTranz could you update the title to include Vega 12?
     
    Malo and ToTTenTranz like this.
  9. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,120
    Likes Received:
    2,866
    Location:
    Well within 3d
    It could be an evolution of a customization created for a console already built.
    http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=3

    The front end reorganization may allow for a more generic version of this compute shader so that it can feed primitive setup across various combinations of VS, TS, and GS. Perhaps the specialized compiler mode is a precursor to how AMD expected to make existing vertex code work for Vega.


    Exact wording may be important. There are references to texture channel caches, which are actually describing the L2.
     
  10. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Going to hold off on the concurrently part given the context for the time being after re-reading that paragraph. That 4-5x part is still interesting though. INT8 would be 4x, but the extra +1 relative to Polaris I'm unsure about. The rest of the paragraph is INT16/FP16 which I figured ran concurrently for 4x.

    Looking at slide 17, it's possible there are two 64KB banks per SIMD. That could account for a good chunk of SRAM and make sense with the longer pipelines and higher clocks.

    It may be a Linux thing, because that code would have been actively used for years now. All the documentation I recall has the 1024 work item limit as you mentioned, but obviously they were exceeding that limit with some success.
     
    fellix and CarstenS like this.
  11. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,120
    Likes Received:
    2,866
    Location:
    Well within 3d
    If some of the instructions in the addressing category have scalar and vector variants, the scalar portion running a chained operation can be the +1 if running concurrently with a 4x INT8 operation. It seems like a sensible thing to have in both domains.
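
    As a purely speculative sketch of how a "4-5x" figure could arise: packing narrower operands into a 32-bit vector lane gives the multiplier, and one concurrent scalar operation would add the +1. Whether the hardware actually issues these together is an assumption here, and the function names are hypothetical.

```python
# Speculative throughput multipliers for packed math in a 32-bit lane.
LANE_BITS = 32  # GCN vector registers are 32 bits per lane


def packed_ops_per_lane(operand_bits: int) -> int:
    """How many operands of the given width fit in one 32-bit lane."""
    if operand_bits <= 0 or LANE_BITS % operand_bits != 0:
        raise ValueError("operand width must evenly divide the lane width")
    return LANE_BITS // operand_bits


def combined_rate(operand_bits: int, concurrent_scalar_ops: int = 0) -> int:
    """Packed vector ops plus any scalar ops assumed to issue alongside them."""
    return packed_ops_per_lane(operand_bits) + concurrent_scalar_ops


print("INT8 packed:", packed_ops_per_lane(8))           # 4x
print("INT16/FP16 packed:", packed_ops_per_lane(16))    # 2x
print("INT8 + 1 scalar (assumed):", combined_rate(8, 1))
```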
     
    CarstenS likes this.
  12. Elfear

    Newcomer

    Joined:
    Oct 27, 2006
    Messages:
    3
    Likes Received:
    0
    If all of AMD's slides were made using the newer drivers with DSBR enabled, and those slides show Vega ~= 1080FE, that indicates Vega would be 10-15% slower than the 1080 without that new feature enabled. I'm not saying you're wrong, but my brain can't wrap itself around what that means (i.e. with a 50% clock increase Vega would only be ~15% faster than a Fury X). What has AMD been doing for the last 2 years? :no:
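
    To put a number on that, here is a back-of-the-envelope sketch using only the rough figures above (about 50% more clock, about 15% more performance). It assumes performance would otherwise scale linearly with clock speed, which ignores memory bandwidth and other limits, so treat the result as indicative only.

```python
# Implied per-clock throughput, assuming performance scales linearly with clock.
def per_clock_ratio(perf_gain: float, clock_gain: float) -> float:
    """Relative per-clock throughput given fractional performance and clock gains."""
    if clock_gain <= -1.0:
        raise ValueError("clock_gain must be greater than -1")
    return (1.0 + perf_gain) / (1.0 + clock_gain)


# Rough figures from the post: ~15% faster at ~50% higher clocks.
ratio = per_clock_ratio(perf_gain=0.15, clock_gain=0.50)
print(f"per-clock throughput relative to the older part: {ratio:.2f}")
```

    Under those assumptions the newer chip would be doing roughly three quarters of the work per clock, which is the part that is hard to believe.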
     
  13. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
    So... having viewed the slides: in summary, no advancement in FP64 performance. Yeah, my main interest. The graphics performance of my R9 290Xs is good enough for me. But it's great to see the other improvements.
     
  14. seahawk

    Regular

    Joined:
    May 18, 2004
    Messages:
    511
    Likes Received:
    141
    GP107 is the counterpoint. Although its baseline frequencies are rather low, it boosts easily to 1700 MHz, uses less power per FPS than the AMD competition, and is made on the same process, but at Samsung.
     
    DavidGraham likes this.
  15. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    I think the real question is, with so many FreeSync monitors on the market, why do they bundle the one with the worst reputation? I think Samsung simply saw this as an opportunity to offload a lemon, and AMD took the bait.
     
    Malo, pharma and Lightman like this.
  16. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,719
    Likes Received:
    2,456
    It was indeed. Vega FE is in between the 1070 and 1080, and it had DSBR disabled. AMD will provide a patch to enable it for FE when RX launches, so FE and RX will have equal gaming performance. Maybe @Rys can shed more light on the matter, if his hands are not tied, that is.
     
  17. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    GSync has plenty of flickering issues as well. Samsung is one of the biggest suppliers, and a 20%+ discount on a monitor is pretty huge, which many manufacturers probably can't offer. Since Samsung makes the panels, they obviously have the highest markup.

    A quick fix was already posted and I'm sure saving a ton vs GSync is welcome.
     
  18. kalelovil

    Regular

    Joined:
    Sep 8, 2011
    Messages:
    555
    Likes Received:
    93
    Exotic memory raising platform cost, lengthened pipeline for higher clock-speeds costing a lot of transistors and reducing IPC, feature extensions which require significant developer effort to implement, an architecture pitched as best suited for 'tomorrow's workloads'.
    Sounding a bit like the P4.
     
  19. Entropy

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,051
    Likes Received:
    1,011
    Nevertheless, it does clock lower than the same architecture implemented on the similar TSMC process.
    Boost clocks are actually 26% higher on GP106 vs GP107.
    That's pretty much the real-world performance delta between the GTX 1080 and 1080 Ti. In Vega's performance segment such differences make a large difference in perception, and thus also in what prices you can ask.
     
    #3319 Entropy, Aug 1, 2017
    Last edited: Aug 1, 2017
    no-X and Putas like this.
  20. gamervivek

    Regular Newcomer

    Joined:
    Sep 13, 2008
    Messages:
    715
    Likes Received:
    220
    Location:
    india
    A Reddit user posted a comparison of a 1080 Ti at roughly similar clocks with a Frontier Edition: around a 10% advantage for the Nvidia card.

    http://www.3dmark.com/compare/fs/13254853/fs/131174

    Edit: the Vega card is overclocked on memory as well, so the difference will be greater.
     