AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by Deleted member 13524, Sep 20, 2016.

  1. BacBeyond

    BacBeyond Newcomer

  2. DavidGraham

    DavidGraham Veteran

    I'd wager none. They basically used the new node to increase clock speeds at roughly the same power consumption as the Fury X. Apparently GCN required a lot of juice and area for that. Ryzen turned out fine despite being fabbed at GF.
     
  3. Anarchist4000

    Anarchist4000 Veteran

    Sorry, per thread group not SIMD. Regardless that seems a rather interesting change in the scheme of things. However I thought 1024 was an established limit for most APIs.

    AMD has had a bunch of patents recently that were all quickly filed and published. For the most part they seem to be software techniques for ambiguous hardware. Like most patents. Anyways:
    MEMORY MANAGEMENT IN GRAPHICS AND COMPUTE APPLICATION PROGRAMMING INTERFACES
    METHOD AND APPARATUS TO ACCELERATE RENDERING OF GRAPHICS IMAGES (Perhaps worst patent title ever?)

    In a more compressed form, yes. While certainly possible, I'm guessing the paged memory is a subset of the overall pool, leaving the HBCC to track only active pages. Some resources (framebuffer, meshes, stacks) simply won't lend themselves to paging very well and will likely be kept in a separate pool.
     
  4. entity279

    entity279 Veteran Subscriber

    I'd say you're oversimplifying things. Ryzen is too different from Vega / Fury to compare (and who's to say it doesn't perform as it does in spite of a process that hampers it, instead of because of it).

    Further, for Fury vs Vega, both the architecture and the process are different. Which of the two is the main cause of Vega's underperformance is anyone's guess.
     
  5. Anarchist4000

    Anarchist4000 Veteran

    There are some references to "Texture Caches" in drivers, so there could be multiple generic L2s. That might actually make sense for async to avoid thrashing.

    Instruction caches would be significant. 48KB (16+32KB) per CU on an old GCN iteration and they've been growing. Those could be critical with higher clocks and over 3MB. Then assume if INT and FP are running concurrently as suggested in one slide they would need to be much larger. Could be 10MB or more in various instruction caches there. Certainly not everything, but that could be half the unaccounted SRAM.
     
  6. CarstenS

    CarstenS Legend Subscriber

    Texture Caches are L1. 16 KiB per CU.
    Where did you get the instruction cache sizes from? Very curious, since I've either completely forgotten about them being discussed or have never seen it.
    edit: Ah, the very first GCN presentation. 4 CUs sharing a 16 KiB scalar read-only cache (constants?) and a 32 KiB instruction L1!
    Which slide suggests that INT and FP are running concurrently (on the vec16-SIMDs)?
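
    For reference, the two readings of those cache figures give very different totals. The sketch below is only back-of-the-envelope arithmetic: the 64-CU count is an assumption (a full Vega 10 configuration, not stated in this thread), and the helper name is made up for illustration.

```python
def front_end_cache_kib(num_cus, scalar_kib=16, instr_kib=32, cus_per_group=1):
    """Total scalar + instruction cache in KiB, given how many CUs share one set."""
    groups = num_cus // cus_per_group
    return groups * (scalar_kib + instr_kib)

NUM_CUS = 64  # assumption: full Vega 10 CU count, not taken from the thread

# Reading 1: every CU has its own 16 KiB + 32 KiB pair.
per_cu_total = front_end_cache_kib(NUM_CUS, cus_per_group=1)

# Reading 2: four CUs share one 16 KiB + 32 KiB pair (first GCN presentation).
shared_total = front_end_cache_kib(NUM_CUS, cus_per_group=4)

print(f"per-CU reading:   {per_cu_total} KiB ({per_cu_total / 1024:.2f} MiB)")
print(f"shared-by-4 case: {shared_total} KiB ({shared_total / 1024:.2f} MiB)")
```

    Under the per-CU reading the total lands at the "over 3MB" mark mentioned earlier; with four CUs sharing, it is a quarter of that.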
     
  7. 3dilettante

    3dilettante Legend Alpha

    The GCN3 ISA indicates the maximum workgroup size is 16 wavefronts (1024 work items). Whatever limit is being set here has some other confounding issue if they got away with 2048 before.
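
    As a quick sanity check on those numbers, here is a minimal sketch. It assumes the usual 64-wide GCN wavefront; the function names are hypothetical and not taken from any driver code.

```python
WAVEFRONT_SIZE = 64                # work items per GCN wavefront
MAX_WAVEFRONTS_PER_WORKGROUP = 16  # limit quoted from the GCN3 ISA document

def wavefronts_needed(work_items, wavefront_size=WAVEFRONT_SIZE):
    """Number of wavefronts required to cover a workgroup (ceiling division)."""
    return -(-work_items // wavefront_size)

def fits_isa_limit(work_items):
    """True if the workgroup fits within the documented wavefront limit."""
    return wavefronts_needed(work_items) <= MAX_WAVEFRONTS_PER_WORKGROUP

max_work_items = WAVEFRONT_SIZE * MAX_WAVEFRONTS_PER_WORKGROUP

print(max_work_items)           # documented ceiling in work items
print(wavefronts_needed(2048))  # wavefronts a 2048-item group would need
print(fits_isa_limit(2048))
```

    A 2048-item group would need twice the documented wavefront count, which is why the earlier limit looks odd.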
     
  8. Kaotik

    Kaotik Drunk Member Legend

    Just to interrupt your usual broadcast, @ToTTenTranz could you update the title to include Vega 12?
     
    Malo and Deleted member 13524 like this.
  9. 3dilettante

    3dilettante Legend Alpha

    It could be an evolution of a customization created for a console already built.
    http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=3

    The front end reorganization may allow for a more generic version of this compute shader so that it can feed primitive setup across various combinations of VS, TS, and GS. Perhaps the specialized compiler mode is a precursor to how AMD expected to make existing vertex code work for Vega.


    Exact wording may be important. There are references to texture channel caches, which are actually describing the L2.
     
  10. Anarchist4000

    Anarchist4000 Veteran

    After re-reading that paragraph, I'm going to hold off on the concurrent part for the time being, given the context. That 4-5x part is still interesting though. INT8 would be 4x, but I'm unsure about the extra +1 relative to Polaris. The rest of the paragraph is INT16/FP16, which I figured ran concurrently for 4x.

    Looking at slide 17, it's possible there are two 64KB banks per SIMD. That could account for a good chunk of SRAM and make sense with the longer pipelines and higher clocks.

    It may be a Linux thing, because that code would have been actively used for years now. All the documentation I recall has the 1024 work item limit as you mentioned, but obviously they were exceeding that limit with some success.
     
    fellix and CarstenS like this.
  11. 3dilettante

    3dilettante Legend Alpha

    If some of the instructions in the addressing category have scalar and vector variants, the scalar portion running a chained operation can be the +1 if running concurrently with a 4x INT8 operation. It seems like a sensible thing to have in both domains.
     
    CarstenS likes this.
  12. Elfear

    Elfear Newcomer

    If all of AMD's slides were made using the newer drivers with DSBR enabled and those slides show Vega ~= 1080FE, that indicates Vega would be 10-15% slower than the 1080 without that new feature enabled. I'm not saying you're wrong but my brain can't wrap itself around what that means (i.e. with a 50% clock increase Vega would only be ~15% faster than a FuryX). What has AMD been doing for the last 2yrs? :no:
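
    To put a number on that, here is a rough sketch. It assumes performance scales as clock speed times per-clock throughput, which is a simplification, and it uses only the ~50% and ~15% figures from the paragraph above; the function name is made up.

```python
def per_clock_ratio(perf_gain, clock_gain):
    """Per-clock throughput of the new chip relative to the old one.

    Assumes performance = clock * per-clock throughput, so the ratio is
    (1 + perf_gain) / (1 + clock_gain). Gains are fractions (0.15 == 15%).
    """
    return (1 + perf_gain) / (1 + clock_gain)

ratio = per_clock_ratio(perf_gain=0.15, clock_gain=0.50)

print(f"per-clock throughput vs Fury X: {ratio:.3f}")
print(f"implied per-clock loss: {(1 - ratio) * 100:.1f}%")
```

    Under that assumption, a 15% gain from a 50% clock bump would mean per-clock throughput dropped by nearly a quarter.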
     
  13. moozoo

    moozoo Newcomer

    So, having viewed the slides: in summary, no advancement in FP64 performance, which is my main interest. The graphics performance of my R9 290Xs is good enough for me. But it's great to see the other improvements.
     
  14. seahawk

    seahawk Regular

    GP107 is the counterpoint. Although its base frequencies are rather low, it boosts easily to 1700 MHz, uses less power per FPS than the AMD competition, and is made on the same process, but at Samsung.
     
    DavidGraham likes this.
  15. silent_guy

    silent_guy Veteran Subscriber

    I think the real question is, with so many FreeSync monitors on the market, why do they bundle the one with the worst reputation? I think Samsung simply saw this as an opportunity to offload a lemon, and AMD took the bait.
     
    Malo, pharma and Lightman like this.
  16. DavidGraham

    DavidGraham Veteran

    It was indeed. Vega FE is in between the 1070 and 1080, and it had DSBR disabled. AMD will provide a patch to enable it for FE when RX launches, so FE and RX will have equal gaming performance. Maybe @Rys can shed more light on the matter, if his hands are not tied, that is.
     
  17. BacBeyond

    BacBeyond Newcomer

    GSync has plenty of flickering issues as well. Samsung is one of the biggest suppliers, and a 20%+ discount on a monitor is pretty huge, which many manufacturers probably can't offer. Since Samsung makes the panels, they obviously have the highest markup.

    A quick fix was already posted and I'm sure saving a ton vs GSync is welcome.
     
  18. kalelovil

    kalelovil Regular

    Exotic memory raising platform cost, lengthened pipeline for higher clock-speeds costing a lot of transistors and reducing IPC, feature extensions which require significant developer effort to implement, an architecture pitched as best suited for 'tomorrow's workloads'.
    Sounding a bit like the P4.
     
  19. Entropy

    Entropy Veteran

    Nevertheless, it does clock lower than the same architecture implemented on the similar TSMC process.
    Boost clocks are actually 26% higher on GP106 vs GP107.
    That's pretty much the real-world performance delta between the GTX 1080 and 1080 Ti. In Vega's performance segment such differences make a large difference in perception, and thus also in what prices you can ask.
     
    Last edited: Aug 1, 2017
    no-X and Putas like this.
  20. gamervivek

    gamervivek Regular

    A Reddit user posted a comparison of a 1080 Ti at roughly similar clocks with a Frontier Edition: around a 10% advantage for the Nvidia card.

    http://www.3dmark.com/compare/fs/13254853/fs/131174

    Edit: the Vega card is overclocked on memory as well, so the difference will be greater.
     