AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by Deleted member 13524, Sep 20, 2016.

  1. And/or bandwidth and/or geometry performance. In the article's conditions the Vega 56 and 64 have the exact same bandwidth and the same 4 geometry engines working at the same clocks.
    With primitive shaders working these two factors might be less of a bottleneck.



    Raven Ridge is bringing up to 11 NCUs which is more than enough to meet the performance target of a GT4 Iris Pro even if it clocks at ~900MHz. Problem here is bandwidth. Supporting LPDDR4X or a single HBM stack would have done wonders for Raven Ridge, but it doesn't look like it's happening.
     
    #4101 Deleted member 13524, Sep 14, 2017
    Last edited by a moderator: Sep 14, 2017
  2. Anarchist4000

    Veteran

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
Unless more than half the frame is spent on geometry, async should work around a geometry bottleneck. Bandwidth limits should show at higher resolutions as cache bandwidth becomes more significant. At 4K a Vega 64 should be pulling well ahead as geometry and bandwidth become relatively less significant.

If the supply were there, I'd think HBM and a small-form-factor push would have been a huge opportunity for AMD. Driver work is still ongoing for RR on Android, so there may still be hope for some Chromebox and Steam Box designs.

What I'm suggesting is to stick a Nano and a 4/8-core Ryzen into a Threadripper socket, or simply embed it entirely. Push an entirely new market: a small box with a lot of external IO. Do what integrated graphics did to the low-end market and to Intel's graphics market share, but beyond just 11 NCUs. Take integration a step further and go for 80-90% integrated.
     
  3. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Vega 56 and 64 operate basically on the same memory bandwidth as Fiji. Hence, every byte saved by the measures put in place should be very welcome.
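As a sanity check on that claim, peak bandwidth can be computed from the memory configurations (the bus widths and data rates below are assumptions taken from public spec listings, not from this thread):

```python
# Rough peak-bandwidth comparison of Fiji vs. Vega.
# Configurations below are assumed from public product specs.

def peak_bandwidth_gbps(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Peak DRAM bandwidth in GB/s: bus width (bits) x data rate (GT/s) / 8."""
    return bus_width_bits * data_rate_gtps / 8

fiji   = peak_bandwidth_gbps(4096, 1.0)    # Fury X: four HBM1 stacks @ 1.0 GT/s
vega64 = peak_bandwidth_gbps(2048, 1.89)   # Vega 64: two HBM2 stacks @ 1.89 GT/s
vega56 = peak_bandwidth_gbps(2048, 1.6)    # Vega 56: two HBM2 stacks @ 1.6 GT/s

print(fiji, vega64, vega56)  # 512.0 483.84 409.6
```

So Vega 64 lands within a few percent of Fiji's 512 GB/s despite half the bus width, which is why every byte saved matters roughly as much as it did on Fiji.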
     
  4. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Aside from HBM and the data fabric, there is also the same L2-L1 bandwidth and the same RBE/export behavior to the L2 and beyond.
    Latencies could be generally equivalent iso-clock, and if there are internal variations with the firmware settings they don't show up in the testing.

    Some of those elements might change with DSBR and workable primitive shaders. Backpressure due to RBE thrashing and conflicts with CU traffic could slow execution of wavefronts or delay their final export and releasing of resources. Less successful early culling can potentially leave the iso-clock setup pipeline and wavefront launch process more burdened as well.

If AMD's patents on how it implements binning and a tiled rasterizer reflect Vega's implementation, the front end's behavior without those features active would make the setup process longer-latency, which Vega may not be balanced for if that is the norm rather than a minority case. As this happens in a stage whose output is generally amplified into a larger amount of pixel shader work, the level of parallelism and latency tolerance may profile differently, with more specialized concerns given the interaction with the more specialized paths and fixed-function blocks.

    The latency angle has made me curious about the significantly higher wait count limit for Vega's vector memory instructions, and if this is a case of where decisions at each level of abstraction are bleeding through.
It doesn't seem like GCN suddenly incurred four times the memory latency, but it might matter more for the new shader variants than it does for more free-form pixel or compute shaders. The tendency for front-end work to span fewer CUs (since it runs pre-amplification), its interactions with fixed-function paths, and the more complex merged/primitive shader code might place a greater premium on per-wavefront latency handling. That's admittedly speculation in the absence of knowing how the new shaders profile.

    That Vega's ISA splits the latency count field the way it does may be another indication of wanting binary backwards compatibility, or perhaps like the implementation-specific triangle coverage instruction it is a sign of the ISA reflecting different scenarios (or different CU revisions?) that need to be able to ignore the new bits.
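The split being referred to can be illustrated with a sketch of how the GFX9 s_waitcnt immediate appears to be laid out, going by LLVM's AMDGPU backend documentation (the exact bit positions here are my reading of that documentation, not something stated in this thread):

```python
# Hedged sketch of the GFX9 s_waitcnt immediate packing: vmcnt grew
# from 4 to 6 bits, with the two new high bits placed in a previously
# unused part of the 16-bit field rather than widening the old one.

def encode_waitcnt_gfx9(vmcnt: int, expcnt: int, lgkmcnt: int) -> int:
    assert 0 <= vmcnt < 64 and 0 <= expcnt < 8 and 0 <= lgkmcnt < 16
    return ((vmcnt & 0xF)             # vmcnt[3:0] -> bits [3:0] (pre-Vega position)
            | (expcnt & 0x7) << 4     # expcnt     -> bits [6:4]
            | (lgkmcnt & 0xF) << 8    # lgkmcnt    -> bits [11:8]
            | (vmcnt >> 4) << 14)     # vmcnt[5:4] -> bits [15:14] (new in GFX9)

# An older binary that only ever sets bits [3:0] still decodes as before,
# which is the backwards-compatibility angle mentioned above.
print(hex(encode_waitcnt_gfx9(63, 0, 0)))  # 0xc00f
```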


The pricing and volume situation may keep this from happening going forward, given the orders of magnitude greater volume of the laptop market and the memory market's pricing trends. HBM seems to be hitting a point where it is only acceptable to buyers willing to pay a premium, which is a poor fit for APUs that will likely need to hit low price points--something likely to be assumed to be the case for AMD for some time, just because that's generally what AMD gets and it would take time to reverse that perception.
    The pricing clock tends to reset with every new memory type or variation as well.
    Perhaps if Raven Ridge is among the last of the monolithic APUs, future implementations can allow flexibility where now AMD would need to add costs for itself to balance uncertainties in DRAM pricing and the disparate price points it needs to hit.

    To avoid cluttering the review thread, I will append a note in response to a post you made:
    https://forum.beyond3d.com/posts/2001017/

    I think that tweet was in reference to area more than other factors. The IF strip is a measurable amount of area.
    I don't really know why it would be a limitation beyond that, given it is described as a mesh and client Vega really shouldn't be stressing it enough to cause it to be a notable limitation.

    Its clock domain is constant, which likely wouldn't change for a client-optimized version for power reasons. It may also help service certain heterogeneous compute functions if its domain is used as a timekeeper.
    HBCC appears to sit in the coherent slave position noted for implementation of Zen's fabric, where there is an intermediary between the links and a memory controller, although what it's tasked with shouldn't be a major limiter since what gaming needs is a small subset. The unused features would be an area cost, generally.
    IF itself doesn't implement controllers or PCIe. In Zen, the fabric interfaces with controllers that then plug into the interface and PHY.
     
    Lightman likes this.
  5. RecessionCone

    Regular Subscriber

    Joined:
    Feb 27, 2010
    Messages:
    505
    Likes Received:
    189
    GP100 has a completely different SM than GP102. The ratio of scheduling to math hardware and on-chip memory is quite different. So this comparison is not as straightforward as you'd like to make it.
     
  6. The bulk of the conversation was about comparing Vega 10 to GP102, but you're worried about the GP100->GP102 comparison not being straightforward enough?

    Regardless, as posted above this isn't a conversation for this topic.
     
  7. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,382
    No, I'm talking about improved compute somehow impacting graphics at the cycle level within the same family.

    There is no evidence at all that the additional compute features of GP100 have a negative impact at the cycle level on graphics compared to GP102.

    It's a fun exercise, but not particularly relevant in this context. The trick is to make an architecture that combines best of everything. High clock speeds, low power, good performance for both games and compute workloads.

    Of course, TDP has an impact. But that doesn't apply to Vega vs GP102, since AMD already happily decided to give it a 350W power budget in order to not have to scale back clocks. And thus there's still no indication that Vega's disappointing performance can be explained by its compute features. It'd be different if AMD restricted Vega's clocks so that it could stay within the power envelope of GP102.

    You give 4 reasons, but forgot the most obvious one: GCN was always a power inefficient architecture and one that's unable to make TFLOPS do the work for graphics, and that hasn't changed with Vega.
     
    el etro, pharma and DavidGraham like this.
  8. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
Cross-topic: it is not proper anymore to continue the discussion in the hardware-review thread, as ToTTenTranz rightfully pointed out to me, so we should continue it here.

    All those reasons sound like deliberate and willful decisions made by AMD, don't they?
     
  9. So this just appeared on reddit:



[image]

    IIRC, the Ryzen 5 2500U has the lower-end 8-NCU Vega GPU, and the Ryzen 7 "U" parts should get 11 NCUs. All of them have a 15W TDP.
    If true, I'm left wondering how the Vega iGPU isn't getting drowned in bandwidth bottlenecks.





    AMD yes, RTG no (e.g. GF's divestment deals precede the creation of RTG).
    Point being?
     
    Lightman likes this.
  10. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    10,244
    Likes Received:
    4,465
    Location:
    Finland
    Going by name, the Ryzen 7 mobile part should have 10 CUs, not 11.
    Got a link to the reddit thread?

    edit:
     
  11. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    161
    Likes Received:
    179
    How is Vega able to outperform Fiji with lower bandwidth?

    Vega itself is built to be more memory bandwidth efficient, and it shows in the comparison between Fiji and Vega.

    All that 45MB of SRAM has to be doing something :???:
     
  12. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,195
    What SRAM? Are AMD APUs going to have SRAM like Iris Pro? Because Intel 640 has 64 MB....

    Regardless, they cannot come soon enough! My Core M ZenBook is getting awfully limited now that my use case for it changed from merely browsing/office work to software development.

    EDIT - Oh, you probably mean the 45 MB of cache Vega has! Although an 8CU part would probably have way less cache...
     
    #4112 Picao84, Sep 15, 2017
    Last edited: Sep 15, 2017
  13. Actually, in synthetic benchmarks Vega 10 seems to get lower effective bandwidth per-clock than Fiji, at least for now.
    Plus, I doubt Raven Ridge will have 45MB of SRAM. It's a 15-35W APU, not a 200-350W GPU.

    Maybe these results were obtained using high-clocked DDR4, though notebook memory currently tops out at 3000 MT/s (I think), and that's 48GB/s total.
    In Intel's Iris Plus 640, Crystalwell's eDRAM is 50GB/s duplex (100GB/s total), plus the system's 25-30GB/s using LPDDR3 1866.
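Those totals are easy to reproduce with a back-of-envelope calculation (channel widths assumed to be the standard 64 bits, dual-channel, which is not stated explicitly above):

```python
# Back-of-envelope check of the system-memory bandwidth figures above.
# Assumes standard 64-bit channels in a dual-channel configuration.

def dram_bw_gbps(channels: int, width_bits: int, mtps: int) -> float:
    """Peak bandwidth in GB/s for a given channel count, width, and MT/s."""
    return channels * width_bits / 8 * mtps / 1000

ddr4_3000   = dram_bw_gbps(2, 64, 3000)  # 48.0 GB/s, matching the 48GB/s above
lpddr3_1866 = dram_bw_gbps(2, 64, 1866)  # ~29.9 GB/s, within the 25-30GB/s range

print(ddr4_3000, lpddr3_1866)
```

Even with the Crystalwell eDRAM on top of that, Iris Plus has far less raw bandwidth than a discrete card, which is why the 15W Raven Ridge bandwidth question is a fair one.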
     
  14. itsmydamnation

    Veteran

    Joined:
    Apr 29, 2007
    Messages:
    1,349
    Likes Received:
    470
    Location:
    Australia
    Despite all the derp that has taken over this forum, buying a near-MSRP 580 (about 75 over on average in my country) or a Vega is still a challenge. Even if AMD made desktop graphics the third cousin to the consoles and Zen (honestly, it looks like it did), it couldn't have turned out much better for them.

    1. A CPU that's IPC-competitive with Intel
    2. A HEDT platform that's just better than Intel's
    3. If the above is in the ballpark, an APU that's going to be great for those 15-25W ultrabooks and 25-35W standard laptops
    4. A GPU that at least gets them into the enthusiast space
    5. A platform (CPU + GPU) that gives them a solid compute/datacentre product (Intel's going to Intel; POWER + G[V,P]100 is an expensive platform)

    Consider that currently AMD's R&D budget is less than NV's, let alone Intel's. I think there is some room for optimism going forward, and revenue and R&D should increase from where they have been over the last couple of years.

    edit: I just want to point out we don't really know if Vega is bandwidth bottlenecked*; increasing memory clock also reduces latency, and that could be just as big a factor, so tasty DDR4 could go hand in hand nicely.

    *It certainly isn't compared to Polaris, which gets very good perf gains from mem OC.
     
    #4114 itsmydamnation, Sep 15, 2017
    Last edited: Sep 15, 2017
  15. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Just trying to understand what it is you're actually saying. :)
     
  16. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    161
    Likes Received:
    179
    Yeah, I mean the cache. Vega has a surprisingly high amount of it, and most of it is still unaccounted for.
     
  17. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    161
    Likes Received:
    179
    Well, of course it would have less mem bandwidth per clock. It's half the bus width :-|
     
  18. Picao84

    Veteran

    Joined:
    Feb 15, 2010
    Messages:
    2,109
    Likes Received:
    1,195
    Well, although it has a surprising amount of it, the ROPs are now using it as well (although I have no idea if that means only reading or writing too), so it's not like the increase benefits just the units that were already using the L2 cache; the benefits have to be shared across more units.
     
  19. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    The 45 MB of SRAM does not encompass only the caches, but all SRAM cells on the GPU: caches, FIFO buffers, register files and the like.
     
    pharma, Heinrich4, Grall and 2 others like this.
  20. mrcorbo

    mrcorbo Foo Fighter
    Veteran

    Joined:
    Dec 8, 2004
    Messages:
    4,024
    Likes Received:
    2,851
    Sure. But if you're an enthusiast looking for a good value the purchasing experience around Vega is frustrating and that's NOT good for AMD and, even though they at least have a product offering now, they are still effectively absent from this market for many people. If I had confidence that there would be custom Vega 56s readily available @ $399 within a reasonable period of time I would have considered waiting to purchase one of those over the custom GTX 1080 I actually purchased @ $499.

    Clearly, there are people who will take any and all opportunities to present AMD's efforts in a negative light (and the reverse, of course), but this situation is not great for them, even if it does benefit them financially in the short term.
     
    Heinrich4 and Grall like this.