AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. eastmen

    Legend Subscriber

    Joined:
    Mar 17, 2008
    Messages:
    10,902
    Likes Received:
    2,078
    darn it! lol enjoy it
     
    digitalwanderer likes this.
  2. pTmdfx

    Regular Newcomer

    Joined:
    May 27, 2014
    Messages:
    280
    Likes Received:
    177
    How can it not be visible in the shader when it is a dedicated type with pixel-level synchronization semantics in the shader? E.g. Intel advertises their implementation with the critical section starting at the first access that touches the ROV resource.

    I'd be genuinely surprised if the rasterisers are made to repack wavefronts at pixel level to guarantee such order.
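
    (A minimal CPU-side sketch of just the ROV ordering contract under discussion, not of any vendor's hardware: fragments covering the same pixel run their critical section in primitive submission order, while different pixels may interleave freely. The reorder-by-primId map below stands in for whatever scoreboard the hardware actually uses.)

    ```cpp
    #include <algorithm>
    #include <cstdio>
    #include <map>
    #include <vector>

    struct Fragment {
        int primId;   // submission order of the generating primitive
        int pixel;    // flattened pixel coordinate
        float color;  // value the shader wants to blend
    };

    int main() {
        // Fragments as a rasterizer might emit them: out of primitive order,
        // which is exactly what the ROV must hide on a per-pixel basis.
        std::vector<Fragment> emitted = {
            {2, 5, 0.3f}, {0, 5, 0.9f}, {1, 7, 0.5f}, {1, 5, 0.6f}, {0, 7, 0.2f}};

        // Per-pixel reorder buffer (stand-in for the hardware scoreboard that
        // stalls a fragment until older overlapping fragments have retired).
        std::map<int, std::vector<Fragment>> perPixel;
        for (const Fragment& f : emitted) perPixel[f.pixel].push_back(f);

        std::map<int, float> framebuffer;
        for (auto& [pixel, frags] : perPixel) {
            std::sort(frags.begin(), frags.end(),
                      [](const Fragment& a, const Fragment& b) { return a.primId < b.primId; });
            for (const Fragment& f : frags) {
                // "Critical section": a programmable read-modify-write, e.g. an
                // order-dependent blend, running in primitive order per pixel.
                float dst = framebuffer.count(pixel) ? framebuffer[pixel] : 0.0f;
                framebuffer[pixel] = 0.5f * dst + 0.5f * f.color;
                std::printf("pixel %d: prim %d blended -> %.3f\n",
                            pixel, f.primId, framebuffer[pixel]);
            }
        }
        return 0;
    }
    ```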
     
    #3582 pTmdfx, Aug 10, 2017
    Last edited: Aug 10, 2017
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    There are a wave-level ID, an operand source, a message type, and specific counters added for a mode called Primitive Ordered Pixel Shading. Perhaps that is related?
     
    3dcgi likes this.
  4. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG
    So, is it confirmed that AMD failed to implement tile-based rendering successfully in Vega?
     
  5. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,293
    Location:
    Helsinki, Finland
    AMD's slides show up to a 30% reduction in memory bandwidth use when DSBR is active: https://goo.gl/images/3TdRmp. This clearly shows that their tile-based renderer and on-chip binning ("fetch once") are working properly.

    However, we don't know whether the occlusion culling ("shade once") is enabled and/or working. Per-pixel occlusion culling would directly reduce the number of pixel shader invocations, and would bring other gains (reduced ALU/TMU use) in addition to reduced memory bandwidth use. "Shade once" should also improve performance in games that aren't memory bandwidth bound. AMD slide: https://goo.gl/images/YcVKmr.
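
    (To make "fetch once" vs "shade once" concrete, here is a toy C++ sketch, with screen-space rectangles of constant depth standing in for triangles and all numbers invented. Binning means each tile's framebuffer data is touched once for all primitives in the bin; resolving depth across the whole bin before shading is the "shade once" step that cuts pixel shader invocations.)

    ```cpp
    #include <cstdio>
    #include <vector>

    constexpr int kScreen = 64, kTile = 16, kTilesPerRow = kScreen / kTile;

    // Primitive stand-in: a screen-space rect with constant depth (half-open).
    struct Prim { int x0, y0, x1, y1; float depth; };

    int main() {
        std::vector<Prim> prims = {{0, 0, 40, 40, 0.8f},   // far, large
                                   {8, 8, 32, 32, 0.2f}};  // near occluder on top

        // Pass 1: bin primitives by the tiles they overlap (the on-chip binner).
        std::vector<std::vector<int>> bins(kTilesPerRow * kTilesPerRow);
        for (int i = 0; i < (int)prims.size(); ++i)
            for (int ty = prims[i].y0 / kTile; ty < (prims[i].y1 + kTile - 1) / kTile; ++ty)
                for (int tx = prims[i].x0 / kTile; tx < (prims[i].x1 + kTile - 1) / kTile; ++tx)
                    bins[ty * kTilesPerRow + tx].push_back(i);

        long shadedImmediate = 0, shadedDeferred = 0;
        for (int t = 0; t < (int)bins.size(); ++t) {
            float z[kTile][kTile];
            int owner[kTile][kTile];
            for (int y = 0; y < kTile; ++y)
                for (int x = 0; x < kTile; ++x) { z[y][x] = 1.0f; owner[y][x] = -1; }

            int baseX = (t % kTilesPerRow) * kTile, baseY = (t / kTilesPerRow) * kTile;
            // Pass 2a: resolve visibility for the whole bin before shading.
            for (int i : bins[t])
                for (int y = 0; y < kTile; ++y)
                    for (int x = 0; x < kTile; ++x) {
                        int sx = baseX + x, sy = baseY + y;
                        bool covered = sx >= prims[i].x0 && sx < prims[i].x1 &&
                                       sy >= prims[i].y0 && sy < prims[i].y1;
                        // With this back-to-front submission even an early-Z
                        // immediate-mode GPU would shade every covered fragment.
                        if (covered) ++shadedImmediate;
                        if (covered && prims[i].depth < z[y][x]) {
                            z[y][x] = prims[i].depth;
                            owner[y][x] = i;
                        }
                    }
            // Pass 2b: shade each covered pixel once, for the survivor only.
            for (int y = 0; y < kTile; ++y)
                for (int x = 0; x < kTile; ++x)
                    if (owner[y][x] >= 0) ++shadedDeferred;
        }
        std::printf("pixel shader invocations: immediate=%ld, deferred=%ld\n",
                    shadedImmediate, shadedDeferred);
        return 0;
    }
    ```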
     
    Pixel, Lightman, w0lfram and 6 others like this.
  6. dogen

    Regular Newcomer

    Joined:
    Oct 27, 2014
    Messages:
    338
    Likes Received:
    260
    This slide was pretty recent. "Optionally performs deferred rendering step".

     
  7. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    243
    Likes Received:
    102
    Interesting, so bins and batches are flexible? Can Nvidia do this?
     
  8. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    Nvidia's tile size is adjusted based on the amount of pixel and vertex data that can fit into its on-die allocation.
    The Realworldtech test shows tile patterns changing as the vertex and pixel formats grow larger.
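
    (A back-of-the-envelope C++ sketch of that relationship; the byte budget below is invented for illustration, not NVIDIA's real figure. Fatter per-pixel formats must shrink the tile so the bin still fits on die, which is what the RWT triangle test makes visible.)

    ```cpp
    #include <cstdio>

    // Largest power-of-two square tile whose pixel data fits the budget.
    // A real binner also has to reserve part of the allocation for binned
    // vertex attributes, which shrinks the tile further as vertices get fatter.
    int tileEdge(int budgetBytes, int bytesPerPixel) {
        int edge = 1;
        while (2 * edge * 2 * edge * bytesPerPixel <= budgetBytes) edge *= 2;
        return edge;
    }

    int main() {
        const int kBudget = 256 * 1024;  // hypothetical on-die bytes for one bin
        for (int bpp : {4, 8, 16, 32}) { // e.g. RGBA8 up to several fat targets
            int e = tileEdge(kBudget, bpp);
            std::printf("%2d bytes/pixel -> %3d x %3d tile\n", bpp, e, e);
        }
        return 0;
    }
    ```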
     
    DavidGraham likes this.
  9. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    Any hints? Pweeeaaaasee? :razz:
     
  10. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108


    If I'm understanding Rys correctly, Primitive Shaders don't require developer input. They're applied automatically in the driver.
     
    digitalwanderer and CarstenS like this.
  11. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
  12. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,279
    Likes Received:
    3,527
    Which still raises the question: where is their effect on performance?
     
  13. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    I figure you will only see it in geometry-heavy scenarios. Contrary to popular belief, I don't think GCN is geometry-bottlenecked in most games.
     
    digitalwanderer likes this.
  14. w0lfram

    Newcomer

    Joined:
    Aug 7, 2017
    Messages:
    217
    Likes Received:
    38

    I imagine that is why a developer mentioned he loves how Vega handles open worlds. Even in games such as BF1, when you spin 180° to fire on someone, think about the geometry involved in doing that rapidly.

    Also, I think too many people are underestimating HBCC, and the ability to store microcode within Vega. Doesn't that mean RX Vega can load Vulkan or DX12.1 microcode into local stores, like the 1X claims to do?

    Or is my understanding of the matter off?
     
    digitalwanderer likes this.
  15. Kaotik

    Kaotik Drunk Member
    Legend

    Joined:
    Apr 16, 2003
    Messages:
    9,088
    Likes Received:
    2,955
    Location:
    Finland
    It was my understanding that NGG Fast Path culling would require the use of primitive shaders, too. If they're applied automatically, how can a developer make sure NGG Fast Path culling is used instead of the 'native' path?
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    Most current games would be trying to stay within the storage limits of the board. Vega has a decent to generous amount, which makes many games insensitive to any paging enhancement. Until more than just a few architectures can handle large allocations without choking, games will have to remain insensitive. Most of the HBCC's features and capabilities are wasted on gaming.

    I do not follow what you think is different about microcode loading. Microcode is used by everything. Some of the features that were added to cards over time, like HSA support, priority queues, HWS being enabled, and so on were the result of microcode updates. Sometimes, the chips that did not get those updates missed out because of limits to their microcode storage or handling. Vega changes its load method to use a path locked down by its platform security processor.
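
    (For illustration, the residency behaviour a paging scheme like HBCC implies can be modelled as an LRU page cache; a toy C++ sketch of the concept only, since AMD's actual page sizes and eviction policy aren't public at this level. Touching a non-resident page faults, evicts the coldest resident page, and maps the new one in from system memory.)

    ```cpp
    #include <cstdio>
    #include <list>
    #include <unordered_map>

    class PageCache {
        size_t capacity_;
        std::list<int> lru_;  // front = most recently used page id
        std::unordered_map<int, std::list<int>::iterator> where_;
    public:
        long faults = 0;
        explicit PageCache(size_t capacity) : capacity_(capacity) {}

        void touch(int page) {
            auto it = where_.find(page);
            if (it != where_.end()) {
                lru_.erase(it->second);        // resident: refresh LRU position
            } else {
                ++faults;                      // fault: page in from system RAM
                if (lru_.size() == capacity_) {
                    where_.erase(lru_.back()); // evict coldest resident page
                    lru_.pop_back();
                }
            }
            lru_.push_front(page);
            where_[page] = lru_.begin();
        }
    };

    int main() {
        PageCache vram(4);  // pretend local memory holds 4 pages of an 8-page set
        int accesses[] = {0, 1, 2, 3, 0, 1, 4, 0, 1, 5, 6, 7, 0};
        for (int p : accesses) vram.touch(p);
        std::printf("page faults: %ld of %zu accesses\n",
                    vram.faults, sizeof(accesses) / sizeof(accesses[0]));
        return 0;
    }
    ```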
     
    Entropy likes this.
  17. roybotnik

    Newcomer

    Joined:
    Jul 12, 2017
    Messages:
    18
    Likes Received:
    14
    I dunno man, according to Reddit we're going to need Raja and Lisa to say it on a live stream in front of a large audience because we're not REALLY REALLY sure he was talking about primitive shaders.
     
    Cat Merc likes this.
  18. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Could you ask him if that's DX11 and DX12, or DX12 only?
    @Rys if you could answer that would be appreciated as well.
     
  19. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Invoke the primitive shader through whatever API or mechanism AMD eventually provides. It seems the point of primitive shaders was to make binning and work distribution more flexible, so a programmer could write new, more efficient paths or fall back on the traditional pipeline. The actual fast paths likely go beyond the limits of the traditional pipeline. That doesn't mean they can't defer interpolation or automatically perform other optimizations. The key difference is that what used to be a driver optimization is becoming exposed to devs. That opens up a big chunk of the black box.
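
    (For the culling side, the conservative tests a compute-style front end could run before the fixed-function rasterizer ever sees a triangle are easy to sketch. The exact tests AMD's fast path runs aren't documented; these are the usual frustum, back-face, and zero-area checks, written here in C++ over clip-space positions.)

    ```cpp
    #include <cstdio>

    struct V4 { float x, y, z, w; };  // clip-space position

    // Trivial reject: all three vertices outside the SAME frustum plane
    // (conservative: anything that fails no single plane is kept).
    bool outsideOnePlane(const V4& a, const V4& b, const V4& c) {
        auto allOut = [&](auto f) { return f(a) && f(b) && f(c); };
        return allOut([](const V4& v) { return v.x < -v.w; }) ||
               allOut([](const V4& v) { return v.x >  v.w; }) ||
               allOut([](const V4& v) { return v.y < -v.w; }) ||
               allOut([](const V4& v) { return v.y >  v.w; }) ||
               allOut([](const V4& v) { return v.z < 0.0f; }) ||
               allOut([](const V4& v) { return v.z >  v.w; });
    }

    // Signed area in NDC; <= 0 means back-facing (CCW front) or zero-area.
    // Assumes w > 0 for all three vertices (i.e. run the frustum test first).
    bool backfacingOrDegenerate(const V4& a, const V4& b, const V4& c) {
        float ax = a.x / a.w, ay = a.y / a.w;
        float bx = b.x / b.w, by = b.y / b.w;
        float cx = c.x / c.w, cy = c.y / c.w;
        return (bx - ax) * (cy - ay) - (cx - ax) * (by - ay) <= 0.0f;
    }

    int main() {
        V4 tri[3] = {{-0.5f, -0.5f, 0.5f, 1.0f},
                     { 0.5f, -0.5f, 0.5f, 1.0f},
                     { 0.0f,  0.5f, 0.5f, 1.0f}};
        bool culled = outsideOnePlane(tri[0], tri[1], tri[2]) ||
                      backfacingOrDegenerate(tri[0], tri[1], tri[2]);
        std::printf("triangle %s\n", culled ? "culled early" : "sent to raster");
        return 0;
    }
    ```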

    With the recent open-world craze and modding, it will likely be used. Fallout 4, Skyrim, etc. can easily use all available memory as gamers load all sorts of model and texture packs, limited only by acceptable performance. HBCC is likely the key feature Bethesda was after, as it would aid old and new games alike: cases where the dev can't control the scene because a gamer plops objects everywhere, inevitably building some giant castle.

    Truly a wonder that a programmable replacement for the first stages can actually perform its most basic task and replace them.
     
  20. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,363
    Likes Received:
    3,944
    Location:
    Well within 3d
    The cited statement is literally saying it is not being exposed to devs. It was noted that GFX9 merged several internal setup stages, seemingly combining the stages that generate vertices that need position calculation and then the stages that processed and set them up. That sounds like it has to do with primitive shaders, or is what the marketing decided to call primitive shaders.
    The prior shader stages didn't look like they were exposed to developers, so AMD might not be driven to expose the new internal stages all that quickly. Some of the work that occurs hooks into parts of the pipeline that have been protected from developers so far.

    The stats I've seen for the gaming customer base may be out of date by now, but most systems they'd be selling to would be less well-endowed than a Vega RX. I do not think they are going to abandon those systems, and as such counting on a Vega-specific feature outside of coincidental use seems unwise.
    I would like to see an interview or some statement about Bethesda's pushing for a leading-edge feature like HBCC or similar tech, given their open-world engine isn't from this decade.


    I'm curious about the internal makeup of the primitive shader pipeline, such as whether the front end is running the position and attribute processing phases within the same shader invocation, or if it's like the triangle sieve compute shader + vertex shader customization that was done for the PS4.
    The idea of using a compute-oriented code section to calculate position information and do things like back-face culling and frustum checking ahead of feeding into the graphics front end seems to be held in common.
    Mark Cerny indicated that this was not always beneficial, since developers would need to do some preliminary testing to see if it made things better.
    Potentially, the overhead of two parallel invocations and the intermediate buffer was a source of additional overhead.

    If this isn't a two-workgroup solution, then AMD may have noted from analyzing the triangle sieve method that it could take that shader and the regular vertex shader, throw it all in the same bucket, and then try to use compile-time analysis to hoist the position and culling paths out of the original position+attribute loop.
    Perhaps a single shader removes overhead and makes it more likely to be universally beneficial than it was with the PS4. If not, then going by the Linux patches it may be that it's not optional like it was for the PS4.

    Analysis of the changes may explain what other trade-offs there are. Making a single shader out of two may save one portion of the overhead and possibly reuse results that would otherwise have to be recalculated. However, it could also be that while it's smaller than two shaders, it's still an individually more unwieldy shader in terms of occupancy and hardware hazards, and exposed to sources of undesirable complexity from both sides. The statement that it calculates positions and then moves to attributes may also point to a potentially longer serial component. From the sounds of things, this is an optimization of some of the programmable-to-fixed-function transition points, but the fixed-function element is necessarily still there, since the preliminary culling must be conservative.
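
    (To make the comparison concrete, a CPU-side C++ sketch of the two structures; computePosition, computeAttributes, and the cull test are hypothetical stand-ins, not AMD's or Sony's code. The sieve pays for position math twice across two dispatches plus an intermediate buffer; the merged form computes it once and keeps it live for the attribute phase, at the cost of one longer, fatter shader.)

    ```cpp
    #include <cstdio>
    #include <vector>

    struct Pos  { float x, y, z, w; };
    struct Attr { float data[8]; };

    Pos  computePosition(int v)   { return {float(v), 0.0f, 0.0f, 1.0f}; }  // stand-in
    Attr computeAttributes(int v) { (void)v; return {}; }                   // stand-in
    bool culled(const Pos& p)     { return (int(p.x) & 1) != 0; }           // stub test

    int main() {
        const int kVerts = 8;  // pretend each vertex is its own triangle

        // (1) Sieve style: dispatch A culls and writes a compacted index buffer;
        // dispatch B re-fetches the survivors and transforms them again, so
        // position math runs twice for every surviving vertex.
        std::vector<int> survivors;
        int sievePosRuns = 0;
        for (int v = 0; v < kVerts; ++v) {           // dispatch A
            ++sievePosRuns;
            if (!culled(computePosition(v))) survivors.push_back(v);
        }
        for (int v : survivors) {                    // dispatch B
            ++sievePosRuns;
            computePosition(v);
            computeAttributes(v);
        }

        // (2) Merged style: one invocation computes the position, culls, and
        // only then does attribute work, reusing the position still in registers.
        int mergedPosRuns = 0, mergedAttrRuns = 0;
        for (int v = 0; v < kVerts; ++v) {
            Pos p = computePosition(v);
            ++mergedPosRuns;
            if (culled(p)) continue;  // early out before any attribute math
            computeAttributes(v);
            ++mergedAttrRuns;
        }

        std::printf("sieve: %d position runs; merged: %d position runs, %d attribute runs\n",
                    sievePosRuns, mergedPosRuns, mergedAttrRuns);
        return 0;
    }
    ```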

    edit: missing word
     
    #3600 3dilettante, Aug 13, 2017
    Last edited: Aug 13, 2017