AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Lightman, ToTTenTranz and Jawed like this.
  2. pharma

    Veteran Regular

    Joined:
    Mar 29, 2004
    Messages:
    2,928
    Likes Received:
    1,626
    Isn't it $700 vs $699?
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    I would reference the filing date for AMD's binning rasterizer patent, which is 2013. Vega's development may have been prolonged by internal issues, but even in other contexts you wouldn't expect something patented now to have sat on the shelf for the years the design was moving down the pipeline. Given the timing, we might have to wonder if or when it might show up. That it was filed and made public may mean AMD isn't worried about competitors seeing it too early (publication can be delayed significantly from filing if you care to).

    There are slides and die shots that don't point to anything significantly different for the LDS. Maybe it's somewhat bigger?

    Reference?

    128K of what, page table entries?
    GCN does support 4KB x86 page tables, but 128K entries at that granularity would not cover 16GB. Even going with the coarser 64KB PRT granularity, the additional context and history tracking would run over a MB.
    On top of that, CPU TLB hierarchies and the page-table hierarchy are backed by their caches. The HBCC may not have that option, or it might want to avoid leaning on the L2 given it isn't that expansive.
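    As a rough sanity check on the entry counts, here is a back-of-the-envelope sketch; the figures are plain arithmetic on the granularities discussed, not anything AMD has published:

    ```python
    # Back-of-the-envelope: how many page-table entries it takes to map 16 GB
    # at different page granularities. Illustrative only, not AMD-confirmed.

    GiB = 1024 ** 3
    KiB = 1024

    def entries_to_map(total_bytes, page_bytes):
        """Number of page-table entries needed to cover total_bytes."""
        return total_bytes // page_bytes

    vram = 16 * GiB
    fine = entries_to_map(vram, 4 * KiB)     # x86-style 4 KB pages
    coarse = entries_to_map(vram, 64 * KiB)  # PRT-style 64 KB granularity

    print(f"4 KB pages:  {fine:,} entries")   # 4,194,304
    print(f"64 KB pages: {coarse:,} entries") # 262,144
    ```

    So 128K entries at 4KB granularity maps only 512MB; even at 64KB granularity, 16GB needs 256K entries, before any residency or history tracking on top.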

    There are dribs and drabs in the previously mentioned slide deck. The event seemed more product- and bundle-oriented, with some other hype like having Linus Tech Tips amp things up. I think Linus or someone else dropped a Vega card, which at least confirms the architecture is not gravity-defying.
     
  4. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    Why do we have slides like this if DSBR is inactive?
    [image: AMD slide]
     
  5. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,966
    Likes Received:
    4,560
    Regular RX Vega is $500. Or $600 for anyone who thinks an aluminum shroud is somehow worth $100 more (though AMD may pull an Nvidia and only sell the limited editions during the first month or so).


    It's inactive in current Vega FE drivers. It'll be enabled on time for RX Vega release, which is what these slides are about.
     
    #3285 ToTTenTranz, Jul 31, 2017
    Last edited: Jul 31, 2017
    BRiT, BacBeyond and entity279 like this.
  6. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    Oh yes, oops. Apparently I can't read properly. Thanks.
     
  7. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    2,773
    Likes Received:
    2,560
    It's only inactive in Vega FE; it's active in AMD's final RX performance targets. AMD stated it's active in the 17.20 driver (page 43 note), and AMD tested all of their games using driver 17.30, the driver after it.
     
    Lightman and pharma like this.
  8. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,288
    Location:
    Helsinki, Finland
    Yeah. AMD's ROP caches (CB and DB) have historically been quite small. But Nvidia also doubled their L2 when they moved to a tiled rasterizer. And I would guess that AMD's 64-wide waves prefer bigger tiles than Nvidia's 32-wide warps (assuming you need to find 16 quads with the same shader in the same tile to fill the wave fully). I would guess we are talking about tile sizes of at least 128x128, thus a depth tile (32 bpp) >= 64 KB and a color tile (RGBA16F) >= 128 KB. But that 2 MB increase in L2 size should be enough for these purposes. The L1 ROP caches can remain as tiny as AMD's previous ROP caches.

    But how much storage does the binned geometry need? Let's pick some triangle count, for example 8192 triangles. Let's assume roughly 1 vertex shaded per triangle, and 4x float4 interpolants. That's 512 KB of vertex interpolants and 16 KB of vertex indices (assuming 16-bit indices). Is 8192 triangles enough? I would say no, if you want to do proper hidden surface removal; in that case you need much more. But what if they went one step further and separated position calculation from the rest of the vertex shader? That only needs float4 storage per vertex, regardless of interpolant count. In this case, you could also do HSR in the binning step, only emitting those triangles that cover pixels. This would allow you to run the full vertex shader only for triangles with at least one visible pixel. It would also have a similar effect to a "z-prepass" for pixel shader culling (as the binning step would generate the partial depth buffer for the tile). I did some experiments with a compute shader based pipeline like this. It could be really efficient if done at the hardware level.
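    For what it's worth, the storage estimates above can be reproduced with a few lines of arithmetic (every parameter here is the post's assumption, not a known Vega figure):

    ```python
    # Sketch of the tile/bin storage estimates discussed above.
    # Tile size, formats, and triangle count are assumptions, not Vega specs.

    KB = 1024

    # Per-tile buffers for an assumed 128x128 tile
    tile_px = 128 * 128
    depth_tile = tile_px * 4   # 32 bpp depth  -> 64 KB
    color_tile = tile_px * 8   # RGBA16F, 8 B/pixel -> 128 KB

    # Binned geometry for an 8192-triangle batch, ~1 shaded vertex per triangle
    tris = 8192
    interpolants = tris * 4 * 16  # 4x float4 (16 bytes each) per vertex -> 512 KB
    positions_only = tris * 16    # one float4 per vertex if positions are split out

    print(depth_tile // KB, color_tile // KB,
          interpolants // KB, positions_only // KB)  # 64 128 512 128
    ```

    The last figure shows why splitting position calculation out is attractive: the binning pass only has to buffer a quarter of the data, independent of how many interpolants the full vertex shader emits.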
     
    no-X and Lightman like this.
  9. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    Don't 64-wide waves also require more L1/L2 for the temporary data generated between graphics pipeline stages?
     
  10. sheepdogexpress

    Newcomer

    Joined:
    Mar 10, 2012
    Messages:
    86
    Likes Received:
    11
    The problem with this bundle is that to get the $100 savings you have to buy the most expensive AMD motherboard, since each board listed in the promo is the top-of-the-line model from Asus, Gigabyte and MSI. You also have to buy one of the more expensive 8-core CPUs (you can't buy a Ryzen 1700) to qualify.

    One could save more by putting together their own bundle and not picking these more expensive parts.
     
    xEx likes this.
  11. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    And the bundled FreeSync monitor apparently has a reputation as one of the worst FreeSync monitors in terms of flickering (to the point where Samsung's manual recommends not enabling it).
     
  12. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    Regular Vega is $500. Limited Edition is $600 with a bundle and $699 is the water cooled bundle.

    The bundles are 2 games + $100 off a Ryzen combo + $200 off a Samsung monitor. One of the sites said more options (at least for monitors) were coming soon.
     
  13. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    The most recent description from some of AMD's patents has screen space subdivided into some number of rectangles.
    When the rasterizer is deferring shading, a triangle's initial bin is determined and the triangle enters that bin's list of primitives. The process begins by querying how many tiles the triangle intercepts along one axis in screen space. The max and min intercepts are recorded, and the bin accumulates primitives until it is full or some other condition is hit.
    Then the list of additional intercepts is passed on to another query step that gets the max and min bin IDs along the other axis.
    After that reaches some closure condition, the hardware concludes its intercept determination and starts to evaluate coverage.
    Within a bin, pixel coverage is determined, and the primitives belonging to each pixel in a tile are recorded, potentially with some number of multiple IDs per pixel allowed in the presence of transparency. There's mention of a possible way of continuously updating a batch so that it can dynamically remove primitives while accumulating more, which may allow more coalescing by preventing culled IDs from counting against the maximum bin size, though it's unclear whether that is implemented (it would require some kind of indirection to the indices, perhaps?).
    The context associated with interpolation and resources associated with export buffers and pixel data may count towards batch closure conditions.

    Then there is the number of primitives per batch, which determines the size of the primitive ID for the buffer, needed for coverage and the order of shading.
    Then there's some additional context: whether a primitive is opaque, flags for the stage of processing a batch/bin is in, some form of depth information per pixel, and an output of coverage either by the scan converter or at that level of precision.
    AMD posits for the purpose of utilization at least double-buffering all of this, so multiples of some of the context are to be expected.

    8192 primitives per batch means 13 bits per primitive ID; the number of rows and columns can attach 4 additional fields as pipeline context that takes up storage even if not used in the data passed to pixel shader launch.
    The tile size is going to give the ID bits, transparency, depth of the closest occluder, and some number of IDs per pixel.
    In effect, there's an ID buffer of 128x128 pixels with at least 13 bits per pixel, without transparency.

    With the ID alone, that's 26KB just to express, for one tile, which primitive goes to each pixel, without double-buffering, transparency, or higher sample counts. Perhaps the depth for the tile can be shared between batches in a double-buffered setup?
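    A minimal sketch of that ID-buffer arithmetic (tile size, batch size, and IDs-per-pixel are the figures assumed in this discussion, not confirmed hardware parameters):

    ```python
    # Size of a per-tile primitive-ID buffer, per the assumptions above:
    # ID width is ceil(log2(primitives per batch)), one or more IDs per pixel.
    import math

    def id_buffer_bytes(tile_w, tile_h, prims_per_batch, ids_per_pixel=1):
        id_bits = math.ceil(math.log2(prims_per_batch))  # 8192 -> 13 bits
        total_bits = tile_w * tile_h * id_bits * ids_per_pixel
        return total_bits / 8

    print(id_buffer_bytes(128, 128, 8192) / 1024)  # 26.0 (KiB)
    # Double-buffering, transparency (multiple IDs per pixel), or MSAA
    # each multiply this figure accordingly.
    ```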
     
    Ethatron and Lightman like this.
  14. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,966
    Likes Received:
    4,560
    It's solved, and most probably solved by default in the RX Vega launch driver too, since it's a simple misdetection of the screen's horizontal frequency by the driver.
    There's no such thing as Samsung recommending that FreeSync be disabled, either.



    Nice try lurking reddit for FUD about defects in the bundle, though.
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    So texture sampling returns packed results to GPRs. This seems to imply that 8-bit and 16-bit texture sample operations return packed results as unsigned integers (four or two results packed into one register), and the shader then uses instructions to read the results out of the register and convert to fp32 at the point of use. If true, this should save register space compared with loading texturing results directly into VGPRs as fp32 values (ALU cycles are cheaper than VGPRs).
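    As an illustration of the packing idea (the layout here is hypothetical; the actual Vega packing format isn't public), four 8-bit UNORM results sharing one 32-bit register and being normalized to fp32 only at the point of use might look like:

    ```python
    # Hypothetical sketch: four 8-bit texture results in one 32-bit register,
    # unpacked and converted per use rather than stored as four fp32 VGPRs.

    def pack4_u8(a, b, c, d):
        """Pack four 8-bit channels into one 32-bit word (lane 0 in the low byte)."""
        return (a & 0xFF) | ((b & 0xFF) << 8) | ((c & 0xFF) << 16) | ((d & 0xFF) << 24)

    def unpack_lane(reg, lane):
        """Extract lane n and normalize to fp32, mimicking a UNORM fetch on use."""
        return ((reg >> (8 * lane)) & 0xFF) / 255.0

    reg = pack4_u8(0, 64, 128, 255)   # one register instead of four
    print([round(unpack_lane(reg, i), 3) for i in range(4)])  # [0.0, 0.251, 0.502, 1.0]
    ```

    The register saving is the point: one packed VGPR holds what would otherwise occupy four fp32 VGPRs, at the cost of a shift/mask/convert on each read.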

    The new addressing instructions might also reduce VGPR usage. Instead of holding source values for address calculation (which can't be discarded straight away) plus register allocations for intermediate values in a sequence of address computations, the new instructions would increase the chances of going from source values to an address without needing intermediates. Well, that's my theory.

    The 8-bit operations (sum of absolute differences) are ancient instructions; I'm not sure why they're being mentioned now.

    The "explanation" of the primitive shader is entirely unconvincing. Seems likely to be a white elephant. I wonder if this was built by AMD as the basis for a console chip at some later date. In a console it would be totally awesome, I presume.

    ---

    How much of Vega's (and Polaris's?) power problem is down to GlobalFoundries' process, versus what TSMC's would have delivered?
     
    Lightman and DavidGraham like this.
  16. silent_guy

    Veteran Subscriber

    Joined:
    Mar 7, 2006
    Messages:
    3,754
    Likes Received:
    1,379
    So all you need to do is to hack the video timings? Awesome! :)
     
  17. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,029
    Likes Received:
    3,101
    Location:
    Pennsylvania
    I guess the point is there's nothing wrong with the monitor and a driver update will fix it.

    This thing has an 80-100Hz FreeSync range? That's not much...
     
  18. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Possible additions so far - without actual sizes though:
    - HBCC buffers
    - ROP caches (4 KiB per RBE?)
    - Parameter caches
    - Constant caches
    - DSBR tile cache
    - Color cache per RBE (was 16 KiB in Hawaii)
    - Z-/depth cache per RBE (was 4 KiB in Hawaii)
     
    #3298 CarstenS, Jul 31, 2017
    Last edited: Jul 31, 2017
    AlBran, Kej, Lightman and 1 other person like this.
  19. Cat Merc

    Newcomer

    Joined:
    May 14, 2017
    Messages:
    124
    Likes Received:
    108
    The flickering happens when you enable the 48-100 mode.
     
  20. BacBeyond

    Newcomer

    Joined:
    Jun 29, 2017
    Messages:
    73
    Likes Received:
    43
    It has two ranges in two modes: one is 80-100 and the other is 48-100.

    Honestly, no idea why Samsung does the multiple modes when other vendors using the same panels don't, because yeah, 80-100 is very narrow. It would be interesting to see the technical reason they use two.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.