AMD Vega 10, Vega 11, Vega 12 and Vega 20 Rumors and Discussion

Discussion in 'Architecture and Products' started by ToTTenTranz, Sep 20, 2016.

  1. Anarchist4000

    Veteran Regular

    Joined:
    May 8, 2004
    Messages:
    1,439
    Likes Received:
    359
    Wouldn't be surprised if that's how they bin and create tiles.

    That 17 was only 11 a few months ago, so AMD is still finding ways to evaluate or cull more. My best guess is that the packed math and scalar units are a bit more versatile. Possibly backface culling at really low precision, as that should remove more than half.
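
    For readers unfamiliar with the idea, here is a minimal sketch of backface culling on a coarse grid. It is not AMD's implementation. The function names, the counter-clockwise front-face convention, and the grid step are all assumptions made for illustration.

```python
def snap(vertex, step):
    """Quantize a 2D screen-space vertex to a coarse grid (hypothetical low-precision stand-in)."""
    x, y = vertex
    return (round(x / step), round(y / step))


def signed_area_2x(v0, v1, v2):
    """Twice the signed area of a 2D triangle; positive when the winding is counter-clockwise."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def is_culled(v0, v1, v2, step=1.0):
    """Cull back-facing and zero-area triangles after snapping to the coarse grid.

    Assumption: counter-clockwise triangles are front-facing.
    """
    a, b, c = (snap(v, step) for v in (v0, v1, v2))
    return signed_area_2x(a, b, c) <= 0


front = ((0.1, 0.2), (4.3, 0.2), (0.1, 4.3))   # counter-clockwise
back = ((0.1, 0.2), (0.1, 4.3), (4.3, 0.2))    # same triangle, clockwise

print(is_culled(*front))  # False
print(is_culled(*back))   # True
```

    Note that a conservative implementation would have to treat triangles that collapse to zero area after snapping more carefully than this sketch does.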
     
    Rasterizer likes this.
  2. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    226
    Likes Received:
    97
    Rasterizer likes this.
  3. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    And that implies a hard limit of 8 in your eyes? Ok, so be it.

    I'd say we're looking at a different limitation here. I would think the R/W rate of the L2 cache partitions (not the aggregate!) might be limiting.
     
    #3683 CarstenS, Aug 19, 2017
    Last edited: Aug 19, 2017
  4. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    Well, going from the bandwidth figures and assuming one triangle per vertex and just X, Y, Z for the vertex (12 bytes)... say, a long non-indexed strip. A 1733 MHz chip with 4 triangles per clock will consume over 83 GB/s of bandwidth on input assembly alone. If it's indexed geometry (assuming 32-bit indices) that figure will double.
    That's without any drawing. I'm just pointing this out because once you get to these insanely high primitive rates some weird stuff will start popping out.
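
    The arithmetic can be checked with a short sketch. The clock, primitive rate and vertex size come from the post. The indexed case is an assumption: it only doubles if each triangle fetches three 32-bit indices (12 bytes) on top of one new 12-byte vertex, as in an indexed list. The function name is made up for illustration.

```python
def input_assembly_bandwidth(clock_hz, tris_per_clock, bytes_per_tri):
    """Bytes per second fetched by input assembly at the peak primitive rate."""
    return clock_hz * tris_per_clock * bytes_per_tri


CLOCK_HZ = 1_733_000_000   # 1733 MHz, from the post
TRIS_PER_CLOCK = 4         # from the post
VERTEX_BYTES = 12          # X, Y, Z as 32-bit floats
INDEX_BYTES = 4            # one 32-bit index

# Long non-indexed strip: one new vertex per triangle.
strip_bw = input_assembly_bandwidth(CLOCK_HZ, TRIS_PER_CLOCK, VERTEX_BYTES)

# Assumption: indexed list, three indices plus one new vertex per triangle.
indexed_bw = input_assembly_bandwidth(
    CLOCK_HZ, TRIS_PER_CLOCK, VERTEX_BYTES + 3 * INDEX_BYTES
)

print(f"non-indexed strip: {strip_bw / 1e9:.1f} GB/s")   # 83.2 GB/s
print(f"indexed list:      {indexed_bw / 1e9:.1f} GB/s")  # 166.4 GB/s
```

    With only one index per triangle (an indexed strip) the per-triangle cost would be 16 bytes rather than 24, so the increase would be smaller than 2x.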
     
    Rasterizer and silent_guy like this.
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Vertex shader is a hardware shader stage. So is a fragment shader stage. And so on...

    It's "hardware" because the GPU is cognisant of the type of shader and can use that, as well as the data associated with each thread for that shader type, as inputs into load-balancing. The hardware also knows how to connect a source of data for a stage with the stage itself and then how to connect the results from that stage with the next stage (or the buffer for that stage).

    When you look at how they actually work, all these types of shader are just code. You populate a buffer and/or some registers with the right data, you optionally put some other data into LDS and voila, you have all the data that a "hardware stage" shader requires.

    When a game developer writes a compute shader to do the same job as the primitive shader, they are responsible for setting up the data connections and working out how it should be load balanced.

    So the hardware aspect here is controlling how to start and feed the type of shader (vertex, fragment, primitive etc.) and what to do with the data it produces. The shader itself is just code.

    A primitive shader accepts data just like a vertex shader does. It outputs data just like a geometry shader does (if there is one defined by the programmer when setting up the pipeline, otherwise, just like a vertex shader does).

    The white paper also refers to a surface shader. A surface shader accepts data in the same way as a vertex shader and outputs data just like a hull shader does.

    Both of these are examples of a type of shader that already fits into the model of the graphics pipeline that the hardware has been designed to process. Vertices, patches and triangles are well defined already. So these new shader types are really just a re-configuration of the hardware, working with geometry-related data types that the GPU already knows how to handle.
     
    Rasterizer, Lightman, xpea and 5 others like this.
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    The case when you perform multiple frustum rendering, e.g. for VR:

    Single Pass Stereo

    should benefit greatly from "primitive shader" functionality.
     
    Rasterizer likes this.
  7. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    226
    Likes Received:
    97
    It looks like a hard limit. If you compare the values for the 1080 Ti and the 1080, they have nearly the same limits. So this seems to be the hardware limit of the Pascal architecture.

    @MDolenc
    AMD stated that they do culling before vertex data is written. So the bandwidth should not be that high?
     
    #3687 Digidi, Aug 19, 2017
    Last edited: Aug 19, 2017
    Rasterizer likes this.
  8. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,038
    Likes Received:
    3,111
    Location:
    Pennsylvania
    I thought it was already known that GP104 has a 6 triangle setup limit? Compared to AMD's 4.
     
  9. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    226
    Likes Received:
    97
    It's not about triangle setup. It's about culling, and AMD has the primitive shader. If it's activated, AMD can put out 17 polygons per clock because of fast culling.
     
    #3689 Digidi, Aug 19, 2017
    Last edited: Aug 19, 2017
    Rasterizer and digitalwanderer like this.
  10. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
  11. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,999
    Likes Received:
    4,571
  12. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,436
    Likes Received:
    264
    GP104 can rasterize 4 triangles. You're thinking of the bigger chip.

    Forget you ever heard 11 per clock. That number shouldn't have been published. It was referring to discard rate though.
     
    tinokun, Kej, Alexko and 4 others like this.
  13. Malo

    Malo Yak Mechanicum
    Legend Veteran Subscriber

    Joined:
    Feb 9, 2002
    Messages:
    7,038
    Likes Received:
    3,111
    Location:
    Pennsylvania
    Gotcha, thanks for the clarification.
     
    digitalwanderer likes this.
  14. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Where does your link say „draw“? Anandtech quotes quite clearly from what AMD meant: "Vega is designed to handle up to 11 polygons per clock with 4 geometry engines."
     
    Rasterizer likes this.
  15. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    226
    Likes Received:
    97
    Rasterizer likes this.
  16. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,999
    Likes Received:
    4,571
    In the slide that says "over 2x peak throughput per clock", which seems to be what @Ryan Smith is commenting on when talking about the 11 polygons per clock.
     
  17. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    So, we are still missing any possible reference to „draw“?

    Let me help you with your link, where Ryan says quite clearly where he got this information. Which is, btw, why he put it in quotation marks: he is not commenting, he is quoting.
    ->„And while AMD's presentation and comments itself don't go into detail on how they achieved this increase in throughput, buried in the footnote for AMD's slide deck is this nugget: "Vega is designed to handle up to 11 polygons per clock with 4 geometry engines."
    [my bold]
     
    pharma, Rasterizer and DavidGraham like this.
  18. ToTTenTranz

    Legend Veteran Subscriber

    Joined:
    Jul 7, 2008
    Messages:
    9,999
    Likes Received:
    4,571
    You're suggesting that Vega 10 at 1.5 GHz is discarding 16.5 billion triangles per second?
    At a generous 60 FPS, that's 275 million triangles per frame. Does discarding 275M triangles per frame even make any sense?
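
    The numbers can be reproduced with a quick sketch, assuming the 11 polygons per clock figure is sustained every cycle at 1.5 GHz. The variable names are illustrative only.

```python
CLOCK_HZ = 1_500_000_000   # 1.5 GHz, from the post
POLYS_PER_CLOCK = 11       # figure quoted from AMD's footnote
FPS = 60                   # frame rate assumed in the post

polys_per_second = CLOCK_HZ * POLYS_PER_CLOCK
polys_per_frame = polys_per_second // FPS

print(f"{polys_per_second / 1e9:.1f} billion polygons per second")  # 16.5
print(f"{polys_per_frame / 1e6:.0f} million polygons per frame")    # 275
```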
     
  19. Digidi

    Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    226
    Likes Received:
    97
    Rasterizer and ToTTenTranz like this.
  20. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    I am not suggesting anything, just going by the most recent information published by AMD and not interpreting anything into their marketing slides which is not in there.

    You are the one making the assertions, i.e. „draw“, even though you put question marks behind them.
     