Modern and Future Geometry Rasterizer layout? *spawn*

Discussion in 'Architecture and Products' started by Digidi, Aug 19, 2020.

  1. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    309
    Likes Received:
    152
    I don't think this is entirely correct. The rasterizer's limit is how many polygons it can take in. It doesn't matter how many pixels it can make out of one polygon. The question is how many polygons you can turn into pixels.

    AMD stated for GCN that the worst case is a polygon that is no bigger than a pixel. In that case the old GCN rasterizer could put out only one pixel per polygon. If you now have a rasterizer which can take two polygons, you can get 2 pixels out per clock. The more polygons a rasterizer can take, the more pixels you get.
    Page 60:
    https://de.slideshare.net/DevCentralAMD/gs4106-the-amd-gcn-architecture-a-crash-course-by-layla-mah
     
    iamw likes this.
  2. Qesa

    Newcomer

    Joined:
    Feb 23, 2020
    Messages:
    16
    Likes Received:
    10
    As far as I'm aware, both are limits. AFAIK both AMD's and NVIDIA's rasterizers can take at most one triangle/clock and output at most 16 pixels/clock. So large triangles will take more than one cycle to complete, and small triangles will cause lower than peak pixel throughput.
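    To make the two limits concrete, here is a minimal sketch of a single rasterizer, assuming exactly the figures quoted above (1 triangle/clock in, 16 pixels/clock out). The function names are made up for illustration, and the model ignores tile alignment and everything else real hardware does.

```python
import math

MAX_PIXELS_PER_CLOCK = 16  # assumed output limit, taken from the post above

def raster_cycles(pixels_covered):
    """Cycles one rasterizer spends on a triangle covering `pixels_covered` pixels."""
    # Assumed input limit of one triangle per clock: even a triangle that
    # covers a single pixel (or none) occupies the rasterizer for a full clock.
    return max(1, math.ceil(pixels_covered / MAX_PIXELS_PER_CLOCK))

def pixels_per_clock(triangle_sizes):
    """Average pixel output per clock for a stream of triangles."""
    total_pixels = sum(triangle_sizes)
    total_cycles = sum(raster_cycles(p) for p in triangle_sizes)
    return total_pixels / total_cycles

print(pixels_per_clock([1] * 100))   # pixel-sized triangles: triangle-rate bound
print(pixels_per_clock([16] * 100))  # triangles that exactly fill the output width
print(pixels_per_clock([160]))       # one large triangle: pixel-rate bound, takes 10 clocks
```

    In this toy model a stream of pixel-sized triangles tops out at 1 pixel/clock, while anything covering 16 pixels or more is limited by the 16 pixels/clock output instead.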
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,469
    Likes Received:
    4,398
    Location:
    Well within 3d
    The situation where two backfaced primitives are culled indicates that the hardware can discard up to two primitives per cycle. At some point, every primitive that reaches the fixed-function hardware must be rejected or used to produce a set of threads of some kind corresponding to the pixels it covers.

    The old implementation would spend one cycle per primitive regardless of whether it would produce any pixels for rendering. If the hardware can cull two triangles per cycle, a stream of triangles that need to be discarded can be discarded in half the time.
    Depending on how many triangles are encountered that can be culled, this can speed up overall processing since there can be fewer cycles spent on triangles that do not produce output pixels. However, if the stream has mostly non-culled triangles, the throughput would be similar to before.
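    As a rough illustration of that argument, the sketch below counts front-end cycles for a stream of triangles under an idealized assumption: culled triangles retire at a fixed rate per clock and drawn triangles at one per clock, with no interaction between the two. The function names and the rates are assumptions for illustration, not a description of the actual hardware scheduling.

```python
import math

def frontend_cycles(num_culled, num_drawn, cull_rate, draw_rate=1):
    """Idealized cycle count: culled and drawn triangles are counted separately."""
    return math.ceil(num_culled / cull_rate) + math.ceil(num_drawn / draw_rate)

def speedup(num_culled, num_drawn):
    """Speedup of a 2-culls-per-clock front end over a 1-per-clock front end."""
    old = frontend_cycles(num_culled, num_drawn, cull_rate=1)
    new = frontend_cycles(num_culled, num_drawn, cull_rate=2)
    if new == 0:
        return 1.0  # empty stream, nothing to speed up
    return old / new

print(speedup(1000, 0))    # everything culled
print(speedup(500, 500))   # half culled
print(speedup(0, 1000))    # nothing culled
```

    Under these assumptions an all-culled stream finishes in half the time, a half-culled stream gains about a third, and a stream with no culled triangles gains nothing.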
     
  4. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,156
    Likes Received:
    2,663
    Location:
    Germany
    Yes, I tried to condense it down a bit. Micropolygons are dreaded by rasterizers, so people came up with compute shader solutions for this, like AMD's GeometryFX. They use compute shaders (CS) to cull microgeometry to lessen the burden on rasterizers and geometry engines: https://gpuopen.com/geometryfx/

    Also a good read on primitives: https://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf
     
    Digidi likes this.
  5. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    309
    Likes Received:
    152
    Thank you, 3dilettante and CarstenS, for your explanations. Nice pages you found, CarstenS, thank you for those.

    My point is this: we talk all about culling, but maybe culling is not the only issue. If you have fine meshes, you still have a lot of polygons to rasterize even after culling. I think small polygons are used more and more these days to create fine contours on surfaces and on people's faces.

    In that case only 1 pixel is created per polygon, which leaves the rest of the pipeline (shaders, ROPs) really empty. The pipeline only works really well when the rasterizer puts out 16 pixels for 1 polygon; then you have enough pixels to fill all the shaders and ROPs.

    So in programs with fine meshes, where each polygon is no bigger than a pixel, you will speed things up dramatically if a single rasterizer can take two polygons per clock and route the resulting pixels to the shaders and ROPs. You would directly double your performance in this worst case.
     
  6. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,372
    Likes Received:
    3,754
    As I understand this, and please correct me if I am wrong, NVIDIA can rasterize and cull a max of 6 primitives per cycle, as they have a distributed tessellation (PolyMorph) engine that can deal effectively with this. AMD, on the other hand, can cull 4 primitives but only rasterize about 2.
     
    #6 DavidGraham, Aug 19, 2020
    Last edited: Aug 19, 2020
  7. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,042
    Likes Received:
    441
    Yes.
    No, it's (8)4 for Navi iirc.
     
  8. DavidGraham

    Veteran

    Joined:
    Dec 22, 2009
    Messages:
    3,372
    Likes Received:
    3,754
    Navi is still a max of 4, that I am sure of, the question is, 4 culled and 4 rasterized or just 4 culled?
     
  9. Bondrewd

    Veteran Newcomer

    Joined:
    Sep 16, 2017
    Messages:
    1,042
    Likes Received:
    441
    No.
    Pretty sure it's distributed geometry with 4 culled 4 drawn tris o'clock.
     
  10. jlippo

    Veteran Regular

    Joined:
    Oct 7, 2004
    Messages:
    1,513
    Likes Received:
    699
    Location:
    Finland
    Adding to small triangles are bad.
     
    milk, Lightman, Digidi and 3 others like this.
  11. chris1515

    Legend Regular

    Joined:
    Jul 24, 2005
    Messages:
    5,120
    Likes Received:
    4,385
    Location:
    Barcelona Spain
    This is the reason Epic uses a compute rasterizer for Nanite, and only bigger triangles use the hardware rasterizer.
     
    Krteq, Dictator and jlippo like this.
  12. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,156
    Likes Received:
    2,663
    Location:
    Germany
    I can see more than 2 per clock, so that cannot be the hard limit. Probably they are running into some other bottleneck that keeps them from reaching clearly more than 3, or even approaching four.
    edit: Just re-read the whitepaper. AMD says explicitly that each primitive unit can cull 2 triangles per clock and draw 1 (output to the rasterizer). Each of the four rasterizers can process 1 triangle per clock, test for coverage and emit 16 pixels per clock. I haven't seen culled triangle rates much above 8 GTri/s though; maybe the prim units are not fed quickly enough or the test runs into another bottleneck.
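    For reference, the per-unit figures above can be turned into theoretical chip-wide peaks with a few multiplications. This is only a sketch: the count of four primitive units is an assumption (the post only states four rasterizers), the clock is a placeholder parameter rather than a measured value, and `peak_rates` is a made-up helper name.

```python
def peak_rates(num_prim_units=4, num_rasterizers=4, clock_ghz=1.0):
    """Theoretical peaks from the per-unit rates quoted in the post.

    num_prim_units=4 is an assumption; clock_ghz is a placeholder.
    """
    culled_per_clock = num_prim_units * 2          # 2 culled triangles per prim unit
    drawn_per_clock = min(num_prim_units * 1,      # 1 drawn triangle per prim unit
                          num_rasterizers * 1)     # 1 triangle per rasterizer
    pixels_per_clock = num_rasterizers * 16        # 16 pixels per rasterizer
    return {
        "culled_per_clock": culled_per_clock,
        "drawn_per_clock": drawn_per_clock,
        "pixels_per_clock": pixels_per_clock,
        "culled_gtri_per_s": culled_per_clock * clock_ghz,
        "drawn_gtri_per_s": drawn_per_clock * clock_ghz,
    }

for clock in (1.0, 1.5, 2.0):  # placeholder clocks in GHz
    print(clock, peak_rates(clock_ghz=clock))
```

    With these assumed unit counts, any clock above 1 GHz puts the theoretical cull rate above 8 GTri/s, so a measured rate of roughly 8 GTri/s would fit the suspicion that something other than the cull hardware is the limit.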
     
    #12 CarstenS, Aug 19, 2020
    Last edited: Aug 20, 2020
    Kej, pharma, Digidi and 2 others like this.
  13. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    309
    Likes Received:
    152
    One PolyMorph Engine at NVIDIA can handle 0.5 polygons/clock, which leads to a maximum of 15 polygons/clock for Pascal. But there is maybe a cache limit, which results in 11 polygons/clock. So 11 polygons can be culled and only 6 polygons can be rasterized at NVIDIA. Because NVIDIA has 2 more rasterizers, maybe this is the reason why NVIDIA gets more performance than AMD? In the worst-case scenario NVIDIA can always put out 2 more polygons than AMD.

    You can read my conversation about it here, in German, with good information from other people like Pixeljetstream.

    https://www.forum-3dcenter.org/vbulletin/showthread.php?p=11466705&highlight=0.5#post11466705
     
    pharma likes this.
  14. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    409
    Likes Received:
    227
    PCGH used to run some theoretical tests which included geometry. Do you know why they stopped?
     
  15. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,156
    Likes Received:
    2,663
    Location:
    Germany
    No, I left PCGH over 2 years ago.
     
  16. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    409
    Likes Received:
    227
    Oh, I wasn't aware you left.
     
  17. Digidi

    Regular Newcomer

    Joined:
    Sep 1, 2015
    Messages:
    309
    Likes Received:
    152
    @CarstenS Maybe for Ampere and RDNA2 you can add the values as an additional link to the c't article you will write? That would be really great <3:D
     
  18. Ext3h

    Regular Newcomer

    Joined:
    Sep 4, 2015
    Messages:
    411
    Likes Received:
    457
    It doesn't work as easily as that. You don't get to mix multiple polygons in a single wavefront, due to a usually significant data dependency on per-triangle uniform vertex attributes, which are handled in the scalar data path. In order to mix like that, you would need to accept a 16x load amplification on the rasterizer output bandwidth, as you would have to drop the scalar path and the compacted inputs for a fully vectorised one. There is no cost-effective way to afford that amplification with hardware rasterization while geometry engines are kept centralized.

    EDIT: Maybe we could actually see this in a future architecture: "lone" pixels being caught in a bucket and then dispatched in a batch to a specialized, scalar-free variant of the fragment shader program. But that would still require a decentralised geometry engine to better cope with the increased bandwidth requirements, and a higher geometry throughput.
     
    #18 Ext3h, Aug 20, 2020
    Last edited: Aug 20, 2020
  19. CarstenS

    Legend Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,156
    Likes Received:
    2,663
    Location:
    Germany
    Yet there seems to be some merit in doing exactly that (centralizing geo engines), since AMD has done so with the move to RDNA. At least that's what I've gathered from the whitepaper.
     
  20. techuse

    Regular Newcomer

    Joined:
    Feb 19, 2013
    Messages:
    409
    Likes Received:
    227
    Why can't GPUs be designed to not shade in quads so that micro polygons don't destroy efficiency?
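    For context on why quads hurt here: pixel shaders are generally launched in 2x2 pixel quads so that screen-space derivatives can be computed, and lanes for pixels the triangle does not cover still occupy the hardware. The sketch below estimates the fraction of launched lanes that do useful work, assuming quads are aligned to even pixel coordinates; `quad_efficiency` is a made-up helper for illustration, not a model of any specific GPU.

```python
def quad_efficiency(covered_pixels):
    """Fraction of shaded lanes that are actually covered by the triangle.

    covered_pixels: iterable of (x, y) integer pixel coordinates.
    Assumes 2x2 quads aligned to even coordinates.
    """
    covered = set(covered_pixels)
    if not covered:
        return 0.0
    # Every quad touched by at least one covered pixel launches all 4 lanes.
    quads = {(x // 2, y // 2) for x, y in covered}
    return len(covered) / (4 * len(quads))

print(quad_efficiency([(0, 0)]))                          # pixel-sized triangle
print(quad_efficiency([(1, 1), (2, 1)]))                  # 2 pixels straddling two quads
print(quad_efficiency([(0, 0), (1, 0), (0, 1), (1, 1)]))  # one fully covered quad
```

    Under this assumption a pixel-sized triangle uses only a quarter of the lanes it launches, which is the efficiency loss the question refers to.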
     
    milk likes this.
