How does a modern GPU work?

Discussion in 'Beginners Zone' started by agent_x007, Dec 12, 2014.

  1. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    13
    Likes Received:
    0
    First of all: Hello everyone :)

    I know it's a silly (and quite complex) question, but I want to see if I understand the idea of 3D rendering, in the context of GPU architecture, correctly :)
    To put it simply, I want to join these:
    [image]

    With those:
    [image]
    [image]

    I know (more or less) how Deferred Rendering works (and since it's quite common in games these days, I want to base my description on it).

    So, here's how I think modern GPUs work:

    DX10 (as handled by GT200, aka the GTX 2xx series):
    We start at the VS (Vertex Shader), which works on points in space (delivered by the CPU over PCIe).
    The VS transforms them (move, rotate, etc.) any way we want, so they can be combined into vertices and primitives/objects.
    The Vertex Shader stage runs on the "Streaming Processors" (SPs) in GT200, together with the caches and VRAM that move data around.
    The next stage, the Geometry Shader (GS), does a similar thing to the VS, but on a larger scale (whole primitives/objects), and it can also create new vertices (the VS can't create new geometry).
    Like the VS, the GS is handled by the SPs.
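    A rough C++ sketch of the idea behind these two stages (made-up names and types, not any real API; the point is that the VS is strictly 1:1 per vertex, while the GS works per primitive and may emit new ones):

    [code]
    #include <array>
    #include <vector>

    struct Vec4 { float x, y, z, w; };
    struct Mat4 { float m[4][4]; };

    // Vertex shader: one vertex in, one transformed vertex out (strictly 1:1).
    Vec4 vertexShader(const Vec4& pos, const Mat4& mvp) {
        return Vec4{
            mvp.m[0][0]*pos.x + mvp.m[0][1]*pos.y + mvp.m[0][2]*pos.z + mvp.m[0][3]*pos.w,
            mvp.m[1][0]*pos.x + mvp.m[1][1]*pos.y + mvp.m[1][2]*pos.z + mvp.m[1][3]*pos.w,
            mvp.m[2][0]*pos.x + mvp.m[2][1]*pos.y + mvp.m[2][2]*pos.z + mvp.m[2][3]*pos.w,
            mvp.m[3][0]*pos.x + mvp.m[3][1]*pos.y + mvp.m[3][2]*pos.z + mvp.m[3][3]*pos.w
        };
    }

    // Geometry shader: one whole primitive in, zero or more primitives out,
    // so unlike the VS it can create (or drop) geometry.
    using Triangle = std::array<Vec4, 3>;
    std::vector<Triangle> geometryShader(const Triangle& tri) {
        std::vector<Triangle> out;
        out.push_back(tri);            // pass the input through...
        // out.push_back(someNewTri);  // ...and optionally emit extra primitives here
        return out;
    }
    [/code]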
    After we complete all the transformations, we rasterise the image (i.e. go from the 3D space of vertices to the 2D space of pixels: triangles in, pixels out) using the GPU's rasteriser.
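    For illustration, one common way to do this in software is a bounding-box scan with edge functions; the hardware rasteriser is fixed-function and massively parallel, but the per-pixel test is conceptually similar (sketch only, assumes counter-clockwise winding):

    [code]
    #include <algorithm>
    #include <cstdio>

    struct P2 { float x, y; };

    // Signed-area test: which side of edge a->b the point p lies on.
    float edge(const P2& a, const P2& b, const P2& p) {
        return (b.x - a.x) * (p.y - a.y) - (b.y - a.y) * (p.x - a.x);
    }

    // Triangle (already in 2D screen space) in, covered pixels out.
    void rasterize(const P2& v0, const P2& v1, const P2& v2, int w, int h) {
        int minX = std::max(0,     (int)std::min({v0.x, v1.x, v2.x}));
        int maxX = std::min(w - 1, (int)std::max({v0.x, v1.x, v2.x}));
        int minY = std::max(0,     (int)std::min({v0.y, v1.y, v2.y}));
        int maxY = std::min(h - 1, (int)std::max({v0.y, v1.y, v2.y}));
        for (int y = minY; y <= maxY; ++y)
            for (int x = minX; x <= maxX; ++x) {
                P2 p{x + 0.5f, y + 0.5f};   // sample at the pixel centre
                if (edge(v0, v1, p) >= 0 && edge(v1, v2, p) >= 0 && edge(v2, v0, p) >= 0)
                    std::printf("pixel (%d,%d) is covered\n", x, y);
            }
    }
    [/code]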
    Next, we have to "put wallpaper" on our new pixels to figure out what colour they should be (using the GPU's TMUs), and then we pass them to the Pixel Shader (PS), which can do interesting things with them and takes care of lighting the entire scene (the Deferred Rendering "thing"). The PS again runs on the SP units in the GPU.
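    As a software stand-in for what a TMU does when it "puts the wallpaper on" (a bilinear fetch sketch with hypothetical names; real TMUs also handle mipmaps, anisotropy, compressed formats and so on):

    [code]
    #include <algorithm>
    #include <cmath>
    #include <vector>

    struct Texel { float r, g, b, a; };

    // Fetch a w x h texture at normalised coordinates (u, v) with bilinear filtering.
    Texel sampleBilinear(const std::vector<Texel>& tex, int w, int h, float u, float v) {
        float x = u * w - 0.5f, y = v * h - 0.5f;
        int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
        float fx = x - x0, fy = y - y0;
        auto fetch = [&](int xi, int yi) {          // clamp-to-edge addressing
            xi = std::min(std::max(xi, 0), w - 1);
            yi = std::min(std::max(yi, 0), h - 1);
            return tex[yi * w + xi];
        };
        auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
        Texel t00 = fetch(x0, y0),     t10 = fetch(x0 + 1, y0);
        Texel t01 = fetch(x0, y0 + 1), t11 = fetch(x0 + 1, y0 + 1);
        Texel top{lerp(t00.r, t10.r, fx), lerp(t00.g, t10.g, fx), lerp(t00.b, t10.b, fx), lerp(t00.a, t10.a, fx)};
        Texel bot{lerp(t01.r, t11.r, fx), lerp(t01.g, t11.g, fx), lerp(t01.b, t11.b, fx), lerp(t01.a, t11.a, fx)};
        return Texel{lerp(top.r, bot.r, fy), lerp(top.g, bot.g, fy), lerp(top.b, bot.b, fy), lerp(top.a, bot.a, fy)};
    }
    [/code]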
    An important thing to note is that SPs operate in blocks, and SPs within the same SM can't do two different things (like PS and VS) at the same time.
    After all that, all that's left is to blend our pixels into something more useful than a bunch of numbers, so we feed them to the ROP unit(s), which give us an image frame as output.
    Once we have it, we send it through the RAMDAC(s) (the RAMDAC translates "a frame" into something the monitor can understand) to the monitor(s).
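    The classic blend a ROP can apply at that point looks roughly like this ("source over destination", i.e. SrcAlpha / InvSrcAlpha; just a sketch, and ROPs also handle depth/stencil updates and MSAA resolve, which are left out):

    [code]
    struct RGBA { float r, g, b, a; };

    // Blend a freshly shaded pixel (src) over what is already in the framebuffer (dst).
    RGBA blendSrcOver(const RGBA& src, const RGBA& dst) {
        float ia = 1.0f - src.a;
        return RGBA{ src.r * src.a + dst.r * ia,
                     src.g * src.a + dst.g * ia,
                     src.b * src.a + dst.b * ia,
                     src.a         + dst.a * ia };
    }
    [/code]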
    I didn't mention culling and clipping, since they take place at almost every stage (culling/clipping reduces the workload for that stage and the ones after it).

    In DX11, between the Vertex Shader and the Geometry Shader, we have a tessellation stage (which consists of the Hull Shader, the Tessellator unit and the Domain Shader). It can create enormous amounts of new geometry REALLY FAST (it's based on a fixed-function setup, like T&L in days past or fast video decoding today :) ).
    The Hull Shader takes care of controlling the other stages, the Tessellator... tessellates, and the Domain Shader combines the data gathered from the previous stages (including the Vertex Shader) and prepares it for the Geometry Shader.
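    A conceptual sketch of that division of labour for a single quad patch (the flat quad domain, the uniform integer factor and all the names are illustrative only; the real tessellator works in a parametric domain and supports fractional factors):

    [code]
    #include <vector>

    struct V3 { float x, y, z; };

    // The "hull shader" would choose `factor`, the fixed-function tessellator generates
    // the (u, v) domain points, and the "domain shader" turns each (u, v) into a real
    // vertex -- here just bilinear interpolation of the four patch corners.
    std::vector<V3> tessellateQuad(const V3 corner[4], int factor) {
        auto lerp = [](const V3& a, const V3& b, float t) {
            return V3{a.x + (b.x - a.x) * t, a.y + (b.y - a.y) * t, a.z + (b.z - a.z) * t};
        };
        auto domainShader = [&](float u, float v) {
            V3 top = lerp(corner[0], corner[1], u);
            V3 bot = lerp(corner[3], corner[2], u);
            return lerp(top, bot, v);
        };
        std::vector<V3> verts;                      // (factor + 1)^2 new vertices...
        for (int j = 0; j <= factor; ++j)
            for (int i = 0; i <= factor; ++i)
                verts.push_back(domainShader((float)i / factor, (float)j / factor));
        return verts;                               // ...from which 2 * factor^2 triangles are built
    }
    [/code]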
    VS, HS, DS, GS and PS are all handled by CUDA Cores (a newer marketing term, since SPs can't handle the HS and DS stages, i.e. GT200 is not capable of DX11).
    CUDA Cores (CCs) are present in Fermi, Kepler and Maxwell based cards.
    CCs have the same limitation the SPs had (they work in groups, and CCs in the same group can't handle different tasks).

    The other stages are pretty much the same as the DX10 ones (although they have more capabilities than the old versions).
    From the GPU perspective, it's worth noting that all DX10 GPUs have only one rasteriser (the "thing" that turns 3D into 2D), while DX11-based ones can have 4 or even 5 of them working in parallel.

    OK, that's it (I THINK I got it right).
    I don't need ALU- or register-level detail here - I just want to know if my thinking (and my understanding of it all) is correct.

    PS. One other thing:
    I know Immediate/Direct (i.e. forward) Rendering differs from Deferred Rendering in the lighting stage - in forward rendering lighting happens early, while each object is drawn (classically even in the Vertex Shader), whereas Deferred Rendering postpones it to a later full-screen pass.
    But are there any other things of this kind, or other differences, that make the GPU handle them differently?
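    Structurally, the difference between the two looks something like this (a skeleton only; in practice forward renderers usually light in the pixel shader rather than the vertex shader, and deferred renderers often use light volumes instead of plain full-screen passes):

    [code]
    #include <vector>

    struct Object {};
    struct Light {};

    // Forward rendering: lighting happens while each object is drawn.
    void forwardRender(const std::vector<Object>& objects, const std::vector<Light>& lights) {
        for (const Object& obj : objects) {
            // draw obj; its shader loops over (a subset of) `lights`
            // and writes a final lit colour straight to the framebuffer
            (void)obj; (void)lights;
        }
    }

    // Deferred rendering: a geometry pass writes a G-buffer, lighting comes later.
    void deferredRender(const std::vector<Object>& objects, const std::vector<Light>& lights) {
        for (const Object& obj : objects) {
            // geometry pass: write albedo / normal / depth into the G-buffer, no lighting yet
            (void)obj;
        }
        for (const Light& light : lights) {
            // lighting pass: read the G-buffer and accumulate this light's contribution
            (void)light;
        }
    }
    [/code]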

    Thanks for all replies.
     
  4. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,397
    Likes Received:
    1,865
    Here comes the noob question:
    If it creates 3 vertices, that's a triangle, and that's geometry, isn't it?
     
  5. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    425
    Location:
    Cleveland, OH
    The vertex shader has no concept of geometry to begin with. The input is a vertex and associated parameters and the output is a vertex and associated parameters, so it's 1:1. So the vertex shader doesn't really create vertexes either, it's more like it changes them.

    The vertexes could end up being used as part of one triangle, multiple triangles (a vertex shared as a corner of adjacent triangles), no triangles (if it was clipped or culled), or higher-level geometry that's converted into triangles later.
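    In other words, the whole contract of the stage has just this shape (a stub for illustration; a real shader would apply the model-view-projection transform instead of passing the position through):

    [code]
    struct VertexIn  { float position[3]; float uv[2]; };
    struct VertexOut { float clipPos[4];  float uv[2]; };

    // One vertex (plus its attributes) in, one vertex out. The shader never sees the other
    // corners of "its" triangle and cannot emit or delete vertices; connectivity only
    // appears later, at primitive assembly, via the index buffer.
    VertexOut vertexShader(const VertexIn& v) {
        VertexOut out{};
        out.clipPos[0] = v.position[0];
        out.clipPos[1] = v.position[1];
        out.clipPos[2] = v.position[2];
        out.clipPos[3] = 1.0f;
        out.uv[0] = v.uv[0];
        out.uv[1] = v.uv[1];
        return out;
    }
    [/code]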
     
    Simon F likes this.
  7. Exophase

    Veteran

    Joined:
    Mar 25, 2010
    Messages:
    2,406
    Likes Received:
    425
    Location:
    Cleveland, OH
    I stand corrected, thanks. This just goes to show how much the barriers in the traditional pipeline have been broken down (I'm sure it's even more so in DX12). Do you know if this capability has been added to any recent OpenGL version too?
     
  9. agent_x007

    Newcomer

    Joined:
    Dec 2, 2014
    Messages:
    13
    Likes Received:
    0
    If anyone is interested: Deferred Rendering explained using 3DMark11 - LINK (Google Translate from French).
     
  10. Infinisearch

    Veteran Regular

    Joined:
    Jul 22, 2004
    Messages:
    739
    Likes Received:
    139
    Location:
    USA
    You should make a separate thread for that link since it has more to do with a render technique than how a GPU works. If you want some presentations on render techniques I'll post them in that thread.
     
  11. sebbbi

    Veteran

    Joined:
    Nov 14, 2007
    Messages:
    2,924
    Likes Received:
    5,280
    Location:
    Helsinki, Finland
    A common misconception is that the vertex shader stage runs first and transforms all the vertices. This is not how it goes.

    Assuming indexed triangle rendering (the most common one):
    The primitive assembler reads 3 indices from the index buffer. The GPU checks the "post-transform vertex cache" (also known as the parameter cache) for each index, and runs the vertex shader for those vertices that were not in the cache. When all 3 vertices of a triangle have been transformed, the transformed vertex positions go to a fixed-function unit that first determines the rough coverage of the triangle (*) and then the fine (2x2 quad) coverage of the triangle (*). This results in a variable number of pixel shader instances (grouped into 2x2 quads). The GPU starts to execute these pixel shader instances immediately, while still continuing the primitive assembly and vertex shader work (and generating new pixel shader instances based on triangle coverage) at the same time. In the end the pixel shader output is tested against the triangle coverage (not all pixels in a 2x2 quad are inside the triangle), and against depth and stencil. Pixels that fail these tests are rejected.

    Both vertices and pixels are actually grouped into waves/warps before execution (32 elements on NVIDIA hardware, 64 elements on AMD hardware, 8/16/32 on Intel hardware). This adds a little bit of latency to the pipeline, but simplifies the GPU's execution a lot, since the later stages don't need to process/book-keep individual vertices/pixels.

    (*) In these stages the pixel shader instances might be culled by hierarchical depth buffering and by early depth/stencil test.
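    A sketch of the index-driven part of that description (the cache type, its unbounded size and running assembly to completion are simplifications; real hardware uses a small fixed-size on-chip cache and overlaps this with rasterisation, wave/warp formation and pixel shader launch):

    [code]
    #include <cstdint>
    #include <unordered_map>
    #include <vector>

    struct VertexOut { float clipPos[4]; };

    // Stand-in for the real vertex shader: fabricate a position from the index.
    VertexOut runVertexShader(uint32_t index) {
        return VertexOut{{(float)index, 0.0f, 0.0f, 1.0f}};
    }

    // Indexed primitive assembly with a post-transform (parameter) cache:
    // the vertex shader only runs on a cache miss, so shared vertices are reused.
    std::vector<VertexOut> assembleTriangles(const std::vector<uint32_t>& indexBuffer) {
        std::unordered_map<uint32_t, VertexOut> postTransformCache;
        std::vector<VertexOut> assembled;            // 3 entries per triangle
        for (size_t i = 0; i + 2 < indexBuffer.size(); i += 3) {
            for (int corner = 0; corner < 3; ++corner) {
                uint32_t idx = indexBuffer[i + corner];
                auto hit = postTransformCache.find(idx);
                if (hit == postTransformCache.end())
                    hit = postTransformCache.emplace(idx, runVertexShader(idx)).first;
                assembled.push_back(hit->second);
            }
            // in hardware, this triangle would now go to coverage determination and spawn
            // 2x2 pixel shader quads while later triangles are still being assembled
        }
        return assembled;
    }
    [/code]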
     
    Alexko and Grall like this.
  12. Grall

    Grall Invisible Member
    Legend

    Joined:
    Apr 14, 2002
    Messages:
    10,801
    Likes Received:
    2,170
    Location:
    La-la land
    Right! I was wondering why depth/stencil checks came after pixel shading work had already been kicked off! There's early-Z and such to cut down on the unnecessary workload.

    Anyway, why is it that the depth check (not counting the early-Z stuff) sits at pretty much the very end of the pipe? Is it just easier to build the GPU that way, or is it maybe a historical reason, i.e. a carry-over from the classical Silicon Graphics rendering pipe or some such ("it's always been that way")?

    Cheers! :D
     
  13. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,558
    Likes Received:
    149
    Location:
    In the Island of Sodor, where the steam trains lie
    That is the OpenGL (well, probably even the preceding IrisGL**) model from the SGI machines***. It permits using the texture to control the visibility of the polygon with a fixed pipeline processing order.

    Of course, it is a pain from a performance perspective, since most of the scene doesn't require alpha testing.


    **But I don't have time to go and check through the ancient documentation.
    ***Possibly other early workstation systems (e.g. Apollo, Sun, HP) as well, but someone else would need to check.
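    In pseudo-code, the reason the classical model has to defer the depth update until after shading (a schematic sketch, not any particular hardware's implementation):

    [code]
    #include <cstdint>

    struct Fragment { int x, y; float depth; float alpha; uint32_t shadedColor; };

    // With alpha test enabled, the texture read during shading decides whether the
    // fragment exists at all, so depth and colour can only be written afterwards.
    void writeFragment(const Fragment& f, float* depthBuffer, uint32_t* colorBuffer,
                       int width, float alphaRef /* e.g. 0.5f */) {
        if (f.alpha < alphaRef)                 // alpha test: fragment discarded...
            return;                             // ...so neither depth nor colour is touched
        float& z = depthBuffer[f.y * width + f.x];
        if (f.depth < z) {                      // the "late" depth test (LESS)
            z = f.depth;
            colorBuffer[f.y * width + f.x] = f.shadedColor;
        }
    }
    [/code]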
     
    #13 Simon F, Aug 25, 2015
    Last edited: Aug 25, 2015
    Grall likes this.
  14. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,275
    Likes Received:
    118
    Location:
    On the pursuit of happiness
    It might, though, and it is roughly what happens on tile-based renderers (where geometry work from one render pass overlaps raster work from the previous render pass).
     
