In a cycle...

Discussion in 'Architecture and Products' started by OpenGL guy, Jul 19, 2006.

  1. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    ...a graphics chip fetches a number of vertices while n vertex shaders execute m instructions while the setup engine sets up l triangles while the k scan converters prepare j quads while the rasterizer interpolates i interpolants while h pixel shaders execute g instructions while the f texture units texture e pixels while the depth buffer tests d pixels while the blend unit blends c pixels. And while all this is happening, the memory controller is fetching and writing data for multiple clients.

    Typical values for a high end chip:
    n = 8
    m = 1
    l = 1
    k = 4
    j = 1
    i = 8
    h = 16
    g = 3
    f = 4
    e = 4
    d = 16
    c = 16

    Of course, this is not an exhaustive list, as it doesn't account for early Z check, any sort of compression, alpha test, or anti-aliasing. Also, I could go into a lot more detail on certain steps, notably texturing, which includes filtering, anisotropy, and more. Setup also has to handle things like clipping and culling, so more steps are involved there as well. Fog may also be handled by the HW unless it's being done in the pixel shader.

    Now consider how many of these steps happen at full FP32 precision and you can see why graphics chips are floating point monsters.
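
    As a rough tally of the figures above, here is a minimal Python sketch. It assumes each paired figure means "units, each doing that much work" (n shaders each executing m instructions, and so on), and it leaves i as the quoted 8 interpolants without scaling it. The stage names in the dictionary are made up for illustration.

```python
# Typical per-clock values quoted in the post above.
units = {"n": 8, "m": 1, "l": 1, "k": 4, "j": 1, "i": 8,
         "h": 16, "g": 3, "f": 4, "e": 4, "d": 16, "c": 16}

def per_clock_work(u):
    """Tally work per clock, assuming paired figures multiply (units x work per unit)."""
    return {
        "vertex_shader_instructions": u["n"] * u["m"],
        "triangles_set_up": u["l"],
        "quads_prepared": u["k"] * u["j"],
        "interpolants": u["i"],  # taken as quoted, not scaled
        "pixel_shader_instructions": u["h"] * u["g"],
        "pixels_textured": u["f"] * u["e"],
        "depth_tests": u["d"],
        "blends": u["c"],
    }

work = per_clock_work(units)
for stage, count in work.items():
    print(f"{stage}: {count}")
```

    Under that reading, the pixel shaders alone account for 48 instructions per clock, before counting any of the fixed-function stages.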

    Hope this helps someone. :D
     
  2. rwolf

    rwolf Rock Star
    Regular

    Joined:
    Oct 25, 2002
    Messages:
    968
    Likes Received:
    54
    Location:
    Canada
    How about AA and AF?
     
  3. hoom

    Veteran

    Joined:
    Sep 23, 2003
    Messages:
    3,139
    Likes Received:
    693
    sorry but someone had to ask:
    How about some values for an unannounced upcoming high end chip :?:
     
  4. Me

    Me
    Newcomer

    Joined:
    Mar 6, 2003
    Messages:
    31
    Likes Received:
    0
    Location:
    Calgary, Canada
    4 texture units?

    There are only 4 texture units? I seem to remember R300 had 1 per pipe for 8 total. Has the number actually gone down since then?
     
  5. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,331
    Likes Received:
    158
    Location:
    On the path to wisdom
    To avoid confusion, add an "each" for all the figures that come in pairs (nm, kj, hg, fe). It's four quad texture units.

    Is i meant to be "quad interpolants", i.e. two per quad per clock?
     
  6. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    It might be related to rough average rates based on anisotropic filtering, if it's not a typo :)
     
  7. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    I wonder if the implication here is "and because they are such floating-point monsters, they'd be able to do physics just fine." But I'm goofy.
     
  8. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    No, I think the implication is related to the "bilinear filtering on CPU" thread, that is, GPUs blow away CPUs at this workload and always will. I think Mint's "number of instrs and bytes fetched per pixel" calculations were probably a little clearer to the average person. But Mint was just accounting for PS/TEX; OGL Guy is trying to show how much FP power is being gobbled up even by fixed-function HW in the pipeline.
     
  9. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    I think it'd be better (and simpler) to think of things in terms of "quads". R300 had 2 quad pipes, each with its own texture unit. Obviously, it's a beefy texture unit as it has to service 4 pixels at a time.
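
    The quad arithmetic can be sketched in a few lines of Python. This assumes each quad texture unit services one quad (a 2x2 block, so 4 pixels) per clock; the function name is hypothetical.

```python
PIXELS_PER_QUAD = 4  # a quad is a 2x2 block of pixels

def pixels_textured_per_clock(quad_texture_units):
    """Pixels serviced per clock, assuming one quad per unit per clock."""
    return quad_texture_units * PIXELS_PER_QUAD

r300 = pixels_textured_per_clock(2)      # 2 quad pipes, one texture unit each
high_end = pixels_textured_per_clock(4)  # f = 4 quad texture units

print(r300, high_end)
```

    Counted this way, R300's 2 quad units cover the "8 total" remembered earlier in the thread, and 4 quad units cover 16 pixels, so the number has gone up rather than down.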
     
  10. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Yeah, that's how I meant it. I could have been a bit more clear, but it was late :)
    I meant 8 interpolants per quad.
     
  11. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Precisely.
     
  12. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    And in the near future, all this state has to be dumped out to memory... frequently. It is scary :shock:
     
  13. Npl

    Npl
    Veteran

    Joined:
    Dec 19, 2004
    Messages:
    1,905
    Likes Received:
    7
    Not necessarily. You could allow the GPU to enter a "switch state" that needs a lot less information backed up. For example, let it finish all the triangles/vertices it has started; in that case only the render states and display list would need to be saved, and no internals.

    What I wanted to know, though: do GPUs (or any other device on the bus, for that matter) actually know about the CPU's page table, or do they just see physical memory?
    In the latter case it should be easier to switch tasks, since the GPU could finish workloads of the old task even while the CPU is already running the new one...
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    Have you read the DX10 spec?
     
  15. Demirug

    Veteran

    Joined:
    Dec 8, 2002
    Messages:
    1,326
    Likes Received:
    69
    Maybe I am misunderstanding you. Do you talk about render context switching?
     
  16. Npl

    Npl
    Veteran

    Joined:
    Dec 19, 2004
    Messages:
    1,905
    Likes Received:
    7
    Nope, you are speaking about the GPU-Context switches, so what requirement am I missing?
    I don't think DX10 requires 0-cycle switches; I was just pointing out an example that would allow switching tasks at a coarser granularity. It's similar to task switches on CPUs, where you DON'T store full caches and the pipeline state, but simply write back dirty cache lines and wait until the pipeline is empty. Similarly, you could finish tasks in the GPU (be it the current pixel, the current vertex, finishing all fetches... whatever) to avoid having to dump your whole state.
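
    A toy Python model of this "drain, then switch" idea, purely as a sketch: the `ToyGpu` class, its fields, and the assumption that all started work items run in parallel are invented for illustration and do not describe any real GPU or the DX10 spec.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class ToyGpu:
    render_state: dict                               # coarse state (render states, display list)
    in_flight: deque = field(default_factory=deque)  # remaining cycles per started work item

    def drain(self):
        """Let started work finish; items are assumed to run in parallel."""
        waited = max(self.in_flight, default=0)
        self.in_flight.clear()
        return waited

    def switch_out(self):
        """Drain the pipeline, then save only the coarse state, no internals."""
        waited = self.drain()
        saved = dict(self.render_state)
        return saved, waited

gpu = ToyGpu({"blend": "add"}, deque([3, 10, 7]))
saved, waited = gpu.switch_out()
print(saved, waited)
```

    The saved context stays small, but the wait is set by the slowest item still in the pipeline.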

    If I'm totally wrong, then I'm sorry, but I don't know where you're going with this, and I'm not going to read up on the DX10 specs, at least not unless you point me to the stuff in question :wink:
     
  17. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    That would violate the DX10 spec. Say you're running a complex pixel shader that takes a million cycles per pixel... Do you really want your context switch to wait until all pixels are shaded?
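
    To put a rough number on that wait, here is a back-of-the-envelope Python sketch. The 16 pixel pipes match h from the first post; the 500 MHz clock and the 1024-pixel workload are assumptions picked only for illustration, and the helper names are hypothetical.

```python
import math

def drain_wait_cycles(pixels_outstanding, cycles_per_pixel, pixel_pipes):
    """Cycles until all outstanding pixels finish, with pixel_pipes pixels shaded at once."""
    batches = math.ceil(pixels_outstanding / pixel_pipes)
    return batches * cycles_per_pixel

def cycles_to_ms(cycles, clock_hz):
    """Convert a cycle count to milliseconds at the given clock rate."""
    return cycles / clock_hz * 1e3

CLOCK_HZ = 500e6  # assumed clock, not a figure from the thread

one_batch = drain_wait_cycles(16, 1_000_000, 16)     # only the pixels already in the pipes
big_draw = drain_wait_cycles(1024, 1_000_000, 16)    # a hypothetical 1024-pixel workload

print(cycles_to_ms(one_batch, CLOCK_HZ), "ms")
print(cycles_to_ms(big_draw, CLOCK_HZ), "ms")
```

    Even finishing only the pixels already in flight costs about 2 ms at that clock; waiting for the whole 1024-pixel workload would stretch to roughly 128 ms.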
     
  18. Npl

    Npl
    Veteran

    Joined:
    Dec 19, 2004
    Messages:
    1,905
    Likes Received:
    7
    Maybe I would want the pixel to be finished, considering it could in turn save time and memory for reading/writing the context :smile: . But I see this would mean nondeterministic delays.
    Just out of curiosity, what do the DX10 specs require?
     
  19. Bludd

    Bludd Experiencing A Significant Gravitas Shortfall
    Veteran

    Joined:
    Oct 26, 2003
    Messages:
    3,612
    Likes Received:
    1,221
    Location:
    Funny, It Worked Last Time...
    Excuse me if this is a silly question, but how long is a typical cycle for a high end chip that is outlined in the original post? :)

    Edit: Oh, and how does such a typical cycle time compare with a cycle in a modern x86(7?) CPU?
     
    #19 Bludd, Jul 21, 2006
    Last edited by a moderator: Jul 21, 2006
  20. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    There's, er, one clock cycle.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.