Pixel fill rate vs. texel fill rate...

Discussion in 'Architecture and Products' started by Hellbinder, Sep 5, 2002.

Pixel Fill rate Or Texel Fill Rate..

  1. A GPU with 4 pixel pipes and 4 TMUS per pipe

    100.0%
  2. A GPU of Configuration X (tell us below)

    0 vote(s)
    0.0%
  3. A GPU with 8 pixel pipes and 1 TMU per pipe

    0 vote(s)
    0.0%
  1. Hellbinder

    Banned

    Joined:
    Feb 8, 2002
    Messages:
    1,444
    Given today's games, and perhaps games coming out over the next year (but no further)...

    All theoretical cards have the same core clock, comparable memory bandwidth (by whatever means), and similar pixel/vertex shader performance. Would you rather have...
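
For reference, the two named poll options differ in peak numbers as sketched below. This is only a back-of-the-envelope calculation: `CORE_CLOCK_MHZ` is a made-up value (the poll says only that all cards share the same clock), and real cards rarely reach these peaks.

```python
# Peak fill rates for the poll's two named configurations.
# CORE_CLOCK_MHZ is a hypothetical value; the poll does not give a clock.
CORE_CLOCK_MHZ = 300


def peak_fill_rates(pipes, tmus_per_pipe, clock_mhz=CORE_CLOCK_MHZ):
    """Return (pixel_fill, texel_fill) in Mpixels/s and Mtexels/s."""
    pixel_fill = pipes * clock_mhz                   # one pixel per pipe per clock
    texel_fill = pipes * tmus_per_pipe * clock_mhz   # one texel per TMU per clock
    return pixel_fill, texel_fill


for name, pipes, tmus in [("4x4", 4, 4), ("8x1", 8, 1)]:
    pixels, texels = peak_fill_rates(pipes, tmus)
    print(f"{name}: {pixels} Mpixels/s, {texels} Mtexels/s")
# 4x4: 1200 Mpixels/s, 4800 Mtexels/s
# 8x1: 2400 Mpixels/s, 2400 Mtexels/s
```

At the same clock, 4x4 has twice the peak texel rate and 8x1 has twice the peak pixel rate, which is the trade-off the poll is asking about.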
     
  2. Chalnoth

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,706
    Location:
    New York, NY
    There is only one thing that you didn't mention: Independent of textures, how much math can one pixel pipeline perform?

    If the 8x1 pipelines can actually perform all the same math as each of the 4x4 pipelines (as appears to be the case with the R300...), then the 8x1 would definitely be better, for use with anisotropic filtering.
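
A toy model of the point about anisotropic filtering, under loudly stated assumptions: each TMU works on one texture layer at a time, TMUs in the same pipe cannot share the samples of a single layer, every layer costs `samples_per_layer` cycles (1 for bilinear, more for trilinear or anisotropic), and shader math is ignored entirely. None of this describes a specific chip.

```python
import math


def pixels_per_clock(pipes, tmus_per_pipe, texture_layers, samples_per_layer=1):
    """Toy throughput model: pixels finished per clock across all pipes."""
    if texture_layers == 0:
        return float(pipes)
    # Layers beyond the TMU count need extra loopback passes through the pipe.
    passes = math.ceil(texture_layers / tmus_per_pipe)
    cycles_per_pixel = passes * samples_per_layer
    return pipes / cycles_per_pixel


# One texture layer with a 4-cycle anisotropic filter (hypothetical cost):
print(pixels_per_clock(4, 4, texture_layers=1, samples_per_layer=4))  # 1.0
print(pixels_per_clock(8, 1, texture_layers=1, samples_per_layer=4))  # 2.0

# Four texture layers with the same filter:
print(pixels_per_clock(4, 4, texture_layers=4, samples_per_layer=4))  # 1.0
print(pixels_per_clock(8, 1, texture_layers=4, samples_per_layer=4))  # 0.5
```

In this model the 8x1 layout comes out ahead when there are few layers with expensive filtering, because three of the four TMUs in each 4x4 pipe sit idle, while 4x4 comes out ahead once there are enough layers to keep every TMU busy.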
     
  3. multigl2

    Newcomer

    Joined:
    May 23, 2002
    Messages:
    64
    call me crazy, but i wouldn't mind seeing a good implementation of 16x0. If you had good loopback capabilities, 16x0 could do a lot of damage to current games. of course the tradeoff would most definitely be slower or harder-to-implement-at-decent-speeds trilinear/anisotropic filtering.
     
  4. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    Multigl2,

    Not only would you need 16X0, you'd likely need multiple triangle setup engines and the setup in a non-fixed rendering pattern (say 4*4). Otherwise the diminishing returns would really hose your performance.

    Personally, I like the super generalized P10 architecture. General execution units, geared for the problems you'll usually encounter.
     
  5. multigl2

    Newcomer

    Joined:
    May 23, 2002
    Messages:
    64
    most definitely saem... but from my just toying with shaders perspective:

    it would be really nice to see what a 16x0 math powerhouse could do... i mean it should technically (setup and bandwidth permitting) be as fast as 8x1 in multitexturing duties, but it could do some serious shaders if the pipelines were set up nicely. Like for instance, if the setup permitted, you could treat it as a 4x2 card with 2 free pipes to help process shaders :) again, setup permitting.
     
  6. Nagorak

    Regular

    Joined:
    Jun 20, 2002
    Messages:
    854
    Sorry for being ignorant but how exactly does a 16*0 setup work? I mean wouldn't that end up being a bunch of untextured polies (obviously not, so please explain ;) ).
     
  7. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    The poll is too simplistic IMO, but if I have lots of shaders in my game, I'd probably prefer a card with more pipes. However, given the differences in architectures (which will probably always exist), the bottom line is the performance - it won't matter to me if it is 8x1 or 4x4 or whatever, since this is transparent to a developer.
     
  8. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    Think PS2. It can do something on the order of 2400 MP/s, all untextured. You cut that number in half when adding a texture layer. The point of this setup is doing things that don't involve texturing (I'm guessing stencil buffers would be one); this ends up being more efficient since you're not using the TMU anyway. People can argue that the returns provided by a TMU are huge. One could allow for significant loopback and this would be less of a problem; a simplification of the circuit could also lead to higher clocks. Though I'm guessing TMUs aren't your big inhibitors.
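
The halving described above can be written as a one-line model. Only the 2400 MP/s figure and the "half with one texture layer" rule come from the post; extending the same rule to more than one layer (one extra loopback cycle per layer) is an assumption made here for illustration.

```python
def fill_rate_without_tmus(untextured_mpixels, texture_layers):
    """Mpixels/s when every texture layer costs one extra cycle in the pipe.

    Assumption: the 'halve it for one layer' rule generalises to
    rate / (1 + layers) for a hypothetical Nx0 design with loopback.
    """
    return untextured_mpixels / (1 + texture_layers)


for layers in range(3):
    print(layers, fill_rate_without_tmus(2400, layers))
# 0 2400.0
# 1 1200.0
# 2 800.0
```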
     
  9. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Location:
    Toronto, Canada
    8 pixel pipes and two TMUS per pipe

    I do have to agree with Saem though, P10's architecture is really flexible in many ways and is targeted towards generalization of everything.

    I'm very interested in the flexibility of NV30's architecture, since it's been suggested that it might be even more flexible than P10's! (not in all areas obviously, but in most of them).
     
  10. Tahir2

    Tahir2 Itchy
    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,926
    Location:
    United Queendom
    8*2 would then require very high memory bandwidth to take advantage of. Something a lot higher than the 19 GB/s of the Radeon 9700 Pro.
     
  11. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Location:
    Toronto, Canada
    I know m8, I know... :D
     
  12. Tahir2

    Tahir2 Itchy
    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,926
    Location:
    United Queendom
    Maybe the NV30 has it?
    ;)
     
  13. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Location:
    Toronto, Canada
    Well... we shall see... :wink:
     
  14. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,009
    Location:
    O Canada!
    Yawn. Let's get back on topic, shall we?
     
  15. alexsok

    Regular

    Joined:
    Jul 12, 2002
    Messages:
    807
    Location:
    Toronto, Canada
    Sure Dave! Now where were we? Oh yeah, 8 pipes and 2 TMUs on each pipe would be great with approximately 25-30 GB/s of bandwidth and of course a 256-bit memory bus.
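
To put the 25-30 GB/s figure in context, peak bus bandwidth is just bus width times transfer rate. The sketch below assumes the Radeon 9700 Pro's commonly quoted 310 MHz DDR memory (620 MT/s effective) on a 256-bit bus; that clock is not stated in this thread, and the function names are made up for the example.

```python
def bus_bandwidth_gb_s(bus_width_bits, effective_mt_s):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for a given bus and transfer rate."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mt_s * 1e6 / 1e9


def required_mt_s(target_gb_s, bus_width_bits):
    """Effective transfer rate (MT/s) needed to reach target_gb_s on the bus."""
    bytes_per_transfer = bus_width_bits / 8
    return target_gb_s * 1e9 / bytes_per_transfer / 1e6


# Assumed 9700 Pro memory: 256-bit bus, 620 MT/s effective.
print(bus_bandwidth_gb_s(256, 620))   # about 19.84 GB/s

# Transfer rates a 256-bit bus would need for the figures in the post:
print(required_mt_s(25, 256))  # 781.25 MT/s, i.e. roughly 390 MHz DDR
print(required_mt_s(30, 256))  # 937.5 MT/s, i.e. roughly 470 MHz DDR
```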
     
  16. psurge

    Regular

    Joined:
    Feb 6, 2002
    Messages:
    896
    Location:
    LA, California
    Question on the pixel pipes of the p10.

    Notice that they rasterize tris into 8x8 tiles (and perform visibility culling at this level).

    On top of that they have 64 (= 8*8) texture coordinate processors and 64 pixel shading ALUs.

    To me this says: 64 pipe card, with each pipe locked to a specific pixel in an 8x8 tile? I haven't seen any claims that p10 can do data-dependent branching in pixel programs (there does appear to be some form of loop support for texture sampling), or that it can handle programs of arbitrary length, or that its pixel pipes can operate on arbitrary pixels, pixels from different triangles, or even pixels with different shaders.

    IMO if this kind of thing were possible with p10, wouldn't the performance numbers reflect it?

    (Before you all say, 64 pipes! no way! - note that the p10 ALUs are not SIMD - i.e. they process 1 float/int at a time as opposed to 4.)

    However they do describe their programmable units as "SIMD vertex texture and pixel arrays". That would tend to indicate that each ALU is executing the same instruction as all the other ones each cycle.

    So why exactly does everyone think p10 is "so flexible" compared to say r300?
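
The "64 pipes, but scalar" remark above comes down to simple arithmetic. A minimal sketch, assuming one component operation per scalar ALU per clock and four per 4-wide ALU; the function name is made up and nothing here reflects measured P10 behaviour.

```python
TILE_WIDTH = 8
TILE_HEIGHT = 8


def vec4_equivalent_pipes(scalar_alus, components_per_vector=4):
    """Number of 4-wide pipes with the same peak component throughput."""
    return scalar_alus / components_per_vector


scalar_alus = TILE_WIDTH * TILE_HEIGHT     # one ALU per pixel of an 8x8 tile
print(scalar_alus)                          # 64
print(vec4_equivalent_pipes(scalar_alus))   # 16.0
```

So on peak component throughput alone, 64 scalar ALUs compare to 16 four-wide pipes rather than 64, before considering how well either design keeps its lanes busy.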
     
  17. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    7,648
    Location:
    Cleveland
    Points to marketing material from 3DLabs. It says so right there. :p

    --|BRiT|
     
  18. Tonyo

    Newcomer

    Joined:
    Aug 2, 2002
    Messages:
    29
    Because it is :).
    Yes, P10 has data-dependent branching and looping in the fragment shader. Regarding the relationship between shaders and pixels/fragments: at rendering time, a primitive (say, a triangle) is decomposed into the tiles the projected 2D primitive touches and the shaders are run for each tile, so in that sense the shader cannot displace pixels around the screen and the shader run is the same for the whole primitive.

    From Wavy's P10 preview:
    http://www.beyond3d.com/articles/p10tech/index.php?page=page2.inc

    I haven't been able to find any source disclosing the number of instructions in any of the fragment-pixel units (coordinate, shader, address and pixel) though :(
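
The tile decomposition described in this post can be illustrated roughly as follows. This is a conservative sketch that lists every 8x8 tile overlapped by the triangle's bounding box, so it may include tiles the triangle itself never covers; how P10 actually selects tiles is not documented in this thread, and the function name is hypothetical.

```python
import math

TILE_SIZE = 8  # 8x8 pixel tiles, as mentioned earlier in the thread


def tiles_touched_by_bbox(triangle, tile_size=TILE_SIZE):
    """Return (tile_x, tile_y) for every tile overlapped by the bounding box.

    `triangle` is a sequence of three (x, y) screen-space vertices.
    """
    xs = [x for x, _ in triangle]
    ys = [y for _, y in triangle]
    first_tx = math.floor(min(xs) / tile_size)
    last_tx = math.floor(max(xs) / tile_size)
    first_ty = math.floor(min(ys) / tile_size)
    last_ty = math.floor(max(ys) / tile_size)
    return [
        (tx, ty)
        for ty in range(first_ty, last_ty + 1)
        for tx in range(first_tx, last_tx + 1)
    ]


tiles = tiles_touched_by_bbox([(2, 3), (20, 5), (9, 17)])
print(len(tiles))  # 9 tiles: a 3x3 block covering x in 0..23, y in 0..23
```

The same shader would then run once per listed tile, which matches the post's point that the shader is fixed for the whole primitive.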
     
  19. sancheuz

    Newcomer

    Joined:
    Jul 27, 2002
    Messages:
    44
    I would recommend no less than 20 pixel pipes and around 25 TMUs per pass
     
  20. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    psurge,

    In this thread over here, I asked Dave Baumann whether the "pixel pipes" were fixed, and he felt that they weren't.

    When looking at the diagram at the end of the page here, which describes the P10 microarchitecture, it seems that it is possible to load more than one triangle and have the pixel processing (of course this will take more cycles). I feel this is the case because, as Dave mentioned in his P10 technology preview, the P10 uses a lot of multilevel cache, so the P10 could easily have the ability to cache a few tiles or patches. As pixels are processed, the cache (FIFO buffer) would spit out another pixel onto the chopping block.

    As for the "SIMD arrays", this could be very much like the vertex processor, where this is simply an abstracted look and, in actuality, the pipelines are independently executing.
     
