Can someone explain this...

Discussion in 'Architecture and Products' started by obobski, Nov 2, 2005.

  1. obobski

    Newcomer

    Joined:
    Jul 25, 2005
    Messages:
    117
    Likes Received:
    3
    one of the few points I don't understand about fill rate is why the G70 can compete with X1800XT, given that its pixel fill rate is so much lower; I understand its texture fill rate to be higher (though only barely)

    so how does one calculate by how much the 550/1800 512MB will stomp X1800XT?

    assuming I've done this right, it'll push 8800M pixels/s and 13200M texels/s, while X1800XT is squarely at 10000M on both. I know that the 512MB will smash up on the X1800XT because it's clocked so much higher, given that current G70s are already able to be on the level with X1800XT... so can someone please assist me in figuring out whatever proportional equation or rule shows how a card with an unequal number of ROPs and PPUs can have its output calculated?
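
    The figures above are just unit counts multiplied by core clock. A minimal sketch of that arithmetic, assuming 16 ROPs and 24 texture units at 550 MHz for the 512MB G70, and 16 of each at 625 MHz for X1800XT (the X1800XT unit counts and clock are assumptions chosen to match the 10000M figure, and `theoretical_fill_rates` is a name made up for illustration):

```python
def theoretical_fill_rates(rops, texture_units, core_mhz):
    """Return (Mpixels/s, Mtexels/s) as peak, best-case numbers."""
    return rops * core_mhz, texture_units * core_mhz

# G70 512MB: 16 ROPs, 24 texture units, 550 MHz core
g70_512 = theoretical_fill_rates(16, 24, 550)
# X1800XT: ASSUMED 16 ROPs, 16 texture units, 625 MHz core
x1800xt = theoretical_fill_rates(16, 16, 625)

print(g70_512, x1800xt)
```

    These are upper bounds only; they say nothing about how often either limit is actually reached.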
     
  2. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
    That output cannot be calculated by a proportional equation. The GPU architecture is complex enough that there just isn't a single metric to calculate performance. For a reasonably similar analogy take CPUs: the P4 has some 16 GFLOPS of theoretical power while the Athlon has about 12 GFLOPS, but that doesn't translate into the P4 being a third faster.
     
  3. obobski

    Newcomer

    Joined:
    Jul 25, 2005
    Messages:
    117
    Likes Received:
    3
    that makes logical sense, but usually higher fill rate has resulted in the GPU being faster. Yet I do understand what you're saying, and since most GPUs don't hit their theoretical fill rates even in very "light" benchmarks (such as D3D Rightmark) I could see how having all that fill rate doesn't really result in more performance... along with the issue of bandwidth

    but fill rate is different from FLOPS isn't it? since you can measure a GPU's output in FLOPS as well (and don't they hit some amazing figures?)

    also, I don't think A64 is 12...it's more like 4-6

    reference:
    http://top500.org/sublist/System.php?id=6818

    Opteron 2000MHz is roughly identical to an Athlon64 2000MHz with 1MB L2

    as for Pentium 4, the best comparison I could draw is Xeon. At first I'd be inclined to say Xeon was P6 based, but if Xeon was P6 based it would downright tear apart every CPU on the market, with the possible exception of Itanium2, given the clock speeds it's operating at

    http://top500.org/sublist/System.php?id=6085
    there is Xeon 2400, operating at 4.8GFLOPS

    so GFLOPS comes out at roughly 2x the clock speed in GHz
    which technically doesn't make sense, given that by calculation of available cycles x # of FPUs it should result in figures around 20 GFLOPS, not 4, but then again, this is GFLOPS as read by the Linpack bench

    I'm thinking the comparison you're trying to draw is architectural difference versus clock speed?
    like saying that a Pentium 4 670 won't tear apart an Athlon64 FX-57, even though it's clocked a GHz faster

    but I'm not comparing the clock speeds of these cards, I'm comparing the clock speeds in relation to available ROPs. In all reason the G70 shouldn't be able to perform like it does, but the extra PPUs do give it more performance; I'm just wondering how they're helping it do what it does so much faster than NV40 at identical speeds
     
  4. obobski

    Newcomer

    Joined:
    Jul 25, 2005
    Messages:
    117
    Likes Received:
    3
  5. Maintank

    Regular

    Joined:
    Apr 13, 2004
    Messages:
    463
    Likes Received:
    2
    Think I understand what you are asking.

    I haven't kept up with the technical specs, but I believe the FPU on the P4 is a dual-issue, single-retire unit. The most you will get out of it is 1 x clock for your FLOPS rating.

    The Athlon is a 3-issue, 2-retire unit and thus 2 x clock.

    Athlon at 2GHz should give you 4 GFLOPS; P4 at 3.5GHz could give you 3.5 GFLOPS, theoretically of course.

    SSE units get a little dicey, and I understand the SSE and SSE2 units on the P4 are supposed to be able to issue and retire 4 operations per clock. 3.5GHz P4 = 14 GFLOPS. What the Athlon can do in long mode, I have no idea.
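
    The arithmetic behind those ratings is clock speed times floating-point operations retired per clock. A small sketch, taking the per-clock figures in this post at face value (they are the poster's recollection, not verified specs; `peak_gflops` is a hypothetical helper name):

```python
def peak_gflops(clock_ghz, flops_per_clock):
    """Theoretical peak only: assumes the unit retires its maximum every cycle."""
    return clock_ghz * flops_per_clock

athlon_x87 = peak_gflops(2.0, 2)  # 2 retired per clock, per the post
p4_x87 = peak_gflops(3.5, 1)      # 1 retired per clock, per the post
p4_sse = peak_gflops(3.5, 4)      # 4 per clock via SSE, per the post

print(athlon_x87, p4_x87, p4_sse)
```

    Measured Linpack numbers will sit below these peaks, since real code never keeps the units busy every cycle.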

    Now, to answer your question about how Nvidia keeps up: I am guessing the games we are looking at are not pixel fill rate limited but instead pixel shader limited. Any advantage the X1800 shows in fill rate ends up being wasted, as the shaders are not able to keep the ROPs going full bore. That is my guess, and I am sure people on this msgboard who are more knowledgeable will chime in with more information or correct anything I have said that is wrong.
     
  6. Pete

    Pete Moderate Nuisance
    Moderator Legend Veteran

    Joined:
    Feb 7, 2002
    Messages:
    5,032
    Likes Received:
    467
    Afaik....

    You can always compare a 6600GT to a 6800 to see that raw pixel fillrate isn't the limiting factor anymore. The 6600GT has just four ROPs clocked at 500MHz, which means a 2GP/s fillrate. The 6800 has, I believe, 16 ROPs clocked at 325MHz, which means a 5.2GP/s fillrate. Obviously a 6800 scores nowhere near 2.6x higher than a 6600GT in any benchmark I can think of, so raw fillrate isn't that big of a deal anymore. Heck, check out XbitLabs' recent Asus 256MB 6600GT review: the extra 128MB of RAM on the 6600GT boosted scores quite a bit, usually to parity with the 6800, and that's with the same "limited" pixel fillrate.

    Cards aren't drawing simple pixels anymore, so pixel fillrate is increasingly a very theoretical number. When you get down to the relatively minor 10% pixel fillrate deficit a GTX yields to an XT, it's really insignificant, especially when both cards are typically compared with full special effects (FP16 HDR halves theoretical fillrate on NV's parts, IIRC) and lots of AA and AF.
     
  7. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    For all modern graphics cards (excepting the Radeon X1600 series), it's the texture fillrate that is most important as a measurement of performance. This is because all of these graphics cards have one texture unit per pixel pipeline.

    The GeForce 7800 GTX, for instance, has 24 pixel pipelines, but only 16 ROPs. The lack of ROPs gives a negligible performance hit because there isn't enough memory bandwidth to feed 24 ROPs every clock in most situations anyway. So the 24 number is a better measurement of performance (which happens to be the number used for the calculation of the texture fillrate).
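
    A rough back-of-the-envelope sketch of that bandwidth argument follows. The 430 MHz core clock, the 38.4 GB/s memory bandwidth, and the 4 bytes written per pixel are all assumptions for illustration, not figures from this thread, and the model ignores Z traffic, blending, texture reads and compression:

```python
def rop_write_bandwidth_gbs(rops, core_mhz, bytes_per_pixel=4):
    """GB/s needed if every ROP writes one pixel of colour each clock."""
    return rops * core_mhz * bytes_per_pixel / 1000  # MB/s -> GB/s

ASSUMED_CORE_MHZ = 430       # assumption, not from the thread
ASSUMED_MEMORY_GBS = 38.4    # assumption, not from the thread

need_16 = rop_write_bandwidth_gbs(16, ASSUMED_CORE_MHZ)
need_24 = rop_write_bandwidth_gbs(24, ASSUMED_CORE_MHZ)

# Under these assumptions, 24 ROPs would want more than the memory bus offers
print(need_16, need_24, need_24 > ASSUMED_MEMORY_GBS)
```

    Under these assumed numbers, colour writes alone from 24 ROPs would exceed the available bandwidth, while 16 ROPs fit inside it.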

    Things are about to get a whole lot more complex, though, as pixel pipelines are decoupled from texture pipelines, as is the case with the Radeon X1600 series (which has 3 pixel pipelines per texture pipeline), and will be the case with ATI's upcoming R580.

    The texture fillrate and pixel fillrate of both of these cards will be extremely low compared to the performance they will obtain in modern games.

    So, because these cards are getting so complex, it's basically reaching the point where it's going to be better to just ignore the architecture entirely and focus on game benchmarks (i.e. not 3DMark) to give an indication of the performance of a GPU.

    Edit: By the way, another poster child for low pixel fillrate but high performance is the GeForce 6600 GT, which only has a pixel fillrate of 2GPix/sec, but has a texel fillrate of 4GTex/sec, and behaves much more like one would expect a 4GPix/sec card to perform.
     
  8. stepz

    Newcomer

    Joined:
    Dec 11, 2003
    Messages:
    66
    Likes Received:
    3
    No can't do. Will not ignore the architecture. I for one don't care how fast a card runs, I'm mostly interested in why it runs so fast (or slow). On the other hand, maybe it's just me. :)

    Anyway, I wouldn't advocate the texture fillrate as a metric either. It really isn't that good even today, and tomorrow we'll have some other guy asking "card A has so and so texel fillrate and card B has only so and so fillrate, how come B is faster?". The fact of the matter is, the chips are complex enough that there just isn't any meaningful metric other than actual performance in applications.

    PS: On the P4 and Athlon FLOPS, both can do one SSE op (four single-precision floats wide) per cycle under ideal circumstances. So 16 GFLOPS is a 4GHz P4 and 12 GFLOPS is a 3GHz Athlon. The Linpack performance is already closer to an actual (albeit simple) application than a theoretical capability figure.

    [edit] typo
     