Historical GPU FLOPs performance

Discussion in 'Architecture and Products' started by caboosemoose, Jan 12, 2009.

  1. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Hi Guys, I'm after some historical figures for peak theoretical GPU performance in FLOPs - that's peak theoretical rather than sustained.

    So, for current chips that's 1.2 TFLOPs for RV770 or 933 GFLOPs for GT200, for example.

    Does anyone know of a resource, a slide published somewhere - I need numbers for:

    NV:
    GF 3 (does SM1 do flops?)
    GF 3 500 (as above)

    GF 4 TI 4800 (again, assuming FLOPs possible)

    GF FX 5800 U
    GF FX 5900 U

    GF 6800 U 54 GFLOPs

    GF 7800 GTX (I have 165 GFLOPs as a possible number here)
    GF 7800 Ultra

    ATI:
    Rad 8500 (again assuming FLOPs poss)

    Rad 9700 Pro
    Rad 9800 Pro

    Rad X800 XT 66 GFLOPs (X850)

    Rad X1800
    Rad X1900 (Possibly 426 GFLOPs)

    Rad HD 2900 475 GFLOPs

    Rad HD 3870 496 GFLOPs

    That's it :D

    Any help with any of the above appreciated
     
    #1 caboosemoose, Jan 12, 2009
    Last edited by a moderator: Jan 12, 2009
  2. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Thanks for the epic sarcasm.

    Needless to say I have spent much of the day on google trying to fill in the gaps. I have yet to find reliable sources for the above.

    Hence the post.

    I have found a few possible numbers, I will update as I go along.
     
  3. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,787
    Location:
    35.1415,-90.056
    Well, the first hit on the GeForce 3 search was pretty straightforward. The only difference between the original GeForce 3 and the Ti200/Ti500 was clock speed. Since you know all three clock speeds, it's trivial to extrapolate from there.
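
    A minimal sketch of that extrapolation, assuming peak throughput scales linearly with core clock when the unit counts are identical. The core clocks below (200, 175 and 240 MHz) are not quoted in this thread and are assumptions, and the function names are hypothetical. The baseline is passed in as a parameter because the thread does not settle on a GeForce 3 figure.

```python
# Assumed core clocks in MHz; these values are not quoted in the thread.
CLOCKS_MHZ = {
    "GeForce 3": 200,
    "GeForce 3 Ti 200": 175,
    "GeForce 3 Ti 500": 240,
}


def scale_by_clock(known_gflops, known_mhz, target_mhz):
    """Scale a peak figure linearly with clock (identical unit counts assumed)."""
    return known_gflops * target_mhz / known_mhz


def extrapolate(baseline_gflops, baseline_name="GeForce 3"):
    """Return a peak figure for every part in CLOCKS_MHZ from one known baseline."""
    base_mhz = CLOCKS_MHZ[baseline_name]
    return {
        name: scale_by_clock(baseline_gflops, base_mhz, mhz)
        for name, mhz in CLOCKS_MHZ.items()
    }


if __name__ == "__main__":
    # A baseline of 1.0 prints each part as a multiple of the GeForce 3 figure.
    for name, ratio in extrapolate(1.0).items():
        print(f"{name}: {ratio:.3f}x")
```

    Whatever number you settle on for the original part, the Ti 200 and Ti 500 figures follow from the same ratio.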
     
  4. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    According to the first hit on the GeForce 3 results, the figure is 76 GFLOPs, which surely cannot be right, or is calculated very differently from the method used to come up with 165 GFLOPs for G70, to take one example - G70 has much more than 2.5x the parallel processing power of NV20...

    Oh and NV40 is apparently 54 GFLOPs, making it significantly slower than GF3. Not terribly likely.

    It really isn't that easy to find reliable numbers for the early stuff...
     
    #5 caboosemoose, Jan 12, 2009
    Last edited by a moderator: Jan 12, 2009
  5. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    I will update the first post as I go along and tidy it all up to be used as a resource when it's finished. These are the numbers I have so far (CPU vs GPU - rounded to nearest GFLOP, peak theoretical, not sustained):

    CPU:
    Intel Pentium 4 3.2GHz 6 GFLOPs
    Intel Pentium 4 3.4GHz 7 GFLOPs
    Intel Pentium 4 670 7 GFLOPs
    Intel Pentium D 840 13 GFLOPs
    Intel Pentium D 955 14 GFLOPs
    Intel Pentium D 965 15 GFLOPs
    Intel Core 2 X6800 23 GFLOPs
    Intel Core 2 Quad QX6700 43 GFLOPs
    Intel Core 2 Quad QX6850 48 GFLOPs
    Intel Core 2 Quad QX9770 51 GFLOPs
    Intel Core i7-965 51 GFLOPs

    Graphics chip

    GeForce 6800 Ultra 54 GFLOPs
    ATI Radeon X850 XT 66 GFLOPs
    NVIDIA GeForce 7800 GTX 165 GFLOPs
    ATI Radeon X1900 426 GFLOPs
    NVIDIA GeForce 8800 GTX 518 GFLOPs
    NVIDIA GeForce 8800 Ultra 576 GFLOPs
    ATI Radeon HD 2900 475 GFLOPs
    NVIDIA GeForce 9800 GTX 648 GFLOPs
    ATI Radeon HD 3870 496 GFLOPs
    NVIDIA GeForce GTX 280 933 GFLOPs
    ATI Radeon HD 4870 1.2 TFLOPs
     
  6. Freak'n Big Panda

    Regular

    Joined:
    Sep 28, 2002
    Messages:
    898
    Location:
    Waterloo Ontario
    If you wanted to be accurate about it you'd need to take a look at the shader hardware and figure out what each of the GPUs is capable of. When reviewers talk about FLOPs on RV770, or G80, or similar DX10-capable GPUs, they're talking about the number of floating point operations that can be performed in the shader cores per second. The thing is, a bunch of other blocks in a GPU carry out floating point operations, so you'll have to define what you mean by FLOPs.
     
  7. Albuquerque

    Albuquerque Red-headed step child
    Veteran

    Joined:
    Jun 17, 2004
    Messages:
    3,787
    Location:
    35.1415,-90.056
    Indeed, remember that earlier GPU generations were entirely fixed function, so the peak FLOPs number might indeed be higher than the comparison to current performance figures might suggest.

    Which is exactly where Big Panda's comment rings true: there are lots of operations in a current GPU that aren't covered by the shader core. Which ultimately leads us to the truth: FLOPs is not a good measure of total processor performance under the significant majority of workloads...
     
  8. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Yes, it's a bit of a minefield, hence the post.

    However, NV and AMD are pretty consistent about how they quote FLOPs for the later stuff, e.g. 933 GFLOPs for GT200.

    I'm looking to fill in the gaps using a broadly similar metric. I'm not looking to do the calculations myself - i.e. I don't want to get into making personal judgements. I just want the peak theoretical rate as the makers of the chips themselves would claim.

    I also don't want to get into a debate about how all this translates into real world performance or processing power. I am aware of the pitfalls. I just need to compile a list of the headline, showbiz rates.
     
  9. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
  10. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,454
    Location:
    Guess...
    Aren't they double-precision numbers? I thought all those chips were double that in single precision.
     
  11. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    544
    Location:
    Slovenia
    NV30 is terrible when it comes to flops. You really need to know what you count into this (shader flops, texture filtering flops, ROP flops, ...) to make ANY sense out of it. And even then it's more apples and oranges than anything else.
     
  12. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Location:
    /
    Be careful about precision though. Precision has steadily increased in the GPU domain. Earlier programmable GPUs are not even 32-bit FP, let alone IEEE compliant.
     
  13. Arun

    Arun Unknown.
    Moderator Veteran

    Joined:
    Aug 28, 2002
    Messages:
    4,971
    Location:
    UK
    GF3/GF4 is 0GFlops for the PS, and 10 flops/MHz per VS (i.e. 10 for GF3, 20 for GF4 - this is Vec4+Scalar MADD). NV30/NV35 is the same for the VS (with 3 pipes vs 1/2), but for the PS it's a bit more complicated. NV30 is 4[Pipes]*1[Unit]*2[MADD]*4[Vec] for the PS, while NV35 is 4[Pipes]*3[Unit]*2[MADD]*4[Vec]. However, the latter is for FP16; in FP32 mode, there isn't enough register bandwidth to do more than 2 MADDs or 1 MADD + 2 MULs (i.e. 2/3rd as many flops). All this means, for example, that NV30 had 16GFlops peak for the PS and 15GFlops peak for the VS...

    Radeon 8500 had two VS engines, but I can't find whether they were Vec4 or Vec5 anywhere; presumably the latter like R300+. Same as for NV2x PS-wise though, 0 flops... Radeon 9000 was the same but only 1 VS engine. R300 had 4 VS, but the PS had 8*4*2 [EDIT: *3, not *2!!!] flops available to it (FP24 obviously). I think you have the right numbers for the other chips and can just extrapolate for clock speed as required, so I won't bother repeating the obvious.
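
    The NV30 arithmetic above can be sketched as follows. This is only an illustration of the pipes * units * MADD * vector-width counting used in this post. The 500 MHz core clock is an assumption (it is not stated in the thread, but it reproduces the 16 and 15 GFLOPs figures), and the function names are hypothetical.

```python
VS_FLOPS_PER_CLOCK = 10   # Vec4 + scalar MADD: 5 lanes * 2 flops, per the post
NV30_CLOCK_MHZ = 500      # assumption: GeForce FX 5800 Ultra core clock


def flops_per_clock(pipes, units, vec_width, flops_per_madd=2):
    """Peak pixel-shader flops per clock: pipes * units * MADD * vector width."""
    return pipes * units * flops_per_madd * vec_width


def gflops(per_clock, clock_mhz):
    """Convert flops per clock at a given clock in MHz into GFLOPs."""
    return per_clock * clock_mhz / 1000.0


def fp32_limited(fp16_per_clock):
    """Apply the 2/3 factor the post gives for NV35 in FP32 mode."""
    return fp16_per_clock * 2 / 3


if __name__ == "__main__":
    nv30_ps = gflops(flops_per_clock(pipes=4, units=1, vec_width=4), NV30_CLOCK_MHZ)
    nv30_vs = gflops(3 * VS_FLOPS_PER_CLOCK, NV30_CLOCK_MHZ)  # 3 VS pipes
    nv35_fp16 = flops_per_clock(pipes=4, units=3, vec_width=4)
    print(f"NV30 PS peak: {nv30_ps} GFLOPs")   # 16.0
    print(f"NV30 VS peak: {nv30_vs} GFLOPs")   # 15.0
    print(f"NV35 PS flops/clock: {nv35_fp16} (FP16), {fp32_limited(nv35_fp16)} (FP32)")
```

    Keeping per-clock counts separate from the clock makes it easy to plug in other parts' clock speeds without redoing the unit counting.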

    Does this help? :)
     
  14. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Yes - thank you.

    However, what is currently confusing me is the NVIDIA graph that puts G70 @ 200GFLOPS, G71 @ 250GFLOPs, G80 @ 350GFLOPs, NV30 @ 15GFLOPs and NV35 @ 40GFLOPs.

    ...and yet I find frequent reference to G70 as 165GFLOPs. I also suspect the 933GFLOPs figure for GT200 is a different metric.

    Regarding the CPU figures, yes, that may be the case regarding double and single precision - the Intel page I drew them from does not specify. However, in a comparison table NVIDIA puts a 3GHz quad-core Core 2 chip @ 96GFLOPs, so I suspect the figures I quoted are indeed double precision...
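
    A quick sanity check on the single versus double precision question, as a sketch only. It assumes a Core 2 core can retire one 128-bit SSE add and one 128-bit SSE multiply per cycle, which gives 4 double-precision or 8 single-precision flops per cycle per core. Those per-cycle figures and the function name are assumptions, not something quoted in the thread.

```python
# Assumed per-core, per-cycle peak for Core 2 (one SSE add + one SSE mul per cycle).
FLOPS_PER_CYCLE = {"double": 4, "single": 8}


def cpu_peak_gflops(cores, clock_ghz, precision):
    """Peak theoretical GFLOPs = cores * clock (GHz) * flops per cycle per core."""
    return cores * clock_ghz * FLOPS_PER_CYCLE[precision]


if __name__ == "__main__":
    # 3 GHz quad-core Core 2, as in the NVIDIA comparison table mentioned above.
    print(cpu_peak_gflops(4, 3.0, "double"))  # 48.0
    print(cpu_peak_gflops(4, 3.0, "single"))  # 96.0
```

    Under those assumptions the 48 GFLOPs listed for the QX6850 lines up with double precision and NVIDIA's 96 GFLOPs with single precision.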
     
  15. Arun

    Arun Unknown.
    Moderator Veteran

    Joined:
    Aug 28, 2002
    Messages:
    4,971
    Location:
    UK
    7800 GTX's PS peak is ~165GFlops, VS peak is ~34.4GFlops, so the total is indeed ~200GFlops. As a side note, that's a pretty good example of how the ratio between PS:VS flops just kept going up all the time!
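
    A rough sketch of how those two figures add up. The unit counts and clock used here (24 pixel pipes with two Vec4 MADD units each, 8 vertex shaders at 10 flops per clock, 430 MHz core clock) are assumptions that are not stated in this thread; they are chosen because they reproduce the ~165 and ~34.4 GFLOPs numbers.

```python
G70_CLOCK_MHZ = 430          # assumption: 7800 GTX core clock
PIXEL_PIPES = 24             # assumption
MADD_UNITS_PER_PIPE = 2      # assumption: two Vec4 MADD units per pixel pipe
VERTEX_SHADERS = 8           # assumption
VS_FLOPS_PER_CLOCK = 10      # Vec4 + scalar MADD, as counted earlier in the thread


def ps_gflops(pipes, units, clock_mhz, vec_width=4, flops_per_madd=2):
    """Pixel-shader peak: pipes * units * MADD * vector width * clock."""
    return pipes * units * flops_per_madd * vec_width * clock_mhz / 1000.0


def vs_gflops(shaders, clock_mhz, per_clock=VS_FLOPS_PER_CLOCK):
    """Vertex-shader peak: shaders * flops per clock * clock."""
    return shaders * per_clock * clock_mhz / 1000.0


if __name__ == "__main__":
    ps = ps_gflops(PIXEL_PIPES, MADD_UNITS_PER_PIPE, G70_CLOCK_MHZ)
    vs = vs_gflops(VERTEX_SHADERS, G70_CLOCK_MHZ)
    print(f"PS {ps:.1f} + VS {vs:.1f} = {ps + vs:.1f} GFLOPs")
```

    Whether a quoted figure is PS only or PS plus VS is one reason the same chip turns up as both 165 and 200 GFLOPs.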
     
  16. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Ah yes, that makes sense, thanks.
     
  17. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    7,454
    Location:
    Guess...
    Until G80 reset the ratio forever :smile:
     
  18. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,661
    Location:
    London
    Also you can prolly argue that before GT200, NVidia's unified GPUs could only issue a MAD per clock, whereas from GT200 onwards it's MAD+MUL. Hence 346GFLOPs for 8800GTX.

    Jawed
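
    The MAD versus MAD+MUL counting can be sketched like this. The stream processor counts and shader clocks (128 SPs at 1350 MHz for 8800 GTX, 240 SPs at 1296 MHz for GTX 280) are assumptions not quoted in this thread, and the function name is hypothetical. A MAD is counted as 2 flops and the extra MUL as 1.

```python
MAD_FLOPS = 2   # multiply-add counted as two operations
MUL_FLOPS = 1   # the additional co-issued multiply


def unified_gflops(stream_processors, shader_clock_mhz, count_mul):
    """Peak shader GFLOPs for a unified design, with or without the extra MUL."""
    per_clock = MAD_FLOPS + (MUL_FLOPS if count_mul else 0)
    return stream_processors * per_clock * shader_clock_mhz / 1000.0


if __name__ == "__main__":
    # Assumed configurations; neither is given in the thread.
    print(f"8800 GTX, MAD only: {unified_gflops(128, 1350, False):.1f}")
    print(f"8800 GTX, MAD+MUL:  {unified_gflops(128, 1350, True):.1f}")
    print(f"GTX 280,  MAD+MUL:  {unified_gflops(240, 1296, True):.1f}")
```

    Under these assumptions the MAD-only count gives roughly the 346 GFLOPs figure, while counting the MUL gives the 518 GFLOPs headline number listed earlier in the thread.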
     
  19. caboosemoose

    Regular

    Joined:
    Jan 15, 2003
    Messages:
    294
    Yes, the NV produced graph I have puts G80 at approx 350 GFLOPs. It's all a bit of a ball ache.
     

  • About Beyond3D

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.