Historical GPU FLOPs performance

Hi Guys, I'm after some historical figures for peak theoretical GPU performance in FLOPs - that's peak theoretical rather than sustained.

So, for current chips that's 1.2 TFLOPs for RV770 or 933 GFLOPs for GT200, for example.

Does anyone know of a resource, such as a slide published somewhere? I need numbers for:

NV:
GF 3 (does SM1 do flops?)
GF 3 500 (as above)

GF 4 TI 4800 (again, assuming FLOPs possible)

GF FX 5800 U
GF FX 5900 U

GF 6800 U 54 GFLOPs

GF 7800 GTX (I have 165 GFLOPs as a possible number here)
GF 7800 Ultra

ATI:
Rad 8500 (again assuming FLOPs poss)

Rad 9700 Pro
Rad 9800 Pro

Rad X800 XT 66 GFLOPs (X850)

Rad X1800
Rad X1900 (Possibly 426 GFLOPs)

Rad HD 2900 475 GFLOPs

Rad HD 3870 496 GFLOPs

That's it :D

Any help with any of the above appreciated
 
Thanks for the epic sarcasm.

Needless to say I have spent much of the day on google trying to fill in the gaps. I have yet to find reliable sources for the above.

Hence the post.

I have found a few possible numbers, I will update as I go along.
 
Well, the first hit on the GeForce 3 search was pretty straightforward. The only difference between the original GeForce 3 and the Ti200/Ti500 was clock speed. Since you know all three clock speeds, it's trivial to extrapolate from there.
 
According to the first hit on the GeForce 3 results, the figure is 76 GFLOPs, which surely cannot be right, or is calculated very differently from the method used to come up with 165 GFLOPs for G70, to take one example. G70 has much more than 2.5x the parallel processing power of NV20...

Oh, and NV40 is apparently 54 GFLOPs, making it significantly slower than GF3. Not terribly likely.

It really isn't that easy to find reliable numbers for the early stuff...
 
I will update the first post as I go along and tidy it all up to be used as a resource when it's finished. These are the numbers I have so far (CPU vs GPU, rounded to the nearest GFLOP, peak theoretical, not sustained):

CPU:
Intel Pentium 4 3.2GHz 6 GFLOPs
Intel Pentium 4 3.4GHz 7 GFLOPs
Intel Pentium 4 670 7 GFLOPs
Intel Pentium D 840 13 GFLOPs
Intel Pentium D 955 14 GFLOPs
Intel Pentium D 965 15 GFLOPs
Intel Core 2 X6800 23 GFLOPs
Intel Core 2 Quad QX6700 43 GFLOPs
Intel Core 2 Quad QX6850 48 GFLOPs
Intel Core 2 Quad QX9770 51 GFLOPs
Intel Core i7-965 51 GFLOPs

Graphics chip

GeForce 6800 Ultra 54 GFLOPs
ATI Radeon X850 XT 66 GFLOPs
NVIDIA GeForce 7800 GTX 165 GFLOPs
ATI Radeon X1900 426 GFLOPs
NVIDIA GeForce 8800 GTX 518 GFLOPs
NVIDIA GeForce 8800 Ultra 576 GFLOPs
ATI Radeon HD 2900 475 GFLOPs
NVIDIA GeForce 9800 GTX 648 GFLOPs
ATI Radeon HD 3870 496 GFLOPs
NVIDIA GeForce GTX 280 933 GFLOPs
ATI Radeon HD 4870 1.2 TFLOPs
 
If you wanted to be accurate about it you'd need to take a look at the shader hardware and figure out what each of the GPUs is capable of. When reviewers talk about FLOPs on RV770, or G80, or similar DX10-capable GPUs, they're talking about the number of floating point operations that can be performed in the shader cores per second. The thing is, a bunch of other blocks in a GPU carry out floating point operations too, so you'll have to define what you mean by FLOPs.
 
Indeed, remember that earlier GPU generations were entirely fixed function, so the peak FLOPs number might indeed be higher than the comparison to current performance figures might suggest.

Which is exactly where Big Panda's comment comes true: there are lots of operations in a current GPU that aren't covered by the shader core. Which ultimately leads us to the truth: FLOPs is not a good measure of total processor performance under the significant majority of workloads...
 
Yes, it's a bit of a minefield, hence the post.

However, NV and AMD are pretty consistent about how they quote FLOPs for the later stuff, eg 933GFLOPs for GT200.

I'm looking to fill in the gaps using a broadly similar metric. I'm not looking to do the calculations myself - ie I don't want to get into making personal judgements. I just want the peak theoretical rate as the makers of the chips themselves would claim.

I also don't want to get into a debate about how all this translates into real world performance or processing power. I am aware of the pitfalls. I just need to compile a list of the headline, showbiz rates.
 
Aren't those CPU figures double precision numbers? I thought all those chips were double that in single precision.
 
NV30 is terrible when it comes to flops. You really need to know what you count into this (shader flops, texture filtering flops, ROP flops, ...) to make ANY sense out of it. And even then it's more apples and oranges than anything else.
 
Be careful about precision though. Precision has steadily increased in the GPU domain. Earlier programmable GPUs are not even 32-bit FP, let alone IEEE compliant.
 
GF3/GF4 is 0GFlops for the PS, and 10 flops/MHz per VS (i.e. 10 for GF3, 20 for GF4 - this is Vec4+Scalar MADD). NV30/NV35 is the same for the VS (with 3 pipes vs 1/2), but for the PS it's a bit more complicated. NV30 is 4[Pipes]*1[Unit]*2[MADD]*4[Vec] for the PS, while NV35 is 4[Pipes]*3[Unit]*2[MADD]*4[Vec]. However, the latter is for FP16; in FP32 mode, there isn't enough register bandwidth to do more than 2 MADDs or 1 MADD + 2 MULs (i.e. 2/3rd as many flops). All this means, for example, that NV30 had 16GFlops peak for the PS and 15GFlops peak for the VS...

Radeon 8500 had two VS engines, but I can't find whether they were Vec4 or Vec5 anywhere; presumably the latter like R300+. Same as for NV2x PS-wise though, 0 flops... Radeon 9000 was the same but only 1 VS engine. R300 had 4 VS, but the PS had 8*4*2 [EDIT: *3, not *2!!!] flops available to it (FP24 obviously). I think you have the right numbers for the other chips and can just extrapolate for clock speed as required, so I won't bother repeating the obvious.

Does this help? :)
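
For anyone who wants to check the NV30/NV35 arithmetic above, here is a minimal sketch of the pipes * units * MADD * vec formula. The helper names are made up for illustration, and the 500 MHz NV30 core clock is an assumption: it is not stated in the post, but it is the clock that reproduces the 16 and 15 GFLOPs figures.

```python
def flops_per_clock(pipes, units, vec_width, flops_per_op=2):
    """Peak flops per clock: pipes * units per pipe * vector lanes * flops per op.

    flops_per_op defaults to 2 because a MADD counts as a multiply plus an add.
    """
    return pipes * units * vec_width * flops_per_op


def gflops(per_clock, clock_mhz):
    """Convert flops per clock at a given clock in MHz into GFLOPs."""
    return per_clock * clock_mhz / 1000.0


# Assumption: 500 MHz core clock, chosen because it matches the quoted figures.
NV30_CLOCK_MHZ = 500

# One vertex shader: Vec4 + scalar MADD = 5 lanes * 2 flops = 10 flops per clock.
vs_per_clock = flops_per_clock(pipes=1, units=1, vec_width=5)

# NV30: 3 vertex shaders, and 4[Pipes]*1[Unit]*2[MADD]*4[Vec] for the pixel shader.
nv30_vs_gflops = gflops(3 * vs_per_clock, NV30_CLOCK_MHZ)
nv30_ps_gflops = gflops(flops_per_clock(pipes=4, units=1, vec_width=4), NV30_CLOCK_MHZ)

# NV35: 4[Pipes]*3[Unit]*2[MADD]*4[Vec] in FP16, and 2/3 of that in FP32.
nv35_ps_fp16_per_clock = flops_per_clock(pipes=4, units=3, vec_width=4)
nv35_ps_fp32_per_clock = nv35_ps_fp16_per_clock * 2 / 3

print(f"NV30 VS: {nv30_vs_gflops:.0f} GFLOPs, PS: {nv30_ps_gflops:.0f} GFLOPs")
print(f"NV35 PS per clock: {nv35_ps_fp16_per_clock} (FP16), {nv35_ps_fp32_per_clock:.0f} (FP32)")
```

Swapping in a different clock gives the figure for another board in the same family, which is all the "extrapolate for clock speed" step amounts to.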
 
Yes - thank you.

However, what is currently confusing me is the NVIDIA graph that puts G70 @ 200GFLOPS, G71 @ 250GFLOPs, G80 @ 350GFLOPs, NV30 @ 15GFLOPs and NV35 @ 40GFLOPs.

...and yet I find frequent reference to G70 as 165GFLOPs. I also suspect the 933GFLOPs figure for GT200 is a different metric.

Regarding the CPU figures, yes, that may be the case regarding double and single precision. The Intel page I drew them from does not specify. However, in a comparison table NVIDIA puts a 3GHz quad-core Core 2 chip @ 96GFLOPs, so I suspect the figures I quoted are indeed double precision...
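
A rough way to sanity-check the single vs double precision question, as a sketch only: peak = cores * clock * flops per cycle per core. The per-cycle values below (8 single precision, 4 double precision for a Core 2 class core with 128-bit SSE) are an assumption not given anywhere in this thread, and the function name is made up for illustration.

```python
def cpu_peak_gflops(cores, clock_ghz, flops_per_cycle_per_core):
    """Peak theoretical GFLOPs = cores * clock (GHz) * flops per cycle per core."""
    return cores * clock_ghz * flops_per_cycle_per_core


# Assumption: a Core 2 class core retires one 128-bit SSE add and one 128-bit
# SSE multiply per cycle, i.e. 8 single precision or 4 double precision flops.
SP_FLOPS_PER_CYCLE = 8
DP_FLOPS_PER_CYCLE = 4

# 3GHz quad-core Core 2, the chip NVIDIA's comparison table rates at 96 GFLOPs.
quad_3ghz_sp = cpu_peak_gflops(4, 3.0, SP_FLOPS_PER_CYCLE)
quad_3ghz_dp = cpu_peak_gflops(4, 3.0, DP_FLOPS_PER_CYCLE)

print(f"single precision: {quad_3ghz_sp:.0f} GFLOPs")
print(f"double precision: {quad_3ghz_dp:.0f} GFLOPs")
```

Under that assumption the single precision result lines up with NVIDIA's 96 GFLOPs and the double precision result with the 48 GFLOPs listed for the QX6850, which would support the double precision reading of the Intel numbers.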
 
7800 GTX's PS peak is ~165GFlops, VS peak is ~34.4GFlops, so the total is indeed ~200GFlops. As a side note, that's a pretty good example of how the ratio between PS:VS flops just kept going up all the time!
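
As a quick illustration of that ratio, here is a small sketch using only figures quoted in this thread (NV30 at 16 PS / 15 VS GFLOPs, 7800 GTX at 165 PS / 34.4 VS GFLOPs). The variable names are just for illustration.

```python
# Peak GFLOPs figures as quoted in the thread (approximate).
peaks = {
    "NV30": {"ps": 16.0, "vs": 15.0},
    "7800 GTX": {"ps": 165.0, "vs": 34.4},
}

# Total is pixel shader plus vertex shader; ratio is PS flops per VS flop.
totals = {chip: p["ps"] + p["vs"] for chip, p in peaks.items()}
ratios = {chip: p["ps"] / p["vs"] for chip, p in peaks.items()}

for chip in peaks:
    print(f"{chip}: total ~{totals[chip]:.1f} GFLOPs, PS:VS ~{ratios[chip]:.1f}:1")
```

The 7800 GTX total comes out just under 200 GFLOPs, and the PS:VS ratio goes from roughly 1:1 on NV30 to nearly 5:1.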
 
Until G80 reset the ratio forever :smile:
 
Also you can prolly argue that before GT200, NVidia's unified GPUs could only issue a MAD per clock, whereas from GT200 onwards it's MAD+MUL. Hence 346GFLOPs for 8800GTX.

Jawed
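
To show how the MAD vs MAD+MUL counting changes the headline number, here is a minimal sketch. The stream processor counts and shader clocks (128 SPs at 1350 MHz for 8800 GTX, 240 SPs at 1296 MHz for GTX 280) are assumptions that are not stated in this thread, and the function name is made up for illustration.

```python
def shader_gflops(stream_processors, shader_clock_mhz, flops_per_sp_per_clock):
    """Peak shader GFLOPs = SPs * shader clock (MHz) * flops per SP per clock / 1000."""
    return stream_processors * shader_clock_mhz * flops_per_sp_per_clock / 1000.0


MAD = 2      # multiply-add counted as 2 flops
MAD_MUL = 3  # multiply-add plus an extra multiply counted as 3 flops

# Assumed specs, not taken from this thread.
G80_SPS, G80_SHADER_MHZ = 128, 1350      # GeForce 8800 GTX
GT200_SPS, GT200_SHADER_MHZ = 240, 1296  # GeForce GTX 280

g80_mad_only = shader_gflops(G80_SPS, G80_SHADER_MHZ, MAD)
g80_mad_mul = shader_gflops(G80_SPS, G80_SHADER_MHZ, MAD_MUL)
gt200_mad_mul = shader_gflops(GT200_SPS, GT200_SHADER_MHZ, MAD_MUL)

print(f"8800 GTX, MAD only: {g80_mad_only:.1f} GFLOPs")
print(f"8800 GTX, MAD+MUL:  {g80_mad_mul:.1f} GFLOPs")
print(f"GTX 280,  MAD+MUL:  {gt200_mad_mul:.1f} GFLOPs")
```

With those assumed specs, the MAD-only count gives the 346 GFLOPs mentioned here, while the MAD+MUL count reproduces the 518 GFLOPs listed earlier in the thread for 8800 GTX and the 933 GFLOPs quoted for GT200, so the two 8800 GTX figures appear to differ only in whether the extra MUL is counted.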
 