Old 12-Jan-2009, 14:14   #1
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default Historical GPU FLOPs performance

Hi guys, I'm after some historical figures for peak theoretical GPU performance in FLOPs. That's peak theoretical rather than sustained.

So, for current chips that's 1.2 TFLOPs for RV770 or 933 GFLOPs for GT200, for example.

Does anyone know of a resource, or a slide published somewhere? I need numbers for:

NV:
GF 3 (does SM1 do flops?)
GF 3 500 (as above)

GF 4 TI 4800 (again, assuming FLOPs possible)

GF FX 5800 U
GF FX 5900 U

GF 6800 U 54 GFLOPs

GF 7800 GTX (I have 165 GFLOPs as a possible number here)
GF 7800 Ultra

ATI:
Rad 8500 (again assuming FLOPs poss)

Rad 9700 Pro
Rad 9800 Pro

Rad X800 XT 66 GFLOPs (X850)

Rad X1800
Rad X1900 (Possibly 426 GFLOPs)

Rad HD 2900 475 GFLOPs

Rad HD 3870 496 GFLOPs

That's it

Any help with any of the above appreciated

Last edited by caboosemoose; 12-Jan-2009 at 14:45.
Old 12-Jan-2009, 14:31   #2
Albuquerque
Red-headed step child
 
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,266
Default

Google?

http://www.letmegooglethatforyou.com/?q=geforce+3+flops
http://www.letmegooglethatforyou.com...rce+4200+flops

Continue ad nauseam...
__________________
"...twisting my words"
Quote:
Originally Posted by _xxx_ 1/25 View Post
Get some supplies <...> Within the next couple of months, you'll need it.
Quote:
Originally Posted by _xxx_ 6/9 View Post
And riots are about to begin too.
Quote:
Originally Posted by _xxx_8/5 View Post
food shortages and huge price jumps I predicted recently are becoming very real now.
Quote:
Originally Posted by _xxx_ View Post
If it turns out I was wrong, I'll admit being stupid
Old 12-Jan-2009, 14:44   #3
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Thanks for the epic sarcasm.

Needless to say, I have spent much of the day on Google trying to fill in the gaps. I have yet to find reliable sources for the above.

Hence the post.

I have found a few possible numbers; I will update as I go along.
Old 12-Jan-2009, 15:05   #4
Albuquerque
Red-headed step child
 
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,266
Default

Well, the first hit on the GeForce 3 search was pretty straightforward. The only difference between the original GeForce 3 and the Ti200/Ti500 was clock speed. Since you know all three clock speeds, it's trivial to extrapolate from there.
Old 12-Jan-2009, 15:13   #5
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

According to the first hit on the GeForce 3 results, the figure is 76 GFLOPs, which surely cannot be right, or is calculated very differently from the method used to come up with 165 GFLOPs for G70, to take one example. G70 has far more than 2.5x the parallel processing power of NV20...

Oh, and NV40 is apparently 54 GFLOPs, making it significantly slower than GF3. Not terribly likely.

It really isn't that easy to find reliable numbers for the early stuff...

Last edited by caboosemoose; 12-Jan-2009 at 15:21.
Old 12-Jan-2009, 15:19   #6
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

I will update the first post as I go along and tidy it all up to be used as a resource when it's finished. These are the numbers I have so far (CPU vs GPU, rounded to the nearest GFLOP; peak theoretical, not sustained):

CPU:
Intel Pentium 4 3.2GHz 6 GFLOPs
Intel Pentium 4 3.4GHz 7 GFLOPs
Intel Pentium 4 670 7 GFLOPs
Intel Pentium D 840 13 GFLOPs
Intel Pentium D 955 14 GFLOPs
Intel Pentium D 965 15 GFLOPs
Intel Core 2 X6800 23 GFLOPs
Intel Core 2 Quad QX6700 43 GFLOPs
Intel Core 2 Quad QX6850 48 GFLOPs
Intel Core 2 Quad QX9770 51 GFLOPs
Intel Core i7-965 51 GFLOPs

GPU:
NVIDIA GeForce 6800 Ultra 54 GFLOPs
ATI Radeon X850 XT 66 GFLOPs
NVIDIA GeForce 7800 GTX 165 GFLOPs
ATI Radeon X1900 426 GFLOPs
NVIDIA GeForce 8800 GTX 518 GFLOPs
NVIDIA GeForce 8800 Ultra 576 GFLOPs
ATI Radeon HD 2900 475 GFLOPs
NVIDIA GeForce 9800 GTX 648 GFLOPs
ATI Radeon HD 3870 496 GFLOPs
NVIDIA GeForce GTX 280 933 GFLOPs
ATI Radeon HD 4870 1.2 TFLOPs
Old 12-Jan-2009, 15:22   #7
Freak'n Big Panda
Member
 
Join Date: Sep 2002
Location: Waterloo Ontario
Posts: 898
Default

If you wanted to be accurate about it, you'd need to take a look at the shader hardware and figure out what each of the GPUs is capable of. When reviewers talk about FLOPs on RV770, G80, or similar DX10-capable GPUs, they're talking about the number of floating-point operations that can be performed in the shader cores per second. The thing is, a bunch of other blocks in a GPU also carry out floating-point operations, so you'll have to define what you mean by FLOPs.
__________________
Random 1MB ISA -> SiS 530 -> SST96 -> STG4000 -> NV20 -> R300 -> R350 -> G70 -> R580 -> RV670 -> RV770

IHV bias meter: ATI[-X--------]NV
Old 12-Jan-2009, 15:32   #8
Albuquerque
Red-headed step child
 
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,266
Default

Indeed. Remember that earlier GPU generations were entirely fixed function, so their peak FLOPs number might well be higher than a comparison with current performance figures would suggest.

Which is exactly where Big Panda's comment rings true: there are lots of operations in a current GPU that aren't covered by the shader core. Which ultimately leads us to the truth: FLOPs is not a good measure of total processor performance under the significant majority of workloads...
Old 12-Jan-2009, 15:33   #9
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Yes, it's a bit of a minefield, hence the post.

However, NV and AMD are pretty consistent about how they quote FLOPs for the later stuff, e.g. 933 GFLOPs for GT200.

I'm looking to fill in the gaps using a broadly similar metric. I'm not looking to do the calculations myself, i.e. I don't want to get into making personal judgements. I just want the peak theoretical rate as the makers of the chips themselves would claim.

I also don't want to get into a debate about how all this translates into real world performance or processing power. I am aware of the pitfalls. I just need to compile a list of the headline, showbiz rates.
Old 12-Jan-2009, 15:39   #10
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Confusingly, according to an NV graph linked below (page 6), NV30 is approx 15 *observed* GFLOPs, G71 is 250 *observed* GFLOPs:

http://developer.download.nvidia.com...pu-physics.pdf
Old 12-Jan-2009, 16:06   #11
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,568
Default

Quote:
Originally Posted by caboosemoose View Post
I will update the first post as I go along and tidy it all up to be used as a resource when it's finished. These are the numbers I have so far (CPU vs GPU, rounded to the nearest GFLOP; peak theoretical, not sustained):

CPU:
Intel Pentium 4 3.2GHz 6 GFLOPs
Intel Pentium 4 3.4GHz 7 GFLOPs
Intel Pentium 4 670 7 GFLOPs
Intel Pentium D 840 13 GFLOPs
Intel Pentium D 955 14 GFLOPs
Intel Pentium D 965 15 GFLOPs
Intel Core 2 X6800 23 GFLOPs
Intel Core 2 Quad QX6700 43 GFLOPs
Intel Core 2 Quad QX6850 48 GFLOPs
Intel Core 2 Quad QX9770 51 GFLOPs
Intel Core i7-965 51 GFLOPs
Aren't they double precision numbers? I thought all those chips were double that in single precision.
__________________
PowerVR PCX1 -> Voodoo Banshee -> GeForce2 MX200 -> GeForce2 Ti -> GeForce4 Ti 4200 -> 9800Pro -> 8800GTS -> Radeon HD 4890 -> GeForce GTX 670 DCUII TOP

8086 8Mhz -> Pentium 90 -> K6-2 233Mhz -> Athlon 'Thunderbird' 1Ghz -> AthlonXP 2400+ 2Ghz -> Core2 Duo E6600 2.4 Ghz -> Core i5 2500K 3.3Ghz
Old 12-Jan-2009, 16:06   #12
MDolenc
Member
 
Join Date: May 2002
Location: Slovenia
Posts: 420
Default

NV30 is terrible when it comes to flops. You really need to know what you count towards this (shader flops, texture filtering flops, ROP flops, ...) to make ANY sense out of it. And even then it's more apples and oranges than anything else.
Old 12-Jan-2009, 16:23   #13
rpg.314
Senior Member
 
Join Date: Jul 2008
Location: /
Posts: 4,218
Default

Be careful about precision, though. Precision has steadily increased in the GPU domain. Earlier programmable GPUs are not even 32-bit FP, let alone IEEE compliant.
Old 12-Jan-2009, 16:27   #14
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,908
Default

GF3/GF4 is 0GFlops for the PS, and 10 flops/MHz per VS (i.e. 10 for GF3, 20 for GF4 - this is Vec4+Scalar MADD). NV30/NV35 is the same for the VS (with 3 pipes vs 1/2), but for the PS it's a bit more complicated. NV30 is 4[Pipes]*1[Unit]*2[MADD]*4[Vec] for the PS, while NV35 is 4[Pipes]*3[Unit]*2[MADD]*4[Vec]. However, the latter is for FP16; in FP32 mode, there isn't enough register bandwidth to do more than 2 MADDs or 1 MADD + 2 MULs (i.e. 2/3rd as many flops). All this means, for example, that NV30 had 16GFlops peak for the PS and 15GFlops peak for the VS...
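
For anyone who wants to check the NV30 arithmetic above, here is a minimal Python sketch. The per-clock factors are the ones quoted in the post; the 500 MHz core clock (GeForce FX 5800 Ultra) is an assumption that the thread does not state, and the helper name `gflops` is made up for illustration.

```python
# Peak-rate sketch for NV30, using the per-clock factors quoted in the post.
# ASSUMPTION: 500 MHz core clock (GeForce FX 5800 Ultra); not stated in the thread.
CLOCK_MHZ = 500


def gflops(flops_per_clock, clock_mhz):
    """Convert flops per clock at a given clock in MHz into GFLOPs."""
    return flops_per_clock * clock_mhz / 1000.0


# Pixel shader: pipes * units * 2 (a MADD counts as 2 flops) * vector width
nv30_ps_per_clock = 4 * 1 * 2 * 4       # 32 flops per clock
nv35_ps_per_clock_fp16 = 4 * 3 * 2 * 4  # 96 flops per clock, FP16 only

# Vertex shader: 10 flops per clock per VS engine (Vec4 + scalar MADD), 3 engines
nv30_vs_per_clock = 10 * 3              # 30 flops per clock

nv30_ps_gflops = gflops(nv30_ps_per_clock, CLOCK_MHZ)
nv30_vs_gflops = gflops(nv30_vs_per_clock, CLOCK_MHZ)
print(nv30_ps_gflops, nv30_vs_gflops)   # prints "16.0 15.0"
```

Under that clock assumption the result lines up with the 16 GFlops PS and 15 GFlops VS peaks given above. The NV35 FP32 rate is left out because the post only gives it as a 2/3 ratio of the FP16 figure.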

Radeon 8500 had two VS engines, but I can't find whether they were Vec4 or Vec5 anywhere; presumably the latter like R300+. Same as for NV2x PS-wise though, 0 flops... Radeon 9000 was the same but only 1 VS engine. R300 had 4 VS, but the PS had 8*4*2 [EDIT: *3, not *2!!!] flops available to it (FP24 obviously). I think you have the right numbers for the other chips and can just extrapolate for clock speed as required, so I won't bother repeating the obvious.

Does this help?
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles)
"[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions."
Old 12-Jan-2009, 16:42   #15
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Yes - thank you.

However, what is currently confusing me is the NVIDIA graph that puts G70 @ 200GFLOPS, G71 @ 250GFLOPs, G80 @ 350GFLOPs, NV30 @ 15GFLOPs and NV35 @ 40GFLOPs.

...and yet I find frequent reference to G70 as 165GFLOPs. I also suspect the 933GFLOPs figure for GT200 is a different metric.

Regarding the CPU figures: yes, that may be the case with double versus single precision; the Intel page I drew them from does not specify. However, in a comparison table NVIDIA puts a 3GHz quad-core Core 2 chip @ 96GFLOPs, so I suspect the figures I quoted are indeed double precision...
Old 12-Jan-2009, 17:04   #16
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,908
Default

7800 GTX's PS peak is ~165GFlops, VS peak is ~34.4GFlops, so the total is indeed ~200GFlops. As a side note, that's a pretty good example of how the ratio between PS:VS flops just kept going up all the time!
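
A rough Python sketch of how those two peaks could be reproduced. The unit counts and the clock are assumptions not given in this thread (24 pixel pipes with 2 Vec4 MADD units each, 8 vertex engines, 430 MHz core), so treat it as a sanity check rather than an official derivation.

```python
# Rough check of the 7800 GTX split between pixel and vertex shader peaks.
# ASSUMPTIONS (not from the thread): 430 MHz core clock, 24 pixel pipes
# with 2 Vec4 MADD units each, 8 vertex engines.
CLOCK_MHZ = 430
PIXEL_PIPES = 24
UNITS_PER_PIPE = 2
VERTEX_ENGINES = 8

# A MADD counts as 2 flops; each pixel unit works on a 4-wide vector.
ps_flops_per_clock = PIXEL_PIPES * UNITS_PER_PIPE * 2 * 4
# 10 flops per clock per vertex engine (Vec4 + scalar MADD), as quoted earlier.
vs_flops_per_clock = VERTEX_ENGINES * 10

ps_gflops = ps_flops_per_clock * CLOCK_MHZ / 1000.0
vs_gflops = vs_flops_per_clock * CLOCK_MHZ / 1000.0
total_gflops = ps_gflops + vs_gflops

print(f"PS {ps_gflops:.1f}  VS {vs_gflops:.1f}  total {total_gflops:.1f}")
```

With those assumed figures the pixel shader side comes out at about 165 GFLOPs and the vertex side at 34.4 GFLOPs, which is where the roughly 200 GFLOPs total comes from.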
Old 12-Jan-2009, 17:13   #17
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Ah yes, that makes sense, thanks.
Old 12-Jan-2009, 17:42   #18
pjbliverpool
B3D Scallywag
 
Join Date: May 2005
Location: Guess...
Posts: 5,568
Default

Quote:
Originally Posted by Arun View Post
7800 GTX's PS peak is ~165GFlops, VS peak is ~34.4GFlops, so the total is indeed ~200GFlops. As a side note, that's a pretty good example of how the ratio between PS:VS flops just kept going up all the time!
Until G80 reset the ratio forever
Old 12-Jan-2009, 17:43   #19
Jawed
Regular
 
Join Date: Oct 2004
Location: London
Posts: 9,948
Default

Also you can prolly argue that before GT200, NVidia's unified GPUs could only issue a MAD per clock, whereas from GT200 onwards it's MAD+MUL. Hence 346GFLOPs for 8800GTX.
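
To make the MAD versus MAD+MUL difference concrete, here is a minimal Python sketch. The stream processor counts and shader clocks are assumptions not stated in the thread (128 SPs at 1350 MHz for 8800 GTX, 240 SPs at 1296 MHz for GTX 280), and `peak_gflops` is a made-up helper name.

```python
# Peak shader rate = stream processors * flops per SP per clock * shader clock.
# ASSUMPTIONS (not from the thread): 8800 GTX = 128 SPs at 1350 MHz,
# GTX 280 = 240 SPs at 1296 MHz.
MAD = 2           # a multiply-add counted as 2 flops
MAD_PLUS_MUL = 3  # multiply-add plus the extra multiply


def peak_gflops(sps, flops_per_sp, shader_mhz):
    """Peak GFLOPs for a given SP count, issue rate and shader clock."""
    return sps * flops_per_sp * shader_mhz / 1000.0


g80_mad_only = peak_gflops(128, MAD, 1350)
g80_mad_mul = peak_gflops(128, MAD_PLUS_MUL, 1350)
gt200_mad_mul = peak_gflops(240, MAD_PLUS_MUL, 1296)

print(round(g80_mad_only), round(g80_mad_mul), round(gt200_mad_mul))
# prints "346 518 933"
```

Under those assumptions the same chip lands on either 346 or 518 GFLOPs depending on whether the extra MUL is counted, which would explain why both figures turn up for 8800 GTX.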

Jawed
Old 12-Jan-2009, 18:13   #20
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

Yes, the NV produced graph I have puts G80 at approx 350 GFLOPs. It's all a bit of a ball ache.
Old 12-Jan-2009, 18:29   #21
Arun
Unknown.
 
Join Date: Aug 2002
Location: UK
Posts: 4,908
Default

Quote:
Originally Posted by Jawed View Post
Also you can prolly argue that before GT200, NVidia's unified GPUs could only issue a MAD per clock, whereas from GT200 onwards it's MAD+MUL. Hence 346GFLOPs for 8800GTX.
And even that is too simple if you wanted to be perfectly honest: G80 can use half the MUL in CUDA and none in 3D, while GT200 can use all the MUL in CUDA but only half in 3D. Yay?
Also, oops, I realized I made a mistake wrt R300's PS and forgot the free ADD; so just multiply that by 1.5x!
Old 12-Jan-2009, 19:35   #22
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342
Default

John Owens at UC Davis has historical data and a graph for this that he shows regularly.
Old 12-Jan-2009, 22:01   #23
Nick
Senior Member
 
Join Date: Jan 2003
Location: Montreal, Quebec
Posts: 1,856
Default

Quote:
Originally Posted by pjbliverpool View Post
Aren't they double precision numbers? I thought all those chips were double that in single precision.
Only the Core 2 numbers should be doubled. They can execute MUL and ADD in parallel while Pentiums only have a single execution port for both (and only half the SIMD width).
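
As a quick illustration of the Core 2 case, here is a minimal Python sketch for a 3 GHz quad-core part. It assumes 128-bit SSE registers (4 single-precision or 2 double-precision lanes) and a MUL and an ADD issued in parallel each clock; the helper name `peak_gflops` is made up for illustration.

```python
# Peak CPU rate = cores * SIMD lanes * ops per lane per clock * clock.
# ASSUMPTIONS: 128-bit SSE registers (4 single or 2 double lanes) and
# MUL and ADD issued in parallel, i.e. 2 ops per lane per clock.
CORES = 4
CLOCK_GHZ = 3.0
OPS_PER_LANE = 2


def peak_gflops(lanes):
    """Peak GFLOPs for the given number of SIMD lanes per register."""
    return CORES * lanes * OPS_PER_LANE * CLOCK_GHZ


single_precision = peak_gflops(4)
double_precision = peak_gflops(2)
print(single_precision, double_precision)  # prints "96.0 48.0"
```

That gives 96 GFLOPs single precision, matching the NVIDIA comparison figure mentioned earlier in the thread, and 48 GFLOPs double precision, matching the QX6850 entry in the list.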
Old 12-Jan-2009, 23:56   #24
KonKort
Junior Member
 
Join Date: Dec 2008
Location: Germany, Ennepetal
Posts: 89
Default

Well, look: Here is a list of all Nvidia and ATI GPUs since Geforce 2 / Radeon 7000 with Flops.
But do not compare the values 1:1. There are many architecture differences.

Nvidia list

ATI list
Old 13-Jan-2009, 01:56   #25
caboosemoose
Member
 
Join Date: Jan 2003
Posts: 294
Default

@ Kon Kort

Thanks - that is exactly what I was after.
All times are GMT +1.