#1 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Hi Guys, I'm after some historical figures for peak theoretical GPU performance in FLOPs - that's peak theoretical rather than sustained.
So, for current chips that's 1.2 TFLOPs for RV770 or 933 GFLOPs for GT200, for example. Does anyone know of a resource, or a slide published somewhere? I need numbers for:

NV:
GF 3 (does SM1 do flops?)
GF 3 500 (as above)
GF 4 TI 4800 (again, assuming FLOPs possible)
GF FX 5800 U
GF FX 5900 U
GF 6800 U: 54 GFLOPs
GF 7800 GTX (I have 165 GFLOPs as a possible number here)
GF 7800 Ultra

ATI:
Rad 8500 (again assuming FLOPs possible)
Rad 9700 Pro
Rad 9800 Pro
Rad X800 XT: 66 GFLOPs (X850)
Rad X1800
Rad X1900 (possibly 426 GFLOPs)
Rad HD 2900: 475 GFLOPs
Rad HD 3870: 496 GFLOPs

That's it. Any help with any of the above appreciated.
Last edited by caboosemoose; 12-Jan-2009 at 14:45. |
|
|
|
|
|
#2 |
|
Red-headed step child
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,084
|
Google?
http://www.letmegooglethatforyou.com/?q=geforce+3+flops
http://www.letmegooglethatforyou.com...rce+4200+flops
Continue ad nauseam...
__________________
"...twisting my words" |
|
|
|
|
|
#3 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Thanks for the epic sarcasm.
Needless to say I have spent much of the day on google trying to fill in the gaps. I have yet to find reliable sources for the above. Hence the post. I have found a few possible numbers, I will update as I go along. |
|
|
|
|
|
#4 |
|
Red-headed step child
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,084
|
Well, the first hit on the GeForce 3 search was pretty straightforward. The only difference between the original GeForce 3 and the Ti200/Ti500 was clock speed. Since you know all three clock speeds, it's trivial to extrapolate from there.
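The extrapolation described here is a linear scaling by core clock, which only holds because the three GeForce 3 variants have the same unit counts. A minimal Python sketch, assuming the commonly cited core clocks (200 MHz for the original GeForce 3, 175 MHz for the Ti 200, 240 MHz for the Ti 500; none of these is given in the thread) and a purely hypothetical baseline of 1.0 so the output reads as a ratio rather than a real GFLOPs figure:

```python
def scale_peak_by_clock(base_gflops: float, base_mhz: float, target_mhz: float) -> float:
    """Scale a peak rate linearly with clock speed.

    Only valid when both chips have identical unit counts, as is
    assumed here for the GeForce 3 family.
    """
    return base_gflops * target_mhz / base_mhz


# Assumed core clocks in MHz (not stated in the thread).
CLOCKS_MHZ = {
    "GeForce 3": 200.0,
    "GeForce 3 Ti 200": 175.0,
    "GeForce 3 Ti 500": 240.0,
}

# Hypothetical placeholder baseline for the original GeForce 3;
# substitute whatever peak figure you trust for that chip.
BASE_GFLOPS = 1.0

for name, mhz in CLOCKS_MHZ.items():
    scaled = scale_peak_by_clock(BASE_GFLOPS, CLOCKS_MHZ["GeForce 3"], mhz)
    print(f"{name}: {scaled:.3f}x baseline")
```

Swap in a real baseline figure once one is agreed on; the ratios do not change.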
__________________
"...twisting my words" |
|
|
|
|
|
#5 |
|
Member
Join Date: Jan 2003
Posts: 294
|
According to the first hit on the GeForce 3 results, the figure is 76 GFLOPs, which surely cannot be right, or is calculated very differently from the method used to come up with 165 GFLOPs for G70, to take one example - G70 has much more than 2.5x the parallel processing power of NV20...
Oh and NV40 is apparently 54 GFLOPs, making it significantly slower than GF3. Not terribly likely. It really isn't that easy to find reliable numbers for the early stuff... Last edited by caboosemoose; 12-Jan-2009 at 15:21. |
|
|
|
|
|
#6 |
|
Member
Join Date: Jan 2003
Posts: 294
|
I will update the first post as I go along and tidy it all up to be used as a resource when it's finished. These are the numbers I have so far (CPU vs GPU - rounded to nearest GFLOP, peak theoretical, not sustained):

CPU:
Intel Pentium 4 3.2GHz: 6 GFLOPs
Intel Pentium 4 3.4GHz: 7 GFLOPs
Intel Pentium 4 670: 7 GFLOPs
Intel Pentium D 840: 13 GFLOPs
Intel Pentium D 955: 14 GFLOPs
Intel Pentium D 965: 15 GFLOPs
Intel Core 2 X6800: 23 GFLOPs
Intel Core 2 Quad QX6700: 43 GFLOPs
Intel Core 2 Quad QX6850: 48 GFLOPs
Intel Core 2 Quad QX9770: 51 GFLOPs
Intel Core i7-965: 51 GFLOPs

Graphics chip:
GeForce 6800 Ultra: 54 GFLOPs
ATI Radeon X850 XT: 66 GFLOPs
NVIDIA GeForce 7800 GTX: 165 GFLOPs
ATI Radeon X1900: 426 GFLOPs
NVIDIA GeForce 8800 GTX: 518 GFLOPs
NVIDIA GeForce 8800 Ultra: 576 GFLOPs
ATI Radeon HD 2900: 475 GFLOPs
NVIDIA GeForce 9800 GTX: 648 GFLOPs
ATI Radeon HD 3870: 496 GFLOPs
NVIDIA GeForce GTX 280: 933 GFLOPs
ATI Radeon HD 4870: 1.2 TFLOPs |
|
|
|
|
|
#7 |
|
Member
|
If you wanted to be accurate about it you'd need to take a look at the shader hardware and figure out what each of the GPUs is capable of. When reviewers talk about flops on RV770, or G80, or similar DX10 capable GPUs, they're talking about the number of floating point operations that can be performed in the shader cores per second. The thing is, a bunch of other blocks in a GPU carry out floating point operations, so you'll have to define what you mean by FLOPs.
__________________
Random 1MB ISA -> SiS 530 -> SST96 -> STG4000 -> NV20 -> R300 -> R350 -> G70 -> R580 -> RV670 -> RV770 IHV bias meter: ATI[-X--------]NV |
|
|
|
|
|
#8 |
|
Red-headed step child
Join Date: Jun 2004
Location: Guess ;)
Posts: 3,084
|
Indeed, remember that earlier GPU generations were entirely fixed function, so the peak FLOPs number might indeed be higher than the comparison to current performance figures might suggest.
Which is exactly where Big Panda's comment rings true: there are lots of operations in a current GPU that aren't covered by the shader core. Which ultimately leads us to the truth: FLOPs is not a good measure of total processor performance under the significant majority of workloads...
__________________
"...twisting my words" |
|
|
|
|
|
#9 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Yes, it's a bit of a minefield, hence the post.
However, NV and AMD are pretty consistent about how they quote FLOPs for the later stuff, eg 933GFLOPs for GT200. I'm looking to fill in the gaps using a broadly similar metric. I'm not looking to do the calculations myself - ie I don't want to get into making personal judgements. I just want the peak theoretical rate as the makers of the chips themselves would claim. I also don't want to get into a debate about how all this translates into real world performance or processing power. I am aware of the pitfalls. I just need to compile a list of the headline, showbiz rates. |
|
|
|
|
|
#10 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Confusingly, according to an NV graph linked below (page 6), NV30 is approx 15 *observed* GFLOPs, G71 is 250 *observed* GFLOPs:
http://developer.download.nvidia.com...pu-physics.pdf |
|
|
|
|
|
#11 | |
|
B3D Scallywag
|
Quote:
__________________
PowerVR PCX1 4MB --> Voodoo Banshee 16MB --> GeForce2 MX200 32MB --> GeForce2 Ti 64MB --> GeForce4 Ti 4200 128MB --> 9800Pro 128MB --> 8800GTS 640MB --> Radeon HD 4890 1GB --> GeForce GTX 670 DirectCU II TOP 2GB |
|
|
|
|
|
|
#12 |
|
Member
Join Date: May 2002
Location: Slovenia
Posts: 420
|
NV30 is terrible when it comes to flops. You really need to know what you count toward this (shader flops, texture filtering flops, ROP flops,...) to make ANY sense out of it. And even then it's more apples and oranges than anything else.
|
|
|
|
|
|
#13 |
|
Senior Member
|
Be careful about precision though. Precision has steadily increased in the GPU domain. Earlier programmable GPUs are not even 32-bit FP, let alone IEEE compliant.
|
|
|
|
|
|
#14 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
GF3/GF4 is 0GFlops for the PS, and 10 flops/MHz per VS (i.e. 10 for GF3, 20 for GF4 - this is Vec4+Scalar MADD). NV30/NV35 is the same for the VS (with 3 pipes vs 1/2), but for the PS it's a bit more complicated. NV30 is 4[Pipes]*1[Unit]*2[MADD]*4[Vec] for the PS, while NV35 is 4[Pipes]*3[Unit]*2[MADD]*4[Vec]. However, the latter is for FP16; in FP32 mode, there isn't enough register bandwidth to do more than 2 MADDs or 1 MADD + 2 MULs (i.e. 2/3rd as many flops). All this means, for example, that NV30 had 16GFlops peak for the PS and 15GFlops peak for the VS...
Radeon 8500 had two VS engines, but I can't find whether they were Vec4 or Vec5 anywhere; presumably the latter like R300+. Same as for NV2x PS-wise though, 0 flops... Radeon 9000 was the same but only 1 VS engine. R300 had 4 VS, but the PS had 8*4*2 [EDIT: *3, not *2!!!] flops available to it (FP24 obviously). I think you have the right numbers for the other chips and can just extrapolate for clock speed as required, so I won't bother repeating the obvious. Does this help?
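The per-clock products quoted above can be turned into GFLOPs by multiplying by the core clock. A rough Python sketch of that arithmetic, assuming a 500 MHz core clock for NV30 (not stated in the post; it is the value that reproduces the quoted 16 and 15 GFLOPs). The function and variable names are just for illustration:

```python
def flops_per_clock(pipes: int, units: int, vec_width: int, flops_per_lane: int = 2) -> int:
    """Peak flops per clock: pipes * units * vector width * flops per lane.

    flops_per_lane defaults to 2 because a MADD counts as a multiply plus an add.
    """
    return pipes * units * vec_width * flops_per_lane


def to_gflops(per_clock: int, core_mhz: float) -> float:
    """Convert flops per clock to GFLOPs at the given core clock."""
    return per_clock * core_mhz / 1000.0


# Vec4 + scalar MADD per vertex shader: 5 lanes * 2 flops = 10 flops per clock.
VS_FLOPS_PER_CLOCK = 10

NV30_CORE_MHZ = 500.0  # assumption, not given in the post

# NV30 pixel shader: 4 pipes * 1 unit * 2 (MADD) * 4 (Vec4) = 32 flops per clock.
nv30_ps_gflops = to_gflops(flops_per_clock(pipes=4, units=1, vec_width=4), NV30_CORE_MHZ)

# NV30 vertex shader: 3 pipes at 10 flops per clock each.
nv30_vs_gflops = to_gflops(3 * VS_FLOPS_PER_CLOCK, NV30_CORE_MHZ)

# NV35 pixel shader: 4 * 3 * 2 * 4 = 96 flops per clock in FP16,
# and 2/3 of that in FP32 because of the register bandwidth limit.
nv35_ps_fp16_per_clock = flops_per_clock(pipes=4, units=3, vec_width=4)
nv35_ps_fp32_per_clock = nv35_ps_fp16_per_clock * 2 // 3

print(nv30_ps_gflops, nv30_vs_gflops)                   # GFLOPs at the assumed clock
print(nv35_ps_fp16_per_clock, nv35_ps_fp32_per_clock)   # flops per clock
```

The NV35 figures are left as per-clock values since no clock is given for it here.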
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
#15 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Yes - thank you.
However, what is currently confusing me is the NVIDIA graph that puts G70 @ 200 GFLOPs, G71 @ 250 GFLOPs, G80 @ 350 GFLOPs, NV30 @ 15 GFLOPs and NV35 @ 40 GFLOPs. ...and yet I find frequent reference to G70 as 165 GFLOPs. I also suspect the 933 GFLOPs figure for GT200 is a different metric. Regarding the CPU figures, yes, that may be the case regarding double and single precision - the Intel page I drew them from does not specify. However, in a comparison table NVIDIA puts a 3GHz quad-core Core 2 chip @ 96 GFLOPs, so I suspect my figures quoted are indeed double precision... |
|
|
|
|
|
#16 |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
7800 GTX's PS peak is ~165GFlops, VS peak is ~34.4GFlops, so the total is indeed ~200GFlops. As a side note, that's a pretty good example of how the ratio between PS:VS flops just kept going up all the time!
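For anyone wanting to check the sums, here is a small Python sketch of how those two figures combine. The unit counts and clock are assumptions not stated in this post (24 pixel pipes with two Vec4 MADD ALUs each, 8 vertex shaders at 10 flops per clock each, and 430 MHz applied to both); they are chosen because they reproduce the ~165 and ~34.4 GFLOPs quoted:

```python
CORE_MHZ = 430.0  # assumption, applied to both PS and VS

# Pixel shader (assumed layout): 24 pipes * 2 ALUs * Vec4 * 2 flops per MADD.
PS_FLOPS_PER_CLOCK = 24 * 2 * 4 * 2   # 384

# Vertex shader (assumed count): 8 units * 10 flops per clock (Vec4 + scalar MADD).
VS_FLOPS_PER_CLOCK = 8 * 10           # 80

ps_gflops = PS_FLOPS_PER_CLOCK * CORE_MHZ / 1000.0
vs_gflops = VS_FLOPS_PER_CLOCK * CORE_MHZ / 1000.0
total_gflops = ps_gflops + vs_gflops

print(f"PS {ps_gflops:.2f}  VS {vs_gflops:.2f}  total {total_gflops:.2f} GFLOPs")
print(f"PS:VS ratio {ps_gflops / vs_gflops:.1f}:1")
```

Under these assumptions the pixel shader accounts for a bit under five times the vertex shader rate.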
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
#17 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Ah yes, that makes sense, thanks.
|
|
|
|
|
|
#18 | |
|
B3D Scallywag
|
Quote:
__________________
PowerVR PCX1 4MB --> Voodoo Banshee 16MB --> GeForce2 MX200 32MB --> GeForce2 Ti 64MB --> GeForce4 Ti 4200 128MB --> 9800Pro 128MB --> 8800GTS 640MB --> Radeon HD 4890 1GB --> GeForce GTX 670 DirectCU II TOP 2GB |
|
|
|
|
|
|
#19 |
|
Regular
|
Also you can prolly argue that before GT200, NVidia's unified GPUs could only issue a MAD per clock, whereas from GT200 onwards it's MAD+MUL. Hence 346GFLOPs for 8800GTX.
Jawed |
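A quick Python sketch of how the choice between counting MAD only and MAD+MUL moves the headline number. The stream processor counts and shader clocks below are assumptions that do not appear in the thread (128 SPs at 1350 MHz for 8800 GTX, 240 SPs at 1296 MHz for GTX 280); they are used because they reproduce the 346, 518 and 933 GFLOPs figures mentioned above:

```python
MAD_FLOPS = 2       # multiply-add counted as two flops
MAD_MUL_FLOPS = 3   # multiply-add plus an extra multiply


def shader_gflops(stream_processors: int, shader_mhz: float, flops_per_sp_per_clock: int) -> float:
    """Peak shader rate in GFLOPs for a unified scalar architecture."""
    return stream_processors * shader_mhz * flops_per_sp_per_clock / 1000.0


# Assumed specs (not given in the thread).
G80_SPS, G80_SHADER_MHZ = 128, 1350.0
GT200_SPS, GT200_SHADER_MHZ = 240, 1296.0

g80_mad_only = shader_gflops(G80_SPS, G80_SHADER_MHZ, MAD_FLOPS)
g80_mad_mul = shader_gflops(G80_SPS, G80_SHADER_MHZ, MAD_MUL_FLOPS)
gt200_mad_mul = shader_gflops(GT200_SPS, GT200_SHADER_MHZ, MAD_MUL_FLOPS)

print(f"8800 GTX, MAD only: {g80_mad_only:.1f} GFLOPs")
print(f"8800 GTX, MAD+MUL:  {g80_mad_mul:.1f} GFLOPs")
print(f"GTX 280, MAD+MUL:   {gt200_mad_mul:.1f} GFLOPs")
```

The same chip lands on either figure depending on whether the extra MUL is counted, which is why both numbers turn up for 8800 GTX.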
|
|
|
|
|
#20 |
|
Member
Join Date: Jan 2003
Posts: 294
|
Yes, the NV produced graph I have puts G80 at approx 350 GFLOPs. It's all a bit of a ball ache.
|
|
|
|
|
|
#21 | |
|
Unknown.
Join Date: Aug 2002
Location: UK
Posts: 4,877
|
Quote:
Also, oops, I realized I made a mistake wrt R300's PS and forgot the free ADD; so just multiply that by 1.5x!
__________________
Focusing on non-graphics projects in 2013 (but I still love triangles) "[...]; the kind of variation which ensues depending in most cases in a far higher degree on the nature or constitution of the being, than on the nature of the changed conditions." |
|
|
|
|
|
|
#22 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
John Owens at UC Davis has historical data and a graph for this that he shows regularly.
|
|
|
|
|
|
#23 |
|
Senior Member
Join Date: Jan 2003
Location: Ottawa, Ontario
Posts: 1,783
|
Only the Core 2 numbers should be doubled. They can execute MUL and ADD in parallel while Pentiums only have a single execution port for both (and only half the SIMD width).
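As a rough illustration of the doubling, here is a Python sketch for the 3 GHz quad-core Core 2 case. It assumes 4 single-precision lanes per 128-bit SSE operation and counts either one operation per cycle or a MUL and an ADD issued in parallel; the function name and parameters are hypothetical:

```python
def cpu_peak_gflops(cores: int, ghz: float, simd_lanes: int, ops_per_cycle: int) -> float:
    """Peak GFLOPs = cores * clock (GHz) * SIMD lanes * SIMD ops issued per cycle."""
    return cores * ghz * simd_lanes * ops_per_cycle


CORES = 4        # quad-core Core 2
CLOCK_GHZ = 3.0
SP_LANES = 4     # assumption: 4 single-precision floats per 128-bit SSE register

# Counting a single SIMD operation per cycle.
one_op = cpu_peak_gflops(CORES, CLOCK_GHZ, SP_LANES, ops_per_cycle=1)

# Counting MUL and ADD issued in parallel.
mul_and_add = cpu_peak_gflops(CORES, CLOCK_GHZ, SP_LANES, ops_per_cycle=2)

print(one_op, mul_and_add)
```

The smaller result lines up with the 48 GFLOPs listed for the QX6850 and the larger with the 96 GFLOPs from the NVIDIA comparison table mentioned earlier in the thread.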
|
|
|
|
|
|
#24 |
|
Junior Member
|
Well, look: Here is a list of all Nvidia and ATI GPUs since Geforce 2 / Radeon 7000 with Flops.
But do not compare the values 1:1. There are many architecture differences. Nvidia list ATI list |
|
|
|
|
|
#25 |
|
Member
Join Date: Jan 2003
Posts: 294
|
@ Kon Kort
Thanks - that is exactly what I was after. |
|
|
|