AMD: R7xx Speculation

Status
Not open for further replies.
So if the Rv770xt won't have a separate shader clock, or core clock at 1050, it will have much less than 1 Tflop?
How did you arrive at that conclusion?

750 x 800 x 2 = 1.2 TFlop

edit - You might have done 750 x 480 x 2 = 720 GFlops
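
For anyone following along, the arithmetic above can be sketched as below. This is only a rough sketch: the function name `peak_gflops` is made up for illustration, and the default of 2 FLOPs per ALU per clock is the assumption behind both lines above, not a confirmed RV770 spec.

```python
def peak_gflops(clock_mhz: float, alus: int, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPs: clock (MHz) x ALU count x FLOPs per ALU per clock."""
    # MHz x FLOPs gives MFLOPs, so divide by 1000 to get GFLOPs.
    return clock_mhz * alus * flops_per_alu_per_clock / 1000.0


# The two cases from the post: 800 ALUs versus 480 ALUs, both at 750 MHz.
print(peak_gflops(750, 800))  # 1200.0 GFLOPs, i.e. 1.2 TFlop
print(peak_gflops(750, 480))  # 720.0 GFLOPs
```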
 
For GPGPU: ~1TFLOP in ATI form costs $200 and in NVidia form costs $600.

Comparing double-precision: 300+ versus ~125 GFLOPs.

For $600 you can have 1 TFLOP of ATI's double-precision or 125 GFLOPs of NVidia's :oops:

Hmm.

Jawed

Which makes it a bit of a shame that ATI didn't also release a high level programming language for GPGPU à la CUDA, but instead focused on a more low level approach to programmability.

It appears (from my limited perspective) that GPGPU is taking off on Nvidia hardware mostly due to the "relative" ease of use of CUDA.

Still, at least it's seen a bit of a resurgence in the professional market for FireGL. I can only imagine FireGL based on Rv770 is going to be even more popular.

And for GPGPU, it's going to be hard for data centers to ignore the power consumption difference between Rv770 and GT200.

Regards,
SB
 
GPU-Dies.png


RV670 -> 14.36 mm x 13.37 mm = 192 mm²
RV770 -> 15.65 mm x 15.65 mm = 245 mm²
G92b ---> 16.4 mm x 16.4 mm = 268 mm²
G92 ----> 18 mm x 18 mm = 324 mm²
G200 --> 24 mm x 24 mm = 576 mm²
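
A quick sanity check of the list above, as a minimal sketch: it multiplies the quoted edge lengths and compares against the quoted areas, then shows each die relative to RV770. The dimensions are just the numbers from this post, and the 1 mm² tolerance is an arbitrary allowance for rounding.

```python
# (width mm, height mm, quoted area mm^2), copied from the list above
dies = {
    "RV670": (14.36, 13.37, 192),
    "RV770": (15.65, 15.65, 245),
    "G92b": (16.4, 16.4, 268),
    "G92": (18.0, 18.0, 324),
    "G200": (24.0, 24.0, 576),
}


def area_mm2(width_mm: float, height_mm: float) -> float:
    """Die area assuming a plain rectangle."""
    return width_mm * height_mm


rv770_area = area_mm2(*dies["RV770"][:2])
for name, (w, h, quoted) in dies.items():
    a = area_mm2(w, h)
    # Flag anything where the quoted area is off by more than rounding.
    ok = "ok" if abs(a - quoted) <= 1.0 else "check"
    print(f"{name:6s} {a:6.1f} mm^2 (quoted {quoted}, {ok}), {a / rv770_area:.2f}x RV770")
```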
 
Which makes it a bit of a shame that ATI didn't also release a high level programming language for GPGPU à la CUDA, but instead focused on a more low level approach to programmability.

AMD Stream Computing
CTM : Close To Metal (Low-level language)
CAL : Compute Abstraction Layer
Brook+ (High-level language / ANSI C / BrookGPU adaptation)
ACML: AMD Core Math Library
APL: AMD Performance Library
COBRA: Video library for video transcoding acceleration
 
w0mbat said:
Why would u introduce a "terascale-engine" to a new gpu series? this only makes sense if both rv770 gpus can reach at least 1 tflops of processing power.
From R3D.

I said the exact same thing here today, apparently that post got 'removed'.

*shifty eyes* :cool:
 
So it is pretty much confirmed that it will have 800 ALUs then? Sounds very nice! 1.2 Tflops is incredible esp. when compared to the 933 Gflops of G200 high end.
 
So it is pretty much confirmed that it will have 800 ALUs then? Sounds very nice! 1.2 Tflops is incredible esp. when compared to the 933 Gflops of G200 high end.

Well, G80 had less than R600, but then reality set in. Same for G92 vs RV670.
Remember that theoretical teraflops numbers don't mean much unless there is an actual advantage in real-world applications.

Still, let's wait and see what RV770 can do. It sure looks promising.
 
First I would like to post all of Ailuros' posts one after another, it might help someone decipher the code...
Well you guys can always start from scratch and try to guess how many clusters RV770 has, which would be a good starting point.
4 or 5 seems common, so let's go out of the box and say 6.
5 FLOPs/ALU in both cases. How about a bit more creative math? ;)
Excuse the typo it should have read 10 FLOPs/ALU for both hypothetical cases (96 for the first and 160 for the second). Both 480 and 800SPs could be theoretically arranged in 5 clusters.
Besides the point that the most important thing about it is that RV700 truly should yield a theoretical peak of ~1 TFLOP/s, that scenario above sounds a bit complicated to my layman's eyes. Assume you'd arrange those 480SPs in 5 clusters, you'd end up with 15 FLOPs per ALU. And no it doesn't have to be 4 or 5 clusters at any price, but that the whole number crunching stuff doesn't lead anywhere either.
By the way ATI never used to be that "fond" of MUL calls from what I recall from the past. If they'd add any single FLOP anywhere ADD would be the most likely candidate.
I knew at some point that the whole processor thing would backslap eventually LOL. If I'd start as a layman I'd say that R6x0/RV6x0 has 4 very "phat" clusters and G8x/9x 8 quite "thin" clusters. So far we've somewhat verified that GT200 contains 10 clusters.

The only thing I can think of that is relatively "creative" would be 80 5D units equaling 400 shaders, each doing 4 Flops at 750MHz, which means 1.2 TFlop.
80/32 = 2.5:1
80/24 = 3.3:1
 
Will even the R700 card, or any of Nvidia's GT200s, need PCI express 2.0 in a single card configuration, or is 1.1 enough still?
 
I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs

Should be more than enough for +50% or more on RV670.

Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.
 
I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs

Should be more than enough for +50% or more on RV670.

Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.
That would work, except that your math assumes 2 Flops per shader; putting 3 Flops in definitely makes it better.
96*5*3*750=1.08TFlop
 
As long as we are taking stabs in the dark... I'll try something creative :)

7 clusters x 16 x 5 = 560 ALUs; 560 x 3 Flops x .625GHz = 1050 GFlops.

7 clusters x 4 TMUs = 28 TMUs
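
To keep the stabs in the dark straight, here is a rough sketch that runs the guesses posted in this thread so far through the same formula (ALUs x FLOPs per ALU per clock x clock). All configurations are speculation from the posts above, and the labels and function name are made up for illustration.

```python
# (label, ALU count, FLOPs per ALU per clock, clock in MHz)
# Every entry is a guess taken from posts in this thread, not a confirmed spec.
guesses = [
    ("96 x 5D, 2 FLOPs", 96 * 5, 2, 750),
    ("96 x 5D, 3 FLOPs", 96 * 5, 3, 750),
    ("80 x 5D, 4 FLOPs", 80 * 5, 4, 750),
    ("7 clusters x 16 x 5, 3 FLOPs", 7 * 16 * 5, 3, 625),
    ("800 ALUs, 2 FLOPs", 800, 2, 750),
]


def peak_gflops(alus: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Theoretical peak in GFLOPs for a given configuration."""
    return alus * flops_per_clock * clock_mhz / 1000


results = {label: peak_gflops(a, f, c) for label, a, f, c in guesses}
for label, gflops in results.items():
    verdict = ">= 1 TFLOP" if gflops >= 1000 else "< 1 TFLOP"
    print(f"{label:30s} {gflops:7.1f} GFLOPs  {verdict}")
```

Only the plain 96 x 5D at 2 FLOPs guess falls short of the TFlop mark; the rest get there by assuming either more ALUs or more FLOPs per ALU per clock.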
 
I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs.

Should be more than enough for +50% or more on RV670.
But that goes against the TFlop target.

Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.
Maybe they got rid of something?

That would work, other than your math of doing 2Flops per shader but putting 3Flops in definitely makes it better.
96*5*3*750=1.08TFlop
I think the key is to work with the 4850's clock (625MHz) to reach the TFlop mark.
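
Working backwards from that clock, a minimal sketch: given a 1 TFlop target and 625MHz, solve for how many ALUs are needed at a given FLOPs-per-clock figure. The helper name `alus_needed` is made up, and the 2 and 3 FLOPs cases are just the two assumptions floated in this thread.

```python
import math


def alus_needed(target_gflops: float, clock_mhz: float, flops_per_alu_per_clock: int) -> int:
    """Smallest ALU count whose theoretical peak reaches the target."""
    # target (GFLOPs) x 1000 = MFLOPs; divide by MFLOPs contributed per ALU.
    return math.ceil(target_gflops * 1000 / (clock_mhz * flops_per_alu_per_clock))


for flops in (2, 3):
    print(f"{flops} FLOPs/ALU/clock at 625 MHz: {alus_needed(1000, 625, flops)} ALUs for 1 TFLOP")
```

At 2 FLOPs per ALU per clock this lands on exactly 800 ALUs, which lines up with the 800 figure being discussed earlier in the thread.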
 