AMD: R7xx Speculation

Arty · Jun 7, 2008

Nuker said:
So if the Rv770xt won't have a separate shader clock, or core clock at 1050, it will have much less than 1 Tflop?

How did you arrive at that conclusion?

750 x 800 x 2 = 1.2 TFlop

edit - You might have done 750 x 480 x 2 = 720 GFlops
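
For anyone following the arithmetic: both figures are just clock x ALU count x FLOPs per ALU per clock. A minimal sketch, assuming 2 FLOPs per ALU per clock (one multiply-add); the function name `peak_gflops` is only illustrative and the ALU counts are the rumoured ones, not confirmed specs:

```python
def peak_gflops(clock_mhz, alus, flops_per_alu_per_clock=2):
    """Theoretical peak in GFLOPs: MHz x ALUs x FLOPs per ALU per clock / 1000."""
    return clock_mhz * alus * flops_per_alu_per_clock / 1000


# Rumoured 800-ALU part at 750 MHz
print(peak_gflops(750, 800))  # 1200.0 GFLOPs = 1.2 TFLOP
# The 480-ALU guess at the same clock
print(peak_gflops(750, 480))  # 720.0 GFLOPs
```

This is theoretical peak only; it says nothing about how much of it a real workload can use.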

Silent_Buddha · Jun 7, 2008

Jawed said:
For GPGPU: ~1TFLOP in ATI form costs $200 and in NVidia form costs $600.

Comparing double-precision: 300+ versus ~125 GFLOPs.

For $600 you can have 1 TFLOP of ATI's double-precision or 125 GFLOPs of NVidia's.

Hmm.

Jawed

Which makes it a bit of a shame that ATI didn't also release a high level programming language for GPGPU a la CUDA, but instead focused on a more low level approach to programmability.

It appears (from my limited perspective) that GPGPU is taking off on Nvidia hardware mostly due to the "relative" ease of use of CUDA.

Still, at least it's seen a bit of a resurgence in the professional market for FireGL. I can only imagine FireGL based on Rv770 is going to be even more popular.

And for GPGPU, it's going to be hard for data centers to ignore the power consumption difference between Rv770 and GT200.

Regards,
SB

Wirmish · Jun 7, 2008

RV670 -> 14.36 mm x 13.37 mm = 192 mm²
RV770 -> 15.65 mm x 15.65 mm = 245 mm²
G92b ---> 16.4 mm x 16.4 mm = 268 mm²
G92 ----> 18 mm x 18 mm = 324 mm²
G200 --> 24 mm x 24 mm = 576 mm²
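
A quick way to check the areas in the list and see each die relative to RV770. This is only a sketch: the edge lengths are the estimates from the list above, not official figures, and the dictionary name `dies_mm` is made up for illustration. Note that G92b works out to 268.96 mm², which the list truncates to 268.

```python
# Estimated die edge lengths in mm (width, height), taken from the list above
dies_mm = {
    "RV670": (14.36, 13.37),
    "RV770": (15.65, 15.65),
    "G92b": (16.4, 16.4),
    "G92": (18.0, 18.0),
    "G200": (24.0, 24.0),
}

# Area is simply width x height
areas = {name: w * h for name, (w, h) in dies_mm.items()}

for name, area in areas.items():
    ratio = area / areas["RV770"]
    print(f"{name}: {area:.1f} mm^2 ({ratio:.2f}x RV770)")
```

On these estimates G200 is roughly 2.35 times the area of RV770.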

Wirmish · Jun 7, 2008

Silent_Buddha said:
Which makes it a bit of a shame that ATI didn't also release a high level programming language for GPGPU a la CUDA, but instead focused on a more low level approach to programmability.

AMD Stream Computing
CTM : Close To Metal (Low-level language)
CAL : Compute Abstraction Layer
Brook+ (High-level language / ANSI C / BrookGPU adaptation)
ACML: AMD Core Math Library
APL: AMD Performance Library
COBRA: Video library for video transcoding acceleration

Shtal · Jun 7, 2008

http://rs322tl3.rapidshare.com/files/120645188/328743/R700.jpg

BRiT · Jun 7, 2008

Shtal said:
http://rs322tl3.rapidshare.com/files/120645188/328743/R700.jpg

You want to download the following file:

http://rapidshare.com/files/120645188/R700.jpg | 193 KB

The download session has expired.

EDIT: Odd, I had to click through a few screens that kept giving the same error, then it started working. The one which worked for me.

Wirmish · Jun 7, 2008

Are you talking about the legendary ATi chip, which will be revealed in approximately 6 months ?

Shtal · Jun 7, 2008

Wirmish said:
Are you talking about the legendary ATi chip, which will be revealed in approximately 6 months ?

lol: I think it's to get a basic idea of how the Radeon 4870X2 (2x RV770) will fight against the huge GT200.

Arty · Jun 7, 2008

w0mbat said:
Why would u introduce a "terascale-engine" to a new gpu series? this only makes sense if both rv770 gpus can reach at least 1 tflops of processing power.

From R3D.

I said the exact same thing here today, apparently that post got 'removed'.

*shifty eyes*

ninelven · Jun 7, 2008

Well I hope they were able to fit at least 40 TMUs with 800 SPs...

Nuker · Jun 7, 2008

So it is pretty much confirmed that it will have 800 ALUs then? Sounds very nice! 1.2 Tflops is incredible esp. when compared to the 933 Gflops of G200 high end.

INKster · Jun 7, 2008

Nuker said:
So it is pretty much confirmed that it will have 800 ALUs then? Sounds very nice! 1.2 Tflops is incredible esp. when compared to the 933 Gflops of G200 high end.

Well, G80 had less than R600, but then reality set in. Same for G92 vs RV670.
Remember that theoretical teraflops numbers don't mean much unless there is an actual advantage in real-world applications.

Still, let's wait and see what RV770 can do. It sure looks promising.

Wirmish · Jun 7, 2008

19 days to go... and we are still in the dark

LordEC911 · Jun 7, 2008

First I would like to post all of Ailuros' posts one after another, it might help someone decipher the code...

Ailuros said:
Well you guys can always start from scratch and try to guess how many clusters RV770 has, which would be a good starting point.

4 or 5 seems common, so let's go out of the box and say 6.

Ailuros said:
5 FLOPs/ALU in both cases. How about a bit more creative math?

Ailuros said:
Excuse the typo it should have read 10 FLOPs/ALU for both hypothetical cases (96 for the first and 160 for the second). Both 480 and 800SPs could be theoretically arranged in 5 clusters.

Ailuros said:
Besides the point that the most important thing about it is that RV700 truly should yield a theoretical peak of ~1 TFLOP/s, that scenario above sounds a bit complicated to my layman's eyes. Assume you'd arrange those 480SPs in 5 clusters, you'd end up with 15 FLOPs per ALU. And no it doesn't have to be 4 or 5 clusters at any price, but that the whole number crunching stuff doesn't lead anywhere either.
By the way ATI never used to be that "fond" of MUL calls from what I recall from the past. If they'd add any single FLOP anywhere ADD would be the most likely candidate.

Ailuros said:
I knew at some point that the whole processor thing would backslap eventually LOL. If I'd start as a layman I'd say that R6x0/RV6x0 has 4 very "phat" clusters and G8x/9x 8 quite "thin" clusters. So far we've somewhat verified that GT200 contains 10 clusters.

The only thing I can think of that is relatively "creative" would be 80 5D units equaling 400 shaders, each doing 4 FLOPs at 750 MHz, which means 1.2 TFlop.
80/32 = 2.5:1
80/24 = 3.3:1
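
The "creative" case above can be written out as a short sketch. It assumes the 32 and 24 in the ratios are TMU counts, which the post does not state outright; the variable names are only illustrative.

```python
units_5d = 80           # hypothetical number of 5D shader units
alus = units_5d * 5     # 400 scalar ALUs
flops_per_alu = 4       # the "creative" 4 FLOPs per ALU per clock
clock_mhz = 750

gflops = alus * flops_per_alu * clock_mhz / 1000
print(f"{alus} ALUs x {flops_per_alu} FLOPs x {clock_mhz} MHz = {gflops:.0f} GFLOPs")

# Ratio of 5D shader units to texture units for two assumed TMU counts
for tmus in (32, 24):
    print(f"{units_5d}/{tmus} = {units_5d / tmus:.1f}:1")
```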

kyetech · Jun 7, 2008

Wirmish said:
19 days to go... and we are still in the dark

lol, good work!

Berek · Jun 7, 2008

Will even the R700 card, or any of Nvidia's GT200s, need PCI express 2.0 in a single card configuration, or is 1.1 enough still?

trinibwoy · Jun 7, 2008

I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs

Should be more than enough for +50% or more on RV670.

Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.

LordEC911 · Jun 7, 2008

trinibwoy said:
I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs

Should be more than enough for +50% or more on RV670.

Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.

That would work, except your math assumes 2 FLOPs per shader; putting in 3 FLOPs definitely makes it better.
96*5*3*750 = 1.08 TFlop

ninelven · Jun 7, 2008

As long as we are taking stabs in the dark... I'll try something creative

7 clusters x 16 x 5 = 560 x 3 Flops x .625GHz = 1050 GFlops.

7 clusters x 4 TMUs = 28 TMUs
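
To put the guesses so far side by side, here is a small sketch that recomputes each one with the same formula (ALUs x FLOPs per ALU per clock x MHz). All the configurations are speculation from the posts above, and the list name `guesses` is just for illustration.

```python
# (poster, scalar ALUs, FLOPs per ALU per clock, clock in MHz, TMUs)
guesses = [
    ("trinibwoy", 96 * 5, 2, 750, 32),
    ("LordEC911", 96 * 5, 3, 750, 32),
    ("ninelven", 7 * 16 * 5, 3, 625, 7 * 4),
]

results = {}
for who, alus, flops, mhz, tmus in guesses:
    gflops = alus * flops * mhz / 1000
    results[who] = gflops
    print(f"{who}: {alus} ALUs, {tmus} TMUs, {gflops:.0f} GFLOPs")
```

Only the two 3-FLOP variants get past the 1 TFLOP mark.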

Arty · Jun 7, 2008

trinibwoy said:
I'm going to play devil's advocate here and go with the simple, boring and realistic:

750Mhz core
96 5D processors
720 Gflops
32 TMUs.

Should be more than enough for +50% or more on RV670.

But that goes against the TFlop target.

trinibwoy said:
Speculating that we will get a 150% increase in shaders and 100% increase in TMUs for a 25% increase in transistors is too far beyond the realm of common sense for me. That would mean RV670 was made mostly of vanilla pudding and they decided to swap it for transistors this time around.

Maybe they got rid of something?

LordEC911 said:
That would work, except your math assumes 2 FLOPs per shader; putting in 3 FLOPs definitely makes it better.
96*5*3*750 = 1.08 TFlop

I think the key is to work with the 4850's clock (625 MHz) to reach the TFlop mark...
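
Working backwards from the target makes the point: a rough sketch of how many ALUs are needed to hit 1 TFLOP at a given clock. The helper name `alus_needed` is hypothetical, and 2 FLOPs per ALU per clock (one multiply-add) is an assumption.

```python
def alus_needed(target_gflops, clock_mhz, flops_per_alu_per_clock=2):
    """ALUs required to reach a theoretical peak at the given clock."""
    return target_gflops * 1000 / (clock_mhz * flops_per_alu_per_clock)


print(alus_needed(1000, 625))     # at the 4850's rumoured 625 MHz, 2 FLOPs per ALU
print(alus_needed(1000, 625, 3))  # same clock, 3 FLOPs per ALU
print(alus_needed(1000, 750))     # at 750 MHz, 2 FLOPs per ALU
```

At 625 MHz with 2 FLOPs per ALU this comes out to exactly 800 ALUs, which lines up with the 800 SP rumour.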
