AMD: R7xx Speculation

Status
Not open for further replies.
@BRiT: Then they would have chosen a higher clock to reach the 1 TFLOPS mark.

@v_rr: Maybe I understood this message, maybe not.
 
Even G200 has the MUL now fully available.

Just imagine how RV770 can reach high processing power w/o any shaderclk.
 
Excuse the typo; it should have read 10 FLOPs/ALU for both hypothetical cases (96 ALUs for the first and 160 for the second). Both 480 and 800 SPs could theoretically be arranged in 5 clusters.
 
For that we have to disregard the separate shader clock theory, don't we?
Definitely.

480:24 (96:24) is the same ALU:TEX as 320:16, 4:1

800:32 (160:32) is 5:1.
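A quick sanity check of those ratios, as a minimal sketch. It assumes each ALU is 5 SPs wide (which is how 480 becomes 96 and 800 becomes 160 above); the function name is made up for illustration.

```python
from fractions import Fraction

def alu_tex_ratio(sps, tex_units, lanes_per_alu=5):
    """ALU:TEX ratio as an exact fraction, assuming 5-wide ALUs."""
    alus = sps // lanes_per_alu
    return Fraction(alus, tex_units)

# Configurations quoted in the thread: (SPs, texture units)
for sps, tex in [(320, 16), (480, 24), (800, 32)]:
    ratio = alu_tex_ratio(sps, tex)
    print(f"{sps}:{tex} -> {sps // 5}:{tex} -> {ratio}:1")
```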

I have to say it is kinda entertaining that this still feels "up in the air".

Something that's worth noting: Vantage P scores should benefit much less than X scores regardless of the scale of changes in RV770. This is simply because P is with AF/AA off. So, I'm wondering if the P scores are a misdirection (being about 1.5x)...

Vantage X scores should be 2x, so the remaining question is, can RV770 achieve that solely through 4xZ per clock? Can RV770 do that with 24 TUs, or does it need 32 TUs?

In case it isn't obvious: I'm ignoring the die size (which is definitely in the region of 250mm2) and entertaining the idea that the damn thing is 800:32.

Someone mentioned 480:32 a while back, so I suppose I should throw it in there.

Jawed
 
MADD + MUL?
3 flops p/ clock?

750 * 3 * 480?

???
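To put numbers on that guess, here is a rough sketch of the usual peak-throughput arithmetic (clock x SPs x FLOPs per clock). The 750 MHz clock and the SP counts are the rumoured figures from this thread, not confirmed specs, and the helper name is hypothetical.

```python
def peak_gflops(clock_mhz, sps, flops_per_clock):
    """Theoretical peak in GFLOPS: every SP issues flops_per_clock each cycle."""
    return clock_mhz * sps * flops_per_clock / 1000.0

# MADD + MUL would be 3 FLOPs per clock; a plain MADD is 2.
scenarios = [
    ("480 SPs, MADD+MUL", 750, 480, 3),
    ("480 SPs, MADD only", 750, 480, 2),
    ("800 SPs, MADD only", 750, 800, 2),
]
for label, mhz, sps, flops in scenarios:
    print(f"{label}: {peak_gflops(mhz, sps, flops):.0f} GFLOPS")
```

Under these assumptions only the MADD+MUL case or the 800 SP case clears 1 TFLOP at 750 MHz.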

Besides the point that the most important thing is that RV700 should truly yield a theoretical peak of ~1 TFLOP/s, the scenario above sounds a bit complicated to my layman's eyes. Assuming you'd arrange those 480 SPs in 5 clusters, you'd end up with 15 FLOPs per ALU. And no, it doesn't have to be 4 or 5 clusters at any price, but this whole number crunching exercise doesn't lead anywhere either.

By the way ATI never used to be that "fond" of MUL calls from what I recall from the past. If they'd add any single FLOP anywhere ADD would be the most likely candidate.
 
Question

Whether it's on the back-end or the front-end (i.e. setup vs. scanout), there is something bottlenecking that slave GPU, which often exhibits longer average frame rendering times than the master.



I disagree. The fewer inter-processor communications there are, the fewer dependencies and potential bottlenecks as well.
Off-topic question.

So far r670 has been seen as a 4-"processor" device and nv92 as a 16-"processor" one.

Could Nvidia's choice hurt here as GPUs become bigger?
Could communication between the different "processors" start to hurt Nvidia more than ATI in the near future (without a major architectural change, obviously)?

Could this add more complexity in software too (with regard to ATI's choices)?
 
I knew at some point that the whole processor thing would backslap eventually LOL. If I'd start as a layman I'd say that R6x0/RV6x0 has 4 very "phat" clusters and G8x/9x 8 quite "thin" clusters. So far we've somewhat verified that GT200 contains 10 clusters.
 
I'm still completely intrigued by the fact that we don't have any solid hardware numbers yet, besides clocks.

And in its own way, if clocks are lower than in previous generations, then ATI must be confident in the rest of the card's ability to make up for the lower core clocks, since that goes against their trend with previous generations.

Anyways, the fact that more than a few people are claiming we'll be pleasantly surprised by the RV770, and the fact that nobody has leaked official hardware #'s on the card yet (when the Gigabyte placards were leaked, the GT200's specs were unveiled while ATI's weren't, besides the obvious 512MB/256-bit piece), seems to tell me that it's being closely hidden...
 
750MHz * 1.4 * ( 96*5 ) * 2 FLOPs per MADD = ~1 TFLOP

If they're still doing the shader AA then having a little extra math power isn't a bad idea. Throw in the 1.4x shader clock domain with 750MHz and they get their teraflop.
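Turning that around: a small sketch that solves for the shader clock needed to hit a teraflop, counting a MADD as 2 FLOPs per SP per clock (an assumption the post does not spell out). The SP counts and the 750 MHz core clock are the rumoured values from this thread; the function name is made up.

```python
def required_shader_mhz(target_gflops, sps, flops_per_clock=2):
    """Shader clock (MHz) needed for target_gflops with the given SP count."""
    return target_gflops * 1000.0 / (sps * flops_per_clock)

CORE_MHZ = 750  # rumoured core clock, not a confirmed spec

for sps in (480, 800):
    mhz = required_shader_mhz(1000, sps)
    print(f"{sps} SPs: {mhz:.0f} MHz shader clock, "
          f"{mhz / CORE_MHZ:.2f}x the {CORE_MHZ} MHz core clock")
```

With 480 SPs the multiplier lands close to the 1.4x mentioned above; with 800 SPs no separate shader domain would be needed at all.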
 
800 confirmed?

24/800? I don't see how they jam 32 TMUs into that space..

16/800?

That's an insane number of SPs.. what the hell is Nvidia doing with all their die? Gobs of texturing, I guess..
 
For GPGPU: ~1TFLOP in ATI form costs $200 and in NVidia form costs $600.

Comparing double-precision: 300+ versus ~125 GFLOPs.

For $600 you can have 1 TFLOP of ATI's double-precision or 125 GFLOPs of NVidia's :oops:

Hmm.

Jawed
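The budget comparison above can be sketched as follows. The prices and double-precision figures are the speculative numbers from the post (300 is used as a floor for "300+"), and the helper is purely illustrative.

```python
def dp_gflops_for_budget(budget_usd, card_price_usd, dp_gflops_per_card):
    """Total double-precision GFLOPS from as many whole cards as the budget buys."""
    cards = budget_usd // card_price_usd
    return cards * dp_gflops_per_card

BUDGET = 600  # USD

ati = dp_gflops_for_budget(BUDGET, 200, 300)     # three $200 cards at 300+ each
nvidia = dp_gflops_for_budget(BUDGET, 600, 125)  # one $600 card at ~125

print(f"ATI:    {ati} DP GFLOPS for ${BUDGET}")
print(f"NVidia: {nvidia} DP GFLOPS for ${BUDGET}")
```

Three cards at 300+ each is where the "1 TFLOP of ATI's double-precision" figure comes from.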
 
So if the RV770XT won't have a separate shader clock, or a core clock at 1050, it will have much less than 1 TFLOP?
 