What do you expect for R650

What do you expect for HD Radeon X2950XTX

  • Faster than G80Ultra by about 25-35% overall

    Votes: 23 16.4%
  • Faster than G80Ultra by about 15-20% overall

    Votes: 18 12.9%
  • Faster than G80Ultra by about 5-10% overall

    Votes: 18 12.9%
  • About same as G80Ultra

    Votes: 16 11.4%
  • Slower than G80Ultra by about 5-10% overall

    Votes: 10 7.1%
  • Slower than G80Ultra by about 15-25% overall

    Votes: 9 6.4%
  • I cannot guess right now

    Votes: 46 32.9%

  • Total voters
    140
80nm is some kind of "half-node". Like 110nm was a half-node, a "shrink" of 130nm. 110nm at TSMC doesn't support low-k. I suppose this complicated ATI's attempts to move GPU designs from 130nm to 110nm. I imagine there are process-specific oddities working with 80nm - these might explain some of what's happened with R600.

Bear in mind that RV570 (and "RV560") are 80nm too. How does that behave?

Jawed
 
I'm not sure I buy the whole "there isn't a problem with AA resolve" stuff that AMD is sending out. The documents point to it being there yet it seems rather worthless. You'd think they'd have completely cut it from the chip if they wanted the ALUs doing the math.
You still want alpha-blending, z-compare and such like render target operations to be done in fixed-function hardware, I guess.

And on top of that, why not add additional ALUs to compensate for the loss of fixed-function hardware?
AA resolve consumes something like 10% of ALU power, it seems (at around 60fps). Tented AA resolve costs more, but then higher-IQ always costs...

Jawed
 
AA resolve consumes something like 10% of ALU power, it seems (at around 60fps). Tented AA resolve costs more, but then higher-IQ always costs...

Jawed

10% for R600?

I hope something's done for the RV cores.

That's >30% for RV630, if working with the same scene and settings.
 
10% for R600?

I hope something's done for the RV cores.

That's >30% for RV630, if working with the same scene and settings.
Yeah, RV630 is gonna be crucified running 4xMSAA at 2560x1600 :p

Now, why AA performance on R600 is all over the shop, well I dunno, is it even worth speculating?

Jawed
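The ">30% for RV630" figure above can be sanity-checked with a back-of-envelope scaling. This is a rough sketch under stated assumptions: it takes the ~10% R600 figure from the posts above at face value and assumes resolve cost scales inversely with ALU throughput (unit counts at equal clocks used as a crude proxy, since final RV630 clocks weren't public at the time).

```python
# Rough scaling of shader-based AA-resolve cost from R600 to a smaller part.
# Assumption: for the same scene and settings, the fractional ALU cost of
# resolve scales inversely with shader throughput.

def relative_resolve_cost(base_cost, base_units, target_units):
    """Scale a fractional ALU cost by the ratio of shader unit counts."""
    return base_cost * (base_units / target_units)

# R600: 320 stream processors; RV630 rumoured at 120.
r600_cost = 0.10  # ~10% of ALU power, per the post above
rv630_cost = relative_resolve_cost(r600_cost, 320, 120)
print(f"RV630 resolve cost: ~{rv630_cost:.0%}")
```

At equal clocks this gives roughly 27%, the same ballpark as the ">30%" guess once clock and efficiency differences are factored in.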
 
Beats me.

The claim was that the link from the ROPs back to the SIMDs is meant for this kind of thing, though how well that works puzzles me, given there are 4 ROP clusters for R600's 4 SIMDs, while there's 1 ROP cluster for 3 half-SIMDs on RV630.

Actually, it makes sense that the ROP count is so low if you take AA resolve into account.

ROP unit granularity is coarser than shader unit granularity, so the reduced ROP count ends up at either 1/2 or 1/4 of R600's.

Halving the ROPs on the heavily pared-down RV630 sounds like a 15-20% penalty for resolve.

That would kind of put a dent in ATI's aim for "quality pixels".
 
I think it's worth remembering that RV630 is nothing more than this gen's version of RV530 (which is "1/4 of R580" ... I have high hopes for R650...). It's not meant to produce enthusiast-class IQ and performance. That's what R600 is for (fingers-crossed on the drivers...).

It'll manage enthusiast-class IQ and performance (hopefully) in older games.

Jawed
 
Um... Maybe I'm forgetting something, but how does decreasing feature size reduce current leakage? :???:

The way I was taught, current leakage is a bigger problem as you decrease transistor dimensions. The reduction in the threshold voltage causes an exponential rise in the subthreshold leakage current. And then there's gate oxide leakage due to tunneling; this can be alleviated by using oxides of differing dielectric constant, or other designs such as dual-gate or Intel's tri-gate design, but each has its own trade-offs for manufacturing.
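The exponential dependence described above can be sketched numerically. The parameter values here (slope factor n, thermal voltage, the 50mV shift) are illustrative textbook figures, not measurements of any particular process:

```python
import math

# Subthreshold leakage is roughly proportional to exp(-Vth / (n * VT)),
# where VT = kT/q is the thermal voltage (~26 mV at room temperature)
# and n is the subthreshold slope factor (typically ~1.3-1.7).
def subthreshold_leakage_ratio(delta_vth_mV, n=1.5, vt_mV=26.0):
    """Leakage multiplier when the threshold voltage drops by delta_vth_mV."""
    return math.exp(delta_vth_mV / (n * vt_mV))

# An illustrative 50 mV threshold reduction multiplies leakage by ~3.6x:
print(f"{subthreshold_leakage_ratio(50):.1f}x")
```

This is why shrinking the node tends to make leakage worse, not better, absent process tricks like the ones in the linked papers.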

Anyways, a couple papers on leakage: http://ieeexplore.ieee.org/iel5/92/28328/01266405.pdf
http://domino.research.ibm.com/acas/w3www_acas.nsf/images/projects_02.03/$FILE/02blaauw.pdf

I must be missing something...



The R666 should be a performance demon. :yep2:
It's not all about feature size. There are various tweaks made to each process. So you could have 65nm geared to low power or high speed.
 
I was thinking: if we can reasonably assume that R650 on 65nm will be clocked at around 900MHz by default, while drawing less power and producing less heat, then a CrossFire setup could probably score around 23.4k in 3DMark06.

For example, look at an overclocked HD 2900 XT CrossFire setup at 900MHz GPU clock:
http://kinc.bandwidth.se/23k2.PNG

But if R650 has 96 ALUs, 32 TMUs and 16 ROPs and runs at 900MHz, you could probably get ~27.0-28.0k points in a CrossFire setup in 3DMark06.


[Edit: With R650 on 65nm you could probably overclock the GPUs in CrossFire to around ~1060MHz, which would boost the 3DMark score even higher (possibly an extra two thousand points).]
 
Maybe 16 texture-filtering units aren't enough, but if ATi initially planned to clock R600 at ~950MHz, its texture fillrate would be almost 50% higher than X1950XTX's. At 740MHz R600 has poor fillrate and good ALU performance, but at ~950MHz it would have quite good fillrate and amazing ALU performance.

fillrate of a speculative R650 with 24 "TMUs" at 950MHz is 2x higher (vs X1950XTX)
fillrate of a speculative R650 with 32 "TMUs" at 950MHz is 3x higher (vs X1950XTX)

Does R650 really need 32 texture units? Wouldn't 96/32 make the die even larger than R600's?
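The fillrate comparisons above check out with simple arithmetic, taking X1950XTX as 16 filtering units at 650MHz:

```python
# Peak texture fillrate = filtering units * core clock (MTexels/s, bilinear).
def fillrate(tmus, mhz):
    return tmus * mhz

x1950xtx = fillrate(16, 650)         # 10400 MTexels/s
print(fillrate(16, 950) / x1950xtx)  # R600 at ~950MHz: ~1.46x, "almost 50% higher"
print(fillrate(24, 950) / x1950xtx)  # speculative 24-TMU R650: ~2.2x
print(fillrate(32, 950) / x1950xtx)  # speculative 32-TMU R650: ~2.9x, roughly 3x
```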
 
Maybe 16 texture-filtering units aren't enough, but if ATi initially planned to clock R600 at ~950MHz, its texture fillrate would be almost 50% higher than X1950XTX's. At 740MHz R600 has poor fillrate and good ALU performance, but at ~950MHz it would have quite good fillrate and amazing ALU performance.

fillrate of a speculative R650 with 24 "TMUs" at 950MHz is 2x higher (vs X1950XTX)
fillrate of a speculative R650 with 32 "TMUs" at 950MHz is 3x higher (vs X1950XTX)

Does R650 really need 32 texture units? Wouldn't 96/32 make the die even larger than R600's?

I for one very much doubt that ATI was realistically targeting 950MHz. Regarding 24 vs 32 TMUs, that would depend on whether R650 uses R600 (4:1 ALU:TEX ratio) or RV630 (3:1) as the template for the next design.
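The two templates map directly onto the 24-vs-32 question. Taking the rumoured 96 VLIW ALUs as a given (hypothetical configs, per the thread's speculation):

```python
# TMU count implied by each ALU:TEX ratio for a 96-ALU R650.
def tmus_for(alus, alu_tex_ratio):
    return alus // alu_tex_ratio

print(tmus_for(96, 4))  # R600-style 4:1  -> 24 TMUs
print(tmus_for(96, 3))  # RV630-style 3:1 -> 32 TMUs
```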
 
Does R650 really need 32 texture units? Wouldn't 96/32 cause die size even larger than die size of R600?

Based on what ATI thinks, the answer is yes, but I'm not sure how it will affect die size.

Our sources confirmed that R650 will have a slightly redesigned core and that the main marchitecture difference will be the texture capability of the GPU. http://www.fudzilla.com/index.php?option=com_content&task=view&id=1074&Itemid=1

It doesn't say, though, how many it will have: 24, 32 or 64 TMUs.
Somewhere else, up to 96 ALUs was mentioned as well.
 
You mean, this:

I am more interested in how it would do against G80 with 1955MHz shader core, 1.21 gigahertz RAM and 88 TMUs.

...in response to your inquiry regarding how a hypothetical 64 TMU, 840/1000MHz "R650" would perform against G80 Ultra/GTX.
BTW, I salute everyone who got the reference. You know who you are... geeks. ;)
 
Maybe 16 texture-filtering units aren't enough, but if ATi initially planned to clock R600 at ~950MHz, its texture fillrate would be almost 50% higher than X1950XTX's. At 740MHz R600 has poor fillrate and good ALU performance, but at ~950MHz it would have quite good fillrate and amazing ALU performance.

fillrate of a speculative R650 with 24 "TMUs" at 950MHz is 2x higher (vs X1950XTX)
fillrate of a speculative R650 with 32 "TMUs" at 950MHz is 3x higher (vs X1950XTX)

Does R650 really need 32 texture units? Wouldn't 96/32 make the die even larger than R600's?

Actually, didn't ATI state that they clocked their cards beyond expectations? I believe they were targeting the GTX, but didn't realise that the G80 information they had was something along the lines of a GTS-spec G80 or a cut-down engineering sample.

So to compete effectively they had to clock their core a bit higher. (I think they were aiming somewhere between 600MHz and 700MHz, maybe even lower, not quite sure.)

So basically, I think R650 could well just be a 65nm part with a core clock boost, maybe in combination with GDDR4. The TMU count could increase, but wouldn't that upset their R600 architectural layout (and the ALU:TEX ratio)?

Whether they bring AA resolve back to the ROPs is another question, because even without the high core clock boosts, I think R600 itself would have done admirably better with AA resolve on the ROPs rather than on the shaders, which kills performance, as seen in a lot of benchmarks.
 
I dunno, the only thing R600 should have been competing with was R580. It should have been circa 2x faster across the board. Sometimes it is.

The fact that it's not hard to find cases where R600 is slower proves that something else is up...

What is more damning is that R600 was "specified" to be able to use ~120GB/s of bandwidth, with the combination of GDDR4 and the 512-bit bus. It seems pretty obvious that R600 simply isn't capable of using all that bandwidth. It's either rottenly inefficient compared with G80 which makes hay with 86.4GB/s, the drivers are shit or it's not got enough bandwidth-consuming units to make use of all those GB/s. Or some combination thereof.

Jawed
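The bandwidth figures above follow from bus width times effective data rate. G80 GTX's numbers are its shipping spec; the GDDR4 rate here is an assumption chosen to match the ~120GB/s the post quotes:

```python
# Memory bandwidth in GB/s = (bus width in bits / 8) * effective data rate in Gbps/pin.
def bandwidth_gb_s(bus_bits, gbps_per_pin):
    return bus_bits / 8 * gbps_per_pin

print(bandwidth_gb_s(384, 1.8))    # G80 GTX: 384-bit @ 1.8 Gbps/pin = 86.4 GB/s
print(bandwidth_gb_s(512, 1.656))  # HD 2900 XT GDDR3: ~106 GB/s
print(bandwidth_gb_s(512, 1.875))  # 512-bit with ~1.9 Gbps GDDR4 = 120 GB/s
```

So the 512-bit bus only needs fairly modest GDDR4 to hit the "specified" figure, which makes it all the stranger that R600 doesn't pull ahead of a card with ~30% less bandwidth.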
 
It's either rottenly inefficient compared with G80 which makes hay with 86.4GB/s, the drivers are shit or it's not got enough bandwidth-consuming units to make use of all those GB/s. Or some combination thereof.

Well, it should be rather easy to verify whether R600 uses all its bandwidth. Just "underclock" the memory a bit and see what happens to performance.
 
Well, it should be rather easy to verify whether R600 uses all its bandwidth. Just "underclock" the memory a bit and see what happens to performance.
What? Empirical testing to confirm hypotheses? We can't have the scientific method in here! :LOL:

(Seriously, though, somebody needs to try that. I think it will confirm exactly what Jawed said, but it can't hurt.)
 