AMD: R7xx Speculation

Status
Not open for further replies.
If there is any part of a graphics design that should be as custom as possible, it's the ALU pipes themselves. It's very, very high bang for the work: basically you design 1 ALU, slice it across the SIMD block, and then replicate that for the multiple SIMD blocks. It's also something that is relatively easy to lay out and extremely regular.
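To make the "design 1 ALU, slice it, replicate it" point concrete, here is a minimal sketch that counts how many copies of a single ALU design end up on the chip. The per-chip figures are the commonly quoted ones (R600: 4 SIMDs of 16 five-wide units; G80: 8 clusters of 16 scalar SPs), and the function name is purely illustrative.

```python
# Hypothetical sketch: how one ALU design gets reused across a GPU.
# Figures are the commonly quoted R600 / G80 configurations.

def total_alus(alus_per_unit: int, units_per_block: int, blocks: int) -> int:
    """Count ALU instances: one ALU, sliced across a block, block replicated."""
    per_block = alus_per_unit * units_per_block  # slice across the SIMD block
    return per_block * blocks                    # replicate the block

r600 = total_alus(alus_per_unit=5, units_per_block=16, blocks=4)  # 5-wide units
g80 = total_alus(alus_per_unit=1, units_per_block=16, blocks=8)   # scalar SPs

print(r600, g80)  # 320 128
```

Every one of those instances is the same circuit, which is why effort spent hand-optimizing the one ALU pays off hundreds of times over.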

In addition, synthesis tools tend to suck balls at ALU pipes, whereas they come much closer to hand-drawn results for random logic.

I've been a big proponent of all ALUs being full custom, especially in light of the way Nvidia did it, allowing 2x ALU rates.

Aaron Spink
speaking for myself inc.


Maybe but it still comes out with the same or less shading power than R600.

It just seems like a big trade-off that didn't *really* gain them anything. Now they've got half as many double-clocked units, so it basically seems like...
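The "half as many double-clocked units" point comes down to peak throughput being units × clock × flops per clock. A minimal sketch, assuming the commonly quoted 8800 GTX (128 SPs on a 1.35 GHz shader clock) and HD 2900 XT (320 ALUs at 742 MHz) specs, and counting a MAD as 2 flops (G80's co-issued MUL is ignored here):

```python
import math

def peak_gflops(alus: int, clock_ghz: float, flops_per_clock: int = 2) -> float:
    """Peak rate: every ALU issues one MAD (2 flops) per clock."""
    return alus * clock_ghz * flops_per_clock

g80 = peak_gflops(128, 1.35)    # 8800 GTX: 128 SPs, 1.35 GHz shader clock
r600 = peak_gflops(320, 0.742)  # HD 2900 XT: 320 ALUs at 742 MHz

print(f"G80 {g80:.1f} GFLOPS, R600 {r600:.1f} GFLOPS")

# Halving the unit count while doubling the clock leaves peak rate unchanged.
print(math.isclose(peak_gflops(64, 2.7), peak_gflops(128, 1.35)))  # True
```

On these MAD-only numbers G80 does come out below R600, which is the sense in which the double clocking didn't buy more raw shading power; the replies argue the gain was in transistor budget instead.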
 
It gained them transistor budget to do other things that gave them the advantage. That's why a 500M transistor G94 can run with the 666M transistor RV670 in many cases.
 
Specifically, I was referring to Crysis and HL2:E2. Synthetics aren't really that interesting to me atm, especially synthetics in high-visibility benchmarks!

Aaron Spink
speaking for myself inc.



EP2 is probably the most non-shader intensive "Blockbuster" FPS. ;)

Crysis is the opposite, being shader-power limited in cases with G94 vs G92. *But* that is with nVidia's safety-net TMU provisions in place. RV670 is probably bottlenecked by some other inadequacy, I reckon.
 
It gained them transistor budget to do other things that gave them the advantage. That's why a 500M transistor G94 can run with the 666M transistor RV670 in many cases.

But the RV670 still has a significantly smaller die than G94 (192mm^2 vs 240mm^2), so it's arguably more efficient, or at least as efficient, even allowing for the process difference.

I don't put much stock in transistor count because the companies count differently. Die size is a more objective measurement.

I guess it's valid to believe it gained Nvidia transistor budget if you want to; I just see it as ATI's unbalanced architecture (TMU deficit) holding ATI back.

If RV770 really does have 32 TMUs, lots of shaders, and as small a die as claimed, it would bear this out.
 
A good full custom design can be much faster (and smaller) than a synthesized design, especially for datapaths, but probably not because of better clock distribution, since that is more a function of your clock distribution network (and I think tools should be able to balance paths reasonably well, but of course I've never used them. Full custom ftw! :p). You could argue that making the ALU smaller makes clock distribution easier, but at the scale of a single ALU I doubt that is a huge issue unless you are really pushing the limits in terms of clocking.
I was assuming (no, not making an ass out of u and me ;)) the ALUs on G80 to be kind of a single block with much of the supporting logic grouped around them.

btw: Full Custom - you're involved in telecom somehow?
 
But the RV670 still has a significantly smaller die than G94 (192mm^2 vs 240mm^2), so it's arguably more efficient, or at least as efficient, even allowing for the process difference.

I don't put much stock in transistor count because the companies count differently. Die size is a more objective measurement.

Yeah, I think that's a fair assessment. Overall die size is definitely more objective than proprietary transistor-counting methods. But I don't think it's really a useful measure of architectural efficiency.

Efficiency relates to the use of resources, and I don't think die size is necessarily a meaningful resource in this case. For example, what if a lot of transistor budget goes toward making better use of fewer available processing resources instead of just adding more of those resources? Which approach would you consider more efficient? Just a general question, not necessarily referring to any particular comparison.
 
I was assuming (no, not making an ass out of u and me ;)) the ALUs on G80 to be kind of a single block with much of the supporting logic grouped around them.

btw: Full Custom - you're involved in telecom somehow?

Not sure how you define "ALU" and "supporting logic". But the ALU datapath block should already have all the pipeline registers and clocking stuff built in. You will probably only have one or a few buffered clock inputs, and your lowest-level clock distribution is already included in the ALU block.

And no, I'm a n00b still looking for a job, so don't take my words as gospel. I have done a full custom ALU for a project though (all the way to hand-drawn layouts), so I at least know something about it. Compared to the synthesized version, it was something like 2x faster and 2x smaller. But of course, synthesizing the design takes about a day or so; the full custom layout took a month :p
 
By "ALU block" I meant something like everything except the dispatcher, TMUs, memory controller and that sort of stuff. And yes, that includes the ALUs' register file(s) and clock buffers.
 
So you mean the math circuits and the register files? Or just the math circuits (including pipeline registers)? The register file won't be in the same basic block as the math circuits at the lower levels. The building blocks of each "shader group" would probably be something like: math block, register file/memory blocks/caches, dispatcher, TMU, crossbar interface, etc.

Also, I'm a little lost on what you were asking now...
 
G94 at 55nm? Is this combination even possible? The die wouldn't be large enough for a 256-bit memory interface…
 
RV670 is 55nm, G94 is still 65nm.
G94 @ 55nm should be quite a bit smaller than RV670.

Ehh, RV670 is 20% smaller...

I don't know what a typical half-node does, but I wouldn't think a whole lot more than that...

Sometimes even a full node doesn't seem to gain that much size reduction these days (wasn't Cell only like 17% smaller initially @65nm?)
 
55nm is a 10% linear shrink in each dimension, so it's 19% smaller. In practice I think if both G94 and RV670 were on the same process, RV670 would be smaller by a few percentage points or less, as CJ said. Still, given that R600 would most likely have been bigger than G80 on the same process, that's quite an impressive achievement! (Slightly smaller than G94, massively smaller than G92.)
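The shrink arithmetic above can be checked directly. A minimal sketch using only figures from this thread (a 10% linear shrink, RV670 at 192mm^2 on 55nm, G94 at 240mm^2 on 65nm) and assuming an ideal optical shrink, which real layouts rarely achieve exactly:

```python
LINEAR = 0.90               # 10% linear shrink per dimension (from the post)
AREA = LINEAR ** 2          # 0.81, i.e. 19% less area

rv670_55nm = 192.0          # mm^2
g94_65nm = 240.0            # mm^2

rv670_at_65nm = rv670_55nm / AREA   # RV670 scaled back up to 65nm
g94_at_55nm = g94_65nm * AREA       # G94 ideally shrunk to 55nm

print(f"area factor: {AREA:.2f}")
print(f"raw gap: RV670 is {1 - rv670_55nm / g94_65nm:.0%} smaller")
print(f"same-process gap: {1 - rv670_at_65nm / g94_65nm:.1%} smaller")
print(f"G94 @ 55nm (ideal): {g94_at_55nm:.1f} mm^2")
```

This prints a 20% raw gap but only about a 1.2% gap once both chips are put on the same process, consistent with "a few percentage points or less".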
 

Just how exactly (theoretically speaking, if G94 were on 55nm) would RV670 be smaller than G94 with 166 million more transistors? I guess it's due to the size difference and transistor layout?
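One way to frame the question is transistor density. A rough sketch using the thread's figures (666M transistors in 192mm^2 for RV670, 500M in 240mm^2 for G94) and an assumed ideal 0.81 area factor for a hypothetical 55nm G94; as noted earlier in the thread, the two companies may count transistors differently, so treat this as approximate:

```python
def density(mtransistors: float, area_mm2: float) -> float:
    """Millions of transistors per mm^2."""
    return mtransistors / area_mm2

chips = {
    "RV670 (55nm)": (666, 192.0),
    "G94 (65nm)": (500, 240.0),
    "G94 (hypothetical ideal 55nm)": (500, 240.0 * 0.81),
}

for name, (mtr, area) in chips.items():
    print(f"{name}: {density(mtr, area):.2f} M/mm^2")
```

Even after the ideal shrink, RV670 packs roughly a third more transistors per mm^2 (about 3.47 vs 2.57), so the answer has to lie in layout/density differences or in how the transistors are counted, not in the process alone.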
 
RV670 X3?

Fudo posted this link to Nordichardware; it's an Asus RV670 X3 on a single card using 3 MXM modules and water cooling :oops:

The link is here for Nordichardware...
http://www.nordichardware.com/news,7543.html

If the X3 RV670 is possible on a single card, might we see 3 RV770 chips on a single card too? Might CrossFireX be possible with 6 GPUs?
 
Just how exactly (theoretically speaking, if G94 were on 55nm) would RV670 be smaller than G94 with 166 million more transistors? I guess it's due to the size difference and transistor layout?


Well, you still have to have transistors for routing data on the larger processes. I remember the G70 to G71 shrink was one place where nV shaved off some transistors; I don't know how many, but it could be significant.

Hmm, looking at the figures now, I'm willing to bet the control logic in the G8x and G9x chips is significantly larger than in their AMD counterparts. Plus, of course, more TMUs and ROPs, and more robust TMUs as well.
 
55nm is a 10% linear shrink in each dimension, so it's 19% smaller.
RV620 is about 23.3% smaller than RV610 (despite RV620 being 1M transistors "bigger").

Applied to G94, a 55nm version would be about 182mm^2. nVidia's smallest 256-bit GPU is G71, which is 196mm^2...
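The estimate applies the observed RV610 to RV620 area reduction instead of the ideal 19%. A quick sketch, assuming the 240mm^2 G94 figure quoted earlier in the thread as the starting point; that base gives about 184mm^2, close to the ~182mm^2 above (the small difference likely comes from the G94 area used):

```python
OBSERVED_SHRINK = 0.233   # RV610 -> RV620 area reduction (from the post)
g94_65nm = 240.0          # mm^2, figure quoted earlier in the thread
g71 = 196.0               # mm^2, nVidia's smallest 256-bit GPU per the post

g94_55nm = g94_65nm * (1 - OBSERVED_SHRINK)
print(f"G94 @ 55nm ~ {g94_55nm:.0f} mm^2 vs G71 {g71:.0f} mm^2")
print("smaller than G71:", g94_55nm < g71)  # True
```

Either way the result lands below G71, which is the point about whether a 55nm G94 could still fit a 256-bit interface.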
 
If the X3 RV670 is possible on a single card, might we see 3 RV770 chips on a single card too? Might CrossFireX be possible with 6 GPUs?

Hehe, sorry, I couldn't help smiling at the thought of trying to use AFR with 6 GPUs. Every 6 frames a game could take an input from my mouse/keyboard. Or it would have to discard quite a few frames, since input arriving during those 6 frames causes most of them to be discarded and a new batch of 6 frames to be lined up for processing. :LOL:

Regards,
SB
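The AFR latency problem can be sketched with an idealized model, assuming perfect scaling and no extra frame queuing (both assumptions): each GPU still needs the full single-GPU frame time, the GPUs are staggered so a frame is shown every frame_time / N, and input is sampled when a frame starts. The function name is hypothetical.

```python
def afr_timing(single_gpu_frame_ms: float, gpus: int) -> tuple[float, float]:
    """Idealized AFR: returns (display interval, input-to-display latency) in ms."""
    display_interval = single_gpu_frame_ms / gpus  # staggered GPUs raise fps
    input_to_display = single_gpu_frame_ms         # each frame still takes full time
    return display_interval, input_to_display

for n in (1, 2, 6):
    interval, latency = afr_timing(60.0, n)
    print(f"{n} GPUs: {1000 / interval:.0f} fps, latency {latency:.0f} ms "
          f"= {latency / interval:.0f} displayed frames")
```

With a 60ms single-GPU frame, 6 GPUs give 100 fps but input still takes 60ms to show up, i.e. 6 displayed frames behind, which is exactly the effect being joked about.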
 