AMD: R7xx Speculation

Status
Not open for further replies.
The strange thing about it though, wouldn't we be right back to ROP/TMU limitation? I wonder if such a massive increase in shaders would be to help with AA because of the way it's done in the R600/R700 arch?

I would believe it if they said it has 24 ROPs and 64 TMUs :D
 
R520 -> R580 = +18% of die-space & 3-times more ALUs
RV670 -> RV770 (rumoured) = +30% of die-space & 2.5-times more ALUs + 16 tex. units (or just tex. filters?)

Why should it be impossible? ALU:TEX would only rise slightly (4:1 -> 5:1), and that's quite logical: R600 will in fact be more than 1.5 years old this summer.
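
A quick sanity check of that ratio, as a rough sketch only. It assumes the unit counts quoted elsewhere in this thread (64 5D ALUs and 16 TMUs for RV670, a rumoured 160 5D ALUs and 32 TMUs for RV770); the function name is made up for illustration and none of the RV770 figures are confirmed.

```python
# Unit counts as quoted in this thread; the RV770 numbers are rumours only.
RV670_ALUS_5D, RV670_TMUS = 64, 16
RV770_ALUS_5D, RV770_TMUS = 160, 32  # rumoured, unconfirmed


def alu_tex_ratio(alus_5d: int, tmus: int) -> float:
    """Return the number of 5-wide ALU blocks per texture unit."""
    return alus_5d / tmus


rv670_ratio = alu_tex_ratio(RV670_ALUS_5D, RV670_TMUS)
rv770_ratio = alu_tex_ratio(RV770_ALUS_5D, RV770_TMUS)
alu_growth = RV770_ALUS_5D / RV670_ALUS_5D

print(f"RV670 ALU:TEX = {rv670_ratio:.0f}:1")
print(f"Rumoured RV770 ALU:TEX = {rv770_ratio:.0f}:1")
print(f"ALU growth = {alu_growth}x")
```

So the rumoured part would only move from 4:1 to 5:1 while growing the ALU count 2.5 times.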
 
C'mon, they can do better =)
Why not 127945702395823578235235 SP? That sounds great!

Or to put it in terms I find funny, in the course of ~6 months, their high-end would go from 320 MADDs (HD3870 Nov 07) to 320 distinct shaders (HD4870X2 May/June 08).

I do want to believe that AMD/ATi wants to return to talking true shader count, rather than MADDs though, so in that instance it makes sense. 160 vs 128 looks and sounds better than the MADD Marketing Speak. They surely couldn't market R600/RV670 as a 64 shader processor and expose it as a competitor to the 9600gt with similar specs (9600gt having the same amount of shaders with roughly half the units and adding roughly double the clock speed). 320 would sure look good against Nvidia, which is the next point...

Latest nvidia rumors put GT200 (or whatever) at 384 shaders, or roughly 3x the amount of G80/G92, with + 1/3 ROPS/TMUs (32r/96 tmus total) over G80/G92. Die size of g92 is 334mm2@65nm. I ask how 2.65x the transistors at 55nm is possible for a single chip, when roughly less than half that spec is not possible at around half the size?

I mean, what do you expect GT200 to be?
384sp/32 Rops/96 TMUs/512-bit/~1.8B trannys @ 55nm = ??mm2? >~500? Nothing much bigger is going to fit on a package.

This would make RV770:

160sp/16 ROPs/32 TMUs/256-bit/~900m trannys @ 55nm = ~250mm2.


While seemingly unorthodox, tell me why this isn't possible?

When broken down that way, and putting the rumored specs of a 4870X2 vs. GT200, you do get coherent and sensible competition. A 4870X2 @ 720MHz would reach an equal flop count to Nvidia's rumoured specs (counting MUL). Take into account the differences in architecture, perhaps some CrossFire penalty, and the higher clock speed of the ATi parts (825-875 vs 650 on GT200) helping bridge the texture gap, and you'd have yourself a ballgame...probably with nvidia still winning the high-end...but still, a ballgame. :)

@No-X: You iterated a similar point while I was coincidentally typing a comparative notion at this early hour. Well done. :)
 
I know, right? :p It does look bullshitty.

edit: While the picture looks crappy and may be chopped, the specs match what was posted earlier, just with more detail. The earlier post said a little more than twice the shaders, and this says 160, or 2.5x...so to me it sounds like we're starting to get corroborating posts...or maybe just guesses based on the earlier post. I never know with Chiphell, but I try to keep track of the rep of certain posters there. This one is a mod, so you'd think they'd post less dung, but I suppose one never knows.

The strange thing about it though, wouldn't we be right back to ROP/TMU limitation? I wonder if such a massive increase in shaders would be to help with AA because of the way it's done in the R600/R700 arch?

I don't understand this mentality.

If someone is willing to post a faked photograph and pass it off as real, why would you place a single iota of credence in anything else they post ever? In what sense do any "specs" posted alongside any faked photo by such a person lend credence to anything? Even if they've nicked the specs, or made them up and by chance they match reality, it's completely beyond me how they can "corroborate" other posts. Bullshit fake photo = bullshit everything else with zero intellectual value. Delete from your brain ASAP.
 
Banter and conjecture ie making conversation? Point taken though, my friend. ;)
 
Latest nvidia rumors put GT200 (or whatever) at 384 shaders, or roughly 3x the amount of G80/G92
The same rumours also mention some VERY big die -- bigger than anything we've seen so far. And NV is going from 65 to 55nm, while AMD is already there.
Now if we assume that RV770 is RV770 and that top-end board will use two of them again then we can safely assume that RV770 won't be the same kind of jump above RV670 that G100 is supposed to be above G80.
So there, no 3x of anything if it's supposed to be an RV670 replacement on the same 55nm process. The 96 5D ALUs rumour is more or less plausible, but not that 160 ALUs nonsense (unless we're being fooled and all this RV and MGPU talk is just a smokescreen, of course).

This would make RV770:
160sp/16 ROPs/32 TMUs/256-bit/~900m trannys @ 55nm = ~250mm2.
While seemingly unorthodox, tell me why this isn't possible?
You're taking RV670 (192mm2, 64 5D ALUs, 16 TMUs), beefing it up 2-3 times (160 5D ALUs, 32 TMUs) and getting only 250mm2? I'd call that an underestimation.
 
In terms of unit counts and effects on ALU:TEX ratios (and, ultimately, die size), I find it very hard to stomach anything more than 480 SPs and 24 TUs with 16 RBEs that have quad-rate Z.

---

To go significantly larger than that would require some kind of "full custom" revolution. I don't even know what kind of scaling factor such an implementation change would bring about and I've got no idea if it's a meaningful concept.

Though I will add that ATI's strategy of designing for "X2" does actually make "full custom" less arduous in terms of timescales/engineer-effort. If you've got 100 man years per year of GPU implementation capability, you can significantly increase your implementation throughput if you never have to build an enthusiast core, but instead stitch two RVs to make X2. The core that's hardest to implement just disappeared from your schedule.

Jawed
 

These are my thoughts as well, mainly on die size. I reckon about 3.5 million transistors are packed per 1mm2 of die space, which gives RV770 roughly 200 million extra transistors over RV670. If one knew the die space the shader core takes up in RV670, then perhaps we could get a better idea of what is under RV770's hood. Although, so far I like the sound of 96sp and 6 TEU's.
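
For what it's worth, that density estimate can be checked with a few lines. This is only a sketch under assumptions: the 3.5M transistors per mm² figure is the estimate above, the 192mm² RV670 area is the one quoted in this thread, and the ~250mm² RV770 area is the rumoured estimate from earlier in the thread, not a known value.

```python
# All inputs are thread estimates or rumours, not confirmed specifications.
DENSITY_M_PER_MM2 = 3.5   # million transistors per mm² (estimate from the post)
RV670_AREA_MM2 = 192.0    # die area quoted in this thread
RV770_AREA_MM2 = 250.0    # rumoured die area (assumption)


def transistors_millions(area_mm2: float, density: float = DENSITY_M_PER_MM2) -> float:
    """Estimate the transistor count (in millions) from die area and density."""
    return area_mm2 * density


rv670_m = transistors_millions(RV670_AREA_MM2)
rv770_m = transistors_millions(RV770_AREA_MM2)
extra_m = rv770_m - rv670_m

print(f"RV670 ~{rv670_m:.0f}M, rumoured RV770 ~{rv770_m:.0f}M, extra ~{extra_m:.0f}M")
```

With those inputs the budget works out to roughly 200 million additional transistors, which is where the figure above comes from.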
 
But wouldn't TSMC's 45nm make all this BS seem a little more likely?
It would throw a bunch of other potential problems into the mix though...
 
I don't think we will see 45nm GPUs until winter. Although I would not be surprised if R8xx went straight to 40nm rather than 45nm. More than likely, and I'm betting along with everyone else here, RV770 is on 55nm.
 
You are right, looks like it usually takes at least ~1 year to go from low-power to GPUs.
TSMC 65nm did low power in May '06 and AMD started shipping 65nm GPUs in June '07.
So Q4 seems likely for 45nm if we can follow 65nm as a timeline.

Just trying to get all the options out there to try and get a grasp on all these rumors.
 
You're taking RV670 (192mm2, 64 5D ALUs, 16 TMUs), beefing it up 2-3 times (160 5D ALUs, 32 TMUs) and getting only 250mm2? I'd call that an underestimation.


IMHO only;

the RV635 has 120mm² with 4 TMUs and 120 ALUs; the RV670 has 192mm² with 16 TMUs and 320 ALUs. If you make a simple Excel chart then you can see that an RV6xx with no TMUs and no ALUs at all would have ~90mm². Therefore the RV635 needs only 30mm² for 4 TMUs and 120 ALUs. If we divide this evenly, then 4 TMUs need roughly 15 mm² and 120 ALUs another 15 mm².
So a hypothetical RV770 with 32 TMUs and 640 ALUs would need 32/4 x 15 mm² (=120 mm²) + 640/120 x 15 mm² (=80 mm²) of die area for the functional units.
So 90 mm² + 200 mm² = 290 mm² for an RV770 with 32 TMUs and 640 ALUs.

A RV670 would have ~190mm² using this math.

I know very well that this small calculation is bonkers. :) But it shows very well that the hypothetical specs are not out of range.

If the RV6xx with no functionality would have 95mm² instead of 90mm² then a RV770 with 32 TMUs and 640 ALUs would have a die area of ~266 mm² (but a RV670 would have only 180 mm² using this math).
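
The back-of-envelope model above can be written down as a small script so the base-area assumption is easy to vary. This is only a sketch of the reasoning in this post: the function names are invented for illustration, the even split of RV635's functional area between TMUs and ALUs is the post's own simplification, and the unit counts are the ones used in the post (including 4 TMUs for RV635).

```python
# Figures taken from the post; the model itself is a deliberate oversimplification.
RV635_AREA_MM2 = 120.0
RV635_TMUS = 4      # figure used in the post
RV635_ALUS = 120


def unit_costs(base_mm2: float) -> tuple[float, float]:
    """Split RV635's non-base area evenly between its TMUs and ALUs.

    Returns (mm² per TMU, mm² per ALU).
    """
    functional = RV635_AREA_MM2 - base_mm2
    per_tmu = (functional / 2) / RV635_TMUS
    per_alu = (functional / 2) / RV635_ALUS
    return per_tmu, per_alu


def estimate_area(base_mm2: float, tmus: int, alus: int) -> float:
    """Estimate die area as fixed base area plus linear per-unit costs."""
    per_tmu, per_alu = unit_costs(base_mm2)
    return base_mm2 + tmus * per_tmu + alus * per_alu


# Base area of 90 mm², as assumed in the post.
rv770_guess = estimate_area(90.0, tmus=32, alus=640)
rv670_check = estimate_area(90.0, tmus=16, alus=320)
print(f"Hypothetical RV770: {rv770_guess:.0f} mm²")  # 290 mm²
print(f"RV670 cross-check:  {rv670_check:.0f} mm²")  # 190 mm²

# Trying a different base area shows how sensitive the result is.
print(f"Hypothetical RV770 with 95 mm² base: {estimate_area(95.0, 32, 640):.1f} mm²")
```

The RV670 cross-check landing near the real 192mm² is the only thing that makes the model look half-reasonable; small changes to the base area move the RV770 estimate by tens of mm².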
 
Hmmm... and RV670 also has 16 RBEs instead of 4.
So the computation could be even better for a hypothetical "RV770", but it remains to be seen how much more control logic it will have, and whether the units are "beefed up" in some way (more samples per clock in the RBE, better or faster filtering, and so on).
 
What about that supposed internal bridge chip?
I remember that Digitimes article, and IMHO it would definitely help out for their X2 cards in terms of design and cost...


But that original PLX chip definitely isn't small (and the shrinks aren't that small either), so where are they again? :D The original chip was generic and all (blah) but I don't think the implementation would differ to a massive degree though. Any possibility that they'd use a faster interconnect standard too?
 
According to the first slide on this page:

http://www.pcper.com/article.php?aid=527&type=expert&pid=4

780G chipset, including "HD2400", amounts to 205M transistors. Now, HD2400 is about 180M transistors...

Also, if you look at what's coming down the road, with bifurcation support:

http://www.elitebastards.com/pic.php?picid=hanners/ati/cat83/Slide14.jpg

you'll see that soon a 16-lane port can be treated as two 8-lane ports. Though how this will impact the configuration of an X2 (or R780 if it's not "X2" in the sense that R680 is) card, I don't know.

Jawed
 
IMHO only;

the RV635 has 120mm² with 4 TMUs and 120 ALUs; the RV670 has 192mm² with 16 TMUs and 320 ALUs.
Small correction: RV635 has 8 TMUs.

I know very well that this small calculation is bonkers. :) But it shows very well that the hypothetical specs are not out of range.
It's not bonkers but rather a bit oversimplified. There are at least some other things we know need more die space on rv670 too (more ROPs, twice as wide memory interface), as well as other things to consider (texture size of rv635/rv670, will this get increased with rv770 etc.).
I dunno to which hypothetical specs you specifically refer, but I'd still say the 160SP (800) is bonkers :).
 
I dunno to which hypothetical specs you specifically refer, but I'd still say the 160SP (800) is bonkers :).
I dare say the normal assumption with GPU scaling is that as the number of ALUs increases, the register file also increases, proportionally.

So, the question is, is it reasonable to put an upper bound on the size of register file? If so, that would allow for ALU count to scale faster than apparent transistor-count/mm modelling would imply.

I'm generally against this concept, for what it's worth. If you have more threads in flight, then in order to maintain performance scaling for a given complexity of shader (complexity measured as instruction-slot-count:register-count and ALU:TEX) you need to scale the register file size too.

But does shader complexity "level-off"? My gut feel is that if it does, we're a long way off that point. I think there's two forces that come into play:
  1. conversion of fixed-function processing to shader programs
  2. latency-hidden thread-context switching
R600 has already introduced shader programs for AA resolve. We should see alpha blending come relatively soon and I imagine texture filtering isn't massively distant. All of these are "low complexity" shaders, being either bandwidth-bound or ALU-bound but not register bound. As more fixed-functions are converted into shader-program form, they will consume a larger and larger proportion of the total ALU cycles of the GPU, for a given level of performance. In doing so they will average-down the consumption of the register file.

R600 virtualises the register file: it can spill into video memory. I don't know how well R600 can hide the latency associated with paging the register file, but I think it's reasonable to assume that at some point that latency will become hidden in the general case. If so, then there's an upper limit on the size of the register file.

Jawed
 