RV730 - where are the 32 TMUs?

As suicidal as it may sound, I agree with Mintmaster. The reason for R600's not so good perf/mm2 ratio was that some design decisions turned out wrong (I mean the strong optimization for FP16). But there's nobody to blame since those decisions were based on estimates. Nowadays it's quite easy to say they screwed up.
Dave is right about one thing though. Even if the engineers could not improve the perf/mm2 ratio much, they could have fixed some flaws in the design if they had more time (which they sure did not have). Had they fixed it, I think we would see some decent performances with AA enabled.

=>Arun: I see what you're getting at. R580 had way more transistors and die size than G71 while being roughly equally fast at launch. But fast forward a few months and the R580 gets way ahead of G71, to the point that even a GX2 can't beat a single R580. There surely was a reason for R5xx to be bigger, and it showed. I think there originally was supposed to be a reason for R6xx being big as well, but unfortunately it didn't show during the chips' market lifetime, and if it ever does, even RV670 will be long obsolete by then.
 
In addition to that, it's hard to argue against the fact that R6xx had subpar ROPs, memory controllers and compression hardware in every way...
I don't think memory controllers and compression were bad. If you look at the thread I made about games and BW, high available BW had no chance of making a big difference in performance with R600.

When there was enough load on the MC, it performed admirably. The problem was that the ROPs, ALUs, and TMUs were so damn big that they couldn't fit many on the chip. Therefore there wasn't as much work to do for the MC's as on NVidia's chips.

As suicidal as it may sound, I agree with Mintmaster. The reason for R600's not so good perf/mm2 ratio was that some design decisions turned out wrong (I mean the strong optimization for FP16). But there's nobody to blame since those decisions were based on estimates. Nowadays it's quite easy to say they screwed up.
They are absolutely to blame. Thinking FP16 filtering speed was important was a huge mistake. No game was ever significantly limited by it, and there's no reason one would be. Full-speed FP16 blend was also a bad call, as even 100GB/s wasn't close to enough for that.

Dave is right about one thing though. Even if the engineers could not improve the perf/mm2 ratio much, they could have fixed some flaws in the design if they had more time (which they sure did not have). Had they fixed it, I think we would see some decent performances with AA enabled.
I'd be surprised if they didn't have enough time. Vista and DX10 were something they saw coming years ahead. Xenos and R5xx were complete well before G71 was. I guess if management screwed up the timing and put the R600 team under undue time pressure, then the engineers could be forgiven.

=>Arun: I see what you're getting at. R580 had way more transistors and die size than G71 while being roughly equally fast at launch. But fast forward a few months and the R580 gets way ahead of G71, to the point that even a GX2 can't beat a single R580.
Yup, and that's why I didn't rag on R5xx much in my previous post. They just overestimated the software progression by a year or two. R580 was still decently optimized, but more so for loads a bit further in the future. R600 and RV630, though, weren't well optimized for anything.
 
The problem was that the ROPs, ALUs, and TMUs were so damn big that they couldn't fit many on the chip.
Perhaps they originally thought they'd manufacture R600 at 65nm?
I'd be surprised if they didn't have enough time.
They did not realize there were those bugs in the design until they got the first chips back from manufacturing, perhaps.
 
I still think that when you're given an engineering task, the result should be relatively optimal, especially for ASICs.

Depends on what you're optimizing for. It's not fair to say R600 wasn't fast enough without considering its design goals.

That being said, R600 did not perform as a 512-bit enthusiast part would be expected to, IMHO.
 
I still think that when you're given an engineering task, the result should be relatively optimal, especially for ASICs.

It really said something to me when I saw slides saying perf/mm2 was a design goal with RV770. Duh! WTF were the hw architects smoking when this wasn't a design goal for R600?
Of course. But it's also about making it as optimal as you can within the schedule. RV770 is not magically different; it just had a lot more time to be made optimal.
 
Mintmaster: Are you sure that the failure of R600 originated from bad design decisions and not from design bugs? You mentioned R200, and I was told that R600 was the most bugged product since the R200...
 
=>no-X: Do you mean the major bugs, or the minor ones that are always worked around somehow (but that may slow the thing down a little bit)? Nevertheless, I think the design decisions contributed to the failure more than the bugs did.
 
The argument of "not having enough time" doesn't hold up too well, considering that Nvidia pushed out G80 (which, I may remind you, was a 90nm chip) well before R600. Yes, G80 wasn't exactly a smooth ride either (from a software point of view), but at least it delivered record-breaking performance.

The question on my mind is: why couldn't the R600 design team make something akin to RV730, but at 80nm? Obviously at 80nm the chip would be bigger, and you could also arm the GPU with a wider bus (RV730 is clearly bandwidth starved).
 
AFAIK, Nvidia spent four years in R&D for G80 anyway. At the same time, ATi had extra predicaments to deal with outside the pure engineering cycle.

why couldn't the R600 design team make something akin to RV730, but at 80nm?
Well, R600 in its present form was already stretching the limits of the available 80nm tech. And, as someone reminded us earlier, R600's TMUs were much fatter than the ones in R700 -- a consequence of the design decisions made at the time -- so putting in more of them was largely off limits. As we see it now, a major architecture redesign was needed for that.
 
Is it just me, or is there a certain irony in the fact that the one thing they really had to build from the ground up was the ROPs, and those were a major player in R600 performing below expectations?

I mean, at launch ATi went on about R6xx being their second-generation 'USA', with which they would profit from their experience with Xenos. But the Xenos GPU itself did not have any ROPs (call them RBEs if you like). Those resided in a separate die connected to Xenos by a 32 GB/sec link IIRC. The ROPs themselves had eDRAM access at a rate of 128 GB/sec (which, incidentally, was just the same memory bandwidth the GDDR4 version of R600 would sport).
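Quick sanity check on that 128 GB/s comparison; the 512-bit bus width is R600's, but the ~2.0 Gbps effective GDDR4 data rate is my assumption for that variant, so treat the result as approximate:

```python
# Back-of-the-envelope check of the 128 GB/s figure mentioned above.
# 512-bit bus is R600's; ~2.0 Gbps effective GDDR4 data rate is an assumption.
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s = bus width in bytes * effective data rate in Gbps."""
    return bus_width_bits / 8 * data_rate_gbps

print(peak_bandwidth_gb_s(512, 2.0))  # -> 128.0 GB/s, matching the eDRAM figure quoted
```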

Plus, ATi's engineering team(s) seemed quite a bit stretched, with so many products to design in rapid succession while (AFAIK) being significantly smaller than Nvidia's. The latter had the luxury of doing almost no new work since G70 apart from shrinking existing designs, and did not have to design a revolutionary console GPU to boot.

Looking back, my impression hasn't changed: R600 was an unfortunate attempt at gluing together architectural parts that weren't able to fit properly at the time, combined with some rather questionable decisions (full FP16 everything - even all textures were apparently promoted to FP16).

Reverting some of their decisions in RV670 (FP16 everywhere) and then some more in RV7x0 (ringbus, traditional ROPs) really made the underlying architecture shine recently, though.

--
On another train of thought:
The whole FP16 texturing thing boggles my mind. I mean, they designed Xenos themselves, optimizing it AFAIK for RGBA1010102 (a format also supported first in R5xx), and must have known that this format would be used far and wide by every Xbox 360 port ever coming to the PC. So why was it even possible that they believed so firmly in FP16?
 
It could be a stupid idea, but nVidia had supported HW FP16 filtering/blending since NV40. It's known that nVidia has significant influence on developers. Maybe ATi assumed that, due to nVidia's influence, FP16 filtering would be a common thing in 2007?
 
OK, here are some quotes from SirEric on the matter:

The samplers were designed to be 64b samplers from nearly the beginning, and matching that to BW and keeping the 4:1 ratio on ALU:Tex was the design choice made. In the latest games, where ALU:TEX ratios hit 15~20, this really shines.
I remember when we decided to focus on FP16 as "the basic" unit for most things on the R600. At the time, HDR rendering was just starting, but it certainly seemed the way of the future. Not only for floating point "image" textures, but also for all types of floating point textures, such as normal maps and other more complex floating point data sets. In retrospect, the industry is probably moving a little slower than we expected, though I would say that games like Oblivion in 2006, and a few of the upcoming titles are really changing the tides, at least at the avant-garde of games. I'd like to think that our chips are very forward looking.
 
According to expreview (and this coincides with other data), G84 is ~169mm² and RV630 is ~149mm². The cost difference between a 90nm and 65nm wafer in that timeframe is *certainly* higher than 13.5%, so RV630 was already more expensive before we even consider yields.
Sorry for the revival of the old thread, but I just noticed a small inaccuracy. G84 wasn't 90nm, but 80nm...
 
Sorry for the revival of the old thread, but I just noticed a small inaccuracy. G84 wasn't 90nm, but 80nm...
Ah yes, old brainfarts coming back to haunt me, I see ;) Anyway, I'd imagine 80nm wafer prices should have been roughly identical to 90nm ones, as long as your ramp doesn't force TSMC to increase capacity there faster than they planned to anyway. They have certainly said several times themselves that 80nm (and 55nm) have up to an X% cost benefit, where X% is exactly the claimed shrink percentage.

Also, obviously, I pointed out that it would certainly be cheaper in that timeframe. Right now the price gap between 80nm and 65nm is probably smaller; although whether it is higher or lower than 13.5%, I haven't the faintest idea.
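To spell out roughly where that 13.5% figure comes from, here's a back-of-the-envelope sketch (it ignores yield and wafer edge losses, so it's only a ballpark):

```python
import math

def dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """Crude gross-die estimate: wafer area / die area (ignores edge loss, defects and yield)."""
    return math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2

g84_area_mm2, rv630_area_mm2 = 169.0, 149.0   # die sizes quoted above

# Break-even wafer-price premium: how much more a 65nm wafer can cost before
# the smaller RV630 die stops being cheaper per die than G84.
break_even = dies_per_wafer(rv630_area_mm2) / dies_per_wafer(g84_area_mm2) - 1
print(f"{break_even:.1%}")   # ~13.4%, i.e. the 13.5% figure quoted above
```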
 
Total layman here, but I've just been looking back at tidbits here and there, mainly about RV730 (HD4670) and RV670 (HD3870) performance. It seems the two were very close in performance in some situations (non bandwidth-bound situations??).

From what I can tell, RV670 weighs in at circa 190 mm² (666M transistors) and RV730 at 146 mm² (514M transistors).

The most obvious differences between these two parts, apart from bus width, are ROP count (8 versus 16, in favour of RV670) and tex units (16 versus 32, in favour of RV730).

So they managed to save 152M transistors by cutting the ROPs in half, yet still packed in 32 tex units, albeit less capable ones (would the FP16 units be twice the size of RV730's units, or close enough??).

Does anybody here know how much die space the ROPs and tex units of either architecture take up, or could hazard a guess?

Mainly I'd like to know how much density improved for the Shader core from RV670 to RV730/RV770.
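The whole-chip numbers at least are easy to work out from the figures above (this says nothing about the shader core by itself):

```python
# Whole-chip transistor density from the figures quoted above; it gives only
# the overall average, not the shader core on its own.
chips = {
    "RV670": (666, 190.0),   # million transistors, die area in mm^2
    "RV730": (514, 146.0),
}
for name, (mtransistors, area_mm2) in chips.items():
    print(name, round(mtransistors / area_mm2, 2), "M transistors / mm^2")
# -> RV670 ~3.51, RV730 ~3.52: nearly flat overall, so a per-block breakdown
#    (ROPs, TMUs, shader core) is what would really be needed here.
```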

Do we have any die shots of R600/RV670??
 