RV730 - where are the 32 TMUs?

Indeed, I was just pointing out that Nvidia has the fastest GPU out, same as last time, only now ATI has higher margins since they didn't build such a goddamn big chip :)
Well, it certainly doesn't help that nvidia's big, expensive chip isn't really that much faster than AMD's much smaller chip. It is faster generally yes, but nowhere near as much as the die size difference would suggest... So even on that ultra high end part which has no direct AMD equivalent (well disregarding X2 solutions), margins can't be that high as nvidia can't really attach the ultra high end price such parts usually have...
 
Well, it certainly doesn't help that nvidia's big, expensive chip isn't really that much faster than AMD's much smaller chip. It is faster generally yes,

If you're saying the GTX 260 is faster than the HD 4870, I thought that too and was told it wasn't.
 
P.S.: Regarding RV730's 32 TMUs, I wish I knew for sure!
Besides taking Damien's word over any marketing PDF any day, I got confirmation from AMD today that there really are 32 TMUs in RV730 (which was to be expected) but only 16 interpolators.

edit:
..which is, as I might add, not necessarily a bad thing considering perf/area and perf/watt where RV730 really seems to shine.
 
=>AnarchX: I can vaguely remember that specs table and the dual-slot cooled cards. But that doesn't suggest the chips were made at 80nm, only Fudzilla says that. Besides, has anyone ever *seen* an 80nm RV630?
 
I've been looking over reviews a bit lately and it looks like 4870 is actually faster than GTX 280 fairly often. It's too bad that the smaller RV770 uses so much more power at idle than the huge GTX 280. That is very disappointing to me. I really want to replace my 8800GTX with a card that idles better.
 
=>swaaye: Can be, but generally the HD 4870 is about as powerful as the GTX 260. The power consumption of the HD 4870 is a bit of a mystery. The HD 4850 underclocks its core to 160 MHz at idle, while the HD 4870 only underclocks to 500 MHz. I don't know why that is, and ATI doesn't want to reveal many details about it.

// Looking at Computerbase's numbers, the HD 4670 in idle draws less than RV670 and also less than RV635. That points to some PowerPlay improvements being present in the R7xx family, but it seems as if it's broken on the RV770.
 
ixbt's heavy texturing version of the "Procedural: Frozen Glass PS2.0" test is interesting:

http://www.ixbt.com/video3/rv730-part2.shtml#p5
http://www.ixbt.com/video3/rv770-2-part2.shtml#p5

It appears this test is texturing bound on RV7xx. RV730 is 78.4% of RV770 (1726 MP/s versus 2202 MP/s) in this test, while its texturing capability is theoretically 80% (32 versus 40 texels per clock).

On RV670, also at 750MHz, this test runs at 698MP/s. That's 40.4%. Theoretically it should be 50%.
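For anyone who wants to check the percentages, here is a minimal sketch of the arithmetic. It uses only the MP/s results and texels-per-clock figures quoted in this thread and assumes, as the post does, that the cards are compared at the same 750MHz clock; the names `measured_mps`, `texels_per_clock`, `ratio_pct` and `compare` are made up for illustration.

```python
# Figures quoted in the thread: ixbt heavy-texturing "Frozen Glass PS2.0"
# results in MP/s, and texture results per clock for each GPU.
# Assumption: equal core clocks, so texels per clock is the theoretical ratio.
measured_mps = {"RV670": 698, "RV730": 1726, "RV770": 2202}
texels_per_clock = {"RV670": 16, "RV730": 32, "RV770": 40}


def ratio_pct(table, gpu, baseline):
    """Return table[gpu] / table[baseline] as a percentage."""
    return 100.0 * table[gpu] / table[baseline]


def compare(gpu, baseline):
    """Return (measured %, theoretical %) for gpu relative to baseline."""
    return (
        ratio_pct(measured_mps, gpu, baseline),
        ratio_pct(texels_per_clock, gpu, baseline),
    )


for gpu, baseline in [("RV730", "RV770"), ("RV670", "RV730")]:
    measured, theory = compare(gpu, baseline)
    print(f"{gpu} vs {baseline}: measured {measured:.1f}%, theoretical {theory:.1f}%")
```

The gap between the measured and theoretical columns is the interesting part: it is small for RV730 against RV770 and much larger for RV670 against RV730.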

Ouch.

Jawed
 
On RV670, also at 750MHz, this test runs at 698MP/s. That's 40.4%. Theoretically it should be 50%.
50 % of RV770? Perhaps the benchmark doesn't take advantage of RV670's beefier texture units. And 16 ÷ 40 is 40 %.
The theory is that it's related to the use of GDDR5.
I know. I also know the answer from someone at ATI (it's from an interview that should be published next week, wait for it) and their explanation is a little different, but it doesn't go into much detail.
 
ixbt's heavy texturing version of the "Procedural: Frozen Glass PS2.0" test is interesting:

http://www.ixbt.com/video3/rv730-part2.shtml#p5
http://www.ixbt.com/video3/rv770-2-part2.shtml#p5
Thanks for the links.

Man, RV730 is a budget shading monster. $70 at launch? I got a 640MB G80 for $300 16 months ago, and it's slower. Shame on me for losing faith in ATI. :smile:

If I was on the R6xx design team, I'd feel like a complete idiot right now. Just think about how many millions of parts ATI produced that could have had vastly higher ASPs if the designers had done their job properly and created an optimal chip with the same silicon.
 
I got a 640MB G80 for $300 16 months ago, and it's slower. Shame on me for losing faith in ATI. :smile:
Well, the RV730 is basically faster than R600. I think faster memory would help its performance. Perhaps some manufacturer will offer GDDR4 or GDDR5 equipped versions and we'll see whether I'm right or not :)
If I was on the R6xx design team, I'd feel like a complete idiot right now.
Yeah. Some people blame it on bugs in the design (which is unfortunate) or the manufacturing process, but wrong design decisions played their part as well. But who was supposed to know back then? :???:
 
50 % of RV770? Perhaps the benchmark doesn't take advantage of RV670's beefier texture units. And 16 ÷ 40 is 40 %.
Sorry, that was sloppy - I was comparing RV670 to RV730, both at 750MHz with 16 and 32 texture results per clock. RV670 is very inefficient here, comparatively.

I don't know the details of the test but I suspect that, being a PS2.0 test (i.e. "old"), it does not use textures with more than an 8-bit format, i.e. there are no int16/fp16 formatted textures in the test. So in that case the theoretical parity of the two GPUs at 16 bits per channel per texel is irrelevant (RV7xx has half the throughput per TF of R6xx for 16-bit and 32-bit format texels).

Jawed
 
I've been looking over reviews a bit lately and it looks like 4870 is actually faster than GTX 280 fairly often. It's too bad that the smaller RV770 uses so much more power at idle than the huge GTX 280. That is very disappointing to me. I really want to replace my 8800GTX with a card that idles better.
It's all about proper tweaking with more aggressive parameters.
In my case (HD4870), ATI Tray Tools does a perfect job of lowering the idle power draw as far as possible. I've added an automatic profile for 2D mode with a 500MHz core, 700MHz memory and Vcc=1.083V, while the fan speed is just shy of the 1000 RPM mark, and the thing is cool enough and dead silent. ;)

/sorry for the off-topic!
 
Now this may be a little off-topic, but if you look at the [H] review of the X2, the power consumption of the 4870 seems to be fixed:
http://enthusiast.hardocp.com/article.html?art=MTU0OSw4LCxoZW50aHVzaWFzdA==
(this may be better suited for the 4870 review thread but it seems somehow relevant to some of the last posts here :))
It probably is the wrong thread, but thanks for the info and welcome to the forums!

Even the non-X2 4870 numbers seem to have improved (and GT200 numbers got worse?), as this is what [H] originally found:
http://enthusiast.hardocp.com/image.html?image=MTIxNDM2MzM1MUNyREJoNlhHWENfOF80X2wuZ2lm

I've been looking over reviews a bit lately and it looks like 4870 is actually faster than GTX 280 fairly often. It's too bad that the smaller RV770 uses so much more power at idle than the huge GTX 280. That is very disappointing to me. I really want to replace my 8800GTX with a card that idles better.
Looks like your prayers have been answered. :D
 
Perhaps they changed what software they were using for benchmarking GPU power. They might have found some software that works GTX280 harder than before. Also, at the same time, they might be working ATI GPUs less, if the absolute performance of ATI is markedly lower. This seems to be what happens in this thread:

http://forum.beyond3d.com/showthread.php?t=45703

Jawed
 
If I was on the R6xx design team, I'd feel like a complete idiot right now.
Not really. As explained at the launch, R600 was schedule driven as it had to get there for DX10 (though it did miss that a bit). RV770 benefitted not just from the extra year of development post-R600, but also from some parallel-track development time, and a lot of the guys that came off Xenos went to RV770.

RV770's timeframe was always such that it didn't need to implement a bunch of new features, and its architectural mandate was to just take the unified arch and optimise the nuts off it.
 
I still think that when you're given an engineering task that it should be relatively optimal, especially for ASICs.

R100 was quite good considering that it had dependent texturing. From my work at ATI I know it was just barely short of GF3 functionality and had similar perf/mm2. R200 was decent too, if a bit buggy. R300 was fairly optimal as the first DX9 chip, and thus so was R420 (though ATI really should have put FP blending in it). R520 wasn't so great, but R580 was decent considering the DB granularity, although that was a misguided design criterion for the time. Xenos was quite optimal, too.

R600 was pathetic, though, and RV630 even worse. Look at what NVidia did with G80, which was their first unified and first DX10 GPU, and they got it done much earlier than R600. Two architecture refinements later and they've barely improved perf/transistor/clock, which tells you that they optimized quite well right from the beginning. That's how it should be.

It really said something to me when I saw slides saying perf/mm2 was a design goal with RV770. Duh! WTF were the hw architects smoking when this wasn't a design goal for R600?

Anyway, I can definitely say that if I led the R600 team and saw people in the same company produce such a vastly better design later on, I'd be ashamed of myself. You expect 10-20% from design refinement, but RV730 is often 3x the speed of RV635 with only 25% more die space and BW.
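To put a rough number on that, here is a back-of-the-envelope sketch of the implied perf/mm2 gain. It takes the "3x the speed" and "25% more die space" figures from the post at face value; the function name `perf_per_area_gain` is just an illustrative assumption.

```python
def perf_per_area_gain(speedup, area_ratio):
    """Relative performance per unit of die area, new chip versus old chip."""
    return speedup / area_ratio


# Figures from the post: RV730 is often 3x RV635 with 25% more die space.
rv730_vs_rv635 = perf_per_area_gain(3.0, 1.25)

# For comparison, the "10-20% from design refinement" expectation,
# assuming the die area stays the same.
typical_refinement = perf_per_area_gain(1.2, 1.0)

print(f"RV730 vs RV635 perf/mm2: {rv730_vs_rv635:.1f}x")
print(f"Typical refinement (upper end): {typical_refinement:.1f}x")
```

On those figures the perf/mm2 improvement comes out well above what a routine refinement would be expected to deliver.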
 
I know Wavey is probably going to disagree, at least publicly (and probably privately too), but my perspective is that the R5xx and R6xx engineers, for some bizarre reason, thought perf/unit was more important than aggregate perf/mm². It's a bit as if OoOE CPU designers had been in charge of certain aspects, really... (HI LARRABEE!!! :))

In addition to that, it's hard to argue against the fact that R6xx had subpar ROPs, memory controllers and compression hardware in every way... It's pretty amazing how they managed to optimize all that so much that they actually *beat* NV in those respects now! Respect, guys.
 