AMD RV770 refresh -> RV790

Hmm, quite impressive. Guru3d should take out a ruler and measure that chip size!
That said, the board doesn't really look like the real thing, more like a modded HD48xx design. The power section seems too beefy (and accordingly it is not even close to fully populated). I guess that's OK for an ES, but I can't imagine the final card looking like this. Also, wouldn't it be cheaper to use only 4 1Gbit chips instead of 8 512Mbit ones (though I guess this layout makes sense so the board could also support 1GB versions)?
If they can get clocks up a bit (is there a reason why it shouldn't at least be capable of what the RV770 could do, e.g. 725MHz?) and get some more memory bandwidth (that 800MHz mem clock is basically as low as it gets when it comes to GDDR5; based on what's available I would imagine the 1GHz chips don't really carry a high premium), I guess it could indeed rival the HD4850.
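
The chip-count question above is simple arithmetic, sketched here as a quick check. It assumes only the densities named in the post (512Mbit and 1Gbit devices); the function name is made up for illustration.

```python
def board_memory_mb(chip_count: int, chip_density_mbit: int) -> int:
    """Total board memory in MB for identical chips (8 bits per byte)."""
    return chip_count * chip_density_mbit // 8

# The two ways of reaching 512MB discussed above.
eight_small = board_memory_mb(8, 512)    # 8 x 512Mbit
four_large = board_memory_mb(4, 1024)    # 4 x 1Gbit

# The same 8-chip layout populated with 1Gbit chips gives a 1GB board.
eight_large = board_memory_mb(8, 1024)

print(eight_small, four_large, eight_large)
```

Both 512MB options come out the same in capacity, so the choice would come down to cost and whether the 8-chip layout is wanted for a 1GB variant.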
 
What do you guys think is the possibility of RV790 still actually being a product roughly double the size of RV740 on the 40nm process, and the boards being called Radeon 5670 and 5870 respectively? I mean, they have changed series names with limited architectural updates before (2xxx -> 3xxx).

I know the claim that RV790 is 55nm is almost set in stone according to various rumors. But still, one can hope, right? :p
 
Apart from bandwidth there's CPU, setup rate and fillrate that could be restricting the difference between HD4850/HD4830 to 4-5%.

OK, one by one:
- Setup: the possibility is there, definitely - in some of the games. But I don't think that a lower poly game like FEAR or R6 Vegas would be affected, do you?
- CPU (or any other element before the VGA): no, I filtered out the platform limited cases (where the vanilla 4850 performed less than 7% faster than the underclocked one)
- Fillrate: yeah, that's what I'm getting at. or z-fillrate. or whatever else these RBEs do in their spare time :smile:
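
The CPU filter described in the list above (drop any test where the stock 4850 is less than 7% faster than the underclocked one) could be sketched like this. The game names and frame rates are hypothetical placeholders, not numbers from the actual result sets, and `gpu_limited` is an illustrative name.

```python
# Hypothetical frame rates in fps; none of these numbers come from the thread.
results = {
    "game_a": {"stock": 60.0, "underclocked": 58.0},
    "game_b": {"stock": 55.0, "underclocked": 48.0},
    "game_c": {"stock": 80.0, "underclocked": 74.0},
}

def gpu_limited(results, min_gain=0.07):
    """Keep only tests where the stock card gains at least min_gain
    over the underclocked card; smaller gains are treated as platform limited."""
    kept = {}
    for name, fps in results.items():
        gain = fps["stock"] / fps["underclocked"] - 1.0
        if gain >= min_gain:
            kept[name] = gain
    return kept

kept = gpu_limited(results)
for name in sorted(kept):
    print(f"{name}: +{kept[name]:.1%}")
```

With these made-up inputs, game_a falls under the 7% threshold and is dropped, while the other two are kept as GPU limited.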

I think with MSAA off HD4870 tends to be merely 20% faster than HD4850.

Actually, that's not always the case. Check this comparison - FEAR at 1920x1200 and Vegas at 1680x1050 or 1920x1200 (you can click the bars to see percents). Or Crysis on page 2 at or above 1680.


Well, with 16 RBEs and 800MHz GDDR5 "HD4750" will be really bandwidth limited in comparison with HD4850.

OK, I see. But that can be a good thing from a positioning perspective - you just slap in a 40% faster GDDR5 variant, raise the clock speed by 20%, and you can call the card a 4790 with a 25-30% perf advantage :D

About the rest - yes. I was getting quite confident that the RV790 will be a simple overclock with serious power consumption tweaks, but after this RV740 showing, I'm more than doubtful.
 
Any coarse predictions on the 1GB DDR3 part? I'm curious on how much fillrates contributed to 730's loss of grace wrt 3870.
This question leads in a really nice direction :LOL:


HD4670 theoreticals in comparison with HD3850 (not HD3870):
  • GFLOPs - 112%
  • Texture - 225%
  • Bandwidth - 60%
According to:

http://www.firingsquad.com/hardware/amd_ati_radeon_4670_performance_review/default.asp

and taking the maximum resolution only (1600x1200) and noting that CoD4 and Crysis results are confused (both are the same, so only counting one of them), HD4670 achieves 93% of HD3850's performance.

It did this with 56% the colour fillrate of HD3850, but more importantly, it did it with 112% the Z rate.
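
For anyone wanting to see where those percentages come from, here is a rough sketch. Every spec below (clocks, unit counts, Z samples per RBE per clock, bus widths) is my assumption about the two cards and is not stated in this thread, so treat the inputs as placeholders to be checked against the review.

```python
from dataclasses import dataclass

@dataclass
class Card:
    core_mhz: int
    alus: int
    tmus: int
    rbes: int        # colour pixels per clock
    z_per_rbe: int   # assumed Z samples per RBE per clock
    mem_mhz: int
    bus_bits: int    # GDDR3 assumed: 2 transfers per clock

    def rates(self):
        return {
            "gflops": self.alus * 2 * self.core_mhz / 1000,       # MAD = 2 flops
            "texture": self.tmus * self.core_mhz,                 # Mtexels/s
            "colour": self.rbes * self.core_mhz,                  # Mpixels/s
            "z": self.rbes * self.z_per_rbe * self.core_mhz,      # Msamples/s
            "bandwidth": self.mem_mhz * 2 * self.bus_bits / 8,    # MB/s
        }

# Assumed specs, not taken from the thread.
hd4670 = Card(core_mhz=750, alus=320, tmus=32, rbes=8, z_per_rbe=4,
              mem_mhz=1000, bus_bits=128)
hd3850 = Card(core_mhz=668, alus=320, tmus=16, rbes=16, z_per_rbe=2,
              mem_mhz=828, bus_bits=256)

a, b = hd4670.rates(), hd3850.rates()
ratios = {key: round(100 * a[key] / b[key]) for key in a}
print(ratios)
```

Under those assumed specs the ratios land on the same figures quoted above: 112% GFLOPs, 225% texture, 60% bandwidth, 56% colour and 112% Z.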

Will there be a GDDR3 version of RV740? It seems pretty likely (DDR3? DDR2?). I suppose 1100MHz GDDR3, which gives 35.2GB/s, 69% of the "HD4750", is highly likely.
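
The 35.2GB/s and 69% figures can be reproduced with a small sketch. It assumes a 128-bit bus for both variants, 2 transfers per clock for GDDR3 and 4 per clock for GDDR5 relative to the quoted memory clock, and 800MHz GDDR5 for the "HD4750"; the helper name is illustrative.

```python
def bandwidth_gbs(mem_clock_mhz: float, transfers_per_clock: int, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_bits / 8
    return mem_clock_mhz * 1e6 * transfers_per_clock * bytes_per_transfer / 1e9

gddr3 = bandwidth_gbs(1100, 2, 128)   # hypothetical GDDR3 RV740
gddr5 = bandwidth_gbs(800, 4, 128)    # the tested "HD4750"

share = round(100 * gddr3 / gddr5)
print(f"GDDR3: {gddr3:.1f} GB/s, GDDR5: {gddr5:.1f} GB/s, ratio: {share}%")
```

That puts the GDDR5 card at 51.2GB/s, which is where the 69% comes from.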

So, in summary, Z rate is very bandwidth efficient. So this comparison with HD4670 v HD3850 would naturally bias RV740 towards having twice the Z rate of RV730 and would fit in very nicely with the observed performance.

Now, the question is, does RV740 have 8xZ per RBE or does it have 16 RBEs? :p

Jawed
 
I would like to know if the GDDR5 version of the RV740 will suffer the same horrendous idle power consumption as the 4870? Hopefully, they are able to make it better in this regard, it is a massive waste.
 
What do you guys think is the possibility of RV790 still actually being a product roughly double the size of RV740 on the 40nm process, and the boards being called Radeon 5670 and 5870 respectively? I mean, they have changed series names with limited architectural updates before (2xxx -> 3xxx).

IMHO, only rv8xx is gonna get the 5xxx name.

I know the claim that RV790 is 55nm is almost set in stone according to various rumors. But still, one can hope, right? :p

The rumour mill may be set in stone, but I can't think of one sane reason to do it. Maybe that's just me. :oops:
 
This tested "HD4750" appears to have 8 memory chips. I think that implies clamshell configuration of the GDDR5.

Or it could imply that each quad RBE has an MC :p

http://forum.beyond3d.com/showpost.php?p=1257959&postcount=194

Dave Baumann said:
You're describing the old mechanism there for dual-rank/clamshell. GDDR5 actually has a x16 mode, so each device has half the bandwidth, but you double up on the number of devices.
Was Dave "letting slip" or hinting that AMD will be using 16-bit MCs? Each of the eight 16-bit MCs in RV740 has a quad RBE attached, then bingo :!:

Jawed
 
OK, one by one:
- Setup: the possibility is there, definitely - in some of the games. But I don't think that a lower poly game like FEAR or R6 Vegas would be affected, do you?
- CPU (or any other element before the VGA): no, I filtered out the platform limited cases (where the vanilla 4850 performed less than 7% faster than the underclocked one)
- Fillrate: yeah, that's what I'm getting at. or z-fillrate. or whatever else these RBEs do in their spare time :smile:
I'm not going to disagree on that last point and well, as you can see from my further investigation, 16x Zs per clock really makes the most sense for RV740.

Actually, that's not always the case. Check this comparison - FEAR at 1920x1200 and Vegas at 1680x1050 or 1920x1200 (you can click the bars to see percents). Or Crysis on page 2 at or above 1680.
There's some very funny numbers in that article, e.g. Grid is only 3% faster at 1920 :???:

http://www.techreport.com/articles.x/14990/13

shows 27% margin. Interestingly, the "low" frame rates reported on that page show a 41% margin. But yeah, some games are very heavily fillrate dominated.

OK, I see. But that can be a good thing from a positioning perspective - you just slap in a 40% faster GDDR5 variant, raise the clock speed by 20%, and you can call the card a 4790 with a 25-30% perf advantage :D
Definitely.

About the rest - yes. I was getting quite confident that the RV790 will be a simple overclock with serious power consumption tweaks, but after this RV740 showing, I'm more than doubtful.
Yep, signature updated. Call me stubborn :LOL:

Jawed
 
Was Dave "letting slip" or hinting that AMD will be using 16-bit MCs? Each of the eight 16-bit MCs in RV740 has a quad RBE attached, then bingo :!:
My impression is that clamshell mode for GDDR5 is just a data striping method, while the address/command bus remains shared between the two terminal devices, whereas in the GDDR3 case both the data and address lanes are in a shared topology. :rolleyes:
In that case, there wouldn't be such a thing as a 16-bit controller/channel in there, because the striped configuration is still visible as a "virtual" 32-bit device.
 
OK, one by one:
- Setup: the possibility is there, definitely - in some of the games. But I don't think that a lower poly game like FEAR or R6 Vegas would be affected, do you?
- CPU (or any other element before the VGA): no, I filtered out the platform limited cases (where the vanilla 4850 performed less than 7% faster than the underclocked one)
- Fillrate: yeah, that's what I'm getting at. or z-fillrate. or whatever else these RBEs do in their spare time :smile:
I was under the impression that the RBE config on HD 4850 and HD 4830 is the same. BTW - I think neither FEAR (shadow volumes) nor RB6 Vegas is particularly low-poly (unless you're comparing them to Crysis).

Might another factor also be that both chips can interpolate for only 32 textures at a time?


Actually, that's not always the case. Check this comparison - FEAR at 1920x1200 and Vegas at 1680x1050 or 1920x1200 (you can click the bars to see percents). Or Crysis on page 2 at or above 1680.
You're not comparing different cards on different drivers, are you?
 
There's some very funny numbers in that article, e.g. Grid is only 3% faster at 1920 :???:

Yeah, that's a sure sign that the test CPU needs to be replaced :cry: That's why I pointed to specific tests where I'm sure the CPU didn't play a part.

Interestingly, the "low" frame rates reported on that page show a 41% margin. But yeah, some games are very heavily fillrate dominated.

Maybe because texture swapping begins at that resolution, due to the frame buffer size?

Yep, signature updated. Call me stubborn :LOL:

If the RV740 is 4 RBE-quads, then it makes sense for the RV790 to double up. And it looks like you're dreaming up some hefty bandwidth for it, too :D

I was under the impression that RBE-config on HD 4850 and HD 4830 is the same.

Yes, and the point was that the 4850 can't realize much advantage from its +25% ALUs and TMUs - being bandwidth AND RBE limited.

BTW - I think neither FEAR (shadow volumes) nor RB6 Vegas is particularly low-poly (unless you're comparing them to Crysis).

... compared to Crysis, World in Conflict or even the Tropics demo :D

Might another factor also be that both chips can interpolate for only 32 textures at a time?

Sure - I keep forgetting that :oops:

You're not comparing different cards on different drivers, are you?

Oh shit, sorry about that. It is a generated test based on the result sets, and I chose the wrong driver in the menu. Proper results here. It doesn't matter in practice, but of course it would be awfully amateurish...
 
My impression is that clamshell mode for GDDR5 is just a data striping method, while the address/command bus remains shared between the two terminal devices, whereas in the GDDR3 case both the data and address lanes are in a shared topology. :rolleyes:
That's a good way of looking at it. Does GDDR5 prevent 2 memory chips sharing the same data bus, entirely? I suspect it does (since it's effectively quad-rate signalling - I guess that's just harder to maintain signal integrity), but I don't know. If so, then this necessitated clamshell mode.

Anyway, just like GDDR3 also offers the designer the ability to use a single memory device per set of address/command/data lines, I presume it's possible to use GDDR5 memory in 16-bit mode with only one chip connected to that address/command bus. If that's the case then I think Dave could have been hinting that AMD is going to use this. And if so, then RV7xx architecture with one quad-RBE/L2 per MC still holds in RV740 and RV790, leading to these chips having 16 and 32 RBEs, respectively. An awesome prospect.

The problem with 16-bit MC channels is the increase in pin count needed for the extra address/command lines. If a 100mm2 RV740 with 128-bit (4x 32-bit MCs) isn't big enough for 16 RBEs (the extra RBEs, L2 and MCs adding, say, 20% area?) then perhaps a 120mm2 RV740 with 8x 16-bit MCs would be fine.

D3D11 GPUs are going to need yet higher fillrates. GDDR5 bandwidth will double over the period from 2008Q3 to 2010Q3, so it makes sense that fillrates do, too. So if AMD is doubling RBE count in RV740 then this could be a preview of the kind of capability the D3D11 GPUs will have.

Though RBEs may cease to exist?

If, instead, AMD went with 8 Zs per RBE per clock in RV740, this whole question of a 16-bit MC configuration seems to be irrelevant, as four 32-bit MCs would be fine. Generally I'm doubtful colour rate needs to increase, which would imply that 8 Zs is the preferred configuration.

Right now the simplest, sanest, RV740 appears to be 8xZ per RBE, rather than a doubling of RBE count...

Jawed
 
That's a good way of looking at it. Does GDDR5 prevent 2 memory chips sharing the same data bus, entirely? I suspect it does (since it's effectively quad-rate signalling - I guess that's just harder to maintain signal integrity), but I don't know. If so, then this necessitated clamshell mode.
Well, I don't know if anyone would slap two GDDR5 devices in a fully shared topology with that kind of sensitive signaling -- clamshell does it far better for a reason, e.g. adding a second device puts virtually no extra signaling load on the data bus.
Anyway, just like GDDR3 also offers the designer the ability to use a single memory device per set of address/command/data lines, I presume it's possible to use GDDR5 memory in 16-bit mode with only one chip connected to that address/command bus. If that's the case then I think Dave could have been hinting that AMD is going to use this
[Attached diagram: 32608689.png]

Hmm, could it really?
Addressing the physical space looks quite tricky that way -- interleaving, or some... and the performance prospects aren't looking good enough. :???:

p.s.: From the diagram above, it looks like clamshell is just an extension to the conventional single end-point x32 mode.
 
It seems to me there's nothing forcing the designer to use the same address and command bus for both chips in a clamshell. Simply having two address/command buses, one per chip is feasible. This is no longer "clamshell" mode, but a discrete, 16-bit mode, per chip.

Jawed
 
8 memory chips, each with a 16-bit bus, makes 128 bits. Why would that be half?

Sure, each memory chip is only providing half the bandwidth it can muster, but the GPU still sees 128-bits total.
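
A quick sketch of that point, comparing four devices in x32 mode against eight in x16 mode. It assumes the 800MHz GDDR5 clock mentioned earlier in the thread with 4 transfers per clock per pin; the function name is illustrative.

```python
# 800MHz GDDR5, assumed 4 transfers per clock per pin -> 3.2 Gbit/s per pin.
PIN_RATE_GBPS = 0.8 * 4

def config(devices: int, bits_per_device: int):
    """Return (total bus width in bits, GB/s per device, total GB/s)."""
    bus_bits = devices * bits_per_device
    per_device_gbs = PIN_RATE_GBPS * bits_per_device / 8
    return bus_bits, per_device_gbs, devices * per_device_gbs

x32 = config(4, 32)   # four chips, full-width mode
x16 = config(8, 16)   # eight chips, x16 mode

for label, (bits, per_dev, total) in (("4 x32", x32), ("8 x16", x16)):
    print(f"{label}: {bits}-bit bus, {per_dev:.1f} GB/s per chip, {total:.1f} GB/s total")
```

Each chip in x16 mode delivers half of what it could in x32 mode, but the GPU-side totals are identical.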

Jawed
 
Donanimhaber still thinks it's 850MHz

http://translate.google.com/transla...detaylar-13077.htm&sl=tr&tl=en&hl=en&ie=UTF-8

And also thinks it'll be called 4890. So two recent reports that are fairly contradictory, both quoting sources that are supposedly decent. What do they agree on? Erm, that it's 55nm and that memory clocks haven't changed.

So far there haven't been any rumours about the Pro version. 650MHz? 700MHz? 750MHz? GDDR3?

The big problem with the Pro GPU is that its performance will be only 5-10% better than RV740XT (unless it's clocked at around 750MHz or more). This smells fishy to me. I don't believe AMD would produce a simultaneous refresh to both RV730 and RV770 and create two SKUs that have basically the same performance, RV740XT and RV790Pro.

Also, why've we not heard about RV720 yet?

Jawed
 
The big problem with the Pro GPU is that its performance will be only 5-10% better than RV740XT (unless it's clocked at around 750MHz or more). This smells fishy to me. I don't believe AMD would produce a simultaneous refresh to both RV730 and RV770 and create two SKUs that have basically the same performance, RV740XT and RV790Pro.

Completely agreed. The extremely impressive performance of RV740 and potential to clock it higher later on, coupled with RV770's easily achievable performance, makes the suspected 790s very funny products indeed.


Also, why've we not heard about RV720 yet?
Jawed

Because RV710 was too good.

Nah, just kidding. :p RV710 as a notebook part is perfect, but for the desktop channels it ain't moving much due to RV730 hot on its pricing heels.
 
Nah, just kidding. :p RV710 as a notebook part is perfect, but for the desktop channels it ain't moving much due to RV730 hot on its pricing heels.
So even in OEM systems RV730 is being chosen in preference to RV710?

So what we're seeing is RV710 being squeezed out between IGP and discrete (RV730)?

Jawed
 