So, do we know anything about RV670 yet?

lol, RV670XT will use GDDR4 only and R680 will use GDDR3.

Back to the RV670 topic: until now it has looked really promising, with core and memory clocks above the 8800 Ultra.
The only thing I'm worried about with RV670 is its low fill rate compared to NVIDIA. :(

Nope, I don't worry about AA since I don't think I'll enable it in UT3. lol

Which does make sense. It would be 2xRevival and not Gladiator.

Yay! An adjustable PCIe power level delivery through the BIOS! :D

RD790 roX!

Can't most of today's ATI boards do that too?
 
The 256-bit bus on most GPUs is actually a collection of smaller buses, e.g. 4x64-bit or 8x32-bit.

What I'm expecting to see is that two GPUs on one package have a fat bus joining them (much like Xenos and its daughter die have a 32GB/s bus). The two separate memory systems (256/512MB per GPU) are then aggregated into a single memory address space.

I knew that (smaller buses), and it's what I figured. Could 4x64 become 8x64 using two of the same GPUs on the same package, with cross-usage of memory and load balancing handled by the driver? I believe you may have answered it below in the positive (and through many subsequent answers to others).

In effect GPU A can request a texture from GPU B's memory, so the fetch request travels over to the relevant memory controller on GPU B which then obtains the data GPU A requires. All clients in both GPUs (TUs, RBEs, vertex fetch, etc.) see this single memory space and are able to use it freely.
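The routing described above can be sketched as a simple address-to-controller mapping function. The interleave granularity, controller counts, and the example address are all hypothetical, just to illustrate how one flat address space could span both GPUs' memory controllers:

```python
# Sketch of a single address space spanning two GPUs' memory
# controllers. The interleave stripe size and controller layout are
# assumptions; real crossbar routing is far more involved.

GPUS = 2
CONTROLLERS_PER_GPU = 4          # e.g. 4x64-bit per GPU (assumed)
INTERLEAVE = 256                 # bytes per controller stripe (assumed)

def route(address):
    """Map a flat address to (gpu, controller) the way a crossbar might."""
    stripe = address // INTERLEAVE
    controller = stripe % (GPUS * CONTROLLERS_PER_GPU)
    return controller // CONTROLLERS_PER_GPU, controller % CONTROLLERS_PER_GPU

# A texture fetch issued by GPU A may land on GPU B's controller:
gpu, mc = route(0x1400)
print(f"address 0x1400 -> GPU {gpu}, controller {mc}")
```

In this toy scheme any client on either GPU resolves the same address to the same physical controller, which is the property the quoted description relies on.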

Then you just need a decent driver that understands how best to assign memory to the clients on both GPUs, so that the chosen multi-GPU mode produces the most efficient usage of memory as well as the required performance.

This is what I wonder: is it currently feasible, or will we have to wait a generation or two?


AFR seems like the prime candidate. Within AFR, though, it's possible to optimise the way textures are organised - e.g. classically textures are copied to both GPUs' memory. In theory the newer ATI GPUs don't need to be that wasteful.

I've got my fingers crossed that we'll see these kinds of efficiency gains, but the driver gods have been scowling upon these D3D10 GPUs and I see no sign of a let up.

Jawed

That, I find very interesting, and it would seem to make quite a bit of sense.

EDIT: I also did not know about MC addressing and the number of GPUs being irrelevant. Very good investigative reporting (and deductive reasoning), amigo. :)

Thanks a lot Jawed; very interesting discussion about the possibilities as well as the pros of such a solution (yields, cost saved on packaging, a similar layout/heat profile to R600 could be used, etc.), many of which I figured would be the reason why such a product could, or rather should, exist. :smile:

BTW: the adjustable power to the PCIe slots on 790 looks freakin' rad, and I agree the 512MB/1GB seems to imply R680 is indeed 2xRV670 Gladiator (in one form or another) ;)

VR-ZONE gives me hope.
vr-zone said:
We even heard faintly that R680 could be AMD's ambitious plan to integrate two RV670 into a single die, if not on the same package.

Care to comment, CJ? You seemed to give them the rest of the scoop. I'm just curious how an R680 (2xRV670) could score 20k in 3DMark06 while a single Gladiator will reportedly do about 10.4k (edit: 11.4k). That's almost perfect 2x scaling, and while granted the CPU has to be taken into the mix, IIRC we don't see anything close to that in Crossfire, although granted 3DMark may be the exception.
 
Gladiator or Revival? They don't mention it and some other sources mentioned Revival... quite confusing.
 
RV670 Gladiator naked:
[images: rv670g1qq7.jpg, rv670g2ri5.jpg]
 
What topology do these buses have?
I'm afraid you've got me there.

As I understand it, the common configuration of GDDR on graphics cards uses a shared 64-bit data path to two chips with a set of common address lines. I don't know if chip select is command- or pin-driven. Actually, chip select wouldn't be necessary for this configuration, if a single address is ganged across both chips.

So the single memory chip bus configuration is actually fairly rare on contemporary graphics cards, e.g. certain R5xx cards such as X1950Pro (256MB or 512MB). RV670 looks like it'll work the same way.

32-bit channels are theoretically advantageous because they allow for the most fine-grained memory accesses, in terms of the amount of data that is fetched (due to the minimum burst size of DDR being 4 clocks, generally). I don't know of any benchmark results that show such an advantage, though.
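The granularity point above is just channel width times minimum burst length. Here is the arithmetic, assuming the burst length of 4 mentioned in the post (actual GDDR generations differ):

```python
# Minimum access granularity: channel width x minimum burst length.
# A burst length of 4 is assumed per the post above; real parts vary.

def min_access_bytes(channel_bits, burst_length=4):
    """Smallest amount of data one channel fetches in a single burst."""
    return channel_bits // 8 * burst_length

print(min_access_bytes(32))  # 32-bit channel -> 16 bytes per burst
print(min_access_bytes(64))  # 64-bit channel -> 32 bytes per burst
```

So halving the channel width halves the minimum fetch, which is where the fine-grained-access argument for 8x32-bit over 4x64-bit comes from.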

Jawed
 
I knew that (smaller buses), and it's what I figured. Could 4x64 become 8x64 using two of the same GPUs on the same package, with cross-usage of memory and load balancing handled by the driver?
I speculate it's possible, but we'll just have to wait and see...

This is what I wonder: is it currently feasible, or will we have to wait a generation or two?
I'm basing this on the patent applications I linked above - so the chances are high that I've made an amateur fumble!

That, I find very interesting, and it would seem to make quite a bit of sense.
To me this would be a major motivation.

I also think this is where the L2/L3 cache hierarchy comes into play. Patent applications refer to multiple L2 caches in a GPU (or, extrapolating, a multi-chip GPU), with the idea that they function as L3s for each other. Though I can't see how this would be useful if the GPUs are running AFR... This is what kills me about AFR; it seems to me the clunkiest approach: simplistic brute force as long as the frame-rendering algorithm is straightforward enough, but a nightmare when programmers do funky stuff.

Jawed
 
Maybe a sort of depth-buffer access measurement would tell us something about granularity, as this kind of access pattern tends to have very high entropy. AA sample access is also a good one.

Speaking of this, looking at the layout of the Parhelia-512 board, does anyone know if it's really a single 256-bit bus device, or 4*64-bit?
 
Maybe a sort of depth-buffer access measurement would tell us something about granularity, as this kind of access pattern tends to have very high entropy. AA sample access is also a good one.
The fundamental problem seems to be that other architectural changes occur at the same time, so it's really hard to isolate such theoretical improvements.

For all we know 32-bit channels may have been a half-step in R5xx and the benefits only really appear in R6xx or later. Clearly R600 uses 64-bit channels, so it even brings into question the relevance of 32-bit channels.

Jawed
 
The obvious benefit would seem to be how many concurrent reads/writes the device can initiate at once to the frame buffer?!
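That concurrency benefit can be illustrated with a toy model: with more, narrower channels, a batch of scattered requests has a better chance of hitting distinct channels in the same cycle. The interleave stripe size and the request pattern are purely illustrative assumptions:

```python
# Toy model: how many of a batch of scattered requests land on
# distinct channels (and so could be serviced concurrently).
# Stripe size and addresses are illustrative, not hardware figures.

import random

def serviced_per_cycle(addresses, num_channels, interleave=256):
    """Count requests that hit distinct channels in one cycle."""
    hit_channels = {(a // interleave) % num_channels for a in addresses}
    return len(hit_channels)

random.seed(1)
requests = [random.randrange(1 << 20) for _ in range(8)]
print("4 channels:", serviced_per_cycle(requests, 4))  # at most 4
print("8 channels:", serviced_per_cycle(requests, 8))  # up to 8
```

With 4 channels the batch can never do better than 4 concurrent accesses; with 8 channels the same total bus width can, on a lucky spread, service all 8.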
 
Maybe a sort of depth-buffer access measurement would tell us something about granularity, as this kind of access pattern tends to have very high entropy. AA sample access is also a good one.

Speaking of this, looking at the layout of the Parhelia-512 board, does anyone know if it's really a single 256-bit bus device, or 4*64-bit?
I think it's 4*64-bit. I know it's not a monolithic 256-bit.
 
Sure, I've heard several times that it was 1x256-bit and that this was one of the mistakes in the Parhelia design.
People who said it was 1x256-bit didn't know what they were talking about. There was even an interview somewhere that stated as much. Now, as to the efficiency of that memory controller, I don't know.
 
Question about the RV name.

Usually RV stands for a cut-down version of R.
If RV670 will have the same specifications as R600, then why is it called RV670 instead of R670?
 
Question about the RV name.

Usually RV stands for a cut-down version of R.
If RV670 will have the same specifications as R600, then why is it called RV670 instead of R670?

I think it's more about the price target than any performance target.

V = Value?
 
OK, so RV = value price, but NOT value performance.

If (RV670 + RV670 dual card) = R680,
and R680 uses 256-bit (e.g. 900MHz x2, 1800MHz effective) GDDR3 [dual GPU, 256-bit x 2 memory], that gives ~115GB/s of combined bandwidth.

How would R680 stack up against the G80-based 8800 Ultra?
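Checking that estimate: the 900MHz (1800MT/s effective) and 256-bit-per-GPU figures are the rumoured ones from the post, not confirmed specs, but the arithmetic is straightforward:

```python
# Peak bandwidth for the rumoured R680 config: 256-bit GDDR3 per GPU
# at 900MHz (1800MT/s effective), two GPUs. Rumoured figures only.

def peak_bw_gbps(bus_bits, data_rate_mtps):
    """Peak bandwidth in GB/s: width in bytes times transfer rate."""
    return bus_bits / 8 * data_rate_mtps / 1000

per_gpu = peak_bw_gbps(256, 1800)
print(f"per GPU:  {per_gpu:.1f} GB/s")       # 57.6 GB/s
print(f"combined: {2 * per_gpu:.1f} GB/s")   # 115.2 GB/s
```

For comparison, whether the two pools can really be treated as one combined 115GB/s resource depends on the shared-address-space question discussed earlier in the thread.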
 