dual GPU


Why can't we say, for example, that a 7950GX2 is 512-bit?

What's the difference between a dual GPU (2x256-bit) and a single GPU (512-bit)?


Thank you
 

The fact that the 7950GX2 is composed of two discrete chips with their own discrete memory buses/banks. GPU1 can't access GPU2's memory, nor is the reverse possible. Thus, you don't get a double-wide bus, but two parts with individual buses that exchange a (relatively) limited amount of data through the on-board PCI-E interconnect. Future MCM multi-GPU solutions may change that: the scenario would be a number of discrete cores linked to a central "hub" that arbitrates memory access, among other things, through a single, unified and probably rather fat memory bus. Time will tell.
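To put rough numbers on why the widths don't simply add up (a back-of-the-envelope sketch in Python; the clock figures are round numbers picked for illustration, not the GX2's actual specs):

# Back-of-the-envelope numbers, just to illustrate the point.
# The clocks below are made-up round figures, not the GX2's real specs.

def peak_bandwidth_gb_s(bus_width_bits, effective_clock_mhz):
    # bytes per transfer * transfers per second, expressed in GB/s
    return (bus_width_bits / 8) * effective_clock_mhz * 1e6 / 1e9

local_256   = peak_bandwidth_gb_s(256, 1600)   # what each GPU sees on its own bus
unified_512 = peak_bandwidth_gb_s(512, 1600)   # what a true 512-bit part would see
pcie_x16    = 4.0                              # ~4 GB/s per direction for PCIe 1.x x16

print("Each GPU's local bus     : %.1f GB/s" % local_256)    # 51.2
print("Hypothetical 512-bit bus : %.1f GB/s" % unified_512)  # 102.4
print("Inter-GPU link (PCIe x16): %.1f GB/s" % pcie_x16)
# Anything one GPU needs from the other GPU's memory has to cross the
# PCI-E link, so the two 256-bit buses never add up to a 512-bit one.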
 


As we've seen ever since the AMD Athlon 64 days (and soon in Intel's "Nehalem" too), the practice of using a centralized, external memory controller (the motherboard chipset's Northbridge), shared between CPUs/cores through a "fat bus" (the Front Side Bus), has been deemed largely inefficient.
Why should it suddenly become appropriate in the GPU realm now, where speed is paramount?

Would ATI take two steps back, after developing a sophisticated ring-bus internal memory controller, to an off-die shared controller?
I can't make any sense of it, as they are two technological moves in clear contradiction of each other, IMHO.
 

Perhaps I worded it badly, but comparing the FSB with an external 512-bit bus to memory is quite a stretch: it ignores the frequency of the GDDR attached to it, which would be far beyond the measly thing you can get with typical DDR2 modules tied to a dual-channel 128-bit interface. Someone had a nice drawing in the R700 speculation thread that pretty much illustrated a possible arrangement for this; think something like 4 chips in the corners of a square with the "arbitrator" at the diagonals' intersection. This way all chips have access to all of the RAM that's on board, they don't have to go through each other, tracing is easier than tracing 4 individual 256-bit buses to the MCM, and scaling is easier.
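Just to put the tracing argument in rough numbers (counting data lines only, a simplification that ignores address/command/power pins):

# Board-level data lines needed for the memory interface, data pins only.
dies = 4
per_die_bus_bits = 256

individual = dies * per_die_bus_bits   # each die with its own 256-bit bus
shared     = 512                       # one unified bus behind the central "hub"

print("4 separate 256-bit buses:", individual, "data lines")   # 1024
print("1 shared 512-bit bus    :", shared, "data lines")       # 512
# Half the data lines to route, and every die can reach all of the on-board
# RAM through the hub instead of only its own local pool.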
 
Agreed.

I suppose it's worth pointing out that GPUs, via either AGP or PCI Express (PCI too?), have access to CPU memory. HyperMemory/TurboCache are the marketing names in the PCI Express view of the world:

http://techreport.com/articles.x/8396/1

Jawed

Yes, I'm well aware of them.
But using system RAM is neither desirable nor efficient for realtime 3D graphics rendering, otherwise IHVs wouldn't develop wide internal buses with lots of dedicated video RAM.

A good example is the recently disclosed AGP8x version of Powercolor's HD3850 512MB.
Given systems as similar as possible, would it really make any difference going from AGP8x to PCIe 1.x to PCIe 2.0?
I don't think so.
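Some rough peak numbers to show why the slot hardly matters once there's plenty of local VRAM (ballpark figures; the card's memory clock is approximate):

# Peak bandwidths, in GB/s. Slot figures are per direction.
slot = {
    "AGP 8x":       2.1,
    "PCIe 1.x x16": 4.0,
    "PCIe 2.0 x16": 8.0,
}
# HD3850: 256-bit GDDR3 at roughly 830 MHz (1660 MT/s effective)
local_vram = (256 / 8) * 1660e6 / 1e9   # ~53 GB/s

for name, bw in slot.items():
    print("%-13s: %5.1f GB/s" % (name, bw))
print("Local VRAM   : %5.1f GB/s" % local_vram)
# With 512MB on board, most traffic stays on the local bus, so doubling
# the slot bandwidth barely moves real-world performance.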




Even with "efficient on-package tracing" (physically similar to the ones used -in single GPU configurations, of course- in stuff like Nvidia's RSX or ATI's Xenos) it would still be an external/off-die memory controller, there's no way around it.

As for comparing GDDR types with DDR, i never mentioned anything of the sort, although i do have to point out that there is, in fact, DDR3 @ 2000MHz (not GDDR3, mind you).
This despite the fact that it's a DIMM-type of RAM, not an bunch of IC's soldered directly next to the dedicated path like in a graphics card.
 





Even with "efficient on-package tracing" (physically similar to the ones used -in single GPU configurations, of course- in stuff like Nvidia's RSX or ATI's Xenos) it would still be an external/off-die memory controller, there's no way around it.

As for comparing GDDR types with DDR, i never mentioned anything of the sort, although i do have to point out that there is, in fact, DDR3 @ 2000MHz (not GDDR3, mind you).
This despite the fact that it's a DIMM-type of RAM, not an bunch of IC's soldered directly next to the dedicated path like in a graphics card.

What would you suggest as an alternative? The proposed scenarios imply up to 4 individual GPU dies on a single package. Should each die include its own memory bus and have its own discrete VRAM pool, maintaining the inefficiencies of current solutions? Something else? Honest question, I'm curious as to how people are seeing the possible (probable) future implementations of multi-GPUs.

What I meant with the DDR example was that on one hand you have a 128-bit bus at best, whilst on a hypothetical GPU you'd have a possible 512-bit one, which even at similar clockrates would equate to 4 times the bandwidth (and there's no reason to ignore where GDDR4 is now in terms of clocks and where GDDR will go), and higher granularity (perhaps 8x64 independent channels instead of the 2x64 in the FSB situation). And we're still ignoring the fact that having the memory controller on the Northbridge is quite a bit different from having an on-package arbitrator, and considerably less performant as an approach.

OTOH, the FSB plus a Northbridge-integrated memory controller has proven to be a bad idea IRL only in server scenarios with 8+ CPUs, under heavy loading where the FSB itself becomes saturated. With only 4 CPUs, Core 2 Quads seem to have little issue with this approach (although in theoretical scenarios the X2s and the Phenom seem better, that simply doesn't materialise). A counter-argument here would be that the amount of data traffic would probably be significantly higher with 4 GPUs making concurrent demands. So, in short, I dunno... as with everything, some aspects would be better, and some worse. Only time will tell.
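Putting that width/clock comparison into rough numbers (a quick sketch; the DDR2 and GDDR4 transfer rates are just representative figures for the period):

# Peak bandwidth: dual-channel DDR2 behind an FSB vs a hypothetical
# 512-bit GDDR4 interface on the MCM. Transfer rates are ballpark.

def peak_gb_s(bus_width_bits, transfer_rate_mt_s):
    return (bus_width_bits / 8) * transfer_rate_mt_s / 1000

cpu_side = peak_gb_s(128, 800)     # 2 x 64-bit channels, DDR2-800
gpu_side = peak_gb_s(512, 2000)    # 8 x 64-bit channels, GDDR4 at 2 GT/s

print("Dual-channel DDR2-800 (2 x 64-bit) : %.1f GB/s" % cpu_side)  # 12.8
print("512-bit GDDR4 @ 2 GT/s (8 x 64-bit): %.1f GB/s" % gpu_side)  # 128.0
# 4x the width times 2.5x the transfer rate -> roughly 10x the bandwidth,
# spread over 8 channels instead of 2, i.e. finer access granularity too.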
 

My opinion is that both IHVs have been spinning the multi-GPU tale louder since 2004 because they can quickly and easily double their profits by selling two chips for the price of... two, to a single user, not because of some technical difficulty.
GPU R&D and production costs just keep getting higher, while brand loyalties in the GPU business are not what they used to be, so their (economically understandable) unwillingness to wait for process innovations like optical interconnects and on-die stacked memory leads them to go the multi-GPU route.


As you know, GPUs are essentially massively parallel architectures (unlike CPUs), so the reasoning for dropping a single, large core and going with a bunch of smaller cores working together is purely economical, not a technical hurdle.
Look at how G80, despite the huge die and relatively "old" fabrication process, turned out to be so successful for so long.
Do you think Quad-SLI'ed G84 cores in a single graphics card would be as power-efficient and price-competitive as one or two 8800 Ultras in terms of performance per watt and RAM IC costs?


This is why I also believe that Nvidia will do it again, and rely on 65nm and a G80-level die surface for their next high-end part (not 55nm and two or three SLI'ed cores).
The argument of saving costs by reusing separate-but-same-wafer cores for low-end cards doesn't add up either.
I doubt it is that much cheaper to produce two RV670s for the R680, with its complex, nearly "double-everything" PCB (or double PCB, like the 7950 GX2), than a single G80.
The 7950 GX2 itself was merely an "emergency" solution, due to G80 not being ready on time and ATI coming up with the X1950 XTX GDDR4.
 
Oh yes, thank you!!


One more question:

What's the difference between a dual GPU on two PCBs and a dual GPU on just one PCB (like the 3870 X2)?
 

Aesthetics. (In theory, going single-PCB could mean shorter traces / better inter-GPU communication, but both nV and ATi have opted to use a bridge chip anyway to handle inter-chip communication, so that's irrelevant for this gen.) You could also make a case for the dual-PCB board being harder to cool / a bitch to get aftermarket cooling on, but I don't think that was the point of your question :)
 
My main problem with this is the assumption that the GPUs are always load balancing even close to 75%. I understand nothing is 100% efficient, not even bandwidth utilization. But if you're not achieving ideal load balancing and frames are not divided evenly, then you're not really benefiting from the other GPU's bandwidth anyway.
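To put a number on that (a quick sketch; the 70/30 split is an assumed figure, not a measurement):

# If the work doesn't split evenly between the two GPUs, the busier one
# sets the frame rate, and the rest of the second GPU's bandwidth is wasted.

def speedup(share_gpu0, share_gpu1):
    # both GPUs run in parallel, so total time is set by the larger share
    return 1.0 / max(share_gpu0, share_gpu1)

print("Perfect 50/50 split: %.2fx" % speedup(0.50, 0.50))   # 2.00x
print("Uneven 70/30 split : %.2fx" % speedup(0.70, 0.30))   # 1.43x
# At 70/30 you only get ~1.4x out of two 256-bit buses, so a lot of the
# second GPU's bandwidth simply never gets used.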
 

I got it, thank you :D
 