Multi-GPU rendering communications thread

He means vastly more memory bandwidth, and only if the R700 can actually make use of a shared memory space with something along the lines of a 512-bit bus.

If one thinks about it, that would be a pretty awesome hurdle to overcome. The current _X2 cards and SLI/Crossfire systems are hugely limited in that area. However, all things considered, it may not be any less complicated to design a 512-bit shared memory space with multiple dies than a 512-bit bus for a single larger die. So a GT200b with GDDR5 would certainly equalize differences in memory bandwidth vs R700, assuming that R700 works with a shared memory space.
 
Well, even if they are sharing memory, I am not sure that they would be able to make use of a 512-bit bus due to bandwidth limitations between chips. I think ~64 GB/s in each direction would be the most one could hope for, and that would be insufficient if trying to move around ~108 GB/s. With a 256-bit bus, the most you would have to worry about would be ~72 GB/s, but half of that could travel in either direction.
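
As a rough sanity check on those figures, here is a minimal Python sketch of the usual peak-bandwidth arithmetic (bus width times per-pin data rate, divided by 8). The function names are made up for illustration, and the per-pin rates are simply back-solved from the ~108 GB/s and ~72 GB/s numbers above; they are not confirmed specs for any part.

```python
def bus_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: one bit per pin per transfer, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8.0


def required_pin_rate_gbps(bus_width_bits: int, target_gb_s: float) -> float:
    """Per-pin data rate (Gbps) needed to reach target_gb_s on the given bus."""
    return target_gb_s * 8.0 / bus_width_bits


def link_can_carry(traffic_gb_s: float, link_gb_s_per_direction: float) -> bool:
    """True if the traffic in one direction fits within the link's one-way limit."""
    return traffic_gb_s <= link_gb_s_per_direction


LINK_EACH_WAY = 64.0  # GB/s, the post's guess at the best case per direction

# Per-pin rates implied by the post's figures (assumptions, not specs).
rate_512 = required_pin_rate_gbps(512, 108.0)
rate_256 = required_pin_rate_gbps(256, 72.0)

print(rate_512, rate_256)
print(link_can_carry(108.0, LINK_EACH_WAY))     # all 512-bit traffic going one way
print(link_can_carry(72.0 / 2, LINK_EACH_WAY))  # 256-bit traffic split evenly
```

The two `link_can_carry` calls just restate the argument: the worst case for a 512-bit bus exceeds the assumed one-way link limit, while half of the 256-bit figure fits inside it.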
 
Well, even if they are sharing memory, I am not sure that they would be able to make use of a 512-bit bus due to bandwidth limitations between chips. I think ~64 GB/s in each direction would be the most one could hope for, and that would be insufficient if trying to move around ~108 GB/s. With a 256-bit bus, the most you would have to worry about would be ~72 GB/s, but half of that could travel in either direction.
In theory the bridge that joins two GPUs to create a shared memory space only needs to provide enough bandwidth for texturing or perhaps for post-transformed vertices. Render target operations (colour, Z and stencil writes, MSAA, blending) normally consume considerably more bandwidth.

So the bandwidth required for the bridge should be considerably less than the bandwidth of either GPU to its local memory.

But I admit, attempts at trying to quantify the bridge bandwidth haven't proven very successful so far. Bit embarrassing really.

Jawed
 
@ Jawed:

Yeah, those were just theoretical maximums.

One other benefit of keeping the same external bus size is that it would appear the same for each chip as in individual operation. With some minimal arbitration on the ring, it could probably operate as such. Otherwise, I'd think you would need a scheduler that is aware of whether it is being used in a multichip environment (and what that environment looks like), which would seem more of a hassle.
 
@ Jawed:

Yeah, those were just theoretical maximums.

One other benefit of keeping the same external bus size is that it would appear the same for each chip as in individual operation.
Resulting in a link that's bigger than it need be, it seems.

With some minimal arbitration on the ring, it could probably operate as such. Otherwise, I'd think you would need a scheduler that is aware of whether it is being used in a multichip environment (and what that environment looks like), which would seem more of a hassle.
The ring is programmable. I think if they want to they can vary the ring behaviour by game. So one-chip or two-chip is just another driver parameter.

Bearing in mind that a one-chip configuration has got a link that, presumably, goes nowhere.

Jawed
 
Jawed said:
So one-chip or two-chip is just another driver parameter.
I'm not so sure... if this were the case you would think we would have seen shared memory with 3870x2 products.



Jawed said:
Bearing in mind that a one-chip configuration has got a link that, presumably, goes nowhere.
Maybe I'm being dense, but why would this be the case?
 
I don't suppose someone is brave enough to slice off the discussion of how to utilize shared memory between two GPUs into a new thread?
 
I'm not so sure... if this were the case you would think we would have seen shared memory with 3870x2 products.
Which is why a shared memory R700 isn't given a great deal of credence...

Maybe I'm being dense, but why would this be the case?
Like when a CrossFire connection goes "nowhere" when there's only one GPU in a system.

Jawed
 
Like when a CrossFire connection goes "nowhere" when there's only one GPU in a system.
I was thinking... And that's a wild guess... But maybe they can use half of MCs for such link?
If we assume that RV770 has a 256-bit memory bus, then RV770X2 can have a 256-bit memory bus too, with 128 bits on each chip used for the link in a NUMA architecture.
And I always thought that GDDR5 is a bit too much for RV770, but for such an X2 configuration it makes perfect sense, no? Effectively you have the same memory bandwidth as with 256+256-bit GDDR3, but you also have good NUMA with a 256-bit interconnect between the chips.
But that's just me thinking out loud :)
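
To put rough numbers on that idea, here is a small Python sketch comparing two hypothetical dual-chip layouts: 256 bits of GDDR3 per chip versus 128 bits of GDDR5 per chip (with the other 128 bits given over to the link). The per-pin rates are placeholders chosen only so that GDDR5 runs at twice the GDDR3 rate, which is the premise of the comparison; they are assumptions, not known RV770 figures, and the function name is made up.

```python
GDDR3_GBPS_PER_PIN = 2.0  # assumed placeholder rate
GDDR5_GBPS_PER_PIN = 4.0  # assumed to be exactly twice the GDDR3 rate


def aggregate_memory_bandwidth_gb_s(
    chips: int, memory_bits_per_chip: int, gbps_per_pin: float
) -> float:
    """Sum of each chip's local peak memory bandwidth, in GB/s."""
    per_chip = memory_bits_per_chip * gbps_per_pin / 8.0
    return chips * per_chip


# Layout A: two chips, each with a full 256-bit GDDR3 interface.
gddr3_total = aggregate_memory_bandwidth_gb_s(2, 256, GDDR3_GBPS_PER_PIN)

# Layout B: two chips, each with 128 bits to GDDR5 and 128 bits to the link.
gddr5_total = aggregate_memory_bandwidth_gb_s(2, 128, GDDR5_GBPS_PER_PIN)

print(gddr3_total, gddr5_total)
```

Under these assumed rates both layouts come out to the same aggregate memory bandwidth, and layout B has the extra pins left over for the chip-to-chip link.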
 
DegustatoR said:
I was thinking... And that's a wild guess... But maybe they can use half of MCs for such link?
Yeah, that's what I was getting at.


Jawed said:
Which is why a shared memory R700 isn't given a great deal of credence...
Well, if it is dual chip and not shared, it is going to need 2GB of GDDR5 to compete with a GTX280 or even 260... Perhaps Arun is right and it is 800sp, single chip after all, but it seems odd that Pande would intentionally leak false specs.
 
I don't suppose someone is brave enough to slice off the discussion of how to utilize shared memory between two GPUs into a new thread?
Done for the portion in the GT200 thread, will mop up the RV770 thread tomorrow. Dog tired working to write something meaningful before CJ leaks it all :LOL:
 
I was thinking... And that's a wild guess... But maybe they can use half of MCs for such link?
If we assume that RV770 has a 256-bit memory bus, then RV770X2 can have a 256-bit memory bus too, with 128 bits on each chip used for the link in a NUMA architecture.
And I always thought that GDDR5 is a bit too much for RV770, but for such an X2 configuration it makes perfect sense, no? Effectively you have the same memory bandwidth as with 256+256-bit GDDR3, but you also have good NUMA with a 256-bit interconnect between the chips.
But that's just me thinking out loud :)
It's certainly an idea!

Maybe not half, hey what about 192 bits connected to memory chips and the other 64 bits connected to the other GPU?

So, ahem, the start of the 768MB R700 rumour :?: :p
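
For anyone wondering where 768MB would come from, here is a quick Python sketch of the arithmetic. It assumes each memory chip has a 32-bit interface and holds 64MB (512Mbit); both are assumptions for illustration, not known R700 details, and the function names are made up.

```python
def chips_on_bus(memory_bits: int, chip_interface_bits: int = 32) -> int:
    """Number of memory chips that fill a bus of the given width."""
    if memory_bits % chip_interface_bits != 0:
        raise ValueError("bus width must be a multiple of the chip interface width")
    return memory_bits // chip_interface_bits


def card_capacity_mb(
    gpus: int,
    memory_bits_per_gpu: int,
    chip_mb: int = 64,             # assumed 512Mbit devices
    chip_interface_bits: int = 32  # assumed x32 devices
) -> int:
    """Total memory on the card, in MB, summed over all GPUs."""
    return gpus * chips_on_bus(memory_bits_per_gpu, chip_interface_bits) * chip_mb


# 192 bits to memory per GPU, with the other 64 bits going to the other GPU.
print(card_capacity_mb(2, 192))

# For comparison: all 256 bits per GPU going to memory.
print(card_capacity_mb(2, 256))
```

With those assumed chip sizes, the 192-bit split lands on 768MB for the card, and the full 256-bit case lands on 1024MB.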

Jawed
 
It's certainly an idea!

Maybe not half, hey what about 192 bits connected to memory chips and the other 64 bits connected to the other GPU?

So, ahem, the start of the 768MB R700 rumour :?: :p

Jawed

Oh my, no way that the PR department would let the R700 engineers get away with anything less than 1024MB :)

So if ATI can figure out a clean way to have a shared memory space for two dies on one card, what about more than two dies on one card? Is that even a realistic proposition?
 
I'd say 3 cores would be close to the limit that would physically fit. The limiting factor seems like it will come down to the interconnects for a shared memory system.
 