Crossbar Architectures

I am curious whether anyone can explain nV's crossbar architecture to me in a little more depth. As far as I understand, a crossbar architecture requires n^2 switches, compared with an omega architecture, which needs 2n log n. So am I correct in assuming that as memory sizes increase (e.g. 64MB on GF3 to 128MB on GF4), the cost of using a crossbar architecture is going to grow quadratically? But at the same time, I understand that as memory gets larger, an omega architecture takes that much longer to reach a node it doesn't have a direct path to. In the long run, would it be cheaper to widen the data path from 64 bits to 128 bits so that fewer paths are needed?

thanks
 
Since the crossbar architecture is made up of 32 bit (DDR) channels, complexity won't increase with increased memory as long as the aggregate width of the memory bus is the same (128 bit DDR).

I have no clue what an Omega crossbar is, but I guess (from your description) that it is hierarchical. I expect the nV crossbar to be all-to-all wired, i.e. complexity n^2, with n being the number of ports.
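To put rough numbers on the scaling question, here is a minimal Python sketch. It assumes the omega network is built from 2x2 switches (log2(n) stages of n/2 switches each) and that each 2x2 switch counts as 4 crosspoints, which is one way to arrive at the "2n log n" figure from the opening post. The function names are made up for illustration and say nothing about how nV actually wires the chip.

```python
def channel_count(bus_bits: int, channel_bits: int) -> int:
    """Number of memory channels (ports) for a given aggregate bus width."""
    return bus_bits // channel_bits


def crossbar_crosspoints(ports: int) -> int:
    """Full crossbar: one crosspoint per (input, output) pair, so n^2."""
    return ports * ports


def omega_crosspoints(ports: int) -> int:
    """Omega network of 2x2 switches, counted in crosspoints.

    Assumption: log2(ports) stages, ports/2 switches per stage,
    and 4 crosspoints inside each 2x2 switch, giving 2 * n * log2(n).
    """
    if ports < 2 or ports & (ports - 1):
        raise ValueError("ports must be a power of two and at least 2")
    stages = ports.bit_length() - 1          # log2(ports) for a power of two
    switches = (ports // 2) * stages
    return 4 * switches


if __name__ == "__main__":
    # The port count depends on bus width, not on how many MB hang off each channel.
    ports = channel_count(bus_bits=128, channel_bits=32)
    for memory_mb in (64, 128):
        print(memory_mb, "MB:", ports, "ports,", crossbar_crosspoints(ports), "crosspoints")

    # How the two topologies compare as the port count grows.
    for n in (4, 8, 16, 32):
        print(n, crossbar_crosspoints(n), omega_crosspoints(n))
```

Under these assumptions, a 4-port crossbar and a 4-port omega network both come out at 16 crosspoints, and the omega network only pulls ahead at 8 ports and beyond (48 versus 64).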

Cheers
Gubbi
 
Crossbar is just a marketing term; I doubt they intend it to reflect how they route their signals.
 
Well, if you believe Dave Kirk, it does. There are four distinct memory spaces. Here's the relevant text:

"Recall that LMA is a four-way crossbar memory controller whose mission in life is to make as efficient use of memory bandwidth as possible. By having four controllers, each of which tends to one-quarter of the onboard memory (32MB in the case of a 128MB board), there can be four simultaneous memory transactions in flight. The principle advantage of this approach is that it helps to mask memory access latency. Each controller has its own 32-bit address space, and effectively a 64-bit interface (32-bit physical interface, DDR memory) to its respective slice of memory. The other key advantage this design delivers is more granular memory access. "

This is from the article by Dave Salvator, my counterpart in crime at ExtremeTech:

http://www.extremetech.com/article/0,3396,s=1017&a=22391,00.asp
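The "more granular memory access" point in the quote can be illustrated with a bit of arithmetic. This is only a sketch: it assumes the alternative to four controllers would be a single controller driving the whole 128-bit DDR bus (256 bits per clock), it takes the 64 bits per clock per controller from the quote, and it ignores DRAM burst lengths entirely. The names are hypothetical.

```python
MONOLITHIC_GRANULE_BITS = 256  # assumption: one controller on a 128-bit DDR bus
LMA_GRANULE_BITS = 64          # from the quote: 32-bit physical interface, DDR


def bits_transferred(request_bits: int, granule_bits: int) -> int:
    """Round a request up to a whole number of minimum-size transfers."""
    granules = -(-request_bits // granule_bits)  # ceiling division
    return granules * granule_bits


def efficiency(request_bits: int, granule_bits: int) -> float:
    """Fraction of the transferred bits that were actually requested."""
    return request_bits / bits_transferred(request_bits, granule_bits)


if __name__ == "__main__":
    for request in (32, 64, 96, 256):
        print(
            request,
            efficiency(request, MONOLITHIC_GRANULE_BITS),
            efficiency(request, LMA_GRANULE_BITS),
        )
```

With these numbers, a 64-bit request uses only a quarter of a 256-bit transfer but all of a 64-bit one, while a full 256-bit request is equally efficient either way.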
 
Interesting; to me that description goes completely against their pictorial representation of what is happening:

[Image: crossbar4.jpg]


That diagram shows each controller accessing all available memory; however, David is saying that each controller can only address its own 32 bits.

_________________
'Wavey' Dave
Beyond3D
 
Can what Kirk said and the diagram coexist? I have no idea how these things physically work, but why not the following:

Each of the 4 memory "chunks" (I presume these are some configuration of physical chips?) has a 32-bit access path. But why can't each controller access only 8 bits of each "chunk"?

This gives 32 bits of access for each LMA controller, spread over the entire memory "pool", or the entire range of chips. However, each 32-bit memory "chunk" is segmented 4 ways, giving "exclusive" access to each segment to a single LMA controller.

[ This Message was edited by: Joe DeFuria on 2002-03-05 17:50 ]
 
"Each of the 4 memory "chunks" (I presume these are some configuration of physical chips?) has a 32 bit access path. But why can't each controller only access 8 bits of each "chunk"?"

Because it makes no sense. If the 32-bit-wide data bus each controller has is split 4 ways and distributed to each bank, how can you fetch 4 different 32-bit words at a time from different locations? Wouldn't you reduce the minimum data load/write from 32 to 8 bits, masking the unused bits and wasting 75% of peak bandwidth? :smile:

ciao,
Marco
 
One guess: each of the "controllers" in the diagram is a unit that manages memory access for one part of the chip (e.g. vertex cache, texture mapper, framebuffer blender, Z-buffer tester), whereas the controllers mentioned by David Kirk are raw DRAM controllers, corresponding to the "memory" blocks in the diagram. It would make sense to have a physical crossbar between those stages.
 
On 2002-03-05 17:33, DaveBaumann wrote:
Interesting; to me that description goes completely against their pictorial representation of what is happening:

http://216.12.218.25/domain/www.beyond3d.com/articles/gf4launch/crossbar4.jpg

That diagram shows each controller accessing all available memory; however, David is saying that each controller can only address its own 32 bits.

What I don't understand in this pic is why the 2 end memory controllers (0 and 3) access n-1 memory nodes, yet memory controllers 1 and 2 access all 4 of the memory nodes shown. Doing things this way would seem to limit your crossbar, because you still won't have direct access to ALL of your memory nodes from each controller.

[ This Message was edited by: LittlePenny on 2002-03-05 19:02 ]
 
For some reason my university prohibits me from downloading the full texts, but those of you with an ACM account can view http://portal.acm.org/citation.cfm?...dl=ACM&CFID=1774386&CFTOKEN=65395595# or you can read what I was able to download: http://www.umr.edu/~buechler/p202-briggs.pdf.

Keep in mind that neither article deals with the exact topic of memory controllers in graphics processing, but the same logic can be applied. In the latter article, the interesting stuff begins around section 5, page 3. It helps explain to me why nV may have chosen to have the end memory controllers access only 3 nodes instead of all four: at this point, adding more paths may not increase efficiency enough to justify the cost.

[ This Message was edited by: LittlePenny on 2002-03-05 19:42 ]
 
I heard somewhere that nvidia's memory architecture wasn't designed or invented by them.
Same with the audio processor that's in their nforce boards.
 
I hadn't noticed before that the outside memory controllers can't access all of memory. I would guess this is largely a routing issue, but it might just be that the extra paths don't actually help. As was said, Kirk's quote doesn't mention the crossbar aspect of the memory controllers. Maybe accessing another 32-bit address space only occurs in specific situations, and in general the controllers act as if they're distinct. So maybe the crossbar just provides more flexibility.
 
Loyd, I didn't know that you read these forums. I enjoyed your writing for years before you left CGW. Just wanted to say thanks for all the great articles you've done over the years. Big thumbs up.
:D

[ This Message was edited by: Brimstone on 2002-03-07 07:10 ]
 
On 2002-03-06 03:21, phynicle wrote:
I heard somewhere that nvidia's memory architecture wasn't designed or invented by them.
Same with the audio processor that's in their nforce boards.

I can't speak for their memory architecture, but their audio processor IP came from Parthus, Inc. It's a clone of the Motorola 56300 processor.
 
The most likely explanation of that image is that it's the interpretation a computer-illiterate PR guy made of what a tech guy said. The image really doesn't make sense, and any strangeness is likely because it's incorrect.
 
You know, it could be 4 separate buses with access to 4 individual regions of memory, except they are all interleaved, RAID striping style.

Hey, just a thought. After all, it would actually help some with granularity without holding up large transfers.

Dave
 
"You know, it could be 4 separate buses with access to 4 individual regions of memory, except they are all interleaved, RAID striping style.

Hey, just a thought. After all, it would actually help some with granularity without holding up large transfers."

I thought that was exactly what they're doing. Would this allow them to have more banks/pages open? I think 16 instead of the usual 4.
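For what it's worth, the RAID-style striping idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: the thread never gives a stripe size (32 bytes is picked arbitrarily), the function names are invented, and the banks-per-channel figure of 4 is only there to reproduce the "16 instead of the usual 4" arithmetic.

```python
from __future__ import annotations

NUM_CONTROLLERS = 4   # from the Kirk quote: four controllers
STRIPE_BYTES = 32     # hypothetical stripe size; not stated anywhere in the thread


def locate(address: int) -> tuple[int, int]:
    """Map a linear byte address to (controller, local address) under striping."""
    stripe, offset = divmod(address, STRIPE_BYTES)
    controller = stripe % NUM_CONTROLLERS
    local = (stripe // NUM_CONTROLLERS) * STRIPE_BYTES + offset
    return controller, local


def controllers_touched(start: int, length: int) -> set[int]:
    """Which controllers a contiguous transfer of `length` bytes lands on."""
    first = start // STRIPE_BYTES
    last = (start + length - 1) // STRIPE_BYTES
    return {stripe % NUM_CONTROLLERS for stripe in range(first, last + 1)}


def open_pages(controllers: int, banks_per_channel: int = 4) -> int:
    """Pages that could be open at once, assuming 4 banks behind each channel."""
    return controllers * banks_per_channel


if __name__ == "__main__":
    print(controllers_touched(0, 16))    # a small access stays on one controller
    print(controllers_touched(0, 128))   # a large transfer spreads over all four
    print(open_pages(1), open_pages(NUM_CONTROLLERS))
```

In this sketch a small access ties up a single controller while a large sequential transfer is spread across all four, which is the granularity benefit Dave describes.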
 