What are the advantages of multi-chip cards?

One advantage of a multichip architecture is higher yields. With a single big chip, if the rendering engine is bad you also throw away the geometry engine. If these are in separate chips, you only throw away the bad rendering chip.

Disadvantages of multiple chips might be board space and lower data transfer speeds between the chips.
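To put rough numbers on the yield point, here's a minimal sketch using a simple Poisson defect model; the die sizes and defect density are made up, purely for illustration:

Code:
import math

def die_yield(area_mm2, d0=0.005):
    # simple Poisson model: probability a die has zero defects
    return math.exp(-d0 * area_mm2)

big = die_yield(200.0)    # geometry + rendering on one 200 mm^2 die
half = die_yield(100.0)   # geometry and rendering on separate 100 mm^2 dice

print(f"200 mm^2 die yield: {big:.1%}")    # ~36.8%
print(f"100 mm^2 die yield: {half:.1%}")   # ~60.7%
# With one big die, a single defect scraps both engines.
# With two dice, a defect only scraps the die it landed on, so the
# fraction of wafer area that ends up in sellable chips roughly
# doubles under these made-up numbers.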
 
Sorry Randall- didn't mean my "Who said.." to be a rebuttal. heh. Just a rhetorical point of interest. :)

I also agree with Rev that with the V5 (assuming the meatiest "problem" apps and taking AA out of the picture), it was a fillrate bottleneck more so than anything else.
 
3dcgi said:
One advantage of a multichip architecture is higher yields. With a single big chip, if the rendering engine is bad you also throw away the geometry engine. If these are in separate chips, you only throw away the bad rendering chip.

Disadvantages of multiple chips might be board space and lower data transfer speeds between the chips.

While I don't know for sure, I suspect the increased testing time and packaging costs may outweigh the increased yield. Each chip would require an "exotic" BGA with several hundred pins, instead of a single one of equal complexity.
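To see how that could go either way, here's a toy cost comparison; the wafer cost, dies per wafer, defect density and per-chip packaging/test figure are all invented, so treat it as a sketch of the trade-off rather than real numbers:

Code:
import math

def die_yield(area_mm2, d0=0.005):
    return math.exp(-d0 * area_mm2)

WAFER_COST = 3000.0   # hypothetical
DIES_200 = 250        # rough count of 200 mm^2 dice per wafer
DIES_100 = 520        # rough count of 100 mm^2 dice per wafer
PKG_TEST = 15.0       # hypothetical BGA package + test cost per chip

# Single-chip: one package and one test pass per part.
single = WAFER_COST / (DIES_200 * die_yield(200.0)) + PKG_TEST

# Two-chip: cheaper silicon per good die, but two packages and two test passes.
dual = 2 * WAFER_COST / (DIES_100 * die_yield(100.0)) + 2 * PKG_TEST

print(f"single-chip part: ${single:.0f}")   # ~ $48
print(f"two-chip set:     ${dual:.0f}")     # ~ $49
# Whether the yield win beats the doubled packaging/test cost depends
# entirely on the defect density and package price you plug in.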
 
While I don't know for sure, I suspect the increased testing time and packaging costs may outweigh the increased yield.

I'd be inclined to believe that it's really on a case-by-case basis. Unless we're comparing two specific designs and two specific engineering teams, trying to generalize cost effectiveness one way or the other is basically a crap shoot IMO.
 
Didn't John Carmack state that things were going to get interesting next year with the release of new technology and multi-chip solutions? I remember reading that somewhere, but I can't remember where.
 
Joe DeFuria said:
I'd be inclined to believe that it's really on a case-by-case basis. Unless we're comparing two specific designs and two specific engineering teams, trying to generalize cost effectiveness one way or the other is basically a crap shoot IMO.

But that's not the question here. Sure, you could have a crack team make a two-chip solution that was more cost effective than a hack team's single-chip solution.

The question at hand is "all else being equal", which is really the only way to compare hypotheticals, no?
 
I think both ATI and nVidia have multi-chip possibilities in mind for their next generation, but it is not for the consumer market. It is for the commercial render-farm market, which I think they are really aiming at with their floating-point capable architectures (and also the high-end simulator market, which I think is much smaller).

Both companies have hardware products which can finally approach the capabilities of software renderers (although still with multiple passes so not at real-time speeds), and this gives them a crack at this market that they haven't really had before. This market is less price sensitive than the consumer market and already has software that would benefit from all the horsepower a multi-chip solution has to offer.

(I think nVidia might have a better product than ATI for commercial rendering--its support of very long shaders isn't really useful for real-time, but might pay off doing "near-real-time" renders with sophisticated shaders)
 
antlers4 said:
I think both ATI and nVidia have multi-chip possibilities in mind for their next generation, but it is not for the consumer market. It is for the commercial render-farm market, which I think they are really aiming at with their floating-point capable architectures (and also the high-end simulator market, which I think is much smaller).

Both companies have hardware products which can finally approach the capabilities of software renderers (although still with multiple passes so not at real-time speeds), and this gives them a crack at this market that they haven't really had before. This market is less price sensitive than the consumer market and already has software that would benefit from all the horsepower a multi-chip solution has to offer.

(I think nVidia might have a better product than ATI for commercial rendering--its support of very long shaders isn't really useful for real-time, but might pay off doing "near-real-time" renders with sophisticated shaders)

Well ATi does have AFR, and IIRC there was some talk a while back about a possible R300 MAXX... but that was of course rumours so don't trust it :p

nVidia's multichip is questionable... I've heard conflicting things from various sources about whether or not they have SLI. Anyone know the patent # so we can see for sure who the current owner truly is?
 
Since SLI isn't so crash hot in terms of technology, I'm not sure they'll exactly need it. Rendering to tiles seems to be a better approach for multichip solutions.
 
I liked SA's ideas about multichip solutions with edram. The question is whether edram will ever become mainstream (or cost effective, if you prefer).
 
A multichip module.

Essentially, the separate chips are placed on a small "pcb" or substrate and the whole substrate is encased in epoxy to make the chip. From the outside, it looks like a single 'chip'.

From what we've seen, the price increase for this isn't horrible, but your useable die space goes down because you've got a pad ring around each chip in the module, and you have to pay for the more expensive packaging. I think BGA chips need the substrate anyways, so all you're paying for is the extra time in manufacturing (which costs money) as the robots place the multiple chips on the substrate.

Many flash chips are done this way, as are SmartMedia cards (the larger ones). ATI's recent Mobility chips are MCMs, as is Philips' new MP3 decoder chip (Ptoooie! ours is better).
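Rough sketch of the pad-ring hit on usable area; the die dimensions and ring width below are invented, just to show the effect:

Code:
# Usable core area when one die is split into two MCM dice,
# each needing its own pad ring (all dimensions are made up).
PAD_RING_MM = 0.5

def core_area(width_mm, height_mm, ring=PAD_RING_MM):
    return (width_mm - 2 * ring) * (height_mm - 2 * ring)

single = core_area(14.0, 14.0)      # one 14x14 mm die (~196 mm^2 bought)
split = 2 * core_area(10.0, 9.8)    # two dice totalling ~196 mm^2 bought

print(f"single die core area: {single:.0f} mm^2")   # ~169
print(f"two MCM dice core:    {split:.0f} mm^2")    # ~158
# Same total silicon purchased, less of it left for logic, plus the
# substrate and the extra pick-and-place time on top.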
 
Wasn't the "full speed cache" in the Pentium Pro located in a separate chip all wrapped up into one package with the CPU?

For a long time I thought the PPro actually had on-die cache, but I know I've run across a picture somewhere showing the exposed cores, and there was clearly a separate chip for cache.

As far as that goes... instead of just having an off-die in-package cache, how about two separate GPU/VPU cores that share a single package? Perhaps the two cores could share many of the pins between them, reducing redundancy and helping keep cost down.

With some magical memory controllers, I could even see a single memory bus from the package to memory that is shared between the two cores.
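Toy sketch of what a shared bus between two cores might look like, with a plain round-robin arbiter; the request names and the arbitration scheme are just assumptions for illustration:

Code:
from collections import deque

# Two cores queue memory requests; a round-robin arbiter grants one
# request per "cycle" onto the single shared bus.
core_a = deque(["A-tex0", "A-tex1", "A-z0"])
core_b = deque(["B-fb0", "B-tex0"])

bus_log, turn = [], 0
while core_a or core_b:
    queues = [core_a, core_b]
    q = queues[turn % 2] or queues[(turn + 1) % 2]   # skip an empty queue
    bus_log.append(q.popleft())
    turn += 1

print(bus_log)   # requests interleave: A, B, A, B, A
# Each core sees roughly half the bandwidth under load -- that's the
# trade-off for saving pins and traces with a single memory bus.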

Oh well... I'll stop smoking the wacky weed now. :)
 
Ack, the board's search functions don't always work as I'd want them to; thank God I always save the interesting stuff at home. Just to remind some people what I meant when I mentioned SA's idea (and I think it answers most of the topics in this thread adequately anyway), here's the post I was referring to:

Originally posted by SA:

Highly scalable problems such as 3d graphics and physical simulation should get near linear improvement in performance with transistor count as well as frequency. As chips specialized for these problems become denser they should increase in performance much more than CPUs for the same silicon process. This means moving as much performance sensitive processing as possible from the CPU to special purpose chips. This is quite a separate reason for special purpose chips than simply to implement the function directly in hardware so as to be able to apply more transistors to the computations (as mentioned in my previous post). It also applies to functions that require general programmability (but are highly scalable). General programmability does not preclude linear scalability with transistor count. You just need to focus the programmability to handle problems that are linearly scalable (such as 3d graphics and physical simulation). It makes sense of course to implement as many heavily used low level functions as possible directly in hardware to apply as many transistors as possible to the problem at hand.

The other major benefit from using special purpose chips for highly scalable, computation intensive tasks is the simplification and linear scalability of using multiple chips. This becomes especially true as EDRAM arrives.

The MAXX architecture requires scaling the external memory with the number of chips, and so does the scan line (band line) interleave approach that 3dfx used. With memory being such a major cost of a board, and with all those pins and traces to worry about, it is a hard and expensive way to scale chips (requiring large boards and lots of extra power for all that external memory). The MAXX architecture also suffers from input latency problems limiting its scalability (you increase input latency by one frame time with each additional chip). The scan line (band line) method also suffers from caching problems and lack of triangle setup scalability (since each chip must set up the same triangles redundantly).

With EDRAM, the amount of external memory needed goes down as the number of 3d chips increase. In fact, with enough EDRAM, the amount of external memory needed quickly goes to 0. EDRAM based 3d chips are thus ideal for multiple chip implementations. You don't need extra external memory as the chips scale (in fact you can get by with less or none), and the memory bandwidth scales automatically with the number of chips.

To make the maximum use of the EDRAM approach, the chips should be assigned to separate rectangular regions or viewports (sort of like very large tiles). The regions do not have their rendering deferred (although they could of course), they are just viewports. This scaling mechanism automatically scales the computation of everything: vertex shading, triangle setup, pixel operations, etc. It does not create any additional input latency, allows unlimited scalability, and does not require scaling the memory as required by the previously mentioned approaches.

Tilers without EDRAM also scale nicely without needing extra external memory. They are, in fact, the easiest architecture to scale across multiple chips. You just assign the tiles to be rendered to separate chips rather than the same chip. The external memory requirements, while remaining constant, do not drop, however, as they do with EDRAM. The major problem to deal with is scaling the triangle operations as well as the rendering. In this case, combining the multi-chip approach mentioned for EDRAM with tiling solves these issues. You just assign all the tiles in a viewport/region to a particular chip. Everything else is done as above and has the same benefits.

In my mind, the ideal 3d card has 4 sockets and no external memory. You buy the card with one socket populated at the cost of a one chip card. The chip has 32 MB of EDRAM, so with 1 chip you have a 32MB card. When you add a second chip you get a 64MB card with double the memory bandwidth and double the performance. For those who go all out and decide to add 3 chips, they get 128 MB of memory, and quadruple the memory bandwidth and performance. Ideally, the chip uses some form of occlusion culling such as tiling, or hz buffering with early z check, etc. Using the same compatible socket across chip generations would be a nice plus.

In the long run I agree with MFA. Using scene graphs or a similar spatial hierarchy simplifies and solves most of these problems, including accessing, transforming, lighting, shading, and rendering only what is visible. They also simplify the multiple chip and virtual texture and virtual geometry problems. We will need to wait a bit longer for it to appear in the APIs though.

There are indeed two problems generally associated with partitioning the screen across multiple chips: load balancing, and distributing the triangles to the correct chip. Both have fairly straightforward, very effective solutions, though I can't mention the specifics here.
Those are some good comments, MFA. However, there is no need to defer rendering and no need for a large buffer. Each chip knows which vertices/triangles to process, without waiting.
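A minimal sketch of the viewport-split idea described above; the 2x2 region layout, chip count and sample triangle are my own made-up example, and a real design would also handle the load balancing SA mentions:

Code:
# Screen split into one rectangular viewport per chip; a triangle is sent
# to every chip whose region its screen-space bounding box touches.
SCREEN_W, SCREEN_H = 1600, 1200

def regions(w=SCREEN_W, h=SCREEN_H):
    # fixed 2x2 split for 4 chips
    return [(x, y, x + w // 2, y + h // 2)
            for y in (0, h // 2) for x in (0, w // 2)]

def chips_for_triangle(tri):
    xs, ys = [p[0] for p in tri], [p[1] for p in tri]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    hit = []
    for chip, (rx0, ry0, rx1, ry1) in enumerate(regions()):
        if x0 < rx1 and x1 > rx0 and y0 < ry1 and y1 > ry0:
            hit.append(chip)
    return hit

tri = [(100, 100), (900, 150), (500, 400)]   # crosses the vertical split
print(chips_for_triangle(tri))               # -> [0, 1]: both top chips set it up
# Each chip sets up and rasterizes only the triangles touching its region
# and keeps that region's color/Z in its own EDRAM, so external memory
# doesn't have to grow with the chip count.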
 
I think it's best to use normal RAM on graphics cards because you need it for many high resolution textures.
Memory bandwidth needs will probably reach a limit in the future, right?
I think if 1 terabyte/sec is reached in 10 years or so, it will be enough for perfect texture filtering (enough samples for it) and as many texture layers as you need.
Then the speed increase will come in the calculation quality of textures and geometry and lighting; I don't think these need much memory bandwidth?
And if the lighting did need much memory bandwidth, I think on-chip memory would be good for this.
I mean raytracing or radiosity or something else...
Then multiple chips can be used without the need to double the RAM for more bandwidth, like with the VSA-100.
Maybe the normal memory would be just for texture storage.
The frame buffer or scene buffer and all the space needed for calculations for lighting or anything else could be in edram on the chip, I think.
Maybe more physics/geometry calculations can be done too.
I obviously don't know much about how it all actually works, but I think this could be good somehow?
Maybe someone can give some comments on this.
 
Even 1TB/sec wouldn't really be enough.

Hell, if you went back to when Voodoo2 was released (2.4GB/sec per), I'm sure people would've said that today's 20GB/sec (ParH and R300) is way more than enough for everything. :)

I think if a core came out next month with 1TB/sec, yes, since nothing could EVER saturate that bus TODAY, it would be more than enough... but in the future, even that would be saturated. Then we'd be looking at 1PB/sec... even 1EB/sec. Ouch. Thinking about a card with 1EB/sec memory bandwidth... woo...
 
I don't think you can just assume more is better with this...
The 20GB/s of the 9700 is enough for 2+ texture layers at 1600*1200 with 16x adaptive anisotropic at 100fps.
If you go to 10 texture layers (why would you need that?) and a 1024x anisotropic limit at 3200*2400, I think you won't even use 1TB/sec.
The higher the anisotropic level, the fewer pixels it applies to... I think at 64x it probably covers less than 1% of the pixels, and at 1024x maybe 100 pixels are left on the whole screen.
So this is why I'm pretty sure that 1TB/sec would be enough texture bandwidth to do perfect texture calculations with.
The other things can be stored in on-chip RAM in the future, and I don't even think they use as much bandwidth.
If anything did eventually use more bandwidth it wouldn't even matter, because it's on-chip RAM; the 1TB/sec is just for the texture memory, since textures would take too much space on the chips.
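A back-of-envelope version of that texture-bandwidth claim; the overdraw, sample count, aniso factor and caching assumptions are mine and very hand-wavy:

Code:
# Very rough texture-bandwidth estimate -- real hardware caches and
# compresses aggressively, so treat this as an upper bound.
width, height, fps = 1600, 1200, 100
layers = 2               # texture layers per pixel
overdraw = 2.0           # average times each pixel gets shaded
bytes_per_texel = 4      # 32-bit textures
texels_per_sample = 8    # trilinear, before any texture caching
aniso_factor = 2.0       # assumed average cost of adaptive 16x aniso

texel_reads = (width * height * fps * layers * overdraw
               * texels_per_sample * aniso_factor)
gb_per_sec = texel_reads * bytes_per_texel / 1e9

print(f"~{gb_per_sec:.0f} GB/s of raw texel traffic")   # ~49 GB/s
# Even this worst case is only a couple of times the R300's ~20 GB/s,
# and a texture cache with a decent hit rate pulls the external-memory
# figure well below it -- which is roughly the argument above.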
 