What are the advantages of multi-chip cards?

Johnathan256

Newcomer
Just about all I have heard about multi-chip architectures has been disadvantages. There are multi-chip rumors going around about the NV30. The 3dfx Rampage chips were part of a new line of cards called Spectre; Rampage was to be used in conjunction with Sage, a geometry processor on a separate chip. The GigaPixel GP-1 chips were also multi-chip capable. With both of these designs in the hands of Nvidia, and the notion of the NV30 being a revolutionary new architecture, it makes sense to think Nvidia might have taken that route, unless they have developed the technology into a single chip that's even better. 3Dlabs seems to have had some success with their graphics array onboard the Wildcat III cards. I am wondering: what are some possible advantages to a configuration of multiple chips with specific tasks?
 
I remember back in the day, with the introduction of the VSA-100 and its multi-chip solution, that NVIDIA was adamant about staying with a single-chip card. They seemed to think that was the better way to go at the time; I don't remember the specifics, but that much I do remember.

If they have re-thought this approach after seeing all the GigaPixel/3dfx tech, then it could be most interesting. I would love to know their thinking on it myself. Of course no one really knows WHAT the NV30 will be just yet, heh, so it is really all speculation at this point.

But as for the advantages a multi-chip configuration can have, I too would love to hear from the people here how it could be advantageous. One thing I can think of right off the bat is heat: if they offload the work to other chips, then it should in turn keep them cooler than if everything was crammed into one chip. The transistors per chip can be decreased.... I'd love to hear more myself though on the benefits....
 
Yeah, I am most interested as well. I am no expert on this subject either, but it seems to me that it would be cheaper to use smaller chips with fewer transistors. This would spread the overall transistor count and make the chips easier and cheaper to produce. And as you said, heat would also be less of an issue. But as far as performance goes, I am not sure. What 3dfx did with the Voodoo 5's might have been OK if a geometry engine could have been added. Hardware T&L, baby!!!
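To put rough numbers on the yield side of that, here's a minimal sketch using the classic exponential yield model; every fab figure below is an invented placeholder, not real data:

```python
import math

# Classic first-order yield model: yield = exp(-defect_density * area).
# All figures are invented placeholders, not real fab numbers.
WAFER_COST = 3000.0      # dollars per processed wafer (assumed)
WAFER_AREA = 45000.0     # usable mm^2 per wafer (assumed)
DEFECTS_PER_MM2 = 0.006  # defect density (assumed)

def cost_per_good_die(area_mm2):
    dies_per_wafer = WAFER_AREA / area_mm2
    good_fraction = math.exp(-DEFECTS_PER_MM2 * area_mm2)
    return WAFER_COST / (dies_per_wafer * good_fraction)

one_big = cost_per_good_die(200.0)        # one 200 mm^2 chip
two_small = 2 * cost_per_good_die(100.0)  # two 100 mm^2 chips
print(f"one big die:    ${one_big:.2f}")    # ~ $44
print(f"two small dies: ${two_small:.2f}")  # ~ $24
```

On silicon alone the two small dies come out ahead; the packaging, testing and board costs raised further down the thread cut the other way.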
 
Maybe instead of having one huge core running at max with lots of heat, spread it across multiple smaller and much cheaper cores. I can't see any reason why this can't be done in the future.
 
Having a multi-chip solution would make it easier to scale across different markets: a single chip for budget gamers, a dual config for mid-range, and a quad setup for enthusiasts. You get the idea.

Fuz
 
IF that was the way to go, why hasn't it happened?

Why? Because it's not the way to go. Transistor counts go up as die size goes down, due to advances in technology.

Cost is king in the consumer market, and if your equally competitive part costs more than your competitor's, you lose.

To have 2 chips means double the test time, double the packaging cost, double the support/glue logic chips, double the traces on the board. All of these things cost.

Conceivably, as 3dfx showed, you could use a single die to hit all segments of the market. But their super high-end product never was economically viable, was it?
 
There is a difference between multi-chip and scalable multi-chip architectures.

Multi-chip, like the rumors of the alleged 3dfx Sage/Fear combo, would have the advantage of letting you market multiple products. Say you have a VPU and a T&L processor on a second chip. You can target two markets with the same chip design by having one product with only the VPU, and another with both the VPU and the T&L processor.

The old Amigas got around the technology limits of their day in much the same way. By using a host of custom chips, they made a system with a 7.14 MHz CPU offer performance similar to 25-33 MHz machines of the time, with specialized chips handling many of the tasks the main CPU was performing in other systems.

As for multi-chip/scalable: it's a pretty simple concept. When you hit the technological ceiling, you have to go wider to go faster. Scalable technology pretty much sets the limit of performance at the limit of your checkbook. If an architecture is truly scalable, the sky (and space, in your case) truly is the limit.

The only *real* detriments to scalable architectures are cost and space. The advantage is the ability to surpass the performance of the given technology level. There were many marketing pushes to denounce scalable designs around the time of the Voodoo5, so the majority of the bad mojo on the topic is just the result of overactive marketing weasels. :) SGI, Sun, IBM and all the biggies had reached for scalable graphics architectures for over a decade prior, and done so with great success, pushing the next envelope earlier than anyone else.

Scalable technologies are generally made obsolete by future generations. After all, a dual Pentium 90 server from 3-5 years ago can now be far surpassed by a single P4 server. At the time, the Pentium 90 was the highest performing processor, so if you needed more power, ya had to add more. Quad Xeon servers today will likely be topped by some future single-CPU model as well; such speed just hasn't been reached on a single die yet.
 
Specifically for V5 SLI-style multi-chip solutions: increased memory cost has to be a limiter as well, since you need more memory than an equivalent single-chip solution. E.g. the 64 MB V5 wasn't really a true 64 MB card, as the texture memory is duplicated for each chip. It (in those days) equated to slightly more than 32 MB though, as most games didn't have more than 20-25 MB of textures.

These days, with UT2003 boasting maps with more than 100 MB of textures, a multi-chip, scalable SLI solution would have to launch with 256 MB of VRAM.
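To put rough numbers on the replication (the per-chip capacity matches a V5 5500; the framebuffer figure is my assumption):

```python
# Back-of-envelope for a dual-chip SLI board that replicates textures.
per_chip_mb = 32       # a V5 5500 carried 32 MB per VSA-100
framebuffer_mb = 12    # color + z, triple buffered, ~1024x768 (assumed)

# Each chip stores only its own strips of the framebuffer,
# but a complete copy of every texture.
fb_per_chip_mb = framebuffer_mb / 2
textures_mb = per_chip_mb - fb_per_chip_mb
effective_mb = framebuffer_mb + textures_mb
print(f"texture space:   {textures_mb:.0f} MB")   # ~26 MB
print(f"effective total: {effective_mb:.0f} MB")  # ~38 MB of a nominal 64
```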

Now please correct the inevitable bits I've got wrong.
 
Yes, you make a good point, but how about this: the main limiting factor of the V5 was memory bandwidth, and the cost of putting more memory on the board was too high. So the V5 had enough power but not enough bandwidth to exploit it.

What about a multi-chip tile-based rendering architecture? The memory bandwidth would be effectively doubled and would be more than enough for each chip, even if a minimal amount of memory was used. Nvidia has patents for both types of architectures. Surely they could combine the best ideas of all those former 3dfx and GigaPixel engineers, as well as some of Nvidia's new concepts, and produce something revolutionary. Time will tell.
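As a sketch of how the work might be split (the tile size and assignment scheme are just assumptions, not anything from those patents):

```python
# Toy model: dealing screen tiles out to N chips in a checkerboard
# interleave, so each chip renders (and fetches memory for) only its
# own tiles. Tile size and chip count are assumptions.
TILE = 32
N_CHIPS = 2

def chip_for_tile(tx: int, ty: int) -> int:
    # Checkerboard interleave spreads load evenly across chips.
    return (tx + ty) % N_CHIPS

width, height = 1024, 768
counts = [0] * N_CHIPS
for ty in range(height // TILE):
    for tx in range(width // TILE):
        counts[chip_for_tile(tx, ty)] += 1
print(counts)  # [384, 384]: each chip gets half the screen's tiles
```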
 
But you still need real memory space to store real textures in. Effective bandwidth isn't the issue I was alluding to.
 
Who says you have to replicate storage for scalable architectures?

Sure, in the case of 3dfx's SLI architecture, textures are stored for easy, non-arbitrated use by multiple chips, but this surely isn't a total given for all implementations.

I'll refer back to the Amiga as this used a special set of shared memory that both the CPU and custom chips had access to (called "chip" memory) versus "fast" memory only the CPU had access to.

Similar designs could use a shared pool of memory for textures and handle arbitration either with a complex memory controller or some form of cycling scheme, at some reduction in bandwidth. Enough on-die cache for texture lookups would clear this bottleneck, even if texture fetches were limited to 1/2 the normal memory bandwidth for two chips, or 1/4 the bandwidth for four chips.
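As a toy model of that trade-off (the bandwidth and hit-rate figures are assumed):

```python
# Toy model: N chips time-slice one shared texture pool, and an
# on-die texture cache absorbs most fetches. All figures assumed.
POOL_BW_GBS = 2.6  # bandwidth of the shared texture memory (assumed)
N_CHIPS = 2

def sustainable_texture_bw(cache_hit_rate: float) -> float:
    # Each chip only gets 1/N of the cycles on the shared bus, but
    # only cache misses ever need to use it.
    slice_bw = POOL_BW_GBS / N_CHIPS
    miss_rate = 1.0 - cache_hit_rate
    return slice_bw / miss_rate if miss_rate > 0 else float("inf")

for hit in (0.0, 0.5, 0.9):
    print(f"hit rate {hit:.0%}: {sustainable_texture_bw(hit):5.1f} GB/s per chip")
```

With a 50% hit rate, each of two chips can already texture as fast as if it owned the whole pool.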

It becomes a simple case of weighing cost against complexity of implementation. In the case of 3dfx, it was deemed cheaper to simply throw more memory on the boards, and with the video memory footprint of everything in 3D games rarely exceeding a 16 MB card (i.e. the era of 8 MB cards was just ending), the replication was simple and accommodating. Two or three years later, we're only now starting to see texture demands, in less than 2% of games, that get cramped with that model.
 
Someone do some quick theoretical bandwidth and cost calculations comparing a single chip with a 256-bit memory interface (and its pin count) against two chips with 128-bit interfaces each, assuming the chip in both cases is the same in terms of architecture.

[edit]Forgot the cost part.
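[edit2] Something to get the ball rolling; every clock and dollar figure below is a made-up placeholder:

```python
# Raw bandwidth: bytes per transfer * transfers per second (DDR = 2/clock).
MEM_CLOCK_MHZ = 300  # assumed DDR base clock

def bandwidth_gbs(bus_bits: int) -> float:
    return bus_bits / 8 * MEM_CLOCK_MHZ * 1e6 * 2 / 1e9

single = bandwidth_gbs(256)    # one chip, one 256-bit bus
dual = 2 * bandwidth_gbs(128)  # two chips, a 128-bit bus each
print(f"1 x 256-bit: {single:.1f} GB/s")          # 19.2
print(f"2 x 128-bit: {dual:.1f} GB/s aggregate")  # 19.2

# Cost side (assumed $25 per chip for packaging and test): the raw
# aggregate bandwidth is identical, but the dual board pays twice for
# packaging/test and can't freely share its two pools between chips.
PKG_TEST = 25.0
print(f"packaging/test: single ${PKG_TEST:.0f}, dual ${2 * PKG_TEST:.0f}")
```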
 
Sharkfood said:
I'll refer back to the Amiga as this used a special set of shared memory that both the CPU and custom chips had access to (called "chip" memory) versus "fast" memory only the CPU had access to.

If I remember correctly, only the Copper and Agnus (for blitting) had those restrictions; Denise could access fast RAM, and was used to pull any extra stuff in via DMA channels.

It's been such a long time since I played with any of that stuff ;).
 
Sharkfood said:
Who says you have to replicate storage for scalable architectures?

Sure, in the case of 3dfx's SLI architecture, textures are stored for easy, non-arbitrated use by multiple chips, but this surely isn't a total given for all implementations

Which is why I said specifically for 3dfx SLI: I was sure there must be another way, done before, that someone else would know about.
 
One way to partially avoid replicating texture data is to use virtual texture memory, with separate memory maps for each renderer core. This could work, as the core that takes e.g. the upper half of the screen generally has a different working set of textures than the core that takes the lower half.
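A rough sketch of the idea (the class and page numbers are invented purely for illustration):

```python
# Sketch: each renderer core keeps its own page table and maps in only
# the texture pages its half of the screen touches; only the overlap
# between working sets ends up duplicated. All names are invented.
class RendererCore:
    def __init__(self, name: str):
        self.name = name
        self.page_table = {}  # virtual page -> local physical page
        self.local_pages = 0

    def touch(self, virtual_page: int):
        # On a miss, map the page into this core's local memory.
        if virtual_page not in self.page_table:
            self.page_table[virtual_page] = self.local_pages
            self.local_pages += 1

top = RendererCore("upper-half")
bottom = RendererCore("lower-half")
for page in (0, 1, 2, 3):
    top.touch(page)     # e.g. sky and wall textures
for page in (2, 3, 4, 5):
    bottom.touch(page)  # e.g. floor textures
shared = set(top.page_table) & set(bottom.page_table)
total = len(set(top.page_table) | set(bottom.page_table))
print(f"pages duplicated: {len(shared)} of {total}")  # 2 of 6
```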

The only way I can see to fully avoid replicating texture data when doing multiple renderer chips is to have the texture memory shared between the renderers. Having a common bus is pretty much out of the question (too many loads lower bus speed, and contention between the renderers cut into the efficiency of each renderer), so the remaining solution is to have dual/multi-port DRAM chips, like the VRAMs of yore. Haven't seen such beasts in a long time, but they should still be doable. Routing two 256-bit buses into a row of dual-port DRAMs could be an interesting challenge for the board designers ...
 
arjan de lumens said:
One way to partially avoid replicating texture data is to use virtual texture memory, with separate memory maps for each renderer core. This could work, as the core that takes e.g. the upper half of the screen generally has a different working set of textures than the core that takes the lower half.

The only way I can see to fully avoid replicating texture data when doing multiple renderer chips is to have the texture memory shared between the renderers. Having a common bus is pretty much out of the question (too many loads lower bus speed, and contention between the renderers cut into the efficiency of each renderer), so the remaining solution is to have dual/multi-port DRAM chips, like the VRAMs of yore. Haven't seen such beasts in a long time, but they should still be doable. Routing two 256-bit buses into a row of dual-port DRAMs could be an interesting challenge for the board designers ...

First part: that wouldn't work for 3dfx. Remember that they weren't doing top-half/bottom-half; they were doing SLI, alternating segments of scan lines (from 1-line alternation all the way up to blocks of 64 lines in VSA).

Second part: 3dfx had a means to make SLI share RAM... but the tech was devised VERY near the end. It wouldn't even have appeared in Rampage's successor had 3dfx lived.
 
DaveBaumann said:
Wildcats have two render cores and can pool texture data between them.

Two points:

1. How do they do it? Passthrough from one chip to the other?

2. Wildcats have never been known for high texturing performance - Wildcat4 seems to have about 15% of the texel fillrate of the R9700.
 
The way I see it, the primary benefit of multi-chip systems is, quite plainly, performance. The largest drawback is most obviously price.

However, there are more subtle differences as well.

Some of the obvious limitations of multi-chip systems: while datapaths within a chip are nearly unbounded in size (they can be as big as they need to be), it becomes increasingly challenging and expensive to manufacture boards with larger datapaths. In other words, if you compare, say, two 4-pipeline chips to one 8-pipeline chip, the 8-pipeline chip will fundamentally have an easier time rendering the same data, merely because its internal datapaths will be better.

That said, one of the primary benefits of having multiple chips on a graphics card is that it isn't quite as challenging to give each chip its own dedicated memory bus. In other words, multiple chips have a relatively easy time getting large amounts of raw memory bandwidth. But in reality it simply won't amount to 2x the effective bandwidth of a single chip with the same memory type. Some real-world examples to illustrate this:

The Voodoo5 series needed to duplicate texture memory across the different chips. In other words, texture memory could not be shared between chips, so a two-chip V5 consumed twice as much texture memory bandwidth as a single-chip V4 for the same texels. While also having twice the raw memory bandwidth, this means that the effective bandwidth of a V5 was roughly: 2 x (V4 memory bandwidth) - (texture memory bandwidth).
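Working that out with illustrative numbers (the bus width and clock roughly match a VSA-100; the texture traffic figure is an assumption):

```python
# A VSA-100 had a 128-bit SDR bus at 166 MHz: ~2.7 GB/s per chip.
v4_bw = 128 / 8 * 166e6 / 1e9  # GB/s, single chip
texture_bw = 0.9               # GB/s spent reading textures (assumed)

# Textures are replicated, so both chips re-read the same texels;
# only the non-texture traffic truly doubles.
v5_effective = 2 * v4_bw - texture_bw
print(f"V4: {v4_bw:.2f} GB/s, V5 effective: {v5_effective:.2f} GB/s")
```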

As another example, the Rage Fury MAXX from ATI had a different set of problems. This card was designed to render one frame per chip. What this ended up meaning was that triple buffering was required for smooth rendering (without it, the framerate was very unstable), and more data had to be sent over the AGP bus than with a single-chip version.

In the end, this all just boils down to the limitation of multiple chips to pass large amounts of data between each other.

All of that said, two chips will always be better than one chip of the same type, unless the engineers are a bunch of monkeys. But two chips are often not going to be twice as good as one of the same type (2x performance with 2x chips may be achievable for non-realtime apps without too much hassle, or for realtime apps with an extra control chip or set of control chips). So, this means that for the absolute best performance at a given level of technology, multi-chip solutions are a must.

So, why don't we see multi-chip solutions in the consumer 3D graphics market today? Quite simply, the technology has reached a point where there is generally only one GPU on a graphics board (there used to be a few coprocessors; for example, I believe the Voodoo2 used three separate processors), and given the already high cost of single-chip boards, multi-chip boards would just be prohibitive.

What we will probably see going into the future is an increase in the number of multi-chip boards made with ATI and nVidia hardware for the workstation market. Particularly if chips from either of these companies can be shown to be an excellent replacement for today's truly high-end 3D graphics machines (e.g. SGI's InfiniteReality), their chips will need built-in multi-chip functionality. Even with such technology, however, we still may not see multi-chip products released for the consumer market.
 