Dual/Multi Chip Consumer Graphics Boards

ram said:
Gunhead said:
I wonder how well the Kyro architecture would work in a multichip configuration. The rasteriser looks ideal for that, simply a tile per chip, but I wonder if there would be big problems with (T&L and) binning?

AFAIK there is an arcade solution (Naomi) using multichip. There is a geometry processor acting as a bridge and doing the T&L & binning and two PowerVR Series 3 chips doing the pixel shading.

Series 2

CPU : SH-4 128-bit RISC CPU (200 MHz, 360 MIPS / 1.4 GFLOPS)
Graphic Engine : 2 x PowerVR 2 (PVR2DC-CLX2) GPUs - (under the fans)
Geometry Processor : Custom VideoLogic T&L chip "Elan" (100 MHz) - (under heatsink)
Sound Engine : Yamaha AICA @ 45 MHz (with internal ARM7 32-bit RISC CPU, 64-channel ADPCM)
Main Memory : 32 MByte 100 MHz SDRAM
Graphic Memory : 32 MByte
Model Data Memory : 32 MByte
Sound Memory : 8 MByte
Media : ROM Board / GD-ROM
Simultaneous Number of Colors : Approx. 16,770,000 (24-bit)
Polygons : 10 million polys/sec with 6 light sources
Rendering Speed : 2000 Mpixels/sec (unrealistic maximum; assumes 10x overdraw, which nothing uses)
Additional Features : Bump Mapping, Multiple Fog Modes, 8-bit Alpha Blending (256 levels of transparency), Mip Mapping (automatic polygon-texture switching), Trilinear Filtering, Supersampling for Full-Scene Anti-Aliasing, Environment Mapping, and Specular Effects.
 
PC-Engine said:
I believe the CLX chips in NAOMI 2 were designed to be scaled up to 16 chips. Simon F can probably offer some more info on its operation/config.

He's on holiday, so that could take a while... and it was before I joined, so I have no clue about it.

K-
 
Gunhead said:
Apropos, what do you think about deferred renderers (multiple or single) with the freedom of access VS3.0/PS3.0 seem to require - any problems there?

Which/What "freedom of access" are you talking about that might create problems with a deferred renderer?

K-
 
Kristof said:
Gunhead said:
Apropos, what do you think about deferred renderers (multiple or single) with the freedom of access VS3.0/PS3.0 seem to require - any problems there?

Which/What "freedom of access" are you talking about that might create problems with a deferred renderer?

K-

Hmm, well, imagine a PS3.0 program with a branch and a KIL instruction inside the branch - wouldn't that already force you to keep more things around? Not sure, though.

What I do know, though, is that the R400 is kind of a TBDR (a hybrid, AFAIK - I doubt we'll ever know more about it) with PS3.0 and VS3.0 - and look what happened to it...
It's canned. It's dead. It's cancelled.


Uttar
 
Uttar said:
Hmm, well, imagine a PS3.0 program with a branch and a KIL instruction inside the branch - wouldn't that already force you to keep more things around? Not sure, though.

What I do know, though, is that the R400 is kind of a TBDR (a hybrid, AFAIK - I doubt we'll ever know more about it) with PS3.0 and VS3.0 - and look what happened to it... It's canned. It's dead. It's cancelled.
Uttar

Ah right, a KILL is actually similar to Alpha Test Punch Through: based on texture contents (and/or pixel shader operations, in the case of a KILL), a pixel is either opaque (not killed) or completely transparent (killed). KYRO already supports Alpha Punch Through without problems, so I see no problem for a tile-based deferred renderer supporting a KILL instruction. Conditional branching just influences which instructions get executed, so inherently it does not change the actual final write. Obviously this kind of operation reduces the efficiency of all early-Z mechanisms.
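Kristof's punch-through analogy can be sketched in a few lines. This is a hypothetical Python illustration, not KYRO's actual pipeline: the only point is that once the nearest fragment's shader can KIL, the per-pixel resolve has to be able to fall back to fragments behind it, just as it already does for punch-through alpha.

```python
# Hypothetical sketch of a tile-based deferred renderer treating a
# pixel-shader KIL like alpha-test punch-through: the nearest fragment can
# no longer be assumed opaque, so the resolver walks the depth-sorted
# fragments front-to-back until one survives its shader.
# Names and data layout are illustrative, not KYRO's actual design.

def shade(frag):
    """Run the 'pixel shader'; return a colour, or None if KIL fired."""
    if frag["alpha"] < 0.5:          # stand-in for a KIL inside a branch
        return None                  # killed -> completely transparent
    return frag["colour"]            # not killed -> opaque

def resolve_pixel(fragments):
    """fragments: one pixel's fragments, sorted front-to-back by depth."""
    for frag in fragments:
        colour = shade(frag)
        if colour is not None:       # first surviving fragment wins
            return colour
    return (0, 0, 0)                 # everything killed: background shows

# The nearest fragment gets killed, so the one behind it shows through:
pixel = resolve_pixel([
    {"depth": 0.1, "alpha": 0.2, "colour": (255, 0, 0)},  # killed by KIL
    {"depth": 0.5, "alpha": 1.0, "colour": (0, 255, 0)},  # survives
])
# pixel == (0, 255, 0)
```

This also shows Kristof's efficiency caveat: every potentially-killed fragment forces the resolver to retain the fragments behind it, which is exactly the state an early-Z scheme would otherwise discard.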

Now, can someone fill me in on these claims that the R400 is canned or cancelled? Where does this info come from - is there a reliable source? Or is it something like "a friend of a friend knows someone who works for ATI and he said that..."?

K-
 
Uttar said:
What I do know, though, is that the R400 is kind of a TBDR (a hybrid, AFAIK - I doubt we'll ever know more about it) with PS3.0 and VS3.0 - and look what happened to it...
It's canned. It's dead. It's cancelled.

If these claims are true, two questions beg to be answered:

1) Why?
2) What's ATIs back up plan?

(I can sense this thread going way off topic and growing quickly)
 
I'll give some reasons why a multichip design is more cost effective long term.

A greater variety of product line through speed-binning chips coupled with multi-chip designs. Here's a fictitious product line-up:

rv350 300/300 = $100
rv350 pro 350/400 = $200
rv375 dual 275/300 = $300
rv375 pro 325/400 = $400

The savings come from the following:
1. TRULY unified drivers
2. Less waste, because you can use low-spec chips in a dual configuration to make them usable
3. Better pricing through volume bargaining, i.e. you're only buying one type of chip at quadruple or higher volumes
4. Possibly the same as above with the PCB and memory
5. You can charge double the price of single-chip boards without double the cost, thus increasing the margin between high-end and low-end boards as opposed to lowering it, as is the case now
6. A longer product cycle, saving R&D costs

It's a win-win situation for the consumer and the IHV. We get (hopefully) a naming convention that makes sense and clearly relates performance to cost, and the IHV gets a lower total cost of production through the entire cycle, resulting in increased margins.
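Point 5 of indio's list can be made concrete with a toy margin model. All figures below are invented for illustration; the only claim is the shape of the arithmetic (double the price, less than double the cost, more than double the margin):

```python
# Toy margin model for "charge double the price of single-chip boards
# without double the cost". Every number here is invented for
# illustration; these are not real chip, PCB, or board prices.

def board_margin(price, chip_cost, n_chips, fixed_board_cost):
    """Margin = selling price minus chips plus shared PCB/memory/cooling."""
    return price - (n_chips * chip_cost + fixed_board_cost)

single = board_margin(price=200, chip_cost=40, n_chips=1, fixed_board_cost=60)
dual = board_margin(price=400, chip_cost=40, n_chips=2, fixed_board_cost=90)

# single == 100, dual == 230: the dual board sells for 2x the price but
# costs less than 2x to build, so the margin more than doubles.
```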
 
I have a feeling some people are jumping to conclusions about the R400 being canned. I've heard it is behind schedule, which probably means some features will change, but I don't consider that canning a chip, since they won't be starting from scratch again.
 
Okay, so first a little disclaimer: I'm not really sure the R400 is a TBDR. There were some rumors of it including some principles of TBDR (beyond what the other architectures do, that is) - but those were among the least reliable rumors about it. Many things are pretty much certain about it (unified PS/VS being one), but this isn't one of them.
So it might not be very clear evidence that it's hard to do a TBDR with PS3.0 & VS3.0 - heck, the problem could be somewhere else anyway; it was a very ambitious design on many fronts.

The R400 being canned is an ooold rumor. The first rumor about it came in early March, IIRC. Then it resurfaced, from different people, over time. People like MuFu and Hellbinder have said it, for example.

What is still unclear, though, is what's gonna happen now. Some suggest an R420 (R400: Take 2) which would be based on roughly similar technology, and others say the whole R4xx line is canned and that ATI is going directly to the R5xx.

The thing is that there was a major design problem. I don't know what it was, though.

I would suggest the following thread, among others, for some info about this whole R4xx deal: http://www.rage3d.com/board/showthread.php?s=b10f8bbd4e2655a38d49188c4c43bb4d&threadid=33681807


Uttar
 
indio said:
I'll give some reasons why a multichip design is more cost effective long term.

A greater variety of product line through speed-binning chips coupled with multi-chip designs. Here's a fictitious product line-up:

rv350 300/300 = $100
rv350 pro 350/400 = $200
rv375 dual 275/300 = $300
rv375 pro 325/400 = $400

The savings come from the following:
1. TRULY unified drivers
2. Less waste, because you can use low-spec chips in a dual configuration to make them usable
3. Better pricing through volume bargaining, i.e. you're only buying one type of chip at quadruple or higher volumes
4. Possibly the same as above with the PCB and memory
5. You can charge double the price of single-chip boards without double the cost, thus increasing the margin between high-end and low-end boards as opposed to lowering it, as is the case now
6. A longer product cycle, saving R&D costs

It's a win-win situation for the consumer and the IHV. We get (hopefully) a naming convention that makes sense and clearly relates performance to cost, and the IHV gets a lower total cost of production through the entire cycle, resulting in increased margins.

I completely agree here. What I am describing is taking what a VPU already does and isolating it into a single operation, or a hybrid: one VPU is in control of the Vertex Processing and the other is in control of the Pixel Processing.

The reason I suggested the RV350 is because it is more than powerful enough, very efficient (running off AGP bus power only, not needing external power), cool-running, and built on 0.13µ.

Say this does in fact double the cost: the RV350 (9600 Pro) is priced in the $200 range, so double would be what... $400. Wow, not so far off is it now?

And lastly, just because Nvidia does not have a multi-chip capable GPU does not mean this product can't exist. I am not saying a multi-chip design must operate like the ATi Rage Fury Maxx; the way I am describing is, I believe, completely different in operation and is possible. What I am asking for is clarity, I suppose.
 
indio said:
http://www.beyond3d.com/forum/viewtopic.php?t=3552&postdays=0&postorder=asc&highlight=medallions&start=0

You can check out this thread where sireric (who is an ATI employee) gives some clues as to the multichip capabilities of the R300.

Thank you indio. :D
 
here's a flaw...

Here's a flaw with some multi-chip designs that target modern applications.

They get their speed by not sharing bandwidth. Each chip has its own memory bus. Textures are typically duplicated between chips.

The problem is that as more games do lots of render-to-texture, even for their main back buffer, these textures must either:

a) live in both chips' memory, requiring both chips to do all the work, or

b) split up the work, and blit the result back and forth between chips before it is accessed by the other chip.

Approach b) would still be a win over not doing multi-chip, but there may be synchronization penalties, and it would prevent AFR-type approaches.

Unfortunately, approach b) is incompatible with any scheme that lets you read the current frame buffer as a texture somewhere other than the currently rendered pixel. Since there are no guarantees about where the frame buffer might be accessed, you can't split it between the chips' memory areas.
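The trade-off between options a) and b) above can be sketched as a back-of-the-envelope bandwidth model. The resolution, colour depth, and byte counts are illustrative assumptions, not measurements from any real board:

```python
# Back-of-the-envelope model of options a) and b) for one render-to-texture
# target that both chips later sample. All numbers are illustrative.

BYTES_PER_PIXEL = 4                  # assumed 32-bit colour
W, H = 1024, 768
target_bytes = W * H * BYTES_PER_PIXEL

# a) Both chips render the whole target into their own local memory:
#    zero inter-chip traffic, but the rendering work is fully duplicated.
work_a = 2 * target_bytes            # pixels written, summed over both chips
traffic_a = 0

# b) Each chip renders half, then blits its half across before either chip
#    may sample the texture: half the work per chip, but the whole target
#    crosses the inter-chip link, plus a synchronization point.
work_b = target_bytes
traffic_b = target_bytes
```

The incompatibility noted above falls out of the same model: if a shader may sample the target at an arbitrary location, neither chip can start until it holds the whole target, which collapses option b) back into option a)'s duplication.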
 
indio said:
I'll give some reasons why a multichip design is more cost effective long term.

a greater variety of product line through speed-binning chips coupled with multi-chip designs

This is only a superficial advantage and leads to fundamental problems on the economies-of-scale front when the product exists in a real, dynamic market.

The problem, which is inherent, comes from the fixed costs of two dies as opposed to one. When an IHV shrinks the lithography of their IC over time, they reduce the cost immensely. The problem is that even though the multichip company can shrink their two simpler chips in line with the one "superchip", they still have 2X the number of pins, buses, and the other fixed costs of going multichip.

Thus, what happens is that as lithography shrinks and your competitor's "superchip" becomes vastly cheaper, you're still bound by the fixed costs of connecting the ICs to the board and the related PCB costs. These don't scale... period.

Especially when product turnover was happening at 6-month intervals, it was a horrible, horrible solution. It still is a horrible, horrible solution. This can be seen in the fact that nobody uses it currently, and even the late 3dfx's forward-looking plans were to move away from multi-chip solutions.

The way you win in this industry is by going balls-to-the-wall and pushing lithography. You may not win every time, but when you win, it'll be big.
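Vince's fixed-cost argument is easy to sketch numerically. The die areas, per-mm² silicon costs, and packaging figure below are all invented; the point is only that a shrink halves the silicon term while the per-chip fixed term stays put:

```python
# Toy version of the fixed-cost argument: a process shrink cuts the
# area-driven die cost roughly in half, but per-chip packaging/pin/PCB
# costs stay fixed. All die areas and dollar figures are invented.

def board_cost(die_area_mm2, n_chips, cost_per_mm2, package_cost):
    """Silicon cost scales with area; packaging is fixed per chip."""
    return n_chips * (die_area_mm2 * cost_per_mm2 + package_cost)

PACKAGE = 15.0   # fixed per-chip cost: pins, substrate, board routing

# Same total silicon either way: one 200 mm^2 die vs two 100 mm^2 dies.
gen1_single = board_cost(200, 1, cost_per_mm2=0.50, package_cost=PACKAGE)
gen1_dual = board_cost(100, 2, cost_per_mm2=0.50, package_cost=PACKAGE)

# One full shrink later: the silicon term halves, the fixed term doesn't.
gen2_single = board_cost(200, 1, cost_per_mm2=0.25, package_cost=PACKAGE)
gen2_dual = board_cost(100, 2, cost_per_mm2=0.25, package_cost=PACKAGE)

# The dual board's $15 packaging penalty is unchanged in absolute terms,
# but it grows from ~13% to ~23% of the single-chip board's cost.
```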
 
I can see some of the points you both brought up, although others are not quite clear.

You suggest that the frame buffer will be one of the largest problems. My simplistic answer would be a non-shared buffer, which is why I suggested 256MB of DDR-II: 128MB for VPU #1 and 128MB for VPU #2.

Now as for cost: it was suggested that it would cost more than a single-VPU solution. Granted, if it were double the cost of a 9600 Pro, then 9600 Pro x 2 = what we are already being forced to pay for a problematic NV30 solution.

So that is why I have a problem accepting a "do it all" single-VPU solution vs. a multi-chip design - one like the Rage Fury Maxx but different in how it operates: VPU #1 for Vertex Processing and VPU #2 for Pixel Processing. Both VPUs are more than capable of doing all the operations by themselves, so if each were forced to process only half of the work at double the rate, the only problem I can see would be synchronization.

How big a problem synchronization would present is, I suppose, a good question. I would assume it would also make overclocking a problem. I can't help recalling the trouble with overclocking the R100 long ago, as the core clock was tied to the memory clock at 183/183. This was also true of the Voodoo3 3500.
 
Thus, what happens is that as lithography shrinks and your competitor's "superchip" becomes vastly cheaper, you're still bound by the fixed costs of connecting the ICs to the board and the related PCB costs. These don't scale... period.

Not exactly true. The "Uber" single-chip solutions are more likely to require more expensive packaging, as we've seen with the introduction of flip-chip packages with R300, while the smaller cores can get away with cheaper packaging.

Another area that may have bonuses is heat generation: the Uber core has its heat concentrated in a single point, whereas smaller multiple chips spread their heat over two different locations, which can be easier to manage and potentially makes them more scalable in terms of clock speed.
 
Vince said:
indio said:
I'll give some reasons why a multichip design is more cost effective long term.

a greater variety of product line through speed-binning chips coupled with multi-chip designs

This is only a superficial advantage and leads to fundamental problems on the economies-of-scale front when the product exists in a real, dynamic market.

The problem, which is inherent, comes from the fixed costs of two dies as opposed to one. When an IHV shrinks the lithography of their IC over time, they reduce the cost immensely. The problem is that even though the multichip company can shrink their two simpler chips in line with the one "superchip", they still have 2X the number of pins, buses, and the other fixed costs of going multichip.

Thus, what happens is that as lithography shrinks and your competitor's "superchip" becomes vastly cheaper, you're still bound by the fixed costs of connecting the ICs to the board and the related PCB costs. These don't scale... period.

Especially when product turnover was happening at 6-month intervals, it was a horrible, horrible solution. It still is a horrible, horrible solution. This can be seen in the fact that nobody uses it currently, and even the late 3dfx's forward-looking plans were to move away from multi-chip solutions.

The way you win in this industry is by going balls-to-the-wall and pushing lithography. You may not win every time, but when you win, it'll be big.

Let me clarify what I mean a little. I'm talking about using multichip boards for cost cutting and performance enhancement across a product generation, i.e. R3xx or NV3x.
Instead, what's being produced is many different chips built upon the same basic "reference" design. Each individual model has a unique chip and, if possible, a unique PCB as well.
I seriously think producing 4 unique chips for a single product line is more costly than producing one unique chip and scaling it across the product line to stay performance-competitive. It's just a different approach: bottom-up as opposed to top-down. ;)
 
Sorry Dave, :?

But I can't help being fascinated by the probability of seeing a more recent use of this design :D . I found the answer to my question about R&D cost here:

Interview done by Computer Games Magazine with Nympha Lee, product manager for ATI's Rage Fury MAXX: http://www.cdmag.com/articles/024/190/ragefm_interview.html

CG: Could the AFR multiple chip technology you've developed for the MAXX be used with any future ATI chip?

Lee: Yes. We've purposely designed the whole multiple-ASIC technology and the AFR rendering implementation to where we can just stick our next generation chip on the card and it will work. Going forward, we can take this technology and use it to our advantage in the next generation of chips.

CG: So there's basically no R&D involved in using this with a future chip. Just stick it on the card, tweak the drivers, and you're ready to go.

Lee: Right, and we're continually optimizing it and making sure all the features work. So yes, definitely this technology can be taken to a future product.

CG: Is it possible to do a three or four chip version?

Lee: The three-or-more-chip thing hasn't really been investigated. It's not really a matter of just saying "yeah, let's put three, four, five chips on there." We've got to sit down and say "how much board real estate is that going to cost? How much is the RAM going to cost?" You need a bigger board, and the cost is going to be high. It's one thing to say "yes we can do it" and another to say "yes, the market is going to buy it."

CG: It's more a matter of you not wanting to offer a $600 card.

Lee: A $600 card might be a little beyond the market we want to pursue. We want to provide some type of value and $600 is a lot of money to ask of a gamer.

Now, I suppose my only question is why haven't we seen this card yet? :?:
 