Dual/Multi Chip Consumer Graphics Boards

Malfunction, I believe the answer to your question above lies in the interview with Nympha Lee that you posted.

CG: It's more a matter of you not wanting to offer a $600 card.

Lee: A $600 card might be a little beyond the market we want to pursue. We want to provide some type of value and $600 is a lot of money to ask of a gamer.

Granted, the question was about 3-, 4- or 5-chip boards, but ATI were making little money on the whole 9500/9500P/9700 line. The chip they designed (R300) was an Uber chip, and even killing off half the pipeline and removing the HyperZ III functions still led to a redesign into a less Uber chip (the RV350), precisely because of cost. Even then, the other costs involved, including the 256-bit memory interface, the amount of RAM needed, etc., would mean a multi-cored RV350 (performing about like an R350) would be helluva expensive.
 
DaveBaumann said:
Not exactly true. While the "Uber" single-chip solutions are more likely to require more expensive packaging, as we've seen with the introduction of flip-chip packages with R300, the smaller cores can get away with cheaper packaging.

Ok, got me here... argh... :LOL:

Another area that may have bonuses is heat generation: the Uber core has its heat concentrated in a single point, whereas smaller multiple chips have their heat spread over several locations, which can be easier to manage and potentially makes them more scalable in terms of clock speed.

Dave, we (not just you or I, obviously) argued about this three years ago, and I used to side with you, but I just don't see it happening realistically. Given how the industry has historically operated, I maintain that the best way to go is to push the current bleeding-edge lithography almost to the breaking point when you're designing your architecture (initial yields will suck, but they generally normalize quickly when you have the resources to interwork with TSMC as nVidia does - I think we can agree the NV30 was a fundamental error) and then scale it down over time.

Also, thermally, the difference would probably be negligible with any active cooling solution that is built to the intended lithography spec.
 
elroy said:
It seems to me that one of the biggest problems with making a graphics chip these days is the 0.13 µm process.

Some would argue that it's not the process that's necessarily the problem, but rather the design methodology being used to try to exploit the process. Specifically, designing with clocks is becoming more and more of a problem as the entire industry moves to finer processes.

There is a good overview of why clocked or synchronous design methodologies are so problematic:

http://www.fulcrummicro.com/technology/download/Async White Paper.pdf

The alternative is asynchronous design, and there is a growing movement afoot to bring it to the mainstream. Asynchronous design has been around for quite a while but has largely been thought of as fringe or niche. However, the difficulties that companies like Nvidia and many others have been having are resulting in an increasing awareness of this alternative methodology. Even though it's been under the radar for most people, there is actually a fairly active async community in the academic and commercial world, most notably at Caltech, Sun, Philips, Fulcrum Microsystems, and even Intel.

There is a great deal of pain and risk associated with adopting a radically different design methodology for sure. But because there is a great deal of pain right now with the existing synchronous methodologies, I think there is a good chance that the industry is right on the cusp of a major change.
 
Asynchronous logic, while having certain advantages, has a number of problems as well:
  • Variable performance that depends critically on supply voltage and temperature.
  • While average switching-noise levels are much better than for synchronous logic, it's hard to guarantee that the worst-case behavior isn't just as bad as for synchronous logic. (Consider the case where every subcircuit of the async circuit receives enable/data-ready signals at the same time. Unlikely, for sure, but how can you GUARANTEE that it won't happen?)
  • AFAIK, async logic is in general unproven for large designs - the largest design I am aware of is the AMULET processor, which is something like 2 to 3 orders of magnitude smaller than the largest synchronous designs. (While there is some async logic in most processors, I have yet to hear about a Pentium/Athlon-class processor that uses it for anything else than just transferring data between clock domains.)
  • A reason for this may be that there doesn't seem to be any good Hardware Description Languages available for asynchronous design (like Verilog and VHDL for synchronous design), which means that you pretty much have to design everything by hand at the netlist level, increasing development time (as measured in man-months) dramatically.
  • Testability at manufacturing time. For synchronous logic, this is generally a solved problem, with scan chains and that sort of stuff, which can be automated to a very large degree. For asynchronous circuits, there is no corresponding automatic method available, so the chip designer has to design a lot of custom tests and associated test circuitry to make the design testable at all, consuming a lot of additional development time.
Given that these problems persist after two decades or so of research, I tend to view async logic a bit like fusion power: holding great potential promise for the future, but seemingly perpetually just out of reach for mainstream use.
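The scan-chain idea mentioned in the testability bullet can be sketched in a few lines. This is a toy Python model of the concept only (chained flip-flops acting as one long shift register in test mode), not real DFT tooling:

```python
# Toy model of a scan chain: in test mode, the chip's flip-flops are chained
# into one long shift register, so a tester can shift a known pattern in,
# exercise the logic, and shift the captured state back out for comparison.
class ScanChain:
    def __init__(self, length):
        self.flops = [0] * length  # all flip-flops start cleared

    def shift_in(self, bits):
        # One bit enters the head of the chain per test clock;
        # earlier bits ripple toward the tail.
        for b in bits:
            self.flops = [b] + self.flops[:-1]

    def shift_out(self):
        # Read out the captured state and clear the chain.
        out, self.flops = self.flops[:], [0] * len(self.flops)
        return out

chain = ScanChain(4)
chain.shift_in([1, 0, 1, 1])
print(chain.shift_out())  # [1, 1, 0, 1]
```

The automation advantage for synchronous logic is that tools can stitch every flop into such a chain mechanically; async circuits lack that uniform clocked-flop structure, which is exactly the problem described above.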
 
I can't guarantee all the oxygen molecules in my room won't decide they only like the corner of my room for a while, but I don't worry about it ... the probability just needs to be small enough; an outright guarantee is not needed.
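The point can be put in numbers with a toy model: assume n independent subcircuits each switch within a given timing window with probability p, and ask how likely the all-at-once worst case is. The numbers below are purely illustrative:

```python
# Illustrative only: n independent subcircuits, each switching inside a given
# timing window with probability p. The pathological worst case above is that
# ALL of them fire in the same window.
def all_fire_probability(n, p):
    """P(all n subcircuits switch in the same window), assuming independence."""
    return p ** n

# Even with a generous 50% per-circuit switching probability, 1000 subcircuits
# firing together is roughly a 1-in-10^301 event.
print(all_fire_probability(1000, 0.5))
```

Which is the oxygen-in-the-corner argument exactly: no guarantee, but a probability so small it is irrelevant in practice (assuming the switching really is independent, which is what the tight-loop objection below questions).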
 
indio said:
http://www.xbitlabs.com/news/video/display/20030427185052.html
If it's only $18 for a chip, then I think it's pretty clear. The easiest and cheapest way to increase margins and nearly double performance is a multichip part.

I doubt this number. Some high-end models (R300 and NV30, just for example) phase out even before they drop below 30 USD, so I guess the number is valid for mainstream parts with a long lifetime.
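Even taking the $18 figure at face value, the chip is only one line of the board's bill of materials. A back-of-the-envelope comparison makes this concrete; every number below is a made-up illustration, not from the article:

```python
# Hypothetical BOM sketch (all costs invented for illustration): a single big
# chip vs. two cheap chips plus the doubled RAM, wider PCB and extra power
# circuitry a dual-chip board would need.
def board_cost(chips, chip_cost, ram_cost, pcb_power_cost):
    return chips * chip_cost + ram_cost + pcb_power_cost

single = board_cost(1, 60, 40, 25)   # one expensive "Uber" chip
dual   = board_cost(2, 18, 80, 40)   # two $18 chips, but everything else doubles
print(single, dual)  # 125 156
```

With these (invented) numbers the dual-chip board still costs more, which is the same point made earlier in the thread about the 256-bit interface and RAM dominating a multi-core RV350's cost.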
 
MfA said:
I can't guarantee all the oxygen molecules in my room won't decide they only like the corner of my room for a while, but I don't worry about it ... the probability just needs to be small enough; an outright guarantee is not needed.

OK - if the activity in the chip is random enough, achieving a 'good enough' failure rate (like, say, 1 failure per 10^15 operations or whatever) may be attainable. But I am not really convinced that e.g. a tight code loop or something functionally similar won't cause a highly non-random repeating pattern in the noise emitted by the chip. Also, if you are going to employ power gating (to stop leakage currents, as opposed to clock gating, which stops switching currents), you will get a rather nasty spike whenever you turn power on, possibly negating the entire noise advantage that asynchronous circuits hold over synchronous circuits.
 
Does anyone have some info on the current status of Stanford's Imagine project?
When I first looked at it, it seemed like an ideal solution for a future scalable graphics architecture.
Perhaps Texas Instruments will put out the next Voodoo 2
 
arjan de lumens said:
Asynchronous logic, while having certain advantages, has a number of problems as well:
  • Variable performance that depends critically on supply voltage and temperature.


    There are several different types of asynchronous circuits. Fulcrum uses delay-insensitive async circuits, which are operable over a very wide range of voltage. For example, they claim they can run one of their 1.5 V nominal devices from the minimum transistor threshold voltage (0.8 V) all the way up to the punch-through voltage (3.3 V) without the device glitching. This is pretty amazing for a several-million-transistor device.

  • While average switching-noise levels are much better than for synchronous logic, it's hard to guarantee that the worst-case behavior isn't just as bad as for synchronous logic. (Consider the case where every subcircuit of the async circuit receives enable/data-ready signals at the same time. Unlikely, for sure, but how can you GUARANTEE that it won't happen?)

    Synchronous circuits get much closer to the worst case, because everything switches at once on the clock edge.


  • AFAIK, async logic is in general unproven for large designs - the largest design I am aware of is the AMULET processor, which is something like 2 to 3 orders of magnitude smaller than the largest synchronous designs. (While there is some async logic in most processors, I have yet to hear about a Pentium/Athlon-class processor that uses it for anything else than just transferring data between clock domains.)

    I remember reading somewhere that Fulcrum built a 13-million-transistor processor which they called the Vortex processor. It's certainly not as large as the behemoth GPUs from Nvidia/ATI, but certainly large enough to prove that their technology can scale to a moderate size.

    As a side note, the founders originally wanted to design a high-end GPU with their startup company. I think the venture capitalists probably convinced them that it would be wiser to target a market that wasn't as ferociously competitive as the graphics market, so they have been going after the NPU market.

  • A reason for this may be that there doesn't seem to be any good Hardware Description Languages available for asynchronous design (like Verilog and VHDL for synchronous design), which means that you pretty much have to design everything by hand at the netlist level, increasing development time (as measured in man-months) dramatically.

    Interestingly, Fulcrum has been using Java as their HDL. Cadence is one of Fulcrum's investors, and they are said to be developing a toolset/language together.

  • Testability at manufacturing time. For synchronous logic, this is generally a solved problem, with scan chains and that sort of stuff, which can be automated to a very large degree. For asynchronous circuits, there is no corresponding automatic method available, so the chip designer has to design a lot of custom tests and associated test circuitry to make the design testable at all, consuming a lot of additional development time.

    Testability is one area that Fulcrum hasn't said much about and I'm curious to find out how they address it.

Given that these problems persist after two decades or so of research, I tend to view async logic a bit like fusion power: holding great potential promise for the future, but seemingly perpetually just out of reach for mainstream use.

The technology has been evolving. I think its a very different picture than it was 20 years ago. But certainly there are still many hurdles.
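For anyone curious what a delay-insensitive building block actually does: the Muller C-element is the canonical state-holding gate in this style of design, used to implement the handshakes. A minimal behavioral model (a sketch of the concept, not Fulcrum's actual cell library):

```python
# Behavioral model of a Muller C-element: the output switches only when both
# inputs agree, and holds its previous value otherwise. This hysteresis is
# what lets delay-insensitive pipelines wait for ALL predecessors to signal
# "data ready" before advancing, with no clock involved.
class CElement:
    def __init__(self):
        self.out = 0  # assume the output initializes low

    def update(self, a, b):
        if a == b:          # both inputs agree -> output follows them
            self.out = a
        return self.out     # inputs disagree -> hold previous state

c = CElement()
print(c.update(1, 0))  # inputs disagree -> holds 0
print(c.update(1, 1))  # both high -> output goes high
print(c.update(0, 1))  # disagree -> holds 1
print(c.update(0, 0))  # both low -> output returns low
```

Because correctness depends only on the order of these agree/hold events, not on how long each gate takes, such circuits keep working as gate delays drift with voltage, which is consistent with the 0.8 V to 3.3 V operating range claimed above.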
 
:oops: A functioning async HDL and a 13-million-transistor async design are news to me, and they indicate that asynchronous design has indeed come quite a bit further than I gave it credit for. Any links to where I can learn more about this Java-based HDL?

As for the fact that it works over such a wide range of voltages, that mainly indicates that asynchronous designs (or at least the logic family that Fulcrum uses) are extremely tolerant of power-supply problems (excessive ripple, or too-low/high steady voltages), which is a good thing wrt reliability. Since gate delays depend directly on supply voltage, I would still expect performance to vary with varying voltage (although reduced performance when the voltage dips low is still much better than glitching).
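The expected performance variation can be made concrete with the Sakurai-Newton alpha-power law, a common first-order model of gate delay versus supply voltage. The constants below (Vth = 0.8 V, alpha = 1.3) are illustrative assumptions matching the thread's quoted voltages, not Fulcrum's process data:

```python
# First-order alpha-power-law model of gate delay: delay ~ Vdd / (Vdd - Vth)^alpha.
# Constants chosen for illustration only; real values are process-dependent.
def relative_gate_delay(vdd, vth=0.8, alpha=1.3):
    return vdd / (vdd - vth) ** alpha

# Normalize to the 1.5 V nominal operating point quoted above.
nominal = relative_gate_delay(1.5)
for v in (0.9, 1.5, 3.3):
    print(f"{v:.1f} V -> {relative_gate_delay(v) / nominal:.2f}x nominal delay")
```

Under this model a device barely above threshold still computes correctly but many times slower, while overvolting buys a modest speedup: exactly the "slower, not glitching" behavior described above.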
 
links to articles

Vortex is described here.

http://eet.com/in_focus/silicon_engineering/OEG20030110S0034


There is a very short blurb on their use of Java in this article:

http://www.eetimes.com/story/OEG20020819S0031

It looks like my memory was not quite accurate. Java is used as a very detailed high level modeling language which is then decomposed into an intermediate language.


And here is Fulcrum's presentation at last years HotChips:

http://www.hotchips.org/archive/hc14/hc14pres_pdf/21_lines.pdf
 