The future of 3D chips, product cycles, and R&D cost.

GetStuff

Newcomer
I'm not sure how to word this, but people have said ATi has had to take a similar approach to designing the R300 as Intel takes in designing its CPUs - something along the lines of custom-designing each transistor rather than using the "libraries" provided by their foundry.

But anyway, considering the point above, and realizing it is a gross oversimplification of the scenario: will graphics chip makers be forced to devote more and more engineering resources and/or time to the design of future chips to remain competitive?

Up until the 0.13-micron issues NVIDIA is having with the super-complex NV30, AFAIK things have been relatively smooth sailing down to 0.15 µm...

It seems to me that NVIDIA (and ATi, up to the R300) has been using cutting-edge process technology as a crutch to keep beefing up their GPUs, while ignoring squeezing the most out of mature processes...

Intel has been able to ramp up their 0.13 µm process so fast because of what? More R&D cash, correct? Intel spends more on R&D in one year than NVIDIA and ATi put together have for the past few years.

I guess the final question is: are we going to see product cycles lengthen and innovation slow down really soon? (At the IC level, not memory technology.)
 
Intel is a much bigger company with a broad business range, so their R&D is not limited to CPU design but also involves process engineering, fault detection, etc. You cannot compare Intel's R&D with a fabless design house such as Nvidia, or even ATI, who subcontract the manufacturing of the boards and chips. Furthermore, PC CPUs are general-purpose processors and thus are much more difficult to design. Graphics processors' domain is limited to certain operations, so they are much easier to design. Their speed is proportional to the number of processing elements (PEs) you can fit in (ignoring other factors such as memory speed, etc.). As it is now, the GPU domain is limited, and thus the R&D and development time is significantly smaller.
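
To put a rough number on that "speed is proportional to the processing elements" point, here is a minimal Python sketch, assuming peak fill rate simply scales with pipeline count and clock; every figure in it is made up for illustration, not a real chip spec.

Code:
# Back-of-the-envelope: peak pixel fill rate = pipelines x clock,
# ignoring memory bandwidth and everything else. All figures invented.
def peak_fill_rate(pipelines, clock_mhz):
    return pipelines * clock_mhz * 1e6   # pixels per second

print(peak_fill_rate(4, 300) / 1e9)   # hypothetical 4-pipe part: 1.2 Gpixels/s
print(peak_fill_rate(8, 300) / 1e9)   # hypothetical 8-pipe part: 2.4 Gpixels/s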
 
efty said:
Intel is a much bigger company with a broad business range, so their R&D is not limited to CPU design but also involves process engineering, fault detection, etc. You cannot compare Intel's R&D with a fabless design house such as Nvidia, or even ATI, who subcontract the manufacturing of the boards and chips. Furthermore, PC CPUs are general-purpose processors and thus are much more difficult to design.

...which doesn't matter, because a P4 contains fewer than 50 million transistors - less than half of the 110+ million of the R300.
 
T2k,

Let's not compare custom logic with a more cookie-cutter approach. If you've taken even a basic digital systems course, you should realize that the goals in a CPU are much harder to reach than they would be in a graphics processor. The issues that graphics processors tackle are much more clear-cut; this narrower problem allows one to design the necessary logic with less difficulty.
 
Of course, GPUs today include so much more than CPUs, and are becoming so much more flexible, that it seems a given that they will eventually approach the R&D time/money required by a major CPU like the Athlon or Pentium4.

Another way to state it is that things seem to be coming very close to a head in the 3D graphics market - a point that will "separate the men from the boys," so to speak. Given all of the low-cost DX9 chips and whatnot announced by graphics companies we've not heard from in a while, I have a good feeling that only ATI and nVidia will be able to compete realistically in a decently wide variety of scenarios. If the other companies attempt to keep up with the technology, I do believe that the difference will just become more pronounced (as an example, remember Cyrix? Their CPUs are unbelievably slow compared to the competition now... and though they were slower in the Pentium/5x86 days, they're one heck of a lot slower now). Unless the smaller companies can carve out their own little niche, they're not going to survive. For example, I think Matrox has a niche that they will be able to keep nicely for many years to come. I'm not sure any of the others do.
 
Hmmm... using transistor count as a measure of complexity does not necessarily make sense - there are many DRAM chips out there with 250+ million transistors, for example - it really depends on how many transistors are spent on stuff like RAMs and other repetitive structures. 8 pipelines may very well mean that they spent a lot of effort on designing 1 pipeline and then used copy/paste to produce the remaining 7 (OK, this is oversimplifying a little, but you get the idea).
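
A toy calculation of that copy/paste point, using invented figures that happen to add up to an R300-sized transistor count (the per-block numbers are assumptions, not anything ATI has published):

Code:
# Toy numbers only: replicated structures inflate the transistor count much
# faster than they inflate the amount of unique design work.
one_pipeline  = 8_000_000     # transistors in a single pixel pipeline (assumed)
num_pipelines = 8
sram_and_regs = 30_000_000    # repetitive RAM/register structures (assumed)
control_logic = 16_000_000    # the genuinely one-off logic (assumed)

total_on_die  = one_pipeline * num_pipelines + sram_and_regs + control_logic
unique_design = one_pipeline + control_logic   # designed once, then replicated

print(total_on_die)    # 110000000 transistors on the die
print(unique_design)   # 24000000 transistors of unique design effort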
 
T2k said:
...which doesn't matter, because a P4 contains fewer than 50 million transistors - less than half of the 110+ million of the R300.

Well...and what about Itanium/McKinley? I don't have exact transistor counts in mind but AFAIK they are far beyond the 110m of the R300.
 
Most of the transistors in the McKinley are in the caches as well.

Transistor count != complexity.

Graphics chips are all static CMOS designs (I'm guessing), whereas CPUs use dynamic logic to a great extent to reach those superfast clock speeds.

Cheers
Gubbi
 
It is generally very difficult to describe dynamic logic in HDL code, and standard synthesis tools just don't support dynamic logic at all, so all of it currently needs to be laid out by hand, gate by gate. This adds a LOT of time and cost to chip development - enough to make it infeasible for GPU makers on their current product cycles. Even AMD didn't add dynamic logic to its Athlon chips until the "Palomino".

A chip designed from the ground up for dynamic logic, like the Pentium4, can get a ~60-120% speed boost over standard static logic, whereas adding dynamic logic after the fact, as in the "Palomino", gives a boost more in the 20-30% range.
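
Just to make those percentages concrete, here is a quick sketch assuming a hypothetical 1.0 GHz static-CMOS baseline (the baseline itself is an assumption, not a figure from the post):

Code:
# What the quoted ranges mean in clock terms, for an assumed 1.0 GHz
# static-logic baseline.
base_ghz = 1.0
ground_up = [round(base_ghz * (1 + x), 2) for x in (0.60, 1.20)]  # [1.6, 2.2] GHz
retrofit  = [round(base_ghz * (1 + x), 2) for x in (0.20, 0.30)]  # [1.2, 1.3] GHz
print(ground_up, retrofit)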

As far as ATI is concerned, I am guessing that they merely laid out the clock tree by hand and possibly overlapped HDL coding with floorplanning, allowing them to find any excessively long data lines early on and put some Pentium4-style drive stages on them.
 
I thought the whole "ATI is like INTEL" statement surfaced simply because they did some custom logic, rather than a completely standard-cell design. I don't remember anybody actually talking about their design process being similar to Intel's. But you know what goes first as you age...
 
Dynamic logic will become less relevant soon with diminishing feature sizes (isn't a 2.2x performance increase a tad optimistic, BTW?).

BTW, I found <A HREF=http://www.easic.com/news_events/DACpanel%20WA.pdf>this pdf</A> amusing. It asserts that ASICs perform so poorly because the legacy design paradigms underlying modern EDA tools are backwards, to the point that the author does not see ASICs surviving in the future (it comes from the inventor of what he calls the fastest form of logic in the literature, which Intel is evaluating at the moment and which might replace static & dynamic logic in the future for high-performance devices... at least for those companies which get to license his patents...).
 
The actual speed increase from using dynamic over static logic does vary with the circuit being implemented, but given the 5+ GHz ALUs of the Pentium4, I'd say that a best-case factor of 2.2 isn't that far off for high-clock-speed circuits. Still, given the increasing importance of interconnect delay as feature sizes get smaller, the benefit of dynamic logic may become smaller as well, so unless someone figures out how to compile HDL code into dynamic logic (in which case it will become ubiquitous fast), it may someday disappear.

EDA tools aren't that bad - from what I've heard, they tend to produce circuits that are no more than about 20% larger/slower than design by hand does (not counting the effects of dynamic logic). FPGAs? Yes, they may take over for ASICs - they already do so where low chip quantities are needed and absolute performance or power usage aren't deadly critical, but a competitive FPGA CPU or GPU is still a long way off.
 
Saem said:
T2k,

Let's not compare custom logic with a more cookie-cutter approach. If you've taken even a basic digital systems course, you should realize that the goals in a CPU are much harder to reach than they would be in a graphics processor. The issues that graphics processors tackle are much more clear-cut; this narrower problem allows one to design the necessary logic with less difficulty.

I did, and not just a basic one (I'm an engineer, though not PCB-related :)). And I didn't say which one is easier - I just added something that had been left out.
;)

(BTW, if you've taken even a basic digital systems course, you should know how serious an achievement it is to push two to three times more transistors into the same area and still run at 300MHz on a 150 nm process instead of Intel's 130 nm.)
 
Snyder said:
T2k said:
...which doesn't matter, because a P4 contains fewer than 50 million transistors - less than half of the 110+ million of the R300.

Well...and what about Itanium/McKinley? I don't have exact transistor counts in mind but AFAIK they are far beyond the 110m of the R300.

Yes, yes. Again, I didn't say things are equal. :)
 
The same aspects of dynamic logic that make it fast also make it unstable, and that is the biggest problem with decreasing feature sizes... sensitivity to noise. The problem of interconnect becoming a dominant factor in the maximum clock rate was severely oversold in the past.
 
Asynchronous logic

There are some interesting things happening in the area of asynchronous design. It's not clear to me how willing an ATi or Nvidia would be to jump to an asynchronous methodology, but there appear to be some compelling arguments for doing so. There would be a lot of pain involved, for sure. But the current situation is not a bed of roses either.

Fulcrum Micro presents the case for asynchronous design:
http://www.fulcrummicro.com/technology/download/Async White Paper.pdf
 
As a guess, I believe that within 3 years CPU chip densities (transistor counts), the allocation of transistors to logic units per chip (which units you spend those transistors on) and sub-unit interconnect complexity (how powerfully each sub-unit is pipelined and interconnected) will be the key determinants of computing power, with clock speed a poor second.

So once transistor densities exceed 500M transistors, which logic units you have, how many of them there are and how they are interconnected (i.e. how parallel your chip can be) will predominantly determine general performance.

So chip designers in a few years' time might have 200M more transistors to play with. It will be important to consider how these additional 200M transistors get allocated to logic units. Chip and compiler designers need to ask: do we need more ALUs, PLUs, FPUs, etc.? Where can these transistors best be spent to raise overall performance? Tricky stuff.

I am presuming that on-chip cache might consume 50% of your transistor budget - so think 1,000M-transistor chips = 50% cache and 50% logic units.
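
A rough sketch of that budget arithmetic in Python - every figure below (the 50% cache split, the per-ALU and per-FPU transistor costs) is an assumption for illustration only:

Code:
# Hypothetical ~1,000M-transistor chip with the 50% cache / 50% logic split
# assumed above, plus made-up per-unit costs to show the allocation question.
total_transistors = 1_000_000_000
cache             = total_transistors // 2        # ~500M in caches
logic_budget      = total_transistors - cache     # ~500M for execution units etc.

extra_budget = 200_000_000                        # the "200M more to play with"
alu_cost, fpu_cost = 5_000_000, 15_000_000        # invented per-unit costs
extra_alus = (extra_budget // 2) // alu_cost      # 20 more ALUs from half the budget
extra_fpus = (extra_budget // 2) // fpu_cost      # 6 more FPUs from the other half
print(cache, logic_budget, extra_alus, extra_fpus)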

Once you get over 1 billion transistors on a chip (2008 onwards), raw power might be more closely linked to the number and complexity of links between sub-units, rather than the actual number of sub-units - like the human brain. In parallel processing generally, it is often the complexity of the interconnects between processing nodes, rather than their raw number, that has the greatest impact on performance.

Think of it this way: more interconnects give you more ways of keeping what you have very, very busy. Best utilising the potential of those interconnects is an exponentially complex problem. If you look at even Hammer, you see AMD saying: not too many more ALUs or FPUs, but cleverer pipelining to keep what we already have much busier. It's already started!
 
Hrm, those PDFs on FPGAs and asynchronous circuits were quite interesting.

Anyway, here's another link for those interested. This is a company that is currently designing products for the supercomputer market that are based on FPGA designs:

http://www.starbridgesystems.com

I read about this company a couple of years ago, and this really seems like the direction silicon-based processors are headed.

As an aside, the University of Washington (which, apparently, is where the person who wrote the FPGA article is based) is also making some very substantial contributions to quantum computing research. Very interesting stuff.

Update: One thing that seems very, very strange about Starbridge's products is that they apparently run Windows 98. There's just something odd about that...
 