What is Fast14?


http://www.eetimes.com/semi/news/OEG20040205S0029

AUSTIN, Texas — Intrinsity, Inc. and ATI Technologies, Inc. (Ontario, Canada) announced a licensing deal Thursday (Feb. 5), which could give ATI a performance edge in the competitive graphics IC market.
Intrinsity, based here, has developed a form of dynamic logic, called Fast14, that it is licensing to ATI "for use in future consumer products."

Sooooooo?

Anybody know anything about Fast14 ...... or speculation will do? :)
 
Razor04 said:
Google is your friend...
All this is great for an EE who's up with microprocessor design or a marketing type looking for buzz-words, but how about a two sentence digest for us CS guys who may have built a 4-bit CPU in college and know something of TTL digital logic, but that's about it. ;)

"Dynamic logic", "faster design", "higher clock speeds with overlapping clocks" implies "10 times performance advantage" -- well, ...yeah. Bah, marketing glitz. What's the engineering gist of this? Is this anything like building CPUs out of FPGAs? What's dynamic? The circuit? Why is it faster to design with this, or does one need to understand VHDL to appreciate the answer? Why would this be inherently a good technology to build video cards out of, barring the extra heat dissipation of course?
 
I don't know much of the specifics of fast14, but some about older dynamic logic.

In dynamic logic, you've got clocks on every single gate. A "gate" can be more complex than a NAND gate, but I don't think they build gates equivalent to more than a handful of NAND gates.

Each such gate can be seen (logically) as a small logic function followed by a latch. The latch is, however, not stable. It stores the bit like a DRAM does, as a charge on a capacitor, hence the name "dynamic logic". This means the latch cannot store the bit for more than one clock cycle, and that clock cycle must be short. In fact, IIRC (it was a long time ago) it might only store the bit for half a clock cycle, so you'll need the next gate to run on the opposite clock phase (that's the multiple clocks they talk about).

This logic is much faster than if you'd built the same function as a normal gate followed by a normal latch, and it is very well suited to extreme pipelining. Think free pipeline stages for every couple of gates.
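The precharge/evaluate behaviour described above can be sketched as a toy behavioural model in Python. This is purely illustrative: the class name and the two-phase `tick` interface are invented for the example, and a real domino gate is of course an analog circuit, not code.

```python
# Toy behavioural model of a dynamic (domino-style) gate: the output node
# is precharged high while the clock is low, then conditionally discharged
# through the pull-down network during the evaluate phase.

class DynamicNandGate:
    """Dynamic 2-input NAND; the output is only valid during (and shortly
    after) the evaluate phase, since the stored charge leaks away."""

    def __init__(self):
        self.node = None  # stored charge: 1 = charged, 0 = discharged

    def tick(self, clock, a, b):
        if clock == 0:
            self.node = 1      # precharge phase: node pulled high
        elif a and b:
            self.node = 0      # evaluate phase: pull-down network conducts
        return self.node       # equals NAND(a, b) after evaluation

gate = DynamicNandGate()
gate.tick(0, 0, 0)             # precharge
print(gate.tick(1, 1, 1))      # evaluate with a=b=1 -> 0
gate.tick(0, 0, 0)             # precharge again
print(gate.tick(1, 1, 0))      # evaluate with b=0 -> node stays 1
```

Note that the DRAM-like fragility isn't modelled here: in the real thing the charge on `node` leaks away, which is exactly why the next gate has to consume the bit within the opposite clock phase.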

The drawback is that the capacitor in the latch gets charged and discharged every clock cycle, even if the data doesn't change, and that of course increases the power needed.

And then of course it has (historically) been harder to design for if you want to make use of that extreme pipelining: keeping the waves of data in sync, and hiding the higher latencies (counted in clock cycles). The exact timing between the clocks of two consecutive gates has also been critical, which made it hard.

The exact timing has been made easier by Fast14's quad clock. But I guess you still need to do some brain athletics to get good use of the extreme pipelining. (The tools might help with some of it, though.)

This design style should work well if you have lots of independent, identical parallelism (PS or VS). Think hyperthreading.

I'll see if I can find some (transistor level) drawings of the actual gates.

[Edit]
I think I forgot one question: "Why is it faster to design with this?"
It's faster to design with Fast14 than with old-style dynamic logic because the new tools raise the design to a higher level (VHDL?) instead of the low-level gate fiddling you had to do earlier.
Designing is probably still harder than with static logic, but the gap is smaller.
 
Basic said:
I'll see if I can find some (transistor level) drawings of the actual gates.
http://www.ece.msstate.edu/~reese/EE8273/lectures/dynamic/dynamic.pdf
A quickie perhaps?
http://www.ece.msstate.edu/~reese/EE8273/
has more...
 
Dark Helmet said:
All this is great for an EE who's up with microprocessor design or a marketing type looking for buzz-words, but how about a two sentence digest for us CS guys who may have built a 4-bit CPU in college and know something of TTL digital logic, but that's about it. ;)
Sorry can't help you there...I am a Mechanical Engineer.
 
Thanks Aivansama.
Do you mean that just because there had already been a mention of Google, I should have thought of finding a link instead of writing some up myself? :D

What's the fun with that? :)
 
Look at these two paragraphs:

Intrinsity, based here, has developed a form of dynamic logic, called Fast14, that it is licensing to ATI "for use in future consumer products."

Bob Feldstein, vice president of engineering at ATI, said ATI believes that Fast14 technology "can deliver up to four times the performance per silicon dollar when compared with standard design approaches."

Summation:

*The technology has no immediate impact or advantage. If it proves workable it will be used in the future and ATi is in a position to use it; if not, they won't.

*Notice it does not say "4 times faster than existing silicon," but says "up to four times the performance per silicon dollar compared with existing design approaches." This makes me think we are talking about a design technique which isn't necessarily 4x faster, but might be 4x cheaper, instead.

*The press release is issued by Intrinsity, Inc., out of Austin, and really is designed to describe and publicize Instrinsity more than it is meant to be a description of any 3d chips ATi has planned for upcoming production. Intrinsity wants potential customers to know that a major player, ATi, has licensed its technology, which will hopefully serve as a reference for Intrinsity. This practice is common among such companies.

Bottom line is that Intrinsity's approach may, or may not, be used in future ATi products. Time will tell. Deferred rendering, possibly? (Trying to think of something that might impact the whole silicon in terms of price/performance in such a manner.)

Edit: Nah, not deferred rendering. Looked at the .pdf, and it looks like an interesting approach to transistor design and array.
 
WaltC said:
*Notice it does not say "4 times faster than existing silicon," but says "up to four times the performance per silicon dollar compared with existing design approaches." This makes me think we are talking about a design technique which isn't necessarily 4x faster, but might be 4x cheaper, instead.
Somewhere I read that the DSP (FastMATH?) they implemented with Fast14 was twice as fast as regular approaches, but also twice as power-hungry. The power part is what's wrong with this picture...

Nelg, I couldn't get the pictures in that patent to open. (Couldn't bother to install new software). Could you give the punchline?
 
I did remember one thing wrong.
The capacitor doesn't get charged and discharged every cycle; that only happens in cycles when the output of the gate is 0. This means that 0's cost a lot and 1's not so much. (As opposed to ordinary logic, where it's the changes that cost.)
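The difference between the two cost models can be illustrated with a trivial count over an example output stream (the stream and the unit costs are made up; the point is just that the two schemes charge you for different things):

```python
# Toy comparison of the two cost models: in this dynamic scheme a gate
# burns energy in every cycle its output is 0 (precharge, then discharge),
# while in ordinary static CMOS energy is burned on output *changes*.

outputs = [1, 1, 0, 0, 0, 1, 1, 1]   # example output stream, one value per cycle

dynamic_cost = sum(1 for v in outputs if v == 0)           # one unit per 0
static_cost = sum(1 for a, b in zip(outputs, outputs[1:])  # one unit per
                  if a != b)                               # transition

print(dynamic_cost, static_cost)  # -> 3 2
```

So a long run of constant 0's is cheap in static logic but keeps costing in this dynamic scheme, which is what motivates the NULL trick described next.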


The patent describes a way to code signal values to use as little power as possible. The important part is that in addition to 0's and 1's, they have a "NULL" state, which makes the gate draw less power. It also propagates through the gate, making later gates draw less power.

One example they have is to code two data bits into four wires.
wires=1110 => data=00
wires=1101 => data=01
wires=1011 => data=10
wires=0111 => data=11
wires=1111 => data=NULL
Other combinations of the wires are illegal.

So if you have a "bubble" somewhere in your pipeline, make sure the bubble contains NULL signals to save power.
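The wire coding above is small enough to write down directly. Here is a sketch of an encoder/decoder table for that 2-bits-on-4-wires scheme in Python (the table is from the patent description above; the function name and error handling are mine):

```python
# 2 data bits coded onto 4 wires: exactly one wire low encodes a value,
# all wires high encodes NULL (minimal switching, hence minimal power).

ENCODE = {
    (0, 0): (1, 1, 1, 0),
    (0, 1): (1, 1, 0, 1),
    (1, 0): (1, 0, 1, 1),
    (1, 1): (0, 1, 1, 1),
    None:   (1, 1, 1, 1),   # NULL: nothing pulled low
}
DECODE = {wires: data for data, wires in ENCODE.items()}

def decode(wires):
    """Map a 4-wire pattern back to 2 data bits, or None for NULL."""
    try:
        return DECODE[tuple(wires)]
    except KeyError:
        raise ValueError(f"illegal wire combination {wires!r}")
```

Any pattern with two or more low wires is illegal, which is what makes the code self-checking as well as power-aware.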
 
WaltC said:
*Notice it does not say "4 times faster than existing silicon," but says "up to four times the performance per silicon dollar compared with existing design approaches." This makes me think we are talking about a design technique which isn't necessarily 4x faster, but might be 4x cheaper, instead.

That is a rather academic distinction; you get four times the performance for a chip of a given cost either way.
 
Thanks for the info ...... thus far. :)

I have very vague memories of gates, and have to admit that those Google links were mostly beyond me.

'fraid I need things explained so I can get a simple picture in my mind. :LOL:
 
MfA said:
WaltC said:
*Notice it does not say "4 times faster than existing silicon," but says "up to four times the performance per silicon dollar compared with existing design approaches." This makes me think we are talking about a design technique which isn't necessarily 4x faster, but might be 4x cheaper, instead.

That is a rather academic distinction; you get four times the performance for a chip of a given cost either way.

True - we're looking at an industry that pushes the edges of what is possible, already making products that are getting very expensive compared to the rest of the machine.

It sounds like, as future chips become ever more complex and expensive, design technologies like Fast14 will make the difference between a chip being economically viable to build or not.
 
This sounds a lot like asynchronous processing.

If I'm understanding this correctly, this "dynamic logic" is a way of just letting each transistor switch as fast as it can, allowing the data stream to "trickle through" the processor as quickly as it can (or something similar to this).

I remember reading a paper on asynchronous processing a couple of years ago. It was quite interesting.

This is what I gather:
1. As transistors get smaller and smaller, the spread of transistor switching speeds increases. A global synchronous clock forces the entire chip to be limited by the slowest transistor in the design. An asynchronous clock would essentially allow the data rate to be controlled by the average transistor switching speed instead. This could produce dramatic performance differences as transistors get smaller and smaller.

2. Without a global clock, these chips may circumvent a primary limitation upon the speed of current chips: radiation. An electric circuit will start to radiate if c/d is similar in magnitude to the frequency of the circuit (c is the speed of light, d is some measure of the size of the unit). If we set a threshold at 10%, at which there should be noticeable radiation, a 1cm chip would start to radiate at about 3GHz. Essentially, it may be possible for an asynchronous design to have better power usage characteristics at high clock speeds (i.e. it may not actually go faster, but it should radiate less). You should also note that radiation from modern processors should already be a problem that needs to be confronted.
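For what it's worth, the 3 GHz figure in point 2 checks out as back-of-the-envelope arithmetic:

```python
# Back-of-the-envelope check of the numbers above: a structure of size d
# starts to radiate noticeably once its operating frequency approaches
# some fraction of c/d.

c = 3.0e8          # speed of light, m/s
d = 0.01           # chip size, m (1 cm)
threshold = 0.10   # fraction of c/d taken as "noticeable radiation"

f = threshold * c / d
print(f / 1e9)     # -> 3.0 (GHz)
```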

Of course, an obvious drawback is the need for extra logic to synchronize the data flow through the chip. Such a chip would definitely be harder to design, but may be a way around the limitations current silicon-based chip technologies are beginning to run into, at least in the short-term. There's still only so much processing power that can be pulled out of silicon-based designs, and we will need to soon move onto other technologies for processing power to continue to increase.
 
mboeller said:
Aivansama said:
(Couldn't bother to install new software).

Look here : http://www.uspto.gov/web/menu/plugins/tiff.htm

You need a specific TIFF viewer for the images.

You can go here : http://www.internetiff.com/ (shareware) or here http://www.mieweb.com/alternatiff/ (freeware??) to view the images.
... Yes, I knew that. As I said, couldn't bother... :)

Chalnoth, there's still a clock. Dynamic logic does not (as far as I can tell, which isn't very far, unfortunately) equal asynchronous logic. In fact, they list the four overlapping clock phases as one of the main points of Fast14. The wire swizzling, which is also advertised, simply means that they change the order of the four data wires at the half-way point of a long trace. A shocker there.
I think that the least advertised aspect of this whole Fast14 business is also the most important one: the EDA tools ATI acquired. Dynamic logic hasn't enjoyed much support in EDA tools thus far, so it has been necessary to spend a lot of time hand-tuning the circuits that use it. Fast14 EDA tools could mean there is now a chance to include dynamic logic in the regular design flow and still expect it to work as planned. No more hand-tuning necessary, hence faster design times.
 
Chalnoth said:
2. Without a global clock, these chips may circumvent a primary limitation upon the speed of current chips: radiation. An electric circuit will start to radiate if c/d is similar in magnitude to the frequency of the circuit (c is the speed of light, d is some measure of the size of the unit). If we set a threshold at 10%, at which there should be noticeable radiation, a 1cm chip would start to radiate at about 3GHz. Essentially, it may be possible for an asynchronous design to have better power usage characteristics at high clock speeds (i.e. it may not actually go faster, but it should radiate less). You should also note that radiation from modern processors should already be a problem that needs to be confronted.

I believe async chips radiate in a more broad-spectrum fashion, which in turn makes interference slightly less of a problem.

What I would like to know is how far current chips are from mainly using standard fab cells in their logic parts (i.e. let's forget about the mem/RAMDAC/AGP etc.), because if there is already a heavy amount of hand-tuning then Fast14 probably isn't going to have a very profound effect.
 