AMD: R7xx Speculation

(BACKSTAGE, the scream of the packaging expert, who now has to invent a completely new packaging technique.)

Heh, thanks "silent_guy" :)
Lots of stuff there I didn't know -- and here I was only worried about alignment and heat removal....

but I'm not convinced others would agree, so let's leave it at that. ;)

'k. Not yet convinced of the utility of the ringbus, but it does have the attribute that it doesn't need a centralized piece. On the other hand, it isn't clear to me where some of the other bits hang out yet either -- tessellator, rasterizer....

-Dave [redonning skeptic hat, wondering how this approach is anything other than a way to avoid building high-end chips....]
 
AMD's K8 has a 12-stage integer pipeline in which a decoded instruction crosses a significant portion of the processor's die.
Floating point might take up to 17 stages, and it crosses the width (the narrow side, but still not 1/10 the length of R600) of the Opteron die.

AMD's L2 cache latency is 12 cycles, of which about 9 are actually spent accessing the cache -- and that cache covers over half the die right there.

At the same time K8 clocks in at 3 GHz, much higher than R600.

Itanium has a huge cache that covers large portions of the chip, but it has a 14 cycle L3 cache latency.
That means the worst case, where the L3 line furthest from the core is accessed, takes that long to make it back. Several cycles are probably needed for tag comparisons, so the signal takes less than 14 cycles to cross the cache.
Couple that with Itanium's pipeline length, and we're talking about 8 cycles for an instruction to cross the core portion.
That's on the order of maybe 20 cycles' worth of time for a load whose result is needed as an operand by an executing instruction to propagate from the L3 through the cache and layers of complex logic.

Last I checked, Itanium is larger than R600 and clocked twice as high.
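To put those cycle counts into wall-clock terms, here is a minimal back-of-the-envelope sketch in Python. The clock speeds and cycle counts are assumptions in the same ballpark as the figures quoted above, not exact product specs; the point is just that signals already spend several nanoseconds crossing big CPU dies clocked far higher than any GPU.

```python
# Rough back-of-the-envelope: how much wall-clock time do those
# multi-cycle traversals actually represent at each chip's clock?
examples = {
    # name: (clock in GHz, cycles for the access/traversal in question)
    "K8 L2 access (~9 of 12 cycles)": (3.0, 9),
    "Itanium L3 access": (1.6, 14),          # assumed ~1.6 GHz part
    "R600-class GPU, one cycle": (0.74, 1),  # assumed ~740 MHz core clock
}

for name, (ghz, cycles) in examples.items():
    period_ns = 1.0 / ghz            # one clock period in nanoseconds
    total_ns = cycles * period_ns    # wall-clock time for the traversal
    print(f"{name:32s}: {cycles:2d} x {period_ns:.2f} ns = {total_ns:.2f} ns")
```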
 
But isn't the situation much different with the kind of workload we have on GFX cards? The pipelines are much longer, etc. AFAICR.

EDIT: and thanks :)
 
Normally, dies are tested after cutting, but before packaging. Cutting by itself can put some mechanical stress on the die and result in failures. But there's no reason why you can't test before cutting. Your only problem would be that you may have larger fallout in after-packaging testing, which will increase overall cost.

Much more of a problem: you can't cut out individual dies selectively, because you're using a circular saw that cuts across the whole wafer. (Basically, you first attach the wafer to a plastic that's stretched flat on a hollow ring. The saw will cut deep enough to separate the dies, but it doesn't cut through the plastic. Individual dies are separated by only 150 µm, so the saw is extremely precise.)
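As an aside, that ~150 µm street is part of the standard gross-die-per-wafer arithmetic. A minimal sketch using the usual approximation follows; the die dimensions and wafer size are hypothetical, only the street width comes from the post above.

```python
import math

# Gross-die-per-wafer estimate using the usual approximation:
#   dies ~= wafer_area / die_pitch_area - pi * diameter / sqrt(2 * die_pitch_area)
wafer_diameter_mm = 300.0
die_w_mm, die_h_mm = 20.0, 20.0   # hypothetical large GPU die
street_mm = 0.15                  # spacing left between dies for the saw

pitch_area = (die_w_mm + street_mm) * (die_h_mm + street_mm)
wafer_area = math.pi * (wafer_diameter_mm / 2.0) ** 2

gross_dies = (wafer_area / pitch_area
              - math.pi * wafer_diameter_mm / math.sqrt(2.0 * pitch_area))
print(f"~{int(gross_dies)} gross dies per {wafer_diameter_mm:.0f} mm wafer")
```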

I think it's about time to replace those classical mechanical saws by laser cutting machines.
 
I think it's about time to replace those classical mechanical saws by laser cutting machines.

Wouldn't the heat generated by a laser needed to cut the substrate destroy the chips? Sure, a saw may generate heat as a side effect, but its primary cutting mechanism is mechanical. To use a laser that has no mechanical cutting edge means you need a lot of heat to perform the cut, and I can't see today's modern chips being able to survive that.
 
The point of a laser cutter is to burn or vaporize away whatever connects the two sides of what you want separated.
How would a UV laser burn the silicon away without heating it?
 
Wouldn't the heat generated by a laser needed to cut the substrate destroy the chips? Sure, a saw may generate heat as a side effect, but its primary cutting mechanism is mechanical. To use a laser that has no mechanical cutting edge means you need a lot of heat to perform the cut, and I can't see today's modern chips being able to survive that.

When used in a pulsed manner, I think it would be fine: you can keep the average temperature below the thermal annealing temperature. Furthermore, the accuracy of the beam helps to localize the heat as well as possible. In other words: the cutting line can be really sharp (= less area => less heat).
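For what it's worth, here is a rough sketch of why pulsing keeps the average thermal load low even when the peak power is enormous. The pulse energy, width, and repetition rate are hypothetical numbers picked purely for illustration.

```python
# Hypothetical nanosecond-pulsed laser, numbers picked only for illustration:
pulse_energy_j = 100e-6    # 100 microjoules per pulse (assumed)
pulse_width_s = 10e-9      # 10 ns pulse (assumed)
rep_rate_hz = 50e3         # 50 kHz repetition rate (assumed)

peak_power_w = pulse_energy_j / pulse_width_s   # what does the cutting
avg_power_w = pulse_energy_j * rep_rate_hz      # what the wafer must shed on average

print(f"Peak power:    {peak_power_w / 1e3:.0f} kW")
print(f"Average power: {avg_power_w:.1f} W")
```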
 
When used in a pulsed manner, I think it would be fine: you can keep the average temperature below the thermal annealing temperature. Furthermore, the accuracy of the beam helps to localize the heat as well as possible. In other words: the cutting line can be really sharp (= less area => less heat).
This is about producing millions of pieces. You don't exactly have the luxury of going at a leisurely pace. Do you really think you can heat up silicon enough to vaporize it without the heat extending farther than 80 µm?
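One way to put an order-of-magnitude number on that: the distance heat diffuses during a single pulse scales roughly as sqrt(D*t). A minimal sketch, assuming a thermal diffusivity for silicon of about 0.9 cm²/s and a few illustrative pulse lengths:

```python
import math

# Thermal diffusion length ~ sqrt(D * t). D for silicon is roughly
# 0.9 cm^2/s near room temperature; pulse lengths below are just examples.
D_SI = 0.9e-4  # m^2/s, approximate thermal diffusivity of silicon

for pulse_s in (10e-9, 1e-6, 1e-3):  # 10 ns, 1 us, 1 ms
    length_um = math.sqrt(D_SI * pulse_s) * 1e6
    print(f"pulse of {pulse_s:.0e} s -> heat spreads on the order of {length_um:.1f} um")
```

On that estimate, heat from a nanosecond-scale pulse stays well inside 80 µm, while millisecond-scale or continuous exposure does not; which regime a real dicing laser operates in is the practical question.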
 
When used in a pulsed manner, I think it would be fine: you can keep the average temperature below the thermal annealing temperature. Furthermore, the accuracy of the beam helps to localize the heat as well as possible. In other words: the cutting line can be really sharp (= less area => less heat).

Here's an interesting document:
The laser's intense effect on silicon in particular gives rise to cracks, structural changes, burrs and deposits on the wafer surface.
If lasers were a preferable replacement for saws, then Intel, IBM, TSMC, AMD or any one of the other chip manufacturers would be using them. They already use lasers for etching the surface of chips or cutting jumpers on the die packaging, so the fact they don't use them for cutting dies from the wafer should tell you something.

Lasers may become more useful as wafers get thinner, but again, chip layers are getting denser, so this may offset the viability of laser cutting. Chips are basically a glass-like substance full of heat-sensitive circuits (that can destroy themselves if operated for a few tens of seconds without a heatsink), so it doesn't seem very practical to use a cutting method that relies purely on the heat from a beam of light to cut through the wafer.
 
The point of a laser cutter is to burn or vaporize away whatever connects the two sides of what you want separated.
How would a UV laser burn the silicon away without heating it?
That's what UV lasers do. The energy contained in a UV photon is high enough to destroy any chemical bond. So UV lasers directly sever the bonds between atoms.

By way of a rather crude analogy, if you imagine atoms connected together with string :) a conventional laser makes all the atoms jerk to and fro faster and faster until eventually they are torn loose. To do that you have to make a large number of atoms vibrate furiously to and fro. A UV laser can go straight in and directly cut the strings without introducing vibrations.
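To attach numbers to that analogy: photon energy is E = hc/λ, and a Si-Si bond is roughly 2-3 eV, so of the common laser lines listed in the sketch below only the UV ones exceed typical bond energies per photon. The wavelengths are illustrative examples.

```python
# Photon energy E = h*c/lambda, compared against a Si-Si bond of roughly 2-3 eV.
# The wavelengths are just common laser lines, listed for illustration.
H = 6.626e-34    # Planck constant, J*s
C = 2.998e8      # speed of light, m/s
EV = 1.602e-19   # joules per electronvolt

for name, wavelength_nm in [("CO2 (far IR)", 10600), ("Nd:YAG (IR)", 1064),
                            ("KrF excimer (UV)", 248), ("ArF excimer (UV)", 193)]:
    energy_ev = H * C / (wavelength_nm * 1e-9) / EV
    print(f"{name:18s} {wavelength_nm:5d} nm -> {energy_ev:5.2f} eV per photon")
```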
 
Chemical bonds are states of lower energy between two atoms.
If a bond is broken, then enough energy was injected into the system to push the atoms out of their bond.
That energy eventually has to be dealt with.

Is a UV laser simply able to do this without exciting too many of the atoms, even though its mechanism is no different from that of a lower-frequency laser?
 
Jawed made a post a while back referencing this page to which I have been spending some time reviewing - I'm halfway through the course work now (listening to the lectures and reading the power point slides).

Anyhow, this morning the NVIDIA guy (teaching) said some interesting stuff on the Lecture 5 tape. He suggested (at 29:40) that currently the TFs (texture filters) are done through dedicated hardware, but that once they are revealed through the API with programmable elements, the floating point power would be roughly doubled.

Can anyone confirm if that idea is a viable reason why G90 would reach nearly 1 TFlop?

He then goes on to say (at 1:04:20) that bi/trilinear filtering is also currently done through dedicated hardware, and that the next generation or the generation after that will utilize all those floating point units for general computing.

Is this info helpful?


PS: I listen to the lessons at about 160% normal speed. Try it if you don't normally use that feature - it's very helpful.
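On the "roughly doubled" / "nearly 1 TFlop" point above: here is a back-of-the-envelope sketch of how the arithmetic could work out, using approximate G80-class unit counts and clocks, and simply assuming the filtering units hide about as much FP math as the shader ALUs (which is the speculative part).

```python
# Approximate G80-class figures; the "filtering math" line is just the
# lecture's "roughly doubled" assumption, not a measured number.
shader_alus = 128
shader_clock_ghz = 1.35
flops_per_alu_per_clock = 3   # MAD + MUL, as the marketing math usually counts it

shader_gflops = shader_alus * shader_clock_ghz * flops_per_alu_per_clock
filtering_gflops = shader_gflops   # assume the TMUs hide a similar amount of FP math

print(f"Programmable shader ALUs: ~{shader_gflops:.0f} GFLOPS")
print(f"Including filtering math: ~{shader_gflops + filtering_gflops:.0f} GFLOPS")
```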
 
3vi1: Very nice find, now I regret not having listened to ALL those lectures, heh! :)
Interestingly, what David Kirk seems to be describing there is programmable filtering hardware (which NVIDIA doesn't have a patent on), rather than using the ALUs for filtering and load-balancing between the two (which they do have a patent on).

I guess it is fairly logical that if you want to implement one, the other goes hand in hand with it. Otherwise you're just tempting programmers to use filtering operations not supported on the TMUs and you're wasting that silicon completely...

And I don't think this is a G9x feature, if only because that'd be ridiculously more advanced and programmable than what the D3D10.1 API asks for. Furthermore, it's really not obvious to me how you'd be able to use that in a generic manner.

You could think of it as a coprocessor to the main ALUs for GPGPU, but that's kind of messy, so that's probably what he's thinking of in terms of not being sure how to expose it. That seems to imply it would actually be exposable -- i.e. that G80's filtering hardware is already programmable -- but I think he stopped just short of confirming that, and that it simply isn't the case. I'd love to be wrong, though.
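To illustrate why the filtering hardware represents so much FP math in the first place, here is a minimal sketch of a single bilinear filter expressed as generic ALU operations. `fetch_texel` is a hypothetical helper, and real TMUs obviously do this with fixed-function (often fixed-point) logic rather than shader code.

```python
import math

# One bilinear filter written as generic ALU math. `fetch_texel` is a
# hypothetical helper returning an RGBA tuple of floats.

def lerp(a, b, t):
    # One MAD-style operation per channel: a + t*(b - a)
    return tuple(x + t * (y - x) for x, y in zip(a, b))

def bilinear(fetch_texel, u, v, width, height):
    # Map normalized coordinates to texel space and find the 2x2 footprint.
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    fx, fy = x - x0, y - y0

    t00 = fetch_texel(x0,     y0)
    t10 = fetch_texel(x0 + 1, y0)
    t01 = fetch_texel(x0,     y0 + 1)
    t11 = fetch_texel(x0 + 1, y0 + 1)

    # Three lerps x four channels: about a dozen MADs per bilinear sample,
    # before trilinear or anisotropic filtering multiplies that further.
    top = lerp(t00, t10, fx)
    bottom = lerp(t01, t11, fx)
    return lerp(top, bottom, fy)
```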
 
Here's an interesting document:
If lasers were a preferable replacement for saws, then Intel, IBM, TSMC, AMD or any one of the other chip manufacturers would be using them. They already use lasers for etching the surface of chips or cutting jumpers on the die packaging, so the fact they don't use them for cutting dies from the wafer should tell you something.

Hehe, that is reasoning the other way around. And I can't really agree on this point. IBM, Intel and so on want to use proven tech, unless it's really really really really proven that lasers are better. And this doesn't stop just at the manufacturing level. Also architecture, devices and so on. I have some nice stories from a colleague from Intel Haifa (Israel) to illustrate it, but it would go a bit off topic. ;)

Lasers may become more useful as wafers get thinner, but again, chip layers are getting denser, so this may offset the viability of laser cutting. Chips are basically a glass-like substance full of heat-sensitive circuits (that can destroy themselves if operated for a few tens of seconds without a heatsink), so it doesn't seem very practical to use a cutting method that relies purely on the heat from a beam of light to cut through the wafer.

The stack of layers on top of the chip is quite thin compared to the substrate. And the substrate needs to be thick for mechanical stability, mainly limited by 'the saw'. Especially in my own field (RF analog IC design) we would like the substrate to be as thin as possible to limit the parasitics. (We hate it when our 60 GHz signals leak away :mad: ).
And I'm not really sure what you're referring to with the 'heat-sensitive circuits'. Are you referring to burn-out or some kind of latch-up? Or maybe junction breakdown? Usually wells in combination with specific doping profiles make sure that these kinds of failures rarely occur.

Sorry for the off topic btw. Back to R700 :!:
 
You could think of it as a coprocessor to the main ALUs for GPGPU, but that's kind of messy, so that's probably what he's thinking of in terms of not being sure how to expose it. That seems to imply it would actually be exposable -- i.e. that G80's filtering hardware is already programmable -- but I think he stopped just short of confirming that, and that it simply isn't the case. I'd love to be wrong, though.

Hmmm, where's tertsi...? I believe he had some theories about the possibility that G80 could schedule MADs on the filtering units.
 