NVIDIA GF100 & Friends speculation

Also, since the electrical properties of circuits on a chip these days are often limited by the wiring, you can affect the timing and power consumption of a circuit by changing only the metal, although there are obviously diminishing returns.

While this can sometimes be true, it really relies on a broken DRC/waiver or an extremely bad cap extraction flow. Fundamentally, wires are fairly easy to characterize and generate parameters for, so their performance is basically known beforehand.
 
Maybe S/A was too optimistic with their prediction of 8000-10000 cards. Tweaker.net says 5000 worldwide at launch.
I think someone took my post to Zed_X to heart. He was stating that GF100 would have more launch quantity than Cypress, so I asked him if he really believed that they would have more than 10k cards at launch since Cypress had more than that. Or maybe my speculation is just that good?:LOL:

What if the A3 was a "let's try it and see if it helps before a full respin" and the results were almost the same as A2?
Or, let's not waste these thousands of Ax wafers on A2 or completely waste them by moving on to Bx.

Gotcha.

So the only conflicting bit (I'll take Rys' word on this) is the die size.
Clocks, TDP and # of CUDA Cores are still very conflicting.

Yup. I had two SIMDs on RV770 that were mysteriously significantly slower than the rest; at the time we had the options of limiting the upper clock or dropping those two SIMDs entirely. Even though this wasn't considered as a gate to A12, the engineering team spent days and nights in the run-up to the spin investigating it and adding buffers in the metal layers to help the timing. Fortunately it worked, and in fact A12 came back even faster than A11 with those two SIMDs disabled.
So RV770 wasn't A11, interesting. Only took almost 2 years to find that out.
 
That was quite instructive, thanks for taking the time!

I'm not entirely clear on the bold part, though. What do you mean by "the electrical properties of circuits are often limited by the wiring"?

Digital circuits have two main building blocks: transistors and wires. Transistors do the switching and store information, but they must be connected with wires.
In general, transistors switch faster as they get smaller. But wires get slower as they get smaller, for two reasons. Firstly, it's just hard to push current through small wires - this is called resistance. Secondly, the wires get packed closer together, which increases capacitance. Roughly speaking, with every new process generation, the transistors get a little faster and the wires get a little slower, and we're now at the point where the wires can dominate the time and power characteristics of a circuit.

You can make a wire propagate a signal faster in many ways; here are a few:
1. Move the wire to a bigger metal layer. Generally, the lowest metal layers in the stack are the smallest, and are used for short connections. As you go higher in the metal layer stack, the wires get bigger (but there are fewer of them), which makes for less resistance and capacitance.
2. Cut the wire into pieces. The time it takes a signal to travel through a wire is superlinearly related to its length. If you break the distance up into smaller hops, the sum of the delay for the entire path can be quicker. At the end of each hop, the signal has to jump back down to the transistor layer in order to be reboosted back to full strength and sent out to the next hop. This is called "buffering". You can only do this with a metal spin if you have extra buffers lying around which aren't being used - to add new buffers that aren't there would require a full silicon respin.
3. Make the wire shorter. Depending on the flexibility of your design, you can change the physical placement of the transistors it's connecting to make the path shorter. This is hard to do in a metal spin, and often introduces problems of its own: if you make one wire shorter, you often make another wire longer.
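
To put rough numbers on point 2, here is a minimal sketch using a simple Elmore-style RC model, in which wire delay grows with the square of the length. The per-millimetre resistance and capacitance and the 30 ps buffer delay are made-up illustrative values, not figures for any real process, and the function names are hypothetical.

```python
def wire_delay_ps(length_mm, r_ohm_per_mm, c_pf_per_mm):
    """Elmore-style delay of one RC wire: 0.5 * R_total * C_total (ohm * pF = ps)."""
    return 0.5 * (r_ohm_per_mm * length_mm) * (c_pf_per_mm * length_mm)


def buffered_delay_ps(length_mm, r_ohm_per_mm, c_pf_per_mm, hops, buffer_delay_ps):
    """Same wire cut into equal hops, with one buffer between consecutive hops."""
    per_hop = wire_delay_ps(length_mm / hops, r_ohm_per_mm, c_pf_per_mm)
    return hops * per_hop + (hops - 1) * buffer_delay_ps


# Assumed illustrative values (not from any real process).
R = 1000.0   # ohm per mm
C = 0.2      # pF per mm
BUF = 30.0   # ps of internal delay per buffer

unbuffered = wire_delay_ps(4.0, R, C)
buffered = buffered_delay_ps(4.0, R, C, hops=4, buffer_delay_ps=BUF)

print(f"4 mm wire, no buffers: {unbuffered:.0f} ps")
print(f"4 mm wire, 4 hops:     {buffered:.0f} ps")
print(f"2 mm vs 4 mm ratio:    {wire_delay_ps(4.0, R, C) / wire_delay_ps(2.0, R, C):.1f}x")
```

With these assumed numbers, doubling the length quadruples the delay, and the four-hop path comes out well ahead of the single long wire even after paying for three buffers.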
 
like all others before me thanx for the info!


i guess if a chip is made up of thousands/millions of buffers then based on the wiring you can come out with very different physical flow/path.

that puts what Dave was saying in perspective, a mad dash to rearrange the wiring based on a juggling act of available buffers.
 
Gigabyte Geforce GTX 480 box has typo, "Gigabyte Ati Radeon 5000 series supports DirectX 11". hehe

4j9lcn.jpg


and GTX 470 box has 384bit / 1536mb:

DSCF1381-580x435.jpg


"Quick rush these to the printer, we need them before wednesday for cebit"
 
what i find interesting assuming 320 and 384 bit buses and memory specs linked

320 X 3200 = 1024000 Mbit a sec
256 X 4000 = 1024000 Mbit a sec

384 X 3200 = 1228800 Mbit (assuming same memory speed as 470)
256 X 4800 = 1228800 Mbit

it's going to be interesting to see if NV's uarch is more efficient in terms of bandwidth.
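
For anyone who wants to redo the arithmetic above, here is a small sketch. It just multiplies bus width by per-pin data rate and also converts to GB/s; the bus widths and data rates are the ones quoted in this post, and the function names are made up for illustration.

```python
def total_mbit_per_s(bus_width_bits, data_rate_mbps_per_pin):
    """Aggregate memory bandwidth in Mbit/s: one data pin per bus bit."""
    return bus_width_bits * data_rate_mbps_per_pin


def total_gbyte_per_s(bus_width_bits, data_rate_mbps_per_pin):
    """Same figure in GB/s (8 bits per byte, 1000 MB per GB)."""
    return total_mbit_per_s(bus_width_bits, data_rate_mbps_per_pin) / 8 / 1000


# (bus width in bits, data rate in Mbit/s per pin), as quoted in the post
configs = [(320, 3200), (256, 4000), (384, 3200), (256, 4800)]

for width, rate in configs:
    mbit = total_mbit_per_s(width, rate)
    gbyte = total_gbyte_per_s(width, rate)
    print(f"{width}-bit @ {rate} Mbit/s/pin = {mbit} Mbit/s = {gbyte:.1f} GB/s")
```

The wide-and-slow and narrow-and-fast pairs land on the same totals, which is the point of the comparison.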
 
But you can use metal spins to fix...

* yield : no, pretty much never. (Maybe, just maybe, you can do some of the via cell yield improvement on higher metal layer, but on an already placed design, there's just no margin to start adding vias everywhere.)
* power : no, pretty much impossible, unless you're talking about a design bug that puts some piece of logic in a continuous semi-on, leaking state.
* and timing issues as well : only in two cases. Most common: noise fixes or, rare, buffering up a wire or two.

There's a reason why buffering up is much rarer than noise fixes: in a standard timing flow, a wire that's part of the critical path will already have been treated by the P&R tool to insert buffers, so those wires are usually already optimally driven and additional drivers will actually slow down that path (due to the internal delay of the gate.)
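
A toy model shows why an already well-buffered wire gets slower with more drivers. This is only a sketch: it assumes a quadratic (Elmore-style) wire delay, so splitting the wire into n equal hops divides the wire delay by n, and each buffer between hops adds a fixed internal delay. The 1600 ps and 30 ps figures and the function name are invented for illustration.

```python
def path_delay_ps(hops, wire_delay_ps=1600.0, buffer_delay_ps=30.0):
    """Total delay of a wire cut into `hops` equal pieces.

    The wire part shrinks as 1/hops (quadratic delay per hop, times hops);
    each buffer between hops adds a fixed gate delay.
    """
    return wire_delay_ps / hops + (hops - 1) * buffer_delay_ps


# Sweep the hop count and find the fastest configuration.
delays = {n: path_delay_ps(n) for n in range(1, 13)}
best_hops = min(delays, key=delays.get)

for n, d in delays.items():
    marker = "  <- fastest" if n == best_hops else ""
    print(f"{n:2d} hops: {d:7.1f} ps{marker}")
```

Past the fastest point, every extra buffer costs more in gate delay than it saves in wire delay, which is the situation a critical-path wire is usually already in after P&R.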

Dave can correct me, but if RV770 could be sped up by buffering up a number of wires, it's very likely to have been a noise problem after all: non-critical path wires that became critical due to noise interference from nearby aggressors. Those can be fixed either by slightly rerouting the victim or aggressor nets or by buffering up the victim nets.

... power consumption of a circuit by changing only the metal, although there are obviously diminishing returns.
If you can significantly reduce power by doing a metal spin, the original silicon must have been totally borked.
 
You'll have to be more specific :) The stuff that used to run on the core clock now runs in different domains. When we say core clock today are we referring to the ROPs or scheduler/texture units?
 