A few questions on NV30, NV35

First, I'm curious about something with the NV30. This has to do with the available die space, and the transistor count of certain features.

NV30 is rumored to have ~120 million transistors, and the 9700 has ~110 million (I don't remember reading a definite number there either, just "110+"). Let's say, for simplicity, that the difference is 10 million transistors.

My question is probably obvious: how many transistors would an additional 8 TMUs (one extra per pipeline) take (roughly)? Can that be bought for 10 million transistors or less?

Other obvious questions along that line: how expensive will the higher-precision pixel shaders (128 bit vs. 96 bit) and more flexible pixel/vertex shaders be in transistor count? Are we talking a few thousand transistors to implement the "advanced" or "beyond DX9" features, or a few million?

Also, what rumors are there on "videoshader"-like capabilities and integrated RAMDACs on the NV30? Excluding an integrated RAMDAC could buy some die space, as could excluding some of the videoshader-like logic. How much does one RAMDAC cost, approximately?

OK, enough NV30 questions... on to the NV35.

Why is it assumed that the NV35 will be a "refresh" to the NV30 anytime in the near future? I can't help but notice that NVIDIA's historical timing between NVx0 and NVx5 generations is anything but 6 months.

NV10 was the GeForce256. Next was the DDR version, and then we got the NV15 (GF2) a year or so later. The "refresh" was not the NV15 core, but simply the same NV10 core with DDR memory. Before the NV20 core we saw the NV16 (GF2 Ultra and Pro) cores. About a year later, we got the NV20.

How long did it take to go from NV20 to NV25? Roughly a year. Between the NV20 (GF3) and NV25 (GF4) we got the NV20-Ti core (GF3-Ti). This was the "six-month refresh," not the NV25.

In fact, it looks like we'll get the "refresh" of the intermediate generation (NVx5) well before the NV30 ever arrives (i.e., the NV28, which is analogous to the GF2 Ultra--NV16 core).

This is the way I see things: NVIDIA takes ~2 years between "DX generations" (NV10, NV20, NV30) and ~1 year between new core architectures (NV10 -> NV15 -> NV20 -> NV25 -> NV30), with a "refresh" every six months that consists of faster memory/core speeds.

So I ask again: why is it assumed that the NV35, which historically speaking would be roughly a year behind the NV30, will show up sometime in the first half of next year? At best, I would expect to see a faster core/memory version of the NV30 late next spring (delayed a bit because of the NV30's delay), and the NV35 late next fall.

Will NV suddenly jump a generation, during their most difficult generational step yet? I don't see how that assumption can be made.
 
My question is probably obvious: how many transistors would an additional 8 TMUs (one extra per pipeline) take (roughly)? Can that be bought for 10 million transistors or less?

Well, it's more complicated than that, because these are two different chips designed by two different companies. Even if it would take 30 million transistors to do "8 more texture pipes", that wouldn't mean that NV30 can't have it. NV30's design might have fewer transistors for memory cache or pixel pipelines compared to R-300.

That being said, there is a related point here that has irked me when reading many NV30 "previews". They all "expect" NV30 to have more features, usually on the grounds that it's manufactured on 0.13 vs. 0.15. That's bogus.

If one were to speculate on "features", one would look at transistor count, not process size. These two chips are similar in size (transistor-wise), and given the past history of these two companies, one would guess they'd have similar features. (Each chip probably has some features that the other doesn't support.)

What one would expect, given similar transistor counts, is that the more advanced process would be clocked higher and consume less power. The latter is basically a given. However, given the manufacturing problems of 0.13, and given that ATI went beyond the power limits of AGP, it is certainly questionable whether NV30 will be faster (clock speed) than R-300.

Why is it assumed that the NV35 will be a "refresh" to the NV30 anytime in the near future? I can't help but notice that NVIDIA's historical timing between NVx0 and NVx5 generations is anything but 6 months.

I'm with you. Assuming NV30 ships in December at the earliest, I don't see NV35 following on its heels in the typical "spring release" of April/May. I see two likely scenarios:

1) NV30 in December or even "spring", NV35 next "fall"

or

2) NV30 gets "canned" altogether, and NV35 arrives this "spring."

In any case, I expect to see a "GeForce4 refresh" (NV18/NV28?) from nVidia this September.
 
Joe DeFuria said:
Well, it's more complicated than that, because these are two different chips designed by two different companies. Even if it would take 30 million transistors to do "8 more texture pipes", that wouldn't mean that NV30 can't have it. NV30's design might have fewer transistors for memory cache or pixel pipelines compared to R-300.
Point well taken. I guess I'm just trying to get a handle on the magnitudes involved here. If it takes (gonna make up numbers here) ATi 8 million transistors per pixel pipeline, can NVIDIA realistically do that in 7 million, or 7.95 million (you get my point I hope)?

I guess my question is further compounded by the apparently more flexible pipelines of the NV30 (128-bit pixel shaders, more instructions, dynamic flow control, etc.). It would appear (to an outsider) that it would be difficult to implement an 8-pipeline architecture that is both more flexible and uses fewer transistors than ATi's implementation.

You see where I'm going with this? Sure... there will be different ways of doing certain things, and the transistor counts will vary between cards, but I can't help but notice how similar the 8500 and GF3 were in transistor count given their fairly similar capabilities.

The bottom line is that I'm really just interested in knowing approximately how many transistors 8 TMUs would cost. At the very least, it would give an idea of how many transistors NVIDIA saved in other areas by being more "efficient."

Also, I'm still curious as to any rumors regarding integrated RAMDACs and videoshader-like capabilities in the NV30, and how much die space those features would consume.
 
I don't think it's questionable at all that the NV30 should have a higher clock than the R300.

.13 vs. .15 will have approximately a 20% max clock boost, all else being equal.

I'm not saying it definitely will, just that everything suggests it probably will, and very little suggests that it won't.
 
.13 vs. .15 will have approximately a 20% max clock boost, all else being equal.

Well, that's my point. All else is NOT equal.

I mean, all else being equal, a 110-million-transistor chip on 0.15 would be considerably slower than a 60-million-transistor chip on 0.15, right? So how does one explain the R-300's clock rate in comparison to nVidia's and ATI's previous efforts?

Two things are immediately "not equal" when looking at R-300 vs. NV25/R-200:

1) Power consumption limit. R-300 consumes more power. Earlier chips conformed to AGP power limits, which in turn limit max clock rate. R-300 "ignores" the AGP limits and does not have this constraint.

2) "Tuning design by hand". This is how ATI explained the high clock rate.

Now, if you can tell me for a fact that NV30 ignores AGP power limits, and that its design was "similarly" hand-tuned, then I would agree that it's reasonable to guess the clock rate would be higher.

Furthermore, the 0.15 process is mature, while 0.13 is obviously going through "teething" issues. That also makes things "not equal." And don't forget the approximately 10% more transistors of NV30...

So again, I agree that "all else being equal" one would expect a higher clock rate for NV30. However, there is little to indicate that all else is in fact equal, while there is evidence to the contrary.
 
What about Trident's XP4... full DX8.1 on only 30 million transistors...

I think that shows quite well how very different designs can have vast differences in the number of transistors.
 
Look at how you do a multiplication by hand.

Code:
     XXXX
*    XXXX
---------
     YYYY
    YYYY
   YYYY
+ YYYY
---------
 ZZZZZZZZ

That's not necessarily how it would be done in hardware, but most methods show a similar behaviour in one sense: the number of Ys is proportional to the square of the length of the input values. This means that the size of that part of a multiplier will grow as the square of the length of the mantissa. 96 bit is 4x24 bit, which likely has a ~16-bit mantissa; 128 bit is 4x32 bit, which has a 24-bit mantissa. 24*24/(16*16) = 2.25

So one rather space-consuming part of the multiplier will increase in size by a factor larger than two. I can't say how large a part of the total ALU this is, but it is at least a significant increase. Other parts will also grow, but not as fast.
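
As a quick sanity check on that ratio, here's a back-of-the-envelope sketch (nothing NV30- or R300-specific; the mantissa widths are the assumptions stated above):

Code:
# A simple array multiplier for two n-bit mantissas generates n*n
# partial-product bits (the Ys above), so that part of the hardware
# grows with the square of the mantissa width.
def partial_product_bits(n):
    return n * n

fp24_mantissa = 16  # assumed ~16-bit mantissa of a 24-bit float
fp32_mantissa = 24  # 24-bit significand of a 32-bit IEEE-style float

print(partial_product_bits(fp32_mantissa) / partial_product_bits(fp24_mantissa))
# -> 2.25, the same 24*24/(16*16) figure as above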

However, I don't think NV30 is a full 128-bit design, and there are indications that I'm right. Nvidia talks about the possibility of switching between 64 bit and 128 bit for internal and external operations, to choose between speed and precision.

So NV30 is probably doing 64 bit at "full speed", and 128 bit is done by "recycling" the hardware. That reduces the hardware to less than what's needed for 96 bit, but you'll get a performance hit ( /2 ?) for 128 bit.
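
To make the "recycling" idea concrete, here's a hypothetical sketch of the throughput arithmetic (NOT a description of nVidia's actual design): a datapath that moves 64 bits per pass does a 4x16-bit vector in one pass, but needs two passes for a 4x32-bit vector, hence the /2 guess.

Code:
import math

# Hypothetical half-width datapath: 64 bits per pass.
def passes_needed(components, bits_per_component, datapath_bits=64):
    return math.ceil(components * bits_per_component / datapath_bits)

print(passes_needed(4, 16))  # 1 pass  -> "full speed" 64-bit mode
print(passes_needed(4, 32))  # 2 passes -> roughly half rate at 128 bit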


Joe is, however, right: it's two different chips done by different design teams at different companies. Nvidia won't take an R300 and add 10M transistors. How efficiently the first 110M transistors are used says more about how much space is left at the end than the extra 10M does. And it could swing either way.


NV35:
Is it really assumed that it's coming soon? I haven't heard much (if anything at all) about it. There are some rumours about an NV31, though.
 
Transistor count is no indication of relative speed; in fact, in many cases it indicates exactly the opposite of what you're using it to suggest. (Pipelining takes more transistors but allows a higher clock rate; parallel multipliers take more transistors but fewer gate delays.)

.13u uses about 30%-40% less power than .15u

Also, you seem to think that one company is doing something the other is incapable of.

Again, I'm not saying IT IS SO; I'm saying it's overreaching to say "it's questionable that NV30 will have a higher clock rate than the R300" given all the parameters we know.
 
Well, we're looking at opposite sides of the coin:

Transistor count is no indication of relative speed...

Nor is process size. We're both making overgeneralizations based on "all else being equal."

Also, you seem to think that one company is doing something the other is incapable of.

IMO, you're assuming that one company will do everything the other one has already done.

Again, I'm not saying IT IS SO; I'm saying it's overreaching to say "it's questionable that NV30 will have a higher clock rate than the R300" given all the parameters we know.

And I'm not saying "it's not possible that NV30 will have a faster clock." I'm saying it's overreaching to say "I don't think it's questionable at all that the NV30 should have a higher clock than the R300."

I'm saying it's not really a given, either way.

Whereas it would apparently surprise you if NV30 is clocked lower than R-300, it would not surprise me. (Nor would it surprise me if NV30 is clocked higher...how's that for straddling the fence?)
 
Basic said:
So NV30 is probably doing 64 bit at "full speed", and 128 bit is done by "recycling" the hardware. That reduces the hardware to less than what's needed for 96 bit, but you'll get a performance hit ( /2 ?) for 128 bit.

I can't speak for nVidia's design, obviously, but prior computational iron I have experience with has doubled precision at a factor-of-two penalty for the actual computation. My knowledge of how this is (was) implemented in silicon bears this out as reasonable today as well, but that knowledge is sadly dated. A couple of the data points, however, are not.

Entropy
 
RussSchultz said:
I don't think it's questionable at all that the NV30 should have a higher clock than the R300.

.13 vs. .15 will have approximately a 20% max clock boost, all else being equal.

I'm not saying it definitely will, just that everything suggests it probably will, and very little suggests that it won't.

But is all else equal? Jen-Hsun did say they were using 0.13um WITH copper interconnects for NV30. Does TSMC's standard 0.15um process use copper (as opposed to aluminum)?

The resistivities of these are:
copper = 1.7e-6 ohm-cm
aluminum = 2.75 e-6 ohm-cm

So copper has 38% lower resistance (i.e., 62% better conductivity when you invert the numbers) than aluminum, which will translate into a lower propagation delay along the interconnect lines, since they look like a distributed series-resistance/shunt-capacitance network.

Now, I wouldn't expect to be able to clock the chip 62% higher just because of going to copper, but I would expect to see a fair fraction of that, say a 20 to 30% increase in clock rate just due to copper interconnects.
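
For what it's worth, here is the arithmetic behind those numbers (a sketch, not a process model; it assumes wire geometry and capacitance stay equal, which a real process shrink would not):

Code:
rho_cu = 1.7e-6   # ohm-cm, copper resistivity
rho_al = 2.75e-6  # ohm-cm, aluminum resistivity

print(1 - rho_cu / rho_al)  # ~0.38 -> copper: ~38% lower resistance
print(rho_al / rho_cu - 1)  # ~0.62 -> ~62% better conductivity

# Delay along a distributed RC line scales roughly with R*C, so if
# interconnect RC were the only limit, the clock could rise by ~62%.
# Gate delay doesn't improve from the metal, hence the 20-30% guess.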
 
The big issue to worry about here with respect to clock speeds is yield. The .13-micron process may be inherently faster than .15 micron, but we've already been hearing some pretty negative comments about the current yields achieved on TSMC's .13 (most recently from Anand). Die yields follow a curve that increases with decreasing clock speed... so one of the simplest ways to get yields under control is to lower the clock speed.

The question is, would Nvidia choose to lower NV30 clock speeds to R300 levels to make the chips worthwhile to sell, or would they just swallow the poor yields and jack up the price? The latter seems more likely, and doesn't bode well for lower-cost versions of the NV30 (a la the GF4 Ti4200).
 
Well, we'll just have to disagree.

To me "it is questionable" connotes "there is severe doubt".

If that's what you meant, I think you've got no basis in fact, since we know nothing other than transistor count and process, and neither one backs up your contention.

Or, if it's not what you meant, I think you've chosen your wording poorly.

Anyway... what was BD asking? Oh yeah, what transistors equal what features.

I dunno, personally. 100 million transistors is a lot. I suspect a lot of that is cache. Here are some numbers I know that are easily findable:

(Multiply by 6 to get transistors)
USB1.0 cores are approximately 20k gates
USB2.0 cores are approximately 80k gates
8051 clone uC is about 10k gates
8086 clone uP is about 20k gates
80186 clone uP is about 100k gates (with USB, ethernet, other 186 peripherals)
Older DSP is about 50k gates
Newer DSP is about 100k gates
ARM7 is about 50k gates
ARM9 is about 100k gates
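
Applying the "multiply by 6" rule of thumb (roughly six transistors per NAND2-equivalent gate) to a few of the cores listed above shows how tiny they are next to a ~100-million-transistor GPU:

Code:
TRANSISTORS_PER_GATE = 6  # rough rule of thumb from above

cores_in_gates = {
    "USB1.0 core": 20_000,
    "USB2.0 core": 80_000,
    "8051 clone":  10_000,
    "ARM7":        50_000,
    "ARM9":       100_000,
}

for name, gates in cores_in_gates.items():
    t = gates * TRANSISTORS_PER_GATE
    print(f"{name}: {t:,} transistors ({t / 100e6:.2%} of a 100M chip)")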
 
To me "it is questionable" connotes "there is severe doubt". If that's what you meant...

That's not what I meant. To me, "questionable" means, uh, "questionable." ;) As in, not certain, with some doubt.

http://www.dictionary.com/search?q=questionable

Or, if it's not what you meant, I think you've chosen your wording poorly.

Or perhaps you have a preconceived notion of my motives, and therefore translated "questionable" as "severe doubt." 8)

Again, I brought this up because NO ONE who speaks of NV30 in "previews" or rumors seems to question AT ALL the possibility of NV30 being clocked lower... only "0.13 process = must be clocked higher." And I DO see the possibility of it being clocked the same or lower as very real.
 
Remember the GF2 launching with only 10% faster memory, quickly superseded by the GF2 Ultra?

Remember the GF3 being superseded by the GF3 Ti500?

NV30 comes out... has a lower-than-expected clockspeed (OMG, it's only 400MHz).

NVIDIA releases NV30 Turbo 3 months later... (that's better, it's 405MHz 8) )

Ad infinitum/nauseam (delete as appropriate).

NVIDIA are trying to do something special here... ATI have already done it, so NVIDIA will not rest unless they have the FASTEST-performing chip on the market, period. I say that because ever since the TNT2, nothing has beaten NVIDIA's performance, and to boot, they always pack their GPUs with tons of features no game uses for a couple of years. At least they get the ball rolling, e.g. T&L.
 
Joe DeFuria said:
To me "it is questionable" connotes "there is severe doubt". If that's what you meant...

That's not what I meant. To me, "questionable" means, uh, "questionable." ;) As in, not certain, with some doubt.

I guess it all depends on personal preferences. Some people would publicly question a statement only if they have significant doubts; some would do it with "some" doubts :)

Cheers,
Darkman
 
I got a question. Why is it that CPUs are going into the GHz range, while GPUs are still in the low hundreds of megahertz? Is it a heating problem?
 
alexsok said:
it is certainly questionable whether NV30 will be faster (clock speed) than R-300.

:LOL:

Don't quote things out of context; that's very annoying. He posted his reasoning (ATi going beyond the AGP spec, etc.) and you did nothing to convince anyone otherwise with your little "lol" face. Seriously, I feel his argument has merit, especially considering the NV30 is getting piss-poor yields. Try to be a little more objective, or you're just going to be eating crow when the NV30 comes out and it isn't "all that". :rolleyes:

RussSchultz said:
I don't think it's questionable at all that the NV30 should have a higher clock than the R300.

.13 vs. .15 will have approximately a 20% max clock boost, all else being equal.

I'm not saying it definitely will, just that everything suggests it probably will, and very little suggests that it won't.

The point is, all things are NOT equal, as ATi went beyond the AGP spec in terms of power consumption. That's a major difference, IMO. Granted, I'm no EE or anything, but I've seen firsthand the effects of increasing voltage to CPUs (in terms of overclockability). In many cases it can make a big difference (granted, you run into diminishing returns rapidly as the CPU reaches the edge of its overclockability). So I can easily see this being a factor in clockspeed-vs-clockspeed comparisons.

Also remember, everyone thought that "ATi could never make such a complex chip run at greater than 300 MHz." I believe in the past it was basically assumed that R300 would be 250 MHz or so, and NV30 would be around 350 MHz because it was on 0.13. No one was ever concerned back then that the R300 might be only half as fast as the NV30. So why is it, now that ATi has pulled a rabbit out of their hat, just assumed that the NV30 will be 400+ MHz?

I'm not saying the NV30 won't have a higher clockspeed, but it's not nearly as black and white as people are trying to make out.
 
sancheuz said:
I got a question. Why is it that CPUs are going into the GHz range, while GPUs are still in the low hundreds of megahertz? Is it a heating problem?
Search the forums for that; I think there was a good thread on it some time ago. The simple answer is that for GPUs it's more efficient to do a lot of work per clock cycle than to have a high clock rate.
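
A crude way to see it, with made-up figures of the era (none of these are measured numbers, just the shape of the argument):

Code:
# A "slow" GPU can out-produce a fast CPU by doing more work per clock.
cpu_throughput = 2_000e6 * 2    # 2 GHz CPU, ~2 ops per clock
gpu_throughput = 300e6 * 8 * 2  # 300 MHz GPU, 8 pipelines, ~2 ops each

print(gpu_throughput / cpu_throughput)  # ~1.2x despite a ~7x lower clock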
 