So who thinks their predictions were right about the nV40?

Chalnoth said:
I think the NV40's excellent performance really is good evidence that the NV3x core was severely broken. That is, it did not make remotely good use of the transistor budget. After all, when you can quadruple the number of pipelines, fix a number of performance issues, and barely reduce the theoretical maximum processing power of each pipeline without even doubling the number of transistors, something must have been seriously wrong with the design of the previous architecture.

Heh...;) I didn't have to see nV40 to see that aspect of nV3x...;) You could reach the same conclusions just by comparing R3x0 to nV3x. Still, it's a *big* chip with nearly 2x the number of transistors of NV38, and it's made on the same process as nV38, and at the same FAB, so I don't think great yields are guaranteed just because it's a better design than nV3x. This isn't to say yields will be a problem, just that time will tell.
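
To put rough numbers on the transistor-budget point (the counts below are just the approximate figures quoted around launch, not official die specs, so treat this as a back-of-envelope sketch):

```python
# Back-of-envelope on the transistor budget, using the approximate counts
# quoted around launch (not official die specs). Dividing the whole die by
# the pixel pipes ignores vertex units, memory controller, video logic and
# so on, so treat the per-pipe figure as purely illustrative.
nv38_transistors_m = 130   # ~130M transistors, 4 pixel pipelines
nv40_transistors_m = 222   # ~222M transistors, 16 pixel pipelines
nv38_pipes, nv40_pipes = 4, 16

print(f"Transistor budget grew ~{nv40_transistors_m / nv38_transistors_m:.1f}x")
print(f"Pixel pipeline count grew {nv40_pipes // nv38_pipes}x")
print(f"Transistors per pixel pipe: NV38 ~{nv38_transistors_m / nv38_pipes:.0f}M, "
      f"NV40 ~{nv40_transistors_m / nv40_pipes:.0f}M")
```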

I therefore stand by my previous postulate that the NV30 that nVidia originally meant to release was very different from the one that was released, that the NV30 we saw was one that was designed in a very short timeframe after process troubles prevented the release of the "original" NV30 design.

My own pet theory with respect to nV30 was that nVidia originally designed it as a ~350-400MHz, single-slot, normally aspirated card, but that the R300-based 9700P fouled up the works, and what was done in a hurry in response was the gigantic cooler and the attempt to factory overvolt and overclock it so that it might keep up. Especially considering nVidia's many comments at the time as to how "it wasn't the right time" for a 256-bit wide local bus, and so on, I really think that's what happened. I'm not sure what you think the "original" nV30 design might have been, but I'm confident it had nothing in common with nV40 (not that I think you're saying that, of course)...;)
 
WaltC said:
Heh...;) I didn't have to see nV40 to see that aspect of nV3x...;) You could reach the same conclusions just by comparing R3x0 to nV3x. Still, it's a *big* chip with nearly 2x the number of transistors of NV38, and it's made on the same process as nV38, and at the same FAB, so I don't think great yields are guaranteed just because it's a better design than nV3x. This isn't to say yields will be a problem, just that time will tell.
Well, when I made a similar post some weeks ago, many doubted the statement that the NV3x architecture that we saw was not the one nVidia originally designed. The disparity between the NV40 and NV30 architectures seems to be better evidence than was available previously for this statement, though.

Oh, and no, I don't think yields need be any better with the NV4x design. I think that's more closely tied to the process used than the architecture.

My own pet theory with respect to nV30 was that nVidia originally designed it as a ~350-400MHz, single-slot, normally aspirated card, but that the R300-based 9700P fouled up the works, and what was done in a hurry in response was the gigantic cooler and the attempt to factory overvolt and overclock it so that it might keep up.
I don't buy it. No amount of overclocking/overvolting/whatever would cause some of the more silly aspects of the architecture, such as the FP register performance limitation. No, I really think it had more to do with the rumor that the NV30 was originally designed to run on a low-k .13 micron process at TSMC, something that did not come to fruition. I claim that the change to a normal .13 micron process was significant enough that they couldn't get a proper design out the door, and had to settle for whatever could be made in the time available.
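
For anyone who hasn't followed the register issue: the usual explanation is that the chip keeps only a limited pool of fragments in flight per pipeline, and every live FP temporary eats into that pool, so heavy register use leaves less room for latency hiding. A crude illustrative model along those lines (the two-register budget and the proportional falloff are assumptions for illustration, not measured hardware figures):

```python
# Crude illustrative model of the NV3x FP register limitation.
# Assumption: a limited pool of fragments in flight per pipeline, with each
# live FP32 temporary eating into that pool, so heavy register use means
# less latency hiding and lower sustained throughput. The 2-register
# "full speed" budget and the proportional falloff are illustrative only.

FULL_SPEED_FP32_TEMPS = 2   # assumed per-fragment budget at full throughput

def relative_throughput(fp32_temps_used: int) -> float:
    """Very rough estimate of sustained shader throughput vs. live temps."""
    if fp32_temps_used <= FULL_SPEED_FP32_TEMPS:
        return 1.0
    # more live temps -> fewer fragments fit in flight -> less latency
    # hiding -> proportionally lower sustained throughput (simplified)
    return FULL_SPEED_FP32_TEMPS / fp32_temps_used

for temps in (1, 2, 3, 4, 6, 8):
    print(f"{temps} live FP32 temps -> ~{relative_throughput(temps):.0%} of peak")
```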

Now, I do agree that ATI did catch nVidia by surprise with the 256-bit bus, but I don't buy that the shader performance of the R3xx should have exceeded that of the NV30, had low-k worked (so the R3xx may have still won in most benchmarks...shader benchmarks, I claim would have been much closer).

And yes, no matter the original plan for the NV30's design, I'm certain it did not much resemble the NV40.
 
Chalnoth said:
I don't buy it. No amount of overclocking/overvolting/whatever would cause some of the more silly aspects of the architecture, such as the FP register performance limitation. No, I really think it had more to do with the rumor that the NV30 was originally designed to run on a low-k .13 micron process at TSMC, something that did not come to fruition. I claim that the change to a normal .13 micron process was significant enough that they couldn't get a proper design out the door, and had to settle for whatever could be made in the time available.
Makes no sense whatsoever. The NV3x shader performance is slow by design, not because of the process.
Now, I do agree that ATI did catch nVidia by surprise with the 256-bit bus, but I don't buy that the shader performance of the R3xx should have exceeded that of the NV30, had low-k worked (so the R3xx may have still won in most benchmarks...shader benchmarks, I claim would have been much closer).
It's the design, not the process, that dictates performance. Granted, low-k may have offered higher clockspeeds, but if the design is broken in the first place, it's still broken on low-k.
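
To make that concrete with deliberately made-up numbers (only the shipping core clocks are real; the per-clock throughput figures are pure assumptions for illustration):

```python
# Deliberately made-up illustration of "design vs. process":
# performance ~ per-clock shader throughput * clock, so a clock bump from
# a better process can't rescue a large per-clock deficit.
r300_clock_mhz = 325   # shipping 9700 Pro core clock
nv30_clock_mhz = 500   # shipping 5800 Ultra core clock
nv30_lowk_mhz  = 600   # hypothetical clock had low-k panned out

# assume, purely for the sake of argument, that NV30 sustains half the
# PS 2.0 ops per clock of R300
r300_per_clock = 1.0
nv30_per_clock = 0.5

def perf(per_clock, clock_mhz):
    return per_clock * clock_mhz   # arbitrary relative units

print(f"R300 @ {r300_clock_mhz}MHz: {perf(r300_per_clock, r300_clock_mhz):.0f}")
print(f"NV30 @ {nv30_clock_mhz}MHz: {perf(nv30_per_clock, nv30_clock_mhz):.0f}")
print(f"NV30 @ {nv30_lowk_mhz}MHz: {perf(nv30_per_clock, nv30_lowk_mhz):.0f}")
# Even the hypothetical low-k clock leaves the weaker design behind.
```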

NV3x was business as usual for NVIDIA. They stuck with the basic architecture (NV2x) that had worked so well for them and added more stuff to it. However, that was shown to be an error given how well the R300 performed.

-FUDie
 
g__day said:
Well, I think I was the first to realise, almost 8 months ago, that it could ship with GDDR3, so I feel great about that calculation.

Yup...I'm on record a while back doubting that GDDR-3 would be used by anyone (nVidia or ATI) this spring. I wasn't expecting it 'till fall.

Glad I'm wrong! :)
 
FUDie said:
Chalnoth said:
I don't buy it. No amount of overclocking/overvolting/whatever would cause some of the more silly aspects of the architecture, such as the FP register performance limitation. No, I really think it had more to do with the rumor that the NV30 was originally designed to run on a low-k .13 micron process at TSMC, something that did not come to fruition. I claim that the change to a normal .13 micron process was significant enough that they couldn't get a proper design out the door, and had to settle for whatever could be made in the time available.
Makes no sense whatsoever. The NV3x shader performance is slow by design, not because of the process.
Now, I do agree that ATI did catch nVidia by surprise with the 256-bit bus, but I don't buy that the shader performance of the R3xx should have exceeded that of the NV30, had low-k worked (so the R3xx may have still won in most benchmarks...shader benchmarks, I claim would have been much closer).
It's the design, not the process, that dictates performance. Granted, low-k may have offered higher clockspeeds, but if the design is broken in the first place, it's still broken on low-k.

NV3x was business as usual for NVIDIA. They stuck with the basic architecture (NV2x) that had worked so well for them and added more stuff to it. However, that was shown to be an error given how well the R300 performed.

-FUDie


I think he is suggesting that Nvidia had to radically redesign the core because the process wasn't ready for their original core design. Whether that's the case or not, I really do not know.
 
FUDie said:
Makes no sense whatsoever. The NV3x shader performance is slow by design, not because of the process.
That's sort of the point. It's not just slow, it's slow in ridiculous ways, and, for example, the NV35 doubled the floating point performance while hardly changing the transistor count at all. What I'm saying is that the design we saw was a rushed design: the process problems forced a rushed design to market.
 
Chalnoth said:
That's sort of the point. It's not just slow, it's slow in ridiculous ways, and, for example, the NV35 doubled the floating point performance while hardly changing the transistor count at all. What I'm saying is that the design we saw was a rushed design: the process problems forced a rushed design to market.

Or nVidia's design problems forced a rushed secondary design to market...
 
ChrisRay said:
I think he is suggesting that Nvidia had to radically redesign the core because the process wasn't ready for their original core design. Whether that's the case or not, I really do not know.
That's what I'm suggesting, yes. I think the fact that the NV40 is massively higher-performing than the NV3x is further evidence of this.
 
However, I find my biggest disappointment to be the performance in Far Cry, from both an fps and IQ perspective. I realize that there are issues with regard to ver. 1.1 of the Far Cry patch. I will retract my disappointment if Far Cry is run in pure PS2.0 with equal IQ to the 9800XT, along with a nice improvement in fps over that same 9800XT.

Did you see the Shadermark and 3dmark03 pixel and vertex shader results of the NV40 vs the 9800XT? The NV40 totally dominated. There were also some PS tests done showing that the NV40 was actually faster using PS 2.0 than using PS 1.1! The NV40 also blazed through games like Tomb Raider. Notice also that, at least at [H]OCP, Far Cry used NV3x settings for NV40. The NV40, based on all other indications, should have very high IQ and very high speed in a game like Far Cry. If anything, it should have an even more commanding lead over the current generation hardware. I expect good things to come when the Forceware 60 drivers become more mature and when the Far Cry game developers work with NVDA to run the game optimally with the NV40.
 
I agree with the others. Higher core clock speed would be nice, and more usable 8xAA would be interesting too. It looks like NVDA was very focused with the NV40 design, clearly defining what it can do very well and what it can't, instead of trying to go for a home run in all areas. In a sense this is good, because NVDA can maximize performance given a more focused set of criteria, i.e. up to 4xAA and 16xAF, at high resolutions.
 
I was expecting at least 30% better performance than what is currently shown.
I was also hoping for gamma corrected FSAA.

I got neither.

I'll just go for whatever has the best possible IQ in the end anyway.
 
K.I.L.E.R said:
I was expecting at least 30% better performance than what is currently shown.
I was also hoping for gamma corrected FSAA.

I got neither.

I'll just go for whatever has the best possible IQ in the end anyway.

I saw you give different reasons on other forums. :)

Anyway, with respect to performance and AA, I suggest 1) withhold judgement for a driver revision or two. 2) realize that in many of the benchies, the NV40 was either CPU or bandwidth limited.
 
The only other forum where I made a post about the NV40 was NGEmu, and I stated the same thing there.

I said: "WOW! I was expecting a lot more from nV. In IQ as well as performance."

Yeh, I hope things only get better from here.


DemoCoder said:
K.I.L.E.R said:
I was expecting at least 30% better performance than what is currently shown.
I was also hoping for gamma corrected FSAA.

I got neither.

I'll just go for whatever has the best possible IQ in the end anyway.

I saw you give different reasons on other forums. :)

Anyway, with respect to performance and AA, I suggest 1) withhold judgement for a driver revision or two. 2) realize that in many of the benchies, the NV40 was either CPU or bandwidth limited.
 
What reasoning did you use to come up with the "I expected it to be at least 30% better"? That makes no sense at all. Percentage performance differences vary depending on what game is tested and what resolutions and AA/AF settings are used, period. In some games and at some resolutions, the NV40 was 2-3 times faster than what was previously regarded as the fastest chip on the market, using raw drivers and most likely conservative clock speeds too! Things can only go up from here as game developers begin to spend time coding optimally for the NV40, and as the driver team gets a chance to smooth out the drivers and work hand in hand with the game developers.

You also need to realize that speed and IQ go somewhat hand in hand. The NV40 is fast enough to keep framerates extremely high using higher resolution and/or higher AA/AF than the current generation hardware. That in itself implies better image quality at a given frame rate.
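
To put rough numbers on that (a sketch that assumes the work scales linearly with the samples written and ignores colour/Z compression, shader cost and CPU limits):

```python
# Rough sense of how per-frame pixel work scales with resolution and AA.
# A sketch: assumes work scales linearly with samples written and ignores
# colour/Z compression, shader cost and CPU limits.

def samples_per_frame(width, height, aa_samples):
    return width * height * aa_samples

baseline = samples_per_frame(1024, 768, 1)   # 1024x768, no AA

for name, (w, h, aa) in {
    "1280x1024, 4xAA": (1280, 1024, 4),
    "1600x1200, 4xAA": (1600, 1200, 4),
}.items():
    ratio = samples_per_frame(w, h, aa) / baseline
    print(f"{name}: ~{ratio:.1f}x the samples of 1024x768 with no AA")
# Headroom in raw speed translates fairly directly into higher resolution
# and/or more AA at the same frame rate, i.e. better IQ per fps.
```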
 
Frankly, I had a hard time believing that they'd manage to reach 500MHz that easily, but 400MHz is way lower than I expected.

The original targets, both in terms of clockspeed and bandwidth, were actually as high as the older rumours suggested.

I know I'll probably get crucified for saying it, yet IMHO switching to IBM for NV4x probably wasn't such a wise decision after all. I think the picture will get even clearer once we get final info on clockspeed and power consumption from ATI's upcoming solutions.

1) withhold judgement for a driver revision or two. 2) realize that in many of the benchies, the NV40 was either CPU or bandwidth limited.

Neither is unusual with a new release. In most cases we see driver revisions increase performance on all boards, and the CPU or bandwidth limitations will apply to other solutions just as much, especially since there's an obvious ceiling on GDDR3 availability beyond 550-600MHz.
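
For reference, here's what that memory-clock ceiling works out to as peak bandwidth on a 256-bit bus (simple peak arithmetic; sustained bandwidth will always be lower):

```python
# What a 550-600MHz GDDR3 ceiling works out to on a 256-bit bus.
# Simple peak-bandwidth arithmetic; sustained bandwidth is always lower.

def peak_bandwidth_gb_s(mem_clock_mhz, bus_width_bits=256, transfers_per_clock=2):
    bytes_per_clock = bus_width_bits / 8 * transfers_per_clock   # DDR signalling
    return bytes_per_clock * mem_clock_mhz * 1e6 / 1e9

for mhz in (500, 550, 600):
    print(f"{mhz}MHz GDDR3 on a 256-bit bus: ~{peak_bandwidth_gb_s(mhz):.1f} GB/s peak")
```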
 
It's not like NVDA is going to sit still with a 400MHz core clock, especially when we can all see how efficient the underlying NV40 architecture is, and how much performance can potentially increase with core clock speed. Hopefully there are already process improvements under way that will allow NVDA to start bumping up that core clock speed.

At least the programmers will not need to worry about hacking their drivers in order to achieve higher framerates in a select few benchmarks :D Finally, it seems as if NVDA's resources will go into refining a very fundamentally sound product.
 
And? The same goes for the competition too. 400MHz is a low starting point from whichever perspective you look at it; the initial target of 600MHz was a full 50% higher.
 
Well, I guess we'll just have to wait and see exactly what ATI brings in response, and what NVDA brings as a counter-response, huh? :D

This is a cycle where each company is going to try to outdo the other, one step after another. Will be good for the consumer though ;)
 
That's what I said a couple of posts ago. Rumours so far indicate at least 500MHz for ATI; the 600MHz rumours I'm not willing to buy just yet.

Of course, before anyone else says it: clockspeed by itself is not indicative of efficiency. A very tight neck-and-neck race doesn't sound impossible though, IMHO.
 