NV40: Surprise, disappointment, or just what you expected?

Khronus said:
I don't see the big deal with the NV40. There's no doubt it's fast, but with 2x the pipelines and another 12 GB/s of memory bandwidth over a 9800XT, I'm not surprised it smokes a 9800XT at high resolutions.
...

The big deal is achieving 2x the pipelines (of the R3xx), especially since it compares to the NV3x family even more impressively. That doesn't come from nowhere; it comes from engineering effort, and actually pulling it off is what's impressive.
 
karlotta said:
Buy them both, and see how they work. Trolling for validation isn't going to get you an answer to your question.
What's kinda funny here is the way the NV40 is being compared to the R3xx, and the R3xx is holding its own :eek:

LOL, you have a different interpretation of "holding its own" than I do, that's for sure.
 
DaveBaumann said:
Evildeus said:
Well, the article also states that they don't know if the disappointing result on branching is due to the drivers or not. I think we need other tests, other drivers, and perhaps another DX.

In the initial round of NDA briefings I'd asked Kirk how the branching was supported in the PS of NV40, and his answer was "Oh, we use the parallelism of the pipeline to figure out the answer". To me, this sounded very much like they have just exposed the conditional write masks from NV3x as a PS3.0 "dynamic branch" capability.

So no branch prediction/history?

It'll be nice for writing unified shaders, but the performance may not be there yet. Careful use will be the key: branch to early exit, don't branch too often, etc.

IMHO this is what ATI are hinting at with the R500: doing branches like a modern CPU.
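To illustrate what I mean (just my own sketch in Python as pseudocode, with made-up function names; nothing here is confirmed about NV40's internals): with conditional write masks, both sides of the "branch" get evaluated for every pixel and the mask merely selects which result is written, whereas a true dynamic branch with an early exit can actually skip the expensive path.

Code:
# Toy sketch: predication (conditional write masks) vs. a real dynamic branch.
# All names and costs below are made up purely for illustration.

def cheap_shade(p):
    return p * 0.5                              # stands in for a few ALU ops

def expensive_shade(p):
    return sum(p * i for i in range(64))        # stands in for a long shader path

def shade_with_write_mask(pixels, take_expensive):
    # Predication: BOTH paths run for every pixel; the mask only picks the result.
    # Exposing NV3x-style condition masks as a PS3.0 "dynamic branch" would look
    # roughly like this - no work actually gets skipped.
    out = []
    for p, mask in zip(pixels, take_expensive):
        a = cheap_shade(p)
        b = expensive_shade(p)                  # always paid, even when masked out
        out.append(b if mask else a)
    return out

def shade_with_real_branch(pixels, take_expensive):
    # A true dynamic branch with early exit: the expensive path only runs when
    # needed, which is where the potential SM3.0 win lies.
    out = []
    for p, mask in zip(pixels, take_expensive):
        out.append(expensive_shade(p) if mask else cheap_shade(p))
    return out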
 
Chalnoth said:
I definitely expect that at the very least, the NV4x parts for the mid-low range of the market will be the parts to buy this fall.

I would have to disagree.

From what we can tell, nVidia has basically achieved "parity" with the R3xx core in terms of performance and bandwidth utilization (pipe for pipe and clock for clock).

That is, a 4-pipe NV4x at, say, 300 MHz would roughly equal the performance of a 4-pipe RV3x at 300 MHz, given the same memory bandwidth.

However, it also appears (can't tell for sure yet) that it has taken nVidia more transistors to do this, and more power consumption. To be frank, having support for SM 3, as little as it may possibly mean in the high end, means almost zero in the low end, AFAIC.

In the lower end space, cost is obviously key.

So I see two advantages for ATI here:

1) ATI appears, with the 4-pipe R3xx core, to have a cost/performance advantage over an nVidia 4-pipe variant of the NV40 core.

2) ATI may have access to a more cost effective process to boot. (ATI will be using 0.11 for their coming low end RV370...I don't think nVidia has 0.11 in the cards for at least a few quarters yet.)
 
Joe DeFuria said:
Chalnoth said:
I definitely expect that at the very least, the NV4x parts for the mid-low range of the market will be the parts to buy this fall.
I would have to disagree.

From what we can tell, nVidia has basically achieved "parity" with the R3xx core in terms of performance and bandwidth utilization (pipe for pipe and clock for clock).
Well, I was mostly considering shader performance. Yes, performance in most games will be similar, but by the end of this year, we should have more shader-heavy games, and the NV4x parts should start to pull ahead.

As for transistor counts, a 4-pipe NV4x part shouldn't have more than about 70-80 million transistors, which would make it similar in size to the 4-pipe R3xx's, I believe.
 
I'm certainly impressed with its performance, especially the 32-bit FP shader speed and the arrival of SM 3.0. I want to see displacement mapping in my games, and with good performance.

Kudos to nV; it's an impressive part!
 
Joe Defuria said:
From what we can tell, nVidia has basically achieved "parity" with the R3xx core in terms of performance and bandwidth utilization (pipe for pipe and clock for clock).
In terms of shading performance, I'm not so sure of that, Joe. Two of our forum members took the time to stack the driver-unoptimized NV40 up against the 9800XT in terms of shader performance; they discovered the following in this thread:
eSa said:
According to the final result, a single 6800 pipe has 1.406 times the speed of a single R360 pipe. That 1.4 ratio is roughly present throughout all individual tests, so maybe it's at least a decent approximation.
and
Mintmaster said:
Anyway, the number I get is 1.13 (NV pipe / RV360 pipe). With the Xbit shaders, I get 1.01. That's pretty much what I expected, since NVidia won't be able to do 2 shader ops per clock very often given the restrictions. Also, NV40 has about 22% less bandwidth per pipe, and while this isn't a big deal for the long shaders, it could make a difference in some of them.
We still do not know the shader units' full abilities, including what types of instructions they can pull off simultaneously. 3DGPU stated the following in its synopsis of the NV40 shader pipeline:
NV40 is less "static" than you may think. If Shader Unit 1 performs a texture operation, this usually blocks the arithmetic part. But sometimes it can still execute at least a scalar arithmetic instruction during the same clock cycle.
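Just so it's clear how those per-pipe, per-clock numbers come about, this is the normalization I'm assuming eSa and Mintmaster used (the benchmark scores below are made up; 16 pipes @ 400 MHz for the 6800 Ultra vs 8 pipes @ 412 MHz for the 9800XT):

Code:
# Sketch of the per-pipe, per-clock normalization; the fps scores are invented.
NV40_PIPES, NV40_CLOCK_MHZ = 16, 400     # 6800 Ultra
R360_PIPES, R360_CLOCK_MHZ = 8, 412      # 9800XT

nv40_fps = 230.0                         # hypothetical shader benchmark score
r360_fps = 100.0                         # hypothetical shader benchmark score

nv40_per_pipe_per_mhz = nv40_fps / (NV40_PIPES * NV40_CLOCK_MHZ)
r360_per_pipe_per_mhz = r360_fps / (R360_PIPES * R360_CLOCK_MHZ)

# The ratio being quoted (~1.4 from the Hexus data, ~1.0-1.13 from Xbit/Tech Report)
ratio = nv40_per_pipe_per_mhz / r360_per_pipe_per_mhz
print("NV40 pipe vs R360 pipe, clock for clock: %.2fx" % ratio)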
 
Chalnoth said:
Well, I was mostly considering shader performance. Yes, performance in most games will be similar, but by the end of this year, we should have more shader-heavy games, and the NV4x parts should start to pull ahead.

Why?

What suggests that an NV4x has more shader power (clock for clock and pipe for pipe) than the R(V)3xx cores? They look remarkably similar to me. NV40 doesn't beat the R360 because it has more shader power per pipe per clock...it beats it because it has more pipes.

As for transistor counts, a 4-pipe NV4x part shouldn't have more than about 70-80 million transistors, which would make it similar in size to the 4-pipe R3xx's, I believe.

That would make transistor counts similar if that ends up being the case...but not die sizes, if RV370 is on 0.11 and the 4-pipe NV4x is on 0.13.
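Just to put rough numbers on that, assuming an ideal linear shrink (real layouts rarely scale this cleanly), the same transistor count on 0.11 takes roughly 72% of the area it needs on 0.13:

Code:
# Ideal-scaling estimate only: same design, 0.13u process vs 0.11u process.
area_ratio = (0.11 / 0.13) ** 2          # area scales with the square of feature size
print("0.11u die area relative to 0.13u: %.0f%%" % (area_ratio * 100))   # ~72%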
 
Luminescent said:
In terms of shading performance, I'm not so sure of that. Two of our forum members took the time to stack the driver-unoptimized NV40 up against the 9800XT.

Yes, I'm aware of those tests:

1) It's likely that there is not nearly as much room for "driver optimizations" with the NV40 core (in PS 2.0 apps) as there was with the "twitchy" NV3x core. (This is actually a good thing in my eyes, btw.)

2) Mintmaster cast some doubt on the accuracy of the Hexus.net scores that eSa pulled his numbers from:

Mintmaster said:
There's something funny about the Hexus numbers. Over at Tech Report, most of the numbers are lower with pp on. I eyeballed the values on Tech Report's chart, and they're all 18-22% higher on Hexus (did they overclock?). The R360 scores, however, match perfectly.

I did the same thing you did with Tech Report's scores, but used a geometric average instead (that one enormously large score will make a big difference in an arithmetic average). I can't say for sure Tech Report is right, but it's more in line with what we see here at Xbit.

So for the time being, I'm going with Mintmaster's (XBit's) performance numbers for the shaders, which put the NV40 between 1% and 13% faster per pipe, clock for clock.
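As an aside on the geometric vs. arithmetic average point, a toy example (numbers invented) of why one enormous outlier skews an arithmetic mean so much more:

Code:
from math import prod

# Invented per-shader speed ratios; one test is a huge outlier.
ratios = [1.0, 1.1, 0.9, 1.2, 6.0]

arithmetic_mean = sum(ratios) / len(ratios)          # dragged way up by the 6.0
geometric_mean = prod(ratios) ** (1 / len(ratios))   # much less sensitive to it

print("arithmetic: %.2f" % arithmetic_mean)   # ~2.04
print("geometric:  %.2f" % geometric_mean)    # ~1.48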
 
Joe DeFuria said:
Chalnoth said:
Well, I was mostly considering shader performance. Yes, performance in most games will be similar, but by the end of this year, we should have more shader-heavy games, and the NV4x parts should start to pull ahead.

Why?

What suggests that a NV4x has more shader power (clock for clock and pipe for pipe), than the R(V)3X cores? They look remarkably similar to me. NV40 doesn't beat the R360 because it has more shader power per pipe per clock...it beats it because it has more pipes.

Good point; after looking at the numbers, that is a very good observation, and it bodes well for R4xx shader performance.
 
Stryyder said:
Good point; after looking at the numbers, that is a very good observation, and it bodes well for R4xx shader performance.

To be clear, I'm not expecting R420 to outperform R3xx much on a "clock for clock / pipe for pipe" basis either. I'm expecting a very similar performance profile compared to NV40...so absolute performance would come down to clock rates.
 
Joe DeFuria said:
As for transistor counts, a 4-pipe NV4x part shouldn't have more than about 70-80 million transistors, which would make it similar in size to the 4-pipe R3xx's, I believe.
That would make transitor counts similar if that ends up being the case...but not die size if RV370 is on 0.11, and the 4 pipe NV4x is on 0.13.
And why would you think that ATI would be the only one to drop to .11 micron? The roadmaps I've seen put the rest of nVidia's NV4x line at .11 micron.

Personally, I expect a performance battle to occur in the high-end, but the mid-low range will go hands-down to nVidia.
 
Chalnoth said:
Personally, I expect a performance battle to occur in the high-end, but the mid-low range will go hands-down to nVidia.

Um, ok. The warp core from the NV40 has caused Chalnoth to lose his marbles. Back away slowly so you can start thinking again.

We won't know this until the parts come out.
 
Chalnoth said:
And why would you think that ATI would be the only one to drop to .11 micron?

This spring? Because ATI said in their last conference call that their spring lineup would include a 0.11u part.

The roadmaps I've seen put the rest of nVidia's NV4x line at .11 micron.

When?

Personally, I expect a performance battle to occur in the high-end, but the mid-low range will go hands-down to nVidia.

I think it will be a battle in both areas, and I believe ATI will have the edge in both as well. Not "hands down" to ATI...but certainly not hands-down to nVidia.

Again, explain to me how the NEW NV4x cores make better (cheaper and less power-hungry) parts, clock for clock and pipe for pipe, than ATI's already existing RV3xx core.
 
Joe DeFuria said:
Chalnoth said:
I definitely expect that at the very least, the NV4x parts for the mid-low range of the market will be the parts to buy this fall.

I would have to disagree.

From what we can tell, nVidia has basically achieved "parity" with the R3xx core in terms of performance and bandwidth utilization (pipe for pipe and clock for clock).

That is, a 4-pipe NV4x at, say, 300 MHz would roughly equal the performance of a 4-pipe RV3x at 300 MHz, given the same memory bandwidth.

However, it also appears (can't tell for sure yet) that it has taken nVidia more transistors to do this, and more power consumption. To be frank, having support for SM 3, as little as it may possibly mean in the high end, means almost zero in the low end, AFAIC.

In the lower end space, cost is obviously key.

So I see two advantages for ATI here:

1) ATI appears, with the 4-pipe R3xx core, to have a cost/performance advantage over an nVidia 4-pipe variant of the NV40 core.

2) ATI may have access to a more cost effective process to boot. (ATI will be using 0.11 for their coming low end RV370...I don't think nVidia has 0.11 in the cards for at least a few quarters yet.)

Lots of people ragged on the original GeForce MX as being too hot, having too many useless features for the low-end market, and being so underpowered compared to mainstream GeForces as to be useless. Let's just say the market proved them very, very wrong...
 
radar1200gs said:
Lots of people ragged on the original GeForce MX as being too hot, having too many useless features for the low-end market...

:oops:

Um, who said that?

...and being so underpowered compared to mainstream GeForces as to be useless. Let's just say the market proved them very, very wrong...

IIRC, the original GeForce MX was pretty much hailed as a great low-end variant of the GeForce2 right from the start?

The GeForce MX had no direct competition for quite some time. This is obviously not going to be the case this time around.
 
Reviewers got the MX right, as did consumers, but I'm talking more about people (anti-nVidia bashers) discussing the MX in various forums. (Note I didn't even mention the 5200, another very popular budget chip, and it did have (indirect) competition.)
 
Joe DeFuria said:
radar1200gs said:
Reviewers got the MX right, as did consumers, but I'm talking more about people (anti-nVidia bashers) discussing the MX in various forums...

Who cares about them? :?
You evidently, since your argument is that consumers will ignore a budget 6800 because of power consumption, heat, and useless features. I would suggest history does not back your assertions up.
 
radar1200gs said:
You evidently, since your argument is that consumers will ignore a budget 6800 because of power consumption, heat, and useless features. I would suggest history does not back your assertions up.

Um, where did I say consumers will ignore a budget 6800?!
 