NV40: Surprise, disappointment, or just what you expected?

Chalnoth said:
Bry said:
I have a personal question for you.
If (and I know it is a big if) ATI releases the X800XT and it smacks NV40 all over the place as far as shaders go, are you going to leave this forum in embarrassment
Haha! You take this way too seriously. I've been wrong in the past, and I'll be wrong in the future. I stay here because it's my hobby. I enjoy learning about new hardware, and I (usually) enjoy arguments that I'm involved in.

Or continue to say NV40 has so much more to offer than R420,
I don't think I've been saying that yet. I've been stating my expectations on performance (I still think the NV40 will outperform the equivalent R420, that is, 16 pipe vs. 16 pipe, etc., but it looks like the margin will be close), and, I hope, a realistic expectation on what to expect from PS 3.0 support on the NV40.

lol hey good answer..I fully expected to get slammed for saying that..lol I was just curious is all..And yes it does make for entertaining reading when these arguments happen ;)
 
I doubt they'll be able to increase shader efficiency by more than ~20%.

Are you assuming a single (a la R300) or dual pixel shader ALU (a la NV40) per pipe in the R420? Because if ATi goes for dual shader ALUs per pipe and has higher clock speeds then I'd say the performance advantage would definitely shift toward ATi. Nvidia would still retain bragging rights on the NV40's SM3.0 support though.

ATi's been able to get some pretty decent performance increases from the R300's compiler over time and I expect NV40 to be able to do the same, and as architectures get more complex this sort of thing is only going to get more apparent.
 
Heathen said:
Are you assuming a single (a la R300) or dual pixel shader ALU (a la NV40) per pipe in the R420? Because if ATi goes for dual shader ALUs per pipe and has higher clock speeds then I'd say the performance advantage would definitely shift toward ATi. Nvidia would still retain bragging rights on the NV40's SM3.0 support though.
1. Since they have fewer transistors than the NV40, even if they only use FP24, I don't think it's possible to have two full (without limits) shading units per pipeline in the R420.
2. I do expect two ALU's per pipe, but with limitations. According to what we've heard about the R3xx's pipelines, they already have two ALU's per pipe, with limitations. What I expect is greater efficiency in making use of both shader units.

Regardless, if the PS 2.0b profile is for the R420, which I find likely, then it is apparent that the R420 is not a big step up from the R3xx (in terms of pipeline configuration: it will be a big step in performance, due to the added pipelines). This would coincide with rumors from a while back that the original R400 was scrapped, and the R380 became what we now know as the R420. Given the small increase in features, I don't expect significant differences in processing power per pipeline compared to the R3xx. I do expect greater efficiency in making use of available processing.
 
1. Since they have fewer transistors than the NV40, even if they only use FP24, I don't think it's possible to have two full (without limits) shading units per pipeline in the R420.
2. I do expect two ALU's per pipe, but with limitations. According to what we've heard about the R3xx's pipelines, they already have two ALU's per pipe, with limitations. What I expect is greater efficiency in making use of both shader units.

So you've no real idea about the shader config then I take it? ;)

then it is apparent that the R420 is not a big step up from the R3xx (in terms of pipeline configuration:

Last I knew, the number of instructions it could handle had little or nothing to do with the number of ALUs in the pipe.

This would coincide with rumors from a while back that the original R400 was scrapped, and the R380 became what we now know as the R420.

If you're going to listen to one rumour you need to listen to them all, cherry picking may support your viewpoint but it's not very useful.
 
Chalnoth said:
Heathen said:
Are you assuming a single (a la R300) or dual pixel shader ALU (a la NV40) per pipe in the R420? Because if ATi goes for dual shader ALUs per pipe and has higher clock speeds then I'd say the performance advantage would definitely shift toward ATi. Nvidia would still retain bragging rights on the NV40's SM3.0 support though.
1. Since they have fewer transistors than the NV40, even if they only use FP24, I don't think it's possible to have two full (without limits) shading units per pipeline in the R420.

The NV40 technical information doesn't indicate "two full (without limits) shading units per pipeline" either, just the marketing material (that said the same thing for the NV35, remember?). Why, in your perception of possibility, is this required for the R420 in order for it to compare favorably?

In comparison, the R3xx, as confirmed repeatedly in past discussions and brought up again, already isn't just one effective ALU per pipe per clock.

Your line of reasoning doesn't seem to hold together.

2. I do expect two ALU's per pipe, but with limitations.

This describes both the R3xx and the NV40 with available information.

Regardless, if the PS 2.0b profile is for the R420, which I find likely, then it is apparent that the R420 is not a big step up from the R3xx (in terms of pipeline configuration: it will be a big step in performance, due to the added pipelines). This would coincide with rumors from a while back that the original R400 was scrapped, and the R380 became what we now know as the R420.

Well, my understanding was that the R390/R420 was initiated to leverage existing R3xx knowledge to deliver a modification of that design with reduced risk (and functionality) compared to the primarily new R400 pixel pipeline design, as a result of most of the R400 design work being re-directed to R500. Not that some existing 16 pipe design R380 was simply re-badged.

Given the small increase in features, I don't expect significant differences in processing power per pipeline compared to the R3xx. I do expect greater efficiency in making use of available processing.

The latter seems to be how the NV40 improved over the NV35, even with less potential "processing power" per pipeline on paper. The R3xx wouldn't require a whole new set of per clock "effective" ALU functionality for each pipeline for significant improvement.
 
I was just curious how anyone felt about the rumor of the original design of the R400 being scrapped because it was too powerful.

Do people still feel this way, even now after seeing the 6800U?

Appears to be all BS to me now :LOL: , though that is imho. :)
 
Malfunction said:
I was just curious how anyone felt about the rumor of the original design of the R400 being scrapped because it was too powerful.

Do people still feel this way, even now after seeing the 6800U?

Appears to be all BS to me now :LOL: , though that is imho. :)

My understanding is that the R400 wasn't scrapped. It was deemed too ambitious for currently available processes (hundreds of millions of transistors, sounds like more than 200 million to me, but just imo). So it was retargeted. Rumors have sort of persisted that you will be seeing some of R400 in R420 in the form of its vertex shaders. New rumors have also sprung up that you will see the rest of that design in the form of R500 sooner than previously expected.
 
Well, considering we won't see Dx10 for a while, it looks like what nvidia offered, i.e. full speed 32bit, ps3.0/vs3.0 in a 16 plus pipe config, is probably what the original R400 design was. Obviously ATI thought it was too hard but nvidia proved em wrong, just like how ATI proved nvidia wrong when they thought .15 wasn't good enough for a dx9 card. I love this whole flip flop of position, it's only gonna mean more performance for us consumers. I've been a little underwhelmed with the slow pace of dx9 improvement in the last 2 years, it feels like we just kept getting 10% speed bumps with a $399 price tag to go along. Not good for consumers since the value of their cards erodes quicker when there isn't really much diff in features and performance, just a model number.
 
Has the R420 been released yet, damn I must have been sleeping
 
demalion said:
The NV40 technical information doesn't indicate "two full (without limits) shading units per pipeline" either, just the marketing material (that said the same thing for the NV35, remember?). Why, in your perception of possibility, is this required for the R420 in order for it to compare favorably?
No, the NV40 doesn't have two full shading units. Apparently the structure is:
SU1: can execute a mul or special function and a 16-bit nrm.
SU2: can execute a mad, mul, or add.

Current shader benchmarks put this architecture at ~20% faster per pipeline per clock than the Radeon 9800 XT on average, when operating in full FP32. I really think that ATI would have to do something dramatically different to get more than ~20% faster per pipeline per clock with the R420. Since the R420 doesn't appear to be a dramatic departure from the R3xx, I doubt that it will have that much greater efficiency, particularly not at the rumored transistor counts.

So, if you read the above, you should notice that when running FP32, I expect the NV40 to roughly achieve parity with the R420 in shader ops. But the NV40 has the added advantage of additional FP16 functional units, and so I expect the NV40 to pull ahead when partial precision is used in appropriate places for special functions (i.e. rsq, nrm, which are commonly used in lighting).
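
A rough way to see how a per-pipeline-per-clock figure combines with pipeline count and clock speed is sketched below. This is only a back-of-the-envelope model: the function names, the pipeline count, and the clock values are illustrative assumptions and are not figures from this thread. Only the ~20% per-pipeline advantage is taken from the post above.

```python
def relative_shader_throughput(pipes, clock_mhz, per_pipe_efficiency):
    """Relative shader throughput = pipelines x clock x per-pipe-per-clock efficiency."""
    return pipes * clock_mhz * per_pipe_efficiency


def clock_needed_for_parity(target_throughput, pipes, per_pipe_efficiency):
    """Clock (MHz) a part needs to match a target throughput, given its pipes and efficiency."""
    return target_throughput / (pipes * per_pipe_efficiency)


# Hypothetical parts: same pipe count and clock, differing only in per-pipe efficiency.
baseline = relative_shader_throughput(pipes=16, clock_mhz=400, per_pipe_efficiency=1.0)
faster = relative_shader_throughput(pipes=16, clock_mhz=400, per_pipe_efficiency=1.2)

# How much faster the more efficient part is, and the clock the baseline would need to catch up.
ratio = faster / baseline
parity_clock = clock_needed_for_parity(faster, pipes=16, per_pipe_efficiency=1.0)

print(f"throughput ratio: {ratio:.2f}")
print(f"baseline clock needed for parity: {parity_clock:.0f} MHz")
```

Under these assumed numbers, a 20% per-pipe deficit can be offset by a 20% clock advantage, which is why clock speed matters as much as per-pipeline efficiency in this comparison.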
 
Since when does transistor count relate to performance, you used that argument with the Nv30 and lost..want to make that bet again.
ATI and Nvidia also report their transistor count differently, and rumours are almost always WRONG.
 
mozmo said:
I've been a little underwhelmed with the slow pace of dx9 improvement in the last 2 years, it feels like we just kept getting 10% speed bumps with a $399 price tag to go along. Not good for consumers since the value of their cards erodes quicker when there isn't really much diff in features and performance, just a model number.

Ya, I felt the same about the percentage of improvement vs. cost as well. If the original R400 design was too *powerful, I just don't understand why ATi wouldn't go for the throat and release it.

imho, it would have set nVidia back a longer while without a prayer and made a larger impact on nVidia financially. This is why I kinda feel the reason ATi has been so quiet about the R420 is because it is gonna compete, maybe even perform better... at the expense of lower precision, which, from what I thought, was the whole argument on why people should go with a 9800 vs. 5950.

If that is no longer the argument, what is now?
 
Chalnoth said:
For example, the NV40 architecture is maximally capable of ~6-7 instructions per clock per pipeline under ideal circumstances.

Actually, under ideal circumstances, according to nvidia's specifications, the NV40 is capable of a maximum of 4 instructions (2 instructions in ALU0 or a tex op and 2 instructions in ALU1) per pipe per cycle with those 4 instructions generating a maximum of 8 operations per pipe per cycle.


The R3xx's architecture appears to be maximally capable of closer to 4-5 instructions per pipeline per clock, with an average closer to 1 instruction per clock (since ATI has not released technical information, we cannot know exactly how many instructions per pipeline per clock the R300 is capable of).

From what has been shown by sir eric and others, ATI appears to be capable of a maximum of 4 instructions per cycle (1 Tex instruction, up to 2 instructions in ALU0 and 1 or possibly 2 instructions in the restricted ALU). Counting like Nvidia, ATI's 300 is capable of performing 9 operations per pipe per cycle to the best of our knowledge.


What I'm saying is that I expect the current average of ~1.2-1.3 instructions per clock on the NV40 to increase with better compilers that reorder instructions in more optimal ways for the architecture, not to mention proper use of FP16 for certain specific calculations.

While the configuration that Nvidia chose does appear to have some more flexibility than what the 300 is using, I wouldn't expect too much benefit from future compilers that wouldn't also apply to the ATI architecture.
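
To put the average and peak figures quoted above side by side, here is a minimal sketch that treats utilization as average instructions per clock divided by the peak. The helper name and constants are hypothetical; the peak of 4 instructions per pipe per cycle and the ~1.2-1.3 average are the numbers claimed in this exchange, not measured values.

```python
def utilization(avg_ipc, peak_ipc):
    """Fraction of the peak per-pipe issue rate that is achieved on average."""
    return avg_ipc / peak_ipc


# Figures as claimed in the thread (not independently verified).
NV40_PEAK_IPC = 4                 # max instructions per pipe per cycle, per the reply above
NV40_AVG_IPC_RANGE = (1.2, 1.3)   # current average quoted from Chalnoth

low, high = (utilization(avg, NV40_PEAK_IPC) for avg in NV40_AVG_IPC_RANGE)
print(f"NV40 utilization of peak issue rate: {low:.1%} to {high:.1%}")
```

On these numbers the average sits at roughly a third of the stated peak, so the disagreement is really about how much of that gap a compiler can close on either architecture.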

Aaron Spink
speaking for myself inc
 
Doomtrooper said:
Since when does transistor count relate to performance, you used that argument with the Nv30 and lost..want to make that bet again.
ATI and Nvidia also report their transistor count differently, and rumours are almost always WRONG.
The NV30 was broken. The NV40 is not.

With a similar transistor count per pipeline to ATI, nVidia has managed to provide a more efficient core than the R3xx. I expect ATI will also produce a more efficient core with the R420, but it just doesn't appear to have enough added transistors to have a huge increase in shader power.
 
aaronspink said:
Actually, under ideal circumstances, according to nvidia's specifications, the NV40 is capable of a maximum of 4 instructions (2 instructions in ALU0 or a tex op and 2 instructions in ALU1) per pipe per cycle with those 4 instructions generating a maximum of 8 operations per pipe per cycle.
One tex op, one FP16 normalize, two co-issued muls, two co-issued adds/mads. Then there's the possibility of an extra trivial instruction (mov, abs).
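
As a quick check on how that list relates to the "~6-7 instructions per clock" figure quoted earlier in the thread, the slots can simply be tallied. The dictionary keys are made-up labels for the items listed in the post, and the counts are taken at face value from it.

```python
# Per-pipe, per-clock issue slots as listed in the post above (labels are illustrative).
nv40_issue_slots = {
    "tex": 1,                   # one texture op
    "fp16_nrm": 1,              # one FP16 normalize
    "co_issued_mul": 2,         # two co-issued muls
    "co_issued_add_or_mad": 2,  # two co-issued adds/mads
}
optional_trivial = 1            # possible extra trivial instruction (mov, abs)

base = sum(nv40_issue_slots.values())
with_trivial = base + optional_trivial

print(f"ideal-case instructions per pipe per clock: {base} to {with_trivial}")
```

Counted this way the list gives 6 slots, or 7 with the trivial instruction, which is a different counting convention from the 4-instruction figure in the quoted reply.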

From what has been shown by sir eric and others, ATI appears to be capable of a maximum of 4 instructions per cycle (1 Tex instruction, up to 2 instructions in ALU0 and 1 or possibly 2 instructions in the restricted ALU). Counting like Nvidia, ATI's 300 is capable of performing 9 operations per pipe per cycle to the best of our knowledge.
That's probably about right.

While the configuration that Nvidia chose does appear to have some more flexibility than what the 300 is using, I wouldn't expect too much benefit from future compilers that wouldn't also apply to the ATI architecture.
ATI's had close to two years to optimize the R3xx core. More optimizations just aren't going to happen.
 
Chalnoth you are seriously nVidia biased, it gets old after awhile, especially when you pretend not to be.

The way I look at it, ATi has proven that they understand how to make an efficient core, and have outdistanced nVidia in this respect for 2+ years. I have no reason to expect this will change with the R420. Will this make it faster? Maybe not, but I am confident it will be smaller, cooler, have higher image quality, and offer some new technologies in the process.

This pattern keeps happening, one technology leapfrogs the previous over and over again. That is why they call it a generation. Why is everyone so surprised by this?
 
Surprises are coming for sure, Chalnoth has no idea what is coming, as usual a lot of trivial BS from him...but like I said before, two years ago a lot of the Nvidia faithful went into hiding...at least Chalnoth spent many hours on this forum defending a broken video card by his own words.
 
SiliconAbyss said:
The way I look at it, ATi has proven that they understand how to make an efficient core, and have outdistanced nVidia in this respect for 2+ years.
They have had a more efficient core for less than two years, and that's one architecture.

nVidia was ahead of ATI in nearly every way in 3D graphics for the previous four architectures (RIVA 128, RIVA TNT, GeForce 256, GeForce3), and ahead of the rest of the market for about 2.5 or so of those architectures.
 
Chalnoth said:
With a similar transistor count per pipeline to ATI, nVidia has managed to provide a more efficient core than the R3xx. I expect ATI will also produce a more efficient core with the R420, but it just doesn't appear to have enough added transistors to have a huge increase in shader power.

What part of "ATI doesn't count transistors like your beloved" don't you understand?
 