NV40: Surprise, disappointment, or just what you expected?

nelg said:
Chalnoth said:
Since the R420 doesn't appear to be a dramatic departure from the R3xx, ....
Neither does the NV40 seem to be a big departure from the R3xx.
Right. It sure doesn't have things like:
FP16 blending
FP16 texture filtering
Static branching in the pixel shader
Dynamic branching in the pixel shader
Unlimited shader lengths
Unlimited dynamic texture reads
Video Processor
Some accelerated FP16 instructions
FP32 processing
Arbitrary Swizzling
Gradient Instructions

...need I go on?
 
Chalnoth said:
Heathen said:
Are you assuming a single (ala R30) or dual pixel shader ALU (ala NV40) per pipe in the R420? Because if ATi goes for dual shader ALUs per pipe and has higher clock speeds then I'd say the performance advantage would definitely shift toward ATi. Nvidia would still retain bragging rights on the NV40's SM3.0 support though.
1. Since they have fewer transistors than the NV40, even if they only use FP24, I don't think it's possible to have two full (without limits) shading units per pipeline in the R420.
2. I do expect two ALUs per pipe, but with limitations. According to what we've heard about the R3xx's pipelines, they already have two ALUs per pipe, with limitations. What I expect is greater efficiency in making use of both shader units.

Regardless, if the PS 2.0b profile is for the R420, which I find likely, then it is apparent that the R420 is not a big step up from the R3xx (in terms of pipeline configuration: it will be a big step in performance, due to the added pipelines). This would coincide with rumors from a while back that the original R400 was scrapped, and the R380 became what we now know as the R420. Given the small increase in features, I don't expect significant differences in processing power per pipeline compared with the R3xx. I do expect greater efficiency in making use of available processing.
Your information is flatly inaccurate. The R380 did not become the R420. And you are underestimating just what ATi has done and what their features will be. You are also off in your assumptions about what they can and can't do with their transistor budget.
 
Chalnoth said:
SiliconAbyss said:
The way I look at it, ATi has proven that they understand how to make an efficient core, and have outdistanced nVidia in this respect for 2+ years.
They have had a more efficient core for less than two years, and that's one architecture.

nVidia was ahead of ATI in nearly every way in 3D graphics for the previous four architectures (RIVA 128, RIVA TNT, GeForce 256, GeForce3), and ahead of the rest of the market for about 2.5 or so of those architectures.
That is not an accurate statement either. :rolleyes:
 
Hellbinder said:
Your information is flatly inaccurate. The R380 did not become the R420. And you are underestimating just what ATi has done and what their features will be. You are also off in your assumptions about what they can and can't do with their transistor budget.
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!
 
Chalnoth said:
Hellbinder said:
Your information is flatly inaccurate. The R380 did not become the R420. And you are underestimating just what ATi has done and what their features will be. You are also off in your assumptions about what they can and can't do with their transistor budget.
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!
You remember this Quote in a few more days....

That's all I am going to say.
 
Hellbinder said:
Chalnoth said:
Hellbinder said:
Your information is flatly inaccurate. The R380 did not become the R420. And you are underestimating just what ATi has done and what their features will be. You are also off in your assumptions about what they can and can't do with their transistor budget.
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!
You remember this Quote in a few more days....

That's all I am going to say.
ativsnvidia.gif
 
Chalnoth said:
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!

:LOL: I wouldn't quite call them bunnies...
 
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!

Well, they did it with the R3xx (vs. the NV3x).

And I don't want to hear you say that is only because the NV3x "was broken". It wasn't broken. There was nothing broken about it. It was a piece of shit. That doesn't mean it was broken.

The chip (NV30) was exactly what Nvidia wanted, it was exactly what they had developed. They spent hundreds of millions of dollars to build that chip just the way it turned out. Nothing was broken. It simply turned out that Nvidia misjudged both the market and their competition, hoping that a half-assed technology leap would be good enough. Well, it wasn't.
 
Chalnoth said:
demalion said:
The NV40 technical information doesn't indicate "two full (without limits) shading units per pipeline" either, just the marketing material (that said the same thing for the NV35, remember?). Why, in your perception of possibility, is this required for the R420 in order for it to compare favorably?
No, the NV40 doesn't have two full shading units. Apparently the structure is:
SU1: can execute a mul or special function and a 16-bit nrm.
SU2: can execute a mad, mul, or add.

Yeah, congrats, you read it, I guess? Now read about the R3xx. Hold both thoughts in your mind, and then let only the logical things out through the keyboard, if you are able.

Current shader benchmarks put this architecture at ~20% faster per pipeline per clock than the Radeon 9800 XT on average, when operating in full FP32.

What average are you talking about, and do the metrics that go into it isolate pixel shading from bandwidth and vertex processing advantages? Your using the word "average" when picking and choosing info doesn't make that figure any less imaginary.

So what's your excuse for ignoring that this advantage isn't universal and that even the R3xx should be able to reverse this situation in some cases? It seems you don't want to recognize this, because it just might indicate that minor changes could conceivably change the balancing point of the "average" case.

Look, the NV40 architecture is excellent, but your selective perception makes discussion of how this compares to any other IHV a useless exercise while you continue to display your bias blinders at every turn.
This is a shame and unfair to the NV40, because even though it doesn't make sense for the only possibility you'll accept (that it is impossible that "non-dramatic" changes to the R3xx pipeline, and therefore possible for the R420, might achieve something with more effective general case throughput than NV40), it does, AFAICS, indicate that the NV40 throughput picture is advantageous in comparison to the R3xx for quite a significant body of shaders. This is no small achievement of the NV40 engineers, nor was delivering an option for PS 3.0 features and achieving 16 pipelines at the same time. You polarize people into attacking the NV40 by making a mockery of comparing it to other architectures to fit things into your worldview. :-?

I really think that ATI would have to do something dramatically different to get more than ~20% faster per pipeline per clock with the R420.

Than the R3xx? No, they wouldn't, at least not in the sense of "dramatic" as you use it to propose they can't reasonably achieve it in R420. And you again confuse the idea of efficiency of throughput by treating "20%" as requiring some sort of 20% transistor increase, and ignoring how the NV40's greater efficiency than NV35, with more features, more than double the pipelines, and less than double the transistors, illustrates the flaw with that. :-?

Would only adding more effective dot product op throughput be "dramatically" different? It would double throughput in dot product sequences, and allow operation of a non dependent op to remove a cycle of cost from normalization. The first case would seem to deliver advantage over NV40, the second sometimes significantly reduce the advantage of an NV40 feature that is restricted to partial precision, and put it ahead in PS 2.0 full precision.

How about reducing the effective clock cycle cost of some key operations that might currently be more than one clock cycle? Are there any for the R3xx? Why wouldn't such a tweak be feasible? The engineers have already indicated pure FP32 would have been feasible if they deemed it necessary, and you still preclude that a design could be closely based on the R3xx yet be significantly different in characteristics.

Since the R420 doesn't appear to be a dramatic departure from the R3xx, I doubt that it will have that much greater efficiency, particularly not at the rumored transistor counts.

You make nearly every conversation involving ATI and nVidia a useless exercise. Until, of course, it comes time for nVidia to accomplish something similar, and then "realizing" your prior error. :oops: Congrats.

Let's see if we can shift time/alter the universe to suit you, and allow useful conversation here (if only I had known the NV40 was coming so it could have been this easy when I discussed the NV30's problems with you! :-?): so, when it comes time to refresh the NV40, is nVidia as precluded from "dramatically" changing the NV40 architecture to achieve a higher throughput than the NV40, even within an unclear in specifics but similarly limited transistor budget to the NV40?
Would it then enter your head to consider the significance of their methodology of counting transistors, to think that maybe a significant portion of their transistor budget might conceivably remain untouched by improvements that increase throughput for pixel processing?

Well?

So, if you read the above,...

Umm...remember, it was me that provided a link to my discussing the info about the NV40 you recognized "above"...you were proposing that two fully independent ALU ops per clock would be necessary to show advantage to the NV40 prior to that. You didn't provide any new info about why your R420 versus NV40 dictates make sense.

EDIT: Well, your info doesn't quite match the info I'm thinking of, actually. Where is it from?

...you should notice that I roughly expect that when running FP32, I expect the NV40 to approximately achieve parity with the R420 in shader ops.

Because, of course, what ATI can achieve is limited by what nVidia did, never mind any side issues?

But the NV40 has the added advantage of additional FP16 functional units, and so I expect the NV40 to pull ahead when partial precision is used in appropriate places for special functions (i.e. rsq, nrm, which are commonly used in lighting).

Ah, no, it is because what ATI can achieve is limited to having no advantages compared to what nVidia did, but the converse doesn't apply. I see.
 
Chalnoth is a serious NV fan, end of story.
If we know absolutely nothing else for certain.

Bunnies....
My god ya couldn't come up with anything better than that?
What are you gonna say if the R420 kicks the shit out of that thing?
They got lucky?
AGAIN!!
Folks keep talking like NV had the speed crown for 100 years, and ATi got lucky and had it for 1 1/2 or so.....
How come no NV fanboy ever discusses the fact that folks had to disable the filtering on their cards so they had DECENT IQ that wasn't all washed out?
Not a vcore/vref/vdd mod for better PERFORMANCE..
A Mod to make it look REASONABLE!!!
At 4 beans that sucker should have looked good from the get go!!
LAME ass excuses as usual from these zealots....

Get real man, NV is NOT the only player in town, and your fanboyitis is getting really F'N old.

You make HB look like a saint for crying out loud.

Edit: geez can't even use the word fan boy here............
 
Chalnoth said:
nelg said:
Chalnoth said:
Since the R420 doesn't appear to be a dramatic departure from the R3xx, ....
Neither does the NV40 seem to be a big departure from the R3xx.
Right. It sure doesn't have things like:
FP16 blending
FP16 texture filtering
Static branching in the pixel shader
Dynamic branching in the pixel shader
Unlimited shader lengths
Unlimited dynamic texture reads
Video Processor
Some accelerated FP16 instructions
FP32 processing
Arbitrary Swizzling
Gradient Instructions

...need I go on?
Sorry about that. I was trying to say that the R3xx looks like the foundation for both the NV40 and R420. Let's look at the similarities between the NV40 and the R3xx.

Not relying on cutting edge process tech.

Dave's preview said:
Jen-Hsun also made note that NV30 was far too reliant on process. It's widely believed that NV30 was late due to NVIDIA initially trying to adopt 130nm low-k before it was truly ready and emphasizing clock speed, neglecting large scale parallelism. He stated that NV4x has been designed not so much with pure speed in mind, but a wider, more parallel architecture – effectively doffing his hat to ATI.

Redundancy in design

however, with multiple quads on the die it may be the case that NVIDIA will choose to operate a redundancy scheme in order to minimize cost for dies that are not fully functional to NV40 specification. As ATI did with the R300 ASIC and the Radeon 9700 and 9500 products, should a defect occur in a pixel pipeline, which would probably be one of the largest elements of the die in these chips, then rather than wasting the entire ASIC the quad that the pipeline exists in can be disabled.

Multiple shader units in each pipe.

As the pipeline above shows, each vertex engine has both a full vector ALU unit with a scalar ALU in parallel, giving a total of 5 component operations per cycle, similar to R300.
 
I just don't think it's possible for the R420 to have dramatically higher shader throughput per pipe than the R3xx.

Here's another argument, looking at transistor counts:
Once again, the transistor count rumored for the R420 is 175 million, with a 16-pipe design.

The Radeon 9800 XT has 107 million transistors for an 8-pipe design.

How many extra transistors would it take to move from an 8-pipe R3xx design to a 16-pipe design? Well, let's look at how many transistors ATI saved when they made the Radeon 9600 XT.

The Radeon 9600 XT has 75 million transistors for a 4-pipe design.

So, if we do some simple math and take the transistor difference between the 9800 XT and the 9600 XT as indicative of the cost of adding pipelines, a theoretical 16-pipe R3xx part would have 107+(107-75)*2=171 million transistors.

Given that it's a new core, I expect some improvements, but with barely enough transistors to support R3xx-style pipelines, I just don't buy that ATI could have dramatically increased efficiency with the R420.
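
The extrapolation above can be sketched in a few lines of Python. This is only an illustration of the post's linear assumption (each extra pipeline costs the same number of transistors as the per-pipe difference between the 9600 XT and the 9800 XT). The function name and parameters are made up for the example, and the inputs are the figures quoted in the post, in millions.

```python
def extrapolate_transistors(big_count, big_pipes, small_count, small_pipes, target_pipes):
    """Linearly extrapolate a transistor count (in millions) to target_pipes."""
    # Marginal cost per pipeline, taken from the two known parts.
    per_pipe = (big_count - small_count) / (big_pipes - small_pipes)
    # Add that cost for every pipeline beyond the larger known part.
    return big_count + per_pipe * (target_pipes - big_pipes)


# Figures quoted in the post: 9800 XT = 107M / 8 pipes, 9600 XT = 75M / 4 pipes.
estimate = extrapolate_transistors(107, 8, 75, 4, 16)
print(estimate)  # 171.0
```

The result sits just below the rumored 175 million, which is the basis of the argument. The sketch says nothing about whether transistor cost really scales linearly with pipeline count.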
 
Chalnoth said:
And now that the GeForce 6800 has been announced, and everybody's seen its stellar performance, you're just claiming that ATI's got magical bunnies that are going to allow them to develop a core that's twice the speed, with less power draw!

Well, the rumours are that it doesn't have SM3.0 capabilities. I would probably agree with you if the rumours said that it was an SM3.0, FP32 part and only had 160-180 million transistors. But I definitely think it's possible for them to more than double the performance with that number of transistors if the featureset is largely the same.

(The rumours seem to indicate great things as far as FSAA goes, which of course is very interesting.)
 
nelg said:
Sorry about that. I was trying to say that the R3xx looks like the foundation for both the NV40 and R420. Let's look at the similarities between the NV40 and the R3xx.
That's just an impossibility. nVidia just doesn't have the design specifications for the R3xx. Make no mistake, the NV30 is the core the NV40 was based upon. The NV40 was designed with the knowledge of the mistakes made in the NV30's design. Those mistakes were corrected, and the successes were expanded upon.
 
If Chal's pipeline estimations are correct, then I think he has a valid point. 30 million for 4 additional pipes (9600 -> 9800) and consequently 60 million for 8 pipes, bringing the transistor count to 167 million (9800 -> R420). Whether or not you can extrapolate like this is another story, but it seems to have merit.
 
All I know is I will lmao if the skinny R420 with its little transistor count kicks the shit out of the new "King" (I never said that, zealot NV fans have, and they haven't even seen what the R420 can do..... fools as usual).
 
I just peeped into this thread now, and I don't think I've ever seen so much variety in Chalnoth's biased, illogical replies.

Chalnoth said:
Current shader benchmarks put this architecture at ~20% faster per pipeline per clock than the Radeon 9800 XT on average, when operating in full FP32.
Really? Not by my count. ShaderMark 2.0: 13% faster. Xbitlabs shaders: 1% faster.

Your transistor argument, well, that's just silly. Look no further than NV3x. It was broken, you say? Instead, look at R200 vs. RV250. The former has around 60 million transistors, the latter around 35-40 million. Same pixel shading ability, just lost half the texture units, and had one "optimized" VS unit instead of the two in R200. Sometimes you can really trim an architecture. ATI probably wanted R300 out ASAP, as it was their white knight, so saving transistors probably wasn't the number one goal.

Chalnoth said:
ATI's had close to two years to optimize the R3xx core. More optimizations just aren't going to happen.
First of all, the 9800XT came out one year after the 9700 did. Secondly, they were pretty much just optimizing the core speed, and had barely any other additions. Thirdly, NVidia could have similarly fixed NV30's pixel shaders in NV35 (I know they say they doubled the FP units but you rarely saw even close to that improvement in pixel shading) or even NV36, but they didn't. When these companies finish an architecture, they pretty much stick with it.

Chalnoth said:
nVidia was ahead of ATI in nearly every way in 3D graphics for the previous four architectures (RIVA 128, RIVA TNT, GeForce 256, GeForce3), and ahead of the rest of the market for about 2.5 or so of those architectures.
Not this again. Need I remind you of this thread? NVidia was ahead in terms of performance, and probably drivers too, but beyond those aspects (admittedly the most important from a marketing standpoint) they were by no means dominant, especially from a technological standpoint since the original Radeon came out.


Chalnoth, I love the NV40's architecture, and if I were in the market for a new VC it would be my number one choice unless the price difference was too large, but for you to undermine R420's performance before it's even here is just plain silly.

I think a lower bound on its performance is 4 times the 9500 pro. R420 (XT version) has double the pipes, double the bus width, and double the mem/core clocks (assuming 550/540). R420 may have 6 VS units rather than 8, but that won't be a huge difference. Shader efficiency improvement is likely, as is AA perf due to compression improvements.
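
A rough sketch of the 4x estimate, assuming peak fill rate scales with pipes times core clock and peak bandwidth with bus width times memory clock. The 9500 Pro figures (8 pipes, 128-bit bus, 275/270 MHz) are inferred here from the "double" claims in the post rather than stated in it, and the helper name is hypothetical.

```python
def peak_rates(pipes, core_mhz, bus_bits, mem_mhz):
    """Return (relative fill rate, relative memory bandwidth) in arbitrary units."""
    fill_rate = pipes * core_mhz      # pixel throughput scales with pipes * clock
    bandwidth = bus_bits * mem_mhz    # memory throughput scales with width * clock
    return fill_rate, bandwidth


# Assumed 9500 Pro baseline: half of every R420 XT figure given in the post.
base_fill, base_bw = peak_rates(pipes=8, core_mhz=275, bus_bits=128, mem_mhz=270)
# R420 XT as assumed in the post: 16 pipes, 256-bit bus, 550/540 MHz.
r420_fill, r420_bw = peak_rates(pipes=16, core_mhz=550, bus_bits=256, mem_mhz=540)

fill_ratio = r420_fill / base_fill
bw_ratio = r420_bw / base_bw
print(fill_ratio, bw_ratio)  # 4.0 4.0
```

Both ratios come out at 4, matching the estimate. These are theoretical peak ratios under the stated assumptions, not measured performance.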

EDIT: Made a BIG mistake. I meant to say RV250 in my first paragraph, not RV360. Fixed it now.
 
muzz said:
All I know is I will lmao if the skinny R420 with its little transistor count kicks the shit out of the new "King" (I never said that, zealot NV fans have, and they haven't even seen what the R420 can do..... fools as usual).

If the R420 is an FP24 chip with SM2.0 then you can't really make valid comparisons between transistor counts and performance. And since when is performance the only thing to measure a chip with? (especially these days)
 
Chalnoth said:
nelg said:
Sorry about that. I was trying to say that the R3xx looks like the foundation for both the NV40 and R420. Let's look at the similarities between the NV40 and the R3xx.
That's just an impossibility. nVidia just doesn't have the design specifications for the R3xx. Make no mistake, the NV30 is the core the NV40 was based upon. The NV40 was designed with the knowledge of the mistakes made in the NV30's design. Those mistakes were corrected, and the successes were expanded upon.

So the 125? million (not quite sure on that number) transistor 4 pipe core of the NV30 was the basis for the 222 million transistor 16 pipe core of the NV40, but there is no way that the 75 million transistor 4 pipe core of the RV360 could translate into a 175 million transistor 16 pipe R420?

Your logic defeats reason.
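
Putting the two cases side by side with the numbers from this post gives a rough comparison. The NV30 count is flagged as uncertain by the poster, and the helper below is purely illustrative.

```python
def growth_factor(old_millions, new_millions):
    """Ratio of the new transistor count to the old one."""
    return new_millions / old_millions


# Both steps go from a 4-pipe core to a 16-pipe core.
nv_growth = growth_factor(125, 222)   # NV30 -> NV40 (125M is the poster's guess)
ati_growth = growth_factor(75, 175)   # RV360 -> rumored R420

print(round(nv_growth, 2), round(ati_growth, 2))  # 1.78 2.33
```

On these figures the rumored R420 budget grows by a larger factor over its 4-pipe ancestor than the NV40 does over the NV30. The comparison ignores the feature differences between the parts (FP32, SM3.0) raised elsewhere in the thread.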
 