Some GeForce FX Benches


Beatles
04-Jan-2003, 10:26
The following preview is of an early GeForce FX sample that was hand-delivered to the Maximum PC Lab by an Alienware representative. Our full preview of Alienware’s new prototype machine and the GeForce FX can be found in the February issue of Maximum PC.



We first heard about the GeForce FX, then code-named NV30, in June 2002. We received a run-down of its feature set -- pixel and vertex shaders that exceed the DirectX 9 spec, 128-bit floating-point precision throughout the 3D pipeline, and support for DDR II memory -- but weren’t able to finagle access to working silicon. Until now.



Of course, the GeForce FX card that came inside our Alienware prototype system was just as “beta” as the rest of the system. With early drivers and freshly fabbed silicon, the card we tested isn’t quite what you’ll find in stores when the card ships in February or March. In fact, the board and its drivers were so unpolished, nVidia initially refused to let us benchmark it at all, and relented only when we agreed to limit our tests to pre-approved benchmarks running at stipulated resolutions and AA settings. We gave in to all these conditions because we were intent on reporting the first GeForce FX benchmark scores, however beta they may be. Driver refinement is an ongoing process -- before and after a videocard launch -- and frame rates will improve as nVidia optimizes more and more for specific engines.



It would be silly to extrapolate fine details about the card’s performance from such a small benchmark sample. It would also be unfair, considering the un-optimized condition of the drivers. But we can make some broad guesses about the strengths and weaknesses of nVidia’s new technology. In Quake III running at 1600x1200, 32-bit color and 2x anti-aliasing, the GeForce FX is about 40 percent faster than the ATI Radeon 9700 Pro at the same settings. The GeForce is almost 20 percent faster than the 9700 Pro in the Unreal Tournament 2003 Asbestos fly-by demo at these same settings. However, in the 3DMark 2001:SE Game 4 benchmark at these settings, the Radeon 9700 is about 10 percent faster than the GeForce FX.



What does this suggest? That the GeForce FX is very fast -- particularly when memory bandwidth isn’t an issue. Remember that the GeForce FX’s 128-bit memory bus runs at 500MHz, but has a maximum bandwidth of just 16GB/sec. Meanwhile, the Radeon 9700’s 256-bit memory interface accommodates 19.8GB/sec, even though it runs at just 325MHz.
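
As a rough check on those bandwidth figures, here is a minimal sketch of the usual peak-bandwidth arithmetic. It assumes DDR-style memory moves two transfers per clock and that the quoted clocks are physical memory clocks. The function name and the 310MHz comparison value are illustrative additions, not taken from the article.

```python
def peak_bandwidth_gb_s(bus_width_bits, memory_clock_mhz, transfers_per_clock=2):
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes a double-data-rate bus: two transfers per clock by default.
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * memory_clock_mhz * 1e6 * transfers_per_clock / 1e9


# Figures quoted in the article
geforce_fx = peak_bandwidth_gb_s(128, 500)
radeon_at_325 = peak_bandwidth_gb_s(256, 325)
# Hypothetical comparison clock, chosen only to see which clock yields ~19.8
radeon_at_310 = peak_bandwidth_gb_s(256, 310)

print(f"GeForce FX, 128-bit @ 500MHz: {geforce_fx:.1f} GB/s")
print(f"Radeon, 256-bit @ 325MHz:     {radeon_at_325:.1f} GB/s")
print(f"Radeon, 256-bit @ 310MHz:     {radeon_at_310:.1f} GB/s")
```

Under these assumptions the 128-bit bus at 500MHz gives the quoted 16GB/sec. A 256-bit bus works out to 20.8GB/sec at 325MHz and about 19.8GB/sec at 310MHz, so the article's 19.8GB/sec figure lines up with a memory clock slightly below 325MHz.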



The GeForce FX’s core graphics processor is much faster than the Radeon 9700’s, so it will be able to draw as many polygons and fill as many pixels as will fit across the memory pipeline. Our hunch is that turning on 4x anti-aliasing at 1600x1200 would diminish the GeForce’s performance lead over the Radeon, or maybe even nix it entirely. But that’s just a guess based on the scores we achieved, and the fact that nVidia wouldn’t let us run anything that would stress the memory pipeline.



We are much more surprised by the Game 4 scores. We expected to see the GeForce FX’s 500MHz core flex its programmable-shader muscle in this DirectX 8 benchmark. nVidia says that the FX’s programmable shaders are able to run more complex shader programs than those mandated by the DirectX 9 spec. Our guess is that the nVidia drivers just aren’t tuned for this particular benchmark yet.



The practical upshot is that if next year’s games -- specifically DooM III and its programmable-shader brethren -- require more raw GPU power than sheer memory bandwidth, the GeForce FX architecture will be a perfect fit. On the other hand, if next year’s games are starved for memory bandwidth, the Radeon 9700 could very well be a better choice for frame rate–hungry gamers. This is just the first round, though. We have no doubt that ATI has plans for a souped-up Radeon that will be ready to roll as soon as the GeForce FX ships. And if you really twisted our arms, we’d bet money that it will be running on a 0.13-micron core and using 256-bit DDR II memory.



Dare to Compare: GeForce FX Early Benchmarks

All tests at 1600x1200, 2xAA.

Benchmark            GeForce FX    Radeon 9700 Pro
Quake3 Demo001       209fps        147fps
UT 2003 Asbestos     140fps        119fps
3DMark Game4         41fps         45fps

Tests were run in the Alienware prototype system.
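
The percentages quoted in the article can be reproduced from the scores above. This is only a small sketch; the dictionary layout and function names are made up for illustration.

```python
# Scores from the article: (GeForce FX fps, Radeon 9700 Pro fps)
scores = {
    "Quake3 Demo001": (209, 147),
    "UT 2003 Asbestos": (140, 119),
    "3DMark Game4": (41, 45),
}


def lead_percent(winner_fps, loser_fps):
    """How much faster the winner is, as a percentage of the loser's score."""
    return (winner_fps / loser_fps - 1.0) * 100.0


def summarize(name, fx, radeon):
    """One line describing which card leads a benchmark and by how much."""
    if fx >= radeon:
        return f"{name}: GeForce FX leads by {lead_percent(fx, radeon):.1f}%"
    return f"{name}: Radeon 9700 Pro leads by {lead_percent(radeon, fx):.1f}%"


for name, (fx, radeon) in scores.items():
    print(summarize(name, fx, radeon))
# Quake3 Demo001: GeForce FX leads by 42.2%
# UT 2003 Asbestos: GeForce FX leads by 17.6%
# 3DMark Game4: Radeon 9700 Pro leads by 9.8%
```

These match the article's "about 40 percent", "almost 20 percent" and "about 10 percent".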

Bambers
04-Jan-2003, 10:38
"We are much more surprised by the Game 4 scores. We expected to see the GeForce FX’s 500MHz core flex its programmable-shader muscle in this DirectX 8 benchmark. nVidia says that the FX’s programmable shaders are able to run more complex shader programs than those mandated by the DirectX 9 spec. Our guess is that the nVidia drivers just aren’t tuned for this particular benchmark yet."

Game 4 is more of a single-texturing fillrate test than anything else, given modern cards' vertex and pixel shading abilities, so it's going to be memory-bandwidth limited on the FX.

Arun
04-Jan-2003, 12:02
First of all, is this a GeForce FX Regular or GeForce FX Ultra? I'm 99% sure it's an Ultra, but it's not mentioned anywhere, so a confirmation would be nice. Also, were the clocks 500/500? Because some cards nVidia sent to some people didn't have the standard clockrate according to "things I heard".

Interesting results. Did you try to ask nVidia to benchmark Aniso? I can't see why they'd refuse, since they claim their algorithm is better than ATI's one :p

On the plus side, nVidia should get a nice advantage if you activate 128BPP or 64BPP. Because that doesn't hurt memory as much as it hurts shading performance AFAIK.

As for the lower increase with UT 2003, I think I might have an interesting explanation.

I've recently read part of an nVidia patent ( the not-too-recent ones are public; it sounds a lot like some type of nVidia DX7 architecture incorporating hardware AA, something that never saw the light of day, since their next architecture was DX8 ) and I was surprised how much DrawIndexedPrimitive could kill memory bandwidth.

For example: When you use Lock(), you obviously write to video memory. But when you use Draw(Indexed)Primitive, you don't only read every single untransformed/unlighted vertex from memory. You also WRITE to memory every single transformed/lighted vertex.
That means higher vertex count simply kills memory bandwidth, too. And even more when more information is passed to the Pixel Shader

Thus, the NV30 *cannot* handle anything near 350M vertices. Probably not even 200M. The architecture is all based on *better* polygons/pixels. Not more. The memory bandwidth couldn't handle more anyway.
And UT2003 doesn't only use more fillrate/textures, it also uses a lot more vertices. So the GFFX advantage becomes a lot less significant.
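
To put rough numbers on that read-plus-write argument, here is a back-of-the-envelope sketch. It takes the claim at face value (every vertex read once from memory and written back once after transform) and assumes 32 bytes per vertex each way. Both vertex sizes are made-up illustrative values, and whether NV30 really writes transformed vertices back to memory is the poster's reading of a patent, not something this sketch confirms.

```python
MEMORY_BANDWIDTH_GB_S = 16.0  # GeForce FX peak figure quoted in the article


def vertex_traffic_gb_s(vertices_per_sec, bytes_read=32, bytes_written=32):
    """Memory traffic in GB/s if each vertex is read once and written once.

    The 32-byte defaults are hypothetical vertex sizes, not measured values.
    """
    return vertices_per_sec * (bytes_read + bytes_written) / 1e9


for rate in (200e6, 350e6):
    traffic = vertex_traffic_gb_s(rate)
    share = traffic / MEMORY_BANDWIDTH_GB_S * 100
    print(f"{rate / 1e6:.0f}M vertices/s -> {traffic:.1f} GB/s "
          f"({share:.0f}% of peak bandwidth)")
```

With these assumed sizes, 200M vertices/s would already consume 12.8GB/sec of the 16GB/sec peak, and 350M vertices/s would need 22.4GB/sec, which is more than the bus can deliver. Smaller vertices or on-chip caching would change the picture considerably.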

The GFFX advantage might become a lot more evident with Aniso. However, nVidia doesn't want Aniso benchmarks right now I suppose, probably because nobody would believe them if they said their 8X quality is similar to ATI's 16XQ... We need screenshots.

Yes, I know, I'm extrapolating. But the drivers can't change ANYTHING on how much memory bandwidth is taken for vertices. Absolute zero. Nada.

Now, where could the GFFX truly shine? Well, it can't do Doom 3 in fewer passes than the R300, because the R300 instruction count is already waayy sufficient for 1 pass per light.
However, Doom 3 is a lot more of a Pixel Shading monster. And that's where the NV30 rules.
For example, look at Quake 3. Without AA, its bottleneck *is* fillrate on modern GPUs. And the GFFX already got a 40% advantage there with 2X AA.
However, Doom 3 uses Stencil a lot. But preliminary benchies from nVidia still give us a 40% advantage, so I guess it's mostly a Pixel Shading monster in practice.

Now, to get the best results with the GFFX, Vertex Shading got to be very complex. So programmers shouldn't hesitate to use a super high level of bones for hardware skinning. Also, for things not using hardware skinning, there are other things which may be used to be sure memory isn't the bottleneck.
Improved Water would be gorgeous. Another wonderful thing would be grass that actually ( roughly, else the GPU suicides ) moves according to the wind. And some vertex lighting in cases where the Pixel Shader is already very complex for other reasons would be nice too.

Programming for the GFFX requires a *lot* more care than before. Using more advanced Vertex Shaders / Pixel Shaders is absolutely essential. Increasing vertex count & number of textures ain't gonna help at all here.
Instead, using Vertex Shading to calculate the position of water ( waves ) becomes possible. And since using something such as FP32 for the Pixel Shading effects on the water at the same time would be nearly free ( it's just balancing bottlenecks ) , you'd get amazing results.

There are many ways you can see this nearly obvious memory limitation in current games. Any pessimist would say the R300 would probably be superior in many cases ( or equal, if nVidia gets their drivers to be more efficient )
But an optimist would actually see this as an opportunity for better effects. Because now, developers can't simply increase texture/vertex count. They got to use the PS/VS. With the R300, many programmers would probably have been too lazy to use those features correctly, since memory bandwidth enables them to continue increasing quantity, and not quality. Now, it no longer does.

And you know what's the best thing for nVidia here? Since we're moving to better quality polygons, and not more polygons, we're not going to get more aliasing ( since it comes from higher polygon count )
That means the most horrible cases for the GFFX, such as 1600x1200 with 6X AA, are not going to be of any use. Of course, 1600x1200 4X could still be nice, but it won't be anywhere nearly as abysmal as 1600x1200 6X AA.
And anyway, since there's color compression, the cost of 4X AA should not be two times the memory bandwidth of 2X AA, because samples are becoming even more similar, both in Z and Color ( which means compression is more efficient ).


Yeah, this was long :)


Uttar

Grall
04-Jan-2003, 15:13
Two questions:

Why do you think NV30 is a "pixel shading monster"? It's got greater *capabilities* than R300 (and you already said that was irrelevant for this game), but it doesn't seem to be inherently faster from looking at what we know, apart from clock speed difference. If anything, it looks like NV30 issues fewer pixel shading ops per clock compared to R300, not more.

Second:

Why do you think devs will spend even five minutes specifically balancing and fine-tuning their software for NV30 at the expense of all others when it will be a minority player on the DX9 playing field when released? Just because Nvidia's Nvidia? Get real, hell they don't optimize their software even for *today's* rigs, let alone tomorrow's! (Just look at Morrowind for example, rofl!) Many games of today recommend 1.5GHz CPUs and sh!t like that, and they don't look substantially better than what we had a while ago. I say you're dreaming, pal! :)


*G*

Arun
04-Jan-2003, 15:31
R1: the NV30 is a Pixel Shading monster because:
8X500 = 4000
8X325 = 2600

And it got more instructions, enabling it to do some things in less instructions. So it's nearly twice the theoretical power.
Of course, the practical power could vary. But that Q3 Score was obviously fillrate limited. And it had a 40% advantage. The advantage would obviously have been even higher if there wasn't the 2X AA part and it was something like 2048x1536

So, current benchies prove it's a fillrate monster. Which means it's also a Pixel Shading monster.
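
For reference, the pipelines-times-clock arithmetic can be compared against the measured Quake3 lead. This sketch assumes one pixel per pipeline per clock on both chips, which is the simplification the 8X500 / 8X325 figures rely on; the function name is illustrative.

```python
def peak_fill_mpixels(pipelines, core_clock_mhz):
    """Peak fillrate in Mpixels/s, assuming one pixel per pipeline per clock."""
    return pipelines * core_clock_mhz


nv30_fill = peak_fill_mpixels(8, 500)  # 4000
r300_fill = peak_fill_mpixels(8, 325)  # 2600

paper_lead = (nv30_fill / r300_fill - 1) * 100  # lead on paper
measured_lead = (209 / 147 - 1) * 100           # Quake3 scores from the article

print(f"Paper fillrate lead:  {paper_lead:.1f}%")
print(f"Measured Quake3 lead: {measured_lead:.1f}%")
```

On paper the lead is about 54%, while the measured Quake3 lead at 1600x1200 with 2X AA is about 42%. The raw clock advantage alone therefore gives roughly 1.5 times the fillrate, not twice.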


R2: They won't optimize for the NV30. They'll optimize for the NV3x. And the NV3x will be a very big part of the market. There's the NV31, the NV34, the NV35, and probably refreshes. From now on, nVidia is ONLY focusing on NV3x. No more NV1x or NV2x.
And right now, nVidia still got about 45% of the market, ATI 25% and the other minor players about 25%
And JC/Epic are obviously going to put more in optimization than Bethesda :)


Uttar

K.I.L.E.R
04-Jan-2003, 15:51
"8X500 = 4000"

Nv30 doesn't have the bandwidth to reach its apex from what I heard. :)
Prove me wrong. :)

Arun
04-Jan-2003, 17:25
That's *precisely* what I explained in my huuuge post above, Kiler :p

With more complex shader programs, the GFFX is not going to be bandwidth limited. So, to reach its full potential, very complex VS *and* PS programs are required.
And no, Nature certainly isn't sufficiently complex for that. Doom 3 is a lot nearer, but I don't think it's quite sufficient yet :D


Uttar

Chalnoth
05-Jan-2003, 08:24
Check this (http://babelfish.altavista.com/babelfish/urltrurl?url=http%3A%2F%2Fwww.computer-trend.biz%2Ftests%2FxNews.php%3Fact%3Dshownews%26i d%3D6&lp=de_en&tt=url) preview out.

Notice the Unreal Tournament 2003 benchmark? It shows the GeForce FX at 1280x1024x32 with 4x FSAA and 8-degree aniso nearly doubling the performance of the Radeon 9700 Pro. This seems too good to be true. But if it is...wow...

Grall
05-Jan-2003, 10:29
"R1: the NV30 is a Pixel Shading monster because:
8X500 = 4000
8X325 = 2600 "

Uttar, you forget these chips don't issue one instruction per clock, per pipe. R300 does up to three per clock (a matrix fp op, a texture read and a pixel op I believe), NV30 seems to do one fp op and two integer ops per clock OR two pixel ops. I'm not sure if we have the exact specifics of the NV30 architecture pinned down yet, but based on this, there is no obvious advantage for Nvidia here. Naturally, integer ops aren't going to be used with FP framebuffers, so disregarding clock speed advantages, that leaves the chip underpowered compared to R300 unless 64-bit FP buffers are used (since its pipes can be split to handle two pixels per clock).
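
The ops-per-clock argument can be sketched as plain arithmetic. This assumes 8 pipelines on both chips and takes the "up to three per clock" R300 figure from this thread at face value. The NV30 ops-per-clock values are hypothetical cases, since the thread itself is unsure of them.

```python
def peak_shader_mops(pipelines, ops_per_clock, core_clock_mhz):
    """Peak shader throughput in Mops/s under an idealized issue rate."""
    return pipelines * ops_per_clock * core_clock_mhz


# R300: up to three ops per clock per pipe, as claimed in the thread
r300_peak = peak_shader_mops(8, 3, 325)

# NV30: unknown issue rate, so try a few hypothetical values
nv30_cases = {ops: peak_shader_mops(8, ops, 500) for ops in (1, 2, 3)}

# Ops per clock per pipe NV30 would need at 500MHz to match the R300 peak
break_even = r300_peak / (8 * 500)

print(f"R300 peak: {r300_peak} Mops/s")
for ops, mops in nv30_cases.items():
    print(f"NV30 at {ops} op(s)/clock: {mops} Mops/s")
print(f"NV30 break-even: {break_even:.2f} ops per clock per pipe")
```

Under these idealized numbers, NV30 would need just under two ops per clock per pipe to match the R300's peak, so the clock advantage alone does not settle the question either way.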

"And it got more instructions, enabling it to do some things in less instructions."

It's got more instruction SLOTS and a few more registers and such, sure. But that's not anything that will be used in any games since it would be too slow for realtime and you know it. As for actual instruction differences there seems to be no real advantage. NV30 features sin/cos stuff but that can easily be substituted on R300, and the looping stuff NV30 features that R300 lacks won't be used in the chip's lifetime anyway so who gives a f*ck except the fanboiiiis? :) We hardly got any DX8-optimized software at all yet, and the NV20 came out almost two years ago for crying out loud.

"So it's nearly twice the theoretical power."

I'd say that's doubtful, even from a theoretical POV.

"Of course, the practical power could vary. But that Q3 Score was obviously fillrate limited. And it had a 40% advantage. The advantage would obviously have been even higher if there wasn't the 2X AA part and it was something like 2048x1536"

Note that clock speed difference between the two chips isn't any more than 35%. I VERY MUCH DOUBT the advantage would increase at all.

"So, current benchies prove it's a fillrate monster. Which means it's also a Pixel Shading monster."

That's not a correct assumption, at least not based merely on the premises you yourself set up. Comparing a 1.5GHz P4 with a 1GHz Athlon for example is a clear-cut case on paper. In reality, the story's quite different. Same thing here, shader architecture is different between the two chips. Could well be the R300 out-pixel-shades NV30 under most cases (128-bit buffers) if clocked at 400MHz. It sure will beat it when vertex shading.

"R2: They won't optimize for the NV30. They'll optimize for the NV3x. And the NV3x will be a very big part of the market. There's the NV31, the NV34, the NV35, and probably refreshes."

Shouldn't you wait for the thing to actually get OUT ON THE MARKET before proclaiming what the softcos will do, huh? :)

"From now on, nVidia is ONLY focusing on NV3x. No more NV1x or NV2x."

LOL, yeah right... Dude, they're still selling MX chips that are DX7-level. I bet they won't stop just because they've released NV3x. We're going to see budget 2.8GHz P4 systems with MX graphics boards in them from the big OEMs before too long YOU BET YOUR ASS ON IT! :)

"And right now, nVidia still got about 45% of the market"

You forget, zero percent of that is the DX9-level market. ATI's already got 100% of that, and will essentially continue to have it for at least another two months or so until availability of NV30 products becomes consistent to the general public on a world-wide basis.


*G*

Arun
05-Jan-2003, 12:04
"Uttar, you forget these chips don't issue one instruction per clock, per pipe. R300 does up to three per clock (a matrix fp op, a texture read and a pixel op I believe), NV30 seems to do one fp op and two integer ops per clock OR two pixel ops. I'm not sure if we have the exact specifics of the NV30 architecture pinned down yet, but based on this, there is no obvious advantage for Nvidia here. Naturally, integer ops aren't going to be used with FP framebuffers, so disregarding clock speed advantages, that leaves the chip underpowered compared to R300 unless 64-bit FP buffers are used (since its pipes can be split to handle two pixels per clock)."

I'd guess that 3 instructions per clock thing is simply marketing babble and that nVidia does it too.

"It's got more instruction SLOTS and a few more registers and such, sure. But that's not anything that will be used in any games since it would be too slow for realtime and you know it. As for actual instruction differences there seems to be no real advantage. NV30 features sin/cos stuff but that can easily be substituted on R300, and the looping stuff NV30 features that R300 lacks won't be used in the chip's lifetime anyway so who gives a f*ck except the fanboiiiis? :) We hardly got any DX8-optimized software at all yet, and the NV20 came out almost two years ago for crying out loud."

You don't seem to have understood me...
I'm NOT talking about instruction slots. I'm talking about things such as sin/cos, as you said ( and there's more than 2 instructions the R300 doesn't have )
Of course it can be substituted on the R300! But it's SLOWER because it takes more instruction slots.
Obviously we ain't gonna see those instructions used too soon. But I'd bet in H2 2003 we'll already get them used a little, and more in H1 2004.
As for DX8 optimized software: Morrowind uses it a little, but very little. However, Doom 3 will use DX8 quite well.

"I'd say that's doubtful, even from a theoretical POV."

We'll see that soon in theoretical benchies :)

"Note that clock speed difference between the two chips isn't any more than 35%. I VERY MUCH DOUBT the advantage would increase at all."

The NV30 is clocked over 50% higher than the R300.

"That's not a correct assumption, at least not based merely on the premises you yourself set up. Comparing a 1.5GHz P4 with a 1GHz Athlon for example is a clear-cut case on paper. In reality, the story's quite different. Same thing here, shader architecture is different between the two chips. Could well be the R300 out-pixel-shades NV30 under most cases (128-bit buffers) if clocked at 400MHz. It sure will beat it when vertex shading."
We're comparing the R300 and NV30 here, not R350 and NV30.
Yes, it could be, but it's unlikely.

"Shouldn't you wait for the thing to actually get OUT ON THE MARKET before proclaiming what the softcos will do, huh? :)"

Well, if they optimize for a part of the market, they'll probably optimize for the NV3x part of it. Of course, if they don't optimize at all, I can't force them to :roll:

"LOL, yeah right... Dude, they're still selling MX chips that are DX7-level. I bet they won't stop just because they've released NV3x. We're going to see budget 2.8GHz P4 systems with MX graphics boards in them from the big OEMs before too long YOU BET YOUR ASS ON IT! :)"

I'm not saying they're going to discontinue it. And yes, we'll probably see that.
What I'm saying is that we won't see a NV19 or NV29 or whatever. Those lines are *finished*. They're going to continue selling the NV18/NV28, maybe NV17/NV25 a little too, but nothing more.

"You forget, zero percent of that is the DX9-level market. ATI's already got 100% of that, and will essentially continue to have it for at least another two months or so until availability of NV30 products becomes consistent to the general public on a world-wide basis."

Agreed. But I'm talking about software being released in about one year.


Uttar

Iceman
05-Jan-2003, 12:04
"Check this (http://babelfish.altavista.com/babelfish/urltrurl?url=http%3A%2F%2Fwww.computer-trend.biz%2Ftests%2FxNews.php%3Fact%3Dshownews%26i d%3D6&lp=de_en&tt=url) preview out.

Notice the Unreal Tournament 2003 benchmark? It shows the GeForce FX at 1280x1024x32 with 4x FSAA and 8-degree aniso nearly doubling the performance of the Radeon 9700 Pro. This seems too good to be true. But if it is...wow..."

This "Preview" is a fake. Just look at the 3D Mark graph.
And the card pictures are just the regular pictures seen everywhere else on the web.

Dave Baumann
05-Jan-2003, 12:49
"I'd guess that 3 instructions per clock thing is simply marketing babble and that nVidia does it too."

Well, it's not marketing babble since there are three functional units that can all operate per clock. Of course, the fragment program has to be long enough and require these three units much of the time for each instruction to make optimal use of it, but with the length of fragment programs at the moment that’s not really the case.

"I'm NOT talking about instruction slots. I'm talking about things such as sin/cos, as you said ( and there's more than 2 instructions the R300 doesn't have )
Of course it can be substituted on the R300! But it's SLOWER because it takes more instruction slots.
Obviously we ain't gonna see those instructions used too soon."

Question, do you know which of these instructions are actually DX macros and which ones aren’t? Even if they are macros on both, do you know how many cycles it takes to operate on both?

Arun
05-Jan-2003, 12:58
1: Well, it's not marketing babble since there are three functional units that can all operate per clock. Of course, the fragment program has to be long enough and require these three units much of the time for each instruction to make optimal use of it, but with the length of fragment programs at the moment that’s not really the case.
...
2: Question, do you know which of these instructions are actually DX macros and which ones aren’t? Even if they are macros on both, do you know how many cycles it takes to operate on both?

1: Actually, what I meant is that I think the GFFX is able to do that too.
I'm not 100% sure of what I'm going to say here, so please just consider it as a question.
In an ExtremeTech article about the GFFX, I remember reading the GFFX got 32 calculators, which are its "heart and soul". And that they're used on 8 pipelines. They can be dynamically allocated, but on average, it's 4/pipeline. Is that comparable to the R300 3/pipeline? Or am I just getting confused?

2: About which are macros. Well, I'm assuming sin/cos isn't a macro on the NV30, but is a macro on the R300. Or anyway, that's what all CineFX documents point to, and it's also what B3D Zephyr's R300vsNV30 article says. As for the others, I didn't check. I suppose most aren't macros.
Else you could really talk about some serious vaporware :!:

Uttar

Grall
05-Jan-2003, 14:11
"I'd guess that 3 instructions per clock thing is simply marketing babble and that nVidia does it too."

It's not babble, it really does up to three instructions per clock.

"You don't seem to have understood me...
I'm NOT talking about instruction slots."

I understood, I know you didn't talk about slots, but I did, just to cover those bases. :)

"I'm talking about things such as sin/cos, as you said ( and there's more than 2 instructions the R300 doesn't have )"

Probably true, but from what has been seen on this board, it's not anything that seems to be fundamentally to NV30's advantage. And remember, when softcos finally do start to target DX9-level graphics, they're going to aim squarely at the lowest common denominator. DX9 effects will initially be what DX8 effects are now, i.e. water in Morrowind and Neverwinter Nights and such. Nothing that will need any fancy-pants Nvidia-exclusive instruction set.

This advantage is for ALL intents completely irrelevant.

"Of course it can be substituted on the R300! But it's SLOWER because it takes more instruction slots."

Real question is, will it even be noticeably slower in the end, and will software even exist to take advantage of it before ATI launches a chip of its own with the same (or greater) capabilities?

"However, Doom 3 will use DX8 quite well."

...Which is ironic, since it uses OGL...

(Yeah, I'm just teasing ya! I know what you *really* meant to say! :))

"The NV30 is clocked over 50% higher than the R300."

I don't know what planet you live on, but on this one, 500/2 != >325...

You seem to think GFFX is either clocked at greater than 650 MHz, or R9700Pro at less than 250. Neither's actually true.

"We're comparing the R300 and NV30 here, not R350 and NV30.
Yes, it could be, but it's unlikely."

I wasn't comparing R350 and NV30. You can overclock the R300 to 400MHz with a bit of luck, or with the help of a voltage mod. Better cooling goes without saying of course. Vertex shader performance is clearly greater per clock for R300, pixel shader is a bit sketchier. With only the info I have available, I'd venture a guess and say it's greater too.

"Well, if they optimize for a part of the market, they'll probably optimize for the NV3x part of it."

Based on what *factual* evidence? Like I said, NV has *ZERO* percent DX9-level marketshare right now, and for the immediate future. Their value DX9 part seems to be feature-crippled from the rumors we've heard here (then again, so could ATI's for all we know), and it won't be out before the RV350 is scheduled to be released as well. Face it, NV has to fight an uphill battle this time. There's no reason to believe softcos will bend over extra for the minority player.

"But I'm talking about software being released in about one year."

There's currently no reason to believe NV will be as dominant in the DX9-level arena as they were during DX6-8. To simply assume they will seems no more than wishful thinking at this point in time. ATI will have 6 months of head-start on Nvidia and be ready for their second DX9-level iteration. You think all that doesn't count for anything and I say you're a fool.

*G*

Arun
05-Jan-2003, 15:05
I agree with much of what you said, but there are a few things I still don't agree with at all.

"I'd guess that 3 instructions per clock thing is simply marketing babble and that nVidia does it too."

"It's not babble, it really does up to three instructions per clock."

I didn't say the R300 didn't do up to three instructions per clock. I simply said I think the NV30 can do it too.

"Real question is, will it even be noticeably slower in the end, and will software even exist to take advantage of it before ATI launches a chip of its own with the same (or greater) capabilities?"

Very good point. Probably not. But I think the NV30 spec is actually better on the sincos front than the minimum 3.0 spec. So, if ATI implements standard 3.0 ( after all, they implemented standard 2.0 beside temp count ), the NV30 will still be superior.

"The NV30 is clocked over 50% higher than the R300."

"I don't know what planet you live on, but on this one, 500/2 != >325...

You seem to think GFFX is either clocked at greater than 650 MHz, or R9700Pro at less than 250. Neither's actually true."

And you seem to be on a planet where +50% = *2
What's that planet? Mars? Venus? Jupiter? Or maybe simply the moon?
Because this certainly ain't the case on Earth. On Earth, *2 = +100%
Please note that I said the NV30 is clocked over 50% higher than the R300. So... 325+((325/10)*5) = 325+162.5 = 487.5, and 500 is above that - *true*
So it's correct
If I said the R300 was clocked 50% lower than the NV30, then it would have been false.
500-((500/10)*5) = 500-250 = 250 - *false*
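
The asymmetry between "X% higher" and "X% lower" can be written out as a quick sketch. The function names are only illustrative; the clocks are the 500MHz and 325MHz core clocks discussed in the thread.

```python
def percent_higher(a, b):
    """How much higher a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def percent_lower(a, b):
    """How much lower a is than b, as a percentage of b."""
    return (1 - a / b) * 100


nv30_mhz, r300_mhz = 500, 325

print(f"NV30 is clocked {percent_higher(nv30_mhz, r300_mhz):.1f}% higher than R300")
print(f"R300 is clocked {percent_lower(r300_mhz, nv30_mhz):.1f}% lower than NV30")
```

The same 175MHz gap is about 54% when measured against the R300 clock and 35% when measured against the NV30 clock, which may be where the earlier "isn't any more than 35%" figure came from.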

"We're comparing the R300 and NV30 here, not R350 and NV30.
Yes, it could be, but it's unlikely."

"I wasn't comparing R350 and NV30. You can overclock the R300 to 400MHz with a bit of luck, or with the help of a voltage mod. Better cooling goes without saying of course. Vertex shader performance is clearly greater per clock for R300, pixel shader is a bit sketchier. With only the info I have available, I'd venture a guess and say it's greater too."

Of course, but Vertex Shading is not the NV30 bottleneck in *any* known real world situation. And I'd venture to say the NV30 per clock Pixel Shading power is higher than the R300's, or at least equal.

"Well, if they optimize for a part of the market, they'll probably optimize for the NV3x part of it."

"Based on what *factual* evidence? Like I said, NV has *ZERO* percent DX9-level marketshare right now, and for the immediate future. Their value DX9 part seems to be feature-crippled from the rumors we've heard here (then again, so could ATI's for all we know), and it won't be out before the RV350 is scheduled to be released as well. Face it, NV has to fight an uphill battle this time. There's no reason to believe softcos will bend over extra for the minority player."

Hmm, I wouldn't call nVidia a minority player in the DX9 arena, even if things didn't go as well as planned. For the low-end, they'll even get a NV34 in their next nForce IGP! Seeing the amazing market acceptance the nForce 2 has recently got and the hype around Hammer, there are few reasons for it not to sell well. And it's a big market, too. Too bad they don't have Intel motherboards... Aww :(
And even if the NV31 sounds feature-crippled, we don't know which parts will be. For all we know, it could simply be there's no color compression / adaptive AF.

"But I'm talking about software being released in about one year."

"There's currently no reason to believe NV will be as dominant in the DX9-level arena as they were during DX6-8. To simply assume they will seems no more than wishful thinking at this point in time. ATI will have 6 months of head-start on Nvidia and be ready for their second DX9-level iteration. You think all that doesn't count for anything and I say you're a fool.

*G*"

Agreed, there is no current reason to be certain of it. And yes, it does count for something. But according to some recent NV35 info, I've got no reason to think nVidia won't at least be on par with the R400. Of course, that is if everything is going as planned. But then again, that info could be fake. But then again, we could all be fake and not exist. Oh, wait, I'm getting OT :lol:

Anyway, maybe nVidia won't have the lead in the DX9 battles. But they certainly won't be a "minority player".
They couldn't survive that, anyway. They've got WAY too much R&D to be able to survive being a minority player.


Uttar

Chalnoth
05-Jan-2003, 16:51
"This "Preview" is a fake. Just look at the 3D Mark graph.
And the card pictures are just the regular pictures seen everywhere else on the web."

Yeah, you're right. I normally don't look at 3DMark scores. Guess it bit me in the butt this time.

Chalnoth
05-Jan-2003, 17:00
"And even if the NV31 sounds feature-crippled, we don't know which parts will be. For all we know, it could simply be there's no color compression / adaptive AF."
No color compression would seem silly. And all anisotropic filtering is adaptive (Why in the hell do so many people not get this?). The GeForce FX will automatically have less of a performance hit with AF just because it only has one texture unit per pixel pipeline.

Doomtrooper
05-Jan-2003, 18:29
"Check this (http://babelfish.altavista.com/babelfish/urltrurl?url=http%3A%2F%2Fwww.computer-trend.biz%2Ftests%2FxNews.php%3Fact%3Dshownews%26i d%3D6&lp=de_en&tt=url) preview out.

Notice the Unreal Tournament 2003 benchmark? It shows the GeForce FX at 1280x1024x32 with 4x FSAA and 8-degree aniso nearly doubling the performance of the Radeon 9700 Pro. This seems too good to be true. But if it is...wow..."

You posted this on the front page of Nvnews !!

Grall
05-Jan-2003, 19:28
"I didn't say the R300 didn't do up to three instructions per clock. I simply said I didn't think the NV30 couldn't do it too."

According to the info posted here, that doesn't seem to be the case. Now I agree that the info may be incomplete, but from what I know, R300 does seem to have the more powerful pixel shader implementation of the two (per clock that is).

"But I think the NV30 spec is actually better on the sincos front than the minimum 3.0. spec."

NV30 doesn't comply with PS3.0 AT ALL to my knowledge, not even meeting minimum specs.

"So, if ATI implements standard 3.0. ( after all, they implemented standard 2.0. beside temp count ) , the NV30 will still be superior."

Excuse me? How could you ever get to that conclusion? Besides, R300 is fully PS2.0 compliant on all accounts, hardware actually exceeding specs much like NV30.

"And you seem to be on a planet where +50% = *2"

I'm home with the flu and having a fever, what's your excuse? :lol: Sorry, my maths is a bit messed-up right now.

"Of course, but Vertex Shading is not the NV30 bottleneck in *any* known real world situation."

Merely stating a fact. No need to get defensive about it... :)

"And I'd venture to say the NV30 per clock Pixel Shading power is higher than the R300's, or at least equal."

Again, based on what facts? If you've read the threads here trying to investigate this, you'll find there's been no conclusive info posted that would suggest that to be the case.

"Hmm, I wouldn't call nVidia a minority player in the DX9 arena"

You wouldn't? Despite not having sold a single DX9-compliant chip yet? LOL, you're more desperate than I thought! :)

"For the low-end, they'll even get a NV34 in their next nForce IGP!"

NV34 is not confirmed as the next NForce, that's just internet lore at this point in time. Nor has Nvidia said they'd make the next NForce DX9-compliant. Besides, even assuming NV34 IS = NForce3, and NForce3 = DX9, what makes you think it will be available anytime soon, huh? :) NForce2 hasn't been available that long after all.

"Seeing the amazing market acceptance the nForce 2 has recently got and the hype around Hammer, there's few reasons for it not to sell well."

You seem to forget there's at least five intel systems for every AMD system sold (this is probably counting low), and probably three more non-NForces out of every four AMD systems...
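
To put rough numbers on that ratio, here is a minimal sketch. It uses only the two guesses from the paragraph above (five Intel systems per AMD system, one nForce board per four AMD systems) and assumes, for simplicity, that Intel and AMD are the only platforms sold. None of these are real market figures.

```python
from fractions import Fraction

# Both inputs are the poster's rough guesses, not market data.
intel_per_amd = 5                 # "at least five intel systems for every AMD system"
nforce_per_amd = Fraction(1, 4)   # three of every four AMD systems are non-nForce

# Assumption: only Intel and AMD systems exist in this toy market.
amd_share = Fraction(1, 1 + intel_per_amd)   # AMD share of all systems sold
nforce_share = amd_share * nforce_per_amd    # nForce share of all systems sold

print(f"AMD share of systems:    {float(amd_share):.1%}")
print(f"nForce share of systems: {float(nforce_share):.1%}")
```

Under those guesses, nForce boards would end up in roughly one system out of every 24.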

"And even if the NV31 sounds feature-crippled, we don't know which parts will be."

It's very likely to be the shaders. That's what the rumors said here anyway. Fewer pixel pipes and vertex processor array elements goes without saying. Maybe only 64-bit FP buffers (they could get away with only two split pipes on the chip and get the functional equivalent of four). Maybe no vertex shading at all in hardware, but hopefully full pixel shading. We can't be certain though until full specs are announced.

"For all we know, it could simply be there's no color compression / adaptive AF."

If you don't have adaptive AF, you have isotropic filtering = bilinear or trilinear... ;) Besides, I don't think this would save them many transistors in comparison.

"But according to some recent NV35 info, I've got no reason to think nVidia won't at least be on par with the R400."

Ummm... Sorry everybody for doing this, but: :roll: :roll: :roll:

"Anyway, maybe nVidia won't have the lead in the DX9 battles. But they certainly won't be a "minority player"."

They won't be in the minority, they'll still continue to sell gobs of DX7&8-level hardware thus keeping them in the lead spot in total, little to no doubt about it (discounting iNTEL and their integrated graphics sh!t of course). I'm talking about the DX9 market segment only here.

*G*

Arun
05-Jan-2003, 20:16
Per clock R300/NV30 Pixel Shading power: I'll see if I can get more info on this and I'll get back to this ASAP. If I'm able to find info on it, that is :D

Sincos: Here's a quote from the R300vsNV30 article:
"DX9 VS3.0 requires that SINCOS macro cannot take more than 2 slots, and NV30 exceeds its requirement here."
Of course, the NV30 is *below* 3.0. because it doesn't support some VERY important features such as texture lookup. But in some ways, such as this one, it's better. So, if the R400 is simply VS3.0., it won't be better in every single case. Only in maybe 65% of cases :P
The NV30 is on par with, and sometimes even above, PS3.0. - all it misses is static & dynamic branching, and very few instructions ( but then again, it also has some optional ones, so it balances out ). Nothing else. And branching is almost certainly the feature programmers are going to take the longest to use.
As for VS3.0., it's only Texture Lookup the NV30 doesn't have. And that's a feature we're nearly certainly not going to see in games for a looooong time :D - even if the NV30 supported it today. It seems complex to implement, from the little I know about it. In fact, the NV30 also supports *more* instructions than VS3.0. ! Might miss one or two, but not IIRC.

nVidia in the DX9 arena: Err, sorry for not being very clear. My point is that, no matter what happens, nVidia won't be a minority player in the DX9 arena in one year. Okay, right now it isn't even in it. But I'm talking about in one year.

NV35: We'll see that soon, now, won't we? :) Just 9 more months for NV35 benchmarks :roll:

NV31/NV34: No Vertex Shading units? We'll see one, don't worry. As for removing Color Compression... Well, it obviously takes a lot of transistors. At least one million per pipe I'd guess. Sure, it's not much, but it's too much for a mainstream part. As for mid-end, it might be acceptable. Or maybe some less efficient Color Compression?

As for adaptive AF: nVidia clearly stated it's a different algorithm. Yes, the GF4 is adaptive too. But it's not as efficient as the GFFX algorithm, if we are to trust nVidia marketing. Yes, only having 1TMU also probably helps. But there's more than that, it seems. So, two good reasons for better AF performance. Better than one! :D


Uttar

OpenGL guy
05-Jan-2003, 21:59
NV31/NV34: No Vertex Shading units? We'll see one, don't worry.
What's your basis for this statement?
As for removing Color Compression... Well, it obviously takes a lot of transistors. At least one million per pipe I'd guess. Sure, it's not much, but it's too much for a mainstream part. As for mid-end, it might be acceptable. Or maybe some less efficient Color Compression?
How do you conclude that color compression takes up 1 million transistors per pipe?
As for adaptive AF: nVidia clearly stated it's a different algorithm. Yes, the GF4 is adaptive too. But it's not as efficient as the GFFX algorithm, if we are to trust nVidia marketing. Yes, only having 1TMU also probably helps. But there's more than that, it seems. So, two good reasons for better AF performance. Better than one! :D
How does having fewer TMUs help AF?

Jamm0r
05-Jan-2003, 22:59
Sorry to have to say this, but some of the posters in this thread don't know their $ss from a hole in the wall. And I'm NOT referring to OpenGL Guy ;)

Hellbinder
06-Jan-2003, 07:15
Radeon 9500 and 9500 Pro both have color compression. And they are mainstream/midrange parts.

Nvidia also recently *clarified*, in really small print, that their Color Compression only works during FSAA, after about 10 review sites falsely claimed that NV30 used color compression all the time.

Which is exactly the same as the R300 technology.

Fox5
06-Jan-2003, 23:48
Has nvidia stated they will only be making dx9 parts? I mean, after the geforce 3, they still made the geforce 4 mx, and I think also after the geforce 3, they had a geforce 2 ti 100 or something.

BTW, what are the benefits of dx9? So far, it seems to me that it is just smarter dx8, in which case, it will probably be about 3 years before any game actually takes advantage of dx9. Heck, most games I see now still look like dx6 games.

Arun
07-Jan-2003, 16:48
As for removing Color Compression... Well, it obviously takes a lot of transistors. At least one million per pipe I'd guess. Sure, it's not much, but it's too much for a mainstream part. As for mid-end, it might be acceptable. Or maybe some less efficient Color Compression?
How do you conclude that color compression takes up 1 million transistors per pipe?
As for adaptive AF: nVidia clearly stated it's a different algorithm. Yes, the GF4 is adaptive too. But it's not as efficient as the GFFX algorithm, if we are to trust nVidia marketing. Yes, only having 1TMU also probably helps. But there's more than that, it seems. So, two good reasons for better AF performance. Better than one! :D
How does having fewer TMUs help AF?

Color Compression: Hmm, really, Hellbinder? Can you give me the link for that please?
Anyway, I guess it certainly doesn't take that many transistors if it's only on during AA...
But since you suppose I invented that 1M/pipe number,
I was basing it on nearly 5% of the core being used for Z Compression, according to some old GF4 document. Since the GF4 is 60M, that means nearly 3M transistors. And there are four pipes. So, let's say 0.60M/pipe ( not sure it's really per pipe, but it seems logical to have more if there are more pipes, since those allow higher resolutions )
I was also supposing that Color Compression was more complex, since colors often differ more. But with that "clarification", it obviously doesn't make sense for it to take anywhere near those amounts.
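
As a quick check of the arithmetic in that estimate, here is a small sketch that uses only the figures quoted in the post. The 5% share and the even per-pipe split are the poster's assumptions, not confirmed numbers.

```python
# Figures as quoted in the post; none of them are confirmed specs.
core_transistors = 60_000_000   # GF4 transistor count ("the GF4 is 60M")
z_share_percent = 5             # "nearly 5% of the core" for Z compression
pipes = 4                       # pixel pipelines

# Total transistors attributed to Z compression, then split evenly per pipe.
z_total = core_transistors * z_share_percent // 100
per_pipe = z_total / pipes

# Share of the core that a 0.60M/pipe figure would imply instead.
implied_percent = 600_000 * pipes * 100 / core_transistors

print(f"Z compression total: {z_total / 1e6:.2f}M")
print(f"Per pipe:            {per_pipe / 1e6:.2f}M")
print(f"0.60M/pipe implies:  {implied_percent:.0f}% of the core")
```

With these inputs the per-pipe figure comes out at 0.75M rather than 0.60M. The 0.60M value would correspond to about 4% of the core.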

Adaptive AF: 1 TMU helps *compared* to the GF4, because IIRC, there's a bug with the GF4 which nearly nullifies the effect of two TMUs. So, it helps in performance % drop...


Uttar

OpenGL guy
07-Jan-2003, 18:53
But since you suppose I invented that 1M/pipe number,
You did invent the number, as you show below.
I was basing it on nearly 5% of the core being used for Z Compression, according to some old GF4 document. Since the GF4 is 60M, that means nearly 3M transistors. And there are four pipes. So, let's say 0.60M/pipe ( not sure it's really per pipe, but it seems logical to have more if there are more pipes, since those allow higher resolutions )
I was also supposing that Color Compression was more complex, since colors often differ more. But with that "clarification", it obviously doesn't make sense for it to take anywhere near those amounts.
Maybe it's more, how do you know? What I am trying to say is you have no facts, so why speculate?
Adaptive AF: 1 TMU helps *compared* to the GF4, because IIRC, there's a bug with the GF4 which nearly nullifies the effect of two TMUs. So, it helps in performance % drop...
That's still a weird way of putting it. The point isn't that the GeForce FX has 1 TMU, it's that the GeForce 4's AF had problems.

Ante P
07-Jan-2003, 19:36
Radeon 9500 and 9500 Pro both have color compression. And they are mainstream/midrange parts.

Nvidia also recently *clarified*, in really small print, that their Color Compression only works during FSAA, after about 10 review sites falsely claimed that NV30 used color compression all the time.

Which is exactly the same as the R300 technology.

I'm not saying that I don't trust you but where did you read this about the color compression?
I've asked you one time earlier but you never replied.

Dave Baumann
07-Jan-2003, 20:09
I'm not saying that I don't trust you but where did you read this about the color compression?

Try B3D :!:

Alongside Z buffer compression, R300 also features colour compression when FSAA is enabled. Because of the way multisampling operates, many of the subsamples contain the same colour data; it's only at polygon intersections that there is ever more than one colour value across the subsamples. R300 is able to compress the colour samples and achieve a very high compression ratio with multisampling.

Given the amount of PR NVIDIA have had over their color compression, ATI have made a bit of a mistake here. Their preview press documentation hardly made any mention of it at all, and it ended up almost a footnote - to this end most sites that did previews barely mentioned it, if at all. This was a big mistake by ATI because very few people realise it has color compression and fewer people realise the benefits it can have on AA performance.

We didn't do a preview so I had a little extra time and actually validated what was written there with ATI.
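
The idea in the quoted passage can be illustrated with a toy sketch. This is not how R300 or NV30 hardware implements it (that isn't described here); it only shows why multisampled colour data compresses well: a pixel whose subsamples all match can be stored as a single colour. The function names and the 4x sample count are made up for the example.

```python
def compress_pixel(samples):
    """Store one colour if every subsample matches, otherwise keep them all."""
    first = samples[0]
    if all(s == first for s in samples):
        return ("uniform", first)
    return ("raw", tuple(samples))


def decompress_pixel(packed, n_samples):
    """Expand a packed pixel back into its full list of subsamples."""
    kind, data = packed
    return [data] * n_samples if kind == "uniform" else list(data)


def compression_ratio(pixels):
    """Colour values before compression divided by colour values stored."""
    raw = sum(len(p) for p in pixels)
    stored = sum(1 if compress_pixel(p)[0] == "uniform" else len(p)
                 for p in pixels)
    return raw / stored


# Hypothetical 4x multisampled scanline: 8 interior pixels, 2 edge pixels.
RED, BLUE = (255, 0, 0), (0, 0, 255)
interior = [[RED] * 4 for _ in range(8)]
edges = [[RED, RED, BLUE, BLUE], [RED, BLUE, BLUE, BLUE]]
pixels = interior + edges

print(compression_ratio(pixels))
```

In this made-up scanline, only the two edge pixels need all four samples stored, so 40 colour values shrink to 16. Without AA there is one sample per pixel and nothing to collapse, which fits the point that the benefit shows up mainly with FSAA enabled.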

Ante P
07-Jan-2003, 20:14
I'm not saying that I don't trust you but where did you read this about the color compression?

Try B3D :!:

I was referring to his claim about NV30's color compression not being constant..?

I've seen him claim twice that nvidia "clarified" the situation, but neither time did he provide a source, and when I asked him earlier he never replied.

Dave Baumann
07-Jan-2003, 20:16
Ahh, sorry.

AFAIK it is constant, but they admit there will be little benefit without AA.

Ante P
07-Jan-2003, 20:31
Ahh, sorry.

AFAIK it is constant, but they admit there will be little benefit without AA.

is there any place to read their comment? :)

Dave Baumann
07-Jan-2003, 20:36
http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=3

Ante P
07-Jan-2003, 20:59
http://www.beyond3d.com/previews/nvidia/nv30launch/index.php?p=3

spanks :)

Bambers
07-Jan-2003, 21:49
There was a thread a while back on rage3d with some numbers someone had got from a 9700pro with colour compression on and off (used some registry switch). Anyone want to confirm this? And if it's true, then try a 9500pro :)

Thread is here: http://www.rage3d.com/board/showthread.php?s=&threadid=33651525