5700, 5600, 9600 'Pure' DX9 Performances

DemoCoder said:
I'm waiting for a comprehensive set of comparisons (maybe Dave can run the tests with Det45 vs Det52), but it looks to me like it is legitimate. And it makes sense if you read their Unified Compiler Whitepaper where they show before and after instruction schedules.

I've read a few "whitepapers" from nVidia over the years that I could not distinguish from thinly veiled PR jargon designed to promote various aspects of their products as a marketing exercise. The ones relating to 3dMk03, and their "critiques" of various "unnecessary" so-called DX9 hardware features, come to mind as recent examples.

When they spent all that time and money earlier this year figuring out how to thoroughly cheat 3dMk03 in their drivers, to the degree that the benchmark provided almost no meaningful information on their GF FX products (merely one of many such examples), evidently "the problem" *then*, according to the nVidia PR machine, was with "software developers using the wrong approach," and everything would be "fine" if only "software developers" would optimize for nV3x, etc. To me the current emphasis on compilers isn't new at all from a technological standpoint, nor particularly important at this stage of nV3x deployment. In marketing terms, though, it is just a new chapter in the same old tired PR story trumpeted all year long: an ongoing attempt to explain away the performance deficit of nV3x relative to the chips offered by their competitors (specifically, of course, R3x0).


Well, I have been discussing it since the NV3x launch, like a broken record, I might add, yet after many, many shader-benchmark tests people were coming to the conclusion that there was nothing more to be done because of the NV3x architecture. (And of course, with the usual disclaimer that a technical discussion of NV3x architecture and optimization opportunities does not constitute a "defense" of NVidia or an anti-ATI position for fanboys.)


For some reason you seem to think nVidia's remarks on the subject of nV3x compilers do not constitute mere marketing for nV3x, but are objective, unbiased statements of some sort of universally accepted principles. Considering that nVidia makes all of the statements it makes in reference only to nV3x, and doesn't mention its competition unless it is to disparage the competition's products, I don't see how you might fail to reach the conclusion that nVidia itself is the biggest nVidia "fanboi" of them all. Large grain of salt, therefore, is required when self-interested companies obsessed with marketing write "whitepapers," IMO.

I'm not interested in the Det52's from the aspect of "well, I might want to buy a 5700 now". I am interested in them from the aspect of "What went wrong?"

It appears NVidia's problems are the result of trying to design a shader pipeline to be too flexible (resource sharing, extremely long lengths, predicates, ddx/ddy, unlimited dependent textures, a complex instruction set, pure stencil-fill mode, two TMUs, etc.). They spent transistors on complexity, which in itself led to poorer performance, and in doing so the added complexity made it much harder for the drivers to translate DX9 instructions efficiently. Both these factors led to really crummy performance; now it appears the latter issue has been resolved, but we're still left with HW that is not up to snuff.
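To make the "harder to translate efficiently" point a bit more concrete, here's a rough Python sketch of the kind of register-pressure penalty people talk about with NV3x. The penalty curve below is purely an illustrative assumption, not a measured figure; the point is just that a compiler that re-schedules code to keep fewer values live at once can buy back throughput without touching the hardware.

```python
# Illustrative only: relative shader throughput vs. number of live temporary
# registers. The "full speed up to 2 temps, then halve per extra temp" curve
# is an assumption made up for this sketch, not a measured NV3x figure.

def relative_throughput(live_temps, free_temps=2, penalty_per_extra=0.5):
    """Assume full speed while at most `free_temps` values are live, then each
    additional live register multiplies throughput by `penalty_per_extra`."""
    extra = max(0, live_temps - free_temps)
    return penalty_per_extra ** extra

if __name__ == "__main__":
    for temps in range(1, 7):
        print(f"{temps} live temps -> {relative_throughput(temps):.2f}x throughput")
```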

Certainly the hardware is not "up to snuff." That's the entire problem. I've had an answer which satisfied me to the question of "What went wrong?" for most of this year...:) And, of course, it has little to do with compilers.

It looks to me that nVidia designed a dog of a chip architecture for nV30 on many levels. The cancellation of nV30, the replacement of nV35 with an IBM-fabbed nV38, increases in core and ram bus clocks, more driver optimizations than can be tabulated (not even touching on 3dMk03 and that particular can of worms), things like FX12 and FP16 alongside hardware fp32 support too slow to do anything but provide nVidia with technical DX9 compliance--it all adds up to a clear picture of a company struggling to improve a non-competitive architecture through every indirect means at its disposal: FAB changes, silicon respins, pcb revisions, driver and compiler optimizations, you name it. nVidia has done everything it can do except the thing that is most needed and that will make the most difference to their current problems: a brand-new architecture. nV3x is simply a dog. To say that it is "overly complex" as marketing spin for "poorly designed" seems to advance little of worth to the topic, IMO.

Not really. ATI's architecture seems much more straightforward and tailor-made for DX9 input. You don't have register limitations to deal with. You don't have multi-precision. You have clear rules for how to use the separate vector and scalar units. They still have to do translation and scheduling, but the issues aren't as complex. If you listen to Richard Huddy explain how to hand-craft shaders, you'll see that it's much simpler. Hand-crafting for NV3x is more difficult.
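As a toy illustration of why those "clear rules" matter, here's a little Python sketch that counts issue cycles on a hypothetical pipe that can pair one 3-component (rgb) op with one scalar (alpha) op per clock. The instruction lists and the adjacency-only pairing rule are made up; it just shows how instruction ordering changes throughput, which is exactly what a hand-scheduler or compiler is fighting over.

```python
# Toy model of vector/scalar co-issue: assume a pipe that can pair one
# 3-component (rgb) op with one scalar (alpha) op per clock. Instruction
# lists and the adjacency-only pairing rule are made up for illustration.

def count_cycles(ops):
    """ops: list of 'vec' / 'scalar' markers in program order. An adjacent
    vec/scalar pair issues together in a single cycle."""
    cycles, i = 0, 0
    while i < len(ops):
        if i + 1 < len(ops) and {ops[i], ops[i + 1]} == {"vec", "scalar"}:
            i += 2  # co-issued pair
        else:
            i += 1  # issues alone
        cycles += 1
    return cycles

# Same work, two orderings: interleaving the scalar math with the vector math
# lets the pipe pair every instruction.
naive       = ["vec", "vec", "vec", "scalar", "scalar", "scalar"]
interleaved = ["vec", "scalar", "vec", "scalar", "vec", "scalar"]
print(count_cycles(naive), count_cycles(interleaved))  # 5 vs. 3 cycles
```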

The way I read the above, it's simply being "overly complex" in stating the obvious: that ATi simply designed a better architecture in every respect germane to a solid 3d architecture. What--I'm supposed to give nVidia kudos for FX12, fp16, and support for a 3d-gaming-useless fp32, simply because they couldn't--or wouldn't--design a fast fp24 pipeline to cover all the bases? Don't think so... I'm supposed to give them kudos for complexity which is wholly unneeded and contributes nothing to 3d-API support, when the architecture is targeted at a market 98% comprised of people who'd buy those products to run 3d games, people who have support for present and *upcoming* APIs in mind? I'll pass, thanks.

The upshot for me concerning the whole issue here is this: could you imagine ATi, or any other IHV at the moment, possibly, under any circumstances whatsoever, saying:

"Next generation for us the challenge will be to emulate as closely as we can the principles we see so well-implemented in nVidia's nV3x architecture because we see in it the future of 3d-chip architecture design. We are so impressed with nVidia's market success of nV3x, it's incredible performance, its powerful support of newer 3d API features, and the incredible image quality, and more, that we feel compelled to adopt what is obviously a new 3d-chip-design paradigm, and the wave of the future. We only hope we can be half as successful with our version of the revolutionary nV3x architecture paradigm."

Of course not, right?...:) Only an idiot IHV would want to emulate the mess nVidia has created in nV3x, or want to experience the joy nVidia has experienced over the last year because of it, and accordingly absolutely nobody is going to try and emulate them. The truth as we all know it is that practically every comment made in the absurdity I fabricated above is false as it pertains to nV3x. And that is precisely why such "principles" as you apparently see in nV3x that you believe are worthwhile and represent "design directions for the future" are in fact nothing of the sort, IMO. nV3x is simply a non-competitive 3d architecture, which explains everything and is the actual truth of the matter, in my opinion.

That's what bothers me about your assessments that nV3x represents some kind of "future" for 3d chip design, in a world where IHVs need a solid year to optimize compilers because the chips are *far more complex than they need to be to do the job.* The opposite seems much more likely to me: that IHVs would view nV3x as a prime example of how not to design a 3d architecture for the 3d-gaming market segment in the future. Sure, it's true that performance and IQ are moving away from external factors like bandwidth and into the vpu itself, with technologies such as pixel shading. So, OK then: the R3x0 does all of that much better than nV3x, and it's nV3x which is actually far more dependent on core and ram clocking than R3x0-based products currently, yet R3x0 still manages to outperform nV3x, sometimes quite substantially, especially in core-dependent technologies like the ps2.0 support R3x0 provides for DX9. So how on earth could anyone ever reach the conclusion that the design paradigm of nV3x is the "future" and the paradigm of R3x0 is not? The facts would seem to indicate the very opposite, seems to me.
 
DemoCoder said:
I'm waiting for a comprehensive set of comparisons (maybe Dave can run the tests with Det45 vs Det52), but it looks to me like it is legitimate. And it makes sense if you read their Unified Compiler Whitepaper where they show before and after instruction schedules.

I was going to do this in the 5700 Ultra review, but I'm now going to push it back to the 5950 review (if I get one).
 
[whole bunch of frothing anti-nvidia stuff deleted]

WaltC, why aren't you able to separate out your clear Nvidia hatred that's fogging your mind and have an honest technical discussion?

As GPUs become more general purpose with PS3.0 and beyond (loops, branches, calls), the question is: what's going to happen to performance? Your tacit assumption--that an R300-like architecture, which is insensitive to a lot of optimization issues, will just map right over to PS3.0 and beyond, in which case ATI will not encounter any significant issues with compilation--is a bit of unfounded speculation.
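To put a number on the kind of thing that could bite any architecture once dynamic branching arrives, here's a crude Python sketch of SIMD branch divergence: if pixels are shaded in groups and the pixels in a group disagree about a branch, the whole group pays for both sides. The group size and per-branch costs are made-up assumptions, not any particular chip's behaviour.

```python
# Crude divergence model: pixels are shaded in fixed-size groups, and a group
# executes both sides of a branch whenever its pixels disagree. Group size and
# per-branch instruction counts below are assumptions for illustration.

def shading_cost(branch_taken, group_size=4, if_cost=10, else_cost=10):
    """branch_taken: per-pixel booleans in raster order."""
    total = 0
    for i in range(0, len(branch_taken), group_size):
        group = branch_taken[i:i + group_size]
        cost = 0
        if any(group):          # at least one pixel runs the 'if' side
            cost += if_cost
        if not all(group):      # at least one pixel runs the 'else' side
            cost += else_cost
        total += cost
    return total

coherent   = [True] * 8 + [False] * 8          # branch agrees within groups
incoherent = [i % 2 == 0 for i in range(16)]   # branch flips every pixel
print(shading_cost(coherent), shading_cost(incoherent))  # 40 vs. 80
```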

Yes, the NV3x architecture is busted, but the question is, why did they make the design decisions they did? Instead of just saying "well, NVidia sucks," why not step back for a moment and imagine that they explored several design options and went with the NV3x pipeline design for some "reason" over the other options they had. Are their problems with registers a consequence of supporting instruction streams much larger than 96? Of multiprecision? Of unlimited dependent textures? Of the FP32 units eating too many transistors vs. FP24? Which of their design requirements led them to a busted architecture, beyond the general "well, all their engineers suck" dismissal? I frankly don't care about your PR arguments, even if I agree with them, because PR and engineering departments are separate; just because Perez or Kirk are liars doesn't mean the other few hundred engineers working in the trenches are all of a sudden idiots or incompetents.


Leaving aside the NV3x, instruction scheduling and hazard avoidance are issues for any processor. The more functional units you have, and the more the latencies of different operations differ and overlap, the harder it becomes. When you throw loops into the mix, you get an NP-complete problem for which, unlike register allocation (also NP-complete), there is no known efficient near-optimal algorithm; for register allocation, graph coloring is pretty good.
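For anyone unfamiliar with the graph-coloring approach, here's a minimal Python sketch: nodes are virtual registers, edges mean two values are live at the same time, and a greedy coloring either fits everything into the available physical registers or tells you a spill is needed. The interference graph below is made up for illustration.

```python
# Minimal sketch of graph-coloring register allocation: build an interference
# graph (nodes = virtual registers, edges = values live at the same time) and
# colour it greedily. The tiny graph below is invented for illustration.

def greedy_color(interference, num_regs):
    """Assign each virtual register a colour (physical register index) so that
    interfering registers never share one; returns None if we'd have to spill."""
    assignment = {}
    # Simple heuristic: colour the most-constrained nodes first.
    for node in sorted(interference, key=lambda n: -len(interference[n])):
        used = {assignment[n] for n in interference[node] if n in assignment}
        free = [c for c in range(num_regs) if c not in used]
        if not free:
            return None  # would have to spill this value to memory
        assignment[node] = free[0]
    return assignment

# Hypothetical interference graph for five shader temporaries.
graph = {
    "t0": {"t1", "t2"},
    "t1": {"t0", "t2", "t3"},
    "t2": {"t0", "t1", "t4"},
    "t3": {"t1"},
    "t4": {"t2"},
}
print(greedy_color(graph, num_regs=3))  # fits in 3 physical registers
print(greedy_color(graph, num_regs=2))  # None -> a spill would be needed
```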

I happen to think that people learn best from their failures, not their successes, and there is something to take away from the failure of the NV3x with respect to design and issues with HLSL compilers. Close your mind if you wish.
 
DemoCoder said:
I frankly don't care about your PR arguments, even if I agree with them, because PR and engineering departments are separate

You'd be surprised ;)

I'm not sure nVidia really ever made huge errors of judgement when designing the NV3x architecture - I'm of the belief that it was a collection of mistakes/errors/bad luck all the way along which caused the problems nVidia have got now. With hindsight, FP24 would have been the better move for them, but having both FP16 and FP32 could well have looked like a nice and flexible solution on paper.

Also, as for nV saying that they're going to try and keep/refine the best bits of the nV3x architecture for nV4x - isn't that obvious? I'm not sure why that's apparently such a bad thing. Just because their first stab at it didn't work, it doesn't mean that some of the cogs of the machine don't work well (or can't be made to work well with some extra effort and time). nVidia can obviously see value in at least *some* of their architecture, and - frankly - they're in a better position to make that judgement call than we are.

I'm prepared to give them another throw of the dice. They've got many, many talented people there, even if they seem to have been buried by all the PR recently. The fact that the NV38 is doing reasonably well now, having made huge strides forward since the NV30, is proof enough that nVidia have talent. If they can make a busted architecture work to the degree that the NV38 does (mostly, anyway - I know they still lag behind ATi in DX9), what will they be able to do with an architecture that actually works properly?

Then again, ATi are hardly going to let the NV40 have an easy ride of it... :devilish:
 
PaulS said:
DemoCoder said:
I frankly don't care about your PR arguments, even if I agree with them, because PR and engineering departments are separate

You'd be surprised ;)

Of course, you know I meant that just because Perez is a boob doesn't imply that, say, Mark Kilgard is, or for that matter ex-Nvidiot Richard Huddy. Nvidia has competent engineers, but the image of the whole company has been blemished by PR and management.

Nvidia's biggest problem with the NV30 is that it's not really a PS2.0 device. It's a DX8.1 accelerator with some DX9 features. ATI successfully evangelized DX9, to their credit and to the benefit of users, such that the API has seen far faster uptake than any I've seen before. Even though most people don't have DX9 hardware, developers and users view PS2.0 performance as an important issue. In the past, people would say things like, "Who cares about the performance of feature X? No games use it, and hardly anyone owns a card with feature X."

Another problem is that, for some reason, AA improvements keep getting axed from the design. NVidia management has made IQ a low priority, and that does translate into priorities for the engineering department. If Jen-Hsun told engineering tomorrow "NV4x must have industry-leading AA," they'd find a way to improve it in the next design, even if they had to leave out a PPP or a register combiner or something. Problem is, management has gotten addicted to winning benchmarks--too addicted.

Many people in this forum would still buy R300's, even if the PS2.0 performance was lower than NV3x and even if it lost in benchmarks by a little bit, because the AA is so much better.
 
PaulS said:
With hindsight, FP24 would have been the better move for them, but having both FP16 and FP32 could well have looked like a nice and flexible solution on paper.

Statements like this assume that the move from FP24 to FP32 is trivial. I don't think such a statement can be made until other companies make a similar transition successfully. ATI will probably make that transition at some point as well, but will we then say that, with hindsight, FP32 would have been the better move for them earlier if problems do arise? I don't think it will be too much trouble, but until we have a large enough sample to compare against, I don't think hindsight is really something that should be brought up... not to mention the fact that "hindsight" is only useful for bragging rights.
 
bdmosky said:
PaulS said:
With hindsight, FP24 would have been the better move for them, but having both FP16 and FP32 could well have looked like a nice and flexible solution on paper.

Statements like this assume that the move from FP24 to FP32 is trivial.
I don't see why PaulS's statement requires the assumption that FP24 to FP32 is trivial. Although I do consider it trivial compared to the complexity of the control logic for the shader pipes. What's not trivial is changing a design from supporting one FP format to a hybrid design like NV3x.
 
PaulS said:
With hindsight, FP24 would have been the better move for them, but having both FP16 and FP32 could well have looked like a nice and flexible solution on paper.
That's not necessarily the case. There are many differences between the NV3x and R3xx architectures, far too many to point at one of them and say that that was the issue.

And while I think that going for two INT12 units and one FP32 unit was a bad move, I don't think the FP32/FP16 design of the NV35+ is necessarily bad.

In order to make a judgement on which features of the architecture were good and which were not, you would really need to know the exact impact of each and every design decision. It is impossible for somebody outside of nVidia to know these things.
 
I have a few questions and a few comments. I want to try to take a different approach but realize I could be wrong in the way I look at things.
The 9700 Pro was released almost a year and a half ago, which tells me Microsoft had to have given both ATI and NVidia the DX9 specifications at least six months before that, I'm assuming. So we are talking at least two years since NVidia got the information they needed to design their cards. With this in mind, I am sure they designed this card, had more than enough time to test it, and realized there was a problem. No one can tell me that NV's software engineers could not write some PS 2.0 shader code, run it, and see the problem for themselves. I doubt very seriously that 3DMark03 was the first time they saw an issue with advanced shaders.
Secondly, why did they choose the 4x2 pipeline structure, and then only two shader engines? I see everyone in this forum talk about compilers and such. NV stated the problem was that no one was using the Cg compiler their hardware was designed around. However, if you look at the old TR:AOD benchmarks, even when using the Cg compiler, performance was no better than HLSL except when AA or AF was added, and then only by a few FPS--and it still showed unplayable framerates.
My first question is: even with all this bad compiler implementation, would an 8x1 architecture and four shader engines have made this a moot point, since they would not have to worry about getting double the output through each pipeline in one clock cycle to equal what an 8-pipeline design would have done?
As for the FP32 issue, I think it was a marketing gimmick: "ATI is only doing FP24 but we can do FP32," etc. I firmly believe they knew back then they could not do it. I also think ATI realized that even their hardware would have a rough time doing FP32, and decided it was better to use FP24 than to try to add all the extra hardware that would be needed for FP32, or to use PP hints to drop to half precision--not to mention the IQ hit that would take.
Third, when NVidia realized, after the 3DMark fiasco, the problems they had, why did they not admit they made a mistake and totally redesign the core to get it right, instead of bullying Futuremark into saying they were not cheating and then making all these benchmark-specific optimizations that made their cards appear better than they were?
By my estimation it has been close to three years since NVidia received the specification from Microsoft saying, "Here, this is what DX9 will consist of; go ahead and design your card to run it," yet we are just now seeing a driver release that makes it somewhat competitive--although I still question whether it really does at all. Is this new compiler in the new drivers going to help the 5600 run HL2 in full DX9 mode, or even in mixed-mode half precision? Or is it still going to have to run in DX8 mode, as Valve stated?
As for the 5700: they did add a shader engine to the core. That, plus the higher clock speed, is in my opinion why you see the big performance increase--more so than the 52.16 drivers, although I am sure those helped. However, it still would not beat the 5900, so in my opinion it will still be relegated to DX8 mode according to Valve's specifications. The FX cards still do not support HDR with PS 2.0 shaders; this will be used in HL2, and no FX user will be able to use it, although Valve is trying to get a form of HDR to work on the 5900-and-up cards if NVidia will make some changes.
This is not an ATI vs. NVidia issue in my book. If the FX had no problems running PS 2.0 and did not have to run mixed mode, or even DX8 mode, in many cases, then it would be an ATI vs. NV issue. That is not the case; the only thing that makes this an ATI vs. NV issue is the fact that ATI got it right the first time, has had it right since the start, and happens to be the only thing to compare the FX against. The issue is a card that they say, in their own words, will run the latest and future games:
Delivering blazing speeds, ultra-high resolutions, cinematic gaming effects, unmatched features, and rock-solid stability, the GeForce FX 5900 GPUs ignite PC gaming. The second-generation Intellisample high-resolution compression technology (HCT) delivers the highest-quality antialiasing for ultra-realistic visuals with no jagged edges. And, powered by the NVIDIA CineFX™ 2.0 engine with the industry's only true 128-bit precision processing--the GeForce FX 5900 GPUs take cinematic-quality special effects to new levels while providing the industry's most compatible and reliable gaming platform.
This is obviously not the case.
1. "Blazing speed, ultra-high resolutions, unmatched features, and rock-solid stability": as far as PS 2.0 goes, this has not been the case since day one. They have had slow FPS (unplayable, for the most part, in full DX9 mode) and have had to run at lower resolutions to try to increase the FPS in that mode. I think everyone will agree, from what I have been reading, that ATI currently offers more features than the FX, or at least workable features--HDR for one. And stability has been far from rock solid.
2. "The second-generation Intellisample high-resolution compression technology (HCT) delivers the highest-quality antialiasing for ultra-realistic visuals with no jagged edges": um, hasn't AA been an issue with the FX since day one? I know they have good AF, but AA?
3. "Powered by the NVIDIA CineFX™ 2.0 engine with the industry's only true 128-bit precision processing": yes, they may offer it, but currently they can't use it!
4. "The GeForce FX 5900 GPUs take cinematic-quality special effects to new levels while providing the industry's most compatible and reliable gaming platform": even games made under "The Way It's Meant to Be Played," like TR:AOD, cannot for the most part be run at playable framerates.

This is the issue and has been from day one.

Lastly, what must NV do to get the NV40 to work? And if they do get it to work, does anyone think they are going to keep spending big bucks on programmers to keep the FX series of cards running--i.e., making game- and benchmark-specific optimizations?
 
Hey Dave, I have a question for you. Do you have the pro version of Aquamark? If you do, could you run a test for us? Someone over in the Futuremark forums posted the following:

Don't start flaming me, at least not before reading the whole post. I can't confirm this yet as I don't have the pro version of Aquamark3; I sent a PM to Hanners to see if he could do some comparisons.

Anyway, I picked this up from another forum. The guy who posted it claims that the FXs aren't doing any PS 2.0 effects in Aquamark3 with the Det 52.16s, and posted a video to prove it.
In the video, blue should mean that no pixel shaders are in use, yellow that PS 1.x is used, and red that PS 2.0 is used.
I can't see anything red in it, but I don't even know if there should be any PS 2.0 effects in that scene; that's why I asked Hanners to test this one.

Here's the link to the video:
http://www.racomputers.com/files/5216_drivers.avi

If any of you with FXs have the pro version of Aquamark3, please do some tests, and the same for you people with R9800s and the pro version of Aquamark3.

The video was taken with the demo version of Fraps in Aquamark3's SVIST (or something like that) mode.

Anyway, what we found out was that the guy had a 5600. Someone in the thread had a 5900 and showed that it is doing PS 2.0. My question, and the question of a lot of us, is: are the new 52.16s causing the 5600 to skip PS 2.0 in benchmarks like Aquamark (and possibly others), reverting to a DX8 path instead? Any way you can test this theory?
 
DemoCoder, would it be fair to say that the first 20% of performance improvements that could be found in a driver/compiler are easier to find than the next 20% ?

And does anyone know if the 5700 is subject to a waiver to meet the Dx9 spec ?
 
nelg said:
DemoCoder, would it be fair to say that the first 20% of performance improvements that could be found in a driver/compiler are easier to find than the next 20% ?

And does anyone know if the 5700 is subject to a waiver to meet the Dx9 spec ?

Yeah, low-hanging fruit. It's hard to speculate what else could be done without knowing the architecture. It also depends on the class of "input" to the compiler. They could have optimized for small-to-medium "common" shader idioms. Dunno.
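As an example of the kind of low-hanging fruit I mean, here's a tiny Python sketch of a peephole pass that fuses an adjacent MUL + ADD into a MAD. The instruction encoding is invented purely for illustration, and a real pass would also have to check that the MUL's result isn't read again later before fusing.

```python
# Hedged example of "low hanging fruit" in a shader compiler: a one-pass
# peephole that fuses an adjacent MUL + ADD into a MAD. Instruction format
# here is invented for illustration: (op, dest, src, ...). A real pass would
# also verify the MUL destination is dead after the ADD.

def fuse_mad(code):
    out, i = [], 0
    while i < len(code):
        if (i + 1 < len(code)
                and code[i][0] == "mul" and code[i + 1][0] == "add"
                and code[i + 1][2] == code[i][1]):   # the add reads the mul result
            mul, add = code[i], code[i + 1]
            out.append(("mad", add[1], mul[2], mul[3], add[3]))
            i += 2
        else:
            out.append(code[i])
            i += 1
    return out

shader = [
    ("mul", "r0", "r1", "r2"),
    ("add", "r3", "r0", "r4"),   # r3 = r1*r2 + r4  ->  one MAD
    ("mov", "oC0", "r3"),
]
print(fuse_mad(shader))  # [('mad', 'r3', 'r1', 'r2', 'r4'), ('mov', 'oC0', 'r3')]
```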
 
Bry said:
1. "Blazing speed, ultra-high resolutions, unmatched features, and rock-solid stability": as far as PS 2.0 goes, this has not been the case since day one. They have had slow FPS (unplayable, for the most part, in full DX9 mode) and have had to run at lower resolutions to try to increase the FPS in that mode. I think everyone will agree, from what I have been reading, that ATI currently offers more features than the FX, or at least workable features--HDR for one. And stability has been far from rock solid.

As you know Halo is a game and not a synthetic benchmark and contains a command line variable that supports a pixel shader 2.0 code path. Have you by chance tested performance in Halo using this code path with either the GeForce FX or Radeon 9800?
 
MikeC said:
As you know Halo is a game and not a synthetic benchmark and contains a command line variable that supports a pixel shader 2.0 code path. Have you by chance tested performance in Halo using this code path with either the GeForce FX or Radeon 9800?

FXes, when running Halo, drop down to a mixed-precision mode. So while it's a valid comparison, it's not an apples-to-apples one.
 
jb said:
FXes, when running Halo, drop down to a mixed-precision mode. So while it's a valid comparison, it's not an apples-to-apples one.

Which it of course also wouldn't have been if you did an FP24 vs. FP32 comparison. Just as with ATi's 4X FSAA vs. Nvidia's 4X FSAA. The list is long, and we wouldn't be able to do any comparisons at all if we wanted exact apples to apples.
 
Which it of course also wouldn't have been if you did an FP24 vs. FP32 comparison.

Depends on your outlook. Both of these modes are DX9 full precision - so in one case you are comparing full precision to partial precision, and in the other you are comparing full precision to full precision. ;)
 
DaveBaumann said:
Which it of course also wouldn't have been if you did an FP24 vs. FP32 comparison.

Depends on your outlook. Both of these modes are DX9 full precision - so in one case you are comparing full precision to partial precision, and in the other you are comparing full precision to full precision. ;)

I did notice the smiley but still :)

If we went by the definition only, then we wouldn't need to worry about the differences between ATi's and Nvidia's 4X FSAA either, since both are by definition 4X FSAA.

My simple opinion:

If you can get the same IQ with a mix of FP16 and FP32 vs. FP24, then it's an apples-to-apples comparison.

If you do FP32 all the way through and there's no difference between FP24 and FP32 (unless you zoom in 4X to find one pixel that's not exactly the same colour, or something crazy like that--and that goes for the FP16/FP32 vs. FP24 example too), then it's an apples-to-apples comparison.

It's all about the final output :)
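For what it's worth, here's a rough Python sketch of the raw precision gap being argued about, assuming the commonly quoted mantissa widths (FP16: 10 bits, ATI's FP24: 16 bits, FP32: 23 bits). Whether a given shader ever shows that difference in the final output is exactly the question.

```python
import math

# Rough sketch of the precision gap between the formats being compared,
# assuming the commonly quoted mantissa widths: FP16 = 10 bits, FP24 = 16 bits,
# FP32 = 23 bits. This only bounds rounding error; on-screen IQ depends on the
# actual shader math.

FORMATS = {"FP16": 10, "FP24": 16, "FP32": 23}

def quantize(value, mantissa_bits):
    """Round `value` to the nearest float with the given mantissa width."""
    if value == 0.0:
        return 0.0
    exp = math.floor(math.log2(abs(value)))
    scale = 2.0 ** (mantissa_bits - exp)
    return round(value * scale) / scale

x = 1.0 / 3.0
for name, bits in FORMATS.items():
    q = quantize(x, bits)
    print(f"{name}: {q:.9f}  (error {abs(q - x):.2e})")
```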
 