Another NV3x series and multiple precisions thread

demalion said:
In short, what I'm saying is that nVidia's design approach for NV3x looks like it needs less rethinking to be suitable for PS/VS 3.0 than the R3xx's design does. I.e., incremental improvement versus redesign.

I generally agree with that. nVidia and ATI could be said to be swapping positions in this respect compared to the DX8 to DX9 jump.

The R200 core should have required much less "rework" to get up to DX9 (PS/VS 2.0) standards than the NV20 core did.

Not surprisingly to me, the R300 core had far fewer problems getting launched and supported with compliant drivers compared to the NV30.

So it would not be surprising to me to have nVidia get to PS/VS 3.0 a bit "easier" than ATI will....assuming both ATI and nVidia strive for PS/VS 3.0 at all. ATI has more reason to just "ignore" PS/VS 3.0 (like nVidia ignored PS 1.4) and milk the R300 core all the way to their DX10 part....because the R300 core makes for a very solid PS 2.0 card.

PS/VS 3.0 isn't going to mean much, practically speaking. For marketing and for the occasional app, it will be an additional "bonus". But the bulk of DX9 titles will be based on PS/VS 2.0, as it is the lowest common denominator. PS/VS 3.0 will also be useful for developers to get their "feet wet", so to speak, and prepare for DX10. MS in particular should be able to get feedback on PS/VS 3.0 and use it as a learning exercise for potential tweaks to DX10 development.
 
PS 3.0/VS 3.0 isn't going to mean much for gamers, probably for at least a good year or two after the NV40 is released at this Comdex.

It's a completely different matter for workstations, however.
With VS 3.0, PS 3.0 and a PPP, you can easily do EVERYTHING the film industry is used to. I mean, there are practically no limitations, and multipass could be used if it was really required anyway ( heck, maybe there'd even be ways to do multipass for vertices since they can use textures, although that would obviously be suboptimal, I guess ).
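Something like this toy CPU-side sketch is the sort of thing I mean ( none of it is a real graphics API - the pass functions and the "texture" array are just made up to show the data flow of writing vertex results out in one pass and fetching them back in the next ):

/* Toy CPU-side model of "multipass for vertices": pass 1 runs a vertex
 * program and stores its results in a floating-point "texture"; pass 2's
 * vertex program reads those results back via a texture fetch, which is
 * what VS 3.0-style vertex texturing would allow.  All names here are
 * hypothetical. */
#include <stdio.h>

#define NUM_VERTS 4

typedef struct { float x, y, z; } vec3;

/* The "texture" used as intermediate storage between the two passes. */
static vec3 intermediate_tex[NUM_VERTS];

/* Pass 1: first half of a long per-vertex computation. */
static vec3 vertex_pass1(vec3 v)
{
    vec3 r = { v.x * 2.0f, v.y * 2.0f, v.z * 2.0f };
    return r;
}

/* Pass 2: fetch the pass-1 result "from the texture" and finish the work. */
static vec3 vertex_pass2(int vertex_id)
{
    vec3 t = intermediate_tex[vertex_id];    /* the vertex texture fetch */
    vec3 r = { t.x + 1.0f, t.y + 1.0f, t.z + 1.0f };
    return r;
}

int main(void)
{
    vec3 verts[NUM_VERTS] = { {0,0,0}, {1,0,0}, {0,1,0}, {0,0,1} };

    for (int i = 0; i < NUM_VERTS; ++i)      /* pass 1: "render" to the texture */
        intermediate_tex[i] = vertex_pass1(verts[i]);

    for (int i = 0; i < NUM_VERTS; ++i) {    /* pass 2: read it back per vertex */
        vec3 out = vertex_pass2(i);
        printf("vertex %d -> (%g, %g, %g)\n", i, out.x, out.y, out.z);
    }
    return 0;
}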
With such features, nVidia is pretty much guaranteed to have an advantage over ATI in getting large, film-related contracts - although those contracts seem to be rare right now, and they're always behind the scenes.

Also, I do not believe the NV3x will help nVidia much for the NV4x. Oh, sure, it'll give them some experience in things like dynamic branching. But let's remember the NV40 is a pretty big leap.

I remember CMKRNL saying, in one of his last messages, that the NV30 was an intermediary design - that the NV40 was very different.


I believe that we already know a lot more about the NV40 than we really want to admit:
0.13u with SOI and without Low-K at IBM
VS 3.0, PS 3.0, PPP
Full FP32 ( or at least no FX functionality; maybe you'll be able to use FP16 registers, though I doubt it, considering it's supposed to have the same constructs in the PS as in the VS )
Uses GDDR2, probably on a 256-bit memory bus ( so, likely not a TBDR )

One question remains, however.
Does the NV40 employ Dynamic Allocation? ( and if so, is the PPP included in this dynamic allocation? )

Should we have the answer to that, we'd have solved the puzzle that is the NV40. For now, sadly, the NV40 remains mysterious.


Uttar
 
Chalnoth said:
demalion said:
It seems simple: fp16 and fp32 is a good decision, fx12 and fp16 and fp32 was not. To me, it seems illogical to simultaneously propose that "FX12 was necessary and not wasteful" and "the NV35 is able to improve a similar design significantly", and isn't even consistent with what nVidia themselves has recognized.
Well, going for higher-speed FP16 is obviously better, but that does not mean FX12 was necessarily bad, either.

Didn't say FX12 was bad, said it was wasteful. Hence why I discuss the example of the NV35 and relatively small transistor count increase as being an illustration of this.

Still, Microsoft is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX.

:oops:

I'm sorry, but that seems to not even be remotely divergent from rampant and nonsensical bias. Feel free to provide some reasoning for the statement that will give me some reason to think otherwise.

Let's try this statement...can you say it, mean it, and have it take hold for you before the NV30-NV34 fade from your memory?:

"nVidia is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX".

Since nVidia designed the NV30-34, and made their performance depend on falling short of the DX 9 PS spec, why is Microsoft to blame and not nVidia? They can perform well for PS 1.1 through PS 1.4 within their limitations, except when you compare to their competitors. That comparison is nVidia's fault; Microsoft just lets that weakness be exposed. Well, Microsoft and every other cross vendor standard.

The NV31-34 are just worse cases of the NV30.

Since I really don't have much information on exactly what kinds of shaders and how many shaders used in real games will need what sorts of precision, I can't give an accurate picture as to whether FX12 was a good decision or not.

That's because you're good at turning a blind eye to what you don't want to see. FX12 dependency is never better in and of itself, it is only worse...the only benefit is from the tradeoffs it might allow you to avoid, and that is irrelevant until those tradeoffs are actually avoided and you gain something significant. If you can implement floating point in approximately the same space, or less, your FX12 implementation was wasteful. This has long been demonstrated to be the case for the NV30.

FP16 is just clearly better (for performance...it has a larger transistor count, which may not have been possible in the NV30's timeframe, esp. given the other development problems).

It was possible to do better than even fp16 before the NV30's time frame. It was also possible to implement floating point processing (fp32, AFAIK, except with the severe register limitations) in just slightly more space and the same functionality shortly afterwards. To me, this clearly shows that the NV30 itself was wasteful. Considering both factors, and not selectively ignoring one at a time, how are you saying this is not demonstrated?

All that I do know is general information on what precision is needed where, and common sense from this knowledge tells me that it will be rare to need FP32 throughout most shaders.

You do realize that this in no way validates FX12 independently of your having a preference for it and labelling it "common sense", and then also still leaves fp24 better than fp16?

Why do you still persist in concentrating on the peak performance of the NV3x (NV35 in this case), ignoring the limitations affecting its ability to reach its peak, ignoring the peak performance of the R3xx (which, btw, is 16 ops if you want to ignore limitations), and then concluding that "shader performance will still be higher than an 8 PS per clock architecture"?
The implication is that with enough optimization (hopefully available through a HLSL compiler, eventually if it's not there yet), performance close to that peak can be realized.

Yeah, but then you should either leave the consistent repetition of fallacious comparison to the R3xx out of your "implications", or give some basic recognition that the same factors would allow it to reach its peak as well. :oops:

The FX architecture is hard to write assembly for. Hopefully these compilers can help (DX9 HLSL and Cg now, GL2 HLSL later).

And the R3xx seems easier to write assembly for.

I think their respective designers are to blame for that. Why do you seem completely unable to accept that possibility and make it part of your working thought process?

As for the coincidence of vec3/scalar and texture/32FP ops, these will prevent the architecture from reaching peak performance,

Not really, it just can't take advantage of them to increase performance, except maybe by conserving register usage. You just seem to be dedicated to avoiding, at every turn, recognition that the R3xx can take advantage of them, by pretending they don't exist in your discussion.
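To be concrete about what that co-issue buys ( this is just a toy model I'm making up, not the actual R3xx scheduler; the instruction mix is invented and it assumes adjacent ops of different kinds are independent ):

/* Toy cycle count for a mixed vec3/scalar instruction stream, with and
 * without pairing a vec3 op and a scalar op into the same issue slot.
 * Purely illustrative; not modeled on any real hardware's scheduler. */
#include <stdio.h>

enum op_kind { OP_VEC3, OP_SCALAR };

/* One op per cycle, regardless of kind. */
static int cycles_without_coissue(const enum op_kind *ops, int n)
{
    (void)ops;
    return n;
}

/* Pair two adjacent ops of different kinds into one cycle; otherwise
 * issue one op per cycle. */
static int cycles_with_coissue(const enum op_kind *ops, int n)
{
    int cycles = 0;
    int i = 0;
    while (i < n) {
        if (i + 1 < n && ops[i] != ops[i + 1])
            i += 2;            /* vec3 + scalar issued together */
        else
            i += 1;            /* issued alone */
        ++cycles;
    }
    return cycles;
}

int main(void)
{
    /* Invented shader fragment: vector math (dot products, MADs) mixed
     * with scalar math (pow, rcp). */
    enum op_kind shader[] = {
        OP_VEC3, OP_SCALAR, OP_VEC3, OP_VEC3, OP_SCALAR,
        OP_VEC3, OP_SCALAR, OP_SCALAR, OP_VEC3, OP_VEC3
    };
    int n = (int)(sizeof shader / sizeof shader[0]);

    printf("no co-issue : %d cycles\n", cycles_without_coissue(shader, n));
    printf("co-issue    : %d cycles\n", cycles_with_coissue(shader, n));
    return 0;
}

The same mixed stream finishes in noticeably fewer cycles once the pairing is allowed, which is exactly the kind of advantage that gets left out when the R3xx is summarized as just "8 PS per clock".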

but if other DX9-level games are anything like DOOM3, they won't be enough to drop the optimized shader performance of the FX architecture below R3xx levels.

Doom 3 is a DX 8.1 featureset level game, not DX 9. It was designed with DX 7 in mind, and requires DX 8.1 functionality to do the least work to implement its full effect set. How things fall out after that can benefit speed and quality depending on what the rest of the hardware delivers, and the top of the food chain are the cards with good DX 9 level feature support...that limits NV3x discussion to NV35, presumably using the ARB2 Doom 3 path.

Note the favorable Doom 3 "full" featureset speed/transistor ratio of the RV250. Note the nearest transistor count competitor of similar functional level, the NV34, performing poorly in comparison for all shader execution, with more transistors. Whose fault is that?

All these questions are very important and directly relevant for comparison, and they are questions you consistently ignore when you state "12 versus 8" in what seems to me to be a useless fashion...
All I can state is what I know.

Where was that statement of what you "know" in your post? That uninformative commentary around the "common sense" reference?

Real, solid info on the PS2-level shader performance capabilities of the NV3x in real games just isn't yet available.

Ah, the "real game" stipulation, because...none of the abundant "real PS 2.0 benchmark" performance comparisons are at all relevant to what "real game PS 2.0" performance will look like?
How about the "real game" performance comparisons of Doom 3 using the ARB2 path (which exposes DX 9 level functionality, as the NV30 path does not for the NV30-NV34) that John Carmack provided? How about all the "real PS 2.0 demos"? Is there a criteria for exclusion of these PS 2.0 factors, and other VS 2.0 factors for that matter, besides convenience?
All we have is conjecture.
Once you've finished turning your eyes from the inconvenient, sure.
 
Uttar said:
With VS 3.0, PS 3.0 and a PPP, you can easily do EVERYTHING the film industry is used to. I mean, there are practically no limitations, and multipass could be used if it was really required anyway ( heck, maybe there'd even be ways to do multipass for vertices since they can use textures, although that would obviously be suboptimal, I guess ).
With such features, nVidia is pretty much guaranteed to have an advantage over ATI in getting large, film-related contracts - although those contracts seem to be rare right now, and they're always behind the scenes.

You seem absolutely sure that the NV40 will not only have a PPP (somewhat likely) and support for VS/PS 3.0 (almost certain), but that ATI won't have the ability to get such a product to market in the same timeframe.

May I ask why you are so certain about this (I'm just wondering if I have overlooked something)?
 
I believe that MuFu, Dave and one or two others have said that the planned R400 (PS 3.0 + VS 3.0) has been cancelled and the 'new' R400 is a heavily modified R300 which will be very fast but only supports PS + VS 2.0.

If this is the case then a PS + VS 3.0 supporting device might not be released by ATI until some time after the NV40 appears... assuming that NV40 appears on schedule, of course.
 
demalion said:
Didn't say FX12 was bad, said it was wasteful. Hence why I discuss the example of the NV35 and relatively small transistor count increase as being an illustration of this.
What is the transistor count increase? I haven't seen data on this yet.

I would tend to think that if the transistor count increase is quite small (~10-15M transistors added), then it's not that FX12 was wasteful, but instead that FP16 in the NV30-34 was just broken.

Still, Microsoft is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX.
I'm sorry, but that seems to not even be remotely divergent from rampant and nonsensical bias. Feel free to provide some reasoning for the statement that will give me some reason to think otherwise.
NV31-34 requires integer precision for good performance.

Microsoft doesn't offer integer precision in DX9.

And one last thing:
The NV3x cards all offer different levels of performance for different precisions. Given that the output is always 8-bit precision, it is ludicrous to think that even most shaders will require much more than that (read: 24-32 bit). The NV3x offers developers higher possible performance than the R3xx in pure shader processing power when these lower precisions are used enough.
 
Chalnoth said:
The NV3x cards all offer different levels of performance for different precisions. Given that the output is always 8-bit precision, it is ludicrous to think that even most shaders will require much more than that (read: 24-32 bit). The NV3x offers developers higher possible performance than the R3xx in pure shader processing power when these lower precisions are used enough.
You'd be more believable if you were more consistent.
The fact of the matter is, months ago, you were trashing FP 24 for not being good enough. Now FP 16 is good enough, simply because your favorite IHV can't perform well unless it's at a lower precision. Strange.
 
Chalnoth said:
demalion said:
Didn't say FX12 was bad, said it was wasteful. Hence why I discuss the example of the NV35 and relatively small transistor count increase as being an illustration of this.
What is the transistor count increase? I haven't seen data on this yet.

I think the NV35 is about 135 million, up 10 million from NV30. Here is one place that says 135 million for the NV35.

You hadn't been exposed to this figure?

I would tend to think that if the transistor count increase is quite small (~10-15M transistors added), then it's not that FX12 was wasteful, but instead that FP16 in the NV30-34 was just broken.

There are limited possibilities:

  • nVidia is intentionally misrepresenting their debut GF FX products and their capabilities. I'm throwing this one out, as it just seems to have too many detrimental effects for them.
  • nVidia had a nasty series of accidents with the NV30-NV34.
  • nVidia tried to achieve FX12 and later discovered it was a mistake, and are demonstrating that with the NV35.
  • nVidia is pulling some strange and convoluted sleight of hand with the NV35.

The best outlook left for the idea of low precision seems to be that FX12 might not have a direct example of wastefulness in the NV3x designs, though that still leaves the NV3x designs as demonstrably wasteful. Your wording doesn't seem to recognize that possibility for the NV30-34.

So, to restate, can you agree that what it looks like we've established is that the NV30-34 are indeed wasteful, either from being broken and wasting transistors, or from being examples of demonstrated failure in attempting an FX12 implementation? The former depends on the NV30-34 all being nasty accidents, and on precluding the possibility that the NV35's "fix" was anything but minor.

If you can agree, please recall that the comparison to the NV35 isn't the only element in my discussion: what you are still ignoring is the transistor count comparison between the NV30/R300, NV35/R350, NV31/RV350, and even the NV34 and the RV250, as well as all related discussion of scalars and scheduling. Even the best case comparison, NV35/R350, seems to support the idea that precision switching itself is wasteful. The least clear comparisons out of those:

  • The NV34/RV250...the RV250 seems to be able to calculate with higher precision than FX12, the NV34's highest speed case, and seems another case of precision switching being severely detrimental to performance. However, while the RV250 isn't targeted at the same DX spec, it is targeted at the same delivery goal...it delivers higher speed at higher precision, and for similar realtime functionality.
  • The NV35/R350...the rest of my discussion elsewhere addresses this comparison. I'm not going to play the repetition game when you just simply excise it from your replies to avoid addressing it.

Still, Microsoft is making it very hard to get good performance and image quality from the NV31-34 cards through DirectX.
I'm sorry, but that seems to not even be remotely divergent from rampant and nonsensical bias. Feel free to provide some reasoning for the statement that will give me some reason to think otherwise.
NV31-34 requires integer precision for good performance.

Microsoft doesn't offer integer precision in DX9.

Yes, I'm aware of the elements of your thinking. I'm saying that their appearance of supporting your blame assignment is an illusion of your bias, and that your statements do nothing to establish support outside the apparent precepts of what seems to be colossal bias. I was asking you to change that, not restate them.
Your statements only display a dependence on re-ordering the world so that the person who determines the spec for all IHVs is nVidia alone. Perhaps you should look up "reasoning".

BTW, Microsoft DOES offer integer precision in DX 8.1. I've already discussed the problems with the "lengthy integer based shaders" model you've proposed, and you just continue to ignore it. Oh well.

And one last thing:
The NV3x cards all offer different levels of performance for different precisions. Given that the output is always 8-bit precision, it is ludicrous to think that even most shaders will require much more than that (read: 24-32 bit).

And you only had to ignore what I'd already said to state that again. That isn't "one last thing", that's a repetition of the "same thing", ignoring everything I've said in response to it already.

The NV3x offers developers higher possible performance than the R3xx in pure shader processing power when these lower precisions are used enough.

You know, making statements based on the idea that a product has to be shown in the best light simply because it comes from a company is the job of PR departments. We have enough of those, we don't need you to be one here.

If you keep ignoring every issue I point out, like vec3/scalar/texture operations, do you think those issues disappear?

Let's try that statement again: "The NV3x offers developers higher possible performance than the R3xx in pure shader processing power when: these lower precisions are used enough, vec3 and scalar opportunities are mostly absent, texture usage is extremely limited, register usage is severely curtailed, and you can implement what you were writing the shader for in the first place while jumping through all of these hoops at the same time. Oh, and you need a bit of a clock speed advantage to really manifest this in the real world, too."

Gee, there are a lot more stipulations than you indicated.

Some math for you: 16>12, 8>4, 24>8+(8 or 4).

How this manifests for real shader execution cannot be established by simply picking and choosing the number convenient for you, and if it did, the R3xx has a head start.

Is this where you repeat your assertion and ignore all of this again?
 
Althornin said:
You'd be more believable if you were more consistent.
The fact of the matter is, months ago, you were trashing FP 24 for not being good enough.
Never, ever, ever have I done that.

When it came out that the R300 was to use FP24, I stated that I could not think of a scenario where FP24 wouldn't be good enough.

Now FP 16 is good enough, simply because your favorite IHV can't perform well unless it's at a lower precision. Strange.
The point is that it's going to be enough for specific instructions, not all instructions.
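
As a rough numeric sketch of what I mean ( everything below is made up for illustration: the [-2,2), 10-fractional-bit grid is just one common description of FX12, and the constant is deliberately picked so its 12-bit rounding error is near the worst case ), here is the same accumulation chain run at full precision and with every value held on a 12-bit grid:

/* Compare a chain of adds done in full precision against the same chain
 * with every quantity limited to an assumed 12-bit fixed-point grid
 * (10 fractional bits, range [-2,2)), then quantize both results to the
 * 8-bit output range.  All values are invented for illustration. */
#include <stdio.h>
#include <math.h>

/* Round to the assumed 12-bit grid. */
static double to_fx12(double x)
{
    double q = round(x * 1024.0) / 1024.0;
    if (q >  2047.0 / 1024.0) q = 2047.0 / 1024.0;
    if (q < -2.0)             q = -2.0;
    return q;
}

/* Final quantization of a [0,1] result to an 8-bit framebuffer value. */
static int to_8bit(double x)
{
    if (x < 0.0) x = 0.0;
    if (x > 1.0) x = 1.0;
    return (int)round(x * 255.0);
}

int main(void)
{
    for (int ops = 4; ops <= 64; ops *= 2) {
        double full = 0.1, low = 0.1;
        for (int i = 0; i < ops; ++i) {
            full = full + 0.0024;              /* full precision          */
            low  = to_fx12(low + 0.0024);      /* everything on the grid  */
        }
        printf("%2d ops: full -> %3d/255, 12-bit chain -> %3d/255, raw gap %.4f\n",
               ops, to_8bit(full), to_8bit(low), fabs(full - low));
    }
    return 0;
}

With these particular made-up numbers, the short chains give the same 8-bit output from both paths, while the long ones drift several 8-bit steps apart - which is the distinction I'm drawing between specific instructions and entire shaders.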
 