NV30 processor result evangelism

demalion

Veteran
...or my response to what I view as such.

:!: Continued from this thread.

Why did you not reply to my post directed at your related assertions [in the parent thread]?
Chalnoth said:
demalion said:
1. It is superior if you remove dynamic range and precision data during operations? Isn't that a bit contradictory to the label "superior" and to proposing shader length advantage?
Sure, but only for calculations that don't need them.

I said dynamic range and precision, Chalnoth. For what you propose, you give that up. I'm not arguing that the GF FX isn't an integer and PS 1.3 functional card with superior performance, I'm arguing that this does not make its shader implementation superior, and ignores the cases where it is inferior.

Why run at full precision for all calculations, if not all calculations need full precision?

In order to have a performance advantage you have to give up the full precision. If you calculate at full precision and then compute the result at integer precision, you give up the full precision (or negate your proposed performance advantage).
It can only offer the performance advantage on the same pixel data by removing the precision from calculations for the pixel.

This makes whether or not the FX is superior dependent upon the nature of the shader being calculated (except in DirectX, where Microsoft has screwed nVidia).

Superior in what? In which shader is it not inferior in either features or speed? What about the shaders where it is inferior in both? What quality makes it superior?

If you want to comment on precise shaders that would be used on significant portions of a game scene and require FP precision throughout for maximum quality, then go ahead.

Your "less functionality, using integer instead of fp24, excluding complex ops and texture ops" is more applicable?. :-?

Ok, I'll restrict this discussion to integer precision processing with PS 2.0 "extended" functionality to try and ignore the speed deficit (at 500 MHz versus 325, and using more transistors :oops:) somewhat.

First, that seems inferior to PS 2.0 with fp24, and to a rather more drastic and tangible degree than fp24 is to fp32.
Second, the performance of the nv30 at 500 MHz when doing this is at parity with the R300 at 325 MHz (that's what I call it when it sometimes leads, sometimes trails, and I'm not selectively looking at one of those cases to the exclusion of the other).
Third, using the same cooling solution, the R300 would be capable of operating at higher than 325MHz (I'm ignoring the R350 for now).

If you respond with discussion of NV30 PS advantages, please include recognition for R300 features, and an explanation as to how long shader lengths at integer processing make sense.

But just stating more precision for the sake of more precision is meaningless.

Again, I said precision and dynamic range, not "precision for the sake of more precision". You know these matter for visual results; I've seen you post in threads discussing it. Are you claiming amnesia? What about your prior discussions about color precision and dynamic range from before the nv30's performance issues were substantiated?

Where did this jump from "nv30 is competitive when using integer" to "nv30 is superior" come from all of a sudden? The support seems to be predicated on a theoretical situation and ignoring factors outside of that case.
Rather, it is superior for a select class of shaders. That means that it cannot be said that it is absolutely inferior.

Oh, something I agree with (the second sentence; the first seems to be a repeat of your focus on avoiding discussion of concurrent speed and features). The only trouble was that no one said "absolutely inferior" when you responded. They said it "sucked in comparison to the R300" after a discussion of very specific functionality, which seems a valid description of some situations, in fact a great deal of situations, when comparing them. For myself, I wouldn't use "suck", but I would agree to inferiority for the nv30 in those situations.

Exactly how well it will match up to the R3xx architecture depends hugely upon the application. Since most games will likely use similar shaders, it seems likely that one company made the right decision, and the other made the wrong one. What I don't see is any evidence in this thread which company that is.

Hmm...you almost made me make a lame river joke. :-?

Well, the evidence in [the parent] thread supports a lot of observations in many other threads that seem to be giving the answer to your question. If you want to ignore those other threads, benchmarks, image comparisons, articles, etc, I guess you can, but IMO it looks a bit ridiculous.
 
Damn, typed a long reply and I closed the window by error... Gotta start again, and going to summarize most of my thoughts because I don't feel like retyping everything...

demalion said:
Uttar said:
The NV30 is superior to the R300 if the following is true:
What is "superior"? Offering more features, or more speed? It doesn't seem to do both at the same time.
More speed.

1. It is superior if you remove dynamic range and precision data during operations? Isn't that a bit contradictory to the label "superior" and to proposing shader length advantage?

Most operations don't need that much range. And FX12 still gives much better range than the screen's 8-bit.

It's not THAT hard, now, is it?

You usually make sense, Uttar, but I don't see it here. "not THAT hard" after listing a set of criteria that seems to contradict the premise of "the NV30 is superior to the R300"?

I meant to say:
"It's not THAT hard to understand, now, is it?"
"The NV30 is superior to the R300 when, and only when:"

It is also a problem because it offers inferiority, both in tangible and in theoretical results, to the R300. 3dmark 03, shader benchmarks, and John Carmack's discussions all seem to support this (a 500 MHz part using a custom tailored path at reduced quality barely edging out a 325 MHz part using a generic path does not establish superiority).

Hacking something to try to have higher speed isn't the same thing as developing a path with an architecture in mind. I bet there'll be very little IQ difference with Doom 3 on a NV30.
Remember also that Doom 3 was made with the GeForce 1 in mind. There was no dynamic range there. And at the same time, Doom 3 is still future-oriented in many ways - so it's unfair to say the NV30 isn't future-oriented simply because it got native FX.

Where did this jump from "nv30 is competitive when using integer" to "nv30 is superior" come from all of a sudden? The support seems to be predicated on a theoretical situation and ignoring factors outside of that case.

*I* never said the NV30 is superior. I said it is superior WHEN specific conditions are met.
Those conditions are mostly met in theoretical cases, but they are also met to a lesser extent in many practical cases, making the NV30 very slightly superior to the R300 in those cases.
Most of the time, though, I'd still say the R300 is superior to the NV30.

Using fp16, it is still slower than the R300. If it is using intermixed integer ops, and thereby freely dropping the advantages of fp16 in intermixing ops, that performance parity can indeed be somewhat addressed in a realistic workload.

The low FP16 speed is normal, since the NV30 was made with intermixing in mind.
Also, remember that you could do two different types of independent things in a fragment shader. Let me give you an example:

Imagine you'd want to do lighting, but at the same time you'd want things which are far from the screen to look more red for whatever strange reason ( for example, you'd want to make the user think he's getting nearer and nearer to hell, even though he really isn't - some type of evil illusion )

You may want to do lighting with FP, 16 or 32 depending on needs, to have higher quality on that one. But the hueing depending on Z only needs FX, really.
So, you'd multiply the Z value by a given factor in FX. Then eventually remultiply by how "hellish" the area is. Then add ( thus using MAD ) a minimum.

Then, lighting would be done in parallel ( although it'd probably be finished after the FX stuff ) in FP, and you'd do a FX MUL to combine both results. Then you'd do yet another FX MUL to combine it with texturing. And there you go: FP quality lighting, while using some FX. It would look exactly the same, since hueing here doesn't need FP and FX12 has sufficient range not to lose quality on an 8-bit display.
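
To make that concrete, here's a rough Python sketch of the math ( purely illustrative - fx12() is my guess at a 12-bit fixed-point format clamped to [-2, 2] with a 1/1024 step, not anything from NV30 docs ):

[code]
# Rough sketch of the shader above ( fx12() is an assumed 12-bit fixed-point
# format clamped to [-2, 2], step 1/1024 - illustrative, not NV30 docs ).

def fx12(x):
    step = 1.0 / 1024.0                      # 1 sign + 1 integer + 10 fraction bits
    return max(-2.0, min(2.0, round(x / step) * step))

def shade(z, hellishness, lit, tex):
    # FX path: hue factor from depth - MUL, MUL, then the MAD-style minimum.
    hue = fx12(fx12(z * 0.5) * hellishness)  # two FX12 MULs
    hue = fx12(hue + 0.1)                    # MAD: add a minimum redness
    # FP path: lighting stays in floating point ( plain Python float here ),
    # running in parallel with the FX work on real hardware.
    out = fx12(lit * hue)                    # FX MUL: combine lighting and hue
    return fx12(out * tex)                   # FX MUL: combine with texturing

# The FX12 step ( ~0.001 ) is below the 8-bit display step ( 1/255 ~ 0.004 ),
# so the hueing loses nothing visible on screen.
print(shade(z=0.8, hellishness=1.5, lit=0.9, tex=0.7))
[/code]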

As for complex ops, the NV30 is immensely superior to the R300 when using SINCOS. In many other cases though, such as LRP, the R300 is a lot superior. But then again, complex ops aren't the main part of a program, generally. But yes, noting it *is* important.

Thus, IMO:

1. Having native FX functionality makes sense
2. But nVidia's "2 times more FX than FP power" ratio is simply bad - there should be more FP power than that!
3. The NV30 is superior to the R300 in about 25% of well-programmed shaders ( using correct intermixing and keeping near perfect quality )
4. The R300 is superior to the NV30 in 75% of cases, thus.

The good news, though, is that by doing a few minor changes to the architecture ( a better FX/FP ratio, for example ), you could get something which is superior to the R300 in at least 70% of cases. That is, if you can use intermixing.
The questions, thus, are:
1. Will DX9.1 support intermixing FP & FX?
2. Will the NV35 have a better FP/FX ratio? ( = more FP power, maybe less FX power, maybe something like 6/6 instead of 4/8 - or even better, 8/8 :) )


Uttar
 
Uttar said:
Damn, typed a long reply and I closed the window by error... Gotta start again, and going to summarize most of my thoughts because I don't feel like retyping everything...

demalion said:
Uttar said:
The NV30 is superior to the R300 if the following is true:
What is "superior"? Offering more features, or more speed? It doesn't seem to do both at the same time.
More speed.

Superior PS 1.3 speed doesn't seem very significant to me. Do you have a reason to propose it is? Again, actual shader workloads seem to have the nv30 barely edging out the R300, or losing, and that is with lower quality and at higher clock speeds. For quality output, and calling fp16 equivalent to fp24, it does not offer superior speed even at Ultra clock frequency.

1. It is superior if you remove dynamic range and precision data during operations? Isn't that a bit contradictory to the label "superior" and to proposing shader length advantage?

Most operations don't need that much range. And FX12 still gives much better range than the screen's 8-bit.

Hmm, so do the 8500 and the GF 4.
That still didn't answer my question: isn't that contradictory to the label "superior"? Focusing on your stipulation of speed alone, keep in mind that only in a case with no texture ops, no scalar ops, and no bandwidth limitation does it compete on a clock-for-clock basis with the 9700. In the general case, it seems to have performance parity when doing integer versus the R300's fp24.

Doing operations without texture ops and without PS 2.0 functionality, and using no scalar operations and 4 component vector ops (where it seems it is indeed faster or the same speed, even per clock) seems an exceedingly rare case compared to all the situations in which it is not faster per clock, or even faster at the higher clock. Again, the R300 has advantages as well, and they don't seem very rare.

Where is the superiority in association with your proposition of "most operations"? That "most operations" seems contradictory with even your speed focused definition of superiority.

It's not THAT hard, now, is it?

You usually make sense, Uttar, but I don't see it here. "not THAT hard" after listing a set of criteria that seems to contradict the premise of "the NV30 is superior to the R300"?

I meant to say:
"It's not THAT hard to understand, now, is it?"
"The NV30 is superior to the R300 when, and only when:"

What am I not understanding? You haven't said anything here that is new, AFAICS. If I'm incorrect in that, please point it out.

It is also a problem because it offers inferiority, both in tangible and in theoretical results, to the R300. 3dmark 03, shader benchmarks, and John Carmack's discussions all seem to support this (a 500 MHz part using a custom tailored path at reduced quality barely edging out a 325 MHz part using a generic path does not establish superiority).

Hacking something to try to have higher speed isn't the same thing as developing a path with an architecture in mind. I bet there'll be very little IQ difference with Doom 3 on a NV30.

We know the IQ difference with the nv30 path: it is reduced quality, as I said. You are, however, right AFAIK with the sentiment "very little IQ difference" for the reasons you go on to state, but my statement wasn't saying otherwise; it was addressing the comment of "superior" before you specified that speed was your criterion.

Remember also that Doom 3 was made with the GeForce 1 in mind. There was no dynamic range there. And at the same time, Doom 3 is still future-oriented in many ways - so it's unfair to say the NV30 isn't future-oriented simply because it got native FX.

Hmm...does the R300 being able to provide higher dynamic range at the same execution speed qualify as an advantage, or not? Wouldn't that make the nv30's speed "superiority" less future oriented than the R300's? With your criteria established, it seems to me that this statement can be objectively made and substantiated.

The term "future oriented" without comparison is a more subjective evaluation...I don't think a focus on fx12 is future oriented at all, and I think the implementation decision serves to negate the usefulness of the concurrent advantages it can offer, most especially when evaluating performance. However, we were discussing it comparitively to the R300 in specific, not making an "unfair" statement that the "NV30 isn't future-oriented simply because it has native FX12 support".

Where did this jump from "nv30 is competitive when using integer" to "nv30 is superior" come from all of a sudden? The support seems to be predicated on a theoretical situation and ignoring factors outside of that case.

*I* never said the NV30 is superior. I said it is superior WHEN specific conditions are met.

Heh, and I discussed those conditions before making my statement, so I'm not sure why you complained. I guess what I'm wondering is more accurately described as the "the validity of" the statement that it is superior, even with your stipulation.

Those conditions are mostly met in theoretical cases, but they are also met to a lesser extent in many practical cases, making the NV30 very slightly superior to the R300 in those cases.

Hmm...I'm still missing the "many practical cases".

Most of the time, though, I'd still say the R300 is superior to the NV30.

Does that mean "many many" cases? "many more"? If the R300 is superior most of the time (and what seems established is that the superiority is not slight) what are you arguing against? AFAICS, no one was saying the NV30 could never outperform the R300 in select cases with a clock speed advantage, so why are you arguing against "never"?

I don't know, it disturbs me when I can't make objective sense out of your comments and context.

Using fp16, it is still slower than the R300. If it is using intermixed integer ops, and thereby freely dropping the advantages of fp16 in intermixing ops, that performance parity can indeed be somewhat addressed in a realistic workload.

The low FP16 speed is normal, since the NV30 was made with intermixing in mind.
Also, remember that you could do two different types of independent things in a fragment shader. Let me give you an example:

Imagine you'd want to do lighting, but at the same time you'd want things which are far from the screen to look more red for whatever strange reason ( for example, you'd want to make the user think he's getting nearer and nearer to hell, even though he really isn't - some type of evil illusion )

You may want to do lighting with FP, 16 or 32 depending on needs, to have higher quality on that one. But the hueing depending on Z only needs FX, really.
So, you'd multiply the Z value by a given factor in FX. Then eventually remultiply by how "hellish" the area is. Then add ( thus using MAD ) a minimum.

Then, lighting would be done in parallel ( although it'd probably be finished after the FX stuff ) in FP, and you'd do a FX MUL to combine both results. Then you'd do yet another FX MUL to combine it with texturing. And there you go: FP quality lighting, while using some FX. It would look exactly the same, since hueing here doesn't need FP and FX12 has sufficient range not to lose quality on an 8-bit display.

Yep, and I recognized the "integer finishing" potential somewhere earlier in the threads associated with these results, and also mentioned it above with "In order to have a performance advantage you have to give up the full precision. If you calculate at full precision and then compute the result at integer precision, you give up the full precision (or negate your proposed performance advantage)."

But, how long a shader is that? The issue is that you're looking at the clock cycle in isolation...your color is fine for output to the screen, but the precision for continued shading has disappeared, and the factors for prior shading are ignored.

The GF FX could do the lighting calc at fp16, the Z multiply at FX12, for 4 pixels in 1 clock.

The R300 could do the lighting calc at fp24, the Z multiply, and the additional color at fp24, for 8 pixels in 2 clocks.

But the R300 could also have done a texture op for both clocks if it needed to, where the GF FX would have needed a separate clock cycle to get texture data.

EDIT: changed component expectations for lighting calc.

AFAICS, ignoring getting the texture data and bandwidth penalties (:shock:), the nv30 would have per-clock parity.
That does not seem to present an advantage for intermixing, but a case where the disadvantages of the architecture can be ameliorated by limiting options.
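
To put rough numbers on that (a back-of-envelope Python model, using the pipe counts and issue assumptions stated above rather than anything measured):

[code]
# Back-of-envelope model of the per-clock figures above (assumed pipe
# counts and issue rates from this post, not measurements).

def pixels_per_clock(pixels, clocks):
    return pixels / clocks

# Shader math only (no texture fetch):
nv30 = pixels_per_clock(4, 1)   # fp16 lighting + FX12 Z-mul, 4 pixels/clock
r300 = pixels_per_clock(8, 2)   # same work at fp24, 8 pixels per 2 clocks
print(nv30, r300)               # 4.0 vs 4.0 -> per-clock parity

# Add a texture fetch: the R300 co-issues it in the same 2 clocks, while
# (per the argument above) the nv30 spends an extra clock on it.
nv30_tex = pixels_per_clock(4, 2)
r300_tex = pixels_per_clock(8, 2)
print(nv30_tex, r300_tex)       # 2.0 vs 4.0 -> 2x per-clock lead for R300
[/code]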

As for complex ops, the NV30 is immensely superior to the R300 when using SINCOS.

It is phrasing like "immensely superior" that I don't get. Look at the PS 2.0 results for 3dmark03 compared between the R300 and the NV30...it is using the PS 2.0 "sin" function. Comparing image quality and performance results, what is the "immense superiority"?

In many other cases though, such as LRP, the R300 is a lot superior. But then again, complex ops aren't the main part of a program, generally. But yes, noting it *is* important.

Thus, IMO:

1. Having native FX functionality makes sense
2. But nVidia's "2 times more FX than FP power" ratio is simply bad - there should be more FP power than that!

In which case it would be fast enough to use for FX functionality, which would seem to contradict 1. When comparing it to the R300, I just don't get your 1 and 2.

3. The NV30 is superior to the R300 in about 25% of well-programmed shaders ( using correct intermixing and keeping near perfect quality )
4. The R300 is superior to the NV30 in 75% of cases, thus.

Numbers From Nether Regions! Ack! I don't get your basis for these assertions.

The good news, though, is that by doing a few minor changes to the architecture ( a better FX/FP ratio, for example ), you could get something which is superior to the R300 in at least 70% of cases. That is, if you can use intermixing.

"Superior" at a higher clock speed in 70% of the cases? Hmm...why not design like the R300, so when at a higher clock speed it would be better than the R300 in 100% of cases, not just when "intermixing"? I guess ATi will be the only ones doing that for a while. :-?

The questions, thus, are:
1. Will DX9.1 support intermixing FP & FX?

Lower the quality of the spec to integer to go with...longer shaders and more complex shader operations...? I'm still finding that proposition a fundamental contradiction.

2. Will the NV35 have a better FP/FX ratio? ( = more FP power, maybe less FX power, maybe something like 6/6 instead of 4/8 - or even better, 8/8 :) )

What use is the second "8" (even granting it as a suitable label)? Why not have just the first 8, at high clock speeds, so you can use the other chip real estate for something other than depending on lowered quality (and sitting idle when not using lowered quality)? If the first is 8 instead of 4 (even if just fp16), you could maybe even use it for integer ops by itself. :oops:

Can't I call the R300 8/8 for its texture op functionality, directly comparable to your example when doing texture ops, except using FP24? How about when there are scalar ops and 3 component vectors? 8/8/8? And its design has fewer transistors.

I don't see the superiority in depending on FX12 to offer optimal performance instead of designing for optimal performance for varied shader workload (and at FP24).
 
I guess you still don't understand the nature of higher precision and range.

First, let's take a look at FX12. FX12 is a 12-bit integer format clamped to [-2, 2]. This means that FX12 will work just fine as long as no number in the calculation goes above twice the maximum output brightness, and no number is multiplied by more than 4.

Of course, there's always the possibility of recursion errors, but since FX12 has an additional 2 bits of "buffer" accuracy, with proper dithering FX12 can support a reasonable number of calculations with no noticeable errors. With very long programs, FP32 would be necessary (FP16 would actually show more recursion errors).

At the same time, that doesn't even make FX12 useless for very long programs. With proper analysis of the data, places can be found where recursion errors will be reduced rather than magnified, making it acceptable to use FX12 for those cases (that also meet the conditions I listed above).
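
To show what I mean, here's a quick Python sketch of rounding build-up over a chain of dependent ops, modeling FX12 as 12-bit fixed point in [-2, 2] (an assumption on my part, not measured NV30 behavior):

[code]
# Quick model of recursion-error build-up: an assumed FX12 format
# (12-bit fixed point in [-2, 2]) versus a float reference, with 1/255 as
# the threshold where an error becomes visible at 8-bit output.

def fx12(x):
    return max(-2.0, min(2.0, round(x * 1024.0) / 1024.0))

ref = acc = 0.83
for i in range(1, 33):
    ref = ref * 1.02 + 0.001            # reference chain of dependent MADs
    acc = fx12(acc * 1.02 + 0.001)      # same chain, rounded to FX12 each op
    if abs(ref - acc) > 1.0 / 255.0:    # visible at 8-bit output
        print(f"visible error after {i} dependent ops")
        break
else:
    print(f"within 8-bit tolerance after 32 ops (err={abs(ref - acc):.5f})")
[/code]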
 
Chalnoth said:
I guess you still don't understand the nature of higher precision and range.

Well, that's a useful alternative to addressing the post full of questions to you above.

Let's see, I said FX12 was a problem for precision and dynamic range, and seemed to contradict the usage of long shaders and complex shader instruction calculations.

First, let's take a look at FX12. FX12 is a 12-bit integer format clamped to [-2, 2]. This means that FX12 will work just fine as long as no number in the calculation goes above twice the maximum output brightness, and no number is multiplied by more than 4.

Which seems to bring us back to a lack of dynamic range, which is why I mentioned it in that post to you above.

Of course, there's always the possibility of recursion errors, but since FX12 has an additional 2 bits of "buffer" accuracy, with proper dithering FX12 can support a reasonable number of calculations with no noticeable errors. With very long programs, FP32 would be necessary (FP16 would actually show more recursion errors).

There is also fp24, just to remind you, and it doesn't require being clamped to FX12 range to perform at speed, though you do seem to be just confirming the issues I mentioned. Also, doesn't the R200/RV250's [-8, 8] range look pretty favorable in the context of FX12?

However, you've still failed to relate this set of limitations to support for your assertion of superiority in the face of the commentary I've provided, and my related questions detailed in the prior reply (to you) remain unanswered.

At the same time, that doesn't even make FX12 useless for very long programs. With proper analysis of the data, places can be found where recursion errors will be reduced rather than magnified, making it acceptable to use FX12 for those cases (that also meet the conditions I listed above).

Did you just spend a post saying "FX 12 is indeed inferior to what the R300 offers, in the ways you mentioned, but since you can work to counter some of these problems with careful planning, you show you don't understand by mentioning the problems in the first place"?
 
At the same time, that doesn't even make FX12 useless for very long programs. With proper analysis of the data, places can be found where recursion errors will be reduced rather than magnified, making it acceptable to use FX12 for those cases (that also meet the conditions I listed above).

Why go to all the trouble of analysis? The R300 doesn't need it.
 
demalion said:
Which seems to bring us back to a lack of dynamic range, which is why I mentioned it in that post to you above.
Dynamic range is only useful if it's used.

And this is the point. For any shader where FX12 is used frequently enough (and properly...so as not to decrease output quality), the NV30 architecture will be faster than the R300 architecture.

This is why without precise knowledge of which shaders game developers are apt to use, you have no authority to state that the R300 architecture is unequivocally better at shader ops than the NV30 architecture. You only have your personal preference.

Additionally, I'm not talking about sacrificing anything here. I'm stating that with smart programming with specific shaders, the NV30 can be superior, while providing the same image quality as the R300. The only remaining question is which shaders game developers will actually use.

Once again, the lack of precision and dynamic range is not a problem if the output is the same. The output will be the same (for all intents and purposes) if FX12 is used in parts of many shader programs (exactly how many I don't have authority to comment on, but statements from game developers seem to indicate that it's enough to outdo the R300).

As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.
 
Saem said:
Why go to all the trouble of analysis? The R300 doesn't need it.
Because the NV3x line will certainly take a significantly larger marketshare, particularly with the NV34 product.
 
Dynamic range is only useful if it's used.

It comes for free on the R300. If widgets on a toy come for free, they'll be used; nvidia is effectively screwing the industry & consumers by offering an FX mode.

As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.

I pity nvidia then. :D

Because the NV3x line will certainly take a significantly larger marketshare, particularly with the NV34 product.

Who says?
 
Chalnoth said:
Additionally, I'm not talking about sacrificing anything here. I'm stating that with smart programming with specific shaders, the NV30 can be superior, while providing the same image quality as the R300.

I have the impression that games developers, and their publishing houses, aren't really all that concerned about delving into the architectures to actually produce something that gives them both speed and decent IQ - why should they need to when a balanced architecture, such as R300, gives them that for free with no extra time spent on programming? I suspect that the 'smart' programming you talk of will mainly be done inside NVIDIA's walls, be it by their dev rel or driver team.

As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.

You talk about not sacrificing anything first, and then say that? Doesn't gel.
 
Chalnoth said:
As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.

I share that 'feeling'. Apparently speed on NV3X comes not so much from the use of FP16 over FP32, as initially thought, but from going int12 over FP. Spicy. 8)
 
Chalnoth said:
demalion said:
Which seems to bring us back to a lack of dynamic range, which is why I mentioned it in that post to you above.
Dynamic range is only useful if it's used.

Hmm...yeah, who'd want to use it, right?

And this is the point. For any shader where FX12 is used frequently enough (and properly...so as not to decrease output quality), the NV30 architecture will be faster than the R300 architecture.

No, it won't, and I've addressed this already.

Your 50% per clock advantage for NV30 depends on: limiting range and precision to FX12 AND no opportunities for 3 component vector ops and scalars AND no texture ops AND no bandwidth limitation AND limitation in instructions utilized.

You're basing your support on this?

Without all of these, and only when using FX12 freely, it can SOMETIMES have IPC parity with the R300, as even one of them falling through removes the advantage.

In contrast, the R300's 100% per clock advantage (a situation you seem intent on ignoring) depends on having any 2 of the above fail (besides bandwidth limitation...the impact of that will vary), and it isn't limited to a maximum of 100% per clock advantage.

And all of this is discussing the NV30 using FX12 versus the R300 using FP24, so it is even granting all the gymnastics you seem to be putting forth above as valid, which would still not make it superior, because the quality and featureset limitations are significantly below that of what the R300 is offering.

So, this is the NV30's superiority?

This is why without precise knowledge of which shaders game developers are apt to use, you have no authority to state that the R300 architecture is unequivocally better at shader ops than the NV30 architecture. You only have your personal preference.

Oh, so only you have the authority to say a card is superior then? :oops:
I didn't say the R300 was "unequivocally better", I said the NV30 was inferior in a wide range of circumstances and considerations (backed up by facts to the best of my knowledge), and recognized your stipulation of higher speed (in what seem to me to be rather restricted circumstances, and at significantly lower quality) while pointing out that the R300 has opportunities to lead as much or more in speed (both per clock and for final output) even at tangibly and significantly higher quality than you proposed. This was not based on personal preference...are you sure you can say the same of what you continue to propose?

Your response is that higher speed at lower quality is superiority because sometimes you can hide (not eliminate) the lower quality (theoretically), with special care and planning, nevermind the impact on speed this will have.

From this, I proposed that the R300 being able to lead both in quality and speed (simultaneously and in circumstances that benchmarks and analysis seem to support are common) versus the NV30 being able to lead in one or the other (with a significant sacrifice of the remaining one, and even then only in specific circumstances) does tend to supply an answer to the question that you proposed had an answer that still remained to be seen.

Additionally, I'm not talking about sacrificing anything here.
Yes you are.
I'm stating that with smart programming with specific shaders, the NV30 can be superior, while providing the same image quality as the R300.
Limiting yourself to specific NV3x shaders isn't a sacrifice?
Limiting yourself to working around FX12 isn't a sacrifice?
FX12 is the "same" image quality? I guess my talking about shader length didn't happen, and ignoring range and precision issues is all that matters.

Your set of qualifications is staggering, your statements seem questionable for the reasons I've stated prior, and your dedication to proposing them seems to only make sense if you've decided to see only nv30's superiority, no matter what.
That's my opinion, and I think it is well supported.
The only remaining question is which shaders game developers will actually use.
Hmm...I don't think that question would have anything to do with your arguments for "superiority".
Once again, the lack of precision and dynamic range is not a problem if the output is the same. The output will be the same (for all intents and purposes) if FX12 is used in parts of many shader programs (exactly how many I don't have authority to comment on, but statements from game developers seem to indicate that it's enough to outdo the R300).
:oops: You don't think your statement is a bit convoluted and special case?
The only thing new here is "statements from game developers seem to indicate that it's enough to outdo the R300", and I'm wondering if you could elaborate on that, please.
As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.
The "Dawn of Cinematic Computing"? :-?
 
Chalnoth said:
Saem said:
Why go to all the trouble of analysis? The R300 doesn't need it.
Because the NV3x line will certainly take a significantly larger marketshare, particularly with the NV34 product.

Hmm...with your definition of "advanced shading", please don't forget that the 8500/9000/9200 seem to be as applicable as the NV34.

Well, if you're using DX or OpenGL, that is.
 
Heathen said:
It comes for free on the R300. If widgets on a toy come for free, they'll be used; nvidia is effectively screwing the industry & consumers by offering an FX mode.
Higher precision also comes for free on the NV30, though not on every instruction, unfortunately.
 
DaveBaumann said:
I have the impression that games developers, and their publishing houses, aren't really all that concerned about delving into the architectures to actually produce something that gives them both speed and decent IQ - why should they need to when a balanced architecture, such as R300, gives them that for free with no extra time spent on programming? I suspect that the 'smart' programming you talk of will mainly be done inside NVIDIA's walls, be it by their dev rel or driver team.
Very possible, but remember that nVidia still has the larger marketshare, and the release of a low-cost DX9 part (NV34) will essentially guarantee that nVidia will have more DX9 parts than ATI.

As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.
You talk about not sacrificing anything first, and then say that? Doesn't gel.
So? I'm talking about two totally different situations. One is developers who are trying to get the best performance and visual quality for their games. The other is Microsoft, who is somehow against any integer operations in PS 2.0. Personally, I think nVidia has the right to turn around and use _pp in a different way than it was intended, since theirs is the only hardware that will make any use of it currently. The only developers who will be using the hint (except Futuremark...) will be using it for increased speed on nVidia hardware, though it won't allow nVidia to get the register number boost from going for FP16 on other calcs. The stipulation here is, of course, that nVidia informs developers what's going on.
 
Chalnoth said:
So? I'm talking about two totally different situations. One is developers who are trying to get the best performance and visual quality for their games. The other is Microsoft, who is somehow against any integer operations in PS 2.0. Personally, I think nVidia has the right to turn around and use _pp in a different way than it was intended, since theirs is the only hardware that will make any use of it currently. The only developers who will be using the hint (except Futuremark...) will be using it for increased speed on nVidia hardware, though it won't allow nVidia to get the register number boost from going for FP16 on other calcs. The stipulation here is, of course, that nVidia informs developers what's going on.

No, that's how really stupid things start to happen, and how specs get fragmented - by short sighted arguments like this.

The spec is there to define how things should behave - having the first hardware that implements part of the spec does not give you the right to alter that spec to suit yourself. Microsoft control the specification, not nVidia or any other IHV, which is as it should be - it's hard to have a competitive market in which one of the competitors controls the specification and can redefine it as they choose.

There have been examples in the past where the spec says how things are supposed to be, but the de facto standard ends up being whatever a particular IHV ended up implementing. Then other IHVs come along and implement things correctly and get 'bugs' in existing software, because people have written their software expecting the other IHV's incorrect behaviour - and the really daft thing is that the IHVs that implement things correctly are the ones who get blamed for driver bugs.

It's stupid.

In the past this sort of thing has mainly been caused by the spec being insufficiently rigorous. In the case of the _pp hint it's reasonably clear - the value must be float and at least S10e5.

An IHV has no right to do things differently and then claim 'We were here first'.

The spec was here first. The only fixed point mentioned anywhere in the PS2.0 spec is the minimum defined precision of the colour interpolator inputs. All pixel shader operations must be float. Any hardware or driver that doesn't use float for all operations in a PS2.x shader is doing things wrong.
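
To illustrate, here's a small Python sketch using numpy's float16 as a stand-in for S10e5, against the same assumed [-2, 2] FX12 model discussed earlier in the thread:

[code]
# A value that legitimately leaves [-2, 2] mid-shader - e.g. an over-bright
# light accumulation scaled back down before output. S10e5 handles it; an
# FX12 substitution clamps it and changes the answer.
import numpy as np

def fx12(x):
    return max(-2.0, min(2.0, round(x * 1024.0) / 1024.0))

intermediate = 6.5
pp = float(np.float16(np.float16(intermediate) * np.float16(0.125)))
fx = fx12(fx12(intermediate) * 0.125)    # 6.5 clamps to 2.0 first

print(pp)   # 0.8125 - correct at _pp (S10e5) precision
print(fx)   # 0.25   - wrong: the clamp destroyed the intermediate
[/code]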
 
Chalnoth said:
...I think nVidia has the right to turn around and use _pp in a different way than it was intended, ...The stipulation here is, of course, that nVidia informs developers what's going on.

Are you really serious?

Just to add to andypski's post:

Is nVidia going to pay for any other IHV's driver development when they have to code work-arounds for software devs' programs that are based on nVidia's wrongful implementation? To even suggest that nVidia could wrongly implement a spec is one thing, and on top of that to suggest the only "stipulation" be that nVidia informs devs about it...just displays your lack of understanding about how doing such things impacts the industry.

If nVidia really wants certain capability of their hardware to be utilized that DirectX (and even GL proper, I believe) doesn't expose, then nVidia has a LEGAL alternative. Push the use of their own proprietary extensions for GL.

Developers, on a case by case basis, will then have a choice to decide for themselves on whether they want to expend the resources to cater to NV30's architecture.
 
Higher precision also comes for free on the NV30, though not on every instruction, unfortunately.

So why the performance drop in higher precision modes, or Nvidia's desire to go for FX12? If it was for 'free' as you claim, they would never have offered integer modes. Or is it like their claim of free AA? Complete PR Crudstunk.

All the NV3X architecture seems to be is another engineering kludge, poorly thought out and poorly implemented, and Nvidia seems hell bent on fracturing the DX standard with their own 'personal' implementation. Hey, I suppose MS could officially declare the NV30 cards 'non-compliant' until they follow the clearly laid down guidelines.
 
Heathen said:
Hey, I suppose MS could officially declare the NV30 cards 'non-compliant' until they follow the clearly laid down guidelines.

Seen any recent WHQL drivers from Nvidia for the NV30? No, didn't think so.
 