http://www.digit-life.com/articles2/gffx/nv31-nv34.html#p7
Here's a benchmark using RightMark comparing FXs to Radeons.
I was talking about RightMark having a render path that prefers "native" extensions as opposed to ARB ones. This wasn't Cg specific, this wasn't even shader-specific. What about vertex buffer extensions?

demalion said:
Xmas, you are losing me here. We're talking about Cg...Cg doesn't compile to ATI-specific extensions, nor is it likely to as long as ATI is committed to the standards Cg bypasses and instead chooses to do things like design their hardware for more computational efficiency (pick some shader benchmarks comparing the cards, analyze the results, and tell me if you'd dispute even that).
... I still can't see how having the clearly-labeled choice of supporting vendor-specific extensions (which I don't think are NVidia only) can possibly be a bad thing.

demalion said:
Hmm... OK, let's go over this again:
There are "synthetic" tests. I only noticed files for DirectX at the moment, but I presume they intend to implement them for OpenGL. When/if they do, of course it is not bad if they support all OpenGL extensions, but that is not at all what we are discussing. That follows...
There are "game" tests. They are implemented using Cg, period. The only optimizations this delivers are nVidia's prioritized compilation, and only for the targets nVidia has defined. So, in addition to restricting the standards to the ones nVidia has an interest in exposing (which excludes the ATI fragment extension for OpenGL and PS 1.4), the only vendor-specific optimizations offered are the ones provided by nVidia, for nVidia hardware.
I don't think they need "solutions", but supporting DX9 HLSL in parallel would certainly be nice to see.

demalion said:
Solutions, covered again:
The authors of Rightmark 3D support DX 9 HLSL (ignoring for the moment that the shaders in the .cg files seem short and able to be ported to the ATI fragment extension for OpenGL and the 8500/9000). This is what I referred to earlier as being as much as could reasonably be expected with limited resources (i.e., flawed, but potentially useful).
Other IHVs circumvent GLslang and DX 9 HLSL, whose specifications they happen to have a known say in, and support Cg on nVidia's terms. There does appear to be some reason for them to consider this undesirable; do you disagree?
demalion said:
Yeah, and APIs only specify what kind of programs are valid, not what kind are good programs. Your statement is misleading, since you are proposing that there is no relation between the specification of the instructions and modifiers and, subsequently, what is a good method of executing them. That is simply not true.

Sorry, but the DX9 PS 2.0 spec only says what kinds of shaders are valid shaders, not what kinds of shaders are good shaders. So when writing a shader you have lots of options, but no indication of whether you're on the right track or not.

demalion said:
To the DX 9 PS 2.0 spec, as I specifically stated.
For instance, the DX 9 PS 2.0 specification does not specify that two non-texture-op instructions from a restricted set occur for each PS 2.0 instruction or texture op. This is a possible occurrence, but depending on it to occur for optimal performance is not a good method of execution. However, that is a characteristic of Cg's known target, and it is this that nVidia is using Cg to control for shader content creation, during development and at run time, before the low level optimizer has to worry about it. They promote that precompiled code be generated using Cg, even if there is another alternative, for that reason...but you maintain this does not matter despite evidence and analysis proposed to the contrary.
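To make the kind of ordering at issue concrete, here is a rough sketch of my own (purely hypothetical, not taken from RightMark or from any compiler's actual output): two equally valid ps_2_0 sequences of the same length that interleave texture and arithmetic ops differently. Whether one runs faster than the other is a property of the hardware and driver, not something the PS 2.0 spec ranks.

Code:
// Hypothetical ps_2_0 fragment, variant A: texture loads grouped together.
ps_2_0
dcl t0
dcl t1
dcl_2d s0
dcl_2d s1
texld r0, t0, s0
texld r1, t1, s1
mul r0, r0, c0
mad r0, r1, c1, r0
mov oC0, r0

Code:
// Variant B: same five instructions, same result, but arithmetic interleaved
// between the texture ops. Both are valid PS 2.0.
ps_2_0
dcl t0
dcl t1
dcl_2d s0
dcl_2d s1
texld r0, t0, s0
mul r0, r0, c0
texld r1, t1, s1
mad r0, r1, c1, r0
mov oC0, r0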
We are talking about a high level compiler, Xmas. What do you think Cg is using to output to PS 2.0? You are circumventing the simple observation that compilers produce different code, by saying you are only talking about the code.

And you continue to talk about different targets while I am only talking about PS 2.0 assembly, and not about anything to aid high level compilers.

demalion said:
You continue to maintain that simplification, despite my having tried to point you to examples of that not being the case. Again, looking at some benchmarks, notice instruction count parity (including when targeting the same assembly) and yet performance differences even for the same datatype. Also notice the recommendations to aid Cg compilation at the end, which the Cg compiler itself could (and likely will) do, even though the concerns are unique to the nv30, not all architectures.
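As a made-up illustration of "instruction count parity, different performance characteristics" (my sketch, not compiler output): the two ps_2_0 bodies below are the same length and compute the same value, but the second keeps three temporaries live instead of two, which is exactly the kind of difference thepkrl's register measurements (quoted further down) suggest the nv30 is sensitive to.

Code:
// Hypothetical ps_2_0 shader, two temporaries (r0, r1) live.
ps_2_0
dcl t0
dcl_2d s0
texld r0, t0, s0
mul r0, r0, c0
add r1, r0, c1
mul r0, r0, r1
mov oC0, r0

Code:
// Same instruction count and same result, but three temporaries (r0, r1, r2).
ps_2_0
dcl t0
dcl_2d s0
texld r0, t0, s0
mul r1, r0, c0
add r2, r1, c1
mul r0, r1, r2
mov oC0, r0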
Why didn't you address my instruction example at all? If it has flaws, it would be useful to discuss where I went wrong. If it doesn't, it seems more than slightly applicable.
You can write bad code using PS 2.0, you can hand code bad code, you can compile bad code in a compiler. The problem with Cg is that the bad code that the compiler will be improved to avoid is defined by nVidia. The benefit of DX 9 HLSL, as long as MS doesn't have a conflict of interest, is that all IHVs have input into that definition (without having to write a custom compiler at nVidia's request).
A good set of base principles can be reached and targeted without conflict, except for designs that are a bad match for the general case base specifications of the LLSL. It so happens that the nv30 is a bad design for all specifications I know of except nVidia's customized OpenGL extension and their formerly (?) bastardized PS 2.0, yet they want other IHVs to have to also create another HLSL compiler ...for nVidia's toolset ...or allow nVidia to dictate the code their low level optimizers have to be engineered to address effectively.
You maintain there is nothing wrong with things occurring along the lines of this goal.
I'll reply to that in a separate post.

demalion said:
btw, arbitrary swizzle is an optional feature of "PS2.x" and if a compiler would output code that uses it, it would not work on PS2.0 cards at all.
Is this in reference to the sentence after my example shader code? Thanks for the correction, but what about the rest of what I proposed?
demalion said:
Well, that seems to fit my second description, quoted alone for brevity. I already provided an answer:

demalion said:
I think the second is wrong, though it does seem to be what nVidia is intent on accomplishing.
And expanded upon it (why didn't you address this the first time?):
I would also call R300 the better design. However I do not think a design that I would consider even better would have to be equally "general case compatible" and not sensitive to optimizations.

demalion said:
Well, if you look at the recommendations, consider that they are nv30-specific recommendations that nVidia has an interest in having the Cg compiler handle, regardless of the impact on low level optimization workload for other IHVs. Then consider that some might not impact other IHV hardware negatively...that there is a conceivable difference between those that do and do not, despite what you have proposed. Then apply that thinking to some of the established unique attributes of the nv30.

(btw, I don't know why you mention pocketmoon's Cg recommendations here, as I'm not talking about how you should write a HLSL shader)
You might then try doing the same with the R300, and realize how little divergence from general case specification expression its performance opportunities require.
I call that the result of a "better design".
But that's a reason why I think the "integrated" concept of GLslang is much better suited to shader programming.
I certainly do not think HLSL optimization is not significant. However I also don't think there is such a thing as "the general case". There are shaders that run well on R300, others run well on both R300 and NV30, and others run best on NV30.

demalion said:
The difference is that an 8x1 is full speed when handling any amount of textures, and a 4x4 architecture is only full speed when handling multiples of 4. I get that out of your text because code optimized for the nv30 is exactly what you are saying the R300, for example, should be able to handle, rather than the nv30 having to handle code generated to provide general-case opportunities (I've even provided a list, for discussion, of optimizations I propose are common).
Anyways, pardon the emphasis, as we've been circumventing this idea by various paths in our discussion a few times now:
The base specification that allows applying 1 or more textures to a pixel is not optimized for 1 TMU per pipe; rather, 1 TMU per pipe is optimized for efficiency for the base specification.
Your assertion that the DX 9 HLSL output is just as unfair as Cg's output is predicated on a flawed juxtaposition of that statement with one that parallels the situation with nVidia's hardware (try replacing "1" with "4"...that reflects what you'd have to say to support a similar assertion about Cg and DX 9 HLSL, and it does not make sense, because that statement excludes handling fewer than 4 textures efficiently, while this statement does not exclude handling 4 or more...).
This TMU example pretty closely parallels several aspects of the difference between the shader architectures in question. You continue to propose that, because the specification allows expressing code in a way that is optimized for the architecture with the more specific requirements, it is fine to optimize code for the specific case instead of the general case, despite discussion of how such optimization can hinder the performance of other architectures. The cornerstone of this belief seems to be that this is fine because the hardware designed for the general case should be able to handle any code thrown at it at all (why bother to design something that performs well in the general case anyways?), rather than the hardware designed for the specialized case having to seek opportunities to address its own special requirements.
You proclaim that the idea of HLSL optimization for DX 9 is not significant, so it doesn't matter what optimization is used, and that continues to make no sense to me.
More later, sorry. Got to go to a birthday party
Ichneumon said:
Ante P said:
Fine, use Cg for a benchmark, but then they shouldn't claim to be so independent; the whole site sorta contradicts the fact that they are using a vendor-specific compiler which nVidia themselves say outputs nV-optimized code.
And has no support for PS 1.4 at all.
Now I'm pretty puzzled. A 4-component fp addition takes 2 clock cycles on NV30? This contradicts everything I've heard so far about NV30 FP ALUs. In fact, results from thepkrl seem to indicate that it always takes one clock cycle.

demalion said:
Here are some observations:
For some operations, the nv30 can execute two 2-component fp ops in the same time as it can execute one 4-component op. It can also arbitrarily swizzle. It should also be able to use its FX12 units legally in specific instances (like when adding a valid FX12 constant to values just read from a 12-bit-or-less integer texture format).
This might be optimized uniquely for the nv30 when two of four constants are valid FX12 constants such that (this is for an integer texture format, and please forgive the amateurish nature):
for everyone:
Code:
textureload (r,g,b,a)->xtex
add xtex, (float32,float32,int12,int12)->simpleeffect
2 cycles for a card that can perform well in the general case

nv30 executing this unoptimized for fp32:
Code:
*-textureload (r,g,b,a)->xtex
**addf xtex, (float,float,int,int)->simpleeffect
3 clock cycles
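If I'm reading that pseudocode right, the nv30-specific version would split the single 4-component add by write mask so that part of the work can go to the cheaper units. A hypothetical ps_2_0-style rendering of that idea (my sketch only; the constants, the texture layout, and the assumption that _pp is a sufficient hint for the driver are all invented for illustration):

Code:
ps_2_0
dcl t0
dcl_2d s0                      // assumed 12-bit-or-less integer-format texture
def c0, 0.37, 0.62, 0.0, 0.0   // two full-precision float constants in .xy
def c1, 0.0, 0.0, 2.0, 3.0     // two small-integer (FX12-safe) constants in .zw
texld r0, t0, s0
add r0.xy, r0, c0              // 2-component fp add
add_pp r0.zw, r0, c1           // 2-component add flagged partial precision,
                               // which an nv30-style part could route to cheaper units
mov oC0, r0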
Arbitrary swizzling is an optional PS2.x feature. A driver that does not support it would not accept code with arbitrary swizzling.

demalion said:
Swizzling furthers the optimization opportunities that could be explicitly stated in the LLSL without adversely affecting the nv30, but hindering the performance for others. So do modifiers.
Xmas said:
First, sorry demalion for picking some points to reply to and ignoring others, but I wrote that post at 5am and wanted to just write some answers I could think of in my tired state before going to bed
I was talking about RightMark having a render path that prefers "native" extensions as opposed to ARB ones. This wasn't Cg specific, this wasn't even shader-specific. What about vertex buffer extensions?

demalion said:
Xmas, you are losing me here. We're talking about Cg...Cg doesn't compile to ATI-specific extensions, nor is it likely to as long as ATI is committed to the standards Cg bypasses and instead chooses to do things like design their hardware for more computational efficiency (pick some shader benchmarks comparing the cards, analyze the results, and tell me if you'd dispute even that).
RightMark supports PS1.4 in some tests. So it could as well support ATI's fragment shader extension in OpenGL.
I was at a loss here, as I've abundantly pointed out to you that there is no choice, and just a short while ago you seemed to realize that there wasn't one, but then I realized you said "(which I don't think are NVidia only)", and must not have read the quoted text carefully.

... I still can't see how having the clearly-labeled choice of supporting vendor-specific extensions (which I don't think are NVidia only) can possibly be a bad thing.

demalion said:
Hmm... OK, let's go over this again:
There are "synthetic" tests. I only noticed files for DirectX at the moment, but I presume they intend to implement them for OpenGL. When/if they do, of course it is not bad if they support all OpenGL extensions, but that is not at all what we are discussing. That follows...
There are "game" tests. They are implemented using Cg, period. The only optimizations this delivers are nVidia's prioritized compilation, and only for the targets nVidia has defined. So, in addition to restricting the standards to the ones nVidia has an interest in exposing (which excludes the ATI fragment extension for OpenGL and PS 1.4), the only vendor-specific optimizations offered are the ones provided by nVidia, for nVidia hardware.
One thing to note: The option is labeled "Preferred extensions". For R200 PS1.4 there's only an ATI extension AFAIK. For R300 PS2.0 there's only an ARB extension. For NV30 PS2.0 there are ARB and NV extensions.
I don't think they need "solutions", but supporting DX9 HLSL in parallel would certainly be nice to see.

demalion said:
Solutions, covered again:
The authors of Rightmark 3D support DX 9 HLSL (ignoring for the moment that the shaders in the .cg files seem short and able to be ported to the ATI fragment extension for OpenGL and the 8500/9000). This is what I referred to earlier as being as much as could reasonably be expected with limited resources (i.e., flawed, but potentially useful).
Other IHVs circumvent GLslang and DX 9 HLSL, whose specifications they happen to have a known say in, and support Cg on nVidia's terms. There does appear to be some reason for them to consider this undesirable; do you disagree?
I have to answer that as a whole....stuff I think you should read again...
The problem I'm trying to get at is, as long as you don't have any applicable metric for quality of assembly shader code except "shorter is better", how can you say one shader is better than another one?
I am particularly considering that different compilers produce different code. The point is, is there a way to tell which one is better?
Yes, when the nv30 performance characteristics deviate so widely from the base spec. That was the point of the TMU illustration. Of course, it isn't worse for the nv30, which is why Cg is not suitable as the sole means of producing benchmark shader code.

Is "NV30-optimized" code that takes certain characteristics into account worse than code without those optimizations?
Can we take performance as a metric? We can. But in which cases?
If HLSL code runs faster than equivalent-length Cg code with a driver that is optimized for "DX9 HLSL compiler style" and for nothing else, that is self-fulfilling. Of course this is also true the other way round.
What if a Cg shader runs faster on hardware A and the equivalent HLSL shader runs faster on hardware B?
Xmas said:
I think that any driver must be capable of optimizing any assembly shader code it gets, regardless of whether it was generated by Cg, the DX9 HLSL compiler, any other HLSL compiler, or coded in assembly.
I'll reply to that in a separate post.

demalion said:
btw, arbitrary swizzle is an optional feature of "PS2.x" and if a compiler would output code that uses it, it would not work on PS2.0 cards at all.
Is this in reference to the sentence after my example shader code? Thanks for the correction, but what about the rest of what I proposed?
...
I would also call R300 the better design. However I do not think a design that I would consider even better would have to be equally "general case compatible" and not sensitive to optimizations.
The distinction is that GLslang is more actively attempting to replace the low level expression of shader functionality. As I said, OpenGL does not have a strong standardized low level expression legacy to recognize (ARB fragment seems to be the intermediate step, not directly associated with GLslang evolution).

But that's a reason why I think the "integrated" concept of GLslang is much better suited to shader programming.
demalion said:
...more stuff I think you should read again...
I certainly do not think HLSL optimization is not significant.
However I also don't think there is such a thing as "the general case".
There are shaders that run well on R300, others run well on both R300 and NV30, and others run best on NV30.
More later, sorry. Got to go to a birthday party
Xmas said:....
Now I'm pretty puzzled. A 4-component fp addition takes 2 clock cycles on NV30? This contradicts everything I've heard so far about NV30 FP ALUs. In fact, results from thepkrl seem to indicate that it always takes one clock cycle.
Arbitrary swizzling is an optional PS2.x feature. A driver that does not support it would not accept code with arbitrary swizzling.

demalion said:
Swizzling furthers the optimization opportunities that could be explicitly stated in the LLSL without adversely affecting the nv30, but hindering the performance for others. So do modifiers.
What about modifiers? There are only 5 modifiers in PS2.0 (instruction modifiers: centroid, pp, sat; source modifiers: -, abs), and they should be "free" on any hardware.
flick556 said:...
a.) nVidia is moving too fast for everyone else and even the brand spanking new DirectX 9.0 does not take advantage of some of their advanced features.
b.) I hate to keep hearing people talk about how nVidia is lowering the specs. fp32/fp18 is better than just fp24, plain and simple.
...In the end I just want to see the GeForce FX do everything it says it can do and then compare it to the Radeon doing its best job.
Well, just wanted to state that I don't agree with this for reasons too lengthy to discuss in this thread right now. Prior discussions of this are available in the forum if you wish to do some searching.

I would be in favor of exclusive games for ATI and GeForce respectively, just to stop them from slowing each other down, and it looks like this is where things are headed.
muzz said:
IMO exclusive games are an indication of idiocy....... gimme a break here with that crap.
Should we have each developer coding for separate cards? Ya, that'll be real profitable and reasonably quick.....
demalion said:
For accuracy's sake, the list of specifications with different performance characteristics is FX12/FP16/FP32.
Also, I'll point out that your FP16, FP24, and FP32 discussion is self-contradicting.
For your c) and d), there are some things I would term significant inaccuracies, but you are entitled to your opinion.
flick556 said:
These standards have forced nVidia and ATI cards to be too similar; I think they should be radically different and completely incompatible with each other. That would initiate real competition, and not this oligopoly cartel that exists today, being refereed by big bad Microsoft.
nVidia is moving too fast for everyone else and even the brand spanking new DirectX 9.0 does not take advantage of some of their advanced features.
I hate to keep hearing people talk about how nVidia is lowering the specs. fp32/fp18 is better than just fp24, plain and simple.
fp32 allows better quality and fp18 allows better speed.
Part of the whole FX push is being able to use the same fp32 as movies like Shrek and Toy Story. The goal is to be able to render these movies in real time (and I'm almost 100% sure they already demoed a scene from Toy Story).
fp18 is great, and those screenshots floating around the net don't represent the difference between fp18 and fp24. I'm sure there was something far more dramatic occurring, like a bug, since the difference between these two specs is not something a human eye can detect. And even these issues were fixed with the newest drivers while still maintaining the high frame rates, so the performance boost is coming from somewhere else besides downgrading fp.
M$ chose not to implement all their features into DirectX 9.0; now that was a very mean thing to do.
They are key developers in the creation of OpenGL, they are now creating their own very capable compiler, and they're active in all types of development tools. I like the FX specifications a lot, and I can't wait for games to fully utilize them. Without Cg (most likely exporting to OpenGL, since M$ won't play nicely) these features would never get used, and those are the games I'm buying. Who in the world gave M$ or ATI the right to tell nVidia the best way to display graphics?
andypski said:
Which advanced features of nVidia's are not exposed in DX9 that you would particularly like to see?
flick556 said:
And ATI only supports one middle-range fp that does not match that of nVidia or the movie industry.
What isn't NV30 able to expose under DX9:

flick556 said:
andypski said:
Which advanced features of nVidia's are not exposed in DX9 that you would particularly like to see?
Correct me if I'm wrong, but I thought the FX's dynamic shader instructions were not supported by DirectX 9.0.
-Both 512/1024 instruction count and register count are more likely design issues than features.

MDolenc said:
andypski said:
Which advanced features of nVidia's are not exposed in DX9 that you would particularly like to see?
What isn't NV30 able to expose under DX9:
-only half (512) instructions are possible
-no pack/unpack instructions
-each fp32 register can serve as 2 fp16 registers (providing 64 fp16 registers)
-vPos in ps_3_0 model only defines x,y components while NV30 is a ps_2_0 part and defines x,y,z,w components (and ends up unexposed in ps_2_0)
I don't understand. Which delay? And where did you get that from? That's news to me.

demalion said:
Look more closely...the delay is in outputting the 4 components, so it would perform as outlined
demalion said:
What about modifiers? There are only 5 modifiers in PS2.0 (instruction modifiers: centroid, pp, sat; source modifiers: -, abs), and they should be "free" on any hardware.
Yep, but each piece of hardware has specific modifiers it could use for actual low level optimization, and the differences there mean that code generated uniquely with one architecture in mind (that would be Cg and its set of nVidia-controlled priorities) can prevent such opportunities from being easily visible.
For example, from here, we have the following list for the R300: "negate, invert, bias, scale, bias and scale, and channel replication, and instruction modifiers, such as _x2, _x4, _x8, _d2, _d4, _d8, and _sat", some of which the R300 low level optimizer might be able to use for optimization with DX 9 HLSL generated code, and which Cg generated code might easily preclude by instruction ordering changes that are valid within the spec. It is not the reordering to the advantage of the nv30 that I object to for Cg, but using that for other hardware and blithely stating no difference in optimization opportunities could result.
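To put one purely hypothetical face on the kind of opportunity meant here: a scale-by-2 written as an explicit multiply could, in principle, be folded by a driver into a free hardware scale modifier when the producing instruction is adjacent, while an equally valid instruction ordering can make that harder to spot. The register names and constants below are invented, and whether any particular driver actually performs this folding is an assumption, not a claim.

Code:
// Hypothetical LLSL fragments; c2 is assumed to hold (2, 2, 2, 2).
// (a) producer and scale adjacent: a driver could conceivably fold the mul
//     into a free *2 modifier on the result of the mad.
mad r0, r1, c0, c1
mul r0, r0, c2
// (b) same instructions plus an unrelated op interleaved; still valid code,
//     but the folding opportunity is less visible to a simple peephole pass.
mad r0, r1, c0, c1
add r2, r3, c3
mul r0, r0, c2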
Xmas said:
I don't understand. Which delay? And where did you get that from? That's news to me.

demalion said:
Look more closely...the delay is in outputting the 4 components, so it would perform as outlined
thepkrl said:
The FLOAT/TEXTURE unit can handle any instruction with any format of input or output. All instructions execute in one cycle, except for LRP, RSQ, LIT, POW which take 2 and RFL which takes 4.
thepkrl said:
Registers and performance:
Number of registers used affects performance. For maximum performance, it seems you can only use 2 FP32-registers. Every two new registers slow down things:
examples of the slowdowns incurred for an instruction sequence...
thepkrl said:
If input regs are used in the unit they are connected to, using them is free. If they are used for FLOAT/TEXTURE coords, an extra round is needed to first store them into a temp register. For example "ADD R0,f[TEX0],f[TEX0]" takes two rounds.
Sorry, my mistake. There are no hardware-specific modifiers.

demalion said:
What about modifiers? There are only 5 modifiers in PS2.0 (instruction modifiers: centroid, pp, sat; source modifiers: -, abs), and they should be "free" on any hardware.
Yep, but each piece of hardware has specific modifiers it could use for actual low level optimization, and the differences there mean that code generated uniquely with one architecture in mind (that would be Cg and its set of nVidia-controlled priorities) can prevent such opportunities from being easily visible.
For example, from here, we have the following list for the R300: "negate, invert, bias, scale, bias and scale, and channel replication, and instruction modifiers, such as _x2, _x4, _x8, _d2, _d4, _d8, and _sat", some of which the R300 low level optimizer might be able to use for optimization with DX 9 HLSL generated code, and which Cg generated code might easily preclude by instruction ordering changes that are valid within the spec. It is not the reordering to the advantage of the nv30 that I object to for Cg, but using that for other hardware and blithely stating no difference in optimization opportunities could result.
Well, from what I understand, they are not. This list agrees with what you stated, except that it lists 2 of your items (abs, centroid) as not being for PS 2.x.

Those modifiers you list for R300 are all part of the DX9 spec, but they are only listed in the PS1.x reference in the documentation, so I wasn't sure if they are available in PS2.0 too.
The reason RightMark uses Cg is that the developers wanted to use a HLSL and they wanted to target both OpenGL and D3D. Cg is the only available product fulfilling those requirements. While I don't think it's an optimal choice for a benchmark, I still think it's a viable decision.

demalion said:
The problem isn't what it could do, it is what it does (and does not) do. This seems self-evident to me, as everything I've mentioned that it should do, which included mentioning PS 1.4 and the OpenGL ATI fragment extension once at the very least, are things it could do.
From this, I really think you missed some things I said. Please recall the beginning of this post (ending with a ).
The situation for OpenGL is: there is no HLSL other than Cg. And as the only cards suited for Cg-compiled shaders in OpenGL are those supporting the PS2.0/ARB fragment shader level, we can probably conclude that those game tests using Cg require that level of hardware, like the Mother Nature scene does. So there would be no point in trying to support other extensions here.

demalion said:
I was at a loss here, as I've abundantly pointed out to you that there is no choice, and just a short while ago you seemed to realize that there wasn't one, but then I realized you said "(which I don't think are NVidia only)", and must not have read the quoted text carefully.

... I still can't see how having the clearly-labeled choice of supporting vendor-specific extensions (which I don't think are NVidia only) can possibly be a bad thing.

demalion said:
Hmm... OK, let's go over this again:
There are "synthetic" tests. I only noticed files for DirectX at the moment, but I presume they intend to implement them for OpenGL. When/if they do, of course it is not bad if they support all OpenGL extensions, but that is not at all what we are discussing. That follows...
There are "game" tests. They are implemented using Cg, period. The only optimizations this delivers are nVidia's prioritized compilation, and only for the targets nVidia has defined. So, in addition to restricting the standards to the ones nVidia has an interest in exposing (which excludes the ATI fragment extension for OpenGL and PS 1.4), the only vendor-specific optimizations offered are the ones provided by nVidia, for nVidia hardware.
You really need to read my quoted text more than glancingly. OpenGL in Rightmark3D at the moment = Cg. Cg does not support any OpenGL extensions except those nVidia's hardware supports, hence the benchmark consisting of a solid black window when I run it, and the existence of only the .cg files for the game benchmarks as I already mentioned to you....
If Cg did, it would (again, repetition) either be as nVidia dictated, or require ATI (and therefore other IHVs) to write a back end and circumvent the other HLSLs (which other IHVs collectively have a say in). nVidia's response to this rather obvious drawback is that Cg = DX 9 HLSL. The only guarantee we have of that is nVidia's assurances...and the indication of the opposite that we have is various clear examples of conflict of interest in the way nVidia maintains Cg, and the already observed code output differences between the two.
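For a sense of what "code output differences" can look like, here is a deliberately simplified, hypothetical illustration (not actual Cg or DX9 HLSL output): the same one-line expression can legitimately compile to different ps_2_0 sequences, and which sequence a given driver's low level optimizer copes with best is a property of the hardware, not of the source.

Code:
// Source expression (HLSL/Cg-style): result = tex * scale + bias;
// Hypothetical compilation A: folded into a single mad.
ps_2_0
dcl t0
dcl_2d s0
texld r0, t0, s0
mad r0, r0, c0, c1
mov oC0, r0

Code:
// Hypothetical compilation B: separate mul and add, equally valid PS 2.0.
ps_2_0
dcl t0
dcl_2d s0
texld r0, t0, s0
mul r0, r0, c0
add r0, r0, c1
mov oC0, r0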
OpenGL in Rightmark3D COULD be something else, like Cg for nVidia, and custom extensions for anyone else capable of the tests it is using. It looks to me like this includes the R200 at least, and I actually wouldn't be surprised if it included some other hardware as well (I don't know the precise functionality exposed in OpenGL by the P10 and Parhelia).
But, it isn't. I've covered this more than a few times already.
What things do you mean?

demalion said:
I don't think they need "solutions", but supporting DX9 HLSL in parallel would certainly be nice to see.
Hmm...well your above comments seem to indicate you are thinking things that are simply not true at all, as far as I understand you.
Ok, I'll do......stuff I think you should read again...
Yep. That's the point. It does not specify it.

demalion said:
Yeah, and APIs only specify what kind of programs are valid, not what kind are good programs. Your statement is misleading, since you are proposing that there is no relation between the specification of the instructions and modifiers and, subsequently, what is a good method of executing them. That is simply not true.

Sorry, but the DX9 PS 2.0 spec only says what kinds of shaders are valid shaders, not what kinds of shaders are good shaders. So when writing a shader you have lots of options, but no indication of whether you're on the right track or not.

demalion said:
To the DX 9 PS 2.0 spec, as I specifically stated.
For instance, the DX 9 PS 2.0 specification does not specify that two non-texture-op instructions from a restricted set occur for each PS 2.0 instruction or texture op. This is a possible occurrence, but depending on it to occur for optimal performance is not a good method of execution.
I was not talking about what is a good way to execute it. I was specifically talking about whether you can determine if a shader is "good" or "not so good" without executing it, only with the help of the specification. If you can not, you also can not state that it is off the base spec.

Xmas said:
the DX9 PS 2.0 spec only says what kinds of shaders are valid shaders, not what kinds of shaders are good shaders.
I don't say it does not matter. But I think it's acceptable if the generated shader is either "good" or "indeterminable" according to what I said above.

demalion said:
However, that is a characteristic of Cg's known target, and it is this that nVidia is using Cg to control for shader content creation, during development and at run time, before the low level optimizer has to worry about it. They promote that precompiled code be generated using Cg, even if there is another alternative, for that reason...but you maintain this does not matter despite evidence and analysis proposed to the contrary.
I am talking about code regardless of where it came from. Where it came from doesn't make it better or worse. Whether I write it in assembly or a compiler generates it, if it's the same code, it's the same code. Compilers produce different code, I'm aware of this.

demalion said:
We are talking about a high level compiler, Xmas. What do you think Cg is using to output to PS 2.0? You are circumventing the simple observation that compilers produce different code, by saying you are only talking about the code.
It would be good if you could clear up the issue of the add taking 2 cycles. I doubt that, but maybe you can show me proof of the opposite.

Why didn't you address my instruction example at all? If it has flaws, it would be useful to discuss where I went wrong. If it doesn't, it seems more than slightly applicable.