Rightmark3D

Discussion in 'Architecture and Products' started by Lezmaka, May 1, 2003.

  1. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    I think this part just before what you quoted answers this:
    So the "two FP32 registers without performance hit" is only referring to temporary registers. You're only using one temp register in your example (xtex). Also your example doesn't access input registers the way described, it only adds a constant.


    I don't know whether _x8, _d4 and _d8 are natively supported on NV30. Those seem to be the only PS1.4 specific ones. DX will not accept shaders with modifiers it does not know.

    You're right about abs and centroid. I didn't look at the version table. However, I don't think there are any modifiers the R300 supports that NV30 does not. That said, it might even be the case that R300 (and, partly, NV30) does not natively support (meaning for free) some of the PS1.x modifiers, because they are suited to integer types and more complex on float types (x2, x4, x8, d2, d4 are shift operations for integers, but for floats they amount to adding or subtracting a number from the exponent).
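    To make that integer-vs-float cost difference concrete, here is a minimal Python sketch (the function names are illustrative, not from any shader spec): for integers a _x2 modifier is a one-bit shift, while for IEEE floats the same scaling only adjusts the exponent field.

    ```python
    import math
    import struct

    def float_x2(x):
        # For floats, multiplying by a power of two only adjusts the exponent:
        # ldexp(x, 1) == x * 2, with the mantissa bits unchanged.
        return math.ldexp(x, 1)

    def int_x2(x):
        # For integers, the same modifier is a one-bit left shift.
        return x << 1

    def exponent_bits(f):
        # Extract the 8 exponent bits of a 32-bit float, to show that
        # scaling by 2 increments the biased exponent by exactly 1.
        (bits,) = struct.unpack('<I', struct.pack('<f', f))
        return (bits >> 23) & 0xFF

    assert float_x2(1.5) == 3.0
    assert int_x2(3) == 6
    # Exponent goes up by one, mantissa untouched:
    assert exponent_bits(3.0) == exponent_bits(1.5) + 1
    ```

    Whether a given chip implements the float case "for free" is exactly the open question in the posts above; the sketch only shows why the two cases are different operations in hardware.
    
    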

    As long as that kind of optimization doesn't slow down other cards, I think anyone can be happy with it.

    If that modifier is not supported in PS2.0 (as it seems), then how would there be a way for R300 to use it?
     
  2. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    I think you need to re-read this post and this post from me.

    I think you should because you are replying as if my own statements do not include such in their considerations...an example:

    I think your statement here contradicts those you made which prompted me to address your comments, as is evident in several places in our discussion, including the above excerpt from a reply of mine disagreeing with you...for example:

    This sentiment has been consistently present in your discussion, including your argument that the low level compiler should be able to handle any compiler's code equally well (hence the branch of our discussion indicating how that is counter to what is indicated to be the case in reality). Your conclusion has remained the same, but you are stating the premise differently here, while your argument still follows the statement you made in this quote.

    I again refer you to my prior statements addressing this. I'll quote some pertinent highlights shortly.

    Well, then it has the flaw of not representing the technical abilities of all cards, as I've also stated prior.
    That Rightmark has this flaw as a general benchmark, as opposed to an nVidia card benchmark, is not nVidia's fault? That's true. But it does represent a fault in Cg that invalidates its use for representative cross-vendor benchmarking.

    As for the authors of Rightmark, note a prior comment on my part:
    Restatement of the above commentary in the context of the discussion in which it was made:

    True, but the entire context of convenience of the Rightmark3D developers and representative benchmarking was already addressed. Please take a look at the shader files in question and note that they are not necessarily onerous to implement at low level, and that DX9 HLSL usage to generate a PS 1.4-alike template might make it simpler still. Again, they are short enough that I don't think even that would be required except for comparison.

    I'll point out that the beginning of this discussion is due to Rightmark existing in this state now. Note the pertinence of commentary in my prior discussion

    Also, Glslang will not put an end to this discussion, just circumvent the reason you propose for Rightmark 3D to use Cg (a reason of convenience: the tools to express functionality in OpenGL exist beyond Cg).
    Actually, as far as just ATI and nVidia, if a text file can be used to specify the R200 OpenGL extension, providing support for implementing that for the game tests (and doing this and the equivalent for the applicable nVidia OpenGL extensions for the synthetic tests) would be a significant step. Heck, within the context of being an "Open Source benchmark", just adding support for reading in user viewable and modifiable files representing low level instruction shaders to be applied to the tests, for both OpenGL, and Direct X (already done for synthetics, needs to be for game tests) for all the cards for which it claims to be applicable, would remove the flaw I mention, once those files were actually included and reviewed.

    As I said earlier, that's why I'm not on a "Rightmark3D bashing" bandwagon, but on a "Rightmark 3D has issues that need to be addressed" bandwagon.

    Why are you talking as if I hadn't addressed this exact point already? Ack!! I've reached my quote cutoff for the night, heh, you can go find it yourself...it really isn't too hard to spot, and is probably represented at least once in what I link to above.

    You keep on making arbitrary stipulations and then proposing a conclusion based on them, even after I've provided direct opportunity to discuss why I think the stipulation is mistaken, or have provided evidence and analysis specifically indicating the opposite of the stipulation. :-?

    You then go on to base a definitive statement on your stipulation as if you've established it as factual, also circumventing discussion, as repeated just above, of what has actually been observed. In this case, I have only my own observation, but the actual shader files are available to you, and anyone, to dispute my interpretation...yet you do this instead. Even so, again for this particular part of the discussion, I have a whole line of other discussion that I've been repeating several times, which remains applicable even if my observation is in question, yet we never get to discuss it because you propose statements without basis instead.

    By this you seem to state that they are justified in using Cg for convenience because the results can't be skewed, which again contradicts what you purported to recognize about HLSL at the beginning of your post.

    They don't as far as I know and have been able to ascertain by running it, and by looking at files in its directory. As I already outlined by pointing out that I only saw files with names indicating DirectX shader functionality.

    That's the type of comment that causes me to refer you to previous posts.

    I understand from your prior comment about waiting for Rightmark3D to download that you might not have it available to view for yourself yet, but that shouldn't stop you from recognizing my observation in the meantime.

    You keep asking questions to which I think my answer has been obviously stated. i.e., YES. This, I feel, was completely covered in our prior "The problem is not what it could do, but what it does do" discussion you just quoted!

    And that is based on what I actually observed...Ack! The list for the Synthetic tests includes PS 1.4 and VS 1.1, which can be expressed using existing OpenGL extensions for the 8500, and yet there is no option listed to target OpenGL. Also, the game tests seem to be expressed only using Cg, and to be the only tests for which the OpenGL tab applies (following from the above). Therefore, the switch does nothing more "than setting the Cg profile to either ARBFP or NV30FP".

    This is a repeat of information directly stated in my prior posts. :!:

    This fits the template of what you asked me to clarify later, and is not unique to this post.

    That's fine for a nv30 evaluation.

    Yes, as long as it doesn't purport to represent results between different chips that are not represented as equivalently as possible for a given standard, or set of standards. OpenGL and Direct X are such standards; Cg is not, except for nVidia hardware, and it is wrong for Rightmark 3D to use it in place of such and represent itself as a benchmark for them. Again, this is a restatement of my prior posts.

    That's an observed result that confirms my conclusion, which was in turn based on observations already repeatedly outlined, and you can evaluate that for yourself; just please recognize that it exists.
    It is not presented as "conclusive proof".

    So you do have it? Tell me if you find indication of OpenGL being used for the synthetic tests, and then we can discuss this particular issue in relation to the synthetic tests with some hope of not being circular. My line of reasoning is above, as well as my answer to what I think of it if you do indeed find such indication.

    One example highlighted above, which applies to what you have presented before. Your assertions relating to good assembly and low level optimizers being able to handle any code thrown at them are also things I believe I have given good indication were simply not true as well, by example and analysis, among other things we've tried to discuss.

    Yep. That's the point. It does not specify it.
    By this you state that because the LLSL does not specify you can NOT do this, it doesn't matter that Cg has unique reason to do this for the nv30, and that this is not significant. Again, your actual argument contradicts the statement you initially proposed as defining (in this post, at least) your stance on using Cg in this way.

    This is the type of thing that prompts me to ask you to make up your mind, and I don't think I ask that unfairly.


    :shock:

    Your comment seems to be based on the premise that you can't determine whether a specific shader is a good way of executing the function without actually executing it, as that is the only case I can see in which those two sentences do not completely contradict each other, assuming compilers can actually optimize. :-?

    I'll throw in a highlight here because I think this illustrates the same issue of ignoring my discussion and things that seem to me are already readily evident.

    This is the type of comment that prompts me to refer to my 4x4 vs 8x1 parallel, which I consider particularly illustrative of the fallacy of what you propose is the case between those two.

    Then you are saying the HLSL compiler used does not matter for your definition of "acceptable".
    There are just too many conflicts in your statements left unresolved.

    If I had an Exorcist emote (without the overt hostility of something disgusting like vomiting, to keep things civil :p), this would be the place it belonged.

    I think my above comment about unresolved contradictions applies.

    Please search for my comments along the lines of "you can hand code bad code, compile bad code with Cg, or compile bad code with HLSL" or something like that, as I would have to quote it for you to address this sentiment again.

    No highlight is suitable now, because we're actually discussing this usefully now, though it took a while to start doing so. Response in the other post (didn't take me long to reply, but I'm posting this first for chronological consistency).

    OK, this is another place for the head spinning emote I wish I had. Please put 4x4 and 8x1 as example substitutes into your discussion above, and refer to my prior comments to aid in illustrating, again, the fallacy of your statement.

    This is based on the spec not explicitly stating how you should code, therefore you can't make an assumption about what is the general case or not, never mind the simple observation that "the more unique restrictions on the code structure you entail, the less general case it is".

    I've already discussed this in depth, including example optimizations which you have not recognized in making this statement (which reminds me that a highlight might be helpful to address your maintaining that this is not the case).
     
  3. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    He seems to be talking about 2 components, not writing to 2 or more registers simultaneously... :?: I'm a bit confused by what you are saying.

    EDIT: Ah, I was redefining his statement for the convenience of my example; I see what you are thinking now. This illustrates something I believe is associated with the pack/unpack functionality in the units under discussion, as what he is stating is made equivalent to what I'm stating by using such functionality for register usage, which is also illustrated by his discussion of storing two 16-bit value sets in one 32-bit register.
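    The "two 16-bit value sets in one 32-bit register" idea mentioned here can be sketched on the host side in Python using half-precision packing. This is only an illustration of the pack/unpack concept, not a claim about actual NV30 behaviour:

    ```python
    import struct

    def pack_two_halves(a, b):
        # Store two fp16 values in the space of one 32-bit register slot.
        return struct.unpack('<I', struct.pack('<ee', a, b))[0]

    def unpack_two_halves(word):
        # Recover the two fp16 values from the 32-bit word.
        return struct.unpack('<ee', struct.pack('<I', word))

    word = pack_two_halves(1.5, -2.0)  # both exactly representable in fp16
    assert unpack_two_halves(word) == (1.5, -2.0)
    ```

    The point of the analogy: if two lower-precision values can share one register slot, register pressure (the limit under discussion in this thread) is halved for those values.
    
    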

    With 4 components, not 2. EDIT: I see more clearly what you are saying now, it is a good thing I looked again after I finished the other post.

    I quote again:


    Yes, it does access input registers the way described (which is "free" for integer components at least)...the performance hit we are discussing is the output, as I specifically said.

    EDIT: Hmm...Oh, I see the contradictions, the text is talking about input, heh.

    When that result is being used as an integer (i.e., at screen output, or for the mixed mode FX12 limited shader paradigm), it allows a speed up. What my example should be stipulating, as in my mind (sorry, I did not express this at all clearly, it was just implicit in my thinking), is that you are actually going to use the floating point values for further calculation (at which point you actually access them).

    My use of the word output is indeed misleading because the slowdown in my example is in using the floating point precision of the output, which I took to be a given, but I did not establish. As we both recognize, calculating at fp32 is what is stated not to be a slow down, and my evaluation of speed occurs in the context of "deciding what format to output". Both educational in needing to take care with such assumptions, and possibly illustrative of the implementation detail divergence to which I referred elsewhere. :)

    I still feel like I need a head spinning emote for other things though. :p
    [/EDIT]

    It seems pretty clearly expressed to me that I'm not talking about expressing this in the LLSL, but about opportunities expressed in the actual "assembly" of the specific card that the driver low level optimizer converts it to.

    Hmm...well, you are stipulating things you think without any provided reason. I'd say the stated PS 1.4 specification gives reason to indicate something to the contrary, as well as other clearly exhibited factors illustrating their divergent designs.

    That is quite true for that example, but the case I mentioned for looking for integer valued calculations for the nv30 would also be applicable. It would not be as universal an opportunity as I implied by context, however.

    This is what I mean by making up your mind: do the HLSL optimizations expressed in the LLSL matter (as per the beginning of your prior post), or do they not (as per other statements, including the one quoted in my reply to that post).

    By the low level optimizer. When I say "visible opportunities", it continues to mean opportunities of which the actual architecture instruction execution model is capable. That seems to be plainly stated. The statement you're making there makes no sense to me.
     
  4. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Isn't this statement a bit inaccurate? I thought the reason for this was to be able to guarantee constant usage without introducing complexities like "1024 instructions with no constants used". This is based on my understanding of constants taking instruction slots for the nv30, so if anyone understands differently just correct me.
     
  5. Clootie

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    61
    Likes Received:
    0
    Location:
    Russia
    Precisely.
    DX as general API can't expose all chip details of different HW vendors. This way multuhead in exposed at API level only in DX9 althrow Matrox had DualHead at DX6 timeframe, DX8 eliminated PS.0.5 for original Radeons and GF1-2, etc... Let's _imagine_ what SiS will introduce HW what store both code, constants _and_ some other registers in the same fixed size pool (actually this is really bad idea and show here just for example) - Is another strange validation rule should be introduced to DX9?
     
  6. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    demalion,
    I really think there is no point in discussing this (the longer one of your last two posts) any further.
    You tell me to re-read some of your postings, so apparently I didn't get what you intended. And on the other hand, I have the feeling that some of your replies are not related to the quotes you are replying to, so you must have interpreted them differently than I intended.
    Maybe I'm unable to express my thoughts correctly and understandably. And sometimes I have a really hard time trying to figure out the meaning of a sentence of yours.
    You think that there are contradictions in my statements. At least I can't find any contradictions in my thoughts on the topic.

    And if the base of the discussion is flawed, it's pointless to go further.

    Oh, some things I wanted to add:
    If I state something, it doesn't mean I'm contradicting what you said.
    If I state something you've already discussed, it doesn't mean I haven't read what you wrote. Just that my opinion is different.
    If I state something that has been stated before, in similar or different form, it doesn't mean I haven't read this, but that I use this statement to provide the context for another statement.

    I have neither the time nor the will to discuss every point in a lengthy discussion like this. It's just not possible.
     
  7. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    An incomplete synopsis of my view of the conversation

    The problem with the discussion as I see it, is that I provided direct examples that provided a clear path for discussion, and the avenues of replies you took consistently failed to address them, snowballing into full length discussions of their own without resolving the premises that prompted them.

    You first stated that it was pointless to object to Rightmark 3D using Cg as benchmark, and supported this with, for example, statements like that a low level optimizer in the driver should be able to handle any code thrown at it, so if Cg differed from DX 9 HLSL in the LLSL code it produced, that did not invalidate it for applicability to other cards no matter how that code looked.

    To further support that, you went on to say that the length of code was the only metric for good code, so naturally Cg would produce short code that would be just as fast for the R300 as for the nv30, and that the structure of the code did not matter. I provided several discussions of indications to the contrary, including several direct examples of instruction count and performance metrics, analysis of some principles of the nv30 architecture that seem related to it, and supported these discussions with references to several sources that could be discussed clearly and directly, including instruction-specific benchmarking and the resultant pipeline characteristics observed, Microsoft's GDC PowerPoint presentation illustrating the difference in the code output proposed for the PS 2.0 and PS 2.a specifications, and also my understanding of John Carmack's comments comparing the nv30 to the r300.

    Alongside this, you discussed that the OpenGL options (ARB and Native) represented useful functionality in Rightmark3D, which to my observation remains an erroneous statement, and I informed you of my observations indicating that they simply offered different options for Cg, and did not expose any other functionality except what nVidia cards support (in direct contradiction to what you indicated was the case to support that Rightmark3D was a good benchmark right now), and after repeating those observations several times, I simply asked you to check for yourself and provide your observations of its behavior supporting your analysis.

    I think there are very clearly things, similar to this last example, that seem erroneous in your comments, and I've tried to point out clear illustration of why I thought so, ranging from several examples I spent time conceiving to illustrate what I was saying, to simply asking you to look for yourself at something you could independently verify, and then confirm or deny if your observations concurred with what I was saying.
    I think the conversation ended up where it did because you consistently chose not to address these things with anything besides new statements that offered what you said was your opinion, without providing any support for it.
    I don't consider this recap a complete list of the items that cause me to believe this.

    ...


    What I find frustrating is that you purport that your initial statements are not incorrect or contradictory, after having failed to provide any support for them, or any reasons why my assertions of contradiction are incorrect.
    If the only way to address that frustration is to end the discussion, so be it.

    :-?
     
  8. Xmas

    Xmas Porous
    Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,344
    Likes Received:
    176
    Location:
    On the path to wisdom
    I'm absolutely sure he is referring to full fp32 4-component temporary registers. Not writing to them simultaneously, but generally using them in the shader.

    The way I understand the quote:
    It's about the interpolated texture coordinate registers and the interpolated color registers. If you use texture coordinates in any non-texture-sampling instruction, it takes an extra cycle. If you use interpolated color in float/tex operations, it takes an extra cycle.


    As far as I understand it, neither using the result from the add(f) in further fp calculations nor outputting it to the color output register costs any extra cycles.
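    The penalty being described can be captured as a toy per-instruction cost model. All numbers here are assumptions for illustration, not measured NV30 figures: the model only encodes the rule that reading an interpolated texture-coordinate register in a non-sampling instruction, or an interpolated color register in a float/tex operation, costs an extra cycle, while temporaries are free.

    ```python
    # Hypothetical single-instruction cost model for the penalty described
    # above. The one-cycle base cost and one-cycle penalty are assumed.
    def instruction_cost(op, operand_kinds):
        cost = 1  # assumed base cost per instruction
        for kind in operand_kinds:
            if kind == 'texcoord' and op != 'tex_sample':
                cost += 1  # texcoord read outside texture sampling
            if kind == 'color' and op in ('float_op', 'tex_sample'):
                cost += 1  # interpolated color in float/tex operations
        return cost

    assert instruction_cost('tex_sample', ['texcoord']) == 1  # free in sampling
    assert instruction_cost('float_op', ['texcoord']) == 2    # extra cycle
    assert instruction_cost('float_op', ['temp']) == 1        # temps are free
    ```

    Under such a model, two shaders with identical instruction counts can still differ in cycle counts depending on which register kinds each instruction reads, which is the crux of the "short code is not the only metric" argument in this thread.
    
    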

    If NV30 supports these modifiers, it's equal opportunities, and not hardware-specific.

    Hmm...well, you are stipulating things you think without any provided reason. I'd say the stated PS 1.4 specification gives reason to indicate something to the contrary, as well as other clearly exhibited factors illustrating their divergent designs.
    I'm not stipulating things. I'm saying what I think is most likely. That's simply my opinion. Their designs are divergent, but still there are a lot of similarities. The PS1.4 spec only indicates that three modifiers (x8, d4, d8) are not supported by the NV30 integer units, and assuming the NV30 register combiners are identical to the NV2x ones (besides precision/range), the OpenGL register combiner extension spec proves this.
    But for the float units, I think it is equally likely for both to either support or not support those modifiers natively.

    Sorry, I'm not sure I understand what you're saying here.

    I never said they do not matter. If they would not matter, they wouldn't be there.

    My fault. Could you give an example of how you think such opportunities can be hidden?

    btw, seeing your last post, I think it was a good decision to stop that discussion here. It shows you have not understood many things I said as I intended them to be understood. I admit I think this is mostly my fault. But the discussion didn't work this way. It's also frustrating for me.
     
  9. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    I vote only one thing can be discussed at a time. My eyes roll back and my tongue lolls out when I see the point for point tete-a-tete that goes on ad nauseum.

    (Though, I should just shut up and let you guys keep talking. ;) )
     
  10. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Progress?

    Any interpretation at all of the wording establishes directly that short shader code is not the only metric for performance for the nv30 (I consider this the self evident criteria for evaluation here, since Rightmark3D is a benchmark that measures performance, though you continue to avoid recognizing this), which then directly establishes the problem with using the code nVidia's Cg compiler generates for a general benchmark. Since this is the point of my example in the first place, I continue discussing this here primarily to recognize the problems with the way I proposed my example, and also to further provide opportunities to try and illustrate this to you given that you still seem to maintain that statements you have made are not erroneous.

    To address my example:

    You are right that in the context of thepkrl's testing results and comments presented in that thread, the only thing established for certain is that my example's performance increase would depend on being able to conserve register usage and prevent exceeding the stated limit of using two 4-component fp32 registers (which would depend on assumptions about pack/unpack not directly established by the thread), and that either this or some other factor allowed by using only 2-component fp32, not unquestionably established in that text, would be required to definitively establish its applicability.
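    The register-conservation argument can be sketched as a toy throughput model: full speed while at most two 4-component fp32 temporaries are live (the limit discussed in this thread), degrading as more are needed. The limit is taken from the discussion; the degradation curve below is a placeholder, not a measurement:

    ```python
    # Hypothetical throughput model: full rate up to an assumed limit of two
    # live fp32 temp registers, degrading beyond it. Placeholder numbers only.
    FP32_REG_LIMIT = 2  # limit discussed in the thread (assumed here)

    def relative_throughput(live_fp32_regs):
        if live_fp32_regs <= FP32_REG_LIMIT:
            return 1.0
        # Assumed linear degradation per extra live register.
        return 1.0 / (1 + (live_fp32_regs - FP32_REG_LIMIT))

    assert relative_throughput(2) == 1.0  # at the limit: full speed
    assert relative_throughput(3) == 0.5  # one register over: slower
    ```

    This is why a longer shader that keeps fewer fp32 temporaries live can, in such a model, run faster than a shorter one that exceeds the limit.
    
    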

    My understanding of the comment I quoted about loading floating point registers with value leads me to conclude things that are not directly supported by the benchmarks, so, for brevity, I'll abandon it as a bad example illustration instead of continuing to stipulate such criteria.

    Instead, I'll point you to consideration of Hyp-X's example as an alternative illustration of how code structured for the nv30's performance characteristics is not representative of the general case optimization.

    My question: has it at least facilitated finally establishing how code can be longer and yet faster for the nv30, and can we then return to the discussion about Cg's applicability for benchmark usage with that fundamental item considered?

    No, because the nv30 has interests in code expression that might hide such opportunities for other architectures' low level optimizers but not for itself. The problem with your stipulation here is that you see no problem in proposing an "If" as an answer, ignoring other "Ifs", like whether there are any other distinct operation modifiers that actual hardware low level instructions can express.

    This returns to my 4x4 versus 8x1 example, where the general case, as I've already expressed, is served by focusing on an implementation of the spec that doesn't prevent any architecture from reasonably being able to seek its own further optimization opportunities. Just because the spec can be used to express an implementation that violates that principle doesn't mean that such an implementation is equivalent to the general case, nor that it is the responsibility of other parties to adapt to such an implementation instead of the general case.

    Have we established this much?

     
  11. UncleSam DL iXBT

    Newcomer

    Joined:
    Feb 27, 2003
    Messages:
    36
    Likes Received:
    0
    I manage the synthetic (not game) tests for RM3D. The one that was published is quite old and buggy. In a few days (tomorrow) a beta of the latest D3D synthetic set will be released, with a completely new shell environment that supports modular tests. Anyone who wants to contribute some synthetic speed/quality/precision tests, feel free to contact me at unclesam@ixbt.com with any ideas/code and so on. I will send back the source code of the shell and modules on your request if you need it.

    For example, we still don't know how to profile state changes right - it's a big question.

    Let's make this test set better together 8)
     
  12. UncleSam DL iXBT

    Newcomer

    Joined:
    Feb 27, 2003
    Messages:
    36
    Likes Received:
    0
  13. chavvdarrr

    Veteran

    Joined:
    Feb 25, 2003
    Messages:
    1,165
    Likes Received:
    34
    Location:
    Sofia, BG