NV30 processor result evangelism

Discussion in 'Graphics and Semiconductor Industry' started by demalion, Apr 12, 2003.

  1. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    ...or my response to what I view as such.

    :!: Continued from this thread.

    Why did you not reply to my post directed at your related assertions [in the parent thread]?
    I said dynamic range and precision, Chalnoth. For what you propose, you give that up. I'm not arguing that the GF FX isn't an integer and PS 1.3 functional card with superior performance, I'm arguing that this does not make its shader implementation superior, and ignores the cases where it is inferior.

    In order to have a performance advantage you have to give up the full precision. Calculating at full precision, and then calculating the result at integer precision, you give up the full precision (or are negating your proposed performance advantage).
    It can only offer the performance advantage on the same pixel data by removing the precision from calculations for the pixel.

    Superior in what? In which shader is it not inferior in either features or speed? What about the shaders where it is inferior in both? What quality makes it superior?

    Your "less functionality, using integer instead of fp24, excluding complex ops and texture ops" is more applicable? :-?

    Ok, I'll restrict this discussion to integer precision processing with PS 2.0 "extended" functionality to try and ignore the speed deficit (at 500 MHz versus 325, and using more transistors :shock:) somewhat.

    First, that seems inferior to PS 2.0 with fp24, and to a rather more drastic and tangible degree than fp24 is to fp32.
    Second, the performance of the nv30 at 500 MHz when doing this is at a parity with the R300 at 325 MHz (that's what I call it when it sometimes leads, sometimes trails, and I'm not selectively looking at one of those cases to the exclusion of the other).
    Third, using the same cooling solution, the R300 would be capable of operating at higher than 325MHz (I'm ignoring the R350 for now).

    If you respond with discussion of NV30 PS advantages, please include recognition for R300 features, and an explanation as to how long shader lengths at integer processing make sense.

    Again, I said precision and dynamic range, not "precision for the sake of more precision". You know these matter for visual results, I've seen you post in threads discussing it. Are you claiming amnesia? What about your prior discussions about color precision and dynamic range from before the nv30's performance issues were substantiated?

    Oh, something I agree with (the second sentence, the first seems to be a repeat of your focus on avoiding discussion of concurrent speed and features). Only trouble was no one said "absolutely inferior" when you responded. They said it "sucked in comparison to the R300" after a discussion of very specific functionality, which seems a valid description of some situations, in fact a great many situations, when comparing them. For myself, I wouldn't use "suck" but I would agree to inferiority for the nv30 for those situations.

    Hmm...you almost made me make a lame river joke. :-?

    Well, the evidence in [the parent] thread supports a lot of observations in many other threads that seem to be giving the answer to your question. If you want to ignore those other threads, benchmarks, image comparisons, articles, etc, I guess you can, but IMO it looks a bit ridiculous.
     
  2. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Damn, typed a long reply and I closed the window by mistake... Gotta start again, and going to summarize most of my thoughts because I don't feel like retyping everything...

    More speed.

    Most operations don't need that much range. And FX12 still gives much better range than the screen's 8-bit.

    I meant to say:
    "It's not THAT hard to understand, now, is it?"
    "The NV30 is superior to the R300 when, and only when:"

    Hacking something to try to have higher speed isn't the same thing as developing a path with an architecture in mind. I bet there'll be very little IQ difference with Doom 3 on an NV30.
    Remember also that Doom 3 was made with the GeForce 1 in mind. There was no dynamic range there. And at the same time, Doom 3 is still future-oriented in many ways - so it's unfair to say the NV30 isn't future-oriented simply because it has native FX.

    *I* never said the NV30 is superior. I said it is superior WHEN specific conditions are met.
    Those conditions are mostly met in theoretical cases, but they are also met to a lesser extent in many practical cases, making the NV30 very slightly superior to the R300 in those cases.
    Most of the time, though, I'd still say the R300 is superior to the NV30.

    The low FP16 speed is normal, since the NV30 was made with intermixing in mind.
    Also, remember that you could do two different types of independent things in a fragment shader. Let me give you an example:

    Imagine you'd want to do lighting, but at the same time you'd want things which are far from the screen to look more red for whichever strange reason ( for example, you'd want to make the user think he's getting nearer and nearer to hell, even though he really isn't - some type of evyl illusion )

    You may want to do lighting with FP, 16 or 32 depending on needs, to have higher quality on that one. But the hueing depending on Z only needs FX, really.
    So, you'd multiply the Z value by a given factor in FX. Then eventually remultiply by how "hellish" the area is. Then add ( thus using MAD ) a minimum.

    Then, lighting would be done in parallel ( although it'd probably be finished after the FX stuff ) in FP, and you'd do an FX MUL to combine both results. Then you'd do yet another FX MUL to combine it with texturing. And there you go: FP quality lighting, while using some FX. It would look exactly the same, since hueing here doesn't need FP and FX12 has sufficient range not to lose quality on an 8-bit display.
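As a rough illustration of the intermixing idea above (a sketch in Python, not NV30 code): FX12 is modelled here as signed fixed point with 10 fractional bits, clamped to just under 2. That model is an assumption, and so are the helper names `fx12`, `to_8bit`, `shade_mixed`, `shade_float`, and the sample constants. The hue runs in the emulated FX12, lighting arrives as a float, the two are combined with fixed-point multiplies, and the 8-bit result is compared with an all-float path.

```python
# Assumption: FX12 modelled as fixed point with 10 fractional bits in [-2, 2).
FX12_STEP = 1.0 / 1024
FX12_MIN = -2.0
FX12_MAX = 2.0 - FX12_STEP


def fx12(x):
    """Clamp x to the assumed FX12 range and round to the nearest step."""
    clamped = min(max(x, FX12_MIN), FX12_MAX)
    return round(clamped / FX12_STEP) * FX12_STEP


def to_8bit(x):
    """Convert a [0, 1] colour value to an 8-bit display level."""
    return round(min(max(x, 0.0), 1.0) * 255)


def shade_mixed(z, light, texel, z_scale=0.5, hellish=0.75, minimum=0.25):
    """Hue in emulated FX12, lighting supplied as float, combined with FX MULs."""
    hue = fx12(z * z_scale)              # FX MUL: scale the depth value
    hue = fx12(hue * hellish + minimum)  # FX MAD: "hellishness" plus a minimum
    lit = fx12(light * hue)              # FX MUL: combine with the FP lighting
    return to_8bit(fx12(lit * texel))    # FX MUL: combine with texturing


def shade_float(z, light, texel, z_scale=0.5, hellish=0.75, minimum=0.25):
    """Reference path: the same maths with no fixed-point rounding."""
    hue = z * z_scale * hellish + minimum
    return to_8bit(light * hue * texel)


def max_level_difference(steps=17):
    """Largest 8-bit difference between the two paths over a grid of inputs."""
    samples = [i / (steps - 1) for i in range(steps)]
    return max(
        abs(shade_mixed(z, l, t) - shade_float(z, l, t))
        for z in samples for l in samples for t in samples
    )
```

On this grid the two paths never differ by more than one 8-bit level, which is in line with the claim that the hue path does not need FP. It says nothing about longer shaders or about values outside [0, 1].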

    As for complex ops, the NV30 is immensely superior to the R300 when using SINCOS. In many other cases though, such as LRP, the R300 is a lot superior. But then again, complex ops aren't the main part of a program, generally. But yes, noting it *is* important.

    Thus, IMO:

    1. Having native FX functionality makes sense
    2. But nVidia's "2 times more FX than FP power" ratio is simply bad - there should be more FP power than that!
    3. The NV30 is superior to the R300 in about 25% of well-programmed shaders ( using correct intermixing and keeping near perfect quality )
    4. The R300 is superior to the NV30 in 75% of cases, thus.

    The good news, though, is that by making a few minor changes to the architecture ( better FX/FP ratio, for example ), you could get something which is superior to the R300 in at least 70% of cases. That is, if you can use intermixing.
    The questions, thus, are:
    1. Will DX9.1 support intermixing FP & FX?
    2. Will the NV35 have a better FP/FX ratio ( = more FP power, maybe less FX power, maybe something like 6/6 instead of 4/8 - or even better, 8/8 :) )


    Uttar
     
  3. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Superior PS 1.3 speed doesn't seem very significant to me. Do you have a reason to propose it is? Again, actual shader workloads seem to have the nv30 barely edging out the R300, or losing, and that is with lower quality and at higher clock speeds. For quality output, and calling fp16 equivalent to fp24, it does not offer superior speed even at Ultra clock frequency.

    Hmm, so do the 8500 and the GF 4.
    That still didn't answer my question: isn't that contradictory to the label "superior"? Focusing on your stipulation of speed alone, keep in mind that only in a case with a texture op, no scalar ops, and no bandwidth limitation does it compete on a clock for clock basis with the 9700. In the general case, it seems to have performance parity when doing integer versus the R300's fp24.

    Doing operations without texture ops and without PS 2.0 functionality, and using no scalar operations and 4 component vector ops (where it seems it is indeed faster or the same speed, even per clock) seems an exceedingly rare case compared to all the situations in which it is not faster per clock, or even faster at the higher clock. Again, the R300 has advantages as well, and they don't seem very rare.

    Where is the superiority in association with your proposition of "most operations"? That "most operations" seems contradictory with even your speed focused definition of superiority.

    What am I not understanding? You haven't said anything here that is new, AFAICS. If I'm incorrect in that, please point it out.

    We know the IQ difference with the nv30 path, it is reduced quality, as I said. You are, however, right AFAIK with the sentiment "very little IQ difference" for the reasons you go on to state, but my statement wasn't saying otherwise, it was addressing the comment of "superior" before you specified that speed was your criterion.

    Hmm...does the R300 being able to provide higher dynamic range at the same execution speed qualify as an advantage, or not? Wouldn't that make the nv30's speed "superiority" less future oriented than the R300's? With your criteria established, it seems to me that this statement can be objectively made and substantiated.

    The term "future oriented" without comparison is a more subjective evaluation...I don't think a focus on fx12 is future oriented at all, and I think the implementation decision serves to negate the usefulness of the concurrent advantages it can offer, most especially when evaluating performance. However, we were discussing it comparatively to the R300 in specific, not making an "unfair" statement that the "NV30 isn't future-oriented simply because it has native FX12 support".

    Heh, and I discussed those conditions before making my statement, so I'm not sure why you complained. I guess what I'm wondering is more accurately described as "the validity of" the statement that it is superior, even with your stipulation.

    Hmm...I'm still missing the "many practical cases".

    Does that mean "many many" cases? "many more"? If the R300 is superior most of the time (and what seems established is that the superiority is not slight) what are you arguing against? AFAICS, no one was saying the NV30 could never outperform the R300 in select cases with a clock speed advantage, so why are you arguing against "never"?

    I don't know, it disturbs me when I can't make objective sense out of your comments and context.

    Yep, and I recognized the "integer finishing" potential somewhere earlier in the threads associated with these results, and also mentioned it above with "In order to have a performance advantage you have to give up the full precision. Calculating at full precision, and then calculating the result at integer precision, you give up the full precision (or are negating your proposed performance advantage). "

    But, how long a shader is that? The issue is that you're looking at the clock cycle in isolation...your color is fine for output to the screen, but the precision for continued shading has disappeared, and the factors for prior shading are ignored.

    The GF FX could do the lighting calc at fp16, the Z multiply at FX12, for 4 pixels in 1 clock.

    The R300 could do the lighting calc at fp24, the Z multiply, and the additional color at fp24, for 8 pixels in 2 clocks.

    But the R300 could also have done a texture op for both clocks if it needed to, where the GF FX would have needed a separate clock cycle to get texture data.

    EDIT: changed component expectations for lighting calc.

    AFAICS, ignoring getting the texture data and bandwidth penalties (:shock:), the nv30 would have a per clock parity.
    That does not seem to present an advantage for intermixing, but a case where the disadvantages of the architecture can be ameliorated by limiting options.
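The per-clock comparison above can be worked through as plain arithmetic. The sketch below takes the pixel and clock counts from the post at face value, plus the 500 MHz and 325 MHz clocks mentioned earlier in the thread. It assumes the separate texture fetch costs the NV30 exactly one extra clock, which the post does not state, and the function names are made up for illustration.

```python
def pixels_per_clock(pixels, clocks):
    """Pixels completed per clock for a given batch."""
    return pixels / clocks


def mpixels_per_second(per_clock, clock_mhz):
    """Throughput in millions of pixels per second at a given core clock."""
    return per_clock * clock_mhz


# Figures as stated in the post.
nv30_no_texture = pixels_per_clock(4, 1)  # fp16 lighting + FX12 Z multiply
r300 = pixels_per_clock(8, 2)             # fp24 throughout, texture op included

# Assumption: fetching texture data costs the NV30 exactly one more clock.
nv30_with_texture = pixels_per_clock(4, 2)

for name, per_clock, mhz in [
    ("NV30, no texture op", nv30_no_texture, 500),
    ("NV30, with texture op", nv30_with_texture, 500),
    ("R300, either case", r300, 325),
]:
    print(f"{name}: {per_clock:.1f} px/clock, "
          f"{mpixels_per_second(per_clock, mhz):.0f} Mpx/s")
```

Under these assumptions the two chips are level per clock without a texture op, and the NV30 drops to half the R300's per-clock rate once a texture fetch is needed.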

    It is phrasing like "immensely superior" that I don't get. Look at the PS 2.0 results for 3dmark03 compared between the R300 and the NV30...it is using the PS 2.0 "sin" function. Comparing image quality and performance results, what is the "immense superiority"?

    In which case it would be fast enough to use for FX functionality, which would seem to contradict 1. When comparing it to the R300, I just don't get your 1 and 2.

    Numbers From Nether Regions! Ack! I don't get your basis for these assertions.

    "Superior" at a higher clock speed in 70% of the cases? Hmm...why not design like the R300, so when at a higher clock speed it would be better than the R300 in 100% of cases, not just when "intermixing"? I guess ATi will be the only ones doing that for a while. :-?

    Lower the quality of the spec to integer to go with...longer shaders and more complex shader operations...? I'm still finding that proposition a fundamental contradiction.

    What use is the second "8" (even granting it as a suitable label)? Why not have just the first 8 and at high clock speeds so you can use the other chip real estate for something other than depending on lowered quality (and sitting idle when not using lowered quality)? If the first is 8 instead of 4 (even if just fp16), you could maybe even use it for integer ops by itself. :shock:

    Can't I call the R300 8/8 for its texture op functionality, directly comparable to your example when doing texture ops, except using FP24? How about when there are scalar ops and 3 component vectors? 8/8/8? And its design has fewer transistors.

    I don't see the superiority in depending on FX12 to offer optimal performance instead of designing for optimal performance for varied shader workload (and at FP24).
     
  4. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,823
    Likes Received:
    162
    Location:
    Seattle, WA
    I guess you still don't understand the nature of higher precision and range.

    First, let's take a look at FX12. FX12 is a 12-bit integer format clamped to [-2, 2]. This means that FX12 will work just fine as long as no number in the calculation goes above twice the maximum output brightness, and no number is multiplied by more than 4.

    Of course, there's always the possibility of recursion errors, but since FX12 has an additional 2 bits of "buffer" accuracy, with proper dithering FX12 can support a reasonable number of calculations with no noticeable errors. With very long programs, FP32 would be necessary (FP16 would actually show more recursion errors).

    At the same time, that doesn't even make FX12 useless for very long programs. With proper analysis of the data, places can be found where recursion errors will be reduced rather than magnified, making it acceptable to use FX12 for those cases (that also meet the conditions I listed above).
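To make the range and error points concrete, here is a small Python sketch. It models FX12 as fixed point with 10 fractional bits clamped to just under 2 (an assumption about the format) and reads "recursion errors" as rounding error that carries from one operation into the next. The names `fx12`, `repeated_scale`, and `drift_in_8bit_levels` are invented for the example.

```python
FX12_STEP = 1.0 / 1024      # assumption: 10 fractional bits
FX12_MIN = -2.0
FX12_MAX = 2.0 - FX12_STEP  # assumption: top of the range is one step below 2


def fx12(x):
    """Clamp x to the assumed FX12 range and round to the nearest step."""
    clamped = min(max(x, FX12_MIN), FX12_MAX)
    return round(clamped / FX12_STEP) * FX12_STEP


def repeated_scale(x, factor, count, quantize=lambda v: v):
    """Multiply x by factor count times, quantizing after every step."""
    for _ in range(count):
        x = quantize(x * factor)
    return x


def drift_in_8bit_levels(count, start=1.0, factor=0.9):
    """Gap between the FX12 chain and the unrounded chain, in 8-bit levels."""
    f = fx12(factor)  # same representable factor on both paths
    exact = repeated_scale(start, f, count)
    fixed = repeated_scale(fx12(start), f, count, quantize=fx12)
    return abs(fixed - exact) * 255


# Anything beyond the range is clamped, so intermediate values above 2 are lost.
print(fx12(2.5), fx12(-3.0))
# One 8-bit level spans roughly four FX12 steps (the "2 extra bits").
print((1 / 255) / FX12_STEP)
# Drift after chains of increasing length, in 8-bit levels.
for n in (4, 16, 64):
    print(n, drift_in_8bit_levels(n))
```

Each rounding can add at most half a step of error, so with a factor below 1 the drift after `count` operations is bounded by `count * FX12_STEP / 2`. Whether that stays invisible depends on the shader, which is the point of contention in the thread.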
     
  5. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Well, that's a useful alternative to addressing the post full of questions to you above.

    Let's see, I said FX12 was a problem for precision and dynamic range, and seemed to contradict the usage of long shaders and complex shader instruction calculations.

    Which seems to bring us back to a lack of dynamic range, which is why I mentioned it in that post to you above.

    There is also fp24, just to remind you, and it doesn't require being clamped to FX12 range to perform at speed, though you do seem to be just confirming the issues I mentioned. Also, doesn't the R200/RV250's -8 to 8 range look pretty favorable in the context of FX12?

    However, you've still failed to relate this set of limitations to support for your assertion of superiority in the face of the commentary I've provided, and my related questions detailed in the prior reply (to you) remain unanswered.

    Did you just spend a post saying "FX 12 is indeed inferior to what the R300 offers, in the ways you mentioned, but since you can work to counter some of these problems with careful planning, you show you don't understand by mentioning the problems in the first place"?
     
  6. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    Likes Received:
    6
    Why go to all the trouble of analysis? The R300 doesn't need it.
     
  7. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,823
    Likes Received:
    162
    Location:
    Seattle, WA
    Dynamic range is only useful if it's used.

    And this is the point. For any shader where FX12 is used frequently enough (and properly...so as not to decrease output quality), the NV30 architecture will be faster than the R300 architecture.

    This is why without precise knowledge of which shaders game developers are apt to use, you have no authority to state that the R300 architecture is unequivocally better at shader ops than the NV30 architecture. You only have your personal preference.

    Additionally, I'm not talking about sacrificing anything here. I'm stating that with smart programming with specific shaders, the NV30 can be superior, while providing the same image quality as the R300. The only remaining question is which shaders game developers will actually use.

    Once again, the lack of precision and dynamic range is not a problem if the output is the same. The output will be the same (for all intents and purposes) if FX12 is used in parts of many shader programs (exactly how many I don't have authority to comment on, but statements from game developers seem to indicate that it's enough to outdo the R300).

    As a side note, I have a feeling nVidia is going to start interpreting the _pp hint in PS 2.0 as FX12, and, as a result, we're going to see a little "staring match" between Microsoft and nVidia.
     
  8. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,823
    Likes Received:
    162
    Location:
    Seattle, WA
    Because the NV3x line will certainly take a significantly larger marketshare, particularly with the NV34 product.
     
  9. Heathen

    Regular

    Joined:
    Jul 6, 2002
    Messages:
    380
    Likes Received:
    0
    It comes for free on the R300. If widgets on a toy come for free they'll be used, nvidia is effectively screwing the industry & consumers by offering an FX mode.

    I pity nvidia then. :D

    Who says?
     
  10. Saem

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    1,532
    Likes Received:
    6
    I disagree.
     
  11. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    I have the impression that games developers, and their publishing houses, aren't really all that concerned about having to really delve into the architectures to actually produce something that gives them both speed and decent IQ - Why should they need to when a balanced architecture, such as R300, gives them that for free with no extra time spent on programming? I suspect that the 'smart' programming you talk of will mainly be done inside NVIDIA's walls, be it by their dev rel or driver team.

    You talk about not sacrificing anything first, and then say that? Doesn't gel.
     
  12. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    I share that 'feeling'. Apparently speed on NV3X doesn't come so much from the use of FP16 over FP32 as initially thought, but from going int12 over FP. Spicy. 8)
     
  13. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Hmm...yeah, who'd want to use it, right?

    No, it won't, and I've addressed this already.

    Your 50% per clock advantage for NV30 depends on: limiting range and precision to FX12 AND no opportunities for 3 component vector ops and scalars AND no texture ops AND no bandwidth limitation AND limitation in instructions utilized.

    You're basing your support on this?

    Without all of these, and only when using FX12 freely, it can SOMETIMES have an IPC parity with the R300, as one of them falling through removes the advantage.

    In contrast, the R300's 100% per clock advantage (a situation you seem intent on ignoring) depends on having any 2 of the above fail (besides bandwidth limitation...the impact of that will vary), and it isn't limited to a maximum of 100% per clock advantage.

    And all of this is discussing the NV30 using FX12 versus the R300 using FP24, so it is even granting all the gymnastics you seem to be putting forth above as valid, which would still not make it superior, because the quality and featureset limitations are significantly below that of what the R300 is offering.

    So, this is the NV30's superiority?

    Oh, so only you have the authority to say a card is superior then? :shock:
    I didn't say the R300 was "unequivocally better", I said the NV30 was inferior in a wide range of circumstances and considerations (backed up by facts to the best of my knowledge), and recognized your stipulation of higher speed (in what seem to me to be rather restricted circumstances, and at significantly lower quality) while pointing out that the R300 has opportunities to lead as much or more in speed (both per clock and for final output) even at tangibly and significantly higher quality than you proposed. This was not based on personal preference...are you sure you can say the same of what you continue to propose?

    Your response is that higher speed at lower quality is superiority because sometimes you can hide (not eliminate) the lower quality (theoretically), with special care and planning, never mind the impact on speed this will have.

    From this, I proposed that the R300 being able to lead both in quality and speed (simultaneously and in circumstances that benchmarks and analysis seem to support are common) versus the NV30 being able to lead in one or the other (with a significant sacrifice of the remaining one, and even then only in specific circumstances) does tend to supply an answer to the question that you proposed had an answer that still remained to be seen.

    Yes you are.
    Limiting yourself to specific NV3x shaders isn't a sacrifice?
    Limiting yourself to working around FX12 isn't a sacrifice?
    FX12 is the "same" image quality? I guess my talking about shader length didn't happen, and ignoring range and precision issues is all that matters.

    Your set of qualifications is staggering, your statements seem questionable for the reasons I've stated prior, and your dedication to proposing them seems to only make sense if you've decided to see only nv30's superiority, no matter what.
    That's my opinion, and I think it is well supported.
    Hmm...I don't think that question would have anything to do with your arguments for "superiority".
    :shock: You don't think your statement is a bit convoluted and special case?
    The only thing new here is "statements from game developers seem to indicate that it's enough to outdo the R300", and I'm wondering if you could elaborate on that, please.
    The "Dawn of Cinematic Computing"? :-?
     
  14. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Hmm...with your definition of "advanced shading", please don't forget that the 8500/9000/9200 seem to be as applicable as the NV34.

    Well, if you're using DX or OpenGL, that is.
     
  15. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,823
    Likes Received:
    162
    Location:
    Seattle, WA
    Higher precision also comes for free on the NV30, though not on every instruction, unfortunately.
     
  16. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,823
    Likes Received:
    162
    Location:
    Seattle, WA
    Very possible, but remember that nVidia still has the larger marketshare, and the release of a low-cost DX9 part (NV34) will essentially guarantee that nVidia will have more DX9 parts than ATI.

    So? I'm talking about two totally different situations. One is developers who are trying to get the best performance and visual quality for their games. The other is Microsoft who is somehow against any integer operations in PS 2.0. Personally, I think nVidia has the right to turn around and use _pp in a different way than it was intended, since theirs is the only hardware that will make any use of it currently, so the only developers who will be using the hint (except Futuremark...) will be using it for increased speed on nVidia hardware, though it won't allow nVidia to get the register number boost from going for FP16 on other calcs. The stipulation here is, of course, that nVidia informs developers what's going on.
     
  17. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    No, that's how really stupid things start to happen, and how specs get fragmented - by short-sighted arguments like this.

    The spec is there to define how things should behave - having the first hardware that implements part of the spec does not give you the right to alter that spec to suit yourself. Microsoft control the specification, not nVidia or any other IHV, which is as it should be - it's hard to have a competitive market in which one of the competitors controls the specification and can redefine it as they choose.

    There have been examples in the past where the spec says how things are supposed to be, but the de facto standard ends up being whatever a particular IHV ended up implementing. Then other IHVs come along and implement things correctly and get 'bugs' in existing software because people have written their software expecting the other IHV's incorrect behaviour, and the really daft thing is that the IHVs that implement things correctly are the ones who get blamed for driver bugs.

    It's stupid.

    In the past this sort of thing has mainly been caused by the spec being insufficiently rigorous. In the case of the _pp hint it's reasonably clear - the value must be float and at least S10e5.

    An IHV has no right to do things differently and then claim 'We were here first'.

    The spec was here first. The only fixed point mentioned anywhere in the PS2.0 spec is the minimum defined precision of the colour interpolator inputs. All pixel shader operations must be float. Any hardware or driver that doesn't use float for all operations in a PS2.x shader is doing things wrong.
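For readers unfamiliar with the S10e5 minimum mentioned above: it describes a float with a sign bit, 10 mantissa bits and 5 exponent bits. The Python sketch below uses IEEE 754 binary16 (the `"e"` format of the standard `struct` module) as a stand-in for such a format, which is an assumption about the exact encoding. `to_half` and `clamp_fixed` are illustrative names, and `clamp_fixed` models only the range of a [-2, 2] fixed-point format, not its step size.

```python
import struct


def to_half(x):
    """Round x to the nearest IEEE 754 binary16 value and return it as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def clamp_fixed(x, low=-2.0, high=2.0):
    """Range clamp of a [-2, 2] fixed-point format, ignoring its step size."""
    return min(max(x, low), high)


for value in (1000.0, 2.0 ** -12, 1.0 + 2.0 ** -10, 1.0 + 2.0 ** -12):
    print(value, to_half(value), clamp_fixed(value))
```

The half-precision path keeps large and very small magnitudes that a [-2, 2] fixed-point value cannot hold, while near 1.0 its step is 2^-10, so finer detail than that is rounded away.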
     
  18. Joe DeFuria

    Legend

    Joined:
    Feb 6, 2002
    Messages:
    5,994
    Likes Received:
    70
    Are you really serious?

    Just to add to andypski's post:

    Is nVidia going to pay for any other IHV's driver development, when they have to code for work-arounds to software devs' programs that are based on nVidia's wrongful implementation? To even suggest that nVidia could wrongly implement a spec is one thing, and on top of that to suggest the only "stipulation" be that nVidia informs devs about it...just displays your lack of understanding about how doing such things impacts the industry.

    If nVidia really wants certain capability of their hardware to be utilized that DirectX (and even GL proper, I believe) doesn't expose, then nVidia has a LEGAL alternative. Push the use of their own proprietary extensions for GL.

    Developers, on a case by case basis, will then have a choice to decide for themselves on whether they want to expend the resources to cater to NV30's architecture.
     
  19. Heathen

    Regular

    Joined:
    Jul 6, 2002
    Messages:
    380
    Likes Received:
    0
    So why the performance drop in higher precision modes or Nvidia's desire to go for FX12? If it was for 'free' as you claim they would never have offered integer modes. Or is it like their claim of free AA? Complete PR Crudstunk.

    All the NV3X architecture seems to be is another engineering kludge, poorly thought out and poorly implemented and Nvidia seems hell-bent on fracturing the DX standard with their own 'personal' implementation. Hey I suppose MS could officially declare the NV30 cards 'non-compliant' until they follow the clearly laid down guidelines.
     
  20. Colourless

    Colourless Monochrome wench
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,274
    Likes Received:
    30
    Location:
    Somewhere in outback South Australia
    Seen any recent WHQL drivers from Nvidia for the NV30? No, didn't think so.
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.