GeForce FX: 8x1 or 4x2?

Discussion in 'General 3D Technology' started by Dave Baumann, Feb 10, 2003.

  1. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Darn, I was about to let the Tony Tamasi interview rest, but now I might as well get this straight:

    Please read the last part again, then tell me how you would split a shader program between the FP units (fragment program) and int units (register combiners) for twice the throughput?

    Of course in theory you could let the int path do some operations (e.g. blends) fast, then swap the intermediate data/results to the fragment program via the temporary registers (if they are even shared) to continue with FP ops (or vice versa), but each path still has to wait for the other to conclude its operation first.

    Second, since we’re working on the same pixel (or even polygon if you prefer it on the grander scale) I can’t see how it is possible to work on two different shaders at the same time (sync).

    How is that for twice the throughput? Sorry for asking what may be a stupid question, but this isn't clear to me at all.
     
  2. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    The reason you get "twice the throughput" is that the register combiner portion is pipelined in series with the FP portion, so the FP portion can be on to the next pixel while the register combiner portion is working. In that sense, the NV30 does have 8 shader pipes; two sets of 4 in series that are pipelined so that at peak it can be working on 8 shader ops at once.
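    A toy model of the "two stages in series" argument above, as a rough sketch only. It assumes one pipe, a fixed clock cost per stage, and no buffering between the FP stage and the register combiner stage; the function name and the clock counts are made up for illustration and are not measured NV30 figures.

```python
def cycles_to_shade(num_pixels, fp_clocks, combiner_clocks, pipelined):
    """Total clocks for one pipe to push num_pixels through both stages."""
    if not pipelined:
        # Each pixel finishes both stages before the next one may start.
        return num_pixels * (fp_clocks + combiner_clocks)

    fp_free = 0    # clock at which the FP stage can accept a new pixel
    comb_free = 0  # clock at which the combiner stage can accept a new pixel
    for _ in range(num_pixels):
        fp_done = fp_free + fp_clocks
        # The combiner takes the pixel once FP is finished and it is idle itself.
        comb_start = max(fp_done, comb_free)
        comb_free = comb_start + combiner_clocks
        # No buffer is assumed: FP holds its result until the combiner takes it.
        fp_free = comb_start
    return comb_free


serial = cycles_to_shade(1000, 1, 1, pipelined=False)
overlapped = cycles_to_shade(1000, 1, 1, pipelined=True)
print(serial, overlapped, serial / overlapped)
```

    With equal stage costs the ratio approaches 2 as the pixel count grows, which is the sense in which the two sets of 4 can be counted as 8 ops in flight.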

    This underscores the real problem I see with the NV30 and shader execution: in relying on two very different kinds of shader execution units, it differs significantly from the theoretical shader execution model that was the basis for both the DX9 PS and OpenGL ARB_fragment_program specifications. Therefore, to reach its full potential it is likely to require either chip-specific code, or code built on the Cg platform that might as well be chip-specific because no other IHV will support it.
     
  3. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Thanks antlers4, but I of course realize this, and my key problem with the "twice the throughput" term is as follows: at any given cycle you should be working on the same polygon, which means that you are running the same shader program. (I wrote 'should be' because this might be where the twitchy part of NV30 comes into play.)
     
  4. tamattack

    Newcomer

    Joined:
    Jun 24, 2002
    Messages:
    126
    Likes Received:
    0
    Location:
    Canada? What state is that in?
    Bizarre logic, to be sure. But then, I bet you could even find a way to justify blaming ATI for bad weather. :roll:

    Following your logic, I would say that it's your fault for choosing to use an OS which requires drivers to interact with each other. :?

    All well and good. So why do you feel the need to place blame at all? :evil:

    Imagine that... an AGP card, intended for use in AGP slots with AGP drivers not working under PCI mode...
     
  5. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    NV_fragment_program actually supports all three data types (integer, half float, full float), so you can mix float instructions and integer instructions freely in your shader (you don't need to use register combiners, though they are still there AFTER NV_fragment_program). So yes, they can "co-issue" two vec4 instructions where one instruction is integer and the other is float.
     
  6. andypski

    Regular

    Joined:
    May 20, 2002
    Messages:
    584
    Likes Received:
    28
    Location:
    Santa Clara
    Scandalously enough our drivers also apparently have some dependence on Microsoft's DirectX runtime working directly under Windows, as well as their GDI calls actually occurring.

    We also have a nasty habit in our drivers of allocating memory, so we do rely somewhat on the operating system to know how to allocate and free its memory as well.

    We have further noticed that if people's mouse drivers don't work correctly then the mouse pointer can occasionally not move on screen - a clear graphics problem. We haven't added full workaround mouse driver support into our drivers yet, so if you have a buggy mouse driver we can't fix this for you yet.

    Seriously, I almost don't know how to respond to this. Sometimes a post comes along that just leaves you completely gobsmacked... :shock:

    - Andy.
     
  7. antlers

    Regular

    Joined:
    Aug 14, 2002
    Messages:
    457
    Likes Received:
    0
    What we'd like to know is to what extent the ability to co-issue depends on the data dependencies in the shader.
     
  8. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    What I'd like to know is how the word "proxel" is more confusing to consumers than "a system might perform an operation on N pixels in parallel in each clock (or equivalently N operations on one pixel), but might only be able to write results to the frame buffer at a sustained rate of M pixels/clock, where N > M."

    I liked "pixel pipelines" just fine, myself, and I'm not proposing that the consumer think of "proxels" (or whatever the term) as anything more than "modern pixels". Or do you think consumers think of z buffers and color bit depth when considering the term "pixel" or comparing fill rates? This is not a term meant to help hardware designers understand things, but to avoid the confusion recently introduced to existing terminology.
     
  9. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Exactly, and besides that I would also like to know whether this requires NV_fragment_program (e.g. can't be done easily in DirectX or with ARB_fragment_program)?
     
  10. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    You can (well, the driver can) do all sorts of tricks to achieve as many co-issues as possible. It can rearrange instructions, reallocate registers, ...
    And yes, this only works with NV_fragment_program, as neither ps_2_0+ nor ARB_fragment_program (I'm not 100% sure about ARB) exposes integers. There is only a partial-precision hint in ps_2_0+ (which the driver is free to ignore).
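    To make the data-dependency question concrete, here is a toy scheduler, not anything taken from a real driver. It assumes one integer slot and one float slot per clock, single-cycle latency, and that an instruction may only issue after every earlier instruction it conflicts with (read-after-write, write-after-write, write-after-read) has issued in a previous clock. `Instr`, `schedule` and the register names are all hypothetical.

```python
from collections import namedtuple

# unit is "int" or "fp"; dest is one register name; srcs is a tuple of names.
Instr = namedtuple("Instr", "name unit dest srcs")


def depends_on(later, earlier):
    """True if 'later' must not be moved ahead of (or beside) 'earlier'."""
    raw = earlier.dest in later.srcs   # later reads what earlier writes
    waw = earlier.dest == later.dest   # both write the same register
    war = later.dest in earlier.srcs   # later overwrites an input of earlier
    return raw or waw or war


def schedule(program):
    """Greedily pack at most one int and one fp instruction into each clock."""
    deps = [
        {j for j in range(i) if depends_on(program[i], program[j])}
        for i in range(len(program))
    ]
    done = set()
    cycles = []
    while len(done) < len(program):
        issued = {}  # unit -> index of the instruction issued this clock
        for i, ins in enumerate(program):
            if i in done or ins.unit in issued:
                continue
            if deps[i] <= done:  # everything it waits on issued earlier
                issued[ins.unit] = i
        cycles.append([program[i].name for i in sorted(issued.values())])
        done.update(issued.values())
    return cycles


independent = [
    Instr("mul_fp", "fp", "r0", ("t0", "t1")),
    Instr("add_int", "int", "r1", ("c0", "c1")),
    Instr("mad_fp", "fp", "r2", ("t2", "t3")),
    Instr("mul_int", "int", "r3", ("c2", "c3")),
]
chain = [
    Instr("a", "fp", "r0", ("t0",)),
    Instr("b", "int", "r1", ("r0",)),
    Instr("c", "fp", "r2", ("r1",)),
    Instr("d", "int", "r3", ("r2",)),
]
print(len(schedule(independent)), len(schedule(chain)))
```

    In this model the four independent instructions pair up into two clocks, while the fully dependent chain still takes four, so how much co-issue you actually see depends heavily on the shader.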
     
  11. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Cg supports Float, Half *and* Int
    But I'd guess "Int" is converted to Half when in DX9.
    But wait, that's ridiculous!
    Because Half has a 10-bit mantissa, and the Integer has 12 bits!
    So that means, for optimal quality, you'd have to use float...

    Could we get confirmation using Integer in Cg will not work as integer when using PS2?

    Hmm, wanna bet nVidia is begging MS to include "integer" in DX9.1? ;)

    <joke, examples are *not* accurate and are not supposed to be>
    "New features in DX9.1:
    - Support for ATI's revolutionary F Buffer
    - Support for Matrox's adaptive displacement mapping ( not required for compliance ) & unlimited texture fetches in the VS
    - Support for PowerVR's texture compression for 256:1 compression ratio with minimal quality loss
    - Support for nVidia's ugly, silicon-hungry & nearly useless Integer format."
    </joke>


    Uttar
     
  12. jasal

    Newcomer

    Joined:
    Mar 4, 2003
    Messages:
    4
    Likes Received:
    0
    Location:
    Italy
    Ilfirin, no insult intended here, but I have a hard time thinking you are not biased, because all your arguments have proven false, but you still stick to your point. You go so far as to imply that Nvidia is currently a force working against innovation in the graphics industry, but I fail to see how ANY company could do this and still survive in the long term (and even in the short, given the hard competition and the tight market situation). I'd say Nvidia built its ENTIRE fortune on adopting standards and offering great consumer implementations (both hw and sw) of them: doing so, they acted as a DRIVING force of innovation because they challenged everyone to target such a level of quality (not to mention the fact that for a long period Nvidia hw was practically the only one to show good performance AT ALL, both on proprietary AND non-proprietary paths). Programmable HOS tessellation is to be implemented in NV50 and R500, so maybe they didn't support it because it WASN'T in the standard (probably a DX10 feature). They don't support PS1.4 for now (and that's not good performance-wise, because NV30 maps PS1.4 shaders to 2.0 shaders, which apparently don't do well on the FX), but in fact PS1.4 IS kind of proprietary (it was born as ATI's specific implementation of DX8 shaders). You cannot assert Nvidia doesn't support ARB because they ARE on the Architecture Review Board and they proposed a lot of ARB extensions. As for proprietary paths performing better than the standard ones, this seems to me a self-evident truth, so... Of course all of that might not hold true in the future and Nvidia could undertake a suicidal course of action, fighting standards and innovation: it simply doesn't seem that this has already happened.
     
  13. Hyp-X

    Hyp-X Irregular
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,170
    Likes Received:
    5
    FP16 is:
    5 bits exponent
    1 bit sign
    10 bits mantissa

    FX12 is:
    1 bit sign
    11 bits value

    When you convert the value to floating point, the exponent will contain the location of the highest "1" bit in the number, so it doesn't have to be stored in the mantissa.
    So even in the worst case (highest bit in FX12 is 1), the FP16 will have sufficient precision to store the number exactly.
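    A quick way to check this claim by brute force, as a sketch. It uses the layout given above (1 sign bit, 11 value bits) and additionally assumes the value is scaled by 1/1024, i.e. a range of roughly [-2, 2); that scale factor is an assumption here, not something stated in this thread. Python's `struct` format `"e"` is IEEE 754 half precision.

```python
import struct

FX12_FRACTION_BITS = 10  # assumption: steps of 1/1024, range roughly [-2, 2)


def to_fp16_and_back(x):
    """Round x to IEEE 754 half precision and return it as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def fx12_values():
    """Yield every value with 1 sign bit and 11 magnitude bits."""
    for magnitude in range(2 ** 11):
        for sign in (1, -1):
            yield sign * magnitude / 2 ** FX12_FRACTION_BITS


mismatches = [v for v in fx12_values() if to_fp16_and_back(v) != v]
print(len(mismatches))  # prints 0: every such value survives the round trip

# A magnitude that needs 12 significant bits no longer fits exactly.
print(to_fp16_and_back(2049.0) == 2049.0)  # prints False
```

    The implicit leading 1 gives FP16 11 significant bits, which is exactly what an 11-bit magnitude needs; a true 12-bit magnitude would not fit.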
     
  14. kyleb

    Veteran

    Joined:
    Nov 21, 2002
    Messages:
    4,165
    Likes Received:
    52
    oddly, i didn't see any factual inconsistencies in his arguments at all. granted it is a big thread and moved rather quickly at points so it is a little hard to keep up with, but i even dug back through just to see if i missed anything. so, since you posed the position i feel compelled to ask; what ever are you referring to Ilfirin?
     
  15. Ilfirin

    Regular

    Joined:
    Jul 29, 2002
    Messages:
    425
    Likes Received:
    0
    Location:
    NC
    Strange, if anything I would say I almost immediately backed down from the point. The original post was nothing more than a 4AM rant (after not sleeping for several days prior, even) of which the original point was never properly conveyed, and most examples given were not totally sound.

    Essentially all that I was trying to convey was that on more than a few occasions I (along with many other programmers I know) have found myself smacking my forehead wondering why with regards to nVidia's decisions as of late. Many of these decisions have resulted in more work for me (and others), with less return, and it's quite frankly annoying.

    This will be my last post on the subject since I fail to see the point in defending an ill-formed argument (it's not like I'm trying to convince anyone of anything), and it's starting to detract from the main topic of the thread. My PM inbox, however, is far from full if you wish to contact me directly.
     
  16. LeStoffer

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,253
    Likes Received:
    13
    Location:
    Land of the 25% VAT
    Thanks, but now I have another question for you! (sorry).

    I might as well do a copy n' paste from the NV_fragment_program Extension Specifications for CineFX:

    So if the final results of a combiner fragment program work as the initial values for the register combiners, how can it be done simultaneously?

    The doc says:
    (my bold)

    If things are pipelined like this (which would only make sense!) I still don't quite get the double throughput claim. :?
     
  17. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,902
    Likes Received:
    218
    Location:
    Seattle, WA
    It would only be double throughput if the fragment program and combiner took the same number of clocks to execute.
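    Spelled out as a back-of-the-envelope formula, assuming the fragment program costs $a$ clocks per pixel, the combiner stage costs $b$ clocks per pixel, and the two stages overlap perfectly in steady state (the symbols $a$, $b$ and $S$ are introduced here only for illustration):

```latex
\[
  T_{\text{serial}} = a + b, \qquad
  T_{\text{pipelined}} = \max(a, b),
\]
\[
  S = \frac{T_{\text{serial}}}{T_{\text{pipelined}}}
    = \frac{a + b}{\max(a, b)}
    = 1 + \frac{\min(a, b)}{\max(a, b)} \le 2,
  \qquad S = 2 \iff a = b .
\]
```

    For example, with $a = 3$ and $b = 1$ the gain is only $4/3$, since the combiner sits idle two clocks out of every three.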
     
  18. jasal

    Newcomer

    Joined:
    Mar 4, 2003
    Messages:
    4
    Likes Received:
    0
    Location:
    Italy
    Kyleb: well, the posts from pages 25-26... It's not really a problem of inconsistency, it's more that the examples he was using weren't really related to what he was saying; this probably generated some confusion, even on my part, about what he really meant.
     
  19. jasal

    Newcomer

    Joined:
    Mar 4, 2003
    Messages:
    4
    Likes Received:
    0
    Location:
    Italy
    Ilfirin: in fact your clarification simplified things a lot. Probably I understood something slightly different from what your exact point was (and you agree that something was actually not properly conveyed); however I was targeting what appeared to me as the EXTREME implication of your reasoning about questionable business tactics: that Nvidia (or anyone else, for that matter) could even deliberately hinder the performance of a product only to make its point about a certain technology, which appears not to be the case. Or, better, they COULD do that, but that would only be suicidal on their part. It's something like, say: "Nvidia's poor implementation (performance-wise) of PS2.0 is an attempt to kill the technology and drop support of DX AT ALL, promoting instead a wider use of Cg as the programming tool of choice", which is a bit unrealistic at best. In fact I was implying that the graphics industry is not a monopoly (even though these days it seems to act like a duopoly); it's still a highly competitive marketplace where missing a product cycle could mean death (or at least major damage) for a company; to emerge there, Nvidia's products showed both performance and excellent implementation of standard technology... At least so far; for the present and future I cannot say :) (but I could guess: probably the first thing they'll try and fix with NV35 is FP performance, because they know DX IS important). Of course a programmer is much more entitled to say if a certain hardware is difficult to program for or shows some kind of anomaly when using standard paths, but I would say that most times a bad implementation is just that (i.e. something to fix later), and not the product of a deliberate decision...
    However: I understand what you say about 4am posts, and it's been a VERY long way for me to read carefully 28 (28!) pages full of interesting posts; and YES this is definitely not the main topic here..
     
  20. MDolenc

    Regular

    Joined:
    May 26, 2002
    Messages:
    690
    Likes Received:
    425
    Location:
    Slovenia
    You have both integer and floating point pipeline available in NV_fragment_program. NV_fragment_program can output 4 different colour values and only here old register combiners kick in. On top of all NV_fragment_program code you still have 8 register combiners which can modify 4 colours which came from NV_fragment_program in any way they like. The point here is that while in "NV_fragment_program mode" both integer and float pipeline can work simultaneously on completely different instructions.
     