Asking Tim Sweeney about NVIDIA and more

Discussion in 'Beyond3D News' started by Reverend, Sep 29, 2003.

  1. demalion

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,024
    Likes Received:
    1
    Location:
    CT
    Well, DeanoC showed the problem with what was proposed in the other reply to my statement, but he also uncovers one specific issue:

    While the base ps 2.0 spec is shown not to be hardware-centric, and optimizing to the spec is demonstrated rather clearly, the HLSL ps_2_0 profile is also shown to be "architecture that doesn't implement SINCOS effectively"-centric, because it fails to optimize to the spec for SINCOS. That makes it R3xx-centric rather than "base 2.0"-centric, at least until other vendors release hardware that doesn't implement SINCOS directly in hardware.

    This could still easily be shown to be a non-hardware-centric compilation profile for SINCOS, depending on whether MS has information that other IHVs are doing the same (i.e., it can't easily be shown with the information we have now), if that were the only profile with the behavior.

    But my test of the sample code seems to illustrate another problem: The HLSL compiler does indeed seem "hardware that doesn't support SINCOS directly"-centric for SINCOS, for all profiles. My compilation with FXC 4.09.00.1126 expands SINCOS for ps_2_sw and ps_2_a targets as well, not just ps_2_0. Turning off optimizations does not change this, so it doesn't seem to be just the result of an optimization analysis.

    :?:

    Now, it does seem the ps_2_a profile takes other opportunities to distinguish between the NV3x and the base spec, due to the differing outputs from these profiles reported elsewhere and their being demonstrated to benefit the NV3x performance-wise. What I don't understand is "why not this one"?

    I'm not familiar with what Dean means about input range and it affecting the compiler output in odd ways, but right now it looks like this should have been expressed most effectively in the LLSL using the SINCOS macro. Since the linked compiler would seem to obviously be expected to behave the same, this looks like a significant deviation from what the compiler should be doing unless there is some benefit to nVidia in being able to decide whether to collapse expanded macros after the fact, or not. However, I don't see what possible advantage they could get, given my impression of the NV3x strength with SINCOS.

    How many clock cycles is SINCOS? Just 2 would seem to preclude any advantage, unless SINCOS has some sort of register usage penalty, as the temp register count is drastically reduced in the latest compiler (for both ps_2_0 and ps_2_a). Of course, the problem with this is that I don't know of any such register usage penalty for SINCOS.

    Code:
    //
    // Generated by Microsoft (R) D3DX9 Shader Compiler 4.09.00.1126
    //
    //   fxc /Tps_2_a /Fxcode.txt test.hlsl
    //
        ps_2_x
        def c0, -0.5, 1, 0, 0
        def c1, 0.159155, 0.25, 6.28319, -3.14159
        def c2, -2.52399e-007, 2.47609e-005, -0.00138884, 0.0416666
        dcl t0.x
        mad r0.w, t0.x, c1.x, c1.y
        frc r0.w, r0.w
        mad r0.w, r0.w, c1.z, c1.w
        mul r0.w, r0.w, r0.w
        mad r1.w, r0.w, c2.x, c2.y
        mad r1.w, r0.w, r1.w, c2.z
        mad r1.w, r0.w, r1.w, c2.w
        mad r1.w, r0.w, r1.w, c0.x
        mad r0, r0.w, r1.w, c0.y
        mov oC0, r0
    
    // approximately 10 instruction slots used
    
    
    // 0000:  ffff0201  0016fffe  42415443  0000001c  ......._CTAB.___
    // 0010:  00000023  ffff0201  00000000  00000000  #___....________
    // 0020:  00000000  0000001c  325f7370  4d00615f  ____.___ps_2_a_M
    // 0030:  6f726369  74666f73  29522820  44334420  icrosoft (R) D3D
    // 0040:  53203958  65646168  6f432072  6c69706d  X9 Shader Compil
    // 0050:  34207265  2e39302e  312e3030  00363231  er 4.09.00.1126_
    // 0060:  05000051  a00f0000  bf000000  3f800000  Q__.__..___.__.?
    // 0070:  00000000  00000000  05000051  a00f0001  ________Q__.._..
    // 0080:  3e22f983  3e800000  40c90fdb  c0490fdb  ..">__.>...@..I.
    // 0090:  05000051  a00f0002  b4878163  37cfb5a1  Q__.._..c......7
    // 00a0:  bab609ba  3d2aaaa4  0200001f  80000000  ......*=.__.___.
    // 00b0:  b0010000  04000004  80080000  b0000000  __...__.__..___.
    // 00c0:  a0000001  a0550001  02000013  80080000  .__.._U..__.__..
    // 00d0:  80ff0000  04000004  80080000  80ff0000  __...__.__..__..
    // 00e0:  a0aa0001  a0ff0001  03000005  80080000  ._..._...__.__..
    // 00f0:  80ff0000  80ff0000  04000004  80080001  __..__...__.._..
    // 0100:  80ff0000  a0000002  a0550002  04000004  __...__.._U..__.
    // 0110:  80080001  80ff0000  80ff0001  a0aa0002  ._..__..._..._..
    // 0120:  04000004  80080001  80ff0000  80ff0001  .__.._..__..._..
    // 0130:  a0ff0002  04000004  80080001  80ff0000  ._...__.._..__..
    // 0140:  80ff0001  a0000000  04000004  800f0000  ._..___..__.__..
    // 0150:  80ff0000  80ff0001  a0550000  02000001  __..._..__U..__.
    // 0160:  800f0800  80e40000  0000ffff            _...__....__
    
    MS's response and Dean's further analysis should be helpful, as would some testing of what nVidia's drivers do performance-wise for these outputs.
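For anyone who wants to follow the listing above by hand, here is a minimal Python sketch that replays the arithmetic instructions one at a time. The constants are copied from the `def` lines; the function names are made up for illustration, and the original test.hlsl is not shown in the thread, so the comparison against `math.sin` is an inference from the arithmetic (the 0.25 offset shifts the phase by a quarter turn before an even, cosine-style polynomial is evaluated), not something the compiler output states.

```python
import math

# Constants copied from the def lines in the fxc listing.
C0 = (-0.5, 1.0, 0.0, 0.0)
C1 = (0.159155, 0.25, 6.28319, -3.14159)
C2 = (-2.52399e-007, 2.47609e-005, -0.00138884, 0.0416666)

def frc(v):
    """Fractional part, v - floor(v), which is what the frc instruction returns."""
    return v - math.floor(v)

def expanded_shader(t0_x):
    """Replay the arithmetic of the listing (hypothetical helper, not driver code)."""
    r0 = t0_x * C1[0] + C1[1]   # mad: x / (2*pi) + 0.25
    r0 = frc(r0)                # frc: wrap into [0, 1)
    r0 = r0 * C1[2] + C1[3]     # mad: rescale to roughly [-pi, pi)
    r0 = r0 * r0                # mul: y * y
    r1 = r0 * C2[0] + C2[1]     # mad: Horner steps of an even polynomial in y
    r1 = r0 * r1 + C2[2]        # mad
    r1 = r0 * r1 + C2[3]        # mad
    r1 = r0 * r1 + C0[0]        # mad: ... - 0.5
    return r0 * r1 + C0[1]      # mad: ... + 1, then mov to oC0 in the listing

def max_error(lo=-20.0, hi=20.0, steps=4000):
    """Largest absolute difference from math.sin over evenly spaced samples."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(abs(expanded_shader(x) - math.sin(x)) for x in xs)
```

Calling `max_error()` gives a feel for how closely the expanded sequence tracks a sine over a few periods. Note this runs in double precision, so it says nothing about what the hardware's own precision does to the result.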

    EDIT: clarity.
     
  2. Reverend

    Banned

    Joined:
    Jan 31, 2002
    Messages:
    3,266
    Likes Received:
    24
    Tim is my b_tch. Well, sometimes, I'm his b_tch.

    I know exactly what Tim thinks of Futuremark (or 3DMark03).
     
  3. Deathlike2

    Regular

    Joined:
    Aug 17, 2003
    Messages:
    542
    Likes Received:
    5
    lol... that's an interesting way of putting it...
     
  4. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    But surely a driver with detection algorithms as advanced as the ones nvidia are using could detect the expansion and convert it back to sincos?
     
  5. WaltC

    Veteran

    Joined:
    Jul 22, 2002
    Messages:
    2,710
    Likes Received:
    8
    Location:
    BelleVue Sanatorium, Billary, NY. Patient privile
    What you can't do, of course, on an 8-bit-per-color integer card is display more than 256 shades of a color. The ~16.7 million total color display output of 32-bit integer is not the problem, of course, since the smallest on-screen unit a color can be is 1 pixel, and of course there aren't any 16.7 million-pixel monitors out there--and if there were, there'd be no chips capable of driving them at a decent performance. 2048x1536 gets us to a tad over 3,000,000 pixels on screen, for a maximum of 3M+ colors onscreen, which is far under even what 32-bit integer is capable of in terms of maximum color display.

    The barrier that fp precision bursts wide open is not the 16.7 million color barrier, since that's irrelevant for the reasons stated above. What fp precision does is to expand the shades of a color palette restriction from 256 to thousands of shades per color. Take your average family photograph, for instance. It can easily contain 1,000 or more levels of contrast--which 32-bit integer can never hope to reproduce, as it simply can't with the 256-shades-per-color limitation (which has *nothing to do* with the ~16.7M max color limit.) But fp precision can match it easily. That's a tremendous display advance, and IMO, it's more important by far than ancillary things like HDR--which are nice and interesting--but don't describe the advantages of fp fully by any stretch.
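The shades-per-color arithmetic above can be checked with a small Python sketch. It is only an illustration under stated assumptions: a 1,000-step ramp stands in for the "1,000 or more levels of contrast" of the photograph, and IEEE half precision (via the standard `struct` module's `'e'` format) stands in for "fp precision" in general, since fp24 has no standard storage format to test against. The helper names are made up.

```python
import struct

def to_8bit(v):
    """Quantise a value in [0, 1] to one of the 256 integer levels of a channel."""
    return round(v * 255)

def to_fp16(v):
    """Round a value to IEEE half precision and read it back."""
    return struct.unpack('<e', struct.pack('<e', v))[0]

def distinct_levels(n, quantise):
    """Count how many different values survive when an n-step ramp is quantised."""
    ramp = [i / (n - 1) for i in range(n)]
    return len({quantise(v) for v in ramp})

PIXELS_2048x1536 = 2048 * 1536   # pixels on screen at that resolution
COLORS_24BIT = 2 ** 24           # total colors with 8 bits per channel

shades_8bit = distinct_levels(1000, to_8bit)
shades_fp16 = distinct_levels(1000, to_fp16)
```

With these assumptions the 8-bit channel can hold at most 256 of the 1,000 ramp steps, while the half-precision channel keeps the steps separate, and the pixel count at 2048x1536 stays well below the 2^24 total colors.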

    What we're seeing in current 3d games running on fp hardware are older game engines not designed to render internally to fp precisions--and as long as that remains the case games won't look much different than what we're used to. But even the Doom3 engine renders internally at FX12 and fp16 (according to Carmack)--actually, Carmack is just describing the NV3x path he's coded for the game. I do not get out of that that the D3 engine will actually render internally at fp16--only that it will use that level of precision in the NV3x--sometimes--and as he's said he'll also be using FX12--I'm assuming the game engine itself therefore will exceed current integer rendering precision, and will average somewhere between FX10 and fp16, but always less than full fp16 precision, even when using fp16 precision hardware (just guessing there.) I.e., what I get from Carmack is that D3 itself will not require even fp16 precision in order to render properly--or as intended. When you consider that he does not plan to restrict the rendering engine to fp-precision hardware, you can see his logic.

    So, in other words, simply because a game engine uses a certain fp-precision pipeline, we cannot assume that the game engine itself requires, or makes use of, the full precision of the pipeline being used. For instance, I could program a white rectangle on a black background, for a total of 2 on-screen colors, and have it run through the fp24 display pipeline--and I've got all that great precision--but since I'm only displaying two colors, I'm not using it fully in the software even though I am using the physical fp pipeline--which always runs at fp24 in the R3x0.

    Anyway, this is how I've always understood the issues relative to *making use* of the fp pipeline as opposed to simply using it to run software which doesn't come close to using its precision potential. I would think, and would like to be corrected here of any error on my part, that the 32-bit framebuffer would be largely irrelevant here, since we aren't concerned with the total number of possible display colors--as 16.7M possibilities is plenty--but we are concerned with the number of shades-per-color displayed by the card. This seems to me to be where fp precision has its greatest value.

    So what's it going to take before we start seeing 3d game engines *requiring* fp precision? New software development tools all the way around on the graphics end (which the artists will use), and new game engines capable of rendering at the levels of color precision which will fully utilize the hardware. It's going to be a while, I think, before we start seeing the full power of fp precision--for all those practical reasons. Nevertheless, I think it is one of the most exciting features to hit 3d in years.
     
  6. Sxotty

    Veteran

    Joined:
    Dec 11, 2002
    Messages:
    4,890
    Likes Received:
    344
    Location:
    PA USA
    Agreed walt, but then my agreeing is not much of an endorsement ;).
     
  7. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0

    DAMN! I never thought this might have hit so hard.
    Sorry for the people offended. It's obvious I'm not Tim Sweeney.

    First of all it was intended as a joke.

    Second, since Tim Sweeney's letter was so "left to people's interpretation" about his arguments, why in the world is there no chance he was sitting at his PC laughing his ass off at this 9 page argument with a MILLION different points of view, everyone swearing they know what's in that guy's mind? (keep in mind I'm not attacking anyone)

    I actually visualized this, him laughing and telling that to himself.

    I did not do that to insult or discredit the guy, for that there are plenty of 4 letter words, yet UT games are among my favourites, so I can only praise him.

    If you ask me, the closest one of you guys to what he meant with that letter, it was "I wouldn't touch that with a 10 foot pole", and would add to that, "you try it."

    Reverend...sorry and lighten up, it was a JOKE. 9 pages for such a short mail answer can hint anyone this got out of proportion.
     
  8. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Actually, I've enjoyed the comments this has all generated.
    I've learned a lot about 3D programming, the associated hardware, and a bit about the business side too.

    Been a surprisingly deep thread, much more than "nVidia Sucks".
     
  9. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Re: Oooops

    No problem. Nobody can bother to read the garble you output anyway.

    Please learn to write. Or have you lost the space and enter keys? In case you didn't notice, this *isn't* the chat room!

    Seriously, you'll get better responses if your posts are legible.

    (You have a good attitude and some enthusiasm -- you should fit in well. I'd log in and welcome you but I'm just tired of wading through stuff remotely resembling language... Having to hand pick the gems here, anyway.)
     
  10. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    The SINCOS instruction is defined only over a very finite range, so one possibility that crossed my mind was that the HLSL I originally gave wasn't confined to that range and as such the compiler decided not to use the instruction. BUT that wasn't it: I modified the code (in several ways) and it has never output a SINCOS instruction.

    I haven't looked up the exact cost on GFFX, but whatever it is, the compiler should be outputting SINCOS so that any IHV/driver can use that information however it likes. The more information the driver has, the more optimisation strategies it can try.

    I've also (like you) tried the latest released SDK and still no joy. At the weekend I'll see if there is a beta compiler around, and if so I'll try that.

    I've mailed MS, so I'll see what they have to say.
     
  11. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    Re: Oooops

    Am I missing the handwriting plugin for IE?
     
  12. DeanoC

    DeanoC Trust me, I'm a renderer person!
    Veteran Subscriber

    Joined:
    Feb 6, 2003
    Messages:
    1,469
    Likes Received:
    185
    Location:
    Viking lands
    Maybe, but I doubt it; it's not that easy.

    The sequence of code is A) mixed up with other code and B) just a Taylor approximation to a sincos. To convert it back you would have to successfully extract the relevant instructions and then tell that it was doing the sincos.
    As it's a Taylor approximation, you can only tell it's sincos by looking at both the arithmetic ops AND the constants (it's not a sincos approximation if the constants change). Now in theory the local def's override any SetPixelShaderConstants() call, so the driver would only have to do this (complicated) detection once when the shader was created, BUT in practice SetPixelShaderConstants() does override the local def's, so you would have to do it every time SetPixelShaderConstants() or SetPixelShader() is called. That would probably be quite time consuming and would probably slow down the game rather than speed it up.
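To make the "ops AND constants" point concrete, here is a rough Python sketch of the easy case only. Everything in it is hypothetical: the shader is represented as a plain list of opcode names plus a dict of constant registers, the pattern is taken from the fxc listing posted earlier in the thread, and the match requires the instructions to be contiguous. It does not track register operands or untangle the sequence from surrounding code, which is exactly the hard part described above.

```python
import math

# Pattern taken from the fxc listing earlier in the thread (an assumption
# that this is the only expansion worth recognising).
SINCOS_OPS = ["mad", "frc", "mad", "mul", "mad", "mad", "mad", "mad", "mad"]
SINCOS_CONSTS = {
    "c0": (-0.5, 1.0, 0.0, 0.0),
    "c1": (0.159155, 0.25, 6.28319, -3.14159),
    "c2": (-2.52399e-007, 2.47609e-005, -0.00138884, 0.0416666),
}

def find_ops(ops, pattern):
    """Return the index where pattern occurs contiguously in ops, or -1."""
    for i in range(len(ops) - len(pattern) + 1):
        if ops[i:i + len(pattern)] == pattern:
            return i
    return -1

def consts_match(consts):
    """True only if every expected constant register holds the expected values."""
    for reg, expected in SINCOS_CONSTS.items():
        actual = consts.get(reg)
        if actual is None or len(actual) != len(expected):
            return False
        for a, e in zip(actual, expected):
            if not math.isclose(a, e, rel_tol=1e-5, abs_tol=1e-12):
                return False
    return True

def looks_like_sincos(ops, consts):
    """Both the opcode sequence and the constants have to agree."""
    return find_ops(ops, SINCOS_OPS) >= 0 and consts_match(consts)

# Same instructions, but the application has overwritten c2.
shader_ops = ["dcl"] + SINCOS_OPS + ["mov"]
overridden = dict(SINCOS_CONSTS, c2=(0.0, 0.0, 0.0, 1.0))
```

With the original constants `looks_like_sincos(shader_ops, SINCOS_CONSTS)` is true, and with `overridden` it is false even though not one instruction changed. That is why the check would have to be repeated whenever the constants can change.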

    BUT this is exactly where NVIDIA could do an application specific optimisation. If they know for certain that the application is really using that asm sequence as a sincos, then they can embed this information in the driver to speed up the detection and therefore the application.

    After all why should NVIDIA be penalised in benchmarks and the quality of games just because MS didn't write HLSL correctly?
     
  13. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,015
    Likes Received:
    112
    Ah, didn't know it's possible to override the constants (well I'm no D3D coder).
    Probably nvidia should ask MS to fix it (though maybe they have - will be interesting to see what MS says).
     
  14. Simon F

    Simon F Tea maker
    Moderator Veteran

    Joined:
    Feb 8, 2002
    Messages:
    4,560
    Likes Received:
    157
    Location:
    In the Island of Sodor, where the steam trains lie
    It appears that there are other "macros" that are pre-expanded by HLSL, yet there are others that aren't. <shrug>

    As for the "these are constants that are never going to be changed by the application" it is a pity that D3D has no way of identifying them. I'm sure that would have allowed more peephole style optimisations within the driver.

    Judging from the DirectX Mail list, I don't think the compiler is "finished". No doubt there will be further updates.
     
  15. Ilfirin

    Regular

    Joined:
    Jul 29, 2002
    Messages:
    425
    Likes Received:
    0
    Location:
    NC
    Yeah. This update was just one of many planned.
     
  16. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    pidgin english

    I own a GeForce Ti4200.
    If I had the bucks right now, I would buy one of the ATI DirectX 9 offerings today.
    Anyway, reading topics like this I came up with the conspiracy theory that is the answer to the simple question "How did NVIDIA screw things up like this?" And like in any conspiracy theory worth mentioning, Micro$oft has a large part in it. So it goes....
    M$: Slice the prices of the GeForce 4s that we put into the Xbox! I mean, you know very well the specific rules of how the console world works (we lose money on hardware and earn money on software)!
    NVIDIA: We understand that very well, and we adjusted the prices accordingly from the start, we won't slice the prices any further.
    M$: Slice the prices! We don't like losing money (as a principle, not that we are lacking it), screw the agreement!
    NVIDIA: NO! Screw you!
    M$: Speak of the devil... he, he, he. Formally we are continuing our relationship on the development of DX9; informally... GET THE F*** OUT OF HERE!
    That, combined with the bad timing (technical difficulties) on Nvidia's part and excellent timing (and good execution) on ATI's part, led to the situation today.
    Rumor: Most likely ATI will be developing the GPU for Xbox2.
    I own and use Windows.
    I will avoid putting Linux on my desktop in the near future (at least).
    P.S.
    personal note
    I am bored as hell, and I simply hate waiting for something....
     
  17. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
  18. RussSchultz

    RussSchultz Professional Malcontent
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,855
    Likes Received:
    55
    Location:
    HTTP 404
    What exactly is wrong with a bundling/licensing deal? HL2 is highly anticipated; I'd like HL2; I'd like a new video card. I'm more than happy to let ATI buy me HL2 when I buy a new video card.
     
  19. Anonymous

    Veteran

    Joined:
    May 12, 1978
    Messages:
    3,263
    Likes Received:
    0
    Nothing wrong, but all the conspiracies against Nvidia must stop. It is the way the hardware-software game is going to be played as you can see.
     
  20. Bouncing Zabaglione Bros.

    Legend

    Joined:
    Jun 24, 2003
    Messages:
    6,363
    Likes Received:
    82
    Nvidia is rumoured to have offered the same. Valve chose to go with ATI because the ATI tech is simply better.
     