David Kirk finally admits weak DX9 vs DX8 performance - GeFX

Discussion in 'Architecture and Products' started by g__day, Oct 24, 2003.

  1. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I'm thinking that R420 will use FP32 because the pixel shader and vertex shader are going to be unified, right? There's only going to be one shader core, with full texture access, and it will work on some vertices, buffer them, then rasterize them and switch to pixel shading, and so forth. Since for vertices both ATI and NVidia have always used FP32, I don't see ATI reducing this precision, especially since precision issues can be more easily noticed as a displacement error rather than a colour error.

    This will definitely make efficiency much higher, as most of the time only one of the vertex or pixel shaders is working near its peak. When drawing large triangles, the VS is waiting. When drawing small ones, the PS is waiting. I think 16 Vec4 FP32 general shaders, as opposed to 4 Vec4+scalar FP32 and 8 Vec4 FP24 shaders, is not going to be that much higher silicon cost considering it is a next generation part, and we should get twice the peak performance in pixel shading, 4x in vertex shading, plus fully featured vertex texturing.
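A rough sketch of the arithmetic behind the "twice" and "4x" figures, plus a toy model of why a unified pool idles less. The unit counts come from the post; the idea that one unit of work costs the same on any shader unit, and the `frame_time_*` helpers, are simplifying assumptions for illustration only, not a description of any real chip.

```python
# Unit counts as speculated in the post; not confirmed hardware specs.
SPLIT_VS_UNITS = 4    # dedicated vertex shaders (Vec4+scalar FP32)
SPLIT_PS_UNITS = 8    # dedicated pixel shaders (Vec4 FP24)
UNIFIED_UNITS = 16    # speculated unified Vec4 FP32 shaders


def peak_speedup(unified_units, dedicated_units):
    """Peak ratio when every unified unit runs the same kind of shader."""
    return unified_units / dedicated_units


def frame_time_split(vertex_work, pixel_work):
    """Split design: the slower of the two dedicated pools sets the pace
    (assumes both pools run concurrently and work units are comparable)."""
    return max(vertex_work / SPLIT_VS_UNITS, pixel_work / SPLIT_PS_UNITS)


def frame_time_unified(vertex_work, pixel_work):
    """Unified design: all work is spread across one shared pool."""
    return (vertex_work + pixel_work) / UNIFIED_UNITS


if __name__ == "__main__":
    print("pixel peak speedup:", peak_speedup(UNIFIED_UNITS, SPLIT_PS_UNITS))
    print("vertex peak speedup:", peak_speedup(UNIFIED_UNITS, SPLIT_VS_UNITS))
    # Large triangles: little vertex work, lots of pixel work.
    print("split:", frame_time_split(4, 80))
    print("unified:", frame_time_unified(4, 80))
```

In the pixel-heavy case the split design leaves the vertex units mostly idle, which is the waiting effect described above.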

    I'm definitely looking forward to these next gen parts.
     
  2. Gubbi

    Veteran

    Joined:
    Feb 8, 2002
    Messages:
    3,661
    Likes Received:
    1,114
    The area of a multiplier array scales with width^2. Going from FP24 to FP32 increases the mantissa by 50% (1.5x the width), and hence the area of the multipliers by 125% (1.5^2 = 2.25x). This, plus adding another 4 shaders, seems unrealistic to me. More likely 12 unified shaders (if unified at all; DX10 seems a long way off).
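The scaling argument can be checked with a few lines. This is only a sketch of the quadratic rule of thumb stated in the post; the 17-bit and 24-bit significand widths in the second call (hidden bit included, assuming an s16e7 FP24 layout) are an assumption added for comparison, and the function names are made up for illustration.

```python
def area_increase_percent(width_growth):
    """Percent area increase of a multiplier array whose operand width
    grows by the factor `width_growth`, assuming area ~ width^2."""
    return (width_growth ** 2 - 1.0) * 100.0


def area_increase_from_bits(old_bits, new_bits):
    """Same rule, expressed with explicit operand widths in bits."""
    return area_increase_percent(new_bits / old_bits)


if __name__ == "__main__":
    # The post's figure: a 50% wider mantissa.
    print(area_increase_percent(1.5))
    # Assumed significand widths including the hidden bit (17 vs 24).
    print(round(area_increase_from_bits(17, 24), 1))
```

Counting the hidden bit gives a somewhat smaller increase, but the multipliers still roughly double in area.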

    Cheers
    Gubbi
     
  3. KimB

    Legend

    Joined:
    May 28, 2002
    Messages:
    12,928
    Likes Received:
    230
    Location:
    Seattle, WA
    I hope you're right, Mintmaster, I hope you're right.

    As for the extra die space required, Gubbi, remember that we don't know exactly what percentage of the total die space the multipliers require.
     
  4. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Mintmaster, you seem to be confusing R400 and R420. Common confusion of course.

    Just wait till the NV50 comes, THEN we'll see serious FP32 power coupled with astonishing flexibility and finesse.
    Oh wait, wasn't that the NV30? :twisted: ;)


    Uttar
     
  5. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,532
    Location:
    Winfield, IN USA
    :lol:

    You have no idea how much I'm looking forward to your article! ;)
     
  6. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Well, frankly, there's not a single mention of the NV50 in it. I do explain the NV30 debacle's causes AFAIK in more detail though.

    And, sorry, ULE has been delayed again. No kidding.
    The reason? The person editing it seems so busy in RL that he didn't even reply to the PM including the URL to download the final, non-edited version *grins*

    Which is surprisingly close to the URL I gave you when I was just kidding around - it was ule4.doc.
    But sorry, it's gone now ;)


    Uttar
     
  7. tamattack

    Newcomer

    Joined:
    Jun 24, 2002
    Messages:
    126
    Likes Received:
    0
    Location:
    Canada? What state is that in?
    You're such a tease. :evil: But I'm very much looking forward to reading it. :D

    EDIT: BTW, what does ULE mean?
     
  8. Mariner

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,288
    Likes Received:
    1,055
    Bah! It sounds to me like this 'ULE' is nothing but Vapourware! :twisted:
     
  9. jb

    jb
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,636
    Likes Received:
    7
    I can proofread it for you.

    In fact, if you post it here...we can all proofread it for you :)
     
  10. Ollo

    Newcomer

    Joined:
    Feb 7, 2002
    Messages:
    129
    Likes Received:
    1
    Uttar's Legendary Editorial
     
  11. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Yeah, yeah, MAYBE it's vapourware, but then again, MAYBE I'll reveal the REAL secret of Quack in it :twisted:

    Oh wait, we know that already, damn :p

    Hmm... No, but seriously...
    Is it MY fault if the guy who's editing is VERY busy in RL?
    *thinks a bit*

    Oh wait, it IS my fault... *sigh* ( and don't try to figure out how that's possible, even a psychic couldn't guess it )

    Ollo: Nah, it's really Uttar's Late Editorial :twisted: :D ;) :p


    ---

    Okay, to compensate for me being evil and teasing you all, here's a small part of it:

    That part was already edited a while ago FYI.



    Uttar

    P.S.: And I swear you won't get any more of it before release :p
     
  12. olivier

    Newcomer

    Joined:
    Aug 8, 2002
    Messages:
    78
    Likes Received:
    0
    Location:
    Trois-Rivieres, Quebec
    can we get more before the release :?: 8)
     
  13. cthellis42

    cthellis42 Hoopy Frood
    Legend

    Joined:
    Jun 15, 2003
    Messages:
    5,890
    Likes Received:
    33
    Location:
    Out of my gourd
    Hey, I'm good with editing AND not busy right now--feel free to toss it over and I'll get right to work. :D The earlier it gets done, the more money you can extort from the likes of Dig to get it posted as quickly as possible. ;)
     
  14. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    LOL, yeah, Dig, how much do you give me to post it faster? :D

    More seriously though...
    The person in question generally posts on internet forums nearly every day. His last postings I can find seem to be dated October 24th.

    I received a read receipt for the messages I sent on the 25th, but no response. I did not receive a read receipt for the message I sent on the 27th.

    However, I do not believe I have any reason to worry, considering it took him over a week to correct Parts 5 & 6 - he claims he doesn't go to the computer daily anymore due to RL, and the reasons I'm aware of for that are perfectly reasonable.

    Although if a certain person responds and accepts, who knows, he might be moving to Canada - if so, that could delay the whole thing an awful lot more *grins* - although I admit it's unlikely.


    And feel free to think it's my fault ;) :p


    Uttar
     
  15. Ailuros

    Ailuros Epsilon plus three
    Legend Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    9,511
    Likes Received:
    224
    Location:
    Chania
    Simple question: is there even a hint anywhere of a good number of highly experienced employees retiring during the NV25 era? I don't seem to be able to find anything about it so far.

    It shouldn't be in anybody's interest to elaborate on a matter like that but according to what I know and MHO it played a very significant role in recent releases.
     
  16. Manfred Bertuch

    Newcomer

    Joined:
    Oct 30, 2003
    Messages:
    1
    Likes Received:
    0
    Re: David Kirk finally admits weak DX9 vs DX8 performance -

    On editor's day I asked David Kirk about this slide and he told me that the GFLOPS figures are a mistake. He wanted to indicate that the shader units can operate at 16 instr/clock and the new FP units in NV35/36/38 operate at 32 instr/clock. From a second source I learned that this is simply derived from 4 components/vector x 4 units and 4 components/vector x 8 of the newer units, respectively. Another slide compared 48 instr/clock in total against 32 instr/clock for the R300 (4 components/vector x 8 pipes), so Nvidia wins.

    ATI's own counting is (1 tex op + 1 vec3 op + 1 scalar op) x 8 pipes, which gives 24 instr/clock (or 40 instr/clock with a MAD counted as two operations), which sounds more reasonable than counting per-component operations.
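To make the two counting conventions concrete, here is a small sketch that reproduces the figures quoted above. The function names are invented for illustration; the unit and pipe counts are the ones given in the post, and nothing here claims to describe how either vendor actually schedules instructions.

```python
def per_component_count(components_per_vector, units):
    """Counting style from the slide: every vector component is one 'instruction'."""
    return components_per_vector * units


def per_op_count(pipes, mad_as_two=False):
    """Counting style attributed to ATI: 1 tex op + 1 vec3 op + 1 scalar op per pipe.
    With mad_as_two, each arithmetic op (vec3 and scalar) counts as two operations."""
    arithmetic_weight = 2 if mad_as_two else 1
    tex_ops = 1
    vec3_ops = 1 * arithmetic_weight
    scalar_ops = 1 * arithmetic_weight
    return (tex_ops + vec3_ops + scalar_ops) * pipes


if __name__ == "__main__":
    older_units = per_component_count(4, 4)   # 16 instr/clock
    newer_units = per_component_count(4, 8)   # 32 instr/clock
    print("slide total:", older_units + newer_units)
    print("R300, per component:", per_component_count(4, 8))
    print("R300, per op:", per_op_count(8))
    print("R300, per op, MAD as two:", per_op_count(8, mad_as_two=True))
```

The same R300 hardware comes out as 24, 32, or 40 "instr/clock" depending purely on what is counted as an instruction.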

    This shows that even printed, technical-looking information straight from the leading technicians can sometimes be misleading.

    Manfred Bertuch, c't magazin fuer computertechnik
     
  17. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Hmm, nope, sorry.
    Although I don't know since my ex primary source was hired by NV slightly after the NV20 IIRC.

    Probably not the best man to have realized that...


    Uttar

    P.S.: I'll check to see if I can get any info about that though.
     
  18. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    So you're saying that R420 won't have this unified architecture? Do you know if they're going to add texture units to the vertex shader or give them access to the pixel shader texture units (or some other possibility I haven't thought of yet)?

    I thought R420 was basically the shader heart of R400 transplanted into R300's good balance of everything else, or something like that, but I really have no basis for this assumption. I knew a fair amount about R400 from working at ATI, but that was over a year ago, so lots would have changed.

    It's a shame, though, if they don't get the shaders unified on hardware for the next generation. I guess R420 isn't as big of a step forward as I would like it to be, but on the bright side my R300 will avoid obsolescence a bit longer :)
     
  19. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Mintmaster: I wish I knew, eh. Best man to know this type of stuff right now should be Fudo IMO, although I don't know if he got that technical with the R420 yet...

    Although something I heard, with kinda low reliability though, is that Loki/R420 is R400's VS finesse + R300's PS brute force. Of course, it's not that simple, but the idea is that the VS would have a lot more in common with the R400 than with the R300, and the other way around for the PS.

    I do expect them to have the VS (ab)using the PS texture lookup units though :)


    Uttar
     
  20. Dave H

    Regular

    Joined:
    Jan 21, 2003
    Messages:
    564
    Likes Received:
    0
    I understand that cost per working die increases superlinearly with die size. The point is that this dynamic exists over the entire die-size curve (although it is overshadowed by countervailing per-die costs, like packaging and testing, at the small die size end). There is no die size at which using .15u suddenly becomes infeasible; it just continues to get more expensive as die size increases. So yes, if R300 had used FP32 for the pixel pipeline that would have cut into ATI's margins somewhat; but no, it is in no way infeasible or even significantly more difficult to make such a chip.
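A minimal sketch of that cost curve, assuming a simple Poisson yield model (yield = exp(-area x defect density)) and ignoring wafer edge loss and the per-die packaging and testing costs. The wafer area, wafer cost, and defect density are made-up illustrative values, not figures for any real .15u process.

```python
import math


def poisson_yield(die_area_cm2, defects_per_cm2):
    """Fraction of dies with zero random defects under a Poisson model."""
    return math.exp(-die_area_cm2 * defects_per_cm2)


def cost_per_good_die(die_area_cm2, wafer_cost=1.0,
                      wafer_area_cm2=314.0, defects_per_cm2=0.5):
    """Wafer cost divided by the number of working dies.
    All default values are hypothetical and chosen only for illustration."""
    dies_per_wafer = wafer_area_cm2 / die_area_cm2  # ignores edge loss
    good_dies = dies_per_wafer * poisson_yield(die_area_cm2, defects_per_cm2)
    return wafer_cost / good_dies


if __name__ == "__main__":
    for area in (0.5, 1.0, 1.5, 2.0, 2.5):
        print(f"{area:.1f} cm^2 -> relative cost {cost_per_good_die(area):.5f}")
```

Under these assumptions, doubling the die area more than doubles the cost per good die, but the curve is smooth: there is no area at which it suddenly jumps.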

    While we're on the subject, as the vast majority of GPU transistors are taken up by multiple copies of a particular functional unit operating in parallel, GPUs would be amenable to the trick of including redundant copies of functional units as a way to hedge against bad chips due to random silicon defects. In doing so, you pay an extra penalty in die size--reducing the potential number of dies per wafer--but can reduce the probability of bad chips due to random defects in the silicon down arbitrarily close to zero.
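The redundancy trick can be sketched with a binomial model: a chip needs `required` working copies of a unit, is built with some `spares`, and each copy independently survives with probability `unit_yield`. These names and the independence assumption are illustrative only, and the model ignores defects that land in non-replicated logic.

```python
from math import comb


def chip_yield(required, spares, unit_yield):
    """Probability that at least `required` of the `required + spares`
    identical units are defect-free, assuming independent defects."""
    total = required + spares
    return sum(
        comb(total, good) * unit_yield ** good * (1.0 - unit_yield) ** (total - good)
        for good in range(required, total + 1)
    )


if __name__ == "__main__":
    # Hypothetical example: 8 pipelines needed, 90% chance each one is good.
    for spares in range(4):
        print(spares, "spares ->", round(chip_yield(8, spares, 0.9), 4))
```

Each spare adds die area but pushes the yield closer to 1, which is the trade-off described above.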

    AFAIK graphics IHVs don't make significant use of this technique (except in the very crude instance of the 9500 NP). Of course this is because it imposes extra design and testing burdens, plus the die size penalty I already mentioned. But that's precisely the point: if GPU die sizes were so big that random defects were catastrophically lowering yields, it would be worth it for the IHVs to make use of these techniques. The fact that it's not yet worth it goes to show how GPUs--even large die size chips like R300--already achieve decent yields.

    Of course I'm not saying that die size isn't critically important to costs, or that ATI isn't significantly better off from a cost standpoint getting away with FP24 instead of being required to use FP32. But to suggest that an FP32 R300 wouldn't have been viable on .15 is going way too far IMO.
     