Futuremark Announces Patch for 3DMark03

Discussion in 'Graphics and Semiconductor Industry' started by Nick[FM], Nov 11, 2003.

  1. Rolf N

    Rolf N Recurring Membmare
    Veteran

    Joined:
    Aug 18, 2003
    Messages:
    2,494
    Likes Received:
    55
    Location:
    yes
I think most of the confusion stems from their use of the term "unified compiler technology". It's a marketing term and as such means nothing. It specifically does not mean "compiler" as an average person would understand it. The sinister twist here is that the term contains "compiler". Note the difference from the "high resolution anti-aliasing" (dribble) vs multisampling (meaningful description) situation.

    You can't tell for sure what's what. Whenever NVIDIA speak of a "compiler" they may either refer to the usual meaning, or they may use the word as an abbreviation of UCT.

The all-important catch really is "technology". It's a combination of some stuff (a true compiler) and some other stuff (replacements). If one component is circumvented, "the UCT" is compromised, which gives a basis to their claims.

    This is the wiggle room they're currently using, and it can be interpreted as making sense, but it really takes some brain surgery.

    PS: the above doesn't mean I sympathize with those twits. Not at all.
     
  2. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    :lol:

    The one nice thing about these repeat debates is the humor they elicit. I loved surrounding Fuad's name with warning lights, too. Corwin's subversive PRs are always good for a laugh, and this last one was possibly the best, not least because of that magical little dwarf. And using "gentle caress" to refer to engineering in an official PR? Classic! :D
     
  3. Magic-Sim

    Newcomer

    Joined:
    Nov 14, 2003
    Messages:
    99
    Likes Received:
    0
    Location:
    Calais (France)
Hey, maybe the compiler is just... a little virtual dwarf rewriting shaders by hand in semi-real-time :D That would explain a lot!

Well, it could be Dobby too :D

That would fit nV's image well ;)
     
  4. Dio

    Dio
    Veteran

    Joined:
    Jul 1, 2002
    Messages:
    1,758
    Likes Received:
    8
    Location:
    UK
Any degradation on R300 would be a. unlikely, b. gradual and c. slight. It's pretty much a non-problem.
     
  5. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Well, register usage is not as important in the NV35; and it'll be even less important in the NV40. But that doesn't mean it'll be gone.

According to my testing, the NV35 indeed has twice the register file size; the penalties are nearly halved. (My results on an NV35: -40%, i.e. 1.7x as slow, with 12 FP32 registers. thepkrl's results on an NV30: 3.4x as slow. I tested with 12 registers as it's the theoretical maximum for PS2.0, although both R3xx and NV3x support 32 full-precision registers.)

    Now, considering a program with only Vec4 MAD, the R360 would get: 415*8 = 3320
    NV38 = 475*8*0.6 = 2280
That's the WORST case scenario for the NV38 for pure arithmetic. I'm not taking into account the TEX disadvantage for the NV30 or the Scalar advantage for the R300, so in practice, best-case PS2.0 is even better for ATI.

Now, the best case scenario for NVIDIA is an 8 FP16 register program, thus all done in Vec4 FP16. Operations can be either MAD, MUL or ADD.
    R360 = 415*8 = 3320
    NV38 = 475*12 = 5700

    If we make an average of 2280 and 5700, we get: 3990.
According to that, NVIDIA has the lead with the NV38. But obviously, such an average is not accurate, because of the TEX and Vec3+Scalar factors.
I'd go as far as saying +15% performance for the R360 and -20% for the NV38
    ->
    R360 = 3818
    NV38 = 3192

That would mean the R360 is 19.6% faster than the NV38 for shading, shading here meaning PS1.1->PS2.0+.
And I'd say the calculations I used most likely favor NV.
One thing is obvious though: without the register usage problem, the NV38 would almost never be more than 10% to 15% slower.
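The back-of-the-envelope arithmetic above can be reproduced in a few lines. To be clear, the clock speeds, pipe counts, and the 0.6x / 1.15x / 0.8x efficiency factors are this post's own rough guesses, not measured values:

```python
# Rough shader throughput estimates, using the post's numbers.
# Figure of merit: clock (MHz) * pixel pipes * efficiency factor.

def throughput(clock_mhz, pipes, factor=1.0):
    """Clock * pipes * efficiency, a crude ops/second figure of merit."""
    return clock_mhz * pipes * factor

# Worst case for NV38: FP32 with 12 registers -> ~0.6x efficiency
r360 = throughput(415, 8)               # 3320
nv38_worst = throughput(475, 8, 0.6)    # 2280

# Best case for NV38: all-FP16, few registers -> 12 effective units
nv38_best = throughput(475, 12)         # 5700
nv38_avg = (nv38_worst + nv38_best) / 2 # 3990

# Apply the post's hand-wavy TEX / Vec3+Scalar adjustments
r360_adj = r360 * 1.15                  # 3818
nv38_adj = nv38_avg * 0.80              # 3192
lead_pct = (r360_adj / nv38_adj - 1) * 100
print(f"R360 ~{r360_adj:.0f}, NV38 ~{nv38_adj:.0f}, R360 lead ~{lead_pct:.1f}%")
```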


    Now, you also asked: WHY is it that way?
    Well, there's a limited register file. If there aren't enough registers left, the maximum number of quads ( 8 = 32 pixels ) cannot be sent through the pipeline.
    And why was such a design choice made? I don't really know, I admit.
The NV30 register file (which is organized as FP16 registers; FP16 registers pair up to form FP32 ones, not the other way around) is 4*32 = 128 FP16 registers. The NV35 has 256 FP16 registers. I doubt that would take SO much cache... although the NV30 already has over 1MB of cache, while the R300 has only about 500KB :!:

Now, I'd be interested in knowing how many registers the R300 has, since I don't have a clue about that.


    Uttar
     
  6. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
According to NVIDIA, the NV35 has 256 quads in flight and 8 FP32 registers per quad.

-> 2048 FP32 registers -> 32 KB of cache for those registers
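Those figures check out arithmetically. A quick sketch, taking NVIDIA's stated numbers at face value and assuming each FP32 register is a four-component vector (16 bytes):

```python
# Register file footprint from the stated NV35 figures.
BYTES_PER_FP32_VEC4 = 4 * 4  # four 32-bit components per register

quads_in_flight = 256
regs_per_quad = 8            # FP32 registers per quad (of 4 pixels)

total_regs = quads_in_flight * regs_per_quad    # 2048 registers
total_bytes = total_regs * BYTES_PER_FP32_VEC4  # 32768 bytes
print(total_regs, total_bytes // 1024, "KB")    # 2048 32 KB
```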
     
  7. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
I don't think so. I think the problem is different. Part of the register 'issue' applies within a single pass through the pipeline. Because of this, it is difficult to optimize so that all the units in the pipeline are used every pass.

In the NV30 the problem was slightly different.

Of course the NV40 is a lot better about registers, and even if this problem is still present, we won't have to focus on it.
     
  8. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
Are you sure about the amount of internal cache you're talking about?

If your numbers are right, it means the NV35 has less logic than the R3x0 (more than 50 million transistors for 1 MB of basic cache, plus many more transistors since part of the cache is more complex).
     
  9. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
Yes, well, we both speak French, but that's no reason not to speak English on an English forum ;)

    Anyway...
    Those numbers do surprise me quite a bit.
    8 FP32 registers/quad -> 2 FP32 registers per pixel.
    That would mean 1 FP32 register/pixel for the NV30...
    Yet, thepkrl's numbers were clearly showing that 2 FP32 registers/pixel were FREE.

Unless they doubled the number of quads in the NV35. I doubt that though, because then my 1.7x vs 3.4x numbers make no sense. Unless they did that ONLY through drivers; the drivers I tested with are much more recent than the ones thepkrl used. But that'd be a helluva driver improvement, eh!

    Anyway, 32KB of cache seems awfully small. Maybe that's what they need for the registers; but I'd assume there's a per-register transistor overhead somewhere in the architecture. If all it cost was 32KB, which is not even a 25th of the cache on the NV35... NVIDIA's engineers would seriously need a reality check.

Also, I'm not 100% sure of the 1MB cache number, but the source is generally rather reliable. Notice how NVIDIA proudly said they had 60%+ logic on the GF4, yet we never got an official number for the NV3x? ;)
Remember 1MB is for the NV30, though. Maybe they had less on the NV35, and they just added all that cache in a desperate move to get at least some units working a bit... Who knows.
Certainly reducing that huge amount of cache would be a good way to find the transistors they needed to replace the FX12 units with FP16/FP32 ones.


    Uttar
     
  10. Tridam

    Regular Subscriber

    Joined:
    Apr 14, 2003
    Messages:
    541
    Likes Received:
    47
    Location:
    Louvain-la-Neuve, Belgium
Oops :D

    I need to sleep more :p
     
  11. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    The typical excuse :roll:

    ;) :)


    Uttar
     
  12. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,989
    Likes Received:
    3,529
    Location:
    Winfield, IN USA
    Back on topic...has anyone heard anything official out of nVidia or FM since the retraction statement?

    I haven't heard/read/seen any spin this morning and I'm having a serious sarcasm build up. :(
     
  13. Rolf N

    Rolf N Recurring Membmare
    Veteran

    Joined:
    Aug 18, 2003
    Messages:
    2,494
    Likes Received:
    55
    Location:
    yes
    Here you go ;)
     
  14. Tim

    Tim
    Regular

    Joined:
    Mar 28, 2003
    Messages:
    875
    Likes Received:
    5
    Location:
    Denmark
40% of the 63 million transistors used in the NV25 is actually almost exactly 512 KB.
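A quick sanity check of that figure, assuming the usual 6 transistors per SRAM bit:

```python
# 40% of the NV25's 63M transistors, converted to SRAM capacity.
transistors = 0.40 * 63_000_000  # 25.2M transistors
bits = transistors / 6           # assuming 6T SRAM cells
kb = bits / 8 / 1024             # bits -> bytes -> KB
print(f"{kb:.0f} KB")            # 513 KB, i.e. almost exactly 512 KB
```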
     
  15. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,989
    Likes Received:
    3,529
    Location:
    Winfield, IN USA
    Thanks zeckensack! :D

    Anyone heard Chuckle-boy's response to this from the dark & angry heart-o-darkness? :|
     
  16. cthellis42

    cthellis42 Hoopy Frood
    Legend

    Joined:
    Jun 15, 2003
    Messages:
    5,890
    Likes Received:
    33
    Location:
    Out of my gourd
That post pre-dates ATi's response and nVidia's retraction, actually. (Well, maybe "dates" isn't the word to use in this case since it was all coming in very quickly, but I saw that post before seeing mention of the others on the sites. I sent links to ATi's response and nVidia's retraction to the Inq., but despite following the affair so closely, they have for whatever reason been slow to report on those later releases.)
     
  17. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I'd guess the whole "This is not true" thing is just a way to give them more time, then.
    They could either:

    1) Find another insane theory later.
    2a) Screw the replacements, never put them back again, and never comment on how come the score dropped.
    2b) Screw the replacements, never put them back again, and be honest about it.
    2c) Screw the replacements, never put them back again, and say they had 'certain problems in their driver which made the compiler overly aggressive, and these have now been fixed.'
    3) Never comment on it again, but continue to put the optimizations back with every new driver release.

I'd love 2b, but that's not gonna happen. 2c, though, would still be very nice... 1 and 3 would simply be stupid on their part IMO.


    BTW, Dig, read my PM at nV News?


    Uttar
     
  18. sireric

    Regular

    Joined:
    Jul 26, 2002
    Messages:
    348
    Likes Received:
    22
    Location:
    Santa Clara, CA
     
  19. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    I meant the total of all caches on the R300. That includes texture cache, FIFO cache, Compression Technology ( LMA on GeForces ) cache, and so on.
If that's still wrong by a long shot, I'd be surprised if the GFFX figure was correct; although that person's NVIDIA sources are better than his ATI ones.


    Uttar
     
  20. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    BTW, regarding a way to beat NVIDIA at their own game if they want to continue cheating:
When reading the shader files, change them a bit randomly. A few easy and "annoying-for-NV" changes would be:

1) Randomly change the names of registers (search for R0 in the string, replace all instances with R3, all instances of R3 with R2, and so on)
    2) Put NOP operations in the code randomly
3) Randomly add "easy-to-optimize-by-driver" operations, such as MUL R0,R0,1 or ADD R0,R0,0. If ATI's compiler can also remove these (and I know NVIDIA's can), add operations whose results are never read again (e.g. if in the end only R0, R1 and R2 are still used, do MUL R3, R1, R2 and never use R3 again).

Just make sure you don't add so many that the shader goes beyond the PS2.0 limits and can't compile, though!
All of these could be bypassed by NVIDIA, but it'd take them an awful lot more time to work around than it'd take Futuremark to add them, IMO.
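A minimal sketch of ideas 1) and 2), operating on shader assembly as plain text. The instruction syntax and register names here are illustrative, not Futuremark's actual shader format:

```python
import random
import re

def permute_registers(shader: str, num_regs: int = 4) -> str:
    """Cyclically rename R0..R{n-1} so byte-for-byte shader detection fails.
    A placeholder token (@) avoids clobbering already-renamed registers."""
    for i in range(num_regs):
        shader = re.sub(rf"\bR{i}\b", f"@{(i + 1) % num_regs}", shader)
    return shader.replace("@", "R")

def insert_nops(shader: str, chance: float = 0.3) -> str:
    """Randomly sprinkle NOP instructions between existing ones."""
    out = []
    for line in shader.splitlines():
        out.append(line)
        if line.strip() and random.random() < chance:
            out.append("NOP")
    return "\n".join(out)

asm = "MUL R0, R1, R2\nADD R0, R0, R3"
print(insert_nops(permute_registers(asm)))
```

Since both transforms leave the computed result unchanged, they only defeat pattern-matching on the shader text, which is exactly the point.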


    Uttar
     