NV40 16 full pipelines - The Inq.

Discussion in 'Pre-release GPU Speculation' started by nelg, Feb 26, 2004.

  1. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    Lemme try it again slowly and see if that helps.

    The.......compiler.......would.......NOT.......be.......affected.......by.......the.......3dm2k3.......patch.

    Even.......nVidia.......has.......admitted.......this.

    You.......came.......here.......to.......address.......one.......issue.......,.......but.......you're.......wrong.......on.......that.......one.......issue.

    I hope that helps, after this I'm afraid I'll have to resort to some form of sarcasm to get the point across. :(
     
  2. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    How shall we learn? By accepting that the mere reordering of variables in the latest 3DM03 patch dropped nV scores almost across the board? Is that a useful "compiler" to you? Try this on for size: Fool me once, shame on you. Fool me twice, shame on me.

    Not a lot of ppl here trust nV to play according to FM's rules WRT 3DM03, given their past actions and their past and current public stance. Business is business, after all, so I suspect there's no getting around endemic "cheating" per the "win at all costs" philosophy (no doubt fueled by being a publicly-traded company), but at least we all know to look out for it. Not expecting it is just being naive, which you don't seem to be. Arguing over semantics is just being petty, which you seem to be.

    It does seem like nV is no longer compromising IQ or using flagrant hacks like hand-coded clip planes, but code substitutions are still against FM's rules for 3DM, sensible or not. So either ATi is far more sophisticated with their code substitutions and FM isn't able to fool them with variable renaming/reordering, or ATi's just not using them. I really have a hard time believing the former, but I also have a hard time believing ATi isn't using app-specific optimisations for some games (given their past admissions in other squabbles and sudden benchmark performance increases in other apps). But at this point it seems ATi can afford to not optimize for 3DM03, simply because it's clear they have the more powerful DX9 architecture.

    We know that nV's "Unified Compiler Tech" is mainly lots of shader substitutions, per many sources. Code substitution is against 3DM03's rules, so the part of "UCT" that's increasing 3DM03 scores is probably in violation of 3DM03's EULA. If you don't care, then more power to you. But why bother convincing us that what nV is doing WRT 3DM03 isn't wrong, at least in this situation? If you want to go beyond 3DM03, both Gabe Newell and John Carmack have said that code substitution in drivers is not helpful to gamers and not reflective of general speed, as the substitutions are fragile (not portable even to a game's subsequent mods) and thus serve mainly to inflate one score at the expense of others.

    You're "done" with this thread, indeed. :roll:
     
  3. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    What if it's not app specific, just shader specific? e.g. if shader_source == "..." then shader_source = replacement_source. This will work in all games that use the same shader. There is a class of short shaders and shader fragments amenable to this sort of peephole optimization. Granted, searching for exact sequences of instructions is kind of a lame way to do optimization, but it's certainly not grossly illegal.

    There are some optimizations that virtually require this, unless you want to embed a theorem prover in your optimizer so that it can reason about trigonometry and vector mathematics.
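    A minimal sketch of the shader-keyed (rather than application-keyed) substitution described above. The shader strings, the hashing scheme, and the replacement table are all invented for illustration; real drivers are closed-source and may work quite differently.

```python
# Hypothetical sketch: the driver keys replacements off the shader source
# itself, not the application name, so ANY game submitting the same shader
# gets the same hand-tuned replacement. Everything here is illustrative.
import hashlib

def _key(source: str) -> str:
    """Hash the shader text so lookups are cheap."""
    return hashlib.md5(source.encode()).hexdigest()

# Table of known shaders -> hand-tuned replacements (made-up assembly).
REPLACEMENTS = {
    _key("mul r0, t0, v0\nadd r0, r0, c0"):
        "mad r0, t0, v0, c0",  # fuse mul+add into one mad
}

def substitute(shader_source: str) -> str:
    """Return the replacement if this exact shader is recognized,
    otherwise pass the original through untouched."""
    return REPLACEMENTS.get(_key(shader_source), shader_source)

print(substitute("mul r0, t0, v0\nadd r0, r0, c0"))  # -> mad r0, t0, v0, c0
```

    Note the fragility this concedes: any textual change to the source, even one that doesn't alter the math, produces a different hash and the lookup misses.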
     
  4. FUDie

    Regular

    Joined:
    Sep 25, 2002
    Messages:
    581
    Likes Received:
    34
    Except that this is still against Futuremark's rules for 3DMark 2003: the optimizations have to be general and must not rely on prior knowledge of what 3DMark 2003 will do while running.
    That's great. So what does this have to do with 3DMark 2003?

    -FUDie
     
  5. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    I thought it was, "Fool me once, shame on you. Fool me twice, won't get can't get fooled again." ;)
     
  6. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    Some legitimate optimizations are very hard to do without wholesale replacement, and they do not benefit "just 3DMark 2003". Detecting shaders is different from detecting the applications running those shaders.
     
  7. {Sniping}Waste

    Regular

    Joined:
    Jan 13, 2003
    Messages:
    833
    Likes Received:
    29
    Location:
    Garland TX
    The shader code that FM uses in 3DMark 03 is ONLY USED in 3DMark 03 and nowhere else. To detect that shader code is to detect only 3DMark 03. The 53.03 driver detects the VS code and replaces it. The VS code for 3DMark 03 is used only in 3DMark 03, so your statement is wrong. Detecting the shader code in 3DMark WILL NOT HELP ANY GAME.
     
  8. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    Now, now. Don't be an evildoer. ;)
     
  9. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    You forgot the next bit, and it's pretty applicable too:

    :lol:
     
  10. Althornin

    Althornin Senior Lurker
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    1,326
    Likes Received:
    5
    If changing the operation order or switching variable names around breaks your "optimization", then it isn't one; it's a shader replacement and a cheat.
    Otherwise you could generalize it.
     
  11. DemoCoder

    Veteran

    Joined:
    Feb 9, 2002
    Messages:
    4,733
    Likes Received:
    81
    Location:
    California
    First of all, all optimizing compilers are vulnerable to problems with reordered input, especially if they have to obey associativity and commutativity rules with respect to floating point. For you, (A+B)+C might be the same as A+(B+C), but the compiler cannot always make that assumption. FutureMark's developers (i.e. human beings) can look at the source code and decide the two expressions are equivalent for their purposes; compilers can't, hence they can't always reorder.
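    The floating-point point is easy to demonstrate: the grouping really can change the answer, so a compiler that must preserve IEEE semantics cannot freely re-associate.

```python
# (A + B) + C vs A + (B + C) in IEEE double precision: the two
# groupings give different results, so a compiler preserving
# floating-point semantics cannot freely reorder the additions.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # 0.0 + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the sum is 0.0
print(left, right)   # -> 1.0 0.0
```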


    Secondly, I'm talking about a peephole pattern-matching optimizer that slides a window over shader streams (which is why I mentioned subshaders), not bit-for-bit identical shaders. These are perfectly legitimate optimizations, but the nature of the pattern matching is sometimes fubared by reordering.

    There are many short shaders (<12 instructions) that applications share in common (standard diffuse, specular, bump, environment, and multitexture shaders) which can be replaced entirely, and which will be "generic" and neutral with respect to applications.
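    The sliding-window idea can be sketched as follows; the instruction strings and the pattern table below are invented for illustration, not taken from any real driver.

```python
# Rough sketch of a peephole pattern matcher: slide a window over the
# instruction stream and splice in a replacement wherever a known
# pattern matches. Patterns and instructions are illustrative only.
PEEPHOLES = {
    ("mul r0, r1, r2", "add r0, r0, r3"): ("mad r0, r1, r2, r3",),
}

def peephole(instructions):
    out, i = [], 0
    while i < len(instructions):
        for pattern, replacement in PEEPHOLES.items():
            if tuple(instructions[i:i + len(pattern)]) == pattern:
                out.extend(replacement)  # splice in the optimized form
                i += len(pattern)
                break
        else:
            out.append(instructions[i])  # no match: copy through
            i += 1
    return out

print(peephole(["mov r1, c0", "mul r0, r1, r2", "add r0, r0, r3"]))
# -> ['mov r1, c0', 'mad r0, r1, r2, r3']
```

    Swap the mul's operands to `mul r0, r2, r1` and the textual match fails even though the math is identical; that is exactly the reordering fragility being described.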

    I'm not saying this is what NVidia is or is not doing, but unless you've got hard evidence backing up your claims, it's innocent until proven guilty. The parties screaming "cheat" should have the wherewithal to put at least an ounce of investigation and work behind what they are saying.

    If you do that, I'll raise no objections. Until then, I'll keep raising the alternative technical explanations that can also fit the facts.
     
  12. madshi

    Regular

    Joined:
    Jul 26, 2002
    Messages:
    359
    Likes Received:
    0
    How much more evidence do we need? NVidia encrypts its drivers, so it's very hard to find out what they're doing. Guess why they did that? There are enough indications to strongly suggest that NVidia is still cheating in the latest drivers:

    (1) They did it in every single driver up to 53.03; this was proven by examinations by FutureMark, by Beyond3D, and by other parties.
    (2) NVidia said they think they have the right to cheat (or, in their words: to do application-specific "optimization" for 3DMark).
    (3) NVidia admitted that their "compiler technology" consists of two parts, namely a "real" compiler and application-specific optimizations, where the latter try to find a good compromise between speed and IQ (e.g. replacement shaders which are mathematically not 100% identical, but where you can't really see much of a difference, if any).
    (4) The new drivers magically bring the score back to almost exactly the same value as it was before the last patch.
    (5) FutureMark did not approve any driver except 52.16, and that only because the latest FutureMark patch disables the application-specific "optimizations" of 52.16 (for the most part).

    If you ask me, those indications are more than enough for any court to decide that NVidia is guilty.

    I don't even see any grounds for doubting it; hey, NVidia themselves said they will continue to do application-specific optimization for 3DMark. So why are we even discussing this?
     
  13. Miksu

    Regular

    Joined:
    Mar 9, 2003
    Messages:
    997
    Likes Received:
    10
    Location:
    Finland
    Now this thread is getting interesting... NOT. How many discussions have you gone through on this same subject? It always starts with one misinformed person (Ardrid in this case) who has this "I know this, I don't need to listen to you" attitude, and then the thread goes on for some 10 pages. After that we've heard every argument at least 10 times. How about just ignoring Ardrid and getting on with the subject at hand?
     
  14. Pete

    Pete Moderate Nuisance
    Moderator Legend

    Joined:
    Feb 7, 2002
    Messages:
    5,777
    Likes Received:
    1,814
    Because the subject is dead, Jim.
     
  15. Miksu

    Regular

    Joined:
    Mar 9, 2003
    Messages:
    997
    Likes Received:
    10
    Location:
    Finland
    More dead than nVidia's compiler? Hardly.
     
  16. anaqer

    Veteran

    Joined:
    Jan 25, 2004
    Messages:
    1,287
    Likes Received:
    1
    My thoughts exactly.
    I have nothing against a bit of healthy OT, if it is either funny or informative. This NV vs. FM stuff is neither... what was that about horses, mortality and physical abuse again...?
     
  17. Aivansama

    Newcomer

    Joined:
    May 31, 2003
    Messages:
    39
    Likes Received:
    0
    Location:
    Finland
    Dead horses should lie still unless they want to get beaten?

    Besides, this thread needs more noise so Dave can drop hints without seeming too obvious...
     
  18. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    To get back on topic...
    I had quite an illumination last night :p But take it with a BIG grain of salt, I guess...

    First, it can't be 6x2 or 8x2. The NV30 wasn't supposed to be 4x2; that's a fricking unwanted side effect of their non-Vec4 MAD technology not working properly, and thus not being in the final silicon.
    Secondly, it'd make no sense, because it would mean the VS are 2x2/4x1 :roll:

    The NV31/NV33/NV34/NV36 "double-pumping" is a simple hardcoded trick to enable Vec2+Vec2 operation under those conditions.

    So, the NV40 is 16x1/32x0. Maybe 12x1 too, but not 24x0.
    I'm not sure where the 8 wasted ROPs are going in 12x1 mode.
    The NV41 would be a "plain" 6x1.
    And the NV42, logically speaking, could be 4x1 then...

    Regarding Ailuros' RezN8 reference: sure, that's true, you can't divide 3 by 2... But let me shake the foundations of YOUR theories a bit too: R3xx does 6x AA with 2 ROPs per pipeline and 8 pipelines. You were saying? 8) The Rampage Z trick is, well, an *old* trick. It's very cheap transistor-wise, I'm sure, and that's why it was good in that timeframe, IMO. But there are other ways to do the same thing nowadays, really.


    Uttar

    EDIT: The NV40 would be 6x1+6x1+4x1, allowing "broken" NV40s to become NV41s, probably under another brand name, since they'd obviously be bigger cards/chips and they'd need more power to function properly. BTW, I was thinking: what if NVIDIA made a super-low-end model with no PS pipelines, just abusing the VS ones? :lol: j/k
     
  19. volt

    Regular

    Joined:
    Oct 22, 2002
    Messages:
    365
    Likes Received:
    3
    The question still remains, however: is it possible for NV to increase the transistor count from ~175M (original) to 210M (Inq. info)?

    Let's say they managed to do that :? What else did they fit in that die?

    Is 210M transistors enough for the NV40 to operate at 16x1 (real physical pipelines)?
     
  20. Arun

    Arun Unknown.
    Legend

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    302
    Location:
    UK
    Well, I HATE having to use numerology or whatever because it's, well, ALWAYS wrong in this industry but...

    207.5/130 ≈ 1.6
    Considering extra features, better AA, ... let's simplify that to 1.5
    4x2 * 1.5 = 6x2
    Assuming they're using scalars or Vec2 instead of Vec4, we get a 12x1
    And assuming they abuse their VS pipelines, we get a 16x1.

    And if you're trusting the above, I assume you believe in astrology too? ;) Although I'd still think it's well enough for 16x1. It just means there are fewer per-pipeline units, arithmetic-wise for example, than if it were 8x2. Not that it really matters.
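    The arithmetic above, written out (the transistor figures are from the post; the scaling chain itself is, as admitted, numerology):

```python
# Back-of-the-envelope scaling from the post: NV40 vs NV30 transistor
# budget, rounded down in the post to 1.5x to allow for extra features
# and better AA.
ratio = 207.5e6 / 130e6
print(round(ratio, 2))  # -> 1.6
```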


    The following is a joke, and not a good one at that... Read it only if you have time to waste.
    Plus, that sentence made no sense at all:
    You could have 16 pipelines with only one texture lookup unit and a single scalar MAD unit each, just enough to meet the Shader 3.0 spec by using evil hacks to do branching. The VS would be a single pipeline with one dedicated texture lookup unit... The bandwidth-saving techniques would be Early Z and some sort of Z compression when ALL the subpixels are identical...
    The memory interface would be so wasteful it couldn't even use half of its peak throughput in a best-case scenario... AF could be a -15.0 LOD trick with maybe 0.75x SSAA added to make it less grainy... Oh, and they could just screw the whole idea of caches and send the data directly, completely ignoring latency.
    And since you'd still have transistors to waste, just add a few useless 2D features which make use of hundreds of RSQ/COS/SQRT operations in order to finally get a result frighteningly near pure black.

    OMG! XG45! :lol: :twisted:


    The point is that you could probably implement a 64x2 in 210M. It's just that it'd suck so much in every aspect besides double-texturing that it wouldn't even be funny, really. It's all about making compromises elsewhere, as well as putting less in each of those pipelines.


    Uttar
     