NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. leoneazzurro

    Regular

    Joined:
    Nov 3, 2005
    Messages:
    518
    Likes Received:
    25
    Location:
    Rome, Italy
    You said "how could you paint this as a worse scaling". I did not paint anything, nor did I make any comparison between the vendors or their scaling.

    They will indeed be more usable if we compare the figures "with MUL". But it's funny that anyone can speculate about how fast GF100 could be just by looking at FLOPs, while speculation based on the behaviour of past chip generations is supposedly worthless.
     
    #1641 leoneazzurro, Nov 25, 2009
    Last edited by a moderator: Nov 25, 2009
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The loss can't be regained, merely mitigated. I was surprised to see the shortfall was as large as 20%. The loss in games can be much higher (I've seen 60%).

    Curiously, Vantage GT1:

    http://www.xbitlabs.com/articles/video/display/radeon-hd5770-hd5750_13.html

    shows 97% performance advantage for HD5870 over HD5770. GT2 shows 86%. There could be a clue there, I suppose...

    Anyway, NVidia's working from a lower base. GT240 is great evidence of that.

    A lower base does tend to do that - hence the shock and awe of RV770.

    Jawed
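For context on those Vantage numbers: HD5870 is essentially a doubled HD5770 at the same 850 MHz core clock (1600 vs 800 SPs, 153.6 vs 76.8 GB/s), so 100% is the theoretical ceiling. A rough sketch of observed versus theoretical scaling, using the figures quoted from the xbitlabs article above:

```python
# HD5870 doubles HD5770's units and bandwidth at the same clock,
# so a 100% advantage is the theoretical ceiling. Observed Vantage
# advantages are the GT1/GT2 figures quoted in the post above.

theoretical = 2.0
observed = {"GT1": 1.97, "GT2": 1.86}

for test, ratio in observed.items():
    efficiency = (ratio - 1.0) / (theoretical - 1.0)
    print(f"{test}: +{ratio - 1:.0%} of a possible +100% "
          f"({efficiency:.0%} scaling efficiency)")
```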
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    With a tiny share, any kind of win's a big deal :!:

    Jawed
     
  4. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    Unless the platform differences were exactly half as well (i.e. half the CPU performance, half the memory performance, half the bus performance, etc.), I'm not sure you can conclude much that is graphics-related alone.
     
  5. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Which is why that was just a side note.

    I won't be able to distinguish between ALU and TEX, but that's not very relevant anyway. I'll show how much is dependent on the shader core, which is the only part where your 1% ALU and -22% TEX numbers make any difference. Bandwidth, setup, and CPU are all part of the equation.

    ATI's ALUs scale perfectly, as do NVidia's. There are plenty of tests out there that are not limited by BW, setup or CPU that show this. Using games as proof of ALU scaling is just stupid.
     
  6. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    3DMark designs its graphics tests to have a very light CPU load. I suppose PCIe could be a bottleneck at some points, but it's doubtful.
     
  7. FrameBuffer

    Banned

    Joined:
    Aug 7, 2005
    Messages:
    499
    Likes Received:
    3
    Couldn't the argument then be made that if (as some propose) the profitability of graphics (GeForce) did indeed suffer, Tesla/Quadro (where bumpgate, die yields etc. have very little bearing) would mitigate any GeForce losses in pure profit-percentage terms? If 75% of profits come from GPGPU/HPC/workstation, then even if GeForce (consumer) graphics suffers huge losses (say a catastrophic 50%), an overall drop of, I don't know, 10% (from 20%) would in fact be huge, but NV's overall profitability wouldn't really show it as such.

    So I would assume that both sides could be right: GeForce profitability could indeed have tanked, but this would not be represented in NV's overall numbers. I haven't looked at NV's quarterly report, so I'm not sure whether NV breaks its graphics division down to include or exclude GPGPU, HPC and workstation separately.
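The blended-profit arithmetic in that argument can be sketched quickly. All figures below are the post's hypotheticals (the 75% split, the 50% drop), not actual NVIDIA financials:

```python
# Illustrative sketch of the blended-profit argument above.
# All figures are the post's hypotheticals, not NVIDIA financials.

def blended_profit_change(pro_share, pro_change, consumer_change):
    """Overall profit change given each segment's share and its own change."""
    consumer_share = 1.0 - pro_share
    return pro_share * pro_change + consumer_share * consumer_change

# 75% of profits from Tesla/Quadro (flat), 25% from GeForce (down 50%):
overall = blended_profit_change(0.75, 0.0, -0.50)
print(f"overall profit change: {overall:.1%}")  # -12.5%
```

A catastrophic consumer-side collapse shows up as a far milder dip in the company-wide figure, which is the poster's point.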
     
  8. FrameBuffer

    Banned

    Joined:
    Aug 7, 2005
    Messages:
    499
    Likes Received:
    3
    BTW, maybe I missed it over the last 4 pages, but did xman86 ever post any links to support his claim that NVidia "sold more GTX295s than ATI 4870X2s"? (The fact that X2s have been EOL for some time would negate any supply-based explanation.) I've looked around and couldn't find anything; Google sends me to Fruad and, ironically, back to this thread.
     
  9. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    Yeah, sorry, that wasn't directed at you specifically. It was just a comment on the overall contempt for GT200 :)

    Yeah, it's almost perfect on stuff like 3DMark's Perlin noise, but those tests aren't particularly relevant, are they? It's hard to make a case for games being useless when evaluating the scaling of ALU+MEM+TEX altogether.

    Why would there be any mitigating effect if the professional segment is still performing well below long-run averages (per their last quarterly CC)? Also, bumpgate is a charge directly against income; it doesn't matter which segment that income came from.
     
  10. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    The second test is the space one with all the rocks, so I guess it has a huge triangle count; we could be seeing a bit of a setup limit, I suppose...

    Jawed
     
  11. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    12,059
    Likes Received:
    3,119
    Location:
    New York
    This is anecdotal at best, but here goes: DoW II @ 2560x1600, max settings

    Core: 648 Shader: 1296 Mem: 1350
    Min: 9.9 Max: 77 Avg: 35.45

    Core: 648 Shader: 1566 Mem: 1350
    Min: 10.84 Max: 84.32 Avg: 37.7

    Core: 702 Shader: 1566 Mem: 1250
    Min: 10.28 Max: 86.39 Avg: 37.77

    Core: 702 Shader: 1566 Mem: 1350
    Min: 11.04 Max: 92.88 Avg: 40.66

    I had to overclock my E8400 from 3.0 to 3.6 GHz just to break 35 fps, so I don't know how much these numbers are still CPU limited. I'll be installing a Q9550 this weekend, so I might try again.

    Why does the base matter? 100% is 100%.
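Those runs pair up nicely, since consecutive pairs differ in exactly one clock domain, so perf gain divided by clock gain hints at how sensitive the game is to each. A quick sketch using only the figures from the post above:

```python
# Per-domain scaling from the DoW II runs above: each chosen pair differs
# in exactly one clock, so fps gain / clock gain approximates sensitivity.

runs = {
    # (core, shader, mem) MHz: avg fps
    (648, 1296, 1350): 35.45,
    (648, 1566, 1350): 37.70,
    (702, 1566, 1250): 37.77,
    (702, 1566, 1350): 40.66,
}

def gain(a, b):
    """Fractional increase going from a to b."""
    return b / a - 1.0

pairs = [
    ("shader", (648, 1296, 1350), (648, 1566, 1350), 1296, 1566),
    ("core",   (648, 1566, 1350), (702, 1566, 1350), 648, 702),
    ("memory", (702, 1566, 1250), (702, 1566, 1350), 1250, 1350),
]

for name, base, oc, clk0, clk1 in pairs:
    clk = gain(clk0, clk1)
    fps = gain(runs[base], runs[oc])
    print(f"{name:>6}: +{clk:.1%} clock -> +{fps:.1%} avg fps "
          f"(sensitivity {fps / clk:.2f})")
```

Core and memory come out near 1.0 sensitivity while the shader clock is around 0.3, which matches the later observation that ~8% on core or memory buys ~8% performance but 21% on the ALUs buys only ~6%.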
     
  12. Groo The Wanderer

    Regular

    Joined:
    Jan 23, 2007
    Messages:
    334
    Likes Received:
    2
    They did. Go look at Dell and HP for example, both have ATI as either the default or available on almost every workstation class machine. A year ago, they weren't available.

    -Charlie
     
  13. BRiT

    BRiT (>• •)>⌐■-■ (⌐■-■)
    Moderator Legend Alpha

    Joined:
    Feb 7, 2002
    Messages:
    20,517
    Likes Received:
    24,424
    No.
     
  14. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home
    Sorry to go back to the MADD-to-FMA substitution...
    Actually, I don't think the thread processor, which should be the one in charge of the substitution, can make the change across a big batch of shaders, because it can only read what is in its internal buffers (which can hold only small portions of code). That means the substitution has to be performed separately from compiling: not at compile time, but at runtime.
    At least, this is what a guy much more knowledgeable than me on the matter explained to me... :wink:

    I would be very interested in listening to Rys' view on this...
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Well, with core or memory clock increases of ~8% each producing ~8% more performance, it seems "fine" - though how much difference did the 20% overclock on your CPU make? The ALUs are making very little difference: 6% more performance for 21% higher clocks.

    A complication is that texture coordinate interpolation is done by the ALUs, which means the shader clock could make some difference to texturing; so shading may be even less dependent on ALU rate if you exclude the interpolations. HD5870 has that complicating factor too, of course.

    HD5870 is up to 94% faster than HD5770 in this game:

    http://www.computerbase.de/artikel/...ati_radeon_hd_5970/7/#abschnitt_dawn_of_war_2

    That indicates HD5870 is ~44% faster than GTX285, which is slightly higher than the texture and fillrate advantage, despite having less bandwidth.

    If you could wangle a ~45% core overclock on your GTX285, I wonder how close you'd get to HD5870? :razz:

    I have to admit the way that's scaling on your system is a bit of a puzzler overall.

    You effectively said it yourself: an architectural overhaul tends to increase per-unit, per-clock efficiency (highlighting the inefficiencies of the older design). R800 isn't such an overhaul (except, perhaps, in attribute interpolation), so...

    Jawed
     
  16. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    The thread processor doesn't work on raw, uncompiled code. The driver is responsible for converting the input shader/kernel/whatever into machine code, and during that process it can quite easily convert MAD into FMA, or even into NOPs if it so chooses.

    Now, if an end user were able to pass machine code in directly, the chip would have to do the conversion itself, but that is not possible because the driver always sits in between.
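A toy sketch of the driver-side lowering described in this post. The `Instr` IR and opcode names here are invented for illustration; real driver compilers work on vendor-specific representations:

```python
# Toy illustration of a driver compiler substituting MAD with FMA while
# lowering a shader to "machine code". The IR and opcodes are invented;
# this is not any vendor's actual ISA.

from dataclasses import dataclass

@dataclass
class Instr:
    op: str
    dst: str
    srcs: tuple

def lower(ir, fma_available=True):
    """Lower IR to machine code, rewriting MAD as FMA where the target allows."""
    out = []
    for ins in ir:
        if ins.op == "MAD" and fma_available:
            # Same a*b+c operands; FMA just skips the intermediate rounding.
            out.append(Instr("FMA", ins.dst, ins.srcs))
        else:
            out.append(ins)
    return out

shader = [Instr("MUL", "r0", ("r1", "r2")),
          Instr("MAD", "r3", ("r0", "r4", "r5"))]
print([i.op for i in lower(shader)])  # ['MUL', 'FMA']
```

Because the rewrite happens while the driver already owns the whole program, no hardware unit ever needs to see a MAD opcode, which is the post's point.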
     
  17. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,579
    Likes Received:
    4,799
    Location:
    Well within 3d
    Would it be possible that Fermi's opcode for FMA is a reuse of the MAD one?
    That would make things even more straightforward.
     
  18. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home
    Obviously not.
    So the CPU is responsible for compiling the instructions, transforming the code into something the GPU can actually read and process (machine language). That should be the only thing done at compile time.
    Then the thread processor works on the compiled code to apply the optimizations contained in the drivers.
    The problem is that it simply cannot optimize the whole batch of compiled code, because it cannot read outside of its registers, so it will optimize (in this case changing MAD to FMA) only small portions of the code at a time.

    Correct me if I'm wrong. :smile:
     
  19. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    The driver's compiler does more than just convert the kernel into machine code; for example, there's certainly an optimizer in there. When the driver is converting the kernel into machine code, it is free to do what it wants, and it doesn't need to know the contents of any registers to perform a general optimization. If FMA is valid, it is free to use it wherever it's suitable.

    The GPU doesn't need to be aware of this MAD -> FMA substitution at all.
     
  20. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home
    Well, I think it should be aware, given that the only piece of hardware that can act on what's written in a video driver is the GPU... :lol:

    I think the key issue is this:

    - can a GPU optimize its code by substituting all the MADDs with FMAs at once?

    Chances are that it can't, because the optimization is done by the GPU, and in particular by the thread processor, which reads only what's in its registers, and those can't physically contain all the recompiled code needed for, say, an entire game level.
    The driver can tell the TP "if you find a MADD, change it into an FMA", but that change will be performed separately for each batch of instructions that fits into the TP's registers.
    How this process would impact real-world performance, though, goes far beyond my knowledge...

    Am I missing something? No flames, I just want to understand this kind of process better... :smile:
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.