NVIDIA Fermi: Architecture discussion

Discussion in 'Architecture and Products' started by Rys, Sep 30, 2009.

  1. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
The GPU is just a dumb state machine that does what the driver tells it to. The driver doesn't say to the GPU, "find a MADD and substitute an FMA for it", since the execution of that instruction is defined by the driver well before the GPU ever tries to run it.
     
  2. Blazkowicz

    Legend

    Joined:
    Dec 24, 2004
    Messages:
    5,607
    Likes Received:
    256
I thought you would have your code written in a high-level language, and at some point the compiler has to figure out how to perform x = a*b + c. This is where I don't follow much of the discussion: at that point, there's no MADD or FMA yet? Only the assertion that you want that result.

So there would never be a MADD at all, in the reasonable scenario that all such calculations default to an FMA, unless you make it clear you want a MADD.
Correct me, too :)
     
  3. A.L.M.

    Newcomer

    Joined:
    Jun 2, 2008
    Messages:
    144
    Likes Received:
    0
    Location:
    Looking for a place to call home
So what would the driver tell the GPU in this case?
Not the same thing that a GT200 driver would, I guess, due to this "missing" MADD capability.
I am just curious, because I am wondering why, if it's so trivial, AMD went all the other way, with cores able to calculate both MADD and FMA...
I mean, if it's free and so simple, why waste time keeping the MADD capability? :?:
     
  4. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
The compiler generates code that targets the chip ISA, though. So it's low-level enough to generate MADD or FMA, and the raw instruction is right there. Check out PTX for more, since it's pretty much the raw opcodes.
     
  5. Rys

    Rys Graphics @ AMD
    Moderator Veteran Alpha

    Joined:
    Oct 9, 2003
    Messages:
    4,182
    Likes Received:
    1,579
    Location:
    Beyond3D HQ
    The driver would give it the raw instruction (see my last post). And it's not wasting time, just area. It's quite feasible to design two ALUs, both capable of FMA, but only one also capable of MADD at the same rate.
     
  6. FrameBuffer

    Banned

    Joined:
    Aug 7, 2005
    Messages:
    499
    Likes Received:
    3
    so does that claim get sent to the FUD Bin for filing ?
     
  7. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    If you haven't already done so, it is safe to do so at this time. :yep2:
     
  8. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
When the compiler's done with its high-level optimizations, it breaks the intermediate representation of the code into basic operations. Basic here is defined by the ISA of the chip: a MADD for RV770, an FMA for Cypress.

Each chip has its own ISA baked into the driver. At shader-compile time, the relevant ISA description is pulled from disk and machine code is generated accordingly.
     
  9. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    That's not how it works at all. The driver builds buffers of commands for the GPU to process. The GPU doesn't "process" the driver.
    The GPU doesn't optimize code. The driver may enable performance features of the GPU ("interpret MADDs as FMA") but the GPU doesn't (yet) compile its own shader code.
    Why would the GPU need to hold all the code at once anyway? As stated earlier, the driver runs on the CPU, not GPU.
     
  10. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Not assume, but take into consideration as well.

Which, I bet, they are regretting now, considering the shortage of Cypress GPUs.

As far as I know, Vantage runs many more simulations on the GPU in the extreme preset than in performance mode, instead of just upping resolution and AA/AF levels:
http://www.pcgameshardware.de/aid,641615/3DMark-Vantage-im-PCGH-Benchmark-Test/Benchmark/Test/
This is from the whitepaper regarding the second game test, "New Calico":
    • Almost entirely consists of moving objects
    • No skinned objects
    • Variance shadow mapping shadows
    • Lots of instanced objects
    • Local and global ray-tracing effects (Parallax Occlusion Mapping, True Impostors and volumetric fog)

All in all, it looks like Vantage puts more emphasis on the shaders than most games seem to.


WRT the setup limitation - the values from HD 4890 vs. OC 5770 don't look like this is the case, the two being neck and neck with each other. Bandwidth seems sufficient on the HD 5770, considering HD 5870 scaling.


    No, it's up to 94% faster in the presumably scripted introduction sequence of the first mission. While it's maybe a useful test wrt raw performance, it doesn't tell us much about behaviour while really playing the game.

    --
Oh, and while we're at it:
    Here's a nice (not because we've done it, but because it's nice data in itself) comparison showing the improvements for GTX 280 and HD 4870 made possible by driver progress alone over the course of one year after their respective introduction:
    http://www.pcgameshardware.de/aid,6...us-aktuelle-Treiber-im-Test/Grafikkarte/Test/
     
    #1670 CarstenS, Nov 26, 2009
    Last edited by a moderator: Nov 26, 2009
  11. dizietsma

    Banned

    Joined:
    Mar 1, 2004
    Messages:
    1,172
    Likes Received:
    13
Well, I read Rys' very good piece over at TechReport and I still can't get overly enthused about it. Why should I pay for transistors that do HPC when I'm not doing HPC myself? Same for the next Intel chip: they bung a graphics chip on there as well... why do I want to pay for something I won't use? I'd rather have something dedicated to what I'm paying for.
     
  12. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Compute Shader is part of D3D11, so for good graphics performance on games, e.g. those that do post-processing using CS, you'll be using most of the chip. The ECC shouldn't make much difference in die size (a few percent?) and double-precision is almost entirely based on re-use of existing units (with a huge adder and some additional routing being the overhead).

    I don't think it's a big deal at all.

    Jawed
     
  13. pjbliverpool

    pjbliverpool B3D Scallywag
    Legend

    Joined:
    May 8, 2005
    Messages:
    9,237
    Likes Received:
    4,260
    Location:
    Guess...
My 5850 has been delayed again. Right now it's a race between me actually getting my hands on that GPU and NV releasing some concrete information about Fermi. If I can get a date, performance numbers or both that look favourable, then I might just cancel the 5850. Hard launch my ass.
     
  14. Silus

    Banned

    Joined:
    Nov 17, 2009
    Messages:
    375
    Likes Received:
    0
    Location:
    Portugal
    I can't help but notice that you already placed the HD 5850 in your signature, even though you don't have it. I wonder how many people do the same :)
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Yes, since it's scaling with GPU performance fairly well. Though ALUs don't seem to be relevant:

[image]

    The fillrate graphs seem to indicate ATI is not pixel limited. GT2 tends to be pixel limited on NVidia.

[image]

[image]

    And most websites don't use gameplay for testing...

    I dare say I'm surprised to see a few substantial improvements for NVidia.

    Jawed
     
  16. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    18,992
    Likes Received:
    3,533
    Location:
    Winfield, IN USA
    Well if you see anyone with Fermi in their sig I'd kind of doubt that one too.

I got a long card in my system, with the HDD cage ripped out and the HDDs sitting on the bottom of my case to prove it, too. :razz:
     
  17. OpenGL guy

    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,357
    Likes Received:
    28
    Pixel limited? These cards are capable of billions of pixels per second and you're showing 40 million pixels per second in your graph.
     
  18. CarstenS

    Legend Subscriber

    Joined:
    May 31, 2002
    Messages:
    5,800
    Likes Received:
    3,920
    Location:
    Germany
    Where's that data in your image compiled from?
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    11.6 fps for a 4,096,000 pixel frame is ~37MP/s. Same goes for 1920x1200 and 1680x1050, with ~35MP/s at 1280x1024.

    Some stuff, like shadow maps I presume, is fixed in size regardless of resolution. So in addition to the vertex workload being pretty much static regardless of resolution, some of the pixel rendering passes are, too.

    How would you characterise the bottlenecks of these two tests at various resolutions?

    Jawed
     


  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.