NVIDIA Fermi GPU and Architecture Analysis

Discussion in 'Beyond3D Articles' started by AlexV, Oct 23, 2010.

  1. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
  2. Florin

    Florin Merrily dodgy
    Veteran

    Joined:
    Aug 27, 2003
    Messages:
    1,633
    Likes Received:
    183
    Location:
    The colonies
    I was wondering what the slimer thing was about and figured it'd be revealed soon(tm) enough. This is a sweet surprise *runs off to read*
     
  3. Sxotty

    Veteran

    Joined:
    Dec 11, 2002
    Messages:
    4,842
    Likes Received:
    303
    Location:
    PA USA
    Congrats! It's ALIVE!!!
     
  4. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,414
    Likes Received:
    411
    Location:
    New York
    Wow those are some damning results if they're accurate. Even though synthetic tests don't tell the whole story there's no doubt nVidia relies heavily on software.

    The round trip to L2 for shared-memory atomics seems a bit ummm, retarded....

    Oh and good job guys.
     
  5. wishiknew

    Regular

    Joined:
    May 19, 2004
    Messages:
    332
    Likes Received:
    6
    Sontin got a mention.

    And my god, someone checked the home page.
     
  6. Florin

    Florin Merrily dodgy
    Veteran

    Joined:
    Aug 27, 2003
    Messages:
    1,633
    Likes Received:
    183
    Location:
    The colonies
    Heh yeah quite cheeky that.

    It'd be interesting to find out if the untessellated triangle setup limitation is indeed implemented in software only - circumventing that'd appeal to the hardware ricer in me.
     
  7. wishiknew

    Regular

    Joined:
    May 19, 2004
    Messages:
    332
    Likes Received:
    6
    Is Beyond3d going to do a quickie on the Barts?
     
  8. digitalwanderer

    digitalwanderer Dangerously Mirthful
    Legend

    Joined:
    Feb 19, 2002
    Messages:
    17,130
    Likes Received:
    1,660
    Location:
    Winfield, IN USA
    Fermi finally came out? Really? :|
     
  9. 3dcgi

    Veteran Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    2,432
    Likes Received:
    261
    How does the system value mechanism (SV_position) differ from the 1-4 attributes cases?

    When shader instruction throughput is tested are all stages active at once or is VS measured with only a VS enabled and GS with no HS and DS?
     
  10. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,484
    Likes Received:
    396
    Location:
    Varna, Bulgaria
    That's "Slimer" for you, soldier!

    :razz:
     
  11. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    It wouldn't be B3D if it was a quickie. :lol:
     
  12. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    Hey if Fermi = Slimer, what is Cypress? Would it be something like Chicken Korma? :p
     
  13. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    Using SV_Position is roughly equivalent to the 4 Attribute case with the noperspective modifier applied to the latter, hence the comments in the article about the performance delta between the two cases being interesting.

    I use the fewest possible stages - so for VS only that is active, for GS you get VS+GS and so on and so forth. There's obviously more to see in that area than what was shown in the article, so that, like other aspects, is still under active development, but we've discovered that eventually some things have to come out so that people still remember that there's a frontpage/another facet of B3D.

    Squilliam: you're better off not knowing the answer to that one :)
     
  14. foo

    foo
    Newcomer

    Joined:
    Sep 1, 2003
    Messages:
    3
    Likes Received:
    0
    I am stunned. If your programs are correct, how can the 480 beat the 5870?

    Remarkable!!
     
  15. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    If not knowing is safer then yes I will choose not to know. :razz:
     
  16. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,796
    Likes Received:
    2,054
    Location:
    Germany
    Thanks for the article guys, I really enjoyed reading it yesterday evening! Now, in the coming weeks I might be busy trying to understand it as well. ;)
     
  17. mczak

    Veteran

    Joined:
    Oct 24, 2002
    Messages:
    3,012
    Likes Received:
    112
    Very nice article!
    I think the Vec4 MAD Int rate quoted in the table for HD5870 is wrong. Should be only half (same as Vec4 MUL Int).
    Any idea what's going on with DOT2 on HD5870? The table correctly states the max issue rate (as the chip should indeed be able to execute 2 of them per clock) but why is it often only half that of DOT4? Makes no sense. Are there bank conflicts or something?
     
  18. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    Thanks for taking the time to read it!

    32-bit Int MUL is done by the fat ALU. All ALUs are capable of ADDs though, which translates into a Vec4 ADD per block, times 16 blocks per SIMD, times 20 SIMDs, times 850 MHz, which equals 272 GInstr/s. What was wrong in the table was the Vec4 MAD rate, which I have now corrected.

    The problem with DOT2 is a combination of how I set up the test, which is a string of dependent ops, to break some compiler optimizations, and a bit of driver wonkiness. Practically I've verified it can do 2 DOT2s per block per clock, and should probably re-work that particular test a bit to show it (and this should've been mentioned in the text as well, whoops!).
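
The throughput arithmetic in the post above can be checked with a short sketch. The block, SIMD and clock figures are the ones quoted in the post; the function and variable names are made up for illustration, and the DOT2 figure is only the theoretical peak implied by "2 DOT2s per block per clock", not a measured result.

```python
# Sketch of the peak-rate arithmetic for HD 5870 as described in the post.
# Names are illustrative assumptions; the constants come from the post.
BLOCKS_PER_SIMD = 16   # ALU blocks per SIMD
SIMDS = 20             # SIMDs on the chip
CLOCK_MHZ = 850        # core clock in MHz


def peak_rate_ginstr(ops_per_block_per_clock: int) -> float:
    """Peak issue rate in GInstr/s for an op issued N times per block per clock."""
    minstr_per_s = ops_per_block_per_clock * BLOCKS_PER_SIMD * SIMDS * CLOCK_MHZ
    return minstr_per_s / 1000  # MInstr/s -> GInstr/s


# One Vec4 ADD per block per clock, as in the post.
vec4_add_rate = peak_rate_ginstr(1)

# Two DOT2s per block per clock, the rate the post says was verified
# (theoretical peak only; the article's dependent-op test measured less).
dot2_peak_rate = peak_rate_ginstr(2)

print(f"Vec4 ADD Int: {vec4_add_rate} GInstr/s")
print(f"DOT2 peak:    {dot2_peak_rate} GInstr/s")
```

The first figure reproduces the 272 GInstr/s quoted for Vec4 ADD; the second shows the ceiling a reworked DOT2 test would be measured against under the same assumptions.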
     
  19. Squilliam

    Squilliam Beyond3d isn't defined yet
    Veteran

    Joined:
    Jan 11, 2008
    Messages:
    3,495
    Likes Received:
    113
    Location:
    New Zealand
    How long does it take to make an article like this?

    Anyway, I take it Dave would be kind enough to provide a Cayman when applicable? :)
     
  20. Tahir2

    Veteran

    Joined:
    Feb 7, 2002
    Messages:
    2,978
    Likes Received:
    86
    Location:
    Earth
    That is a very nice article and surprisingly easy to read; the level of prodding was outstanding.

    I would imagine an article comparing Cypress to Barts to Cayman, to see what kind of sacrifices and improvements AMD has made to its architecture to increase efficiency, would also be a surprising read.
     