AMD announces new GPGPU card, hints at RV670 specs

Discussion in 'GPGPU Technology & Programming' started by Dave Baumann, Nov 8, 2007.

  1. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    What if it's less than 25%?

    The 90nm Cell's throughput drops down to something like 1/10, and that's on an architecture that doesn't try to fit tons of ALUs into a small area.
    The 65nm variant is half-speed at DP with specialized hardware added.

    Where does RV670 fit on that continuum I wonder?
     
  2. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    I always thought it was mostly bus width, routing logic, and register storage that were the main limitations of increasing precision on GPUs. If you have 50% DP rate and forget about increasing register space, then all those problems go away.

    Because GPUs are made to handle hundreds of cycles of latency for texture instructions via thousands of fragments in flight, it's okay for them to have much, much longer instruction latency than a CPU if they want to. That takes a big chunk out of the cost of increasing precision.

    If you actually look at the fundamentals, a DP multiplier isn't that big. 160 of them on a 666M transistor chip is pretty reasonable, especially when you're just modifying 320 SP multipliers to act like that. Half rate DP isn't out of the question, IMO. The original Cell was probably 1/10 speed because DP was a near useless feature for its original market (PS3). Now that it's getting some traction in HPC, the minimal investment required for half speed DP is worth it.
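    As a back-of-the-envelope check on the multiplier argument, here is a minimal sketch comparing significand widths. It assumes multiplier array area grows roughly with the square of the significand width, which is a simplification and not a statement about RV670's actual design. The IEEE 754 widths (24 and 53 bits, including the implicit bit) are standard; the function name and the area model are illustrative only.

```python
# IEEE 754 significand widths, including the implicit leading bit.
SP_SIGNIFICAND_BITS = 24
DP_SIGNIFICAND_BITS = 53


def relative_multiplier_area(bits: int, baseline_bits: int = SP_SIGNIFICAND_BITS) -> float:
    """Area of a bits x bits multiplier array relative to a baseline array.

    ASSUMPTION: area scales with the number of partial-product bits (bits**2).
    Real designs differ (Booth encoding, compressor trees, shared adders).
    """
    if bits <= 0 or baseline_bits <= 0:
        raise ValueError("bit widths must be positive")
    return (bits * bits) / (baseline_bits * baseline_bits)


dp_cost = relative_multiplier_area(DP_SIGNIFICAND_BITS)
print(f"one DP multiplier ~ {dp_cost:.2f} SP multiplier arrays")
print(f"two paired SP arrays cover ~ {2 / dp_cost:.0%} of that in one pass")
```

    Under that assumption, pairing two SP multipliers does not cover a full DP multiply in a single pass, so a half-rate design would need some extra array area or more than one pass per operation. The latency tolerance described above is what makes the multi-pass option cheap.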
     
  3. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    1/2 is the upper limit to what can be expected because a higher ratio between DP and SP would indicate there is some hardware that could have been used to up the SP throughput.

    The question is how much effort AMD put into DP for RV670. It is a derivative of a product only capable of SP, so how much would AMD be willing to tweak the design?

    Why wouldn't they have added DP capability as a checkbox figure?
    They certainly didn't disclose DP throughput, and 1/2 SP throughput would have been respectable enough to disclose.
     
  4. Geo

    Geo Mostly Harmless
    Legend

    Joined:
    Apr 22, 2002
    Messages:
    9,116
    Likes Received:
    213
    Location:
    Uffda-land
    Quoted for permanence, my brotha. :lol:
     
  5. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    Welp. D:
     
  6. 3vi1

    Newcomer

    Joined:
    Jan 25, 2007
    Messages:
    22
    Likes Received:
    3

    An ExtremeTech article from June of this year covers a company building a GPU API, with benchmarks for the ATI 2900 versus an Nvidia Quadro 4600 and a CPU in SP and DP.

    You can see the PDF here: http://www.gpucomputing.eu/download/en_presskit.pdf or just the three graphs in question here [no pdf]: http://www.gpucomputing.eu/index3.php?lang=en&page=_demo1.php&id=2



    From the benchmarks featured, it seems that the 2900's DP is about 40% of its SP.



    Any thoughts?
     
    #46 3vi1, Nov 10, 2007
    Last edited by a moderator: Nov 10, 2007
  7. mhouston

    mhouston A little of this and that
    Regular

    Joined:
    Oct 7, 2005
    Messages:
    344
    Likes Received:
    38
    Location:
    Cupertino
    No GPGPU chip before this FireStream has native 64-bit. It can be emulated, but not at that performance. I have no idea how they are getting their claimed double precision performance... I know for sure that R600 doesn't support double, nor does any shipping Nvidia part.
     
    #47 mhouston, Nov 10, 2007
    Last edited by a moderator: Nov 10, 2007
  8. 3vi1

    Newcomer

    Joined:
    Jan 25, 2007
    Messages:
    22
    Likes Received:
    3
    Like a side of coleslaw and mashed potatoes...

    I was reading this article http://techreport.com/articles.x/10956/3 after watching the AMD/ATI stream computing presentation from Sep. 2006.

    So, it hit me... What the hell has ATI/AMD been doing? Why have they not leveraged the awesome power of their GPU to do physics? If their GPU technology could do the stuff they showed in their demos, what's the holdup in getting something to the public?


    So, after viewing the video I wondered where the hell can I buy a PhysicsCAD?!

    Imagine for a moment a tool that allows rapid product development. Not only can you design its shape, form and factor, but you can test its physical characteristics. Building a jungle gym? OK, let's drop a few 40-pound balls onto the top and against the sides to see how the structure holds up. Designing a rocket or lunar component to fly in the Google lunar competition? Well, let's test its design in a simulated hostile environment before you commit to expensive development costs. Do you have a patent for a new device and want to simulate its physical characteristics before the build phase? No problem: design the component shape, then select the material type and run it through a battery of tests.

    I don't mean to simplify things too much, but I imagine the applications for such software could be endless. A PhysicsCAD could be used in everything from educational arenas to product design and testing, before the actual product is ever built, thereby decreasing costs and increasing productivity.

    Anyhow, unless AMD/ATI does something exciting to get developers back, they may not be able to withstand Intel's coming assault. So yeah, great, they've got new hardware, but so what? Nvidia is talking about 1 teraflop for Christmas store shelves, just in time for Santa!



    AMD, THINK CREATIVELY! We need 3 players in the game to keep things honest... I think all may be for naught, but a PhysicsCAD/simulator would be a great tool that would attract developers and increase sales beyond just games. Oh yeah, nano-tech modeling :yes:..



    Just a thought, cheers.
     
  9. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,806
    Likes Received:
    473
    They didn't reuse multiplier hardware between the SP/DP processing, not a valid comparison.
     
  10. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    That is true.

    I hadn't given much thought to the implementation on the SPE, just that 1/10 was perhaps the lowest an implementation could go and still be acceptable.

    Actually, bringing up the separate hardware point, a possible compromise for RV670 would be to conserve transistors by only extending the complex ALU to handle DP calculations. Expanding one unit would be less drastic than extending five ALUs in a processor.

    That would put DP performance at 1/5 SP.
    Unless it actually iterates a DP through SP hardware twice, which would put it back at 1/10. ;)
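    A rough sketch of the arithmetic behind those two figures, assuming a 5-lane shader processor in which every lane issues one SP op per clock and only some lanes are DP-capable. The function and its parameters are hypothetical, for illustration only, and not anything AMD has disclosed.

```python
from fractions import Fraction


def dp_to_sp_ratio(total_lanes: int, dp_lanes: int, passes_per_dp_op: int) -> Fraction:
    """DP ops per clock divided by SP ops per clock for one shader processor.

    ASSUMPTIONS: every lane issues one SP op per clock, and each DP-capable
    lane needs `passes_per_dp_op` clocks to finish one DP op.
    """
    if total_lanes <= 0 or passes_per_dp_op <= 0:
        raise ValueError("lane count and pass count must be positive")
    if not 0 <= dp_lanes <= total_lanes:
        raise ValueError("dp_lanes must be between 0 and total_lanes")
    sp_ops_per_clock = Fraction(total_lanes)
    dp_ops_per_clock = Fraction(dp_lanes, passes_per_dp_op)
    return dp_ops_per_clock / sp_ops_per_clock


print(dp_to_sp_ratio(5, 1, 1))  # one extended lane, single pass: 1/5
print(dp_to_sp_ratio(5, 1, 2))  # same lane, two passes per DP op: 1/10
```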
     
  11. OICAspork

    Newcomer

    Joined:
    May 9, 2003
    Messages:
    210
    Likes Received:
    0
    Location:
    Nara, The Land of the Rising Sun
    I think this may be appropriate reading. Don't forget the video!
     
  12. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    I think I can pedant myself out of this!
     
  13. 3vi1

    Newcomer

    Joined:
    Jan 25, 2007
    Messages:
    22
    Likes Received:
    3
    pedant = not a verb. :wink:
     
  14. Lux_

    Newcomer

    Joined:
    Sep 22, 2005
    Messages:
    206
    Likes Received:
    1
    FH
    Related, I guess.
     
  15. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    I spoke to some AMD people at SC07 (come by the RapidMind booth if you're there! ;)). Some tidbits that I gleaned:

    - the card is indeed RV670-based - graphics parts to come soon
    - the graphics parts will also support double precision
    - the FireGL offering will basically be a superset of this card (i.e. RV670, 2GB RAM but with display outputs naturally)
    - double precision is 50% speed, but no fused MAD (seems in line with the ~40% figure quoted earlier in this thread)

    So maybe some hat eating/video recording to come yet? ;)

    Anyways I can't stand 100% behind these facts since the AMD people may even have been in error, but they are probably fairly reliable, and nothing too crazy.
     
  16. BRiT

    BRiT (╯°□°)╯
    Moderator Legend Alpha Subscriber

    Joined:
    Feb 7, 2002
    Messages:
    12,501
    Likes Received:
    8,702
    Location:
    Cleveland
    Why is SP so slow then?




    :runaway:
     
  17. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    The "5th" special function scalar is actually not involved in DP calculations.
     
  18. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    I don't know much about the hardware, but I think 50% DP seems to imply a fairly full utilization of the silicon in both single and double cases, no?

    Ah that makes sense :) The note about no MAD is interesting though, since if you're doing heavy MAD stuff (say, interpolation, evaluating Bernstein polynomials, etc.) you're getting 1/4 the speed in DP rather than 1/2. Still, it's probably not critical for most code, and 50% issue rates for other instructions is pretty good.
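    To make the 1/4 figure concrete, here is a minimal sketch that counts issue slots for a MAD-bound kernel, using Horner's rule for polynomial evaluation as the example. It assumes "50% speed" means each DP instruction occupies two SP issue slots and that a DP multiply-add must be split into a separate MUL and ADD; both are readings of the tidbits above, not confirmed hardware behaviour, and the function name is made up for illustration.

```python
def horner_issue_slots(degree: int, has_mad: bool, slots_per_instr: int) -> int:
    """Issue slots needed to evaluate a degree-`degree` polynomial with Horner's rule.

    Horner's rule performs one multiply-add per step, `degree` steps in total.
    ASSUMPTION: without a MAD instruction each step costs a MUL plus an ADD.
    """
    if degree < 0 or slots_per_instr <= 0:
        raise ValueError("degree must be >= 0 and slots_per_instr > 0")
    instrs_per_step = 1 if has_mad else 2
    return degree * instrs_per_step * slots_per_instr


sp_slots = horner_issue_slots(8, has_mad=True, slots_per_instr=1)
dp_slots = horner_issue_slots(8, has_mad=False, slots_per_instr=2)
print(sp_slots, dp_slots, sp_slots / dp_slots)  # 8 32 0.25
```

    Code that is dominated by plain adds or multiplies would see only the factor of two from the issue rate under the same assumptions.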

    Now I'm wondering whether transcendentals are also 50% and if so, are they accurate to more bits in DP, or the same as SP?
     
  19. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,122
    Likes Received:
    2,873
    Location:
    Well within 3d
    That puts RV670's DP throughput at less than 25% of the peak FLOPS figure given for the chip in the AMD PDF.
    My math earlier was based on peak marketing numbers, which were going by the FMADD peak.

    Actually, the 5th special function ALU being left out leaves my earlier math slightly optimistic when it comes to price/performance and performance/watt in the redundancy case.
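    One way to reproduce the "less than 25%" estimate is sketched below. It assumes the marketing peak counts a MAD (2 flops) on all 5 lanes per clock, that DP runs on 4 of the 5 lanes at half rate, and that each DP instruction counts as 1 flop because there is no MAD. These inputs are taken from the posts above and are not official figures; the function name and defaults are illustrative.

```python
from fractions import Fraction


def dp_fraction_of_sp_peak(
    lanes: int = 5,
    dp_lanes: int = 4,
    dp_rate: Fraction = Fraction(1, 2),
    sp_flops_per_op: int = 2,  # ASSUMPTION: peak SP figure counts MAD as 2 flops
    dp_flops_per_op: int = 1,  # ASSUMPTION: no DP MAD, so 1 flop per instruction
) -> Fraction:
    """Peak DP flops per clock as a fraction of peak SP flops per clock."""
    if lanes <= 0 or not 0 <= dp_lanes <= lanes:
        raise ValueError("invalid lane configuration")
    sp_peak = lanes * sp_flops_per_op
    dp_peak = dp_lanes * dp_rate * dp_flops_per_op
    return Fraction(dp_peak) / sp_peak


print(dp_fraction_of_sp_peak())            # 4 of 5 lanes: 1/5
print(dp_fraction_of_sp_peak(dp_lanes=5))  # if all 5 lanes took part: 1/4
```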
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    See this document

    http://www.cs.berkeley.edu/~samw/projects/multicore/sc2007.pdf

    for an example of how "useless" x86 (particularly Intel) is, and why "peak" is such a meaningless concept. The applications covered, such as Proteins, FEM and circuit simulation look like reasonable evaluation candidates for GPGPU workloads...

    Jawed
     