The state of AMD's GPGPU implementation?

Discussion in 'GPGPU Technology & Programming' started by wingless, Aug 21, 2008.

  1. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,811
    Likes Received:
    478
    Hyperbole much?

    The local data share is there, just not enough information given to understand how contention will affect performance.
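    To make the contention question concrete, here is a minimal Python sketch of a generic banked scratchpad. The bank count (16), the 4-byte word size, and the same-word broadcast rule are assumptions borrowed from how banked shared memories commonly work. They are not figures AMD has documented for the RV770 LDS. The hypothetical `lds_access_cycles` only counts how many serialized passes one simultaneous access by a group of threads would need.

```python
from collections import defaultdict

def lds_access_cycles(addresses, num_banks=16, word_bytes=4):
    """Return how many serialized passes a banked scratchpad needs for one
    simultaneous access by a group of threads (one byte address per thread).

    Model (an assumption, not AMD's documented LDS behaviour):
    - word w lives in bank w % num_banks;
    - threads reading the *same* word are served by a single broadcast;
    - distinct words that map to the same bank are serialized.
    """
    words_per_bank = defaultdict(set)
    for addr in addresses:
        word = addr // word_bytes
        words_per_bank[word % num_banks].add(word)
    # The busiest bank determines how many passes the access takes.
    return max((len(w) for w in words_per_bank.values()), default=0)

# 16 threads, unit stride: every thread hits a different bank
print(lds_access_cycles([4 * t for t in range(16)]))        # 1
# stride of 16 words: every thread lands in bank 0
print(lds_access_cycles([4 * 16 * t for t in range(16)]))   # 16
# all threads read the same word: one broadcast
print(lds_access_cycles([0] * 16))                          # 1
```

    The point of the model is that the access pattern, not just the amount of data, decides the cost, which is exactly what the documentation would need to spell out.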
     
    #41 MfA, Sep 10, 2008
    Last edited by a moderator: Sep 10, 2008
  2. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    The documentation is still not good.
    The new features seem to be
    a) Local data share is exposed through CAL. Some info is present in the IL reference, but that's it. NO mention of LDS anywhere else in the docs. Global data share is not exposed?
    b) Vista is now supported.
    c) DX 9/10 interop is now supported. OpenGL is not mentioned?
    d) Some sync primitives.
     
  3. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Fair point.
     
  4. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Rv730 and SDK

    I understand that RV730 is not officially supported by the SDK. But I am wondering if RV730 supports features like double precision, the global buffer, the newfangled LDS, etc. Comments?
     
  5. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    To the thread - the V1.2 SDK update brings official (albeit Beta, at this stage) Vista support, which is what was mentioned earlier in this thread.

    RV730 has all the same compute feature set as RV770 other than no Double Precision support.
     
  6. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Ahh thanks for the info. As I am playing more with this new version of CAL, I am getting more excited. The "compute shaders" are very interesting. Lots of stuff to play with. LDS and shared registers are especially interesting.

    It would be much nicer, though, if AMD disclosed some information about how thread groups are mapped to the hardware, and some info about caches wouldn't hurt either :)
     
  7. wingless

    Newcomer

    Joined:
    Aug 5, 2007
    Messages:
    79
    Likes Received:
    0
    Location:
    Houston, Texas
    I hope AMD engineers are reading all of your thoughts on this subject. You said you're getting excited about AMD's features, but the documentation is lacking. AMD could have a real opportunity to please the developer community if they get some more info out there. I have faith in the hardware.

    On a side note, I wonder if AMD is planning on tossing in some ECC capabilities to allow more efficiency in multi-gpu configurations. David Kanter says that the RV770 isn't as well designed for multi-gpu configurations as GT200 is. This is shameful given the immense processing potential of the 4870X2.
     
  8. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    Hmm? Is David really saying that? Would you mind pointing me to the exact quote?
     
  9. wingless

    Newcomer

    Joined:
    Aug 5, 2007
    Messages:
    79
    Likes Received:
    0
    Location:
    Houston, Texas
    http://www.realworldtech.com/page.cfm?ArticleID=RWT090808195242&p=5

    Well, his criticism is about multi-GPU setups from both Nvidia and AMD. It is harder for the developer to deal with multiple GPUs to begin with. I misunderstood it the first time I read it, so I apologize to Mr. Kanter and everybody else. I really would like to see AMD be the first to solve this cache coherency issue between multiple GPUs. It seems that would give them a leg up on CUDA and also make it A LOT easier to program for these multi-GPU configs.
     
  10. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    I think pixie dust could do the job ;) More seriously, I doubt there can even theoretically be a very attractive answer to that without CMOS photonics.
     
  11. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    Pardon my noobness... but why is it that hard to do?
    Cache coherency on SMP systems for CPUs has been done for many years now.
    Cache coherency b/w various cores on GPU is being done by Larrabee.
    So why is there a huge jump from these 2 to cache coherency b/w multi-GPUs on the same PCB?
     
  12. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,751
    Likes Received:
    128
    Location:
    Taiwan
    Because the interconnect requirements are quite different. Normally a CPU has only a fraction of the bandwidth of a GPU, and the interconnect between two (or more) CPUs does not really require much more bandwidth than their main memory bandwidth. Actually, latency is probably more important for them.

    However, in the case of GPUs, you will want a lot of bandwidth between two "cache coherent" GPUs, roughly the same as their memory bandwidth. That means you are probably looking at about a 50 GB/s bandwidth requirement for an interconnect between two GPUs. This is probably doable if the two GPUs are on the same board, but it's still expensive. It will get much more expensive if you want an interconnect like this to work across multiple boards.
     
  13. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    Plus, to fit in AMD's strategy, it would have to reside in every (performance) ASIC, which costs die space even on those dies that are not used in mGPU scenarios.
     
  14. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,885
    Location:
    Well within 3d
    It's not clear that even Intel will push inter-chip cache coherency strongly when it comes to graphics products.
    I haven't seen any inter-chip bandwidth numbers, and it may be that Larrabee won't even try multichip early on.

    The traffic problem might in the end vindicate Nvidia, when coherent caches are implemented.
     
  15. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,811
    Likes Received:
    478
    Snooping cache coherency doesn't scale well, but directory based cache coherency does. IMO this just obfuscates the underlying architecture and promotes poor programming though, just use message passing.
     
  16. 3dilettante

    Legend Alpha

    Joined:
    Sep 15, 2003
    Messages:
    8,125
    Likes Received:
    2,885
    Location:
    Well within 3d
    The problem as described so far indicates that there is a genuine need for very high bandwidth between all the chips.
    In what way can a directory minimize traffic that isn't related to coherence?
     
  17. wingless

    Newcomer

    Joined:
    Aug 5, 2007
    Messages:
    79
    Likes Received:
    0
    Location:
    Houston, Texas
    Would the Sideport interconnect on the HD 4870X2 qualify as the high-bandwidth interconnect that y'all are talking about? Of course this is only between two GPUs on one board, but it's 2x better than a single RV770. They gotta start somewhere...

    Cache coherency seems to really be a big issue with these GPUs. I can't wait to see what creative and innovative solutions these companies come up with to make multi-GPU GPGPU farms work well together. MfA mentioned something about message passing to get around this issue in software (I think that is what he meant; please elaborate, MfA). Do you all know of ways to get around this problem purely with creative coding?
     
  18. Lux_

    Newcomer

    Joined:
    Sep 22, 2005
    Messages:
    206
    Likes Received:
    1
    As Eric Demers said in a recent Rage3D interview, Sideport is write-only.
     
  19. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,811
    Likes Received:
    478
    This is not really for GPU processing, but for more generalized massively parallel processing. With message passing, data being carted around has an explicit destination; there is no need for cache coherency because there is no duplicate storage of the same address.
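    A minimal Python sketch of the message-passing style described above, assuming plain `queue.Queue` mailboxes stand in for the interconnect. Each hypothetical worker owns a private chunk of the data and sends its result to an explicitly named destination, so no address is ever cached in two places. Python threads do share one address space, so this illustrates the communication pattern, not real memory isolation.

```python
import queue
import threading

def worker(rank, data, mailboxes, dest):
    """Each worker owns its data privately; the only communication is an
    explicit send of its result to the mailbox of rank `dest`."""
    partial = sum(data)                    # purely local work
    mailboxes[dest].put((rank, partial))   # explicit destination

def parallel_sum(values, num_workers=4):
    # One mailbox per worker plus one for the coordinator ("root").
    mailboxes = [queue.Queue() for _ in range(num_workers + 1)]
    root = num_workers
    # Slicing copies the data, so every worker gets its own private chunk.
    chunks = [values[i::num_workers] for i in range(num_workers)]
    threads = [threading.Thread(target=worker, args=(r, chunks[r], mailboxes, root))
               for r in range(num_workers)]
    for t in threads:
        t.start()
    total = 0
    for _ in range(num_workers):
        _rank, partial = mailboxes[root].get()   # blocks until a message arrives
        total += partial
    for t in threads:
        t.join()
    return total

print(parallel_sum(list(range(1, 101))))   # 5050
```

    Nothing here needs coherence hardware: the only shared structures are the mailboxes, and every message names its receiver.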

    When you rely on cache coherency for communication, data has an implicit destination. With snooping, the receiver picks data for addresses it has cached out of broadcast traffic (which obviously doesn't scale well); with directory-based caching, you have special controllers that remember which processors have which data cached, so you can do it without broadcasting (this can scale well, with a lot of caveats).
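    The difference in traffic can be sketched by counting invalidation messages for a single write. This is a toy Python model under stated assumptions (one cache line, invalidate-on-write, a hypothetical `Directory` class). It deliberately ignores the request to the directory itself, acknowledgements, and the directory's storage cost, which are among the caveats that make directories harder in practice.

```python
class Directory:
    """Toy directory: for each cache line, remember which processors hold a copy."""

    def __init__(self):
        self.sharers = {}                  # line -> set of processor ids

    def read(self, proc, line):
        self.sharers.setdefault(line, set()).add(proc)

    def write(self, proc, line):
        """Return how many invalidations are sent so `proc` owns `line` exclusively."""
        others = self.sharers.get(line, set()) - {proc}
        self.sharers[line] = {proc}        # the writer becomes the sole holder
        return len(others)

def snoop_write_messages(num_procs):
    # With snooping, every write is broadcast to all other caches,
    # whether or not they actually hold the line.
    return num_procs - 1

num_procs = 64
d = Directory()
d.read(0, line=0x100)
d.read(1, line=0x100)                    # only two processors touched this line
print("snooping :", snoop_write_messages(num_procs))   # 63
print("directory:", d.write(0, line=0x100))            # 1
```

    The saving shrinks when many processors share the same line, since the directory then has to contact all of them anyway.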

    For the moment though talking about cache coherency for GPGPU programming is missing the point somewhat.
     
    #59 MfA, Sep 29, 2008
    Last edited by a moderator: Sep 29, 2008
  20. wingless

    Newcomer

    Joined:
    Aug 5, 2007
    Messages:
    79
    Likes Received:
    0
    Location:
    Houston, Texas
    Gotcha. Back to the original topic of my post. All in all, it seems that AMD's GPGPU software isn't as mature as CUDA, which is hindering its present-day market adoption. CUDA is getting into anything and everything now, and it pains me to see AMD lagging behind when we know the hardware is up to the task.
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.