GPCBenchmark - An OpenCL General Purpose Computing benchmark

Discussion in 'GPGPU Technology & Programming' started by Arnold Beckenbauer, Apr 30, 2010.

  1. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    3,696
    Location:
    Germany
    That's precisely what I meant. No DP, for example, and a crash in INT, but at least there is a score in Image Processing, which wasn't the case with the HD 5870 and older drivers.

    edit: To be more precise: the combination of a rather old benchmark that hasn't been updated recently and AMD's (and partly also Nvidia's) OpenCL drivers does not give a good indication of the performance one should expect from Cayman.
     
    #101 CarstenS, May 9, 2011
    Last edited by a moderator: May 9, 2011
  2. trinibwoy

    trinibwoy Meh
    Legend Alpha

    Joined:
    Mar 17, 2004
    Messages:
    10,312
    Location:
    New York
    The histogram test shows the local atomics version running much faster than the global atomics version on Fermi. I thought we had concluded that local atomics were still making a round trip through L2 and therefore shouldn't be any faster?
     
  3. pcchen

    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,645
    Location:
    Taiwan
    Are you sure about this? Local atomics operate on shared memory, which on Fermi sits in the same on-chip storage as the L1 cache, not in L2. Remember that local atomics only work within a block, so they shouldn't have to touch the L2 cache.
     
  4. trinibwoy

    No, I'm not sure, as I haven't done the test myself, but B3D's analysis found local and global atomics to be the same speed (actually, locals were slower). I haven't seen any other evidence to dispute this besides the GPCBenchmark test.

    http://www.beyond3d.com/content/reviews/55/14
     
  5. pcchen

    Well, I don't know, but my previous histogram experiments also show local atomics being faster than global atomics (the local atomics version can do a histogram at around 10 GB/s on my GTX 460), though that was in CUDA. Of course, a round trip through L2 still can't be completely ruled out, as I didn't do any "maximum speed" test on atomics.
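
    For anyone who wants to try the comparison themselves, here is a minimal CUDA sketch of the two histogram variants being discussed. It is not the GPCBenchmark or B3D code: the kernel names (histGlobal, histLocal), the 256-bin byte histogram, the input size and the 64x256 launch configuration are all assumptions made for illustration. It only checks that both variants agree with a CPU reference; timing (e.g. with cudaEvent) and error checking are left out.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBins = 256;  // assumed layout: one bin per byte value

// Variant 1: every thread updates the histogram in global memory directly.
__global__ void histGlobal(const unsigned char* data, size_t n, unsigned int* hist) {
    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&hist[data[i]], 1u);
}

// Variant 2: each block accumulates into shared ("local") memory first,
// then merges its partial histogram into global memory once at the end.
__global__ void histLocal(const unsigned char* data, size_t n, unsigned int* hist) {
    __shared__ unsigned int local[kBins];
    for (int b = threadIdx.x; b < kBins; b += blockDim.x) local[b] = 0;
    __syncthreads();

    size_t stride = (size_t)blockDim.x * gridDim.x;
    for (size_t i = (size_t)blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&local[data[i]], 1u);  // atomic on shared memory
    __syncthreads();

    for (int b = threadIdx.x; b < kBins; b += blockDim.x)
        if (local[b] != 0) atomicAdd(&hist[b], local[b]);  // one global atomic per bin per block
}

int main() {
    const size_t n = (size_t)1 << 24;  // 16 MiB of pseudo-random bytes
    std::vector<unsigned char> host(n);
    std::vector<unsigned int> ref(kBins, 0);
    for (size_t i = 0; i < n; ++i) {
        host[i] = (unsigned char)(rand() & 0xFF);
        ++ref[host[i]];  // CPU reference histogram
    }

    unsigned char* dData = nullptr;
    unsigned int *dGlobal = nullptr, *dLocal = nullptr;
    const size_t histBytes = kBins * sizeof(unsigned int);
    cudaMalloc(&dData, n);
    cudaMalloc(&dGlobal, histBytes);
    cudaMalloc(&dLocal, histBytes);
    cudaMemcpy(dData, host.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dGlobal, 0, histBytes);
    cudaMemset(dLocal, 0, histBytes);

    histGlobal<<<64, 256>>>(dData, n, dGlobal);
    histLocal<<<64, 256>>>(dData, n, dLocal);
    cudaDeviceSynchronize();

    std::vector<unsigned int> outGlobal(kBins), outLocal(kBins);
    cudaMemcpy(outGlobal.data(), dGlobal, histBytes, cudaMemcpyDeviceToHost);
    cudaMemcpy(outLocal.data(), dLocal, histBytes, cudaMemcpyDeviceToHost);

    bool ok = true;
    for (int b = 0; b < kBins; ++b)
        if (outGlobal[b] != ref[b] || outLocal[b] != ref[b]) ok = false;
    printf("%s\n", ok ? "both variants match the CPU reference" : "mismatch");

    cudaFree(dData);
    cudaFree(dGlobal);
    cudaFree(dLocal);
    return ok ? 0 : 1;
}
```

    Note that the second variant also issues far fewer global atomics (at most 256 per block), so a faster histogram with local atomics does not by itself say how fast a single local atomic is compared to a global one. That would need the kind of "maximum speed" test mentioned above.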
     
  6. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    730
    I wonder what the GCN numbers look like :)
     
  7. CarstenS

    Lemme just quick-quote myself:
    This is also true for Tahiti. Some of the performance numbers in OpenCL Bench make sense, others just don't. Plus, the results vary wildly in some subtests from run to run.
     
  8. Man from Atlantis

    Thanks, I won't ask for Kepler then :smile:
     
