Scalable HeterOgeneous Computing (SHOC) Benchmark Suite

Discussion in 'GPGPU Technology & Programming' started by Jawed, May 11, 2010.

  1. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,638
    Location:
    London
    http://ft.ornl.gov/doku/shoc/start

    The initial report:

    http://ft.ornl.gov/pubs-archive/shoc.pdf

    dates from 14 March 2010. It shows the immaturity of OpenCL...

    There's plenty to play with, too.

    Jawed
     
  2. pcchen

    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    2,645
    Location:
    Taiwan
    Unfortunately, this report used older OpenCL SDKs (NVIDIA SDK 2.3 and ATI Stream SDK 2.0), hence the relatively large disparity between OpenCL and CUDA. I hope they'll do a new report with newer implementations; that would be interesting.
     
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,638
    Location:
    London
    I suppose the first requirement is that the major OpenCL implementations are feature-complete and stable. This report is more like a first taste.

    If nothing else, this gives them a tool to measure the usefulness of OpenCL. If it's another year away from being useful, they'll at least have a benchmark for progress.

    Additionally, heterogeneous computing has quite a learning curve, so developing a suite like this while they're climbing it will be productive for the community at large, and even for AMD, NVIDIA etc.

    Jawed
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,638
    Location:
    London
  5. cho

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    414
    GeForce GTX 460 768MB, driver 256.35, CUDA 3.1, gcc 4.3, Ubuntu 10.04 32-bit

    Code:
    edison@edison-desktop:~/Downloads/shoc-0.9/tools$ perl driver.pl -cuda
    --- Welcome To The SHOC Benchmark Suite --- 
    Hostname: edison-desktop 
    Number of available devices: 1 
    Device 0: 'GeForce GTX 460'
    
    --- Starting Benchmarks ---
    -- Running Level 0 Benchmarks for Device 0 --
    -- Running Level 1 Benchmarks for Device 0 --
    -- Results --
    - Level 0: "Feeds and Speeds" -
    Peak FLOPS  (GFLOPS):           529.072
    Memory Bandwidth (Gbytes/s):    72.8
    PCIe Bus Speed H->D (Gbytes/s): 5.58274
    PCIe Bus Speed D->H (Gbytes/s): 6.16579
    - Level 1: Low Level Operations -
    FFT (GFLOPS):         258.899
    MD (GFLOPS):          345.605
    Reduction (Gbytes/s): 67.1293
    Scan (Gbytes/s):      15.0408
    SGEMM (GFLOPS):       358.095
    Sort (Gbytes/s):      0.239755
    Stencil2D (sec):      2.03656
    Triad (Gbytes/s):     7.0491
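The CUDA run reports 72.8 Gbytes/s device memory bandwidth. To see how close that is to the hardware limit, here is a rough sketch of the usual peak-bandwidth arithmetic. The 192-bit bus and 3.6 GT/s effective GDDR5 rate are my assumptions for the 768MB GTX 460 reference design (they are not in the SHOC output), and I'm also assuming SHOC's "Gbytes" means 10^9 bytes.

```python
# Assumed specs for a reference 768MB GTX 460; not taken from the SHOC output.
BUS_WIDTH_BITS = 192   # memory interface width
DATA_RATE_GTPS = 3.6   # effective GDDR5 transfer rate (3600 MT/s)


def peak_bandwidth_gbs(bus_width_bits: int, data_rate_gtps: float) -> float:
    """Theoretical DRAM bandwidth in GB/s: bytes per transfer x billions of transfers per second."""
    return bus_width_bits / 8 * data_rate_gtps


PEAK_GBS = peak_bandwidth_gbs(BUS_WIDTH_BITS, DATA_RATE_GTPS)
MEASURED_GBS = 72.8  # "Memory Bandwidth (Gbytes/s)" from the SHOC 0.9 CUDA run

if __name__ == "__main__":
    print(f"peak {PEAK_GBS:.1f} GB/s, measured {MEASURED_GBS} GB/s "
          f"= {MEASURED_GBS / PEAK_GBS:.1%} of peak")
```

Under those assumptions the peak works out to 86.4 GB/s, so the CUDA figure is roughly 84% of it.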
    

    Code:
    edison@edison-desktop:~/Downloads/shoc-0.9/tools$ perl driver.pl -opencl
    --- Welcome To The SHOC Benchmark Suite --- 
    Hostname: edison-desktop 
    Number of available devices: 1 
    Device 0: GeForce GTX 460
    
    --- Starting Benchmarks ---
    -- Running Level 0 Benchmarks for Device 0 --
    -- Running Level 1 Benchmarks for Device 0 --
    -- Results --
    - Level 0: "Feeds and Speeds" -
    Peak FLOPS  (GFLOPS):           899.298
    Memory Bandwidth (Gbytes/s):    Benchmark Error
    PCIe Bus Speed H->D (Gbytes/s): 5.57252
    PCIe Bus Speed D->H (Gbytes/s): 6.17624
    Kernel Compilation (s):         0.000433812
    OCL Queueing Delay (ms):        5.80597e-05
    
    - Level 1: Low Level Operations -
    FFT (GFLOPS):         38.4019
    MD (GFLOPS):          248.743
    Reduction (Gbytes/s): 57.8637
    Scan (Gbytes/s):      12.5351
    SGEMM (GFLOPS):       239.069
    Sort (Gbytes/s):      0.236293
    Stencil2D (sec):      2.06515
    Triad (Gbytes/s):     5.5809
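A quick way to compare the two 0.9 runs is to express each OpenCL Level 1 result as a fraction of the CUDA one. This sketch uses only the numbers posted above; the dictionary and function names are just for illustration. Stencil2D is reported as a time (lower is better), so it is inverted so that a value below 1.0 means "OpenCL slower" in every row.

```python
# Level 1 results from the two SHOC 0.9 runs above (same GTX 460 machine).
CUDA = {
    "FFT": 258.899, "MD": 345.605, "Reduction": 67.1293, "Scan": 15.0408,
    "SGEMM": 358.095, "Sort": 0.239755, "Stencil2D": 2.03656, "Triad": 7.0491,
}
OPENCL = {
    "FFT": 38.4019, "MD": 248.743, "Reduction": 57.8637, "Scan": 12.5351,
    "SGEMM": 239.069, "Sort": 0.236293, "Stencil2D": 2.06515, "Triad": 5.5809,
}
# Stencil2D is reported in seconds; everything else is a throughput.
LOWER_IS_BETTER = {"Stencil2D"}


def relative_perf(opencl: dict, cuda: dict, lower_is_better: set) -> dict:
    """Return OpenCL performance as a fraction of CUDA (1.0 = parity)."""
    out = {}
    for name, cuda_val in cuda.items():
        ocl_val = opencl[name]
        if name in lower_is_better:
            out[name] = cuda_val / ocl_val  # time: invert so >1 would mean OpenCL faster
        else:
            out[name] = ocl_val / cuda_val
    return out


if __name__ == "__main__":
    for name, ratio in relative_perf(OPENCL, CUDA, LOWER_IS_BETTER).items():
        print(f"{name:10s} {ratio:6.1%}")
```

On these numbers FFT stands out at under 15% of the CUDA result, while Sort and Stencil2D are within about 2% of parity.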
    
     
    #5 cho, Jul 15, 2010
    Last edited by a moderator: Jul 15, 2010
  6. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,638
    Location:
    London
    :lol: I was just about to comment on the peak GFLOPS of the CUDA run, and then you posted the OpenCL run!
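For context on the two MaxFlops figures (529 GFLOPS under CUDA, 899 GFLOPS under OpenCL), here is the usual theoretical single-precision peak calculation. The core count (336), the 1.35 GHz shader clock (twice the 675 MHz core clock) and 2 FLOPs per core per clock (one FMA) are my assumptions about the GTX 460, not something SHOC reports.

```python
# Assumed GTX 460 (GF104) figures; not taken from the SHOC output.
CUDA_CORES = 336           # 7 multiprocessors x 48 cores
SHADER_CLOCK_GHZ = 1.35    # shader ("hot") clock, 2x the 675 MHz core clock
FLOPS_PER_CORE_CLOCK = 2   # one single-precision fused multiply-add


def peak_sp_gflops(cores: int, clock_ghz: float, flops_per_clock: int) -> float:
    """Theoretical single-precision peak in GFLOPS."""
    return cores * clock_ghz * flops_per_clock


PEAK = peak_sp_gflops(CUDA_CORES, SHADER_CLOCK_GHZ, FLOPS_PER_CORE_CLOCK)
MEASURED = {"CUDA": 529.072, "OpenCL": 899.298}  # SHOC 0.9 "Peak FLOPS" results

if __name__ == "__main__":
    for api, gflops in MEASURED.items():
        print(f"{api:6s} {gflops:8.1f} GFLOPS = {gflops / PEAK:.1%} of {PEAK:.1f}")
```

By this count the OpenCL result is within about 1% of the theoretical 907.2 GFLOPS, and the CUDA result sits at roughly 58%.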
     
  7. cho

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    414
    http://ft.ornl.gov/doku/shoc/downloads

    Changelog, version 0.9.1:

    - Numerous bug fixes to OpenCL and CUDA benchmarks
    - Addition of Spmv benchmark, a sparse matrix-vector-multiply test
    - Addition of double precision support to many SHOC benchmarks (though not yet all)
    - Addition of an experimental driver for testing performance on NUMA systems
    - Changed SHOC to use a GNU autoconf-generated script for configuration
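For anyone unfamiliar with what a sparse matrix-vector multiply test exercises, here is a minimal serial sketch of SpMV with the matrix in compressed sparse row (CSR) format, a common storage format for this kind of kernel. This is not SHOC's code, and I'm not claiming it matches the formats the benchmark actually tests; the function name is hypothetical. GPU versions typically parallelize the outer loop over rows.

```python
from typing import List


def spmv_csr(values: List[float], col_idx: List[int],
             row_ptr: List[int], x: List[float]) -> List[float]:
    """Compute y = A x for A stored in compressed sparse row (CSR) form.

    row_ptr has n_rows + 1 entries; the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], with their column indices in
    the same slice of col_idx.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]  # one multiply-add per nonzero
        y[i] = acc
    return y


if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    values = [4.0, 1.0, 2.0, 3.0, 5.0]
    col_idx = [0, 2, 1, 0, 2]
    row_ptr = [0, 2, 3, 5]
    print(spmv_csr(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))  # [7.0, 4.0, 18.0]
```

Each nonzero costs one multiply and one add but requires loading a value, a column index and a gathered element of x. That low arithmetic intensity is why SpMV tends to be limited by memory bandwidth rather than FLOPS.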
     
  8. cho

    Regular

    Joined:
    Feb 9, 2002
    Messages:
    414
    Version 0.92 released.

    Code:
    [edison@dhcppc2 tools]$ perl driver.pl -s 1 -opencl
    --- Welcome To The SHOC Benchmark Suite --- 
    Hostname: dhcppc2 
    Number of available devices: 1 
    Device 0: GeForce GTX 460
    
    --- Starting Benchmarks ---
    - Level 0: "Feeds and Speeds" -
    -- This can take several minutes. --
    -PCIe Bandwidth Tests (Gbytes/s)-
    Dev 0: GeForce GTX 460 H->D: 5.58484
    Dev 0: GeForce GTX 460 D->H: 6.19401
    
    -MaxFlops Test (GFLOPS)-
    Dev 0: GeForce GTX 460 SP:   902.635
    Dev 0: GeForce GTX 460 DP:   TBI
    
    -Device Memory Bandwidth Tests (Gbytes/s) (Read / Write)-
    Dev 0: GeForce GTX 460
    Global Memory Contiguous:       83.5151 / 69.146 
    Global Memory Strided:          6.59788 / 3.5106 
    Local Memory:                   74.998 / 75.5435 
    Image (Random Access):          23.7179 
    
    -OpenCL Kernel Compilation (s)-
    Dev 0: GeForce GTX 460 Kernel Compilation:         0.0042562
    
    -OpenCL Queuing Delay (ms)-
    Dev 0: GeForce GTX 460 Submit-Start Delay:        1.17657e-05
    
    --- Level 1 - Basic Algorithms and Parallel Primitives ---
    -- This can take several minutes. --
    
    -FFT (GFLOPS) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SP FFT:           29.5859 / 13.548
    Dev 0: GeForce GTX 460 SP IFFT+Norm:     27.9357 / 13.1912
    
    -MD (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SP MD:           15.9212 / 9.86172
    Dev 0: GeForce GTX 460 DP MD:           19.2474 / 13.4067
    
    -Reduction (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SP MD:           30.0856 / 4.57579
    Dev 0: GeForce GTX 460 DP MD:           33.2181 / 4.64346
    
    -Scan (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SP MD:           4.1323 / 1.69043
    Dev 0: GeForce GTX 460 DP MD:           3.98686 / 1.66584
    
    -GEMM (GFLOPS/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SGEMM:           172.719 / 58.1831
    Dev 0: GeForce GTX 460 DGEMM:           38.5053 / 10.3581
    
    -Sort (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 Sort:            0.75323 / 0.596243
    
    -Stencil2D (s) (Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 SP Sten2D:        0.209269
    Dev 0: GeForce GTX 460 DP Sten2D:        0.343495
    
    -Triad (Gbytes/s) (Kernel + PCIe transfer)-
    Dev 0: GeForce GTX 460 Triad:           5.66663
    
    Code:
     perl driver.pl -s 1 -cuda
    --- Welcome To The SHOC Benchmark Suite --- 
    Hostname: dhcppc2 
    Number of available devices: 1 
    Device 0: 'GeForce GTX 460'
    
    --- Starting Benchmarks ---
    - Level 0: "Feeds and Speeds" -
    -- This can take several minutes. --
    -PCIe Bandwidth Tests (Gbytes/s)-
    Dev 0: 'GeForce GTX 460' H->D: 5.58332
    Dev 0: 'GeForce GTX 460' D->H: 6.17031
    
    -MaxFlops Test (GFLOPS)-
    Dev 0: 'GeForce GTX 460' SP:   528.665
    Dev 0: 'GeForce GTX 460' DP:   75.4328
    
    -Device Memory Bandwidth Tests (Gbytes/s) (Read / Write)-
    Dev 0: 'GeForce GTX 460'
    Global Memory Contiguous:       72.6721 / 63.7339 
    Global Memory Strided:          5.12369 / 3.20294 
    Shared Memory:                  128.105 / 126.781 
    Texture (Random Access):        35.6334 
    
    --- Level 1 - Basic Algorithms and Parallel Primitives ---
    -- This can take several minutes. --
    
    -FFT (GFLOPS) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SP FFT:           140.434 / 25.8695
    Dev 0: 'GeForce GTX 460' SP IFFT+Norm:     137.168 / 25.7565
    
    -MD (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SP MD:           52.8352 / 17.3668
    Dev 0: 'GeForce GTX 460' DP MD:           47.5434 / 22.9686
    
    -Reduction (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SP MD:           42.3628 / 4.54037
    Dev 0: 'GeForce GTX 460' DP MD:           45.3746 / 4.57222
    
    -Scan (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SP MD:           13.0782 / 0.00578171
    Dev 0: 'GeForce GTX 460' DP MD:           11.6588 / 0.00578045
    
    -GEMM (GFLOPS/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SGEMM:           198.003 / 101.265
    Dev 0: 'GeForce GTX 460' DGEMM:           40.992 / 22.6474
    
    -Sort (Gbytes/s) (Kernel Only / Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' Sort:            0.883505 / 0.670083
    
    -Stencil2D (s) (Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' SP Sten2D:        1.587
    Dev 0: 'GeForce GTX 460' DP Sten2D:        1.81846
    
    -Triad (Gbytes/s) (Kernel + PCIe transfer)-
    Dev 0: 'GeForce GTX 460' Triad:           5.73816
    
    GeForce GTX 460 768MB (675 MHz), CUDA 3.2 with driver 260.24, CentOS 5.5 x86_64
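The "Kernel + PCIe transfer" Reduction numbers are close to each other even though the kernel-only numbers differ a lot between CUDA and OpenCL. A simple way to see why: if every byte has to cross PCIe before the kernel reads it, and the two steps don't overlap, then the per-byte times add up. This model is my simplifying assumption (it ignores reading back the result and launch overheads), not a description of how SHOC actually times the test.

```python
def serial_effective_bw(pcie_gbs: float, kernel_gbs: float) -> float:
    """Effective GB/s when data first crosses PCIe and is then processed by the
    kernel, with no overlap between the two steps (a simplifying assumption)."""
    # Seconds per GB add up: t = 1/pcie + 1/kernel.
    return 1.0 / (1.0 / pcie_gbs + 1.0 / kernel_gbs)


# SP Reduction numbers from the SHOC 0.92 logs above:
# api -> (PCIe H->D GB/s, kernel-only GB/s, reported kernel + PCIe GB/s)
RUNS = {
    "CUDA": (5.58332, 42.3628, 4.54037),
    "OpenCL": (5.58484, 30.0856, 4.57579),
}

if __name__ == "__main__":
    for api, (pcie, kernel, reported) in RUNS.items():
        model = serial_effective_bw(pcie, kernel)
        print(f"{api:6s} model {model:.2f} GB/s, reported {reported:.2f} GB/s")
```

The model gives about 4.9 GB/s for CUDA and 4.7 GB/s for OpenCL, slightly above the reported 4.54 and 4.58 GB/s. Either way, once the PCIe copy is included, the bus dominates and the kernel's speed hardly matters.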
     
