AMD Trinity/Richland: Help me run a small GPU test!

Discussion in 'GPGPU Technology & Programming' started by codedivine, May 27, 2013.

  1. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Hi folks. I need your help!

    I am trying to find out the approximate double-precision performance of the GPU in AMD Trinity for ADD, MUL and FMA instructions. I have written a small OpenCL test to do so. If possible, can you please download and run my test? Link: http://www.rgbench.com/MaxFlopsCL.exe

    It needs to be run from the command line under 64-bit Windows. On machines with a single GPU, the test should choose the default GPU. If you are running a Trinity-based system with an additional AMD GPU installed and the test reports numbers for that GPU instead, try running "MaxFlopsCL.exe 0 1".

    If you are worried about downloading arbitrary binaries from the internet, the test is open source, so you can compile it yourself if you wish :) Source: https://bitbucket.org/codedivine/maxflopscl
     
    #1 codedivine, May 27, 2013
    Last edited by a moderator: May 27, 2013
  2. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
  3. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Well, that info is likely not right. I asked about Trinity DP throughput a long time ago on the AMD OpenCL forums and got the reply that it is 1/16 the fp32 speed. However, I was wondering whether that applies to ADD, MUL and FMA or just to ADD. That is why I whipped up this test. My test reports the 3 numbers separately.

    I looked at the FlopsCL source code; it is a very nice tool. It looks like it benchmarks FMA ops. Some chips are faster at adds than at muls and FMAs, so I would like to know the throughput of each separately. My tool is indeed very basic (just scalar double ops), but it is intended for a slightly different task.
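
    For a rough sense of what a 1/16 rate would imply, here is a minimal sketch of the arithmetic. The 384 ALUs and the 0.8 GHz clock are assumed figures for the A10-5800K GPU and do not come from this thread, and the function names are made up for illustration. It also assumes each ALU retires one fp32 FMA (2 flops) per clock.

```c
#include <assert.h>

/* Theoretical fp32 peak in GFLOPS, assuming one FMA (2 flops) per ALU per clock. */
double peak_fp32_gflops(int alus, double clock_ghz) {
    assert(alus > 0 && clock_ghz > 0.0);
    return alus * 2.0 * clock_ghz;
}

/* fp64 peak when double precision runs at 1/ratio of the fp32 rate.
 * ratio = 16 is the figure quoted from the AMD forums; it is not a measurement. */
double peak_fp64_gflops(int alus, double clock_ghz, int ratio) {
    assert(ratio > 0);
    return peak_fp32_gflops(alus, clock_ghz) / ratio;
}
```

    With the assumed figures, peak_fp64_gflops(384, 0.8, 16) gives about 38.4 GFLOPS counted as FMA flops. A separate ADD or MUL measurement that lands well above or below the matching op rate would suggest the ratio is not the same for all three instructions.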
     
  4. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    Cannot start because msvcp110.dll is missing.
    Downloaded the file and got this:
    [image]

    installed Visual C++ Redistributable for Visual Studio 2012 Update 1

    AMD 6950

    C:\Temp>maxflopscl
    GPU selected Cayman
    Op = Add
    Error creating program from source 0 -11 -46
    Buildlog 7211 C:\Users\Davros\AppData\Local\Temp\OCL1A8D.tmp.cl(1): error: can't enable all OpenCL extensions or unrecognized OpenCL extension
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable
    ^

    edited to tidy up thread
     
  5. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Thanks, Davros. It looks like cl_khr_fp64 is not supported on your card, but I think AMD's cl_amd_fp64 should be. I have uploaded a new binary which should work around this issue and contains a number of other fixes.
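
    A minimal sketch of that kind of fallback, not the actual MaxFlopsCL code: given the device extension string (as returned by clGetDeviceInfo with CL_DEVICE_EXTENSIONS), pick whichever fp64 extension is present and build the pragma line to prepend to the kernel source. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in `list`. */
static int has_extension(const char *list, const char *name) {
    size_t len = strlen(name);
    const char *p = list;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends = (p[len] == '\0') || (p[len] == ' ');
        if (starts && ends) return 1;
        p += len;  /* partial match only, keep scanning */
    }
    return 0;
}

/* Picks the fp64 extension to enable, preferring the Khronos one.
 * Returns NULL if the device advertises neither. */
const char *pick_fp64_extension(const char *device_extensions) {
    assert(device_extensions != NULL);
    if (has_extension(device_extensions, "cl_khr_fp64")) return "cl_khr_fp64";
    if (has_extension(device_extensions, "cl_amd_fp64")) return "cl_amd_fp64";
    return NULL;
}

/* Writes the pragma line for the chosen extension; returns 0 on success,
 * -1 if the buffer is too small. */
int format_fp64_pragma(char *buf, size_t size, const char *extension) {
    int n = snprintf(buf, size, "#pragma OPENCL EXTENSION %s : enable\n", extension);
    return (n > 0 && (size_t)n < size) ? 0 : -1;
}
```

    For an extension list that contains cl_amd_fp64 but not cl_khr_fp64, pick_fp64_extension returns "cl_amd_fp64". Matching whole tokens avoids a false hit on a longer extension name that merely starts with the same text.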
     
  6. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
    Beware that the correct way to check for double precision under OpenCL 1.2 is to check that the preferred vector width for doubles (CL_DEVICE_PREFERRED_VECTOR_WIDTH_DOUBLE) is > 0.
    The #pragma for cl_khr_fp64 may give a warning that double precision is no longer an extension and is now part of the base standard.
     
  7. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    Tried the new build (just posting part of the output for brevity):


    C:\Temp>maxflopscl
    Device selected Cayman
    Device compute units: 22
    Device extensions:
    cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
    FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
    Testing DP performance
    Op = Add
    Error creating program from source 0 -11 -46
    Buildlog 7098 C:\Users\Davros\AppData\Local\Temp\OCL8CA9.tmp.cl(3): error: expected an expression
    #elif
    ^

    C:\Users\Davros\AppData\Local\Temp\OCL8CA9.tmp.cl(7): error: identifier "double" is undefined
    void testFlops(__global double* output,double v1,double v2){
    ^
     
  8. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Thanks. Will fix.
     
  9. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    PS: if anyone's interested, I ran the benchmark moozoo linked to:
    [image]
     
  10. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,819
    Likes Received:
    491
    Location:
    Torquay, UK
    Once you fix the code, codedivine, I can run it on an A10-5800K (stock/OC), an A8-4500M (256 shaders) and an HD7970 for you.

    Keep up the good work, guys!
     
  11. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
    It would be great if you could run FlopsCL on the A10-5800K as well.
    I'd love to know if it does have fp64 support and what its double4 gflops are.
     
  12. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,819
    Likes Received:
    491
    Location:
    Torquay, UK
    No problem! Just give me a few hours to get back home.

    EDIT!!!

    Results for FlopsCL are here.

    A10-5800K CPU (4.3GHz)

    [image]

    A10-5800K GPU (900MHz)

    [image]

    HD7970 (1050/1425)

    [image]


    Interestingly, DP on Trinity was broken and much slower on the Catalyst 12.10 beta. The double8 test would crash Windows 8 at a random point during the run. Luckily, the latest Catalyst drivers are working fine :)
     
    #12 Lightman, May 30, 2013
    Last edited by a moderator: May 30, 2013
  13. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    I wonder why the 7970 uses 65k blocks but the 6950 uses 4k blocks?
     
  14. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Hi guys. I have uploaded an updated test. Hoping it works out. Same link.
     
  15. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Thanks a lot for those numbers! :smile:
     
  16. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    Yay
    C:\Temp>maxflopscl
    Device selected Cayman
    Device compute units: 22
    Device extensions:
    cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_ext_atomic_counters_32 cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_popcnt cl_khr_d3d10_sharing
    FP64 supported with configuration: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF CL_FP_FMA
    Testing DP performance
    Op = Add
    Time 155.751ms GOps/s 441.214
    Op = Mul
    Time 123.728ms GOps/s 277.703
    Op = Fma
    Time 123.648ms GOps/s 277.884
     
  17. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Thanks! Clearly, ADDs are much faster than MULs and FMAs, whereas on Nvidia cards they are usually of equal throughput. Note that I count ops, not flops, and an FMA counts as two flops. Thus 277.8 GOps/s of FMAs equates to about 555 GFlops, which is very close to the FlopsCL result.
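
    The conversion can be sketched as follows, assuming the usual convention that an ADD or a MUL is one flop and an FMA is two. The type and function names are made up for illustration and are not taken from MaxFlopsCL.

```c
#include <assert.h>

typedef enum { OP_ADD, OP_MUL, OP_FMA } op_kind;

/* Flops counted per instruction: an FMA is a multiply plus an add. */
int flops_per_op(op_kind op) {
    return op == OP_FMA ? 2 : 1;
}

/* Converts an op rate (GOps/s) into a flop rate (GFLOPS). */
double gops_to_gflops(double gops, op_kind op) {
    assert(gops >= 0.0);
    return gops * flops_per_op(op);
}
```

    For the Cayman run above, gops_to_gflops(277.884, OP_FMA) is about 555.8 GFLOPS, while the ADD figure stays at 441.214 because each ADD is a single flop.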
     
  18. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
  19. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    15,835
    Likes Received:
    3,026
    Catalyst 13.2, and I installed the latest OpenCL driver from AMD before I ran the tests.
    I seem to be using the same driver as Lightman's 7970.
     
  20. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    18,029
    Likes Received:
    1,614
    Location:
    Maastricht, The Netherlands
    Won't run on my 550Ti / Win 8 for some reason (it just exits when I press Benchmark). It did at least tell me I have 4 CUs though, so that at least gives me some perspective on my current GPU vs next-gen consoles. :lol:

    I have to say, seeing that none of these OpenCL programs seems to work on anyone's machine without serious effort is not encouraging. Hopefully some platform will eventually arise that just works everywhere, and maybe next-gen consoles will help with that (stranger things have happened).
     