OpenCL Solar System

Discussion in 'GPGPU Technology & Programming' started by moozoo, May 18, 2013.

  1. moozoo

    Newcomer

    Joined:
    Jul 23, 2010
    Messages:
    109
    Likes Received:
    1
    >I might have access to a machine with an Nvidia card soon (only a GT 610 but still)

    Work sold me a cheap Intel Core 2 Duo machine for $35 AUD and I put a cheap $50 AUD low-profile GT 610 in it. I now have a full Nvidia development environment again.

    As a result of this I found out I had compiled the OpenCL ICD as a debug build (sorry, I lack experience with CMake), so it had a dependency on the debug runtime.

    I have uploaded version 1.062, in which the ICD is compiled as a release build.

    I have been thinking about this too.
    I have an old Pentium 4 computer that might be receiving a new motherboard and CPU...

    I have been looking at close encounter detection between all bodies.
    Doing this requires an octree, which might benefit from HSA, especially if HSA enables dynamic parallelism.
    If asteroids come within 0.05 AU of each other, it is possible for them to affect each other's orbits. This enables estimates of their masses to be made.
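
    For illustration only, here is a minimal brute-force sketch of that check in Python. It is not the octree approach and not the program's actual code: the function name `close_encounters` and the sample positions are made up, and positions are assumed to be (x, y, z) tuples in AU. It compares every pair, so it is O(n^2) and only practical for small body counts, which is the cost an octree is meant to avoid.

```python
import itertools
import math

CLOSE_ENCOUNTER_AU = 0.05  # threshold quoted in the post


def close_encounters(positions, threshold=CLOSE_ENCOUNTER_AU):
    """Return (i, j, distance) for every pair of bodies closer than threshold.

    positions: sequence of (x, y, z) tuples in AU (assumed layout).
    Brute force over all pairs; an octree would prune most of these tests.
    """
    hits = []
    for (i, p), (j, q) in itertools.combinations(enumerate(positions), 2):
        d = math.dist(p, q)  # Euclidean distance, Python 3.8+
        if d < threshold:
            hits.append((i, j, d))
    return hits
```

    For example, `close_encounters([(1.0, 0.0, 0.0), (1.03, 0.0, 0.0), (2.5, 0.0, 0.0)])` reports only the pair (0, 1), which is about 0.03 AU apart.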

    With my program, all of the data is transferred to the graphics card's memory and stays there. All of the maths and 3D graphics run on the GPU, i.e. the CPU just coordinates the GPU to do all the work.

    HSA might at some point enable better synchronization between OpenGL and OpenCL. As far as I know there is no way to insert a fence in the GL command stream tied to the OpenCL command queue without a round trip to the CPU. I have to call glFinish.
    It would be great if there was an API that combined OpenGL and OpenCL into one coherent API. Mantle 2.0? :)
    Ideally I'd send a long list of OpenGL and OpenCL commands and tell the GPU to run them all in a large loop until some window message event occurs (a mouse or keyboard event).

    With HSA I'm thinking we need to dump all the 2D, 3D and compute APIs and start again from scratch.

    The bottleneck in the current program is the force calculation between the bodies (mostly a huge number of rsqrt operations).
    Kaveri only has a 1:16 ratio for fp64 on its GCN cores. When you work out its total DP flops, most of them come from the CPU cores... However, GCN has a native double precision rsqrt and AVX does not, so it's not that simple.
    I believe the numerical accuracy of the native AVX functions is lower than that of the GCN ones, and this affects AVX's performance since additional code is needed to improve the accuracy.
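
    To make the rsqrt point concrete, here is a rough Python sketch, not the program's OpenCL kernel. It assumes the "additional code" is Newton-Raphson refinement of a low-precision estimate, and it fakes that estimate by rounding 1/sqrt(x) to float32. All names (`rsqrt_estimate`, `rsqrt_refined`, `accelerations`) are hypothetical, and `gm` is assumed to hold G times mass for each body in units consistent with the positions.

```python
import math
import struct


def rsqrt_estimate(x):
    """Stand-in for a low-precision hardware estimate of 1/sqrt(x).

    Assumption: emulated here by rounding the exact value to float32.
    """
    return struct.unpack("f", struct.pack("f", 1.0 / math.sqrt(x)))[0]


def rsqrt_refined(x, steps=2):
    """Refine the estimate with Newton-Raphson: y <- y * (1.5 - 0.5 * x * y * y).

    Each step roughly doubles the number of correct bits, at the cost of
    extra multiplies per rsqrt.
    """
    y = rsqrt_estimate(x)
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def accelerations(positions, gm, rsqrt=rsqrt_refined):
    """Direct-sum gravity: a_i = sum over j != i of gm_j * (r_j - r_i) / |r_j - r_i|^3.

    One rsqrt per ordered pair, which is why rsqrt throughput dominates.
    """
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        xi, yi, zi = positions[i]
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - xi
            dy = positions[j][1] - yi
            dz = positions[j][2] - zi
            inv_r = rsqrt(dx * dx + dy * dy + dz * dz)
            s = gm[j] * inv_r * inv_r * inv_r  # gm_j / r^3
            acc[i][0] += s * dx
            acc[i][1] += s * dy
            acc[i][2] += s * dz
    return acc
```

    With n bodies the inner loop runs n * (n - 1) times, so every extra refinement step is paid for on each of those pairs. That is the trade-off between a native double precision rsqrt and an estimate plus refinement.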
     
