GPU Ray-tracing for OpenCL

Discussion in 'Rendering Technology and APIs' started by fellix, Dec 27, 2009.

  1. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    I was just wondering about that myself; that was my next question for you: whether there is a way to limit the number of threads/cores used on a CPU. I'd imagine it could be problematic in a game, for instance, if the core game required 1-2 cores for good performance but would then like anything using OpenCL to be able to fully occupy all the remaining cores.

    Regards,
    SB
     
  2. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Can't you just increase the priority on the GPU rendering threads?

    PS. Does calling clFinish at nearly 200 Hz (on the faster ATI cards) really have no impact on performance? (If I had a graphics card which could run it, I'd try calling clFinish once every 20 invocations or so instead, but I don't.)
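    For anyone who wants to try the "sync every 20 invocations" idea, here is a minimal sketch of the control flow only. FakeQueue is a hypothetical stand-in that just counts calls; in a real program enqueue_kernel and finish would map to the actual OpenCL enqueue and clFinish calls, and whether this helps at all is an open question, not something measured here.

```python
class FakeQueue:
    """Hypothetical stand-in for an OpenCL command queue; it only counts calls."""

    def __init__(self):
        self.enqueued = 0
        self.finishes = 0

    def enqueue_kernel(self):
        # A real program would enqueue the rendering kernel here.
        self.enqueued += 1

    def finish(self):
        # A real program would make the blocking clFinish call here.
        self.finishes += 1


def render_passes(queue, passes, sync_every=20):
    """Enqueue `passes` kernel launches, blocking only every `sync_every` launches."""
    if sync_every < 1:
        raise ValueError("sync_every must be at least 1")
    for i in range(1, passes + 1):
        queue.enqueue_kernel()
        if i % sync_every == 0:
            queue.finish()
    if passes % sync_every != 0:
        queue.finish()  # drain whatever is still pending at the end
    return queue.finishes


if __name__ == "__main__":
    print(render_passes(FakeQueue(), 200, sync_every=20))
```

    With sync_every=1 this degenerates to the current behaviour of one blocking call per invocation.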
     
    #122 MfA, Jan 10, 2010
    Last edited by a moderator: Jan 10, 2010
  3. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Thanks! Preliminary report is in...

    She works... she works!! :smile:

    This was with running the 'smallptGPU' executable.
    Now getting 3,880K Samples/sec:
    [​IMG]


    GPU-z reporting workload distribution:
    First 1/2 of my 295 - 73%
    Second 1/2 of my 295 - 50%
    280 Dedicated PhysX - 98%
    Q6600 - 29%
    [​IMG]

    Looking good too...
    [​IMG]

    I will report back with my results using the various 'Size' bat files.

    UPDATE:
    smallptGPU - 3,880
    RUN_SCENE_CORNELL_32SIZE - 2,401
    RUN_SCENE_CORNELL_64SIZE - 3,880
    RUN_SCENE_CORNELL_128SIZE - 3,516
    RUN_SCENE_SIMPLE_64SIZE - 77,879.7
    [​IMG]

     
    #123 Talonman, Jan 10, 2010
    Last edited by a moderator: Jan 10, 2010
  4. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Running the Simple, workload distribution reported by GPU-z was:
    First 1/2 of my 295 - 35%
    Second 1/2 of my 295 - 15%
    280 Dedicated PhysX - 92%
    Q6600 - 30%
    [​IMG]

    FYI - No system reset problems of any kind. She is solid as a rock so far.

    You may also be interested in knowing that the SmallptGPU and the Simple Scene run fine against each other...
    GPU-z reporting workload distribution:
    First 1/2 of my 295 - 71%
    Second 1/2 of my 295 - 41%
    280 Dedicated PhysX - 96%
    Q6600 - 63%
    Getting 3602.0K Samples/sec on both instances.
    [​IMG]
    I was playing with the '1' key, that is the reason you see the 'Updating OpenCL Device workloads' so many times.
     
    #124 Talonman, Jan 10, 2010
    Last edited by a moderator: Jan 10, 2010
  5. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Wow, Talonman, thank you, I love the pictures with the 3 GPUs at work.

    Good if it is stable... now I'm a bit worried about the state of my power supply :wink: or maybe it is just a bug in the ATI Windows XP driver, because everything works fine under Linux.
     
  6. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Thanks...

    Look here for more BETA testers on the new version. (ATI users too)

    http://www.xtremesystems.org/forums/showthread.php?t=241904&page=3

    I think the GPU workload balancing could use a bit of an adjustment, but I love the progress you're making.

    Keep up the fine job. :wink:
     
    #126 Talonman, Jan 10, 2010
    Last edited by a moderator: Jan 10, 2010
  7. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    I made a binary of the 2.0alpha on OS X; it needed just a few CFLAGS really (-I/opt/local/include -L/opt/local/lib -lboost_thread-mt; this is for Boost 1.41 from MacPorts, which is probably the most widely used). There is also a minor typo on line 34 of displayfunc.cpp: __APPLE_ needs an extra underscore. So nothing big, works sweet.

    Looking at utilisation I reckon it's still leaving cycles on the table for now but great new development Dade :)

    Here's a shot of one of the few boxes where the CPU does a sizeable part of the work:

    [​IMG]
     
  8. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    unsafe didn't seem to make a difference, and the current release is actually slightly faster (~16K s/sec) w/o the other 2 options on OS X.. curiously
     
  9. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,969
    Likes Received:
    963
    Location:
    Torquay, UK
    Dade - can load balancing between GPU and CPU be manually adjusted after the application does its automatic balance? The problem I can see is that when someone like me likes to change GPU and CPU frequencies on the fly, the app will not work optimally.
    Also, for some reason it's not balancing ideally for clocks above 850MHz on my GPU (the CPU/GPU ratio stays roughly the same when it should be giving more work to the GPU). With manual adjustment I could give, for example, 96% to the GPU and only 4% to the CPU, which would probably give me the best performance in a 1GHz GPU - 3.5GHz QC CPU config.

    Hope this can be implemented!

    :grin:
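    A minimal sketch of what such a manual adjustment could look like: bump one device's share and rescale the others so the total stays at 100%. The function name adjust_share and the fractions-as-floats representation are assumptions for illustration, not how smallptGPU actually stores its workload split.

```python
def adjust_share(shares, index, delta):
    """Return new shares with shares[index] changed by delta.

    The other devices are rescaled proportionally so the total stays 1.0.
    `shares` is a list of fractions that sums to 1.0 (hypothetical layout).
    """
    if len(shares) < 2:
        raise ValueError("need at least two devices to rebalance")
    new_value = min(1.0, max(0.0, shares[index] + delta))
    rest_old = sum(shares) - shares[index]
    rest_new = 1.0 - new_value
    result = []
    for i, share in enumerate(shares):
        if i == index:
            result.append(new_value)
        elif rest_old > 0:
            result.append(share * rest_new / rest_old)
        else:
            # The other devices had nothing; split the remainder evenly.
            result.append(rest_new / (len(shares) - 1))
    return result


if __name__ == "__main__":
    # Automatic balance gave GPU 90% / CPU 10%; push the GPU up by 6 points.
    print(adjust_share([0.90, 0.10], 0, +0.06))
```

    Starting from a 90/10 split, one +6 point adjustment on the GPU lands on the 96% GPU / 4% CPU split mentioned above.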
     
  10. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,707
    Likes Received:
    345
    Location:
    The colonies
    Hyperthreading with ATI Stream SDK :lol:

    [​IMG]

    The i7 is almost exactly as fast as the GTX280 (ie proof positive that Nvidia's current OpenCL.dll is sh*t)
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Copy one of the batch files to make a new one called complex.bat and make it like so:

    smallptGPU.exe 1 1 64 640 480 scenes\complex.scn

    This is a new scene file. The number of spheres is quite meaty...

    Jawed
     
  12. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    It is just a matter of adding a few key bindings to hand-tune the % of workload assigned to each device. I'm going to add this feature :wink:

    You may have noticed that at the moment the workload % is decided after 10 secs of "Profiling" with the work evenly split among all devices. I guess that is too short a period, but increasing the profiling time would just be annoying, so hand tuning is probably the best solution.
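    To make the profiling idea concrete, here is a rough sketch that turns the amount of work each device finished during an evenly split profiling window into a proportional workload split. This is an assumption about the general approach, not smallptGPU's actual code; the function name and the dict layout are hypothetical.

```python
def workload_from_profile(samples_done, elapsed_sec):
    """Map device name -> workload fraction, proportional to measured speed.

    samples_done: dict of device name -> samples completed while profiling.
    elapsed_sec:  length of the profiling window in seconds.
    """
    if elapsed_sec <= 0:
        raise ValueError("elapsed_sec must be positive")
    rates = {dev: done / elapsed_sec for dev, done in samples_done.items()}
    total = sum(rates.values())
    if total <= 0:
        raise ValueError("no device completed any work")
    return {dev: rate / total for dev, rate in rates.items()}


if __name__ == "__main__":
    # Relative speeds 1.00 / 2.28 / 2.28 used as the measured work.
    shares = workload_from_profile(
        {"gpu0": 1.00, "gpu1": 2.28, "gpu2": 2.28}, elapsed_sec=10.0
    )
    for dev, share in shares.items():
        print(dev, round(100 * share, 1))
```

    With relative speeds of 1.00, 2.28 and 2.28 a purely proportional split comes out at roughly 18% / 41% / 41%. A short or noisy profiling window skews the measured speeds, which is the weakness hand tuning works around.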
     
  13. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Jawed, that scene is quite "terrible" given the current brute force intersection algorithm used in smallptGPU.

    My first idea was to keep smallptGPU very simple, but I guess it is worth adding a simple BVH to accelerate ray intersections so we can play with thousands of spheres instead of just 5 or 6 :wink:
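    For readers who have not met a BVH before, here is a small CPU-side sketch over spheres: a median split on the longest axis plus a slab test for the boxes. It is only an illustration of the general technique, not the BVH that ends up in smallptGPU, and every name in it is made up for the example. Directions are assumed to be unit length.

```python
import math


class Sphere:
    def __init__(self, center, radius):
        self.center, self.radius = center, radius

    def bounds(self):
        lo = tuple(c - self.radius for c in self.center)
        hi = tuple(c + self.radius for c in self.center)
        return lo, hi

    def hit(self, origin, direction):
        """Nearest positive hit distance for a unit-length direction, or None."""
        oc = tuple(o - c for o, c in zip(origin, self.center))
        b = sum(a * d for a, d in zip(oc, direction))
        c = sum(a * a for a in oc) - self.radius ** 2
        disc = b * b - c
        if disc < 0:
            return None
        root = math.sqrt(disc)
        for t in (-b - root, -b + root):
            if t > 1e-6:
                return t
        return None


class Node:
    def __init__(self, lo, hi, left=None, right=None, spheres=None):
        self.lo, self.hi = lo, hi
        self.left, self.right, self.spheres = left, right, spheres


def build(spheres, leaf_size=2):
    """Build a BVH from a non-empty list of spheres (leaf_size >= 1)."""
    boxes = [s.bounds() for s in spheres]
    lo = tuple(min(b[0][k] for b in boxes) for k in range(3))
    hi = tuple(max(b[1][k] for b in boxes) for k in range(3))
    if len(spheres) <= leaf_size:
        return Node(lo, hi, spheres=spheres)
    axis = max(range(3), key=lambda k: hi[k] - lo[k])  # longest box axis
    ordered = sorted(spheres, key=lambda s: s.center[axis])
    mid = len(ordered) // 2
    return Node(lo, hi, build(ordered[:mid], leaf_size),
                build(ordered[mid:], leaf_size))


def box_hit(lo, hi, origin, direction):
    """Slab test: does the ray (t >= 0) overlap the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for k in range(3):
        if abs(direction[k]) < 1e-12:
            if origin[k] < lo[k] or origin[k] > hi[k]:
                return False
            continue
        t0 = (lo[k] - origin[k]) / direction[k]
        t1 = (hi[k] - origin[k]) / direction[k]
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


def closest_hit(node, origin, direction):
    """Distance to the nearest sphere hit under `node`, or None on a miss."""
    if not box_hit(node.lo, node.hi, origin, direction):
        return None  # the whole subtree is skipped
    if node.spheres is not None:
        found = [s.hit(origin, direction) for s in node.spheres]
    else:
        found = [closest_hit(child, origin, direction)
                 for child in (node.left, node.right)]
    return min((t for t in found if t is not None), default=None)
```

    The gain over brute force comes from the early return in closest_hit: a ray that misses a node's box never tests any sphere beneath it.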
     
  14. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    For trivia: w1zzard, the programmer of GPU-z, is at Xtreme Systems and was asked what he was using to read the % of workload on each GPU. He posted "I'm using the official interface provided by nvidia in their nvapi."

    I don't know if you could also use it or not... :)

    It would be fun to see the actual workload % change on the screen, as we hit the keys.


    http://www.evga.com/forums/tm.aspx?m=91863&mpage=3
    Posted by Spongebob28:
    "I want to join in on the games.

    V2.0 Alpha

    Geforce 8800 GT Perf. index 1.00 Workload done 34.2%
    Geforce GTX 295 Perf. index 2.28 Workload done 21.8%
    Geforce GTX 295 Perf. index 2.28 Workload done 44.0%

    Rending time 0.625 sec pass 447 Sample/sec 1820.8K."
     
    #134 Talonman, Jan 10, 2010
    Last edited by a moderator: Jan 11, 2010
  15. CNCAddict

    Regular

    Joined:
    Aug 14, 2005
    Messages:
    290
    Likes Received:
    2
    I'm curious what the differences are between David's program and this one??

    http://code.google.com/p/tokaspt/

    He is claiming 185.6M 4-bounce rays per second. I would need about 13 HD5850s to get close to that with SmallptGPU :shock:
     
  16. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Samples are not rays. SmallptGPU traces 12 rays (2 for each path vertex, 6 path depth max.) to generate a sample. That means it is running at 228M rays/sec in the case of the 19,000K samples/sec of a 5870.

    You could look at the "spp" (i.e. Sample Per Sec) statistic there to try to do a direct comparison.
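    The conversion is simple enough to write down. A small sketch, assuming every path reaches the maximum depth of 6 with 2 rays per vertex (so 12 rays per sample is an upper bound); the function names are only for illustration. The 228M rays/sec figure corresponds to 19 million samples per second times 12.

```python
RAYS_PER_VERTEX = 2   # from the post: 2 rays for each path vertex
MAX_PATH_DEPTH = 6    # from the post: 6 path depth max.


def rays_per_sample():
    """Rays traced for one sample when the path reaches maximum depth."""
    return RAYS_PER_VERTEX * MAX_PATH_DEPTH


def samples_to_rays(samples_per_sec):
    return samples_per_sec * rays_per_sample()


def rays_to_samples(rays_per_sec):
    return rays_per_sec / rays_per_sample()


if __name__ == "__main__":
    print(samples_to_rays(19_000_000))   # 228000000
    print(rays_to_samples(185.6e6))      # about 15.5 million samples/sec
```

    Going the other way, 185.6M rays/sec would be about 15.5M samples/sec at 12 rays per sample, although the two programs may not count rays the same way.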
     
  17. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
  18. CNCAddict

    Regular

    Joined:
    Aug 14, 2005
    Messages:
    290
    Likes Received:
    2
    I'm a 100% noob and was not sure if he was using some sort of fancy code to speed things up. Thanks for the clarification!!
     
  19. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    422
    Likes Received:
    16
    the GPU load of GPU-Z 0.38 is "0%" with this version.

    My GPU is NVIDIA GeForce 9600GT 512MB (Windows 7 x64, FW 195.62) .
     
  20. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Outstanding job again Dave!!

    By manually adjusting the workload, I now can get 5173.9K Samples/sec. (A new record on my system.)
    First 1/2 of my 295 - 96%
    Second 1/2 of my 295 - 97%
    280 Dedicated PhysX - 96%
    Q6600 - 30% (Exactly the same as the first 2.0Alpha, without the manual GPU workload distribution feature.)

    Apparently balancing the GPU's workload does NOT change the CPU's utilization at all...
    [​IMG]

    Note that I now feel there is no need for GPU-z anymore, I simply need to get all 3 GPU's to read as close to 33.3% as I can, using your program.
    The displayed workload distribution % in your program is accurate. :)

    The Simple Scene also has the manual adjustment feature, and it also is working well:
    [​IMG]
     
    #140 Talonman, Jan 12, 2010
    Last edited by a moderator: Jan 12, 2010