GPU Ray-tracing for OpenCL

Discussion in 'Rendering Technology and APIs' started by fellix, Dec 27, 2009.

  1. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    I was using an int[2] array to pass around the two seeds used by the random number generator. I have uploaded a new version, available at http://davibu.interfree.it/opencl/smallptgpu/smallptgpu-v1.5beta2.tgz

    I modified the OpenCL kernel to pass around two separate arguments instead of one array of two elements. Let's see whether this was the source of all the problems on NVIDIA :?:
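To illustrate the change described above, here is a minimal Python sketch of a two-seed generator exposed through both interfaces: seeds packed in a 2-element array, and seeds passed as two separate values. The thread does not show the kernel's generator, so the multiply-with-carry constants, the function names, and the float conversion below are assumptions chosen for illustration only.

```python
import struct

MASK32 = 0xFFFFFFFF  # emulate 32-bit unsigned arithmetic


def next_random(seed0, seed1):
    """Advance two 32-bit seeds; return (value in [0, 1), new seed0, new seed1)."""
    # Assumed generator: two 16-bit multiply-with-carry streams.
    seed0 = (36969 * (seed0 & 0xFFFF) + (seed0 >> 16)) & MASK32
    seed1 = (18000 * (seed1 & 0xFFFF) + (seed1 >> 16)) & MASK32
    bits = ((seed0 << 16) + seed1) & MASK32
    # Keep 23 mantissa bits and force the exponent so the float lies in [2, 4).
    raw = (bits & 0x007FFFFF) | 0x40000000
    as_float = struct.unpack("<f", struct.pack("<I", raw))[0]
    return (as_float - 2.0) / 2.0, seed0, seed1


def next_random_from_array(seeds):
    """Same generator, with the seeds packed in a 2-element list updated in place."""
    value, seeds[0], seeds[1] = next_random(seeds[0], seeds[1])
    return value
```

Both forms produce the same sequence; only the way the state travels between functions differs, which is the part that was changed in the kernel.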
     
  2. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,430
    Likes Received:
    309
    Location:
    Varna, Bulgaria
    Is there any way for the application to engage more than one OCL device, for multi-GPU systems?

    Also, some mean value for the sample count would be welcome. That way the app will be a bit more benchmark friendly. ;)
     
  3. Florin

    Florin Merrily dodgy
    Veteran

    Joined:
    Aug 27, 2003
    Messages:
    1,601
    Likes Received:
    133
    Location:
    The colonies
    Sorry Dade, this one doesn't produce output for me:

    D:\Install\SmallptGPU-v1.5beta2>RUN_SCENE_CORNELL_64SIZE.bat

    D:\Install\SmallptGPU-v1.5beta2>smallptGPU.exe 1 64 rendering_kernel.cl 640 480 scenes\cornell.scn
    Usage: smallptGPU.exe
    Usage: smallptGPU.exe <use CPU/GPU device (0=CPU or 1=GPU)> <workgroup size (0=default value or anything > 0 and power of 2)> <kernel file name> <window width> <window height> <scene file>
    Reading scene: scenes\cornell.scn
    Scene size: 9
    OpenCL Platform 0: NVIDIA Corporation
    OpenCL Device 0: Type = TYPE_GPU
    OpenCL Device 0: Name = GeForce GTX 280
    OpenCL Device 0: Compute units = 30
    OpenCL Device 0: Max. work group size = 512
    Reading file 'rendering_kernel.cl' (size 3228 bytes)
    OpenCL Device 0: kernel work group size = 384
    OpenCL Device 0: forced kernel work group size = 64
    Failed to wait the end of OpenCL execution: -5
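For reference, the -5 in the last line of the log is the value the OpenCL headers give to CL_OUT_OF_RESOURCES. Below is a small Python sketch of a lookup for the first few runtime error codes; the numeric values are the ones defined in cl.h, while the helper name is made up for this example. NVIDIA drivers tend to report this code for kernel execution failures in general, so it does not pinpoint the cause on its own.

```python
# First runtime error codes from the OpenCL headers (cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
}


def describe_cl_error(code):
    """Return the symbolic name for an OpenCL status code (hypothetical helper)."""
    return OPENCL_ERRORS.get(code, "unknown OpenCL error %d" % code)


print(describe_cl_error(-5))
```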
     
  4. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    I doubt you can imagine how shocking it is for me to see a figure like 565000k/s for an unbiased rendering :grin:

    Just to give you an idea, a very simple scene usually runs at 500k/s. Complex scenes usually run at 20-30k/s on a quad-core (check http://www.luxrender.net/forum/gallery2.php for some examples of the scenes I'm talking about).

    I can reach 100-150k/s on a network of 6 quad-cores ... 565000k/s should be forbidden by the laws of physics :wink:

    P.S. Samples from SmallptGPU and LuxRender are not exactly the same thing, but let's dream a bit.
     
  5. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,430
    Likes Received:
    309
    Location:
    Varna, Bulgaria
    There is an IQ problem with the latest beta -- visible aliasing on some edges and intersections:

    [​IMG]
     
  6. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
  7. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Yes, there is a very good support for handling multiple devices (both CPUs and GPUs). It is the next thing I'm going to explore.
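The thread does not say how the work will be divided once several devices are in use. One common approach, sketched here in Python purely as an assumption, is to give each device a band of image rows proportional to its measured throughput. The function name and the throughput figures in the usage line are hypothetical.

```python
def split_rows(height, throughputs):
    """Split `height` image rows into one (start, end) band per device.

    Each band's size is proportional to that device's throughput.
    """
    cumulative = []
    acc = 0.0
    for t in throughputs:
        acc += t
        cumulative.append(acc)
    # Convert cumulative shares into row boundaries; the last one is `height`.
    bounds = [0] + [round(height * c / acc) for c in cumulative]
    return [(bounds[i], bounds[i + 1]) for i in range(len(throughputs))]


# Hypothetical example: a GPU measured three times faster than the CPU.
print(split_rows(480, [3, 1]))
```

With a 480-row image, the faster device gets rows 0 to 359 and the slower one rows 360 to 479. Re-measuring throughput every few passes would let the split adapt, at the cost of extra bookkeeping.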
     
  8. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,430
    Likes Received:
    309
    Location:
    Varna, Bulgaria
  9. Florin

    Florin Merrily dodgy
    Veteran

    Joined:
    Aug 27, 2003
    Messages:
    1,601
    Likes Received:
    133
    Location:
    The colonies
    Heh well it'd be cooler if it were actually rendering anything, but just for kicks then:

    [​IMG]

    No overclocking or anything.
     
    #69 Florin, Jan 3, 2010
    Last edited by a moderator: Jan 3, 2010
  10. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Just for fun, this was looking up in the air, running BETA 2.
    I just wanted to see how high my numbers would go. :)

    GPU was set to the same speed as posted above...
    [​IMG]
     
  11. Florin

    Florin Merrily dodgy
    Veteran

    Joined:
    Aug 27, 2003
    Messages:
    1,601
    Likes Received:
    133
    Location:
    The colonies
  12. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
  13. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Thanks guys, 2900K/s isn't a bad result. It is still a bit far from the 5400K/s of my 4870 under Windows. I assumed one GPU of the GTX295 and the HD4870 would run at about the same speed.

    Maybe ATI's OpenCL driver/hardware is just better for this particular kernel.

    Overall, it is quite impressive how good the performance of the new generation (e.g. ATI HD5870) is compared to the old one (e.g. ATI HD 48xx, NVIDIA GTX 28x/29x). Now we only have to wait for Fermi to get the complete picture :wink:
     
  14. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,430
    Likes Received:
    309
    Location:
    Varna, Bulgaria
    Run one of the BAT files, the one with 64SIZE.
     
  15. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,853
    Likes Received:
    722
    Location:
    London
    OK, so the next thing to do to make David boggle is to run multiple SmallLuxGPU programs. Because the GPU is only active for a fraction of the time taken to render each pass, it should be possible to load up all four cores of a quad-core processor by running four different instances of the program: BIGMONKEY, LOFT, LUXBALL and SPONZA :razz:

    Use affinity to keep each one pinned to a single core.

    Then take the average rays/second from each and add them all up for a grand total :lol:

    Jawed
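The bookkeeping for the experiment above can be sketched in a few lines of Python. This only computes one single-core affinity mask per scene and adds up the per-instance averages; it does not launch anything. The function names are made up for this sketch, and the masks would still have to be applied by hand, for example through Task Manager or `start /affinity` on Windows.

```python
def affinity_mask(core_index):
    """Bit mask selecting a single logical core (core 0 -> 0x1, core 3 -> 0x8)."""
    return 1 << core_index


def grand_total(avg_rays_per_sec):
    """Add up the average rays/second reported by each instance."""
    return sum(avg_rays_per_sec.values())


scenes = ["BIGMONKEY", "LOFT", "LUXBALL", "SPONZA"]
masks = {scene: affinity_mask(i) for i, scene in enumerate(scenes)}
for scene, mask in masks.items():
    print("%s -> affinity mask 0x%X" % (scene, mask))
```

Once the four instances have settled, passing a dict of their reported averages to `grand_total` gives the combined figure.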
     
  16. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Sure...

    As requested. :wink:

    [​IMG]
     
  17. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Just some numbers from our other thread you might like to see... :wink:
    http://www.xtremesystems.org/forums/showthread.php?t=241904&page=2

    I believe this is what we are looking at so far...
    freeloader ---------- 5850 ------ Sample/sec -- 17,298.6K v1.5 (GPU=1007, M=1152)

    freeloader ---------- 5850 ------ Sample/sec -- 13,719.6K v1.4 (GPU=1007, M=1152)
    Toysoldier ---------- 5870 ------ Sample/sec -- 13,719.6K v1.4 (GPU=875, M=1300)
    fellix bg ------------- 5870 ------ Sample/sec -- 13,719.6K v1.4 (GPU=900, M=1250)

    safan80 ------------- 5970 ------ Sample/sec -- 11,012.8K v1.4 (Unknown)

    SocketMan --------- 5770 ------ Sample/sec --- 7,535.1K v1.4 (GPU=950, M=1200)
    mattkosem --------- 4890 ------ Sample/sec --- 7,520.9K v1.4 (GPU=1056, M=1000)
    BeepBeep2 --------- 4850 ------ Sample/sec --- 7,172.0K v1.5 (GPU=800, M=2250)

    Mechromancer ----- 4870 ------ Sample/sec --- 6,955.5K v1.5 (GPU=790, M=900)
    PyrO ----------- 1/2 a 4870X2 -- Sample/sec --- 6,955.5K v1.5 (GPU=790, M=915)
    redrumy3 ---------- 4870 ------- Sample/sec --- 6,375.8K v1.4 (GPU=875, M=1100)

    PyrO ----------- 1/2 a 4870X2 -- Sample/sec --- 5,796.2K v1.4 (GPU=790, M=915)
    NovoRei ------------ 4870 ------ Sample/sec --- 5,616.1K v1.4 (512mb, 790mhz)

    Talonman -------- 1/2 a 295 ---- Sample/sec --- 2,898.1K v1.5 (C=621, SH=1512, M=1152)
    Chumbucket843 -- GTX 260 ---- Sample/sec --- 2,068.7K v1.5 (C=602, SH=1369, M=1159)

    Talonman -------- 1/2 a 295 ---- Sample/sec --- 1,159.2K v1.4 (C=621, SH=1512, M=1152)
    Chumbucket843 -- GTX 260 ---- Sample/sec --- 1,123.2K v1.4 (C=602, SH=1369, M=1159)
    DosDuoNo -------- GTX 260 ----- Sample/sec --- 1,093.2K v1.4 (C=655, SH=1125, M=1125)
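Four of the entries above appear with both v1.4 and v1.5 at identical clocks, so the gain from the new version can be computed directly. A short Python sketch using only those figures from the list (the dictionary layout is just for this example):

```python
# (v1.4, v1.5) results in K samples/sec, copied from the list above.
results_k = {
    "freeloader (5850)": (13719.6, 17298.6),
    "PyrO (1/2 a 4870X2)": (5796.2, 6955.5),
    "Talonman (1/2 a 295)": (1159.2, 2898.1),
    "Chumbucket843 (GTX 260)": (1123.2, 2068.7),
}

speedup = {name: v15 / v14 for name, (v14, v15) in results_k.items()}
for name, factor in speedup.items():
    print("%s: %.2fx" % (name, factor))
```

The two ATI entries gain roughly 1.2x to 1.3x, while the two NVIDIA entries gain roughly 1.8x to 2.5x.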
     
  18. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    I have already started to work on SmallptGPU 2.0 in order to test/learn how the OpenCL support for multiple devices works (i.e. CPU + GPU, GPU + GPU, etc.) :wink:

    P.S. thanks Talonman, a lot of very interesting information, 17,298.6K on a 5850 ?!? What the hell ...


     
    #78 Dade, Jan 4, 2010
    Last edited by a moderator: Jan 4, 2010
  19. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,430
    Likes Received:
    309
    Location:
    Varna, Bulgaria
    That's cool!
    You'll find enough devoted beta testers here, I hope. ;)
     
  20. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,853
    Likes Received:
    722
    Location:
    London
    Fermi has cached read/write of spilled registers, so it should be much better, if spilling is the problem on NVidia. Need to know how many registers are being allocated or whether spillage is occurring.

    Jawed
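A rough bound on the register count can be read off the numbers already posted in this thread: on the GTX 280 the driver reported a kernel work group size of 384 against a device maximum of 512. The Python sketch below assumes that registers alone cause that limit, that a GT200-class multiprocessor has 16384 registers, and that allocation granularity can be ignored. All three are assumptions, so treat the result as an estimate. It says nothing about whether spilling occurs.

```python
# Assumption: register file size per multiprocessor on GT200-class GPUs.
REGISTERS_PER_MULTIPROCESSOR = 16384


def max_registers_per_work_item(work_group_size,
                                register_file=REGISTERS_PER_MULTIPROCESSOR):
    """Largest per-work-item register count that still fits one work group."""
    return register_file // work_group_size


fits_at_384 = max_registers_per_work_item(384)   # kernel reported as fitting
fits_at_512 = max_registers_per_work_item(512)   # device maximum, not reached
print("estimated registers per work item: more than %d, at most %d"
      % (fits_at_512, fits_at_384))
```

Under these assumptions the kernel would be using somewhere between 33 and 42 registers per work item.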
     
