GPU Ray-tracing for OpenCL

Discussion in 'Rendering Technology and APIs' started by fellix, Dec 27, 2009.

  1. MrGaribaldi

    Regular

    Joined:
    Nov 23, 2002
    Messages:
    611
    Likes Received:
    0
    Location:
    In transit
    Are there any released in the wild yet? I thought they wouldn't be available until 12th of April.
    But as soon as I can get access to one, I'll try to get some results. Have great hopes for the results!
     
  2. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Not yet. However, if you want to see some big numbers, check the screenshots posted by KyungSoo in the LuxRender forum dedicated to GPU acceleration.

    8 GPUs (!) at work:

    [​IMG]

    4 Tesla at work:

    [​IMG]

    I'm looking forward to the first test with Fermi too :wink:
     
  3. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    GTX 480:
    [​IMG]

    GTX 285:
    [​IMG]

    HD 5870:
    [​IMG]
     
  4. CNCAddict

    Regular

    Joined:
    Aug 14, 2005
    Messages:
    290
    Likes Received:
    2
    CAN YOU SAY WOOOHOOOO. Looks like I may trade in my 5850 after all :shock:
     
  5. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Holy cow... :shock:

    Is that a ~20x jump I am seeing there? With just a tiny L1. I am assuming you used 48K for L1 cache.

    Come on Dave, give us some cachey goodness on radeon 6xx0.
     
  6. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    DAMN!
    The current default LDS/L1 partitioning for GF100 is 48/16KB.
     
  7. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Omg :!:

    "Old" NVIDIA cards have always shown some problems with SmallptGPU (I wouldn't focus too much on the speedup compared with the GTX 285), but running more than 2 times faster than a 5870 is eye-popping :shock:

    Cho, any chance you could run one of the latest SmallLuxGPU builds (http://davibu.interfree.it/opencl/smallluxgpu/slg-v1.4beta3.tgz)?
     
  8. Psycho

    Regular

    Joined:
    Jun 7, 2008
    Messages:
    745
    Likes Received:
    39
    Location:
    Copenhagen
    Ehm... how can the 5870 do more passes in the same time (and show a *very* similar image that, if anything, is slightly better, as the number of passes indicates) but get a much lower samples/sec count? Looks like it's doing the same or more work in the same time.
     
  9. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
  10. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    GTX 480:
    [​IMG]

    GTX 285:
    [​IMG]

    HD 5870:
    [​IMG]
     
  11. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    The time shown is the time between screen updates. The application varies workload per invocation of the OpenCL kernel in order to produce a consistent 0.5s update interval.

    Jawed
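
A rough sketch of the kind of feedback loop described above, not SmallptGPU's actual code: after each screen update, the work submitted per kernel invocation is rescaled so that the next update lands near the 0.5 s target. The function name, the clamping limits, and the simple proportional rule are all assumptions made for illustration.

```python
def next_workload(current, elapsed_s, target_s=0.5,
                  min_items=1, max_items=1 << 24):
    """Rescale the per-invocation work so the next update takes about target_s.

    current   -- work items (e.g. samples) submitted in the last invocation
    elapsed_s -- measured time of the last screen update, in seconds
    """
    if elapsed_s <= 0:
        # No usable measurement: keep the previous workload.
        return current
    scaled = int(current * target_s / elapsed_s)
    # Clamp so one bad measurement cannot stall or flood the device.
    return max(min_items, min(max_items, scaled))


print(next_workload(1000, 1.0))   # update took too long: halve the work
print(next_workload(1000, 0.25))  # update finished early: double the work
```

Under a scheme like this the displayed time stays near 0.5 s on every card, so the samples/sec figure is the number to compare, not the time.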
     
  12. jj99

    Newcomer

    Joined:
    Mar 29, 2010
    Messages:
    4
    Likes Received:
    0
    Very good result for the GTX 480 in SmallptGPU, but performance in SmallLuxGPU is rather disappointing...
     
  13. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Thanks, Cho. However, you need to tune the configuration a bit for your hardware and for still rendering (instead of preview). You had only a 50% load on the 480.

    You should edit the scenes/luxball/render-fast.cfg file and replace the content with:

    image.width = 640
    image.height = 480
    batch.halttime = 0
    scene.file = scenes/luxball/luxball.scn
    scene.fieldofview = 45
    opencl.latency.mode = 0
    opencl.nativethread.count = 0
    opencl.cpu.use = 0
    opencl.gpu.use = 1
    opencl.platform.index = 0
    opencl.renderthread.count = 4
    opencl.gpu.workgroup.size = 64
    screen.refresh.interval = 2000
    screen.type = 3
    screen.gamma = 2.2
    path.maxdepth = 6
    path.russianroulette.depth = 5
    path.russianroulette.prob = 0.75
    path.shadowrays = 1

    With this configuration, rendering uses only the GPU, 4 threads feed the GPU (I assume you have a quad core), and preview mode is disabled.
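
For anyone who would rather script these edits than retype the file, here is a minimal sketch that reads `key = value` lines like the ones above, changes one setting, and writes them back out. The helper names are hypothetical, and treating `#` lines as comments is an assumption; this is not part of SmallLuxGPU.

```python
def parse_cfg(text):
    """Parse 'key = value' lines into a dict of strings (order preserved)."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):  # assumption: '#' starts a comment
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("not a key = value line: %r" % raw)
        settings[key.strip()] = value.strip()
    return settings


def format_cfg(settings):
    """Serialize the dict back to 'key = value' lines."""
    return "".join("%s = %s\n" % (k, v) for k, v in settings.items())


SAMPLE = """\
opencl.cpu.use = 0
opencl.gpu.use = 1
opencl.renderthread.count = 4
"""

cfg = parse_cfg(SAMPLE)
cfg["opencl.renderthread.count"] = "8"  # e.g. a quad core with HT
print(format_cfg(cfg), end="")
```

Values are kept as strings on purpose, so the file is written back exactly as typed apart from the setting that was changed.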

    For reference, this is the result of my i7 860+5870+5850:

    [​IMG]

    Indeed, tuning the configuration is very important.
     
  14. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    I think Cho just needs a bit of tuning for SmallLuxGPU. However, keep in mind that SmallptGPU uses a very small dataset (i.e. a few bytes), while SmallLuxGPU uses a dataset of several MBs.

    Maybe the size of the Fermi cache lets it shine in the first case, while it is nearly useless in the second.
     
  15. jj99

    Newcomer

    Joined:
    Mar 29, 2010
    Messages:
    4
    Likes Received:
    0
    Thanks, Dade, I understand that. I was wondering how Fermi's cache would help in a real-world scenario such as SmallLuxGPU. I'll wait to see Cho's updated results.
     
  16. Lightman

    Veteran Subscriber

    Joined:
    Jun 9, 2008
    Messages:
    1,804
    Likes Received:
    475
    Location:
    Torquay, UK
    Indeed very good showing from GTX480 in SmallPT :shock:

    It is all finally starting to go in the right direction with GPGPU. I can only hope AMD and nVidia keep up this rate of development for another 3-5 years and real-time RT will be conquered!
     
  17. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    [​IMG]

    I am using an i7-920 with HT enabled.

    The thread count is set to 16. The GPU load is about 67~78%.
     
  18. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
    Wow -- 93°C for just 78% load?

    Anyway, here is my HD5870 @ 900MHz GPU:

    [​IMG]

    This is with 8 threads on Q9450. Four wouldn't saturate it enough, giving me lower sample rates.
     
  19. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    Yes, but the fan noise is ok at this speed.
     
  20. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Thanks Cho, the correct value for the thread count should be 8 (4 real cores + 4 virtual for HT).

    Anyway, the result seems to confirm that the 480 is about 2 times faster than the 5870 on GPGPU tasks (about 8M rays/sec vs. about 4M rays/sec).
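
The thread-count advice in this post (one render thread per logical CPU, so 8 for a quad core with HT) can be sketched as below. The function name and the `oversubscribe` factor are hypothetical; the factor is only there to express the earlier report in this thread of 8 threads working better than 4 on a Q9450.

```python
import os


def render_thread_count(logical_cpus=None, oversubscribe=1):
    """Suggest a value for opencl.renderthread.count."""
    if logical_cpus is None:
        # os.cpu_count() counts logical CPUs, so HT siblings are included.
        logical_cpus = os.cpu_count() or 1
    if oversubscribe < 1:
        raise ValueError("oversubscribe must be >= 1")
    return logical_cpus * oversubscribe


print(render_thread_count(8))     # quad core with HT: 4 real + 4 virtual
print(render_thread_count(4, 2))  # 4 logical CPUs, 2 threads per CPU
print(render_thread_count())      # whatever this machine reports
```

Treat the result as a starting point and check the GPU load afterwards, since the best value in this thread differed from machine to machine.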
     

  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.