GPU Ray-tracing for OpenCL

MrGaribaldi · Mar 28, 2010

Are there any released in the wild yet? I thought they wouldn't be available until 12th of April.
But as soon as I can get access to one, I'll try to get some results. Have great hopes for the results!

Dade · Mar 28, 2010

Lightman said:
Anyone with GTX480 willing to join the party?

Not yet. However, if you want to see some big numbers, take a look at the screenshots posted by KyungSoo in the LuxRender forum dedicated to GPU acceleration.

8 GPUs (!) at work:

4 Tesla at work:

I'm looking forward to the first test with Fermi too

cho · Mar 29, 2010

GTX 480:

GTX 285:

HD 5870:

CNCAddict · Mar 29, 2010

CAN YOU SAY WOOOHOOOO. Looks like I may trade in my 5850 after all.

rpg.314 · Mar 29, 2010

cho said:
GTX 480:

GTX 285:

Holy cow...

Is that a ~20x jump I am seeing there? With just a tiny L1. I am assuming you used 48K for L1 cache.

Come on Dave, give us some cachey goodness on radeon 6xx0.

fellix · Mar 29, 2010

cho said:
GTX 480:

DAMN!

rpg.314 said:
Is that a ~20x jump I am seeing there? With just a tiny L1. I am assuming you used 48K for L1 cache.

The default LDS/L1 partitioning for GF100 (as of now) is 48/16 KB.

Dade · Mar 29, 2010

Omg

"Old" NVIDIA cards have always shown some problems with SmallptGPU (I wouldn't focus too much on the speedup compared with the GTX 285), but running more than 2 times faster than a 5870 is eye-popping.

Cho, any chance you could run the latest SmallLuxGPU (http://davibu.interfree.it/opencl/smallluxgpu/slg-v1.4beta3.tgz)?

Psycho · Mar 29, 2010

Ehm.. how can the 5870 do more passes in the same time (and show a *very* similar image that, if anything, is slightly better, as the number of passes indicates), but get a much lower samples/sec count? Looks like it's doing the same or more work in the same time.

Jawed · Mar 29, 2010

That's very tasty. Some nice combination of dynamic branching and cache I suppose.

A much more stressful test:

http://forum.beyond3d.com/showpost.php?p=1385754&postcount=222

Jawed

cho · Mar 29, 2010

GTX 480:

GTX 285:

HD 5870

Jawed · Mar 29, 2010

Psycho said:
Ehm.. how can the 5870 do more passes in the same time (and show a *very* similar image that, if anything, is slightly better, as the number of passes indicates), but get a much lower samples/sec count? Looks like it's doing the same or more work in the same time.

The time shown is the time between screen updates. The application varies workload per invocation of the OpenCL kernel in order to produce a consistent 0.5s update interval.
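To illustrate the idea, here is a minimal sketch of that kind of feedback loop. It is not taken from the SmallptGPU source: the function names, the starting sample count and the device rates are all hypothetical, and the kernel time is simulated rather than measured.

```python
def next_workload(current_samples, elapsed_s, target_s=0.5, min_samples=1):
    """Scale the per-invocation sample count so the next call takes about target_s."""
    if elapsed_s <= 0:
        # No usable timing yet: just try a bigger batch.
        return current_samples * 2
    scaled = round(current_samples * target_s / elapsed_s)
    return max(min_samples, scaled)


def simulate(device_rate, steps=5, start=1000, target_s=0.5):
    """Simulate a device that processes device_rate samples per second (hypothetical)."""
    samples = start
    history = []
    for _ in range(steps):
        elapsed = samples / device_rate  # stand-in for a measured kernel time
        history.append((samples, elapsed))
        samples = next_workload(samples, elapsed, target_s)
    return history


if __name__ == "__main__":
    for name, rate in (("faster device", 8_000_000), ("slower device", 4_000_000)):
        samples, elapsed = simulate(rate)[-1]
        print(f"{name}: {samples} samples per call, {elapsed:.2f}s per update")
```

In this sketch both devices settle at the same update interval, but the faster one does it with a larger batch per call. That is why the displayed time alone says nothing about throughput; the samples/sec figure is the one to compare.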

Jawed

jj99 · Mar 29, 2010

Very good result for the GTX 480 in SmallptGPU, but performance in SmallLuxGPU is rather disappointing...

Dade · Mar 29, 2010

Thanks, Cho. However, you need to tune the configuration a bit for your hardware and for still rendering (instead of preview). You had only a 50% load on the 480.

You should edit the scenes/luxball/render-fast.cfg file and replace its contents with:

image.width = 640
image.height = 480
batch.halttime = 0
scene.file = scenes/luxball/luxball.scn
scene.fieldofview = 45
opencl.latency.mode = 0
opencl.nativethread.count = 0
opencl.cpu.use = 0
opencl.gpu.use = 1
opencl.platform.index = 0
opencl.renderthread.count = 4
opencl.gpu.workgroup.size = 64
screen.refresh.interval = 2000
screen.type = 3
screen.gamma = 2.2
path.maxdepth = 6
path.russianroulette.depth = 5
path.russianroulette.prob = 0.75
path.shadowrays = 1

With this configuration, rendering uses only the GPU, 4 threads feed the GPU (I assume you have a quad core), and preview mode is disabled.
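A quick way to double-check an edited file before launching is to parse it and print the settings that matter. The following is only a sketch: it assumes the file is plain "key = value" lines as shown above, it treats lines starting with "#" as comments (an assumption), and the helper names are made up. The mapping from keys to "GPU only" is inferred from the key names, not from SmallLuxGPU's documentation.

```python
SAMPLE_CFG = """\
opencl.nativethread.count = 0
opencl.cpu.use = 0
opencl.gpu.use = 1
opencl.renderthread.count = 4
screen.refresh.interval = 2000
"""


def parse_cfg(text):
    """Parse 'key = value' lines into a dict of strings."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # assumed comment syntax
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def summarize(settings):
    """Report whether only the GPU is enabled and how many threads feed it."""
    return {
        "gpu_only": settings.get("opencl.gpu.use") == "1"
        and settings.get("opencl.cpu.use") == "0",
        "render_threads": int(settings.get("opencl.renderthread.count", "0")),
    }


if __name__ == "__main__":
    print(summarize(parse_cfg(SAMPLE_CFG)))
```

To check a real file, read scenes/luxball/render-fast.cfg and pass its text to parse_cfg instead of SAMPLE_CFG.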

For reference, this is the result of my i7 860+5870+5850:

Indeed, tuning the configuration is very important.

Dade · Mar 29, 2010

jj99 said:
Very good result of GTX 480 for smallptGPU, but performance in smallluxGPU is rather disappointing...

I think Cho just needs a bit of tuning for SmallLuxGPU. However, keep in mind that SmallptGPU uses a very small dataset (i.e. a few bytes), while SmallLuxGPU uses a dataset of several MBs.

Maybe the Fermi cache shines in the first case while it is nearly useless in the second.

jj99 · Mar 29, 2010

Thanks, Dade, I understand that. I was wondering how much Fermi's cache will help in a real-world scenario like SmallLuxGPU. I'll wait to see Cho's updated results.

Lightman · Mar 29, 2010

Indeed, a very good showing from the GTX 480 in SmallptGPU.

It's all finally starting to go in the right direction with GPGPU. I can only hope AMD and NVIDIA keep up this rate of development for another 3-5 years and real-time RT will be conquered!

cho · Mar 29, 2010

I am using an i7-920 with HT enabled.

The thread count is set to 16. The GPU load is about 67~78%.

fellix · Mar 29, 2010

cho said:
The thread count is set to 16. The GPU load is about 67~78%.

Wow -- 93°C for just 78% load?

Anyway, here is my HD5870 @ 900MHz GPU:

This is with 8 threads on a Q9450. Four weren't enough to saturate it and gave me lower sample rates.

cho · Mar 29, 2010

Yes, but the fan noise is OK at this speed.

Dade · Mar 29, 2010

cho said:
I am using an i7-920 with HT enabled.

The thread count is set to 16. The GPU load is about 67~78%.

Thanks Cho, the correct value for the thread count should be 8 (4 real cores + 4 virtual for HT).
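As a rough sketch of that rule of thumb (one render thread per logical processor), something like the following would give a starting value. The function name is hypothetical, and this is only a starting point: as fellix's Q9450 result above shows, a different count can work better on some setups.

```python
import os


def suggested_render_threads(physical_cores=None, hyperthreading=False):
    """Return one thread per logical processor (rule of thumb, not a guarantee)."""
    if physical_cores is None:
        # os.cpu_count() reports logical CPUs and may return None.
        return os.cpu_count() or 1
    return physical_cores * (2 if hyperthreading else 1)


if __name__ == "__main__":
    # A quad core with HT enabled: 4 real cores + 4 virtual.
    print(suggested_render_threads(4, hyperthreading=True))
```

The result would go into opencl.renderthread.count; after that, try a few neighbouring values and keep whichever gives the best sample rate.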

Anyway, the result seems to confirm that the 480 is about 2 times faster than the 5870 on GPGPU tasks (about 8M rays/sec vs. about 4M rays/sec).
