GPU Ray-tracing for OpenCL

Dade: With SmallLuxGPU 1.3 and 1.4 beta 3 I get this error:
http://www.abload.de/img/errorp4oy.png
Older versions and SmallPTGPU work fine.

Win 7 x64, Cat. 8.712.3.1 (OpenGL 4.0&3.3 Preview Driver), Stream SDK 2.01, HD4850.

ATI OpenCL SDK 2.01 has a known problem with the HD48xx family. According to a post in the ATI forums, it will be fixed in the next SDK release.

However, for the moment, the only solution is to downgrade to SDK 2.0 :???:
 
Is that a ~20x jump I am seeing there?

I don't think it's 20x. Mintmaster found some gross problems with the code two pages back which, once fixed, increased perf on the 285 by 30x. What we're seeing could simply be the Fermi compiler automatically taking care of those.
 
Does anyone know if SmallLuxGPU can use both chips in a 5970? I think there is some problem, and the program is compiled only for the first device. The second one renders black.
 
I don't think it's 20x. Mintmaster found some gross problems with the code two pages back which, once fixed, increased perf on the 285 by 30x. What we're seeing could simply be the Fermi compiler automatically taking care of those.
Yeah, Fermi is still slower than my 8800 GTS on CUDA :cool:

Can anyone with Fermi try my CUDA code from a few pages back? I think it will do around 1.5 GRays per second. www.its.caltech.edu/~nandra/SmallptGPU.zip

If I find some free time, I'll try to make a DirectCompute port. Seems like ATI and NVidia are more focussed on that than OpenCL.
 
Hasn't 2.0 expired?

I think only the beta version did. The final release doesn't; I know people who are using it right now (because of the problems with HD48xx).

Does anyone know if SmallLuxGPU can use both chips in a 5970? I think there is some problem, and the program is compiled only for the first device. The second one renders black.

It is a problem with the CrossFire configuration. I have a 5870 and a 5850; if I connect them I get the same result you are describing (and the 5850 is erroneously recognized as a 5870: 20 compute units). Everything works fine when the CrossFire cable is not used.

It is yet another problem with the ATI OpenCL driver; it has been reported a couple of times on their forum. It is another problem that is supposed to be fixed in the next release :???:
 
The performance is not stable: about 0.92 to 1.20 GRays/s.

 
If I find some free time, I'll try to make a DirectCompute port. Seems like ATI and NVidia are more focussed on that than OpenCL.

To my understanding (please correct me if I'm wrong), the compiler in DirectCompute is provided by Microsoft. That is, it compiles HLSL into an intermediate assembly-like language (probably similar to how vertex and pixel shaders work), and then the driver compiles that assembly into hardware binary code. Therefore, the compiler quality is more consistent across vendors (not perfect, but consistent).

In the case of OpenCL, although the compilers are all based on LLVM (I heard from a friend that Apple requires this), they still vary in quality.
 
To my understanding (please correct me if I'm wrong), the compiler in DirectCompute is provided by Microsoft. That is, it compiles HLSL into an intermediate assembly-like language (probably similar to how vertex and pixel shaders work), and then the driver compiles that assembly into hardware binary code. Therefore, the compiler quality is more consistent across vendors (not perfect, but consistent).
AFAIK, it only does basic optimizations. The final optimizations and codegen are still left to the IHV compiler.

The advantage for IHVs is that they can ignore the lexing/parsing/sema/dce phases, which are the most boring in a compiler anyway.
 
The first clue is all the talk of irreducible control flow.
On the upside, if they fix it inside their compiler backend (code explosion ahoy) they will be able to support goto for OpenCL as well.

In the end, though, if they really want to, they can just turn off all the optimization passes and do it all internally. I doubt the translation step introduces irreducible control flow.
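For readers unfamiliar with the term: a control-flow graph is irreducible when a loop can be entered at more than one point, which is exactly what a `goto` into the middle of a loop body produces. A classic test is the Hecht-Ullman T1/T2 reduction: repeatedly delete self-loops (T1) and fold any non-entry node that has exactly one predecessor into that predecessor (T2); the graph is reducible if and only if it collapses to a single node. The Python below is only an illustrative sketch of that test on a graph given as a successor map; the function name `is_reducible` and the graph encoding are my own assumptions and have nothing to do with how the ATI or NVIDIA backends actually represent code.

```python
from typing import Dict, Set


def is_reducible(succ: Dict[str, Set[str]], entry: str) -> bool:
    """Hecht-Ullman T1/T2 reducibility test (illustrative sketch).

    succ maps each node to the set of its successors; entry is the start node.
    Nodes not reachable from entry are ignored.
    """
    # Collect nodes reachable from the entry with an explicit DFS stack.
    reachable, stack = {entry}, [entry]
    while stack:
        for t in succ.get(stack.pop(), ()):
            if t not in reachable:
                reachable.add(t)
                stack.append(t)
    # Work on a private copy restricted to reachable nodes.
    g = {n: set(succ.get(n, ())) & reachable for n in reachable}

    changed = True
    while changed:
        changed = False
        # T1: delete self-loops.
        for n in g:
            g[n].discard(n)
        # Recompute predecessor sets for the current graph.
        preds: Dict[str, Set[str]] = {n: set() for n in g}
        for n, targets in g.items():
            for t in targets:
                preds[t].add(n)
        # T2: fold one non-entry node with a unique predecessor into it.
        for n in list(g):
            if n != entry and len(preds[n]) == 1:
                (m,) = preds[n]
                g[m].discard(n)
                g[m] |= g.pop(n)  # m inherits n's outgoing edges
                changed = True
                break
    return len(g) == 1


# A loop with a single header B is reducible.
loop = {"A": {"B"}, "B": {"C"}, "C": {"B", "D"}, "D": set()}
# Jumping into the middle of a loop (A -> B and A -> C, with B <-> C) is not.
goto_into_loop = {"A": {"B", "C"}, "B": {"C"}, "C": {"B"}}
print(is_reducible(loop, "A"), is_reducible(goto_into_loop, "A"))  # prints: True False
```

When a graph fails this test, the usual fallback is node splitting: duplicating code until every loop has a single entry, which in the worst case grows the code exponentially.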
 
Quick bump... something isn't right on my side. I can't get it to run more than 6 render threads even though I've set it to 8 (or more):
EDIT: Thanks to Tomb at the Lux forum I found my error: http://www.luxrender.net/forum/viewtopic.php?f=34&t=3643&start=30#p35294

I'm on an i7 920 + 5870.

image.width = 640
image.height = 480
batch.halttime = 0
scene.file = scenes/luxball/luxball.scn
scene.fieldofview = 45
opencl.latency.mode = 0
opencl.nativethread.count = 4
opencl.cpu.use = 0
opencl.gpu.use = 1
opencl.platform.index = 0
opencl.renderthread.count = 8
opencl.gpu.workgroup.size = 64
screen.refresh.interval = 2000
screen.type = 3
screen.gamma = 2.2
path.maxdepth = 6
path.russianroulette.depth = 5
path.russianroulette.prob = 0.75
path.shadowrays = 1
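The two `path.russianroulette.*` keys above control when paths get randomly terminated. The Python below is a minimal sketch of the standard, unbiased form of Russian roulette, not SmallLuxGPU's actual kernel code. It assumes roulette kicks in once the path depth reaches `path.russianroulette.depth` and that `path.russianroulette.prob` is the probability of *continuing* a path; the helper name and signature are hypothetical.

```python
import random
from typing import Optional


def russian_roulette(depth: int, throughput: float, u: float,
                     rr_depth: int = 5, rr_prob: float = 0.75) -> Optional[float]:
    """Return the path's updated throughput, or None if the path is killed.

    Hypothetical helper. Assumes roulette applies once depth >= rr_depth and
    that rr_prob is the probability of continuing. u is a uniform sample in [0, 1).
    """
    if depth < rr_depth:
        return throughput          # too shallow: always continue, unchanged
    if u >= rr_prob:
        return None                # terminated with probability 1 - rr_prob
    return throughput / rr_prob    # survivors are boosted so the estimate stays unbiased


if __name__ == "__main__":
    # Expected contribution equals the input throughput:
    # rr_prob * (t / rr_prob) + (1 - rr_prob) * 0 == t
    rng = random.Random(1)
    trials = 100_000
    total = 0.0
    for _ in range(trials):
        t = russian_roulette(depth=5, throughput=1.0, u=rng.random())
        total += t if t is not None else 0.0
    print(total / trials)  # close to 1.0
```

Raising `rr_prob` kills fewer paths (less noise, more work per sample); lowering it does the opposite.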

Another, unrelated thing I noticed when running SiSoft Sandra's benchmarks is that only single precision works in the DirectCompute benchmarks (double is emulated). WTH is going on? (I'm running Catalyst 10.3b with the latest DX11 runtime and the ATI Stream 2.0.1 SDK on Win7 64-bit.)
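Where native doubles are unavailable, "emulated" double precision is often built from pairs of single-precision floats (float-float or double-single arithmetic). I don't know what Sandra actually does internally; the Python below is only a sketch of the general idea using Knuth's TwoSum error-free transformation. It simulates float32 by rounding every intermediate result through `struct` (binary64 has more than twice the precision of binary32, so this double rounding still yields the correctly rounded float32 result), and the helper names `f32`, `two_sum`, `fast_two_sum` and `ff_add` are my own.

```python
import struct
from typing import Tuple

FloatFloat = Tuple[float, float]  # (hi, lo) representing the value hi + lo


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def two_sum(a: float, b: float) -> FloatFloat:
    """Knuth's TwoSum in float32: s = fl(a + b) and s + e == a + b exactly."""
    s = f32(a + b)
    bb = f32(s - a)
    e = f32(f32(a - f32(s - bb)) + f32(b - bb))
    return s, e


def fast_two_sum(a: float, b: float) -> FloatFloat:
    """Dekker's FastTwoSum in float32; requires |a| >= |b|."""
    s = f32(a + b)
    return s, f32(b - f32(s - a))


def ff_add(x: FloatFloat, y: FloatFloat) -> FloatFloat:
    """Add two float-float numbers (simple variant, no special cancellation handling)."""
    s, e = two_sum(x[0], y[0])
    e = f32(e + f32(x[1] + y[1]))
    return fast_two_sum(s, e)


# 1 + 2**-30 is not representable in float32, but it is as a float-float pair.
x = (1.0, 2.0 ** -30)
plain = f32(f32(1.0 + 2.0 ** -30) * 2.0)  # plain float32 drops the tail
hi, lo = ff_add(x, x)
print(plain, hi + lo == 2.0 + 2.0 ** -29)  # prints: 2.0 True
```

In this sketch each float-float add costs about a dozen float32 operations, and the result carries roughly 48 bits of significand with only float32's exponent range, so it is slower and narrower than native double.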

 