GPU Ray-tracing for OpenCL

Discussion in 'Rendering Technology and APIs' started by fellix, Dec 27, 2009.

  1. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,415
    Likes Received:
    348
    Location:
    Germany
    Dade: With SmallLuxGPU 1.3 and 1.4 beta 3 I get this error:
    http://www.abload.de/img/errorp4oy.png
    Older versions and SmallPTGPU work fine.

    Win 7 x64, Cat. 8.712.3.1 (OpenGL 4.0&3.3 Preview Driver), Stream SDK 2.01, HD4850.
     
  2. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    ATI OpenCL SDK 2.01 has a known problem with the HD48xx family. According to a post on the ATI forums, it will be fixed in the next SDK release.

    However, for the moment, the only solution is to downgrade to SDK 2.0 :???:
     
  3. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,415
    Likes Received:
    348
    Location:
    Germany
    Wow, great. :sad:
     
  4. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
  5. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Hasn't 2.0 expired?

    Jawed
     
  6. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,432
    Likes Received:
    438
    Location:
    New York
    I don't think it's 20x. Mintmaster found some gross problems with the code two pages back which, once fixed, increased perf on the 285 by 30x. What we're seeing could simply be the Fermi compiler automatically taking care of those.
     
  7. jj99

    Newcomer

    Joined:
    Mar 29, 2010
    Messages:
    4
    Likes Received:
    0
    Does anyone know if SmallLuxGPU can use the two chips in the 5970? I think there is some problem, and the program is compiled only on the first device. The second one gives black.
     
  8. Mintmaster

    Veteran

    Joined:
    Mar 31, 2002
    Messages:
    3,897
    Likes Received:
    87
    Yeah, Fermi is still slower than my 8800 GTS on CUDA :cool:

    Can anyone with Fermi try my CUDA code from a few pages back? I think it will do around 1.5 GRays per second. www.its.caltech.edu/~nandra/SmallptGPU.zip

    If I find some free time, I'll try to make a DirectCompute port. Seems like ATI and NVidia are more focussed on that than OpenCL.
     
  9. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    I think only the beta version did. The final release doesn't; I know people who are using it right now (because of the problems with HD48xx).

    It is a problem with CrossFire configurations. I have a 5870 and a 5850; if I connect them I get the same result you are describing (and the 5850 is erroneously recognized as a 5870: 20 compute units). Everything works fine when the CrossFire cable is not used.

    It is yet another problem with the ATI OpenCL driver, and it has been reported a couple of times on their forum. It is another problem that is supposed to be fixed in the next release :???:
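
    The misdetection described above (a 5850 showing up with 20 compute units) is easy to flag once the reported values are in hand. What follows is only a minimal sketch, not anything from SmallLuxGPU: the table, the function name `find_misreported`, and the input format are all hypothetical, and the reported counts are assumed to come from an OpenCL device query such as CL_DEVICE_MAX_COMPUTE_UNITS. The stock values used are 20 compute units for the HD 5870 and 18 for the HD 5850.

```python
# Hypothetical table of stock compute-unit counts (one per SIMD engine).
EXPECTED_COMPUTE_UNITS = {
    "HD 5870": 20,
    "HD 5850": 18,
}


def find_misreported(devices):
    """Return (model, reported, expected) for each device whose reported
    compute-unit count differs from the stock value for that model.

    `devices` is a list of (installed_model, reported_units) pairs. The
    reported value is assumed to come from an OpenCL device query such as
    CL_DEVICE_MAX_COMPUTE_UNITS. Models missing from the table are skipped.
    """
    mismatches = []
    for model, reported in devices:
        expected = EXPECTED_COMPUTE_UNITS.get(model)
        if expected is not None and reported != expected:
            mismatches.append((model, reported, expected))
    return mismatches


# The CrossFire case from this post: the 5850 is reported with 20 units.
crossfire = [("HD 5870", 20), ("HD 5850", 20)]
print(find_misreported(crossfire))  # prints [('HD 5850', 20, 18)]
```

    With the CrossFire cable removed, the same check on the correctly reported values (20 and 18) would return an empty list.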
     
  10. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    The performance is not stable: about 0.92 to 1.20 GRays/s.
     
  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
    The link doesn't work anymore?
     
  12. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,767
    Likes Received:
    150
    Location:
    Taiwan
    To my understanding (please correct me if I'm wrong), the compiler in DirectCompute is provided by Microsoft. That is, the compiler compiles from HLSL into some intermediate assembly-like language (probably similar to how vertex shaders and pixel shaders work), then the driver compiles the assembly into hardware binary code. Therefore, the compiler quality is more consistent (not perfect, but still consistent across different vendors).

    In the case of OpenCL, although the compilers are all based on LLVM (I heard from a friend that Apple requires this), they still vary in quality.
     
  13. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    AFAIK, it only does basic optimizations. The final optimizations and codegen are still left to the IHV compiler.

    The advantage for IHVs is that they can ignore the lexing/parsing/sema/dce phases, which are the most boring in a compiler anyway.
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    It seems to me AMD's in a nightmare tussle with LLVM.

    Jawed
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    why?
     
  16. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
  17. cho

    cho
    Regular

    Joined:
    Feb 9, 2002
    Messages:
    416
    Likes Received:
    2
    Only 240 KB global cache?
     
  18. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,816
    Likes Received:
    480
    On the upside, if they fix it inside their compiler backend (code explosion ahoy) they will be able to support goto for OpenCL as well.

    Although in the end, if they really want to, they can just turn off all the optimization passes and do it all internally; I doubt the translation step introduces irreducible control flow.
     
  19. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    Global memory is wrong too, so I think you can assume it's broken in some way.

    Jawed
     
  20. Ike Turner

    Veteran Regular

    Joined:
    Jul 30, 2005
    Messages:
    1,884
    Likes Received:
    1,759
    Quick bump... something isn't right on my side. I can't get it to run more than 6 render threads even though I've set it to 8 (or more):
    EDIT: Thanks to Tomb at the Lux forum I found my error: http://www.luxrender.net/forum/viewtopic.php?f=34&t=3643&start=30#p35294

    I'm on an i7 920 + 5870.

    Another thing (unrelated) I noticed when running SiSoft Sandra's benchmarks is that only single precision is working in the DirectCompute benchmarks (double is emulated). WTH is going on? (Running CAT 10.3b with the latest DX11 runtime and the ATI Stream 2.0.1 SDK on Win7 64-bit.)
     
    #320 Ike Turner, Mar 31, 2010
    Last edited by a moderator: Mar 31, 2010