GPU Ray-tracing for OpenCL

Discussion in 'Rendering Technology and APIs' started by fellix, Dec 27, 2009.

  1. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Thanks Talonman, good to see it works fine :wink:

    I received the parts for my new PC yesterday (wow, the 5870 is really HUGE :shock:). Once I have installed everything (i.e. 2x OS, all the tools, etc.), I'm thinking of buying a cheap NVIDIA card for my old PC. This should finally allow me to squeeze some better performance out of NVIDIA hardware.

    I was thinking of buying a GTS 250 for my old PC as a cheap test platform. Any better ideas? I don't know the NVIDIA card line-up very well.
     
  2. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
  3. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Nice work on the manual tuning!

    8800GT, 9800GT, 9800GTX, GTS250 - pick the cheapest you can find; they're basically slightly different speed grades of the "same chip" (two different revisions of the same chip).

    http://en.wikipedia.org/wiki/GeForce_8_Series
    http://en.wikipedia.org/wiki/GeForce_9_Series
    http://en.wikipedia.org/wiki/GeForce_200_Series

    Those pages seem accurate at first glance.

    GT240 is also an option, though slower.

    Technically GT220 or 9600GT would work, too.

    GT220/GT240 should have the better memory controllers of compute 1.2 devices (as opposed to compute 1.1 for the other cards). That might be a better match for the way GTX285 cards behave, and is slightly closer to the way the near-future generation of cards will work. Though that's probably over-complicating things :???:

    Jawed
     
  4. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
  5. FrameBuffer

    Banned

    Joined:
    Aug 7, 2005
    Messages:
    499
    Likes Received:
    3
    Kind of odd: that EVGA link points to Fudzilla parroting the suggestion that a dual-Fermi product will launch 1-2 months after the GTX285/260 replacements, yet the article right after it talks about Fermi being excessively hot. The idea of not one but two very hot GPUs on a single PCB sounds like a cooling nightmare. Are they going to come with H2O/phase-change or maybe TEC/Peltier cooling? lol. Rumblings already point to a very limited March launch (5970-style), with volume numbers not showing up until May.
     
  6. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    7,610
    Likes Received:
    825
    Getting the heat from the die into the air is not the problem; heatpipes are ridiculously efficient, and even with only a dual-slot solution there is enough surface area available for fins. Getting the hot air out of the case is the problem (because most people's cases suck, and the room on the backplate to exhaust air is limited ... also, if it has to go out the back, the air makes inefficient use of the fins).
     
  7. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    I imagine 3 billion transistors per chip may throw off some heat... ;)

    If I go for the Dual Fermi, I will opt to water cool it.
     
  8. Silent_Buddha

    Legend

    Joined:
    Mar 13, 2007
    Messages:
    19,426
    Likes Received:
    10,320
    If it produces that much heat with that many transistors, I shudder to think what the actual power consumption numbers will be.

    Regards,
    SB
     
  9. FrameBuffer

    Banned

    Joined:
    Aug 7, 2005
    Messages:
    499
    Likes Received:
    3
    Hmm, makes one wonder about the (in)validity of the supposed "certified" cases for Fermi-based SLI, and whether any such X2 product would require one.
     
  10. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Dave, if you ever get access to an Nvidia system, I wonder if you could try a recompile of your program using Nvidia's OpenCL SDK.
    (I also don't know how much work that would involve for you, rewrite-wise.)

    We are talking about it here:
    http://www.evga.com/FORUMS/tm.aspx?high=&m=91863&mpage=4#118517

    I just wonder if it would give us Nvidia guys a performance boost.
     
    #150 Talonman, Jan 12, 2010
    Last edited by a moderator: Jan 13, 2010
  11. mhouston

    mhouston A little of this and that
    Regular

    Joined:
    Oct 7, 2005
    Messages:
    344
    Likes Received:
    38
    Location:
    Cupertino
    That thread misinterprets how the ICD model works. The whole point is to avoid having to compile against one vendor's implementation or another. The part that actually matters, device binary generation, is done at runtime. Nvidia's current ICD is a little out of date, which is why the AMD and Nvidia ICDs don't play nicely together, but a single binary compiled against one should run on the other. All of the API calls are standard.
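    To illustrate (a minimal sketch, not taken from the thread's code): the host side only calls the standard API through the ICD loader, and the installed vendor runtimes are discovered when the program runs. Something like this builds once and runs unchanged whether the AMD or the NVIDIA implementation (or both) is installed:

    /* Sketch: list every OpenCL platform the ICD loader exposes at runtime.
       Only the generic header and loader library (opencl.dll / libOpenCL.so)
       are needed to build this; nothing vendor-specific is referenced. */
    #include <stdio.h>
    #include <CL/cl.h>

    int main(void)
    {
        cl_uint numPlatforms = 0;
        clGetPlatformIDs(0, NULL, &numPlatforms);
        if (numPlatforms > 8)
            numPlatforms = 8;

        cl_platform_id platforms[8];
        clGetPlatformIDs(numPlatforms, platforms, NULL);

        for (cl_uint i = 0; i < numPlatforms; ++i) {
            char name[256], vendor[256];
            clGetPlatformInfo(platforms[i], CL_PLATFORM_NAME,
                              sizeof(name), name, NULL);
            clGetPlatformInfo(platforms[i], CL_PLATFORM_VENDOR,
                              sizeof(vendor), vendor, NULL);
            printf("Platform %u: %s (%s)\n", i, name, vendor);
        }
        return 0;
    }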
     
  12. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    3,018
    Likes Received:
    582
    Location:
    Taiwan
    To my understanding, there shouldn't be a difference (at least on Windows) between using AMD's or NVIDIA's SDK. The cl.h headers are basically identical (they differ in only one line, which is a comment). The opencl.lib files are different, but they both link to opencl.dll, with exactly the same functions and calling conventions. So there shouldn't be any benefit from recompiling with a different SDK.
     
  13. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Still would be a fun test, just to see...

    Would there be much work on Dave's end?

    I don't want to ask him for a major re-write.
     
  14. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,716
    Likes Received:
    2,137
    Location:
    London
    Dave doesn't need to do anything; anyone with the SDK and a card should be able to do this. Obviously, knowing one's way around compiler tools helps...

    Jawed
     
  15. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    #155 Talonman, Jan 13, 2010
    Last edited by a moderator: Jan 14, 2010
  16. hiro

    Newcomer

    Joined:
    Jan 13, 2010
    Messages:
    3
    Likes Received:
    0
    Not getting any change in performance compiling v1.6 with the NVIDIA 3.0 beta SDK. Trying the -cl-mad-enable flag, though, I did get a boost from 262K to 289K samples/sec on an 8800M GTS, still with the same CPU load of 50%. :???:
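    For anyone else who wants to try the flag: -cl-mad-enable is just a build option handed to clBuildProgram when the kernel is compiled at runtime, so no SDK swap is needed. A rough sketch (context, device and the kernel source string are assumed to come from the usual SmallptGPU setup):

    /* Sketch: build the kernel source with -cl-mad-enable at runtime.
       context and device are assumed to exist already; source is the
       kernel text loaded from disk. */
    #include <stdio.h>
    #include <CL/cl.h>

    cl_program build_with_mad(cl_context context, cl_device_id device,
                              const char *source)
    {
        cl_int err;
        cl_program program =
            clCreateProgramWithSource(context, 1, &source, NULL, &err);

        err = clBuildProgram(program, 1, &device, "-cl-mad-enable", NULL, NULL);
        if (err != CL_SUCCESS) {
            /* Dump the build log so a failed compile is visible. */
            char log[8192];
            clGetProgramBuildInfo(program, device, CL_PROGRAM_BUILD_LOG,
                                  sizeof(log), log, NULL);
            fprintf(stderr, "Build failed:\n%s\n", log);
        }
        return program;
    }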
     
  17. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Thanks for the info... ;)

    Please keep us updated.
     
  18. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Dave, I was doing more testing running the latest version, after getting all 3 GPUs to 33% utilization...


    Size 8 -> 2721k
    Size 16 -> 2967k
    Size 32 -> 3054k
    Size 64 -> 4915k
    Size 96 -> 3602k
    Size 128 -> 4451k
    Size 160 -> 4915k
    Size 192 -> 5041k
    Size 224 -> 3978k
    Size 256 -> 4321k
    Size 320 -> 4802k
    Size 384 -> 5173k

    Size 448 -> Would not run
    Size 512 -> Would not run
    Size 576 -> Would not run

    smallptGPU -> 5173k

    I noticed that when I run the standard smallptGPU file, it says 'Suggested work group size: 384'.

    That is also the work group size at which I get my best performance...

    I tried to increase the work group size further, but the program would not run.

    Is it a known fact that Nvidia can't allocate a larger work group size than 384? Just wondering... :)
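    (My guess, for whoever wants to check: the 'Suggested work group size' line presumably comes from clGetKernelWorkGroupInfo, which reports the largest group the driver will accept for that particular kernel on that device, given its register and local-memory usage; the per-device ceiling is a separate, larger number. A minimal sketch of the query, assuming the kernel and device objects already exist:)

    /* Sketch: query the per-kernel and per-device work group limits. */
    #include <stdio.h>
    #include <CL/cl.h>

    void print_workgroup_limits(cl_kernel kernel, cl_device_id device)
    {
        size_t kernelMax = 0, deviceMax = 0;

        /* Per-kernel limit: depends on register and local-memory usage. */
        clGetKernelWorkGroupInfo(kernel, device, CL_KERNEL_WORK_GROUP_SIZE,
                                 sizeof(kernelMax), &kernelMax, NULL);

        /* Per-device ceiling: 512 on G80/G9x/GT200-class hardware. */
        clGetDeviceInfo(device, CL_DEVICE_MAX_WORK_GROUP_SIZE,
                        sizeof(deviceMax), &deviceMax, NULL);

        printf("Kernel work group limit: %zu (device max: %zu)\n",
               kernelMax, deviceMax);
    }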
     
    #158 Talonman, Jan 14, 2010
    Last edited by a moderator: Jan 14, 2010
  19. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Oh, thanks, fixed.

    Did you notice the recurring pattern? The best performance comes at multiples of 64. I think 64 is the maximum number of threads that can run on one of the NVIDIA SIMT processors. Jawed can probably answer this question.

    Keep in mind that larger is not always better; the optimal workgroup size is influenced by a lot of factors: hardware, size of the kernel, register usage, etc. At the moment, choosing the best size looks a bit like black magic. The best practice is probably to do some field tests and look for the best size.
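    A rough sketch of what such a field test could look like (not the actual SmallptGPU code; it assumes a command queue created with CL_QUEUE_PROFILING_ENABLE and a global size that is a multiple of every candidate local size):

    /* Sketch: time one launch per candidate work group size and report it. */
    #include <stdio.h>
    #include <CL/cl.h>

    void sweep_workgroup_sizes(cl_command_queue queue, cl_kernel kernel,
                               size_t globalSize)
    {
        const size_t candidates[] = { 64, 128, 192, 256, 320, 384 };
        const size_t count = sizeof(candidates) / sizeof(candidates[0]);

        for (size_t i = 0; i < count; ++i) {
            size_t local = candidates[i];
            cl_event ev;
            cl_int err = clEnqueueNDRangeKernel(queue, kernel, 1, NULL,
                                                &globalSize, &local,
                                                0, NULL, &ev);
            if (err != CL_SUCCESS) {
                printf("Size %zu -> would not run (error %d)\n", local, err);
                continue;
            }
            clWaitForEvents(1, &ev);

            cl_ulong start = 0, end = 0;
            clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_START,
                                    sizeof(start), &start, NULL);
            clGetEventProfilingInfo(ev, CL_PROFILING_COMMAND_END,
                                    sizeof(end), &end, NULL);
            printf("Size %zu -> %.3f ms\n", local, (end - start) * 1e-6);
            clReleaseEvent(ev);
        }
    }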

    P.S. Thanks for the NVIDIA GT 240 hint; it looks like a very cheap and good candidate for a testing platform. My main concern about the GTS 250 was how old the architecture is.
     
  20. Talonman

    Newcomer

    Joined:
    Jan 2, 2010
    Messages:
    64
    Likes Received:
    0
    Thanks Dave... (Yes, there is a pattern) ;)

    I also asked these two questions in the Nvidia OpenCL Developers area:

    Question one: Is it a known fact that Nvidia can't allocate a larger work group size than 384? Just wondering...
    If so, what is the limiting factor? GPU memory?

    Answer posted by avidday:
    "Workgroup size (the equivalent of block size in CUDA) is limited by the resources the OpenCL code uses. It will be different for every piece of code. The basic multiprocessor unit in NVIDIA GPUs has limits on workgroup size (512 is the current limit per workgroup, 768 or 1024 total per MP depending on hardware version), registers (128 per thread and 8192 or 16384 total per MP), and shared memory (16KB per MP). How much of each of those things the kernel uses dictates the maximum workgroup size. The only way to increase it is to make the code use fewer resources. Sometimes it helps performance, sometimes it doesn't."

    Second question: If we could further increase the work group size past 384, do you think we might see some additional performance?
    "That is totally hypothetical, and depends on the code for the reasons outlined above. It should improve up to a maximum as the workgroup size is increased, and then stay stable or even drop after that. Whether this code has reached that point is a question I can't answer."
     
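    To put purely illustrative numbers on that: with 16384 registers per multiprocessor, a kernel that happened to need 42 registers per thread could fit at most 16384 / 42 ≈ 390 threads, and rounding down to a multiple of the 32-wide warp gives 384. Arithmetic of that kind would explain why 384 launches here while 448 and above refuse to run; the real register count for this kernel would have to be read from the compiler output, so the 42 is just a made-up figure.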
    #160 Talonman, Jan 14, 2010
    Last edited by a moderator: Jan 14, 2010