LuxMark v1.0

Discussion in 'GPGPU Technology & Programming' started by Dade, Feb 25, 2011.

  1. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,415
    Likes Received:
    348
    Location:
    Germany
  2. Man from Atlantis

    Regular

    Joined:
    Jul 31, 2010
    Messages:
    732
    Likes Received:
    6

It seems GF104's superscalar design (2-way 32+16 SP array) struggles and only utilizes 224 SPs (2/3) here: 5092*(480/224)*(825/950) = 9475, ~7% off, though maybe that's because of lower bandwidth, smaller caches and error rate. At least GF104 was made for gaming, not compute.
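The back-of-envelope arithmetic in the post above can be reproduced directly; note that the interpretation of each factor (5092 as the reference score, 224 active SPs out of 480, an 825/950 clock ratio) is taken from the post itself, not from independent measurements:

```python
# Reproduces the scaling estimate from the post. All inputs come from
# the post; their interpretation below is an assumption.

ref_score   = 5092      # reference LuxMark score quoted in the post
active_sps  = 224       # SPs the post estimates GF104 keeps busy
total_sps   = 480       # full SP count used for scaling
clock_ratio = 825 / 950 # shader-clock ratio quoted in the post

projected = ref_score * (total_sps / active_sps) * clock_ratio
print(round(projected))  # ~9476, matching the post's ~9475 figure
```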
     
  3. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
BTW, some impressive scores have recently been reported on the LuxRender forums.

This one was posted by KyungSoo, LuxMark running on 8 x GTX 480:

[image: LuxMark result screenshot]

And this one by Royoni, LuxMark running on 4 x HD 6990 + 2 x Xeon E5620:

[image: LuxMark result screenshot]
     
  4. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,891
    Likes Received:
    2,309
How the hell is he running 8 GTX 460s?
     
  5. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
  6. Florin

    Florin Merrily dodgy
    Veteran Subscriber

    Joined:
    Aug 27, 2003
    Messages:
    1,648
    Likes Received:
    219
    Location:
    The colonies
  7. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
  8. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
He has recently scored a new record with 8 x GTX 580:

[image: LuxMark result screenshot]

Power supply problems aside, this should be the current limit (i.e. it should not be possible to use more than 8 cards).
     
  9. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
Won't 8 HD 6990s (with single-slot H2O cooling, of course) run together, or is 4 cards/8 chips their hard limit?
     
  10. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
You can use PCI-E ribbon extenders to populate all the slots on the motherboard with double-wide cards. A custom system enclosure would be required to mount all the cards and hold them steady in such a setup.
     
  11. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
I know there is an 8-device limit (in the BIOS? Where does it come from?), but I don't know whether an HD 6990/GTX 590 is counted as 1 or 2 devices :?:

You also have to factor in the power supply problem: Royoni was running LuxMark on 4 x HD6990 and he already reported quite some trouble finding a suitable power supply.

You cannot draw more than 2 kW from a normal wall socket (at least where I live). You are going to need an industrial power line too :shock:
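A rough power-budget sketch supports the point. The per-card figure below is the GTX 580's published 244 W board power; the 200 W allowance for the rest of the system and the 90% PSU efficiency are my assumptions:

```python
# Rough wall-power estimate for an 8-GPU rig under full LuxMark load.
# Assumptions: 244 W board power per GTX 580 (published TDP), 200 W
# for CPUs/motherboard/drives (a guess), 90% PSU efficiency.

cards          = 8
watts_per_card = 244
system_watts   = 200   # everything that isn't a GPU (assumption)
psu_efficiency = 0.90

dc_load   = cards * watts_per_card + system_watts
wall_draw = dc_load / psu_efficiency
print(f"{wall_draw:.0f} W at the wall")  # ~2391 W: past a 2 kW socket
```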
     
  12. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,891
    Likes Received:
    2,309
    pot + kettle ?
     
  13. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    Dade, have you profiled LuxMark on various hardware? Just curious to know where the bottlenecks lie on different architectures. Are any specific design decisions hindering or helping performance in a significant way?
     
  14. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
Not recently, but some other users and I have in the past. My current concerns about the general validity of LuxMark as a benchmark are:

1) The scene used for the benchmark (i.e. LuxBall HDR) is a bit too simple. The average path length is very small (i.e. less than 2 rays), which leads to not very much divergence among GPU threads. A more complex/heavier scene would be more representative of a generic rendering load.

2) Kernel execution time could be too small on high-end GPUs, so kernel launch time/overhead could represent too large a share of the benchmark. For instance, you could probably obtain artificially high scores by optimizing only the kernel dispatch overhead.

3) Memory accesses are extremely scattered. I actually consider this a positive aspect of the benchmark: it represents a more generic workload than some easy-for-GPUs task like matrix multiplication. However, it may favor architectures like NVIDIA Fermi (i.e. with caches) over AMD GPUs.
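Point 2 can be illustrated with a toy model: for a fixed per-launch overhead, the faster the kernel itself runs, the larger the share of total time the launch overhead eats. All numbers below are illustrative assumptions, not measurements:

```python
# Toy model for point 2: fixed per-launch overhead vs. kernel runtime.
# The 0.02 ms overhead and the kernel times are illustrative only.

def overhead_share(kernel_ms, launch_overhead_ms=0.02):
    """Fraction of each iteration spent on launch overhead."""
    return launch_overhead_ms / (kernel_ms + launch_overhead_ms)

for name, kernel_ms in [("mid-range GPU", 5.0), ("high-end GPU", 0.5)]:
    print(f"{name}: {overhead_share(kernel_ms):.1%} spent on launches")
```

On the faster device the same fixed overhead is roughly ten times larger as a fraction of the benchmark, which is exactly why launch-dispatch optimizations could inflate scores artificially.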

    Past profiling sessions have shown that the ALU utilization (on AMD VLIW GPUs) isn't bad (>60%).

In general, the code is written to run on Apple, AMD and NVIDIA OpenCL implementations, on both CPU and GPU devices. Like any code written for portability, it isn't particularly optimized, but this should show how good an OpenCL implementation is at running generic code.
     
  15. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,430
    Likes Received:
    433
    Location:
    New York
    Thanks Dade, interesting info.
     
  16. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
What is the other of the two benchmark scenes doing differently? I'm getting vastly different results on that one (it's the LuxBall without the HDR) - much lower ones, that is, as well as a far steeper drop on AMD than on Nvidia: HD 6970 drops to 54 (!), GTX 580 to 1035 (!).
     
  17. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    14,891
    Likes Received:
    2,309
What were your scores before the drop?
     
  18. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
LuxBall HDR has no direct light sampling (i.e. it is a brute-force path tracer): while tracing the reverse path of light, at each vertex it traces only a single ray to evaluate where the next path vertex is. Eventually the path hits the "background" and receives light from there.

Other scenes have direct light sampling: 2 rays are shot at each path vertex, one to evaluate where the next path vertex is (as before) and another toward the area light sources to check whether they are visible.

As you can see, LuxBall HDR requires about half the work (rays to trace) to generate a sample compared to the other scenes.

Brute-force path tracing is also more SIMD-friendly, as it usually leads to less thread divergence (it is what I used in http://www.youtube.com/watch?v=Dh9uWYaiP3s to achieve real-time).
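The ray-count difference Dade describes can be sketched as follows; the ~2-vertex average path length comes from the thread, and treating direct light sampling as exactly one extra shadow ray per vertex is a simplification:

```python
# Sketch of the ray counts described above: a brute-force path tracer
# shoots 1 ray per path vertex, while direct light sampling adds one
# shadow ray per vertex (a simplification of next-event estimation).

def rays_per_sample(path_length, direct_light_sampling):
    rays_per_vertex = 2 if direct_light_sampling else 1
    return path_length * rays_per_vertex

avg_len = 2  # average path length reported for LuxBall HDR
print(rays_per_sample(avg_len, False))  # 2 rays: brute force
print(rays_per_sample(avg_len, True))   # 4 rays: ~double the work
```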
     
  19. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,798
    Likes Received:
    2,056
    Location:
    Germany
Big thanks, Dade, that explains a lot (though not quite the GeForce being 20x faster - I'd have thought maybe 2x at most). Of course, Radeon HD 5k+ should be 1.5-2x as fast when brute-force SIMDing something that remotely fits their VLIW. :)
     
  20. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,489
    Likes Received:
    400
    Location:
    Varna, Bulgaria
Well, thread divergence is something that's better avoided, especially on AMD's VLIW architecture, where it leads to very poor utilization. But the difference in performance here is really staggering!
     
