OpenCL (Open Computing Language)

Discussion in 'GPGPU Technology & Programming' started by NocturnDragon, Jun 10, 2008.

  1. iwod

    Newcomer

    Joined:
    Jun 3, 2004
    Messages:
    179
    Likes Received:
    1
    Just wondering what are the chances of using OpenCL on Intel X4500?
     
  2. randomhack

    Newcomer

    Joined:
    Apr 4, 2008
    Messages:
    41
    Likes Received:
    0
    I have to wonder whether Intel is supporting OpenCL on Larrabee. If Intel's claims that Larrabee is much more programmable than current cores are correct, they shouldn't have much of a problem supporting any standard.
    I am also wondering whether Apple will be using Larrabee. Are Snow Leopard and Larrabee scheduled in the same time frame?
     
  3. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,428
    Likes Received:
    426
    Location:
    New York
    In what respect? Do the structures map well to the Grid -> Block -> Warp -> Thread CUDA hierarchy?
     
  4. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Or a better question: how exactly are they making the functionality portable without hindering performance? CUDA performance is all about fetching aligned, full-memory-bus-granularity blocks into "shared memory" and then allowing the SIMD units to do swizzled fetches from that shared pool. Kind of hard to see this porting well to AMD chips...
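
    Something like the following (an untested sketch; the kernel and sizes are just illustrative, not anyone's actual code):

    // Stage a tile of global memory into shared memory with coalesced loads,
    // then let each thread read it back in an arbitrary (swizzled) order.
    __global__ void stage_and_swizzle(const float* in, float* out, int n)
    {
        __shared__ float tile[256];                    // assumes blockDim.x == 256

        int i = blockIdx.x * blockDim.x + threadIdx.x;
        tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;    // coalesced: consecutive threads, consecutive addresses
        __syncthreads();

        // A permuted read from shared memory is cheap; the same access pattern
        // against global memory would be uncoalesced and slow.
        int swizzled = (threadIdx.x * 17) & 255;
        if (i < n)
            out[i] = tile[swizzled];
    }

    The interesting question is how that explicit staging step maps onto hardware that doesn't expose a software-managed scratchpad in the same way.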
     
  5. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    That's an abstraction that would hardly work on a wide range of architectures.
    Compute shaders should be an opportunity to refine CUDA and get it a bit more right (who cares about the number of warps/blocks/grids/wavefronts/whatever...)
     
  6. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Uhm, that kind of stuff is pretty easy for the HPC community. Certainly for a game developer it's probably intimidating, and compute shaders in DX11 should abstract at least some of it, but I think for their initial target market it was not really a problem. Their guide is a strange and rather suboptimal way to teach the paradigm & language though; they told me they're working on revamping it completely, but we'll see what happens...
     
  7. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    It really doesn't matter if your 'average' HPC engineer can get all his/her CUDA numbers right; the first company that can deliver a language and/or framework that lets you focus on the really important things wins. Many of us (me included) get all excited about hardware architectures, but if I have learnt anything after spending a relatively long time working on next-gen console CPUs, it is that whoever gets the software architecture 'right' (whatever that might mean) will take the crown.
    CUDA is nice and everything, but I refuse to believe that 2 or 3 years from now we are still going to use it as it is now; it will evolve or it will eventually lose its leadership.
    In such a small and new field it is very easy to go from being first in your class to falling into oblivion.
     
  8. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    CUDA is certainly going to evolve, if only because of changes in their DX11 hardware architecture and the fact that individual developers & consumer apps will become even more important in the future (and those have a much lower complexity tolerance). However, I don't think it's really necessary to completely hide all of those implementation details; just an API layer with a higher level of abstraction would do the trick. Hitting the right sweet spot for it may be difficult, however.
     
  9. MfA

    MfA
    Legend

    Joined:
    Feb 6, 2002
    Messages:
    6,768
    Likes Received:
    470
    Anyone using local/shared storage?
     
  10. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    The problem with CUDA IMHO is that it's a little too specific to the G80/92/T200 architecture. It doesn't map naturally to other architectures with different memory hierarchies, and although it can be "made to work", something a bit more abstract is needed for a standard that is meant to target a wide range of parallel processors with varying memory hierarchies.

    The other problem with CUDA is that it's just too damn hard to make it fast/optimal ;) This is more a problem with the complexity of the underlying hardware than with the language itself, but the point remains that the language does nothing to prevent you from seriously shooting yourself in the foot, which is never a good thing. As it stands, even simple problems require highly non-linear optimization and machine-learning-style optimization algorithms to even approach 50% of peak performance. There are just too many variables that affect performance in highly non-linear ways for us mere mortals to get right ;)
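
    To make that concrete, the "empirical optimization" approach boils down to timing the same kernel under every candidate launch configuration and keeping the fastest; a minimal host-side sketch (the kernel, names and block sizes are hypothetical, just to show the shape of the search):

    #include <cstdio>
    #include <cuda_runtime.h>

    // Stand-in kernel to be tuned; only the launch configuration varies below.
    __global__ void my_kernel(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = in[i] * 2.0f;
    }

    // Time one launch configuration with CUDA events; returns milliseconds.
    static float time_config(const float* in, float* out, int n, int block)
    {
        int grid = (n + block - 1) / block;
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);
        cudaEventRecord(start);
        my_kernel<<<grid, block>>>(in, out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return ms;
    }

    // Brute-force search over block sizes: keep whichever configuration is fastest.
    int find_best_block(const float* in, float* out, int n)
    {
        const int candidates[] = { 64, 128, 192, 256, 384, 512 };
        int best = candidates[0];
        float best_ms = 1e30f;
        for (int b : candidates) {
            float ms = time_config(in, out, n, b);
            if (ms < best_ms) { best_ms = ms; best = b; }
        }
        printf("best block size: %d (%.3f ms)\n", best, best_ms);
        return best;
    }

    The real search spaces also cover unroll factors, shared-memory tile sizes, register pressure trade-offs and so on, which is where the combinatorial explosion (and the need for pruning or machine-learning-style search) comes from.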

    Now the above is just a tough problem with parallel programming and complex architectures in general, but it raises the question of whether we need to be specifying algorithms in something a bit more general and tunable than CUDA, so that the backends/compilers can handle the heavy lifting as far as optimization and targeting a specific memory model go.

    Anyways there are certainly many interesting topics moving forward, and it will be fascinating to see what falls out of OpenCL and similar initiatives (DX compute shaders, etc).
     
  11. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    Great paper BTW. Interesting that the difference between worst and peak is only 235%.

    As for peak performance, I'm assuming you are referring to ALU utilization? How many graphics programs ever reach peak ALU performance?

    The point being that it is always tough to reach peak performance on any platform, and in all cases you have to have intimate hardware knowledge to tune (or to engineer the algorithm in the first place). I think a great example of this is the potential floating-point performance of the Xbox 360 or Cell/PS3. In both cases you need to vectorize. On 360 you have to stay in cache and aligned, and have a huge amount of work going in parallel to hide really long instruction latencies... i.e., you really have to program as you do on a GPU to get anywhere close to peak ALU performance. Most developers either will not or cannot do this for anything but a small amount of code.
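
    As a toy illustration of the latency-hiding part on an in-order core (plain scalar C++, vectorization and alignment left out; the function names are made up):

    // Naive reduction: every iteration depends on the previous sum, so the loop
    // runs at roughly one element per FP-add latency on an in-order core.
    float dot_naive(const float* a, const float* b, int n)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)
            sum += a[i] * b[i];
        return sum;
    }

    // Four independent accumulators: the adds can overlap in the pipeline. This is
    // the same idea (in SIMD form) behind approaching peak on Xenon/Cell, and
    // conceptually what a GPU does by keeping thousands of threads in flight.
    float dot_unrolled(const float* a, const float* b, int n)
    {
        float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
        int i = 0;
        for (; i + 3 < n; i += 4) {
            s0 += a[i + 0] * b[i + 0];
            s1 += a[i + 1] * b[i + 1];
            s2 += a[i + 2] * b[i + 2];
            s3 += a[i + 3] * b[i + 3];
        }
        for (; i < n; ++i)   // tail elements
            s0 += a[i] * b[i];
        return (s0 + s1) + (s2 + s3);
    }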
     
  12. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Well not just ALU utilization... I'm also considering things like how cleverly you touch memory, avoid cache misses, etc. Basically everything that makes your algorithm as fast as it can theoretically be on a given set of hardware. I realize this is largely hand-wavy, but I'm just trying to make the distinction between - say - the naive vs. hand-tuned vs. autotuned versions of algorithms.

    And yes, it's definitely tough to reach any sort of peak performance, but I'm concerned that on G8x and similarly complex architectures it has gone beyond "tough" into the realm of automated empirical optimization (as the paper that I referenced does). This process can potentially be "guided" or hinted or pruned by the user in the majority of cases, but with all of the factors that come into play when making something fast on G8x/CUDA, it is simply infeasible for even a ninja programmer to find a globally optimal configuration of tuning parameters except in the simplest of cases. The best we can do is a sort of orthogonal gradient ascent (in each dimension) which can be quite suboptimal in the case of something like G8x.

    Anyways my only real point here is that CUDA is pretty tied to a specific architecture, and pretty complex in terms of extracting excellent performance out of that architecture. I submit that these are characteristics of a low-level, relatively non-portable language, which is great in its own right but not suitable as-is for something like OpenCL or DX compute shaders.
     
  13. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,743
    Likes Received:
    106
    Location:
    Taiwan
    I agree that CUDA is too tied to a specific architecture, which makes it very hard to "generalize." However, the problem of being "hard to optimize" is very difficult to solve. Even CPUs have the same problem. For example, a matrix multiplication routine written in C/C++, even without considering SIMD, will not have optimal performance if the cache size is not taken into account.
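
    To illustrate: the textbook fix is cache blocking, and even that already bakes a machine-dependent parameter (the tile size) into otherwise "portable" C/C++ (a rough sketch, not tuned for any particular CPU):

    // Plain triple loop: streams all of B through the cache for every row of A,
    // so performance collapses once the matrices no longer fit in cache.
    void matmul_naive(const float* A, const float* B, float* C, int n)
    {
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j) {
                float acc = 0.0f;
                for (int k = 0; k < n; ++k)
                    acc += A[i * n + k] * B[k * n + j];
                C[i * n + j] = acc;
            }
    }

    // Cache-blocked version: works on T x T tiles so each tile of A and B is
    // reused while it is still resident. T must be chosen per cache size, which
    // is exactly the architecture knowledge leaking into the source.
    const int T = 64;   // tile size: a guess, needs tuning per CPU

    void matmul_blocked(const float* A, const float* B, float* C, int n)
    {
        for (int i = 0; i < n * n; ++i) C[i] = 0.0f;
        for (int ii = 0; ii < n; ii += T)
            for (int kk = 0; kk < n; kk += T)
                for (int jj = 0; jj < n; jj += T)
                    for (int i = ii; i < ii + T && i < n; ++i)
                        for (int k = kk; k < kk + T && k < n; ++k) {
                            float a = A[i * n + k];
                            for (int j = jj; j < jj + T && j < n; ++j)
                                C[i * n + j] += a * B[k * n + j];
                        }
    }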

    Of course, the beautiful thing about a CPU (especially an x86 CPU) is that even a "normal" program (not specifically optimized for a certain architecture) may perform relatively well. The same can't be said for a GPU, or for any other more "exotic" architecture, including CELL.

    IMHO, it's almost impossible to hide all architectural details while maintaining high performance. To do so would require a lot of "helper" hardware, which sort of defeats the idea of GPGPU. Therefore, the most important problem right now is probably to figure out the "best" architecture for GPGPU, one which all major vendors can accept and which is also useful for most application developers.
     
  14. Andrew Lauritzen

    Moderator Veteran

    Joined:
    May 21, 2004
    Messages:
    2,526
    Likes Received:
    454
    Location:
    British Columbia, Canada
    Oh no doubt! I didn't mean to imply that optimizing for G8x is in any way hindered by CUDA... just that writing optimal CUDA code is sufficiently tied to the G8x platform that I consider it a fairly "low-level" language. Clearly CUDA is the best (and only) language for targeting G8x hardware "to the metal", but I remain unconvinced that it provides a good general-purpose, portable programming model.

    Anyways I don't want to come off as anti-CUDA - quite the contrary! I just don't think it makes sense for something like CUDA to be the programming model of choice for writing code to target stuff like AMD GPUs, multicore CPUs, Larrabee and Cell.
     
  15. nAo

    nAo Nutella Nutellae
    Veteran

    Joined:
    Feb 6, 2002
    Messages:
    4,325
    Likes Received:
    93
    Location:
    San Francisco
    Or for whatever NVIDIA will unleash in the next 18-24 months...
     
  16. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
  17. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,023
    Likes Received:
    299
    Location:
    UK
    Good read indeed... :)

    However I have one major problem with it: the whole handheld thing is patently absurd. Mostly visionaries who fail, IMO, to understand the difference between theory and practice... There is no use case for an FP32-centric device in this field, and there are massively better architectures *on the market today* for every single application you could ever imagine. These solutions are already orders of magnitude more efficient than the x86 CPUs that GPUs compare favorably to.

    It might be useful for non-graphics tasks in games, especially because proprietary hardware won't often be exposed, let alone standardized, but beyond that I'm very, very skeptical. I also laughed at this sentence: "such as being able to point the phone's camera at a building and then process the image so that it can tell you which building it is." - right, because GPS and location-aware services (showing nearby buildings) could *never* do that for a billionth of the cost and power while delivering a better user experience... right?

    I'm sorry for being a bit mean here, but I'm not a big fan of random predictions that contradict the fundamental dynamics of computer architecture and system design just because they'd benefit you strategically. And I thought Intel had patented that intellectual process, anyway?
     
  18. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,413
    Likes Received:
    347
    Location:
    Germany
  19. TimothyFarrar

    Regular

    Joined:
    Nov 7, 2007
    Messages:
    427
    Likes Received:
    0
    Location:
    Santa Clara, CA
    "Wearing his NVIDIA hat, Trevett says his company is fully supportive of the OpenCL effort and they're going to be careful not to set up CUDA as an OpenCL competitor."

    So perhaps CUDA remains the low-level interface OpenCL uses to access the hardware on NVIDIA's cards?
     
  20. ADEX

    Newcomer

    Joined:
    Sep 11, 2005
    Messages:
    231
    Likes Received:
    10
    Location:
    Here
    There's been stuff online about OpenCL for quite some time; the URL doesn't seem to be widely circulated, though.

    There's a whole load of other interesting stuff as well. Enjoy:

    http://s08.idav.ucdavis.edu/
     