A Survey of CPU-GPU Heterogeneous Computing Techniques

Discussion in 'GPGPU Technology & Programming' started by sparsh, May 23, 2015.

  1. sparsh

    Newcomer

    Joined:
    May 23, 2015
    Messages:
    20
    Likes Received:
    12
    https://www.academia.edu/12355899/A_Survey_of_CPU-GPU_Heterogeneous_Computing_Techniques
    accepted in ACM Computing Surveys 2015

    This paper surveys CPU-GPU heterogeneous computing techniques at all abstraction layers of the system stack, from microarchitecture to the system and application levels. It identifies trends in CPU and GPU design (e.g. transistor count, core count, 3D, interconnect) and compares fused CPU-GPU chips (e.g. Llano) with discrete GPU systems. It classifies the research works by application area (e.g. physics, numerical algebra) and by the programming languages they use for the CPU and GPU (e.g. OpenCL, CUDA, OpenMP). It also lists benchmarks used for evaluating CPU-GPU systems, e.g. SHOC and Valar.
     
  2. Arwin

    Arwin Now Officially a Top 10 Poster
    Moderator Legend

    Joined:
    May 17, 2006
    Messages:
    17,682
    Likes Received:
    1,200
    Location:
    Maastricht, The Netherlands
    Good stuff. I was lamenting that somehow no serious discussion was possible in a certain other thread as soon as a certain word is mentioned, and I hope this one will do better. Future systems won't be about CPUs or GPUs, but about how many of what type of cores we need and what the best balance is between specialisation and generalisation, with power draw a primary factor. In my experience, which is actually limited to CPU, the most important workload, the one requiring the biggest increase in processing power, remains data manipulation, and almost everywhere that I actually bothered to read (so this could be limited) I've learnt that running a constant stream of data through a lot of cores is definitely the best way to deal with the increasing demand here. However, 2.3.4 suggests that CPUs haven't exactly been standing still and their cores are getting much better at computational workloads too, so a simple shift to all-GPU is clearly also not the right answer.

    That is of course not to say that there don't remain various applications where branching is important (sorting), but we are still in relatively early days of algorithm development in that area, reminding me of the return of RISC to becoming mainstream (unroll those loops, death to recursion ... ). But of course the truth is never binary, but rather a float always afloat (wow, thanks for that infection @AlNets ... ).

    So the question is where the right balance is. I think 2.3.3 in this paper is interesting in pointing out that communication between CPU and GPU in a PC is still a weak point, and that a high level of CPU computation can already become a bottleneck just feeding the GPU, showing additional inefficiencies. For the long list of references to earlier experiments, it would probably have been good to calculate some kind of power-to-compute ratio to clarify that the experiments aren't just using more silicon that happens to be there and then turning out better results, versus spending all that silicon on just CPU or just GPU and then performing the same task. Some kind of common transistor/power/speed factor for various kinds of workloads could be a good idea here (I remember IBM programming manuals for various of their processors did include suggestions of what was better suited to what type of core).
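
    A minimal sketch of what such a common factor could look like, assuming throughput per watt as the metric. The function name and every number below are made-up placeholders, not measurements from the paper or from any real hardware.

```python
def perf_per_watt(work_items, elapsed_s, avg_power_w):
    """Throughput (items per second) divided by average power draw (watts)."""
    return (work_items / elapsed_s) / avg_power_w


# Hypothetical runs of the same task on two configurations (placeholder figures).
cpu_only = perf_per_watt(work_items=1000, elapsed_s=10.0, avg_power_w=50.0)
cpu_plus_gpu = perf_per_watt(work_items=1000, elapsed_s=4.0, avg_power_w=200.0)

# With these placeholder figures the CPU+GPU run finishes sooner
# but does less work per watt than the CPU-only run.
print(cpu_only, cpu_plus_gpu)
```

    A normalised figure like this makes it visible when a speedup comes only from switching on more silicon and power rather than from using it more efficiently.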

    CPU linked to GPU with PCIe versus on the same chip remains interesting though. Currently a lot of high-performance software, particularly games, is written with attention to this limitation, and I wonder if in games we will see some serious work done on different algorithms that make more of the closeness of the two components. I think PC will hold this back for a while yet, as the most expensive and performant GPUs are in systems where they are separate from the CPU and connected through that PCIe bus. On these systems, you may be better off having the GPU run on its own and feeding itself as much as possible, with workloads as separate as possible (as suggested at the end of 6), whereas APUs can benefit a lot from CPU and GPU working closely together on the same data, because there is far less data transfer cost. But these require significantly different approaches in code (I imagine). Consoles may drive a change here out of necessity (to extract more efficiency out of the system) and there will be some drive there too as the generation progresses, but this significant conflict of interest is likely holding things back (and also explains near-religious extremities in thinking in some discussions here on this forum).
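
    A rough back-of-the-envelope model of that trade-off, as a sketch only. The function names and all figures are hypothetical; an infinite link bandwidth stands in for CPU and GPU sharing the same memory, which ignores real costs such as synchronisation.

```python
def offload_time(data_bytes, link_bytes_per_s, gpu_compute_s):
    """Time to move the data across the link plus the GPU compute time."""
    transfer_s = data_bytes / link_bytes_per_s
    return transfer_s + gpu_compute_s


def offload_pays_off(cpu_compute_s, data_bytes, link_bytes_per_s, gpu_compute_s):
    """True if sending the work to the GPU beats keeping it on the CPU."""
    return offload_time(data_bytes, link_bytes_per_s, gpu_compute_s) < cpu_compute_s


# Placeholder task: 1.0 s on the CPU, 0.2 s on the GPU, 8e9 bytes of data.
CPU_S, GPU_S, DATA_BYTES = 1.0, 0.2, 8e9

# Discrete GPU behind a bus assumed to move 8e9 bytes/s: the copy dominates.
discrete_wins = offload_pays_off(CPU_S, DATA_BYTES, 8e9, GPU_S)

# Same chip, modelled as a free transfer (infinite bandwidth).
fused_wins = offload_pays_off(CPU_S, DATA_BYTES, float("inf"), GPU_S)

print(discrete_wins, fused_wins)
```

    In this toy model the same task is not worth offloading across the bus but is worth it when the data does not have to move, which is why the two kinds of system push code in different directions.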

    It's interesting to read that you even need new benchmarks for testing this (which makes sense), and that these do already exist.

    All in all this paper gives a very thorough overview of the work out there, but there seems to be little time devoted to actual algorithms, except the conclusion that we need platforms that reschedule and process workloads over different (C/G)PUs transparently, without programmers having to spend too much time on that, and this makes sense. It will be interesting to see if any of the game studios will pull this off, or whether we'll see this kind of work primarily in the Havoks, Speedtrees et al. of this world.

    Thanks for sharing!

    (Also, in my mind APU stood for All Purpose Unit ... I never wrote a line of demanding code in my life other than by accident, so take my ramblings with a grain of salt ;))
     
  3. sparsh

    Newcomer

    Joined:
    May 23, 2015
    Messages:
    20
    Likes Received:
    12
    Thanks Arwin, for the comments. You are correct that algorithms should have been discussed, but we reached the page limit.
    This paper also seeks to change the mindset of researchers: merely offloading computational tasks to the GPU is not optimal, and using both the CPU and the GPU can potentially lead to a higher speedup.
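
    A minimal sketch of that idea, assuming a perfectly divisible workload, a fixed throughput for each device, and no transfer or scheduling cost. These are all simplifications, and the function names and rates are illustrative rather than taken from the paper.

```python
def split_work(n_items, cpu_rate, gpu_rate):
    """Split items in proportion to device throughput (items per second)."""
    gpu_items = round(n_items * gpu_rate / (cpu_rate + gpu_rate))
    cpu_items = n_items - gpu_items
    return cpu_items, gpu_items


def makespan(cpu_items, gpu_items, cpu_rate, gpu_rate):
    """Both devices run concurrently, so the slower side sets the finish time."""
    return max(cpu_items / cpu_rate, gpu_items / gpu_rate)


# Illustrative rates: the GPU is 4x faster than the CPU on this task.
N, CPU_RATE, GPU_RATE = 1000, 100.0, 400.0

cpu_items, gpu_items = split_work(N, CPU_RATE, GPU_RATE)
both_s = makespan(cpu_items, gpu_items, CPU_RATE, GPU_RATE)
gpu_only_s = makespan(0, N, CPU_RATE, GPU_RATE)

print(cpu_items, gpu_items, both_s, gpu_only_s, gpu_only_s / both_s)
```

    Under these assumptions the gain over GPU-only execution is (cpu_rate + gpu_rate) / gpu_rate, so even a CPU much slower than the GPU still contributes something as long as it is not left idle.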
     
  4. Rodéric

    Rodéric a.k.a. Ingenu
    Moderator Veteran

    Joined:
    Feb 6, 2002
    Messages:
    3,986
    Likes Received:
    847
    Location:
    Planet Earth.
    I haven't read the paper yet and will try to do that in the afternoon, but I'm interested in the topic. With more and more APUs all around, it will be interesting to see how that turns out.
    ATM there's AVX2 on Intel, which kind of competes with the GPU if you use an SPMD compiler, so I wonder how CPU/GPU will evolve.
     