The state of AMD's GPGPU implementation?

Discussion in 'GPGPU Technology & Programming' started by wingless, Aug 21, 2008.

  1. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    The only competitive thing AMD has right now (feature-wise) is CAL, which is assembly. Hardly developer-friendly in 2008.
     
  2. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,079
    Likes Received:
    648
    Location:
    O Canada!
    That's hardly accurate, given that we do have Brook+ for high-level access. ISVs are taking this and getting good results.
     
  3. rpg.314

    Umm.....

    I have gone through the documentation many times but have found no mention whatsoever of data sharing/syncing in Brook+. LDS and GDS appear to be exclusive to CAL at the moment. I could be wrong, but it seems that Brook+ doesn't expose LDS and GDS at all.

    Constant and texture caches are exposed in CUDA, but they are opaque to the Brook+ programmer. We don't get to choose which data merits what kind of caching.

    And Direct3D/OpenGL interoperability with Brook+ is definitely missing.
     
  4. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    10,873
    Likes Received:
    767
    Location:
    London
    [Guessing this is a leak from the other thread, where I have recently posted on the subject of LDS/GDS]

    The way I interpret this:
    1. The programmer needs to create an explicit synchronisation point. So if a kernel has two phases separated by a synchronisation point, then in Brook+ the programmer needs to make two kernels, with the output of kernel 1 feeding into kernel 2 as input.
    2. The data that the programmer wants to share amongst "threads" needs to be in the output of kernel 1, e.g. as an auxiliary stream or packed into the vec4 format. Kernel 2 can use the "index" feature of stream addressing to read from one or more foreign threads as well as its own intrinsically indexed stream data elements.
    So Brook+ doesn't provide an explicit concept of sharing. The whole problem of kernel chaining, bifurcation, serialisation and the rest is all stuff that I, as a non-Brook+ programmer, can't say much about. I hope I'm not leading you astray (I just browse these topics to get an overview)...
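
As a rough illustration of the two-kernel pattern described in the two points above, here is a minimal sketch in plain Python standing in for Brook+ kernels. It is not Brook+ syntax; `kernel_phase1`, `kernel_phase2` and `run_two_pass` are hypothetical names, and the per-element work (squaring, then summing neighbours) is an arbitrary placeholder.

```python
# Illustrative only: plain Python standing in for two chained stream kernels.
# All function names here are hypothetical, not part of Brook+ or CAL.

def kernel_phase1(x):
    """Phase 1: per-element work whose result goes to an output stream."""
    return x * x


def kernel_phase2(i, phase1_out):
    """Phase 2: gather by index from 'foreign' elements of the phase-1 output."""
    left = phase1_out[max(i - 1, 0)]
    right = phase1_out[min(i + 1, len(phase1_out) - 1)]
    return left + phase1_out[i] + right


def run_two_pass(data):
    # The end of the first pass is the synchronisation point: every
    # phase-1 result exists before any phase-2 element is computed.
    intermediate = [kernel_phase1(x) for x in data]
    return [kernel_phase2(i, intermediate) for i in range(len(intermediate))]


print(run_two_pass([1, 2, 3, 4]))  # [6, 14, 29, 41]
```

The sharing happens only through `intermediate`, which plays the role of the stream written by kernel 1 and read by kernel 2.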

    No. Brook+ is very much "un-optimised" right now as far as I can tell. Correctness appears to be a much higher priority than performance. The stream model is also sufficiently abstracted from data layout in memory that it's probably very hard to expose memory programming to the Brook+ programmer in a meaningful way. E.g. how are streams interleaved in memory?

    One could say that CUDA provides a "close to the metal" memory-hierarchy programming model. I think some would argue that it's too close.

    I think D3D11-CS and OpenCL are going to provide some interesting alternatives to this question. I expect there'll be a lot of arguments over memory programming for GPGPU during the next few years.

    Jawed
     
  5. rpg.314

    Jawed, while your post in the other thread was very helpful, my previous post here is not some kind of "info transfer". What you are suggesting is ping-ponging, which everybody did (prior to CUDA) via D3D/OGL shaders. Here you are sharing and/or syncing via global memory, which is very slow (latency-wise). I am referring to sharing/syncing via the on-chip cache (shared memory on NVIDIA GPUs, LDS/GDS on ATI GPUs).
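
To make the distinction concrete, here is a small counting model in plain Python. It is a sketch under stated assumptions, not a benchmark: it only tallies how many global-memory reads and writes a sum reduction would issue when every step goes through a global buffer (ping-ponging) versus when each block of 4 elements is first reduced in a local scratch buffer standing in for on-chip memory. The function names, the block size and the cost model are all hypothetical, and latency itself is not modelled.

```python
# Crude model: count global-memory accesses for a sum reduction.
# Function names, block size and cost model are illustrative assumptions.

def reduce_ping_pong(data):
    """Multi-pass sum: each pass reads one global buffer and writes another."""
    src, traffic = list(data), 0
    while len(src) > 1:
        dst = [sum(src[i:i + 2]) for i in range(0, len(src), 2)]
        traffic += len(src) + len(dst)  # global reads + global writes
        src = dst
    return src[0], traffic


def reduce_blockwise(data, block=4):
    """Each block loads its elements once, reduces in scratch, writes one value."""
    src, traffic = list(data), 0
    while len(src) > 1:
        partials = []
        for start in range(0, len(src), block):
            scratch = src[start:start + block]
            traffic += len(scratch)  # one global read per element
            while len(scratch) > 1:  # steps inside the scratch buffer are not counted
                scratch = [sum(scratch[i:i + 2]) for i in range(0, len(scratch), 2)]
            partials.append(scratch[0])
            traffic += 1  # one global write per block
        src = partials
    return src[0], traffic


print(reduce_ping_pong(range(16)))   # (120, 45)
print(reduce_blockwise(range(16)))   # (120, 25)
```

Both versions return the same sum; the block-wise one needs fewer passes over global memory, which is the kind of saving on-chip sharing is meant to buy.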

    I think CUDA is OK (for now at least) as far as how "close to the metal" the programming model should be.

    I looked at the existing docs for D3D11 CS and OpenCL shaders (whatever slide decks were available) and they look to be exactly CUDA with different terminology and a different CPU-side API for accessing the GPU. But yes, they will evolve over time and we will see interesting debates on how to take it forward.
     
  6. Jawed

    I'm suggesting a stream of sub-blocks, if the kernel (two kernels separated by a synch) as a whole is operating on a block. The sub-blocks would need an overlap radius. I presume that by giving each sub-block a separate CAL context you can create this stream of sub-blocks and therefore hide the latency of memory operations and synchronisations.
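
The sub-blocks-with-overlap idea can be sketched in plain Python as follows. This is only an illustration of the tiling itself, assuming a 1D array and a simple box filter as the per-element work; `split_with_halo`, `box_filter` and `filter_tiled` are hypothetical names, and nothing here models CAL contexts or latency hiding.

```python
# Illustrative 1D tiling with an overlap ("halo") of `radius` elements.
# All names and the box filter itself are assumptions for this sketch.

def split_with_halo(data, tile, radius):
    """Cut data into tiles, each padded with up to `radius` neighbours per side."""
    n = len(data)
    tiles = []
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - radius, 0)
        hi = min(end + radius, n)
        tiles.append((start, end, lo, data[lo:hi]))
    return tiles


def box_filter(values, i, radius):
    """Sum of the elements within `radius` of index i, clamped at the edges."""
    lo = max(i - radius, 0)
    hi = min(i + radius + 1, len(values))
    return sum(values[lo:hi])


def filter_tiled(data, tile, radius):
    """Process each padded tile independently and stitch the results together."""
    out = []
    for start, end, lo, chunk in split_with_halo(data, tile, radius):
        for i in range(start, end):
            out.append(box_filter(chunk, i - lo, radius))
    return out


data = list(range(10))
whole = [box_filter(data, i, 1) for i in range(len(data))]
print(filter_tiled(data, tile=4, radius=1) == whole)  # True
```

Because each tile carries its own halo, tiles never need to read each other's data, so they could be processed in any order or concurrently.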

    Honestly, I don't know if Brook+ already supports this technique. I'm envisaging that LDS/GDS provides IL with a smoother, more finely-grained, way of using memory and it's just a matter of time for Brook+ to take advantage of this. Older GPUs have the "read/write memory" whose function is rather vague - I interpret it as a small predecessor of LDS/GDS. Who knows.

    This is interesting because it's a critique of kernel and stream unmanageability in CUDA:

    http://www.kunzhou.net/2008/BSGP.pdf

    I'm pretty dubious about the "exactly", but we'll just have to wait and see.

    Jawed
     
  7. Dave Baumann

  8. Arnold Beckenbauer

    Veteran

    Joined:
    Oct 11, 2006
    Messages:
    1,415
    Likes Received:
    348
    Location:
    Germany
  9. fellix

    fellix Hey, You!
    Veteran

    Joined:
    Dec 4, 2004
    Messages:
    3,494
    Likes Received:
    405
    Location:
    Varna, Bulgaria
    Works for me with Cat 9.11 WHQL:

    [image]
     
  10. Arnold Beckenbauer

  11. CarstenS

    Veteran Subscriber

    Joined:
    May 31, 2002
    Messages:
    4,803
    Likes Received:
    2,064
    Location:
    Germany
    It seems like it doesn't use OpenCL, at least going by the comments on SiSoftware's website, which later refer to an OpenCL Beta 4 driver:
    "
    ATI Radeon HD 4850
    800 / 625MHz / 512MB
    359.7 / 177.7 MPixel/s (CAL)
    508.7 / 25.52 MPixel/s
    --> OpenCL allows us to achieve even faster performance than CAL, 50% better which is just incredible!

    ATI Radeon HD 5870
    1600 / GHz / 1GB
    912 / 459.478 MPixel/s (CAL)
    1588 / 69.5092 MPixel/s
    --> We see again ~50% gains in OpenCL versus CAL, the compiler doing better than us in optimising the code. Fantastic result!
    "

    But strangely, the website also mentions that there are only emulated results for doubles in OpenCL, which results in very mediocre, NV-like performance.
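
For reference, the ratios implied by the quoted figures can be worked out with a few lines of Python. The numbers are copied from the quote; treating the first figure in each pair as single precision and the second as double precision is an assumption, since the quote does not label them, and the unlabelled rows are assumed to be the OpenCL results.

```python
# Figures (MPixel/s) as quoted above. Which figure is single vs double
# precision, and that the unlabelled rows are OpenCL, are assumptions.
results = {
    "HD 4850": {"CAL": (359.7, 177.7), "OpenCL": (508.7, 25.52)},
    "HD 5870": {"CAL": (912.0, 459.478), "OpenCL": (1588.0, 69.5092)},
}


def gain_percent(new, old):
    """Relative gain of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


for gpu, r in results.items():
    first_gain = gain_percent(r["OpenCL"][0], r["CAL"][0])
    second_slowdown = r["CAL"][1] / r["OpenCL"][1]
    print(f"{gpu}: first figure {first_gain:+.1f}% (OpenCL vs CAL), "
          f"second figure {second_slowdown:.1f}x slower in OpenCL")
```

On these numbers the first-figure gain comes out at roughly 41% on the HD 4850 and 74% on the HD 5870, so "~50%" is a loose summary, while the second figure drops by a factor of about 7 under OpenCL on both cards.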
     
  12. AlexV

    AlexV Heteroscedasticitate
    Moderator Veteran

    Joined:
    Mar 15, 2005
    Messages:
    2,528
    Likes Received:
    107
    ATI's OCL stack doesn't expose DP yet IIRC.
     
  13. Dave Baumann

    SiSoft also make note of that in the table below their results. Double support in OpenCL is not part of the core; it is an extension that isn't supported yet.

    If you run the off-the-shelf drivers at the moment, it will default to their Stream implementation rather than OpenCL, as the CAL interface for OpenCL isn't in there at the moment, plus some of the other OpenCL requirements; if you download the Stream SDK and use the drivers that are supplied there, then it will use CAL. We'll be integrating the correct version of CAL in upcoming drivers, but there may be an additional install required to install the necessary OpenCL libraries to enable OpenCL apps to run without requiring the full SDK on everyone's PCs.
     
  14. CarstenS

    Did you already decide in which of the many upcoming driver releases you'll integrate OpenCL? I mean, all your commitment to open standards should be backed up by making it usable as soon as possible for a wide audience.
     
  15. Arnold Beckenbauer

    The Stream SDK 2.0 beta 4 is installed on my PC, and the OpenCL driver too (9.11 supports OpenCL as well).

    But Sandra doesn't want to use OpenCL. Or is that because it's Sandra Lite?
    The OpenCL samples, pcchen's OpenCL apps, etc. work too.
     
  16. Dave Baumann

    Yes.
     
  17. trinibwoy

    trinibwoy Meh
    Legend

    Joined:
    Mar 17, 2004
    Messages:
    10,435
    Likes Received:
    440
    Location:
    New York
    So is their terminology here wrong? "We see again ~50% gains in OpenCL versus CAL, the compiler doing better than us in optimising the code. Fantastic result!"

    It's supposed to be Stream vs OpenCL, right? With both interfacing with CAL as the backend?

    Also, why don't I have the option to run the bench through OpenCL? I'm running the latest FW 195.55 beta. This is really confusing: the press release refers to Sandra 2010 with OpenCL support, but all download links are for the 2009 version with the old Stream/CUDA benchmarks.

    Sigh....

    http://support.sisoftware.co.uk/knowledgebase.php?article=2

    Now that says 2009 SP5 is required, but I can't find SP5 anywhere (or 2010, for that matter).
     
  18. Arnold Beckenbauer
