GPGPU capabilities in Radeon HD 3000/4000

Discussion in 'GPGPU Technology & Programming' started by Arnold Beckenbauer, Oct 1, 2008.

  1. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,695
    Likes Received:
    651
    Location:
    Germany
    Question: What are CAL Compute Shaders, the new shader type available only on R700 hardware?
    (Current Stream Computing SDK 1.2.0beta)
     
  2. Jawed

    Legend

    Joined:
    Oct 2, 2004
    Messages:
    11,306
    Likes Received:
    1,581
    Location:
    London
    Compute shaders provide an explicit "thread ID" based programming model. They're only programmable with CAL, which is basically a bastard offspring of D3D assembly with lots of machine-specific knobs and features (and backwards compatibility for HD2xxx-onwards GPUs).

    Compute shaders also do away with the "graphics" sense of a kernel, so inputs can no longer include "interpolated attributes" (mirroring vertex attribute interpolation in graphics programming) and the outputs have to be explicit writes ("memexport") to memory locations, instead of outputs into a "virtual render target". Inputs can consist of "memimport" or sampling using the texturing hardware, but with no "vertex attribute interpolation" all sampling operations are forced to be dependent, i.e. computed based on thread ID and whatever else the programmer decides.
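
    To make the "everything is derived from the thread ID" point concrete, here is a minimal CPU-side sketch in plain Python. It is not CAL or IL code; the names kernel and launch and the reversed-read pattern are invented purely for illustration. Each "thread" receives nothing but its ID, computes its own read address (a dependent fetch) and does an explicit write to an output buffer, which stands in for the role "memexport" plays above.

```python
# CPU-side simulation only. The names and the access pattern are hypothetical;
# this is not CAL/IL and makes no claim about the real instruction set.

def kernel(thread_id, src, dst):
    """One 'thread': every address it touches is computed from thread_id."""
    n = len(src)
    read_addr = (n - 1) - thread_id   # dependent fetch: address is computed, not interpolated
    value = src[read_addr] * 2.0      # some arbitrary per-thread work
    dst[thread_id] = value            # explicit write to a memory location


def launch(num_threads, src):
    """Run the kernel once per thread ID (a GPU would run these in parallel)."""
    dst = [0.0] * num_threads
    for tid in range(num_threads):
        kernel(tid, src, dst)
    return dst


src = [1.0, 2.0, 3.0, 4.0]
out = launch(len(src), src)
print(out)
```

    Nothing is handed to the kernel by an interpolator or a render target; the only implicit input is the thread ID.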

    The explicit thread ID model also forms the basis of the "data share" mechanism in CAL. Here any "thread" can write data to any of its own 64 vec4 (128-bit) locations. Then any other thread can read any of these 64 locations. So it's a sort of broadcast model without any explicit destination. Think of it as "write private/read public", which requires explicit synchronisation by the programmer. The Local Data Share and Global Data Share memories in RV7xx are where this action happens. I guess it involves a fair amount of juggling, moving LDS/GDS to/from video memory, and therefore involves a fair amount of latency-hiding, similar to the way GPUs hide the latency of texturing.
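
    A rough way to picture "write private/read public" is the toy model below, again in plain Python rather than anything CAL-specific. The DataShare class, its methods and the two-phase loop are assumptions made for illustration; only the figure of 64 vec4 slots per thread comes from the description above. The gap between the two loops stands in for the explicit synchronisation the programmer has to provide.

```python
# Toy model of "write private / read public". Class and method names are
# hypothetical; only the 64-slots-per-thread figure comes from the post.

SLOTS_PER_THREAD = 64  # 64 vec4 (128-bit) locations owned by each thread


class DataShare:
    def __init__(self, num_threads):
        zero = (0.0, 0.0, 0.0, 0.0)
        self.slots = [[zero] * SLOTS_PER_THREAD for _ in range(num_threads)]

    def write(self, own_tid, slot, vec4):
        """A thread can only name a slot index; the row is always its own."""
        self.slots[own_tid][slot] = tuple(vec4)

    def read(self, any_tid, slot):
        """Any thread may read any other thread's slots."""
        return self.slots[any_tid][slot]


def run(num_threads):
    share = DataShare(num_threads)
    # Phase 1: every thread publishes a value into its own slot 0.
    for tid in range(num_threads):
        share.write(tid, 0, (float(tid), 0.0, 0.0, 0.0))
    # Explicit synchronisation point: no reads until all writes are done.
    # Phase 2: every thread reads its right-hand neighbour's slot 0.
    results = []
    for tid in range(num_threads):
        neighbour = (tid + 1) % num_threads
        results.append(share.read(neighbour, 0)[0])
    return results


neighbour_values = run(4)
print(neighbour_values)
```

    Note that the writer never names a destination thread; readers decide whose slots they look at, which is the "broadcast without explicit destination" idea.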

    Overall, CAL compute shaders could be described as a CUDA-isation in terms of explicit thread ID based programming and the explicit use of shared memory. Or it could be the model that's been drawn up for D3D11 compute shaders. Or maybe just a significant portion of it. Note that RV7xx's shared memory model isn't the same as CUDA's. CUDA allocates a fixed-size block of memory to be shared by all warps extant in a multiprocessor (thread block). So with fewer warps each thread has more memory to use. And all threads can write to all locations. But data cannot be shared with warps on other multiprocessors or in other clusters. That requires the programmer to do a separate write/read via video memory.
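
    The "fewer warps, more memory per thread" point is just division, so a back-of-envelope sketch may help. The 16 KB shared memory size per multiprocessor and the warp size of 32 are assumptions about CUDA hardware of that era and are not stated in the post; the RV7xx figure simply multiplies out the 64 vec4 locations mentioned above. The function and variable names are made up for illustration.

```python
# Back-of-envelope arithmetic only. The CUDA hardware numbers are assumptions
# (see the lead-in); the RV7xx number is 64 vec4 slots times 16 bytes each.

SHARED_BYTES_PER_SM = 16 * 1024   # assumed: 16 KB shared memory per multiprocessor
WARP_SIZE = 32                    # assumed: threads per CUDA warp
VEC4_BYTES = 16                   # one vec4 = 128 bits
RV7XX_SLOTS_PER_THREAD = 64       # from the post: 64 vec4 locations per thread


def cuda_shared_bytes_per_thread(resident_warps):
    """Even split of the pooled shared memory across all resident threads."""
    return SHARED_BYTES_PER_SM // (resident_warps * WARP_SIZE)


cuda_share = {warps: cuda_shared_bytes_per_thread(warps) for warps in (2, 8, 16)}
rv7xx_share = RV7XX_SLOTS_PER_THREAD * VEC4_BYTES

for warps, nbytes in cuda_share.items():
    print(f"CUDA, {warps:2d} warps resident: {nbytes} bytes per thread")
print(f"RV7xx data share: {rv7xx_share} bytes owned per thread")
```

    Under these assumptions the CUDA share per thread shrinks as more warps become resident, while the per-thread allocation described for RV7xx stays fixed.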

    Brook+ already exposes thread IDs and allows "threads" to exchange data, as well as hiding the "graphics-ness" of GPGPU programming from the programmer.

    I suppose the changes in RV7xx architecture will increase the efficiency of threaded Brook+ programming. But progress on Brook+ is very slow, and I can imagine OpenCL and D3D11 will gain the lion's share of AMD's internal software engineering resources.

    Brook programming was originally about a pure streaming model of computation, with no ability to access and manipulate thread IDs or to share data across threads. CUDA's main break with Brook was to abandon that pure streaming model as being too restrictive. AMD has basically come to the same realisation in Brook+ and is now on the second iteration of supporting this functionality directly in hardware. D3D11 Compute Shader is also based on that realisation. So one way or another AMD had no choice and I suspect CAL compute shader is either a preview of D3D11 CS or is a major step in that direction.

    Jawed
     
  3. ahu

    ahu
    Newcomer

    Joined:
    Jul 19, 2008
    Messages:
    56
    Likes Received:
    2
    Wow. Excellent piece of information there, thanks Jawed!

    One might clarify though, that Brook+ doesn't currently expose the data share, only CAL does. Nor does it allow access to the whole GPU video memory like CAL does.
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Thanks Jawed, that was really helpful and informative
     
  5. Rufus

    Newcomer

    Joined:
    Oct 25, 2006
    Messages:
    246
    Likes Received:
    61
    And this is a perfect example of why choices aren't always a good thing. AMD really needs to focus on one language, put a bunch of effort behind it, and support it from here on out. CTM being dropped for CAL and now CAL compute shaders, with Brook and Brook+ on the side, is just too confusing.

    CTM is already dead. What's the chance that any of the rest survive after OpenCL comes out?
     
  6. Dave Baumann

    Dave Baumann Gamerscore Wh...
    Moderator Legend

    Joined:
    Jan 29, 2002
    Messages:
    14,090
    Likes Received:
    694
    Location:
    O Canada!
    CTM and CAL are not languages, they are interfaces. CTM was just somewhat lower level, while CAL abstracts a little more because it is intended to be more portable across generations. Irrespective of whether there is Brook+ or, later on, OpenCL, CAL will exist as the interface that those compilers will sit upon to access the hardware. But right now you do not program "in CAL": you program in IL if you want low-level access, or you use Brook+ or a third-party toolset from the likes of RapidMind or others for higher-level access.
     
  7. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,695
    Likes Received:
    651
    Location:
    Germany
    Too much INPUT...Error...Reset

    Thx.
     
  8. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,695
    Likes Received:
    651
    Location:
    Germany
  9. mhouston

    mhouston A little of this and that
    Regular

    Joined:
    Oct 7, 2005
    Messages:
    344
    Likes Received:
    38
    Location:
    Cupertino
    That is correct. 46XX supports all of the extended compute stuff the 48XX boards have except for double precision.
     
  10. itaru

    Newcomer

    Joined:
    May 27, 2007
    Messages:
    156
    Likes Received:
    15
    http://www.bjorn3d.com/read.php?cID=1408
    ATI Stream Technology

    http://hothardware.com/News/ATI-Stream-Computing-Update/
    AMD ATI Stream Computing Update

    Starting with the Catalyst 8.12 driver release (tentatively due Dec. 10), ATI Stream related software will be built into the driver suite.
    With the Catalyst 8.12 release, every user of an ATI Radeon HD 4000 series card automatically gains the ability to run ATI Stream-enabled applications.
    AMD is also releasing a brand new version of the free AVIVO video converter that's accelerated by the GPU.
    AMD also pointed out that they have been working with companies like Cyberlink, ArcSoft, Microsoft, Adobe, and others
    to deliver ATI Stream-enabled applications in the coming months.

    http://ati.amd.com/technology/streamcomputing/stream-consumer.html

    Accelerate your digital world with ATI Stream Technology
    In December 2008, AMD is scheduled to release an update to its ATI Catalyst drivers,
    software version 8.12, that instantly unlocks new ATI Stream acceleration capabilities already
    built into millions of ATI Radeon graphics cards.
     
    #10 itaru, Nov 13, 2008
    Last edited by a moderator: Nov 13, 2008
  11. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    More details on it here. They'll be shipping a free GPU-accelerated transcoder, which beats the Badaboom converter. It'll be interesting to see if somebody (NVIDIA?) will come out with a free transcoder as well in competition.
     
  12. Arnold Beckenbauer

    Veteran Subscriber

    Joined:
    Oct 11, 2006
    Messages:
    1,695
    Likes Received:
    651
    Location:
    Germany
    I don't understand this: With 8.12, the Catalyst package will include the CAL DLLs. So it will no longer be necessary for every GPGPU application (SiSoft Sandra or F@H) to bring its "own" CAL DLLs?
     
  13. wingless

    Newcomer

    Joined:
    Aug 5, 2007
    Messages:
    79
    Likes Received:
    0
    Location:
    Houston, Texas
    I cannot wait for Cat 8.12! 2009 will be a good year for ATI GPGPU.
     
  14. Davros

    Legend

    Joined:
    Jun 7, 2004
    Messages:
    16,872
    Likes Received:
    4,195
    I am so going to hold you to that...
     
  15. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    crap :(
     
  • About Us

    Beyond3D has been around for over a decade and prides itself on being the best place on the web for in-depth, technically-driven discussion and analysis of 3D graphics hardware. If you love pixels and transistors, you've come to the right place!

    Beyond3D is proudly published by GPU Tools Ltd.