CUDA 4.0 Announced

Discussion in 'GPGPU Technology & Programming' started by B3D News, Feb 28, 2011.

  1. B3D News

    B3D News Beyond3D News
    Regular

    Joined:
    May 18, 2007
    Messages:
    440
    Likes Received:
    1
    NVIDIA today announced its latest release of CUDA. We examine the upcoming features and how they might pertain to you. Can we expect a CUDA revolution with this release? Read on to find out.

    Read the full news item
     
  2. Arun

    Arun Unknown.
    Moderator Legend Veteran

    Joined:
    Aug 28, 2002
    Messages:
    5,002
    Likes Received:
    231
    Location:
    UK
    In theory the most interesting feature for beginner/casual CUDA developers who use a single GPU looks to be the automatic performance analyzer (unless you count Thrust, which was available before as a library anyway). I'll be curious to see whether it's any good in practice though :)
    (btw, grats on your first lengthy news post!)
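
For readers who have not used Thrust, here is a minimal sketch of what it looks like. The values and variable names are made up for illustration; only the Thrust containers and algorithms themselves are real library features.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>
#include <cstdio>

int main() {
    // Fill a host vector with a few unsorted, illustrative values.
    thrust::host_vector<int> h(4);
    h[0] = 7; h[1] = 3; h[2] = 9; h[3] = 1;

    // Assigning to a device_vector copies the data into GPU memory.
    thrust::device_vector<int> d = h;

    // Both algorithms run on the GPU because they receive device iterators.
    thrust::sort(d.begin(), d.end());
    int sum = thrust::reduce(d.begin(), d.end(), 0);

    // Reading d[0] from host code triggers a small device-to-host copy.
    int smallest = d[0];
    std::printf("sum=%d smallest=%d\n", sum, smallest);

    // Exit code 0 only if the GPU results match the values computed by hand.
    return (sum == 20 && smallest == 1) ? 0 : 1;
}
```

Built with nvcc, this needs no explicit kernel or cudaMemcpy call, which is the main appeal for casual single-GPU use.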
     
  3. Dade

    Newcomer

    Joined:
    Dec 20, 2009
    Messages:
    206
    Likes Received:
    20
    Peer-to-Peer communication between GPUs looks like a very interesting feature too.
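
A rough sketch of how a GPU-to-GPU copy can be requested through the runtime API follows. It assumes a machine with at least two CUDA devices, numbered 0 and 1; the buffer size, fill pattern, and the CHECK macro are illustrative choices, not anything taken from the announcement.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        std::printf("This sketch needs at least two CUDA devices.\n");
        return 0;
    }

    // Ask whether device 0 can directly address memory on device 1.
    int canAccess = 0;
    CHECK(cudaDeviceCanAccessPeer(&canAccess, 0, 1));
    std::printf("device 0 -> device 1 peer access: %s\n",
                canAccess ? "yes" : "no");

    const size_t bytes = 1 << 20;  // 1 MiB, arbitrary
    unsigned char *src = nullptr, *dst = nullptr;

    // Source buffer lives on device 0 and is filled with a known pattern.
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&src, bytes));
    CHECK(cudaMemset(src, 0x5A, bytes));
    if (canAccess) {
        // Enable access from the current device (0) to device 1; flags must be 0.
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
    }

    // Destination buffer lives on device 1.
    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&dst, bytes));

    // cudaMemcpyPeer is valid with or without peer access enabled.
    CHECK(cudaMemcpyPeer(dst, 1, src, 0, bytes));
    CHECK(cudaDeviceSynchronize());

    // Read one byte back to confirm the pattern arrived on device 1.
    unsigned char first = 0;
    CHECK(cudaMemcpy(&first, dst, 1, cudaMemcpyDeviceToHost));
    std::printf("first byte on device 1: 0x%02X\n", first);

    CHECK(cudaFree(dst));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(src));
    return first == 0x5A ? 0 : 1;
}
```

Whether the copy actually travels directly over PCIe depends on the hardware and driver; the call itself is the same either way.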
     
  4. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    Unified mem space is HUGE.

    Lots of <3 to nv. :)

    Any guesses how they might be doing it? Mmap the entire mem region and mark it as uncacheable? Will that be sufficient?
     
  5. Tim Murray

    Tim Murray the Windom Earle of mobile SOCs
    Veteran

    Joined:
    May 25, 2003
    Messages:
    3,278
    Likes Received:
    66
    Location:
    Mountain View, CA
    You can't directly dereference GPU pointers from the CPU.
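
To illustrate the distinction, here is a minimal sketch assuming a device that reports unified addressing. The host code never dereferences the device pointer; it only passes it to runtime calls, which can infer the copy direction from the pointer values via cudaMemcpyDefault. The kernel name addOne, the array size, and the CHECK macro are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical kernel: only device code dereferences the device pointer.
__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.unifiedAddressing) {
        std::printf("Device 0 does not report unified addressing.\n");
        return 0;
    }

    const int n = 256;
    int host[n];
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    CHECK(cudaMalloc((void **)&dev, n * sizeof(int)));

    // With unified addressing the runtime works out the direction itself.
    CHECK(cudaMemcpy(dev, host, n * sizeof(int), cudaMemcpyDefault));

    addOne<<<(n + 127) / 128, 128>>>(dev, n);
    CHECK(cudaGetLastError());

    CHECK(cudaMemcpy(host, dev, n * sizeof(int), cudaMemcpyDefault));

    // Note: writing `dev[0]` here on the CPU would be invalid; the pointer
    // is only meaningful to the GPU and to CUDA runtime calls.
    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (host[i] == i + 1);
    std::printf("%s\n", ok ? "ok" : "mismatch");

    CHECK(cudaFree(dev));
    return ok ? 0 : 1;
}
```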
     
  6. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    This is very confusing. I thought the meaning of unified address space was that the same pointer can be used by both the CPU and the GPU.

    Will there be functions provided to do the job of TLB for gpu addresses?
     
  7. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,716
    Likes Received:
    89
    Location:
    Taiwan
    I haven't read much into it yet, but I think the unified address space is from the GPU's point of view. That is, multiple GPUs can access memory (a GPU's video memory and mapped host memory) using the same address.
     
  8. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    How can you do that without qpi/ht and without polluting the cpu caches?
     
  9. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,716
    Likes Received:
    89
    Location:
    Taiwan
    To my understanding, the PCIe bus works with FSB/QPI/HT for cache coherence. However, since snooping is still required, if you need better performance (and the CPU only writes to the memory) it's generally better to simply mark it non-cacheable/write-combining.
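
In CUDA terms, the write-combining option corresponds to a flag on pinned host allocations. The following is a sketch under the assumption that the device can map host memory: the CPU only writes to the buffer and the GPU reads it in place. The kernel name scale, the sizes, and the CHECK macro are made up for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

// Illustrative helper: abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            std::fprintf(stderr, "%s failed: %s\n", #call,         \
                         cudaGetErrorString(err_));                \
            std::exit(EXIT_FAILURE);                               \
        }                                                          \
    } while (0)

// Hypothetical kernel: reads mapped host memory, writes device memory.
__global__ void scale(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 2.0f * in[i];
}

int main() {
    // Request host-memory mapping before any other CUDA work is done.
    CHECK(cudaSetDeviceFlags(cudaDeviceMapHost));

    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.canMapHostMemory) {
        std::printf("Device 0 cannot map host memory.\n");
        return 0;
    }

    const int n = 1024;
    const size_t bytes = n * sizeof(float);

    // Pinned, mapped, write-combined: fast for CPU writes, slow for CPU reads.
    float *hostIn = nullptr;
    CHECK(cudaHostAlloc((void **)&hostIn, bytes,
                        cudaHostAllocMapped | cudaHostAllocWriteCombined));
    for (int i = 0; i < n; ++i) hostIn[i] = (float)i;  // CPU writes only

    // Write-combined allocations may have a distinct device address,
    // so query it rather than reusing the host pointer.
    float *devIn = nullptr;
    CHECK(cudaHostGetDevicePointer((void **)&devIn, hostIn, 0));

    float *devOut = nullptr;
    CHECK(cudaMalloc((void **)&devOut, bytes));

    scale<<<(n + 255) / 256, 256>>>(devIn, devOut, n);
    CHECK(cudaGetLastError());

    // Copy results into ordinary host memory for checking.
    float *result = (float *)std::malloc(bytes);
    if (!result) return 1;
    CHECK(cudaMemcpy(result, devOut, bytes, cudaMemcpyDeviceToHost));

    bool ok = true;
    for (int i = 0; i < n; ++i) ok = ok && (result[i] == 2.0f * (float)i);
    std::printf("%s\n", ok ? "ok" : "mismatch");

    std::free(result);
    CHECK(cudaFree(devOut));
    CHECK(cudaFreeHost(hostIn));
    return ok ? 0 : 1;
}
```

The results are checked from a separate, ordinary buffer because reading write-combined memory back on the CPU is typically very slow.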
     
  10. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    I am pretty sure PCIe can't work with FSB/QPI/HT without vendor specific extensions. After all, the PCIe controller in the GPU can't possibly fathom what kind of coherency protocol the CPU is using.
     
  11. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,716
    Likes Received:
    89
    Location:
    Taiwan
    It can't, but the CPU can. For example, if a GPU needs to read from (or write to) a specific memory address, the access has to go through the FSB/QPI/HT. So the CPU can snoop on it, as long as the GPU does not cache the memory on its own.
     
  12. codedivine

    Regular

    Joined:
    Jan 22, 2009
    Messages:
    271
    Likes Received:
    0
    Going slightly off-topic, but does anyone know if the relevant extensions will be made to Nvidia's OpenCL implementation as well?
     
  13. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    That could work. Gpu <-> gpu memory writes could work in a similar manner.

    I still want a true cache coherent interconnect between the two though.
     
  14. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,365
    Likes Received:
    215
    Location:
    NY
    CUDA 4.0 RC is now out for registered developers.
     
  15. rpg.314

    Veteran

    Joined:
    Jul 21, 2008
    Messages:
    4,298
    Likes Received:
    0
    Location:
    /
    The docs haven't been posted. Sigh :(
     
  16. willardjuice

    willardjuice super willyjuice
    Moderator Veteran Alpha Subscriber

    Joined:
    May 14, 2005
    Messages:
    1,365
    Likes Received:
    215
    Location:
    NY
    RC2 is out (as you guys probably know). One of the big changes between RC1 and RC2 is that Nvidia apparently decided to be nicer and allow any Fermi-based GPU to use peer-to-peer transfers (instead of just Tesla cards)!
     
  17. pcchen

    pcchen Moderator
    Moderator Veteran Subscriber

    Joined:
    Feb 6, 2002
    Messages:
    2,716
    Likes Received:
    89
    Location:
    Taiwan
