Old 26-Jun-2007, 03:26   #1
B3D News
Beyond3D News
 
Join Date: May 2007
Posts: 440
NVIDIA CUDA 1.0 released

NVIDIA has released version 1.0 of its CUDA programming framework, with a number of new features including asynchronous kernel calls and 64-bit Linux support.

Old 26-Jun-2007, 03:28   #2
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

I was all excited about writing global mutexes too, until John Stone told us that it wasn't supported by G80.
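For anyone wondering what a global mutex would look like on a part that does have the atomics, here is a minimal sketch. It assumes compute capability 1.1 or later (32-bit atomics on global memory) and uses `__threadfence()`, which only appeared in toolkits later than CUDA 1.0. The names `acquireLock`, `releaseLock`, `countBlocks` and `runMutexDemo` are made up for illustration. Only one thread per block takes the lock, since threads of the same warp spinning on one lock can deadlock.

```cuda
#include <cuda_runtime.h>

// Spin until the mutex word changes from 0 (free) to 1 (held) on our behalf.
__device__ void acquireLock(unsigned int* mutex)
{
    while (atomicCAS(mutex, 0u, 1u) != 0u) { }
    __threadfence();  // see the previous holder's writes before entering
}

// Publish our writes, then mark the mutex as free again.
__device__ void releaseLock(unsigned int* mutex)
{
    __threadfence();
    atomicExch(mutex, 0u);
}

// Each block increments a shared counter with a plain (non-atomic)
// read-modify-write; the mutex is what keeps the updates from colliding.
__global__ void countBlocks(unsigned int* mutex, volatile unsigned int* counter)
{
    if (threadIdx.x == 0) {
        acquireLock(mutex);
        *counter = *counter + 1;
        releaseLock(mutex);
    }
}

// Hypothetical host helper: launches numBlocks blocks, returns the counter.
// Error checking is omitted to keep the sketch short.
unsigned int runMutexDemo(int numBlocks)
{
    unsigned int* dState = 0;  // dState[0] = mutex, dState[1] = counter
    cudaMalloc((void**)&dState, 2 * sizeof(unsigned int));
    cudaMemset(dState, 0, 2 * sizeof(unsigned int));
    countBlocks<<<numBlocks, 32>>>(dState, dState + 1);
    unsigned int result = 0;
    cudaMemcpy(&result, dState + 1, sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(dState);
    return result;
}
```

If the lock works, `runMutexDemo(64)` should come back with 64. Whether this is ever faster than restructuring the problem to avoid the lock is a separate question.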
Old 26-Jun-2007, 04:58   #3
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156

You need to see this as an opportunity. "But, John, I'm going to need you guys to give me an 8600 then so we can cover it the way it deserves. . . "
__________________
"We'll thrash them --absolutely thrash them."--Richard Huddy on Larrabee
"Our multi-decade old 3D graphics rendering architecture that's based on a rasterization approach is no longer scalable and suitable for the demands of the future." --Pat Gelsinger, Intel
". . .its taking us longer than we would have liked to get a [Crossfire game] profiling system out there" --Terry Makedon, ATI, July 2006
"Christ, this is Beyond3D; just get rid of any f**ker talking about patterned chihuahuas! Can the dog write GLSL? No. Then it can f**k off." --Da Boss
Old 26-Jun-2007, 06:19   #4
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Quote:
Originally Posted by Geo View Post
You need to see this as an opportunity. "But, John, I'm going to need you guys to give me an 8600 then so we can cover it the way it deserves. . . "
Er, John Stone is the guy at UIUC who wrote all of their CUDA stuff (and probably knows more about making CUDA apps go fast than anyone who's not on the CUDA team proper).
Old 26-Jun-2007, 08:57   #5
nutball
Senior Member
 
Join Date: Jan 2003
Location: en.gb.uk
Posts: 1,612

Awesome news! Slightly worrying that two parts from the same family of hardware can have differing functionality like that though -- going to make for some compatibility-issues-from-hell situations. Oh well, that's progress I suppose!
__________________
2+2 is not a matter of opinion.
Old 26-Jun-2007, 15:28   #6
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Quote:
Originally Posted by nutball View Post
Awesome news! Slightly worrying that two parts from the same family of hardware can have differing functionality like that though -- going to make for some compatibility-issues-from-hell situations. Oh well, that's progress I suppose!
I really don't think it will. Like I said, the only thing that is different right now is the support for atomic functions, and I still can't really figure out why you'd ever want to use them. Performance is probably completely abysmal, for one, and I would imagine that you could do all of it on the CPU much faster.
Old 26-Jun-2007, 15:41   #7
Geo
Mostly Harmless
 
Join Date: Apr 2002
Location: Uffda-land
Posts: 9,156

And yet they added them to the more recent part. That says something.
Old 26-Jun-2007, 17:17   #8
silent_guy
Senior Member
 
Join Date: Mar 2006
Posts: 2,054

Quote:
Originally Posted by Tim Murray View Post
I really don't think it will. Like I said, the only thing that is different right now is the support for atomic functions, and I still can't really figure out why you'd ever want to use them. Performance is probably completely abysmal, for one, and I would imagine that you could do all of it on the CPU much faster.
At this moment, the only way for multiple blocks to interact with each other is by using really ugly hacks. With atomic functions, it's much easier: you can now implement semaphores and polling loops to align all blocks and restart calculating without the overhead of the CPU having to reissue a kernel.

If you only use __syncthreads() within a block and one atomic function per block for inter-block synchronization, then maybe performance won't be all that bad?
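A rough sketch of that scheme, under some loud assumptions: compute capability 1.1 or later for `atomicAdd` on global memory, a later toolkit for `__threadfence()` (it is not in CUDA 1.0), and a grid small enough that every block is resident on the GPU at once. If some blocks cannot start until others finish, the polling loop never exits. The names `gridBarrier`, `neighbourExchange` and `runBarrierDemo` are invented for the example.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One atomic per block: thread 0 announces arrival and polls the counter,
// while __syncthreads() holds the rest of the block at the barrier.
__device__ void gridBarrier(volatile unsigned int* arrived, unsigned int goal)
{
    __syncthreads();                  // whole block has finished its phase
    if (threadIdx.x == 0) {
        __threadfence();              // make this block's writes visible
        atomicAdd((unsigned int*)arrived, 1u);
        while (*arrived < goal) { }   // poll until every block has arrived
    }
    __syncthreads();                  // release the rest of the block
}

// Phase 1: each block publishes a value. Phase 2: each block reads its
// neighbour's value, which is only safe after the barrier.
__global__ void neighbourExchange(volatile unsigned int* vals,
                                  unsigned int* out,
                                  unsigned int* arrived)
{
    if (threadIdx.x == 0) vals[blockIdx.x] = blockIdx.x + 1;
    gridBarrier(arrived, gridDim.x);
    if (threadIdx.x == 0) out[blockIdx.x] = vals[(blockIdx.x + 1) % gridDim.x];
}

// Hypothetical host helper; keep numBlocks small so all blocks are resident.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned int> runBarrierDemo(int numBlocks)
{
    unsigned int *dVals = 0, *dOut = 0, *dArrived = 0;
    size_t bytes = numBlocks * sizeof(unsigned int);
    cudaMalloc((void**)&dVals, bytes);
    cudaMalloc((void**)&dOut, bytes);
    cudaMalloc((void**)&dArrived, sizeof(unsigned int));
    cudaMemset(dArrived, 0, sizeof(unsigned int));
    neighbourExchange<<<numBlocks, 64>>>(dVals, dOut, dArrived);
    std::vector<unsigned int> out(numBlocks);
    cudaMemcpy(&out[0], dOut, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dVals); cudaFree(dOut); cudaFree(dArrived);
    return out;
}
```

To reuse the barrier several times in one kernel, the counter can be left to grow and the k-th barrier can wait for `k * gridDim.x` arrivals instead of resetting it.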

I had a quick look at the SDK this morning and grepped for 'atomic': they have the histogram64 example, where they use atomics on a 1.1 shader and reduction on a 1.0 shader. It would be nice if someone with an 8600 could try both and compare the execution speeds.
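To make the comparison concrete, here is a simplified sketch of the two approaches. This is not the SDK's histogram64 code, just an illustration of the idea with made-up names (`histAtomic`, `histReduce`, `runHistogram`). The atomic kernel assumes compute capability 1.1 or later; the reduction kernel uses no atomics and simply dedicates one block to each bin, which is easy to follow but rereads the input once per bin.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define NUM_BINS 64
#define THREADS 256   // must be a power of two for the reduction below

// Atomic variant: every thread bumps the global bin counter directly.
__global__ void histAtomic(const unsigned char* data, int n, unsigned int* bins)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&bins[data[i] % NUM_BINS], 1u);
}

// Reduction variant: block b counts only bin b. Each thread keeps a private
// count, then the block sums the counts with a shared-memory tree reduction.
__global__ void histReduce(const unsigned char* data, int n, unsigned int* bins)
{
    __shared__ unsigned int partial[THREADS];
    unsigned int count = 0;
    for (int i = threadIdx.x; i < n; i += blockDim.x)
        if (data[i] % NUM_BINS == blockIdx.x) ++count;
    partial[threadIdx.x] = count;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (threadIdx.x < s) partial[threadIdx.x] += partial[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) bins[blockIdx.x] = partial[0];
}

// Hypothetical host helper: runs one variant and returns the 64 bin counts.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned int> runHistogram(const std::vector<unsigned char>& host,
                                       bool useAtomics)
{
    int n = (int)host.size();
    unsigned char* dData = 0;
    unsigned int* dBins = 0;
    cudaMalloc((void**)&dData, n > 0 ? n : 1);
    cudaMalloc((void**)&dBins, NUM_BINS * sizeof(unsigned int));
    if (n > 0) cudaMemcpy(dData, &host[0], n, cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, NUM_BINS * sizeof(unsigned int));
    if (useAtomics) histAtomic<<<32, THREADS>>>(dData, n, dBins);
    else            histReduce<<<NUM_BINS, THREADS>>>(dData, n, dBins);
    std::vector<unsigned int> bins(NUM_BINS);
    cudaMemcpy(&bins[0], dBins, NUM_BINS * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(dData); cudaFree(dBins);
    return bins;
}
```

Both variants should return identical bins for the same input, so wrapping the two launches in timers would give the kind of comparison asked for here.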
Old 26-Jun-2007, 17:25   #9
Tim Murray
the Windom Earle of GPUs
 
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,277

Quote:
Originally Posted by silent_guy View Post
At this moment, the only way for multiple blocks to interact with each other is by using really ugly hacks. With atomic functions, it's much easier: you can now implement semaphores and polling loops to align all blocks and restart calculating without the overhead of the CPU having to reissue a kernel.
I'm just not convinced that it's going to be faster than just using the CPU to perform global synchronization. I also wonder how it's implemented (whether it costs two memory operations or what).
Old 27-Jun-2007, 05:12   #10
silent_guy
Senior Member
 
Join Date: Mar 2006
Posts: 2,054

Quote:
Originally Posted by Tim Murray View Post
I'm just not convinced that it's going to be faster than just using the CPU to perform global synchronization. I also wonder how it's implemented (whether it costs two memory operations or what).
I assume you're hinting at using L2 caches in the ROPs to prevent full external memory round trips?

As for whether it's faster than CPU-based synchronization: it will probably depend on the number of warps in play. For a smaller number, atomic operations will definitely have a lower overhead than a CPU relaunch (PCIe latency etc.).
For a large number, atomic ops may have too many collisions and eventually the CPU overhead will be smaller. My feeling is that you should be able to go pretty far with atomics before you hit a wall, by having multiple synchronization stages.

Anyway, it's definitely nice to have the option. Since I just want to play around with it, absolute speed is not my top concern: I may buy an 8600 instead of an 8800 just for this feature.
Old 27-Jun-2007, 05:22   #11
armchair_architect
Member
 
Join Date: Nov 2006
Posts: 128

Quote:
Originally Posted by Tim Murray View Post
I'm just not convinced that it's going to be faster than just using the CPU to perform global synchronization. I also wonder how it's implemented (whether it costs two memory operations or what).
No idea if this is how they've implemented it of course, but in the graphics pipeline, the z/stencil tests and color blending are all atomic RMW operations. So this is very similar to something they've optimized heavily before.