http://www.techreport.com/onearticle.x/11929
Some interesting comments on CUDA. Sounds like the architecture is fairly complex.
Certainly. But this is what libraries are for.
CUDA more complex than CELL? Yeah... sure... fine... whatever.
If you combine the various comments around the net, you will conclude that:
- CUDA is harder than CELL.
- CUDA is very hard to get more than 10% efficiency out of.
- CUDA has advanced memory hierarchy optimization requirements.
No offense intended to any of these people, but based on my experience I would tend to believe that every single one of these statements is a gigantic bunch of bullshit...
We have plenty of previous examples of hardware that failed to live up to their early marketing promise, from the i860 to the PS3. CUDA looks set to follow in their footsteps: I expect that it will take vast amounts of work for programmers to get halfway decent performance out of a CUDA application, and that few will achieve more than 10% of theoretical peak performance.
Bryan O'Sullivan has a beautiful summary of the present state of NVIDIA's CUDA. He explains the programming model, along with the many different levels of memory and their restrictions (there are many). I had been quite optimistic in my last post about CUDA (just from taking a quick glance at their source code), but Bryan's very educated opinion brought me back to earth.
I think that O'Sullivan's point at the end, about how only people on Wall Street and in the defense sector will love CUDA because they can commit the developer resources to learning it intimately, is well-taken, but I don't see this as a criticism of the technology. To switch subjects for a moment and talk about CTM, the fact that AMD/ATI just opened up the assembly language interface to their GPUs and told people "have at it" is essentially an admission that they're currently only pitching it to parties who truly need this kind of performance and are willing to pay for it in programmer time. The same is almost certainly true of CUDA at this stage.
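For readers who haven't looked at it yet, the programming model O'Sullivan describes boils down to something like the sketch below: the host allocates device (global) memory, copies data over, launches a grid of thread blocks, and copies results back. This is only a minimal illustrative example; the kernel name, sizes and launch configuration here are made up, not taken from his post.

#include <stdio.h>
#include <cuda_runtime.h>

// Illustrative kernel: one thread per element, working purely on
// registers and global memory.
__global__ void add_one(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // global thread index
    if (i < n)
        data[i] += 1.0f;
}

int main(void)
{
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    float *h = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) h[i] = (float)i;

    float *d;
    cudaMalloc((void **)&d, bytes);                  // device (global) memory
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);

    add_one<<<(n + 255) / 256, 256>>>(d, n);         // grid of 256-thread blocks

    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    printf("h[0] = %f\n", h[0]);                     // expect 1.000000

    cudaFree(d);
    free(h);
    return 0;
}

The memory hierarchy restrictions he writes about (registers, shared, constant, texture, global) only really show up once you start optimizing; a sketch like the one above uses nothing but registers and global memory.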
Not sure about your background, Arun, but these guys seem to know their stuff. Just because they are not heaping praise on CUDA doesn't make it BS.
Do you own a G80 board? Have you tried programming in CUDA? If not, why not? Why go by blogs when real experience is so cheaply had?
Programming in CUDA is what... a £250 purchase plus a free download of a beta compiler and SDK? I have one now, I'm using it, and it's great. For £250. Despite IBM repeatedly proclaiming their undying love for me and my colleagues, and wanting our millions, they have yet to open their wallet and allow me similarly cheap access to any Cell hardware ($17,000 was their best offer), or to a ClearSpeed board plugged into one of their Opteron boxen ("you can give us some code and we'll run it for you, maybe", ha ha).
CUDA has many flaws, partly because, flexible as it is, G80 isn't as flexible as a lazy programmer might wish. There are many reasons for this. However, of the possible solutions (CUDA/CTM, Cell, ClearSpeed), the GPU-based ones are by far the most accessible to the wider developer community. Therefore they win, regardless of anything else. CUDA isn't finished yet, and neither is G80/90/1xx, but they're here, now, and cheap.
BTW, if these kids playing with CELL and CUDA are scared to use local memory (bless them)... they can simply not use it, at least on CUDA, and live happily ever after.
If they want the power of a GPU combined with the flexibility and easy-to-use 'PC programming model', they can wait... forever.
Extra flexibility comes at a cost, and it always will.
I have far more programming experience on CELL than on CUDA, but it does not take a rocket scientist to understand that it's much, much easier to set up something running decently fast on the latter platform.
Even Cell is complicated to program. I read an article just the other day about developers wanting (or getting) access to IBM Cell engineers so they could better utilize the PS3 hardware.
As I already wrote, with CUDA you're not forced to explicitly make use of on-chip memory; it's up to you to do so.
I hope that even an undergraduate student knows the difference between registers and external memory; if he/she is not able to cope with that, I think the problem does not lie with CUDA or CELL or whatever other platform you want to adopt.
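To make that point concrete, here is a minimal sketch (the kernel names, the 256-thread block size and the scaling operation are invented purely for illustration): the first kernel never touches on-chip shared memory and is perfectly legal CUDA; the second stages the same data through __shared__ as an opt-in optimization.

// Registers + global memory only; no __shared__ anywhere.
__global__ void scale(float *out, const float *in, float k, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = k * in[i];
}

// The same operation, staging data through on-chip shared memory.
// Nothing in CUDA forces you to write it this way; it is purely an optimization.
__global__ void scale_shared(float *out, const float *in, float k, int n)
{
    __shared__ float tile[256];          // assumes 256 threads per block
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];
    __syncthreads();                     // every thread in the block reaches the barrier
    if (i < n)
        out[i] = k * tile[threadIdx.x];
}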
Apart from per-thread register usage, I'm not sure if there are other factors that come into play here, or how this count relates to DX9 or D3D10 usage of G80.
Basically, each multiprocessor (on G80) can support 24 32-thread warps at a time.
I'm talking about how many clock cycles of what is effectively "texturing latency" (setting parameters, fetching from memory, filtering) can be hidden by a single instruction.
Absolutely.
Isn't the whole point of threading to have multiple threads/warps, and multiple non-dependent instructions per thread/warp, available to keep the ALUs busy during IO?
GPU: G80
Multiprocessors per GPU: 16
Threads / Warp: 32
Warps / Multiprocessor: 24
Threads / Multiprocessor: 768
Thread Blocks / Multiprocessor: 8
Total 32-bit registers / Multiprocessor: 8192
Shared Memory / Multiprocessor: 16384 bytes
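Putting those numbers together with the latency-hiding discussion above, here is a rough back-of-the-envelope calculation (it ignores G80's register allocation granularity, so treat it as a sketch rather than an exact figure):

8192 registers / 768 resident threads ≈ 10 registers per thread
  -> a kernel has to stay around 10 registers per thread to keep all 24 warps resident.
A kernel using 16 registers per thread: 8192 / 16 = 512 threads = 16 warps, i.e. 16/24 ≈ 67% of the maximum.
Fewer resident warps means fewer independent instructions available to hide texturing/memory latency.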