|
|
#1 |
|
Dinosaur Hunter
|
Hey all. As part of my grad research, I'm going to eventually need to write a Poisson solver. Now, I've done this a million times for a regular old Pentium in C, but I got to thinking...we mostly just use SLOR (successive line over-relaxation), which is a bunch of (mathematically) simple vector operations, so it seems like this is something you could program a GPU to do. However, I know exactly jack diddly squat about how to tell a GPU to do anything, especially since I use g++ on Linux. I have Visual Fortran on my Windows box, but I'm not sure I could use DirectX.
So the question: a) Is this feasible? b) If yes, can you point me to where I can learn how to program a GPU? Any such resource would preferably include a lot of mathematical explanation.
__________________
Don't vote; it just encourages them. |
|
|
|
|
|
#2 |
|
hardware monkey
Join Date: Mar 2007
Posts: 3,904
|
Perhaps you should look into Cg or CUDA or CTM.
|
|
|
|
|
|
#3 |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
Not familiar with the algorithm--could you post some links on the topic?
Some basic links to start: CUDA homepage, the best place to learn about CUDA, BrookGPU, and GPGPU.org, which could have some helpful links or info in their forums.
Basically, you want to avoid DX/OGL because then you're potentially screwed with every new driver revision. CUDA lets you write for G8x/G9x without touching 3D APIs. Brook is the progenitor of CUDA (I think that's fair to say), and it supports general D3D9 codepaths as well as AMD's CTM (which lets you skip 3D APIs as well) for R5x0 (don't know if it supports R6x0 yet). There's RapidMind as well, which would let you target Cell, but I don't know if that's really necessary here (it's not free, for one). The rumor goes that AMD is announcing something at Supercomputing 07 that will let you target multi-core CPUs as well as GPUs, but who knows when that will come out, how good it will be initially, what it will actually support, etc. At the moment, Brook is your best choice for AMD chips, and CUDA is your best bet for NVIDIA cards.
(Okay, token CTM explanation: write your app in HLSL, compile using the CTM compiler, and then you can call that without going through 3D APIs. And then if you want anything outside of the D3D9 spec--which is a lot, in terms of GPGPU--it's time to monkey with assembly. This is why nobody talks about CTM anymore.)
Okay, now let's talk about where things can go terribly, terribly wrong.
First, expressing your problem in the first place! The basic idea is that instead of writing your problem sequentially (obviously) or in terms of the usual task-oriented parallel model you see (CPU 1, go run function X; CPU 2, run function Y; etc.), you need to have a data-parallel formulation for your algorithm. Essentially, you need to have a relatively branch-free way of running your algorithm on a single piece of your data set at a time. This can be really tricky and just doesn't work for a lot of things. This is why I linked the UIUC course above--it makes a serious effort to explain how to do that.
Second, branching is bad. Don't do it if you like performance, except on a very coarse level (you'll see that if you look at the CUDA stuff).
Third, copying from the CPU to the GPU is expensive. If you constantly need to synchronize with the CPU in order to run other stuff, things could get ugly.
Fourth, and kind of the fundamental assumption: you need to have enough arithmetic intensity to hide latency from memory accesses. This isn't like a CPU, where you might lose twenty cycles or so waiting on a memory access. You lose hundreds, and if you're constantly waiting on memory accesses, your performance will be worse than a recent CPU.
Finally, and I am dumb for forgetting to mention this beforehand, you might just be able to use the CUBLAS library if you're really just doing simple matrix/vector ops with large matrices. (Five bucks that mhouston will show up and call me dumb--that's okay, I'd appreciate it!) |
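To make the "data-parallel formulation" point concrete, here is a minimal CPU-side sketch in plain C++, not GPU code. It assumes a 2D five-point Poisson discretization on a square grid with fixed boundary values, and it uses red-black ordering, which is a swapped-in technique chosen for illustration and not something the thread specifies. The names `Grid`, `relaxed_value`, and `sweep_color` are hypothetical. The thing to notice is that every point of one color reads only points of the other color, so all updates within a color are independent and could each be handed to a separate GPU thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major n x n grid; the outermost ring holds fixed boundary values.
struct Grid {
    std::size_t n;
    std::vector<double> v;
    explicit Grid(std::size_t size, double init = 0.0) : n(size), v(size * size, init) {}
    double& at(std::size_t i, std::size_t j) { return v[i * n + j]; }
    double at(std::size_t i, std::size_t j) const { return v[i * n + j]; }
};

// Per-point update: reads the four neighbors of (i, j) and returns the
// over-relaxed value. On a GPU, this body is roughly what one thread would run.
double relaxed_value(const Grid& u, const Grid& f, std::size_t i, std::size_t j,
                     double h, double omega) {
    const double gauss_seidel = 0.25 * (u.at(i - 1, j) + u.at(i + 1, j) +
                                        u.at(i, j - 1) + u.at(i, j + 1) -
                                        h * h * f.at(i, j));
    return (1.0 - omega) * u.at(i, j) + omega * gauss_seidel;
}

// Update every interior point with (i + j) % 2 == color (0 or 1).
// The starting column is computed arithmetically, so the inner loop has no branch.
void sweep_color(Grid& u, const Grid& f, double h, double omega, std::size_t color) {
    assert(u.n == f.n && u.n >= 3 && color < 2);
    for (std::size_t i = 1; i + 1 < u.n; ++i) {
        const std::size_t first_j = 1 + (i + 1 + color) % 2;
        for (std::size_t j = first_j; j + 1 < u.n; j += 2) {
            u.at(i, j) = relaxed_value(u, f, i, j, h, omega);
        }
    }
}
```

One full iteration is `sweep_color(u, f, h, omega, 0)` followed by `sweep_color(u, f, h, omega, 1)`. Each point does only a handful of arithmetic operations per five memory reads, which is the low-arithmetic-intensity situation the fourth point warns about.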
|
|
|
|
|
#4 |
|
Member
Join Date: Oct 2006
Posts: 214
|
At first glance it seems that the Poisson equations are somewhat similar to the Navier-Stokes equations (both being complex PDEs). If that's the case, this chapter from GPU Gems 3 has a complete writeup of how to solve it.
|
|
|
|
|
|
#5 |
|
Dinosaur Hunter
|
Although I would question the formal accuracy of the methods they're using in the above presentation (one can get qualitatively good-looking solutions w/o actually being all that accurate), they're computationally doing the same general sort of stuff. And if that's all being done on the GPU, there's no reason we need to do anything on the CPU, either.
We don't do branching, just lots and lots of for loops on matrices and vectors.
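For anyone who hasn't seen SLOR, here is a minimal sketch of what one sweep of "for loops on matrices and vectors" can look like. It assumes a 2D five-point discretization of the Poisson equation on a square, row-major n x n grid with fixed boundary values; the function name `slor_sweep` and its parameters are illustrative assumptions, not our actual research code. Each interior row is solved as a tridiagonal system with the Thomas algorithm and then over-relaxed by `omega`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One SLOR sweep. For each interior row i, solve
//   -x[k-1] + 4*x[k] - x[k+1] = u[i-1][j] + u[i+1][j] - h*h*f[i][j],  j = k + 1,
// for the whole row at once, then blend with the old values using omega.
void slor_sweep(std::vector<double>& u, const std::vector<double>& f,
                std::size_t n, double h, double omega) {
    assert(n >= 3 && u.size() == n * n && f.size() == n * n);
    const std::size_t m = n - 2;        // unknowns per row
    std::vector<double> c(m), d(m);     // Thomas algorithm scratch space
    for (std::size_t i = 1; i + 1 < n; ++i) {
        // Right-hand side from the rows above and below plus the source term.
        for (std::size_t k = 0; k < m; ++k) {
            const std::size_t j = k + 1;
            d[k] = u[(i - 1) * n + j] + u[(i + 1) * n + j] - h * h * f[i * n + j];
        }
        d[0] += u[i * n];               // known left boundary value
        d[m - 1] += u[i * n + n - 1];   // known right boundary value

        // Forward elimination (sub- and super-diagonal are -1, diagonal is 4).
        double c_prev = 0.0, d_prev = 0.0;
        for (std::size_t k = 0; k < m; ++k) {
            const double denom = 4.0 + c_prev;
            c[k] = -1.0 / denom;
            d[k] = (d[k] + d_prev) / denom;
            c_prev = c[k];
            d_prev = d[k];
        }

        // Back substitution, over-relaxing each point as its line value is found.
        double x = 0.0;
        for (std::size_t k = m; k-- > 0;) {
            x = d[k] - c[k] * x;
            double& old_value = u[i * n + k + 1];
            old_value = (1.0 - omega) * old_value + omega * x;
        }
    }
}
```

The loops over `k` inside a row carry a dependency from one element to the next, so the tridiagonal solve is sequential along the line; on a GPU the parallelism would have to come from handling many rows at once, for example alternating even and odd rows.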
Last edited by fearsomepirate; 06-Nov-2007 at 22:41. |
|
|
|
|
|
#6 | |
|
Senior Member
Join Date: Mar 2006
Posts: 1,687
|
Quote:
|
|
|
|
|
|
|
#7 | |
|
Member
Join Date: Oct 2006
Posts: 214
|
From the CUDA talk at SC07 over the weekend in this presentation: http://www.gpgpu.org/sc2007/SC07_CUDA_3_Libraries.pdf
Quote:
|
|
|
|
|
|
|
#8 |
|
Member
Join Date: May 2002
Location: Austria
Posts: 699
|
I wrote a GPU-based multigrid Poisson solver using OpenGL as part of my master's thesis (slightly over a year ago). Except for some (expected) inefficiencies at coarse grid levels, multigrid methods are very well suited to GPUs.
(Edit: sorry for the very late bump, I just now took a look at the dates of the previous replies. I have recently posted mostly on high traffic forums so I didn't expect a thread on the first page to be many months old) |
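As a rough illustration of why multigrid maps well to GPUs and why the coarse levels are the weak spot, here is a minimal C++ sketch of one multigrid building block: full-weighting restriction from a fine grid to a coarse grid. This is not the thesis code; the function name `restrict_full_weighting`, the row-major layout, and the grid-size convention (fine size = 2 * coarse size - 1) are assumptions made for the example. Every coarse point is computed independently from nine fine points, so the operation is data-parallel, but each level has only about a quarter as many points as the one above it, so the coarsest levels offer very little parallel work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Restrict a row-major nf x nf fine-grid field (nf odd) to the coarse grid with
// nc = (nf + 1) / 2 points per side. Interior coarse points get the standard
// 1/16 * [1 2 1; 2 4 2; 1 2 1] weighted average; boundary points are left at 0,
// which suits a residual with fixed (Dirichlet) boundaries.
std::vector<double> restrict_full_weighting(const std::vector<double>& fine,
                                            std::size_t nf) {
    assert(nf >= 3 && nf % 2 == 1 && fine.size() == nf * nf);
    const std::size_t nc = (nf + 1) / 2;
    std::vector<double> coarse(nc * nc, 0.0);
    for (std::size_t ci = 1; ci + 1 < nc; ++ci) {
        for (std::size_t cj = 1; cj + 1 < nc; ++cj) {
            const std::size_t i = 2 * ci, j = 2 * cj;   // matching fine-grid point
            const double center = fine[i * nf + j];
            const double edges = fine[(i - 1) * nf + j] + fine[(i + 1) * nf + j] +
                                 fine[i * nf + j - 1] + fine[i * nf + j + 1];
            const double corners = fine[(i - 1) * nf + j - 1] + fine[(i - 1) * nf + j + 1] +
                                   fine[(i + 1) * nf + j - 1] + fine[(i + 1) * nf + j + 1];
            coarse[ci * nc + cj] = (4.0 * center + 2.0 * edges + corners) / 16.0;
        }
    }
    return coarse;
}
```

Starting from a 257 x 257 grid under this convention, the levels are 257, 129, 65, 33, 17, 9, 5, and 3 points per side, so the last few levels have far fewer points than a GPU has threads to keep busy.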
|
|
|
|
|
#9 | |
|
Meh
Join Date: Mar 2004
Location: New York
Posts: 9,809
|
Quote:
__________________
What the deuce!? |
|
|
|
|
|
|
#10 |
|
Senior Member
Join Date: Aug 2004
Posts: 2,454
|
I am not as knowledgeable as you guys, but if you had an SLI setup, could your code (assuming it can be parallelized using CUDA) take advantage of what basically amounts to two processors, since that is what SLI technically provides?
|
|
|
|
|
|
#11 |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
An SLI device appears to the system as one GPU, so running a CUDA app on an SLI setup will only use one chip. AFR and SFR don't make sense in the context of CUDA either. However, depending on your algorithm, it is often possible to write code that scales to multiple GPUs--you just can't have them in an SLI setup.
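As a sketch of what "scales to multiple GPUs" can mean for a grid solver, one common approach is to split the grid into contiguous slabs of rows and give each device one slab. The C++ below shows only the partitioning arithmetic; it makes no CUDA calls, and `RowRange` and `split_rows` are hypothetical names invented for this example. In a real solver each device would also need copies of the rows bordering its slab, refreshed after every sweep, and that exchange is where the CPU-GPU copy cost comes back into play.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Half-open range of grid rows [begin, end) owned by one device.
struct RowRange {
    std::size_t begin;
    std::size_t end;
};

// Split `rows` grid rows into `parts` contiguous slabs whose sizes differ by at
// most one row; the first (rows % parts) slabs get the extra row.
std::vector<RowRange> split_rows(std::size_t rows, std::size_t parts) {
    assert(parts > 0);
    std::vector<RowRange> ranges;
    ranges.reserve(parts);
    const std::size_t base = rows / parts;
    const std::size_t extra = rows % parts;
    std::size_t start = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t length = base + (p < extra ? 1 : 0);
        ranges.push_back(RowRange{start, start + length});
        start += length;
    }
    return ranges;
}
```

For example, `split_rows(10, 3)` yields the ranges [0, 4), [4, 7), and [7, 10).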
|
|
|
|
|
|
#12 | |
|
Senior Member
Join Date: Aug 2004
Posts: 2,454
|
Quote:
|
|
|
|
|
|
|
#13 | |
|
chaos dunk
Join Date: May 2003
Location: Mountain View, CA
Posts: 3,274
|
Quote:
|
|
|
|
|