Anyway, as Andy said, the most resource-consuming part of this demo is polygon rendering; I'm not sure Cell would help.
Quote: 'course it would. You can do all the vertex data generation/transformation on Cell, do the culling, and then stream the results to the RSX, which should be up to the task from there on.

Indeed, the SPU->vertex fast path on PS3 could be used to great effect in this demo. However, to be fair, it could easily be made to run faster on the GPU as well via some simple LOD, so it's not as if there's a performance problem there.
Quote: Looks very nice, although the waves etc. are far from fully realistic for a river. But hey, we're in 2007 and not 2070, so I won't complain too much about that!

Yup, for sure it's a *very simple* water model. The entire advection term is dropped from the NS equations, so there's no way you'll get complex effects like turbulence, etc. Basically it just models waves, which works well for ponds and looks decent for lakes, etc., but certainly doesn't realistically model turbulent flow such as rivers. Still, I think it looks kind of cool.
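For reference, here's roughly what dropping the advection term means in standard notation (a sketch of the usual simplification, not the demo's actual formulation):

```latex
% Full incompressible Navier-Stokes momentum equation:
\frac{\partial \mathbf{u}}{\partial t}
  + (\mathbf{u}\cdot\nabla)\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu \nabla^2 \mathbf{u} + \mathbf{g}

% Dropping the nonlinear advection term (u . grad)u leaves a linear
% system; for a height field h this reduces to the classic wave equation,
% which is why the model captures waves but not turbulence:
\frac{\partial^2 h}{\partial t^2} = c^2 \nabla^2 h
```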
Quote: BTW, Andy, did you read this paper?

Yup! That's the paper I was referring to when I said that the hybrid (2D+3D) techniques look pretty promising. I also think SPH is interesting, since it's pretty straightforward to parallelize.
Nice demo! I agree with you that particle-based methods like SPH aren't well suited to large volumes of water, but you can still get some good results on today's GPUs.
Here's a video of the latest particles demo from the CUDA SDK:
http://www.youtube.com/watch?v=RqduA7myZok
It's not quite as pretty as yours, but it gives you an idea of the performance possible.
I think the best fluid simulation for games is going to be achieved by combining the strengths of these and other techniques.
TimothyFarrar said: Seems as if most codes (regardless of whether they employ a uniform grid, spatial hash, or something else for the nearest-particle search) tend to end up in the following form:

for all particles {
    for all particle neighbors {
        gather ID of the neighboring particle
        lookup particle properties in table based on ID  <-- bad texture locality???
        do math
    }
}
Whatever structure you use to accelerate the neighbour search has to be rebuilt periodically. That can be a challenge if you want to do it in parallel, particularly if you want to migrate particles around in memory to help ease the locality issues.
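To make the rebuild/locality point concrete, here's a small sketch of the common approach: sort particles by grid-cell key each rebuild, so that spatially close particles end up contiguous in memory before the gather loop runs. This is illustrative Python (the function names and 2D setup are my own, not from any post above); on a GPU this sort is typically a counting/radix sort.

```python
import math

def build_uniform_grid(positions, cell_size):
    """Rebuild a uniform-grid neighbour structure by sorting particles
    by cell index. Sorting groups nearby particles into contiguous
    memory, easing the locality issue of the gather loop.
    Returns (sorted_positions, cell_map) where cell_map maps a cell
    key to the indices of the particles in that cell."""
    def cell_of(p):
        return (int(math.floor(p[0] / cell_size)),
                int(math.floor(p[1] / cell_size)))
    order = sorted(range(len(positions)), key=lambda i: cell_of(positions[i]))
    sorted_pos = [positions[i] for i in order]
    cell_map = {}
    for idx, p in enumerate(sorted_pos):
        cell_map.setdefault(cell_of(p), []).append(idx)
    return sorted_pos, cell_map

def neighbours(p, sorted_pos, cell_map, cell_size, radius):
    """Gather candidate neighbours of p from the 3x3 block of cells
    around it, then filter by actual distance."""
    cx = int(math.floor(p[0] / cell_size))
    cy = int(math.floor(p[1] / cell_size))
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for idx in cell_map.get((cx + dx, cy + dy), []):
                q = sorted_pos[idx]
                if (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2 <= radius * radius:
                    result.append(idx)
    return result
```

The rebuild is embarrassingly parallel apart from the sort itself, which is why sort-based grids are popular on GPUs.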
Do you mean specifically for the fluid example? I expect it to scale very well, but I'll certainly post results when I get around to messing with it.
If you mean the x86 backend in general, the results so far are often very good... there are a few benchmarks posted on the site IIRC, but it's not uncommon to see something like a 2x speedup on one core over typical C++ code, and 10x+ speedups on 8 cores.
Quote: How can RapidMind automagically scale code (efficiently) among multiple cores? You'd always have to design your code/algorithms to support multiple cores somehow?

It doesn't do it "automatically" per se: it provides an embedded programming model that lets you express computations in a way that can be efficiently parallelized. In the simplest case it's just data parallelism (SIMD), although once you start throwing in gather/scatter, control flow, and collective operations (scan, reduce, etc.) it gets significantly more expressive. Under the hood, RapidMind does a ton of optimizations (auto-vectorizing where it can, lots of fancy memory management, etc.), but the general model is to help the programmer write good parallel code easily rather than to try to infer parallelism from serial code (which is a dead end IMHO).
Certainly, operations such as blocking inter-process communication are intentionally restricted or unsupported to force people to write efficient parallel code, but many applications can be expressed efficiently in such a form, sometimes by using a different, "more parallel" algorithm. Sure, there are algorithms that seem to "resist parallelizing", but I find that for the most part, once you get used to parallel programming models, it becomes almost as natural as writing serial code... then again, maybe I'm just too deep in it now.
If you're interested in RapidMind and how it works, I'd definitely encourage you to check out the web site and sample code. You can also request an evaluation copy to play with.
Quote: Very interesting subject! I recently became interested in something similar: http://www.llvm.org/

Oh neat, I hadn't seen that myself. I don't know how they compare, but I'll certainly check out LLVM in detail when I get the chance: thanks for the link.
Quote: Which can be used to dynamically compile a DSL to different backends too, see for example:

That's funny, because at a Terrasoft "hack-a-thon" early this year, one of the RapidMind guys worked with one of the Mesa guys for a few days to accelerate the programmable shading parts of Mesa too. It was just hacking for a few days, but they managed to get about an 80x speedup on the Cell (PS3, using the SPUs) over the CPU implementation. Apparently it was quite easy to write, too: basically just changing the "interpreter" portion to use RapidMind types, which provides a really simple way to turn an interpreter into a compiler using RapidMind.
BTW, I believe that Apple is currently using LLVM for runtime compiling of software emulated shaders.
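The "change the interpreter's types and it becomes a compiler" trick mentioned above can be sketched in a few lines. This is a hypothetical illustration of the general technique (operator overloading on a recording type), not RapidMind's actual API:

```python
class Sym:
    """A recording value: arithmetic on it builds an expression string
    instead of computing a number. Running an existing 'interpreter'
    function on Sym values makes it emit code rather than execute."""
    def __init__(self, expr):
        self.expr = expr
    def __add__(self, other):
        return Sym(f"({self.expr} + {_as_expr(other)})")
    def __mul__(self, other):
        return Sym(f"({self.expr} * {_as_expr(other)})")

def _as_expr(v):
    return v.expr if isinstance(v, Sym) else repr(v)

# The same "shader" routine works both ways:
def shade(x, y):
    return x * x + y * 2

print(shade(3, 4))                      # interpreted: 17
print(shade(Sym("x"), Sym("y")).expr)   # "compiled": ((x * x) + (y * 2))
```

The emitted expression could then be handed to a backend code generator, which is presumably the kind of staging both RapidMind and LLVM-based shader pipelines exploit.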