Particle rendering and animation (GPU & CPU)

sebbbi

I am interested in the techniques that developers of current games are using for their particle rendering and animation systems. We switched to GPU quad expansion (particle geometry generation), GPU transformation, and GPU particle lighting and shadowing (shadow map sampling at the particle center point) in our new engine, as we are targeting a single console platform only. But for PC and cross-platform titles, choosing the most optimal way to render particles can be difficult: DX9 (all Windows XP computers) does not support a vertex index input semantic or custom vertex buffer data fetch, stream out or memexport, or geometry shaders. Much DX9 hardware also lacks support for texture sampling in the vertex shader, and R2VB is merely an API hack in DX9 (officially supported only by ATI cards). These limitations make it hard to do efficient particle animation, particle quad geometry generation, and particle lighting/shadowing on DX9 hardware.

Do you have a different path for DX9 particle rendering, or have you designed your title with the lowest common denominator in mind?


Some questions I'd like to ask game developers about their particle rendering and animation systems:


Particle animation:
- How complex is your animation system? Are you just simulating simple linear movement and gravity, or are you using artist-defined envelopes (linear, splines, etc.) to control a vast number of particle and emitter animation properties at runtime? If you are using the GPU to process this data, are you storing the animation envelopes in textures or in arrays of constants? (See the envelope sketch after this list.)
- Are you doing collision detection for the particles, or applying various forces to them based on gameplay (gravity fields, explosion shockwaves, wind from character movement, etc.)?
- Do you depth sort your particles? What kind of algorithm are you using? Are you using the last frame's sorting information to speed up your algorithm?
- Are you using the CPU or the GPU to do the animation (or a hybrid approach)? If you are using the CPU, is your animation system multithreaded? Did the complexity of your animation system force you to use the CPU for the animation (or part of it)? If you are using the GPU, what technique are you using to store and fetch the result (R2VB, stream out/memexport, vertex texture sampling, etc.)?
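To illustrate what I mean by an artist-defined envelope, here is a minimal piecewise-linear evaluation sketch (the key structure and names are made up for illustration; a real system would add spline interpolation, per-emitter bindings, and so on):

Code:
#include <vector>

// Hypothetical key format; artist tools would author a list of these per property.
struct EnvelopeKey { float time; float value; };

// Piecewise-linear envelope evaluation at time t (keys sorted by time).
float evaluateEnvelope(const std::vector<EnvelopeKey>& keys, float t)
{
    if (keys.empty()) return 0.0f;
    if (t <= keys.front().time) return keys.front().value;
    if (t >= keys.back().time)  return keys.back().value;
    for (size_t i = 1; i < keys.size(); ++i)
    {
        if (t < keys[i].time)
        {
            float u = (t - keys[i - 1].time) / (keys[i].time - keys[i - 1].time);
            return keys[i - 1].value + u * (keys[i].value - keys[i - 1].value);
        }
    }
    return keys.back().value;
}

On the GPU, the same keys could be baked into one texture row per property and evaluated with a filtered fetch or two, which is the textures-vs-constants tradeoff the question is getting at.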


Particle geometry generation and transformation:
- Are you transforming the particles to post-projection screen space on the GPU or on the CPU?
- Are you generating the particle quads on the GPU or on the CPU?
- What quad generation technique are you using for the different particle types (point particles, rotated particles, stretched particles, frame/texcoord-animated particles, etc.)? (See the corner expansion sketch after this list.)
- Are you mixing techniques depending on particle type (for example, fixed function GPU point sprites for point particles and the CPU for all other types)?
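For reference, the basic screen space corner expansion I am asking about can be sketched like this (written as CPU-side C++ for clarity; on the GPU this would run in the vertex shader with a per-vertex corner index, and all names here are illustrative):

Code:
#include <cmath>

struct Float2 { float x, y; };

// Expand one corner of a rotated particle quad around its projected center.
// cornerIndex 0..3 selects the corner; halfSize is the post-projection half extent.
Float2 expandCorner(Float2 center, Float2 halfSize, float rotation, int cornerIndex)
{
    static const Float2 offsets[4] = { {-1.0f,-1.0f}, {1.0f,-1.0f}, {1.0f,1.0f}, {-1.0f,1.0f} };
    Float2 o = offsets[cornerIndex & 3];
    float c = std::cos(rotation), s = std::sin(rotation);
    Float2 r = { o.x * c - o.y * s, o.x * s + o.y * c }; // rotated unit corner
    return { center.x + r.x * halfSize.x, center.y + r.y * halfSize.y };
}

Stretched and frame-animated particles change only the offset table and the texcoord generation, which is why a single expansion shader can usually cover several particle types.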


Particle lighting and shadowing:
- Are you lighting your particles? Do you use a light model similar to the one used for your geometry, or a simplified one (for example, some sort of averaged lighting instead of multiple light sources)? (See the center-point lighting sketch after this list.)
- Are you calculating a single lighting value for the particle center point (or the four vertices), or do you calculate the lighting for each rendered pixel (normal maps? height/density maps?)
- Are you shadowing your particles? Do you use the same data (shadow maps?) generated for geometry to do the shadow comparisons for your particles?
- Are you calculating the shadowing only for the particle center point or the edge vertices (sampling shadow maps in the vertex shader or in software?), or for each rendered pixel in the pixel shader?
- Do your particles cast shadows onto geometry? From all light sources? Soft or hard shadows?
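As an example of the kind of simplified model I mean, a single averaged lighting value can be accumulated at the particle center and reused for all four quad vertices. A minimal sketch, assuming simple linear point-light falloff (not any particular shipped implementation):

Code:
#include <algorithm>
#include <cmath>

struct Float3 { float x, y, z; };
struct PointLight { Float3 pos; Float3 color; float radius; };

// Accumulate distance-attenuated light color at the particle center.
// No normal term: the particle is treated as an unoriented blob.
Float3 lightParticleCenter(Float3 c, const PointLight* lights, int count)
{
    Float3 sum = { 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < count; ++i)
    {
        float dx = lights[i].pos.x - c.x;
        float dy = lights[i].pos.y - c.y;
        float dz = lights[i].pos.z - c.z;
        float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
        float att  = std::max(0.0f, 1.0f - dist / lights[i].radius); // linear falloff
        sum.x += lights[i].color.x * att;
        sum.y += lights[i].color.y * att;
        sum.z += lights[i].color.z * att;
    }
    return sum;
}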


Volumetric / soft particles:
- Are you using a method to render volumetric particles (to hide hard z-clipping along the particle plane)?
- Do you use a constant particle depth volume size (scaled with the particle's post-projection size), or do you calculate a per-pixel depth volume extent (computed from the particle texel alpha/brightness value, or stored in a separate height/density channel in the particle material)? (See the fade sketch below.)
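For the constant-volume variant, the per-pixel fade boils down to something like this (a minimal sketch; the inputs are assumed to be linear view-space depths):

Code:
#include <algorithm>

// Soft particle fade: compare the scene depth behind the pixel with the
// particle plane depth, and fade over the assumed volume extent.
float softParticleFade(float sceneDepth, float particleDepth, float volumeExtent)
{
    float t = (sceneDepth - particleDepth) / volumeExtent;
    return std::clamp(t, 0.0f, 1.0f); // 0 = fully clipped, 1 = fully visible
}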
 
This has some notes on shadowing of translucent objects, which includes smoke:
http://ati.amd.com/developer/SIGGRAPH08/Chapter05-Filion-StarCraftII.pdf

We actually use a very similar rendering system for our translucent particle shadows and for shadows from translucent objects. However, we use just a single channel for the shadow alpha, so we can store both the z-value of the first opaque intersection and the color modulation in the same G16R16 buffer. It's a bit faster this way, as we do not need colored shadows.
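The receiver-side test then looks roughly like this (a minimal sketch; the texel layout and names are assumptions for illustration, not our exact shader code):

Code:
// G16R16 texel, both channels normalized to [0,1]:
// opaqueZ = depth of the first opaque intersection,
// alpha   = accumulated translucent shadow density.
struct ShadowTexel { float opaqueZ; float alpha; };

float shadowTerm(const ShadowTexel& t, float receiverZ)
{
    if (receiverZ > t.opaqueZ)    // behind the first opaque occluder
        return 0.0f;              // fully shadowed
    return 1.0f - t.alpha;        // darkened by translucent casters only
}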

The limitation of this technique is that the translucent shadows (from translucent objects) can only affect opaque objects, not other translucent objects. Normal shadows can affect both opaque and translucent objects (including particles). This was not mentioned in the StarCraft pdf, but there was no mention of shadowing the particles themselves either, so I doubt they are shadowing the particles, only casting shadows from them.
 
I was wondering if anyone has tried to depth sort particles on the GPU? I found some white papers on GPU sorting algorithms, but these seem to be targeted at general-purpose applications with millions of elements to sort, not the smaller array sizes used in games (we have only around 10,000 particles visible in a single frame).

http://www.ce.chalmers.se/~uffe/hybridsort.pdf
http://www.cs.unc.edu/~geom/SORT/gpusort.pdf

We are currently sorting particles on the CPU (in our dedicated particle animation thread) using a custom radix sorter that outputs index data, and using the generated indices in the vertex shader to fetch particle data from a large vertex buffer. This is not exactly the most vertex-cache-friendly implementation, but our particle vertex shader is quite heavy (lighting, screen space transformation and shadow map sampling code) and only one vertex is output per particle in this stage. The result is written to another vertex buffer, which is used as input to the screen space quad expansion vertex shader and the particle quad rendering pixel shader (reads from this vertex buffer are 100% cache friendly, as particles are processed sequentially and the same vertex is fetched four times in a row for each particle).
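The CPU side of that is essentially a float-to-key remap plus a four-pass byte-wise radix sort. A minimal sketch (engine-specific details such as where the depths come from are assumed):

Code:
#include <cstdint>
#include <cstring>
#include <vector>

// Remap an IEEE float so that unsigned integer ordering matches float ordering.
static uint32_t floatToSortableKey(float depth)
{
    uint32_t bits;
    std::memcpy(&bits, &depth, sizeof(bits));
    return (bits & 0x80000000u) ? ~bits : (bits | 0x80000000u);
}

// Sort particle indices by view-space depth in four 8-bit counting passes.
void radixSortIndices(const std::vector<float>& depths, std::vector<uint32_t>& indices)
{
    const size_t n = depths.size();
    std::vector<uint32_t> keys(n), tmpKeys(n), tmpIdx(n);
    indices.resize(n);
    for (size_t i = 0; i < n; ++i) { keys[i] = floatToSortableKey(depths[i]); indices[i] = (uint32_t)i; }

    for (int shift = 0; shift < 32; shift += 8)
    {
        uint32_t count[256] = {};
        for (size_t i = 0; i < n; ++i) ++count[(keys[i] >> shift) & 0xFFu];
        uint32_t offset = 0;
        for (int d = 0; d < 256; ++d) { uint32_t c = count[d]; count[d] = offset; offset += c; }
        for (size_t i = 0; i < n; ++i)
        {
            uint32_t dst = count[(keys[i] >> shift) & 0xFFu]++;
            tmpKeys[dst] = keys[i];
            tmpIdx[dst]  = indices[i];
        }
        keys.swap(tmpKeys);
        indices.swap(tmpIdx);
    }
    // indices is now front-to-back; iterate in reverse for back-to-front blending.
}

At around 10,000 particles this is cheap enough that the sort is rarely the bottleneck; the index-based fetch afterwards is what costs the vertex cache efficiency mentioned above.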

So quiet here, seems most graphics programmers are not interested in particle rendering :)
 
sebbbi said: "So quiet here, seems most graphics programmers are not interested in particle rendering :)"
Sorry. I've always felt that translucency sorting was the job of the rasteriser hardware, so haven't considered software methods. <shrug>
 