What is missing in GPU hardware to do classic Global Illumination algorithms?

Acert93

I was browsing the Developer Forum and I stumbled across the following:

Dany Lepage said:
SM 3.0 is going to be good enough for some time. There is only one big step left (before GPU start evolving just like CPUs --> performance only) that should allow classic global illumination algorithms to be efficient on GPUs. I doubt SM 4.0 will provide that.

A number of things were left on the D3D10 cutting floor, but I am curious: What is necessary for us to begin doing "real" global illumination? (I know, I am never happy... we are just beginning to see PRT in games).

I also found Dany's comment on performance interesting. A number of posters have made similar comments (e.g. Mintmaster recently said, "GPU features will probably reach a relative standstill soon. DX10 seems to be forward looking enough that software will need years to catch up (it's already way behind). This makes a 3-year long-term project feasible"; of course there have been others, so I am not picking on Minty!). Very interesting stuff.

What are the final few "features" that will be added to GPUs before we begin seeing performance being the primary focus?
 
I think it's arguably the focus already. DX10 as I understand it adds some nice conveniences and performance-gainers perhaps, but computationally the SM3.0 model is quite capable as is. I think I heard Carmack say a little while back that for all intents and purposes now you can do whatever you want with a GPU - the only question is if you can do it quickly enough.

As for global illumination... I don't know. Even movies don't use GI exclusively, AFAIK, but also hacks, in places at least. Whatever looks good enough usually is.
 
DX10 hardware can likely do everything you'd need. It might be hampered doing it (say, stream out) because the cards are not focused on accelerating that.

I think the biggest thing holding games back is the lack of a GI algorithm that runs in predictable time, is fast, and gives reasonable results.
 
Megadrive1988 said:
billions of transistors and 5-6 years of time ?

I think you guys kind of missed the nuance in Dany's comment ;)

If you read Dany's comment, there is one thing (read: feature) left, other than just pure performance, before classic GI algorithms become efficient on GPUs.

We know D3D10 is adding some new features (Geometry Shaders, integer instructions, etc.). The key word in Dany's comment was "efficient". There is a difference between hardware being capable of something and being able to do it well. Adding features like "Dynamic Branching" allows new techniques to be realistically accomplished at acceptable performance levels.

Titiano said:
As for global illumination... I don't know. Even movies don't use GI exclusively, AFAIK, but also hacks, in places at least. Whatever looks good enough usually is.

I am not sure he was suggesting games would ONLY use GI, and that is beside the point. The question is when GI will become feasible on GPU hardware -- not whether it would be the only technique used. ;)

Even D3, with a unified lighting/shadowing system, used some hacks and shortcuts; e.g. vents with rotors used shadow maps, I believe, because it was faster and did not impact IQ. There will ALWAYS be hacks. Heck, GI is a hack :D

Anyhow, my question is more directed at:

1) What features are missing from GPUs that make GI inefficient? When may we see this added?

2) What features will we be seeing in GPUs before they begin narrowly focusing on performance jumps like CPUs?
 
Well, it's going to be a lot more than 5-6 years for all but the most trivial scenes (which you can do with GI now, FWIW; I posted an IOTD on flipcode.com about 4 years ago of a GI version of the Cornell box running at ~10 fps on my Celeron 433).

The main problem with doing GI is that you have to have fast access to all of the scene's data, e.g. store all the geometry on the card (alongside the textures etc).
Also, if it's a game you will most likely have animated characters running around, which means scene-graph construction time, and your scene-graph structures can't be as tight as you would ideally want (both costing performance).
Also, with games you typically have everything textured, so the result of each ray->poly intersection needs a further lookup to find the correct color.
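To make that last point concrete, here is a minimal sketch (Python, with made-up names; not taken from any real engine) of what a single GI ray hit already implies: a ray-triangle test, then a barycentric UV interpolation, then yet another memory access for the texel.

```python
import numpy as np

def ray_triangle_hit(orig, dirn, v0, v1, v2, eps=1e-8):
    """Moller-Trumbore ray/triangle test; returns (t, u, v) or None on a miss."""
    e1, e2 = v1 - v0, v2 - v0
    p = np.cross(dirn, e2)
    det = np.dot(e1, p)
    if abs(det) < eps:
        return None                      # ray is parallel to the triangle
    inv = 1.0 / det
    s = orig - v0
    u = np.dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = np.cross(s, e1)
    v = np.dot(dirn, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = np.dot(e2, q) * inv
    return (t, u, v) if t > eps else None

def sample_hit_color(tri_uvs, u, v, texture):
    """The 'further lookup': interpolate UVs at the hit point, then fetch a texel."""
    uv = (1.0 - u - v) * tri_uvs[0] + u * tri_uvs[1] + v * tri_uvs[2]
    h, w = texture.shape[:2]
    x = min(int(uv[0] * (w - 1)), w - 1)
    y = min(int(uv[1] * (h - 1)), h - 1)
    return texture[y, x]
```

Every bounce repeats this against essentially arbitrary triangles and textures in the scene, which is exactly the bandwidth/latency problem described in the next post.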
 
GI in realtime is probably not feasible for a meaningful scene for a while, and not in our foreseeable future. This has nothing to do with a lack of transistors but with a lack of predictable memory accesses. I.e. light can hit nearly any object in the scene before reaching the surface it's supposed to illuminate, requiring a random access in memory to just about any polygon in the scene, even offscreen ones. Because memory access is so unpredictable, you'll almost never have a cache hit, so you'll be killed by latency to main memory. This is not resolvable with more transistors unless all graphics data is stored in eDRAM, which is also not feasible for the foreseeable future. In the first case we are limited by the physical distance to RAM, which does not allow for low latency; in the second we are limited by the fact that graphics data will constantly increase and eDRAM will likely never catch up.

So basically, we may see quantum computers before we see GI unless some really clever person or persons come up with an amazing solution.
 
First we need an algorithm that comes even close to being usable in the non-raytracing world of hardware rendering. Raytracing just can't get near the speed of rasterization, so there won't be a paradigm shift for a while.

My biggest guess is that techniques like spherical harmonic lighting will get more advanced. In Sloan's original PRT paper he proposed a method of neighbourhood transfer that should give us a pretty good starting point for GI-ish effects. Some alterations are needed to make it feasibly realtime, though.
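For anyone who hasn't touched SH lighting: at runtime, Sloan-style diffuse PRT boils down to a dot product between a vertex's precomputed transfer coefficients and the light environment's SH projection. A minimal sketch (order-2 real SH; the names are illustrative, not the paper's code):

```python
import numpy as np

def sh_basis_order2(d):
    """Real spherical harmonic basis (bands 0..2, 9 coefficients) for a unit direction d."""
    x, y, z = d
    return np.array([
        0.282095,                                   # band 0
        0.488603 * y, 0.488603 * z, 0.488603 * x,   # band 1
        1.092548 * x * y, 1.092548 * y * z,         # band 2
        0.315392 * (3.0 * z * z - 1.0),
        1.092548 * x * z,
        0.546274 * (x * x - y * y),
    ])

def prt_diffuse(transfer_coeffs, light_coeffs):
    """Runtime PRT shading for one vertex/channel: just a dot product.
    Both inputs are 9-vectors: transfer is precomputed offline, light per frame."""
    return float(np.dot(transfer_coeffs, light_coeffs))
```

The expensive part (projecting visibility and interreflection into the transfer vector) stays offline, which is why this kind of technique is the realistic near-term path to "GI-ish" results.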
 
Mintmaster said:
First we need an algorithm that comes even close to being usable in the non-raytracing world of hardware rendering. Raytracing just can't get near the speed of rasterization, so there won't be a paradigm shift for a while.

My biggest guess is that techniques like spherical harmonic lighting will get more advanced. In Sloan's original PRT paper he proposed a method of neighbourhood transfer that should give us a pretty good starting point for GI-ish effects. Some alterations are needed to make it feasibly realtime, though.

I'm actually very curious how well high-end hardware optimized for raytracing would work these days. Some of the FPGA raytracing work was pretty impressive for running at 90 MHz:

http://graphics.cs.uni-sb.de/SaarCOR/DynRT/DynRT.html

As far as doing realtime global illumination, Henrik Jensen (of photon mapping fame) has been pioneering that front:

http://graphics.ucsd.edu/papers/plrt/

Nite_Hawk
 
I'm not sure what you mean by "plastic kind" but the lighting in Halo 3 is only an approximation of GI. Approximation is probably the best we are going to get this gen and possibly the gen after this one. I look forward to seeing other devs use lighting techniques like it.
 
All rendering is approximation. I've yet to see a digital representation of an analog thing be otherwise.
 
Isn't Halo 3 using Global illumination? Or is it some kind of fake plastic kind...
"Plastic kind"? At best, what Halo 3 or any other game in the universe has as far as Global Illumination would be precomputed lightmaps that used some sort of more exhaustive render scheme (e.g. radiosity, path tracing, photon maps,...). Maybe some level of precomputed radiance transfer schemes for certain objects (e.g. , but that can be a pain in the neck on characters since they effectively "change shape."

I mean, the fundamental difference between global and direct illumination is simply that global illumination is about sampling the scene, while direct is all about light sources. What makes GPUs in the normal sense incapable is that they operate as if each vertex or pixel or triangle exists in a vacuum. In the GI sense, one triangle isn't necessarily independent of all the others. The thing is that it takes a lot of tris to form a scene, and that's part and parcel of what makes GI expensive... that, and the question of what it takes to consider a sampling of the environment "good".
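The two loops look roughly like this; a hedged sketch, where every helper (visible, trace_radiance, the cosine sampler) is assumed rather than taken from any real renderer:

```python
import numpy as np

def cosine_sample_hemisphere(normal, rng):
    """Cosine-weighted direction around 'normal' (illustrative helper)."""
    u1, u2 = rng.random(), rng.random()
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    local = np.array([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)])
    # build an orthonormal basis (t, b, normal) around the surface normal
    a = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(normal, a)
    t = t / np.linalg.norm(t)
    b = np.cross(normal, t)
    return local[0] * t + local[1] * b + local[2] * normal

def direct_lighting(point, normal, lights, visible):
    """Direct illumination: only the light sources are sampled."""
    total = 0.0
    for light in lights:
        to_light = light["pos"] - point
        dist = float(np.linalg.norm(to_light))
        wi = to_light / dist
        if visible(point, light["pos"]):
            total += light["power"] * max(0.0, float(np.dot(normal, wi))) / dist**2
    return total

def global_gather(point, normal, trace_radiance, n_samples=64, seed=0):
    """GI-style gather: the whole scene is effectively the light source."""
    rng = np.random.default_rng(seed)
    acc = 0.0
    for _ in range(n_samples):
        d = cosine_sample_hemisphere(normal, rng)
        acc += trace_radiance(point, d)   # radiance can arrive from any triangle
    return acc / n_samples
```

Direct lighting only ever touches the handful of lights; the gather can land on literally any triangle in the scene, which is where the "vacuum" model of current GPUs breaks down.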
 
Forza 2 already uses global illumination:

"
Q: We’ve announced that FM2 will be 60fps at 720p. Can you talk a bit about some of the other techniques we’ll be employing to make FM2 look next-gen?

JW: "We’re really taking full advantage of the immense graphics horsepower of the Xbox 360 to do some incredible things visually. Of course we’re 60fps at 720p, as you mention, but we’re also adding effects and features such as 4X anti-aliasing (no jaggies!), motion blur, high dynamic range lighting, global illumination and per-pixel car reflections updated at full frame rate. I could go on and on. Really a ton of stuff. Too much to list here."

"
Source: http://forzamotorsport.net/devcorner/pitpass/pitpass02.htm

So, to answer/guess at the question "1) What features are missing from GPUs that make GI inefficient? When may we see this added?":
Maybe GI is already efficient (on the 360 GPU, anyway) because of some neat algorithms combined with the massive bandwidth offered by the eDRAM?? (Yeah, so my answer is really another question ;) )

And so people can really visualize what GI (global illumination) is all about (because I really didn't know before I did some research):

Source: http://www.finalrender.com/products/feature.php?UD=10-7888-35-788&PID=17&FID=113

Without GI: room_without_gi.jpg

With GI: room_with_gi.jpg
 
As mentioned elsewhere, this GI isn't realtime in Forza. It'll likely be prebaked illumination maps or things like PRT that pre-record GI interactions. This thread does a good job of explaining why GI can't be done fast. You need super-fast access to all the objects in the scene for multiple ray samples. Unless the Forza graphics can be fitted into the eDRAM, it can't be used to solve the problem, and even then the phenomenal BW of the eDRAM is only available to the logic within it, so unless that logic can perform raytracing you still won't have access to enough BW. In answer to Acert's question, my best guess is that Dany had no particular idea in mind when talking about hardware to enable GI, and he was just saying that's the only step to go - we've got all the other bases covered. Personally I disagree and think HOS support needs to be improved too, possibly even non-tessellated rendering :mrgreen:
 
Screw rasterization! We need raytracing hardware. Then GI will simply be a matter of throwing more silicon at the problem, which is pretty much what GPUs are all about, anyway ;).
 
Thanks Shifty for directing comments at my question. I wish Dany was around to explain his opinion on what is missing in hardware (and lacking in SM3.0) for classic GI on GPUs!

ShootMyMonkey said:
Screw rasterization! We need raytracing hardware. Then GI will simply be a matter of throwing more silicon at the problem, which is pretty much what GPUs are all about, anyway ;).

Some general questions.

1. Obviously introducing dedicated hardware raytracing is kind of starting at "square 1" all over again. Will we see GPUs move in a direction where they straddle the fence, keeping their rasterization support while implementing raytracing support in the pipeline? Is this just a matter of more ALUs and more, faster memory, or are there some specific needs still not met?

2. There are some raytracing "chips" that accelerate the process in hardware (if we can count an FPGA setup for raytracing... anyhow). How do these get around the memory problems? Or are they slow and have low geometry levels for that very reason (i.e. memory access that is not fast enough)?

3. If you were making a checklist of things to change/add to GPUs of today to get to realtime GI, what would they be? (You don't need to make up fake specs, just a general idea of memory requirements and processing requirements).

4. Do we even want traditional offline rendering techniques, or even derivatives of them, in realtime? Or, as we move to more deformable and interactive worlds and whatnot, will we find hacks that eventually give similar quality as the hardware improves and becomes faster, just arriving at it another way?

I believe I read somewhere that one advantage of raytracing techniques is that the computational expense is linear in geometry, whereas rasterization starts off cheap but becomes more and more expensive as complexity increases, at some point surpassing raytracing. But my memory could be fuzzy on this point.
 
Acert, from what I've seen of the hardware ray tracers, they only speed up rendering, but not to a level where ray tracing can be done at 30-60 fps. They're mostly useful for professional artists who need to speed up their rendering times. I'm still not convinced these hardware solutions render better than software solutions, and even then their software support is pretty limited.

http://www.artvps.com/page/15/pure.htm

This is a link to one.
 
Obviously introducing dedicated hardware raytracing is kind of starting at "square 1" all over again. Will we see GPUs move in a direction where they straddle the fence, keeping their rasterization support while implementing raytracing support in the pipeline? Is this just a matter of more ALUs and more, faster memory, or are there some specific needs still not met?
Given that GPU manufacturers have a lot of vested interest in rasterizers, I don't see a square-1 approach working out in real life, short of a newcomer to the market making waves with a whole new "raytracing approach to graphics", which I don't see happening (unless someone introduces it in a console, where legacy isn't a necessity). There may eventually be a need to start including some basic raycasting hardware if people start pushing for more polygon capacity and the worries over sub-pixel polygons become a need to start *sampling* tris rather than rendering them.

The other possibility is that something comes up that supersedes both triangle rasterization and raytracing in favor of something more "all-inclusive," which sounds a lot like the future Dave Orton predicted a few years back.

There are some raytracing "chips" that accelerate the process in hardware (if we can count an FPGA setup for raytracing... anyhow). How do these get around the memory problems? Or are they slow and have low geometry levels for that very reason (i.e. memory access that is not fast enough)?
As it so happens, I believe there are example cases with the SaarCOR FPGA hardware that DO deal with large geometry sets. My best guess is a whole lot of ports to memory and some reasonably sized geometry caches. I honestly don't know what they do, but they're able to do basic first-hit raycast lighting (without shadows) on huge datasets at acceptable framerates for such a high tricount w/ no LOD mechanisms.

e.g. -- http://graphics.cs.uni-sb.de/SaarCOR/DynRT/SunCor-08-1024.jpg
http://graphics.cs.uni-sb.de/SaarCOR/DynRT/SunCor-09-1024.jpg
Apparently 187,145,136 tris, and no LOD. 3-6 fps isn't too bad for that. Still has some floating point issues, though. I imagine, though, the performance will go to hell with shadow tests. They have some examples of around 10,000,000 triangle scenery rendered with shadows around 2-3 fps.

The single biggest difference between raytracing hardware and rasterizer hardware, though, is that with raytracing, pixels are in the outer loop and it's the geometry (ray tests) that sits in the inner loop. That means the performance is going to scale linearly with resolution.
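In skeleton form (purely illustrative Python; cast_ray, shade and the coverage helper are stand-ins, not any real API):

```python
def raytrace(width, height, triangles, cast_ray):
    """Ray tracing: pixels on the OUTSIDE, geometry tests on the inside."""
    image = [[None] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            image[y][x] = cast_ray(x, y, triangles)   # walks the scene per pixel
    return image

def covered_pixels(tri, width, height):
    """Placeholder coverage: the triangle's screen-space bounding box."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    for y in range(max(0, int(min(ys))), min(height, int(max(ys)) + 1)):
        for x in range(max(0, int(min(xs))), min(width, int(max(xs)) + 1)):
            yield x, y

def rasterize(width, height, triangles, shade):
    """Rasterization: geometry on the OUTSIDE, pixels on the inside."""
    framebuffer = [[None] * width for _ in range(height)]
    for tri in triangles:
        for x, y in covered_pixels(tri, width, height):
            framebuffer[y][x] = shade(tri, x, y)
    return framebuffer
```

Which loop sits outermost is exactly what determines whether the cost tracks resolution or triangle count.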

If you were making a checklist of things to change/add to GPUs of today to get to realtime GI, what would they be? (You don't need to make up fake specs, just a general idea of memory requirements and processing requirements).
Assuming you're sticking with rasterization...
Well, being able to buffer off a full scene in VRAM is a big part of it. And that also means a lot of random access to medium-sized blocks of data in RAM. Threading that means huge, several-way-ported RAM. Spatial subdivision hierarchies may be a necessity. Another key to GI is the ability to generate a value and then use it again for further illumination approximations. So for instance, do a direct-lighting pass and write those results into the geometry data. Then go over the scene again using this data as "light sources"... lather, rinse, repeat, and you have a basic GI approximation.
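That lather-rinse-repeat loop, written out as a rough sketch (vertex-based, brute-force gathering; 'visible' is an assumed occlusion query, and none of the names come from real hardware or middleware):

```python
import numpy as np

def gather_bounces(verts, lights, visible, n_bounces=2):
    """Iterative GI approximation: light the scene, store the result on the
    geometry, then treat every lit vertex as a small light source and repeat.
    Each vert is a dict with 'pos', 'normal' (numpy arrays) and 'area' (float)."""
    n = len(verts)
    direct = np.zeros(n)

    # Pass 0: direct lighting, written back onto the geometry data.
    for i, v in enumerate(verts):
        for light in lights:
            wi = light["pos"] - v["pos"]
            d = float(np.linalg.norm(wi))
            wi = wi / d
            if visible(v["pos"], light["pos"]):
                direct[i] += light["power"] * max(0.0, float(np.dot(v["normal"], wi))) / d**2

    # Passes 1..n_bounces: gather the previous bounce from all other vertices.
    total, prev = direct.copy(), direct
    for _ in range(n_bounces):
        bounce = np.zeros(n)
        for i, v in enumerate(verts):
            for j, s in enumerate(verts):
                if i == j or prev[j] <= 0.0:
                    continue
                wi = s["pos"] - v["pos"]
                d = float(np.linalg.norm(wi))
                if d < 1e-6:
                    continue
                wi = wi / d
                cos_r = max(0.0, float(np.dot(v["normal"], wi)))
                cos_s = max(0.0, float(np.dot(s["normal"], -wi)))
                if cos_r > 0.0 and cos_s > 0.0 and visible(v["pos"], s["pos"]):
                    bounce[i] += prev[j] * s["area"] * cos_r * cos_s / (np.pi * d**2)
        total += bounce
        prev = bounce
    return total
```

The quadratic gather per bounce and the visibility tests against the whole scene are exactly the random-access, bandwidth-hungry parts complained about earlier in the thread; as noted below, this kind of scheme also wants high mesh granularity and handles diffuse-diffuse transfer far better than specular.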

This is similar in concept to a project I was working on in my undergrad days which was something of a "distribution raytracer" which sampled vertex data (the rays were just for stepping through the octree to find a bounding box with some geometry in it). Works nicely for diffuse-diffuse interactions, but any specular interactions are problematic as you need to know the sources. And either way, it needs high mesh granularity.

You can't really escape that massive computational power and massive bandwidth is necessary. Think hundreds of TFLOPS and hundreds of TB/sec.

I believe I read somewhere that one advantage of raytracing techniques is that the computational expense is linear in geometry, whereas rasterization starts off cheap but becomes more and more expensive as complexity increases, at some point surpassing raytracing.
With decent culling and spatial subdivision techniques, raytracing's expense is logarithmic in geometry and linear in the number of pixels to draw. Of course, the naive O(N*M) approach is linear in geometry, but it's probably faster in practice than any sort of spatial hierarchy when the scene is simple. As a scene gets complex, the ability to reject ray tests early becomes part of what makes it fast.
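For what it's worth, the early rejection that buys the logarithmic behavior looks something like this (a toy BVH traversal; the node layout and the intersect_tri callback are assumptions for the sketch, not a description of SaarCOR or any shipping hardware):

```python
import numpy as np

class BVHNode:
    """Minimal bounding-volume node: either two children or a small leaf of triangles."""
    def __init__(self, bmin, bmax, left=None, right=None, tris=None):
        self.bmin, self.bmax = np.asarray(bmin, float), np.asarray(bmax, float)
        self.left, self.right, self.tris = left, right, tris or []

def hit_aabb(orig, inv_dir, bmin, bmax):
    """Slab test: one cheap check can reject an entire subtree of triangles."""
    t1, t2 = (bmin - orig) * inv_dir, (bmax - orig) * inv_dir
    tmin = float(np.max(np.minimum(t1, t2)))
    tmax = float(np.min(np.maximum(t1, t2)))
    return tmax >= max(tmin, 0.0)

def closest_hit(node, orig, dirn, intersect_tri):
    """Recursive traversal: skipped boxes mean most triangles are never touched,
    so cost grows roughly with tree depth (log N) rather than triangle count.
    intersect_tri(orig, dirn, tri) should return a hit distance t or None."""
    inv_dir = 1.0 / np.where(dirn == 0.0, 1e-12, dirn)
    if not hit_aabb(orig, inv_dir, node.bmin, node.bmax):
        return None
    if node.tris:  # leaf: brute-force the handful of triangles left
        hits = [h for h in (intersect_tri(orig, dirn, t) for t in node.tris) if h is not None]
        return min(hits, default=None)
    hits = [h for h in (closest_hit(c, orig, dirn, intersect_tri)
                        for c in (node.left, node.right) if c is not None) if h is not None]
    return min(hits, default=None)
```

The flip side, as noted above, is that the traversal order is ray-dependent and essentially random with respect to memory layout, which is where the cache and latency problem comes back in.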

Of course, when you get into those levels of complexity, you hit a wall with rasterization anyway, in that LOD will be necessary just to rasterize the geometry in the first place.
 
Global Illumination has been abused a bit even in the CG industry in these past years...

'True' GI is where bounced/reflected light is also used in the lighting calculations, with color bleeding from the surfaces that the light rays have touched. This is hideously expensive to calculate; DreamWorks has used it in their CG films since Shrek 2, but they replace the scene geometry with a heavily simplified version and use a lot of other cheats and tweaks to speed up the rendering.
Radiosity is a subset of GI where only diffuse light transfer is calculated.

Other forms of GI are the often-mentioned ambient and reflection occlusion, developed by ILM on Pearl Harbor and Jurassic Park 3. These are very widely used in the CG industry because they're relatively cheap but good-looking tricks, even if we shouldn't really consider them true global illumination.
Ambient occlusion simplifies the incoming direct and indirect lighting from the scene to a simple environment map, which is used as a spherical light source. Then, for each visible point of every surface, they calculate the amount of light that would reach the surface by checking what percentage of the hemisphere environment above the surface point is blocked by other objects. The lighting calculated from the environment map is then multiplied by this value.
The occlusion part is relatively simple to calculate, as it usually won't examine bounces and energy transfer, just perform simple visibility tests using a stochastic sampling pattern. But it's still as problematic as any other kind of raytracing, with random accesses to the entire scene. It can, however, be sped up with adaptive undersampling, noise filters etc., and good environment maps provide reasonably realistic lighting most of the time. Also, it is not view-dependent, so for stationary objects it can be precalculated and saved into texture maps or vertex colors.
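A bare-bones version of that visibility test, just to show the shape of it (Python sketch; occluded(origin, direction) is an assumed scene query, and the sample count is arbitrary):

```python
import numpy as np

def ambient_occlusion(point, normal, occluded, n_samples=64, seed=0):
    """Stochastic AO: shoot visibility rays over the hemisphere above 'point'
    and return the unblocked fraction. occluded(origin, direction) is an
    assumed scene query returning True when the ray hits anything."""
    rng = np.random.default_rng(seed)
    # orthonormal basis (t, b, normal) around the surface normal
    a = np.array([1.0, 0.0, 0.0]) if abs(normal[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    t = np.cross(normal, a)
    t = t / np.linalg.norm(t)
    b = np.cross(normal, t)
    blocked = 0
    for _ in range(n_samples):
        u1, u2 = rng.random(), rng.random()
        r, phi = np.sqrt(u1), 2.0 * np.pi * u2
        d = (r * np.cos(phi)) * t + (r * np.sin(phi)) * b + np.sqrt(1.0 - u1) * normal
        if occluded(point + 1e-4 * normal, d):   # small offset avoids self-hits
            blocked += 1
    return 1.0 - blocked / n_samples             # multiply the env-map lighting by this
```

The returned fraction is what gets multiplied into the environment-map lighting; everything interesting hides inside occluded(), which again has to be able to reach any object in the scene.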
Reflection occlusion is a bit more complicated; it's basically used to mask out environment-mapped fake reflections on parts of the surface that should reflect the object itself. So it's quite similar to a heavily blurred reflection, and thus it is view-dependent. To be honest, it's one of the biggest cheats in CG that's surprisingly convincing, especially in motion.
It can't be saved into textures, though; but as most CG imagery is composited from several passes, it can be saved as a separate grayscale image sequence (which is multiplied with the reflection pass and then gets added on top of the other passes like diffuse and so on). Thus other elements of the scene can still be modified while the relatively expensive calculations for reflection occlusion are re-used.

So, the cheapest stuff to implement in realtime graphics would be ambient occlusion, with reflection occlusion as a possible second. However, both are too expensive in their offline form, and all the hardware thrown at the problem would be better used on other things like shader ALUs, texture samplers and so on.
Nevertheless, there may be reasonably good-looking approximations that won't require raytracing and thus can be implemented even on current-day hardware. But I don't think these would run at practical speeds on the current generation... and I think we should get good shadowing and self-shadowing working reasonably well first, with more than 2 lights, before we move on to more advanced lighting.
 