What is missing in GPU hardware to do classic Global Illumination algorithms?

Thanks, Laa-Yosh! I was wondering when you would show up ;)
Laa-Yosh said:
So, the cheapest stuff to implement in realtime graphics would be ambient occlusion with reflection occlusion as a possible second. However both are too expensive in their offline form, and all the hardware thrown at the problem would be better used in other things like shader ALUs and texture samplers and so on.
Nevertheless, there may be reasonably good looking approximations that won't require raytracing and thus can be implemented even on current day hardware.
Kind of like the Bizarre hack for ambient occlusion?
 
This paper was probably posted at B3D already, but it's relevant to this discussion. "Real-time Rendering Systems in 2010" suggests we need efficient access to data structures on the GPU and I tend to agree.

http://www-csl.csres.utexas.edu/users/billmark/papers/rendering2010-TR/

Abstract

We present a case for future real-time rendering systems that support non-physically-correct global illumination techniques by using ray tracing visibility algorithms, by integrating scene management with rendering, and by executing on general-purpose single-chip parallel hardware (CMPs). We explain why this system design is desirable and why it is feasible. We also discuss some of the research questions that must be addressed before such a system can become practical.
 
Jawed said:
This is interesting:

Jawed, that is very interesting.

The 20-30fps they got with their edge-to-point based system seemed very impressive for the early demo of the dragon. She specifies at the end that the illumination examples (everything after FBT) were running on a single core of a 2.8GHz Pentium D. For those who cannot watch it (50 min): they had models in the millions-of-polygons range with hundreds of thousands of lights, were doing global illumination techniques, and were rendering frames in under 2 minutes. The images she showed with sub-2-minute render times were impressive and give some hope. We may have to wait 10-12 years for interactive framerates, but the techniques she demoed seem to avoid a ton of problems: they work with moving lights and also with moving geometry that changes shape. It was exciting when she showed that, as more lights were added, the computational cost leveled off quickly.

Overall the concepts are interesting and make a lot of sense. For those who cannot watch it, her illumination technique basically takes the idea of GI and mixes in 'smart' sparse sampling. Lights are grouped into clusters and arranged into a tree, and with a pre-defined error limit each pixel only uses the number of light clusters necessary to get an illumination value within the given 2% error. So instead of tens of thousands of lights calculated per pixel, only dozens are. At 37 min there is a good chart of the concept, and at 38:30 she shows how the cheating works; very intelligent and sneaky! What probably caught my attention the most was how this was a unified technique that worked for hard and soft shadows, HDR, direct and indirect illumination... and very high quality AA (which was very cheap).
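For anyone who wants the gist without watching: here is roughly how I understood the per-pixel light selection, written as a rough Python sketch under my own assumptions (the helpers `contribution` and `error_bound` and the tree layout are mine, not hers, and the real error metric is surely more sophisticated). The idea is to keep splitting whichever light cluster currently has the worst error bound until the worst bound drops under the ~2% budget, then shade with one representative light per remaining cluster.

```python
import heapq
import itertools

_tie = itertools.count()  # tie-breaker so the heap never compares node objects

def shade_pixel(root, shading_point, rel_error=0.02):
    """Estimate the sum over thousands of lights with a 'cut' through a light tree.

    Each node stands for a cluster of lights and can return an estimated
    contribution (via a representative light) and a conservative error bound.
    contribution() and error_bound() are hypothetical helpers for this sketch.
    """
    total = contribution(root, shading_point)
    cut = [(-error_bound(root, shading_point), next(_tie), root, total)]
    while cut:
        neg_err, _, node, estimate = heapq.heappop(cut)
        if -neg_err <= rel_error * total:
            break                      # worst remaining error is within budget
        if node.is_leaf():
            continue                   # individual light, already exact; keep it
        total -= estimate              # replace the cluster by its children
        for child in node.children:
            est = contribution(child, shading_point)
            total += est
            heapq.heappush(
                cut, (-error_bound(child, shading_point), next(_tie), child, est))
    return total
```

With hundreds of thousands of lights the tree is deep, but the cut that satisfies a 2% budget typically stays at a few dozen nodes per pixel, which is presumably why the cost levels off as more lights are added.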

Funny how all these problems, which have dedicated hardware on GPUs to resolve them, can all be resolved through this one unified (!) technique. This reminds me of a Kirk interview from last year where he said that AA was eventually a problem that would be resolved in the rendering engine and that it was not worth dedicating a lot of hardware resources to it. I don't know the time scale he had in mind, but it does look like, on a 10-year-or-so horizon, he is right.

Like most of you pointed out, she noted the memory problem, and that Moore's law alone won't solve our problems in this regard. Her solution of using light cuts in a cluster tree would, she suggests, substantially cut down on the required memory accesses per pixel.

At 25 min she begins talking about textures ("feature-based texturing", FBT) and I thought that was really cool. I am not a rocket scientist, but I had wondered a long time ago why a GPU could not interpolate detail in textures, even if only with crude vectors that maintain the integrity of edges up close. She mentioned their technique has a fairly small footprint, and the results, at least on her samples, were excellent. She also seemed to emphasize that the FBT textures were 16x16 and looked much better than their 64x64 bilinear counterparts. She mentioned talking to someone at EA about the texturing.
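I have not read the FBT paper, so treat this only as a crude toy illustrating the general "don't blur across an edge" idea, not her actual algorithm: pretend each texel also stores a region id saying which side of a feature it lies on, and magnification only blends texels from the same region as the nearest texel.

```python
def sample_feature_aware(colors, regions, u, v):
    """Edge-preserving magnification toy: bilinear weights, but texels on the
    other side of a stored feature edge are excluded and the remaining weights
    are renormalised. 'colors' is a 2D grid of floats, 'regions' a 2D grid of
    ids. Border handling is omitted for brevity; sample away from the borders.
    """
    x, y = u - 0.5, v - 0.5                      # texel-centre convention
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    taps = [(x0,     y0,     (1 - fx) * (1 - fy)),
            (x0 + 1, y0,     fx * (1 - fy)),
            (x0,     y0 + 1, (1 - fx) * fy),
            (x0 + 1, y0 + 1, fx * fy)]
    # The nearest texel decides which side of the feature we are on.
    nearest_x, nearest_y, _ = max(taps, key=lambda t: t[2])
    side = regions[nearest_y][nearest_x]
    color = weight = 0.0
    for tx, ty, w in taps:
        if regions[ty][tx] == side:              # skip texels across the edge
            color += w * colors[ty][tx]
            weight += w
    return color / weight
```

On a 16x16 map, something along these lines keeps a stored silhouette crisp under heavy magnification, where plain bilinear would smear it across many screen pixels.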

Again Jawed, thanks! I wonder how long it will take until we see something like this in realtime :D

She mentioned this was on a single core of a 2.8GHz Pentium D and that it could be scaled to more cores. It will be exciting when her research is moved to a GPU, because GPUs are much more parallel, even if they are not as robust and lack the large caches (she seemed to emphasize approaches that leverage CPUs and GPUs together, and, down the road, CPU-GPU hybrids). I guess that outside the cache, memory is going to be a big issue: reducing latencies and getting good memory management.

Hopefully I am wrong and we will be seeing this stuff in realtime before we hit the dead end of silicon. If not, it could be a painfully long and expensive path until we move over to new hardware platforms :???:
 
Titanio said:
I think I heard Carmack say a little while back that for all intents and purposes now you can do whatever you want with a GPU - the only question is if you can do it quickly enough.
I believe you may be referring to this quotation:
With the next engine you're not going to have absolutely every capability you'll have for an offline renderer, but you will be able to produce scenes that are effectively indistinguishable from a typical offline scanline renderer if you throw the appropriate data at it, and avoid some of the things that it's just not going to do as well. We're seeing graphics accelerated hardware now, especially with the multi-chip, multi-board options that are going to be coming out, where you're going to be able to have a single system, your typical beige box, multiple PCI-express system stuffed with video cards cross connected together, and you're gonna have, with a game engine like this and hardware like that, the rendering capability of a major studio like Pixar's entire render farm, and it's going to be sitting in a box and costing $10,000.
[source: http://www.gamedev.net/community/forums/topic.asp?topic_id=266373 and http://www.beyond3d.com/forum/showthread.php?t=13524 ]
 
Acert93 said:
Kind of like the Bizarre hack for ambient occlusion?

Bizarre's stuff is precalculated, so it's not really a hack. The occlusion won't change as the object moves around in its environment, making it less realistic; but it's still going to make the image a bit better.
 
Laa-Yosh said:
Bizarre's stuff is precalculated, so it's not really a hack. The occlusion won't change as the object moves around in its environment, making it less realistic; but it's still going to make the image a bit better.

Actually, baked AO as a lightmap pass will make the image a lot better...
 
See the link to her Cornell projects page that I posted earlier for downloads on FBTs etc.

The talk was presented at:

http://www.acm.uiuc.edu/conference/archive/2005/webcast.php

where you can find a video of Shirley's talk, too. He's pretty entertaining, but you will probably be disappointed with what he says, since fast ray tracing isn't really the topic.

I haven't read any of the papers, even though I think these approaches are fascinatingly pragmatic.

Jawed
 
ShootMyMonkey said:
Screw rasterization! We need raytracing hardware. Then GI will simply be a matter of throwing more silicon at the problem, which is pretty much what GPUs are all about, anyway ;).
I realise you're probably just being sarcastic, but surely radiosity is more important than raytracing for that.
 
_phil_ said:
Actually, baked AO as a lightmap pass will make the image a lot better...

AFAIK Bizarre stores ambient occlusion in vertex colors and not lightmaps.
Also, precalculated occlusion will miss all the object interaction, which is a very important part of the visuals.


I wonder why you have to pick on me, arguing on choice of words. Why not give some negative rep as well if you're at it?
 
Laa-Yosh said:
AFAIK Bizarre stores ambient occlusion in vertex colors and not lightmaps.
Also, precalculated occlusion will miss all the object interaction, which is a very important part of the visuals.


I wonder why you have to pick on me, arguing on choice of words. Why not give some negative rep as well if you're at it?

It's just because I disagree.
We store it in texture maps (cheap L8 format). While it's still a 100% static solution, it really brings a lot of realism into the whole picture.
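For those unfamiliar with what "baked" means here: the occlusion is computed offline per texel (or per vertex) by sampling the hemisphere, stored as a single byte, and then simply multiplied into the ambient term at runtime. A minimal, generic sketch of the bake step; the `scene.ray_hits` query and `random_hemisphere_direction` helper are placeholders, and this is not any particular studio's tool:

```python
def bake_ao_texel(point, normal, scene, samples=256, max_dist=2.0):
    """Offline AO bake for one texel: the fraction of hemisphere rays that
    escape the scene within 'max_dist', quantised to 8 bits for an L8 map.
    scene.ray_hits(origin, direction, max_dist) is a hypothetical query that
    returns True if the ray hits any geometry within max_dist.
    """
    unoccluded = 0
    for _ in range(samples):
        d = random_hemisphere_direction(normal)    # hypothetical sampling helper
        if not scene.ray_hits(point, d, max_dist):
            unoccluded += 1
    return int(round(255 * unoccluded / samples))  # one byte per texel (L8)

# At runtime the stored value just scales the ambient/indirect term, e.g.
#   ambient = ambient_light * (ao_byte / 255.0)
# which is why it is so cheap, and also why it cannot react to moving objects.
```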
 
Laa-Yosh said:
Bizarre's stuff is precalculated, so it's not really a hack. The occlusion won't change as the object moves around in its environment, making it less realistic; but it's still going to make the image a bit better.
_phil_ said:
It's just because I disagree.
We store it in texture maps (cheap L8 format). While it's still a 100% static solution, it really brings a lot of realism into the whole picture.

Compare the Red-to-Red and Blue-to-Blue.

You're not adding anything new to the discussion that Laa-Yosh did not already specifically point out. You're agreeing with him. The only difference is that Laa-Yosh also qualified the downside of such a method, which in the context of proper GI is definitely noteworthy. If there are issues as the object moves through the environment, then it is still a long way from properly emulating GI. Again, you are not disagreeing with him, as Laa-Yosh said it would make an image look better than not having it:

Laa-Yosh said:
but it's still going to make the image a bit better.
_phil_ said:
Actually, baked AO as a lightmap pass will make the image a lot better...

You can PM him on the semantics of "a lot" versus "a bit", but let's stay on track: global illumination, future GPUs, and proper shortcuts to make it realtime for gaming.
 
Megadrive1988 said:
and this level of performance will maybe be available in the next-gen Xbox and PS4, I guess, combined with even more advanced rendering features. But memory bandwidth is going to have to take a huge step up.

Maybe? Well, since we both like to pontificate (out of you-know-what!), let's have some fun :D Just as an exercise...

The method Jawed posted earlier seemed to be very fast, with times from just under 2s to about 3s per frame. We will be aiming for a higher resolution, but to offset any possible algorithm enhancements let's go with 2 seconds. On the processor side: it was on a single core of a 2.8GHz Pentium D. I don't know enough about their method to know whether SPEs would be a good candidate for it, but let's assume they would be. I think it is reasonable to assume that in 6 years we will have had enough process shrinks to reach 8x the density per mm2. There may also be more room for cache (MRAM, ZRAM), but we should expect the cores to be more robust as well. Assuming similar core density, we could be looking at 8 PPEs and 32 SPEs. Doubling frequency over a 6-year period seems attainable. Assuming an SPE could do the work of one Pentium D core (with the memory accesses this may be asking a lot, but I ain't an expert), we would be looking at a 64-fold increase in performance if the task could be parallelized efficiently (another unknown at this point). That gets us to ~30 frames per second. Of course those are some seriously large hurdles to overcome, but that is assuming no ingenuity in maintaining the integrity of the algorithm and no algorithmic speed-ups.
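Spelling that napkin math out explicitly (every number here is one of my assumptions above, not a measurement):

```python
frame_time_today = 2.0      # seconds per frame assumed above, on one Pentium D core

spe_count = 32              # assumed core count after ~8x density scaling
clock_scaling = 2.0         # assumed frequency doubling over ~6 years
spe_vs_pentium = 1.0        # assume one SPE-like core ~= one Pentium D core (optimistic)
parallel_efficiency = 1.0   # assume the algorithm parallelises perfectly (also optimistic)

speedup = spe_count * clock_scaling * spe_vs_pentium * parallel_efficiency   # 64x
frame_time = frame_time_today / speedup                                      # 0.03125 s
print(f"{speedup:.0f}x speedup -> {frame_time * 1000:.0f} ms/frame, ~{1 / frame_time:.0f} fps")
# 64x speedup -> 31 ms/frame, ~32 fps
```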

So the processing end seems possible, if the software and hardware sides can realistically be figured out in the next 6 years. Of course we may fall somewhat short, but the lecture Jawed posted did show an edge-to-point system running in the 20-30fps range, so we may get close and just have to hack some of it.

As for memory... these could very well be memory-limited scenarios to begin with, and latency will be as much of a factor as processing power. Expecting a 64x jump in memory bandwidth, solving the latency problems, and dealing with 40+ processors addressing the same memory... that is a lot to ask.

Thinking about it in those terms, it would not surprise me if the next consoles took more memory/bandwidth-driven approaches. All that processing power is worthless if you don't have the bandwidth to drive it. Consoles have been seeing roughly an 8x increase in memory every 5-6 years, so we are probably looking at 4GB in the next consoles (or maybe less if they go for substantially faster memory and start leaning heavily on procedural data and streaming, and I don't see why not, as I think the next-next-gen consoles will have a lot of very fast flash-type memory as standard).
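And the memory guess spelled out the same way (taking 512MB as my assumed baseline for the current generation):

```python
current_gen_mb = 512          # assumed baseline for this generation of consoles
growth_per_generation = 8     # the ~8x historical jump per 5-6 years cited above
next_gen_gb = current_gen_mb * growth_per_generation / 1024
print(next_gen_gb)            # 4.0 -> the ~4 GB guess
```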

Ok, that ends my aimless pontificating :D I have a lot of questions about where the next-next gen will take us, but even more interesting is what comes after that. Silicon will be reaching pretty much a dead end thereafter, and no real significant inroads have been made, at least publicly, on what the next step will be commercially, or whether it will be available at scale and affordable.
 
Acert93 said:
long, too long OT, litany

Sorry :).

I can't bring more to the thread without spoiling some little dirty secrets about what can help with the weaknesses of baked AO (static, monochromatic, material properties, recursivity...).

Still, 90 percent of perceived realism can be hacked with a good AO lightmap.
So, excuse me, but saying 'a bit' is very far from the truth.

With two AO solutions baked with different ray lengths, you can even help interpolate between day and night settings (inside/outside) pretty realistically (with good HDR + good tonemapping).
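In sketch form, the simplest version of that idea is just a per-texel blend between the two baked values; a generic illustration only, with made-up parameter names, and the factor driving it could be anything from a per-room "openness" value to time of day:

```python
def blended_ao(ao_short, ao_long, openness):
    """Blend two baked AO values (each in [0, 1]) for one surface point.

    ao_short: baked with short rays -> local contact occlusion, which
              dominates the ambient term in enclosed settings.
    ao_long:  baked with long rays  -> sky visibility, which dominates
              in open, daytime settings.
    openness: 0.0 for a fully enclosed interior, 1.0 for open sky.
    """
    return ao_short * (1.0 - openness) + ao_long * openness
```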

For moving objects, there are ways to keep it from being a problem...
 
Ever considered the possibility that vertex color based AO is inferior to texture based AO? There's a huge difference in detail and such IMO...
 
Wasn't the story that the PS3 was originally going to have Cell processor(s) as the graphics hardware as well, and that part of the reason this wasn't done was that it would have been too different? Developers are much more used to Nvidia's stuff, which uses a lot of things they are already familiar with from PC development.

But I do think that in the next generation we may well be raytracing or something similar directly. The current big machines aren't doing anything realtime, but they're also always targeting higher resolutions and higher levels of detail than we are with games. If you can afford to do something not in realtime to make it look better, you typically do.

In fact, I recall reading an article that argued the current graphics card technologies are limiting and will reach a ceiling soon, and a radically different approach will emerge to really make a difference. But we'll see - you never know if someone comes up with a shader solution that looks close enough.

Still, once people start to really target the issue of real-time raytracing, I figure we'll get there. And chips like the Cell will probably help, perhaps that version with double-precision FP that Berkeley was suggesting and that IBM probably already has on the drawing board.
 
Laa-Yosh said:
Ever considered the possibility that vertex color based AO is inferior to texture based AO? There's a huge difference in detail and such IMO...

:) Not only in your opinion, it is a huge difference. But from the leaked dev shots of PG3, the city had a (pretty low-res, though) texture AO.
 
Napzu said:
"
Q: We’ve announced that FM2 will be 60fps at 720p. Can you talk a bit about some of the other techniques we’ll be employing to make FM2 look next-gen?

JW: "We’re really taking full advantage of the immense graphics horsepower of the Xbox 360 to do some incredible things visually. Of course we’re 60fps at 720p, as you mention, but we’re also adding effects and features such as 4X anti-aliasing (no jaggies!), motion blur, high dynamic range lighting, global illumination and per-pixel car reflections updated at full frame rate. I could go on and on. Really a ton of stuff. Too much to list here."

"
Source: http://forzamotorsport.net/devcorner/pitpass/pitpass02.htm

So, to answer/guess at the question "1) What features are missing from GPUs that make GI inefficient? When may we see this added?":
Maybe GI is already not that inefficient (on the 360 GPU, anyway...) because of some neat algorithms used together with the massive bandwidth offered by the eDRAM?? (Yeah, so my answer is really another question ;) )

And so people can really visualize what GI (Global Illumination) is all about (because I really didn't know before I did some research):

Source: http://www.finalrender.com/products/feature.php?UD=10-7888-35-788&PID=17&FID=113

Without GI: room_without_gi.jpg

With GI: room_with_gi.jpg

didn't they just add ambient lighting :?:
 