Obviously, introducing dedicated hardware raytracing is kind of starting at "square one" all over again. Will we see GPUs move in a direction where they straddle the fence, keeping their rasterization support while implementing raytracing support in the pipeline? Is this just a matter of more ALUs, more and faster memory, etc., or are there some specific needs still not met?
Given that GPU manufacturers have a lot of vested interest in rasterizers, I don't see a square-one approach working out in real life, short of a newcomer to the market making waves with a whole new "raytracing approach to graphics," which I don't see happening (unless someone introduces it in a console, where legacy isn't a necessity). There may eventually be a need to start including some basic raycasting hardware if people keep pushing for more polygon capacity and the worries over sub-pixel polygons become a need to start *sampling* tris rather than rendering them.
The other possibility is that something comes up that supersedes both triangle rasterization and raytracing in favor of something more "all-inclusive," which sounds a lot like the future Dave Orton predicted a few years back.
There are some raytracing "chips" to accelerate the process in hardware (if we can count an FPGA setup for raytracing as a "chip"... anyhow). How do these get around memory problems? Or are they slow and limited to low geometry levels for that very reason (i.e. memory access that isn't fast enough)?
As it so happens, I believe there are example cases with the SaarCOR FPGA hardware that DO deal with large geometry sets. My best guess is a whole lot of ports to memory and some reasonably sized geometry caches. I honestly don't know what they do, but they're able to do basic first-hit raycast lighting (without shadows) on huge datasets at acceptable framerates for such a high tricount w/ no LOD mechanisms.
e.g. --
http://graphics.cs.uni-sb.de/SaarCOR/DynRT/SunCor-08-1024.jpg
http://graphics.cs.uni-sb.de/SaarCOR/DynRT/SunCor-09-1024.jpg
Apparently 187,145,136 tris, and no LOD. 3-6 fps isn't too bad for that. Still has some floating-point issues, though. I imagine the performance will go to hell with shadow tests, though; they have some examples of around 10,000,000-triangle scenery rendered with shadows at around 2-3 fps.
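To make the shadow-test cost concrete, here's a minimal C++ sketch (nothing to do with SaarCOR's actual design) of first-hit raycast lighting: one scene traversal per pixel, plus a second traversal per lit pixel once shadow tests are turned on. The traverse() stand-in here just hits a ground plane so the sketch compiles; the real expense is walking the full triangle set.

[code]
#include <cmath>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };
struct Hit  { bool found; Vec3 point, normal; };

float dot(const Vec3& a, const Vec3& b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// Stand-in for the expensive part: real hardware walks the whole triangle
// set (via some spatial hierarchy) here. This just intersects a plane at y=0.
Hit traverse(const Ray& r) {
    if (r.dir.y >= -1e-6f) return {false, {}, {}};
    float t = -r.origin.y / r.dir.y;
    Vec3 p{r.origin.x + t * r.dir.x, 0.0f, r.origin.z + t * r.dir.z};
    return {true, p, {0.0f, 1.0f, 0.0f}};
}

float shadePixel(const Ray& primary, const Vec3& toLight, bool withShadows) {
    Hit h = traverse(primary);                   // traversal #1: first hit
    if (!h.found) return 0.0f;                   // ray left the scene
    float lambert = std::fmax(0.0f, dot(h.normal, toLight));
    if (withShadows && lambert > 0.0f) {
        Ray shadowRay{h.point, toLight};         // traversal #2: occlusion test
        if (traverse(shadowRay).found) return 0.0f;
    }
    return lambert;                              // no shadows: pure first-hit lighting
}
[/code]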
The single biggest difference between raytracing hardware and rasterizer hardware, though, is that with raytracing, pixels are in the outer loop and it's the geometry (ray tests) that's in the inner loop. That means performance scales linearly with resolution.
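As a toy illustration of that loop inversion (placeholder types and intersection tests I made up, not any real API), compare the two skeletons below; only the loop ordering matters here.

[code]
#include <vector>

struct Triangle { /* vertex data elided */ };
using Framebuffer = std::vector<int>;

bool covers(const Triangle&, int, int)  { return false; }  // stand-in coverage test
bool rayHits(const Triangle&, int, int) { return false; }  // stand-in ray/tri test

// Rasterization: geometry in the outer loop, pixels in the inner loop.
// Work scales with triangle count (times the pixels each triangle touches).
void rasterize(const std::vector<Triangle>& tris, Framebuffer& fb, int w, int h) {
    for (const Triangle& t : tris)              // outer: geometry
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)         // inner: pixels (a real rasterizer
                if (covers(t, x, y))            // only scans the tri's screen bounds)
                    fb[y * w + x] = 1;
}

// Raytracing: pixels in the outer loop, geometry (ray tests) in the inner
// loop, so cost scales linearly with resolution -- one traversal per pixel.
void raytrace(const std::vector<Triangle>& tris, Framebuffer& fb, int w, int h) {
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)             // outer: pixels
            for (const Triangle& t : tris)      // inner: geometry (naive here; a
                if (rayHits(t, x, y)) {         // spatial hierarchy makes it ~log N)
                    fb[y * w + x] = 1;
                    break;
                }
}
[/code]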
If you were making a checklist of things to change/add to GPUs of today to get to realtime GI, what would they be? (You don't need to make up fake specs, just a general idea of memory requirements and processing requirements).
Assuming you're sticking with rasterization...
Well, being able to buffer off a full scene in VRAM is a big part of it. And that also means a lot of random access to medium-sized blocks of data in RAM. Threading that means huge, several-way-ported RAM. Spatial subdivision hierarchies may be a necessity. Another key to GI is the ability to generate a value and then use it again for further illumination approximations. So, for instance, do a direct-lighting pass and write those results into the geometry data. Then go over the scene again using this data as "light sources"... lather, rinse, repeat, and you have a basic GI approximation.
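In code, that lather-rinse-repeat loop looks roughly like the sketch below (a made-up Patch structure and stand-in directLight/coupling functions, not any particular renderer): compute direct lighting once, store it on the geometry, then repeatedly re-gather with the stored values acting as the light sources.

[code]
#include <vector>

struct Patch {
    float albedo;     // diffuse reflectance
    float direct;     // direct lighting, computed once
    float radiance;   // current estimate, reused as a "light source" next pass
};

// Stand-ins so the sketch is self-contained; a real renderer would compute
// these from the lights, the geometry, and visibility.
float directLight(const Patch&)            { return 1.0f;  }
float coupling(const Patch&, const Patch&) { return 0.01f; }  // form-factor-ish term

void approximateGI(std::vector<Patch>& scene, int passes) {
    // Pass 0: direct lighting, written back into the geometry data.
    for (Patch& p : scene) {
        p.direct   = directLight(p);
        p.radiance = p.direct;
    }
    // Each later pass treats the stored radiance as the light sources.
    for (int pass = 0; pass < passes; ++pass) {
        std::vector<float> next(scene.size());
        for (size_t i = 0; i < scene.size(); ++i) {
            float gathered = 0.0f;
            for (size_t j = 0; j < scene.size(); ++j)
                if (i != j)
                    gathered += scene[j].radiance * coupling(scene[j], scene[i]);
            next[i] = scene[i].direct + scene[i].albedo * gathered;
        }
        for (size_t i = 0; i < scene.size(); ++i)
            scene[i].radiance = next[i];
    }
}
[/code]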
That's similar in concept to a project I was working on in my undergrad days: something of a "distribution raytracer" that sampled vertex data (the rays were just for stepping through the octree to find a bounding box with some geometry in it). It works nicely for diffuse-diffuse interactions, but any specular interactions are problematic since you need to know the sources. And either way, it needs high mesh granularity.
You can't really escape the fact that massive computational power and massive bandwidth are necessary. Think hundreds of TFLOPS and hundreds of TB/sec.
I believe I read somewhere that one advantage of raytracing techniques is that the computational expense scales linearly with geometry, whereas rasterization starts off cheap but becomes more and more expensive as complexity increases, at some point surpassing raytracing.
With decent culling and spatial subdivision techniques, raytracing's expense is logarithmic with geometry and linear with the number of pixels to draw. Of course, the naive O(N*M) approach is linear with geometry, but it's probably faster in practice than any sort of spatial hierarchy when the scene is simple. As a scene gets complex, the ability to reject ray tests early becomes part of what makes it fast (see the sketch below).
Of course, when you get to those levels of complexity, you hit a wall with rasterization anyway, in that LOD becomes necessary just to rasterize the geometry in the first place.
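Roughly, the difference between the naive approach and a spatial hierarchy looks like the sketch below (a placeholder BvhNode and stand-in intersection tests, not any particular implementation): the hierarchical version throws away whole subtrees with a single bounding-box test, which is where the logarithmic behaviour comes from.

[code]
#include <vector>

struct Ray {};
struct Triangle {};

struct BvhNode {
    std::vector<Triangle> tris;        // filled only at leaves
    std::vector<BvhNode>  children;    // empty at leaves
};

bool hitsBox(const Ray&, const BvhNode&)       { return true;  }  // stand-in box test
bool hitsTriangle(const Ray&, const Triangle&) { return false; }  // stand-in tri test

// Naive approach: every ray tests every triangle -- linear in geometry.
bool traceNaive(const Ray& r, const std::vector<Triangle>& tris) {
    for (const Triangle& t : tris)
        if (hitsTriangle(r, t)) return true;
    return false;
}

// Hierarchical approach: one missed bounding box rejects an entire subtree,
// so a typical ray touches roughly O(log N) nodes on a well-balanced tree.
bool traceHierarchy(const Ray& r, const BvhNode& node) {
    if (!hitsBox(r, node)) return false;         // early rejection
    for (const Triangle& t : node.tris)          // leaf: test its few triangles
        if (hitsTriangle(r, t)) return true;
    for (const BvhNode& child : node.children)   // inner node: recurse
        if (traceHierarchy(r, child)) return true;
    return false;
}
[/code]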