Real time raytracing to go mainstream?

Cars did use a lot of raytracing, though, for ambient occlusion and for car paint reflections.
But it was in part due to necessity - they really couldn't do lots of metallic surfaces without it, and they've had some horrible render times as a result. They've even developed a hardware-accelerated 'deep framebuffer' based lighting system to speed up the lighting workflow (otherwise their lighters would still be waiting for new test renders...).
 
A lot of recent movies outside of Pixar are also moving towards GI simulations, though hardly complete ones, and they mostly rely on irradiance caching to get relatively consistent (if not 100% correct) results from fewer samples. In Pixar's case, though, as with several studios before them, it's pretty confined to specific tasks (e.g. photon mapping specifically for caustics in FF:TSW).

Nonetheless, everything that is rendered for movies still sees *raycasting* out of necessity for sampling subpixel details. That was always the rule with REYES: you can't rasterize sub-pixel polygons -- you sample them.

As for ART VPS' results, there are several reasons for this, though I'm curious where you got your specs. The biggest thing, of course, is that it's specifically geared to accelerate things that were tens of hours long to render and render them in a few minutes. It's not going to accelerate some arbitrary Quake 4 scenery applying all the same lighting models... it's out to accelerate Mental Ray or RenderMan shaders, which are hugely complex and involve dozens and dozens of samples every which way. The predecessor to the chips on the PURE card (the AR250) ran at 50 MHz on a 0.35 µm CMOS process, and was essentially a bundle of non-pipelined scalar FP pipes with raytracing firmware running on them. And it handled up to 80 million ray checks per second (peak). That I got from some talk slides, btw.
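For what it's worth, a quick sanity check on those quoted peak figures (nothing here beyond the numbers above):

ray_checks_per_second = 80e6    # up to 80 million ray checks/s (peak), per the slides
clock_hz = 50e6                 # 50 MHz part
print(ray_checks_per_second / clock_hz, "ray checks per clock across the whole chip")  # -> 1.6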

Scaling rate on multiple cards in a rackmount layout is going to be limited by other factors before you get into problems with power -- for instance, the network performance and the fact that these cards are sitting on regular old 33/66 MHz PCI. And the fact that if you're rendering something that mandates that many Renderdrives, it's probably a hell of a lot more massive than anything you'd bother with in a game.
 
I'm curious where you got your specs
I've just taken their benchmarks for RenderDrive 16 and assumed a rough rendering time of around 2 minutes per frame.
The biggest thing, of course, is that it's specifically geared to accelerate things that were tens of hours long to render and render them in a few minutes.
Sure, but if you look at the scenes featured in the benchmark, they are actually pretty simple close-ups with little depth and low object counts. Most scenes shown in the photo gallery are much more complex, so I bet those took more than the 2 minutes specified.
Scaling rate on multiple cards in a rackmount layout is going to be limited by other factors before you get into problems with power -- for instance, the network performance
The cards probably do NOT render the same frame all at once, since they employ a proprietary rendering engine which conforms to the RenderMan, Maya and Autodesk interfaces and thus has access to all the required scenery.
 
I've just taken their benchmarks for RenderDrive 16 and assumed a rough rendering time of around 2 minutes per frame.
I was referring more to the specs about the cards and the machines they ran in. From the site, it looks like the RenderDrive 16 is only one of those PURE cards. Moreover, I don't see a MHz figure anywhere. I don't know if you're assuming that "AR350" implies 350 MHz, but I know that the AR250 definitely didn't run at 250 MHz.

Sure, but if you look at the scenes featured in the benchmark, they are actually pretty simple close-ups with little depth and low object counts.
Yeah, but it doesn't say much about the render features or what sort of lighting environment it was put into or what was running in the shaders, or even what the original render resolution was and how many AA samples were taken. For instance, the watch pic appears to have some area lights -- how many samples did it take? The water image -- who knows if the water surface was explicitly simulated? If they gave some more information on that, it would be more clear how impressive or not it is.

In either case, I don't know how much has changed since the AR250, but I don't know if I'd consider it a canonical example of raytracing hardware. SaarCOR is probably the most thorough I can think of at the moment, considering they've gone as far as laying out an API framework and everything. Freon 2/7 is rather impressive since they're even able to simulate caustics to some extent, but the project has been in a pretty dead state for the last 2 years -- AFAIK, all they have is a software simulation.
 
Blue Sky (Ice Age, Robots) uses a ray tracer exclusively. They also use mostly procedural textures so they're a bit different than most studios. Ray tracing definitely has some advantages over rasterization, but I think it will be quite some time before we see fully ray traced games.
 
Take a breath, raytracing hasn't even made it into offline rendering up until now.
What? As a sweeping statement that's obviously incorrect. Presumably you are only referring to mainstream movies, in which case you are still wrong because, IIRC, "Ice Age" used it.

Even with raytracing, you still need radiosity to do soft shadows and coloured shadow bleeding effects.
I guess you don't mean the usual definition of soft shadows, as those can be done easily enough with ray tracing -- either with stochastic models or with "fat" rays (e.g. "Ray Tracing with Cones" by Amanatides). I do agree about the global illumination problem still being tricky.
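For anyone unfamiliar with the stochastic approach, a minimal sketch (the scene.occluded and area_light.sample interfaces are hypothetical, purely for illustration): fire several shadow rays at randomly sampled points on the area light and average the visibility, which gives the penumbra directly.

def soft_shadow(scene, hit_point, area_light, n_samples=16):
    """Estimate fractional visibility of an area light with stochastic shadow rays.

    scene.occluded(a, b) is a hypothetical query returning True if anything
    blocks the segment a->b; area_light.sample() returns a random point on
    the light's surface. 0.0 means fully shadowed, 1.0 fully lit.
    """
    visible = 0
    for _ in range(n_samples):
        light_point = area_light.sample()            # jittered point on the light
        if not scene.occluded(hit_point, light_point):
            visible += 1
    return visible / n_samples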
 
the "Renderdrive 16" is only one of those PURE cards
And it gets a frame rendered in about 120 s, which means three rack units with 4 cards each will only peak at 10 s per frame, or 0.1 fps... that's for about $45 000 per single rack.
SaarCOR is probably the most thorough I can think of at the moment, considering they've gone as far as laying out an API framework and everything.
It is superseded by the RPU project, http://graphics.cs.uni-sb.de/~woop/rpu/rpu.html

They're rendering at 512x384 at 20 fps (sometimes as low as 4 fps) with antialiasing on a dual-chip 66 MHz setup. Khm... looks like BitBoys :LOL:


Moreover, I don't see a MHz figure anywhere.
OK, my mistake, removed it.
if they gave some more information on that, it would be more clear how impressive or not it is.
Sure, but we have what we have.
 
As a sweeping statement that's obviously incorrect. Presumably you are only referring to mainstream movies, in which case you are still wrong because, IIRC, "Ice Age" used it.
"Tiny, insignificant detail" (c) Love Actually :LOL:
I think the main point is how much the setup costs and how much time it takes to render a frame.
 
Sorry, my mistake.
I just used a dictionary. I got my terms mixed up.


I don't follow your logic here. You don't have to *draw* anything before tracing rays. You still have to create actual textures and models, but that doesn't have anything to do with having to draw something. In a lot of realtime raytracers, they do use the GPU to dump a Z-Buffer to accelerate the first hit, but that's not a requirement by any means.[...].
 
For global illumination and shadow maps, you can use the z-buffer as a solid 3D representation. But you still need to do both illumination and shadowing that way, to have smooth borders.
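To make the z-buffer trick concrete, a minimal sketch (the depth convention and matrix layout are assumptions): the rasterized depth buffer is unprojected back to world space, which gives the primary-hit positions that secondary rays (shadow, reflection, GI) can then start from.

import numpy as np

def first_hits_from_zbuffer(depth, inv_view_proj, width, height):
    """Reconstruct world-space primary-hit positions from a depth buffer.

    depth[y, x] is assumed to be in [0, 1]; inv_view_proj is the inverse of
    the camera's 4x4 view-projection matrix. The returned points can serve
    as ray origins for secondary rays instead of tracing the first hit.
    """
    hits = np.empty((height, width, 3))
    for y in range(height):
        for x in range(width):
            # pixel centre and depth mapped to normalized device coordinates
            ndc = np.array([(x + 0.5) / width * 2.0 - 1.0,
                            (y + 0.5) / height * 2.0 - 1.0,
                            depth[y, x] * 2.0 - 1.0,
                            1.0])
            world = inv_view_proj @ ndc
            hits[y, x] = world[:3] / world[3]        # perspective divide
    return hits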
 
They're rendering at 512x384 at 20 fps (sometimes as low as 4 fps) with antialiasing on a dual-chip 66 MHz setup.
Which is still something, given the 66 MHz clock, only 350 MB/s of memory bandwidth and so on.

Khm... looks like BitBoys
Well, I do think that their concept will never hit an actual product, so in that sense, they're like Bitboys. But in my mind, the team there is really little more than an academic research group who sorta kinda formed a semi-company of sorts. I don't really see them as someone attempting to compete in the market (the way Bitboys did), but rather a team attempting to show the feasibility of such a concept.
 
the team there is really little more than an academic research group who sorta kinda formed a semi-company of sorts. I don't really see them as someone attempting to compete in the market
I guess you're right in saying that major companies will probably not pick it up, mostly because it would mean a complete redesign of their rendering architectures, which they wouldn't dare to attempt. I doubt it's really a bad thing; traditional rasterizers can yield very competitive results if used properly - I for one was more impressed with the scenes in 3DMark 2006, even though I admit that the lighting looks arguably better in the RPU demos... In the end, it's all about game content and art.
 
There are newer designs of RPU-like things. Simulations have been made of an unoptimized 285 MHz 8-pipeline ASIC and the results are quite promising:

The whole presentation I took the picture from is here:
http://graphics.cs.uni-sb.de/~slusallek/Presentations/ANS06.pdf
From there:

Projections
• ATI R-520: 288 mm² in 90 nm process

• D-RPU-4 (190 mm², 130 nm)
• 90 GFLOP/s @ 200 MHz
• 6.4 GB/s achievable with DDR2 memory

• D-RPU-8 (181 mm², 90 nm)
• 258 GFLOP/s @ 285 MHz (constant field scaling)
• 18.3 GB/s (4-channel DDR-2 or XDR memory)

Seeing how big the die areas of R600 and G80 will be, I would think tripling or even quadrupling the pipeline count shouldn't be that much of a problem for the D-RPU. As it also has insanely long pipelines (>100 cycles IIRC), I would think it should be relatively simple to scale up the core speed by moving to better process technology. Of course, that would probably take way more money than they have.


Also, Cell is quite nice for ray tracing. Each of its eight SPUs is almost as fast as a single-core AMD; with eight of them (7 in the PS3) it ends up around 5-7x faster than a single-core AMD. Compared to a dual-core Core 2 it should only be about 1.3-1.8x faster, thanks to the latter's genuinely 128-bit wide SIMD units.
http://www.sci.utah.edu/~wald/Publications/2006///Cell/download//cell.pdf

Combining Cell with RSX might be quite an interesting thing. They have enough bandwidth to burn transferring data around between each other.

ShootMyMonkey said:
the team there is really little more than an academic research group who sorta kinda formed a semi-company of sorts. I don't really see them as someone attempting to compete in the market
The company is called InTrace and they have quite a good market for themselves.
http://www.intrace.com/
IIRC, BMW and VW are two of the biggest clients.

A slightly better team working on CPU ray tracing can be found here: http://ompf.org/forum/index.php
They have a couple of quite nice demos there. Unfortunately, as they are still only optimizing their implementations, there is nothing but images of untextured scenes traced with primary rays only. Of course, as the source is available, everyone can add that themselves if they feel the need :)

Also, that place should be the best source of information about ray tracing. In a few days or weeks there should be quite a few links and information about what was shown and talked about at this year's SIGGRAPH.


Ray tracing is coming, there is no question about that. The only question is how soon it will arrive. My projection is 5 years to the first really good HW tracers for big clients and 10 years to reach common (high-end workstation?) PCs.
 
Intrace, the RPU, OpenRT, and Utah's Wald all came from the same university. No real point here other than pointing out the impact one university has had in the real time ray tracing field.
 
AFAIR, there's also the issue that compared to classical rendering, raytracing doesn't scale linearly. That could matter for game-devs, because as a dev you want to be able to predict what happens under different scenarios, no?
I also seem to remember RT scaled more or less logarithmically, correct?
 
3dcgi said:
No real point here other than pointing out the impact one university has had in the real time ray tracing field.
Indeed, they have influenced the scene the most, but there are others researching the area too. E.g. an interesting piece of research from Intel was about beam tracing, which gave quite a nice speed boost to tracing primary rays.
ftp://download.intel.com/technology/computing/applications/download/mlrta.pdf
AFAIR, there's also the issue that compared to classical rendering, raytracing doesn't scale linearly.
You are correct. It scales logarithmically with increased scene complexity.
That could matter for game-devs, because as a dev you want to be able to predict what happens under different scenarios, no?
Between logarithmic and linear scaling I would pick logarithmic any day. Estimating what happens in different scenarios with logarithmic scaling is just as simple as with linear scaling; it's just that with logarithmic scaling the worst case is not that much slower than the average case, unlike with linear scaling.
I also seem to remember RT scaled more or less logarithmically, correct?
Correct. That means if you increase the triangle count 10x, speed will not decrease 10x but by less -- something around 2-4x is a good estimate. If you increase the triangle count by 100x, speed will decrease around 4-8x, possibly a bit more if you hit some bottleneck. That logarithmic scaling is one of the best things about ray tracing.
With rasterizing things are much worse, as you have to pump most of those added triangles through the pipeline, especially if you have high depth complexity.
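To put rough numbers on that, here is a back-of-the-envelope sketch (a pure log2 cost model, which is on the optimistic side; real tracers land closer to the 2-4x above once tree quality and memory behaviour are factored in):

import math

def predicted_slowdown(base_tris, factor):
    """Idealized per-ray cost: logarithmic (tracing with a good acceleration
    structure) versus linear (pushing every added triangle through a pipeline)."""
    log_ratio = math.log2(base_tris * factor) / math.log2(base_tris)
    return log_ratio, float(factor)

for factor in (10, 100):
    log_r, lin_r = predicted_slowdown(1_000_000, factor)   # start from a 1M-triangle scene
    print(f"{factor:>3}x triangles: ~{log_r:.2f}x slower (log model) "
          f"vs {lin_r:.0f}x slower (linear model)")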

The only problem with ray tracing scaling is that it scales a little less than linearly with increased primary ray count (resolution and AA level). I can't quote exact figures, but doubling the ray count made tracing a single frame around 70-85% slower. With rasterizing things are a bit better, but I don't know by how much.
 
It scales logarithmically with increased scene complexity.
It can scale logarithmically, if you're clever, but it's not inherently logarithmic.
Between logarithmic and linear scaling I would pick logarithmic any day.
I would not.
The mathematical definition of complexity is based on a limit, so pay attention to constant costs: it might be the case that your log-complexity algorithm gets beaten by a linear or superlinear complexity algorithm with vastly smaller constant costs for any reasonable N.
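To put numbers on that constant-cost point, a tiny illustration (the constants are made up purely for the example):

import math

def cost_log(n, c=50_000.0):    # O(log N) algorithm with a large per-step constant
    return c * math.log2(n)

def cost_linear(n, c=0.1):      # O(N) algorithm with a tiny per-primitive constant
    return c * n

for n in (10**4, 10**5, 10**6, 10**7, 10**8):
    winner = "linear" if cost_linear(n) < cost_log(n) else "log"
    print(f"N={n:>9}: log {cost_log(n):>12.0f}  linear {cost_linear(n):>12.0f}  cheaper: {winner}")
# With these constants the linear method stays cheaper until N is past ten
# million, even though it has the worse asymptotic complexity.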
with logarithmic scaling the worst case is not that much slower than the average case, unlike with linear scaling.
that's wishful thinking, unfortunately that's not the case.

Marco
 
It can scale logarithmically, if you're clever
There is little point in not being clever :)
but it's not inherently logarithmic.
RT is logarithmic in scene complexity, there is no question about that. If you think otherwise, please explain. I would also be interested to hear what scales better in rasterizing compared to RT.

If you meant photon mapping and GI, then depending on the implementation those might not scale as well as regular ray tracing, but generally they should still scale logarithmically, though probably with bigger constant costs.

I also think that photon mapping and GI are not very viable rendering methods yet. They need an order of magnitude more computing power than RT; of course, they also give a lot better image quality.

Btw, could anyone have dreamt in '96 that real-time dynamic shadows would be doable, or that in ten years we could render 5M+ triangles at interactive framerates instead of only a thousand or so?
I would not.
The mathematical definition of complexity is based on a limit, so pay attention to constant costs: it might be the case that your log-complexity algorithm gets beaten by a linear or superlinear complexity algorithm with vastly smaller constant costs for any reasonable N.
That is mostly correct in today's world. Ray tracing is not very competitive at low scene complexity, but things get a lot more interesting once scenes become complicated.

Just imagine what it would be like to render a 350M independent-triangle airplane (~8 GiB of triangles) on a single PC. A 1.8 GHz 2P single-core Opteron can render it at 1-3 FPS with simple shading and shadows @ 640x480.
http://openrt.de/Applications/boeing777.php
It takes several seconds just to pump that data through a PCIe link, not to mention the nightmare it would be to use any kind of good space partitioning on it to make it rasterizable. With RT there are algorithms that can build a decent B-KD tree automatically from almost any source data, though with that kind of immense amount of data it would take a while.
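A quick back-of-the-envelope check on that transfer time (the 4 GB/s figure is an assumed peak for a first-generation PCIe x16 link; sustained rates are lower, so the real number is worse):

data_bytes = 8 * 1024**3          # ~8 GiB of triangle data, as quoted above
pcie_x16_peak = 4e9               # assumed ~4 GB/s peak for PCIe 1.x x16
print(f"~{data_bytes / pcie_x16_peak:.1f} s just to move the scene once")   # ~2.1 s at best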

Of course, that was an extreme example, and most games today don't have >1M triangles in the view frustum, so it is relatively complicated to compare rasterizing and ray tracing performance in games. Also, the fact that every single game is designed and optimized with only rasterizing in mind doesn't improve things either. There have been a couple of attempts (I know of three) at RT games, but most of them use rather old techniques. One such is Oasen:
http://graphics.cs.uni-sb.de/~morfiel/oasen/
Just compare the detail of its huge landscape with the detail in Oblivion. The latter uses so much LOD that it isn't funny, and it still chokes high-end GPUs.
that's wishful thinking, unfortunately that's not the case.
Of course you can construct special cases where things blow up, but you can do the exact same thing with anything, including rasterizing.


One huge problem in comparing RT vs rasterizing is that people mostly compare high-end GPUs against software implementations. A bit fairer would be to compare software vs software. E.g. UT 2004 has a software engine, and my previous 2.8 GHz P4 choked on it when rendering 320x240 upscaled to 640x480 with extremely little detail and massive LOD.
http://www.radgametools.com/pixofeat.htm
You can read about their inhuman optimization efforts here:
http://www.ddj.com/dept/global/184405765
http://www.ddj.com/184405807
http://www.ddj.com/184405848


A regular x86 CPU is the second worst thing to run RT on after current GPUs. Even Cell is not much better; it just has more power per die but is just as inefficient. Unfortunately there are not many HW products to use for comparison. There is ART's PURE series of ray tracing HW, but that is not meant for real-time rendering. There are several versions of the RPU, but so far they haven't gone much further than research. Perhaps at next year's SIGGRAPH we will hear something interesting from them.

RT has become interesting only during the last 5-8 years or so. Rasterizing has been used in high-end markets for ~25 years. During the last decade, huge amounts of cash have been pumped into researching and developing rasterizing techniques and HW. If the same happened with RT, things would get much more interesting. In my opinion, RT is much more future-proof, mostly thanks to logarithmic scaling, global access to the scene at the pixel level, and smaller memory bandwidth requirements.
 
The thing is that current and future GPUs are quite inept at RT, and I don't think that nV or AMD/ATi will risk, in the foreseeable future, scrapping all of their current research and investments in order to enter the virgin field of RT. For example, a GPU would suck at traversing a tree, and I seem to recall that you need to do that quite well for RT. And that's just one example.
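To make the tree-traversal point concrete, here is a schematic single-ray BVH traversal (the node layout and the hit_box/intersect helpers are hypothetical): every iteration branches on per-ray data and keeps a per-ray stack, which is exactly the kind of incoherent control flow that SIMD-style GPUs of this era handle poorly.

def traverse_bvh(ray, nodes, root=0):
    """Schematic bounding-volume-hierarchy traversal for a single ray.

    Inner nodes: {'bbox', 'left', 'right'}; leaves: {'bbox', 'triangles'}.
    hit_box(ray, bbox) and intersect(ray, tri) are hypothetical helpers
    (intersect returns a hit distance or None). Neighbouring rays quickly
    take different paths through this loop, which is cheap on a CPU or
    dedicated RT hardware but hostile to wide SIMD pipelines.
    """
    stack = [root]
    best = None
    while stack:
        node = nodes[stack.pop()]
        if not hit_box(ray, node['bbox']):       # prune whole subtrees
            continue
        if 'triangles' in node:                  # leaf: test its few triangles
            for tri in node['triangles']:
                t = intersect(ray, tri)
                if t is not None and (best is None or t < best):
                    best = t
        else:
            stack.append(node['left'])
            stack.append(node['right'])
    return best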
 
RT is logarithmic in scene complexity, there is no question about that. If you think otherwise, please explain.
Throw a polygon soup at a ray tracer and tell me if its complexity is still logarithmic.
I would also be interested to hear what scales better in rasterizing compared to RT.
It's completely irrelevant as I didn't write algorithm A is better than algorithm B.
I wrote that picking RT over rasterization in every case just because the former scales as O(log N) is wrong, and this is not about computer graphics; it just follows from the definition of complexity.
BTW rasterization can be sublinear as well if you're clever enough.
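As one illustration of how rasterization can go sublinear, a sketch in the spirit of hierarchical occlusion culling (the scene tree and the hiz depth-pyramid interfaces are made up): whole subtrees are rejected against a coarse depth pyramid before any of their triangles ever reach the pipeline.

def draw_visible(node, hiz, camera):
    """Schematic hierarchical occlusion culling over a scene tree.

    project(camera, bounds), hiz.occludes(rect), hiz.refresh(),
    rasterize(tris) and front_to_back(children, camera) are hypothetical
    helpers. Occluded subtrees are skipped without touching their triangles,
    which is what makes the cost sublinear for heavily occluded scenes.
    """
    rect = project(camera, node.bounds)          # conservative screen-space bounds
    if hiz.occludes(rect):                       # hidden behind already-drawn geometry
        return
    if node.is_leaf():
        rasterize(node.triangles)                # only now do triangles enter the pipeline
        hiz.refresh()                            # rebuild the coarse depth pyramid
    else:
        for child in front_to_back(node.children, camera):
            draw_visible(child, hiz, camera)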
 