What's the current status of "real-time Pixar graphics"?

Just because it's possible doesn't mean it's practical.

The paper implies the availability of infinite amounts of memory (to store temporary values), and potentially very long execution times.

For example, take 'for (i = 0; i < arbitraryvalue; i++)': if there is no known bound on arbitraryvalue, then the loop has to be executed MAX_INT times...
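
To make that concrete, here is a minimal sketch in plain C (not any real shader instruction set) of what the flattening looks like. MAX_ITER is a hypothetical worst-case bound; with no known bound at all it would have to be the full counter range, i.e. MAX_INT:

#define MAX_ITER 1024                        /* assumed worst-case bound */

float flattened_sum(float x, int arbitraryvalue)
{
    float acc = 0.0f;
    int i;
    /* Fixed trip count, so the loop can be fully unrolled into
       straight-line code with no branches. */
    for (i = 0; i < MAX_ITER; i++) {
        float body = x * (float)i;                        /* stand-in loop body, always executes */
        float live = (i < arbitraryvalue) ? 1.0f : 0.0f;  /* predicate */
        acc += live * body;                               /* masked accumulate */
    }
    return acc;
}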
 
Daliden said:
Mr. Blue said:
DiGuru said:
Thanks, Mr. Blue.

:D

Sorry if I get this wrong again, but do you take the shaders into account? As far as I understand it, you can use them to create the same effects you mention, as long as you can cram the function into the program space they have and you don't use conditional branches in the pixel shaders.

But that's just it. You can't cram the more complicated shaders (the ones used for production) into a few registers with limited conditional branching.

Hmm, doesn't this contradict Peercy's paper "Interactive Multi-Pass Programmable Shading"? As I understood it, he stated that any RenderMan program can be broken down into several passes.

What paper was it? I've got most of the SIGGRAPH papers here, so I can just read it and let you know what I think. You can't do it if a pass contains branches and loops (which most complex shaders do).


-M
 
Any program can be converted into a linear program using predication to replace branching and loops, but the program size explodes.
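
A tiny C illustration of why the size explodes (expensive_a and expensive_b are hypothetical stand-ins for the two sides of a branch): both paths always execute and a select picks the result, so the flattened program pays for every path it might have skipped.

float expensive_a(float x);   /* hypothetical 'then' path */
float expensive_b(float x);   /* hypothetical 'else' path */

/* Branchy form:    r = (x > 0.0f) ? expensive_a(x) : expensive_b(x);
   Predicated form: evaluate both sides, then select without control flow. */
float predicated(float x)
{
    float a = expensive_a(x);              /* always evaluated */
    float b = expensive_b(x);              /* always evaluated */
    float p = (x > 0.0f) ? 1.0f : 0.0f;    /* predicate as 0/1 */
    return p * a + (1.0f - p) * b;         /* arithmetic select */
}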
 
I read the same paper, and it can be done. But when you run very long shader programs, it isn't going to be real-time. To do that, you have to 'cheat' and make a nice approximate function that runs in as few passes as possible.

If you don't care about real-time, I think a bunch of 9800s in parallel (see the Sapphire R9800 Maxx for an example) would run those RenderMan programs nice and fast.
 
Dio said:
Any program can be converted into a linear program using predication to replace branching and loops, but the program size explodes.

But of course... the more instructions are allowed, the closer the GPU comes to being a CPU!

-M
 
There was another paper that explained how you could even do raytracing this way, but I cannot find it anymore.

To what extent would the FP24/FP32 precision hamper such a setup? Or could you get better precision by increasing the program length yet again? You can on a CPU, but I have no idea if that would be feasible on a GPU.

EDIT: Found it:

http://graphics.stanford.edu/papers/rtongfx/rtongfx.pdf
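
On the precision question above: the standard CPU trick is "float-float" arithmetic, which carries one value as an unevaluated sum of two floats, roughly doubling the precision at the cost of several extra instructions per operation. Below is a minimal C sketch of the classic Dekker/Knuth construction; whether it would work on a GPU is exactly the open question, since the trick relies on exact round-to-nearest behavior that FP24/FP32 shader pipelines don't necessarily guarantee.

typedef struct { float hi, lo; } ff;       /* value represented as hi + lo */

static ff two_sum(float a, float b)
{
    float s  = a + b;
    float bb = s - a;
    float e  = (a - (s - bb)) + (b - bb);  /* exact rounding error of a+b */
    ff r = { s, e };
    return r;
}

static ff ff_add(ff x, ff y)
{
    ff s = two_sum(x.hi, y.hi);
    s.lo += x.lo + y.lo;
    return two_sum(s.hi, s.lo);            /* renormalize */
}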
 
mrbill said:
Mr. Blue said:
No one has yet given the year that this paper was written. I would like to read his proposal...

2000. See http://www.csee.umbc.edu/~olano/papers/ips/ips.pdf .
(Marc Olano is co-author and now a professor at UMBC.)

-mr. bill

Ok, I just skimmed over this paper and there are a lot of gotchas:

1) It's a software implementation.
2) Inaccuracies due to lookup tables.
3) Bandwidth will be a problem for n passes.
4) No displacement mapping.
5) No transparency (which would slow it down even more due to sorting).
6) No possibility of ray-tracing.
7) No procedural texture filtering.

Again, I state, 3d hardware isn't there yet and won't be for quite some time.

-M
 
Mr. Blue said:
mrbill said:
Mr. Blue said:
No one has yet given the year that this paper was written. I would like to read his proposal...

2000. See http://www.csee.umbc.edu/~olano/papers/ips/ips.pdf .
(Marc Olano is co-author and now a professor at UMBC.)

-mr. bill

Ok, I just skimmed over this paper and there are a lot of gotchas:

1) It's a software implementation.
2) Inaccuracies due to lookup tables.
3) Bandwidth will be a problem for n passes.
4) No displacement mapping.
5) No transparency (which would slow it down even more due to sorting).
6) No possibility of ray-tracing.
7) No procedural texture filtering.

Again, I state, 3d hardware isn't there yet and won't be for quite some time.

-M

It's three years old. A lot has changed. All the things you address (except raytracing) can be done on a 9800 nowadays; most of it is supported directly by the hardware. And see my last post for a way to do raytracing as well.
 
DiGuru said:
It's three years old. A lot has changed. All the things you address (except raytracing) can be done on a 9800 nowadays; most of it is supported directly by the hardware. And see my last post for a way to do raytracing as well.

Not true. Why does it sound like I'm repeating myself over and over? If the current 3d boards were a viable solution to our render farms, we'd invest in them. The fact is - they aren't.

o Displacement mapping can *not* be done in realtime for the type of models that VFX houses use. I have yet to even see it in a game.

o The 9800 can *not* implement n texture passes in realtime.

o The 9800 can *not* implement noise shaders in realtime (all instructions done in hardware with *no* lookups) due to its limited instruction set and API.

o The 9800 can *not* implement complex lighting models in realtime, also due to its design and API.

o The 9800 does *not* have enough memory or bandwidth to render the kinds of scenes that go on the screen.

I'm sure that when that happens, the whole film industry will be talking about it. As of now, we only speak about 3d hardware for the games we play.

-M
 
Mr. Blue said:
DiGuru said:
It's three years old. A lot has changed. All the things you address (except raytracing) can be done on a 9800 nowadays; most of it is supported directly by the hardware. And see my last post for a way to do raytracing as well.

Not true. Why does it sound like I'm repeating myself over and over? If the current 3d boards were a viable solution to our render farms, we'd invest in them. The fact is - they aren't.

o Displacement mapping can *not* be done in realtime for the type of models that VFX houses use. I have yet to even see it in a game.
You've actually tried it then?
o The 9800 can *not* implement n texture passes in realtime.
Isn't that quite dependent on how large n is?
o The 9800 can *not* implement noise shaders in realtime (all instructions done in hardware with *no* lookups) due to its limited instruction set and API.
F-buffer allows "infinite" instructions.
o The 9800 can *not* implement complex lighting models in realtime, also due to its design and API.
This is just a matter of how many instructions you need, right?
o The 9800 does *not* have enough memory or bandwidth to render the kinds of scenes that go on the screen.
What resolution do you think will be enough?
I'm sure that when that happens, the whole film industry will be talking about it. As of now, we only speak about 3d hardware for the games we play.
One big application for cards like the 9800 is real-time modeling. Instead of having to send off your model to a render farm and wait for the results, you can get "immediate" results with your 9800. This may mean taking a few shortcuts, but the fact that it's much faster than waiting for a render farm makes a large difference.
 
Well, the render farm for Toy Story 1, as I said in another thread, was capable of a peak 39.9 GFLOPS; due to its distributed nature, its real-world performance probably hovered around 10-20 GFLOPS. And this was back in the early 90s...

I think a couple more orders of magnitude of performance and more flexible hardware will allow us to render something that approaches it... at the latest, I'd say, we will have the necessary hardware by the end of this decade.
 
OpenGL guy said:
o Displacement mapping can *not* be done in realtime for the type of models that VFX houses use. I have yet to even see it in a game.

You've actually tried it then?

I don't really need to try it. I know from how taxing it is on our systems that displacement mapping takes a very long time to tessellate.

Again, I'm talking about full production models in movies. Not simple objects.

o The 9800 can *not* implement n texture passes in realtime.
Isn't that quite dependent on how large n is?

Sure, but "n" usually means large in the computer world. "N" also denotes no bounds.

o The 9800 can *not* implement noise shaders in realtime (all instructions done in hardware with *no* lookups) due to its limited instruction set and API.

F-buffer allows "infinite" instructions.

How about conditional branching and looping (which are required to implement a fully functional filtered noise)? How about shader trees, where several shaders can call other shaders? Can I compute a gradient noise vector by calling it 3 times, shifting my x, y, and z values by a small delta?

o The 9800 can *not* implement complex lighting models in realtime, also due to its design and API.

This is just a matter of how many instructions you need, right?

Hehehehe. We need a lot, depending on the lighting model. :)

o The 9800 does *not* have enough memory or bandwidth to render the kinds of scenes that go on the screen.

What resolution do you think will be enough?

Film res depends. Most films output in full floating point at 2048x768 res, but that would obviously change depending on the film.

I'm sure that when that happens, the whole film industry will be talking about it. As of now, we only speak about 3d hardware for the games we play.

One big application for cards like the 9800 is real-time modeling. Instead of having to send off your model to a render farm and wait for the results, you can get "immediate" results with your 9800. This may mean taking a few shortcuts, but the fact that it's much faster than waiting for a render farm makes a large difference.

I've never heard of render farms rendering out a model. If we are talking about a basic model that isn't lit or textured, then depending on the complexity of the model, you are correct. However, scenes are out of the question, and hair models are also out of the question. LOD would be the best approach in this regard, as we would be able to move the camera around a lower level of detail model and then continue from there. But as we start building on this model by putting joints, IK, etc. on it, it quickly becomes a bottleneck to move around in realtime. Most of the time, TDs view in wireframe in this case and only resort to turning on full detail when they want to view the model but not manipulate it.

Basically, the most immediate application (after model manipulation) would be to have the accelerator handle some of the simpler tasks in the render pipe and work alongside the render farms. This will have an immediate speed-up effect, no matter how simple the task may be.

-M
 
Mr. Blue said:
I don't really need to try it. I know from how taxing it is on our systems that displacement mapping takes a very long time to tessellate.
I've seen tessellations down to the pixel level. This was done in real time. You can go subpixel if you like.
Sure, but "n" usually means large in the computer world. "N" also denotes no bounds.
I know what "n" means. But "no bounds" and computers have nothing in common: Computers always have a bound, whether it's instructions, memory, storage, whatever. How large could "n" be? 100? 1000? 1000000? There has to be some practical limit.
How about conditional branching and looping (which are required to implement a fully functional filtered noise)? How about shader trees, where several shaders can call other shaders? Can I compute a gradient noise vector by calling it 3 times, shifting my x, y, and z values by a small delta?
I have no idea, I don't write apps. However, I know that lots of branching can be handled by expanding conditionals. Sure it's expensive, but it'll work.
Hehehehe. We need a lot, depending on the lighting model. :)
So what's a lot? 1000? 10000? 1000000?
Film res depends. Most films output in full floating point at 2048x768 res, but that would obviously change depending on the film.
2048x768 is not a problem whatsoever, float or not. Now if you add in AA, then you might have some issues. However, you wouldn't necessarily need AA for modeling work. Of course, you could use a "farm" of machines, each rendering a slightly jittered version of the desired image, and then composite the results together later to get nice AA.
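
The jitter-and-composite idea is just an equal-weight average of offset renders. A minimal C sketch, where render_view() is a hypothetical stand-in for one complete render with a subpixel camera offset (each call could run on a different machine in the "farm"):

void render_view(float *rgb, int w, int h, float dx, float dy);

void jittered_aa(float *out, float *tmp, int w, int h)
{
    /* 4 subpixel sample positions; more samples give smoother edges */
    static const float jit[4][2] = { {0.25f, 0.25f}, {0.75f, 0.25f},
                                     {0.25f, 0.75f}, {0.75f, 0.75f} };
    int s, p;
    for (p = 0; p < w * h * 3; p++)
        out[p] = 0.0f;
    for (s = 0; s < 4; s++) {
        render_view(tmp, w, h, jit[s][0], jit[s][1]);
        for (p = 0; p < w * h * 3; p++)
            out[p] += tmp[p] * 0.25f;      /* equal-weight composite */
    }
}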
Basically, the most immediate application (after model manipulation) would be to have the accelerator handle some of the simpler tasks in the render pipe and work alongside the render farms. This will have an immediate speed-up effect, no matter how simple the task may be.
There are issues there as well such as precision differences between the HW and SW implementations. Not an insurmountable problem, but very real.
 
OpenGL guy said:
Mr. Blue said:
I don't really need to try it. I know from how taxing it is on our systems that displacement mapping takes a very long time to tessellate.

I've seen tessellations down to the pixel level. This was done in real time. You can go subpixel if you like.

On what kind of model? A champagne glass? :)

Sure, but "n" usually means large in the computer world. "N" also denotes no bounds.

I know what "n" means. But "no bounds" and computers have nothing in common: Computers always have a bound, whether it's instructions, memory, storage, whatever. How large could "n" be? 100? 1000? 1000000? There has to be some practical limit.

Ok, let's put it this way. Our practical limit far exceeds a 3d accelerator's practical limit for the time being.

How about conditional branching and looping (which are required to implement a fully functional filtered noise)? How about shader trees, where several shaders can call other shaders? Can I compute a gradient noise vector by calling it 3 times, shifting my x, y, and z values by a small delta?

I have no idea, I don't write apps. However, I know that lots of branching can be handled by expanding conditionals. Sure it's expensive, but it'll work.

Well, when ATI/Nvidia present their latest and greatest, we'll see what the limitations and drawbacks will be. :) I can assure you that you won't see a fully 3d-featured film rendered in realtime on the big screen anytime soon... :)


2048x768 is not a problem whatsoever, float or not. Now if you add in AA, then you might have some issues.

You think VFX movies don't AA??

However, you wouldn't necessarily need AA for modeling work.

The limitation for modeling work is, again, the amount of geometry.


-M
 
I was thinking...

Maybe we should drop the "real-time" part completely from this discussion, as it is basically a meaningless issue in rendering. I mean, it's not like there's a computer at the movie theatre or the TV station rendering the movie/TV episode on the fly.

What's the fps of renderfarms used for lower-quality TV production work? I have no idea if it's over or under 1 fps (I suppose it depends on FX quality). Considering the very very tight schedules of weekly TV shows, *any* kind of speed-up, even with a couple of cut corners, will be greeted with open arms. Won't it?

Of course, just the hardware isn't enough; you gotta have the software to support it. And I guess that's not mature enough yet for any production house to take the risk...

Actually, I doubt that movie effects will ever be rendered at anything approaching realtime. There's always time and money to burn in Hollywood (OK, maybe for some lower-budget productions this will seem like a nice option at some point). If the rendering speed increases, well, then the resolution is increased, or a more complex lighting system is used. There's just no real use for real-time movie-quality rendering, I think. Other than that it would be nice, of course :)
 
Mr. Blue said:
o The 9800 does *not* have enough memory or bandwidth to render the kinds of scenes that go on the screen.
What resolution do you think will be enough?
Film res depends. Most films output in full floating point at 2048x768 res, but that would obviously change depending on the film.
Feeling a strong sense of deja vu...

Bandwidth consumption is not particularly dependent on resolution. There is a small effect, but bandwidth required often goes down at higher resolution due to the higher efficiency of Z-features.

Also, after the pixel shader program passes a certain length, effective bandwidth consumption drops nearly to zero. The R300 supports enough instructions to hit this except in certain pathological cases (that are rarely touched by multipass shaders). It's a reasonable approximation to say that multipass implementations have no bandwidth issues.
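
That claim is easy to put in round numbers. All figures below are illustrative assumptions, not measured R300 specs; the point is the shape of the curve: at roughly one instruction per pipe per clock, pixel throughput, and with it framebuffer traffic, falls in proportion to shader length.

#include <stdio.h>

int main(void)
{
    const double clock_hz = 400e6;   /* assumed core clock */
    const double pipes    = 8.0;     /* assumed pixel pipelines */
    const double bytes_px = 12.0;    /* assumed framebuffer read+write per pixel */
    int len;

    for (len = 1; len <= 1024; len *= 4) {
        double px_per_s = clock_hz * pipes / len;    /* ~1 instr/pipe/clock */
        double gb_per_s = px_per_s * bytes_px / 1e9;
        printf("%5d instructions -> %7.3f GB/s framebuffer traffic\n",
               len, gb_per_s);
    }
    return 0;
}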

I don't personally think R300-class parts are going to be doing film final output - for the 'practical limit' reasons. However, I do think that they will be used to do far more accurate previews than can be done now, and that the day a 'prosumer' part can render film is approaching...
 
Mr Blue,

I've edited your quotes because a) It doesn't really matter if we're talking about a 9800 or something else, b) It doesn't matter if it's realtime for your uses. All that matters is "Is it faster than a CPU?"

Mr Blue said:
o [A Graphics Chip] can *not* implement n texture passes [....]
[...]"n" usually means large in the computer world. "N" also denotes no bounds.

A VPU can do 8->32 textures in a pass and do as many passes as you want. It tends to be faster than a CPU at it as well. If your shading tree takes more than 32 textures in a single node, I'd be surprised (but feel free to surprise me :) )
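
For what it's worth, the pass decomposition itself is mechanical when the combine is a plain sum (a general shading tree would need intermediate buffers instead). A sketch in C, where sample_tex() is a hypothetical fetch from texture t and K is the per-pass texture limit:

#define K 8

float sample_tex(int t, float u, float v);   /* hypothetical texture fetch */

float combine_n(int n, float u, float v)
{
    float acc = 0.0f;                 /* plays the role of the framebuffer */
    int base, t;
    for (base = 0; base < n; base += K) {
        float pass = 0.0f;            /* what one hardware pass computes */
        for (t = base; t < base + K && t < n; t++)
            pass += sample_tex(t, u, v);
        acc += pass;                  /* additive blend into the target */
    }
    return acc;
}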

o [A Graphics Chip] can *not* implement noise shaders [in realtime] (all instructions done in hardware with *no* lookups) due to its limited instruction set and API.

It can be done. People tend not to because it's slow for realtime use, but it can be done. (Oh, if you're doing gradient noise, i.e. classic Perlin, then you'll need to do lookups into your gradient table, but you do that on a CPU anyway)

How about conditional branching and looping (which are required to implement a fully functional filtered noise)? How about shader trees, where several shaders can call other shaders? Can I compute a gradient noise vector by calling it 3 times, shifting my x, y, and z values by a small delta?

o Conditional branching, no, but predicates, yes, so we can do the same thing in a different way.
o Looping, yes.
o Shaders calling other shaders. Two ways of doing that come to mind: either run ShaderB inline when ShaderA calls it, or run ShaderB first, write the output to a p-buffer, and then use that as a texture in ShaderA.
o Calling a function three times with varying parameters :rolleyes: Please. (That's just a forward difference; see the sketch below.)
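
For concreteness, here is roughly what that gradient amounts to, sketched in C. noise3() is an assumed scalar noise primitive (e.g. classic Perlin), not any particular API; each call maps naturally onto one shader evaluation, so the whole thing costs four evaluations:

typedef struct { float x, y, z; } vec3;

float noise3(float x, float y, float z);   /* assumed noise primitive */

/* Forward-difference estimate of the gradient of a scalar noise field. */
vec3 noise_gradient(float x, float y, float z)
{
    const float d = 1e-3f;                 /* small delta */
    float n = noise3(x, y, z);
    vec3 g;
    g.x = (noise3(x + d, y, z) - n) / d;
    g.y = (noise3(x, y + d, z) - n) / d;
    g.z = (noise3(x, y, z + d) - n) / d;
    return g;
}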

o [A Graphics Chip] can *not* implement complex lighting models [in realtime], also due to its design and API.

That is rapidly becoming false. OpenGL SL and HLSL have an almost identical feature set to RenderMan SL. Sure, they're still in their infancy, but it won't be long before you can specify an arbitrarily complex shader and have it execute. Maybe not in realtime, and maybe not written in the same style you're used to, but faster than doing the same thing on a CPU, I'll wager.

o [A Graphics Chip] does *not* have enough memory or bandwidth to render the kinds of scenes that go on the screen.

Bandwidth tends to be higher on a VPU than a CPU, but capacity is the one I think is a real problem, in particular the quantity of memory required to hold the meshes/textures. Still, there are chips out there that have nice large virtual address ranges (think "greater than the 4GB address space you currently have on each of those 32-bit CPUs in your renderfarm").

I think the main division between the VFX community and the hardware 3D community is that the VFX community tend to see what the hardware is capable of in realtime and think that that's it. Games companies don't write complex lighting models because they want everything to run in realtime. That doesn't mean it can't be done, and done quickly (comparatively).

Think of what state-of-the-art software-based rendering is, and compare it to what VFX companies do non-realtime.
Then think of what state-of-the-art hardware-based rendering is, and imagine what VFX companies could do non-realtime.

As a parting thought. Why do you think NVIDIA bought Exluna?

Edit: P.S. Why render to 2Kx768? Why would you want a 2.6:1 aspect ratio? Surely 768 is rather low for the vertical dimension of something that'll be projected N feet high (where N is large and unbounded ;) )
 