3D is still displayed in 2D, so why bother with 3D?

Ok, maybe in the grand scheme of things my way may be harder, or require AI that isn't even possible yet... but...

If we were to have a machine draw a perfect sphere, properly lit, at any one viewpoint, what method would be the easiest?

The sphere must be lit from 2 light sources in a confined space.

1. Render six planes to represent the confined space, render a sphere with 5,000 polygons. Light the sphere from the two light sources and apply the proper shaders, take a snapshot.

2. Draw a circle and shade darker between the two light sources, use stored coordinates to draw the planes visible from that viewpoint, take a snapshot.

I know some of you are thinking that without a 3D render of an object we have no way of even making a 2D picture of it. Yet we don't need the object to be rendered to know where it is in 3D space; we can break it down as simply as telling the computer where it is... geometry. If I wanted to make a program to draw a house, I would feed the coordinates to the computer and use 6 captures from a highly detailed pre-render done "outside" of the game. The machine would then take the 3 captures needed for that viewpoint and combine them on screen, adding the shading mathematically. The whole game would have been pre-rendered at the studio using the best forms of rendering possible, because rendering time wouldn't matter to the game anymore. The machine would not need to construct an actual 3D object; it would instead interpret what the 3D object would look like in 2D.

Alcohol people alcohol! :p

Dregun
 
All you're really describing in your last paragraph is image-based rendering.
It's been an area of research for years; in general it has a lot of artifacts.

http://www.google.com/search?source=ig&hl=en&q=Image+based+rendering

And clearly you need more than 3 viewpoints, since you can only render features you actually see in the source images.

I think from what I read about "Image Based Rendering", it's creating a 3D model from 2D images; I'm talking about taking 2D images and creating 2D images that look as though they are 3D. I don't understand why you would need more than just 3 images at any one given viewpoint for a 2D image.
 
I think from what I read about "Image Based Rendering", it's creating a 3D model from 2D images; I'm talking about taking 2D images and creating 2D images that look as though they are 3D. I don't understand why you would need more than just 3 images at any one given viewpoint for a 2D image.
The problem you are facing is that you either use a couple of 2D-Images to represent the surface-texture - but then you have no information about the geometry( its plain texturie mapping then).
or you are putting depth information into your images (and making them somewhat simple "3D" Models), then using something like parallax mapping to fake surface detail.
It would only work for very simple objects, and even then you need geometry to put them in the right position (it doesnt matters if you use 2 triangles, 1 quad or position&orientation - same thing really )

Just imagine a more complex object like the famous Teapot or a Car. If you "shoot" 3 images you won't capture all of the exterior & interior. It would be very complicated to work around such issues, and for each workaround you will find more complex objects that won't work.
 
Ok, maybe in the grand scheme of things my way may be harder, or require AI that isn't even possible yet... but...

If we were to have a machine draw a perfect sphere, properly lit, at any one viewpoint, what method would be the easiest?

The sphere must be lit from 2 light sources in a confined space.

1. Render six planes to represent the confined space, render a sphere with 5,000 polygons. Light the sphere from the two light sources and apply the proper shaders, take a snapshot.

2. Draw a circle and shade darker between the two light sources, use stored coordinates to draw the planes visible from that viewpoint, take a snapshot.

I know some of you are thinking that without a 3D render of an object we have no way of even making a 2D picture of it. Yet we don't need the object to be rendered to know where it is in 3D space; we can break it down as simply as telling the computer where it is... geometry. If I wanted to make a program to draw a house, I would feed the coordinates to the computer and use 6 captures from a highly detailed pre-render done "outside" of the game. The machine would then take the 3 captures needed for that viewpoint and combine them on screen, adding the shading mathematically. The whole game would have been pre-rendered at the studio using the best forms of rendering possible, because rendering time wouldn't matter to the game anymore. The machine would not need to construct an actual 3D object; it would instead interpret what the 3D object would look like in 2D.

Alcohol people alcohol! :p

Dregun

When you describe rendering the sphere the 2D way, you're describing rendering it as a primitive. You don't light and shade each constituent polygon; you light and shade the whole thing as a unit.

The problem is that it's hard to render anything that's not mathematically simple to describe as a primitive. So graphics systems tend to have simple primitives, and construct complex scenes by drawing a bunch of simple primitives. Current graphics hardware renders the simplest possible 3D primitive: triangles. Older graphics systems sometimes used more complex primitives: Nvidia's first-generation graphics cards (of which they sold very few, I think) used curved regions called NURBS.

Once you know how to render a primitive, you know how to render any scene constructed from the primitives. Once again, this argues for simple primitives, because it's easier to figure out how to break down an arbitrary scene into simple primitives.

No one has figured out a general way to render an arbitrary scene as a primitive, instead of first breaking it down into simpler things like polygons or Nurbs.
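To make that concrete, here's a rough sketch in Python (the names and counts are mine, not any particular API) of what "breaking a sphere down into triangle primitives" means; the 5,000-polygon sphere from the first post is literally just a list like this:

[code]
import math

def sphere_triangles(radius=1.0, stacks=50, slices=50):
    """Approximate a sphere as a flat list of triangles, each a tuple of
    three (x, y, z) points. 50 x 50 gives 5,000 triangles."""
    def point(i, j):
        # Latitude/longitude parameterisation of the sphere's surface.
        theta = math.pi * i / stacks        # 0..pi, pole to pole
        phi = 2.0 * math.pi * j / slices    # 0..2*pi, around the axis
        return (radius * math.sin(theta) * math.cos(phi),
                radius * math.sin(theta) * math.sin(phi),
                radius * math.cos(theta))

    tris = []
    for i in range(stacks):
        for j in range(slices):
            a, b = point(i, j), point(i + 1, j)
            c, d = point(i + 1, j + 1), point(i, j + 1)
            tris.append((a, b, c))          # two triangles per patch
            tris.append((a, c, d))          # (patches at the poles degenerate, fine for this sketch)
    return tris

print(len(sphere_triangles()))              # 5000
[/code]

Every triangle in that list can then be lit, shaded and projected on its own; the renderer never needs to know it is part of a "sphere".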
 
I think from what I read about "Image Based Rendering", it's creating a 3D model from 2D images; I'm talking about taking 2D images and creating 2D images that look as though they are 3D.
Actually, Image-Based Rendering includes more than that. For instance, computing arbitrary illumination on a scene based on captured images -- that's an entirely 2D process, except for the fact that lighting direction still needs to be represented in 3D. Which is kind of the point of 3D in the first place, which you seem to think is entirely pointless. I'm not entirely sure what you keep talking about with your notions of a "3d image"...

I don't understand why you would need more than just 3 images at any one given viewpoint for a 2D image.
How are you going to get all the information from 3 images? I'm lost as to why you think 3 images are enough. For instance, if you get information about an object from above, in front, and from the left, where is the information about the right, rear, and bottom? On top of that, are you taking X-ray images? Because there are details that may be ambiguous, since they're not clearly apparent from more than one viewpoint. And even that isn't completely definitive.

For instance, if I look at a figure from above, below, left, right, front, and back and see a circle-shaped silhouette from all of those views, does that mean I have a sphere? Think carefully before you answer that.

In the generic sense, you need every viewpoint.
 
2. Draw a circle and shade darker between the two light sources, use stored coordinates to draw the planes visible from that viewpoint, take a snapshot.

I think your example may violate your own premise (to avoid the use of any additional metrics beyond 2D).

The circle is fine, but then how do you define the locations/orientations of the "planes" at any other depth than the plane of the circle, itself? W/o extending to 3D, the only thing you can define is a circle sitting flat on one single plane. Also, how do you define where the light sources are, in order to decide how the shading will occur on the circle?

Since there can only be one plane in a 2D system, you will have to start right away with the manipulation/distortion of 3 misshapen quadrilaterals with 3 shared side segments and the circle superimposed on top. Never mind how you will resolve which lines will cross or be occluded and what will be foreground vs. background... ;) (The easiest solution will be to resort to another dimension to describe depth, but your premise disallows anything beyond 2D, right?)
 
I'm going to try to explain my line of thought through multiple quotes :LOL:


How are you going to get all the information from 3 images? I'm lost as to why you think 3 images are enough. For instance, if you get information about an object from above, in front, and from the left, where is the information about the right, rear, and bottom? On top of that, are you taking X-ray images? Because there are details that may be ambiguous, since they're not clearly apparent from more than one viewpoint. And even that isn't completely definitive.

I stated in that post that 6 "pictures" of the render would be taken, but only 3 of them would be required at any given viewpoint. If you're viewing the house from directly on its side you only need 2 of the pictures, because you can't see anything else from your viewpoint. If you look at it from an angle that is slightly upwards and to the side you would only be able to see 3 sides of the house.
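Something like this is what I have in mind for picking which of the 6 captures to use. Just a rough sketch I made up, testing each face's outward direction against the view direction:

[code]
# Rough sketch: which of the 6 axis-aligned captures can contribute to a view?
# A face of the bounding box is only visible when its outward normal points
# back toward the camera, i.e. dot(normal, view_direction) < 0.

FACE_NORMALS = {
    "front": (0, 0, -1), "back":   (0, 0, 1),
    "left":  (-1, 0, 0), "right":  (1, 0, 0),
    "top":   (0, 1, 0),  "bottom": (0, -1, 0),
}

def visible_captures(view_dir):
    """view_dir points from the camera toward the house."""
    return [name for name, n in FACE_NORMALS.items()
            if sum(a * b for a, b in zip(n, view_dir)) < 0]

print(visible_captures((0, 0, 1)))      # dead-on: ['front'] only
print(visible_captures((1, -0.5, 1)))   # from above and to one side: ['front', 'left', 'top']
[/code]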

The circle is fine, but then how do you define the locations/orientations of the "planes" at any other depth than the plane of the circle, itself? W/o extending to 3D, the only thing you can define is a circle sitting flat on one single plane. Also, how do you define where the light sources are, in order to decide how the shading will occur on the circle?

You can represent a 3D space in 2D quite easily; artists have been doing it for a long time. When you look at a 3D space in a raster program you usually have a grid pattern allowing you to visualize the location of the object in space. You feed it XYZ coordinates and it places the object in a 3D environment at that location. However, in 2D it's not much different: instead of actually creating a 3D environment, you can portray those "grids" in 2D. If the grid can be portrayed in 2D, then why can't the same XYZ coordinates place whatever object you want in the location you want it to be? It knows the object is 20'Z, 15'X, 2'Y from your viewpoint and aligns it properly in 2D.

The circle is fine, but then how do you define the locations/orientations of the "planes" at any other depth than the plane of the circle, itself? W/o extending to 3D, the only thing you can define is a circle sitting flat on one single plane. Also, how do you define where the light sources are, in order to decide how the shading will occur on the circle?

I would imagine it would be done the same way as I stated above, you can feed it coordinates without compiling a 3D space. The machine/computer would interpret how the light sources would affect an object based on its location relative to them. I really don't see how this would be any more complicated than what is being done right now, except that everything wouldn't have to be lit just to show you what you "can" see.

For instance, if I look at a figure from above, below, left, right, front, and back and see a circle-shaped silhouette from all of those views, does that mean I have a sphere? Think carefully before you answer that.

If all 6 viewpoints showed a circle then you would have a sphere. If however the top and bottom showed a circle and the left, right, front and back showed rectangles then you would have a cylinder as your link suggested. However, the "intersecting cylinders" you are referring to would have absolutely no use for this type of rendering, as it does not require the combination of polygons, cylinders, cubes or anything else to render the scene. All the information it needs has already been compiled through those techniques during development. I guess what I'm trying to say is that the machine would "paint" a 2D image based on 3 of the 6 viewpoints of the pre-rendered 3D object.

The last way I can think of to explain this would be for everyone to imagine how they would tell someone how to "draw" a picture of a house on a piece of paper. You would tell the person to draw two horizontal parallel lines -30 degrees off center. Then to draw 2 vertical parallel lines at 90 degrees connecting the previous lines together. You would keep doing this, and they would turn 2D lines into 3D images, because you don't need a 3D environment to create 3D images.

Everyone seems to come back to "constructing" 3D objects as if creating 3D objects is the only way to represent them. Someone earlier mentioned still life and I think it's a very good example. Why can't the machine "paint" an apple inside a glass jar with simple mathematics telling it how the light is going to refract and distort the apple, instead of recreating the world in 3D for the same effect.
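And by "simple mathematics" for the refraction I mean something like Snell's law. Here's a rough sketch of my own, just to show the kind of formula involved, whether it's used while "painting" or while rendering in full 3D:

[code]
import math

def refract(incident, normal, n1, n2):
    """Refraction direction from Snell's law (both vectors must be unit length).
    Returns None on total internal reflection."""
    eta = n1 / n2
    cos_i = -sum(i * n for i, n in zip(incident, normal))
    sin2_t = eta * eta * (1.0 - cos_i * cos_i)
    if sin2_t > 1.0:
        return None                         # total internal reflection
    cos_t = math.sqrt(1.0 - sin2_t)
    return tuple(eta * i + (eta * cos_i - cos_t) * n
                 for i, n in zip(incident, normal))

# A ray hitting glass (index ~1.5) at 45 degrees bends toward the normal:
d = math.sqrt(0.5)
print(refract((d, -d, 0.0), (0.0, 1.0, 0.0), 1.0, 1.5))
[/code]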

I hope everyone realizes that all the comments and suggestions made are why I created the thread. I'm not looking to prove a point, argue about how certain techniques are done, or challenge anyone's knowledge of the subject. I'm trying to figure out what the limitations are (if it's even possible) of "painting" an image instead of rendering 3D environments.

So thanks ;)

Dregun
 
You can represent a 3D space in 2D quite easily; artists have been doing it for a long time. When you look at a 3D space in a raster program you usually have a grid pattern allowing you to visualize the location of the object in space. You feed it XYZ coordinates and it places the object in a 3D environment at that location. However, in 2D it's not much different: instead of actually creating a 3D environment, you can portray those "grids" in 2D. If the grid can be portrayed in 2D, then why can't the same XYZ coordinates place whatever object you want in the location you want it to be? It knows the object is 20'Z, 15'X, 2'Y from your viewpoint and aligns it properly in 2D.

I would imagine it would be done the same way as I stated above, you can feed it coordinates without compiling a 3D space.
I think the real issue here is that you don't properly understand what 3D rasterization is! Feeding in coordinates and turning that into a 2D image is exactly what is done now!
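To show what I mean, here's roughly what happens to every XYZ coordinate you feed in: a minimal pinhole-projection sketch in Python, with a made-up resolution and field of view.

[code]
import math

def project(point, fov_deg=90.0, width=640, height=480):
    """Minimal pinhole projection: camera at the origin looking down +Z.
    Returns the pixel a 3D point lands on, or None if it's behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    f = (width / 2.0) / math.tan(math.radians(fov_deg) / 2.0)  # focal length in pixels
    px = width / 2.0 + f * x / z        # the divide-by-z is the whole "3D to 2D" step
    py = height / 2.0 - f * y / z
    return (px, py)

# "It knows the object is 20'Z, 15'X, 2'Y from your viewpoint and aligns it properly in 2D":
print(project((15.0, 2.0, 20.0)))       # (560.0, 208.0) on a 640x480 screen
[/code]

That divide by z is the entire trick: the 3D coordinates are only ever used to work out where things land in 2D.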

The idea of working out all the different viewpoints from 6 2D projections is also impossible. First, you'd need the computer to have an algorithm to interpolate that information successfully. That's tricky, but could be possible. Secondly, however, you have the problem of concave surfaces, where a viewpoint doesn't see a detail. You could only get a full set of data for a viewpoint from any angle if every surface was visible, especially if you want animated material. If, in the Still Life example, you have a gap between the apples and bananas that's not visible from the front or left view, but a front-left view would show some grapes in it, then an interpolation of the left and front views will render that incorrectly. As another example, a human face can't be modelled correctly from 6 viewpoints. The eye sockets won't appear. If what you described were feasible, you'd be able to model any object by drawing 6 outlines, but modelling isn't that easy (unfortunately)!

I suggest you check out http://www.photomodeler.com/pmpro09.html to see software that creates 3D scenes (scenes that can be viewed from any angle) from 2D stills. It's very effective, but needs user input to know how to 'understand' the image.

Everyone seems to come back to "constructing" 3D objects as if creating 3D objects is the only way to represent them...Why can't the machine "paint" an apple inside a glass jar with simple mathematics telling it how the light is going to refract and distort the apple, instead of recreating the world in 3D for the same effect.
Think about this: how would your tree cast a shadow? Let's say you have a street with houses, a tree, and a man walking under that tree in its shadow. Both the man and the tree are viewed exactly head-on, so you can use the front-view images to draw them. How do you work out the shadow to draw on the man and the road?

The last way I can think of to explain this would be for everyone to imagine how they would tell someone how to "draw" a picture of a house on a piece of paper. You would tell the person to draw two horizontal parallel lines -30 degrees off center. Then to draw 2 vertical parallel lines at 90 degrees connecting the previous lines together. You would keep doing this, and they would turn 2D lines into 3D images, because you don't need a 3D environment to create 3D images.
You know, you're sounding just like a computer there. You'd use maths. If you were to use 'simple' language - draw a box with the horizontal lines longer than the verticals, draw a box on top with two sticking-out triangles - you'd end up with a hideously complex set of instructions with loads of ambiguity. What you're doing in the above example is taking a 3D model in your head (or a memory of an already existing 2D projection) and describing the 2D representation - exactly what rasterization does when it takes 3D data and turns it into a 2D projection.

I hope everyone realizes that all the comments and suggestions made are why I created the thread. I'm not looking to prove a point, argue about how certain techniques are done, or challenge anyone's knowledge of the subject. I'm trying to figure out what the limitations are (if it's even possible) of "painting" an image instead of rendering 3D environments.
The limitation here is entirely your understanding ;) Computers work in maths. They compute things from numbers. They have no imagination and no ability to interpolate data unless that data is provided in numerical form. They also have no real-world experiences to fall back on. You'd be surprised how much 3D data you're referencing when you're looking at 2D objects and understanding them. When an artist draws something from imagination in 2D, they're calling on massive amounts of 2D and 3D data and experience, which computers cannot do. Trying to get a computer to work like an artist is never going to work. Also, artists take ages to produce their pictures. They make mistakes in their interpolations, and make corrections as they go, constantly comparing what they're producing with their 2D and 3D knowledge and experiences. It takes a very trained individual, accumulating loads of learning, to be able to draw arbitrary 3D viewpoints of any scene, and they can also 'make stuff up', which a computer can't.

I don't see what you find wrong in the current methods. Storing data as 3D coordinates provides a very fast and accurate way for a computer to change the viewpoint to any it wants, and rasterization provides a quick translation from 3D to 2D.

I also don't understand why this thread is here in the console tech forum and not the General 3D Discussion forum. :???:
 
If all 6 viewpoints showed a circle then you would have a sphere. If however the top and bottom showed a circle and the left, right, front and back showed rectangles then you would have a cylinder as your link suggested.
What, did you look at just one image on that page and say that's it? You clearly missed the 3rd example where 3 axis-aligned cylinders intersected at one point and you had a figure created out of the intersections. You most certainly would see a circle from every axis-aligned viewpoint if you look at that figure, but it is most definitely not a sphere.
Figure 1
Figure 2

And notably even with those six axis-aligned views, you wouldn't be able to tell it's not a sphere.
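If you want to convince yourself without the figures, here's a quick sanity check I knocked up in Python (my own sketch, same idea as the linked figure): the solid formed by intersecting three axis-aligned cylinders has a circular silhouette along every axis, yet it isn't a sphere.

[code]
def in_tricylinder(x, y, z, r=1.0):
    """Inside the intersection of three axis-aligned cylinders of radius r."""
    return (x*x + y*y <= r*r) and (y*y + z*z <= r*r) and (x*x + z*z <= r*r)

# Silhouette along the z axis: every (x, y) inside the unit disk is covered
# (just take z = 0), so the top/bottom views show a perfect circle. The same
# holds for the other four axis-aligned views by symmetry.
assert all(in_tricylinder(x / 10, y / 10, 0.0)
           for x in range(-10, 11) for y in range(-10, 11)
           if (x / 10) ** 2 + (y / 10) ** 2 <= 1.0)

# Yet it is not a sphere; the "corners" bulge outside the unit sphere:
print(in_tricylinder(0.6, 0.6, 0.6))            # True, inside the solid
print(0.6**2 + 0.6**2 + 0.6**2 <= 1.0)          # False, outside the unit sphere
[/code]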

However the "interesecting cylinders" you are referring to would have absolutely no use for this type of rendering as it does not require the combination of polygons, cylinders, cubes or anything else to render the scene.
And you clearly missed the point, which was about the shape formed, and not about how you go about rendering or representing them. You could take that intersection figure and represent it in polygons or sprites or whatever. The point was that your notion that you only need a few viewpoints to get any 3d look falls apart with that example.

The last way I can think of to explain this would be for everyone to imagine how they would tell someone how to "draw" a picture of a house on a piece of paper. You would tell the person to draw two horizontal parallel lines -30 degrees off center. Then to draw 2 vertical parallel lines at 90 degrees connecting the previous lines together. You would keep doing this, and they would turn 2D lines into 3D images, because you don't need a 3D environment to create 3D images.
Where do you think that logic comes from? How do you think someone is going to know which lines should be 30 degrees off and which should be vertical? Where should they be? Somewhere in there (at least in someone's mind) is a 3d representation of the house, even if it isn't completely descriptive. Unlike a human brain, though, a computer isn't going to take a rough description and be able to fill in the gaps through intuition.
 
I think the first thing you learn as a Mechanical Engineering undergrad is the ambiguity that arises from projection drawings of parts and how to avoid them. It usually takes years of experience to start making good drawings. Drawings that a machinist can pick up and read without using your name betwixt many a swear word while trying to make (in 3D) what you drew.

I understand the concept of what is being proposed here, but an intermediate step is required. You can't just refer back to the projection views each time you want to draw a new view. You need a topographical map of how the scene looks. That just happens to be the current method employed.

Another technology that has remained largely obscure is the Voxel. It's usually referred to as a 3D bitmap. I think the most notable games to use voxels were Delta Force and Master of Orion 3. Here's the wikipedia entry on voxels, though it's not terribly detailed: http://en.wikipedia.org/wiki/Voxel
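For what it's worth, here's a toy sketch (my own, using NumPy, nothing from those games' engines) of what a voxel grid looks like as a data structure: a solid sphere stored as a 3D bitmap.

[code]
import numpy as np

# A dense boolean grid of 64^3 cells; True means "solid".
N = 64
grid = np.zeros((N, N, N), dtype=bool)
i, j, k = np.indices(grid.shape)
c = (N - 1) / 2.0
grid[(i - c) ** 2 + (j - c) ** 2 + (k - c) ** 2 <= 20 ** 2] = True   # sphere, radius 20 voxels

def solid_at(x, y, z):
    """Is the voxel containing point (x, y, z) occupied? Grid spans 0..N."""
    xi, yi, zi = int(x), int(y), int(z)
    if 0 <= xi < N and 0 <= yi < N and 0 <= zi < N:
        return bool(grid[xi, yi, zi])
    return False

print(solid_at(32, 32, 32), solid_at(1, 1, 1))     # True False
print(int(grid.sum()), "occupied voxels out of", N ** 3)
[/code]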
 
Is this thread about Image Based Rendering (Layered Depth Image, Lumigraph, Plenoptic function, etc.)? Or something related to Computer Vision?

It seems everyone has something ambiguous to say. Actually, many researchers in the field of Computer Vision believe that current polygon-based 3D graphics overdraw things to meet the perceptual reality of human eyes. But in fact human vision is pattern based, not pixel based. Thus portraying every polygon and every pixel accurately is some kind of a waste.
 
Thus portraying every polygon and every pixel accurately is some kind of a waste.
Hence, my idea to create an eye-tracking camera so that you only end up spending rendering resources on pixels that the observer is looking at and not on those in the periphery of their vision that are blurry anyhow.
 
If we were to explore those games, we would need one picture saved somewhere of every point of view possible in a 3D space, which obviously is an impossible task.
It's possible. :smile: For instance, have a look at the Concentric Mosaic algorithm:
http://research.microsoft.com/research/pubs/view.aspx?pubid=933

In fact, all depth-data-free IBR algorithms try to store all pictures from all potential viewpoints. The tricks lie in how you organise and reconstruct them. IBR algorithms can be very efficient and easy to implement, like Concentric Mosaic (which only needs a DV camera to shoot several video clips, and has no problem achieving high frame rates in real-time rendering). But that's it. A free-roaming camera is not an issue, but the interactivity is.
 
Hence, my idea to create an eye-tracking camera so that you only end up spending rendering resources on pixels that the observer is looking at and not on those in the periphery of their vision that are blurry anyhow.

I think that requires a very big screen? Otherwise it's hard for a small screen to fall out of focus. Also, you assumed there is only one observer.
 
I think that requires a very big screen? Otherwise it's hard for a small screen to fall out of focus. Also, you assumed there is only one observer.
I was applying the concept to a near eye display. That'd be interesting though, to have each observer sporting just an eye tracker and looking at a TV screen.

And even a small screen does fall out of focus quickly. If you're standing ten feet from a screen then your sharp vision will cover a circle about 6 inches in diameter. That's less than 4% of a 42" TV screen. That area will grow larger with distance, but you'll also be averaging more visual data.

The near-eye display works the best, as you can calibrate the output resolution for how far it will be from the eye. Otherwise, you'd have to look at the distance the user is from the screen, and figure out how many pixels will occupy one minute of angle of their vision (then dynamically adjust your rendering to account for it). This way you don't have to render the full resolution of a 6" display that is 10 feet away. Treat it as a lower-resolution screen for rendering purposes.
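Rough numbers behind what I said above, in case anyone wants to check my arithmetic (my own back-of-the-envelope, assuming a 1920-pixel-wide image and the 6-inch figure):

[code]
import math

distance = 120.0                       # ten feet, in inches
sharp_diameter = 6.0                   # the "about 6 inches" above

# Viewing angle of that sharp region:
fovea_deg = 2 * math.degrees(math.atan(sharp_diameter / 2 / distance))
print(round(fovea_deg, 2), "degrees of sharp central vision")           # ~2.86

# Fraction of a 42-inch 16:9 screen that the 6-inch circle covers:
w = 42 * 16 / math.hypot(16, 9)        # ~36.6 in wide
h = 42 * 9 / math.hypot(16, 9)         # ~20.6 in tall
print(round(100 * math.pi * 3.0 ** 2 / (w * h), 1), "% of the screen")  # ~3.8

# Pixels per arcminute for a 1920-wide image on that screen at that distance:
inches_per_arcmin = distance * math.tan(math.radians(1 / 60.0))
print(round(1920 / w * inches_per_arcmin, 2), "pixels per arcminute")   # ~1.8
[/code]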
 
Well, I guess I see the problem from a completely different perspective. Instead of noting that we are representing 3D worlds on 2D displays and wondering if we could change our rendering techniques to somehow be more "compatible?" or "optimized?" for that display, I simply see the need to increase our displays to 3D to match.

The raytracing Quake IV demo really opens your eyes to how much more realistic good lighting/shadowing makes an environment seem... similarly, seeing any 3D IMAX movie really opens your eyes to how the extra dimension transforms the experience from "watching something filmed" to "being there in the middle of it." It really is a different perceptual experience.

I know the shutter glasses never really caught on, and for many good reasons, but that doesn't remove the principle need for a suitable 3D display technology.
 
I think from what I read about "Image Based Rendering", it's creating a 3D model from 2D images; I'm talking about taking 2D images and creating 2D images that look as though they are 3D. I don't understand why you would need more than just 3 images at any one given viewpoint for a 2D image.

"Image Based Rendering" is not about reconstructing polygon models from a set of 2D/3D images (e.g. CT tomography reconstruction), it's all about rendering a scene from a set of images directly. Otherwise it would be called "Image Based Modelling". Most IBR algorithms do not employ any polygon model at rendering time, but only 2D images (some may use a few polygon for assistance).

I've found that many people misunderstand IBR techniques. I'd like to explain the basic concept behind IBR, although I'm not very good at it:
Typically, IBR algorithms can be classified into two groups: the ones requiring depth info at rendering time (Sprite with Depth, Layered Depth Images, etc), called depth-field methods; and the ones not requiring depth info (Lumigraph, Concentric Mosaic, etc), called light-field methods.

Depth field is based on this concept: taking a color image and a depth image from a given viewpoint, we can certainly obtain (color, x, y) info from the photograph and (z) info from the depth image. This information is adequate to represent the luminance distribution in 3D space, except for the occluded parts. It's not hard to render the scene at a new viewpoint if we can patch the occluded areas with some additional data. Using a few images from different viewpoints (front, top, side, etc.) as input is an efficient way to recover the occluded parts.
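Here is a toy sketch of that idea (my own code, not from any of the papers): forward-map each (color, depth) pixel from the source view into a new view. The holes you get are exactly the occluded parts that need patching.

[code]
import math

def warp(color, depth, width, height, f, cam_shift_x):
    """color/depth are dicts keyed by pixel (px, py); the new camera is the old
    one translated by cam_shift_x along X. Returns the forward-mapped image."""
    out, zbuf = {}, {}
    for (px, py), z in depth.items():
        # Unproject pixel (px, py) with depth z back to camera-space 3D.
        x = (px - width / 2.0) * z / f
        y = (height / 2.0 - py) * z / f
        # Express the point in the new camera's frame and reproject it.
        x_new = x - cam_shift_x
        qx = int(round(width / 2.0 + f * x_new / z))
        qy = int(round(height / 2.0 - f * y / z))
        if 0 <= qx < width and 0 <= qy < height:
            if z < zbuf.get((qx, qy), math.inf):   # keep the nearest surface
                zbuf[(qx, qy)] = z
                out[(qx, qy)] = color[(px, py)]
    return out

# One red pixel at the image centre, two units away, camera moved 0.5 to the right:
print(warp({(320, 240): (255, 0, 0)}, {(320, 240): 2.0}, 640, 480, 320.0, 0.5))
[/code]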

One popular depth-field technique is the LDI (Layered Depth Image), which employs more than one "layer" to store "all" the color & depth info along the ray through each pixel from a given viewpoint. For instance, one LDI can be composed of one front-view layer and one back-view layer, each of which contains color & depth pixels. The LDI is a very practical technique because the data is easily obtained with devices like a binocular camera. Many other IBR algorithms are based on it too.
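A minimal sketch of the data structure (hypothetical, just to show the "layers per pixel" idea):

[code]
from collections import defaultdict

class LayeredDepthImage:
    """Each pixel keeps a list of (depth, colour) samples along its ray,
    nearest first, so surfaces hidden behind the front one are still there
    when the viewpoint moves."""
    def __init__(self, width, height):
        self.size = (width, height)
        self.layers = defaultdict(list)            # (x, y) -> [(depth, colour), ...]

    def add(self, x, y, depth, colour):
        self.layers[(x, y)].append((depth, colour))
        self.layers[(x, y)].sort()

    def front(self, x, y):
        """What the original viewpoint sees at this pixel."""
        samples = self.layers.get((x, y))
        return samples[0][1] if samples else None

ldi = LayeredDepthImage(640, 480)
ldi.add(100, 100, 5.0, "back wall")
ldi.add(100, 100, 2.0, "sphere")
print(ldi.front(100, 100))                         # 'sphere'
print(ldi.layers[(100, 100)])                      # both layers kept
[/code]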

The major deficiency of real-time rendering with depth-field methods is that it involves FTM (forward texture mapping). Today's hardware uses BTM (backward texture mapping): given a pixel on the screen, it uses the interpolated texcoord to index into the texture. FTM works the other way around: given a texel in the texture, it calculates the screen coordinate and draws the texel there. FTM can cause severe gap/granularity aliasing if the input images are not large enough.
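The loop-order difference is easy to see in one dimension (a toy sketch of mine: nearest-neighbour, 2x magnification):

[code]
src = ["A", "B", "C", "D"]
dst_len = 8

# BTM: loop over destination pixels and ask "which texel lands here?"
btm = [src[int(i * len(src) / dst_len)] for i in range(dst_len)]

# FTM: loop over texels and ask "which destination pixel does this hit?"
ftm = [None] * dst_len
for j, texel in enumerate(src):
    ftm[int(j * dst_len / len(src))] = texel

print(btm)   # ['A', 'A', 'B', 'B', 'C', 'C', 'D', 'D']   every pixel covered
print(ftm)   # ['A', None, 'B', None, 'C', None, 'D', None]   gaps appear
[/code]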

As for the light-field category, all of these methods try to compress a huge set of images from all possible viewpoints and reconstruct the image for a new viewpoint. To record this kind of data, a 5D array (3 dimensions for camera position, 2 for camera direction) is required, called the plenoptic function. Theoretically it's impossible to store and use this kind of data directly, but in some cases it can be reduced to 4D or even lower. Taking the house example you guys discussed before: to view the house from outside, we can use a cubemap to enclose the house. On this cubemap, each pixel is a micro cubemap storing the colors observed from different angles; thus we can use this 4D data structure, called a Lumigraph, to render the house from anywhere outside. In most cases the adjacent micro cubemaps have almost identical data, which means the Lumigraph can be compressed greatly.

Concentric Mosaic is another popular light-field technique, which further reduces the plenoptic function to 3D. CM is an extension of the 360-degree panorama (QuickTime VR). Instead of putting the data (captured from the center point) into a single cylinder, CM stores data (captured from multiple viewpoints) on a set of concentric cylinders. During rendering, it selects vertical lines from different cylinders to compose an image, ensuring the correct parallax and occlusion.
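To give a feel for the light-field idea without the cubemap-of-cubemaps bookkeeping, here's a toy sketch of mine using the closely related two-plane parameterisation: a ray becomes 4 numbers, and "rendering" is just a table lookup (everything below is made up for illustration).

[code]
def ray_to_uvst(origin, direction, z_uv=0.0, z_st=1.0):
    """Reduce a ray to 4 numbers: where it crosses the plane z = z_uv (u, v)
    and the plane z = z_st (s, t)."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    tu = (z_uv - oz) / dz
    ts = (z_st - oz) / dz
    return (ox + tu * dx, oy + tu * dy, ox + ts * dx, oy + ts * dy)

def sample_lightfield(table, origin, direction, res=16):
    """table maps quantised (u, v, s, t) to a colour; looking rays up IS the rendering."""
    key = tuple(int(round(c * res)) for c in ray_to_uvst(origin, direction))
    return table.get(key)

# One captured ray, straight through (0, 0) on both planes:
lightfield = {(0, 0, 0, 0): (200, 180, 160)}
print(sample_lightfield(lightfield, origin=(0.0, 0.0, -1.0), direction=(0.0, 0.0, 1.0)))
[/code]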

The drawback of light-field methods is the size of the data. Although most of them can be compressed to an acceptable level (less than 50 MB), they are still too big and too clumsy relative to polygon data.

IBR techniques were the hot spot of the computer graphics research field about 10 years ago. Their good side lies in the simplicity of capturing real-world scenery, while the bad side is the lack of ability to interact with it. Basic routines like collision detection are very, very hard to perform with IBR, let alone animation and physics. Relighting image data is more useful to today's CG films, because it's very common to add digital sets/characters into photographed scenes.
 