Unlimited Detail, octree traversals

Hmm, so this tech isn't gonna revolutionize video games after all? Who would've thought...
 
I am not impressed by the compression ratio (raw scanner data down to 5%-20%). If I remember correctly, id Software's SVO prototype already achieved around 1 bit per voxel of average storage, and that wasn't even good enough for their purposes (using the technology for whole game worlds).

Has Euclideon shown any demos that include proper geometry instancing? In all the demos, instances of objects all face the same direction and are positioned on an ordered grid. It looks exactly like they have just linked SVO nodes to the same data root node (of the detailed model), but they claim it's not an SVO system. It's odd that their point clouds cannot be rotated separately for each instance. If their system supported proper instancing (separate rotation and free placement for every instance), it would have been good for static geometry in games as well. A huge chunk of the geometry in current-generation games is static, after all.
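For what it's worth, here is a minimal sketch of what "linking SVO nodes to the same data root node" could look like; the node layout and names are my own assumptions, not anything Euclideon has published:
Code:
#include <array>
#include <cstdint>

/* Hypothetical SVO node: 8 child pointers plus an averaged colour for LOD. */
struct SvoNode {
    std::array<SvoNode*, 8> child{};    /* nullptr = empty octant */
    uint32_t color = 0;                 /* averaged colour of the subtree */
};

/* "Instancing" by aliasing: several parent slots point at the same subtree.
   Because a child octant has a fixed position and orientation inside its
   parent cube, every instance ends up axis-aligned and grid-snapped --
   which matches what the demos show (no per-instance rotation). */
void place_instances(SvoNode& world, SvoNode& detailed_model)
{
    world.child[0] = &detailed_model;   /* instance in octant 0 */
    world.child[3] = &detailed_model;   /* same data, another grid cell */
    world.child[5] = &detailed_model;
}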
 
Perhaps compression for them means not re-storing the common prefixes or suffixes of the locational codes of the points. Recycling prefixes leads to octrees; recycling suffixes results in instancing. That Bruce Dell knows what a locational code is (in fact, he reinvented it) can be gathered from things he has said, including his description on this forum:
we use a mass 3d to 2d conversion that shares the common elements of the 2d positions of all the dots combined.
2d positions = base-4 sequences (view-plane locational codes); common elements = common prefixes. You can, with a single procedure, do view frustum culling, get the homogeneous view coordinates and project the visible points. All this without a single *, / or float.
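One illustrative reading of that sentence (my own, not a claim about UD's internals): a 2D screen position becomes a base-4 quadtree path by interleaving the bits of x and y, and dots falling in the same screen region then share a code prefix, which a quadtree stores once instead of per point.
Code:
#include <cstdint>

/* Interleave the low 16 bits of x and y into a base-4 locational code:
   each 2-bit digit picks a quadrant, most significant digit first.
   Two screen positions inside the same 2^k x 2^k block share the leading
   (16 - k) digits of their codes -- the "common elements" a quadtree
   stores only once. */
uint32_t locational_code(uint16_t x, uint16_t y)
{
    uint32_t code = 0;
    for (int i = 15; i >= 0; --i) {
        uint32_t qx = (x >> i) & 1u;
        uint32_t qy = (y >> i) & 1u;
        code = (code << 2) | (qy << 1) | qx;    /* one base-4 digit per level */
    }
    return code;
}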
N(i) = number of nodes of a complete octree of height i.
N(i + 1) = 1 + 8N(i)
Number of leaves = number of points = N(i + 1) - N(i) = 1 + 8N(i) - N(i) = 1 + 7N(i),
since the internal nodes of the height-(i + 1) tree are exactly the N(i) nodes of the height-i tree.
Now, you can store the leaves' data directly in their parents (instead of a child index). If your points & data are 32-bit quadruplets and the octree nodes are 32-bit octuplets, then the ratio of the size of the octree to that of the points & data is about 2/7: compression.
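The arithmetic can be checked mechanically; the per-point and per-node sizes below are just the ones assumed above:
Code:
#include <cstdint>
#include <cstdio>

int main()
{
    /* N(i): nodes of a complete octree of height i; N(0) = 1 (the root). */
    uint64_t n = 1;
    for (int i = 0; i < 8; ++i) {
        uint64_t n_next = 1 + 8 * n;    /* N(i + 1) = 1 + 8N(i) */
        uint64_t leaves = n_next - n;   /* = 1 + 7N(i) */
        /* Sizes as assumed above: 16 bytes per point (4 x 32-bit),
           32 bytes per internal node (8 x 32-bit entries). */
        double ratio = (32.0 * n) / (16.0 * leaves);
        printf("height %d: nodes %llu, leaves %llu, octree/points = %.4f\n",
               i + 1, (unsigned long long)n_next,
               (unsigned long long)leaves, ratio);  /* tends to 2/7 ~ 0.2857 */
        n = n_next;
    }
    return 0;
}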

So how does this differ from algorithms of ~20 years ago that modelled objects with a multitude of spheres and then had a fast "blatting" algorithm?
It differs from splatting/projection and ray casting more than they differ from one another. In fact, his description above is, I believe, quite detailed.
UD is the new de facto standard renderer; it accords very well with Sutherland's survey: a (combinatorial) coherent bucket sort.
Instead of indulging in destructive criticism, we should try to reverse engineer it.
 
I tried to post this in the UD thread but had no success; could some courageous and knowledgeable programmer implement this object-order octree traversal:
Code:
/* Octree nodes are the buckets (internal nodes have <= 8 buckets: their children);
    frustum octants are the things being sorted.
    The hierarchy of squares is given by a 1 bit per node complete quadtree stored in level order (hence, father/children relationships are "computable").
    The hierarchy of cubes is a sparse octree.
    Call bucket_sort with cube = root node and stack = (viewport, view frustum) */
bucket_sort(cube, stack)
{
    if (cube is a point)    /* leaf node of our object octree */
        do {    /* there can be many frusta located at the same point, in a discrete world */
            (square, frustum) = pop(stack)
            if (!square.black) {    /* a black square represents a completely filled screen region */
                square.color = cube.color
                do {    /* update the occlusion pyramid/quadtree */
                    square.black = true
                    if (square is root)
                        return
                    square = square.father
                } while (square[XY].black for all XY in { LU, RU, LD, RD }) /* while all the subsquares are black */
            }
        } while (stack)
    else {  /* the cube has subcubes */
        do {    /* distribute the given frusta (stack) among the cube's (non-empty) subcubes (the buckets); octosect the frusta as needed */
            (square, frustum) = pop(stack)
            if (!square.black) {    /* skip occluded frusta */
                if (frustum in cube[XYZ])   /* cube[XYZ] is a non-empty octant of cube; XYZ is e.g., "LUF" i.e., left-up-front */
                    push((square, frustum), stack[XYZ]) /* push on subcube "cube[XYZ]"'s stack */
                else    /* frustum in no octant of cube: octosect the frustum, quadrisect the square */
                    for (XY in { LU, RU, LD, RD })  /* LU: left-up quadrant, RU: right-up quadrant etc. */
                        if (!square[XY].black)  /* if the XY quadrant/subsquare isn't black; doing this with a switch has benefits, otherwise it's a bit superfluous */
                            for (Z in { F, B }) /* F: front, B: back */
                                push((square[XY], frustum[XYZ]), stack)
            }
        } while (stack)
        /* the given frusta (or parts of them) are in the buckets now: bucket_sort these buckets */
        for (XYZ in front_to_back)  /* front_to_back is a front-to-back enumeration of the cube's octants; this depends on the eye's octant (predetermined lists) */
            if (stack[XYZ]) /* if the stack associated with subcube "cube[XYZ]" isn't empty */
                bucket_sort(cube[XYZ], stack[XYZ])
    }
}
or this image-order one:
Code:
/* Frusta are the discriminants. Classical Bresenham traverses a raster/grid; this Bresenham traverses a hierarchy.
    Here the other dimension of space is the major axis: scale. */
hierarchical_bresenham(cube, frustum, square)
{
    if (cube is a point) {  /* leaf node */
        square.color = cube.color
        square.black = true /* i.e., is opaque */
    } else {    /* non-leaf */
        if (frustum in cube[XYZ])   /* descend in non-empty subcube "cube[XYZ]" */
            hierarchical_bresenham(cube[XYZ], frustum, square)
        else {  /* octosect frustum, quadrisect square */
            for (Z in { F, B }) /* front frusta first */
                for (XY in { LU, RU, LD, RD })
                    if (!square[XY].black)
                        hierarchical_bresenham(cube, frustum[XYZ], square[XY])
            if (square[XY].black for all XY in { LU, RU, LD, RD })
                square.black = true /* the region is opaque */
        }
    }
}
The reason for this request is that I tried to program many variants of the above, both recursive and iterative (one using a ring buffer), but my octree is very recalcitrant in the face of the memory hierarchy (8 indices per node, constructed on a Morton-ordered set of points; a 1 byte per node version was also attempted). A sample: http://www.gamedev.ru/files/?id=81959. Note that, in accordance with UD's claims, the core uses no *, / or floats. It's neither raycasting (no concept of ray involved) nor splatting (no model/view transform + projection). The idea is, instead of projecting the view frustum onto the viewplane, to draw the frustum "on" the world. This can be achieved hierarchically: an octree for the world and octosection of the view frustum. Linkless octrees and hashes seem promising. I hope that this post will get past moderation soon: there remains little time.
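On the "linkless octrees and hashes" remark: one common way to drop the 8 indices per node is to key occupied nodes by their locational (Morton) code in a hash table, so that child lookup becomes key arithmetic instead of pointer chasing. A sketch under that assumption (not a claim about how UD stores its data):
Code:
#include <cstdint>
#include <unordered_map>

/* Key = (level, Morton code at that level). Packing the level into the top
   bits keeps one flat table for the whole hierarchy. Assumes level < 64
   and codes below 2^58. */
static inline uint64_t node_key(uint32_t level, uint64_t morton)
{
    return (uint64_t(level) << 58) | morton;
}

struct NodeData {
    uint32_t color;         /* averaged colour for LOD rendering */
    uint8_t child_mask;     /* which of the 8 octants are occupied */
};

using LinklessOctree = std::unordered_map<uint64_t, NodeData>;

/* Child lookup: append one octant digit (3 bits) to the parent's code,
   no pointer chasing. */
const NodeData* find_child(const LinklessOctree& tree, uint32_t level,
                           uint64_t morton, unsigned octant /* 0..7 */)
{
    auto it = tree.find(node_key(level + 1, (morton << 3) | octant));
    return it == tree.end() ? nullptr : &it->second;
}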
 
Nobody taking up the challenge (or caring)? Anyway, UD is more likely related to:
Code:
/* Here we quadrisect the frustum across x & y (no bisection across z) */
bucket_sort(frustum, square, list)
{
    if (unit square) {  /* frustum corresponds to a pixel */
        cube = pop_front(list)  /* pick a front cube */
        square.color = cube.color
    } else {
        do {
            cube = pop_front(list)
            if (cube in frustum[XY] for some XY in { LU, RU, LD, RD })  /* frustum[XY] is the XY quadrant of frustum */
                push_back(cube, list[XY])   /* list[XY] is a list of cubes that are in the XY subfrustum */
            else    /* the current cube spans subfrusta: push_front its (non-empty) children on list, in back-to-front order so that the frontmost end up first */
                for (XYZ in back_to_front)  /* back_to_front depends on where the eye is with respect to cube's halfway planes (predetermined list) */
                    push_front(cube[XYZ], list) /* cube[XYZ] is the XYZ (non-empty) octant of cube */
        } while (list)
        for (XY in { LU, RU, LD, RD })  /* LU: left-up quadrant, RU: right-up quadrant etc. */
            if (list[XY])   /* if there are cubes in the XY subfrustum */
                bucket_sort(frustum[XY], square[XY], list[XY])
    }
}
Bruce Dell says that he does a mass 3d to 2d conversion that shares the common elements of the 2d positions of all the dots combined. This suggests a recursive quadrisection of the view frustum/pyramid across the xy axes. Front-to-back/back-to-front enumerations of a cube's octants have long been known.
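For completeness, those "predetermined lists" can be derived per cube from three comparisons and no multiplies: an octant can only occlude a sibling from which it is separated by halfway planes lying on the eye's side, so increasing Hamming distance to the eye's octant is a valid front-to-back order. A sketch (the octant bit layout is my own choice):
Code:
#include <algorithm>
#include <array>
#include <bitset>

struct Vec3 { double x, y, z; };

/* Octant of the eye relative to the cube centre: bit 0 = x, 1 = y, 2 = z. */
unsigned eye_octant(const Vec3& eye, const Vec3& centre)
{
    return (eye.x >= centre.x ? 1u : 0u)
         | (eye.y >= centre.y ? 2u : 0u)
         | (eye.z >= centre.z ? 4u : 0u);
}

/* An octant A can only occlude a sibling B if A agrees with the eye octant
   on every axis where A and B differ, so visiting octants by increasing
   Hamming distance to the eye octant is a valid front-to-back order
   (reverse it for back-to-front). */
std::array<unsigned, 8> front_to_back(unsigned m)
{
    std::array<unsigned, 8> order = {0, 1, 2, 3, 4, 5, 6, 7};
    std::sort(order.begin(), order.end(), [m](unsigned a, unsigned b) {
        return std::bitset<3>(a ^ m).count() < std::bitset<3>(b ^ m).count();
    });
    return order;
}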
 
I did some voxel rendering tests a few years ago (as a hobby project at home). My system didn't ray traverse the octree (for each pixel), but instead projected the whole octree to the screen.

Assuming an orthogonal projection, sorting is very easy. You just sort the 8 octree child nodes once a frame (by view vector dot cube center). Every octree node is then traversed in this order. If a depth-first traversal is used, this yields the correct drawing order (inverse of painter's algorithm). This basically makes the sorting free (if you don't like indexing, you can write a template and pick a hard-coded traversal function based on the sorting order).
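A minimal version of that once-per-frame sort, under an orthographic projection as described above (the child indexing and names are mine):
Code:
#include <algorithm>
#include <array>
#include <numeric>

struct Vec3 { float x, y, z; };

/* Offsets from a cube centre to its 8 child centres, up to a uniform scale
   (the scale does not change the ordering, so +-1 is enough).
   Child index bits: bit 0 = x, bit 1 = y, bit 2 = z. */
static const Vec3 kOctantDir[8] = {
    {-1,-1,-1}, {+1,-1,-1}, {-1,+1,-1}, {+1,+1,-1},
    {-1,-1,+1}, {+1,-1,+1}, {-1,+1,+1}, {+1,+1,+1},
};

/* Front-to-back child order for an orthographic view direction, computed
   once per frame and reused at every level of the depth-first traversal.
   The nearest child has the smallest view . childCentreOffset; reversing
   the result gives the painter's (back-to-front) order. */
std::array<int, 8> front_to_back_order(const Vec3& view)
{
    std::array<int, 8> order;
    std::iota(order.begin(), order.end(), 0);
    auto depth = [&](int i) {
        return view.x * kOctantDir[i].x + view.y * kOctantDir[i].y
             + view.z * kOctantDir[i].z;
    };
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return depth(a) < depth(b); });
    return order;
}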

I used a pretty much standard octree traversal. If a node was completely outside the viewport I skipped it (and all its children). If it was completely inside the viewport, I called an optimized traversal path with no further viewport culling checks (but with occlusion checks of course still in place). For intersecting nodes, the same function was called recursively. Pretty standard stuff.

For each voxel traversed, the (hierarchical) depth buffer was tested. If the voxel was completely behind the depth buffer value, it was skipped (including its children). Voxels were rendered (and the traversal stopped) when the voxel's screen-space size became one pixel (or when there were no children left). At that point the voxel was rendered to the depth buffer. I had a two-level hierarchical depth buffer (I didn't test a proper multilevel buffer).

I didn't have time to do many memory optimizations (or any vectorization or multithreading) on my algorithm. It was barely real time (on my Core 2). I did store the collision info in a separate octree (this saved a lot of memory bandwidth, as collision data is not interleaved with color/material data). I used an optimized octree with a single child pointer (all children allocated to contiguous memory locations). If I implemented a sparse octree renderer now, I would likely just use a hash table and skip storing any child information in the octree. A single 64-byte cache line can store an 8x8x8 bit field (that's 512 voxels). With some AVX vector crunching, buckets of that kind should be rather efficient to process. After depth-buffer processing, I would query the colors directly from a separate sparse octree (a depth value is easy to transform into a 3D position, and you can hash it directly into the octree containing the material data). Non-leaf material octree nodes of course store color values that are the average of all their children (this is required for proper geometry/texture antialiasing/filtering).
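A sketch of the 64-byte brick idea described above (the bit layout, key and hash below are arbitrary choices of mine, not taken from any particular implementation):
Code:
#include <cstdint>
#include <unordered_map>

/* One 8x8x8 occupancy brick = 512 bits = 64 bytes, i.e. one cache line.
   Bit layout: one 64-bit word per z slice, bit index y*8+x inside it. */
struct alignas(64) Brick {
    uint64_t bits[8];   /* bits[z] holds the 8x8 slice at depth z */

    bool get(unsigned x, unsigned y, unsigned z) const {
        return (bits[z] >> (y * 8 + x)) & 1u;
    }
    void set(unsigned x, unsigned y, unsigned z) {
        bits[z] |= uint64_t(1) << (y * 8 + x);
    }
};

/* Bricks addressed by integer brick coordinates through a hash map, so no
   child pointers are stored at all; colour/material would live in a separate
   structure keyed the same way, as suggested above. */
struct BrickKey {
    int32_t x, y, z;
    bool operator==(const BrickKey& o) const {
        return x == o.x && y == o.y && z == o.z;
    }
};
struct BrickKeyHash {
    size_t operator()(const BrickKey& k) const {
        uint64_t h = uint64_t(uint32_t(k.x)) * 0x9E3779B97F4A7C15ull;
        h ^= uint64_t(uint32_t(k.y)) * 0xC2B2AE3D27D4EB4Full;
        h ^= uint64_t(uint32_t(k.z)) * 0x165667B19E3779F9ull;
        return size_t(h);
    }
};
using BrickMap = std::unordered_map<BrickKey, Brick, BrickKeyHash>;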
 
Thanks for the interest, sebbbi.
Again, the procedure in #3 can be implemented without *, / or floats (using the ideas in #1). In fact, frustum culling/projection can be done combinatorially (i.e., in a pure way, without fixed-point numbers: only true integers). It is strongly suspected that UD works according to #3: a novel octree traversal that does not retrace tree nodes (one traversal per frame). Mr. Dell's patent has to do with octant rejection: an octant is rejected when it contains no view-frustum subfrusta. The key is view-frustum bisection. He really did come up with something that is pure.

When trying to build a search algorithm that scans 3D atoms your biggest problems are what do you grab when they are far away and small ? How do you know whether an object is blocked by another object ? In the end perfect solutions were found. I don’t use the word perfect lightly, for a long time I was not at all happy with some parts of the system but they kept evolving and now we are at the point were the core of our algorithm is less then two pages long and it has no multiplication or division (multiply and divide slow a computer down). (cf. this)
One thing that made him "not at all happy" was, perhaps, the use of * to sort octree nodes' bounding boxes among subfrusta/buckets (with the help of e.g., BoxOnPlaneSide).
 
UD is the new de facto standard renderer; it accords very well with Sutherland's survey: a (combinatorial) coherent bucket sort.
Instead of indulging in destructive criticism, we should try to reverse engineer it.
While I agree people are often too prone to criticism without an open mind, statements calling it a "new de facto standard renderer" put you on the side of belief before proof. Is cold fusion the de facto standard energy source? Some people still think it will be, but it's a long way from being proven, let alone standard.
 
Some people still think it will be, but it's a long way from being proven, let alone standard.
In UD's case there already are big game companies using it. UD's algorithm is too elegant and simple not to conquer the field. This is why we should reverse engineer it before it's disclosed (and there remains very little time). As it stands, it's unclear how one would implement it on current GPUs. This is a threat to the GPUs, not to UD.
 
What big game companies are using it? I've heard no announcements. And while I understand the desire to reverse engineer something, what's the deal with a deadline before the details are published? Is it just a thought exercise?

If UD doesn't know how to implement a graphics algorithm on a parallel processor like a GPU they likely haven't thought about the problem hard enough.
 
In UD's case there already are big game companies using it. UD's algorithm is too elegant and simple not to conquer the field.
AFAIK there hasn't yet been a showcase of UD with dynamic lights, or shadows, or animated content. Or irregularly placed objects. You're going to have to do some work to convince any of us there's a real future here, for none of us share the confidence in Euclideon's work that you have. That's more a discussion for the Unlimited Detail discussion thread than here. Certainly UD is nothing like a new de facto engine that all the big companies are switching to.
 
In UD's case there already are big game companies using it.
Errr... Nobody's using it, except maybe some indie devs. There are too many issues with UD for it to be practical for a big-name title from a large publisher. You can't animate stuff with it, everything looks the same everywhere or you run out of RAM immediately, etc.
 
Certainly UD is nothing like a new de facto engine that all the big companies are switching to.
Errr... Nobody's using it, except maybe some indie devs.
Not all, but at least one quite big "dependent" game company uses it (as of April 23, 2012).
AFAIK there hasn't yet been a showcase of UD with dynamic lights, or shadows, or animated content.
http://www.youtube.com/watch?v=_5hg9VfbyYg
It has done animation for a long time (since at least 2003). You can translate, rotate and scale things. Instead of touching the octrees, you use many view frusta positioned/deformed in appropriate ways and then search for points.
You can't animate stuff with it, everything looks the same everywhere or you run out of RAM immediately, etc.
They page in the needed nodes.
Are there people trying to reverse engineer it? Back when SGI-style rendering wasn't standard, people tried to figure out how Mr. Carmack textured his walls (William Doughty succeeded); why are there so few good-willed persons today? (Hint: Mammon.)
 
The biggest evidence that this is a scam is simply that there's a company built around it. No one needs that. No one wants that. It's an algorithm; congratulations, you post a white paper and get hired if it's good. You don't build a company around that.

If you want unlimited detail, just figure out how to get the storage costs and etc. down for this: http://bps12.idav.ucdavis.edu/talks/04_crassinVoxels_bps2012.pdf

There, cone tracing through an SVO. Good performance, nigh infinitely scalable geometry, all out there in a paper for everyone to read. If someone has some "magic science" they won't let you look into because they think it will make them rich, then they're scamming you, are fairly ignorant, and are probably idiots. People need something that works, works well, works right away, and is a tangible overall improvement if they're going to spend money on it. This is vague bullshit without anything you'd need for building a game actually working yet, so there's no reason whatsoever for anyone to look at it other than as an interesting possible research topic for the future; and they won't tell anyone about it, so it's useless in that regard as well.
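For readers who haven't seen the paper, here is a rough sketch of the cone-marching loop it builds on; sampleVoxels below is a placeholder for a real mipmapped 3D texture or brick-pool fetch, and the stepping/compositing details are simplified:
Code:
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Vec4 { float r, g, b, a; };  /* premultiplied-alpha voxel sample */

/* Placeholder for the prefiltered voxel lookup; a real implementation samples
   a mipmapped 3D texture or an SVO brick pool at the given LOD. */
Vec4 sampleVoxels(const Vec3&, float) { return {0, 0, 0, 0}; }

/* One cone march: step along the axis, widen the footprint with distance,
   pick the mip level that matches the footprint, and composite front to back
   until the accumulated opacity saturates. */
Vec4 trace_cone(Vec3 origin, Vec3 dir, float tanHalfAngle,
                float voxelSize, float maxDist)
{
    Vec4 acc = {0, 0, 0, 0};
    float t = voxelSize;                            /* start one voxel out */
    while (t < maxDist && acc.a < 0.99f) {
        float radius = std::max(voxelSize, t * tanHalfAngle);
        float lod = std::log2(radius / voxelSize);
        Vec3 p = {origin.x + dir.x * t, origin.y + dir.y * t,
                  origin.z + dir.z * t};
        Vec4 s = sampleVoxels(p, lod);
        float w = 1.0f - acc.a;                     /* front-to-back blend */
        acc.r += w * s.r;  acc.g += w * s.g;  acc.b += w * s.b;
        acc.a += w * s.a;
        t += radius;                                /* step grows with cone */
    }
    return acc;
}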
 