Cool stuff from Bitboys

I'm not sure about "streams", but the Bitboys sure were good at "delays"...

"Delay streams" ... looks like renamed deferred rendering?

I'm not sure, but it looks like "semi-deferred" rendering, i.e. you don't have to process all the primitives in the scene before starting to render?
 
Isn't this similar to "slipstream" from the 3Dlabs P9? At least it looks like both basically store the geometry rather than processing it immediately.
 
Sounds more like a form of primitive caching/better early-out than deferred rendering. They keep a triangle around for a little while and see if it becomes occluded with the subsequent triangles.

/me goes to read the paper

[Edit] Yeah, it's just a large (but compressed) FIFO buffer followed by an occlusion test. Conceptually, it sounds like a low-risk, across-the-board improvement to existing methods (such as Early Out), though their implementation sounds a bit sketchy.
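For what it's worth, here's roughly how I read that "FIFO + occlusion test" idea; a minimal sketch only, with made-up names (DelayStream, CoarseZ, the tiny capacity), not the paper's actual hardware. Triangles first update a coarse occlusion structure, then sit in a bounded FIFO, and are only tested and rasterized once they fall out the far end, so later geometry gets a chance to occlude them.

#include <cstddef>
#include <cstdio>
#include <deque>
#include <initializer_list>

struct Triangle { float zMin, zMax; int tile; };      // grossly simplified

struct CoarseZ {
    float tileZMax[64];                                // farthest depth per screen tile
    CoarseZ() { for (float& z : tileZMax) z = 1.0f; }
    // Assumes the triangle fully covers the tile, otherwise this update
    // would not be conservative (see the LRZ discussion further down).
    void update(const Triangle& t) {
        if (t.zMax < tileZMax[t.tile]) tileZMax[t.tile] = t.zMax;
    }
    bool occluded(const Triangle& t) const { return t.zMin > tileZMax[t.tile]; }
};

struct DelayStream {
    std::deque<Triangle> fifo;
    std::size_t capacity = 4;                          // real hardware: tens of thousands
    CoarseZ zbuf;

    void submit(const Triangle& t) {
        zbuf.update(t);                                // occlusion info from newer geometry
        fifo.push_back(t);
        while (fifo.size() > capacity) drainOne();
    }
    void flush() { while (!fifo.empty()) drainOne(); }

private:
    void drainOne() {                                  // triangle leaves the delay stream
        Triangle t = fifo.front(); fifo.pop_front();
        if (zbuf.occluded(t)) std::printf("culled tri (zMin=%.2f)\n", t.zMin);
        else                  std::printf("rasterize tri (zMin=%.2f)\n", t.zMin);
    }
};

int main() {
    DelayStream ds;
    // Back-to-front submission: the last (nearest) triangle still culls the
    // earlier ones because they are held in the FIFO when it arrives.
    for (float z : {0.9f, 0.8f, 0.7f, 0.6f, 0.1f}) ds.submit({z, z, 0});
    ds.flush();
}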
 
Sounds like a good idea (without reading the paper). With a reasonably-large on-chip buffer, one could expect dramatic improvements in rendering performance, depending on the locality of the triangles sent for rendering.
 
I briefly glanced at the paper.

Two comments: the delay stream is stored in video memory (not on-chip, as far as I can tell), and they say that rendering of delayed tris starts when the delay stream is 40% full... so it seems like you essentially pay the full off-chip triangle memory bandwidth price of a fully deferred renderer. On the plus side, the scene capture buffer is bounded in size...
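Quick back-of-envelope on that bandwidth point; all the numbers below (triangle rate, compressed vertex size) are my own assumptions for illustration, not figures from the paper:

#include <cstdio>

int main() {
    const double trianglesPerSecond = 100e6;   // assumed triangle rate
    const double bytesPerVertex     = 32.0;    // assumed compressed vertex size
    const double verticesPerTri     = 3.0;     // ignoring strip/index reuse

    // one write into the delay stream + one read back out per triangle
    double bytesPerTri = 2.0 * bytesPerVertex * verticesPerTri;
    double gbPerSec    = trianglesPerSecond * bytesPerTri / 1e9;

    std::printf("extra off-chip traffic: ~%.1f GB/s at %.0f Mtri/s\n",
                gbPerSec, trianglesPerSecond / 1e6);
}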

What might be interesting is essentially allocating a delay stream per tile...
 
GRIN, somehow I find this funny:

The delay stream does not capture the entire scene. Our tests suggest that a sliding window of 50K–150K triangles is sufficient to reduce the depth complexity close to the theoretical optimum.

Well DUH, after culling and clipping most scenes do not even contain 150K triangles, so when they hit the theoretical optimum they are probably storing the entire scene ;)

Occlusion information cannot be generated for primitives that use per-pixel rejection tests, e.g., alpha or stencil tests, until these tests have been executed. Furthermore, occlusion tests cannot be applied for primitives that modify their depth values in the pixel shader or update the stencil buffer when the depth test fails.

Hmm, fails with stencil tests... where have we heard that before?

Our LRZ-buffer uses 8x8 pixel tiles. Each tile requires 32 bits of video memory as the minimum and maximum depth values are stored as 16-bit floating point numbers. Both of the values are needed to allow changing the depth comparison mode during the frame. The LRZ-buffer is stored in video memory and accessed through a single-ported 32KB on-chip cache, split into eight banks to facilitate multiple simultaneous accesses. Paging to video memory is performed with a granularity of 256 bytes. Each depth buffer has its own separate LRZ-buffer.

This is IMHO the interesting bit. They use depth areas of 8x8 pixels for which they store depth values with limited accuracy. What this means is that before you can update one of these depth values, you need a triangle that covers at least a complete 8x8 tile. In other words, if your occluding geometry is finely tessellated it cannot occlude, because the area each occluder covers is smaller than the tiles and hence the info cannot be updated (if only 50% of a tile is covered by a nearby triangle, you cannot change the depth values stored for that tile area). They fix this using a tile cache which actually stores full-detail Z values, but only for a limited area, so they assume there is enough spatial locality for this to work. If a detail tile gets flushed out of memory, they only store the near/far values, not all the detail, to save bandwidth.
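To make that conservative-update point concrete, here's a minimal sketch of how I read the LRZ tile test; the names, the plain floats instead of 16-bit values, and the full-coverage flag are all my own simplifications, not the paper's:

#include <algorithm>
#include <cstdio>

struct LrzTile {                   // 8x8 pixels; 16-bit min/max in the paper,
    float zMin = 1.0f;             // plain floats here for brevity
    float zMax = 1.0f;             // depth buffer assumed cleared to 1.0 (far)
};

struct TriInTile {                 // one triangle's footprint clipped to one tile
    float zMin, zMax;
    bool coversWholeTile;          // are all 64 pixels inside the triangle?
};

// Depth comparison assumed to be "less" (smaller = nearer).
bool lrzReject(const LrzTile& tile, const TriInTile& tri) {
    return tri.zMin > tile.zMax;   // everything already in the tile is nearer
}

void lrzUpdate(LrzTile& tile, const TriInTile& tri) {
    tile.zMin = std::min(tile.zMin, tri.zMin);
    if (tri.coversWholeTile)       // only a full occluder may pull zMax in
        tile.zMax = std::min(tile.zMax, tri.zMax);
}

int main() {
    LrzTile tile;
    TriInTile smallOccluder {0.20f, 0.25f, false};   // nearby, but partial coverage
    TriInTile bigOccluder   {0.30f, 0.35f, true };   // covers the whole 8x8 tile
    TriInTile hiddenTri     {0.80f, 0.90f, false};   // far behind both occluders

    lrzUpdate(tile, smallOccluder);                  // zMax stays at 1.0
    std::printf("after partial occluder: reject hidden tri? %d\n",
                lrzReject(tile, hiddenTri));         // 0: cannot cull yet
    lrzUpdate(tile, bigOccluder);                    // zMax drops to 0.35
    std::printf("after full occluder:    reject hidden tri? %d\n",
                lrzReject(tile, hiddenTri));         // 1: culled
}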

Transparent surfaces were excluded from all scenes in order to make the optimal depth complexity exactly 1.0.

Erhm? How exactly should I understand that? Did they ignore all transparent polygons completely and take them out of the stream, or did they maintain them in the stream? For a scene such as 3DMark2003 a lot of the follow-on passes are transparent, so they would have a massive impact on the results. This scene also uses stencil tests, about which they said the system fails; did they ignore those too? 36K tris is also not very representative for a 3DMark03 scene IMHO. Did they specifically select this frame because it happened to work? Notice how all frames contain a nice big close-to-the-camera occluder (car, dragon, character); obviously in those cases the system is going to work, but how about a more random scene?

PowerVR [2000] captures the geometry of the entire frame and uses tile-based rendering with internal buffers. Occlusion culling is implemented by using on-chip sorting and thus the culling efficiency is not affected by the order of the input primitives. The majority of bandwidth problems are avoided but the limited amount of video memory makes capturing large scenes impractical.

Since when has the amount of memory been a problem, now that we have cards with 256MB (even 512MB) of memory?

Also, in their whole section about "Order-independent transparency" they completely fail to mention that PowerVR delivered this functionality in a mass-market console product known as the DreamCast...

The fundamental nature of order-independent transparency is that all visible data must be collected before the processing can begin. This is exactly what a sufficiently long delay stream does.

So do they store the whole scene geometry or not ? They said this was impractical before...

I didn't read the AA section in detail, but:

Our approach is to conservatively assume that all triangle edges are discontinuity edges and to use a simple geometric hashing algorithm for detecting shared non-sharp edges (Figure 6).

This seems to imply they only consider triangle edges for AA, similar to the Matrox approach, meaning that triangle intersections will not get AA, which people have concluded is not acceptable?
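As far as I can tell the edge hashing would look something like this; a rough sketch with an assumed quantization step and hash, not the paper's actual algorithm. Edges shared by two triangles drop out, everything else stays conservatively flagged as a discontinuity edge:

#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <unordered_map>
#include <vector>

struct Vec3 { float x, y, z; };
struct Tri  { Vec3 v[3]; };

// Quantize a position so nearly identical vertices hash to the same key
// (positions right on a quantization boundary can still be missed).
uint64_t quantize(const Vec3& p) {
    auto q = [](float f) { return (uint64_t)(int64_t)(f * 1024.0f) & 0x1FFFFF; };
    return (q(p.x) << 42) | (q(p.y) << 21) | q(p.z);
}

// Order-independent key: edge (a,b) hashes the same as edge (b,a).
uint64_t edgeKey(const Vec3& a, const Vec3& b) {
    uint64_t ka = quantize(a), kb = quantize(b);
    uint64_t lo = std::min(ka, kb), hi = std::max(ka, kb);
    return lo * 0x9E3779B97F4A7C15ull ^ hi;
}

int main() {
    // Two triangles sharing the edge (B,C): that edge should drop out.
    Vec3 A{0, 0, 0}, B{1, 0, 0}, C{0, 1, 0}, D{1, 1, 0};
    std::vector<Tri> tris = { {{A, B, C}}, {{B, D, C}} };

    std::unordered_map<uint64_t, int> edgeCount;
    for (const Tri& t : tris)
        for (int i = 0; i < 3; ++i)
            ++edgeCount[edgeKey(t.v[i], t.v[(i + 1) % 3])];

    int discontinuity = 0;
    for (const auto& e : edgeCount)
        if (e.second == 1) ++discontinuity;     // unmatched edges stay flagged
    std::printf("%d of %zu edges kept as discontinuity edges\n",
                discontinuity, edgeCount.size());
}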

Also, for VillageMark they end up with an overdraw of 1.18 for a scene with all transparency removed, which means they are still processing 18% too many pixels. I mean, they say VillageMark is 1K polies; that should fit in their delay stream completely, so the result should have been perfect? Why do they still process 18% too many pixels in such a trivial case?

Their "compression" seems pretty basic, seems like they have a window of 4 vertices and if you happen to re-use the same vertex a coupe of times in quick succession they will use an index rather than re-store the vertex data... not exactly an impressive compression scheme - its actually similar to a very very small vertex cache at the back-end of your vertex shader. They then say how a vertex format is usualy small anyway, yeah right... with people starting to store full float values in texture coordinates that argumentation goes right out of the water when using complex shaders.

All just IMHO...

K-
 
K,

Correct me if I'm wrong, but doesn't PVR also support essentially fixed size scene buffers (by partially rendering tiles when the scene buffer starts to overflow)?
 
psurge said:
K,

Correct me if I'm wrong, but doesn't PVR also support essentially fixed size scene buffers (by partially rendering tiles when the scene buffer starts to overflow)?
Yes, it does. Dealing with dynamic data sets is just too expensive for a video card (which is one big reason why I'm still against full-on tiling... there's always that frame that's 2x, 3x, or even 10x the average detail...).
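For reference, the overflow handling as I understand it would be something along these lines; a rough sketch with invented structure (TileBinner, the tiny budget), not PVR's actual parameter-buffer management: bin until the fixed budget is hit, render what's binned, spill Z/colour so the pass can resume, and carry on.

#include <cstddef>
#include <cstdio>
#include <vector>

struct Tri { int id; };

struct TileBinner {
    std::size_t capacity;                        // fixed scene-buffer budget
    std::vector<std::vector<Tri>> bins;          // one triangle list per screen tile
    std::size_t used = 0;

    TileBinner(std::size_t cap, std::size_t tiles) : capacity(cap), bins(tiles) {}

    void bin(const Tri& t, std::size_t tile) {
        if (used == capacity) partialRender();   // flush before overflowing
        bins[tile].push_back(t);
        ++used;
    }

    void partialRender() {
        std::printf("partial render of %zu binned tris (Z/colour spilled to memory)\n", used);
        for (auto& b : bins) b.clear();
        used = 0;
    }
};

int main() {
    TileBinner binner(4, 2);                     // tiny budget to force a mid-frame flush
    for (int i = 0; i < 6; ++i) binner.bin({i}, (std::size_t)(i % 2));
    binner.partialRender();                      // end-of-frame flush
}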
 
Kristof, on the frame numbers, I personally drove over to Hybrid with a 3DMark beta version on CD and we captured ~5 frames per game test.

The capture system was a "hack" and took quite a while to dump the data from a single frame to disk, and I ran out of time - had to get back to work :)
 
Hmmh... I was about to say something... too bad that I can't remember it anymore... oh well, who cares... it was just something about the topic.

(First time on the net since mid-May... probably the last one too for a while. Email most likely reaches me, if needed.)
 