Thoughts on some rendering ideas I had

I've had some thoughts and ideas regarding rendering over the years, and I was hoping the developers and armchair developers here would chime in on some of them.

1. Around 2003 on gamedev.net I made a post about z-compositing that I'd like to rehash here and now.
The basic idea was to render the static geometry of the scene at one frame rate and the dynamic geometry at another, and then composite the two at the higher frame rate. For a first-person camera, static geometry at 60 fps and dynamic at 30; for a third-person camera, 30 for the static and 60 for the dynamic.

a. Assuming you can keep up the frame rates and it's a game with no overly fast-moving objects, camera, or lights, can you think of any rendering artifacts? Do you think players would complain of something akin to micro-stutter, or of something else bothering them?
b. Related to a: given modern rendering techniques, do you feel it would be worth the trouble? In addition, at what stage would you do the compositing, and why (i.e. before or after lighting and shadowing)?
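To make the compositing step concrete, here is a minimal sketch of the kind of fullscreen pass I have in mind, assuming each layer has already been rendered into its own color + depth target (all resource and function names here are made up for illustration):

```hlsl
// Depth-based composite of the two layers. Run as a fullscreen triangle with
// depth writes enabled so later passes still see a correct combined z-buffer.
Texture2D<float4> StaticColor  : register(t0); // layer rendered at one rate
Texture2D<float>  StaticDepth  : register(t1);
Texture2D<float4> DynamicColor : register(t2); // layer rendered at the other rate
Texture2D<float>  DynamicDepth : register(t3);

struct PSOut
{
    float4 color : SV_Target;
    float  depth : SV_Depth;
};

PSOut CompositePS(float4 pos : SV_Position)
{
    int3 p = int3(pos.xy, 0);
    float zs = StaticDepth.Load(p);
    float zd = DynamicDepth.Load(p);

    // Conventional depth: the smaller value is closer to the camera.
    PSOut o;
    if (zd < zs)
    {
        o.color = DynamicColor.Load(p);
        o.depth = zd;
    }
    else
    {
        o.color = StaticColor.Load(p);
        o.depth = zs;
    }
    return o;
}
```

Whether this pass runs before or after lighting (part b above) is exactly what I'm unsure about.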

2. I kind of skipped over how HDR was implemented until recently. So far I've only looked into tone mapping, so essentially all I really know is that it seems you sometimes need to calculate the average luminance. So I was wondering whether it's possible, given either the current hardware or the current hardware/API combination, to compute the sum portion of the average as you do your light accumulation?
It seems rather inefficient to do it after the fact.
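For reference, the "after the fact" version I mean is a reduction pass over the finished HDR buffer, something like per-tile partial sums that a second small dispatch then adds up and divides by the pixel count (a rough sketch, names made up; it assumes the dispatch exactly covers the render target):

```hlsl
// Post-pass average luminance: each 16x16 thread group reduces its tile of the
// HDR buffer to one partial sum; a second tiny dispatch (not shown) sums the
// per-tile results and divides by the pixel count.
Texture2D<float4>         HdrColor : register(t0);
RWStructuredBuffer<float> TileSums : register(u0); // one float per thread group

cbuffer ReduceConstants : register(b0)
{
    uint NumTilesX; // number of 16x16 tiles across the screen
};

groupshared float gLuma[256];

[numthreads(16, 16, 1)]
void LuminanceReduceCS(uint3 dtid : SV_DispatchThreadID,
                       uint  gi   : SV_GroupIndex,
                       uint3 gid  : SV_GroupID)
{
    float3 c = HdrColor.Load(int3(dtid.xy, 0)).rgb;
    gLuma[gi] = dot(c, float3(0.2126f, 0.7152f, 0.0722f));
    GroupMemoryBarrierWithGroupSync();

    // Standard shared-memory tree reduction of the 256 values.
    [unroll]
    for (uint s = 128; s > 0; s >>= 1)
    {
        if (gi < s)
            gLuma[gi] += gLuma[gi + s];
        GroupMemoryBarrierWithGroupSync();
    }

    if (gi == 0)
        TileSums[gid.y * NumTilesX + gid.x] = gLuma[0];
}
```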

3. I've always had a thing for IDs: primitive/patch IDs, object/sub-object IDs, and so on. It seems some rendering techniques require multipassing (light indexed), others have an option for it (clustered), and then some ("classical deferred" with a z-only pass) seem like they might benefit from it but wind up with too high a geometry load. So I was wondering whether anybody has tried, and whether it is currently possible (API/hardware), to do a pass writing Z, primitive ID, and whatever other ID is necessary for the specific rendering technique, in order to generate a sorted list of visible primitives to be used as the basis of what to draw in subsequent passes.
Essentially, Z would be read or read-modify-write, and the framebuffer output would be write-only with no texture reads needed to accomplish the feat. So the speed would be pretty close to a Z-only pass, the subsequent pass(es) would only try to render what is visible, and the z-buffer would handle the rest. What do you think?
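A rough sketch of what I'm picturing, assuming the IDs get packed into a single uint render target during the z-only style pass, and a small compute pass afterwards marks which packed IDs survived the depth test (the names, packing and bit counts are just for illustration; turning the marked set into an actual sorted draw list would be a further compaction/sort step):

```hlsl
// Pass 1: depth + ID. No texture reads in the pixel shader; it only writes a
// packed object/primitive ID, and the regular depth test decides which
// primitive is visible at each pixel.
cbuffer ObjectConstants : register(b0)
{
    uint ObjectId; // hypothetical per-draw ID
};

uint IdPassPS(float4 pos : SV_Position, uint prim : SV_PrimitiveID) : SV_Target
{
    // Illustrative packing: 12 bits of object ID, 20 bits of primitive ID.
    return (ObjectId << 20) | (prim & 0xFFFFF);
}

// Pass 2: scan the ID buffer and flag every packed ID that is still visible.
Texture2D<uint> IdBuffer     : register(t0);
RWBuffer<uint>  VisibleFlags : register(u0); // bitfield, one bit per packed ID

[numthreads(8, 8, 1)]
void MarkVisibleCS(uint3 dtid : SV_DispatchThreadID)
{
    // Assumes the dispatch exactly covers the render target and that the ID
    // buffer was cleared to 0xFFFFFFFF where no geometry was drawn.
    uint id = IdBuffer.Load(int3(dtid.xy, 0));
    if (id == 0xFFFFFFFF)
        return;
    InterlockedOr(VisibleFlags[id >> 5], 1u << (id & 31));
}
```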

Thanks in advance for any comments, criticisms, advice, and analyses.
 
Putting a copied z-buffer back in place is a bit tricky; blitting isn't available. You can use a fullscreen triangle with z write-through (no compare, just write) and pass the old z value from a Texture2D via Load() to an SV_Depth output.
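In HLSL terms something like this, with the depth-stencil state set to comparison ALWAYS and depth writes enabled, and no color target needed (the resource name is illustrative):

```hlsl
// Restore a previously copied depth buffer: draw a fullscreen triangle and
// push the saved value back out through SV_Depth.
Texture2D<float> SavedDepth : register(t0); // the copied z-buffer

float RestoreDepthPS(float4 pos : SV_Position) : SV_Depth
{
    return SavedDepth.Load(int3(pos.xy, 0));
}
```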
 
1. Would probably lead to weird perceived jittering. Interesting idea though. This could be tested easily (and the technique could be proved/disproved) by doing world/actor updates at different frequencies but rendering everything at the higher one.
3. This would work better as a dedicated piece of silicon than a general purpose computation IMO.
 
I'm quite sure it would be more feasible to have some sort of re-projection scheme for background and composite characters on top.
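As one possible flavor of that (a sketch only, and a rotation-only warp at that; the matrix name and layout are assumptions, and handling camera translation would need the background depth as well): keep last frame's background image, re-project it for the current camera orientation each frame, then draw the characters on top as usual.

```hlsl
// Rotation-only reprojection of a previously rendered background.
// CurClipToPrevClip is assumed to be prevProj * deltaRotation * inverse(curProj),
// applied to row vectors; with rotation only, the warp is independent of depth.
Texture2D<float4> PrevBackground : register(t0);
SamplerState      LinearClamp    : register(s0);

cbuffer ReprojectConstants : register(b0)
{
    float4x4 CurClipToPrevClip;
};

float4 ReprojectBackgroundPS(float4 pos : SV_Position,
                             float2 uv  : TEXCOORD0) : SV_Target
{
    // Current pixel as a clip-space position (depth choice is irrelevant for a
    // pure rotation, so use the far plane).
    float4 curClip = float4(uv.x * 2.0f - 1.0f, 1.0f - uv.y * 2.0f, 1.0f, 1.0f);

    float4 prevClip = mul(curClip, CurClipToPrevClip);
    float2 prevUv   = prevClip.xy / prevClip.w;
    prevUv = float2(prevUv.x * 0.5f + 0.5f, 0.5f - prevUv.y * 0.5f);

    return PrevBackground.SampleLevel(LinearClamp, prevUv, 0);
}
```

Disocclusions at the screen edges and around moving objects would still need filling, which is where compositing the characters separately comes in.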
 
2. I kind of skipped over how HDR was implemented until recently. So far I've only looked into tone mapping, so essentially all I really know is that it seems you sometimes need to calculate the average luminance. So I was wondering whether it's possible, given either the current hardware or the current hardware/API combination, to compute the sum portion of the average as you do your light accumulation?
It seems rather inefficient to do it after the fact.
If you do compute shader based tiled lighting, you can use LDS (thread block shared) atomics (atomic add) to accumulate the total light intensity per thread block, and when the thread block is finished the first thread of the block does one global atomic add (to a shared memory location). This is very efficient.
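A minimal sketch of that idea, assuming a tiled lighting compute shader and a fixed-point encoding for the sum (D3D atomics only operate on int/uint); the buffer and scale names are illustrative:

```hlsl
// Accumulate total scene luminance while the tiled lighting shader runs,
// instead of in a separate pass afterwards. The per-group sum lives in LDS;
// only one global atomic is issued per thread group.
RWBuffer<uint> TotalLuminance : register(u1); // single element, cleared to 0 each frame

groupshared uint gTileLuma;

static const float LUMA_FIXED_POINT_SCALE = 1024.0f; // illustrative precision

[numthreads(16, 16, 1)]
void TiledLightingCS(uint3 dtid : SV_DispatchThreadID, uint gi : SV_GroupIndex)
{
    if (gi == 0)
        gTileLuma = 0;
    GroupMemoryBarrierWithGroupSync();

    // ... the usual per-tile light culling and per-pixel light accumulation
    // would go here; 'lit' stands in for the final lit pixel color ...
    float3 lit = float3(0.0f, 0.0f, 0.0f);

    // Add this pixel's luminance to the tile's shared counter.
    float luma = dot(lit, float3(0.2126f, 0.7152f, 0.0722f));
    InterlockedAdd(gTileLuma, (uint)(luma * LUMA_FIXED_POINT_SCALE));

    GroupMemoryBarrierWithGroupSync();

    // One global atomic per group.
    if (gi == 0)
        InterlockedAdd(TotalLuminance[0], gTileLuma);
}
```

A tiny end-of-frame pass (or the tone mapping shader itself) then divides by the pixel count and by the fixed-point scale; if you want the usual log-average instead, accumulate log of the luminance the same way.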
Putting a copied z-buffer back in place is a bit tricky; blitting isn't available. You can use a fullscreen triangle with z write-through (no compare, just write) and pass the old z value from a Texture2D via Load() to an SV_Depth output.
However writing z-values out of a shader disables the GPU depth compression, and that hurts your rendering performance. I don't think there's a good way to do this on PC DirectX without any performance loss.
 
1. Would probably lead to weird perceived jittering. Interesting idea though. This could be tested easily (and the technique could be proved/disproved) by doing world/actor updates at different frequencies but rendering everything at the higher one.
3. This would work better as a dedicated piece of silicon than a general purpose computation IMO.

Actually, I'm not so sure about jittering but rather jumping, though maybe we're talking about the same thing. In fact I'm more concerned about temporal occlusion artifacts. I'm not that good at picturing things anymore, and picturing the dual frustums in various geometric configurations that would lead to artifacts is beyond me at times.

Dedicated silicon is most likely unnecessary; here are some concerns off the top of my head.
1. I'm not sure which would be faster, generating the list as you render or as a post-process; dedicated hardware might speed things up if you do it as you render.
2. You might need multiple sorts and "bins" on the same list, which I'm not sure is possible as a fixed-function implementation, and if you go programmable you might as well use the current programmable hardware to do it, if it maps well.
3. Multipassing is only done for cameras; it would most likely be a waste of space as a dedicated piece of silicon.
 
If you do compute shader based tiled lighting, you can use LDS (thread block shared) atomics (atomic add) to accumulate the total light intensity per thread block, and when the thread block is finished the first thread of the block does one global atomic add (to a shared memory location). This is very efficient.

When I looked into Shader Model 5, I remember reading that atomic adds were limited to int and uint. Is this different for compute shaders? (Looking it up now... thanks for the lead.)

EDIT - Yeah, it seems I missed the part where it says "and shared memory variables". Thanks once again.
 
I'm quite sure it would be more feasible to have some sort of re-projection scheme for background and composite characters on top.

The day before you posted this I saw a news item over on ExtremeTech about the Oculus Rift called time warping. It had a link to a post by Carmack on AltDevBlogADay that explained some latency reduction techniques; it explained that time warping uses reprojection to that end (I haven't read the whole thing yet). I'm going to think about it, thank you for your input.

One thing I was considering: if technique 3 were possible, I would try to combine it with 1, creating one list of triangles for static geometry and one for dynamic geometry.
 
However writing z-values out of a shader disables the GPU depth compression, and that hurts your rendering performance. I don't think there's a good way to do this on PC DirectX without any performance loss.

That's unfortunate, yes. But because z-compression stores at most 3 plane equations + a map, and those aren't there when a manual z-value is pushed, it's just the only way it could be. Waiting for a block to be filled, whenever that could be inferred, and then figuring out if there are exactly three different derivatives in it is quite impractical. Just in case someone wonders why. :p

Actually, I'm not sure anymore if the question above is about frequency of parts of the simulation or frequency of parts of the graphics. I could imagine that games working with parallax can indeed function with stitched together "backups of z-planes".
 
But because z-compression stores at most 3 plane equations + a map, and those aren't there when a manual z-value is pushed, it's just the only way it could be.
You are talking about NVIDIA, right? AMD hardware is different. Are there actually any public documents about the depth compression hardware of current NVIDIA and AMD PC hardware?

It seems that Mantle allows low level access to GCN HTILE buffer (slide 31):
http://www.slideshare.net/DevCentra...4-with-mantle-by-johan-andersson-amd-at-gdc14

Filling HTILE buffer using a compute shader is an efficient way to handle this problem. Too bad there's no cross platform API that allows anything like this.
 
You are talking about NVIDIA, right? AMD hardware is different. Are there actually any public documents about the depth compression hardware of current NVIDIA and AMD PC hardware?

It seems that Mantle allows low level access to GCN HTILE buffer (slide 31):
http://www.slideshare.net/DevCentra...4-with-mantle-by-johan-andersson-amd-at-gdc14

Filling HTILE buffer using a compute shader is an efficient way to handle this problem. Too bad there's no cross platform API that allows anything like this.

I think only a few people know the exact algorithms/encodings that have been put into the chips. The only interesting bits I've found are in "Efficient Depth Buffer Compression". Storing Z in plane form seems generally reasonable though.
There is very little information about HTILE in the Northern Islands documentation. At least we know that Cayman has 8x8 tiles, and how big the on-chip tile buffer is.
 
There is very little information about HTILE in the Northern Islands documentation. At least we know that Cayman has 8x8 tiles, and how big the on-chip tile buffer is.
If the info in the Battlefield slides is correct, Mantle allows access to HTILE, and thus the Mantle documentation should contain the exact details of the HTILE data structure. However, it seems that the Mantle SDK is still not publicly available.
 
I've had some thoughts and ideas regarding rendering over the years, and I was hoping the developers and armchair developers here would chime in on some of them.

1. Around 2003 on gamedev.net I made a post about z-compositing that I'd like to rehash here and now.
The basic idea was to render the static geometry of the scene at one frame rate and the dynamic geometry at another, and then composite the two at the higher frame rate. For a first-person camera, static geometry at 60 fps and dynamic at 30; for a third-person camera, 30 for the static and 60 for the dynamic.
Sort of an improvement on something like Resident Evil?
 
Sort of an improvement on something like Resident Evil?
The 1st person example (30fps actors/60fps backgrounds) would be fairly pointless, as backgrounds in a game inevitably end up consuming far more rendering resources than actors in most cases (unless you hugely unbalance the ratio of resources spent on the two, leading to some very strange, out-of-place-looking visuals...)

There just wouldn't be much gained by rendering only a small sub-set of the screen at a lower framerate.
 
First I'd like to thank Ethatron for his responses; thank you for your time and input.

Sort of an improvement on something like Resident Evil?

I guess you could say that, and although I did play/watch someone play the original on a friend's PS1, it wasn't in my mind at the time. What happened was I had a meager machine at the time (a P3 800, Win98SE, 128MB or 192MB of RAM, and either a 32MB Radeon 7200 (DX7) or, after a recent upgrade, a 128MB Radeon 9550 to play with DX9 shaders), and the way I saw it, if I could make my pet project run well on my machine it would run great on something better. So there were all the usual suspects at the time, but I wasn't happy with that and I wondered if there was something more I could do. At the time I could only think of three things on my own:

1. Render front to back; nixed because it broke batching, which was the common wisdom I had learned at the time. (Although I still don't know if the batching was primarily to reduce draw calls, or to avoid hardware inefficiency due to state changes, or I suppose both.)
2. Number 3 from above: if I couldn't render front to back, try to defer lighting and texturing some other way. I couldn't figure out how to do it at the time; I don't remember my exact reasoning as to why, and I can't recreate my thought pattern from back then.
3. Number 1 from above: IIRC my reasoning was that at the time static geometry had fewer triangles than dynamic geometry, so I'd reduce my geometry load for every two frames in the first-person case. In addition, I think I had just started learning about projective texturing/shadow maps and figured that if I were to implement it (I was a fan of stencil shadows at the time) I could render my shadow map every other frame (if there were no artifacts), the one with the lesser geometry load. In all honesty I'm not sure whether I thought that last bit up at the time or whether my thoughts on the subject are confused.

Anyway, I guess I've always had an interest in less orthodox solutions to problems. Wow, did this post turn out way longer than I thought. Not gonna waste all this typing though... posting the unnecessarily long post with unnecessary information anyway.
 
I remember a game doing exactly that in the DOS era. It was the sequel to an RPG/adventure/action game. The first title had 3D characters on a static environment à la RE, but the sequel started rendering those environments at runtime, only updating them every so often. Does anybody remember its name?
It was about saving your planet from some alien thing sort of thing, and you had a baby to feed in the beginning, haha. That's all I remember...
 