The corruption of realtime graphics principles and how things should be *spawn

Did you know that quaternions were found before vectors?
Yes :)
needs (...) Z-buffering, image-order
PowerVR GPUs don't need these.
Now, one can argue that some textures are prone to aliasing (say a normal map) because you can't filter them easily. But hopefully your accesses aren't too chaotic, or else you can get aliasing even if you implement your own custom filtering, and performance is likely to suffer as well.
Virtual texturing allows you to do efficient texture space lighting (and processing). You will sample linear-space RGB values, so filtering is possible.
Whatever the fixed storage format, it can be defeated by image-based reads. It won't be as fast as traversing headlong (and going to slow RAM only once per n > 1 accesses) & splatting a (say) Morton quadtree. In this respect the MIP pyramid is, again, maliciously ill-formatted: the levels aren't in Morton order (or any such Peano-style sequence). It is just unbelievable.
If you do texture space lighting (with virtual texturing) you will access all your material textures in a linear order (processing every single bit of every cache line). Only the resulting value (which is a small fraction of all your material textures in both bandwidth and storage) needs to be sampled. As your texture sampling pass will be a full screen compute shader pass, and you will use Hilbert curve ordering (best locality) for your compute threads, the memory locality will actually be very good. I suggest that you implement a compute shader that colors the cache lines in screen space. This way you will notice that the locality is actually very good, and a very high percentage of each loaded cache line is accessed. The GPU can hide the latency of the misses quite well (90%+ execution unit utilization is achievable with proper tools and optimization effort).
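As a rough sketch of what that thread reordering can look like (this is just the textbook Hilbert curve index-to-coordinate conversion in C++, illustrative only and not the actual shader; in practice you would evaluate it, or a precomputed table of it, per compute thread):

Code:
#include <cstdint>

// Rotate/flip a quadrant so the curve stays continuous.
static void hilbertRot(uint32_t s, uint32_t& x, uint32_t& y, uint32_t rx, uint32_t ry) {
    if (ry == 0) {
        if (rx == 1) { x = s - 1 - x; y = s - 1 - y; }
        uint32_t t = x; x = y; y = t;   // swap x and y
    }
}

// Map a linear index d in [0, n*n) to (x, y) on an n-by-n Hilbert curve
// (n must be a power of two). Consecutive d values land on neighboring
// pixels, which is what makes the access pattern screen-local.
void hilbertD2XY(uint32_t n, uint32_t d, uint32_t& x, uint32_t& y) {
    x = y = 0;
    for (uint32_t s = 1, t = d; s < n; s *= 2, t /= 4) {
        uint32_t rx = 1 & (t / 2);
        uint32_t ry = 1 & (t ^ rx);
        hilbertRot(s, x, y, rx, ry);
        x += s * rx;
        y += s * ry;
    }
}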
What an abomination those floats: they each carry their exponent, what a waste of space (and of time).
Storing useless data is useless. If your data is in any structure that makes it easy to determine the high bits (or the exponent), then there is no reason to store them. However, there is no self-balancing structure of this kind that supports O(1) modification of values. So it is always a trade-off (faster static world vs. faster dynamic world).
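A minimal sketch of that idea (entirely illustrative, the struct and field names are made up): store only the low bits per value, and let the containing node supply the scale and offset, i.e. the "exponent" and high bits.

Code:
#include <cstdint>
#include <cmath>

// A node of some spatial structure; its placement implies the high bits
// and its size implies the exponent of everything stored inside it.
struct Node {
    float origin;   // node position along one axis
    float size;     // node extent along that axis
};

// Quantize a world-space coordinate to a 16-bit offset inside the node.
uint16_t pack(const Node& n, float world) {
    float t = (world - n.origin) / n.size;            // 0..1 inside the node
    return (uint16_t)std::lround(t * 65535.0f);
}

// Reconstruct: the high bits come from the node, not from storage.
float unpack(const Node& n, uint16_t q) {
    return n.origin + (q / 65535.0f) * n.size;
}

Moving a value outside its node's range forces a restructure, which is exactly the static-vs-dynamic trade-off mentioned above.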
C++ is so falsely abstract it's not even funny. Your mind should be dealing with abstract beings, not your hands: abstract things aren't in space-time, their size here is 0. With C++ one pretends to be abstract while writing endless (not of size 0 in space-time) empty (of size 0 there) words. C is ad hoc, concrete. A C user thinks much and writes little.
C++ is almost as concrete as C. Try programming some Haskell or any other modern functional language and you will notice a real difference.
Donald Meagher was sabotaged; it is obvious that his renderers should have been preferred.
Many graphics programmers would like to have completely different hardware to suit their own needs (I do at least). The best can adapt to the hardware that is available. Modern GPUs are quite programmable indeed. For example, in our renderer we use the rasterization pipeline only to fill the depth buffer + another buffer (we don't even sample any textures in the rasterization pipeline). Most of the processing (including all texturing) is done by the compute pipeline, by our own algorithms.
No matter the format, accessing it by way of the pixels' detour won't be neighborly in general (observing a wall at a sufficient angle will result in the next pixel's texture sample being rather far from the current one).
If you want, you can do texture space lighting (= material sampling) on any modern GPU. Virtual texturing helps. But then again, sparse quadtrees (containing wrapped surface material and displacement data) are quite similar to sparse octrees (containing volume data).
 
Even on the current machines, a single viewport-filling rectangle at the present-day resolutions would slow down CPU rendering to a crawl. How ridiculous.
This is not true. Try using AVX2 on Haswell. It has a full set of 256-bit wide integer operations. If you are using all 8 cores, it would not even take a single millisecond to fill a 4K back buffer.
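For reference, a minimal sketch of such a fill (illustrative only; a real renderer would split the buffer across the threads and pick the color per frame). It assumes dst is 32-byte aligned:

Code:
#include <immintrin.h>
#include <cstdint>
#include <cstddef>

// Clear one thread's slice of a 32-bit-per-pixel back buffer with AVX2
// streaming stores (bypassing the caches, since the data won't be re-read).
void clearSlice(uint32_t* dst, size_t pixelCount, uint32_t color) {
    __m256i c = _mm256_set1_epi32((int)color);        // 8 pixels per register
    size_t i = 0;
    for (; i + 8 <= pixelCount; i += 8)
        _mm256_stream_si256((__m256i*)(dst + i), c);
    for (; i < pixelCount; ++i)                       // tail pixels
        dst[i] = color;
    _mm_sfence();                                     // order the streaming stores
}

A 3840x2160 RGBA8 buffer is roughly 32 MB, so a fill like this is purely memory bandwidth bound.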
 
Are octonions ever useful in computer graphics? (I'm not an expert but I'm guessing not; you never know though.)

Btw, interesting fact: the only division algebras (over the real numbers) are the reals, the complex numbers, the quaternions, and the octonions. This is closely related to the fact that a circle can have a non-vanishing tangent vector field without a "cow-lick" (just the obvious "Sonic the Hedgehog" vector field), whereas any continuous tangent vector field on an ordinary sphere must have at least one "cow-lick". There are analogous facts about higher-dimensional spheres, and only the ones that admit a full set of non-vanishing tangent vector fields without "cow-licks" (S^1, S^3 and S^7) correspond to division algebras.
 
Exactly, HW uses non-float representations internally. You need to perform (...) to get a high-resolution image with high-quality lighting.
 
For example, in our renderer we use the rasterization pipeline only to fill the depth buffer + another buffer (we don't even sample any textures in the rasterization pipeline). Most of the processing (including all texturing) is done by the compute pipeline, by our own algorithms.

That's news to me! Was that the case for Fusion? If so, for 360 as well? Never heard of any big game doing something like this.
 
Most of the processing (including all texturing) is done by the compute pipeline, by our own algorithms.
That's news to me! Was that the case for Fusion? If so, for 360 as well? Never heard of any big game doing something like this.

You interpreted this to mean that texturing is done with loads + compute instead of using samplers, right? That's not how I interpreted it; samplers are still available to compute tasks. Clarification, sebbbi?
 
You interpreted this to mean that texturing is done with loads + compute instead of using samplers, right? That's not how I interpreted it; samplers are still available to compute tasks. Clarification, sebbbi?
We use samplers, obviously. Doing anisotropic filtering without samplers is just not viable.

What I wanted to point out is that performing sampling only for visible pixels has its advantages, and that doing all sampling in post processing can also improve the memory access pattern: texture cache line access patterns are quite local in screen space, so ensuring that your processing order is roughly screen-local (Morton order, Hilbert curve, etc.) will give you good cache hit rates.

I also wanted to point out that you don't need to sample (filter) your material textures at all if you do lighting in texture space (in your virtual texture cache). Loads (unfiltered reads) are used in this case and the access patterns are perfectly linear (you process small cache-line-sized tiles and always read and write the lines fully). This also produces a result with much reduced specular aliasing and other issues, as filtering normal maps (and other non-linear data) is just wrong. The lit result obviously needs to be sampled (preferably with anisotropic filtering) from the virtual texture cache. But this is just a single texture read, and as said earlier this is a cache-friendly operation (just like reading the material textures directly in post processing).
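A rough CPU-side sketch of that access pattern (illustrative only; the real thing would be a compute shader, and lightTexel here is a made-up stand-in for the actual material and lighting math):

Code:
#include <cstdint>
#include <cstddef>

// Made-up per-texel shading: modulate albedo by a fake N.z term. A real
// version would decode the normal and evaluate the BRDF in linear space.
static uint32_t lightTexel(uint32_t albedo, uint32_t normal) {
    uint32_t nz = (normal >> 16) & 0xFF;
    uint32_t r = (( albedo        & 0xFF) * nz) / 255;
    uint32_t g = (((albedo >> 8)  & 0xFF) * nz) / 255;
    uint32_t b = (((albedo >> 16) & 0xFF) * nz) / 255;
    return r | (g << 8) | (b << 16) | 0xFF000000u;
}

// Light one virtual texture page. Every texel is visited exactly once in
// storage order, so every loaded cache line is read (and written) in full;
// no filtered samples are needed at this stage.
void lightPage(const uint32_t* albedo, const uint32_t* normal,
               uint32_t* out, size_t texelCount) {
    for (size_t i = 0; i < texelCount; ++i)
        out[i] = lightTexel(albedo[i], normal[i]);
}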
 
The OP claimed that mip mapping is not good for access pattern locality. This is partially true, since the data is stored in Morton order only inside mip slices, while each slice is far away from the next. This results in a very long distance between two texels with the same UV but different mip levels (N and N+1), and thus in a guaranteed cache miss. However, since GPUs have proper (pseudo-LRU) L1 caches, this is not a problem in practice: four cache pages of the more detailed mip level share one page of less detailed data. The N+1 data thus only causes 25% of the memory requests and misses compared to level N data (and so on up the chain).

This is better than the situation where you map all the texels of all the mip levels onto a single locality-maximising space-filling curve (assuming a single-mip-level distance equals a single-texel xy distance). In this scenario sampling mip level N touches cache lines with some small amount of N+1 data (not that bad; in the worst case this equals the above case, but without needing the caches) and a LOT of N-1 data (seriously bad, since there is 4x more level N-1 data, meaning lots of unneeded data loaded into the caches and discarded). So I would argue that the current system (Morton order inside mip level surfaces only) is highly preferable on modern GPUs that have robust caches and good latency hiding mechanics.
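To make the layout under discussion concrete, here is an illustrative addressing scheme (not any particular GPU's actual swizzle): Morton order inside each mip slice, with the slices stored back to back.

Code:
#include <cstdint>

// Interleave the bits of x and y (Z-order / Morton code).
uint32_t morton2D(uint32_t x, uint32_t y) {
    uint32_t r = 0;
    for (uint32_t b = 0; b < 16; ++b)
        r |= (((x >> b) & 1u) << (2 * b)) | (((y >> b) & 1u) << (2 * b + 1));
    return r;
}

// Index of texel (x, y) of mip 'level' within the whole chain of a square
// power-of-two texture. Note how (x, y) at level N and (x/2, y/2) at level
// N+1 land in different slices, i.e. far apart in memory.
uint32_t texelIndex(uint32_t x, uint32_t y, uint32_t level, uint32_t baseSize) {
    uint32_t offset = 0;
    for (uint32_t l = 0, s = baseSize; l < level; ++l, s /= 2)
        offset += s * s;                  // skip the more detailed slices
    return offset + morton2D(x, y);
}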

3D volume texture slices are obviously a completely different scenario and should be stored in Morton order (or in 3D Hilbert curve order) to maximize the locality of accesses. DirectX 11.3 actually includes support for this kind of sparse 3D texture mapping (pages are cubes, not slices). This is obviously very good for sparse voxel octrees (and other sparse volumes).
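For completeness, the 3D equivalent (again illustrative; a 3D Hilbert curve gives slightly better locality at the cost of more code):

Code:
#include <cstdint>

// Interleave the bits of x, y and z; neighboring voxels get nearby indices.
uint32_t morton3D(uint32_t x, uint32_t y, uint32_t z) {
    uint32_t r = 0;
    for (uint32_t b = 0; b < 10; ++b) {   // enough for a 1024^3 volume
        r |= ((x >> b) & 1u) << (3 * b);
        r |= ((y >> b) & 1u) << (3 * b + 1);
        r |= ((z >> b) & 1u) << (3 * b + 2);
    }
    return r;
}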
 