The End of The GPU Roadmap

ALU:TEX ratios continue to rise. At the same time, we're starting to see a clear need to access memory in a more generic way, not just by reading textures. The post you're linking to suggests that GT200 dedicates only 13% of its die space to texture samplers. So it's not inconceivable that in the future they'll be replaced by generic gather units, with the filtering done in the shader cores.

Today's GPUs perform all calculations in 32-bit floating-point, yet most of the time in graphics we're still working with colors that have 8-bit integer components. Is that a waste? Apparently not. So why would it be a big deal to generalize the texture samplers as well? They pretty much have to be capable of filtering floating-point textures at full precision anyway.
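Trivial sketch of that mismatch (my own helper names): all a FP32 shader core does with 8-bit color data is convert it to float, compute at full precision, and convert back on output.

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Sketch (hypothetical helpers): 8-bit color in, FP32 math, 8-bit color out.
float unorm8_to_float(uint8_t c) { return c * (1.0f / 255.0f); }

uint8_t float_to_unorm8(float f) {
    f = std::min(std::max(f, 0.0f), 1.0f); // clamp to [0, 1]
    return (uint8_t)(f * 255.0f + 0.5f);   // round to nearest
}

int main() {
    // Modulate an 8-bit texel by a floating-point light intensity, the
    // kind of mixed-precision work today's shader cores already do.
    uint8_t texel = 200;
    float lit = unorm8_to_float(texel) * 0.75f;
    printf("%u -> %u\n", texel, float_to_unorm8(lit));
}
```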
Have you given thought to what's required to do a single texture lookup? Also, what about the states that affect texturing such as filter mode, addressing mode, texture format, etc.? Apps already create a large number of shaders (sometimes many aren't even used), but now you'd add a bunch more permutations if you need to recompile shaders based on what texture sampler states and/or textures are enabled.

Of course it's not impossible, but it's not trivial either.
 
I see three major (but not insurmountable) issues with generalized texture units (in order of importance):

1) Power: dedicated 8-bit filtering units vs. full-fledged floating-point units.
2) Efficient texture addressing (assuming addressing is done on the shader cores: some of it is simple, some is plain nasty).
3) Potential performance impact from an excessive number of shader compiler invocations.
 
I see three major (but not insurmountable) issues with generalized texture units (in order of importance):

1) Power: dedicated 8-bit filtering units vs. full-fledged floating-point units.
2) Efficient texture addressing (assuming addressing is done on the shader cores: some of it is simple, some is plain nasty).
3) Potential performance impact from an excessive number of shader compiler invocations.

Also, texturing tends to be read-only... and you don't necessarily want to waste a fully coherent memory system by putting textures through it. Virtualizing it isn't a big deal; it just costs power and area... but adds value. I'm not sure there is a lot of value in full R/W access to textures (my understanding is that the TMUs in LRB are coherent, but likely read-only; it'll be interesting to see, as this might shed some light).

David
 
SSE registers can store single-precision floating-point data.
SSE is not x86.

Go try and run some SSE code on your 386 if you don't believe me.
Or, for that matter, try running some 16-bit x86 code on 64-bit Windows.
Installed 64-bit Windows on a Pentium II lately?

What do you mean "Doesn't work"?!

But, but... x86 is totally standardised, both forwards and backwards compatible!


MMX, x86-64, SSE, SSE2, SSE3, SSSE3, SSE4, SSE5, AVX.
Rename them 80386 -> 801286 and it starts looking more than a bit like the evolution of the DirectX API.
 
Have you given thought to what's required to do a single texture lookup?
As the lead developer of SwiftShader, it's what keeps me up all night, so yeah, I've given it thought. :)

Even though texture sampling is one of the least efficient things to emulate on a CPU, disabling it entirely revealed that no modern game spends more than 25% of its execution time on it. The TEX:ALU ratio keeps going down, and gather support would make it a whole lot more efficient. That really puts the 13% of GT200 die space spent on texture units into perspective.
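For those wondering what that emulation looks like, here's a stripped-down sketch of a single bilinear lookup (hypothetical names, power-of-two RGBA8 texture, wrap addressing, no mipmapping or sRGB; nothing taken from SwiftShader itself):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

// One bilinear lookup, reduced to its essential steps.
struct Texture {
    const uint32_t* texels; // RGBA8, width * height entries
    int width, height;      // both powers of two
};

static float channel(uint32_t t, int c) { return ((t >> (c * 8)) & 0xFF) / 255.0f; }

void sample_bilinear(const Texture& tex, float u, float v, float rgba[4]) {
    // 1. Scale normalized coordinates to texel space, centered on texels.
    float x = u * tex.width  - 0.5f;
    float y = v * tex.height - 0.5f;
    // 2. Split into integer texel indices and fractional blend weights.
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    float fx = x - x0, fy = y - y0;
    // 3. Wrap addressing: for power-of-two sizes it's just a mask.
    int mx = tex.width - 1, my = tex.height - 1;
    int x1 = (x0 + 1) & mx, y1 = (y0 + 1) & my;
    x0 &= mx; y0 &= my;
    // 4. Fetch four texels and blend them with the fractional weights.
    uint32_t t00 = tex.texels[y0 * tex.width + x0], t10 = tex.texels[y0 * tex.width + x1];
    uint32_t t01 = tex.texels[y1 * tex.width + x0], t11 = tex.texels[y1 * tex.width + x1];
    for (int c = 0; c < 4; ++c) {
        float top = channel(t00, c) + fx * (channel(t10, c) - channel(t00, c));
        float bot = channel(t01, c) + fx * (channel(t11, c) - channel(t01, c));
        rgba[c] = top + fy * (bot - top);
    }
}

int main() {
    uint32_t data[4] = {0xFF000000, 0xFF0000FF, 0xFF00FF00, 0xFFFF0000};
    Texture tex{data, 2, 2};
    float rgba[4];
    sample_bilinear(tex, 0.5f, 0.5f, rgba); // blends all four texels equally
    printf("%.2f %.2f %.2f %.2f\n", rgba[0], rgba[1], rgba[2], rgba[3]);
}
```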
Also, what about the states that affect texturing such as filter mode, addressing mode, texture format, etc.? Apps already create a large number of shaders (sometimes many aren't even used), but now you'd add a bunch more permutations if you need to recompile shaders based on what texture sampler states and/or textures are enabled.
This is why I mentioned before that GPU cores have to become capable of compiling their own code. Larrabee appears to do just that. And I believe the inventor of 'code stitching' knows how to handle the issue just fine.
 
1) Power: dedicated 8-bit filtering units vs. full-fledged floating-point units.
That very same issue hasn't stopped the evolution from Shader Model 1.x to Shader Model 2.0.

Sure, you could keep dedicated low-precision units. But it reminds me of NV40 versus R300: because of the additional wiring and control, and the split utilization, it's actually better to just have full-fledged units.

Anyway, we probably haven't reached the tipping point yet. But I'm pretty confident that ten years from now dedicated texture samplers will be something from the stone age.
2) Efficient texture addressing (assuming addressing is done on the shader cores: some of it is simple, some is plain nasty).
You're probably thinking of cube mapping? Cube maps really aren't used that often, and the great thing about software rendering is that you only pay for what you use; it's the average case that determines overall performance. For plain 2D textures the addressing is quite simple. Some of it even comes for free if you handle it in the floating-point to integer conversion.
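To make the 'for free' part concrete, here's a sketch (my own naming, power-of-two texture, truncation standing in for floor): the conversion to fixed point, the texel index extraction, and the wrap all collapse into a conversion, a shift, and a mask.

```cpp
#include <cstdint>
#include <cstdio>

// Wrap addressing 'for free': convert the normalized coordinate to 16.16
// fixed point, then the shift extracts the texel index and the mask does
// the wrap, all in the conversion path you needed anyway. Power-of-two
// texture assumed; the cast truncates where real hardware would floor.
int wrap_coord(float u, int size_log2) {
    int32_t fixed = (int32_t)(u * 65536.0f); // 16.16 fixed point
    return (fixed >> (16 - size_log2)) & ((1 << size_log2) - 1);
}

int main() {
    // u = 1.25 on a 256-texel-wide texture wraps to texel 64 (0.25 * 256).
    printf("%d\n", wrap_coord(1.25f, 8));
}
```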
3) Potential impact on performance due to excessive number of shader compiler invocations
Code stitching is cheap, especially when each core can compile its own code. Also, because you're starting from optimized code sections, any further optimization can be focused on just the remaining opportunities. Think of it as link-time inlining. :)
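As a very loose analogy for what stitching does (function pointers standing in for pasted machine code, all names mine):

```cpp
#include <cstdio>
#include <vector>

// The pipeline is assembled once per state change from pre-built fragments,
// instead of recompiling a monolithic shader for every state permutation.
// Real stitching pastes optimized machine code and inlines across fragment
// boundaries; function pointers here only illustrate the selection step.
struct Context { float color; };
using Stage = void (*)(Context&);

void fetch_point(Context& c)    { c.color = 0.5f;  } // point-sampled fetch
void fetch_bilinear(Context& c) { c.color = 0.6f;  } // filtered fetch
void shade_lit(Context& c)      { c.color *= 0.8f; } // lighting math

std::vector<Stage> stitch(bool bilinear) {
    return { bilinear ? &fetch_bilinear : &fetch_point, &shade_lit };
}

int main() {
    Context ctx{};
    for (Stage s : stitch(true)) s(ctx); // run the stitched pipeline
    printf("%f\n", ctx.color);
}
```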
 
SSE is not x86.
SSE is x86. x86 is not SSE.

Anyway, you missed the point. It has nothing to do with forward or backward compatibility. 3dilettante suggested that x86 CPUs haven't evolved toward full-precision floating-point support yet. It's about supporting the generic case, and SSE fully supports that.
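For the record, a minimal sketch of SSE doing exactly that: full single-precision floating-point math on four values at once, compilable with any SSE-capable x86 toolchain.

```cpp
#include <xmmintrin.h> // SSE intrinsics
#include <cstdio>

int main() {
    // Four single-precision multiply-adds in one go: r = a * b + c.
    __m128 a = _mm_set_ps(4.0f, 3.0f, 2.0f, 1.0f);
    __m128 b = _mm_set_ps(0.5f, 0.5f, 0.5f, 0.5f);
    __m128 c = _mm_set_ps(1.0f, 1.0f, 1.0f, 1.0f);
    __m128 r = _mm_add_ps(_mm_mul_ps(a, b), c);

    float out[4];
    _mm_storeu_ps(out, r);
    printf("%g %g %g %g\n", out[0], out[1], out[2], out[3]); // 1.5 2 2.5 3
}
```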
 
SSE is x86. x86 is not SSE.

Anyway, you missed the point. It has nothing to do with forward or backward compatibility. 3dilettante suggested that x86 CPUs haven't evolved toward full precision floating-point support yet. It's about supporting the generic case. And SSE fully supports that.

I suggested that half of the x86 register set doesn't.
 
I suggested that half of the x86 register set doesn't.
And in what way does that make x86 any less capable of generic computing? When I said "Shader Model 2.0 made all registers floating-point" the focus was really on making the programmable ALUs capable of floating-point operations, not on having 'all' registers capable of storing floating-point data. If you really want to nitpick, x86's general-purpose registers can store data representing floating-point numbers too. Oh by the way, p0 is not a floating-point register, so by your logic Shader Model 2.x would be a step backward...
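For instance (a trivial sketch, nothing x86-specific in the source): a float's bit pattern can pass through an integer variable, and hence a general-purpose register, untouched.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

int main() {
    // A float's bits round-trip through an integer path without any
    // floating-point unit being involved in the move.
    float f = 3.14159f;
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits); // bit-cast, no conversion
    float g;
    std::memcpy(&g, &bits, sizeof g);
    printf("0x%08X -> %f\n", bits, g);
}
```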
 
And in what way does that make x86 any less capable of generic computing? When I said "Shader Model 2.0 made all registers floating-point" the focus was really on making the programmable ALUs capable of floating-point operations, not on having 'all' registers capable of storing floating-point data.
I see, I was just going by what you said, which related to something almost completely orthogonal to what you meant.

If you really want to nitpick, x86's general-purpose registers can store data representing floating-point numbers too. Oh by the way, p0 is not a floating-point register, so by your logic Shader Model 2.x would be a step backward...
My logic would be that I was responding to what you said, which as written didn't further your argument.
 
http://www.bit-tech.net/hardware/graphics/2009/08/20/does-nvidia-have-a-future/1
They made a cool colorful chart that mostly matches my half-wit list from the other day.
[Image: convergence.jpg, bit-tech's CPU/GPU convergence chart]

NV's future is perhaps questionable, I suppose. Who knows how it'll go.
 
This is probably an insane question, so forgive me, but how feasible would it be to geometrise the texture completely? E.g. one pixel of one color in the texture would become two triangles of one color. Of course you could solve this in the art creation pipeline for new games, but for backwards compatibility you could automate this to work with existing bitmaps. I know it's probably still a ways off, what with a 2-megabyte texture taking 4 million polygons in the most extreme case of every pixel being different (though I can imagine there's room for optimisation), but as we're getting ready to bring tessellation into the picture, I'm thinking this might not be so far off either.
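Rough napkin math for the worst case (assumed vertex layout, purely for scale):

```cpp
#include <cstdio>

int main() {
    // 'One texel -> two triangles' on a 1024x1024 texture. Assumed layout:
    // 3 floats position + 3 floats normal + 4 bytes color per vertex.
    const long w = 1024, h = 1024;
    const long texels    = w * h;              // ~1M colored quads
    const long triangles = texels * 2;         // two triangles per texel
    const long vertices  = (w + 1) * (h + 1);  // shared grid corners
    const long bytes_per_vertex = 3 * 4 + 3 * 4 + 4;
    printf("triangles: %ld\n", triangles);
    printf("texture: %ld KB vs mesh: %ld KB\n",
           texels * 4 / 1024, vertices * bytes_per_vertex / 1024);
}
```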
 
This is probably an insane question, so forgive me, but how feasible would it be to geometrise the texture completely? E.g. one pixel of one color in the texture would become two triangles of one color. Of course you could solve this in the art creation pipeline for new games, but for backwards compatibility you could automate this to work with existing bitmaps. I know it's probably still a ways off, what with a 2-megabyte texture taking 4 million polygons in the most extreme case of every pixel being different (though I can imagine there's room for optimisation), but as we're getting ready to bring tessellation into the picture, I'm thinking this might not be so far off either.

Well, one thing that would definitely work rather well is tessellation down to the (near?) micropolygon level (see REYES: http://graphics.stanford.edu/papers/mprast/rast_hpg09.pdf) and having the vertex shader do what we normally have pixel shaders do: lighting calculations, texture lookups, and so on. It's definitely not that far off; I wouldn't be surprised if that's how most traditional rendering is done by the time the next console generation rolls around.
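Rough sketch of that dice-then-shade-at-vertices idea (placeholder names and a stand-in shading function, nothing taken from the paper):

```cpp
#include <cstdio>
#include <vector>

// Dice a parametric patch into a grid of micropolygon vertices and run the
// usual per-pixel work there instead of in a pixel shader.
struct Vertex { float u, v, shade; };

std::vector<Vertex> dice_and_shade(int n) {
    std::vector<Vertex> grid;
    grid.reserve((n + 1) * (n + 1));
    for (int j = 0; j <= n; ++j)
        for (int i = 0; i <= n; ++i) {
            float u = float(i) / n, v = float(j) / n;
            // Stand-in for lighting + texture lookups at the vertex.
            grid.push_back({u, v, 0.2f + 0.8f * u * v});
        }
    return grid; // each grid cell is a micropolygon, already shaded
}

int main() {
    auto g = dice_and_shade(256); // dice until cells approach pixel size
    printf("%zu shaded vertices\n", g.size());
}
```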
 
Actually, with micropolygons you still need texture filtering.

The problem with geometry is that it's comparatively hard to pre-filter... and without pre-filtering you need a lot of (semi-)stochastic samples to cure aliasing. As long as geometry is relatively coarse compared to texture detail, it doesn't make sense to go there.
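The asymmetry in one sketch (single channel, power-of-two sizes, my own helper): each mip level is just the 2x2 box average of the level above, and geometry has no equally simple 'halve the detail' operation.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Pre-filtering a texture: one mip level is a 2x2 box average of the
// previous one.
std::vector<uint8_t> downsample(const std::vector<uint8_t>& src, int w, int h) {
    std::vector<uint8_t> dst((w / 2) * (h / 2));
    for (int y = 0; y < h / 2; ++y)
        for (int x = 0; x < w / 2; ++x) {
            int sum = src[2 * y * w + 2 * x]       + src[2 * y * w + 2 * x + 1]
                    + src[(2 * y + 1) * w + 2 * x] + src[(2 * y + 1) * w + 2 * x + 1];
            dst[y * (w / 2) + x] = uint8_t((sum + 2) / 4); // rounded average
        }
    return dst;
}

int main() {
    std::vector<uint8_t> level0(8 * 8, 200); // flat 8x8 texture
    auto level1 = downsample(level0, 8, 8);  // its 4x4 mip level
    printf("%zu -> %zu texels\n", level0.size(), level1.size());
}
```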
 
Well, one thing that would definitely work rather well is tessellation down to the (near?) micropolygon level (see REYES: http://graphics.stanford.edu/papers/mprast/rast_hpg09.pdf) and having the vertex shader do what we normally have pixel shaders do: lighting calculations, texture lookups, and so on. It's definitely not that far off; I wouldn't be surprised if that's how most traditional rendering is done by the time the next console generation rolls around.
Unfortunately, shading vertices makes it much more difficult to perform any sort of early rejection.
 
Unfortunately, shading vertices makes it much more difficult to perform any sort of early rejection.

Hmm, yes, I neglected to consider that. Despite that, though, I can still easily envision such a future one way or another.
 
This is probably an insane question, so forgive me, but how feasible would it be to geometrise the texture completely? E.g. one pixel of one color in the texture would become two triangles of one color.

Doing it backwards is more efficient. Use 2D image information to store 3D data, that is.

For geometry you need to store XYZ coordinates in float, a vertex normal in float, and usually some extra data as well, like skinning weights, various kinds of weight maps, etc. You don't really care about the extra floats as long as you don't pass a few hundred thousand vertices, but at millions it starts to make a difference...

For a texel you usually store 8-bit RGB data for each color layer and maybe a single 32-bit float for displacement. Furthermore, you can use MIP mapping to keep the required texture memory low. It's the LOD part that's not really effective with huge meshes, at least not yet.
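Rough numbers for that gap (assumed layouts, purely for scale):

```cpp
#include <cstdio>

// A full-precision vertex versus an 8-bit RGB texel plus its share of the
// mip chain, for four million elements.
struct Vertex {
    float pos[3];     // XYZ position
    float normal[3];  // vertex normal
    float weights[4]; // e.g., skinning weights
};                    // 40 bytes total

int main() {
    const long n = 4 * 1000 * 1000;
    long vertex_bytes = n * (long)sizeof(Vertex);
    long texel_bytes  = n * 3;      // 8-bit RGB
    texel_bytes += texel_bytes / 3; // mip chain adds about a third
    printf("vertices: %ld MB, texels: %ld MB\n",
           vertex_bytes >> 20, texel_bytes >> 20);
}
```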

See, with tessellation you only manipulate a low-res object and create the larger dataset of the more detailed model through interpolation (and displacement). Those extra vertices are free in terms of memory and transformations for the most part; they only exist through the rendering stage and can be discarded once the given pixels are completed. This is particularly effective when using bucket-based rendering like in PRMan.

If, however, you also store the color/spec information in the mesh, then you can't really do that. Skinning and dynamics simulations in particular would be a nightmare with millions of vertices.

The art pipeline is even more problematic, as it's really hard to manipulate that much data in real time. We have tools to support many 2K texture maps per model for each of the color/spec/bump layers, but as seen in the ZBrush implementation of what you've described, called Polypainting, current systems only support about 4-6 million 'texels' in a single channel.
It's also a problem to keep geometry distribution even throughout the entire model, to get the same density that we'd get with UV mapping and texels.
 