Panajev2001a
Veteran
ERP said: See, I understand what you are saying, but why do we see in off-line CG so much time (more or less 30% of rendering time) spent on texture I/O?
Because texture I/O is incredibly slow in offline rendering: best case it's coming out of main memory, more likely off disk. It's the primary reason RenderMan uses a tile-based system (to maximise texture locality).
In a real-time system you're working with somewhat different constraints. Yes, texture latency is still huge compared to ALU ops, but a lot of texture ops in current shaders are there to approximate calculations, not as part of the art assets. Really, how many textures are we likely to see combined on a pixel?
I've been looking at lighting models that take >25 dot products per pixel, and that doesn't include transforming all the inputs into the right space. Once you start doing lighting at the pixel level and you start looking at better lighting models, you can easily spend hundreds of ALU ops per pixel; texture complexity, even if it weren't constrained by memory, just isn't going to explode like that.
No, it will not explode like that, but texture usage is still going to increase compared to what we do now, especially because I do not see the jump in math ops per cycle being so massive that it completely eliminates the use of cube maps, 3D textures, etc. as look-ups/shortcuts.
Even if we do eliminate the shortcuts, we are likely to see an increase in texture data usage: no, it will not grow like shader op usage, but it will be a problem that needs to be taken care of.
If we want to support a huge number of math ops per fragment, we should aim not only for a large number of ALUs, but also for decent efficiency.
I know that parallelism helps: we can have more pixels in flight, and even if a texture fetch takes a while to get back to the ALU, we hide the latency by having so many pixels being processed.
That is the idea behind the story: an ALU clocked at 100 MHz can do the same work as two ALUs that are each half as efficient.
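To put a rough number on the latency-hiding idea, here is a minimal sketch with hypothetical cycle counts (the function name, the 50-op shader, and the 200-cycle fetch are all my own illustrative assumptions, not figures from any real part):

```python
# Hypothetical throughput model: with enough pixels in flight,
# a long texture fetch is hidden and the ALU stays busy.
def effective_alu_utilization(pixels_in_flight, alu_cycles, texture_latency):
    """Fraction of cycles the ALU does useful work, for a shader that
    runs `alu_cycles` of math and issues one `texture_latency`-cycle fetch."""
    # Useful ALU work available across all in-flight pixels per pass.
    useful = pixels_in_flight * alu_cycles
    # The pass cannot finish before the last fetch returns.
    pass_length = max(useful, alu_cycles + texture_latency)
    return useful / pass_length

# One pixel alone: a 200-cycle fetch stalls a 50-op shader badly.
print(effective_alu_utilization(1, 50, 200))   # 0.2
# Eight pixels in flight fully hide the same latency.
print(effective_alu_utilization(8, 50, 200))   # 1.0
```

It is only a toy model, but it shows why more pixels in flight let a cheaper, less efficient ALU arrangement match a faster one.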
Depending on how we handle latency in the APUs, their efficiency will vary.
If texture fetches take many cycles, the APUs' IPC will drop considerably, and to compensate for the lost efficiency we will need more APUs dedicated to pixel-shading work.
If we can afford the extra APUs, that is fine, but what if the efficiency drop is so high that we cannot afford the extra APUs?
We would have the shading power to run those long shaders you mention (for complex lighting models), but in practice we would not be able to run them at decent speed unless we keep texture fetches to a minimum (especially dependent texture reads, which cannot be optimized by the "pack texture fetches and send them early to the Pixel Engines" kind of trick).
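The "extra APUs" trade-off above can be sketched with one line of arithmetic (the APU counts and efficiency figures here are made-up examples, purely to show how fast the budget grows):

```python
import math

# If texture stalls cut each APU's IPC to a fraction `efficiency` of peak,
# this many APUs are needed to match `baseline_apus` running at full speed.
def apus_to_compensate(baseline_apus, efficiency):
    return math.ceil(baseline_apus / efficiency)

print(apus_to_compensate(4, 0.8))   # 5: a mild drop costs one extra APU
print(apus_to_compensate(4, 0.25))  # 16: a severe drop quadruples the budget
```

The point is that the cost of compensating grows as the inverse of the efficiency, which is why a large drop quickly becomes unaffordable.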
I know that a solution can be found (to the potential problem of reasonably long latencies for texture fetches), and we can even look at current patents for the APUs and this one for the SALC/SALP to see what can be done... as I said, something can be done, I am sure, but I am interested in exploring what we can do...