It seems the Sony guys are working on improving the Cg compiler. This is Alan Heirich's home page at Caltech:
http://alumnus.caltech.edu/~heirich/activities.htm
See the first paper (to appear at Graphics Hardware 2005 this summer)
Optimal Automatic Multi-pass Shader Partitioning by Dynamic Programming
From the abstract:
Complex shaders must be partitioned into multiple passes to execute on GPUs with limited hardware resources. Automatic partitioning gives rise to an NP-hard scheduling problem that can be solved by any number of established techniques.
Experimental results on a set of test cases with a commercial prerelease compiler for a popular high level shading language showed a DP algorithm had an average runtime cost of O(n^1.14966) which is less than O(n log n) on the region of interest in n. This demonstrates that efficient and optimal automatic shader partitioning can be an emergent byproduct of a DP-based code generator for a very high performance GPU.
IMHO the popular high level shading language is Cg (Nvidia just released Cg 1.4) and the very high performance GPU is RSX.
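A quick sanity check on the "less than O(n log n) on the region of interest in n" wording: the exponent is taken from the abstract, but the log base (2) and the sample sizes are my own assumptions, and constant factors are ignored. It only shows that n^1.14966 sits below n log n for moderate n and overtakes it for very large n, which is presumably why the abstract restricts the claim to a region of interest.

```python
import math

EXPONENT = 1.14966  # empirical exponent quoted in the abstract


def dp_cost(n: int) -> float:
    """Growth term n^1.14966, constants ignored."""
    return n ** EXPONENT


def nlogn_cost(n: int) -> float:
    """Growth term n * log2(n); base 2 is an assumption, the abstract names no base."""
    return n * math.log2(n)


def dp_is_cheaper(n: int) -> bool:
    """True when the power law is below n log n at this n."""
    return dp_cost(n) < nlogn_cost(n)


if __name__ == "__main__":
    # Hypothetical shader sizes (operation counts), not taken from the paper.
    for n in (10, 1_000, 1_000_000, 2 ** 64):
        print(n, dp_is_cheaper(n))
```

The comparison flips somewhere between a million and 2^64 operations, far beyond any realistic shader size.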
RSX should support very long shaders but he wrote:
The DP solution is motivated by a study of a very high performance GPU that supports large and complex shaders. The size of these shaders implies multi-pass execution and motivates the search for a scalable partitioning algorithm.
Operations are scheduled into passes in order to respect resource limitations of the GPU. At pass boundaries intermediate results are spilled into a form of intermediate storage from which they can be retrieved in a subsequent pass.
Maybe he's using the term 'multipass' in a new way (new to me at least); he (also) sees the storage of intermediate internal results as the main problem and he's trying to alleviate it:
This study is concerned with partitioning shaders for a very high performance GPU that has a very efficient intermediate storage mechanism. GPU resources that must be scheduled include live operands (register allocation), outstanding texture requests, instruction storage and rasterized interpolants. Every shader pass must observe the physical resource limits of the GPU with any excess storage requirements satisfied from intermediate storage between passes. Each of these resource types has architecture-specific considerations that influence the cost function and the location of pass boundaries.
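To make the idea of pass boundaries and spilling concrete, here is a toy sketch of DP partitioning. This is not the algorithm from the paper: it assumes the shader is already a linear list of operations, models a single resource with one per-pass limit, and charges a made-up spill cost for each boundary. The names `partition_passes`, `op_cost` and `spill_cost` are hypothetical.

```python
from typing import List, Tuple


def partition_passes(op_cost: List[int], spill_cost: List[int],
                     limit: int) -> Tuple[int, List[int]]:
    """Split a linear op sequence into contiguous passes (toy model).

    op_cost[i]    -- resource units used by operation i (assumed single resource)
    spill_cost[i] -- cost of a pass boundary between operation i and i + 1
    limit         -- resource units available in one pass
    Returns (minimum total spill cost, indices where a new pass starts).
    """
    n = len(op_cost)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[k]: cheapest way to schedule the first k ops
    cut = [-1] * (n + 1)     # cut[k]: start index of the last pass in that schedule
    best[0] = 0
    for end in range(1, n + 1):
        used = 0
        # Grow the last pass backwards until it no longer fits in one pass.
        for start in range(end - 1, -1, -1):
            used += op_cost[start]
            if used > limit:
                break
            boundary = spill_cost[start - 1] if start > 0 else 0
            candidate = best[start] + boundary
            if candidate < best[end]:
                best[end] = candidate
                cut[end] = start
    if best[n] == inf:
        raise ValueError("an operation exceeds the per-pass limit")
    # Walk the cut table backwards to recover where each pass begins.
    starts = []
    pos = n
    while pos > 0:
        pos = cut[pos]
        if pos > 0:
            starts.append(pos)
    starts.reverse()
    return best[n], starts


if __name__ == "__main__":
    # Four ops of cost 2, limit 4: the cheap boundary is between op 1 and op 2.
    print(partition_passes([2, 2, 2, 2], [5, 1, 5], limit=4))
```

A real partitioner would work on the shader's dependency graph and track several resources at once (live operands, texture requests, instruction storage, interpolants), so treat this only as an illustration of how a cost function decides where the boundaries go.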