#1 |
|
Junior Member
Join Date: Jun 2004
Location: Taiwan
Posts: 22
|
Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
|
|
|
|
|
|
#2 | |
|
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,379
|
Quote:
In fact, it may be counter-productive since those loops are controlled by constants which could be changed frequently. Unrolling the loop would mean you may end up reloading new code rather than just tweaking a constant.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson "I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay |
|
|
|
|
|
|
#3 | |
|
Join Date: May 2002
Location: New York, NY
Posts: 12,678
|
Quote:
Since hardware like the R3xx and R4xx don't support any branching at all, this means that all loops must be unrolled. Similarly, the NV3x doesn't support pixel shader branching, and any pixel shader loops must therefore be unrolled.
__________________
April 20, 1979 - America must never forget. |
|
|
|
|
|
|
#4 |
|
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
|
It is the HLSL compiler in SM2.0 which unrolls the loops, not the driver. There is no loop instruction in SM2.0 assembly.
|
|
|
|
|
|
#5 | |
|
Senior Member
Join Date: Nov 2002
Location: Edmonton, Alberta, Canada
Posts: 1,765
|
Quote:
VS 2.0 Instructions
__________________
"Extremism is so easy. You've got your position, and that's it. It doesn't take much thought. And when you go far enough to the right, you meet the same idiots coming around from the left." -- Clint Eastwood -Ostsol |
|
|
|
|
|
|
#6 |
|
Regular
|
Zero-overhead looping isn't exactly rocket science; there is really no need to unroll loops on a sane architecture.
|
|
|
|
|
|
#7 | ||
|
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
|
Quote:
|
||
|
|
|
|
|
#8 |
|
Registered
|
Is there a way to disable loop unrolling (in the HLSL compiler, SM3.0)?
I'd like to get the shortest possible compiled shader binaries for a 4 KB intro/demo. At build time I compile the shaders with a tool that uses D3DXCompileShaderFromFile(), and store the result. The vertex/pixel shaders contain 2-3 loops over a fixed number of lights, texture stages, procedural texcoord generators, etc. The HLSL compiler just unrolls the whole thing, producing a "huge" shader... D3DXSHADER_PREFER_FLOW_CONTROL doesn't help.
|
|
|
|
|
#9 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
Make the loop bounds dynamic so it can't unroll and pass in the loop bounds via constants.
For example, instead of using this for 8 lights:
for(i=0; i<8; i++) { ... }
use
for(i=0; i<lights; i++) { ... }
and pass in 'lights' as a shader constant at shader bind.
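A minimal sketch of that suggestion with a placeholder loop body. The constant name `Iterations` is made up for illustration and is not from any real codebase; the application would have to set it before drawing. This only keeps the HLSL compiler from knowing the trip count; whether a vendor driver still unrolls it later is a separate matter.

```hlsl
// Hypothetical constant: the application sets this at shader bind time.
// Its value is unknown at compile time, so the HLSL compiler has no fixed
// trip count to unroll against.
int Iterations;

float4 main(float4 texCoord : TEXCOORD0) : COLOR
{
    // Placeholder loop body; a real shader would accumulate lighting here.
    for (int i = 0; i < Iterations; i++)
    {
        texCoord += 3.7 * texCoord.wzyx;
    }
    return texCoord;
}
```

Targeting ps_3_0, I would expect this to come out as an actual loop rather than a run of repeated instructions, but it is worth checking the disassembly to be sure.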
|
|
|
|
|
#10 |
|
Crazy coder
|
Use the [loop] attribute on loops and [branch] on branches.
Code:
float4 main(float4 texCoord: TEXCOORD0) : COLOR {
    [loop]
    for (int i = 0; i < 8; i++){
        texCoord += 3.7 * texCoord.wzyx;
    }
    return texCoord;
}
Without the [loop] attribute (fully unrolled):
Code:
ps_3_0
def c0, 3.70000005, 0, 0, 0
dcl_texcoord v0
mad r0, v0.wzyx, c0.x, v0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad oC0, r0.wzyx, c0.x, r0
With the [loop] attribute:
Code:
ps_3_0
def c0, 0, 3.70000005, 0, 0
defi i0, 8, 0, 0, 0
dcl_texcoord v0
mov r0, v0
rep i0
mad r0, r0.wzyx, c0.y, r0
endrep
mov oC0, r0
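The [branch] attribute is used the same way on a conditional. A small sketch based on the same test shader; the 0.5 threshold is arbitrary and only there to give the compiler something to branch on.

```hlsl
float4 main(float4 texCoord : TEXCOORD0) : COLOR
{
    float4 result = texCoord;

    // [branch] asks the compiler for a real dynamic branch instead of
    // evaluating both sides and selecting the result.
    [branch]
    if (texCoord.x > 0.5)
    {
        result += 3.7 * texCoord.wzyx;
    }
    return result;
}
```

As with [loop], this is a request to the HLSL compiler only, so it is worth checking the generated assembly to see what you actually got.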
|
|
|
|
|
#11 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
Yes, but the vendor compilers will sometimes unroll the loop themselves if it's statically analyzable... Making it dynamic will avoid this, unless they recompile on parameter changes.
|
|
|
|
|
|
#12 |
|
Junior Member
Join Date: Oct 2006
Posts: 46
|
|
|
|
|
|
|
#13 | |
|
Member
Join Date: Mar 2002
Location: UK
Posts: 570
|
Quote:
John. |
|
|
|
|
|
|
#14 |
|
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
|
I agree, but the original question was how to prevent loop unrolling. In general, I prefer the compilers, be it fxc or the vendor compilers, to unroll the code and make their own decisions on predication vs branching for their own hardware. Sadly, both Nvidia and ATI routinely have compiler bugs when optimizing long shaders for their own hardware, which sometimes forces us to figure out workarounds like this to get correctness. Granted, we do tend to give them massive shaders. FXC also has tons of bugs and performance issues when you start pushing the limit. For example, we have a few shaders that take *hours* to compile with fxc.
|
|
|
|
|
|
#15 | |
|
Registered
|
Quote:
However, it doesn't work with my original shader (which compiled nicely with the Dec. 2006 SDK, but had unrolled loops). Now I get a lot of strange error messages from the HLSL compiler...
It seems "PixelShader" and "VertexShader" have become some kind of keywords (shouldn't these be case sensitive?); at least the compiler gives an "error X3000: syntax error: unexpected token 'VertexShader'" message when I name my functions like that.
Another one: "error X5300: Invalid register number: 11. Max allowed for v# register is 9.", on the line where the halfvector is calculated (pixel shader code cropped; I can post the whole shader if that helps):
Code:
float diffuse_light = 0;
float specular_light = 0;
float3 view_normal = normalize( input.ViewNormal );
[loop]
for ( int i = 0; i < 2; i++ )
{
    float3 view_light = normalize( input.ViewLights[ i ] );
    diffuse_light += saturate( dot( view_normal, view_light ));
    float3 halfvector = normalize( view_light - normalize( input.Position.xyz ));
    specular_light += pow( dot( view_normal, halfvector ), 64.0 );
}
Any ideas why I get this error message? |
|
|
|
|
|
|
#16 | |
|
Crazy coder
|
Quote:
No, but if you post the whole shader I might be able to help. |
|
|
|
|
|
|
#17 |
|
Registered
|
The basic idea is to mix axis-aligned-mapped 2D textures, so any arbitrary generated geometry (organic-like: metaballs, or 4D julia sets) will have a nice material.
Code:
struct VS_INPUT
{
    float4 Position : POSITION0;
    float3 Normal : NORMAL0;
};
struct VS_OUTPUT
{
    float4 ProjPosition : POSITION;
    float4 Position : TEXCOORD0;
    float3 Normal : TEXCOORD1;
    float3 ViewNormal : TEXCOORD2;
    float3 ViewLights[ 2 ] : TEXCOORD3;
    float2 TexCoords[ 3 ] : TEXCOORD5;
};
struct PS_OUTPUT
{
    float4 Color : COLOR0;
};
float4x4 Projection : register( c0 );
float4x4 WorldViewTransform : register( c4 );
float4 Material_Color1 : register( c8 );
float4 Material_Color2 : register( c9 );
float4 Material_Specular : register( c10 );
float3 LightVectors[ 2 ] : register( c11 );
float4 Reg13 : register( c13 );
#define Time Reg13.x
#define Global_Ambient Reg13.y
#define Global_Diffuse Reg13.z
#define Global_Specular Reg13.w
float4 TexScale : register( c14 );
// --------------------------------------------------------------------------------------------------------------------
// VertexShader
// --------------------------------------------------------------------------------------------------------------------
#ifdef VERTEXSHADER
VS_OUTPUT __VertexShader( VS_INPUT input )
{
    VS_OUTPUT output;
    // Position & normal
    output.Position = mul( input.Position, WorldViewTransform );
    output.ProjPosition = mul( output.Position, Projection );
    output.Normal = input.Normal;
    output.ViewNormal = mul( input.Normal, WorldViewTransform );
    // Light sources
    [unroll]
    for ( int i = 0; i < 2; i++ )
    {
        output.ViewLights[ i ] = mul( LightVectors[ i ], WorldViewTransform );
    }
    // Generate texcoords
    [unroll] // FIXME: Indexing of l-values are not supported?
    for ( int i = 0; i < 3; i++ )
    {
        output.TexCoords[ i ] = TexScale.xy * input.Position.xy;
        output.TexCoords[ i ] += TexScale.w * ( Time + input.Position.z );
        input.Position.xyz = input.Position.yzx;
        TexScale.xyz = TexScale.yzx;
    }
    return output;
}
#endif
// --------------------------------------------------------------------------------------------------------------------
// PixelShader
// --------------------------------------------------------------------------------------------------------------------
#ifdef PIXELSHADER
sampler2D Texture1;
sampler2D Texture2;
PS_OUTPUT __PixelShader( VS_OUTPUT input )
{
    PS_OUTPUT output;
    // Light sources
    float diffuse_light = 0;
    float specular_light = 0;
    float3 normal = normalize( input.Normal );
    float3 view_normal = normalize( input.ViewNormal );
    [loop]
    for ( int i = 0; i < 2; i++ )
    {
        float3 view_light = normalize( input.ViewLights[ i ] );
        float3 halfvector = normalize( view_light - normalize( input.Position.xyz ));
        specular_light += pow( dot( view_normal, halfvector ), 64.0 );
        diffuse_light += saturate( dot( view_normal, view_light ));
    }
    // Material: texture & colors
    float4 texture_map = 0;
    [loop]
    for ( int i = 0; i < 3; i++ )
    {
        float4 texture_1 = tex2D( Texture1, input.TexCoords[ i ] );
        float4 texture_2 = tex2D( Texture2, input.TexCoords[ i ] );
        texture_map += texture_1 * texture_2 * abs( input.Normal.z );
        normal.xyz = normal.yzx;
    }
    float4 diffuse_color =
        lerp( Material_Color1, Material_Color2, texture_map );
    float4 specular_color =
        Material_Specular * texture_map * saturate( -view_normal.z );
    // Compute final color
    output.Color =
        Global_Ambient * diffuse_color +
        Global_Diffuse * diffuse_color * diffuse_light +
        Global_Specular * specular_color * specular_light;
    // Per-pixel distance-fog
    output.Color = lerp( output.Color, 0.933333333, saturate( abs( input.Position.z ) * 0.02f ));
    return output;
}
#endif
|
|
|
|
|
|
#18 |
|
Crazy coder
|
I made an attempt to get that working yesterday, but had some real struggles. This looks like a D3D compiler bug. It's trying to use more interpolators than there are available.
|
|
|
|