#1
Junior Member
Join Date: Jun 2004
Location: Taiwan
Posts: 22
Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
#2
Tea maker
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,382
Quote:
In fact, it may be counter-productive since those loops are controlled by constants which could be changed frequently. Unrolling the loop would mean you may end up reloading new code rather than just tweaking a constant.
#3
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Quote:
Since hardware like the R3xx and R4xx doesn't support any pixel shader branching at all, all pixel shader loops must be unrolled for it. Similarly, the NV3x doesn't support pixel shader branching, so any pixel shader loops must be unrolled there as well.
#4
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
It is the HLSL compiler in SM2.0 which unrolls the loops, not the driver. There is no loop instruction in SM2.0 assembly.
#5
Senior Member
Join Date: Nov 2002
Location: Edmonton, Alberta, Canada
Posts: 1,765
Quote:
VS 2.0 Instructions
#6
Regular
Zero-overhead looping isn't exactly rocket science; there is really no need to unroll loops on a sane architecture.
#7
Regular
Join Date: Feb 2002
Location: California
Posts: 4,732
Quote:
#8
Registered
Is there a way to disable loop unrolling (in the HLSL compiler, SM3.0)?
I'd like to achieve the shortest possible compiled shader binaries for a 4 KB intro/demo. At build time I compile the shaders with a tool that uses D3DXCompileShaderFromFile() and store the result. The vertex/pixel shaders contain 2-3 loops over a fixed number of lights, texture stages, procedural texcoord generators, etc. The HLSL compiler just unrolls the whole thing, producing a "huge" shader... D3DXSHADER_PREFER_FLOW_CONTROL doesn't help.
#9
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
Make the loop bounds dynamic so the compiler can't unroll the loop, and pass in the loop bounds via constants.
For example, instead of using this for 8 lights:
for(i=0; i<8; i++) { ... }
use this:
for(i=0; i<lights; i++) { ... }
and pass in 'lights' as a shader constant at shader bind.
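As a minimal sketch of the dynamic-bound idea, the loop count can be declared as a uniform integer rather than a literal. The name `lightCount` and the loop body are hypothetical placeholders, and an SM3.0 target such as ps_3_0 is assumed. Whether the compiler actually emits `rep`/`loop` or still finds a way to unroll depends on the target profile and compiler version, so it is worth checking the disassembly.

```hlsl
// Hypothetical uniform: the application sets this as an integer shader
// constant before drawing (the name lightCount is an assumption).
int lightCount;

float4 main(float4 texCoord : TEXCOORD0) : COLOR
{
    // The bound is not a compile-time literal, so the compiler cannot
    // simply expand the loop into a fixed number of copies.
    for (int i = 0; i < lightCount; i++)
    {
        texCoord += 3.7 * texCoord.wzyx; // placeholder loop body
    }
    return texCoord;
}
```

The trade-off is that the application has to set one more constant per draw, in exchange for a shader whose size no longer grows with the iteration count.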
#10
Crazy coder
Use the [loop] attribute on loops and [branch] on branches.
Code:
float4 main(float4 texCoord: TEXCOORD0) : COLOR {
    [loop]
    for (int i = 0; i < 8; i++){
        texCoord += 3.7 * texCoord.wzyx;
    }
    return texCoord;
}
Without [loop], the compiler unrolls the loop:
Code:
ps_3_0
def c0, 3.70000005, 0, 0, 0
dcl_texcoord v0
mad r0, v0.wzyx, c0.x, v0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad oC0, r0.wzyx, c0.x, r0
With [loop], it emits a rep block instead:
Code:
ps_3_0
def c0, 0, 3.70000005, 0, 0
defi i0, 8, 0, 0, 0
dcl_texcoord v0
mov r0, v0
rep i0
    mad r0, r0.wzyx, c0.y, r0
endrep
mov oC0, r0
#11
A little of this and that
Join Date: Oct 2005
Location: Cupertino
Posts: 342
Yes, but the vendor compilers will sometimes unroll the loop themselves if it's statically analyzable... Making it dynamic will avoid this, unless they recompile on parameter changes.
#12
Registered
Quote:
However, it doesn't work with my original shader (which compiled fine with the December 2006 SDK, but had unrolled loops). Now I get a lot of strange error messages from the HLSL compiler...
It seems "PixelShader" and "VertexShader" have become some kind of keywords (shouldn't these be case sensitive?); at least the compiler gives an "error X3000: syntax error: unexpected token 'VertexShader'" message when I name my functions like that.
Another one: "error X5300: Invalid register number: 11. Max allowed for v# register is 9.", on the line where the half vector is calculated (pixel shader code cropped; I can quote the whole shader if that helps):
Code:
float diffuse_light = 0;
float specular_light = 0;
float3 view_normal = normalize( input.ViewNormal );
[loop]
for ( int i = 0; i < 2; i++ )
{
    float3 view_light = normalize( input.ViewLights[ i ] );
    diffuse_light += saturate( dot( view_normal, view_light ));
    float3 halfvector = normalize( view_light - normalize( input.Position.xyz ));
    specular_light += pow( dot( view_normal, halfvector ), 64.0 );
}
Any ideas why I get this error message?
#13
Junior Member
Join Date: Oct 2006
Posts: 46