Old 24-Mar-2005, 09:23   #1
tcchiu
Junior Member
 
Join Date: Jun 2004
Location: Taiwan
Posts: 22
Loop unrolling in NVIDIA/ATI drivers?

Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
Old 24-Mar-2005, 11:58   #2
Simon F
Tea maker
 
Join Date: Feb 2002
Location: In the Island of Sodor, where the steam trains lie
Posts: 4,382
Re: Loop unrolling

Quote:
Originally Posted by tcchiu
Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
IIRC, given the simplicity of SM2.0 looping, it may not be necessary, perhaps apart from very small loops.

In fact, it may be counter-productive since those loops are controlled by constants which could be changed frequently. Unrolling the loop would mean you may end up reloading new code rather than just tweaking a constant.
__________________
"Your work is both good and original. Unfortunately the part that is good is not original and the part that is original is not good." -(attributed to) Samuel Johnson

"I invented the term Object-Oriented, and I can tell you I did not have C++ in mind." Alan Kay
Old 24-Mar-2005, 18:49   #3
Chalnoth
 
Join Date: May 2002
Location: New York, NY
Posts: 12,678
Re: Loop unrolling

Quote:
Originally Posted by tcchiu
Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
If the loops can be unrolled, they are. Basically, any loop that doesn't depend upon per-vertex or per-pixel information is unrolled. This is simply because it is typically assumed that many pixels and vertices will be drawn per branch, and the branching itself will incur a performance hit. Thus it makes more sense to just eat that little bit of extra data swapping that is needed when the new shader is loaded, instead of eating some constant performance hit for each pixel/vertex.

Since hardware like the R3xx and R4xx doesn't support any branching at all, all loops must be unrolled on it. Similarly, the NV3x doesn't support pixel shader branching, so any pixel shader loops must be unrolled there too.
Old 24-Mar-2005, 19:06   #4
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732

It is the HLSL compiler in SM2.0 which unrolls the loops, not the driver. There is no loop instruction in SM2.0 assembly.
Old 24-Mar-2005, 19:13   #5
Ostsol
Senior Member
 
Join Date: Nov 2002
Location: Edmonton, Alberta, Canada
Posts: 1,765

Quote:
Originally Posted by DemoCoder
It is the HLSL compiler in SM2.0 which unrolls the loops, not the driver. There is no loop instruction in SM2.0 assembly.
Well, not for pixel shaders, but for vertex shaders there seems to be. . .

VS 2.0 Instructions
__________________
"Extremism is so easy. You've got your position, and that's it. It doesn't take much thought. And when you go far enough to the right, you meet the same idiots coming around from the left." -- Clint Eastwood

-Ostsol
Old 24-Mar-2005, 19:57   #6
MfA
Regular
 
Join Date: Feb 2002
Posts: 5,228
Re: Loop unrolling

Zero-overhead looping isn't exactly rocket science; there is really no need to unroll loops on a sane architecture.
Old 24-Mar-2005, 23:14   #7
DemoCoder
Regular
 
Join Date: Feb 2002
Location: California
Posts: 4,732

Quote:
Originally Posted by Ostsol
Quote:
Originally Posted by DemoCoder
It is the HLSL compiler in SM2.0 which unrolls the loops, not the driver. There is no loop instruction in SM2.0 assembly.
Well, not for pixel shaders, but for vertex shaders there seems to be. . .

VS 2.0 Instructions
Yeah, I make the common mistake of using SM2.0 interchangeably with PS2.0. For me, the most interesting changes in the shader model are in the PS. Vertex texturing was nice, but until geometry shading is added, the vertex stuff is pretty boring, since I find vertex lighting boring and skinning is commodity stuff now.
Old 08-Feb-2007, 00:49   #8
Remage
Registered
 
Join Date: Jun 2002
Location: Budapest, Hungary
Posts: 5
How to disable loop unrolling (HLSL, SM3.0)?

Is there a way to disable loop unrolling (in the HLSL compiler, SM3.0)?
I'd like to get the shortest possible compiled shader binaries, for a 4-kbyte intro/demo. At build time I compile the shaders with a tool that uses D3DXCompileShaderFromFile(), and store the result.
The vertex/pixel shaders contain 2-3 loops, for a fixed number of lights, texture stages, procedural texcoord generators, etc. The HLSL compiler just unrolls the whole thing, producing a "huge" shader...
D3DXSHADER_PREFER_FLOW_CONTROL doesn't help.
Old 08-Feb-2007, 03:25   #9
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342

Make the loop bound dynamic so the compiler can't unroll the loop, and pass the bound in via a constant.

For example, for 8 lights, instead of this

for(i=0; i<8; i++)
{
..
}

write this

for(i=0;i<lights;i++)
{
...
}

and pass in 'lights' as a shader constant at shader bind.
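
A minimal sketch of that suggestion as a complete vertex shader, assuming a vs_3_0 target. The names numLights, lightDirs and worldViewProj are made up for illustration and don't come from the posts above; the application would have to set numLights as an integer constant when it binds the shader.

```hlsl
// Sketch only: numLights, lightDirs and worldViewProj are hypothetical names.
int      numLights;        // loop bound, supplied by the application as a constant
float3   lightDirs[8];     // sized for the maximum number of lights
float4x4 worldViewProj;

struct VSOut
{
    float4 pos     : POSITION;
    float  diffuse : TEXCOORD0;
};

VSOut main(float4 pos : POSITION, float3 normal : NORMAL)
{
    VSOut o;
    o.pos = mul(pos, worldViewProj);

    float diffuse = 0;
    // The trip count is not known at compile time, so the HLSL compiler
    // has no fixed number of iterations to unroll.
    for (int i = 0; i < numLights; i++)
    {
        diffuse += saturate(dot(normal, lightDirs[i]));
    }

    o.diffuse = diffuse;
    return o;
}
```

A vertex shader is used here on purpose: indexing a constant array with the loop counter needs relative addressing of constant registers, which pixel shader model 3.0 doesn't offer.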
Old 08-Feb-2007, 05:20   #10
Humus
Crazy coder
 
Join Date: Feb 2002
Location: Stockholm, Sweden
Posts: 3,216

Use the [loop] attribute on loops and [branch] on branches.

Code:
float4 main(float4 texCoord: TEXCOORD0) : COLOR {
    [loop]
    for (int i = 0; i < 8; i++){
        texCoord += 3.7 * texCoord.wzyx;
    }
    return texCoord;
}
Before:
Code:
ps_3_0
def c0, 3.70000005, 0, 0, 0
dcl_texcoord v0
mad r0, v0.wzyx, c0.x, v0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad r0, r0.wzyx, c0.x, r0
mad oC0, r0.wzyx, c0.x, r0
After:
Code:
ps_3_0
def c0, 0, 3.70000005, 0, 0
defi i0, 8, 0, 0, 0
dcl_texcoord v0
mov r0, v0
rep i0
 mad r0, r0.wzyx, c0.y, r0
endrep
mov oC0, r0
I think you'll have to use a recent version of the SDK though for the compiler to support these [] attributes.
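
For completeness, a sketch of the [branch] half of that advice, which the example above doesn't cover. The shader is made up for illustration (the 0.5 threshold and the arithmetic are arbitrary). [branch] asks the compiler for a real conditional, while its counterpart [flatten] asks it to evaluate both sides and select the result.

```hlsl
// Illustrative only: the threshold and the math are arbitrary.
float4 main(float4 texCoord : TEXCOORD0) : COLOR
{
    float4 result = texCoord;

    // Request real flow control instead of evaluating both sides.
    [branch]
    if (texCoord.x > 0.5)
    {
        result += 3.7 * texCoord.wzyx;
    }

    return result;
}
```

In the same way, [unroll] is the opposite of [loop] if you do want a particular loop unrolled.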
__________________
[ Visit my site ]
I speak for myself and only myself.
Old 08-Feb-2007, 05:32   #11
mhouston
A little of this and that
 
Join Date: Oct 2005
Location: Cupertino
Posts: 342

Yes, but the vendor compilers will sometimes unroll the loop themselves if it's statically analyzable... Making it dynamic will avoid this, unless they recompile on parameter changes.
Old 08-Feb-2007, 23:53   #12
Remage
Registered
 
Join Date: Jun 2002
Location: Budapest, Hungary
Posts: 5
Some progress...

Quote:
Originally Posted by Humus
Use the [loop] attribute on loops and [branch] on branches.
I think you'll have to use a recent version of the SDK though for the compiler to support these [] attributes.
Thanks, that kinda helps... It seems to work, my test shader has a real loop now.
However, it doesn't work with my original shader (which compiled nicely with the December 2006 SDK, but had unrolled loops); now I get a lot of strange error messages from the HLSL compiler...

It seems "PixelShader" and "VertexShader" have become keywords of some kind (shouldn't these be case sensitive?); at least the compiler gives an "error X3000: syntax error: unexpected token 'VertexShader'" message when I name my functions like that.

Another one: "error X5300: Invalid register number: 11. Max allowed for v# register is 9.", on the line where the halfvector is calculated (pixel shader code cropped; I can quote the whole shader if that helps):
Code:
	float diffuse_light = 0;
	float specular_light = 0;
	float3 view_normal = normalize( input.ViewNormal );
	[loop]
	for ( int i = 0; i < 2; i++ )
	{
		float3 view_light = normalize( input.ViewLights[ i ] );
		diffuse_light += saturate( dot( view_normal, view_light ));
		float3 halfvector = normalize( view_light - normalize( input.Position.xyz ));
		specular_light += pow( dot( view_normal, halfvector ), 64.0 );
	}
The compilation stops here, so I can't check what the resulting shader would be.

Any ideas why I get this error message?
Old 08-Feb-2007, 12:12   #13
Dee.cz
Junior Member
 
Join Date: Oct 2006
Posts: 46

Quote:
Originally Posted by tcchiu
Does the ATI or nVidia driver unroll the loops (SM 2.0 flow control)?
IMHO, GLSL fixed-length loops that access array elements were broken on ATI for a long time, and possibly still are, so it's necessary to unroll them manually in the source code.