GLSL much slower than HLSL? (In AMD GPU ShaderAnalyzer)

kumayu

Newcomer
I put my simple GLSL and HLSL code into AMD GPU ShaderAnalyzer.
I expected them to be roughly equal in terms of performance.
But what a surprise...

        HLSL EstCycle(Tri)   GLSL EstCycle(Tri)
X1300   9                    13
X1900   4.8                  7

Who stole 30% of the performance?
Both shaders are almost identical.
Can you give me a hint?

Does this mean GLSL will be slower than HLSL?
Or does AMD GPU ShaderAnalyzer just have a problem with GLSL?


--------D3D ASM statistics-------------------------------------------------
Shader Version = 3.0
Instruction Count = 23
ALU Instructions = 9, Texture Instructions = 4, ALU:Texture Ratio = 2.25

Constant Register Count = 1
Temp Register Count = 3, Sampler Register Count = 3, Input Register Count = 6, Output Register Count = 1

Has PS2.0 Instructions
Uses Arbitrary Swizzle
 
HLSL version
Paste it into AMD GPU ShaderAnalyzer and you will see how many cycles it takes.

--------------------------------------------------
sampler2D tex0; //AniMap
sampler2D tex1; //basemap
sampler2D tex2; //normal map

struct PS_INPUT
{
    float4 TexCoord      : TEXCOORD0;
    float2 vUV           : TEXCOORD1;
    float3 worldNormal   : TEXCOORD2;
    float3 worldTangent  : TEXCOORD3;
    float3 worldBinormal : TEXCOORD4;
    float3 halfVec       : TEXCOORD5;
};


float4 ps_main( PS_INPUT Input ) : COLOR0
{
    float4 tx   = tex2D( tex1, Input.vUV );
    float3 base = tx.xyz;
    float3 bump = tex2D( tex2, Input.vUV ).xyz * 2.0 - 1.0;
    float3 Nb   = Input.worldNormal + (bump.x * Input.worldTangent + bump.y * Input.worldBinormal);
    float  hdn  = dot( Input.halfVec, Nb );
    float3 col  = base + pow( hdn, 32.0 );

    float4 texColor0 = tex2D( tex0, Input.TexCoord.xy ); //uv scroll scale1
    float4 texColor1 = tex2D( tex0, Input.TexCoord.zw ); //uv scroll scale2
    texColor0 = texColor0 + texColor1;
    texColor0 = texColor0 + float4( col, tx.w );
    return texColor0;
}
 
GLSL version
All of the computation is the same as in the HLSL version.

--------------------------------------------------

[Vertex]

uniform mat4 matWorldIT;
uniform mat4 matWorld;
uniform mat4 matViewI;

uniform float time_0_X;
uniform vec4 scrollXYZW;
varying vec4 TexCoord;

varying vec2 vUV;
varying vec3 worldNormal;
varying vec3 worldTangent;
varying vec3 worldBinormal;
varying vec3 halfVec;

void main(void)
{
    gl_Position = ftransform();

    vUV = vec2(gl_MultiTexCoord0);

    worldNormal   = normalize( (vec3(gl_Normal)         * mat3(matWorldIT)).xyz );
    worldTangent  = normalize( (vec3(gl_MultiTexCoord1) * mat3(matWorldIT)).xyz );
    worldBinormal = normalize( (vec3(gl_MultiTexCoord2) * mat3(matWorldIT)).xyz );

    vec3 worldP   = (gl_Vertex * matWorld).xyz;
    vec3 eyeVec   = normalize( matViewI[3].xyz - worldP );
    vec3 lightVec = vec3( -eyeVec.x, eyeVec.y, eyeVec.z );
    halfVec       = normalize( eyeVec + lightVec );

    //new uv for scroll1
    TexCoord.xy = gl_MultiTexCoord0.xy + (time_0_X * scrollXYZW.xy);

    //new uv for scroll2
    TexCoord.zw = gl_MultiTexCoord0.xy + (time_0_X * scrollXYZW.zw);
}

[Fragment]

uniform sampler2D tex0; //AniMap
uniform sampler2D tex1; //basemap
uniform sampler2D tex2; //normal map

varying vec4 TexCoord;

varying vec2 vUV;
varying vec3 worldNormal;
varying vec3 worldTangent;
varying vec3 worldBinormal;
varying vec3 halfVec;

void main(void)
{
    vec4  tx   = texture2D( tex1, vUV );
    vec3  base = tx.xyz;
    vec3  bump = texture2D( tex2, vUV ).xyz * 2.0 - 1.0;
    vec3  Nb   = worldNormal + (bump.x * worldTangent + bump.y * worldBinormal);
    float hdn  = dot( halfVec, Nb );
    vec3  col  = base + vec3( pow( hdn, 32.0 ) );

    vec4 texColor0 = texture2D( tex0, TexCoord.xy ); //uv scroll scale1
    vec4 texColor1 = texture2D( tex0, TexCoord.zw ); //uv scroll scale2
    gl_FragColor = vec4( col, tx.w ) + texColor0 + texColor1;
}
 
Ignore the [Vertex] part above. My problem is in the [Fragment] shader part.

In AMD GPU ShaderAnalyzer, the GLSL fragment shader takes more cycles than the HLSL one...
Does this mean GLSL will be slower than HLSL?
 
The only way to find out is to time the results... it may be the analyser tool that is making a mistake there. But, as already said, ATI has to write the GLSL compiler themselves, and their compiler is not really good, while MS has written the HLSL compiler, probably using better optimisation techniques.
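
Something like this rough sketch would do for the GLSL side (the names timeShader() and drawFullscreenQuad() are just made up here, and it assumes a GL 2.0 context is already current, e.g. via GLEW + GLUT): render a pile of frames with the program bound, glFinish(), and average the wall-clock time. Do the same with your D3D/HLSL path and compare.

Code:
#include <GL/glew.h>   // GL 2.0 entry points (glUseProgram etc.)
#include <chrono>

// Draw one large quad so the measurement is fragment-shader bound.
static void drawFullscreenQuad()
{
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
    glEnd();
}

// Render 'frames' frames with the given GLSL program and return the
// average wall-clock time per frame in seconds.
// Usage in your own test app, e.g.:
//   printf("%.3f ms/frame\n", timeShader(prog, 1000) * 1000.0);
static double timeShader(GLuint program, int frames)
{
    glUseProgram(program);   // bind the program under test
    glFinish();              // drain any pending GPU work first

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i)
    {
        glClear(GL_COLOR_BUFFER_BIT);
        drawFullscreenQuad();
    }
    glFinish();              // wait until the GPU has really finished
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / frames;
}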
 
Simple fragment shader:

Code:
void main (void)
{
   gl_FragColor = vec4 (0.0, 1.0, 0.0, 1.0);
}

produces this:

Code:
00 ALU: ADDR(32) CNT(20) 
      0  x: MOV         R0.x,  0.0f      
         y: MOV         R0.y,  1.0f      
         z: MOV         R0.z,  0.0f      
         w: MOV         R0.w,  1.0f      
      1  x: MOV         R1.x,  PV0.x      
         y: MOV         R1.y,  PV0.y      
         z: MOV         R1.z,  PV0.x      
         w: MOV         R1.w,  PV0.y      
      2  x: MOV         R4.x,  R0.z      
         y: MOV         R4.y,  R0.z      
         z: MOV         R4.z,  R0.z      
         w: MOV         R4.w,  R0.w      
      3  x: MOV         R3.x,  R0.z      
         y: MOV         R3.y,  R0.z      
         z: MOV         R3.z,  R0.z      
         w: MOV         R3.w,  R0.w      
      4  x: MOV         R2.x,  R0.z      
         y: MOV         R2.y,  R0.z      
         z: MOV         R2.z,  R0.z      
         w: MOV         R2.w,  R0.w      
01 EXP_DONE: PIX0, R1  BRSTCNT(3)

For some reason it's writing to four outputs, not just one. Dunno if GPUSA (1.42) is buggered or if the problem lies somewhere else, but if this simple shader isn't working right, you're kinda doomed. It appears you'll always get these superfluous MOVs at the end of your fragment shader.

Jawed
 
Dunno if GPUSA (1.42) is buggered or if the problem lies somewhere else

It's the former. There was a slight change in the interface to the GLSL compiler that we failed to mirror in GSA. It's fixed, and a new version of GSA with the fix should ship in a week or so. I'll let you know when it ships.

Cheers,
GP.
 
w00t! nice to know :)
 
Same code with the new 1.43 GPU ShaderAnalyzer:

        HLSL EstCycle(Tri)   Old GLSL EstCycle(Tri)   New GLSL EstCycle(Tri)
X1300   9                    13                       11
X1900   4.8                  7                        4.8

Now it's much better, but HLSL is still faster than GLSL on the X1300.
 
That doesn't surprise me much; ATI has generally been faster at DX, haven't they?
 