GLSL much slower than HLSL? (In AMD GPU ShaderAnalyzer)

kumayu

Newcomer
I put my simple GLSL and HLSL code into AMD GPU ShaderAnalyzer.
I expected them to be roughly equal in terms of performance.
But what a surprise...

        HLSL EstCycle(Tri)   GLSL EstCycle(Tri)
X1300   9                    13
X1900   4.8                  7

Who stole 30% of the performance?
Both shaders are almost identical.
Can you give me a hint?

Does this mean GLSL will be slower than HLSL?
Or does AMD GPU ShaderAnalyzer just have a problem with GLSL?


--------D3D ASM statistics-------------------------------------------------
Shader Version = 3.0
Instruction Count = 23
ALU Instructions = 9, Texture Instructions = 4, ALU:Texture Ratio = 2.25

Constant Register Count = 1
Temp Register Count = 3, Sampler Register Count = 3, Input Register Count = 6, Output Register Count = 1

Has PS2.0 Instructions
Uses Arbitrary Swizzle
 
HLSL version
Paste it into AMD GPU ShaderAnalyzer and you will see how many cycles it takes.

--------------------------------------------------
sampler2D tex0; //AniMap
sampler2D tex1; //basemap
sampler2D tex2; //normal map

struct PS_INPUT
{
    float4 TexCoord      : TEXCOORD0;
    float2 vUV           : TEXCOORD1;
    float3 worldNormal   : TEXCOORD2;
    float3 worldTangent  : TEXCOORD3;
    float3 worldBinormal : TEXCOORD4;
    float3 halfVec       : TEXCOORD5;
};


float4 ps_main( PS_INPUT Input ) : COLOR0
{
    float4 tx   = tex2D( tex1, Input.vUV );
    float3 base = tx.xyz;
    float3 bump = tex2D( tex2, Input.vUV ).xyz * 2.0 - 1.0;
    float3 Nb   = Input.worldNormal + (bump.x * Input.worldTangent + bump.y * Input.worldBinormal);
    float  hdn  = dot( Input.halfVec, Nb );
    float3 col  = base + pow( hdn, 32.0 );

    float4 texColor0 = tex2D( tex0, Input.TexCoord.xy ); //uv scroll scale1
    float4 texColor1 = tex2D( tex0, Input.TexCoord.zw ); //uv scroll scale2
    texColor0 = texColor0 + texColor1;
    texColor0 = texColor0 + float4( col, tx.w );
    return texColor0;
}
 
GLSL version
All of the computation is the same as in the HLSL version.

--------------------------------------------------

[Vertex]

uniform mat4 matWorldIT;
uniform mat4 matWorld;
uniform mat4 matViewI;

uniform float time_0_X;
uniform vec4 scrollXYZW;
varying vec4 TexCoord;

varying vec2 vUV;
varying vec3 worldNormal;
varying vec3 worldTangent;
varying vec3 worldBinormal;
varying vec3 halfVec;

void main(void)
{
    gl_Position = ftransform();

    vUV = vec2(gl_MultiTexCoord0);

    worldNormal   = normalize( (vec3(gl_Normal)         * mat3(matWorldIT)).xyz );
    worldTangent  = normalize( (vec3(gl_MultiTexCoord1) * mat3(matWorldIT)).xyz );
    worldBinormal = normalize( (vec3(gl_MultiTexCoord2) * mat3(matWorldIT)).xyz );

    vec3 worldP   = (gl_Vertex * matWorld).xyz;
    vec3 eyeVec   = normalize( matViewI[3].xyz - worldP );
    vec3 lightVec = vec3( -eyeVec.x, eyeVec.y, eyeVec.z );
    halfVec       = normalize( eyeVec + lightVec );

    //new uv for scroll1
    TexCoord.xy = gl_MultiTexCoord0.xy + (time_0_X * scrollXYZW.xy);

    //new uv for scroll2
    TexCoord.zw = gl_MultiTexCoord0.xy + (time_0_X * scrollXYZW.zw);
}

[Fragment]

uniform sampler2D tex0; //AniMap
uniform sampler2D tex1; //basemap
uniform sampler2D tex2; //normal map

varying vec4 TexCoord;

varying vec2 vUV;
varying vec3 worldNormal;
varying vec3 worldTangent;
varying vec3 worldBinormal;
varying vec3 halfVec;

void main(void)
{
    vec4  tx   = texture2D( tex1, vUV );
    vec3  base = tx.xyz;
    vec3  bump = texture2D( tex2, vUV ).xyz * 2.0 - 1.0;
    vec3  Nb   = worldNormal + (bump.x * worldTangent + bump.y * worldBinormal);
    float hdn  = dot( halfVec, Nb );
    vec3  col  = base + vec3( pow( hdn, 32.0 ) );

    vec4 texColor0 = texture2D( tex0, TexCoord.xy ); //uv scroll scale1
    vec4 texColor1 = texture2D( tex0, TexCoord.zw ); //uv scroll scale2
    gl_FragColor = vec4( col, tx.w ) + texColor0 + texColor1;
}
 
Ignore the [Vertex] part above. My problem is in the [Fragment] shader part.

In AMD GPU ShaderAnalyzer, the GLSL fragment shader takes more cycles than the HLSL one...
Does this mean GLSL will be slower than HLSL?
 
The only way to find out is to time the results... it may be the analyser tool that is making a mistake there. But, as already said, ATI has to write the GLSL compiler themselves, and their compiler is not really good, while MS has written the HLSL compiler, probably using better optimisation techniques.
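
Something like this rough sketch would do for the GLSL side (the names timeShader() and drawFullscreenQuad() are just made up here, and it assumes a GL 2.0 context is already current, e.g. via GLEW + GLUT): render a pile of frames with the program bound, glFinish(), and average the wall-clock time. Do the same with your D3D/HLSL path and compare.

Code:
#include <GL/glew.h>   // GL 2.0 entry points (glUseProgram etc.)
#include <chrono>

// Draw one large quad so the measurement is fragment-shader bound.
static void drawFullscreenQuad()
{
    glBegin(GL_QUADS);
    glTexCoord2f(0.0f, 0.0f); glVertex2f(-1.0f, -1.0f);
    glTexCoord2f(1.0f, 0.0f); glVertex2f( 1.0f, -1.0f);
    glTexCoord2f(1.0f, 1.0f); glVertex2f( 1.0f,  1.0f);
    glTexCoord2f(0.0f, 1.0f); glVertex2f(-1.0f,  1.0f);
    glEnd();
}

// Render 'frames' frames with the given GLSL program and return the
// average wall-clock time per frame in seconds.
// Usage in your own test app, e.g.:
//   printf("%.3f ms/frame\n", timeShader(prog, 1000) * 1000.0);
static double timeShader(GLuint program, int frames)
{
    glUseProgram(program);   // bind the program under test
    glFinish();              // drain any pending GPU work first

    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i)
    {
        glClear(GL_COLOR_BUFFER_BIT);
        drawFullscreenQuad();
    }
    glFinish();              // wait until the GPU has really finished
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / frames;
}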
 
Simple fragment shader:

Code:
void main (void)
{
   gl_FragColor = vec4 (0.0, 1.0, 0.0, 1.0);
}

produces this:

Code:
00 ALU: ADDR(32) CNT(20) 
      0  x: MOV         R0.x,  0.0f      
         y: MOV         R0.y,  1.0f      
         z: MOV         R0.z,  0.0f      
         w: MOV         R0.w,  1.0f      
      1  x: MOV         R1.x,  PV0.x      
         y: MOV         R1.y,  PV0.y      
         z: MOV         R1.z,  PV0.x      
         w: MOV         R1.w,  PV0.y      
      2  x: MOV         R4.x,  R0.z      
         y: MOV         R4.y,  R0.z      
         z: MOV         R4.z,  R0.z      
         w: MOV         R4.w,  R0.w      
      3  x: MOV         R3.x,  R0.z      
         y: MOV         R3.y,  R0.z      
         z: MOV         R3.z,  R0.z      
         w: MOV         R3.w,  R0.w      
      4  x: MOV         R2.x,  R0.z      
         y: MOV         R2.y,  R0.z      
         z: MOV         R2.z,  R0.z      
         w: MOV         R2.w,  R0.w      
01 EXP_DONE: PIX0, R1  BRSTCNT(3)

For some reason it's writing to four outputs, not just one. Dunno if GPUSA (1.42) is buggered or if the problem lies somewhere else, but if this simple shader isn't working right, you're kinda doomed. It appears you'll always get these superfluous MOVs at the end of your fragment shader.

Jawed
 
Dunno if GPUSA (1.42) is buggered or if the problem lies somewhere else

It's the former. There was a slight change in the interface to the GLSL compiler that we failed to mirror in GSA. It's fixed, and a new version of GSA with the fix should ship in a week or so. I'll let you know when it ships.

Cheers,
GP.
 
w00t! nice to know :)
 
Same code with the new 1.43 GPU ShaderAnalyzer:

        HLSL EstCycle(Tri)   Old GLSL EstCycle(Tri)   New GLSL EstCycle(Tri)
X1300   9                    13                       11
X1900   4.8                  7                        4.8

Now it's much better, but HLSL is still faster than GLSL on the X1300.
 
That doesn't surprise me much; ATI has generally been faster at DX, haven't they?
 