Anyone got a NV40...?

Xmas · May 28, 2004

... to check the execution time of this little PS snippet?

Code:

 ps_2_x

dcl_2d s0
dcl_2d s1
dcl t0

texld r2, t0, s0
dsx r0.xy, t0
dsy r1.xy, t0
add r2.xy, t0, r2
texldd r0, r2, s1, r0, r1
mov oC0, r0

NVShaderPerf says 6 cycles for CineFX1 and 5 for CineFX2, but unfortunately it doesn't provide numbers for NV40 yet. My guess is that it should take either three or four cycles, but I'm not sure of that.

edit: added write mask to add.

KimB · May 28, 2004

Well, if it can co-issue the dsx/dsy instructions, and execute them at the same time as that first texld instruction, it seems like it should take 2 cycles. Otherwise I would expect 3 cycles.

Anyway, I've been keeping my eyes open for a 6800, and I'm hoping to get one in just over a week.

Ostsol · May 28, 2004

So coissuing does not have to occur on the same register?

Xmas · May 28, 2004

I'm pretty sure it can co-issue two 2D-dsx/dsy, but I'd expect them to take up SU1. The add takes place in SU1, too, so texldd starts in cycle 3. And texldd takes two cycles on NV3x.

Xmas · May 28, 2004

Ostsol said:
So coissuing does not have to occur on the same register?

No, you can use different registers.

Ostsol · May 28, 2004

Ah. . . In that case it should take 3 cycles, from what I understand from the info released.

1. tex, vec2 -- vec2 coissue
2. add
3. tex, mov

KimB · May 29, 2004

Xmas said:
I'm pretty sure it can co-issue two 2D-dsx/dsy, but I'd expect them to take up SU1. The add takes place in SU1, too, so texldd starts in cycle 3. And texldd takes two cycles on NV3x.

I had thought that the first shader unit was the special function unit, and the second one is the one that does the adds. Anyway, it will be something to test.

Tridam · May 29, 2004

I'll try it tomorrow.

I think that it will need 4 cycles to run.

991060 · May 29, 2004

I modified the code to:

Code:

 ps_2_x 

dcl_2d s0 
dcl t0 

texld r2, t0, s0 
dsx r0.xy, t0 
dsy r1.xy, t0 
add r2.xy, t0, r2 
texldd r0, r2, s0, r0, r1 
mov oC0, r0

because the fillrate tester my friend used doesn't have 2 textures.

The result he obtained showed that the code needs roughly 11 clocks to finish!!! This is kinda weird.

KimB · May 29, 2004

That would be a result of the hardware not being able to hide the latency of the dependent texture read. What hardware took 11 clocks?

digitalwanderer · May 29, 2004

Could someone translate this into thicky-Digispeak? I'm hopelessly lost here trying like hell to understand this.

PeterAce · May 30, 2004

What apps/tools are you guys using to test these shaders?

Would you be able to post a link to the program(s) here (as I would like to test a few myself).

cho · May 30, 2004

PeterAce said:
What apps/tools are you guys using to test these shaders?

Would you be able to post a link to the program(s) here (as I would like to test a few myself).

http://www.gzeasy.com/ours/benchmark/FillrateBenchmark_V092_b.zip

Double-click "customized Pixel Shader", then fill in the pixel shader...

PeterAce · May 30, 2004

cho said:
PeterAce said:

What apps/tools are you guys using to test these shaders?

Would you be able to post a link to the program(s) here (as I would like to test a few myself).

http://www.gzeasy.com/ours/benchmark/FillrateBenchmark_V092_b.zip

Double-click "customized Pixel Shader", then fill in the pixel shader...

Thanks, works like a charm.

Tridam · May 30, 2004

Xmas said:
I'm pretty sure it can co-issue two 2D-dsx/dsy, but I'd expect them to take up SU1. The add takes place in SU1, too, so texldd starts in cycle 3. And texldd takes two cycles on NV3x.

It doesn't seem to be the case. It seems that only one 2D dsx/dsy can be done per cycle and that it requires the 2 ALUs.

Tridam · May 30, 2004

Xmas said:
... to check the execution time of this little PS snippet?

Code:

ps_2_x

dcl_2d s0
dcl_2d s1
dcl t0

texld r2, t0, s0
dsx r0.xy, t0
dsy r1.xy, t0
add r2.xy, t0, r2
texldd r0, r2, s1, r0, r1
mov oC0, r0

NVShaderPerf says 6 cycles for CineFX1 and 5 for CineFX2, but unfortunately it doesn't provide numbers for NV40 yet. My guess is that it should take either three or four cycles, but I'm not sure of that.

edit: added write mask to add.

Here is what seems to be happening:

Code:

cycle 1 :
dsx r0.xy, t0

cycle 2 :
dsy r1.xy, t0

cycle 3 :
texld r2, t0, s0
add r2.xy, t0, r2

cycles 4-12
texldd r0, r2, s1, r0, r1

I think that there will be some improvements with a better compiler.
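
For anyone who wants to sanity-check the totals, here is a minimal Python sketch that simply tallies the cycle spans listed in the schedule above. The dictionary layout and the helper names (`span_length`, `total_cycles`, `share`) are made up for illustration; the only data is the cycle ranges from that schedule, and the sketch says nothing about why texldd takes as long as it does.

```python
# Cycle spans as (first_cycle, last_cycle), copied from the schedule above.
# The structure and helper names are illustrative assumptions, not tool output.
observed = {
    "dsx r0.xy, t0": (1, 1),
    "dsy r1.xy, t0": (2, 2),
    "texld r2, t0, s0 + add r2.xy, t0, r2": (3, 3),
    "texldd r0, r2, s1, r0, r1": (4, 12),
}


def span_length(span):
    """Number of cycles covered by an inclusive (first, last) span."""
    first, last = span
    return last - first + 1


def total_cycles(schedule):
    """Last cycle used by any entry, i.e. the length of the whole schedule."""
    return max(last for _, last in schedule.values())


def share(schedule, name):
    """Fraction of the whole schedule spent on one entry."""
    return span_length(schedule[name]) / total_cycles(schedule)


texldd = "texldd r0, r2, s1, r0, r1"
print(total_cycles(observed))            # 12
print(span_length(observed[texldd]))     # 9
print(f"{share(observed, texldd):.0%}")  # 75%
```

Under that schedule, texldd alone accounts for 9 of the 12 cycles, so the arithmetic instructions are a small part of the total.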

KimB · May 30, 2004

How did you test that, Tridam?

Tridam · May 30, 2004

Chalnoth said:
How did you test that, Tridam?

I've run some variations of this shader. No single variation tells you much on its own, but taken together the results give a clearer picture of what's going on.

Xmas · May 31, 2004

Chalnoth said:
Xmas said:

I'm pretty sure it can co-issue two 2D-dsx/dsy, but I'd expect them to take up SU1. The add takes place in SU1, too, so texldd starts in cycle 3. And texldd takes two cycles on NV3x.

I had thought that the first shader unit was the special function unit, and the second one is the one that does the adds. Anyway, it will be something to test.

Yes, the first unit is SF/MUL/TEX, but that would be SU0. dsx/dsy, being a SUB, should run in SU1.

Tridam said:
Here is what seems to be done :

Code:

cycles 4-12
texldd r0, r2, s1, r0, r1

I think that there will be some improvements with a better compiler.

KimB · May 31, 2004

By the way, as far as the compiler is concerned, I really wouldn't expect much benefit for short shaders like this one. The benefit will be in longer, more complex shaders where latency hiding can be done.
