Static vs Dynamic loops on X1K

rwolf · Oct 13, 2005

Correct me if I am wrong here...

I understand that with static loops the driver unrolls the shader code. Given the performance of dynamic loops on X1K hardware, would it make more sense to replace the static loop with a dynamic loop, or is the static method still preferred?

991060 · Oct 13, 2005

Unrolling loops will lead to an instruction count explosion if the structure within the loop is very complex or the loop count is high, hence the driver may not always unroll.

There's no need to replace a static loop with a dynamic loop, because it gains you nothing. I assume X1K executes a static loop no slower than a dynamic loop.
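To put a rough number on the instruction count point, here is a minimal Python sketch. It is a toy model, not how any driver actually works: a shader is just a list of strings, the pseudo-assembly is made up for illustration, and `unroll` and `looped` are hypothetical helper names.

```python
def unroll(body, count):
    """Toy model: the unrolled form is the loop body repeated `count` times."""
    unrolled = []
    for i in range(count):
        # Each copy gets its own constant index substituted in.
        unrolled.extend(instr.replace("{i}", str(i)) for instr in body)
    return unrolled


def looped(body, count):
    """Toy model: the rolled form is the body once, wrapped in loop/endloop."""
    rolled_body = [instr.replace("{i}", "[aL]") for instr in body]
    return [f"loop {count}"] + rolled_body + ["endloop"]


# Made-up two-instruction loop body (illustrative pseudo-assembly only).
body = ["mul r0, v0, c{i}", "add r1, r1, r0"]

for count in (4, 64):
    print(count, len(unroll(body, count)), len(looped(body, count)))
```

In this model the unrolled length grows linearly with the loop count while the rolled form stays constant, which is the trade-off the driver has to weigh.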

rwolf · Oct 13, 2005

I thought static loops were always unrolled by the driver, but I may be wrong.

Simon F · Oct 13, 2005

rwolf said:
I thought static loops were always unrolled by the driver, but I may be wrong.

It's easy to support static count loops in the hardware and have them run as fast as the unrolled version.

neliz · Oct 13, 2005

There was a thread about efficiency showing that the R520 was twice as fast when dynamic was turned on...

(maybe the wrong topic)

Oh well, SM3 performance: XL vs. GTX, static and dynamic

http://www.beyond3d.com/forum/showpost.php?p=594337&postcount=57

rwolf · Oct 13, 2005

Simon F said:
It's easy to support static count loops in the hardware and have them run as fast as the unrolled version.

I was thinking that you could remove the overhead of unrolling the static loop in the driver and just branch the code.

Xmas · Oct 13, 2005

rwolf said:
I was thinking that you could remove the overhead of unrolling the static loop in the driver and just branch the code.

That's what Simon is saying. But static branching is easier to implement in hardware because all pixels in a draw call are guaranteed to go the same route. So you don't have to worry about batching, granularity and the order of quads.
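A simplified way to see why batching and granularity only matter for the dynamic case, sketched in Python. The assumption here (mine, for illustration) is a lock-step model where every pixel in a batch occupies the hardware until the pixel with the highest loop count in that batch has finished. `lockstep_cost` is a hypothetical name, and the numbers are invented.

```python
def lockstep_cost(per_pixel_counts, batch_size):
    """Pixel-iterations spent when each batch runs as long as its slowest pixel."""
    cost = 0
    for start in range(0, len(per_pixel_counts), batch_size):
        batch = per_pixel_counts[start:start + batch_size]
        # Assumed model: all pixels in the batch pay for the longest loop.
        cost += max(batch) * len(batch)
    return cost


# Static loop: the count comes from a constant, so every pixel agrees.
static_counts = [8] * 8

# Dynamic loop: one pixel needs 8 iterations, the rest only 2.
dynamic_counts = [2, 2, 2, 8, 2, 2, 2, 2]

for batch_size in (8, 4, 2, 1):
    print(batch_size,
          lockstep_cost(static_counts, batch_size),
          lockstep_cost(dynamic_counts, batch_size))
```

With the static counts the cost is the same for every batch size, so the hardware never has to care how pixels are grouped. With the dynamic counts the cost depends on the batch size, which is where the granularity concerns come from.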

psurge · Oct 13, 2005

Unrolling the loops should also allow a good compiler to reorder instructions across different loop iterations to achieve higher functional unit utilization (or are register constraints too much of an issue for that?)...

rwolf · Oct 13, 2005

Xmas said:
That's what Simon is saying. But static branching is easier to implement in hardware because all pixels in a draw call are guaranteed to go the same route. So you don't have to worry about batching, granularity and the order of quads.

Thanks, I misinterpreted his reply.

akira888 · Oct 13, 2005

psurge said:
Unrolling the loops should also allow a good compiler to reorder instructions across different loop iterations to achieve higher functional unit utilization (or are register constraints too much of an issue for that?)...

If I'm not mistaken, would this not force recompilation of the shader every time the application changed an integer constant that a "loop" instruction uses as its iteration parameter? I've only begun to write for my new GF Go 6800U so I'm probably missing something...

psurge · Oct 13, 2005

Good point - another reason not to do it, I suppose...

Humus · Oct 15, 2005

akira888 said:
If I'm not mistaken, would this not force recompilation of the shader every time the application changed an integer constant that a "loop" instruction uses as its iteration parameter? I've only begun to write for my new GF Go 6800U so I'm probably missing something...

For cases where the constants are known at compile time, though, it will often be unrolled.

DemoCoder · Oct 15, 2005

There's also speculative compilation. The driver can compile several versions and keep them cached: one unrolled for specific values (based on profile statistics), the other not unrolled, using the static branch instruction.
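As a rough illustration of that idea, here is a Python sketch of such a cache. Everything in it is an assumption for the sake of the example: the class name `ShaderVariantCache`, the `hot_threshold` and `max_unroll` parameters, and the toy instruction strings are all hypothetical, and a real driver would be far more involved.

```python
from collections import Counter


class ShaderVariantCache:
    """Toy cache: a generic looped variant plus unrolled variants for hot counts."""

    def __init__(self, body, hot_threshold=3, max_unroll=16):
        self.body = body
        self.hot_threshold = hot_threshold  # uses before a count is considered hot
        self.max_unroll = max_unroll        # cap to avoid instruction explosion
        # Generic variant: reads the loop count from an integer constant (i0).
        self.generic = ["loop i0"] + body + ["endloop"]
        self.unrolled = {}                  # loop count -> unrolled instructions
        self.seen = Counter()               # profile statistics per loop count

    def select(self, loop_count):
        """Pick the variant to run for the loop count the application just set."""
        self.seen[loop_count] += 1
        if loop_count in self.unrolled:
            return self.unrolled[loop_count]
        hot = self.seen[loop_count] >= self.hot_threshold
        if hot and loop_count <= self.max_unroll:
            # Compile once, then reuse the cached unrolled variant.
            self.unrolled[loop_count] = self.body * loop_count
            return self.unrolled[loop_count]
        return self.generic


cache = ShaderVariantCache(["mad r0, r1, c0, r0"])
for _ in range(4):
    print(len(cache.select(4)))
```

The point of the sketch is that changing the integer constant never forces a recompile on the spot: the looped variant is always available, and the unrolled one is only built for counts that keep showing up.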
