Static vs Dynamic loops on X1K

rwolf

Correct me if I am wrong here...

I understand that with static loops the driver unrolls the shader code. Given the performance of dynamic loops on X1K hardware, would it make more sense to replace the static loop with a dynamic loop, or is the static method still preferred?
 
Unrolling loops leads to an explosion in instruction count if the loop body is very complex or the loop count is high, so the driver may not always unroll.

There's no need to replace a static loop with a dynamic loop, because it gains you nothing. I assume X1K executes a static loop no slower than a dynamic loop.
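
To put rough numbers on the instruction count point, here is a minimal sketch of a toy unroller. The instruction strings are made-up pseudo-assembly, and `unroll` and `rolled` are hypothetical helpers for illustration only, not anything a real driver exposes.

```python
def unroll(body, iterations):
    """Repeat the loop body `iterations` times, substituting the loop
    counter as a literal, roughly what an unrolling compiler would emit."""
    out = []
    for i in range(iterations):
        out.extend(instr.replace("{i}", str(i)) for instr in body)
    return out


def rolled(body, iterations):
    """Keep the loop as a loop: one copy of the body, indexed by a
    hypothetical loop counter register, plus begin/end instructions."""
    kept = [instr.replace("{i}", "aL") for instr in body]
    return [f"loop {iterations}"] + kept + ["endloop"]


# Made-up three-instruction body (pseudo-assembly, not a real shader ISA).
body = ["mov r0, c[{i}]", "mul r0, r0, v0", "add r1, r1, r0"]

for n in (4, 64):
    print(n, len(unroll(body, n)), len(rolled(body, n)))
```

The unrolled length grows linearly with the iteration count while the rolled form stays constant, which is presumably why a driver would decline to unroll long or heavy loops.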
 
rwolf said:
I thought static loops were always unrolled by the driver, but I may be wrong.
It's easy to support static count loops in the hardware and have them run as fast as the unrolled version.
 
Simon F said:
It's easy to support static count loops in the hardware and have them run as fast as the unrolled version.

I was thinking that you could remove the overhead of unrolling the static loop in the driver and just branch the code.
 
rwolf said:
I was thinking that you could remove the overhead of unrolling the static loop in the driver and just branch the code.
That's what Simon is saying. But static branching is easier to implement in hardware because all pixels in a draw call are guaranteed to go the same route. So you don't have to worry about batching, granularity and the order of quads.
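
A rough way to see why granularity matters only for the dynamic case: assume, purely for illustration, that a batch of pixels runs in lockstep, so the whole batch iterates as long as its slowest pixel. The function names and the batch size of four are assumptions, not a description of how X1K actually schedules work.

```python
def batch_iterations(per_pixel_counts):
    """Iterations a lockstep batch executes: every pixel waits for the slowest."""
    return max(per_pixel_counts)


def wasted_lane_iterations(per_pixel_counts):
    """Iterations spent by pixels that had already finished their own loop."""
    longest = batch_iterations(per_pixel_counts)
    return sum(longest - count for count in per_pixel_counts)


# Static loop: the count comes from a constant, so it is identical for all pixels.
static_batch = [8, 8, 8, 8]
# Dynamic loop: each pixel computes its own count (values invented here).
dynamic_batch = [2, 8, 3, 1]

print(wasted_lane_iterations(static_batch))
print(wasted_lane_iterations(dynamic_batch))
```

Under this model the static batch never wastes work, so the hardware has nothing to regroup or reorder. The dynamic batch does, and how much depends on how pixels happen to be grouped.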
 
Unrolling the loops should also allow a good compiler to reorder instructions across different loop iterations to achieve higher functional unit utilization (or are register constraints too much of an issue for that?)...
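
The reordering idea can be sketched with a toy model: a single-issue, in-order machine where each instruction depends on the previous instruction of the same iteration and has to wait `latency` cycles for its result. The machine model, the latency value, and the assumption that iterations are independent of each other are all invented for illustration.

```python
def cycles(schedule, latency):
    """Cycle at which the last result is available on the toy machine.

    `schedule` lists the iteration id of each issued instruction, in issue
    order. An instruction stalls until the previous instruction from the
    same iteration has produced its result.
    """
    ready = {}  # iteration id -> cycle at which its latest result is available
    clock = 0
    for iteration in schedule:
        clock = max(clock, ready.get(iteration, 0))  # stall on the dependency
        ready[iteration] = clock + latency           # result arrives later
        clock += 1                                   # one issue slot per cycle
    return max(ready.values())


# Two iterations of a three-instruction dependent chain.
rolled_order = [0, 0, 0, 1, 1, 1]       # loop kept: iterations run back to back
interleaved_order = [0, 1, 0, 1, 0, 1]  # unrolled and interleaved by the compiler

print(cycles(rolled_order, latency=3))
print(cycles(interleaved_order, latency=3))
```

The interleaved order fills stall slots with work from the other iteration, but both iterations' temporaries are live at the same time, which is where the register constraint in the question comes in.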
 
Xmas said:
That's what Simon is saying. But static branching is easier to implement in hardware because all pixels in a draw call are guaranteed to go the same route. So you don't have to worry about batching, granularity and the order of quads.

Thanks, I misinterpreted his reply.
 
psurge said:
Unrolling the loops should also allow a good compiler to reorder instructions across different loop iterations to achieve higher functional unit utilization (or are register constraints too much of an issue for that?)...
If I'm not mistaken, wouldn't this force recompilation of the shader every time the application changed an integer constant that a "loop" instruction uses as its iteration parameter? I've only begun to write for my new GF Go 6800U, so I'm probably missing something...
 
akira888 said:
If I'm not mistaken, wouldn't this force recompilation of the shader every time the application changed an integer constant that a "loop" instruction uses as its iteration parameter? I've only begun to write for my new GF Go 6800U, so I'm probably missing something...

For cases where the constants are known at compile time, though, it will often be unrolled.
 
There's also speculative compilation. The driver can compile several versions and keep them cached: one unrolled for specific values (based on profile statistics), the other not unrolled, using the static branch instruction.
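
As a minimal sketch of what such a cache might look like: `ShaderVariantCache`, `hot_counts`, and the pseudo-assembly strings are all hypothetical names chosen for illustration, and a real driver's bookkeeping is certainly more involved.

```python
class ShaderVariantCache:
    """Hypothetical driver-side cache: unrolled variants for frequently seen
    loop counts, plus one generic variant that keeps the static loop."""

    def __init__(self, body, hot_counts):
        # Pre-compile unrolled variants for the counts profiling flagged as common.
        self.unrolled = {n: body * n for n in hot_counts}
        # Generic fallback: the body appears once and the loop instruction
        # reads the iteration count from an integer constant at run time.
        self.generic = ["loop i0"] + body + ["endloop"]

    def select(self, loop_count):
        """Pick a variant when the application sets the integer loop constant."""
        return self.unrolled.get(loop_count, self.generic)


cache = ShaderVariantCache(["mul r0, r0, v0", "add r1, r1, r0"], hot_counts=[4])

print(len(cache.select(4)))  # profiled value: unrolled variant
print(len(cache.select(7)))  # any other value: generic loop, no recompile
```

In this sketch, changing the integer constant to a value that was not profiled simply falls back to the generic variant, so nothing has to be recompiled on the spot.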
 