Xenos ALUs can stall when you have very short loops (say 2 instructions), or when you don't have enough ALU code to hide TEX latency.
One solution is to re-sequence your code. e.g. unroll loops or pre-fetch textures.
The sequencer in Xenos is programmable, allowing devs to fine-tune the coherency of code execution. That is, to define thread-switching points in their code, rather than allowing Xenos to apply its default behaviour. This means Xenos will run with less thread-switches (in total), where every thread-switch costs time through latency (if the latency can't otherwise be hidden).
Xenos's default behaviour is to thread-switch whenever it sees a latency-inducing instruction (e.g. TEX or BR). So, by lumping TEX operations together and then saying "now you can switch threads", Xenos can reduce the 2 or 3 separate thread-switches to a single thread-switch. That reduces the total number of ALU instructions that are required to hide these TEX instructions.
The first task for devs is to tweak data formats (stored as textures or vertex streams) so that access patterns are efficient. i.e. the minimum number of fetches are performed. Additionally, since Xenos offers a choice of access techniques, the dev has to evaluate them.
In a unified architecture, you can't evaluate the performance of a shader in isolation. You can't write a shader and say "TEX ops take this long, ALU ops this long and branches this long, so the total time is xxx, so we can get X fps". You can only say that's the minimum time they'll take. When the pipelines are doing a mix of work (for other pixels, say, or for vertices as well as pixels) then bandwidth limits or full buffers will cause blips. Ultimately the programmer is exposed to concurrency issues.
Another way of putting it is that the programmer has better control of concurrency issues in Xenos - in traditional GPUs, when resource limits are reached, the dev has to tackle the problem indirectly, rewriting code in the hope that the new usage pattern will eke-out better performance. In theory Xenos provides direct control and more options to control execution and resource usage.
Since the SDK for Xenos is still not complete, devs are currently in see-but-can't-touch hell...
Naturally, as someone who has never written shader code, nor coded for Xenos, I can only summarise the general concepts.
Jawed