I have yet to see an analysis of this. Does anyone know the difference between Nehalem's SMT implementation and that of Netburst?
Probably not the site you want to read from but I found this:
http://www.tomshardware.com/reviews/Intel-i7-nehalem-cpu,2041-6.html
and
http://www.behardware.com/articles/733-2/report-intel-nehalem-architecture.html
Per-thread performance and perceived latency won't really improve.
Hardware utilization goes up because the execution units sit idle less often, but each individual context won't run significantly faster.
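Niagara and the dual-core Itanium 2 (Montecito), if I'm not mistaken, don't use SMT but forms of temporal multithreading (TMT), where only one thread per core issues in any given cycle. On Itanium 2 the switch is event-driven: when something like a load-latency threshold is crossed, the stalled thread is "stunned" and the next one is issued to the pipeline. Niagara, as far as I know, is finer-grained and rotates among its ready threads every cycle, skipping any that are stalled.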
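To make the switch-on-event idea concrete, here is a toy model in Python. It is only a sketch under made-up assumptions: the function name `run_switch_on_event`, the `'C'`/`'L'` op encoding and the 4-cycle miss latency are all hypothetical and do not describe any real core.

```python
def run_switch_on_event(threads, miss_latency=4):
    """Toy model of one core that issues from a single thread per cycle.

    threads: list of strings; 'C' is a one-cycle compute op, 'L' is a load
    that always misses and stalls its thread for miss_latency cycles
    (hypothetical numbers, chosen only for illustration).
    Returns (total_cycles, busy_cycles, finish_cycle_per_thread).
    """
    n = len(threads)
    pc = [0] * n            # next op index for each thread
    ready_at = [0] * n      # cycle at which each thread may issue again
    finish = [None] * n     # cycle at which each thread completed
    active = 0              # thread currently owning the pipeline
    cycle = 0
    busy = 0                # cycles in which an op was issued

    while any(f is None for f in finish):
        # Stay on the active thread if it can issue, else take the next ready one.
        chosen = None
        for k in range(n):
            t = (active + k) % n
            if finish[t] is None and ready_at[t] <= cycle:
                chosen = t
                break
        if chosen is None:
            cycle += 1      # every thread is stalled: the pipeline idles
            continue

        active = chosen
        op = threads[chosen][pc[chosen]]
        pc[chosen] += 1
        busy += 1
        cycle += 1
        if op == 'L':
            # The event: stall this thread and hand the pipeline to the next one.
            ready_at[chosen] = cycle + miss_latency
            active = (chosen + 1) % n
        if pc[chosen] == len(threads[chosen]):
            finish[chosen] = max(cycle, ready_at[chosen])

    return max(finish), busy, finish


alone = run_switch_on_event(["CLCC"])
shared = run_switch_on_event(["CLCC", "CLCC"])
print(alone)   # (8, 4, [8])
print(shared)  # (10, 8, [8, 10])
```

In this toy run the pipeline is busy 4 of 8 cycles with one thread and 8 of 10 with two, yet the first thread still finishes at cycle 8 either way. That is the same point as above: utilization goes up, per-thread latency does not.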