Yes, but a debug message it writes out during startup leads me to believe that the D3D10 version has multicore optimizations.
D3D10 is, by default, thread aware/safe now. Now that it has its layers mechanism, I think you have to actually opt out of this feature.
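For reference, the opt-out happens at device-creation time. A minimal sketch, assuming the D3D10_CREATE_DEVICE_SINGLETHREADED flag (pass 0 for the flags to keep the default thread-safe behaviour):

    #include <d3d10.h>

    // Create a device that skips the runtime's internal synchronisation.
    // Only safe if you promise to call it from a single thread.
    ID3D10Device* device = NULL;
    HRESULT hr = D3D10CreateDevice(
        NULL,                                // default adapter
        D3D10_DRIVER_TYPE_HARDWARE,
        NULL,                                // no software rasterizer module
        D3D10_CREATE_DEVICE_SINGLETHREADED,  // opt OUT of thread safety
        D3D10_SDK_VERSION,
        &device);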
Which brings me on to another cool feature that I've yet to be able to try... From the docs I've got, you should be able to swap the RefRast in/out dynamically at runtime, which would make some debugging a bit easier.
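If I'm reading the docs right, the switch is done by creating the device with the D3D10_CREATE_DEVICE_SWITCH_TO_REF flag and then toggling via the ID3D10SwitchToRef interface. Untested sketch, so take the exact calls with a pinch of salt:

    // Device must have been created with D3D10_CREATE_DEVICE_SWITCH_TO_REF.
    ID3D10SwitchToRef* switcher = NULL;
    if (SUCCEEDED(device->QueryInterface(__uuidof(ID3D10SwitchToRef),
                                         (void**)&switcher)))
    {
        switcher->SetUseRef(TRUE);   // subsequent rendering uses the RefRast
        // ... reproduce the dodgy draw calls here ...
        switcher->SetUseRef(FALSE);  // back to hardware
        switcher->Release();
    }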
What are the use cases for doing read/write on the current render target?
Programmable blending is the primary one... I can't think of any specific effects off the top of my head, but the OM stage is one of the few remaining fixed-function sections that can really make a difference to how the final result is written...
If your GS outputs more vertices, or more attributes, you can run even fewer threads (or you need more memory to store the results).
To cover texturing latency, you really want several hundred threads running. That can easily push your on-board memory needs to 0.5 MB - 1 MB. That's as large as whole CPU caches!
The GS is required to allow up to 1024 32-bit values to be written out... thus a single invocation of a GS could output 4 KB of data that the GPU must handle.
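To make that budget concrete: 1024 scalars x 4 bytes = 4 KB per invocation, and it's the declared maximum, not the actual output, that the hardware has to reserve space for. In HLSL terms (the struct layout is just an example I made up):

    struct GSOutput
    {
        float4 pos   : SV_Position;  // 4 scalars
        float4 color : COLOR0;       // 4 scalars
        float2 uv    : TEXCOORD0;    // 2 scalars
    };                               // = 10 scalars per vertex

    // maxvertexcount * scalars-per-vertex must stay <= 1024, so with
    // 10 scalars per vertex the most you can declare is 102 vertices.
    [maxvertexcount(102)]
    void GS(triangle GSOutput input[3],
            inout TriangleStream<GSOutput> stream)
    {
        // simple pass-through shown for brevity
        for (int i = 0; i < 3; ++i)
            stream.Append(input[i]);
    }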
From the developer's point of view, the biggest change IMHO is the banishing of the caps.
That's what people would like to believe, but I'm sceptical. Maybe there'll be somewhat fewer paths, but different paths are most likely going to be a reality in the future as well, as long as hardware has different performance characteristics.
I'm with Humus on this one. The fixed feature set (no more caps bits) is here to stay from everything I've heard, but it isn't some magical solution - at best it solves ~50% of the configuration/compatibility problems.
I expect "performance-related caps" to become a bigger thing in the future - they effectively exist now (a GeForce FX 5200 does SM2, just piss-poor slow!) so it won't be too painful.
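For anyone who hasn't had to live with it, this is the flavour of D3D9-era caps checking that disappears under D3D10's mandatory feature set (illustrative snippet only - pDevice9 is just a stand-in for your IDirect3DDevice9):

    // D3D9: interrogate the device and branch. Every engine grows a
    // forest of these; in D3D10 the equivalent features are guaranteed.
    D3DCAPS9 caps;
    pDevice9->GetDeviceCaps(&caps);

    if (caps.PixelShaderVersion >= D3DPS_VERSION(2, 0) &&
        (caps.TextureCaps & D3DPTEXTURECAPS_CUBEMAP))
    {
        // SM2 + cube map path
    }
    else
    {
        // fallback path (and hope it got tested on this chipset...)
    }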
I have read that with DX10 you can now do a cube map in a single pass, and whatnot.
Yes, this is part of the more abstract resource views and arrays. Single-pass cube mapping is a great example because it's easy to see the advantage, but it's far from the only use (there's a sketch of the trick at the end of this post).
I've not tried it yet, but I think I can use the same technology to fold the classic Fresnel-weighted reflect/refract water effect into 2 passes with D3D10; the same thing takes 4 passes with D3D9. It'll be difficult to compare performance, but the general efficiency of the new API, as well as the flexibility to be more "direct" about implementing algorithms, should yield big performance wins. That means you have performance budget to invest elsewhere - either simply higher resolutions and MSAA, or more effects in more places.
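For the curious, here's roughly what the single-pass cube map trick looks like: bind the cube texture as a 6-slice render-target array and have the GS route each triangle to every face via SV_RenderTargetArrayIndex. A sketch only - g_CubeViewProj is an assumed cbuffer holding the six per-face view-projection matrices:

    cbuffer PerCube
    {
        float4x4 g_CubeViewProj[6]; // one view-proj per cube face (assumed)
    };

    struct GSIn  { float4 wpos : POSITION; };
    struct GSOut
    {
        float4 pos  : SV_Position;
        uint   face : SV_RenderTargetArrayIndex; // selects the RT array slice
    };

    [maxvertexcount(18)] // 3 vertices x 6 faces
    void CubeMapGS(triangle GSIn input[3], inout TriangleStream<GSOut> stream)
    {
        for (uint f = 0; f < 6; ++f)
        {
            GSOut v;
            v.face = f; // route this copy of the triangle to face f
            for (uint i = 0; i < 3; ++i)
            {
                v.pos = mul(input[i].wpos, g_CubeViewProj[f]);
                stream.Append(v);
            }
            stream.RestartStrip();
        }
    }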
Okay, a couple more features that haven't been mentioned yet:
• Effect framework moves into the core runtime, is leaner and meaner
• All shader authoring is now in HLSL, no more assembly shaders
• Comparison filtering methods in the PS - great for PCF/shadow mapping stuff. I've not tried it extensively, but the bits and pieces I've read about D3D9 shadow mapping suggest it can rely on various IHV "quirks" and features to get the best performance/quality. Having it mandated by the core runtime and equal across chipsets strikes me as a big win (see the SampleCmp sketch after this list).
• Material systems being run on the GPU. I wrote a mini-article about this and want to push the work a bit harder - I think it's got a lot of potential.
• The fixed and very rigorously defined calculation/computation rules should not be underestimated.
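As promised above, the comparison filtering in HLSL terms. A minimal PCF sketch - g_ShadowMap and g_ShadowSampler are assumed resources, with the sampler created using a COMPARISON filter and a LESS_EQUAL comparison function:

    Texture2D<float>       g_ShadowMap;     // depth written in an earlier pass
    SamplerComparisonState g_ShadowSampler; // e.g. COMPARISON_MIN_MAG_LINEAR_MIP_POINT

    float ShadowTerm(float3 uvDepth)
    {
        // The hardware compares the reference depth (uvDepth.z) against the
        // fetched texels and returns the *filtered* pass/fail result, i.e.
        // bilinear PCF in one instruction - no IHV-specific tricks needed.
        return g_ShadowMap.SampleCmpLevelZero(g_ShadowSampler,
                                              uvDepth.xy, uvDepth.z);
    }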
Think that's all I've got for now.
Jack