Interesting info in "The Direct3D 10 System (SIGGraph 2006)"

Jawed said:
You'd presumably have at least one VS, one GS and one PS all concurrently loaded and executing. Each would have as many as 16 CBs bound to it.
Yup, it's still a fair amount of data - but much less than having hundreds of copies of each and every piece of data "in flight" :smile:

Jawed said:
And presumably it gets more exciting in future when there are multiple contexts executing concurrently.
I think WDDM 2.x and 3.x are where we see the driver/hardware engineers stocking up on paracetamol to help with their headaches :LOL:

Jawed said:
Additionally CBs are designed to be changing all the time. This creates a requirement to "double-buffer" the CBs so that the old CBs are retained as an old shader finishes using them, while a new shader can start immediately with the new CBs.
If you take a look at some of the documentation, it's highly recommended to group CBs based on their intended update frequency. That is, you put regularly updated constants in the same buffer and leave the mostly static ones in another buffer. This helps with managing which CBs are resident and/or where they're located.
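
To put rough numbers on the grouping advice, here is a back-of-envelope sketch in Python rather than real D3D10 code. The buffer names, sizes and update rates are all made up for illustration, and it assumes a buffer is rewritten as a whole whenever any constant in it changes.

```python
# Hypothetical layout: names, sizes and update rates are illustrative
# assumptions, not taken from the D3D10 documentation.
FLOAT4_BYTES = 16  # one constant register = 4 x 32-bit floats

# (buffer name, size in float4 registers, how often it changes)
GROUPED = [
    ("per_object", 4, "draw"),   # e.g. a world matrix
    ("per_frame", 8, "frame"),   # e.g. view/projection, time
    ("static", 52, "never"),     # e.g. lookup tables, material constants
]

# The same 64 registers in one buffer: the fastest-changing constant
# dictates the update rate of the whole buffer.
MONOLITHIC = [("everything", 64, "draw")]


def bytes_uploaded(buffers, frames, draws_per_frame):
    """Total bytes rewritten if each buffer is re-uploaded whole when it changes."""
    total = 0
    for _name, registers, rate in buffers:
        size = registers * FLOAT4_BYTES
        if rate == "draw":
            total += size * frames * draws_per_frame
        elif rate == "frame":
            total += size * frames
        elif rate == "never":
            total += size  # uploaded once at creation
        else:
            raise ValueError(f"unknown update rate: {rate}")
    return total


if __name__ == "__main__":
    frames, draws = 60, 100
    print("grouped:   ", bytes_uploaded(GROUPED, frames, draws), "bytes")
    print("monolithic:", bytes_uploaded(MONOLITHIC, frames, draws), "bytes")
```

With these invented numbers the single big buffer moves roughly an order of magnitude more data, simply because the static constants get dragged along with every per-draw update.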

As for double-buffering, that's probably nothing new - all pipeline configuration is going to be double-buffered somewhere in current generations. I suppose it might be slightly more complex though.
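
For anyone unfamiliar with the idea, a toy model of double-buffering in Python might look like the following. The class and method names are hypothetical, and real hardware or drivers may well keep more than two copies or use a different scheme entirely.

```python
class DoubleBufferedConstants:
    """Toy model: two fixed slots for one constant buffer.

    In-flight work keeps reading the front slot while the next update
    is written into the back slot. Names here are hypothetical.
    """

    def __init__(self, size):
        self._slots = [[0.0] * size, [0.0] * size]
        self._front = 0  # index of the slot that running shaders read

    def read(self):
        """Return the slot that currently running shaders see."""
        return self._slots[self._front]

    def write(self, values):
        """Overwrite the back slot in place; the front slot is untouched."""
        back = self._slots[1 - self._front]
        if len(values) != len(back):
            raise ValueError("wrong number of constants")
        back[:] = values

    def flip(self):
        """Make the freshly written slot visible to newly started shaders."""
        self._front = 1 - self._front


if __name__ == "__main__":
    cb = DoubleBufferedConstants(2)
    cb.write([1.0, 2.0])
    cb.flip()
    old = cb.read()       # an "old" shader is still using these values
    cb.write([3.0, 4.0])  # the update does not disturb it
    print(old)
    cb.flip()             # new shaders now see the new values
    print(cb.read())
```

Note that with only two slots a third write would overwrite the values the old shader is still holding, which is presumably where the "slightly more complex" part comes in.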

As for the reading, though, I would expect that the typical scenario would be where you execute the same instruction on many different pixels in sequence or in parallel. This would make pre-fetching of data from the constant buffers fairly easy, and would also seem to mean that many pixels would make use of the same value from the constant buffer in sequence, making on-chip caching of constant buffer values a trivial exercise.
Agreed. Even a complex shader isn't likely to have that many "unknowns" to it - even with branching and looping - at least by comparison to the complex programs executed by a CPU. Thus I'd imagine it's possible (but whether it's done I don't know) to predict many of the shader characteristics and access patterns - thus making pre-fetching and instruction scheduling fairly easy.

Jack
 
JHoxley said:
Interesting. That document looks amazingly similar to the actual D3D10 specification in places. I've not checked it out closely, so it might not be... A good find :smile:

Jack

from the document itself:
Much of this material is drawn from (a Beta release of) the Direct3D 10 SDK. The source code, updates to
the sample documentation, as well as more comprehensive reference documentation are available from the
SDK at http://msdn.microsoft.com/directx.
 
Chalnoth said:
But the 4k temporary values still sound insane to me. That's 64 KB just for one in-flight pixel. The only way the hardware could have any reasonable number of pixels in flight with a temporary buffer this large would be if the temporary register arrays were stored in external memory.
How about a small amount of eDRAM? 2 MB would support 32 sets.
But more likely the first, say, 128 temp registers or so would be stored on chip and the rest in special textures.
Edit: actually, if you look at figure 3, it appears they are suggesting 32 on-chip registers, with the rest potentially stored off chip.
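
The arithmetic behind those figures, as a quick Python check. It assumes each temp register is a 4-component vector of 32-bit values (16 bytes), which is what the 64 KB figure in the quote implies; the function names are just for illustration.

```python
REGISTER_BYTES = 4 * 4  # assumed: one register = 4 components x 4 bytes
TEMP_REGISTERS = 4096   # the "4k temporary values" under discussion
KIB = 1024
MIB = 1024 * KIB


def temp_storage_bytes(registers=TEMP_REGISTERS):
    """Bytes needed for one pixel's worth of temp registers."""
    return registers * REGISTER_BYTES


def sets_that_fit(memory_bytes, registers=TEMP_REGISTERS):
    """How many complete register sets fit in a given amount of memory."""
    return memory_bytes // temp_storage_bytes(registers)


if __name__ == "__main__":
    print(temp_storage_bytes() // KIB, "KiB per full set of 4096 registers")
    print(sets_that_fit(2 * MIB), "full sets in 2 MiB")
    print(temp_storage_bytes(128), "bytes for the first 128 registers")
    print(temp_storage_bytes(32), "bytes for the first 32 registers")
```

Keeping only the first 32 or 128 registers on chip cuts the per-pixel cost from 64 KB to 512 bytes or 2 KB respectively, which is why spilling the rest seems the more plausible design.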

At the end of the day, the first chips out will most likely slow to a complete crawl if you try to use any of the resources to the maximum; all the spec really says is that your virtualisation has to handle all these cases.
 