Jawed said: "You'd presumably have at least one VS, one GS and one PS all concurrently loaded and executing. Each would have as many as 16 CBs bound to it."
Yup, it's still a fair amount of data, but much less than having hundreds of copies of each and every piece of data "in flight".
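To put "a fair amount of data" into numbers, here is a back-of-the-envelope sketch. It assumes the D3D10 limits of 16 constant-buffer slots per shader stage and at most 4096 constants of 16 bytes (one float4) per buffer. The function names and the 100-batch comparison are hypothetical illustrations, not a description of what any driver actually does.

```cpp
#include <cassert>
#include <cstddef>

// Assumed D3D10-era limits: 16 constant-buffer slots per shader stage,
// and at most 4096 constants of 16 bytes (one float4) each per buffer.
constexpr std::size_t kSlotsPerStage = 16;
constexpr std::size_t kMaxConstantsPerBuffer = 4096;
constexpr std::size_t kBytesPerConstant = 16;

// Worst-case bytes of constant data bound at once across `stages` shader
// stages, with every slot filled by a maximum-size buffer.
constexpr std::size_t worstCaseBoundBytes(std::size_t stages) {
    return stages * kSlotsPerStage * kMaxConstantsPerBuffer * kBytesPerConstant;
}

// Hypothetical comparison: the same data duplicated once per in-flight batch.
constexpr std::size_t perBatchCopyBytes(std::size_t stages, std::size_t batchesInFlight) {
    return batchesInFlight * worstCaseBoundBytes(stages);
}

// VS + GS + PS: 3 * 16 * 4096 * 16 bytes = 3 MiB.
static_assert(worstCaseBoundBytes(3) == 3u * 1024u * 1024u, "3 MiB for VS+GS+PS");
```

Real applications rarely fill every slot with a maximum-size buffer, so 3 MiB is an upper bound; keeping a private copy per batch would multiply it by the number of batches in flight (about 300 MiB for 100 batches).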
Jawed said: "And presumably it gets more exciting in future when there are multiple contexts executing concurrently."
I think WDDM 2.x and 3.x are where we'll see the driver and hardware engineers stocking up on paracetamol to help with their headaches.
Jawed said: "Additionally, CBs are designed to be changing all the time. This creates a requirement to 'double-buffer' the CBs so that the old CBs are retained as an old shader finishes using them, while a new shader can start immediately with the new CBs."
If you take a look at some of the documentation, it's highly recommended to group CBs by their intended update frequency. That is, you put frequently updated constants in one buffer and the mostly static ones in another. This helps with managing which CBs are resident and where they're located.
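A minimal sketch of grouping constants by update frequency, assuming one buffer for per-frame data and one for per-draw data. `ConstantBuffer`, `flush`, `simulate` and the struct layouts are hypothetical; `flush` only counts uploads where a real renderer would call the API's buffer-update function.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical CPU-side mirror of a constant buffer with dirty tracking.
// A buffer is only "uploaded" when its contents actually changed.
template <typename T>
struct ConstantBuffer {
    T data{};
    bool dirty = true;  // the first use always needs an upload
    int uploads = 0;

    void set(const T& value) {
        if (std::memcmp(&data, &value, sizeof(T)) != 0) {
            data = value;
            dirty = true;
        }
    }
    // Stand-in for a real upload call; here it just counts.
    void flush() {
        if (dirty) { ++uploads; dirty = false; }
    }
};

struct PerFrameConstants { float viewProj[16]; float lightDir[4]; };  // changes once per frame
struct PerObjectConstants { float world[16]; };                        // changes every draw

struct UploadCounts { int perFrame; int perObject; };

// Draw `objects` objects per frame for `frames` frames. The per-frame buffer
// changes once per frame; the per-object buffer changes before every draw.
UploadCounts simulate(int frames, int objects) {
    ConstantBuffer<PerFrameConstants> frameCB;
    ConstantBuffer<PerObjectConstants> objectCB;
    for (int f = 0; f < frames; ++f) {
        PerFrameConstants pf{};
        pf.viewProj[0] = static_cast<float>(f);  // camera moves each frame
        frameCB.set(pf);
        for (int o = 0; o < objects; ++o) {
            PerObjectConstants po{};
            po.world[12] = static_cast<float>(o);  // distinct translation per object
            objectCB.set(po);
            frameCB.flush();   // uploads only on the first draw of a frame
            objectCB.flush();  // uploads on every draw
        }
    }
    return {frameCB.uploads, objectCB.uploads};
}
```

With 10 frames of 50 draws each, the per-frame buffer is uploaded 10 times and the per-object buffer 500 times; had both sets of constants shared a single buffer, the per-frame data would have been re-sent on every one of those 500 draws.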
As for double-buffering, that's probably nothing new: all pipeline configuration is presumably double-buffered somewhere in current-generation hardware. I suppose it might be slightly more complex for CBs, though.
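The double-buffering Jawed describes can be sketched as a buffer with two copies plus a count of in-flight draws per copy. This is a hypothetical model (the class and its methods are invented for illustration); real drivers may use more copies or buffer renaming instead.

```cpp
#include <cassert>

// Hypothetical double-buffered constant buffer: a draw that is still
// executing keeps reading its copy while new contents go into the other one.
template <typename T>
class DoubleBufferedCB {
public:
    // A new draw binds whichever copy is currently the front and remembers it.
    int bind() { ++inFlight_[front_]; return front_; }

    // Called when a draw that used `slot` has finished executing.
    void retire(int slot) { assert(inFlight_[slot] > 0); --inFlight_[slot]; }

    const T& read(int slot) const { return copies_[slot]; }

    // Writes `value` into the back copy and makes it the front. Returns false
    // (changing nothing) if a draw is still reading the back copy; a real
    // driver would then have to stall or allocate another copy.
    bool update(const T& value) {
        const int back = 1 - front_;
        if (inFlight_[back] != 0) return false;
        copies_[back] = value;
        front_ = back;
        return true;
    }

private:
    T copies_[2]{};
    int inFlight_[2]{0, 0};
    int front_ = 0;
};
```

In use, a draw bound before an update keeps seeing the old value, a draw bound after it sees the new one, and a third update is refused until the oldest draw retires.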
Quote: "As for the reading, though, I would expect that the typical scenario would be one where you execute the same instruction on many different pixels in sequence or in parallel. This would make pre-fetching of data from the constant buffers fairly easy, and would also seem to mean that many pixels would use the same value from the constant buffer in sequence, making on-chip caching of constant buffer values a trivial exercise."
Agreed. Even a complex shader isn't likely to have that many "unknowns", even with branching and looping, or at least not compared with the complex programs a CPU executes. So I'd imagine it's possible (though whether it's actually done I don't know) to predict many of a shader's characteristics and access patterns, making pre-fetching and instruction scheduling fairly easy.
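The caching argument can be checked with a toy model: a small direct-mapped cache keyed by constant index, fed by a shader that executes each instruction for every pixel before moving on to the next instruction. The cache size, the instruction list and the names `ConstantCache` and `runShader` are assumptions for illustration, not a description of any real GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical direct-mapped cache of constant-buffer entries, indexed by
// (non-negative) constant register number. It only counts hits and misses.
class ConstantCache {
public:
    explicit ConstantCache(std::size_t lines) : tags_(lines, -1) { assert(lines > 0); }

    bool access(int constantIndex) {
        int& tag = tags_[static_cast<std::size_t>(constantIndex) % tags_.size()];
        if (tag == constantIndex) { ++hits; return true; }
        tag = constantIndex;  // miss: fetch from memory and fill the line
        ++misses;
        return false;
    }

    long hits = 0;
    long misses = 0;

private:
    std::vector<int> tags_;  // -1 marks an empty line
};

// Runs a straight-line shader (one constant read per instruction) over
// `pixels` pixels, instruction by instruction, so every pixel reads the
// same constant back to back.
void runShader(ConstantCache& cache, const std::vector<int>& constantPerInstruction, int pixels) {
    for (int c : constantPerInstruction)
        for (int p = 0; p < pixels; ++p)
            cache.access(c);
}
```

For a six-instruction shader reading constants 0, 1, 2, 1, 0, 3 over 64 pixels with a 16-line cache, only the four first-touch reads miss and the other 380 hit. Because those indices are fixed in the instruction stream (ignoring dynamically indexed constants), they are also known before the shader runs, which is what makes the prediction and pre-fetching described above plausible.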
Jack