Beyond Programmable Shading SIGGRAPH 2010 slides posted

Hi All,
Mike Houston and I have posted most of the slides from our SIGGRAPH 2010 Beyond Programmable Shading course (we'll post the remaining slides early next week). I encourage you to look through the slides, as we've reworked much of the course and the speakers created a substantial amount of new content. Also note that we've added an all-in-one zip file download option in addition to the individual PDFs.

http://bps10.idav.ucdavis.edu/

Aaron
 
"Decoupled Sampling for Real-Time Graphics Pipelines" is very interesting. I really need to find time to wrap my head around it.
 
Thanks Aaron, and howdy Mike if you check out this thread!

This will cause me great headaches and confusion tonight, thanks again. ;)
 
And let us know if you find any errors in the slides so we can fix them. A few have already been found in my intro timeline. ;-)
 
One common theme across the presentations seems to be that CS/CUDA aren't really as useful as expected to address some of the limitations of the DirectX/OpenGL pipelines. Why is that? Is it just too difficult for the graphics and compute APIs to co-operate? Much of the focus seems to be on shaking up the graphics APIs as opposed to de-emphasizing those in favor of compute.
 
One common theme across the presentations seems to be that CS/CUDA aren't really as useful as expected to address some of the limitations of the DirectX/OpenGL pipelines.
I'm not sure what expectations and limitations you're referring to specifically. There's a fair bit of CS usage (all the tile-based deferred rendering stuff) but it is obviously limited. It's also worth noting the difference between things that *can* be implemented in the CS/CUDA/OCL model and things that can be implemented *efficiently*. Micropolygon rendering and rasterization appear to be two good examples of this.
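For reference, here's a rough CUDA-flavoured sketch of what the tile-based deferred approach boils down to. The buffer layouts, struct names and the simplified depth-only culling test are purely my own illustration, not code from any of the slides; a real implementation would also test lights against the tile's side planes and reconstruct position from depth rather than storing it.

```cuda
// Hypothetical G-buffer layout and light structure -- purely illustrative.
struct PointLight { float3 posView; float radius; float3 color; float pad; };

#define TILE 16
#define MAX_LIGHTS_PER_TILE 64

__device__ float3 f3sub(float3 a, float3 b) { return make_float3(a.x-b.x, a.y-b.y, a.z-b.z); }
__device__ float  f3dot(float3 a, float3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

__global__ void shadeTiles(const float*  depthView,   // linear view-space depth per pixel
                           const float4* gbufAlbedo,  // albedo in .xyz
                           const float4* gbufNormal,  // view-space normal in .xyz
                           const float4* gbufPos,     // view-space position in .xyz
                           const PointLight* lights, int numLights,
                           float4* outColor, int width, int height)
{
    __shared__ unsigned int sMinZ, sMaxZ;   // tile depth bounds as float bit patterns
    __shared__ int sNumTileLights;
    __shared__ int sTileLights[MAX_LIGHTS_PER_TILE];

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    int tid = threadIdx.y * TILE + threadIdx.x;
    bool live = (x < width && y < height);
    int pix = y * width + x;

    if (tid == 0) { sMinZ = 0x7f7fffffu; sMaxZ = 0u; sNumTileLights = 0; }
    __syncthreads();

    // 1. Reduce per-tile depth bounds (bit-pattern atomics preserve ordering for z >= 0).
    if (live) {
        unsigned int zBits = __float_as_uint(depthView[pix]);
        atomicMin(&sMinZ, zBits);
        atomicMax(&sMaxZ, zBits);
    }
    __syncthreads();
    float tileMinZ = __uint_as_float(sMinZ);
    float tileMaxZ = __uint_as_float(sMaxZ);

    // 2. Cooperatively cull lights against the tile's depth interval.
    //    (A real version also tests the tile's four side planes.)
    for (int i = tid; i < numLights; i += TILE * TILE) {
        PointLight L = lights[i];
        if (L.posView.z + L.radius >= tileMinZ && L.posView.z - L.radius <= tileMaxZ) {
            int slot = atomicAdd(&sNumTileLights, 1);
            if (slot < MAX_LIGHTS_PER_TILE) sTileLights[slot] = i;
        }
    }
    __syncthreads();

    // 3. Shade each pixel with only the lights that survived for this tile.
    if (!live) return;
    float3 albedo = make_float3(gbufAlbedo[pix].x, gbufAlbedo[pix].y, gbufAlbedo[pix].z);
    float3 n      = make_float3(gbufNormal[pix].x, gbufNormal[pix].y, gbufNormal[pix].z);
    float3 p      = make_float3(gbufPos[pix].x,    gbufPos[pix].y,    gbufPos[pix].z);
    float3 c      = make_float3(0.f, 0.f, 0.f);

    int count = min(sNumTileLights, MAX_LIGHTS_PER_TILE);
    for (int k = 0; k < count; ++k) {
        PointLight L = lights[sTileLights[k]];
        float3 d = f3sub(L.posView, p);
        float dist = sqrtf(fmaxf(f3dot(d, d), 1e-8f));
        if (dist > L.radius) continue;
        float ndotl = fmaxf(f3dot(n, make_float3(d.x/dist, d.y/dist, d.z/dist)), 0.f);
        float atten = 1.f - dist / L.radius;          // crude falloff, illustration only
        c.x += albedo.x * L.color.x * ndotl * atten;
        c.y += albedo.y * L.color.y * ndotl * atten;
        c.z += albedo.z * L.color.z * ndotl * atten;
    }
    outColor[pix] = make_float4(c.x, c.y, c.z, 1.f);
}
```

Launched with one 16x16 block per screen tile (e.g. grid of ((w+15)/16, (h+15)/16) blocks), the point is that the light list is built once per tile in shared memory and the G-buffer is read once per pixel, instead of doing a full-screen pass per light.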

Why is that? Is it just too difficult for the graphics and compute APIs to co-operate?
No, since CS has the perfect model for that. It's just that the languages themselves abstract away a lot of things, which sometimes makes it difficult to write efficient code. Furthermore, there are things that the hardware simply does not do well at the moment.
 
I'm not sure what expectations and limitations you're referring to specifically.

The limitations outlined in the presentations. The expectation was that the introduction of the compute shader would free developers to innovate outside of those limitations. But it seems that the compute APIs are either still too limited or too slow themselves (hence the need for extensions to fixed-function hardware and API support).

No, since CS has the perfect model for that. It's just that the languages themselves abstract away a lot of things, which sometimes makes it difficult to write efficient code. Furthermore, there are things that the hardware simply does not do well at the moment.

Ok, I know it's more nuanced than that but it seems like you're answering both Yes and No to my question :???:
 
I find slide 53's dismissal of deferred shading in Fatahalian's Micropolygon talk pretty contentious.
 
What do you find contentious there? Deferred shading does have high bandwidth costs and complications with MSAA usage.
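To put a very rough number on the bandwidth point (my own back-of-envelope assumptions, not figures from the talk): take a hypothetical G-buffer of four RGBA8 targets plus 32-bit depth, i.e. 20 bytes per pixel, at 1920x1080:

```latex
\begin{aligned}
\text{G-buffer, no MSAA:} &\quad 1920 \times 1080 \times 20\ \text{B} \approx 41\ \text{MB} \\
\text{with } 4\times \text{MSAA:} &\quad 41\ \text{MB} \times 4 \approx 166\ \text{MB} \\
\text{one write + one read:} &\quad 2 \times 166\ \text{MB} \approx 332\ \text{MB per frame} \\
\text{at 60 fps:} &\quad 332\ \text{MB} \times 60 \approx 20\ \text{GB/s of G-buffer traffic alone}
\end{aligned}
```

And that's before any lighting output, overdraw in the geometry pass, or reading multiple samples per pixel at edges. Whether that's a deal-breaker obviously depends on the hardware and the scene, but the cost is real.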
 
Well, I looked at the paper and found the same glib justification used there. So it seems there's been no attempt to measure and compare results.

Fragment merging produces artefacts. How do these compare with the "MSAA artefacts" of deferred shading algorithms implemented under D3D11? How's the performance?

Zilch.
 
That's something I noticed as well. Especially this point:
Interacts poorly with multi-sample anti-aliasing and shader derivatives
(no, Direct3D 10.1 multi-sample access doesn’t solve this)
Why doesn't it solve this? In what cases?
 
"Decoupled Sampling for Real-Time Graphics Pipelines" is very interesting. I really need to find time to wrap my head around it.
Not to be the wet blanket that I usually tend to be, but he seems to be talking about a problem which is already mostly solved in the case of REYES. Yeah, it was a problem that existed in the 80s, but Pixar has done loads of work since then on improving their visibility culling, including bucketed z-prepass, layer masking, etc. It's not always 100% perfect, simply because REYES allows topology changes to occur at shade time, but at this point they are able to get it very nearly perfect even in a contrived bad case, as well as get 100% perfect culling in the majority of cases. If you do force all topology changes to occur upstream of shading, you can basically get perfect visibility culling and do a fully deferred shading pipeline.
 
The expectation was that the introduction of the compute shader would free developers to innovate outside of those limitations.
I see what you mean. I'd say that it does free you of some limitations (e.g. look at all the cool stuff we're doing now in DX11 from these presentations and otherwise... most of it is compute related) but not others. In particular, limitations that are tightly tied to the pipeline are not really addressed by the current model.

Ok, I know it's more nuanced than that but it seems like you're answering both Yes and No to my question :???:
I was just saying that the problem with CS is not that it isn't integrated enough (it is - it's a first-class API in DX) but rather that the programming models of CS/CUDA/OCL themselves are limited in a number of important ways.

Why doesn't it solve this? In what cases?
As I discussed a bit in my slides, you still lose the nice decoupling of visibility/shading that MSAA gives you with forward rendering. DirectX 10.1/11 gives you the tools to rip the frequencies back apart, but it is not as elegant and scalable as MSAA with forward rendering. Of course forward rendering has scalability issues of its own, just in different areas. I personally think the latter are the more significant and crippling going forward (i.e. for new techniques we need to make them work with deferred rendering, not the other way around), but it's worth being clear about the weaknesses of all of these techniques.
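As a concrete (and heavily simplified) illustration of "ripping the frequencies back apart", the usual shape of it looks something like the CUDA-flavoured sketch below: read all the MSAA subsamples of the G-buffer, classify the pixel, and only shade per sample where the subsamples actually disagree. The struct layout, the same-surface heuristic and the stand-in shading function are placeholders of mine, not code from the slides.

```cuda
// Hypothetical flat layout for a 4x MSAA G-buffer: the samples of a pixel are contiguous.
#define MSAA_SAMPLES 4

struct GSample { float3 normal; float depth; float3 albedo; float pad; };

__device__ bool sameSurface(const GSample& a, const GSample& b) {
    // Heuristic "same surface" test on depth and normal -- the thresholds are arbitrary here.
    float nn = a.normal.x*b.normal.x + a.normal.y*b.normal.y + a.normal.z*b.normal.z;
    return fabsf(a.depth - b.depth) < 0.01f * fmaxf(a.depth, 1e-4f) && nn > 0.99f;
}

__device__ float3 shadeSample(const GSample& s) {
    // Stand-in for the real lighting loop: simple headlight diffuse (light along -Z).
    float ndotl = fmaxf(s.normal.z, 0.f);
    return make_float3(s.albedo.x*ndotl, s.albedo.y*ndotl, s.albedo.z*ndotl);
}

__global__ void resolveDeferredMSAA(const GSample* gbuf, float4* outColor,
                                    int width, int height)
{
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;
    const GSample* s = &gbuf[(y*width + x) * MSAA_SAMPLES];

    // Classify: do all subsamples of this pixel belong to the same surface?
    bool edge = false;
    for (int i = 1; i < MSAA_SAMPLES; ++i)
        if (!sameSurface(s[0], s[i])) { edge = true; break; }

    float3 c;
    if (!edge) {
        // Interior pixel: shade once and reuse -- the decoupling forward MSAA gives for free.
        c = shadeSample(s[0]);
    } else {
        // Edge pixel: shade every subsample and average (effectively supersampled shading).
        c = make_float3(0.f, 0.f, 0.f);
        for (int i = 0; i < MSAA_SAMPLES; ++i) {
            float3 ci = shadeSample(s[i]);
            c.x += ci.x; c.y += ci.y; c.z += ci.z;
        }
        c.x /= MSAA_SAMPLES; c.y /= MSAA_SAMPLES; c.z /= MSAA_SAMPLES;
    }
    outColor[y*width + x] = make_float4(c.x, c.y, c.z, 1.f);
}
```

In practice you wouldn't branch per pixel like this either; real implementations usually split edge and interior pixels into separate passes (via stencil or a compacted list) to avoid divergence, which is exactly the kind of extra machinery that makes it less elegant than forward MSAA.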
 
Not to be the wet blanket that I usually tend to be, but he seems to be talking about a problem which is already mostly solved in the case of REYES. Yeah, it was a problem that existed in the 80s, but Pixar has done loads of work since then on improving their visibility culling, including bucketed z-prepass, layer masking, etc. It's not always 100% perfect, simply because REYES allows topology changes to occur at shade time, but at this point they are able to get it very nearly perfect even in a contrived bad case, as well as get 100% perfect culling in the majority of cases. If you do force all topology changes to occur upstream of shading, you can basically get perfect visibility culling and do a fully deferred shading pipeline.
While REYES effectively decouples shading and visibility, it requires shading at the vertices and therefore tessellating your geometry down to micropolygon level (unless you are a big fan of Gouraud shading :) ), which is something you definitely don't want to do in the general case. Moreover, for many reasons it is preferable to efficiently decouple shading and visibility by making relatively small changes to current graphics hardware (where/when possible).
 
While REYES effectively decouples shading and visibility, it requires shading at the vertices and therefore tessellating your geometry down to micropolygon level (unless you are a big fan of Gouraud shading :) ), which is something you definitely don't want to do in the general case. Moreover, for many reasons it is preferable to efficiently decouple shading and visibility by making relatively small changes to current graphics hardware (where/when possible).
If you're shading micropolys in the first place, as you would with REYES or DWA's renderer or many other production rendering architectures, you're already in a different problem space than a realtime in-game rendering system. In the case of not working with micropolys, I would think that separating shading complexity from rasterization complexity is already largely dealt with through application of Z-prepass and/or deferred shading. Where you're generally stuck either way is transparency, but there's no real way to avoid that.

OTOH, if you already know ahead of time that all your polys are going to be micropolys and at most 1-pixel sized, then you can pretty much turn the problem on its head where the basic working unit is a polygon and not a pixel. A pixel becomes nothing more than a coverage mask saying which polygons cover this pixel and by what fraction. You just shade polygons that are visible and accumulate the fractional contributions to the pixels.
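Roughly what I have in mind, as a CUDA-flavoured sketch: every structure and name below is made up for illustration, and it assumes visibility and coverage fractions have already been resolved upstream, which is the part this deliberately leaves out.

```cuda
// One thread per visible micropolygon. Coverage is assumed resolved upstream:
// each micropoly carries the pixel(s) it touches and the visible fraction of
// each of those pixels that it covers. Layouts are illustrative only.
#define MAX_COVERED 4   // a <=1-pixel micropoly can straddle at most a few pixels

struct MicroPoly {
    float3 normal;                 // per-poly shading inputs (flat-shaded for brevity)
    float3 albedo;
    int    pixel[MAX_COVERED];     // covered pixel indices, -1 if unused
    float  fraction[MAX_COVERED];  // visible coverage fraction of that pixel
};

__device__ float3 shadePoly(const MicroPoly& mp) {
    // Stand-in shader: evaluated once per polygon, not once per pixel sample.
    float ndotl = fmaxf(mp.normal.z, 0.f);
    return make_float3(mp.albedo.x*ndotl, mp.albedo.y*ndotl, mp.albedo.z*ndotl);
}

__global__ void shadeAndSplat(const MicroPoly* polys, int numPolys,
                              float* accumR, float* accumG, float* accumB,
                              float* accumCoverage)
{
    int i = blockIdx.x*blockDim.x + threadIdx.x;
    if (i >= numPolys) return;

    float3 c = shadePoly(polys[i]);   // shade once per visible polygon

    // Scatter the fractional contribution into every pixel this poly covers.
    for (int k = 0; k < MAX_COVERED; ++k) {
        int p = polys[i].pixel[k];
        if (p < 0) continue;
        float w = polys[i].fraction[k];
        atomicAdd(&accumR[p], c.x * w);
        atomicAdd(&accumG[p], c.y * w);
        atomicAdd(&accumB[p], c.z * w);
        atomicAdd(&accumCoverage[p], w);   // any remaining (1 - coverage) is background
    }
}
```

The important property is that the shader runs once per visible micropolygon regardless of how many samples or pixels it touches; the per-pixel work is just the weighted accumulation.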
 
In the case of not working with micropolys, I would think that separating shading complexity from rasterization complexity is already largely dealt with through application of Z-prepass and/or deferred shading.
To some extent, yes (although less so Z-prepass, which still has some scheduling inefficiencies), but these don't work with motion blur/DOF. We'd need a more complicated data structure than a simple Z-buffer to define the required samples in those cases. I have yet to see such a structure that can be efficiently generated, sampled and scheduled by the GPU and ideally works well with deferred rendering. GPUs are getting better but they still take a really large penalty for irregularity that goes beyond the standard pipeline :S
 
I would think that separating shading complexity from rasterization complexity is already largely dealt with through application of Z-prepass and/or deferred shading.
Deferred shading is a method, not *the* method. JRK's work has some nice properties, such as working in 5D, tunable sampling rates, etc. (and it can be used within a deferred shading approach too), and it's not really about micropolygon rendering.
 
Hi All,
Mike Houston and I have posted most of the slides from our SIGGRAPH 2010 Beyond Programmable Shading course (we'll post the remaining slides early next week). I encourage you to look through the slides, as we've reworked much of the course and the speakers created a substantial amount of new content. Also note that we've added an all-in-one zip file download option in addition to the individual PDFs.

http://bps10.idav.ucdavis.edu/

Aaron

Excellent - thanks for posting.

David
 