Revealing The Power of DirectX 11

http://anandtech.com/video/showdoc.aspx?i=3507

This is DirectX 10.
dx10pipeline.png


We all remember him from our G80 launch article back in the day, when no one knew how much Vista would really suck. Some of the shortfalls of DirectX 10 have been operating system support, driver support, time-to-market issues, and other unfortunate roadblocks that kept developers from making full use of all the cool new features and tools DirectX 10 brought.

Meet DirectX 11.
dx11pipeline.png


She's much cooler than her older brother, and way hotter too. Many under-the-hood enhancements mean higher performance for features available but less used under DX10. The major changes to the pipeline mark revolutionary steps in graphics hardware and software capabilities. Tessellation (made up of the hull shader, tessellator, and domain shader) and the Compute Shader are major developments that could go far in helping developers close the gap between reality and unreality. These features have gotten a lot of press already, but we feel the key to DirectX 11 adoption (and thus exploitation) is in some of the more subtle elements. But we'll get into all that in due time.

Along with the pipeline changes, we see a whole host of new tweaks and adjustments. DirectX 11 is actually a strict superset of DirectX 10.1, meaning that all of those features are completely encapsulated in and unchanged by DirectX 11. This simple fact means that all DX11 hardware will include the changes required to be DX 10.1 compliant (which only AMD can claim at the moment). In addition to these tweaks, we also see these further extensions:
otherstuff.png


While changes in the pipeline allow developers to write programs to accomplish different types of tasks, these more subtle changes allow those programs to be more complex, higher quality, and/or more performant. Beyond all this, Microsoft has also gone out of its way to help make parallel programming a little bit easier for game developers.
 
Do you think OpenCL and the DX11 compute shader will render PhysX more or less useless in the future? Is this (part of) the reason why ATI is not jumping aboard with CUDA and PhysX?
 
I can't wait for DX11 hardware. :)

I have some compute shaders written, just waiting to be accelerated...
 
Can someone please explain the hull and domain shaders to me in some detail?

Here's what I know.

1) In the first stage, I specify the mesh using the API. I know OGL, not DX, but the basic principles are expected to be the same. I usually use something like a vertex buffer and specify primitives to the GPU using indices. Let's restrict ourselves to triangles for simplicity. I guess that polygons are also tessellated into triangles, invisibly to the programmer.

2) The vertices are transformed and lit by the vertex shader, which runs on a per-vertex basis. After T&L, primitive assembly happens (a serial operation, using the post-T&L cache to reduce the need to run the shader on the same vertex again). I dunno why input assembly is specified on top here, though it wouldn't affect this stage in theory.

This one-in, one-out nature of the vertex shader means that adaptive refinement of meshes is difficult to do. People have resorted to various hacks to generate geometry on the GPU.

<Here's what I know on the surface and want to know in enough detail that I too can explain it to some one else.>

The hull shader is supposed to tell the tessellator how to tessellate, and the domain shader is supposed to put stuff together. MS's first attempt at dynamic geometry generation, using geometry shaders, wasn't particularly successful.


</Here's what I know on the surface and want to know in enough detail that I too can explain it to some one else.>

Bottom line, I want to understand the new geometry stages in dx11. I have gone through the gamefest slides, but I didn't get it fully.

BTW: Is this post in the correct thread?
 
It seems to me OpenCL is CUDA in sheep's clothing, so if NVidia's serious about PhysX then it'll be available in OpenCL flavour before long.

Yeah, probably. But if that does happen, I wonder how the licensing scheme would play out. It might just be much cheaper to license a CUDA-based PhysX than an OpenCL one that would allow Nvidia's competition to reap all the benefits of their investment in the library.
 
Bottom line, I want to understand the new geometry stages in dx11. I have gone through the gamefest slides, but I didn't get it fully.

Yeah I've been struggling with this too. It's really weird that there's been so little discussion about the new stages in detail. And there doesn't seem to be much out there in general with regard to programmable tessellation.

My basic understanding is that the new patch primitive (basically a quad of vertices) is passed to the hull shader. At this point it's up to the developer to code an algorithm for determining how much to tessellate that patch. This is where I think LOD calculations etc. take place.

The tessellator then takes in some tessellation factors and spits out a bunch of domain points; supposedly, the more you tessellate, the more points you get out. The domain shader then works on each point, doing lookups against textures for displacement if necessary and spitting out the resulting vertex.

So in my mind the pipeline goes something like Quad Patch -> Calculate LOD (Hull) -> Tessellate -> Displace (Domain) -> Vertices -> Triangle Setup -> Rasterize as usual. It'll be interesting to see how many points a tessellator produces per clock on the various IHV implementations. Or even how big each hull/domain warp/batch will be.
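The Quad Patch -> Hull -> Tessellate -> Displace flow described above can be sketched on the CPU in plain Python. Everything here (the LOD heuristic, the function names, the numbers) is invented for illustration; real hull/domain shaders are HLSL running on the GPU:

```python
# Hypothetical CPU-side sketch of the quad-patch tessellation flow.
# The distance-based LOD heuristic is made up for illustration.

def hull_shader(patch, camera_distance):
    """Pick a per-patch tessellation factor: more triangles when close."""
    # Toy heuristic: factor shrinks with distance, clamped to D3D11's [1, 64].
    return max(1, min(64, int(64 / max(camera_distance, 1.0))))

def tessellator(factor):
    """Emit a uniform (factor+1) x (factor+1) grid of (u, v) domain points."""
    step = 1.0 / factor
    return [(i * step, j * step)
            for j in range(factor + 1)
            for i in range(factor + 1)]

def domain_shader(patch, uv, displacement=0.0):
    """Bilinearly interpolate the 4 quad corners, then displace along z."""
    (p00, p10, p01, p11), (u, v) = patch, uv
    x = (1-u)*(1-v)*p00[0] + u*(1-v)*p10[0] + (1-u)*v*p01[0] + u*v*p11[0]
    y = (1-u)*(1-v)*p00[1] + u*(1-v)*p10[1] + (1-u)*v*p01[1] + u*v*p11[1]
    return (x, y, displacement)

# A unit quad patch, viewed from 16 units away.
quad = [(0, 0), (1, 0), (0, 1), (1, 1)]
factor = hull_shader(quad, camera_distance=16.0)
points = [domain_shader(quad, uv) for uv in tessellator(factor)]
print(factor, len(points))   # -> 4 25
```

The displacement argument is where a texture lookup would feed in on real hardware; here it's just a constant to keep the sketch self-contained.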
 
So in my mind the pipeline goes something like Quad Patch -> Calculate LOD (Hull) -> Tessellate -> Displace (Domain) -> Vertices -> Triangle Setup -> Rasterize as usual. It'll be interesting to see how many points a tessellator produces per clock on the various IHV implementations. Or even how big each hull/domain warp/batch will be.
It works like this: VS -> HS -> TE -> DS -> GS -> PS.
VS has the same basic functionality we expect.

The HS works on patches with 1 to 32 control points. It was explained to me that a use of the HS might be to remove extraordinary vertices from a patch to prepare it for tessellation. By this I mean a 17 control point patch might need to be converted to 16 points to prepare it for tessellation. The LOD might be calculated here or it could be passed in from the VS or a constant.

The tessellator generates points, the number of which is controlled by the LOD.

The domain shader is like a second vertex shader, but it works on the post-tessellation vertices. It runs a different algorithm based on the type of subdivision you're trying to do: Catmull-Clark, etc. It might be quite long depending on the type of evaluation it's performing.

To compare this to ATI's existing method: it works with triangles and quads. In single-pass mode, tessellation is performed prior to the vertex shader, so the vertex shader also performs the evaluation.

For adaptive (multi-pass) mode, the first pass runs a pre-tessellation vertex shader, calculates tessellation factors, and writes them out to memory. The second pass tessellates and performs evaluation in the vertex shader.

Hopefully this makes sense. If you want details on how to write a hull shader you'll have to look elsewhere. It can get complicated.
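To put rough numbers on "the number of which is controlled by the LOD", here's a back-of-the-envelope sketch assuming uniform integer factors on a quad domain (D3D11 also allows fractional and independent per-edge factors, which change these counts):

```python
# Rough geometry amplification from the tessellation factor, assuming a quad
# domain tessellated uniformly with an integer factor. Illustration only.

def quad_domain_counts(factor):
    """Domain points and triangles produced for a uniformly tessellated quad."""
    points = (factor + 1) ** 2      # grid of generated domain points
    triangles = 2 * factor ** 2     # two triangles per grid cell
    return points, triangles

for f in (1, 8, 64):
    print(f, quad_domain_counts(f))
# factor 1  -> (4, 2)        just the plain quad
# factor 64 -> (4225, 8192)  D3D11's maximum factor
```

So a single 4-vertex patch at the maximum factor turns into thousands of triangles, which is why the per-clock throughput question above matters.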
 
The HS works on patches with 1 to 32 control points. It was explained to me that a use of the HS might be to remove extraordinary vertices from a patch to prepare it for tessellation. By this I mean a 17 control point patch might need to be converted to 16 points to prepare it for tessellation. The LOD might be calculated here or it could be passed in from the VS or a constant.

Ok, what are patches and control points? I guess to use these new shading stages, one would have to use new API calls. Right now, we only specify the vertices of primitives.
 
Ok, what are patches and control points? I guess to use these new shading stages, one would have to use new API calls. Right now, we only specify the vertices of primitives.
Typically, as in a modeling program, a patch is a grid of vertices. Many objects are modeled with patches and tessellated to triangles for display. The Stanford Bunny for example.

With DX11 you can think of a patch as a primitive with N vertices up to 32. Maybe you just want to draw a bunch of cubes so you'd have 6 point patches. Though if your patch is actually a cube and you don't want to tessellate it you might just want to skip the HS and go straight to the GS.

A control point is a vertex. The name is just semantics, highlighting the distinction between pre- and post-tessellation vertices. You can think of the VS that precedes the HS as being a control point shader. Microsoft could easily have called it that and named the domain shader the vertex shader.
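As an illustration of what a 16-control-point patch buys you, here is a sketch (plain Python, not HLSL) of evaluating a bicubic Bezier patch at a (u, v) pair, the kind of per-generated-vertex evaluation a domain shader would run:

```python
# Evaluating a bicubic Bezier patch from a 4x4 grid of control points.
# This is the math a domain shader might perform; names are invented here.

def bernstein3(t):
    """The four cubic Bernstein basis weights at parameter t."""
    s = 1.0 - t
    return (s*s*s, 3*s*s*t, 3*s*t*t, t*t*t)

def eval_bezier_patch(control_points, u, v):
    """control_points: 4x4 grid of (x, y, z); returns the surface point."""
    bu, bv = bernstein3(u), bernstein3(v)
    x = y = z = 0.0
    for j in range(4):
        for i in range(4):
            w = bu[i] * bv[j]
            px, py, pz = control_points[j][i]
            x += w * px; y += w * py; z += w * pz
    return (x, y, z)

# A flat 4x4 grid in the z = 0 plane: the patch reproduces the plane,
# and at (0, 0) you get back the corner control point exactly.
flat = [[(i, j, 0.0) for i in range(4)] for j in range(4)]
print(eval_bezier_patch(flat, 0.0, 0.0))   # -> (0.0, 0.0, 0.0)
```

Sixteen control points uniquely determine the whole smooth surface; the tessellator just decides how many (u, v) samples of it to take.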
 
Can you please explain the mechanism going on between a vertex shader turning out vertices and the hull shader receiving patches? I think it will be something like primitive assembly. Further, right now we only specify the vertices of meshes to the GPU. I suspect that with DX11 we'll specify patches instead of triangles/quads so that we can use the new stages.
 
Yeah I've been struggling with this too. It's really weird that there's been so little discussion about the new stages in detail.

And virtually nothing on performance of this new set of stages. For example,

Are we to assume that tessellation is going to require triangle setup to be done with parallel hardware now? If so, will vertex setup (i.e. index and attribute fetch) scale with triangle setup improvements so that those not using tessellation can benefit, or will tessellation be required to reach peak triangle rates?

What about efficiency of computation for patches? Points of patches share lots of data (likely through "shared registers"), but are not always going to be grouped in a multiple of the SIMD vector width of the machine. So is a 32-point patch going to have the same computational cost as a 19-point patch? Or is the rumored NVIDIA move to some form of MIMD (perhaps dynamic warp formation) going to be the solution to this issue?

Also, what about buffering of intermediate results? Would going to GDDR for intermediate results likely render tessellation not very useful? I'm thinking intermediate results are buffered on chip. Is the buffering per shader type, or is this going more general purpose (i.e. a shared cache)? I'm hoping for a cache.

There are way too many unknowns even if you understand how it works...
 
Can you please explain the mechanism going on between a vertex shader turning out vertices and the hull shader receiving patches? I think it will be something like primitive assembly. Further, right now we only specify the vertices of meshes to the GPU. I suspect that with DX11 we'll specify patches instead of triangles/quads so that we can use the new stages.
I believe programmers declare input and output control points. The input control points run through the VS and are the input to the HS. The output control points are sent to the tessellator.

All of the performance details Timothy questions and speculates about will be implementation dependent so Microsoft likely doesn't even know the answers to these questions. Maybe a few fortunate developers have been briefed on AMD and Nvidia DX11 architectures, but it might still be too early for that.
 
Here is a nice, short description of the tessellation pipeline:
http://www.realtimerendering.com/blog/direct3d-11-details-part-ii-tessellation/

If I'm not wrong, DX11 tessellation hardware is designed for parametric patches (like B-splines), not for subdivision meshes (or refined meshes) like Catmull-Clark. Using a technique by Charles Loop (from his SIGGRAPH 2008 paper), you can indirectly approximate the limit surfaces of subdivision meshes with Bezier patches. A patch will typically be a collection of control points: a very small number of vertices that uniquely determine a smooth surface (16-vertex patches are the most common). The primary job of the hull shader is to tell the tessellator exactly how much to tessellate along each edge of this patch; however, it can optionally (and for a multitude of reasons) modify the control points being sent. The domain shader gets first access to the (typically numerous) generated vertices, and it can use them (optionally along with the hull shader output) to add further detail to the tessellated surface. Displacement mapping is one example.
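The "how much to tessellate along each edge" decision can be sketched with a simple screen-space heuristic. To be clear, the length-based rule and the 8-pixels-per-segment target below are invented for illustration; they aren't anything from the DX11 spec:

```python
# Hypothetical per-edge tessellation factor, driven by how long the edge is
# on screen. Real engines use their own heuristics; this is just the shape
# of the calculation a hull shader might do.
import math

def edge_tess_factor(p0, p1, pixels_per_segment=8.0, max_factor=64.0):
    """p0, p1: edge endpoints already projected to screen space (pixels)."""
    length = math.hypot(p1[0] - p0[0], p1[1] - p0[1])
    # One segment per `pixels_per_segment` pixels, clamped to D3D11's range.
    return max(1.0, min(max_factor, length / pixels_per_segment))

# A 160-pixel edge gets 20 segments; a 4-pixel sliver degenerates to 1.
print(edge_tess_factor((0, 0), (160, 0)))   # -> 20.0
print(edge_tess_factor((0, 0), (4, 0)))     # -> 1.0
```

Computing the factor per edge (rather than per patch) is what lets adjacent patches tessellate to different densities without cracks along their shared edge, since both patches compute the same factor for it.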

What about efficiency of computation for patches? Points of patches share lots of data (likely through "shared registers"), but are not always going to be grouped in a multiple of the SIMD vector width of the machine. So is a 32-point patch going to have the same computational cost as a 19-point patch? Or is the rumored NVIDIA move to some form of MIMD (perhaps dynamic warp formation) going to be the solution to this issue?

In my opinion, the tessellator could either be a fixed-function unit (like the one in AMD hardware) or be emulated using compute languages like CUDA, which can be flexible enough to adjust to cases like 19-point patches. I don't think performance is going to be an issue.
 
Very excited about the tessellator stuff :)

One of the above links had a link to an NV .pdf; page 7 has a pic of something I'm very keen on: higher tessellation around the edges of a mesh than in the middle -> no more angular model profiles :!:
 
Yah, that looks pretty good too :)

DX11 is going to bring some real in-game visual improvement over DX9 in a way that DX10 really failed to do, and that improves the likelihood of me migrating from XP considerably :yes:
 
Does this tessellator suffer from the same problems TruForm had, e.g. anything that was meant to be flat got rounded?
 