DX10 Checklist: What made it into D3D10? What was cut?

Acert93

A couple of years back Ilfrin wrote an excellent preview of D3D10. With DX10 GPUs coming in the next 6-9 months, do we have a feel for what will, and won't, be included in the first release? Here is a checklist of items in the preview (Dec 2004):

• Unified Shading Model
• Virtual Video Memory & "Unlimited" Resources
• Integer Instruction Set
• General I/O Model
• Topology Processor
• Tessellator Enhancements
• API improvements
• Frame Buffer Access in the Pixel Shader

So, what has made it into D3D10? What got cut? And what is new? And when will we know whether features (e.g. F10 blending and filtering) were added?

Confirmed DX10 Features*:
Geometry Shaders

* I will edit this post as we get confirmation of new features.
 
From what I can gather (correct me if I am wrong), we know that the following are present in DX10.

• Geometry Shaders
• Batch Size improved
• SM4.0 & Unified Shading Language with integer and bitwise instructions
• Vertex Texturing required
• Virtual Texturing

Since ATI is leveraging their Xenos design, I wonder which of its features (HOS, fixed-function tessellation, MEMEXPORT, etc.) have also been incorporated.

The quote below from GameSpot's Vista article also has some basic information on what is included, and it seems to emphasize the two points in Dave's DX preview: a more general-purpose pipeline, and less reliance on the CPU. "The idea behind D3D10 is to maximize what the GPU can do without CPU interaction, and when the CPU is needed it’s a fast, streamlined, pipeline-able operation."

Here's a list of several new Direct3D 10 performance improvements GameSpot was able to wrestle out of the DirectX 10 team:
• New constant buffers maximize the efficiency of sending shader constant data (light positions, material information, etc.) to the GPU by eliminating redundancy and massively reducing the number of calls to the runtime and driver.
• New state objects significantly reduce the number of API calls and the bandwidth, tracking, mapping, and validation overhead needed in the runtime and driver to change GPU device state.
• Texture arrays enable the GPU to swap materials on-the-fly without having to swap those textures from the CPU.
• Resource views enable super-fast binding of resources to the pipeline by informing the system early-on about its intended use. This also vastly reduces the cost of hazard-tracking and validation.
• Predicated rendering allows draw calls to be automatically deactivated based on the results of previous rendering - without any CPU interaction. This enables rapid occlusion culling to avoid rendering objects that aren’t visible.
• Shader Model 4.0 provides a more robust instruction set with capabilities like integer and bitwise instructions, enabling more work to be transferred to the GPU.
• The D3D runtime itself has been completely refactored to maximize performance and configurability by the application.
 
Do you want "end user" features, or developer features... There are loads of cool things changed for us programmers that will be of little interest to people who just want pretty pictures ;)

The 1/2 texel offset rules are gone now - so no more "directly mapping texels to pixels" malarkey. You have no idea how many 1000000000's of times I've explained that to other developers, but it's of approximately zero interest for the end-user.

Acert93 said:
• Topology Processor
This is your Geometry Shader.
Acert93 said:
• Tessellator Enhancements
Does this mean the Input Assembler? The GS is more of a tessellator (despite what I've said about it not being the best use of it), but the IA is now more refined and thus handles instancing much better.
Acert93 said:
• Frame Buffer Access in the Pixel Shader
No, that's not in. The OM stage is not programmable - you can still only write to the FB via the PS, same as D3D9.
• Batch Size improved
This is indirectly due to the re-architected WDDM and runtime. I suppose you could say it was the same thing, but it would be more correct to say that the re-design leads to improved efficiency, which also happens to reduce overhead thus not having the batch-size problem.

A couple of others off the top of my head...

• Predicated rendering
• Much more precisely defined rules on numerical limits/processing/accuracy

I gotta go do some stuff now, so I don't have time to post more... but might do so later.

hth
Jack
 
JHoxley said:
Does this mean the Input Assembler? The GS is more of a tessellator (despite what I've said about it not being the best use of it), but the IA is now more refined and thus handles instancing much better.

No, the tessellation unit in former Direct3D versions was responsible for the whole N-Patch/RT-Patch stuff. There is no separate unit in Direct3D 10, but I don't see a problem with generating patches in the geometry shader.

JHoxley said:
No, that's not in. The OM stage is not programmable - you can still only write to the FB via the PS, same as D3D9.

Have you tried binding the same texture as a render target view and as a shader resource in the same pass?
 
Demirug said:
No, the tessellation unit in former Direct3D versions was responsible for the whole N-Patch/RT-Patch stuff. There is no separate unit in Direct3D 10, but I don't see a problem with generating patches in the geometry shader.
Okay. There isn't any problem with generating patches in the GS... just that "word on the street" is that it won't be very high performance in the first generation. Yes, you can do it - but it might well be so slow that it's not worth it. Also, I seem to remember some discussion about certain continuity issues being a problem - even with GS adjacency.

Demirug said:
Have you tried binding the same texture as a render target view and as a shader resource in the same pass?
No I haven't, actually. I remember being told about the multisample readback though (but can't find the details right this second), so maybe I am wrong :oops:

Jack
 
JHoxley said:
No I haven't, actually. I remember being told about the multisample readback though (but can't find the details right this second), so maybe I am wrong :oops:

Jack

It was only a wild guess; I haven't tried it either.

IIRC the multisample readback feature makes it possible to read back the individual samples.
 
JHoxley said:
Do you want "end user" features, or developer features... There are loads of cool things changed for us programmers that will be of little interest to people who just want pretty pictures ;)

Both :D

I would not mind hearing feedback on how these things "help" as well. E.g. I have read that with DX10 you can now render a cube map in a single pass, and whatnot. DX10 is pretty interesting and looks to be a big step forward. We won't have hardware for a while it seems, but at least we can get some info on what is definitely in and what is not, and discuss how those things will affect 3D rendering over the next 3 or 4 years.

Neeyik said:
Ilfrin actually wrote the article.

Oops... fixed. Sorry Ilfrin.
 
From the developer point of view, the biggest change IMHO is the banishing of caps. This will free us from the need to develop different code paths/effects for every GPU family from the different vendors. Unfortunately, this comes at a price: no old GPU can be used with D3D10, so you are forced to stay with D3D9 for those chips.
 
I find the memory virtualization nice: people with "only" 1 GB of RAM should be able to play demanding games without all textures being replicated in system RAM, I think. But that's due to WDDM and is available under DX9L as well.

From an end-user perspective: there's the enforced filtering quality, so we can have HQ aniso on new GeForces (ironically, my GeForce 4, and all NV2x/3x, can only do HQ aniso). Mandatory AA support for every possible render target as well? And will DX10 put an end to games that break MSAA support because of post-processing or who knows what they do?
 
JHoxley said:
No, that's not in. The OM stage is not programmable - you can still only write to the FB via the PS, same as D3D9.

That still sticks in my craw a bit. Maybe there's something really weird going on with the pixel shaders or something, but why wouldn't it be possible to feed the RT's destination pixel in as an extra input to the PS?
 
Cypher said:
That still sticks in my craw a bit. Maybe there's something really weird going on with the pixel shaders or something, but why wouldn't it be possible to feed in the RT's destination pixel in as an extra temporary constant in the PS?
Coherency. If two polygons overlap, you need to ensure that the first polygon gets its intended framebuffer result written out (to memory, or at least to the framebuffer cache) before the second polygon can read it. This problem occurs with ordinary framebuffer blends as well, but since the framebuffer blend is a short fixed-function computation, the number of pixels you need to check for overlap is quite small. If you, on the other hand, pull the reads into the pixel shader, you suddenly need to track coherency/overlap for a very large number of pixels (a modern GPU can keep thousands of pixels in flight in its pixel shader at any given time). The hardware cost of such coherency checking is generally held to be large.

It is also necessary to take great care in case of polygon edges, so that an (antialiased) edge shared between two polygons doesn't produce false overlap hits (such false hits can quickly give you an order-of-magnitude-level performance hit if you're not careful) - this too adds greatly to the cost. Even true overlap hits can be quite nasty to performance as well if you feed in a high-poly model with a not-perfectly-smooth surface.
 
Demirug said:
From the developer point of view the IMHO biggest change is the banishing of the Caps. This will free us from the need to develop different code path/effects for every GPU family from the different vendors.

That's what people would like to believe, but I'm sceptical. Maybe somewhat fewer paths, but different paths are most likely going to be a reality in the future as well as long as hardware has different performance characteristics. If feature A is really cool and does wonders on card X but is extremely slow on card Y, you probably want a path not using feature A to allow card Y to play your game too. There is no magical solution to this problem.

It's good though that DX10 sets a high common standard which will last for quite a while, but I bet a few revisions into the future, we'll be back to caps, except maybe at a more coarse level, like for instance the GL extensions.
 
JHoxley said:
Okay. There isn't any problem with generating patches in the GS... just that "word on the street" is that it wont be very high performance in first generation. Yes, you can do it - but it might well be so slow that its not worth it.
Is that a general D3D10 pipeline issue, refrast issue, or IHV implementation specific?
 
Oh yeah, I know that :). Given the dearth of D3D10 HW, it's got to run on something, so it's best to cover all bases.
 
Humus said:
That's what people would like to believe, but I'm sceptical. Maybe somewhat fewer paths, but different paths are most likely going to be a reality in the future as well as long as hardware has different performance characteristics. If feature A is really cool and does wonders on card X but is extremely slow on card Y, you probably want a path not using feature A to allow card Y to play your game too. There is no magical solution to this problem.

Sure, we always need to scale with performance, but with D3D10 we can concentrate first on making it run before we go on to making it faster.

Humus said:
It's good though that DX10 sets a high common standard which will last for quite a while, but I bet a few revisions into the future, we'll be back to caps, except maybe at a more coarse level, like for instance the GL extensions.

If I have understood the concept right, the only cap in the future will be the version number.
 
arjan de lumens said:
Coherency. If two polygons overlap, you need to ensure that the first polygon gets its intended framebuffer result written out (to memory, or at least to the framebuffer cache) before the second polygon can read it. This problem occurs with ordinary framebuffer blends as well, but since the framebuffer blend is a short fixed-function computation, the number of pixels you need to check for overlap is quite small. If you, on the other hand, pull the reads into the pixel shader, you suddenly need to track coherency/overlap for a very large number of pixels (a modern GPU can keep thousands of pixels in flight in its pixel shader at any given time). The hardware cost of such coherency checking is generally held to be large.
Though it should be added that some architectures don't have this problem as they already have all the information about which polygons overlap at any given pixel.
 
Xmas said:
Though it should be added that some architectures don't have this problem as they already have all the information about which polygons overlap at any given pixel.
If you're talking about tile-based renderers, they aren't really that much better off for framebuffer-in-shader reads, even though they reduce the size penalty of the extra overlap checking. If you have N pixel shader invocations on a given pixel (for N overlapping polygons), and each of them reads the framebuffer content that resulted from the previous invocation, you still end up forcing all the invocations on the pixel to execute serially. Given that polygons, when rendered, often exhibit multiple orders of magnitude less temporal separation in a tiler than what you'd see in an immediate-mode renderer, this kind of serial execution is much more likely to appear (and seriously harm performance) in a tiler than in a non-tiler.
 