Direct3D 11

http://we.pcinlife.com/thread-981287-1-1.html

 
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So, say, compiling right to Cell, or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.
 
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So, say, compiling right to Cell, or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.

Anyone have any hints on performance differences of compute shader on NV vs ATI hardware?
 
Post processing effects will greatly benefit from compute shaders.
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
 
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
Everything!
Exposure computed in one pass, faster bilateral filter implementations that can be used for a lot of effects (motion blur, local tone mapping, DOF, etc.).
Hey..even realtime implementations of REYES look more feasible.. ;)
 
Exposure computed in one pass
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast, and shared memory doesn't really solve the data-path problem... not convinced on that yet :)

faster bilateral filter implementations that can be used for a lot of effects
... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)

Hey..even realtime implementations of REYES look more feasible.. ;)
Haha, not sure whether anything like that would be fast enough in compute shader, but feel free to prove me wrong ;)

I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)
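For reference, the kind of groupshared-memory reduction being debated here would look roughly like this in compute shader HLSL. Just a sketch: the buffer names and group size are made up, and a full exposure pass would still need a second dispatch (or atomics) to combine the per-group partial sums.

// One thread group reduces 256 luminance values to a single partial sum.
// gInput, gOutput and the group size of 256 are hypothetical.
StructuredBuffer<float> gInput;
RWStructuredBuffer<float> gOutput;

groupshared float sdata[256];

[numthreads(256, 1, 1)]
void ReduceCS(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID,
              uint3 dtid : SV_DispatchThreadID)
{
    // Each thread loads one element into shared memory.
    sdata[gtid.x] = gInput[dtid.x];
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction in shared memory: 256 -> 128 -> ... -> 1.
    for (uint s = 128; s > 0; s >>= 1)
    {
        if (gtid.x < s)
            sdata[gtid.x] += sdata[gtid.x + s];
        GroupMemoryBarrierWithGroupSync();
    }

    // Thread 0 writes this group's partial sum.
    if (gtid.x == 0)
        gOutput[gid.x] = sdata[0];
}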
 
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?
 
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?
And scatter and some atomic and sync operations... and you don't need to render quads to launch threads obviously, although it's very CUDA-like in its abilities (and inabilities) to launch threads/strands/whatever you want to call them ;).
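To make that list concrete, here's a trivial sketch of something a pixel shader can't express directly: scattered atomic writes (a luminance histogram here), launched as a grid of thread groups from the API rather than by rendering a quad. All the names and sizes are hypothetical.

// Histogram via scatter + atomics: a pixel shader can only write to
// its own output location, but a compute shader can write anywhere.
// gSource and gHistogram are made-up names; 256 bins assumed.
Texture2D<float4> gSource;
RWStructuredBuffer<uint> gHistogram;

[numthreads(16, 16, 1)]
void HistogramCS(uint3 dtid : SV_DispatchThreadID)
{
    // Rec.601 luminance of this pixel.
    float lum = dot(gSource[dtid.xy].rgb, float3(0.299, 0.587, 0.114));
    uint bin = (uint)(saturate(lum) * 255.0);
    InterlockedAdd(gHistogram[bin], 1);   // atomic scatter into the bin
}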
 
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
 
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast, and shared memory doesn't really solve the data-path problem... not convinced on that yet :)
It's not going to be an incredible speed up, but it will definitely help a bit.

... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)
Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)

I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)
Actually this is a good idea; what structures do you have in mind, and to be used for what? I guess the range of applications and algorithms to target in this case is pretty large :)

BTW..what about procedural texture/geometry generation?
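On the bilateral filter point: the shared-memory argument, made concrete, is that each thread group can stage a tile of the image (plus apron) into groupshared memory once and then filter entirely from there, instead of every pixel re-fetching its whole neighbourhood through the texture units. A rough 5x5 sketch with hypothetical names and a hardcoded image size; a real implementation would want a separable or otherwise smarter variant:

#define TILE 16
#define RADIUS 2
#define CACHE (TILE + 2 * RADIUS)   // 20x20 tile including the apron

Texture2D<float> gDepth;            // say, filtering a depth buffer
RWTexture2D<float> gResult;

groupshared float tile[CACHE][CACHE];

[numthreads(TILE, TILE, 1)]
void BilateralCS(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID)
{
    int2 tileOrigin = int2(gid.xy) * TILE - RADIUS;

    // The 16x16 group cooperatively loads the 20x20 cache.
    for (uint i = gtid.y * TILE + gtid.x; i < CACHE * CACHE; i += TILE * TILE)
    {
        int2 p = tileOrigin + int2(i % CACHE, i / CACHE);
        tile[i / CACHE][i % CACHE] = gDepth[clamp(p, 0, 1023)]; // assumes 1024^2 image
    }
    GroupMemoryBarrierWithGroupSync();

    // 5x5 bilateral filter, read entirely from shared memory.
    float center = tile[gtid.y + RADIUS][gtid.x + RADIUS];
    float sum = 0, wsum = 0;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
    for (int dx = -RADIUS; dx <= RADIUS; ++dx)
    {
        float s = tile[gtid.y + RADIUS + dy][gtid.x + RADIUS + dx];
        float w = exp(-(dx * dx + dy * dy) / 8.0)      // spatial weight
                * exp(-abs(s - center) * 50.0);        // range weight (arbitrary scale)
        sum += w * s;
        wsum += w;
    }
    gResult[gid.xy * TILE + gtid.xy] = sum / wsum;
}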
 
So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?
The HS stage doesn't work on a triangle or a quad patch but on a new kind of primitive (a patch, anyway), and the vertex weights are set by the user.
Since tessellation happens via direct evaluation you have to topologically classify each patch in your mesh, group the patches according to this classification, and submit them to the tessellator. Each time a new topology has to be sent to the GPU a new set of vertex weights has to be set, and believe me, computing these weights is not easy. It's very important that Microsoft helps out developers on this.
To avoid a combinatorial explosion of topologies you might need to pre-tessellate your model in order to isolate all non-regular vertices; also, supporting special cases such as darts and creases might greatly increase the number of possible topological combinations.
 
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
The HS gets a patch as input, and it's not restricted to a triangle or quad; rather it has any number of control points up to 32. It was explained to me that one potential use of it is to remove extraordinary vertices from a patch. As far as I know nothing in the API prevents you from displacing control points in the HS, though it might not fit with the subdivision surface algorithm.

The tessellator doesn't output final vertex positions. It generates input data for the domain shader and the final vertex position is calculated in the domain shader.
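Roughly, the stages fit together like this in HLSL terms (a sketch with made-up names; the domain shader below does a trivial bilinear evaluation just to keep it short, where a real subdivision scheme would evaluate the actual patch):

struct ControlPoint { float3 pos : POSITION; };

struct PatchConstants
{
    float edges[4]  : SV_TessFactor;       // the only thing the tessellator sees
    float inside[2] : SV_InsideTessFactor;
};

PatchConstants PatchConstantFunc(InputPatch<ControlPoint, 16> ip)
{
    PatchConstants pc;
    [unroll] for (int i = 0; i < 4; ++i) pc.edges[i] = 8.0; // constant factor for illustration
    pc.inside[0] = pc.inside[1] = 8.0;
    return pc;
}

// Runs once per output control point (up to 32); a pass-through here,
// but this is where you'd e.g. change basis or remove extraordinary vertices.
[domain("quad")]
[partitioning("fractional_even")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(16)]
[patchconstantfunc("PatchConstantFunc")]
ControlPoint HS(InputPatch<ControlPoint, 16> ip, uint cpid : SV_OutputControlPointID)
{
    return ip[cpid];
}

// The fixed-function tessellator only generates (u,v) domain locations;
// the domain shader combines them with the HS control points to produce
// the final vertex position.
[domain("quad")]
float4 DS(PatchConstants pc, float2 uv : SV_DomainLocation,
          const OutputPatch<ControlPoint, 16> patch) : SV_Position
{
    // Placeholder: bilinear blend of the 4 corners of the 4x4 control grid.
    float3 a = lerp(patch[0].pos,  patch[3].pos,  uv.x);
    float3 b = lerp(patch[12].pos, patch[15].pos, uv.x);
    return float4(lerp(a, b, uv.y), 1.0);
}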
 
It's not going to be an incredible speed up, but it will definitely help a bit.
No doubt, and it's definitely a step in the right direction. I'm just less pumped about it than some other stuff ;)

Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)
Fair enough - guess we'll see there, but certainly bilateral filters are gonna be useful for multi-frequency shading and similar in the coming years.

Actually this is a good idea, what structures do you have in mind and for to be used for what? I guess the range of applications and algorithms to target in this case is pretty large :)
Oh, indeed; I'm thinking broadly of everything from resolution-matched shadow maps (or even irregular z-buffer stuff) to sparse matrix operations.

[...] but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.
... yeah... the whole thing is pretty complicated as nAo mentioned, even for someone who has had some experience with splines and tessellation.

On one hand it's cool that they're trying to make it general and let you implement lots of stuff, but on the other hand it's sufficiently complicated that one wonders whether or not they should have just let you implement it in software. I guess they figure this stuff is going to become ubiquitous - and maybe it will - but I can't say I'm too much of a fan of the whole "let's make a pipeline that includes everything you'd ever want to do" style where you switch things on and off... that's the way it *used* to be, before we got to write code to do what we want ;)

Anyways we'll see...
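On the irregular-structures point, the basic building block is scattered allocation with an atomic counter, e.g. compacting the "interesting" pixels into a tight list that later passes can walk. A sketch with hypothetical names:

// Compact a sparse set of items into a contiguous list. gList,
// gCounter and gMask are made-up names for illustration.
RWStructuredBuffer<uint2> gList;     // output: (packed pixel coord, value) pairs
RWByteAddressBuffer gCounter;        // a single uint at byte offset 0
Texture2D<float> gMask;

[numthreads(16, 16, 1)]
void CompactCS(uint3 dtid : SV_DispatchThreadID)
{
    float v = gMask[dtid.xy];
    if (v > 0.0)   // only "interesting" pixels get a slot
    {
        uint slot;
        gCounter.InterlockedAdd(0, 1, slot);   // reserve a unique index
        gList[slot] = uint2(dtid.x | (dtid.y << 16), asuint(v));
    }
}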
 
GameFest 08 presentations : Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:
Direct3D 11 will run on down-level hardware
  • Multithreading!
  • Direct3D 10.1, 10 and 9 hardware/drivers
  • Full functionality (for example, tessellation) will require Direct3D 11 hardware

Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler has supported D3D9 targets (except ps_1_x) since the December 2006 SDK, but what about the numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of the D3D10/11 interfaces?
 
GameFest 08 presentations : Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:
Direct3D 11 will run on down-level hardware
  • Multithreading!
  • Direct3D 10.1, 10 and 9 hardware/drivers
  • Full functionality (for example, tessellation) will require Direct3D 11 hardware
Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler has supported D3D9 targets (except ps_1_x) since the December 2006 SDK, but what about the numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of the D3D10/11 interfaces?

Direct3D 11 will introduce new tech levels (9.x) that support a defined subset of Direct3D 11. These tech levels will use the new Direct3D 11 interfaces plus a special software module that translates the calls to the Direct3D 9 driver interface, so this path could have higher overhead than using the native Direct3D 9 interfaces. Additionally, as the tech levels are caps-free, you can't expect to use all the features that are accessible through the native Direct3D 9 interfaces.
 