Direct3D 11

http://we.pcinlife.com/thread-981287-1-1.html

 
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So, say, compiling right to Cell, or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.
 
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So, say, compiling right to Cell, or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.

Anyone have any hints on performance differences of compute shader on NV vs ATI hardware?
 
Post processing effects will greatly benefit from compute shaders.
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
 
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
Everything!
Exposure computed in one pass, faster bilateral filter implementations that can be used for a lot of effects (motion blur, local tone mapping, DOF, etc.).
Hey..even realtime implementations of REYES look more feasible.. ;)
 
Exposure computed in one pass
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast, and shared memory doesn't really solve the data-path problem... not convinced on that yet :)

faster bilateral filter implementations that can be used for a lot of effects
... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)

Hey..even realtime implementations of REYES look more feasible.. ;)
Haha, not sure whether anything like that would be fast enough in compute shader, but feel free to prove me wrong ;)

I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)
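For reference, the kind of groupshared-memory reduction being debated here would look roughly like this in compute shader HLSL. Just a sketch: the buffer names and group size are made up, and a full exposure pass would still need a second dispatch (or atomics) to combine the per-group partial sums.

// One thread group reduces 256 luminance values to a single partial sum.
// gInput, gOutput and the group size of 256 are hypothetical.
StructuredBuffer<float> gInput;
RWStructuredBuffer<float> gOutput;

groupshared float sdata[256];

[numthreads(256, 1, 1)]
void ReduceCS(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID,
              uint3 dtid : SV_DispatchThreadID)
{
    // Each thread loads one element into shared memory.
    sdata[gtid.x] = gInput[dtid.x];
    GroupMemoryBarrierWithGroupSync();

    // Tree reduction in shared memory: 256 -> 128 -> ... -> 1.
    for (uint s = 128; s > 0; s >>= 1)
    {
        if (gtid.x < s)
            sdata[gtid.x] += sdata[gtid.x + s];
        GroupMemoryBarrierWithGroupSync();
    }

    // Thread 0 writes this group's partial sum.
    if (gtid.x == 0)
        gOutput[gid.x] = sdata[0];
}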
 
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?
 
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?
And scatter and some atomic and sync operations... and you don't need to render quads to launch threads obviously, although it's very CUDA-like in its abilities (and inabilities) to launch threads/strands/whatever you want to call them ;).
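To make that list concrete, here's a trivial sketch of something a pixel shader can't express directly: scattered atomic writes (a luminance histogram here), launched as a grid of thread groups from the API rather than by rendering a quad. All the names and sizes are hypothetical.

// Histogram via scatter + atomics: a pixel shader can only write to
// its own output location, but a compute shader can write anywhere.
// gSource and gHistogram are made-up names; 256 bins assumed.
Texture2D<float4> gSource;
RWStructuredBuffer<uint> gHistogram;

[numthreads(16, 16, 1)]
void HistogramCS(uint3 dtid : SV_DispatchThreadID)
{
    // Rec.601 luminance of this pixel.
    float lum = dot(gSource[dtid.xy].rgb, float3(0.299, 0.587, 0.114));
    uint bin = (uint)(saturate(lum) * 255.0);
    InterlockedAdd(gHistogram[bin], 1);   // atomic scatter into the bin
}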
 
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
 
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast, and shared memory doesn't really solve the data-path problem... not convinced on that yet :)
It's not going to be an incredible speed up, but it will definitely help a bit.

... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)
Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)

I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)
Actually this is a good idea; what structures do you have in mind, and to be used for what? I guess the range of applications and algorithms to target in this case is pretty large :)

BTW..what about procedural texture/geometry generation?
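On the bilateral filter point: the shared-memory argument, made concrete, is that each thread group can stage a tile of the image (plus apron) into groupshared memory once and then filter entirely from there, instead of every pixel re-fetching its whole neighbourhood through the texture units. A rough 5x5 sketch with hypothetical names and a hardcoded image size; a real implementation would want a separable or otherwise smarter variant:

#define TILE 16
#define RADIUS 2
#define CACHE (TILE + 2 * RADIUS)   // 20x20 tile including the apron

Texture2D<float> gDepth;            // say, filtering a depth buffer
RWTexture2D<float> gResult;

groupshared float tile[CACHE][CACHE];

[numthreads(TILE, TILE, 1)]
void BilateralCS(uint3 gtid : SV_GroupThreadID, uint3 gid : SV_GroupID)
{
    int2 tileOrigin = int2(gid.xy) * TILE - RADIUS;

    // The 16x16 group cooperatively loads the 20x20 cache.
    for (uint i = gtid.y * TILE + gtid.x; i < CACHE * CACHE; i += TILE * TILE)
    {
        int2 p = tileOrigin + int2(i % CACHE, i / CACHE);
        tile[i / CACHE][i % CACHE] = gDepth[clamp(p, 0, 1023)]; // assumes 1024^2 image
    }
    GroupMemoryBarrierWithGroupSync();

    // 5x5 bilateral filter, read entirely from shared memory.
    float center = tile[gtid.y + RADIUS][gtid.x + RADIUS];
    float sum = 0, wsum = 0;
    for (int dy = -RADIUS; dy <= RADIUS; ++dy)
    for (int dx = -RADIUS; dx <= RADIUS; ++dx)
    {
        float s = tile[gtid.y + RADIUS + dy][gtid.x + RADIUS + dx];
        float w = exp(-(dx * dx + dy * dy) / 8.0)      // spatial weight
                * exp(-abs(s - center) * 50.0);        // range weight (arbitrary scale)
        sum += w * s;
        wsum += w;
    }
    gResult[gid.xy * TILE + gtid.xy] = sum / wsum;
}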
 
So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?
The HS stage doesn't work on a triangle or a quad patch but on a new kind of primitive (a patch, anyway), and the vertex weights are set by the user.
Since tessellation happens via direct evaluation you have to topologically classify each patch in your mesh, group the patches according to this classification, and submit them to the tessellator. Each time a new topology has to be sent to the GPU a new set of vertex weights has to be set, and believe me, computing these weights is not easy. It's very important that Microsoft helps out developers on this.
To avoid a combinatorial explosion of topologies you might need to pre-tessellate your model in order to isolate all non-regular vertices; also, supporting special cases such as darts and creases might greatly increase the number of possible topological combinations.
 
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
The HS gets a patch as input, and it's not restricted to a triangle or quad; rather it has any number of control points up to 32. It was explained to me that one potential use of it is to remove extraordinary vertices from a patch. As far as I know nothing in the API prevents you from displacing control points in the HS, though it might not fit with the subdivision surface algorithm.

The tessellator doesn't output final vertex positions. It generates input data for the domain shader and the final vertex position is calculated in the domain shader.
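Roughly, the stages fit together like this in HLSL terms (a sketch with made-up names; the domain shader below does a trivial bilinear evaluation just to keep it short, where a real subdivision scheme would evaluate the actual patch):

struct ControlPoint { float3 pos : POSITION; };

struct PatchConstants
{
    float edges[4]  : SV_TessFactor;       // the only thing the tessellator sees
    float inside[2] : SV_InsideTessFactor;
};

PatchConstants PatchConstantFunc(InputPatch<ControlPoint, 16> ip)
{
    PatchConstants pc;
    [unroll] for (int i = 0; i < 4; ++i) pc.edges[i] = 8.0; // constant factor for illustration
    pc.inside[0] = pc.inside[1] = 8.0;
    return pc;
}

// Runs once per output control point (up to 32); a pass-through here,
// but this is where you'd e.g. change basis or remove extraordinary vertices.
[domain("quad")]
[partitioning("fractional_even")]
[outputtopology("triangle_cw")]
[outputcontrolpoints(16)]
[patchconstantfunc("PatchConstantFunc")]
ControlPoint HS(InputPatch<ControlPoint, 16> ip, uint cpid : SV_OutputControlPointID)
{
    return ip[cpid];
}

// The fixed-function tessellator only generates (u,v) domain locations;
// the domain shader combines them with the HS control points to produce
// the final vertex position.
[domain("quad")]
float4 DS(PatchConstants pc, float2 uv : SV_DomainLocation,
          const OutputPatch<ControlPoint, 16> patch) : SV_Position
{
    // Placeholder: bilinear blend of the 4 corners of the 4x4 control grid.
    float3 a = lerp(patch[0].pos,  patch[3].pos,  uv.x);
    float3 b = lerp(patch[12].pos, patch[15].pos, uv.x);
    return float4(lerp(a, b, uv.y), 1.0);
}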
 
It's not going to be an incredible speed up, but it will definitely help a bit.
No doubt, and it's definitely a step in the right direction. I'm just less pumped about it than some other stuff ;)

Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)
Fair enough - guess we'll see there, but certainly bilateral filters are gonna be useful for multi-frequency shading and similar in the coming years.

Actually this is a good idea, what structures do you have in mind and for to be used for what? I guess the range of applications and algorithms to target in this case is pretty large :)
Oh, indeed; I'm thinking broadly of everything from resolution-matched shadow maps (or even irregular z-buffer stuff) to sparse matrix operations.

[...] but for someone like me who's not that familiar with bezier patches and tessellation the whole process isn't that clear.
... yeah... the whole thing is pretty complicated as nAo mentioned, even for someone who has had some experience with splines and tessellation.

On one hand it's cool that they're trying to make it general and let you implement lots of stuff, but on the other hand it's sufficiently complicated that one wonders whether or not they should have just let you implement it in software. I guess they figure this stuff is going to become ubiquitous - and maybe it will - but I can't say I'm too much of a fan of the whole "let's make a pipeline that includes everything you'd ever want to do" style where you switch things on and off... that's the way it *used* to be, before we got to write code to do what we want ;)

Anyways we'll see...
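On the irregular-structures point, the basic building block is scattered allocation with an atomic counter, e.g. compacting the "interesting" pixels into a tight list that later passes can walk. A sketch with hypothetical names:

// Compact a sparse set of items into a contiguous list. gList,
// gCounter and gMask are made-up names for illustration.
RWStructuredBuffer<uint2> gList;     // output: (packed pixel coord, value) pairs
RWByteAddressBuffer gCounter;        // a single uint at byte offset 0
Texture2D<float> gMask;

[numthreads(16, 16, 1)]
void CompactCS(uint3 dtid : SV_DispatchThreadID)
{
    float v = gMask[dtid.xy];
    if (v > 0.0)   // only "interesting" pixels get a slot
    {
        uint slot;
        gCounter.InterlockedAdd(0, 1, slot);   // reserve a unique index
        gList[slot] = uint2(dtid.x | (dtid.y << 16), asuint(v));
    }
}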
 
GameFest 08 presentations : Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:
Direct3D 11 will run on down-level hardware
  • Multithreading!
  • Direct3D 10.1, 10 and 9 hardware/drivers
  • Full functionality (for example, tessellation) will require Direct3D 11 hardware

Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler has supported D3D9 targets (except ps_1_x) since the December 2006 SDK, but what about the numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of the D3D10/11 interfaces?
 
GameFest 08 presentations : Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:
Direct3D 11 will run on down-level hardware
  • Multithreading!
  • Direct3D 10.1, 10 and 9 hardware/drivers
  • Full functionality (for example, tessellation) will require Direct3D 11 hardware
Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler has supported D3D9 targets (except ps_1_x) since the December 2006 SDK, but what about the numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of the D3D10/11 interfaces?

Direct3D 11 will introduce new tech levels (9.x) that support a defined subset of Direct3D 11. These tech levels will use the new Direct3D 11 interfaces plus a special software module that translates the calls to the Direct3D 9 driver interface, so this path could have higher overhead than using the native Direct3D 9 interfaces. Additionally, as the tech levels are caps-free, you can't expect to use all the features that are accessible through the native Direct3D 9 interfaces.
 