My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So say compiling right to Cell, or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing an FFT in Compute Shader's parlance is about 2x performance right now.
> Post-processing effects will greatly benefit from compute shaders.

Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
> Certainly I expect convolution-type stuff to gain some there [...], but what other sorts of things did you have in mind here?

Everything!
> Exposure computed in one pass

I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast, and shared memory doesn't really solve the data-path problem... not convinced on that yet.
> faster bilateral filter implementations that can be used for lots of effects

... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar, and it was pretty fast.
> Hey.. even realtime implementations of REYES look more feasible..

Haha, not sure whether anything like that would be fast enough in compute shader, but feel free to prove me wrong.
> What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?

And scatter and some atomic and sync operations... and you don't need to render quads to launch threads, obviously, although it's very CUDA-like in its abilities (and inabilities) to launch threads/strands/whatever you want to call them.
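To make the gather/scatter distinction concrete, here's a small CPU sketch (illustrative C++, not any real shader API): a histogram is the classic case where each input element writes to a data-dependent output bin, which gather-only pixel shaders can't express directly but compute shaders can, given atomics. HLSL's `InterlockedAdd` plays the role `fetch_add` plays here; the function name and two-thread split are just for illustration.

```cpp
#include <vector>
#include <atomic>
#include <thread>

// A pixel shader can only GATHER: each output location reads whatever
// inputs it likes. A compute shader can also SCATTER: each input
// writes to an arbitrary output location, which needs atomics once
// threads can collide -- e.g. when building a histogram.
std::vector<int> histogramScatter(const std::vector<int>& pixels, int bins) {
    std::vector<std::atomic<int>> hist(bins);
    for (auto& h : hist) h = 0;
    // Two "thread groups" racing on the same bins; fetch_add is the
    // CPU analogue of InterlockedAdd on a UAV.
    auto worker = [&](size_t begin, size_t end) {
        for (size_t i = begin; i < end; ++i)
            hist[pixels[i] % bins].fetch_add(1);
    };
    size_t mid = pixels.size() / 2;
    std::thread t(worker, 0, mid);
    worker(mid, pixels.size());
    t.join();
    std::vector<int> out(bins);
    for (int b = 0; b < bins; ++b) out[b] = hist[b];
    return out;
}
```

Without the atomic increments the two workers would lose counts on colliding bins, which is exactly why scatter was off the table before atomics were exposed.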
> I doubt reductions are gonna be much faster with compute shader than without. [...] not convinced on that yet

It's not going to be an incredible speed up, but it will definitely help a bit.
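For reference, the reduction being debated here is just a log2(N)-pass tree sum; the compute-shader win is keeping the intermediate sums in on-chip shared memory instead of ping-ponging render targets. A minimal CPU sketch (function names and structure are illustrative, not from any API), wired up to the "exposure in one pass" case via log-average luminance:

```cpp
#include <vector>
#include <cmath>

// Tree reduction: each pass halves the element count, exactly like the
// log2(N) ping-pong passes a pixel-shader reduction needs, except that
// a compute shader keeps intermediate sums in groupshared memory.
float treeReduceSum(std::vector<float> data) {
    // Pad to a power of two so every pass pairs elements cleanly;
    // zero is the identity for a sum, so padding doesn't change it.
    size_t n = 1;
    while (n < data.size()) n *= 2;
    data.resize(n, 0.0f);
    for (size_t stride = n / 2; stride > 0; stride /= 2)
        for (size_t i = 0; i < stride; ++i)
            data[i] += data[i + stride];
    return data[0];
}

// "Exposure computed in one pass": the log-average luminance used for
// tonemapping is a sum reduction over log(luminance) plus one exp.
float logAverageLuminance(const std::vector<float>& lum) {
    std::vector<float> logs(lum.size());
    for (size_t i = 0; i < lum.size(); ++i)
        logs[i] = std::log(1e-4f + lum[i]);  // small bias avoids log(0)
    return std::exp(treeReduceSum(logs) / lum.size());
}
```

The arithmetic is identical either way; what changes is how many times partial sums round-trip through memory, which is why the expected gain is real but modest.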
> ... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. [...]

Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)
> I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything

Actually this is a good idea: what structures do you have in mind, and to be used for what? I guess the range of applications and algorithms to target in this case is pretty large.
> So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from?

The HS stage doesn't work on a triangle or a quad patch but on a new kind of primitive (a patch, anyway), and the vertex weights are set by the user.
The HS gets a patch as input, and it's not restricted to a triangle or quad; rather, it has any number of control points up to 32. It was explained to me that one potential use of it is to remove extraordinary vertices from a patch. As far as I know nothing in the API prevents you from displacing control points in the HS, though it might not fit with the subdivision surface algorithm.

Thanks Andy.
Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it, but for someone like me who's not that familiar with Bézier patches and tessellation the whole process isn't that clear.
So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?
Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
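For what it's worth, the division of labor being asked about is: the HS emits control points, the fixed-function tessellator only generates parametric sample locations (it never sees the control points), and the domain shader then evaluates the surface at those locations using the control points. A hedged sketch of that last step, de Casteljau evaluation of a cubic with four control points (a bicubic patch would carry 16, and do this in both u and v):

```cpp
#include <array>

// De Casteljau evaluation of a cubic Bezier, as a domain shader might
// do per tessellated vertex: the control points come from the HS, the
// parameter t comes from the tessellator. One coordinate shown;
// position vectors work the same way component-wise.
float evalCubicBezier(std::array<float, 4> cp, float t) {
    // Repeated linear interpolation between adjacent control points,
    // collapsing 4 -> 3 -> 2 -> 1 values.
    for (int level = 3; level > 0; --level)
        for (int i = 0; i < level; ++i)
            cp[i] = (1.0f - t) * cp[i] + t * cp[i + 1];
    return cp[0];
}
```

This also answers the displacement question structurally: nothing stops the DS from adding a displacement after evaluating the base surface, since it's an ordinary programmable stage.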
> It's not going to be an incredible speed up, but it will definitely help a bit.

No doubt, and it's definitely a step in the right direction. I'm just less pumped about it than some other stuff.
> Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)

Fair enough - guess we'll see there, but certainly bilateral filters are gonna be useful for multi-frequency shading and similar in the coming years.
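For readers following along, a minimal 1D bilateral filter sketch (illustrative, not any particular paper's implementation): each output is a normalized sum of neighbors weighted by both spatial distance and intensity difference, and it's that data-dependent intensity weight that makes the filter edge-preserving while defeating the usual separable-filter GPU tricks.

```cpp
#include <vector>
#include <cmath>

// 1D bilateral filter over a signal. sigmaS controls the spatial
// falloff, sigmaI the intensity falloff; samples across an edge get a
// near-zero weight, so edges survive the blur.
std::vector<float> bilateral1D(const std::vector<float>& src,
                               int radius, float sigmaS, float sigmaI) {
    std::vector<float> dst(src.size());
    for (int i = 0; i < (int)src.size(); ++i) {
        float sum = 0.0f, wsum = 0.0f;
        for (int k = -radius; k <= radius; ++k) {
            int j = i + k;
            if (j < 0 || j >= (int)src.size()) continue;
            float ds = (float)k;         // spatial distance
            float di = src[j] - src[i];  // intensity difference
            float w = std::exp(-(ds * ds) / (2.0f * sigmaS * sigmaS)
                               - (di * di) / (2.0f * sigmaI * sigmaI));
            sum += w * src[j];
            wsum += w;
        }
        dst[i] = sum / wsum;  // normalize by the total weight
    }
    return dst;
}
```

The per-pixel normalization and the intensity term are why the kernel can't be precomputed or separated the way a Gaussian can, which matches the "still quite slow on GPU" experience above.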
> Actually this is a good idea: what structures do you have in mind, and to be used for what? [...]

Oh, indeed; I'm thinking broadly of everything from resolution-matched shadow maps (or even irregular z-buffer stuff) to sparse matrix operations.
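As a concrete instance of an "irregular data structure", here is a sparse matrix in CSR (compressed sparse row) form with a matrix-vector multiply (a sketch; the struct and function names are illustrative). The variable-length per-row indexing is exactly the kind of structure that's awkward to build with gather-only pixel shaders but natural once compute shaders offer scatter and atomics:

```cpp
#include <vector>

// CSR sparse matrix: only nonzeros are stored, and rowStart gives each
// row a variable-length slice of the col/val arrays.
struct CsrMatrix {
    int rows;
    std::vector<int>   rowStart;  // rows+1 entries; row r spans [rowStart[r], rowStart[r+1])
    std::vector<int>   col;       // column index per nonzero
    std::vector<float> val;       // value per nonzero
};

// y = A * x. Each row is independent, so the obvious GPU mapping is
// one thread (or one warp) per row; the irregular part is that rows
// have different lengths, hence different amounts of work.
std::vector<float> spmv(const CsrMatrix& A, const std::vector<float>& x) {
    std::vector<float> y(A.rows, 0.0f);
    for (int r = 0; r < A.rows; ++r)
        for (int i = A.rowStart[r]; i < A.rowStart[r + 1]; ++i)
            y[r] += A.val[i] * x[A.col[i]];
    return y;
}
```

Building `rowStart` in the first place is the scatter-heavy step (count nonzeros per row, then prefix-sum), which is the part compute shaders make feasible on-GPU.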
> [...] but for someone like me who's not that familiar with Bézier patches and tessellation the whole process isn't that clear.

... yeah... the whole thing is pretty complicated, as nAo mentioned, even for someone who has had some experience with splines and tessellation.
GameFest 08 presentations, Graphics: Introduction to the Direct3D 11 Graphics Pipeline, slides 2 and 56:

> Direct3D 11 will run on down-level hardware
> - Multithreading!
> - Direct3D 10.1, 10 and 9 hardware/drivers
> - Full functionality (for example, tessellation) will require Direct3D 11 hardware
So the Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How is that possible?
The D3D10 HLSL shader compiler has supported D3D9 targets (except ps_1_x) since the December 2006 SDK, but what about the numerous other changes, such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures but follow the ideology of the D3D10/11 interfaces?