Direct3D 11


Demirug
06-Jun-2008, 20:12
While Microsoft is still very vague:
„Join us as we cover exciting new ground, including the lowdown on the upcoming Direct3D 11 API set.”
(http://www.xnagamefest.com/conference_details08.htm#GRAPHICS_)

NVIDIA already goes deeper:
“Get a sneak peek at what DirectX 11 will look like.

Kev will introduce the new DirectX 11 rendering pipeline currently under development at Microsoft. The technology builds on the existing DirectX 10 API set and adds new features including tessellation, multithreaded rendering, compute shaders, Shader Model 5, and more. Get up to speed fast with the next generation of rendering technology.” (http://speakers.nvision2008.com/agenda/pop_session.cfm?sessionid=39)

ShaidarHaran
06-Jun-2008, 20:20
Multi-threaded rendering = savior of multi-GPU performance.

fellix
06-Jun-2008, 20:36
Shall we tessellate, already! :lol:

Davros
06-Jun-2008, 22:57
I've been waiting for someone to do something really cool with tessellation ever since ATI announced TruForm.

Demirug
06-Jun-2008, 23:44
Multi-threaded rendering = savior of multi-GPU performance.

A whole API feature for a niche market?

I've been waiting for someone to do something really cool with tessellation ever since ATI announced TruForm.

Truform is the stone age of tessellation

armchair_architect
07-Jun-2008, 04:31
Multi-threaded rendering == only way to scale games to more complex worlds in the multi-core *CPU* era.

Games on Xbox360 and PS3 use multiple threads to build up command lists for the graphics processor, because the CPU cores on the consoles are (individually) slower than standard PC CPUs -- they just can't generate commands fast enough using only one processor.

This has nothing to do with driving multiple GPUs, and everything to do with exploiting multiple CPU cores. And maybe just as important, making it easier to port between PCs and consoles.
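A rough sketch of the pattern described above, in which several CPU threads each record their own command list and a single thread submits them in order. This is not Direct3D code; `record_commands`, `build_frame` and the scene chunks are hypothetical names used only for illustration, and Python threads show the structure rather than the speedup.

```python
import threading

def record_commands(chunk_id, objects):
    """Worker: record draw commands for one chunk of the scene into a private list."""
    return [f"draw(chunk={chunk_id}, object={obj})" for obj in objects]

def build_frame(scene_chunks):
    """Record one command list per chunk in parallel, then submit them in order."""
    command_lists = [None] * len(scene_chunks)

    def worker(index):
        # Each worker writes only to its own slot, so no lock is needed here.
        command_lists[index] = record_commands(index, scene_chunks[index])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(scene_chunks))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Only submission is serialized, standing in for the single GPU queue.
    submitted = []
    for commands in command_lists:
        submitted.extend(commands)
    return submitted

frame = build_frame([["terrain", "trees"], ["player"], ["hud"]])
print(frame)
```

Recording is the part that scales with CPU cores; the final submission order stays deterministic because each list keeps its slot.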

Sunday
08-Jun-2008, 22:12
It looks like NVIDIA has a strong desire to dictate and shape the upcoming DX that will probably ship with Windows 7 and the next Xbox!

I wonder how ATI (AMD) will respond?!

eastmen
08-Jun-2008, 22:43
It looks like NVIDIA has a strong desire to dictate and shape the upcoming DX that will probably ship with Windows 7 and the next Xbox!

I wonder how ATI (AMD) will respond?!

ATI will most likely be in the next Xbox. I would think MS still has a sour taste in their mouth over the original Xbox.

I think ati will be fine in that regard

RejZoR
08-Jun-2008, 23:00
Interesting, I always "thought" current GPUs were already multi-threaded...

willardjuice
08-Jun-2008, 23:22
Interesting, I always "thought" current GPUs were already multi-threaded...

I believe they are talking about the actual driver. IIRC DirectX 10 drivers are not multi-threaded.

AlexV
09-Jun-2008, 00:39
I believe they are talking about the actual driver. IIRC DirectX 10 drivers are not multi-threaded.

No they're not. In DX10 the driver must make callbacks through the runtime to the OS, calls which have to be made through the main thread. MS mispredicted the ubiquity of multithreaded games and made that choice.

As for what they're talking about, I think it's the same thing Humus is making reference to here: http://forum.beyond3d.com/showthread.php?t=48201&highlight=multi-threaded+rendering. Which is intriguing.

willardjuice
09-Jun-2008, 00:49
Ok then I stand corrected. :smile:

nexus_alpha
09-Jun-2008, 00:58
It looks like NVIDIA has a strong desire to dictate and shape the upcoming DX that will probably ship with Windows 7 and the next Xbox!

I wonder how ATI (AMD) will respond?!

Current ATI cards already support tessellation

Demirug
09-Jun-2008, 06:56
No they're not. In DX10 the driver must make callbacks through the runtime to the OS, calls which have to be made through the main thread. MS mispredicted the ubiquity of multithreaded games and made that choice.

The whole situation is somewhat more complex.

Drivers can still create worker threads if they want. Additionally, you can call most D3D10 methods from any thread you want. But the default multithread layer contains only one synchronization object for everything.

There are limitations when it comes to DXGI methods as they need to play nice with the application window.

In comparison to D3D9 on Vista, D3D10 doesn’t do any multithread optimization on its own. At the time the internal workings were designed, Microsoft thought that all applications that use Direct3D 10 would have their own render thread (as is common on consoles).

Current ATI cards already support tessellation

There are many different forms of tessellation. ;)

AlexV
09-Jun-2008, 10:38
The whole situation is somewhat more complex.

Drivers can still create worker threads if they want. Additionally, you can call most D3D10 methods from any thread you want. But the default multithread layer contains only one synchronization object for everything.

There are limitations when it comes to DXGI methods as they need to play nice with the application window.

In comparison to D3D9 on Vista, D3D10 doesn’t do any multithread optimization on its own. At the time the internal workings were designed, Microsoft thought that all applications that use Direct3D 10 would have their own render thread (as is common on consoles).


Cheers for detailing! I got that tidbit during a chat from a while ago and never went further with investigating it properly.

ShaidarHaran
11-Jun-2008, 19:56
A whole API feature for a niche market?

Today's niche is tomorrow's mainstream. It is clear that multi-core is where the market is headed.

ninelven
13-Jun-2008, 06:50
I don't think that it is clear at all. From today's point of view multi-chip will only become necessary in about 3 hardware cycles, which places it at least 5 years away.

But who knows how the situation will look in 5 years...

AnarchX
06-Aug-2008, 21:22
http://www.techtree.com/India/News/Nvidias_Big_Bang_II_in_September/551-91858-580.html
Nvidia follows a quarterly development of drivers and has shared their roadmap with us. The R180 driver release is due in September, R185 will be released in the December-February time-frame, the R190 will be rolled out between March and May, the R195 is due between June and August, and finally -- the R200, codenamed "Big Bang III", will support DirectX 11.

armchair_architect
07-Aug-2008, 03:43
Today's niche is tomorrow's mainstream. It is clear that multi-core is where the market is headed.

I don't think that it is clear at all. From today's point of view multi-chip will only become necessary in about 3 hardware cycles, which places it at least 5 years away.

But who knows how the situation will look in 5 years...

You're talking about different things. DX11 multi-threading is all about having the app rendering engine, DX runtime, and driver take advantage of multiple CPU cores. It has nothing to do with SLI/Crossfire.

ninelven
07-Aug-2008, 07:35
No, he was writing about multi-gpu...

Multi-threaded rendering = savior of multi-GPU performance.

Unless you meant both of the posts... in which case it makes sense... hard to tell.

cho
08-Aug-2008, 04:35
http://we.pcinlife.com/thread-981287-1-1.html

http://we.pcinlife.com/attachments/forumid_206/20080808_fcf04ff191132c11aa1cEjFMYaSlrFS7.jpg

3dcgi
08-Aug-2008, 04:58
The realtime rendering blog has a couple posts as well.
Direct3D 11 Details Part I: Intro (http://www.realtimerendering.com/blog/direct3d-11-details-part-i-intro/)
Direct3D 11 Details Part II: Tessellation (http://www.realtimerendering.com/blog/direct3d-11-details-part-ii-tessellation/)

nAo
08-Aug-2008, 06:51
NVIDIA: Gamefest Presentation Slides Now Online (http://news.developer.nvidia.com/2008/08/gamefest-presen.html)

cho
08-Aug-2008, 07:14
compute shader

http://we.pcinlife.com/thread-981271-1-1.html

http://we.pcinlife.com/attachments/forumid_206/20080808_c5d8f0c2b274c44a29adjxS3eJFIsTPd.jpg

Lux_
08-Aug-2008, 08:03
What is "latest chips"?
Is it "next-gen GPUs running Prototype DX11"?

Andrew Lauritzen
08-Aug-2008, 19:37
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So say compiling right to Cell or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.

nAo
08-Aug-2008, 19:39
Post processing effects will greatly benefit from compute shaders.

TimothyFarrar
08-Aug-2008, 20:04
My guess is "latest chips" means the performance you can get by writing an optimized FFT directly in native code for a given chip. So say compiling right to Cell or writing something in CUDA or similar (as close to native as we can get right now) for GT200. So basically the overhead of expressing FFT in Compute Shader's parlance is about 2x performance right now.

Anyone have any hints on performance differences of compute shader on NV vs ATI hardware?

Andrew Lauritzen
08-Aug-2008, 20:36
Post processing effects will greatly benefit from compute shaders.
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?

nAo
08-Aug-2008, 22:03
Certainly I expect convolution-type stuff to gain some there (although I'm less impressed with the CUDA convolution sample results than I was expecting to be), but what other sorts of things did you have in mind here?
Everything!
Exposure computed in one pass, faster bilateral filter implementations that can be used for lots of effects (motion blur, local tone mapping, DOF, etc..).
Hey..even realtime implementations of REYES look more feasible.. ;)

Andrew Lauritzen
08-Aug-2008, 22:50
Exposure computed in one pass
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast and there's no solving the data paths with shared memory really... not convinced on that yet :)

faster bilateral filter implementations that can be used for lots of effects
... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)


Hey..even realtime implementations of REYES look more feasible.. ;)
Haha, not sure whether anything like that would be fast enough in compute shader, but feel free to prove me wrong ;)

I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)

trinibwoy
09-Aug-2008, 00:15
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?

Andrew Lauritzen
09-Aug-2008, 00:21
What are the differences between a compute shader and a pixel shader? Is it just inter-thread communication via shared registers/memory?
And scatter and some atomic and sync operations... and you don't need to render quads to launch threads obviously, although it's very CUDA-like in its abilities (and inabilities) to launch threads/strands/whatever you want to call them ;).

trinibwoy
09-Aug-2008, 02:43
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it, but for someone like me who's not that familiar with Bézier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)

nAo
09-Aug-2008, 03:31
I doubt reductions are gonna be much faster with compute shader than without. They're already pretty fast and there's no solving the data paths with shared memory really... not convinced on that yet :)
It's not going to be an incredible speed up, but it will definitely help a bit.


... eh... maybe a bit faster, but again they can be implemented reasonably efficiently already. There was even a paper on framebuffer LOD stuff lately that did something similar and it was pretty fast :)
Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)


I'm actually more pumped about building irregular data structures with compute shaders than anything else. But even then, they're not really God's gift to mankind or anything ;)
Actually this is a good idea. What structures do you have in mind, and what would they be used for? I guess the range of applications and algorithms to target in this case is pretty large :)

BTW..what about procedural texture/geometry generation?

nAo
09-Aug-2008, 03:39
So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?
The HS stage doesn't work on a triangle or on a quad patch but on a new kind of primitive (a patch anyway) and the vertex weights are set by the user.
Since tessellation happens via direct evaluation you have to topologically classify each patch in your mesh, group them according to this classification and submit them to the tessellator. Each time a new topology has to be sent to the GPU a new set of vertex weights has to be set, and believe me..computing these weights is not easy. It's very important that Microsoft helps out developers on this.
To avoid a combinatorial explosion of topologies you might need to pretessellate your model in order to isolate all non-regular vertices; supporting special cases such as darts and creases might also greatly increase the number of possible topological combinations.

3dcgi
09-Aug-2008, 03:45
Thanks Andy.

Does anyone have a good explanation of what exactly the new Hull Shader does? Nvidia's presentation talks a bit about it, but for someone like me who's not that familiar with Bézier patches and tessellation the whole process isn't that clear.

So the HS gets a triangle or quad patch and interpolates the positions of surrounding vertices to generate up to 32 control points in parallel (one HS thread per control point) using some predetermined set of vertex weights per control point? Where do these weights come from? And this stage is limited to the plane of the original surface right - no displacement happens here?

Also, how does the tessellator do its thing without using the control points generated in the HS? (according to that RTR blog)
The HS gets a patch as input and it's not restricted to a triangle or quad. Rather it has any number of control points up to 32. It was explained to me that one potential use of it is to remove extraordinary vertices from a patch. As far as I know nothing in the API prevents you from displacing control points in the HS though it might not fit with the subdivision surface algorithm.

The tessellator doesn't output final vertex positions. It generates input data for the domain shader and the final vertex position is calculated in the domain shader.
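A minimal sketch of the split described above, assuming a quad domain and a bicubic Bézier patch (the thread does not fix either choice). The tessellator stand-in only emits (u, v) domain coordinates; the domain-shader stand-in turns each coordinate into a position using the patch's control points. All function names and the sample patch are hypothetical.

```python
def tessellate_quad_domain(factor):
    """Tessellator stand-in: emit (u, v) points on a regular grid in the unit square."""
    return [(i / factor, j / factor)
            for j in range(factor + 1)
            for i in range(factor + 1)]

def bernstein3(t):
    """The four cubic Bernstein basis weights at parameter t."""
    s = 1.0 - t
    return (s * s * s, 3 * s * s * t, 3 * s * t * t, t * t * t)

def domain_shader(control_points, u, v):
    """Domain-shader stand-in: evaluate a bicubic Bezier patch at (u, v).

    control_points is a 4x4 grid of (x, y, z) tuples indexed [row][col].
    """
    bu, bv = bernstein3(u), bernstein3(v)
    x = y = z = 0.0
    for row in range(4):
        for col in range(4):
            w = bv[row] * bu[col]
            px, py, pz = control_points[row][col]
            x += w * px
            y += w * py
            z += w * pz
    return (x, y, z)

# Hypothetical patch: a flat 4x4 grid with one raised interior control point.
patch = [[(float(c), float(r), 0.0) for c in range(4)] for r in range(4)]
patch[1][1] = (1.0, 1.0, 3.0)

# The tessellator never sees the control points; only the domain shader does.
vertices = [domain_shader(patch, u, v) for (u, v) in tessellate_quad_domain(4)]
print(len(vertices), vertices[0], vertices[-1])
```

Displacement, if wanted, would fit naturally at the end of `domain_shader`, since that is where final positions are produced.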

Andrew Lauritzen
09-Aug-2008, 03:58
It's not going to be an incredible speed up, but it will definitely help a bit.
No doubt, and it's definitely a step in the right direction. I'm just less pumped about it than some other stuff ;)


Relatively fast(er) bilateral filters are possible on GPU but in my own experience are still quite slow, so there's imho more to gain here (and perhaps more research to do..)
Fair enough - guess we'll see there, but certainly bilateral filters are gonna be useful for multi-frequency shading and similar in the coming years.


Actually this is a good idea. What structures do you have in mind, and what would they be used for? I guess the range of applications and algorithms to target in this case is pretty large :)
Oh, indeed; I'm thinking broadly of everything from resolution-matched shadow maps (or even irregular z-buffer stuff) to sparse matrix operations.


[...] but for someone like me who's not that familiar with Bézier patches and tessellation the whole process isn't that clear.
... yeah... the whole thing is pretty complicated as nAo mentioned, even for someone who has had some experience with splines and tessellation.

On one hand it's cool that they're trying to make it general and let you implement lots of stuff, but on the other hand it's sufficiently complicated that one wonders whether or not they should have just let you implement it in software. I guess they figure this stuff is going to become ubiquitous - and maybe it will - but I can't say I'm too much of a fan of the whole "let's make a pipeline that includes everything you ever want to do" style where you switch things on and off... that's the way it *used* to be before we got to write code to do what we want ;)

Anyways we'll see...

DmitryKo
14-Oct-2008, 18:07
GameFest 08 presentations (http://www.xnagamefest.com/presentations08.htm): Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:
Direct3D 11 will run on down-level hardware
Multithreading!
Direct3D 10.1, 10 and 9 hardware/drivers
Full functionality (for example, tessellation) will require Direct3D 11 hardware


Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler supports D3D9 targets (except ps_1_x) since December 2006 SDK, but what about numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of D3D10/11 interfaces?

Demirug
14-Oct-2008, 19:07
GameFest 08 presentations (http://www.xnagamefest.com/presentations08.htm): Graphics: Introduction to the Direct3D 11 Graphics Pipeline

Slides 2, 56:


Direct3D 11 runtime will support D3D9-class hardware, after all the talk about how D3D10 features really require D3D10-class hardware? How's that possible?

The D3D10 HLSL shader compiler supports D3D9 targets (except ps_1_x) since December 2006 SDK, but what about numerous other changes such as texture and buffer formats? Will there be new "ID3D9" interfaces and D3D9 devices that operate with older data structures, but follow the ideology of D3D10/11 interfaces?

Direct3D 11 will introduce new techlevels (9.x) that will support a defined subset of Direct3D 11. These techlevels would use the new Direct3D 11 interfaces and a special software module that translates these calls to the Direct3D 9 driver interface. Therefore this solution could have a higher overhead than using the native Direct3D 9 interfaces. Additionally, as the tech levels are caps-free, you cannot expect to use all the features that are accessible with the Direct3D 9 interfaces.
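To illustrate what "caps-free" tech levels buy the application: instead of probing many individual capability bits, it asks for a list of acceptable levels and gets back the best one the hardware can run. The sketch below is a toy model of that selection, not the runtime's actual logic; `pick_feature_level` is a hypothetical helper and "11_0" is an assumed name for full Direct3D 11 hardware.

```python
# Feature levels from least to most capable. "11_0" is an assumed name for
# full Direct3D 11 hardware; the 9_x and 10_x names follow the thread.
FEATURE_LEVELS = ["9_1", "9_2", "9_3", "10_0", "10_1", "11_0"]

def pick_feature_level(hardware_max, requested):
    """Return the most capable requested level the hardware can run, or None."""
    limit = FEATURE_LEVELS.index(hardware_max)
    usable = [lvl for lvl in requested if FEATURE_LEVELS.index(lvl) <= limit]
    return max(usable, key=FEATURE_LEVELS.index) if usable else None

print(pick_feature_level("9_3", ["11_0", "10_0", "9_3", "9_1"]))  # 9_3
print(pick_feature_level("10_1", ["11_0", "10_0", "9_1"]))        # 10_0
print(pick_feature_level("9_1", ["11_0", "10_0"]))                # None
```

Because each level is a fixed feature set, one returned value tells the application everything it may rely on; the price, as noted above, is that D3D9 features outside the chosen level are not reachable.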

DmitryKo
14-Oct-2008, 21:59
Direct3D 11 will introduce new techlevels (9.x) that will support a defined subset of Direct3D 11. These techlevels would use the new Direct3D 11 interfaces and a special software module that translates these calls to the Direct3D 9 driver interface.
So the runtime will support creating Direct3D 11 devices on D3D 9, 10 and 10.1 hardware, just like a D3D 10.1 device can be created on D3D 10 hardware?
And these devices will have some fixed feature sets (mapped to a set of caps) - something similar to what is defined by D3D10_FEATURE_LEVEL1 (http://msdn.microsoft.com/en-us/library/bb694529(VS.85).aspx) structure used by D3D10CreateDevice1 (http://msdn.microsoft.com/en-us/library/bb694526(VS.85).aspx)?

D3D10_FEATURE_LEVEL_10_0: The hardware supports Direct3D 10.0 features.
D3D10_FEATURE_LEVEL_10_1: The hardware supports Direct3D 10.1 features.
D3D10_FEATURE_LEVEL_9_1, D3D10_FEATURE_LEVEL_9_2, D3D10_FEATURE_LEVEL_9_3: This value is in the header, but this feature is not yet implemented.

this solution could have a higher overhead than using the native Direct3D 9 interfaces

Judging by the Windows Vista Display Driver Model Reference (http://msdn.microsoft.com/en-us/library/ms794244.aspx), on-the-fly translation of Direct3D 10 calls and data formats into the D3D 9 DDI could introduce quite a lot of overhead, since the D3D10 API is so closely tied to the D3D10 DDI... :?: :-?
I can see how this could be beneficial to future Vista developers though, as they wouldn't have to support a separate D3D9 path for older hardware.

BTW, what has happened to WDDM v2 (Advanced)? No traces of it in the current WDK...

Demirug
14-Oct-2008, 22:18
So the runtime will support creating Direct3D 11 devices on D3D 9, 10 and 10.1 hardware, just like a D3D 10.1 device can be created on D3D 10 hardware?

Yes.

And these devices will have some fixed feature sets (mapped to a set of caps) - something similar to what is defined by D3D10_FEATURE_LEVEL1 (http://msdn.microsoft.com/en-us/library/bb694529(VS.85).aspx) structure used by D3D10CreateDevice1 (http://msdn.microsoft.com/en-us/library/bb694526(VS.85).aspx)?

Yes.

BTW, what has happened to WDDM v2 (Advanced)? No traces of it in the current WDK...


I don’t know. We need to watch what will be published after the next WinHEC.

DmitryKo
15-Oct-2008, 18:30
http://www.microsoft.com/whdc/winhec/2008/sessions.aspx


High Fidelity Graphics and Media

Display Driver Interface Changes for Windows 7 - GRA-T518
Presenter(s): Ameet Chitre

The Windows Display Driver Model (WDDM) has been optimized for several key performance and reliability improvements in Windows. This session will discuss the WDDM v1.1 driver model optimizations and how it enables key features in Windows 7. You will also learn about WDDM v1.1 requirements and why it is a must-have for Windows 7.

Khm... :?:

DmitryKo
17-Oct-2008, 21:35
DirectX: Core Graphics for Windows 7 - GRA-T515
Presenter(s): Anantha Kancherla

Windows 7 brings Direct3D 10 to the mainstream by completing key corporate scenarios. In this session, you will learn about these new scenarios. We will also talk about Windows 7 features that use Direct3D 10 such as D2D , the new hardware-accelerated 2D API, and Desktop Window Manager that is based on Direct3D 10.

... and Direct3D 10 will be able to run on top of DDI9 in the Direct3D 11 runtime, so Direct3D 9-level hardware will still be good to use. Things really get interesting this time around...

Demirug
17-Oct-2008, 22:25
... and Direct3D 10 will be able to run on top of DDI9 in the Direct3D 11 runtime, so Direct3D 9-level hardware will still be good to use. Things really get interesting this time around...

That’s the 9.x feature level we already talked about.

DmitryKo
31-Oct-2008, 18:00
In the meantime, there are some PDC 2008 sessions which discuss Direct3D features in Windows 7; the video feeds of the sessions and the PowerPoint presentations are available on Channel 9.

PC05 - Windows 7: Unlocking the GPU with Direct3D (http://channel9.msdn.com/pdc2008/PC05)
PC04 - Windows 7: Writing Your Application to Shine on Modern Graphics Hardware (http://channel9.msdn.com/pdc2008/PC04) (presented by Anantha Kancherla who will be hosting some of WinHEC Graphics sessions)


The D3DL9 feature levels are as follows:
9_1 - low-end SM 2.0 cards: GeForce FX, S3 Chrome, Intel G965, SiS Mirage
9_2 - higher-end SM 2.0 cards: ATI 9800 and X200,
9_3 - SM 3.0 cards: GeForce 6 series, ATI X1x000 and up
Each upper level is a superset of a lower level.


Multithreading will be possible on down-level hardware but requires a driver update; that's probably what WDDM 1.1 is about.

I'm still waiting for WinHEC to clarify the details, but if D3D10 Level 9 is a subset of Direct3D10/11, maybe D3D9 class hardware could expose a subset of DDI10 in the WDDM driver, avoiding the trickery with software translation to DDI9 in the runtime?


WARP10 is a new D3D10.1 software rasterizer in Direct3D 11, optimized for real-time performance; it allows DWM and Direct2D/DirectWrite (http://channel9.msdn.com/pdc2008/PC18/) to run without a WDDM-compatible GPU in Windows 7.
Works way faster than the current Reference Rasterizer - the Earth bump-mapping demo from DirectX SDK ran only about 8 times slower than it did on the Quadro FX3600M.

Demirug
04-Nov-2008, 18:01
Just some news for everyone. The Direct3D 11 runtime will support Compute Shaders on older hardware as part of tech level 10.1 (a new driver may be required for this).

Novum
04-Nov-2008, 19:04
Does that mean no current GeForce-GPU can be used for Compute Shaders? Ouch.

Jawed
04-Nov-2008, 20:21
I think it'll be alright:

http://www.cupidity.f9.co.uk/b3da019.jpg

It's interesting the phrase "Private Write/Shared Read on groupshared data" as this is precisely the description AMD gives to the LDS-enabled "inter-thread register sharing" in RV7xx GPUs.

I had been thinking that that's the full functionality of D3D11's inter-thread sharing. But it turns out that D3D11 requires arbitrary sharing.

Arbitrary sharing is what NVidia's GPUs do.

So as I understand it:

ATI HD2xxx and HD3xxx GPUs cannot run compute shader
ATI HD4xxx GPUs are OK
NVidia 8800GTX and later GPUs are OK

Jawed

Demirug
05-Nov-2008, 16:58
I will check with Alison and Mike what Chips will be compatible.

CarstenS
06-Nov-2008, 11:51
As pre-announced, the November SDK contains a limited number of DX11 samples:
• Dynamic Shader Linking
• HDR Tone Mapping
• Multithreaded Rendering
• Subdivision Surfaces / Tessellation

MT-Rendering is the only one running at interactive framerates on DX10-class hardware. The rest use the painfully slow reference rasterizer.

Some pics:
http://www.pcgameshardware.de/aid,666103/News/Direct_X_11_in_exklusiven_Screenshots_-_das_kann_die_neue_API/&menu=browser&mode=article&image_id=932884&article_id=666103&page=1

Humus
08-Nov-2008, 17:21
Download link for the November SDK:
http://www.microsoft.com/downloads/details.aspx?FamilyID=5493f76a-6d37-478d-ba17-28b1cca4865a&DisplayLang=en

Seth
23-Nov-2008, 17:46
Is everyone having fun with the D3D11 preview? I love it (except for the temporary lack of HW ;))!

I couldn't resist: http://reboot.zapto.org/cslife.png

Andrew Lauritzen
23-Nov-2008, 19:44
I haven't had time to play with it in depth yet, but I skimmed the documentation and looked at the samples.

So does anyone know what this "WARP" thing is? It appears to be sold as a fast, multi-core D3D10.1 software rasterizer by the documentation, but I've not had time to play with it yet. Is this maybe targeted at fast desktop compositing for Vista and newer OSes on computers without dedicated graphics hardware? I'm assuming the performance still wouldn't be up to spec for any modern games, but I haven't had time to try it out. Anyone played with this yet?

Jawed
23-Nov-2008, 19:53
So does anyone know what this "WARP" thing is? It appears to be sold as a fast, multi-core D3D10.1 software rasterizer by the documentation, but I've not had time to play with it yet. Is this maybe targeted at fast desktop compositing for Vista and newer OSes on computers without dedicated graphics hardware? I'm assuming the performance still wouldn't be up to spec for any modern games, but I haven't had time to try it out. Anyone played with this yet?
Yep, it's specifically intended to provide fully compliant rendering for anyone whose hardware isn't up to scratch.

It was demonstrated at PDC, doing a globe of the Earth with normal mapping/specularity and other funky material properties at about 85fps if I remember right.

Jawed

Demirug
23-Nov-2008, 20:05
So does anyone know what this "WARP" thing is? It appears to be sold as a fast, multi-core D3D10.1 software rasterizer by the documentation, but I've not had time to play with it yet. Is this maybe targeted at fast desktop compositing for Vista and newer OSes on computers without dedicated graphics hardware? I'm assuming the performance still wouldn't be up to spec for any modern games, but I haven't had time to try it out. Anyone played with this yet?

I have run our upcoming game on it. On my dual core system it hardly reaches 8 FPS with everything set to low.

Overall it looks more like an emergency fallback for Direct2D and DirectWrite. Maybe it could be an interesting way to use the SSE calculation power of a multicore system by running GPGPU shaders there.

Demirug
24-Nov-2008, 17:46
http://msdn.microsoft.com/en-us/library/dd285359.aspx

A whitepaper with some performance data.

pjbliverpool
24-Nov-2008, 19:54
http://msdn.microsoft.com/en-us/library/dd285359.aspx

A whitepaper with some performance data.

Interesting numbers. They certainly highlight how far CPUs are from GPU performance at the moment.

Even a Radeon 2400 Pro can run rings around 8 Nehalem cores at 3 GHz!

I guess Crysis wasn't built with WARP10 in mind though. It would be interesting to see the results for a game that was.

Ateo
25-Nov-2008, 06:04
Interesting numbers. They certainly highlight how far CPUs are from GPU performance at the moment.

Even a Radeon 2400 Pro can run rings around 8 Nehalem cores at 3 GHz!

I guess Crysis wasn't built with WARP10 in mind though. It would be interesting to see the results for a game that was.

That really depends on what you do with your data.

Some calculations are much faster on a GPU...those are the ones we are getting too much PR about...some calculations are much slower on the GPU...those you don't hear about....because they show the reality:

It's only a small number of calculations that can be done on a GPU instead of a CPU...and make sense at the same time.
(power consumption, time etc.)

Don't think the GPU can replace your CPU just yet...

pjbliverpool
25-Nov-2008, 15:12
That really depends on what you do with your data.

Some calculations are much faster on a GPU...those are the ones we are getting too much PR about...some calculations are much slower on the GPU...those you don't hear about....because they show the reality:

It's only a small number of calculations that can be done on a GPU instead of a CPU...and make sense at the same time.
(power consumption, time etc.)

Don't think the GPU can replace your CPU just yet...

I should have clarified my statement to specify "for rendering". I didn't mean to suggest that GPUs are faster across the board, just that we're nowhere near the point of being able to hang up our GPUs in favour of software-based rendering yet.

Larrabee may rock the boat in that regard though.

Davros
25-Nov-2008, 18:44
Windows 7: Combined release with Direct X11 is unlikely - Microsoft answered some questions about Windows 7.
http://www.pcgameshardware.com/aid,667657/News/Windows_7_Combined_release_with_Direct_X11_is_unlikely/

PCGH: Will Windows 7 be delivered with DirectX 11?

Ben Basaric: That is to be determined yet. I could speculate, but that wouldn't help you. But I can give a tendency: unlikely.

Edit: there's nothing like a bit of drama to liven up your day ;)

Microsoft's Ben Basaric just told us that Windows 7 will, contrary to the rather vague statement a few days ago, be delivered with DirectX 11. Furthermore DX11 will also come for Windows Vista although Microsoft wouldn't confirm, if this will coincide with the release of Service Pack 2 for Vista.

Arnold Beckenbauer
19-Feb-2009, 00:54
There is an article about some new features coming with D3D11 on PCGH.de (http://www.pcgameshardware.de/aid,676444/DirectX-11-Schnittstelle-fuer-Grafikkarten-in-Windows-7-und-Vista/Technologie/Wissen/), and there is a passage that makes me think D3D11 GPUs will get 32 KB of shared memory (per SIMD/core). Is that possible?

Seth
19-Feb-2009, 01:11
I highly recommend this document (http://www.microsoft.com/downloads/details.aspx?FamilyId=9F943B2B-53EA-4F80-84B2-F05A360BFC6A&displaylang=en). It is an exciting read! :D

Jawed
19-Feb-2009, 01:20
As far as I can tell the programming model for CS does not mention "memory".

Instead a thread is allowed to read/write registers belonging to other threads.

A thread group's 1024 threads are allowed to share data amongst each other. Each thread can have 2048 vec4 (8192 scalar) registers that are shared. That's a silly number, in the same vein as each thread being able to have 4096 private vec4 registers.

Anyway, if you give each of 1024 threads a single vec4 register, you get a 16KB block of memory to share across all threads in a work group. Where've you heard that number before?

Interestingly enough in ATI's Compute Shader mode, in Stream (i.e. nothing to do with D3D) there is a limit of 1024 threads that can run on any SIMD. And there's a 16KB LDS per SIMD. 1024 threads is not many (16 batches), it's going to struggle to hide fetch from memory latency if memory fetches are at all random.
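The arithmetic behind those figures, as a quick check. The register width (four 32-bit components per vec4) is standard; the batch size of 64 threads is an assumption inferred from "1024 threads ... (16 batches)" rather than something stated outright above.

```python
BYTES_PER_VEC4 = 4 * 4      # four 32-bit components
THREADS_PER_GROUP = 1024

# One shared vec4 register per thread across the whole thread group.
one_vec4_each = THREADS_PER_GROUP * 1 * BYTES_PER_VEC4

# The "silly number": every thread using all 2048 shared vec4 registers.
maximum_shared = THREADS_PER_GROUP * 2048 * BYTES_PER_VEC4

# Assumed batch size, inferred from 1024 threads being 16 batches.
BATCH_SIZE = 64
batches = THREADS_PER_GROUP // BATCH_SIZE

print(one_vec4_each // 1024, "KB")             # 16 KB
print(maximum_shared // (1024 * 1024), "MB")   # 32 MB
print(batches, "batches")                      # 16 batches
```

The 16 KB case lines up with the LDS size quoted for ATI's SIMDs, while the theoretical maximum is three orders of magnitude larger, which is why it reads as a programming-model limit rather than a hardware one.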

The reality is that due to thread indexing, the code sort of looks like it's addressing memory.

Jawed

Seth
19-Feb-2009, 01:21
The group size is set with numthreads() in HLSL:

For instance:

[numthreads(8,8,8)]
void CS() { ... }
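A small sketch of what that attribute implies, assuming the usual HLSL convention that a thread's global index is its group index times the group dimensions plus its index within the group. The helper names are hypothetical; only the `numthreads(8,8,8)` values come from the post above.

```python
import math

def group_size(numthreads):
    """Total threads in one thread group."""
    x, y, z = numthreads
    return x * y * z

def dispatch_thread_id(group_id, group_thread_id, numthreads):
    """Global thread index: group index * group dimensions + index within the group."""
    return tuple(g * n + t
                 for g, n, t in zip(group_id, numthreads, group_thread_id))

def groups_needed(extent, numthreads):
    """Thread groups per axis needed to cover a volume of the given extent."""
    return tuple(math.ceil(e / n) for e, n in zip(extent, numthreads))

NUMTHREADS = (8, 8, 8)
print(group_size(NUMTHREADS))                                 # 512
print(dispatch_thread_id((1, 0, 2), (3, 4, 5), NUMTHREADS))   # (11, 4, 21)
print(groups_needed((100, 100, 100), NUMTHREADS))             # (13, 13, 13)
```

An 8x8x8 group is 512 threads, comfortably under the 1024-thread group limit mentioned earlier in the thread.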