SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

Well, it seems that Microsoft's new WARP rasterizer uses a similar technique to SwiftShader.
But WARP is a fully-featured DX10/11 renderer, which makes me wonder where that leaves SwiftShader.
Also, I wonder which of the two is the faster renderer. Sadly a 1:1 comparison seems to be out of the question, since WARP can't run DX9 code and SwiftShader can't run DX10/11 code.
 
Well, it seems that Microsoft's new WARP rasterizer uses a similar technique to SwiftShader.
But WARP is a fully-featured DX10/11 renderer, which makes me wonder where that leaves SwiftShader.
Also, I wonder which of the two is the faster renderer. Sadly a 1:1 comparison seems to be out of the question, since WARP can't run DX9 code and SwiftShader can't run DX10/11 code.

Did you not just answer your own question? SwiftShader is useful for DX9 code, WARP for DX10/11...

I'd imagine a typical DX10 title will have more complex shaders making them slower.
 
Wow, you bump this over-a-year-old thread just to make that comment. Congratulations, you are the winnar!


*of the first annual biggest asshole award on B3D.
 
Did you not just answer your own question? SwiftShader is useful for DX9 code, WARP for DX10/11...

Not really. As far as I know, the idea was to make SwiftShader support DX10. I recall Nick saying that he wanted to have the first SM4.0 implementation...
If that is still the goal, then SwiftShader would become a direct competitor.
Or has WARP moved the goalposts for SwiftShader now?

I'd imagine a typical DX10 title will have more complex shaders making them slower.

Depends on how you look at it.
If we take the usage of SwiftShader as a software solution for 'casual games', relieving the developer from worrying about hardware compatibility... then that doesn't hold.
WARP could do the same thing, except you would use DX10/DX11 instead of DX9. Arguably, more powerful shaders are actually better for a software renderer. You can render effects with more elegant algorithms, rather than just brute force and multiple passes.
In a few years, DX9 code may no longer be relevant, and even casual games might use DX10/11.

In other words: does this mark the end of SwiftShader? Was it already dead anyway? Or is SwiftShader moving forward, and will it compete with WARP?
 
Wow, you bump this over-a-year-old thread just to make that comment. Congratulations, you are the winnar!

I'm not sure that really makes him an asshole; it was a valid observation, and I assume SwiftShader is still being worked on in some capacity.
 
Has anyone succeeded at getting WARP to run Crysis? I keep getting an error about the D3D10ReflectShader entry point not being found. I assume that's either because it's an older beta build, or they use a slightly modified version of Crysis.
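For what it's worth, a quick sanity check is to look the export up by hand; if the lookup fails, it's probably the runtime that's too old rather than the game. A rough sketch:

Code:
#include <windows.h>
#include <cstdio>

// Check whether the installed d3d10.dll actually exports D3D10ReflectShader.
// If this prints "missing", the loader error comes from an old/beta runtime.
int main()
{
    HMODULE d3d10 = LoadLibraryA("d3d10.dll");
    if (!d3d10)
    {
        printf("d3d10.dll not found\n");
        return 1;
    }

    FARPROC proc = GetProcAddress(d3d10, "D3D10ReflectShader");
    printf("D3D10ReflectShader: %s\n", proc ? "exported" : "missing");

    FreeLibrary(d3d10);
    return 0;
}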
 
I'd imagine a typical DX10 title will have more complex shaders making them slower.
The triple-A titles that push the envelope certainly do, but applications suited for software rendering are no more complex than when using Direct3D 9. They typically don't use anything beyond the capabilities of a Shader Model 2.0 card.

Crysis with all settings on low looks no different to me when running with DX9 or DX10, and with the latter API it only requires Shader Model 2.0.
 
The triple-A titles that push the envelope certainly do, but applications suited for software rendering are no more complex than when using Direct3D 9. They typically don't use anything beyond the capabilities of a Shader Model 2.0 card.

Crysis with all settings on low looks no different to me when running with DX9 or DX10, and with the latter API it only requires Shader Model 2.0.

Yea, that's what one would think. However, when I was playing around on my Intel X3100,
I noticed that Crysis ran slower in D3D10 mode than in D3D9 mode, even at the lowest settings.

So I conducted a small test on my own. I rendered the exact same scene with the exact same shaders in D3D9 and D3D10, and D3D9 was around 10% faster.
And I literally mean the exact same shaders. With the D3DX compiler you can compile the exact same source code for D3D9 or D3D10.
The shaders were very trivial anyway, just per-pixel diffuse lighting. Nothing beyond SM2.0, although I compiled them for SM3.0 and SM4.0.
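Just to illustrate what I mean by the exact same source code (a rough sketch, not my actual test shader; the D3D10 compiler maps the COLOR output semantic to SV_Target for backwards compatibility):

Code:
#include <d3dx9.h>   // D3DXCompileShader (D3D9 path, link with d3dx9.lib)
#include <d3dx10.h>  // D3DX10CompileFromMemory (D3D10 path, link with d3dx10.lib)

// One HLSL string, compiled for both APIs. The shader is just a stand-in
// for the trivial per-pixel diffuse case.
static const char g_ps[] =
    "float4 lightDir;                                 \n"
    "float4 lightColor;                               \n"
    "float4 main(float3 normal : TEXCOORD0) : COLOR   \n"
    "{                                                \n"
    "    float ndl = saturate(dot(normalize(normal),  \n"
    "                             -lightDir.xyz));    \n"
    "    return lightColor * ndl;                     \n"
    "}                                                \n";

void CompileBothWays()
{
    // D3D9: compile for ps_3_0 with the D3DX9 compiler.
    ID3DXBuffer* code9 = NULL;
    ID3DXBuffer* errors9 = NULL;
    D3DXCompileShader(g_ps, sizeof(g_ps) - 1, NULL, NULL,
                      "main", "ps_3_0", 0, &code9, &errors9, NULL);

    // D3D10: compile the very same string for ps_4_0 with the D3DX10 compiler.
    ID3D10Blob* code10 = NULL;
    ID3D10Blob* errors10 = NULL;
    D3DX10CompileFromMemory(g_ps, sizeof(g_ps) - 1, NULL, NULL, NULL,
                            "main", "ps_4_0", 0, 0, NULL,
                            &code10, &errors10, NULL);
}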

Makes me wonder where the extra overhead comes from in D3D10. Is it just poor Intel drivers, or does D3D10 really do something different?
One would think that D3D10 would be faster, because my code would theoretically work more efficiently in D3D10. I update all shader constants in one call, and I don't need BeginScene()/EndScene(), and things like that.
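Per frame, the difference is roughly this (simplified sketch, setup omitted and draw arguments are placeholders):

Code:
#include <d3d9.h>
#include <d3d10.h>
#include <dxgi.h>

// D3D9: every frame needs the BeginScene()/EndScene() bracket.
void DrawFrame9(IDirect3DDevice9* dev)
{
    dev->BeginScene();
    dev->DrawIndexedPrimitive(D3DPT_TRIANGLELIST, 0, 0, 100, 0, 50);
    dev->EndScene();
    dev->Present(NULL, NULL, NULL, NULL);
}

// D3D10: no scene bracket at all; just draw and present the swap chain.
void DrawFrame10(ID3D10Device* dev, IDXGISwapChain* swapChain)
{
    dev->DrawIndexed(150, 0, 0);
    swapChain->Present(0, 0);
}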

I've also tried it on my 8800GTS. The difference between D3D9 and D3D10 was minimal, but D3D9 was still a smidge faster in Vista.
When running the D3D9 code on XP Pro or XP x64, it was faster than either D3D9 or D3D10 in Vista. I've only tried it in windowed mode so far, though... Perhaps the Vista desktop is a limiting factor in performance; I'll have to see what happens when I run both in fullscreen to eliminate that factor.
 
One point is the driver overhead: Vista virtualizes all memory, so it also pushes the data to the drivers when and how it wants. That intermediate buffering overhead is what makes Vista in general slower than XP (and it also makes it more difficult to do application-specific optimizations, like drivers did for a lot of games on WinXP).

The main difference between D3D9 and D3D10 is the constant handling.

In D3D10 mode, the driver has to assume there are some new constants that have to be set, which means that at least the cache needs to be flushed. In the worst case it means that some shader optimizations are either done on a per-draw-call basis or not enabled at all.

In D3D9 mode, the driver gets the constants you set, probably via the command buffer; if you don't set any, it knows all the data is up to date and all shaders can be kept.

Constant buffers should usually save CPU overhead on the application side, but applications usually work with constant buffers like simple constants, updating most of them frequently and rarely keeping constants for the long term (like maybe material settings). Additionally, if you want to change just one simple constant, you have to update them all, which leads to more overhead rather than savings.
The drivers also have to keep track of all constant settings: even if you just change one, you have to push the whole constant buffer over the bus to video memory.
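To make that concrete, something like this (made-up layout, default-usage buffer updated with UpdateSubresource):

Code:
#include <d3d10.h>

// Made-up constant block: only 'time' changes per frame, but the whole
// struct gets re-sent, since a D3D10 constant buffer can only be updated
// as one unit (no partial updates).
struct PerFrameConstants
{
    float worldViewProj[16];
    float lightDir[4];
    float time[4];
};

void TouchOneConstant(ID3D10Device* dev, ID3D10Buffer* cb,
                      PerFrameConstants& cpuCopy, float newTime)
{
    cpuCopy.time[0] = newTime;

    // The full sizeof(PerFrameConstants) travels to the driver, not just
    // the 4 bytes that actually changed (buffer created with DEFAULT usage).
    dev->UpdateSubresource(cb, 0, NULL, &cpuCopy, 0, 0);
}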
 
One point is the driver overhead: Vista virtualizes all memory, so it also pushes the data to the drivers when and how it wants. That intermediate buffering overhead is what makes Vista in general slower than XP (and it also makes it more difficult to do application-specific optimizations, like drivers did for a lot of games on WinXP).

Well, D3D9 was faster even on Vista and Windows 7.

The main difference between D3D9 and D3D10 is the constant handling.

In D3D10 mode, the driver has to assume there are some new constants that have to be set, which means that at least the cache needs to be flushed. In the worst case it means that some shader optimizations are either done on a per-draw-call basis or not enabled at all.

In D3D9 mode, the driver gets the constants you set, probably via the command buffer; if you don't set any, it knows all the data is up to date and all shaders can be kept.

Doesn't make sense to me.
I do update the constants all the time; at the very least I need to update the transform matrices for the object animation, and the light positions and such.
With D3D9 I have to make a separate call for each constant that I update. With D3D10 instead, I just map the entire constant buffer in one go, put the new values in, and unmap it.
So in D3D10 I specifically tell the driver "I'm done with it, the constant buffer is up to date now", where with D3D9 it doesn't know what is going on exactly.
In my case I update all constants every frame anyway, because I used very simple shaders.
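In code the contrast is roughly this (a simplified sketch, not my actual test code; the D3D10 buffer is assumed to be created with dynamic usage and CPU write access):

Code:
#include <d3d9.h>
#include <d3d10.h>
#include <cstring>

// Hypothetical constant layout for the simple test shaders:
// one matrix plus a couple of light vectors.
struct TestConstants
{
    float worldViewProj[16];
    float lightPos[4];
    float lightColor[4];
};

// D3D9: one SetVertexShaderConstantF call per constant (or constant range).
void SetConstants9(IDirect3DDevice9* dev, const TestConstants& c)
{
    dev->SetVertexShaderConstantF(0, c.worldViewProj, 4); // c0..c3
    dev->SetVertexShaderConstantF(4, c.lightPos,      1); // c4
    dev->SetVertexShaderConstantF(5, c.lightColor,    1); // c5
}

// D3D10: map the whole constant buffer once, copy everything in, unmap --
// at which point the runtime/driver knows the buffer is up to date.
void SetConstants10(ID3D10Buffer* cb, const TestConstants& c)
{
    void* mapped = NULL;
    if (SUCCEEDED(cb->Map(D3D10_MAP_WRITE_DISCARD, 0, &mapped)))
    {
        memcpy(mapped, &c, sizeof(c));
        cb->Unmap();
    }
}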

Also, what you're saying isn't entirely correct. You can have multiple constant buffers, and you should group them by how frequently they are updated. All this should make D3D10 more efficient when used properly. So you don't need to push "the whole constant buffer" over the bus, only the buffer you're updating at the time. And since you do the update in a single go, it should get maximum performance from a burst transfer over the bus.
However, in my case the constant buffers were very small. Only one matrix and a few float values. So bandwidth shouldn't be an issue anyway.
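For example, something like this (illustrative names; the per-object vs. per-material split is just one way to do it):

Code:
#include <d3d10.h>

// Create a small dynamic constant buffer that the CPU updates via Map().
ID3D10Buffer* CreateConstantBuffer(ID3D10Device* dev, UINT byteWidth)
{
    D3D10_BUFFER_DESC desc = {};
    desc.ByteWidth      = byteWidth;                  // multiple of 16 bytes
    desc.Usage          = D3D10_USAGE_DYNAMIC;
    desc.BindFlags      = D3D10_BIND_CONSTANT_BUFFER;
    desc.CPUAccessFlags = D3D10_CPU_ACCESS_WRITE;

    ID3D10Buffer* buffer = NULL;
    dev->CreateBuffer(&desc, NULL, &buffer);
    return buffer;
}

// Bind the buffers by update frequency: slot 0 (register b0 in HLSL) for
// per-object data updated every draw, slot 1 (b1) for rarely-changing
// material data.
void BindByFrequency(ID3D10Device* dev,
                     ID3D10Buffer* perObjectCB,
                     ID3D10Buffer* perMaterialCB)
{
    ID3D10Buffer* buffers[2] = { perObjectCB, perMaterialCB };
    dev->VSSetConstantBuffers(0, 2, buffers);
}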

I wonder if it may have something to do with thread safety. D3D9 isn't thread-safe by default, and I never used that flag to get a thread-safe instance. I don't think D3D10 has this option, so perhaps you always get a thread-safe instance by default, which would explain at least some of the extra overhead.
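For reference, this is the flag I mean; in D3D10 the device is thread-safe by default, and there's only a D3D10_CREATE_DEVICE_SINGLETHREADED flag to opt out. A sketch of the D3D9 side:

Code:
#include <d3d9.h>

// D3D9 device creation: thread safety is opt-in via D3DCREATE_MULTITHREADED.
// Leaving the flag out (as in my test) skips the runtime's internal locking.
IDirect3DDevice9* CreateDevice9(IDirect3D9* d3d, HWND window,
                                D3DPRESENT_PARAMETERS& pp, bool threadSafe)
{
    DWORD flags = D3DCREATE_HARDWARE_VERTEXPROCESSING;
    if (threadSafe)
        flags |= D3DCREATE_MULTITHREADED; // adds per-call synchronization

    IDirect3DDevice9* device = NULL;
    d3d->CreateDevice(D3DADAPTER_DEFAULT, D3DDEVTYPE_HAL, window,
                      flags, &pp, &device);
    return device;
}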
 
Congratulations! This is good news indeed.
 
Yea, that's what one would think. However, when I was playing around on my Intel X3100,
I noticed that Crysis ran slower in D3D10 mode than in D3D9 mode, even at the lowest settings.

So I conducted a small test on my own. I rendered the exact same scene with the exact same shaders in D3D9 and D3D10, and D3D9 was around 10% faster.
And I literally mean the exact same shaders. With the D3DX compiler you can compile the exact same source code for D3D9 or D3D10.
The shaders were very trivial anyway, just per-pixel diffuse lighting. Nothing beyond SM2.0, although I compiled them for SM3.0 and SM4.0.

Makes me wonder where the extra overhead comes from in D3D10. Is it just poor Intel drivers, or does D3D10 really do something different?
One would think that D3D10 would be faster, because my code would theoretically work more efficiently in D3D10. I update all shader constants in one call, and I don't need BeginScene()/EndScene(), and things like that.

I've also tried it on my 8800GTS. The difference between D3D9 and D3D10 was minimal, but D3D9 was still a smidge faster in Vista.
When running the D3D9 code on XP Pro or XP x64, it was faster than either D3D9 or D3D10 in Vista. I've only tried it in windowed mode so far, though... Perhaps the Vista desktop is a limiting factor in performance; I'll have to see what happens when I run both in fullscreen to eliminate that factor.

Crysis has a major confound when judging D3D10 performance in the form of its in-engine texture streamer. It's disabled at the lower two texture settings and kicks in at the higher two.

It is a major confound because DX10 already does a form of streaming of its own in addition to the engine-based one. Disabling texture streaming in the engine brings memory usage up to ~1.5 GB in DX9 mode while it remains at ~1 GB in DX10 mode at the highest texture detail settings. The in-engine streamer also introduces artifacts, so it becomes an apples-to-oranges scenario.

The same memory usage behavior is true of all the DX10 games I've tried, but with nowhere near the performance drop. Far Cry 2, for example, drops from ~700 MB to ~400 MB going from DX9 to DX10 while still managing to perform faster.
 
Doesn't that have to do with the virtualized video memory system in D3D10, though?
From what I understood, in DX9 all texture memory is mapped into the virtual address space at all times, but with DX10 textures aren't mapped into the address space at all unless you specifically Map() them...?
 