SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

B3D News

TransGaming has just released SwiftShader 2.0, a highly optimized software rasterizer that supports DX9 and Shader Model 2.0 and scales with multi-core processors. It can run (albeit slowly) many modern games, and it makes a dual-core Penryn perform similarly to the GeForce FX5600/5700 in 3DMark05.

Read the full news item
 
Looking forward to seeing the performance in Far Cry...
Far Cry Benchmark

The benchmark started at 04-Apr-08 01:33:43

System Information
Operating system: Windows (TM) Vista Ultimate
System memory: 4.0 GB
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz
CPU speed: 2400 MHz
Sound system: Luidsprekers (Creative SB X-Fi)
--------------------------------------------------------------------------------
Resolution: 1024×768
Maximum quality option, Direct3D renderer
Level: Research, demo: Research.tmd
Pixel shader: model 2.0b
Antialising: None
Anisotropic filtering: 1×
HDR: disabled
Geometry Instancing: disabled
Normal-maps compression: disabled

Score = 7.22 FPS
 
Please note that SwiftShader 2.0 hasn't been particularly optimized for any of these games. We are committed to optimizing for our clients' needs of course, but for existing games only the most obvious bottlenecks were analyzed.

And, just as importantly, none of these games have been optimized for software rendering. For example, using a cube map for vector normalization is tens of times slower than a 'nrm' shader operation. Also, many operations a graphics card does 'for free' actually cost cycles when software rendering, unless properly disabled.

So the above scores are not an upper limit for what is possible with software rendering. For this release we focussed mainly on features and quality, offering a 'complete' Direct3D 9 device for the casual games market.
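
To illustrate the cube map point above, here is a rough sketch of the work a software renderer has to do for each approach. It is not SwiftShader's code: the names `nrm_op` and `cube_lookup_normalize`, the face layout, the face size of 32 and the nearest-texel lookup are all assumptions made for the example, and it only shows the steps involved rather than measuring the slowdown.

```python
import math

FACE_SIZE = 32  # hypothetical cube map face resolution

def nrm_op(d):
    """Arithmetic normalization: one dot product, one reciprocal sqrt, one scale."""
    x, y, z = d
    inv_len = 1.0 / math.sqrt(x * x + y * y + z * z)
    return (x * inv_len, y * inv_len, z * inv_len)

def face_and_uv(d):
    """Select the major axis, then project onto that face (u, v in [0, 1])."""
    x, y, z = d
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face, ma, sc, tc = (0 if x > 0 else 1), ax, (-z if x > 0 else z), -y
    elif ay >= az:
        face, ma, sc, tc = (2 if y > 0 else 3), ay, x, (z if y > 0 else -z)
    else:
        face, ma, sc, tc = (4 if z > 0 else 5), az, (x if z > 0 else -x), -y
    return face, 0.5 * (sc / ma + 1.0), 0.5 * (tc / ma + 1.0)

def texel_direction(face, u, v):
    """Inverse of face_and_uv, used only to fill the lookup table."""
    sc, tc = 2.0 * u - 1.0, 2.0 * v - 1.0
    return [(1.0, -tc, -sc), (-1.0, -tc, sc), (sc, 1.0, tc),
            (sc, -1.0, -tc), (sc, -tc, 1.0), (-sc, -tc, -1.0)][face]

# Precomputed table of unit vectors, one per texel centre on each of the 6 faces.
CUBE = [[[nrm_op(texel_direction(f, (i + 0.5) / FACE_SIZE, (j + 0.5) / FACE_SIZE))
          for i in range(FACE_SIZE)] for j in range(FACE_SIZE)] for f in range(6)]

def cube_lookup_normalize(d):
    """Normalization by lookup: compares, a divide, address math, then a memory fetch."""
    face, u, v = face_and_uv(d)
    i = min(int(u * FACE_SIZE), FACE_SIZE - 1)
    j = min(int(v * FACE_SIZE), FACE_SIZE - 1)
    return CUBE[face][j][i]
```

Even without filtering, the lookup path needs face selection, a division and address computation before it touches memory, and the result is only as precise as the table resolution. The arithmetic path is a handful of multiply-adds plus a reciprocal square root.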
 
So it would be sort of playable at 640x480 with maximum quality, ~15fps?

That's an OpenGL game I'm afraid.
:oops: If I ever knew that I've totally forgotten :oops: :oops:

I can play whole levels on my machine. FPS varies between about 10 and 40 (slightly less than 20 most of the time) for 640x480 and settings at High.
So that's quite decent too. Both of these games would scale to 800x600 or 1280x1024 with lower quality settings.

Since we talk a lot about ALU:TEX ratio for hardware and where software is in relation to that, would you like to hazard a guess at this ratio for your C2Q PC?

Please note that SwiftShader 2.0 hasn't been particularly optimized for any of these games. We are committed to optimizing for our clients' needs of course, but for existing games only the most obvious bottlenecks were analyzed.

And, just as importantly, none of these games have been optimized for software rendering. For example, using a cube map for vector normalization is tens of times slower than a 'nrm' shader operation. Also, many operations a graphics card does 'for free' actually cost cycles when software rendering, unless properly disabled.
I dare say you're faced by the same kind of questions of "driver optimisation", profiling and shader replacement that the IHVs face. And presumably these games (or this type of game) aren't "casual enough" to be your main focus.

So, it's early days yet, too early to examine the detailed performance of state of the art software rendering on C2Q, say, against dual-core-with-IGP systems.

Presumably you're looking forward to Nehalem - I imagine the shiny new memory system will make things run significantly better. Does the performance of SS on Phenom show benefits attributable to its memory system (as opposed to C2 or X2)?

Jawed
 
Wow, this is awesome!! I'm glad there's a free-to-use software rasterizer for D3D out there. I can't wait to try it out for myself.

Quick question: right now it supports only SM2? Not SM2.a/b, or even SM3?
 
SwiftShader 2.0 on AMD Phenom

I'm really anxious to know how well it performs on the Phenom platform as well. SS seems like an extremely memory intensive program and should benefit from Phenom/Nehalem cache structures and system memory bandwidth. I hope they add SSE4/SSE4a enhancements in future revisions to make better use of the power of these new chips. Also, I wonder if the virtualization features of AMD and Intel processors can be used to their advantage.

EDIT: I forgot to ask, does SS 2.0 have x64 code optimizations? It seems to me it would benefit from the memory optimizations x64 allows.
 
So it would be sort of playable at 640x480 with maximum quality, ~15fps?
That might be slightly optimistic. Pixel processing takes the majority of execution time, but vertex processing and primitive setup are not negligible, especially at such low resolutions. Furthermore, cache coherency and prefetch efficiency improve with more pixels per triangle.
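
As a back-of-the-envelope check on that estimate, here is a small sketch that scales the 7.22 FPS result from 1024x768 down to 640x480, assuming only part of the frame time is proportional to pixel count. The `pixel_fraction` values are made-up inputs, not measurements, and the model ignores the cache and prefetch effects mentioned above.

```python
def scaled_fps(fps, old_res, new_res, pixel_fraction):
    """Estimate FPS at a new resolution.

    pixel_fraction is the (assumed) share of frame time spent on per-pixel
    work; the rest (vertex processing, setup, game logic) is treated as fixed.
    """
    old_pixels = old_res[0] * old_res[1]
    new_pixels = new_res[0] * new_res[1]
    frame_time = 1.0 / fps
    pixel_time = frame_time * pixel_fraction * (new_pixels / old_pixels)
    fixed_time = frame_time * (1.0 - pixel_fraction)
    return 1.0 / (pixel_time + fixed_time)

# Hypothetical pixel-bound fractions: 100%, 90% and 80% of the frame time.
for fraction in (1.0, 0.9, 0.8):
    fps = scaled_fps(7.22, (1024, 768), (640, 480), fraction)
    print(f"pixel-bound fraction {fraction:.1f}: {fps:.1f} FPS")
```

If every cycle were per-pixel work, the model gives roughly 18.5 FPS; with 80% per-pixel work it drops to about 14 FPS, so ~15 FPS only holds if the fixed costs stay small.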
:oops: If I ever knew that I've totally forgotten :oops: :oops:
I'm probably wrong, sorry. I've never actually run the game. I'll download the demo and give it a try...
So that's quite decent too. Both of these games would scale to 800x600 or 1280x1024 with lower quality settings.
Actually, no, unfortunately. Lowering quality in Half-Life 2 makes it use Shader Model 1.x but it does nearly the same operations. It does things similar to using a cube texture lookup for vector normalization.
Since we talk a lot about ALU:TEX ratio for hardware and where software is in relation to that, would you like to hazard a guess at this ratio for your C2Q PC?
The theoretical floating-point performance of modern CPUs is actually close to that of mid-range graphics cards (for multiply-add). So in my experience software rendering can handle pure arithmetic work really well. Texture sampling however requires a lot of instructions to implement. You could use RenderMonkey to compare the costs. I'd actually be quite interested in the results myself. :D Transcendental functions like log and exp also don't map directly to x86 instructions.
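
To give a feel for why texture sampling takes many instructions in software, here is a minimal sketch of a single bilinear fetch with wrap addressing. It is an illustration under simple assumptions (a single-channel float texture stored as a list of rows, no mipmapping, and the hypothetical name `sample_bilinear`), not SwiftShader's sampler.

```python
import math

def sample_bilinear(texture, u, v):
    """Bilinear sample of a 2D single-channel texture with wrap addressing."""
    height, width = len(texture), len(texture[0])
    # Convert normalized coordinates to texel space (texel centres at integers).
    x = u * width - 0.5
    y = v * height - 0.5
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0          # interpolation weights
    # Wrap the four texel coordinates.
    x0w, x1w = x0 % width, (x0 + 1) % width
    y0w, y1w = y0 % height, (y0 + 1) % height
    # Four fetches from scattered addresses.
    t00, t10 = texture[y0w][x0w], texture[y0w][x1w]
    t01, t11 = texture[y1w][x0w], texture[y1w][x1w]
    # Three linear interpolations.
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy

checker = [[0.0, 1.0],
           [2.0, 3.0]]
print(sample_bilinear(checker, 0.5, 0.5))  # centre of the 2x2 texture
```

A real sampler repeats this per colour channel and adds format conversion and mip level selection, while a multiply-add in the shader is a single operation.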

Note though that it's never a bottleneck in the ALU:TEX sense of graphics hardware. There are no dedicated texture samplers that could be a bottleneck on their own. It's just a shift in where most cycles are spent. This is also why I'm a big proponent of adding a gather instruction to CPUs. It's useful for texture sampling, transcendental functions (for lookup tables), and tons of other things besides graphics. Adding actual texture sampling units would be of much less use for anything else and hard to standardize.
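
As a rough illustration of the lookup-table use of gather mentioned above, here is a sketch of a table-based 2^x evaluated for several lanes at once. The `gather` helper is a hypothetical stand-in for the proposed instruction (without it, each lane is a separate scalar load), and the table size of 256 entries is an arbitrary choice for the example.

```python
import math

TABLE_BITS = 8                      # arbitrary table resolution for this sketch
TABLE_SIZE = 1 << TABLE_BITS
# 2^f for f in [0, 1], with one extra entry so interpolation can read index + 1.
EXP2_TABLE = [2.0 ** (i / TABLE_SIZE) for i in range(TABLE_SIZE + 1)]

def gather(table, indices):
    """Stand-in for a hardware gather: one table load per lane."""
    return [table[i] for i in indices]

def exp2_lanes(xs):
    """Approximate 2^x per lane using a table lookup plus linear interpolation."""
    ints = [math.floor(x) for x in xs]               # integer part -> exponent
    fracs = [x - n for x, n in zip(xs, ints)]        # fractional part in [0, 1)
    pos = [f * TABLE_SIZE for f in fracs]
    idx = [min(int(p), TABLE_SIZE - 1) for p in pos]
    lo = gather(EXP2_TABLE, idx)                     # per-lane indexed loads
    hi = gather(EXP2_TABLE, [i + 1 for i in idx])
    mant = [a + (b - a) * (p - i) for a, b, p, i in zip(lo, hi, pos, idx)]
    return [math.ldexp(m, n) for m, n in zip(mant, ints)]

print(exp2_lanes([0.0, 1.5, -2.25, 10.3]))
```

Everything except the two `gather` calls maps naturally onto SIMD arithmetic; the indexed loads are the part that has to be serialized lane by lane on a CPU without such an instruction.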
I dare say you're faced by the same kind of questions of "driver optimisation", profiling and shader replacement that the IHVs face. And presumably these games (or this type of game) aren't "casual enough" to be your main focus.

So, it's early days yet, too early to examine the detailed performance of state of the art software rendering on C2Q, say, against dual-core-with-IGP systems.
I'm afraid so, yes. But I'm up for the challenge, so stay tuned. ;)
Presumably you're looking forward to Nehalem - I imagine the shiny new memory system will make things run significantly better.
Yes, Nehalem and especially Sandy Bridge (with AVX) look very exciting.
Does the performance of SS on Phenom show benefits attributable to its memory system (as opposed to C2 or X2)?
Unfortunately I don't have a Phenom system to test with. But I have noticed some 'interesting' behavior on Athlon X2. It's too early to draw conclusions though.
 
Wouldn't this be of benefit for those games which don't run properly (rendering errors) on modern cards and drivers?
Examples:
System Shock 2
Thief 2
Crimson Skies

Edit:
Just tried Crimson Skies and it just doesn't run with SwiftShader installed.
 
SS seems like an extremely memory intensive program...
What makes you think that? It renders at relatively low framerates and low resolution, so total bandwidth needs are modest, and there are a lot of cycles between texture accesses simply because of the filtering.

Inter-core bandwidth and latency are something to stay aware of though, especially with increasing core counts.
EDIT: I forgot to ask, does SS 2.0 have x64 code optimizations? It seems to me it would benefit from the memory optimizations x64 allows.
No, it's still 32-bit. Since we're aiming mainly at the casual games market, 64-bit makes no sense yet. On the other hand, the extra registers would definitely help performance. What specific memory optimizations are you referring to?
 
Really funny piece of code! ;)

Here are some fillrate numbers on my E8400 @ 4GHz:

Code:
           FrameBuffer Clear : 1254,4 FPS
                  Color Fill : 395,1034 M-Pixel/s
                      Z Fill : 807,8229 M-Pixel/s
              Color + Z Fill : 309,5396 M-Pixel/s
              Single Texture : 186,2271 M-Pixel/s
  Single Texture Alpha Blend : 163,5779 M-Pixel/s
               Dual Textures : 115,7628 M-Pixel/s
             Triple Textures : 83,04722 M-Pixel/s
               Quad Textures : 65,43114 M-Pixel/s
    1 Floating Poing Texture : 143,4452 M-Pixel/s
              Render to Self : 171,9665 M-Pixel/s
               PS 1.1 Simple : 161,0613 M-Pixel/s
               PS 1.4 Simple : 166,0944 M-Pixel/s
               PS 2.0 Simple : 135,8955 M-Pixel/s
            PS 2.0 PP Simple : 138,412 M-Pixel/s
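
For a rough sense of scale, a few of the fillrates above can be turned into an approximate CPU cycle budget per pixel. This sketch assumes both cores of the E8400 were fully busy at the stated 4 GHz, which the post does not confirm, so treat the results as ballpark figures. The numbers use decimal commas, so they are parsed accordingly.

```python
CLOCK_HZ = 4.0e9   # overclocked E8400, as stated in the post
CORES = 2          # assumption: both cores were fully used by the rasterizer

# A subset of the results above, copied with their decimal commas (M-Pixel/s).
RESULTS = """\
Color Fill : 395,1034
Z Fill : 807,8229
Single Texture : 186,2271
Quad Textures : 65,43114
PS 2.0 Simple : 135,8955
"""

def cycles_per_pixel(mpixels_per_s, clock_hz=CLOCK_HZ, cores=CORES):
    """Total available cycles per second divided by pixels written per second."""
    return clock_hz * cores / (mpixels_per_s * 1e6)

def parse_results(text):
    """Parse 'name : value' lines, converting decimal commas to points."""
    table = {}
    for line in text.splitlines():
        name, value = line.split(" : ")
        table[name.strip()] = float(value.replace(",", "."))
    return table

for name, rate in parse_results(RESULTS).items():
    print(f"{name:>16}: {cycles_per_pixel(rate):6.1f} cycles/pixel")
```

Under those assumptions a plain color fill costs about 20 cycles per pixel and a single texture about 43.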

Yay! My Penryn got double Z/Stencil rate... :LOL:

Screen shots? How is the quality compared to hardware rendering?
I noticed that there is some colour banding in the RTHDRIBL demo, mostly on flares and bloom edges.
 
Thanks for the info! Another game demo I'll download and try. I'm starting to run out of disk space here... ;)

Nick:
One thing: the retail version of Crimson Skies wouldn't work at all; it complained about needing DirectX 7.0 or higher, and I had to patch it to version 1.02 for it to work. As demos are usually based on version 1.0 and never updated, it may not work for you.

PS: if you're looking for ideas of games to try, read this thread:
http://forum.beyond3d.com/showthread.php?t=47534

Edit 2: sorry, I thought you just wanted ideas for games to play; I didn't realise you were connected to SwiftShader and actually wanted non-working games to test ;)
 
Texture filtering is completely missing here, as well as the AA:

61125049zz0.jpg
 
Quick question: right now it supports only SM2? Not SM2.a/b, or even SM3?
It's Shader Model 2.x actually. It supports dynamic branching and predication for vertex shaders and gradient instructions for pixel shaders, and there are no limitations on dependent texture reads, register counts or shader length. Arbitrary swizzles are supported as well.
 
Texture filtering is completely missing here, as well as the AA:
You can enable trilinear filtering with SwiftConfig (either the .ini file or the web server).

Anisotropic filtering and anti-aliasing are currently not implemented.
 
What about Larrabee?

If SwiftShader runs on x86, what about Larrabee?

After all, it (Larrabee) is supposed to be all about easy x86 programming...

Markku
 