SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

Do you have a breakdown for address generation, decompression and filtering, say for simple 8bpc bilinear?
Sorry, the exact details could reveal IP. But you can probably imagine that each of these steps takes multiple instructions to implement, versus a nearly 1-to-1 translation for add, mul, dot4, etc. With SSE5/AVX, CPUs will even have fused multiply-add... Compared to that, texture sampling is very expensive (though per-component shift and multiply-add do optimize address generation). Again, a gather instruction would be quite useful...
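To give a rough idea (an illustrative scalar sketch, not SwiftShader's actual code), here is roughly what a single bilinear fetch from an 8bpc texture involves; every wrap, shift and blend below is work that a plain add or mul never needs:

#include <cstdint>
#include <cmath>

// Illustrative scalar sketch of a bilinear fetch from an 8-bit-per-channel
// RGBA texture. Every step below is work a 'texld' has to do in software,
// while an 'add' or 'mul' maps to a single SSE instruction.
uint32_t sampleBilinear(const uint32_t* texels, int width, int height,
                        float u, float v)
{
    // Address generation: map normalized coordinates to texel space.
    float x = u * width  - 0.5f;
    float y = v * height - 0.5f;
    int x0 = (int)std::floor(x), y0 = (int)std::floor(y);
    int x1 = x0 + 1,             y1 = y0 + 1;

    // Wrap addressing (just one of several possible modes).
    auto wrap = [](int c, int size) { return ((c % size) + size) % size; };
    x0 = wrap(x0, width);  x1 = wrap(x1, width);
    y0 = wrap(y0, height); y1 = wrap(y1, height);

    // Four scattered loads - exactly what a gather instruction would replace.
    uint32_t c00 = texels[y0 * width + x0];
    uint32_t c10 = texels[y0 * width + x1];
    uint32_t c01 = texels[y1 * width + x0];
    uint32_t c11 = texels[y1 * width + x1];

    // Filtering: fixed-point weights and a per-channel blend.
    int fx = (int)((x - std::floor(x)) * 256.0f);
    int fy = (int)((y - std::floor(y)) * 256.0f);
    uint32_t result = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        int t00 = (c00 >> shift) & 0xFF, t10 = (c10 >> shift) & 0xFF;
        int t01 = (c01 >> shift) & 0xFF, t11 = (c11 >> shift) & 0xFF;
        int top    = t00 + (((t10 - t00) * fx) >> 8);
        int bottom = t01 + (((t11 - t01) * fx) >> 8);
        result |= (uint32_t)(top + (((bottom - top) * fy) >> 8)) << shift;
    }
    return result;
}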
So I could install RenderMonkey, put the SS DX9.dll in the same folder and get some realtime execution stats from some simple shaders?
Sure. Press 's' to see the stats. By varying resolution you can get an idea of how much time is spent doing vertex processing and setup, versus pixel processing.
Presumably, as a result of this, SS doesn't need to have many pixels in flight to hide texturing latency.
Automatic prefetch and out-of-order execution do a great job as far as I'm aware. And Hyper-Threading also means that at a hardware level twice the work is "in flight".
 
Just tried it on unit & benchmark tests we have for Unity...

Almost all rendering unit tests pass correctly (there are some issues with attenuation of point vertex lights and a couple of other minor things)! The SwiftShader 1.x demo used to crash on some of those, so the 2.0 release is a huge step forward.

Performance-wise, I only compared a Core 2 Quad + Radeon 3850 versus the same CPU with SwiftShader. Obviously it runs 30 to 100 times slower, but hey. I'll compare the same CPU with an Intel 965 for something more in the same ballpark.

Color me impressed! Amazing work.
 
Nice job!!

I'm not up to speed on PC assembly, but are you taking advantage of any instructions similar to the VMX ops on the Xenon? If not, how much do you think it would help?
 
Quick clarification... what I meant by the last question is: how much would it help performance if an instruction set comparable to the Xenon's were standard in all PCs?
 
Just tried Tribes 3: Vengeance (uses Unreal Engine 2) and I get the following error:

...

The game does run, though I'm getting I guess 5 fps at most.
 
I'm not up to speed on PC assembly, but are you taking advantage of any instructions similar to the VMX ops on the Xenon? If not, how much do you think it would help?
The equivalent for x86 processors is SSE, which SwiftShader already uses extensively. SSE has several extensions of its own, and SwiftShader supports everything up to SSSE3 (Supplemental SSE3, introduced with the Core 2 Duo). The most recent Penryn-based CPUs feature SSE4, and Intel recently released details about AVX, which extends the registers from 128-bit to 256-bit.
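For a feel of how shader arithmetic maps onto SSE, here is a hypothetical sketch (not SwiftShader's generated code): a multiply-add is two instructions, while a dot4 takes a few more without SSE4's dedicated dpps instruction:

#include <xmmintrin.h>  // SSE

// Illustrative sketch: a shader 'mad r0, r1, r2, r3' on a 4-component
// register maps nearly 1-to-1 onto SSE - one mulps plus one addps.
// (With AVX's 256-bit registers, two such operations fit in one.)
__m128 mad(__m128 a, __m128 b, __m128 c)
{
    return _mm_add_ps(_mm_mul_ps(a, b), c);
}

// A dot4 needs a horizontal reduction, which costs a few extra
// instructions on pre-SSE4 hardware:
float dot4(__m128 a, __m128 b)
{
    __m128 m = _mm_mul_ps(a, b);                    // ax*bx, ay*by, az*bz, aw*bw
    __m128 s = _mm_add_ps(m, _mm_movehl_ps(m, m));  // (x+z, y+w, ...)
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x55));  // (x+z) + (y+w)
    return _mm_cvtss_f32(s);
}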
 
I wonder if Larrabee can be made to run SwiftShader, to showcase how well x86 can render 3D graphics. I can't wait to see comparisons with Nehalem as well.
 
Very interesting, but it'll take a much more powerful CPU to get results worth wanting.
SwiftShader is not for avid gamers; for them a hardware solution undoubtedly offers the most bang for the buck. However, for people who don't have Shader Model 2.0 hardware and who don't intend to upgrade just to play a modern casual game, SwiftShader could very much be "worth wanting". But you're right, to actually replace IGPs it's going to take at least another CPU generation or two.
 
Just tried Tribes 3: Vengeance (uses Unreal Engine 2) and I get the following error:

...

The game does run, though I'm getting I guess 5 fps at most.
Thanks for the information! How does that performance compare to Unreal Tournament 2004 on your system?
 
I'm getting an average of 6 fps in UT2004, a hell of a lot more than I got in Tribes 3.

PS: the lightning from the sniper rifle is not rendering at all.

PPS: just tried MechWarrior 4: Mercenaries and the game doesn't run in software (what I mean is, with the 2 DLL files in the Mech4 folder it's still using my GPU).

PPPS: Painkiller seems to run about as fast as UT2004, but with 2 problems:
1: save games load at about 20% of normal speed
2: my desktop mouse cursor is being rendered in the game
 
the lightning from the sniper rifle is not rendering at all
This could be due to the low framerate. With UT2004, when the lizard jumps through the TWIMTBP logo and fires, you don't see the flashes unless the framerate is above 10 or so (at 5 fps each frame covers 200 ms, so a brief flash can fall entirely between frames). I'm not excluding a SwiftShader bug, but sometimes it's the application that makes things look different.
just tried MechWarrior 4: Mercenaries and the game doesn't run in software (what I mean is, with the 2 DLL files in the Mech4 folder it's still using my GPU)
Some games load their DLLs directly from the 'system32' folder. And Portal, for example, first looks in its 'bin' folder. I'll give it a try when I get the chance.
Painkiller seems to run about as fast as UT2004, but with 2 problems:
1: save games load at about 20% of normal speed
2: my desktop mouse cursor is being rendered in the game
Some games keep rendering as fast as possible while another thread is responsible for loading. They are obviously unaware that SwiftShader takes most of the CPU resources, slowing down the loading. But I'll give Painkiller a try to look for other bottlenecks.
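As a rough illustration of that pattern (hypothetical structure, not any specific game's code), the render loop and the loader compete for the same cores once rendering itself becomes CPU work:

#include <atomic>
#include <thread>

std::atomic<bool> loading{true};

// Hypothetical game structure: a loader thread streams in the save game
// while the main thread keeps rendering a progress screen at full speed.
// With a GPU the render loop costs the CPU almost nothing; with a
// software renderer it steals cycles from the loader on every frame.
void loaderThread()
{
    // ... read and decompress save data here ...
    loading = false;
}

void renderLoop()
{
    while (loading) {
        // drawProgressScreen();  // saturates the CPU under SwiftShader
        // presentFrame();
    }
}

int main()
{
    std::thread loader(loaderThread);
    renderLoop();
    loader.join();
}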

You can disable the system cursor with SwiftConfig. SwiftShader doesn't support 'hardware' cursor rendering; some games choose to ignore this, so by default the option to keep the system cursor is enabled.

Thanks!
 
SwiftShader can also help reduce QA and support costs. It will generate exactly the same output on any system, and provides a fallback in case of hardware/driver/runtime issues. This is also why it can be useful for medical applications. It's important that every doctor around the world sees the same 3D images.

Interesting point. Right now the processing power of GPUs clearly outweighs the potential for inaccuracies in medical imaging. It probably will for some time. But software rendering on parallel processors would perhaps allow the best of both worlds.
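On the QA side, here is a hedged sketch of what that determinism buys (hypothetical test code, assuming the bit-exact output described above): each rendered frame can be reduced to a single hash and compared against a stored golden value on any tester's machine, something driver and hardware differences make impractical with GPUs:

#include <cstdint>
#include <vector>

// Hypothetical regression check: hash the framebuffer (FNV-1a over
// 32-bit pixels) and compare against a stored "golden" value. This
// only works when every system produces bit-identical output.
uint64_t hashFramebuffer(const std::vector<uint32_t>& pixels)
{
    uint64_t h = 1469598103934665603ull;    // FNV-1a offset basis
    for (uint32_t p : pixels) {
        h = (h ^ p) * 1099511628211ull;     // FNV-1a prime
    }
    return h;
}

bool frameMatchesGolden(const std::vector<uint32_t>& pixels, uint64_t golden)
{
    return hashFramebuffer(pixels) == golden;
}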
 
It's important that every doctor around the world sees the same 3D images.
Not my doctor! My doctor sees everything in heavily anti-aliased parallax occlusion maps! .. he also, evidently, sees all kinds of diseases and.. other.. problems, like the dancing leprechaun in my appendix, that are apparently not there. But I'm pretty sure that's just a side-effect of all the LSD he ate in college.
 
A demo is now available on TransGaming's website, with which we ran 3DMark05 and obtained a score of ~400 on a stock Core 2 Duo E8400. That's still not mind-blowingly fast, but keep in mind that Direct3D's reference rasterizer would likely score in the single digits, and that the SGX-based IGP in Intel's upcoming Silverthorne-based Menlow platform for UMPCs/MIDs is claimed to score only ~150.

I'm not 100% sure that's right; 3DMark05 has a CPU test as part of its score, does it not? The E8400 would presumably score many times that of an Atom, which could outweigh a graphical performance shortfall.
 
I'm not 100% sure that's right; 3DMark05 has a CPU test as part of its score, does it not? The E8400 would presumably score many times that of an Atom, which could outweigh a graphical performance shortfall.
No, you're thinking of 3DMark06; in 05, the CPU testing is separate... :)
 
No, you're thinking of 3DMark06; in 05, the CPU testing is separate... :)

My bad, I assumed it still contributed to the score :)

Oh, and I get 93 in the SM2.0 test in 3DMark06 on a 4GHz QX9650. Running SwiftShader also takes away around 1000 marks in the CPU test, but I guess that is to be expected.

In 3DMark06, however, the screen appears to go entirely black for every other frame drawn :???:

UT3 runs; detail and resolution don't seem to make a huge difference. I was actually getting about 2-3 fps at 1680x1050.
 
4GHz QX9650.
What's the heat like for such a CPU? I assume it's air-cooled, but does it use a classic cooler or half a kilo of copper and a 12 cm fan at high RPM? I've been hearing really good things about Intel's 45 nm...
In 3DMark06, however, the screen appears to go entirely black for every other frame drawn :???:
That sounds worrying. What's your O.S. and 3DMark settings? Does it happen consistently? Is the SwiftShader logo still there for the black frames? Are you running FRAPS or anything else that might interfere? Clues about other potential causes are highly appreciated. Thanks!

I have Vista 64-bit Ultimate and a Q6600, and can't reproduce anything like that (just a rapid slideshow of 3DMark06 images).
UT3 runs; detail and resolution don't seem to make a huge difference. I was actually getting about 2-3 fps at 1680x1050.
Yes, there are a few complementary explanations for this. Vertex processing and triangle setup are not negligible for this game. High resolutions also have a very positive effect on cache coherency and prefetch efficiency. And lastly, larger tasks mean less time is wasted on thread synchronization.
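A toy cost model with invented numbers shows why resolution barely moves the needle when per-frame work dominates:

#include <cstdio>

// Toy frame-time model with made-up costs, only to illustrate the point:
// when per-frame work (vertex processing, setup, synchronization) dominates,
// raising the resolution changes the frame rate surprisingly little.
int main()
{
    const double perFrameMs = 300.0;   // vertex/setup/sync cost (assumed)
    const double perPixelNs = 50.0;    // shading cost per pixel (assumed)

    const long long resolutions[][2] = { {1024, 768}, {1680, 1050} };
    for (auto& r : resolutions) {
        double pixelMs = r[0] * r[1] * perPixelNs * 1e-6;
        double fps = 1000.0 / (perFrameMs + pixelMs);
        printf("%lldx%lld: %.2f fps\n", r[0], r[1], fps);
    }
}

With these assumed numbers, more than doubling the pixel count only drops the frame rate from about 3.0 to about 2.6 fps.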

This makes me hopeful that as CPU performance steadily progresses it will be viable as an IGP replacement sooner than expected.
 