SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

Nick · Apr 6, 2008

Jawed said:
Do you have a breakdown for address generation, decompression and filtering, say for simple 8bpc bilinear?

Sorry, the exact details could reveal IP. But you can probably imagine that each of these steps takes multiple instructions to implement, versus a nearly 1-to-1 translation for add, mul, dot4, etc. With SSE5/AVX, CPUs will even have fused multiply-add... Compared to that, texture sampling is very expensive (though per-component shift and multiply-add do optimize address generation). Again, a gather instruction would be quite useful...
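To make the cost difference concrete, here is a minimal scalar sketch of single-channel 8-bit bilinear sampling, split into the three steps Jawed asks about. It is not SwiftShader's implementation (those details are not public); the function name `bilinear_sample`, the clamp addressing mode, and the half-texel center convention are all assumptions chosen for illustration.

```python
import math


def bilinear_sample(texture, width, height, u, v):
    """Sample one 8-bit channel with bilinear filtering (illustrative only).

    texture: flat row-major list of ints in 0..255 (hypothetical layout).
    u, v: normalized coordinates in [0, 1].
    """
    # Step 1: address generation. Map normalized coordinates to texel
    # space (texel centers at +0.5), then split into integer and fraction.
    x = u * width - 0.5
    y = v * height - 0.5
    x0 = math.floor(x)
    y0 = math.floor(y)
    fx = x - x0
    fy = y - y0

    # Clamp addressing (an assumption; wrap or mirror would differ here).
    def clamp(i, hi):
        return max(0, min(i, hi))

    x0c, x1c = clamp(x0, width - 1), clamp(x0 + 1, width - 1)
    y0c, y1c = clamp(y0, height - 1), clamp(y0 + 1, height - 1)

    # Step 2: fetch four texels. On a CPU without a gather instruction
    # these are four separate loads per sampled pixel.
    t00 = texture[y0c * width + x0c]
    t10 = texture[y0c * width + x1c]
    t01 = texture[y1c * width + x0c]
    t11 = texture[y1c * width + x1c]

    # Step 3: filtering. Three linear interpolations.
    top = t00 + (t10 - t00) * fx
    bottom = t01 + (t11 - t01) * fx
    return top + (bottom - top) * fy


if __name__ == "__main__":
    tex = [0, 100, 200, 60]  # hypothetical 2x2 texture
    print(bilinear_sample(tex, 2, 2, 0.5, 0.5))
```

Even in this simplified form, one sample needs a scale, a floor, several clamps, four loads, and three interpolations, whereas a shader `mul` or `add` maps to roughly one vector instruction.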

So I could install RenderMonkey, put the SS DX9.dll in the same folder and get some realtime execution stats from some simple shaders?

Sure. Press 's' to see the stats. By varying resolution you can get an idea of how much time is spent doing vertex processing and setup, versus pixel processing.
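One rough way to turn the "vary the resolution" suggestion into numbers is to assume frame time is a fixed part (vertex processing and setup) plus a part proportional to pixel count, and solve for both from two measurements. The sketch below does that; the function name `split_frame_time` and the timings are hypothetical, and the linear model is only an approximation since cache and threading effects also change with resolution.

```python
def split_frame_time(pixels_a, ms_a, pixels_b, ms_b):
    """Estimate (fixed_ms, ms_per_pixel) from two frame-time measurements.

    Assumes frame_ms = fixed_ms + ms_per_pixel * pixel_count, where the
    fixed part stands in for vertex processing and triangle setup.
    """
    if pixels_a == pixels_b:
        raise ValueError("the two measurements need different resolutions")
    ms_per_pixel = (ms_b - ms_a) / (pixels_b - pixels_a)
    fixed_ms = ms_a - ms_per_pixel * pixels_a
    return fixed_ms, ms_per_pixel


if __name__ == "__main__":
    # Hypothetical frame times read from the stats overlay, not real data.
    fixed, per_pixel = split_frame_time(640 * 480, 40.0, 1280 * 960, 130.0)
    print("fixed ms:", fixed)
    print("pixel ms at 640x480:", per_pixel * 640 * 480)
```

If the fixed part comes out large relative to the total, the scene is dominated by vertex processing and setup rather than pixel processing.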

Presumably, as a result of this, SS doesn't need to have many pixels in flight to hide texturing latency.

Automatic prefetch and out-of-order execution do a great job as far as I'm aware. And Hyper-Threading also means that at a hardware level twice the work is "in flight".

darkblu · Apr 6, 2008

hey Nick, good stuff there! any plans for arbvp1/fp1?

NeARAZ · Apr 6, 2008

Just tried it on unit & benchmark tests we have for Unity...

Almost all rendering unit tests pass correctly (has some issues with attenuation of point vertex lights and a couple of other minor things)! SwiftShader 1.x demo used to crash on some of those, so 2.0 release is a huge step forward.

Performance-wise, I only compared Core2Quad + Radeon 3850 versus the same CPU with SwiftShader. Obviously it runs 30 to 100 times slower, but hey. Will compare the same CPU with Intel 965 for something more in the same ballpark.

Color me impressed! Amazing work.

Jawed · Apr 6, 2008

Some performance and screenshot comparisons, using an 8500GT as a comparison.

http://www.pcgameshardware.de/aid,6...d_Call_of_Duty_komplett_auf_der_CPU_berechnet

Jawed

big4ared · Apr 6, 2008

Nice job!!

I'm not up to speed on PC assembly, but are you taking advantage of any instructions similar to the VMX ops on the Xenon? If not, how much do you think it would help?

big4ared · Apr 6, 2008

big4ared said:
Nice job!!

I'm not up to speed on PC assembly, but are you taking advantage of any instructions similar to the VMX ops on the Xenon? If not, how much do you think it would help?

Quick clarification... what I meant by the last question is how much it would help performance if an instruction set like the Xenon's were standard in all PCs.

Mobius1aic · Apr 6, 2008

Very interesting, but it'll take a much more powerful CPU to get results worth wanting.

Davros · Apr 6, 2008

just tried tribes 3 - vengeance (uses unreal engine 2)
and get the following error

the game does run though but im getting i guess 5fps at most

Nick · Apr 6, 2008

big4ared said:
I'm not up to speed on PC assembly, but are you taking advantage of any instructions similar to the VMX ops on the Xenon? If not, how much do you think it would help?

The equivalent for x86 processors is SSE, which SwiftShader already uses extensively. SSE has several extensions of its own, and SwiftShader supports up to SSSE3 (Supplemental SSE3 - introduced with Core 2 Duo). The most recent Penryn architecture based CPUs feature SSE4, and Intel recently released details about AVX, which extends the registers from 128-bit to 256-bit.

wingless · Apr 6, 2008

I wonder if Larrabee can be made to run SwiftShader to showcase how well it renders x86 3D graphics. I can't wait to see comparisons with Nehalem as well.

Nick · Apr 6, 2008

Mobius1aic said:
Very interesting, but it'll take a much more powerful CPU to get results worth wanting.

SwiftShader is not for avid gamers. For them a hardware solution undoubtedly offers the most bang for the buck. However, for people who don't have Shader Model 2.0 hardware and who don't intend to upgrade just to play a modern casual game, SwiftShader could be very much "worth wanting". But you're right, to actually replace IGPs it's going to take at least another CPU generation or two.

Nick · Apr 6, 2008

Davros said:
just tried tribes 3 - vengeance (uses unreal engine 2)
and get the following error

...

the game does run though but im getting i guess 5fps at most

Thanks for the information! How does that performance compare to Unreal Tournament 2004 on your system?

Davros · Apr 7, 2008

im getting an average of 6fps on ut2004
a hell of a lot more than i got on tribes 3

ps: the lightning from the sniper rifle is not rendering at all

pps: just tried mechwarrior 4 mercenaries and the game doesnt run in software (what i mean is with the 2 dll files in the mech4 folder its still using my gpu)

ppps: painkiller seems to run about as fast as ut2004 but 2 problems
1: save games load at about 20% the speed of normal
2: my desktop mouse cursor is being rendered in the game

Nick · Apr 7, 2008

Davros said:
the lightning from the sniper rifle is not rendering at all

This could be due to the low framerate. With UT2004, when the lizard jumps through the TWIMTBP logo and fires, you don't see the flashes unless the framerate is above 10 or so. I'm not excluding a SwiftShader bug, but sometimes it's the application that makes things look different.

just tried mechwarrior 4 mercenaries and the game doesnt run in software (what i mean is with the 2 dll files in the mech4 folder its still using my gpu)

Some games load their DLLs directly from the 'system32' folder. And for example Portal first looks in its 'bin' folder. I'll give it a try when I get the chance.

painkiller seems to run about as fast as ut2004 but 2 problems
1: save games load at about 20% the speed of normal
2: my desktop mouse cursor is being rendered in the game

Some games keep rendering as fast as possible while another thread is responsible for loading. They are obviously unaware that SwiftShader takes most resources, slowing down the loading. But I'll give Painkiller a try to look for other bottlenecks.

You can disable the system cursor with SwiftConfig. SwiftShader doesn't support 'hardware' cursor rendering. Some games choose to ignore this so by default the option to keep the system cursor is enabled.

Thanks!

Voltron · Apr 7, 2008

Nick said:
SwiftShader can also help reduce QA and support costs. It will generate exactly the same output on any system, and provides a fallback in case of hardware/driver/runtime issues. This is also why it can be useful for medical applications. It's important that every doctor around the world sees the same 3D images.

Interesting point. Right now the processing power of GPUs clearly outweighs the potential for inaccuracies in medical imaging. It probably will for some time. But software rendering on parallel processors would perhaps allow the best of both worlds.

Ilfirin · Apr 7, 2008

It's important that every doctor around the world sees the same 3D images.

Not my doctor! My doctor sees everything in heavily anti-aliased parallax occlusion maps! .. he also, evidently, sees all kinds of diseases and.. other.. problems, like the dancing leprechaun in my appendix, that are apparently not there. But I'm pretty sure that's just a side-effect of all the LSD he ate in college.

Thorburn · Apr 7, 2008

A demo is now available on TransGaming's website, with which we ran 3DMark05 and obtained a score of ~400 on a stock Core 2 Duo E8400. That's still not mind-blowingly fast, but keep in mind Direct3D's reference rasterizer would likely score in the single digits and the SGX-based IGP in Intel's upcoming Silverthorne-based Menlow platform for UMPCs/MIDs is claimed to only score ~150.

I'm not 100% sure that is right, 3D Mark 05 has a CPU Test as part of its score does it not? The E8400 would presumably score many times that of an Atom, and could out-weigh a graphical performance shortfall?

Arun · Apr 7, 2008

Thorburn said:
I'm not 100% sure that is right, 3D Mark 05 has a CPU Test as part of its score does it not? The E8400 would presumably score many times that of an Atom, and could out-weigh a graphical performance shortfall?

No, you're thinking of 3DMark06; in 05, the CPU testing is separate...

Thorburn · Apr 7, 2008

Arun said:
No, you're thinking of 3DMark06; in 05, the CPU testing is separate...

My bad, assumed it still contributed to the score

Oh, and I get 93 in the SM2.0 test in 3D Mark 06 on a 4GHz QX9650. Running SwiftShader also takes away around 1000 marks in the CPU test but I guess that is to be expected.

In 3D Mark 06 however the screen appears to go entirely black for every other frame drawn

UT3 runs, detail and resolution don't seem to make a huge difference, was actually getting about 2-3fps at 1680x1050.

Nick · Apr 7, 2008

Thorburn said:
4GHz QX9650.

What's the heat like for such a CPU? I assume it's air cooled but does it use a classic cooler or half a kilo of copper and a 12 cm fan at high RPM? I've been hearing really good things about Intel's 45 nm...

In 3D Mark 06 however the screen appears to go entirely black for every other frame drawn

That sounds worrying. What are your OS and 3DMark settings? Does it happen consistently? Is the SwiftShader logo still there for the black frames? Are you running FRAPS or anything else that might interfere? Clues about other potential causes are highly appreciated. Thanks!

I have Vista 64-bit Ultimate and a Q6600, and can't reproduce anything like that (just a rapid slideshow of 3DMark06 images).

UT3 runs, detail and resolution don't seem to make a huge difference, was actually getting about 2-3fps at 1680x1050.

Yes, there are a few complementary explanations for this. Vertex processing and triangle setup are not negligible for this game. High resolutions also have a very positive effect on cache coherency and prefetch efficiency. And lastly, larger tasks mean less time is wasted on thread synchronization.

This makes me hopeful that as CPU performance steadily progresses it will be viable as an IGP replacement sooner than expected.
