SwiftShader 2.0: A DX9 Software Rasterizer that runs Crysis

What's the heat like for such a CPU? I assume it's air cooled but does it use a classic cooler or half a kilo of copper and a 12 cm fan at high RPM? I've been hearing really good things about Intel's 45 nm...

Running an Arctic Cooling Freezer 7 Pro and BIOS-controlled fan speeds, temperatures are 65-70°C.

That sounds worrying. What's your O.S. and 3DMark settings? Does it happen consistently? Is the SwiftShader logo still there for the black frames? Are you running FRAPS or anything else that might interfere? Clues about other potential causes are highly appreciated. Thanks!

I have Vista 64-bit Ultimate and a Q6600, and can't reproduce anything like that (just a rapid slideshow of 3DMark06 images).

Vista Ultimate x86, 2 x 3870X2 cards, standard settings in 3DMark06. It does the same with CrossFire X enabled and disabled. The SwiftShader logo disappears when the screen goes blank. I have had the same thing happen on this rig with CrossFire X enabled before, though, so it could be a configuration issue on my end.
 
UT3 runs; detail and resolution don't seem to make a huge difference. I was actually getting about 2-3 fps at 1680x1050.

I get 6fps at same res

/Davros revels in the fact that even 4GHz QX9650 users are humbled by the sheer awesomeness of his mighty Q6600 :D
 
I get 6fps at same res

Probably not the same settings though, I was running my standard test of a fly-through of a level while a 32 bot deathmatch was going on, which rather hammers the CPU by itself.
 
Really great renderer!
I tested it with the Max Payne demo on both a Core 2 Duo 6600 (3 GHz) and an X2 3600+ (2 GHz), and the performance difference is really HUGE. The only explanation I can come up with is that the X2 is just cache thrashing... even when I was just looking at the main menu. Did you try anything like tiled rasterization? As you are targeting high-spec CPUs, I think it's rather pointless though, if the cache can just fit the entire texture.

I'd also be very interested to know your opinion of the Cell processor. ;)
 
Cool tool. For some reason it runs tremendously slowly on my Phenom X4 9500 system. Trying the included CubeMap.exe demo, it ran at 5 FPS.

Quite inexplicable, as on my A64 X2 3800+ it goes at 25 FPS while my C2D T5500 goes at 50 FPS. I'll try it on my C2Q Q6600 later.

Incidentally, running this in Virtual PC 2007 results in about 75% performance (comparing single cores). Also, my T5500 seems to run it at about 25% of the speed my GMA950 manages.
 
Cool tool. For some reason it runs tremendously slowly on my Phenom X4 9500 system. Trying the included CubeMap.exe demo, it ran at 5 FPS.

Quite strange.
My X2 3600+ runs the Max Payne demo menu at 9 fps while my C2D 6600 (3 GHz) gets nearly 110. And the X2 3600+ is very sensitive to texture size, while the C2D was completely unaffected.

But considering the Phenom X4 9500 has the same L2 size as the X2 3800+, I have no idea. Are there any Intel-specific optimizations, or hopefully just some easy-to-fix bugs such as CPUID detection?
 
Texture fetches tend to be latency-sensitive, AFAIK.
That's probably because of the overall more robust caching of the Core 2 architecture -- for instance, K8 can't prefetch from L2 (in either direction), and its exclusive cache hierarchy adds an extra write (evict) cycle that slows it down further, not to mention the lack of L1 cache-line transfers between cores that would bypass main memory. On top of that come Core 2's superior integer throughput and far more flexible load/store handling -- all of this stacks up against the old K8.
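
To see that latency sensitivity directly, here's a minimal pointer-chasing sketch of my own (nothing to do with SwiftShader's actual code): each load depends on the result of the previous one, much like a dependent texture fetch, so the time per step is essentially the cache/memory latency the renderer has to hide rather than raw bandwidth.

Code:
// Minimal dependent-load (pointer-chasing) benchmark -- my own sketch, not SwiftShader code.
// Each load depends on the previous one, like a dependent texture fetch, so the
// time per step is roughly the cache/memory latency rather than bandwidth.
#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

int main()
{
    const size_t count = 1 << 22;                 // 32 MB of 8-byte indices: well past any L2
    std::vector<size_t> next(count);
    std::iota(next.begin(), next.end(), size_t(0));

    // Sattolo's algorithm: turn the identity permutation into a single cycle,
    // so the chase below visits every element exactly once in random order.
    std::mt19937 rng(42);
    for (size_t i = count - 1; i > 0; --i)
    {
        std::uniform_int_distribution<size_t> pick(0, i - 1);
        std::swap(next[i], next[pick(rng)]);
    }

    size_t index = 0;
    auto start = std::chrono::steady_clock::now();
    for (size_t i = 0; i < count; ++i)
        index = next[index];                      // serial chain of cache/memory accesses
    auto stop = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    std::printf("%.1f ns per dependent load (end index %zu)\n", ns / count, index);
    return 0;
}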
 
I tested it with the Max Payne demo on both a Core 2 Duo 6600 (3 GHz) and an X2 3600+ (2 GHz), and the performance difference is really HUGE.
Yeah, we discovered this too very shortly after the release. AMD's processors behave differently with our mutex lock implementation in specific situations. You might even see a performance increase when reducing the number of threads to 1. We're working on the issue and a processor-specific workaround has been found, but preferably we'll use a uniform implementation that doesn't compromise performance on any multi-core CPU architecture. Either way we should have an update soonish.
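
For those curious, the general kind of user-mode lock under discussion looks roughly like the test-and-test-and-set spinlock below. This is purely my own illustration of the technique, not SwiftShader's actual implementation or its workaround; the contention behaviour of a loop like this can differ noticeably between microarchitectures.

Code:
// Test-and-test-and-set spinlock with a pause hint -- my own illustration of the
// general kind of user-mode lock being discussed, NOT SwiftShader's actual code.
#include <atomic>
#include <emmintrin.h>   // _mm_pause

class SpinLock
{
    std::atomic<bool> locked{false};

public:
    void lock()
    {
        for (;;)
        {
            // Spin on a plain read first, so waiters don't keep bouncing the
            // cache line around with locked read-modify-write operations.
            while (locked.load(std::memory_order_relaxed))
                _mm_pause();                      // tell the CPU this is a spin-wait loop

            // Only attempt the locked exchange once the lock looks free.
            if (!locked.exchange(true, std::memory_order_acquire))
                return;
        }
    }

    void unlock()
    {
        locked.store(false, std::memory_order_release);
    }
};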
 
Ah, setting it to one core for the Phenom made it run properly.

So a 2.2GHz Phenom core is about as fast as a 1.66GHz C2D core for SwiftShader; both give about 40FPS.
 
AMD's processors behave differently with our mutex lock implementation in specific situations.

Great news! I hope to see an update soon.
BTW, could you share some information about the AMD mutex problem? Is there a hardware bug?
Thanks in advance.
 
Yeah, we discovered this too very shortly after the release. AMD's processors behave differently with our mutex lock implementation in specific situations.

That's actually quite a weird issue! Any more details on it, or on the type of mutex you are using? In general AMD processors have lower latency on lock-prefixed instructions than Intel processors and should perform well in mutex-limited cases.

Aaron Spink
speaking for myself inc.
 
Thanks for releasing a demo. Even if it's not very useful, it's an amazing piece of software.
Thanks!

I think it's really already very useful though, for select markets. The casual games market, for example, is huge and still growing. There are a number of casual games in the top 10 game sales every year, and casual games are often in the headlines of professional game developer magazines: Gaming 2020 Panel Predicts A More Casual Future. If they can grow their audience by some percentage and/or save on QA and support with SwiftShader, then that's a significant win.

So I strongly believe that it's already commercially viable. Call it phase one. Phase two would be actually competing with IGPs. Right now SwiftShader 2.0 on an Intel Q6600 soundly beats an Intel Extreme Graphics (845G) at 3DMark2001 SE (and let's not forget that this IGP doesn't run 3DMark03/05/06 at all). OK, it's a very modest result, but it proves that dedicated hardware rendering is not untouchable. Games of that hardware's generation, like Max Payne for example, play quite smoothly and in my opinion are still fun. And more games become playable with software rendering every year.

Granted, there's still a way to go to make it an acceptable solution for modern 'office systems', but I think SwiftShader 2.0 is a crucial milestone in getting there.
 
BTW, could you share some information about the AMD mutex problem? Is there a hardware bug?
It's not a hardware bug, but it is related to what I'd call a loose specification... Although I obviously would have liked AMD and Intel processors to behave exactly the same in all circumstances, I can't really point fingers at either of them. It's an issue that can and should in the first place be fixed in software. Hold your breath. ;)
 
Funny how SwiftShader can match the performance of the GeForce FX5600/5700 in 3DMark05. Says more about the appalling implementation of SM2.x in the FX-series than about SwiftShader though.
As you may recall, back in the day games would switch to SM1.x on these cards. I believe that with Half-Life 2 even the 5800 only got about 20 fps on the SM2.0 path. With SM1.x it went up to about 80 fps (even though for the direct competitor, the Radeon 9700, SM2.0 or SM1.x made very little difference in performance).
So my guess is that in actual games, the 5600/5700 would still be 4 times as fast as SwiftShader.

Here's something else you can ponder about:
http://bohemiq.scali.eu.org/forum/viewtopic.php?t=35
This runs about 50% faster with the CPU vertex path than with the shader path on SwiftShader. There also seem to be some nasty 'popping' polygons with SwiftShader, whereas it renders properly on hardware.
Another thing to ponder about:
http://bohemiq.scali.eu.org/forum/viewtopic.php?t=38
A similar routine with a Java software renderer renders about twice as fast as SwiftShader on my dual-core machine, even though the Java renderer is single-threaded.
 
Here's something interesting I noticed on my C2Q Q6600 with the cubemap.exe demo.

Running 4 cores is slower than running 3, which is slower than running 2 cores. Two cores are always faster than one core, though.

However, even that has an interesting effect. If I select cores 0 and 1 or 2 and 3, I get 110FPS. If I select another combination, I only get 90FPS (a single core gets about 70FPS).

It seems that the shared L2 cache of the C2D makes a huge difference here. Once I have to use the FSB, the performance starts to plummet.
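
For what it's worth, you can force that pairing yourself rather than relying on the scheduler. Here's a rough sketch of my own (plain Win32 affinity calls, nothing from SwiftShader) that pins two worker threads to logical processors 0 and 1, the combination that shares an L2 on the Q6600 according to the numbers above.

Code:
// Pin two worker threads to logical processors 0 and 1 (which share an L2 on a
// Q6600, per the measurements above). My own sketch, not SwiftShader's scheduler.
#include <windows.h>
#include <process.h>
#include <cstdio>

unsigned __stdcall worker(void *)
{
    // ... per-thread rendering work would go here ...
    return 0;
}

int main()
{
    HANDLE threads[2];

    for (int i = 0; i < 2; ++i)
    {
        // Create suspended so the affinity is in place before the thread runs.
        threads[i] = reinterpret_cast<HANDLE>(
            _beginthreadex(nullptr, 0, worker, nullptr, CREATE_SUSPENDED, nullptr));

        // Bit n of the mask allows the thread to run on logical processor n.
        SetThreadAffinityMask(threads[i], DWORD_PTR(1) << i);
        ResumeThread(threads[i]);
    }

    WaitForMultipleObjects(2, threads, TRUE, INFINITE);
    for (int i = 0; i < 2; ++i)
        CloseHandle(threads[i]);

    std::printf("workers finished\n");
    return 0;
}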
 
Tested SwiftShader with our newest game Trials 2 Second Edition on my Core 2 Quad + Vista 64-bit computer. The game runs well, all UI graphics are fine and vertex shaders seem to work properly. Post-process filters also seem to work fine (motion blur, depth of field, light blooms all seem correct). The skinned driver model is also rendered 100% correctly (with proper lighting and self-shadowing too). So the deferred rendering g-buffers contain valid data and all the lighting shaders work. However, all mipmapped textures are completely broken. We are mainly using DXT5-compressed textures and manually lock all mip levels and copy the compressed data there. We copy the texture mip data from one big texture atlas to the mip levels. Is this a problem for SwiftShader, or do you think there is some other problem?

All the textures are working fine on the hardware devices we use in testing (and the game runs fine in DX debug mode with max validation): NVIDIA GeForce (FX, 6000 series, 7000 series, 8000 series), ATI/AMD Radeon (9000 series, X000 series, X1000 series, HD 2000 and 3000 series) and Intel (GMA 950 series, GMA 3000 series, GMA 4500 series).

If you want to test the game, you can download the game demo from www.redlynxtrials.com.

Screenshots of the rendering issue: SwiftShader (renderswys8.png) vs. Radeon 3850 (renderhwqs2.png).


It seems that our graphics artists have forgotten to select "generate mipmaps" for the driver textures, and that's why the driver is rendered correctly. UI graphics with mipmaps also do not work (scaling arrows use mipmaps in some menu screens).
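
For reference, the upload path described above looks roughly like the sketch below: lock each mip level and copy rows of DXT5 blocks into it. This is a simplified illustration of my own (not RedLynx's actual engine code; the atlas source is reduced to a flat buffer), and the one classic pitfall it highlights is honouring the Pitch that LockRect returns instead of assuming tightly packed block rows, since different D3D9 implementations are free to return different pitches.

Code:
// Simplified sketch of a manual DXT5 mip upload -- not RedLynx's actual code;
// the texture-atlas source is reduced to a flat buffer for brevity.
#include <d3d9.h>
#include <algorithm>
#include <cstring>

// Copies one mip level's worth of DXT5 blocks into a locked texture level,
// honouring the Pitch that the D3D9 implementation returns for that level.
bool UploadDXT5Mip(IDirect3DTexture9 *texture, UINT level,
                   const BYTE *src, UINT width, UINT height)
{
    const UINT blocksWide  = std::max(1u, (width  + 3) / 4);
    const UINT blocksHigh  = std::max(1u, (height + 3) / 4);
    const UINT srcRowBytes = blocksWide * 16;     // 16 bytes per 4x4 DXT5 block

    D3DLOCKED_RECT rect;
    if (FAILED(texture->LockRect(level, &rect, nullptr, 0)))
        return false;

    BYTE *dst = static_cast<BYTE *>(rect.pBits);
    for (UINT row = 0; row < blocksHigh; ++row)
    {
        // rect.Pitch is the byte distance between rows of blocks in the
        // destination; it is not guaranteed to equal srcRowBytes everywhere.
        std::memcpy(dst + row * rect.Pitch, src + row * srcRowBytes, srcRowBytes);
    }

    texture->UnlockRect(level);
    return true;
}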
 
Thanks!

I think it's really already very useful though, for select markets. [...]

I need to rephrase that. It's not very useful to most of us HERE, because we all have 3D cards that are faster.

I really don't want to make it seem like I don't respect the work you've done. I think it's jaw-droppingly amazing that you can get this level of performance out of a software renderer, at this quality. I'm having a ton of fun checking it out!
 
Hi Dennis!
Funny how SwiftShader can match the performance of the GeForce FX5600/5700 in 3DMark05. Says more about the appalling implementation of SM2.x in the FX-series than about SwiftShader though.
You're right, but does it really matter? You can name a few excuses why hardware is sometimes slower, but I have a thousand excuses why SwiftShader (or rather the CPU) is often slower. ;)
As you may recall, back in the day games would switch to SM1.x on these cards. I believe that with Half-Life 2 even the 5800 only got about 20 fps on the SM2.0 path. With SM1.x it went up to about 80 fps (even though for the direct competitor, the Radeon 9700, SM2.0 or SM1.x made very little difference in performance).
So my guess is that in actual games, the 5600/5700 would still be 4 times as fast as SwiftShader.
Well, sure, but then the game needs a Shader Model 1.x path, and you miss all the Shader Model 2.0 effects. I could make SwiftShader a whole lot faster too, but then quality would be affected. Heck, Quake now runs smoothly at 1600x1200 in software, without making use of SSE or multi-core... But how relevant is that?

A casual game like RoboBlitz even requires a GeForce 6 as a minimum! In all fairness, SwiftShader doesn't run it just yet, but it does show where casual gaming is going.
This runs about 50% faster with the CPU vertex path than with the shader path on SwiftShader.
Interesting, but not really a big surprise. Obviously you can optimize the processing for exactly the things you need, while with SwiftShader a detour is taken with DirectX shaders.
There also seem to be some nasty 'popping' polygons with SwiftShader, whereas it renders properly on hardware.
I'll look into it. Thanks for the demo.
A similar routine with a Java software renderer renders about twice as fast as SwiftShader on my dual-core machine, even though the Java renderer is single-threaded.
Again, impressive for a Java renderer, but totally incomparable. First of all, it's not the same application. And because you wrote both the application and the renderer, it can be a lot faster than a general software renderer that has to handle Crysis, Unreal Tournament 3, Call of Duty 4, etc., which are all aimed at hardware rendering.
 