OK, not performance-wise, but...
I've created a software renderer that supports ps 2.0 shaders. Unlike the reference rasterizer, it is not an interpreter but an optimized JIT compiler that uses SIMD instructions. As a result it is dozens of times faster than refrast, and it beats hardware in flexibility. For one, going beyond the 32 texture instruction limit of ps 2.0 hardware is no problem, and things still stay real-time.
You can read the details here: http://www.flipcode.com/cgi-bin/msg.cgi?showThread=COTD-swShader&forum=cotd&id=-1.
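
To give a rough idea of why compiling a shader once beats interpreting it per pixel, here is a minimal C++ sketch. It is not the renderer's actual code: the three-opcode instruction set and the names `Instr`, `Regs`, `interpret`, `compile` and `run` are made up for illustration, and the "compiled" form is only a list of closures, a stand-in for the SIMD machine code a real JIT would emit. The point is just that the opcode decoding happens once in `compile` instead of once per pixel in `interpret`.

```cpp
#include <array>
#include <functional>
#include <vector>

// Hypothetical mini instruction set, NOT the real ps 2.0 encoding.
enum class Op { Mov, Add, Mul };

struct Instr {
    Op op;
    int dst, a, b;  // register indices (b is ignored by Mov)
};

using Vec4 = std::array<float, 4>;
using Regs = std::array<Vec4, 8>;

// Interpreter: re-decodes every instruction for every pixel it shades.
void interpret(const std::vector<Instr>& prog, Regs& r) {
    for (const Instr& i : prog) {
        for (int c = 0; c < 4; ++c) {
            switch (i.op) {
            case Op::Mov: r[i.dst][c] = r[i.a][c]; break;
            case Op::Add: r[i.dst][c] = r[i.a][c] + r[i.b][c]; break;
            case Op::Mul: r[i.dst][c] = r[i.a][c] * r[i.b][c]; break;
            }
        }
    }
}

// "Compiled" form: one ready-to-call step per instruction.
using Step = std::function<void(Regs&)>;

// Decode the program once. A real JIT would emit machine code here;
// this sketch only builds closures with the operands baked in.
std::vector<Step> compile(const std::vector<Instr>& prog) {
    std::vector<Step> steps;
    for (const Instr& i : prog) {
        const int d = i.dst, a = i.a, b = i.b;
        switch (i.op) {
        case Op::Mov:
            steps.push_back([=](Regs& r) { r[d] = r[a]; });
            break;
        case Op::Add:
            steps.push_back([=](Regs& r) {
                for (int c = 0; c < 4; ++c) r[d][c] = r[a][c] + r[b][c];
            });
            break;
        case Op::Mul:
            steps.push_back([=](Regs& r) {
                for (int c = 0; c < 4; ++c) r[d][c] = r[a][c] * r[b][c];
            });
            break;
        }
    }
    return steps;
}

// Per-pixel work: no opcode decoding left, just run the prepared steps.
void run(const std::vector<Step>& steps, Regs& r) {
    for (const Step& s : steps) s(r);
}
```

You would call `compile` once when the shader is set and `run` for every pixel; both paths should leave the same values in the registers.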