I do agree, however, that you would be very hard-pressed to get full DX9 compatibility and the speed they are claiming out of that sort of transistor budget.
If you could get 70% of the performance of the Radeon 9700 via software emulation (which is what you are describing), don't you think people would have done that instead of building such a complex chip? CPUs are very programmable, but they don't lend themselves to massively parallel operations unless they are specifically designed for them. This is one reason CPUs aren't replacing video cards. Also, a 30 million transistor CPU isn't going to be parallel enough to give reasonable results as a 3D rasterizer competing with the Radeon 9700.
Think about a Pentium 4 trying to execute pixel shader operations in SSE2. How many cycles does it take to compute a single MAD (multiply-add) operation? From that you can figure out the maximum number of simple single-instruction shaders it can execute per second. I bet it doesn't compare too well to any reasonably fast 3D chip. And don't forget that this estimate leaves out vertex operations, depth testing, fog, and alpha blending (to name a few).
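To put rough numbers on that, here is a quick Python sketch of the estimate. Every input is my own assumption, not a measured or vendor-confirmed figure: the CPU clock, the cycles per MAD (SSE has no single multiply-add instruction, so I'm guessing at a packed multiply followed by a packed add), and the clock and pipeline count I plugged in for the 3D chip. The function and constant names are made up for illustration. Swap in whatever numbers you trust.

```python
# Back-of-the-envelope estimate. Every number below is an assumed
# placeholder, not a measured or vendor-confirmed figure.


def cpu_pixels_per_second(clock_hz, cycles_per_mad):
    """Upper bound on one-MAD shaders per second for a CPU.

    Assumes one pixel is a 4-component (RGBA) value that fits in a single
    SSE register, so one packed multiply plus one packed add covers it.
    """
    return clock_hz / cycles_per_mad


def gpu_pixels_per_second(clock_hz, pipelines, mads_per_pipe_per_clock=1):
    """Upper bound for a chip that runs MADs in parallel pixel pipelines."""
    return clock_hz * pipelines * mads_per_pipe_per_clock


# Assumed inputs (hypothetical, replace with your own):
CPU_CLOCK_HZ = 2.8e9     # a fast Pentium 4
CPU_CYCLES_PER_MAD = 4   # guessed cost of packed multiply + packed add
GPU_CLOCK_HZ = 325e6     # assumed core clock for the 3D chip
GPU_PIPELINES = 8        # assumed number of pixel pipelines

cpu_rate = cpu_pixels_per_second(CPU_CLOCK_HZ, CPU_CYCLES_PER_MAD)
gpu_rate = gpu_pixels_per_second(GPU_CLOCK_HZ, GPU_PIPELINES)

if __name__ == "__main__":
    print(f"CPU upper bound: {cpu_rate / 1e6:.0f} Mpixels/s")
    print(f"GPU upper bound: {gpu_rate / 1e6:.0f} Mpixels/s")
    print(f"CPU / GPU ratio: {cpu_rate / gpu_rate:.2f}")
```

With those guesses the CPU tops out around 700 million single-MAD pixels per second against roughly 2.6 billion for the 3D chip, and the CPU figure assumes it does nothing else: no loads or stores, no texture fetches, no loop overhead. Real numbers would only widen the gap.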