It was extremely expensive at first. The first not so naive SPU version, which was considered decent, was taking more than 120 ms, at which point, we had decided to pass on the technique. It quickly went down to 80 and then 60 ms when some kind of bottleneck was reached. Our worst scene remained at 60ms for a very long time, but simpler scenes got cheaper and cheaper. Finally, and after many breakthroughs and long hours from our technology teams, especially our technology team in Europe, we shipped with the cheapest scenes around 7 ms, the average Gow3 scene at 12 ms, and the most expensive scene at 20 ms.
In term of quality, the latest version is also significantly better than the initial 120+ ms version. It started with a quality way lower than your typical MSAA2x on more than half of the screen. It was equivalent on a good 25% and was already nicer on the rest. At that point we were only after speed, there could be a long post mortem, but it wasn’t immediately obvious that it would save us a lot of RSX time if any, so it would have been a no go if it hadn’t been optimized on the SPU. When it was clear that we were getting a nice RSX boost ( 2 to 3 ms at first, 6 or 7 ms in the shipped version ), we actually focused on evaluating if it was a valid option visually. Despite of any great performance gain, the team couldn’t compromise on quality, there was a pretty high level to reach to even consider the option. And as for the speed, the improvements on the quality front were dramatic. A few months before shipping, we finally reached a quality similar to MSAA2x on almost the entire screen, and a few weeks later, all the pixelated edges disappeared and the quality became significantly higher than MSAA2x or even MSAA4x on all our still shots, without any exception. In motion it became globally better too, few minor issues remained which just can’t be solved without sub-pixel sampling.
...
Integrating the technique without adding any latency was really a major task, it involved almost half of the team, and a lot of SPU optimization was required very late in the game.”