I still believe in the idea of a single, unified, scalable architecture that serves all purposes; a pool of processing resources to be used however the software requires, with maximum flexibility and zero wastage.
Define Wastage.
To me, dropping your Texture Units for "a pool of processing resources to be used however the software requires" that is an order of magnitude slower for a task repeated many times in every frame is a good example of Wastage. This would be allowing Philosophical Idealism to rule good Silicon utilization.
You have yourself recognised the excellent IQ of GOW. Are you saying you'd rather this was not possible and we should stick with 4xMSAA?
Devil's Advocate: 1) essentially no games use this technique, so if the measure of good design is the rare exception then, yes, it is probably not a good thing to enable; 2) no insult to the GOW3 guys, but the PS3 has 6 available SPEs, so why weren't those resources spent on the game?; and 3) 4 SPEs is a ton of resource footprint, and it's hard to argue against a ton more general system bandwidth and pure GPU power instead (something the PS3 lacks and that the majority of games would benefit from), as these could and would be of more benefit in more games on a more frequent basis.
Or the increased demands of 16xMSAA? Yes, one way of looking at it is trying to find something useful for the SPEs to do, although I feel that somewhat belittles the GOW and associated teams' efforts in squeezing the technique onto already very occupied hardware. It's not like they had 4 idle SPEs doing absolutely nothing, were just trying out things to fill them up, and found MLAA was a great processing hog that let them max out the platform so they could claim 100% utilisation!
I like the results I saw, but I think it is worth noting that going from 4x to 8x MSAA isn't a 2x cost: it doesn't halve the performance of GPU-bound software, nor does it require 2x the hardware to reach performance parity. If your bandwidth is a general resource (unlike Xenos) and you have the goal of quality IQ (which, as this generation shows, most developers and console makers aren't actually married to: clean textures and smooth edges... stupid assumption on my part!!), then the cost of MSAA in the chip is low, the IQ is high, and when implemented correctly it avoids nasty IQ issues (see: repi's complaints about MLAA and why it was NOT used in BFBC2). It is accessible to developers for the purpose, it offers performant solutions to other IQ-related issues rather than costly brute-force approaches, and the bandwidth is a sharable resource. We have already seen that MSAA can be adjusted on the fly to adapt to framerate, so we could see situations where a bandwidth-bound game scales its MSAA to match the resources present.
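To make that concrete, here's a rough C++ sketch of the kind of adaptive heuristic I mean; the thresholds, sample counts and function name are purely illustrative, not taken from any shipping engine:

```cpp
#include <algorithm>

// Hypothetical heuristic: pick next frame's MSAA sample count based on how
// much of the frame budget the GPU is currently using. The thresholds and
// levels are made up for illustration only.
int ChooseMsaaSamples(float gpuFrameMs, float targetFrameMs, int currentSamples)
{
    const float load = gpuFrameMs / targetFrameMs;

    if (load > 0.95f)                        // about to blow the budget: back off
        return std::max(1, currentSamples / 2);
    if (load < 0.70f && currentSamples < 8)  // plenty of headroom: raise quality
        return currentSamples * 2;
    return currentSamples;                   // otherwise leave it alone
}
```

The point being that the MSAA hardware is already there and its level is just a knob the engine can turn per frame.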
In general I don't think we will see MSAA dropped from hardware anytime soon, because it has earned its place there. Let's not get ahead of ourselves over 1 or 2 games where 1) the resources were otherwise not utilized, 2) the approach happened to work well with the game, and 3) the system has some SERIOUS gotchas in terms of other AA approaches. As noted above, repi didn't think much of these alternative approaches for BFBC2, so it may not be a good general solution... yet.
However, another way to look at it is that MLAA was a solution looking for a platform that could pull it off.
This reminds me slightly of the Ray Tracing versus Rasterizing debate. RT is always a technique waiting for a platform to pull it off, yet the resources to do so are always much, much higher than those needed to do better-looking rasterization. This isn't as extreme, and as noted I liked the effect a lot in GOW3, but it seems the cost is quite high (why not go with robust MSAA hardware + kick-ass CPUs that developers can actually use easily... and make more games better overall?), and some developers have already given a thumbs DOWN to the technique.
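For anyone following along, here's a heavily simplified C++ sketch of the family of post-process filters MLAA belongs to: detect luminance discontinuities in the final image, then blend across them. Real MLAA goes much further (classifying L/Z/U edge shapes and computing coverage-based blend weights per pixel, which is where the SPE time goes), so treat the names and threshold here as hypothetical:

```cpp
#include <cmath>
#include <vector>

// Heavily simplified post-process AA on a luminance buffer: find edges,
// blend across them. Real MLAA instead classifies edge patterns and
// derives coverage-based blend weights per pixel.
struct LumImage {
    int w = 0, h = 0;
    std::vector<float> lum;                         // one luminance value per pixel
    float at(int x, int y) const { return lum[y * w + x]; }
};

LumImage CheapPostAA(const LumImage& in, float edgeThreshold = 0.1f)
{
    LumImage out = in;
    for (int y = 1; y < in.h - 1; ++y) {
        for (int x = 1; x < in.w - 1; ++x) {
            const float c = in.at(x, y);
            const bool edgeRight = std::fabs(c - in.at(x + 1, y)) > edgeThreshold;
            const bool edgeDown  = std::fabs(c - in.at(x, y + 1)) > edgeThreshold;
            if (edgeRight || edgeDown)              // naive 4-neighbour blend where
                out.lum[y * in.w + x] =             // MLAA would use pattern weights
                    0.25f * (in.at(x - 1, y) + in.at(x + 1, y) +
                             in.at(x, y - 1) + in.at(x, y + 1));
        }
    }
    return out;
}
```

Even in this toy form you can see why it eats a whole image's worth of reads per frame, and the pattern-search step in the real thing is far less regular than this.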
T.B. has mentioned in that MLAA thread that the solution as featured in GOW3 doesn't map well to current GPUs, meaning it's not an option.
The current solution. We gave PS3 developers 4+ years to figure this one out; let's give the GPU guys the same time frame to justify their hardware.
As you say, MSAA hardware can be repurposed by clever coders, but that's them working against/around the limits of the design. Wouldn't it be better if, instead of cleverly using custom hardware for unconventional workloads (the whole basis of GPGPU performance), the hardware were fully programmable and there were no architectural limits that either restrict your options (not being able to use MLAA if you want it) or add complexity (finding a way to re-engineer existing MLAA methods to fit a GPU's design)?
I wouldn't say using MSAA hardware to allow significantly cheaper soft shadow edges, or A2C with passable IQ (versus a complete game redesign to remove heavy alpha usage), is hacking the hardware or unconventional. The problem is that, set against the hardware as it stands, SPEs and all, this little amount of dedicated hardware offers big IQ and performance increases. I defer back to the Texture Unit example previously given: Larrabee wanted to be the pie-in-the-sky programmable platform, but even Intel couldn't stomach the thought of dumping those units (which are large and take away from a lot of potential programmable units!).
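For context, A2C is literally a single render-state toggle on MSAA-capable hardware; shown here with the standard OpenGL enable (assuming headers exposing GL 1.3+ / ARB_multisample), not any particular engine's code:

```cpp
// Alpha-to-coverage rides on the MSAA hardware that's already there: the
// fragment's alpha is turned into a per-sample coverage mask, so alpha-tested
// foliage and fences pick up smoothed edges on a multisampled render target.
#include <GL/gl.h>

void EnableAlphaToCoverage()  { glEnable(GL_SAMPLE_ALPHA_TO_COVERAGE); }
void DisableAlphaToCoverage() { glDisable(GL_SAMPLE_ALPHA_TO_COVERAGE); }
```

D3D exposes the same thing as an alpha-to-coverage blend-state flag; either way the cost is paid by silicon the chip already carries.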
It all sounds good in theory--and you can find corner cases to prove your point--but if this generation tells us anything, there are bigger fish to fry. Approachable hardware that gets quality products out with robust content, on schedule and on budget, is a more important metric. Yeah, it sucks that real-world business dictates the coolness of the industry, but I think the reason we saw the hardware you envision (Intel's, for practical purposes) get its v.1 canned was for the very issues the industry faces: cool concept, too slow, too far out of the box, trying to find problems for the solution instead of addressing the core issues.
The idea of software rasterisers means zero architectural limits. No requirement to use one or other AA method, or one or other lighting method, or to do rasterisation at all when your game would benefit more from ray tracing. It throws the doors wide open, and I believe the advancements in software solutions would be dramatic, leading to efficiencies that outweigh the brute-force method of the current GPU structure.
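To be clear about what "software rasteriser" means in practice, here's the textbook half-space triangle loop in C++; it has nothing to do with Larrabee's actual code, but it shows that every stage is just code you could swap for something else (different AA, different shading, even ray tracing):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Textbook half-space triangle rasteriser: the core of any software pipeline.
// Every stage here is plain code, which is the whole point of the argument.
struct Vec2 { float x, y; };

static float Edge(const Vec2& a, const Vec2& b, const Vec2& p)
{
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

void DrawTriangle(std::vector<uint32_t>& fb, int width, int height,
                  Vec2 v0, Vec2 v1, Vec2 v2, uint32_t colour)
{
    // Bounding box of the triangle, clamped to the framebuffer.
    const int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    const int maxX = std::min(width - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    const int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    const int maxY = std::min(height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            const Vec2 p{ x + 0.5f, y + 0.5f };
            // The pixel centre is inside if all three edge functions agree in sign.
            const float w0 = Edge(v1, v2, p);
            const float w1 = Edge(v2, v0, p);
            const float w2 = Edge(v0, v1, p);
            if ((w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0))
                fb[y * width + x] = colour;
        }
    }
}
```

The counter-argument, of course, is exactly the one above: dedicated raster and texture hardware does this same inner loop orders of magnitude more efficiently per watt.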
You should be writing the Intel GPU blog