Like what else, for example? Triangle throughput? They are tied in most metrics (pixel/texel fill rates, etc.), and Tahiti has much larger memory bandwidth.
There are differences in the speed of FP16 filtering, and differences in how the ROPs are tied to the cache hierarchy.
For cases where higher-precision texture formats are being filtered, Nvidia has a more consistent profile. I don't recall which architectures were compared for how their ROPs handled more complex G-buffer formats, but at least until recently AMD's ROPs had a different (edit: and less consistent) performance profile.
Setup and tessellation capabilities are a strong point for Nvidia. Its batch sizes are smaller, and there are scenarios where the different arrangement of local store, L1, and a separate texture cache can be advantageous.
Then there's the changing situation with devrel, and the less-than-impressive driver situation.
Historically, we'd see Nvidia doing comparatively better in situations where Tahiti could not leverage its memory bandwidth, capacity, and ALU capability enough to batter the 680's bottlenecks.
There were some benches where CPU limitations were potentially showing up earlier for AMD, which may be a driver issue.
In other cases, it just looks like Tahiti wasn't as nimble an architecture. It has been less successful in hiding the particular quirks of the graphics pipeline, and it takes more available work to get a head of steam.
There are compute studies that also show AMD's architecture faltering sooner as you reduce the number of concurrent work items, although it has a very strong advantage if there is enough work available.
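To make the "enough work available" point concrete, here is a toy model, not a measurement. It assumes sustained throughput is bounded by concurrency divided by latency (a Little's-law style argument) and capped at a peak rate. The two parameter sets, `WIDE` and `NIMBLE`, are made-up numbers chosen only to show the shape of the trade-off; they are not figures for Tahiti or the 680.

```python
def sustained_throughput(items_in_flight, peak_rate, latency):
    """Upper bound on work completed per cycle.

    With `items_in_flight` independent work items and each one taking
    `latency` cycles, at most items_in_flight / latency can finish per
    cycle, and never more than the hardware's peak rate.
    """
    return min(peak_rate, items_in_flight / latency)


def saturation_point(peak_rate, latency):
    """Number of concurrent items needed to reach the peak rate."""
    return peak_rate * latency


# Hypothetical parameters, invented for illustration only.
WIDE = {"peak_rate": 4.0, "latency": 64.0}    # higher peak, needs more work
NIMBLE = {"peak_rate": 3.0, "latency": 24.0}  # lower peak, saturates sooner

if __name__ == "__main__":
    for n in (16, 64, 128, 256, 512):
        wide = sustained_throughput(n, **WIDE)
        nimble = sustained_throughput(n, **NIMBLE)
        print(f"{n:4d} items in flight: wide={wide:.2f} nimble={nimble:.2f}")
```

With these invented numbers the "nimble" design is ahead while fewer than 192 items are in flight, and the "wide" design only pulls ahead once there is enough concurrent work to cover its longer latency.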