Snake Pass is a useful data point, but it's only one game. It's very relevant to what we can expect from indies on the platform, and less clear on what it means for AAA. We know the engine plays well with Switch, which shouldn't be a surprise because Epic supported the Tegra X1 back in 2015 with the Elemental demo. So box-stock Unreal 4 seems to work well on Switch. The real question is about AAA developers, who will spend lots of resources tweaking the engine and creating their own compute shaders. Indie developers are more likely to simply use the array of options that Unreal 4 offers rather than writing their own custom shaders. Maxwell offers much better performance per flop than GCN, but developers have worked around this inefficiency on GCN by implementing compute shaders. AAA developers will do this; an indie project like Snake Pass? Not so much. It's not that Sumo Digital optimized Snake Pass better for Switch, but that Maxwell achieves better utilization with fewer GPU stalls than GCN. Flop for flop, Maxwell is more efficient than GCN. Developers use compute shaders to maximize utilization on GCN, but compute shaders would not be natively supported with Unreal 4; it's on the developer to custom write these. So inherently Unreal 4 gets more out of Switch than it does PS4/X1. This matters less for AAA developers because they will write their own custom compute shaders for PS4 and X1 to maximize performance.
Somebody more knowledgeable than me would have to detail this, but my caveman understanding is that Maxwell doesn't benefit from compute shaders the way GCN does, because it doesn't have the stalls/lulls that compute shaders fill on GCN. As for using custom pixel shaders instead of the defaults in Unreal 4, that would start to defeat the purpose of using an off-the-shelf game engine. Unreal 4 also scales down to mobile, so it's pretty easy to take a given scene and scale it from high-end PC hardware all the way down to mobile.
Unreal Engine and Unity both support compute shaders and use some compute shaders internally. For example, lighting and some high-end post effects are done by compute shaders. Unreal Engine also supports async compute, but doesn't use it extensively internally.
Both Nvidia and AMD gain from compute shaders. Compute shaders are used to reduce the amount of busy work (for example, reusing data in groupshared memory and performing shared calculations once). However, AMD GCN2 gains a bit more from compute shaders than modern Nvidia GPUs, mostly because GCN2 has bottlenecks in its pixel & vertex shader pipeline.
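To make the "reuse data in groupshared memory" point concrete, here's a toy Python model (not real shader code — the radius, group size, and image width are made-up illustrative numbers) comparing memory fetches for a 1D blur done naively per pixel versus a compute-shader-style pass that stages a tile into groupshared memory once per thread group:

```python
# Toy model of why groupshared memory reduces busy work in a compute shader.
# A 1D blur of radius R: each output pixel reads 2R+1 input taps.
# Naive pixel-shader style: every pixel fetches all of its taps from memory.
# Compute-shader style: a thread group of size G stages G + 2R inputs into
# groupshared memory once, then all taps are read from the fast shared tile.

RADIUS = 4    # blur radius (hypothetical)
GROUP = 64    # threads per group (hypothetical)
WIDTH = 1920  # image width

def naive_fetches(width, radius):
    # One memory fetch per tap per pixel.
    return width * (2 * radius + 1)

def groupshared_fetches(width, radius, group):
    # Each group loads its tile (group + 2*radius texels) exactly once.
    groups = (width + group - 1) // group
    return groups * (group + 2 * radius)

n = naive_fetches(WIDTH, RADIUS)
g = groupshared_fetches(WIDTH, RADIUS, GROUP)
print(n, g, round(n / g, 2))  # → 17280 2160 8.0
```

With these made-up numbers the shared tile cuts memory traffic by 8x; the shared calculations are performed once per group instead of once per pixel, which is the "less busy work" in question.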
Reasons why compute shaders are important for GCN2:
- Compute shaders use the shared L2 cache. GCN2 ROPs have a separate non-coherent cache. GCN2 needs cache flushes to read a render target produced by a pixel shader, but doesn't need one when the data is generated by a compute shader. Thus compute shaders reduce stalls (and bandwidth), especially in post-process passes.
- GCN2 has worse vertex shader performance than Maxwell: Techniques such as async compute skinning (DICE presentation) reduce the vertex shader latency.
- GCN2 has worse geometry performance than Maxwell: Techniques such as fine grained compute shader scene culling (also known as "GPU-driven rendering") reduce the number of drawn (but invisible) triangles. Link to DICE presentation:
http://www.frostbite.com/2016/03/optimizing-the-graphics-pipeline-with-compute/
- GCN2 is able to interleave graphics & compute tasks freely (inside CUs), a bit like "Hyperthreading". This fills execution stalls and gives GCN an advantage over Maxwell. GCN2 also stalls more than Maxwell, so it needs this more (see the examples above).
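The "Hyperthreading-like" interleaving in the last bullet can be sketched as simple timeline arithmetic. This is a toy model with made-up cycle counts, not a real GPU scheduler: a graphics workload alternates busy and stall cycles, and interleaved compute waves issue during the stalls:

```python
# Toy utilization model of interleaving graphics & compute inside a CU.
# A graphics workload has busy cycles and stall cycles (e.g. waiting on
# memory). If compute waves can issue during those stalls (GCN2-style
# interleaving), the stall cycles are filled with useful compute work.
# All cycle counts are invented purely to illustrate the arithmetic.

def serial_time(gfx_busy, gfx_stall, compute_busy):
    # Graphics runs to completion (stalls included), then compute runs.
    return gfx_busy + gfx_stall + compute_busy

def interleaved_time(gfx_busy, gfx_stall, compute_busy):
    # Compute work hides inside graphics stalls; any leftover runs after.
    hidden = min(gfx_stall, compute_busy)
    return gfx_busy + gfx_stall + (compute_busy - hidden)

print(serial_time(60, 40, 30))       # → 130: compute runs back-to-back
print(interleaved_time(60, 40, 30))  # → 100: all compute hidden in stalls
```

The more the graphics workload stalls, the more compute can be hidden for free, which is why an architecture that stalls more (GCN2) gets a bigger win from this than one that stalls less (Maxwell).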
All of the listed techniques also slightly improve performance on Nvidia Maxwell (except async compute, which needs Pascal) and on all other modern GPUs.
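The GPU-driven culling bullet above can also be sketched in miniature. This is a hedged toy version in Python, not code from the DICE presentation: a "compute pass" tests every triangle for backfacing and trivial frustum rejection, and only survivors are appended to the draw list, so the geometry pipeline never sees invisible triangles:

```python
# Toy sketch of fine-grained GPU-driven culling: a compute-style pass tests
# each triangle and appends only visible ones to the draw list. Triangles
# here are 2D screen-space tuples; the clip box and tests are illustrative.

def cull(triangles, lo=-1.0, hi=1.0):
    visible = []
    for tri in triangles:
        (x0, y0), (x1, y1), (x2, y2) = tri
        # Backface test: non-positive signed area means the triangle faces away.
        area = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
        if area <= 0:
            continue
        # Trivial frustum test: reject if fully outside the clip box on x or y.
        xs = (x0, x1, x2)
        ys = (y0, y1, y2)
        if max(xs) < lo or min(xs) > hi or max(ys) < lo or min(ys) > hi:
            continue
        visible.append(tri)
    return visible

tris = [
    ((0, 0), (1, 0), (0, 1)),  # front-facing, on screen -> kept
    ((0, 0), (0, 1), (1, 0)),  # back-facing -> culled
    ((5, 5), (6, 5), (5, 6)),  # front-facing but off screen -> culled
]
print(len(cull(tris)))  # → 1
```

The real technique does this per cluster and per triangle on the GPU with much finer tests (occlusion, small-triangle rejection), but the payoff is the same: the draw list that reaches the weak GCN2 geometry front end contains only triangles worth rasterizing.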
Unreal Engine and Unity need to support a wide range of hardware, some of it without compute shader support. GPU-driven rendering and compute shader based animation systems are invasive changes that require big changes to engine infrastructure and potential trade-offs in flexibility. You can't simply plug them in as on/off alternatives. If you need to support a wide range of hardware and a wide range of needs (mobiles, laptops with iGPUs, high-end desktops, Chinese internet cafes, movie production, 90/120 fps VR), you are better off using more traditional techniques.
Obviously AA/AAA developers targeting consoles and using Unreal Engine or Unity will add custom compute shaders and async compute work. Only small indie developers use engines without any internal changes.