That's a strong possibility. The Wii U's 32 MB of eDRAM can store the whole 720p back buffer at once (plus all the grass textures). eDRAM bandwidth is basically unlimited (enough for maximum fill rate at the fattest format with MSAA), and eDRAM latency is lower than DDR/GDDR latency. Thus there's minimal penalty for writing to random locations around the screen. As you said, the commonly used back-to-front sorting results in pretty random screen-space locality. That's no problem for Wii U, Xbox 360 or Xbox One, and PS4 has plenty of excess bandwidth for cases like this. But Tegra X1 has only 25.6 GB/s of bandwidth and no embedded RAM of any kind.
I think DF are underestimating the massive amount of tile-unfriendly overdraw that the Wii U-tuned vegetation (especially the grass) is causing. Drawing grass back to front with no consideration of screen-space locality could cause a lot of traffic between the ROP cache and main memory.
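To put rough numbers on the points above, here is a small Python calculation. The render target formats (4 bytes per sample for color and for depth), the 30 fps target and the 10x overdraw figure are assumptions chosen for illustration, not measurements from Zelda.

```python
"""Back-of-the-envelope framebuffer and bandwidth arithmetic.

Assumptions (not from the post): RGBA8 color and 32-bit depth/stencil,
4 bytes per sample each; 30 fps; a hypothetical worst case where every grass
fragment misses the ROP cache and costs one color read plus one color write
(8 bytes) in main memory. Depth traffic is ignored.
"""

EDRAM_BYTES = 32 * 1024 * 1024   # Wii U eDRAM: 32 MB
TEGRA_X1_BW = 25.6e9             # Tegra X1 memory bandwidth: 25.6 GB/s


def render_target_bytes(width, height, samples=1, bytes_per_sample=4):
    """Size of one render target (color or depth) in bytes."""
    return width * height * samples * bytes_per_sample


def fits_in_edram(width, height, samples=1):
    """True if color + depth of the back buffer fit in eDRAM together."""
    color = render_target_bytes(width, height, samples)
    depth = render_target_bytes(width, height, samples)
    return color + depth <= EDRAM_BYTES


def worst_case_grass_share(width, height, overdraw, fps=30, bytes_per_fragment=8):
    """Fraction of one frame's DRAM bandwidth budget spent on grass fragments
    if none of them hit the ROP cache."""
    traffic = width * height * overdraw * bytes_per_fragment
    budget = TEGRA_X1_BW / fps
    return traffic / budget


if __name__ == "__main__":
    print(render_target_bytes(1280, 720))        # 3686400 bytes (~3.5 MiB)
    print(fits_in_edram(1280, 720, samples=4))   # True (~28 MiB total)
    share = worst_case_grass_share(1600, 900, overdraw=10)
    print(f"{share:.1%} of the per-frame budget at 30 fps")
```

The worst-case traffic scales linearly with resolution, overdraw and bytes per fragment, so it is worth plugging in your own estimates; the inputs above are made up.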
Maxwell's tiled rasterizer helps a bit in cases like this. Grass patches have a low polygon count, so it can buffer many grass patches, bin them into screen-space tiles and rasterize one tile at a time (one memory read and one write per pixel). However, the triangle binning buffer is very small for this particular scenario, as Zelda has so much grass. The developer should sort the grass in a way that keeps screen-local grass blades near each other in the sorted list. This kind of sorting is, however, much more complex to implement. Macro tiling in software (for example, viewport-culling grass/particles into a 4x2 grid of smaller frustums, 400x450 pixels each at 900p) is easy, and the sorting actually gets cheaper, since N per bin is smaller (N log N scaling). But grass/particles that cross tile edges need to be rendered into multiple frustums, slightly increasing geometry processing cost (which shouldn't be any problem on Maxwell).
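A minimal Python sketch of the software macro tiling described above, assuming each grass patch has already been projected to a screen-space bounding box and a view-space depth. The `GrassPatch` type and `bin_patches` function are hypothetical names for illustration. Patches that straddle a tile edge are appended to every overlapping bin, which is the duplicated geometry cost mentioned above, and each bin is sorted back to front on its own.

```python
from dataclasses import dataclass

# Assumed setup: 1600x900 (900p) split into a 4x2 grid of 400x450 tiles.
SCREEN_W, SCREEN_H = 1600, 900
TILES_X, TILES_Y = 4, 2
TILE_W, TILE_H = SCREEN_W // TILES_X, SCREEN_H // TILES_Y


@dataclass
class GrassPatch:
    """Hypothetical per-patch data: inclusive screen-space pixel bounds and view depth."""
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float


def bin_patches(patches):
    """Return bins[ty][tx]: the patches overlapping each tile, sorted back to
    front (largest view depth first). Patches crossing tile edges are added to
    every tile they touch, which duplicates some geometry work."""
    bins = [[[] for _ in range(TILES_X)] for _ in range(TILES_Y)]
    for p in patches:
        # Viewport cull: skip patches that are entirely off screen.
        if p.x1 < 0 or p.y1 < 0 or p.x0 >= SCREEN_W or p.y0 >= SCREEN_H:
            continue
        # Range of tiles covered by the bounding box, clamped to the grid.
        tx0 = max(0, p.x0 // TILE_W)
        tx1 = min(TILES_X - 1, p.x1 // TILE_W)
        ty0 = max(0, p.y0 // TILE_H)
        ty1 = min(TILES_Y - 1, p.y1 // TILE_H)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                bins[ty][tx].append(p)
    # Each bin is sorted independently, so every sort works on a smaller N.
    for row in bins:
        for tile in row:
            tile.sort(key=lambda patch: patch.depth, reverse=True)
    return bins


if __name__ == "__main__":
    patches = [
        GrassPatch(10, 10, 50, 50, depth=5.0),
        GrassPatch(380, 100, 420, 140, depth=20.0),  # straddles the x=400 edge
        GrassPatch(30, 30, 60, 60, depth=12.0),
    ]
    bins = bin_patches(patches)
    print([p.depth for p in bins[0][0]])  # [20.0, 12.0, 5.0]
    print([p.depth for p in bins[0][1]])  # [20.0]
```

In an actual renderer, each tile's list would presumably be drawn with its own sub-frustum or scissor rectangle, so consecutive draws stay inside one 400x450 region of the render target instead of jumping around the whole screen.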
I can also clearly see the bilinear filtering lines. Digital Foundry incorrectly stated that texture detail is configured differently for 720p and 900p. It's not. This is a basic feature of hardware filtering. The gradient calculation is based on the UV difference between neighboring pixels. At 900p the gradients are smaller, so the filtering hardware selects a more detailed mip sooner. Filtering hardware always tries to get as close to a 1:1 pixel:texel mapping as possible; otherwise textures would look as low-resolution at 4K or 1080p as they do at 720p. In modern games, plain bilinear filtering is not common: trilinear filtering blends between adjacent mips, which hides the seam. But I still remember this same discussion from when the Voodoo 2 and Riva TNT were trading blows. And, yeah, it's bilinear filtering on Switch even when docked.
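A simplified Python sketch of the mip selection described above, loosely following the level-of-detail formula in the OpenGL specification. Real hardware uses approximations and adds anisotropic filtering, LOD bias and clamping, and the nearest-mip rounding here is simplified. The 1024x1024 texture and the 3-texels-per-pixel footprint are hypothetical. It shows the same spot on screen getting a lower LOD, and therefore a more detailed mip, at 900p than at 720p, with no per-resolution texture setting involved.

```python
import math


def mip_lod(dudx, dvdx, dudy, dvdy, tex_w, tex_h):
    """Simplified LOD (lambda): log2 of the larger screen-space footprint,
    measured in texels, from the UV differences between neighboring pixels."""
    rho_x = math.hypot(dudx * tex_w, dvdx * tex_h)
    rho_y = math.hypot(dudy * tex_w, dvdy * tex_h)
    rho = max(rho_x, rho_y, 1e-12)  # avoid log2(0) for constant UVs
    return math.log2(rho)


def bilinear_mip(lod, max_level):
    """Bilinear (mipmap-nearest): snap to the single nearest mip.
    The visible line appears where this integer changes."""
    lod = max(lod, 0.0)  # negative LOD = magnification, use mip 0
    return min(max_level, math.floor(lod + 0.5))


def trilinear_mips(lod, max_level):
    """Trilinear: blend two adjacent mips by the fractional LOD,
    so there is no hard transition line."""
    lod = min(max(lod, 0.0), float(max_level))
    lo = math.floor(lod)
    hi = min(lo + 1, max_level)
    return lo, hi, lod - lo


if __name__ == "__main__":
    TEX = 1024                # hypothetical 1024x1024 ground texture (mips 0..10)
    footprint_720p = 3.0      # hypothetical: texels per pixel at one spot at 720p
    for height in (720, 900):
        scale = height / 720                   # same view, more pixels per axis
        d = footprint_720p / scale / TEX       # UV change per pixel shrinks
        lod = mip_lod(d, 0.0, 0.0, d, TEX, TEX)
        print(height, round(lod, 3), bilinear_mip(lod, 10))
    # 720 1.585 2
    # 900 1.263 1
```

With bilinear filtering the whole surface snaps between integer mips, so the boundary where `bilinear_mip` changes shows up as a line, and that line moves farther away at 900p. `trilinear_mips` instead returns two levels and a blend weight that varies smoothly with distance.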