Crowds have a lot of opaque overdraw, so early z-reject and front-to-back drawing seem like an obvious and powerful optimization that should largely (almost completely) negate the bandwidth advantage of eDRAM. Or am I missing something?
The sorting part is actually simple: the seat in front of you is generally also lower, so you can just sort seats by increasing 'Y' to get a front-to-back draw order for field-area cameras.
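To make that concrete, here is a rough sketch of the sort in Python. The `Seat` type, its fields, and the 0.5-unit rise per row are made up for illustration; all it assumes is that 'Y' is seat height and that each row back sits higher than the one in front of it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Seat:
    row: int    # hypothetical: row 0 is the one closest to the field
    x: float    # position along the row
    y: float    # seat height; assumed to rise with each row back


def front_to_back(seats):
    """Order seats by increasing height, i.e. nearest first for a field-level camera."""
    return sorted(seats, key=lambda s: s.y)


# Three rows of three seats, deliberately listed out of order.
seats = [Seat(row=r, x=float(c), y=0.5 * r) for r in (2, 0, 1) for c in range(3)]
ordered = front_to_back(seats)
rows_in_order = [s.row for s in ordered]
```

This only holds while the camera is down in the field area looking up at the stands; a camera placed up in the seats would need a real distance sort.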
The problem is that the crowd is instanced, so even though you see 50,000 people in a stadium, there may only be 30 or so individual crowd members that are repeated. When you draw them, you'd want to set up crowd dude #1 and draw all instances of him first, since you'd be using the same geometry, texture, etc. for every draw call. Plus there is other trickery you can do when you are drawing the same guy over and over. But the instances of each crowd member will be scattered all over the place, which alas doesn't work well with front-to-back sorting.
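A toy way to see the conflict is to count how often you'd have to switch models under each draw order. This is only a sketch: it assumes models are assigned to seats at random, uses the 50,000 and 30 figures from above, and treats "the model changed since the last draw" as a stand-in for the real cost of rebinding geometry and textures.

```python
import random


def count_state_changes(draw_order):
    """Count how often the bound model differs from the previous draw's model."""
    changes = 0
    previous = None
    for model_id in draw_order:
        if model_id != previous:
            changes += 1
            previous = model_id
    return changes


NUM_MODELS = 30      # unique crowd members
NUM_SEATS = 50_000   # visible crowd instances

rng = random.Random(0)
# Assumption: seats are already listed front to back, each with a random model.
seat_models = [rng.randrange(NUM_MODELS) for _ in range(NUM_SEATS)]

depth_sorted = seat_models              # front-to-back order, models interleaved
batched_by_model = sorted(seat_models)  # all of model 0, then all of model 1, ...

depth_changes = count_state_changes(depth_sorted)
batch_changes = count_state_changes(batched_by_model)
```

Batched by model, the number of switches can never exceed the number of unique models. In front-to-back order, almost every draw lands on a different model than the one before it, so you pay for a switch on nearly every seat.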
You can do a z-prepass, but the crowd can be vertex-heavy and a prepass means pushing all that geometry through the vertex pipeline twice, so it's not worthwhile.