Well, to do it "right" you'd need soft body (cloth) simulation -- including rather fine-grained collision detection -- for each piece of clothing on every single character.
(And beyond the computational requirements, I'm not sure how well the content creation pipeline is equipped to handle real clothes at this point.)
The first problem is that there's no 100% solution for cloth.
We do a lot of cloth simulation for our movies and there are lots of clipping issues every single time. But since it's a prerendered cinematic and we save out the cloth animation as per-vertex data anyway, our tech guys can go in and manually correct all the small mistakes, frame by frame if they have to. It's usually a LOT faster to push those vertices around with a deformer than to run more and more simulations and wrestle with the parameters.
Obviously, a game can't do that.
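To make that offline fix-up step a bit more concrete, here's a minimal sketch in Python. It assumes the baked cache is just a list of frames, each a list of (x, y, z) vertex positions, and the "deformer" is a simple offset with linear falloff. The name apply_fix and the data layout are made up for illustration, not taken from any real package.

```python
import math

def apply_fix(cache, frames, center, offset, radius):
    """Nudge cached vertices near `center` by `offset` on the given frames.

    Hypothetical stand-in for a hand-placed deformer: full offset at the
    centre, fading linearly to nothing at `radius`.
    """
    for f in frames:
        fixed = []
        for v in cache[f]:
            d = math.dist(v, center)            # distance from the fix centre
            w = max(0.0, 1.0 - d / radius)      # linear falloff weight in [0, 1]
            fixed.append(tuple(c + w * o for c, o in zip(v, offset)))
        cache[f] = fixed
    return cache

# Two baked frames, three vertices each (made-up numbers).
cache = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0), (5.0, 0.1, 0.0)],
]

# Push the cloth outward around one spot on frame 1 only.
apply_fix(cache, frames=[1], center=(0.0, 0.1, 0.0),
          offset=(0.0, 0.0, 0.5), radius=2.0)
```

The point is that this only works because every frame already exists on disk; a game computes each frame once, live, with nobody around to patch it.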
On top of that, there are some other issues...
Calculation times aren't constant; they vary depending on the movement.
Tight clothing and layered clothing are pretty complicated to do; big, flapping cloaks and such are a lot easier.
Our cloth stuff also needs a few frames of run-up time, where the character starts from a T-pose and the cloth just floats around it, then we activate gravity and move the character into its starting position and pose. It takes about 10 frames before the actual shot starts. Good editing takes care of inconsistencies between shots, or we can always pack multiple shots together if it's necessary.
Games can't do this either, although I'm not sure whether it's a must-have for the simulation or just used to get better results.
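Here's a rough sketch of the run-up idea in Python, using a toy 2D chain of particles (Verlet integration plus distance constraints) as a stand-in for real cloth. Everything in it is an assumption for illustration: the names step and bake_shot, the 3 "floating" frames, the constants, and the fact that the pinned point stays put instead of blending from a T-pose into a start pose.

```python
import math

DT = 1.0 / 24.0   # one film frame
REST = 0.1        # rest length between neighbouring particles
DAMPING = 0.98    # crude velocity damping

def step(pos, prev, gravity, iterations=10):
    """Advance the chain one frame; particle 0 is pinned."""
    # Verlet integration: new = pos + damped velocity + acceleration * dt^2
    new = [(x + (x - px) * DAMPING, y + (y - py) * DAMPING + gravity * DT * DT)
           for (x, y), (px, py) in zip(pos, prev)]
    new[0] = pos[0]
    # Relax distance constraints so neighbours stay REST apart.
    for _ in range(iterations):
        for i in range(len(new) - 1):
            (ax, ay), (bx, by) = new[i], new[i + 1]
            dx, dy = bx - ax, by - ay
            dist = math.hypot(dx, dy) or 1e-9
            k = (dist - REST) / dist
            share = 1.0 if i == 0 else 0.5   # the pinned particle never moves
            if i > 0:
                new[i] = (ax + 0.5 * k * dx, ay + 0.5 * k * dy)
            new[i + 1] = (bx - share * k * dx, by - share * k * dy)
    return new, pos

def bake_shot(shot_frames, preroll=10, float_frames=3, n=6):
    """Simulate `preroll` unrecorded frames, then record the actual shot."""
    pos = [(i * REST, 0.0) for i in range(n)]   # chain starts out horizontal
    prev = list(pos)
    for f in range(preroll):
        # Cloth "floats" first; gravity is switched on partway through.
        g = 0.0 if f < float_frames else -9.8
        pos, prev = step(pos, prev, g)
    recorded = []
    for _ in range(shot_frames):
        pos, prev = step(pos, prev, -9.8)
        recorded.append(pos)
    return recorded

with_runup = bake_shot(5)
without_runup = bake_shot(5, preroll=0)
```

With the run-up, the first recorded frame already has the chain hanging; with preroll=0 the shot opens on cloth that has barely started to fall, which is the kind of pop you'd see if a game spawned a character with no settling time.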
Good-looking clothing (folds, etc.) needs polygon counts in, or preferably above, the 10K range. For the cloth only. Otherwise it'll look blocky and ugly.