Most of that can be inferred directly from the basic position/velocity/acceleration data. I guess you could throw in engine speed and gear selection so that the engine sound comes out right for opponent cars, but you don't need to. Engine speed and gear selection could just as easily be finagled locally via AI in response to the position/velocity/acceleration data. It may not be absolutely true to "reality", but who would really know better (unless your remote opponents are not so remote and are actually playing right beside you on their own TV sets)? What matters is that the resulting experience is believable from a visual and aural standpoint.
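To make the "finagled locally" idea a bit more concrete, here is a rough Python sketch of guessing an opponent's gear and rpm from nothing but its received speed. The gear ratios, final drive, wheel radius, and shift point are all made-up numbers for illustration, and the function names are hypothetical; a real game would use whatever its own car model defines.

```python
import math

# Hypothetical drivetrain numbers, chosen only for illustration.
GEAR_RATIOS = [3.6, 2.2, 1.5, 1.1, 0.9]   # 1st through 5th gear
FINAL_DRIVE = 3.9
WHEEL_RADIUS_M = 0.31
IDLE_RPM = 900.0
SHIFT_UP_RPM = 6500.0

def rpm_in_gear(speed_mps, gear):
    """Engine rpm implied by road speed in a given gear (1-based)."""
    wheel_rev_per_s = speed_mps / (2.0 * math.pi * WHEEL_RADIUS_M)
    rpm = wheel_rev_per_s * 60.0 * GEAR_RATIOS[gear - 1] * FINAL_DRIVE
    return max(rpm, IDLE_RPM)  # never report less than idle

def guess_gear_and_rpm(speed_mps):
    """Pick the lowest gear that keeps rpm below the shift point."""
    for gear in range(1, len(GEAR_RATIOS) + 1):
        rpm = rpm_in_gear(speed_mps, gear)
        if rpm < SHIFT_UP_RPM:
            return gear, rpm
    # Flat out in top gear: nothing higher to shift into.
    top = len(GEAR_RATIOS)
    return top, rpm_in_gear(speed_mps, top)

# The speed would come from the opponent's received velocity vector.
gear, rpm = guess_gear_and_rpm(50.0)
```

The result will not match what the remote player's engine is really doing (they may be holding a gear or lifting off the throttle), but it is enough to drive a plausible engine sound.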
The engine sound issue does illustrate the scale of data that should be sent versus what should be kept local. You don't stream audio of the engine sound over the comm link; that would be gratuitously wasteful of bandwidth. You send the few bits of data (engine rpm and gear selection) from which the sound can be reconstructed locally. In the end, I don't see this data adding up to anything along the lines of Mbit/s. If you need much more than a few kbit/s, then something is seriously wrong.
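As a back-of-the-envelope check on the "few kbit/s" figure, here is a small Python sketch that packs one car's state and works out the resulting data rate. The packet layout (three float32 vectors plus rpm and gear), the 10 updates per second, and the function names are all my own assumptions, not any particular game's protocol, and the figure ignores protocol header overhead.

```python
import struct

# Hypothetical per-car state packet: position, velocity, and acceleration
# as 3 x float32 each, plus engine rpm (uint16) and gear (uint8).
# "<" means little-endian with no padding between fields.
STATE_FORMAT = "<9fHB"

def pack_state(pos, vel, acc, rpm, gear):
    """Serialize one car's state into a compact binary packet."""
    return struct.pack(STATE_FORMAT, *pos, *vel, *acc, rpm, gear)

def bandwidth_kbit_per_s(packet_bytes, updates_per_s, cars):
    """Payload data rate in kbit/s for a given packet size and update rate."""
    return packet_bytes * 8 * updates_per_s * cars / 1000.0

packet = pack_state((12.5, 0.0, -3.0), (41.0, 0.0, 0.5), (1.2, 0.0, 0.0), 5400, 4)
print(len(packet))                                # 39 bytes
print(bandwidth_kbit_per_s(len(packet), 10, 1))   # 3.12 kbit/s
```

Even with several opponents and a faster update rate this stays in the tens of kbit/s at most, nowhere near Mbit/s territory.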