It can't be plain stereo, because the PSVR box needs audio in a format that lets it produce the 3D positional audio for the headset. Uncompressed multichannel audio can consume quite a lot of bandwidth.
I am certain the reason the PS4 doesn't support 1080p at 120 Hz is the same reason the original Xbox One doesn't: insufficient HDMI bandwidth. If a method could have been found to make it work, Microsoft would have supported it on the original hardware.
It is almost certainly a multichannel format. A vanilla Dolby 5.1 (AC3) stream, for example, is tiny though, far less than stereo PCM, so its contribution to the bandwidth used is basically nil.
It might be 100 MB or so for an hour of audio, depending on the bitrate.
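For a rough sense of the numbers, here's a back-of-the-envelope sketch. The sample rate (48 kHz), bit depth (16-bit) and the 448 kbps AC3 bitrate are my assumptions for illustration, not figures confirmed for the PSVR box, and the function names are made up for this example.

```python
# Rough audio bandwidth arithmetic. All rates below are assumed for
# illustration; they are not confirmed PSVR figures.

def pcm_bitrate(channels, sample_rate_hz=48_000, bits_per_sample=16):
    """Bits per second for uncompressed PCM audio."""
    return channels * sample_rate_hz * bits_per_sample

def megabytes_per_hour(bits_per_second):
    """Convert a constant bitrate into megabytes (10^6 bytes) per hour."""
    return bits_per_second * 3600 / 8 / 1_000_000

stereo_pcm_bps = pcm_bitrate(2)      # 2 channels, uncompressed
surround_pcm_bps = pcm_bitrate(8)    # 7.1 = 8 channels, uncompressed
ac3_51_bps = 448_000                 # assumed AC3 5.1 bitrate

for name, bps in [
    ("Stereo PCM", stereo_pcm_bps),
    ("7.1 PCM", surround_pcm_bps),
    ("AC3 5.1 (assumed 448 kbps)", ac3_51_bps),
]:
    print(f"{name}: {bps / 1000:.0f} kbps, "
          f"{megabytes_per_hour(bps):.1f} MB per hour")
```

At these assumed rates the AC3 stream comes to about 200 MB per hour against roughly 690 MB for stereo PCM, and lower AC3 bitrates shrink it further. Either way it is tiny next to a video signal.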