Video and sound decoding are handled by two dedicated chips: the AVC decoder handles video decoding (or the bulk of MPEG-4 AVC decoding), while the 5 GOPS monster, the VME (a reconfigurable DSP), handles sound decoding and sound processing and also assists the AVC chip.
For DRM purposes (even though there is separate silicon for secure I/O that handles 128-bit AES encryption), I/O, and who knows exactly what else, you have a fully customized R4000i core in a non-user-programmable role. It is separate from the main CPU, which is another R4000i with the addition of the VFPU (together with a more traditional FPU).
This second core has its own FPU and runs at a speed selectable between 1 MHz and 333 MHz (it can be controlled through software, IIRC).
To Recap:
CPU = R4000i + FPU + VFPU
Media Engine = R4000i + FPU
The Media Engine has control of the AVC chip and the VME chip, as well as the other blocks used for I/O, cryptography, DRM, etc.
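
If it helps to see the recap as a data structure, here is a minimal Python sketch of the block layout described above. It is only a toy model: the `Core` class, the `clamp_clock` helper and the constant names are made up for illustration and do not correspond to any real SDK or firmware API. The only facts it encodes are the ones from this post (which units each core has, what the Media Engine controls, and the 1 to 333 MHz range).

```python
from dataclasses import dataclass

# Selectable clock range quoted in the post (hypothetical constant names).
MIN_MHZ, MAX_MHZ = 1, 333


@dataclass(frozen=True)
class Core:
    """Toy description of one processor block (illustrative only)."""
    name: str
    units: tuple          # the core itself plus its attached units
    controls: tuple = ()  # other blocks this core is in charge of

    def has(self, unit: str) -> bool:
        return unit in self.units


def clamp_clock(mhz: int) -> int:
    """Clamp a requested clock to the 1-333 MHz range from the post."""
    return max(MIN_MHZ, min(MAX_MHZ, mhz))


# CPU = R4000i + FPU + VFPU
CPU = Core("CPU", ("R4000i", "FPU", "VFPU"))

# Media Engine = R4000i + FPU, in control of the AVC and VME chips
# and the I/O, cryptography and DRM blocks.
MEDIA_ENGINE = Core(
    "Media Engine",
    ("R4000i", "FPU"),
    controls=("AVC", "VME", "I/O", "cryptography", "DRM"),
)

if __name__ == "__main__":
    for core in (CPU, MEDIA_ENGINE):
        print(core.name, "=", " + ".join(core.units))
    print("VFPU only on the CPU:", CPU.has("VFPU") and not MEDIA_ENGINE.has("VFPU"))
    print("Request for 500 MHz clamps to", clamp_clock(500), "MHz")
```

The point of the model is just the asymmetry: both cores are the same R4000i with an FPU, but only the main CPU gets the VFPU, and only the Media Engine sits in front of the AVC and VME chips.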