No, as far as I know, the streams on PC and Xbox are identical. Kinect is dependent on bandwidth, CPU, and memory. The skeletal processing is a library that developers link in, and it uses a certain amount of memory and some part of a hardware thread. Speech is the same. The amount of memory available limits the size of the skeletal database and the number of joints. With huge gobs of RAM, we could probably track lots more joints, although generating the machine learning database ("Exemplar" for those in the know) would take months on the gigantic cluster we currently use for it. The speech database is built the same way, using machine learning, and if I recall correctly, generating a new model took upwards of a week and cost in the 6-7 figure range.
There's a bunch of processing on the streams even before they get handed to the skeletal and speech pipelines. For audio, this is relatively heavyweight and does all the magic like echo reduction, speech isolation, and beamforming, so that we get the cleanest speech track possible to send to the speech subsystem. For skeletal, it's a lot more lightweight, and is probably just a noise reduction process.
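For readers unfamiliar with beamforming, here is a minimal sketch of the textbook delay-and-sum approach in Python. It is not the Kinect audio pipeline, whose internals are not described here. The function names, the linear array geometry, and the integer-sample rounding are all assumptions made for illustration. The idea is to delay each microphone channel so that sound from one chosen direction lines up, then average the channels so that direction is reinforced and everything else is attenuated.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature


def steering_delays(mic_positions_m, angle_rad, sample_rate_hz):
    """Per-channel delays (in whole samples) that align a far-field source.

    Hypothetical helper: assumes a linear array along the x axis and a
    plane wave arriving from angle_rad, measured from broadside (0 means
    straight ahead, positive angles lean toward +x).
    """
    # Relative arrival time of the wavefront at each mic, in samples.
    # A mic further along +x (toward the source) hears the sound earlier.
    arrivals = [
        -x * math.sin(angle_rad) / SPEED_OF_SOUND * sample_rate_hz
        for x in mic_positions_m
    ]
    latest = max(arrivals)
    # Delay every channel until it lines up with the latest arrival.
    return [int(round(latest - a)) for a in arrivals]


def delay_and_sum(channels, delays):
    """Shift each channel by its delay and average the results."""
    n = len(channels[0])
    out = [0.0] * n
    for samples, delay in zip(channels, delays):
        for i in range(delay, n):
            out[i] += samples[i - delay]
    return [v / len(channels) for v in out]


if __name__ == "__main__":
    # Three mics hear the same click at samples 2, 1 and 0 respectively.
    channels = [
        [0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    ]
    steered = delay_and_sum(channels, [0, 1, 2])
    unsteered = delay_and_sum(channels, [0, 0, 0])
    print("steered peak:", max(steered))
    print("unsteered peak:", max(unsteered))
```

When the delays match the direction the click came from, the three copies add up coherently; with no steering they stay spread out and the averaged peak is smaller. A production pipeline would typically use fractional delays and adaptive weighting, which this sketch leaves out.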
So to answer your question, I'd say memory would be the biggest concern for most of the Kinect functions. It's by far the most precious resource on the box.