The distribution of the induced jitter is also important. I haven't looked at tkf's 2nd source, but I recall an AES paper that says that 10 ns of jitter is audible for frequencies near the top end of human hearing.
Essentially, you hear some "extra noise" in the extreme treble, if you hear it at all. Just a side note: if you are really averse to some extra noise in the extreme treble/ultrasonic range, then SACD is definitely not the right format for "soundstage".
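For a rough sense of scale, here's a back-of-the-envelope sketch (my own numbers and the standard slope-error approximation, not anything from that paper): the worst-case error a timing error Δt produces on a full-scale sine at frequency f is about 2π·f·Δt of full scale, so the jitter-limited noise floor sits roughly 20·log10(2π·f·Δt) dB down.

```python
import math

def jitter_noise_floor_db(freq_hz, jitter_s):
    """Approximate worst-case error floor (dB below full scale) for a
    full-scale sine at freq_hz with timing error jitter_s."""
    return -20 * math.log10(2 * math.pi * freq_hz * jitter_s)

# 10 ns of jitter on a 20 kHz tone: an error floor only ~58 dB down,
# i.e. well above the ~96 dB noise floor of 16-bit audio -- hence the "extra noise".
print(round(jitter_noise_floor_db(20_000, 10e-9)))   # ~58

# 1 ps of jitter on the same tone: ~138 dB down, buried far below 16-bit dither.
print(round(jitter_noise_floor_db(20_000, 1e-12)))   # ~138
```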
Wow and flutter are very low frequency phenomena, and probably do not affect the "transparency" of the soundstage.
True, and you should also expect that if the timing is inconsistent at low frequencies, it only becomes more so at higher frequencies. Essentially, the rudimentary design of an etched platter and mechanical turntable can barely eliminate speed/timing variations at low frequencies, so it has no chance of keeping accurate timing at higher frequencies...yet we still have soundstage, right? I also doubt they were aligning the left and right audio channels down to the nanosecond, let alone the picosecond, as they cut the master disc. The technology for that sort of manipulation simply did not exist in the vinyl era. Even if it had, it would still have been pointless, since the aforementioned imperfections of the medium and the mechanical playback scheme could never be accurate down to nanoseconds anyway.
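To put a rough number on that (my own ballpark figures, not anything from a spec sheet): even a well-behaved turntable with speed error in the neighborhood of 0.05% smears the period of a 20 kHz tone by tens of nanoseconds per cycle, which is already tens of thousands of times coarser than a picosecond.

```python
# Rough arithmetic with a hypothetical but plausible wow/flutter figure:
period_s = 1 / 20_000          # a 20 kHz tone has a 50 microsecond period
speed_error = 0.0005           # assume 0.05% speed variation (good-turntable ballpark)
timing_error_s = period_s * speed_error
print(timing_error_s)          # 2.5e-08 s, i.e. ~25 ns per cycle
print(timing_error_s / 1e-12)  # ~25,000 picoseconds
```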
So I'm thinking the correlation between soundstage and jitter is looking pretty busted as a theory, unless anybody would like to go on the record that vinyl was inherently incapable of "soundstage" or any kind of redeeming stereo imaging effect whatsoever.
The thing is, stereo does not allow the soundstage to be reproduced accurately. It doesn't give realistic positional information. The soundstage one perceives is highly fictional.
No doubt about that. The extension of that is that no one has yet invented the "perfect" mic arrangement to fully capture an audio event in three-dimensional space plus time. It's
all a compromise for the time being (though some very good results can still be attained). No amount of jitter, or lack of it, is going to make up for that. So I say, if something is to be blamed for not perfectly capturing a soundstage, at least put mic'ing techniques at the top of the list. That's going to be many orders of magnitude more dominant an effect than any sort of jitter at the level of picoseconds.
Timing jitter is error in the x-axis, and bit depth affects error in the y-axis.
...and in both scenarios, scale is an important context. Would you sweat a time unit on the x-axis that is on the order of
1 million times smaller than the smallest division on your x-axis any more than you would sweat an amplitude unit on the y-axis that is on the order of 1 million times smaller than the value of your least significant bit? If this is a 16-bit system, this is essentially like attributing some sort of sound quality to a tiny value
20 bits beyond that. That's right: a 16-bit audio track actually needing 36 bits of resolution? Extreme, right? Certainly, that kind of extreme calls for skepticism. You may believe otherwise, but that's how the math seems to work out. We might as well give up now, because time precision in that realm would practically require a clock fine enough to track the motion of an electron one electron-width at a time.
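Just to mirror that arithmetic (the 44.1 kHz sample rate is my assumption; nobody here has specified one): a factor of a million on either axis is about 20 bits, and a picosecond is tens of millions of times smaller than one sample interval.

```python
import math

# A factor of ~1,000,000 on either axis corresponds to ~20 bits,
# so claiming significance a million times below the least significant bit
# of a 16-bit system amounts to claiming ~36 bits of effective resolution.
extra_bits = math.log2(1_000_000)
print(round(extra_bits, 1))        # ~19.9 bits
print(16 + round(extra_bits))      # 36 bits

# On the time axis, assuming CD's 44.1 kHz: the sample period is ~22.7 microseconds,
# so a single picosecond is over 20 million times smaller than one sample interval.
sample_period_s = 1 / 44_100
print(round(sample_period_s / 1e-12 / 1e6, 1))  # ~22.7 (million)
```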
Tones near the Nyquist limit that aren't at full dynamic range, and hence don't benefit from full bit depth, have a lot of phase information discarded that might be important.
The only phase information that can be discarded in the process is the phase information belonging to frequencies beyond the Nyquist limit. If you want to store phase information down to the picosecond, you are essentially calling for a sampling bandwidth on the order of terahertz. I don't know if that is such a reasonable goal for the purpose of human hearing (which can barely break 20 kHz). At some point, you have to acknowledge that other parts of the audio chain are far bigger bottlenecks to fidelity than jitter at the picosecond level. Just the sheer number of phase anomalies that occur in a premium tweeter's transducer element are like a meat grinder to the waveform compared to picoseconds of jitter happening in the digital domain.
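Taking the premise above at face value (i.e. that resolving timing to a given Δt means sampling roughly once per Δt, which is the argument being made, not an assertion of mine), the implied sample rate falls straight out of the arithmetic:

```python
# If one insisted on a sample every picosecond, the implied rate would be
# 1 / 1e-12 s, i.e. about a terahertz -- roughly fifty million times the
# ~20 kHz upper limit of human hearing.
dt_s = 1e-12
implied_rate_hz = 1 / dt_s
print(f"{implied_rate_hz:.3g} Hz")         # ≈ 1e+12 Hz
print(f"{implied_rate_hz / 20_000:.3g}x")  # ≈ 5e+07, fifty million times 20 kHz
```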