There are a few variables that need to be taken care of if you want good voice chat.
1. Bandwidth is the easy band-aid: if you can throw bandwidth at the problem, you are usually in good shape.
2. Codec: different codecs have different quality and bandwidth requirements.
3. Sample rate/time: how often and for how long should you sample before sending a packet? Here we get into trade-offs. With long sample times, every lost packet takes a bigger chunk of audio with it, so packet loss becomes an issue. With short sample times we get lots of small packets, which most routers do not like, and we end up with packet loss, delay and jitter.
We might be able to overcome jitter with "large" Rx buffers, but that in turn adds more delay.
And if we have to pad packets because they are too small (short sample time), we are wasting bandwidth on overhead.
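To put rough numbers on the sample time vs. overhead trade-off, here is a minimal Python sketch. It is only an illustration under my own assumptions: the function names are made up, it counts IPv4 + UDP + RTP headers only (20 + 8 + 12 = 40 bytes), and it ignores layer 2 framing, RTCP and signalling.

```python
def voip_stream_kbps(codec_kbps, ptime_ms, header_bytes=40):
    """Approximate one-way IP-layer bandwidth of a single voice stream.

    codec_kbps   -- codec payload bit rate (e.g. 64 for G.711, 8 for G.729)
    ptime_ms     -- how many milliseconds of audio go into each packet
    header_bytes -- assumed per-packet overhead: IPv4 (20) + UDP (8) + RTP (12)
    """
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    packets_per_second = 1000 / ptime_ms
    return (payload_bytes + header_bytes) * 8 * packets_per_second / 1000


def header_share(codec_kbps, ptime_ms, header_bytes=40):
    """Fraction of each packet that is header rather than audio."""
    payload_bytes = codec_kbps * 1000 / 8 * ptime_ms / 1000
    return header_bytes / (payload_bytes + header_bytes)


if __name__ == "__main__":
    for ptime in (10, 20, 40):
        print(f"G.711 @ {ptime} ms: {voip_stream_kbps(64, ptime):.1f} kbps, "
              f"{header_share(64, ptime):.0%} headers")
    for ptime in (10, 20, 40):
        print(f"G.729 @ {ptime} ms: {voip_stream_kbps(8, ptime):.1f} kbps, "
              f"{header_share(8, ptime):.0%} headers")
```

The shorter the sample time, the more packets per second and the bigger the share of bandwidth spent on headers. With a low bit rate codec the headers can easily outweigh the audio itself.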
//Edit//
When we design use cases for FTTH projects, we usually slap a big band-aid on it and reserve 1 Mbps per customer for VoIP. That way they can have a couple of phones and even do some conferencing. Then, when we actually deploy, we need to tune based on how the home gateway copes, the codecs, the sample time/rate, and how QoS on the core and aggregation devices actually handles the traffic load.
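As a back-of-the-envelope check on that 1 Mbps figure, a small Python sketch can estimate how many simultaneous streams fit per direction. Everything here is an assumption for illustration: the function name is hypothetical, the 58 bytes of overhead are IPv4 + UDP + RTP (40) plus an untagged Ethernet header and FCS (18), and RTCP, signalling, VLAN tags and preamble/inter-frame gap are all ignored.

```python
def calls_per_reservation(reserved_bps, codec_bps, ptime_ms, overhead_bytes=58):
    """Estimate how many one-way voice streams fit into a reserved rate.

    overhead_bytes is an assumed per-packet cost:
    IPv4 (20) + UDP (8) + RTP (12) + Ethernet header and FCS (18).
    """
    payload_bytes = codec_bps * ptime_ms / 8000      # audio bytes per packet
    per_call_bps = (payload_bytes + overhead_bytes) * 8 * 1000 / ptime_ms
    return int(reserved_bps // per_call_bps)


if __name__ == "__main__":
    print("G.711 @ 20 ms:", calls_per_reservation(1_000_000, 64_000, 20))
    print("G.729 @ 20 ms:", calls_per_reservation(1_000_000, 8_000, 20))
```

Under these assumptions the reservation leaves plenty of headroom for a couple of phones plus conferencing. What the sketch cannot tell you is how the gateway and QoS behave under load, which is why the tuning at deployment is still needed.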