Nvidia hasn't released P100 as a plug-in PCIe board yet, but they will. (The white paper mentions how the existing P100 module can be put onto a carrier board.) P100's performance eclipses K40 and even K80 in every metric, so it's not like Nvidia is worried about replacing them. The P100 whitepaper also mentions graphics support, so it will be a high-end Quadro as well. (GK210 was a Tesla-only chip with no Quadro version, so it wasn't originally clear if P100 had the same limitation. It doesn't.)
A GP104 Tesla and Quadro will be released (just using logic here, not insider knowledge). That gives more compute power at lower wattage than the Maxwell M6000 and M40, so when Nvidia inevitably releases both P100 and GP104 PCIe Teslas and Quadros, they'll have covered the range of professional users pretty well.
That leaves GP102. Obviously it'll beat GP104 in performance and compute density, but to professional users its unique appeal over GP104 may be memory capacity. GP104 and P100 are both limited to 16GB, which may be too small for some professional workloads. NVLink memory communication minimizes this limitation on DGX-1 systems, but the limit would bite harder on PCIe board versions. GP102 likely uses a 384-bit bus and would support 24GB of memory, matching the M6000 and M40 and leaving the professional Maxwell parts with no remaining advantages (similar to how the GTX 980 and 980 Ti were eclipsed).
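For anyone who wants to check the 384-bit / 24GB arithmetic, here is a rough Python sketch. It assumes 32-bit-wide GDDR5-style chips and a clamshell layout (two chips sharing each 32-bit channel), which is one way a 384-bit card can reach 24GB. The function name, the 8Gbit chip density, and the 10 Gbps pin rate are illustrative assumptions, not known GP102 specs.

```python
def memory_config(bus_width_bits, chip_density_gbit, pin_rate_gbps,
                  clamshell=False, chip_io_bits=32):
    """Return (chip count, capacity in GB, peak bandwidth in GB/s).

    Hypothetical helper: assumes every chip exposes a chip_io_bits-wide
    interface and that clamshell mode puts two chips on each channel.
    """
    channels = bus_width_bits // chip_io_bits        # 384 / 32 = 12 channels
    chips = channels * (2 if clamshell else 1)       # clamshell doubles chip count
    capacity_gb = chips * chip_density_gbit / 8      # Gbit per chip -> GB total
    bandwidth_gbs = bus_width_bits * pin_rate_gbps / 8  # bits/s across bus -> bytes/s
    return chips, capacity_gb, bandwidth_gbs


if __name__ == "__main__":
    # Assumed values for illustration only: 8Gbit chips at 10 Gbps per pin.
    for clamshell in (False, True):
        chips, cap, bw = memory_config(384, 8, 10.0, clamshell=clamshell)
        print(f"clamshell={clamshell}: {chips} chips, {cap:.0f} GB, {bw:.0f} GB/s")
```

Note that clamshell doubles capacity without changing bandwidth, since the bus width stays the same. Under these assumptions a 384-bit bus gives 12GB normally and 24GB in clamshell.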
So there's no immediate SKU hole without GP102, and there's no desperate market pressure to accelerate its release. My unsupported best estimate is a November 2016 demonstration of a GP102 Tesla at Supercomputing 2016 (SC16).
Yes, but not everyone who has a K40 or K80 is going to spend a fortune on the P100.
And those are two very old models now.
Nvidia really needs another mixed-precision Tesla model that is more competitive on price, built on an architecture that spans at least two of the business sectors (Tesla and Quadro) and eventually also releases as a Titan on the consumer side.
It sounds like you are positioning GP102 as the M6000 and M40 replacement, so it would end up replacing those while leaving K40/K80 in production in both high-performance Tesla servers and workstations?
TBH I think they need to do what happened with GK110 and have a die that crosses between Tesla/Quadro/Titan, albeit this time it would not be the top Tesla die.
This would put another card into the Tesla range with less DP but at a cheaper price, let the M6000 and M40 drop in price or be replaced, and provide the Titan.
As this is a new die, memory can be whatever they design it for, whether HBM2 or GDDR5X (but GDDR5X really needs to be at least the 12Gbps -120 product IMO).
Yeah, I see what you mean about minimum memory for Quadro workstations, but they really need another product in the Tesla range as well. K40/K80 are too long in the tooth, and not everyone can justify the P100 upgrade costs for their research centre/business.
There is also the consideration that Intel is looking to attack the Tesla business.
The problem is how to fit one die into all 3 branches of Nvidia's families in a way that makes sense with regard to what it replaces or is positioned above.
Cheers
Edit:
Just to say, yeah, they could go the route you mention.
The problem for Nvidia is how they are going to fit the next products into each of the performance families (Tesla/Quadro/Titan) without conflicting in terms of performance overlap, and they need to update several cards in each.
I guess it comes down to priorities; they may also be giving themselves a headache by going with the premium large die straight off.