That said, I don't know how important this kind of reliability is to the HPC crowd: in many cases (e.g. Google), it can be avoided through system-level redundancy or by detecting errors with a result sanity check at the end. But I suppose the knowledge that everything went fine in a commercial MRI machine may be worth the additional money... if that still qualifies as HPC, that is. I believe the ECC feature is going to matter more for smaller scale deployments such as engineering workstations, where an additional couple of thousand dollars doesn't really register against the overall cost of SW licenses and engineer salaries.
This makes sense to me. For smaller scale deployments, a few thousand dollars here or there won't be much of an issue given the convenience of the Tesla products. For a large scale deployment, things are totally different, and the cost savings could potentially be huge if one is careful about which option to go with.
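As an aside, the "result sanity check at the end" idea mentioned above can be very cheap relative to the run itself. Here's a minimal sketch of what that might look like, assuming the workload is a linear solve and that a 1e-8 relative-residual tolerance is acceptable (both are my assumptions, not anything specific to the systems discussed here):

```python
import numpy as np

# Stand-in for the real GPU computation: solve A x = b.
rng = np.random.default_rng(0)
n = 1024
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
x = np.linalg.solve(A, b)

# End-of-run sanity check: a silent memory error in A, b, or x typically
# shows up as a large residual. The 1e-8 tolerance is an assumed value and
# would need tuning for the problem's conditioning.
residual = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
if residual > 1e-8:
    raise RuntimeError(f"sanity check failed: relative residual {residual:.3e}")
```

It won't catch every class of error that ECC would, but for many workloads it flags the failures that matter.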
Based on the projected DP GFlops and retail pricing of the Tesla 2xxx systems (C2050: $2,499 for 520 DP GFlops; C2070: $3,999 for 630 DP GFlops; S2050: $12,995 for 2,080 DP GFlops; S2070: $18,995 for 2,520 DP GFlops), and assuming that the GTX 380 retails at $599 with 700 DP GFlops (retail price intentionally overestimated and GFlops intentionally underestimated), the math works out as follows (a short script reproducing it comes after the figures below):
Cost to reach 1 PetaFlop (Double Precision) with GPUs would be as follows:
C2050: $4.81 million
C2070: $6.35 million
S2050: $6.25 million
S2070: $7.54 million
GTX 380: $0.86 million
Cost to reach 1 ExaFlop (Double Precision) with GPUs would be as follows:
C2050: $4.81 billion
C2070: $6.35 billion
S2050: $6.25 billion
S2070: $7.54 billion
GTX 380: $856 million
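For anyone who wants to check or extend the arithmetic, here's a quick script using the prices and DP GFlops figures quoted above (1 PFlop = 10^6 GFlops, and an EFlop is 1,000x that):

```python
# Dollars per PFlop / EFlop of peak DP throughput, using the retail prices
# and projected DP GFlops quoted above (the GTX 380 figures are estimates).
systems = {
    # name:    (retail price in $, peak DP GFlops)
    "C2050":   (2499,    520),
    "C2070":   (3999,    630),
    "S2050":   (12995,  2080),
    "S2070":   (18995,  2520),
    "GTX 380": (599,     700),
}

GFLOPS_PER_PFLOP = 1e6  # 1 PFlop = 10^6 GFlops

for name, (price, gflops) in systems.items():
    dollars_per_pflop = price / gflops * GFLOPS_PER_PFLOP
    print(f"{name}: ${dollars_per_pflop / 1e6:.2f}M per PFlop, "
          f"${dollars_per_pflop * 1000 / 1e9:.2f}B per EFlop")
```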
What a huge difference in GPU cost for these large scale systems! I know this highly oversimplifies things: it ignores CPU Flops, it ignores the additional host-hardware costs that the C2050/C2070/GTX 380 cards incur but the rack-mountable S2050/S2070 systems don't, and it ignores performance and reliability differences due to the amount of RAM, ECC memory, driver tuning, etc. Still, it's pretty interesting.
The reality is that large scale HPC systems are, to some extent, driven by marketing: the race to world-record GFlop/TFlop/PFlop peak performance numbers (and the next big push over the next 5-10 years is for an ExaFlop system). For that goal, simply providing the most Flops per dollar goes a long way. But is this really the right approach?
I would hope that HPC systems of the future will put less focus on peak performance and more focus on real world performance per watt, real world performance per dollar, long-term reliability, and so on, in which case NVIDIA's products should do fine in this space given the effort they have put into the Fermi architecture.
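Just to make "performance per watt / per dollar" concrete as metrics, the same back-of-the-envelope approach extends naturally. The board-power numbers below are rough placeholders of my own, not vendor specs, and peak GFlops would ideally be replaced with measured application throughput:

```python
# Peak-DP GFlops per watt and per dollar. Board powers are assumed
# placeholder values for illustration only, not published TDPs.
systems = {
    # name:    (price $, DP GFlops, assumed board power in W)
    "C2050":   (2499, 520, 250),
    "GTX 380": (599,  700, 300),
}

for name, (price, gflops, watts) in systems.items():
    print(f"{name}: {gflops / watts:.2f} GFlops/W, "
          f"{gflops / price:.3f} GFlops/$")
```

Once electricity and cooling over a multi-year deployment are folded in, the per-watt column can matter as much as the sticker price.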
I'd say it is up to NVIDIA to demonstrate to their HPC customers that Tesla is the way to go: that the larger amount of RAM, ECC memory, and optimized drivers provide a real and tangible performance and reliability benefit which would make the above cost-per-PFlop/EFlop comparison moot. In the gaming market, we already know that real world gaming performance differences between NVIDIA and AMD cards do not correlate well with differences in peak SP Flops. Maybe the situation will be similar in the HPC market, so NVIDIA has to clearly show this as well.