GPUs now deliver multi-teraflop performance, stew in more of their own heat than ever before, and are manufactured on ever-finer processes. I imagine that at some point a calculation error is bound to crop up from random physical fluctuations.
This doesn't matter too much in gaming; who cares if you get a mis-colored pixel every once in a while? However, it matters quite a bit if you're simulating hyperbolic dynamics, where small errors grow exponentially. Are hardware manufacturers implementing more stringent error correction in their GPUs now? I've heard of ECC memory in compute products, but are, say, error-correcting codes getting longer internally as well? And what's the best practice for modern simulation code to take the inevitable physical errors into account?
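To make the last question concrete, here is the kind of software-level redundancy I have in mind: run the same reduction twice and compare. This is a minimal sketch only (`checked_sum` is a made-up name, and on a deterministic CPU the two runs trivially agree; picture two independent GPU kernel launches instead):

```python
import math

def checked_sum(xs, rtol=1e-12):
    """Compute sum(xs) twice and compare the results, as a crude
    software-level check against transient hardware faults.
    Raises if the two independent runs disagree beyond rtol."""
    a = sum(xs)
    b = sum(xs)  # on a GPU this would be a second, independent kernel launch
    if not math.isclose(a, b, rel_tol=rtol):
        raise RuntimeError("redundant runs disagree; retry the computation")
    return a

print(checked_sum(range(1000)))  # → 499500
```

Is duplicate-and-compare (or checkpoint-and-restart) still the state of the art, or do people rely on the hardware now?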