Running a Cypress system would be an interesting experiment. I wonder whether even a laboratory's HPC programmers would have the stomach to hack the assembly to get decent compiled code out of it.
The payoffs are potentially very good, though. With a bit of hacking, Cypress can do a little over 2 TFLOPS on single-precision matrix multiply and about 500 GFLOPS in double precision.
Well, if the National Supercomputing Center in Tianjin/NUDT can be #5 in the Top500 using RV770s, it has to be doable...
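
For a sense of scale on the Cypress numbers above, here is a rough back-of-the-envelope sketch comparing them to theoretical peak. The hardware figures (1600 stream processors, 850 MHz, one multiply-add per ALU per cycle, double precision at one fifth the single-precision rate) are my assumptions for a Radeon HD 5870, not something stated above, and the function name is just for illustration.

```python
# Assumed Radeon HD 5870 (Cypress) specs; these are not from the post above.
STREAM_PROCESSORS = 1600
CLOCK_MHZ = 850
FLOPS_PER_CYCLE = 2   # assumes one multiply-add per ALU per cycle
DP_DIVISOR = 5        # assumes DP runs at one fifth the SP rate


def peak_gflops(alus, clock_mhz, flops_per_cycle):
    """Theoretical peak in GFLOPS: ALUs x clock x flops per cycle."""
    return alus * clock_mhz * flops_per_cycle / 1000


sp_peak = peak_gflops(STREAM_PROCESSORS, CLOCK_MHZ, FLOPS_PER_CYCLE)
dp_peak = sp_peak / DP_DIVISOR

# Figures quoted in the post: ~2 TFLOPS SP matrix multiply, ~500 GFLOPS DP.
sp_fraction = 2000 / sp_peak
dp_fraction = 500 / dp_peak

print(f"SP peak {sp_peak:.0f} GFLOPS, quoted figure is {sp_fraction:.0%} of peak")
print(f"DP peak {dp_peak:.0f} GFLOPS, quoted figure is {dp_fraction:.0%} of peak")
```

If those assumed specs are right, the quoted figures sit at a large fraction of theoretical peak, which is why hand-tuning looks worth the pain.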