Well, it's 180/4 = 45 TFLOPS per ASIC, which is very poor performance, in my opinion, for dedicated silicon. The important point from the source article:
GV100 is 120 TFLOPS per GPU (960 TFLOPS in an HGX rack) and can also be used for HPC (strong FP64) and other, more demanding workloads (thanks to the new independent thread scheduler).
Right, the article title "Machine-learning ASIC doubles performance" is a bit misleading, as TPU1 already did 90 TOPS of 8-bit inferencing. In that respect it indeed looks rather poor.
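For anyone who wants the arithmetic spelled out, here is a minimal back-of-the-envelope sketch using only the figures quoted in this thread (180 TFLOPS per TPU2 board across 4 ASICs, 120 TFLOPS per GV100, 90 TOPS for TPU1). Note the caveat in the last comparison: TPU1's 90 TOPS is 8-bit integer throughput while the TPU2 figure is floating point, so that ratio mixes precisions and is only a rough indication.

```python
# Back-of-the-envelope comparison of the peak throughput numbers
# quoted in this thread. Units differ in precision: the TFLOPS
# figures are floating point, TPU1's TOPS figure is 8-bit integer.

tpu2_board_tflops = 180      # TPU2 board total, as quoted above
tpu2_chips_per_board = 4     # ASICs per board
gv100_tensor_tflops = 120    # GV100 per-GPU tensor throughput
tpu1_int8_tops = 90          # TPU1 8-bit inference throughput

tpu2_per_chip = tpu2_board_tflops / tpu2_chips_per_board
print(f"TPU2 per ASIC: {tpu2_per_chip:.0f} TFLOPS")        # 45 TFLOPS

# GV100 vs. a single TPU2 ASIC (same floating-point basis)
print(f"GV100 / TPU2 ASIC: {gv100_tensor_tflops / tpu2_per_chip:.2f}x")  # 2.67x

# TPU2 ASIC vs. TPU1 -- apples-to-oranges (float vs. int8), but this
# is the comparison the "doubles performance" headline invites
print(f"TPU2 ASIC / TPU1: {tpu2_per_chip / tpu1_int8_tops:.2f}x")        # 0.50x
```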