Judging from the technical description of the architecture, Navi 10 has very efficient shader execution. If performance did not match the theoretical output, the cause may lie outside the workgroups/CUs. In that case the main culprits would be rasterization/primitive culling, shader scheduling (ACEs/load distribution circuitry), limited datapaths, texture unit capability, or ROP capability. If there were indeed one or more such bottlenecks and they managed to solve them in this new iteration of the RDNA architecture, then it may show a big performance jump. There are a lot of "ifs" though.