Bandwidth per core isn't the issue; what matters is bandwidth per execution unit. Sixteen scalar FMADD units on sixteen cores at 2 GHz aren't going to need the bandwidth of four 128-bit vector units at 3 GHz on two cores, because the vector setup can complete more operations per second and so has to be fed more data. Any architecture needs to balance memory performance against processing potential. The choice between many small cores and fewer large cores should be made solely on which one better fits the code the devs will be running. If future algorithms are easily parallelisable and you can get more peak performance with more cores, then go for it. If the algorithms will be better off with stronger cores, then use fewer of them. Whichever processor architecture you go with, couple it to a suitable RAM solution so it isn't gimped.
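To put rough numbers on the comparison, here's a back-of-the-envelope sketch. It assumes single-precision floats (so a 128-bit unit has 4 lanes), that the four vector units are the total across both cores, that every unit issues one FMADD per lane per cycle, and a purely hypothetical workload that pulls 4 bytes from RAM per operation. None of those figures come from a real chip; they're only there to show that bandwidth demand follows execution throughput, not core count.

```python
def lane_ops_per_second(units, lanes_per_unit, clock_hz):
    """Peak element operations per second, assuming one FMADD per lane per cycle."""
    return units * lanes_per_unit * clock_hz


def required_bandwidth_gbs(ops_per_s, bytes_per_op):
    """RAM bandwidth in GB/s needed to keep the units fed, given how many
    bytes each operation pulls from memory (a hypothetical workload figure)."""
    return ops_per_s * bytes_per_op / 1e9


# Assumption: single precision, so a 128-bit unit holds 128 / 32 = 4 lanes.
FLOAT32_LANES_128BIT = 128 // 32

# 16 cores, one scalar FMADD unit each, at 2 GHz.
many_small = lane_ops_per_second(units=16, lanes_per_unit=1, clock_hz=2e9)
# 2 cores with 4 x 128-bit vector units in total, at 3 GHz.
few_big = lane_ops_per_second(units=4, lanes_per_unit=FLOAT32_LANES_128BIT, clock_hz=3e9)

# Assumption: each operation needs one new float32 (4 bytes) from RAM.
BYTES_PER_OP = 4

for name, ops in [("16 scalar cores @ 2 GHz", many_small),
                  ("2 cores, 4x128-bit units @ 3 GHz", few_big)]:
    print(f"{name}: {ops / 1e9:.0f} G FMADD/s, "
          f"{required_bandwidth_gbs(ops, BYTES_PER_OP):.0f} GB/s")
```

Under these assumptions the sixteen-core part peaks at 32 G FMADD/s and the two-core vector part at 48 G FMADD/s, so the latter needs about 1.5x the memory bandwidth despite having an eighth of the cores. Change `BYTES_PER_OP` to match the real algorithm's data reuse and the absolute numbers shift, but the ratio stays tied to execution throughput.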