If you connect 30 SPEs to a single main memory interface, you will hit the memory wall as well. It doesn't matter what kind of cores you are using if you have a single shared main memory that all those cores use (to read inputs and to write results). Local work memories (or caches) are useful for holding temporary structures, but you still have to move data to and from main memory at some point. If cache sizes have had to grow exponentially to keep up with just a few cores, I'm not entirely convinced that putting 30+ cores with smallish caches won't result in exactly the subpar performance the memory wall predicts.
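To put rough numbers on the shared-interface problem: if one memory interface is split evenly among all cores, each core's share shrinks as 1/N. The sketch below assumes perfectly even sharing and no contention overhead (real systems do worse), and the 25.6 GB/s total is just a hypothetical illustrative figure, not a measurement of any particular chip.

```python
def per_core_bandwidth(total_gb_s: float, cores: int) -> float:
    """Share of one memory interface each core gets, assuming an even split
    and no contention overhead (an optimistic, hypothetical model)."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return total_gb_s / cores


# Hypothetical total bandwidth of the single shared interface, for illustration only.
TOTAL_GB_S = 25.6

if __name__ == "__main__":
    for n in (1, 2, 4, 8, 30):
        print(f"{n:>2} cores -> {per_core_bandwidth(TOTAL_GB_S, n):.2f} GB/s each")
```

Going from a few cores to 30 cuts each core's share by an order of magnitude, which is why small local stores only delay the problem rather than remove it.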
SPEs can move data between each other without passing it through main memory, but so can general-purpose CPUs (through the shared L3 cache). Of course, the interconnect network between the cores will become the bottleneck if you scale the core count too far: more cores require a more complex interconnect, because you cannot just create a direct link from every core to every other core. A fully distributed memory system is the only way forward if we want to keep scaling up the core count in the future (to thousands of cores).
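The reason a direct core-to-core link for every pair doesn't scale is simple counting: a full point-to-point mesh of N cores needs N(N-1)/2 links, and each core needs N-1 ports. This is a minimal sketch of that arithmetic only; it ignores real topologies (rings, meshes, crossbars) and the function names are hypothetical.

```python
def full_mesh_links(cores: int) -> int:
    """Number of links needed so every core has a direct link to every other core."""
    if cores < 0:
        raise ValueError("cores must be non-negative")
    return cores * (cores - 1) // 2


def ports_per_core(cores: int) -> int:
    """Number of link endpoints each core needs in a full point-to-point mesh."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return cores - 1


if __name__ == "__main__":
    for n in (8, 30, 1000):
        print(f"{n:>4} cores: {full_mesh_links(n):>7} links, {ports_per_core(n):>4} ports per core")
```

Link count grows quadratically (28 links for 8 cores, but roughly half a million for 1000), so large many-core designs have to fall back on shared or multi-hop networks, which is where the bottleneck moves.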