Well, let's see, just for fun: 2005 (let's say the beginning of 2005, otherwise I'd have to mention dual-cores...). If you ran a near-high-end CPU, that would be something like a 2.2GHz A64 (on socket 939...). A Z3770, let alone a Z3740, generally can't outperform that (per core only, of course...). The Z3770 would be somewhat close, though.
A Radeon 9800 would have been quite old in 2005, so let's use an X800 instead. That would be 12x4 MAD (fp24) + 6x4 MAD (fp32) per clock (at a 400MHz clock). That's something like 30 gflops, which is just a bit below the Bay Trail chips (roughly 40 gflops assuming max gpu clock), though obviously the Bay Trail flops should be far more flexible. TMU-wise the X800 has a factor of 2 advantage (taking clocks into account), and it also has something like 3 times the ROP capability. Memory bandwidth would also be higher on the X800, though not massively so: 22GB/s vs. 17GB/s.
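For anyone who wants to redo the X800 arithmetic, here is a minimal sketch. It assumes each MAD lane is counted once per clock, which is my assumption about the counting convention (counting a MAD as two flops would double the result); the variable and function names are made up for illustration.

```python
# Rough X800 shader throughput from the unit counts quoted above.
# Assumption: each MAD lane counts as ONE operation per clock.
# Counting a MAD as two flops (multiply + add) would double the result.
FP24_MAD_LANES = 12 * 4   # 12x4 MAD (fp24) per clock
FP32_MAD_LANES = 6 * 4    # 6x4 MAD (fp32) per clock
CLOCK_GHZ = 0.4           # 400MHz


def mad_rate_g_per_s(lanes, clock_ghz):
    """Return billions of MAD operations per second."""
    return lanes * clock_ghz


x800_lanes = FP24_MAD_LANES + FP32_MAD_LANES
x800_gmads = mad_rate_g_per_s(x800_lanes, CLOCK_GHZ)
print(f"{x800_lanes} MAD lanes/clk -> {x800_gmads:.1f} G MAD/s")
```

Under that convention the total lands close to the "something like 30" quoted above.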
So you are quite right that even compared with Bay Trail, which has comparatively high bandwidth / flop, the X800 still had quite a bit more bandwidth / flop.
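To put a rough number on the bandwidth / flop comparison, here is a small sketch that just divides the approximate figures quoted above (22GB/s and ~30 gflops for the X800, 17GB/s and ~40 gflops for Bay Trail). The dictionary layout and function name are illustrative only, and the inputs are ballpark values, so treat the output as ballpark too.

```python
# Bandwidth per flop, using the approximate figures quoted in this post.
# These are ballpark inputs, not measured values.
chips = {
    "X800": {"bandwidth_gb_s": 22.0, "gflops": 30.0},
    "Bay Trail": {"bandwidth_gb_s": 17.0, "gflops": 40.0},
}


def bytes_per_flop(bandwidth_gb_s, gflops):
    """GB/s divided by gflops gives bytes of bandwidth per flop."""
    return bandwidth_gb_s / gflops


ratios = {name: bytes_per_flop(**spec) for name, spec in chips.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f} bytes/flop")
```

With these inputs the X800 comes out at roughly 0.7 bytes/flop against roughly 0.4 for Bay Trail.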
(FWIW it's not _quite_ true that a Haswell GT3 has 10 times the EUs but 1.5 times the bandwidth. It potentially has more than 10 times the ALU capacity (as it has a higher max gpu clock), but it also has access to the LLC, which of course helps some with bandwidth issues, though I can't put a number on that.)