Fafalada said: Not sure eDram would be a great solution for that though - having to manually manage another layer of memory on top of the local stores would complicate things a fair bit more.
Gubbi said: Or maybe it just went on a die diet. Who knows?
With 288 GFLOPS peak performance, CELL is fairly certain to be memory starved.
Which is another reason why I prefer demand-loaded caches in the first place.
Fafalada said: At any rate - I would argue that eDram bandwidth would come in more handy on the GPU side, and at least that's an area that, even if it were managed by hand, is familiar to lots of people already.
The primary function of a big chunk of eDram would be to lower average memory latency; increased bandwidth is secondary.
Main XDR memory will be 200-400 cycles away; at two instructions issued per cycle, that's 400-800 instructions. Anything that doesn't prefetch like hell will stall all the time. Even if you vertically multithread your code, you'll be limited by the maximum of 16 outstanding memory (DMA) transactions (25-50 instructions/transaction/thread), and that is without contention: you have 8 other guys competing for the same memory channel.
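For anyone who wants to check the arithmetic, here is a rough sketch of where the 400-800 and 25-50 figures come from. The function name and the issue width of 2 are my own assumptions (the 2 is what the 200-cycle to 400-instruction conversion implies); the 200-400 cycle latency and the limit of 16 outstanding transactions are the estimates from above, not measured numbers.

```python
def latency_budget(latency_cycles, issue_width=2, max_outstanding=16):
    """Back-of-envelope latency cost, in instructions.

    issue_width=2 is an assumption implied by the figures in the post;
    max_outstanding=16 is the outstanding DMA transaction limit quoted there.
    """
    # Issue slots that go by while a single memory access is in flight.
    instructions_per_miss = latency_cycles * issue_width
    # Useful work needed per in-flight transaction to keep the queue
    # full and hide the latency completely.
    per_transaction = instructions_per_miss / max_outstanding
    return instructions_per_miss, per_transaction


for cycles in (200, 400):
    total, per_txn = latency_budget(cycles)
    print(f"{cycles} cycles: {total} instructions, "
          f"{per_txn:g} per outstanding transaction")
```

This ignores contention on the memory channel entirely, so treat the per-transaction figure as a best case.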
Cheers
Gubbi