I don't know how often you run into a job where the *code* takes up a huge chunk of a 256k block, since you're not going to put an entire *application* on either one all at once. The SPEs can put anything in their 256k local store, and you can DMA anything, including code, into that local store on demand; when you do, it's effectively the same hit as a cache miss. The PPE has its 32K L1 instruction cache and sees instruction cache misses all the same.
You might have fewer misses on the PPE than on an SPE, so if continuous feeds of data or the use of far function pointers is common in your codebase, you could have some issues. How hard that hits you performance-wise comes down to how early you issue a "prefetch" or its equivalent, as in the sketch below.
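Just to make the "how early" point concrete, here's a minimal data-streaming sketch assuming a GCC-style compiler on the PPE, where `__builtin_prefetch` is available. The `scale()` function and the prefetch distance are made up for illustration; the right distance depends on how long a miss takes relative to how much work each iteration does.

```c
#include <stddef.h>

/* Illustrative streaming loop: touch data a couple of 128-byte cache
 * lines ahead of where we're working, so the miss latency overlaps with
 * useful work instead of stalling the loop. The distance (64 floats =
 * two lines here) is a guess you'd tune for the actual workload. */
void scale(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; i++) {
        if (i + 64 < n)
            __builtin_prefetch(&src[i + 64]);
        dst[i] = src[i] * k;
    }
}
```

Issue the prefetch too late and you still eat most of the miss; issue it far enough ahead and the load is already done by the time you get there.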
If you've got a single job that needs some 300k of code (which is a hell of a lot), then you definitely need to break it up... but that's different from not being able to run all 300k of that code at all.
If you have a small block of code that works on 1 MB of data, that simply means going through the data in smaller chunks, fetching the next chunk over DMA while you're still working on the current one, so that you never have to wait, roughly like the sketch below. This is actually not that unusual; people did it on the PS2 VUs all the time, and you only had 4k/16k on them.
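A minimal double-buffering sketch of that idea on an SPE, assuming the Cell SDK's MFC intrinsics from spu_mfcio.h. The chunk size, buffer layout, and process_chunk() are made up for illustration, and it assumes the total size is a multiple of the chunk size.

```c
#include <spu_mfcio.h>
#include <stdint.h>

#define CHUNK_SIZE 16384   /* 16 KB, the largest single MFC transfer */

/* Two local-store buffers so one can be filled while the other is used. */
static char buf[2][CHUNK_SIZE] __attribute__((aligned(128)));

/* Hypothetical per-chunk work; real processing would go here. */
static void process_chunk(char *data, unsigned int size)
{
    (void)data;
    (void)size;
}

/* Stream total_size bytes from effective address ea through local store,
 * overlapping each DMA fetch with work on the previously fetched chunk. */
void stream_data(uint64_t ea, uint64_t total_size)
{
    unsigned int cur = 0;

    /* Kick off the first transfer on tag 0. */
    mfc_get(buf[cur], ea, CHUNK_SIZE, cur, 0, 0);

    for (uint64_t off = 0; off < total_size; off += CHUNK_SIZE) {
        unsigned int next = cur ^ 1;
        uint64_t next_off = off + CHUNK_SIZE;

        /* Start fetching the next chunk before touching the current one. */
        if (next_off < total_size)
            mfc_get(buf[next], ea + next_off, CHUNK_SIZE, next, 0, 0);

        /* Wait only on the current chunk's tag, then work on it while the
         * next transfer is still in flight. */
        mfc_write_tag_mask(1 << cur);
        mfc_read_tag_status_all();

        process_chunk(buf[cur], CHUNK_SIZE);

        cur = next;
    }
}
```

Results would go back out the same way with mfc_put on a separate tag, so the outgoing transfers also overlap the work rather than stalling it.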
Thank you SMM. :smile:
Interesting.