I think they'd have to do more to a CU than just remove the caches to get something useful, and there are some nice elements to caches that the SPEs might have benefited from.
What would a CU have for data access without caches? Is this all going directly into the LDS, and is that LDS larger?
The latency for the LDS is likely in the range of 30 cycles, versus 7 for an SPE's local store in Cell.
The LDS doesn't serve instructions to the CU, and an SPE-like solution would have the same storage for instructions and data.
That shared local store was also, in some ways, a disadvantage for the SPEs. Tying the instruction stream to the same memory meant slightly longer latency for every LS access, and it meant that changes to a program's instructions could eat into the space available for data if total memory consumption rose.
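A rough illustration of that code-versus-data tradeoff, assuming the SPE's 256 KiB local store. The code, stack, and buffer sizes and the function names are hypothetical, chosen only to show the arithmetic:

```python
# Minimal sketch: a Cell SPE's 256 KiB local store holds code, stack, and data
# together, so every byte of code growth is a byte of data space lost.
LOCAL_STORE_BYTES = 256 * 1024


def data_budget(code_bytes: int, stack_bytes: int,
                ls_bytes: int = LOCAL_STORE_BYTES) -> int:
    """Return the bytes left for data after code and stack are placed."""
    remaining = ls_bytes - code_bytes - stack_bytes
    if remaining < 0:
        raise ValueError("code and stack alone exceed the local store")
    return remaining


def buffers_that_fit(code_bytes: int, stack_bytes: int,
                     buffer_bytes: int = 16 * 1024) -> int:
    """Count whole data buffers (16 KiB here, an illustrative size) that still fit."""
    return data_budget(code_bytes, stack_bytes) // buffer_bytes


if __name__ == "__main__":
    KIB = 1024
    # Hypothetical program: 64 KiB of code and a 16 KiB stack.
    print(data_budget(64 * KIB, 16 * KIB) // KIB, buffers_that_fit(64 * KIB, 16 * KIB))
    # The same program after its code grows by 16 KiB.
    print(data_budget(80 * KIB, 16 * KIB) // KIB, buffers_that_fit(80 * KIB, 16 * KIB))
```

With these made-up numbers, 16 KiB of extra code drops the data space from 176 KiB to 160 KiB, costing one whole buffer, even though nothing about the data side of the program changed.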
I wonder whether there are hardware units on the side for CODEC work, and presumably a DMA engine feeding whatever the CU is using for memory.