It's not the same thing but not that different.
aaaaa00 said: Not the same thing.
I don't get your point. You'll have a prefetch queue anyway; the faster you fill it the better, IMO.
aaaaa00 said: Even though modern general purpose CPUs require alignment for efficient access, this alignment requirement is the same size as the machine word.
aaaaa00 said: This is not true of the SPE. Its memory alignment requirement is 4x that of the machine word (there are no scalar 16-byte operations), which is evidence in favor of the SPE's lineage being a vector oriented co-processor, and not a general purpose CPU.
You think so? Here's the background of the 16 byte load/store straight from the horse himself (Peter Hofstee).
If you wonder why only 16B loads and stores? ... One reason is latency (unaligned loads or smaller quantities require extra muxing stages). For stores we have to compute an ECC (error correction code) to be stored with the data. These codes are typically calculated over larger fields (16 bytes in our case) to limit the overhead in the SRAM arrays. Writing a quantity less than 16B would therefore require a read, modify (combine new and old data for new ECC), write operation on the array. In true RISC fashion we felt that it would be better to do a load and a store and allow compiler optimization than hide what is really going on.
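To make the read-modify-write point concrete, here's a rough sketch in plain C of what a 4-byte store looks like when the only memory operations available are 16-byte loads and stores. This is not SPU code or the SPU intrinsics: `local_store`, `qword_t`, `load_qword`, `store_qword`, `store_u32` and `load_u32` are all names I made up for illustration, the 256-byte array just stands in for local store, and the ECC step itself is not modeled. It assumes the scalar is 4-byte aligned so it never straddles two quadwords.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QWORD 16u                       /* bytes per load/store (assumed) */

typedef struct { uint8_t b[QWORD]; } qword_t;   /* hypothetical 16-byte register */

static uint8_t local_store[256];        /* stand-in for SPE local store */

/* The only "hardware" accesses: the low 4 address bits are ignored,
   so every access moves one aligned 16-byte quadword. */
static qword_t load_qword(uint32_t addr) {
    qword_t q;
    addr &= ~(QWORD - 1u);
    assert(addr + QWORD <= sizeof local_store);
    memcpy(q.b, &local_store[addr], QWORD);
    return q;
}

static void store_qword(uint32_t addr, qword_t q) {
    addr &= ~(QWORD - 1u);
    assert(addr + QWORD <= sizeof local_store);
    memcpy(&local_store[addr], q.b, QWORD);
}

/* A scalar store done explicitly in software:
   load the enclosing quadword, merge the new bytes, store it back. */
static void store_u32(uint32_t addr, uint32_t value) {
    assert((addr & 3u) == 0);           /* word must not straddle quadwords */
    qword_t q = load_qword(addr);                              /* read   */
    memcpy(&q.b[addr & (QWORD - 1u)], &value, sizeof value);   /* modify */
    store_qword(addr, q);                                      /* write  */
}

/* A scalar load: fetch the quadword, then pick out the wanted bytes. */
static uint32_t load_u32(uint32_t addr) {
    assert((addr & 3u) == 0);
    qword_t q = load_qword(addr);
    uint32_t value;
    memcpy(&value, &q.b[addr & (QWORD - 1u)], sizeof value);
    return value;
}
```

With the load and the store written out like this, a compiler can see both halves. If you write several scalars into the same quadword, it can do one load, several merges and one store, which it couldn't do if the hardware hid the read-modify-write behind a narrow store instruction.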