To OP: I'd say Nvidia's architecture is a set of wide SIMD cores. You could call it implicitly SIMD and that'd be approximately correct too.
In the GF104 and newer, it's actually a multi-issue core where each instruction is a SIMD one. You might call it SuperSIMD, to be amusing. The key point, though, is that each instruction must be SIMD.
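To make the "implicitly SIMD" idea concrete, here's a minimal CUDA sketch of my own (the kernel and names are illustrative, not from anyone's shipping code). Each thread's program looks scalar, but the hardware issues every instruction across a whole 32-thread warp, so the SIMD is implicit in the execution model rather than written in the source:

#include <cstdio>

// Each thread runs what looks like scalar code, but the hardware
// executes every instruction for a full 32-thread warp at once:
// the SIMD is implicit in the execution model, not in the source.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                    // threads past the end are masked off
        y[i] = a * x[i] + y[i];
}

int main()
{
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 threads per block = 8 warps; each warp is, in effect, a SIMD unit.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 4.0
    cudaFree(x); cudaFree(y);
    return 0;
}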
In contrast, I'd call ATI's architecture a SIMD core, or perhaps SVMD: Single VLIW, Multiple Data.
In contrast, I think about CPU cores somewhat differently. They are usually out-of-order execution (OOOE) cores with SIMD extensions, but the key distinction is that the SIMD is genuinely optional: you can write purely scalar code, you don't have any branch granularity issues, and so on.
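A sketch of the branch granularity point, again my own illustration rather than anything from a real workload: in the CUDA kernel below, threads of one warp that take different sides of a branch force the warp to execute both paths with masking, whereas a scalar OOOE CPU thread only ever pays for the path it actually takes.

#include <cstdio>

// When threads of one 32-thread warp disagree on a branch, the warp
// executes both paths, masking inactive threads each time. A scalar
// CPU thread would only pay for the path it actually takes.
__global__ void divergent(const int *flag, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (flag[i])                  // per-thread condition
        out[i] = out[i] * 2.0f;   // path A
    else
        out[i] = out[i] + 1.0f;   // path B
}

int main()
{
    const int n = 1024;
    int *flag; float *out;
    cudaMallocManaged(&flag, n * sizeof(int));
    cudaMallocManaged(&out,  n * sizeof(float));
    // Alternating flags: the worst case, where every warp runs both paths.
    for (int i = 0; i < n; ++i) { flag[i] = i & 1; out[i] = (float)i; }

    divergent<<<(n + 255) / 256, 256>>>(flag, out, n);
    cudaDeviceSynchronize();
    printf("out[0]=%g out[1]=%g\n", out[0], out[1]);  // expect 1 and 2
    cudaFree(flag); cudaFree(out);
    return 0;
}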
RE: Shared memory vs. MPI, there are a lot of reasons to use shared memory, e.g. you don't know MPI, you don't like it, or your algorithms aren't amenable to it. There are a few folks who develop algorithms on giant shared memory systems and later deploy them via MPI. There are also quite a few who run MPI on shared memory machines (e.g. an SGI Altix). Your message passing is really fast if it goes through memory : )
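For what it's worth, here's a minimal ping-pong sketch of the "MPI over shared memory" case (plain C-style host code; whether the messages actually go through a shared memory transport depends on the MPI implementation, but launching both ranks on one node, e.g. mpirun -np 2, typically gets you that path):

#include <mpi.h>
#include <stdio.h>

/* Minimal ping-pong: with both ranks on one node, most MPI
 * implementations route these messages through shared memory,
 * which is why message passing "through memory" is so fast. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int msg = 0;
    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 0 got back %d\n", msg);  /* expect 43 */
    } else if (rank == 1) {
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        msg += 1;
        MPI_Send(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}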
DK