AMD announces SSE5 instructions

I specifically excluded SSE5 from the point I made.

SSE5 is going to have to fight its own battle.

My main point is that after SSE5, when AMD puts a streaming/data-parallel core(s) in the CPU - that's going to be driver-dependent. That's my interpretation, anyway.

But the contrast between AMD's streaming/data-parallel pipes and Intel's mini-cores implies that x86 throughput computing is going to be proprietary. AMD plans to use drivers, it seems, and doesn't look like it wants to follow Intel down the Larrabee road.

I dunno if Larrabee will need a driver as such (for non-GPU-specific computing, i.e. when operating as a grid of "x86" cores) - each revision to SSE seems tantamount to a "driver" revision in itself. Whether you actually install a piece of software or not, you've got the same problematic fragmentation in functionality (giving programmers headaches).
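
Just to make the per-revision headache concrete, here's a minimal sketch of the dispatch dance programmers end up writing, using GCC/Clang's __builtin_cpu_supports (CPUID under the hood); the paths being dispatched to are placeholders, not anything from AMD or Intel:

#include <cstdio>

// Every SSE revision means another capability check and another code path
// to write, test and maintain - "driver" fragmentation without the driver.
void pick_kernel() {
    if (__builtin_cpu_supports("sse4.1")) {
        std::puts("SSE4.1 path");
    } else if (__builtin_cpu_supports("ssse3")) {
        std::puts("SSSE3 path");
    } else if (__builtin_cpu_supports("sse3")) {
        std::puts("SSE3 path");
    } else {
        std::puts("plain SSE2 fallback");
    }
}

int main() {
    __builtin_cpu_init();   // harmless here; only strictly needed in very early code
    pick_kernel();
    return 0;
}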

Clearly the lightest possible drivers are most desirable.

I'm not defending them, by the way. I was sorta hoping that some kind of x86 data-parallel instruction set would arise which would be natively executed on the "GPU" - this extension wouldn't try to fork off instructions onto the "GPU" piecemeal, as they arise, but would launch entire clauses of code (10s, 100s, 1000s of instructions) for 1000s-to-billions of threads, being data-centric not thread-centric.
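
To be clear about the shape of what I mean - purely hypothetical; launch_clause, Domain and the kernel signature below are made up for illustration, nothing like this exists in any shipping x86 extension:

#include <cstddef>

struct Domain { std::size_t begin, end; };   // 1000s-to-billions of elements

// The "clause": tens-to-hundreds of instructions, run once per data element.
void saxpy_clause(float* y, const float* x, float a, std::size_t i) {
    y[i] = a * x[i] + y[i];
}

// Hypothetical bulk launch: the runtime/driver decides where this executes;
// here it just runs serially on the CPU as a stand-in.
template <typename Clause, typename... Args>
void launch_clause(Domain d, Clause clause, Args... args) {
    for (std::size_t i = d.begin; i < d.end; ++i)
        clause(args..., i);
}

void run(float* y, const float* x, float a, std::size_t n) {
    // One launch covers the whole data domain - no per-instruction forking.
    launch_clause(Domain{0, n}, saxpy_clause, y, x, a);
}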

But the noises AMD is making make it sound more coarse-grained...

Jawed
 
I specifically excluded SSE5 from the point I made.
Okay, but I did not see anything in your previous post to that effect.
From the standpoint of ISA support, the driver issue seems almost orthogonal.

I dunno if Larrabee will need a driver as such (for non-GPU-specific computing, i.e. when operating as a grid of "x86" cores) - each revision to SSE seems tantamount to a "driver" revision in itself. Whether you actually install a piece of software or not, you've got the same problematic fragmentation in functionality (giving programmers headaches).
But at least extensions don't affect previous core revisions.
Driver updates can FUBAR any chips that they are installed over.
Just one of the bugs that GPU makers routinely get away with in games would get AMD crucified, because nobody takes the kind of crap from their CPU that they put up with from their GPU.

I'm not defending them, by the way. I was sorta hoping that some kind of x86 data-parallel instruction set would arise which would be natively executed on the "GPU" - this extension wouldn't try to fork off instructions onto the "GPU" piecemeal, as they arise, but would launch entire clauses of code (10s, 100s, 1000s of instructions) for 1000s-to-billions of threads, being data-centric not thread-centric.
The piecemeal approach sounds closer to a more incremental form of Fusion that falls short of full integration, where the CPU "steals" or arbitrates for GPU units.

edit:
Actually, given AMD's horrid luck with anything revolutionary, a piecemeal approach that builds up to a CPU hosting GPU-type units makes sense.
 
The piecemeal approach sounds closer to a more incremental form of Fusion that falls short of full integration, where the CPU "steals" or arbitrates for GPU units.
I was using "piecemeal" to mean "a single thread of a few SIMD-type instructions" or "a limited number of threads that execute the same short piece of code". The cost of moving such piecemeal contexts onto a "GPU" is simply too high, no matter how tightly integrated the "GPU".
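
Rough, assumed numbers only, but they show the shape of the problem: a handful of SIMD ops is ~10 ns of work on a ~3 GHz CPU, while a round trip to a discrete GPU over PCI Express is on the order of microseconds, so the move costs orders of magnitude more than the work itself:

#include <cstdio>

int main() {
    const double work_ns      = 10.0;     // ~10 SIMD ops on the CPU (assumed)
    const double roundtrip_ns = 5000.0;   // launch + copy + result back (assumed)
    std::printf("offload overhead ~ %.0fx the work itself\n",
                roundtrip_ns / work_ns);  // ~500x: the context move dominates
    return 0;
}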

As far as I can tell from the CAL stuff, for example, the same code can run on the CPU or GPU (much like CUDA) - it's effectively a "runtime" switch. So I imagine this'll end up as heuristics in the driver for a particular CPU that decide whether a context justifies being switched over to the "GPU" or whether the code should stay on the CPU, running "natively". That will depend on the capability of the "GPU"(s) in the CPU as well as the number of attached GPUs (via PCI Express, say), plus bandwidths, etc.
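
Something like this, I'd guess - not a description of CAL or CUDA internals, and every name and number below (DeviceCaps, worth_offloading) is invented for illustration:

#include <cstddef>

struct DeviceCaps {
    double gpu_gflops;         // throughput of the integrated/attached "GPU"
    double link_gbytes_per_s;  // PCI Express (or on-die) bandwidth to it
    double launch_overhead_s;  // fixed cost of switching a context over
};

// Crude runtime heuristic: offload only if the GPU finishes sooner, counting
// the launch overhead and the data that has to cross the link.
bool worth_offloading(std::size_t n, double flops_per_elem, double bytes_per_elem,
                      const DeviceCaps& caps, double cpu_gflops) {
    double cpu_time = n * flops_per_elem / (cpu_gflops * 1e9);
    double gpu_time = caps.launch_overhead_s
                    + n * bytes_per_elem / (caps.link_gbytes_per_s * 1e9)
                    + n * flops_per_elem / (caps.gpu_gflops * 1e9);
    return gpu_time < cpu_time;
}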

Future versions of D3D10 make the GPU more tightly slaved to the CPU, as far as I can tell (context switching, virtual memory), normalising the GPU as a compute resource and hiding a multitude of them from the developer. This doesn't sound hugely different from what CAL will do across an array of CPUs and GPUs - regardless of the physical coupling.

Jawed
 
That sounds like the expected outcome of AMD's backing off from fully heterogeneous multicore solutions.

They are unwilling to sacrifice general-purpose x86 capability across large swaths of silicon real estate, probably fearing being pilloried in the benchmark game by getting caught in the Cell processor situation, where legacy code only runs on the PPE.

Instead they will try to make a collection of x86 cores where some will likely suck at running most x86 code, but will still be around to chip in some amount of performance.

The driver would be the necessary glue (kludge?) between hardware, the OS scheduler, and perhaps specially compiled apps, to make this workable.
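
As a very rough sketch of what "glue between hardware and OS scheduler" could mean in practice - this just uses plain Linux affinity calls to keep a throughput thread on a chosen set of cores; which core IDs count as the "weak" ones is an assumption the real driver/OS would have to supply:

#ifndef _GNU_SOURCE
#define _GNU_SOURCE            // for CPU_SET / sched_setaffinity on glibc
#endif
#include <sched.h>
#include <cstdio>

// Pin the calling thread to a set of cores. The mapping from core ID to
// "throughput core" is exactly the sort of thing the driver glue would expose.
bool pin_to_cores(const int* cores, int count) {
    cpu_set_t mask;
    CPU_ZERO(&mask);
    for (int i = 0; i < count; ++i)
        CPU_SET(cores[i], &mask);
    return sched_setaffinity(0, sizeof(mask), &mask) == 0;  // 0 = this thread
}

int main() {
    const int throughput_cores[] = {2, 3};   // assumed IDs of the "small" cores
    if (!pin_to_cores(throughput_cores, 2))
        std::perror("sched_setaffinity");
    // ... run the specially compiled, throughput-oriented work here ...
    return 0;
}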

The unfortunate side effect of maintaining this through x86 is, as you mentioned, that all the context and semantics slathered onto every operation in x86 are going to slow the cores compared to those in a truly heterogeneous chip.
 