[Beyond3D Article] Intel presentation reveals the future of the CPU-GPU war

Arun

The war is on. The question is, are you ready?
Read on for the rest of the presentation, as well as our short analysis of Intel's proposed architecture.

Back in February we reported that Intel's Douglas Carmean, new Chief Architect of their Visual Computing Group (VCG) in charge of GPU development at Intel, had been touring universities giving a presentation called "Future CPU Architectures -- The Shift from Traditional Models". Since then he's added a few more major university stops, and now the feared B3D ninjas have caught up with him. Our shadow warriors have scored a copy of Carmean's presentation, and we've selected the juicy bits for your enjoyment and edification regarding the showdown that Intel sees as already underway between CPU and GPU makers.
 
I don't think there's enough bandwidth on a CPU for this to be even considered for a Fusion-style part, but the fact that they never mentioned raytracing or rasterization specifically makes me wonder what market they're targeting immediately. Niagara has shown that it's possible for architectures like these to do well, so my first thought is that this is going to be something along the lines of a ClearSpeed coprocessor. Will it have a DVI output? I hope so, sure, but I think things really depend on what the fixed function units are.
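To put a rough number on that bandwidth worry, here's a quick back-of-envelope sketch in C. The figures are approximate, era-typical ones (an 8800 GTX-class board versus a dual-channel DDR2-800 desktop socket), picked purely for illustration:

#include <stdio.h>

/* Back-of-envelope bytes-per-FLOP comparison. All numbers are rough,
 * era-typical figures chosen for illustration only. */
int main(void)
{
    const double gpu_bw_gbs = 86.4;   /* 8800 GTX-class: 384-bit GDDR3 @ 900 MHz */
    const double gpu_gflops = 345.0;  /* 128 SPs * 1.35 GHz * 2 (MAD)            */
    const double cpu_bw_gbs = 12.8;   /* dual-channel DDR2-800 socket            */

    printf("GPU: %.2f bytes of bandwidth per FLOP\n", gpu_bw_gbs / gpu_gflops);
    printf("The CPU socket has ~%.0f%% of the GPU's bandwidth, so feeding a\n"
           "Fusion-style throughput part from it would be the hard problem.\n",
           100.0 * cpu_bw_gbs / gpu_bw_gbs);
    return 0;
}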
 
Well, as we suggested at one point in the article, we know he's been giving some form of this (obviously updated, though, since he has G80 and R600 references) since at least October of 2005. It would be interesting to know 1) Larrabee's birthday and 2) the date Carmean really became Chief Architect at VCG, rather than the date they began to admit to it in public.

It remains the case, as you suggest, that Intel has multiple threads running around, and it is not terribly clear yet how they all relate in time and which niches they address. Frustrating as hell at times, but we keep picking at the problem, and surely enough pieces will eventually come to light to make the whole puzzle make sense.
 
It's sorta comical how in the slides (19 onwards) the CPU and GPU start out "equal" in the dense linear algebra section of the "tug of war" X-axis.

If that's their current view (and not based on GPUs from 2004) then we're in for a bit of a farce :mrgreen:

It's also interesting that no comparison with Cell was made. Cell is, effectively, logically equivalent to:

[slide image: Image15-big.jpg]

minus the "fixed function units". The SPEs' LS and their DMA engines provide programmer-managed "arbitrarily sized and replicated cache".
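For the uninitiated, the "programmer-managed cache" bit looks roughly like this on the SPE side. This is a minimal sketch assuming the Cell SDK's spu_mfcio.h intrinsics, simplified and without double-buffering:

#include <spu_mfcio.h>

/* Minimal SPE-side pattern: explicitly DMA a tile of main memory into the
 * 256KB local store, wait for it, then compute on it. A real kernel would
 * double-buffer, issuing the next mfc_get before processing the current tile. */
static volatile char tile[16384] __attribute__((aligned(128)));

int main(unsigned long long spe_id, unsigned long long argp, unsigned long long envp)
{
    const unsigned int tag = 0;
    unsigned long long ea = argp;   /* effective address handed over by the PPE */

    /* local-store address, effective address, size, tag, tid, rid */
    mfc_get(tile, ea, sizeof(tile), tag, 0, 0);

    /* Block until every transfer tagged 'tag' has completed. */
    mfc_write_tag_mask(1 << tag);
    mfc_read_tag_status_all();

    /* ... work on tile[] here ... */
    return 0;
}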

Obviously Cell makes life a bit difficult for the programmer and is only single-threaded per core. And the LS is only 256KB. But 32 SPEs are less than 700M transistors. On 65nm they'd take about 250mm² (not allowing for the scaling-up that the EIB would require).

Jawed
 
I think it all boils down to Intel finding a way to spend the transistor budgets afforded to them by Moore's Law.
 
I think it all boils down to Intel finding a way to spend the transistor budgets afforded to them by Moore's Law.
Well, you could certainly argue that Moore's Law is playing in favour of latency-tolerant cores, as well as in favour of throughput cores, but against serial-code-oriented cores.

And on the subject of throughput cores...
http://www-csl.csres.utexas.edu/use...ics_Arch_Tutorial_Micro2004_BillMarkParts.pdf - Slide 85... ;)
(that slide likely wasn't written by Moreton, but still - the guy is the tessellation expert at NV, among other things, you'd expect him to be in favour of throughput cores ffs!)
 
Personally, my little brain thinks Intel should give up on graphics chips and instead find out what makes GPUs so good at certain types of computing, and add that functionality to their CPUs.
 
Personally, my little brain thinks Intel should give up on graphics chips and instead find out what makes GPUs so good at certain types of computing, and add that functionality to their CPUs.
There's no reason this couldn't be added to a CPU as a Fusion-type thing, but it depends a lot on how much memory bandwidth the CPU has.
 
Well, you could certainly argue that Moore's Law is playing in favour of latency-tolerant cores, as well as in favour of throughput cores, but against serial-code-oriented cores.

I guess it depends on the time frame. Short term (current), for serial-code-oriented cores, the most effective use would be to add more cores. Four years from now, assuming a quadrupling of transistors, you are faced with the same question. Will doubling my cores be the most beneficial way to use the budget? IMHO, in the consumer space, after four cores the benefits reduce dramatically. At that point it would be better to use the transistors on adding more GPU-like elements.
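One rough way to see that "diminishing returns after four cores" argument is plain Amdahl's law; the parallel fraction below is just an assumed, illustrative figure for consumer code, not a measurement:

#include <stdio.h>

/* Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n), where p is the fraction
 * of the work that parallelises. p = 0.80 is an assumption for "typical"
 * consumer code. */
int main(void)
{
    const double p = 0.80;
    const int cores[] = { 1, 2, 4, 8, 16 };

    for (int i = 0; i < (int)(sizeof(cores) / sizeof(cores[0])); i++) {
        double speedup = 1.0 / ((1.0 - p) + p / cores[i]);
        printf("%2d cores -> %.2fx\n", cores[i], speedup);
    }
    return 0;
}

With 80% of the work parallelisable you get 2.5x at four cores but only 3.3x at eight, which is roughly where spending the budget on GPU-like elements starts to look more attractive.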
 
I guess it depends on the time frame. Short term (current), for serial-code-oriented cores, the most effective use would be to add more cores. Four years from now, assuming a quadrupling of transistors, you are faced with the same question. Will doubling my cores be the most beneficial way to use the budget? IMHO, in the consumer space, after four cores the benefits reduce dramatically. At that point it would be better to use the transistors on adding more GPU-like elements.

I would more or less agree with regards to today's software, but with more cores available future software may attempt to do things we don't bother with today in the consumer space, computer vision related tasks in particular.

If anything like that takes hold, it would be counterproductive to forgo extra cores in favor of GPU-like elements. Especially since the majority of the populace isn't interested in high-performance graphics, a just-good-enough solution provided by the many-core CPU would likely suffice.
 
hm... Just a thought... when I look at the task manager, at the processes and the "threads" count, there are quite a few things with multiple threads. Now, they may not take up a lot of processing time, but going in-order for the multi-zillion threads and taking into account the <1/3rd single thread performance...

When they were mentioning the throughput cores, was that meant for a CPU replacement or strictly for graphics... :?:
 
hm... Just a thought... when I look at the task manager, at the processes and the "threads" count, there are quite a few things with multiple threads. Now, they may not take up a lot of processing time, but going in-order for the multi-zillion threads and taking into account the <1/3rd single thread performance...

When they were mentioning the throughput cores, was that meant for a CPU replacement or strictly for graphics... :?:
It's pretty damn close to Terascale (Polaris)... so who knows. But this was a VCG presentation, so we assume it's a GPU.
 
It's also interesting that no comparison with Cell was made. Cell is, effectively, logically equivalent to
What do you mean by logically equivalent?
To be honest I fail to see this stuff sharing a lot in common with CELL.
 
This may be Intel's move to preempt the GPGPU, though not necessarily win big in graphics.

The idea of keeping the fixed-function hardware (special function units, TMUs, ROPs, etc.) physically separate would allow Intel's GPGPU to dump a lot of silicon that would sit idle in a wide range of computing tasks.

If a GPGPU minus the second G can come in with less die area, power, and heat than a competing GPGPU based on a graphics core, Intel could marginalize GPGPU in favor of x86 or other Intel-based ISA stream computing.

Using the same base design for both GPGPU and graphics would amortize some of the cost and risk.
 
Intel could marginalize GPGPU in favor of x86 or other Intel-based ISA stream computing.
Indeed. Now the big question is, what *is* that ISA? My guess is it's x86 with some rather aggressive (to say the least!) extensions that are VLIW-like. And then (at least part of) Gesher would also support those same extensions in 2010, but perhaps with another implementation.
 