The Intel GPU Rumour Mill

Hi,

Take a look at this: http://www.vr-zone.com/?i=4605

Some more updates:

http://www.beyond3d.com/forum/showthread.php?t=37889

http://www.theinquirer.net/default.aspx?article=37548

It turns out Larrabee is more than just a GPU; it is a CGPU. It sounds similar to Fusion, but Fusion will start by taking the existing GPU die, and over time AMD will go for full integration. Intel's approach is full integration from the start, meaning on-core (I hope AMD changes plans and does it like Intel; it's not too late). As Charlie reported, developers won't need any new tools to use this CGPU; existing tools are fine. This would translate into a real GPGPU. Perhaps each mini-core will have a mini-GPU and a mini-CPU, which sounds a lot like the i860. However, these are just rumours, and the actual product might be a lot different (does anyone remember those Cell processor rumours?).
 
Technically, all GPUs are already multicore devices...

I do believe Intel will resort to its manufacturing-process advantage and produce a counterpart of some sort to the current SLI/Crossfire systems, applying it on a massive scale to dense, single-package, relatively simple dies.
Another possibility is a simple x86 core plus memory controller supervising many dedicated ALUs, not unlike the Cell BE, but with a specialised 3D-rendering ISA.
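
Just to make the idea concrete (and this is purely a sketch of my own, with an invented engine count and command format, nothing Intel has described), that supervisor/worker split could look as simple as this:

Code:
#include <cstdio>

// Hypothetical command the x86 supervisor hands to a dedicated ALU engine.
struct ShadeCommand {
    int first_pixel;   // start of the screen slice
    int pixel_count;   // how many pixels this engine shades
};

// Stand-in for a dedicated 3D ALU block: it only executes commands handed
// to it; it never fetches or schedules its own work.
void run_engine(int engine_id, const ShadeCommand& cmd) {
    std::printf("engine %d: shading pixels %d..%d\n",
                engine_id, cmd.first_pixel,
                cmd.first_pixel + cmd.pixel_count - 1);
}

int main() {
    const int engines = 8;          // invented count, just for the sketch
    const int frame_pixels = 1024;
    const int slice = frame_pixels / engines;

    // The x86 core + memory controller plays traffic cop: it splits the
    // frame and issues one command per engine.
    for (int e = 0; e < engines; ++e) {
        ShadeCommand cmd = { e * slice, slice };
        run_engine(e, cmd);
    }
    return 0;
}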
 
I'd say they technically fall short, since the processing elements are not capable of all the steps in the execution pipeline; I'm looking at their inability to do their own instruction fetch in particular.

Intel might be playing a little loose with its definitions, but this is so far off, and there is so little in-depth information, that I can't be sure.

If Intel's target chips are truly multicore GPUs, then I'd like to know what they are saving by going this route. There's already significant clustering being done in a GPU; full independence will force a trade-off in transistors devoted to the context of each core and the messaging fabric between them.

If they skimp on inter-core communication, which is possible, that would place a burden on memory bandwidth, storage, or software.
 
But to see 32nm high-end GPUs by 2009 is crazy. Given how slowly TSMC ramps up its nodes, 32nm wasn't going to surface until early 2011.

IF Intel could pull off a decent GPU, we could be looking at the industry being turned on its head! And yes, I know about Intel's failed attempts at this before, but somehow this seems different and very serious.

roll on 2009
 

Frankly, I disagree.
65nm has been in production at TSMC for some time now.
That means 45nm is, what, maybe a few months to a year away? So 32nm in the 2009 time frame is very doable.
Don't confuse the process with the inability of the IHVs to port their designs on time, or with the economics of each process (it's not as if Nvidia or AMD/ATI can light-heartedly charge 1000 dollars for a high-end GPU).
A CPU has much better profit margins than a GPU.

And you forget another thing:
x86 CPUs always have priority at Intel.
Look at how chipset processes always lag behind CPU processes (because the old fabs are re-used for secondary chip production), and I can see GPUs following the same pattern.
I mean, P965/G965 is the newest Intel core logic and it has only just moved to the 90nm process, while the i975X is still made on the 130nm node.
 

Neither of which is cutting edge, and neither of which requires cutting edge; it would be a total waste. High-end GPUs, on the other hand, do require a cutting-edge fab and are not mass market, so high-end GPUs won't cannibalise too much of Intel's cutting-edge CPU production.

Bottom line: it's rumoured that Intel will have a high-end GPU at 32nm by 2H 2009 (the same time frame in which the 32nm CPUs will be released).

So when do you think Nvidia will have its 32nm offering?

Since AMD (read: IBM) is around a year behind Intel in fab process and TSMC is behind IBM, and Nvidia can't get its designs onto new TSMC processes immediately for the reasons you mentioned above, I can't see why it's far from the truth to think that Intel could have a two-year competitive advantage over Nvidia in fabbing GPUs.

Is that really far-fetched to assume? If someone can either vouch for that or discredit it, I'd like to know their reasons.
 

I agree with you that Nvidia will be late because it relies on TSMC, but don't forget that AMD will hopefully ship 45nm products in Q2 2008 while Intel does so in Q1 2008. That's for 45nm; at 32nm, AMD should do more catching up and be only 1-2 months behind in manufacturing technology.
 
If they skimp on inter-core communication, which is possible, that would place a burden on memory bandwidth, storage, or software.

(Honest question) Could you elaborate in layman's terms what you mean by inter-core communication?
 

It's more of a question of where data is kept and how it gets from where it is in memory or cache to an ALU, and then what happens to the results and where those results go.

A core is basically an independent set of control and ALU units that perform the Fetch-Decode-Execute loop. Units within a core are aware of the core's state and have free access to data the core has access to.

Multi-core chips have ALUs and control logic, but there is a division between the units in core 0 and core 1.
They don't have access to each other's state, so something must be done to make sure that when there is some state that is shared, it is shared properly. This involves an explicit passing of information.

In the case of multi-core GPUs, it would be a lot like SLI between the cores (perhaps not between all of them), but there are many ways of accomplishing the same result.
Some ways are faster, but they cost more transistors, which cuts into the budget for cache and ALUs.
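
To make the "explicit passing of information" concrete, here's a deliberately toy sketch (made-up structures, nothing from any real design): within a core, a unit just reads what is already there; across cores, the shared data has to be shipped over explicitly first, and that shipping is what costs either transistors or bandwidth.

Code:
#include <cstring>
#include <cstdio>

// Toy model: each core owns a small private store that only it can read.
struct Core {
    float local[256];
};

// Inside one core, a unit just uses what is already there; no messaging.
float sum_own_data(const Core& c, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; ++i) s += c.local[i];
    return s;
}

// Across cores, shared state has to be moved explicitly. The copy is the
// "message"; a faster fabric makes it cheaper but costs transistors, a
// slimmer one pushes the cost onto bandwidth and latency instead.
void send(const Core& src, Core& dst, int n) {
    std::memcpy(dst.local, src.local, n * sizeof(float));
}

int main() {
    Core core0 = {}, core1 = {};
    for (int i = 0; i < 256; ++i) core0.local[i] = 1.0f;

    std::printf("core0 sums its own data: %.0f\n", sum_own_data(core0, 256));

    send(core0, core1, 256);   // the explicit step a single core never needs
    std::printf("core1 after the message: %.0f\n", sum_own_data(core1, 256));
    return 0;
}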

I'm curious to see what Intel's choices would be.

Going multi-core means that Intel feels the explicit division of core resources--even at the cost of slower sharing and more transistors that do not contribute to computation--is a net gain. That gain must be large enough to overcome the other sources of overhead. Multicore will also guarantee an increase in software and driver complexity.

I'd like to know what there is to gain, since GPUs have significant internal divisions already.
Going multicore just for the sake of doing it doesn't help anything.

edit:

I just saw that there are some more details now on the design.
They're apparently going for an ultra-wide bidirectional ring bus. They're saying it's 16 x86 cores with vector engines strapped on. There are some wrinkles that might make or break this design, but at first glance it's not very encouraging.
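
Just to put rough numbers on what a bidirectional ring implies for core-to-core traffic (the 16-core count is from the rumour; the hop model is my own simplification), a message takes whichever direction is shorter, so the worst case is half-way around the ring:

Code:
#include <cstdio>

// Hops between two stops on a bidirectional ring of `cores` stops:
// traffic goes whichever way around is shorter.
int ring_hops(int a, int b, int cores) {
    int d = (a > b) ? a - b : b - a;
    return (d < cores - d) ? d : cores - d;
}

int main() {
    const int cores = 16;   // the rumoured core count
    int worst = 0, total = 0, pairs = 0;
    for (int a = 0; a < cores; ++a)
        for (int b = 0; b < cores; ++b) {
            if (a == b) continue;
            int h = ring_hops(a, b, cores);
            if (h > worst) worst = h;
            total += h;
            ++pairs;
        }
    // For 16 stops this prints a worst case of 8 hops and an average of ~4.3.
    std::printf("worst-case hops: %d, average: %.2f\n",
                worst, (double)total / pairs);
    return 0;
}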

It may just be that Charlie at the Inquirer worded things funny, but I don't find his assertion that this design settled on being a GPU because Intel's engineers weren't sure what else to do with it all that comforting.
 
I've been talking to Charlie through e-mail, and he said that this product, although a kind of CGPU, will be marketed as a GPU, just like 3dillentante pointed out about the i860. However, I don't trust that '16x the performance of G80' phrase. The real CGPU that would be marketed as a CPU will follow in about 1-2 years' time. Larrabee is just a step down the road towards the CPU being swallowed by the GPU. Intel's CPU with the on-die GPU will have some x86 cores (OoO and similar to the current Conroe, but tweaked) and perhaps 8 'Larrabee-style' cores. I know I said this before, but AMD mentioned that it will add x86 GP cores that can be geared towards graphics when needed, so perhaps AMD will eventually have a similar approach (http://www.anandtech.com/cpuchipsets/showdoc.aspx?i=2565).
 
There's already significant clustering being done in a GPU; full independence will force a trade-off in transistors devoted to the context of each core and the messaging fabric between them.
I don't see the inherent benefits of such clustering as far as messaging is concerned; there isn't a lot of internal communication in a cluster of pixel or vertex shaders AFAICS (well, the differentials maybe ... but you could simply render a quad on a single core).
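
To illustrate what I mean about the differentials (a toy sketch of my own, not any particular piece of hardware): ddx/ddy-style derivatives are just neighbour differences inside a 2x2 quad, so as long as the whole quad is shaded on one core, nothing has to cross a core boundary to compute them.

Code:
#include <cstdio>

// A 2x2 quad of shaded values, laid out as
//   [0] [1]
//   [2] [3]
struct Quad { float v[4]; };

// Screen-space derivatives taken as simple neighbour differences inside
// the quad; no data from outside the quad is ever needed.
float ddx(const Quad& q) { return q.v[1] - q.v[0]; }
float ddy(const Quad& q) { return q.v[2] - q.v[0]; }

int main() {
    Quad q = { {0.10f, 0.35f, 0.20f, 0.45f} };   // made-up shader outputs
    std::printf("ddx = %.2f, ddy = %.2f\n", ddx(q), ddy(q));
    return 0;
}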

As for the benefits in transistor count from sharing an execution context, I have always had my doubts ... my gut feeling has always been that the storage needed for multi-threading and the ALUs are the overwhelming factor; I doubt the extra instruction cache/bandwidth, control circuitry and the minimal increase in context needed to make the shaders independent are going to break the bank.
 
I just saw that there are some more details now on the design.
They're apparently going for an ultra-wide bidirectional ring bus. They're saying it's 16 x86 cores with vector engines strapped on. There are some wrinkles that might make or break this design, but at first glance it's not very encouraging.

It may just be that Charlie at the Inquirer worded things funny, but I don't find his assertion that this design settled on being a GPU because Intel's engineers weren't sure what else to do with it all that comforting.

If that should be true, then it's most likely not what I was secretly hoping for :(
 
Hoping to address graphics without adding a rasterizer is suicide.

Presumably they'll discover that at some point. I think there are various blocks in modern high-end GPUs that need to be accounted for. My point is that I think they sense they can't afford to fail now, so when they stub their toe, which is nearly inevitable, they will learn their lesson and take action accordingly rather than just give up.

Right now I see about 4 streams of "Intel graphics tech", some of which may overlap (or not). We'll continue trying to open lines of communication to them to get a sense for where they are going and hopefully get them talking more to the community directly.
 
Hoping to address graphics without adding a rasterizer is suicide.
Yeah seriously.

We think that ray tracing is going to take off.
For secondary rays, sure... why you'd ray trace primary rays is beyond me, and considering that most scenes aren't exactly "hall of mirrors" situations, a rasterizer is going to be useful for quite some time, if not indefinitely.

[Edit] Actually that whole answer bugs me... I don't know why people spout so much "propaganda" about ray tracing. Sure it's a simple idea, but so is rasterization. There's nothing really more "pure" about either IMHO. Both will give the same results for coherent rays...
 
[Edit] Actually that whole answer bugs me... I don't know why people spout so much "propaganda" about ray tracing. Sure it's a simple idea, but so is rasterization. There's nothing really more "pure" about either IMHO. Both will give the same results for coherent rays...
On top of that, if you compare the most popular ways of calculating primary rays with the way parallel rasterization is done, you can see that the two approaches now largely overlap. For secondary rays it's another matter, though.
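
As a toy illustration of that overlap (the camera and resolution here are made up; it's only meant to show the shape of the loop): generating one primary ray per pixel marches over the frame in exactly the same regular, coherent order a rasterizer would.

Code:
#include <cstdio>

struct Vec3 { float x, y, z; };

// One primary ray per pixel through a pinhole camera at the origin,
// looking down -z. The loop over pixels is as regular and coherent as a
// rasterizer walking the same pixels.
Vec3 primary_ray_dir(int px, int py, int width, int height) {
    Vec3 d;
    d.x = (px + 0.5f) / width * 2.0f - 1.0f;    // -1..1 across the screen
    d.y = 1.0f - (py + 0.5f) / height * 2.0f;   // flip so +y points up
    d.z = -1.0f;                                // unnormalised is fine here
    return d;
}

int main() {
    const int w = 4, h = 4;   // tiny "frame buffer" just for the printout
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            Vec3 d = primary_ray_dir(x, y, w, h);
            std::printf("pixel (%d,%d) -> dir (%.2f, %.2f, %.2f)\n",
                        x, y, d.x, d.y, d.z);
        }
    return 0;
}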
 
I don't see the inherent benefits of such clustering as far as messaging is concerned; there isn't a lot of internal communication in a cluster of pixel or vertex shaders AFAICS (well, the differentials maybe ... but you could simply render a quad on a single core).
Clustering itself adds latency if data crosses between clusters. If it is just assumed that such a thing will not happen, then no transistors need be used. On the other hand, SLI seems to indicate that at least something is passed between cores. I can't say either way.

I'm not sure that going fully independent will save that many transistors over what is present now.

There is already a large amount of assumed independence on a GPU, making it explicit now just adds the complexity of an on-chip network that has to be managed separately.

As for the benefits in transistor count from sharing an execution context, I have always had my doubts ... my gut feeling has always been that the storage needed for multi-threading and the ALUs are the overwhelming factor; I doubt the extra instruction cache/bandwidth, control circuitry and the minimal increase in context needed to make the shaders independent are going to break the bank.

Per ALU and per-core, it probably doesn't. Multiply it by 16, and things become more obvious.
Make the multicore support x86 (I really don't get the point of this right now), and the difference will be greater.
If the x86 support is what everyone says, then there will be design considerations put in place that would not exist on a standard GPU and that offer no performance improvement, only ISA/software-model baggage.
 
On top of that, if you compare the most popular ways of calculating primary rays with the way parallel rasterization is done, you can see that the two approaches now largely overlap. For secondary rays it's another matter, though.
Definitely - 100% agree. For secondary rays I can see no other option than ray tracing, so certainly we'd like that to be as fast as possible! It's just good to remember that for the most part, ray tracing is the "brute force" algorithm, and rasterization is the faster way that works in some cases.
 