> 32 cores on 45 nm sounded unrealistic anyway...
That's not what Intel originally thought, it seems. I'd really like to see 48 cores in 2010.
As for per-core performance, here is some information about it.
Sandy Bridge is supposed to get a new vector instruction set: one with three-operand, non-destructive instructions and up to 256-bit vectors.
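To make that concrete, here's a rough sketch in C intrinsics of what such an instruction set buys you, using the names AVX settled on; the sse_add/avx_add helpers are purely illustrative:

```c
#include <immintrin.h>

/* SSE today: two-operand, destructive -- the first source doubles as the
   destination, so the compiler must insert a copy whenever 'a' is still
   needed afterwards.
       addps xmm0, xmm1        ; xmm0 = xmm0 + xmm1                     */
__m128 sse_add(__m128 a, __m128 b) {
    return _mm_add_ps(a, b);
}

/* The new extension: three-operand, non-destructive, 256 bits wide --
   both sources survive the operation, and twice as many floats fit in
   one instruction.
       vaddps ymm0, ymm1, ymm2 ; ymm0 = ymm1 + ymm2                     */
__m256 avx_add(__m256 a, __m256 b) {
    return _mm256_add_ps(a, b);
}
```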
> Yeah, I'm missing the "why use x86" angle for Larrabee.

Because it's an experiment for what CPUs in the distant future will look like. Intel is trying to answer the question of what CPU people will buy in 5-10 years' time. 20 GHz single-cores are pretty much out of the question, but how many cores is right? Should they be identical or hybrid, complex cores or simplified ones? Currently opinions are divided and roadmaps show a bit of everything. Intel would much rather sell the same chip to everyone. That's the only way to keep their dominant position and have x86 survive the next decade. Larrabee will answer many questions and converge on something they can put in each and every system.
> Intel would much rather sell the same chip to everyone. That's the only way to keep their dominant position and have x86 survive the next decade.

Actually, the Sandy Bridge/Larrabee split indicates that not everyone at Intel is on the same page.
> They've got nothing to lose. I'm sure the Larrabee division has its own ambitions, and if something extra comes out of it, that's great. But it's still Intel we're talking about. They sell CPUs, and divergence would be their doom. So x86 is key for Larrabee.

x86 will be defined as whatever goes into the dominant CPU, and that will be Sandy Bridge. Larrabee's going to be an almost-x86, and I'm wondering if there are those who aren't trying too hard to allow Larrabee to succeed.
> For the same reason I'd be really surprised if Larrabee's 512-bit SIMD isn't in fact 2 x 256-bit AVX.

The rumors I've run across indicate it's not.
> Actually, the Sandy Bridge/Larrabee split indicates that not everyone at Intel is on the same page.

Well, I can definitely see how a division experimenting with teraflop architectures turned itself into a GPU division. That doesn't take away that the CPU division(s) need answers for the next decade. At least it explains why x86 was used as a starting point.
> x86 will be defined as whatever goes into the dominant CPU, and that will be Sandy Bridge. Larrabee's going to be an almost-x86, and I'm wondering if there are those who aren't trying too hard to allow Larrabee to succeed.

Correct me if I'm wrong, but it looks like the first iteration will be aimed at developers. They need/want working hardware by the end of the year. That's two years before Sandy Bridge. So it's quite unavoidable that there will be more differences than intended. It doesn't mean that later iterations won't closely match mainstream x86.
> The rumors I've run across indicate it's not. There is overlap in functionality, but the encoding, internal state, and instruction behavior do not match.

Minor differences can easily be bridged with compiler changes. It doesn't fundamentally alter the software developed for Larrabee, if at all.
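As an illustration of the kind of mechanical bridging being talked about, here's a hypothetical sketch (the vec512/vadd512 names are made up) of how a compiler could map a 512-bit vector operation onto two 256-bit AVX instructions:

```c
#include <immintrin.h>

/* Hypothetical 512-bit vector type built from two 256-bit AVX halves. */
typedef struct { __m256 lo, hi; } vec512;

/* A 512-bit add lowered to two independent 256-bit adds; every other
   element-wise operation splits the same way. */
static inline vec512 vadd512(vec512 a, vec512 b) {
    vec512 r;
    r.lo = _mm256_add_ps(a.lo, b.lo);   /* lower 8 floats */
    r.hi = _mm256_add_ps(a.hi, b.hi);   /* upper 8 floats */
    return r;
}
```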
> Out of curiosity, what fraction of x86 chip space is typically used for the x86 instruction decode ... or, to be more specific, all the stuff needed to translate x86 into the internal hardware instructions actually executed by the core? Might as well include register rename in this as well, under the assumption that combined ALU+MEM opcodes probably get decoded into two or more actual operations depending on address mode.

The Pentium III Katmai core was 128 mm² at 0.25 micron and featured out-of-order execution and SSE. Since it obviously didn't spend all of its die area decoding instructions, I think it's fair to say that instruction decoding won't take a major amount of die space for an in-order processor at 45/32 nm. Besides, it also saves on code size, and I doubt next-generation GPUs won't have any form of instruction decoding.
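For reference, here's a small C example of the combined ALU+MEM case mentioned above (the register assignments are illustrative, and the exact micro-op split varies by core):

```c
/* One combined ALU+MEM x86 instruction and the internal micro-operations
   a typical decoder would crack it into. */
int add_elem(int acc, const int *table, long i) {
    /* Compiles to something like:  add eax, [rsi+rdx*4]
       which a typical core decodes into two micro-ops:
         uop 1:  tmp <- load32 [rsi + rdx*4]   ; address-mode load
         uop 2:  eax <- eax + tmp              ; plain ALU add         */
    return acc + table[i];
}
```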
> Irrelevant to whether Larrabee's vector extensions are just doubled AVX. There's evidence that they are not.

I must have missed that.
> The two sets of extensions are (allegedly) not encoded the same, don't behave the same, and may have functionality present in one that is not found in the other.

That's not an insurmountable obstacle. Intel has years of experience in running vertex shaders using SSE.
> I don't see AVX supporting fixed-function texture sampling.
>
> edit: not in 1-2 years at least

GPUs are moving towards programmable texture sampling; CPUs are already there. A parallel gather instruction would be great, but AVX isn't worthless without one.
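For clarity, here's a minimal sketch of what such a parallel gather does, emulated with scalar loads (gather_emulated is just an illustrative name):

```c
#include <immintrin.h>

/* Load eight floats from eight independent indices. Without a dedicated
   gather instruction, each lane is fetched with an ordinary scalar load
   and the results are packed into one 256-bit register. */
__m256 gather_emulated(const float *base, const int idx[8]) {
    float lanes[8];
    for (int i = 0; i < 8; ++i)
        lanes[i] = base[idx[i]];        /* one scalar load per lane */
    return _mm256_loadu_ps(lanes);      /* pack into a 256-bit vector */
}
```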
> Out of curiosity, what fraction of x86 chip space is typically used for the x86 instruction decode ... or, to be more specific, all the stuff needed to translate x86 into the internal hardware instructions actually executed by the core?

I'll ask a slightly different question: how many x86 CPU pipeline stages are wasted on x86?