Larrabee at Siggraph

nAo · Jul 9, 2008

Well..Michelangelo got blind and died before completing it..

(btw..I visited it just a few days ago)

MoHonRi · Jul 9, 2008

Edit -- saying the Larrabee cores are based upon the old design might be comparable to how the Core 2 is based on the P3.... That might be more clear.

Gah - did I have an old cache of this thread?? Lots of discussion appeared after I posted.

ShaidarHaran · Jul 9, 2008

Intel Blasterizer 1000 - it'll blow your balls off!

Intel Optimo 900 - smoke your enemies

Intel Hyper 3D - no joke punchline for this one, I think it's possible

Intel Clearvision - same

Intel Teraforce - same

any marketing departments want to hire me yet?

ShaidarHaran · Jul 9, 2008

Karoshi said:
Yeah, they asked for pentium cores, so they could reuse the 10-year old rasterizer they had on dusty floppies in the kitchen cupboard.

SCNR

did they order a side of roflwaffles with that, because I just had some and they were scrum-diddly-umptious!

Geo · Jul 9, 2008

ShaidarHaran said:
Intel Blasterizer 1000 - it'll blow your balls off!

Intel Optimo 900 - smoke your enemies

Intel Hyper 3D - no joke punchline for this one, I think it's possible

Intel Clearvision - same

Intel Teraforce - same

any marketing departments want to hire me yet?

It's tough when you're starting with "Extreme" waaaaay down the performance curve (one hopes!)

Mintmaster · Jul 10, 2008

Humus said:
The dot product instruction is particularly interesting to note, especially given that Intel never really seemed to show much interest for graphics in previous SSE instruction sets.

Dot products aren't really that useful for graphics on the CPU, because nearly every use of it can be parallelized into a MAD. It should require fewer instructions, too.

IMO it's more useful for serial instruction streams, or where the data can't be reorganized efficiently.
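To make the MAD point concrete, here is a minimal sketch in plain C, assuming a hypothetical SOA layout (the `Vec3SoA` type and the `dot3_soa` name are made up for illustration, not taken from any Intel code). Each output is one multiply followed by two multiply-adds on its own lane, so a 4-wide SIMD unit can do four dot products at once with no horizontal add and no swizzle.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SOA layout: each component lives in its own array. */
typedef struct {
    const float *x;
    const float *y;
    const float *z;
} Vec3SoA;

/* out[i] = dot(a[i], b[i]) for i in [0, n).
 * Every iteration is independent: one multiply, then two multiply-adds.
 * A compiler or hand-written SIMD code can therefore map several
 * consecutive i values onto the lanes of one vector register. */
void dot3_soa(Vec3SoA a, Vec3SoA b, float *out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float acc = a.x[i] * b.x[i];   /* multiply      */
        acc += a.y[i] * b.y[i];        /* multiply-add  */
        acc += a.z[i] * b.z[i];        /* multiply-add  */
        out[i] = acc;
    }
}
```

The instruction count per dot product stays at three whether the loop runs scalar or vectorized, which is the sense in which a dedicated dot product instruction buys little here.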

Simon F · Jul 10, 2008

Mintmaster said:
Dot products aren't really that useful for graphics on the CPU, because nearly every use of it can be parallelized into a MAD.

Hitachi probably would disagree with you as, IIRC, they have a DP instruction in the SH4.

Mintmaster · Jul 10, 2008

I'm not saying dot product instructions are useless. I'm just saying that they are not indicative of a focus on a realtime graphics workload. What a CPU does to determine this workload is another matter.

TimothyFarrar · Jul 10, 2008

Or that dot product instructions are usually AOS, so you would have lots of overhead copy+swizzling to try and make good use of it.

Humus · Jul 10, 2008

Well, if your code is SOA, then you'll get copy+swizzle overhead with the dot product instruction. If, on the other hand, you have AOS, you get copy+swizzle overhead with MAD. The vast majority of the code out there is AOS, unless you started off writing your application with SSE optimizations as a key focus. Now of course with Intel writing all the code I'm sure they do.

Personally I've never been very fond of Intel's "you can just use SOA" attitude when it comes to SSE. The most natural way to write code is to keep related data together. You use a class/struct that contains everything for one instance of that object. That'll have a better memory access pattern too in most cases than distributing all attributes of your objects into different arrays. The DPPS instruction is great because it'll easily plug right into any existing code and you can do localized optimizations without restructuring your entire codebase.
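As a rough illustration of the "plug right in" argument, here is a sketch in plain C, assuming a hypothetical AOS vertex type (`Vec4`, `dot3_aos` and `lambert_aos` are invented names). The scalar `dot3_aos` stands in for the one spot where a horizontal dot product would go; with SSE4.1 that body could be swapped for the `_mm_dp_ps` intrinsic (mask `0x71` for a three-component product written to lane 0) without touching the data layout or the callers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AOS type: one object keeps all its components together. */
typedef struct {
    float x, y, z, w;
} Vec4;

/* Scalar stand-in for a horizontal dot product over one AOS element.
 * This is the only function that would change in a DPPS-style rewrite. */
float dot3_aos(const Vec4 *a, const Vec4 *b)
{
    return a->x * b->x + a->y * b->y + a->z * b->z;
}

/* Hypothetical existing routine: per-vertex Lambert term, clamped at 0.
 * The data stays in its original array-of-structs form. */
void lambert_aos(const Vec4 *normals, const Vec4 *light, float *out, size_t n)
{
    assert(normals != NULL && light != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i) {
        float d = dot3_aos(&normals[i], light);
        out[i] = d > 0.0f ? d : 0.0f;
    }
}
```

The optimization stays local to one helper, which is the appeal; whether it actually beats a restructured SOA loop depends on the instruction's latency and throughput.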

nAo · Jul 10, 2008

Is DPPS fully pipelined? What's its latency?

Mintmaster · Jul 10, 2008

Humus said:
Personally I've never been very fond of Intel's "you can just use SOA" attitude when it comes to SSE. The most natural way to write code is to keep related data together. You use a class/struct that contains everything for one instance of that object. That'll have a better memory access pattern too in most cases than distributing all attributes of your objects into different arrays. The DPPS instruction is great because it'll easily plug right into any existing code and you can do localized optimizations without restructuring your entire codebase.

True, but for realtime rendering you'd be mad to use AOS. For any code that Intel is writing, e.g. DX9/DX10 emulation, RT code, or optimized libraries, this isn't a problem in the slightest.

This is why I think the DP instructions have little to do with Larrabee's graphics focus. It's more about productivity for other HPC applications.

neliz · Jul 13, 2008

I'm confused about the 128KB L2 cache per core part; the 2006 presentation stated 256KB per core.
Will the number of cores decide whether this is a 16- or 32-core part, or is any "info" so far regarding the 4MB L2 cache BS?

INKster · Jul 13, 2008

neliz said:
I'm confused about the 128KB L2 cache per core part; the 2006 presentation stated 256KB per core.
Will the number of cores decide whether this is a 16- or 32-core part, or is any "info" so far regarding the 4MB L2 cache BS?

Perhaps they've lowered the amount per core to improve overall latency and/or density, who knows?

neliz · Jul 14, 2008

INKster said:
Perhaps they've lowered the amount per core to improve overall latency and/or density, who knows?

Probably, yeah. If I remember correctly, the L2 latency was 5 instructions on the original P54, so even the now-stated 10 instructions is already a stretch.

Megadrive1988 · Jul 14, 2008

Assuming the old Pentium rumor is true, if Intel is using updated early-to-mid 90s Pentium CPU technology, maybe they've also used mid-to-late 90s Lockheed Real3D GPU technology for the rasterization segments of Larrabee.

nAo · Jul 23, 2008

Larrabee boards coming in November

Geo · Jul 23, 2008

Well, all things considered, that sounds like spring/summer 2009 for retail.

Arun · Jul 23, 2008

I'm still betting on 65nm samples and 32nm production...

kyetech · Jul 23, 2008

Arun said:
I'm still betting on 65nm samples and 32nm production...

So, iyo, will the 65nm samples have a quarter or half the transistor budget? I.e. a lot fewer cores, but the same principle?

Doesn't it make sense to sample on 45nm (full size sample, much lower clock) and produce on 32?

It'll be interesting to see if they can get their 32nm fabbing process online before the end of 2009.
