Larrabee at Siggraph

Well..Michelangelo went blind and died before completing it..:)

(btw..I visited it just a few days ago :) )
 
Edit -- saying the Larrabee cores are based upon the old design might be comparable to how the Core 2 is based on the P3.... That might be more clear.

Gah - did I have an old cache of this thread?? Lots of discussion appeared after I posted. ;)
 
Intel Blasterizer 1000 - it'll blow your balls off!

Intel Optimo 900 - smoke your enemies

Intel Hyper 3D - no joke punchline for this one, I think it's possible

Intel Clearvision - same

Intel Teraforce - same

any marketing departments want to hire me yet? ;)
 
Yeah, they asked for Pentium cores, so they could reuse the 10-year-old rasterizer they had on dusty floppies in the kitchen cupboard.

SCNR

did they order a side of roflwaffles with that, because I just had some and they were scrum-diddly-umptious!
 
Intel Blasterizer 1000 - it'll blow your balls off!

Intel Optimo 900 - smoke your enemies

Intel Hyper 3D - no joke punchline for this one, I think it's possible

Intel Clearvision - same

Intel Teraforce - same

any marketing departments want to hire me yet? ;)

It's tough when you're starting with "Extreme" waaaaay down the performance curve (one hopes!)
 
The dot product instruction is particularly interesting to note, especially given that Intel never really seemed to show much interest for graphics in previous SSE instruction sets.
Dot products aren't really that useful for graphics on the CPU, because nearly every use of them can be parallelized into a MAD. It should require fewer instructions, too.

IMO it's more useful for serial instruction streams, or where the data can't be reorganized efficiently.
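
To illustrate the "parallelized into a MAD" point, here is a rough sketch in plain C++ (no intrinsics), assuming the data is already stored component-wise. The `Vec3x4` struct and `dot4_soa` function are made-up names for illustration; each array index stands in for one lane of a 4-wide register, and "MAD" here just means a multiply followed by an add.

```cpp
#include <array>
#include <cstddef>

// Hypothetical SOA layout: four 3D vectors stored component-wise,
// one "lane" per vector (mirrors one 4-wide register per component).
struct Vec3x4 {
    std::array<float, 4> x, y, z;
};

// Four dot products at once: one multiply followed by two multiply-adds
// per lane. No horizontal add and no swizzle is needed.
std::array<float, 4> dot4_soa(const Vec3x4& a, const Vec3x4& b) {
    std::array<float, 4> r{};
    for (std::size_t i = 0; i < 4; ++i) {
        float acc = a.x[i] * b.x[i];   // MUL
        acc = a.y[i] * b.y[i] + acc;   // MAD
        acc = a.z[i] * b.z[i] + acc;   // MAD
        r[i] = acc;
    }
    return r;
}
```

With this layout, three vertical operations produce four results, which is why a dedicated dot product instruction buys little when the data is already organized this way.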
 
Dot products aren't really that useful for graphics on the CPU, because nearly every use of them can be parallelized into a MAD.
Hitachi probably would disagree with you as, IIRC, they have a DP instruction in the SH4.
 
I'm not saying dot product instructions are useless. I'm just saying that they are not indicative of a focus on a realtime graphics workload. What a CPU does to determine this workload is another matter.
 
Well, if your code is SOA, then MAD works without any copy+swizzle overhead. If, on the other hand, you have AOS, you pay copy+swizzle overhead to use MAD. The vast majority of the code out there is AOS, unless you started off writing your application with SSE optimizations as a key focus. Now of course with Intel writing all the code I'm sure they do.

Personally I've never been very fond of Intel's "you can just use SOA" attitude when it comes to SSE. The most natural way to write code is to keep related data together. You use a class/struct that contains everything for one instance of that object. That'll have a better memory access pattern too in most cases than distributing all attributes of your objects into different arrays. The DPPS instruction is great because it'll easily plug right into any existing code and you can do localized optimizations without restructuring your entire codebase.
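
For comparison, a minimal sketch of the AOS side of the argument, again in plain C++ rather than real SSE intrinsics. The names `Vec3`, `Vec3Soa4`, `dot_aos` and `aos_to_soa` are hypothetical. `dot_aos` is the per-object horizontal pattern that a DPPS-style instruction would cover, and `aos_to_soa` stands in for the copy+swizzle step you would otherwise pay before lane-parallel multiply-adds can be used.

```cpp
#include <array>
#include <cstddef>

// Hypothetical AOS layout: each object keeps its own components together.
struct Vec3 {
    float x, y, z;
};

// Per-object dot product: a horizontal multiply-and-sum across one
// vector's components. It drops into existing struct-based code as-is.
float dot_aos(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Component-wise (SOA) view of four vectors.
struct Vec3Soa4 {
    std::array<float, 4> x, y, z;
};

// The "copy+swizzle" step: regroup four AOS vectors into SOA form.
Vec3Soa4 aos_to_soa(const std::array<Vec3, 4>& v) {
    Vec3Soa4 out{};
    for (std::size_t i = 0; i < 4; ++i) {
        out.x[i] = v[i].x;
        out.y[i] = v[i].y;
        out.z[i] = v[i].z;
    }
    return out;
}
```

Whether the regrouping cost matters depends on how often the data gets converted versus how much work is done on it afterwards; the sketch only shows where that cost sits.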
 
True, but for realtime rendering you'd be mad to use AOS. For any code that Intel is writing, e.g. DX9/DX10 emulation, RT code, or optimized libraries, this isn't a problem in the slightest.

This is why I think the DP instructions have little to do with Larrabee's graphics focus. It's more about productivity for other HPC applications.
 
I'm confused about the 128KB L2 cache per core part; the 2006 presentation stated 256KB per core.
Will the number of cores decide whether this will be a 16- or 32-core part, or is any "info" so far regarding the 4MB L2 cache BS?
 
Perhaps they've lowered the amount per core to improve overall latency and/or density. Who knows?
 
Probably, yeah. If I remember correctly, the L2 latency was 5 instructions on the original P54, so even the now-stated 10 instructions is already a stretch.
 
Assuming the old Pentium rumor is true, if Intel is using updated early-to-mid 90s Pentium CPU technology, maybe they've also used mid-to-late 90s Lockheed Real3D GPU technology for the rasterization segments of Larrabee :D
 
I'm still betting on 65nm samples and 32nm production...

So, iyo, will the 65nm samples have a quarter or half the transistor budget? I.e. a lot fewer cores, but the same principle?

Doesn't it make sense to sample on 45nm (full size sample, much lower clock) and produce on 32nm?

Be interesting to see if they can get their 32nm fabbing process online before the end of 2009
 