Larrabee at SIGGRAPH

You don't really want to do direct shading with 16 pixels at a time either.
Yes, that's another reason too.
And finally we got fragment level blending ops implemented via 'shader cores' ;)
 
The flexibility and scalability seem great.

The thing about Intel is that they have to do process shrinks for their CPUs anyway, so for graphics we know the shrink will be there as a given, whereas Nvidia and AMD have to rely on TSMC to get it right.

We know Nehalem will be on 32nm, so building a simpler LarryB will be child's play.
 
Jesus Christ, Anand is brown-nosing and burning bridges there. I'd be a little more diplomatic and less presumptuous if I were him.
 
I guess 32 outstanding prefetches per core then ... that will do, that will do.

I wonder if they have a per-register flag for dirty registers and a special instruction to push/pop the thread context (I would).

PS: although I guess it would be better to just have the compiler send bit flags along with the instruction to determine which registers get pushed/popped.
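Something like this, purely as a sketch of the compiler-bitmask idea (all the names and sizes here are made up for illustration, nothing comes from Intel's docs):

/* Hypothetical sketch: a compiler-supplied bitmask drives a selective
 * register save/restore, instead of per-register dirty bits in hardware.
 * The register file and spill area are just plain arrays here. */
#include <stdint.h>
#include <string.h>

#define NUM_VREGS  32
#define VREG_BYTES 64                  /* one 512-bit Vec16 register */

typedef struct { uint8_t b[VREG_BYTES]; } vreg_t;

static vreg_t regfile[NUM_VREGS];      /* stand-in for the live registers */

/* Save only the registers whose bit is set in live_mask, which the
 * compiler would know statically at the switch point. */
static void push_context(vreg_t *spill, uint32_t live_mask)
{
    for (int r = 0; r < NUM_VREGS; r++)
        if (live_mask & (1u << r))
            memcpy(&spill[r], &regfile[r], VREG_BYTES);
}

static void pop_context(const vreg_t *spill, uint32_t live_mask)
{
    for (int r = 0; r < NUM_VREGS; r++)
        if (live_mask & (1u << r))
            memcpy(&regfile[r], &spill[r], VREG_BYTES);
}

int main(void)
{
    vreg_t spill[NUM_VREGS];
    uint32_t mask = (1u << 0) | (1u << 3);   /* only v0 and v3 are live */
    push_context(spill, mask);
    /* ...another thread would run here... */
    pop_context(spill, mask);
    return 0;
}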
 
I hope they can get this out in 12 months' time. I don't want it to slip over to 2010.

Then a year later they could do a 32nm die shrink.
 
I hope they can get this out in 12 months' time. I don't want it to slip over to 2010.

I remember Fudo reporting a Larrabee launch in summer 2009. That's around when GT300/RV8xx are expected to launch. :D

Lots of stuff told and lots of stuff still hidden. Their scaling slides showed 48-core data, so I think a 48-core beast can be expected. Having said that, they might want to hide its bigger pal until right up to launch. :devilish:

Eagerly waiting for their SIGGRAPH presentation. Hopefully they'll tell us much more on that day. :smile:

As a GPGPU person, I am really pleased with the texturing hardware. Lots of (non-graphics) problems have locality of reference, so if there is some cache devoted specifically to texturing, that's a real win. The other features, such as address clamping, in-hardware interpolation and address wraparound, often come in useful in non-graphics work too. I just hope these features really are implemented in hardware.
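Just to show what those addressing modes buy you, here's a toy C version of clamp/wrap addressing plus bilinear filtering, i.e. the work the texture units would otherwise have to do per fetch in software (purely illustrative, nothing here is Intel's actual interface):

/* Toy single-channel texture sampler: clamp-to-edge and wrap addressing
 * plus bilinear interpolation, done in plain C for illustration. */
#include <math.h>
#include <stdio.h>

#define TEX_W 8
#define TEX_H 8

static float tex[TEX_H][TEX_W];

static int clamp_addr(int x, int n) { return x < 0 ? 0 : (x >= n ? n - 1 : x); }
static int wrap_addr(int x, int n)  { return ((x % n) + n) % n; }

/* Bilinear sample at normalised coordinates (u, v), clamp addressing. */
static float sample_bilinear(float u, float v)
{
    float fx = u * TEX_W - 0.5f, fy = v * TEX_H - 0.5f;
    int x0 = (int)floorf(fx), y0 = (int)floorf(fy);
    float tx = fx - x0, ty = fy - y0;

    float c00 = tex[clamp_addr(y0,     TEX_H)][clamp_addr(x0,     TEX_W)];
    float c10 = tex[clamp_addr(y0,     TEX_H)][clamp_addr(x0 + 1, TEX_W)];
    float c01 = tex[clamp_addr(y0 + 1, TEX_H)][clamp_addr(x0,     TEX_W)];
    float c11 = tex[clamp_addr(y0 + 1, TEX_H)][clamp_addr(x0 + 1, TEX_W)];

    float top = c00 + (c10 - c00) * tx;
    float bot = c01 + (c11 - c01) * tx;
    return top + (bot - top) * ty;
}

int main(void)
{
    for (int y = 0; y < TEX_H; y++)
        for (int x = 0; x < TEX_W; x++)
            tex[y][x] = (float)(x + y);

    printf("bilinear(0.5, 0.5) = %f\n", sample_bilinear(0.5f, 0.5f));
    printf("wrap(-1, %d) = %d\n", TEX_W, wrap_addr(-1, TEX_W));
    return 0;
}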
 
PowerVR likely only has patents for a hardware implementation. This is the same reason why Intel couldn't sue Transmeta over their x86 implementation despite them having no license - because all their patents and IP were for a software implementation...
 
Looks like people get less sceptical of Larrabee every time Intel releases more info.
I can't wait to see the first demonstrations of their technology, which should be sometime later this year.
The theory sounds quite interesting, but how well will it work in practice?
 
I think it's a good chip for the time that's coming, which I expect will be a strongly transitional phase. A lot of the next improvements in graphics will be Russian style, I'm reckoning ... (i.e. in software ;) ).

But that needs flexible hardware. And (with my fairly non-technical understanding of the matter) I think this chip seems to go a long way toward combining the best of what Cell and Intel have had to offer.
 
Probably I'm misunderstanding how x86/GPUs work, but the way I'm reading this so far is that Larrabee is essentially a bunch of Vec16 ALUs, each attached to an x86 integer core (running scheduling & API etc.?) and linked over a ring bus to some TMUs.

The floating-point section of x86 CPUs isn't really x86 per se but extension instructions like x87 & SSE, so the actual floating-point graphics processing power & flexibility of the Larrabee architecture is all about what FPU instruction set is supported (a new SSE version with a bunch of Vec16 equivalents of previous SSEs?) and how much utilisation can be attained from these Vec16 ALUs.

A 48-core one would be 768 MADDs/clock, so @ 2GHz that's ~3 TFLOPS?
15MB cache.
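Quick sanity check of those numbers, assuming one 16-wide MADD per core per clock (2 FLOPs per lane) and 32KB I$ + 32KB D$ + 256KB L2 per core (the cache sizes are the per-core figures I've seen quoted, so treat them as an assumption):

/* Back-of-envelope peak throughput and cache totals for a 48-core part. */
#include <stdio.h>

int main(void)
{
    const int cores          = 48;
    const int lanes          = 16;    /* Vec16 ALU per core */
    const int flops_per_madd = 2;     /* multiply + add */
    const double clock_ghz   = 2.0;

    double madds_per_clk = (double)cores * lanes;                     /* 768 */
    double gflops        = madds_per_clk * flops_per_madd * clock_ghz;
    double cache_kb      = cores * (32 + 32 + 256);                   /* I$ + D$ + L2 */

    printf("MADDs/clock : %.0f\n", madds_per_clk);
    printf("Peak        : %.0f GFLOPS (~%.1f TFLOPS)\n", gflops, gflops / 1000.0);
    printf("Total cache : %.0f MB\n", cache_kb / 1024.0);
    return 0;
}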
 
Off topic: they say software rendering of DirectX and OpenGL; would that mean we could have a much easier time porting games from, say, Windows to Mac?

With Wine handling all the non-DirectX parts?
 
Off topic: they say software rendering of DirectX and OpenGL; would that mean we could have a much easier time porting games from, say, Windows to Mac?

With Wine handling all the non-DirectX parts?

I doubt it.
This software renderer will be part of the DirectX/OpenGL Windows drivers.
So to a developer there's no difference; he's still using the same API calls. Whether they are executed through software or hardware routines is irrelevant.

You're still missing the same piece of the puzzle: the DirectX API libraries.
It's roughly like this: Application -> DirectX API -> DirectX driver -> hardware/software.

The main problem with GPUs in Wine is that there are no DirectX drivers, so they have to create an emulation layer on top of OpenGL.
One thing that might be easier to do, however, is to have Wine implement a DirectX-compatible driver/renderer on Larrabee, assuming that Intel will release enough information to fully program the GPU, including the fixed-function parts. But I don't really think that's going to happen.
 
OK, so all this stuff about 'coding a software renderer as if it's x86' is pure horseshit.
Scheduling & API is all that's 'in software', & the API is already in software for ATI/Nvidia anyway.

To write your own software renderer to make use of Larrabee, you'd need to write everything yourself, including a scheduler that can load-balance & hide latency across tens of cores & many, many threads!
Presumably Intel would provide libraries for those functions, but then you're still using a 3rd-party API.
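To give a feel for what 'write your own scheduler' means, here's a minimal tile-grabbing loop in C (pthreads + C11 atomics). It's only a sketch: shade_tile() and all the sizes are made up, and a real renderer would need far more than this (binning, priorities, extra threads per core to hide latency):

/* Minimal load-balancing sketch: worker threads pull screen tiles from a
 * shared atomic counter, so faster threads simply take more tiles. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define TILE_W      64
#define TILE_H      64
#define SCREEN_W    1920
#define SCREEN_H    1080
#define NUM_WORKERS 4                /* stand-in for "tens of cores" */

static atomic_int next_tile;         /* shared work counter */
static int tiles_x, tiles_y;

static void shade_tile(int tx, int ty)
{
    (void)tx; (void)ty;              /* placeholder for real raster/shade work */
}

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int t = atomic_fetch_add(&next_tile, 1);   /* claim the next tile */
        if (t >= tiles_x * tiles_y)
            break;
        shade_tile(t % tiles_x, t / tiles_x);
    }
    return NULL;
}

int main(void)
{
    pthread_t threads[NUM_WORKERS];
    tiles_x = (SCREEN_W + TILE_W - 1) / TILE_W;
    tiles_y = (SCREEN_H + TILE_H - 1) / TILE_H;

    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_create(&threads[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(threads[i], NULL);

    printf("shaded %d tiles\n", tiles_x * tiles_y);
    return 0;
}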

So back to the hardware, clock for clock, ignoring special functions for the time being (emulated at reduced speed on the x86 integer cores?) & assuming that the Larrabee Vec16 only does MADD:

RV770 is 10*(16*(1+1+1+1+1)) = 800 SP, right?
A 48-core Larrabee would be 48*(16*1) = 768 SP

So one ATI 16*(1+1+1+1+1) VLIW SIMD = 5 Larrabee cores
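Same arithmetic in code form, just so the assumptions are explicit (counting one 'SP' as one scalar FP lane and ignoring clocks, co-issue and special functions entirely):

/* Lane-count comparison only; says nothing about achievable utilisation. */
#include <stdio.h>

int main(void)
{
    int rv770_sp    = 10 * (16 * 5);   /* 10 SIMDs x 16 VLIW units x 5 lanes = 800 */
    int larrabee_sp = 48 * 16;         /* 48 cores x Vec16 = 768 */

    printf("RV770    : %d SP\n", rv770_sp);
    printf("Larrabee : %d SP\n", larrabee_sp);
    printf("One RV770 SIMD = %d lanes = %d Larrabee cores\n", 16 * 5, (16 * 5) / 16);
    return 0;
}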
 