Larrabee at Siggraph

The paper is still not at the link, last time I checked. But after I made that original post, we were asked not to point directly at that link, so we've complied.
 
What about distributing draw commands to multiple cores? Though maintaining binning data is probably going to require atomic operations.
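Something like this, just to illustrate the atomic append into a bin (a rough sketch; the struct and names are made up):

#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical per-tile bin; triIndices is pre-sized to a worst-case capacity.
struct Bin {
    std::atomic<uint32_t> count{0};
    std::vector<uint32_t> triIndices;
};

// Called concurrently by cores processing different draw commands.
void appendTriangle(Bin& bin, uint32_t triIndex) {
    // The slot claim has to be atomic so two cores can't grab the same entry.
    uint32_t slot = bin.count.fetch_add(1, std::memory_order_relaxed);
    bin.triIndices[slot] = triIndex;
}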

All the talk prior to this about why GPUs still run at most 1 tri/clk seemed to revolve around the expectation that rasterization be done serially.

Can the setup workload be distributed across multiple units or cores, where we wouldn't expect the results in one area to invalidate the parallel work of cores handling triangles later in rasterization order?

As for atomic operations, both GT200 and RV770 now have infrastructure in place that either allows them already or could implement them within a generation.
 
Supporting a standard 3D API is not a necessary evil imho, it's not like everyone will start writing their own software renderers tomorrow.
A fast OpenGL/D3D implementation is going to be the most important thing for Larrabee for years to come; all sorts of other improvements are probably going to be exposed as extensions to these APIs.

Yes, but nevertheless Intel sees this as a 'transitional period', even if that period could last a few years. OGL/D3D is where they start from, not where they're going.

Only a few brave developers will probably write their own thing from scratch.

Perhaps a few brave developers is all it takes.
Currently there's only a handful of engines powering the most successful games. The rest just license the technology and expand on it.
I think if ID, Valve, Epic and CryTek build their own renderers, you'll have a big portion of the games market covered.
There could also be room for new middleware players. There's plenty of people that can write a good software renderer. That's something completely different from designing and marketing a successful game.
 
It's never enough, NEVER :)
Especially if you have no resources left to do anything else!
Well, the irony being that when you're filling your "irregular shadow map", burning up all that triangle rate, the only other thing you'll be doing is a funky rasterisation calculation (which is easily parallelisable) and some trivial Z comparisons.
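Roughly this sort of per-sample work, in other words (a rough sketch, not anything from the paper; the triangle setup and sample layout are assumed):

// Three edge tests (the "funky rasterisation calculation") plus a trivial Z compare.
struct LightSample { float x, y, z; bool inShadow; };
struct TriSetup    { float a[3], b[3], c[3]; float z0, zdx, zdy; };  // assumed setup

void testSamples(LightSample* samples, int n, const TriSetup& t) {
    for (int i = 0; i < n; ++i) {                 // each sample is independent
        LightSample& s = samples[i];
        bool inside = true;
        for (int e = 0; e < 3; ++e)
            inside = inside && (t.a[e] * s.x + t.b[e] * s.y + t.c[e] >= 0.0f);
        float triZ = t.z0 + t.zdx * s.x + t.zdy * s.y;
        if (inside && triZ < s.z)                 // occluder nearer than the sample
            s.inShadow = true;
    }
}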

I'm most interested now in what their effective dynamic branching coherency is going to be. It would be really nice if it turns out to be a quad :p Though I wouldn't be surprised if it's actually 4 quads.

I suppose there's a decent chance the software could take the portions of a shader that consist of DB and vectorise those instructions, making coherency quad-sized instead of 4-quad. Outside of DB they could then use NVidia's serial-scalar approach to maximise the utilisation of instructions that are smaller than vec4.
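To illustrate what I mean by the coherency granularity (toy code, not the real ISA; 16-wide vector assumed):

#include <array>

constexpr int LANES = 16;   // assumed vector width = 4 quads
using Mask = std::array<bool, LANES>;

// If the 16 lanes disagree about the branch, both sides execute, predicated
// per lane; quad-sized coherency would only pay this cost per group of 4 lanes.
void runBranch(const Mask& taken, float* result, const float* a, const float* b) {
    bool anyTaken = false, anyNotTaken = false;
    for (int i = 0; i < LANES; ++i) { anyTaken |= taken[i]; anyNotTaken |= !taken[i]; }

    if (anyTaken)
        for (int i = 0; i < LANES; ++i) if (taken[i])  result[i] = a[i] * 2.0f;  // "then" path
    if (anyNotTaken)
        for (int i = 0; i < LANES; ++i) if (!taken[i]) result[i] = b[i] + 1.0f;  // "else" path
}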

Jawed
 
So, they're going to talk more about Larrabee at SIGGRAPH. I wonder if they'll show Project Offset (or whatever the Offset guys are doing) running using it. :)
 
The units would have to be fed, and exactly how Larrabee does that wasn't spelled out.

Core2 upped the maximum size of its cache reads to match the wider SIMD units. A balanced general-purpose design with 16-wide vectors would have to quadruple the read size all over again.
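Napkin version of the read-size argument (illustrative sizes only):

constexpr int kBytesPerFloat = 4;
constexpr int kSSEWidth  = 4;    // Core2/Nehalem: 128-bit vector loads
constexpr int kWideWidth = 16;   // a 16-wide vector unit

constexpr int kSSELoadBytes  = kSSEWidth  * kBytesPerFloat;   // 16 bytes
constexpr int kWideLoadBytes = kWideWidth * kBytesPerFloat;   // 64 bytes
static_assert(kWideLoadBytes == 4 * kSSELoadBytes, "quadruple the read size");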

Given the heavier infrastructure for memory traffic that OOE and speculation (at 3 GHz) bring in, I would imagine it's more complex to shoehorn in such a large unit.

Cheers

Thoburn said:
Okay not THAT wide and not Nehalem, but... http://softwareprojects.intel.com/avx/

AVX will bring 8-wide vector ALUs to Sandybridge, right? And that compares to 4-wide on Core2/Nehalem?

I guess with 8 cores and a decent clock speed that's still nothing to sniff at, but you can see how Larrabee will be in a different league.
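Back-of-the-envelope, using the core counts in this thread (the clock speeds are pure guesses, and this ignores FMA, dual issue and everything else that actually matters):

constexpr double kSnbCores = 8,  kSnbWidth = 8,  kSnbGHz = 3.0;   // AVX, SP floats
constexpr double kLrbCores = 24, kLrbWidth = 16, kLrbGHz = 2.0;   // assumed

constexpr double kSnbGflops = kSnbCores * kSnbWidth * kSnbGHz;    // ~192 SP GFLOP/s
constexpr double kLrbGflops = kLrbCores * kLrbWidth * kLrbGHz;    // ~768 SP GFLOP/s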

What I'm waiting for are the Sandybridge/Larrabee mixed cores. Say a quad-core Sandybridge combined with 24 Larrabee cores :devilish:
 
*Your present login does not have access to this feature.

Cute, very cute. I've been waiting all day for this damn thing.

edit: okay, how ridiculous is that, I actually paid for this thing. The dollar is so low I might as well. :p
 
So, to cut to the chase, they are claiming 10-25 "Larrabee cores" at 1GHz to run FEAR at 1600x1200 60fps with "4 samples" (is that 4xaa or 4xaf or what?)

Well, I guess it'd be more accurate to say they are claiming 25 1GHz cores to keep performance at a minimum of 60fps.
 
For those wondering, the texture logic offers 32KB of texture cache per core. It supports 'the usual' operations: DX10 compressed formats, mip-mapping, aniso filtering, etc. Commands and results to and from the texture logic are passed through L2.
 
Well, I guess it'd be more accurate to say they are claiming 25 1GHz cores to keep performance at a minimum of 60fps.

I think looking at the chart, if it's accurate, performance should go above 60fps with 25 1GHz cores. Some of their sample frames need only 5 cores, for example, to be done in 1/60th of a second... for most of those sample frames the numbers required are far fewer than 25.

So maybe it's more accurate to say that 25 is needed to prevent it dipping below 60fps.

If I'm understanding it right..
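i.e. the way I'm reading the chart is something like this (per-frame workload figures below are invented for illustration):

#include <cmath>

// The number of 1GHz cores a frame needs is just that frame's total work
// divided by the 60fps budget.
const double kBudgetMs = 1000.0 / 60.0;   // ~16.7 ms per frame at 60fps

int coresNeeded(double frameWorkCoreMs) {
    return static_cast<int>(std::ceil(frameWorkCoreMs / kBudgetMs));
}

// A light frame with ~80 core-ms of work needs coresNeeded(80) == 5 cores;
// only the heaviest frames, at ~410 core-ms, would need the full 25.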

edit - also, can I post the chart?
 
Most of the preview sites nabbed images that were from a press briefing. The press briefing used some of the same charts, and sourced them to this paper.

So, I dunno. It's behind a subscription wall so I'm not even sure "fair use" applies here on the images.
 