Larrabee at Siggraph

The paper is still not at the link, last time I checked. But after I made that original post, we were asked not to point directly at that link, so we've complied.
 
What about distributing draw commands to multiple cores? Though maintaining binning data is probably going to require atomic operations.
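Something like this, just to illustrate the atomic append into a bin (a rough sketch; the struct and names are made up):

#include <atomic>
#include <cstdint>
#include <vector>

// Hypothetical per-tile bin; triIndices is pre-sized to a worst-case capacity.
struct Bin {
    std::atomic<uint32_t> count{0};
    std::vector<uint32_t> triIndices;
};

// Called concurrently by cores processing different draw commands.
void appendTriangle(Bin& bin, uint32_t triIndex) {
    // The slot claim has to be atomic so two cores can't grab the same entry.
    uint32_t slot = bin.count.fetch_add(1, std::memory_order_relaxed);
    bin.triIndices[slot] = triIndex;
}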

All the talk prior to this about why GPUs still run at most 1 tri/clk seemed to revolve around the expectation that rasterization be done serially.

Can the setup workload be distributed across multiple units or cores, where we wouldn't expect the results in one area to invalidate the parallel work of cores handling triangles later in rasterization order?

As for atomic operations, both GT200 and RV770 now have infrastructure in place that either allows them already or could implement them within a generation.
 
Supporting a standard 3D API is not a necessary evil imho, it's not like everyone will start writing their own software renderers tomorrow.
A fast OpenGL/D3D implementation is going to be the most important thing for Larrabee for years to come; all sorts of other improvements are probably going to be exposed as extensions to these APIs.

Yes, but nevertheless Intel sees this as a 'transitional period', even if that period could last a few years. OGL/D3D is where they start from, not where they're going.

Only a few brave developers will probably write their own thing from scratch.

Perhaps a few brave developers is all it takes.
Currently there's only a handful of engines powering the most successful games. The rest just license the technology and expand on it.
I think if ID, Valve, Epic and CryTek build their own renderers, you'll have a big portion of the games market covered.
There could also be room for new middleware players. There's plenty of people that can write a good software renderer. That's something completely different from designing and marketing a successful game.
 
It's never enough, NEVER :)
Especially if you have no resources left to do anything else!
Well, the irony being that when you're filling your "irregular shadow map", burning up all that triangle rate, the only other thing you'll be doing is a funky rasterisation calculation (which is easily parallelisable) and some trivial Z comparisons.
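Roughly this sort of per-sample work, in other words (a rough sketch, not anything from the paper; the triangle setup and sample layout are assumed):

// Three edge tests (the "funky rasterisation calculation") plus a trivial Z compare.
struct LightSample { float x, y, z; bool inShadow; };
struct TriSetup    { float a[3], b[3], c[3]; float z0, zdx, zdy; };  // assumed setup

void testSamples(LightSample* samples, int n, const TriSetup& t) {
    for (int i = 0; i < n; ++i) {                 // each sample is independent
        LightSample& s = samples[i];
        bool inside = true;
        for (int e = 0; e < 3; ++e)
            inside = inside && (t.a[e] * s.x + t.b[e] * s.y + t.c[e] >= 0.0f);
        float triZ = t.z0 + t.zdx * s.x + t.zdy * s.y;
        if (inside && triZ < s.z)                 // occluder nearer than the sample
            s.inShadow = true;
    }
}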

I'm most interested now in what their effective dynamic branching coherency is going to be. It would be really nice if it turns out to be a quad :p Though I wouldn't be surprised if it's actually 4 quads.

I suppose there's a decent chance the software could take the portions of a shader that consist of DB and vectorise those instructions, making coherency quad-sized instead of 4-quad. Outside of DB they could then use NVidia's serial-scalar approach to maximise the utilisation of instructions that are smaller than vec4.
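To illustrate what I mean by the coherency granularity (toy code, not the real ISA; 16-wide vector assumed):

#include <array>

constexpr int LANES = 16;   // assumed vector width = 4 quads
using Mask = std::array<bool, LANES>;

// If the 16 lanes disagree about the branch, both sides execute, predicated
// per lane; quad-sized coherency would only pay this cost per group of 4 lanes.
void runBranch(const Mask& taken, float* result, const float* a, const float* b) {
    bool anyTaken = false, anyNotTaken = false;
    for (int i = 0; i < LANES; ++i) { anyTaken |= taken[i]; anyNotTaken |= !taken[i]; }

    if (anyTaken)
        for (int i = 0; i < LANES; ++i) if (taken[i])  result[i] = a[i] * 2.0f;  // "then" path
    if (anyNotTaken)
        for (int i = 0; i < LANES; ++i) if (!taken[i]) result[i] = b[i] + 1.0f;  // "else" path
}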

Jawed
 
So, they're going to talk more about Larrabee at SIGGRAPH. I wonder if they'll show Project Offset (or whatever the Offset guys are doing) running using it. :)
 
The units would have to be fed, and exactly how Larrabee does that wasn't spelled out.

Core2 upped the maximum size of its cache reads to match the wider SIMD units. A balanced general-purpose design with 16-wide vectors would have to quadruple the read size all over again.
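Napkin version of the read-size argument (illustrative sizes only):

constexpr int kBytesPerFloat = 4;
constexpr int kSSEWidth  = 4;    // Core2/Nehalem: 128-bit vector loads
constexpr int kWideWidth = 16;   // a 16-wide vector unit

constexpr int kSSELoadBytes  = kSSEWidth  * kBytesPerFloat;   // 16 bytes
constexpr int kWideLoadBytes = kWideWidth * kBytesPerFloat;   // 64 bytes
static_assert(kWideLoadBytes == 4 * kSSELoadBytes, "quadruple the read size");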

Given the heavier infrastructure for memory traffic that OOE and speculation (at 3 GHz) bring in, I would imagine it's more complex to shoehorn in such a large unit.

Cheers

Thoburn said:
Okay not THAT wide and not Nehalem, but... http://softwareprojects.intel.com/avx/

AVX will bring 8-wide vector ALUs to Sandybridge, right? And that compares to 4-wide on Core2/Nehalem?

I guess with 8 cores and a decent clock speed that's still nothing to sniff at, but you can see how Larrabee will be in a different league.
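Back-of-the-envelope, using the core counts in this thread (the clock speeds are pure guesses, and this ignores FMA, dual issue and everything else that actually matters):

constexpr double kSnbCores = 8,  kSnbWidth = 8,  kSnbGHz = 3.0;   // AVX, SP floats
constexpr double kLrbCores = 24, kLrbWidth = 16, kLrbGHz = 2.0;   // assumed

constexpr double kSnbGflops = kSnbCores * kSnbWidth * kSnbGHz;    // ~192 SP GFLOP/s
constexpr double kLrbGflops = kLrbCores * kLrbWidth * kLrbGHz;    // ~768 SP GFLOP/s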

What I'm waiting for are the Sandybridge/Larrabee mixed cores. Say a quad-core Sandybridge combined with 24 Larrabee cores :devilish:
 
*Your present login does not have access to this feature.

Cute, very cute. I've been waiting all day for this damn thing.

edit: okay, how ridiculous is that, I actually paid for this thing. The dollar is so low I might as well. :p
 
So, to cut to the chase, they are claiming 10-25 "Larrabee cores" at 1GHz to run FEAR at 1600x1200 60fps with "4 samples" (is that 4xaa or 4xaf or what?)

Well, I guess it'd be more accurate to say they are claiming 25 1GHz cores to keep performance at a minimum of 60fps.
 
For those wondering, the texture logic offers 32KB of texture cache per core. It supports 'the usual' operations: DX10 compressed formats, mip-mapping, aniso filtering, etc. Commands and results to and from the texture logic are passed through L2.
 
Well, I guess it'd be more accurate to say they are claiming 25 1GHz cores to keep performance at a minimum of 60fps.

I think looking at the chart, if it's accurate, performance should go above 60fps with 25 1GHz cores. Some of their sample frames need only 5 cores, for example, to be done in 1/60th of a second... for most of those sample frames the numbers required are far fewer than 25.

So maybe it's more accurate to say that 25 is needed to prevent it dipping below 60fps.

If I'm understanding it right..
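i.e. the way I'm reading the chart is something like this (per-frame workload figures below are invented for illustration):

#include <cmath>

// The number of 1GHz cores a frame needs is just that frame's total work
// divided by the 60fps budget.
const double kBudgetMs = 1000.0 / 60.0;   // ~16.7 ms per frame at 60fps

int coresNeeded(double frameWorkCoreMs) {
    return static_cast<int>(std::ceil(frameWorkCoreMs / kBudgetMs));
}

// A light frame with ~80 core-ms of work needs coresNeeded(80) == 5 cores;
// only the heaviest frames, at ~410 core-ms, would need the full 25.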

edit - also, can I post the chart?
 
Most of the preview sites nabbed images that were from a press briefing. The press briefing used some of the same charts, and sourced them to this paper.

So, I dunno. It's behind a subscription wall so I'm not even sure "fair use" applies here on the images.
 